J-HowHuang opened a new pull request, #15368:
URL: https://github.com/apache/pinot/pull/15368
## Description
This PR adds more information about consuming segments to the table rebalance
summary. This matters because users are likely to experience unexpected Kafka
topic re-consumption during a rebalance; showing the relevant information lets
them anticipate it.
New info will be added under `rebalanceSummaryResult.segmentInfo`:
```
"consumingSegmentSummary": {
  "numConsumingSegmentsToBeMoved": 5,
  "maxBytesConsumingSegmentsToCatchUp": 81,
  "bytesConsumingSegmentsToCatchUpPerServer": {
    "Server_<redacted>": 375
  }
}
```
## Changes
While building the rebalance summary, `TableRebalancer` fetches the table's
`consumingSegmentInfo` using `ConsumingSegmentInfoReader`, the same reader that
the controller API `GET /tables/{tableName}/consumingSegmentsInfo` uses, to get
the latest offset of each consuming segment with respect to its stream
partition. It also reads the segment ZK metadata of these consuming segments to
obtain their `startOffset`. The size of each consuming segment is then
calculated and reported in the summary.
If `consumingSegmentInfo` cannot be fetched for any consuming segment,
`maxBytesConsumingSegmentsToCatchUp` and
`bytesConsumingSegmentsToCatchUpPerServer` will be `null`.
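To make the aggregation above concrete, here is a minimal sketch of how the two summary fields could be derived. It is not the code in this PR: the `ConsumingSegment` and `CatchUpSummary` types, the `summarize` method, and the server and segment names are hypothetical. It assumes the per-segment size is the difference between the latest offset and `startOffset`, that the per-server value is the sum over segments moving to that server, and that the max is taken per segment. The offsets in `main` are made-up values chosen only to be consistent with the sample summary (max 81, per-server total 375). Requires Java 16+ for records.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ConsumingSegmentCatchUpSketch {

  /** Hypothetical holder, not a Pinot class. latestOffset is null if the info could not be fetched. */
  record ConsumingSegment(String segmentName, String targetServer, long startOffset, Long latestOffset) {}

  /** Hypothetical result mirroring the two summary fields; both are null on any fetch failure. */
  record CatchUpSummary(Long maxPerSegment, Map<String, Long> perServer) {}

  static CatchUpSummary summarize(List<ConsumingSegment> segmentsToBeMoved) {
    long max = 0;
    Map<String, Long> perServer = new HashMap<>();
    for (ConsumingSegment segment : segmentsToBeMoved) {
      if (segment.latestOffset() == null) {
        // One missing latest offset makes both aggregates unreliable, so report null for both.
        return new CatchUpSummary(null, null);
      }
      // Assumed size of the consuming segment: what has been consumed since startOffset.
      long toCatchUp = segment.latestOffset() - segment.startOffset();
      max = Math.max(max, toCatchUp);
      perServer.merge(segment.targetServer(), toCatchUp, Long::sum);
    }
    return new CatchUpSummary(max, perServer);
  }

  public static void main(String[] args) {
    // Made-up offsets: five segments moving to the same (hypothetical) server.
    long[] consumed = {81, 75, 73, 74, 72};
    List<ConsumingSegment> segments = new ArrayList<>();
    for (int i = 0; i < consumed.length; i++) {
      segments.add(new ConsumingSegment("seg_" + i, "Server_A", 100L, 100L + consumed[i]));
    }
    CatchUpSummary summary = summarize(segments);
    System.out.println(summary.maxPerSegment()); // prints 81
    System.out.println(summary.perServer());     // prints {Server_A=375}
  }
}
```

Returning `null` for both fields on the first failure, rather than skipping the failed segment, keeps the summary from understating the catch-up cost.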
## Tests
Tested with `pinot-admin.sh QuickStart -type STREAM` configured with 2 servers.
Start the cluster, remove the `DefaultTenant_REALTIME` tag from one server, then
dry-run a rebalance of the table `airlineStats_REALTIME` (10-partition Kafka
stream):
```
{
"jobId": "",
"status": "DONE",
"description": "Dry-run mode",
"rebalanceSummaryResult": {
"serverInfo": {
...
}
},
"segmentInfo": {
"totalSegmentsToBeMoved": 5,
"maxSegmentsAddedToASingleServer": 5,
"estimatedAverageSegmentSizeInBytes": 0,
"totalEstimatedDataToBeMovedInBytes": 0,
"replicationFactor": {
"valueBeforeRebalance": 1,
"expectedValueAfterRebalance": 1
},
"numSegmentsInSingleReplica": {
"valueBeforeRebalance": 10,
"expectedValueAfterRebalance": 10
},
"numSegmentsAcrossAllReplicas": {
"valueBeforeRebalance": 10,
"expectedValueAfterRebalance": 10
},
"consumingSegmentSummary": {
"numConsumingSegmentsToBeMoved": 5,
"maxBytesConsumingSegmentsToCatchUp": 81,
"bytesConsumingSegmentsToCatchUpPerServer": {
"Server_<redacted>": 375
}
}
}
},
"instanceAssignment": {
"CONSUMING": {
...
}
},
"segmentAssignment": {
...
}
}
```
Run `POST http://localhost:9000/tables/airlineStats_REALTIME/forceCommit`, then
dry-run the rebalance again. Notice that the total number of segments doubled
(the 10 consuming segments were committed), and there are now no bytes to catch
up during the rebalance.
```
{
"jobId": "",
"status": "DONE",
"description": "Dry-run mode",
"rebalanceSummaryResult": {
"serverInfo": {
...
}
},
"segmentInfo": {
"totalSegmentsToBeMoved": 5,
"maxSegmentsAddedToASingleServer": 5,
"estimatedAverageSegmentSizeInBytes": 43048,
"totalEstimatedDataToBeMovedInBytes": 215240,
"replicationFactor": {
"valueBeforeRebalance": 1,
"expectedValueAfterRebalance": 1
},
"numSegmentsInSingleReplica": {
"valueBeforeRebalance": 20,
"expectedValueAfterRebalance": 20
},
"numSegmentsAcrossAllReplicas": {
"valueBeforeRebalance": 20,
"expectedValueAfterRebalance": 20
},
"consumingSegmentSummary": {
"numConsumingSegmentsToBeMoved": 0,
"maxBytesConsumingSegmentsToCatchUp": 0,
"bytesConsumingSegmentsToCatchUpPerServer": {
"Server_100.114.242.49_7050": 0
}
}
}
},
"instanceAssignment": {
"CONSUMING": {
...
}
},
"segmentAssignment": {
...
}
}
```
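For anyone who wants to script the test steps above, the following is a rough sketch using the JDK's built-in HTTP client. The `forceCommit` URL is the one quoted in this description. The rebalance path and its `type=REALTIME&dryRun=true` query parameters are my assumption about the controller API and should be checked against your Pinot version. The class and method names are hypothetical. Requests are only sent when `--send` is passed, since that needs a running QuickStart cluster.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class DryRunRebalanceSketch {

  /** Builds a dry-run rebalance request. Path and query parameters are assumptions. */
  static HttpRequest dryRunRebalance(String controller, String table) {
    URI uri = URI.create(controller + "/tables/" + table + "/rebalance?type=REALTIME&dryRun=true");
    return HttpRequest.newBuilder(uri).POST(HttpRequest.BodyPublishers.noBody()).build();
  }

  /** Builds the forceCommit request quoted in the PR description. */
  static HttpRequest forceCommit(String controller, String table) {
    URI uri = URI.create(controller + "/tables/" + table + "/forceCommit");
    return HttpRequest.newBuilder(uri).POST(HttpRequest.BodyPublishers.noBody()).build();
  }

  public static void main(String[] args) throws Exception {
    String controller = "http://localhost:9000";
    String table = "airlineStats_REALTIME";
    // Same order as the manual test: dry-run, force commit, dry-run again.
    HttpRequest[] steps = {
        dryRunRebalance(controller, table),
        forceCommit(controller, table),
        dryRunRebalance(controller, table)
    };
    boolean send = args.length > 0 && args[0].equals("--send");
    HttpClient client = HttpClient.newHttpClient();
    for (HttpRequest step : steps) {
      System.out.println(step.method() + " " + step.uri());
      if (send) {
        // Requires a running cluster; prints the raw JSON response body.
        HttpResponse<String> response = client.send(step, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body());
      }
    }
  }
}
```

The sketch does not wait for the force commit to finish; in practice the second dry run may need a short delay until the committed segments show up.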
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]