somandal opened a new pull request, #15427:
URL: https://github.com/apache/pinot/pull/15427
## Description
This PR enhances the Table Rebalance progress stats.
Today the progress stats can be confusing due to the various sections (there
are 3), and it is hard to track progress because most of the time only the
EV-IS convergence step is updated and this is only calculated for a single step
and not for the full rebalance. To understand the progress stats, the code that
does the stats updates must be well understood which is not a good assumption
and experience for end users.
Another concern with the existing stats is that the percentage progress
calculations are based off the total segments in the target, rather than based
on the actual segments that need to be added / deleted. This makes it harder to
assess the progress of the rebalance since not all segments may even need to
change.
This PR adds the following to the existing progress stats:
```
"rebalanceProgressStatsOverall" : {
"totalSegmentsToBeAdded" : 30,
"totalSegmentsToBeDeleted" : 30,
"totalRemainingSegmentsToBeAdded" : 14,
"totalRemainingSegmentsToBeDeleted" : 11,
"totalCarryOverSegmentsToBeAdded" : 0,
"totalCarryOverSegmentsToBeDeleted" : 0,
"totalRemainingSegmentsToConverge": 0,
"totalUniqueNewUntrackedSegmentsDuringRebalance" : 10,
"percentageTotalSegmentsAddsRemaining" : 46.666666666666664,
"percentageTotalSegmentDeletesRemaining" : 36.666666666666664,
"estimatedTimeToCompleteAddsInSeconds" : 71.768375,
"estimatedTimeToCompleteDeletesInSeconds" : 47.48584210526316,
"averageSegmentSizeInBytes" : 35918,
"totalEstimatedDataToBeMovedInBytes" : 1077540,
"startTimeMs" : 1742319763143
},
"rebalanceProgressStatsCurrentStep" : {
"totalSegmentsToBeAdded" : 10,
"totalSegmentsToBeDeleted" : 10,
"totalRemainingSegmentsToBeAdded" : 4,
"totalRemainingSegmentsToBeDeleted" : 1,
"totalCarryOverSegmentsToBeAdded" : 0,
"totalCarryOverSegmentsToBeDeleted" : 0,
"totalRemainingSegmentsToConverge": 0,
"totalUniqueNewUntrackedSegmentsDuringRebalance" : 0,
"percentageTotalSegmentsAddsRemaining" : 40.0,
"percentageTotalSegmentDeletesRemaining" : 10.0,
"estimatedTimeToCompleteAddsInSeconds" : 16.923333333333332,
"estimatedTimeToCompleteDeletesInSeconds" : 2.8205555555555555,
"averageSegmentSizeInBytes" : 35918,
"totalEstimatedDataToBeMovedInBytes" : 359180,
"startTimeMs" : 1742319819779
},
```
The overall stats (`rebalanceProgressStatsOverall`) above are updated at
each step (`rebalanceProgressStatsCurrentStep`) as well to capture the progress
during IS-EV convergence. This is to ensure that users only need to monitor
`rebalanceProgressStatsOverall` to find out how much work has already been
done. The `rebalanceProgressStatsCurrentStep` can be used for debugging but
does not need to be monitored as such.
`_totalUniqueNewUntrackedSegmentsDuringRebalance`: this field is added to
track new segments that show up in the IdealState but which aren't being
tracked as part of the rebalance. This is just informational and can be used to
identify how many additional segments got added during the rebalance run.
The time estimates are based on how much time has passed and how many
segments were already processed and how many are remaining. This may not be a
very accurate measure and is not based on the actual data moving since
different set ups may have different throughput characteristics.
This PR also renames some of the existing stats to make them clearer to
understand.
## Interesting Scenario
While experimenting with QuickStart and REALTIME tables, I ran into a
situation where rebalance would complete successfully, but the stats I added
would always show some segment deletions as not yet processed.
Digging some more, found the following:
https://github.com/apache/pinot/blob/604cd165586716ae04db22856371e50147e30459/pinot-controller/src/main/java/org/apache/pinot/controller/helix/core/rebalance/TableRebalancer.java#L1074
The above only checks for extra assigned instances for given segments if
`lowDiskMode` is enabled. So in my scenario, there were some extra instances
assigned when I did the stats calculations, but the above check was never
performed since `lowDiskMode=false`. The EV-IS convergence was marked as true
in this case and the rebalance was completed.
When I updated the code to comment out `lowDiskMode &&` from the above if, I
saw that eventually the EV does get updated to remove the deleted segments, and
the EV-IS do match up exactly, but that needs some additional convergence
loops. In this case the stats were correctly updated to reflect the deletions.
**As a fix for the above:** updated the code to remove `lowDiskMode &&` so
that this convergence loop always runs to ensure deletions are converged for
all scenarios and not just for `lowDiskMode`.
## Testing:
All testing done with `minAvailableReplicas = -1`
### Test 1 (no delay in segment adds)
- Ran HybridQuickStart with 6 servers
- Tagged 3 servers with `NewDefaultTenant_OFFLINE` and removed
`DefaultTenant_OFFLINE` from these servers
- Ran rebalance of two types:
- Updated replication of OFFLINE table from 1 -> 3, this got triggered
as single step rebalance to add all the new replicas
- Updated server tenant of OFFLINE table to `NewDefaultTenant`
(triggered as a 3 step rebalance to move 1 replica at a time)
### Test 2 (delay in segment adds/deletes)
- Added a random delay between 10 - 30 seconds in the
`onBecomeOnlineFromOffline()`, `onBecomeOnlineFromConsuming()`,
`onBecomeConsumingFromOffline()` state transition callbacks
- Added a random delay between 0 - 30 seconds in the `
onBecomeDroppedFrom*()` state transition callbacks
- Ran HybridQuickStart with 6 servers
- Tagged 3 servers with `NewDefaultTenant_OFFLINE` and removed
`DefaultTenant_OFFLINE` from these servers
- Ran rebalance of two types (both were triggered as multi-step rebalances):
- Updated replication of OFFLINE table from 1 -> 3, monitored the steps
(due to random delay in state transition, could see some segments getting added
sooner and others take longer)
- Updated server tenant of OFFLINE table to `NewDefaultTenant` and
similarly monitored the steps
### Additional testing (with delays):
- REALTIME table with and without `includeConsuming` (similar steps as for
OFFLINE above)
- Scenario where some segments get added to the IdealState in the middle of
rebalance (e.g. forceCommit for REALTIME table) to tag some servers as
`NewDefaultTenant_REALTIME` and remove `DefaultTenant_REALTIME` tag
- Test to force some segments into `ERROR` state, and use `bestEfforts=true`
to validate that convergence stats are correctly captured. Also verified
convergence stats with `forceCommit` since the existing `CONSUMING` segments
change to `ONLINE` and the rebalance correctly waits (and stats are correctly
updated) until the state convergence also completes.
- Set some segments to `OFFLINE` state in ideal state. Verified that while
they are in `OFFLINE` state in ideal state, we skip progress tracking for
those. Eventually if they get updated to ONLINE state, we track them again and
wait for convergence.
- Updated the code to add fields for carry-over segments that didn't
converge in the last step. This will mostly occur if `bestEffort=true`, and we
are unable to complete convergence within the timeout. The other scenario was
with `lowDiskMode` and extra deletions from EV, but we changed the code here to
always wait for additional deletions in the EV. The percentage calculations
include these carry-over segments.
cc @raghavyadav01 @Jackie-Jiang @klsince @J-HowHuang
Something went wrong with the branch of my original PR:
https://github.com/apache/pinot/pull/15266
This PR is the exact same. I'll close the other one
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]