npawar commented on PR #15050:
URL: https://github.com/apache/pinot/pull/15050#issuecomment-2661233155
> This PR adds a `summary` option to the TableRebalance API which is meant
> to be used with `dryRun`. If `summary` is set to `true` a summary of the
> dry-run is returned rather than the full dry-run. This summary gives some stats
> about changes that will occur during the rebalance, such as:
>
> * Total number of segments to be moved
> * Unique and total number of segments
> * Replication factor
> * Number of servers
> * Server map to segment add/remove/unchanged information
> * Average segment size
> * Number of servers getting segments added
> * Total data size being moved (calculated based on average segment size -
>   based on TableSize)
> * An estimate of how long the move can take (based on estimates to
>   download and process each segment) - we should decide on a good throughput
>   estimate as part of this PR review (left a TODO where the code is relevant)
>
> Today the dry-run output can be very large, and can be difficult to make
> sense of in terms of the changes occurring. It can also be difficult to display
> the full output. For now `summary` is not appended to `dryRun` without
> `summary` enabled, but this can be added if it makes sense to add a summary
> even for the usual result return.
>
> Note: This PR will have conflicts with #15029 and will need to be rebased
> once that is merged.
>
> Sample JSON summary:
>
> ```
> {
>   "totalSegmentsToBeMoved" : 6,
>   "numServersGettingNewSegments" : 1,
>   "estimatedAverageSegmentSizeInBytes" : 1690546,
>   "totalEstimatedDataToBeMovedInBytes" : 10143276,
>   "totalEstimatedTimeToMoveDataInSecs" : 0.09673381805419921,
>   "numServers" : {
>     "_existingValue" : 1,
>     "_newValue" : 2
>   },
>   "replicationFactor" : {
>     "_existingValue" : 1,
>     "_newValue" : 1
>   },
>   "numUniqueSegments" : {
>     "_existingValue" : 12,
>     "_newValue" : 12
>   },
>   "numTotalSegments" : {
>     "_existingValue" : 12,
>     "_newValue" : 12
>   },
>   "serverSegmentChangeInfo" : {
>     "Server_localhost_22004" : {
>       "_totalNewSegments" : 6,
>       "_totalExistingSegments" : 0,
>       "_segmentsAdded" : 6,
>       "_segmentsDeleted" : 0,
>       "_segmentsUnchanged" : 0
>     },
>     "Server_localhost_22001" : {
>       "_totalNewSegments" : 6,
>       "_totalExistingSegments" : 12,
>       "_segmentsAdded" : 0,
>       "_segmentsDeleted" : 6,
>       "_segmentsUnchanged" : 6
>     }
>   }
> }
> ```
>
> cc @Jackie-Jiang @klsince @deepthi912 @npawar
A few comments on the summary. Feel free to take only those that make sense
for this first iteration:
1. Along with `numServers`, it might be useful to list the existing and new
servers, so the operator can confirm that their tagging / untagging took
effect (i.e. servers added / removed / unchanged).
2. How about showing which tenant tags we're operating with? That is,
summarize the tags from tenants, completed, tier, and pools.
3. What does `numUniqueSegments` mean? Same with `numTotalSegments`: I didn't
follow what the existing/new values mean in the context of a rebalance. Perhaps
a description field within each section would help.
4. In the per-server stats for 22001, if it started with 12 segments and 6 are
moving away, shouldn't `_totalNewSegments` be 0?
5. This payload will get pretty extensive over time. I wonder if we should
take the time to do some more top-level categorization: segment-related info,
server-related info, and generic info.
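
To make point 4 concrete, here is a minimal sketch of one possible reading of the per-server fields. It assumes `_totalNewSegments` means "segments hosted after the rebalance" rather than "newly added segments"; the PR description does not define it, so this is a guess. The segment names, the `summarize_server` helper, and the assignment maps are all hypothetical and chosen only to mirror the sample JSON; this is not the PR's implementation.

```python
from dataclasses import dataclass


@dataclass
class ServerSegmentChange:
    total_new_segments: int       # assumed: segments hosted AFTER the rebalance
    total_existing_segments: int  # segments hosted BEFORE the rebalance
    segments_added: int
    segments_deleted: int
    segments_unchanged: int


def summarize_server(existing: set, new: set) -> ServerSegmentChange:
    """Derive per-server counts from the before/after segment sets (hypothetical helper)."""
    return ServerSegmentChange(
        total_new_segments=len(new),
        total_existing_segments=len(existing),
        segments_added=len(new - existing),
        segments_deleted=len(existing - new),
        segments_unchanged=len(existing & new),
    )


# Hypothetical assignment mirroring the sample: 12 segments, replication 1,
# and half of them move from server 22001 to the newly tagged server 22004.
segments = [f"seg_{i}" for i in range(12)]
existing_assignment = {
    "Server_localhost_22001": set(segments),
    "Server_localhost_22004": set(),
}
new_assignment = {
    "Server_localhost_22001": set(segments[:6]),
    "Server_localhost_22004": set(segments[6:]),
}

summary = {
    server: summarize_server(existing_assignment[server], new_assignment[server])
    for server in new_assignment
}

# Segments to be moved = segments added across all servers.
total_segments_to_be_moved = sum(c.segments_added for c in summary.values())

# Data to be moved = segments moved * average segment size (value from the sample).
estimated_average_segment_size_in_bytes = 1690546
total_estimated_data_to_be_moved_in_bytes = (
    total_segments_to_be_moved * estimated_average_segment_size_in_bytes
)
```

Under that reading, 22001 reporting `_totalNewSegments` = 6 is consistent with the rest of the sample (12 existing, 6 deleted, 6 unchanged), and 6 segments at 1690546 bytes each matches `totalEstimatedDataToBeMovedInBytes` = 10143276. If that is the intended meaning, a name that says "after rebalance" would probably be less ambiguous.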