Re: [PR] Add a dry-run summary mode for TableRebalance which only returns a summary of the dry-run results [pinot]

via GitHub Sat, 15 Feb 2025 20:12:21 -0800


npawar commented on PR #15050:
URL: https://github.com/apache/pinot/pull/15050#issuecomment-2661233155


   > This PR adds a `summary` option to the TableRebalance API which is meant to be used with `dryRun`. If `summary` is set to `true` a summary of the dry-run is returned rather than the full dry-run. This summary gives some stats about changes that will occur during the rebalance, such as:
   > 
   > * Total number of segments to be moved
   > * Unique and total number of segments
   > * Replication factor
   > * Number of servers
   > * Server map to segment add/remove/unchanged information
   > * Average segment size
   > * Number of servers getting segments added
   > * Total data size being moved (calculated based on average segment size - based on TableSize)
   > * An estimate of how long the move can take (based on estimates to download and process each segment) - we should decide on a good throughput estimate as part of this PR review (left a TODO where the code is relevant)
   > 
   > Today the dry-run output can be very large, and can be difficult to make sense of in terms of the changes occurring. It can also be difficult to display the full output. For now `summary` is not appended to `dryRun` without `summary` enabled, but this can be added if it makes sense to add a summary even for the usual result return.
   > 
   > Note: This PR will have conflicts with #15029 and will need to be rebased once that is merged.
   > 
   > Sample JSON summary:
   > 
   > ```
   > {
   >   "totalSegmentsToBeMoved" : 6,
   >   "numServersGettingNewSegments" : 1,
   >   "estimatedAverageSegmentSizeInBytes" : 1690546,
   >   "totalEstimatedDataToBeMovedInBytes" : 10143276,
   >   "totalEstimatedTimeToMoveDataInSecs" : 0.09673381805419921,
   >   "numServers" : {
   >     "_existingValue" : 1,
   >     "_newValue" : 2
   >   },
   >   "replicationFactor" : {
   >     "_existingValue" : 1,
   >     "_newValue" : 1
   >   },
   >   "numUniqueSegments" : {
   >     "_existingValue" : 12,
   >     "_newValue" : 12
   >   },
   >   "numTotalSegments" : {
   >     "_existingValue" : 12,
   >     "_newValue" : 12
   >   },
   >   "serverSegmentChangeInfo" : {
   >     "Server_localhost_22004" : {
   >       "_totalNewSegments" : 6,
   >       "_totalExistingSegments" : 0,
   >       "_segmentsAdded" : 6,
   >       "_segmentsDeleted" : 0,
   >       "_segmentsUnchanged" : 0
   >     },
   >     "Server_localhost_22001" : {
   >       "_totalNewSegments" : 6,
   >       "_totalExistingSegments" : 12,
   >       "_segmentsAdded" : 0,
   >       "_segmentsDeleted" : 6,
   >       "_segmentsUnchanged" : 6
   >     }
   >   }
   > }
   > ```
   > 
   > cc @Jackie-Jiang @klsince @deepthi912 @npawar
   
   A few comments on the summary. Feel free to take only those that make sense 
for this first iteration:
   1. Along with `numServers`, it might be useful to list the existing and new 
servers, so the operator can confirm that their tagging / untagging took 
effect (e.g. servers added / removed / unchanged?).
   2. How about showing which tenant tags we're operating with? That is, 
summarize the tags from tenants, completed, tier, pools.
   3. What does `numUniqueSegments` mean? Same for `numTotalSegments`: I didn't 
follow what the existing/new values mean in the context of a rebalance. Perhaps 
a description field within the sections would help.
   4. In the server-to-stats map for 22001, if it started with 12 and 6 are 
moving, shouldn't `_totalNewSegments` be 0?
   5. This payload will get pretty extensive over time. I wonder if we should 
take the time to do some more top-level categorization: segment-related info, 
server-related info, generic info.
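
On item 4, one possible reading of the sample (an assumption, not something the PR description confirms) is that `_totalNewSegments` is the server's segment count after the rebalance and `_totalExistingSegments` is the count before it. A minimal sketch that checks this reading against the numbers in the sample JSON; the helper `is_consistent` is hypothetical and not part of Pinot:

```python
# Per-server numbers copied from the sample JSON summary quoted above.
sample = {
    "Server_localhost_22004": {
        "_totalNewSegments": 6,
        "_totalExistingSegments": 0,
        "_segmentsAdded": 6,
        "_segmentsDeleted": 0,
        "_segmentsUnchanged": 0,
    },
    "Server_localhost_22001": {
        "_totalNewSegments": 6,
        "_totalExistingSegments": 12,
        "_segmentsAdded": 0,
        "_segmentsDeleted": 6,
        "_segmentsUnchanged": 6,
    },
}


def is_consistent(info):
    """Check the ASSUMED reading of the fields (hypothetical helper):
    new total      = unchanged + added   (segments on the server after rebalance)
    existing total = unchanged + deleted (segments on the server before rebalance)
    """
    new_ok = info["_totalNewSegments"] == (
        info["_segmentsUnchanged"] + info["_segmentsAdded"]
    )
    existing_ok = info["_totalExistingSegments"] == (
        info["_segmentsUnchanged"] + info["_segmentsDeleted"]
    )
    return new_ok and existing_ok


for server, info in sample.items():
    print(server, is_consistent(info))
```

If that reading is the intended one, a description field or a name along the lines of "segments after rebalance" would probably avoid the confusion.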


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


