[PR] Consuming Rebalance Summary [pinot]

via GitHub Tue, 25 Mar 2025 13:05:01 -0700


J-HowHuang opened a new pull request, #15368:
URL: https://github.com/apache/pinot/pull/15368


   ## Description
   To include more info for consuming segments during table rebalance. It's 
important because users will likely to experience unexpected Kafka topic 
re-consumption. would be better to show relevant information so they can expect 
that.
   
   New info will be added under `rebalanceSummaryResult.segmentInfo`:
   ```
   "consumingSegmentSummary": {
         "numConsumingSegmentsToBeMoved": 5,
         "maxBytesConsumingSegmentsToCatchUp": 81,
         "bytesConsumingSegmentsToCatchUpPerServer": {
           "Server_<redacted>": 375
         }
       }
   ```
   
   ## Changes
   While doing rebalance summary, `TableRebalancer` will get 
`consumingSegmentInfo` of the table using `ConsumingSegmentInfoReader`, which 
is the same as the controller API `GET 
/tables/{tableName}/consumingSegmentsInfo` uses, to get the latest offset of 
each consuming segment w.r.t. its stream partition. It also gets segment ZK 
metadata of these consuming segments to acquire their `startOffset`. Then the 
size of each consuming segment is calculated and turned into the info in the 
summary.
   
   If any consuming segment failed to get `consumingSegmentInfo`, 
`maxBytesConsumingSegmentsToCatchUp` and 
`bytesConsumingSegmentsToCatchUpPerServer` will become `null`
   
   ## Tests
   With `pinot-admin.sh QuickStart -type STREAM` and set 2 servers. Start the 
cluster, and remove `DefaultTenant_REALTIME` tag for one server, then dry-run 
rebalance the table `airlineStats_REALTIME` (10-partition Kafka stream):
   ```
   {
     "jobId": "",
     "status": "DONE",
     "description": "Dry-run mode",
     "rebalanceSummaryResult": {
       "serverInfo": {
         ...
         }
       },
       "segmentInfo": {
         "totalSegmentsToBeMoved": 5,
         "maxSegmentsAddedToASingleServer": 5,
         "estimatedAverageSegmentSizeInBytes": 0,
         "totalEstimatedDataToBeMovedInBytes": 0,
         "replicationFactor": {
           "valueBeforeRebalance": 1,
           "expectedValueAfterRebalance": 1
         },
         "numSegmentsInSingleReplica": {
           "valueBeforeRebalance": 10,
           "expectedValueAfterRebalance": 10
         },
         "numSegmentsAcrossAllReplicas": {
           "valueBeforeRebalance": 10,
           "expectedValueAfterRebalance": 10
         },
         "consumingSegmentSummary": {
           "numConsumingSegmentsToBeMoved": 5,
           "maxBytesConsumingSegmentsToCatchUp": 81,
           "bytesConsumingSegmentsToCatchUpPerServer": {
             "Server_<redacted>": 375
           }
         }
       }
     },
     "instanceAssignment": {
       "CONSUMING": {
         ...
       }
     },
     "segmentAssignment": {
       ...
     }
   }
   ```
   Run `POST http://localhost:9000/tables/airlineStats_REALTIME/forceCommit`, 
dry-run rebalance again. Notice that total segments doubled (10 consuming 
segments were committed), and now there's no bytes to catch up during rebalance.
   ```
   {
     "jobId": "",
     "status": "DONE",
     "description": "Dry-run mode",
     "rebalanceSummaryResult": {
       "serverInfo": {
         ...
         }
       },
       "segmentInfo": {
         "totalSegmentsToBeMoved": 5,
         "maxSegmentsAddedToASingleServer": 5,
         "estimatedAverageSegmentSizeInBytes": 43048,
         "totalEstimatedDataToBeMovedInBytes": 215240,
         "replicationFactor": {
           "valueBeforeRebalance": 1,
           "expectedValueAfterRebalance": 1
         },
         "numSegmentsInSingleReplica": {
           "valueBeforeRebalance": 20,
           "expectedValueAfterRebalance": 20
         },
         "numSegmentsAcrossAllReplicas": {
           "valueBeforeRebalance": 20,
           "expectedValueAfterRebalance": 20
         },
         "consumingSegmentSummary": {
           "numConsumingSegmentsToBeMoved": 0,
           "maxBytesConsumingSegmentsToCatchUp": 0,
           "bytesConsumingSegmentsToCatchUpPerServer": {
             "Server_100.114.242.49_7050": 0
           }
         }
       }
     },
     "instanceAssignment": {
       "CONSUMING": {
         ...
       }
     },
     "segmentAssignment": {
       ...
     }
   }
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[PR] Consuming Rebalance Summary [pinot]

Reply via email to