shauryachats opened a new pull request, #18433:
URL: https://github.com/apache/pinot/pull/18433

   ## Summary
   
   Multi-stream realtime tables encode Pinot partition IDs as `streamIndex * 10000 + streamPartitionId`. Before this fix, `RealtimeSegmentAssignment` and `ReplicaGroupSegmentAssignmentStrategy` used the encoded partition ID directly when computing instance slots. This produced incorrect slot mappings and broke colocation of segments belonging to the same partition.
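
   To make the arithmetic concrete, here is a minimal sketch of the encoding and its effect on the slot computation. The class and method names are hypothetical and are not Pinot code. The decode step assumes stream partition IDs stay below 10000, so that a modulo by 10000 recovers them, and `numPartitions = 3` matches the config used in the testing section.

```java
public class PartitionIdSketch {
    // Multiplier taken from the formula in the summary.
    static final int PADDING = 10000;

    // Encoded Pinot partition ID for a partition of the given stream.
    static int encode(int streamIndex, int streamPartitionId) {
        return streamIndex * PADDING + streamPartitionId;
    }

    // Assumed inverse of encode(): valid while streamPartitionId < PADDING.
    static int decode(int pinotPartitionId) {
        return pinotPartitionId % PADDING;
    }

    public static void main(String[] args) {
        int numPartitions = 3;
        int fromStream0 = encode(0, 2); // 2
        int fromStream1 = encode(1, 2); // 10002

        // Using the encoded ID directly: the two streams land in different slots.
        System.out.println(fromStream0 % numPartitions); // 2
        System.out.println(fromStream1 % numPartitions); // 0

        // Decoding first: both streams agree on the slot for partition 2.
        System.out.println(decode(fromStream0) % numPartitions); // 2
        System.out.println(decode(fromStream1) % numPartitions); // 2
    }
}
```

   In this example, partition 2 of the second stream maps to slot 0 when the encoded ID is used and to slot 2 once it is decoded, which is the mismatch described above.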
   
   ### Changes:
     - In `RealtimeSegmentAssignment.assignConsumingSegment`, decode the Pinot partition ID to the stream-level partition ID via `IngestionConfigUtils.getStreamPartitionIdFromPinotPartitionId` before computing the instance index.
     - In `ReplicaGroupSegmentAssignmentStrategy`, extract a `getPartitionIdFromSegmentName` helper that applies the same decoding for REALTIME tables before `% numPartitions`, fixing both the single-segment assignment and rebalance paths.
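
   The shape of the second change can be sketched as follows. This is an illustration only, not the code in the PR: the class, the method names, and the boolean table-type flag are hypothetical, and the segment name layout `table__partitionId__sequence__creationTime` is an assumption about how realtime segment names carry the partition ID.

```java
public class SegmentPartitionSketch {
    // Multiplier from the encoding formula in the summary.
    static final int PADDING = 10000;

    // Assumed name layout: table__partitionId__sequence__creationTime.
    static int pinotPartitionIdOf(String segmentName) {
        String[] parts = segmentName.split("__");
        return Integer.parseInt(parts[1]);
    }

    // Decode only for REALTIME tables, then reduce modulo numPartitions.
    static int partitionSlot(String segmentName, boolean isRealtime, int numPartitions) {
        int partitionId = pinotPartitionIdOf(segmentName);
        if (isRealtime) {
            partitionId %= PADDING; // strip the stream index
        }
        return partitionId % numPartitions;
    }

    public static void main(String[] args) {
        // Same stream partition (2) from two different streams.
        String fromStream0 = "traces__2__0__20240101T0000Z";
        String fromStream1 = "traces__10002__0__20240101T0000Z";
        System.out.println(partitionSlot(fromStream0, true, 3));
        System.out.println(partitionSlot(fromStream1, true, 3));
    }
}
```

   Routing both the single-segment and rebalance paths through one helper of this kind means the decoding cannot be applied in one path and missed in the other.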
   
   ## Testing
   Deployed the change on an internal cluster with a multi-topic table and verified the assignment with the following `instanceAssignmentConfigMap`:
   ```
       "instanceAssignmentConfigMap": {
         "CONSUMING": {
           "tagPoolConfig": {
             "tag": "cluster_REALTIME",
             "poolBased": false,
             "numPools": 0
           },
           "replicaGroupPartitionConfig": {
             "replicaGroupBased": true,
             "numInstances": 0,
             "numReplicaGroups": 2,
             "numInstancesPerReplicaGroup": 3,
             "numPartitions": 3,
             "numInstancesPerPartition": 1,
             "minimizeDataMovement": true,
             "partitionColumn": "trace_id"
           },
           "partitionSelector": "INSTANCE_REPLICA_GROUP_PARTITION_SELECTOR",
           "minimizeDataMovement": false
         },
         "COMPLETED": {
           "tagPoolConfig": {
             "tag": "cluster_REALTIME",
             "poolBased": false,
             "numPools": 0
           },
           "replicaGroupPartitionConfig": {
             "replicaGroupBased": true,
             "numInstances": 0,
             "numReplicaGroups": 2,
             "numInstancesPerReplicaGroup": 3,
             "numPartitions": 3,
             "numInstancesPerPartition": 1,
             "minimizeDataMovement": true,
             "partitionColumn": "trace_id"
           },
           "partitionSelector": "INSTANCE_REPLICA_GROUP_PARTITION_SELECTOR",
           "minimizeDataMovement": false
         }
       },
   ```                                     
                                                                                
          


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

