tibrewalpratik17 commented on issue #12400:
URL: https://github.com/apache/pinot/issues/12400#issuecomment-2154558656

Hey @ankitsultana, I was looking into this.
   
Based on your example:

> Say we have two replicas of consuming segments: S0 and S1.
>
> Say the segments with the previous sequence id for this segment in the replicas are: P0 and P1.
>
> While a rebalance is going on, say P0 gets moved to the target server before P1, and between that time we had a record come to S0 which needed to be read from P0.
>
> If a segment commit happens before the consuming segments were moved, we will end up with S0, S1 having different data.
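
To make the divergence concrete, here's a toy model (plain Python, not Pinot code) of a partial-upsert merge. It assumes a simple "non-null incoming columns overwrite, null columns fall back to the previous row" strategy, and the names `partial_upsert_merge` and `Replica` are hypothetical. One replica can read the previous row from its copy of the older segment; the other can't, because mid-rebalance that segment isn't loaded on it yet. The two consuming segments then hold different rows for the same key.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

Row = Dict[str, Optional[int]]


def partial_upsert_merge(previous: Optional[Row], incoming: Row) -> Row:
    """Hypothetical merge: non-null incoming columns win, null incoming
    columns keep the previous row's value (if there is a previous row)."""
    merged: Row = dict(previous or {})
    for column, value in incoming.items():
        if value is not None or column not in merged:
            merged[column] = value
    return merged


@dataclass
class Replica:
    name: str
    # Latest row per primary key, as known from older committed segments.
    previous_segment: Dict[str, Row] = field(default_factory=dict)
    consuming_segment: Dict[str, Row] = field(default_factory=dict)

    def consume(self, key: str, incoming: Row) -> None:
        previous = self.previous_segment.get(key)
        self.consuming_segment[key] = partial_upsert_merge(previous, incoming)


committed = {"k1": {"clicks": 10, "views": 100}}
# S0 has its copy of the older segment loaded ...
s0 = Replica("S0", previous_segment=dict(committed))
# ... but mid-rebalance, S1's copy of the older segment is not loaded yet.
s1 = Replica("S1", previous_segment={})

update = {"clicks": 11, "views": None}  # partial update: only 'clicks' changed
s0.consume("k1", update)
s1.consume("k1", update)

print(s0.consuming_segment["k1"])  # {'clicks': 11, 'views': 100}
print(s1.consuming_segment["k1"])  # {'clicks': 11, 'views': None}
```

If S0 and S1 commit in this state, the committed copies of the same segment disagree, which is the scenario described above.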
   
According to [Pinot's official documentation](https://docs.pinot.apache.org/operators/operating-pinot/rebalance/rebalance-servers#rebalance-parameters), this is what happens when we rebalance with the `includeConsuming` option set to true:

> CONSUMING segments are rebalanced only if this is set to true.
> Moving a CONSUMING segment involves dropping the data consumed so far on old server, and re-consuming on the new server.
In the case of partial upsert, an entire partition is moved to another server, not just a few of its segments. As you mentioned, `allSegmentsLoaded` prevents consumption from starting on the new server until all of the partition's older segments are available there, ensuring the data is re-consumed correctly.
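
The gating behaviour I'm describing can be sketched like this. It's a toy model with hypothetical names (`PartitionConsumer`, `try_consume`), not Pinot's actual classes; it only shows the idea that a partition refuses to consume until every older segment of that partition is present locally.

```python
from typing import List, Set


class PartitionConsumer:
    """Toy allSegmentsLoaded-style gate (hypothetical, not a Pinot class):
    a partition starts consuming only after every older segment of that
    partition has been loaded on this server."""

    def __init__(self, partition: int, expected_segments: Set[str]) -> None:
        self.partition = partition
        self.expected_segments = set(expected_segments)
        self.loaded_segments: Set[str] = set()
        self.consumed: List[str] = []

    def all_segments_loaded(self) -> bool:
        # Subset check: every expected older segment is present locally.
        return self.expected_segments <= self.loaded_segments

    def on_segment_loaded(self, segment: str) -> None:
        if segment in self.expected_segments:
            self.loaded_segments.add(segment)

    def try_consume(self, record: str) -> bool:
        # While any older segment is missing, refuse the record; the caller
        # is expected to wait and retry rather than merge against
        # incomplete upsert state.
        if not self.all_segments_loaded():
            return False
        self.consumed.append(record)
        return True


consumer = PartitionConsumer(0, {"old_seg_1", "old_seg_2"})
consumer.on_segment_loaded("old_seg_1")
print(consumer.try_consume("r1"))  # False: old_seg_2 is not loaded yet
consumer.on_segment_loaded("old_seg_2")
print(consumer.try_consume("r1"))  # True
```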
   
Moreover, since a NoDowntime rebalance works at the replica level, we only move the next replica (using the same logic) once the first replica is back in a stable consuming state. If a segment commit happens during this process, it should not result in different data, because the `allSegmentsLoaded` condition prevents any replica from consuming before it has all of the partition's older segments.
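
Here's a rough sketch of that replica-by-replica ordering. It's a hypothetical, synchronous toy (`ReplicaState` and `rebalance_replica_by_replica` are made-up names, not Pinot's rebalancer), so the "wait for stable state" check never actually fires; it just illustrates the invariant that at most one replica is down at a time and that each moved replica resumes consuming only after its older segments are loaded.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ReplicaState:
    server: str
    consuming: bool = True  # True = consuming with all older segments loaded


def rebalance_replica_by_replica(
    replicas: List[ReplicaState], targets: List[str]
) -> Tuple[List[str], int]:
    """Toy no-downtime rebalance (hypothetical): move one replica at a time
    and return the event log plus the minimum number of replicas that were
    consuming at any point during the rebalance."""
    log: List[str] = []
    min_available = len(replicas)
    for replica, target in zip(replicas, targets):
        # Only start a move once every replica is back in a stable state.
        if not all(r.consuming for r in replicas):
            raise RuntimeError("previous replica has not stabilised yet")
        # Dropping the consumed data on the old server takes this replica down.
        replica.consuming = False
        min_available = min(min_available, sum(r.consuming for r in replicas))
        log.append(f"{replica.server} -> {target}: dropped consumed data")
        replica.server = target
        # Simplification: loading all older segments (the allSegmentsLoaded
        # condition) and re-consuming happen synchronously here.
        log.append(f"{target}: all older segments loaded, re-consuming")
        replica.consuming = True
    return log, min_available


replicas = [ReplicaState("server-1"), ReplicaState("server-2")]
log, min_available = rebalance_replica_by_replica(replicas, ["server-3", "server-4"])
print(min_available)                 # 1
print([r.server for r in replicas])  # ['server-3', 'server-4']
```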
   
That said, please let me know if there are any edge cases I might have missed. I haven't gone through the rebalance code in detail yet, so my understanding is based on the documentation and my theoretical reasoning rather than the implementation.

