tibrewalpratik17 commented on issue #13140:
URL: https://github.com/apache/pinot/issues/13140#issuecomment-2120131734

   One potential root cause might be that the `_partitionGroupConsumerSemaphore` is released before the segment actually goes `ONLINE`. This can allow consumption to start on the new segment prematurely.
   
   For example:
   
   - Segment S1 was in CONSUMING state and is now being moved to ONLINE.
   
   - Segment S2 is the new segment, which is waiting to acquire the `_partitionGroupConsumerSemaphore` -- https://github.com/apache/pinot/blob/a385e28c3d8f5175eaca621f59debc2d8f83ab56/pinot-core/src/main/java/org/apache/pinot/core/data/manager/realtime/RealtimeSegmentDataManager.java#L1563
   
   - S1 starts building the segment and releases `_partitionGroupConsumerSemaphore` (https://github.com/apache/pinot/blob/a385e28c3d8f5175eaca621f59debc2d8f83ab56/pinot-core/src/main/java/org/apache/pinot/core/data/manager/realtime/RealtimeSegmentDataManager.java#L964-L965), or it goes down the downloadSegmentAndReplace path, which again releases the semaphore first (https://github.com/apache/pinot/blob/a385e28c3d8f5175eaca621f59debc2d8f83ab56/pinot-core/src/main/java/org/apache/pinot/core/data/manager/realtime/RealtimeSegmentDataManager.java#L1320-L1322). In both scenarios, S2 starts consuming while S1 is still being built on another thread.
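   The ordering above can be sketched with a minimal standalone example. The class and method names here are hypothetical (this is not Pinot code); the point is only that releasing the partition-level semaphore before the build completes lets the next consumer through while the build is still in flight:

   ```java
   import java.util.ArrayList;
   import java.util.Collections;
   import java.util.List;
   import java.util.concurrent.CountDownLatch;
   import java.util.concurrent.Semaphore;

   // Sketch of the ordering problem: S1 releases the partition-level semaphore
   // before its segment build completes, so S2 acquires it and starts consuming
   // while S1 is still being built on another thread. Latches are used only to
   // make the interleaving deterministic for the demonstration.
   public class SemaphoreRaceSketch {

       public static List<String> run() throws InterruptedException {
           Semaphore partitionGroupConsumerSemaphore = new Semaphore(1);
           List<String> events = Collections.synchronizedList(new ArrayList<>());
           CountDownLatch s1Released = new CountDownLatch(1);
           CountDownLatch s2Consuming = new CountDownLatch(1);

           Thread s1 = new Thread(() -> {
               try {
                   partitionGroupConsumerSemaphore.acquire();
                   events.add("S1: consuming");
                   // Current behavior: semaphore released BEFORE the build
                   partitionGroupConsumerSemaphore.release();
                   s1Released.countDown();
                   s2Consuming.await();  // simulate a slow build overlapping with S2
                   events.add("S1: build finished");
               } catch (InterruptedException e) {
                   Thread.currentThread().interrupt();
               }
           });

           Thread s2 = new Thread(() -> {
               try {
                   s1Released.await();
                   partitionGroupConsumerSemaphore.acquire();  // succeeds immediately
                   events.add("S2: consuming started");
                   s2Consuming.countDown();
                   partitionGroupConsumerSemaphore.release();
               } catch (InterruptedException e) {
                   Thread.currentThread().interrupt();
               }
           });

           s1.start();
           s2.start();
           s1.join();
           s2.join();
           return events;
       }

       public static void main(String[] args) throws InterruptedException {
           // S2 starts consuming before S1's build has finished
           System.out.println(run());
       }
   }
   ```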
   
   - Now, in this particular issue, what we observed is that the `merge` function in S2 was treating the incoming record as a new record rather than picking up the changes that had already arrived for S1. This is likely because those records never landed in S1: its consumption was stopped before it caught up to the committing offset. Note that the ingestion throughput here is quite high (15k msgs/sec) and all the records for a given key arrive within a very small window of time. So if S1 did not catch up to the first offset of a given key K1 on one of the replicas, and that replica's segment starts getting replaced, S2 on the same replica will treat K1 as a new key and persist it. S1 will then report those keys as `not_replaced`, which is what we were seeing in our metrics as well.
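   To illustrate the resulting divergence, here is a toy merge sketch (hypothetical logic, not Pinot's actual partial-upsert implementation): the same update for K1 yields different rows depending on whether the replica's S1 ingested the base record before being replaced.

   ```java
   import java.util.HashMap;
   import java.util.Map;

   // Toy partial-upsert merge: overlay the incoming columns on the previous
   // record if one exists; otherwise the incoming record is treated as new.
   public class PartialUpsertDivergenceSketch {

       static Map<String, Object> merge(Map<String, Object> previous, Map<String, Object> incoming) {
           if (previous == null) {
               // No base record for this key: the partial update is persisted as a new row
               return new HashMap<>(incoming);
           }
           Map<String, Object> merged = new HashMap<>(previous);
           merged.putAll(incoming);  // overlay only the columns present in the update
           return merged;
       }

       public static void main(String[] args) {
           Map<String, Object> base = Map.of("key", "K1", "a", 1, "b", 2);
           Map<String, Object> update = Map.of("key", "K1", "b", 3);

           // Replica 1: S1 caught up, so the update is merged into the base record.
           System.out.println(merge(base, update));

           // Replica 2: S1 stopped before ingesting K1, so there is no previous
           // record; column "a" is lost and the replicas have diverged.
           System.out.println(merge(null, update));
       }
   }
   ```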
   
   Question to @Jackie-Jiang and @klsince: can we move the `closeStreamConsumers` call to after the consuming segment is built? Do you know why we went with this logic in the first place?
   
   Note: This does not affect realtime and full-upsert cases to a large extent, but it becomes very critical for partial-upsert cases, where replicas will diverge over time.

