tibrewalpratik17 opened a new issue, #14633:
URL: https://github.com/apache/pinot/issues/14633

   Creating this issue from comment: 
https://github.com/apache/pinot/pull/14477#discussion_r1870284658
   
   The current approach for determining the latest uploaded segment in the 
UploadedRealtimeSegment refresh process relies on comparing segment creation 
times. However, this method introduces an edge case where segments with 
identical creation times can lead to incorrect metadata resolution and 
inconsistencies in the compact-merge process.
   
   
   Specific Scenario:
   
   - Consider an uploaded segment U1 and LLC segments LLC1 and LLC2, which 
are merged into a new uploaded segment U2 by the UpsertCompactMerge task.
   
   - If the creation time for both U1 and U2 is identical, and the 
SegmentRefresh task refreshes U1 after U2 has been uploaded, the keys from U1 
will dominate in the metadata manager.
   
   - As a result, U1 is incorrectly marked as refreshed in ZooKeeper (ZK) for 
U2. The system mistakenly treats U1 as already merged, which prevents it from 
being picked again for the compact-merge process.
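
   The scenario above can be sketched in a few lines of Python. This is only 
an illustrative model, not Pinot code: the `Segment` class, the 
`dominant_by_creation_time` function, and the "ties go to the segment processed 
last" rule are all assumptions made for the example, chosen to match the 
behaviour described in the bullets.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    # Hypothetical stand-in for an uploaded segment's metadata.
    name: str
    creation_time_ms: int


def dominant_by_creation_time(current: Segment, incoming: Segment) -> Segment:
    # Assumed resolution rule: the incoming segment wins when its creation
    # time is greater than OR EQUAL to the current one, so ties are decided
    # purely by processing order.
    if incoming.creation_time_ms >= current.creation_time_ms:
        return incoming
    return current


# U1 and U2 share the same creation time (the edge case in this issue).
u1 = Segment("U1", 1_700_000_000_000)
u2 = Segment("U2", 1_700_000_000_000)

# Step 1: U2 (the compact-merge result) is uploaded and replaces U1.
after_upload = dominant_by_creation_time(u1, u2)

# Step 2: the refresh of U1 is processed after U2's upload.
# Because the creation times tie, U1 takes over again.
after_refresh = dominant_by_creation_time(after_upload, u1)

print(after_upload.name, after_refresh.name)
```

   Under these assumptions, U2 dominates after the upload, but the later 
refresh of U1 wins the tie and U1 dominates again, even though U2 is the more 
recent merge result.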
   
   This issue can lead to metadata inconsistency, where the latest segment (U2) 
does not dominate despite being the most recent merge result. Although there 
are no data consistency issues as such, U1 still needs to be merged in the 
future, and it will not be until U2 is deleted and its ZK metadata is cleared.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

