jihoonson commented on issue #8663: Kafka indexing service duplicate entry 
exception in druid_pendingSegments
URL: 
https://github.com/apache/incubator-druid/issues/8663#issuecomment-542881773
 
 
   Hmm, entries in the `druid_pendingSegments` table are never updated, but 
they should be reused when possible. Segment ID allocation works as 
follows:
   
   1) A task asks the overlord for a new segment ID, passing `datasource`, 
`interval`, and `sequenceName`. A segment ID is unique per `datasource`, 
`interval`, and `sequenceName`, and segment ID allocation is idempotent: if 
a task asks again with the same `datasource`, `interval`, and 
`sequenceName`, the overlord returns the same segment ID instead of 
creating a new one. The base `sequenceName` is generated as 
[here](https://github.com/apache/incubator-druid/blob/master/indexing-service/src/main/java/org/apache/druid/indexing/seekablestream/supervisor/SeekableStreamSupervisor.java#L1813-L1848),
 and each task appends a sequential number whenever an intermediate publish 
is triggered.
   2) The overlord first looks up the `druid_pendingSegments` table to see if 
there's a reusable segment ID.
   3) If not, it looks up both the `druid_segments` and 
`druid_pendingSegments` tables to find the next available segment ID. It 
does this by searching for the max partition id across all segments in the 
interval, both `published` and `unpublished`. The new segment gets the 
partition id `current max partition id + 1`.
   4) Once it finds the next segment ID, it inserts it into the metadata 
store. Otherwise, the segment ID allocation fails.
   
   Steps 2) through 4) should be done atomically. Based on what you see in 
the metadata store, do you think there could be a bug in any step?
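
To make the steps above concrete, here is a minimal in-memory sketch in Python. It is not Druid's implementation: `ToyOverlord`, `SegmentId`, and `allocate` are hypothetical names, a dict and a set stand in for the `druid_pendingSegments` and `druid_segments` tables, a lock stands in for whatever makes steps 2) through 4) atomic, and starting partition ids at 0 is an assumption. The failure path in step 4) is not modelled.

```python
import threading
from dataclasses import dataclass


@dataclass(frozen=True)
class SegmentId:
    """Hypothetical, simplified segment ID: real IDs carry more fields."""
    datasource: str
    interval: str
    partition: int


class ToyOverlord:
    """In-memory stand-in for the overlord plus metadata store (illustrative only)."""

    def __init__(self):
        self._lock = threading.Lock()
        # Models druid_pendingSegments: (datasource, interval, sequenceName) -> SegmentId
        self.pending = {}
        # Models druid_segments: the set of published SegmentIds
        self.published = set()

    def allocate(self, datasource, interval, sequence_name):
        key = (datasource, interval, sequence_name)
        with self._lock:  # steps 2) through 4) run as one atomic unit
            # Step 2: reuse an existing pending entry, which makes allocation idempotent.
            reusable = self.pending.get(key)
            if reusable is not None:
                return reusable

            # Step 3: max partition id over published and pending segments in the interval.
            candidates = list(self.pending.values()) + list(self.published)
            used = [
                s.partition
                for s in candidates
                if s.datasource == datasource and s.interval == interval
            ]
            # Assumption: the first partition in an empty interval is 0.
            next_partition = max(used, default=-1) + 1

            # Step 4: record the new ID as a pending segment.
            new_id = SegmentId(datasource, interval, next_partition)
            self.pending[key] = new_id
            return new_id


overlord = ToyOverlord()
interval = "2019-10-16/2019-10-17"
first = overlord.allocate("wiki", interval, "seq_0")
again = overlord.allocate("wiki", interval, "seq_0")  # same key: same ID comes back
other = overlord.allocate("wiki", interval, "seq_1")  # new key: next partition id
```

In this sketch, two requests with the same `sequenceName` can never produce two pending entries, because the lookup and the insert happen under the same lock. If the lookup and the insert could interleave across requests, that guarantee would no longer hold.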

