jihoonson commented on issue #8663: Kafka indexing service duplicate entry exception in druid_pendingSegments
URL: https://github.com/apache/incubator-druid/issues/8663#issuecomment-542881773

Hmm, the entries in the `druid_pendingSegments` table are never updated, but they should be reused if possible. Segment ID allocation works as follows:

1) A task asks the overlord for a new segment ID, passing `datasource`, `interval`, and `sequenceName`. A segment ID is unique per `datasource`, `interval`, and `sequenceName`, and segment ID allocation is idempotent. This means that if a task asks again with the same `datasource`, `interval`, and `sequenceName`, the overlord returns the same segment ID instead of creating a new one. The base `sequenceName` is generated [here](https://github.com/apache/incubator-druid/blob/master/indexing-service/src/main/java/org/apache/druid/indexing/seekablestream/supervisor/SeekableStreamSupervisor.java#L1813-L1848), and each task appends a sequential number to it whenever an intermediate publish is triggered.
2) The overlord first looks up the `druid_pendingSegments` table to see if there is a reusable segment ID.
3) If there is none, it looks up both the `druid_segments` and `druid_pendingSegments` tables to find the next available segment ID. It does this by searching for the max partition ID among all segments in the interval, both `published` and `unpublished`. The new segment gets the partition ID `current max partition id + 1`.
4) Once it finds the next segment ID, it inserts it into the metadata store. Otherwise, the segment ID allocation fails.

Steps `2) - 4)` should be done atomically. Based on what you see in the metadata store, do you think there could be a bug in any of these steps?
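To make the flow above concrete, here is a minimal in-memory sketch in Python. It is not Druid's actual code: `ToyAllocator`, `SegmentId`, and the dict/set stand-ins for the two metadata tables are hypothetical names for illustration, a lock stands in for the metadata-store transaction, and starting partition IDs at 0 for an empty interval is an assumption.

```python
import threading
from dataclasses import dataclass


@dataclass(frozen=True)
class SegmentId:
    """Simplified segment ID (hypothetical; the real ID has more fields)."""
    datasource: str
    interval: str
    partition: int


class ToyAllocator:
    """In-memory stand-in for the overlord's allocation path."""

    def __init__(self):
        self._lock = threading.Lock()
        # Stand-in for druid_pendingSegments: request key -> allocated ID.
        self.pending = {}
        # Stand-in for druid_segments: set of published IDs.
        self.published = set()

    def allocate(self, datasource, interval, sequence_name):
        # Step 1: the task's request is identified by these three values.
        key = (datasource, interval, sequence_name)
        with self._lock:  # steps 2-4 run as one atomic unit
            # Step 2: reuse an existing pending entry (idempotent allocation).
            existing = self.pending.get(key)
            if existing is not None:
                return existing
            # Step 3: max partition over published and pending segments
            # in the same datasource and interval.
            candidates = list(self.published) + list(self.pending.values())
            used = [
                s.partition
                for s in candidates
                if s.datasource == datasource and s.interval == interval
            ]
            # Assumption: an empty interval starts at partition 0.
            next_partition = max(used, default=-1) + 1
            # Step 4: record the new ID in the pending table.
            new_id = SegmentId(datasource, interval, next_partition)
            self.pending[key] = new_id
            return new_id
```

In this sketch, two calls with the same `sequence_name` return the same ID, while a different `sequence_name` in the same interval gets the next partition. If steps 2) to 4) ran without the lock, two concurrent requests could compute the same max partition and both try to insert the same ID, which is the kind of interleaving worth ruling out when looking at a duplicate entry error.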