amrishlal edited a comment on issue #7004: URL: https://github.com/apache/incubator-pinot/issues/7004#issuecomment-884523925
I am a bit confused by these two statements:

> When streaming from Kafka, Pinot currently lacks of a way to allow users to uniquely identify messages

and

> where offset > last_recorded_offset

In the first case, it seems you are looking for a globally unique identifier for each row. I assume this would involve something like a UUID generator that attaches a UUID to each row as it is ingested (?). In the second case, it seems you are looking for a "rowid" with the additional requirement that it be monotonically increasing and comparable.

I am not sure it is possible to do both with a reasonable amount of effort (i.e., generate a globally unique identifier that is monotonically increasing and hence also comparable across all rows of all segments), especially considering that we commonly replace segments, generate segments offline, and also perform update operations such as UPSERT. Unless I am missing something, maybe it could be done with a cluster-wide ID generation service in Pinot (?). The first (UUID generation) can probably be done now at ingestion time using an ingestion transform function (?). The second looks very difficult to implement and get right (?).

I think we need more clarity on what exactly is being implemented here:

1) a dynamically generated ROWID over the result set only (for supporting cursors),
2) a column that identifies each row with a globally unique identifier (useful for partitioning, indexing, etc.),
3) a ROWID generated for each row at row creation time that is globally unique and comparable across all rows and all segments, and that can be kept up to date across operations such as segment replacement, UPSERT, etc.?
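To make the distinction between the two requirements concrete, here is a minimal Python sketch. It is not Pinot code: `Row`, `rows_after`, and the `partition`/`offset` fields are hypothetical names used only for illustration. It assumes offsets are ordered only within a single partition, as Kafka offsets are, so a filter like `offset > last_recorded_offset` needs one checkpoint per partition, while a random UUID is unique but carries no ordering.

```python
import uuid
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Row:
    partition: int  # hypothetical stream partition number
    offset: int     # hypothetical position within that partition
    payload: str
    # Random UUID: unique per row, but not meaningfully comparable.
    row_uuid: str = field(default_factory=lambda: str(uuid.uuid4()))


def rows_after(rows, last_recorded):
    """Return rows past the checkpoint of their own partition.

    last_recorded maps partition -> last offset already consumed;
    partitions with no checkpoint are treated as fully unread.
    """
    return [r for r in rows if r.offset > last_recorded.get(r.partition, -1)]


rows = [
    Row(0, 0, "a"), Row(0, 1, "b"),
    Row(1, 0, "c"), Row(1, 1, "d"),
]

# Every row gets a distinct UUID, which covers the "uniquely identify" case.
assert len({r.row_uuid for r in rows}) == len(rows)

# Offsets repeat across partitions, so the checkpoint must be per partition.
new_rows = rows_after(rows, {0: 0, 1: 1})
print([r.payload for r in new_rows])  # prints ['b']
```

Even in this toy form, the comparable identifier is really the pair (partition, offset) rather than a single scalar, and nothing here addresses how such a value would survive segment replacement or UPSERT.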
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at: [email protected]
