amrishlal edited a comment on issue #7004:
URL: 
https://github.com/apache/incubator-pinot/issues/7004#issuecomment-884523925


   I am a bit confused by these two statements:
   > When streaming from Kafka, Pinot currently lacks of a way to allow users 
to uniquely identify messages
   
   and
   
   > where offset > last_recorded_offset
   
   It seems like in the first case, you are looking for a globally unique 
identifier for each row. I am assuming this would involve something like a UUID 
generator that attaches a UUID to each row as it is ingested (?) In the 
second case, it seems like you are looking for a "rowid" with the additional 
criterion that it should be monotonically increasing and comparable.
   
   I am not quite sure it is possible to do both with a reasonable amount of 
effort (i.e., generate a globally unique identifier that is monotonically 
increasing and hence also comparable across all rows of all segments), 
especially when one considers that we commonly replace segments, generate 
segments offline, and also do update operations such as UPSERT. Unless I am 
missing something, maybe it could be done with a cluster-wide id generation 
service in Pinot (?). The first (UUID generation) can probably be done now at 
ingestion time using an ingestion transform function (?). The second looks very 
difficult to implement and get right (?).
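
   To make the difference between the two requirements concrete, here is a 
minimal sketch in plain Python. It is not Pinot code: `RowId` and `rows_after` 
are hypothetical names, and the sketch assumes the id is built from the Kafka 
partition and offset, where offsets only increase within a single partition. A 
random UUID gives uniqueness but no useful ordering, while the offset-based id 
supports `offset > last_recorded_offset` only inside one partition.

```python
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class RowId:
    """Hypothetical row id: Kafka partition plus offset within that partition."""
    partition: int
    offset: int


def rows_after(rows, last_recorded):
    """Return rows from the same partition whose offset is greater than
    last_recorded.offset. Rows from other partitions are skipped because
    offsets from different partitions are not comparable."""
    return [
        r for r in rows
        if r.partition == last_recorded.partition
        and r.offset > last_recorded.offset
    ]


rows = [RowId(0, 10), RowId(0, 11), RowId(1, 3), RowId(0, 12)]

# Requirement 2: "where offset > last_recorded_offset", per partition only.
newer = rows_after(rows, RowId(0, 10))

# Requirement 1: a unique id per row, with no meaningful order between ids.
uuid_ids = [uuid.uuid4() for _ in rows]
```

   The sketch says nothing about segment replacement, offline segment 
generation, or UPSERT, which are the cases where keeping such an id stable 
looks hard.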
   
   I think we need more clarity on what exactly is being implemented here: 1) 
a dynamically generated ROWID over the result set only (for supporting 
cursors), 2) a column that will identify each row with a globally unique 
identifier (useful for partitioning, indexing, etc.), or 3) a ROWID generated 
for each row at row creation time that is globally unique and comparable across 
all rows and all segments and that can be kept up to date with operations such 
as segment replacement, UPSERT, etc.?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


