tibrewalpratik17 commented on code in PR #11811:
URL: https://github.com/apache/pinot/pull/11811#discussion_r1361317237


##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/indexsegment/mutable/MutableSegmentImpl.java:
##########
@@ -489,12 +489,16 @@ public boolean index(GenericRow row, @Nullable RowMetadata rowMetadata)
     if (isUpsertEnabled()) {
       RecordInfo recordInfo = getRecordInfo(row, numDocsIndexed);
       GenericRow updatedRow = _partitionUpsertMetadataManager.updateRecord(row, recordInfo);
-      updateDictionary(updatedRow);
-      addNewRow(numDocsIndexed, updatedRow);
-      // Update number of documents indexed before handling the upsert metadata so that the record becomes queryable
-      // once validated
-      canTakeMore = numDocsIndexed++ < _capacity;
-      _partitionUpsertMetadataManager.addRecord(this, recordInfo);
+      // if record doesn't need to be dropped, then persist in segment and update metadata hashmap
+      if (!_partitionUpsertMetadataManager.shouldDropRecord(recordInfo)) {

Review Comment:
   Umm, I think there can be a scenario like this:
   
   - `Key: A, Current Segment: S1, Valid Doc ID: 2`, and say the `addNewRow` call fails for this record.
   MetadataMap: A -> RecordLocation (S1, 2)
   - `Key: B, Current Segment: S1, Valid Doc ID: 2` (this also gets valid doc ID 2, because `_numDocsIndexed` was not incremented).
   MetadataMap: A -> RecordLocation (S1, 2), B -> RecordLocation (S1, 2)
   - Now say we receive a record for key A again some time later, while we are processing segment S2.
   In that scenario, we will invoke a removeDocID command for (S1, 2) here - 
https://github.com/apache/pinot/blob/51335a2db1720ac18da8818a4b565b557a0c667b/pinot-segment-local/src/main/java/org/apache/pinot/segment/local/upsert/ConcurrentMapPartitionUpsertMetadataManager.java#L264
   which will invalidate the doc ID that B points to, so B will stop being returned in queries.
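   
   A rough sketch of the sequence above, in case it helps. This is a toy model, not Pinot code: `DocIdCollisionSketch`, `Location`, `addRecord` and `isQueryable` are made-up names, and it assumes (as in the scenario) that the metadata entry for A is written even though its row write failed.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class DocIdCollisionSketch {
  // Hypothetical stand-in for a record location: (segment, docId).
  static final class Location {
    final String segment;
    final int docId;

    Location(String segment, int docId) {
      this.segment = segment;
      this.docId = docId;
    }
  }

  // Primary key -> latest known location (toy version of the metadata map).
  private final Map<String, Location> metadataMap = new HashMap<>();
  // Segment name -> doc IDs currently considered valid (queryable).
  private final Map<String, Set<Integer>> validDocIds = new HashMap<>();

  // Points the key at the new location and invalidates the key's previous location, if any.
  void addRecord(String key, String segment, int docId) {
    Location previous = metadataMap.put(key, new Location(segment, docId));
    if (previous != null) {
      validDocIds.get(previous.segment).remove(previous.docId);
    }
    validDocIds.computeIfAbsent(segment, s -> new HashSet<>()).add(docId);
  }

  // A key is queryable if the doc ID it points to is still valid in its segment.
  boolean isQueryable(String key) {
    Location loc = metadataMap.get(key);
    return loc != null && validDocIds.get(loc.segment).contains(loc.docId);
  }

  // Replays the scenario; returns {B queryable before A's second record, B queryable after}.
  static boolean[] run() {
    DocIdCollisionSketch manager = new DocIdCollisionSketch();
    int numDocsIndexed = 2;

    // Key A: metadata is recorded at (S1, 2), but the row write fails (assumption),
    // so the counter is NOT incremented.
    manager.addRecord("A", "S1", numDocsIndexed);

    // Key B: reuses doc ID 2 because the counter did not move.
    manager.addRecord("B", "S1", numDocsIndexed);
    numDocsIndexed++;
    boolean before = manager.isQueryable("B");

    // Later, key A shows up again while consuming S2: its old location (S1, 2) is invalidated.
    manager.addRecord("A", "S2", 0);
    boolean after = manager.isQueryable("B");

    return new boolean[] {before, after};
  }

  public static void main(String[] args) {
    boolean[] result = run();
    System.out.println("B queryable before: " + result[0] + ", after: " + result[1]);
  }
}
```

   In this toy model, B is queryable until A's second record arrives, and then it is not, even though B itself was never updated.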
   
   Did I miss anything? 😅 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
