Jackie-Jiang commented on code in PR #10915:
URL: https://github.com/apache/pinot/pull/10915#discussion_r1262144652
##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/upsert/BasePartitionUpsertMetadataManager.java:
##########
@@ -247,6 +267,17 @@ public void addSegment(ImmutableSegmentImpl segment,
@Nullable ThreadSafeMutable
Lock segmentLock = SegmentLocks.getSegmentLock(_tableNameWithType,
segmentName);
segmentLock.lock();
try {
+ // Skip adding segments that has segment EndTime in the comparison cols
earlier than (largestSeenTimestamp - TTL).
Review Comment:
(MAJOR) Just realized a problem. If we return here, the valid doc ids and
queryable doc ids won't be set in the segment. The correct solution should be:
- If `delete` is not enabled, set valid doc ids snapshot into the segment
- If `delete` is enabled, since we don't have snapshot for queryable doc
ids, we need to loop over valid doc ids to create the queryable doc ids (find
non-delete records)
Can you add a test to catch this problem? With current solution, after
restarting all the invalid docs will show up again
##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/upsert/BasePartitionUpsertMetadataManager.java:
##########
@@ -556,6 +659,32 @@ protected void finishOperation() {
}
}
+ /**
+ * When TTL is enabled for upsert, this function is used to remove expired
keys from the primary key indexes.
+ * Primary keys that has comparison value earlier than
largestSeenComparisonValue - TTL value will be removed.
+ */
+ @Override
+ public void removeExpiredPrimaryKeys() {
+ // If upsertTTL is enabled, we will remove expired primary keys from
upsertMetadata after taking snapshot.
+ if (_metadataTTL == 0) {
Review Comment:
To be more robust
```suggestion
if (_metadataTTL <= 0) {
```
##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/upsert/BasePartitionUpsertMetadataManager.java:
##########
@@ -367,6 +398,17 @@ public void replaceSegment(ImmutableSegment segment,
@Nullable ThreadSafeMutable
Lock segmentLock = SegmentLocks.getSegmentLock(_tableNameWithType,
segmentName);
segmentLock.lock();
try {
+ // Skip adding segments that has segment EndTime in the comparison cols
earlier than (largestSeenTimestamp - TTL).
+ // Note: We only update largestSeenComparisonValue when addRecord, and
access the value when addOrReplaceSegments.
+ // We only support single comparison column for TTL-enabled upsert
tables.
+ if (_largestSeenComparisonValue > 0) {
Review Comment:
We cannot skip replacing segment because the new segment doesn't have
snapshot, thus we don't know which docs are valid. Unfortunately I don't see a
good way to handle it if user is replacing a segment that is out of TTL. Or we
only allow user to upload segment without any invalid doc. We need to add more
comments explaining the logic.
Also, we can move this check outside of the segment lock
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]