EungsopYoo commented on code in PR #6557:
URL: https://github.com/apache/hbase/pull/6557#discussion_r1922964278


##########
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/querymatcher/NormalUserScanQueryMatcher.java:
##########
@@ -71,15 +84,42 @@ public MatchCode match(ExtendedCell cell) throws IOException {
       if (includeDeleteMarker) {
         this.deletes.add(cell);
       }
-      return MatchCode.SKIP;
+      // In some cases, optimization can not be done
+      if (!canOptimizeReadDeleteMarkers()) {
+        return MatchCode.SKIP;
+      }
     }
-    returnCode = checkDeleted(deletes, cell);
-    if (returnCode != null) {
+    // optimization when prevCell is Delete or DeleteFamilyVersion
+    if ((returnCode = checkDeletedEffectively(cell, prevCell)) != null) {
+      return returnCode;
+    }
+    if ((returnCode = checkDeleted(deletes, cell)) != null) {
       return returnCode;
     }
     return matchColumn(cell, timestamp, typeByte);
   }
 
+  // If prevCell is a delete marker and cell is a delete marked Put or delete marker,
+  // it means the cell is deleted effectively.
+  // And we can do SEEK_NEXT_COL.
+  private MatchCode checkDeletedEffectively(ExtendedCell cell, ExtendedCell prevCell) {
+    if (
+      prevCell != null && canOptimizeReadDeleteMarkers()
+        && CellUtil.matchingRowColumn(prevCell, cell) && CellUtil.matchingTimestamp(prevCell, cell)
+        && (PrivateCellUtil.isDeleteType(prevCell)
+          || PrivateCellUtil.isDeleteFamilyVersion(prevCell))
+    ) {
+      return MatchCode.SEEK_NEXT_COL;
+    }
+    return null;
+  }
+
+  private boolean canOptimizeReadDeleteMarkers() {
+    // for simplicity, optimization works only for these cases
+    return !seePastDeleteMarkers && scanMaxVersions == 1 && !visibilityLabelEnabled
+      && getFilter() == null && !(deletes instanceof NewVersionBehaviorTracker);
+  }

Review Comment:
   
https://issues.apache.org/jira/browse/HBASE-25972?focusedCommentId=17799356&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17799356
   I ran the tests described in the comment linked above, but some of the settings had to be changed:
   ```
   hbase.hstore.defaultengine.enable.dualfilewriter -> hbase.enable.historical.compaction.files
   alter "T1", {NAME =>"info0", KEEP_DELETED_CELLS => TRUE} -> alter "T1", {NAME =>"info0", KEEP_DELETED_CELLS => true}
   ```
   
   There were 4 observations.
   1. scan with dual file compaction enabled 7040ms
   2. scan with dual file compaction disabled 8539ms
   3. scan with delete markers and dual file compaction disabled 6.9936 seconds
   4. scan with delete markers and dual file compaction enabled 0.5660 seconds
   
   First, to establish a baseline, I ran the same test on the master branch, without this PR, on my Mac.
   1. scan with dual file compaction enabled 5605ms
   2. scan with dual file compaction disabled 5711ms
   3. scan with delete markers and dual file compaction disabled 3.7109 seconds
   4. scan with delete markers and dual file compaction enabled 0.3644 seconds
   
   Then I ran the same test on this PR branch.
   1. scan with dual file compaction enabled 5398ms
   2. scan with dual file compaction disabled 5476ms
   3. scan with delete markers and dual file compaction disabled 3.1572 seconds
   4. scan with delete markers and dual file compaction enabled 0.3701 seconds
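
To put the two runs side by side, the relative change per scenario can be computed from the numbers above. This is only a minimal sketch: the class and method names are hypothetical, and the two timings reported in seconds are converted to milliseconds so all four scenarios share one unit.

```java
import java.util.Locale;

final class ScanTimingComparison {
  /** Relative change of the PR timing against the baseline timing, in percent. */
  static double relativeChangePercent(double baselineMs, double prMs) {
    return (prMs - baselineMs) / baselineMs * 100.0;
  }

  public static void main(String[] args) {
    String[] scenarios = {
      "scan, dual file compaction enabled",
      "scan, dual file compaction disabled",
      "scan with delete markers, dual file compaction disabled",
      "scan with delete markers, dual file compaction enabled",
    };
    // Timings copied from the two runs above (master first, then the PR branch).
    double[] baselineMs = { 5605, 5711, 3710.9, 364.4 };
    double[] prMs = { 5398, 5476, 3157.2, 370.1 };

    for (int i = 0; i < scenarios.length; i++) {
      double change = relativeChangePercent(baselineMs[i], prMs[i]);
      System.out.printf(Locale.ROOT, "%-58s %+.1f%%%n", scenarios[i], change);
    }
  }
}
```

A negative value means the PR branch run was faster than the baseline run for that scenario.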
   
   The results are roughly the same as the baseline, because this PR skips the optimization when KEEP_DELETED_CELLS is set to true.
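
The guard can be illustrated with a standalone sketch that mirrors the condition in `canOptimizeReadDeleteMarkers()` from the diff above. It is not the HBase code: `MatcherState` is a hypothetical stand-in for the matcher's fields, `hasFilter` and `newVersionBehavior` replace the `getFilter() == null` and `instanceof NewVersionBehaviorTracker` checks, and it assumes (the diff does not show this) that `KEEP_DELETED_CELLS => true` results in `seePastDeleteMarkers` being true.

```java
final class DeleteMarkerGuardSketch {
  /** Hypothetical stand-in for the matcher state consulted by the guard. */
  static final class MatcherState {
    final boolean seePastDeleteMarkers;
    final int scanMaxVersions;
    final boolean visibilityLabelEnabled;
    final boolean hasFilter;
    final boolean newVersionBehavior;

    MatcherState(boolean seePastDeleteMarkers, int scanMaxVersions,
        boolean visibilityLabelEnabled, boolean hasFilter, boolean newVersionBehavior) {
      this.seePastDeleteMarkers = seePastDeleteMarkers;
      this.scanMaxVersions = scanMaxVersions;
      this.visibilityLabelEnabled = visibilityLabelEnabled;
      this.hasFilter = hasFilter;
      this.newVersionBehavior = newVersionBehavior;
    }
  }

  /** Same boolean condition as the guard in the diff, over the stand-in state. */
  static boolean canOptimizeReadDeleteMarkers(MatcherState s) {
    return !s.seePastDeleteMarkers && s.scanMaxVersions == 1 && !s.visibilityLabelEnabled
      && !s.hasFilter && !s.newVersionBehavior;
  }

  public static void main(String[] args) {
    // Assumption: KEEP_DELETED_CELLS => true maps to seePastDeleteMarkers == true.
    MatcherState keepDeletedCells = new MatcherState(true, 1, false, false, false);
    // Default-like scan: one version, no filter, no visibility labels.
    MatcherState plainScan = new MatcherState(false, 1, false, false, false);

    System.out.println(canOptimizeReadDeleteMarkers(keepDeletedCells)); // false
    System.out.println(canOptimizeReadDeleteMarkers(plainScan)); // true
  }
}
```

Under that assumption the guard returns false for the benchmark table, so the matcher takes the pre-existing `MatchCode.SKIP` path and the scan behaves as it does on master.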
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
