EungsopYoo commented on code in PR #6557:
URL: https://github.com/apache/hbase/pull/6557#discussion_r1926301945
##########
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/querymatcher/NormalUserScanQueryMatcher.java:
##########
@@ -71,15 +84,42 @@ public MatchCode match(ExtendedCell cell) throws IOException {
       if (includeDeleteMarker) {
         this.deletes.add(cell);
       }
-      return MatchCode.SKIP;
+      // In some cases, optimization can not be done
+      if (!canOptimizeReadDeleteMarkers()) {
+        return MatchCode.SKIP;
+      }
     }
-    returnCode = checkDeleted(deletes, cell);
-    if (returnCode != null) {
+    // optimization when prevCell is Delete or DeleteFamilyVersion
+    if ((returnCode = checkDeletedEffectively(cell, prevCell)) != null) {
+      return returnCode;
+    }
+    if ((returnCode = checkDeleted(deletes, cell)) != null) {
       return returnCode;
     }
     return matchColumn(cell, timestamp, typeByte);
   }
+
+  // If prevCell is a delete marker and cell is a delete marked Put or delete marker,
+  // it means the cell is deleted effectively.
+  // And we can do SEEK_NEXT_COL.
+  private MatchCode checkDeletedEffectively(ExtendedCell cell, ExtendedCell prevCell) {
+    if (
+      prevCell != null && canOptimizeReadDeleteMarkers()
+        && CellUtil.matchingRowColumn(prevCell, cell) && CellUtil.matchingTimestamp(prevCell, cell)
+        && (PrivateCellUtil.isDeleteType(prevCell)
+          || PrivateCellUtil.isDeleteFamilyVersion(prevCell))
+    ) {
+      return MatchCode.SEEK_NEXT_COL;
+    }
+    return null;
+  }
+
+  private boolean canOptimizeReadDeleteMarkers() {
+    // for simplicity, optimization works only for these cases
+    return !seePastDeleteMarkers && scanMaxVersions == 1 && !visibilityLabelEnabled
+      && getFilter() == null && !(deletes instanceof NewVersionBehaviorTracker);
+  }
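The short-circuit added here can be modeled outside HBase with plain records. The sketch below is illustrative only: `SimpleCell`, its fields, and the boolean delete-marker flag are stand-ins for HBase's `ExtendedCell` and KeyValue type codes, and the simplified predicate omits the `canOptimizeReadDeleteMarkers()` preconditions.

```java
import java.util.Objects;

public class DeleteMarkerSketch {
  // Simplified stand-in for an HBase cell: row/column/timestamp plus a marker flag.
  record SimpleCell(String row, String qualifier, long ts, boolean isDeleteMarker) {}

  enum MatchCode { SEEK_NEXT_COL, SKIP }

  // If the previous cell was a delete marker on the same row, column, and
  // timestamp, the current cell is effectively deleted and the matcher can
  // jump to the next column instead of skipping cell by cell.
  static MatchCode checkDeletedEffectively(SimpleCell cell, SimpleCell prev) {
    if (prev != null
        && prev.isDeleteMarker()
        && Objects.equals(prev.row(), cell.row())
        && Objects.equals(prev.qualifier(), cell.qualifier())
        && prev.ts() == cell.ts()) {
      return MatchCode.SEEK_NEXT_COL;
    }
    return null; // fall through to the normal delete-tracking path
  }

  public static void main(String[] args) {
    SimpleCell marker = new SimpleCell("row", "q1", 100L, true);
    SimpleCell maskedPut = new SimpleCell("row", "q1", 100L, false);
    SimpleCell otherColumn = new SimpleCell("row", "q2", 100L, false);
    System.out.println(checkDeletedEffectively(maskedPut, marker));   // SEEK_NEXT_COL
    System.out.println(checkDeletedEffectively(otherColumn, marker)); // null
  }
}
```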
Review Comment:
> To me, this improvement is only meaningful when the scanned data is in memstore, assuming that the skip list will be used for jumping from one column to the next (I have not looked at the code in detail recently, so I assume that is the case). However, when HBase scans data from an HFile, do we have data structures in place to jump from one column to the next? I think we do not. Not only do we linearly scan the cells within a row, we also linearly scan all rows within an HBase block, do we not? So I did not understand why skipping to the next column would be a significant optimization in general.
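One way to see why the optimization can still matter for StoreFile reads, even under a linear scan within a block: the saving is in the per-cell matcher and delete-tracker work that is avoided, not in the storage layout. The toy cost model below counts matcher invocations only; the class, method names, and the cell counts are assumptions for illustration, not HBase code or measured figures.

```java
public class SeekNextColSketch {

  // SKIP: every delete-marker version of every column is fed through the matcher.
  static long examinedWithSkip(long columns, long versionsPerColumn) {
    return columns * versionsPerColumn;
  }

  // SEEK_NEXT_COL: per column, the matcher examines the first delete marker and
  // the next cell that triggers the short-circuit, then reseeks past the rest.
  static long examinedWithSeekNextCol(long columns, long versionsPerColumn) {
    return columns * Math.min(versionsPerColumn, 2L);
  }

  public static void main(String[] args) {
    // Roughly the shape of the benchmark below: a single column accumulating
    // hundreds of thousands of delete markers (an assumed figure).
    long versions = 600_000L;
    System.out.println("SKIP examines:          " + examinedWithSkip(1, versions));
    System.out.println("SEEK_NEXT_COL examines: " + examinedWithSeekNextCol(1, versions));
  }
}
```

In addition, when the reseek target falls outside the current block, the scanner may be able to use the block index to avoid reading intervening blocks at all.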
https://issues.apache.org/jira/browse/HBASE-29039
The performance tests in the Jira description cover only reads from MemStore. So I ran new performance tests that read from StoreFiles only, with and without dual file compaction.
```
create 'test', 'c'
java_import org.apache.hadoop.hbase.client.Delete
java_import org.apache.hadoop.hbase.TableName
java_import java.lang.System
con = @hbase.instance_variable_get(:@connection)
table = con.getTable(TableName.valueOf('test'))
1000.times do |i|
  # batch 10000 deletes with different timestamps every 10 seconds
  now = System.currentTimeMillis()
  dels = 10000.times.map do |j|
    del = Delete.new(Bytes.toBytes('row'))
    # addFamily returns the Delete itself, so map collects the Delete objects
    del.addFamily(Bytes.toBytes('c'), now + j)
  end
  table.delete(dels)
  sleep(10)
  flush 'test'
  # Manually trigger a minor compaction, because compaction is not triggered
  # automatically here (I don't know why yet)
  compact 'test' if i % 10 == 0
  puts "i - #{i}"
  # Read after the flush, so the data comes from StoreFiles
  get 'test', 'row'
end
```
master - without dual file compaction
```
i - 0
COLUMN CELL
0 row(s)
Took 0.0184 seconds
...
i - 10
COLUMN CELL
0 row(s)
Took 0.0206 seconds
...
i - 20
COLUMN CELL
0 row(s)
Took 0.0398 seconds
...
i - 30
COLUMN CELL
0 row(s)
Took 0.0462 seconds
...
i - 40
COLUMN CELL
0 row(s)
Took 0.0517 seconds
...
i - 50
COLUMN CELL
0 row(s)
Took 0.0730 seconds
...
i - 60
COLUMN CELL
0 row(s)
Took 0.0903 seconds
```
master - with dual file compaction
```
i - 0
COLUMN CELL
0 row(s)
Took 0.0050 seconds
...
i - 10
COLUMN CELL
0 row(s)
Took 0.0175 seconds
...
i - 20
COLUMN CELL
0 row(s)
Took 0.0265 seconds
...
i - 30
COLUMN CELL
0 row(s)
Took 0.0310 seconds
...
i - 40
COLUMN CELL
0 row(s)
Took 0.0205 seconds
...
i - 50
COLUMN CELL
0 row(s)
Took 0.0532 seconds
...
i - 60
COLUMN CELL
0 row(s)
Took 0.0360 seconds
```
this PR - without dual file compaction
```
i - 0
COLUMN CELL
0 row(s)
Took 0.0061 seconds
...
i - 10
COLUMN CELL
0 row(s)
Took 0.0102 seconds
...
i - 20
COLUMN CELL
0 row(s)
Took 0.0052 seconds
...
i - 30
COLUMN CELL
0 row(s)
Took 0.0073 seconds
...
i - 40
COLUMN CELL
0 row(s)
Took 0.0077 seconds
...
i - 50
COLUMN CELL
0 row(s)
Took 0.0115 seconds
...
i - 60
COLUMN CELL
0 row(s)
Took 0.0101 seconds
```
this PR - with dual file compaction
```
i - 0
COLUMN CELL
0 row(s)
Took 0.0046 seconds
...
i - 10
COLUMN CELL
0 row(s)
Took 0.0103 seconds
...
i - 20
COLUMN CELL
0 row(s)
Took 0.0074 seconds
...
i - 30
COLUMN CELL
0 row(s)
Took 0.0067 seconds
...
i - 40
COLUMN CELL
0 row(s)
Took 0.0077 seconds
...
i - 50
COLUMN CELL
0 row(s)
Took 0.0091 seconds
i - 60
COLUMN CELL
0 row(s)
Took 0.0106 seconds
```
The results show that the optimization in this PR also works when reading from StoreFiles, even without dual file compaction. What do you think about these results?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]