EungsopYoo commented on code in PR #6557:
URL: https://github.com/apache/hbase/pull/6557#discussion_r1922964278
##########
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/querymatcher/NormalUserScanQueryMatcher.java:
##########
@@ -71,15 +84,42 @@ public MatchCode match(ExtendedCell cell) throws IOException {
if (includeDeleteMarker) {
this.deletes.add(cell);
}
- return MatchCode.SKIP;
+ // In some cases, optimization can not be done
+ if (!canOptimizeReadDeleteMarkers()) {
+ return MatchCode.SKIP;
+ }
}
- returnCode = checkDeleted(deletes, cell);
- if (returnCode != null) {
+ // optimization when prevCell is Delete or DeleteFamilyVersion
+ if ((returnCode = checkDeletedEffectively(cell, prevCell)) != null) {
+ return returnCode;
+ }
+ if ((returnCode = checkDeleted(deletes, cell)) != null) {
return returnCode;
}
return matchColumn(cell, timestamp, typeByte);
}
+ // If prevCell is a delete marker and cell is a delete marked Put or delete marker,
+ // it means the cell is deleted effectively.
+ // And we can do SEEK_NEXT_COL.
+ private MatchCode checkDeletedEffectively(ExtendedCell cell, ExtendedCell prevCell) {
+ if (
+ prevCell != null && canOptimizeReadDeleteMarkers()
+ && CellUtil.matchingRowColumn(prevCell, cell) && CellUtil.matchingTimestamp(prevCell, cell)
+ && (PrivateCellUtil.isDeleteType(prevCell)
+ || PrivateCellUtil.isDeleteFamilyVersion(prevCell))
+ ) {
+ return MatchCode.SEEK_NEXT_COL;
+ }
+ return null;
+ }
+
+ private boolean canOptimizeReadDeleteMarkers() {
+ // for simplicity, optimization works only for these cases
+ return !seePastDeleteMarkers && scanMaxVersions == 1 && !visibilityLabelEnabled
+ && getFilter() == null && !(deletes instanceof NewVersionBehaviorTracker);
+ }
Review Comment:
https://issues.apache.org/jira/browse/HBASE-25972?focusedCommentId=17799356&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17799356
I ran the tests described in the comment linked above, but some settings had
to be modified:
```
hbase.hstore.defaultengine.enable.dualfilewriter -> hbase.enable.historical.compaction.files
alter "T1", {NAME =>"info0", KEEP_DELETED_CELLS => TRUE} -> alter "T1", {NAME =>"info0", KEEP_DELETED_CELLS => true}
```
There were 4 observations in that comment:
1. scan with dual file compaction enabled 7040ms
2. scan with dual file compaction disabled 8539ms
3. scan with delete markers and dual file compaction disabled 6.9936 seconds
4. scan with delete markers and dual file compaction enabled 0.5660 seconds
First, to establish a baseline, I ran the same test on my Mac against the
master branch without this PR:
1. scan with dual file compaction enabled 5605ms
2. scan with dual file compaction disabled 5711ms
3. scan with delete markers and dual file compaction disabled 3.7109 seconds
4. scan with delete markers and dual file compaction enabled 0.3644 seconds
Then I ran the same test against the master branch with this PR:
1. scan with dual file compaction enabled 5398ms
2. scan with dual file compaction disabled 5476ms
3. scan with delete markers and dual file compaction disabled 3.1572 seconds
4. scan with delete markers and dual file compaction enabled 0.3701 seconds
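
To compare the two runs on my Mac, the relative change per observation can be computed with a small sketch like the one below. The class and method names are hypothetical and only for illustration; the timings are copied from the two lists above, with the values given in seconds converted to milliseconds. These are single runs, so small differences may simply be noise.

```java
public class ScanTimingComparison {
    // Relative change of the PR run versus the baseline run, in percent.
    // Negative means the PR run was faster.
    static double relativeChangePercent(double baselineMs, double withPrMs) {
        return (withPrMs - baselineMs) / baselineMs * 100.0;
    }

    public static void main(String[] args) {
        String[] labels = {
            "dual file compaction enabled",
            "dual file compaction disabled",
            "delete markers, dual file compaction disabled",
            "delete markers, dual file compaction enabled"
        };
        // Timings from the comment, all expressed in milliseconds.
        double[] baselineMs = {5605, 5711, 3710.9, 364.4};
        double[] withPrMs = {5398, 5476, 3157.2, 370.1};

        for (int i = 0; i < labels.length; i++) {
            System.out.printf("%s: %+.1f%%%n", labels[i],
                relativeChangePercent(baselineMs[i], withPrMs[i]));
        }
    }
}
```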
The results are essentially the same, because this PR skips the optimization
when KEEP_DELETED_CELLS is set to true.
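
To illustrate why the optimization does not kick in for this test, here is a minimal standalone sketch of the guard from the diff above. It is not the actual matcher code: the class name and boolean parameters are hypothetical stand-ins for the matcher's fields, and it assumes that `KEEP_DELETED_CELLS => true` results in `seePastDeleteMarkers` being true.

```java
public class ReadDeleteMarkerGuardSketch {
    /**
     * Simplified stand-in for canOptimizeReadDeleteMarkers() in the diff above.
     * The boolean parameters replace the matcher's fields and its
     * getFilter() / instanceof checks; they are illustrative only.
     */
    static boolean canOptimize(boolean seePastDeleteMarkers, int scanMaxVersions,
            boolean visibilityLabelEnabled, boolean hasFilter,
            boolean usesNewVersionBehaviorTracker) {
        return !seePastDeleteMarkers && scanMaxVersions == 1 && !visibilityLabelEnabled
                && !hasFilter && !usesNewVersionBehaviorTracker;
    }

    public static void main(String[] args) {
        // Assumption: KEEP_DELETED_CELLS => true corresponds to seePastDeleteMarkers == true.
        System.out.println("KEEP_DELETED_CELLS=true  -> optimize? "
                + canOptimize(true, 1, false, false, false));
        System.out.println("KEEP_DELETED_CELLS=false -> optimize? "
                + canOptimize(false, 1, false, false, false));
    }
}
```

With the guard returning false, `match` returns `MatchCode.SKIP` for delete markers as before, so the scan path in this test is the same with and without the PR.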