kadirozde commented on code in PR #6557:
URL: https://github.com/apache/hbase/pull/6557#discussion_r1925877479
##########
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/querymatcher/NormalUserScanQueryMatcher.java:
##########
@@ -71,15 +84,42 @@ public MatchCode match(ExtendedCell cell) throws
IOException {
if (includeDeleteMarker) {
this.deletes.add(cell);
}
- return MatchCode.SKIP;
+ // In some cases, optimization can not be done
+ if (!canOptimizeReadDeleteMarkers()) {
+ return MatchCode.SKIP;
+ }
}
- returnCode = checkDeleted(deletes, cell);
- if (returnCode != null) {
+ // optimization when prevCell is Delete or DeleteFamilyVersion
+ if ((returnCode = checkDeletedEffectively(cell, prevCell)) != null) {
+ return returnCode;
+ }
+ if ((returnCode = checkDeleted(deletes, cell)) != null) {
return returnCode;
}
return matchColumn(cell, timestamp, typeByte);
}
+ // If prevCell is a delete marker and cell is a delete marked Put or delete
marker,
+ // it means the cell is deleted effectively.
+ // And we can do SEEK_NEXT_COL.
+ private MatchCode checkDeletedEffectively(ExtendedCell cell, ExtendedCell
prevCell) {
+ if (
+ prevCell != null && canOptimizeReadDeleteMarkers()
+ && CellUtil.matchingRowColumn(prevCell, cell) &&
CellUtil.matchingTimestamp(prevCell, cell)
+ && (PrivateCellUtil.isDeleteType(prevCell)
+ || PrivateCellUtil.isDeleteFamilyVersion(prevCell))
+ ) {
+ return MatchCode.SEEK_NEXT_COL;
+ }
+ return null;
+ }
+
+ private boolean canOptimizeReadDeleteMarkers() {
+ // for simplicity, optimization works only for these cases
+ return !seePastDeleteMarkers && scanMaxVersions == 1 &&
!visibilityLabelEnabled
+ && getFilter() == null && !(deletes instanceof
NewVersionBehaviorTracker);
+ }
Review Comment:
To me, this improvement is only meaningful when the scanned data is in
memstore assuming that that the skip list will be used for jumping from one
column to the next (I have not looked at the code in detail recently so I
assume it is the case). However, when HBase scans data from HFile, do we have
data structures in place to jump from one column to next one? I think we do
not have one. No only we linearly scan the cells within a row, we also
linearly scan all rows within a HBase block, do not we? So I did not understand
why skipping to the next column would be a significant optimization in general.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]