EungsopYoo commented on code in PR #6557:
URL: https://github.com/apache/hbase/pull/6557#discussion_r1926301945
##########
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/querymatcher/NormalUserScanQueryMatcher.java:
##########
@@ -71,15 +84,42 @@ public MatchCode match(ExtendedCell cell) throws IOException {
       if (includeDeleteMarker) {
         this.deletes.add(cell);
       }
-      return MatchCode.SKIP;
+      // In some cases, optimization can not be done
+      if (!canOptimizeReadDeleteMarkers()) {
+        return MatchCode.SKIP;
+      }
     }
-    returnCode = checkDeleted(deletes, cell);
-    if (returnCode != null) {
+    // optimization when prevCell is Delete or DeleteFamilyVersion
+    if ((returnCode = checkDeletedEffectively(cell, prevCell)) != null) {
+      return returnCode;
+    }
+    if ((returnCode = checkDeleted(deletes, cell)) != null) {
       return returnCode;
     }
     return matchColumn(cell, timestamp, typeByte);
   }
+
+  // If prevCell is a delete marker and cell is a delete marked Put or delete marker,
+  // it means the cell is deleted effectively.
+  // And we can do SEEK_NEXT_COL.
+  private MatchCode checkDeletedEffectively(ExtendedCell cell, ExtendedCell prevCell) {
+    if (
+      prevCell != null && canOptimizeReadDeleteMarkers()
+        && CellUtil.matchingRowColumn(prevCell, cell) && CellUtil.matchingTimestamp(prevCell, cell)
+        && (PrivateCellUtil.isDeleteType(prevCell)
+          || PrivateCellUtil.isDeleteFamilyVersion(prevCell))
+    ) {
+      return MatchCode.SEEK_NEXT_COL;
+    }
+    return null;
+  }
+
+  private boolean canOptimizeReadDeleteMarkers() {
+    // for simplicity, optimization works only for these cases
+    return !seePastDeleteMarkers && scanMaxVersions == 1 && !visibilityLabelEnabled
+      && getFilter() == null && !(deletes instanceof NewVersionBehaviorTracker);
+  }
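The short-circuit added here can be modeled outside HBase with plain records. The sketch below is illustrative only: `SimpleCell`, its fields, and the boolean delete-marker flag are stand-ins for HBase's `ExtendedCell` and KeyValue type codes, and the simplified predicate omits the `canOptimizeReadDeleteMarkers()` preconditions.

```java
import java.util.Objects;

public class DeleteMarkerSketch {
  // Simplified stand-in for an HBase cell: row/column/timestamp plus a marker flag.
  record SimpleCell(String row, String qualifier, long ts, boolean isDeleteMarker) {}

  enum MatchCode { SEEK_NEXT_COL, SKIP }

  // If the previous cell was a delete marker on the same row, column, and
  // timestamp, the current cell is effectively deleted and the matcher can
  // jump to the next column instead of skipping cell by cell.
  static MatchCode checkDeletedEffectively(SimpleCell cell, SimpleCell prev) {
    if (prev != null
        && prev.isDeleteMarker()
        && Objects.equals(prev.row(), cell.row())
        && Objects.equals(prev.qualifier(), cell.qualifier())
        && prev.ts() == cell.ts()) {
      return MatchCode.SEEK_NEXT_COL;
    }
    return null; // fall through to the normal delete-tracking path
  }

  public static void main(String[] args) {
    SimpleCell marker = new SimpleCell("row", "q1", 100L, true);
    SimpleCell maskedPut = new SimpleCell("row", "q1", 100L, false);
    SimpleCell otherColumn = new SimpleCell("row", "q2", 100L, false);
    System.out.println(checkDeletedEffectively(maskedPut, marker));   // SEEK_NEXT_COL
    System.out.println(checkDeletedEffectively(otherColumn, marker)); // null
  }
}
```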
Review Comment:
> To me, this improvement is only meaningful when the scanned data is in memstore, assuming that the skip list will be used for jumping from one column to the next (I have not looked at the code in detail recently, so I assume that is the case). However, when HBase scans data from an HFile, do we have data structures in place to jump from one column to the next? I think we do not. Not only do we linearly scan the cells within a row, we also linearly scan all rows within an HBase block, do we not? So I did not understand why skipping to the next column would be a significant optimization in general.
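One way to see why the optimization can still matter for StoreFile reads, even under a linear scan within a block: the saving is in the per-cell matcher and delete-tracker work that is avoided, not in the storage layout. The toy cost model below counts matcher invocations only; the class, method names, and the cell counts are assumptions for illustration, not HBase code or measured figures.

```java
public class SeekNextColSketch {

  // SKIP: every delete-marker version of every column is fed through the matcher.
  static long examinedWithSkip(long columns, long versionsPerColumn) {
    return columns * versionsPerColumn;
  }

  // SEEK_NEXT_COL: per column, the matcher examines the first delete marker and
  // the next cell that triggers the short-circuit, then reseeks past the rest.
  static long examinedWithSeekNextCol(long columns, long versionsPerColumn) {
    return columns * Math.min(versionsPerColumn, 2L);
  }

  public static void main(String[] args) {
    // Roughly the shape of the benchmark below: a single column accumulating
    // hundreds of thousands of delete markers (an assumed figure).
    long versions = 600_000L;
    System.out.println("SKIP examines:          " + examinedWithSkip(1, versions));
    System.out.println("SEEK_NEXT_COL examines: " + examinedWithSeekNextCol(1, versions));
  }
}
```

In addition, when the reseek target falls outside the current block, the scanner may be able to use the block index to avoid reading intervening blocks at all.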
https://issues.apache.org/jira/browse/HBASE-29039
The performance tests in the Jira description cover only reads from MemStore. So I ran new performance tests that read from StoreFiles only, with and without dual file compaction.
```
create 'test', 'c'
java_import org.apache.hadoop.hbase.client.Delete
java_import org.apache.hadoop.hbase.TableName
java_import java.lang.System
con = @hbase.instance_variable_get(:@connection)
table = con.getTable(TableName.valueOf('test'))
1000.times do |i|
  # batch 10000 deletes with different timestamps every 10 seconds
  now = System.currentTimeMillis()
  dels = 10000.times.map do |j|
    del = Delete.new(Bytes.toBytes('row'))
    # addFamily returns the Delete itself, so map collects the Delete objects
    del.addFamily(Bytes.toBytes('c'), now + j)
  end
  table.delete(dels)
  sleep(10)
  flush 'test'
  # Manually trigger a minor compaction, because compaction is not triggered
  # automatically here (I don't know why yet)
  compact 'test' if i % 10 == 0
  puts "i - #{i}"
  # Read after the flush, so the data comes from StoreFiles
  get 'test', 'row'
end
```
master - without dual file compaction
```
i - 0
COLUMN CELL
0 row(s)
Took 0.0184 seconds
...
i - 10
COLUMN CELL
0 row(s)
Took 0.0206 seconds
...
i - 20
COLUMN CELL
0 row(s)
Took 0.0398 seconds
...
i - 30
COLUMN CELL
0 row(s)
Took 0.0462 seconds
...
i - 40
COLUMN CELL
0 row(s)
Took 0.0517 seconds
...
i - 50
COLUMN CELL
0 row(s)
Took 0.0730 seconds
...
i - 60
COLUMN CELL
0 row(s)
Took 0.0903 seconds
```
master - with dual file compaction
```
i - 0
COLUMN CELL
0 row(s)
Took 0.0050 seconds
...
i - 10
COLUMN CELL
0 row(s)
Took 0.0175 seconds
...
i - 20
COLUMN CELL
0 row(s)
Took 0.0265 seconds
...
i - 30
COLUMN CELL
0 row(s)
Took 0.0310 seconds
...
i - 40
COLUMN CELL
0 row(s)
Took 0.0205 seconds
...
i - 50
COLUMN CELL
0 row(s)
Took 0.0532 seconds
...
i - 60
COLUMN CELL
0 row(s)
Took 0.0360 seconds
```
this PR - without dual file compaction
```
i - 0
COLUMN CELL
0 row(s)
Took 0.0061 seconds
...
i - 10
COLUMN CELL
0 row(s)
Took 0.0102 seconds
...
i - 20
COLUMN CELL
0 row(s)
Took 0.0052 seconds
...
i - 30
COLUMN CELL
0 row(s)
Took 0.0073 seconds
...
i - 40
COLUMN CELL
0 row(s)
Took 0.0077 seconds
...
i - 50
COLUMN CELL
0 row(s)
Took 0.0115 seconds
...
i - 60
COLUMN CELL
0 row(s)
Took 0.0101 seconds
```
this PR - with dual file compaction
```
i - 0
COLUMN CELL
0 row(s)
Took 0.0046 seconds
...
i - 10
COLUMN CELL
0 row(s)
Took 0.0103 seconds
...
i - 20
COLUMN CELL
0 row(s)
Took 0.0074 seconds
...
i - 30
COLUMN CELL
0 row(s)
Took 0.0067 seconds
...
i - 40
COLUMN CELL
0 row(s)
Took 0.0077 seconds
...
i - 50
COLUMN CELL
0 row(s)
Took 0.0091 seconds
i - 60
COLUMN CELL
0 row(s)
Took 0.0106 seconds
```
The results show that the optimization in this PR also works when reading from StoreFiles, even without dual file compaction. What do you think about these results?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]