bbeaudreault commented on code in PR #4967:
URL: https://github.com/apache/hbase/pull/4967#discussion_r1073484237
##########
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java:
##########
@@ -612,6 +612,9 @@ public boolean next(List<Cell> outResult, ScannerContext
scannerContext) throws
// here, we still need to scan all the qualifiers before returning...
scannerContext.returnImmediately();
}
+
+ scannerContext.incrementBlockProgress(heap.getCurrentBlockSizeOnce());
Review Comment:
Correct. KeyValueHeap makes this tricky, this is why the API design is
strange. In the context of a single `StoreScanner.next` call, we can jump back
and forth between different blocks in different StoreFiles. This approach here
ensures that we always only count each block once.
I considered a few different approaches, and I felt this one was the
simplest:
1. Similar to RSRpcServices.addSize, try tracking blocks here by checking
`heap.getCurrentBlock() != lastBlock`. As mentioned above, this is error prone
(as the comments in addSize mention as well).
2. Try adding an `HFileReaderImpl.setBlockChangedCallback(newBlock ->
scannerContext.incrementBlockSize(newBlock))`. This would work, but opens up
issue with how StoreScanner can re-open StoreFileScanners due to flushes or
read type switches, etc. We'd need to update this callback in
RegionScannerImpl.nextInternal, which would need to delegate out to all
StoreScanners -> KeyValueHeaps -> StoreFileScanners.
3. Add a `HFileReaderImpl.getBlockChanged()` and
`HFileReader.getCurrentBlockSize()`. Then we could change this line to:
```java
if (heap.getBlockChanged()) {
scannerContext.incrementBlockProgress(heap.getCurrentBlockSize());
}
```
The implementation of HFileReaderImpl.getBlockChanged would be:
```java
public boolean getBlockChanged() {
if (blockChanged) {
blockChanged = false;
return true;
}
return false;
}
```
We'd reset blockChanged to true whenever we update curBlock in
HFileReaderImpl.
Option 3 would probably be the closest to what I have and most
accurate/simplest, and it might be more intuitive API as well. The only
downside would be 2 new interface methods instead of 1.
Would you like me to do Option 3?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]