Apache9 commented on PR #6557: URL: https://github.com/apache/hbase/pull/6557#issuecomment-2564724971
I checked the code. We do have logic to seek to the next row or column when we hit a delete family cell:

https://github.com/apache/hbase/blob/28c435378a95a59d6d34acce6b91524ed797afd3/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/querymatcher/ScanQueryMatcher.java#L207

But the problem is that we seem to return before we actually call this method, here:

https://github.com/apache/hbase/blob/28c435378a95a59d6d34acce6b91524ed797afd3/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/querymatcher/NormalUserScanQueryMatcher.java#L76

The code block at that location

```
if (PrivateCellUtil.isDelete(typeByte)) {
  boolean includeDeleteMarker =
    seePastDeleteMarkers ? tr.withinTimeRange(timestamp) : tr.withinOrAfterTimeRange(timestamp);
  if (includeDeleteMarker) {
    this.deletes.add(cell);
  }
  return MatchCode.SKIP;
}
```

seems incorrect: we always return MatchCode.SKIP when we get a delete marker...

I think the reason we did not find this before is that there is usually only one delete marker, so when we check the next cell we fall through, call the checkDeleted method, and seek to the next row or column. The scenario here is that we have a bunch of delete markers, so we visit every one of them instead of seeking to the next row or column, because each marker enters the code block above and returns MatchCode.SKIP.

I think we should try to optimize the logic of the above code block.

Thanks.
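To make the cost difference concrete, here is a minimal, self-contained toy model. It is not HBase code: the `Cell`, `Matcher`, `AlwaysSkipMatcher`, `SeekOnMarkerMatcher`, `scan`, and `sampleCells` names are all hypothetical, and the model ignores `seePastDeleteMarkers`, time ranges, and every other condition the real matcher handles. It assumes one row whose cells are sorted by qualifier and then newest timestamp first, so a column delete marker sorts ahead of every cell it covers. It only counts how many times the matcher is invoked when a column carries many delete markers.

```java
import java.util.ArrayList;
import java.util.List;

final class DeleteMarkerSeekSketch {
    enum MatchCode { INCLUDE, SKIP, SEEK_NEXT_COL }

    /** Toy cell (hypothetical); real HBase cells carry far more state. */
    static final class Cell {
        final String qualifier;
        final long timestamp;
        final boolean deleteColumn; // true for a column-level delete marker

        Cell(String qualifier, long timestamp, boolean deleteColumn) {
            this.qualifier = qualifier;
            this.timestamp = timestamp;
            this.deleteColumn = deleteColumn;
        }
    }

    interface Matcher { MatchCode match(Cell cell); }

    /** Same shape as the quoted block: track the marker, then always SKIP. */
    static final class AlwaysSkipMatcher implements Matcher {
        private String deletedQualifier; // column of the newest marker seen so far
        private long deleteTimestamp;

        @Override
        public MatchCode match(Cell cell) {
            if (cell.deleteColumn) {
                if (!cell.qualifier.equals(deletedQualifier)) {
                    deletedQualifier = cell.qualifier;
                    deleteTimestamp = cell.timestamp;
                }
                return MatchCode.SKIP;
            }
            // Stand-in for checkDeleted: only reached for non-marker cells.
            if (cell.qualifier.equals(deletedQualifier) && cell.timestamp <= deleteTimestamp) {
                return MatchCode.SEEK_NEXT_COL;
            }
            return MatchCode.INCLUDE;
        }
    }

    /** Assumed alternative: seek as soon as the first marker of a column is seen. */
    static final class SeekOnMarkerMatcher implements Matcher {
        @Override
        public MatchCode match(Cell cell) {
            return cell.deleteColumn ? MatchCode.SEEK_NEXT_COL : MatchCode.INCLUDE;
        }
    }

    /** Runs the matcher over sorted cells; returns the number of matcher calls. */
    static int scan(List<Cell> cells, Matcher matcher, List<Cell> out) {
        int calls = 0;
        int i = 0;
        while (i < cells.size()) {
            Cell cell = cells.get(i);
            calls++;
            MatchCode code = matcher.match(cell);
            if (code == MatchCode.SEEK_NEXT_COL) {
                // Simulate a seek: jump past every remaining cell of this column.
                while (i < cells.size() && cells.get(i).qualifier.equals(cell.qualifier)) {
                    i++;
                }
            } else {
                if (code == MatchCode.INCLUDE) {
                    out.add(cell);
                }
                i++;
            }
        }
        return calls;
    }

    /** n delete markers plus one covered put in q1, then one live put in q2. */
    static List<Cell> sampleCells(int n) {
        List<Cell> cells = new ArrayList<>();
        for (long ts = n; ts >= 1; ts--) {
            cells.add(new Cell("q1", ts, true));
        }
        cells.add(new Cell("q1", 0, false));
        cells.add(new Cell("q2", 5, false));
        return cells;
    }

    public static void main(String[] args) {
        List<Cell> cells = sampleCells(1000);
        List<Cell> a = new ArrayList<>();
        List<Cell> b = new ArrayList<>();
        System.out.println("always SKIP:    " + scan(cells, new AlwaysSkipMatcher(), a) + " matcher calls");
        System.out.println("seek on marker: " + scan(cells, new SeekOnMarkerMatcher(), b) + " matcher calls");
    }
}
```

With 1000 markers in `q1`, the always-SKIP matcher is called 1002 times and the seeking matcher twice, while both return the same single live cell from `q2`. Whether an early seek is safe in the real matcher depends on the marker type and scan options, which this sketch does not model.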
