Re: [PR] HBASE-29039 Optimize read performance for accumulated delete markers on the same row or cell [hbase]

via GitHub Sun, 29 Dec 2024 05:26:27 -0800


Apache9 commented on PR #6557:
URL: https://github.com/apache/hbase/pull/6557#issuecomment-2564724971


   I checked the code, and we do have logic to seek to the next row or column when we 
hit a delete family cell.
   
   
https://github.com/apache/hbase/blob/28c435378a95a59d6d34acce6b91524ed797afd3/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/querymatcher/ScanQueryMatcher.java#L207
   
   But the problem is that we seem to return early, before we actually 
call this method, here:
   
   
https://github.com/apache/hbase/blob/28c435378a95a59d6d34acce6b91524ed797afd3/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/querymatcher/NormalUserScanQueryMatcher.java#L76
   
   The above code block
   ```
       if (PrivateCellUtil.isDelete(typeByte)) {
         boolean includeDeleteMarker =
           seePastDeleteMarkers ? tr.withinTimeRange(timestamp) : tr.withinOrAfterTimeRange(timestamp);
         if (includeDeleteMarker) {
           this.deletes.add(cell);
         }
         return MatchCode.SKIP;
       }
   ```
   
   This seems incorrect: we always return MatchCode.SKIP when we get a delete 
marker...
   
   I think the reason we did not find this before is that usually there is only 
one delete marker, so when we check the next cell, we fall through and call 
the checkDeleted method, and then we seek to the next row or column.
   
   Here the scenario is that we have a bunch of delete markers, so we will 
see them all instead of seeking to the next row or column, since we always go 
into the code block above and return MatchCode.SKIP.
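
   To make the cost concrete, here is a toy sketch of the effect. It is not the
actual HBase code: ToyCell, match, cellsVisited and the seekOnDelete flag are
all hypothetical names, and it assumes every delete marker masks all older
cells in the same column, so seeking past it is safe. It only counts how many
cells the matcher has to look at for one column.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: none of these types are the real HBase classes.
class DeleteMarkerSkipSketch {
  enum MatchCode { SKIP, INCLUDE, SEEK_NEXT_COL }

  // Hypothetical cell: all cells are in one column, ordered newest first.
  static final class ToyCell {
    final boolean isDelete;
    ToyCell(boolean isDelete) { this.isDelete = isDelete; }
  }

  // One column holding `markers` delete markers followed by one older put.
  static List<ToyCell> column(int markers) {
    List<ToyCell> cells = new ArrayList<>();
    for (int i = 0; i < markers; i++) {
      cells.add(new ToyCell(true));
    }
    cells.add(new ToyCell(false));
    return cells;
  }

  // seekOnDelete=false mimics "always SKIP on a delete marker";
  // seekOnDelete=true is an assumed alternative that seeks right away.
  static MatchCode match(ToyCell cell, boolean[] sawDelete, boolean seekOnDelete) {
    if (cell.isDelete) {
      sawDelete[0] = true; // stands in for deletes.add(cell)
      return seekOnDelete ? MatchCode.SEEK_NEXT_COL : MatchCode.SKIP;
    }
    // Stands in for the checkDeleted path: a masked put triggers the seek.
    return sawDelete[0] ? MatchCode.SEEK_NEXT_COL : MatchCode.INCLUDE;
  }

  // Number of cells examined before the scan leaves this column.
  static int cellsVisited(int markers, boolean seekOnDelete) {
    boolean[] sawDelete = { false };
    int visited = 0;
    for (ToyCell cell : column(markers)) {
      visited++;
      if (match(cell, sawDelete, seekOnDelete) == MatchCode.SEEK_NEXT_COL) {
        break;
      }
    }
    return visited;
  }

  public static void main(String[] args) {
    System.out.println(cellsVisited(1, false));    // 2
    System.out.println(cellsVisited(1000, false)); // 1001
    System.out.println(cellsVisited(1000, true));  // 1
  }
}
```

   With a single marker the difference is one cell, which would explain why
this went unnoticed. With many markers the SKIP path visits every one of them
before the first put finally triggers the seek.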
   
   I think we should try to optimize the logic of the above code block.
   
   Thanks.


