[ https://issues.apache.org/jira/browse/HBASE-28043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17784899#comment-17784899 ]
Hudson commented on HBASE-28043:
--------------------------------

Results for branch branch-2
[build #921 on builds.a.o|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2/921/]: (x) *{color:red}-1 overall{color}*
----
details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2/921/General_20Nightly_20Build_20Report/]

(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2/921/JDK8_20Nightly_20Build_20Report_20_28Hadoop2_29/]

(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2/921/JDK8_20Nightly_20Build_20Report_20_28Hadoop3_29/]

(/) {color:green}+1 jdk11 hadoop3 checks{color}
-- For more information [see jdk11 report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2/921/JDK11_20Nightly_20Build_20Report_20_28Hadoop3_29/]

(/) {color:green}+1 source release artifact{color}
-- See build output for details.
(/) {color:green}+1 client integration test{color}

> Reduce seeks from beginning of block in StoreFileScanner.seekToPreviousRow
> --------------------------------------------------------------------------
>
> Key: HBASE-28043
> URL: https://issues.apache.org/jira/browse/HBASE-28043
> Project: HBase
> Issue Type: Improvement
> Reporter: Becker Ewing
> Assignee: Becker Ewing
> Priority: Major
> Fix For: 2.6.0, 3.0.0-beta-1
>
> Attachments: Current_SeekToPreviousRowBehavior.png,
> Proposed_SeekToPreviousRowBehavior.png
>
>
> Currently, for non-RIV1 DBE encodings, each call to
> [StoreFileScanner.seekToPreviousRow|https://github.com/apache/hbase/blob/89ca7f4ade84c84a246281c71898543b6161c099/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java#L493-L506]
> (a common operation in reverse scans) results in two seeks:
> # Seek from the beginning of the block to before the given row to find the
> prior row
> # Seek from the beginning of the block to the first cell of the prior row
> So if there are "N" rows in a block, a reverse scan through each row results
> in seeking past summation from i=1 to N (2(i-1)) rows.
>
> This is a particularly expensive operation for tall tables that have many
> rows in a block.
>
> By introducing a state variable "previousRow" to StoreFileScanner, I believe
> that we could modify the seeking algorithm to be:
> # Seek from the beginning of the block to before the given row to find the
> prior row
> # Seek from the beginning of the block to before the row that is before the
> row that was just seeked to (i.e. 2 rows back). _Save_ this as a hint for
> where the prior row is in "previousRow"
> # Reseek from "previousRow" (2 rows back from start) to 1 row back from
> start (to the actual previousRow)
> Then the rest of the calls where a "previousRow" is present, you just need to
> seek to the beginning of the block once instead of twice, i.e.
> # seek from the beginning of the block to right before the beginning of your
> "previousRow" marker. Save this as the new "previousRow" marker
> # Reseek to the next row (i.e. your previous "previousRow" marker)
>
> If there are "N" rows in a block, a reverse scan from row N to row 0 results
> in seeking past approximately summation from i=1 to N (i-1) rows i.e. 50%
> less than the current behavior.
>
> See the attached diagrams for the current and proposed behavior.
>
>

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
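
The two summations in the quoted description can be checked with a small counting sketch. This is only an illustration of the arithmetic, not HBase code: the class and method names (`SeekCostSketch`, `currentCost`, `proposedCost`) are hypothetical, and it ignores the short reseeks and the one-time extra seek that sets up the first "previousRow" hint.

```java
// Hypothetical cost model for the seek counts described in HBASE-28043.
// It only counts rows skipped when seeking from the beginning of a block.
final class SeekCostSketch {

    // Current behavior: reaching the i-th row involves two seeks from the
    // block start, each passing (i - 1) rows, i.e. sum over i of 2(i - 1).
    static long currentCost(int rowsInBlock) {
        long total = 0;
        for (int i = 1; i <= rowsInBlock; i++) {
            total += 2L * (i - 1);
        }
        return total;
    }

    // Proposed behavior (approximate): with a "previousRow" hint, only one
    // seek from the block start per row, i.e. sum over i of (i - 1).
    static long proposedCost(int rowsInBlock) {
        long total = 0;
        for (int i = 1; i <= rowsInBlock; i++) {
            total += (i - 1);
        }
        return total;
    }

    public static void main(String[] args) {
        int n = 1000; // assumed number of rows in one block
        System.out.println("current:  " + currentCost(n));
        System.out.println("proposed: " + proposedCost(n));
    }
}
```

In closed form the sums are N(N-1) and N(N-1)/2, so under this model the proposed scheme skips half as many rows, which matches the "50% less" estimate in the description.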