[ https://issues.apache.org/jira/browse/HBASE-29254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Duo Zhang resolved HBASE-29254.
-------------------------------
    Hadoop Flags: Reviewed
      Resolution: Fixed

> StoreScanner returns incorrect row after flush due to topChanged behavior
> -------------------------------------------------------------------------
>
>                 Key: HBASE-29254
>                 URL: https://issues.apache.org/jira/browse/HBASE-29254
>             Project: HBase
>          Issue Type: Bug
>          Components: Scanners
>            Reporter: Minwoo Kang
>            Assignee: Minwoo Kang
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 2.7.0, 3.0.0-beta-2, 2.6.3, 2.5.12
>
>
> Let’s assume the data stored in HBase is as follows:
> (1) row0/family2:qf1/DeleteColumn
> (2) row0/family2:qf1/Put/value2
> (3) row1/family1:qf1/Put/value2
> (4) row1/family2:qf1/Put/value2
> Now, suppose a user starts scanning from {*}row0{*}.
>
> In [RegionScannerImpl#nextInternal|https://github.com/apache/hbase/blob/0b3c17302843d1f4d6f3c6b458f837cb9c274510/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/RegionScannerImpl.java#L415],
> when the [current cell|https://github.com/apache/hbase/blob/0b3c17302843d1f4d6f3c6b458f837cb9c274510/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/RegionScannerImpl.java#L446]’s
> row is row0 and StoreScanner has just read entry (2), a flush causes a topChanged
> (StoreScanner.peek() is changed where before ...), and StoreScanner’s heap.peek()
> becomes (4) row1/family2:qf1/Put/value2.
> Since (4) belongs to the next row, StoreScanner should return at that point. However, because
> [outResult|https://github.com/apache/hbase/blob/0b3c17302843d1f4d6f3c6b458f837cb9c274510/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java#L835]
> is empty (entry (1) deletes (2), so nothing from row0 has been added to it), StoreScanner fails
> to recognize that it has moved to the next row and ends up including (4) in the result.
> Then RegionScannerImpl sees that nextKv’s row differs from the current cell’s row and
> returns, since it has moved to a different row.
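A heavily simplified model of the row-boundary decision described above may make the failure easier to see. This is a sketch, not HBase code: `TopChangedSketch`, `Cell`, `rowChangedBuggy`, `rowChangedGuarded` and `continueAfterTopChanged` are hypothetical names, delete handling and the ScanQueryMatcher are left out entirely, and the "guarded" variant only illustrates one way to avoid depending on outResult. It is not necessarily what the committed fix does.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, heavily simplified model of the topChanged path in the report.
// None of these types or methods are HBase APIs; delete handling is not modelled.
final class TopChangedSketch {

    // Simplified cell in the report's notation: row/family:qualifier/type.
    record Cell(String row, String family, String qualifier, String type) {
        @Override
        public String toString() {
            return row + "/" + family + ":" + qualifier + "/" + type;
        }
    }

    // Decision as the report describes it: a row change is only noticed by comparing
    // against a cell already in outResult, so with an empty outResult it never fires.
    static boolean rowChangedBuggy(List<Cell> outResult, Cell newTop) {
        return !outResult.isEmpty() && !outResult.get(0).row().equals(newTop.row());
    }

    // Hypothetical guard: compare against the row this next() call is scanning instead.
    static boolean rowChangedGuarded(String rowBeingScanned, Cell newTop) {
        return !rowBeingScanned.equals(newTop.row());
    }

    // Keep filling outResult from the re-seeked heap after a flush (topChanged),
    // stopping at the first cell that is detected as belonging to another row.
    static List<Cell> continueAfterTopChanged(String rowBeingScanned, List<Cell> outResult,
                                              List<Cell> heapAfterFlush, boolean guarded) {
        List<Cell> out = new ArrayList<>(outResult);
        for (Cell top : heapAfterFlush) {
            boolean changed = guarded
                    ? rowChangedGuarded(rowBeingScanned, top)
                    : rowChangedBuggy(out, top);
            if (changed) {
                break; // next row reached: return what we have for the current row
            }
            out.add(top);
        }
        return out;
    }

    public static void main(String[] args) {
        // Entry (1) masks (2), so nothing for row0 is in outResult yet.
        List<Cell> outResult = new ArrayList<>();
        // After the flush, heap.peek() is entry (4).
        List<Cell> heapAfterFlush = List.of(new Cell("row1", "family2", "qf1", "Put"));

        System.out.println(continueAfterTopChanged("row0", outResult, heapAfterFlush, false)); // prints [row1/family2:qf1/Put]
        System.out.println(continueAfterTopChanged("row0", outResult, heapAfterFlush, true));  // prints []
    }
}
```

In the first call the row1 cell leaks into the result for row0, which is the behaviour the report describes; in the second call the comparison does not depend on whether anything has been collected yet.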
>
> As a result, even though (3) and (4) belong to the same row (row1), they are
> returned to the client as if they were from different rows.
> (3) and (4) should be combined into a single
> [Result|https://github.com/apache/hbase/blob/0b3c17302843d1f4d6f3c6b458f837cb9c274510/hbase-client/src/main/java/org/apache/hadoop/hbase/client/Result.java],
> but instead they are returned as separate Result instances.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
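A plain scan (without Scan#setBatch and without Scan#setAllowPartialResults(true)) is normally expected to deliver at most one Result per row, so the symptom above can be detected on the client side by looking for consecutive Results with the same row key. The sketch below is a hypothetical check: `SplitRowCheck`, `SimpleCell` and `splitRows` are stand-ins for HBase's Result and Cell so that it runs without an HBase dependency, and the order of the two split Results in `main` is chosen for illustration only.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical client-side check; plain Java types stand in for HBase's Result and Cell.
final class SplitRowCheck {

    record SimpleCell(String row, String family, String qualifier, String value) {}

    // Each inner list is the set of cells delivered in one Result.
    // Returns the row keys whose Result has the same row as the preceding Result,
    // i.e. rows that were split across consecutive Results.
    static List<String> splitRows(List<List<SimpleCell>> results) {
        List<String> split = new ArrayList<>();
        String previousRow = null;
        for (List<SimpleCell> result : results) {
            if (result.isEmpty()) {
                continue; // nothing to compare
            }
            String row = result.get(0).row(); // a Result holds cells of a single row
            if (row.equals(previousRow) && !split.contains(row)) {
                split.add(row);
            }
            previousRow = row;
        }
        return split;
    }

    public static void main(String[] args) {
        SimpleCell c3 = new SimpleCell("row1", "family1", "qf1", "value2");
        SimpleCell c4 = new SimpleCell("row1", "family2", "qf1", "value2");
        // Symptom from the report: (3) and (4) arrive as two separate Results.
        List<List<SimpleCell>> observed = List.of(List.of(c4), List.of(c3));
        // Expected behaviour: one Result for row1 holding both cells.
        List<List<SimpleCell>> expected = List.of(List.of(c3, c4));

        System.out.println(splitRows(observed)); // prints [row1]
        System.out.println(splitRows(expected)); // prints []
    }
}
```

A check like this is only meaningful when batching and partial results are disabled, since both features legitimately return several Results for one row.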