[ 
https://issues.apache.org/jira/browse/HBASE-29254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang resolved HBASE-29254.
-------------------------------
    Hadoop Flags: Reviewed
      Resolution: Fixed

> StoreScanner returns incorrect row after flush due to topChanged behavior
> -------------------------------------------------------------------------
>
>                 Key: HBASE-29254
>                 URL: https://issues.apache.org/jira/browse/HBASE-29254
>             Project: HBase
>          Issue Type: Bug
>          Components: Scanners
>            Reporter: Minwoo Kang
>            Assignee: Minwoo Kang
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 2.7.0, 3.0.0-beta-2, 2.6.3, 2.5.12
>
>
> Let’s assume the data stored in HBase is as follows:
> (1) row0/family2:qf1/DeleteColumn
> (2) row0/family2:qf1/Put/value2
> (3) row1/family1:qf1/Put/value2
> (4) row1/family2:qf1/Put/value2
> Now, suppose a user starts scanning from row0.
> In 
> [RegionScannerImpl#nextInternal|https://github.com/apache/hbase/blob/0b3c17302843d1f4d6f3c6b458f837cb9c274510/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/RegionScannerImpl.java#L415],
>  when the [current 
> cell|https://github.com/apache/hbase/blob/0b3c17302843d1f4d6f3c6b458f837cb9c274510/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/RegionScannerImpl.java#L446]’s
>  row is row0, suppose a flush happens after StoreScanner has read entry (2). A 
> topChanged then occurs (Storescanner.peek() is changed where before ...), and 
> StoreScanner’s heap.peek() becomes (4) row1/family2:qf1/Put/value2.
> Since (4) belongs to the next row, StoreScanner should return at that point, but it 
> fails to recognize that it has moved to the next row because 
> [outResult|https://github.com/apache/hbase/blob/0b3c17302843d1f4d6f3c6b458f837cb9c274510/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java#L835]
>  is empty, and ends up including the new row in the result.
> Then RegionScannerImpl sees that nextKv’s row differs from the current 
> cell’s row (row0) and returns, since the scan has moved to a different row.
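
The StoreScanner step described above can be illustrated with a toy model. This is only a sketch, not HBase code: the class TopChangedToy, its Cell type, and the next() signature are hypothetical, the flush is simulated by an index, and entry (2) is assumed to be masked by the DeleteColumn in (1) so that outResult stays empty as the report states. The returnOnRowChange flag shows one possible stricter check and is not claimed to be the actual patch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class TopChangedToy {
    /** Hypothetical minimal cell: row key, column, and whether the scan would emit it. */
    public static final class Cell {
        final String row;
        final String column;
        final boolean emitted;

        public Cell(String row, String column, boolean emitted) {
            this.row = row;
            this.column = column;
            this.emitted = emitted;
        }
    }

    /**
     * Toy version of one store-level next() call that starts on startRow.
     * flushAfterIndex: index of the cell after which a simulated flush changes
     * the top of the heap (-1 means no flush).
     * returnOnRowChange: false models the reported behavior (return after a top
     * change only if outResult already holds cells); true returns whenever the
     * new top belongs to a different row.
     */
    public static List<String> next(List<Cell> cells, String startRow,
                                    int flushAfterIndex, boolean returnOnRowChange) {
        List<String> outResult = new ArrayList<>();
        String matcherRow = startRow;
        for (int i = 0; i < cells.size(); i++) {
            Cell cell = cells.get(i);
            if (!cell.row.equals(matcherRow)) {
                break; // ordinary row boundary: stop before the next row
            }
            if (cell.emitted) {
                outResult.add(cell.row + "/" + cell.column);
            }
            boolean flushed = (i == flushAfterIndex) && (i + 1 < cells.size());
            if (flushed) {
                String newTopRow = cells.get(i + 1).row;
                boolean rowChanged = !newTopRow.equals(matcherRow);
                if (rowChanged) {
                    if (!outResult.isEmpty() || returnOnRowChange) {
                        return outResult;
                    }
                    // Toy assumption: with an empty outResult the scan
                    // silently continues on the new row.
                    matcherRow = newTopRow;
                }
            }
        }
        return outResult;
    }

    public static void main(String[] args) {
        List<Cell> family2 = Arrays.asList(
            new Cell("row0", "family2:qf1", false),  // (1) DeleteColumn marker
            new Cell("row0", "family2:qf1", false),  // (2) Put, assumed masked by (1)
            new Cell("row1", "family2:qf1", true));  // (4) Put
        System.out.println(next(family2, "row0", 1, false));  // [row1/family2:qf1]
        System.out.println(next(family2, "row0", 1, true));   // []
        System.out.println(next(family2, "row0", -1, false)); // []
    }
}
```

In the first call, a next() that started on row0 hands back a row1 cell, which is the situation the description attributes to the empty outResult.
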
> As a result, even though (3) and (4) belong to the same row (row1), they are 
> returned to the client as separate Result instances instead of being combined into a single 
> [Result|https://github.com/apache/hbase/blob/0b3c17302843d1f4d6f3c6b458f837cb9c274510/hbase-client/src/main/java/org/apache/hadoop/hbase/client/Result.java].
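
The symptom can be checked for on the client side. The following is a minimal sketch with no HBase dependency: SplitRowCheck, its Cell type, and findSplitRows are hypothetical names, and each inner list stands in for the cells of one returned Result. The order in which (3) and (4) arrive is not stated in the report, so the "observed" ordering below is just one possibility.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SplitRowCheck {
    /** Hypothetical stand-in for a cell: only the row key and column matter here. */
    public static final class Cell {
        final String row;
        final String column;

        public Cell(String row, String column) {
            this.row = row;
            this.column = column;
        }
    }

    /**
     * Returns the row keys that start a batch although the previous
     * non-empty batch ended on the same row, i.e. rows split across batches.
     */
    public static List<String> findSplitRows(List<List<Cell>> batches) {
        List<String> split = new ArrayList<>();
        String lastRow = null;
        for (List<Cell> batch : batches) {
            if (batch.isEmpty()) {
                continue;
            }
            String firstRow = batch.get(0).row;
            if (firstRow.equals(lastRow) && !split.contains(firstRow)) {
                split.add(firstRow);
            }
            lastRow = batch.get(batch.size() - 1).row;
        }
        return split;
    }

    public static void main(String[] args) {
        Cell c3 = new Cell("row1", "family1:qf1"); // entry (3)
        Cell c4 = new Cell("row1", "family2:qf1"); // entry (4)

        // Expected: both row1 cells in a single batch.
        List<List<Cell>> expected = Arrays.asList(Arrays.asList(c3, c4));
        // Reported symptom: the same row spread over two batches.
        List<List<Cell>> observed = Arrays.asList(Arrays.asList(c4), Arrays.asList(c3));

        System.out.println(findSplitRows(expected)); // []
        System.out.println(findSplitRows(observed)); // [row1]
    }
}
```

A check like this is only meaningful for scans that are expected to return whole rows. Scans that use batching or allow partial results can legitimately split a row across Result instances.
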



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
