[jira] [Commented] (HBASE-15097) When the scan operation covered two regions,sometimes the final results have duplicated rows.

chenrongwei (JIRA) Sat, 23 Jan 2016 06:03:04 -0800

    [ 
https://issues.apache.org/jira/browse/HBASE-15097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15113755#comment-15113755
 ]


chenrongwei commented on HBASE-15097:
-------------------------------------

I don't think that can happen. When no stop row has been set, there are two cases:
1: the table has only one region, (null,null)
2: the table has more than one region, such as 
(null,region_1_endKey)...[region_n-1_startKey, region_n_startKey), 
[region_n_startKey,null).
If the table has only one region, the problem obviously cannot occur, because 
all the data is in the same region, so we only need to look at the second case.
In the second case, if we do not apply the patch, a region may still hold old 
data that belonged to it before it was split, so the scan may return duplicate 
rows. But I think this mistake, where the region scan returns old data, can only 
happen in regions other than the last one. No rowkey can fall beyond the last 
region's end key (null), so the last region always has the newest data. For that 
reason we only need to make sure the other regions do not make this mistake, and 
then the scan will avoid returning old data. This patch does just that.
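
To illustrate the region boundaries described above, here is a minimal sketch 
in plain Java. It does not use the HBase API: RegionBounds, compare, inRegion 
and b are hypothetical names, and an empty byte array stands in for the null 
(unbounded) key. It only shows why a stale row can sit outside a non-last 
region's range while the last region has no upper bound to exceed.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of HBase: models a region as [startKey, endKey).
class RegionBounds {
    // Lexicographic comparison of byte arrays, treating each byte as unsigned.
    static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xff;
            int y = b[i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        return a.length - b.length;
    }

    // True if row lies in [startKey, endKey). An empty key is assumed to mean "unbounded".
    static boolean inRegion(byte[] row, byte[] startKey, byte[] endKey) {
        boolean afterStart = startKey.length == 0 || compare(row, startKey) >= 0;
        boolean beforeEnd = endKey.length == 0 || compare(row, endKey) < 0;
        return afterStart && beforeEnd;
    }

    // Convenience conversion used only by this example.
    static byte[] b(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] split = b("m");
        byte[] unbounded = new byte[0];
        // Row "x" checked against the first region (unbounded, "m").
        System.out.println(inRegion(b("x"), unbounded, split));
        // The same row checked against the last region ["m", unbounded).
        System.out.println(inRegion(b("x"), split, unbounded));
    }
}
```

With a split point of "m", a leftover row "x" falls outside the first region 
but inside the last one, which matches the argument that only the regions 
other than the last one need the extra check.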

> When the scan operation covered two regions,sometimes the final results have 
> duplicated rows.
> ---------------------------------------------------------------------------------------------
>
>                 Key: HBASE-15097
>                 URL: https://issues.apache.org/jira/browse/HBASE-15097
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 1.1.2
>         Environment: centos 6.5
> hbase 1.1.2 
>            Reporter: chenrongwei
>            Assignee: chenrongwei
>         Attachments: HBASE-15097-v001.patch, HBASE-15097-v002.patch, 
> HBASE-15097-v003.patch, HBASE-15097-v004.patch, output.log, rowkey.txt, 
> snapshot2016-01-13 pm 8.42.37.png
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> When the scan operation's start key and end key cover two regions, the first 
> region returns rows that are beyond its end key. This finally 
> leads to duplicated rows in the results.
> To avoid this problem, we should add a check before setting the variable 
> "stopRow" in the HRegion class, as follows:
>             if (Bytes.equals(scan.getStopRow(), HConstants.EMPTY_END_ROW) && !scan.isGetScan()) {
>                 this.stopRow = null;
>             } else {
>                 if (Bytes.compareTo(scan.getStopRow(), this.getRegionInfo().getEndKey()) >= 0) {
>                     this.stopRow = this.getRegionInfo().getEndKey();
>                 } else {
>                     this.stopRow = scan.getStopRow();
>                 }
>             }
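
The stop-row selection quoted above can be sketched outside HBase as a 
standalone function. This is only an illustration in plain Java, not the 
attached patch: StopRowClamp, compare and effectiveStopRow are hypothetical 
names, and byte arrays replace the Scan and region info objects. The one 
deviation from the quoted snippet is the guard for an empty region end key, 
which this sketch assumes means "no upper bound" (as for the last region). 
Without that guard, any non-empty stop row compares greater than the empty key.

```java
// Hypothetical standalone model of the stop-row selection; not HBase code.
class StopRowClamp {
    // Lexicographic comparison of byte arrays, treating each byte as unsigned.
    static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xff;
            int y = b[i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        return a.length - b.length;
    }

    // Returns the stop row the region scanner would use, or null for "no stop row".
    static byte[] effectiveStopRow(byte[] scanStopRow, byte[] regionEndKey, boolean isGetScan) {
        // Same first branch as the quoted snippet: empty stop row on a non-get scan.
        if (scanStopRow.length == 0 && !isGetScan) {
            return null;
        }
        // Assumption: an empty end key means the region is unbounded, so never clamp to it.
        boolean regionBounded = regionEndKey.length != 0;
        if (regionBounded && compare(scanStopRow, regionEndKey) >= 0) {
            // The scan runs past this region, so stop at the region's own end key.
            return regionEndKey;
        }
        return scanStopRow;
    }
}
```

For a region ending at "m", a scan with stop row "z" is cut back to "m", 
while a scan with stop row "c" keeps its own stop row.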



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
