[ 
https://issues.apache.org/jira/browse/HBASE-10679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Feng Honghua updated HBASE-10679:
---------------------------------

    Attachment: HBASE-10679-trunk_v2.patch

> Both clients get wrong scan results if the first scanner expires and the 
> second scanner is created with the same scannerId on the same region
> ---------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-10679
>                 URL: https://issues.apache.org/jira/browse/HBASE-10679
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>            Reporter: Feng Honghua
>            Assignee: Feng Honghua
>            Priority: Critical
>         Attachments: HBASE-10679-trunk_v1.patch, HBASE-10679-trunk_v2.patch, 
> HBASE-10679-trunk_v2.patch, HBASE-10679-trunk_v2.patch
>
>
> The scenario is as follows (both Client A and Client B scan Region R):
> # A opens a scanner SA on R with scannerId N and successfully gets its 
> first row "a".
> # SA's lease expires and SA is removed from scanners.
> # B opens a scanner SB on R, which is also assigned scannerId N, and 
> successfully gets its first row "m".
> # A issues its second scan request with scannerId N. The regionserver finds 
> that N is a valid scannerId and that the region matches (the region has 
> stayed online on this regionserver and both scanners are against it), so it 
> executes the request on SB and returns "n" to A. Wrong: A gets data from 
> another client's scanner, when it expects a row such as "b" that follows "a".
> # B issues its second scan request with scannerId N. The regionserver also 
> considers it valid, executes the scan on SB, and returns "o" to B. Wrong: B 
> should get "n", but "n" has just been consumed by A.
> The consequence is that both clients get wrong scan results:
> # A gets data from a scanner created by another client, since its own 
> scanner has expired and been removed.
> # B misses data it should have received, because A wrongly consumed it.
> The root cause is that a scannerId generated by the regionserver is not 
> guaranteed to be unique within the regionserver's whole lifecycle. *The only 
> guarantee is that the scannerIds of scanners that are currently valid (not 
> expired) are unique*, so the same scannerId can appear in scanners again 
> after an earlier scanner with that scannerId expires and is removed from 
> scanners. If the second scanner is against the same region, the bug arises.
> In theory this scenario should be very rare (two consecutive scans on the 
> same region from two different clients get the same scannerId, and the first 
> expires before the second is created), but it can happen. When it does, the 
> consequence is severe (all clients involved get wrong data) and it is likely 
> to be extremely hard to diagnose or debug.
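
The five steps in the scenario can be replayed with a small toy model. This is only a sketch under stated assumptions, not regionserver code: ToyScanner, ToyRegionServer and replayScenario are hypothetical names, the scanner table is assumed to be a plain map keyed by scannerId alone, and the id is passed in by the caller to force the collision that the real server would only hit by chance.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

/** Toy model of a scanner table; every name here is hypothetical, not HBase API. */
public class ScannerIdReuseDemo {

    /** Hands out the region's rows in order, beginning at startRow. */
    static final class ToyScanner {
        private final Iterator<String> rows;

        ToyScanner(List<String> regionRows, String startRow) {
            int from = regionRows.indexOf(startRow);
            this.rows = regionRows.subList(from, regionRows.size()).iterator();
        }

        String next() {
            return rows.hasNext() ? rows.next() : null;
        }
    }

    /** Scanner table keyed only by scannerId (assumption made for this sketch). */
    static final class ToyRegionServer {
        private final Map<Long, ToyScanner> scanners = new HashMap<>();
        private final List<String> regionRows;

        ToyRegionServer(List<String> regionRows) {
            this.regionRows = regionRows;
        }

        // The caller supplies the id so the demo can force two scanners to share it.
        void openScanner(long scannerId, String startRow) {
            scanners.put(scannerId, new ToyScanner(regionRows, startRow));
        }

        void expireLease(long scannerId) {
            scanners.remove(scannerId);
        }

        // Only checks that the id is present; it cannot tell SA's owner from SB's.
        String next(long scannerId) {
            ToyScanner scanner = scanners.get(scannerId);
            if (scanner == null) {
                throw new IllegalStateException("unknown scanner " + scannerId);
            }
            return scanner.next();
        }
    }

    /** Replays steps 1-5 and returns the rows each client saw, in call order. */
    public static List<String> replayScenario() {
        ToyRegionServer rs =
            new ToyRegionServer(Arrays.asList("a", "b", "c", "m", "n", "o", "p"));
        long n = 42L; // stands in for scannerId N
        List<String> seen = new ArrayList<>();

        rs.openScanner(n, "a");       // step 1: A opens SA
        seen.add("A:" + rs.next(n));  // A gets "a"
        rs.expireLease(n);            // step 2: SA's lease expires
        rs.openScanner(n, "m");       // step 3: B opens SB, same id
        seen.add("B:" + rs.next(n));  // B gets "m"
        seen.add("A:" + rs.next(n));  // step 4: A is served from SB
        seen.add("B:" + rs.next(n));  // step 5: B skips a row
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(replayScenario()); // prints [A:a, B:m, A:n, B:o]
    }
}
```

In this model A receives "n" where it expected "b", and B receives "o" where it expected "n", matching steps 4 and 5 above. The lookup succeeds both times because the id is the only thing checked.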


