[ https://issues.apache.org/jira/browse/HBASE-10679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Feng Honghua updated HBASE-10679:
---------------------------------
    Attachment: HBASE-10679-trunk_v2.patch

> Both clients get wrong scan results if the first scanner expires and the
> second scanner is created with the same scannerId on the same region
> ---------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-10679
>                 URL: https://issues.apache.org/jira/browse/HBASE-10679
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>            Reporter: Feng Honghua
>            Assignee: Feng Honghua
>            Priority: Critical
>         Attachments: HBASE-10679-trunk_v1.patch, HBASE-10679-trunk_v2.patch,
> HBASE-10679-trunk_v2.patch, HBASE-10679-trunk_v2.patch
>
>
> The scenario is as follows (both Client A and Client B scan Region R):
> # A opens a scanner SA on R with scannerId N and successfully gets its
> first row "a".
> # SA's lease expires and it is removed from scanners.
> # B opens a scanner SB on R, and its scannerId is also N. It successfully
> gets its first row "m".
> # A issues its second scan request with scannerId N. The regionserver finds
> that N is a valid scannerId and that the region matches too (the region
> stays online on this regionserver throughout, and both scanners are against
> it), so it executes the scan request on SB and returns "n" to A -- wrong!
> (A gets data from another client's scanner; it expects a row such as "b"
> that follows "a".)
> # B issues its second scan request with scannerId N. The regionserver also
> considers it valid, executes the scan on SB, and returns "o" to B -- wrong!
> (It should return "n", but "n" has already been scanned out by A.)
> The consequence is that both clients get wrong scan results:
> # A gets data from a scanner created by another client, because its own
> scanner has expired and been removed.
> # B misses data that it should have received but that was wrongly scanned
> out by A.
> The root cause is that a scannerId generated by the regionserver is not
> guaranteed to be unique within the regionserver's whole lifecycle. *The only
> guarantee is that the scannerIds of scanners that are currently still valid
> (not expired) are unique*, so the same scannerId can appear in scanners
> again after an earlier scanner with that scannerId expires and is removed
> from scanners. If the second scanner is against the same region, the bug
> arises.
> In theory, this scenario should be very rare (two consecutive scans on the
> same region from two different clients get the same scannerId, and the
> first expires before the second is created), but it can happen. Once it
> does, the consequence is severe (all clients involved get wrong data), and
> it should be extremely hard to diagnose and debug.
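
The scenario in the description can be replayed with a small self-contained sketch. Everything in it is hypothetical and is not HBase code: `ScannerRegistry` stands in for the regionserver's scanner table, the "first free id" generator models an id source that is only unique among live scanners, and the counter-based alternative is just one possible way to avoid reuse, not necessarily what the attached patch does.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

public class ScannerIdReuseDemo {

    /** Hypothetical stand-in for a regionserver's scanner table (not HBase code). */
    static final class ScannerRegistry {
        private final Map<Long, Iterator<String>> scanners = new HashMap<>();
        private final AtomicLong counter = new AtomicLong();
        private final boolean uniqueIds;

        ScannerRegistry(boolean uniqueIds) {
            this.uniqueIds = uniqueIds;
        }

        /** Registers a scanner over the given rows and returns its id. */
        long open(List<String> rows) {
            long id;
            if (uniqueIds) {
                // Assumed mitigation: a counter that never repeats
                // during this registry's lifetime.
                id = counter.incrementAndGet();
            } else {
                // Models the reported behaviour: the id is unique only
                // among scanners that are currently registered.
                id = 1;
                while (scanners.containsKey(id)) {
                    id++;
                }
            }
            scanners.put(id, rows.iterator());
            return id;
        }

        /** Simulates lease expiry: the scanner is dropped from the table. */
        void expire(long id) {
            scanners.remove(id);
        }

        /** Returns the next row for the scanner, or fails if the id is unknown. */
        String next(long id) {
            Iterator<String> it = scanners.get(id);
            if (it == null) {
                throw new IllegalStateException("unknown scanner " + id);
            }
            return it.next();
        }
    }

    /** Replays the four steps from the issue and records what each client sees. */
    static String replay(boolean uniqueIds) {
        ScannerRegistry rs = new ScannerRegistry(uniqueIds);
        StringBuilder log = new StringBuilder();

        long sa = rs.open(List.of("a", "b", "c"));   // step 1: A opens SA
        log.append("A:").append(rs.next(sa));
        rs.expire(sa);                               // step 2: SA's lease expires
        long sb = rs.open(List.of("m", "n", "o"));   // step 3: B opens SB
        log.append(" B:").append(rs.next(sb));
        try {                                        // step 4: A reuses its old id
            log.append(" A:").append(rs.next(sa));
        } catch (IllegalStateException e) {
            log.append(" A:unknown-scanner");
        }
        log.append(" B:").append(rs.next(sb));       // step 5: B scans again
        return log.toString();
    }

    public static void main(String[] args) {
        System.out.println(replay(false)); // A:a B:m A:n B:o
        System.out.println(replay(true));  // A:a B:m A:unknown-scanner B:n
    }
}
```

With id reuse, A silently reads "n" from B's scanner and B skips ahead to "o", exactly as in the description. With ids that never repeat, A's stale request fails in a way the client can detect, and B still receives "n".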