Thanks Tim,

I suspect that it should work unless so many connections hit the same region that they overwhelm its ability to handle the scans properly. (Or there was a problem in the OP's code.)
Scans should be 'dirty reads' imho.

-Mike

> Date: Thu, 22 Apr 2010 18:57:01 +0200
> Subject: Re: multiple scanners on same table will cause problem? Scan results
> change among different tries.
> From: timrobertson...@gmail.com
> To: hbase-user@hadoop.apache.org
>
> Attached is a quickly hacked test for parallel scanning threads. You
> might want to increase the amount of data in the test though to test
> properly.
> It seems to pass consistently for me.
>
> Note it uses a shared HTable object across threads, but the API states:
> "Used to communicate with a single HBase table. This class is not
> thread safe for writes. Gets, puts, and deletes take out a row lock
> for the duration of their operation. Scans (currently) do not respect
> row locking."
>
> But I am not doing any writes in the test.
>
> Cheers,
> Tim
>
> On Thu, Apr 22, 2010 at 4:22 PM, Michael Segel
> <michael_se...@hotmail.com> wrote:
> >
> > Tim,
> >
> > Even without his code, this should be pretty straightforward on how to
> > duplicate.
> >
> > Create the table with a sequence as a column in a column family.
> > Then write a non-m/r job that has multiple threads that connect to
> > HBase and see what they get when they hit the small table in a single
> > region.
> >
> > If you can duplicate the problem, that would be the test code for the jira.
> >
> > -Mike
> >
> >> Date: Thu, 22 Apr 2010 16:13:31 +0200
> >> Subject: Re: multiple scanners on same table will cause problem? Scan
> >> results change among different tries.
> >> From: timrobertson...@gmail.com
> >> To: hbase-user@hadoop.apache.org
> >>
> >> Could you please post your code that is doing the scanning Steven?
> >>
> >> On Thu, Apr 22, 2010 at 3:50 PM, Michael Segel
> >> <michael_se...@hotmail.com> wrote:
> >> >
> >> > Ok...
> >> >
> >> > This is something that I think we'll need input from a major
> >> > contributor...
> >> >
> >> > It looks like there may be an issue with respect to row locking...
> >> >
> >> > I guess the questions to ask are:
> >> >
> >> > - How does HBase handle row level locking?
> >> > - Concurrent reads/fetches of the same row?
> >> >
> >> > To be honest and fair, HBase is still an immature product when compared
> >> > to databases and there are going to be some issues that need to be fleshed
> >> > out. (Let's see where we are in 20+ years ;-)
> >> >
> >> > I wish I knew more about the internals of HBase, but there are only so
> >> > many hours in the day and my wife forces me to work so I can keep up
> >> > with her spending. ;-) (And if any of you happen to ever meet her,
> >> > please don't bring this up, she'll kill me. :-D )
> >> >
> >> > Let's see what St.Ack or Andrew have to say. This might be a JIRA issue.
> >> >
> >> > Thx
> >> >
> >> > -Mike
> >> >
> >> >> Date: Thu, 22 Apr 2010 20:17:12 +0800
> >> >> Subject: Re: multiple scanners on same table will cause problem? Scan
> >> >> results change among different tries.
> >> >> From: steven.zhuang.1...@gmail.com
> >> >> To: hbase-user@hadoop.apache.org
> >> >>
> >> >> hi, Michael,
> >> >>
> >> >> Sorry for not making the question clear. There are multiple
> >> >> scanners scanning a single table, and there may be cases where multiple
> >> >> scanners read from a single region.
> >> >> Please see answers inline.
> >> >>
> >> >> On Thu, Apr 22, 2010 at 8:08 PM, Michael Segel
> >> >> <michael_se...@hotmail.com> wrote:
> >> >>
> >> >> > I'm sorry, but are you trying to say that you have multiple scanners
> >> >> > trying to read from a single region and the result sets do not match?
> >> >>
> >> >> Yes, the result sets do not match.
> >> >>
> >> >> > I guess it would be an easy test: enter a bunch of rows into a
> >> >> > region and have a unique integer for each row (1,2,3,...).
> >> >> > Then run a bunch of unfiltered scans in parallel, and generate a sum
> >> >> > from the scan.
> >> >> > If any of the sums do not match, then you have a potential issue
> >> >> > with concurrency/row locking and row isolation level. How does HBase
> >> >> > handle row level locking and isolation levels?
> >> >>
> >> >> I have iterated over the rows/column families/cells and printed the
> >> >> content of each cell, and found that some cells are missing from some
> >> >> scan result sets.
> >> >>
> >> >> > -Mike
> >> >> >
> >> >> > > Date: Thu, 22 Apr 2010 17:07:47 +0800
> >> >> > > Subject: multiple scanners on same table will cause problem? Scan
> >> >> > > results change among different tries.
> >> >> > > From: steven.zhuang.1...@gmail.com
> >> >> > > To: hbase-user@hadoop.apache.org
> >> >> > >
> >> >> > > hi, All,
> >> >> > > Has anybody scanned one table using multiple scanners at the
> >> >> > > same time and found inconsistent results?
> >> >> > > I am querying a table using dozens (20-120) of scanners in
> >> >> > > parallel (multiple threads), trying to take advantage of the multiple
> >> >> > > cores. But I found the scan results are not consistent across several
> >> >> > > runs. I have checked my code, and there seems to be no bug in it. So I
> >> >> > > guess the problem may come from HBase itself.
> >> >> > > My HBase version is 0.20.3.
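For anyone wanting to try the sum test proposed in the thread (a unique integer per row, a bunch of unfiltered scans in parallel, then compare the sums), here is a rough sketch of the harness shape. It is only an illustration and makes several assumptions: it runs against an in-memory dict standing in for a small single-region table, not against HBase, and every name in it (`build_table`, `scan_sum`, `run_parallel_scans`, `find_mismatches`) is hypothetical. To turn it into a real reproduction, `scan_sum` would need to be replaced with an actual scan through whatever client is in use.

```python
import threading


def build_table(num_rows):
    # Hypothetical stand-in for a small single-region table:
    # row key -> unique integer (1, 2, 3, ...). This is NOT the HBase API.
    return {"row-%06d" % i: i for i in range(1, num_rows + 1)}


def scan_sum(table):
    # Stand-in for one unfiltered scan: visit every row in key order
    # and sum the integer stored in each row. A real test would open a
    # scanner against the cluster here instead of reading a dict.
    return sum(table[key] for key in sorted(table))


def run_parallel_scans(table, num_scanners):
    # Run num_scanners scans at the same time, one per thread, and
    # collect the sum that each scan produced.
    sums = [None] * num_scanners

    def worker(slot):
        sums[slot] = scan_sum(table)

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(num_scanners)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sums


def find_mismatches(sums, expected):
    # Return (scanner index, sum) for every scan whose sum is wrong.
    return [(i, s) for i, s in enumerate(sums) if s != expected]


if __name__ == "__main__":
    num_rows = 10000
    table = build_table(num_rows)
    # With values 1..n and no concurrent writes, every complete scan
    # must sum to n * (n + 1) / 2.
    expected = num_rows * (num_rows + 1) // 2
    sums = run_parallel_scans(table, num_scanners=20)
    print("mismatching scanners:", find_mismatches(sums, expected))
```

Because nothing writes to the table during the run, any scan whose sum differs from n * (n + 1) / 2 has dropped or repeated cells, which is the symptom Steven describes. Raising the number of scanners toward the 20-120 range he mentions would be the natural next step.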