If I start just one client to fetch the META information (a string) and then
inject it into the other clients, would that be possible?
Thanks

Fleming Chiu(邱宏明)
707-6128
y_823...@tsmc.com
Meat-Free Monday: eat vegetarian, save the Earth (Meat Free Monday Taiwan)




                                                                                
                                                                      
From:              jdcry...@gmail.com
Sent by:           jdcry...@gmail.com
Date:              2010/03/02 10:00 AM
Please respond to: hbase-user
To:                hbase-user@hadoop.apache.org
cc:                (bcc: Y_823910/TSMC)
Subject:           Re: HBase reading performance

Ah, I understand now; thanks for the context. I misinterpreted your
first test: you are basically hitting .META. with a lot of random
reads from many clients whose caches are completely empty when the
test begins.

So here you hit some pain points we currently have WRT random reads,
but first I'd like to point out that HBase isn't your typical RDBMS
where you can just point the client at one machine to read from and be
done with it. Here the client has to figure out the region locations by
itself, doing location discovery using the .META. table. Normally that
would be fast, but a few issues are slowing down concurrent reads on
hot rows:

- We don't use pread: https://issues.apache.org/jira/browse/HBASE-2180
- Random reading locks whole rows (among other stuff); that will be
  fixed in: https://issues.apache.org/jira/browse/HBASE-2248
- Reading from .META. is really slowed down when it has more than one
  store file: https://issues.apache.org/jira/browse/HBASE-2175

The first two are actively being worked on, the third still needs
investigation and may be just a symptom.

What this means for you is that, if possible, you should try to reuse
JVMs across jobs in order to use warmed-up caches. For example, run the
same test but call the same code twice, and you should see each new
HTable be much faster in the second batch.
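
To illustrate why a warm cache matters, here is a toy model in plain Java.
It is only a sketch under stated assumptions, not HBase's actual client
code: the class RegionCacheSketch, its Locator, the three made-up regions
and the lookup counter are all hypothetical names invented for this example.
It counts how many simulated .META. lookups happen when every job starts
with an empty location cache versus when jobs share one that is already warm.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.concurrent.atomic.AtomicInteger;

public class RegionCacheSketch {

    /** One region: rows in [startKey, endKey); an empty endKey means "no upper bound". */
    static final class Region {
        final String startKey;
        final String endKey;
        final String server;

        Region(String startKey, String endKey, String server) {
            this.startKey = startKey;
            this.endKey = endKey;
            this.server = server;
        }

        boolean contains(String row) {
            return row.compareTo(startKey) >= 0
                    && (endKey.isEmpty() || row.compareTo(endKey) < 0);
        }
    }

    /** Toy stand-in for the .META. table, keyed by region start key (made-up data). */
    static final TreeMap<String, Region> META = new TreeMap<>();

    /** Counts simulated lookups against .META., i.e. the expensive part. */
    static final AtomicInteger metaLookups = new AtomicInteger();

    static {
        META.put("", new Region("", "g", "server-a"));
        META.put("g", new Region("g", "p", "server-b"));
        META.put("p", new Region("p", "", "server-c"));
    }

    /** Hypothetical client-side locator; its cache lives only as long as the object. */
    static final class Locator {
        private final ConcurrentSkipListMap<String, Region> cache =
                new ConcurrentSkipListMap<>();

        String locate(String row) {
            Map.Entry<String, Region> cached = cache.floorEntry(row);
            if (cached != null && cached.getValue().contains(row)) {
                return cached.getValue().server; // warm: no .META. lookup needed
            }
            metaLookups.incrementAndGet(); // cold: simulated trip to .META.
            Region region = META.floorEntry(row).getValue();
            cache.put(region.startKey, region);
            return region.server;
        }
    }

    /** Runs the given number of jobs over the same rows; returns the .META. lookups caused. */
    static int run(int jobs, boolean shareLocator) {
        String[] rows = {"apple", "kiwi", "zebra"};
        metaLookups.set(0);
        Locator shared = new Locator();
        for (int job = 0; job < jobs; job++) {
            // A fresh Locator models a new JVM whose cache starts empty.
            Locator locator = shareLocator ? shared : new Locator();
            for (String row : rows) {
                locator.locate(row);
            }
        }
        return metaLookups.get();
    }

    public static void main(String[] args) {
        System.out.println("fresh locator per job: " + run(200, false) + " lookups");
        System.out.println("one shared locator:    " + run(200, true) + " lookups");
    }
}
```

In this model, 200 jobs with a fresh locator each cause 600 lookups, while
200 jobs sharing one locator cause only 3. The real client differs in many
ways, but the shape of the saving from reusing a JVM is the same.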

Another option would be to implement a new feature in the HBase client
that warms it up using scanners (I think there's a jira about it).

J-D

2010/3/1  <y_823...@tsmc.com>:
> Hi,
>
> We treat HBase as a data grid.
> There are a lot of HBase Java clients in our compute grid (GridGain) that
> fetch data from HBase concurrently.
> Our data is normalized data from Oracle; the computing code does joins
> and some aggregations.
> So our POC job is: load the tables' data from HBase -> compute on the data
> (join & aggregation) -> save back to HBase.
> It does very well when we run 10 jobs using 10 concurrent clients;
> it took 53 sec.
> We expected our 20 machines to reach a completion time of about 60 sec
> when running 200 jobs (200 concurrent clients),
> but in fact these clients all blocked in the following code:
>      IndexedTable idxTable1 = new IndexedTable(config, Bytes.toBytes("Table1"));
> The results, which we are not satisfied with, are as follows:
>      200 clients    839 sec
>      400 clients   1801 sec
> We estimate that about 85% of the time was spent in new IndexedTable once
> the number of clients reached 200.
> That is to say, HBase does not seem to serve well when hundreds of clients
> connect to it concurrently.
> Just create a new table object in your code, then run it concurrently in
> threads or on another distributed computing platform,
> and maybe you can see what's wrong with it?
> If HBase just focuses on a few web server connections, that's OK,
> but since an RDBMS can serve a thousand concurrent connections, the HBase
> architecture seems to need adjusting.
> That's my opinion!
>
>






