RE: 答复: HBase random read performance

2013-04-16 Thread Liu, Raymond
So what is lacking here? Should the actions also run in parallel inside each RS, per region, instead of only in parallel at the RS level? That seems rather difficult to implement, and for Get it might not be worth it. I looked at src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java

Re: 答复: HBase random read performance

2013-04-16 Thread Nicolas Liochon
I think there is something in the middle that could be done. It was discussed here a while ago, but without any JIRA created. See thread: http://mail-archives.apache.org/mod_mbox/hbase-user/201302.mbox/%3CCAKxWWm19OC+dePTK60bMmcecv=7tc+3t4-bq6fdqeppix_e...@mail.gmail.com%3E If someone can spend

Re: 答复: HBase random read performance

2013-04-16 Thread Jean-Marc Spaggiari
Hi Nicolas, I think it might be good to create a JIRA for that anyway, since it seems that some users are expecting this behaviour. My 2¢ ;) JM 2013/4/16 Nicolas Liochon nkey...@gmail.com I think there is something in the middle that could be done. It was discussed here a while ago, but without

Re: 答复: HBase random read performance

2013-04-16 Thread lars hofhansl
From: Ted Yu yuzhih...@gmail.com To: user@hbase.apache.org Sent: Monday, April 15, 2013 10:03 AM Subject: Re: 答复: HBase random read performance This is a related JIRA which should provide noticeable speed up: HBASE-1935 Scan in parallel Cheers On Mon, Apr

Re: 答复: HBase random read performance

2013-04-15 Thread Ankit Jain
Hi Liang, Thanks for the reply. Ans1: I tried using an HFile block size of 32 KB with the bloom filter enabled. The random read performance is 1 records in 23 secs. Ans2: We are retrieving all the 1 rows in one call. Ans3: Disk detail: Model Number: ST2000DM001-1CH164 Serial

Re: 答复: HBase random read performance

2013-04-15 Thread Doug Meil
Hi there, regarding this... "We are passing random 10,000 row-keys as input, while HBase is taking around 17 secs to return 10,000 records." ... Given that you are generating 10,000 random keys, your multi-get is very likely hitting all 5 nodes of your cluster. Historically, multi-Get used to

Re: 答复: HBase random read performance

2013-04-15 Thread Ted Yu
Doug made a good point. Take a look at the performance gain for parallel scan (bottom chart compared to top chart): https://issues.apache.org/jira/secure/attachment/12578083/FDencode.png See

Re: 答复: HBase random read performance

2013-04-15 Thread Ted Yu
I looked at src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java in 0.94. In processBatchCallback(), starting at line 1538: // step 1: break up into regionserver-sized chunks and build the data structs Map<HRegionLocation, MultiAction<R>> actionsByServer = new
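
For readers following along, the "step 1" grouping mentioned above can be pictured roughly as below. This is only a minimal sketch and not the HBase source: the class name BatchGrouping, the method groupByServer, and the toy locator function are all hypothetical, and plain strings stand in for row keys and region servers. The point is just that a batch is split into one chunk per server, so parallelism happens per server rather than per region.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical illustration only; none of these names come from HBase.
final class BatchGrouping {

    // Groups row keys by the server that hosts them. The locate function is an
    // assumed stand-in for a region location lookup.
    static Map<String, List<String>> groupByServer(List<String> rowKeys,
                                                   Function<String, String> locate) {
        Map<String, List<String>> byServer = new LinkedHashMap<>();
        for (String key : rowKeys) {
            // One list per server; every key for that server lands in the same chunk.
            byServer.computeIfAbsent(locate.apply(key), s -> new ArrayList<>()).add(key);
        }
        return byServer;
    }

    public static void main(String[] args) {
        // Toy locator: keys starting with a..m live on "rs1", everything else on "rs2".
        Function<String, String> locate = k -> k.charAt(0) <= 'm' ? "rs1" : "rs2";
        Map<String, List<String>> chunks =
                groupByServer(List.of("apple", "zebra", "mango", "pear"), locate);
        System.out.println(chunks); // prints {rs1=[apple, mango], rs2=[zebra, pear]}
    }
}
```

With this shape, each server's chunk could be sent as one request. Splitting further per region inside a server, as asked elsewhere in the thread, would need a second level of grouping.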

Re: 答复: HBase random read performance

2013-04-15 Thread Ted Yu
This is a related JIRA which should provide noticeable speed up: HBASE-1935 Scan in parallel Cheers On Mon, Apr 15, 2013 at 7:13 AM, Ted Yu yuzhih...@gmail.com wrote: I looked at src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java in 0.94 In processBatchCallback(),