[ 
https://issues.apache.org/jira/browse/HBASE-3382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cosmin Lehene updated HBASE-3382:
---------------------------------
    Issue Type: Improvement  (was: Bug)

> Make HBase client work better under concurrent clients
> ------------------------------------------------------
>
>                 Key: HBASE-3382
>                 URL: https://issues.apache.org/jira/browse/HBASE-3382
>             Project: HBase
>          Issue Type: Improvement
>          Components: Performance
>            Reporter: ryan rawson
>            Assignee: ryan rawson
>              Labels: delete
>         Attachments: HBASE-3382-nio.txt, HBASE-3382.txt
>
>
> The HBase client uses 1 socket per regionserver for communication.  This is 
> good for socket control but potentially bad for latency.  How bad?  I did a 
> simple YCSB test that had this config:
>  readproportion=0
>  updateproportion=0
>  scanproportion=1
>  insertproportion=0
>  fieldlength=10
>  fieldcount=100
>  requestdistribution=zipfian
>  scanlength=300
>  scanlengthdistribution=zipfian
> I ran this with 1 and 10 threads.  Here is the summary:
> 1 thread:
> [SCAN]         Operations     1000
> [SCAN]         AverageLatency(ms)     35.871
> 10 threads:
> [SCAN]         Operations     1000
> [SCAN]         AverageLatency(ms)     228.576
> We are taking roughly a 6.4x latency hit in our client (228.576 ms vs. 
> 35.871 ms).  But why?
> The first step was to move deserialization out of the Connection thread.  
> This seemed like it could be a big win: an analogous change on the server 
> side got a 20% performance improvement (already committed as HBASE-2941).  
> I did this and got about a 20% improvement again, with that 228 ms number 
> going to about 190 ms.
> So I then wrote a high-performance, nanosecond-resolution tracing utility.  
> Clients can flag an API call, and we get tracing and timings through the 
> client pipeline.  What I found is that a lot of time is being spent 
> receiving the response from the network.  The code block looks like this:
>         NanoProfiler.split(id, "receiveResponse");
>         if (LOG.isDebugEnabled())
>           LOG.debug(getName() + " got value #" + id);
>         Call call = calls.get(id);
>         size -= 4;  // 4 bytes off for the id, because we already read it.
>         ByteBuffer buf = ByteBuffer.allocate(size);
>         IOUtils.readFully(in, buf.array(), buf.arrayOffset(), size);
>         buf.limit(size);
>         buf.rewind();
>         NanoProfiler.split(id, "setResponse", "Data size: " + size);
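
To make the blocking read in the snippet above easier to reason about outside the HBase client, here is a minimal, self-contained sketch of the same pattern. The frame layout (`[int size][int id][payload]`, with `size` counting the 4-byte id) is an assumption inferred from the `size -= 4` comment, and `ResponseFrameDemo`, `Frame`, `encode`, and `decode` are hypothetical names for illustration, not HBase classes. It uses `DataInputStream.readFully` in place of Hadoop's `IOUtils.readFully`.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;

final class ResponseFrameDemo {
  /** One decoded frame: the call id plus its still-serialized payload. */
  static final class Frame {
    final int id;
    final ByteBuffer payload;
    Frame(int id, ByteBuffer payload) { this.id = id; this.payload = payload; }
  }

  // ASSUMED layout: [int size][int id][size - 4 payload bytes].
  static Frame readFrame(DataInputStream in) throws IOException {
    int size = in.readInt();
    int id = in.readInt();
    size -= 4;                    // the id was counted in size and is already consumed
    byte[] data = new byte[size];
    in.readFully(data, 0, size);  // blocks the calling thread until every byte has arrived
    return new Frame(id, ByteBuffer.wrap(data));
  }

  // Test helper: builds a frame in the assumed layout (big-endian, like DataInputStream).
  static byte[] encode(int id, byte[] payload) {
    return ByteBuffer.allocate(8 + payload.length)
        .putInt(4 + payload.length).putInt(id).put(payload).array();
  }

  // Convenience wrapper that decodes a frame from an in-memory byte array.
  static Frame decode(byte[] wire) {
    try {
      return readFrame(new DataInputStream(new ByteArrayInputStream(wire)));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    Frame f = decode(encode(12561, new byte[] {1, 2, 3, 4}));
    System.out.println("id=" + f.id + " payloadBytes=" + f.payload.remaining());
  }
}
```

The point of the sketch is the `readFully` call: with one socket per regionserver, the thread reading that socket cannot hand out any other response until the whole payload of the current one has been read.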
> I came up with some numbers:
> 11726 (receiveResponse) split: 64991689 overall: 133562895 Data size: 4288937
> 12163 (receiveResponse) split: 32743954 overall: 103787420 Data size: 1606273
> 12561 (receiveResponse) split: 3517940 overall: 83346740 Data size: 4
> 12136 (receiveResponse) split: 64448701 overall: 203872573 Data size: 3570569
> The first number is the internal counter that keeps requests unique from 
> HTable on down.  The split and overall times are in ns; the data size is in bytes.
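
The tracing utility itself is not shown in this issue, so the following is only a hypothetical sketch of how a split timer producing lines in the format above might work. `SplitTimer` and its methods are invented names, and the reading of "split" as time since the previous split and "overall" as time since the call was flagged is an assumption.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class SplitTimer {
  /** Per-call timing state (hypothetical; not the NanoProfiler from the patch). */
  private static final class Trace {
    final long startNs;
    long lastNs;
    Trace(long now) { startNs = now; lastNs = now; }
  }

  private static final Map<Long, Trace> TRACES = new ConcurrentHashMap<>();

  /** Flags a call for tracing. Unflagged calls pay only a map lookup per split. */
  static void start(long id) {
    TRACES.put(id, new Trace(System.nanoTime()));
  }

  /** Returns a log line for this split, or null if the call was never flagged. */
  static String split(long id, String label, String note) {
    Trace t = TRACES.get(id);
    if (t == null) {
      return null;
    }
    long sinceLast;
    long overall;
    synchronized (t) {  // splits for one call may come from different threads
      long now = System.nanoTime();
      sinceLast = now - t.lastNs;
      overall = now - t.startNs;
      t.lastNs = now;
    }
    String suffix = note.isEmpty() ? "" : " " + note;
    return id + " (" + label + ") split: " + sinceLast + " overall: " + overall + suffix;
  }

  /** Drops the state for a finished call. */
  static void finish(long id) {
    TRACES.remove(id);
  }

  public static void main(String[] args) {
    start(26L);
    System.out.println(split(26L, "receiveResponse", "Data size: 850429"));
    finish(26L);
  }
}
```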
> Doing some simple calculations, we see for the first line we were reading at 
> about 31 MB/sec.  The second one is even worse.  Other calls are like:
> 26 (receiveResponse) split: 7985400 overall: 21546226 Data size: 850429
> which is 107 MB/sec, pretty close to the maximum of GigE.  In my setup, 
> the YCSB client ran on the master node and HAD to use the network to talk 
> to regionservers.
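
For anyone who wants to redo the arithmetic, here is a small sketch that turns the profiler lines into throughput. It assumes decimal megabytes (10^6 bytes) and computes the rate against both the split time and the overall time, since the description does not say which one was used. `ThroughputCalc` is a hypothetical name. Under these assumptions the first line works out to roughly 66 MB/sec over its split time and 32 MB/sec over its overall time, so the quoted 31 MB/sec looks closest to the overall-based figure, while the quoted 107 MB/sec for call 26 matches its split-based figure.

```java
final class ThroughputCalc {
  /** Throughput in decimal megabytes (10^6 bytes) per second. */
  static double mbPerSec(long bytes, long nanos) {
    return (bytes / 1e6) / (nanos / 1e9);
  }

  public static void main(String[] args) {
    // Columns: request id, split ns, overall ns, data size in bytes (copied from the issue).
    long[][] samples = {
      {11726L, 64991689L, 133562895L, 4288937L},
      {12163L, 32743954L, 103787420L, 1606273L},
      {26L, 7985400L, 21546226L, 850429L},
    };
    for (long[] s : samples) {
      System.out.printf("%d: %.1f MB/sec over split, %.1f MB/sec overall%n",
          s[0], mbPerSec(s[3], s[1]), mbPerSec(s[3], s[2]));
    }
  }
}
```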
> Even at full line rate, we could still see unacceptable hold ups of unrelated 
> calls that just happen to need to talk to the same regionserver.
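
A rough way to size that hold-up: the sketch below computes how long a response of a given size occupies the single socket at an idealized GigE line rate of 125,000,000 bytes/sec (1 Gbit/s divided by 8, ignoring all protocol overhead, which is an assumption). `HeadOfLineDelay` is a hypothetical name. On these assumptions, the 4288937-byte response above holds the socket for about 34 ms, and a small unrelated response queued behind it waits at least that long.

```java
final class HeadOfLineDelay {
  /** Idealized GigE payload rate: 10^9 bits/sec with no protocol overhead (assumption). */
  static final double GIGE_BYTES_PER_SEC = 1_000_000_000.0 / 8;

  /** Milliseconds a response of the given size occupies the wire at the given rate. */
  static double wireTimeMs(long bytes, double bytesPerSec) {
    return bytes / bytesPerSec * 1000.0;
  }

  public static void main(String[] args) {
    long[] sizes = {4288937L, 3570569L, 1606273L, 850429L};  // data sizes from the issue
    for (long size : sizes) {
      System.out.printf("%d bytes -> %.1f ms on the wire%n",
          size, wireTimeMs(size, GIGE_BYTES_PER_SEC));
    }
  }
}
```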
> This issue is about these findings, what to do, how to improve. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
