[jira] [Updated] (HBASE-9775) Client write path perf issues

2013-11-05 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-9775:
-

Attachment: 9775.rig.v3.patch

v3 drops Mockito.  Mockito keeps a reference to every invocation so that it 
can keep running counts, and I could not figure out how to disable that.  The 
patch is better without it anyway.

v3 no longer has heap issues, at least at current 'scales'.

As it stands, the patch is configured to do the inverse of the previous patch: 
there is now a single 'server' with ten clients beating up on it.  It doesn't 
take long for the clients to overrun the server.  The server cannot respond 
in time, so we throw RegionBusyException more and more frequently, which I 
think simulates what Elliott was seeing on the 'big' cluster.

Will dig in tomorrow on what we can do on RegionBusyException, i.e. how to 
back off better (Elliott had ideas in here).
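
For reference, one common way to back off when a server keeps answering "busy" 
is capped exponential backoff with jitter.  The sketch below is only an 
illustration of that general technique.  It is not taken from the patch or 
from the HBase client, and the class name, method names, and parameters are 
all hypothetical.

```java
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical helper, not part of HBase: capped exponential backoff with jitter.
final class BackoffSketch {
    private BackoffSketch() {}

    /** Largest pause allowed for a 0-based attempt: baseMs * 2^attempt, capped at maxMs. */
    static long ceilingMs(int attempt, long baseMs, long maxMs) {
        if (attempt < 0 || baseMs <= 0 || maxMs < baseMs) {
            throw new IllegalArgumentException("need attempt >= 0 and 0 < baseMs <= maxMs");
        }
        long ceiling = baseMs;
        for (int i = 0; i < attempt && ceiling < maxMs; i++) {
            // Double the ceiling, but never past maxMs (this also avoids overflow).
            ceiling = (ceiling > maxMs / 2) ? maxMs : ceiling * 2;
        }
        return ceiling;
    }

    /** Pause to sleep before the given retry: uniform in [0, ceilingMs], so clients spread out. */
    static long pauseMs(int attempt, long baseMs, long maxMs) {
        long ceiling = ceilingMs(attempt, baseMs, maxMs);
        return ThreadLocalRandom.current().nextLong(ceiling + 1);
    }
}
```

A client would sleep for pauseMs(attempt, ...) before resubmitting after a busy 
response.  The jitter keeps the ten clients from retrying in lockstep against 
the single server.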

 Client write path perf issues
 -

 Key: HBASE-9775
 URL: https://issues.apache.org/jira/browse/HBASE-9775
 Project: HBase
  Issue Type: Bug
  Components: Client
Affects Versions: 0.96.0
Reporter: Elliott Clark
Priority: Critical
 Attachments: 9775.rig.txt, 9775.rig.v2.patch, 9775.rig.v3.patch, 
 Charts Search   Cloudera Manager - ITBLL.png, Charts Search   Cloudera 
 Manager.png, hbase-9775.patch, job_run.log, short_ycsb.png, ycsb.png, 
 ycsb_insert_94_vs_96.png


 Testing on larger clusters has not had the desired throughput increases.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HBASE-9775) Client write path perf issues

2013-10-30 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-9775:
-

Attachment: 9775.rig.v2.patch

Rebase for 0.96.

To use it, just run the main method of TestClientNoCluster.

After updating, there is no noticeable difference.  We run up to 100 threads 
and stay there, with nearly all of them waiting.

 Client write path perf issues
 -

 Key: HBASE-9775
 URL: https://issues.apache.org/jira/browse/HBASE-9775
 Project: HBase
  Issue Type: Bug
  Components: Client
Affects Versions: 0.96.0
Reporter: Elliott Clark
Priority: Critical
 Attachments: 9775.rig.txt, 9775.rig.v2.patch, Charts Search   
 Cloudera Manager - ITBLL.png, Charts Search   Cloudera Manager.png, 
 hbase-9775.patch, job_run.log, short_ycsb.png, ycsb_insert_94_vs_96.png, 
 ycsb.png


 Testing on larger clusters has not had the desired throughput increases.





[jira] [Updated] (HBASE-9775) Client write path perf issues

2013-10-29 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-9775:
-

Attachment: 9775.rig.txt

I'm trying to write a rig that the client can run in so we can inspect it.  
Attached is a bit of code that mocks a cluster of 1k servers and 100k regions.  
Currently it runs without throwing exceptions or failing.

When I put it under the profiler, we spin up 7 threads and that seems to keep 
us running nicely; we never go beyond 7.

If I add some friction by adding a pause to the mock Put handler, so that it 
takes time to process the puts, the thread count spins up and tops out at 100.  
That limit looks like AsyncProcess#maxTotalConcurrentTasks, which is 
configured by hbase.client.max.total.tasks.
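
The effect of such a cap can be reproduced outside HBase.  The sketch below 
is a stand-in, not the AsyncProcess implementation: it uses a 
java.util.concurrent.Semaphore to limit in-flight tasks and sleeps inside 
each task to play the role of the slow mock Put handler.  The class and 
method names are made up for illustration.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical model of a "max total tasks" cap; not HBase code.
final class TaskCapSketch {
    private TaskCapSketch() {}

    /** Runs `tasks` slow tasks with at most `maxTasks` in flight; returns the peak concurrency seen. */
    static int peakConcurrency(int maxTasks, int tasks, long taskMillis)
            throws InterruptedException {
        Semaphore permits = new Semaphore(maxTasks);
        AtomicInteger inFlight = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        ExecutorService pool = Executors.newCachedThreadPool();
        try {
            for (int i = 0; i < tasks; i++) {
                permits.acquire(); // the submitter blocks once the cap is reached
                pool.execute(() -> {
                    try {
                        peak.accumulateAndGet(inFlight.incrementAndGet(), Math::max);
                        Thread.sleep(taskMillis); // the "friction" in the handler
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    } finally {
                        inFlight.decrementAndGet();
                        permits.release();
                    }
                });
            }
        } finally {
            pool.shutdown();
            pool.awaitTermination(30, TimeUnit.SECONDS);
        }
        return peak.get();
    }
}
```

With fast tasks few threads are ever needed.  Once each task takes time, the 
pool grows until the permit count becomes the ceiling, which matches the 
shape of what the profiler showed.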

I suppose I should randomize the order of my puts; it is sort of ordered at 
the moment.  Even then, with 100 tasks against 1k servers, it looks like I'd 
be hitting only 1/10th of the servers at a time.

Let me update and see what [~liochon]'s recent changes do.


 Client write path perf issues
 -

 Key: HBASE-9775
 URL: https://issues.apache.org/jira/browse/HBASE-9775
 Project: HBase
  Issue Type: Bug
  Components: Client
Affects Versions: 0.96.0
Reporter: Elliott Clark
Priority: Critical
 Attachments: 9775.rig.txt, Charts Search   Cloudera Manager - 
 ITBLL.png, Charts Search   Cloudera Manager.png, hbase-9775.patch, 
 job_run.log, short_ycsb.png, ycsb_insert_94_vs_96.png, ycsb.png


 Testing on larger clusters has not had the desired throughput increases.





[jira] [Updated] (HBASE-9775) Client write path perf issues

2013-10-18 Thread Elliott Clark (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Elliott Clark updated HBASE-9775:
-

Attachment: ycsb.png

{quote}
I observed better write performances on the 0.96 than 0.94, by about 20% when 
inserting 100m of rows from an empty cluster. There are around 18 regions at 
this stage IIRC, so the cluster size should not matter that much when we start 
from an empty table. I've inserted around 1b w/o issue on 0.96.
{quote}

Our performance team independently ran some YCSB tests against HBase 0.94.6. 
Here's the graph that they generated.  Blue is 0.94.  Orange is 0.96.

X axis is target throughput
Y axis is actual throughput

 Client write path perf issues
 -

 Key: HBASE-9775
 URL: https://issues.apache.org/jira/browse/HBASE-9775
 Project: HBase
  Issue Type: Bug
  Components: Client
Affects Versions: 0.96.0
Reporter: Elliott Clark
Priority: Critical
 Attachments: Charts Search   Cloudera Manager - ITBLL.png, Charts 
 Search   Cloudera Manager.png, hbase-9775.patch, job_run.log, short_ycsb.png, 
 ycsb_insert_94_vs_96.png, ycsb.png


 Testing on larger clusters has not had the desired throughput increases.





[jira] [Updated] (HBASE-9775) Client write path perf issues

2013-10-17 Thread Jeffrey Zhong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeffrey Zhong updated HBASE-9775:
-

Attachment: hbase-9775.patch

I think I found a bug in AsyncProcess that hurts performance. Below is the 
code snippet:
{code}
  incTaskCounters(multiAction.getRegions(), loc.getServerName());
  Runnable runnable = Trace.wrap("AsyncProcess.sendMultiAction", new Runnable() {
    // ... (rest of the Runnable elided in this excerpt)
        receiveMultiAction(initialActions, multiAction, loc, res,
            numAttempt, errorsByServer);
      } finally {
        decTaskCounters(multiAction.getRegions(), loc.getServerName());
      }
{code}
receiveMultiAction resubmits failed edits recursively, from inside the try 
block. So when an error happens, the resubmission bumps the task counters 
again before the original task's finally block has decremented them. The 
overlap lasts a whole retry interval, which is a long time for client 
operations.
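
To make the double counting concrete, here is a minimal single-threaded 
model.  It is a sketch under assumptions, not the AsyncProcess code and not 
the attached patch: the class, its methods, and the "release before 
resubmitting" variant are all hypothetical and only show the counting.

```java
// Hypothetical model of the task counter; not HBase code.
final class TaskCounterSketch {
    private int inFlight = 0;
    private int peak = 0;

    private void inc() { inFlight++; peak = Math.max(peak, inFlight); }
    private void dec() { inFlight--; }

    /** Shape of the reported bug: the retry is resubmitted inside try, before dec() runs. */
    void sendNested(int failuresLeft) {
        inc();
        try {
            if (failuresLeft > 0) {
                // The retry pause would elapse here while this task is still counted.
                sendNested(failuresLeft - 1);
            }
        } finally {
            dec();
        }
    }

    /** One possible alternative: give the slot back first, then resubmit. */
    void sendReleasingFirst(int failuresLeft) {
        inc();
        try {
            // The call to the server would go here.
        } finally {
            dec();
        }
        if (failuresLeft > 0) {
            sendReleasingFirst(failuresLeft - 1);
        }
    }

    int peak() { return peak; }
    int inFlight() { return inFlight; }
}
```

In this model a single failed attempt makes sendNested count two tasks at 
once for one logical operation, while sendReleasingFirst never counts more 
than one.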

I attached a patch for your reference.

 Client write path perf issues
 -

 Key: HBASE-9775
 URL: https://issues.apache.org/jira/browse/HBASE-9775
 Project: HBase
  Issue Type: Bug
  Components: Client
Affects Versions: 0.96.0
Reporter: Elliott Clark
Priority: Critical
 Attachments: Charts Search   Cloudera Manager - ITBLL.png, Charts 
 Search   Cloudera Manager.png, hbase-9775.patch, job_run.log, short_ycsb.png, 
 ycsb_insert_94_vs_96.png


 Testing on larger clusters has not had the desired throughput increases.





[jira] [Updated] (HBASE-9775) Client write path perf issues

2013-10-16 Thread Elliott Clark (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Elliott Clark updated HBASE-9775:
-

Attachment: ycsb_insert_94_vs_96.png

I ran a 0.94 vs 0.96 comparison.  Here are the results.  You can see that 0.94 
handily beats 0.96 until compaction becomes the limiting factor.


 Client write path perf issues
 -

 Key: HBASE-9775
 URL: https://issues.apache.org/jira/browse/HBASE-9775
 Project: HBase
  Issue Type: Bug
  Components: Client
Affects Versions: 0.96.0
Reporter: Elliott Clark
Priority: Critical
 Attachments: Charts Search   Cloudera Manager.png, short_ycsb.png, 
 ycsb_insert_94_vs_96.png


 Testing on larger clusters has not had the desired throughput increases.





[jira] [Updated] (HBASE-9775) Client write path perf issues

2013-10-16 Thread Elliott Clark (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Elliott Clark updated HBASE-9775:
-

Attachment: Charts Search   Cloudera Manager - ITBLL.png

Here's what the network looked like at the time.

 Client write path perf issues
 -

 Key: HBASE-9775
 URL: https://issues.apache.org/jira/browse/HBASE-9775
 Project: HBase
  Issue Type: Bug
  Components: Client
Affects Versions: 0.96.0
Reporter: Elliott Clark
Priority: Critical
 Attachments: Charts Search   Cloudera Manager - ITBLL.png, Charts 
 Search   Cloudera Manager.png, job_run.log, short_ycsb.png, 
 ycsb_insert_94_vs_96.png


 Testing on larger clusters has not had the desired throughput increases.





[jira] [Updated] (HBASE-9775) Client write path perf issues

2013-10-16 Thread Elliott Clark (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Elliott Clark updated HBASE-9775:
-

Attachment: job_run.log

Here are the logs.  The region servers are running G1GC, so there should be 
no issues with GC pausing.

I'm seeing this as the pause times:
2013-10-16T10:19:23.182-0700: [GC pause (young), 0.10152600 secs]

All of the boxes are on 10 gig.

I ran:

{code}
hbase org.apache.hadoop.hbase.test.IntegrationTestBigLinkedList --monkey calm \
  Loop 2 154 2500 IntegrationTestBigLinkedList 77 > job_run.log 2>&1
{code}

So there should be 2 clients per region server.

 Client write path perf issues
 -

 Key: HBASE-9775
 URL: https://issues.apache.org/jira/browse/HBASE-9775
 Project: HBase
  Issue Type: Bug
  Components: Client
Affects Versions: 0.96.0
Reporter: Elliott Clark
Priority: Critical
 Attachments: Charts Search   Cloudera Manager - ITBLL.png, Charts 
 Search   Cloudera Manager.png, job_run.log, short_ycsb.png, 
 ycsb_insert_94_vs_96.png


 Testing on larger clusters has not had the desired throughput increases.





[jira] [Updated] (HBASE-9775) Client write path perf issues

2013-10-15 Thread Elliott Clark (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Elliott Clark updated HBASE-9775:
-

Summary: Client write path perf issues  (was: Client write path scales very 
badly with more servers)

 Client write path perf issues
 -

 Key: HBASE-9775
 URL: https://issues.apache.org/jira/browse/HBASE-9775
 Project: HBase
  Issue Type: Bug
  Components: Client
Affects Versions: 0.96.0
Reporter: Elliott Clark
Priority: Critical
 Attachments: Charts Search   Cloudera Manager.png, short_ycsb.png


 Testing on larger clusters has not had the desired throughput increases.


