Thanks for the input. One thing I am trying to understand is the retry behavior of the HBase client. I would expect the retries to help ride out a busy region server, at least to some degree. In the stack trace quoted below, however, the client failed on attempt #14/35 and then logged "NOT resubmitting" instead of trying again. Can anyone familiar with that part of the code explain why?
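While that question is open, here is a minimal sketch of the kind of application-level retry suggested in the quoted reply ("Add retry logic to the client (M/R task)"). It is not the HBase client's own retry code. `BusyException` is a hypothetical stand-in for RegionTooBusyException so that the sketch compiles without HBase on the classpath, and `BackoffRetry`, its method names, and its parameters are made up for illustration.

```java
import java.util.concurrent.Callable;

// Hypothetical stand-in for RegionTooBusyException (an assumption,
// used only so this sketch has no HBase dependency).
class BusyException extends Exception {
    BusyException(String message) {
        super(message);
    }
}

// Hypothetical helper: retries an operation with capped exponential backoff.
final class BackoffRetry {
    private BackoffRetry() {}

    // Pause before the retry that follows failed attempt `attempt` (0-based):
    // baseMillis * 2^attempt, capped at capMillis.
    static long pauseMillis(int attempt, long baseMillis, long capMillis) {
        long pause = baseMillis << Math.min(attempt, 20);
        return Math.min(pause, capMillis);
    }

    // Runs op up to maxAttempts times. Only BusyException triggers a retry;
    // any other exception propagates immediately.
    static <T> T call(Callable<T> op, int maxAttempts, long baseMillis, long capMillis)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        BusyException last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                return op.call();
            } catch (BusyException e) {
                last = e;
                if (attempt + 1 < maxAttempts) {
                    Thread.sleep(pauseMillis(attempt, baseMillis, capMillis));
                }
            }
        }
        throw last; // every attempt hit a busy region
    }
}
```

In the M/R task, the operation passed to `call` would be whatever wraps the write or the flush (the stack trace shows the failure surfacing from HTable.flushCommits), with the catch clause changed to the real exception type. Whether a failed flush can simply be re-issued depends on the client version, so treat that part as an assumption to verify.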
On Tue, Feb 4, 2014 at 6:38 PM, Vladimir Rodionov <[email protected]> wrote:

> Forgot to add ..
>
> * Throttle insertion rate.
>
> If you have 100 simultaneous task - set 100 -200 inserts per sec per task
> for starter
> (depends on a cluster power, of course).
>
> Best regards,
> Vladimir Rodionov
> Principal Platform Engineer
> Carrier IQ, www.carrieriq.com
> e-mail: [email protected]
>
> ________________________________________
> From: Vladimir Rodionov [[email protected]]
> Sent: Tuesday, February 04, 2014 6:36 PM
> To: [email protected]
> Subject: Re: hbase client RetriesExhaustedWithDetailsException with RegionTooBusyException
>
> Busy - means busy.
> Doc says:
> Thrown by a region server if it will block and wait to serve a request. For
> example, the client wants to insert something to a region while the
> region is compacting.
>
> >> I have a MR job that is loading data into hbase using hbase client API.
> You should avoid this. If you can't :
>
> * Reduce # of M/R tasks.
> * Disable major compaction
> * Disable region splitting
> * Increase max region size.
> * Add retry logic to the client (M/R task).
>
> Check bulk loading.
>
> On Tue, Feb 4, 2014 at 5:44 PM, Jerry He <[email protected]> wrote:
>
> > Hi, hbase experts:
> >
> > You probably have seen this in the past. Can someone share some quick
> > ideas?
> > I have a MR job that is loading data into hbase using hbase client API.
> > The job failed after a while. Below is the error info and stack trace.
> >
> > 2014-02-04 14:15:32,587 WARN org.apache.hadoop.hbase.client.AsyncProcess: Attempt #6/35 failed for 182 ops on hdtest203.svl.ibm.com,60020,1391489734651 NOT resubmitting. region=tpch_hb_1000.lineitem,\x01\x8Ao\x83\xF0\x01\x80'`\x04\x01\x80\x00\x00\x00\x02ufc\x01\x80\x00\x00\x01,1391544629530.b4a41acd34723629c417571b524b80ab., hostname=hdtest203.svl.ibm.com,60020,1391489734651, seqNum=682710
> > 2014-02-04 14:19:12,987 INFO org.apache.hadoop.hbase.client.AsyncProcess: : Waiting for the global number of running tasks to be equals or less than 0, tasksSent=302, tasksDone=301, currentTasksDone=301, tableName=tpch_hb_1000.lineitem
> > 2014-02-04 14:19:32,452 INFO org.apache.hadoop.hbase.client.AsyncProcess: : Waiting for the global number of running tasks to be equals or less than 0, tasksSent=306, tasksDone=305, currentTasksDone=305, tableName=tpch_hb_1000.lineitem
> > 2014-02-04 14:19:42,544 INFO org.apache.hadoop.hbase.client.AsyncProcess: : Waiting for the global number of running tasks to be equals or less than 0, tasksSent=307, tasksDone=306, currentTasksDone=306, tableName=tpch_hb_1000.lineitem
> > 2014-02-04 14:19:52,624 INFO org.apache.hadoop.hbase.client.AsyncProcess: : Waiting for the global number of running tasks to be equals or less than 0, tasksSent=308, tasksDone=307, currentTasksDone=307, tableName=tpch_hb_1000.lineitem
> > 2014-02-04 14:20:03,044 INFO org.apache.hadoop.hbase.client.AsyncProcess: : Waiting for the global number of running tasks to be equals or less than 0, tasksSent=309, tasksDone=308, currentTasksDone=308, tableName=tpch_hb_1000.lineitem
> > 2014-02-04 14:20:13,133 INFO org.apache.hadoop.hbase.client.AsyncProcess: : Waiting for the global number of running tasks to be equals or less than 0, tasksSent=310, tasksDone=309, currentTasksDone=309, tableName=tpch_hb_1000.lineitem
> > 2014-02-04 14:20:23,240 INFO org.apache.hadoop.hbase.client.AsyncProcess: : Waiting for the global number of running tasks to be equals or less than 0, tasksSent=311, tasksDone=310, currentTasksDone=310, tableName=tpch_hb_1000.lineitem
> > 2014-02-04 14:20:33,328 INFO org.apache.hadoop.hbase.client.AsyncProcess: : Waiting for the global number of running tasks to be equals or less than 0, tasksSent=312, tasksDone=311, currentTasksDone=311, tableName=tpch_hb_1000.lineitem
> > 2014-02-04 14:20:38,444 WARN org.apache.hadoop.hbase.client.AsyncProcess: Attempt #14/35 failed for 837 ops on hdtest203.svl.ibm.com,60020,1391489734651 NOT resubmitting. region=tpch_hb_1000.lineitem,\x01\x8Ao\x83\xF0\x01\x80'`\x04\x01\x80\x00\x00\x00\x02ufc\x01\x80\x00\x00\x01,1391544629530.b4a41acd34723629c417571b524b80ab., hostname=hdtest203.svl.ibm.com,60020,1391489734651, seqNum=682710
> > 2014-02-04 14:20:38,452 INFO org.apache.hadoop.mapred.TaskLogsTruncater: Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1
> > 2014-02-04 14:20:38,472 ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:hive (auth:SIMPLE) cause:org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 837 actions: RegionTooBusyException: 837 times,
> > 2014-02-04 14:20:38,473 WARN org.apache.hadoop.mapred.Child: Error running child
> > org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 837 actions: RegionTooBusyException: 837 times,
> >     at org.apache.hadoop.hbase.client.AsyncProcess$BatchErrors.makeException(AsyncProcess.java:185)
> >     at org.apache.hadoop.hbase.client.AsyncProcess$BatchErrors.access$500(AsyncProcess.java:169)
> >     at org.apache.hadoop.hbase.client.AsyncProcess.getErrors(AsyncProcess.java:782)
> >     at org.apache.hadoop.hbase.client.HTable.backgroundFlushCommits(HTable.java:934)
> >     at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:1193)
> >     at com.ibm.jaql.module.hbase.HBaseRecordWriter.close(HBaseRecordWriter.java:42)
> >     at com.ibm.jaql.io.hadoop.CompositeOutputAdapter$1.close(CompositeOutputAdapter.java:383)
> >     at org.apache.hadoop.mapred.MapTask$DirectMapOutputCollector.close(MapTask.java:806)
> >     at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:439)
> >     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
> >     at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
> >     at java.security.AccessController.doPrivileged(AccessController.java:310)
> >     at javax.security.auth.Subject.doAs(Subject.java:573)
> >     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1502)
> >     at org.apache.hadoop.mapred.Child.main(Child.java:249)
> > 2014-02-04 14:20:38,477 INFO org.apache.hadoop.mapred.Task: Runnning cleanup for the task
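On the throttling suggestion quoted above (100 to 200 inserts per second per task), a rough sketch of a per-task pacer follows. The class `InsertThrottle` and its methods are hypothetical names, not part of HBase or Hadoop. The idea is simply that the n-th insert should not start before n / rate seconds have passed since the task began.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical per-task pacer: keeps the average insert rate at or below
// opsPerSecond by sleeping before an insert that would run ahead of schedule.
final class InsertThrottle {
    private final double opsPerSecond;
    private final long startNanos;
    private long opsDone = 0;

    InsertThrottle(double opsPerSecond) {
        if (opsPerSecond <= 0) {
            throw new IllegalArgumentException("opsPerSecond must be > 0");
        }
        this.opsPerSecond = opsPerSecond;
        this.startNanos = System.nanoTime();
    }

    // Nanoseconds to wait so that the next operation (after opsDone completed
    // ones) does not start before opsDone / opsPerSecond seconds have elapsed.
    static long waitNanos(long opsDone, long elapsedNanos, double opsPerSecond) {
        long earliestStart = (long) (opsDone * 1_000_000_000.0 / opsPerSecond);
        return Math.max(0L, earliestStart - elapsedNanos);
    }

    // Call once before each insert; blocks only when the task is ahead of pace.
    void acquire() throws InterruptedException {
        long wait = waitNanos(opsDone, System.nanoTime() - startNanos, opsPerSecond);
        opsDone++;
        if (wait > 0) {
            TimeUnit.NANOSECONDS.sleep(wait);
        }
    }
}
```

A task would create one `new InsertThrottle(200)` and call `acquire()` before each put. Since the client buffers writes and sends them in batches, this only bounds the average rate per task, not the size of an individual flush.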
