On the target region server, I don't see any significant problem during
the same time window, apart from the 'Connection reset by peer' and
'Broken pipe' errors:
2014-02-04 14:20:01,462 INFO
org.apache.hadoop.hbase.regionserver.DefaultStoreFlusher: Flushed,
sequenceid=986466, memsize=255.7 M, hasBloomFilter=true, into tmp file
gpfs:/hbase/data/default/tpch_hb_1000.lineitem/b4a41acd34723629c417571b524b80ab/.tmp/6ca03f6cd05140df88f243592cd09fb5
2014-02-04 14:20:39,033 WARN org.apache.hadoop.ipc.RpcServer:
RpcServer.listener,port=60020: count of bytes read: 0
java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcher.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:33)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:210)
at sun.nio.ch.IOUtil.read(IOUtil.java:183)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:257)
at org.apache.hadoop.hbase.ipc.RpcServer.channelRead(RpcServer.java:2368)
at org.apache.hadoop.hbase.ipc.RpcServer$Connection.readAndProcess(RpcServer.java:1403)
at org.apache.hadoop.hbase.ipc.RpcServer$Listener.doRead(RpcServer.java:770)
at org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.doRunLoop(RpcServer.java:563)
at org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.run(RpcServer.java:538)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:897)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:919)
at java.lang.Thread.run(Thread.java:738)
2014-02-04 14:20:39,033 WARN org.apache.hadoop.ipc.RpcServer:
RpcServer.respondercallId: 534 service: ClientService methodName: Multi
size: 203.6 K connection: 9.30.194.217:62998: output error
2014-02-04 14:20:39,033 INFO org.apache.hadoop.ipc.RpcServer:
RpcServer.responder: asyncWrite
java.io.IOException: Broken pipe
at sun.nio.ch.FileDispatcher.write0(Native Method)
at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:41)
at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:81)
at sun.nio.ch.IOUtil.write(IOUtil.java:52)
at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:348)
at org.apache.hadoop.hbase.ipc.RpcServer.channelIO(RpcServer.java:2402)
at org.apache.hadoop.hbase.ipc.RpcServer.channelWrite(RpcServer.java:2345)
at org.apache.hadoop.hbase.ipc.RpcServer$Responder.processResponse(RpcServer.java:985)
at org.apache.hadoop.hbase.ipc.RpcServer$Responder.doAsyncWrite(RpcServer.java:925)
at org.apache.hadoop.hbase.ipc.RpcServer$Responder.doRunLoop(RpcServer.java:854)
at org.apache.hadoop.hbase.ipc.RpcServer$Responder.run(RpcServer.java:830)
2014-02-04 14:20:39,253 WARN org.apache.hadoop.ipc.RpcServer:
RpcServer.respondercallId: 553 service: ClientService methodName: Multi
size: 304.8 K connection: 9.30.194.218:11920: output error
2014-02-04 14:20:39,253 INFO org.apache.hadoop.ipc.RpcServer:
RpcServer.responder: asyncWrite
java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcher.write0(Native Method)
at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:41)
at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:81)
at sun.nio.ch.IOUtil.write(IOUtil.java:52)
at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:348)
at org.apache.hadoop.hbase.ipc.RpcServer.channelIO(RpcServer.java:2402)
at org.apache.hadoop.hbase.ipc.RpcServer.channelWrite(RpcServer.java:2345)
at org.apache.hadoop.hbase.ipc.RpcServer$Responder.processResponse(RpcServer.java:985)
at org.apache.hadoop.hbase.ipc.RpcServer$Responder.doAsyncWrite(RpcServer.java:925)
at org.apache.hadoop.hbase.ipc.RpcServer$Responder.doRunLoop(RpcServer.java:854)
at org.apache.hadoop.hbase.ipc.RpcServer$Responder.run(RpcServer.java:830)
2014-02-04 14:20:41,802 INFO org.apache.hadoop.hbase.regionserver.HStore:
Added
gpfs:/hbase/data/default/tpch_hb_1000.lineitem/b4a41acd34723629c417571b524b80ab/cf/6ca03f6cd05140df88f243592cd09fb5,
entries=1275365, sequenceid=986466, filesize=86.4 M
2014-02-04 14:20:41,802 INFO org.apache.hadoop.hbase.regionserver.HRegion:
Finished memstore flush of ~256.6 M/269107976, currentsize=0/0 for region
tpch_hb_1000.lineitem,\x01\x8Ao\x83\xF0\x01\x80'`\x04\x01\x80\x00\x00\x00\x02ufc\x01\x80\x00\x00\x01,1391544629530.b4a41acd34723629c417571b524b80ab.
in 44074ms, sequenceid=986466, compaction requested=true
On Tue, Feb 4, 2014 at 5:44 PM, Jerry He wrote:
> Hi, HBase experts:
>
> You have probably seen this in the past. Can someone share some quick
> ideas?
> I have an MR job that loads data into HBase using the HBase client API.
> The job failed after a while. Below are the error info and stack trace.
>
> 2014-02-04 14:15:32,587 WARN org.apache.hadoop.hbase.client.AsyncProcess:
> Attempt #6/35 failed for 182 ops on hdtest203.svl.ibm.com,60020,1391489734651
> NOT resubmitting.
> region=tpch_hb_1000.lineitem,\x01\x8Ao\x83\xF0\x01\x80'`\x04\x01\x80\x00\x00\x00\x02ufc\x01\x80\x00\x00\x01,1391544629530.b4a41acd34723629c417571b524b80ab.,
> hostname=hdtest203.svl.ibm.com,60020,1391489734651, seqNum=682710
> 2014-02-04 14:19:12,987 INFO org.apache.hadoop.hbase.client.AsyncProcess:
> : Waiting for the global number of running tasks to be equals or less than
> 0, tasksSent=302, tasksDone=301, currentTasksDone=301,
> tableName=tpch_hb_1000.lineitem
> 2014-02-04 14:19:32,452 INFO org.apache.hadoop.hbase.client.AsyncProcess:
> : Waiting for the global number of running tasks to be equals or less than
> 0, tasksSent=306, tasksDone=305, currentTasksDone=305,
> tableName=tpch_hb_1000.lineitem
> 2014-02-04 14:19:42,544 INFO org.apache.hadoop.hbase.client.AsyncProcess:
> : Waiting for the global number of running tasks to be equals or less than
> 0, tasksSent=307, tasksDone=306, currentTasksDone=306,
> tableName=tpch_hb_1000.lineitem
> 2014-02-04 14:19:52,624 INFO org.apache.hadoop.hbase.client.AsyncProcess:
> : Waiting for the global number of running tasks to be equals or less than
> 0, tasksSent=308, tasksDone=307, currentTasksDone=307,
> tableName=tpch_hb_1000.lineitem
> 2014-02-04 14:20:03,044 INFO org.apache.hadoop.hbase.client.AsyncProcess:
> : Waiting for the global number of running tasks to be equals or less than
> 0, tasksSent=309, tasksDone=308, currentTasksDone=308,
> tableName=tpch_hb_1000.lineitem
> 2014-02-04 14:20:13,133 INFO org.apache.hadoop.hbase.client.AsyncProcess:
> : Waiting for the global number of running tasks to be equals or less than
> 0, tasksSent=310, tasksDone=309, currentTasksDone=309,
> tableName=tpch_hb_1000.lineitem
> 2014-02-04 14:20:23,240 INFO org.apache.hadoop.hbase.client.AsyncProcess:
> : Waiting for the global number of running tasks to be equals or less than
> 0, tasksSent=311, tasksDone=310, currentTasksDone=310,
> tableName=tpch_hb_1000.lineitem
> 2014-02-04 14:20:33,328 INFO org.apache.hadoop.hbase.client.AsyncProcess:
> : Waiting for the global number of running tasks to be equals or less than
> 0, tasksSent=312, tasksDone=311, currentTasksDone=311,
> tableName=tpch_hb_1000.lineitem
> 2014-02-04 14:20:38,444 WARN org.apache.hadoop.hbase.client.AsyncProcess:
> Attempt #14/35 failed for 837 ops on hdtest203.svl.ibm.com,60020,1391489734651
> NOT resubmitting.
> region=tpch_hb_1000.lineitem,\x01\x8Ao\x83\xF0\x01\x80'`\x04\x01\x80\x00\x00\x00\x02ufc\x01\x80\x00\x00\x01,1391544629530.b4a41acd34723629c417571b524b80ab.,
> hostname=hdtest203.svl.ibm.com,60020,1391489734651, seqNum=682710
> 2014-02-04 14:20:38,452 INFO org.apache.hadoop.mapred.TaskLogsTruncater:
> Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1
> 2014-02-04 14:20:38,472 ERROR
> org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException
> as:hive (auth:SIMPLE)
> cause:org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException:
> Failed 837 actions: RegionTooBusyException: 837 times,
> 2014-02-04 14:20:38,473 WARN org.apache.hadoop.mapred.Child: Error running
> child
> org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException:
> Failed 837 actions: RegionTooBusyException: 837 times,
> at org.apache.hadoop.hbase.client.AsyncProcess$BatchErrors.makeException(AsyncProcess.java:185)
> at org.apache.hadoop.hbase.client.AsyncProcess$BatchErrors.access$500(AsyncProcess.java:169)
> at org.apache.hadoop.hbase.client.AsyncProcess.getErrors(AsyncProcess.java:782)
> at org.apache.hadoop.hbase.client.HTable.backgroundFlushCommits(HTable.java:934)
> at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:1193)
> at com.ibm.jaql.module.hbase.HBaseRecordWriter.close(HBaseRecordWriter.java:42)
> at com.ibm.jaql.io.hadoop.CompositeOutputAdapter$1.close(CompositeOutputAdapter.java:383)
> at org.apache.hadoop.mapred.MapTask$DirectMapOutputCollector.close(MapTask.java:806)
> at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:439)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
> at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
> at java.security.AccessController.doPrivileged(AccessController.java:310)
> at javax.security.auth.Subject.doAs(Subject.java:573)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1502)
> at org.apache.hadoop.mapred.Child.main(Child.java:249)
> 2014-02-04 14:20:38,477 INFO org.apache.hadoop.mapred.Task: Runnning
> cleanup for the task
>
>