Yes Esteban, I have checked the health of the datanodes from the master in
the Hadoop console. Nothing there looks wrong enough to cause this, even
though one datanode was apparently lost along with the RS while inserting
the 50 million updates... the other 11 are up and running, so the cluster
should just pick up from there (as long as replication is happening as it
should through the HDFS write pipeline). I also thought of row-key
hotspotting on the HBase writes, or some problem in the Hadoop namenode,
so I am checking that now...
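
For reference, this is roughly what I am running from the master to
confirm datanode status (the grep filter is just what I usually add, so
take it as an example rather than exactly what I typed):

    # summary of live/dead datanodes and remaining space per node
    ./hadoop dfsadmin -report | grep -E "Datanodes available|Name:|DFS Remaining"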

I will keep investigating and let you know. In fact my first thought was
the same as yours, but ./hadoop fsck / shows all "active" nodes as healthy
and detects no filesystem-level inconsistencies (the first thing I checked
before sending the post). Running the HBase hbck consistency check from
the command line behaves differently, of course: it misses the mentioned
RS and throws the corresponding exception in the log... so that is a weird
one. I might check the namenode before I get back to you on this; I can't
think of anything else as of now. Space is not unlimited but is sufficient
on each of the 12 datanodes, although it was getting close to its limit on
the mentioned dead RS, so yes, writes are not very well balanced, but as I
understand it that is definitely not the issue.
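
In case it helps, these are the two checks I am comparing (the extra
flags are only the ones I typically pass, nothing special):

    # HDFS-level check: comes back healthy, no missing or corrupt blocks
    ./hadoop fsck / -files -blocks -locations

    # HBase-level check: this is the one that complains about the dead RS
    ./hbase hbck -details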


On 5 April 2014 19:16, Esteban Gutierrez <este...@cloudera.com> wrote:

> Álvaro,
>
> Have you checked the health of HDFS? Maybe your cluster ran out of
> space or you don't have any datanodes running.
>
> Esteban
>
> > On Apr 5, 2014, at 10:11, haosdent <haosd...@gmail.com> wrote:
> >
> > From the log information, it seems you lost blocks.
> > On 2014-4-6 at 12:38 AM, "Álvaro Recuero" <algar...@gmail.com> wrote:
> >
> >> Has anyone come across this before? There is still space in the RS and
> >> this is not a problem of datanode availability, as I can confirm. Cheers
> >>
> >> 2014-04-05 09:55:19,210 DEBUG org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter: using new createWriter -- HADOOP-6840
> >> 2014-04-05 09:55:19,211 DEBUG org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter: Path=hdfs://taurus-5.lyon.grid5000.fr:9000/hbase/usertable/fc55e2d2d4bcec49d6fedf5a469353b9/recovered.edits/0000000000002550928.temp, syncFs=true, hflush=false, compression=false
> >> 2014-04-05 09:55:19,211 DEBUG org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Creating writer path=hdfs://taurus-5.lyon.grid5000.fr:9000/hbase/usertable/fc55e2d2d4bcec49d6fedf5a469353b9/recovered.edits/0000000000002550928.temp region=fc55e2d2d4bcec49d6fedf5a469353b9
> >> 2014-04-05 09:55:19,233 DEBUG org.apache.hadoop.hbase.regionserver.SplitLogWorker: tasks arrived or departed
> >> 2014-04-05 09:55:19,233 WARN org.apache.hadoop.hdfs.DFSClient: DataStreamer Exception: org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /hbase/usertable/237859a0b1e47c86c25a6123506ccb2a/recovered.edits/0000000000002550921.temp could only be replicated to 0 nodes, instead of 1
> >>        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1558)
> >>        at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:696)
> >>        at sun.reflect.GeneratedMethodAccessor12.invoke(Unknown Source)
> >>        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> >>        at java.lang.reflect.Method.invoke(Method.java:616)
> >>        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:563)
> >>        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1388)
> >>        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1384)
> >>        at java.security.AccessController.doPrivileged(Native Method)
> >>        at javax.security.auth.Subject.doAs(Subject.java:416)
> >>        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
> >>        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1382)
> >>
> >>        at org.apache.hadoop.ipc.Client.call(Client.java:1070)
> >>        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:225)
> >>        at sun.proxy.$Proxy9.addBlock(Unknown Source)
> >>        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> >>        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> >>        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> >>        at java.lang.reflect.Method.invoke(Method.java:616)
> >>        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
> >>        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
> >>        at sun.proxy.$Proxy9.addBlock(Unknown Source)
> >>        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:3510)
> >>        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:3373)
> >>        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2600(DFSClient.java:2589)
> >>        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2829)
> >>
> >> 2014-04-05 09:55:19,233 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block null bad datanode[0] nodes == null
> >> 2014-04-05 09:55:19,233 WARN org.apache.hadoop.hdfs.DFSClient: Could not get block locations. Source file "/hbase/usertable/237859a0b1e47c86c25a6123506ccb2a/recovered.edits/0000000000002550921.temp" - Aborting...
> >>
>
