Did you set replication to 1? The following error message indicates that the default replication is set to 1:

    could only be replicated to 0 nodes, instead of 1

In that case, losing a datanode would mean blocks will be lost.
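To check, the relevant knob is dfs.replication. A minimal sketch of what to look for and how to fix it, assuming a stock Hadoop 1.x setup (the value 3 and the /hbase path are illustrative, not taken from your cluster):

    <!-- hdfs-site.xml: replication factor applied to newly created files -->
    <property>
      <name>dfs.replication</name>
      <value>3</value>  <!-- a value of 1 means one lost datanode loses blocks -->
    </property>

    # Existing files keep the factor they were written with,
    # so re-replicate them explicitly:
    hadoop fs -setrep -R 3 /hbase

Raising dfs.replication only affects files created afterwards, which is why the -setrep pass over the existing HBase data is needed as well.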
Enis


On Fri, Apr 25, 2014 at 1:32 AM, Álvaro Recuero <algar...@gmail.com> wrote:
> Data nodes are fine. Actually, the region server on that serverxxxxx is the
> only one dead afterwards. The datanode is up, and HDFS is reporting a
> healthy status. Interesting that that is even possible.
>
> I have consistently come across the problem again while testing a new
> HBase cluster, so yes, I would bet the problem is in HDFS somehow.
> Probably something is missing, yes.
>
> 2014-04-24 17:59:30,003 WARN org.apache.hadoop.hdfs.DFSClient: Error
> Recovery for block null bad datanode[0] nodes == null
> 2014-04-24 17:59:30,003 WARN org.apache.hadoop.hdfs.DFSClient: Could not
> get block locations. Source file
> "/hbase/.logs/serverxxxxx,1398350408274/serverxxxxx%2C60020%2C1398350408274.1398350409004"
> - Aborting...
> 2014-04-24 17:59:30,003 ERROR org.apache.hadoop.hbase.regionserver.wal.HLog:
> syncer encountered error, will retry. txid=1
> org.apache.hadoop.ipc.RemoteException: java.io.IOException: File
> /hbase/.logs/serverxxxxx,60020,1398350408274/serverxxxxx%2C60020%2C1398350408274.1398350409004
> could only be replicated to 0 nodes, instead of 1
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1558)
>         at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:696)
>         at sun.reflect.GeneratedMethodAccessor10.invoke(Unknown Source)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:616)
>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:563)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1388)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1384)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:416)
>
>
> On 5 April 2014 21:58, Álvaro Recuero <algar...@gmail.com> wrote:
>
> > Yes, Esteban, I have checked the health of the datanodes from the master
> > in the Hadoop console. Nothing seems wrong enough to cause this, even
> > though one datanode is apparently lost along with the RS in the process
> > of inserting 50 million updates... the other 11 are there, up and
> > running, so it should pick up the next one and that is it (as long as
> > it is replicating as it should through the HDFS pipelining process). I
> > thought of HBase write-key hotspotting or some problem in the Hadoop
> > namenode, so I am checking that now...
> >
> > I will keep investigating and let you know. In fact, my first thought
> > was the same as yours too, but ./hadoop fsck / is showing that all
> > "active" nodes are healthy and no file-system-level inconsistencies are
> > detected (the first thing I checked before sending the post). Of
> > course, running the HBase hbck consistency check from the command line
> > behaves differently: it misses the mentioned RS and throws the
> > corresponding exception in the log... that is a weird one then... I
> > might check the namenode before I get back to you on this. I can't
> > think of anything else as of now. Space is not unlimited, yet
> > sufficient on each of the 12 datanodes, though it is getting close to
> > its limit on the node with the mentioned dead RS; so yes, writes are
> > not very balanced yet, but that is definitely not the issue as I
> > understand it.
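As an aside, the two checks described above map to commands along these lines (a sketch; exact output varies by version, and the fsck flags here are the common Hadoop 1.x ones):

    # HDFS view: block-level health, per-file replication, missing/corrupt blocks
    hadoop fsck / -files -blocks -locations

    # HBase view: region consistency and assignment; this is the check that
    # noticed the missing region server
    hbase hbck

A HEALTHY fsck alongside hbck complaints fits the symptoms in this thread: the blocks already on disk are fine, but new blocks (the WAL and recovered.edits files) cannot find a datanode to land on.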
> >
> > On 5 April 2014 19:16, Esteban Gutierrez <este...@cloudera.com> wrote:
> >
> >> Álvaro,
> >>
> >> Have you checked the health of HDFS? Maybe your cluster ran out of
> >> space, or you don't have datanodes running.
> >>
> >> Esteban
> >>
> >> > On Apr 5, 2014, at 10:11, haosdent <haosd...@gmail.com> wrote:
> >> >
> >> > From the log information, it seems you lost blocks.
> >> > On 2014-4-6 at 12:38 AM, "Álvaro Recuero" <algar...@gmail.com> wrote:
> >> >
> >> >> Has anyone come across this before? There is still space on the RS,
> >> >> and this is not a problem of datanode availability, as I can
> >> >> confirm. Cheers.
> >> >>
> >> >> 2014-04-05 09:55:19,210 DEBUG org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter:
> >> >> using new createWriter -- HADOOP-6840
> >> >> 2014-04-05 09:55:19,211 DEBUG org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter:
> >> >> Path=hdfs://taurus-5.lyon.grid5000.fr:9000/hbase/usertable/fc55e2d2d4bcec49d6fedf5a469353b9/recovered.edits/0000000000002550928.temp,
> >> >> syncFs=true, hflush=false, compression=false
> >> >> 2014-04-05 09:55:19,211 DEBUG org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Creating writer
> >> >> path=hdfs://taurus-5.lyon.grid5000.fr:9000/hbase/usertable/fc55e2d2d4bcec49d6fedf5a469353b9/recovered.edits/0000000000002550928.temp
> >> >> region=fc55e2d2d4bcec49d6fedf5a469353b9
> >> >> 2014-04-05 09:55:19,233 DEBUG org.apache.hadoop.hbase.regionserver.SplitLogWorker:
> >> >> tasks arrived or departed
> >> >> 2014-04-05 09:55:19,233 WARN org.apache.hadoop.hdfs.DFSClient: DataStreamer
> >> >> Exception: org.apache.hadoop.ipc.RemoteException: java.io.IOException: File
> >> >> /hbase/usertable/237859a0b1e47c86c25a6123506ccb2a/recovered.edits/0000000000002550921.temp
> >> >> could only be replicated to 0 nodes, instead of 1
> >> >>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1558)
> >> >>         at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:696)
> >> >>         at sun.reflect.GeneratedMethodAccessor12.invoke(Unknown Source)
> >> >>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> >> >>         at java.lang.reflect.Method.invoke(Method.java:616)
> >> >>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:563)
> >> >>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1388)
> >> >>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1384)
> >> >>         at java.security.AccessController.doPrivileged(Native Method)
> >> >>         at javax.security.auth.Subject.doAs(Subject.java:416)
> >> >>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
> >> >>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1382)
> >> >>
> >> >>         at org.apache.hadoop.ipc.Client.call(Client.java:1070)
> >> >>         at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:225)
> >> >>         at sun.proxy.$Proxy9.addBlock(Unknown Source)
> >> >>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> >> >>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> >> >>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> >> >>         at java.lang.reflect.Method.invoke(Method.java:616)
> >> >>         at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
> >> >>         at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
> >> >>         at sun.proxy.$Proxy9.addBlock(Unknown Source)
> >> >>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:3510)
> >> >>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:3373)
> >> >>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2600(DFSClient.java:2589)
> >> >>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2829)
> >> >>
> >> >> 2014-04-05 09:55:19,233 WARN org.apache.hadoop.hdfs.DFSClient: Error
> >> >> Recovery for block null bad datanode[0] nodes == null
> >> >> 2014-04-05 09:55:19,233 WARN org.apache.hadoop.hdfs.DFSClient: Could
> >> >> not get block locations. Source file
> >> >> "/hbase/usertable/237859a0b1e47c86c25a6123506ccb2a/recovered.edits/0000000000002550921.temp"
> >> >> - Aborting...
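For completeness: "could only be replicated to 0 nodes" is raised by the namenode when no registered datanode is eligible to receive the new block, whether because none are live or because the live ones have no usable space left. A quick way to see the namenode's view, assuming Hadoop 1.x command names (later releases spell it hdfs dfsadmin):

    # Live/dead datanode counts plus configured, used, and remaining
    # capacity per node, as the namenode sees them
    hadoop dfsadmin -report

    # A node that reports as live but with near-zero "DFS Remaining" is
    # excluded from write pipelines even while fsck still reports HEALTHY

Given the note above that space was getting close to its limit on the node with the dead region server, the remaining-capacity figures are the first thing worth checking.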