True, I don't recall 0.20.2 (the original release from a few years ago)
carrying these fixes. You ought to upgrade that cluster to the current
stable release to benefit from the many fixes since then :)
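For anyone hitting this later, the recovery steps discussed further down
this thread can be sketched roughly like so. All paths are illustrative
assumptions (substitute your own dfs.name.dir entries); a 0-byte fsimage
matches the known truncation issue mentioned below.

```shell
# Report whether each configured dfs.name.dir holds a usable fsimage.
check_fsimage() {
  for d in "$@"; do
    f="$d/current/fsimage"
    if [ -s "$f" ]; then
      echo "OK:  $f ($(wc -c < "$f") bytes)"
    else
      echo "BAD: $f missing or 0-sized"
    fi
  done
}

# Example: check_fsimage /data/1/dfs/name /data/2/dfs/name
#
# If one dir is good: back up ALL the name dirs first, then copy the good
# 'current/' contents over the bad one and restart the NameNode.
# If none are good: restore the SecondaryNameNode's last checkpoint by
# making it available via fs.checkpoint.dir and starting the NN once with:
#   hadoop namenode -importCheckpoint
```

The `-importCheckpoint` startup option exists in the 0.20 line; the
copy-across approach is only safe after the originals are backed up.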

On Mon, May 14, 2012 at 11:58 PM, Prashant Kommireddi
<prash1...@gmail.com> wrote:
> Thanks Harsh. I am using 0.20.2; from the JIRA it looks like this issue
> was fixed in 0.23?
>
> I will try out your suggestions and get back.
>
> On May 14, 2012, at 1:22 PM, Harsh J <ha...@cloudera.com> wrote:
>
>> Your fsimage seems to have gone bad (is it 0-sized? I recall that as a
>> known issue long since fixed).
>>
>> The easiest way is to fall back to the last available good checkpoint
>> (from the SNN). Or, if you have multiple dfs.name.dirs, see if some of
>> the other dirs have better/complete files on them, and re-spread those
>> across after testing them out (and backing up the originals).
>>
>> Though, what version are you running? Because AFAIK most of the recent
>> stable versions/distros include NN resource-monitoring threads that
>> should have placed your NN into safemode the moment all its disks ran
>> nearly out of space.
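>>
>> (For reference, in releases that carry that resource checker the
>> low-disk threshold is configurable; the property name below is from
>> later hdfs-default.xml, and the value shown is the usual 100 MB
>> default, not something read off your cluster:)
>>
>> ```xml
>> <property>
>>   <name>dfs.namenode.resource.du.reserved</name>
>>   <!-- NN enters safemode when free space on a name dir drops below this -->
>>   <value>104857600</value>
>> </property>
>> ```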
>>
>> On Mon, May 14, 2012 at 10:50 PM, Prashant Kommireddi
>> <prash1...@gmail.com> wrote:
>>> Hi,
>>>
>>> I am seeing an issue where the Namenode does not start due to an
>>> EOFException. The disk was full and I cleared space up, but I am unable
>>> to get past this exception. Any ideas on how this can be resolved?
>>>
>>> 2012-05-14 10:10:44,018 INFO
>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: supergroup=hadoop
>>> 2012-05-14 10:10:44,018 INFO
>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
>>> isPermissionEnabled=false
>>> 2012-05-14 10:10:44,023 INFO
>>> org.apache.hadoop.hdfs.server.namenode.metrics.FSNamesystemMetrics:
>>> Initializing FSNamesystemMetrics using context
>>> object:org.apache.hadoop.metrics.file.FileContext
>>> 2012-05-14 10:10:44,024 INFO
>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Registered
>>> FSNamesystemStatusMBean
>>> 2012-05-14 10:10:44,047 INFO org.apache.hadoop.hdfs.server.common.Storage:
>>> Number of files = 205470
>>> 2012-05-14 10:10:44,844 ERROR
>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem
>>> initialization failed.
>>> java.io.EOFException
>>>    at java.io.DataInputStream.readFully(DataInputStream.java:180)
>>>    at org.apache.hadoop.io.UTF8.readFields(UTF8.java:106)
>>>    at
>>> org.apache.hadoop.hdfs.server.namenode.FSImage.readString(FSImage.java:1578)
>>>    at
>>> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:880)
>>>    at
>>> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:807)
>>>    at
>>> org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:364)
>>>    at
>>> org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:87)
>>>    at
>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:311)
>>>    at
>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:292)
>>>    at
>>> org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:201)
>>>    at
>>> org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:279)
>>>    at
>>> org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:956)
>>>    at
>>> org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:965)
>>> 2012-05-14 10:10:44,845 INFO org.apache.hadoop.ipc.Server: Stopping server
>>> on 54310
>>> 2012-05-14 10:10:44,845 ERROR
>>> org.apache.hadoop.hdfs.server.namenode.NameNode: java.io.EOFException
>>>    at java.io.DataInputStream.readFully(DataInputStream.java:180)
>>>    at org.apache.hadoop.io.UTF8.readFields(UTF8.java:106)
>>>    at
>>> org.apache.hadoop.hdfs.server.namenode.FSImage.readString(FSImage.java:1578)
>>>    at
>>> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:880)
>>>    at
>>> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:807)
>>>    at
>>> org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:364)
>>>    at
>>> org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:87)
>>>    at
>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:311)
>>>    at
>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:292)
>>>    at
>>> org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:201)
>>>    at
>>> org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:279)
>>>    at
>>> org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:956)
>>>    at
>>> org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:965)
>>>
>>> 2012-05-14 10:10:44,846 INFO
>>> org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:
>>> /************************************************************
>>> SHUTDOWN_MSG: Shutting down NameNode at
>>> gridforce-1.internal.salesforce.com/10.0.201.159
>>> ************************************************************/
>>
>>
>>
>> --
>> Harsh J



-- 
Harsh J
