Hi Terry,

It seems like something got truncated in your FSImage... though it's
unclear how that might have happened.

If you're able to share your logs and your dfs.name.dir contents, feel
free to contact me off-list and I can take a look to diagnose the
issue and try to recover the system. Whenever a corruption issue
occurs we take it seriously and want to find the root cause to
prevent future occurrences!
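
Before sending anything over, one quick check is to compare file sizes
and checksums across each directory listed in dfs.name.dir; a copy that
is shorter than its siblings would be consistent with truncation. Below
is a minimal sketch in Python. It only walks whatever directories you
pass it and makes no assumption about the file names inside them; the
script name and paths in the usage line are placeholders, not anything
from your cluster.

```python
import hashlib
import os
import sys


def summarize_dir(root):
    """Return {relative_path: (size_in_bytes, md5_hex)} for every file under root."""
    summary = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            digest = hashlib.md5()
            # Read in 1 MiB chunks so large image files are not loaded whole.
            with open(path, "rb") as f:
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    digest.update(chunk)
            rel = os.path.relpath(path, root)
            summary[rel] = (os.path.getsize(path), digest.hexdigest())
    return summary


def main(argv):
    # Each argument is one storage directory (hypothetical paths, supplied by you).
    for root in argv[1:]:
        print(root)
        for rel, (size, md5) in sorted(summarize_dir(root).items()):
            print("  %-40s %12d  %s" % (rel, size, md5))


if __name__ == "__main__":
    main(sys.argv)
```

Run it read-only against a copy if possible, e.g.
`python summarize_name_dir.py /path/to/name1 /path/to/name2`, and
compare the listings side by side.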

Thanks
-Todd

On Fri, May 18, 2012 at 6:57 AM, Terry Healy <the...@bnl.gov> wrote:
> Sorry, forgot to attach the trace:
> <code>
> 2012-05-18 09:54:45,355 INFO org.apache.hadoop.hdfs.server.common.Storage: Number of files = 128
> 2012-05-18 09:54:45,379 ERROR org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem initialization failed.
> java.io.EOFException
>        at java.io.DataInputStream.readFully(DataInputStream.java:180)
>        at org.apache.hadoop.io.UTF8.readFields(UTF8.java:112)
>        at org.apache.hadoop.hdfs.server.namenode.FSImage.readString(FSImage.java:1808)
>        at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:901)
>        at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:824)
>        at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:372)
>        at org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:100)
>        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:388)
>        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:362)
>        at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:276)
>        at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:496)
>        at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1279)
>        at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1288)
> 2012-05-18 09:54:45,380 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: java.io.EOFException
>        at java.io.DataInputStream.readFully(DataInputStream.java:180)
>        at org.apache.hadoop.io.UTF8.readFields(UTF8.java:112)
>        at org.apache.hadoop.hdfs.server.namenode.FSImage.readString(FSImage.java:1808)
>        at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:901)
>        at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:824)
>        at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:372)
>        at org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:100)
>        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:388)
>        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:362)
>        at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:276)
>        at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:496)
>        at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1279)
>        at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1288)
>
> 2012-05-18 09:54:45,380 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:
> /************************************************************
> SHUTDOWN_MSG: Shutting down NameNode at abcd/1xx.1xx.2xx.3xx
> ************************************************************/
>
> </code>
>
>
>
> On 05/18/2012 09:51 AM, Terry Healy wrote:
>> Running Apache Hadoop 1.0.2 with ~12 datanodes.
>>
>> Before this, fsck / reported OK and everything was running as expected.
>>
>> Started trying to use a script to assign nodes to racks, which required
>> several stop-dfs.sh / start-dfs.sh cycles (with some stop-all.sh /
>> start-all.sh too, if that matters).
>>
>> Got past errors in the script and data file, but dfsadmin -report still
>> showed all nodes assigned to the default rack. I tried replacing one
>> system name in the rack mapping file with its IP address. At this point
>> the NN failed to start up.
>>
>> So I commented out the topology.script.file.name property statements in
>> hdfs-site.xml
>>
>> NN still fails to start; the trace below shows an EOFException, but I
>> don't know which file it can't read.
>>
>> As always your patience with a noob appreciated; any suggestions to get
>> started again? (I can forget about the rack assignment for now)
>>
>> Thanks.
>>
>>
>
>



-- 
Todd Lipcon
Software Engineer, Cloudera
