[ http://issues.apache.org/jira/browse/HADOOP-128?page=all ]
Owen O'Malley updated HADOOP-128:
---------------------------------
Attachment: datanode-mirroring.patch
This patch changes the client so that:
1. It waits up to (replication * 1 minute) for the block replicas to be
written before timing out.
2. Logging is improved, including the filename and remote hostname when
things fail.
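As a rough illustration of the timeout rule in item 1, here is a minimal
sketch. The class and method names are hypothetical and are not taken from
the patch; it only shows the arithmetic of one minute per replica.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical helper; names are illustrative and not from the patch. */
public class ReplicationTimeout {

    /** Returns the write timeout in milliseconds: one minute per replica. */
    public static long timeoutMillis(int replication) {
        if (replication < 1) {
            throw new IllegalArgumentException("replication must be >= 1");
        }
        return TimeUnit.MINUTES.toMillis(replication);
    }

    public static void main(String[] args) {
        // With a replication factor of 3 the client would wait up to 3 minutes.
        System.out.println(timeoutMillis(3) + " ms");
    }
}
```

Scaling the timeout with the replication factor gives longer pipelines
proportionally more time to finish writing all replicas.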
It patches the DataNode so that:
1. Failures downstream (from the mirror nodes) never propagate back upstream.
2. Logging is improved, including filenames and remote host names.
Note that the patch contains a lot of whitespace changes because of block
changes, so I'll include a separate upload that ignores whitespace.
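The idea behind item 1 of the DataNode changes can be sketched as follows.
This is not the code from the patch: the class, its fields, and the log
message are all hypothetical, and it only assumes that a node writes each
chunk locally and then forwards it to one mirror stream.

```java
import java.io.IOException;
import java.io.OutputStream;

/** Hypothetical sketch; class and field names are not from the patch. */
public class MirroringWriter {
    private final OutputStream local;
    private final String mirrorHost;
    private OutputStream mirror; // set to null once the mirror has failed

    public MirroringWriter(OutputStream local, OutputStream mirror, String mirrorHost) {
        this.local = local;
        this.mirror = mirror;
        this.mirrorHost = mirrorHost;
    }

    public void write(byte[] buf, int off, int len) throws IOException {
        // A failure of the local write is still fatal and reaches the caller.
        local.write(buf, off, len);
        if (mirror != null) {
            try {
                mirror.write(buf, off, len);
            } catch (IOException e) {
                // Log the remote host, drop the mirror, and keep serving upstream.
                System.err.println("Mirror " + mirrorHost + " failed, continuing without it: " + e);
                try {
                    mirror.close();
                } catch (IOException ignored) {
                    // Nothing more to do for a mirror that is already broken.
                }
                mirror = null;
            }
        }
    }

    public boolean mirrorAlive() {
        return mirror != null;
    }

    public static void main(String[] args) throws IOException {
        OutputStream broken = new OutputStream() {
            @Override
            public void write(int b) throws IOException {
                throw new IOException("mirror down");
            }
        };
        MirroringWriter w = new MirroringWriter(System.out, broken, "mirror-host");
        w.write("still written locally\n".getBytes(), 0, 22);
        System.out.flush();
    }
}
```

In this sketch the upstream writer never sees the mirror's IOException, so
the block would end up under-replicated rather than failing the client.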
> Failure to replicate dfs block kills client
> -------------------------------------------
>
> Key: HADOOP-128
> URL: http://issues.apache.org/jira/browse/HADOOP-128
> Project: Hadoop
> Type: Bug
> Components: dfs
> Versions: 0.1.1
> Environment: ~200 node linux cluster (kernel 2.6, redhat, 2 hyper threaded
> cpus)
> Reporter: Owen O'Malley
> Assignee: Owen O'Malley
> Attachments: datanode-mirroring.patch
>
> When the datanode gets an exception, which is logged as:
> 060407 155835 13 DataXCeiver
> java.io.EOFException
> at java.io.DataInputStream.readFully(DataInputStream.java:178)
> at java.io.DataInputStream.readLong(DataInputStream.java:380)
> at org.apache.hadoop.dfs.DataNode$DataXceiver.run(DataNode.java:462)
> at java.lang.Thread.run(Thread.java:595)
> It closes the user's connection to the data node, which causes the client to
> get an IOException from:
> at java.io.DataInputStream.readFully(DataInputStream.java:178)
> at java.io.DataInputStream.readLong(DataInputStream.java:380)
> at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.internalClose(DFSClient.java:883)
>