Could someone let me know what the reason for this failure would be? If the stream
can be closed without any issue once the datanodes are available again, it removes
most of the complexity needed at my end. Because the stream fails to close even
when the datanodes are back, I have to maintain a kind of checkpointing (roughly
sketched below) to resume from the point where the data failed to copy into HDFS,
which adds overhead to a solution that is meant to be near real time.
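
For what it's worth, the checkpointing I would have to add looks roughly like the
sketch below. It is only an illustration: the class name, the local checkpoint file,
and the choice of storing the byte offset of the last data known to be safely in
HDFS are placeholders for whatever I end up implementing.

import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.io.Writer;

// Hypothetical sketch: persists the byte offset (in the local input file) of the
// last data known to be safely in HDFS, so copying can resume from there.
public class CopyCheckpoint {
    private final File checkpointFile;

    public CopyCheckpoint(File checkpointFile) {
        this.checkpointFile = checkpointFile;
    }

    // Returns the offset to resume from, or 0 if no checkpoint exists yet.
    public long load() throws IOException {
        if (!checkpointFile.exists()) {
            return 0L;
        }
        BufferedReader in = new BufferedReader(new FileReader(checkpointFile));
        try {
            String line = in.readLine();
            return (line == null) ? 0L : Long.parseLong(line.trim());
        } finally {
            in.close();
        }
    }

    // Records the offset of the last data known to be safely in HDFS.
    public void save(long offset) throws IOException {
        Writer out = new FileWriter(checkpointFile);
        try {
            out.write(Long.toString(offset));
        } finally {
            out.close();
        }
    }
}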

Thanks
Pallavi
----- Original Message -----
From: "Pallavi Palleti" <pallavi.pall...@corp.aol.com>
To: common-user@hadoop.apache.org
Sent: Wednesday, July 22, 2009 5:06:49 PM GMT +05:30 Chennai, Kolkata, Mumbai, 
New Delhi
Subject: RE: Issue with HDFS Client when datanode is temporarily unavailable

Hi all,

In simple terms: why does an output stream that failed to close while the
datanodes were unavailable fail again when I try to close it once the
datanodes are available? Could someone kindly help me tackle this situation?

Thanks
Pallavi

-----Original Message-----
From: Palleti, Pallavi [mailto:pallavi.pall...@corp.aol.com] 
Sent: Tuesday, July 21, 2009 10:21 PM
To: common-user@hadoop.apache.org
Subject: Issue with HDFS Client when datanode is temporarily unavailable

Hi all,

 

We are facing issues with an external application when it tries to write
data into HDFS using FSDataOutputStream. We are using the hadoop-0.18.2
version. The code works perfectly fine as long as the datanodes are doing
well. If the datanodes are unavailable for some reason (no space left, etc.,
which is temporary because of map-reduce jobs running on the machine), the
code fails. I tried to fix the issue by catching the error and waiting for
some time before retrying. While doing this, I came to know that the actual
writes do not happen when we call out.write() (the same is true even with
out.write() followed by out.flush()); they happen only when we call
out.close(). If the datanodes are unavailable at that point, the DFSClient
internally retries multiple times before actually throwing an exception.
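
For reference, the retry I added around close() looks roughly like the sketch
below (MAX_ATTEMPTS and SLEEP_MS are placeholders for my actual settings). It
simply re-attempts close() on the same FSDataOutputStream after a pause, and it
is this retry that keeps failing:

import java.io.IOException;
import org.apache.hadoop.fs.FSDataOutputStream;

// Simplified sketch of my retry around close(); it re-attempts close() on the
// same stream after a pause. MAX_ATTEMPTS and SLEEP_MS are placeholders.
public class CloseWithRetry {
    private static final int MAX_ATTEMPTS = 3;
    private static final long SLEEP_MS = 5000L;

    public static void closeWithRetry(FSDataOutputStream out)
            throws IOException, InterruptedException {
        IOException lastError = null;
        for (int attempt = 0; attempt < MAX_ATTEMPTS; attempt++) {
            try {
                out.close();            // the actual block writes surface here
                return;                 // success
            } catch (IOException e) {
                lastError = e;          // "Could not get block locations. Aborting..."
                Thread.sleep(SLEEP_MS); // wait, hoping the datanodes recover
            }
        }
        throw lastError;                // give up after MAX_ATTEMPTS
    }
}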
Below is the sequence of exceptions I am seeing:

 

09/07/21 19:33:25 INFO dfs.DFSClient: Exception in createBlockOutputStream java.net.ConnectException: Connection refused
09/07/21 19:33:25 INFO dfs.DFSClient: Abandoning block blk_2612177980121914843_134112
09/07/21 19:33:31 INFO dfs.DFSClient: Exception in createBlockOutputStream java.net.ConnectException: Connection refused
09/07/21 19:33:31 INFO dfs.DFSClient: Abandoning block blk_-3499389777806382640_134112
09/07/21 19:33:37 INFO dfs.DFSClient: Exception in createBlockOutputStream java.net.ConnectException: Connection refused
09/07/21 19:33:37 INFO dfs.DFSClient: Abandoning block blk_1835125657840860999_134112
09/07/21 19:33:43 INFO dfs.DFSClient: Exception in createBlockOutputStream java.net.ConnectException: Connection refused
09/07/21 19:33:43 INFO dfs.DFSClient: Abandoning block blk_-3979824251735502509_134112    [4 times attempt done by DFSClient before throwing exception during which datanode is unavailable]

09/07/21 19:33:49 WARN dfs.DFSClient: DataStreamer Exception: java.io.IOException: Unable to create new block.
        at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2357)
        at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1800(DFSClient.java:1743)
        at org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1920)

09/07/21 19:33:49 WARN dfs.DFSClient: Error Recovery for block blk_-3979824251735502509_134112 bad datanode[0]
09/07/21 19:33:49 ERROR logwriter.LogWriterToHDFSV2: Failed while creating file for data:some dummy line [21/Jul/2009:17:15:18 somethinghere] with other dummy info :to HDFS
java.io.IOException: Could not get block locations. Aborting...
        at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2151)
        at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1400(DFSClient.java:1743)
        at org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1897)
09/07/21 19:33:49 INFO logwriter.LogWriterToHDFSV2: Retrying again...number of Attempts =0  [done by me manually during which datanode is available]

09/07/21 19:33:54 ERROR logwriter.LogWriterToHDFSV2: Failed while creating file for data:some dummy line [21/Jul/2009:17:15:18 somethinghere] with other dummy info :to HDFS
java.io.IOException: Could not get block locations. Aborting...
        at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2151)
        at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1400(DFSClient.java:1743)
        at org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1897)
09/07/21 19:33:54 INFO logwriter.LogWriterToHDFSV2: Retrying again...number of Attempts =1 [done by me manually during which datanode is available]

09/07/21 19:33:59 ERROR logwriter.LogWriterToHDFSV2: Failed while creating file for data:some dummy line [21/Jul/2009:17:15:18 somethinghere] with other dummy info :to HDFS
java.io.IOException: Could not get block locations. Aborting...
        at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2151)
        at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1400(DFSClient.java:1743)
        at org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1897)
09/07/21 19:33:59 INFO logwriter.LogWriterToHDFSV2: Retrying again...number of Attempts =2  [done by me manually during which datanode is available]

09/07/21 19:34:04 ERROR logwriter.LogWriterToHDFSV2: Unexpected error while writing to HDFS, exiting ...

 

So, if the writes actually happen during close() and close() fails because the
datanodes are unavailable, then the next time I try to close the same stream it
still throws an exception, even though the datanodes are available by then. Why
does closing the stream again fail when the datanodes are back? Any idea how to
handle this scenario? The only way I can think of is to remember the position in
the input file at which we started writing the new HDFS file, seek back to that
position on failure, re-read from there, and write the data to HDFS again (a
rough sketch follows below). Could someone please tell me if there is a better
way of handling these errors?
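
Roughly, the fallback I have in mind would look like the sketch below (the names
are placeholders): remember the offset in the local input file at which the
current HDFS file was started, and on failure reopen the input, seek to that
offset, and copy the data into a fresh HDFS file rather than reusing the failed
stream.

import java.io.IOException;
import java.io.RandomAccessFile;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Hypothetical sketch of the resume idea: seek back in the local input file and
// copy everything from the remembered offset into a new HDFS file.
public class ResumeCopy {
    public static void copyFrom(String localPath, long startOffset,
                                Configuration conf, Path hdfsPath)
            throws IOException {
        RandomAccessFile in = new RandomAccessFile(localPath, "r");
        FileSystem fs = FileSystem.get(conf);
        // Open a fresh stream instead of reusing the one that failed to close.
        FSDataOutputStream out = fs.create(hdfsPath, true);
        try {
            in.seek(startOffset);            // resume from the remembered position
            byte[] buf = new byte[64 * 1024];
            int n;
            while ((n = in.read(buf)) > 0) {
                out.write(buf, 0, n);
            }
            out.close();                     // data only really lands in HDFS here
        } finally {
            in.close();
        }
    }
}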

 

Thanks

Pallavi
