[ 
https://issues.apache.org/jira/browse/HDFS-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thanh Do updated HDFS-1229:
---------------------------

    Description: 
Setup:
--------
+ # available datanodes = 2
+ # disks / datanode = 1
+ # failures = 1
+ failure type = crash
+ when/where failure happens = during primary's recoverBlock
 
Details:
----------
Say the client is appending to block X1 on 2 datanodes: dn1 and dn2.
First it needs to make sure both dn1 and dn2 agree on the new generation stamp (GS) of the block.
1) The client first creates a DFSOutputStream by calling
 
> OutputStream result = new DFSOutputStream(src, buffersize, progress,
>                                           lastBlock, stat,
>                                           conf.getInt("io.bytes.per.checksum", 512));
 
in DFSClient.append()
 
2) The above DFSOutputStream constructor in turn calls
processDatanodeError(true, true)
(i.e., hasError = true, isAppend = true), and starts the DataStreamer:
 
> processDatanodeError(true, true);  /* let's call this PDNE 1 */
> streamer.start();
 
Note that DataStreamer.run() also calls processDatanodeError():
> while (!closed && clientRunning) {
>  ...
>      boolean doSleep = processDatanodeError(hasError, false); /* let's call this PDNE 2 */
 
3) Now in PDNE 1, we have the following code:
 
> blockStream = null;
> blockReplyStream = null;
> ...
> while (!success && clientRunning) {
> ...
>    try {
>         primary = createClientDatanodeProtocolProxy(primaryNode, conf);
>         newBlock = primary.recoverBlock(block, isAppend, newnodes); /* exception here */
>         ...
>    } catch (IOException e) {
>         ...
>         if (recoveryErrorCount > maxRecoveryErrorCount) { 
>         /* this condition is false */
>         }
>         ...
>         return true;
>    } // end catch
>    finally {...}
>    
>    this.hasError = false;
>    lastException = null;
>    errorIndex = 0;
>    success = createBlockOutputStream(nodes, clientName, true);
>    }
>    ...
 
Because dn1 crashes during the client's call to recoverBlock, an IOException is thrown.
Hence we go to the catch block, where processDatanodeError returns true before ever
reaching the statement that sets hasError to false. Also, because the early return
means createBlockOutputStream() is never called, blockStream is still null.
 
4) Now that PDNE 1 has finished, we come to streamer.start(), which calls PDNE 2.
Because the field hasError is still false, PDNE 2 returns false immediately without
doing anything:
>    if (!hasError) {
>     return false;
>    }
 
5) Still in DataStreamer.run(), after PDNE 2 returns false, we still have
blockStream = null, hence the following code is executed:
>    if (blockStream == null) {
>        nodes = nextBlockOutputStream(src);
>        this.setName("DataStreamer for file " + src +
>                     " block " + block);
>        response = new ResponseProcessor(nodes);
>        response.start();
>    }
 
nextBlockOutputStream(), which asks the namenode to allocate a new block, is called.
(This is wrong, because we are appending to an existing block, not writing a new one.)
The namenode hands back a new block ID and a set of datanodes that includes the
crashed dn1, so createBlockOutputStream() fails because it tries to contact dn1
first. The client then retries 5 times without success, because every retry asks
the namenode for yet another new block. Again we see that the retry logic in the
client is broken.
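
The whole failure sequence boils down to one control-flow flaw: an early return
inside a catch block that skips both the error-flag reset and the stream setup,
while the caller then tests an instance field that was never set. A distilled,
hypothetical Java sketch (the names echo the report, but this is a standalone toy,
not the real DFSClient 0.20 code):

```java
// Toy reproduction of the control-flow pattern described in steps 2-5 above.
public class EarlyReturnBug {
    boolean hasError = false;   // instance field: never set to true on this path
    Object blockStream = null;  // stays null when recovery bails out early

    // The parameter shadows the field, as in DFSClient's processDatanodeError.
    boolean processDatanodeError(boolean hasError, boolean isAppend) {
        if (!hasError) {
            return false;       // PDNE 2 takes this path
        }
        try {
            recoverBlock();     // primary crashed: throws
        } catch (Exception e) {
            return true;        // early return: skips the reset below
        }
        this.hasError = false;          // never reached on the failure path
        blockStream = new Object();     // never reached either
        return false;
    }

    void recoverBlock() throws Exception {
        throw new Exception("primary datanode crashed");
    }

    public static void main(String[] args) {
        EarlyReturnBug s = new EarlyReturnBug();
        // PDNE 1: constructor path, parameter hasError == true
        boolean r1 = s.processDatanodeError(true, true);
        // PDNE 2: streamer path, instance field hasError is still false
        boolean r2 = s.processDatanodeError(s.hasError, false);
        // prints: true false true -- recovery "looks fine" to PDNE 2,
        // yet blockStream is null, so the streamer asks for a NEW block
        System.out.println(r1 + " " + r2 + " " + (s.blockStream == null));
    }
}
```

Under this reading, the streamer's `blockStream == null` check cannot distinguish
"fresh write needing a new block" from "append whose recovery silently failed",
which is exactly why it falls through to nextBlockOutputStream().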

*This bug was found by our Failure Testing Service framework:
http://www.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-98.html
For questions, please email us: Thanh Do (than...@cs.wisc.edu) and 
Haryadi Gunawi (hary...@eecs.berkeley.edu)*


> DFSClient incorrectly asks for new block if primary crashes during first 
> recoverBlock
> -------------------------------------------------------------------------------------
>
>                 Key: HDFS-1229
>                 URL: https://issues.apache.org/jira/browse/HDFS-1229
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs client
>    Affects Versions: 0.20.1
>            Reporter: Thanh Do
>

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
