DFSClient incorrectly asks for new block if primary crashes during first recoverBlock
-------------------------------------------------------------------------------------
                 Key: HDFS-1229
                 URL: https://issues.apache.org/jira/browse/HDFS-1229
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: hdfs client
    Affects Versions: 0.20.1
            Reporter: Thanh Do

- Setup:
+ # available datanodes = 2
+ # disks / datanode = 1
+ # failures = 1
+ failure type = crash
+ when/where failure happens = during primary's recoverBlock

- Details:
Say the client is appending to block X1 on two datanodes, dn1 and dn2. First it needs to make sure both dn1 and dn2 agree on the new generation stamp (GS) of the block.

1) The client first creates a DFSOutputStream in DFSClient.append():

> OutputStream result = new DFSOutputStream(src, buffersize, progress,
>     lastBlock, stat,
>     conf.getInt("io.bytes.per.checksum", 512));

2) The DFSOutputStream constructor in turn calls processDatanodeError(true, true) (i.e., hasError = true, isAppend = true) and starts the DataStreamer:

> processDatanodeError(true, true); /* let's call this PDNE 1 */
> streamer.start();

Note that DataStreamer.run() also calls processDatanodeError():

> while (!closed && clientRunning) {
>   ...
>   boolean doSleep = processDatanodeError(hasError, false); /* let's call this PDNE 2 */

3) Now in PDNE 1, we have the following code:

> blockStream = null;
> blockReplyStream = null;
> ...
> while (!success && clientRunning) {
>   ...
>   try {
>     primary = createClientDatanodeProtocolProxy(primaryNode, conf);
>     newBlock = primary.recoverBlock(block, isAppend, newnodes); /* exception here */
>     ...
>   } catch (IOException e) {
>     ...
>     if (recoveryErrorCount > maxRecoveryErrorCount) {
>       /* this condition is false */
>     }
>     ...
>     return true;
>   } // end catch
>   finally {...}
>
>   this.hasError = false;
>   lastException = null;
>   errorIndex = 0;
>   success = createBlockOutputStream(nodes, clientName, true);
> }
> ...

Because dn1 crashes during the client's call to recoverBlock(), an exception is thrown. Hence we enter the catch block, in which processDatanodeError returns true before hasError is ever set to false.
Also, because createBlockOutputStream() is never called (due to the early return), blockStream is still null.

4) Now that PDNE 1 has finished, we come to streamer.start(), which leads to PDNE 2. Because hasError = false, PDNE 2 returns false immediately without doing anything:

> if (!hasError) {
>   return false;
> }

5) Still in DataStreamer.run(), after PDNE 2 returns false, blockStream is still null, so the following code is executed:

> if (blockStream == null) {
>   nodes = nextBlockOutputStream(src);
>   this.setName("DataStreamer for file " + src +
>       " block " + block);
>   response = new ResponseProcessor(nodes);
>   response.start();
> }

nextBlockOutputStream(), which asks the namenode to allocate a new block, is called. (This is wrong, because we are appending, not writing.) The namenode hands out a new block ID and a set of datanodes that includes the crashed dn1, so createBlockOutputStream() fails because it tries to contact dn1 (which has crashed) first. The client retries 5 times without any success, because every time it asks the namenode for a new block. Again we see that the retry logic at the client is flawed.

*This bug was found by our Failure Testing Service framework:
http://www.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-98.html
For questions, please email us: Thanh Do (than...@cs.wisc.edu) and Haryadi Gunawi (hary...@eecs.berkeley.edu)*

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
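To make the control flow in steps 2-5 concrete, here is a minimal, self-contained Java sketch. This is NOT the real DFSClient code: the class, fields, and return strings are simplified stand-ins that model only the branch structure described above (the early return in the catch block, the hasError field that PDNE 2 checks, and the blockStream-null path into nextBlockOutputStream).

```java
// Minimal stand-in model of the DataStreamer state machine described in the
// report above; all names are simplified, not the actual HDFS implementation.
public class DataStreamerSketch {
    boolean hasError = false;   // field that PDNE 2 checks; never becomes true here
    Object blockStream = null;  // stays null because of the early return in PDNE 1

    // Stand-in for primary.recoverBlock(): the primary (dn1) has crashed.
    void recoverBlock() throws Exception {
        throw new Exception("primary crashed during recoverBlock");
    }

    // Simplified processDatanodeError(). PDNE 1 is the call with error == true.
    boolean processDatanodeError(boolean error, boolean isAppend) {
        if (!error) {
            return false;           // PDNE 2 takes this branch and does nothing
        }
        try {
            recoverBlock();         // throws, because dn1 crashed
        } catch (Exception e) {
            return true;            // early return: the lines below never run
        }
        this.hasError = false;
        blockStream = new Object(); // stand-in for createBlockOutputStream(...)
        return false;
    }

    // Simplified single iteration of DataStreamer.run().
    String run() {
        boolean doSleep = processDatanodeError(hasError, false); // PDNE 2: no-op
        if (blockStream == null) {
            // Wrong path for an append: asks the namenode for a brand-new block.
            return "nextBlockOutputStream";
        }
        return "continue existing block";
    }

    public static void main(String[] args) {
        DataStreamerSketch s = new DataStreamerSketch();
        s.processDatanodeError(true, true); // PDNE 1, from the append constructor
        System.out.println(s.run());        // prints "nextBlockOutputStream"
    }
}
```

The sketch shows why every retry loops back into the new-block path: after the failed PDNE 1, nothing in the client's state records that block *recovery* (rather than block *allocation*) is still pending.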