[ https://issues.apache.org/jira/browse/HDFS-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12889001#action_12889001 ]
Thanh Do commented on HDFS-1229:
--------------------------------

This does not happen in the append+320 trunk.

> DFSClient incorrectly asks for new block if primary crashes during first
> recoverBlock
> -------------------------------------------------------------------------------------
>
>                 Key: HDFS-1229
>                 URL: https://issues.apache.org/jira/browse/HDFS-1229
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs client
>    Affects Versions: 0.20-append
>            Reporter: Thanh Do
>
> Setup:
> --------
> + # available datanodes = 2
> + # disks / datanode = 1
> + # failures = 1
> + failure type = crash
> + when/where failure happens = during primary's recoverBlock
>
> Details:
> ----------
> Say the client is appending to block X1 on 2 datanodes: dn1 and dn2.
> First it needs to make sure both dn1 and dn2 agree on the new generation
> stamp (GS) of the block.
>
> 1) The client first creates a DFSOutputStream in DFSClient.append() by
> calling:
>
> >   OutputStream result = new DFSOutputStream(src, buffersize, progress,
> >       lastBlock, stat, conf.getInt("io.bytes.per.checksum", 512));
>
> 2) The DFSOutputStream constructor in turn calls
> processDatanodeError(true, true) (i.e., hasError = true, isAppend = true)
> and then starts the DataStreamer:
>
> >   processDatanodeError(true, true); /* let's call this PDNE 1 */
> >   streamer.start();
>
> Note that DataStreamer.run() also calls processDatanodeError():
>
> >   while (!closed && clientRunning) {
> >     ...
> >     boolean doSleep = processDatanodeError(hasError, false); /* let's
> >         call this PDNE 2 */
>
> 3) Now in PDNE 1, we have the following code:
>
> >   blockStream = null;
> >   blockReplyStream = null;
> >   ...
> >   while (!success && clientRunning) {
> >     ...
> >     try {
> >       primary = createClientDatanodeProtocolProxy(primaryNode, conf);
> >       newBlock = primary.recoverBlock(block, isAppend, newnodes);
> >       /* exception here */
> >       ...
> >     } catch (IOException e) {
> >       ...
> >       if (recoveryErrorCount > maxRecoveryErrorCount) {
> >         // this condition is false
> >       }
> >       ...
> >       return true;
> >     } // end catch
> >     finally {...}
> >
> >     this.hasError = false;
> >     lastException = null;
> >     errorIndex = 0;
> >     success = createBlockOutputStream(nodes, clientName, true);
> >   }
> >   ...
>
> Because dn1 crashes during the client's call to recoverBlock, an exception
> is thrown and we enter the catch block, in which processDatanodeError
> returns true *before* setting hasError to false. Also, because
> createBlockOutputStream() is never reached (due to the early return),
> blockStream is still null.
>
> 4) Now that PDNE 1 has finished, we come to streamer.start(), which calls
> PDNE 2. Because hasError = false, PDNE 2 returns false immediately without
> doing anything:
>
> >   if (!hasError) { return false; }
>
> 5) Still in DataStreamer.run(), after PDNE 2 returns false, blockStream is
> still null, so the following code is executed:
>
> >   if (blockStream == null) {
> >     nodes = nextBlockOutputStream(src);
> >     this.setName("DataStreamer for file " + src + " block " + block);
> >     response = new ResponseProcessor(nodes);
> >     response.start();
> >   }
>
> That is, nextBlockOutputStream(), which asks the namenode to allocate a new
> block, gets called. This is wrong, because we are appending, not writing a
> new block. The namenode hands back a new block ID and a set of datanodes
> that includes the crashed dn1, so createBlockOutputStream() fails because
> it tries to contact dn1 (which has crashed) first. The client retries 5
> times without any success, because every time it asks the namenode for a
> new block. Again we see that the retry logic at the client is broken.
>
> *This bug was found by our Failure Testing Service framework:
> http://www.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-98.html
> For questions, please email us: Thanh Do (than...@cs.wisc.edu) and
> Haryadi Gunawi (hary...@eecs.berkeley.edu)*

--
This message is automatically generated by JIRA.
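The fall-through in steps 2-5 can be sketched as a tiny self-contained model. The class and member names below loosely mirror DFSClient for readability, but this is a hypothetical illustration of the described control flow under the assumptions in the report, not the real HDFS implementation:

```java
import java.io.IOException;

// Hypothetical model of the append-recovery control flow described above.
public class AppendRecoveryFlow {

    private boolean hasError = false;   // field read by PDNE 2; never set true here
    private Object blockStream = null;  // left null because recovery fails early
    boolean askedForNewBlock = false;   // stands in for nextBlockOutputStream(src)

    // Stand-in for processDatanodeError(). PDNE 1 passes error = true and
    // hits the catch block; PDNE 2 passes the field hasError, still false.
    boolean processDatanodeError(boolean error, boolean isAppend) {
        if (!error) {
            return false;               // PDNE 2 exits here, doing nothing
        }
        try {
            // primary.recoverBlock(...) -- the primary datanode has crashed
            throw new IOException("primary crashed during recoverBlock");
        } catch (IOException e) {
            return true;                // early return: hasError = false and
                                        // createBlockOutputStream() never run
        }
    }

    // The two statements of DataStreamer.run() that matter for this bug.
    void dataStreamerRun() {
        processDatanodeError(hasError, false);  // PDNE 2: hasError is false
        if (blockStream == null) {
            askedForNewBlock = true;            // nextBlockOutputStream(src):
                                                // asks the namenode for a NEW
                                                // block while appending
        }
    }

    public static void main(String[] args) {
        AppendRecoveryFlow client = new AppendRecoveryFlow();
        client.processDatanodeError(true, true); // PDNE 1 from the constructor
        client.dataStreamerRun();                // streamer.start()
        System.out.println("askedForNewBlock = " + client.askedForNewBlock);
    }
}
```

Running this prints `askedForNewBlock = true`: the early return in the catch block leaves hasError false and blockStream null, so the streamer walks straight into the new-block allocation path even though it is servicing an append.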