[jira] Commented: (HDFS-1230) BlocksMap.blockinfo is not getting cleared immediately after deleting a block.This will be cleared only after block report comes from the datanode.Why we need to maintain
[ https://issues.apache.org/jira/browse/HDFS-1230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12881574#action_12881574 ] Gokul commented on HDFS-1230: - {quote} Konstantin Shvachko added a comment - 22/Jun/10 10:36 PM What is BlocksMap.blockinfo? {quote} FYI, BlockInfo is the static inner class of BlocksMap which is used for holding block metadata. {quote} If this is a question please use hdfs lists rather than creating jiras. {quote} Not a question. {quote} If not please clarify what exactly you would like to improve. {quote} Yup! Here we go. Whenever a delete is done, removePathAndBlocks() in FSNamesystem is called internally. {noformat}
void removePathAndBlocks(String src, List<Block> blocks) throws IOException {
  leaseManager.removeLeaseWithPrefixPath(src);
  for (Block b : blocks) {
    blocksMap.removeINode(b);
    corruptReplicas.removeFromCorruptReplicasMap(b);
    addToInvalidates(b);
  }
}
{noformat} Here the corresponding block is not removed by blocksMap.removeINode(b), since that method removes the entry only when the number of datanodes holding the block is 0. So when exactly will it be removed from the map? It will be removed during the next block report from the datanode to the namenode, which by default happens once an hour. My concern here is: why should we maintain the unwanted entry in the map (*Map map*) in BlocksMap until that time? We could clear the entry in removePathAndBlocks() itself, with something like this: {noformat}
blocksMap.removeBlock(blocksMap.getStoredBlock(b));
{noformat} I noticed this issue when I found that the NameNode's BlocksMap memory usage keeps increasing during intensive write/mkdir operations, even if the files are deleted. > BlocksMap.blockinfo is not getting cleared immediately after deleting a > block.This will be cleared only after block report comes from the > datanode.Why we need to maintain the blockinfo till that time. > > > Key: HDFS-1230 > URL: https://issues.apache.org/jira/browse/HDFS-1230 > Project: Hadoop HDFS > Issue Type: Improvement > Components: name-node >Affects Versions: 0.20.1 >Reporter: Gokul > > BlocksMap.blockinfo is not getting cleared immediately after deleting a > block.This will be cleared only after block report comes from the > datanode.Why we need to maintain the blockinfo till that time It > increases namenode memory unnecessarily. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
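For illustration, a minimal, self-contained Java sketch of the eager cleanup proposed above: drop the BlocksMap entry at delete time instead of waiting for the next block report. The classes below are simplified stand-ins, not the real BlocksMap/FSNamesystem code, and a real fix would also have to keep the replica-invalidation bookkeeping consistent.
{code}
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Simplified stand-in for the NameNode's block map (not the real BlocksMap). */
class SimpleBlocksMap {
    private final Map<Long, String> map = new HashMap<>(); // blockId -> metadata placeholder

    void add(long blockId, String info) { map.put(blockId, info); }

    /** Eager removal: drop the entry as soon as the file is deleted. */
    void removeBlock(long blockId) { map.remove(blockId); }

    int size() { return map.size(); }
}

public class RemovePathAndBlocksSketch {
    static final SimpleBlocksMap blocksMap = new SimpleBlocksMap();

    /** Sketch of removePathAndBlocks() with the proposed eager cleanup added. */
    static void removePathAndBlocks(String src, List<Long> blockIds) {
        for (long b : blockIds) {
            // existing steps: drop the INode reference, the corrupt-replica entry,
            // and schedule the replicas for invalidation (omitted here)
            // proposed extra step: clear the BlocksMap entry immediately
            blocksMap.removeBlock(b);
        }
    }

    public static void main(String[] args) {
        blocksMap.add(42L, "blk_42 metadata");
        removePathAndBlocks("/some/file", List.of(42L));
        System.out.println("entries left: " + blocksMap.size()); // 0 - no wait for a block report
    }
}
{code}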
[jira] Commented: (HDFS-650) Namenode in infinite loop for removing/recovering lease.
[ https://issues.apache.org/jira/browse/HDFS-650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12881552#action_12881552 ] Gokul commented on HDFS-650: {quote} At this point, I could provide some information is that the user request the APPEND operation before the file is not closed. {quote} So this issue is there only in this scenario?? (When user tries to append before closing the file) > Namenode in infinite loop for removing/recovering lease. > > > Key: HDFS-650 > URL: https://issues.apache.org/jira/browse/HDFS-650 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Yajun Dong >Priority: Blocker > > 2009-09-23 18:05:48,929 INFO > org.apache.hadoop.hdfs.server.namenode.LeaseManager: Lease Monitor: Removing > lease [Lease. Holder: DFSClient_2121971893, pendingcreates: 1], > sortedLeases.size()=: 1 > 2009-09-23 18:05:48,929 INFO > org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Recovering lease=[Lease. > Holder: DFSClient_2121971893, pendingcreates: 1], > src=/54_upload/GALGAME/<风子の漂流...@sumisora+2dgal>CANVAS3V1.ISO > 2009-09-23 18:05:48,929 WARN org.apache.hadoop.hdfs.StateChange: DIR* > NameSystem.internalReleaseCreate: attempt to release a create lock on > /54_upload/GALGAME/<风子の漂流...@sumisora+2dgal>CANVAS3V1.ISO but file is already > closed. > 2009-09-23 18:05:48,929 INFO > org.apache.hadoop.hdfs.server.namenode.LeaseManager: Lease Monitor: Removing > lease [Lease. Holder: DFSClient_2121971893, pendingcreates: 1], > sortedLeases.size()=: 1 > 2009-09-23 18:05:48,929 INFO > org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Recovering lease=[Lease. > Holder: DFSClient_2121971893, pendingcreates: 1], > src=/54_upload/GALGAME/<风子の漂流...@sumisora+2dgal>CANVAS3V1.ISO > 2009-09-23 18:05:48,929 WARN org.apache.hadoop.hdfs.StateChange: DIR* > NameSystem.internalReleaseCreate: attempt to release a create lock on > /54_upload/GALGAME/<风子の漂流...@sumisora+2dgal>CANVAS3V1.ISO but file is already > closed. > 2009-09-23 18:05:48,929 INFO > org.apache.hadoop.hdfs.server.namenode.LeaseManager: Lease Monitor: Removing > lease [Lease. Holder: DFSClient_2121971893, pendingcreates: 1], > sortedLeases.size()=: 1 > 2009-09-23 18:05:48,929 INFO > org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Recovering lease=[Lease. > Holder: DFSClient_2121971893, pendingcreates: 1], > src=/54_upload/GALGAME/<风子の漂流...@sumisora+2dgal>CANVAS3V1.ISO > 2009-09-23 18:05:48,929 WARN org.apache.hadoop.hdfs.StateChange: DIR* > NameSystem.internalReleaseCreate: attempt to release a create lock on > /54_upload/GALGAME/<风子の漂流...@sumisora+2dgal>CANVAS3V1.ISO but file is already > closed. > 2009-09-23 18:05:48,929 INFO > org.apache.hadoop.hdfs.server.namenode.LeaseManager: Lease Monitor: Removing > lease [Lease. Holder: DFSClient_2121971893, pendingcreates: 1], > sortedLeases.size()=: 1 > 2009-09-23 18:05:48,929 INFO > org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Recovering lease=[Lease. > Holder: DFSClient_2121971893, pendingcreates: 1], > src=/54_upload/GALGAME/<风子の漂流...@sumisora+2dgal>CANVAS3V1.ISO > 2009-09-23 18:05:48,929 WARN org.apache.hadoop.hdfs.StateChange: DIR* > NameSystem.internalReleaseCreate: attempt to release a create lock on > /54_upload/GALGAME/<风子の漂流...@sumisora+2dgal>CANVAS3V1.ISO but file is already > closed. > 2009-09-23 18:05:48,929 INFO > org.apache.hadoop.hdfs.server.namenode.LeaseManager: Lease Monitor: Removing > lease [Lease. Holder: DFSClient_2121971893, pendingcreates: 1], > sortedLeases.size()=: 1 -- This message is automatically generated by JIRA. 
- You can reply to this email to add a comment to the issue online.
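As an aside, a self-contained sketch of why the Lease Monitor can spin forever on the log above: if releasing the lease bails out when the file is already closed, and the lease is only removed on success, the monitor picks up the same lease on every pass. This is a simplified stand-in, not the real LeaseManager code.
{code}
import java.util.TreeSet;

/** Simplified stand-in for the LeaseManager monitor loop; not the real HDFS class. */
public class LeaseMonitorSketch {
    static class Lease implements Comparable<Lease> {
        final String holder;
        Lease(String holder) { this.holder = holder; }
        public int compareTo(Lease o) { return holder.compareTo(o.holder); }
    }

    private final TreeSet<Lease> sortedLeases = new TreeSet<>();

    /** Returns true only if the lease was actually released. */
    boolean internalReleaseCreate(Lease l) {
        boolean fileAlreadyClosed = true; // the situation seen in the log above
        if (fileAlreadyClosed) {
            // the 0.20 code logs this warning and returns without touching sortedLeases
            System.out.println("WARN attempt to release a create lock but file is already closed");
            return false;
        }
        return true;
    }

    void monitorOnce() {
        Lease oldest = sortedLeases.first();   // monitor always looks at the oldest expired lease
        System.out.println("INFO Removing lease, sortedLeases.size()= " + sortedLeases.size());
        if (internalReleaseCreate(oldest)) {
            sortedLeases.remove(oldest);       // only happens on success
        }
        // on failure the lease stays in the set, so the next pass processes it again
    }

    public static void main(String[] args) {
        LeaseMonitorSketch lm = new LeaseMonitorSketch();
        lm.sortedLeases.add(new Lease("DFSClient_2121971893"));
        for (int i = 0; i < 3; i++) lm.monitorOnce(); // same lease, every pass: the infinite loop
    }
}
{code}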
[jira] Resolved: (HDFS-1227) UpdateBlock fails due to unmatched file length
[ https://issues.apache.org/jira/browse/HDFS-1227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon resolved HDFS-1227. --- Resolution: Duplicate Going to resolve this as invalid. If you can reproduce after HDFS-1186 is committed, or provide a unit test, we can reopen. > UpdateBlock fails due to unmatched file length > -- > > Key: HDFS-1227 > URL: https://issues.apache.org/jira/browse/HDFS-1227 > Project: Hadoop HDFS > Issue Type: Bug > Components: data-node >Affects Versions: 0.20-append >Reporter: Thanh Do > > - Summary: client append is not atomic, hence, it is possible that > when retrying during append, there is an exception in updateBlock > indicating unmatched file length, making the append fail. > > - Setup: > + # available datanodes = 3 > + # disks / datanode = 1 > + # failures = 2 > + failure type = bad disk > + When/where failure happens = (see below) > + This bug is non-deterministic; to reproduce it, add a sufficient sleep > before out.write() in BlockReceiver.receivePacket() in dn1 and dn2 but not dn3 > > - Details: > Suppose the client appends 16 bytes to block X, which has length 16 bytes at dn1, > dn2, dn3. > Dn1 is primary. The pipeline is dn3-dn2-dn1. recoverBlock succeeds. > The client starts sending data to dn3, the first datanode in the pipeline. > dn3 forwards the packet to the downstream datanodes and starts writing > data to its disk. Suppose there is an exception in dn3 when writing to disk. > The client gets the exception and starts the recovery code by calling > dn1.recoverBlock() again. > dn1 in turn calls dn2.getMetadataInfo() and dn1.getMetaDataInfo() to build > the syncList. > Suppose at the time getMetadataInfo() is called at both datanodes (dn1 and > dn2), > the previous packet (which was sent from dn3) has not reached disk yet. > Hence, the block info given by getMetaDataInfo contains a length of 16 > bytes. > But after that, the packet "comes" to disk, so the block file length now > becomes 32 bytes. > Using the syncList (which contains block info with length 16 bytes), dn1 calls > updateBlock at > dn2 and dn1, which will fail, because the length of the new block info (given > by updateBlock, > which is 16 bytes) does not match its actual length on disk (which is 32 > bytes). > > Note that this bug is non-deterministic. It depends on the thread > interleaving > at the datanodes. > This bug was found by our Failure Testing Service framework: > http://www.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-98.html > For questions, please email us: Thanh Do (than...@cs.wisc.edu) and > Haryadi Gunawi (hary...@eecs.berkeley.edu) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Resolved: (HDFS-1226) Last block is temporary unavailable for readers because of crashed appender
[ https://issues.apache.org/jira/browse/HDFS-1226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon resolved HDFS-1226. --- Resolution: Duplicate > Last block is temporary unavailable for readers because of crashed appender > --- > > Key: HDFS-1226 > URL: https://issues.apache.org/jira/browse/HDFS-1226 > Project: Hadoop HDFS > Issue Type: Bug > Components: data-node >Affects Versions: 0.20-append >Reporter: Thanh Do > > - Summary: the last block is unavailable to subsequent readers if appender > crashes in the > middle of appending workload. > > - Setup: > + # available datanodes = 3 > + # disks / datanode = 1 > + # failures = 1 > + failure type = crash > + When/where failure happens = (see below) > > - Details: > Say a client appending to block X at 3 datanodes: dn1, dn2 and dn3. After > successful > recoverBlock at primary datanode, client calls createOutputStream, which make > all datanodes > move the block file and the meta file from current directory to tmp > directory. Now suppose > the client crashes. Since all replicas of block X are in tmp folders of > corresponding datanode, > subsequent readers cannot read block X. > This bug was found by our Failure Testing Service framework: > http://www.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-98.html > For questions, please email us: Thanh Do (than...@cs.wisc.edu) and > Haryadi Gunawi (hary...@eecs.berkeley.edu) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HDFS-1224) Stale connection makes node miss append
[ https://issues.apache.org/jira/browse/HDFS-1224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12881541#action_12881541 ] Todd Lipcon commented on HDFS-1224: --- bq. but the append semantic is not guaranteed, right There's nothing there that violates the append semantic. New readers will see the updated generation stamp and thus won't be able to read from the node with the shorter length, right? > Stale connection makes node miss append > --- > > Key: HDFS-1224 > URL: https://issues.apache.org/jira/browse/HDFS-1224 > Project: Hadoop HDFS > Issue Type: Bug > Components: data-node >Affects Versions: 0.20-append >Reporter: Thanh Do > > - Summary: if a datanode crashes and restarts, it may miss an append. > > - Setup: > + # available datanodes = 3 > + # replica = 3 > + # disks / datanode = 1 > + # failures = 1 > + failure type = crash > + When/where failure happens = after the first append succeed > > - Details: > Since each datanode maintains a pool of IPC connections, whenever it wants > to make an IPC call, it first looks into the pool. If the connection is not > there, > it is created and put in to the pool. Otherwise the existing connection is > used. > Suppose that the append pipeline contains dn1, dn2, and dn3. Dn1 is the > primary. > After the client appends to block X successfully, dn2 crashes and restarts. > Now client writes a new block Y to dn1, dn2 and dn3. The write is successful. > Client starts appending to block Y. It first calls dn1.recoverBlock(). > Dn1 will first create a proxy corresponding with each of the datanode in the > pipeline > (in order to make RPC call like getMetadataInfo( ) or updateBlock( )). > However, because > dn2 has just crashed and restarts, its connection in dn1's pool become stale. > Dn1 uses > this connection to make a call to dn2, hence an exception. Therefore, append > will be > made only to dn1 and dn3, although dn2 is alive and the write of block Y to > dn2 has > been successful. > This bug was found by our Failure Testing Service framework: > http://www.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-98.html > For questions, please email us: Thanh Do (than...@cs.wisc.edu) and > Haryadi Gunawi (hary...@eecs.berkeley.edu) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
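A generic sketch of the stale-connection failure mode described in this report, and of one possible mitigation (evict the cached entry and redial rather than dropping the node from the pipeline). This is deliberately not the Hadoop IPC client; the pool, keys, and connections here are hypothetical stand-ins.
{code}
import java.util.HashMap;
import java.util.Map;

/** Generic sketch of a connection pool with a stale entry; not the Hadoop IPC client. */
public class ConnectionPoolSketch {
    interface Conn { String call(String method) throws Exception; }

    static final Conn STALE = m -> { throw new Exception("connection reset by peer"); };
    static final Conn FRESH = m -> "ok: " + m;

    private final Map<String, Conn> pool = new HashMap<>(); // address -> cached connection

    Conn get(String address) {
        // Cached entries are reused even if the remote process restarted in the
        // meantime - this is how dn1 ends up calling dn2 over a dead connection.
        return pool.computeIfAbsent(address, a -> FRESH);
    }

    String callWithRetry(String address, String method) {
        try {
            return get(address).call(method);
        } catch (Exception stale) {
            // Possible mitigation: evict the stale entry and dial again,
            // instead of excluding dn2 from the append pipeline.
            pool.put(address, FRESH);
            try {
                return get(address).call(method);
            } catch (Exception reallyDown) {
                return "node really is down: " + reallyDown.getMessage();
            }
        }
    }

    public static void main(String[] args) {
        ConnectionPoolSketch p = new ConnectionPoolSketch();
        p.pool.put("dn2:50020", STALE);                   // dn2 crashed and restarted
        System.out.println(p.callWithRetry("dn2:50020", "getMetadataInfo"));
    }
}
{code}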
[jira] Commented: (HDFS-1229) DFSClient incorrectly asks for new block if primary crashes during first recoverBlock
[ https://issues.apache.org/jira/browse/HDFS-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12881540#action_12881540 ] Todd Lipcon commented on HDFS-1229: --- Hi Thanh, Can you please make a JUnit test for this against branch-0.20-append? The tests in TestFileAppend4 should be a good model. -Todd > DFSClient incorrectly asks for new block if primary crashes during first > recoverBlock > - > > Key: HDFS-1229 > URL: https://issues.apache.org/jira/browse/HDFS-1229 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs client >Affects Versions: 0.20-append >Reporter: Thanh Do > > Setup: > > + # available datanodes = 2 > + # disks / datanode = 1 > + # failures = 1 > + failure type = crash > + When/where failure happens = during primary's recoverBlock > > Details: > -- > Say client is appending to block X1 in 2 datanodes: dn1 and dn2. > First it needs to make sure both dn1 and dn2 agree on the new GS of the > block. > 1) Client first creates DFSOutputStream by calling > > >OutputStream result = new DFSOutputStream(src, buffersize, progress, > >lastBlock, stat, > > conf.getInt("io.bytes.per.checksum", 512)); > > in DFSClient.append() > > 2) The above DFSOutputStream constructor in turn calls > processDataNodeError(true, true) > (i.e, hasError = true, isAppend = true), and starts the DataStreammer > > > processDatanodeError(true, true); /* let's call this PDNE 1 */ > > streamer.start(); > > Note that DataStreammer.run() also calls processDatanodeError() > > while (!closed && clientRunning) { > > ... > > boolean doSleep = processDatanodeError(hasError, false); /let's call > > this PDNE 2*/ > > 3) Now in the PDNE 1, we have following code: > > > blockStream = null; > > blockReplyStream = null; > > ... > > while (!success && clientRunning) { > > ... > >try { > > primary = createClientDatanodeProtocolProxy(primaryNode, conf); > > newBlock = primary.recoverBlock(block, isAppend, newnodes); > > /*exception here*/ > > ... > >catch (IOException e) { > > ... > > if (recoveryErrorCount > maxRecoveryErrorCount) { > > // this condition is false > > } > > ... > > return true; > >} // end catch > >finally {...} > > > >this.hasError = false; > >lastException = null; > >errorIndex = 0; > >success = createBlockOutputStream(nodes, clientName, true); > >} > >... > > Because dn1 crashes during client call to recoverBlock, we have an exception. > Hence, go to the catch block, in which processDatanodeError returns true > before setting hasError to false. Also, because createBlockOutputStream() is > not called > (due to an early return), blockStream is still null. > > 4) Now PDNE 1 has finished, we come to streamer.start(), which calls PDNE 2. > Because hasError = false, PDNE 2 returns false immediately without doing > anything > > if (!hasError) { return false; } > > 5) still in the DataStreamer.run(), after returning false from PDNE 2, we > still have > blockStream = null, hence the following code is executed: > if (blockStream == null) { >nodes = nextBlockOutputStream(src); >this.setName("DataStreamer for file " + src + " block " + block); >response = new ResponseProcessor(nodes); >response.start(); > } > > nextBlockOutputStream which asks namenode to allocate new Block is called. > (This is not good, because we are appending, not writing). > Namenode gives it new Block ID and a set of datanodes, including crashed dn1. > this leads to createOutputStream() fails because it tries to contact the dn1 > first. > (which has crashed). 
The client retries 5 times without any success, > because every time, it asks namenode for new block! Again we see > that the retry logic at client is weird! > *This bug was found by our Failure Testing Service framework: > http://www.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-98.html > For questions, please email us: Thanh Do (than...@cs.wisc.edu) and > Haryadi Gunawi (hary...@eecs.berkeley.edu)* -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
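Purely to illustrate the decision point described in step 5, a tiny sketch of what a guard might look like: if the stream is still null but the client is appending, re-run recovery on the existing block instead of asking the NameNode for a new one. The flag and method names are hypothetical, not the actual DFSClient fix.
{code}
/** Simplified stand-in for the DataStreamer decision point; not the real DFSClient class. */
public class AppendRetrySketch {
    static Object blockStream = null;   // still null because createBlockOutputStream never ran
    static boolean isAppend = true;     // we are appending to an existing block

    static void dataStreamerStep() {
        if (blockStream == null) {
            if (isAppend) {
                // hypothetical guard: retry pipeline recovery on the existing block
                System.out.println("retry recoverBlock on the existing block");
            } else {
                // the 0.20-append behaviour described above: allocate a brand-new block,
                // which is the wrong thing to do in the append case
                System.out.println("nextBlockOutputStream(): ask NameNode for a new block");
            }
        }
    }

    public static void main(String[] args) { dataStreamerStep(); }
}
{code}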
[jira] Commented: (HDFS-1224) Stale connection makes node miss append
[ https://issues.apache.org/jira/browse/HDFS-1224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12881539#action_12881539 ] Thanh Do commented on HDFS-1224: "Even so, does this cause any actual problems aside from a shorter pipeline? I'm not sure, but based on the description, it sounds like dn2 thinks it has a block (but it is incomplete), so a client might end up trying to get a block from that node and get an incomplete block" I think this does not create any problem aside from shorter pipeline. dn2 has a block with old time stamp, because it misses updateBlock. Hence the block at dn2 is finally deleted. (but the append semantic is not guaranteed, right? because there are 3 alive datanodes, and write to all 3 is successful, but append only happen successfully at 2 datanodes). > Stale connection makes node miss append > --- > > Key: HDFS-1224 > URL: https://issues.apache.org/jira/browse/HDFS-1224 > Project: Hadoop HDFS > Issue Type: Bug > Components: data-node >Affects Versions: 0.20-append >Reporter: Thanh Do > > - Summary: if a datanode crashes and restarts, it may miss an append. > > - Setup: > + # available datanodes = 3 > + # replica = 3 > + # disks / datanode = 1 > + # failures = 1 > + failure type = crash > + When/where failure happens = after the first append succeed > > - Details: > Since each datanode maintains a pool of IPC connections, whenever it wants > to make an IPC call, it first looks into the pool. If the connection is not > there, > it is created and put in to the pool. Otherwise the existing connection is > used. > Suppose that the append pipeline contains dn1, dn2, and dn3. Dn1 is the > primary. > After the client appends to block X successfully, dn2 crashes and restarts. > Now client writes a new block Y to dn1, dn2 and dn3. The write is successful. > Client starts appending to block Y. It first calls dn1.recoverBlock(). > Dn1 will first create a proxy corresponding with each of the datanode in the > pipeline > (in order to make RPC call like getMetadataInfo( ) or updateBlock( )). > However, because > dn2 has just crashed and restarts, its connection in dn1's pool become stale. > Dn1 uses > this connection to make a call to dn2, hence an exception. Therefore, append > will be > made only to dn1 and dn3, although dn2 is alive and the write of block Y to > dn2 has > been successful. > This bug was found by our Failure Testing Service framework: > http://www.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-98.html > For questions, please email us: Thanh Do (than...@cs.wisc.edu) and > Haryadi Gunawi (hary...@eecs.berkeley.edu) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HDFS-63) Datanode stops cleaning disk space
[ https://issues.apache.org/jira/browse/HDFS-63?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12881537#action_12881537 ] Gokul commented on HDFS-63: --- Is this issue also present in the 0.20 versions? > Datanode stops cleaning disk space > -- > > Key: HDFS-63 > URL: https://issues.apache.org/jira/browse/HDFS-63 > Project: Hadoop HDFS > Issue Type: Bug > Environment: Linux >Reporter: Igor Bolotin >Priority: Critical > > Here is the situation - DFS cluster running Hadoop version 0.19.0. The > cluster is running on multiple servers with practically identical hardware. > Everything works perfectly well, except for one thing - from time to time one > of the data nodes (every time it's a different node) starts to consume more > and more disk space. The node keeps going and if we don't do anything - it > runs out of space completely (ignoring 20GB reserved space settings). > Once restarted - it cleans disk rapidly and goes back to approximately the > same utilization as the rest of data nodes in the cluster. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (HDFS-1261) Important methods and fields of BlockPlacementPolicyDefault.java shouldn't be private
Important methods and fields of BlockPlacementPolicyDefault.java shouldn't be private - Key: HDFS-1261 URL: https://issues.apache.org/jira/browse/HDFS-1261 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 0.22.0 Reporter: Rodrigo Schmidt Assignee: Rodrigo Schmidt Fix For: 0.22.0 BlockPlacementPolicyDefault has important properties and methods listed as private, which prevents it from being extended in a more useful way. We should change them to be protected. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
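A small sketch of why the visibility change matters, using stand-in classes (not the real BlockPlacementPolicy hierarchy): once the helper is protected, a subclass can override a single placement step and inherit the rest of the default behaviour.
{code}
/** Illustration only: stand-in classes, not the real BlockPlacementPolicy hierarchy. */
class PlacementPolicyDefault {
    // If this helper were private, the subclass below could not reuse or override it;
    // making such members protected is what this issue asks for.
    protected String chooseLocalNode(String writer) {
        return writer;
    }

    public String choose(String writer) {
        return chooseLocalNode(writer);
    }
}

class RackAwarePolicy extends PlacementPolicyDefault {
    @Override
    protected String chooseLocalNode(String writer) {
        // a custom policy tweaks one step and falls back to the default for the rest
        return "rack-local-peer-of-" + writer;
    }
}

public class PlacementPolicySketch {
    public static void main(String[] args) {
        System.out.println(new PlacementPolicyDefault().choose("dn1")); // dn1
        System.out.println(new RackAwarePolicy().choose("dn1"));        // rack-local-peer-of-dn1
    }
}
{code}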
[jira] Commented: (HDFS-1222) NameNode fail stop in spite of multiple metadata directories
[ https://issues.apache.org/jira/browse/HDFS-1222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12881527#action_12881527 ] Thanh Do commented on HDFS-1222: Konstantin, this is the namenode start-up workload. When the namenode gets an exception, it fail-stops rather than tolerating the error, i.e. it does not retry with another image if there is one. (This may be due to a design choice that has already been made.) > NameNode fail stop in spite of multiple metadata directories > > > Key: HDFS-1222 > URL: https://issues.apache.org/jira/browse/HDFS-1222 > Project: Hadoop HDFS > Issue Type: Bug > Components: name-node >Affects Versions: 0.20.1 >Reporter: Thanh Do > > Despite the ability to configure multiple name directories > (to store fsimage) and edits directories, the NameNode will fail-stop > most of the time it faces an exception when accessing these directories. > > The NameNode fail-stops if an exception happens when loading fsimage, > reading fstime, loading the edits log, writing fsimage.ckpt ..., although there > are still good replicas. The NameNode could have tried to work with the other > replicas, > and marked the faulty one. > This bug was found by our Failure Testing Service framework: > http://www.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-98.html > For questions, please email us: Thanh Do (than...@cs.wisc.edu) and > Haryadi Gunawi (hary...@eecs.berkeley.edu) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
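A minimal sketch of the alternative behaviour suggested here, assuming a deliberately simplified image layout: try each configured directory in turn, mark the faulty one, and fail only if no replica is readable. This is not the real FSImage loading code.
{code}
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.util.List;

/** Sketch of loading the image from the first healthy directory; not the real FSImage code. */
public class LoadImageWithFallbackSketch {
    static byte[] loadImage(List<File> nameDirs) throws IOException {
        IOException last = null;
        for (File dir : nameDirs) {
            File image = new File(dir, "current/fsimage");
            try {
                return Files.readAllBytes(image.toPath());
            } catch (IOException e) {
                // mark this replica as faulty and keep going instead of fail-stopping
                System.err.println("skipping bad image dir " + dir + ": " + e.getMessage());
                last = e;
            }
        }
        throw new IOException("no usable fsimage replica", last);
    }

    public static void main(String[] args) throws IOException {
        File bad = new File("/nonexistent/name1");               // placeholder bad directory
        File good = Files.createTempDirectory("name2").toFile(); // placeholder good directory
        new File(good, "current").mkdirs();
        Files.write(new File(good, "current/fsimage").toPath(), new byte[]{1, 2, 3});
        System.out.println("loaded " + loadImage(List.of(bad, good)).length + " bytes");
    }
}
{code}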
[jira] Commented: (HDFS-1220) Namenode unable to start due to truncated fstime
[ https://issues.apache.org/jira/browse/HDFS-1220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12881526#action_12881526 ] Thanh Do commented on HDFS-1220: It is not exactly the same as HDFS-1221, although fstime suffered from corruption there too (which may lead to data loss). In this case, I think the update to fstime should be atomic, or the NameNode should somehow anticipate reading an empty fstime. > Namenode unable to start due to truncated fstime > > > Key: HDFS-1220 > URL: https://issues.apache.org/jira/browse/HDFS-1220 > Project: Hadoop HDFS > Issue Type: Bug > Components: name-node >Affects Versions: 0.20.1 >Reporter: Thanh Do > > - Summary: updating the fstime file on disk is not atomic, so it is possible that > if a crash happens in the middle, the next time the NameNode reboots, it will > read a stale fstime and hence be unable to start successfully. > > - Details: > Basically, this involves 3 steps: > 1) delete fstime file (timeFile.delete()) > 2) truncate fstime file (new FileOutputStream(timeFile)) > 3) write new time to fstime file (out.writeLong(checkpointTime)) > If a crash happens after step 2 and before step 3, on the next reboot the > NameNode > gets an exception when reading the time (8 bytes) from an empty fstime file. > This bug was found by our Failure Testing Service framework: > http://www.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-98.html > For questions, please email us: Thanh Do (than...@cs.wisc.edu) and > Haryadi Gunawi (hary...@eecs.berkeley.edu -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
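A sketch of one way to make the fstime update crash-safe, as suggested in the comment above: write the new timestamp to a temporary file and atomically rename it over the old one, so there is no window in which fstime is empty. File names follow the description; the code is illustrative, not the actual NameNode patch.
{code}
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.file.*;

/** Sketch of a crash-safe fstime update: write a temp file, then rename over the old one. */
public class AtomicFstimeSketch {
    static void writeCheckpointTime(Path dir, long checkpointTime) throws IOException {
        Path tmp = dir.resolve("fstime.tmp");
        Path fstime = dir.resolve("fstime");
        try (DataOutputStream out = new DataOutputStream(Files.newOutputStream(tmp))) {
            out.writeLong(checkpointTime);   // the full 8 bytes land in the temp file first
        }
        // A crash before this point leaves the old fstime intact; a crash after it
        // leaves the new one. There is no state with an empty or truncated file.
        Files.move(tmp, fstime, StandardCopyOption.REPLACE_EXISTING,
                   StandardCopyOption.ATOMIC_MOVE);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("nn-storage");
        writeCheckpointTime(dir, System.currentTimeMillis());
        System.out.println("fstime bytes: " + Files.size(dir.resolve("fstime"))); // 8
    }
}
{code}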
[jira] Updated: (HDFS-1225) Block lost when primary crashes in recoverBlock
[ https://issues.apache.org/jira/browse/HDFS-1225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantin Shvachko updated HDFS-1225: -- Affects Version/s: 0.20-append (was: 0.20.1) > Block lost when primary crashes in recoverBlock > --- > > Key: HDFS-1225 > URL: https://issues.apache.org/jira/browse/HDFS-1225 > Project: Hadoop HDFS > Issue Type: Bug > Components: data-node >Affects Versions: 0.20-append >Reporter: Thanh Do > > - Summary: Block is lost if primary datanode crashes in the middle > tryUpdateBlock. > > - Setup: > # available datanode = 2 > # replica = 2 > # disks / datanode = 1 > # failures = 1 > # failure type = crash > When/where failure happens = (see below) > > - Details: > Suppose we have 2 datanodes: dn1 and dn2 and dn1 is primary. > Client appends to blk_X_1001 and crash happens during dn1.recoverBlock, > at the point after blk_X_1001.meta is renamed to blk_X_1001.meta_tmp1002 > **Interesting**, this case, the block X is lost eventually. Why? > After dn1.recoverBlock crashes at rename, what left at dn1 current directory > is: > 1) blk_X > > > 2) blk_X_1001.meta_tmp1002 > ==> this is an invalid block, because it has no meta file associated with it. > dn2 (after dn1 crash) now contains: > 1) blk_X > > > 2) blk_X_1002.meta > (note that the rename at dn2 is completed, because dn1 called > dn2.updateBlock() before > calling its own updateBlock()) > But the command namenode.commitBlockSynchronization is not reported to > namenode, > because dn1 is crashed. Therefore, from namenode point of view, the block X > has GS 1001. > Hence, the block is lost. > This bug was found by our Failure Testing Service framework: > http://www.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-98.html > For questions, please email us: Thanh Do (than...@cs.wisc.edu) and > Haryadi Gunawi (hary...@eecs.berkeley.edu) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HDFS-1228) CRC does not match when retrying appending a partial block
[ https://issues.apache.org/jira/browse/HDFS-1228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantin Shvachko updated HDFS-1228: -- Affects Version/s: 0.20-append (was: 0.20.1) > CRC does not match when retrying appending a partial block > -- > > Key: HDFS-1228 > URL: https://issues.apache.org/jira/browse/HDFS-1228 > Project: Hadoop HDFS > Issue Type: Bug > Components: data-node >Affects Versions: 0.20-append >Reporter: Thanh Do > > - Summary: when appending to partial block, if is possible that > retrial when facing an exception fails due to a checksum mismatch. > Append operation is not atomic (either complete or fail completely). > > - Setup: > + # available datanodes = 2 > +# disks / datanode = 1 > + # failures = 1 > + failure type = bad disk > + When/where failure happens = (see below) > > - Details: > Client writes 16 bytes to dn1 and dn2. Write completes. So far so good. > The meta file now contains: 7 bytes header + 4 byte checksum (CK1 - > checksum for 16 byte) Client then appends 16 bytes more, and let assume there > is an > exception at BlockReceiver.receivePacket() at dn2. So the client knows dn2 > is bad. BUT, the append at dn1 is complete (i.e the data portion and checksum > portion > has been made to disk to the corresponding block file and meta file), meaning > that the > checksum file at dn1 now contains 7 bytes header + 4 byte checksum (CK2 - > this is > checksum for 32 byte data). Because dn2 has an exception, client calls > recoverBlock and > starts append again to dn1. dn1 receives 16 byte data, it verifies if the > pre-computed > crc (CK2) matches what we recalculate just now (CK1), which obviously does > not match. > Hence an exception and retrial fails. > > - a similar bug has been reported at > https://issues.apache.org/jira/browse/HDFS-679 > but here, it manifests in different context. > This bug was found by our Failure Testing Service framework: > http://www.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-98.html > For questions, please email us: Thanh Do (than...@cs.wisc.edu) and > Haryadi Gunawi (hary...@eecs.berkeley.edu) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HDFS-1229) DFSClient incorrectly asks for new block if primary crashes during first recoverBlock
[ https://issues.apache.org/jira/browse/HDFS-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantin Shvachko updated HDFS-1229: -- Affects Version/s: 0.20-append (was: 0.20.1) > DFSClient incorrectly asks for new block if primary crashes during first > recoverBlock > - > > Key: HDFS-1229 > URL: https://issues.apache.org/jira/browse/HDFS-1229 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs client >Affects Versions: 0.20-append >Reporter: Thanh Do > > Setup: > > + # available datanodes = 2 > + # disks / datanode = 1 > + # failures = 1 > + failure type = crash > + When/where failure happens = during primary's recoverBlock > > Details: > -- > Say client is appending to block X1 in 2 datanodes: dn1 and dn2. > First it needs to make sure both dn1 and dn2 agree on the new GS of the > block. > 1) Client first creates DFSOutputStream by calling > > >OutputStream result = new DFSOutputStream(src, buffersize, progress, > >lastBlock, stat, > > conf.getInt("io.bytes.per.checksum", 512)); > > in DFSClient.append() > > 2) The above DFSOutputStream constructor in turn calls > processDataNodeError(true, true) > (i.e, hasError = true, isAppend = true), and starts the DataStreammer > > > processDatanodeError(true, true); /* let's call this PDNE 1 */ > > streamer.start(); > > Note that DataStreammer.run() also calls processDatanodeError() > > while (!closed && clientRunning) { > > ... > > boolean doSleep = processDatanodeError(hasError, false); /let's call > > this PDNE 2*/ > > 3) Now in the PDNE 1, we have following code: > > > blockStream = null; > > blockReplyStream = null; > > ... > > while (!success && clientRunning) { > > ... > >try { > > primary = createClientDatanodeProtocolProxy(primaryNode, conf); > > newBlock = primary.recoverBlock(block, isAppend, newnodes); > > /*exception here*/ > > ... > >catch (IOException e) { > > ... > > if (recoveryErrorCount > maxRecoveryErrorCount) { > > // this condition is false > > } > > ... > > return true; > >} // end catch > >finally {...} > > > >this.hasError = false; > >lastException = null; > >errorIndex = 0; > >success = createBlockOutputStream(nodes, clientName, true); > >} > >... > > Because dn1 crashes during client call to recoverBlock, we have an exception. > Hence, go to the catch block, in which processDatanodeError returns true > before setting hasError to false. Also, because createBlockOutputStream() is > not called > (due to an early return), blockStream is still null. > > 4) Now PDNE 1 has finished, we come to streamer.start(), which calls PDNE 2. > Because hasError = false, PDNE 2 returns false immediately without doing > anything > > if (!hasError) { return false; } > > 5) still in the DataStreamer.run(), after returning false from PDNE 2, we > still have > blockStream = null, hence the following code is executed: > if (blockStream == null) { >nodes = nextBlockOutputStream(src); >this.setName("DataStreamer for file " + src + " block " + block); >response = new ResponseProcessor(nodes); >response.start(); > } > > nextBlockOutputStream which asks namenode to allocate new Block is called. > (This is not good, because we are appending, not writing). > Namenode gives it new Block ID and a set of datanodes, including crashed dn1. > this leads to createOutputStream() fails because it tries to contact the dn1 > first. > (which has crashed). The client retries 5 times without any success, > because every time, it asks namenode for new block! Again we see > that the retry logic at client is weird! 
> *This bug was found by our Failure Testing Service framework: > http://www.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-98.html > For questions, please email us: Thanh Do (than...@cs.wisc.edu) and > Haryadi Gunawi (hary...@eecs.berkeley.edu)* -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HDFS-1230) BlocksMap.blockinfo is not getting cleared immediately after deleting a block.This will be cleared only after block report comes from the datanode.Why we need to maintain
[ https://issues.apache.org/jira/browse/HDFS-1230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12881523#action_12881523 ] Konstantin Shvachko commented on HDFS-1230: --- What is BlocksMap.blockinfo? There is no such member in BlocksMap. If this is a question please use hdfs lists rather than creating jiras. If not please clarify what exactly you would like to improve. > BlocksMap.blockinfo is not getting cleared immediately after deleting a > block.This will be cleared only after block report comes from the > datanode.Why we need to maintain the blockinfo till that time. > > > Key: HDFS-1230 > URL: https://issues.apache.org/jira/browse/HDFS-1230 > Project: Hadoop HDFS > Issue Type: Improvement > Components: name-node >Affects Versions: 0.20.1 >Reporter: Gokul > > BlocksMap.blockinfo is not getting cleared immediately after deleting a > block.This will be cleared only after block report comes from the > datanode.Why we need to maintain the blockinfo till that time It > increases namenode memory unnecessarily. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HDFS-1227) UpdateBlock fails due to unmatched file length
[ https://issues.apache.org/jira/browse/HDFS-1227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantin Shvachko updated HDFS-1227: -- Affects Version/s: 0.20-append (was: 0.20.1) > UpdateBlock fails due to unmatched file length > -- > > Key: HDFS-1227 > URL: https://issues.apache.org/jira/browse/HDFS-1227 > Project: Hadoop HDFS > Issue Type: Bug > Components: data-node >Affects Versions: 0.20-append >Reporter: Thanh Do > > - Summary: client append is not atomic, hence, it is possible that > when retrying during append, there is an exception in updateBlock > indicating unmatched file length, making append failed. > > - Setup: > + # available datanodes = 3 > + # disks / datanode = 1 > + # failures = 2 > + failure type = bad disk > + When/where failure happens = (see below) > + This bug is non-deterministic, to reproduce it, add a sufficient sleep > before out.write() in BlockReceiver.receivePacket() in dn1 and dn2 but not dn3 > > - Details: > Suppose client appends 16 bytes to block X which has length 16 bytes at dn1, > dn2, dn3. > Dn1 is primary. The pipeline is dn3-dn2-dn1. recoverBlock succeeds. > Client starts sending data to the dn3 - the first datanode in pipeline. > dn3 forwards the packet to downstream datanodes, and starts writing > data to its disk. Suppose there is an exception in dn3 when writing to disk. > Client gets the exception, it starts the recovery code by calling > dn1.recoverBlock() again. > dn1 in turn calls dn2.getMetadataInfo() and dn1.getMetaDataInfo() to build > the syncList. > Suppose at the time getMetadataInfo() is called at both datanodes (dn1 and > dn2), > the previous packet (which is sent from dn3) has not come to disk yet. > Hence, the block Info given by getMetaDataInfo contains the length of 16 > bytes. > But after that, the packet "comes" to disk, making the block file length now > becomes 32 bytes. > Using the syncList (with contains block info with length 16 byte), dn1 calls > updateBlock at > dn2 and dn1, which will failed, because the length of new block info (given > by updateBlock, > which is 16 byte) does not match with its actual length on disk (which is 32 > byte) > > Note that this bug is non-deterministic. Its depends on the thread > interleaving > at datanodes. > This bug was found by our Failure Testing Service framework: > http://www.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-98.html > For questions, please email us: Thanh Do (than...@cs.wisc.edu) and > Haryadi Gunawi (hary...@eecs.berkeley.edu) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Resolved: (HDFS-1223) DataNode fails stop due to a bad disk (or storage directory)
[ https://issues.apache.org/jira/browse/HDFS-1223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantin Shvachko resolved HDFS-1223. --- Resolution: Duplicate This is fixed as Todd mentions. BTW sometimes the behavior you describe here is desirable, see HDFS-1158 and HDFS-1161. > DataNode fails stop due to a bad disk (or storage directory) > > > Key: HDFS-1223 > URL: https://issues.apache.org/jira/browse/HDFS-1223 > Project: Hadoop HDFS > Issue Type: Bug > Components: data-node >Affects Versions: 0.20.1 >Reporter: Thanh Do > > A datanode can store block files in multiple volumes. > If a datanode sees a bad volume during start-up (i.e., it faces an exception > when accessing that volume), it simply fail-stops, making all block files > stored in the other healthy volumes inaccessible. Consequently, these lost > replicas will be regenerated later on other datanodes. > If a datanode is able to mark the bad disk and continue working with the > healthy ones, this will increase availability and avoid unnecessary > regeneration. As an extreme example, consider one datanode which has > 2 volumes V1 and V2, each containing about one 64MB block file. > During startup, the datanode gets an exception when accessing V1; it then > fail-stops, so both block files have to be regenerated later on. > If the datanode marks V1 as bad and continues working with V2, the number > of replicas that need to be regenerated is cut in half. > This bug was found by our Failure Testing Service framework: > http://www.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-98.html > For questions, please email us: Thanh Do (than...@cs.wisc.edu) and > Haryadi Gunawi (hary...@eecs.berkeley.edu) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
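A self-contained sketch of the tolerant start-up described above: probe each configured volume, keep the healthy ones, and refuse to start only if none is usable. Paths and checks are simplified placeholders, not the real DataNode storage code.
{code}
import java.io.File;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.List;

/** Sketch of tolerating a bad volume at DataNode startup instead of fail-stopping. */
public class VolumeStartupSketch {
    static List<File> checkVolumes(List<File> configured) {
        List<File> healthy = new ArrayList<>();
        for (File vol : configured) {
            if (vol.isDirectory() && vol.canRead() && vol.canWrite()) {
                healthy.add(vol);
            } else {
                // mark this volume bad and keep serving blocks from the others,
                // so only the replicas on this one disk need re-replication
                System.err.println("marking bad volume: " + vol);
            }
        }
        if (healthy.isEmpty()) {
            throw new IllegalStateException("no usable volumes, cannot start");
        }
        return healthy;
    }

    public static void main(String[] args) throws Exception {
        File good = Files.createTempDirectory("dfs-data").toFile(); // placeholder healthy volume
        File bad = new File("/nonexistent/volume");                 // placeholder bad volume
        System.out.println("serving from " + checkVolumes(List.of(good, bad)));
    }
}
{code}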
[jira] Updated: (HDFS-1226) Last block is temporary unavailable for readers because of crashed appender
[ https://issues.apache.org/jira/browse/HDFS-1226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantin Shvachko updated HDFS-1226: -- Affects Version/s: 0.20-append (was: 0.20.1) > Last block is temporary unavailable for readers because of crashed appender > --- > > Key: HDFS-1226 > URL: https://issues.apache.org/jira/browse/HDFS-1226 > Project: Hadoop HDFS > Issue Type: Bug > Components: data-node >Affects Versions: 0.20-append >Reporter: Thanh Do > > - Summary: the last block is unavailable to subsequent readers if appender > crashes in the > middle of appending workload. > > - Setup: > + # available datanodes = 3 > + # disks / datanode = 1 > + # failures = 1 > + failure type = crash > + When/where failure happens = (see below) > > - Details: > Say a client appending to block X at 3 datanodes: dn1, dn2 and dn3. After > successful > recoverBlock at primary datanode, client calls createOutputStream, which make > all datanodes > move the block file and the meta file from current directory to tmp > directory. Now suppose > the client crashes. Since all replicas of block X are in tmp folders of > corresponding datanode, > subsequent readers cannot read block X. > This bug was found by our Failure Testing Service framework: > http://www.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-98.html > For questions, please email us: Thanh Do (than...@cs.wisc.edu) and > Haryadi Gunawi (hary...@eecs.berkeley.edu) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HDFS-1220) Namenode unable to start due to truncated fstime
[ https://issues.apache.org/jira/browse/HDFS-1220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12881519#action_12881519 ] Konstantin Shvachko commented on HDFS-1220: --- Is this the same as HDFS-1221? > Namenode unable to start due to truncated fstime > > > Key: HDFS-1220 > URL: https://issues.apache.org/jira/browse/HDFS-1220 > Project: Hadoop HDFS > Issue Type: Bug > Components: name-node >Affects Versions: 0.20.1 >Reporter: Thanh Do > > - Summary: updating fstime file on disk is not atomic, so it is possible that > if a crash happens in the middle, next time when NameNode reboots, it will > read stale fstime, hence unable to start successfully. > > - Details: > Basically, this involve 3 steps: > 1) delete fstime file (timeFile.delete()) > 2) truncate fstime file (new FileOutputStream(timeFile)) > 3) write new time to fstime file (out.writeLong(checkpointTime)) > If a crash happens after step 2 and before step 3, in the next reboot, > NameNode > got an exception when reading the time (8 byte) from an empty fstime file. > This bug was found by our Failure Testing Service framework: > http://www.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-98.html > For questions, please email us: Thanh Do (than...@cs.wisc.edu) and > Haryadi Gunawi (hary...@eecs.berkeley.edu -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HDFS-1231) Generation Stamp mismatches, leading to failed append
[ https://issues.apache.org/jira/browse/HDFS-1231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantin Shvachko updated HDFS-1231: -- Affects Version/s: 0.20-append (was: 0.20.1) > Generation Stamp mismatches, leading to failed append > - > > Key: HDFS-1231 > URL: https://issues.apache.org/jira/browse/HDFS-1231 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs client >Affects Versions: 0.20-append >Reporter: Thanh Do > > - Summary: the recoverBlock is not atomic, leading retrial fails when > facing a failure. > > - Setup: > + # available datanodes = 3 > + # disks / datanode = 1 > + # failures = 2 > + failure type = crash > + When/where failure happens = (see below) > > - Details: > Suppose there are 3 datanodes in the pipeline: dn3, dn2, and dn1. Dn1 is > primary. > When appending, client first calls dn1.recoverBlock to make all the datanodes > in > pipeline agree on the new Generation Stamp (GS1) and the length of the block. > Client then sends a data packet to dn3. dn3 in turn forwards this packet to > down stream > dns (dn2 and dn1) and starts writing to its own disk, then it crashes AFTER > writing to the block > file but BEFORE writing to the meta file. Client notices the crash, it calls > dn1.recoverBlock(). > dn1.recoverBlock() first creates a syncList (by calling getMetadataInfo at > all dn2 and dn1). > Then dn1 calls NameNode.getNextGS() to get new Generation Stamp (GS2). > Then it calls dn2.updateBlock(), this returns successfully. > Now, it starts calling its own updateBlock and crashes after renaming from > blk_X_GS1.meta to blk_X_GS1.meta_tmpGS2. > Therefore, dn1.recoverBlock() from the client point of view fails. > but the GS for corresponding block has been incremented in the namenode (GS2) > The client retries by calling dn2.recoverBlock with old GS (GS1), which does > not match with > the new GS at the NameNode (GS1) -->exception, leading to append fails. > > Now, after all, we have > - in dn3 (which is crashed) > tmp/blk_X > tmp/blk_X_GS1.meta > - in dn2 > current/blk_X > current/blk_X_GS2 > - in dn1: > current/blk_X > current/blk_X_GS1.meta_tmpGS2 > - in NameNode, the block X has generation stamp GS1 (because dn1 has not > called > commitSyncronization yet). > > Therefore, when crashed datanodes restart, at dn1 the block is invalid > because > there is no meta file. In dn3, block file and meta file are finalized, > however, the > block is corrupted because CRC mismatch. In dn2, the GS of the block is GS2, > which is not equal with the generation stamp info of the block maintained in > NameNode. > Hence, the block blk_X is inaccessible. > This bug was found by our Failure Testing Service framework: > http://www.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-98.html > For questions, please email us: Thanh Do (than...@cs.wisc.edu) and > Haryadi Gunawi (hary...@eecs.berkeley.edu) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HDFS-1192) refreshSuperUserGroupsConfiguration should use server side configuration for the refresh (for HADOOP-6815)
[ https://issues.apache.org/jira/browse/HDFS-1192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12881515#action_12881515 ] Hudson commented on HDFS-1192: -- Integrated in Hadoop-Hdfs-trunk-Commit #317 (See [http://hudson.zones.apache.org/hudson/job/Hadoop-Hdfs-trunk-Commit/317/]) HDFS-1192. refreshSuperUserGroupsConfiguration should use server side configuration for the refresh (for HADOOP-6815) > refreshSuperUserGroupsConfiguration should use server side configuration for > the refresh (for HADOOP-6815) > -- > > Key: HDFS-1192 > URL: https://issues.apache.org/jira/browse/HDFS-1192 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Boris Shkolnik >Assignee: Boris Shkolnik > Attachments: HDFS-1192-1.patch, HDFS-1192-10.patch, HDFS-1192-9.patch > > > Currently refreshSuperUserGroupsConfiguration is using the client side > Configuration. > One of the issues with this is that if the cluster is restarted it will lose > the "refreshed" values (unless they are copied to the NameNode/JobTracker > machine). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HDFS-1255) test-libhdfs.sh fails
[ https://issues.apache.org/jira/browse/HDFS-1255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12881517#action_12881517 ] Hudson commented on HDFS-1255: -- Integrated in Hadoop-Hdfs-trunk-Commit #317 (See [http://hudson.zones.apache.org/hudson/job/Hadoop-Hdfs-trunk-Commit/317/]) Removed errant debug line from HDFS-1255 commit. HDFS-1255. Fix failing test-libhdfs.sh test. > test-libhdfs.sh fails > - > > Key: HDFS-1255 > URL: https://issues.apache.org/jira/browse/HDFS-1255 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Reporter: Tom White >Assignee: Tom White >Priority: Blocker > Fix For: 0.21.0 > > Attachments: HDFS-1255.patch > > > This is a consequence of bin scripts having moved (see HADOOP-6794). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HDFS-1111) getCorruptFiles() should give some hint that the list is not complete
[ https://issues.apache.org/jira/browse/HDFS-1111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12881512#action_12881512 ] Rodrigo Schmidt commented on HDFS-1111: --- Thanks for the comments, Konstantin! You are right about 1. I'll change that. As for 2, those words make a big difference for ops people, especially if they are running fsck -list-corruptfiles on a subdirectory. Knowing that the list is empty because there are no corrupt files, rather than suspecting it is empty only because the reported list hit its size limit with corrupt files from a different directory, makes all the difference for them. Let me give you an example: some time ago we had a problem and many files got corrupted. We were using fsck -list-corruptfiles because it was faster and more direct, but we wanted to focus on important directories first. We ran fsck -list-corruptfiles /path/to/important/dir but it returned an empty list. This was weird because we knew there were corrupt files there. The problem was that we filter by directory after we get the list reported from the namenode, and the list is size-limited. For that reason, it had been truncated to files in different directories and reported ambiguous output. Although the new code makes just a minor change to the interface, its meaning makes a huge impact on the user. > getCorruptFiles() should give some hint that the list is not complete > - > > Key: HDFS-1111 > URL: https://issues.apache.org/jira/browse/HDFS-1111 > Project: Hadoop HDFS > Issue Type: New Feature >Reporter: Rodrigo Schmidt >Assignee: Rodrigo Schmidt > Attachments: HADFS-.0.patch > > > The list of corrupt files returned by the namenode doesn't say anything if > the number of corrupted files is larger than the call output limit (which > means the list is not complete). There should be a way to hint incompleteness > to clients. > A simple hack would be to add an extra entry to the array returned with the > value null. Clients could interpret this as a sign that there are other > corrupt files in the system. > We should also do some rephrasing of the fsck output to make it more > confident when the list is complete and less confident when the list is > known to be incomplete. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HDFS-1111) getCorruptFiles() should give some hint that the list is not complete
[ https://issues.apache.org/jira/browse/HDFS-1111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12881504#action_12881504 ] Konstantin Shvachko commented on HDFS-1111: ---
# If you change {{ClientProtocol}} you should increment the protocol version.
# So many changes and new classes with the only outcome that fsck prints "ALL" or "A FEW" in the output instead of the current "a few". I am curious: when is it necessary to know whether these are all the corrupt files or there are more?
> getCorruptFiles() should give some hint that the list is not complete > - > > Key: HDFS-1111 > URL: https://issues.apache.org/jira/browse/HDFS-1111 > Project: Hadoop HDFS > Issue Type: New Feature >Reporter: Rodrigo Schmidt >Assignee: Rodrigo Schmidt > Attachments: HADFS-.0.patch > > > The list of corrupt files returned by the namenode doesn't say anything if > the number of corrupted files is larger than the call output limit (which > means the list is not complete). There should be a way to hint incompleteness > to clients. > A simple hack would be to add an extra entry to the array returned with the > value null. Clients could interpret this as a sign that there are other > corrupt files in the system. > We should also do some rephrasing of the fsck output to make it more > confident when the list is complete and less confident when the list is > known to be incomplete. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
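For what it's worth, the interface question boils down to returning a truncation signal alongside the list. A tiny sketch, with a hypothetical limit and result type (not Rodrigo's actual patch):
{code}
import java.util.List;

/** Sketch only: a result type that says whether the corrupt-file list was truncated. */
public class CorruptFilesResultSketch {
    static final int MAX_RETURNED = 500; // hypothetical server-side limit

    static class CorruptFiles {
        final List<String> paths;
        final boolean truncated;
        CorruptFiles(List<String> paths, boolean truncated) {
            this.paths = paths;
            this.truncated = truncated;
        }
    }

    static CorruptFiles listCorruptFiles(List<String> allCorrupt) {
        boolean truncated = allCorrupt.size() > MAX_RETURNED;
        List<String> head = truncated ? allCorrupt.subList(0, MAX_RETURNED) : allCorrupt;
        return new CorruptFiles(head, truncated);
    }

    public static void main(String[] args) {
        CorruptFiles r = listCorruptFiles(List.of("/important/a", "/other/b"));
        // fsck can now print "ALL corrupt files" vs. "list is incomplete" instead of
        // leaving an empty, directory-filtered result ambiguous.
        System.out.println(r.paths + " truncated=" + r.truncated);
    }
}
{code}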
[jira] Commented: (HDFS-1140) Speedup INode.getPathComponents
[ https://issues.apache.org/jira/browse/HDFS-1140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12881498#action_12881498 ] Konstantin Shvachko commented on HDFS-1140: --- Some review comments:
# {{FSImage.isParent(String, String)}} is not used, please remove.
# Could you please add separators between the methods, and javadoc descriptions for the new methods if possible.
# {{INode.getPathFromComponents()}} should be {{DFSUtil.byteArray2String()}}.
# {{TestPathComponents}} should use JUnit 4 style rather than JUnit 3.
# I'd advise to reuse {{U_STR}} instead of allocating {{DeprecatedUTF8 buff}} directly in {{FSImage.loadFSImage()}}. In order to do that you can provide a convenience method similar to {{readString()}} or {{readBytes()}}:
{code}
static byte[][] readPathComponents(DataInputStream in) throws IOException {
  U_STR.readFields(in);
  return DFSUtil.bytes2byteArray(U_STR.getBytes(), U_STR.getLength(), (byte)Path.SEPARATOR_CHAR);
}
{code}
The idea was to remove DeprecatedUTF8 at some point, so it is better to keep this stuff in one place right after the declaration of U_STR.
# It does not look like {{FSDirectory.addToParent(String src ...)}} is used anywhere anymore. Could you please verify and remove it if so.
# Same with {{INodeDirectory.addToParent(String path, ...)}} - can we eliminate it too?
> Speedup INode.getPathComponents > --- > > Key: HDFS-1140 > URL: https://issues.apache.org/jira/browse/HDFS-1140 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Dmytro Molkov >Assignee: Dmytro Molkov >Priority: Minor > Attachments: HDFS-1140.2.patch, HDFS-1140.3.patch, HDFS-1140.patch > > > When the namenode is loading the image there is a significant amount of time > being spent in DFSUtil.string2Bytes. We have a very specific workload > here. The path that the namenode does getPathComponents for shares N - 1 > components with the previous path this method was called for (assuming the current > path has N components). > Hence we can improve the image load time by caching the result of the previous > conversion. > We thought of using some simple LRU cache for components, but in reality > String.getBytes gets optimized during runtime and an LRU cache doesn't perform > as well; however, using just the latest path components and their translation > to bytes in two arrays gives quite a performance boost. > I could get another 20% off of the time to load the image on our cluster (30 > seconds vs 24) and I wrote a simple benchmark that tests performance with and > without caching. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
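A standalone sketch of the caching idea under review here: keep the previous path's components and their byte conversions, and reuse any shared prefix on the next call. Method and field names are illustrative, not the patch itself.
{code}
import java.nio.charset.StandardCharsets;

/** Sketch of reusing byte[] conversions of components shared with the previous path. */
public class PathComponentsCacheSketch {
    private static String[] lastStrings = new String[0];
    private static byte[][] lastBytes = new byte[0][];

    static byte[][] getPathComponents(String path) {
        String[] parts = path.split("/", -1);
        byte[][] out = new byte[parts.length][];
        for (int i = 0; i < parts.length; i++) {
            if (i < lastStrings.length && parts[i].equals(lastStrings[i])) {
                out[i] = lastBytes[i];                                  // shared prefix: reuse cached bytes
            } else {
                out[i] = parts[i].getBytes(StandardCharsets.UTF_8);     // convert only what changed
            }
        }
        lastStrings = parts;
        lastBytes = out;
        return out;
    }

    public static void main(String[] args) {
        getPathComponents("/user/hive/warehouse/t1");
        byte[][] b = getPathComponents("/user/hive/warehouse/t2");      // 4 of 5 components reused
        System.out.println("components: " + b.length);
    }
}
{code}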
[jira] Commented: (HDFS-1071) savenamespace should write the fsimage to all configured fs.name.dir in parallel
[ https://issues.apache.org/jira/browse/HDFS-1071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12881489#action_12881489 ] Dmytro Molkov commented on HDFS-1071: - Well, what I mean by the parent thread holding the lock is the following: the saveNamespace method is synchronized in the FSNamesystem and currently while holding this lock, the handler thread walks the tree N times and writes N files, so in a way we assume that the tree is guarded from all the modifications by the FSNamesystem lock. The same is true for the patch, except in this case we are walking the tree by N different threads. But operating under the same assumptions that while we are holding the FSNamesystem lock the tree is not being modified, and the handler thread is waiting for all worker threads to finish writing to their files before returning from the section synchronized on FSNamesystem. We just deployed this patch internally to our production cluster: 2010-06-22 10:12:59,714 INFO org.apache.hadoop.hdfs.server.common.Storage: Image file of size 11906663754 saved in 140 seconds. 2010-06-22 10:13:50,626 INFO org.apache.hadoop.hdfs.server.common.Storage: Image file of size 11906663754 saved in 191 seconds. This saved us 140 seconds on the current image. As far as both copies being on the same drive is concerned - I guess this patch will not give much of an improvement. However I am not sure there is much value in storing two copies of the image on the same drive? Please correct me if I am wrong, but I thought that multiple copies of the image should theoretically be stored on different drives to help in case of drive failure (or on a filer to protect against machine dying), and storing two copies on the same drive only helps with file corruption (accidental deletion) and that is a weak argument to have multiple copies on one physical drive? I like your approach with one thread doing serialization and others doing writes, but it seems like it is a lot more complicated than the one in this patch. Because I am simply executing one call in a new born thread, while with serializer-writer approach there will be more implementation questions, like what to do with multiple writers that consume their queues at different speeds. You cannot grow the queue indefinitely, since the namenode will simply run out of memory, on the other hand you might want to write things out to faster consumers as quickly as possible. And the main benefit I see is only doing serialization of a tree once, but since we are holding the FSNamesystem lock at that time the NameNode doesn't do much anyways, it is also not worse than what was in place before that (serialization was taking place once per image location). > savenamespace should write the fsimage to all configured fs.name.dir in > parallel > > > Key: HDFS-1071 > URL: https://issues.apache.org/jira/browse/HDFS-1071 > Project: Hadoop HDFS > Issue Type: Improvement > Components: name-node >Reporter: dhruba borthakur >Assignee: Dmytro Molkov > Attachments: HDFS-1071.2.patch, HDFS-1071.3.patch, HDFS-1071.4.patch, > HDFS-1071.5.patch, HDFS-1071.patch > > > If you have a large number of files in HDFS, the fsimage file is very big. > When the namenode restarts, it writes a copy of the fsimage to all > directories configured in fs.name.dir. This takes a long time, especially if > there are many directories in fs.name.dir. Make the NN write the fsimage to > all these directories in parallel. -- This message is automatically generated by JIRA. 
- You can reply to this email to add a comment to the issue online.
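To make the thread-per-directory approach discussed in HDFS-1071 above concrete, here is a minimal sketch. It is not the actual patch: saveFSImage() and the directory list are hypothetical stand-ins for the real FSImage internals, and the comment about the FSNamesystem lock describes the assumption the discussion relies on rather than code shown here.
{code}
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Sketch of a thread-per-directory fsimage save (illustration only).
public class ParallelImageSave {
  // Hypothetical stand-in for FSImage's real per-directory save routine.
  static void saveFSImage(File target) {
    System.out.println("would serialize the namespace tree into " + target);
  }

  // The caller is assumed to hold the FSNamesystem lock for the whole duration,
  // exactly as the existing synchronized saveNamespace() does.
  public static void saveInParallel(List<File> imageDirs) throws InterruptedException {
    List<Thread> savers = new ArrayList<Thread>();
    for (final File dir : imageDirs) {
      Thread t = new Thread(new Runnable() {
        public void run() {
          saveFSImage(new File(dir, "fsimage.ckpt"));  // each worker walks the tree itself
        }
      });
      t.start();
      savers.add(t);
    }
    for (Thread t : savers) {
      t.join();  // do not leave the locked section until every copy is on disk
    }
  }
}
{code}
Each worker walks the in-memory tree independently, which is why identical images across directories depend on the tree staying frozen under the lock for the whole save.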
[jira] Commented: (HDFS-1260) 0.20: Block lost when multiple DNs trying to recover it to different genstamps
[ https://issues.apache.org/jira/browse/HDFS-1260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12881491#action_12881491 ] Todd Lipcon commented on HDFS-1260: --- err.,.. sorry... rename the meta block back to *_7093* > 0.20: Block lost when multiple DNs trying to recover it to different genstamps > -- > > Key: HDFS-1260 > URL: https://issues.apache.org/jira/browse/HDFS-1260 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 0.20-append >Reporter: Todd Lipcon >Assignee: Todd Lipcon >Priority: Critical > Fix For: 0.20-append > > > Saw this issue on a cluster where some ops people were doing network changes > without shutting down DNs first. So, recovery ended up getting started at > multiple different DNs at the same time, and some race condition occurred > that caused a block to get permanently stuck in recovery mode. What seems to > have happened is the following: > - FSDataset.tryUpdateBlock called with old genstamp 7091, new genstamp 7094, > while the block in the volumeMap (and on filesystem) was genstamp 7093 > - we find the block file and meta file based on block ID only, without > comparing gen stamp > - we rename the meta file to the new genstamp _7094 > - in updateBlockMap, we do comparison in the volumeMap by oldblock *without* > wildcard GS, so it does *not* update volumeMap > - validateBlockMetaData now fails with "blk_7739687463244048122_7094 does not > exist in blocks map" > After this point, all future recovery attempts to that node fail in > getBlockMetaDataInfo, since it finds the _7094 gen stamp in getStoredBlock > (since the meta file got renamed above) and then fails since _7094 isn't in > volumeMap in validateBlockMetadata > Making a unit test for this is probably going to be difficult, but doable. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HDFS-1260) 0.20: Block lost when multiple DNs trying to recover it to different genstamps
[ https://issues.apache.org/jira/browse/HDFS-1260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12881490#action_12881490 ] Todd Lipcon commented on HDFS-1260: --- To confirm the suspicion above, I had the operator rename the meta block back to _7094, and the next recovery attempt succeeded. > 0.20: Block lost when multiple DNs trying to recover it to different genstamps > -- > > Key: HDFS-1260 > URL: https://issues.apache.org/jira/browse/HDFS-1260 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 0.20-append >Reporter: Todd Lipcon >Assignee: Todd Lipcon >Priority: Critical > Fix For: 0.20-append > > > Saw this issue on a cluster where some ops people were doing network changes > without shutting down DNs first. So, recovery ended up getting started at > multiple different DNs at the same time, and some race condition occurred > that caused a block to get permanently stuck in recovery mode. What seems to > have happened is the following: > - FSDataset.tryUpdateBlock called with old genstamp 7091, new genstamp 7094, > while the block in the volumeMap (and on filesystem) was genstamp 7093 > - we find the block file and meta file based on block ID only, without > comparing gen stamp > - we rename the meta file to the new genstamp _7094 > - in updateBlockMap, we do comparison in the volumeMap by oldblock *without* > wildcard GS, so it does *not* update volumeMap > - validateBlockMetaData now fails with "blk_7739687463244048122_7094 does not > exist in blocks map" > After this point, all future recovery attempts to that node fail in > getBlockMetaDataInfo, since it finds the _7094 gen stamp in getStoredBlock > (since the meta file got renamed above) and then fails since _7094 isn't in > volumeMap in validateBlockMetadata > Making a unit test for this is probably going to be difficult, but doable. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (HDFS-1260) 0.20: Block lost when multiple DNs trying to recover it to different genstamps
0.20: Block lost when multiple DNs trying to recover it to different genstamps -- Key: HDFS-1260 URL: https://issues.apache.org/jira/browse/HDFS-1260 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 0.20-append Reporter: Todd Lipcon Assignee: Todd Lipcon Priority: Critical Fix For: 0.20-append Saw this issue on a cluster where some ops people were doing network changes without shutting down DNs first. So, recovery ended up getting started at multiple different DNs at the same time, and some race condition occurred that caused a block to get permanently stuck in recovery mode. What seems to have happened is the following: - FSDataset.tryUpdateBlock called with old genstamp 7091, new genstamp 7094, while the block in the volumeMap (and on filesystem) was genstamp 7093 - we find the block file and meta file based on block ID only, without comparing gen stamp - we rename the meta file to the new genstamp _7094 - in updateBlockMap, we do comparison in the volumeMap by oldblock *without* wildcard GS, so it does *not* update volumeMap - validateBlockMetaData now fails with "blk_7739687463244048122_7094 does not exist in blocks map" After this point, all future recovery attempts to that node fail in getBlockMetaDataInfo, since it finds the _7094 gen stamp in getStoredBlock (since the meta file got renamed above) and then fails since _7094 isn't in volumeMap in validateBlockMetadata Making a unit test for this is probably going to be difficult, but doable. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
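A side note on the volumeMap lookup in the HDFS-1260 steps above: when the map key's equality includes the generation stamp, a lookup with the old stamp silently misses once the on-disk stamp has moved on. The classes below are simplified stand-ins, not the real FSDataset types, but they show why the non-wildcard comparison fails to update the map.
{code}
import java.util.HashMap;
import java.util.Map;

// Simplified illustration of why a non-wildcard generation-stamp lookup misses.
public class GenStampLookup {
  static class Block {
    final long id;
    final long genStamp;
    Block(long id, long genStamp) { this.id = id; this.genStamp = genStamp; }
    @Override public boolean equals(Object o) {
      if (!(o instanceof Block)) return false;
      Block b = (Block) o;
      return id == b.id && genStamp == b.genStamp;  // the stamp is part of equality
    }
    @Override public int hashCode() { return Long.valueOf(id).hashCode(); }
  }

  public static void main(String[] args) {
    Map<Block, String> volumeMap = new HashMap<Block, String>();
    volumeMap.put(new Block(7739687463244048122L, 7093), "replica info");

    // tryUpdateBlock arrives with the stale stamp 7091 rather than the on-disk 7093,
    // so an exact-match lookup finds nothing and the map entry is never updated.
    System.out.println(volumeMap.get(new Block(7739687463244048122L, 7091))); // prints null
  }
}
{code}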
[jira] Created: (HDFS-1259) getCorruptFiles() should double check blocks that are not really corrupted
getCorruptFiles() should double check blocks that are not really corrupted -- Key: HDFS-1259 URL: https://issues.apache.org/jira/browse/HDFS-1259 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: 0.22.0 Reporter: Rodrigo Schmidt Fix For: 0.22.0 The getCorruptFiles() API is based on blocks that are in the queue of under-replicated blocks. However, this queue might be outdated and report blocks that are safe, especially during restarts when block reports might get delayed. getCorruptFiles() should double-check against the blocksMap whether the block is really corrupted before reporting it. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
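The double check HDFS-1259 asks for amounts to re-consulting the authoritative block map before trusting the under-replicated queue. The sketch below only shows the shape of that check; liveReplicaCount and the other names are hypothetical stand-ins, not the real BlockManager API.
{code}
import java.util.ArrayList;
import java.util.List;

// Shape of the proposed double check: only report blocks that the blocks map
// still considers replica-less. All names here are illustrative stand-ins.
public class CorruptFilesCheck {
  interface BlocksMapView {
    int liveReplicaCount(long blockId);  // hypothetical: replicas known to blocksMap
  }

  static List<Long> reallyCorrupt(List<Long> corruptQueue, BlocksMapView blocksMap) {
    List<Long> confirmed = new ArrayList<Long>();
    for (long blockId : corruptQueue) {
      // The queue can be stale (e.g. delayed block reports after a restart),
      // so re-check against the blocks map before reporting the block as corrupt.
      if (blocksMap.liveReplicaCount(blockId) == 0) {
        confirmed.add(blockId);
      }
    }
    return confirmed;
  }
}
{code}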
[jira] Updated: (HDFS-1057) Concurrent readers hit ChecksumExceptions if following a writer to very end of file
[ https://issues.apache.org/jira/browse/HDFS-1057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sam rash updated HDFS-1057: --- Attachment: hdfs-1057-trunk-4.txt includes requested changes by hairong. also handles immediate reading of new files by translating a ReplicaNotFoundException into a 0-length block within DFSInputStream for under construction files > Concurrent readers hit ChecksumExceptions if following a writer to very end > of file > --- > > Key: HDFS-1057 > URL: https://issues.apache.org/jira/browse/HDFS-1057 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: data-node >Affects Versions: 0.20-append, 0.21.0, 0.22.0 >Reporter: Todd Lipcon >Assignee: sam rash >Priority: Blocker > Fix For: 0.20-append > > Attachments: conurrent-reader-patch-1.txt, > conurrent-reader-patch-2.txt, conurrent-reader-patch-3.txt, > hdfs-1057-trunk-1.txt, hdfs-1057-trunk-2.txt, hdfs-1057-trunk-3.txt, > hdfs-1057-trunk-4.txt > > > In BlockReceiver.receivePacket, it calls replicaInfo.setBytesOnDisk before > calling flush(). Therefore, if there is a concurrent reader, it's possible to > race here - the reader will see the new length while those bytes are still in > the buffers of BlockReceiver. Thus the client will potentially see checksum > errors or EOFs. Additionally, the last checksum chunk of the file is made > accessible to readers even though it is not stable. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
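The "translate a ReplicaNotFoundException into a 0-length block" behaviour described in the HDFS-1057 update above can be pictured as a narrow catch around the length probe. The exception type and probe interface below are self-contained placeholders standing in for HDFS's real ReplicaNotFoundException and DFSInputStream internals; this is a sketch of the idea, not the patch.
{code}
// Illustrative only: placeholder types stand in for HDFS's ReplicaNotFoundException
// and for DFSInputStream's per-block length probe.
public class UnderConstructionLength {
  static class ReplicaNotFoundException extends Exception {}

  interface BlockLengthProbe {
    long bytesOnDatanode(long blockId) throws ReplicaNotFoundException;
  }

  // For a file still under construction, a datanode that has not yet created the
  // replica is treated as "0 bytes so far" instead of as a read failure.
  static long visibleLength(BlockLengthProbe probe, long blockId, boolean underConstruction)
      throws ReplicaNotFoundException {
    try {
      return probe.bytesOnDatanode(blockId);
    } catch (ReplicaNotFoundException e) {
      if (underConstruction) {
        return 0L;   // brand-new block: nothing written yet, not an error
      }
      throw e;       // a finalized file should never be missing a replica
    }
  }
}
{code}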
[jira] Updated: (HDFS-1255) test-libhdfs.sh fails
[ https://issues.apache.org/jira/browse/HDFS-1255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tom White updated HDFS-1255: Status: Resolved (was: Patch Available) Hadoop Flags: [Reviewed] Resolution: Fixed I've just committed this. (Note that this script is not run as a part of the test target, but I verified it manually.) > test-libhdfs.sh fails > - > > Key: HDFS-1255 > URL: https://issues.apache.org/jira/browse/HDFS-1255 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Reporter: Tom White >Assignee: Tom White >Priority: Blocker > Fix For: 0.21.0 > > Attachments: HDFS-1255.patch > > > This is a consequence of bin scripts having moved (see HADOOP-6794). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HDFS-1212) Harmonize HDFS JAR library versions with Common
[ https://issues.apache.org/jira/browse/HDFS-1212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tom White updated HDFS-1212: Status: Patch Available (was: Open) > Harmonize HDFS JAR library versions with Common > --- > > Key: HDFS-1212 > URL: https://issues.apache.org/jira/browse/HDFS-1212 > Project: Hadoop HDFS > Issue Type: Bug > Components: build >Reporter: Tom White >Assignee: Tom White >Priority: Blocker > Fix For: 0.21.0 > > Attachments: HDFS-1212.patch, HDFS-1212.patch > > > HDFS part of HADOOP-6800. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HDFS-1212) Harmonize HDFS JAR library versions with Common
[ https://issues.apache.org/jira/browse/HDFS-1212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tom White updated HDFS-1212: Attachment: HDFS-1212.patch New patch to use Jetty's ant dependency. > Harmonize HDFS JAR library versions with Common > --- > > Key: HDFS-1212 > URL: https://issues.apache.org/jira/browse/HDFS-1212 > Project: Hadoop HDFS > Issue Type: Bug > Components: build >Reporter: Tom White >Assignee: Tom White >Priority: Blocker > Fix For: 0.21.0 > > Attachments: HDFS-1212.patch, HDFS-1212.patch > > > HDFS part of HADOOP-6800. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HDFS-1071) savenamespace should write the fsimage to all configured fs.name.dir in parallel
[ https://issues.apache.org/jira/browse/HDFS-1071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12881444#action_12881444 ] Konstantin Shvachko commented on HDFS-1071: --- > saveNamespace can only be done when in safemode, so we are only performing > read operations on the datastructure that is essentially read only Yes, you can saveNamespace only in safe mode, but the data structures are not read only. For example, {{getBlockLocations()}} updates {{accessTime}} for file inodes. This means that different threads in your implementation may record different atimes for the same file, depending on when they write it. As a result, the file system images will not be identical in different directories. They have been so far. > while the parent thread is holding a lock, right? Are you relying on the parent thread lock? Could you please explain how that works? In fact I thought you would implement something like this: one thread traverses the tree, serializes image objects and puts them into a queue, where other writing threads pick them up and write to disk in parallel. Then it is guaranteed that the images are exactly the same. > so it will take 1.5-2 minutes for both. Have you actually tested it? Is it the same if both directories are on the same drive? > savenamespace should write the fsimage to all configured fs.name.dir in > parallel > > > Key: HDFS-1071 > URL: https://issues.apache.org/jira/browse/HDFS-1071 > Project: Hadoop HDFS > Issue Type: Improvement > Components: name-node >Reporter: dhruba borthakur >Assignee: Dmytro Molkov > Attachments: HDFS-1071.2.patch, HDFS-1071.3.patch, HDFS-1071.4.patch, > HDFS-1071.5.patch, HDFS-1071.patch > > > If you have a large number of files in HDFS, the fsimage file is very big. > When the namenode restarts, it writes a copy of the fsimage to all > directories configured in fs.name.dir. This takes a long time, especially if > there are many directories in fs.name.dir. Make the NN write the fsimage to > all these directories in parallel. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
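Konstantin's alternative above - one serializer thread feeding per-directory writer threads - is essentially a bounded producer/consumer pipeline. The sketch below shows that shape with a bounded queue so memory cannot grow without limit (one of the concerns raised earlier in the thread). It is an illustration under simplified assumptions, not a proposal for the exact implementation: chunk production is shown as an already-built list purely to keep the sketch short.
{code}
import java.io.FileOutputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// One serializer, N writers, bounded queues: the tree is serialized once and
// every image directory receives byte-identical output.
public class SerializerWriterSketch {
  private static final byte[] POISON = new byte[0];  // end-of-stream marker

  public static void save(List<byte[]> chunks, List<String> imageFiles)
      throws InterruptedException {
    List<BlockingQueue<byte[]>> queues = new ArrayList<BlockingQueue<byte[]>>();
    List<Thread> writers = new ArrayList<Thread>();

    for (final String path : imageFiles) {
      // Bounded queue: a slow disk applies back-pressure instead of growing the heap.
      final BlockingQueue<byte[]> q = new ArrayBlockingQueue<byte[]>(64);
      queues.add(q);
      Thread w = new Thread(new Runnable() {
        public void run() {
          try {
            FileOutputStream out = new FileOutputStream(path);
            for (byte[] chunk = q.take(); chunk != POISON; chunk = q.take()) {
              out.write(chunk);
            }
            out.close();
          } catch (Exception e) {
            e.printStackTrace();  // a real version would fail just this directory
          }
        }
      });
      w.start();
      writers.add(w);
    }

    // Serializer: in the real name-node this loop would walk the namespace tree once.
    for (byte[] chunk : chunks) {
      for (BlockingQueue<byte[]> q : queues) {
        q.put(chunk);  // blocks when a writer falls behind
      }
    }
    for (BlockingQueue<byte[]> q : queues) {
      q.put(POISON);
    }
    for (Thread w : writers) {
      w.join();
    }
  }
}
{code}
Because every directory consumes the same chunk stream, the images are guaranteed to be byte-identical, which is the property the thread-per-directory approach cannot give once per-thread tree walks observe different atimes.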
[jira] Updated: (HDFS-1108) ability to create a file whose newly allocated blocks are automatically persisted immediately
[ https://issues.apache.org/jira/browse/HDFS-1108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmytro Molkov updated HDFS-1108: Status: Patch Available (was: Open) > ability to create a file whose newly allocated blocks are automatically > persisted immediately > - > > Key: HDFS-1108 > URL: https://issues.apache.org/jira/browse/HDFS-1108 > Project: Hadoop HDFS > Issue Type: Improvement > Components: name-node >Reporter: dhruba borthakur >Assignee: Dmytro Molkov > Attachments: HDFS-1108.patch > > > The current HDFS design says that newly allocated blocks for a file are not > persisted in the NN transaction log when the block is allocated. Instead, a > hflush() or a close() on the file persists the blocks into the transaction > log. It would be nice if we can immediately persist newly allocated blocks > (as soon as they are allocated) for specific files. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HDFS-1221) NameNode unable to start due to stale edits log after a crash
[ https://issues.apache.org/jira/browse/HDFS-1221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12881425#action_12881425 ] Konstantin Shvachko commented on HDFS-1221: --- If I understand it correctly, this is about name-node failure, when the image or edits files get corrupt, and when name-node detects that it fails even though there are other directories with good images and edits, right? I think this works as designed. We want the admins to know that something went wrong with those directories in bad conditions rather than silently starting the name-node. Admins may choose to manually change configuration, replace drives or something else, and restart the name-node again. Did you check HDFS-955, which fixed similar issues I believe? Since you do not provide test cases it is really hard to understand what failure condition exactly you are talking about. Are you planning to contribute your Failure Testing Service framework? > NameNode unable to start due to stale edits log after a crash > - > > Key: HDFS-1221 > URL: https://issues.apache.org/jira/browse/HDFS-1221 > Project: Hadoop HDFS > Issue Type: Bug > Components: name-node >Affects Versions: 0.20.1 >Reporter: Thanh Do > > - Summary: > If a crash happens during FSEditLog.createEditLogFile(), the > edits log file on disk may be stale. During next reboot, NameNode > will get an exception when parsing the edits file, because of stale data, > leading to unsuccessful reboot. > Note: This is just one example. Since we see that edits log (and fsimage) > does not have checksum, they are vulnerable to corruption too. > > - Details: > The steps to create new edits log (which we infer from HDFS code) are: > 1) truncate the file to zero size > 2) write FSConstants.LAYOUT_VERSION to buffer > 3) insert the end-of-file marker OP_INVALID to the end of the buffer > 4) preallocate 1MB of data, and fill the data with 0 > 5) flush the buffer to disk > > Note that only in step 1, 4, 5, the data on disk is actually changed. > Now, suppose a crash happens after step 4, but before step 5. > In the next reboot, NameNode will fetch this edits log file (which contains > all 0). The first thing parsed is the LAYOUT_VERSION, which is 0. This is OK, > because NameNode has code to handle that case. > (but we expect LAYOUT_VERSION to be -18, don't we). > Now it parses the operation code, which happens to be 0. Unfortunately, since > 0 > is the value for OP_ADD, the NameNode expects some parameters corresponding > to that operation. Now NameNode calls readString to read the path, which > throws > an exception leading to a failed reboot. > This bug was found by our Failure Testing Service framework: > http://www.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-98.html > For questions, please email us: Thanh Do (than...@cs.wisc.edu) and > Haryadi Gunawi (hary...@eecs.berkeley.edu) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
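To make the HDFS-1221 report above easier to follow, here is the write sequence the reporters infer, spelled out with plain java.io calls. This is a paraphrase for illustration only, not the actual FSEditLog code; the constants and the exact file layout are simplified.
{code}
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

// Paraphrase of the inferred createEditLogFile() sequence. Steps 2 and 3 only fill an
// in-memory buffer; steps 1, 4 and 5 actually change the file on disk, so a crash
// between 4 and 5 can leave an edits file full of zeros - exactly the state that
// later breaks the reboot.
public class EditLogCreateSketch {
  static final int LAYOUT_VERSION = -18;  // expected value per the report
  static final byte OP_INVALID = -1;      // end-of-file marker opcode (illustrative)

  public static void create(File edits) throws IOException {
    RandomAccessFile raf = new RandomAccessFile(edits, "rw");
    raf.setLength(0);                                  // step 1: truncate to zero size

    ByteArrayOutputStream buf = new ByteArrayOutputStream();
    DataOutputStream header = new DataOutputStream(buf);
    header.writeInt(LAYOUT_VERSION);                   // step 2: buffered only
    header.writeByte(OP_INVALID);                      // step 3: buffered only

    raf.seek(buf.size());
    raf.write(new byte[1024 * 1024]);                  // step 4: preallocate 1MB of zeros

    // A crash here means the header never reaches the disk: the file is all zeros,
    // which the next start-up misreads as LAYOUT_VERSION 0 followed by opcode 0 (OP_ADD).

    raf.seek(0);
    raf.write(buf.toByteArray());                      // step 5: flush the buffered header
    raf.getFD().sync();
    raf.close();
  }
}
{code}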
[jira] Commented: (HDFS-1258) Clearing namespace quota on "/" corrupts FS image
[ https://issues.apache.org/jira/browse/HDFS-1258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12881424#action_12881424 ] Tom White commented on HDFS-1258: - This is serious, for sure, but I think we could release 0.21.0 without it. The point of 0.21.0 is to exercise the release process, and make a Hadoop release available to people who want to try newer features and help stabilize post-20 Hadoop, so that later 0.21 releases and the 0.22 release in November will be more widely usable. 0.21 already has known issues (e.g. HDFS-875), so this one too could be called out in the release notes, so folks are made aware of its seriousness. > Clearing namespace quota on "/" corrupts FS image > - > > Key: HDFS-1258 > URL: https://issues.apache.org/jira/browse/HDFS-1258 > Project: Hadoop HDFS > Issue Type: Bug > Components: name-node >Reporter: Aaron T. Myers >Priority: Blocker > Fix For: 0.20.3, 0.21.0, 0.22.0 > > > The HDFS root directory starts out with a default namespace quota of > Integer.MAX_VALUE. If you clear this quota (using "hadoop dfsadmin -clrQuota > /"), the fsimage gets corrupted immediately. Subsequent 2NN rolls will fail, > and the NN will not come back up from a restart. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HDFS-1108) ability to create a file whose newly allocated blocks are automatically persisted immediately
[ https://issues.apache.org/jira/browse/HDFS-1108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmytro Molkov updated HDFS-1108: Attachment: HDFS-1108.patch This is a rather simple patch. When we allocate a new block, we call persistBlocks on the file. Although we do not sync the transaction log to disk, this seems reasonable enough, since the only problematic case is the machine crashing. The test creates a file, waits for a few blocks to become available, then stops the mini cluster without closing the file and starts it again. The length of the file cannot be smaller than before the restart if the blocks were persisted into the log. > ability to create a file whose newly allocated blocks are automatically > persisted immediately > - > > Key: HDFS-1108 > URL: https://issues.apache.org/jira/browse/HDFS-1108 > Project: Hadoop HDFS > Issue Type: Improvement > Components: name-node >Reporter: dhruba borthakur >Assignee: Dmytro Molkov > Attachments: HDFS-1108.patch > > > The current HDFS design says that newly allocated blocks for a file are not > persisted in the NN transaction log when the block is allocated. Instead, a > hflush() or a close() on the file persists the blocks into the transaction > log. It would be nice if we can immediately persist newly allocated blocks > (as soon as they are allocated) for specific files. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
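The change outlined for HDFS-1108 is easy to picture: right after a new block is handed to the writer, log it. The sketch below uses made-up names (EditLogView, logBlockAllocation) purely to show where the call sits relative to allocation; it is not the patch itself.
{code}
import java.util.ArrayList;
import java.util.List;

// Where the persist call sits relative to block allocation. All names are
// illustrative stand-ins for FSNamesystem/FSEditLog internals.
public class PersistOnAllocateSketch {
  interface EditLogView {
    void logBlockAllocation(String path, List<Long> blockIds);  // hypothetical
  }

  private final List<Long> allocatedBlocks = new ArrayList<Long>();
  private long nextBlockId = 1;

  // Allocate a block and, for files opened with the "persist immediately" behaviour,
  // record the new block list in the transaction log right away rather than
  // waiting for hflush()/close().
  long allocateBlock(String path, boolean persistImmediately, EditLogView editLog) {
    long id = nextBlockId++;
    allocatedBlocks.add(id);
    if (persistImmediately) {
      editLog.logBlockAllocation(path, allocatedBlocks);
      // As noted above, the log record is written but not necessarily synced here.
    }
    return id;
  }
}
{code}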
[jira] Commented: (HDFS-1057) Concurrent readers hit ChecksumExceptions if following a writer to very end of file
[ https://issues.apache.org/jira/browse/HDFS-1057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12881421#action_12881421 ] sam rash commented on HDFS-1057: I have an updated patch, but it does not yet handle missing replicas as 0-sized for under-construction files. There may be other 0.20 patches to port to make this happen. > Concurrent readers hit ChecksumExceptions if following a writer to very end > of file > --- > > Key: HDFS-1057 > URL: https://issues.apache.org/jira/browse/HDFS-1057 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: data-node >Affects Versions: 0.20-append, 0.21.0, 0.22.0 >Reporter: Todd Lipcon >Assignee: sam rash >Priority: Blocker > Fix For: 0.20-append > > Attachments: conurrent-reader-patch-1.txt, > conurrent-reader-patch-2.txt, conurrent-reader-patch-3.txt, > hdfs-1057-trunk-1.txt, hdfs-1057-trunk-2.txt, hdfs-1057-trunk-3.txt > > > In BlockReceiver.receivePacket, it calls replicaInfo.setBytesOnDisk before > calling flush(). Therefore, if there is a concurrent reader, it's possible to > race here - the reader will see the new length while those bytes are still in > the buffers of BlockReceiver. Thus the client will potentially see checksum > errors or EOFs. Additionally, the last checksum chunk of the file is made > accessible to readers even though it is not stable. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HDFS-1192) refreshSuperUserGroupsConfiguration should use server side configuration for the refresh (for HADOOP-6815)
[ https://issues.apache.org/jira/browse/HDFS-1192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12881417#action_12881417 ] Jitendra Nath Pandey commented on HDFS-1192: +1 for the patch. > refreshSuperUserGroupsConfiguration should use server side configuration for > the refresh (for HADOOP-6815) > -- > > Key: HDFS-1192 > URL: https://issues.apache.org/jira/browse/HDFS-1192 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Boris Shkolnik >Assignee: Boris Shkolnik > Attachments: HDFS-1192-1.patch, HDFS-1192-10.patch, HDFS-1192-9.patch > > > Currently refreshSuperUserGroupsConfiguration is using client side > Configuration. > One of the issues with this is that if the cluster is restarted it will loose > the "refreshed' values (unless they are copied to the NameNode/JobTracker > machine). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HDFS-1057) Concurrent readers hit ChecksumExceptions if following a writer to very end of file
[ https://issues.apache.org/jira/browse/HDFS-1057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12881413#action_12881413 ] dhruba borthakur commented on HDFS-1057: hi sam, will it be possible for you to address hairong's feedback and provide a new patch? Thanks. > Concurrent readers hit ChecksumExceptions if following a writer to very end > of file > --- > > Key: HDFS-1057 > URL: https://issues.apache.org/jira/browse/HDFS-1057 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: data-node >Affects Versions: 0.20-append, 0.21.0, 0.22.0 >Reporter: Todd Lipcon >Assignee: sam rash >Priority: Blocker > Fix For: 0.20-append > > Attachments: conurrent-reader-patch-1.txt, > conurrent-reader-patch-2.txt, conurrent-reader-patch-3.txt, > hdfs-1057-trunk-1.txt, hdfs-1057-trunk-2.txt, hdfs-1057-trunk-3.txt > > > In BlockReceiver.receivePacket, it calls replicaInfo.setBytesOnDisk before > calling flush(). Therefore, if there is a concurrent reader, it's possible to > race here - the reader will see the new length while those bytes are still in > the buffers of BlockReceiver. Thus the client will potentially see checksum > errors or EOFs. Additionally, the last checksum chunk of the file is made > accessible to readers even though it is not stable. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HDFS-1194) Secondary namenode fails to fetch the image from the primary
[ https://issues.apache.org/jira/browse/HDFS-1194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12881412#action_12881412 ] Dmytro Molkov commented on HDFS-1194: - We switched to using jetty 6.1.24 and can now checkpoint using the secondary again. The log on both nodes shows that we are hitting the JVM bug over and over again (jetty 6.1.24 has instrumentation to better understand what is happening to the transfer). So I say we should update the jetty version from the currently used 6.1.14 to 6.1.24. > Secondary namenode fails to fetch the image from the primary > > > Key: HDFS-1194 > URL: https://issues.apache.org/jira/browse/HDFS-1194 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 0.20.1, 0.20.2, 0.21.0, 0.22.0 > Environment: Java(TM) SE Runtime Environment (build 1.6.0_14-b08) > Java HotSpot(TM) 64-Bit Server VM (build 14.0-b16, mixed mode) > CentOS 5 >Reporter: Dmytro Molkov >Assignee: Dmytro Molkov > > We just hit the problem described in HDFS-1024 again. > After more investigation of the underlying problems with > CancelledKeyException there are some findings: > One of the symptoms: the transfer becomes really slow (it does 700 kb/s) when > I am doing the fetch using wget. At the same time disk and network are OK > since I can copy at 50 mb/s using scp. > I was taking jstacks of the namenode while the transfer is in process and we > found that every stack trace has one thread of jetty sitting in this place: > {code} >java.lang.Thread.State: TIMED_WAITING (sleeping) > at java.lang.Thread.sleep(Native Method) > at > org.mortbay.io.nio.SelectorManager$SelectSet.doSelect(SelectorManager.java:452) > at org.mortbay.io.nio.SelectorManager.doSelect(SelectorManager.java:185) > at > org.mortbay.jetty.nio.SelectChannelConnector.accept(SelectChannelConnector.java:124) > at > org.mortbay.jetty.AbstractConnector$Acceptor.run(AbstractConnector.java:707) > at > org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:522) > {code} > Here is the jetty code that corresponds to this: > {code} > // Look for JVM bug > if (selected==0 && wait>0 && (now-before)<wait/2 && _selector.selectedKeys().size()==0) > { > if (_jvmBug++>5) // TODO tune or configure this > { > // Probably JVM BUG! > > Iterator iter = _selector.keys().iterator(); > while(iter.hasNext()) > { > key = (SelectionKey) iter.next(); > if (key.isValid()&&key.interestOps()==0) > { > key.cancel(); > } > } > try > { > Thread.sleep(20); // tune or configure this > } > catch (InterruptedException e) > { > Log.ignore(e); > } > } > } > {code} > Based on this it is obvious we are hitting a jetty workaround for a JVM bug > that doesn't handle select() properly. > There is a jetty JIRA for this http://jira.codehaus.org/browse/JETTY-937 (it > actually introduces the workaround for the JVM bug that we are hitting). > They say that the problem was fixed in 6.1.22, and there is a person on that JIRA > also saying that switching to using SocketConnector instead of > SelectChannelConnector helped in their case. > Since we are hitting the same bug in our world, one approach is to adopt the > newer Jetty version with the better workaround; it might not help if we are > still hitting that bug constantly, but the workaround should at least be better. > Another approach is to switch to using SocketConnector, which will eliminate > the problem completely, although I am not sure what problems that will bring. 
> The Java version we are running is given in the Environment field. > Any thoughts? -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
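For reference, the "switch to SocketConnector" alternative mentioned in HDFS-1194 is a connector swap at the Jetty level. The snippet below is a bare Jetty 6 sketch under the assumption that org.mortbay.jetty.bio.SocketConnector is the blocking counterpart of the SelectChannelConnector seen in the stack trace above; it ignores how Hadoop's HTTP server actually wires its connectors, so treat it as an illustration rather than a drop-in change.
{code}
import org.mortbay.jetty.Server;
import org.mortbay.jetty.bio.SocketConnector;

// Bare Jetty 6 sketch: a blocking SocketConnector in place of the NIO
// SelectChannelConnector whose JVM-bug workaround is being hit above.
public class BlockingConnectorSketch {
  public static void main(String[] args) throws Exception {
    Server server = new Server();
    SocketConnector connector = new SocketConnector();  // thread-per-connection, no select()
    connector.setPort(50070);                            // example port only
    server.addConnector(connector);
    server.start();
    server.join();
  }
}
{code}
The trade-off is the usual one: the blocking connector sidesteps the selector bug entirely but ties up one thread per open connection.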
[jira] Commented: (HDFS-1218) 20 append: Blocks recovered on startup should be treated with lower priority during block synchronization
[ https://issues.apache.org/jira/browse/HDFS-1218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12881411#action_12881411 ] dhruba borthakur commented on HDFS-1218: Ok no problem. then we will just have to wait for HDFS-1056 and HDFS-1057 to be committed into trunk first. > 20 append: Blocks recovered on startup should be treated with lower priority > during block synchronization > - > > Key: HDFS-1218 > URL: https://issues.apache.org/jira/browse/HDFS-1218 > Project: Hadoop HDFS > Issue Type: Bug > Components: data-node >Affects Versions: 0.20-append >Reporter: Todd Lipcon >Assignee: Todd Lipcon >Priority: Critical > Fix For: 0.20-append > > Attachments: hdfs-1281.txt > > > When a datanode experiences power loss, it can come back up with truncated > replicas (due to local FS journal replay). Those replicas should not be > allowed to truncate the block during block synchronization if there are other > replicas from DNs that have _not_ restarted. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HDFS-1258) Clearing namespace quota on "/" corrupts FS image
[ https://issues.apache.org/jira/browse/HDFS-1258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12881410#action_12881410 ] Tsz Wo (Nicholas), SZE commented on HDFS-1258: -- Unfortunately, this seems true. # start hdfs # put a file # clear / quota # restart namenode {noformat} 2010-06-22 22:22:16,337 ERROR org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem initialization failed. java.io.EOFException at java.io.DataInputStream.readFully(DataInputStream.java:180) at org.apache.hadoop.io.UTF8.readFields(UTF8.java:106) at org.apache.hadoop.hdfs.server.namenode.FSImage.readBytes(FSImage.java:1588) at org.apache.hadoop.hdfs.server.namenode.FSImage.readINodeUnderConstruction(FSImage.java:1227) at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFilesUnderConstruction(FSImage.java:1205) at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:959) at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:807) at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:364) at org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:87) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:303) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.(FSNamesystem.java:284) at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:201) at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:279) at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:956) at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:965) 2010-06-22 22:22:16,338 INFO org.apache.hadoop.ipc.Server: Stopping server on 9000 2010-06-22 22:22:16,339 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: java.io.EOFException at java.io.DataInputStream.readFully(DataInputStream.java:180) at org.apache.hadoop.io.UTF8.readFields(UTF8.java:106) at org.apache.hadoop.hdfs.server.namenode.FSImage.readBytes(FSImage.java:1588) at org.apache.hadoop.hdfs.server.namenode.FSImage.readINodeUnderConstruction(FSImage.java:1227) at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFilesUnderConstruction(FSImage.java:1205) at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:959) at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:807) at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:364) at org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:87) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:303) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.(FSNamesystem.java:284) at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:201) at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:279) at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:956) at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:965) {noformat} > Clearing namespace quota on "/" corrupts FS image > - > > Key: HDFS-1258 > URL: https://issues.apache.org/jira/browse/HDFS-1258 > Project: Hadoop HDFS > Issue Type: Bug > Components: name-node >Reporter: Aaron T. Myers >Priority: Blocker > Fix For: 0.20.3, 0.21.0, 0.22.0 > > > The HDFS root directory starts out with a default namespace quota of > Integer.MAX_VALUE. If you clear this quota (using "hadoop dfsadmin -clrQuota > /"), the fsimage gets corrupted immediately. 
Subsequent 2NN rolls will fail, > and the NN will not come back up from a restart. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HDFS-1056) Multi-node RPC deadlocks during block recovery
[ https://issues.apache.org/jira/browse/HDFS-1056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] dhruba borthakur updated HDFS-1056: --- Fix Version/s: 0.20-append This should be fixed for the 0.20-append branch as well. > Multi-node RPC deadlocks during block recovery > -- > > Key: HDFS-1056 > URL: https://issues.apache.org/jira/browse/HDFS-1056 > Project: Hadoop HDFS > Issue Type: Improvement > Components: data-node >Affects Versions: 0.20.2, 0.21.0, 0.22.0 >Reporter: Todd Lipcon > Fix For: 0.20-append > > > Believe it or not, I'm seeing HADOOP-3657 / HADOOP-3673 in a 5-node 0.20 > cluster. I have many concurrent writes on the cluster, and when I kill a DN, > some percentage of the time I get one of these cross-node deadlocks among 3 > of the nodes (replication 3). All of the DN RPC server threads are tied up > waiting on RPC clients to other datanodes. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HDFS-1057) Concurrent readers hit ChecksumExceptions if following a writer to very end of file
[ https://issues.apache.org/jira/browse/HDFS-1057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] dhruba borthakur updated HDFS-1057: --- Fix Version/s: 0.20-append we need this for the 0.20-append branch too. > Concurrent readers hit ChecksumExceptions if following a writer to very end > of file > --- > > Key: HDFS-1057 > URL: https://issues.apache.org/jira/browse/HDFS-1057 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: data-node >Affects Versions: 0.20-append, 0.21.0, 0.22.0 >Reporter: Todd Lipcon >Assignee: sam rash >Priority: Blocker > Fix For: 0.20-append > > Attachments: conurrent-reader-patch-1.txt, > conurrent-reader-patch-2.txt, conurrent-reader-patch-3.txt, > hdfs-1057-trunk-1.txt, hdfs-1057-trunk-2.txt, hdfs-1057-trunk-3.txt > > > In BlockReceiver.receivePacket, it calls replicaInfo.setBytesOnDisk before > calling flush(). Therefore, if there is a concurrent reader, it's possible to > race here - the reader will see the new length while those bytes are still in > the buffers of BlockReceiver. Thus the client will potentially see checksum > errors or EOFs. Additionally, the last checksum chunk of the file is made > accessible to readers even though it is not stable. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HDFS-1218) 20 append: Blocks recovered on startup should be treated with lower priority during block synchronization
[ https://issues.apache.org/jira/browse/HDFS-1218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12881401#action_12881401 ] Todd Lipcon commented on HDFS-1218: --- The patch has a lot of conflicts unless we do HDFS-1057 and HDFS-1056 first. Rather than try to resolve those conflicts, I think it's safer to get those patches in first, and then have less conflict resolution work to do here (I'm afraid of switching around the application order too much - too easy to flub resolution and introduce a bug). > 20 append: Blocks recovered on startup should be treated with lower priority > during block synchronization > - > > Key: HDFS-1218 > URL: https://issues.apache.org/jira/browse/HDFS-1218 > Project: Hadoop HDFS > Issue Type: Bug > Components: data-node >Affects Versions: 0.20-append >Reporter: Todd Lipcon >Assignee: Todd Lipcon >Priority: Critical > Fix For: 0.20-append > > Attachments: hdfs-1281.txt > > > When a datanode experiences power loss, it can come back up with truncated > replicas (due to local FS journal replay). Those replicas should not be > allowed to truncate the block during block synchronization if there are other > replicas from DNs that have _not_ restarted. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HDFS-1202) DataBlockScanner throws NPE when updated before initialized
[ https://issues.apache.org/jira/browse/HDFS-1202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12881398#action_12881398 ] dhruba borthakur commented on HDFS-1202: I will commit this to trunk as well as 0.20-append branch as soon as HadoopQA runs its tests on it. BTW, the patch for 0.20-append branch does not apply cleanly. Todd: can you pl upload a new version of this patch? Thanks a lot once again. > DataBlockScanner throws NPE when updated before initialized > --- > > Key: HDFS-1202 > URL: https://issues.apache.org/jira/browse/HDFS-1202 > Project: Hadoop HDFS > Issue Type: Bug > Components: data-node >Affects Versions: 0.20-append, 0.22.0 >Reporter: Todd Lipcon >Assignee: Todd Lipcon > Fix For: 0.20-append, 0.22.0 > > Attachments: hdfs-1202-0.20-append.txt, hdfs-1202.txt > > > Missing an isInitialized() check in updateScanStatusInternal -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HDFS-1218) 20 append: Blocks recovered on startup should be treated with lower priority during block synchronization
[ https://issues.apache.org/jira/browse/HDFS-1218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12881395#action_12881395 ] dhruba borthakur commented on HDFS-1218: This looks ready for commit into 0.20-append. Todd: can you pl upload a patch that merges with 0-20-append branch? Thanks a lot. > 20 append: Blocks recovered on startup should be treated with lower priority > during block synchronization > - > > Key: HDFS-1218 > URL: https://issues.apache.org/jira/browse/HDFS-1218 > Project: Hadoop HDFS > Issue Type: Bug > Components: data-node >Affects Versions: 0.20-append >Reporter: Todd Lipcon >Assignee: Todd Lipcon >Priority: Critical > Fix For: 0.20-append > > Attachments: hdfs-1281.txt > > > When a datanode experiences power loss, it can come back up with truncated > replicas (due to local FS journal replay). Those replicas should not be > allowed to truncate the block during block synchronization if there are other > replicas from DNs that have _not_ restarted. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Assigned: (HDFS-1214) hdfs client metadata cache
[ https://issues.apache.org/jira/browse/HDFS-1214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sam rash reassigned HDFS-1214: -- Assignee: sam rash > hdfs client metadata cache > -- > > Key: HDFS-1214 > URL: https://issues.apache.org/jira/browse/HDFS-1214 > Project: Hadoop HDFS > Issue Type: New Feature > Components: hdfs client >Reporter: Joydeep Sen Sarma >Assignee: sam rash > > In some applications, latency is affected by the cost of making rpc calls to > namenode to fetch metadata. the most obvious case are calls to fetch > file/directory status. applications like hive like to make optimizations > based on file size/number etc. - and for such optimizations - 'recent' status > data (as opposed to most up-to-date) is acceptable. in such cases, a cache on > the DFS client that transparently caches metadata would be greatly benefit > applications. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HDFS-1258) Clearing namespace quota on "/" corrupts FS image
[ https://issues.apache.org/jira/browse/HDFS-1258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo (Nicholas), SZE updated HDFS-1258: - Fix Version/s: 0.20.3 0.21.0 0.22.0 Priority: Blocker (was: Major) Let's mark this as a blocker. We have to resolve this before new releases anyway. > Clearing namespace quota on "/" corrupts FS image > - > > Key: HDFS-1258 > URL: https://issues.apache.org/jira/browse/HDFS-1258 > Project: Hadoop HDFS > Issue Type: Bug > Components: name-node >Reporter: Aaron T. Myers >Priority: Blocker > Fix For: 0.20.3, 0.21.0, 0.22.0 > > > The HDFS root directory starts out with a default namespace quota of > Integer.MAX_VALUE. If you clear this quota (using "hadoop dfsadmin -clrQuota > /"), the fsimage gets corrupted immediately. Subsequent 2NN rolls will fail, > and the NN will not come back up from a restart. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HDFS-1253) bin/hdfs conflicts with common user shortcut
[ https://issues.apache.org/jira/browse/HDFS-1253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12881392#action_12881392 ] Allen Wittenauer commented on HDFS-1253: I know it is all over Y!'s and LI's wiki pages and internal (and sometimes external) presentations. A more public one is this one: http://blog.rapleaf.com/dev/2009/11/17/command-line-auto-completion-for-hadoop-dfs-commands/ > bin/hdfs conflicts with common user shortcut > > > Key: HDFS-1253 > URL: https://issues.apache.org/jira/browse/HDFS-1253 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 0.21.0 >Reporter: Allen Wittenauer >Priority: Blocker > Fix For: 0.21.0 > > > The 'hdfs' command introduced in 0.21 (unreleased at this time) conflicts > with a common user alias and wrapper script. This change should either be > reverted or moved from $HADOOP_HOME/bin to somewhere else in $HADOOP_HOME > (perhaps sbin?) so that users do not accidentally hit it. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HDFS-1218) 20 append: Blocks recovered on startup should be treated with lower priority during block synchronization
[ https://issues.apache.org/jira/browse/HDFS-1218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12881389#action_12881389 ] Todd Lipcon commented on HDFS-1218: --- bq. todd: do we need to port this to trunk? or does trunk already handle this since it has the RBW/RWR? Trunk should already handle this case. We should port forward these test cases at some point, but there's already an open JIRA to move forward all the new TestFileAppend4 cases - hopefully we can do that after we finish getting everything in this branch. > 20 append: Blocks recovered on startup should be treated with lower priority > during block synchronization > - > > Key: HDFS-1218 > URL: https://issues.apache.org/jira/browse/HDFS-1218 > Project: Hadoop HDFS > Issue Type: Bug > Components: data-node >Affects Versions: 0.20-append >Reporter: Todd Lipcon >Assignee: Todd Lipcon >Priority: Critical > Fix For: 0.20-append > > Attachments: hdfs-1281.txt > > > When a datanode experiences power loss, it can come back up with truncated > replicas (due to local FS journal replay). Those replicas should not be > allowed to truncate the block during block synchronization if there are other > replicas from DNs that have _not_ restarted. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HDFS-1218) 20 append: Blocks recovered on startup should be treated with lower priority during block synchronization
[ https://issues.apache.org/jira/browse/HDFS-1218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12881382#action_12881382 ] sam rash commented on HDFS-1218: Sorry, to clarify my previous comment: "this" = the case from my previous-previous comment, the one about moving the recovery-state check inside the else block. So just to clarify, I think this patch is good for 0.20-append. Todd: do we need to port this to trunk? Or does trunk already handle this since it has the RBW/RWR replica states? > 20 append: Blocks recovered on startup should be treated with lower priority > during block synchronization > - > > Key: HDFS-1218 > URL: https://issues.apache.org/jira/browse/HDFS-1218 > Project: Hadoop HDFS > Issue Type: Bug > Components: data-node >Affects Versions: 0.20-append >Reporter: Todd Lipcon >Assignee: Todd Lipcon >Priority: Critical > Fix For: 0.20-append > > Attachments: hdfs-1281.txt > > > When a datanode experiences power loss, it can come back up with truncated > replicas (due to local FS journal replay). Those replicas should not be > allowed to truncate the block during block synchronization if there are other > replicas from DNs that have _not_ restarted. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HDFS-1202) DataBlockScanner throws NPE when updated before initialized
[ https://issues.apache.org/jira/browse/HDFS-1202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon updated HDFS-1202: -- Attachment: hdfs-1202.txt Patch attached for trunk. No unit test as this is pretty trivial and difficult to isolate in a test. > DataBlockScanner throws NPE when updated before initialized > --- > > Key: HDFS-1202 > URL: https://issues.apache.org/jira/browse/HDFS-1202 > Project: Hadoop HDFS > Issue Type: Bug > Components: data-node >Affects Versions: 0.20-append, 0.22.0 >Reporter: Todd Lipcon >Assignee: Todd Lipcon > Fix For: 0.20-append, 0.22.0 > > Attachments: hdfs-1202-0.20-append.txt, hdfs-1202.txt > > > Missing an isInitialized() check in updateScanStatusInternal -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
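Since the HDFS-1202 fix is described as a missing isInitialized() guard in updateScanStatusInternal, its shape is roughly the following. The class and method bodies here are self-contained stand-ins, not the DataBlockScanner source.
{code}
// Shape of the fix: bail out instead of touching state the scanner has not built yet.
// This is an illustrative stand-in, not the DataBlockScanner code.
public class ScannerGuardSketch {
  private volatile boolean initialized = false;

  boolean isInitialized() { return initialized; }

  void updateScanStatusInternal(long blockId) {
    if (!isInitialized()) {
      return;  // update arrived before the scanner finished starting up: ignore it
    }
    // ... the existing update logic would run here, now safe from the NPE ...
  }

  void initialize() { initialized = true; }
}
{code}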
[jira] Updated: (HDFS-1202) DataBlockScanner throws NPE when updated before initialized
[ https://issues.apache.org/jira/browse/HDFS-1202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon updated HDFS-1202: -- Status: Patch Available (was: Open) Fix Version/s: 0.22.0 > DataBlockScanner throws NPE when updated before initialized > --- > > Key: HDFS-1202 > URL: https://issues.apache.org/jira/browse/HDFS-1202 > Project: Hadoop HDFS > Issue Type: Bug > Components: data-node >Affects Versions: 0.20-append, 0.22.0 >Reporter: Todd Lipcon >Assignee: Todd Lipcon > Fix For: 0.20-append, 0.22.0 > > Attachments: hdfs-1202-0.20-append.txt, hdfs-1202.txt > > > Missing an isInitialized() check in updateScanStatusInternal -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HDFS-1253) bin/hdfs conflicts with common user shortcut
[ https://issues.apache.org/jira/browse/HDFS-1253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12881375#action_12881375 ] Jeff Hammerbacher commented on HDFS-1253: - Hey Allen, Can you point to places on the net where the hdfs alias is used? It hasn't been as common in environments in which I've worked. I'm afraid we're optimizing for a potential use case rather than a real use case. In any case, as you pointed out in IRC, if a user has an alias for hdfs, that will take precedence over their PATH setting, so it's unlikely that they'll get bitten too hard. Thanks, Jeff > bin/hdfs conflicts with common user shortcut > > > Key: HDFS-1253 > URL: https://issues.apache.org/jira/browse/HDFS-1253 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 0.21.0 >Reporter: Allen Wittenauer >Priority: Blocker > Fix For: 0.21.0 > > > The 'hdfs' command introduced in 0.21 (unreleased at this time) conflicts > with a common user alias and wrapper script. This change should either be > reverted or moved from $HADOOP_HOME/bin to somewhere else in $HADOOP_HOME > (perhaps sbin?) so that users do not accidentally hit it. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HDFS-1202) DataBlockScanner throws NPE when updated before initialized
[ https://issues.apache.org/jira/browse/HDFS-1202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12881370#action_12881370 ] sam rash commented on HDFS-1202: this looks good. I checked trunk and I think it is needed there also > DataBlockScanner throws NPE when updated before initialized > --- > > Key: HDFS-1202 > URL: https://issues.apache.org/jira/browse/HDFS-1202 > Project: Hadoop HDFS > Issue Type: Bug > Components: data-node >Affects Versions: 0.20-append, 0.22.0 >Reporter: Todd Lipcon >Assignee: Todd Lipcon > Fix For: 0.20-append > > Attachments: hdfs-1202-0.20-append.txt > > > Missing an isInitialized() check in updateScanStatusInternal -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HDFS-1209) Add conf dfs.client.block.recovery.retries to configure number of block recovery attempts
[ https://issues.apache.org/jira/browse/HDFS-1209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon updated HDFS-1209: -- Attachment: hdfs-1209.txt There was some whitespace issue with the previous patch. This one applies against branch. > Add conf dfs.client.block.recovery.retries to configure number of block > recovery attempts > - > > Key: HDFS-1209 > URL: https://issues.apache.org/jira/browse/HDFS-1209 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs client >Affects Versions: 0.20-append >Reporter: Todd Lipcon >Assignee: Todd Lipcon > Attachments: hdfs-1209.txt, hdfs-1209.txt > > > This variable is referred to in the TestFileAppend4 tests, but it isn't > actually looked at by DFSClient (I'm betting this is in FB's branch). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
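Reading the new key from HDFS-1209 on the client side is a standard Configuration lookup; the default of 5 below is an assumption for illustration, not necessarily what the attached patch uses.
{code}
import org.apache.hadoop.conf.Configuration;

// Client-side lookup of the new key. The default value (5) is an assumption
// for illustration; check the attached patch for the real default.
public class RecoveryRetriesConf {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    int recoveryRetries = conf.getInt("dfs.client.block.recovery.retries", 5);
    System.out.println("block recovery retries = " + recoveryRetries);
  }
}
{code}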
[jira] Commented: (HDFS-1258) Clearing namespace quota on "/" corrupts FS image
[ https://issues.apache.org/jira/browse/HDFS-1258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12881335#action_12881335 ] Allen Wittenauer commented on HDFS-1258: If someone validates this, we should probably mark this as a blocker for 0.21. > Clearing namespace quota on "/" corrupts FS image > - > > Key: HDFS-1258 > URL: https://issues.apache.org/jira/browse/HDFS-1258 > Project: Hadoop HDFS > Issue Type: Bug > Components: name-node >Reporter: Aaron T. Myers > > The HDFS root directory starts out with a default namespace quota of > Integer.MAX_VALUE. If you clear this quota (using "hadoop dfsadmin -clrQuota > /"), the fsimage gets corrupted immediately. Subsequent 2NN rolls will fail, > and the NN will not come back up from a restart. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (HDFS-1258) Clearing namespace quota on "/" corrupts FS image
Clearing namespace quota on "/" corrupts FS image - Key: HDFS-1258 URL: https://issues.apache.org/jira/browse/HDFS-1258 Project: Hadoop HDFS Issue Type: Bug Components: name-node Reporter: Aaron T. Myers The HDFS root directory starts out with a default namespace quota of Integer.MAX_VALUE. If you clear this quota (using "hadoop dfsadmin -clrQuota /"), the fsimage gets corrupted immediately. Subsequent 2NN rolls will fail, and the NN will not come back up from a restart. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HDFS-1224) Stale connection makes node miss append
[ https://issues.apache.org/jira/browse/HDFS-1224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12881330#action_12881330 ] Todd Lipcon commented on HDFS-1224: --- I still don't believe this is valid. Maybe I'm just not understanding. Could you please post a test case that shows the issue? You can follow the model of the tests in TestFileAppend4 in the 0.20-append branch - it has a number of tests similar to what you're describing. > Stale connection makes node miss append > --- > > Key: HDFS-1224 > URL: https://issues.apache.org/jira/browse/HDFS-1224 > Project: Hadoop HDFS > Issue Type: Bug > Components: data-node >Affects Versions: 0.20-append >Reporter: Thanh Do > > - Summary: if a datanode crashes and restarts, it may miss an append. > > - Setup: > + # available datanodes = 3 > + # replica = 3 > + # disks / datanode = 1 > + # failures = 1 > + failure type = crash > + When/where failure happens = after the first append succeed > > - Details: > Since each datanode maintains a pool of IPC connections, whenever it wants > to make an IPC call, it first looks into the pool. If the connection is not > there, > it is created and put in to the pool. Otherwise the existing connection is > used. > Suppose that the append pipeline contains dn1, dn2, and dn3. Dn1 is the > primary. > After the client appends to block X successfully, dn2 crashes and restarts. > Now client writes a new block Y to dn1, dn2 and dn3. The write is successful. > Client starts appending to block Y. It first calls dn1.recoverBlock(). > Dn1 will first create a proxy corresponding with each of the datanode in the > pipeline > (in order to make RPC call like getMetadataInfo( ) or updateBlock( )). > However, because > dn2 has just crashed and restarts, its connection in dn1's pool become stale. > Dn1 uses > this connection to make a call to dn2, hence an exception. Therefore, append > will be > made only to dn1 and dn3, although dn2 is alive and the write of block Y to > dn2 has > been successful. > This bug was found by our Failure Testing Service framework: > http://www.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-98.html > For questions, please email us: Thanh Do (than...@cs.wisc.edu) and > Haryadi Gunawi (hary...@eecs.berkeley.edu) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HDFS-1245) Plugable block id generation
[ https://issues.apache.org/jira/browse/HDFS-1245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12881325#action_12881325 ] Konstantin Shvachko commented on HDFS-1245: --- The idea is interesting, but it will be hard to provide protection from switching from one block id generator to another incompatible one. As in HDFS-898, it is easy to write a new sequential block id generator; the hard problem is to convert the current file system to the new generation strategy. It would be good if you could provide some motivation on - why you need the generator to be pluggable, - what strategies other than random and sequential you have in mind, and - in which conditions they may be useful. You know, if I had a choice I'd rather not choose between paper and plastic. Ask sysadmins. > Plugable block id generation > - > > Key: HDFS-1245 > URL: https://issues.apache.org/jira/browse/HDFS-1245 > Project: Hadoop HDFS > Issue Type: New Feature > Components: name-node >Reporter: Dmytro Molkov >Assignee: Dmytro Molkov > > The idea is to have a way to easily create block id generation engines that > may fit a certain purpose. One of them could be HDFS-898 started by > Konstantin, but potentially others. > We chatted with Dhruba about this for a while and came up with the following > approach: > There should be a BlockIDGenerator interface that has following methods: > void blockAdded(Block) > void blockRemoved(Block) > Block nextBlock() > First two methods are needed for block generation engines that hold a certain > state. During the restart, when namenode reads the fsimage it will notify > generator about all the blocks it reads from the image and during runtime > namenode will notify the generator about block removals on file deletion. > The instance of the generator will also have a reference to the block > registry, the interface that BlockManager implements. The only method there > is __blockExists(Block)__, so that the current random block id generation can > be implemented, since it needs to check with the block manager if the id is > already present. > What does the community think about this proposal? -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
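Written out as Java, the plugin surface described in the HDFS-1245 issue text is small. Block and the registry callback below are sketched as placeholder types, and the random generator at the end only illustrates how today's behaviour would fit the interface; none of this is committed code.
{code}
// The plugin surface as described in the proposal, with placeholder types.
public class BlockIdGenerationSketch {
  static class Block {
    final long id;
    Block(long id) { this.id = id; }
  }

  // What BlockManager would expose to a generator that needs to avoid collisions.
  interface BlockRegistry {
    boolean blockExists(Block b);
  }

  // The proposed pluggable generator: stateful implementations learn about existing
  // blocks at image-load time and about removals at runtime.
  interface BlockIDGenerator {
    void blockAdded(Block b);    // called for every block read from the fsimage
    void blockRemoved(Block b);  // called when a file deletion drops a block
    Block nextBlock();           // hand out the next block id
  }

  // Example: today's behaviour - random ids, re-rolled until unused.
  static class RandomGenerator implements BlockIDGenerator {
    private final BlockRegistry registry;
    private final java.util.Random rand = new java.util.Random();
    RandomGenerator(BlockRegistry registry) { this.registry = registry; }
    public void blockAdded(Block b) { /* stateless: nothing to remember */ }
    public void blockRemoved(Block b) { /* stateless */ }
    public Block nextBlock() {
      Block b = new Block(rand.nextLong());
      while (registry.blockExists(b)) {
        b = new Block(rand.nextLong());  // retry until the id is unused
      }
      return b;
    }
  }
}
{code}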
[jira] Resolved: (HDFS-1239) All datanodes are bad in 2nd phase
[ https://issues.apache.org/jira/browse/HDFS-1239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantin Shvachko resolved HDFS-1239. --- Resolution: Invalid > All datanodes are bad in 2nd phase > -- > > Key: HDFS-1239 > URL: https://issues.apache.org/jira/browse/HDFS-1239 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs client >Affects Versions: 0.20.1 >Reporter: Thanh Do > > - Setups: > number of datanodes = 2 > replication factor = 2 > Type of failure: transient fault (a java i/o call throws an exception or > return false) > Number of failures = 2 > when/where failures happen = during the 2nd phase of the pipeline, each > happens at each datanode when trying to perform I/O > (e.g. dataoutputstream.flush()) > > - Details: > > This is similar to HDFS-1237. > In this case, node1 throws exception that makes client creates > a pipeline only with node2, then tries to redo the whole thing, > which throws another failure. So at this point, the client considers > all datanodes are bad, and never retries the whole thing again, > (i.e. it never asks the namenode again to ask for a new set of datanodes). > In HDFS-1237, the bug is due to permanent disk fault. In this case, it's > about transient error. > This bug was found by our Failure Testing Service framework: > http://www.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-98.html > For questions, please email us: Thanh Do (than...@cs.wisc.edu) and > Haryadi Gunawi (hary...@eecs.berkeley.edu) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HDFS-1239) All datanodes are bad in 2nd phase
[ https://issues.apache.org/jira/browse/HDFS-1239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12881305#action_12881305 ] Konstantin Shvachko commented on HDFS-1239: --- > client-namenode protocol does not allow the client to say to the namenode > "hey, i tried to write to the datanodes you've given me, but it fails, could > you give me other datanodes please?" This is incorrect. There is such logic. See {{DFSOutputStream.DataStreamer.run()}}. This is where the logic is implemented. If you are on 0.20, it is in DFSClient.java. The client retries, but the name-node does not have more data-nodes to assign the replicas to - there are only two in the cluster. > All datanodes are bad in 2nd phase > -- > > Key: HDFS-1239 > URL: https://issues.apache.org/jira/browse/HDFS-1239 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs client >Affects Versions: 0.20.1 >Reporter: Thanh Do > > - Setups: > number of datanodes = 2 > replication factor = 2 > Type of failure: transient fault (a java i/o call throws an exception or > return false) > Number of failures = 2 > when/where failures happen = during the 2nd phase of the pipeline, each > happens at each datanode when trying to perform I/O > (e.g. dataoutputstream.flush()) > > - Details: > > This is similar to HDFS-1237. > In this case, node1 throws exception that makes client creates > a pipeline only with node2, then tries to redo the whole thing, > which throws another failure. So at this point, the client considers > all datanodes are bad, and never retries the whole thing again, > (i.e. it never asks the namenode again to ask for a new set of datanodes). > In HDFS-1237, the bug is due to permanent disk fault. In this case, it's > about transient error. > This bug was found by our Failure Testing Service framework: > http://www.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-98.html > For questions, please email us: Thanh Do (than...@cs.wisc.edu) and > Haryadi Gunawi (hary...@eecs.berkeley.edu) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
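For readers following along, here is a much-simplified, hypothetical model of that recovery loop (the real code lives in DFSOutputStream.DataStreamer): failed nodes are dropped from the pipeline and the write is retried, and once no nodes remain the client gives up with an "all datanodes are bad" error, which is exactly what a 2-node cluster with 2 injected faults produces.
{noformat}
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Simplified, hypothetical model of pipeline recovery; it is not the actual
// DFSOutputStream code, only the shape of the retry-and-shrink behaviour.
class PipelineRecoverySketch {

  static void writeBlock(List<String> pipeline) throws IOException {
    List<String> nodes = new ArrayList<>(pipeline);
    while (true) {
      try {
        transferTo(nodes);
        return;                          // block written successfully
      } catch (IOException e) {
        nodes.remove(0);                 // drop the failed node (simplified: assume the first)
        if (nodes.isEmpty()) {
          // With a 2-node cluster and 2 injected faults we end up here:
          // the namenode has no further datanodes to offer.
          throw new IOException("All datanodes are bad. Aborting...");
        }
        // otherwise rebuild the pipeline with the surviving nodes and retry
      }
    }
  }

  private static void transferTo(List<String> nodes) throws IOException {
    // placeholder for the actual data transfer to the pipeline
  }
}
{noformat}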
[jira] Commented: (HDFS-1119) Refactor BlocksMap with GettableSet
[ https://issues.apache.org/jira/browse/HDFS-1119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12881291#action_12881291 ] Konstantin Shvachko commented on HDFS-1119: --- +1 the patch for 0.20 looks good. Should we apply this to the official 0.20 branch? It is an optimization, not a new feature. > Refactor BlocksMap with GettableSet > --- > > Key: HDFS-1119 > URL: https://issues.apache.org/jira/browse/HDFS-1119 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: name-node >Reporter: Tsz Wo (Nicholas), SZE >Assignee: Tsz Wo (Nicholas), SZE > Fix For: 0.22.0 > > Attachments: h1119_20100429.patch, h1119_20100506.patch, > h1119_20100521.patch, h1119_20100525.patch, h1119_20100525_y0.20.1xx.patch > > > The data structure required in BlocksMap is a GettableSet. See also [this > comment|https://issues.apache.org/jira/browse/HDFS-1114?focusedCommentId=12862118&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12862118]. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
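For context, a minimal sketch of the "gettable set" contract: unlike java.util.Set, it can return the instance it actually stores for a given key, so BlocksMap needs no separate key-to-BlockInfo map. The class below just wraps a HashMap to show the API; the real patch presumably uses a more memory-frugal structure, which is the point of the optimization.
{noformat}
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

// Minimal sketch of a set that can hand back its stored element, so callers
// can reuse the canonical object instead of keeping a parallel map.
class GettableHashSet<E> implements Iterable<E> {
  private final Map<E, E> elements = new HashMap<>();

  /** Adds e; returns the previously stored equal element, or null. */
  public E put(E e) {
    return elements.put(e, e);
  }

  /** Returns the stored element equal to the key, or null. */
  public E get(Object key) {
    return elements.get(key);
  }

  public E remove(Object key) {
    return elements.remove(key);
  }

  public boolean contains(Object key) {
    return elements.containsKey(key);
  }

  public int size() {
    return elements.size();
  }

  @Override
  public Iterator<E> iterator() {
    return elements.keySet().iterator();
  }
}
{noformat}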
[jira] Commented: (HDFS-609) Create a file with the append flag does not work in HDFS
[ https://issues.apache.org/jira/browse/HDFS-609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12881288#action_12881288 ] Tom White commented on HDFS-609: Konstantin, This was one of those situations where the changes for this JIRA had to be applied at the same time as its parent HADOOP-5438, so it wasn't possible to have Hudson check this patch either before or after HADOOP-5438 was committed; I ran the checks manually. You're right about the tests. I should have mentioned that I did run tests at the time I posted the patch - I'm re-running them now to be sure I didn't miss anything. > Create a file with the append flag does not work in HDFS > > > Key: HDFS-609 > URL: https://issues.apache.org/jira/browse/HDFS-609 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 0.21.0, 0.22.0 >Reporter: Hairong Kuang >Assignee: Tom White >Priority: Blocker > Fix For: 0.21.0 > > Attachments: HDFS-609.patch > > > HADOOP-5438 introduced a create API with flags. There are a couple of issues > when the flag is set to be APPEND. > 1. The APPEND flag does not work in HDFS. Append is not as simple as changing > a FileINode to be a FileINodeUnderConstruction. It also needs to reopen the > last block for append if the last block is not full and handle crc when the last > crc chunk is not full. > 2. The API is not well thought out. It has parameters like replication factor and > blockSize. Those parameters do not make any sense if APPEND flag is set. But > they give an application user a wrong impression that append could change a > file's block size and replication factor. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
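To illustrate point 2 of the description, here is a hypothetical sketch of a flag-based create API; the enum and signatures below are illustrative, not the actual HADOOP-5438 interfaces. Replication and block size only make sense on the CREATE path, while APPEND must reuse the existing file's values and deal with the partial last block and CRC chunk.
{noformat}
import java.util.EnumSet;

// Hypothetical sketch of a flag-based create; real Hadoop signatures differ.
class CreateFlagSketch {

  enum CreateFlag { CREATE, OVERWRITE, APPEND }

  static void create(String path, EnumSet<CreateFlag> flags,
                     short replication, long blockSize) {
    if (flags.contains(CreateFlag.APPEND)) {
      // Appending must reuse the file's existing replication and block size;
      // accepting these parameters here only invites confusion. It also has
      // to reopen a partial last block and fix up the last CRC chunk, which
      // is why APPEND is more than flipping an inode type.
      appendTo(path);
    } else {
      createNew(path, replication, blockSize);
    }
  }

  static void appendTo(String path) { /* placeholder */ }
  static void createNew(String path, short replication, long blockSize) { /* placeholder */ }
}
{noformat}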
[jira] Created: (HDFS-1257) Race condition introduced by HADOOP-5124
Race condition introduced by HADOOP-5124 Key: HDFS-1257 URL: https://issues.apache.org/jira/browse/HDFS-1257 Project: Hadoop HDFS Issue Type: Bug Components: name-node Reporter: Ramkumar Vadali HADOOP-5124 provided some improvements to FSNamesystem#recentInvalidateSets. But it introduced unprotected access to the data structure recentInvalidateSets. Specifically, FSNamesystem.computeInvalidateWork accesses recentInvalidateSets without read-lock protection. If there is concurrent activity (like reducing replication on a file) that adds to recentInvalidateSets, the name-node crashes with a ConcurrentModificationException. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
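A generic sketch of this bug class, with illustrative names rather than the actual FSNamesystem fields: iterating a shared map without the lock the writers hold can surface as a ConcurrentModificationException, and the fix is to take the (read) lock around the traversal.
{noformat}
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Illustrative sketch only: one thread mutates a shared map while another
// iterates it; without the read lock the iteration races with the writer.
class InvalidateWorkSketch {
  private final Map<String, List<Long>> recentInvalidateSets = new HashMap<>();
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

  // Writer path, e.g. when replication on a file is reduced.
  void addToInvalidates(String datanode, long blockId) {
    lock.writeLock().lock();
    try {
      recentInvalidateSets.computeIfAbsent(datanode, d -> new ArrayList<>()).add(blockId);
    } finally {
      lock.writeLock().unlock();
    }
  }

  // Reader path: holding readLock() makes the traversal safe against writers.
  int computeInvalidateWork() {
    lock.readLock().lock();
    try {
      int work = 0;
      for (List<Long> blocks : recentInvalidateSets.values()) {
        work += blocks.size();
      }
      return work;
    } finally {
      lock.readLock().unlock();
    }
  }
}
{noformat}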
[jira] Updated: (HDFS-1036) in DelegationTokenFetch dfs.getURI returns no port
[ https://issues.apache.org/jira/browse/HDFS-1036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Boris Shkolnik updated HDFS-1036: - Attachment: HDFS-1036.patch > in DelegationTokenFetch dfs.getURI returns no port > -- > > Key: HDFS-1036 > URL: https://issues.apache.org/jira/browse/HDFS-1036 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Boris Shkolnik >Assignee: Boris Shkolnik > Attachments: fetchdt_doc.patch, HDFS-1036-BP20-1.patch, > HDFS-1036-BP20-Fix.patch, HDFS-1036-BP20.patch, HDFS-1036.patch > > > dfs.getUri().getPort() returns -1. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HDFS-1036) in DelegationTokenFetch dfs.getURI returns no port
[ https://issues.apache.org/jira/browse/HDFS-1036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Boris Shkolnik updated HDFS-1036: - Status: Patch Available (was: Open) > in DelegationTokenFetch dfs.getURI returns no port > -- > > Key: HDFS-1036 > URL: https://issues.apache.org/jira/browse/HDFS-1036 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Boris Shkolnik >Assignee: Boris Shkolnik > Attachments: fetchdt_doc.patch, HDFS-1036-BP20-1.patch, > HDFS-1036-BP20-Fix.patch, HDFS-1036-BP20.patch, HDFS-1036.patch > > > dfs.getUri().getPort() returns -1. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
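A small sketch of the symptom and a typical guard: URI.getPort() returns -1 when the URI carries no explicit port, so code that builds a host:port service string has to substitute a default. The default value below is an assumption for illustration only, not something taken from the attached patch.
{noformat}
import java.net.URI;

// Sketch: guard against URI.getPort() returning -1 when no port is given.
class ServiceNameSketch {
  static final int DEFAULT_NAMENODE_PORT = 8020; // assumed default for illustration

  static String buildTokenService(URI fsUri) {
    int port = fsUri.getPort();
    if (port == -1) {
      port = DEFAULT_NAMENODE_PORT;
    }
    return fsUri.getHost() + ":" + port;
  }

  public static void main(String[] args) {
    // "hdfs://namenode.example.com" has no explicit port, so getPort() is -1.
    System.out.println(buildTokenService(URI.create("hdfs://namenode.example.com")));
  }
}
{noformat}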
[jira] Updated: (HDFS-1224) Stale connection makes node miss append
[ https://issues.apache.org/jira/browse/HDFS-1224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantin Shvachko updated HDFS-1224: -- Affects Version/s: 0.20-append (was: 0.20.1) Probably makes sense to assign this to 20-append branch. I see append is officially on there, which is great. > Stale connection makes node miss append > --- > > Key: HDFS-1224 > URL: https://issues.apache.org/jira/browse/HDFS-1224 > Project: Hadoop HDFS > Issue Type: Bug > Components: data-node >Affects Versions: 0.20-append >Reporter: Thanh Do > > - Summary: if a datanode crashes and restarts, it may miss an append. > > - Setup: > + # available datanodes = 3 > + # replica = 3 > + # disks / datanode = 1 > + # failures = 1 > + failure type = crash > + When/where failure happens = after the first append succeed > > - Details: > Since each datanode maintains a pool of IPC connections, whenever it wants > to make an IPC call, it first looks into the pool. If the connection is not > there, > it is created and put in to the pool. Otherwise the existing connection is > used. > Suppose that the append pipeline contains dn1, dn2, and dn3. Dn1 is the > primary. > After the client appends to block X successfully, dn2 crashes and restarts. > Now client writes a new block Y to dn1, dn2 and dn3. The write is successful. > Client starts appending to block Y. It first calls dn1.recoverBlock(). > Dn1 will first create a proxy corresponding with each of the datanode in the > pipeline > (in order to make RPC call like getMetadataInfo( ) or updateBlock( )). > However, because > dn2 has just crashed and restarts, its connection in dn1's pool become stale. > Dn1 uses > this connection to make a call to dn2, hence an exception. Therefore, append > will be > made only to dn1 and dn3, although dn2 is alive and the write of block Y to > dn2 has > been successful. > This bug was found by our Failure Testing Service framework: > http://www.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-98.html > For questions, please email us: Thanh Do (than...@cs.wisc.edu) and > Haryadi Gunawi (hary...@eecs.berkeley.edu) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HDFS-609) Create a file with the append flag does not work in HDFS
[ https://issues.apache.org/jira/browse/HDFS-609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12881273#action_12881273 ] Konstantin Shvachko commented on HDFS-609: -- test-patch does not run tests. We should keep following common practice and go through the patch-available stage, shouldn't we? > Create a file with the append flag does not work in HDFS > > > Key: HDFS-609 > URL: https://issues.apache.org/jira/browse/HDFS-609 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 0.21.0, 0.22.0 >Reporter: Hairong Kuang >Assignee: Tom White >Priority: Blocker > Fix For: 0.21.0 > > Attachments: HDFS-609.patch > > > HADOOP-5438 introduced a create API with flags. There are a couple of issues > when the flag is set to be APPEND. > 1. The APPEND flag does not work in HDFS. Append is not as simple as changing > a FileINode to be a FileINodeUnderConstruction. It also needs to reopen the > last block for append if the last block is not full and handle crc when the last > crc chunk is not full. > 2. The API is not well thought out. It has parameters like replication factor and > blockSize. Those parameters do not make any sense if APPEND flag is set. But > they give an application user a wrong impression that append could change a > file's block size and replication factor. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HDFS-1255) test-libhdfs.sh fails
[ https://issues.apache.org/jira/browse/HDFS-1255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12881272#action_12881272 ] Tom White commented on HDFS-1255: - Results of running test-patch: {noformat} [exec] +1 overall. [exec] [exec] +1 @author. The patch does not contain any @author tags. [exec] [exec] +1 tests included. The patch appears to include 3 new or modified tests. [exec] [exec] +1 javadoc. The javadoc tool did not generate any warning messages. [exec] [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings. [exec] [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings. [exec] [exec] +1 release audit. The applied patch does not increase the total number of release audit warnings. {noformat} > test-libhdfs.sh fails > - > > Key: HDFS-1255 > URL: https://issues.apache.org/jira/browse/HDFS-1255 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Reporter: Tom White >Assignee: Tom White >Priority: Blocker > Fix For: 0.21.0 > > Attachments: HDFS-1255.patch > > > This is a consequence of bin scripts having moved (see HADOOP-6794). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HDFS-743) file size is fluctuating although file is closed
[ https://issues.apache.org/jira/browse/HDFS-743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12881269#action_12881269 ] Konstantin Shvachko commented on HDFS-743: -- Could you please update the "Version/s" fields. > file size is fluctuating although file is closed > > > Key: HDFS-743 > URL: https://issues.apache.org/jira/browse/HDFS-743 > Project: Hadoop HDFS > Issue Type: Bug > Components: name-node >Reporter: dhruba borthakur >Assignee: dhruba borthakur >Priority: Blocker > Attachments: fluctuatingFileSize_0.17.txt > > > I am seeing that the length of a file sometimes becomes zero after a namenode > restart. These files have only one block. All the three replicas of that > block on the datanode(s) has non-zero size. Increasing the replication factor > of the file causes the file to show its correct non-zero length. > I am marking this as a blocker because it is still to be investigated which > releases it affects. I am seeing this on 0.17.x very frequently. I might have > seen this on 0.20.x but do not have a reproducible case yet. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Resolved: (HDFS-18) NameNode startup failed
[ https://issues.apache.org/jira/browse/HDFS-18?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantin Shvachko resolved HDFS-18. - Resolution: Cannot Reproduce Please feel free to reopen if you see this as a problem again. > NameNode startup failed > --- > > Key: HDFS-18 > URL: https://issues.apache.org/jira/browse/HDFS-18 > Project: Hadoop HDFS > Issue Type: Bug > Environment: 0.19.1 patched with 3422, 4675 and 5269 on Redhat >Reporter: Tamir Kamara > Attachments: nn-log.txt > > > After bouncing the cluster namenode refuses to start and gives the error: > FSNamesystem initialization failed. Also says: saveLeases found path > /tmp/temp623789763/tmp659456056/_temporary/_attempt_200904211331_0010_r_02_0/part-2 > but no matching entry in namespace. Recovery from checkpoint resulted in > wide spread corruption which made it necessary to format the dfs. > This is opened as a result from these threads: > http://www.mail-archive.com/core-u...@hadoop.apache.org/msg09397.html, > http://www.mail-archive.com/core-u...@hadoop.apache.org/msg09663.html -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HDFS-1253) bin/hdfs conflicts with common user shortcut
[ https://issues.apache.org/jira/browse/HDFS-1253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12881253#action_12881253 ] Allen Wittenauer commented on HDFS-1253: The hadoop command is still supported, but there are lots of places on the net where hdfs is used as an alias for 'hadoop dfs'. If someone doesn't know how to actually create that shell alias, the fact that the hadoop command is there will be irrelevant. > bin/hdfs conflicts with common user shortcut > > > Key: HDFS-1253 > URL: https://issues.apache.org/jira/browse/HDFS-1253 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 0.21.0 >Reporter: Allen Wittenauer >Priority: Blocker > Fix For: 0.21.0 > > > The 'hdfs' command introduced in 0.21 (unreleased at this time) conflicts > with a common user alias and wrapper script. This change should either be > reverted or moved from $HADOOP_HOME/bin to somewhere else in $HADOOP_HOME > (perhaps sbin?) so that users do not accidentally hit it. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HDFS-1218) 20 append: Blocks recovered on startup should be treated with lower priority during block synchronization
[ https://issues.apache.org/jira/browse/HDFS-1218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12881249#action_12881249 ] sam rash commented on HDFS-1218: I racked my brain and can't come up with a case that this could actually occur--keepLength is only set true when doing an append. If any nodes had gone down and come back up (RWR), they either have an old genstamp and will be ignored, or soft lease expiry recovery is initiated by the NN with keepLength = false first. I think the idea + patch look good to me (and thanks for taking the time to explain it). > 20 append: Blocks recovered on startup should be treated with lower priority > during block synchronization > - > > Key: HDFS-1218 > URL: https://issues.apache.org/jira/browse/HDFS-1218 > Project: Hadoop HDFS > Issue Type: Bug > Components: data-node >Affects Versions: 0.20-append >Reporter: Todd Lipcon >Assignee: Todd Lipcon >Priority: Critical > Fix For: 0.20-append > > Attachments: hdfs-1281.txt > > > When a datanode experiences power loss, it can come back up with truncated > replicas (due to local FS journal replay). Those replicas should not be > allowed to truncate the block during block synchronization if there are other > replicas from DNs that have _not_ restarted. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
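A hypothetical sketch of the prioritization idea, with invented names: replicas recovered by a datanode on startup, which may have been truncated by journal replay, are only used for block synchronization when no replica from a non-restarted node exists.
{noformat}
import java.util.ArrayList;
import java.util.List;

// Illustrative only: prefer replicas from datanodes that did not restart,
// so a startup-truncated replica cannot shorten the block for everyone.
class SyncCandidateSketch {

  static class ReplicaInfo {
    final String datanode;
    final long length;
    final boolean recoveredOnStartup; // e.g. after power loss and journal replay
    ReplicaInfo(String dn, long len, boolean recovered) {
      this.datanode = dn; this.length = len; this.recoveredOnStartup = recovered;
    }
  }

  static List<ReplicaInfo> chooseSyncCandidates(List<ReplicaInfo> replicas) {
    List<ReplicaInfo> fresh = new ArrayList<>();
    for (ReplicaInfo r : replicas) {
      if (!r.recoveredOnStartup) {
        fresh.add(r);
      }
    }
    // Fall back to restarted-node replicas only if nothing better exists.
    return fresh.isEmpty() ? replicas : fresh;
  }
}
{noformat}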
[jira] Commented: (HDFS-1253) bin/hdfs conflicts with common user shortcut
[ https://issues.apache.org/jira/browse/HDFS-1253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12881245#action_12881245 ] Tom White commented on HDFS-1253: - The 'hadoop' command is still supported, which will give users time to move to 'hdfs' - is that an acceptable solution? What do others think? > bin/hdfs conflicts with common user shortcut > > > Key: HDFS-1253 > URL: https://issues.apache.org/jira/browse/HDFS-1253 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 0.21.0 >Reporter: Allen Wittenauer >Priority: Blocker > Fix For: 0.21.0 > > > The 'hdfs' command introduced in 0.21 (unreleased at this time) conflicts > with a common user alias and wrapper script. This change should either be > reverted or moved from $HADOOP_HOME/bin to somewhere else in $HADOOP_HOME > (perhaps sbin?) so that users do not accidentally hit it. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HDFS-1192) refreshSuperUserGroupsConfiguration should use server side configuration for the refresh (for HADOOP-6815)
[ https://issues.apache.org/jira/browse/HDFS-1192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Boris Shkolnik updated HDFS-1192: - Attachment: HDFS-1192-10.patch > refreshSuperUserGroupsConfiguration should use server side configuration for > the refresh (for HADOOP-6815) > -- > > Key: HDFS-1192 > URL: https://issues.apache.org/jira/browse/HDFS-1192 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Boris Shkolnik >Assignee: Boris Shkolnik > Attachments: HDFS-1192-1.patch, HDFS-1192-10.patch, HDFS-1192-9.patch > > > Currently refreshSuperUserGroupsConfiguration is using the client-side > Configuration. > One of the issues with this is that if the cluster is restarted it will lose > the "refreshed" values (unless they are copied to the NameNode/JobTracker > machine). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
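A bare-bones sketch of the server-side refresh idea; Configuration is Hadoop's class, but the handler and key usage below are assumptions for illustration. The refresh re-reads the server's own configuration files instead of trusting values sent by the client, so a restart reproduces the same state.
{noformat}
import org.apache.hadoop.conf.Configuration;

// Illustrative handler: refresh from the server's own config, not the client's.
class ProxyUserRefreshSketch {
  private volatile Configuration current = new Configuration();

  // Re-read the server's configuration files; a restart then yields the
  // same proxy-user settings as the last refresh.
  void refreshSuperUserGroupsConfiguration() {
    Configuration serverSideConf = new Configuration(); // loads *-site.xml on the server
    this.current = serverSideConf;
  }

  String getProxyHosts(String user) {
    return current.get("hadoop.proxyuser." + user + ".hosts", "");
  }
}
{noformat}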
[jira] Updated: (HDFS-1213) Implement an Apache Commons VFS Driver for HDFS
[ https://issues.apache.org/jira/browse/HDFS-1213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael D'Amour updated HDFS-1213: -- Attachment: HADOOP-HDFS-Apache-VFS.patch Patch for Apache Commons VFS driver. > Implement an Apache Commons VFS Driver for HDFS > --- > > Key: HDFS-1213 > URL: https://issues.apache.org/jira/browse/HDFS-1213 > Project: Hadoop HDFS > Issue Type: New Feature > Components: hdfs client >Reporter: Michael D'Amour > Attachments: HADOOP-HDFS-Apache-VFS.patch, > pentaho-hdfs-vfs-TRUNK-SNAPSHOT-sources.tar.gz, > pentaho-hdfs-vfs-TRUNK-SNAPSHOT.jar > > > We have an open source ETL tool (Kettle) which uses VFS for many input/output > steps/jobs. We would like to be able to read/write HDFS from Kettle using > VFS. > > I haven't been able to find anything out there other than "it would be nice." > > I had some time a few weeks ago to begin writing a VFS driver for HDFS and we > (Pentaho) would like to be able to contribute this driver. I believe it > supports all the major file/folder operations and I have written unit tests > for all of these operations. The code is currently checked into an open > Pentaho SVN repository under the Apache 2.0 license. There are some current > limitations, such as a lack of authentication (kerberos), which appears to be > coming in 0.22.0, however, the driver supports username/password, but I just > can't use them yet. > I will be attaching the code for the driver once the case is created. The > project does not modify existing hadoop/hdfs source. > Our JIRA case can be found at http://jira.pentaho.com/browse/PDI-4146 -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
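For readers unfamiliar with Commons VFS, here is a sketch of what reading an HDFS file through it could look like once a driver like the attached one is registered; the hdfs:// scheme handling is assumed to come from that driver, and the host and path below are placeholders.
{noformat}
import org.apache.commons.vfs.FileContent;
import org.apache.commons.vfs.FileObject;
import org.apache.commons.vfs.FileSystemManager;
import org.apache.commons.vfs.VFS;

// Sketch of a Kettle-style client reading an HDFS file via Commons VFS,
// assuming the attached driver is registered for the hdfs:// scheme.
public class HdfsVfsReadSketch {
  public static void main(String[] args) throws Exception {
    FileSystemManager fsManager = VFS.getManager();
    FileObject file =
        fsManager.resolveFile("hdfs://namenode.example.com:8020/user/demo/input.txt");
    try {
      FileContent content = file.getContent();
      System.out.println("size = " + content.getSize());
    } finally {
      file.close();
    }
  }
}
{noformat}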
[jira] Commented: (HDFS-18) NameNode startup failed
[ https://issues.apache.org/jira/browse/HDFS-18?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12881116#action_12881116 ] Tamir Kamara commented on HDFS-18: -- I've not seen this again with 19 and for a while now we're on 20 so no chance to reproduce it at my end. > NameNode startup failed > --- > > Key: HDFS-18 > URL: https://issues.apache.org/jira/browse/HDFS-18 > Project: Hadoop HDFS > Issue Type: Bug > Environment: 0.19.1 patched with 3422, 4675 and 5269 on Redhat >Reporter: Tamir Kamara > Attachments: nn-log.txt > > > After bouncing the cluster namenode refuses to start and gives the error: > FSNamesystem initialization failed. Also says: saveLeases found path > /tmp/temp623789763/tmp659456056/_temporary/_attempt_200904211331_0010_r_02_0/part-2 > but no matching entry in namespace. Recovery from checkpoint resulted in > wide spread corruption which made it necessary to format the dfs. > This is opened as a result from these threads: > http://www.mail-archive.com/core-u...@hadoop.apache.org/msg09397.html, > http://www.mail-archive.com/core-u...@hadoop.apache.org/msg09663.html -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HDFS-1197) Blocks are considered "complete" prematurely after commitBlockSynchronization or DN restart
[ https://issues.apache.org/jira/browse/HDFS-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] dhruba borthakur updated HDFS-1197: --- Fix Version/s: 0.20-append I am marking this for the append-branch to ensure that we investigate and follow through on this one. > Blocks are considered "complete" prematurely after commitBlockSynchronization > or DN restart > --- > > Key: HDFS-1197 > URL: https://issues.apache.org/jira/browse/HDFS-1197 > Project: Hadoop HDFS > Issue Type: Bug > Components: data-node, hdfs client, name-node >Affects Versions: 0.20-append >Reporter: Todd Lipcon > Fix For: 0.20-append > > Attachments: hdfs-1197-test-changes.txt, testTC2-failure.txt > > > I saw this failure once on my internal Hudson job that runs the append tests > 48 times a day: > junit.framework.AssertionFailedError: expected:<114688> but was:<98304> > at org.apache.hadoop.hdfs.AppendTestUtil.check(AppendTestUtil.java:112) > at > org.apache.hadoop.hdfs.TestFileAppend3.testTC2(TestFileAppend3.java:116) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Resolved: (HDFS-1254) 0.20: mark dfs.supprt.append to be true by default for the 0.20-append branch
[ https://issues.apache.org/jira/browse/HDFS-1254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] dhruba borthakur resolved HDFS-1254. Hadoop Flags: [Reviewed] Resolution: Fixed I just committed this. > 0.20: mark dfs.supprt.append to be true by default for the 0.20-append branch > - > > Key: HDFS-1254 > URL: https://issues.apache.org/jira/browse/HDFS-1254 > Project: Hadoop HDFS > Issue Type: Bug > Components: name-node >Affects Versions: 0.20-append >Reporter: dhruba borthakur >Assignee: dhruba borthakur > Fix For: 0.20-append > > Attachments: append.txt > > > The 0.20-append branch supports append/sync for HDFS. Change the default > configuration to enable append. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HDFS-183) MapReduce Streaming job hang when all replications of the input file has corrupted!
[ https://issues.apache.org/jira/browse/HDFS-183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12881090#action_12881090 ] Soundararajan Velu commented on HDFS-183: - Zhu, I tried reproducing this issue in our cluster with no luck... The dfs client retries 5 times, throws an IOException, and then terminates the operation. Please let me know if you are still facing this issue. > MapReduce Streaming job hang when all replications of the input file has > corrupted! > --- > > Key: HDFS-183 > URL: https://issues.apache.org/jira/browse/HDFS-183 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: ZhuGuanyin >Priority: Critical > > On some special cases, all replications of a given file has truncated to zero > but the namenode still hold the original size (we don't know why), the > mapreduce streaming job will hang if we don't specified mapred.task.timeout > when the input files contain this corrupted file, even the dfs shell "cat" > will hang when fetch data from this corrupted file. > We found that job hang at DFSInputStream.blockSeekTo() when chosing a > datanode. The following test will show: > 1)Copy a small file to hdfs. > 2)Get the file blocks and login to these datanodes, and truncate these > blocks to zero. > 3)Cat this file through dfs shell "cat" > 4)Cat command will enter dead loop. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
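A hypothetical sketch of the difference between the dead loop in the report and the bounded behaviour described above: the read path counts failed attempts to obtain the block and throws an IOException after a fixed limit instead of looping forever over the same truncated replicas. The constant name and the limit of 5 are assumptions for illustration.
{noformat}
import java.io.IOException;
import java.util.List;

// Illustrative only: bound the retries around datanode selection so a fully
// corrupted block terminates the read instead of hanging it.
class BoundedBlockSeekSketch {
  private static final int MAX_BLOCK_ACQUIRE_FAILURES = 5; // assumed limit

  static String blockSeekTo(List<String> replicas) throws IOException {
    int failures = 0;
    while (true) {
      String dn = chooseDataNode(replicas);
      if (dn != null) {
        return dn;                         // found a usable replica
      }
      if (++failures >= MAX_BLOCK_ACQUIRE_FAILURES) {
        throw new IOException("Could not obtain block after "
            + failures + " attempts");     // terminate instead of hanging
      }
      // otherwise refresh block locations from the namenode and retry
    }
  }

  private static String chooseDataNode(List<String> replicas) {
    // placeholder: returns null when every replica is truncated or corrupt
    return null;
  }
}
{noformat}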
[jira] Updated: (HDFS-1197) Blocks are considered "complete" prematurely after commitBlockSynchronization or DN restart
[ https://issues.apache.org/jira/browse/HDFS-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon updated HDFS-1197: -- Attachment: hdfs-1197.txt Here's a patch; it applies on top of HDFS-1186, HDFS-1218, and HDFS-1057. It has been in heavy testing for a week or so and should be stable. > Blocks are considered "complete" prematurely after commitBlockSynchronization > or DN restart > --- > > Key: HDFS-1197 > URL: https://issues.apache.org/jira/browse/HDFS-1197 > Project: Hadoop HDFS > Issue Type: Bug > Components: data-node, hdfs client, name-node >Affects Versions: 0.20-append >Reporter: Todd Lipcon > Fix For: 0.20-append > > Attachments: hdfs-1197-test-changes.txt, hdfs-1197.txt, > testTC2-failure.txt > > > I saw this failure once on my internal Hudson job that runs the append tests > 48 times a day: > junit.framework.AssertionFailedError: expected:<114688> but was:<98304> > at org.apache.hadoop.hdfs.AppendTestUtil.check(AppendTestUtil.java:112) > at > org.apache.hadoop.hdfs.TestFileAppend3.testTC2(TestFileAppend3.java:116) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.