[jira] Updated: (HDFS-1007) HFTP needs to be updated to use delegation tokens
[ https://issues.apache.org/jira/browse/HDFS-1007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Boris Shkolnik updated HDFS-1007:
---------------------------------
    Status: Resolved (was: Patch Available)
    Resolution: Fixed

HFTP needs to be updated to use delegation tokens
    Key: HDFS-1007
    URL: https://issues.apache.org/jira/browse/HDFS-1007
    Project: Hadoop HDFS
    Issue Type: Bug
    Components: security
    Affects Versions: 0.22.0
    Reporter: Devaraj Das
    Assignee: Devaraj Das
    Fix For: 0.22.0
    Attachments: 1007-bugfix.patch, distcp-hftp-2.1.1.patch, distcp-hftp.1.patch, distcp-hftp.2.1.patch, distcp-hftp.2.patch, distcp-hftp.patch, HDFS-1007-1.patch, HDFS-1007-2.patch, HDFS-1007-3.patch, HDFS-1007-BP20-fix-1.patch, HDFS-1007-BP20-fix-2.patch, HDFS-1007-BP20-fix-3.patch, HDFS-1007-BP20.patch, hdfs-1007-long-running-hftp-client.patch, hdfs-1007-securityutil-fix.patch

HFTPFileSystem should be updated to use delegation tokens so that it can talk to secure namenodes.
[jira] Updated: (HDFS-1301) TestHDFSProxy need to use server side conf for ProxyUser stuff.
[ https://issues.apache.org/jira/browse/HDFS-1301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Boris Shkolnik updated HDFS-1301:
---------------------------------
    Attachment: HDFS-1301-BP20.patch (for the previous version, not for commit)

TestHDFSProxy need to use server side conf for ProxyUser stuff.
    Key: HDFS-1301
    URL: https://issues.apache.org/jira/browse/HDFS-1301
    Project: Hadoop HDFS
    Issue Type: Bug
    Reporter: Boris Shkolnik
    Assignee: Boris Shkolnik
    Attachments: HDFS-1301-BP20.patch

Currently TestHdfsProxy sets hadoop.proxyuser.USER.groups in a local copy of the configuration, but ProxyUsers only looks at the server-side config. For the test we can use the static method in ProxyUsers to load the config, as in the sketch below.
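A minimal sketch of that approach, assuming ProxyUsers.refreshSuperUserGroupsConfiguration(Configuration) is available in this version; the user name "testProxyUser" and the group/host values are illustrative, not from the patch:

{code}
// Sketch only, not the committed patch.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.authorize.ProxyUsers;

public class ProxyUserTestSetup {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Setting these only on a local Configuration is not enough ...
    conf.set("hadoop.proxyuser.testProxyUser.groups", "users");
    conf.set("hadoop.proxyuser.testProxyUser.hosts", "localhost");
    // ... ProxyUsers must be told to (re)load them as its server-side view,
    // otherwise it keeps using whatever configuration it was initialized with.
    ProxyUsers.refreshSuperUserGroupsConfiguration(conf);
  }
}
{code}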
[jira] Commented: (HDFS-1227) UpdateBlock fails due to unmatched file length
[ https://issues.apache.org/jira/browse/HDFS-1227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1293#action_1293 ]

Thanh Do commented on HDFS-1227:
--------------------------------
In the append branch, I saw the unmatched file length exception happen, but the client then retries recoverBlock and hence tolerates this.

UpdateBlock fails due to unmatched file length
    Key: HDFS-1227
    URL: https://issues.apache.org/jira/browse/HDFS-1227
    Project: Hadoop HDFS
    Issue Type: Bug
    Components: data-node
    Affects Versions: 0.20-append
    Reporter: Thanh Do

- Summary: client append is not atomic; hence it is possible that, when retrying during append, updateBlock throws an exception indicating an unmatched file length, making the append fail.

- Setup:
    + # available datanodes = 3
    + # disks / datanode = 1
    + # failures = 2
    + failure type = bad disk
    + when/where failure happens = (see below)
    + this bug is non-deterministic; to reproduce it, add a sufficient sleep before out.write() in BlockReceiver.receivePacket() in dn1 and dn2, but not in dn3

- Details:
Suppose the client appends 16 bytes to block X, which has length 16 bytes at dn1, dn2 and dn3. Dn1 is the primary, and the pipeline is dn3-dn2-dn1. recoverBlock succeeds, and the client starts sending data to dn3, the first datanode in the pipeline. dn3 forwards the packet to the downstream datanodes and starts writing the data to its disk. Suppose there is an exception in dn3 while writing to disk. The client gets the exception and starts the recovery code by calling dn1.recoverBlock() again. dn1 in turn calls dn2.getMetadataInfo() and dn1.getMetaDataInfo() to build the syncList. Suppose that at the time getMetadataInfo() is called at both datanodes (dn1 and dn2), the previous packet (sent from dn3) has not reached disk yet. Hence, the block info returned by getMetaDataInfo() reports a length of 16 bytes. But after that, the packet reaches disk, so the block file length becomes 32 bytes. Using the syncList (which contains block info with a length of 16 bytes), dn1 calls updateBlock at dn2 and dn1, which fails, because the length in the new block info given to updateBlock (16 bytes) does not match the actual length on disk (32 bytes). A toy simulation of this race appears below.

Note that this bug is non-deterministic; it depends on the thread interleaving at the datanodes.

This bug was found by our Failure Testing Service framework: http://www.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-98.html
For questions, please email us: Thanh Do (than...@cs.wisc.edu) and Haryadi Gunawi (hary...@eecs.berkeley.edu)
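To make the interleaving concrete, here is a toy, self-contained simulation of the race; this is plain Java illustrating the described timeline, not HDFS code, and all names in it are made up:

{code}
// Toy simulation: the "recovery" snapshot of the block length
// (getMetadataInfo analogue) is taken before a late packet from the old
// pipeline reaches disk, so the later length check (updateBlock analogue)
// fails against the grown block file.
import java.util.concurrent.atomic.AtomicLong;

public class UpdateBlockRace {
  public static void main(String[] args) throws InterruptedException {
    AtomicLong blockFileLength = new AtomicLong(16); // block X starts at 16 bytes

    // Recovery path: snapshot the length for the syncList.
    long syncListLength = blockFileLength.get(); // sees 16

    // The in-flight packet from the old pipeline finally reaches disk.
    Thread latePacket = new Thread(() -> blockFileLength.addAndGet(16));
    latePacket.start();
    latePacket.join(); // block file is now 32 bytes

    // updateBlock path: reject when the snapshot no longer matches the disk.
    if (syncListLength != blockFileLength.get()) {
      System.out.println("updateBlock fails: syncList says " + syncListLength
          + " bytes but block file has " + blockFileLength.get() + " bytes");
    }
  }
}
{code}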
[jira] Commented: (HDFS-1298) Add support in HDFS to update statistics that tracks number of file system operations in FileSystem
[ https://issues.apache.org/jira/browse/HDFS-1298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12888965#action_12888965 ]

Konstantin Shvachko commented on HDFS-1298:
-------------------------------------------
+1 Both patches look good to me.

Add support in HDFS to update statistics that tracks number of file system operations in FileSystem
    Key: HDFS-1298
    URL: https://issues.apache.org/jira/browse/HDFS-1298
    Project: Hadoop HDFS
    Issue Type: Improvement
    Affects Versions: 0.22.0
    Reporter: Suresh Srinivas
    Assignee: Suresh Srinivas
    Fix For: 0.22.0
    Attachments: HDFS-1298.patch

See HADOOP-6859 for the new statistics.
[jira] Resolved: (HDFS-974) FileSystem.Statistics should include NN accesses from the clients
[ https://issues.apache.org/jira/browse/HDFS-974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Suresh Srinivas resolved HDFS-974.
----------------------------------
    Assignee: Suresh Srinivas (was: Sanjay Radia)
    Resolution: Duplicate

Duplicate of HDFS-1298.

FileSystem.Statistics should include NN accesses from the clients
    Key: HDFS-974
    URL: https://issues.apache.org/jira/browse/HDFS-974
    Project: Hadoop HDFS
    Issue Type: Improvement
    Components: hdfs client
    Reporter: Arun C Murthy
    Assignee: Suresh Srinivas
    Fix For: 0.22.0

It is a very useful metric to track; we can track per-task and hence per-job stats through Counters etc.
[jira] Updated: (HDFS-1201) Support for using different Kerberos keys for Namenode and datanode.
[ https://issues.apache.org/jira/browse/HDFS-1201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kan Zhang updated HDFS-1201:
----------------------------
    Attachment: h6632-06.patch

This is the HDFS part of HADOOP-6632. It incorporates HDFS-1020 and a bug fix from HDFS-1006.

Support for using different Kerberos keys for Namenode and datanode.
    Key: HDFS-1201
    URL: https://issues.apache.org/jira/browse/HDFS-1201
    Project: Hadoop HDFS
    Issue Type: Improvement
    Reporter: Jitendra Nath Pandey
    Assignee: Jitendra Nath Pandey
    Attachments: h6632-06.patch

This jira covers the HDFS changes needed to support different Kerberos keys for the Namenode and datanode. It corresponds to the changes in HADOOP-6632.
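For illustration, giving the two services distinct identities would be configured along these lines; this is a hedged sketch, assuming the conventional dfs.namenode.*/dfs.datanode.* property names, and the principals, realm and keytab paths are made up:

{code}
// Sketch only: the idea of separate Kerberos keys for Namenode and datanode.
import org.apache.hadoop.conf.Configuration;

public class SeparateServicePrincipals {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // The Namenode authenticates with its own service principal ...
    conf.set("dfs.namenode.kerberos.principal", "nn/_HOST@EXAMPLE.COM");
    conf.set("dfs.namenode.keytab.file", "/etc/security/nn.keytab");
    // ... while datanodes use a different key, with _HOST resolved per host.
    conf.set("dfs.datanode.kerberos.principal", "dn/_HOST@EXAMPLE.COM");
    conf.set("dfs.datanode.keytab.file", "/etc/security/dn.keytab");
  }
}
{code}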
[jira] Commented: (HDFS-1201) Support for using different Kerberos keys for Namenode and datanode.
[ https://issues.apache.org/jira/browse/HDFS-1201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12888984#action_12888984 ]

Kan Zhang commented on HDFS-1201:
---------------------------------
Ran ant test and it passed. Also manually verified the feature on a single-node cluster.

Support for using different Kerberos keys for Namenode and datanode.
    Key: HDFS-1201
    URL: https://issues.apache.org/jira/browse/HDFS-1201
    Project: Hadoop HDFS
    Issue Type: Improvement
    Reporter: Jitendra Nath Pandey
    Assignee: Jitendra Nath Pandey
    Attachments: h6632-06.patch

This jira covers the HDFS changes needed to support different Kerberos keys for the Namenode and datanode. It corresponds to the changes in HADOOP-6632.
[jira] Created: (HDFS-1303) StreamFile.doGet(..) uses an additional RPC to get file length
StreamFile.doGet(..) uses an additional RPC to get file length
    Key: HDFS-1303
    URL: https://issues.apache.org/jira/browse/HDFS-1303
    Project: Hadoop HDFS
    Issue Type: Improvement
    Components: data-node
    Reporter: Tsz Wo (Nicholas), SZE

{code}
//StreamFile.doGet(..)
long fileLen = dfs.getFileInfo(filename).getLen();
FSInputStream in = dfs.open(filename);
{code}

In the code above, it is unnecessary to call getFileInfo(..), which is an additional RPC to the namenode. The file length can be obtained from the input stream after open(..).
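A minimal sketch of the suggested change, assuming the stream returned by open(..) is a DFSClient.DFSInputStream and that it exposes the already-fetched length via getFileLength(); this is not the committed fix:

{code}
//StreamFile.doGet(..), sketch only
DFSClient.DFSInputStream in = dfs.open(filename);
// The block list was fetched during open(..), so the length is available
// locally; no separate getFileInfo(..) RPC to the namenode is needed.
long fileLen = in.getFileLength();
{code}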
[jira] Created: (HDFS-1304) There is no unit test for HftpFileSystem.open(..)
There is no unit test for HftpFileSystem.open(..)
    Key: HDFS-1304
    URL: https://issues.apache.org/jira/browse/HDFS-1304
    Project: Hadoop HDFS
    Issue Type: Improvement
    Components: test
    Reporter: Tsz Wo (Nicholas), SZE

HftpFileSystem.open(..) first opens a URL connection to the namenode's FileDataServlet and is then redirected to the datanode's StreamFile servlet. Such redirection does not work in the unit test environment because the redirect URL uses the real hostname instead of localhost. One way to get around this is to use fault injection to replace the real hostname with localhost.
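A rough skeleton of what such a test could look like, under stated assumptions: MiniDFSCluster and the dfs.http.address key exist in this version, and the localhost rewrite of the redirect is exactly the fault-injection piece this issue asks for, so it appears only as a comment:

{code}
// Sketch of a would-be test; it cannot pass as-is, because open(..) follows
// the FileDataServlet redirect to the datanode's real hostname.
Configuration conf = new Configuration();
MiniDFSCluster cluster = new MiniDFSCluster(conf, 1, true, null);
try {
  String httpAddr = conf.get("dfs.http.address"); // namenode HTTP endpoint
  FileSystem hftp = FileSystem.get(URI.create("hftp://" + httpAddr), conf);
  // <-- fault injection would go here: force the redirect host to localhost
  FSDataInputStream in = hftp.open(new Path("/testFile"));
  in.close();
} finally {
  cluster.shutdown();
}
{code}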
[jira] Commented: (HDFS-1229) DFSClient incorrectly asks for new block if primary crashes during first recoverBlock
[ https://issues.apache.org/jira/browse/HDFS-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12889001#action_12889001 ]

Thanh Do commented on HDFS-1229:
--------------------------------
This does not happen in the append+320 trunk.

DFSClient incorrectly asks for new block if primary crashes during first recoverBlock
    Key: HDFS-1229
    URL: https://issues.apache.org/jira/browse/HDFS-1229
    Project: Hadoop HDFS
    Issue Type: Bug
    Components: hdfs client
    Affects Versions: 0.20-append
    Reporter: Thanh Do

- Setup:
    + # available datanodes = 2
    + # disks / datanode = 1
    + # failures = 1
    + failure type = crash
    + when/where failure happens = during the primary's recoverBlock

- Details:
Say the client is appending to block X1 at 2 datanodes: dn1 and dn2. First it needs to make sure both dn1 and dn2 agree on the new GS of the block.

1) The client first creates a DFSOutputStream in DFSClient.append():

{code}
OutputStream result = new DFSOutputStream(src, buffersize, progress,
    lastBlock, stat, conf.getInt("io.bytes.per.checksum", 512));
{code}

2) The above DFSOutputStream constructor in turn calls processDatanodeError(true, true) (i.e., hasError = true, isAppend = true), and starts the DataStreamer:

{code}
processDatanodeError(true, true); /* let's call this PDNE 1 */
streamer.start();
{code}

Note that DataStreamer.run() also calls processDatanodeError():

{code}
while (!closed && clientRunning) {
  ...
  boolean doSleep = processDatanodeError(hasError, false); /* let's call this PDNE 2 */
{code}

3) Now in PDNE 1, we have the following code:

{code}
blockStream = null;
blockReplyStream = null;
...
while (!success && clientRunning) {
  ...
  try {
    primary = createClientDatanodeProtocolProxy(primaryNode, conf);
    newBlock = primary.recoverBlock(block, isAppend, newnodes); /* exception here */
    ...
  } catch (IOException e) {
    ...
    if (recoveryErrorCount > maxRecoveryErrorCount) { // this condition is false
      ...
    }
    ...
    return true;
  } // end catch
  finally { ... }
  this.hasError = false;
  lastException = null;
  errorIndex = 0;
  success = createBlockOutputStream(nodes, clientName, true);
}
{code}

Because dn1 crashes during the client's call to recoverBlock, we get an exception and enter the catch block, in which processDatanodeError returns true before setting hasError to false. Also, because createBlockOutputStream() is not called (due to the early return), blockStream is still null.

4) Now that PDNE 1 has finished, we come to streamer.start(), which calls PDNE 2. Because hasError = false, PDNE 2 returns false immediately without doing anything:

{code}
if (!hasError) {
  return false;
}
{code}

5) Still in DataStreamer.run(), after PDNE 2 returns false, blockStream is still null, hence the following code is executed:

{code}
if (blockStream == null) {
  nodes = nextBlockOutputStream(src);
  this.setName("DataStreamer for file " + src + " block " + block);
  response = new ResponseProcessor(nodes);
  response.start();
}
{code}

nextBlockOutputStream(), which asks the namenode to allocate a new block, is called. (This is not good, because we are appending, not writing.) The namenode gives it a new block ID and a set of datanodes, including the crashed dn1. This causes createBlockOutputStream() to fail, because it tries to contact dn1 first (which has crashed). The client retries 5 times without any success, because every time it asks the namenode for a new block. Again we see that the retry logic at the client is weird!

This bug was found by our Failure Testing Service framework: http://www.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-98.html
For questions, please email us: Thanh Do (than...@cs.wisc.edu) and Haryadi Gunawi (hary...@eecs.berkeley.edu)