[jira] Commented: (HDFS-755) Read multiple checksum chunks at once in DFSInputStream
[ https://issues.apache.org/jira/browse/HDFS-755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12793113#action_12793113 ]

Raghu Angadi commented on HDFS-755:
-----------------------------------

User code should use buffering for application-specific reasons. Maybe the 'bufferSize' argument to FSInputStream was flawed to start with. My impression is that the main purpose of this patch is to reduce a copy; keeping the large buffer prohibits that. Even when a SequenceFile has very small records (avg 1k?), I think there might not be a net negative effect even on fairly small reads, since system calls are fairly cheap. Do you see FSInputChecker or DFSClient evolving to decide dynamically whether a buffer should be used in the near future?

+1 for the patch itself.

btw, I ran 'time bin/hadoop fs -cat 1gbfile > /dev/null' with the NN, DN, and the client on the same machine, but have not been able to see an improvement. I will verify that I am really running the patch.

Read multiple checksum chunks at once in DFSInputStream

Key: HDFS-755
URL: https://issues.apache.org/jira/browse/HDFS-755
Project: Hadoop HDFS
Issue Type: Improvement
Components: hdfs client
Affects Versions: 0.22.0
Reporter: Todd Lipcon
Assignee: Todd Lipcon
Attachments: alldata-hdfs.tsv, benchmark-8-256.png, benchmark.png, hdfs-755.txt, hdfs-755.txt, hdfs-755.txt, hdfs-755.txt, hdfs-755.txt

HADOOP-3205 adds the ability for FSInputChecker subclasses to read multiple checksum chunks in a single call to readChunk. This is the HDFS-side use of that new feature.

--
This message is automatically generated by JIRA. You can reply to this email to add a comment to the issue online.
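The idea discussed above, reading many checksum chunks in one bulk read and verifying them in place instead of one read-and-copy per chunk, can be sketched as follows. This is a minimal illustration, not the FSInputChecker/DFSInputStream API; the class and method names are hypothetical, and CRC32 stands in for whatever checksum the stream actually uses.

```java
import java.util.zip.CRC32;

// Hypothetical sketch: verify several checksum chunks from one large read,
// rather than issuing one read() (and one copy) per chunk.
public class MultiChunkRead {
    // Compute one CRC32 per bytesPerChecksum-sized chunk (last chunk may be short).
    public static long[] computeChecksums(byte[] data, int bytesPerChecksum) {
        int n = (data.length + bytesPerChecksum - 1) / bytesPerChecksum;
        long[] sums = new long[n];
        CRC32 crc = new CRC32();
        for (int i = 0; i < n; i++) {
            int off = i * bytesPerChecksum;
            int len = Math.min(bytesPerChecksum, data.length - off);
            crc.reset();
            crc.update(data, off, len);
            sums[i] = crc.getValue();
        }
        return sums;
    }

    // Verify all chunks of a buffer that was filled by a single bulk read.
    public static boolean verifyChunks(byte[] data, int bytesPerChecksum, long[] expected) {
        long[] actual = computeChecksums(data, bytesPerChecksum);
        if (actual.length != expected.length) return false;
        for (int i = 0; i < actual.length; i++) {
            if (actual[i] != expected[i]) return false;
        }
        return true;
    }
}
```

The point of the bulk variant is that the caller's buffer can be verified directly, avoiding the intermediate per-chunk buffer that a large internal buffer would force.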
[jira] Updated: (HDFS-758) Improve reporting of progress of decommissioning
[ https://issues.apache.org/jira/browse/HDFS-758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Chansler updated HDFS-758:
---------------------------------

Release Note: New name node web UI page displays details of decommissioning progress. (dfsnodelist.jsp?whatNodes=DECOMMISSIONING)

Improve reporting of progress of decommissioning

Key: HDFS-758
URL: https://issues.apache.org/jira/browse/HDFS-758
Project: Hadoop HDFS
Issue Type: Improvement
Reporter: Jitendra Nath Pandey
Assignee: Jitendra Nath Pandey
Attachments: HDFS-758.1.patch, HDFS-758.2.patch, HDFS-758.3.patch, HDFS-758.4.patch, HDFS-758.5.0-20.patch
[jira] Updated: (HDFS-761) Failure to process rename operation from edits log due to quota verification
[ https://issues.apache.org/jira/browse/HDFS-761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Koji Noguchi updated HDFS-761:
------------------------------

Release Note: An error checking quota policy resulted in a failure to read the edits log, stopping the primary/secondary name node. (was: An error checking quota policy resulted in a failure to read the edits log, stopping the secondary name node.)

Failure to process rename operation from edits log due to quota verification

Key: HDFS-761
URL: https://issues.apache.org/jira/browse/HDFS-761
Project: Hadoop HDFS
Issue Type: Bug
Components: name-node
Affects Versions: 0.20.2, 0.21.0, 0.22.0
Reporter: Suresh Srinivas
Assignee: Suresh Srinivas
Fix For: 0.20.2, 0.21.0, 0.22.0
Attachments: hdfs-761.1.patch, hdfs-761.1.patch, hdfs-761.1.rel20.patch, hdfs-761.patch, hdfs-761.rel20.patch, hdfs-761.rel21.patch

When processing the edits log, quota verification is not done and the used quota for directories is not updated; the update is done once at the end of processing the edits log. This rule was broken by a change introduced in HDFS-677, which prevents the namenode from handling rename operations from the edits log due to quota verification failure. Once this happens, the namenode does not process the edits log any further. This results in checkpoint failure on the backup node or secondary namenode, and also prevents the namenode from coming up.
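The invariant the report describes, no quota enforcement while replaying already-accepted edits, with usage simply accumulated and reconciled at the end, can be sketched as below. This is a hypothetical illustration of the pattern, not NameNode code; the class, flag, and map-based namespace are all invented for the example.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: defer quota *enforcement* during edits-log replay,
// while still tracking usage so it can be reconciled afterwards.
public class QuotaReplay {
    private final Map<String, Long> used = new HashMap<>();
    private final Map<String, Long> quota = new HashMap<>();
    private final boolean replayingEdits;

    public QuotaReplay(boolean replayingEdits) {
        this.replayingEdits = replayingEdits;
    }

    public void setQuota(String dir, long q) {
        quota.put(dir, q);
    }

    // Add bytes under dir. Only enforce the quota for live operations;
    // edits being replayed were already accepted once, so rejecting them
    // now would wedge the replay (the failure mode described above).
    public void addUsage(String dir, long bytes) {
        long next = used.getOrDefault(dir, 0L) + bytes;
        if (!replayingEdits && quota.containsKey(dir) && next > quota.get(dir)) {
            throw new IllegalStateException("quota exceeded for " + dir);
        }
        used.put(dir, next);
    }

    public long getUsage(String dir) {
        return used.getOrDefault(dir, 0L);
    }
}
```

The design point is that a replayed edit must never fail a check that the live operation already passed; otherwise the log becomes unreadable, which is exactly the checkpoint failure described in the issue.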
[jira] Updated: (HDFS-781) Metrics PendingDeletionBlocks is not decremented
[ https://issues.apache.org/jira/browse/HDFS-781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Chansler updated HDFS-781:
---------------------------------

Release Note: Correct PendingDeletionBlocks metric to properly decrement counts.

Metrics PendingDeletionBlocks is not decremented

Key: HDFS-781
URL: https://issues.apache.org/jira/browse/HDFS-781
Project: Hadoop HDFS
Issue Type: Bug
Components: name-node
Affects Versions: 0.20.2, 0.21.0, 0.22.0
Reporter: Suresh Srinivas
Assignee: Suresh Srinivas
Priority: Blocker
Fix For: 0.20.2, 0.21.0, 0.22.0
Attachments: hdfs-781.1.patch, hdfs-781.2.patch, hdfs-781.3.patch, hdfs-781.4.patch, hdfs-781.patch, hdfs-781.rel20.1.patch, hdfs-781.rel20.patch

PendingDeletionBlocks is not decremented when blocks pending deletion in {{BlockManager.recentInvalidateSets}} are sent to a datanode for deletion. This results in an invalid value in the metrics.
[jira] Updated: (HDFS-625) ListPathsServlet throws NullPointerException
[ https://issues.apache.org/jira/browse/HDFS-625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Chansler updated HDFS-625:
---------------------------------

Release Note: Corrected error where listing a path no longer in the name space could stop ListPathsServlet until system restarted. (was: Attempt to list path no longer in name space could stop ListPathsServlet until system restarted.)

ListPathsServlet throws NullPointerException

Key: HDFS-625
URL: https://issues.apache.org/jira/browse/HDFS-625
Project: Hadoop HDFS
Issue Type: Bug
Components: name-node
Reporter: Tsz Wo (Nicholas), SZE
Assignee: Suresh Srinivas
Fix For: 0.21.0, 0.22.0
Attachments: hdfs-625.0-20.patch, hdfs-625.1.patch, hdfs-625.patch

ListPathsServlet throws NullPointerException when listing a path which is not found in the namesystem.
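The class of bug here, dereferencing the lookup result for a path that no longer exists, is avoided by a null guard that turns the miss into a not-found response instead of an NPE that kills the servlet. A minimal illustration follows; the map-backed namespace and method names are invented for the sketch and are not the actual ListPathsServlet code.

```java
import java.util.Map;

// Illustrative null-guard for listing a path that may have been deleted
// between the request arriving and the lookup running.
public class SafeListing {
    // Returns a listing string, or a not-found marker instead of throwing NPE.
    public static String listPath(Map<String, String> namespace, String path) {
        String node = namespace.get(path);
        if (node == null) {
            // Fail gracefully for this one request; the servlet stays alive.
            return "404: path not found: " + path;
        }
        return "listing: " + node;
    }
}
```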
[jira] Updated: (HDFS-761) Failure to process rename operation from edits log due to quota verification
[ https://issues.apache.org/jira/browse/HDFS-761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Chansler updated HDFS-761:
---------------------------------

Release Note: Corrected an error when checking quota policy that resulted in a failure to read the edits log, stopping the primary/secondary name node. (was: An error checking quota policy resulted in a failure to read the edits log, stopping the primary/secondary name node.)
[jira] Updated: (HDFS-564) Adding pipeline test 17-35
[ https://issues.apache.org/jira/browse/HDFS-564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hairong Kuang updated HDFS-564:
-------------------------------

Status: Patch Available (was: Open)

Adding pipeline test 17-35

Key: HDFS-564
URL: https://issues.apache.org/jira/browse/HDFS-564
Project: Hadoop HDFS
Issue Type: Sub-task
Components: test
Affects Versions: 0.21.0
Reporter: Kan Zhang
Assignee: Hairong Kuang
Priority: Blocker
Fix For: 0.21.0
Attachments: h564-24.patch, h564-25.patch, pipelineTests.patch, pipelineTests1.patch
[jira] Updated: (HDFS-630) In DFSOutputStream.nextBlockOutputStream(), the client can exclude specific datanodes when locating the next block.
[ https://issues.apache.org/jira/browse/HDFS-630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tsz Wo (Nicholas), SZE updated HDFS-630:
----------------------------------------

Component/s: name-node
Priority: Major (was: Minor)
Issue Type: Improvement (was: New Feature)
Hadoop Flags: [Incompatible change, Reviewed] (was: [Reviewed])

+1 0001-Fix-HDFS-630-0.21-svn.patch looks good. Thanks, Cosmin.

In DFSOutputStream.nextBlockOutputStream(), the client can exclude specific datanodes when locating the next block.

Key: HDFS-630
URL: https://issues.apache.org/jira/browse/HDFS-630
Project: Hadoop HDFS
Issue Type: Improvement
Components: hdfs client, name-node
Affects Versions: 0.21.0
Reporter: Ruyue Ma
Assignee: Cosmin Lehene
Attachments: 0001-Fix-HDFS-630-0.21-svn.patch, 0001-Fix-HDFS-630-for-0.21-and-trunk-unified.patch, 0001-Fix-HDFS-630-for-0.21.patch, 0001-Fix-HDFS-630-svn.patch, 0001-Fix-HDFS-630-svn.patch, 0001-Fix-HDFS-630-trunk-svn-1.patch, 0001-Fix-HDFS-630-trunk-svn-2.patch, HDFS-630.patch

Created from HDFS-200. If during a write the dfsclient sees that a block replica location for a newly allocated block is not connectable, it re-requests the NN for a fresh set of replica locations for the block. It tries this dfs.client.block.write.retries times (default 3), sleeping 6 seconds between each retry (see DFSClient.nextBlockOutputStream). This setting works well when you have a reasonably sized cluster; if you have few datanodes in the cluster, every retry may pick the dead datanode and the above logic bails out. Our solution: when getting block locations from the namenode, we give the NN the excluded datanodes. The list of dead datanodes is only for one block allocation.
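The retry behavior proposed above, excluding each failed datanode for the duration of one block allocation so small clusters do not keep re-picking the same dead node, can be sketched as follows. The method names and list-based "namenode" are illustrative stand-ins, not the actual ClientProtocol.addBlock signature.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative sketch of per-block excluded-node tracking on retry.
public class ExcludedNodeRetry {
    // Stand-in for the namenode choosing a target, skipping excluded nodes.
    public static String chooseTarget(List<String> liveNodes, Set<String> excluded) {
        for (String dn : liveNodes) {
            if (!excluded.contains(dn)) return dn;
        }
        return null; // nothing left to offer
    }

    // Client-side retry loop: each node that fails to connect is excluded,
    // but only for THIS block's allocation (per the proposal above).
    public static String allocateBlock(List<String> liveNodes, Set<String> deadNodes, int retries) {
        Set<String> excluded = new HashSet<>(); // scoped to one block allocation
        for (int attempt = 0; attempt <= retries; attempt++) {
            String dn = chooseTarget(liveNodes, excluded);
            if (dn == null) return null;
            if (!deadNodes.contains(dn)) return dn; // connect succeeded
            excluded.add(dn); // report back so the NN won't hand this one out again
        }
        return null;
    }
}
```

Without the excluded set, a three-node cluster with one dead node can burn every retry on the same dead datanode; with it, the second attempt is guaranteed to try a different node.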
[jira] Commented: (HDFS-630) In DFSOutputStream.nextBlockOutputStream(), the client can exclude specific datanodes when locating the next block.
[ https://issues.apache.org/jira/browse/HDFS-630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12793340#action_12793340 ]

Hudson commented on HDFS-630:
-----------------------------

Integrated in Hadoop-Hdfs-trunk-Commit #151 (See [http://hudson.zones.apache.org/hudson/job/Hadoop-Hdfs-trunk-Commit/151/])

In DFSOutputStream.nextBlockOutputStream(), the client can exclude specific datanodes when locating the next block; back out this patch so can replace w/ improved version
[jira] Created: (HDFS-846) SetSpaceQuota of value 9223372036854775807 does not apply quota.
SetSpaceQuota of value 9223372036854775807 does not apply quota.

Key: HDFS-846
URL: https://issues.apache.org/jira/browse/HDFS-846
Project: Hadoop HDFS
Issue Type: Bug
Affects Versions: 0.20.1
Reporter: Ravi Phulari
Priority: Minor
Fix For: 0.20.2, 0.21.0, 0.22.0

*hadoop dfsadmin -setSpaceQuota* accepts a maximum quota value of 9223372036854775807 and a minimum of 0. When a user tries to set a space quota of 9223372036854775807, there is no error message, but the quota is not applied.
{noformat}
[u...@ghost-host hadoop]$ hadoop dfsadmin -setSpaceQuota 9223372036854775807 /
[u...@ghost-host hadoop]$ hadoop fs -count -q /
2147483647 2147483599 none inf 24 242019254 hdfs://ghost-nn.scary.land.com/
{noformat}
We should warn the user with an error message that 9223372036854775807 is an invalid value and the quota will not be applied.
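9223372036854775807 is Long.MAX_VALUE, which (an assumption for this sketch) the quota code treats as a reserved "unlimited" sentinel, so setting it silently changes nothing. The stricter up-front validation the report asks for could look like the following; the class and message format are illustrative, not the actual DFSAdmin code.

```java
// Illustrative validation for -setSpaceQuota. The treatment of Long.MAX_VALUE
// as a reserved sentinel is an assumption made for this sketch.
public class SpaceQuotaCheck {
    public static final long MAX_QUOTA = Long.MAX_VALUE; // 9223372036854775807

    // Reject values the server cannot actually enforce, with a specific
    // message instead of silently ignoring the request.
    public static String validate(long quota) {
        if (quota <= 0 || quota >= MAX_QUOTA) {
            return "setSpaceQuota: invalid value " + quota
                 + ": allowed range is 1 to " + (MAX_QUOTA - 1);
        }
        return null; // valid
    }
}
```

Validating at the client and echoing the allowed range directly addresses both complaints in this issue: the silent no-op and the ambiguous error message.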
[jira] Commented: (HDFS-846) SetSpaceQuota of value 9223372036854775807 does not apply quota.
[ https://issues.apache.org/jira/browse/HDFS-846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12793349#action_12793349 ]

Ravi Phulari commented on HDFS-846:
-----------------------------------

We also need to correct the ambiguous error message for an invalid setSpaceQuota value.

*setSpaceQuota: Invalid values for quota : 9223372036854775807 and 0*

This should be something like:

*setSpaceQuota: Invalid value user-input for setSpaceQuota : Allowed maximum spaceQuota = 9223372036854775807 and minimum spaceQuota = 0*
{noformat}
[u...@ghost-host hadoop]$ hadoop dfsadmin -setSpaceQuota 0 /
setSpaceQuota: Invalid values for quota : 9223372036854775807 and 0
{noformat}
[jira] Commented: (HDFS-770) SocketTimeoutException: timeout while waiting for channel to be ready for read
[ https://issues.apache.org/jira/browse/HDFS-770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12793363#action_12793363 ]

Todd Lipcon commented on HDFS-770:
----------------------------------

Hi Zheng,

What kernel is the newer box running? Is it a brand new kernel? 2.6.32 has significant changes to dirty page writeback.

-Todd

SocketTimeoutException: timeout while waiting for channel to be ready for read

Key: HDFS-770
URL: https://issues.apache.org/jira/browse/HDFS-770
Project: Hadoop HDFS
Issue Type: Bug
Components: contrib/libhdfs, data-node, hdfs client, name-node
Affects Versions: 0.20.1
Environment: Ubuntu Linux 8.04
Reporter: Leon Mergen
Attachments: client.txt, datanode.txt, namenode.txt

We're having issues with timeouts occurring in our client: for some reason, a timeout of 63000 milliseconds is triggered while writing HDFS data. Since we currently have a single-server setup, this results in our client terminating with an "All datanodes are bad" IOException. We're running all services, including the client, on our single server, so it cannot be a network error. The load on the client is extremely low during this period: only a few kilobytes a minute were being written around the time the error occurred.

After browsing a bit online, a lot of people talk about setting dfs.datanode.socket.write.timeout to 0 as a solution for this problem. Due to the low load on our system during this period, however, I do feel this is a real error and a timeout that should not be occurring. I have attached three logs from the namenode, datanode, and client. It could be that this is related to http://issues.apache.org/jira/browse/HDFS-693

Any pointers on how I can assist to resolve this issue will be greatly appreciated.
[jira] Commented: (HDFS-564) Adding pipeline test 17-35
[ https://issues.apache.org/jira/browse/HDFS-564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12793364#action_12793364 ]

Tsz Wo (Nicholas), SZE commented on HDFS-564:
---------------------------------------------

Tests 17-35 are not found in pipelineTests1.patch. Did you forget to add them?
[jira] Commented: (HDFS-630) In DFSOutputStream.nextBlockOutputStream(), the client can exclude specific datanodes when locating the next block.
[ https://issues.apache.org/jira/browse/HDFS-630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12793373#action_12793373 ]

stack commented on HDFS-630:
----------------------------

Any chance of a patch that will apply to TRUNK, Cosmin? The 0.21 patch does the below when applied. Thanks.
{code}
patching file src/java/org/apache/hadoop/hdfs/protocol/ClientProtocol.java
Hunk #1 FAILED at 44.
Hunk #2 succeeded at 192 (offset 2 lines).
1 out of 2 hunks FAILED -- saving rejects to file src/java/org/apache/hadoop/hdfs/protocol/ClientProtocol.java.rej
{code}
[jira] Commented: (HDFS-564) Adding pipeline test 17-35
[ https://issues.apache.org/jira/browse/HDFS-564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12793379#action_12793379 ]

Hadoop QA commented on HDFS-564:
--------------------------------

+1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12428501/pipelineTests1.patch
against trunk revision 892941.

+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 24 new or modified tests.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 findbugs. The patch does not introduce any new Findbugs warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed core unit tests.
+1 contrib tests. The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/154/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/154/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/154/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/154/console

This message is automatically generated.
[jira] Updated: (HDFS-814) Add an api to get the visible length of a DFSDataInputStream.
[ https://issues.apache.org/jira/browse/HDFS-814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tsz Wo (Nicholas), SZE updated HDFS-814:
----------------------------------------

Attachment: h814_20091221.patch

h814_20091221.patch: added DFSDataInputStream.getVisibleLength().

Add an api to get the visible length of a DFSDataInputStream.

Key: HDFS-814
URL: https://issues.apache.org/jira/browse/HDFS-814
Project: Hadoop HDFS
Issue Type: New Feature
Components: hdfs client
Reporter: Tsz Wo (Nicholas), SZE
Assignee: Tsz Wo (Nicholas), SZE
Fix For: 0.21.0, 0.22.0
Attachments: h814_20091221.patch

Hflush guarantees that the bytes written before it are visible to new readers. However, there is no way to get the length of the visible bytes. The visible length is useful in some applications like SequenceFile.
[jira] Commented: (HDFS-630) In DFSOutputStream.nextBlockOutputStream(), the client can exclude specific datanodes when locating the next block.
[ https://issues.apache.org/jira/browse/HDFS-630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12793382#action_12793382 ]

Hudson commented on HDFS-630:
-----------------------------

Integrated in Hdfs-Patch-h5.grid.sp2.yahoo.net #154 (See [http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/154/])

In DFSOutputStream.nextBlockOutputStream(), the client can exclude specific datanodes when locating the next block; back out this patch so can replace w/ improved version
[jira] Commented: (HDFS-770) SocketTimeoutException: timeout while waiting for channel to be ready for read
[ https://issues.apache.org/jira/browse/HDFS-770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12793385#action_12793385 ]

Todd Lipcon commented on HDFS-770:
----------------------------------

Hey Zheng,

Any chance you can run 'grep . /proc/sys/vm/*' on the system that does show the problem, and compare the results to the one that doesn't show the problem? I'm thinking this might just be a factor of system-level tuning. See http://www.westnet.com/~gsmith/content/linux-pdflush.htm

(Attachments now include filewriter.cpp in addition to client.txt, datanode.txt, and namenode.txt.)
[jira] Updated: (HDFS-101) DFS write pipeline : DFSClient sometimes does not detect second datanode failure
[ https://issues.apache.org/jira/browse/HDFS-101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hairong Kuang updated HDFS-101:
-------------------------------

Resolution: Fixed
Fix Version/s: 0.22.0, 0.20.2
Hadoop Flags: [Reviewed]
Status: Resolved (was: Patch Available)

DFS write pipeline : DFSClient sometimes does not detect second datanode failure

Key: HDFS-101
URL: https://issues.apache.org/jira/browse/HDFS-101
Project: Hadoop HDFS
Issue Type: Bug
Affects Versions: 0.20.1
Reporter: Raghu Angadi
Assignee: Hairong Kuang
Priority: Blocker
Fix For: 0.20.2, 0.21.0, 0.22.0
Attachments: detectDownDN-0.20.patch, detectDownDN1-0.20.patch, detectDownDN2.patch, detectDownDN3-0.20.patch, detectDownDN3.patch, hdfs-101.tar.gz

When the first datanode's write to the second datanode fails or times out, DFSClient ends up marking the first datanode as the bad one and removes it from the pipeline. A similar problem existed on the DataNode side and was fixed in HADOOP-3339.

From HADOOP-3339: The main issue is that the BlockReceiver thread (and DataStreamer in the case of DFSClient) interrupt() the 'responder' thread. But interrupting is a pretty coarse control. We don't know what state the responder is in, and interrupting has different effects depending on responder state. To fix this properly we need to redesign how we handle these interactions.

When the first datanode closes its socket to DFSClient, DFSClient should properly read all the data left in the socket. Also, the DataNode's closing of the socket should not result in a TCP reset; otherwise I think DFSClient will not be able to read from the socket.
[jira] Updated: (HDFS-762) Trying to start the balancer throws a NPE
[ https://issues.apache.org/jira/browse/HDFS-762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

dhruba borthakur updated HDFS-762:
----------------------------------

Status: Patch Available (was: Open)

Trying to start the balancer throws a NPE

Key: HDFS-762
URL: https://issues.apache.org/jira/browse/HDFS-762
Project: Hadoop HDFS
Issue Type: Bug
Affects Versions: 0.21.0
Reporter: Cristian Ivascu
Fix For: 0.21.0
Attachments: 0001-corrected-balancer-constructor.patch, HDFS-762.patch

When trying to run the balancer, I get a NullPointerException:
{noformat}
2009-11-10 11:08:14,235 ERROR org.apache.hadoop.hdfs.server.balancer.Balancer: java.lang.NullPointerException
  at org.apache.hadoop.hdfs.server.namenode.BlockPlacementPolicy.getInstance(BlockPlacementPolicy.java:161)
  at org.apache.hadoop.hdfs.server.balancer.Balancer.checkReplicationPolicyCompatibility(Balancer.java:784)
  at org.apache.hadoop.hdfs.server.balancer.Balancer.init(Balancer.java:792)
  at org.apache.hadoop.hdfs.server.balancer.Balancer.main(Balancer.java:814)
{noformat}
This happens when trying to use bin/start-balancer or bin/hdfs balancer -threshold 10. The config files (hdfs-site and core-site) set fs.default.name to hdfs://namenode:9000.
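The stack trace above (an NPE inside BlockPlacementPolicy.getInstance, reached from the Balancer's constructor path) suggests the factory was handed a configuration that was never set. A defensive pattern for that failure mode, purely illustrative and not the actual HDFS fix, is to fall back to defaults when no configuration was supplied:

```java
import java.util.Properties;

// Illustrative null-guard: never hand a null configuration to a factory.
// Properties stands in for a Hadoop Configuration object in this sketch.
public class PolicyFactory {
    public static Properties defaults() {
        Properties p = new Properties();
        p.setProperty("placement.policy", "default");
        return p;
    }

    // getInstance-style factory that tolerates a missing configuration
    // instead of dereferencing null and crashing at startup.
    public static String getInstance(Properties conf) {
        if (conf == null) {
            conf = defaults();
        }
        return conf.getProperty("placement.policy", "default");
    }
}
```

The attachment name 0001-corrected-balancer-constructor.patch suggests the real fix was instead to wire the configuration through the Balancer constructor so the factory never sees null; the guard above is just the general defensive shape of the problem.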
[jira] Assigned: (HDFS-762) Trying to start the balancer throws a NPE
[ https://issues.apache.org/jira/browse/HDFS-762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

dhruba borthakur reassigned HDFS-762:
-------------------------------------

Assignee: Cristian Ivascu
[jira] Commented: (HDFS-762) Trying to start the balancer throws a NPE
[ https://issues.apache.org/jira/browse/HDFS-762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12793429#action_12793429 ]

dhruba borthakur commented on HDFS-762:
---------------------------------------
+1. Code looks good.

Trying to start the balancer throws a NPE
-----------------------------------------
                 Key: HDFS-762
                 URL: https://issues.apache.org/jira/browse/HDFS-762
             Project: Hadoop HDFS
          Issue Type: Bug
    Affects Versions: 0.21.0
            Reporter: Cristian Ivascu
            Assignee: Cristian Ivascu
             Fix For: 0.21.0
         Attachments: 0001-corrected-balancer-constructor.patch, HDFS-762.patch
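The stack trace in this issue fails inside BlockPlacementPolicy.getInstance, called from the balancer's startup path. As a hedged illustration only — the classes below are simplified stand-ins, not the real Hadoop code or the committed fix — the failure mode is the classic pattern of dereferencing a Configuration that was never passed in:

```java
import java.util.HashMap;
import java.util.Map;

// Simplified stand-in for org.apache.hadoop.conf.Configuration.
class Configuration {
    private final Map<String, String> props = new HashMap<>();
    String get(String key, String dflt) { return props.getOrDefault(key, dflt); }
}

// Simplified stand-in mirroring the failing call site at
// BlockPlacementPolicy.getInstance(BlockPlacementPolicy.java:161).
class BlockPlacementPolicy {
    static BlockPlacementPolicy getInstance(Configuration conf) {
        // NPE here if the caller never initialized conf (hypothetical key name).
        String clazz = conf.get("dfs.block.replicator.classname", "default");
        return new BlockPlacementPolicy();
    }
}

public class BalancerNpeSketch {
    public static void main(String[] args) {
        try {
            // The buggy path: a constructor that never set up its Configuration.
            BlockPlacementPolicy.getInstance(null);
            System.out.println("no NPE");
        } catch (NullPointerException e) {
            System.out.println("NPE, as in the report");
        }
        // What a corrected constructor would do: pass a real Configuration.
        BlockPlacementPolicy.getInstance(new Configuration());
        System.out.println("ok with non-null conf");
    }
}
```

The attachment name 0001-corrected-balancer-constructor.patch suggests the actual fix lives in the Balancer constructor; the sketch only demonstrates why a missing Configuration surfaces at getInstance.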
[jira] Commented: (HDFS-843) Add HTTP POST/PUT/DELETE support for web servers in datanodes
[ https://issues.apache.org/jira/browse/HDFS-843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12793450#action_12793450 ]

Allen Wittenauer commented on HDFS-843:
---------------------------------------
I can think of one fairly easily: loading data onto a grid without requiring the Hadoop client to be installed.

Add HTTP POST/PUT/DELETE support for web servers in datanodes
-------------------------------------------------------------
            Key: HDFS-843
            URL: https://issues.apache.org/jira/browse/HDFS-843
        Project: Hadoop HDFS
     Issue Type: New Feature
     Components: name-node
       Reporter: issei yoshida
    Attachments: 843.patch

Currently, HDFS files can be read from datanodes via their web servers, but cannot be written or deleted. This adds HTTP POST/PUT/DELETE support for HDFS. In requests, the HTTP header must contain Content-Length, and Content-Type should NOT be application/x-www-form-urlencoded. In POST or PUT requests, the target data needs to be stored directly in the HTTP body.
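A hedged sketch of the request shape the description calls for: Content-Length present, Content-Type anything other than application/x-www-form-urlencoded, and the data carried directly in the body. The local JDK test server and the /data/part-00000 path are stand-ins invented here; the actual datanode endpoint would be whatever 843.patch defines, which is not shown in this issue:

```java
import com.sun.net.httpserver.HttpServer;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URL;

public class DatanodePutSketch {
    public static void main(String[] args) throws Exception {
        // Local stand-in server that just acknowledges the PUT (not a datanode).
        HttpServer server = HttpServer.create(new InetSocketAddress(0), 0);
        server.createContext("/", ex -> {
            byte[] body = ex.getRequestBody().readAllBytes();
            System.out.println("server received " + body.length
                    + " bytes via " + ex.getRequestMethod());
            ex.sendResponseHeaders(200, -1);  // 200 OK, no response body
            ex.close();
        });
        server.start();

        byte[] data = "hello hdfs".getBytes("UTF-8");
        URL url = new URL("http://localhost:" + server.getAddress().getPort()
                + "/data/part-00000");  // hypothetical path
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("PUT");
        conn.setDoOutput(true);
        // Content-Length must be set; fixed-length streaming mode does that.
        conn.setFixedLengthStreamingMode(data.length);
        // Content-Type must NOT be application/x-www-form-urlencoded.
        conn.setRequestProperty("Content-Type", "application/octet-stream");
        try (OutputStream out = conn.getOutputStream()) {
            out.write(data);  // target data goes directly in the HTTP body
        }
        System.out.println("response code: " + conn.getResponseCode());
        server.stop(0);
    }
}
```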
[jira] Updated: (HDFS-245) Create symbolic links in HDFS
[ https://issues.apache.org/jira/browse/HDFS-245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eli Collins updated HDFS-245:
-----------------------------
    Attachment: symlink28-hdfs.patch

Latest patch.
* Implements path resolution as discussed in the above comment and HADOOP-6427
* Additional tests in TestLink to cover the above
* Resolved against trunk
* See HADOOP-6421 for the common changes

Create symbolic links in HDFS
-----------------------------
            Key: HDFS-245
            URL: https://issues.apache.org/jira/browse/HDFS-245
        Project: Hadoop HDFS
     Issue Type: New Feature
       Reporter: dhruba borthakur
       Assignee: Eli Collins
    Attachments: 4044_20081030spi.java, designdocv1.txt, designdocv2.txt, designdocv3.txt, HADOOP-4044-strawman.patch, symlink-0.20.0.patch, symlink-25-hdfs.patch, symlink-26-hdfs.patch, symlink-26-hdfs.patch, symLink1.patch, symLink1.patch, symLink11.patch, symLink12.patch, symLink13.patch, symLink14.patch, symLink15.txt, symLink15.txt, symlink16-common.patch, symlink16-hdfs.patch, symlink16-mr.patch, symlink17-common.txt, symlink17-hdfs.txt, symlink18-common.txt, symlink19-common-delta.patch, symlink19-common.txt, symlink19-common.txt, symlink19-hdfs-delta.patch, symlink19-hdfs.txt, symlink20-common.patch, symlink20-hdfs.patch, symlink21-common.patch, symlink21-hdfs.patch, symlink22-common.patch, symlink22-hdfs.patch, symlink23-common.patch, symlink23-hdfs.patch, symlink24-hdfs.patch, symlink27-hdfs.patch, symlink28-hdfs.patch, symLink4.patch, symLink5.patch, symLink6.patch, symLink8.patch, symLink9.patch

HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.
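The semantics described above — reads and writes through a link resolve to the target, while link-aware tools can inspect and manipulate the link itself — can be demonstrated on a local filesystem. This is a hedged analogy using java.nio.file rather than HDFS (the corresponding Hadoop client API introduced by this line of work is FileContext#createSymlink); it assumes a platform where creating symlinks is permitted:

```java
import java.nio.file.Files;
import java.nio.file.Path;

public class SymlinkSemantics {
    public static void main(String[] args) throws Exception {
        Path dir = Files.createTempDirectory("symlink-demo");
        Path target = dir.resolve("target.txt");
        Path link = dir.resolve("link.txt");

        Files.writeString(target, "data");
        Files.createSymbolicLink(link, target);  // link -> target

        // Reads through the link behave as if operating on the target...
        System.out.println("read via link: " + Files.readString(link));
        // ...while archiving-style utilities can handle the link itself.
        System.out.println("is symlink: " + Files.isSymbolicLink(link));
        System.out.println("points to: " + Files.readSymbolicLink(link).getFileName());
    }
}
```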
[jira] Commented: (HDFS-564) Adding pipeline test 17-35
[ https://issues.apache.org/jira/browse/HDFS-564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12793484#action_12793484 ]

Hadoop QA commented on HDFS-564:
--------------------------------
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12428669/pipelineTests2.patch
against trunk revision 893039.

+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 27 new or modified tests.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 findbugs. The patch does not introduce any new Findbugs warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The patch failed core unit tests.
+1 contrib tests. The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/87/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/87/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/87/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/87/console

This message is automatically generated.

Adding pipeline test 17-35
--------------------------
                Key: HDFS-564
                URL: https://issues.apache.org/jira/browse/HDFS-564
            Project: Hadoop HDFS
         Issue Type: Sub-task
         Components: test
   Affects Versions: 0.21.0
           Reporter: Kan Zhang
           Assignee: Hairong Kuang
           Priority: Blocker
            Fix For: 0.21.0
        Attachments: h564-24.patch, h564-25.patch, pipelineTests.patch, pipelineTests1.patch, pipelineTests2.patch
[jira] Commented: (HDFS-564) Adding pipeline test 17-35
[ https://issues.apache.org/jira/browse/HDFS-564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12793494#action_12793494 ]

Hairong Kuang commented on HDFS-564:
------------------------------------
The failed core test is not caused by this patch because this patch adds only fault injection tests.

Adding pipeline test 17-35
--------------------------
                Key: HDFS-564
                URL: https://issues.apache.org/jira/browse/HDFS-564
            Project: Hadoop HDFS
         Issue Type: Sub-task
         Components: test
   Affects Versions: 0.21.0
           Reporter: Kan Zhang
           Assignee: Hairong Kuang
           Priority: Blocker
            Fix For: 0.21.0
        Attachments: h564-24.patch, h564-25.patch, pipelineTests.patch, pipelineTests1.patch, pipelineTests2.patch
[jira] Updated: (HDFS-564) Adding pipeline test 17-35
[ https://issues.apache.org/jira/browse/HDFS-564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hairong Kuang updated HDFS-564:
-------------------------------
       Resolution: Fixed
    Fix Version/s: 0.22.0
           Status: Resolved  (was: Patch Available)

I've committed this. Thanks, Nicholas and Kan!

Adding pipeline test 17-35
--------------------------
                Key: HDFS-564
                URL: https://issues.apache.org/jira/browse/HDFS-564
            Project: Hadoop HDFS
         Issue Type: Sub-task
         Components: test
   Affects Versions: 0.21.0
           Reporter: Kan Zhang
           Assignee: Hairong Kuang
           Priority: Blocker
            Fix For: 0.21.0, 0.22.0
        Attachments: h564-24.patch, h564-25.patch, pipelineTests.patch, pipelineTests1.patch, pipelineTests2.patch
[jira] Commented: (HDFS-483) Data transfer (aka pipeline) implementation cannot tolerate exceptions
[ https://issues.apache.org/jira/browse/HDFS-483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12793500#action_12793500 ]

Hairong Kuang commented on HDFS-483:
------------------------------------
Nicholas, since all sub-tasks are resolved, can we close this? Did HDFS-101 also fix HDFS-264?

Data transfer (aka pipeline) implementation cannot tolerate exceptions
----------------------------------------------------------------------
            Key: HDFS-483
            URL: https://issues.apache.org/jira/browse/HDFS-483
        Project: Hadoop HDFS
     Issue Type: Bug
     Components: data-node, hdfs client
       Reporter: Tsz Wo (Nicholas), SZE
    Attachments: h483_20090709.patch, h483_20090713.patch, h483_20090717.patch, h483_20090727.patch, h483_20090730.patch, h483_20090731.patch, h483_20090806.patch, h483_20090807.patch, h483_20090807b.patch, h483_20090810.patch, h483_20090818.patch, h483_20090819.patch, h483_20090819b.patch

Data transfer was tested with simulated exceptions as below:
# create files with dfs
# write 1 byte
# close file
# open the same file
# read the 1 byte and compare results

The file was closed successfully, but we got an IOException (Could not get block locations...) when the file was reopened for reading.
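The five test steps in the description map to ordinary file I/O. Below is a hedged local-filesystem sketch of the same procedure using plain Java streams; the real test ran these steps against DFS, where the reopen in step 4 was what failed with the "Could not get block locations" IOException:

```java
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class WriteOneByteReopen {
    public static void main(String[] args) throws Exception {
        Path file = Files.createTempFile("h483", ".dat");

        // Steps 1-3: create the file, write 1 byte, close it.
        try (OutputStream out = Files.newOutputStream(file)) {
            out.write(0x42);
        }

        // Steps 4-5: reopen the same file, read the 1 byte back, compare.
        try (InputStream in = Files.newInputStream(file)) {
            int b = in.read();
            System.out.println(b == 0x42 ? "byte matches" : "mismatch: " + b);
        }
    }
}
```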