[jira] [Commented] (HDFS-3561) ZKFC retries 45 times to connect to the other NN during fencing when the network between NNs is broken, and the standby NN will not take over as active
[ https://issues.apache.org/jira/browse/HDFS-3561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13432946#comment-13432946 ]

Vinay commented on HDFS-3561:
-----------------------------

Hi [~atm], do you have any more comments on this?

ZKFC retries 45 times to connect to the other NN during fencing when the network between NNs is broken, and the standby NN will not take over as active
-------------------------------------------------------------------------------------------------------------------------------------------------------

Key: HDFS-3561
URL: https://issues.apache.org/jira/browse/HDFS-3561
Project: Hadoop HDFS
Issue Type: Bug
Components: auto-failover, ha
Affects Versions: 2.1.0-alpha, 3.0.0
Reporter: suja s
Assignee: Vinay
Attachments: HDFS-3561-2.patch, HDFS-3561.patch

Scenario:
Active NN on machine1, standby NN on machine2. Machine1 is isolated from the network (its network cable is unplugged). After the ZK session timeout, the ZKFC on machine2 is notified that NN1 is gone and tries to fail over NN2 to active. As part of this, during fencing, it tries to connect to machine1 and kill NN1 (the sshfence technique is configured). The connection is retried 45 times (governed by ipc.client.connect.max.socket.retries). After that, the standby NN is still not able to take over as active, because fencing failed.

Suggestion: if the ZKFC cannot reach the other NN within a specified time or number of retries, it could consider that NN dead and instruct the other NN to take over as active, since there is no chance of the other NN (NN1) retaining its active state after the ZK session timeout while it is isolated from the network.

From the ZKFC log:
{noformat}
2012-06-21 17:46:14,378 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: HOST-xx-xx-xx-102/xx.xx.xx.102:65110. Already tried 22 time(s).
2012-06-21 17:46:35,378 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: HOST-xx-xx-xx-102/xx.xx.xx.102:65110. Already tried 23 time(s).
2012-06-21 17:46:56,378 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: HOST-xx-xx-xx-102/xx.xx.xx.102:65110. Already tried 24 time(s).
2012-06-21 17:47:17,378 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: HOST-xx-xx-xx-102/xx.xx.xx.102:65110. Already tried 25 time(s).
2012-06-21 17:47:38,382 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: HOST-xx-xx-xx-102/xx.xx.xx.102:65110. Already tried 26 time(s).
2012-06-21 17:47:59,382 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: HOST-xx-xx-xx-102/xx.xx.xx.102:65110. Already tried 27 time(s).
2012-06-21 17:48:20,386 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: HOST-xx-xx-xx-102/xx.xx.xx.102:65110. Already tried 28 time(s).
2012-06-21 17:48:41,386 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: HOST-xx-xx-xx-102/xx.xx.xx.102:65110. Already tried 29 time(s).
2012-06-21 17:49:02,386 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: HOST-xx-xx-xx-102/xx.xx.xx.102:65110. Already tried 30 time(s).
2012-06-21 17:49:23,386 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: HOST-xx-xx-xx-102/xx.xx.xx.102:65110. Already tried 31 time(s).
{noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
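The suggested behavior (give up after a bounded number of attempts and treat the unreachable NN as dead) can be sketched as follows. This is an illustrative sketch, not the actual ZKFC code; the class and method names are hypothetical.

```java
import java.util.function.BooleanSupplier;

/**
 * Minimal sketch of a bounded fencing check: try to reach the peer
 * NameNode at most maxRetries times, and if every attempt fails,
 * report it as dead so failover can proceed instead of retrying forever.
 */
public class BoundedFence {
    /** Returns true if the remote NN was reached within maxRetries attempts. */
    public static boolean tryReachPeer(BooleanSupplier connectAttempt, int maxRetries) {
        for (int i = 0; i < maxRetries; i++) {
            if (connectAttempt.getAsBoolean()) {
                return true;   // peer reachable; normal fencing proceeds
            }
        }
        return false;          // treat peer as dead; allow standby to go active
    }

    public static void main(String[] args) {
        // Simulated unreachable peer: every connection attempt fails.
        boolean reached = tryReachPeer(() -> false, 3);
        System.out.println(reached ? "peer-reachable" : "peer-assumed-dead");
    }
}
```

The key design point is that the retry budget is owned by the fencing path rather than the generic IPC client, so an isolated peer cannot stall failover indefinitely.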
[jira] [Commented] (HDFS-3788) distcp can't copy large files using webhdfs due to missing Content-Length header
[ https://issues.apache.org/jira/browse/HDFS-3788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13432952#comment-13432952 ]

Eli Collins commented on HDFS-3788:
-----------------------------------

Correct, this is a different issue from HDFS-3671.

distcp can't copy large files using webhdfs due to missing Content-Length header
--------------------------------------------------------------------------------

Key: HDFS-3788
URL: https://issues.apache.org/jira/browse/HDFS-3788
Project: Hadoop HDFS
Issue Type: Bug
Components: webhdfs
Affects Versions: 2.0.0-alpha
Reporter: Eli Collins
Priority: Critical
Attachments: distcp-webhdfs-errors.txt

The following command fails when data1 contains a 3gb file. It passes when using hftp or when the directory just contains smaller (2gb) files, so it looks like a webhdfs issue with large files.

{{hadoop distcp webhdfs://eli-thinkpad:50070/user/eli/data1 hdfs://localhost:8020/user/eli/data2}}
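The 2 GB boundary in the report is suggestive. One plausible mechanism (an assumption here, not confirmed by this thread) is a content length passing through a signed 32-bit int somewhere in the HTTP path, which wraps for files larger than Integer.MAX_VALUE bytes:

```java
public class ContentLengthOverflow {
    public static void main(String[] args) {
        long threeGb = 3L * 1024 * 1024 * 1024;  // 3221225472 bytes
        // Integer.MAX_VALUE is 2147483647 (~2 GB); casting a larger
        // length down to int wraps around to a negative value.
        int asInt = (int) threeGb;
        System.out.println(asInt < 0
            ? "length wrapped negative: " + asInt
            : "length fits in int: " + asInt);
    }
}
```

Any header-setting API that only accepts an int (or a caller that casts before setting the header) would therefore either omit or corrupt Content-Length for files above 2 GB while working fine below it.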
[jira] [Created] (HDFS-3789) JournalManager#format() should be able to throw IOException
Ivan Kelly created HDFS-3789:
--------------------------------

Summary: JournalManager#format() should be able to throw IOException
Key: HDFS-3789
URL: https://issues.apache.org/jira/browse/HDFS-3789
Project: Hadoop HDFS
Issue Type: Sub-task
Components: ha, name-node
Affects Versions: 3.0.0
Reporter: Ivan Kelly
Assignee: Ivan Kelly

Currently JournalManager#format cannot throw any exception. As format can fail, we should be able to propagate this failure upwards. Otherwise, format will fail silently, and the admin will start using the cluster with a failed/unusable journal manager.
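The proposed change amounts to adding a checked exception to the interface method so callers are forced to handle format failure. A simplified sketch (the interface here is reduced to one method; the real JournalManager has more):

```java
import java.io.IOException;

// Sketch of the proposed signature: format() declares IOException,
// so failure propagates instead of passing silently.
interface JournalManagerSketch {
    void format() throws IOException;
}

public class FormatDemo {
    public static void main(String[] args) {
        JournalManagerSketch broken = () -> {
            throw new IOException("journal storage unavailable");
        };
        try {
            broken.format();
            System.out.println("format ok");
        } catch (IOException e) {
            // The admin-facing tool can now abort instead of continuing
            // with a failed/unusable journal manager.
            System.out.println("format failed: " + e.getMessage());
        }
    }
}
```

Since implementations previously could not throw a checked exception, this is a source-incompatible change for out-of-tree JournalManager implementations, which is presumably why it is filed as its own sub-task.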
[jira] [Updated] (HDFS-3789) JournalManager#format() should be able to throw IOException
[ https://issues.apache.org/jira/browse/HDFS-3789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ivan Kelly updated HDFS-3789:
-----------------------------

Status: Patch Available (was: Open)

JournalManager#format() should be able to throw IOException
-----------------------------------------------------------

Key: HDFS-3789
URL: https://issues.apache.org/jira/browse/HDFS-3789
Project: Hadoop HDFS
Issue Type: Sub-task
Components: ha, name-node
Affects Versions: 3.0.0
Reporter: Ivan Kelly
Assignee: Ivan Kelly
Attachments: HDFS-3789.diff

Currently JournalManager#format cannot throw any exception. As format can fail, we should be able to propagate this failure upwards. Otherwise, format will fail silently, and the admin will start using the cluster with a failed/unusable journal manager.
[jira] [Updated] (HDFS-3789) JournalManager#format() should be able to throw IOException
[ https://issues.apache.org/jira/browse/HDFS-3789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ivan Kelly updated HDFS-3789:
-----------------------------

Attachment: HDFS-3789.diff

JournalManager#format() should be able to throw IOException
-----------------------------------------------------------

Key: HDFS-3789
URL: https://issues.apache.org/jira/browse/HDFS-3789
Project: Hadoop HDFS
Issue Type: Sub-task
Components: ha, name-node
Affects Versions: 3.0.0
Reporter: Ivan Kelly
Assignee: Ivan Kelly
Attachments: HDFS-3789.diff

Currently JournalManager#format cannot throw any exception. As format can fail, we should be able to propagate this failure upwards. Otherwise, format will fail silently, and the admin will start using the cluster with a failed/unusable journal manager.
[jira] [Commented] (HDFS-3672) Expose disk-location information for blocks to enable better scheduling
[ https://issues.apache.org/jira/browse/HDFS-3672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13433124#comment-13433124 ]

Arun C Murthy commented on HDFS-3672:
-------------------------------------

I'd really encourage you to put this into the DataNode and throw an UnsupportedOperationException rather than merely do this via a client-side config.

Expose disk-location information for blocks to enable better scheduling
-----------------------------------------------------------------------

Key: HDFS-3672
URL: https://issues.apache.org/jira/browse/HDFS-3672
Project: Hadoop HDFS
Issue Type: Improvement
Affects Versions: 2.0.0-alpha
Reporter: Andrew Wang
Assignee: Andrew Wang
Attachments: design-doc-v1.pdf, design-doc-v2.pdf, hdfs-3672-1.patch, hdfs-3672-2.patch, hdfs-3672-3.patch, hdfs-3672-4.patch, hdfs-3672-5.patch, hdfs-3672-6.patch, hdfs-3672-7.patch, hdfs-3672-8.patch

Currently, HDFS exposes on which datanodes a block resides, which allows clients to make scheduling decisions for locality and load balancing. Extending this to also expose on which disk on a datanode a block resides would enable even better scheduling, on a per-disk rather than coarse per-datanode basis. This API would likely look similar to FileSystem#getFileBlockLocations, but also involve a series of RPCs to the responsible datanodes to determine disk ids.
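A rough sketch of the kind of data the extended API would return. These types are hypothetical, for illustration only; the actual shapes proposed in the attached design docs may differ.

```java
// Hypothetical shapes: a block location augmented with an opaque
// per-datanode volume (disk) identifier for each replica.
public class DiskLocationSketch {
    /** Opaque identifier for a disk (volume) on a datanode. */
    static final class VolumeId {
        final String id;
        VolumeId(String id) { this.id = id; }
    }

    /** A block location extended with the volume holding each replica. */
    static final class BlockDiskLocation {
        final String[] hosts;      // parallel arrays: the replica on
        final VolumeId[] volumes;  // hosts[i] lives on volumes[i]
        BlockDiskLocation(String[] hosts, VolumeId[] volumes) {
            this.hosts = hosts;
            this.volumes = volumes;
        }
    }

    public static void main(String[] args) {
        BlockDiskLocation loc = new BlockDiskLocation(
            new String[] {"dn1", "dn2"},
            new VolumeId[] {new VolumeId("dn1-disk3"), new VolumeId("dn2-disk0")});
        // A scheduler can now spread concurrent reads across distinct
        // disks, not just distinct datanodes.
        System.out.println(loc.hosts[0] + " -> " + loc.volumes[0].id);
    }
}
```

Keeping the volume id opaque (rather than exposing mount paths) is what makes the DataNode-side implementation, as Arun suggests, free to change its storage layout without breaking clients.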
[jira] [Commented] (HDFS-3789) JournalManager#format() should be able to throw IOException
[ https://issues.apache.org/jira/browse/HDFS-3789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13433126#comment-13433126 ]

Hadoop QA commented on HDFS-3789:
---------------------------------

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12540641/HDFS-3789.diff
against trunk revision .

+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 1 new or modified test files.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 eclipse:eclipse. The patch built with eclipse:eclipse.
-1 findbugs. The patch appears to introduce 2 new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs hadoop-hdfs-project/hadoop-hdfs/src/contrib/bkjournal:
org.apache.hadoop.hdfs.TestDFSClientRetries
org.apache.hadoop.hdfs.TestFileAppend4
+1 contrib tests. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/2989//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/2989//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/2989//console

This message is automatically generated.

JournalManager#format() should be able to throw IOException
-----------------------------------------------------------

Key: HDFS-3789
URL: https://issues.apache.org/jira/browse/HDFS-3789
Project: Hadoop HDFS
Issue Type: Sub-task
Components: ha, name-node
Affects Versions: 3.0.0
Reporter: Ivan Kelly
Assignee: Ivan Kelly
Attachments: HDFS-3789.diff

Currently JournalManager#format cannot throw any exception. As format can fail, we should be able to propagate this failure upwards. Otherwise, format will fail silently, and the admin will start using the cluster with a failed/unusable journal manager.
[jira] [Updated] (HDFS-3788) distcp can't copy large files using webhdfs due to missing Content-Length header
[ https://issues.apache.org/jira/browse/HDFS-3788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Lowe updated HDFS-3788:
-----------------------------

Affects Version/s: 0.23.3

This affects 0.23 as well.

distcp can't copy large files using webhdfs due to missing Content-Length header
--------------------------------------------------------------------------------

Key: HDFS-3788
URL: https://issues.apache.org/jira/browse/HDFS-3788
Project: Hadoop HDFS
Issue Type: Bug
Components: webhdfs
Affects Versions: 0.23.3, 2.0.0-alpha
Reporter: Eli Collins
Priority: Critical
Attachments: distcp-webhdfs-errors.txt

The following command fails when data1 contains a 3gb file. It passes when using hftp or when the directory just contains smaller (2gb) files, so it looks like a webhdfs issue with large files.

{{hadoop distcp webhdfs://eli-thinkpad:50070/user/eli/data1 hdfs://localhost:8020/user/eli/data2}}
[jira] [Commented] (HDFS-3150) Add option for clients to contact DNs via hostname
[ https://issues.apache.org/jira/browse/HDFS-3150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13433209#comment-13433209 ]

Daryn Sharp commented on HDFS-3150:
-----------------------------------

Question: should we consider tying this and the use_ip config together? I think that if you need host names for multihoming you probably need host names for everything. Does this even work if use_ip is true (the default value)?

Add option for clients to contact DNs via hostname
--------------------------------------------------

Key: HDFS-3150
URL: https://issues.apache.org/jira/browse/HDFS-3150
Project: Hadoop HDFS
Issue Type: New Feature
Components: data-node, hdfs client
Affects Versions: 1.0.0, 2.0.0-alpha
Reporter: Eli Collins
Assignee: Eli Collins
Fix For: 1.1.0
Attachments: hdfs-3150-b1.txt, hdfs-3150-b1.txt, hdfs-3150.txt, hdfs-3150.txt

The DN listens on multiple IP addresses (the default {{dfs.datanode.address}} is the wildcard); however, per HADOOP-6867, only the source address (IP) of the registration is given to clients. HADOOP-985 made clients access datanodes by IP primarily to avoid the latency of a DNS lookup. This had the side effect of breaking DN multihoming (the client cannot route the IP exposed by the NN if the DN registers with an interface that has a cluster-private IP). To fix this, let's add back the option for Datanodes to be accessed by hostname. This can be done by:
# Modifying the primary field of the Datanode descriptor to be the hostname, or
# Modifying client-to-Datanode and Datanode-to-Datanode access to use the hostname field instead of the IP

Approach #2 does not require an incompatible client protocol change and is much less invasive. It minimizes the scope of modification to just the places where clients and Datanodes connect, vs. changing all uses of Datanode identifiers.

New client and Datanode configuration options are introduced:
- {{dfs.client.use.datanode.hostname}} indicates all client-to-datanode connections should use the datanode hostname (as clients outside the cluster may not be able to route the IP)
- {{dfs.datanode.use.datanode.hostname}} indicates whether Datanodes should use hostnames when connecting to other Datanodes for data transfer

If the configuration options are not used, there is no change in the current behavior.
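The choice the two new flags introduce can be sketched as a single selection point when building the connect address. The helper below is illustrative, not the actual DFSClient code:

```java
// Sketch of the address-selection the flags imply: connect to a
// datanode by hostname (resolvable/routable from the client's side)
// or by the registered IP (cheap, no DNS lookup, but possibly
// cluster-private and unroutable for external clients).
public class DatanodeAddressChooser {
    static String connectAddress(String hostname, String ip, boolean useHostname) {
        // useHostname corresponds to dfs.client.use.datanode.hostname=true
        return useHostname ? hostname : ip;
    }

    public static void main(String[] args) {
        // An external client with the flag on connects via DNS name:
        System.out.println(connectAddress("dn1.example.com", "10.0.0.5", true));
        // With the flag off (the default), behavior is unchanged:
        System.out.println(connectAddress("dn1.example.com", "10.0.0.5", false));
    }
}
```

This also shows why the change is non-invasive: existing deployments that never set the flag keep taking the IP branch.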
[jira] [Commented] (HDFS-3788) distcp can't copy large files using webhdfs due to missing Content-Length header
[ https://issues.apache.org/jira/browse/HDFS-3788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13433211#comment-13433211 ]

Jason Lowe commented on HDFS-3788:
----------------------------------

A -get of a large file also fails, but it works on smaller files:

{noformat}
$ hadoop fs -ls bigfile
Found 1 items
-rw-r--r--   3 someuser hdfs 3246391296 2012-08-13 15:04 bigfile
$ hadoop fs -get webhdfs://clusternn:50070/user/someuser/bigfile
get: Content-Length header is missing
{noformat}

distcp can't copy large files using webhdfs due to missing Content-Length header
--------------------------------------------------------------------------------

Key: HDFS-3788
URL: https://issues.apache.org/jira/browse/HDFS-3788
Project: Hadoop HDFS
Issue Type: Bug
Components: webhdfs
Affects Versions: 0.23.3, 2.0.0-alpha
Reporter: Eli Collins
Priority: Critical
Attachments: distcp-webhdfs-errors.txt

The following command fails when data1 contains a 3gb file. It passes when using hftp or when the directory just contains smaller (2gb) files, so it looks like a webhdfs issue with large files.

{{hadoop distcp webhdfs://eli-thinkpad:50070/user/eli/data1 hdfs://localhost:8020/user/eli/data2}}
[jira] [Commented] (HDFS-3788) distcp can't copy large files using webhdfs due to missing Content-Length header
[ https://issues.apache.org/jira/browse/HDFS-3788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13433254#comment-13433254 ]

Daryn Sharp commented on HDFS-3788:
-----------------------------------

The problem is complex to support multiple grid versions:
* you need either the content-length or chunking to reliably know when the file has been fully read
* if the response isn't chunked, and there's no content-length, the client needs to obtain the content-length by other means such as a file stat

Based on a quick glance, it looks like the current streaming servlet is explicitly setting the content-length to 0. (That seems wrong, because it's not an empty file.) The puzzling part is I don't know how it works at all for files either under or over 2GB! Java must be implicitly setting the content-length when the stream is under 2GB.

distcp can't copy large files using webhdfs due to missing Content-Length header
--------------------------------------------------------------------------------

Key: HDFS-3788
URL: https://issues.apache.org/jira/browse/HDFS-3788
Project: Hadoop HDFS
Issue Type: Bug
Components: webhdfs
Affects Versions: 0.23.3, 2.0.0-alpha
Reporter: Eli Collins
Priority: Critical
Attachments: distcp-webhdfs-errors.txt

The following command fails when data1 contains a 3gb file. It passes when using hftp or when the directory just contains smaller (2gb) files, so it looks like a webhdfs issue with large files.

{{hadoop distcp webhdfs://eli-thinkpad:50070/user/eli/data1 hdfs://localhost:8020/user/eli/data2}}
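The invariant behind Daryn's two bullets can be sketched as a completeness check on the reader's side. The method name below is illustrative, not actual WebHDFS client code:

```java
import java.io.IOException;

// Sketch: with a Content-Length, a short read is detectable; without
// one (and without chunked framing), EOF and truncation look identical,
// so the caller must verify the size by other means, e.g. a file stat.
public class LengthCheck {
    /** contentLength < 0 means the header was absent. */
    static void verifyComplete(long contentLength, long bytesRead) throws IOException {
        if (contentLength >= 0 && bytesRead != contentLength) {
            throw new IOException("short read: got " + bytesRead
                + " of " + contentLength + " bytes");
        }
        // Header absent: nothing to check against here.
    }

    public static void main(String[] args) throws IOException {
        verifyComplete(100, 100);   // complete read: no exception
        try {
            verifyComplete(100, 60);
        } catch (IOException e) {
            System.out.println("truncation detected: " + e.getMessage());
        }
        verifyComplete(-1, 60);     // header missing: truncation is invisible
    }
}
```

This is why a servlet that sets Content-Length to 0 (or omits it) is worse than it sounds: it doesn't just lose an optimization, it removes the only end-of-body signal a non-chunked client has.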
[jira] [Created] (HDFS-3790) test_fuse_dfs.c doesn't compile on centos 5
Colin Patrick McCabe created HDFS-3790:
--------------------------------------

Summary: test_fuse_dfs.c doesn't compile on centos 5
Key: HDFS-3790
URL: https://issues.apache.org/jira/browse/HDFS-3790
Project: Hadoop HDFS
Issue Type: Bug
Components: fuse-dfs
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
Priority: Minor

test_fuse_dfs.c uses execvpe, which doesn't exist in the version of glibc shipped on CentOS 5.
[jira] [Updated] (HDFS-3790) test_fuse_dfs.c doesn't compile on centos 5
[ https://issues.apache.org/jira/browse/HDFS-3790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Colin Patrick McCabe updated HDFS-3790:
---------------------------------------

Attachment: HDFS-3790.001.patch

test_fuse_dfs.c doesn't compile on centos 5
-------------------------------------------

Key: HDFS-3790
URL: https://issues.apache.org/jira/browse/HDFS-3790
Project: Hadoop HDFS
Issue Type: Bug
Components: fuse-dfs
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
Priority: Minor
Attachments: HDFS-3790.001.patch

test_fuse_dfs.c uses execvpe, which doesn't exist in the version of glibc shipped on CentOS 5.
[jira] [Updated] (HDFS-3790) test_fuse_dfs.c doesn't compile on centos 5
[ https://issues.apache.org/jira/browse/HDFS-3790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Colin Patrick McCabe updated HDFS-3790:
---------------------------------------

Status: Patch Available (was: Open)

test_fuse_dfs.c doesn't compile on centos 5
-------------------------------------------

Key: HDFS-3790
URL: https://issues.apache.org/jira/browse/HDFS-3790
Project: Hadoop HDFS
Issue Type: Bug
Components: fuse-dfs
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
Priority: Minor
Attachments: HDFS-3790.001.patch

test_fuse_dfs.c uses execvpe, which doesn't exist in the version of glibc shipped on CentOS 5.
[jira] [Commented] (HDFS-3790) test_fuse_dfs.c doesn't compile on centos 5
[ https://issues.apache.org/jira/browse/HDFS-3790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13433361#comment-13433361 ]

Colin Patrick McCabe commented on HDFS-3790:
--------------------------------------------

I tested this on CentOS 5.8. It works and the test passes.

test_fuse_dfs.c doesn't compile on centos 5
-------------------------------------------

Key: HDFS-3790
URL: https://issues.apache.org/jira/browse/HDFS-3790
Project: Hadoop HDFS
Issue Type: Bug
Components: fuse-dfs
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
Priority: Minor
Attachments: HDFS-3790.001.patch

test_fuse_dfs.c uses execvpe, which doesn't exist in the version of glibc shipped on CentOS 5.
[jira] [Commented] (HDFS-3790) test_fuse_dfs.c doesn't compile on centos 5
[ https://issues.apache.org/jira/browse/HDFS-3790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13433374#comment-13433374 ]

Aaron T. Myers commented on HDFS-3790:
--------------------------------------

+1 pending Jenkins. Colin, could you please set the affects/targets versions appropriately? Thanks.

test_fuse_dfs.c doesn't compile on centos 5
-------------------------------------------

Key: HDFS-3790
URL: https://issues.apache.org/jira/browse/HDFS-3790
Project: Hadoop HDFS
Issue Type: Bug
Components: fuse-dfs
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
Priority: Minor
Attachments: HDFS-3790.001.patch

test_fuse_dfs.c uses execvpe, which doesn't exist in the version of glibc shipped on CentOS 5.
[jira] [Commented] (HDFS-3790) test_fuse_dfs.c doesn't compile on centos 5
[ https://issues.apache.org/jira/browse/HDFS-3790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13433376#comment-13433376 ]

Hadoop QA commented on HDFS-3790:
---------------------------------

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12540713/HDFS-3790.001.patch
against trunk revision .

+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 1 new or modified test files.
-1 javac. The patch appears to cause the build to fail.

Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/2990//console

This message is automatically generated.

test_fuse_dfs.c doesn't compile on centos 5
-------------------------------------------

Key: HDFS-3790
URL: https://issues.apache.org/jira/browse/HDFS-3790
Project: Hadoop HDFS
Issue Type: Bug
Components: fuse-dfs
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
Priority: Minor
Attachments: HDFS-3790.001.patch

test_fuse_dfs.c uses execvpe, which doesn't exist in the version of glibc shipped on CentOS 5.
[jira] [Commented] (HDFS-3789) JournalManager#format() should be able to throw IOException
[ https://issues.apache.org/jira/browse/HDFS-3789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13433388#comment-13433388 ]

Todd Lipcon commented on HDFS-3789:
-----------------------------------

+1. I had this same patch pending but couldn't post it due to the JIRA outage the past few days. I will commit this later today.

JournalManager#format() should be able to throw IOException
-----------------------------------------------------------

Key: HDFS-3789
URL: https://issues.apache.org/jira/browse/HDFS-3789
Project: Hadoop HDFS
Issue Type: Sub-task
Components: ha, name-node
Affects Versions: 3.0.0
Reporter: Ivan Kelly
Assignee: Ivan Kelly
Attachments: HDFS-3789.diff

Currently JournalManager#format cannot throw any exception. As format can fail, we should be able to propagate this failure upwards. Otherwise, format will fail silently, and the admin will start using the cluster with a failed/unusable journal manager.
[jira] [Created] (HDFS-3791) Backport HDFS-173 Recursively deleting a directory with millions of files makes NameNode unresponsive for other commands until the deletion completes
Uma Maheswara Rao G created HDFS-3791:
-------------------------------------

Summary: Backport HDFS-173. Recursively deleting a directory with millions of files makes NameNode unresponsive for other commands until the deletion completes
Key: HDFS-3791
URL: https://issues.apache.org/jira/browse/HDFS-3791
Project: Hadoop HDFS
Issue Type: Bug
Components: name-node
Affects Versions: 1.0.0
Reporter: Uma Maheswara Rao G
Assignee: Uma Maheswara Rao G

Backport HDFS-173. See the [comment|https://issues.apache.org/jira/browse/HDFS-2815?focusedCommentId=13422007&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13422007] for more details.
[jira] [Commented] (HDFS-3719) Re-enable append-related tests in TestFileConcurrentReader
[ https://issues.apache.org/jira/browse/HDFS-3719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13433400#comment-13433400 ]

Aaron T. Myers commented on HDFS-3719:
--------------------------------------

OK, since it appears there's more to these failing tests than a simple fix, I'm going to go ahead and revert this change to re-enable the tests.

Re-enable append-related tests in TestFileConcurrentReader
----------------------------------------------------------

Key: HDFS-3719
URL: https://issues.apache.org/jira/browse/HDFS-3719
Project: Hadoop HDFS
Issue Type: Bug
Components: test
Affects Versions: 2.0.0-alpha
Reporter: Andrew Wang
Assignee: Andrew Wang
Fix For: 2.2.0-alpha
Attachments: hdfs-3719-1.patch

Both of these tests are disabled. We should figure out what append functionality we need to make the tests work again, and re-enable them.

{code}
// fails due to issue w/append, disable
@Ignore
@Test
public void _testUnfinishedBlockCRCErrorTransferToAppend() throws IOException {
  runTestUnfinishedBlockCRCError(true, SyncType.APPEND, DEFAULT_WRITE_SIZE);
}

// fails due to issue w/append, disable
@Ignore
@Test
public void _testUnfinishedBlockCRCErrorNormalTransferAppend() throws IOException {
  runTestUnfinishedBlockCRCError(false, SyncType.APPEND, DEFAULT_WRITE_SIZE);
}
{code}
[jira] [Reopened] (HDFS-3719) Re-enable append-related tests in TestFileConcurrentReader
[ https://issues.apache.org/jira/browse/HDFS-3719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aaron T. Myers reopened HDFS-3719:
----------------------------------

Re-enable append-related tests in TestFileConcurrentReader
----------------------------------------------------------

Key: HDFS-3719
URL: https://issues.apache.org/jira/browse/HDFS-3719
Project: Hadoop HDFS
Issue Type: Bug
Components: test
Affects Versions: 2.0.0-alpha
Reporter: Andrew Wang
Assignee: Andrew Wang
Fix For: 2.2.0-alpha
Attachments: hdfs-3719-1.patch

Both of these tests are disabled. We should figure out what append functionality we need to make the tests work again, and re-enable them.

{code}
// fails due to issue w/append, disable
@Ignore
@Test
public void _testUnfinishedBlockCRCErrorTransferToAppend() throws IOException {
  runTestUnfinishedBlockCRCError(true, SyncType.APPEND, DEFAULT_WRITE_SIZE);
}

// fails due to issue w/append, disable
@Ignore
@Test
public void _testUnfinishedBlockCRCErrorNormalTransferAppend() throws IOException {
  runTestUnfinishedBlockCRCError(false, SyncType.APPEND, DEFAULT_WRITE_SIZE);
}
{code}
[jira] [Updated] (HDFS-3719) Re-enable append-related tests in TestFileConcurrentReader
[ https://issues.apache.org/jira/browse/HDFS-3719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aaron T. Myers updated HDFS-3719:
---------------------------------

Fix Version/s: (was: 2.2.0-alpha)

I've just reverted this.

Re-enable append-related tests in TestFileConcurrentReader
----------------------------------------------------------

Key: HDFS-3719
URL: https://issues.apache.org/jira/browse/HDFS-3719
Project: Hadoop HDFS
Issue Type: Bug
Components: test
Affects Versions: 2.0.0-alpha
Reporter: Andrew Wang
Assignee: Andrew Wang
Attachments: hdfs-3719-1.patch

Both of these tests are disabled. We should figure out what append functionality we need to make the tests work again, and re-enable them.

{code}
// fails due to issue w/append, disable
@Ignore
@Test
public void _testUnfinishedBlockCRCErrorTransferToAppend() throws IOException {
  runTestUnfinishedBlockCRCError(true, SyncType.APPEND, DEFAULT_WRITE_SIZE);
}

// fails due to issue w/append, disable
@Ignore
@Test
public void _testUnfinishedBlockCRCErrorNormalTransferAppend() throws IOException {
  runTestUnfinishedBlockCRCError(false, SyncType.APPEND, DEFAULT_WRITE_SIZE);
}
{code}
[jira] [Commented] (HDFS-3719) Re-enable append-related tests in TestFileConcurrentReader
[ https://issues.apache.org/jira/browse/HDFS-3719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13433415#comment-13433415 ] Hudson commented on HDFS-3719: -- Integrated in Hadoop-Hdfs-trunk-Commit #2637 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/2637/]) Revert HDFS-3719. See discussion there and HDFS-3770 for more info. (Revision 1372544) Result = SUCCESS atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1372544 Files : * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestFileConcurrentReader.java Re-enable append-related tests in TestFileConcurrentReader -- Key: HDFS-3719 URL: https://issues.apache.org/jira/browse/HDFS-3719 Project: Hadoop HDFS Issue Type: Bug Components: test Affects Versions: 2.0.0-alpha Reporter: Andrew Wang Assignee: Andrew Wang Attachments: hdfs-3719-1.patch Both of these tests are disabled. We should figure out what append functionality we need to make the tests work again, and reenable them. {code} // fails due to issue w/append, disable @Ignore @Test public void _testUnfinishedBlockCRCErrorTransferToAppend() throws IOException { runTestUnfinishedBlockCRCError(true, SyncType.APPEND, DEFAULT_WRITE_SIZE); } // fails due to issue w/append, disable @Ignore @Test public void _testUnfinishedBlockCRCErrorNormalTransferAppend() throws IOException { runTestUnfinishedBlockCRCError(false, SyncType.APPEND, DEFAULT_WRITE_SIZE); } {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3770) TestFileConcurrentReader#testUnfinishedBlockCRCErrorTransferToAppend failed
[ https://issues.apache.org/jira/browse/HDFS-3770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13433416#comment-13433416 ] Hudson commented on HDFS-3770: -- Integrated in Hadoop-Hdfs-trunk-Commit #2637 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/2637/]) Revert HDFS-3719. See discussion there and HDFS-3770 for more info. (Revision 1372544) Result = SUCCESS atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1372544 Files : * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestFileConcurrentReader.java TestFileConcurrentReader#testUnfinishedBlockCRCErrorTransferToAppend failed --- Key: HDFS-3770 URL: https://issues.apache.org/jira/browse/HDFS-3770 Project: Hadoop HDFS Issue Type: Bug Components: test Affects Versions: 3.0.0 Reporter: Eli Collins TestFileConcurrentReader#testUnfinishedBlockCRCErrorTransferToAppend failed on [a recent job|https://builds.apache.org/job/PreCommit-HDFS-Build/2959]. Looks like a race in the test. The failure is due to a ChecksumException but that's likely due to the DFSOutputstream getting interrupted on close. Looking at the relevant code, waitForAckedSeqno is getting an InterruptedException waiting on dataQueue, looks like there are uses of interrupt where we're not first notifying dataQueue, or waiting for the notifications to be delivered. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3719) Re-enable append-related tests in TestFileConcurrentReader
[ https://issues.apache.org/jira/browse/HDFS-3719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13433419#comment-13433419 ] Hudson commented on HDFS-3719: -- Integrated in Hadoop-Common-trunk-Commit #2572 (See [https://builds.apache.org/job/Hadoop-Common-trunk-Commit/2572/]) Revert HDFS-3719. See discussion there and HDFS-3770 for more info. (Revision 1372544) Result = SUCCESS atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1372544 Files : * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestFileConcurrentReader.java Re-enable append-related tests in TestFileConcurrentReader -- Key: HDFS-3719 URL: https://issues.apache.org/jira/browse/HDFS-3719 Project: Hadoop HDFS Issue Type: Bug Components: test Affects Versions: 2.0.0-alpha Reporter: Andrew Wang Assignee: Andrew Wang Attachments: hdfs-3719-1.patch Both of these tests are disabled. We should figure out what append functionality we need to make the tests work again, and reenable them. {code} // fails due to issue w/append, disable @Ignore @Test public void _testUnfinishedBlockCRCErrorTransferToAppend() throws IOException { runTestUnfinishedBlockCRCError(true, SyncType.APPEND, DEFAULT_WRITE_SIZE); } // fails due to issue w/append, disable @Ignore @Test public void _testUnfinishedBlockCRCErrorNormalTransferAppend() throws IOException { runTestUnfinishedBlockCRCError(false, SyncType.APPEND, DEFAULT_WRITE_SIZE); } {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3770) TestFileConcurrentReader#testUnfinishedBlockCRCErrorTransferToAppend failed
[ https://issues.apache.org/jira/browse/HDFS-3770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13433420#comment-13433420 ] Hudson commented on HDFS-3770: -- Integrated in Hadoop-Common-trunk-Commit #2572 (See [https://builds.apache.org/job/Hadoop-Common-trunk-Commit/2572/]) Revert HDFS-3719. See discussion there and HDFS-3770 for more info. (Revision 1372544) Result = SUCCESS atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1372544 Files : * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestFileConcurrentReader.java TestFileConcurrentReader#testUnfinishedBlockCRCErrorTransferToAppend failed --- Key: HDFS-3770 URL: https://issues.apache.org/jira/browse/HDFS-3770 Project: Hadoop HDFS Issue Type: Bug Components: test Affects Versions: 3.0.0 Reporter: Eli Collins TestFileConcurrentReader#testUnfinishedBlockCRCErrorTransferToAppend failed on [a recent job|https://builds.apache.org/job/PreCommit-HDFS-Build/2959]. Looks like a race in the test. The failure is due to a ChecksumException but that's likely due to the DFSOutputstream getting interrupted on close. Looking at the relevant code, waitForAckedSeqno is getting an InterruptedException waiting on dataQueue, looks like there are uses of interrupt where we're not first notifying dataQueue, or waiting for the notifications to be delivered. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3788) distcp can't copy large files using webhdfs due to missing Content-Length header
[ https://issues.apache.org/jira/browse/HDFS-3788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13433437#comment-13433437 ] Daryn Sharp commented on HDFS-3788: --- I'll add that if you just remove the content-length check, and the response is not chunked, the http timeouts will abort the download. distcp can't copy large files using webhdfs due to missing Content-Length header Key: HDFS-3788 URL: https://issues.apache.org/jira/browse/HDFS-3788 Project: Hadoop HDFS Issue Type: Bug Components: webhdfs Affects Versions: 0.23.3, 2.0.0-alpha Reporter: Eli Collins Priority: Critical Attachments: distcp-webhdfs-errors.txt The following command fails when data1 contains a 3gb file. It passes when using hftp or when the directory just contains smaller (2gb) files, so looks like a webhdfs issue with large files. {{hadoop distcp webhdfs://eli-thinkpad:50070/user/eli/data1 hdfs://localhost:8020/user/eli/data2}} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
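Daryn's comment above is the crux of HDFS-3788: with neither a Content-Length header nor chunked transfer framing, a connection aborted mid-transfer is indistinguishable from a normal end-of-stream, so the client silently keeps a truncated file. The sketch below is not WebHDFS's actual code; it is a self-contained illustration of why a declared length lets a reader fail loudly instead:

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

// Sketch only: with no Content-Length and no chunked framing, a stream
// cut short by an HTTP timeout looks exactly like normal EOF. Knowing
// the expected length up front turns silent truncation into an error.
public class LengthCheckedCopy {

    // Drain the stream and verify the byte count against the declared length.
    static long copyChecked(InputStream in, long expectedLength) throws IOException {
        byte[] buf = new byte[8192];
        long total = 0;
        int n;
        while ((n = in.read(buf)) != -1) {
            total += n;
        }
        if (total != expectedLength) {
            throw new EOFException(
                "expected " + expectedLength + " bytes but stream ended after " + total);
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        byte[] full = new byte[1000];

        // Complete transfer: passes the check.
        copyChecked(new ByteArrayInputStream(full), 1000);

        // Truncated transfer (e.g. timeout mid-download): detected only
        // because the expected length was known.
        try {
            copyChecked(new ByteArrayInputStream(full, 0, 400), 1000);
        } catch (EOFException e) {
            System.out.println("truncation detected: " + e.getMessage());
        }
    }
}
```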
[jira] [Updated] (HDFS-3790) test_fuse_dfs.c doesn't compile on centos 5
[ https://issues.apache.org/jira/browse/HDFS-3790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-3790: --- Target Version/s: 2.2.0-alpha Affects Version/s: 2.2.0-alpha test_fuse_dfs.c doesn't compile on centos 5 --- Key: HDFS-3790 URL: https://issues.apache.org/jira/browse/HDFS-3790 Project: Hadoop HDFS Issue Type: Bug Components: fuse-dfs Affects Versions: 2.2.0-alpha Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Priority: Minor Attachments: HDFS-3790.001.patch test_fuse_dfs.c uses execvpe, which doesn't exist in the version of glibc shipped on CentOS 5. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-3789) JournalManager#format() should be able to throw IOException
[ https://issues.apache.org/jira/browse/HDFS-3789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon updated HDFS-3789: -- Resolution: Fixed Fix Version/s: 3.0.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Committed to trunk. Thanks, Ivan. JournalManager#format() should be able to throw IOException --- Key: HDFS-3789 URL: https://issues.apache.org/jira/browse/HDFS-3789 Project: Hadoop HDFS Issue Type: Sub-task Components: ha, name-node Affects Versions: 3.0.0 Reporter: Ivan Kelly Assignee: Ivan Kelly Fix For: 3.0.0 Attachments: HDFS-3789.diff Currently JournalManager#format cannot throw any exception. As format can fail, we should be able to propagate this failure upwards. Otherwise, format will fail silently, and the admin will start using the cluster with a failed/unusable journal manager. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
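The fix the issue describes is a signature change: letting format() declare IOException so a failure surfaces to the caller instead of being swallowed. The following is a minimal stand-alone sketch of that propagation pattern (simplified stand-ins, not the real JournalManager/JournalSet classes from the patch):

```java
import java.io.IOException;

// Illustrative sketch: simplified stand-ins for the JournalManager and
// JournalSet classes discussed in HDFS-3789.
interface Journal {
    // Declaring IOException lets a failed format surface to the caller
    // instead of failing silently.
    void format() throws IOException;
}

class FailingJournal implements Journal {
    public void format() throws IOException {
        throw new IOException("cannot format: storage directory not writable");
    }
}

class JournalSetSketch {
    private final Journal[] journals;
    JournalSetSketch(Journal... journals) { this.journals = journals; }

    // Propagate the failure upwards rather than swallowing it, so the
    // admin never starts a cluster on a failed/unusable journal.
    void formatAll() throws IOException {
        for (Journal j : journals) {
            j.format();
        }
    }
}

public class FormatDemo {
    public static void main(String[] args) {
        JournalSetSketch set = new JournalSetSketch(new FailingJournal());
        boolean failed = false;
        try {
            set.formatAll();
        } catch (IOException e) {
            failed = true;  // the failure is visible, not silent
        }
        System.out.println(failed ? "format failure propagated" : "format failure swallowed");
    }
}
```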
[jira] [Commented] (HDFS-3770) TestFileConcurrentReader#testUnfinishedBlockCRCErrorTransferToAppend failed
[ https://issues.apache.org/jira/browse/HDFS-3770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13433458#comment-13433458 ] Hudson commented on HDFS-3770: -- Integrated in Hadoop-Mapreduce-trunk-Commit #2593 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/2593/]) Revert HDFS-3719. See discussion there and HDFS-3770 for more info. (Revision 1372544) Result = FAILURE atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1372544 Files : * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestFileConcurrentReader.java TestFileConcurrentReader#testUnfinishedBlockCRCErrorTransferToAppend failed --- Key: HDFS-3770 URL: https://issues.apache.org/jira/browse/HDFS-3770 Project: Hadoop HDFS Issue Type: Bug Components: test Affects Versions: 3.0.0 Reporter: Eli Collins TestFileConcurrentReader#testUnfinishedBlockCRCErrorTransferToAppend failed on [a recent job|https://builds.apache.org/job/PreCommit-HDFS-Build/2959]. Looks like a race in the test. The failure is due to a ChecksumException but that's likely due to the DFSOutputstream getting interrupted on close. Looking at the relevant code, waitForAckedSeqno is getting an InterruptedException waiting on dataQueue, looks like there are uses of interrupt where we're not first notifying dataQueue, or waiting for the notifications to be delivered. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3719) Re-enable append-related tests in TestFileConcurrentReader
[ https://issues.apache.org/jira/browse/HDFS-3719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13433457#comment-13433457 ] Hudson commented on HDFS-3719: -- Integrated in Hadoop-Mapreduce-trunk-Commit #2593 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/2593/]) Revert HDFS-3719. See discussion there and HDFS-3770 for more info. (Revision 1372544) Result = FAILURE atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1372544 Files : * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestFileConcurrentReader.java Re-enable append-related tests in TestFileConcurrentReader -- Key: HDFS-3719 URL: https://issues.apache.org/jira/browse/HDFS-3719 Project: Hadoop HDFS Issue Type: Bug Components: test Affects Versions: 2.0.0-alpha Reporter: Andrew Wang Assignee: Andrew Wang Attachments: hdfs-3719-1.patch Both of these tests are disabled. We should figure out what append functionality we need to make the tests work again, and reenable them. {code} // fails due to issue w/append, disable @Ignore @Test public void _testUnfinishedBlockCRCErrorTransferToAppend() throws IOException { runTestUnfinishedBlockCRCError(true, SyncType.APPEND, DEFAULT_WRITE_SIZE); } // fails due to issue w/append, disable @Ignore @Test public void _testUnfinishedBlockCRCErrorNormalTransferAppend() throws IOException { runTestUnfinishedBlockCRCError(false, SyncType.APPEND, DEFAULT_WRITE_SIZE); } {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3789) JournalManager#format() should be able to throw IOException
[ https://issues.apache.org/jira/browse/HDFS-3789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13433462#comment-13433462 ] Hudson commented on HDFS-3789: -- Integrated in Hadoop-Hdfs-trunk-Commit #2638 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/2638/]) HDFS-3789. JournalManager#format() should be able to throw IOException. Contributed by Ivan Kelly. (Revision 1372566) Result = SUCCESS todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1372566 Files : * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/contrib/bkjournal/src/main/java/org/apache/hadoop/contrib/bkjournal/BookKeeperJournalManager.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FileJournalManager.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/JournalManager.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/JournalSet.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestGenericJournalConf.java JournalManager#format() should be able to throw IOException --- Key: HDFS-3789 URL: https://issues.apache.org/jira/browse/HDFS-3789 Project: Hadoop HDFS Issue Type: Sub-task Components: ha, name-node Affects Versions: 3.0.0 Reporter: Ivan Kelly Assignee: Ivan Kelly Fix For: 3.0.0 Attachments: HDFS-3789.diff Currently JournalManager#format cannot throw any exception. As format can fail, we should be able to propogate this failure upwards. Otherwise, format will fail silently, and the admin will start using the cluster with a failed/unusable journal manager. -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3771) Namenode can't restart due to corrupt edit logs, timing issue with shutdown and edit log rolling
[ https://issues.apache.org/jira/browse/HDFS-3771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13433466#comment-13433466 ] Todd Lipcon commented on HDFS-3771: --- Hey Patrick. I think this behavior might have been fixed in 2.0.0 already -- the empty file should get properly ignored and the NN should start up. Perhaps you can instigate this failure again by adding System.exit(0) right before where {{START_LOG_SEGMENT}} is logged in {{startLogSegmentAndWriteHeaderTxn}}. That would allow you to see what the right recovery steps are. The issue seems to be described in HDFS-2093... I think the following comment may be relevant: {quote} Thus in the situation above, where the only log we have is this corrupted one, it will refuse to let the NN start, with a nice message explaining that the logs starting at this txid are corrupt with no txns. The operator can then double-check whether a different storage drive which possibly went missing might have better logs, etc, before starting NN. {quote} Looking at your logs, it seems like you have only one edits directory. So the above probably applies, and you could successfully start by removing that last (empty) log segment. bq. The larger concern should be for data loss. Based on what happened in this case it appears that any pending txids would be lost, unless the edit logs could be manually repaired. The filesystem would be intact, only minus the changes from the outstanding edit events, does that sound correct? Only in-flight transactions could be lost -- ie those that were never ACKed to a client. Anything that has been ACKed would have been fsynced to the log, and thus not lost. So, after inspecting the segment to make sure there are truly no transactions, you should be able to remove it and start with no data loss or corruption whatsoever. 
Namenode can't restart due to corrupt edit logs, timing issue with shutdown and edit log rolling Key: HDFS-3771 URL: https://issues.apache.org/jira/browse/HDFS-3771 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: 0.23.3, 2.0.0-alpha Environment: QE, 20 node Federated cluster with 3 NNs and 15 DNs, using Kerberos based security Reporter: patrick white Priority: Critical Our 0.23.3 nightly HDFS regression suite encountered a particularly nasty issue recently, which resulted in the cluster's default Namenode being unable to restart; this was on a 20 node Federated cluster with security. The cause appears to be that the NN was just starting to roll its edit log when a shutdown occurred; the shutdown was intentional, to restart the cluster as part of an automated test. The tests that were running do not appear to be the issue in themselves; the cluster was just wrapping up an adminReport subset, and this failure case has not reproduced so far, nor was it failing previously. It looks like a chance occurrence of sending the shutdown just as the edit log roll began. From the NN log, the following sequence is noted: 1. an InvalidateBlocks operation had completed 2. FSNamesystem: Roll Edit Log from [Secondary Namenode IPaddr] 3. FSEditLog: Ending log segment 23963 4. FSEditLog: Starting log segment at 23967 5. NameNode: SHUTDOWN_MSG = the NN shuts down and then is restarted... 6. FSImageTransactionalStorageInspector: Logs beginning at txid 23967 were all in-progress 7. FSImageTransactionalStorageInspector: Marking log at /grid/[PATH]/edits_inprogress_0023967 as corrupt since it has no transactions in it. 8. 
NameNode: Exception in namenode join [main] java.lang.IllegalStateException: No non-corrupt logs for txid 23967 => NN start attempts continue to cycle trying to restart but can't, failing on the same exception due to the lack of non-corrupt edit logs. If the observations are correct and the issue is from a shutdown happening as edit logs are rolling, does the NN have an equivalent to the conventional fs 'sync' blocking action that should be called, or perhaps has a timing hole? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
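Todd's durability argument above rests on one ordering invariant: a transaction is ACKed to the client only after it has been fsynced to the log, so a crash can lose only in-flight transactions that no client ever saw acknowledged. A toy model of that invariant (my own illustration, not NameNode code):

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the ordering described in the comment: ack only after
// fsync, so every ACKed transaction survives a crash, and only
// in-flight (never-ACKed) transactions can be lost.
public class FsyncBeforeAck {

    final List<String> inMemory = new ArrayList<>(); // buffered, not yet durable
    final List<String> onDisk = new ArrayList<>();   // survives the "crash"
    final List<String> acked = new ArrayList<>();    // acknowledged to clients

    void log(String txn) { inMemory.add(txn); }

    // Simulated fsync: everything buffered becomes durable.
    void fsync() {
        onDisk.addAll(inMemory);
        inMemory.clear();
    }

    // The invariant: fsync strictly before ack.
    void logAndAck(String txn) {
        log(txn);
        fsync();
        acked.add(txn);
    }

    public static void main(String[] args) {
        FsyncBeforeAck nn = new FsyncBeforeAck();
        nn.logAndAck("mkdir /a");
        nn.logAndAck("create /a/f");
        nn.log("delete /a/f");   // in-flight: logged, never fsynced or ACKed

        // "Crash": inMemory is lost; onDisk is what recovery sees.
        System.out.println("every ACKed txn durable: "
            + nn.onDisk.containsAll(nn.acked));   // true
    }
}
```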
[jira] [Commented] (HDFS-3789) JournalManager#format() should be able to throw IOException
[ https://issues.apache.org/jira/browse/HDFS-3789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13433497#comment-13433497 ] Hudson commented on HDFS-3789: -- Integrated in Hadoop-Mapreduce-trunk-Commit #2594 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/2594/]) HDFS-3789. JournalManager#format() should be able to throw IOException. Contributed by Ivan Kelly. (Revision 1372566) Result = FAILURE todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1372566 Files : * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/contrib/bkjournal/src/main/java/org/apache/hadoop/contrib/bkjournal/BookKeeperJournalManager.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FileJournalManager.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/JournalManager.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/JournalSet.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestGenericJournalConf.java JournalManager#format() should be able to throw IOException --- Key: HDFS-3789 URL: https://issues.apache.org/jira/browse/HDFS-3789 Project: Hadoop HDFS Issue Type: Sub-task Components: ha, name-node Affects Versions: 3.0.0 Reporter: Ivan Kelly Assignee: Ivan Kelly Fix For: 3.0.0 Attachments: HDFS-3789.diff Currently JournalManager#format cannot throw any exception. As format can fail, we should be able to propogate this failure upwards. Otherwise, format will fail silently, and the admin will start using the cluster with a failed/unusable journal manager. -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3672) Expose disk-location information for blocks to enable better scheduling
[ https://issues.apache.org/jira/browse/HDFS-3672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13433506#comment-13433506 ] Aaron T. Myers commented on HDFS-3672: -- bq. I'd really encourage you to put this into the DataNode and throw an UnsupportedOperationException rather than merely do this via a client-side config. That's fine by me. I don't feel super strongly about this, so if this is your preference Arun, let's go with that. Expose disk-location information for blocks to enable better scheduling --- Key: HDFS-3672 URL: https://issues.apache.org/jira/browse/HDFS-3672 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 2.0.0-alpha Reporter: Andrew Wang Assignee: Andrew Wang Attachments: design-doc-v1.pdf, design-doc-v2.pdf, hdfs-3672-1.patch, hdfs-3672-2.patch, hdfs-3672-3.patch, hdfs-3672-4.patch, hdfs-3672-5.patch, hdfs-3672-6.patch, hdfs-3672-7.patch, hdfs-3672-8.patch Currently, HDFS exposes on which datanodes a block resides, which allows clients to make scheduling decisions for locality and load balancing. Extending this to also expose on which disk on a datanode a block resides would enable even better scheduling, on a per-disk rather than coarse per-datanode basis. This API would likely look similar to Filesystem#getFileBlockLocations, but also involve a series of RPCs to the responsible datanodes to determine disk ids. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
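To make the proposal concrete, here is a hypothetical shape for the API described above: per-block host locations (as Filesystem#getFileBlockLocations already returns) enriched with a per-host disk identifier obtained by a second round of RPCs to the datanodes. All names here are invented for the sketch; the real API is defined in the attached patches and design docs.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical API shape only (names invented for this sketch, not
// taken from the HDFS-3672 patches): extend per-block host locations
// with a per-host volume/disk id resolved from the datanodes.
public class DiskLocationSketch {

    record BlockLocation(String blockId, List<String> hosts) {}

    // blockId -> (host -> diskId), as a per-disk scheduler would consume it
    record BlockDiskLocation(String blockId, Map<String, Integer> diskByHost) {}

    // Stand-in for the extra RPCs to each datanode that would resolve
    // which local disk holds each replica; answers are faked here.
    static BlockDiskLocation resolveDisks(BlockLocation loc,
                                          Map<String, Integer> fakeRpcAnswers) {
        Map<String, Integer> diskByHost = new HashMap<>();
        for (String host : loc.hosts()) {
            // -1 marks a host whose disk id could not be determined
            diskByHost.put(host, fakeRpcAnswers.getOrDefault(host, -1));
        }
        return new BlockDiskLocation(loc.blockId(), diskByHost);
    }

    public static void main(String[] args) {
        BlockLocation loc = new BlockLocation("blk_1", List.of("dn1", "dn2"));
        BlockDiskLocation resolved = resolveDisks(loc, Map.of("dn1", 0, "dn2", 3));
        System.out.println(resolved.diskByHost());  // e.g. {dn1=0, dn2=3}
    }
}
```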
[jira] [Updated] (HDFS-3765) Namenode INITIALIZESHAREDEDITS should be able to initialize all shared storages
[ https://issues.apache.org/jira/browse/HDFS-3765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon updated HDFS-3765: -- Attachment: hdfs-3765.txt Trying patch upload again... this applies clean on trunk for me. Namenode INITIALIZESHAREDEDITS should be able to initialize all shared storages --- Key: HDFS-3765 URL: https://issues.apache.org/jira/browse/HDFS-3765 Project: Hadoop HDFS Issue Type: Improvement Components: ha Affects Versions: 2.1.0-alpha, 3.0.0 Reporter: Vinay Assignee: Vinay Attachments: HDFS-3765.patch, HDFS-3765.patch, HDFS-3765.patch, hdfs-3765.txt, hdfs-3765.txt, hdfs-3765.txt Currently, NameNode INITIALIZESHAREDEDITS provides ability to copy the edits files to file schema based shared storages when moving cluster from Non-HA environment to HA enabled environment. This Jira focuses on the following * Generalizing the logic of copying the edits to new shared storage so that any schema based shared storage can initialized for HA cluster. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-2330) In NNStorage.java, IOExceptions of stream closures can mask root exceptions.
[ https://issues.apache.org/jira/browse/HDFS-2330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon updated HDFS-2330: -- Fix Version/s: 2.2.0-alpha Backported this small fix to branch-2 to avoid some merge conflicts in further backports. In NNStorage.java, IOExceptions of stream closures can mask root exceptions. - Key: HDFS-2330 URL: https://issues.apache.org/jira/browse/HDFS-2330 Project: Hadoop HDFS Issue Type: Sub-task Components: name-node Affects Versions: 0.24.0 Reporter: Uma Maheswara Rao G Assignee: Uma Maheswara Rao G Fix For: 3.0.0, 2.2.0-alpha Attachments: HDFS-2330.patch, HDFS-2330.patch In NNStorage.java: There are many stream closures in finally block. There is a chance that they can mask the root exceptions. So, better to follow the pattern like below: {code} try{ stream.close(); stream =null; } finally{ IOUtils.cleanup(LOG, stream); } {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
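The masking problem in HDFS-2330 is worth seeing end to end: in Java, an exception thrown from a finally block replaces whatever exception was already in flight. The demo below is self-contained, with a small closeQuietly helper standing in for Hadoop's IOUtils.cleanup(LOG, stream):

```java
import java.io.Closeable;
import java.io.IOException;

// Self-contained illustration of the masking bug and the fix pattern
// from the issue; closeQuietly approximates Hadoop's IOUtils.cleanup.
public class CloseMaskingDemo {

    static void closeQuietly(Closeable c) {
        if (c == null) return;
        try {
            c.close();
        } catch (IOException e) {
            // log and swallow: a close() failure must never mask the root cause
        }
    }

    static class FailingStream implements Closeable {
        public void close() throws IOException {
            throw new IOException("close failed");
        }
    }

    // BAD: if the body throws and close() in finally also throws, the
    // close() exception replaces the root exception.
    static void badPattern() throws IOException {
        FailingStream stream = new FailingStream();
        try {
            throw new IOException("root cause");
        } finally {
            stream.close();   // masks "root cause" with "close failed"
        }
    }

    // GOOD: quiet cleanup in finally, so the root exception propagates.
    static void goodPattern() throws IOException {
        FailingStream stream = new FailingStream();
        try {
            throw new IOException("root cause");
        } finally {
            closeQuietly(stream);   // "root cause" survives untouched
        }
    }

    public static void main(String[] args) {
        String bad = "", good = "";
        try { badPattern(); } catch (IOException e) { bad = e.getMessage(); }
        try { goodPattern(); } catch (IOException e) { good = e.getMessage(); }
        System.out.println("bad pattern surfaced:  " + bad);    // close failed
        System.out.println("good pattern surfaced: " + good);   // root cause
    }
}
```

The issue's suggested idiom — close() inside the try, null the reference, then quiet cleanup in the finally — additionally keeps close-failure reporting on the success path while still protecting the failure path.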
[jira] [Updated] (HDFS-3190) Simple refactors in existing NN code to assist QuorumJournalManager extension
[ https://issues.apache.org/jira/browse/HDFS-3190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon updated HDFS-3190: -- Fix Version/s: 2.2.0-alpha Backported this to branch-2, since it was causing some conflicts in other backports, and it's a straight refactor. Simple refactors in existing NN code to assist QuorumJournalManager extension - Key: HDFS-3190 URL: https://issues.apache.org/jira/browse/HDFS-3190 Project: Hadoop HDFS Issue Type: Sub-task Components: name-node Affects Versions: 2.0.0-alpha Reporter: Todd Lipcon Assignee: Todd Lipcon Priority: Minor Fix For: 3.0.0, 2.2.0-alpha Attachments: hdfs-3190.txt, hdfs-3190.txt, hdfs-3190.txt, hdfs-3190.txt, hdfs-3190.txt This JIRA is for some simple refactors in the NN: - refactor the code which writes the seen_txid file in NNStorage into a new LongContainingFile utility class. This is useful for the JournalNode to atomically/durably record its last promised epoch - refactor the interface from FileJournalManager back to StorageDirectory to use a StorageErrorReport interface. This allows FileJournalManager to be used in isolation of a full StorageDirectory. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3276) initializeSharedEdits should have a -nonInteractive flag
[ https://issues.apache.org/jira/browse/HDFS-3276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13433527#comment-13433527 ] Todd Lipcon commented on HDFS-3276: --- Hudson built this here: https://builds.apache.org/job/PreCommit-HDFS-Build/2983/ but the comment was swallowed during JIRA downtime: {quote} -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12540156/hdfs-3276.txt against trunk revision . +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 javadoc. The javadoc tool did not generate any warning messages. +1 eclipse:eclipse. The patch built with eclipse:eclipse. -1 findbugs. The patch appears to introduce 2 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. +1 contrib tests. The patch passed contrib unit tests. {quote} Looking into the new findbugs warnings. initializeSharedEdits should have a -nonInteractive flag Key: HDFS-3276 URL: https://issues.apache.org/jira/browse/HDFS-3276 Project: Hadoop HDFS Issue Type: Improvement Components: ha, name-node Affects Versions: 2.0.0-alpha Reporter: Vinithra Varadharajan Assignee: Todd Lipcon Priority: Minor Attachments: hdfs-3276.txt Similar to format and bootstrapStandby, would be nice to have -nonInteractive as an option on initializeSharedEdits -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3787) BlockManager#close races with ReplicationMonitor#run
[ https://issues.apache.org/jira/browse/HDFS-3787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13433530#comment-13433530 ] Eli Collins commented on HDFS-3787: --- I kicked the pre-commit build manually. BlockManager#close races with ReplicationMonitor#run Key: HDFS-3787 URL: https://issues.apache.org/jira/browse/HDFS-3787 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: 2.0.0-alpha Reporter: Andy Isaacson Assignee: Andy Isaacson Priority: Minor Attachments: hdfs-3787-2.txt, hdfs-3787-2.txt, hdfs-3787.txt We saw {{TestDirectoryScanner}} fail during shutdown: {code} 2012-08-09 12:17:19,844 WARN datanode.DataNode (BPServiceActor.java:run(683)) - Ending block pool service for: Block pool BP-610123021-172.29.121.238-1344539835759 (storage id DS-1581877160-172.29.121.238-43609-1344539837880) service to localhost/127.0.0.1:40012 ... 2012-08-09 12:17:19,876 FATAL blockmanagement.BlockManager (BlockManager.java:run(3039)) - ReplicationMonitor thread received Runtime exception. java.lang.NullPointerException at org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap.getBlockCollection(BlocksMap.java:101) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReplicationWorkForBlocks(BlockManager.java:1141) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReplicationWork(BlockManager.java:1116) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeDatanodeWork(BlockManager.java:3070) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationMonitor.run(BlockManager.java:3032) at java.lang.Thread.run(Thread.java:662) {code} Inspecting the code, it appears that {{BlockManager#close - BlocksMap#close}} can set {{blocks}} to {{null}} while {{computeDatanodeWork}} is running. The fix seems simple -- have {{close}} just set an exit flag, and have {{ReplicationMonitor#run}} call {{BlocksMap#close}}. -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
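The exit-flag fix proposed for HDFS-3787 can be sketched as follows. This is an illustrative standalone class, not the actual BlockManager code: close() only signals shutdown and waits, and the monitor thread itself releases the shared map when its loop exits, so a running iteration can never observe a null map.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class ReplicationMonitorSketch implements Runnable {
    private volatile boolean shuttingDown = false;
    // Stands in for BlocksMap#blocks; released only by the monitor thread.
    private Map<Long, String> blocksMap = new ConcurrentHashMap<>();
    private final Thread monitor = new Thread(this, "ReplicationMonitor");

    ReplicationMonitorSketch() { blocksMap.put(1L, "blk_1"); }

    @Override public void run() {
        while (!shuttingDown) {
            // computeDatanodeWork() equivalent: blocksMap is never null here.
            blocksMap.size();
            try { Thread.sleep(10); } catch (InterruptedException e) { break; }
        }
        blocksMap = null;  // the monitor, not close(), tears down the map
    }

    void start() { monitor.start(); }

    void close() {
        shuttingDown = true;   // just set the exit flag...
        monitor.interrupt();
        try { monitor.join(); } // ...and wait for the loop to finish
        catch (InterruptedException e) { Thread.currentThread().interrupt(); }
    }

    boolean mapReleased() { return blocksMap == null; }
}
```

Because close() joins the monitor thread, the map is guaranteed to be released (and visible as released) only after the last iteration of the loop completes.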
[jira] [Updated] (HDFS-3792) Fix two findbugs introduced by HDFS-3695
[ https://issues.apache.org/jira/browse/HDFS-3792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon updated HDFS-3792: -- Attachment: hdfs-3792.txt Trivial fix: forgot to add synchronized to these two methods and missed it in the QA report on HDFS-3695. Fix two findbugs introduced by HDFS-3695 Key: HDFS-3792 URL: https://issues.apache.org/jira/browse/HDFS-3792 Project: Hadoop HDFS Issue Type: Bug Components: build, name-node Affects Versions: 3.0.0 Reporter: Todd Lipcon Assignee: Todd Lipcon Priority: Trivial Attachments: hdfs-3792.txt Accidentally introduced two trivial findbugs warnings in HDFS-3695. This JIRA is to fix them.
[jira] [Updated] (HDFS-3792) Fix two findbugs introduced by HDFS-3695
[ https://issues.apache.org/jira/browse/HDFS-3792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon updated HDFS-3792: -- Status: Patch Available (was: Open)
[jira] [Commented] (HDFS-3672) Expose disk-location information for blocks to enable better scheduling
[ https://issues.apache.org/jira/browse/HDFS-3672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13433538#comment-13433538 ] Suresh Srinivas commented on HDFS-3672: --- bq. Perhaps Storage(BlockLocation|Id)? Volume(BlockLocation|Id)? I'm not entirely sure of the end-user terminology here. DiskBlockLocation could be BlockStorageLocation or just StorageLocation. DiskId -> StorageId seems appropriate here. However, it is used for other things in HDFS. As you suggested, perhaps VolumeId may be okay. bq. Should I just bump the default (say, to 10)? I haven't done any performance testing, so I don't know if it's a problem. With this feature there will be more RPC calls to datanodes, and hence we may need more handlers. A handler is just a thread, so increasing it to 10 should be fine. @aaron - we need a server-side config as well. That is the only way an admin could control access to the feature. Alternatively, the client could detect whether the server supports the required method (e.g. via an exception) instead of using a config. Please address my previous comment: bq. Is there a timeline where someone will work on HBase or MapReduce enhancements to use this capability? Expose disk-location information for blocks to enable better scheduling --- Key: HDFS-3672 URL: https://issues.apache.org/jira/browse/HDFS-3672 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 2.0.0-alpha Reporter: Andrew Wang Assignee: Andrew Wang Attachments: design-doc-v1.pdf, design-doc-v2.pdf, hdfs-3672-1.patch, hdfs-3672-2.patch, hdfs-3672-3.patch, hdfs-3672-4.patch, hdfs-3672-5.patch, hdfs-3672-6.patch, hdfs-3672-7.patch, hdfs-3672-8.patch Currently, HDFS exposes on which datanodes a block resides, which allows clients to make scheduling decisions for locality and load balancing. Extending this to also expose on which disk on a datanode a block resides would enable even better scheduling, on a per-disk rather than coarse per-datanode basis. 
This API would likely look similar to FileSystem#getFileBlockLocations, but also involve a series of RPCs to the responsible datanodes to determine disk ids.
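The discussion above can be made concrete with a hypothetical sketch of the API shape. The names BlockStorageLocation and VolumeId were only candidates in the naming discussion, and this is not the committed HDFS code: a block location from getFileBlockLocations() is augmented with one opaque volume id per replica, filled in by the follow-up RPCs to the datanodes.

```java
import java.util.Arrays;

// Opaque per-disk identifier; clients compare ids but never interpret them.
class VolumeId {
    private final byte[] id;
    VolumeId(byte[] id) { this.id = id.clone(); }
    @Override public boolean equals(Object o) {
        return o instanceof VolumeId && Arrays.equals(id, ((VolumeId) o).id);
    }
    @Override public int hashCode() { return Arrays.hashCode(id); }
}

// A block location augmented with the volume each replica lives on:
// hosts[i] stores its replica on the disk identified by volumeIds[i].
class BlockStorageLocation {
    private final String[] hosts;
    private final VolumeId[] volumeIds;
    BlockStorageLocation(String[] hosts, VolumeId[] volumeIds) {
        this.hosts = hosts;
        this.volumeIds = volumeIds;
    }
    String[] getHosts() { return hosts; }
    VolumeId[] getVolumeIds() { return volumeIds; }
}
```

A scheduler could then group replicas by (host, volume) pairs instead of hosts alone, spreading concurrent reads across spindles.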
[jira] [Commented] (HDFS-3276) initializeSharedEdits should have a -nonInteractive flag
[ https://issues.apache.org/jira/browse/HDFS-3276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13433539#comment-13433539 ] Todd Lipcon commented on HDFS-3276: --- The two new findbugs warnings are HDFS-3792 - not caused by this patch. bq. -1 tests included. The patch doesn't appear to include any new or modified tests. There are no new tests since this is just hooking up existing code (which is tested in TestInitializeSharedEdits) to command line flags. I manually tested the command line flags and verified they perform as expected. initializeSharedEdits should have a -nonInteractive flag Key: HDFS-3276 URL: https://issues.apache.org/jira/browse/HDFS-3276 Project: Hadoop HDFS Issue Type: Improvement Components: ha, name-node Affects Versions: 2.0.0-alpha Reporter: Vinithra Varadharajan Assignee: Todd Lipcon Priority: Minor Attachments: hdfs-3276.txt Similar to format and bootstrapStandby, would be nice to have -nonInteractive as an option on initializeSharedEdits
[jira] [Commented] (HDFS-3792) Fix two findbugs introduced by HDFS-3695
[ https://issues.apache.org/jira/browse/HDFS-3792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13433544#comment-13433544 ] Aaron T. Myers commented on HDFS-3792: -- +1 pending Jenkins.
[jira] [Commented] (HDFS-3276) initializeSharedEdits should have a -nonInteractive flag
[ https://issues.apache.org/jira/browse/HDFS-3276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13433550#comment-13433550 ] Aaron T. Myers commented on HDFS-3276: -- +1, the patch looks good to me.
[jira] [Updated] (HDFS-3276) initializeSharedEdits should have a -nonInteractive flag
[ https://issues.apache.org/jira/browse/HDFS-3276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon updated HDFS-3276: -- Resolution: Fixed Fix Version/s: 2.2.0-alpha 3.0.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Committed to branch-2 and trunk. Thanks.
[jira] [Created] (HDFS-3793) Implement genericized format() in QJM
Todd Lipcon created HDFS-3793: - Summary: Implement genericized format() in QJM Key: HDFS-3793 URL: https://issues.apache.org/jira/browse/HDFS-3793 Project: Hadoop HDFS Issue Type: Sub-task Components: ha Affects Versions: QuorumJournalManager (HDFS-3077) Reporter: Todd Lipcon Assignee: Todd Lipcon HDFS-3695 added the ability for non-File journal managers to tie into calls like NameNode -format. This JIRA is to implement format() for QuorumJournalManager.
[jira] [Commented] (HDFS-3672) Expose disk-location information for blocks to enable better scheduling
[ https://issues.apache.org/jira/browse/HDFS-3672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13433582#comment-13433582 ] Andrew Purtell commented on HDFS-3672: -- bq. Is there a timeline where someone will work on HBase or MapReduce enhancements to use this capability? I put up some ramblings on HBASE-6572. The scope is much larger and there's no timeline; it's a brainstorming issue. However, if you'd like, this issue can be linked to it.
[jira] [Created] (HDFS-3794) WebHDFS Open used with Offset returns the original (and incorrect) Content Length in the HTTP Header.
Ravi Prakash created HDFS-3794: -- Summary: WebHDFS Open used with Offset returns the original (and incorrect) Content Length in the HTTP Header. Key: HDFS-3794 URL: https://issues.apache.org/jira/browse/HDFS-3794 Project: Hadoop HDFS Issue Type: Bug Components: webhdfs Affects Versions: 2.0.0-alpha, 0.23.3, 2.1.0-alpha Reporter: Ravi Prakash Assignee: Ravi Prakash When an offset is specified, the HTTP header Content-Length still contains the original file size. e.g. if the original file is 100 bytes, and the offset specified is 10, then the HTTP Content-Length ought to be 90. Currently it is still returned as 100. This causes curl to give error 18, and Java to throw a ConnectionClosedException.
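The header arithmetic described above is simple: when OPEN is given an offset, Content-Length must reflect the bytes remaining after the seek, not the full file size. A minimal sketch (the helper name is illustrative, not the WebHDFS code):

```java
// Corrected Content-Length computation for an offset read.
class ContentLengthFix {
    static long contentLength(long fileSize, long offset) {
        if (offset < 0 || offset > fileSize) {
            throw new IllegalArgumentException("offset out of range: " + offset);
        }
        // e.g. a 100-byte file read from offset 10 yields 90 bytes
        return fileSize - offset;
    }
}
```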
[jira] [Updated] (HDFS-3793) Implement genericized format() in QJM
[ https://issues.apache.org/jira/browse/HDFS-3793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon updated HDFS-3793: -- Attachment: hdfs-3793.txt Attached patch implements the formatting behavior. In addition to changing the tests to use this new API to format at startup, I also tested this manually on a cluster using both namenode -format and namenode -initializeSharedEdits. Both the confirmation behavior and the formatting behavior reacted correctly.
[jira] [Updated] (HDFS-3723) All commands should support meaningful --help
[ https://issues.apache.org/jira/browse/HDFS-3723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-3723: Attachment: HDFS-3723.001.patch Suresh, thanks for the comments. I have addressed the comments and added a help function in DFSUtil. I used the function to parse and check the help argument for the commands DataNode, NameNode, ZKFC, FSCK, Balancer, GetConf, and GetGroups. Other commands such as JmxGet have their own mechanisms to handle the help argument, so I did not change them. All commands should support meaningful --help - Key: HDFS-3723 URL: https://issues.apache.org/jira/browse/HDFS-3723 Project: Hadoop HDFS Issue Type: Improvement Components: scripts, tools Affects Versions: 2.0.0-alpha Reporter: E. Sammer Assignee: Jing Zhao Attachments: HDFS-3723.001.patch, HDFS-3723.patch, HDFS-3723.patch Some (sub)commands support -help or -h options for detailed help while others do not. Ideally, all commands should support meaningful help that works regardless of current state or configuration. For example, hdfs zkfc --help (or -h or -help) is not very useful. Option checking should occur before state / configuration checking. {code} [esammer@hadoop-fed01 ~]# hdfs zkfc --help Exception in thread "main" org.apache.hadoop.HadoopIllegalArgumentException: HA is not enabled for this namenode. at org.apache.hadoop.hdfs.tools.DFSZKFailoverController.setConf(DFSZKFailoverController.java:122) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:66) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84) at org.apache.hadoop.hdfs.tools.DFSZKFailoverController.main(DFSZKFailoverController.java:168) {code} This would go a long way toward better usability for ops staff.
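The ordering fix requested above ("option checking should occur before state / configuration checking") amounts to scanning argv for a help flag before any configuration is touched, so `hdfs zkfc --help` prints usage even when HA is not configured. A minimal standalone sketch (the patch added a similar helper to DFSUtil; this version is illustrative, not the committed code):

```java
// Check for a help flag before doing any config or state validation.
class HelpFirst {
    static boolean wantsHelp(String[] args) {
        for (String a : args) {
            if (a.equals("-h") || a.equals("-help") || a.equals("--help")) {
                return true;
            }
        }
        return false;
    }
}
```

A command's main() would call this first and print usage and exit before constructing its configuration, avoiding errors like the HadoopIllegalArgumentException shown in the issue.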
[jira] [Commented] (HDFS-3794) WebHDFS Open used with Offset returns the original (and incorrect) Content Length in the HTTP Header.
[ https://issues.apache.org/jira/browse/HDFS-3794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13433603#comment-13433603 ] Ravi Prakash commented on HDFS-3794: {noformat} e.g. $ curl -L "http://HOST:PORT/webhdfs/v1/somePath/someFile?op=OPEN&offset=10" curl: (18) transfer closed with 10 bytes remaining to read {noformat}
[jira] [Created] (HDFS-3795) QJM: validate journal dir at startup
Todd Lipcon created HDFS-3795: - Summary: QJM: validate journal dir at startup Key: HDFS-3795 URL: https://issues.apache.org/jira/browse/HDFS-3795 Project: Hadoop HDFS Issue Type: Sub-task Components: ha Affects Versions: QuorumJournalManager (HDFS-3077) Reporter: Todd Lipcon Assignee: Todd Lipcon Priority: Minor Attachments: hdfs-3795.txt Currently, the JN does not validate the configured journal directory until it tries to write into it. This is counter-intuitive for users, since they would expect to find out about a misconfiguration at startup time, rather than on first access. Additionally, two testers accidentally configured the journal dir to be a URI, which the code silently understood as a relative path ({{CWD/file:/foo/bar}}). We should validate the config at startup to be an accessible absolute path.
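The startup check proposed above can be sketched as follows (an illustrative helper, not the JournalNode code): reject URI-looking values such as "file:/foo/bar", which java.io.File would otherwise treat as the relative path CWD/file:/foo/bar, and require an absolute path.

```java
import java.io.File;

// Validate the configured journal dir at startup instead of on first write.
class JournalDirValidator {
    static File validate(String value) {
        if (value.contains(":/")) {  // looks like a URI, not a local path
            throw new IllegalArgumentException(
                "Journal dir should be an absolute path, not a URI: " + value);
        }
        File dir = new File(value);
        if (!dir.isAbsolute()) {
            throw new IllegalArgumentException(
                "Journal dir should be an absolute path: " + value);
        }
        return dir;
    }
}
```

Failing fast here surfaces the misconfiguration in the startup log rather than as a confusing write error later.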
[jira] [Updated] (HDFS-3795) QJM: validate journal dir at startup
[ https://issues.apache.org/jira/browse/HDFS-3795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon updated HDFS-3795: -- Attachment: hdfs-3795.txt Simple patch attached.
[jira] [Updated] (HDFS-3794) WebHDFS Open used with Offset returns the original (and incorrect) Content Length in the HTTP Header.
[ https://issues.apache.org/jira/browse/HDFS-3794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ravi Prakash updated HDFS-3794: --- Attachment: HDFS-3794.patch Attaching a patch that fixes the issue. It's too trivial to write a unit test for (which would have to be pretty complicated :'( ... I tried briefly). Here's the testing I did: 1. Small file with offset: worked. 2. Big file (multiple blocks) with offset: worked. 3. Big file with offset greater than file size: correctly threw a RemoteException.
[jira] [Commented] (HDFS-3794) WebHDFS Open used with Offset returns the original (and incorrect) Content Length in the HTTP Header.
[ https://issues.apache.org/jira/browse/HDFS-3794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13433609#comment-13433609 ] Ravi Prakash commented on HDFS-3794: The same patch applies to branch 0.23, branch-2, and trunk.
[jira] [Updated] (HDFS-3794) WebHDFS Open used with Offset returns the original (and incorrect) Content Length in the HTTP Header.
[ https://issues.apache.org/jira/browse/HDFS-3794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ravi Prakash updated HDFS-3794: --- Status: Patch Available (was: Open)
[jira] [Commented] (HDFS-2330) In NNStorage.java, IOExceptions of stream closures can mask root exceptions.
[ https://issues.apache.org/jira/browse/HDFS-2330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13433613#comment-13433613 ] Hudson commented on HDFS-2330: -- Integrated in Hadoop-Hdfs-trunk-Commit #2639 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/2639/]) Move HDFS-2330 and HDFS-3190 to branch-2 section, since they have been backported from trunk. (Revision 1372605) Result = SUCCESS todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1372605 Files : * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt In NNStorage.java, IOExceptions of stream closures can mask root exceptions. - Key: HDFS-2330 URL: https://issues.apache.org/jira/browse/HDFS-2330 Project: Hadoop HDFS Issue Type: Sub-task Components: name-node Affects Versions: 0.24.0 Reporter: Uma Maheswara Rao G Assignee: Uma Maheswara Rao G Fix For: 3.0.0, 2.2.0-alpha Attachments: HDFS-2330.patch, HDFS-2330.patch In NNStorage.java: There are many stream closures in finally blocks. There is a chance that they can mask the root exceptions. So, it is better to follow a pattern like the one below: {code} try { stream.close(); stream = null; } finally { IOUtils.cleanup(LOG, stream); } {code}
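The pattern quoted in HDFS-2330 can be expanded into a self-contained sketch. Here quietClose stands in for Hadoop's IOUtils.cleanup(LOG, stream): the try block attempts a normal close() whose exception propagates, and the finally block runs a quiet second close only when the first attempt threw (the reference was nulled on success), so a later cleanup failure never masks the root cause.

```java
import java.io.Closeable;
import java.io.IOException;

class CloseWithoutMasking {
    // Stand-in for IOUtils.cleanup(LOG, c): never throws.
    static void quietClose(Closeable c) {
        if (c == null) return;
        try { c.close(); } catch (IOException ignored) { }
    }

    static void closeSafely(Closeable stream) throws IOException {
        try {
            stream.close();   // may throw: this is the exception we want to see
            stream = null;    // success: nothing left for the finally block
        } finally {
            quietClose(stream); // runs only if close() threw; swallows errors
        }
    }
}
```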
[jira] [Commented] (HDFS-3276) initializeSharedEdits should have a -nonInteractive flag
[ https://issues.apache.org/jira/browse/HDFS-3276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13433614#comment-13433614 ] Hudson commented on HDFS-3276: -- Integrated in Hadoop-Hdfs-trunk-Commit #2639 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/2639/]) HDFS-3276. initializeSharedEdits should have a -nonInteractive flag. Contributed by Todd Lipcon. (Revision 1372628) Result = SUCCESS todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1372628 Files : * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNode.java
[jira] [Commented] (HDFS-3190) Simple refactors in existing NN code to assist QuorumJournalManager extension
[ https://issues.apache.org/jira/browse/HDFS-3190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13433612#comment-13433612 ] Hudson commented on HDFS-3190: -- Integrated in Hadoop-Hdfs-trunk-Commit #2639 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/2639/]) Move HDFS-2330 and HDFS-3190 to branch-2 section, since they have been backported from trunk. (Revision 1372605) Result = SUCCESS todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1372605 Files : * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt Simple refactors in existing NN code to assist QuorumJournalManager extension - Key: HDFS-3190 URL: https://issues.apache.org/jira/browse/HDFS-3190 Project: Hadoop HDFS Issue Type: Sub-task Components: name-node Affects Versions: 2.0.0-alpha Reporter: Todd Lipcon Assignee: Todd Lipcon Priority: Minor Fix For: 3.0.0, 2.2.0-alpha Attachments: hdfs-3190.txt, hdfs-3190.txt, hdfs-3190.txt, hdfs-3190.txt, hdfs-3190.txt This JIRA is for some simple refactors in the NN: - refactor the code which writes the seen_txid file in NNStorage into a new LongContainingFile utility class. This is useful for the JournalNode to atomically/durably record its last promised epoch - refactor the interface from FileJournalManager back to StorageDirectory to use a StorageErrorReport interface. This allows FileJournalManager to be used in isolation of a full StorageDirectory.
[jira] [Commented] (HDFS-2330) In NNStorage.java, IOExceptions of stream closures can mask root exceptions.
[ https://issues.apache.org/jira/browse/HDFS-2330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13433624#comment-13433624 ] Hudson commented on HDFS-2330: -- Integrated in Hadoop-Common-trunk-Commit #2574 (See [https://builds.apache.org/job/Hadoop-Common-trunk-Commit/2574/]) Move HDFS-2330 and HDFS-3190 to branch-2 section, since they have been backported from trunk. (Revision 1372605) Result = SUCCESS todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1372605 Files : * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
[jira] [Commented] (HDFS-3190) Simple refactors in existing NN code to assist QuorumJournalManager extension
[ https://issues.apache.org/jira/browse/HDFS-3190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13433623#comment-13433623 ] Hudson commented on HDFS-3190: -- Integrated in Hadoop-Common-trunk-Commit #2574 (See [https://builds.apache.org/job/Hadoop-Common-trunk-Commit/2574/]) Move HDFS-2330 and HDFS-3190 to branch-2 section, since they have been backported from trunk. (Revision 1372605) Result = SUCCESS todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1372605 Files : * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
[jira] [Commented] (HDFS-3276) initializeSharedEdits should have a -nonInteractive flag
[ https://issues.apache.org/jira/browse/HDFS-3276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13433625#comment-13433625 ] Hudson commented on HDFS-3276: -- Integrated in Hadoop-Common-trunk-Commit #2574 (See [https://builds.apache.org/job/Hadoop-Common-trunk-Commit/2574/]) HDFS-3276. initializeSharedEdits should have a -nonInteractive flag. Contributed by Todd Lipcon. (Revision 1372628) Result = SUCCESS todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1372628 Files : * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNode.java initializeSharedEdits should have a -nonInteractive flag Key: HDFS-3276 URL: https://issues.apache.org/jira/browse/HDFS-3276 Project: Hadoop HDFS Issue Type: Improvement Components: ha, name-node Affects Versions: 2.0.0-alpha Reporter: Vinithra Varadharajan Assignee: Todd Lipcon Priority: Minor Fix For: 3.0.0, 2.2.0-alpha Attachments: hdfs-3276.txt Similar to format and bootstrapStandby, would be nice to have -nonInteractive as an option on initializeSharedEdits -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3190) Simple refactors in existing NN code to assist QuorumJournalManager extension
[ https://issues.apache.org/jira/browse/HDFS-3190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13433628#comment-13433628 ] Hudson commented on HDFS-3190: -- Integrated in Hadoop-Mapreduce-trunk-Commit #2596 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/2596/]) Move HDFS-2330 and HDFS-3190 to branch-2 section, since they have been backported from trunk. (Revision 1372605) Result = FAILURE todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1372605 Files : * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt Simple refactors in existing NN code to assist QuorumJournalManager extension - Key: HDFS-3190 URL: https://issues.apache.org/jira/browse/HDFS-3190 Project: Hadoop HDFS Issue Type: Sub-task Components: name-node Affects Versions: 2.0.0-alpha Reporter: Todd Lipcon Assignee: Todd Lipcon Priority: Minor Fix For: 3.0.0, 2.2.0-alpha Attachments: hdfs-3190.txt, hdfs-3190.txt, hdfs-3190.txt, hdfs-3190.txt, hdfs-3190.txt This JIRA is for some simple refactors in the NN: - refactor the code which writes the seen_txid file in NNStorage into a new LongContainingFile utility class. This is useful for the JournalNode to atomically/durably record its last promised epoch - refactor the interface from FileJournalManager back to StorageDirectory to use a StorageErrorReport interface. This allows FileJournalManager to be used in isolation of a full StorageDirectory. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2330) In NNStorage.java, IOExceptions of stream closures can mask root exceptions.
[ https://issues.apache.org/jira/browse/HDFS-2330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13433629#comment-13433629 ] Hudson commented on HDFS-2330: -- Integrated in Hadoop-Mapreduce-trunk-Commit #2596 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/2596/]) Move HDFS-2330 and HDFS-3190 to branch-2 section, since they have been backported from trunk. (Revision 1372605) Result = FAILURE todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1372605 Files : * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt In NNStorage.java, IOExceptions of stream closures can mask root exceptions. - Key: HDFS-2330 URL: https://issues.apache.org/jira/browse/HDFS-2330 Project: Hadoop HDFS Issue Type: Sub-task Components: name-node Affects Versions: 0.24.0 Reporter: Uma Maheswara Rao G Assignee: Uma Maheswara Rao G Fix For: 3.0.0, 2.2.0-alpha Attachments: HDFS-2330.patch, HDFS-2330.patch In NNStorage.java: There are many stream closures in finally block. There is a chance that they can mask the root exceptions. So, better to follow the pattern like below: {code} try{ stream.close(); stream =null; } finally{ IOUtils.cleanup(LOG, stream); } {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3723) All commands should support meaningful --help
[ https://issues.apache.org/jira/browse/HDFS-3723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13433630#comment-13433630 ]

Hadoop QA commented on HDFS-3723:
---------------------------------

-1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12540767/HDFS-3723.001.patch against trunk revision.

-1 patch. The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/2993//console

This message is automatically generated.

All commands should support meaningful --help
---------------------------------------------

                Key: HDFS-3723
                URL: https://issues.apache.org/jira/browse/HDFS-3723
            Project: Hadoop HDFS
         Issue Type: Improvement
         Components: scripts, tools
   Affects Versions: 2.0.0-alpha
           Reporter: E. Sammer
           Assignee: Jing Zhao
        Attachments: HDFS-3723.001.patch, HDFS-3723.patch, HDFS-3723.patch

Some (sub)commands support -help or -h options for detailed help while others do not. Ideally, all commands should support meaningful help that works regardless of current state or configuration. For example, {{hdfs zkfc --help}} (or -h or -help) is not very useful. Option checking should occur before state/configuration checking.

{code}
[esammer@hadoop-fed01 ~]# hdfs zkfc --help
Exception in thread "main" org.apache.hadoop.HadoopIllegalArgumentException: HA is not enabled for this namenode.
        at org.apache.hadoop.hdfs.tools.DFSZKFailoverController.setConf(DFSZKFailoverController.java:122)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:66)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
        at org.apache.hadoop.hdfs.tools.DFSZKFailoverController.main(DFSZKFailoverController.java:168)
{code}

This would go a long way toward better usability for ops staff.
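The fix direction suggested in HDFS-3723 ("option checking should occur before state/configuration checking") can be sketched as follows. This is a minimal stand-alone illustration, not the actual `DFSZKFailoverController` code; the class name and usage string are hypothetical.

```java
// Recognize help flags before touching configuration, so "zkfc --help"
// prints usage instead of failing with "HA is not enabled".
public class HelpFirstTool {

    static final String USAGE = "Usage: zkfc [-formatZK [-force] [-nonInteractive]]";

    static boolean isHelpOption(String arg) {
        return arg.equals("-h") || arg.equals("-help") || arg.equals("--help");
    }

    // Returns the usage text for help flags; otherwise runs the (stubbed) tool.
    static String run(String[] args) {
        for (String arg : args) {
            if (isHelpOption(arg)) {
                return USAGE;       // answered before any config validation
            }
        }
        checkConfiguration();       // would throw if HA is not enabled
        return "started";
    }

    // Stub for the state/config check that currently runs first.
    static void checkConfiguration() {
        throw new IllegalArgumentException("HA is not enabled for this namenode.");
    }
}
```

The point is purely ordering: the help path returns before the configuration check that would otherwise abort the command.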
[jira] [Updated] (HDFS-3150) Add option for clients to contact DNs via hostname
[ https://issues.apache.org/jira/browse/HDFS-3150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eli Collins updated HDFS-3150:
------------------------------

    Attachment: hdfs-3150.txt

Patch attached again since it looks like jira lost the old version.

Add option for clients to contact DNs via hostname
--------------------------------------------------

                Key: HDFS-3150
                URL: https://issues.apache.org/jira/browse/HDFS-3150
            Project: Hadoop HDFS
         Issue Type: New Feature
         Components: data-node, hdfs client
   Affects Versions: 1.0.0, 2.0.0-alpha
           Reporter: Eli Collins
           Assignee: Eli Collins
            Fix For: 1.1.0
        Attachments: hdfs-3150-b1.txt, hdfs-3150-b1.txt, hdfs-3150.txt, hdfs-3150.txt, hdfs-3150.txt

The DN listens on multiple IP addresses (the default {{dfs.datanode.address}} is the wildcard), however per HADOOP-6867 only the source address (IP) of the registration is given to clients. HADOOP-985 made clients access datanodes by IP primarily to avoid the latency of a DNS lookup; this had the side effect of breaking DN multihoming (the client cannot route the IP exposed by the NN if the DN registers with an interface that has a cluster-private IP). To fix this, let's add back the option for Datanodes to be accessed by hostname. This can be done by:
# Modifying the primary field of the Datanode descriptor to be the hostname, or
# Modifying Client-to-Datanode and Datanode-to-Datanode access to use the hostname field instead of the IP

Approach #2 does not require an incompatible client protocol change, and is much less invasive. It minimizes the scope of modification to just the places where clients and Datanodes connect, vs changing all uses of Datanode identifiers.
New client and Datanode configuration options are introduced: - {{dfs.client.use.datanode.hostname}} indicates all client to datanode connections should use the datanode hostname (as clients outside cluster may not be able to route the IP) - {{dfs.datanode.use.datanode.hostname}} indicates whether Datanodes should use hostnames when connecting to other Datanodes for data transfer If the configuration options are not used, there is no change in the current behavior. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
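For reference, the two keys described above would be enabled in `hdfs-site.xml` roughly as follows. This is a hypothetical deployment fragment; with neither key set, behavior is unchanged (both default to false).

```xml
<!-- hdfs-site.xml fragment for a multihomed cluster (illustrative only) -->
<property>
  <name>dfs.client.use.datanode.hostname</name>
  <value>true</value>
  <description>Clients connect to datanodes by hostname, not by the
  (possibly unroutable) IP the datanode registered with.</description>
</property>
<property>
  <name>dfs.datanode.use.datanode.hostname</name>
  <value>true</value>
  <description>Datanodes use hostnames when connecting to other
  datanodes for data transfer.</description>
</property>
```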
[jira] [Created] (HDFS-3796) Speed up edit log tests by avoiding fsync()
Todd Lipcon created HDFS-3796: - Summary: Speed up edit log tests by avoiding fsync() Key: HDFS-3796 URL: https://issues.apache.org/jira/browse/HDFS-3796 Project: Hadoop HDFS Issue Type: Improvement Components: test Affects Versions: 3.0.0, 2.2.0-alpha Reporter: Todd Lipcon Assignee: Todd Lipcon Priority: Minor Our edit log tests are very slow because they incur a lot of fsyncs as they write out transactions. Since fsync() has no effect except in the case of power outages or system crashes, and we don't care about power outages in the context of tests, we can safely skip the fsync without any loss in coverage. In my tests, this sped up TestEditLog by about 5x. The testFuzzSequences test case improved from ~83 seconds with fsync to about 5 seconds without. These results are from my SSD laptop - they are probably even more drastic on spinning media. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
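The shape of the change HDFS-3796 describes can be sketched as a test-only static flag guarding the durability barrier. The class and field names below are illustrative, not the actual `EditLogFileOutputStream` members.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.channels.FileChannel;

// Tests flip skipFsyncForTesting once (e.g. in a static initializer) to skip
// fsync() on edit-log flush. Data still reaches the OS page cache, so readers
// in the same test process observe every transaction; only crash durability
// is lost, which tests do not exercise.
public class SyncableLog {

    static boolean skipFsyncForTesting = false;

    static void flushAndSync(RandomAccessFile file) throws IOException {
        FileChannel channel = file.getChannel();
        if (!skipFsyncForTesting) {
            channel.force(true);   // durability barrier; only matters on crash
        }
    }
}
```

This mirrors the existing pattern of static test hooks (like the `static {}` log-level setup mentioned later in the thread): one flag, flipped before the tests run, no per-test setup needed.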
[jira] [Updated] (HDFS-3796) Speed up edit log tests by avoiding fsync()
[ https://issues.apache.org/jira/browse/HDFS-3796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon updated HDFS-3796: -- Status: Patch Available (was: Open) Speed up edit log tests by avoiding fsync() --- Key: HDFS-3796 URL: https://issues.apache.org/jira/browse/HDFS-3796 Project: Hadoop HDFS Issue Type: Improvement Components: test Affects Versions: 3.0.0, 2.2.0-alpha Reporter: Todd Lipcon Assignee: Todd Lipcon Priority: Minor Attachments: hdfs-3796.txt Our edit log tests are very slow because they incur a lot of fsyncs as they write out transactions. Since fsync() has no effect except in the case of power outages or system crashes, and we don't care about power outages in the context of tests, we can safely skip the fsync without any loss in coverage. In my tests, this sped up TestEditLog by about 5x. The testFuzzSequences test case improved from ~83 seconds with fsync to about 5 seconds without. These results are from my SSD laptop - they are probably even more drastic on spinning media. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-3796) Speed up edit log tests by avoiding fsync()
[ https://issues.apache.org/jira/browse/HDFS-3796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon updated HDFS-3796: -- Attachment: hdfs-3796.txt Speed up edit log tests by avoiding fsync() --- Key: HDFS-3796 URL: https://issues.apache.org/jira/browse/HDFS-3796 Project: Hadoop HDFS Issue Type: Improvement Components: test Affects Versions: 3.0.0, 2.2.0-alpha Reporter: Todd Lipcon Assignee: Todd Lipcon Priority: Minor Attachments: hdfs-3796.txt Our edit log tests are very slow because they incur a lot of fsyncs as they write out transactions. Since fsync() has no effect except in the case of power outages or system crashes, and we don't care about power outages in the context of tests, we can safely skip the fsync without any loss in coverage. In my tests, this sped up TestEditLog by about 5x. The testFuzzSequences test case improved from ~83 seconds with fsync to about 5 seconds without. These results are from my SSD laptop - they are probably even more drastic on spinning media. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HDFS-3797) QJM: add segment txid as a parameter to journal() RPC
Todd Lipcon created HDFS-3797:
------------------------------

            Summary: QJM: add segment txid as a parameter to journal() RPC
                Key: HDFS-3797
                URL: https://issues.apache.org/jira/browse/HDFS-3797
            Project: Hadoop HDFS
         Issue Type: Sub-task
         Components: ha
   Affects Versions: QuorumJournalManager (HDFS-3077)
           Reporter: Todd Lipcon
           Assignee: Todd Lipcon
           Priority: Minor

During fault testing of QJM, I saw the following issue:
1) NN sends txn 5 to JN
2) NN gets partitioned from the JN while the JN remains up; the next two RPCs are missed during the partition:
2a) finalizeSegment(1-5)
2b) startSegment(6)
3) NN sends txn 6 to JN

This caused one of the JNs to end up with a single segment 1-10 while the others had two segments, 1-5 and 6-10. This broke some invariants of the QJM protocol and prevented the recovery protocol from running properly. This can be addressed on the client side by HDFS-3726, which would cause the NN not to send the RPC in #3. But it makes sense to also add an extra safety check on the server side: with every journal() call, we can send the segment's txid. Then if the JN and the client get out of sync, the JN can reject the RPCs.
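The server-side guard proposed here can be sketched as follows. This is a simplified stand-alone model, not the real JournalNode RPC handler; method signatures are illustrative.

```java
import java.io.IOException;

// Each journal() call carries the txid that began the writer's current
// segment; the JournalNode rejects calls whose segment txid does not match
// the segment it believes is open, instead of silently appending.
public class JournalSegmentGuard {

    private long curSegmentTxId = -1;   // -1: no segment open

    void startLogSegment(long txId) {
        curSegmentTxId = txId;
    }

    void finalizeLogSegment() {
        curSegmentTxId = -1;
    }

    // segmentTxId is the new parameter: the first txid of the writer's
    // current segment, sent with every batch of edits.
    void journal(long segmentTxId, long firstTxnId, byte[] records)
            throws IOException {
        if (segmentTxId != curSegmentTxId) {
            throw new IOException("Writer out of sync: claims segment "
                + segmentTxId + " but current segment is " + curSegmentTxId);
        }
        // ... append records to the open segment ...
    }
}
```

In the failure scenario above, the JN that missed `finalizeSegment(1-5)`/`startSegment(6)` still has segment 1 open, so the write tagged with segment txid 6 is rejected rather than merged into segment 1.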
[jira] [Commented] (HDFS-3796) Speed up edit log tests by avoiding fsync()
[ https://issues.apache.org/jira/browse/HDFS-3796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13433692#comment-13433692 ] Suresh Srinivas commented on HDFS-3796: --- Todd, do multiple junit tests reuse JVM? If so, you are better off adding this to @BeforeClass and @AfterClass? Speed up edit log tests by avoiding fsync() --- Key: HDFS-3796 URL: https://issues.apache.org/jira/browse/HDFS-3796 Project: Hadoop HDFS Issue Type: Improvement Components: test Affects Versions: 3.0.0, 2.2.0-alpha Reporter: Todd Lipcon Assignee: Todd Lipcon Priority: Minor Attachments: hdfs-3796.txt Our edit log tests are very slow because they incur a lot of fsyncs as they write out transactions. Since fsync() has no effect except in the case of power outages or system crashes, and we don't care about power outages in the context of tests, we can safely skip the fsync without any loss in coverage. In my tests, this sped up TestEditLog by about 5x. The testFuzzSequences test case improved from ~83 seconds with fsync to about 5 seconds without. These results are from my SSD laptop - they are probably even more drastic on spinning media. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HDFS-3798) Avoid throwing NPE when finalizeSegment() is called on invalid segment
Todd Lipcon created HDFS-3798: - Summary: Avoid throwing NPE when finalizeSegment() is called on invalid segment Key: HDFS-3798 URL: https://issues.apache.org/jira/browse/HDFS-3798 Project: Hadoop HDFS Issue Type: Sub-task Components: ha Affects Versions: QuorumJournalManager (HDFS-3077) Reporter: Todd Lipcon Assignee: Todd Lipcon Priority: Trivial Currently, if the client calls finalizeLogSegment() on a segment which doesn't exist on the JournalNode side, it throws an NPE. Instead it should throw a more intelligible exception. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
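The fix shape for this kind of NPE is a null-checked lookup that fails with a descriptive exception. A minimal stand-alone sketch (not the actual JournalNode code; the map and method names are hypothetical):

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

// Look up the segment and fail intelligibly when it does not exist,
// instead of dereferencing the null lookup result.
public class SegmentFinalizer {

    private final Map<Long, String> segmentsByStartTxId = new HashMap<>();

    void addSegment(long startTxId, String fileName) {
        segmentsByStartTxId.put(startTxId, fileName);
    }

    String finalizeLogSegment(long startTxId) throws IOException {
        String segment = segmentsByStartTxId.get(startTxId);
        if (segment == null) {
            // previously: using the null result here threw a bare NPE
            throw new IOException("No log segment starting at txid "
                + startTxId + " to finalize");
        }
        return segment;
    }
}
```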
[jira] [Updated] (HDFS-3798) Avoid throwing NPE when finalizeSegment() is called on invalid segment
[ https://issues.apache.org/jira/browse/HDFS-3798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon updated HDFS-3798: -- Attachment: hdfs-3798.txt Avoid throwing NPE when finalizeSegment() is called on invalid segment -- Key: HDFS-3798 URL: https://issues.apache.org/jira/browse/HDFS-3798 Project: Hadoop HDFS Issue Type: Sub-task Components: ha Affects Versions: QuorumJournalManager (HDFS-3077) Reporter: Todd Lipcon Assignee: Todd Lipcon Priority: Trivial Attachments: hdfs-3798.txt Currently, if the client calls finalizeLogSegment() on a segment which doesn't exist on the JournalNode side, it throws an NPE. Instead it should throw a more intelligible exception. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3276) initializeSharedEdits should have a -nonInteractive flag
[ https://issues.apache.org/jira/browse/HDFS-3276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13433696#comment-13433696 ] Hudson commented on HDFS-3276: -- Integrated in Hadoop-Mapreduce-trunk-Commit #2597 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/2597/]) HDFS-3276. initializeSharedEdits should have a -nonInteractive flag. Contributed by Todd Lipcon. (Revision 1372628) Result = FAILURE todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1372628 Files : * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNode.java initializeSharedEdits should have a -nonInteractive flag Key: HDFS-3276 URL: https://issues.apache.org/jira/browse/HDFS-3276 Project: Hadoop HDFS Issue Type: Improvement Components: ha, name-node Affects Versions: 2.0.0-alpha Reporter: Vinithra Varadharajan Assignee: Todd Lipcon Priority: Minor Fix For: 3.0.0, 2.2.0-alpha Attachments: hdfs-3276.txt Similar to format and bootstrapStandby, would be nice to have -nonInteractive as an option on initializeSharedEdits -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3796) Speed up edit log tests by avoiding fsync()
[ https://issues.apache.org/jira/browse/HDFS-3796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13433699#comment-13433699 ] Todd Lipcon commented on HDFS-3796: --- Hey Suresh. Nope, each junit class file runs in its own JVM. We make use of the static {} pattern for setting log levels as well, so I think this should be considered equivalent. Speed up edit log tests by avoiding fsync() --- Key: HDFS-3796 URL: https://issues.apache.org/jira/browse/HDFS-3796 Project: Hadoop HDFS Issue Type: Improvement Components: test Affects Versions: 3.0.0, 2.2.0-alpha Reporter: Todd Lipcon Assignee: Todd Lipcon Priority: Minor Attachments: hdfs-3796.txt Our edit log tests are very slow because they incur a lot of fsyncs as they write out transactions. Since fsync() has no effect except in the case of power outages or system crashes, and we don't care about power outages in the context of tests, we can safely skip the fsync without any loss in coverage. In my tests, this sped up TestEditLog by about 5x. The testFuzzSequences test case improved from ~83 seconds with fsync to about 5 seconds without. These results are from my SSD laptop - they are probably even more drastic on spinning media. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3796) Speed up edit log tests by avoiding fsync()
[ https://issues.apache.org/jira/browse/HDFS-3796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13433704#comment-13433704 ] Suresh Srinivas commented on HDFS-3796: --- Well I thought we use a specific LOG to do that. +1 for the patch. Speed up edit log tests by avoiding fsync() --- Key: HDFS-3796 URL: https://issues.apache.org/jira/browse/HDFS-3796 Project: Hadoop HDFS Issue Type: Improvement Components: test Affects Versions: 3.0.0, 2.2.0-alpha Reporter: Todd Lipcon Assignee: Todd Lipcon Priority: Minor Attachments: hdfs-3796.txt Our edit log tests are very slow because they incur a lot of fsyncs as they write out transactions. Since fsync() has no effect except in the case of power outages or system crashes, and we don't care about power outages in the context of tests, we can safely skip the fsync without any loss in coverage. In my tests, this sped up TestEditLog by about 5x. The testFuzzSequences test case improved from ~83 seconds with fsync to about 5 seconds without. These results are from my SSD laptop - they are probably even more drastic on spinning media. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3794) WebHDFS Open used with Offset returns the original (and incorrect) Content Length in the HTTP Header.
[ https://issues.apache.org/jira/browse/HDFS-3794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13433710#comment-13433710 ] Tsz Wo (Nicholas), SZE commented on HDFS-3794: -- +1 on the patch. Good catch! WebHDFS Open used with Offset returns the original (and incorrect) Content Length in the HTTP Header. - Key: HDFS-3794 URL: https://issues.apache.org/jira/browse/HDFS-3794 Project: Hadoop HDFS Issue Type: Bug Components: webhdfs Affects Versions: 0.23.3, 2.0.0-alpha, 2.1.0-alpha Reporter: Ravi Prakash Assignee: Ravi Prakash Attachments: HDFS-3794.patch When an offset is specified, the HTTP header Content Length still contains the original file size. e.g. if the original file is 100 bytes, and the offset specified it 10, then HTTP Content Length ought to be 90. Currently it is still returned as 100. This causes curl to give error 18, and JAVA to throw ConnectionClosedException -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3788) distcp can't copy large files using webhdfs due to missing Content-Length header
[ https://issues.apache.org/jira/browse/HDFS-3788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13433716#comment-13433716 ] Tsz Wo (Nicholas), SZE commented on HDFS-3788: -- How about first check the transfer-encoding, if it is chunked, then no content-length check? distcp can't copy large files using webhdfs due to missing Content-Length header Key: HDFS-3788 URL: https://issues.apache.org/jira/browse/HDFS-3788 Project: Hadoop HDFS Issue Type: Bug Components: webhdfs Affects Versions: 0.23.3, 2.0.0-alpha Reporter: Eli Collins Priority: Critical Attachments: distcp-webhdfs-errors.txt The following command fails when data1 contains a 3gb file. It passes when using hftp or when the directory just contains smaller (2gb) files, so looks like a webhdfs issue with large files. {{hadoop distcp webhdfs://eli-thinkpad:50070/user/eli/data1 hdfs://localhost:8020/user/eli/data2}} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
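Nicholas's suggestion (skip the Content-Length check when the response is chunked) can be sketched like this. This is a stand-alone illustration of the check ordering, not the actual webhdfs client code.

```java
import java.util.Map;
import java.util.TreeMap;

// Check Transfer-Encoding first: a chunked response carries its length in
// the chunk framing and legitimately omits Content-Length, so only
// non-chunked responses should be required to have a matching length.
public class ContentLengthCheck {

    static void validate(Map<String, String> headers, long expectedLength) {
        String encoding = headers.get("Transfer-Encoding");
        if ("chunked".equalsIgnoreCase(encoding)) {
            return;   // length is carried by the chunk framing
        }
        String lengthHeader = headers.get("Content-Length");
        if (lengthHeader == null) {
            throw new IllegalStateException("Missing Content-Length header");
        }
        if (Long.parseLong(lengthHeader) != expectedLength) {
            throw new IllegalStateException("Content-Length " + lengthHeader
                + " != expected " + expectedLength);
        }
    }

    // HTTP header names are case-insensitive.
    static Map<String, String> headerMap() {
        return new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
    }
}
```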
[jira] [Updated] (HDFS-3766) Release stream and storage directory for removed streams, and fix TestStorageRestore on Windows
[ https://issues.apache.org/jira/browse/HDFS-3766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brandon Li updated HDFS-3766: - Attachment: HDFS-3766.patch Release stream and storage directory for removed streams, and fix TestStorageRestore on Windows --- Key: HDFS-3766 URL: https://issues.apache.org/jira/browse/HDFS-3766 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: 1-win Reporter: Brandon Li Assignee: Brandon Li Attachments: HDFS-3766.patch When a storage directory is removed, namenode doesn't close the stream and storage directory is remained locked. This could fail later on the restoring storage directory function because namenode will not be able to format original directory. Unlike Linux, Windows doesn't allow deleting a file or directory which is opened with no share/delete permission by a different process. Similar problem also caused TestStorageRestore to fail because it can't delete the directories/files being used by the test itself. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3766) Release stream and storage directory for removed streams, and fix TestStorageRestore on Windows
[ https://issues.apache.org/jira/browse/HDFS-3766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13433723#comment-13433723 ] Brandon Li commented on HDFS-3766: -- Patch uploaded for branch-1-win Release stream and storage directory for removed streams, and fix TestStorageRestore on Windows --- Key: HDFS-3766 URL: https://issues.apache.org/jira/browse/HDFS-3766 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: 1-win Reporter: Brandon Li Assignee: Brandon Li Attachments: HDFS-3766.patch When a storage directory is removed, namenode doesn't close the stream and storage directory is remained locked. This could fail later on the restoring storage directory function because namenode will not be able to format original directory. Unlike Linux, Windows doesn't allow deleting a file or directory which is opened with no share/delete permission by a different process. Similar problem also caused TestStorageRestore to fail because it can't delete the directories/files being used by the test itself. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3150) Add option for clients to contact DNs via hostname
[ https://issues.apache.org/jira/browse/HDFS-3150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13433724#comment-13433724 ] Aaron T. Myers commented on HDFS-3150: -- The trunk patch looks pretty good to me. One little comment: bq. @param useHostname if name should use a hostname or IP This comment reads a little funny. Maybe true to use the hostname of the DN, false to use the IP address. +1 once this is addressed. Add option for clients to contact DNs via hostname -- Key: HDFS-3150 URL: https://issues.apache.org/jira/browse/HDFS-3150 Project: Hadoop HDFS Issue Type: New Feature Components: data-node, hdfs client Affects Versions: 1.0.0, 2.0.0-alpha Reporter: Eli Collins Assignee: Eli Collins Fix For: 1.1.0 Attachments: hdfs-3150-b1.txt, hdfs-3150-b1.txt, hdfs-3150.txt, hdfs-3150.txt, hdfs-3150.txt The DN listens on multiple IP addresses (the default {{dfs.datanode.address}} is the wildcard) however per HADOOP-6867 only the source address (IP) of the registration is given to clients. HADOOP-985 made clients access datanodes by IP primarily to avoid the latency of a DNS lookup, this had the side effect of breaking DN multihoming (the client can not route the IP exposed by the NN if the DN registers with an interface that has a cluster-private IP). To fix this let's add back the option for Datanodes to be accessed by hostname. This can be done by: # Modifying the primary field of the Datanode descriptor to be the hostname, or # Modifying Client/Datanode - Datanode access use the hostname field instead of the IP Approach #2 does not require an incompatible client protocol change, and is much less invasive. It minimizes the scope of modification to just places where clients and Datanodes connect, vs changing all uses of Datanode identifiers. 
New client and Datanode configuration options are introduced: - {{dfs.client.use.datanode.hostname}} indicates all client to datanode connections should use the datanode hostname (as clients outside cluster may not be able to route the IP) - {{dfs.datanode.use.datanode.hostname}} indicates whether Datanodes should use hostnames when connecting to other Datanodes for data transfer If the configuration options are not used, there is no change in the current behavior. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3790) test_fuse_dfs.c doesn't compile on centos 5
[ https://issues.apache.org/jira/browse/HDFS-3790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13433736#comment-13433736 ] Hadoop QA commented on HDFS-3790: - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12540713/HDFS-3790.001.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 1 new or modified test files. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 javadoc. The javadoc tool did not generate any warning messages. +1 eclipse:eclipse. The patch built with eclipse:eclipse. -1 findbugs. The patch appears to introduce 2 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/2991//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/2991//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/2991//console This message is automatically generated. test_fuse_dfs.c doesn't compile on centos 5 --- Key: HDFS-3790 URL: https://issues.apache.org/jira/browse/HDFS-3790 Project: Hadoop HDFS Issue Type: Bug Components: fuse-dfs Affects Versions: 2.2.0-alpha Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Priority: Minor Attachments: HDFS-3790.001.patch test_fuse_dfs.c uses execvpe, which doesn't exist in the version of glibc shipped on CentOS 5. -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-3790) test_fuse_dfs.c doesn't compile on centos 5
[ https://issues.apache.org/jira/browse/HDFS-3790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron T. Myers updated HDFS-3790: - Resolution: Fixed Fix Version/s: 2.2.0-alpha Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) I've just committed this to trunk and branch-2. Thanks a lot for fixing this, Colin. test_fuse_dfs.c doesn't compile on centos 5 --- Key: HDFS-3790 URL: https://issues.apache.org/jira/browse/HDFS-3790 Project: Hadoop HDFS Issue Type: Bug Components: fuse-dfs Affects Versions: 2.2.0-alpha Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Priority: Minor Fix For: 2.2.0-alpha Attachments: HDFS-3790.001.patch test_fuse_dfs.c uses execvpe, which doesn't exist in the version of glibc shipped on CentOS 5.
[jira] [Commented] (HDFS-3795) QJM: validate journal dir at startup
[ https://issues.apache.org/jira/browse/HDFS-3795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13433773#comment-13433773 ] Aaron T. Myers commented on HDFS-3795: -- # Instead of {{!dir.getPath().startsWith("/")}}, how about {{!dir.isAbsolute()}}? # If the path is not a directory, this will fail with a misleading error message: {code} +if (!dir.isDirectory() && !dir.mkdirs()) { + throw new IOException("Could not create journal dir '" + dir + "'"); +} {code} Patch looks good otherwise. QJM: validate journal dir at startup Key: HDFS-3795 URL: https://issues.apache.org/jira/browse/HDFS-3795 Project: Hadoop HDFS Issue Type: Sub-task Components: ha Affects Versions: QuorumJournalManager (HDFS-3077) Reporter: Todd Lipcon Assignee: Todd Lipcon Priority: Minor Attachments: hdfs-3795.txt Currently, the JN does not validate the configured journal directory until it tries to write into it. This is counter-intuitive for users, since they would expect to find out about a misconfiguration at startup time, rather than on first access. Additionally, two testers accidentally configured the journal dir to be a URI, which the code accidentally understood as a relative path ({{CWD/file:/foo/bar}}). We should validate the config at startup to be an accessible absolute path.
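For context, the two suggestions in the review above could be combined into a sketch like the following. The class and method names here are hypothetical illustrations, not the actual JournalNode code:

```java
import java.io.File;
import java.io.IOException;

// Hypothetical sketch of the startup validation being discussed:
// reject non-absolute paths up front, and distinguish "exists but is
// not a directory" from "could not be created" so the error message
// is not misleading. Not the actual JournalNode implementation.
public class JournalDirValidator {
    public static void validate(File dir) throws IOException {
        if (!dir.isAbsolute()) {
            throw new IOException("Journal dir '" + dir
                + "' should be an absolute path");
        }
        if (dir.exists() && !dir.isDirectory()) {
            throw new IOException("Journal dir '" + dir
                + "' exists but is not a directory");
        }
        if (!dir.isDirectory() && !dir.mkdirs()) {
            throw new IOException("Could not create journal dir '"
                + dir + "'");
        }
    }
}
```

Checking each failure mode separately is what makes the error message accurate: the original single {{mkdirs}} check reports "could not create" even when the path exists as a regular file.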
[jira] [Updated] (HDFS-3672) Expose disk-location information for blocks to enable better scheduling
[ https://issues.apache.org/jira/browse/HDFS-3672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Wang updated HDFS-3672: -- Attachment: hdfs-3672-9.patch Expose disk-location information for blocks to enable better scheduling --- Key: HDFS-3672 URL: https://issues.apache.org/jira/browse/HDFS-3672 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 2.0.0-alpha Reporter: Andrew Wang Assignee: Andrew Wang Attachments: design-doc-v1.pdf, design-doc-v2.pdf, hdfs-3672-1.patch, hdfs-3672-2.patch, hdfs-3672-3.patch, hdfs-3672-4.patch, hdfs-3672-5.patch, hdfs-3672-6.patch, hdfs-3672-7.patch, hdfs-3672-8.patch, hdfs-3672-9.patch Currently, HDFS exposes on which datanodes a block resides, which allows clients to make scheduling decisions for locality and load balancing. Extending this to also expose on which disk on a datanode a block resides would enable even better scheduling, on a per-disk rather than coarse per-datanode basis. This API would likely look similar to Filesystem#getFileBlockLocations, but also involve a series of RPCs to the responsible datanodes to determine disk ids. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3792) Fix two findbugs introduced by HDFS-3695
[ https://issues.apache.org/jira/browse/HDFS-3792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13433787#comment-13433787 ] Hadoop QA commented on HDFS-3792: - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12540757/hdfs-3792.txt against trunk revision . +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 javadoc. The javadoc tool did not generate any warning messages. +1 eclipse:eclipse. The patch built with eclipse:eclipse. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.namenode.metrics.TestNameNodeMetrics +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/2994//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/2994//console This message is automatically generated. Fix two findbugs introduced by HDFS-3695 Key: HDFS-3792 URL: https://issues.apache.org/jira/browse/HDFS-3792 Project: Hadoop HDFS Issue Type: Bug Components: build, name-node Affects Versions: 3.0.0 Reporter: Todd Lipcon Assignee: Todd Lipcon Priority: Trivial Attachments: hdfs-3792.txt Accidentally introduced two trivial findbugs warnings in HDFS-3695. This JIRA is to fix them. -- This message is automatically generated by JIRA. 
[jira] [Commented] (HDFS-3672) Expose disk-location information for blocks to enable better scheduling
[ https://issues.apache.org/jira/browse/HDFS-3672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13433788#comment-13433788 ] Andrew Wang commented on HDFS-3672: --- Thanks everyone for all your input! Here's another spin of the patch. Big things: * I renamed the Disk* classes to BlockStorageLocation and VolumeId, and tried to update all the javadoc/comments. * I split out most of the DFSClient code into a new BlockStorageLocationUtil class, which is ~300 lines of static methods. I pulled apart one of the long methods. Doing this for the other long method would arguably be messier, so I left it. * Added the DN-side config option. If any of the DNs throws an UnsupportedOperationException, it's bubbled up to the client (thus failing the entire call). The client-side code also checks for the same DN config option, so you need to enable it in both the client and DN for this to do anything. * Bumped the DN handler count to 10. I think Suresh's other more minor comments are also addressed. Expose disk-location information for blocks to enable better scheduling --- Key: HDFS-3672 URL: https://issues.apache.org/jira/browse/HDFS-3672 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 2.0.0-alpha Reporter: Andrew Wang Assignee: Andrew Wang Attachments: design-doc-v1.pdf, design-doc-v2.pdf, hdfs-3672-1.patch, hdfs-3672-2.patch, hdfs-3672-3.patch, hdfs-3672-4.patch, hdfs-3672-5.patch, hdfs-3672-6.patch, hdfs-3672-7.patch, hdfs-3672-8.patch, hdfs-3672-9.patch Currently, HDFS exposes on which datanodes a block resides, which allows clients to make scheduling decisions for locality and load balancing. Extending this to also expose on which disk on a datanode a block resides would enable even better scheduling, on a per-disk rather than coarse per-datanode basis. This API would likely look similar to Filesystem#getFileBlockLocations, but also involve a series of RPCs to the responsible datanodes to determine disk ids. 
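The scheduling benefit HDFS-3672 is after can be shown with a small toy, independent of the Hadoop API. This is illustrative only (the block and volume names below are made up, and the real API returns BlockStorageLocation/VolumeId objects rather than strings): once a client knows which volume each replica lives on, it can spread work across disks rather than just across datanodes.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy illustration (not Hadoop code) of per-disk scheduling: given a
// mapping from block to the volumes holding its replicas, assign each
// block to the least-loaded volume seen so far, so reads spread across
// disks instead of piling onto one. Assumes every block has at least
// one replica.
public class PerDiskScheduler {
    public static Map<String, String> assign(
            Map<String, List<String>> blockToVolumes) {
        Map<String, Integer> load = new HashMap<>();
        Map<String, String> assignment = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> e : blockToVolumes.entrySet()) {
            String best = null;
            for (String vol : e.getValue()) {
                if (best == null
                        || load.getOrDefault(vol, 0) < load.getOrDefault(best, 0)) {
                    best = vol; // prefer the volume with fewer assignments
                }
            }
            assignment.put(e.getKey(), best);
            load.merge(best, 1, Integer::sum);
        }
        return assignment;
    }
}
```

With only datanode-level locations, both blocks below would look identical to a scheduler; with volume ids, they can be read from different disks in parallel.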
[jira] [Updated] (HDFS-3795) QJM: validate journal dir at startup
[ https://issues.apache.org/jira/browse/HDFS-3795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon updated HDFS-3795: -- Attachment: hdfs-3795.txt Updated patch addresses ATM's feedback. QJM: validate journal dir at startup Key: HDFS-3795 URL: https://issues.apache.org/jira/browse/HDFS-3795 Project: Hadoop HDFS Issue Type: Sub-task Components: ha Affects Versions: QuorumJournalManager (HDFS-3077) Reporter: Todd Lipcon Assignee: Todd Lipcon Priority: Minor Attachments: hdfs-3795.txt, hdfs-3795.txt Currently, the JN does not validate the configured journal directory until it tries to write into it. This is counter-intuitive for users, since they would expect to find out about a misconfiguration at startup time, rather than on first access. Additionally, two testers accidentally configured the journal dir to be a URI, which the code accidentally understood as a relative path ({{CWD/file:/foo/bar}}). We should validate the config at startup to be an accessible absolute path.
[jira] [Updated] (HDFS-3792) Fix two findbugs introduced by HDFS-3695
[ https://issues.apache.org/jira/browse/HDFS-3792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon updated HDFS-3792: -- Resolution: Fixed Fix Version/s: 3.0.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Committed to trunk, thx for review, sorry for missing this. Fix two findbugs introduced by HDFS-3695 Key: HDFS-3792 URL: https://issues.apache.org/jira/browse/HDFS-3792 Project: Hadoop HDFS Issue Type: Bug Components: build, name-node Affects Versions: 3.0.0 Reporter: Todd Lipcon Assignee: Todd Lipcon Priority: Trivial Fix For: 3.0.0 Attachments: hdfs-3792.txt Accidentally introduced two trivial findbugs warnings in HDFS-3695. This JIRA is to fix them. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3765) Namenode INITIALIZESHAREDEDITS should be able to initialize all shared storages
[ https://issues.apache.org/jira/browse/HDFS-3765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13433797#comment-13433797 ] Hadoop QA commented on HDFS-3765: - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12540746/hdfs-3765.txt against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified test files. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 javadoc. The javadoc tool did not generate any warning messages. +1 eclipse:eclipse. The patch built with eclipse:eclipse. -1 findbugs. The patch appears to introduce 2 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed unit tests in hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs hadoop-hdfs-project/hadoop-hdfs/src/contrib/bkjournal. +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/2995//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/2995//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/2995//console This message is automatically generated. 
Namenode INITIALIZESHAREDEDITS should be able to initialize all shared storages --- Key: HDFS-3765 URL: https://issues.apache.org/jira/browse/HDFS-3765 Project: Hadoop HDFS Issue Type: Improvement Components: ha Affects Versions: 2.1.0-alpha, 3.0.0 Reporter: Vinay Assignee: Vinay Attachments: HDFS-3765.patch, HDFS-3765.patch, HDFS-3765.patch, hdfs-3765.txt, hdfs-3765.txt, hdfs-3765.txt Currently, NameNode INITIALIZESHAREDEDITS provides the ability to copy the edits files to file-schema-based shared storages when moving a cluster from a non-HA environment to an HA-enabled environment. This Jira focuses on the following: * Generalizing the logic of copying the edits to the new shared storage so that any schema-based shared storage can be initialized for an HA cluster.
[jira] [Commented] (HDFS-3765) Namenode INITIALIZESHAREDEDITS should be able to initialize all shared storages
[ https://issues.apache.org/jira/browse/HDFS-3765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13433800#comment-13433800 ] Todd Lipcon commented on HDFS-3765: --- The findbugs warnings are from HDFS-3695. They have already been fixed in HDFS-3792 (committed just before this QA report). Namenode INITIALIZESHAREDEDITS should be able to initialize all shared storages --- Key: HDFS-3765 URL: https://issues.apache.org/jira/browse/HDFS-3765 Project: Hadoop HDFS Issue Type: Improvement Components: ha Affects Versions: 2.1.0-alpha, 3.0.0 Reporter: Vinay Assignee: Vinay Attachments: HDFS-3765.patch, HDFS-3765.patch, HDFS-3765.patch, hdfs-3765.txt, hdfs-3765.txt, hdfs-3765.txt Currently, NameNode INITIALIZESHAREDEDITS provides the ability to copy the edits files to file-schema-based shared storages when moving a cluster from a non-HA environment to an HA-enabled environment. This Jira focuses on the following: * Generalizing the logic of copying the edits to the new shared storage so that any schema-based shared storage can be initialized for an HA cluster.
[jira] [Created] (HDFS-3799) QJM: handle empty log segments during recovery
Todd Lipcon created HDFS-3799: - Summary: QJM: handle empty log segments during recovery Key: HDFS-3799 URL: https://issues.apache.org/jira/browse/HDFS-3799 Project: Hadoop HDFS Issue Type: Sub-task Components: ha Affects Versions: QuorumJournalManager (HDFS-3077) Reporter: Todd Lipcon Assignee: Todd Lipcon One of the cases not yet handled in the QJM branch is the one where either the writer or the journal node crashes after startLogSegment() but before it has written its first transaction to the log. We currently have TODO assertions in the code which fire in these cases. This JIRA is to deal with these cases. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-3799) QJM: handle empty log segments during recovery
[ https://issues.apache.org/jira/browse/HDFS-3799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon updated HDFS-3799: -- Attachment: hdfs-3799.txt The solution is as follows: - during recovery, when we validate a log, if the log has no transactions, then we remove the file (same as if the log segment was never started) - when coordinating recovery, if none of the loggers have any non-empty logs, then we don't have to take any action. We can simply treat the recovery as a no-op. QJM: handle empty log segments during recovery -- Key: HDFS-3799 URL: https://issues.apache.org/jira/browse/HDFS-3799 Project: Hadoop HDFS Issue Type: Sub-task Components: ha Affects Versions: QuorumJournalManager (HDFS-3077) Reporter: Todd Lipcon Assignee: Todd Lipcon Attachments: hdfs-3799.txt One of the cases not yet handled in the QJM branch is the one where either the writer or the journal node crashes after startLogSegment() but before it has written its first transaction to the log. We currently have TODO assertions in the code which fire in these cases. This JIRA is to deal with these cases. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
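The two recovery rules described in the update above can be condensed into a tiny decision sketch. The class and method names are hypothetical; the real logic lives in the QJM segment recovery code:

```java
import java.util.List;

// Sketch of the recovery decision described above: a segment file with
// zero transactions is treated as if it was never started, and when no
// logger has a non-empty segment, recovery is a no-op. Hypothetical
// names, not the QJM implementation.
public class EmptySegmentRecovery {
    // txnCounts: transactions found per journal node for the segment
    // being recovered; -1 marks a node that has no segment file at all.
    public static boolean recoveryNeeded(List<Integer> txnCounts) {
        for (int n : txnCounts) {
            if (n > 0) {
                return true; // someone has real transactions to recover
            }
        }
        return false; // all segments empty or absent: treat as a no-op
    }
}
```

Treating an empty segment like a missing one is what removes the need for the TODO assertions: the "started but never wrote" case collapses into the already-handled "never started" case.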
[jira] [Updated] (HDFS-3723) All commands should support meaningful --help
[ https://issues.apache.org/jira/browse/HDFS-3723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-3723: Attachment: HDFS-3723.001.patch All commands should support meaningful --help - Key: HDFS-3723 URL: https://issues.apache.org/jira/browse/HDFS-3723 Project: Hadoop HDFS Issue Type: Improvement Components: scripts, tools Affects Versions: 2.0.0-alpha Reporter: E. Sammer Assignee: Jing Zhao Attachments: HDFS-3723.001.patch, HDFS-3723.001.patch, HDFS-3723.patch, HDFS-3723.patch Some (sub)commands support -help or -h options for detailed help while others do not. Ideally, all commands should support meaningful help that works regardless of current state or configuration. For example, hdfs zkfc --help (or -h or -help) is not very useful. Option checking should occur before state / configuration checking. {code} [esammer@hadoop-fed01 ~]# hdfs zkfc --help Exception in thread main org.apache.hadoop.HadoopIllegalArgumentException: HA is not enabled for this namenode. at org.apache.hadoop.hdfs.tools.DFSZKFailoverController.setConf(DFSZKFailoverController.java:122) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:66) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84) at org.apache.hadoop.hdfs.tools.DFSZKFailoverController.main(DFSZKFailoverController.java:168) {code} This would go a long way toward better usability for ops staff. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
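The ordering fix being requested — recognize help flags before any state or configuration checks — might be sketched as follows. The flag spellings and usage string are illustrative, not the actual DFSZKFailoverController code:

```java
// Sketch of "option checking before state/configuration checking":
// scan args for a help flag first, and only fall through to
// setConf()-style validation when no help was requested. That way
// "hdfs zkfc --help" prints usage even when HA is not enabled.
public class HelpFirst {
    static final String USAGE = "Usage: hdfs zkfc [-formatZK] [-h|--help]";

    // Returns the usage text if a help flag is present, else null,
    // letting the caller skip configuration checks for help requests.
    public static String checkHelp(String[] args) {
        for (String a : args) {
            if (a.equals("-h") || a.equals("-help") || a.equals("--help")) {
                return USAGE;
            }
        }
        return null;
    }
}
```

The key design point is simply that the help scan has no dependencies: it touches no Configuration object, so it cannot throw the HadoopIllegalArgumentException shown in the stack trace above.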
[jira] [Created] (HDFS-3800) QJM: improvements to QJM fault testing
Todd Lipcon created HDFS-3800: - Summary: QJM: improvements to QJM fault testing Key: HDFS-3800 URL: https://issues.apache.org/jira/browse/HDFS-3800 Project: Hadoop HDFS Issue Type: Sub-task Affects Versions: QuorumJournalManager (HDFS-3077) Reporter: Todd Lipcon Assignee: Todd Lipcon This JIRA improves TestQJMWithFaults as follows: - the current implementation didn't properly unwrap exceptions thrown by the reflection-based injection method. This caused some issues in the code where the injecting proxy didn't act quite like the original object. - the current implementation incorrectly assumed that the recovery process would recover to _exactly_ the last acked sequence number. In fact, it may recover to that transaction _or any greater transaction_. It also adds a new randomized test which uncovered a number of other bugs. I will defer to the included javadoc for a description of this test. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
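The first bullet — making a reflection-based fault-injection proxy act like the original object — generally comes down to rethrowing the cause of InvocationTargetException rather than the wrapper. A sketch under that assumption (names are illustrative, not the TestQJMWithFaults code):

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;

// Sketch of the unwrapping fix described above: when a proxy invokes
// the real method via reflection, any exception the method throws
// arrives wrapped in InvocationTargetException. Rethrowing getCause()
// lets callers see the same exception type the original object would
// have thrown. Hypothetical helper, not the actual test code.
public class Unwrap {
    public static Object invokeUnwrapped(Method m, Object target,
            Object... args) throws Throwable {
        try {
            return m.invoke(target, args);
        } catch (InvocationTargetException ite) {
            throw ite.getCause(); // surface the real exception
        }
    }
}
```

Without the unwrap, code that catches a specific exception type (say, IOException) from the proxied object would instead see InvocationTargetException and behave differently — exactly the "didn't act quite like the original object" symptom described above.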
[jira] [Updated] (HDFS-3800) QJM: improvements to QJM fault testing
[ https://issues.apache.org/jira/browse/HDFS-3800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon updated HDFS-3800: -- Attachment: hdfs-3800.txt This patch applies on top of the following other staged patches: pick fac7f15 HDFS-3796. Allow EditLogFileOutputStream to skip fsync() in tests pick 5a18397 HDFS-3765. initializeSharedEdits pick 85978e7 HDFS-3793. Implement format() for QJM pick 4fc442e HDFS-3795. Validate journal dir at startup pick f2da880 HDFS-3798. Avoid throwing NPE if finalizeLogSegment() is called on an invalid segment pick b0d1a3d HDFS-3799. deal with empty files in recovery path pick d792847 HDFS-3797. Make journal() call take segmentTxId as parameter The new randomized test requires a few more patches on top before it passes reliably. However, I'd like to check it in and then get it passing reliably in follow-up JIRAs. QJM: improvements to QJM fault testing -- Key: HDFS-3800 URL: https://issues.apache.org/jira/browse/HDFS-3800 Project: Hadoop HDFS Issue Type: Sub-task Affects Versions: QuorumJournalManager (HDFS-3077) Reporter: Todd Lipcon Assignee: Todd Lipcon Attachments: hdfs-3800.txt This JIRA improves TestQJMWithFaults as follows: - the current implementation didn't properly unwrap exceptions thrown by the reflection-based injection method. This caused some issues in the code where the injecting proxy didn't act quite like the original object. - the current implementation incorrectly assumed that the recovery process would recover to _exactly_ the last acked sequence number. In fact, it may recover to that transaction _or any greater transaction_. It also adds a new randomized test which uncovered a number of other bugs. I will defer to the included javadoc for a description of this test. -- This message is automatically generated by JIRA. 
[jira] [Commented] (HDFS-3586) Blocks are not getting replicated even when DNs are available
[ https://issues.apache.org/jira/browse/HDFS-3586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13433835#comment-13433835 ] Han Xiao commented on HDFS-3586: Uma, you are right. My comments also include a condition (not in the expression): as long as a good one exists. Your description coincides exactly with my ideas. Hope it will be done. Blocks are not getting replicated even when DNs are available Key: HDFS-3586 URL: https://issues.apache.org/jira/browse/HDFS-3586 Project: Hadoop HDFS Issue Type: Bug Components: data-node, name-node Affects Versions: 2.0.0-alpha, 2.1.0-alpha, 3.0.0 Reporter: Brahma Reddy Battula Assignee: amith Attachments: HDFS-3586-analysis.txt Scenario: = Started four DNs (say DN1, DN2, DN3 and DN4) and wrote files with RF=3; the pipeline was formed with DN1-DN2-DN3. Since DN3's network is very slow, it is not able to send acks. The pipeline is then re-formed with DN1-DN2-DN4, but DN4's network is also slow. Finally, commitBlockSynchronization happened to DN1 and DN2 successfully. The block is present on all four DNs (finalized state on two DNs and RBW state on the others). Now the NN asks DN3 and DN4 to replicate, but this fails since replicas are already present in their RBW dirs.
[jira] [Updated] (HDFS-3788) distcp can't copy large files using webhdfs due to missing Content-Length header
[ https://issues.apache.org/jira/browse/HDFS-3788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo (Nicholas), SZE updated HDFS-3788: - Attachment: h3788_20120813.patch h3788_20120813.patch: check content-length only for non-chunked transfer encoding. distcp can't copy large files using webhdfs due to missing Content-Length header Key: HDFS-3788 URL: https://issues.apache.org/jira/browse/HDFS-3788 Project: Hadoop HDFS Issue Type: Bug Components: webhdfs Affects Versions: 0.23.3, 2.0.0-alpha Reporter: Eli Collins Priority: Critical Attachments: distcp-webhdfs-errors.txt, h3788_20120813.patch The following command fails when data1 contains a 3gb file. It passes when using hftp or when the directory just contains smaller (2gb) files, so looks like a webhdfs issue with large files. {{hadoop distcp webhdfs://eli-thinkpad:50070/user/eli/data1 hdfs://localhost:8020/user/eli/data2}} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
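As the patch summary above describes, the fix relaxes the check rather than removing it: Content-Length is only required when the response is not using chunked transfer encoding. A sketch of that rule follows; the map-of-headers shape and class name are illustrative, not the WebHDFS client code:

```java
import java.util.Map;

// Sketch of "check content-length only for non-chunked transfer
// encoding": a chunked response legitimately omits Content-Length,
// so only non-chunked responses are validated against the expected
// length. Hypothetical helper, not the actual WebHDFS code.
public class ContentLengthCheck {
    public static void validate(Map<String, String> headers,
            long expectedLength) {
        String te = headers.get("Transfer-Encoding");
        if (te != null && te.equalsIgnoreCase("chunked")) {
            return; // chunked: no Content-Length to check
        }
        String cl = headers.get("Content-Length");
        if (cl == null || Long.parseLong(cl) != expectedLength) {
            throw new IllegalStateException(
                "Content-Length missing or does not match expected length");
        }
    }
}
```

This matches the reported symptom: large (3gb+) transfers that the server streams without a Content-Length header should not be rejected, while genuinely truncated non-chunked responses still fail fast.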