[jira] [Commented] (HDFS-1371) One bad node can incorrectly flag many files as corrupt
[ https://issues.apache.org/jira/browse/HDFS-1371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13032263#comment-13032263 ] Konstantin Shvachko commented on HDFS-1371: --- I understand you are trying to avoid writing verification logic on the NN, and you don't want to trust clients, as they can be wrong. I agree with the first part of your design, which reports a bad replica if the client can read at least one. I disagree that failure of all replicas should be concealed from the NN. Can we do this: if a client reports all replicas corrupt, then the NN chooses one of the remaining (presumed healthy) replicas and adds it to the corrupt set. In this case one bad client cannot immediately corrupt the entire block, but if the block is really corrupt then eventually all replicas will be marked corrupt by different clients reading it. As I said, not seeing something in the past doesn't mean you should not plan for it. In real life things may change quickly. You can get a shipment of faulty drives or get buggy software (not necessarily in Hadoop). With your solution you will not even know there is a problem. One bad node can incorrectly flag many files as corrupt --- Key: HDFS-1371 URL: https://issues.apache.org/jira/browse/HDFS-1371 Project: Hadoop HDFS Issue Type: Bug Components: hdfs client, name-node Affects Versions: 0.20.1 Environment: yahoo internal version [knoguchi@gwgd4003 ~]$ hadoop version Hadoop 0.20.104.3.1007030707 Reporter: Koji Noguchi Assignee: Tanping Wang Attachments: HDFS-1371.04252011.patch, HDFS-1371.0503.patch On our cluster, 12 files were reported as corrupt by fsck even though the replicas on the datanodes were healthy. Turns out that all the replicas (12 files x 3 replicas per file) were reported corrupt from one node. Surprisingly, these files were still readable/accessible from dfsclient (-get/-cat) without any problems. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
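One reading of Konstantin's proposal, in sketch form; the helper names (getPresumedHealthyReplicas, chooseOne, markReplicaCorrupt) are hypothetical, not the actual NameNode API:
{code}
// Sketch: when a client claims every replica is corrupt, mark only ONE
// presumed-healthy replica corrupt. A single bad client then cannot kill
// the whole block, but a genuinely corrupt block still converges, because
// each subsequent client reading it marks one more replica.
void handleAllReplicasCorrupt(Block block) {
  List<DatanodeID> healthy = getPresumedHealthyReplicas(block);
  if (!healthy.isEmpty()) {
    markReplicaCorrupt(block, chooseOne(healthy));
  }
}
{code}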
[jira] [Created] (HDFS-1924) Block information displayed in UI is incorrect
Block information displayed in UI is incorrect --- Key: HDFS-1924 URL: https://issues.apache.org/jira/browse/HDFS-1924 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: 0.20-append Reporter: ramkrishna.s.vasudevan Priority: Minor Fix For: 0.20-append Problem statement: Deleted blocks are not removed from the blockmap. Solution: Whenever delete is called, the block entry must be removed from the block map and also moved to invalidates. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
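A minimal sketch of the proposed fix, assuming hypothetical names for the block map and the invalidation queue (the real FSNamesystem internals may differ):
{code}
// On delete, drop each block from the block map (so the UI stops counting
// it) and queue it for invalidation on the datanodes.
void removeBlocks(INodeFile file) {
  for (Block b : file.getBlocks()) {
    blocksMap.removeBlock(b);  // hypothetical: remove the map entry
    addToInvalidates(b);       // hypothetical: schedule replica deletion
  }
}
{code}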
[jira] [Updated] (HDFS-1891) TestBackupNode fails intermittently
[ https://issues.apache.org/jira/browse/HDFS-1891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantin Shvachko updated HDFS-1891: -- Affects Version/s: (was: 0.23.0) 0.22.0 0.22 should have the same problem. Could you please commit it to .22? TestBackupNode fails intermittently --- Key: HDFS-1891 URL: https://issues.apache.org/jira/browse/HDFS-1891 Project: Hadoop HDFS Issue Type: Bug Components: test Affects Versions: 0.22.0 Reporter: Suresh Srinivas Assignee: Suresh Srinivas Fix For: 0.23.0 Attachments: HDFS-1891.part2.patch, HDFS-1891.patch TestBackupNode fails due to an unexpected ipv6 address format. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-1905) Improve the usability of namenode -format
[ https://issues.apache.org/jira/browse/HDFS-1905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13032268#comment-13032268 ] Bharath Mundlapudi commented on HDFS-1905: -- The cluster ID is displayed on the dfshealth web page. If we have multiple clusters, then having a proper cluster name defined by admins will be useful. If the user executes the following command, then the correct usage is indeed displayed: ./hdfs namenode -format -help This should be corrected in all the paths. Improve the usability of namenode -format -- Key: HDFS-1905 URL: https://issues.apache.org/jira/browse/HDFS-1905 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: 0.23.0 Reporter: Bharath Mundlapudi Assignee: Bharath Mundlapudi Priority: Minor Fix For: 0.23.0 While setting up a 0.23-based cluster, I ran into this issue. When I issue a namenode format command, which changed in 0.23, it should let the user know how to use the command when the complete options are not specified. ./hdfs namenode -format I get the following error msg, but it is still not clear how the user should use this command. 11/05/09 15:36:25 ERROR namenode.NameNode: java.lang.IllegalArgumentException: Format must be provided with clusterid at org.apache.hadoop.hdfs.server.namenode.NameNode.format(NameNode.java:1483) at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1623) at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1689) The usability of this command can be improved. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
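The kind of change being asked for might look like the sketch below: validate the arguments up front and print usage text instead of surfacing a bare IllegalArgumentException. The exact usage string is an assumption, not the committed fix:
{code}
// Hypothetical argument check in the -format path: fail with usage text
// rather than a stack trace when the cluster id is missing.
if (clusterId == null || clusterId.isEmpty()) {
  System.err.println("Usage: hdfs namenode -format [-clusterid <cid>]");
  System.err.println("Format must be provided with a cluster id.");
  System.exit(-1);
}
{code}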
[jira] [Commented] (HDFS-1919) Upgrade to federated namespace fails
[ https://issues.apache.org/jira/browse/HDFS-1919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13032270#comment-13032270 ] Suresh Srinivas commented on HDFS-1919: --- I do not see any restarts failing. Perhaps you did not do a clean build and the static LV was not updated in some classes. However, I do see that bumping up the layout version for 203, 22 and trunk (from HDFS-1842 and HDFS-1824) has caused problems with some of the version checks for the new fsimage loading scheme, the checksum of edits, etc. I will change the title of this bug and fix them. Upgrade to federated namespace fails Key: HDFS-1919 URL: https://issues.apache.org/jira/browse/HDFS-1919 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: 0.23.0 Reporter: Todd Lipcon Assignee: Suresh Srinivas Priority: Blocker Fix For: 0.23.0 Attachments: hdfs-1919.txt I formatted a namenode running off the 0.22 branch, and trying to start it on trunk yields: org.apache.hadoop.hdfs.server.common.InconsistentFSStateException: Directory /tmp/name1 is in an inconsistent state: file VERSION has clusterID missing. It looks like 0.22 has LAYOUT_VERSION -33, but trunk has LAST_PRE_FEDERATION_LAYOUT_VERSION = -30, which is incorrect. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
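To make the failure mode concrete: layout versions grow more negative over time, so -33 (0.22) is newer than -30. The constant name below comes from the report; the method is illustrative only, not the actual trunk code:
{code}
// Illustrative only: with LAST_PRE_FEDERATION_LAYOUT_VERSION = -30, a 0.22
// image (layout version -33) is misclassified as federated, so the code
// demands a clusterID that pre-federation images never wrote.
static final int LAST_PRE_FEDERATION_LAYOUT_VERSION = -30; // per the report, should be -33

boolean isPreFederation(int layoutVersion) {
  // -33 >= -30 is false, so the 0.22 image wrongly fails this test
  return layoutVersion >= LAST_PRE_FEDERATION_LAYOUT_VERSION;
}
{code}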
[jira] [Created] (HDFS-1925) SafeModeInfo should use DFS_NAMENODE_SAFEMODE_THRESHOLD_PCT_DEFAULT instead of 0.95
SafeModeInfo should use DFS_NAMENODE_SAFEMODE_THRESHOLD_PCT_DEFAULT instead of 0.95 --- Key: HDFS-1925 URL: https://issues.apache.org/jira/browse/HDFS-1925 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 0.22.0 Reporter: Konstantin Shvachko Fix For: 0.22.0 The {{SafeMode()}} constructor has the 0.95f default threshold hard-coded. This should be replaced by the constant {{DFS_NAMENODE_SAFEMODE_THRESHOLD_PCT_DEFAULT}}, which is correctly set to 0.999f. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
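The intended change is essentially one line in the SafeModeInfo constructor; the sketch below assumes the usual Configuration.getFloat pattern with the key/constant names from DFSConfigKeys:
{code}
// Before: hard-coded fallback.
this.threshold = conf.getFloat(DFS_NAMENODE_SAFEMODE_THRESHOLD_PCT_KEY, 0.95f);

// After: use the shared default, which is correctly set to 0.999f.
this.threshold = conf.getFloat(DFS_NAMENODE_SAFEMODE_THRESHOLD_PCT_KEY,
                               DFS_NAMENODE_SAFEMODE_THRESHOLD_PCT_DEFAULT);
{code}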
[jira] [Commented] (HDFS-1918) DataXceiver double logs every IOE out of readBlock
[ https://issues.apache.org/jira/browse/HDFS-1918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13032280#comment-13032280 ] Todd Lipcon commented on HDFS-1918: --- Hmm, I half agree with your assessment. But, I think this patch will change the metrics behavior here, no? Maybe we still need to catch certain classes of exception (socket timeout and connection reset by peer) and treat them as successful block reads as far as metrics are concerned? (And probably log at DEBUG level instead of WARN)? DataXceiver double logs every IOE out of readBlock -- Key: HDFS-1918 URL: https://issues.apache.org/jira/browse/HDFS-1918 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 0.20.2 Reporter: Jean-Daniel Cryans Assignee: Jean-Daniel Cryans Priority: Trivial Fix For: 0.22.0 Attachments: HDFS-1918.patch DataXceiver will log an IOE twice because opReadBlock() will catch it, log a WARN, then throw it again only to be caught in run() as a Throwable and logged as an ERROR. As far as I can tell all the information is the same in both messages. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
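What Todd is suggesting might look roughly like the following inside the read path (Java 6 era, so separate catch blocks; the metrics call and variable names are assumptions, not the actual DataXceiver code):
{code}
try {
  blockSender.sendBlock(out, baseStream, null);
} catch (SocketTimeoutException e) {
  // Client went away or was slow: routine, so log quietly and fall
  // through. "Connection reset by peer" would need similar treatment.
  LOG.debug("Timeout sending " + block + " to " + remoteAddress, e);
} catch (IOException e) {
  throw e; // real errors propagate and are logged exactly once, in run()
}
datanode.myMetrics.blocksRead.inc(); // a client disconnect still counts as a read served
{code}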
[jira] [Updated] (HDFS-1925) SafeModeInfo should use DFS_NAMENODE_SAFEMODE_THRESHOLD_PCT_DEFAULT instead of 0.95
[ https://issues.apache.org/jira/browse/HDFS-1925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron T. Myers updated HDFS-1925: - Labels: newbie (was: ) SafeModeInfo should use DFS_NAMENODE_SAFEMODE_THRESHOLD_PCT_DEFAULT instead of 0.95 --- Key: HDFS-1925 URL: https://issues.apache.org/jira/browse/HDFS-1925 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 0.22.0 Reporter: Konstantin Shvachko Labels: newbie Fix For: 0.22.0 The {{SafeMode()}} constructor has the 0.95f default threshold hard-coded. This should be replaced by the constant {{DFS_NAMENODE_SAFEMODE_THRESHOLD_PCT_DEFAULT}}, which is correctly set to 0.999f. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-671) Documentation change for updated configuration keys.
[ https://issues.apache.org/jira/browse/HDFS-671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13032285#comment-13032285 ] Dmitriy V. Ryaboy commented on HDFS-671: Guys, While working on getting Pig to play with 0.22, I was getting 35 errors for one of the test cases (TestGrunt). In the process of debugging, I noticed a lot of deprecation warnings concerning fs.default.name and changed the references across the Pig codebase to use fs.defaultFS, just to silence the noise. No other code changes were made between compilations. Suddenly the errors dropped to 13. I believe the cause is that Pig plays pretty loose with switching between Conf and using Properties directly, and was using fs.default.name all over the place. Seems like the deprecation warning and docs should use stronger language -- using the old string anywhere but in the xml file is likely to cause problems, as illustrated by this. Documentation change for updated configuration keys. Key: HDFS-671 URL: https://issues.apache.org/jira/browse/HDFS-671 Project: Hadoop HDFS Issue Type: Bug Reporter: Jitendra Nath Pandey Assignee: Tom White Priority: Blocker Fix For: 0.22.0 Attachments: HDFS-671.patch HDFS-531, HADOOP-6233 and HDFS-631 have resulted in changes in several config keys. The hadoop documentation needs to be updated to reflect those changes. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
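The trap Dmitriy describes is that Configuration applies key-deprecation mappings but raw Properties do not. A small illustration (hdfs://nn:8020 is a placeholder):
{code}
// Through Configuration, the deprecated key is still translated (with a
// warning) to fs.defaultFS, so lookups under either name agree.
Configuration conf = new Configuration();
conf.set("fs.default.name", "hdfs://nn:8020");
conf.get("fs.defaultFS"); // resolves via the deprecation mapping

// Raw Properties bypass the deprecation machinery entirely: code reading
// fs.defaultFS sees nothing, which is how tests start failing.
Properties props = new Properties();
props.setProperty("fs.default.name", "hdfs://nn:8020");
props.getProperty("fs.defaultFS"); // returns null
{code}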
[jira] [Commented] (HDFS-1332) When unable to place replicas, BlockPlacementPolicy should log reasons nodes were excluded
[ https://issues.apache.org/jira/browse/HDFS-1332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13032311#comment-13032311 ] Hadoop QA commented on HDFS-1332: - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12478925/HDFS-1332.patch against trunk revision 1102153. +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these core unit tests: org.apache.hadoop.cli.TestHDFSCLI org.apache.hadoop.hdfs.server.namenode.TestOverReplicatedBlocks org.apache.hadoop.hdfs.TestDFSShell org.apache.hadoop.hdfs.TestDFSStorageStateRecovery org.apache.hadoop.hdfs.TestFileConcurrentReader org.apache.hadoop.hdfs.TestHDFSTrash org.apache.hadoop.tools.TestJMXGet +1 contrib tests. The patch passed contrib unit tests. +1 system test framework. The patch passed system test framework compile. Test results: https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/493//testReport/ Findbugs warnings: https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/493//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/493//console This message is automatically generated. When unable to place replicas, BlockPlacementPolicy should log reasons nodes were excluded -- Key: HDFS-1332 URL: https://issues.apache.org/jira/browse/HDFS-1332 Project: Hadoop HDFS Issue Type: Improvement Components: name-node Reporter: Todd Lipcon Assignee: Ted Yu Priority: Minor Labels: newbie Fix For: 0.23.0 Attachments: HDFS-1332.patch Whenever the block placement policy determines that a node is not a good target it could add the reason for exclusion to a list, and then when we log Not able to place enough replicas we could say why each node was refused. This would help new users who are having issues on pseudo-distributed (eg because their data dir is on /tmp and /tmp is full). Right now it's very difficult to figure out the issue. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-1505) saveNamespace appears to succeed even if all directories fail to save
[ https://issues.apache.org/jira/browse/HDFS-1505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13032316#comment-13032316 ] Hadoop QA commented on HDFS-1505: - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12478913/hdfs-1505-trunk.1.patch against trunk revision 1102153. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these core unit tests: org.apache.hadoop.cli.TestHDFSCLI org.apache.hadoop.hdfs.TestDFSShell org.apache.hadoop.hdfs.TestDFSStorageStateRecovery org.apache.hadoop.hdfs.TestFileConcurrentReader org.apache.hadoop.tools.TestJMXGet +1 contrib tests. The patch passed contrib unit tests. +1 system test framework. The patch passed system test framework compile. Test results: https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/492//testReport/ Findbugs warnings: https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/492//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/492//console This message is automatically generated. saveNamespace appears to succeed even if all directories fail to save - Key: HDFS-1505 URL: https://issues.apache.org/jira/browse/HDFS-1505 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 0.22.0, 0.23.0 Reporter: Todd Lipcon Assignee: Aaron T. Myers Priority: Blocker Fix For: 0.22.0 Attachments: hdfs-1505-22.0.patch, hdfs-1505-22.1.patch, hdfs-1505-test.txt, hdfs-1505-trunk.0.patch, hdfs-1505-trunk.1.patch After HDFS-1071, saveNamespace now appears to succeed even if all of the individual directories failed to save. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-1505) saveNamespace appears to succeed even if all directories fail to save
[ https://issues.apache.org/jira/browse/HDFS-1505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13032388#comment-13032388 ] Hadoop QA commented on HDFS-1505: - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12478913/hdfs-1505-trunk.1.patch against trunk revision 1102153. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these core unit tests: org.apache.hadoop.cli.TestHDFSCLI org.apache.hadoop.hdfs.TestDFSShell org.apache.hadoop.hdfs.TestDFSStorageStateRecovery org.apache.hadoop.hdfs.TestFileConcurrentReader org.apache.hadoop.hdfs.TestHDFSTrash org.apache.hadoop.tools.TestJMXGet +1 contrib tests. The patch passed contrib unit tests. +1 system test framework. The patch passed system test framework compile. Test results: https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/495//testReport/ Findbugs warnings: https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/495//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/495//console This message is automatically generated. saveNamespace appears to succeed even if all directories fail to save - Key: HDFS-1505 URL: https://issues.apache.org/jira/browse/HDFS-1505 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 0.22.0, 0.23.0 Reporter: Todd Lipcon Assignee: Aaron T. Myers Priority: Blocker Fix For: 0.22.0 Attachments: hdfs-1505-22.0.patch, hdfs-1505-22.1.patch, hdfs-1505-test.txt, hdfs-1505-trunk.0.patch, hdfs-1505-trunk.1.patch After HDFS-1071, saveNamespace now appears to succeed even if all of the individual directories failed to save. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-1332) When unable to place replicas, BlockPlacementPolicy should log reasons nodes were excluded
[ https://issues.apache.org/jira/browse/HDFS-1332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13032400#comment-13032400 ] Hadoop QA commented on HDFS-1332: - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12478925/HDFS-1332.patch against trunk revision 1102153. +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these core unit tests: org.apache.hadoop.cli.TestHDFSCLI org.apache.hadoop.hdfs.server.namenode.TestNodeCount org.apache.hadoop.hdfs.TestDFSShell org.apache.hadoop.hdfs.TestDFSStorageStateRecovery org.apache.hadoop.hdfs.TestFileConcurrentReader org.apache.hadoop.hdfs.TestHDFSTrash org.apache.hadoop.tools.TestJMXGet +1 contrib tests. The patch passed contrib unit tests. +1 system test framework. The patch passed system test framework compile. Test results: https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/496//testReport/ Findbugs warnings: https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/496//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/496//console This message is automatically generated. When unable to place replicas, BlockPlacementPolicy should log reasons nodes were excluded -- Key: HDFS-1332 URL: https://issues.apache.org/jira/browse/HDFS-1332 Project: Hadoop HDFS Issue Type: Improvement Components: name-node Reporter: Todd Lipcon Assignee: Ted Yu Priority: Minor Labels: newbie Fix For: 0.23.0 Attachments: HDFS-1332.patch Whenever the block placement policy determines that a node is not a good target it could add the reason for exclusion to a list, and then when we log Not able to place enough replicas we could say why each node was refused. This would help new users who are having issues on pseudo-distributed (eg because their data dir is on /tmp and /tmp is full). Right now it's very difficult to figure out the issue. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HDFS-1926) Remove references to StorageDirectory from JournalManager interface
Remove references to StorageDirectory from JournalManager interface --- Key: HDFS-1926 URL: https://issues.apache.org/jira/browse/HDFS-1926 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Ivan Kelly Assignee: Ivan Kelly -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-1926) Remove references to StorageDirectory from JournalManager interface
[ https://issues.apache.org/jira/browse/HDFS-1926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Kelly updated HDFS-1926: - Description: The JournalManager interface introduced by HDFS-1799 has a getStorageDirectory method which is out of place in a generic interface. This JIRA removes that call by refactoring the error handling for FSEditLog. Each EditLogFileOutputStream is now an NNStorageListener and listens for errors on its containing StorageDirectory. If an error occurs from FSImage, the stream will be aborted. If the error occurs in FSEditLog, the stream will be aborted and NNStorage will be notified that the StorageDirectory is no longer valid. Remove references to StorageDirectory from JournalManager interface --- Key: HDFS-1926 URL: https://issues.apache.org/jira/browse/HDFS-1926 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Ivan Kelly Assignee: Ivan Kelly The JournalManager interface introduced by HDFS-1799 has a getStorageDirectory method which is out of place in a generic interface. This JIRA removes that call by refactoring the error handling for FSEditLog. Each EditLogFileOutputStream is now an NNStorageListener and listens for errors on its containing StorageDirectory. If an error occurs from FSImage, the stream will be aborted. If the error occurs in FSEditLog, the stream will be aborted and NNStorage will be notified that the StorageDirectory is no longer valid. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
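In outline, the refactoring described above might look like this (simplified, with hypothetical signatures; only the shape matters):
{code}
// The stream itself listens for errors on its containing StorageDirectory,
// so JournalManager no longer needs a getStorageDirectory() method.
class EditLogFileOutputStream extends EditLogOutputStream
    implements NNStorageListener {
  private final StorageDirectory sd;

  EditLogFileOutputStream(StorageDirectory sd) {
    this.sd = sd;
  }

  public void errorOccurred(StorageDirectory errorSd) { // hypothetical callback
    if (errorSd == sd) {
      abort(); // stop writing; NNStorage already knows sd is no longer valid
    }
  }
}
{code}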
[jira] [Updated] (HDFS-1926) Remove references to StorageDirectory from JournalManager interface
[ https://issues.apache.org/jira/browse/HDFS-1926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Kelly updated HDFS-1926: - Attachment: HDFS-1926.diff Remove references to StorageDirectory from JournalManager interface --- Key: HDFS-1926 URL: https://issues.apache.org/jira/browse/HDFS-1926 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Ivan Kelly Assignee: Ivan Kelly Attachments: HDFS-1926.diff The JournalManager interface introduced by HDFS-1799 has a getStorageDirectory method which is out of place in a generic interface. This JIRA removes that call by refactoring the error handling for FSEditLog. Each EditLogFileOutputStream is now an NNStorageListener and listens for errors on its containing StorageDirectory. If an error occurs from FSImage, the stream will be aborted. If the error occurs in FSEditLog, the stream will be aborted and NNStorage will be notified that the StorageDirectory is no longer valid. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HDFS-1927) audit logs could ignore certain xsactions and also could contain ip=null
audit logs could ignore certain xsactions and also could contain ip=null -- Key: HDFS-1927 URL: https://issues.apache.org/jira/browse/HDFS-1927 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 0.20.2, 0.23.0 Reporter: John George Assignee: John George Namenode audit logs could be ignoring certain transactions that are successfully completed. This is because it checks whether the RemoteIP is null to decide if a transaction is remote or not. In certain cases, RemoteIP could return null but the xsaction could still be remote. An example is a case where a client gets killed while in the middle of the transaction. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
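In sketch form, the suspect pattern and one possible fix; logAuditEvent and the surrounding names are illustrative, not necessarily the committed change:
{code}
// Before: a killed client can leave getRemoteIp() null, so a completed
// operation silently produces no audit entry (or an entry with ip=null).
if (auditLog.isInfoEnabled() && Server.getRemoteIp() != null) {
  logAuditEvent(ugi, Server.getRemoteIp(), "create", src, dst, stat);
}

// After (one option): decide "remote or not" from the call context rather
// than from the nullable address.
if (auditLog.isInfoEnabled() && Server.isRpcInvocation()) {
  logAuditEvent(ugi, Server.getRemoteIp(), "create", src, dst, stat);
}
{code}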
[jira] [Commented] (HDFS-1926) Remove references to StorageDirectory from JournalManager interface
[ https://issues.apache.org/jira/browse/HDFS-1926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13032427#comment-13032427 ] Ivan Kelly commented on HDFS-1926: -- One addendum is that I have temporarily put the code to format edits directories in FSImage#formatOccurred. I did this because the NNStorageListener is not on the streams and these are not created at the time of a format, so formatting would not occur. I assume that HDFS-1073 will get rid of this style of format anyhow, so once sequential editlog filenames are implemented, this can be directly deleted. At this stage NNStorageListener could be reevaluated, as a lot of its usefulness will no longer be needed. Remove references to StorageDirectory from JournalManager interface --- Key: HDFS-1926 URL: https://issues.apache.org/jira/browse/HDFS-1926 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Ivan Kelly Assignee: Ivan Kelly Attachments: HDFS-1926.diff The JournalManager interface introduced by HDFS-1799 has a getStorageDirectory method which is out of place in a generic interface. This JIRA removes that call by refactoring the error handling for FSEditLog. Each EditLogFileOutputStream is now an NNStorageListener and listens for errors on its containing StorageDirectory. If an error occurs from FSImage, the stream will be aborted. If the error occurs in FSEditLog, the stream will be aborted and NNStorage will be notified that the StorageDirectory is no longer valid. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-1926) Remove references to StorageDirectory from JournalManager interface
[ https://issues.apache.org/jira/browse/HDFS-1926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Kelly updated HDFS-1926: - Status: Patch Available (was: Open) Remove references to StorageDirectory from JournalManager interface --- Key: HDFS-1926 URL: https://issues.apache.org/jira/browse/HDFS-1926 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Ivan Kelly Assignee: Ivan Kelly Attachments: HDFS-1926.diff The JournalManager interface introduced by HDFS-1799 has a getStorageDirectory method which is out of place in a generic interface. This JIRA removes that call by refactoring the error handling for FSEditLog. Each EditLogFileOutputStream is now an NNStorageListener and listens for errors on its containing StorageDirectory. If an error occurs from FSImage, the stream will be aborted. If the error occurs in FSEditLog, the stream will be aborted and NNStorage will be notified that the StorageDirectory is no longer valid. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Work started] (HDFS-1787) Not enough xcievers error should propagate to client
[ https://issues.apache.org/jira/browse/HDFS-1787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HDFS-1787 started by Jonathan Hsieh. Not enough xcievers error should propagate to client -- Key: HDFS-1787 URL: https://issues.apache.org/jira/browse/HDFS-1787 Project: Hadoop HDFS Issue Type: Improvement Reporter: Todd Lipcon Assignee: Jonathan Hsieh Labels: newbie We find that users often run into the default transceiver limits in the DN. Putting aside the inherent issues with xceiver threads, it would be nice if the xceiver limit exceeded error propagated to the client. Currently, clients simply see an EOFException which is hard to interpret, and have to go slogging through DN logs to find the underlying issue. The data transfer protocol should be extended to either have a special error code for not enough xceivers or should have some error code for generic errors with which a string can be attached and propagated. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
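The protocol extension suggested in the description could be as small as a new status code that the DataNode returns and the DFSClient maps to a descriptive exception. The existing 0.20-era values are shown for context; the last constant is the hypothetical addition:
{code}
// DataTransferProtocol status codes.
public static final int OP_STATUS_SUCCESS = 0;
public static final int OP_STATUS_ERROR = 1;
public static final int OP_STATUS_ERROR_CHECKSUM = 2;
public static final int OP_STATUS_ERROR_MAX_XCEIVERS_EXCEEDED = 6; // new, hypothetical
{code}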
[jira] [Commented] (HDFS-1926) Remove references to StorageDirectory from JournalManager interface
[ https://issues.apache.org/jira/browse/HDFS-1926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13032439#comment-13032439 ] Hadoop QA commented on HDFS-1926: - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12478972/HDFS-1926.diff against trunk revision 1102153. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 2511 new or modified tests. -1 patch. The patch command could not apply the patch. Console output: https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/498//console This message is automatically generated. Remove references to StorageDirectory from JournalManager interface --- Key: HDFS-1926 URL: https://issues.apache.org/jira/browse/HDFS-1926 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Ivan Kelly Assignee: Ivan Kelly Attachments: HDFS-1926.diff The JournalManager interface introduced by HDFS-1799 has a getStorageDirectory method which is out of place in a generic interface. This JIRA removes that call by refactoring the error handling for FSEditLog. Each EditLogFileOutputStream is now an NNStorageListener and listens for errors on its containing StorageDirectory. If an error occurs from FSImage, the stream will be aborted. If the error occurs in FSEditLog, the stream will be aborted and NNStorage will be notified that the StorageDirectory is no longer valid. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-1787) Not enough xcievers error should propagate to client
[ https://issues.apache.org/jira/browse/HDFS-1787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Hsieh updated HDFS-1787: - Attachment: hdfs-1787.patch This patch updates the max transfers/xceivers message so that it gets propagated to the dfs client. I was able to write a reasonable test for the write side, but the read side requires a change to hadoop common. FSDataOutputStream for the write side has a getWrappedStream method, but the FSDataInputStream class for the read side does not have or expose this. Not enough xcievers error should propagate to client -- Key: HDFS-1787 URL: https://issues.apache.org/jira/browse/HDFS-1787 Project: Hadoop HDFS Issue Type: Improvement Reporter: Todd Lipcon Assignee: Jonathan Hsieh Labels: newbie Attachments: hdfs-1787.patch We find that users often run into the default transceiver limits in the DN. Putting aside the inherent issues with xceiver threads, it would be nice if the xceiver limit exceeded error propagated to the client. Currently, clients simply see an EOFException which is hard to interpret, and have to go slogging through DN logs to find the underlying issue. The data transfer protocol should be extended to either have a special error code for not enough xceivers or should have some error code for generic errors with which a string can be attached and propagated. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-1787) Not enough xcievers error should propagate to client
[ https://issues.apache.org/jira/browse/HDFS-1787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Hsieh updated HDFS-1787: - Release Note: This changes the DataTransferProtocol to return a new error code when a max transfers exceeded message is encountered. Status: Patch Available (was: In Progress) Not enough xcievers error should propagate to client -- Key: HDFS-1787 URL: https://issues.apache.org/jira/browse/HDFS-1787 Project: Hadoop HDFS Issue Type: Improvement Reporter: Todd Lipcon Assignee: Jonathan Hsieh Labels: newbie Attachments: hdfs-1787.patch We find that users often run into the default transceiver limits in the DN. Putting aside the inherent issues with xceiver threads, it would be nice if the xceiver limit exceeded error propagated to the client. Currently, clients simply see an EOFException which is hard to interpret, and have to go slogging through DN logs to find the underlying issue. The data transfer protocol should be extended to either have a special error code for not enough xceivers or should have some error code for generic errors with which a string can be attached and propagated. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-1903) Fix path display for rm/rmr
[ https://issues.apache.org/jira/browse/HDFS-1903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13032444#comment-13032444 ] Daryn Sharp commented on HDFS-1903: --- This patch is ready for integration. Note it only resolves test issues with rm, not all test issues. Fix path display for rm/rmr --- Key: HDFS-1903 URL: https://issues.apache.org/jira/browse/HDFS-1903 Project: Hadoop HDFS Issue Type: Test Components: test Reporter: Daryn Sharp Assignee: Daryn Sharp Fix For: 0.23.0 Attachments: HDFS-1903.patch -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-1927) audit logs could ignore certain xsactions and also could contain ip=null
[ https://issues.apache.org/jira/browse/HDFS-1927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] John George updated HDFS-1927: -- Attachment: HDFS-1927.patch audit logs could ignore certain xsactions and also could contain ip=null -- Key: HDFS-1927 URL: https://issues.apache.org/jira/browse/HDFS-1927 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 0.20.2, 0.23.0 Reporter: John George Assignee: John George Attachments: HDFS-1927.patch Namenode audit logs could be ignoring certain transactions that are successfully completed. This is because it checks whether the RemoteIP is null to decide if a transaction is remote or not. In certain cases, RemoteIP could return null but the xsaction could still be remote. An example is a case where a client gets killed while in the middle of the transaction. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-1927) audit logs could ignore certain xsactions and also could contain ip=null
[ https://issues.apache.org/jira/browse/HDFS-1927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] John George updated HDFS-1927: -- Affects Version/s: (was: 0.20.2) Status: Patch Available (was: Open) audit logs could ignore certain xsactions and also could contain ip=null -- Key: HDFS-1927 URL: https://issues.apache.org/jira/browse/HDFS-1927 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 0.23.0 Reporter: John George Assignee: John George Attachments: HDFS-1927.patch Namenode audit logs could be ignoring certain transactions that are successfully completed. This is because it checks whether the RemoteIP is null to decide if a transaction is remote or not. In certain cases, RemoteIP could return null but the xsaction could still be remote. An example is a case where a client gets killed while in the middle of the transaction. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-1505) saveNamespace appears to succeed even if all directories fail to save
[ https://issues.apache.org/jira/browse/HDFS-1505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13032482#comment-13032482 ] Aaron T. Myers commented on HDFS-1505: -- I believe the test failures are unrelated. All of these are presently failing on trunk. saveNamespace appears to succeed even if all directories fail to save - Key: HDFS-1505 URL: https://issues.apache.org/jira/browse/HDFS-1505 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 0.22.0, 0.23.0 Reporter: Todd Lipcon Assignee: Aaron T. Myers Priority: Blocker Fix For: 0.22.0 Attachments: hdfs-1505-22.0.patch, hdfs-1505-22.1.patch, hdfs-1505-test.txt, hdfs-1505-trunk.0.patch, hdfs-1505-trunk.1.patch After HDFS-1071, saveNamespace now appears to succeed even if all of the individual directories failed to save. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-1787) Not enough xcievers error should propagate to client
[ https://issues.apache.org/jira/browse/HDFS-1787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13032487#comment-13032487 ] Hadoop QA commented on HDFS-1787: - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12478976/hdfs-1787.patch against trunk revision 1102153. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 2 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these core unit tests: org.apache.hadoop.cli.TestHDFSCLI org.apache.hadoop.hdfs.TestDFSShell org.apache.hadoop.hdfs.TestDFSStorageStateRecovery org.apache.hadoop.hdfs.TestFileConcurrentReader org.apache.hadoop.tools.TestJMXGet +1 contrib tests. The patch passed contrib unit tests. +1 system test framework. The patch passed system test framework compile. Test results: https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/499//testReport/ Findbugs warnings: https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/499//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/499//console This message is automatically generated. Not enough xcievers error should propagate to client -- Key: HDFS-1787 URL: https://issues.apache.org/jira/browse/HDFS-1787 Project: Hadoop HDFS Issue Type: Improvement Reporter: Todd Lipcon Assignee: Jonathan Hsieh Labels: newbie Attachments: hdfs-1787.patch We find that users often run into the default transceiver limits in the DN. Putting aside the inherent issues with xceiver threads, it would be nice if the xceiver limit exceeded error propagated to the client. Currently, clients simply see an EOFException which is hard to interpret, and have to go slogging through DN logs to find the underlying issue. The data transfer protocol should be extended to either have a special error code for not enough xceivers or should have some error code for generic errors with which a string can be attached and propagated. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-1725) Set storage directories only at FSImage construction (was Cleanup FSImage construction)
[ https://issues.apache.org/jira/browse/HDFS-1725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Kelly updated HDFS-1725: - Status: Patch Available (was: Open) Set storage directories only at FSImage construction (was Cleanup FSImage construction) --- Key: HDFS-1725 URL: https://issues.apache.org/jira/browse/HDFS-1725 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Ivan Kelly Assignee: Ivan Kelly Fix For: Edit log branch (HDFS-1073) Attachments: HDFS-1725-review-guide.pdf, HDFS-1725.diff, HDFS-1725.diff, HDFS-1725.diff, HDFS-1725.diff, HDFS-1725.diff, HDFS-1725.diff, HDFS-1725.diff, HDFS-1725.patch HDFS-1580 proposes extending FSEditLog to allow it to use editlog streams which are not backed by a StorageDirectory. Currently, to set the directories used for edits, NNStorage#setStorageDirectory is called with a list of URIs as the second argument. NNStorage takes this list of URIs, takes all file:/// URIs and adds them to its StorageDirectory list. Then, when opened, FSEditLog will request a list of StorageDirectories from NNStorage and create a list of EditLogOutputStreams based on these. This approach cannot work with HDFS-1580. NNStorage exists solely to deal with filesystem based storage. As such, only StorageDirectories can be retrieved from NNStorage by FSEditLog. So, FSEditLog should get the URI from some place other than NNStorage. This presents a further problem, in that NNStorage#setStorageDirectories is the current way of setting the URIs for images and edits. This call can happen at any time, so the directories in NNStorage can change at any time. If FSEditLog is to get its URIs from elsewhere, this opens up the risk of the filesystem directories in NNStorage and filesystem URIs being out of sync. A solution to this is to stipulate that the URIs for NNStorage are set only once, on construction. All proper uses of NNStorage#setStorageDirectories occur just after construction of the image in any case. All other cases use NNStorage#setStorageDirectories not to set the storage directories, but for the side effects of this call. This guide explains these other cases. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
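The stipulation in the description reduces to making the directory lists constructor-only; a compressed sketch (the constructor shape and helper are illustrative):
{code}
// URIs are fixed at construction; there is deliberately no
// setStorageDirectories() to call later, so NNStorage and FSEditLog
// can never drift out of sync.
class NNStorage {
  private final List<StorageDirectory> storageDirs;

  NNStorage(Configuration conf,
            Collection<URI> imageDirs, Collection<URI> editsDirs) {
    this.storageDirs = buildStorageDirs(imageDirs, editsDirs); // hypothetical helper
  }
}
{code}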
[jira] [Commented] (HDFS-1787) Not enough xcievers error should propagate to client
[ https://issues.apache.org/jira/browse/HDFS-1787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13032490#comment-13032490 ] Jonathan Hsieh commented on HDFS-1787: -- I will look into these newly failing tests. Not enough xcievers error should propagate to client -- Key: HDFS-1787 URL: https://issues.apache.org/jira/browse/HDFS-1787 Project: Hadoop HDFS Issue Type: Improvement Reporter: Todd Lipcon Assignee: Jonathan Hsieh Labels: newbie Attachments: hdfs-1787.patch We find that users often run into the default transceiver limits in the DN. Putting aside the inherent issues with xceiver threads, it would be nice if the xceiver limit exceeded error propagated to the client. Currently, clients simply see an EOFException which is hard to interpret, and have to go slogging through DN logs to find the underlying issue. The data transfer protocol should be extended to either have a special error code for not enough xceivers or should have some error code for generic errors with which a string can be attached and propagated. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-1725) Set storage directories only at FSImage construction (was Cleanup FSImage construction)
[ https://issues.apache.org/jira/browse/HDFS-1725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Kelly updated HDFS-1725: - Status: Open (was: Patch Available) Set storage directories only at FSImage construction (was Cleanup FSImage construction) --- Key: HDFS-1725 URL: https://issues.apache.org/jira/browse/HDFS-1725 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Ivan Kelly Assignee: Ivan Kelly Fix For: Edit log branch (HDFS-1073) Attachments: HDFS-1725-review-guide.pdf, HDFS-1725.diff, HDFS-1725.diff, HDFS-1725.diff, HDFS-1725.diff, HDFS-1725.diff, HDFS-1725.diff, HDFS-1725.diff, HDFS-1725.patch HDFS-1580 proposes extending FSEditLog to allow it to use editlog streams which are not backed by a StorageDirectory. Currently, to set the directories used for edits, NNStorage#setStorageDirectory is called with a list of URIs as the second argument. NNStorage takes this list of URIs, takes all file:/// URIs and adds them to its StorageDirectory list. Then, when opened, FSEditLog will request a list of StorageDirectories from NNStorage and create a list of EditLogOutputStreams based on these. This approach cannot work with HDFS-1580. NNStorage exists solely to deal with filesystem based storage. As such, only StorageDirectories can be retrieved from NNStorage by FSEditLog. So, FSEditLog should get the URI from some place other than NNStorage. This presents a further problem, in that NNStorage#setStorageDirectories is the current way of setting the URIs for images and edits. This call can happen at any time, so the directories in NNStorage can change at any time. If FSEditLog is to get its URIs from elsewhere, this opens up the risk of the filesystem directories in NNStorage and filesystem URIs being out of sync. A solution to this is to stipulate that the URIs for NNStorage are set only once, on construction. All proper uses of NNStorage#setStorageDirectories occur just after construction of the image in any case. All other cases use NNStorage#setStorageDirectories not to set the storage directories, but for the side effects of this call. This guide explains these other cases. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-1725) Set storage directories only at FSImage construction (was Cleanup FSImage construction)
[ https://issues.apache.org/jira/browse/HDFS-1725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Kelly updated HDFS-1725: - Attachment: HDFS-1725.diff Brought up to date with current HDFS-1073 Set storage directories only at FSImage construction (was Cleanup FSImage construction) --- Key: HDFS-1725 URL: https://issues.apache.org/jira/browse/HDFS-1725 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Ivan Kelly Assignee: Ivan Kelly Fix For: Edit log branch (HDFS-1073) Attachments: HDFS-1725-review-guide.pdf, HDFS-1725.diff, HDFS-1725.diff, HDFS-1725.diff, HDFS-1725.diff, HDFS-1725.diff, HDFS-1725.diff, HDFS-1725.diff, HDFS-1725.patch HDFS-1580 proposes extending FSEditLog to allow it to use editlog streams which are not backed by a StorageDirectory. Currently, to set the directories used for edits, NNStorage#setStorageDirectory is called with a list of URIs as the second argument. NNStorage takes this list of URIs, takes all file:/// URIs and adds them to its StorageDirectory list. Then, when opened, FSEditLog will request a list of StorageDirectories from NNStorage and create a list of EditLogOutputStreams based on these. This approach cannot work with HDFS-1580. NNStorage exists solely to deal with filesystem based storage. As such, only StorageDirectories can be retrieved from NNStorage by FSEditLog. So, FSEditLog should get the URI from some place other than NNStorage. This presents a further problem, in that NNStorage#setStorageDirectories is the current way of setting the URIs for images and edits. This call can happen at any time, so the directories in NNStorage can change at any time. If FSEditLog is to get its URIs from elsewhere, this opens up the risk of the filesystem directories in NNStorage and filesystem URIs being out of sync. A solution to this is to stipulate that the URIs for NNStorage are set only once, on construction. All proper uses of NNStorage#setStorageDirectories occur just after construction of the image in any case. All other cases use NNStorage#setStorageDirectories not to set the storage directories, but for the side effects of this call. This guide explains these other cases. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-1787) Not enough xcievers error should propagate to client
[ https://issues.apache.org/jira/browse/HDFS-1787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13032495#comment-13032495 ] Aaron T. Myers commented on HDFS-1787: -- You're encouraged to look at the test failures, but I'm pretty confident they're unrelated to this patch. Those tests are known to be failing on trunk. Not enough xcievers error should propagate to client -- Key: HDFS-1787 URL: https://issues.apache.org/jira/browse/HDFS-1787 Project: Hadoop HDFS Issue Type: Improvement Reporter: Todd Lipcon Assignee: Jonathan Hsieh Labels: newbie Attachments: hdfs-1787.patch We find that users often run into the default transceiver limits in the DN. Putting aside the inherent issues with xceiver threads, it would be nice if the xceiver limit exceeded error propagated to the client. Currently, clients simply see an EOFException which is hard to interpret, and have to go slogging through DN logs to find the underlying issue. The data transfer protocol should be extended to either have a special error code for not enough xceivers or should have some error code for generic errors with which a string can be attached and propagated. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-1458) Improve checkpoint performance by avoiding unnecessary image downloads
[ https://issues.apache.org/jira/browse/HDFS-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13032501#comment-13032501 ] Hairong Kuang commented on HDFS-1458: - Todd, you need the fix to HDFS-1627. We have already run this and HDFS-1627, together with image compression, for around 2 months on our large cluster. All seem to be pretty stable and have improved NN availability/responsiveness a lot. Improve checkpoint performance by avoiding unnecessary image downloads -- Key: HDFS-1458 URL: https://issues.apache.org/jira/browse/HDFS-1458 Project: Hadoop HDFS Issue Type: Improvement Components: name-node Affects Versions: 0.22.0 Reporter: Hairong Kuang Assignee: Hairong Kuang Fix For: 0.23.0 Attachments: checkpoint-checkfsimageissame.patch, trunkNoDownloadImage.patch, trunkNoDownloadImage1.patch, trunkNoDownloadImage2.patch, trunkNoDownloadImage3.patch If the secondary namenode could verify that the image it has on its disk is the same as the one in the primary NameNode, it could skip downloading the image from the primary NN, thus completely eliminating the image download overhead. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
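The core of the optimization is a digest comparison before the transfer; a sketch using Hadoop's MD5Hash, with the surrounding names assumed rather than taken from the patch:
{code}
// Secondary side: compare a digest of the local fsimage against the
// primary's advertised digest and skip the download when they match.
MD5Hash localDigest = MD5Hash.digest(new FileInputStream(localImageFile));
if (localDigest.equals(primaryImageDigest)) {
  LOG.info("fsimage unchanged; skipping download from primary");
} else {
  downloadImageFromPrimary(); // hypothetical: the existing transfer path
}
{code}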
[jira] [Commented] (HDFS-1917) Clean up duplication of dependent jar files
[ https://issues.apache.org/jira/browse/HDFS-1917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13032515#comment-13032515 ] Eric Yang commented on HDFS-1917: - The patch for this jira is going to assume that hadoop-common third party jar files can be referenced from HADOOP_HOME/lib until HADOOP-6255 and the proposed HADOOP_PREFIX take place, where HADOOP_HOME is the PREFIX directory of hadoop-common-0.2x.y = hadoop-hdfs-0.2x.y = hadoop-mapred-0.2x.y. Clean up duplication of dependent jar files --- Key: HDFS-1917 URL: https://issues.apache.org/jira/browse/HDFS-1917 Project: Hadoop HDFS Issue Type: Bug Components: build Affects Versions: 0.23.0 Environment: Java 6, RHEL 5.5 Reporter: Eric Yang For trunk, the build and deployment tree looks like this: hadoop-common-0.2x.y hadoop-hdfs-0.2x.y hadoop-mapred-0.2x.y Technically, hdfs's third-party dependent jar files should be fetched from hadoop-common. However, it is currently fetching from hadoop-hdfs/lib only. It would be nice to eliminate the need to repeat duplicated jar files at build time. There are two options to manage this dependency list: continue to enhance the ant build structure to fetch and filter jar file dependencies using ivy; on the other hand, it would be a good opportunity to convert the build structure to maven, and use maven to manage the provided jar files. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-1332) When unable to place replicas, BlockPlacementPolicy should log reasons nodes were excluded
[ https://issues.apache.org/jira/browse/HDFS-1332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13032517#comment-13032517 ] Tsz Wo (Nicholas), SZE commented on HDFS-1332: -- - Why not create the reason string directly instead of first creating a HashMap? - reason does not seem like a good variable name. How about failingReason? - Have you tested your patch? When unable to place replicas, BlockPlacementPolicy should log reasons nodes were excluded -- Key: HDFS-1332 URL: https://issues.apache.org/jira/browse/HDFS-1332 Project: Hadoop HDFS Issue Type: Improvement Components: name-node Reporter: Todd Lipcon Assignee: Ted Yu Priority: Minor Labels: newbie Fix For: 0.23.0 Attachments: HDFS-1332.patch Whenever the block placement policy determines that a node is not a good target it could add the reason for exclusion to a list, and then when we log Not able to place enough replicas we could say why each node was refused. This would help new users who are having issues on pseudo-distributed (eg because their data dir is on /tmp and /tmp is full). Right now it's very difficult to figure out the issue. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-1869) mkdirs should use the supplied permission for all of the created directories
[ https://issues.apache.org/jira/browse/HDFS-1869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13032520#comment-13032520 ] Tsz Wo (Nicholas), SZE commented on HDFS-1869: -- Would it work if the given permission does not have x, for example 0600? mkdirs should use the supplied permission for all of the created directories Key: HDFS-1869 URL: https://issues.apache.org/jira/browse/HDFS-1869 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: 0.23.0 Reporter: Daryn Sharp Assignee: Daryn Sharp Attachments: HDFS-1869.patch Mkdirs only uses the supplied FsPermission for the last directory of the path. Paths 0..N-1 will all inherit the parent dir's permissions even if inheritPermission is false. This is a regression from somewhere around 0.20.9 and does not follow POSIX semantics. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
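For reference, the POSIX-like behavior the summary asks for, and why Nicholas' 0600 question bites, in sketch form (pathComponents/mkdirOne are hypothetical helpers, not the HDFS API):
{code}
// Every missing component gets the supplied permission, not just the leaf.
for (String dir : pathComponents(path)) {
  if (!exists(dir)) {
    mkdirOne(dir, permission); // same FsPermission for components 0..N-1 and the leaf
  }
}
// Nicholas' concern: with permission 0600 the intermediate directories lack
// the x bit, so the creating user cannot traverse into them afterwards
// (at least under strict POSIX permission checking).
{code}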
[jira] [Commented] (HDFS-1893) Change edit logs and images to be named based on txid
[ https://issues.apache.org/jira/browse/HDFS-1893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13032533#comment-13032533 ] Ivan Kelly commented on HDFS-1893: -- Mostly looks good. There are a couple of TODOs and things which will need to be addressed, but I guess there'll be a polishing JIRA before any merge back into trunk. I don't like the call to finalizeLogSegment from JournalAndStream. A member variable, segmentStartsAtTxId, is being maintained which would be better encapsulated inside the JournalAndStream or even the EditLogOutputStream itself, as it is a property of the segment, not of that which is writing to it. I understand the rationale for keeping this code out of EditLogOutputStream, but I don't understand why this needs to be called from JournalAndStream. I think it would be better for the stream to notify its manager whenever it is closed. This way the segment is _always_ finalised on a close. So, I would propose the following.
{code}
public class EditLogFileOutputStream {
  public interface ClosureListener {
    public void streamClosed();
  }

  private final ClosureListener listener;

  public EditLogFileOutputStream(File name, int size, ClosureListener listener);

  public void close() {
    ...
    listener.streamClosed();
  }
}

public class FileJournalManager
    implements JournalManager, EditLogFileOutputStream.ClosureListener {
  // etc, etc

  EditLogOutputStream startLogSegment(long txid) {
    return new EditLogFileOutputStream(file, sizeOutputFlushBuffer, this);
  }

  void streamClosed() {
    // what finalize current does.
  }
}
{code}
This removes the need for FSEditLog to know anything about the lifecycle of the streams. It currently has to know that finalizeLogSegment has to be called after stream close, which is clunky. Change edit logs and images to be named based on txid - Key: HDFS-1893 URL: https://issues.apache.org/jira/browse/HDFS-1893 Project: Hadoop HDFS Issue Type: Sub-task Components: name-node Affects Versions: Edit log branch (HDFS-1073) Reporter: Todd Lipcon Assignee: Todd Lipcon Fix For: Edit log branch (HDFS-1073) Attachments: hdfs-1893-prelim.txt This is the main subtask of HDFS-1073: actually switch over the naming of the files to the new format as described in the design doc. I imagine it will be split out into a couple separate JIRAs before being committed, but this will still be the big kahuna patch. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-1332) When unable to place replicas, BlockPlacementPolicy should log reasons nodes were excluded
[ https://issues.apache.org/jira/browse/HDFS-1332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13032536#comment-13032536 ] Ted Yu commented on HDFS-1332: -- I created the HashMap because there could be multiple datanodes that were not good targets. When I tried to access https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/496/, it seemed to be stuck, so I couldn't see the exact cause of the individual test failures. I ran all the newly reported failed tests in Eclipse: org.apache.hadoop.hdfs.server.namenode.TestNodeCount and TestHDFSTrash, along with TestFileConcurrentReader and TestDFSStorageStateRecovery, which I mentioned yesterday. I have renamed the reason variable in my next patch. Thanks for the review. When unable to place replicas, BlockPlacementPolicy should log reasons nodes were excluded -- Key: HDFS-1332 URL: https://issues.apache.org/jira/browse/HDFS-1332 Project: Hadoop HDFS Issue Type: Improvement Components: name-node Reporter: Todd Lipcon Assignee: Ted Yu Priority: Minor Labels: newbie Fix For: 0.23.0 Attachments: HDFS-1332.patch Whenever the block placement policy determines that a node is not a good target it could add the reason for exclusion to a list, and then when we log "Not able to place enough replicas" we could say why each node was refused. This would help new users who are having issues on pseudo-distributed (eg because their data dir is on /tmp and /tmp is full). Right now it's very difficult to figure out the issue. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
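For context, a minimal sketch of the reason-collecting approach Ted describes; the field and method names below are illustrative, not necessarily the identifiers used in the attached patch:
{code}
import java.util.HashMap;
import java.util.Map;

// One entry per rejected datanode; several nodes can be rejected for
// different reasons during a single chooseTarget() call, hence a map
// rather than a single string.
private final Map<DatanodeDescriptor, String> excludedReasons =
    new HashMap<DatanodeDescriptor, String>();

private void noteNodeNotChosen(DatanodeDescriptor node, String reason) {
  excludedReasons.put(node, reason);
}

// When placement ultimately fails, report every collected reason at once,
// instead of only the bare "Not able to place enough replicas" warning.
FSNamesystem.LOG.warn("Not able to place enough replicas; excluded nodes: "
    + excludedReasons);
{code}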
[jira] [Updated] (HDFS-1332) When unable to place replicas, BlockPlacementPolicy should log reasons nodes were excluded
[ https://issues.apache.org/jira/browse/HDFS-1332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HDFS-1332: - Attachment: (was: HDFS-1332.patch) When unable to place replicas, BlockPlacementPolicy should log reasons nodes were excluded -- Key: HDFS-1332 URL: https://issues.apache.org/jira/browse/HDFS-1332 Project: Hadoop HDFS Issue Type: Improvement Components: name-node Reporter: Todd Lipcon Assignee: Ted Yu Priority: Minor Labels: newbie Fix For: 0.23.0 Attachments: HDFS-1332.patch Whenever the block placement policy determines that a node is not a good target it could add the reason for exclusion to a list, and then when we log "Not able to place enough replicas" we could say why each node was refused. This would help new users who are having issues on pseudo-distributed (eg because their data dir is on /tmp and /tmp is full). Right now it's very difficult to figure out the issue. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-1332) When unable to place replicas, BlockPlacementPolicy should log reasons nodes were excluded
[ https://issues.apache.org/jira/browse/HDFS-1332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HDFS-1332: - Attachment: HDFS-1332.patch Updated the name of the reason variable according to Nicholas's comment. When unable to place replicas, BlockPlacementPolicy should log reasons nodes were excluded -- Key: HDFS-1332 URL: https://issues.apache.org/jira/browse/HDFS-1332 Project: Hadoop HDFS Issue Type: Improvement Components: name-node Reporter: Todd Lipcon Assignee: Ted Yu Priority: Minor Labels: newbie Fix For: 0.23.0 Attachments: HDFS-1332.patch Whenever the block placement policy determines that a node is not a good target it could add the reason for exclusion to a list, and then when we log "Not able to place enough replicas" we could say why each node was refused. This would help new users who are having issues on pseudo-distributed (eg because their data dir is on /tmp and /tmp is full). Right now it's very difficult to figure out the issue. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-1332) When unable to place replicas, BlockPlacementPolicy should log reasons nodes were excluded
[ https://issues.apache.org/jira/browse/HDFS-1332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13032553#comment-13032553 ] Tsz Wo (Nicholas), SZE commented on HDFS-1332: -- It seems that logging the reasons is very expensive, so we should only log them when debug is enabled. When unable to place replicas, BlockPlacementPolicy should log reasons nodes were excluded -- Key: HDFS-1332 URL: https://issues.apache.org/jira/browse/HDFS-1332 Project: Hadoop HDFS Issue Type: Improvement Components: name-node Reporter: Todd Lipcon Assignee: Ted Yu Priority: Minor Labels: newbie Fix For: 0.23.0 Attachments: HDFS-1332.patch Whenever the block placement policy determines that a node is not a good target it could add the reason for exclusion to a list, and then when we log "Not able to place enough replicas" we could say why each node was refused. This would help new users who are having issues on pseudo-distributed (eg because their data dir is on /tmp and /tmp is full). Right now it's very difficult to figure out the issue. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-1332) When unable to place replicas, BlockPlacementPolicy should log reasons nodes were excluded
[ https://issues.apache.org/jira/browse/HDFS-1332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13032555#comment-13032555 ] Ted Yu commented on HDFS-1332: -- For TestHDFSTrash, I removed my changes in BlockPlacementPolicyDefault, recompiled, and reran the test on the command line on a MacBook. I got:
{code}
Testcase: testTrashEmptier took 0.001 sec
  Caused an ERROR
Timeout occurred. Please note the time in the report does not reflect the time until the timeout.
junit.framework.AssertionFailedError: Timeout occurred. Please note the time in the report does not reflect the time until the timeout.
{code}
When unable to place replicas, BlockPlacementPolicy should log reasons nodes were excluded -- Key: HDFS-1332 URL: https://issues.apache.org/jira/browse/HDFS-1332 Project: Hadoop HDFS Issue Type: Improvement Components: name-node Reporter: Todd Lipcon Assignee: Ted Yu Priority: Minor Labels: newbie Fix For: 0.23.0 Attachments: HDFS-1332.patch Whenever the block placement policy determines that a node is not a good target it could add the reason for exclusion to a list, and then when we log "Not able to place enough replicas" we could say why each node was refused. This would help new users who are having issues on pseudo-distributed (eg because their data dir is on /tmp and /tmp is full). Right now it's very difficult to figure out the issue. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-1905) Improve the usability of namenode -format
[ https://issues.apache.org/jira/browse/HDFS-1905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13032556#comment-13032556 ] Matt Foley commented on HDFS-1905: -- bq. What is a use case in which a cluster ID would be manually specified? I can imagine some nasty manual recovery process where you might wish to initialize a clean environment with a specified clusterId, followed by manual injection of data recovered from image and edits files. Not something I would want to do :-) but it probably should be supported. However, I agree -format without other args should create a new cid if no old cid is available. A prompt would be appropriate, same as currently done with re-use of an available old cid. Improve the usability of namenode -format -- Key: HDFS-1905 URL: https://issues.apache.org/jira/browse/HDFS-1905 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: 0.23.0 Reporter: Bharath Mundlapudi Assignee: Bharath Mundlapudi Priority: Minor Fix For: 0.23.0 While setting up a 0.23-based cluster, I ran into this issue. When I issue a format namenode command, which got changed in 23, it should let the user know how to use this command in cases where the complete options were not specified. ./hdfs namenode -format I get the following error message, but it is still not clear what the user should do or how to use this command. 11/05/09 15:36:25 ERROR namenode.NameNode: java.lang.IllegalArgumentException: Format must be provided with clusterid at org.apache.hadoop.hdfs.server.namenode.NameNode.format(NameNode.java:1483) at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1623) at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1689) The usability of this command can be improved. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-1332) When unable to place replicas, BlockPlacementPolicy should log reasons nodes were excluded
[ https://issues.apache.org/jira/browse/HDFS-1332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13032560#comment-13032560 ] Tsz Wo (Nicholas), SZE commented on HDFS-1332: -- Hi Ted, although your patch only adds log messages, it may actually cause significant performance degradation in the namenode, since {{BlockPlacementPolicyDefault}} is invoked frequently. Creating the HashMap and the strings for logging seems too expensive to do unconditionally, so all such activity should be executed only in debug mode. When unable to place replicas, BlockPlacementPolicy should log reasons nodes were excluded -- Key: HDFS-1332 URL: https://issues.apache.org/jira/browse/HDFS-1332 Project: Hadoop HDFS Issue Type: Improvement Components: name-node Reporter: Todd Lipcon Assignee: Ted Yu Priority: Minor Labels: newbie Fix For: 0.23.0 Attachments: HDFS-1332.patch Whenever the block placement policy determines that a node is not a good target it could add the reason for exclusion to a list, and then when we log "Not able to place enough replicas" we could say why each node was refused. This would help new users who are having issues on pseudo-distributed (eg because their data dir is on /tmp and /tmp is full). Right now it's very difficult to figure out the issue. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-1332) When unable to place replicas, BlockPlacementPolicy should log reasons nodes were excluded
[ https://issues.apache.org/jira/browse/HDFS-1332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13032568#comment-13032568 ] Ted Yu commented on HDFS-1332: -- How about adding a static boolean, blockPlacementDebug, in BlockPlacementPolicyDefault, which is set to true when System.getenv("BLOCK_PLACEMENT_DEBUG") returns "true"? Then the HashMap and the strings for logging would be created only if this boolean is true. When unable to place replicas, BlockPlacementPolicy should log reasons nodes were excluded -- Key: HDFS-1332 URL: https://issues.apache.org/jira/browse/HDFS-1332 Project: Hadoop HDFS Issue Type: Improvement Components: name-node Reporter: Todd Lipcon Assignee: Ted Yu Priority: Minor Labels: newbie Fix For: 0.23.0 Attachments: HDFS-1332.patch Whenever the block placement policy determines that a node is not a good target it could add the reason for exclusion to a list, and then when we log "Not able to place enough replicas" we could say why each node was refused. This would help new users who are having issues on pseudo-distributed (eg because their data dir is on /tmp and /tmp is full). Right now it's very difficult to figure out the issue. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
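A minimal sketch of what Ted is proposing here; the flag and environment variable names are as suggested above, while the surrounding method and map are illustrative:
{code}
public class BlockPlacementPolicyDefault extends BlockPlacementPolicy {
  // Read once at class-load time; the logging-related allocations are
  // skipped entirely unless the operator opts in via the environment.
  private static final boolean blockPlacementDebug =
      "true".equals(System.getenv("BLOCK_PLACEMENT_DEBUG"));

  private void noteNodeNotChosen(DatanodeDescriptor node, String reason) {
    if (blockPlacementDebug) {
      excludedReasons.put(node, reason); // map as in the sketch above
    }
  }
}
{code}
One downside of an environment-variable switch is that it cannot be toggled on a running namenode, unlike a log level.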
[jira] [Updated] (HDFS-1920) libhdfs does not build for ARM processors
[ https://issues.apache.org/jira/browse/HDFS-1920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Trevor Robinson updated HDFS-1920: -- Status: Patch Available (was: Open) libhdfs does not build for ARM processors - Key: HDFS-1920 URL: https://issues.apache.org/jira/browse/HDFS-1920 Project: Hadoop HDFS Issue Type: Bug Components: contrib/libhdfs Affects Versions: 0.21.0 Environment: $ gcc -v Using built-in specs. COLLECT_GCC=gcc COLLECT_LTO_WRAPPER=/usr/lib/arm-linux-gnueabi/gcc/arm-linux-gnueabi/4.5.2/lto-wrapper Target: arm-linux-gnueabi Configured with: ../src/configure -v --with-pkgversion='Ubuntu/Linaro 4.5.2-8ubuntu4' --with-bugurl=file:///usr/share/doc/gcc-4.5/README.Bugs --enable-languages=c,c++,fortran,objc,obj-c++ --prefix=/usr --program-suffix=-4.5 --enable-shared --enable-multiarch --with-multiarch-defaults=arm-linux-gnueabi --enable-linker-build-id --with-system-zlib --libexecdir=/usr/lib/arm-linux-gnueabi --without-included-gettext --enable-threads=posix --with-gxx-include-dir=/usr/include/c++/4.5 --libdir=/usr/lib/arm-linux-gnueabi --enable-nls --with-sysroot=/ --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-plugin --enable-gold --enable-ld=default --with-plugin-ld=ld.gold --enable-objc-gc --disable-sjlj-exceptions --with-arch=armv7-a --with-float=softfp --with-fpu=vfpv3-d16 --with-mode=thumb --disable-werror --enable-checking=release --build=arm-linux-gnueabi --host=arm-linux-gnueabi --target=arm-linux-gnueabi Thread model: posix gcc version 4.5.2 (Ubuntu/Linaro 4.5.2-8ubuntu4) $ uname -a Linux panda0 2.6.38-1002-linaro-omap #3-Ubuntu SMP Fri Apr 15 14:00:54 UTC 2011 armv7l armv7l armv7l GNU/Linux Reporter: Trevor Robinson Attachments: hadoop-hdfs-arm.patch $ ant compile -Dcompile.native=true -Dcompile.c++=1 -Dlibhdfs=1 -Dfusedfs=1 ... create-libhdfs-configure: ... [exec] configure: error: Unsupported CPU architecture armv7l Once the CPU arch check is fixed in src/c++/libhdfs/m4/apsupport.m4, then next issue is -m32: $ ant compile -Dcompile.native=true -Dcompile.c++=1 -Dlibhdfs=1 -Dfusedfs=1 ... compile-c++-libhdfs: [exec] /bin/bash ./libtool --tag=CC --mode=compile gcc -DPACKAGE_NAME=\libhdfs\ -DPACKAGE_TARNAME=\libhdfs\ -DPACKAGE_VERSION=\0.1.0\ -DPACKAGE_STRING=\libhdfs\ 0.1.0\ -DPACKAGE_BUGREPORT=\omal...@apache.org\ -DPACKAGE_URL=\\ -DPACKAGE=\libhdfs\ -DVERSION=\0.1.0\ -DSTDC_HEADERS=1 -DHAVE_SYS_TYPES_H=1 -DHAVE_SYS_STAT_H=1 -DHAVE_STDLIB_H=1 -DHAVE_STRING_H=1 -DHAVE_MEMORY_H=1 -DHAVE_STRINGS_H=1 -DHAVE_INTTYPES_H=1 -DHAVE_STDINT_H=1 -DHAVE_UNISTD_H=1 -DHAVE_DLFCN_H=1 -DLT_OBJDIR=\.libs/\ -Dsize_t=unsigned\ int -Dconst=/\*\*/ -Dvolatile=/\*\*/ -I. 
-I/home/trobinson/dev/hadoop-hdfs/src/c++/libhdfs -g -O2 -DOS_LINUX -DDSO_DLFCN -DCPU=\arm\ -m32 -I/usr/lib/jvm/java-6-openjdk/include -I/usr/lib/jvm/java-6-openjdk/include/arm -Wall -Wstrict-prototypes -MT hdfs.lo -MD -MP -MF .deps/hdfs.Tpo -c -o hdfs.lo /home/trobinson/dev/hadoop-hdfs/src/c++/libhdfs/hdfs.c [exec] make: Warning: File `.deps/hdfs_write.Po' has modification time 2.1 s in the future [exec] libtool: compile: gcc -DPACKAGE_NAME=\libhdfs\ -DPACKAGE_TARNAME=\libhdfs\ -DPACKAGE_VERSION=\0.1.0\ -DPACKAGE_STRING=\libhdfs 0.1.0\ -DPACKAGE_BUGREPORT=\omal...@apache.org\ -DPACKAGE_URL=\\ -DPACKAGE=\libhdfs\ -DVERSION=\0.1.0\ -DSTDC_HEADERS=1 -DHAVE_SYS_TYPES_H=1 -DHAVE_SYS_STAT_H=1 -DHAVE_STDLIB_H=1 -DHAVE_STRING_H=1 -DHAVE_MEMORY_H=1 -DHAVE_STRINGS_H=1 -DHAVE_INTTYPES_H=1 -DHAVE_STDINT_H=1 -DHAVE_UNISTD_H=1 -DHAVE_DLFCN_H=1 -DLT_OBJDIR=\.libs/\ -Dsize_t=unsigned int -Dconst=/**/ -Dvolatile=/**/ -I. -I/home/trobinson/dev/hadoop-hdfs/src/c++/libhdfs -g -O2 -DOS_LINUX -DDSO_DLFCN -DCPU=\arm\ -m32 -I/usr/lib/jvm/java-6-openjdk/include -I/usr/lib/jvm/java-6-openjdk/include/arm -Wall -Wstrict-prototypes -MT hdfs.lo -MD -MP -MF .deps/hdfs.Tpo -c /home/trobinson/dev/hadoop-hdfs/src/c++/libhdfs/hdfs.c -fPIC -DPIC -o .libs/hdfs.o [exec] cc1: error: unrecognized command line option -m32 [exec] make: *** [hdfs.lo] Error 1 Here, gcc does not support -m32 for the ARM target, so -m${JVM_ARCH} must be omitted from CFLAGS and LDFLAGS. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-1627) Fix NullPointerException in Secondary NameNode
[ https://issues.apache.org/jira/browse/HDFS-1627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hairong Kuang updated HDFS-1627: Attachment: NPE_SNN1.patch A patch with a unit test. Fix NullPointerException in Secondary NameNode -- Key: HDFS-1627 URL: https://issues.apache.org/jira/browse/HDFS-1627 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: 0.23.0 Reporter: Hairong Kuang Assignee: Hairong Kuang Fix For: 0.23.0 Attachments: NPE_SNN.patch, NPE_SNN1.patch Secondary NameNode should not reset namespace if no new image is downloaded from the primary NameNode. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-1927) audit logs could ignore certain xsactions and also could contain ip=null
[ https://issues.apache.org/jira/browse/HDFS-1927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13032585#comment-13032585 ] Hadoop QA commented on HDFS-1927: - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12478979/HDFS-1927.patch against trunk revision 1102153. +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javadoc. The javadoc tool did not generate any warning messages. -1 javac. The patch appears to cause the tar ant target to fail. -1 findbugs. The patch appears to cause Findbugs (version 1.3.9) to fail. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these core unit tests: -1 contrib tests. The patch failed contrib unit tests. -1 system test framework. The patch failed system test framework compile. Test results: https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/500//testReport/ Console output: https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/500//console This message is automatically generated. audit logs could ignore certain xsactions and also could contain ip=null -- Key: HDFS-1927 URL: https://issues.apache.org/jira/browse/HDFS-1927 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 0.23.0 Reporter: John George Assignee: John George Attachments: HDFS-1927.patch Namenode audit logs could be ignoring certain transactions that are successfully completed. This is because they check whether the RemoteIP is null to decide if a transaction is remote or not. In certain cases, RemoteIP could return null but the transaction could still be remote. An example is a case where a client gets killed while in the middle of the transaction. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-1627) Fix NullPointerException in Secondary NameNode
[ https://issues.apache.org/jira/browse/HDFS-1627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hairong Kuang updated HDFS-1627: Status: Open (was: Patch Available) Fix NullPointerException in Secondary NameNode -- Key: HDFS-1627 URL: https://issues.apache.org/jira/browse/HDFS-1627 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: 0.23.0 Reporter: Hairong Kuang Assignee: Hairong Kuang Fix For: 0.23.0 Attachments: NPE_SNN.patch, NPE_SNN1.patch Secondary NameNode should not reset namespace if no new image is downloaded from the primary NameNode. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-1627) Fix NullPointerException in Secondary NameNode
[ https://issues.apache.org/jira/browse/HDFS-1627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hairong Kuang updated HDFS-1627: Status: Patch Available (was: Open) Fix NullPointerException in Secondary NameNode -- Key: HDFS-1627 URL: https://issues.apache.org/jira/browse/HDFS-1627 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: 0.23.0 Reporter: Hairong Kuang Assignee: Hairong Kuang Fix For: 0.23.0 Attachments: NPE_SNN.patch, NPE_SNN1.patch Secondary NameNode should not reset namespace if no new image is downloaded from the primary NameNode. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-1627) Fix NullPointerException in Secondary NameNode
[ https://issues.apache.org/jira/browse/HDFS-1627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13032591#comment-13032591 ] dhruba borthakur commented on HDFS-1627: +1, looks good to me. Fix NullPointerException in Secondary NameNode -- Key: HDFS-1627 URL: https://issues.apache.org/jira/browse/HDFS-1627 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: 0.23.0 Reporter: Hairong Kuang Assignee: Hairong Kuang Fix For: 0.23.0 Attachments: NPE_SNN.patch, NPE_SNN1.patch Secondary NameNode should not reset namespace if no new image is downloaded from the primary NameNode. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-1505) saveNamespace appears to succeed even if all directories fail to save
[ https://issues.apache.org/jira/browse/HDFS-1505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13032592#comment-13032592 ] Matt Foley commented on HDFS-1505: -- Regarding the check for
{code}
+    if (storage.getNumStorageDirs(NameNodeDirType.IMAGE) == 0 &&
+        storage.getNumStorageDirs(NameNodeDirType.IMAGE_AND_EDITS) == 0) {
+      throw new IOException("Failed to save any storage directories while saving namespace");
{code}
Isn't the desired check actually
{code}
+    if (storage.getNumStorageDirs(NameNodeDirType.IMAGE) == 0 ||
+        storage.getNumStorageDirs(NameNodeDirType.EDITS) == 0) {
+      throw new IOException("Failed to save at least one storage directory for both IMAGE and EDITS while saving namespace");
{code}
Also, since HDFS-1826 copied the concurrent saveNamespace() logic into FSImage.doUpgrade(), would you please add the same code fragment to the end of doUpgrade(), and a corresponding corruption unit test case to TestDFSUpgrade? Thanks. saveNamespace appears to succeed even if all directories fail to save - Key: HDFS-1505 URL: https://issues.apache.org/jira/browse/HDFS-1505 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 0.22.0, 0.23.0 Reporter: Todd Lipcon Assignee: Aaron T. Myers Priority: Blocker Fix For: 0.22.0 Attachments: hdfs-1505-22.0.patch, hdfs-1505-22.1.patch, hdfs-1505-test.txt, hdfs-1505-trunk.0.patch, hdfs-1505-trunk.1.patch After HDFS-1071, saveNamespace now appears to succeed even if all of the individual directories failed to save. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-1725) Set storage directories only at FSImage construction (was Cleanup FSImage construction)
[ https://issues.apache.org/jira/browse/HDFS-1725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13032595#comment-13032595 ] Hadoop QA commented on HDFS-1725: - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12478988/HDFS-1725.diff against trunk revision 1102153. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 15 new or modified tests. -1 patch. The patch command could not apply the patch. Console output: https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/502//console This message is automatically generated. Set storage directories only at FSImage construction (was Cleanup FSImage construction) --- Key: HDFS-1725 URL: https://issues.apache.org/jira/browse/HDFS-1725 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Ivan Kelly Assignee: Ivan Kelly Fix For: Edit log branch (HDFS-1073) Attachments: HDFS-1725-review-guide.pdf, HDFS-1725.diff, HDFS-1725.diff, HDFS-1725.diff, HDFS-1725.diff, HDFS-1725.diff, HDFS-1725.diff, HDFS-1725.diff, HDFS-1725.patch HDFS-1580 proposes extending FSEditLog to allow it to use editlog streams which are not backed by StorageDirectory. Currently, to set the directories used for edits, NNStorage#setStorageDirectory is called with a list of URIs as the second argument. NNStorage takes this list of URIs, takes all file:/// URIs, and adds them to its StorageDirectory list. Then, when opened, FSEditLog will request a list of StorageDirectories from NNStorage and create a list of EditLogOutputStreams based on these. This approach cannot work with HDFS-1580. NNStorage exists solely to deal with filesystem-based storage. As such, only StorageDirectories can be retrieved from NNStorage by FSEditLog. So, FSEditLog should get the URI from some place other than NNStorage. This presents a further problem in that NNStorage#setStorageDirectories is the current way of setting the URIs for images and edits. This call can happen at any time, so the directories in NNStorage can change at any time. If FSEditLog is to get its URIs from elsewhere, this opens up the risk of the filesystem directories in NNStorage and filesystem URIs being out of sync. A solution to this is to stipulate that the URIs for NNStorage are set only once, on construction. All proper uses of NNStorage#setStorageDirectories are called just after construction of the image in any case. All other cases are using NNStorage#setStorageDirectories not to set the storage directories, but for the side effects of this call. The attached guide explains these other cases. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (HDFS-1921) Save namespace can cause NN to be unable to come up on restart
[ https://issues.apache.org/jira/browse/HDFS-1921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt Foley reassigned HDFS-1921: Assignee: Matt Foley Save namespace can cause NN to be unable to come up on restart -- Key: HDFS-1921 URL: https://issues.apache.org/jira/browse/HDFS-1921 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 0.22.0, 0.23.0 Reporter: Aaron T. Myers Assignee: Matt Foley Priority: Blocker Fix For: 0.22.0, 0.23.0 I discovered this in the course of trying to implement a fix for HDFS-1505. Per the comment for {{FSImage.saveNamespace(...)}}, the algorithm for save namespace proceeds in the following order: # rename current to lastcheckpoint.tmp for all of them, # save image and recreate edits for all of them, # rename lastcheckpoint.tmp to previous.checkpoint. The problem is that step 3 occurs regardless of whether or not an error occurs for all storage directories in step 2. Upon restart, the NN will see non-existent or corrupt {{current}} directories, and no {{lastcheckpoint.tmp}} directories, and so will conclude that the storage directories are not formatted. This issue appears to be present on both 0.22 and 0.23. This should arguably be a 0.22/0.23 blocker. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-1921) Save namespace can cause NN to be unable to come up on restart
[ https://issues.apache.org/jira/browse/HDFS-1921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13032597#comment-13032597 ] Matt Foley commented on HDFS-1921: -- I will propose a patch for this, unless Dmytro wants it. Save namespace can cause NN to be unable to come up on restart -- Key: HDFS-1921 URL: https://issues.apache.org/jira/browse/HDFS-1921 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 0.22.0, 0.23.0 Reporter: Aaron T. Myers Assignee: Matt Foley Priority: Blocker Fix For: 0.22.0, 0.23.0 I discovered this in the course of trying to implement a fix for HDFS-1505. Per the comment for {{FSImage.saveNamespace(...)}}, the algorithm for save namespace proceeds in the following order: # rename current to lastcheckpoint.tmp for all of them, # save image and recreate edits for all of them, # rename lastcheckpoint.tmp to previous.checkpoint. The problem is that step 3 occurs regardless of whether or not an error occurs for all storage directories in step 2. Upon restart, the NN will see non-existent or corrupt {{current}} directories, and no {{lastcheckpoint.tmp}} directories, and so will conclude that the storage directories are not formatted. This issue appears to be present on both 0.22 and 0.23. This should arguably be a 0.22/0.23 blocker. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-1505) saveNamespace appears to succeed even if all directories fail to save
[ https://issues.apache.org/jira/browse/HDFS-1505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13032607#comment-13032607 ] Matt Foley commented on HDFS-1505: -- One more related issue: at the end of saveNamespace() it calls editLog.open(), which is implemented by FSEditLog.open(). This routine has the same problem: if the list of EditLogOutputStreams is empty, it appears to succeed, but it should throw an exception. I would suggest fixing the lack of notification in FSEditLog.open(); also, in your patch to saveNamespace(), the check for empty IMAGE and EDITS lists should precede the call to editLog.open(). saveNamespace appears to succeed even if all directories fail to save - Key: HDFS-1505 URL: https://issues.apache.org/jira/browse/HDFS-1505 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 0.22.0, 0.23.0 Reporter: Todd Lipcon Assignee: Aaron T. Myers Priority: Blocker Fix For: 0.22.0 Attachments: hdfs-1505-22.0.patch, hdfs-1505-22.1.patch, hdfs-1505-test.txt, hdfs-1505-trunk.0.patch, hdfs-1505-trunk.1.patch After HDFS-1071, saveNamespace now appears to succeed even if all of the individual directories failed to save. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
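A sketch of the kind of check Matt is suggesting in FSEditLog.open(); the field name editStreams is illustrative of the stream list FSEditLog maintains:
{code}
public synchronized void open() throws IOException {
  // ... create an EditLogOutputStream for each usable edits directory ...
  if (editStreams == null || editStreams.isEmpty()) {
    // Previously this fell through silently, so open() appeared to
    // succeed even with nowhere to write edits.
    throw new IOException("Failed to open edit log: no usable edits directories");
  }
}
{code}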
[jira] [Commented] (HDFS-1921) Save namespace can cause NN to be unable to come up on restart
[ https://issues.apache.org/jira/browse/HDFS-1921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13032608#comment-13032608 ] Aaron T. Myers commented on HDFS-1921: -- Hey Matt, that's great news. Thanks for picking this up. I just talked to Todd, and he agrees that this code will be superseded in 0.23 by the work that's going on in HDFS-1073. So, I think it's reasonable to only work on a patch for 0.22 as part of this JIRA. Save namespace can cause NN to be unable to come up on restart -- Key: HDFS-1921 URL: https://issues.apache.org/jira/browse/HDFS-1921 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 0.22.0, 0.23.0 Reporter: Aaron T. Myers Assignee: Matt Foley Priority: Blocker Fix For: 0.22.0, 0.23.0 I discovered this in the course of trying to implement a fix for HDFS-1505. Per the comment for {{FSImage.saveNamespace(...)}}, the algorithm for save namespace proceeds in the following order: # rename current to lastcheckpoint.tmp for all of them, # save image and recreate edits for all of them, # rename lastcheckpoint.tmp to previous.checkpoint. The problem is that step 3 occurs regardless of whether or not an error occurs for all storage directories in step 2. Upon restart, the NN will see non-existent or corrupt {{current}} directories, and no {{lastcheckpoint.tmp}} directories, and so will conclude that the storage directories are not formatted. This issue appears to be present on both 0.22 and 0.23. This should arguably be a 0.22/0.23 blocker. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-1917) Clean up duplication of dependent jar files
[ https://issues.apache.org/jira/browse/HDFS-1917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Yang updated HDFS-1917: Attachment: HDFS-1917.patch * Changed the ivy configuration to set up third-party jar files for the compile profile. * The common profile contains only commons-daemon, to be included in HADOOP_HOME/lib. Clean up duplication of dependent jar files --- Key: HDFS-1917 URL: https://issues.apache.org/jira/browse/HDFS-1917 Project: Hadoop HDFS Issue Type: Bug Components: build Affects Versions: 0.23.0 Environment: Java 6, RHEL 5.5 Reporter: Eric Yang Attachments: HDFS-1917.patch For trunk, the build and deployment tree looks like this: hadoop-common-0.2x.y hadoop-hdfs-0.2x.y hadoop-mapred-0.2x.y Technically, hdfs's third-party dependent jar files should be fetched from hadoop-common. However, they are currently fetched from hadoop-hdfs/lib only. It would be nice to eliminate the need to repeat duplicated jar files at build time. There are two options to manage this dependency list: continue to enhance the ant build structure to fetch and filter jar file dependencies using ivy, or take the opportunity to convert the build structure to maven and use maven to manage the provided jar files. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-1917) Clean up duplication of dependent jar files
[ https://issues.apache.org/jira/browse/HDFS-1917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Yang updated HDFS-1917: Assignee: Eric Yang Release Note: Remove packaging of duplicated third party jar files Status: Patch Available (was: Open) Remove packaging of duplicated third party jar files Clean up duplication of dependent jar files --- Key: HDFS-1917 URL: https://issues.apache.org/jira/browse/HDFS-1917 Project: Hadoop HDFS Issue Type: Bug Components: build Affects Versions: 0.23.0 Environment: Java 6, RHEL 5.5 Reporter: Eric Yang Assignee: Eric Yang Attachments: HDFS-1917.patch For trunk, the build and deployment tree looks like this: hadoop-common-0.2x.y hadoop-hdfs-0.2x.y hadoop-mapred-0.2x.y Technically, hdfs's third-party dependent jar files should be fetched from hadoop-common. However, they are currently fetched from hadoop-hdfs/lib only. It would be nice to eliminate the need to repeat duplicated jar files at build time. There are two options to manage this dependency list: continue to enhance the ant build structure to fetch and filter jar file dependencies using ivy, or take the opportunity to convert the build structure to maven and use maven to manage the provided jar files. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-1332) When unable to place replicas, BlockPlacementPolicy should log reasons nodes were excluded
[ https://issues.apache.org/jira/browse/HDFS-1332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13032636#comment-13032636 ] Todd Lipcon commented on HDFS-1332: --- Hey Nicholas. I thought about the performance impact as well, but I came to the conclusion that the node-selection code is not a hot code path. In my experience, the NN spends much, much more time on read operations than on block allocation. For example, on one production NN whose metrics I have access to, it has performed 3.6M addBlock operations vs 105M FileInfoOps, 30M GetListing ops, 27M GetBlockLocations ops. Additionally, the new code will only get run for nodes which are decommissioning, out of space, or highly loaded. Thus it's not likely that it will add any appreciable overhead to most chooseTarget operations. Looking at the existing code, it's hardly optimized at all. For example, each invocation of chooseRandom() invokes countNumOfAvailableNodes, which takes and releases locks, computes String substrings, etc. When unable to place replicas, BlockPlacementPolicy should log reasons nodes were excluded -- Key: HDFS-1332 URL: https://issues.apache.org/jira/browse/HDFS-1332 Project: Hadoop HDFS Issue Type: Improvement Components: name-node Reporter: Todd Lipcon Assignee: Ted Yu Priority: Minor Labels: newbie Fix For: 0.23.0 Attachments: HDFS-1332.patch Whenever the block placement policy determines that a node is not a good target it could add the reason for exclusion to a list, and then when we log "Not able to place enough replicas" we could say why each node was refused. This would help new users who are having issues on pseudo-distributed (eg because their data dir is on /tmp and /tmp is full). Right now it's very difficult to figure out the issue. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-1627) Fix NullPointerException in Secondary NameNode
[ https://issues.apache.org/jira/browse/HDFS-1627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13032638#comment-13032638 ] Todd Lipcon commented on HDFS-1627: --- Me too, thanks Hairong! Fix NullPointerException in Secondary NameNode -- Key: HDFS-1627 URL: https://issues.apache.org/jira/browse/HDFS-1627 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: 0.23.0 Reporter: Hairong Kuang Assignee: Hairong Kuang Fix For: 0.23.0 Attachments: NPE_SNN.patch, NPE_SNN1.patch Secondary NameNode should not reset namespace if no new image is downloaded from the primary NameNode. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-1505) saveNamespace appears to succeed even if all directories fail to save
[ https://issues.apache.org/jira/browse/HDFS-1505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13032641#comment-13032641 ] Aaron T. Myers commented on HDFS-1505: -- Thanks a lot for the review/comments, Matt. Upon further reflection, I think the desired check should actually be:
{code}
if ((storage.getNumStorageDirs(NameNodeDirType.IMAGE) == 0 ||
     storage.getNumStorageDirs(NameNodeDirType.EDITS) == 0) &&
    storage.getNumStorageDirs(NameNodeDirType.IMAGE_AND_EDITS) == 0) {
  throw new IOException("Failed to save any storage directories while saving namespace");
{code}
What do you think? Note that IMAGE_AND_EDITS is a distinct type of storage directory, which contains both {{fsimage}} and {{edits}} files. Apologies if you already knew that. bq. Also, since HDFS-1826 copied the concurrent saveNamespace() logic into FSImage.doUpgrade(), would you please add the same code fragment to the end of doUpgrade(), and a corresponding corruption unit test case to TestDFSUpgrade? Thanks. Will do. bq. I would suggest fixing the lack of notification in FSEditLog.open(), but also in your patch to saveNamespace() the check for empty IMAGE and EDITS lists should precede the call to editLog.open(). Agreed. Will do. saveNamespace appears to succeed even if all directories fail to save - Key: HDFS-1505 URL: https://issues.apache.org/jira/browse/HDFS-1505 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 0.22.0, 0.23.0 Reporter: Todd Lipcon Assignee: Aaron T. Myers Priority: Blocker Fix For: 0.22.0 Attachments: hdfs-1505-22.0.patch, hdfs-1505-22.1.patch, hdfs-1505-test.txt, hdfs-1505-trunk.0.patch, hdfs-1505-trunk.1.patch After HDFS-1071, saveNamespace now appears to succeed even if all of the individual directories failed to save. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-1917) Clean up duplication of dependent jar files
[ https://issues.apache.org/jira/browse/HDFS-1917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13032644#comment-13032644 ] Luke Lu commented on HDFS-1917: --- Though I understand the goal is to separate the hdfs-only dependencies for easier dedup, it seems to me that if you keep the common profile as is and add an hdfs profile for commons-daemon, the patch would be smaller and less confusing (as it stands, the common profile contains hdfs-only dependencies, and the compile profile is actually from common). Clean up duplication of dependent jar files --- Key: HDFS-1917 URL: https://issues.apache.org/jira/browse/HDFS-1917 Project: Hadoop HDFS Issue Type: Bug Components: build Affects Versions: 0.23.0 Environment: Java 6, RHEL 5.5 Reporter: Eric Yang Assignee: Eric Yang Attachments: HDFS-1917.patch For trunk, the build and deployment tree looks like this: hadoop-common-0.2x.y hadoop-hdfs-0.2x.y hadoop-mapred-0.2x.y Technically, hdfs's third-party dependent jar files should be fetched from hadoop-common. However, they are currently fetched from hadoop-hdfs/lib only. It would be nice to eliminate the need to repeat duplicated jar files at build time. There are two options to manage this dependency list: continue to enhance the ant build structure to fetch and filter jar file dependencies using ivy, or take the opportunity to convert the build structure to maven and use maven to manage the provided jar files. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-1332) When unable to place replicas, BlockPlacementPolicy should log reasons nodes were excluded
[ https://issues.apache.org/jira/browse/HDFS-1332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13032642#comment-13032642 ] Hadoop QA commented on HDFS-1332: - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12478995/HDFS-1332.patch against trunk revision 1102153. +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these core unit tests: org.apache.hadoop.cli.TestHDFSCLI org.apache.hadoop.hdfs.server.namenode.metrics.TestNameNodeMetrics org.apache.hadoop.hdfs.TestDFSShell org.apache.hadoop.hdfs.TestDFSStorageStateRecovery org.apache.hadoop.hdfs.TestFileConcurrentReader org.apache.hadoop.tools.TestJMXGet +1 contrib tests. The patch passed contrib unit tests. +1 system test framework. The patch passed system test framework compile. Test results: https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/504//testReport/ Findbugs warnings: https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/504//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/504//console This message is automatically generated. When unable to place replicas, BlockPlacementPolicy should log reasons nodes were excluded -- Key: HDFS-1332 URL: https://issues.apache.org/jira/browse/HDFS-1332 Project: Hadoop HDFS Issue Type: Improvement Components: name-node Reporter: Todd Lipcon Assignee: Ted Yu Priority: Minor Labels: newbie Fix For: 0.23.0 Attachments: HDFS-1332.patch Whenever the block placement policy determines that a node is not a good target it could add the reason for exclusion to a list, and then when we log "Not able to place enough replicas" we could say why each node was refused. This would help new users who are having issues on pseudo-distributed (eg because their data dir is on /tmp and /tmp is full). Right now it's very difficult to figure out the issue. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-1627) Fix NullPointerException in Secondary NameNode
[ https://issues.apache.org/jira/browse/HDFS-1627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13032650#comment-13032650 ] Hadoop QA commented on HDFS-1627: - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12479004/NPE_SNN1.patch against trunk revision 1102153. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these core unit tests: org.apache.hadoop.cli.TestHDFSCLI org.apache.hadoop.hdfs.TestDFSShell org.apache.hadoop.hdfs.TestDFSStorageStateRecovery org.apache.hadoop.hdfs.TestFileConcurrentReader org.apache.hadoop.tools.TestJMXGet +1 contrib tests. The patch passed contrib unit tests. +1 system test framework. The patch passed system test framework compile. Test results: https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/503//testReport/ Findbugs warnings: https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/503//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/503//console This message is automatically generated. Fix NullPointerException in Secondary NameNode -- Key: HDFS-1627 URL: https://issues.apache.org/jira/browse/HDFS-1627 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: 0.23.0 Reporter: Hairong Kuang Assignee: Hairong Kuang Fix For: 0.23.0 Attachments: NPE_SNN.patch, NPE_SNN1.patch Secondary NameNode should not reset namespace if no new image is downloaded from the primary NameNode. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-1332) When unable to place replicas, BlockPlacementPolicy should log reasons nodes were excluded
[ https://issues.apache.org/jira/browse/HDFS-1332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13032651#comment-13032651 ] Tsz Wo (Nicholas), SZE commented on HDFS-1332: -- How about adding {{BlockPlacementPolicyDefault.LOG}} and using it to print the messages when debug is enabled? When unable to place replicas, BlockPlacementPolicy should log reasons nodes were excluded -- Key: HDFS-1332 URL: https://issues.apache.org/jira/browse/HDFS-1332 Project: Hadoop HDFS Issue Type: Improvement Components: name-node Reporter: Todd Lipcon Assignee: Ted Yu Priority: Minor Labels: newbie Fix For: 0.23.0 Attachments: HDFS-1332.patch Whenever the block placement policy determines that a node is not a good target it could add the reason for exclusion to a list, and then when we log "Not able to place enough replicas" we could say why each node was refused. This would help new users who are having issues on pseudo-distributed (eg because their data dir is on /tmp and /tmp is full). Right now it's very difficult to figure out the issue. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
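A minimal sketch of that alternative, using the commons-logging idiom already common in the codebase; the LOG field is the proposed addition, while the surrounding method is illustrative:
{code}
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

public class BlockPlacementPolicyDefault extends BlockPlacementPolicy {
  static final Log LOG = LogFactory.getLog(BlockPlacementPolicyDefault.class);

  private void noteNodeNotChosen(DatanodeDescriptor node, String reason) {
    // Guarding with isDebugEnabled() keeps the reason strings from ever
    // being built on the hot path unless debug logging is switched on.
    if (LOG.isDebugEnabled()) {
      LOG.debug("Node " + node + " was not chosen because " + reason);
    }
  }
}
{code}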
[jira] [Commented] (HDFS-1869) mkdirs should use the supplied permission for all of the created directories
[ https://issues.apache.org/jira/browse/HDFS-1869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13032653#comment-13032653 ] Tsz Wo (Nicholas), SZE commented on HDFS-1869: -- I think it is good to add a unit test for the 0600 case. It would also illustrate the expected behavior. BTW, have you verified it on BSD Unix? We should make it the same as BSD, which our implementation is based on. mkdirs should use the supplied permission for all of the created directories Key: HDFS-1869 URL: https://issues.apache.org/jira/browse/HDFS-1869 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: 0.23.0 Reporter: Daryn Sharp Assignee: Daryn Sharp Attachments: HDFS-1869.patch Mkdirs only uses the supplied FsPermission for the last directory of the path. Paths 0..N-1 will all inherit the parent dir's permissions, even if inheritPermission is false. This is a regression from somewhere around 0.20.9 and does not follow POSIX semantics. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
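For illustration, a hedged sketch of what such a test might look like; the test name and assertions are hypothetical, and the MiniDFSCluster/FileSystem setup is elided:
{code}
public void testMkdirsWithNoExecutePermission() throws IOException {
  FsPermission perm = new FsPermission((short) 0600);
  Path dir = new Path("/a/b/c");
  assertTrue(fs.mkdirs(dir, perm));
  // Every directory created by the call should carry the supplied mode,
  // subject to whatever intermediate-directory semantics are settled on
  // (without x on /a and /a/b, /a/b/c would be unreachable).
  assertEquals(perm, fs.getFileStatus(dir).getPermission());
}
{code}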
[jira] [Commented] (HDFS-1905) Improve the usability of namenode -format
[ https://issues.apache.org/jira/browse/HDFS-1905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13032655#comment-13032655 ] Suresh Srinivas commented on HDFS-1905: --- The reason format requires a cluster ID is the following: # When you add new namenodes to the existing cluster, they become part of the federated cluster only if the same cluster ID is used. Otherwise, it is a different cluster. # This leaves us two choices: allow automatic generation of the cluster ID for the first namenode, then expect the admin to use the same cluster ID for formatting additional namenodes. But this leaves us with an admin accidentally formatting an additional namenode without specifying a cluster ID, so that a cluster ID is automatically generated. The namenode that was intended to be part of the same cluster now is not! Given this, we decided not to automatically generate the cluster ID. An admin must specify it. bq. A prompt would be appropriate, same as currently done with re-use of an available old cid. I do not think this solves the problem I stated. Improve the usability of namenode -format -- Key: HDFS-1905 URL: https://issues.apache.org/jira/browse/HDFS-1905 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: 0.23.0 Reporter: Bharath Mundlapudi Assignee: Bharath Mundlapudi Priority: Minor Fix For: 0.23.0 While setting up a 0.23-based cluster, I ran into this issue. When I issue a format namenode command, which got changed in 23, it should let the user know how to use this command in cases where the complete options were not specified. ./hdfs namenode -format I get the following error message, but it is still not clear what the user should do or how to use this command. 11/05/09 15:36:25 ERROR namenode.NameNode: java.lang.IllegalArgumentException: Format must be provided with clusterid at org.apache.hadoop.hdfs.server.namenode.NameNode.format(NameNode.java:1483) at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1623) at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1689) The usability of this command can be improved. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-1920) libhdfs does not build for ARM processors
[ https://issues.apache.org/jira/browse/HDFS-1920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13032654#comment-13032654 ] Hadoop QA commented on HDFS-1920: - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12478901/hadoop-hdfs-arm.patch against trunk revision 1102153. +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these core unit tests: org.apache.hadoop.cli.TestHDFSCLI org.apache.hadoop.hdfs.TestDFSShell org.apache.hadoop.hdfs.TestDFSStorageStateRecovery org.apache.hadoop.hdfs.TestFileConcurrentReader org.apache.hadoop.tools.TestJMXGet +1 contrib tests. The patch passed contrib unit tests. +1 system test framework. The patch passed system test framework compile. Test results: https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/501//testReport/ Findbugs warnings: https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/501//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/501//console This message is automatically generated. libhdfs does not build for ARM processors - Key: HDFS-1920 URL: https://issues.apache.org/jira/browse/HDFS-1920 Project: Hadoop HDFS Issue Type: Bug Components: contrib/libhdfs Affects Versions: 0.21.0 Environment: $ gcc -v Using built-in specs. COLLECT_GCC=gcc COLLECT_LTO_WRAPPER=/usr/lib/arm-linux-gnueabi/gcc/arm-linux-gnueabi/4.5.2/lto-wrapper Target: arm-linux-gnueabi Configured with: ../src/configure -v --with-pkgversion='Ubuntu/Linaro 4.5.2-8ubuntu4' --with-bugurl=file:///usr/share/doc/gcc-4.5/README.Bugs --enable-languages=c,c++,fortran,objc,obj-c++ --prefix=/usr --program-suffix=-4.5 --enable-shared --enable-multiarch --with-multiarch-defaults=arm-linux-gnueabi --enable-linker-build-id --with-system-zlib --libexecdir=/usr/lib/arm-linux-gnueabi --without-included-gettext --enable-threads=posix --with-gxx-include-dir=/usr/include/c++/4.5 --libdir=/usr/lib/arm-linux-gnueabi --enable-nls --with-sysroot=/ --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-plugin --enable-gold --enable-ld=default --with-plugin-ld=ld.gold --enable-objc-gc --disable-sjlj-exceptions --with-arch=armv7-a --with-float=softfp --with-fpu=vfpv3-d16 --with-mode=thumb --disable-werror --enable-checking=release --build=arm-linux-gnueabi --host=arm-linux-gnueabi --target=arm-linux-gnueabi Thread model: posix gcc version 4.5.2 (Ubuntu/Linaro 4.5.2-8ubuntu4) $ uname -a Linux panda0 2.6.38-1002-linaro-omap #3-Ubuntu SMP Fri Apr 15 14:00:54 UTC 2011 armv7l armv7l armv7l GNU/Linux Reporter: Trevor Robinson Attachments: hadoop-hdfs-arm.patch $ ant compile -Dcompile.native=true -Dcompile.c++=1 -Dlibhdfs=1 -Dfusedfs=1 ... create-libhdfs-configure: ... 
[exec] configure: error: Unsupported CPU architecture armv7l Once the CPU arch check is fixed in src/c++/libhdfs/m4/apsupport.m4, then next issue is -m32: $ ant compile -Dcompile.native=true -Dcompile.c++=1 -Dlibhdfs=1 -Dfusedfs=1 ... compile-c++-libhdfs: [exec] /bin/bash ./libtool --tag=CC --mode=compile gcc -DPACKAGE_NAME=\libhdfs\ -DPACKAGE_TARNAME=\libhdfs\ -DPACKAGE_VERSION=\0.1.0\ -DPACKAGE_STRING=\libhdfs\ 0.1.0\ -DPACKAGE_BUGREPORT=\omal...@apache.org\ -DPACKAGE_URL=\\ -DPACKAGE=\libhdfs\ -DVERSION=\0.1.0\ -DSTDC_HEADERS=1 -DHAVE_SYS_TYPES_H=1 -DHAVE_SYS_STAT_H=1 -DHAVE_STDLIB_H=1 -DHAVE_STRING_H=1 -DHAVE_MEMORY_H=1 -DHAVE_STRINGS_H=1 -DHAVE_INTTYPES_H=1 -DHAVE_STDINT_H=1 -DHAVE_UNISTD_H=1 -DHAVE_DLFCN_H=1 -DLT_OBJDIR=\.libs/\ -Dsize_t=unsigned\ int -Dconst=/\*\*/ -Dvolatile=/\*\*/ -I. -I/home/trobinson/dev/hadoop-hdfs/src/c++/libhdfs -g -O2 -DOS_LINUX -DDSO_DLFCN -DCPU=\arm\ -m32 -I/usr/lib/jvm/java-6-openjdk/include -I/usr/lib/jvm/java-6-openjdk/include/arm -Wall -Wstrict-prototypes -MT hdfs.lo -MD -MP -MF .deps/hdfs.Tpo -c -o hdfs.lo
[jira] [Assigned] (HDFS-1762) Allow TestHDFSCLI to be run against a cluster
[ https://issues.apache.org/jira/browse/HDFS-1762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantin Boudnik reassigned HDFS-1762: Assignee: Konstantin Boudnik Allow TestHDFSCLI to be run against a cluster - Key: HDFS-1762 URL: https://issues.apache.org/jira/browse/HDFS-1762 Project: Hadoop HDFS Issue Type: Test Reporter: Tom White Assignee: Konstantin Boudnik Currently TestHDFSCLI starts mini clusters to run tests against. It would be useful to be able to support running against arbitrary clusters for testing purposes. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-1905) Improve the usability of namenode -format
[ https://issues.apache.org/jira/browse/HDFS-1905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13032657#comment-13032657 ] Todd Lipcon commented on HDFS-1905: --- Since the vast majority of users will not be using the federation feature, I think it's best to optimize for the common case and not for federated clusters. That is to say, we don't want to pollute the mental model of HDFS for new users by making them understand cluster IDs, block pools, etc. bq. But this leaves us with an admin accidentally formatting an additional namenode without specifying a cluster ID, and a cluster ID is automatically generated. The namenode that was intended to be part of the same cluster now is not! Sure, but they will figure this out before they put any data into it (since the DNs won't talk to this NN). And then calling format again with the correct cluster ID specified is no problem at all for them. Improve the usability of namenode -format -- Key: HDFS-1905 URL: https://issues.apache.org/jira/browse/HDFS-1905 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: 0.23.0 Reporter: Bharath Mundlapudi Assignee: Bharath Mundlapudi Priority: Minor Fix For: 0.23.0 While setting up a 0.23-based cluster, I ran into this issue. When I issue a format namenode command, which got changed in 23, it should let the user know how to use this command in cases where complete options were not specified. ./hdfs namenode -format I get the following error msg; still, it's not clear how the user should use this command. 11/05/09 15:36:25 ERROR namenode.NameNode: java.lang.IllegalArgumentException: Format must be provided with clusterid at org.apache.hadoop.hdfs.server.namenode.NameNode.format(NameNode.java:1483) at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1623) at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1689) The usability of this command can be improved. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-1332) When unable to place replicas, BlockPlacementPolicy should log reasons nodes were excluded
[ https://issues.apache.org/jira/browse/HDFS-1332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13032661#comment-13032661 ] Todd Lipcon commented on HDFS-1332: --- I don't think restricting nice error messages to the case when the NN is in debug mode is a good idea. We should endeavor to always have error messages that provide enough information to the user to understand and rectify the problem. New users are unlikely to know the tricks to switch over to debug mode using the cryptic daemonlog interface, and new users are the ones who need nice errors the most. When unable to place replicas, BlockPlacementPolicy should log reasons nodes were excluded -- Key: HDFS-1332 URL: https://issues.apache.org/jira/browse/HDFS-1332 Project: Hadoop HDFS Issue Type: Improvement Components: name-node Reporter: Todd Lipcon Assignee: Ted Yu Priority: Minor Labels: newbie Fix For: 0.23.0 Attachments: HDFS-1332.patch Whenever the block placement policy determines that a node is not a good target it could add the reason for exclusion to a list, and then when we log Not able to place enough replicas we could say why each node was refused. This would help new users who are having issues on pseudo-distributed (eg because their data dir is on /tmp and /tmp is full). Right now it's very difficult to figure out the issue. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
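To make the suggested improvement concrete, here is a minimal sketch of the idea in the description: collect one human-readable reason per excluded node, then emit the reasons alongside the existing "Not able to place enough replicas" message. The names here (Node, chooseTargets, the two exclusion criteria) are illustrative, not the actual BlockPlacementPolicy API.
{code}
import java.util.ArrayList;
import java.util.List;

public class PlacementDiagnostics {
    static class Node {
        final String name; final boolean tooBusy; final boolean lowSpace;
        Node(String name, boolean tooBusy, boolean lowSpace) {
            this.name = name; this.tooBusy = tooBusy; this.lowSpace = lowSpace;
        }
    }

    /** Returns chosen nodes; fills 'reasons' with one entry per excluded node. */
    static List<Node> chooseTargets(List<Node> candidates, int needed, List<String> reasons) {
        List<Node> chosen = new ArrayList<>();
        for (Node n : candidates) {
            if (chosen.size() == needed) break;
            if (n.tooBusy)  { reasons.add(n.name + ": too many active transfers"); continue; }
            if (n.lowSpace) { reasons.add(n.name + ": not enough free space"); continue; }
            chosen.add(n);
        }
        return chosen;
    }

    public static void main(String[] args) {
        List<String> reasons = new ArrayList<>();
        List<Node> picked = chooseTargets(List.of(
            new Node("dn1", true, false), new Node("dn2", false, true)), 2, reasons);
        if (picked.size() < 2) {
            // Instead of just "Not able to place enough replicas", say why:
            System.err.println("Not able to place enough replicas: " + reasons);
        }
    }
}
{code}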
[jira] [Commented] (HDFS-1869) mkdirs should use the supplied permission for all of the created directories
[ https://issues.apache.org/jira/browse/HDFS-1869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13032666#comment-13032666 ] Daryn Sharp commented on HDFS-1869: --- Ok, test will be added. I haven't tested on *BSD for lack of easy access, but I have tested on Darwin. However, the bsd man page for mkdir states: bq. -p Create intermediate directories as required. If this option is not specified, the full path prefix of each operand must already exist. On the other hand, with this option specified, no error will be reported if a directory given as an operand already exists. *Intermediate directories are created with permission bits of rwxrwxrwx (0777) as modified by the current umask*, plus write and search permission for the owner. Will double-check that write and search are indeed implicitly added. mkdirs should use the supplied permission for all of the created directories Key: HDFS-1869 URL: https://issues.apache.org/jira/browse/HDFS-1869 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: 0.23.0 Reporter: Daryn Sharp Assignee: Daryn Sharp Attachments: HDFS-1869.patch Mkdirs only uses the supplied FsPermission for the last directory of the path. Paths 0..N-1 will all inherit the parent dir's permissions -even if- inheritPermission is false. This is a regression from somewhere around 0.20.9 and does not follow posix semantics. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
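The rule quoted from the man page reduces to bit arithmetic: intermediate directories get (0777 & ~umask) | 0300. A small sketch, with an example umask chosen so the owner write/search addition is visible:
{code}
// The POSIX mkdir -p rule for intermediate directories, as bit arithmetic.
public class MkdirPerms {
    public static void main(String[] args) {
        int umask = 0277;                          // example process umask
        int intermediate = (0777 & ~umask) | 0300; // 0777 modified by umask,
                                                   // plus owner write + search
        System.out.printf("intermediate dirs get %04o%n", intermediate); // 0700
    }
}
{code}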
[jira] [Commented] (HDFS-1905) Improve the usability of namenode -format
[ https://issues.apache.org/jira/browse/HDFS-1905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13032668#comment-13032668 ] Suresh Srinivas commented on HDFS-1905: --- A couple of other comments I missed: During design we wanted to ensure the cluster ID is unique - to avoid accidentally naming two clusters with the same cluster ID. To do that, we have an option to generate a unique, UUID-like cluster ID. Instead of a complicated UUID-like string to identify a cluster, it would be good to use a name. Given the small number of clusters, coming up with a simple naming scheme should not be hard. Given that, we should delete the functionality to generate cluster IDs. Improve the usability of namenode -format -- Key: HDFS-1905 URL: https://issues.apache.org/jira/browse/HDFS-1905 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: 0.23.0 Reporter: Bharath Mundlapudi Assignee: Bharath Mundlapudi Priority: Minor Fix For: 0.23.0 While setting up a 0.23-based cluster, I ran into this issue. When I issue a format namenode command, which got changed in 23, it should let the user know how to use this command in cases where complete options were not specified. ./hdfs namenode -format I get the following error msg; still, it's not clear how the user should use this command. 11/05/09 15:36:25 ERROR namenode.NameNode: java.lang.IllegalArgumentException: Format must be provided with clusterid at org.apache.hadoop.hdfs.server.namenode.NameNode.format(NameNode.java:1483) at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1623) at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1689) The usability of this command can be improved. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-1920) libhdfs does not build for ARM processors
[ https://issues.apache.org/jira/browse/HDFS-1920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13032670#comment-13032670 ] Trevor Robinson commented on HDFS-1920: --- No tests included because this change just fixes a build failure. Manually verified that x86-64 builds unchanged (-m64 is properly specified) and that ARM now builds (-m32 is not specified). Core unit test failures are existing and unrelated issues. This change only affects libhdfs. Would a committer please review the change? libhdfs does not build for ARM processors - Key: HDFS-1920 URL: https://issues.apache.org/jira/browse/HDFS-1920 Project: Hadoop HDFS Issue Type: Bug Components: contrib/libhdfs Affects Versions: 0.21.0 Environment: $ gcc -v Using built-in specs. COLLECT_GCC=gcc COLLECT_LTO_WRAPPER=/usr/lib/arm-linux-gnueabi/gcc/arm-linux-gnueabi/4.5.2/lto-wrapper Target: arm-linux-gnueabi Configured with: ../src/configure -v --with-pkgversion='Ubuntu/Linaro 4.5.2-8ubuntu4' --with-bugurl=file:///usr/share/doc/gcc-4.5/README.Bugs --enable-languages=c,c++,fortran,objc,obj-c++ --prefix=/usr --program-suffix=-4.5 --enable-shared --enable-multiarch --with-multiarch-defaults=arm-linux-gnueabi --enable-linker-build-id --with-system-zlib --libexecdir=/usr/lib/arm-linux-gnueabi --without-included-gettext --enable-threads=posix --with-gxx-include-dir=/usr/include/c++/4.5 --libdir=/usr/lib/arm-linux-gnueabi --enable-nls --with-sysroot=/ --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-plugin --enable-gold --enable-ld=default --with-plugin-ld=ld.gold --enable-objc-gc --disable-sjlj-exceptions --with-arch=armv7-a --with-float=softfp --with-fpu=vfpv3-d16 --with-mode=thumb --disable-werror --enable-checking=release --build=arm-linux-gnueabi --host=arm-linux-gnueabi --target=arm-linux-gnueabi Thread model: posix gcc version 4.5.2 (Ubuntu/Linaro 4.5.2-8ubuntu4) $ uname -a Linux panda0 2.6.38-1002-linaro-omap #3-Ubuntu SMP Fri Apr 15 14:00:54 UTC 2011 armv7l armv7l armv7l GNU/Linux Reporter: Trevor Robinson Attachments: hadoop-hdfs-arm.patch $ ant compile -Dcompile.native=true -Dcompile.c++=1 -Dlibhdfs=1 -Dfusedfs=1 ... create-libhdfs-configure: ... [exec] configure: error: Unsupported CPU architecture armv7l Once the CPU arch check is fixed in src/c++/libhdfs/m4/apsupport.m4, then next issue is -m32: $ ant compile -Dcompile.native=true -Dcompile.c++=1 -Dlibhdfs=1 -Dfusedfs=1 ... compile-c++-libhdfs: [exec] /bin/bash ./libtool --tag=CC --mode=compile gcc -DPACKAGE_NAME=\libhdfs\ -DPACKAGE_TARNAME=\libhdfs\ -DPACKAGE_VERSION=\0.1.0\ -DPACKAGE_STRING=\libhdfs\ 0.1.0\ -DPACKAGE_BUGREPORT=\omal...@apache.org\ -DPACKAGE_URL=\\ -DPACKAGE=\libhdfs\ -DVERSION=\0.1.0\ -DSTDC_HEADERS=1 -DHAVE_SYS_TYPES_H=1 -DHAVE_SYS_STAT_H=1 -DHAVE_STDLIB_H=1 -DHAVE_STRING_H=1 -DHAVE_MEMORY_H=1 -DHAVE_STRINGS_H=1 -DHAVE_INTTYPES_H=1 -DHAVE_STDINT_H=1 -DHAVE_UNISTD_H=1 -DHAVE_DLFCN_H=1 -DLT_OBJDIR=\.libs/\ -Dsize_t=unsigned\ int -Dconst=/\*\*/ -Dvolatile=/\*\*/ -I. 
-I/home/trobinson/dev/hadoop-hdfs/src/c++/libhdfs -g -O2 -DOS_LINUX -DDSO_DLFCN -DCPU=\arm\ -m32 -I/usr/lib/jvm/java-6-openjdk/include -I/usr/lib/jvm/java-6-openjdk/include/arm -Wall -Wstrict-prototypes -MT hdfs.lo -MD -MP -MF .deps/hdfs.Tpo -c -o hdfs.lo /home/trobinson/dev/hadoop-hdfs/src/c++/libhdfs/hdfs.c [exec] make: Warning: File `.deps/hdfs_write.Po' has modification time 2.1 s in the future [exec] libtool: compile: gcc -DPACKAGE_NAME=\libhdfs\ -DPACKAGE_TARNAME=\libhdfs\ -DPACKAGE_VERSION=\0.1.0\ -DPACKAGE_STRING=\libhdfs 0.1.0\ -DPACKAGE_BUGREPORT=\omal...@apache.org\ -DPACKAGE_URL=\\ -DPACKAGE=\libhdfs\ -DVERSION=\0.1.0\ -DSTDC_HEADERS=1 -DHAVE_SYS_TYPES_H=1 -DHAVE_SYS_STAT_H=1 -DHAVE_STDLIB_H=1 -DHAVE_STRING_H=1 -DHAVE_MEMORY_H=1 -DHAVE_STRINGS_H=1 -DHAVE_INTTYPES_H=1 -DHAVE_STDINT_H=1 -DHAVE_UNISTD_H=1 -DHAVE_DLFCN_H=1 -DLT_OBJDIR=\.libs/\ -Dsize_t=unsigned int -Dconst=/**/ -Dvolatile=/**/ -I. -I/home/trobinson/dev/hadoop-hdfs/src/c++/libhdfs -g -O2 -DOS_LINUX -DDSO_DLFCN -DCPU=\arm\ -m32 -I/usr/lib/jvm/java-6-openjdk/include -I/usr/lib/jvm/java-6-openjdk/include/arm -Wall -Wstrict-prototypes -MT hdfs.lo -MD -MP -MF .deps/hdfs.Tpo -c /home/trobinson/dev/hadoop-hdfs/src/c++/libhdfs/hdfs.c -fPIC -DPIC -o .libs/hdfs.o [exec] cc1: error: unrecognized command line option -m32 [exec] make: *** [hdfs.lo] Error 1 Here, gcc does not support -m32 for the ARM target, so -m${JVM_ARCH} must be omitted from CFLAGS and LDFLAGS. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-1332) When unable to place replicas, BlockPlacementPolicy should log reasons nodes were excluded
[ https://issues.apache.org/jira/browse/HDFS-1332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13032671#comment-13032671 ] Tsz Wo (Nicholas), SZE commented on HDFS-1332: -- Hi Todd, it is questionable whether this is a nice error message. Too many error messages confuse users. Replication also uses {{BlockPlacementPolicy}}. Have you taken that into account? Also, your example is just one example. It may not be representative. Moreover, the performance degradation is twofold: # it takes time to create the messages/objects, and # it creates additional objects for GC. When unable to place replicas, BlockPlacementPolicy should log reasons nodes were excluded -- Key: HDFS-1332 URL: https://issues.apache.org/jira/browse/HDFS-1332 Project: Hadoop HDFS Issue Type: Improvement Components: name-node Reporter: Todd Lipcon Assignee: Ted Yu Priority: Minor Labels: newbie Fix For: 0.23.0 Attachments: HDFS-1332.patch Whenever the block placement policy determines that a node is not a good target it could add the reason for exclusion to a list, and then when we log Not able to place enough replicas we could say why each node was refused. This would help new users who are having issues on pseudo-distributed (eg because their data dir is on /tmp and /tmp is full). Right now it's very difficult to figure out the issue. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
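A common way to limit both costs Nicholas lists is to guard message construction behind a level check, so no strings or objects are built unless debug logging is actually on. A sketch against the commons-logging API Hadoop uses (the method and message are illustrative):
{code}
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

public class GuardedLogging {
    private static final Log LOG = LogFactory.getLog(GuardedLogging.class);

    void logExclusion(String node, String reason) {
        if (LOG.isDebugEnabled()) {               // no string concatenation,
            LOG.debug("excluded " + node + ": " + reason); // no garbage, unless
        }                                         // debug logging is enabled
    }
}
{code}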
[jira] [Updated] (HDFS-1903) Fix path display for rm/rmr
[ https://issues.apache.org/jira/browse/HDFS-1903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo (Nicholas), SZE updated HDFS-1903: - Hadoop Flags: [Reviewed] +1 patch looks good. Fix path display for rm/rmr --- Key: HDFS-1903 URL: https://issues.apache.org/jira/browse/HDFS-1903 Project: Hadoop HDFS Issue Type: Test Components: test Reporter: Daryn Sharp Assignee: Daryn Sharp Fix For: 0.23.0 Attachments: HDFS-1903.patch -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-1869) mkdirs should use the supplied permission for all of the created directories
[ https://issues.apache.org/jira/browse/HDFS-1869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13032679#comment-13032679 ] Daryn Sharp commented on HDFS-1869: --- Creating a multi-level dir with 0600 creates all dirs with 0600 -- will post test shortly. So it works as expected in the sense that you get the permissions you asked for. It's neglecting to implicitly add u+rx. In unix this is required since mkdir -p does a series of mkdir/chdir, so u+rx is required to do the chdir calls. In hdfs it's not necessary since it verifies permissions in the directory where the mkdir originates, and then creates all the dirs with no permission checking. Do we want the u+rx behavior added too? If so, would it be ok to do it in a separate jira? mkdirs should use the supplied permission for all of the created directories Key: HDFS-1869 URL: https://issues.apache.org/jira/browse/HDFS-1869 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: 0.23.0 Reporter: Daryn Sharp Assignee: Daryn Sharp Attachments: HDFS-1869.patch Mkdirs only uses the supplied FsPermission for the last directory of the path. Paths 0..N-1 will all inherit the parent dir's permissions -even if- inheritPermission is false. This is a regression from somewhere around 0.20.9 and does not follow posix semantics. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
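For illustration, the implicit u+rx behavior Daryn mentions amounts to OR-ing 0500 into the supplied mode for the intermediate directories only; a sketch of the arithmetic (a hypothetical example, not HDFS code):
{code}
public class ImplicitUrx {
    public static void main(String[] args) {
        short requested = 0600;                          // rw------- as supplied to mkdirs
        short intermediate = (short) (requested | 0500); // u+rx keeps the path traversable
        System.out.printf("%04o -> %04o%n", requested, intermediate); // 0600 -> 0700
    }
}
{code}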
[jira] [Commented] (HDFS-1921) Save namespace can cause NN to be unable to come up on restart
[ https://issues.apache.org/jira/browse/HDFS-1921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13032680#comment-13032680 ] Aaron T. Myers commented on HDFS-1921: -- Also, I should mention that there's a test case posted on HDFS-1505 which will illustrate this case. Save namespace can cause NN to be unable to come up on restart -- Key: HDFS-1921 URL: https://issues.apache.org/jira/browse/HDFS-1921 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 0.22.0, 0.23.0 Reporter: Aaron T. Myers Assignee: Matt Foley Priority: Blocker Fix For: 0.22.0, 0.23.0 I discovered this in the course of trying to implement a fix for HDFS-1505. Per the comment for {{FSImage.saveNamespace(...)}}, the algorithm for save namespace proceeds in the following order: # rename current to lastcheckpoint.tmp for all of them, # save image and recreate edits for all of them, # rename lastcheckpoint.tmp to previous.checkpoint. The problem is that step 3 occurs regardless of whether or not an error occurs for all storage directories in step 2. Upon restart, the NN will see non-existent or corrupt {{current}} directories, and no {{lastcheckpoint.tmp}} directories, and so will conclude that the storage directories are not formatted. This issue appears to be present on both 0.22 and 0.23. This should arguably be a 0.22/0.23 blocker. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
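For illustration, a minimal sketch of how the three-step sequence in the description could defer step 3 until at least one directory has survived step 2, failing loudly when none did. The types and helper methods are hypothetical stand-ins, not the actual FSImage/NNStorage code or the attached patch:
{code}
import java.util.ArrayList;
import java.util.List;

public class SaveNamespaceSketch {
    static class StorageDir { final String path; StorageDir(String p) { path = p; } }

    void saveNamespace(List<StorageDir> dirs) throws Exception {
        List<StorageDir> saved = new ArrayList<>();
        for (StorageDir sd : dirs) renameCurrentToLastCheckpointTmp(sd);   // step 1
        for (StorageDir sd : dirs) {
            try {
                saveImageAndRecreateEdits(sd);                             // step 2
                saved.add(sd);
            } catch (Exception e) {
                // leave lastcheckpoint.tmp in place so restart can recover this dir
            }
        }
        if (saved.isEmpty()) {
            throw new Exception("saveNamespace failed in all storage directories");
        }
        for (StorageDir sd : saved) renameLastCheckpointTmpToPrevious(sd); // step 3,
        // now only for directories whose step 2 actually succeeded
    }

    void renameCurrentToLastCheckpointTmp(StorageDir sd) {}
    void saveImageAndRecreateEdits(StorageDir sd) throws Exception {}
    void renameLastCheckpointTmpToPrevious(StorageDir sd) {}
}
{code}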
[jira] [Commented] (HDFS-1592) Datanode startup doesn't honor volumes.tolerated
[ https://issues.apache.org/jira/browse/HDFS-1592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13032682#comment-13032682 ] Jitendra Nath Pandey commented on HDFS-1592: 1. There seems to be a redundancy in the following conditions: (volsFailed > volFailuresTolerated) and (validVolsRequired > storage.getNumStorageDirs()). Since both checks throw the same exception, I would recommend doing it in one condition. 2. Please don't remove the DataNode.LOG.error. Datanode startup doesn't honor volumes.tolerated - Key: HDFS-1592 URL: https://issues.apache.org/jira/browse/HDFS-1592 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 0.20.204.0 Reporter: Bharath Mundlapudi Assignee: Bharath Mundlapudi Fix For: 0.20.204.0, 0.23.0 Attachments: HDFS-1592-1.patch, HDFS-1592-rel20.patch Datanode startup doesn't honor volumes.tolerated for hadoop 20 version. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
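A sketch of the single combined condition Jitendra suggests, assuming the variable names from his comment (the surrounding method is hypothetical, not the patch itself):
{code}
public class VolumeCheck {
    static void checkVolumes(int volsFailed, int volFailuresTolerated,
                             int validVolsRequired, int numStorageDirs)
            throws Exception {
        // One check, one exception, instead of two checks throwing the same thing.
        if (volsFailed > volFailuresTolerated || validVolsRequired > numStorageDirs) {
            throw new Exception("Too many failed volumes: " + volsFailed
                + " failed, " + volFailuresTolerated + " tolerated");
        }
    }
}
{code}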
[jira] [Updated] (HDFS-1903) Fix path display for rm/rmr
[ https://issues.apache.org/jira/browse/HDFS-1903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo (Nicholas), SZE updated HDFS-1903: - Resolution: Fixed Status: Resolved (was: Patch Available) I have committed this. Thanks, Daryn! Fix path display for rm/rmr --- Key: HDFS-1903 URL: https://issues.apache.org/jira/browse/HDFS-1903 Project: Hadoop HDFS Issue Type: Test Components: test Reporter: Daryn Sharp Assignee: Daryn Sharp Fix For: 0.23.0 Attachments: HDFS-1903.patch -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-1627) Fix NullPointerException in Secondary NameNode
[ https://issues.apache.org/jira/browse/HDFS-1627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hairong Kuang updated HDFS-1627: Resolution: Fixed Hadoop Flags: [Reviewed] Status: Resolved (was: Patch Available) I just committed this! The failed tests are not related to this patch. Fix NullPointerException in Secondary NameNode -- Key: HDFS-1627 URL: https://issues.apache.org/jira/browse/HDFS-1627 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: 0.23.0 Reporter: Hairong Kuang Assignee: Hairong Kuang Fix For: 0.23.0 Attachments: NPE_SNN.patch, NPE_SNN1.patch Secondary NameNode should not reset namespace if no new image is downloaded from the primary NameNode. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-1917) Clean up duplication of dependent jar files
[ https://issues.apache.org/jira/browse/HDFS-1917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Yang updated HDFS-1917: Attachment: HDFS-1917-1.patch Revised patch to add hdfs ivy configuration. Thanks Luke! Clean up duplication of dependent jar files --- Key: HDFS-1917 URL: https://issues.apache.org/jira/browse/HDFS-1917 Project: Hadoop HDFS Issue Type: Bug Components: build Affects Versions: 0.23.0 Environment: Java 6, RHEL 5.5 Reporter: Eric Yang Assignee: Eric Yang Attachments: HDFS-1917-1.patch, HDFS-1917.patch For trunk, the build and deployment tree look like this: hadoop-common-0.2x.y hadoop-hdfs-0.2x.y hadoop-mapred-0.2x.y Technically, hdfs's third-party dependent jar files should be fetched from hadoop-common. However, they are currently fetched from hadoop-hdfs/lib only. It would be nice to eliminate the need to repeat duplicated jar files at build time. There are two options to manage this dependency list: continue to enhance the ant build structure to fetch and filter jar file dependencies using ivy. On the other hand, it would be a good opportunity to convert the build structure to maven, and use maven to manage the provided jar files. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HDFS-1904) Secondary Namenode dies when a mkdir on a non-existent parent directory is run
[ https://issues.apache.org/jira/browse/HDFS-1904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hairong Kuang resolved HDFS-1904. - Resolution: Duplicate Secondary Namenode dies when a mkdir on a non-existent parent directory is run -- Key: HDFS-1904 URL: https://issues.apache.org/jira/browse/HDFS-1904 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: 0.23.0 Environment: Linux Reporter: Ravi Prakash Priority: Blocker Steps to reproduce: 1. Configure secondary namenode with {{fs.checkpoint.period}} set to a small value (eg 3 seconds) 2. Format filesystem and start HDFS 3. hadoop fs -mkdir /foo/bar ; sleep 5 ; echo | hadoop fs -put - /foo/bar/baz 2NN will crash with the following trace on the next checkpoint. The primary NN also crashes on next restart 11/05/10 15:19:28 ERROR namenode.SecondaryNameNode: Throwable Exception in doCheckpoint: 11/05/10 15:19:28 ERROR namenode.SecondaryNameNode: java.lang.NullPointerException: Panic: parent does not exist at org.apache.hadoop.hdfs.server.namenode.FSDirectory.addChild(FSDirectory.java:1693) at org.apache.hadoop.hdfs.server.namenode.FSDirectory.addChild(FSDirectory.java:1707) at org.apache.hadoop.hdfs.server.namenode.FSDirectory.addNode(FSDirectory.java:1544) at org.apache.hadoop.hdfs.server.namenode.FSDirectory.unprotectedAddFile(FSDirectory.java:288) at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:234) at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:116) at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:62) at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSEdits(FSImage.java:723) at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode$CheckpointStorage.doMerge(SecondaryNameNode.java:720) at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode$CheckpointStorage.access$500(SecondaryNameNode.java:610) at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doMerge(SecondaryNameNode.java:487) at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doCheckpoint(SecondaryNameNode.java:448) at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doWork(SecondaryNameNode.java:312) at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.run(SecondaryNameNode.java:276) -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-1377) Quota bug for partial blocks allows quotas to be violated
[ https://issues.apache.org/jira/browse/HDFS-1377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo (Nicholas), SZE updated HDFS-1377: - Fix Version/s: 0.20.204.0 I have merged this to branch-0.20-security-204. Quota bug for partial blocks allows quotas to be violated -- Key: HDFS-1377 URL: https://issues.apache.org/jira/browse/HDFS-1377 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: 0.20.1, 0.20.2, 0.21.0, 0.22.0, 0.23.0 Reporter: Eli Collins Assignee: Eli Collins Priority: Blocker Fix For: 0.20.3, 0.20.204.0, 0.20.205.0, 0.21.1, Federation Branch, 0.22.0, 0.23.0 Attachments: HDFS-1377.patch, hdfs-1377-1.patch, hdfs-1377-b20-1.patch, hdfs-1377-b20-2.patch, hdfs-1377-b20-3.patch There's a bug in the quota code that causes quotas not to be respected when a file is not an exact multiple of the block size. Here's an example: {code} $ hadoop fs -mkdir /test $ hadoop dfsadmin -setSpaceQuota 384M /test $ ls dir/ | wc -l # dir contains 101 files 101 $ du -ms dir # each is 3mb 304 dir $ hadoop fs -put dir /test $ hadoop fs -count -q /test none inf 402653184 -550502400 2 101 317718528 hdfs://haus01.sf.cloudera.com:10020/test $ hadoop fs -stat %o %r /test/dir/f30 134217728 3 # three 128mb blocks {code} INodeDirectoryWithQuota caches the number of bytes consumed by its children in {{diskspace}}. The quota adjustment code has a bug that causes {{diskspace}} to get updated incorrectly when a file is not an exact multiple of the block size (the value ends up being negative). This causes the quota checking code to think that the files in the directory consume less space than they actually do, so verifyQuota does not throw a QuotaExceededException even when the directory is over quota. However the bug isn't visible to users because {{fs count -q}} reports the numbers generated by INode#getContentSummary, which adds up the sizes of the blocks rather than using the cached INodeDirectoryWithQuota#diskspace value. In FSDirectory#addBlock the disk space consumed is set conservatively to the full block size * the number of replicas: {code} updateCount(inodes, inodes.length-1, 0, fileNode.getPreferredBlockSize()*fileNode.getReplication(), true); {code} In FSNamesystem#addStoredBlock we adjust for this conservative estimate by subtracting out the difference between the conservative estimate and the number of bytes actually stored: {code} //Updated space consumed if required. INodeFile file = (storedBlock != null) ? storedBlock.getINode() : null; long diff = (file == null) ? 0 : (file.getPreferredBlockSize() - storedBlock.getNumBytes()); if (diff > 0 && file.isUnderConstruction() && cursize < storedBlock.getNumBytes()) { ... dir.updateSpaceConsumed(path, 0, -diff*file.getReplication()); {code} We do the same in FSDirectory#replaceNode when completing the file, but at a file granularity (I believe the intent here is to correct for the cases when there's a failure replicating blocks and recovery). Since oldnode is under construction, INodeFile#diskspaceConsumed will use the preferred block size (vs Block#getNumBytes used by newnode), so we will again subtract out the difference between the full block size and the number of bytes actually stored: {code} long dsOld = oldnode.diskspaceConsumed(); ... //check if disk space needs to be updated. long dsNew = 0; if (updateDiskspace && (dsNew = newnode.diskspaceConsumed()) != dsOld) { try { updateSpaceConsumed(path, 0, dsNew-dsOld); ... 
{code} So in the above example we started with diskspace at 384mb (3 * 128mb) and then we subtract 375mb (to reflect that only 9mb raw was actually used) twice, so for each file the diskspace for the directory is -366mb (384mb minus 2 * 375mb). Which is why the quota goes negative and yet we can still write more files. So a directory with lots of single-block files (if a file has multiple blocks, only the final partial block ends up subtracting from the diskspace used) ends up having a quota that's way off. I think the fix is, in FSDirectory#replaceNode, not to have the diskspaceConsumed calculations differ when the old and new INode have the same blocks. I'll work on a patch which also adds a quota test for blocks that are not multiples of the block size and warns in INodeDirectory#computeContentSummary if the computed size does not reflect the cached value. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
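The per-file arithmetic in the example can be checked mechanically; a small sketch using the numbers above (128mb preferred block size, 3mb actual, replication 3):
{code}
public class QuotaDrift {
    public static void main(String[] args) {
        long blockSize = 128L << 20;          // preferred block size: 128mb
        long actual    = 3L << 20;            // bytes actually written: 3mb
        int  repl      = 3;

        long charged = blockSize * repl;            // addBlock: 384mb charged
        long diff    = (blockSize - actual) * repl; // 375mb correction
        long cached  = charged - diff - diff;       // subtracted in addStoredBlock
                                                    // AND again in replaceNode
        System.out.println(cached / (1 << 20) + "mb"); // prints -366mb per file
    }
}
{code}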
[jira] [Commented] (HDFS-1505) saveNamespace appears to succeed even if all directories fail to save
[ https://issues.apache.org/jira/browse/HDFS-1505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13032692#comment-13032692 ] Matt Foley commented on HDFS-1505: -- Hi Aaron, agree with you that storage directories of type IMAGE_AND_EDITS are a distinct NameNodeDirType. However, my understanding of NNStorage.getNumStorageDirs(NameNodeDirType), and NameNodeDirType.isOfType() is that membership queries (iterators or counts) about storage dirs of type EDITS return answers relating to all storage dirs of type EDITS || IMAGE_AND_EDITS, while queries about storage dirs of type IMAGE return answers relating to all storage dirs of type IMAGE || IMAGE_AND_EDITS. That is, isOfType() is permissive rather than exclusive. I could be wrong of course :-) as it's possible I didn't correctly follow overloaded implementations. Please let me know if so. Thanks. saveNamespace appears to succeed even if all directories fail to save - Key: HDFS-1505 URL: https://issues.apache.org/jira/browse/HDFS-1505 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 0.22.0, 0.23.0 Reporter: Todd Lipcon Assignee: Aaron T. Myers Priority: Blocker Fix For: 0.22.0 Attachments: hdfs-1505-22.0.patch, hdfs-1505-22.1.patch, hdfs-1505-test.txt, hdfs-1505-trunk.0.patch, hdfs-1505-trunk.1.patch After HDFS-1071, saveNamespace now appears to succeed even if all of the individual directories failed to save. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
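For illustration, the permissive membership Matt describes would look roughly like the enum below; this mirrors the behavior he attributes to NameNodeDirType.isOfType(), not the actual source:
{code}
public class DirTypeSketch {
    enum DirType {
        IMAGE, EDITS, IMAGE_AND_EDITS;

        boolean isOfType(DirType t) {
            // A combined directory answers true for either component type.
            if (this == IMAGE_AND_EDITS) return t == IMAGE || t == EDITS || t == this;
            return this == t;
        }
    }

    public static void main(String[] args) {
        System.out.println(DirType.IMAGE_AND_EDITS.isOfType(DirType.EDITS)); // true
        System.out.println(DirType.IMAGE.isOfType(DirType.EDITS));           // false
    }
}
{code}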
[jira] [Commented] (HDFS-1505) saveNamespace appears to succeed even if all directories fail to save
[ https://issues.apache.org/jira/browse/HDFS-1505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13032697#comment-13032697 ] Aaron T. Myers commented on HDFS-1505: -- bq. That is, isOfType() is permissive rather than exclusive. You are quite correct. My mistake. The original logic you posted for the check seems to be correct. bq. Also, since HDFS-1826 copied the concurrent saveNamespace() logic into FSImage.doUpgrade(), would you please add the same code fragment to the end of doUpgrade(), and a corresponding corruption unit test case to TestDFSUpgrade? Thanks. It occurs to me now that the failure handling should perhaps be different between these two cases. i.e. it is acceptable to tolerate some number of storage directory failures during save namespace, but we should perhaps throw an error in the event *any* storage directories fail during upgrade. Thoughts? saveNamespace appears to succeed even if all directories fail to save - Key: HDFS-1505 URL: https://issues.apache.org/jira/browse/HDFS-1505 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 0.22.0, 0.23.0 Reporter: Todd Lipcon Assignee: Aaron T. Myers Priority: Blocker Fix For: 0.22.0 Attachments: hdfs-1505-22.0.patch, hdfs-1505-22.1.patch, hdfs-1505-test.txt, hdfs-1505-trunk.0.patch, hdfs-1505-trunk.1.patch After HDFS-1071, saveNamespace now appears to succeed even if all of the individual directories failed to save. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-1917) Clean up duplication of dependent jar files
[ https://issues.apache.org/jira/browse/HDFS-1917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13032695#comment-13032695 ] Hadoop QA commented on HDFS-1917: - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12479007/HDFS-1917.patch against trunk revision 1102153. +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these core unit tests: org.apache.hadoop.cli.TestHDFSCLI org.apache.hadoop.hdfs.server.namenode.TestBlocksWithNotEnoughRacks org.apache.hadoop.hdfs.TestDFSShell org.apache.hadoop.hdfs.TestDFSStorageStateRecovery org.apache.hadoop.hdfs.TestFileConcurrentReader org.apache.hadoop.tools.TestJMXGet +1 contrib tests. The patch passed contrib unit tests. -1 system test framework. The patch failed system test framework compile. Test results: https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/506//testReport/ Findbugs warnings: https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/506//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/506//console This message is automatically generated. Clean up duplication of dependent jar files --- Key: HDFS-1917 URL: https://issues.apache.org/jira/browse/HDFS-1917 Project: Hadoop HDFS Issue Type: Bug Components: build Affects Versions: 0.23.0 Environment: Java 6, RHEL 5.5 Reporter: Eric Yang Assignee: Eric Yang Attachments: HDFS-1917-1.patch, HDFS-1917.patch For trunk, the build and deployment tree look like this: hadoop-common-0.2x.y hadoop-hdfs-0.2x.y hadoop-mapred-0.2x.y Technically, hdfs's the third party dependent jar files should be fetch from hadoop-common. However, it is currently fetching from hadoop-hdfs/lib only. It would be nice to eliminate the need to repeat duplicated jar files at build time. There are two options to manage this dependency list, continue to enhance ant build structure to fetch and filter jar file dependencies using ivy. On the other hand, it would be a good opportunity to convert the build structure to maven, and use maven to manage the provided jar files. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-1814) HDFS portion of HADOOP-7214 - Hadoop /usr/bin/groups equivalent
[ https://issues.apache.org/jira/browse/HDFS-1814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13032698#comment-13032698 ] Aaron T. Myers commented on HDFS-1814: -- The test failures are unrelated to this patch. HDFS portion of HADOOP-7214 - Hadoop /usr/bin/groups equivalent --- Key: HDFS-1814 URL: https://issues.apache.org/jira/browse/HDFS-1814 Project: Hadoop HDFS Issue Type: New Feature Components: hdfs client, name-node Affects Versions: 0.23.0 Reporter: Aaron T. Myers Assignee: Aaron T. Myers Attachments: hdfs-1814.0.txt, hdfs-1814.1.txt, hdfs-1814.2.txt, hdfs-1814.3.patch, hdfs-1814.4.patch, hdfs-1814.5.patch -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-1592) Datanode startup doesn't honor volumes.tolerated
[ https://issues.apache.org/jira/browse/HDFS-1592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13032701#comment-13032701 ] Bharath Mundlapudi commented on HDFS-1592: -- Thanks for the review, Jitendra. 1. The conditions are there for better readability. Yes, we can change this into one condition. 2. Error is logged where the exception is caught. Datanode startup doesn't honor volumes.tolerated - Key: HDFS-1592 URL: https://issues.apache.org/jira/browse/HDFS-1592 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 0.20.204.0 Reporter: Bharath Mundlapudi Assignee: Bharath Mundlapudi Fix For: 0.20.204.0, 0.23.0 Attachments: HDFS-1592-1.patch, HDFS-1592-rel20.patch Datanode startup doesn't honor volumes.tolerated for hadoop 20 version. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-1899) GenericTestUtils.formatNamenode is misplaced
[ https://issues.apache.org/jira/browse/HDFS-1899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon updated HDFS-1899: -- Status: Patch Available (was: Open) GenericTestUtils.formatNamenode is misplaced Key: HDFS-1899 URL: https://issues.apache.org/jira/browse/HDFS-1899 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 0.23.0 Reporter: Todd Lipcon Assignee: Ted Yu Labels: newbie Fix For: 0.23.0 Attachments: HDFS-1899.patch This function belongs in DFSTestUtil, the standard place for putting cluster-related utils. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-1505) saveNamespace appears to succeed even if all directories fail to save
[ https://issues.apache.org/jira/browse/HDFS-1505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13032705#comment-13032705 ] Matt Foley commented on HDFS-1505: -- Good question. I don't know. Let's both ask our ops teams. saveNamespace appears to succeed even if all directories fail to save - Key: HDFS-1505 URL: https://issues.apache.org/jira/browse/HDFS-1505 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 0.22.0, 0.23.0 Reporter: Todd Lipcon Assignee: Aaron T. Myers Priority: Blocker Fix For: 0.22.0 Attachments: hdfs-1505-22.0.patch, hdfs-1505-22.1.patch, hdfs-1505-test.txt, hdfs-1505-trunk.0.patch, hdfs-1505-trunk.1.patch After HDFS-1071, saveNamespace now appears to succeed even if all of the individual directories failed to save. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-1905) Improve the usability of namenode -format
[ https://issues.apache.org/jira/browse/HDFS-1905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13032714#comment-13032714 ] Suresh Srinivas commented on HDFS-1905: --- bq. we don't want to pollute the mental model of HDFS for new users by making them understand cluster IDs, block pools, etc. I disagree with you on this. Though cluster ID is being added as part of federation, I do not think it pollutes the mental model. What is the cluster today? It is all the nodes sharing the same namespaceID, which is automatically generated and shared by all the nodes. Cluster ID makes it much cleaner: a user-identifiable name is shared by all the nodes and identifies all the nodes in the cluster. I am not sure this is such a complicated idea that it disrupts the HDFS model. Further, even without federation, we should have had such an identifier in the first place, instead of namespaceID, which happened to become the cluster ID equivalent. Improve the usability of namenode -format -- Key: HDFS-1905 URL: https://issues.apache.org/jira/browse/HDFS-1905 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: 0.23.0 Reporter: Bharath Mundlapudi Assignee: Bharath Mundlapudi Priority: Minor Fix For: 0.23.0 While setting up a 0.23-based cluster, I ran into this issue. When I issue a format namenode command, which got changed in 23, it should let the user know how to use this command in cases where complete options were not specified. ./hdfs namenode -format I get the following error msg; still, it's not clear how the user should use this command. 11/05/09 15:36:25 ERROR namenode.NameNode: java.lang.IllegalArgumentException: Format must be provided with clusterid at org.apache.hadoop.hdfs.server.namenode.NameNode.format(NameNode.java:1483) at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1623) at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1689) The usability of this command can be improved. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-1921) Save namespace can cause NN to be unable to come up on restart
[ https://issues.apache.org/jira/browse/HDFS-1921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt Foley updated HDFS-1921: - Status: Patch Available (was: Open) Save namespace can cause NN to be unable to come up on restart -- Key: HDFS-1921 URL: https://issues.apache.org/jira/browse/HDFS-1921 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 0.22.0, 0.23.0 Reporter: Aaron T. Myers Assignee: Matt Foley Priority: Blocker Fix For: 0.22.0, 0.23.0 Attachments: hdfs1921_v23.patch I discovered this in the course of trying to implement a fix for HDFS-1505. Per the comment for {{FSImage.saveNamespace(...)}}, the algorithm for save namespace proceeds in the following order: # rename current to lastcheckpoint.tmp for all of them, # save image and recreate edits for all of them, # rename lastcheckpoint.tmp to previous.checkpoint. The problem is that step 3 occurs regardless of whether or not an error occurs for all storage directories in step 2. Upon restart, the NN will see non-existent or corrupt {{current}} directories, and no {{lastcheckpoint.tmp}} directories, and so will conclude that the storage directories are not formatted. This issue appears to be present on both 0.22 and 0.23. This should arguably be a 0.22/0.23 blocker. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-1921) Save namespace can cause NN to be unable to come up on restart
[ https://issues.apache.org/jira/browse/HDFS-1921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt Foley updated HDFS-1921: - Attachment: hdfs1921_v23.patch Here's a patch for trunk, so it will run under auto-test. I'll post the v22 version when it passes. The HDFS-1505 test case should work if this patch is added. Can you please try it, as I was getting a failure to unlock the storage dir upon FSNamesystem.close(). Save namespace can cause NN to be unable to come up on restart -- Key: HDFS-1921 URL: https://issues.apache.org/jira/browse/HDFS-1921 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 0.22.0, 0.23.0 Reporter: Aaron T. Myers Assignee: Matt Foley Priority: Blocker Fix For: 0.22.0, 0.23.0 Attachments: hdfs1921_v23.patch I discovered this in the course of trying to implement a fix for HDFS-1505. Per the comment for {{FSImage.saveNamespace(...)}}, the algorithm for save namespace proceeds in the following order: # rename current to lastcheckpoint.tmp for all of them, # save image and recreate edits for all of them, # rename lastcheckpoint.tmp to previous.checkpoint. The problem is that step 3 occurs regardless of whether or not an error occurs for all storage directories in step 2. Upon restart, the NN will see non-existent or corrupt {{current}} directories, and no {{lastcheckpoint.tmp}} directories, and so will conclude that the storage directories are not formatted. This issue appears to be present on both 0.22 and 0.23. This should arguably be a 0.22/0.23 blocker. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-1905) Improve the usability of namenode -format
[ https://issues.apache.org/jira/browse/HDFS-1905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13032719#comment-13032719 ] Todd Lipcon commented on HDFS-1905: --- I agree that cluster ID is a nicer construct than namespace ID. But it doesn't replace it, since we still have the namespaceID in NNStorage. Perhaps a nice compromise would be the following: - hadoop namenode -format gains a required argument for cluster ID, i.e. hadoop namenode -format mycluster. If you don't specify this, it should print usage info. - hadoop namenode -upgrade by default will carry over the old namespaceID as the new cluster's cluster ID? Alternatively one may provide a cluster ID with hadoop namenode -upgrade -clusterid foo? Another question: if cluster ID is meant to be a user-visible nice name -- how can one rename a cluster? Improve the usability of namenode -format -- Key: HDFS-1905 URL: https://issues.apache.org/jira/browse/HDFS-1905 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: 0.23.0 Reporter: Bharath Mundlapudi Assignee: Bharath Mundlapudi Priority: Minor Fix For: 0.23.0 While setting up a 0.23-based cluster, I ran into this issue. When I issue a format namenode command, which got changed in 23, it should let the user know how to use this command in cases where complete options were not specified. ./hdfs namenode -format I get the following error msg; still, it's not clear how the user should use this command. 11/05/09 15:36:25 ERROR namenode.NameNode: java.lang.IllegalArgumentException: Format must be provided with clusterid at org.apache.hadoop.hdfs.server.namenode.NameNode.format(NameNode.java:1483) at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1623) at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1689) The usability of this command can be improved. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
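A minimal sketch of the first bullet above: require the cluster ID argument and print usage instead of surfacing a bare IllegalArgumentException. Purely illustrative; the real NameNode option parsing is more involved:
{code}
public class FormatArgs {
    public static void main(String[] args) {
        if (args.length >= 1 && args[0].equals("-format")) {
            if (args.length < 2) {
                // Tell the user what to do rather than dumping a stack trace.
                System.err.println("Usage: hadoop namenode -format <clusterId>");
                System.exit(1);
            }
            System.out.println("formatting with cluster ID " + args[1]);
        }
    }
}
{code}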
[jira] [Commented] (HDFS-1917) Clean up duplication of dependent jar files
[ https://issues.apache.org/jira/browse/HDFS-1917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13032725#comment-13032725 ] Hadoop QA commented on HDFS-1917: - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12479013/HDFS-1917-1.patch against trunk revision 1102467. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 1 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these core unit tests: org.apache.hadoop.cli.TestHDFSCLI org.apache.hadoop.hdfs.TestDFSStorageStateRecovery org.apache.hadoop.hdfs.TestFileConcurrentReader org.apache.hadoop.tools.TestJMXGet +1 contrib tests. The patch passed contrib unit tests. +1 system test framework. The patch passed system test framework compile. Test results: https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/507//testReport/ Findbugs warnings: https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/507//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/507//console This message is automatically generated. Clean up duplication of dependent jar files --- Key: HDFS-1917 URL: https://issues.apache.org/jira/browse/HDFS-1917 Project: Hadoop HDFS Issue Type: Bug Components: build Affects Versions: 0.23.0 Environment: Java 6, RHEL 5.5 Reporter: Eric Yang Assignee: Eric Yang Attachments: HDFS-1917-1.patch, HDFS-1917.patch For trunk, the build and deployment tree look like this: hadoop-common-0.2x.y hadoop-hdfs-0.2x.y hadoop-mapred-0.2x.y Technically, hdfs's the third party dependent jar files should be fetch from hadoop-common. However, it is currently fetching from hadoop-hdfs/lib only. It would be nice to eliminate the need to repeat duplicated jar files at build time. There are two options to manage this dependency list, continue to enhance ant build structure to fetch and filter jar file dependencies using ivy. On the other hand, it would be a good opportunity to convert the build structure to maven, and use maven to manage the provided jar files. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-1921) Save namespace can cause NN to be unable to come up on restart
[ https://issues.apache.org/jira/browse/HDFS-1921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt Foley updated HDFS-1921: - Attachment: hdfs-1505-1-test.txt Here's the modified form of the test that works - there was a glitch in spy storage setup. The test passes. Save namespace can cause NN to be unable to come up on restart -- Key: HDFS-1921 URL: https://issues.apache.org/jira/browse/HDFS-1921 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 0.22.0, 0.23.0 Reporter: Aaron T. Myers Assignee: Matt Foley Priority: Blocker Fix For: 0.22.0, 0.23.0 Attachments: hdfs-1505-1-test.txt, hdfs1921_v23.patch I discovered this in the course of trying to implement a fix for HDFS-1505. Per the comment for {{FSImage.saveNamespace(...)}}, the algorithm for save namespace proceeds in the following order: # rename current to lastcheckpoint.tmp for all of them, # save image and recreate edits for all of them, # rename lastcheckpoint.tmp to previous.checkpoint. The problem is that step 3 occurs regardless of whether or not an error occurs for all storage directories in step 2. Upon restart, the NN will see non-existent or corrupt {{current}} directories, and no {{lastcheckpoint.tmp}} directories, and so will conclude that the storage directories are not formatted. This issue appears to be present on both 0.22 and 0.23. This should arguably be a 0.22/0.23 blocker. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-1787) Not enough xcievers error should propagate to client
[ https://issues.apache.org/jira/browse/HDFS-1787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13032731#comment-13032731 ] Jonathan Hsieh commented on HDFS-1787: -- After more investigation, these two may be newly incurred errors. org.apache.hadoop.hdfs.TestDFSStorageStateRecovery org.apache.hadoop.hdfs.TestFileConcurrentReader The other three seem flaky on trunk. Not enough xcievers error should propagate to client -- Key: HDFS-1787 URL: https://issues.apache.org/jira/browse/HDFS-1787 Project: Hadoop HDFS Issue Type: Improvement Reporter: Todd Lipcon Assignee: Jonathan Hsieh Labels: newbie Attachments: hdfs-1787.patch We find that users often run into the default transceiver limits in the DN. Putting aside the inherent issues with xceiver threads, it would be nice if the xceiver limit exceeded error propagated to the client. Currently, clients simply see an EOFException which is hard to interpret, and have to go slogging through DN logs to find the underlying issue. The data transfer protocol should be extended to either have a special error code for not enough xceivers or should have some error code for generic errors with which a string can be attached and propagated. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
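For illustration, a sketch of the second option in the description -- a generic error code with an attached string -- over an invented wire format. The real data transfer protocol's status constants and framing are not shown here; this only demonstrates how a message could reach the client instead of a bare EOFException:
{code}
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class TransferStatus {
    static final int SUCCESS = 0;
    static final int ERROR   = 1;  // generic error; human-readable message follows

    // Datanode side: reply with a code and an explanatory string.
    static void writeError(DataOutputStream out, String msg) throws IOException {
        out.writeInt(ERROR);
        out.writeUTF(msg);         // e.g. "xceiverCount 4097 exceeds limit 4096"
        out.flush();
    }

    // Client side: turn the reply into an exception the user can act on.
    static void checkResponse(DataInputStream in) throws IOException {
        if (in.readInt() != SUCCESS) {
            throw new IOException("datanode error: " + in.readUTF());
        }
    }
}
{code}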