[jira] [Created] (HDFS-7930) commitBlockSynchronization() does not remove locations, which were not confirmed
Konstantin Shvachko created HDFS-7930:
--------------------------------------

Summary: commitBlockSynchronization() does not remove locations, which were not confirmed
Key: HDFS-7930
URL: https://issues.apache.org/jira/browse/HDFS-7930
Project: Hadoop HDFS
Issue Type: Bug
Components: namenode
Affects Versions: 2.7.0
Reporter: Konstantin Shvachko
Priority: Blocker

When {{commitBlockSynchronization()}} has fewer {{newTargets}} than the original block, it does not remove the unconfirmed locations. As a result, the block stores locations with differing lengths or genStamps (corrupt replicas).

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
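A minimal sketch of the pruning behavior the fix calls for (the types below are simplified stand-ins for the example, not the real NameNode classes): any stored location that is not among the confirmed newTargets should be dropped, so the block cannot keep replicas with stale lengths or genStamps.

import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

// Simplified stand-in for a block's replica bookkeeping.
class BlockLocations {
    private final Set<String> storedLocations = new HashSet<>();

    void add(String datanode) {
        storedLocations.add(datanode);
    }

    // After commitBlockSynchronization, keep only the confirmed targets;
    // everything else is an unconfirmed (possibly corrupt) replica.
    void pruneUnconfirmed(Collection<String> newTargets) {
        storedLocations.retainAll(newTargets);
    }

    Set<String> locations() {
        return storedLocations;
    }
}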
Re: upstream jenkins build broken?
Any updates on this issue? It seems that all HDFS jenkins builds are still failing.

Regards,
Haohui

On Thu, Mar 12, 2015 at 12:53 AM, Vinayakumar B vinayakum...@apache.org wrote:

I think the problem started from here.
https://builds.apache.org/job/PreCommit-HDFS-Build/9828/testReport/junit/org.apache.hadoop.hdfs.server.datanode/TestDataNodeVolumeFailure/testUnderReplicationAfterVolFailure/

As Chris mentioned, TestDataNodeVolumeFailure changes the permissions. But in this run, ReplicationMonitor hit an NPE and received the terminate signal, due to which MiniDFSCluster.shutdown() threw an exception. TestDataNodeVolumeFailure#tearDown() restores those permissions only after shutting down the cluster, so in this case, IMO, the permissions were never restored.

@After
public void tearDown() throws Exception {
  if (data_fail != null) {
    FileUtil.setWritable(data_fail, true);
  }
  if (failedDir != null) {
    FileUtil.setWritable(failedDir, true);
  }
  if (cluster != null) {
    cluster.shutdown();
  }
  for (int i = 0; i < 3; i++) {
    FileUtil.setExecutable(new File(dataDir, "data" + (2 * i + 1)), true);
    FileUtil.setExecutable(new File(dataDir, "data" + (2 * i + 2)), true);
  }
}

Regards,
Vinay

On Thu, Mar 12, 2015 at 12:35 PM, Vinayakumar B vinayakum...@apache.org wrote:

Looking at the history of these kinds of builds, all of them failed on node H9. I think some uncommitted patch or other created the problem and left it there.

Regards,
Vinay

On Thu, Mar 12, 2015 at 6:16 AM, Sean Busbey bus...@cloudera.com wrote:

You could rely on a destructive git clean call instead of maven to do the directory removal.

--
Sean

On Mar 11, 2015 4:11 PM, Colin McCabe cmcc...@alumni.cmu.edu wrote:

Is there a maven plugin or setting we can use to simply remove directories that have no executable permissions on them? Clearly we have the permission to do this from a technical point of view (since we created the directories as the jenkins user); it's simply that the code refuses to do it.

Otherwise I guess we can just fix those tests...

Colin

On Tue, Mar 10, 2015 at 2:43 PM, Lei Xu l...@cloudera.com wrote:

Thanks a lot for looking into HDFS-7722, Chris.

In HDFS-7722, the TestDataNodeVolumeFailureXXX tests reset data dir permissions in tearDown(), and TestDataNodeHotSwapVolumes resets permissions in a finally clause. Also, I ran mvn test several times on my machine and all tests passed. However, since DiskChecker#checkDirAccess() is:

private static void checkDirAccess(File dir) throws DiskErrorException {
  if (!dir.isDirectory()) {
    throw new DiskErrorException("Not a directory: " + dir.toString());
  }
  checkAccessByFileMethods(dir);
}

one potentially safer alternative is replacing the data dir with a regular file to simulate disk failures.

On Tue, Mar 10, 2015 at 2:19 PM, Chris Nauroth cnaur...@hortonworks.com wrote:

TestDataNodeHotSwapVolumes, TestDataNodeVolumeFailure, TestDataNodeVolumeFailureReporting, and TestDataNodeVolumeFailureToleration all remove executable permissions from directories like the one Colin mentioned to simulate disk failures at data nodes. I reviewed the code for all of those, and they all appear to be doing the necessary work to restore executable permissions at the end of the test.

The only recent uncommitted patch I've seen that makes changes in these test suites is HDFS-7722. That patch still looks fine though. I don't know if there are other uncommitted patches that changed these test suites.

I suppose it's also possible that the JUnit process unexpectedly died after removing executable permissions but before restoring them. That always would have been a weakness of these test suites, regardless of any recent changes.

Chris Nauroth
Hortonworks
http://hortonworks.com/

On 3/10/15, 1:47 PM, Aaron T. Myers a...@cloudera.com wrote:

Hey Colin,

I asked Andrew Bayer, who works with Apache Infra, what's going on with these boxes. He took a look and concluded that some perms are being set in those directories by our unit tests which are precluding those files from getting deleted. He's going to clean up the boxes for us, but we should expect this to keep happening until we can fix the test in question to properly clean up after itself.

To help narrow down which commit it was that started this, Andrew sent me this info:

"/home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data3/ has 500 perms, so I'm guessing that's the problem. Been that way since 9:32 UTC on March 5th."

--
Aaron T. Myers
Software Engineer, Cloudera

On Tue, Mar 10, 2015 at 1:24 PM, Colin P. McCabe cmcc...@apache.org
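For what it's worth, a sketch of the directory-replacement alternative Lei proposes above (a hypothetical helper, not the eventual HDFS-7917 patch): replace the data directory with a regular file of the same name, so DiskChecker's isDirectory() check fails exactly as it would for a bad disk, and teardown only has to delete a file instead of restoring permission bits.

import java.io.File;
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

final class SimulatedDiskFailure {

    // Replace the data directory with a regular file of the same name.
    // dir.isDirectory() now returns false, so DiskChecker#checkDirAccess
    // reports a failure, but no permission bits are left behind.
    static void failDir(File dataDir) throws IOException {
        Path p = dataDir.toPath();
        deleteRecursively(p);
        Files.createFile(p);
    }

    // Undo the simulated failure: remove the marker file, recreate the dir.
    static void restore(File dataDir) throws IOException {
        Files.deleteIfExists(dataDir.toPath());
        Files.createDirectories(dataDir.toPath());
    }

    private static void deleteRecursively(Path p) throws IOException {
        if (Files.isDirectory(p)) {
            try (DirectoryStream<Path> children = Files.newDirectoryStream(p)) {
                for (Path child : children) {
                    deleteRecursively(child);
                }
            }
        }
        Files.deleteIfExists(p);
    }
}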
[jira] [Created] (HDFS-7933) fsck should also report decommissioning replicas.
Jitendra Nath Pandey created HDFS-7933:
---------------------------------------

Summary: fsck should also report decommissioning replicas.
Key: HDFS-7933
URL: https://issues.apache.org/jira/browse/HDFS-7933
Project: Hadoop HDFS
Issue Type: Bug
Components: namenode
Reporter: Jitendra Nath Pandey

Fsck doesn't count replicas that are on decommissioning nodes. If a block has all of its replicas on decommissioning nodes, it is marked as missing, which is alarming for admins, even though the system will re-replicate the block before the nodes are decommissioned. Fsck output should show decommissioning replicas along with the live replicas.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
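A toy illustration of the reporting change being requested (simplified types, not the real NamenodeFsck code): a block with zero live replicas but replicas on decommissioning nodes is not lost, and showing both counts avoids the false alarm.

import java.util.Arrays;
import java.util.List;

class ReplicaReport {
    enum State { LIVE, DECOMMISSIONING }

    static String describe(List<State> replicas) {
        int live = 0, decommissioning = 0;
        for (State s : replicas) {
            if (s == State.LIVE) live++;
            else if (s == State.DECOMMISSIONING) decommissioning++;
        }
        if (live == 0 && decommissioning == 0) {
            return "MISSING";    // truly lost: no replica anywhere
        }
        return "live=" + live + " decommissioning=" + decommissioning;
    }

    public static void main(String[] args) {
        // All replicas on decommissioning nodes: not missing, just in transit.
        System.out.println(describe(Arrays.asList(State.DECOMMISSIONING,
                                                  State.DECOMMISSIONING)));
        // prints: live=0 decommissioning=2
    }
}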
[jira] [Created] (HDFS-7932) Speed up the shutdown of datanode during rolling upgrade
Kihwal Lee created HDFS-7932:
-----------------------------

Summary: Speed up the shutdown of datanode during rolling upgrade
Key: HDFS-7932
URL: https://issues.apache.org/jira/browse/HDFS-7932
Project: Hadoop HDFS
Issue Type: Bug
Reporter: Kihwal Lee

The datanode normally exits in 3 seconds after receiving the {{shutdownDatanode}} command. However, sometimes it doesn't, especially when the IO is busy.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-7931) Spurious Error message Could not find uri with key [dfs.encryption.key.provider.uri] to create a key appears even when Encryption is disabled
Arun Suresh created HDFS-7931:
------------------------------

Summary: Spurious Error message Could not find uri with key [dfs.encryption.key.provider.uri] to create a key appears even when Encryption is disabled
Key: HDFS-7931
URL: https://issues.apache.org/jira/browse/HDFS-7931
Project: Hadoop HDFS
Issue Type: Bug
Reporter: Arun Suresh
Priority: Minor

The {{addDelegationTokens}} method in {{DistributedFileSystem}} calls {{DFSClient#getKeyProvider()}}, which attempts to get a provider from the {{KeyProviderCache}}. But since the required key, *dfs.encryption.key.provider.uri*, is not present (due to encryption being disabled), it throws an exception:

{noformat}
2015-03-11 23:55:47,849 [JobControl] ERROR org.apache.hadoop.hdfs.KeyProviderCache - Could not find uri with key [dfs.encryption.key.provider.uri] to create a keyProvider !!
{noformat}

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
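A minimal sketch of the guard the report implies (illustrative only, not the actual DFSClient fix): look the key up and return quietly when it is absent, so clusters with encryption disabled never reach the error path that logs the message above.

import java.util.Map;

class KeyProviderLookup {
    static final String KEY_PROVIDER_URI = "dfs.encryption.key.provider.uri";

    // Returns null quietly when encryption is not configured, instead of
    // throwing (and logging an error) on the missing key.
    static String getProviderUri(Map<String, String> conf) {
        String uri = conf.get(KEY_PROVIDER_URI);
        if (uri == null || uri.isEmpty()) {
            return null;    // encryption disabled: no provider, no error log
        }
        return uri;
    }
}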
[jira] [Resolved] (HDFS-1841) Enforce read-only permissions in FUSE open()
[ https://issues.apache.org/jira/browse/HDFS-1841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Colin Patrick McCabe resolved HDFS-1841.
----------------------------------------
Resolution: Duplicate

Duplicate of HDFS-4139 from 2012.

Enforce read-only permissions in FUSE open()

Key: HDFS-1841
URL: https://issues.apache.org/jira/browse/HDFS-1841
Project: Hadoop HDFS
Issue Type: Bug
Components: fuse-dfs
Affects Versions: 0.20.2
Environment: Linux 2.6.35
Reporter: Brian Bloniarz
Priority: Minor
Attachments: patch.fuse-dfs, patch.fuse-dfs.kernel

fuse-dfs currently allows files to be created on a read-only filesystem:

$ fuse_dfs_wrapper.sh dfs://example.com:8020 ro ~/hdfs
$ touch ~/hdfs/foobar

Attached is a simple patch, which does two things:
1) Checks the read_only flag inside dfs_open().
2) Passes the read-only mount option to FUSE when ro is specified on the command line. This is probably the better long-term solution; the kernel will then enforce read-only operation without the check being needed inside the client.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: upstream jenkins build broken?
I filed HDFS-7917 to change the way we simulate disk failures. But I think we still need the infrastructure folks to help with the jenkins scripts to clean up the directories left behind today.

On Fri, Mar 13, 2015 at 1:38 PM, Mai Haohui ricet...@gmail.com wrote:

Any updates on this issue? It seems that all HDFS jenkins builds are still failing.

Regards,
Haohui
[jira] [Reopened] (HDFS-7915) The DataNode can sometimes allocate a ShortCircuitShm slot and fail to tell the DFSClient about it because of a network error
[ https://issues.apache.org/jira/browse/HDFS-7915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Colin Patrick McCabe reopened HDFS-7915:
----------------------------------------

Oops, I just saw that jenkins didn't run on v6 yet. Sigh...

The DataNode can sometimes allocate a ShortCircuitShm slot and fail to tell the DFSClient about it because of a network error

Key: HDFS-7915
URL: https://issues.apache.org/jira/browse/HDFS-7915
Project: Hadoop HDFS
Issue Type: Bug
Affects Versions: 2.7.0
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
Fix For: 2.7.0
Attachments: HDFS-7915.001.patch, HDFS-7915.002.patch, HDFS-7915.004.patch, HDFS-7915.005.patch, HDFS-7915.006.patch

The DataNode can sometimes allocate a ShortCircuitShm slot and fail to tell the DFSClient about it because of a network error. In {{DataXceiver#requestShortCircuitFds}}, the DataNode can succeed at the first part (marking the slot as used) and fail at the second part (telling the DFSClient what it did). The try block for unregistering the slot only covers a failure in the first part, not the second, so a divergence can form between the DFSClient's and the server's views of which slots are allocated.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
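A minimal sketch of the fix pattern (all names here are illustrative, not the real DataXceiver code): the allocation and the notification must share one rollback scope, so that a network failure while telling the client also releases the slot and the two views cannot diverge.

import java.io.IOException;
import java.util.HashSet;
import java.util.Set;

class SlotRegistry {
    private final Set<Integer> usedSlots = new HashSet<>();

    int allocateAndNotify(ClientChannel client) throws IOException {
        int slot = allocate();               // step 1: mark the slot used
        boolean success = false;
        try {
            client.send("slot " + slot);     // step 2: may fail on the network
            success = true;
        } finally {
            if (!success) {
                usedSlots.remove(slot);      // roll back step 1 on any failure
            }
        }
        return slot;
    }

    private int allocate() {
        int slot = usedSlots.size();
        usedSlots.add(slot);
        return slot;
    }

    interface ClientChannel {
        void send(String msg) throws IOException;
    }
}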
[jira] [Resolved] (HDFS-7191) WebHDFS prematurely closes connections under high concurrent loads
[ https://issues.apache.org/jira/browse/HDFS-7191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Haohui Mai resolved HDFS-7191.
------------------------------
Resolution: Duplicate

HDFS-7279 should fix this problem.

WebHDFS prematurely closes connections under high concurrent loads

Key: HDFS-7191
URL: https://issues.apache.org/jira/browse/HDFS-7191
Project: Hadoop HDFS
Issue Type: Bug
Reporter: Haohui Mai
Priority: Critical

We're seeing the DN prematurely close APPEND connections:

{noformat}
2014-09-22 23:53:12,721 WARN org.apache.hadoop.hdfs.web.resources.ExceptionHandler: INTERNAL_SERVER_ERROR
java.nio.channels.CancelledKeyException
        at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:55)
        at sun.nio.ch.SelectionKeyImpl.interestOps(SelectionKeyImpl.java:59)
        at org.mortbay.io.nio.SelectChannelEndPoint.updateKey(SelectChannelEndPoint.java:325)
        at org.mortbay.io.nio.SelectChannelEndPoint.blockReadable(SelectChannelEndPoint.java:242)
        at org.mortbay.jetty.HttpParser$Input.blockForContent(HttpParser.java:1169)
        at org.mortbay.jetty.HttpParser$Input.read(HttpParser.java:1122)
        at java.io.InputStream.read(InputStream.java:85)
        at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:84)
        at org.apache.hadoop.hdfs.server.datanode.web.resources.DatanodeWebHdfsMethods.put(DatanodeWebHdfsMethods.java:239)
        at org.apache.hadoop.hdfs.server.datanode.web.resources.DatanodeWebHdfsMethods.access$000(DatanodeWebHdfsMethods.java:87)
        at org.apache.hadoop.hdfs.server.datanode.web.resources.DatanodeWebHdfsMethods$1.run(DatanodeWebHdfsMethods.java:205)
        at org.apache.hadoop.hdfs.server.datanode.web.resources.DatanodeWebHdfsMethods$1.run(DatanodeWebHdfsMethods.java:202)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
        at org.apache.hadoop.hdfs.server.datanode.web.resources.DatanodeWebHdfsMethods.put(DatanodeWebHdfsMethods.java:202)
{noformat}

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HDFS-6118) Code cleanup
[ https://issues.apache.org/jira/browse/HDFS-6118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Haohui Mai resolved HDFS-6118.
------------------------------
Resolution: Fixed

Code cleanup

Key: HDFS-6118
URL: https://issues.apache.org/jira/browse/HDFS-6118
Project: Hadoop HDFS
Issue Type: Bug
Reporter: Suresh Srinivas
Assignee: Suresh Srinivas

HDFS code needs cleanup related to many typos, undocumented parameters, unused methods, unnecessary casts, imports, and exceptions declared as thrown, to name a few. I plan on working on cleaning this up as I get time. To keep code review manageable, I will create subtasks and clean up the code a few classes at a time.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HDFS-5193) Unifying HA support in HftpFileSystem, HsftpFileSystem and WebHdfsFileSystem
[ https://issues.apache.org/jira/browse/HDFS-5193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Haohui Mai resolved HDFS-5193.
------------------------------
Resolution: Won't Fix

As hftp is being phased out, there is little motivation to get this fixed.

Unifying HA support in HftpFileSystem, HsftpFileSystem and WebHdfsFileSystem

Key: HDFS-5193
URL: https://issues.apache.org/jira/browse/HDFS-5193
Project: Hadoop HDFS
Issue Type: Improvement
Reporter: Haohui Mai

Recent changes in HDFS-5122 implement HA support for the WebHDFS client. Like the WebHDFS client, both HftpFileSystem and HsftpFileSystem access HDFS via HTTP, but their current implementation hinders the implementation of HA support. I propose to refactor HftpFileSystem, HsftpFileSystem, and WebHdfsFileSystem to provide unified abstractions that support HA clusters over HTTP.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HDFS-7050) Implementation of NameNodeMXBean.getLiveNodes() skips DataNodes started on the same host
[ https://issues.apache.org/jira/browse/HDFS-7050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Haohui Mai resolved HDFS-7050.
------------------------------
Resolution: Duplicate

Fixed in HDFS-7303.

Implementation of NameNodeMXBean.getLiveNodes() skips DataNodes started on the same host

Key: HDFS-7050
URL: https://issues.apache.org/jira/browse/HDFS-7050
Project: Hadoop HDFS
Issue Type: Bug
Components: namenode, webhdfs
Reporter: Przemyslaw Pretki
Priority: Minor

If two or more DataNodes are running on the same host, only one of them is reported via the datanode tab of the web page (and the NameNodeMXBean interface).

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HDFS-6496) WebHDFS cannot open file
[ https://issues.apache.org/jira/browse/HDFS-6496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Haohui Mai resolved HDFS-6496.
------------------------------
Resolution: Invalid

WebHDFS cannot open file

Key: HDFS-6496
URL: https://issues.apache.org/jira/browse/HDFS-6496
Project: Hadoop HDFS
Issue Type: Bug
Components: namenode
Affects Versions: 2.4.0
Reporter: Fengdong Yu
Attachments: webhdfs.PNG

WebHDFS cannot open the file on the name node web UI. I attached a screenshot.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-7927) Fluentd unable to write events to MaprFS using httpfs
Roman Slysh created HDFS-7927:
------------------------------

Summary: Fluentd unable to write events to MaprFS using httpfs
Key: HDFS-7927
URL: https://issues.apache.org/jira/browse/HDFS-7927
Project: Hadoop HDFS
Issue Type: Bug
Environment: mapr 4.0.1
Reporter: Roman Slysh

The issue is on the MaprFS file system. It can probably be reproduced on HDFS, but we are not sure. In the td-agent log we have observed that whenever the webhdfs plugin flushes events, it calls append instead of creating the file on MaprFS when communicating over webhdfs. We need to modify the plugin to create the file and then append data to it; creating files manually is not a solution, because a lot of log events are written to the filesystem and they need to rotate on a timely basis.

http://docs.fluentd.org/articles/http-to-hdfs

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
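For reference, a rough sketch of the create-then-append flow the plugin would need, using the standard WebHDFS REST operations (the host, port, and path below are placeholders, and error handling is omitted). WebHDFS answers both CREATE and APPEND with a 307 redirect to a datanode, so each write is a two-step exchange.

import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class WebHdfsCreateThenAppend {

    static void writeOp(String urlStr, String method, byte[] body) throws Exception {
        // Step 1: ask the namenode, without sending data, to get the redirect.
        HttpURLConnection nn = (HttpURLConnection) new URL(urlStr).openConnection();
        nn.setRequestMethod(method);
        nn.setInstanceFollowRedirects(false);
        String dnUrl = nn.getHeaderField("Location");
        nn.disconnect();

        // Step 2: send the data to the datanode named in the redirect.
        HttpURLConnection dn = (HttpURLConnection) new URL(dnUrl).openConnection();
        dn.setRequestMethod(method);
        dn.setDoOutput(true);
        try (OutputStream out = dn.getOutputStream()) {
            out.write(body);
        }
        System.out.println(method + " -> HTTP " + dn.getResponseCode());
        dn.disconnect();
    }

    public static void main(String[] args) throws Exception {
        String base = "http://namenode:50070/webhdfs/v1/logs/events.log"; // placeholder
        writeOp(base + "?op=CREATE&overwrite=false", "PUT", new byte[0]);  // create once
        writeOp(base + "?op=APPEND", "POST",
                "event line\n".getBytes(StandardCharsets.UTF_8));          // then append
    }
}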
Re: How is the ack sent back upstream in a pipeline when writing data to HDFS
On 3/13/2015 7:55 AM, xiaohe lan wrote:

Hi experts,

When the HDFS client sends a packet of data to a DN in the pipeline, the packet is then forwarded to the next DN in the pipeline. What confuses me is when the ack from a DN in the pipeline is sent back, and in which order. Is it sent from the last DN to the first, or in some other way?

Thanks,
Xiaohe

Hi Xiaohe,

Take a look at figure 3.2 in
https://issues.apache.org/jira/secure/attachment/12445209/appendDesign3.pdf.

IHTH.
Charles
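In short, per the flow that design document describes: data packets travel down the pipeline, and the ack for each packet originates at the last datanode and travels back upstream, each node adding its own status, so the client receives one ack per packet covering the whole pipeline. A toy model of that ack path (simplified; this is not DataNode code):

import java.util.ArrayDeque;
import java.util.Deque;

class PipelineAckDemo {
    public static void main(String[] args) {
        String[] pipeline = {"dn1", "dn2", "dn3"};   // client writes to dn1 first
        Deque<String> ack = new ArrayDeque<>();
        // The ack originates at the LAST datanode; each upstream node waits
        // for the downstream ack, prepends its own status, and forwards the
        // combined ack toward the client.
        for (int i = pipeline.length - 1; i >= 0; i--) {
            ack.addFirst(pipeline[i] + ":SUCCESS");
        }
        System.out.println("client receives: " + ack);
        // prints: client receives: [dn1:SUCCESS, dn2:SUCCESS, dn3:SUCCESS]
    }
}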
Hadoop-Hdfs-trunk - Build # 2063 - Still Failing
See https://builds.apache.org/job/Hadoop-Hdfs-trunk/2063/

###################################################################################
########################## LAST 60 LINES OF THE CONSOLE ###########################
###################################################################################
[...truncated 7563 lines...]
main:
    [mkdir] Created dir: /home/jenkins/jenkins-slave/workspace/Hadoop-Hdfs-trunk/hadoop-hdfs-project/target/test-dir
[INFO] Executed tasks
[INFO]
[INFO] --- maven-source-plugin:2.3:jar-no-fork (hadoop-java-sources) @ hadoop-hdfs-project ---
[INFO]
[INFO] --- maven-source-plugin:2.3:test-jar-no-fork (hadoop-java-sources) @ hadoop-hdfs-project ---
[INFO]
[INFO] --- maven-enforcer-plugin:1.3.1:enforce (dist-enforce) @ hadoop-hdfs-project ---
[INFO]
[INFO] --- maven-site-plugin:3.4:attach-descriptor (attach-descriptor) @ hadoop-hdfs-project ---
[INFO]
[INFO] --- maven-javadoc-plugin:2.8.1:jar (module-javadocs) @ hadoop-hdfs-project ---
[INFO] Skipping javadoc generation
[INFO]
[INFO] --- maven-enforcer-plugin:1.3.1:enforce (depcheck) @ hadoop-hdfs-project ---
[INFO]
[INFO] --- maven-checkstyle-plugin:2.12.1:checkstyle (default-cli) @ hadoop-hdfs-project ---
[INFO]
[INFO] --- findbugs-maven-plugin:3.0.0:findbugs (default-cli) @ hadoop-hdfs-project ---
[INFO]
[INFO] Reactor Summary:
[INFO]
[INFO] Apache Hadoop HDFS ................................ FAILURE [03:06 h]
[INFO] Apache Hadoop HttpFS .............................. SKIPPED
[INFO] Apache Hadoop HDFS BookKeeper Journal ............. SKIPPED
[INFO] Apache Hadoop HDFS-NFS ............................ SKIPPED
[INFO] Apache Hadoop HDFS Project ........................ SUCCESS [  2.231 s]
[INFO]
[INFO] BUILD FAILURE
[INFO]
[INFO] Total time: 03:06 h
[INFO] Finished at: 2015-03-13T14:40:47+00:00
[INFO] Final Memory: 51M/626M
[INFO]
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-surefire-plugin:2.17:test (default-test) on project hadoop-hdfs: There are test failures.
[ERROR]
[ERROR] Please refer to /home/jenkins/jenkins-slave/workspace/Hadoop-Hdfs-trunk/hadoop-hdfs-project/hadoop-hdfs/target/surefire-reports for the individual test results.
[ERROR] -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
Build step 'Execute shell' marked build as failure
Archiving artifacts
Recording test results
Updating HDFS-7722
Updating HDFS-6833
Updating HADOOP-9477
Updating HADOOP-11710
Updating HADOOP-11711
Updating YARN-3154
Updating YARN-3338
Sending e-mails to: hdfs-dev@hadoop.apache.org
Email was triggered for: Failure
Sending email for trigger: Failure

###################################################################################
############################## FAILED TESTS (if any) ##############################
###################################################################################
5 tests failed.

REGRESSION: org.apache.hadoop.hdfs.TestAppendSnapshotTruncate.testAST

Error Message:
file00 has ERROR

Stack Trace:
java.lang.IllegalStateException: file00 has ERROR
        at org.apache.hadoop.hdfs.TestAppendSnapshotTruncate$Worker.checkErrorState(TestAppendSnapshotTruncate.java:429)
        at org.apache.hadoop.hdfs.TestAppendSnapshotTruncate$Worker.stop(TestAppendSnapshotTruncate.java:483)
        at org.apache.hadoop.hdfs.TestAppendSnapshotTruncate$DirWorker.stopAllFiles(TestAppendSnapshotTruncate.java:263)
        at org.apache.hadoop.hdfs.TestAppendSnapshotTruncate.testAST(TestAppendSnapshotTruncate.java:128)
Caused by: java.lang.AssertionError: inode should complete in ~3 ms.
Expected: is <true>
     but: was <false>
        at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20)
        at org.junit.Assert.assertThat(Assert.java:865)
        at org.apache.hadoop.hdfs.server.namenode.TestFileTruncate.checkBlockRecovery(TestFileTruncate.java:1170)
        at org.apache.hadoop.hdfs.TestAppendSnapshotTruncate$FileWorker.truncate(TestAppendSnapshotTruncate.java:366)
        at org.apache.hadoop.hdfs.TestAppendSnapshotTruncate$FileWorker.truncateArbitrarily(TestAppendSnapshotTruncate.java:342)
        at org.apache.hadoop.hdfs.TestAppendSnapshotTruncate$FileWorker.call(TestAppendSnapshotTruncate.java:307)
        at org.apache.hadoop.hdfs.TestAppendSnapshotTruncate$FileWorker.call(TestAppendSnapshotTruncate.java:280)
        at org.apache.hadoop.hdfs.TestAppendSnapshotTruncate$Worker$1.run(TestAppendSnapshotTruncate.java:454)
        at java.lang.Thread.run(Thread.java:745)

REGRESSION:
Build failed in Jenkins: Hadoop-Hdfs-trunk-Java8 #122
See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/122/changes

Changes:

[xgong] YARN-3338. Exclude jline dependency from YARN. Contributed by Zhijie
[szetszwo] HDFS-6833. DirectoryScanner should not register a deleting block with memory of DataNode. Contributed by Shinichi Yamashita
[cmccabe] HDFS-7722. DataNode#checkDiskError should also remove Storage when error is found. (Lei Xu via Colin P. McCabe)
[vinodkv] YARN-3154. Added additional APIs in LogAggregationContext to avoid aggregating running logs of application when rolling is enabled. Contributed by Xuan Gong.
[yzhang] HADOOP-9477. Add posixGroups support for LDAP groups mapping service. (Dapeng Sun via Yongjun Zhang)
[yliu] HADOOP-11710. Make CryptoOutputStream behave like DFSOutputStream wrt synchronization. (Sean Busbey via yliu)
[wang] HADOOP-11711. Provide a default value for AES/CTR/NoPadding CryptoCodec classes.

------------------------------------------
[...truncated 8234 lines...]
Running org.apache.hadoop.tracing.TestTracing
Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 5.079 sec - in org.apache.hadoop.tracing.TestTracing
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=768m; support was removed in 8.0
Running org.apache.hadoop.tracing.TestTracingShortCircuitLocalRead
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 4.53 sec - in org.apache.hadoop.tracing.TestTracingShortCircuitLocalRead
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=768m; support was removed in 8.0
Running org.apache.hadoop.tracing.TestTraceAdmin
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 4.16 sec - in org.apache.hadoop.tracing.TestTraceAdmin
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=768m; support was removed in 8.0
Running org.apache.hadoop.security.TestPermission
Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 6.229 sec - in org.apache.hadoop.security.TestPermission
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=768m; support was removed in 8.0
Running org.apache.hadoop.security.TestPermissionSymlinks
Tests run: 15, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 5.198 sec - in org.apache.hadoop.security.TestPermissionSymlinks
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=768m; support was removed in 8.0
Running org.apache.hadoop.security.TestRefreshUserMappings
Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 5.826 sec - in org.apache.hadoop.security.TestRefreshUserMappings
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=768m; support was removed in 8.0
Running org.apache.hadoop.fs.TestFcHdfsSetUMask
Tests run: 12, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 4.408 sec - in org.apache.hadoop.fs.TestFcHdfsSetUMask
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=768m; support was removed in 8.0
Running org.apache.hadoop.fs.TestSymlinkHdfsFileSystem
Tests run: 72, Failures: 0, Errors: 0, Skipped: 2, Time elapsed: 8.099 sec - in org.apache.hadoop.fs.TestSymlinkHdfsFileSystem
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=768m; support was removed in 8.0
Running org.apache.hadoop.fs.loadGenerator.TestLoadGenerator
Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 14.65 sec - in org.apache.hadoop.fs.loadGenerator.TestLoadGenerator
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=768m; support was removed in 8.0
Running org.apache.hadoop.fs.contract.hdfs.TestHDFSContractRename
Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 4.795 sec - in org.apache.hadoop.fs.contract.hdfs.TestHDFSContractRename
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=768m; support was removed in 8.0
Running org.apache.hadoop.fs.contract.hdfs.TestHDFSContractDelete
Tests run: 8, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 4.687 sec - in org.apache.hadoop.fs.contract.hdfs.TestHDFSContractDelete
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=768m; support was removed in 8.0
Running org.apache.hadoop.fs.contract.hdfs.TestHDFSContractAppend
Tests run: 5, Failures: 0, Errors: 0, Skipped: 1, Time elapsed: 4.785 sec - in org.apache.hadoop.fs.contract.hdfs.TestHDFSContractAppend
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=768m; support was removed in 8.0
Running org.apache.hadoop.fs.contract.hdfs.TestHDFSContractOpen
Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 4.477 sec - in org.apache.hadoop.fs.contract.hdfs.TestHDFSContractOpen
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=768m; support was removed in 8.0
Running org.apache.hadoop.fs.contract.hdfs.TestHDFSContractConcat
Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 4.708 sec - in
Build failed in Jenkins: Hadoop-Hdfs-trunk #2063
See https://builds.apache.org/job/Hadoop-Hdfs-trunk/2063/changes

Changes:

[xgong] YARN-3338. Exclude jline dependency from YARN. Contributed by Zhijie
[szetszwo] HDFS-6833. DirectoryScanner should not register a deleting block with memory of DataNode. Contributed by Shinichi Yamashita
[cmccabe] HDFS-7722. DataNode#checkDiskError should also remove Storage when error is found. (Lei Xu via Colin P. McCabe)
[vinodkv] YARN-3154. Added additional APIs in LogAggregationContext to avoid aggregating running logs of application when rolling is enabled. Contributed by Xuan Gong.
[yzhang] HADOOP-9477. Add posixGroups support for LDAP groups mapping service. (Dapeng Sun via Yongjun Zhang)
[yliu] HADOOP-11710. Make CryptoOutputStream behave like DFSOutputStream wrt synchronization. (Sean Busbey via yliu)
[wang] HADOOP-11711. Provide a default value for AES/CTR/NoPadding CryptoCodec classes.

------------------------------------------
[...truncated 7370 lines...]
Running org.apache.hadoop.hdfs.qjournal.client.TestQuorumJournalManager
Tests run: 21, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 20.998 sec - in org.apache.hadoop.hdfs.qjournal.client.TestQuorumJournalManager
Running org.apache.hadoop.hdfs.qjournal.client.TestSegmentRecoveryComparator
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.299 sec - in org.apache.hadoop.hdfs.qjournal.client.TestSegmentRecoveryComparator
Running org.apache.hadoop.hdfs.qjournal.client.TestIPCLoggerChannel
Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.164 sec - in org.apache.hadoop.hdfs.qjournal.client.TestIPCLoggerChannel
Running org.apache.hadoop.hdfs.qjournal.client.TestEpochsAreUnique
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 7.116 sec - in org.apache.hadoop.hdfs.qjournal.client.TestEpochsAreUnique
Running org.apache.hadoop.hdfs.qjournal.client.TestQJMWithFaults
Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 154.29 sec - in org.apache.hadoop.hdfs.qjournal.client.TestQJMWithFaults
Running org.apache.hadoop.hdfs.qjournal.client.TestQuorumCall
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.261 sec - in org.apache.hadoop.hdfs.qjournal.client.TestQuorumCall
Running org.apache.hadoop.hdfs.qjournal.TestMiniJournalCluster
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.813 sec - in org.apache.hadoop.hdfs.qjournal.TestMiniJournalCluster
Running org.apache.hadoop.hdfs.qjournal.TestNNWithQJM
Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 7.39 sec - in org.apache.hadoop.hdfs.qjournal.TestNNWithQJM
Running org.apache.hadoop.hdfs.TestConnCache
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 3.944 sec - in org.apache.hadoop.hdfs.TestConnCache
Running org.apache.hadoop.hdfs.TestDFSStorageStateRecovery
Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 62.958 sec - in org.apache.hadoop.hdfs.TestDFSStorageStateRecovery
Running org.apache.hadoop.hdfs.TestFileAppend
Tests run: 10, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 22.624 sec - in org.apache.hadoop.hdfs.TestFileAppend
Running org.apache.hadoop.hdfs.TestFileAppend3
Tests run: 15, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 42.261 sec - in org.apache.hadoop.hdfs.TestFileAppend3
Running org.apache.hadoop.hdfs.TestClientReportBadBlock
Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 8.013 sec - in org.apache.hadoop.hdfs.TestClientReportBadBlock
Running org.apache.hadoop.hdfs.TestParallelShortCircuitReadNoChecksum
Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 15.163 sec - in org.apache.hadoop.hdfs.TestParallelShortCircuitReadNoChecksum
Running org.apache.hadoop.hdfs.TestFileCreation
Tests run: 23, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 381.347 sec - in org.apache.hadoop.hdfs.TestFileCreation
Running org.apache.hadoop.hdfs.TestDFSRemove
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 15.335 sec - in org.apache.hadoop.hdfs.TestDFSRemove
Running org.apache.hadoop.hdfs.TestHdfsAdmin
Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 3.895 sec - in org.apache.hadoop.hdfs.TestHdfsAdmin
Running org.apache.hadoop.hdfs.TestDFSUtil
Tests run: 30, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 104.465 sec - in org.apache.hadoop.hdfs.TestDFSUtil
Running org.apache.hadoop.hdfs.TestWriteBlockGetsBlockLengthHint
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 4.058 sec - in org.apache.hadoop.hdfs.TestWriteBlockGetsBlockLengthHint
Running org.apache.hadoop.hdfs.TestDataTransferKeepalive
Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 15.814 sec - in org.apache.hadoop.hdfs.TestDataTransferKeepalive
Running org.apache.hadoop.hdfs.TestLease
Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 8.486 sec - in org.apache.hadoop.hdfs.TestLease
Running org.apache.hadoop.hdfs.TestEncryptionZonesWithKMS
Tests run: 19, Failures: 0, Errors: 0,
How is the ack sent back upstream in a pipeline when writing data to HDFS
Hi experts,

When the HDFS client sends a packet of data to a DN in the pipeline, the packet is then forwarded to the next DN in the pipeline. What confuses me is when the ack from a DN in the pipeline is sent back, and in which order. Is it sent from the last DN to the first, or in some other way?

Thanks,
Xiaohe
[jira] [Created] (HDFS-7928) Scanning blocks from disk during rolling upgrade startup takes a lot of time if disks are busy
Rushabh S Shah created HDFS-7928:
---------------------------------

Summary: Scanning blocks from disk during rolling upgrade startup takes a lot of time if disks are busy
Key: HDFS-7928
URL: https://issues.apache.org/jira/browse/HDFS-7928
Project: Hadoop HDFS
Issue Type: Improvement
Components: datanode
Affects Versions: 2.6.0
Reporter: Rushabh S Shah
Assignee: Rushabh S Shah

We observed this issue during a rolling upgrade to 2.6.x on one of our clusters. One of the disks was very busy, and it took a long time to scan that disk compared to the others; the sar (System Activity Reporter) data showed that the disk was busy performing IO operations.

We are requesting an improvement to datanode rolling upgrade: during shutdown, persist the whole volume map on disk and let the datanode read that file to recreate the volume map during startup after the rolling upgrade. This would spare the datanode process from scanning all the disks and reading the blocks, and would significantly improve datanode startup time.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
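A bare-bones sketch of the persistence idea (illustrative only; the file format and class names are made up for the example): serialize the in-memory block-to-volume map at shutdown and reload it at startup, instead of rescanning every disk.

import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.HashMap;
import java.util.Map;

class VolumeMapCache {

    // Called during shutdown: write the block-id -> volume map to disk.
    static void save(Map<Long, String> blockToVolume, File f) throws IOException {
        try (ObjectOutputStream out =
                 new ObjectOutputStream(new FileOutputStream(f))) {
            out.writeObject(new HashMap<>(blockToVolume));
        }
    }

    // Called during startup after a rolling upgrade: rebuild the volume map
    // from the saved file instead of scanning every disk.
    @SuppressWarnings("unchecked")
    static Map<Long, String> load(File f) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in =
                 new ObjectInputStream(new FileInputStream(f))) {
            return (Map<Long, String>) in.readObject();
        }
    }
}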