[jira] [Commented] (HDFS-742) A down DataNode makes Balancer hang by repeatedly asking NameNode for its partial block list
[ https://issues.apache.org/jira/browse/HDFS-742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14644357#comment-14644357 ] Mit Desai commented on HDFS-742:

Attached a modified patch. But I still do not have a unit test for the fix.

A down DataNode makes Balancer hang by repeatedly asking NameNode for its partial block list

Key: HDFS-742
URL: https://issues.apache.org/jira/browse/HDFS-742
Project: Hadoop HDFS
Issue Type: Bug
Components: balancer & mover
Reporter: Hairong Kuang
Assignee: Mit Desai
Attachments: HDFS-742-trunk.patch, HDFS-742.patch

We had a balancer that had not made any progress for a long time. It turned out it was repeatedly asking the NameNode for a partial block list of one datanode, which had gone down while the balancer was running. The NameNode should notify the Balancer that the datanode is not available, and the Balancer should stop asking for that datanode's block list.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Updated] (HDFS-742) A down DataNode makes Balancer hang by repeatedly asking NameNode for its partial block list
[ https://issues.apache.org/jira/browse/HDFS-742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai updated HDFS-742:
---
Attachment: HDFS-742-trunk.patch

A down DataNode makes Balancer hang by repeatedly asking NameNode for its partial block list

Key: HDFS-742
URL: https://issues.apache.org/jira/browse/HDFS-742
Project: Hadoop HDFS
Issue Type: Bug
Components: balancer & mover
Reporter: Hairong Kuang
Assignee: Mit Desai
Attachments: HDFS-742-trunk.patch, HDFS-742.patch

We had a balancer that had not made any progress for a long time. It turned out it was repeatedly asking the NameNode for a partial block list of one datanode, which had gone down while the balancer was running. The NameNode should notify the Balancer that the datanode is not available, and the Balancer should stop asking for that datanode's block list.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (HDFS-7364) Balancer always shows zero Bytes Already Moved
[ https://issues.apache.org/jira/browse/HDFS-7364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14200425#comment-14200425 ] Mit Desai commented on HDFS-7364:

Nice catch. The balancer exits here after 5 iterations of what it thinks were 0 B moves. That means it is still balancing and exits in the middle of the process. I see that the Bytes Left To Move goes down in every iteration. It will be nice to have this fixed, but it would be good to have a unit test as well.

Balancer always shows zero Bytes Already Moved

Key: HDFS-7364
URL: https://issues.apache.org/jira/browse/HDFS-7364
Project: Hadoop HDFS
Issue Type: Bug
Components: balancer & mover
Reporter: Tsz Wo Nicholas Sze
Assignee: Tsz Wo Nicholas Sze
Attachments: h7364_20141105.patch

Here is an example:

{noformat}
Time Stamp               Iteration#  Bytes Already Moved  Bytes Left To Move  Bytes Being Moved
Nov 5, 2014 5:23:38 PM   0           0 B                  116.82 MB           181.07 MB
Nov 5, 2014 5:24:30 PM   1           0 B                  88.05 MB            181.07 MB
Nov 5, 2014 5:25:10 PM   2           0 B                  73.08 MB            181.07 MB
Nov 5, 2014 5:25:49 PM   3           0 B                  13.37 MB            90.53 MB
Nov 5, 2014 5:26:30 PM   4           0 B                  13.59 MB            90.53 MB
Nov 5, 2014 5:27:12 PM   5           0 B                  9.25 MB             90.53 MB
The cluster is balanced. Exiting...
{noformat}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
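As a hedged illustration only (the class, counter, and constant names below are invented, not the actual Balancer source): the kind of no-progress check being discussed stops after a fixed number of iterations with zero reported bytes moved, so a "bytes moved" value that is never updated can make the tool quit while real work remains.

{code}
// Sketch of a no-progress exit check, assuming a hypothetical counter.
// If "bytes moved" is mis-reported as 0 B, this check can fire while
// real moves are still happening, ending balancing mid-process.
public class NoProgressCheck {
  private static final int MAX_IDLE_ITERATIONS = 5; // hypothetical limit
  private int idleIterations = 0;

  /** @return true if balancing should continue after this iteration. */
  boolean shouldContinue(long bytesMovedThisIteration) {
    if (bytesMovedThisIteration > 0) {
      idleIterations = 0;  // progress was made; reset the counter
      return true;
    }
    idleIterations++;      // no (reported) progress this round
    return idleIterations < MAX_IDLE_ITERATIONS;
  }
}
{code}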
[jira] [Commented] (HDFS-7230) Add rolling downgrade documentation
[ https://issues.apache.org/jira/browse/HDFS-7230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14190226#comment-14190226 ] Mit Desai commented on HDFS-7230:

+1 (non-binding). Thanks for the patch, [~szetszwo].

Add rolling downgrade documentation

Key: HDFS-7230
URL: https://issues.apache.org/jira/browse/HDFS-7230
Project: Hadoop HDFS
Issue Type: Improvement
Components: documentation
Reporter: Tsz Wo Nicholas Sze
Assignee: Tsz Wo Nicholas Sze
Attachments: h7230_20141028.patch

HDFS-5535 made a lot of improvements to rolling upgrade. It also added the cluster downgrade feature. However, the downgrade described in HDFS-5535 requires cluster downtime. In this JIRA, we discuss how to do a rolling downgrade, i.e. a downgrade without downtime.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Updated] (HDFS-6687) nn.getNamesystem() may return NPE from JspHelper
[ https://issues.apache.org/jira/browse/HDFS-6687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai updated HDFS-6687:

Target Version/s: 2.6.0 (was: 2.5.0)

nn.getNamesystem() may return NPE from JspHelper

Key: HDFS-6687
URL: https://issues.apache.org/jira/browse/HDFS-6687
Project: Hadoop HDFS
Issue Type: Bug
Affects Versions: 2.5.0
Reporter: Mit Desai
Assignee: Mit Desai

In hadoop-2, the HTTP server is started at a very early stage to show startup progress. If the user tries to get the name system before it is completely up, the NN logs will have this kind of error.

{noformat}
2014-07-14 15:49:03,521 [***] WARN resources.ExceptionHandler: INTERNAL_SERVER_ERROR
java.lang.NullPointerException
    at org.apache.hadoop.hdfs.server.common.JspHelper.getTokenUGI(JspHelper.java:661)
    at org.apache.hadoop.hdfs.server.common.JspHelper.getUGI(JspHelper.java:604)
    at org.apache.hadoop.hdfs.web.resources.UserProvider.getValue(UserProvider.java:53)
    at org.apache.hadoop.hdfs.web.resources.UserProvider.getValue(UserProvider.java:41)
    at com.sun.jersey.server.impl.inject.InjectableValuesProvider.getInjectableValues(InjectableValuesProvider.java:46)
    at com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$EntityParamInInvoker.getParams(AbstractResourceMethodDispatchProvider.java:153)
    at com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$ResponseOutInvoker._dispatch(AbstractResourceMethodDispatchProvider.java:203)
    at com.sun.jersey.server.impl.model.method.dispatch.ResourceJavaMethodDispatcher.dispatch(ResourceJavaMethodDispatcher.java:75)
    at com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:288)
    at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
    at com.sun.jersey.server.impl.uri.rules.ResourceClassRule.accept(ResourceClassRule.java:108)
    at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
    at com.sun.jersey.server.impl.uri.rules.RootResourceClassesRule.accept(RootResourceClassesRule.java:84)
    at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1469)
    at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1400)
    at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1349)
    at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1339)
    at com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:416)
    at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:537)
    at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:699)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
    at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1221)
    at org.apache.hadoop.hdfs.web.AuthFilter.doFilter(AuthFilter.java:84)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
    at org.mortbay.servlet.UserAgentFilter.doFilter(UserAgentFilter.java:78)
    at com.yahoo.hadoop.GzipFilter.doFilter(GzipFilter.java:220)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
    at org.apache.hadoop.http.HttpServer2$QuotingInputFilter.doFilter(HttpServer2.java:1223)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
    at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
    at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
    at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
    at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
    at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
    at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
    at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
    at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
    at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
    at org.mortbay.jetty.Server.handle(Server.java:326)
    at
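A minimal sketch of the kind of null guard that would avoid this NPE, assuming the caller can simply retry later; the types and method below are illustrative stand-ins, not the actual NameNode/JspHelper API:

{code}
// Illustrative only: stand-in types, not the real NameNode/JspHelper API.
import java.io.IOException;

class JspHelperGuardSketch {
  interface NameNodeLike { FSNamesystemLike getNamesystem(); }
  interface FSNamesystemLike { }

  /** Fail the request cleanly while the NN is still starting up. */
  static FSNamesystemLike getNamesystemChecked(NameNodeLike nn)
      throws IOException {
    FSNamesystemLike ns = nn.getNamesystem();
    if (ns == null) {
      // The startup-progress HTTP server is up before the namesystem is,
      // so a null here means "try again later", not a server bug.
      throw new IOException("Namesystem has not been initialized yet");
    }
    return ns;
  }
}
{code}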
[jira] [Created] (HDFS-6983) TestBalancer#testExitZeroOnSuccess fails intermittently
Mit Desai created HDFS-6983:
---
Summary: TestBalancer#testExitZeroOnSuccess fails intermittently
Key: HDFS-6983
URL: https://issues.apache.org/jira/browse/HDFS-6983
Project: Hadoop HDFS
Issue Type: Bug
Affects Versions: 2.5.1
Reporter: Mit Desai

TestBalancer#testExitZeroOnSuccess fails intermittently on branch-2, and probably fails on trunk too. The test failed 1 in 20 times when I ran it in a loop. Here is how it fails.

{noformat}
org.apache.hadoop.hdfs.server.balancer.TestBalancer
testExitZeroOnSuccess(org.apache.hadoop.hdfs.server.balancer.TestBalancer)  Time elapsed: 53.965 sec  <<< ERROR!
java.util.concurrent.TimeoutException: Rebalancing expected avg utilization to become 0.2, but on datanode 127.0.0.1:35502 it remains at 0.08 after more than 4 msec.
    at org.apache.hadoop.hdfs.server.balancer.TestBalancer.waitForBalancer(TestBalancer.java:321)
    at org.apache.hadoop.hdfs.server.balancer.TestBalancer.runBalancerCli(TestBalancer.java:632)
    at org.apache.hadoop.hdfs.server.balancer.TestBalancer.doTest(TestBalancer.java:549)
    at org.apache.hadoop.hdfs.server.balancer.TestBalancer.doTest(TestBalancer.java:437)
    at org.apache.hadoop.hdfs.server.balancer.TestBalancer.oneNodeTest(TestBalancer.java:645)
    at org.apache.hadoop.hdfs.server.balancer.TestBalancer.testExitZeroOnSuccess(TestBalancer.java:845)

Results :

Tests in error:
    TestBalancer.testExitZeroOnSuccess:845->oneNodeTest:645->doTest:437->doTest:549->runBalancerCli:632->waitForBalancer:321 Timeout
{noformat}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Updated] (HDFS-6754) TestNamenodeCapacityReport.testXceiverCount may sometimes fail due to lack of retry
[ https://issues.apache.org/jira/browse/HDFS-6754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai updated HDFS-6754:

Status: Patch Available (was: Open)

TestNamenodeCapacityReport.testXceiverCount may sometimes fail due to lack of retry

Key: HDFS-6754
URL: https://issues.apache.org/jira/browse/HDFS-6754
Project: Hadoop HDFS
Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Mit Desai
Assignee: Mit Desai
Attachments: HDFS-6754.patch

I have seen TestNamenodeCapacityReport.testXceiverCount fail intermittently in our nightly builds with the following error:

{noformat}
java.io.IOException: Unable to close file because the last block does not have enough number of replicas.
    at org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:2151)
    at org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:2119)
    at org.apache.hadoop.hdfs.server.namenode.TestNamenodeCapacityReport.testXceiverCount(TestNamenodeCapacityReport.java:281)
{noformat}

--
This message was sent by Atlassian JIRA
(v6.2#6252)
[jira] [Updated] (HDFS-6754) TestNamenodeCapacityReport.testXceiverCount may sometimes fail due to lack of retry
[ https://issues.apache.org/jira/browse/HDFS-6754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai updated HDFS-6754:

Status: Open (was: Patch Available)

TestNamenodeCapacityReport.testXceiverCount may sometimes fail due to lack of retry

Key: HDFS-6754
URL: https://issues.apache.org/jira/browse/HDFS-6754
Project: Hadoop HDFS
Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Mit Desai
Assignee: Mit Desai
Attachments: HDFS-6754.patch

I have seen TestNamenodeCapacityReport.testXceiverCount fail intermittently in our nightly builds with the following error:

{noformat}
java.io.IOException: Unable to close file because the last block does not have enough number of replicas.
    at org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:2151)
    at org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:2119)
    at org.apache.hadoop.hdfs.server.namenode.TestNamenodeCapacityReport.testXceiverCount(TestNamenodeCapacityReport.java:281)
{noformat}

--
This message was sent by Atlassian JIRA
(v6.2#6252)
[jira] [Updated] (HDFS-6754) TestNamenodeCapacityReport.testXceiverCount may sometimes fail due to lack of retry
[ https://issues.apache.org/jira/browse/HDFS-6754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai updated HDFS-6754:

Attachment: HDFS-6754.patch

Attaching patch to enable retries.

TestNamenodeCapacityReport.testXceiverCount may sometimes fail due to lack of retry

Key: HDFS-6754
URL: https://issues.apache.org/jira/browse/HDFS-6754
Project: Hadoop HDFS
Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Mit Desai
Assignee: Mit Desai
Attachments: HDFS-6754.patch

I have seen TestNamenodeCapacityReport.testXceiverCount fail intermittently in our nightly builds with the following error:

{noformat}
java.io.IOException: Unable to close file because the last block does not have enough number of replicas.
    at org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:2151)
    at org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:2119)
    at org.apache.hadoop.hdfs.server.namenode.TestNamenodeCapacityReport.testXceiverCount(TestNamenodeCapacityReport.java:281)
{noformat}

--
This message was sent by Atlassian JIRA
(v6.2#6252)
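The patch body is not included in this digest. As a hedged sketch of what "enable retries" typically means for this particular failure (an assumption; the actual change may differ), a test can give the DFS client more attempts to complete the file before it gives up:

{code}
// Sketch, not the committed patch: raise the client's retry budget for
// locating the following block, so completeFile() survives a mini-cluster
// where the last block is slow to reach its minimum replication.
import org.apache.hadoop.conf.Configuration;

class XceiverCountRetrySketch {
  static Configuration confWithRetries() {
    Configuration conf = new Configuration();
    conf.setInt("dfs.client.block.write.locateFollowingBlock.retries", 10);
    return conf;
  }
}
{code}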
[jira] [Updated] (HDFS-6754) TestNamenodeCapacityReport.testXceiverCount may sometimes fail due to lack of retry
[ https://issues.apache.org/jira/browse/HDFS-6754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai updated HDFS-6754:

Status: Patch Available (was: Open)

TestNamenodeCapacityReport.testXceiverCount may sometimes fail due to lack of retry

Key: HDFS-6754
URL: https://issues.apache.org/jira/browse/HDFS-6754
Project: Hadoop HDFS
Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Mit Desai
Assignee: Mit Desai
Attachments: HDFS-6754.patch, HDFS-6754.patch

I have seen TestNamenodeCapacityReport.testXceiverCount fail intermittently in our nightly builds with the following error:

{noformat}
java.io.IOException: Unable to close file because the last block does not have enough number of replicas.
    at org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:2151)
    at org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:2119)
    at org.apache.hadoop.hdfs.server.namenode.TestNamenodeCapacityReport.testXceiverCount(TestNamenodeCapacityReport.java:281)
{noformat}

--
This message was sent by Atlassian JIRA
(v6.2#6252)
[jira] [Updated] (HDFS-6754) TestNamenodeCapacityReport.testXceiverCount may sometimes fail due to lack of retry
[ https://issues.apache.org/jira/browse/HDFS-6754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai updated HDFS-6754:

Attachment: HDFS-6754.patch

Refined the patch to update the comment describing the changed line.

TestNamenodeCapacityReport.testXceiverCount may sometimes fail due to lack of retry

Key: HDFS-6754
URL: https://issues.apache.org/jira/browse/HDFS-6754
Project: Hadoop HDFS
Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Mit Desai
Assignee: Mit Desai
Attachments: HDFS-6754.patch, HDFS-6754.patch

I have seen TestNamenodeCapacityReport.testXceiverCount fail intermittently in our nightly builds with the following error:

{noformat}
java.io.IOException: Unable to close file because the last block does not have enough number of replicas.
    at org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:2151)
    at org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:2119)
    at org.apache.hadoop.hdfs.server.namenode.TestNamenodeCapacityReport.testXceiverCount(TestNamenodeCapacityReport.java:281)
{noformat}

--
This message was sent by Atlassian JIRA
(v6.2#6252)
[jira] [Commented] (HDFS-6754) TestNamenodeCapacityReport.testXceiverCount may sometimes fail due to lack of retry
[ https://issues.apache.org/jira/browse/HDFS-6754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14076649#comment-14076649 ] Mit Desai commented on HDFS-6754:

These test failures are not related to the patch.

TestNamenodeCapacityReport.testXceiverCount may sometimes fail due to lack of retry

Key: HDFS-6754
URL: https://issues.apache.org/jira/browse/HDFS-6754
Project: Hadoop HDFS
Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Mit Desai
Assignee: Mit Desai
Attachments: HDFS-6754.patch, HDFS-6754.patch

I have seen TestNamenodeCapacityReport.testXceiverCount fail intermittently in our nightly builds with the following error:

{noformat}
java.io.IOException: Unable to close file because the last block does not have enough number of replicas.
    at org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:2151)
    at org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:2119)
    at org.apache.hadoop.hdfs.server.namenode.TestNamenodeCapacityReport.testXceiverCount(TestNamenodeCapacityReport.java:281)
{noformat}

--
This message was sent by Atlassian JIRA
(v6.2#6252)
[jira] [Commented] (HDFS-6754) TestNamenodeCapacityReport.testXceiverCount may sometimes fail due to lack of retry
[ https://issues.apache.org/jira/browse/HDFS-6754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14076918#comment-14076918 ] Mit Desai commented on HDFS-6754:

Thanks Daryn!

TestNamenodeCapacityReport.testXceiverCount may sometimes fail due to lack of retry

Key: HDFS-6754
URL: https://issues.apache.org/jira/browse/HDFS-6754
Project: Hadoop HDFS
Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Mit Desai
Assignee: Mit Desai
Attachments: HDFS-6754.patch, HDFS-6754.patch

I have seen TestNamenodeCapacityReport.testXceiverCount fail intermittently in our nightly builds with the following error:

{noformat}
java.io.IOException: Unable to close file because the last block does not have enough number of replicas.
    at org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:2151)
    at org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:2119)
    at org.apache.hadoop.hdfs.server.namenode.TestNamenodeCapacityReport.testXceiverCount(TestNamenodeCapacityReport.java:281)
{noformat}

--
This message was sent by Atlassian JIRA
(v6.2#6252)
[jira] [Moved] (HDFS-6754) TestNamenodeCapacityReport.testXceiverCount may sometimes fail due to lack of retry
[ https://issues.apache.org/jira/browse/HDFS-6754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai moved YARN-2358 to HDFS-6754:
---
Target Version/s: 2.6.0 (was: 2.6.0)
Affects Version/s: (was: 2.6.0) 2.6.0
Key: HDFS-6754 (was: YARN-2358)
Project: Hadoop HDFS (was: Hadoop YARN)

TestNamenodeCapacityReport.testXceiverCount may sometimes fail due to lack of retry

Key: HDFS-6754
URL: https://issues.apache.org/jira/browse/HDFS-6754
Project: Hadoop HDFS
Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Mit Desai
Assignee: Mit Desai

I have seen TestNamenodeCapacityReport.testXceiverCount fail intermittently in our nightly builds with the following error:

{noformat}
java.io.IOException: Unable to close file because the last block does not have enough number of replicas.
    at org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:2151)
    at org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:2119)
    at org.apache.hadoop.hdfs.server.namenode.TestNamenodeCapacityReport.testXceiverCount(TestNamenodeCapacityReport.java:281)
{noformat}

--
This message was sent by Atlassian JIRA
(v6.2#6252)
[jira] [Created] (HDFS-6755) Make DFSOutputStream more efficient
Mit Desai created HDFS-6755:
---
Summary: Make DFSOutputStream more efficient
Key: HDFS-6755
URL: https://issues.apache.org/jira/browse/HDFS-6755
Project: Hadoop HDFS
Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Mit Desai
Assignee: Mit Desai

The following code in DFSOutputStream may have an unnecessary sleep.

{code}
try {
  Thread.sleep(localTimeout);
  if (retries == 0) {
    throw new IOException("Unable to close file because the last block"
        + " does not have enough number of replicas.");
  }
  retries--;
  localTimeout *= 2;
  if (Time.now() - localstart > 5000) {
    DFSClient.LOG.info("Could not complete " + src + " retrying...");
  }
} catch (InterruptedException ie) {
  DFSClient.LOG.warn("Caught exception ", ie);
}
{code}

Currently, the code sleeps before throwing an exception, which should not be the case. The sleep time gets doubled on every iteration, which can have a significant effect if there is more than one iteration. We need to move the sleep down, after decrementing retries.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
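A sketch of the reordering being proposed, reusing the variable names from the snippet above; this mirrors the description's suggestion, not the committed patch:

{code}
// Sketch (same identifiers as the snippet above): throw before sleeping,
// so the final and longest backoff is never spent just to raise the error.
try {
  if (retries == 0) {
    throw new IOException("Unable to close file because the last block"
        + " does not have enough number of replicas.");
  }
  Thread.sleep(localTimeout); // only sleep when another attempt follows
  retries--;
  localTimeout *= 2;          // exponential backoff for the next wait
  if (Time.now() - localstart > 5000) {
    DFSClient.LOG.info("Could not complete " + src + " retrying...");
  }
} catch (InterruptedException ie) {
  DFSClient.LOG.warn("Caught exception ", ie);
}
{code}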
[jira] [Updated] (HDFS-6755) Make DFSOutputStream more efficient
[ https://issues.apache.org/jira/browse/HDFS-6755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai updated HDFS-6755:

Issue Type: Improvement (was: Bug)

Make DFSOutputStream more efficient

Key: HDFS-6755
URL: https://issues.apache.org/jira/browse/HDFS-6755
Project: Hadoop HDFS
Issue Type: Improvement
Affects Versions: 2.6.0
Reporter: Mit Desai
Assignee: Mit Desai

The following code in DFSOutputStream may have an unnecessary sleep.

{code}
try {
  Thread.sleep(localTimeout);
  if (retries == 0) {
    throw new IOException("Unable to close file because the last block"
        + " does not have enough number of replicas.");
  }
  retries--;
  localTimeout *= 2;
  if (Time.now() - localstart > 5000) {
    DFSClient.LOG.info("Could not complete " + src + " retrying...");
  }
} catch (InterruptedException ie) {
  DFSClient.LOG.warn("Caught exception ", ie);
}
{code}

Currently, the code sleeps before throwing an exception, which should not be the case. The sleep time gets doubled on every iteration, which can have a significant effect if there is more than one iteration. We need to move the sleep down, after decrementing retries.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
[jira] [Updated] (HDFS-6755) Make DFSOutputStream more efficient
[ https://issues.apache.org/jira/browse/HDFS-6755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai updated HDFS-6755:

Description:
The following code in DFSOutputStream may have an unnecessary sleep.

{code}
try {
  Thread.sleep(localTimeout);
  if (retries == 0) {
    throw new IOException("Unable to close file because the last block"
        + " does not have enough number of replicas.");
  }
  retries--;
  localTimeout *= 2;
  if (Time.now() - localstart > 5000) {
    DFSClient.LOG.info("Could not complete " + src + " retrying...");
  }
} catch (InterruptedException ie) {
  DFSClient.LOG.warn("Caught exception ", ie);
}
{code}

Currently, the code sleeps before throwing an exception, which should not be the case. The sleep time gets doubled on every iteration, which can have a significant effect if there is more than one iteration and it would sleep just to throw an exception. We need to move the sleep down, after decrementing retries.

was:
The following code in DFSOutputStream may have an unnecessary sleep.

{code}
try {
  Thread.sleep(localTimeout);
  if (retries == 0) {
    throw new IOException("Unable to close file because the last block"
        + " does not have enough number of replicas.");
  }
  retries--;
  localTimeout *= 2;
  if (Time.now() - localstart > 5000) {
    DFSClient.LOG.info("Could not complete " + src + " retrying...");
  }
} catch (InterruptedException ie) {
  DFSClient.LOG.warn("Caught exception ", ie);
}
{code}

Currently, the code sleeps before throwing an exception, which should not be the case. The sleep time gets doubled on every iteration, which can have a significant effect if there is more than one iteration. We need to move the sleep down, after decrementing retries.

Make DFSOutputStream more efficient

Key: HDFS-6755
URL: https://issues.apache.org/jira/browse/HDFS-6755
Project: Hadoop HDFS
Issue Type: Improvement
Affects Versions: 2.6.0
Reporter: Mit Desai
Assignee: Mit Desai

The following code in DFSOutputStream may have an unnecessary sleep.

{code}
try {
  Thread.sleep(localTimeout);
  if (retries == 0) {
    throw new IOException("Unable to close file because the last block"
        + " does not have enough number of replicas.");
  }
  retries--;
  localTimeout *= 2;
  if (Time.now() - localstart > 5000) {
    DFSClient.LOG.info("Could not complete " + src + " retrying...");
  }
} catch (InterruptedException ie) {
  DFSClient.LOG.warn("Caught exception ", ie);
}
{code}

Currently, the code sleeps before throwing an exception, which should not be the case. The sleep time gets doubled on every iteration, which can have a significant effect if there is more than one iteration and it would sleep just to throw an exception. We need to move the sleep down, after decrementing retries.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
[jira] [Updated] (HDFS-6755) Make DFSOutputStream more efficient
[ https://issues.apache.org/jira/browse/HDFS-6755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai updated HDFS-6755:

Attachment: HDFS-6755.patch

Hi [~cmccabe], I did not mean to get rid of the sleep. I have uploaded the patch to indicate the change I wanted to make: throw the IOException when {{retries == 0}} before {{Thread.sleep(localTimeout);}} is called. Does that seem reasonable?

Make DFSOutputStream more efficient

Key: HDFS-6755
URL: https://issues.apache.org/jira/browse/HDFS-6755
Project: Hadoop HDFS
Issue Type: Improvement
Affects Versions: 2.6.0
Reporter: Mit Desai
Assignee: Mit Desai
Attachments: HDFS-6755.patch

The following code in DFSOutputStream may have an unnecessary sleep.

{code}
try {
  Thread.sleep(localTimeout);
  if (retries == 0) {
    throw new IOException("Unable to close file because the last block"
        + " does not have enough number of replicas.");
  }
  retries--;
  localTimeout *= 2;
  if (Time.now() - localstart > 5000) {
    DFSClient.LOG.info("Could not complete " + src + " retrying...");
  }
} catch (InterruptedException ie) {
  DFSClient.LOG.warn("Caught exception ", ie);
}
{code}

Currently, the code sleeps before throwing an exception, which should not be the case. The sleep time gets doubled on every iteration, which can have a significant effect if there is more than one iteration and it would sleep just to throw an exception. We need to move the sleep down, after decrementing retries.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
[jira] [Commented] (HDFS-6755) There is an unnecessary sleep in the code path where DFSOutputStream#close gives up its attempt to contact the namenode
[ https://issues.apache.org/jira/browse/HDFS-6755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14075236#comment-14075236 ] Mit Desai commented on HDFS-6755:

Thanks Colin!

There is an unnecessary sleep in the code path where DFSOutputStream#close gives up its attempt to contact the namenode

Key: HDFS-6755
URL: https://issues.apache.org/jira/browse/HDFS-6755
Project: Hadoop HDFS
Issue Type: Improvement
Affects Versions: 2.6.0
Reporter: Mit Desai
Assignee: Mit Desai
Attachments: HDFS-6755.patch

DFSOutputStream#close has a loop where it tries to contact the NameNode, to call {{complete}} on the file which is open-for-write. This loop includes a sleep which increases exponentially (exponential backoff). It makes sense to sleep before re-contacting the NameNode, but the code also sleeps even in the case where it has already decided to give up and throw an exception back to the user. It should not sleep after it has already decided to give up, since there's no point.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
[jira] [Commented] (HDFS-6696) Name node cannot start if the path of a file under construction contains .snapshot
[ https://issues.apache.org/jira/browse/HDFS-6696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14069235#comment-14069235 ] Mit Desai commented on HDFS-6696:

[~andrew.wang], we were trying to upgrade 0.21.11 to 2.4.0.

Name node cannot start if the path of a file under construction contains .snapshot

Key: HDFS-6696
URL: https://issues.apache.org/jira/browse/HDFS-6696
Project: Hadoop HDFS
Issue Type: Bug
Reporter: Kihwal Lee
Assignee: Andrew Wang
Priority: Blocker

Using {{-renameReserved}} to rename .snapshot in a pre-hdfs-snapshot-feature fsimage during upgrade only works if there is nothing under construction under the renamed directory. I am not sure whether it takes care of edits containing .snapshot properly. The workaround is to identify these directories and rename them, then do {{saveNamespace}} before performing the upgrade.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
[jira] [Created] (HDFS-6687) nn.getNamesystem() may return NPE from JspHelper
Mit Desai created HDFS-6687:
---
Summary: nn.getNamesystem() may return NPE from JspHelper
Key: HDFS-6687
URL: https://issues.apache.org/jira/browse/HDFS-6687
Project: Hadoop HDFS
Issue Type: Bug
Affects Versions: 2.5.0
Reporter: Mit Desai
Assignee: Mit Desai

In hadoop-2, the HTTP server is started at a very early stage to show startup progress. If the user tries to get the name system before it is completely up, the NN logs will have this kind of error.

{noformat}
2014-07-14 15:49:03,521 [***] WARN resources.ExceptionHandler: INTERNAL_SERVER_ERROR
java.lang.NullPointerException
    at org.apache.hadoop.hdfs.server.common.JspHelper.getTokenUGI(JspHelper.java:661)
    at org.apache.hadoop.hdfs.server.common.JspHelper.getUGI(JspHelper.java:604)
    at org.apache.hadoop.hdfs.web.resources.UserProvider.getValue(UserProvider.java:53)
    at org.apache.hadoop.hdfs.web.resources.UserProvider.getValue(UserProvider.java:41)
    at com.sun.jersey.server.impl.inject.InjectableValuesProvider.getInjectableValues(InjectableValuesProvider.java:46)
    at com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$EntityParamInInvoker.getParams(AbstractResourceMethodDispatchProvider.java:153)
    at com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$ResponseOutInvoker._dispatch(AbstractResourceMethodDispatchProvider.java:203)
    at com.sun.jersey.server.impl.model.method.dispatch.ResourceJavaMethodDispatcher.dispatch(ResourceJavaMethodDispatcher.java:75)
    at com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:288)
    at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
    at com.sun.jersey.server.impl.uri.rules.ResourceClassRule.accept(ResourceClassRule.java:108)
    at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
    at com.sun.jersey.server.impl.uri.rules.RootResourceClassesRule.accept(RootResourceClassesRule.java:84)
    at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1469)
    at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1400)
    at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1349)
    at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1339)
    at com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:416)
    at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:537)
    at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:699)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
    at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1221)
    at org.apache.hadoop.hdfs.web.AuthFilter.doFilter(AuthFilter.java:84)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
    at org.mortbay.servlet.UserAgentFilter.doFilter(UserAgentFilter.java:78)
    at com.yahoo.hadoop.GzipFilter.doFilter(GzipFilter.java:220)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
    at org.apache.hadoop.http.HttpServer2$QuotingInputFilter.doFilter(HttpServer2.java:1223)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
    at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
    at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
    at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
    at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
    at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
    at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
    at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
    at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
    at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
    at org.mortbay.jetty.Server.handle(Server.java:326)
    at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
    at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
    at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
    at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
    at
[jira] [Created] (HDFS-6691) The message on NN UI can be confusing during a rolling upgrade
Mit Desai created HDFS-6691:
---
Summary: The message on NN UI can be confusing during a rolling upgrade
Key: HDFS-6691
URL: https://issues.apache.org/jira/browse/HDFS-6691
Project: Hadoop HDFS
Issue Type: Bug
Affects Versions: 2.5.0
Reporter: Mit Desai
Assignee: Mit Desai
Attachments: ha1.png

On ANN, it says rollback image was created. On SBN, it says otherwise.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
[jira] [Updated] (HDFS-6691) The message on NN UI can be confusing during a rolling upgrade
[ https://issues.apache.org/jira/browse/HDFS-6691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai updated HDFS-6691:

Attachment: ha1.png

The message on NN UI can be confusing during a rolling upgrade

Key: HDFS-6691
URL: https://issues.apache.org/jira/browse/HDFS-6691
Project: Hadoop HDFS
Issue Type: Bug
Affects Versions: 2.5.0
Reporter: Mit Desai
Assignee: Mit Desai
Attachments: ha1.png

On ANN, it says rollback image was created. On SBN, it says otherwise.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
[jira] [Updated] (HDFS-6691) The message on NN UI can be confusing during a rolling upgrade
[ https://issues.apache.org/jira/browse/HDFS-6691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai updated HDFS-6691:

Attachment: ha2.png

The message on NN UI can be confusing during a rolling upgrade

Key: HDFS-6691
URL: https://issues.apache.org/jira/browse/HDFS-6691
Project: Hadoop HDFS
Issue Type: Bug
Affects Versions: 2.5.0
Reporter: Mit Desai
Assignee: Mit Desai
Attachments: ha1.png, ha2.png

On ANN, it says rollback image was created. On SBN, it says otherwise.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
[jira] [Commented] (HDFS-6597) New option for namenode upgrade
[ https://issues.apache.org/jira/browse/HDFS-6597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14042518#comment-14042518 ] Mit Desai commented on HDFS-6597:

The idea seems good, as it does not alter the way -upgrade currently works.

* I agree with [~cmccabe] and [~cnauroth] regarding the proposed new name, -force.
* Instead of -force, -halt or -upgradeOnly seems reasonable. But anything would be good as long as it does not imply we are forcing something to get done that should not be done.

Thanks, Mit

New option for namenode upgrade

Key: HDFS-6597
URL: https://issues.apache.org/jira/browse/HDFS-6597
Project: Hadoop HDFS
Issue Type: Improvement
Components: namenode
Reporter: Danilo Vunjak
Attachments: JIRA-HDFS-30.patch

Currently, when the namenode is started for upgrade (the hadoop namenode -upgrade command), after finishing the upgrade of metadata, the namenode starts working normally and waits for datanodes to upgrade themselves and connect to the NN. We need an option for upgrading only the NN metadata, so that after the upgrade is finished on the NN, the process terminates. I have tested it by changing file hdfs.server.namenode.NameNode.java, method public static NameNode createNameNode(String argv[], Configuration conf), adding case UPGRADE to the switch:

{code}
case UPGRADE: {
  DefaultMetricsSystem.initialize("NameNode");
  NameNode nameNode = new NameNode(conf);
  if (startOpt.getForceUpgrade()) {
    terminate(0);
    return null;
  }
  return nameNode;
}
{code}

This did the upgrade of metadata, closed the process after it finished, and later, when all services were started, the upgrade of datanodes finished successfully and the system ran. What I'm suggesting right now is to add a new startup parameter -force, so the namenode can be started like hadoop namenode -upgrade -force, to indicate that we want to terminate the process after the upgrade of metadata on the NN is finished. The old functionality should be preserved, so users can run hadoop namenode -upgrade the same way and with the same behaviour as before. Thanks, Danilo

--
This message was sent by Atlassian JIRA
(v6.2#6252)
[jira] [Updated] (HDFS-6597) Add a new option to NN upgrade to terminate the process after upgrade on NN is completed
[ https://issues.apache.org/jira/browse/HDFS-6597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai updated HDFS-6597:

Summary: Add a new option to NN upgrade to terminate the process after upgrade on NN is completed (was: New option for namenode upgrade)

Add a new option to NN upgrade to terminate the process after upgrade on NN is completed

Key: HDFS-6597
URL: https://issues.apache.org/jira/browse/HDFS-6597
Project: Hadoop HDFS
Issue Type: Improvement
Components: namenode
Reporter: Danilo Vunjak
Attachments: JIRA-HDFS-30.patch

Currently, when the namenode is started for upgrade (the hadoop namenode -upgrade command), after finishing the upgrade of metadata, the namenode starts working normally and waits for datanodes to upgrade themselves and connect to the NN. We need an option for upgrading only the NN metadata, so that after the upgrade is finished on the NN, the process terminates. I have tested it by changing file hdfs.server.namenode.NameNode.java, method public static NameNode createNameNode(String argv[], Configuration conf), adding case UPGRADE to the switch:

{code}
case UPGRADE: {
  DefaultMetricsSystem.initialize("NameNode");
  NameNode nameNode = new NameNode(conf);
  if (startOpt.getForceUpgrade()) {
    terminate(0);
    return null;
  }
  return nameNode;
}
{code}

This did the upgrade of metadata, closed the process after it finished, and later, when all services were started, the upgrade of datanodes finished successfully and the system ran. What I'm suggesting right now is to add a new startup parameter -force, so the namenode can be started like hadoop namenode -upgrade -force, to indicate that we want to terminate the process after the upgrade of metadata on the NN is finished. The old functionality should be preserved, so users can run hadoop namenode -upgrade the same way and with the same behaviour as before. Thanks, Danilo

--
This message was sent by Atlassian JIRA
(v6.2#6252)
[jira] [Commented] (HDFS-6597) Add a new option to NN upgrade to terminate the process after upgrade on NN is completed
[ https://issues.apache.org/jira/browse/HDFS-6597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14042525#comment-14042525 ] Mit Desai commented on HDFS-6597:

Changing the summary to describe the jira more accurately.

Add a new option to NN upgrade to terminate the process after upgrade on NN is completed

Key: HDFS-6597
URL: https://issues.apache.org/jira/browse/HDFS-6597
Project: Hadoop HDFS
Issue Type: Improvement
Components: namenode
Reporter: Danilo Vunjak
Attachments: JIRA-HDFS-30.patch

Currently, when the namenode is started for upgrade (the hadoop namenode -upgrade command), after finishing the upgrade of metadata, the namenode starts working normally and waits for datanodes to upgrade themselves and connect to the NN. We need an option for upgrading only the NN metadata, so that after the upgrade is finished on the NN, the process terminates. I have tested it by changing file hdfs.server.namenode.NameNode.java, method public static NameNode createNameNode(String argv[], Configuration conf), adding case UPGRADE to the switch:

{code}
case UPGRADE: {
  DefaultMetricsSystem.initialize("NameNode");
  NameNode nameNode = new NameNode(conf);
  if (startOpt.getForceUpgrade()) {
    terminate(0);
    return null;
  }
  return nameNode;
}
{code}

This did the upgrade of metadata, closed the process after it finished, and later, when all services were started, the upgrade of datanodes finished successfully and the system ran. What I'm suggesting right now is to add a new startup parameter -force, so the namenode can be started like hadoop namenode -upgrade -force, to indicate that we want to terminate the process after the upgrade of metadata on the NN is finished. The old functionality should be preserved, so users can run hadoop namenode -upgrade the same way and with the same behaviour as before. Thanks, Danilo

--
This message was sent by Atlassian JIRA
(v6.2#6252)
[jira] [Updated] (HDFS-742) A down DataNode makes Balancer hang by repeatedly asking NameNode for its partial block list
[ https://issues.apache.org/jira/browse/HDFS-742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai updated HDFS-742:
---
Attachment: HDFS-742.patch

A down DataNode makes Balancer hang by repeatedly asking NameNode for its partial block list

Key: HDFS-742
URL: https://issues.apache.org/jira/browse/HDFS-742
Project: Hadoop HDFS
Issue Type: Bug
Components: balancer
Reporter: Hairong Kuang
Assignee: Mit Desai
Attachments: HDFS-742.patch

We had a balancer that had not made any progress for a long time. It turned out it was repeatedly asking the NameNode for a partial block list of one datanode, which had gone down while the balancer was running. The NameNode should notify the Balancer that the datanode is not available, and the Balancer should stop asking for that datanode's block list.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
[jira] [Commented] (HDFS-742) A down DataNode makes Balancer hang by repeatedly asking NameNode for its partial block list
[ https://issues.apache.org/jira/browse/HDFS-742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14026612#comment-14026612 ] Mit Desai commented on HDFS-742:

Attaching the patch. Unfortunately I do not have a way to reproduce the issue, so I'm unable to add a test to verify the change. Here is an explanation of the part of the Balancer code that makes it hang forever. In the following while loop in Balancer.java, when the Balancer figures out that it should fetch more blocks, it gets the block list and decrements blocksToReceive by that many blocks. After that, it starts again from the top of the loop.

{code}
while (!isTimeUp && getScheduledSize() > 0
    && (!srcBlockList.isEmpty() || blocksToReceive > 0)) {

  ## SOME LINES OMITTED ##

  filterMovedBlocks(); // filter already moved blocks
  if (shouldFetchMoreBlocks()) {
    // fetch new blocks
    try {
      blocksToReceive -= getBlockList();
      continue;
    } catch (IOException e) {

  ## SOME LINES OMITTED ##

  // check if time is up or not
  if (Time.now() - startTime > MAX_ITERATION_TIME) {
    isTimeUp = true;
    continue;
  }

  ## SOME LINES OMITTED ##
}
{code}

The problem here is: if the datanode is decommissioned, the {{getBlockList()}} method will not return anything, and {{blocksToReceive}} will not be changed. The loop will keep doing this indefinitely, as {{blocksToReceive}} will always be greater than 0; {{isTimeUp}} will never be set to true, because execution never reaches that part of the code. In the submitted patch, the time-up condition is moved to the top of the loop, so it checks whether {{isTimeUp}} is set and proceeds only if the time limit has not been hit.

A down DataNode makes Balancer hang by repeatedly asking NameNode for its partial block list

Key: HDFS-742
URL: https://issues.apache.org/jira/browse/HDFS-742
Project: Hadoop HDFS
Issue Type: Bug
Components: balancer
Reporter: Hairong Kuang
Assignee: Mit Desai
Attachments: HDFS-742.patch

We had a balancer that had not made any progress for a long time. It turned out it was repeatedly asking the NameNode for a partial block list of one datanode, which had gone down while the balancer was running. The NameNode should notify the Balancer that the datanode is not available, and the Balancer should stop asking for that datanode's block list.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
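A sketch of the reordering described in that comment, reusing the identifiers from the quoted snippet; this mirrors the comment's description of the patch, not the committed diff itself:

{code}
// Sketch (same identifiers as the snippet above): check the iteration
// timer first, so repeated empty block lists from a dead datanode still
// hit the time limit instead of looping forever.
while (!isTimeUp && getScheduledSize() > 0
    && (!srcBlockList.isEmpty() || blocksToReceive > 0)) {
  // Moved up: a datanode that keeps yielding an empty block list can
  // no longer bypass this check, so the iteration still ends.
  if (Time.now() - startTime > MAX_ITERATION_TIME) {
    isTimeUp = true;
    continue; // the loop condition now sees isTimeUp and exits
  }
  filterMovedBlocks(); // filter already moved blocks
  if (shouldFetchMoreBlocks()) {
    // getBlockList() may return 0 for an unavailable datanode, leaving
    // blocksToReceive unchanged; the timer above now bounds that case.
    blocksToReceive -= getBlockList();
    continue;
  }
  // ... dispatch scheduled moves (omitted, as in the snippet above) ...
}
{code}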
[jira] [Updated] (HDFS-6487) TestStandbyCheckpoint#testSBNCheckpoints is racy
[ https://issues.apache.org/jira/browse/HDFS-6487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai updated HDFS-6487:

Status: Open (was: Patch Available)

TestStandbyCheckpoint#testSBNCheckpoints is racy

Key: HDFS-6487
URL: https://issues.apache.org/jira/browse/HDFS-6487
Project: Hadoop HDFS
Issue Type: Bug
Affects Versions: 3.0.0, 2.4.1
Reporter: Mit Desai
Assignee: Mit Desai
Attachments: HDFS-6487.patch

testSBNCheckpoints fails occasionally. I could not reproduce it consistently, but it would fail 8 out of 10 times after I did mvn clean, mvn install, and ran the test.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
[jira] [Updated] (HDFS-6487) TestStandbyCheckpoint#testSBNCheckpoints is racy
[ https://issues.apache.org/jira/browse/HDFS-6487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai updated HDFS-6487:

Attachment: HDFS-6487.patch

Attaching the updated patch.

TestStandbyCheckpoint#testSBNCheckpoints is racy

Key: HDFS-6487
URL: https://issues.apache.org/jira/browse/HDFS-6487
Project: Hadoop HDFS
Issue Type: Bug
Affects Versions: 3.0.0, 2.4.1
Reporter: Mit Desai
Assignee: Mit Desai
Attachments: HDFS-6487.patch, HDFS-6487.patch

testSBNCheckpoints fails occasionally. I could not reproduce it consistently, but it would fail 8 out of 10 times after I did mvn clean, mvn install, and ran the test.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
[jira] [Updated] (HDFS-6487) TestStandbyCheckpoint#testSBNCheckpoints is racy
[ https://issues.apache.org/jira/browse/HDFS-6487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai updated HDFS-6487:

Status: Patch Available (was: Open)

TestStandbyCheckpoint#testSBNCheckpoints is racy

Key: HDFS-6487
URL: https://issues.apache.org/jira/browse/HDFS-6487
Project: Hadoop HDFS
Issue Type: Bug
Affects Versions: 3.0.0, 2.4.1
Reporter: Mit Desai
Assignee: Mit Desai
Attachments: HDFS-6487.patch, HDFS-6487.patch

testSBNCheckpoints fails occasionally. I could not reproduce it consistently, but it would fail 8 out of 10 times after I did mvn clean, mvn install, and ran the test.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
[jira] [Commented] (HDFS-6487) TestStandbyCheckpoint#testSBNCheckpoints is racy
[ https://issues.apache.org/jira/browse/HDFS-6487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14020213#comment-14020213 ] Mit Desai commented on HDFS-6487:

The failure is not related to the patch submitted. It has been there for a long time; HDFS-5807 and HDFS-6159 were filed to resolve it. I will comment on those JIRAs.

TestStandbyCheckpoint#testSBNCheckpoints is racy

Key: HDFS-6487
URL: https://issues.apache.org/jira/browse/HDFS-6487
Project: Hadoop HDFS
Issue Type: Bug
Affects Versions: 3.0.0, 2.4.1
Reporter: Mit Desai
Assignee: Mit Desai
Attachments: HDFS-6487.patch, HDFS-6487.patch

testSBNCheckpoints fails occasionally. I could not reproduce it consistently, but it would fail 8 out of 10 times after I did mvn clean, mvn install, and ran the test.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
[jira] [Commented] (HDFS-6159) TestBalancerWithNodeGroup.testBalancerWithNodeGroup fails if there is block missing after balancer success
[ https://issues.apache.org/jira/browse/HDFS-6159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14020222#comment-14020222 ] Mit Desai commented on HDFS-6159:

This test is failing again: https://builds.apache.org/job/PreCommit-HDFS-Build/7045//testReport/ [~airbots], can you take a look at this pre-commit?

TestBalancerWithNodeGroup.testBalancerWithNodeGroup fails if there is block missing after balancer success

Key: HDFS-6159
URL: https://issues.apache.org/jira/browse/HDFS-6159
Project: Hadoop HDFS
Issue Type: Bug
Components: test
Affects Versions: 2.3.0
Reporter: Chen He
Assignee: Chen He
Fix For: 3.0.0, 2.5.0
Attachments: HDFS-6159-v2.patch, HDFS-6159-v2.patch, HDFS-6159.patch, logs.txt

TestBalancerWithNodeGroup.testBalancerWithNodeGroup will report a false failure if there is(are) data block(s) missing after the balancer successfully finishes.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
[jira] [Commented] (HDFS-6487) TestStandbyCheckpoint#testSBNCheckpoints is racy
[ https://issues.apache.org/jira/browse/HDFS-6487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14020413#comment-14020413 ] Mit Desai commented on HDFS-6487:

Thanks Andrew!

TestStandbyCheckpoint#testSBNCheckpoints is racy

Key: HDFS-6487
URL: https://issues.apache.org/jira/browse/HDFS-6487
Project: Hadoop HDFS
Issue Type: Bug
Affects Versions: 3.0.0, 2.4.1
Reporter: Mit Desai
Assignee: Mit Desai
Attachments: HDFS-6487.patch, HDFS-6487.patch

testSBNCheckpoints fails occasionally. I could not reproduce it consistently, but it would fail 8 out of 10 times after I did mvn clean, mvn install, and ran the test.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
[jira] [Updated] (HDFS-6487) TestStandbyCheckpoint#testSBNCheckpoints is racy
[ https://issues.apache.org/jira/browse/HDFS-6487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai updated HDFS-6487:

Attachment: HDFS-6487.patch

Attaching a patch for trunk and branch-2.

TestStandbyCheckpoint#testSBNCheckpoints is racy

Key: HDFS-6487
URL: https://issues.apache.org/jira/browse/HDFS-6487
Project: Hadoop HDFS
Issue Type: Bug
Affects Versions: 3.0.0, 2.5.0
Reporter: Mit Desai
Assignee: Mit Desai
Attachments: HDFS-6487.patch

testSBNCheckpoints fails occasionally. I could not reproduce it consistently, but it would fail 8 out of 10 times after I did mvn clean, mvn install, and ran the test.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
[jira] [Commented] (HDFS-6487) TestStandbyCheckpoint#testSBNCheckpoints is racy
[ https://issues.apache.org/jira/browse/HDFS-6487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14018953#comment-14018953 ] Mit Desai commented on HDFS-6487:

In testSBNCheckpoints, after doEdits() the test waits for the SBN to do a checkpoint and, immediately after that, checks whether the OIV image has been written. The race lies between the completion of the checkpoint and the check for the OIV image. I have added a wait for the OIV image to be written. This prevents the test from failing due to the race; if the OIV image is not written even after 5000 ms, the test will fail, which is what is expected.

TestStandbyCheckpoint#testSBNCheckpoints is racy

Key: HDFS-6487
URL: https://issues.apache.org/jira/browse/HDFS-6487
Project: Hadoop HDFS
Issue Type: Bug
Affects Versions: 3.0.0, 2.4.1
Reporter: Mit Desai
Assignee: Mit Desai
Attachments: HDFS-6487.patch

testSBNCheckpoints fails occasionally. I could not reproduce it consistently, but it would fail 8 out of 10 times after I did mvn clean, mvn install, and ran the test.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
[jira] [Updated] (HDFS-6487) TestStandbyCheckpoint#testSBNCheckpoints is racy
[ https://issues.apache.org/jira/browse/HDFS-6487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai updated HDFS-6487:

Affects Version/s: (was: 2.5.0) 2.4.1
Status: Patch Available (was: Open)

TestStandbyCheckpoint#testSBNCheckpoints is racy

Key: HDFS-6487
URL: https://issues.apache.org/jira/browse/HDFS-6487
Project: Hadoop HDFS
Issue Type: Bug
Affects Versions: 3.0.0, 2.4.1
Reporter: Mit Desai
Assignee: Mit Desai
Attachments: HDFS-6487.patch

testSBNCheckpoints fails occasionally. I could not reproduce it consistently, but it would fail 8 out of 10 times after I did mvn clean, mvn install, and ran the test.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
[jira] [Commented] (HDFS-6487) TestStandbyCheckpoint#testSBNCheckpoints is racy
[ https://issues.apache.org/jira/browse/HDFS-6487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14019320#comment-14019320 ] Mit Desai commented on HDFS-6487:

Thanks Andrew for looking into the patch. Using GenericTestUtils.waitFor looks like a better option; I will update my patch. For the timeout, 5 sec works for me now, but I will increase it to 60 s (it does not hurt to wait a little longer; the test will come out of the wait before that time anyway).

TestStandbyCheckpoint#testSBNCheckpoints is racy

Key: HDFS-6487
URL: https://issues.apache.org/jira/browse/HDFS-6487
Project: Hadoop HDFS
Issue Type: Bug
Affects Versions: 3.0.0, 2.4.1
Reporter: Mit Desai
Assignee: Mit Desai
Attachments: HDFS-6487.patch

testSBNCheckpoints fails occasionally. I could not reproduce it consistently, but it would fail 8 out of 10 times after I did mvn clean, mvn install, and ran the test.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
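A hedged sketch of what the updated wait could look like with GenericTestUtils.waitFor; the image-existence predicate and the directory name tmpOivImgDir are assumptions, since the actual patch is not shown here:

{code}
// Sketch (not the committed patch): poll for the OIV image instead of
// checking once right after the checkpoint completes.
import java.io.File;
import com.google.common.base.Supplier;
import org.apache.hadoop.test.GenericTestUtils;

// ... inside a test method that declares "throws Exception",
// after the checkpoint has been triggered:
final File tmpOivImgDir = new File("/tmp/oiv-img-dir"); // assumed path
GenericTestUtils.waitFor(new Supplier<Boolean>() {
  @Override
  public Boolean get() {
    // Pass once the standby has written at least one OIV image file.
    String[] files = tmpOivImgDir.list();
    return files != null && files.length > 0;
  }
}, 1000, 60000); // check every 1 s; time out (and fail the test) after 60 s
{code}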
[jira] [Created] (HDFS-6487) TestStandbyCheckpoint#testSBNCheckpoints is racy
Mit Desai created HDFS-6487:
---
Summary: TestStandbyCheckpoint#testSBNCheckpoints is racy
Key: HDFS-6487
URL: https://issues.apache.org/jira/browse/HDFS-6487
Project: Hadoop HDFS
Issue Type: Bug
Affects Versions: 3.0.0, 2.5.0
Reporter: Mit Desai
Assignee: Mit Desai

testSBNCheckpoints fails occasionally. I could not reproduce it consistently, but it would fail 8 out of 10 times after I did mvn clean, mvn install, and ran the test.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
[jira] [Updated] (HDFS-6421) RHEL4 fails to compile vecsum.c
[ https://issues.apache.org/jira/browse/HDFS-6421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai updated HDFS-6421:

Attachment: HDFS-6421.patch

Thanks [~cmccabe] for taking a reviewing the patch. Attaching the new patch addressing your comments.

RHEL4 fails to compile vecsum.c

Key: HDFS-6421
URL: https://issues.apache.org/jira/browse/HDFS-6421
Project: Hadoop HDFS
Issue Type: Bug
Components: libhdfs
Affects Versions: 2.5.0
Environment: RHEL4
Reporter: Jason Lowe
Assignee: Mit Desai
Attachments: HDFS-6421.patch, HDFS-6421.patch

After HDFS-6287, RHEL4 builds fail trying to compile vecsum.c since they don't have RUSAGE_THREAD. RHEL4 is ancient, but we use it in a 32-bit compatibility environment.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
[jira] [Commented] (HDFS-6421) RHEL4 fails to compile vecsum.c
[ https://issues.apache.org/jira/browse/HDFS-6421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14001718#comment-14001718 ] Mit Desai commented on HDFS-6421:

Correction: Thanks [~cmccabe] for reviewing the patch. :-)

RHEL4 fails to compile vecsum.c

Key: HDFS-6421
URL: https://issues.apache.org/jira/browse/HDFS-6421
Project: Hadoop HDFS
Issue Type: Bug
Components: libhdfs
Affects Versions: 2.5.0
Environment: RHEL4
Reporter: Jason Lowe
Assignee: Mit Desai
Attachments: HDFS-6421.patch, HDFS-6421.patch

After HDFS-6287, RHEL4 builds fail trying to compile vecsum.c since they don't have RUSAGE_THREAD. RHEL4 is ancient, but we use it in a 32-bit compatibility environment.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
[jira] [Updated] (HDFS-6421) RHEL4 fails to compile vecsum.c
[ https://issues.apache.org/jira/browse/HDFS-6421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai updated HDFS-6421:

Status: Open (was: Patch Available)

RHEL4 fails to compile vecsum.c

Key: HDFS-6421
URL: https://issues.apache.org/jira/browse/HDFS-6421
Project: Hadoop HDFS
Issue Type: Bug
Components: libhdfs
Affects Versions: 2.5.0
Environment: RHEL4
Reporter: Jason Lowe
Assignee: Mit Desai
Attachments: HDFS-6421.patch, HDFS-6421.patch

After HDFS-6287, RHEL4 builds fail trying to compile vecsum.c since they don't have RUSAGE_THREAD. RHEL4 is ancient, but we use it in a 32-bit compatibility environment.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
[jira] [Updated] (HDFS-6421) RHEL4 fails to compile vecsum.c
[ https://issues.apache.org/jira/browse/HDFS-6421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai updated HDFS-6421: Status: Patch Available (was: Open) RHEL4 fails to compile vecsum.c --- Key: HDFS-6421 URL: https://issues.apache.org/jira/browse/HDFS-6421 Project: Hadoop HDFS Issue Type: Bug Components: libhdfs Affects Versions: 2.5.0 Environment: RHEL4 Reporter: Jason Lowe Assignee: Mit Desai Attachments: HDFS-6421.patch, HDFS-6421.patch After HDFS-6287 RHEL4 builds fail trying to compile vecsum.c since they don't have RUSAGE_THREAD. RHEL4 is ancient, but we use it in a 32-bit compatibility environment. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Assigned] (HDFS-742) A down DataNode makes Balancer to hang on repeatingly asking NameNode its partial block list
[ https://issues.apache.org/jira/browse/HDFS-742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai reassigned HDFS-742: -- Assignee: Mit Desai (was: Hairong Kuang) A down DataNode makes Balancer to hang on repeatingly asking NameNode its partial block list Key: HDFS-742 URL: https://issues.apache.org/jira/browse/HDFS-742 Project: Hadoop HDFS Issue Type: Bug Components: balancer Reporter: Hairong Kuang Assignee: Mit Desai We had a balancer that had not made any progress for a long time. It turned out it was repeatingly asking Namenode for a partial block list of one datanode, which was done while the balancer was running. NameNode should notify Balancer that the datanode is not available and Balancer should stop asking for the datanode's block list. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6230) Expose upgrade status through NameNode web UI
[ https://issues.apache.org/jira/browse/HDFS-6230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai updated HDFS-6230: Status: Open (was: Patch Available) Expose upgrade status through NameNode web UI - Key: HDFS-6230 URL: https://issues.apache.org/jira/browse/HDFS-6230 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.4.0 Reporter: Arpit Agarwal Assignee: Mit Desai Attachments: HDFS-6230-NoUpgradesInProgress.png, HDFS-6230-UpgradeInProgress.jpg, HDFS-6230.patch, HDFS-6230.patch The NameNode web UI does not show upgrade information anymore. Hadoop 2.0 also does not have the _hadoop dfsadmin -upgradeProgress_ command to check the upgrade status. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Assigned] (HDFS-6421) RHEL4 fails to compile vecsum.c
[ https://issues.apache.org/jira/browse/HDFS-6421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai reassigned HDFS-6421: --- Assignee: Mit Desai RHEL4 fails to compile vecsum.c --- Key: HDFS-6421 URL: https://issues.apache.org/jira/browse/HDFS-6421 Project: Hadoop HDFS Issue Type: Bug Components: libhdfs Affects Versions: 2.5.0 Environment: RHEL4 Reporter: Jason Lowe Assignee: Mit Desai After HDFS-6287 RHEL4 builds fail trying to compile vecsum.c since they don't have RUSAGE_THREAD. RHEL4 is ancient, but we use it in a 32-bit compatibility environment. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6421) RHEL4 fails to compile vecsum.c
[ https://issues.apache.org/jira/browse/HDFS-6421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai updated HDFS-6421: Attachment: HDFS-6421.patch This code in the stopwatch structure gets the rusage and stores it into {{struct rusage rusage;}}, but the value is never used.
{code}
if (getrusage(RUSAGE_THREAD, &watch->rusage) < 0) {
  int err = errno;
  fprintf(stderr, "getrusage failed: error %d (%s)\n",
      err, strerror(err));
  goto error;
}
{code}
Removing the block to get RHEL4 compiling again. RHEL4 fails to compile vecsum.c --- Key: HDFS-6421 URL: https://issues.apache.org/jira/browse/HDFS-6421 Project: Hadoop HDFS Issue Type: Bug Components: libhdfs Affects Versions: 2.5.0 Environment: RHEL4 Reporter: Jason Lowe Assignee: Mit Desai Attachments: HDFS-6421.patch After HDFS-6287 RHEL4 builds fail trying to compile vecsum.c since they don't have RUSAGE_THREAD. RHEL4 is ancient, but we use it in a 32-bit compatibility environment. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6230) Expose upgrade status through NameNode web UI
[ https://issues.apache.org/jira/browse/HDFS-6230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai updated HDFS-6230: Status: Patch Available (was: Open) Expose upgrade status through NameNode web UI - Key: HDFS-6230 URL: https://issues.apache.org/jira/browse/HDFS-6230 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.4.0 Reporter: Arpit Agarwal Assignee: Mit Desai Attachments: HDFS-6230-NoUpgradesInProgress.png, HDFS-6230-UpgradeInProgress.jpg, HDFS-6230.patch, HDFS-6230.patch The NameNode web UI does not show upgrade information anymore. Hadoop 2.0 also does not have the _hadoop dfsadmin -upgradeProgress_ command to check the upgrade status. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6421) RHEL4 fails to compile vecsum.c
[ https://issues.apache.org/jira/browse/HDFS-6421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai updated HDFS-6421: Status: Patch Available (was: Open) RHEL4 fails to compile vecsum.c --- Key: HDFS-6421 URL: https://issues.apache.org/jira/browse/HDFS-6421 Project: Hadoop HDFS Issue Type: Bug Components: libhdfs Affects Versions: 2.5.0 Environment: RHEL4 Reporter: Jason Lowe Assignee: Mit Desai Attachments: HDFS-6421.patch After HDFS-6287 RHEL4 builds fail trying to compile vecsum.c since they don't have RUSAGE_THREAD. RHEL4 is ancient, but we use it in a 32-bit compatibility environment. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-742) A down DataNode makes Balancer to hang on repeatingly asking NameNode its partial block list
[ https://issues.apache.org/jira/browse/HDFS-742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13993598#comment-13993598 ] Mit Desai commented on HDFS-742: Taking this over. Feel free to reassign if you are still working on it. A down DataNode makes Balancer to hang on repeatingly asking NameNode its partial block list Key: HDFS-742 URL: https://issues.apache.org/jira/browse/HDFS-742 Project: Hadoop HDFS Issue Type: Bug Components: balancer Reporter: Hairong Kuang Assignee: Hairong Kuang We had a balancer that had not made any progress for a long time. It turned out it was repeatingly asking Namenode for a partial block list of one datanode, which was done while the balancer was running. NameNode should notify Balancer that the datanode is not available and Balancer should stop asking for the datanode's block list. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6230) Expose upgrade status through NameNode web UI
[ https://issues.apache.org/jira/browse/HDFS-6230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai updated HDFS-6230: Attachment: HDFS-6230.patch Thanks for looking at the patch [~wheat9]. Posting the updated patch. Expose upgrade status through NameNode web UI - Key: HDFS-6230 URL: https://issues.apache.org/jira/browse/HDFS-6230 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.4.0 Reporter: Arpit Agarwal Assignee: Mit Desai Attachments: HDFS-6230-NoUpgradesInProgress.png, HDFS-6230-UpgradeInProgress.jpg, HDFS-6230.patch, HDFS-6230.patch The NameNode web UI does not show upgrade information anymore. Hadoop 2.0 also does not have the _hadoop dfsadmin -upgradeProgress_ command to check the upgrade status. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-742) A down DataNode makes Balancer to hang on repeatingly asking NameNode its partial block list
[ https://issues.apache.org/jira/browse/HDFS-742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13992781#comment-13992781 ] Mit Desai commented on HDFS-742: Hey [~hairong], are you still working on this JIRA? If not, I can take it over and work on it. A down DataNode makes Balancer to hang on repeatingly asking NameNode its partial block list Key: HDFS-742 URL: https://issues.apache.org/jira/browse/HDFS-742 Project: Hadoop HDFS Issue Type: Bug Components: balancer Reporter: Hairong Kuang Assignee: Hairong Kuang We had a balancer that had not made any progress for a long time. It turned out it was repeatingly asking Namenode for a partial block list of one datanode, which was done while the balancer was running. NameNode should notify Balancer that the datanode is not available and Balancer should stop asking for the datanode's block list. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6230) Expose upgrade status through NameNode web UI
[ https://issues.apache.org/jira/browse/HDFS-6230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai updated HDFS-6230: Attachment: HDFS-6230-UpgradeInProgress.jpg HDFS-6230-NoUpgradesInProgress.png Expose upgrade status through NameNode web UI - Key: HDFS-6230 URL: https://issues.apache.org/jira/browse/HDFS-6230 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.4.0 Reporter: Arpit Agarwal Assignee: Mit Desai Attachments: HDFS-6230-NoUpgradesInProgress.png, HDFS-6230-UpgradeInProgress.jpg The NameNode web UI does not show upgrade information anymore. Hadoop 2.0 also does not have the _hadoop dfsadmin -upgradeProgress_ command to check the upgrade status. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6230) Expose upgrade status through NameNode web UI
[ https://issues.apache.org/jira/browse/HDFS-6230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai updated HDFS-6230: Status: Patch Available (was: Open) Expose upgrade status through NameNode web UI - Key: HDFS-6230 URL: https://issues.apache.org/jira/browse/HDFS-6230 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.4.0 Reporter: Arpit Agarwal Assignee: Mit Desai Attachments: HDFS-6230-NoUpgradesInProgress.png, HDFS-6230-UpgradeInProgress.jpg, HDFS-6230.patch The NameNode web UI does not show upgrade information anymore. Hadoop 2.0 also does not have the _hadoop dfsadmin -upgradeProgress_ command to check the upgrade status. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6230) Expose upgrade status through NameNode web UI
[ https://issues.apache.org/jira/browse/HDFS-6230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13985963#comment-13985963 ] Mit Desai commented on HDFS-6230: - [~arpitagarwal] are you working on the jira? Expose upgrade status through NameNode web UI - Key: HDFS-6230 URL: https://issues.apache.org/jira/browse/HDFS-6230 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.4.0 Reporter: Arpit Agarwal Assignee: Arpit Agarwal The NameNode web UI does not show upgrade information anymore. Hadoop 2.0 also does not have the _hadoop dfsadmin -upgradeProgress_ command to check the upgrade status. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Assigned] (HDFS-6230) Expose upgrade status through NameNode web UI
[ https://issues.apache.org/jira/browse/HDFS-6230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai reassigned HDFS-6230: --- Assignee: Mit Desai (was: Arpit Agarwal) Expose upgrade status through NameNode web UI - Key: HDFS-6230 URL: https://issues.apache.org/jira/browse/HDFS-6230 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.4.0 Reporter: Arpit Agarwal Assignee: Mit Desai The NameNode web UI does not show upgrade information anymore. Hadoop 2.0 also does not have the _hadoop dfsadmin -upgradeProgress_ command to check the upgrade status. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6230) Expose upgrade status through NameNode web UI
[ https://issues.apache.org/jira/browse/HDFS-6230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13985985#comment-13985985 ] Mit Desai commented on HDFS-6230: - Thanks! Taking it over Expose upgrade status through NameNode web UI - Key: HDFS-6230 URL: https://issues.apache.org/jira/browse/HDFS-6230 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.4.0 Reporter: Arpit Agarwal Assignee: Mit Desai The NameNode web UI does not show upgrade information anymore. Hadoop 2.0 also does not have the _hadoop dfsadmin -upgradeProgress_ command to check the upgrade status. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-5892) TestDeleteBlockPool fails in branch-2
[ https://issues.apache.org/jira/browse/HDFS-5892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13984337#comment-13984337 ] Mit Desai commented on HDFS-5892: - [~wheat9], taking a closer look at the commits, I found that this is not yet fixed in 2.4. Do we want to commit this to 2.4.1 and change the Fix version to 2.4.1, or edit the fix version to 2.5.0? TestDeleteBlockPool fails in branch-2 - Key: HDFS-5892 URL: https://issues.apache.org/jira/browse/HDFS-5892 Project: Hadoop HDFS Issue Type: Test Reporter: Ted Yu Priority: Minor Fix For: 2.4.0 Attachments: HDFS-5892.patch, org.apache.hadoop.hdfs.server.datanode.TestDeleteBlockPool-output.txt Running test suite on Linux, I got: {code} testDeleteBlockPool(org.apache.hadoop.hdfs.server.datanode.TestDeleteBlockPool) Time elapsed: 8.143 sec ERROR! java.io.IOException: All datanodes 127.0.0.1:43721 are bad. Aborting... at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1023) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:838) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:483) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (HDFS-3122) Block recovery with closeFile flag true can race with blockReport. Due to this blocks are getting marked as corrupt.
[ https://issues.apache.org/jira/browse/HDFS-3122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai resolved HDFS-3122. - Resolution: Not a Problem Target Version/s: 0.23.3, 0.24.0 (was: 0.24.0, 0.23.3) Haven't heard anything yet. Resolving this issue. Feel free to reopen if anyone thinks otherwise. Block recovery with closeFile flag true can race with blockReport. Due to this blocks are getting marked as corrupt. Key: HDFS-3122 URL: https://issues.apache.org/jira/browse/HDFS-3122 Project: Hadoop HDFS Issue Type: Bug Components: datanode, namenode Affects Versions: 0.23.0, 0.24.0 Reporter: Uma Maheswara Rao G Assignee: Uma Maheswara Rao G Priority: Critical Attachments: blockCorrupt.txt *Block Report* can *race* with *Block Recovery* when the closeFile flag is true. A block report is generated just before block recovery on the DN side and, due to network problems, the report is delayed in reaching the NN. After this, the recovery succeeds and the generation stamp changes to a new one. The primary DN invokes commitBlockSynchronization and the block gets updated on the NN side; the block is also marked as complete, since the closeFile flag was true, and is updated with the new genstamp. Now the block report starts being processed on the NN side. This particular block was in RBW (when the DN generated the BR), but the file was completed on the NN side. Finally, the block is marked as corrupt because of the genstamp mismatch.
{code}
case RWR:
  if (!storedBlock.isComplete()) {
    return null; // not corrupt
  } else if (storedBlock.getGenerationStamp() != iblk.getGenerationStamp()) {
    return new BlockToMarkCorrupt(storedBlock,
        "reported " + reportedState + " replica with genstamp " +
        iblk.getGenerationStamp() + " does not match COMPLETE block's " +
        "genstamp in block map " + storedBlock.getGenerationStamp());
  } else { // COMPLETE block, same genstamp
{code}
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-3122) Block recovery with closeFile flag true can race with blockReport. Due to this blocks are getting marked as corrupt.
[ https://issues.apache.org/jira/browse/HDFS-3122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13979848#comment-13979848 ] Mit Desai commented on HDFS-3122: - Hi [~umamaheswararao], Is this still an issue? I looked at the code and I think this got fixed at some point. Here is the code snippet from BlockManager:
{code}
case RWR:
  if (!storedBlock.isComplete()) {
    return null; // not corrupt
  } else if (storedBlock.getGenerationStamp() != reported.getGenerationStamp()) {
    final long reportedGS = reported.getGenerationStamp();
    return new BlockToMarkCorrupt(storedBlock, reportedGS,
        "reported " + reportedState + " replica with genstamp " + reportedGS
        + " does not match COMPLETE block's genstamp in block map "
        + storedBlock.getGenerationStamp(), Reason.GENSTAMP_MISMATCH);
  } else { // COMPLETE block, same genstamp
    if (reportedState == ReplicaState.RBW) {
      // If it's a RBW report for a COMPLETE block, it may just be that
      // the block report got a little bit delayed after the pipeline
      // closed. So, ignore this report, assuming we will get a
      // FINALIZED replica later. See HDFS-2791
      LOG.info("Received an RBW replica for " + storedBlock + " on " + dn
          + ": ignoring it, since it is complete with the same genstamp");
      return null;
    } else {
      return new BlockToMarkCorrupt(storedBlock,
          "reported replica has invalid state " + reportedState,
          Reason.INVALID_STATE);
    }
  }
{code}
I will resolve this Jira as Not a Problem tomorrow unless someone wants to go another way. Block recovery with closeFile flag true can race with blockReport. Due to this blocks are getting marked as corrupt. Key: HDFS-3122 URL: https://issues.apache.org/jira/browse/HDFS-3122 Project: Hadoop HDFS Issue Type: Bug Components: datanode, namenode Affects Versions: 0.23.0, 0.24.0 Reporter: Uma Maheswara Rao G Assignee: Uma Maheswara Rao G Priority: Critical Attachments: blockCorrupt.txt *Block Report* can *race* with *Block Recovery* when the closeFile flag is true. A block report is generated just before block recovery on the DN side and, due to network problems, the report is delayed in reaching the NN. After this, the recovery succeeds and the generation stamp changes to a new one. The primary DN invokes commitBlockSynchronization and the block gets updated on the NN side; the block is also marked as complete, since the closeFile flag was true, and is updated with the new genstamp. Now the block report starts being processed on the NN side. This particular block was in RBW (when the DN generated the BR), but the file was completed on the NN side. Finally, the block is marked as corrupt because of the genstamp mismatch.
{code}
case RWR:
  if (!storedBlock.isComplete()) {
    return null; // not corrupt
  } else if (storedBlock.getGenerationStamp() != iblk.getGenerationStamp()) {
    return new BlockToMarkCorrupt(storedBlock,
        "reported " + reportedState + " replica with genstamp " +
        iblk.getGenerationStamp() + " does not match COMPLETE block's " +
        "genstamp in block map " + storedBlock.getGenerationStamp());
  } else { // COMPLETE block, same genstamp
{code}
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (HDFS-2734) Even if we configure the property fs.checkpoint.size in both core-site.xml and hdfs-site.xml the values are not been considered
[ https://issues.apache.org/jira/browse/HDFS-2734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai resolved HDFS-2734. - Resolution: Not a Problem Target Version/s: 0.23.0, 0.20.1 (was: 0.20.1, 0.23.0) I think this issue is not a problem. Resolving it as Not a Problem. But feel free to reopen this jira if you still feel there is a problem Even if we configure the property fs.checkpoint.size in both core-site.xml and hdfs-site.xml the values are not been considered Key: HDFS-2734 URL: https://issues.apache.org/jira/browse/HDFS-2734 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 0.20.1, 0.23.0 Reporter: J.Andreina Priority: Minor Even if we configure the property fs.checkpoint.size in both core-site.xml and hdfs-site.xml the values are not been considered -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-5892) TestDeleteBlockPool fails in branch-2
[ https://issues.apache.org/jira/browse/HDFS-5892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13971592#comment-13971592 ] Mit Desai commented on HDFS-5892: - [~yuzhih...@gmail.com] [~dandan] : Are you guys still having the issues? This test still fails randomly in our nightly builds TestDeleteBlockPool fails in branch-2 - Key: HDFS-5892 URL: https://issues.apache.org/jira/browse/HDFS-5892 Project: Hadoop HDFS Issue Type: Test Reporter: Ted Yu Priority: Minor Fix For: 2.4.0 Attachments: HDFS-5892.patch, org.apache.hadoop.hdfs.server.datanode.TestDeleteBlockPool-output.txt Running test suite on Linux, I got: {code} testDeleteBlockPool(org.apache.hadoop.hdfs.server.datanode.TestDeleteBlockPool) Time elapsed: 8.143 sec ERROR! java.io.IOException: All datanodes 127.0.0.1:43721 are bad. Aborting... at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1023) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:838) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:483) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-4587) Webhdfs secure clients are incompatible with non-secure NN
[ https://issues.apache.org/jira/browse/HDFS-4587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai updated HDFS-4587: Target Version/s: 3.0.0 (was: 3.0.0, 0.23.11) Webhdfs secure clients are incompatible with non-secure NN -- Key: HDFS-4587 URL: https://issues.apache.org/jira/browse/HDFS-4587 Project: Hadoop HDFS Issue Type: Bug Components: namenode, webhdfs Affects Versions: 3.0.0, 0.23.7, 2.0.4-alpha Reporter: Daryn Sharp A secure webhdfs client will receive an exception from a non-secure NN. For a NN in non-secure mode, {{FSNamesystem#getDelegationToken}} will return null to indicate no token is required. Hdfs will send back the null to the client, but webhdfs uses {{DelegationTokenSecretManager.createCredentials}} which instead throws an exception. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-4587) Webhdfs secure clients are incompatible with non-secure NN
[ https://issues.apache.org/jira/browse/HDFS-4587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13968382#comment-13968382 ] Mit Desai commented on HDFS-4587: - As 0.23 is going into the maintenance state and this bug will not be fixed in it, I am removing the target version for 0.23.11 Webhdfs secure clients are incompatible with non-secure NN -- Key: HDFS-4587 URL: https://issues.apache.org/jira/browse/HDFS-4587 Project: Hadoop HDFS Issue Type: Bug Components: namenode, webhdfs Affects Versions: 3.0.0, 0.23.7, 2.0.4-alpha Reporter: Daryn Sharp A secure webhdfs client will receive an exception from a non-secure NN. For a NN in non-secure mode, {{FSNamesystem#getDelegationToken}} will return null to indicate no token is required. Hdfs will send back the null to the client, but webhdfs uses {{DelegationTokenSecretManager.createCredentials}} which instead throws an exception. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (HDFS-4576) Webhdfs authentication issues
[ https://issues.apache.org/jira/browse/HDFS-4576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai resolved HDFS-4576. - Resolution: Fixed Fix Version/s: 0.23.11 3.0.0 Resolving this task, as all of its subtasks are now resolved. Webhdfs authentication issues - Key: HDFS-4576 URL: https://issues.apache.org/jira/browse/HDFS-4576 Project: Hadoop HDFS Issue Type: Bug Components: webhdfs Affects Versions: 2.0.0-alpha, 3.0.0, 0.23.7 Reporter: Daryn Sharp Assignee: Daryn Sharp Fix For: 3.0.0, 0.23.11 Umbrella jira to track the webhdfs authentication issues as subtasks. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-4587) Webhdfs secure clients are incompatible with non-secure NN
[ https://issues.apache.org/jira/browse/HDFS-4587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai updated HDFS-4587: Issue Type: Bug (was: Sub-task) Parent: (was: HDFS-4576) Webhdfs secure clients are incompatible with non-secure NN -- Key: HDFS-4587 URL: https://issues.apache.org/jira/browse/HDFS-4587 Project: Hadoop HDFS Issue Type: Bug Components: namenode, webhdfs Affects Versions: 3.0.0, 0.23.7, 2.0.4-alpha Reporter: Daryn Sharp A secure webhdfs client will receive an exception from a non-secure NN. For a NN in non-secure mode, {{FSNamesystem#getDelegationToken}} will return null to indicate no token is required. Hdfs will send back the null to the client, but webhdfs uses {{DelegationTokenSecretManager.createCredentials}} which instead throws an exception. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-2734) Even if we configure the property fs.checkpoint.size in both core-site.xml and hdfs-site.xml the values are not been considered
[ https://issues.apache.org/jira/browse/HDFS-2734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13966723#comment-13966723 ] Mit Desai commented on HDFS-2734: - I see that there has been no activity on this Jira for a long time. [~andreina], is this still reproducible on your side? If this is still an issue, can you provide the information [~qwertymaniac] requested? From the analysis that Harsh did, I think this is not reproducible on his side, and I have not seen anyone else raise this concern. In that case, if I do not hear back by 4/17/14, I will go ahead and close this issue as Not A Problem. -Mit Even if we configure the property fs.checkpoint.size in both core-site.xml and hdfs-site.xml the values are not been considered Key: HDFS-2734 URL: https://issues.apache.org/jira/browse/HDFS-2734 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 0.20.1, 0.23.0 Reporter: J.Andreina Priority: Minor Even if we configure the property fs.checkpoint.size in both core-site.xml and hdfs-site.xml the values are not been considered -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-5983) TestSafeMode#testInitializeReplQueuesEarly fails
[ https://issues.apache.org/jira/browse/HDFS-5983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13964459#comment-13964459 ] Mit Desai commented on HDFS-5983: - Reviewed the patch. LGTM. +1 (non-binding) TestSafeMode#testInitializeReplQueuesEarly fails Key: HDFS-5983 URL: https://issues.apache.org/jira/browse/HDFS-5983 Project: Hadoop HDFS Issue Type: Bug Reporter: Kihwal Lee Assignee: Ming Ma Attachments: HDFS-5983-updated.patch, HDFS-5983.patch, testlog.txt It was seen from one of the precommit build of HDFS-5962. The test case creates 15 blocks and then shuts down all datanodes. Then the namenode is restarted with a low safe block threshold and one datanode is restarted. The idea is that the initial block report from the restarted datanode will make the namenode leave the safemode and initialize the replication queues. According to the log, the datanode reported 3 blocks, but slightly before that the namenode did repl queue init with 1 block. I will attach the log. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-5983) TestSafeMode#testInitializeReplQueuesEarly fails
[ https://issues.apache.org/jira/browse/HDFS-5983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai updated HDFS-5983: Status: Patch Available (was: Open) TestSafeMode#testInitializeReplQueuesEarly fails Key: HDFS-5983 URL: https://issues.apache.org/jira/browse/HDFS-5983 Project: Hadoop HDFS Issue Type: Bug Reporter: Kihwal Lee Assignee: Ming Ma Attachments: HDFS-5983-updated.patch, HDFS-5983.patch, testlog.txt It was seen from one of the precommit build of HDFS-5962. The test case creates 15 blocks and then shuts down all datanodes. Then the namenode is restarted with a low safe block threshold and one datanode is restarted. The idea is that the initial block report from the restarted datanode will make the namenode leave the safemode and initialize the replication queues. According to the log, the datanode reported 3 blocks, but slightly before that the namenode did repl queue init with 1 block. I will attach the log. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-5983) TestSafeMode#testInitializeReplQueuesEarly fails
[ https://issues.apache.org/jira/browse/HDFS-5983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13964462#comment-13964462 ] Mit Desai commented on HDFS-5983: - One note: you need to Submit the Patch once you upload it, to get the Hadoop QA comment. I just did that. TestSafeMode#testInitializeReplQueuesEarly fails Key: HDFS-5983 URL: https://issues.apache.org/jira/browse/HDFS-5983 Project: Hadoop HDFS Issue Type: Bug Reporter: Kihwal Lee Assignee: Ming Ma Attachments: HDFS-5983-updated.patch, HDFS-5983.patch, testlog.txt It was seen from one of the precommit build of HDFS-5962. The test case creates 15 blocks and then shuts down all datanodes. Then the namenode is restarted with a low safe block threshold and one datanode is restarted. The idea is that the initial block report from the restarted datanode will make the namenode leave the safemode and initialize the replication queues. According to the log, the datanode reported 3 blocks, but slightly before that the namenode did repl queue init with 1 block. I will attach the log. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-5983) TestSafeMode#testInitializeReplQueuesEarly fails
[ https://issues.apache.org/jira/browse/HDFS-5983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13964491#comment-13964491 ] Mit Desai commented on HDFS-5983: - [~airbots], [~mingma] : Can any of you regenerate the patch and attach it to make sure it applies successfully? Mit TestSafeMode#testInitializeReplQueuesEarly fails Key: HDFS-5983 URL: https://issues.apache.org/jira/browse/HDFS-5983 Project: Hadoop HDFS Issue Type: Bug Reporter: Kihwal Lee Assignee: Ming Ma Attachments: HDFS-5983-updated.patch, HDFS-5983.patch, testlog.txt It was seen from one of the precommit build of HDFS-5962. The test case creates 15 blocks and then shuts down all datanodes. Then the namenode is restarted with a low safe block threshold and one datanode is restarted. The idea is that the initial block report from the restarted datanode will make the namenode leave the safemode and initialize the replication queues. According to the log, the datanode reported 3 blocks, but slightly before that the namenode did repl queue init with 1 block. I will attach the log. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-5983) TestSafeMode#testInitializeReplQueuesEarly fails
[ https://issues.apache.org/jira/browse/HDFS-5983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai updated HDFS-5983: Status: Open (was: Patch Available) TestSafeMode#testInitializeReplQueuesEarly fails Key: HDFS-5983 URL: https://issues.apache.org/jira/browse/HDFS-5983 Project: Hadoop HDFS Issue Type: Bug Reporter: Kihwal Lee Assignee: Ming Ma Attachments: HDFS-5983-updated.patch, HDFS-5983.patch, testlog.txt It was seen from one of the precommit build of HDFS-5962. The test case creates 15 blocks and then shuts down all datanodes. Then the namenode is restarted with a low safe block threshold and one datanode is restarted. The idea is that the initial block report from the restarted datanode will make the namenode leave the safemode and initialize the replication queues. According to the log, the datanode reported 3 blocks, but slightly before that the namenode did repl queue init with 1 block. I will attach the log. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-5983) TestSafeMode#testInitializeReplQueuesEarly fails
[ https://issues.apache.org/jira/browse/HDFS-5983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13964579#comment-13964579 ] Mit Desai commented on HDFS-5983: - Already fixed by HDFS-6160, so closing it. TestSafeMode#testInitializeReplQueuesEarly fails Key: HDFS-5983 URL: https://issues.apache.org/jira/browse/HDFS-5983 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.2.0 Reporter: Kihwal Lee Assignee: Ming Ma Attachments: HDFS-5983-updated.patch, HDFS-5983.patch, testlog.txt It was seen from one of the precommit build of HDFS-5962. The test case creates 15 blocks and then shuts down all datanodes. Then the namenode is restarted with a low safe block threshold and one datanode is restarted. The idea is that the initial block report from the restarted datanode will make the namenode leave the safemode and initialize the replication queues. According to the log, the datanode reported 3 blocks, but slightly before that the namenode did repl queue init with 1 block. I will attach the log. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HDFS-6195) TestRMApplicationHistoryWriter#testRMWritingMassiveHistory runs slow and intermittently fails on trunk and branch2
Mit Desai created HDFS-6195: --- Summary: TestRMApplicationHistoryWriter#testRMWritingMassiveHistory runs slow and intermittently fails on trunk and branch2 Key: HDFS-6195 URL: https://issues.apache.org/jira/browse/HDFS-6195 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 3.0.0, 2.5.0 Reporter: Mit Desai Assignee: Mit Desai The test has 1 containers that it tries to cleanup. The cleanup has a timeout of 2ms in which the test sometimes cannot do the cleanup completely and gives out an Assertion Failure. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6195) TestRMApplicationHistoryWriter#testRMWritingMassiveHistory runs slow and intermittently fails on trunk and branch2
[ https://issues.apache.org/jira/browse/HDFS-6195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13961887#comment-13961887 ] Mit Desai commented on HDFS-6195: - analyzing the cause. Will post the analysis/fix soon TestRMApplicationHistoryWriter#testRMWritingMassiveHistory runs slow and intermittently fails on trunk and branch2 -- Key: HDFS-6195 URL: https://issues.apache.org/jira/browse/HDFS-6195 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 3.0.0, 2.5.0 Reporter: Mit Desai Assignee: Mit Desai The test has 1 containers that it tries to cleanup. The cleanup has a timeout of 2ms in which the test sometimes cannot do the cleanup completely and gives out an Assertion Failure. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6195) TestRMApplicationHistoryWriter#testRMWritingMassiveHistory runs slow and intermittently fails on trunk and branch2
[ https://issues.apache.org/jira/browse/HDFS-6195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai updated HDFS-6195: Attachment: HDFS-6195.patch Attaching the patch for trunk and branch2 TestRMApplicationHistoryWriter#testRMWritingMassiveHistory runs slow and intermittently fails on trunk and branch2 -- Key: HDFS-6195 URL: https://issues.apache.org/jira/browse/HDFS-6195 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 3.0.0, 2.5.0 Reporter: Mit Desai Assignee: Mit Desai Attachments: HDFS-6195.patch The test has 1 containers that it tries to cleanup. The cleanup has a timeout of 2ms in which the test sometimes cannot do the cleanup completely and gives out an Assertion Failure. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6195) TestRMApplicationHistoryWriter#testRMWritingMassiveHistory runs slow and intermittently fails on trunk and branch2
[ https://issues.apache.org/jira/browse/HDFS-6195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai updated HDFS-6195: Fix Version/s: 2.5.0 3.0.0 Status: Patch Available (was: Open) TestRMApplicationHistoryWriter#testRMWritingMassiveHistory runs slow and intermittently fails on trunk and branch2 -- Key: HDFS-6195 URL: https://issues.apache.org/jira/browse/HDFS-6195 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 3.0.0, 2.5.0 Reporter: Mit Desai Assignee: Mit Desai Fix For: 3.0.0, 2.5.0 Attachments: HDFS-6195.patch The test has 1 containers that it tries to cleanup. The cleanup has a timeout of 2ms in which the test sometimes cannot do the cleanup completely and gives out an Assertion Failure. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6195) TestRMApplicationHistoryWriter#testRMWritingMassiveHistory runs slow and intermittently fails on trunk and branch2
[ https://issues.apache.org/jira/browse/HDFS-6195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13961984#comment-13961984 ] Mit Desai commented on HDFS-6195: - While cleaning up the containers,
{code}
while (cleanedSize < allocatedSize && waitCount++ < 200) {
  Thread.sleep(100);
  resp = nm.nodeHeartbeat(true);
  cleaned = resp.getContainersToCleanup();
  cleanedSize += cleaned.size();
}
{code}
the test sometimes cannot complete the cleanup and some of the 1 containers are left behind, resulting in an assertion error at {{Assert.assertEquals(allocatedSize, cleanedSize);}}. This test has been failing in our nightly builds for the last couple of days. I was able to reproduce it consistently in Eclipse but not with Maven, so I think this is an environment issue and cannot be reproduced everywhere. As a fix, I have increased the thread sleep time in the while loop, which gives some extra time for the container cleanup. And since the loop also checks the cleaned size against the allocated size, the test will not always use up all of its cycles. TestRMApplicationHistoryWriter#testRMWritingMassiveHistory runs slow and intermittently fails on trunk and branch2 -- Key: HDFS-6195 URL: https://issues.apache.org/jira/browse/HDFS-6195 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 3.0.0, 2.5.0 Reporter: Mit Desai Assignee: Mit Desai Fix For: 3.0.0, 2.5.0 Attachments: HDFS-6195.patch The test has 1 containers that it tries to cleanup. The cleanup has a timeout of 2ms in which the test sometimes cannot do the cleanup completely and gives out an Assertion Failure. -- This message was sent by Atlassian JIRA (v6.2#6252)
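For illustration only, the adjusted loop could look like the sketch below. The variables ({{nm}}, {{resp}}, {{cleaned}}, {{allocatedSize}}) are the test's own and are assumed to be in scope; the 300 ms sleep is an assumed value, not necessarily what the attached patch uses:
{code}
// Give cleanup more time per heartbeat cycle; the size check still
// lets the loop exit early once every container has been cleaned up.
int waitCount = 0;
int cleanedSize = 0;
while (cleanedSize < allocatedSize && waitCount++ < 200) {
  Thread.sleep(300); // was 100 ms; 300 ms is illustrative
  resp = nm.nodeHeartbeat(true);
  cleaned = resp.getContainersToCleanup();
  cleanedSize += cleaned.size();
}
Assert.assertEquals(allocatedSize, cleanedSize);
{code}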
[jira] [Commented] (HDFS-6195) TestRMApplicationHistoryWriter#testRMWritingMassiveHistory runs slow and intermittently fails on trunk and branch2
[ https://issues.apache.org/jira/browse/HDFS-6195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13962088#comment-13962088 ] Mit Desai commented on HDFS-6195: - TestRMRestart is a different issue related to YARN-1906 TestRMApplicationHistoryWriter#testRMWritingMassiveHistory runs slow and intermittently fails on trunk and branch2 -- Key: HDFS-6195 URL: https://issues.apache.org/jira/browse/HDFS-6195 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 3.0.0, 2.5.0 Reporter: Mit Desai Assignee: Mit Desai Fix For: 3.0.0, 2.5.0 Attachments: HDFS-6195.patch The test has 1 containers that it tries to cleanup. The cleanup has a timeout of 2ms in which the test sometimes cannot do the cleanup completely and gives out an Assertion Failure. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-5807) TestBalancerWithNodeGroup.testBalancerWithNodeGroup fails intermittently on Branch-2
[ https://issues.apache.org/jira/browse/HDFS-5807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13947957#comment-13947957 ] Mit Desai commented on HDFS-5807: - Thanks [~airbots] TestBalancerWithNodeGroup.testBalancerWithNodeGroup fails intermittently on Branch-2 Key: HDFS-5807 URL: https://issues.apache.org/jira/browse/HDFS-5807 Project: Hadoop HDFS Issue Type: Bug Components: test Affects Versions: 2.3.0 Reporter: Mit Desai Assignee: Chen He Fix For: 3.0.0, 2.4.0 Attachments: HDFS-5807.patch The test times out after some time. {noformat} java.util.concurrent.TimeoutException: Rebalancing expected avg utilization to become 0.16, but on datanode 127.0.0.1:42451 it remains at 0.3 after more than 2 msec. at org.apache.hadoop.hdfs.server.balancer.TestBalancerWithNodeGroup.waitForBalancer(TestBalancerWithNodeGroup.java:151) at org.apache.hadoop.hdfs.server.balancer.TestBalancerWithNodeGroup.runBalancer(TestBalancerWithNodeGroup.java:178) at org.apache.hadoop.hdfs.server.balancer.TestBalancerWithNodeGroup.testBalancerWithNodeGroup(TestBalancerWithNodeGroup.java:302) {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Reopened] (HDFS-5807) TestBalancerWithNodeGroup.testBalancerWithNodeGroup fails intermittently on Branch-2
[ https://issues.apache.org/jira/browse/HDFS-5807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai reopened HDFS-5807: - [~airbots], I found this test failing again in our nightly builds. Can you take a look at it again? {noformat} Error Message Rebalancing expected avg utilization to become 0.16, but on datanode X.X.X.X: it remains at 0.3 after more than 4 msec. Stacktrace java.util.concurrent.TimeoutException: Rebalancing expected avg utilization to become 0.16, but on datanode X.X.X.X: it remains at 0.3 after more than 4 msec. at org.apache.hadoop.hdfs.server.balancer.TestBalancerWithNodeGroup.waitForBalancer(TestBalancerWithNodeGroup.java:151) at org.apache.hadoop.hdfs.server.balancer.TestBalancerWithNodeGroup.runBalancer(TestBalancerWithNodeGroup.java:178) at org.apache.hadoop.hdfs.server.balancer.TestBalancerWithNodeGroup.testBalancerWithNodeGroup(TestBalancerWithNodeGroup.java:302) {noformat} TestBalancerWithNodeGroup.testBalancerWithNodeGroup fails intermittently on Branch-2 Key: HDFS-5807 URL: https://issues.apache.org/jira/browse/HDFS-5807 Project: Hadoop HDFS Issue Type: Bug Components: test Affects Versions: 2.3.0 Reporter: Mit Desai Assignee: Chen He Fix For: 3.0.0, 2.4.0 Attachments: HDFS-5807.patch The test times out after some time. {noformat} java.util.concurrent.TimeoutException: Rebalancing expected avg utilization to become 0.16, but on datanode 127.0.0.1:42451 it remains at 0.3 after more than 2 msec. at org.apache.hadoop.hdfs.server.balancer.TestBalancerWithNodeGroup.waitForBalancer(TestBalancerWithNodeGroup.java:151) at org.apache.hadoop.hdfs.server.balancer.TestBalancerWithNodeGroup.runBalancer(TestBalancerWithNodeGroup.java:178) at org.apache.hadoop.hdfs.server.balancer.TestBalancerWithNodeGroup.testBalancerWithNodeGroup(TestBalancerWithNodeGroup.java:302) {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HDFS-6126) TestnameNodeMetrics#testCorruptBlock fails intermittently
Mit Desai created HDFS-6126: --- Summary: TestnameNodeMetrics#testCorruptBlock fails intermittently Key: HDFS-6126 URL: https://issues.apache.org/jira/browse/HDFS-6126 Project: Hadoop HDFS Issue Type: Test Affects Versions: 2.4.0 Reporter: Mit Desai I get the following error {noformat} testCorruptBlock(org.apache.hadoop.hdfs.server.namenode.metrics.TestNameNodeMetrics) Time elapsed: 5.556 sec FAILURE! java.lang.AssertionError: Bad value for metric CorruptBlocks expected:1 but was:0 at org.junit.Assert.fail(Assert.java:93) at org.junit.Assert.failNotEquals(Assert.java:647) at org.junit.Assert.assertEquals(Assert.java:128) at org.junit.Assert.assertEquals(Assert.java:472) at org.apache.hadoop.test.MetricsAsserts.assertGauge(MetricsAsserts.java:190) at org.apache.hadoop.hdfs.server.namenode.metrics.TestNameNodeMetrics.testCorruptBlock(TestNameNodeMetrics.java:247) Results : Failed tests: TestNameNodeMetrics.testCorruptBlock:247 Bad value for metric CorruptBlocks expected:1 but was:0 {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HDFS-6104) TestFsLimits#testDefaultMaxComponentLength Fails on branch-2
Mit Desai created HDFS-6104: --- Summary: TestFsLimits#testDefaultMaxComponentLength Fails on branch-2 Key: HDFS-6104 URL: https://issues.apache.org/jira/browse/HDFS-6104 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.4.0 Reporter: Mit Desai Assignee: Mit Desai testDefaultMaxComponentLength fails intermittently with the following error {noformat} java.lang.AssertionError: expected:0 but was:255 at org.junit.Assert.fail(Assert.java:93) at org.junit.Assert.failNotEquals(Assert.java:647) at org.junit.Assert.assertEquals(Assert.java:128) at org.junit.Assert.assertEquals(Assert.java:472) at org.junit.Assert.assertEquals(Assert.java:456) at org.apache.hadoop.hdfs.server.namenode.TestFsLimits.testDefaultMaxComponentLength(TestFsLimits.java:90) {noformat} On doing some research, I found that this is actually a JDK7 issue. The test always fails when it runs after any test that runs the addChildWithName() method. -- This message was sent by Atlassian JIRA (v6.2#6252)
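As a general aside (not necessarily how HDFS-6104 was eventually fixed): one common mitigation for JDK7 method-ordering flakiness is to pin the test method order with JUnit's {{@FixMethodOrder}} while the leaked state between tests is cleaned up. A minimal, self-contained sketch with hypothetical test names:
{code}
import static org.junit.Assert.assertEquals;

import org.junit.FixMethodOrder;
import org.junit.Test;
import org.junit.runners.MethodSorters;

// JDK7 no longer returns test methods in declaration order, so tests that
// share mutable state may pass or fail depending on reflection order.
// Pinning the order makes the interaction deterministic.
@FixMethodOrder(MethodSorters.NAME_ASCENDING)
public class OrderSensitiveTest {
  private static int sharedState = 0; // state leaked between test methods

  @Test
  public void test1WritesState() {
    sharedState = 255;
  }

  @Test
  public void test2ReadsState() {
    assertEquals(255, sharedState);
  }
}
{code}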
[jira] [Commented] (HDFS-6035) TestCacheDirectives#testCacheManagerRestart is failing on branch-2
[ https://issues.apache.org/jira/browse/HDFS-6035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13930720#comment-13930720 ] Mit Desai commented on HDFS-6035: - [~sathish.gurram], Can you let me know which branch you are testing this on? TestCacheDirectives#testCacheManagerRestart is failing on branch-2 -- Key: HDFS-6035 URL: https://issues.apache.org/jira/browse/HDFS-6035 Project: Hadoop HDFS Issue Type: Bug Components: test Affects Versions: 2.4.0 Reporter: Mit Desai Assignee: sathish Attachments: HDFS-6035-0001.patch {noformat} java.io.IOException: Inconsistent checkpoint fields. LV = -51 namespaceID = 1641397469 cTime = 0 ; clusterId = testClusterID ; blockpoolId = BP-423574854-x.x.x.x-1393478669835. Expecting respectively: -51; 2; 0; testClusterID; BP-2051361571-x.x.x.x-1393478572877. at org.apache.hadoop.hdfs.server.namenode.CheckpointSignature.validateStorageInfo(CheckpointSignature.java:133) at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doCheckpoint(SecondaryNameNode.java:526) at org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives.testCacheManagerRestart(TestCacheDirectives.java:582) {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6035) TestCacheDirectives#testCacheManagerRestart is failing on branch-2
[ https://issues.apache.org/jira/browse/HDFS-6035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13920948#comment-13920948 ] Mit Desai commented on HDFS-6035: - I am trying but cannot reproduce it in Eclipse either. I'll have to put in some more effort and update you once I have some findings. TestCacheDirectives#testCacheManagerRestart is failing on branch-2 -- Key: HDFS-6035 URL: https://issues.apache.org/jira/browse/HDFS-6035 Project: Hadoop HDFS Issue Type: Bug Components: test Affects Versions: 2.4.0 Reporter: Mit Desai Assignee: sathish Attachments: HDFS-6035-0001.patch {noformat} java.io.IOException: Inconsistent checkpoint fields. LV = -51 namespaceID = 1641397469 cTime = 0 ; clusterId = testClusterID ; blockpoolId = BP-423574854-x.x.x.x-1393478669835. Expecting respectively: -51; 2; 0; testClusterID; BP-2051361571-x.x.x.x-1393478572877. at org.apache.hadoop.hdfs.server.namenode.CheckpointSignature.validateStorageInfo(CheckpointSignature.java:133) at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doCheckpoint(SecondaryNameNode.java:526) at org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives.testCacheManagerRestart(TestCacheDirectives.java:582) {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-5857) TestWebHDFS#testNamenodeRestart fails intermittently with NPE
[ https://issues.apache.org/jira/browse/HDFS-5857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai updated HDFS-5857: Attachment: HDFS-5857.patch Thanks for the inputs Haohui. Attaching the updated patch TestWebHDFS#testNamenodeRestart fails intermittently with NPE - Key: HDFS-5857 URL: https://issues.apache.org/jira/browse/HDFS-5857 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 3.0.0, 2.2.0 Reporter: Mit Desai Assignee: Mit Desai Attachments: HDFS-5857.patch, HDFS-5857.patch {noformat} java.lang.AssertionError: There are 1 exception(s): Exception 0: org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): null at org.apache.hadoop.hdfs.web.JsonUtil.toRemoteException(JsonUtil.java:157) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.validateResponse(WebHdfsFileSystem.java:315) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$700(WebHdfsFileSystem.java:105) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.getResponse(WebHdfsFileSystem.java:625) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:532) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.run(WebHdfsFileSystem.java:421) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.mkdirs(WebHdfsFileSystem.java:701) at org.apache.hadoop.fs.FileSystem.mkdirs(FileSystem.java:1816) at org.apache.hadoop.hdfs.DFSTestUtil.createFile(DFSTestUtil.java:196) at org.apache.hadoop.hdfs.TestDFSClientRetries$6.run(TestDFSClientRetries.java:920) at java.lang.Thread.run(Thread.java:722) at org.junit.Assert.fail(Assert.java:93) at org.apache.hadoop.hdfs.TestDFSClientRetries.assertEmpty(TestDFSClientRetries.java:1031) at org.apache.hadoop.hdfs.TestDFSClientRetries.namenodeRestartTest(TestDFSClientRetries.java:951) at org.apache.hadoop.hdfs.web.TestWebHDFS.testNamenodeRestart(TestWebHDFS.java:216) {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-5857) TestWebHDFS#testNamenodeRestart fails intermittently with NPE
[ https://issues.apache.org/jira/browse/HDFS-5857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13921181#comment-13921181 ] Mit Desai commented on HDFS-5857: - None of the test failures are related to the patch. I have manually tested them with the patch and they pass on my machine TestWebHDFS#testNamenodeRestart fails intermittently with NPE - Key: HDFS-5857 URL: https://issues.apache.org/jira/browse/HDFS-5857 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 3.0.0, 2.2.0 Reporter: Mit Desai Assignee: Mit Desai Attachments: HDFS-5857.patch, HDFS-5857.patch {noformat} java.lang.AssertionError: There are 1 exception(s): Exception 0: org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): null at org.apache.hadoop.hdfs.web.JsonUtil.toRemoteException(JsonUtil.java:157) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.validateResponse(WebHdfsFileSystem.java:315) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$700(WebHdfsFileSystem.java:105) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.getResponse(WebHdfsFileSystem.java:625) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:532) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.run(WebHdfsFileSystem.java:421) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.mkdirs(WebHdfsFileSystem.java:701) at org.apache.hadoop.fs.FileSystem.mkdirs(FileSystem.java:1816) at org.apache.hadoop.hdfs.DFSTestUtil.createFile(DFSTestUtil.java:196) at org.apache.hadoop.hdfs.TestDFSClientRetries$6.run(TestDFSClientRetries.java:920) at java.lang.Thread.run(Thread.java:722) at org.junit.Assert.fail(Assert.java:93) at org.apache.hadoop.hdfs.TestDFSClientRetries.assertEmpty(TestDFSClientRetries.java:1031) at org.apache.hadoop.hdfs.TestDFSClientRetries.namenodeRestartTest(TestDFSClientRetries.java:951) at org.apache.hadoop.hdfs.web.TestWebHDFS.testNamenodeRestart(TestWebHDFS.java:216) {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Assigned] (HDFS-5857) TestWebHDFS#testNamenodeRestart fails intermittently with NPE
[ https://issues.apache.org/jira/browse/HDFS-5857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai reassigned HDFS-5857: --- Assignee: Mit Desai TestWebHDFS#testNamenodeRestart fails intermittently with NPE - Key: HDFS-5857 URL: https://issues.apache.org/jira/browse/HDFS-5857 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.2.0 Reporter: Mit Desai Assignee: Mit Desai {noformat} java.lang.AssertionError: There are 1 exception(s): Exception 0: org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): null at org.apache.hadoop.hdfs.web.JsonUtil.toRemoteException(JsonUtil.java:157) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.validateResponse(WebHdfsFileSystem.java:315) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$700(WebHdfsFileSystem.java:105) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.getResponse(WebHdfsFileSystem.java:625) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:532) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.run(WebHdfsFileSystem.java:421) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.mkdirs(WebHdfsFileSystem.java:701) at org.apache.hadoop.fs.FileSystem.mkdirs(FileSystem.java:1816) at org.apache.hadoop.hdfs.DFSTestUtil.createFile(DFSTestUtil.java:196) at org.apache.hadoop.hdfs.TestDFSClientRetries$6.run(TestDFSClientRetries.java:920) at java.lang.Thread.run(Thread.java:722) at org.junit.Assert.fail(Assert.java:93) at org.apache.hadoop.hdfs.TestDFSClientRetries.assertEmpty(TestDFSClientRetries.java:1031) at org.apache.hadoop.hdfs.TestDFSClientRetries.namenodeRestartTest(TestDFSClientRetries.java:951) at org.apache.hadoop.hdfs.web.TestWebHDFS.testNamenodeRestart(TestWebHDFS.java:216) {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-5857) TestWebHDFS#testNamenodeRestart fails intermittently with NPE
[ https://issues.apache.org/jira/browse/HDFS-5857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai updated HDFS-5857: Affects Version/s: 3.0.0 Status: Patch Available (was: Open) TestWebHDFS#testNamenodeRestart fails intermittently with NPE - Key: HDFS-5857 URL: https://issues.apache.org/jira/browse/HDFS-5857 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.2.0, 3.0.0 Reporter: Mit Desai Assignee: Mit Desai Attachments: HDFS-5857.patch {noformat} java.lang.AssertionError: There are 1 exception(s): Exception 0: org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): null at org.apache.hadoop.hdfs.web.JsonUtil.toRemoteException(JsonUtil.java:157) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.validateResponse(WebHdfsFileSystem.java:315) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$700(WebHdfsFileSystem.java:105) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.getResponse(WebHdfsFileSystem.java:625) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:532) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.run(WebHdfsFileSystem.java:421) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.mkdirs(WebHdfsFileSystem.java:701) at org.apache.hadoop.fs.FileSystem.mkdirs(FileSystem.java:1816) at org.apache.hadoop.hdfs.DFSTestUtil.createFile(DFSTestUtil.java:196) at org.apache.hadoop.hdfs.TestDFSClientRetries$6.run(TestDFSClientRetries.java:920) at java.lang.Thread.run(Thread.java:722) at org.junit.Assert.fail(Assert.java:93) at org.apache.hadoop.hdfs.TestDFSClientRetries.assertEmpty(TestDFSClientRetries.java:1031) at org.apache.hadoop.hdfs.TestDFSClientRetries.namenodeRestartTest(TestDFSClientRetries.java:951) at org.apache.hadoop.hdfs.web.TestWebHDFS.testNamenodeRestart(TestWebHDFS.java:216) {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (HDFS-5839) TestWebHDFS#testNamenodeRestart fails with NullPointerException in trunk
[ https://issues.apache.org/jira/browse/HDFS-5839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai resolved HDFS-5839. - Resolution: Duplicate HDFS-5857 has a patch for this issue. I am resolving this JIRA so that we have a single Jira tracking it TestWebHDFS#testNamenodeRestart fails with NullPointerException in trunk Key: HDFS-5839 URL: https://issues.apache.org/jira/browse/HDFS-5839 Project: Hadoop HDFS Issue Type: Bug Reporter: Ted Yu Assignee: Mit Desai Attachments: org.apache.hadoop.hdfs.web.TestWebHDFS-output.txt Here is test failure: {code} testNamenodeRestart(org.apache.hadoop.hdfs.web.TestWebHDFS) Time elapsed: 45.206 sec FAILURE! java.lang.AssertionError: There are 1 exception(s): Exception 0: org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): null at org.apache.hadoop.hdfs.web.JsonUtil.toRemoteException(JsonUtil.java:157) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.validateResponse(WebHdfsFileSystem.java:315) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$700(WebHdfsFileSystem.java:104) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.getResponse(WebHdfsFileSystem.java:615) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:532) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$OffsetUrlOpener.connect(WebHdfsFileSystem.java:878) at org.apache.hadoop.hdfs.web.ByteRangeInputStream.openInputStream(ByteRangeInputStream.java:119) at org.apache.hadoop.hdfs.web.ByteRangeInputStream.getInputStream(ByteRangeInputStream.java:103) at org.apache.hadoop.hdfs.web.ByteRangeInputStream.read(ByteRangeInputStream.java:180) at java.io.FilterInputStream.read(FilterInputStream.java:83) at org.apache.hadoop.hdfs.TestDFSClientRetries$5.run(TestDFSClientRetries.java:954) at java.lang.Thread.run(Thread.java:724) at org.junit.Assert.fail(Assert.java:93) at org.apache.hadoop.hdfs.TestDFSClientRetries.assertEmpty(TestDFSClientRetries.java:1083) at org.apache.hadoop.hdfs.TestDFSClientRetries.namenodeRestartTest(TestDFSClientRetries.java:1003) at org.apache.hadoop.hdfs.web.TestWebHDFS.testNamenodeRestart(TestWebHDFS.java:216) {code} From test output: {code} 2014-01-27 17:55:59,388 WARN resources.ExceptionHandler (ExceptionHandler.java:toResponse(92)) - INTERNAL_SERVER_ERROR java.lang.NullPointerException at org.apache.hadoop.hdfs.server.namenode.web.resources.NamenodeWebHdfsMethods.chooseDatanode(NamenodeWebHdfsMethods.java:166) at org.apache.hadoop.hdfs.server.namenode.web.resources.NamenodeWebHdfsMethods.redirectURI(NamenodeWebHdfsMethods.java:231) at org.apache.hadoop.hdfs.server.namenode.web.resources.NamenodeWebHdfsMethods.get(NamenodeWebHdfsMethods.java:658) at org.apache.hadoop.hdfs.server.namenode.web.resources.NamenodeWebHdfsMethods.access$400(NamenodeWebHdfsMethods.java:116) at org.apache.hadoop.hdfs.server.namenode.web.resources.NamenodeWebHdfsMethods$3.run(NamenodeWebHdfsMethods.java:631) at org.apache.hadoop.hdfs.server.namenode.web.resources.NamenodeWebHdfsMethods$3.run(NamenodeWebHdfsMethods.java:626) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1560) at org.apache.hadoop.hdfs.server.namenode.web.resources.NamenodeWebHdfsMethods.get(NamenodeWebHdfsMethods.java:626) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60) at com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$ResponseOutInvoker._dispatch(AbstractResourceMethodDispatchProvider.java:205) at com.sun.jersey.server.impl.model.method.dispatch.ResourceJavaMethodDispatcher.dispatch(ResourceJavaMethodDispatcher.java:75) at com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:288) at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147) at com.sun.jersey.server.impl.uri.rules.ResourceClassRule.accept(ResourceClassRule.java:108) at
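For context: the server-side trace shows the NullPointerException originating in NamenodeWebHdfsMethods#chooseDatanode while the restarted NameNode is still coming up. The sketch below only illustrates the general shape such a fix can take; the committed change is the HDFS-5857 patch, and the helper name and choice of RetriableException here are assumptions, not quotes from that patch.
{code}
// Hypothetical sketch, not the HDFS-5857 patch: reject WebHDFS requests that
// arrive before the restarted NameNode has initialized its namesystem, so the
// client sees a retriable error instead of an internal NullPointerException.
import org.apache.hadoop.hdfs.server.namenode.NameNode;
import org.apache.hadoop.ipc.RetriableException;

final class WebHdfsStartupGuard {
  static void checkNamenodeReady(NameNode namenode) throws RetriableException {
    if (namenode.getNamesystem() == null
        || namenode.getNamesystem().getBlockManager() == null) {
      throw new RetriableException("NameNode is still starting up; retry later");
    }
  }
}
{code}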
[jira] [Commented] (HDFS-6035) TestCacheDirectives#testCacheManagerRestart is failing on branch-2
[ https://issues.apache.org/jira/browse/HDFS-6035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13918110#comment-13918110 ] Mit Desai commented on HDFS-6035: - Thanks for taking this issue, Sathish. This test is failing in our nightly builds, but I am unable to reproduce it. Is there a specific way you were able to reproduce it? TestCacheDirectives#testCacheManagerRestart is failing on branch-2 -- Key: HDFS-6035 URL: https://issues.apache.org/jira/browse/HDFS-6035 Project: Hadoop HDFS Issue Type: Bug Components: test Affects Versions: 2.4.0 Reporter: Mit Desai Assignee: sathish Attachments: HDFS-6035-0001.patch {noformat} java.io.IOException: Inconsistent checkpoint fields. LV = -51 namespaceID = 1641397469 cTime = 0 ; clusterId = testClusterID ; blockpoolId = BP-423574854-x.x.x.x-1393478669835. Expecting respectively: -51; 2; 0; testClusterID; BP-2051361571-x.x.x.x-1393478572877. at org.apache.hadoop.hdfs.server.namenode.CheckpointSignature.validateStorageInfo(CheckpointSignature.java:133) at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doCheckpoint(SecondaryNameNode.java:526) at org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives.testCacheManagerRestart(TestCacheDirectives.java:582) {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
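The "Inconsistent checkpoint fields" failure above is a field-by-field mismatch between the CheckpointSignature the SecondaryNameNode cached and the storage the NameNode currently reports: in this run the namespaceID (1641397469 vs. 2) and blockpoolId differ, which suggests the cached signature belongs to an earlier mini-cluster namespace. A minimal sketch of that comparison, inferred from the exception text (getter names are assumptions, not quoted from the source tree):
{code}
// Sketch of the validation behind the failure above; the real logic lives in
// CheckpointSignature#validateStorageInfo. Every field of the cached
// signature must match the NameNode's current storage, or the checkpoint
// is rejected with the message seen in the log.
void validateStorageInfo(FSImage si) throws IOException {
  NNStorage storage = si.getStorage();
  if (layoutVersion != storage.getLayoutVersion()
      || namespaceID != storage.getNamespaceID()
      || cTime != storage.getCTime()
      || !clusterID.equals(storage.getClusterID())
      || !blockpoolID.equals(storage.getBlockPoolID())) {
    throw new IOException("Inconsistent checkpoint fields. LV = " + layoutVersion
        + " namespaceID = " + namespaceID + " cTime = " + cTime
        + " ; clusterId = " + clusterID + " ; blockpoolId = " + blockpoolID
        + ". Expecting respectively: " + storage.getLayoutVersion() + "; "
        + storage.getNamespaceID() + "; " + storage.getCTime() + "; "
        + storage.getClusterID() + "; " + storage.getBlockPoolID() + ".");
  }
}
{code}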
[jira] [Commented] (HDFS-5950) The DFSClient and DataNode should use shared memory segments to communicate short-circuit information
[ https://issues.apache.org/jira/browse/HDFS-5950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13918570#comment-13918570 ] Mit Desai commented on HDFS-5950: - Hey, I just found that this check-in causes a Release Audit Warning for the empty file _TestShortCircuitShm.java_. The DFSClient and DataNode should use shared memory segments to communicate short-circuit information - Key: HDFS-5950 URL: https://issues.apache.org/jira/browse/HDFS-5950 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, hdfs-client Affects Versions: 2.4.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Fix For: 2.4.0 Attachments: HDFS-5950.001.patch, HDFS-5950.003.patch, HDFS-5950.004.patch, HDFS-5950.006.patch, HDFS-5950.007.patch, HDFS-5950.008.patch The DFSClient and DataNode should use the shared memory segments and unified cache added in the other HDFS-5182 subtasks to communicate short-circuit information. -- This message was sent by Atlassian JIRA (v6.2#6252)
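A Release Audit (RAT) warning of this kind flags source files that lack the Apache license header, and a zero-byte .java file trips it. One conventional remedy, sketched here as an assumption rather than the actual follow-up commit, is to either delete the empty file or give it the standard ASF header and a minimal body:
{code}
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements. See the NOTICE file distributed with this
 * work for additional information regarding copyright ownership. The ASF
 * licenses this file to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance with the
 * License. You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
 * WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
 * License for the specific language governing permissions and limitations
 * under the License.
 */
package org.apache.hadoop.hdfs; // package chosen for illustration only

/** Placeholder body so the file is non-empty and carries the ASF header. */
public class TestShortCircuitShm {
}
{code}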
[jira] [Created] (HDFS-6035) TestCacheDirectives#testCacheManagerRestart is failing on branch-2
Mit Desai created HDFS-6035: --- Summary: TestCacheDirectives#testCacheManagerRestart is failing on branch-2 Key: HDFS-6035 URL: https://issues.apache.org/jira/browse/HDFS-6035 Project: Hadoop HDFS Issue Type: Bug Components: test Affects Versions: 2.4.0 Reporter: Mit Desai {noformat} java.io.IOException: Inconsistent checkpoint fields. LV = -51 namespaceID = 1641397469 cTime = 0 ; clusterId = testClusterID ; blockpoolId = BP-423574854-x.x.x.x-1393478669835. Expecting respectively: -51; 2; 0; testClusterID; BP-2051361571-x.x.x.x-1393478572877. at org.apache.hadoop.hdfs.server.namenode.CheckpointSignature.validateStorageInfo(CheckpointSignature.java:133) at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doCheckpoint(SecondaryNameNode.java:526) at org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives.testCacheManagerRestart(TestCacheDirectives.java:582) {noformat} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5780) TestRBWBlockInvalidation times out intermittently on branch-2
[ https://issues.apache.org/jira/browse/HDFS-5780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai updated HDFS-5780: Attachment: HDFS-5780-v3.patch New patch attached. There are no code changes from the previous patch; this patch only updates the comment noting that the thread timeout was changed from 1 sec to 2 sec. TestRBWBlockInvalidation times out intermittently on branch-2 Key: HDFS-5780 URL: https://issues.apache.org/jira/browse/HDFS-5780 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 3.0.0, 2.2.0 Reporter: Mit Desai Assignee: Mit Desai Attachments: HDFS-5780-v3.patch, HDFS-5780.patch, HDFS-5780.patch I recently found out that the test TestRBWBlockInvalidation#testBlockInvalidationWhenRBWReplicaMissedInDN times out intermittently. I am using Fedora, JDK7. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5780) TestRBWBlockInvalidation times out intermittently on branch-2
[ https://issues.apache.org/jira/browse/HDFS-5780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai updated HDFS-5780: Attachment: HDFS-5780.patch Attaching the patch. We need to change the conditions in the test because the failure happens when the Replication Monitor modifies the corrupted block before the test checks for it; the test then keeps waiting for a change that has already occurred. TestRBWBlockInvalidation times out intermittently on branch-2 Key: HDFS-5780 URL: https://issues.apache.org/jira/browse/HDFS-5780 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 3.0.0, 2.2.0 Reporter: Mit Desai Assignee: Mit Desai Attachments: HDFS-5780.patch I recently found out that the test TestRBWBlockInvalidation#testBlockInvalidationWhenRBWReplicaMissedInDN times out intermittently. I am using Fedora, JDK7. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
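Races of this shape are usually fixed by polling for the terminal state instead of asserting an intermediate one. A minimal sketch of that pattern, assuming the test's existing countReplicas() helper and Hadoop's stock GenericTestUtils.waitFor() (the method name and timeout values below are illustrative, not quoted from the patch):
{code}
// Illustrative shape of the fix, not the literal patch: poll until the
// cluster reaches the state the test ultimately expects, so it no longer
// matters whether the Replication Monitor acted before or after the check.
private void waitForReplicaCounts(final FSNamesystem namesystem,
    final ExtendedBlock blk, final int expectedLive) throws Exception {
  GenericTestUtils.waitFor(new Supplier<Boolean>() {
    @Override
    public Boolean get() {
      NumberReplicas n = countReplicas(namesystem, blk); // test's helper
      return n.corruptReplicas() == 0 && n.liveReplicas() == expectedLive;
    }
  }, 100, 60000); // check every 100 ms, fail the test after 60 s
}
{code}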
[jira] [Updated] (HDFS-5780) TestRBWBlockInvalidation times out intermittently on branch-2
[ https://issues.apache.org/jira/browse/HDFS-5780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai updated HDFS-5780: Affects Version/s: 3.0.0 Status: Patch Available (was: Open) TestRBWBlockInvalidation times out intermittently on branch-2 Key: HDFS-5780 URL: https://issues.apache.org/jira/browse/HDFS-5780 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.2.0, 3.0.0 Reporter: Mit Desai Assignee: Mit Desai Attachments: HDFS-5780.patch I recently found out that the test TestRBWBlockInvalidation#testBlockInvalidationWhenRBWReplicaMissedInDN times out intermittently. I am using Fedora, JDK7. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5780) TestRBWBlockInvalidation times out intermittently on branch-2
[ https://issues.apache.org/jira/browse/HDFS-5780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13901969#comment-13901969 ] Mit Desai commented on HDFS-5780: - Thanks, Arpit. I will address your concerns and post another patch. TestRBWBlockInvalidation times out intermittently on branch-2 Key: HDFS-5780 URL: https://issues.apache.org/jira/browse/HDFS-5780 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 3.0.0, 2.2.0 Reporter: Mit Desai Assignee: Mit Desai Attachments: HDFS-5780.patch I recently found out that the test TestRBWBlockInvalidation#testBlockInvalidationWhenRBWReplicaMissedInDN times out intermittently. I am using Fedora, JDK7. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5780) TestRBWBlockInvalidation times out intermittently on branch-2
[ https://issues.apache.org/jira/browse/HDFS-5780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai updated HDFS-5780: Status: Open (was: Patch Available) TestRBWBlockInvalidation times out intermittently on branch-2 Key: HDFS-5780 URL: https://issues.apache.org/jira/browse/HDFS-5780 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.2.0, 3.0.0 Reporter: Mit Desai Assignee: Mit Desai Attachments: HDFS-5780.patch I recently found out that the test TestRBWBlockInvalidation#testBlockInvalidationWhenRBWReplicaMissedInDN times out intermittently. I am using Fedora, JDK7. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5780) TestRBWBlockInvalidation times out intermittently on branch-2
[ https://issues.apache.org/jira/browse/HDFS-5780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai updated HDFS-5780: Attachment: HDFS-5780.patch Attaching the new patch with the addressed changes. I have increased the timeout to 10 minutes and made a few other timing-related changes. TestRBWBlockInvalidation times out intermittently on branch-2 Key: HDFS-5780 URL: https://issues.apache.org/jira/browse/HDFS-5780 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 3.0.0, 2.2.0 Reporter: Mit Desai Assignee: Mit Desai Attachments: HDFS-5780.patch, HDFS-5780.patch I recently found out that the test TestRBWBlockInvalidation#testBlockInvalidationWhenRBWReplicaMissedInDN times out intermittently. I am using Fedora, JDK7. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
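For reference, raising the ceiling to 10 minutes is a one-line change on the JUnit 4 annotation (value in milliseconds); a minimal sketch, with the body elided:
{code}
// @Test-level timeout of 10 minutes = 10 * 60 * 1000 ms. A generous bound
// leaves room for the Replication Monitor's timing instead of racing it.
@Test(timeout = 600000)
public void testBlockInvalidationWhenRBWReplicaMissedInDN() throws Exception {
  // ... test body unchanged ...
}
{code}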