[jira] [Commented] (HDFS-742) A down DataNode makes the Balancer hang, repeatedly asking the NameNode for its partial block list
[ https://issues.apache.org/jira/browse/HDFS-742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14644357#comment-14644357 ] Mit Desai commented on HDFS-742: Attached a modified patch. However, I still do not have a unit test for the fix. > A down DataNode makes the Balancer hang, repeatedly asking the NameNode for its partial block list > > > Key: HDFS-742 > URL: https://issues.apache.org/jira/browse/HDFS-742 > Project: Hadoop HDFS > Issue Type: Bug > Components: balancer & mover > Reporter: Hairong Kuang > Assignee: Mit Desai > Attachments: HDFS-742-trunk.patch, HDFS-742.patch > > > We had a balancer that had not made any progress for a long time. It turned out it was repeatedly asking the NameNode for a partial block list of one datanode, which had gone down while the balancer was running. > The NameNode should notify the Balancer that the datanode is not available, and the Balancer should stop asking for that datanode's block list. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
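The fix described in the issue can be sketched abstractly. The following is a hypothetical, simplified model of the change, not the actual Balancer code (all class and method names here are invented): the buggy behavior is an unbounded loop re-requesting a dead datanode's partial block list, and a liveness check from the NameNode bounds it.

```java
import java.util.Set;

// Hypothetical sketch, not the real Balancer: the reported bug kept
// re-issuing block-list RPCs for a dead datanode; checking the NameNode's
// view of live nodes lets the caller give up instead of looping forever.
public class BlockListFetcher {
    private final Set<String> liveDatanodes; // simulated NameNode view of live nodes
    private int rpcCalls = 0;

    public BlockListFetcher(Set<String> liveDatanodes) {
        this.liveDatanodes = liveDatanodes;
    }

    /**
     * Fetch a block list in {@code partsNeeded} partial pieces.
     * Returns the number of RPCs issued; for a dead datanode we stop after
     * the first call instead of retrying indefinitely.
     */
    public int fetchBlockList(String datanode, int partsNeeded) {
        int fetched = 0;
        while (fetched < partsNeeded) {
            rpcCalls++;
            if (!liveDatanodes.contains(datanode)) {
                break; // NameNode says the node is unavailable: stop asking
            }
            fetched++; // received one partial block list
        }
        return rpcCalls;
    }
}
```

A live node costs one RPC per partial list; a dead node costs exactly one probing RPC before the loop exits.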
[jira] [Updated] (HDFS-742) A down DataNode makes the Balancer hang, repeatedly asking the NameNode for its partial block list
[ https://issues.apache.org/jira/browse/HDFS-742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai updated HDFS-742: --- Attachment: HDFS-742-trunk.patch > A down DataNode makes the Balancer hang, repeatedly asking the NameNode for its partial block list > > > Key: HDFS-742 > URL: https://issues.apache.org/jira/browse/HDFS-742 > Project: Hadoop HDFS > Issue Type: Bug > Components: balancer & mover > Reporter: Hairong Kuang > Assignee: Mit Desai > Attachments: HDFS-742-trunk.patch, HDFS-742.patch > > > We had a balancer that had not made any progress for a long time. It turned out it was repeatedly asking the NameNode for a partial block list of one datanode, which had gone down while the balancer was running. > The NameNode should notify the Balancer that the datanode is not available, and the Balancer should stop asking for that datanode's block list. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7364) Balancer always shows zero Bytes Already Moved
[ https://issues.apache.org/jira/browse/HDFS-7364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14200425#comment-14200425 ] Mit Desai commented on HDFS-7364: - Nice catch. The balancer exits after 5 iterations that it believes moved 0 B, which means it is still balancing but exits in the middle of the process. I can see that Bytes Left To Move goes down in every iteration, so it would be nice to have this fixed. It would also be good to have a unit test. > Balancer always shows zero Bytes Already Moved > - > > Key: HDFS-7364 > URL: https://issues.apache.org/jira/browse/HDFS-7364 > Project: Hadoop HDFS > Issue Type: Bug > Components: balancer & mover > Reporter: Tsz Wo Nicholas Sze > Assignee: Tsz Wo Nicholas Sze > Attachments: h7364_20141105.patch > > > Here is an example:
> {noformat}
> Time Stamp              Iteration#  Bytes Already Moved  Bytes Left To Move  Bytes Being Moved
> Nov 5, 2014 5:23:38 PM  0           0 B                  116.82 MB           181.07 MB
> Nov 5, 2014 5:24:30 PM  1           0 B                  88.05 MB            181.07 MB
> Nov 5, 2014 5:25:10 PM  2           0 B                  73.08 MB            181.07 MB
> Nov 5, 2014 5:25:49 PM  3           0 B                  13.37 MB            90.53 MB
> Nov 5, 2014 5:26:30 PM  4           0 B                  13.59 MB            90.53 MB
> Nov 5, 2014 5:27:12 PM  5           0 B                  9.25 MB             90.53 MB
> The cluster is balanced. Exiting...
> {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
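To make the comment concrete: below is a minimal sketch of a "no progress for N consecutive iterations" exit check. The class and method names are invented for illustration and this is not the actual Balancer implementation, but it shows why a byte counter that is wrongly stuck at 0 B makes the balancer give up while work remains.

```java
// Assumed-name sketch of a balancer-style give-up heuristic: if the reported
// per-iteration byte count is always 0 (as in this bug), it exits prematurely
// even though Bytes Left To Move is still shrinking.
public class NoProgressTracker {
    private final int maxIdleIterations;
    private int idleIterations = 0;

    public NoProgressTracker(int maxIdleIterations) {
        this.maxIdleIterations = maxIdleIterations;
    }

    /** Record one iteration's moved bytes; returns true when the balancer should exit. */
    public boolean shouldExit(long bytesMovedThisIteration) {
        if (bytesMovedThisIteration == 0) {
            idleIterations++;       // another round with no visible progress
        } else {
            idleIterations = 0;     // real progress resets the counter
        }
        return idleIterations >= maxIdleIterations;
    }
}
```

With the reporting bug, every iteration feeds 0 into this check, so the exit condition fires on the 5th round regardless of actual progress.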
[jira] [Commented] (HDFS-7230) Add rolling downgrade documentation
[ https://issues.apache.org/jira/browse/HDFS-7230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14190226#comment-14190226 ] Mit Desai commented on HDFS-7230: - +1 (non-binding) Thanks for the patch [~szetszwo]. > Add rolling downgrade documentation > --- > > Key: HDFS-7230 > URL: https://issues.apache.org/jira/browse/HDFS-7230 > Project: Hadoop HDFS > Issue Type: Improvement > Components: documentation >Reporter: Tsz Wo Nicholas Sze >Assignee: Tsz Wo Nicholas Sze > Attachments: h7230_20141028.patch > > > HDFS-5535 made a lot of improvement on rolling upgrade. It also added the > cluster downgrade feature. However, the downgrade described in HDFS-5535 > requires cluster downtime. In this JIRA, we discuss how to do rolling > downgrade, i.e. downgrade without downtime. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-6687) nn.getNamesystem() may return NPE from JspHelper
[ https://issues.apache.org/jira/browse/HDFS-6687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai updated HDFS-6687: Target Version/s: 2.6.0 (was: 2.5.0) > nn.getNamesystem() may return NPE from JspHelper > > > Key: HDFS-6687 > URL: https://issues.apache.org/jira/browse/HDFS-6687 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.5.0 >Reporter: Mit Desai >Assignee: Mit Desai > > In hadoop-2, the http server is started in the very early stage to show the > progress. If the user tries to get the name system, it may not be completely > up and the NN logs will have this kind of error. > {noformat} > 2014-07-14 15:49:03,521 [***] WARN > resources.ExceptionHandler: INTERNAL_SERVER_ERROR > java.lang.NullPointerException > at > org.apache.hadoop.hdfs.server.common.JspHelper.getTokenUGI(JspHelper.java:661) > at > org.apache.hadoop.hdfs.server.common.JspHelper.getUGI(JspHelper.java:604) > at > org.apache.hadoop.hdfs.web.resources.UserProvider.getValue(UserProvider.java:53) > at > org.apache.hadoop.hdfs.web.resources.UserProvider.getValue(UserProvider.java:41) > at > com.sun.jersey.server.impl.inject.InjectableValuesProvider.getInjectableValues(InjectableValuesProvider.java:46) > at > com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$EntityParamInInvoker.getParams(AbstractResourceMethodDispatchProvider.java:153) > at > com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$ResponseOutInvoker._dispatch(AbstractResourceMethodDispatchProvider.java:203) > at > com.sun.jersey.server.impl.model.method.dispatch.ResourceJavaMethodDispatcher.dispatch(ResourceJavaMethodDispatcher.java:75) > at > com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:288) > at > com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147) > at > com.sun.jersey.server.impl.uri.rules.ResourceClassRule.accept(ResourceClassRule.java:108) > at > 
com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147) > at > com.sun.jersey.server.impl.uri.rules.RootResourceClassesRule.accept(RootResourceClassesRule.java:84) > at > com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1469) > at > com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1400) > at > com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1349) > at > com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1339) > at > com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:416) > at > com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:537) > at > com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:699) > at javax.servlet.http.HttpServlet.service(HttpServlet.java:820) > at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511) > at > org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1221) > at org.apache.hadoop.hdfs.web.AuthFilter.doFilter(AuthFilter.java:84) > at > org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) > at org.mortbay.servlet.UserAgentFilter.doFilter(UserAgentFilter.java:78) > at com.yahoo.hadoop.GzipFilter.doFilter(GzipFilter.java:220) > at > org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) > at > org.apache.hadoop.http.HttpServer2$QuotingInputFilter.doFilter(HttpServer2.java:1223) > at > org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) > at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45) > at > org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) > at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45) > at > 
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) > at > org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399) > at > org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216) > at > org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182) > at > org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766) > at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450) > at > org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114) > at > org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152) > at org.
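One way to see the shape of a defense against this failure mode: a minimal sketch of guarding against the namesystem not yet being initialized during early HTTP-server startup. The interface and names here are invented for illustration and this is not the actual JspHelper API or the committed fix.

```java
import java.io.IOException;

// Illustrative guard, not the real fix: fail with a descriptive, retriable
// IOException instead of letting a NullPointerException escape to Jersey
// as an INTERNAL_SERVER_ERROR.
public class NamesystemGuard {
    public interface NameNodeLike {
        Object getNamesystem(); // returns null while the NN is still starting
    }

    public static Object requireNamesystem(NameNodeLike nn) throws IOException {
        Object ns = nn.getNamesystem();
        if (ns == null) {
            throw new IOException(
                "Namesystem not initialized yet; NameNode is still starting up");
        }
        return ns;
    }
}
```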
[jira] [Created] (HDFS-6983) TestBalancer#testExitZeroOnSuccess fails intermittently
Mit Desai created HDFS-6983: --- Summary: TestBalancer#testExitZeroOnSuccess fails intermittently Key: HDFS-6983 URL: https://issues.apache.org/jira/browse/HDFS-6983 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.5.1 Reporter: Mit Desai TestBalancer#testExitZeroOnSuccess fails intermittently on branch-2, and probably on trunk too. The test failed about 1 in 20 times when I ran it in a loop. Here is how it fails.
{noformat}
org.apache.hadoop.hdfs.server.balancer.TestBalancer testExitZeroOnSuccess(org.apache.hadoop.hdfs.server.balancer.TestBalancer) Time elapsed: 53.965 sec <<< ERROR!
java.util.concurrent.TimeoutException: Rebalancing expected avg utilization to become 0.2, but on datanode 127.0.0.1:35502 it remains at 0.08 after more than 4 msec.
at org.apache.hadoop.hdfs.server.balancer.TestBalancer.waitForBalancer(TestBalancer.java:321)
at org.apache.hadoop.hdfs.server.balancer.TestBalancer.runBalancerCli(TestBalancer.java:632)
at org.apache.hadoop.hdfs.server.balancer.TestBalancer.doTest(TestBalancer.java:549)
at org.apache.hadoop.hdfs.server.balancer.TestBalancer.doTest(TestBalancer.java:437)
at org.apache.hadoop.hdfs.server.balancer.TestBalancer.oneNodeTest(TestBalancer.java:645)
at org.apache.hadoop.hdfs.server.balancer.TestBalancer.testExitZeroOnSuccess(TestBalancer.java:845)
Results :
Tests in error: TestBalancer.testExitZeroOnSuccess:845->oneNodeTest:645->doTest:437->doTest:549->runBalancerCli:632->waitForBalancer:321 Timeout
{noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6754) TestNamenodeCapacityReport.testXceiverCount may sometimes fail due to lack of retry
[ https://issues.apache.org/jira/browse/HDFS-6754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14076918#comment-14076918 ] Mit Desai commented on HDFS-6754: - Thanks Daryn! > TestNamenodeCapacityReport.testXceiverCount may sometimes fail due to lack of > retry > --- > > Key: HDFS-6754 > URL: https://issues.apache.org/jira/browse/HDFS-6754 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.6.0 >Reporter: Mit Desai >Assignee: Mit Desai > Attachments: HDFS-6754.patch, HDFS-6754.patch > > > I have seen TestNamenodeCapacityReport.testXceiverCount fail intermittently > in our nightly builds with the following error: > {noformat} > java.io.IOException: Unable to close file because the last block does not > have enough number of replicas. > at > org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:2151) > at > org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:2119) > at > org.apache.hadoop.hdfs.server.namenode.TestNamenodeCapacityReport.testXceiverCount(TestNamenodeCapacityReport.java:281) > {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6754) TestNamenodeCapacityReport.testXceiverCount may sometimes fail due to lack of retry
[ https://issues.apache.org/jira/browse/HDFS-6754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14076649#comment-14076649 ] Mit Desai commented on HDFS-6754: - These test failures are not related to the patch. > TestNamenodeCapacityReport.testXceiverCount may sometimes fail due to lack of > retry > --- > > Key: HDFS-6754 > URL: https://issues.apache.org/jira/browse/HDFS-6754 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.6.0 >Reporter: Mit Desai >Assignee: Mit Desai > Attachments: HDFS-6754.patch, HDFS-6754.patch > > > I have seen TestNamenodeCapacityReport.testXceiverCount fail intermittently > in our nightly builds with the following error: > {noformat} > java.io.IOException: Unable to close file because the last block does not > have enough number of replicas. > at > org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:2151) > at > org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:2119) > at > org.apache.hadoop.hdfs.server.namenode.TestNamenodeCapacityReport.testXceiverCount(TestNamenodeCapacityReport.java:281) > {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6754) TestNamenodeCapacityReport.testXceiverCount may sometimes fail due to lack of retry
[ https://issues.apache.org/jira/browse/HDFS-6754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai updated HDFS-6754: Status: Patch Available (was: Open) > TestNamenodeCapacityReport.testXceiverCount may sometimes fail due to lack of > retry > --- > > Key: HDFS-6754 > URL: https://issues.apache.org/jira/browse/HDFS-6754 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.6.0 >Reporter: Mit Desai >Assignee: Mit Desai > Attachments: HDFS-6754.patch, HDFS-6754.patch > > > I have seen TestNamenodeCapacityReport.testXceiverCount fail intermittently > in our nightly builds with the following error: > {noformat} > java.io.IOException: Unable to close file because the last block does not > have enough number of replicas. > at > org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:2151) > at > org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:2119) > at > org.apache.hadoop.hdfs.server.namenode.TestNamenodeCapacityReport.testXceiverCount(TestNamenodeCapacityReport.java:281) > {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6754) TestNamenodeCapacityReport.testXceiverCount may sometimes fail due to lack of retry
[ https://issues.apache.org/jira/browse/HDFS-6754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai updated HDFS-6754: Attachment: HDFS-6754.patch Refined patch to update the comment which described the changed line > TestNamenodeCapacityReport.testXceiverCount may sometimes fail due to lack of > retry > --- > > Key: HDFS-6754 > URL: https://issues.apache.org/jira/browse/HDFS-6754 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.6.0 >Reporter: Mit Desai >Assignee: Mit Desai > Attachments: HDFS-6754.patch, HDFS-6754.patch > > > I have seen TestNamenodeCapacityReport.testXceiverCount fail intermittently > in our nightly builds with the following error: > {noformat} > java.io.IOException: Unable to close file because the last block does not > have enough number of replicas. > at > org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:2151) > at > org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:2119) > at > org.apache.hadoop.hdfs.server.namenode.TestNamenodeCapacityReport.testXceiverCount(TestNamenodeCapacityReport.java:281) > {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6754) TestNamenodeCapacityReport.testXceiverCount may sometimes fail due to lack of retry
[ https://issues.apache.org/jira/browse/HDFS-6754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai updated HDFS-6754: Status: Open (was: Patch Available) > TestNamenodeCapacityReport.testXceiverCount may sometimes fail due to lack of > retry > --- > > Key: HDFS-6754 > URL: https://issues.apache.org/jira/browse/HDFS-6754 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.6.0 >Reporter: Mit Desai >Assignee: Mit Desai > Attachments: HDFS-6754.patch > > > I have seen TestNamenodeCapacityReport.testXceiverCount fail intermittently > in our nightly builds with the following error: > {noformat} > java.io.IOException: Unable to close file because the last block does not > have enough number of replicas. > at > org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:2151) > at > org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:2119) > at > org.apache.hadoop.hdfs.server.namenode.TestNamenodeCapacityReport.testXceiverCount(TestNamenodeCapacityReport.java:281) > {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6754) TestNamenodeCapacityReport.testXceiverCount may sometimes fail due to lack of retry
[ https://issues.apache.org/jira/browse/HDFS-6754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai updated HDFS-6754: Attachment: HDFS-6754.patch Attaching patch to enable retries. > TestNamenodeCapacityReport.testXceiverCount may sometimes fail due to lack of > retry > --- > > Key: HDFS-6754 > URL: https://issues.apache.org/jira/browse/HDFS-6754 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.6.0 >Reporter: Mit Desai >Assignee: Mit Desai > Attachments: HDFS-6754.patch > > > I have seen TestNamenodeCapacityReport.testXceiverCount fail intermittently > in our nightly builds with the following error: > {noformat} > java.io.IOException: Unable to close file because the last block does not > have enough number of replicas. > at > org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:2151) > at > org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:2119) > at > org.apache.hadoop.hdfs.server.namenode.TestNamenodeCapacityReport.testXceiverCount(TestNamenodeCapacityReport.java:281) > {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
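For context on what "enable retries" typically means for this failure: the client-side retry count used by DFSOutputStream while waiting for the last block to gain enough replicas is configurable. The property name below is my assumption about the knob involved (it exists in hdfs-default.xml, but whether the attached patch uses it is not stated here), so verify against the patch itself.

```xml
<!-- Assumed configuration knob (verify against the patch): number of client
     retries while waiting for the last block to gain enough replicas before
     "Unable to close file..." is thrown -->
<property>
  <name>dfs.client.block.write.locateFollowingBlock.retries</name>
  <value>10</value>
</property>
```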
[jira] [Updated] (HDFS-6754) TestNamenodeCapacityReport.testXceiverCount may sometimes fail due to lack of retry
[ https://issues.apache.org/jira/browse/HDFS-6754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai updated HDFS-6754: Status: Patch Available (was: Open) > TestNamenodeCapacityReport.testXceiverCount may sometimes fail due to lack of > retry > --- > > Key: HDFS-6754 > URL: https://issues.apache.org/jira/browse/HDFS-6754 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.6.0 >Reporter: Mit Desai >Assignee: Mit Desai > Attachments: HDFS-6754.patch > > > I have seen TestNamenodeCapacityReport.testXceiverCount fail intermittently > in our nightly builds with the following error: > {noformat} > java.io.IOException: Unable to close file because the last block does not > have enough number of replicas. > at > org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:2151) > at > org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:2119) > at > org.apache.hadoop.hdfs.server.namenode.TestNamenodeCapacityReport.testXceiverCount(TestNamenodeCapacityReport.java:281) > {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6755) There is an unnecessary sleep in the code path where DFSOutputStream#close gives up its attempt to contact the namenode
[ https://issues.apache.org/jira/browse/HDFS-6755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14075236#comment-14075236 ] Mit Desai commented on HDFS-6755: - Thanks Colin! > There is an unnecessary sleep in the code path where DFSOutputStream#close > gives up its attempt to contact the namenode > --- > > Key: HDFS-6755 > URL: https://issues.apache.org/jira/browse/HDFS-6755 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 2.6.0 >Reporter: Mit Desai >Assignee: Mit Desai > Attachments: HDFS-6755.patch > > > DFSOutputStream#close has a loop where it tries to contact the NameNode, to > call {{complete}} on the file which is open-for-write. This loop includes a > sleep which increases exponentially (exponential backoff). It makes sense to > sleep before re-contacting the NameNode, but the code also sleeps even in the > case where it has already decided to give up and throw an exception back to > the user. It should not sleep after it has already decided to give up, since > there's no point. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6755) Make DFSOutputStream more efficient
[ https://issues.apache.org/jira/browse/HDFS-6755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai updated HDFS-6755: Attachment: HDFS-6755.patch Hi [~cmccabe], I did not mean to get rid of the sleep. I have uploaded a patch to indicate the change I want to make: throw the IOException when {{retries == 0}} before {{Thread.sleep(localTimeout);}} is called. Does that seem reasonable? > Make DFSOutputStream more efficient > --- > > Key: HDFS-6755 > URL: https://issues.apache.org/jira/browse/HDFS-6755 > Project: Hadoop HDFS > Issue Type: Improvement > Affects Versions: 2.6.0 > Reporter: Mit Desai > Assignee: Mit Desai > Attachments: HDFS-6755.patch > > > The following code in DFSOutputStream may have an unnecessary sleep.
> {code}
> try {
>   Thread.sleep(localTimeout);
>   if (retries == 0) {
>     throw new IOException("Unable to close file because the last block"
>         + " does not have enough number of replicas.");
>   }
>   retries--;
>   localTimeout *= 2;
>   if (Time.now() - localstart > 5000) {
>     DFSClient.LOG.info("Could not complete " + src + " retrying...");
>   }
> } catch (InterruptedException ie) {
>   DFSClient.LOG.warn("Caught exception ", ie);
> }
> {code}
> Currently, the code sleeps before throwing the exception, which should not be the case. > The sleep time gets doubled on every iteration, which can have a significant effect if there is more than one iteration, and it would sleep just to throw an exception. We need to move the sleep down, after decrementing retries. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6755) Make DFSOutputStream more efficient
[ https://issues.apache.org/jira/browse/HDFS-6755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai updated HDFS-6755: Description: Following code in DFSOutputStream may have an unnecessary sleep. {code} try { Thread.sleep(localTimeout); if (retries == 0) { throw new IOException("Unable to close file because the last block" + " does not have enough number of replicas."); } retries--; localTimeout *= 2; if (Time.now() - localstart > 5000) { DFSClient.LOG.info("Could not complete " + src + " retrying..."); } } catch (InterruptedException ie) { DFSClient.LOG.warn("Caught exception ", ie); } {code} Currently, the code sleeps before throwing an exception which should not be the case. The sleep time gets doubled on every iteration, which can make a significant effect if there are more than one iterations and it would sleep just to throw an exception. We need to move the sleep down after decrementing retries. was: Following code in DFSOutputStream may have an unnecessary sleep. {code} try { Thread.sleep(localTimeout); if (retries == 0) { throw new IOException("Unable to close file because the last block" + " does not have enough number of replicas."); } retries--; localTimeout *= 2; if (Time.now() - localstart > 5000) { DFSClient.LOG.info("Could not complete " + src + " retrying..."); } } catch (InterruptedException ie) { DFSClient.LOG.warn("Caught exception ", ie); } {code} Currently, the code sleeps before throwing an exception which should not be the case. The sleep time gets doubled on every iteration, which can make a significant effect if there are more than one iterations. We need to move the sleep down after decrementing retries. > Make DFSOutputStream more efficient > --- > > Key: HDFS-6755 > URL: https://issues.apache.org/jira/browse/HDFS-6755 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 2.6.0 >Reporter: Mit Desai >Assignee: Mit Desai > > Following code in DFSOutputStream may have an unnecessary sleep. 
> {code} > try { > Thread.sleep(localTimeout); > if (retries == 0) { > throw new IOException("Unable to close file because the last > block" > + " does not have enough number of replicas."); > } > retries--; > localTimeout *= 2; > if (Time.now() - localstart > 5000) { > DFSClient.LOG.info("Could not complete " + src + " retrying..."); > } > } catch (InterruptedException ie) { > DFSClient.LOG.warn("Caught exception ", ie); > } > {code} > Currently, the code sleeps before throwing an exception which should not be > the case. > The sleep time gets doubled on every iteration, which can make a significant > effect if there are more than one iterations and it would sleep just to throw > an exception. We need to move the sleep down after decrementing retries. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6755) Make DFSOutputStream more efficient
[ https://issues.apache.org/jira/browse/HDFS-6755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai updated HDFS-6755: Issue Type: Improvement (was: Bug) > Make DFSOutputStream more efficient > --- > > Key: HDFS-6755 > URL: https://issues.apache.org/jira/browse/HDFS-6755 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 2.6.0 >Reporter: Mit Desai >Assignee: Mit Desai > > Following code in DFSOutputStream may have an unnecessary sleep. > {code} > try { > Thread.sleep(localTimeout); > if (retries == 0) { > throw new IOException("Unable to close file because the last > block" > + " does not have enough number of replicas."); > } > retries--; > localTimeout *= 2; > if (Time.now() - localstart > 5000) { > DFSClient.LOG.info("Could not complete " + src + " retrying..."); > } > } catch (InterruptedException ie) { > DFSClient.LOG.warn("Caught exception ", ie); > } > {code} > Currently, the code sleeps before throwing an exception which should not be > the case. > The sleep time gets doubled on every iteration, which can make a significant > effect if there are more than one iterations. We need to move the sleep down > after decrementing retries. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HDFS-6755) Make DFSOutputStream more efficient
Mit Desai created HDFS-6755: --- Summary: Make DFSOutputStream more efficient Key: HDFS-6755 URL: https://issues.apache.org/jira/browse/HDFS-6755 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.6.0 Reporter: Mit Desai Assignee: Mit Desai The following code in DFSOutputStream may have an unnecessary sleep.
{code}
try {
  Thread.sleep(localTimeout);
  if (retries == 0) {
    throw new IOException("Unable to close file because the last block"
        + " does not have enough number of replicas.");
  }
  retries--;
  localTimeout *= 2;
  if (Time.now() - localstart > 5000) {
    DFSClient.LOG.info("Could not complete " + src + " retrying...");
  }
} catch (InterruptedException ie) {
  DFSClient.LOG.warn("Caught exception ", ie);
}
{code}
Currently, the code sleeps before throwing the exception, which should not be the case. The sleep time gets doubled on every iteration, which can have a significant effect if there is more than one iteration. We need to move the sleep down, after decrementing retries. -- This message was sent by Atlassian JIRA (v6.2#6252)
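The proposed reordering can be sketched as follows. This is a simplified, self-contained stand-in for the DFSOutputStream completion loop quoted in the description, not the committed patch: the point is that the retries-exhausted check and the throw happen *before* the sleep, so the final (and largest, due to doubling) backoff interval is never paid just to fail.

```java
import java.io.IOException;

// Simplified stand-in for the DFSOutputStream completeFile loop: give up
// *before* sleeping, so no time is spent on a backoff that only precedes
// a throw. The Attempt interface replaces the real NameNode RPC.
public class CompleteFileLoop {
    public interface Attempt {
        boolean complete(); // stand-in for asking the NN to complete the file
    }

    /** Returns total milliseconds requested for sleep before success. */
    public static long run(Attempt attempt, int retries, long initialTimeoutMs)
            throws IOException {
        long sleptMs = 0;
        long localTimeout = initialTimeoutMs;
        while (!attempt.complete()) {
            if (retries == 0) {
                // Reordered check: throw immediately instead of sleeping first.
                throw new IOException("Unable to close file because the last block"
                    + " does not have enough number of replicas.");
            }
            retries--;
            try {
                Thread.sleep(localTimeout);
                sleptMs += localTimeout;
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt();
            }
            localTimeout *= 2; // exponential backoff before the next attempt
        }
        return sleptMs;
    }
}
```

With retries exhausted, the old ordering would still sleep the doubled timeout once more before throwing; this version throws straight away.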
[jira] [Moved] (HDFS-6754) TestNamenodeCapacityReport.testXceiverCount may sometimes fail due to lack of retry
[ https://issues.apache.org/jira/browse/HDFS-6754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai moved YARN-2358 to HDFS-6754: --- Target Version/s: 2.6.0 (was: 2.6.0) Affects Version/s: (was: 2.6.0) 2.6.0 Key: HDFS-6754 (was: YARN-2358) Project: Hadoop HDFS (was: Hadoop YARN) > TestNamenodeCapacityReport.testXceiverCount may sometimes fail due to lack of > retry > --- > > Key: HDFS-6754 > URL: https://issues.apache.org/jira/browse/HDFS-6754 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.6.0 >Reporter: Mit Desai >Assignee: Mit Desai > > I have seen TestNamenodeCapacityReport.testXceiverCount fail intermittently > in our nightly builds with the following error: > {noformat} > java.io.IOException: Unable to close file because the last block does not > have enough number of replicas. > at > org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:2151) > at > org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:2119) > at > org.apache.hadoop.hdfs.server.namenode.TestNamenodeCapacityReport.testXceiverCount(TestNamenodeCapacityReport.java:281) > {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6696) Name node cannot start if the path of a file under construction contains ".snapshot"
[ https://issues.apache.org/jira/browse/HDFS-6696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14069235#comment-14069235 ] Mit Desai commented on HDFS-6696: - [~andrew.wang], we were trying to upgrade from 0.21.11 to 2.4.0. > Name node cannot start if the path of a file under construction contains ".snapshot" > > > Key: HDFS-6696 > URL: https://issues.apache.org/jira/browse/HDFS-6696 > Project: Hadoop HDFS > Issue Type: Bug > Reporter: Kihwal Lee > Assignee: Andrew Wang > Priority: Blocker > > Using {{-renameReserved}} to rename ".snapshot" in a pre-hdfs-snapshot-feature fsimage during upgrade only works if there is nothing under construction under the renamed directory. I am not sure whether it takes care of edits containing ".snapshot" properly. > The workaround is to identify these directories and rename them, then do {{saveNamespace}} before performing the upgrade. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6691) The message on NN UI can be confusing during a rolling upgrade
[ https://issues.apache.org/jira/browse/HDFS-6691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai updated HDFS-6691: Attachment: ha2.png > The message on NN UI can be confusing during a rolling upgrade > --- > > Key: HDFS-6691 > URL: https://issues.apache.org/jira/browse/HDFS-6691 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.5.0 >Reporter: Mit Desai >Assignee: Mit Desai > Attachments: ha1.png, ha2.png > > > On ANN, it says rollback image was created. On SBN, it says otherwise. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6691) The message on NN UI can be confusing during a rolling upgrade
[ https://issues.apache.org/jira/browse/HDFS-6691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai updated HDFS-6691: Attachment: ha1.png > The message on NN UI can be confusing during a rolling upgrade > --- > > Key: HDFS-6691 > URL: https://issues.apache.org/jira/browse/HDFS-6691 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.5.0 >Reporter: Mit Desai >Assignee: Mit Desai > Attachments: ha1.png > > > On ANN, it says rollback image was created. On SBN, it says otherwise. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HDFS-6691) The message on NN UI can be confusing during a rolling upgrade
Mit Desai created HDFS-6691: --- Summary: The message on NN UI can be confusing during a rolling upgrade Key: HDFS-6691 URL: https://issues.apache.org/jira/browse/HDFS-6691 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.5.0 Reporter: Mit Desai Assignee: Mit Desai Attachments: ha1.png On ANN, it says rollback image was created. On SBN, it says otherwise. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HDFS-6687) nn.getNamesystem() may return NPE from JspHelper
Mit Desai created HDFS-6687: --- Summary: nn.getNamesystem() may return NPE from JspHelper Key: HDFS-6687 URL: https://issues.apache.org/jira/browse/HDFS-6687 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.5.0 Reporter: Mit Desai Assignee: Mit Desai In hadoop-2, the http server is started in the very early stage to show the progress. If the user tries to get the name system, it may not be completely up and the NN logs will have this kind of error. {noformat} 2014-07-14 15:49:03,521 [***] WARN resources.ExceptionHandler: INTERNAL_SERVER_ERROR java.lang.NullPointerException at org.apache.hadoop.hdfs.server.common.JspHelper.getTokenUGI(JspHelper.java:661) at org.apache.hadoop.hdfs.server.common.JspHelper.getUGI(JspHelper.java:604) at org.apache.hadoop.hdfs.web.resources.UserProvider.getValue(UserProvider.java:53) at org.apache.hadoop.hdfs.web.resources.UserProvider.getValue(UserProvider.java:41) at com.sun.jersey.server.impl.inject.InjectableValuesProvider.getInjectableValues(InjectableValuesProvider.java:46) at com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$EntityParamInInvoker.getParams(AbstractResourceMethodDispatchProvider.java:153) at com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$ResponseOutInvoker._dispatch(AbstractResourceMethodDispatchProvider.java:203) at com.sun.jersey.server.impl.model.method.dispatch.ResourceJavaMethodDispatcher.dispatch(ResourceJavaMethodDispatcher.java:75) at com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:288) at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147) at com.sun.jersey.server.impl.uri.rules.ResourceClassRule.accept(ResourceClassRule.java:108) at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147) at com.sun.jersey.server.impl.uri.rules.RootResourceClassesRule.accept(RootResourceClassesRule.java:84) at 
com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1469) at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1400) at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1349) at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1339) at com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:416) at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:537) at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:699) at javax.servlet.http.HttpServlet.service(HttpServlet.java:820) at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1221) at org.apache.hadoop.hdfs.web.AuthFilter.doFilter(AuthFilter.java:84) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) at org.mortbay.servlet.UserAgentFilter.doFilter(UserAgentFilter.java:78) at com.yahoo.hadoop.GzipFilter.doFilter(GzipFilter.java:220) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) at org.apache.hadoop.http.HttpServer2$QuotingInputFilter.doFilter(HttpServer2.java:1223) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399) at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216) at 
org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182) at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766) at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450) at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114) at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152) at org.mortbay.jetty.Server.handle(Server.java:326) at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542) at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928) at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549) at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212) at org.mort
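The NPE above comes from the namesystem not yet being initialized while the early-started HTTP server is already accepting requests. A minimal, self-contained Java sketch of the kind of guard that avoids it — all names here (JspHelperSketch, getTokenUser) are hypothetical, not the actual Hadoop API:

```java
// Hypothetical sketch (not the actual Hadoop API): fail fast with a
// descriptive error while the namesystem is still initializing, instead
// of letting a NullPointerException surface as INTERNAL_SERVER_ERROR.
public class JspHelperSketch {
    // Stand-in for the namesystem reference that may still be null
    // while the NameNode is starting up.
    static Object namesystem = null;

    // Returns a user name once the namesystem is available.
    static String getTokenUser() {
        if (namesystem == null) {
            throw new IllegalStateException(
                "NameNode is starting up; namesystem not yet initialized");
        }
        return "user";
    }

    public static void main(String[] args) {
        try {
            getTokenUser();
        } catch (IllegalStateException e) {
            System.out.println("request rejected: " + e.getMessage());
        }
    }
}
```

The point of the guard is that a web client gets a retriable, self-explanatory failure rather than a 500 with a bare stack trace.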
[jira] [Commented] (HDFS-6597) Add a new option to NN upgrade to terminate the process after upgrade on NN is completed
[ https://issues.apache.org/jira/browse/HDFS-6597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14042525#comment-14042525 ] Mit Desai commented on HDFS-6597: - Changing the summary to describe the jira more accurately. > Add a new option to NN upgrade to terminate the process after upgrade on NN > is completed > > > Key: HDFS-6597 > URL: https://issues.apache.org/jira/browse/HDFS-6597 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Reporter: Danilo Vunjak > Attachments: JIRA-HDFS-30.patch > > > Currently when namenode is started for upgrade (hadoop namenode -upgrade > command), after finishing upgrade of metadata, namenode starts working > normally and wait for datanodes to upgrade itself and connect to to NN. We > need to have option for upgrading only NN metadata, so after upgrade is > finished on NN, process should terminate. > I have tested it by changing in file: hdfs.server.namenode.NameNode.java, > method: public static NameNode createNameNode(String argv[], Configuration > conf): > in switch added > case UPGRADE: > case UPGRADE: > { > DefaultMetricsSystem.initialize("NameNode"); > NameNode nameNode = new NameNode(conf); > if (startOpt.getForceUpgrade()) { > terminate(0); > return null; > } > > return nameNode; > } > This did upgrade of metadata, closed process after finished, and later when > all services were started, upgrade of datanodes finished sucessfully and > system run . > What I'm suggesting right now is to add new startup parameter "-force", so > namenode can be started like this "hadoop namenode -upgrade -force", so we > can indicate that we want to terminate process after upgrade metadata on NN > is finished. Old functionality should be preserved, so users can run "hadoop > namenode -upgrade" on same way and with same behaviour as it was previous. > Thanks, > Danilo -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6597) Add a new option to NN upgrade to terminate the process after upgrade on NN is completed
[ https://issues.apache.org/jira/browse/HDFS-6597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai updated HDFS-6597: Summary: Add a new option to NN upgrade to terminate the process after upgrade on NN is completed (was: New option for namenode upgrade) > Add a new option to NN upgrade to terminate the process after upgrade on NN > is completed > > > Key: HDFS-6597 > URL: https://issues.apache.org/jira/browse/HDFS-6597 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Reporter: Danilo Vunjak > Attachments: JIRA-HDFS-30.patch > > > Currently when namenode is started for upgrade (hadoop namenode -upgrade > command), after finishing upgrade of metadata, namenode starts working > normally and wait for datanodes to upgrade itself and connect to to NN. We > need to have option for upgrading only NN metadata, so after upgrade is > finished on NN, process should terminate. > I have tested it by changing in file: hdfs.server.namenode.NameNode.java, > method: public static NameNode createNameNode(String argv[], Configuration > conf): > in switch added > case UPGRADE: > case UPGRADE: > { > DefaultMetricsSystem.initialize("NameNode"); > NameNode nameNode = new NameNode(conf); > if (startOpt.getForceUpgrade()) { > terminate(0); > return null; > } > > return nameNode; > } > This did upgrade of metadata, closed process after finished, and later when > all services were started, upgrade of datanodes finished sucessfully and > system run . > What I'm suggesting right now is to add new startup parameter "-force", so > namenode can be started like this "hadoop namenode -upgrade -force", so we > can indicate that we want to terminate process after upgrade metadata on NN > is finished. Old functionality should be preserved, so users can run "hadoop > namenode -upgrade" on same way and with same behaviour as it was previous. > Thanks, > Danilo -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6597) New option for namenode upgrade
[ https://issues.apache.org/jira/browse/HDFS-6597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14042518#comment-14042518 ] Mit Desai commented on HDFS-6597: - The idea seems good as it does not alter the way -upgrade currently works. * I agree with [~cmccabe] and [~cnauroth] on the new name "force" * Instead of -force, -halt or -upgradeOnly seem reasonable. But anything would be good as long as it does not imply we are forcing something to get done which it should not be doing. Thanks, Mit > New option for namenode upgrade > --- > > Key: HDFS-6597 > URL: https://issues.apache.org/jira/browse/HDFS-6597 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Reporter: Danilo Vunjak > Attachments: JIRA-HDFS-30.patch > > > Currently when namenode is started for upgrade (hadoop namenode -upgrade > command), after finishing upgrade of metadata, namenode starts working > normally and wait for datanodes to upgrade itself and connect to to NN. We > need to have option for upgrading only NN metadata, so after upgrade is > finished on NN, process should terminate. > I have tested it by changing in file: hdfs.server.namenode.NameNode.java, > method: public static NameNode createNameNode(String argv[], Configuration > conf): > in switch added > case UPGRADE: > case UPGRADE: > { > DefaultMetricsSystem.initialize("NameNode"); > NameNode nameNode = new NameNode(conf); > if (startOpt.getForceUpgrade()) { > terminate(0); > return null; > } > > return nameNode; > } > This did upgrade of metadata, closed process after finished, and later when > all services were started, upgrade of datanodes finished sucessfully and > system run . > What I'm suggesting right now is to add new startup parameter "-force", so > namenode can be started like this "hadoop namenode -upgrade -force", so we > can indicate that we want to terminate process after upgrade metadata on NN > is finished. 
Old functionality should be preserved, so users can run "hadoop > namenode -upgrade" on same way and with same behaviour as it was previous. > Thanks, > Danilo -- This message was sent by Atlassian JIRA (v6.2#6252)
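The behaviour being proposed above can be sketched in a small, self-contained form: terminate after the metadata upgrade only when an extra flag accompanies -upgrade, so plain -upgrade keeps its old behaviour. The flag name (-upgradeOnly) and all method names here are illustrative, not the patch's actual code:

```java
// Illustrative sketch, not the actual NameNode startup code: decide from
// the startup arguments whether the process should exit after the NN
// metadata upgrade completes.
public class UpgradeOnlySketch {
    // True only when both -upgrade and the (hypothetical) -upgradeOnly
    // flag are present; plain -upgrade preserves the old behaviour.
    static boolean shouldTerminateAfterUpgrade(String[] argv) {
        boolean upgrade = false, upgradeOnly = false;
        for (String a : argv) {
            if (a.equalsIgnoreCase("-upgrade")) upgrade = true;
            if (a.equalsIgnoreCase("-upgradeOnly")) upgradeOnly = true;
        }
        return upgrade && upgradeOnly;
    }

    public static void main(String[] args) {
        String[] argv = {"-upgrade", "-upgradeOnly"};
        System.out.println("terminate after upgrade: "
            + shouldTerminateAfterUpgrade(argv));
    }
}
```

Keeping the new flag strictly additive is what makes the change backward compatible for existing upgrade scripts.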
[jira] [Commented] (HDFS-742) A down DataNode makes Balancer to hang on repeatingly asking NameNode its partial block list
[ https://issues.apache.org/jira/browse/HDFS-742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14026612#comment-14026612 ] Mit Desai commented on HDFS-742: Attaching the patch. Unfortunately I do not have a way to reproduce the issue, so I'm unable to add a test to verify the change. Here is an explanation of the part of the Balancer code that makes it hang forever. In the following while loop in Balancer.java, when the Balancer figures out that it should fetch more blocks, it fetches the block list and decrements {{blocksToReceive}} by the number of blocks received. It then starts again from the top of the loop. {code} while(!isTimeUp && getScheduledSize()>0 && (!srcBlockList.isEmpty() || blocksToReceive>0)) { ## SOME LINES OMITTED ## filterMovedBlocks(); // filter already moved blocks if (shouldFetchMoreBlocks()) { // fetch new blocks try { blocksToReceive -= getBlockList(); continue; } catch (IOException e) { ## SOME LINES OMITTED ## // check if time is up or not if (Time.now()-startTime > MAX_ITERATION_TIME) { isTimeUp = true; continue; } ## SOME LINES OMITTED ## } {code} The problem is that if the datanode is decommissioned, the {{getBlockList()}} method will not return anything and {{blocksToReceive}} will not change. The loop keeps doing this indefinitely because {{blocksToReceive}} always stays greater than 0. {{isTimeUp}} is never set to true because that part of the code is never reached. In the submitted patch, the time-up check is moved to the top of the loop, so the loop first checks whether {{isTimeUp}} is set and proceeds only if time is not up. 
> A down DataNode makes Balancer to hang on repeatingly asking NameNode its > partial block list > > > Key: HDFS-742 > URL: https://issues.apache.org/jira/browse/HDFS-742 > Project: Hadoop HDFS > Issue Type: Bug > Components: balancer >Reporter: Hairong Kuang >Assignee: Mit Desai > Attachments: HDFS-742.patch > > > We had a balancer that had not made any progress for a long time. It turned > out it was repeatingly asking Namenode for a partial block list of one > datanode, which was done while the balancer was running. > NameNode should notify Balancer that the datanode is not available and > Balancer should stop asking for the datanode's block list. -- This message was sent by Atlassian JIRA (v6.2#6252)
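The hang described in the comment above can be modeled with a small, self-contained sketch (hypothetical names, with an iteration counter standing in for MAX_ITERATION_TIME; this is not the actual Balancer code). With the give-up check at the top of the loop, a fetch that returns zero blocks — as from a down datanode — can no longer loop forever:

```java
// Simplified model of the fix: checking the give-up condition at the top
// of each iteration bounds the loop even when every fetch returns nothing.
public class BalancerLoopSketch {
    static final long MAX_ITERATIONS = 1000; // stands in for MAX_ITERATION_TIME

    // Runs the dispatch loop; returns how many iterations executed.
    static long dispatch(long blocksToReceive, long blocksPerFetch) {
        long iterations = 0;
        while (blocksToReceive > 0) {
            if (iterations >= MAX_ITERATIONS) {
                break; // give-up check first: guarantees termination
            }
            iterations++;
            // A down/decommissioned datanode yields an empty block list,
            // i.e. a fetch of 0, which previously left blocksToReceive
            // unchanged and the loop spinning indefinitely.
            blocksToReceive -= blocksPerFetch;
        }
        return iterations;
    }

    public static void main(String[] args) {
        System.out.println("healthy DN: " + dispatch(10, 5) + " iterations");
        System.out.println("down DN:    " + dispatch(10, 0) + " iterations");
    }
}
```

The healthy case terminates because the counter reaches zero; the down-DN case terminates because the bound is consulted before each fetch, not only on the unreachable error path.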
[jira] [Updated] (HDFS-742) A down DataNode makes Balancer to hang on repeatingly asking NameNode its partial block list
[ https://issues.apache.org/jira/browse/HDFS-742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai updated HDFS-742: --- Attachment: HDFS-742.patch > A down DataNode makes Balancer to hang on repeatingly asking NameNode its > partial block list > > > Key: HDFS-742 > URL: https://issues.apache.org/jira/browse/HDFS-742 > Project: Hadoop HDFS > Issue Type: Bug > Components: balancer >Reporter: Hairong Kuang >Assignee: Mit Desai > Attachments: HDFS-742.patch > > > We had a balancer that had not made any progress for a long time. It turned > out it was repeatingly asking Namenode for a partial block list of one > datanode, which was done while the balancer was running. > NameNode should notify Balancer that the datanode is not available and > Balancer should stop asking for the datanode's block list. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6487) TestStandbyCheckpoint#testSBNCheckpoints is racy
[ https://issues.apache.org/jira/browse/HDFS-6487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14020413#comment-14020413 ] Mit Desai commented on HDFS-6487: - Thanks Andrew! > TestStandbyCheckpoint#testSBNCheckpoints is racy > > > Key: HDFS-6487 > URL: https://issues.apache.org/jira/browse/HDFS-6487 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.0.0, 2.4.1 >Reporter: Mit Desai >Assignee: Mit Desai > Attachments: HDFS-6487.patch, HDFS-6487.patch > > > testSBNCheckpoints fails occasionally. > I could not reproduce it consistently but it would fail 8 out of 10 times > after I did mvn clean, mvn install and run the test -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6159) TestBalancerWithNodeGroup.testBalancerWithNodeGroup fails if there is block missing after balancer success
[ https://issues.apache.org/jira/browse/HDFS-6159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14020222#comment-14020222 ] Mit Desai commented on HDFS-6159: - This test is failing again. https://builds.apache.org/job/PreCommit-HDFS-Build/7045//testReport/ [~airbots], can you take a look at this pre-commit? > TestBalancerWithNodeGroup.testBalancerWithNodeGroup fails if there is block > missing after balancer success > -- > > Key: HDFS-6159 > URL: https://issues.apache.org/jira/browse/HDFS-6159 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Affects Versions: 2.3.0 >Reporter: Chen He >Assignee: Chen He > Fix For: 3.0.0, 2.5.0 > > Attachments: HDFS-6159-v2.patch, HDFS-6159-v2.patch, HDFS-6159.patch, > logs.txt > > > The TestBalancerWithNodeGroup.testBalancerWithNodeGroup will report negative > false failure if there is(are) data block(s) losing after balancer > successfuly finishes. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6487) TestStandbyCheckpoint#testSBNCheckpoints is racy
[ https://issues.apache.org/jira/browse/HDFS-6487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14020213#comment-14020213 ] Mit Desai commented on HDFS-6487: - The failure is not related to the patch submitted; it has been there for a long time. HDFS-5807 and HDFS-6159 were filed to resolve it. I will comment on those JIRAs. > TestStandbyCheckpoint#testSBNCheckpoints is racy > > > Key: HDFS-6487 > URL: https://issues.apache.org/jira/browse/HDFS-6487 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.0.0, 2.4.1 >Reporter: Mit Desai >Assignee: Mit Desai > Attachments: HDFS-6487.patch, HDFS-6487.patch > > > testSBNCheckpoints fails occasionally. > I could not reproduce it consistently but it would fail 8 out of 10 times > after I did mvn clean, mvn install and run the test -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6487) TestStandbyCheckpoint#testSBNCheckpoints is racy
[ https://issues.apache.org/jira/browse/HDFS-6487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai updated HDFS-6487: Status: Patch Available (was: Open) > TestStandbyCheckpoint#testSBNCheckpoints is racy > > > Key: HDFS-6487 > URL: https://issues.apache.org/jira/browse/HDFS-6487 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.0.0, 2.4.1 >Reporter: Mit Desai >Assignee: Mit Desai > Attachments: HDFS-6487.patch, HDFS-6487.patch > > > testSBNCheckpoints fails occasionally. > I could not reproduce it consistently but it would fail 8 out of 10 times > after I did mvn clean, mvn install and run the test -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6487) TestStandbyCheckpoint#testSBNCheckpoints is racy
[ https://issues.apache.org/jira/browse/HDFS-6487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai updated HDFS-6487: Attachment: HDFS-6487.patch Attaching the updated patch. > TestStandbyCheckpoint#testSBNCheckpoints is racy > > > Key: HDFS-6487 > URL: https://issues.apache.org/jira/browse/HDFS-6487 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.0.0, 2.4.1 >Reporter: Mit Desai >Assignee: Mit Desai > Attachments: HDFS-6487.patch, HDFS-6487.patch > > > testSBNCheckpoints fails occasionally. > I could not reproduce it consistently but it would fail 8 out of 10 times > after I did mvn clean, mvn install and run the test -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6487) TestStandbyCheckpoint#testSBNCheckpoints is racy
[ https://issues.apache.org/jira/browse/HDFS-6487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai updated HDFS-6487: Status: Open (was: Patch Available) > TestStandbyCheckpoint#testSBNCheckpoints is racy > > > Key: HDFS-6487 > URL: https://issues.apache.org/jira/browse/HDFS-6487 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.0.0, 2.4.1 >Reporter: Mit Desai >Assignee: Mit Desai > Attachments: HDFS-6487.patch > > > testSBNCheckpoints fails occasionally. > I could not reproduce it consistently but it would fail 8 out of 10 times > after I did mvn clean, mvn install and run the test -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6487) TestStandbyCheckpoint#testSBNCheckpoints is racy
[ https://issues.apache.org/jira/browse/HDFS-6487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14019320#comment-14019320 ] Mit Desai commented on HDFS-6487: - Thanks Andrew for looking into the patch. Using GenericTestUtils.waitFor looks like a better option. I will update my patch. For the timeout, 5 sec works for me now, but I will increase it to 60s. (It does not hurt to wait a little longer; the test will come out of the wait before that time anyway.) > TestStandbyCheckpoint#testSBNCheckpoints is racy > > > Key: HDFS-6487 > URL: https://issues.apache.org/jira/browse/HDFS-6487 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.0.0, 2.4.1 >Reporter: Mit Desai >Assignee: Mit Desai > Attachments: HDFS-6487.patch > > > testSBNCheckpoints fails occasionally. > I could not reproduce it consistently but it would fail 8 out of 10 times > after I did mvn clean, mvn install and run the test -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6487) TestStandbyCheckpoint#testSBNCheckpoints is racy
[ https://issues.apache.org/jira/browse/HDFS-6487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai updated HDFS-6487: Affects Version/s: (was: 2.5.0) 2.4.1 Status: Patch Available (was: Open) > TestStandbyCheckpoint#testSBNCheckpoints is racy > > > Key: HDFS-6487 > URL: https://issues.apache.org/jira/browse/HDFS-6487 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.0.0, 2.4.1 >Reporter: Mit Desai >Assignee: Mit Desai > Attachments: HDFS-6487.patch > > > testSBNCheckpoints fails occasionally. > I could not reproduce it consistently but it would fail 8 out of 10 times > after I did mvn clean, mvn install and run the test -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6487) TestStandbyCheckpoint#testSBNCheckpoints is racy
[ https://issues.apache.org/jira/browse/HDFS-6487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14018953#comment-14018953 ] Mit Desai commented on HDFS-6487: - In testSBNCheckpoints, after doEdits() the test waits for the SBN to do a checkpoint and immediately after that checks whether the OIV image has been written. The race lies between the completion of the checkpoint and the check for the OIV image. I have added a wait for the OIV image to be written. This prevents the test from failing due to the race; if the OIV image is still not written after 5000ms, the test fails, which is what is expected. > TestStandbyCheckpoint#testSBNCheckpoints is racy > > > Key: HDFS-6487 > URL: https://issues.apache.org/jira/browse/HDFS-6487 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.0.0, 2.4.1 >Reporter: Mit Desai >Assignee: Mit Desai > Attachments: HDFS-6487.patch > > > testSBNCheckpoints fails occasionally. > I could not reproduce it consistently but it would fail 8 out of 10 times > after I did mvn clean, mvn install and run the test -- This message was sent by Atlassian JIRA (v6.2#6252)
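The pattern suggested in the review — poll for the OIV image instead of checking once — can be sketched in a self-contained form. This mimics the shape of GenericTestUtils.waitFor but is not the Hadoop implementation:

```java
// Minimal polling helper in the spirit of GenericTestUtils.waitFor: retry
// a condition until it holds or a deadline passes, rather than asserting
// it immediately after the checkpoint finishes.
import java.util.function.BooleanSupplier;

public class WaitForSketch {
    // Polls `check` every intervalMs; returns whether it became true
    // before timeoutMs elapsed.
    static boolean waitFor(BooleanSupplier check, long intervalMs, long timeoutMs) {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (System.currentTimeMillis() < deadline) {
            if (check.getAsBoolean()) {
                return true; // e.g. the OIV image file has appeared
            }
            try {
                Thread.sleep(intervalMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return check.getAsBoolean(); // one last check at the deadline
    }

    public static void main(String[] args) {
        long start = System.currentTimeMillis();
        boolean ok = waitFor(() -> System.currentTimeMillis() - start > 50, 5, 1000);
        System.out.println("condition met: " + ok);
    }
}
```

A generous ceiling (60s, as discussed above) only bounds the worst case; the helper returns as soon as the condition holds, so passing runs stay fast.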
[jira] [Updated] (HDFS-6487) TestStandbyCheckpoint#testSBNCheckpoints is racy
[ https://issues.apache.org/jira/browse/HDFS-6487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai updated HDFS-6487: Attachment: HDFS-6487.patch Attaching patch for trunk and branch-2 > TestStandbyCheckpoint#testSBNCheckpoints is racy > > > Key: HDFS-6487 > URL: https://issues.apache.org/jira/browse/HDFS-6487 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.0.0, 2.5.0 >Reporter: Mit Desai >Assignee: Mit Desai > Attachments: HDFS-6487.patch > > > testSBNCheckpoints fails occasionally. > I could not reproduce it consistently but it would fail 8 out of 10 times > after I did mvn clean, mvn install and run the test -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HDFS-6487) TestStandbyCheckpoint#testSBNCheckpoints is racy
Mit Desai created HDFS-6487: --- Summary: TestStandbyCheckpoint#testSBNCheckpoints is racy Key: HDFS-6487 URL: https://issues.apache.org/jira/browse/HDFS-6487 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 3.0.0, 2.5.0 Reporter: Mit Desai Assignee: Mit Desai testSBNCheckpoints fails occasionally. I could not reproduce it consistently but it would fail 8 out of 10 times after I did mvn clean, mvn install and run the test -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6421) RHEL4 fails to compile vecsum.c
[ https://issues.apache.org/jira/browse/HDFS-6421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai updated HDFS-6421: Status: Patch Available (was: Open) > RHEL4 fails to compile vecsum.c > --- > > Key: HDFS-6421 > URL: https://issues.apache.org/jira/browse/HDFS-6421 > Project: Hadoop HDFS > Issue Type: Bug > Components: libhdfs >Affects Versions: 2.5.0 > Environment: RHEL4 >Reporter: Jason Lowe >Assignee: Mit Desai > Attachments: HDFS-6421.patch, HDFS-6421.patch > > > After HDFS-6287 RHEL4 builds fail trying to compile vecsum.c since they don't > have RUSAGE_THREAD. RHEL4 is ancient, but we use it in a 32-bit > compatibility environment. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6421) RHEL4 fails to compile vecsum.c
[ https://issues.apache.org/jira/browse/HDFS-6421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai updated HDFS-6421: Status: Open (was: Patch Available) > RHEL4 fails to compile vecsum.c > --- > > Key: HDFS-6421 > URL: https://issues.apache.org/jira/browse/HDFS-6421 > Project: Hadoop HDFS > Issue Type: Bug > Components: libhdfs >Affects Versions: 2.5.0 > Environment: RHEL4 >Reporter: Jason Lowe >Assignee: Mit Desai > Attachments: HDFS-6421.patch, HDFS-6421.patch > > > After HDFS-6287 RHEL4 builds fail trying to compile vecsum.c since they don't > have RUSAGE_THREAD. RHEL4 is ancient, but we use it in a 32-bit > compatibility environment. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6421) RHEL4 fails to compile vecsum.c
[ https://issues.apache.org/jira/browse/HDFS-6421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14001718#comment-14001718 ] Mit Desai commented on HDFS-6421: - Correction: Thanks [~cmccabe] for reviewing the patch. :-) > RHEL4 fails to compile vecsum.c > --- > > Key: HDFS-6421 > URL: https://issues.apache.org/jira/browse/HDFS-6421 > Project: Hadoop HDFS > Issue Type: Bug > Components: libhdfs >Affects Versions: 2.5.0 > Environment: RHEL4 >Reporter: Jason Lowe >Assignee: Mit Desai > Attachments: HDFS-6421.patch, HDFS-6421.patch > > > After HDFS-6287 RHEL4 builds fail trying to compile vecsum.c since they don't > have RUSAGE_THREAD. RHEL4 is ancient, but we use it in a 32-bit > compatibility environment. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6421) RHEL4 fails to compile vecsum.c
[ https://issues.apache.org/jira/browse/HDFS-6421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai updated HDFS-6421: Attachment: HDFS-6421.patch Thanks [~cmccabe] for reviewing the patch. Attaching the new patch addressing your comments. > RHEL4 fails to compile vecsum.c > --- > > Key: HDFS-6421 > URL: https://issues.apache.org/jira/browse/HDFS-6421 > Project: Hadoop HDFS > Issue Type: Bug > Components: libhdfs >Affects Versions: 2.5.0 > Environment: RHEL4 >Reporter: Jason Lowe >Assignee: Mit Desai > Attachments: HDFS-6421.patch, HDFS-6421.patch > > > After HDFS-6287 RHEL4 builds fail trying to compile vecsum.c since they don't > have RUSAGE_THREAD. RHEL4 is ancient, but we use it in a 32-bit > compatibility environment. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6421) RHEL4 fails to compile vecsum.c
[ https://issues.apache.org/jira/browse/HDFS-6421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai updated HDFS-6421: Status: Patch Available (was: Open) > RHEL4 fails to compile vecsum.c > --- > > Key: HDFS-6421 > URL: https://issues.apache.org/jira/browse/HDFS-6421 > Project: Hadoop HDFS > Issue Type: Bug > Components: libhdfs >Affects Versions: 2.5.0 > Environment: RHEL4 >Reporter: Jason Lowe >Assignee: Mit Desai > Attachments: HDFS-6421.patch > > > After HDFS-6287 RHEL4 builds fail trying to compile vecsum.c since they don't > have RUSAGE_THREAD. RHEL4 is ancient, but we use it in a 32-bit > compatibility environment. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6230) Expose upgrade status through NameNode web UI
[ https://issues.apache.org/jira/browse/HDFS-6230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai updated HDFS-6230: Status: Patch Available (was: Open) > Expose upgrade status through NameNode web UI > - > > Key: HDFS-6230 > URL: https://issues.apache.org/jira/browse/HDFS-6230 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.4.0 >Reporter: Arpit Agarwal >Assignee: Mit Desai > Attachments: HDFS-6230-NoUpgradesInProgress.png, > HDFS-6230-UpgradeInProgress.jpg, HDFS-6230.patch, HDFS-6230.patch > > > The NameNode web UI does not show upgrade information anymore. Hadoop 2.0 > also does not have the _hadoop dfsadmin -upgradeProgress_ command to check > the upgrade status. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6421) RHEL4 fails to compile vecsum.c
[ https://issues.apache.org/jira/browse/HDFS-6421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai updated HDFS-6421: Attachment: HDFS-6421.patch This code in the stopwatch structure gets the rusage and stores it into {{struct rusage rusage;}}, but the value is never used. {code} if (getrusage(RUSAGE_THREAD, &watch->rusage) < 0) { int err = errno; fprintf(stderr, "getrusage failed: error %d (%s)\n", err, strerror(err)); goto error; } {code} Removing the block to get RHEL4 compiling again. > RHEL4 fails to compile vecsum.c > --- > > Key: HDFS-6421 > URL: https://issues.apache.org/jira/browse/HDFS-6421 > Project: Hadoop HDFS > Issue Type: Bug > Components: libhdfs >Affects Versions: 2.5.0 > Environment: RHEL4 >Reporter: Jason Lowe >Assignee: Mit Desai > Attachments: HDFS-6421.patch > > > After HDFS-6287 RHEL4 builds fail trying to compile vecsum.c since they don't > have RUSAGE_THREAD. RHEL4 is ancient, but we use it in a 32-bit > compatibility environment. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Assigned] (HDFS-6421) RHEL4 fails to compile vecsum.c
[ https://issues.apache.org/jira/browse/HDFS-6421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai reassigned HDFS-6421: --- Assignee: Mit Desai > RHEL4 fails to compile vecsum.c > --- > > Key: HDFS-6421 > URL: https://issues.apache.org/jira/browse/HDFS-6421 > Project: Hadoop HDFS > Issue Type: Bug > Components: libhdfs >Affects Versions: 2.5.0 > Environment: RHEL4 >Reporter: Jason Lowe >Assignee: Mit Desai > > After HDFS-6287 RHEL4 builds fail trying to compile vecsum.c since they don't > have RUSAGE_THREAD. RHEL4 is ancient, but we use it in a 32-bit > compatibility environment. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6230) Expose upgrade status through NameNode web UI
[ https://issues.apache.org/jira/browse/HDFS-6230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai updated HDFS-6230: Status: Open (was: Patch Available) > Expose upgrade status through NameNode web UI > - > > Key: HDFS-6230 > URL: https://issues.apache.org/jira/browse/HDFS-6230 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.4.0 >Reporter: Arpit Agarwal >Assignee: Mit Desai > Attachments: HDFS-6230-NoUpgradesInProgress.png, > HDFS-6230-UpgradeInProgress.jpg, HDFS-6230.patch, HDFS-6230.patch > > > The NameNode web UI does not show upgrade information anymore. Hadoop 2.0 > also does not have the _hadoop dfsadmin -upgradeProgress_ command to check > the upgrade status. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Assigned] (HDFS-742) A down DataNode makes Balancer to hang on repeatingly asking NameNode its partial block list
[ https://issues.apache.org/jira/browse/HDFS-742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai reassigned HDFS-742: -- Assignee: Mit Desai (was: Hairong Kuang) > A down DataNode makes Balancer to hang on repeatingly asking NameNode its > partial block list > > > Key: HDFS-742 > URL: https://issues.apache.org/jira/browse/HDFS-742 > Project: Hadoop HDFS > Issue Type: Bug > Components: balancer >Reporter: Hairong Kuang >Assignee: Mit Desai > > We had a balancer that had not made any progress for a long time. It turned > out it was repeatingly asking Namenode for a partial block list of one > datanode, which was done while the balancer was running. > NameNode should notify Balancer that the datanode is not available and > Balancer should stop asking for the datanode's block list. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-742) A down DataNode makes Balancer to hang on repeatingly asking NameNode its partial block list
[ https://issues.apache.org/jira/browse/HDFS-742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13993598#comment-13993598 ] Mit Desai commented on HDFS-742: Taking this over. Feel free to reassign if you are still working on it. > A down DataNode makes Balancer to hang on repeatingly asking NameNode its > partial block list > > > Key: HDFS-742 > URL: https://issues.apache.org/jira/browse/HDFS-742 > Project: Hadoop HDFS > Issue Type: Bug > Components: balancer >Reporter: Hairong Kuang >Assignee: Hairong Kuang > > We had a balancer that had not made any progress for a long time. It turned > out it was repeatingly asking Namenode for a partial block list of one > datanode, which was done while the balancer was running. > NameNode should notify Balancer that the datanode is not available and > Balancer should stop asking for the datanode's block list. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6230) Expose upgrade status through NameNode web UI
[ https://issues.apache.org/jira/browse/HDFS-6230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai updated HDFS-6230: Attachment: HDFS-6230.patch Thanks for looking at the patch [~wheat9]. Posting the updated patch. > Expose upgrade status through NameNode web UI > - > > Key: HDFS-6230 > URL: https://issues.apache.org/jira/browse/HDFS-6230 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.4.0 >Reporter: Arpit Agarwal >Assignee: Mit Desai > Attachments: HDFS-6230-NoUpgradesInProgress.png, > HDFS-6230-UpgradeInProgress.jpg, HDFS-6230.patch, HDFS-6230.patch > > > The NameNode web UI does not show upgrade information anymore. Hadoop 2.0 > also does not have the _hadoop dfsadmin -upgradeProgress_ command to check > the upgrade status. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-742) A down DataNode makes Balancer to hang on repeatingly asking NameNode its partial block list
[ https://issues.apache.org/jira/browse/HDFS-742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13992781#comment-13992781 ] Mit Desai commented on HDFS-742: Hey [~hairong], are you still working on this JIRA? If not, I can take it over and work on it. > A down DataNode makes Balancer to hang on repeatingly asking NameNode its > partial block list > > > Key: HDFS-742 > URL: https://issues.apache.org/jira/browse/HDFS-742 > Project: Hadoop HDFS > Issue Type: Bug > Components: balancer >Reporter: Hairong Kuang >Assignee: Hairong Kuang > > We had a balancer that had not made any progress for a long time. It turned > out it was repeatingly asking Namenode for a partial block list of one > datanode, which was done while the balancer was running. > NameNode should notify Balancer that the datanode is not available and > Balancer should stop asking for the datanode's block list. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6230) Expose upgrade status through NameNode web UI
[ https://issues.apache.org/jira/browse/HDFS-6230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai updated HDFS-6230: Status: Patch Available (was: Open) > Expose upgrade status through NameNode web UI > - > > Key: HDFS-6230 > URL: https://issues.apache.org/jira/browse/HDFS-6230 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.4.0 >Reporter: Arpit Agarwal >Assignee: Mit Desai > Attachments: HDFS-6230-NoUpgradesInProgress.png, > HDFS-6230-UpgradeInProgress.jpg, HDFS-6230.patch > > > The NameNode web UI does not show upgrade information anymore. Hadoop 2.0 > also does not have the _hadoop dfsadmin -upgradeProgress_ command to check > the upgrade status. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6230) Expose upgrade status through NameNode web UI
[ https://issues.apache.org/jira/browse/HDFS-6230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai updated HDFS-6230: Attachment: HDFS-6230.patch Attaching the patch for showing message when upgrade is in progress. Attaching the screenshots of the web UI when the upgrades are in progress and after the upgrade is finalized > Expose upgrade status through NameNode web UI > - > > Key: HDFS-6230 > URL: https://issues.apache.org/jira/browse/HDFS-6230 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.4.0 >Reporter: Arpit Agarwal >Assignee: Mit Desai > Attachments: HDFS-6230-NoUpgradesInProgress.png, > HDFS-6230-UpgradeInProgress.jpg, HDFS-6230.patch > > > The NameNode web UI does not show upgrade information anymore. Hadoop 2.0 > also does not have the _hadoop dfsadmin -upgradeProgress_ command to check > the upgrade status. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6230) Expose upgrade status through NameNode web UI
[ https://issues.apache.org/jira/browse/HDFS-6230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai updated HDFS-6230: Attachment: HDFS-6230-UpgradeInProgress.jpg HDFS-6230-NoUpgradesInProgress.png > Expose upgrade status through NameNode web UI > - > > Key: HDFS-6230 > URL: https://issues.apache.org/jira/browse/HDFS-6230 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.4.0 >Reporter: Arpit Agarwal >Assignee: Mit Desai > Attachments: HDFS-6230-NoUpgradesInProgress.png, > HDFS-6230-UpgradeInProgress.jpg > > > The NameNode web UI does not show upgrade information anymore. Hadoop 2.0 > also does not have the _hadoop dfsadmin -upgradeProgress_ command to check > the upgrade status. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6230) Expose upgrade status through NameNode web UI
[ https://issues.apache.org/jira/browse/HDFS-6230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13985985#comment-13985985 ] Mit Desai commented on HDFS-6230: - Thanks! Taking it over > Expose upgrade status through NameNode web UI > - > > Key: HDFS-6230 > URL: https://issues.apache.org/jira/browse/HDFS-6230 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.4.0 >Reporter: Arpit Agarwal >Assignee: Mit Desai > > The NameNode web UI does not show upgrade information anymore. Hadoop 2.0 > also does not have the _hadoop dfsadmin -upgradeProgress_ command to check > the upgrade status. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Assigned] (HDFS-6230) Expose upgrade status through NameNode web UI
[ https://issues.apache.org/jira/browse/HDFS-6230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai reassigned HDFS-6230: --- Assignee: Mit Desai (was: Arpit Agarwal) > Expose upgrade status through NameNode web UI > - > > Key: HDFS-6230 > URL: https://issues.apache.org/jira/browse/HDFS-6230 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.4.0 >Reporter: Arpit Agarwal >Assignee: Mit Desai > > The NameNode web UI does not show upgrade information anymore. Hadoop 2.0 > also does not have the _hadoop dfsadmin -upgradeProgress_ command to check > the upgrade status. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6230) Expose upgrade status through NameNode web UI
[ https://issues.apache.org/jira/browse/HDFS-6230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13985963#comment-13985963 ] Mit Desai commented on HDFS-6230: - [~arpitagarwal] are you working on the jira? > Expose upgrade status through NameNode web UI > - > > Key: HDFS-6230 > URL: https://issues.apache.org/jira/browse/HDFS-6230 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.4.0 >Reporter: Arpit Agarwal >Assignee: Arpit Agarwal > > The NameNode web UI does not show upgrade information anymore. Hadoop 2.0 > also does not have the _hadoop dfsadmin -upgradeProgress_ command to check > the upgrade status. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-5892) TestDeleteBlockPool fails in branch-2
[ https://issues.apache.org/jira/browse/HDFS-5892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13984337#comment-13984337 ] Mit Desai commented on HDFS-5892: - [~wheat9], taking a closer look at the commits, I found that this is not yet fixed in 2.4. Do we want to commit this to 2.4.1 and change the Fix version to 2.4.1, or edit the fix version to 2.5.0? > TestDeleteBlockPool fails in branch-2 > - > > Key: HDFS-5892 > URL: https://issues.apache.org/jira/browse/HDFS-5892 > Project: Hadoop HDFS > Issue Type: Test >Reporter: Ted Yu >Priority: Minor > Fix For: 2.4.0 > > Attachments: HDFS-5892.patch, > org.apache.hadoop.hdfs.server.datanode.TestDeleteBlockPool-output.txt > > > Running test suite on Linux, I got: > {code} > testDeleteBlockPool(org.apache.hadoop.hdfs.server.datanode.TestDeleteBlockPool) > Time elapsed: 8.143 sec <<< ERROR! > java.io.IOException: All datanodes 127.0.0.1:43721 are bad. Aborting... > at > org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1023) > at > org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:838) > at > org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:483) > {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (HDFS-3122) Block recovery with closeFile flag true can race with blockReport. Due to this blocks are getting marked as corrupt.
[ https://issues.apache.org/jira/browse/HDFS-3122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai resolved HDFS-3122. - Resolution: Not a Problem Target Version/s: 0.23.3, 0.24.0 (was: 0.24.0, 0.23.3) Haven't heard anything yet. Resolving this issue. Feel free to reopen if anyone thinks the other way. > Block recovery with closeFile flag true can race with blockReport. Due to > this blocks are getting marked as corrupt. > > > Key: HDFS-3122 > URL: https://issues.apache.org/jira/browse/HDFS-3122 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, namenode >Affects Versions: 0.23.0, 0.24.0 >Reporter: Uma Maheswara Rao G >Assignee: Uma Maheswara Rao G >Priority: Critical > Attachments: blockCorrupt.txt > > > *Block Report* can *race* with *Block Recovery* with closeFile flag true. > Block report generated just before block recovery at DN side and due to N/W > problems, block report got delayed to NN. > After this, recovery success and generation stamp modifies to new one. > And primary DN invokes the commitBlockSynchronization and block got updated > in NN side. Also block got marked as complete, since the closeFile flag was > true. Updated with new genstamp. > Now blockReport started processing at NN side. This particular block from RBW > (when it generated the BR at DN), and file was completed at NN side. > Finally block will be marked as corrupt because of genstamp mismatch. > {code} > case RWR: > if (!storedBlock.isComplete()) { > return null; // not corrupt > } else if (storedBlock.getGenerationStamp() != > iblk.getGenerationStamp()) { > return new BlockToMarkCorrupt(storedBlock, > "reported " + reportedState + " replica with genstamp " + > iblk.getGenerationStamp() + " does not match COMPLETE block's " + > "genstamp in block map " + storedBlock.getGenerationStamp()); > } else { // COMPLETE block, same genstamp > {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-3122) Block recovery with closeFile flag true can race with blockReport. Due to this blocks are getting marked as corrupt.
[ https://issues.apache.org/jira/browse/HDFS-3122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13979848#comment-13979848 ] Mit Desai commented on HDFS-3122: - Hi [~umamaheswararao], Is this still an issue? I looked at the code and I think this got fixed sometime. Here is the code snippet from BlockManager {code} case RWR: if (!storedBlock.isComplete()) { return null; // not corrupt } else if (storedBlock.getGenerationStamp() != reported.getGenerationStamp()) { final long reportedGS = reported.getGenerationStamp(); return new BlockToMarkCorrupt(storedBlock, reportedGS, "reported " + reportedState + " replica with genstamp " + reportedGS + " does not match COMPLETE block's genstamp in block map " + storedBlock.getGenerationStamp(), Reason.GENSTAMP_MISMATCH); } else { // COMPLETE block, same genstamp if (reportedState == ReplicaState.RBW) { // If it's a RBW report for a COMPLETE block, it may just be that // the block report got a little bit delayed after the pipeline // closed. So, ignore this report, assuming we will get a // FINALIZED replica later. See HDFS-2791 LOG.info("Received an RBW replica for " + storedBlock + " on " + dn + ": ignoring it, since it is " + "complete with the same genstamp"); return null; } else { return new BlockToMarkCorrupt(storedBlock, "reported replica has invalid state " + reportedState, Reason.INVALID_STATE); } } {code} I will resolve this Jira as "Not a Problem" tomorrow unless someone wants to go some other way. > Block recovery with closeFile flag true can race with blockReport. Due to > this blocks are getting marked as corrupt. 
> > > Key: HDFS-3122 > URL: https://issues.apache.org/jira/browse/HDFS-3122 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, namenode >Affects Versions: 0.23.0, 0.24.0 >Reporter: Uma Maheswara Rao G >Assignee: Uma Maheswara Rao G >Priority: Critical > Attachments: blockCorrupt.txt > > > *Block Report* can *race* with *Block Recovery* with closeFile flag true. > Block report generated just before block recovery at DN side and due to N/W > problems, block report got delayed to NN. > After this, recovery success and generation stamp modifies to new one. > And primary DN invokes the commitBlockSynchronization and block got updated > in NN side. Also block got marked as complete, since the closeFile flag was > true. Updated with new genstamp. > Now blockReport started processing at NN side. This particular block from RBW > (when it generated the BR at DN), and file was completed at NN side. > Finally block will be marked as corrupt because of genstamp mismatch. > {code} > case RWR: > if (!storedBlock.isComplete()) { > return null; // not corrupt > } else if (storedBlock.getGenerationStamp() != > iblk.getGenerationStamp()) { > return new BlockToMarkCorrupt(storedBlock, > "reported " + reportedState + " replica with genstamp " + > iblk.getGenerationStamp() + " does not match COMPLETE block's " + > "genstamp in block map " + storedBlock.getGenerationStamp()); > } else { // COMPLETE block, same genstamp > {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (HDFS-2734) Even if we configure the property fs.checkpoint.size in both core-site.xml and hdfs-site.xml the values are not been considered
[ https://issues.apache.org/jira/browse/HDFS-2734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai resolved HDFS-2734. - Resolution: Not a Problem Target Version/s: 0.23.0, 0.20.1 (was: 0.20.1, 0.23.0) I think this issue is not a problem. Resolving it as Not a Problem. But feel free to reopen this jira if you still feel there is a problem > Even if we configure the property fs.checkpoint.size in both core-site.xml > and hdfs-site.xml the values are not been considered > > > Key: HDFS-2734 > URL: https://issues.apache.org/jira/browse/HDFS-2734 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 0.20.1, 0.23.0 >Reporter: J.Andreina >Priority: Minor > > Even if we configure the property fs.checkpoint.size in both core-site.xml > and hdfs-site.xml the values are not been considered -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-5892) TestDeleteBlockPool fails in branch-2
[ https://issues.apache.org/jira/browse/HDFS-5892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13971592#comment-13971592 ] Mit Desai commented on HDFS-5892: - [~yuzhih...@gmail.com] [~dandan] : Are you guys still having the issues? This test still fails randomly in our nightly builds > TestDeleteBlockPool fails in branch-2 > - > > Key: HDFS-5892 > URL: https://issues.apache.org/jira/browse/HDFS-5892 > Project: Hadoop HDFS > Issue Type: Test >Reporter: Ted Yu >Priority: Minor > Fix For: 2.4.0 > > Attachments: HDFS-5892.patch, > org.apache.hadoop.hdfs.server.datanode.TestDeleteBlockPool-output.txt > > > Running test suite on Linux, I got: > {code} > testDeleteBlockPool(org.apache.hadoop.hdfs.server.datanode.TestDeleteBlockPool) > Time elapsed: 8.143 sec <<< ERROR! > java.io.IOException: All datanodes 127.0.0.1:43721 are bad. Aborting... > at > org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1023) > at > org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:838) > at > org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:483) > {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-4587) Webhdfs secure clients are incompatible with non-secure NN
[ https://issues.apache.org/jira/browse/HDFS-4587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai updated HDFS-4587: Issue Type: Bug (was: Sub-task) Parent: (was: HDFS-4576) > Webhdfs secure clients are incompatible with non-secure NN > -- > > Key: HDFS-4587 > URL: https://issues.apache.org/jira/browse/HDFS-4587 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode, webhdfs >Affects Versions: 3.0.0, 0.23.7, 2.0.4-alpha >Reporter: Daryn Sharp > > A secure webhdfs client will receive an exception from a non-secure NN. For > a NN in non-secure mode, {{FSNamesystem#getDelegationToken}} will return > "null" to indicate no token is required. Hdfs will send back the null to the > client, but webhdfs uses {{DelegationTokenSecretManager.createCredentials}} > which instead throws an exception. -- This message was sent by Atlassian JIRA (v6.2#6252)
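The incompatibility described above comes down to two components disagreeing on what a {{null}} delegation token means. A standalone sketch of the behavior (class and method names here are illustrative stand-ins, not Hadoop's actual API):

```java
/**
 * Models the HDFS-4587 mismatch: in non-secure mode the NN returns null
 * for a delegation token ("no token required"), but a caller that throws
 * on null (as webhdfs's createCredentials path did) breaks secure clients
 * talking to a non-secure NN.
 */
public class TokenSketch {
    // Stand-in for FSNamesystem#getDelegationToken behavior.
    static String getDelegationToken(boolean secureMode) {
        return secureMode ? "token-abc" : null; // null means "not required"
    }

    // Null-tolerant handling: treat null as a valid "no token" answer
    // instead of raising an exception.
    static String credentialsFor(boolean secureMode) {
        String token = getDelegationToken(secureMode);
        return (token == null) ? "NO_TOKEN_REQUIRED" : token;
    }

    public static void main(String[] args) {
        System.out.println(credentialsFor(false)); // prints NO_TOKEN_REQUIRED
    }
}
```

The plain-hdfs client path already tolerates the null; the fix is to make the webhdfs path do the same rather than throw.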
[jira] [Resolved] (HDFS-4576) Webhdfs authentication issues
[ https://issues.apache.org/jira/browse/HDFS-4576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai resolved HDFS-4576. - Resolution: Fixed Fix Version/s: 0.23.11 3.0.0 Resolving this task, as all of its subtasks are now resolved > Webhdfs authentication issues > - > > Key: HDFS-4576 > URL: https://issues.apache.org/jira/browse/HDFS-4576 > Project: Hadoop HDFS > Issue Type: Bug > Components: webhdfs >Affects Versions: 2.0.0-alpha, 3.0.0, 0.23.7 >Reporter: Daryn Sharp >Assignee: Daryn Sharp > Fix For: 3.0.0, 0.23.11 > > > Umbrella jira to track the webhdfs authentication issues as subtasks. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-4587) Webhdfs secure clients are incompatible with non-secure NN
[ https://issues.apache.org/jira/browse/HDFS-4587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13968382#comment-13968382 ] Mit Desai commented on HDFS-4587: - As 0.23 is going into the maintenance state and this bug will not be fixed in it, I am removing the target version for 0.23.11 > Webhdfs secure clients are incompatible with non-secure NN > -- > > Key: HDFS-4587 > URL: https://issues.apache.org/jira/browse/HDFS-4587 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode, webhdfs >Affects Versions: 3.0.0, 0.23.7, 2.0.4-alpha >Reporter: Daryn Sharp > > A secure webhdfs client will receive an exception from a non-secure NN. For > a NN in non-secure mode, {{FSNamesystem#getDelegationToken}} will return > "null" to indicate no token is required. Hdfs will send back the null to the > client, but webhdfs uses {{DelegationTokenSecretManager.createCredentials}} > which instead throws an exception. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-4587) Webhdfs secure clients are incompatible with non-secure NN
[ https://issues.apache.org/jira/browse/HDFS-4587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai updated HDFS-4587: Target Version/s: 3.0.0 (was: 3.0.0, 0.23.11) > Webhdfs secure clients are incompatible with non-secure NN > -- > > Key: HDFS-4587 > URL: https://issues.apache.org/jira/browse/HDFS-4587 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode, webhdfs >Affects Versions: 3.0.0, 0.23.7, 2.0.4-alpha >Reporter: Daryn Sharp > > A secure webhdfs client will receive an exception from a non-secure NN. For > a NN in non-secure mode, {{FSNamesystem#getDelegationToken}} will return > "null" to indicate no token is required. Hdfs will send back the null to the > client, but webhdfs uses {{DelegationTokenSecretManager.createCredentials}} > which instead throws an exception. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-2734) Even if we configure the property fs.checkpoint.size in both core-site.xml and hdfs-site.xml the values are not been considered
[ https://issues.apache.org/jira/browse/HDFS-2734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13966723#comment-13966723 ] Mit Desai commented on HDFS-2734: - I see that there has been no activity on this Jira for a long time. [~andreina], is this still reproducible on your side? If this is still an issue, can you provide the information [~qwertymaniac] requested? Based on the analysis that Harsh did, I think this is not reproducible on his side, and I have not seen anyone else raising this concern. In that case, if I do not hear back by 4/17/14, I will go ahead and close this issue as Not A Problem. -Mit > Even if we configure the property fs.checkpoint.size in both core-site.xml > and hdfs-site.xml the values are not been considered > > > Key: HDFS-2734 > URL: https://issues.apache.org/jira/browse/HDFS-2734 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 0.20.1, 0.23.0 >Reporter: J.Andreina >Priority: Minor > > Even if we configure the property fs.checkpoint.size in both core-site.xml > and hdfs-site.xml the values are not been considered -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-5983) TestSafeMode#testInitializeReplQueuesEarly fails
[ https://issues.apache.org/jira/browse/HDFS-5983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13964579#comment-13964579 ] Mit Desai commented on HDFS-5983: - Already fixed by HDFS-6160, so closing it. > TestSafeMode#testInitializeReplQueuesEarly fails > > > Key: HDFS-5983 > URL: https://issues.apache.org/jira/browse/HDFS-5983 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.2.0 >Reporter: Kihwal Lee >Assignee: Ming Ma > Attachments: HDFS-5983-updated.patch, HDFS-5983.patch, testlog.txt > > > It was seen from one of the precommit build of HDFS-5962. The test case > creates 15 blocks and then shuts down all datanodes. Then the namenode is > restarted with a low safe block threshold and one datanode is restarted. The > idea is that the initial block report from the restarted datanode will make > the namenode leave the safemode and initialize the replication queues. > According to the log, the datanode reported 3 blocks, but slightly before > that the namenode did repl queue init with 1 block. I will attach the log. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-5983) TestSafeMode#testInitializeReplQueuesEarly fails
[ https://issues.apache.org/jira/browse/HDFS-5983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai updated HDFS-5983: Status: Open (was: Patch Available) > TestSafeMode#testInitializeReplQueuesEarly fails > > > Key: HDFS-5983 > URL: https://issues.apache.org/jira/browse/HDFS-5983 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Kihwal Lee >Assignee: Ming Ma > Attachments: HDFS-5983-updated.patch, HDFS-5983.patch, testlog.txt > > > It was seen from one of the precommit build of HDFS-5962. The test case > creates 15 blocks and then shuts down all datanodes. Then the namenode is > restarted with a low safe block threshold and one datanode is restarted. The > idea is that the initial block report from the restarted datanode will make > the namenode leave the safemode and initialize the replication queues. > According to the log, the datanode reported 3 blocks, but slightly before > that the namenode did repl queue init with 1 block. I will attach the log. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-5983) TestSafeMode#testInitializeReplQueuesEarly fails
[ https://issues.apache.org/jira/browse/HDFS-5983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13964491#comment-13964491 ] Mit Desai commented on HDFS-5983: - [~airbots], [~mingma] : Can any of you regenerate the patch and attach it to make sure it applies successfully? Mit > TestSafeMode#testInitializeReplQueuesEarly fails > > > Key: HDFS-5983 > URL: https://issues.apache.org/jira/browse/HDFS-5983 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Kihwal Lee >Assignee: Ming Ma > Attachments: HDFS-5983-updated.patch, HDFS-5983.patch, testlog.txt > > > It was seen from one of the precommit build of HDFS-5962. The test case > creates 15 blocks and then shuts down all datanodes. Then the namenode is > restarted with a low safe block threshold and one datanode is restarted. The > idea is that the initial block report from the restarted datanode will make > the namenode leave the safemode and initialize the replication queues. > According to the log, the datanode reported 3 blocks, but slightly before > that the namenode did repl queue init with 1 block. I will attach the log. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-5983) TestSafeMode#testInitializeReplQueuesEarly fails
[ https://issues.apache.org/jira/browse/HDFS-5983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13964462#comment-13964462 ] Mit Desai commented on HDFS-5983: - One note, you need to Submit the Patch once you upload the patch to get the HadoopQA Comment. I just did that. > TestSafeMode#testInitializeReplQueuesEarly fails > > > Key: HDFS-5983 > URL: https://issues.apache.org/jira/browse/HDFS-5983 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Kihwal Lee >Assignee: Ming Ma > Attachments: HDFS-5983-updated.patch, HDFS-5983.patch, testlog.txt > > > It was seen from one of the precommit build of HDFS-5962. The test case > creates 15 blocks and then shuts down all datanodes. Then the namenode is > restarted with a low safe block threshold and one datanode is restarted. The > idea is that the initial block report from the restarted datanode will make > the namenode leave the safemode and initialize the replication queues. > According to the log, the datanode reported 3 blocks, but slightly before > that the namenode did repl queue init with 1 block. I will attach the log. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-5983) TestSafeMode#testInitializeReplQueuesEarly fails
[ https://issues.apache.org/jira/browse/HDFS-5983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai updated HDFS-5983: Status: Patch Available (was: Open) > TestSafeMode#testInitializeReplQueuesEarly fails > > > Key: HDFS-5983 > URL: https://issues.apache.org/jira/browse/HDFS-5983 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Kihwal Lee >Assignee: Ming Ma > Attachments: HDFS-5983-updated.patch, HDFS-5983.patch, testlog.txt > > > It was seen from one of the precommit build of HDFS-5962. The test case > creates 15 blocks and then shuts down all datanodes. Then the namenode is > restarted with a low safe block threshold and one datanode is restarted. The > idea is that the initial block report from the restarted datanode will make > the namenode leave the safemode and initialize the replication queues. > According to the log, the datanode reported 3 blocks, but slightly before > that the namenode did repl queue init with 1 block. I will attach the log. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-5983) TestSafeMode#testInitializeReplQueuesEarly fails
[ https://issues.apache.org/jira/browse/HDFS-5983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13964459#comment-13964459 ] Mit Desai commented on HDFS-5983: - Reviewed the patch. LGTM +1 (non binding) > TestSafeMode#testInitializeReplQueuesEarly fails > > > Key: HDFS-5983 > URL: https://issues.apache.org/jira/browse/HDFS-5983 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Kihwal Lee >Assignee: Ming Ma > Attachments: HDFS-5983-updated.patch, HDFS-5983.patch, testlog.txt > > > It was seen from one of the precommit build of HDFS-5962. The test case > creates 15 blocks and then shuts down all datanodes. Then the namenode is > restarted with a low safe block threshold and one datanode is restarted. The > idea is that the initial block report from the restarted datanode will make > the namenode leave the safemode and initialize the replication queues. > According to the log, the datanode reported 3 blocks, but slightly before > that the namenode did repl queue init with 1 block. I will attach the log. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6195) TestRMApplicationHistoryWriter#testRMWritingMassiveHistory runs slow and intermittently fails on trunk and branch2
[ https://issues.apache.org/jira/browse/HDFS-6195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13962088#comment-13962088 ] Mit Desai commented on HDFS-6195: - TestRMRestart is a different issue related to YARN-1906 > TestRMApplicationHistoryWriter#testRMWritingMassiveHistory runs slow and > intermittently fails on trunk and branch2 > -- > > Key: HDFS-6195 > URL: https://issues.apache.org/jira/browse/HDFS-6195 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.0.0, 2.5.0 >Reporter: Mit Desai >Assignee: Mit Desai > Fix For: 3.0.0, 2.5.0 > > Attachments: HDFS-6195.patch > > > The test has 1 containers that it tries to cleanup. > The cleanup has a timeout of 2ms in which the test sometimes cannot do > the cleanup completely and gives out an Assertion Failure. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6195) TestRMApplicationHistoryWriter#testRMWritingMassiveHistory runs slow and intermittently fails on trunk and branch2
[ https://issues.apache.org/jira/browse/HDFS-6195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13961984#comment-13961984 ] Mit Desai commented on HDFS-6195: - While cleaning up the containers, {code} while (cleanedSize < allocatedSize && waitCount++ < 200) { Thread.sleep(100); resp = nm.nodeHeartbeat(true); cleaned = resp.getContainersToCleanup(); cleanedSize += cleaned.size(); } {code} The test sometimes cannot do the complete cleanup and some of the 1 containers cannot be cleaned up, resulting in an assertion error at {{Assert.assertEquals(allocatedSize, cleanedSize);}}. This test has been failing in our nightly builds for the past couple of days. I was able to reproduce this consistently in Eclipse but not using Maven. I think this is an environment issue, so it cannot be reproduced everywhere. As a fix, I have increased the thread sleep time in the while loop, which will give some extra time for the container cleanup. And as the while loop also checks the cleaned size against the allocated size, the test will not always use all cycles in the loop. > TestRMApplicationHistoryWriter#testRMWritingMassiveHistory runs slow and > intermittently fails on trunk and branch2 > -- > > Key: HDFS-6195 > URL: https://issues.apache.org/jira/browse/HDFS-6195 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.0.0, 2.5.0 >Reporter: Mit Desai >Assignee: Mit Desai > Fix For: 3.0.0, 2.5.0 > > Attachments: HDFS-6195.patch > > > The test has 1 containers that it tries to cleanup. > The cleanup has a timeout of 2ms in which the test sometimes cannot do > the cleanup completely and gives out an Assertion Failure. -- This message was sent by Atlassian JIRA (v6.2#6252)
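The key property of the loop above is that the early-exit check bounds the cost of a longer sleep: the passing case still exits as soon as everything is cleaned, and only the failing case pays the full wait budget. A standalone sketch of that bounded-poll pattern (the {{Probe}} interface and counts are illustrative stand-ins for the NodeManager heartbeat, not YARN's API):

```java
/**
 * Bounded polling: wait for asynchronous cleanup to finish, sleeping
 * between probes, but never more than maxPolls iterations. Raising
 * sleepMs only slows down runs that would otherwise fail the assertion.
 */
public class BoundedPoll {
    interface Probe { int drain(); } // one heartbeat's worth of cleanup

    static int awaitCleanup(int allocated, long sleepMs, int maxPolls,
                            Probe probe) throws InterruptedException {
        int cleaned = 0, polls = 0;
        while (cleaned < allocated && polls++ < maxPolls) {
            Thread.sleep(sleepMs);
            cleaned += probe.drain();
        }
        return cleaned;
    }

    public static void main(String[] args) throws InterruptedException {
        // Toy probe that "cleans" 3 containers per heartbeat: the loop
        // stops after 4 probes, long before the 200-poll budget.
        int cleaned = awaitCleanup(10, 1, 200, () -> 3);
        System.out.println(cleaned); // prints 12
    }
}
```

This is why increasing the per-iteration sleep is a reasonable fix for an environment-dependent flake: it widens the timeout only where the timeout was actually the problem.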
[jira] [Updated] (HDFS-6195) TestRMApplicationHistoryWriter#testRMWritingMassiveHistory runs slow and intermittently fails on trunk and branch2
[ https://issues.apache.org/jira/browse/HDFS-6195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai updated HDFS-6195: Fix Version/s: 2.5.0 3.0.0 Status: Patch Available (was: Open) > TestRMApplicationHistoryWriter#testRMWritingMassiveHistory runs slow and > intermittently fails on trunk and branch2 > -- > > Key: HDFS-6195 > URL: https://issues.apache.org/jira/browse/HDFS-6195 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.0.0, 2.5.0 >Reporter: Mit Desai >Assignee: Mit Desai > Fix For: 3.0.0, 2.5.0 > > Attachments: HDFS-6195.patch > > > The test has 1 containers that it tries to cleanup. > The cleanup has a timeout of 2ms in which the test sometimes cannot do > the cleanup completely and gives out an Assertion Failure. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6195) TestRMApplicationHistoryWriter#testRMWritingMassiveHistory runs slow and intermittently fails on trunk and branch2
[ https://issues.apache.org/jira/browse/HDFS-6195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai updated HDFS-6195: Attachment: HDFS-6195.patch Attaching the patch for trunk and branch2 > TestRMApplicationHistoryWriter#testRMWritingMassiveHistory runs slow and > intermittently fails on trunk and branch2 > -- > > Key: HDFS-6195 > URL: https://issues.apache.org/jira/browse/HDFS-6195 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.0.0, 2.5.0 >Reporter: Mit Desai >Assignee: Mit Desai > Attachments: HDFS-6195.patch > > > The test has 1 containers that it tries to cleanup. > The cleanup has a timeout of 2ms in which the test sometimes cannot do > the cleanup completely and gives out an Assertion Failure. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6195) TestRMApplicationHistoryWriter#testRMWritingMassiveHistory runs slow and intermittently fails on trunk and branch2
[ https://issues.apache.org/jira/browse/HDFS-6195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13961887#comment-13961887 ] Mit Desai commented on HDFS-6195: - Analyzing the cause. Will post the analysis/fix soon. > TestRMApplicationHistoryWriter#testRMWritingMassiveHistory runs slow and > intermittently fails on trunk and branch2 > -- > > Key: HDFS-6195 > URL: https://issues.apache.org/jira/browse/HDFS-6195 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.0.0, 2.5.0 >Reporter: Mit Desai >Assignee: Mit Desai > > The test has 1 containers that it tries to cleanup. > The cleanup has a timeout of 2ms in which the test sometimes cannot do > the cleanup completely and gives out an Assertion Failure. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HDFS-6195) TestRMApplicationHistoryWriter#testRMWritingMassiveHistory runs slow and intermittently fails on trunk and branch2
Mit Desai created HDFS-6195: --- Summary: TestRMApplicationHistoryWriter#testRMWritingMassiveHistory runs slow and intermittently fails on trunk and branch2 Key: HDFS-6195 URL: https://issues.apache.org/jira/browse/HDFS-6195 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 3.0.0, 2.5.0 Reporter: Mit Desai Assignee: Mit Desai The test has 1 containers that it tries to cleanup. The cleanup has a timeout of 2ms in which the test sometimes cannot do the cleanup completely and gives out an Assertion Failure. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-5807) TestBalancerWithNodeGroup.testBalancerWithNodeGroup fails intermittently on Branch-2
[ https://issues.apache.org/jira/browse/HDFS-5807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13947957#comment-13947957 ] Mit Desai commented on HDFS-5807: - Thanks [~airbots] > TestBalancerWithNodeGroup.testBalancerWithNodeGroup fails intermittently on > Branch-2 > > > Key: HDFS-5807 > URL: https://issues.apache.org/jira/browse/HDFS-5807 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Affects Versions: 2.3.0 >Reporter: Mit Desai >Assignee: Chen He > Fix For: 3.0.0, 2.4.0 > > Attachments: HDFS-5807.patch > > > The test times out after some time. > {noformat} > java.util.concurrent.TimeoutException: Rebalancing expected avg utilization > to become 0.16, but on datanode 127.0.0.1:42451 it remains at 0.3 after more > than 2 msec. > at > org.apache.hadoop.hdfs.server.balancer.TestBalancerWithNodeGroup.waitForBalancer(TestBalancerWithNodeGroup.java:151) > at > org.apache.hadoop.hdfs.server.balancer.TestBalancerWithNodeGroup.runBalancer(TestBalancerWithNodeGroup.java:178) > at > org.apache.hadoop.hdfs.server.balancer.TestBalancerWithNodeGroup.testBalancerWithNodeGroup(TestBalancerWithNodeGroup.java:302) > {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Reopened] (HDFS-5807) TestBalancerWithNodeGroup.testBalancerWithNodeGroup fails intermittently on Branch-2
[ https://issues.apache.org/jira/browse/HDFS-5807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai reopened HDFS-5807: - [~airbots], I found this test failing again in our nightly builds. Can you take a look at it again?
{noformat}
Error Message

Rebalancing expected avg utilization to become 0.16, but on datanode X.X.X.X: it remains at 0.3 after more than 4 msec.

Stacktrace

java.util.concurrent.TimeoutException: Rebalancing expected avg utilization to become 0.16, but on datanode X.X.X.X: it remains at 0.3 after more than 4 msec.
at org.apache.hadoop.hdfs.server.balancer.TestBalancerWithNodeGroup.waitForBalancer(TestBalancerWithNodeGroup.java:151)
at org.apache.hadoop.hdfs.server.balancer.TestBalancerWithNodeGroup.runBalancer(TestBalancerWithNodeGroup.java:178)
at org.apache.hadoop.hdfs.server.balancer.TestBalancerWithNodeGroup.testBalancerWithNodeGroup(TestBalancerWithNodeGroup.java:302)
{noformat}
> TestBalancerWithNodeGroup.testBalancerWithNodeGroup fails intermittently on > Branch-2 > > > Key: HDFS-5807 > URL: https://issues.apache.org/jira/browse/HDFS-5807 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Affects Versions: 2.3.0 >Reporter: Mit Desai >Assignee: Chen He > Fix For: 3.0.0, 2.4.0 > > Attachments: HDFS-5807.patch > > > The test times out after some time. > {noformat} > java.util.concurrent.TimeoutException: Rebalancing expected avg utilization > to become 0.16, but on datanode 127.0.0.1:42451 it remains at 0.3 after more > than 2 msec. > at > org.apache.hadoop.hdfs.server.balancer.TestBalancerWithNodeGroup.waitForBalancer(TestBalancerWithNodeGroup.java:151) > at > org.apache.hadoop.hdfs.server.balancer.TestBalancerWithNodeGroup.runBalancer(TestBalancerWithNodeGroup.java:178) > at > org.apache.hadoop.hdfs.server.balancer.TestBalancerWithNodeGroup.testBalancerWithNodeGroup(TestBalancerWithNodeGroup.java:302) > {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HDFS-6126) TestNameNodeMetrics#testCorruptBlock fails intermittently
Mit Desai created HDFS-6126: --- Summary: TestNameNodeMetrics#testCorruptBlock fails intermittently Key: HDFS-6126 URL: https://issues.apache.org/jira/browse/HDFS-6126 Project: Hadoop HDFS Issue Type: Test Affects Versions: 2.4.0 Reporter: Mit Desai I get the following error
{noformat}
testCorruptBlock(org.apache.hadoop.hdfs.server.namenode.metrics.TestNameNodeMetrics) Time elapsed: 5.556 sec <<< FAILURE!
java.lang.AssertionError: Bad value for metric CorruptBlocks expected:<1> but was:<0>
at org.junit.Assert.fail(Assert.java:93)
at org.junit.Assert.failNotEquals(Assert.java:647)
at org.junit.Assert.assertEquals(Assert.java:128)
at org.junit.Assert.assertEquals(Assert.java:472)
at org.apache.hadoop.test.MetricsAsserts.assertGauge(MetricsAsserts.java:190)
at org.apache.hadoop.hdfs.server.namenode.metrics.TestNameNodeMetrics.testCorruptBlock(TestNameNodeMetrics.java:247)

Results :

Failed tests: TestNameNodeMetrics.testCorruptBlock:247 Bad value for metric CorruptBlocks expected:<1> but was:<0>
{noformat}
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HDFS-6104) TestFsLimits#testDefaultMaxComponentLength Fails on branch-2
Mit Desai created HDFS-6104: --- Summary: TestFsLimits#testDefaultMaxComponentLength Fails on branch-2 Key: HDFS-6104 URL: https://issues.apache.org/jira/browse/HDFS-6104 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.4.0 Reporter: Mit Desai Assignee: Mit Desai testDefaultMaxComponentLength fails intermittently with the following error
{noformat}
java.lang.AssertionError: expected:<0> but was:<255>
at org.junit.Assert.fail(Assert.java:93)
at org.junit.Assert.failNotEquals(Assert.java:647)
at org.junit.Assert.assertEquals(Assert.java:128)
at org.junit.Assert.assertEquals(Assert.java:472)
at org.junit.Assert.assertEquals(Assert.java:456)
at org.apache.hadoop.hdfs.server.namenode.TestFsLimits.testDefaultMaxComponentLength(TestFsLimits.java:90)
{noformat}
On doing some research, I found that this is actually a JDK7 issue. The test always fails when it runs after any test that runs the addChildWithName() method. -- This message was sent by Atlassian JIRA (v6.2#6252)
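The failure mode reported above (a test that passes in isolation but fails after another test has run) is typical of shared mutable static state combined with JDK7's nondeterministic test-method ordering. A minimal, self-contained sketch of the mechanism; the names are borrowed from the report purely for illustration and this is not the actual TestFsLimits code:

```java
// Illustrative sketch of an order-dependent test failure: one "test" mutates
// shared static state, and a later "test" that expects the default value fails.
public class OrderDependence {
    // Shared mutable static state, analogous to a cached limit in the class under test.
    static int maxComponentLength = 0;

    // Stands in for a test that configures a non-default limit as a side effect.
    static void addChildWithName() {
        maxComponentLength = 255;
    }

    // Stands in for testDefaultMaxComponentLength, which expects the default (0).
    static int defaultMaxComponentLength() {
        return maxComponentLength;
    }

    public static void main(String[] args) {
        // Run first (or in isolation): the default value is observed.
        System.out.println(defaultMaxComponentLength());  // 0

        // If the other test happens to run first, its state leaks through
        // and the "expected:<0> but was:<255>" assertion fires.
        addChildWithName();
        System.out.println(defaultMaxComponentLength());  // 255
    }
}
```

The usual remedies are resetting the shared state in a setup/teardown method so each test starts from the defaults, or avoiding static state in the class under test altogether.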
[jira] [Commented] (HDFS-6035) TestCacheDirectives#testCacheManagerRestart is failing on branch-2
[ https://issues.apache.org/jira/browse/HDFS-6035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13930720#comment-13930720 ] Mit Desai commented on HDFS-6035: - [~sathish.gurram], Can you let me know what branch are you testing this on? > TestCacheDirectives#testCacheManagerRestart is failing on branch-2 > -- > > Key: HDFS-6035 > URL: https://issues.apache.org/jira/browse/HDFS-6035 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Affects Versions: 2.4.0 >Reporter: Mit Desai >Assignee: sathish > Attachments: HDFS-6035-0001.patch > > > {noformat} > java.io.IOException: Inconsistent checkpoint fields. > LV = -51 namespaceID = 1641397469 cTime = 0 ; clusterId = testClusterID ; > blockpoolId = BP-423574854-x.x.x.x-1393478669835. > Expecting respectively: -51; 2; 0; testClusterID; > BP-2051361571-x.x.x.x-1393478572877. > at > org.apache.hadoop.hdfs.server.namenode.CheckpointSignature.validateStorageInfo(CheckpointSignature.java:133) > at > org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doCheckpoint(SecondaryNameNode.java:526) > at > org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives.testCacheManagerRestart(TestCacheDirectives.java:582) > {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-5857) TestWebHDFS#testNamenodeRestart fails intermittently with NPE
[ https://issues.apache.org/jira/browse/HDFS-5857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13921181#comment-13921181 ] Mit Desai commented on HDFS-5857: - None of the test failures are related to the patch. I have manually tested them with the patch and they pass on my machine > TestWebHDFS#testNamenodeRestart fails intermittently with NPE > - > > Key: HDFS-5857 > URL: https://issues.apache.org/jira/browse/HDFS-5857 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.0.0, 2.2.0 >Reporter: Mit Desai >Assignee: Mit Desai > Attachments: HDFS-5857.patch, HDFS-5857.patch > > > {noformat} > java.lang.AssertionError: There are 1 exception(s): > Exception 0: > org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): null > at > org.apache.hadoop.hdfs.web.JsonUtil.toRemoteException(JsonUtil.java:157) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.validateResponse(WebHdfsFileSystem.java:315) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$700(WebHdfsFileSystem.java:105) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.getResponse(WebHdfsFileSystem.java:625) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:532) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.run(WebHdfsFileSystem.java:421) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.mkdirs(WebHdfsFileSystem.java:701) > at org.apache.hadoop.fs.FileSystem.mkdirs(FileSystem.java:1816) > at org.apache.hadoop.hdfs.DFSTestUtil.createFile(DFSTestUtil.java:196) > at > org.apache.hadoop.hdfs.TestDFSClientRetries$6.run(TestDFSClientRetries.java:920) > at java.lang.Thread.run(Thread.java:722) > at org.junit.Assert.fail(Assert.java:93) > at > org.apache.hadoop.hdfs.TestDFSClientRetries.assertEmpty(TestDFSClientRetries.java:1031) > at > org.apache.hadoop.hdfs.TestDFSClientRetries.namenodeRestartTest(TestDFSClientRetries.java:951) > at > 
org.apache.hadoop.hdfs.web.TestWebHDFS.testNamenodeRestart(TestWebHDFS.java:216) > {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-5857) TestWebHDFS#testNamenodeRestart fails intermittently with NPE
[ https://issues.apache.org/jira/browse/HDFS-5857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai updated HDFS-5857: Attachment: HDFS-5857.patch Thanks for the inputs Haohui. Attaching the updated patch > TestWebHDFS#testNamenodeRestart fails intermittently with NPE > - > > Key: HDFS-5857 > URL: https://issues.apache.org/jira/browse/HDFS-5857 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.0.0, 2.2.0 >Reporter: Mit Desai >Assignee: Mit Desai > Attachments: HDFS-5857.patch, HDFS-5857.patch > > > {noformat} > java.lang.AssertionError: There are 1 exception(s): > Exception 0: > org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): null > at > org.apache.hadoop.hdfs.web.JsonUtil.toRemoteException(JsonUtil.java:157) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.validateResponse(WebHdfsFileSystem.java:315) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$700(WebHdfsFileSystem.java:105) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.getResponse(WebHdfsFileSystem.java:625) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:532) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.run(WebHdfsFileSystem.java:421) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.mkdirs(WebHdfsFileSystem.java:701) > at org.apache.hadoop.fs.FileSystem.mkdirs(FileSystem.java:1816) > at org.apache.hadoop.hdfs.DFSTestUtil.createFile(DFSTestUtil.java:196) > at > org.apache.hadoop.hdfs.TestDFSClientRetries$6.run(TestDFSClientRetries.java:920) > at java.lang.Thread.run(Thread.java:722) > at org.junit.Assert.fail(Assert.java:93) > at > org.apache.hadoop.hdfs.TestDFSClientRetries.assertEmpty(TestDFSClientRetries.java:1031) > at > org.apache.hadoop.hdfs.TestDFSClientRetries.namenodeRestartTest(TestDFSClientRetries.java:951) > at > org.apache.hadoop.hdfs.web.TestWebHDFS.testNamenodeRestart(TestWebHDFS.java:216) > {noformat} -- This message was sent by Atlassian 
JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6035) TestCacheDirectives#testCacheManagerRestart is failing on branch-2
[ https://issues.apache.org/jira/browse/HDFS-6035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13920948#comment-13920948 ] Mit Desai commented on HDFS-6035: - I am trying but cannot reproduce it in Eclipse either. I'll have to put in some more effort and will update you once I have some findings. > TestCacheDirectives#testCacheManagerRestart is failing on branch-2 > -- > > Key: HDFS-6035 > URL: https://issues.apache.org/jira/browse/HDFS-6035 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Affects Versions: 2.4.0 >Reporter: Mit Desai >Assignee: sathish > Attachments: HDFS-6035-0001.patch > > > {noformat} > java.io.IOException: Inconsistent checkpoint fields. > LV = -51 namespaceID = 1641397469 cTime = 0 ; clusterId = testClusterID ; > blockpoolId = BP-423574854-x.x.x.x-1393478669835. > Expecting respectively: -51; 2; 0; testClusterID; > BP-2051361571-x.x.x.x-1393478572877. > at > org.apache.hadoop.hdfs.server.namenode.CheckpointSignature.validateStorageInfo(CheckpointSignature.java:133) > at > org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doCheckpoint(SecondaryNameNode.java:526) > at > org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives.testCacheManagerRestart(TestCacheDirectives.java:582) > {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (HDFS-5839) TestWebHDFS#testNamenodeRestart fails with NullPointerException in trunk
[ https://issues.apache.org/jira/browse/HDFS-5839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai resolved HDFS-5839. - Resolution: Duplicate HDFS-5857 has a patch for this issue. I am resolving this JIRA so that we have a single Jira tracking it > TestWebHDFS#testNamenodeRestart fails with NullPointerException in trunk > > > Key: HDFS-5839 > URL: https://issues.apache.org/jira/browse/HDFS-5839 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Ted Yu >Assignee: Mit Desai > Attachments: org.apache.hadoop.hdfs.web.TestWebHDFS-output.txt > > > Here is test failure: > {code} > testNamenodeRestart(org.apache.hadoop.hdfs.web.TestWebHDFS) Time elapsed: > 45.206 sec <<< FAILURE! > java.lang.AssertionError: There are 1 exception(s): > Exception 0: > org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): null > at > org.apache.hadoop.hdfs.web.JsonUtil.toRemoteException(JsonUtil.java:157) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.validateResponse(WebHdfsFileSystem.java:315) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$700(WebHdfsFileSystem.java:104) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.getResponse(WebHdfsFileSystem.java:615) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:532) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem$OffsetUrlOpener.connect(WebHdfsFileSystem.java:878) > at > org.apache.hadoop.hdfs.web.ByteRangeInputStream.openInputStream(ByteRangeInputStream.java:119) > at > org.apache.hadoop.hdfs.web.ByteRangeInputStream.getInputStream(ByteRangeInputStream.java:103) > at > org.apache.hadoop.hdfs.web.ByteRangeInputStream.read(ByteRangeInputStream.java:180) > at java.io.FilterInputStream.read(FilterInputStream.java:83) > at > org.apache.hadoop.hdfs.TestDFSClientRetries$5.run(TestDFSClientRetries.java:954) > at java.lang.Thread.run(Thread.java:724) > at org.junit.Assert.fail(Assert.java:93) > at > 
org.apache.hadoop.hdfs.TestDFSClientRetries.assertEmpty(TestDFSClientRetries.java:1083) > at > org.apache.hadoop.hdfs.TestDFSClientRetries.namenodeRestartTest(TestDFSClientRetries.java:1003) > at > org.apache.hadoop.hdfs.web.TestWebHDFS.testNamenodeRestart(TestWebHDFS.java:216) > {code} > From test output: > {code} > 2014-01-27 17:55:59,388 WARN resources.ExceptionHandler > (ExceptionHandler.java:toResponse(92)) - INTERNAL_SERVER_ERROR > java.lang.NullPointerException > at > org.apache.hadoop.hdfs.server.namenode.web.resources.NamenodeWebHdfsMethods.chooseDatanode(NamenodeWebHdfsMethods.java:166) > at > org.apache.hadoop.hdfs.server.namenode.web.resources.NamenodeWebHdfsMethods.redirectURI(NamenodeWebHdfsMethods.java:231) > at > org.apache.hadoop.hdfs.server.namenode.web.resources.NamenodeWebHdfsMethods.get(NamenodeWebHdfsMethods.java:658) > at > org.apache.hadoop.hdfs.server.namenode.web.resources.NamenodeWebHdfsMethods.access$400(NamenodeWebHdfsMethods.java:116) > at > org.apache.hadoop.hdfs.server.namenode.web.resources.NamenodeWebHdfsMethods$3.run(NamenodeWebHdfsMethods.java:631) > at > org.apache.hadoop.hdfs.server.namenode.web.resources.NamenodeWebHdfsMethods$3.run(NamenodeWebHdfsMethods.java:626) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1560) > at > org.apache.hadoop.hdfs.server.namenode.web.resources.NamenodeWebHdfsMethods.get(NamenodeWebHdfsMethods.java:626) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at > com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60) > at > 
com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$ResponseOutInvoker._dispatch(AbstractResourceMethodDispatchProvider.java:205) > at > com.sun.jersey.server.impl.model.method.dispatch.ResourceJavaMethodDispatcher.dispatch(ResourceJavaMethodDispatcher.java:75) > at > com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:288) > at > com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147) > at > com.sun.jersey.server.impl.uri.rules.Res
[jira] [Updated] (HDFS-5857) TestWebHDFS#testNamenodeRestart fails intermittently with NPE
[ https://issues.apache.org/jira/browse/HDFS-5857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai updated HDFS-5857: Affects Version/s: 3.0.0 Status: Patch Available (was: Open) > TestWebHDFS#testNamenodeRestart fails intermittently with NPE > - > > Key: HDFS-5857 > URL: https://issues.apache.org/jira/browse/HDFS-5857 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.2.0, 3.0.0 >Reporter: Mit Desai >Assignee: Mit Desai > Attachments: HDFS-5857.patch > > > {noformat} > java.lang.AssertionError: There are 1 exception(s): > Exception 0: > org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): null > at > org.apache.hadoop.hdfs.web.JsonUtil.toRemoteException(JsonUtil.java:157) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.validateResponse(WebHdfsFileSystem.java:315) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$700(WebHdfsFileSystem.java:105) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.getResponse(WebHdfsFileSystem.java:625) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:532) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.run(WebHdfsFileSystem.java:421) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.mkdirs(WebHdfsFileSystem.java:701) > at org.apache.hadoop.fs.FileSystem.mkdirs(FileSystem.java:1816) > at org.apache.hadoop.hdfs.DFSTestUtil.createFile(DFSTestUtil.java:196) > at > org.apache.hadoop.hdfs.TestDFSClientRetries$6.run(TestDFSClientRetries.java:920) > at java.lang.Thread.run(Thread.java:722) > at org.junit.Assert.fail(Assert.java:93) > at > org.apache.hadoop.hdfs.TestDFSClientRetries.assertEmpty(TestDFSClientRetries.java:1031) > at > org.apache.hadoop.hdfs.TestDFSClientRetries.namenodeRestartTest(TestDFSClientRetries.java:951) > at > org.apache.hadoop.hdfs.web.TestWebHDFS.testNamenodeRestart(TestWebHDFS.java:216) > {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-5857) TestWebHDFS#testNamenodeRestart fails intermittently with NPE
[ https://issues.apache.org/jira/browse/HDFS-5857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai updated HDFS-5857: Attachment: HDFS-5857.patch Attaching the patch > TestWebHDFS#testNamenodeRestart fails intermittently with NPE > - > > Key: HDFS-5857 > URL: https://issues.apache.org/jira/browse/HDFS-5857 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.0.0, 2.2.0 >Reporter: Mit Desai >Assignee: Mit Desai > Attachments: HDFS-5857.patch > > > {noformat} > java.lang.AssertionError: There are 1 exception(s): > Exception 0: > org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): null > at > org.apache.hadoop.hdfs.web.JsonUtil.toRemoteException(JsonUtil.java:157) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.validateResponse(WebHdfsFileSystem.java:315) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$700(WebHdfsFileSystem.java:105) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.getResponse(WebHdfsFileSystem.java:625) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:532) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.run(WebHdfsFileSystem.java:421) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.mkdirs(WebHdfsFileSystem.java:701) > at org.apache.hadoop.fs.FileSystem.mkdirs(FileSystem.java:1816) > at org.apache.hadoop.hdfs.DFSTestUtil.createFile(DFSTestUtil.java:196) > at > org.apache.hadoop.hdfs.TestDFSClientRetries$6.run(TestDFSClientRetries.java:920) > at java.lang.Thread.run(Thread.java:722) > at org.junit.Assert.fail(Assert.java:93) > at > org.apache.hadoop.hdfs.TestDFSClientRetries.assertEmpty(TestDFSClientRetries.java:1031) > at > org.apache.hadoop.hdfs.TestDFSClientRetries.namenodeRestartTest(TestDFSClientRetries.java:951) > at > org.apache.hadoop.hdfs.web.TestWebHDFS.testNamenodeRestart(TestWebHDFS.java:216) > {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Assigned] (HDFS-5857) TestWebHDFS#testNamenodeRestart fails intermittently with NPE
[ https://issues.apache.org/jira/browse/HDFS-5857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai reassigned HDFS-5857: --- Assignee: Mit Desai > TestWebHDFS#testNamenodeRestart fails intermittently with NPE > - > > Key: HDFS-5857 > URL: https://issues.apache.org/jira/browse/HDFS-5857 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.2.0 >Reporter: Mit Desai >Assignee: Mit Desai > > {noformat} > java.lang.AssertionError: There are 1 exception(s): > Exception 0: > org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): null > at > org.apache.hadoop.hdfs.web.JsonUtil.toRemoteException(JsonUtil.java:157) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.validateResponse(WebHdfsFileSystem.java:315) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$700(WebHdfsFileSystem.java:105) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.getResponse(WebHdfsFileSystem.java:625) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:532) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.run(WebHdfsFileSystem.java:421) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.mkdirs(WebHdfsFileSystem.java:701) > at org.apache.hadoop.fs.FileSystem.mkdirs(FileSystem.java:1816) > at org.apache.hadoop.hdfs.DFSTestUtil.createFile(DFSTestUtil.java:196) > at > org.apache.hadoop.hdfs.TestDFSClientRetries$6.run(TestDFSClientRetries.java:920) > at java.lang.Thread.run(Thread.java:722) > at org.junit.Assert.fail(Assert.java:93) > at > org.apache.hadoop.hdfs.TestDFSClientRetries.assertEmpty(TestDFSClientRetries.java:1031) > at > org.apache.hadoop.hdfs.TestDFSClientRetries.namenodeRestartTest(TestDFSClientRetries.java:951) > at > org.apache.hadoop.hdfs.web.TestWebHDFS.testNamenodeRestart(TestWebHDFS.java:216) > {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-5950) The DFSClient and DataNode should use shared memory segments to communicate short-circuit information
[ https://issues.apache.org/jira/browse/HDFS-5950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13918570#comment-13918570 ] Mit Desai commented on HDFS-5950: - Hey, I just found that this check-in causes a Release Audit warning for the empty file _TestShortCircuitShm.java_. > The DFSClient and DataNode should use shared memory segments to communicate > short-circuit information > - > > Key: HDFS-5950 > URL: https://issues.apache.org/jira/browse/HDFS-5950 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: datanode, hdfs-client >Affects Versions: 2.4.0 >Reporter: Colin Patrick McCabe >Assignee: Colin Patrick McCabe > Fix For: 2.4.0 > > Attachments: HDFS-5950.001.patch, HDFS-5950.003.patch, > HDFS-5950.004.patch, HDFS-5950.006.patch, HDFS-5950.007.patch, > HDFS-5950.008.patch > > > The DFSClient and DataNode should use the shared memory segments and unified > cache added in the other HDFS-5182 subtasks to communicate short-circuit > information. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6035) TestCacheDirectives#testCacheManagerRestart is failing on branch-2
[ https://issues.apache.org/jira/browse/HDFS-6035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13918110#comment-13918110 ] Mit Desai commented on HDFS-6035: - Thanks for taking this issue, Sathish. This test is failing in our nightly builds but I am unable to reproduce it. Is there a specific way you were able to reproduce it? > TestCacheDirectives#testCacheManagerRestart is failing on branch-2 > -- > > Key: HDFS-6035 > URL: https://issues.apache.org/jira/browse/HDFS-6035 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Affects Versions: 2.4.0 >Reporter: Mit Desai >Assignee: sathish > Attachments: HDFS-6035-0001.patch > > > {noformat} > java.io.IOException: Inconsistent checkpoint fields. > LV = -51 namespaceID = 1641397469 cTime = 0 ; clusterId = testClusterID ; > blockpoolId = BP-423574854-x.x.x.x-1393478669835. > Expecting respectively: -51; 2; 0; testClusterID; > BP-2051361571-x.x.x.x-1393478572877. > at > org.apache.hadoop.hdfs.server.namenode.CheckpointSignature.validateStorageInfo(CheckpointSignature.java:133) > at > org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doCheckpoint(SecondaryNameNode.java:526) > at > org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives.testCacheManagerRestart(TestCacheDirectives.java:582) > {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HDFS-6035) TestCacheDirectives#testCacheManagerRestart is failing on branch-2
Mit Desai created HDFS-6035: --- Summary: TestCacheDirectives#testCacheManagerRestart is failing on branch-2 Key: HDFS-6035 URL: https://issues.apache.org/jira/browse/HDFS-6035 Project: Hadoop HDFS Issue Type: Bug Components: test Affects Versions: 2.4.0 Reporter: Mit Desai {noformat} java.io.IOException: Inconsistent checkpoint fields. LV = -51 namespaceID = 1641397469 cTime = 0 ; clusterId = testClusterID ; blockpoolId = BP-423574854-x.x.x.x-1393478669835. Expecting respectively: -51; 2; 0; testClusterID; BP-2051361571-x.x.x.x-1393478572877. at org.apache.hadoop.hdfs.server.namenode.CheckpointSignature.validateStorageInfo(CheckpointSignature.java:133) at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doCheckpoint(SecondaryNameNode.java:526) at org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives.testCacheManagerRestart(TestCacheDirectives.java:582) {noformat} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5780) TestRBWBlockInvalidation times out intermittently on branch-2
[ https://issues.apache.org/jira/browse/HDFS-5780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai updated HDFS-5780: Attachment: HDFS-5780-v3.patch New patch attached. No code changes from the previous patch; it only updates the comment where the thread timeout was changed from 1 sec to 2 sec. > TestRBWBlockInvalidation times out intermittently on branch-2 > > > Key: HDFS-5780 > URL: https://issues.apache.org/jira/browse/HDFS-5780 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.0.0, 2.2.0 >Reporter: Mit Desai >Assignee: Mit Desai > Attachments: HDFS-5780-v3.patch, HDFS-5780.patch, HDFS-5780.patch > > > I recently found out that the test > TestRBWBlockInvalidation#testBlockInvalidationWhenRBWReplicaMissedInDN times > out intermittently. > I am using Fedora, JDK7 -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5780) TestRBWBlockInvalidation times out intermittently on branch-2
[ https://issues.apache.org/jira/browse/HDFS-5780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai updated HDFS-5780: Status: Patch Available (was: Open) > TestRBWBlockInvalidation times out intermittently on branch-2 > > > Key: HDFS-5780 > URL: https://issues.apache.org/jira/browse/HDFS-5780 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.2.0, 3.0.0 >Reporter: Mit Desai >Assignee: Mit Desai > Attachments: HDFS-5780.patch, HDFS-5780.patch > > > I recently found out that the test > TestRBWBlockInvalidation#testBlockInvalidationWhenRBWReplicaMissedInDN times > out intermittently. > I am using Fedora, JDK7 -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5780) TestRBWBlockInvalidation times out intermittently on branch-2
[ https://issues.apache.org/jira/browse/HDFS-5780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai updated HDFS-5780: Attachment: HDFS-5780.patch Attaching the new patch with the requested changes addressed. I have increased the timeout to 10 minutes and had to make a few other timing-related changes. > TestRBWBlockInvalidation times out intermittently on branch-2 > > > Key: HDFS-5780 > URL: https://issues.apache.org/jira/browse/HDFS-5780 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.0.0, 2.2.0 >Reporter: Mit Desai >Assignee: Mit Desai > Attachments: HDFS-5780.patch, HDFS-5780.patch > > > I recently found out that the test > TestRBWBlockInvalidation#testBlockInvalidationWhenRBWReplicaMissedInDN times > out intermittently. > I am using Fedora, JDK7 -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5780) TestRBWBlockInvalidation times out intermittently on branch-2
[ https://issues.apache.org/jira/browse/HDFS-5780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai updated HDFS-5780: Status: Open (was: Patch Available) > TestRBWBlockInvalidation times out intermittently on branch-2 > > > Key: HDFS-5780 > URL: https://issues.apache.org/jira/browse/HDFS-5780 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.2.0, 3.0.0 >Reporter: Mit Desai >Assignee: Mit Desai > Attachments: HDFS-5780.patch > > > I recently found out that the test > TestRBWBlockInvalidation#testBlockInvalidationWhenRBWReplicaMissedInDN times > out intermittently. > I am using Fedora, JDK7 -- This message was sent by Atlassian JIRA (v6.1.5#6160)