[jira] [Commented] (HDFS-742) A down DataNode makes Balancer to hang on repeatingly asking NameNode its partial block list

2015-07-28 Thread Mit Desai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14644357#comment-14644357
 ] 

Mit Desai commented on HDFS-742:


Attached a modified patch. However, I still do not have a unit test for the fix.

> A down DataNode makes Balancer to hang on repeatingly asking NameNode its 
> partial block list
> 
>
> Key: HDFS-742
> URL: https://issues.apache.org/jira/browse/HDFS-742
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer & mover
>Reporter: Hairong Kuang
>Assignee: Mit Desai
> Attachments: HDFS-742-trunk.patch, HDFS-742.patch
>
>
> We had a balancer that had not made any progress for a long time. It turned 
> out it was repeatedly asking the Namenode for a partial block list of one 
> datanode, which was down while the balancer was running.
> NameNode should notify Balancer that the datanode is not available and 
> Balancer should stop asking for the datanode's block list.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-742) A down DataNode makes Balancer to hang on repeatingly asking NameNode its partial block list

2015-07-28 Thread Mit Desai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mit Desai updated HDFS-742:
---
Attachment: HDFS-742-trunk.patch

> A down DataNode makes Balancer to hang on repeatingly asking NameNode its 
> partial block list
> 
>
> Key: HDFS-742
> URL: https://issues.apache.org/jira/browse/HDFS-742
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer & mover
>Reporter: Hairong Kuang
>Assignee: Mit Desai
> Attachments: HDFS-742-trunk.patch, HDFS-742.patch
>
>
> We had a balancer that had not made any progress for a long time. It turned 
> out it was repeatedly asking the Namenode for a partial block list of one 
> datanode, which was down while the balancer was running.
> NameNode should notify Balancer that the datanode is not available and 
> Balancer should stop asking for the datanode's block list.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7364) Balancer always show zero Bytes Already Moved

2014-11-06 Thread Mit Desai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14200425#comment-14200425
 ] 

Mit Desai commented on HDFS-7364:
-

Nice catch. The balancer exits here after 5 iterations of what it thinks are 
0 B moves. That means it is still balancing and exits in the middle of the 
process. I see that the Bytes Left To Move is going down in every iteration.
It would be nice to have this fixed, but it would be good to have a unit test 
as well.
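
For illustration, here is a minimal, self-contained sketch of the exit behaviour 
described above: if the reported "Bytes Already Moved" stays at zero for several 
consecutive iterations, the run gives up even though "Bytes Left To Move" is 
still shrinking. The class, method names and messages are assumptions for the 
sketch, not the actual Balancer code; only the 5-iteration limit comes from the 
observation above.

{code}
// Illustrative sketch of the premature exit discussed above; names, messages
// and the driving Iteration interface are assumptions, not real Balancer code.
public class BalancerExitSketch {
  private static final int MAX_NOT_MOVED_ITERATIONS = 5; // limit observed above

  // Stand-in for one balancer iteration and what it reports.
  interface Iteration {
    long bytesMoved();      // what the iteration thinks it moved
    long bytesLeftToMove(); // how much imbalance remains
  }

  static String run(Iteration iter) {
    int notMoved = 0;
    for (int i = 0; ; i++) {
      long moved = iter.bytesMoved();
      long left = iter.bytesLeftToMove();
      System.out.printf("iteration %d: moved=%d, left=%d%n", i, moved, left);
      if (left == 0) {
        return "The cluster is balanced. Exiting...";
      }
      // The pattern reported above: if "moved" is (mis)reported as 0 B for
      // several iterations in a row, the run exits even though "left" is
      // still going down.
      notMoved = (moved == 0) ? notMoved + 1 : 0;
      if (notMoved >= MAX_NOT_MOVED_ITERATIONS) {
        return "Giving up after " + notMoved + " iterations with no reported progress.";
      }
    }
  }
}
{code}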

> Balancer always show zero Bytes Already Moved
> -
>
> Key: HDFS-7364
> URL: https://issues.apache.org/jira/browse/HDFS-7364
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer & mover
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Tsz Wo Nicholas Sze
> Attachments: h7364_20141105.patch
>
>
> Here is an example:
> {noformat}
> Time Stamp             Iteration#  Bytes Already Moved  Bytes Left To Move  Bytes Being Moved
> Nov 5, 2014 5:23:38 PM  0           0 B                  116.82 MB           181.07 MB
> Nov 5, 2014 5:24:30 PM  1           0 B                  88.05 MB            181.07 MB
> Nov 5, 2014 5:25:10 PM  2           0 B                  73.08 MB            181.07 MB
> Nov 5, 2014 5:25:49 PM  3           0 B                  13.37 MB            90.53 MB
> Nov 5, 2014 5:26:30 PM  4           0 B                  13.59 MB            90.53 MB
> Nov 5, 2014 5:27:12 PM  5           0 B                  9.25 MB             90.53 MB
> The cluster is balanced. Exiting...
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7230) Add rolling downgrade documentation

2014-10-30 Thread Mit Desai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14190226#comment-14190226
 ] 

Mit Desai commented on HDFS-7230:
-

+1 (non-binding)

Thanks for the patch [~szetszwo].

> Add rolling downgrade documentation
> ---
>
> Key: HDFS-7230
> URL: https://issues.apache.org/jira/browse/HDFS-7230
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: documentation
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Tsz Wo Nicholas Sze
> Attachments: h7230_20141028.patch
>
>
> HDFS-5535 made a lot of improvement on rolling upgrade.  It also added the 
> cluster downgrade feature.  However, the downgrade described in HDFS-5535 
> requires cluster downtime.  In this JIRA, we discuss how to do rolling 
> downgrade, i.e. downgrade without downtime.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-6687) nn.getNamesystem() may return NPE from JspHelper

2014-09-26 Thread Mit Desai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mit Desai updated HDFS-6687:

Target Version/s: 2.6.0  (was: 2.5.0)

> nn.getNamesystem() may return NPE from JspHelper
> 
>
> Key: HDFS-6687
> URL: https://issues.apache.org/jira/browse/HDFS-6687
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.5.0
>Reporter: Mit Desai
>Assignee: Mit Desai
>
> In hadoop-2, the HTTP server is started at a very early stage to show the 
> startup progress. If a request tries to get the namesystem before it is 
> completely up, the NN logs will show this kind of error.
> {noformat}
> 2014-07-14 15:49:03,521 [***] WARN
> resources.ExceptionHandler: INTERNAL_SERVER_ERROR
> java.lang.NullPointerException
> at
> org.apache.hadoop.hdfs.server.common.JspHelper.getTokenUGI(JspHelper.java:661)
> at
> org.apache.hadoop.hdfs.server.common.JspHelper.getUGI(JspHelper.java:604)
> at
> org.apache.hadoop.hdfs.web.resources.UserProvider.getValue(UserProvider.java:53)
> at
> org.apache.hadoop.hdfs.web.resources.UserProvider.getValue(UserProvider.java:41)
> at
> com.sun.jersey.server.impl.inject.InjectableValuesProvider.getInjectableValues(InjectableValuesProvider.java:46)
> at
> com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$EntityParamInInvoker.getParams(AbstractResourceMethodDispatchProvider.java:153)
> at
> com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$ResponseOutInvoker._dispatch(AbstractResourceMethodDispatchProvider.java:203)
> at
> com.sun.jersey.server.impl.model.method.dispatch.ResourceJavaMethodDispatcher.dispatch(ResourceJavaMethodDispatcher.java:75)
> at
> com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:288)
> at
> com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
> at
> com.sun.jersey.server.impl.uri.rules.ResourceClassRule.accept(ResourceClassRule.java:108)
> at
> com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
> at
> com.sun.jersey.server.impl.uri.rules.RootResourceClassesRule.accept(RootResourceClassesRule.java:84)
> at
> com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1469)
> at
> com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1400)
> at
> com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1349)
> at
> com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1339)
> at
> com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:416)
> at
> com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:537)
> at
> com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:699)
> at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
> at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
> at
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1221)
> at org.apache.hadoop.hdfs.web.AuthFilter.doFilter(AuthFilter.java:84)
> at
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
> at org.mortbay.servlet.UserAgentFilter.doFilter(UserAgentFilter.java:78)
> at com.yahoo.hadoop.GzipFilter.doFilter(GzipFilter.java:220)
> at
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
> at
> org.apache.hadoop.http.HttpServer2$QuotingInputFilter.doFilter(HttpServer2.java:1223)
> at
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
> at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
> at
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
> at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
> at
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
> at 
> org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
> at
> org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
> at 
> org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
> at 
> org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
> at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
> at
> org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
> at 
> org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
> at org.

[jira] [Created] (HDFS-6983) TestBalancer#testExitZeroOnSuccess fails intermittently

2014-09-02 Thread Mit Desai (JIRA)
Mit Desai created HDFS-6983:
---

 Summary: TestBalancer#testExitZeroOnSuccess fails intermittently
 Key: HDFS-6983
 URL: https://issues.apache.org/jira/browse/HDFS-6983
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.5.1
Reporter: Mit Desai


TestBalancer#testExitZeroOnSuccess fails intermittently on branch-2, and 
probably fails on trunk too.

The test failed 1 in 20 times when I ran it in a loop. Here is how it fails.

{noformat}
org.apache.hadoop.hdfs.server.balancer.TestBalancer
testExitZeroOnSuccess(org.apache.hadoop.hdfs.server.balancer.TestBalancer)  
Time elapsed: 53.965 sec  <<< ERROR!
java.util.concurrent.TimeoutException: Rebalancing expected avg utilization to 
become 0.2, but on datanode 127.0.0.1:35502 it remains at 0.08 after more than 
4 msec.
at 
org.apache.hadoop.hdfs.server.balancer.TestBalancer.waitForBalancer(TestBalancer.java:321)
at 
org.apache.hadoop.hdfs.server.balancer.TestBalancer.runBalancerCli(TestBalancer.java:632)
at 
org.apache.hadoop.hdfs.server.balancer.TestBalancer.doTest(TestBalancer.java:549)
at 
org.apache.hadoop.hdfs.server.balancer.TestBalancer.doTest(TestBalancer.java:437)
at 
org.apache.hadoop.hdfs.server.balancer.TestBalancer.oneNodeTest(TestBalancer.java:645)
at 
org.apache.hadoop.hdfs.server.balancer.TestBalancer.testExitZeroOnSuccess(TestBalancer.java:845)


Results :

Tests in error: 
  
TestBalancer.testExitZeroOnSuccess:845->oneNodeTest:645->doTest:437->doTest:549->runBalancerCli:632->waitForBalancer:321
 Timeout
{noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6754) TestNamenodeCapacityReport.testXceiverCount may sometimes fail due to lack of retry

2014-07-28 Thread Mit Desai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14076918#comment-14076918
 ] 

Mit Desai commented on HDFS-6754:
-

Thanks Daryn!

> TestNamenodeCapacityReport.testXceiverCount may sometimes fail due to lack of 
> retry
> ---
>
> Key: HDFS-6754
> URL: https://issues.apache.org/jira/browse/HDFS-6754
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Mit Desai
>Assignee: Mit Desai
> Attachments: HDFS-6754.patch, HDFS-6754.patch
>
>
> I have seen TestNamenodeCapacityReport.testXceiverCount fail intermittently 
> in our nightly builds with the following error:
> {noformat}
> java.io.IOException: Unable to close file because the last block does not 
> have enough number of replicas.
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:2151)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:2119)
>   at 
> org.apache.hadoop.hdfs.server.namenode.TestNamenodeCapacityReport.testXceiverCount(TestNamenodeCapacityReport.java:281)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6754) TestNamenodeCapacityReport.testXceiverCount may sometimes fail due to lack of retry

2014-07-28 Thread Mit Desai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14076649#comment-14076649
 ] 

Mit Desai commented on HDFS-6754:
-

These test failures are not related to the patch.

> TestNamenodeCapacityReport.testXceiverCount may sometimes fail due to lack of 
> retry
> ---
>
> Key: HDFS-6754
> URL: https://issues.apache.org/jira/browse/HDFS-6754
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Mit Desai
>Assignee: Mit Desai
> Attachments: HDFS-6754.patch, HDFS-6754.patch
>
>
> I have seen TestNamenodeCapacityReport.testXceiverCount fail intermittently 
> in our nightly builds with the following error:
> {noformat}
> java.io.IOException: Unable to close file because the last block does not 
> have enough number of replicas.
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:2151)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:2119)
>   at 
> org.apache.hadoop.hdfs.server.namenode.TestNamenodeCapacityReport.testXceiverCount(TestNamenodeCapacityReport.java:281)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6754) TestNamenodeCapacityReport.testXceiverCount may sometimes fail due to lack of retry

2014-07-28 Thread Mit Desai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mit Desai updated HDFS-6754:


Status: Patch Available  (was: Open)

> TestNamenodeCapacityReport.testXceiverCount may sometimes fail due to lack of 
> retry
> ---
>
> Key: HDFS-6754
> URL: https://issues.apache.org/jira/browse/HDFS-6754
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Mit Desai
>Assignee: Mit Desai
> Attachments: HDFS-6754.patch, HDFS-6754.patch
>
>
> I have seen TestNamenodeCapacityReport.testXceiverCount fail intermittently 
> in our nightly builds with the following error:
> {noformat}
> java.io.IOException: Unable to close file because the last block does not 
> have enough number of replicas.
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:2151)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:2119)
>   at 
> org.apache.hadoop.hdfs.server.namenode.TestNamenodeCapacityReport.testXceiverCount(TestNamenodeCapacityReport.java:281)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6754) TestNamenodeCapacityReport.testXceiverCount may sometimes fail due to lack of retry

2014-07-28 Thread Mit Desai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mit Desai updated HDFS-6754:


Attachment: HDFS-6754.patch

Refined the patch to update the comment describing the changed line.

> TestNamenodeCapacityReport.testXceiverCount may sometimes fail due to lack of 
> retry
> ---
>
> Key: HDFS-6754
> URL: https://issues.apache.org/jira/browse/HDFS-6754
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Mit Desai
>Assignee: Mit Desai
> Attachments: HDFS-6754.patch, HDFS-6754.patch
>
>
> I have seen TestNamenodeCapacityReport.testXceiverCount fail intermittently 
> in our nightly builds with the following error:
> {noformat}
> java.io.IOException: Unable to close file because the last block does not 
> have enough number of replicas.
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:2151)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:2119)
>   at 
> org.apache.hadoop.hdfs.server.namenode.TestNamenodeCapacityReport.testXceiverCount(TestNamenodeCapacityReport.java:281)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6754) TestNamenodeCapacityReport.testXceiverCount may sometimes fail due to lack of retry

2014-07-28 Thread Mit Desai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mit Desai updated HDFS-6754:


Status: Open  (was: Patch Available)

> TestNamenodeCapacityReport.testXceiverCount may sometimes fail due to lack of 
> retry
> ---
>
> Key: HDFS-6754
> URL: https://issues.apache.org/jira/browse/HDFS-6754
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Mit Desai
>Assignee: Mit Desai
> Attachments: HDFS-6754.patch
>
>
> I have seen TestNamenodeCapacityReport.testXceiverCount fail intermittently 
> in our nightly builds with the following error:
> {noformat}
> java.io.IOException: Unable to close file because the last block does not 
> have enough number of replicas.
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:2151)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:2119)
>   at 
> org.apache.hadoop.hdfs.server.namenode.TestNamenodeCapacityReport.testXceiverCount(TestNamenodeCapacityReport.java:281)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6754) TestNamenodeCapacityReport.testXceiverCount may sometimes fail due to lack of retry

2014-07-28 Thread Mit Desai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mit Desai updated HDFS-6754:


Attachment: HDFS-6754.patch

Attaching patch to enable retries.
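
For readers following the thread, a minimal sketch of one way a test can 
"enable retries" for this failure mode: raise the client-side retry count that 
the completeFile() path consumes before it gives up with the quoted "not enough 
number of replicas" error. Whether the attached patch does it exactly this way 
is not stated here; the key is the standard HDFS client setting, and the value 
10 is an arbitrary assumption for the sketch.

{code}
import org.apache.hadoop.conf.Configuration;

public class RetryConfSketch {
  // Build a Configuration with more attempts for the "complete file" call, so a
  // slow write pipeline on a busy mini-cluster is less likely to surface as
  // "Unable to close file because the last block does not have enough number of
  // replicas." The value 10 is an assumption, not taken from the patch.
  public static Configuration buildConf() {
    Configuration conf = new Configuration();
    conf.setInt("dfs.client.block.write.locateFollowingBlock.retries", 10);
    return conf;
  }
}
{code}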

> TestNamenodeCapacityReport.testXceiverCount may sometimes fail due to lack of 
> retry
> ---
>
> Key: HDFS-6754
> URL: https://issues.apache.org/jira/browse/HDFS-6754
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Mit Desai
>Assignee: Mit Desai
> Attachments: HDFS-6754.patch
>
>
> I have seen TestNamenodeCapacityReport.testXceiverCount fail intermittently 
> in our nightly builds with the following error:
> {noformat}
> java.io.IOException: Unable to close file because the last block does not 
> have enough number of replicas.
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:2151)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:2119)
>   at 
> org.apache.hadoop.hdfs.server.namenode.TestNamenodeCapacityReport.testXceiverCount(TestNamenodeCapacityReport.java:281)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6754) TestNamenodeCapacityReport.testXceiverCount may sometimes fail due to lack of retry

2014-07-28 Thread Mit Desai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mit Desai updated HDFS-6754:


Status: Patch Available  (was: Open)

> TestNamenodeCapacityReport.testXceiverCount may sometimes fail due to lack of 
> retry
> ---
>
> Key: HDFS-6754
> URL: https://issues.apache.org/jira/browse/HDFS-6754
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Mit Desai
>Assignee: Mit Desai
> Attachments: HDFS-6754.patch
>
>
> I have seen TestNamenodeCapacityReport.testXceiverCount fail intermittently 
> in our nightly builds with the following error:
> {noformat}
> java.io.IOException: Unable to close file because the last block does not 
> have enough number of replicas.
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:2151)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:2119)
>   at 
> org.apache.hadoop.hdfs.server.namenode.TestNamenodeCapacityReport.testXceiverCount(TestNamenodeCapacityReport.java:281)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6755) There is an unnecessary sleep in the code path where DFSOutputStream#close gives up its attempt to contact the namenode

2014-07-25 Thread Mit Desai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14075236#comment-14075236
 ] 

Mit Desai commented on HDFS-6755:
-

Thanks Colin!

> There is an unnecessary sleep in the code path where DFSOutputStream#close 
> gives up its attempt to contact the namenode
> ---
>
> Key: HDFS-6755
> URL: https://issues.apache.org/jira/browse/HDFS-6755
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.6.0
>Reporter: Mit Desai
>Assignee: Mit Desai
> Attachments: HDFS-6755.patch
>
>
> DFSOutputStream#close has a loop where it tries to contact the NameNode, to 
> call {{complete}} on the file which is open-for-write.  This loop includes a 
> sleep which increases exponentially (exponential backoff).  It makes sense to 
> sleep before re-contacting the NameNode, but the code also sleeps even in the 
> case where it has already decided to give up and throw an exception back to 
> the user.  It should not sleep after it has already decided to give up, since 
> there's no point.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6755) Make DFSOutputStream more efficient

2014-07-25 Thread Mit Desai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mit Desai updated HDFS-6755:


Attachment: HDFS-6755.patch

Hi [~cmccabe],
I did not mean to get rid of the sleep. I have uploaded the patch to indicate 
the change I wanted to make.
I wanted to throw the IOException when {{retries == 0}}, before 
{{Thread.sleep(localTimeout);}} is called.

Does that seem reasonable?
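
To make the proposed ordering concrete, here is a small, self-contained sketch 
(not the actual DFSOutputStream code): the {{retries == 0}} check and the 
IOException come before the sleep, so the client no longer backs off after it 
has already decided to give up. The wrapper method and the tryComplete() 
stand-in are assumptions for the sketch; the other names mirror the snippet 
quoted below.

{code}
import java.io.IOException;

public class CompleteRetrySketch {

  // Stand-in for the namenode completeFile() call; true once the file can close.
  private boolean tryComplete() {
    return false;
  }

  void waitForCompletion(String src, int retries, long localTimeout)
      throws IOException {
    final long localstart = System.currentTimeMillis();
    while (!tryComplete()) {
      if (retries == 0) {
        // Give up right away: no point sleeping once we have decided to fail.
        throw new IOException("Unable to close file because the last block"
            + " does not have enough number of replicas.");
      }
      retries--;
      try {
        // Back off only when another attempt will actually follow.
        Thread.sleep(localTimeout);
      } catch (InterruptedException ie) {
        Thread.currentThread().interrupt();
      }
      localTimeout *= 2; // exponential backoff, as in the quoted snippet
      if (System.currentTimeMillis() - localstart > 5000) {
        System.out.println("Could not complete " + src + " retrying...");
      }
    }
  }
}
{code}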

> Make DFSOutputStream more efficient
> ---
>
> Key: HDFS-6755
> URL: https://issues.apache.org/jira/browse/HDFS-6755
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.6.0
>Reporter: Mit Desai
>Assignee: Mit Desai
> Attachments: HDFS-6755.patch
>
>
> Following code in DFSOutputStream may have an unnecessary sleep.
> {code}
> try {
>   Thread.sleep(localTimeout);
>   if (retries == 0) {
> throw new IOException("Unable to close file because the last 
> block"
> + " does not have enough number of replicas.");
>   }
>   retries--;
>   localTimeout *= 2;
>   if (Time.now() - localstart > 5000) {
> DFSClient.LOG.info("Could not complete " + src + " retrying...");
>   }
> } catch (InterruptedException ie) {
>   DFSClient.LOG.warn("Caught exception ", ie);
> }
> {code}
> Currently, the code sleeps before throwing an exception, which should not be 
> the case.
> The sleep time gets doubled on every iteration, which can have a significant 
> effect if there is more than one iteration, and it would sleep just to throw 
> an exception. We need to move the sleep down after decrementing retries.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6755) Make DFSOutputStream more efficient

2014-07-25 Thread Mit Desai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mit Desai updated HDFS-6755:


Description: 
Following code in DFSOutputStream may have an unnecessary sleep.

{code}
try {
  Thread.sleep(localTimeout);
  if (retries == 0) {
throw new IOException("Unable to close file because the last block"
+ " does not have enough number of replicas.");
  }
  retries--;
  localTimeout *= 2;
  if (Time.now() - localstart > 5000) {
DFSClient.LOG.info("Could not complete " + src + " retrying...");
  }
} catch (InterruptedException ie) {
  DFSClient.LOG.warn("Caught exception ", ie);
}
{code}

Currently, the code sleeps before throwing an exception, which should not be 
the case.
The sleep time gets doubled on every iteration, which can have a significant 
effect if there is more than one iteration, and it would sleep just to throw an 
exception. We need to move the sleep down after decrementing retries.

  was:
Following code in DFSOutputStream may have an unnecessary sleep.

{code}
try {
  Thread.sleep(localTimeout);
  if (retries == 0) {
throw new IOException("Unable to close file because the last block"
+ " does not have enough number of replicas.");
  }
  retries--;
  localTimeout *= 2;
  if (Time.now() - localstart > 5000) {
DFSClient.LOG.info("Could not complete " + src + " retrying...");
  }
} catch (InterruptedException ie) {
  DFSClient.LOG.warn("Caught exception ", ie);
}
{code}

Currently, the code sleeps before throwing an exception, which should not be 
the case.
The sleep time gets doubled on every iteration, which can have a significant 
effect if there is more than one iteration. We need to move the sleep down 
after decrementing retries.


> Make DFSOutputStream more efficient
> ---
>
> Key: HDFS-6755
> URL: https://issues.apache.org/jira/browse/HDFS-6755
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.6.0
>Reporter: Mit Desai
>Assignee: Mit Desai
>
> Following code in DFSOutputStream may have an unnecessary sleep.
> {code}
> try {
>   Thread.sleep(localTimeout);
>   if (retries == 0) {
> throw new IOException("Unable to close file because the last 
> block"
> + " does not have enough number of replicas.");
>   }
>   retries--;
>   localTimeout *= 2;
>   if (Time.now() - localstart > 5000) {
> DFSClient.LOG.info("Could not complete " + src + " retrying...");
>   }
> } catch (InterruptedException ie) {
>   DFSClient.LOG.warn("Caught exception ", ie);
> }
> {code}
> Currently, the code sleeps before throwing an exception, which should not be 
> the case.
> The sleep time gets doubled on every iteration, which can have a significant 
> effect if there is more than one iteration, and it would sleep just to throw 
> an exception. We need to move the sleep down after decrementing retries.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6755) Make DFSOutputStream more efficient

2014-07-25 Thread Mit Desai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mit Desai updated HDFS-6755:


Issue Type: Improvement  (was: Bug)

> Make DFSOutputStream more efficient
> ---
>
> Key: HDFS-6755
> URL: https://issues.apache.org/jira/browse/HDFS-6755
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.6.0
>Reporter: Mit Desai
>Assignee: Mit Desai
>
> Following code in DFSOutputStream may have an unnecessary sleep.
> {code}
> try {
>   Thread.sleep(localTimeout);
>   if (retries == 0) {
> throw new IOException("Unable to close file because the last 
> block"
> + " does not have enough number of replicas.");
>   }
>   retries--;
>   localTimeout *= 2;
>   if (Time.now() - localstart > 5000) {
> DFSClient.LOG.info("Could not complete " + src + " retrying...");
>   }
> } catch (InterruptedException ie) {
>   DFSClient.LOG.warn("Caught exception ", ie);
> }
> {code}
> Currently, the code sleeps before throwing an exception, which should not be 
> the case.
> The sleep time gets doubled on every iteration, which can have a significant 
> effect if there is more than one iteration. We need to move the sleep down 
> after decrementing retries.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HDFS-6755) Make DFSOutputStream more efficient

2014-07-25 Thread Mit Desai (JIRA)
Mit Desai created HDFS-6755:
---

 Summary: Make DFSOutputStream more efficient
 Key: HDFS-6755
 URL: https://issues.apache.org/jira/browse/HDFS-6755
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Mit Desai
Assignee: Mit Desai


Following code in DFSOutputStream may have an unnecessary sleep.

{code}
try {
  Thread.sleep(localTimeout);
  if (retries == 0) {
throw new IOException("Unable to close file because the last block"
+ " does not have enough number of replicas.");
  }
  retries--;
  localTimeout *= 2;
  if (Time.now() - localstart > 5000) {
DFSClient.LOG.info("Could not complete " + src + " retrying...");
  }
} catch (InterruptedException ie) {
  DFSClient.LOG.warn("Caught exception ", ie);
}
{code}

Currently, the code sleeps before throwing an exception, which should not be 
the case.
The sleep time gets doubled on every iteration, which can have a significant 
effect if there is more than one iteration. We need to move the sleep down 
after decrementing retries.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Moved] (HDFS-6754) TestNamenodeCapacityReport.testXceiverCount may sometimes fail due to lack of retry

2014-07-25 Thread Mit Desai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mit Desai moved YARN-2358 to HDFS-6754:
---

 Target Version/s: 2.6.0  (was: 2.6.0)
Affects Version/s: (was: 2.6.0)
   2.6.0
  Key: HDFS-6754  (was: YARN-2358)
  Project: Hadoop HDFS  (was: Hadoop YARN)

> TestNamenodeCapacityReport.testXceiverCount may sometimes fail due to lack of 
> retry
> ---
>
> Key: HDFS-6754
> URL: https://issues.apache.org/jira/browse/HDFS-6754
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Mit Desai
>Assignee: Mit Desai
>
> I have seen TestNamenodeCapacityReport.testXceiverCount fail intermittently 
> in our nightly builds with the following error:
> {noformat}
> java.io.IOException: Unable to close file because the last block does not 
> have enough number of replicas.
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:2151)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:2119)
>   at 
> org.apache.hadoop.hdfs.server.namenode.TestNamenodeCapacityReport.testXceiverCount(TestNamenodeCapacityReport.java:281)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6696) Name node cannot start if the path of a file under construction contains ".snapshot"

2014-07-21 Thread Mit Desai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14069235#comment-14069235
 ] 

Mit Desai commented on HDFS-6696:
-

[~andrew.wang], we were trying to upgrade from 0.21.11 to 2.4.0.

> Name node cannot start if the path of a file under construction contains 
> ".snapshot"
> 
>
> Key: HDFS-6696
> URL: https://issues.apache.org/jira/browse/HDFS-6696
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Kihwal Lee
>Assignee: Andrew Wang
>Priority: Blocker
>
> Using {{-renameReserved}} to rename ".snapshot" in a pre-hdfs-snapshot 
> feature fsimage during upgrade only works if there is nothing under 
> construction under the renamed directory.  I am not sure whether it takes 
> care of edits containing ".snapshot" properly.
> The workaround is to identify these directories and rename, then do 
> {{saveNamespace}} before performing upgrade.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6691) The message on NN UI can be confusing during a rolling upgrade

2014-07-15 Thread Mit Desai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mit Desai updated HDFS-6691:


Attachment: ha2.png

> The message on NN UI can be confusing during a rolling upgrade 
> ---
>
> Key: HDFS-6691
> URL: https://issues.apache.org/jira/browse/HDFS-6691
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.5.0
>Reporter: Mit Desai
>Assignee: Mit Desai
> Attachments: ha1.png, ha2.png
>
>
> On ANN, it says rollback image was created. On SBN, it says otherwise.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6691) The message on NN UI can be confusing during a rolling upgrade

2014-07-15 Thread Mit Desai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mit Desai updated HDFS-6691:


Attachment: ha1.png

> The message on NN UI can be confusing during a rolling upgrade 
> ---
>
> Key: HDFS-6691
> URL: https://issues.apache.org/jira/browse/HDFS-6691
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.5.0
>Reporter: Mit Desai
>Assignee: Mit Desai
> Attachments: ha1.png
>
>
> On ANN, it says rollback image was created. On SBN, it says otherwise.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HDFS-6691) The message on NN UI can be confusing during a rolling upgrade

2014-07-15 Thread Mit Desai (JIRA)
Mit Desai created HDFS-6691:
---

 Summary: The message on NN UI can be confusing during a rolling 
upgrade 
 Key: HDFS-6691
 URL: https://issues.apache.org/jira/browse/HDFS-6691
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.5.0
Reporter: Mit Desai
Assignee: Mit Desai
 Attachments: ha1.png

On ANN, it says rollback image was created. On SBN, it says otherwise.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HDFS-6687) nn.getNamesystem() may return NPE from JspHelper

2014-07-15 Thread Mit Desai (JIRA)
Mit Desai created HDFS-6687:
---

 Summary: nn.getNamesystem() may return NPE from JspHelper
 Key: HDFS-6687
 URL: https://issues.apache.org/jira/browse/HDFS-6687
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.5.0
Reporter: Mit Desai
Assignee: Mit Desai


In hadoop-2, the HTTP server is started at a very early stage to show the 
startup progress. If a request tries to get the namesystem before it is 
completely up, the NN logs will show this kind of error.

{noformat}
2014-07-14 15:49:03,521 [***] WARN
resources.ExceptionHandler: INTERNAL_SERVER_ERROR
java.lang.NullPointerException
at
org.apache.hadoop.hdfs.server.common.JspHelper.getTokenUGI(JspHelper.java:661)
at
org.apache.hadoop.hdfs.server.common.JspHelper.getUGI(JspHelper.java:604)
at
org.apache.hadoop.hdfs.web.resources.UserProvider.getValue(UserProvider.java:53)
at
org.apache.hadoop.hdfs.web.resources.UserProvider.getValue(UserProvider.java:41)
at
com.sun.jersey.server.impl.inject.InjectableValuesProvider.getInjectableValues(InjectableValuesProvider.java:46)
at
com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$EntityParamInInvoker.getParams(AbstractResourceMethodDispatchProvider.java:153)
at
com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$ResponseOutInvoker._dispatch(AbstractResourceMethodDispatchProvider.java:203)
at
com.sun.jersey.server.impl.model.method.dispatch.ResourceJavaMethodDispatcher.dispatch(ResourceJavaMethodDispatcher.java:75)
at
com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:288)
at
com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
at
com.sun.jersey.server.impl.uri.rules.ResourceClassRule.accept(ResourceClassRule.java:108)
at
com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
at
com.sun.jersey.server.impl.uri.rules.RootResourceClassesRule.accept(RootResourceClassesRule.java:84)
at
com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1469)
at
com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1400)
at
com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1349)
at
com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1339)
at
com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:416)
at
com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:537)
at
com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:699)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
at
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1221)
at org.apache.hadoop.hdfs.web.AuthFilter.doFilter(AuthFilter.java:84)
at
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at org.mortbay.servlet.UserAgentFilter.doFilter(UserAgentFilter.java:78)
at com.yahoo.hadoop.GzipFilter.doFilter(GzipFilter.java:220)
at
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at
org.apache.hadoop.http.HttpServer2$QuotingInputFilter.doFilter(HttpServer2.java:1223)
at
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
at
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
at
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
at
org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
at
org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
at org.mortbay.jetty.Server.handle(Server.java:326)
at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
at
org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
at org.mort

[jira] [Commented] (HDFS-6597) Add a new option to NN upgrade to terminate the process after upgrade on NN is completed

2014-06-24 Thread Mit Desai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14042525#comment-14042525
 ] 

Mit Desai commented on HDFS-6597:
-

Changing the summary to describe the jira more accurately.

> Add a new option to NN upgrade to terminate the process after upgrade on NN 
> is completed
> 
>
> Key: HDFS-6597
> URL: https://issues.apache.org/jira/browse/HDFS-6597
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Danilo Vunjak
> Attachments: JIRA-HDFS-30.patch
>
>
> Currently, when the namenode is started for upgrade (hadoop namenode -upgrade 
> command), after finishing the upgrade of metadata, the namenode starts working 
> normally and waits for the datanodes to upgrade themselves and connect to the 
> NN. We need an option for upgrading only the NN metadata, so that after the 
> upgrade is finished on the NN, the process terminates.
> I have tested it by changing the file hdfs.server.namenode.NameNode.java, 
> method public static NameNode createNameNode(String argv[], Configuration 
> conf), adding the following case to the switch:
> case UPGRADE: {
>   DefaultMetricsSystem.initialize("NameNode");
>   NameNode nameNode = new NameNode(conf);
>   if (startOpt.getForceUpgrade()) {
>     terminate(0);
>     return null;
>   }
>   return nameNode;
> }
> This did the upgrade of metadata and closed the process after it finished; 
> later, when all services were started, the upgrade of the datanodes finished 
> successfully and the system ran.
> What I'm suggesting right now is to add a new startup parameter "-force", so 
> the namenode can be started like "hadoop namenode -upgrade -force" to indicate 
> that we want to terminate the process after the upgrade of metadata on the NN 
> is finished. The old functionality should be preserved, so users can run 
> "hadoop namenode -upgrade" the same way and with the same behaviour as before.
>  Thanks,
>  Danilo



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6597) Add a new option to NN upgrade to terminate the process after upgrade on NN is completed

2014-06-24 Thread Mit Desai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mit Desai updated HDFS-6597:


Summary: Add a new option to NN upgrade to terminate the process after 
upgrade on NN is completed  (was: New option for namenode upgrade)

> Add a new option to NN upgrade to terminate the process after upgrade on NN 
> is completed
> 
>
> Key: HDFS-6597
> URL: https://issues.apache.org/jira/browse/HDFS-6597
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Danilo Vunjak
> Attachments: JIRA-HDFS-30.patch
>
>
> Currently, when the namenode is started for upgrade (hadoop namenode -upgrade 
> command), after finishing the upgrade of metadata, the namenode starts working 
> normally and waits for the datanodes to upgrade themselves and connect to the 
> NN. We need an option for upgrading only the NN metadata, so that after the 
> upgrade is finished on the NN, the process terminates.
> I have tested it by changing the file hdfs.server.namenode.NameNode.java, 
> method public static NameNode createNameNode(String argv[], Configuration 
> conf), adding the following case to the switch:
> case UPGRADE: {
>   DefaultMetricsSystem.initialize("NameNode");
>   NameNode nameNode = new NameNode(conf);
>   if (startOpt.getForceUpgrade()) {
>     terminate(0);
>     return null;
>   }
>   return nameNode;
> }
> This did the upgrade of metadata and closed the process after it finished; 
> later, when all services were started, the upgrade of the datanodes finished 
> successfully and the system ran.
> What I'm suggesting right now is to add a new startup parameter "-force", so 
> the namenode can be started like "hadoop namenode -upgrade -force" to indicate 
> that we want to terminate the process after the upgrade of metadata on the NN 
> is finished. The old functionality should be preserved, so users can run 
> "hadoop namenode -upgrade" the same way and with the same behaviour as before.
>  Thanks,
>  Danilo



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6597) New option for namenode upgrade

2014-06-24 Thread Mit Desai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14042518#comment-14042518
 ] 

Mit Desai commented on HDFS-6597:
-

The idea seems good as it does not alter the way -upgrade currently works.
* I agree with [~cmccabe] and [~cnauroth] on the new name "force"
* Instead of -force, -halt or -upgradeOnly seems reasonable. But anything would 
be good as long as it does not imply we are forcing something to get done that 
it should not be doing.

Thanks,
Mit

> New option for namenode upgrade
> ---
>
> Key: HDFS-6597
> URL: https://issues.apache.org/jira/browse/HDFS-6597
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Danilo Vunjak
> Attachments: JIRA-HDFS-30.patch
>
>
> Currently, when the namenode is started for upgrade (hadoop namenode -upgrade 
> command), after finishing the upgrade of metadata, the namenode starts working 
> normally and waits for the datanodes to upgrade themselves and connect to the 
> NN. We need an option for upgrading only the NN metadata, so that after the 
> upgrade is finished on the NN, the process terminates.
> I have tested it by changing the file hdfs.server.namenode.NameNode.java, 
> method public static NameNode createNameNode(String argv[], Configuration 
> conf), adding the following case to the switch:
> case UPGRADE: {
>   DefaultMetricsSystem.initialize("NameNode");
>   NameNode nameNode = new NameNode(conf);
>   if (startOpt.getForceUpgrade()) {
>     terminate(0);
>     return null;
>   }
>   return nameNode;
> }
> This did the upgrade of metadata and closed the process after it finished; 
> later, when all services were started, the upgrade of the datanodes finished 
> successfully and the system ran.
> What I'm suggesting right now is to add a new startup parameter "-force", so 
> the namenode can be started like "hadoop namenode -upgrade -force" to indicate 
> that we want to terminate the process after the upgrade of metadata on the NN 
> is finished. The old functionality should be preserved, so users can run 
> "hadoop namenode -upgrade" the same way and with the same behaviour as before.
>  Thanks,
>  Danilo



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-742) A down DataNode makes Balancer to hang on repeatingly asking NameNode its partial block list

2014-06-10 Thread Mit Desai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14026612#comment-14026612
 ] 

Mit Desai commented on HDFS-742:


Attaching the patch. Unfortunately I do not have a way to reproduce the issue, 
so I'm unable to add a test that verifies the change.
Here is an explanation of the part of the Balancer code that makes it hang forever.

In the following while loop in Balancer.java, when the Balancer figures out 
that it should fetch more blocks, it gets the block list, decrements 
{{blocksToReceive}} by that many blocks, and then starts again from the top of 
the loop.

{code}
while (!isTimeUp && getScheduledSize() > 0 &&
    (!srcBlockList.isEmpty() || blocksToReceive > 0)) {

  ## SOME LINES OMITTED ##

  filterMovedBlocks(); // filter already moved blocks
  if (shouldFetchMoreBlocks()) {
    // fetch new blocks
    try {
      blocksToReceive -= getBlockList();
      continue;
    } catch (IOException e) {

  ## SOME LINES OMITTED ##

  // check if time is up or not
  if (Time.now() - startTime > MAX_ITERATION_TIME) {
    isTimeUp = true;
    continue;
  }

  ## SOME LINES OMITTED ##
}
{code}

The problem here is that if the datanode is decommissioned, the {{getBlockList()}} 
method will not return anything and {{blocksToReceive}} will not change. The loop 
keeps doing this indefinitely because {{blocksToReceive}} always stays greater 
than 0, and {{isTimeUp}} is never set to true because the code never reaches that 
check. In the submitted patch, the time-up condition is moved to the top of the 
loop, so each iteration first checks whether {{isTimeUp}} is set and proceeds 
only if the time limit has not been reached.
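
To make the change concrete, here is a small, self-contained sketch of the loop 
with the time-up check hoisted to the top. Only the shape of the loop mirrors 
the excerpt above; the stand-in class, the stubbed methods (getBlockList() 
returning 0 for a down datanode) and the constant value are assumptions for 
illustration, so this is not the actual patch.

{code}
import java.io.IOException;
import java.util.ArrayDeque;
import java.util.Deque;

public class DispatchLoopSketch {
  // 20 minutes; the real constant lives in Balancer.java, the value is assumed here.
  private static final long MAX_ITERATION_TIME = 20 * 60 * 1000L;

  private final Deque<Object> srcBlockList = new ArrayDeque<Object>();
  private long blocksToReceive = 10; // never reaches 0 when the datanode is down
  private long scheduledSize = 1;

  private long getScheduledSize() { return scheduledSize; }
  private boolean shouldFetchMoreBlocks() { return blocksToReceive > 0; }
  private void filterMovedBlocks() { /* stub */ }

  // Stub: a down or decommissioned datanode yields no blocks, so this returns 0.
  private long getBlockList() throws IOException { return 0; }

  void dispatchBlocks() {
    final long startTime = System.currentTimeMillis();
    boolean isTimeUp = false;

    while (!isTimeUp && getScheduledSize() > 0
        && (!srcBlockList.isEmpty() || blocksToReceive > 0)) {

      // Time-up check hoisted to the top of the loop: the iteration now ends
      // even when getBlockList() keeps returning nothing and blocksToReceive
      // never decreases.
      if (System.currentTimeMillis() - startTime > MAX_ITERATION_TIME) {
        isTimeUp = true;
        continue;
      }

      filterMovedBlocks();
      if (shouldFetchMoreBlocks()) {
        try {
          blocksToReceive -= getBlockList();
          continue;
        } catch (IOException e) {
          // The real code handles the failed source; here we just stop fetching.
          blocksToReceive = 0;
        }
      }
    }
  }
}
{code}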

> A down DataNode makes Balancer to hang on repeatingly asking NameNode its 
> partial block list
> 
>
> Key: HDFS-742
> URL: https://issues.apache.org/jira/browse/HDFS-742
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer
>Reporter: Hairong Kuang
>Assignee: Mit Desai
> Attachments: HDFS-742.patch
>
>
> We had a balancer that had not made any progress for a long time. It turned 
> out it was repeatedly asking the Namenode for a partial block list of one 
> datanode, which was down while the balancer was running.
> NameNode should notify Balancer that the datanode is not available and 
> Balancer should stop asking for the datanode's block list.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-742) A down DataNode makes Balancer to hang on repeatingly asking NameNode its partial block list

2014-06-10 Thread Mit Desai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mit Desai updated HDFS-742:
---

Attachment: HDFS-742.patch

> A down DataNode makes Balancer to hang on repeatingly asking NameNode its 
> partial block list
> 
>
> Key: HDFS-742
> URL: https://issues.apache.org/jira/browse/HDFS-742
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer
>Reporter: Hairong Kuang
>Assignee: Mit Desai
> Attachments: HDFS-742.patch
>
>
> We had a balancer that had not made any progress for a long time. It turned 
> out it was repeatedly asking the Namenode for a partial block list of one 
> datanode, which was down while the balancer was running.
> NameNode should notify Balancer that the datanode is not available and 
> Balancer should stop asking for the datanode's block list.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6487) TestStandbyCheckpoint#testSBNCheckpoints is racy

2014-06-06 Thread Mit Desai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14020413#comment-14020413
 ] 

Mit Desai commented on HDFS-6487:
-

Thanks Andrew!

> TestStandbyCheckpoint#testSBNCheckpoints is racy
> 
>
> Key: HDFS-6487
> URL: https://issues.apache.org/jira/browse/HDFS-6487
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.0.0, 2.4.1
>Reporter: Mit Desai
>Assignee: Mit Desai
> Attachments: HDFS-6487.patch, HDFS-6487.patch
>
>
> testSBNCheckpoints fails occasionally.
> I could not reproduce it consistently, but it would fail 8 out of 10 times 
> after I did mvn clean, mvn install and ran the test.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6159) TestBalancerWithNodeGroup.testBalancerWithNodeGroup fails if there is block missing after balancer success

2014-06-06 Thread Mit Desai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14020222#comment-14020222
 ] 

Mit Desai commented on HDFS-6159:
-

This test is failing again.
https://builds.apache.org/job/PreCommit-HDFS-Build/7045//testReport/
[~airbots], can you take a look at this pre-commit?

> TestBalancerWithNodeGroup.testBalancerWithNodeGroup fails if there is block 
> missing after balancer success
> --
>
> Key: HDFS-6159
> URL: https://issues.apache.org/jira/browse/HDFS-6159
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Affects Versions: 2.3.0
>Reporter: Chen He
>Assignee: Chen He
> Fix For: 3.0.0, 2.5.0
>
> Attachments: HDFS-6159-v2.patch, HDFS-6159-v2.patch, HDFS-6159.patch, 
> logs.txt
>
>
> TestBalancerWithNodeGroup.testBalancerWithNodeGroup will report a false 
> failure if there are data blocks lost after the balancer successfully 
> finishes.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6487) TestStandbyCheckpoint#testSBNCheckpoints is racy

2014-06-06 Thread Mit Desai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14020213#comment-14020213
 ] 

Mit Desai commented on HDFS-6487:
-

The failure is not related to the patch submitted. This has been around for a 
long time. HDFS-5807 and HDFS-6159 were filed to resolve it. I will comment on 
those JIRAs.

> TestStandbyCheckpoint#testSBNCheckpoints is racy
> 
>
> Key: HDFS-6487
> URL: https://issues.apache.org/jira/browse/HDFS-6487
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.0.0, 2.4.1
>Reporter: Mit Desai
>Assignee: Mit Desai
> Attachments: HDFS-6487.patch, HDFS-6487.patch
>
>
> testSBNCheckpoints fails occasionally.
> I could not reproduce it consistently, but it would fail 8 out of 10 times 
> after I did mvn clean, mvn install and ran the test.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6487) TestStandbyCheckpoint#testSBNCheckpoints is racy

2014-06-06 Thread Mit Desai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mit Desai updated HDFS-6487:


Status: Patch Available  (was: Open)

> TestStandbyCheckpoint#testSBNCheckpoints is racy
> 
>
> Key: HDFS-6487
> URL: https://issues.apache.org/jira/browse/HDFS-6487
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.0.0, 2.4.1
>Reporter: Mit Desai
>Assignee: Mit Desai
> Attachments: HDFS-6487.patch, HDFS-6487.patch
>
>
> testSBNCheckpoints fails occasionally.
> I could not reproduce it consistently, but it would fail 8 out of 10 times 
> after I did mvn clean, mvn install and ran the test.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6487) TestStandbyCheckpoint#testSBNCheckpoints is racy

2014-06-06 Thread Mit Desai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mit Desai updated HDFS-6487:


Attachment: HDFS-6487.patch

Attaching the updated patch.

> TestStandbyCheckpoint#testSBNCheckpoints is racy
> 
>
> Key: HDFS-6487
> URL: https://issues.apache.org/jira/browse/HDFS-6487
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.0.0, 2.4.1
>Reporter: Mit Desai
>Assignee: Mit Desai
> Attachments: HDFS-6487.patch, HDFS-6487.patch
>
>
> testSBNCheckpoints fails occasionally.
> I could not reproduce it consistently, but it would fail 8 out of 10 times 
> after I did mvn clean, mvn install and ran the test.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6487) TestStandbyCheckpoint#testSBNCheckpoints is racy

2014-06-06 Thread Mit Desai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mit Desai updated HDFS-6487:


Status: Open  (was: Patch Available)

> TestStandbyCheckpoint#testSBNCheckpoints is racy
> 
>
> Key: HDFS-6487
> URL: https://issues.apache.org/jira/browse/HDFS-6487
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.0.0, 2.4.1
>Reporter: Mit Desai
>Assignee: Mit Desai
> Attachments: HDFS-6487.patch
>
>
> testSBNCheckpoints fails occasionally.
> I could not reproduce it consistently, but it would fail 8 out of 10 times 
> after I did mvn clean, mvn install and ran the test.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6487) TestStandbyCheckpoint#testSBNCheckpoints is racy

2014-06-05 Thread Mit Desai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14019320#comment-14019320
 ] 

Mit Desai commented on HDFS-6487:
-

Thanks Andrew for looking into the patch. Using GenericTestUtils.waitFor looks 
like a better option. I will update my patch.
For the timeout, 5 sec works for me now, but I will increase it to 60 s (it does 
not hurt to wait a little longer; it will come out of the wait before that time 
anyway).

> TestStandbyCheckpoint#testSBNCheckpoints is racy
> 
>
> Key: HDFS-6487
> URL: https://issues.apache.org/jira/browse/HDFS-6487
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.0.0, 2.4.1
>Reporter: Mit Desai
>Assignee: Mit Desai
> Attachments: HDFS-6487.patch
>
>
> testSBNCheckpoints fails occasionally.
> I could not reproduce it consistently, but it would fail 8 out of 10 times 
> after I did mvn clean, mvn install and ran the test.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6487) TestStandbyCheckpoint#testSBNCheckpoints is racy

2014-06-05 Thread Mit Desai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mit Desai updated HDFS-6487:


Affects Version/s: (was: 2.5.0)
   2.4.1
   Status: Patch Available  (was: Open)

> TestStandbyCheckpoint#testSBNCheckpoints is racy
> 
>
> Key: HDFS-6487
> URL: https://issues.apache.org/jira/browse/HDFS-6487
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.0.0, 2.4.1
>Reporter: Mit Desai
>Assignee: Mit Desai
> Attachments: HDFS-6487.patch
>
>
> testSBNCheckpoints fails occasionally.
> I could not reproduce it consistently, but it would fail 8 out of 10 times 
> after I did mvn clean, mvn install and ran the test.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6487) TestStandbyCheckpoint#testSBNCheckpoints is racy

2014-06-05 Thread Mit Desai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14018953#comment-14018953
 ] 

Mit Desai commented on HDFS-6487:
-

In testSBNCheckpoints, after doEdits() the test waits for the SBN to do a 
checkpoint and immediately after that checks whether the OIV image has been 
written. The race lies between the completion of the checkpoint and the check 
for the OIV image.
I have added a wait for the OIV image to be written. This prevents the test from 
failing due to the race, and if the OIV image is still not written after 5000 ms, 
the test will fail, which is what is expected.

> TestStandbyCheckpoint#testSBNCheckpoints is racy
> 
>
> Key: HDFS-6487
> URL: https://issues.apache.org/jira/browse/HDFS-6487
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.0.0, 2.4.1
>Reporter: Mit Desai
>Assignee: Mit Desai
> Attachments: HDFS-6487.patch
>
>
> testSBNCheckpoints fails occasionally.
> I could not reproduce it consistently but it would fail 8 out of 10 times 
> after I did mvn clean, mvn install and run the test



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6487) TestStandbyCheckpoint#testSBNCheckpoints is racy

2014-06-05 Thread Mit Desai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mit Desai updated HDFS-6487:


Attachment: HDFS-6487.patch

Attaching patch for trunk and branch-2

> TestStandbyCheckpoint#testSBNCheckpoints is racy
> 
>
> Key: HDFS-6487
> URL: https://issues.apache.org/jira/browse/HDFS-6487
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.0.0, 2.5.0
>Reporter: Mit Desai
>Assignee: Mit Desai
> Attachments: HDFS-6487.patch
>
>
> testSBNCheckpoints fails occasionally.
> I could not reproduce it consistently but it would fail 8 out of 10 times 
> after I did mvn clean, mvn install and run the test



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HDFS-6487) TestStandbyCheckpoint#testSBNCheckpoints is racy

2014-06-04 Thread Mit Desai (JIRA)
Mit Desai created HDFS-6487:
---

 Summary: TestStandbyCheckpoint#testSBNCheckpoints is racy
 Key: HDFS-6487
 URL: https://issues.apache.org/jira/browse/HDFS-6487
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 3.0.0, 2.5.0
Reporter: Mit Desai
Assignee: Mit Desai


testSBNCheckpoints fails occasionally.
I could not reproduce it consistently but it would fail 8 out of 10 times after 
I did mvn clean, mvn install and run the test



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6421) RHEL4 fails to compile vecsum.c

2014-05-19 Thread Mit Desai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mit Desai updated HDFS-6421:


Status: Patch Available  (was: Open)

> RHEL4 fails to compile vecsum.c
> ---
>
> Key: HDFS-6421
> URL: https://issues.apache.org/jira/browse/HDFS-6421
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: libhdfs
>Affects Versions: 2.5.0
> Environment: RHEL4
>Reporter: Jason Lowe
>Assignee: Mit Desai
> Attachments: HDFS-6421.patch, HDFS-6421.patch
>
>
> After HDFS-6287 RHEL4 builds fail trying to compile vecsum.c since they don't 
> have RUSAGE_THREAD.  RHEL4 is ancient, but we use it in a 32-bit 
> compatibility environment.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6421) RHEL4 fails to compile vecsum.c

2014-05-19 Thread Mit Desai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mit Desai updated HDFS-6421:


Status: Open  (was: Patch Available)

> RHEL4 fails to compile vecsum.c
> ---
>
> Key: HDFS-6421
> URL: https://issues.apache.org/jira/browse/HDFS-6421
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: libhdfs
>Affects Versions: 2.5.0
> Environment: RHEL4
>Reporter: Jason Lowe
>Assignee: Mit Desai
> Attachments: HDFS-6421.patch, HDFS-6421.patch
>
>
> After HDFS-6287 RHEL4 builds fail trying to compile vecsum.c since they don't 
> have RUSAGE_THREAD.  RHEL4 is ancient, but we use it in a 32-bit 
> compatibility environment.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6421) RHEL4 fails to compile vecsum.c

2014-05-19 Thread Mit Desai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14001718#comment-14001718
 ] 

Mit Desai commented on HDFS-6421:
-

Correction: Thanks [~cmccabe] for reviewing the patch. :-) 

> RHEL4 fails to compile vecsum.c
> ---
>
> Key: HDFS-6421
> URL: https://issues.apache.org/jira/browse/HDFS-6421
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: libhdfs
>Affects Versions: 2.5.0
> Environment: RHEL4
>Reporter: Jason Lowe
>Assignee: Mit Desai
> Attachments: HDFS-6421.patch, HDFS-6421.patch
>
>
> After HDFS-6287 RHEL4 builds fail trying to compile vecsum.c since they don't 
> have RUSAGE_THREAD.  RHEL4 is ancient, but we use it in a 32-bit 
> compatibility environment.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6421) RHEL4 fails to compile vecsum.c

2014-05-19 Thread Mit Desai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mit Desai updated HDFS-6421:


Attachment: HDFS-6421.patch

Thanks [~cmccabe] for taking a reviewing the patch.
Attaching the new patch addressing your comments.

> RHEL4 fails to compile vecsum.c
> ---
>
> Key: HDFS-6421
> URL: https://issues.apache.org/jira/browse/HDFS-6421
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: libhdfs
>Affects Versions: 2.5.0
> Environment: RHEL4
>Reporter: Jason Lowe
>Assignee: Mit Desai
> Attachments: HDFS-6421.patch, HDFS-6421.patch
>
>
> After HDFS-6287 RHEL4 builds fail trying to compile vecsum.c since they don't 
> have RUSAGE_THREAD.  RHEL4 is ancient, but we use it in a 32-bit 
> compatibility environment.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6421) RHEL4 fails to compile vecsum.c

2014-05-16 Thread Mit Desai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mit Desai updated HDFS-6421:


Status: Patch Available  (was: Open)

> RHEL4 fails to compile vecsum.c
> ---
>
> Key: HDFS-6421
> URL: https://issues.apache.org/jira/browse/HDFS-6421
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: libhdfs
>Affects Versions: 2.5.0
> Environment: RHEL4
>Reporter: Jason Lowe
>Assignee: Mit Desai
> Attachments: HDFS-6421.patch
>
>
> After HDFS-6287 RHEL4 builds fail trying to compile vecsum.c since they don't 
> have RUSAGE_THREAD.  RHEL4 is ancient, but we use it in a 32-bit 
> compatibility environment.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6230) Expose upgrade status through NameNode web UI

2014-05-16 Thread Mit Desai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mit Desai updated HDFS-6230:


Status: Patch Available  (was: Open)

> Expose upgrade status through NameNode web UI
> -
>
> Key: HDFS-6230
> URL: https://issues.apache.org/jira/browse/HDFS-6230
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.4.0
>Reporter: Arpit Agarwal
>Assignee: Mit Desai
> Attachments: HDFS-6230-NoUpgradesInProgress.png, 
> HDFS-6230-UpgradeInProgress.jpg, HDFS-6230.patch, HDFS-6230.patch
>
>
> The NameNode web UI does not show upgrade information anymore. Hadoop 2.0 
> also does not have the _hadoop dfsadmin -upgradeProgress_ command to check 
> the upgrade status.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6421) RHEL4 fails to compile vecsum.c

2014-05-16 Thread Mit Desai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mit Desai updated HDFS-6421:


Attachment: HDFS-6421.patch

This code in the stopwatch structure gets the rusage and stores it into 
{{struct rusage rusage;}}, but the value is never used.
{code}
if (getrusage(RUSAGE_THREAD, &watch->rusage) < 0) {
int err = errno;
fprintf(stderr, "getrusage failed: error %d (%s)\n",
err, strerror(err));
goto error;
}
{code}

Removing the block to get RHEL4 compiling again.

> RHEL4 fails to compile vecsum.c
> ---
>
> Key: HDFS-6421
> URL: https://issues.apache.org/jira/browse/HDFS-6421
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: libhdfs
>Affects Versions: 2.5.0
> Environment: RHEL4
>Reporter: Jason Lowe
>Assignee: Mit Desai
> Attachments: HDFS-6421.patch
>
>
> After HDFS-6287 RHEL4 builds fail trying to compile vecsum.c since they don't 
> have RUSAGE_THREAD.  RHEL4 is ancient, but we use it in a 32-bit 
> compatibility environment.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Assigned] (HDFS-6421) RHEL4 fails to compile vecsum.c

2014-05-16 Thread Mit Desai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mit Desai reassigned HDFS-6421:
---

Assignee: Mit Desai

> RHEL4 fails to compile vecsum.c
> ---
>
> Key: HDFS-6421
> URL: https://issues.apache.org/jira/browse/HDFS-6421
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: libhdfs
>Affects Versions: 2.5.0
> Environment: RHEL4
>Reporter: Jason Lowe
>Assignee: Mit Desai
>
> After HDFS-6287 RHEL4 builds fail trying to compile vecsum.c since they don't 
> have RUSAGE_THREAD.  RHEL4 is ancient, but we use it in a 32-bit 
> compatibility environment.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6230) Expose upgrade status through NameNode web UI

2014-05-16 Thread Mit Desai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mit Desai updated HDFS-6230:


Status: Open  (was: Patch Available)

> Expose upgrade status through NameNode web UI
> -
>
> Key: HDFS-6230
> URL: https://issues.apache.org/jira/browse/HDFS-6230
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.4.0
>Reporter: Arpit Agarwal
>Assignee: Mit Desai
> Attachments: HDFS-6230-NoUpgradesInProgress.png, 
> HDFS-6230-UpgradeInProgress.jpg, HDFS-6230.patch, HDFS-6230.patch
>
>
> The NameNode web UI does not show upgrade information anymore. Hadoop 2.0 
> also does not have the _hadoop dfsadmin -upgradeProgress_ command to check 
> the upgrade status.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Assigned] (HDFS-742) A down DataNode makes Balancer to hang on repeatingly asking NameNode its partial block list

2014-05-16 Thread Mit Desai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mit Desai reassigned HDFS-742:
--

Assignee: Mit Desai  (was: Hairong Kuang)

> A down DataNode makes Balancer to hang on repeatingly asking NameNode its 
> partial block list
> 
>
> Key: HDFS-742
> URL: https://issues.apache.org/jira/browse/HDFS-742
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer
>Reporter: Hairong Kuang
>Assignee: Mit Desai
>
> We had a balancer that had not made any progress for a long time. It turned 
> out it was repeatingly asking Namenode for a partial block list of one 
> datanode, which was done while the balancer was running.
> NameNode should notify Balancer that the datanode is not available and 
> Balancer should stop asking for the datanode's block list.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-742) A down DataNode makes Balancer to hang on repeatingly asking NameNode its partial block list

2014-05-15 Thread Mit Desai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13993598#comment-13993598
 ] 

Mit Desai commented on HDFS-742:


Taking this over. Feel free to reassign if you are still working on it.

> A down DataNode makes Balancer to hang on repeatingly asking NameNode its 
> partial block list
> 
>
> Key: HDFS-742
> URL: https://issues.apache.org/jira/browse/HDFS-742
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer
>Reporter: Hairong Kuang
>Assignee: Hairong Kuang
>
> We had a balancer that had not made any progress for a long time. It turned 
> out it was repeatingly asking Namenode for a partial block list of one 
> datanode, which was done while the balancer was running.
> NameNode should notify Balancer that the datanode is not available and 
> Balancer should stop asking for the datanode's block list.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6230) Expose upgrade status through NameNode web UI

2014-05-13 Thread Mit Desai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mit Desai updated HDFS-6230:


Attachment: HDFS-6230.patch

Thanks for looking at the patch [~wheat9]. Posting the updated patch.

> Expose upgrade status through NameNode web UI
> -
>
> Key: HDFS-6230
> URL: https://issues.apache.org/jira/browse/HDFS-6230
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.4.0
>Reporter: Arpit Agarwal
>Assignee: Mit Desai
> Attachments: HDFS-6230-NoUpgradesInProgress.png, 
> HDFS-6230-UpgradeInProgress.jpg, HDFS-6230.patch, HDFS-6230.patch
>
>
> The NameNode web UI does not show upgrade information anymore. Hadoop 2.0 
> also does not have the _hadoop dfsadmin -upgradeProgress_ command to check 
> the upgrade status.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-742) A down DataNode makes Balancer to hang on repeatingly asking NameNode its partial block list

2014-05-11 Thread Mit Desai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13992781#comment-13992781
 ] 

Mit Desai commented on HDFS-742:


Hey [~hairong], are you still working on this JIRA? If not, I can take it over 
and work on it.

> A down DataNode makes Balancer to hang on repeatingly asking NameNode its 
> partial block list
> 
>
> Key: HDFS-742
> URL: https://issues.apache.org/jira/browse/HDFS-742
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer
>Reporter: Hairong Kuang
>Assignee: Hairong Kuang
>
> We had a balancer that had not made any progress for a long time. It turned 
> out it was repeatingly asking Namenode for a partial block list of one 
> datanode, which was done while the balancer was running.
> NameNode should notify Balancer that the datanode is not available and 
> Balancer should stop asking for the datanode's block list.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6230) Expose upgrade status through NameNode web UI

2014-05-05 Thread Mit Desai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mit Desai updated HDFS-6230:


Status: Patch Available  (was: Open)

> Expose upgrade status through NameNode web UI
> -
>
> Key: HDFS-6230
> URL: https://issues.apache.org/jira/browse/HDFS-6230
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.4.0
>Reporter: Arpit Agarwal
>Assignee: Mit Desai
> Attachments: HDFS-6230-NoUpgradesInProgress.png, 
> HDFS-6230-UpgradeInProgress.jpg, HDFS-6230.patch
>
>
> The NameNode web UI does not show upgrade information anymore. Hadoop 2.0 
> also does not have the _hadoop dfsadmin -upgradeProgress_ command to check 
> the upgrade status.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6230) Expose upgrade status through NameNode web UI

2014-05-05 Thread Mit Desai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mit Desai updated HDFS-6230:


Attachment: HDFS-6230.patch

Attaching the patch that shows a message when an upgrade is in progress.
Also attaching screenshots of the web UI while an upgrade is in progress and 
after the upgrade is finalized.

> Expose upgrade status through NameNode web UI
> -
>
> Key: HDFS-6230
> URL: https://issues.apache.org/jira/browse/HDFS-6230
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.4.0
>Reporter: Arpit Agarwal
>Assignee: Mit Desai
> Attachments: HDFS-6230-NoUpgradesInProgress.png, 
> HDFS-6230-UpgradeInProgress.jpg, HDFS-6230.patch
>
>
> The NameNode web UI does not show upgrade information anymore. Hadoop 2.0 
> also does not have the _hadoop dfsadmin -upgradeProgress_ command to check 
> the upgrade status.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6230) Expose upgrade status through NameNode web UI

2014-05-05 Thread Mit Desai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mit Desai updated HDFS-6230:


Attachment: HDFS-6230-UpgradeInProgress.jpg
HDFS-6230-NoUpgradesInProgress.png

> Expose upgrade status through NameNode web UI
> -
>
> Key: HDFS-6230
> URL: https://issues.apache.org/jira/browse/HDFS-6230
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.4.0
>Reporter: Arpit Agarwal
>Assignee: Mit Desai
> Attachments: HDFS-6230-NoUpgradesInProgress.png, 
> HDFS-6230-UpgradeInProgress.jpg
>
>
> The NameNode web UI does not show upgrade information anymore. Hadoop 2.0 
> also does not have the _hadoop dfsadmin -upgradeProgress_ command to check 
> the upgrade status.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6230) Expose upgrade status through NameNode web UI

2014-04-30 Thread Mit Desai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13985985#comment-13985985
 ] 

Mit Desai commented on HDFS-6230:
-

Thanks! Taking it over

> Expose upgrade status through NameNode web UI
> -
>
> Key: HDFS-6230
> URL: https://issues.apache.org/jira/browse/HDFS-6230
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.4.0
>Reporter: Arpit Agarwal
>Assignee: Mit Desai
>
> The NameNode web UI does not show upgrade information anymore. Hadoop 2.0 
> also does not have the _hadoop dfsadmin -upgradeProgress_ command to check 
> the upgrade status.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Assigned] (HDFS-6230) Expose upgrade status through NameNode web UI

2014-04-30 Thread Mit Desai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mit Desai reassigned HDFS-6230:
---

Assignee: Mit Desai  (was: Arpit Agarwal)

> Expose upgrade status through NameNode web UI
> -
>
> Key: HDFS-6230
> URL: https://issues.apache.org/jira/browse/HDFS-6230
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.4.0
>Reporter: Arpit Agarwal
>Assignee: Mit Desai
>
> The NameNode web UI does not show upgrade information anymore. Hadoop 2.0 
> also does not have the _hadoop dfsadmin -upgradeProgress_ command to check 
> the upgrade status.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6230) Expose upgrade status through NameNode web UI

2014-04-30 Thread Mit Desai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13985963#comment-13985963
 ] 

Mit Desai commented on HDFS-6230:
-

[~arpitagarwal], are you working on this JIRA?

> Expose upgrade status through NameNode web UI
> -
>
> Key: HDFS-6230
> URL: https://issues.apache.org/jira/browse/HDFS-6230
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.4.0
>Reporter: Arpit Agarwal
>Assignee: Arpit Agarwal
>
> The NameNode web UI does not show upgrade information anymore. Hadoop 2.0 
> also does not have the _hadoop dfsadmin -upgradeProgress_ command to check 
> the upgrade status.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-5892) TestDeleteBlockPool fails in branch-2

2014-04-29 Thread Mit Desai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13984337#comment-13984337
 ] 

Mit Desai commented on HDFS-5892:
-

[~wheat9], taking a closer look at the commits, I found that this is not yet 
fixed in 2.4. Do we want to commit this into 2.4.1 and change the fix version 
to 2.4.1, or edit the fix version to 2.5.0?

> TestDeleteBlockPool fails in branch-2
> -
>
> Key: HDFS-5892
> URL: https://issues.apache.org/jira/browse/HDFS-5892
> Project: Hadoop HDFS
>  Issue Type: Test
>Reporter: Ted Yu
>Priority: Minor
> Fix For: 2.4.0
>
> Attachments: HDFS-5892.patch, 
> org.apache.hadoop.hdfs.server.datanode.TestDeleteBlockPool-output.txt
>
>
> Running test suite on Linux, I got:
> {code}
> testDeleteBlockPool(org.apache.hadoop.hdfs.server.datanode.TestDeleteBlockPool)
>   Time elapsed: 8.143 sec  <<< ERROR!
> java.io.IOException: All datanodes 127.0.0.1:43721 are bad. Aborting...
> at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1023)
> at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:838)
> at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:483)
> {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (HDFS-3122) Block recovery with closeFile flag true can race with blockReport. Due to this blocks are getting marked as corrupt.

2014-04-25 Thread Mit Desai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mit Desai resolved HDFS-3122.
-

  Resolution: Not a Problem
Target Version/s: 0.23.3, 0.24.0  (was: 0.24.0, 0.23.3)

Haven't heard anything yet, so resolving this issue. Feel free to reopen if anyone 
disagrees.

> Block recovery with closeFile flag true can race with blockReport. Due to 
> this blocks are getting marked as corrupt.
> 
>
> Key: HDFS-3122
> URL: https://issues.apache.org/jira/browse/HDFS-3122
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, namenode
>Affects Versions: 0.23.0, 0.24.0
>Reporter: Uma Maheswara Rao G
>Assignee: Uma Maheswara Rao G
>Priority: Critical
> Attachments: blockCorrupt.txt
>
>
> *Block Report* can *race* with *Block Recovery* with closeFile flag true.
>  Block report generated just before block recovery at DN side and due to N/W 
> problems, block report got delayed to NN. 
> After this, recovery success and generation stamp modifies to new one. 
> And primary DN invokes the commitBlockSynchronization and block got updated 
> in NN side. Also block got marked as complete, since the closeFile flag was 
> true. Updated with new genstamp.
> Now blockReport started processing at NN side. This particular block from RBW 
> (when it generated the BR at DN), and file was completed at NN side.
> Finally block will be marked as corrupt because of genstamp mismatch.
> {code}
>  case RWR:
>   if (!storedBlock.isComplete()) {
> return null; // not corrupt
>   } else if (storedBlock.getGenerationStamp() != 
> iblk.getGenerationStamp()) {
> return new BlockToMarkCorrupt(storedBlock,
> "reported " + reportedState + " replica with genstamp " +
> iblk.getGenerationStamp() + " does not match COMPLETE block's " +
> "genstamp in block map " + storedBlock.getGenerationStamp());
>   } else { // COMPLETE block, same genstamp
> {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-3122) Block recovery with closeFile flag true can race with blockReport. Due to this blocks are getting marked as corrupt.

2014-04-24 Thread Mit Desai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13979848#comment-13979848
 ] 

Mit Desai commented on HDFS-3122:
-

Hi [~umamaheswararao],
Is this still an issue? I looked at the code and I think this was fixed at some 
point.
Here is the code snippet from BlockManager
{code}
case RWR:
  if (!storedBlock.isComplete()) {
return null; // not corrupt
  } else if (storedBlock.getGenerationStamp() != 
reported.getGenerationStamp()) {
final long reportedGS = reported.getGenerationStamp();
return new BlockToMarkCorrupt(storedBlock, reportedGS,
"reported " + reportedState + " replica with genstamp " + reportedGS
+ " does not match COMPLETE block's genstamp in block map "
+ storedBlock.getGenerationStamp(), Reason.GENSTAMP_MISMATCH);
  } else { // COMPLETE block, same genstamp
if (reportedState == ReplicaState.RBW) {
  // If it's a RBW report for a COMPLETE block, it may just be that
  // the block report got a little bit delayed after the pipeline
  // closed. So, ignore this report, assuming we will get a
  // FINALIZED replica later. See HDFS-2791
  LOG.info("Received an RBW replica for " + storedBlock +
  " on " + dn + ": ignoring it, since it is " +
  "complete with the same genstamp");
  return null;
} else {
  return new BlockToMarkCorrupt(storedBlock,
  "reported replica has invalid state " + reportedState,
  Reason.INVALID_STATE);
}
  }
{code}

I will resolve this JIRA as "Not a Problem" tomorrow unless someone objects.

> Block recovery with closeFile flag true can race with blockReport. Due to 
> this blocks are getting marked as corrupt.
> 
>
> Key: HDFS-3122
> URL: https://issues.apache.org/jira/browse/HDFS-3122
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, namenode
>Affects Versions: 0.23.0, 0.24.0
>Reporter: Uma Maheswara Rao G
>Assignee: Uma Maheswara Rao G
>Priority: Critical
> Attachments: blockCorrupt.txt
>
>
> *Block Report* can *race* with *Block Recovery* with closeFile flag true.
>  Block report generated just before block recovery at DN side and due to N/W 
> problems, block report got delayed to NN. 
> After this, recovery success and generation stamp modifies to new one. 
> And primary DN invokes the commitBlockSynchronization and block got updated 
> in NN side. Also block got marked as complete, since the closeFile flag was 
> true. Updated with new genstamp.
> Now blockReport started processing at NN side. This particular block from RBW 
> (when it generated the BR at DN), and file was completed at NN side.
> Finally block will be marked as corrupt because of genstamp mismatch.
> {code}
>  case RWR:
>   if (!storedBlock.isComplete()) {
> return null; // not corrupt
>   } else if (storedBlock.getGenerationStamp() != 
> iblk.getGenerationStamp()) {
> return new BlockToMarkCorrupt(storedBlock,
> "reported " + reportedState + " replica with genstamp " +
> iblk.getGenerationStamp() + " does not match COMPLETE block's " +
> "genstamp in block map " + storedBlock.getGenerationStamp());
>   } else { // COMPLETE block, same genstamp
> {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (HDFS-2734) Even if we configure the property fs.checkpoint.size in both core-site.xml and hdfs-site.xml the values are not been considered

2014-04-22 Thread Mit Desai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mit Desai resolved HDFS-2734.
-

  Resolution: Not a Problem
Target Version/s: 0.23.0, 0.20.1  (was: 0.20.1, 0.23.0)

I do not think this is a problem, so I am resolving it as Not a Problem. Feel free 
to reopen this JIRA if you still see an issue.

> Even if we configure the property fs.checkpoint.size in both core-site.xml 
> and hdfs-site.xml  the values are not been considered
> 
>
> Key: HDFS-2734
> URL: https://issues.apache.org/jira/browse/HDFS-2734
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 0.20.1, 0.23.0
>Reporter: J.Andreina
>Priority: Minor
>
> Even if we configure the property fs.checkpoint.size in both core-site.xml 
> and hdfs-site.xml  the values are not been considered



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-5892) TestDeleteBlockPool fails in branch-2

2014-04-16 Thread Mit Desai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13971592#comment-13971592
 ] 

Mit Desai commented on HDFS-5892:
-

[~yuzhih...@gmail.com] [~dandan]: Are you still seeing this issue? The test still 
fails randomly in our nightly builds.

> TestDeleteBlockPool fails in branch-2
> -
>
> Key: HDFS-5892
> URL: https://issues.apache.org/jira/browse/HDFS-5892
> Project: Hadoop HDFS
>  Issue Type: Test
>Reporter: Ted Yu
>Priority: Minor
> Fix For: 2.4.0
>
> Attachments: HDFS-5892.patch, 
> org.apache.hadoop.hdfs.server.datanode.TestDeleteBlockPool-output.txt
>
>
> Running test suite on Linux, I got:
> {code}
> testDeleteBlockPool(org.apache.hadoop.hdfs.server.datanode.TestDeleteBlockPool)
>   Time elapsed: 8.143 sec  <<< ERROR!
> java.io.IOException: All datanodes 127.0.0.1:43721 are bad. Aborting...
> at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1023)
> at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:838)
> at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:483)
> {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-4587) Webhdfs secure clients are incompatible with non-secure NN

2014-04-14 Thread Mit Desai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mit Desai updated HDFS-4587:


Issue Type: Bug  (was: Sub-task)
Parent: (was: HDFS-4576)

> Webhdfs secure clients are incompatible with non-secure NN
> --
>
> Key: HDFS-4587
> URL: https://issues.apache.org/jira/browse/HDFS-4587
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode, webhdfs
>Affects Versions: 3.0.0, 0.23.7, 2.0.4-alpha
>Reporter: Daryn Sharp
>
> A secure webhdfs client will receive an exception from a non-secure NN.  For 
> a NN in non-secure mode, {{FSNamesystem#getDelegationToken}} will return 
> "null" to indicate no token is required.  Hdfs will send back the null to the 
> client, but webhdfs uses {{DelegationTokenSecretManager.createCredentials}} 
> which instead throws an exception.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (HDFS-4576) Webhdfs authentication issues

2014-04-14 Thread Mit Desai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mit Desai resolved HDFS-4576.
-

   Resolution: Fixed
Fix Version/s: 0.23.11
   3.0.0

Resolving this task, as all of its subtasks are now resolved.

> Webhdfs authentication issues
> -
>
> Key: HDFS-4576
> URL: https://issues.apache.org/jira/browse/HDFS-4576
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: webhdfs
>Affects Versions: 2.0.0-alpha, 3.0.0, 0.23.7
>Reporter: Daryn Sharp
>Assignee: Daryn Sharp
> Fix For: 3.0.0, 0.23.11
>
>
> Umbrella jira to track the webhdfs authentication issues as subtasks.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-4587) Webhdfs secure clients are incompatible with non-secure NN

2014-04-14 Thread Mit Desai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13968382#comment-13968382
 ] 

Mit Desai commented on HDFS-4587:
-

As 0.23 is going into maintenance and this bug will not be fixed there, I am 
removing the 0.23.11 target version.

> Webhdfs secure clients are incompatible with non-secure NN
> --
>
> Key: HDFS-4587
> URL: https://issues.apache.org/jira/browse/HDFS-4587
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode, webhdfs
>Affects Versions: 3.0.0, 0.23.7, 2.0.4-alpha
>Reporter: Daryn Sharp
>
> A secure webhdfs client will receive an exception from a non-secure NN.  For 
> a NN in non-secure mode, {{FSNamesystem#getDelegationToken}} will return 
> "null" to indicate no token is required.  Hdfs will send back the null to the 
> client, but webhdfs uses {{DelegationTokenSecretManager.createCredentials}} 
> which instead throws an exception.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-4587) Webhdfs secure clients are incompatible with non-secure NN

2014-04-14 Thread Mit Desai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mit Desai updated HDFS-4587:


Target Version/s: 3.0.0  (was: 3.0.0, 0.23.11)

> Webhdfs secure clients are incompatible with non-secure NN
> --
>
> Key: HDFS-4587
> URL: https://issues.apache.org/jira/browse/HDFS-4587
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode, webhdfs
>Affects Versions: 3.0.0, 0.23.7, 2.0.4-alpha
>Reporter: Daryn Sharp
>
> A secure webhdfs client will receive an exception from a non-secure NN.  For 
> a NN in non-secure mode, {{FSNamesystem#getDelegationToken}} will return 
> "null" to indicate no token is required.  Hdfs will send back the null to the 
> client, but webhdfs uses {{DelegationTokenSecretManager.createCredentials}} 
> which instead throws an exception.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-2734) Even if we configure the property fs.checkpoint.size in both core-site.xml and hdfs-site.xml the values are not been considered

2014-04-11 Thread Mit Desai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13966723#comment-13966723
 ] 

Mit Desai commented on HDFS-2734:
-

I see that there has been no activity on this JIRA for a long time. 
[~andreina], is this still reproducible on your side? If it is still an 
issue, can you provide the information [~qwertymaniac] requested?
Based on the analysis Harsh did, this is not reproducible on his side, and I have 
not seen anyone else raise this concern. Given that, if I do not hear back by 
4/17/14, I will go ahead and close this issue as Not A Problem.

-Mit

> Even if we configure the property fs.checkpoint.size in both core-site.xml 
> and hdfs-site.xml  the values are not been considered
> 
>
> Key: HDFS-2734
> URL: https://issues.apache.org/jira/browse/HDFS-2734
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 0.20.1, 0.23.0
>Reporter: J.Andreina
>Priority: Minor
>
> Even if we configure the property fs.checkpoint.size in both core-site.xml 
> and hdfs-site.xml  the values are not been considered



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-5983) TestSafeMode#testInitializeReplQueuesEarly fails

2014-04-09 Thread Mit Desai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13964579#comment-13964579
 ] 

Mit Desai commented on HDFS-5983:
-

Already fixed by HDFS-6160, so closing it.

> TestSafeMode#testInitializeReplQueuesEarly fails
> 
>
> Key: HDFS-5983
> URL: https://issues.apache.org/jira/browse/HDFS-5983
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.2.0
>Reporter: Kihwal Lee
>Assignee: Ming Ma
> Attachments: HDFS-5983-updated.patch, HDFS-5983.patch, testlog.txt
>
>
> It was seen from one of the precommit build of HDFS-5962.  The test case 
> creates 15 blocks and then shuts down all datanodes. Then the namenode is 
> restarted with a low safe block threshold and one datanode is restarted. The 
> idea is that the initial block report from the restarted datanode will make 
> the namenode leave the safemode and initialize the replication queues.
> According to the log, the datanode reported 3 blocks, but slightly before 
> that the namenode did repl queue init with 1 block.  I will attach the log.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-5983) TestSafeMode#testInitializeReplQueuesEarly fails

2014-04-09 Thread Mit Desai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mit Desai updated HDFS-5983:


Status: Open  (was: Patch Available)

> TestSafeMode#testInitializeReplQueuesEarly fails
> 
>
> Key: HDFS-5983
> URL: https://issues.apache.org/jira/browse/HDFS-5983
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Kihwal Lee
>Assignee: Ming Ma
> Attachments: HDFS-5983-updated.patch, HDFS-5983.patch, testlog.txt
>
>
> It was seen from one of the precommit build of HDFS-5962.  The test case 
> creates 15 blocks and then shuts down all datanodes. Then the namenode is 
> restarted with a low safe block threshold and one datanode is restarted. The 
> idea is that the initial block report from the restarted datanode will make 
> the namenode leave the safemode and initialize the replication queues.
> According to the log, the datanode reported 3 blocks, but slightly before 
> that the namenode did repl queue init with 1 block.  I will attach the log.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-5983) TestSafeMode#testInitializeReplQueuesEarly fails

2014-04-09 Thread Mit Desai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13964491#comment-13964491
 ] 

Mit Desai commented on HDFS-5983:
-

[~airbots], [~mingma] : Can any of you regenerate the patch and attach it to 
make sure it applies successfully?

Mit

> TestSafeMode#testInitializeReplQueuesEarly fails
> 
>
> Key: HDFS-5983
> URL: https://issues.apache.org/jira/browse/HDFS-5983
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Kihwal Lee
>Assignee: Ming Ma
> Attachments: HDFS-5983-updated.patch, HDFS-5983.patch, testlog.txt
>
>
> It was seen from one of the precommit build of HDFS-5962.  The test case 
> creates 15 blocks and then shuts down all datanodes. Then the namenode is 
> restarted with a low safe block threshold and one datanode is restarted. The 
> idea is that the initial block report from the restarted datanode will make 
> the namenode leave the safemode and initialize the replication queues.
> According to the log, the datanode reported 3 blocks, but slightly before 
> that the namenode did repl queue init with 1 block.  I will attach the log.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-5983) TestSafeMode#testInitializeReplQueuesEarly fails

2014-04-09 Thread Mit Desai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13964462#comment-13964462
 ] 

Mit Desai commented on HDFS-5983:
-

One note: you need to click Submit Patch after uploading the patch to get the 
Hadoop QA comment. I just did that.

> TestSafeMode#testInitializeReplQueuesEarly fails
> 
>
> Key: HDFS-5983
> URL: https://issues.apache.org/jira/browse/HDFS-5983
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Kihwal Lee
>Assignee: Ming Ma
> Attachments: HDFS-5983-updated.patch, HDFS-5983.patch, testlog.txt
>
>
> It was seen from one of the precommit build of HDFS-5962.  The test case 
> creates 15 blocks and then shuts down all datanodes. Then the namenode is 
> restarted with a low safe block threshold and one datanode is restarted. The 
> idea is that the initial block report from the restarted datanode will make 
> the namenode leave the safemode and initialize the replication queues.
> According to the log, the datanode reported 3 blocks, but slightly before 
> that the namenode did repl queue init with 1 block.  I will attach the log.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-5983) TestSafeMode#testInitializeReplQueuesEarly fails

2014-04-09 Thread Mit Desai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mit Desai updated HDFS-5983:


Status: Patch Available  (was: Open)

> TestSafeMode#testInitializeReplQueuesEarly fails
> 
>
> Key: HDFS-5983
> URL: https://issues.apache.org/jira/browse/HDFS-5983
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Kihwal Lee
>Assignee: Ming Ma
> Attachments: HDFS-5983-updated.patch, HDFS-5983.patch, testlog.txt
>
>
> It was seen from one of the precommit build of HDFS-5962.  The test case 
> creates 15 blocks and then shuts down all datanodes. Then the namenode is 
> restarted with a low safe block threshold and one datanode is restarted. The 
> idea is that the initial block report from the restarted datanode will make 
> the namenode leave the safemode and initialize the replication queues.
> According to the log, the datanode reported 3 blocks, but slightly before 
> that the namenode did repl queue init with 1 block.  I will attach the log.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-5983) TestSafeMode#testInitializeReplQueuesEarly fails

2014-04-09 Thread Mit Desai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13964459#comment-13964459
 ] 

Mit Desai commented on HDFS-5983:
-

Reviewed the patch. LGTM.
+1 (non-binding)

> TestSafeMode#testInitializeReplQueuesEarly fails
> 
>
> Key: HDFS-5983
> URL: https://issues.apache.org/jira/browse/HDFS-5983
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Kihwal Lee
>Assignee: Ming Ma
> Attachments: HDFS-5983-updated.patch, HDFS-5983.patch, testlog.txt
>
>
> It was seen from one of the precommit build of HDFS-5962.  The test case 
> creates 15 blocks and then shuts down all datanodes. Then the namenode is 
> restarted with a low safe block threshold and one datanode is restarted. The 
> idea is that the initial block report from the restarted datanode will make 
> the namenode leave the safemode and initialize the replication queues.
> According to the log, the datanode reported 3 blocks, but slightly before 
> that the namenode did repl queue init with 1 block.  I will attach the log.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6195) TestRMApplicationHistoryWriter#testRMWritingMassiveHistory runs slow and intermittently fails on trunk and branch2

2014-04-07 Thread Mit Desai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13962088#comment-13962088
 ] 

Mit Desai commented on HDFS-6195:
-

TestRMRestart is a different issue related to YARN-1906

> TestRMApplicationHistoryWriter#testRMWritingMassiveHistory runs slow and 
> intermittently fails on trunk and branch2
> --
>
> Key: HDFS-6195
> URL: https://issues.apache.org/jira/browse/HDFS-6195
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.0.0, 2.5.0
>Reporter: Mit Desai
>Assignee: Mit Desai
> Fix For: 3.0.0, 2.5.0
>
> Attachments: HDFS-6195.patch
>
>
> The test has 1 containers that it tries to cleanup.
> The cleanup has a timeout of 2ms in which the test sometimes cannot do 
> the cleanup completely and gives out an Assertion Failure.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6195) TestRMApplicationHistoryWriter#testRMWritingMassiveHistory runs slow and intermittently fails on trunk and branch2

2014-04-07 Thread Mit Desai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13961984#comment-13961984
 ] 

Mit Desai commented on HDFS-6195:
-

While cleaning up the containers,
{code}
while (cleanedSize < allocatedSize && waitCount++ < 200) {
  Thread.sleep(100);
  resp = nm.nodeHeartbeat(true);
  cleaned = resp.getContainersToCleanup();
  cleanedSize += cleaned.size();
}
{code}

The test sometimes cannot complete the cleanup, and some of the 1 
containers remain, resulting in an assertion error at 
{{Assert.assertEquals(allocatedSize, cleanedSize);}}.

This test has been failing in our nightly builds for the past couple of days. I was 
able to reproduce it consistently in Eclipse but not with Maven, so I think this is 
an environment issue and cannot be reproduced everywhere.

As a fix, I have increased the thread sleep time in the while loop, which gives the 
container cleanup some extra time. And since the loop also checks the allocated size 
against the cleaned size, the test will not always use up all of the loop cycles.
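
To illustrate, a sketch of the adjusted loop (the sleep value below is illustrative; 
the attached patch has the actual number):
{code}
// Sketch only: same loop as above, with a longer sleep so the node heartbeats
// have more time to report cleaned-up containers before the loop gives up.
while (cleanedSize < allocatedSize && waitCount++ < 200) {
  Thread.sleep(300);                      // increased from 100 ms (illustrative value)
  resp = nm.nodeHeartbeat(true);
  cleaned = resp.getContainersToCleanup();
  cleanedSize += cleaned.size();
}
Assert.assertEquals(allocatedSize, cleanedSize);
{code}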

> TestRMApplicationHistoryWriter#testRMWritingMassiveHistory runs slow and 
> intermittently fails on trunk and branch2
> --
>
> Key: HDFS-6195
> URL: https://issues.apache.org/jira/browse/HDFS-6195
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.0.0, 2.5.0
>Reporter: Mit Desai
>Assignee: Mit Desai
> Fix For: 3.0.0, 2.5.0
>
> Attachments: HDFS-6195.patch
>
>
> The test has 1 containers that it tries to cleanup.
> The cleanup has a timeout of 2ms in which the test sometimes cannot do 
> the cleanup completely and gives out an Assertion Failure.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6195) TestRMApplicationHistoryWriter#testRMWritingMassiveHistory runs slow and intermittently fails on trunk and branch2

2014-04-07 Thread Mit Desai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mit Desai updated HDFS-6195:


Fix Version/s: 2.5.0
   3.0.0
   Status: Patch Available  (was: Open)

> TestRMApplicationHistoryWriter#testRMWritingMassiveHistory runs slow and 
> intermittently fails on trunk and branch2
> --
>
> Key: HDFS-6195
> URL: https://issues.apache.org/jira/browse/HDFS-6195
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.0.0, 2.5.0
>Reporter: Mit Desai
>Assignee: Mit Desai
> Fix For: 3.0.0, 2.5.0
>
> Attachments: HDFS-6195.patch
>
>
> The test has 1 containers that it tries to cleanup.
> The cleanup has a timeout of 2ms in which the test sometimes cannot do 
> the cleanup completely and gives out an Assertion Failure.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6195) TestRMApplicationHistoryWriter#testRMWritingMassiveHistory runs slow and intermittently fails on trunk and branch2

2014-04-07 Thread Mit Desai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mit Desai updated HDFS-6195:


Attachment: HDFS-6195.patch

Attaching the patch for trunk and branch2

> TestRMApplicationHistoryWriter#testRMWritingMassiveHistory runs slow and 
> intermittently fails on trunk and branch2
> --
>
> Key: HDFS-6195
> URL: https://issues.apache.org/jira/browse/HDFS-6195
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.0.0, 2.5.0
>Reporter: Mit Desai
>Assignee: Mit Desai
> Attachments: HDFS-6195.patch
>
>
> The test has 1 containers that it tries to cleanup.
> The cleanup has a timeout of 2ms in which the test sometimes cannot do 
> the cleanup completely and gives out an Assertion Failure.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6195) TestRMApplicationHistoryWriter#testRMWritingMassiveHistory runs slow and intermittently fails on trunk and branch2

2014-04-07 Thread Mit Desai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13961887#comment-13961887
 ] 

Mit Desai commented on HDFS-6195:
-

Analyzing the cause. Will post the analysis/fix soon.

> TestRMApplicationHistoryWriter#testRMWritingMassiveHistory runs slow and 
> intermittently fails on trunk and branch2
> --
>
> Key: HDFS-6195
> URL: https://issues.apache.org/jira/browse/HDFS-6195
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.0.0, 2.5.0
>Reporter: Mit Desai
>Assignee: Mit Desai
>
> The test has 1 containers that it tries to cleanup.
> The cleanup has a timeout of 2ms in which the test sometimes cannot do 
> the cleanup completely and gives out an Assertion Failure.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HDFS-6195) TestRMApplicationHistoryWriter#testRMWritingMassiveHistory runs slow and intermittently fails on trunk and branch2

2014-04-07 Thread Mit Desai (JIRA)
Mit Desai created HDFS-6195:
---

 Summary: 
TestRMApplicationHistoryWriter#testRMWritingMassiveHistory runs slow and 
intermittently fails on trunk and branch2
 Key: HDFS-6195
 URL: https://issues.apache.org/jira/browse/HDFS-6195
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 3.0.0, 2.5.0
Reporter: Mit Desai
Assignee: Mit Desai


The test has 1 containers that it tries to cleanup.
The cleanup has a timeout of 2ms in which the test sometimes cannot do the 
cleanup completely and gives out an Assertion Failure.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-5807) TestBalancerWithNodeGroup.testBalancerWithNodeGroup fails intermittently on Branch-2

2014-03-26 Thread Mit Desai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13947957#comment-13947957
 ] 

Mit Desai commented on HDFS-5807:
-

Thanks [~airbots]

> TestBalancerWithNodeGroup.testBalancerWithNodeGroup fails intermittently on 
> Branch-2
> 
>
> Key: HDFS-5807
> URL: https://issues.apache.org/jira/browse/HDFS-5807
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Affects Versions: 2.3.0
>Reporter: Mit Desai
>Assignee: Chen He
> Fix For: 3.0.0, 2.4.0
>
> Attachments: HDFS-5807.patch
>
>
> The test times out after some time.
> {noformat}
> java.util.concurrent.TimeoutException: Rebalancing expected avg utilization 
> to become 0.16, but on datanode 127.0.0.1:42451 it remains at 0.3 after more 
> than 2 msec.
>   at 
> org.apache.hadoop.hdfs.server.balancer.TestBalancerWithNodeGroup.waitForBalancer(TestBalancerWithNodeGroup.java:151)
>   at 
> org.apache.hadoop.hdfs.server.balancer.TestBalancerWithNodeGroup.runBalancer(TestBalancerWithNodeGroup.java:178)
>   at 
> org.apache.hadoop.hdfs.server.balancer.TestBalancerWithNodeGroup.testBalancerWithNodeGroup(TestBalancerWithNodeGroup.java:302)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Reopened] (HDFS-5807) TestBalancerWithNodeGroup.testBalancerWithNodeGroup fails intermittently on Branch-2

2014-03-24 Thread Mit Desai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mit Desai reopened HDFS-5807:
-


[~airbots], I found this test failing again in our nightly builds. Can you take 
another look at it?

{noformat}
Error Message

Rebalancing expected avg utilization to become 0.16, but on datanode 
X.X.X.X: it remains at 0.3 after more than 4 msec.

Stacktrace

java.util.concurrent.TimeoutException: Rebalancing expected avg utilization to 
become 0.16, but on datanode X.X.X.X: it remains at 0.3 after more than 
4 msec.
at 
org.apache.hadoop.hdfs.server.balancer.TestBalancerWithNodeGroup.waitForBalancer(TestBalancerWithNodeGroup.java:151)
at 
org.apache.hadoop.hdfs.server.balancer.TestBalancerWithNodeGroup.runBalancer(TestBalancerWithNodeGroup.java:178)
at 
org.apache.hadoop.hdfs.server.balancer.TestBalancerWithNodeGroup.testBalancerWithNodeGroup(TestBalancerWithNodeGroup.java:302)

{noformat}

> TestBalancerWithNodeGroup.testBalancerWithNodeGroup fails intermittently on 
> Branch-2
> 
>
> Key: HDFS-5807
> URL: https://issues.apache.org/jira/browse/HDFS-5807
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Affects Versions: 2.3.0
>Reporter: Mit Desai
>Assignee: Chen He
> Fix For: 3.0.0, 2.4.0
>
> Attachments: HDFS-5807.patch
>
>
> The test times out after some time.
> {noformat}
> java.util.concurrent.TimeoutException: Rebalancing expected avg utilization 
> to become 0.16, but on datanode 127.0.0.1:42451 it remains at 0.3 after more 
> than 2 msec.
>   at 
> org.apache.hadoop.hdfs.server.balancer.TestBalancerWithNodeGroup.waitForBalancer(TestBalancerWithNodeGroup.java:151)
>   at 
> org.apache.hadoop.hdfs.server.balancer.TestBalancerWithNodeGroup.runBalancer(TestBalancerWithNodeGroup.java:178)
>   at 
> org.apache.hadoop.hdfs.server.balancer.TestBalancerWithNodeGroup.testBalancerWithNodeGroup(TestBalancerWithNodeGroup.java:302)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HDFS-6126) TestnameNodeMetrics#testCorruptBlock fails intermittently

2014-03-19 Thread Mit Desai (JIRA)
Mit Desai created HDFS-6126:
---

 Summary: TestnameNodeMetrics#testCorruptBlock fails intermittently
 Key: HDFS-6126
 URL: https://issues.apache.org/jira/browse/HDFS-6126
 Project: Hadoop HDFS
  Issue Type: Test
Affects Versions: 2.4.0
Reporter: Mit Desai


I get the following error
{noformat}
testCorruptBlock(org.apache.hadoop.hdfs.server.namenode.metrics.TestNameNodeMetrics)
  Time elapsed: 5.556 sec  <<< FAILURE!
java.lang.AssertionError: Bad value for metric CorruptBlocks expected:<1> but 
was:<0>
at org.junit.Assert.fail(Assert.java:93)
at org.junit.Assert.failNotEquals(Assert.java:647)
at org.junit.Assert.assertEquals(Assert.java:128)
at org.junit.Assert.assertEquals(Assert.java:472)
at 
org.apache.hadoop.test.MetricsAsserts.assertGauge(MetricsAsserts.java:190)
at 
org.apache.hadoop.hdfs.server.namenode.metrics.TestNameNodeMetrics.testCorruptBlock(TestNameNodeMetrics.java:247)


Results :

Failed tests: 
  TestNameNodeMetrics.testCorruptBlock:247 Bad value for metric CorruptBlocks 
expected:<1> but was:<0>
{noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HDFS-6104) TestFsLimits#testDefaultMaxComponentLength Fails on branch-2

2014-03-14 Thread Mit Desai (JIRA)
Mit Desai created HDFS-6104:
---

 Summary: TestFsLimits#testDefaultMaxComponentLength Fails on 
branch-2
 Key: HDFS-6104
 URL: https://issues.apache.org/jira/browse/HDFS-6104
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.4.0
Reporter: Mit Desai
Assignee: Mit Desai


testDefaultMaxComponentLength fails intermittently with the following error
{noformat}
java.lang.AssertionError: expected:<0> but was:<255>
at org.junit.Assert.fail(Assert.java:93)
at org.junit.Assert.failNotEquals(Assert.java:647)
at org.junit.Assert.assertEquals(Assert.java:128)
at org.junit.Assert.assertEquals(Assert.java:472)
at org.junit.Assert.assertEquals(Assert.java:456)
at 
org.apache.hadoop.hdfs.server.namenode.TestFsLimits.testDefaultMaxComponentLength(TestFsLimits.java:90)
{noformat}

On doing some research, I found that this is actually a JDK7 issue.
The test always fails when it runs after any test that runs the 
addChildWithName() method.
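
For context on why test order matters here: under JDK7, JUnit no longer returns test 
methods in declaration order, so static or shared state left behind by one test can 
leak into the next. One common way to make such a test order-independent is to reset 
the shared state before every method; the sketch below is illustrative only and is 
not the actual TestFsLimits code.
{code}
import org.apache.hadoop.conf.Configuration;
import org.junit.Before;
import org.junit.Test;

// Illustrative only: rebuild shared state in @Before so no test depends on
// which test happened to run earlier under JDK7's arbitrary method ordering.
public class OrderIndependentTest {
  private static Configuration conf;

  @Before
  public void setUp() {
    conf = new Configuration();   // fresh configuration for every test method
  }

  @Test
  public void testDefaultLimit() {
    // assertions run against the freshly reset conf, not leftovers from other tests
  }
}
{code}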



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6035) TestCacheDirectives#testCacheManagerRestart is failing on branch-2

2014-03-11 Thread Mit Desai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13930720#comment-13930720
 ] 

Mit Desai commented on HDFS-6035:
-

[~sathish.gurram], can you let me know which branch you are testing this on?

> TestCacheDirectives#testCacheManagerRestart is failing on branch-2
> --
>
> Key: HDFS-6035
> URL: https://issues.apache.org/jira/browse/HDFS-6035
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Affects Versions: 2.4.0
>Reporter: Mit Desai
>Assignee: sathish
> Attachments: HDFS-6035-0001.patch
>
>
> {noformat}
> java.io.IOException: Inconsistent checkpoint fields.
> LV = -51 namespaceID = 1641397469 cTime = 0 ; clusterId = testClusterID ; 
> blockpoolId = BP-423574854-x.x.x.x-1393478669835.
> Expecting respectively: -51; 2; 0; testClusterID; 
> BP-2051361571-x.x.x.x-1393478572877.
>   at 
> org.apache.hadoop.hdfs.server.namenode.CheckpointSignature.validateStorageInfo(CheckpointSignature.java:133)
>   at 
> org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doCheckpoint(SecondaryNameNode.java:526)
>   at 
> org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives.testCacheManagerRestart(TestCacheDirectives.java:582)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-5857) TestWebHDFS#testNamenodeRestart fails intermittently with NPE

2014-03-05 Thread Mit Desai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13921181#comment-13921181
 ] 

Mit Desai commented on HDFS-5857:
-

None of the test failures is related to the patch. I have run them manually 
with the patch applied and they pass on my machine.

> TestWebHDFS#testNamenodeRestart fails intermittently with NPE
> -
>
> Key: HDFS-5857
> URL: https://issues.apache.org/jira/browse/HDFS-5857
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.0.0, 2.2.0
>Reporter: Mit Desai
>Assignee: Mit Desai
> Attachments: HDFS-5857.patch, HDFS-5857.patch
>
>
> {noformat}
> java.lang.AssertionError: There are 1 exception(s):
>   Exception 0: 
> org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): null
>   at 
> org.apache.hadoop.hdfs.web.JsonUtil.toRemoteException(JsonUtil.java:157)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem.validateResponse(WebHdfsFileSystem.java:315)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$700(WebHdfsFileSystem.java:105)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.getResponse(WebHdfsFileSystem.java:625)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:532)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem.run(WebHdfsFileSystem.java:421)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem.mkdirs(WebHdfsFileSystem.java:701)
>   at org.apache.hadoop.fs.FileSystem.mkdirs(FileSystem.java:1816)
>   at org.apache.hadoop.hdfs.DFSTestUtil.createFile(DFSTestUtil.java:196)
>   at 
> org.apache.hadoop.hdfs.TestDFSClientRetries$6.run(TestDFSClientRetries.java:920)
>   at java.lang.Thread.run(Thread.java:722)
>   at org.junit.Assert.fail(Assert.java:93)
>   at 
> org.apache.hadoop.hdfs.TestDFSClientRetries.assertEmpty(TestDFSClientRetries.java:1031)
>   at 
> org.apache.hadoop.hdfs.TestDFSClientRetries.namenodeRestartTest(TestDFSClientRetries.java:951)
>   at 
> org.apache.hadoop.hdfs.web.TestWebHDFS.testNamenodeRestart(TestWebHDFS.java:216)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-5857) TestWebHDFS#testNamenodeRestart fails intermittently with NPE

2014-03-05 Thread Mit Desai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mit Desai updated HDFS-5857:


Attachment: HDFS-5857.patch

Thanks for the input, Haohui.
Attaching the updated patch.

> TestWebHDFS#testNamenodeRestart fails intermittently with NPE
> -
>
> Key: HDFS-5857
> URL: https://issues.apache.org/jira/browse/HDFS-5857
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.0.0, 2.2.0
>Reporter: Mit Desai
>Assignee: Mit Desai
> Attachments: HDFS-5857.patch, HDFS-5857.patch
>
>
> {noformat}
> java.lang.AssertionError: There are 1 exception(s):
>   Exception 0: 
> org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): null
>   at 
> org.apache.hadoop.hdfs.web.JsonUtil.toRemoteException(JsonUtil.java:157)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem.validateResponse(WebHdfsFileSystem.java:315)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$700(WebHdfsFileSystem.java:105)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.getResponse(WebHdfsFileSystem.java:625)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:532)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem.run(WebHdfsFileSystem.java:421)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem.mkdirs(WebHdfsFileSystem.java:701)
>   at org.apache.hadoop.fs.FileSystem.mkdirs(FileSystem.java:1816)
>   at org.apache.hadoop.hdfs.DFSTestUtil.createFile(DFSTestUtil.java:196)
>   at 
> org.apache.hadoop.hdfs.TestDFSClientRetries$6.run(TestDFSClientRetries.java:920)
>   at java.lang.Thread.run(Thread.java:722)
>   at org.junit.Assert.fail(Assert.java:93)
>   at 
> org.apache.hadoop.hdfs.TestDFSClientRetries.assertEmpty(TestDFSClientRetries.java:1031)
>   at 
> org.apache.hadoop.hdfs.TestDFSClientRetries.namenodeRestartTest(TestDFSClientRetries.java:951)
>   at 
> org.apache.hadoop.hdfs.web.TestWebHDFS.testNamenodeRestart(TestWebHDFS.java:216)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6035) TestCacheDirectives#testCacheManagerRestart is failing on branch-2

2014-03-05 Thread Mit Desai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13920948#comment-13920948
 ] 

Mit Desai commented on HDFS-6035:
-

I am trying but cannot reproduce it in Eclipse either. I will put in some 
more effort and update you once I have findings.

> TestCacheDirectives#testCacheManagerRestart is failing on branch-2
> --
>
> Key: HDFS-6035
> URL: https://issues.apache.org/jira/browse/HDFS-6035
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Affects Versions: 2.4.0
>Reporter: Mit Desai
>Assignee: sathish
> Attachments: HDFS-6035-0001.patch
>
>
> {noformat}
> java.io.IOException: Inconsistent checkpoint fields.
> LV = -51 namespaceID = 1641397469 cTime = 0 ; clusterId = testClusterID ; 
> blockpoolId = BP-423574854-x.x.x.x-1393478669835.
> Expecting respectively: -51; 2; 0; testClusterID; 
> BP-2051361571-x.x.x.x-1393478572877.
>   at 
> org.apache.hadoop.hdfs.server.namenode.CheckpointSignature.validateStorageInfo(CheckpointSignature.java:133)
>   at 
> org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doCheckpoint(SecondaryNameNode.java:526)
>   at 
> org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives.testCacheManagerRestart(TestCacheDirectives.java:582)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (HDFS-5839) TestWebHDFS#testNamenodeRestart fails with NullPointerException in trunk

2014-03-04 Thread Mit Desai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mit Desai resolved HDFS-5839.
-

Resolution: Duplicate

HDFS-5857 has a patch for this issue. I am resolving this JIRA so that we have 
a single JIRA tracking it.

> TestWebHDFS#testNamenodeRestart fails with NullPointerException in trunk
> 
>
> Key: HDFS-5839
> URL: https://issues.apache.org/jira/browse/HDFS-5839
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Ted Yu
>Assignee: Mit Desai
> Attachments: org.apache.hadoop.hdfs.web.TestWebHDFS-output.txt
>
>
> Here is test failure:
> {code}
> testNamenodeRestart(org.apache.hadoop.hdfs.web.TestWebHDFS)  Time elapsed: 
> 45.206 sec  <<< FAILURE!
> java.lang.AssertionError: There are 1 exception(s):
>   Exception 0: 
> org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): null
> at 
> org.apache.hadoop.hdfs.web.JsonUtil.toRemoteException(JsonUtil.java:157)
> at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem.validateResponse(WebHdfsFileSystem.java:315)
> at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$700(WebHdfsFileSystem.java:104)
> at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.getResponse(WebHdfsFileSystem.java:615)
> at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:532)
> at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$OffsetUrlOpener.connect(WebHdfsFileSystem.java:878)
> at 
> org.apache.hadoop.hdfs.web.ByteRangeInputStream.openInputStream(ByteRangeInputStream.java:119)
> at 
> org.apache.hadoop.hdfs.web.ByteRangeInputStream.getInputStream(ByteRangeInputStream.java:103)
> at 
> org.apache.hadoop.hdfs.web.ByteRangeInputStream.read(ByteRangeInputStream.java:180)
> at java.io.FilterInputStream.read(FilterInputStream.java:83)
> at 
> org.apache.hadoop.hdfs.TestDFSClientRetries$5.run(TestDFSClientRetries.java:954)
> at java.lang.Thread.run(Thread.java:724)
> at org.junit.Assert.fail(Assert.java:93)
> at 
> org.apache.hadoop.hdfs.TestDFSClientRetries.assertEmpty(TestDFSClientRetries.java:1083)
> at 
> org.apache.hadoop.hdfs.TestDFSClientRetries.namenodeRestartTest(TestDFSClientRetries.java:1003)
> at 
> org.apache.hadoop.hdfs.web.TestWebHDFS.testNamenodeRestart(TestWebHDFS.java:216)
> {code}
> From test output:
> {code}
> 2014-01-27 17:55:59,388 WARN  resources.ExceptionHandler 
> (ExceptionHandler.java:toResponse(92)) - INTERNAL_SERVER_ERROR
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hdfs.server.namenode.web.resources.NamenodeWebHdfsMethods.chooseDatanode(NamenodeWebHdfsMethods.java:166)
> at 
> org.apache.hadoop.hdfs.server.namenode.web.resources.NamenodeWebHdfsMethods.redirectURI(NamenodeWebHdfsMethods.java:231)
> at 
> org.apache.hadoop.hdfs.server.namenode.web.resources.NamenodeWebHdfsMethods.get(NamenodeWebHdfsMethods.java:658)
> at 
> org.apache.hadoop.hdfs.server.namenode.web.resources.NamenodeWebHdfsMethods.access$400(NamenodeWebHdfsMethods.java:116)
> at 
> org.apache.hadoop.hdfs.server.namenode.web.resources.NamenodeWebHdfsMethods$3.run(NamenodeWebHdfsMethods.java:631)
> at 
> org.apache.hadoop.hdfs.server.namenode.web.resources.NamenodeWebHdfsMethods$3.run(NamenodeWebHdfsMethods.java:626)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1560)
> at 
> org.apache.hadoop.hdfs.server.namenode.web.resources.NamenodeWebHdfsMethods.get(NamenodeWebHdfsMethods.java:626)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at 
> com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60)
> at 
> com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$ResponseOutInvoker._dispatch(AbstractResourceMethodDispatchProvider.java:205)
> at 
> com.sun.jersey.server.impl.model.method.dispatch.ResourceJavaMethodDispatcher.dispatch(ResourceJavaMethodDispatcher.java:75)
> at 
> com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:288)
> at 
> com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
> at 
> com.sun.jersey.server.impl.uri.rules.Res

[jira] [Updated] (HDFS-5857) TestWebHDFS#testNamenodeRestart fails intermittently with NPE

2014-03-04 Thread Mit Desai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mit Desai updated HDFS-5857:


Affects Version/s: 3.0.0
   Status: Patch Available  (was: Open)

> TestWebHDFS#testNamenodeRestart fails intermittently with NPE
> -
>
> Key: HDFS-5857
> URL: https://issues.apache.org/jira/browse/HDFS-5857
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.2.0, 3.0.0
>Reporter: Mit Desai
>Assignee: Mit Desai
> Attachments: HDFS-5857.patch
>
>
> {noformat}
> java.lang.AssertionError: There are 1 exception(s):
>   Exception 0: 
> org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): null
>   at 
> org.apache.hadoop.hdfs.web.JsonUtil.toRemoteException(JsonUtil.java:157)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem.validateResponse(WebHdfsFileSystem.java:315)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$700(WebHdfsFileSystem.java:105)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.getResponse(WebHdfsFileSystem.java:625)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:532)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem.run(WebHdfsFileSystem.java:421)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem.mkdirs(WebHdfsFileSystem.java:701)
>   at org.apache.hadoop.fs.FileSystem.mkdirs(FileSystem.java:1816)
>   at org.apache.hadoop.hdfs.DFSTestUtil.createFile(DFSTestUtil.java:196)
>   at 
> org.apache.hadoop.hdfs.TestDFSClientRetries$6.run(TestDFSClientRetries.java:920)
>   at java.lang.Thread.run(Thread.java:722)
>   at org.junit.Assert.fail(Assert.java:93)
>   at 
> org.apache.hadoop.hdfs.TestDFSClientRetries.assertEmpty(TestDFSClientRetries.java:1031)
>   at 
> org.apache.hadoop.hdfs.TestDFSClientRetries.namenodeRestartTest(TestDFSClientRetries.java:951)
>   at 
> org.apache.hadoop.hdfs.web.TestWebHDFS.testNamenodeRestart(TestWebHDFS.java:216)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-5857) TestWebHDFS#testNamenodeRestart fails intermittently with NPE

2014-03-04 Thread Mit Desai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mit Desai updated HDFS-5857:


Attachment: HDFS-5857.patch

Attaching the patch

> TestWebHDFS#testNamenodeRestart fails intermittently with NPE
> -
>
> Key: HDFS-5857
> URL: https://issues.apache.org/jira/browse/HDFS-5857
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.0.0, 2.2.0
>Reporter: Mit Desai
>Assignee: Mit Desai
> Attachments: HDFS-5857.patch
>
>
> {noformat}
> java.lang.AssertionError: There are 1 exception(s):
>   Exception 0: 
> org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): null
>   at 
> org.apache.hadoop.hdfs.web.JsonUtil.toRemoteException(JsonUtil.java:157)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem.validateResponse(WebHdfsFileSystem.java:315)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$700(WebHdfsFileSystem.java:105)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.getResponse(WebHdfsFileSystem.java:625)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:532)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem.run(WebHdfsFileSystem.java:421)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem.mkdirs(WebHdfsFileSystem.java:701)
>   at org.apache.hadoop.fs.FileSystem.mkdirs(FileSystem.java:1816)
>   at org.apache.hadoop.hdfs.DFSTestUtil.createFile(DFSTestUtil.java:196)
>   at 
> org.apache.hadoop.hdfs.TestDFSClientRetries$6.run(TestDFSClientRetries.java:920)
>   at java.lang.Thread.run(Thread.java:722)
>   at org.junit.Assert.fail(Assert.java:93)
>   at 
> org.apache.hadoop.hdfs.TestDFSClientRetries.assertEmpty(TestDFSClientRetries.java:1031)
>   at 
> org.apache.hadoop.hdfs.TestDFSClientRetries.namenodeRestartTest(TestDFSClientRetries.java:951)
>   at 
> org.apache.hadoop.hdfs.web.TestWebHDFS.testNamenodeRestart(TestWebHDFS.java:216)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Assigned] (HDFS-5857) TestWebHDFS#testNamenodeRestart fails intermittently with NPE

2014-03-04 Thread Mit Desai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mit Desai reassigned HDFS-5857:
---

Assignee: Mit Desai

> TestWebHDFS#testNamenodeRestart fails intermittently with NPE
> -
>
> Key: HDFS-5857
> URL: https://issues.apache.org/jira/browse/HDFS-5857
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.2.0
>Reporter: Mit Desai
>Assignee: Mit Desai
>
> {noformat}
> java.lang.AssertionError: There are 1 exception(s):
>   Exception 0: 
> org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): null
>   at 
> org.apache.hadoop.hdfs.web.JsonUtil.toRemoteException(JsonUtil.java:157)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem.validateResponse(WebHdfsFileSystem.java:315)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$700(WebHdfsFileSystem.java:105)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.getResponse(WebHdfsFileSystem.java:625)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:532)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem.run(WebHdfsFileSystem.java:421)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem.mkdirs(WebHdfsFileSystem.java:701)
>   at org.apache.hadoop.fs.FileSystem.mkdirs(FileSystem.java:1816)
>   at org.apache.hadoop.hdfs.DFSTestUtil.createFile(DFSTestUtil.java:196)
>   at 
> org.apache.hadoop.hdfs.TestDFSClientRetries$6.run(TestDFSClientRetries.java:920)
>   at java.lang.Thread.run(Thread.java:722)
>   at org.junit.Assert.fail(Assert.java:93)
>   at 
> org.apache.hadoop.hdfs.TestDFSClientRetries.assertEmpty(TestDFSClientRetries.java:1031)
>   at 
> org.apache.hadoop.hdfs.TestDFSClientRetries.namenodeRestartTest(TestDFSClientRetries.java:951)
>   at 
> org.apache.hadoop.hdfs.web.TestWebHDFS.testNamenodeRestart(TestWebHDFS.java:216)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-5950) The DFSClient and DataNode should use shared memory segments to communicate short-circuit information

2014-03-03 Thread Mit Desai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13918570#comment-13918570
 ] 

Mit Desai commented on HDFS-5950:
-

Hey, I just found that this check-in causes a Release Audit warning for the 
empty file _TestShortCircuitShm.java_.

> The DFSClient and DataNode should use shared memory segments to communicate 
> short-circuit information
> -
>
> Key: HDFS-5950
> URL: https://issues.apache.org/jira/browse/HDFS-5950
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, hdfs-client
>Affects Versions: 2.4.0
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
> Fix For: 2.4.0
>
> Attachments: HDFS-5950.001.patch, HDFS-5950.003.patch, 
> HDFS-5950.004.patch, HDFS-5950.006.patch, HDFS-5950.007.patch, 
> HDFS-5950.008.patch
>
>
> The DFSClient and DataNode should use the shared memory segments and unified 
> cache added in the other HDFS-5182 subtasks to communicate short-circuit 
> information.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6035) TestCacheDirectives#testCacheManagerRestart is failing on branch-2

2014-03-03 Thread Mit Desai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13918110#comment-13918110
 ] 

Mit Desai commented on HDFS-6035:
-

Thanks for taking this issue, Sathish. This test is failing in our nightly 
builds but I am unable to reproduce it. Is there a specific way you were able 
to reproduce it?

> TestCacheDirectives#testCacheManagerRestart is failing on branch-2
> --
>
> Key: HDFS-6035
> URL: https://issues.apache.org/jira/browse/HDFS-6035
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Affects Versions: 2.4.0
>Reporter: Mit Desai
>Assignee: sathish
> Attachments: HDFS-6035-0001.patch
>
>
> {noformat}
> java.io.IOException: Inconsistent checkpoint fields.
> LV = -51 namespaceID = 1641397469 cTime = 0 ; clusterId = testClusterID ; 
> blockpoolId = BP-423574854-x.x.x.x-1393478669835.
> Expecting respectively: -51; 2; 0; testClusterID; 
> BP-2051361571-x.x.x.x-1393478572877.
>   at 
> org.apache.hadoop.hdfs.server.namenode.CheckpointSignature.validateStorageInfo(CheckpointSignature.java:133)
>   at 
> org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doCheckpoint(SecondaryNameNode.java:526)
>   at 
> org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives.testCacheManagerRestart(TestCacheDirectives.java:582)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HDFS-6035) TestCacheDirectives#testCacheManagerRestart is failing on branch-2

2014-02-28 Thread Mit Desai (JIRA)
Mit Desai created HDFS-6035:
---

 Summary: TestCacheDirectives#testCacheManagerRestart is failing on 
branch-2
 Key: HDFS-6035
 URL: https://issues.apache.org/jira/browse/HDFS-6035
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: test
Affects Versions: 2.4.0
Reporter: Mit Desai


{noformat}
java.io.IOException: Inconsistent checkpoint fields.
LV = -51 namespaceID = 1641397469 cTime = 0 ; clusterId = testClusterID ; 
blockpoolId = BP-423574854-x.x.x.x-1393478669835.
Expecting respectively: -51; 2; 0; testClusterID; 
BP-2051361571-x.x.x.x-1393478572877.
at 
org.apache.hadoop.hdfs.server.namenode.CheckpointSignature.validateStorageInfo(CheckpointSignature.java:133)
at 
org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doCheckpoint(SecondaryNameNode.java:526)
at 
org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives.testCacheManagerRestart(TestCacheDirectives.java:582)
{noformat}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5780) TestRBWBlockInvalidation times out intemittently on branch-2

2014-02-15 Thread Mit Desai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mit Desai updated HDFS-5780:


Attachment: HDFS-5780-v3.patch

New patch attached. No code changes from the previous patch; it only updates 
the comment where the thread timeout was changed from 1 sec to 2 sec.

> TestRBWBlockInvalidation times out intemittently on branch-2
> 
>
> Key: HDFS-5780
> URL: https://issues.apache.org/jira/browse/HDFS-5780
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.0.0, 2.2.0
>Reporter: Mit Desai
>Assignee: Mit Desai
> Attachments: HDFS-5780-v3.patch, HDFS-5780.patch, HDFS-5780.patch
>
>
> i recently found out that the test 
> TestRBWBlockInvalidation#testBlockInvalidationWhenRBWReplicaMissedInDN times 
> out intermittently.
> I am using Fedora, JDK7



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5780) TestRBWBlockInvalidation times out intemittently on branch-2

2014-02-14 Thread Mit Desai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mit Desai updated HDFS-5780:


Status: Patch Available  (was: Open)

> TestRBWBlockInvalidation times out intemittently on branch-2
> 
>
> Key: HDFS-5780
> URL: https://issues.apache.org/jira/browse/HDFS-5780
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.2.0, 3.0.0
>Reporter: Mit Desai
>Assignee: Mit Desai
> Attachments: HDFS-5780.patch, HDFS-5780.patch
>
>
> i recently found out that the test 
> TestRBWBlockInvalidation#testBlockInvalidationWhenRBWReplicaMissedInDN times 
> out intermittently.
> I am using Fedora, JDK7



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5780) TestRBWBlockInvalidation times out intemittently on branch-2

2014-02-14 Thread Mit Desai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mit Desai updated HDFS-5780:


Attachment: HDFS-5780.patch

Attaching the new patch with the requested changes. I have increased the 
timeout to 10 minutes and had to make a few other timing-related changes.
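
For reference, the JUnit mechanism involved is the per-test timeout annotation; the sketch below only illustrates that mechanism under a hypothetical class name and is not the actual patch:
{code}
import org.junit.Test;

// Hypothetical container class, for illustration only.
public class TimeoutExample {

  // JUnit fails the test if it runs longer than the timeout, given in
  // milliseconds; 600000 ms = 10 minutes, as mentioned above.
  @Test(timeout = 600000)
  public void testBlockInvalidationWhenRBWReplicaMissedInDN() throws Exception {
    // ... test body omitted ...
  }
}
{code}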

> TestRBWBlockInvalidation times out intemittently on branch-2
> 
>
> Key: HDFS-5780
> URL: https://issues.apache.org/jira/browse/HDFS-5780
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.0.0, 2.2.0
>Reporter: Mit Desai
>Assignee: Mit Desai
> Attachments: HDFS-5780.patch, HDFS-5780.patch
>
>
> i recently found out that the test 
> TestRBWBlockInvalidation#testBlockInvalidationWhenRBWReplicaMissedInDN times 
> out intermittently.
> I am using Fedora, JDK7



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5780) TestRBWBlockInvalidation times out intemittently on branch-2

2014-02-14 Thread Mit Desai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mit Desai updated HDFS-5780:


Status: Open  (was: Patch Available)

> TestRBWBlockInvalidation times out intemittently on branch-2
> 
>
> Key: HDFS-5780
> URL: https://issues.apache.org/jira/browse/HDFS-5780
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.2.0, 3.0.0
>Reporter: Mit Desai
>Assignee: Mit Desai
> Attachments: HDFS-5780.patch
>
>
> i recently found out that the test 
> TestRBWBlockInvalidation#testBlockInvalidationWhenRBWReplicaMissedInDN times 
> out intermittently.
> I am using Fedora, JDK7



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

