[jira] [Commented] (HDFS-742) A down DataNode makes Balancer to hang on repeatingly asking NameNode its partial block list

2015-07-28 Thread Mit Desai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14644357#comment-14644357
 ] 

Mit Desai commented on HDFS-742:


Attached a modified patch. However, I still do not have a unit test for the fix.

 A down DataNode makes Balancer to hang on repeatingly asking NameNode its 
 partial block list
 

 Key: HDFS-742
 URL: https://issues.apache.org/jira/browse/HDFS-742
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: balancer & mover
Reporter: Hairong Kuang
Assignee: Mit Desai
 Attachments: HDFS-742-trunk.patch, HDFS-742.patch


 We had a balancer that had not made any progress for a long time. It turned 
 out it was repeatedly asking the NameNode for a partial block list of one 
 datanode, which was down while the balancer was running.
 NameNode should notify Balancer that the datanode is not available and 
 Balancer should stop asking for the datanode's block list.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-742) A down DataNode makes Balancer to hang on repeatingly asking NameNode its partial block list

2015-07-28 Thread Mit Desai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mit Desai updated HDFS-742:
---
Attachment: HDFS-742-trunk.patch

 A down DataNode makes Balancer to hang on repeatingly asking NameNode its 
 partial block list
 

 Key: HDFS-742
 URL: https://issues.apache.org/jira/browse/HDFS-742
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: balancer & mover
Reporter: Hairong Kuang
Assignee: Mit Desai
 Attachments: HDFS-742-trunk.patch, HDFS-742.patch


 We had a balancer that had not made any progress for a long time. It turned 
 out it was repeatedly asking the NameNode for a partial block list of one 
 datanode, which was down while the balancer was running.
 NameNode should notify Balancer that the datanode is not available and 
 Balancer should stop asking for the datanode's block list.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7364) Balancer always show zero Bytes Already Moved

2014-11-06 Thread Mit Desai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14200425#comment-14200425
 ] 

Mit Desai commented on HDFS-7364:
-

Nice catch. The balancer exits here after 5 iterations of what it thinks are 0 B 
moves, which means it is still balancing and exits in the middle of the process. I 
see that the Bytes Left To Move is going down in every iteration.
It will be nice to have this fixed, but it would be good to have a unit test as 
well.
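
To make the failure mode concrete, here is a minimal, self-contained sketch; it is 
not the actual Balancer code, and the class, constant, and byte values 
(ZeroMoveExitSketch, MAX_NO_MOVE_ITERATIONS, the MB counts) are illustrative 
assumptions only:
{code}
// Illustration only: an exit rule that counts iterations reported as "0 B moved"
// can give up even while "bytes left to move" is still shrinking.
public class ZeroMoveExitSketch {
  static final int MAX_NO_MOVE_ITERATIONS = 5;     // illustrative limit

  public static void main(String[] args) {
    long bytesLeftToMove = 116_820_000L;           // pretend imbalance
    int noMoveIterations = 0;

    while (bytesLeftToMove > 0) {
      long bytesMovedThisIteration = 0L;           // mis-reported as 0 B (the bug)
      bytesLeftToMove = Math.max(0L, bytesLeftToMove - 20_000_000L); // real progress

      if (bytesMovedThisIteration == 0) {
        if (++noMoveIterations >= MAX_NO_MOVE_ITERATIONS) {
          System.out.println("The cluster is balanced. Exiting...");  // premature exit
          return;
        }
      } else {
        noMoveIterations = 0;
      }
    }
    System.out.println("Balancing actually finished.");
  }
}
{code}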

 Balancer always show zero Bytes Already Moved
 -

 Key: HDFS-7364
 URL: https://issues.apache.org/jira/browse/HDFS-7364
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: balancer & mover
Reporter: Tsz Wo Nicholas Sze
Assignee: Tsz Wo Nicholas Sze
 Attachments: h7364_20141105.patch


 Here is an example:
 {noformat}
 Time Stamp             Iteration#  Bytes Already Moved  Bytes Left To Move  Bytes Being Moved
 Nov 5, 2014 5:23:38 PM          0                  0 B           116.82 MB          181.07 MB
 Nov 5, 2014 5:24:30 PM          1                  0 B            88.05 MB          181.07 MB
 Nov 5, 2014 5:25:10 PM          2                  0 B            73.08 MB          181.07 MB
 Nov 5, 2014 5:25:49 PM          3                  0 B            13.37 MB           90.53 MB
 Nov 5, 2014 5:26:30 PM          4                  0 B            13.59 MB           90.53 MB
 Nov 5, 2014 5:27:12 PM          5                  0 B             9.25 MB           90.53 MB
 The cluster is balanced. Exiting...
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7230) Add rolling downgrade documentation

2014-10-30 Thread Mit Desai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14190226#comment-14190226
 ] 

Mit Desai commented on HDFS-7230:
-

+1 (non-binding)

Thanks for the patch [~szetszwo].

 Add rolling downgrade documentation
 ---

 Key: HDFS-7230
 URL: https://issues.apache.org/jira/browse/HDFS-7230
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: documentation
Reporter: Tsz Wo Nicholas Sze
Assignee: Tsz Wo Nicholas Sze
 Attachments: h7230_20141028.patch


 HDFS-5535 made a lot of improvement on rolling upgrade.  It also added the 
 cluster downgrade feature.  However, the downgrade described in HDFS-5535 
 requires cluster downtime.  In this JIRA, we discuss how to do rolling 
 downgrade, i.e. downgrade without downtime.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-6687) nn.getNamesystem() may return NPE from JspHelper

2014-09-26 Thread Mit Desai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mit Desai updated HDFS-6687:

Target Version/s: 2.6.0  (was: 2.5.0)

 nn.getNamesystem() may return NPE from JspHelper
 

 Key: HDFS-6687
 URL: https://issues.apache.org/jira/browse/HDFS-6687
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.5.0
Reporter: Mit Desai
Assignee: Mit Desai

 In hadoop-2, the http server is started at a very early stage to show the 
 startup progress. If the user tries to get the name system at that point, it 
 may not be completely up yet, and the NN logs will show this kind of error.
 {noformat}
 2014-07-14 15:49:03,521 [***] WARN
 resources.ExceptionHandler: INTERNAL_SERVER_ERROR
 java.lang.NullPointerException
 at
 org.apache.hadoop.hdfs.server.common.JspHelper.getTokenUGI(JspHelper.java:661)
 at
 org.apache.hadoop.hdfs.server.common.JspHelper.getUGI(JspHelper.java:604)
 at
 org.apache.hadoop.hdfs.web.resources.UserProvider.getValue(UserProvider.java:53)
 at
 org.apache.hadoop.hdfs.web.resources.UserProvider.getValue(UserProvider.java:41)
 at
 com.sun.jersey.server.impl.inject.InjectableValuesProvider.getInjectableValues(InjectableValuesProvider.java:46)
 at
 com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$EntityParamInInvoker.getParams(AbstractResourceMethodDispatchProvider.java:153)
 at
 com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$ResponseOutInvoker._dispatch(AbstractResourceMethodDispatchProvider.java:203)
 at
 com.sun.jersey.server.impl.model.method.dispatch.ResourceJavaMethodDispatcher.dispatch(ResourceJavaMethodDispatcher.java:75)
 at
 com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:288)
 at
 com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
 at
 com.sun.jersey.server.impl.uri.rules.ResourceClassRule.accept(ResourceClassRule.java:108)
 at
 com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
 at
 com.sun.jersey.server.impl.uri.rules.RootResourceClassesRule.accept(RootResourceClassesRule.java:84)
 at
 com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1469)
 at
 com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1400)
 at
 com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1349)
 at
 com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1339)
 at
 com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:416)
 at
 com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:537)
 at
 com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:699)
 at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
 at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
 at
 org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1221)
 at org.apache.hadoop.hdfs.web.AuthFilter.doFilter(AuthFilter.java:84)
 at
 org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
 at org.mortbay.servlet.UserAgentFilter.doFilter(UserAgentFilter.java:78)
 at com.yahoo.hadoop.GzipFilter.doFilter(GzipFilter.java:220)
 at
 org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
 at
 org.apache.hadoop.http.HttpServer2$QuotingInputFilter.doFilter(HttpServer2.java:1223)
 at
 org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
 at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
 at
 org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
 at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
 at
 org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
 at 
 org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
 at
 org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
 at 
 org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
 at 
 org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
 at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
 at
 org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
 at 
 org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
 at org.mortbay.jetty.Server.handle(Server.java:326)
 at 

[jira] [Created] (HDFS-6983) TestBalancer#testExitZeroOnSuccess fails intermittently

2014-09-02 Thread Mit Desai (JIRA)
Mit Desai created HDFS-6983:
---

 Summary: TestBalancer#testExitZeroOnSuccess fails intermittently
 Key: HDFS-6983
 URL: https://issues.apache.org/jira/browse/HDFS-6983
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.5.1
Reporter: Mit Desai


TestBalancer#testExitZeroOnSuccess fails intermittently on branch-2, and 
probably fails on trunk too.

The test fails 1 in 20 times when I run it in a loop. Here is how it fails.

{noformat}
org.apache.hadoop.hdfs.server.balancer.TestBalancer
testExitZeroOnSuccess(org.apache.hadoop.hdfs.server.balancer.TestBalancer)  
Time elapsed: 53.965 sec  <<< ERROR!
java.util.concurrent.TimeoutException: Rebalancing expected avg utilization to 
become 0.2, but on datanode 127.0.0.1:35502 it remains at 0.08 after more than 
4 msec.
at 
org.apache.hadoop.hdfs.server.balancer.TestBalancer.waitForBalancer(TestBalancer.java:321)
at 
org.apache.hadoop.hdfs.server.balancer.TestBalancer.runBalancerCli(TestBalancer.java:632)
at 
org.apache.hadoop.hdfs.server.balancer.TestBalancer.doTest(TestBalancer.java:549)
at 
org.apache.hadoop.hdfs.server.balancer.TestBalancer.doTest(TestBalancer.java:437)
at 
org.apache.hadoop.hdfs.server.balancer.TestBalancer.oneNodeTest(TestBalancer.java:645)
at 
org.apache.hadoop.hdfs.server.balancer.TestBalancer.testExitZeroOnSuccess(TestBalancer.java:845)


Results :

Tests in error: 
  
TestBalancer.testExitZeroOnSuccess:845-oneNodeTest:645-doTest:437-doTest:549-runBalancerCli:632-waitForBalancer:321
 Timeout
{noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-6754) TestNamenodeCapacityReport.testXceiverCount may sometimes fail due to lack of retry

2014-07-28 Thread Mit Desai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mit Desai updated HDFS-6754:


Status: Patch Available  (was: Open)

 TestNamenodeCapacityReport.testXceiverCount may sometimes fail due to lack of 
 retry
 ---

 Key: HDFS-6754
 URL: https://issues.apache.org/jira/browse/HDFS-6754
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Mit Desai
Assignee: Mit Desai
 Attachments: HDFS-6754.patch


 I have seen TestNamenodeCapacityReport.testXceiverCount fail intermittently 
 in our nightly builds with the following error:
 {noformat}
 java.io.IOException: Unable to close file because the last block does not 
 have enough number of replicas.
   at 
 org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:2151)
   at 
 org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:2119)
   at 
 org.apache.hadoop.hdfs.server.namenode.TestNamenodeCapacityReport.testXceiverCount(TestNamenodeCapacityReport.java:281)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6754) TestNamenodeCapacityReport.testXceiverCount may sometimes fail due to lack of retry

2014-07-28 Thread Mit Desai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mit Desai updated HDFS-6754:


Status: Open  (was: Patch Available)

 TestNamenodeCapacityReport.testXceiverCount may sometimes fail due to lack of 
 retry
 ---

 Key: HDFS-6754
 URL: https://issues.apache.org/jira/browse/HDFS-6754
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Mit Desai
Assignee: Mit Desai
 Attachments: HDFS-6754.patch


 I have seen TestNamenodeCapacityReport.testXceiverCount fail intermittently 
 in our nightly builds with the following error:
 {noformat}
 java.io.IOException: Unable to close file because the last block does not 
 have enough number of replicas.
   at 
 org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:2151)
   at 
 org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:2119)
   at 
 org.apache.hadoop.hdfs.server.namenode.TestNamenodeCapacityReport.testXceiverCount(TestNamenodeCapacityReport.java:281)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6754) TestNamenodeCapacityReport.testXceiverCount may sometimes fail due to lack of retry

2014-07-28 Thread Mit Desai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mit Desai updated HDFS-6754:


Attachment: HDFS-6754.patch

Attaching patch to enable retries.
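
As a rough illustration of what enabling retries for this failure could look like 
in a test setup, here is a sketch that raises the HDFS client's retry count for 
completing the last block. The configuration key 
dfs.client.block.write.locateFollowingBlock.retries is the standard client setting; 
the class name, retry count, and placement are assumptions and not necessarily what 
the attached patch does:
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hdfs.HdfsConfiguration;

public class RetryConfSketch {
  public static void main(String[] args) {
    // Sketch only: raise the number of client retries used when the NameNode
    // reports that the last block does not yet have enough replicas.
    Configuration conf = new HdfsConfiguration();
    conf.setInt("dfs.client.block.write.locateFollowingBlock.retries", 10); // assumed count
    // The test's MiniDFSCluster would then be built with this conf.
    System.out.println(conf.getInt(
        "dfs.client.block.write.locateFollowingBlock.retries", 5));
  }
}
{code}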

 TestNamenodeCapacityReport.testXceiverCount may sometimes fail due to lack of 
 retry
 ---

 Key: HDFS-6754
 URL: https://issues.apache.org/jira/browse/HDFS-6754
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Mit Desai
Assignee: Mit Desai
 Attachments: HDFS-6754.patch


 I have seen TestNamenodeCapacityReport.testXceiverCount fail intermittently 
 in our nightly builds with the following error:
 {noformat}
 java.io.IOException: Unable to close file because the last block does not 
 have enough number of replicas.
   at 
 org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:2151)
   at 
 org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:2119)
   at 
 org.apache.hadoop.hdfs.server.namenode.TestNamenodeCapacityReport.testXceiverCount(TestNamenodeCapacityReport.java:281)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6754) TestNamenodeCapacityReport.testXceiverCount may sometimes fail due to lack of retry

2014-07-28 Thread Mit Desai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mit Desai updated HDFS-6754:


Status: Patch Available  (was: Open)

 TestNamenodeCapacityReport.testXceiverCount may sometimes fail due to lack of 
 retry
 ---

 Key: HDFS-6754
 URL: https://issues.apache.org/jira/browse/HDFS-6754
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Mit Desai
Assignee: Mit Desai
 Attachments: HDFS-6754.patch, HDFS-6754.patch


 I have seen TestNamenodeCapacityReport.testXceiverCount fail intermittently 
 in our nightly builds with the following error:
 {noformat}
 java.io.IOException: Unable to close file because the last block does not 
 have enough number of replicas.
   at 
 org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:2151)
   at 
 org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:2119)
   at 
 org.apache.hadoop.hdfs.server.namenode.TestNamenodeCapacityReport.testXceiverCount(TestNamenodeCapacityReport.java:281)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6754) TestNamenodeCapacityReport.testXceiverCount may sometimes fail due to lack of retry

2014-07-28 Thread Mit Desai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mit Desai updated HDFS-6754:


Attachment: HDFS-6754.patch

Refined the patch to update the comment that describes the changed line.

 TestNamenodeCapacityReport.testXceiverCount may sometimes fail due to lack of 
 retry
 ---

 Key: HDFS-6754
 URL: https://issues.apache.org/jira/browse/HDFS-6754
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Mit Desai
Assignee: Mit Desai
 Attachments: HDFS-6754.patch, HDFS-6754.patch


 I have seen TestNamenodeCapacityReport.testXceiverCount fail intermittently 
 in our nightly builds with the following error:
 {noformat}
 java.io.IOException: Unable to close file because the last block does not 
 have enough number of replicas.
   at 
 org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:2151)
   at 
 org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:2119)
   at 
 org.apache.hadoop.hdfs.server.namenode.TestNamenodeCapacityReport.testXceiverCount(TestNamenodeCapacityReport.java:281)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6754) TestNamenodeCapacityReport.testXceiverCount may sometimes fail due to lack of retry

2014-07-28 Thread Mit Desai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14076649#comment-14076649
 ] 

Mit Desai commented on HDFS-6754:
-

These test failures are not related to the patch.

 TestNamenodeCapacityReport.testXceiverCount may sometimes fail due to lack of 
 retry
 ---

 Key: HDFS-6754
 URL: https://issues.apache.org/jira/browse/HDFS-6754
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Mit Desai
Assignee: Mit Desai
 Attachments: HDFS-6754.patch, HDFS-6754.patch


 I have seen TestNamenodeCapacityReport.testXceiverCount fail intermittently 
 in our nightly builds with the following error:
 {noformat}
 java.io.IOException: Unable to close file because the last block does not 
 have enough number of replicas.
   at 
 org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:2151)
   at 
 org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:2119)
   at 
 org.apache.hadoop.hdfs.server.namenode.TestNamenodeCapacityReport.testXceiverCount(TestNamenodeCapacityReport.java:281)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6754) TestNamenodeCapacityReport.testXceiverCount may sometimes fail due to lack of retry

2014-07-28 Thread Mit Desai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14076918#comment-14076918
 ] 

Mit Desai commented on HDFS-6754:
-

Thanks Daryn!

 TestNamenodeCapacityReport.testXceiverCount may sometimes fail due to lack of 
 retry
 ---

 Key: HDFS-6754
 URL: https://issues.apache.org/jira/browse/HDFS-6754
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Mit Desai
Assignee: Mit Desai
 Attachments: HDFS-6754.patch, HDFS-6754.patch


 I have seen TestNamenodeCapacityReport.testXceiverCount fail intermittently 
 in our nightly builds with the following error:
 {noformat}
 java.io.IOException: Unable to close file because the last block does not 
 have enough number of replicas.
   at 
 org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:2151)
   at 
 org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:2119)
   at 
 org.apache.hadoop.hdfs.server.namenode.TestNamenodeCapacityReport.testXceiverCount(TestNamenodeCapacityReport.java:281)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Moved] (HDFS-6754) TestNamenodeCapacityReport.testXceiverCount may sometimes fail due to lack of retry

2014-07-25 Thread Mit Desai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mit Desai moved YARN-2358 to HDFS-6754:
---

 Target Version/s: 2.6.0  (was: 2.6.0)
Affects Version/s: (was: 2.6.0)
   2.6.0
  Key: HDFS-6754  (was: YARN-2358)
  Project: Hadoop HDFS  (was: Hadoop YARN)

 TestNamenodeCapacityReport.testXceiverCount may sometimes fail due to lack of 
 retry
 ---

 Key: HDFS-6754
 URL: https://issues.apache.org/jira/browse/HDFS-6754
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Mit Desai
Assignee: Mit Desai

 I have seen TestNamenodeCapacityReport.testXceiverCount fail intermittently 
 in our nightly builds with the following error:
 {noformat}
 java.io.IOException: Unable to close file because the last block does not 
 have enough number of replicas.
   at 
 org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:2151)
   at 
 org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:2119)
   at 
 org.apache.hadoop.hdfs.server.namenode.TestNamenodeCapacityReport.testXceiverCount(TestNamenodeCapacityReport.java:281)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HDFS-6755) Make DFSOutputStream more efficient

2014-07-25 Thread Mit Desai (JIRA)
Mit Desai created HDFS-6755:
---

 Summary: Make DFSOutputStream more efficient
 Key: HDFS-6755
 URL: https://issues.apache.org/jira/browse/HDFS-6755
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Mit Desai
Assignee: Mit Desai


Following code in DFSOutputStream may have an unnecessary sleep.

{code}
try {
  Thread.sleep(localTimeout);
  if (retries == 0) {
    throw new IOException("Unable to close file because the last block"
        + " does not have enough number of replicas.");
  }
  retries--;
  localTimeout *= 2;
  if (Time.now() - localstart > 5000) {
    DFSClient.LOG.info("Could not complete " + src + " retrying...");
  }
} catch (InterruptedException ie) {
  DFSClient.LOG.warn("Caught exception ", ie);
}
{code}

Currently, the code sleeps before throwing an exception, which should not be the 
case. The sleep time gets doubled on every iteration, which can have a significant 
effect when there is more than one iteration. We need to move the sleep down 
after decrementing retries.
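
A minimal sketch of the proposed ordering, reusing the variables from the snippet 
above; this is an assumed shape for illustration, not the attached patch:
{code}
// Sketch of the proposed ordering (assumption, not the attached patch): give up
// immediately when retries are exhausted, and back off only before a real retry.
try {
  if (retries == 0) {
    throw new IOException("Unable to close file because the last block"
        + " does not have enough number of replicas.");
  }
  retries--;
  Thread.sleep(localTimeout);   // sleep only when another attempt will be made
  localTimeout *= 2;
  if (Time.now() - localstart > 5000) {
    DFSClient.LOG.info("Could not complete " + src + " retrying...");
  }
} catch (InterruptedException ie) {
  DFSClient.LOG.warn("Caught exception ", ie);
}
{code}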



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6755) Make DFSOutputStream more efficient

2014-07-25 Thread Mit Desai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mit Desai updated HDFS-6755:


Issue Type: Improvement  (was: Bug)

 Make DFSOutputStream more efficient
 ---

 Key: HDFS-6755
 URL: https://issues.apache.org/jira/browse/HDFS-6755
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 2.6.0
Reporter: Mit Desai
Assignee: Mit Desai

 Following code in DFSOutputStream may have an unnecessary sleep.
 {code}
 try {
   Thread.sleep(localTimeout);
   if (retries == 0) {
     throw new IOException("Unable to close file because the last block"
         + " does not have enough number of replicas.");
   }
   retries--;
   localTimeout *= 2;
   if (Time.now() - localstart > 5000) {
     DFSClient.LOG.info("Could not complete " + src + " retrying...");
   }
 } catch (InterruptedException ie) {
   DFSClient.LOG.warn("Caught exception ", ie);
 }
 {code}
 Currently, the code sleeps before throwing an exception, which should not be 
 the case. The sleep time gets doubled on every iteration, which can have a 
 significant effect when there is more than one iteration. We need to move the 
 sleep down after decrementing retries.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6755) Make DFSOutputStream more efficient

2014-07-25 Thread Mit Desai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mit Desai updated HDFS-6755:


Description: 
Following code in DFSOutputStream may have an unnecessary sleep.

{code}
try {
  Thread.sleep(localTimeout);
  if (retries == 0) {
    throw new IOException("Unable to close file because the last block"
        + " does not have enough number of replicas.");
  }
  retries--;
  localTimeout *= 2;
  if (Time.now() - localstart > 5000) {
    DFSClient.LOG.info("Could not complete " + src + " retrying...");
  }
} catch (InterruptedException ie) {
  DFSClient.LOG.warn("Caught exception ", ie);
}
{code}

Currently, the code sleeps before throwing an exception, which should not be the 
case. The sleep time gets doubled on every iteration, which can have a significant 
effect when there is more than one iteration, and it would sleep just to throw 
an exception. We need to move the sleep down after decrementing retries.

  was:
Following code in DFSOutputStream may have an unnecessary sleep.

{code}
try {
  Thread.sleep(localTimeout);
  if (retries == 0) {
    throw new IOException("Unable to close file because the last block"
        + " does not have enough number of replicas.");
  }
  retries--;
  localTimeout *= 2;
  if (Time.now() - localstart > 5000) {
    DFSClient.LOG.info("Could not complete " + src + " retrying...");
  }
} catch (InterruptedException ie) {
  DFSClient.LOG.warn("Caught exception ", ie);
}
{code}

Currently, the code sleeps before throwing an exception, which should not be the 
case. The sleep time gets doubled on every iteration, which can have a significant 
effect when there is more than one iteration. We need to move the sleep down 
after decrementing retries.


 Make DFSOutputStream more efficient
 ---

 Key: HDFS-6755
 URL: https://issues.apache.org/jira/browse/HDFS-6755
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 2.6.0
Reporter: Mit Desai
Assignee: Mit Desai

 Following code in DFSOutputStream may have an unnecessary sleep.
 {code}
 try {
   Thread.sleep(localTimeout);
   if (retries == 0) {
     throw new IOException("Unable to close file because the last block"
         + " does not have enough number of replicas.");
   }
   retries--;
   localTimeout *= 2;
   if (Time.now() - localstart > 5000) {
     DFSClient.LOG.info("Could not complete " + src + " retrying...");
   }
 } catch (InterruptedException ie) {
   DFSClient.LOG.warn("Caught exception ", ie);
 }
 {code}
 Currently, the code sleeps before throwing an exception, which should not be 
 the case. The sleep time gets doubled on every iteration, which can have a 
 significant effect when there is more than one iteration, and it would sleep 
 just to throw an exception. We need to move the sleep down after decrementing 
 retries.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6755) Make DFSOutputStream more efficient

2014-07-25 Thread Mit Desai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mit Desai updated HDFS-6755:


Attachment: HDFS-6755.patch

Hi [~cmccabe],
I did not mean to get rid of the sleep. I have uploaded the patch to indicate 
the change I wanted to make: throw an IOException if {{retries == 0}} before 
{{Thread.sleep(localTimeout);}} is called.

Does that seem reasonable?

 Make DFSOutputStream more efficient
 ---

 Key: HDFS-6755
 URL: https://issues.apache.org/jira/browse/HDFS-6755
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 2.6.0
Reporter: Mit Desai
Assignee: Mit Desai
 Attachments: HDFS-6755.patch


 Following code in DFSOutputStream may have an unnecessary sleep.
 {code}
 try {
   Thread.sleep(localTimeout);
   if (retries == 0) {
     throw new IOException("Unable to close file because the last block"
         + " does not have enough number of replicas.");
   }
   retries--;
   localTimeout *= 2;
   if (Time.now() - localstart > 5000) {
     DFSClient.LOG.info("Could not complete " + src + " retrying...");
   }
 } catch (InterruptedException ie) {
   DFSClient.LOG.warn("Caught exception ", ie);
 }
 {code}
 Currently, the code sleeps before throwing an exception, which should not be 
 the case. The sleep time gets doubled on every iteration, which can have a 
 significant effect when there is more than one iteration, and it would sleep 
 just to throw an exception. We need to move the sleep down after decrementing 
 retries.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6755) There is an unnecessary sleep in the code path where DFSOutputStream#close gives up its attempt to contact the namenode

2014-07-25 Thread Mit Desai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14075236#comment-14075236
 ] 

Mit Desai commented on HDFS-6755:
-

Thanks Colin!

 There is an unnecessary sleep in the code path where DFSOutputStream#close 
 gives up its attempt to contact the namenode
 ---

 Key: HDFS-6755
 URL: https://issues.apache.org/jira/browse/HDFS-6755
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 2.6.0
Reporter: Mit Desai
Assignee: Mit Desai
 Attachments: HDFS-6755.patch


 DFSOutputStream#close has a loop where it tries to contact the NameNode, to 
 call {{complete}} on the file which is open-for-write.  This loop includes a 
 sleep which increases exponentially (exponential backoff).  It makes sense to 
 sleep before re-contacting the NameNode, but the code also sleeps even in the 
 case where it has already decided to give up and throw an exception back to 
 the user.  It should not sleep after it has already decided to give up, since 
 there's no point.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6696) Name node cannot start if the path of a file under construction contains .snapshot

2014-07-21 Thread Mit Desai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14069235#comment-14069235
 ] 

Mit Desai commented on HDFS-6696:
-

[~andrew.wang], we were trying to upgrade 0.21.11 to 2.4.0

 Name node cannot start if the path of a file under construction contains 
 .snapshot
 

 Key: HDFS-6696
 URL: https://issues.apache.org/jira/browse/HDFS-6696
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Kihwal Lee
Assignee: Andrew Wang
Priority: Blocker

 Using {{-renameReserved}} to rename .snapshot in a pre-hdfs-snapshot 
 feature fsimage during upgrade only works if there is nothing under 
 construction under the renamed directory. I am not sure whether it takes 
 care of edits containing .snapshot properly.
 The workaround is to identify these directories and rename, then do 
 {{saveNamespace}} before performing upgrade.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HDFS-6687) nn.getNamesystem() may return NPE from JspHelper

2014-07-15 Thread Mit Desai (JIRA)
Mit Desai created HDFS-6687:
---

 Summary: nn.getNamesystem() may return NPE from JspHelper
 Key: HDFS-6687
 URL: https://issues.apache.org/jira/browse/HDFS-6687
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.5.0
Reporter: Mit Desai
Assignee: Mit Desai


In hadoop-2, the http server is started at a very early stage to show the 
startup progress. If the user tries to get the name system at that point, it may 
not be completely up yet, and the NN logs will show this kind of error.

{noformat}
2014-07-14 15:49:03,521 [***] WARN
resources.ExceptionHandler: INTERNAL_SERVER_ERROR
java.lang.NullPointerException
at
org.apache.hadoop.hdfs.server.common.JspHelper.getTokenUGI(JspHelper.java:661)
at
org.apache.hadoop.hdfs.server.common.JspHelper.getUGI(JspHelper.java:604)
at
org.apache.hadoop.hdfs.web.resources.UserProvider.getValue(UserProvider.java:53)
at
org.apache.hadoop.hdfs.web.resources.UserProvider.getValue(UserProvider.java:41)
at
com.sun.jersey.server.impl.inject.InjectableValuesProvider.getInjectableValues(InjectableValuesProvider.java:46)
at
com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$EntityParamInInvoker.getParams(AbstractResourceMethodDispatchProvider.java:153)
at
com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$ResponseOutInvoker._dispatch(AbstractResourceMethodDispatchProvider.java:203)
at
com.sun.jersey.server.impl.model.method.dispatch.ResourceJavaMethodDispatcher.dispatch(ResourceJavaMethodDispatcher.java:75)
at
com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:288)
at
com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
at
com.sun.jersey.server.impl.uri.rules.ResourceClassRule.accept(ResourceClassRule.java:108)
at
com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
at
com.sun.jersey.server.impl.uri.rules.RootResourceClassesRule.accept(RootResourceClassesRule.java:84)
at
com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1469)
at
com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1400)
at
com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1349)
at
com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1339)
at
com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:416)
at
com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:537)
at
com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:699)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
at
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1221)
at org.apache.hadoop.hdfs.web.AuthFilter.doFilter(AuthFilter.java:84)
at
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at org.mortbay.servlet.UserAgentFilter.doFilter(UserAgentFilter.java:78)
at com.yahoo.hadoop.GzipFilter.doFilter(GzipFilter.java:220)
at
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at
org.apache.hadoop.http.HttpServer2$QuotingInputFilter.doFilter(HttpServer2.java:1223)
at
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
at
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
at
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
at
org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
at
org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
at org.mortbay.jetty.Server.handle(Server.java:326)
at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
at
org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
at 

[jira] [Created] (HDFS-6691) The message on NN UI can be confusing during a rolling upgrade

2014-07-15 Thread Mit Desai (JIRA)
Mit Desai created HDFS-6691:
---

 Summary: The message on NN UI can be confusing during a rolling 
upgrade 
 Key: HDFS-6691
 URL: https://issues.apache.org/jira/browse/HDFS-6691
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.5.0
Reporter: Mit Desai
Assignee: Mit Desai
 Attachments: ha1.png

On ANN, it says rollback image was created. On SBN, it says otherwise.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6691) The message on NN UI can be confusing during a rolling upgrade

2014-07-15 Thread Mit Desai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mit Desai updated HDFS-6691:


Attachment: ha1.png

 The message on NN UI can be confusing during a rolling upgrade 
 ---

 Key: HDFS-6691
 URL: https://issues.apache.org/jira/browse/HDFS-6691
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.5.0
Reporter: Mit Desai
Assignee: Mit Desai
 Attachments: ha1.png


 On ANN, it says rollback image was created. On SBN, it says otherwise.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6691) The message on NN UI can be confusing during a rolling upgrade

2014-07-15 Thread Mit Desai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mit Desai updated HDFS-6691:


Attachment: ha2.png

 The message on NN UI can be confusing during a rolling upgrade 
 ---

 Key: HDFS-6691
 URL: https://issues.apache.org/jira/browse/HDFS-6691
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.5.0
Reporter: Mit Desai
Assignee: Mit Desai
 Attachments: ha1.png, ha2.png


 On ANN, it says rollback image was created. On SBN, it says otherwise.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6597) New option for namenode upgrade

2014-06-24 Thread Mit Desai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14042518#comment-14042518
 ] 

Mit Desai commented on HDFS-6597:
-

The idea seems good as it does not alter the way -upgrade currently works.
* I agree with [~cmccabe] and [~cnauroth] on the new name being "force".
* Instead of -force, -halt or -upgradeOnly seems reasonable. But anything 
would be fine as long as it does not imply that we are forcing something to get 
done that should not be done.

Thanks,
Mit

 New option for namenode upgrade
 ---

 Key: HDFS-6597
 URL: https://issues.apache.org/jira/browse/HDFS-6597
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Reporter: Danilo Vunjak
 Attachments: JIRA-HDFS-30.patch


 Currently, when the namenode is started for upgrade (the "hadoop namenode -upgrade" 
 command), after finishing the upgrade of its metadata the namenode starts working 
 normally and waits for the datanodes to upgrade themselves and connect to the NN. We 
 need an option for upgrading only the NN metadata, so that after the upgrade is 
 finished on the NN, the process terminates.
 I have tested it by changing file hdfs.server.namenode.NameNode.java, 
 method public static NameNode createNameNode(String argv[], Configuration conf), 
 adding in the switch:
 case UPGRADE:
   {
     DefaultMetricsSystem.initialize("NameNode");
     NameNode nameNode = new NameNode(conf);
     if (startOpt.getForceUpgrade()) {
       terminate(0);
       return null;
     }
     return nameNode;
   }
 This upgraded the metadata and closed the process after it finished; later, when 
 all services were started, the upgrade of the datanodes finished successfully and 
 the system ran.
 What I'm suggesting right now is to add a new startup parameter -force, so the 
 namenode can be started like this: "hadoop namenode -upgrade -force", so we 
 can indicate that we want to terminate the process after the upgrade of metadata 
 on the NN is finished. The old functionality should be preserved, so users can run 
 "hadoop namenode -upgrade" in the same way and with the same behaviour as before.
  Thanks,
  Danilo



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6597) Add a new option to NN upgrade to terminate the process after upgrade on NN is completed

2014-06-24 Thread Mit Desai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mit Desai updated HDFS-6597:


Summary: Add a new option to NN upgrade to terminate the process after 
upgrade on NN is completed  (was: New option for namenode upgrade)

 Add a new option to NN upgrade to terminate the process after upgrade on NN 
 is completed
 

 Key: HDFS-6597
 URL: https://issues.apache.org/jira/browse/HDFS-6597
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Reporter: Danilo Vunjak
 Attachments: JIRA-HDFS-30.patch


 Currently, when the namenode is started for upgrade (the "hadoop namenode -upgrade" 
 command), after finishing the upgrade of its metadata the namenode starts working 
 normally and waits for the datanodes to upgrade themselves and connect to the NN. We 
 need an option for upgrading only the NN metadata, so that after the upgrade is 
 finished on the NN, the process terminates.
 I have tested it by changing file hdfs.server.namenode.NameNode.java, 
 method public static NameNode createNameNode(String argv[], Configuration conf), 
 adding in the switch:
 case UPGRADE:
   {
     DefaultMetricsSystem.initialize("NameNode");
     NameNode nameNode = new NameNode(conf);
     if (startOpt.getForceUpgrade()) {
       terminate(0);
       return null;
     }
     return nameNode;
   }
 This upgraded the metadata and closed the process after it finished; later, when 
 all services were started, the upgrade of the datanodes finished successfully and 
 the system ran.
 What I'm suggesting right now is to add a new startup parameter -force, so the 
 namenode can be started like this: "hadoop namenode -upgrade -force", so we 
 can indicate that we want to terminate the process after the upgrade of metadata 
 on the NN is finished. The old functionality should be preserved, so users can run 
 "hadoop namenode -upgrade" in the same way and with the same behaviour as before.
  Thanks,
  Danilo



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6597) Add a new option to NN upgrade to terminate the process after upgrade on NN is completed

2014-06-24 Thread Mit Desai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14042525#comment-14042525
 ] 

Mit Desai commented on HDFS-6597:
-

Changing the summary to describe the jira more accurately.

 Add a new option to NN upgrade to terminate the process after upgrade on NN 
 is completed
 

 Key: HDFS-6597
 URL: https://issues.apache.org/jira/browse/HDFS-6597
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Reporter: Danilo Vunjak
 Attachments: JIRA-HDFS-30.patch


 Currently, when the namenode is started for upgrade (the "hadoop namenode -upgrade" 
 command), after finishing the upgrade of its metadata the namenode starts working 
 normally and waits for the datanodes to upgrade themselves and connect to the NN. We 
 need an option for upgrading only the NN metadata, so that after the upgrade is 
 finished on the NN, the process terminates.
 I have tested it by changing file hdfs.server.namenode.NameNode.java, 
 method public static NameNode createNameNode(String argv[], Configuration conf), 
 adding in the switch:
 case UPGRADE:
   {
     DefaultMetricsSystem.initialize("NameNode");
     NameNode nameNode = new NameNode(conf);
     if (startOpt.getForceUpgrade()) {
       terminate(0);
       return null;
     }
     return nameNode;
   }
 This upgraded the metadata and closed the process after it finished; later, when 
 all services were started, the upgrade of the datanodes finished successfully and 
 the system ran.
 What I'm suggesting right now is to add a new startup parameter -force, so the 
 namenode can be started like this: "hadoop namenode -upgrade -force", so we 
 can indicate that we want to terminate the process after the upgrade of metadata 
 on the NN is finished. The old functionality should be preserved, so users can run 
 "hadoop namenode -upgrade" in the same way and with the same behaviour as before.
  Thanks,
  Danilo



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-742) A down DataNode makes Balancer to hang on repeatingly asking NameNode its partial block list

2014-06-10 Thread Mit Desai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mit Desai updated HDFS-742:
---

Attachment: HDFS-742.patch

 A down DataNode makes Balancer to hang on repeatingly asking NameNode its 
 partial block list
 

 Key: HDFS-742
 URL: https://issues.apache.org/jira/browse/HDFS-742
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: balancer
Reporter: Hairong Kuang
Assignee: Mit Desai
 Attachments: HDFS-742.patch


 We had a balancer that had not made any progress for a long time. It turned 
 out it was repeatedly asking the NameNode for a partial block list of one 
 datanode, which was down while the balancer was running.
 NameNode should notify Balancer that the datanode is not available and 
 Balancer should stop asking for the datanode's block list.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-742) A down DataNode makes Balancer to hang on repeatingly asking NameNode its partial block list

2014-06-10 Thread Mit Desai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14026612#comment-14026612
 ] 

Mit Desai commented on HDFS-742:


Attaching the patch. Unfortunately I do not have a way to reproduce the issue, 
so I'm unable to add a test to verify the change.
Here is an explanation of the part of the Balancer code that makes it hang forever.

In the following while loop in Balancer.java, when the Balancer figures out 
that it should fetch more blocks, it gets the block list and decrements 
blocksToReceive by that many blocks. It then starts again from the top of the loop.

{code}
 while(!isTimeUp && getScheduledSize() > 0
     && (!srcBlockList.isEmpty() || blocksToReceive > 0)) {

## SOME LINES OMITTED ##

    filterMovedBlocks(); // filter already moved blocks
    if (shouldFetchMoreBlocks()) {
      // fetch new blocks
      try {
        blocksToReceive -= getBlockList();
        continue;
      } catch (IOException e) {

## SOME LINES OMITTED ##

    // check if time is up or not
    if (Time.now() - startTime > MAX_ITERATION_TIME) {
      isTimeUp = true;
      continue;
    }
## SOME LINES OMITTED ##

 }
{code}

The problem here is that if the datanode is decommissioned, the {{getBlockList()}} 
method does not return anything and {{blocksToReceive}} is not changed. The loop 
keeps doing this indefinitely, since {{blocksToReceive}} always stays greater than 
0, and {{isTimeUp}} is never set to true because that part of the code is never 
reached. In the submitted patch, the time-up check is moved to the top of the loop, 
so the loop checks whether {{isTimeUp}} is set and proceeds only if the time limit 
has not been hit.
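
For illustration, here is a sketch of that reordering against the loop skeleton 
above; it is an assumed shape, not the exact patch:
{code}
// Sketch only (not the exact patch): time-up is checked before fetching blocks.
while (!isTimeUp && getScheduledSize() > 0
    && (!srcBlockList.isEmpty() || blocksToReceive > 0)) {

  // moved up: stop as soon as the iteration time limit is exceeded
  if (Time.now() - startTime > MAX_ITERATION_TIME) {
    isTimeUp = true;
    continue;
  }

  filterMovedBlocks();
  if (shouldFetchMoreBlocks()) {
    try {
      blocksToReceive -= getBlockList();  // may subtract 0 for a down datanode
      continue;
    } catch (IOException e) {
      // ...
    }
  }
  // ...
}
{code}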

 A down DataNode makes Balancer to hang on repeatingly asking NameNode its 
 partial block list
 

 Key: HDFS-742
 URL: https://issues.apache.org/jira/browse/HDFS-742
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: balancer
Reporter: Hairong Kuang
Assignee: Mit Desai
 Attachments: HDFS-742.patch


 We had a balancer that had not made any progress for a long time. It turned 
 out it was repeatedly asking the NameNode for a partial block list of one 
 datanode, which was down while the balancer was running.
 NameNode should notify Balancer that the datanode is not available and 
 Balancer should stop asking for the datanode's block list.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6487) TestStandbyCheckpoint#testSBNCheckpoints is racy

2014-06-06 Thread Mit Desai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mit Desai updated HDFS-6487:


Status: Open  (was: Patch Available)

 TestStandbyCheckpoint#testSBNCheckpoints is racy
 

 Key: HDFS-6487
 URL: https://issues.apache.org/jira/browse/HDFS-6487
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 3.0.0, 2.4.1
Reporter: Mit Desai
Assignee: Mit Desai
 Attachments: HDFS-6487.patch


 testSBNCheckpoints fails occasionally.
 I could not reproduce it consistently but it would fail 8 out of 10 times 
 after I did mvn clean, mvn install and run the test



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6487) TestStandbyCheckpoint#testSBNCheckpoints is racy

2014-06-06 Thread Mit Desai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mit Desai updated HDFS-6487:


Attachment: HDFS-6487.patch

Attaching the updated patch.

 TestStandbyCheckpoint#testSBNCheckpoints is racy
 

 Key: HDFS-6487
 URL: https://issues.apache.org/jira/browse/HDFS-6487
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 3.0.0, 2.4.1
Reporter: Mit Desai
Assignee: Mit Desai
 Attachments: HDFS-6487.patch, HDFS-6487.patch


 testSBNCheckpoints fails occasionally.
 I could not reproduce it consistently but it would fail 8 out of 10 times 
 after I did mvn clean, mvn install and run the test



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6487) TestStandbyCheckpoint#testSBNCheckpoints is racy

2014-06-06 Thread Mit Desai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mit Desai updated HDFS-6487:


Status: Patch Available  (was: Open)

 TestStandbyCheckpoint#testSBNCheckpoints is racy
 

 Key: HDFS-6487
 URL: https://issues.apache.org/jira/browse/HDFS-6487
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 3.0.0, 2.4.1
Reporter: Mit Desai
Assignee: Mit Desai
 Attachments: HDFS-6487.patch, HDFS-6487.patch


 testSBNCheckpoints fails occasionally.
 I could not reproduce it consistently but it would fail 8 out of 10 times 
 after I did mvn clean, mvn install and run the test



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6487) TestStandbyCheckpoint#testSBNCheckpoints is racy

2014-06-06 Thread Mit Desai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14020213#comment-14020213
 ] 

Mit Desai commented on HDFS-6487:
-

The failure is not related to the submitted patch. It has been there for a long 
time; HDFS-5807 and HDFS-6159 were filed to resolve it. I will comment on those 
JIRAs.

 TestStandbyCheckpoint#testSBNCheckpoints is racy
 

 Key: HDFS-6487
 URL: https://issues.apache.org/jira/browse/HDFS-6487
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 3.0.0, 2.4.1
Reporter: Mit Desai
Assignee: Mit Desai
 Attachments: HDFS-6487.patch, HDFS-6487.patch


 testSBNCheckpoints fails occasionally.
 I could not reproduce it consistently but it would fail 8 out of 10 times 
 after I did mvn clean, mvn install and run the test



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6159) TestBalancerWithNodeGroup.testBalancerWithNodeGroup fails if there is block missing after balancer success

2014-06-06 Thread Mit Desai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14020222#comment-14020222
 ] 

Mit Desai commented on HDFS-6159:
-

This test is failing again.
https://builds.apache.org/job/PreCommit-HDFS-Build/7045//testReport/
[~airbots], can you take a look at this pre-commit?

 TestBalancerWithNodeGroup.testBalancerWithNodeGroup fails if there is block 
 missing after balancer success
 --

 Key: HDFS-6159
 URL: https://issues.apache.org/jira/browse/HDFS-6159
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: test
Affects Versions: 2.3.0
Reporter: Chen He
Assignee: Chen He
 Fix For: 3.0.0, 2.5.0

 Attachments: HDFS-6159-v2.patch, HDFS-6159-v2.patch, HDFS-6159.patch, 
 logs.txt


 The TestBalancerWithNodeGroup.testBalancerWithNodeGroup test will report a false 
 failure if there is (are) data block(s) missing after the balancer successfully 
 finishes.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6487) TestStandbyCheckpoint#testSBNCheckpoints is racy

2014-06-06 Thread Mit Desai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14020413#comment-14020413
 ] 

Mit Desai commented on HDFS-6487:
-

Thanks Andrew!

 TestStandbyCheckpoint#testSBNCheckpoints is racy
 

 Key: HDFS-6487
 URL: https://issues.apache.org/jira/browse/HDFS-6487
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 3.0.0, 2.4.1
Reporter: Mit Desai
Assignee: Mit Desai
 Attachments: HDFS-6487.patch, HDFS-6487.patch


 testSBNCheckpoints fails occasionally.
 I could not reproduce it consistently but it would fail 8 out of 10 times 
 after I did mvn clean, mvn install and run the test



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6487) TestStandbyCheckpoint#testSBNCheckpoints is racy

2014-06-05 Thread Mit Desai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mit Desai updated HDFS-6487:


Attachment: HDFS-6487.patch

Attaching patch for trunk and branch-2

 TestStandbyCheckpoint#testSBNCheckpoints is racy
 

 Key: HDFS-6487
 URL: https://issues.apache.org/jira/browse/HDFS-6487
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 3.0.0, 2.5.0
Reporter: Mit Desai
Assignee: Mit Desai
 Attachments: HDFS-6487.patch


 testSBNCheckpoints fails occasionally.
 I could not reproduce it consistently but it would fail 8 out of 10 times 
 after I did mvn clean, mvn install and run the test



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6487) TestStandbyCheckpoint#testSBNCheckpoints is racy

2014-06-05 Thread Mit Desai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14018953#comment-14018953
 ] 

Mit Desai commented on HDFS-6487:
-

In testSBNCheckpoints, after doEdits() the test waits for the SBN to do a 
checkpoint and immediately after that checks whether the OIV image has been 
written. The race lies between the completion of the checkpoint and the check for 
the OIV image.
I have added a wait for the OIV image to be written. This prevents the test from 
failing due to the race, and if the OIV image is still not written after 5000 ms, 
the test fails, which is what is expected.

 TestStandbyCheckpoint#testSBNCheckpoints is racy
 

 Key: HDFS-6487
 URL: https://issues.apache.org/jira/browse/HDFS-6487
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 3.0.0, 2.4.1
Reporter: Mit Desai
Assignee: Mit Desai
 Attachments: HDFS-6487.patch


 testSBNCheckpoints fails occasionally.
 I could not reproduce it consistently but it would fail 8 out of 10 times 
 after I did mvn clean, mvn install and run the test



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6487) TestStandbyCheckpoint#testSBNCheckpoints is racy

2014-06-05 Thread Mit Desai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mit Desai updated HDFS-6487:


Affects Version/s: (was: 2.5.0)
   2.4.1
   Status: Patch Available  (was: Open)

 TestStandbyCheckpoint#testSBNCheckpoints is racy
 

 Key: HDFS-6487
 URL: https://issues.apache.org/jira/browse/HDFS-6487
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 3.0.0, 2.4.1
Reporter: Mit Desai
Assignee: Mit Desai
 Attachments: HDFS-6487.patch


 testSBNCheckpoints fails occasionally.
 I could not reproduce it consistently but it would fail 8 out of 10 times 
 after I did mvn clean, mvn install and run the test



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6487) TestStandbyCheckpoint#testSBNCheckpoints is racy

2014-06-05 Thread Mit Desai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14019320#comment-14019320
 ] 

Mit Desai commented on HDFS-6487:
-

Thanks Andrew for looking into the patch. Using GenericTestUtils.waitFor looks 
like a better option. I will update my patch.
For the timeout, 5s works for me now, but I will increase it to 60s (it does 
not hurt to wait a little longer; the test will come out of the wait before 
that time anyway).
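For reference, a rough sketch of the waitFor-based version (assuming 
GenericTestUtils.waitFor(Supplier<Boolean>, checkEveryMillis, waitForMillis) 
from the hadoop-common test utilities; isOIVImageWritten() is again a 
hypothetical stand-in for the real check):
{code}
// Sketch only: poll until the OIV image shows up, failing the test after 60s.
GenericTestUtils.waitFor(new Supplier<Boolean>() {
  @Override
  public Boolean get() {
    return isOIVImageWritten();
  }
}, 100, 60000);   // re-check every 100ms, time out (and fail) after 60000ms
{code}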

 TestStandbyCheckpoint#testSBNCheckpoints is racy
 

 Key: HDFS-6487
 URL: https://issues.apache.org/jira/browse/HDFS-6487
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 3.0.0, 2.4.1
Reporter: Mit Desai
Assignee: Mit Desai
 Attachments: HDFS-6487.patch


 testSBNCheckpoints fails occasionally.
 I could not reproduce it consistently, but it would fail 8 out of 10 times 
 after I did mvn clean, mvn install, and ran the test.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HDFS-6487) TestStandbyCheckpoint#testSBNCheckpoints is racy

2014-06-04 Thread Mit Desai (JIRA)
Mit Desai created HDFS-6487:
---

 Summary: TestStandbyCheckpoint#testSBNCheckpoints is racy
 Key: HDFS-6487
 URL: https://issues.apache.org/jira/browse/HDFS-6487
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 3.0.0, 2.5.0
Reporter: Mit Desai
Assignee: Mit Desai


testSBNCheckpoints fails occasionally.
I could not reproduce it consistently, but it would fail 8 out of 10 times 
after I did mvn clean, mvn install, and ran the test.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6421) RHEL4 fails to compile vecsum.c

2014-05-19 Thread Mit Desai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mit Desai updated HDFS-6421:


Attachment: HDFS-6421.patch

Thanks [~cmccabe] for taking a reviewing the patch.
Attaching the new patch addressing your comments.

 RHEL4 fails to compile vecsum.c
 ---

 Key: HDFS-6421
 URL: https://issues.apache.org/jira/browse/HDFS-6421
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: libhdfs
Affects Versions: 2.5.0
 Environment: RHEL4
Reporter: Jason Lowe
Assignee: Mit Desai
 Attachments: HDFS-6421.patch, HDFS-6421.patch


 After HDFS-6287 RHEL4 builds fail trying to compile vecsum.c since they don't 
 have RUSAGE_THREAD.  RHEL4 is ancient, but we use it in a 32-bit 
 compatibility environment.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6421) RHEL4 fails to compile vecsum.c

2014-05-19 Thread Mit Desai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14001718#comment-14001718
 ] 

Mit Desai commented on HDFS-6421:
-

Correction: Thanks [~cmccabe] for reviewing the patch. :-) 

 RHEL4 fails to compile vecsum.c
 ---

 Key: HDFS-6421
 URL: https://issues.apache.org/jira/browse/HDFS-6421
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: libhdfs
Affects Versions: 2.5.0
 Environment: RHEL4
Reporter: Jason Lowe
Assignee: Mit Desai
 Attachments: HDFS-6421.patch, HDFS-6421.patch


 After HDFS-6287 RHEL4 builds fail trying to compile vecsum.c since they don't 
 have RUSAGE_THREAD.  RHEL4 is ancient, but we use it in a 32-bit 
 compatibility environment.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6421) RHEL4 fails to compile vecsum.c

2014-05-19 Thread Mit Desai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mit Desai updated HDFS-6421:


Status: Open  (was: Patch Available)

 RHEL4 fails to compile vecsum.c
 ---

 Key: HDFS-6421
 URL: https://issues.apache.org/jira/browse/HDFS-6421
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: libhdfs
Affects Versions: 2.5.0
 Environment: RHEL4
Reporter: Jason Lowe
Assignee: Mit Desai
 Attachments: HDFS-6421.patch, HDFS-6421.patch


 After HDFS-6287 RHEL4 builds fail trying to compile vecsum.c since they don't 
 have RUSAGE_THREAD.  RHEL4 is ancient, but we use it in a 32-bit 
 compatibility environment.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6421) RHEL4 fails to compile vecsum.c

2014-05-19 Thread Mit Desai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mit Desai updated HDFS-6421:


Status: Patch Available  (was: Open)

 RHEL4 fails to compile vecsum.c
 ---

 Key: HDFS-6421
 URL: https://issues.apache.org/jira/browse/HDFS-6421
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: libhdfs
Affects Versions: 2.5.0
 Environment: RHEL4
Reporter: Jason Lowe
Assignee: Mit Desai
 Attachments: HDFS-6421.patch, HDFS-6421.patch


 After HDFS-6287 RHEL4 builds fail trying to compile vecsum.c since they don't 
 have RUSAGE_THREAD.  RHEL4 is ancient, but we use it in a 32-bit 
 compatibility environment.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Assigned] (HDFS-742) A down DataNode makes Balancer to hang on repeatingly asking NameNode its partial block list

2014-05-16 Thread Mit Desai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mit Desai reassigned HDFS-742:
--

Assignee: Mit Desai  (was: Hairong Kuang)

 A down DataNode makes Balancer to hang on repeatingly asking NameNode its 
 partial block list
 

 Key: HDFS-742
 URL: https://issues.apache.org/jira/browse/HDFS-742
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: balancer
Reporter: Hairong Kuang
Assignee: Mit Desai

 We had a balancer that had not made any progress for a long time. It turned 
 out it was repeatingly asking Namenode for a partial block list of one 
 datanode, which was done while the balancer was running.
 NameNode should notify Balancer that the datanode is not available and 
 Balancer should stop asking for the datanode's block list.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6230) Expose upgrade status through NameNode web UI

2014-05-16 Thread Mit Desai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mit Desai updated HDFS-6230:


Status: Open  (was: Patch Available)

 Expose upgrade status through NameNode web UI
 -

 Key: HDFS-6230
 URL: https://issues.apache.org/jira/browse/HDFS-6230
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.4.0
Reporter: Arpit Agarwal
Assignee: Mit Desai
 Attachments: HDFS-6230-NoUpgradesInProgress.png, 
 HDFS-6230-UpgradeInProgress.jpg, HDFS-6230.patch, HDFS-6230.patch


 The NameNode web UI does not show upgrade information anymore. Hadoop 2.0 
 also does not have the _hadoop dfsadmin -upgradeProgress_ command to check 
 the upgrade status.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Assigned] (HDFS-6421) RHEL4 fails to compile vecsum.c

2014-05-16 Thread Mit Desai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mit Desai reassigned HDFS-6421:
---

Assignee: Mit Desai

 RHEL4 fails to compile vecsum.c
 ---

 Key: HDFS-6421
 URL: https://issues.apache.org/jira/browse/HDFS-6421
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: libhdfs
Affects Versions: 2.5.0
 Environment: RHEL4
Reporter: Jason Lowe
Assignee: Mit Desai

 After HDFS-6287 RHEL4 builds fail trying to compile vecsum.c since they don't 
 have RUSAGE_THREAD.  RHEL4 is ancient, but we use it in a 32-bit 
 compatibility environment.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6421) RHEL4 fails to compile vecsum.c

2014-05-16 Thread Mit Desai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mit Desai updated HDFS-6421:


Attachment: HDFS-6421.patch

This code in the stopwatch structure gets the rusage and stores it into 
{{struct rusage rusage;}}, but the value is never used.
{code}
if (getrusage(RUSAGE_THREAD, &watch->rusage) < 0) {
    int err = errno;
    fprintf(stderr, "getrusage failed: error %d (%s)\n",
            err, strerror(err));
    goto error;
}
{code}

Removing the block to get RHEL4 compiling again.

 RHEL4 fails to compile vecsum.c
 ---

 Key: HDFS-6421
 URL: https://issues.apache.org/jira/browse/HDFS-6421
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: libhdfs
Affects Versions: 2.5.0
 Environment: RHEL4
Reporter: Jason Lowe
Assignee: Mit Desai
 Attachments: HDFS-6421.patch


 After HDFS-6287 RHEL4 builds fail trying to compile vecsum.c since they don't 
 have RUSAGE_THREAD.  RHEL4 is ancient, but we use it in a 32-bit 
 compatibility environment.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6230) Expose upgrade status through NameNode web UI

2014-05-16 Thread Mit Desai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mit Desai updated HDFS-6230:


Status: Patch Available  (was: Open)

 Expose upgrade status through NameNode web UI
 -

 Key: HDFS-6230
 URL: https://issues.apache.org/jira/browse/HDFS-6230
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.4.0
Reporter: Arpit Agarwal
Assignee: Mit Desai
 Attachments: HDFS-6230-NoUpgradesInProgress.png, 
 HDFS-6230-UpgradeInProgress.jpg, HDFS-6230.patch, HDFS-6230.patch


 The NameNode web UI does not show upgrade information anymore. Hadoop 2.0 
 also does not have the _hadoop dfsadmin -upgradeProgress_ command to check 
 the upgrade status.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6421) RHEL4 fails to compile vecsum.c

2014-05-16 Thread Mit Desai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mit Desai updated HDFS-6421:


Status: Patch Available  (was: Open)

 RHEL4 fails to compile vecsum.c
 ---

 Key: HDFS-6421
 URL: https://issues.apache.org/jira/browse/HDFS-6421
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: libhdfs
Affects Versions: 2.5.0
 Environment: RHEL4
Reporter: Jason Lowe
Assignee: Mit Desai
 Attachments: HDFS-6421.patch


 After HDFS-6287 RHEL4 builds fail trying to compile vecsum.c since they don't 
 have RUSAGE_THREAD.  RHEL4 is ancient, but we use it in a 32-bit 
 compatibility environment.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-742) A down DataNode makes Balancer to hang on repeatingly asking NameNode its partial block list

2014-05-15 Thread Mit Desai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13993598#comment-13993598
 ] 

Mit Desai commented on HDFS-742:


Taking this over. Feel free to reassign if you are still working on it.

 A down DataNode makes Balancer to hang on repeatingly asking NameNode its 
 partial block list
 

 Key: HDFS-742
 URL: https://issues.apache.org/jira/browse/HDFS-742
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: balancer
Reporter: Hairong Kuang
Assignee: Hairong Kuang

 We had a balancer that had not made any progress for a long time. It turned 
 out it was repeatingly asking Namenode for a partial block list of one 
 datanode, which was done while the balancer was running.
 NameNode should notify Balancer that the datanode is not available and 
 Balancer should stop asking for the datanode's block list.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6230) Expose upgrade status through NameNode web UI

2014-05-13 Thread Mit Desai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mit Desai updated HDFS-6230:


Attachment: HDFS-6230.patch

Thanks for looking at the patch [~wheat9]. Posting the updated patch.

 Expose upgrade status through NameNode web UI
 -

 Key: HDFS-6230
 URL: https://issues.apache.org/jira/browse/HDFS-6230
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.4.0
Reporter: Arpit Agarwal
Assignee: Mit Desai
 Attachments: HDFS-6230-NoUpgradesInProgress.png, 
 HDFS-6230-UpgradeInProgress.jpg, HDFS-6230.patch, HDFS-6230.patch


 The NameNode web UI does not show upgrade information anymore. Hadoop 2.0 
 also does not have the _hadoop dfsadmin -upgradeProgress_ command to check 
 the upgrade status.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-742) A down DataNode makes Balancer to hang on repeatingly asking NameNode its partial block list

2014-05-11 Thread Mit Desai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13992781#comment-13992781
 ] 

Mit Desai commented on HDFS-742:


Hey [~hairong], are you still working on this JIRA? If not, I can take it over 
and work on it.

 A down DataNode makes Balancer to hang on repeatingly asking NameNode its 
 partial block list
 

 Key: HDFS-742
 URL: https://issues.apache.org/jira/browse/HDFS-742
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: balancer
Reporter: Hairong Kuang
Assignee: Hairong Kuang

 We had a balancer that had not made any progress for a long time. It turned 
 out it was repeatingly asking Namenode for a partial block list of one 
 datanode, which was done while the balancer was running.
 NameNode should notify Balancer that the datanode is not available and 
 Balancer should stop asking for the datanode's block list.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6230) Expose upgrade status through NameNode web UI

2014-05-05 Thread Mit Desai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mit Desai updated HDFS-6230:


Attachment: HDFS-6230-UpgradeInProgress.jpg
HDFS-6230-NoUpgradesInProgress.png

 Expose upgrade status through NameNode web UI
 -

 Key: HDFS-6230
 URL: https://issues.apache.org/jira/browse/HDFS-6230
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.4.0
Reporter: Arpit Agarwal
Assignee: Mit Desai
 Attachments: HDFS-6230-NoUpgradesInProgress.png, 
 HDFS-6230-UpgradeInProgress.jpg


 The NameNode web UI does not show upgrade information anymore. Hadoop 2.0 
 also does not have the _hadoop dfsadmin -upgradeProgress_ command to check 
 the upgrade status.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6230) Expose upgrade status through NameNode web UI

2014-05-05 Thread Mit Desai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mit Desai updated HDFS-6230:


Status: Patch Available  (was: Open)

 Expose upgrade status through NameNode web UI
 -

 Key: HDFS-6230
 URL: https://issues.apache.org/jira/browse/HDFS-6230
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.4.0
Reporter: Arpit Agarwal
Assignee: Mit Desai
 Attachments: HDFS-6230-NoUpgradesInProgress.png, 
 HDFS-6230-UpgradeInProgress.jpg, HDFS-6230.patch


 The NameNode web UI does not show upgrade information anymore. Hadoop 2.0 
 also does not have the _hadoop dfsadmin -upgradeProgress_ command to check 
 the upgrade status.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6230) Expose upgrade status through NameNode web UI

2014-04-30 Thread Mit Desai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13985963#comment-13985963
 ] 

Mit Desai commented on HDFS-6230:
-

[~arpitagarwal] are you working on the jira?

 Expose upgrade status through NameNode web UI
 -

 Key: HDFS-6230
 URL: https://issues.apache.org/jira/browse/HDFS-6230
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.4.0
Reporter: Arpit Agarwal
Assignee: Arpit Agarwal

 The NameNode web UI does not show upgrade information anymore. Hadoop 2.0 
 also does not have the _hadoop dfsadmin -upgradeProgress_ command to check 
 the upgrade status.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Assigned] (HDFS-6230) Expose upgrade status through NameNode web UI

2014-04-30 Thread Mit Desai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mit Desai reassigned HDFS-6230:
---

Assignee: Mit Desai  (was: Arpit Agarwal)

 Expose upgrade status through NameNode web UI
 -

 Key: HDFS-6230
 URL: https://issues.apache.org/jira/browse/HDFS-6230
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.4.0
Reporter: Arpit Agarwal
Assignee: Mit Desai

 The NameNode web UI does not show upgrade information anymore. Hadoop 2.0 
 also does not have the _hadoop dfsadmin -upgradeProgress_ command to check 
 the upgrade status.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6230) Expose upgrade status through NameNode web UI

2014-04-30 Thread Mit Desai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13985985#comment-13985985
 ] 

Mit Desai commented on HDFS-6230:
-

Thanks! Taking it over

 Expose upgrade status through NameNode web UI
 -

 Key: HDFS-6230
 URL: https://issues.apache.org/jira/browse/HDFS-6230
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.4.0
Reporter: Arpit Agarwal
Assignee: Mit Desai

 The NameNode web UI does not show upgrade information anymore. Hadoop 2.0 
 also does not have the _hadoop dfsadmin -upgradeProgress_ command to check 
 the upgrade status.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-5892) TestDeleteBlockPool fails in branch-2

2014-04-29 Thread Mit Desai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13984337#comment-13984337
 ] 

Mit Desai commented on HDFS-5892:
-

[~wheat9], taking a closer look at the commits, I found that this is not yet 
fixed in 2.4. Do we want to commit this to 2.4.1 and change the fix version to 
2.4.1, or edit the fix version to 2.5.0?

 TestDeleteBlockPool fails in branch-2
 -

 Key: HDFS-5892
 URL: https://issues.apache.org/jira/browse/HDFS-5892
 Project: Hadoop HDFS
  Issue Type: Test
Reporter: Ted Yu
Priority: Minor
 Fix For: 2.4.0

 Attachments: HDFS-5892.patch, 
 org.apache.hadoop.hdfs.server.datanode.TestDeleteBlockPool-output.txt


 Running test suite on Linux, I got:
 {code}
 testDeleteBlockPool(org.apache.hadoop.hdfs.server.datanode.TestDeleteBlockPool)
   Time elapsed: 8.143 sec   ERROR!
 java.io.IOException: All datanodes 127.0.0.1:43721 are bad. Aborting...
 at 
 org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1023)
 at 
 org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:838)
 at 
 org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:483)
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (HDFS-3122) Block recovery with closeFile flag true can race with blockReport. Due to this blocks are getting marked as corrupt.

2014-04-25 Thread Mit Desai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mit Desai resolved HDFS-3122.
-

  Resolution: Not a Problem
Target Version/s: 0.23.3, 0.24.0  (was: 0.24.0, 0.23.3)

Haven't heard anything yet, so resolving this issue. Feel free to reopen if 
anyone thinks otherwise.

 Block recovery with closeFile flag true can race with blockReport. Due to 
 this blocks are getting marked as corrupt.
 

 Key: HDFS-3122
 URL: https://issues.apache.org/jira/browse/HDFS-3122
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode, namenode
Affects Versions: 0.23.0, 0.24.0
Reporter: Uma Maheswara Rao G
Assignee: Uma Maheswara Rao G
Priority: Critical
 Attachments: blockCorrupt.txt


 *Block Report* can *race* with *Block Recovery* with closeFile flag true.
  Block report generated just before block recovery at DN side and due to N/W 
 problems, block report got delayed to NN. 
 After this, recovery success and generation stamp modifies to new one. 
 And primary DN invokes the commitBlockSynchronization and block got updated 
 in NN side. Also block got marked as complete, since the closeFile flag was 
 true. Updated with new genstamp.
 Now blockReport started processing at NN side. This particular block from RBW 
 (when it generated the BR at DN), and file was completed at NN side.
 Finally block will be marked as corrupt because of genstamp mismatch.
 {code}
 case RWR:
   if (!storedBlock.isComplete()) {
     return null; // not corrupt
   } else if (storedBlock.getGenerationStamp() != iblk.getGenerationStamp()) {
     return new BlockToMarkCorrupt(storedBlock,
         "reported " + reportedState + " replica with genstamp " +
         iblk.getGenerationStamp() + " does not match COMPLETE block's " +
         "genstamp in block map " + storedBlock.getGenerationStamp());
   } else { // COMPLETE block, same genstamp
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-3122) Block recovery with closeFile flag true can race with blockReport. Due to this blocks are getting marked as corrupt.

2014-04-24 Thread Mit Desai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13979848#comment-13979848
 ] 

Mit Desai commented on HDFS-3122:
-

Hi [~umamaheswararao],
Is this still an issue? I looked at the code and I think this got fixed at some 
point.
Here is the code snippet from BlockManager:
{code}
case RWR:
  if (!storedBlock.isComplete()) {
    return null; // not corrupt
  } else if (storedBlock.getGenerationStamp() != reported.getGenerationStamp()) {
    final long reportedGS = reported.getGenerationStamp();
    return new BlockToMarkCorrupt(storedBlock, reportedGS,
        "reported " + reportedState + " replica with genstamp " + reportedGS
        + " does not match COMPLETE block's genstamp in block map "
        + storedBlock.getGenerationStamp(), Reason.GENSTAMP_MISMATCH);
  } else { // COMPLETE block, same genstamp
    if (reportedState == ReplicaState.RBW) {
      // If it's a RBW report for a COMPLETE block, it may just be that
      // the block report got a little bit delayed after the pipeline
      // closed. So, ignore this report, assuming we will get a
      // FINALIZED replica later. See HDFS-2791
      LOG.info("Received an RBW replica for " + storedBlock +
          " on " + dn + ": ignoring it, since it is " +
          "complete with the same genstamp");
      return null;
    } else {
      return new BlockToMarkCorrupt(storedBlock,
          "reported replica has invalid state " + reportedState,
          Reason.INVALID_STATE);
    }
  }
{code}

I will resolve this Jira as Not a Problem tomorrow unless someone wants to go 
some other way.

 Block recovery with closeFile flag true can race with blockReport. Due to 
 this blocks are getting marked as corrupt.
 

 Key: HDFS-3122
 URL: https://issues.apache.org/jira/browse/HDFS-3122
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode, namenode
Affects Versions: 0.23.0, 0.24.0
Reporter: Uma Maheswara Rao G
Assignee: Uma Maheswara Rao G
Priority: Critical
 Attachments: blockCorrupt.txt


 *Block Report* can *race* with *Block Recovery* with closeFile flag true.
  Block report generated just before block recovery at DN side and due to N/W 
 problems, block report got delayed to NN. 
 After this, recovery success and generation stamp modifies to new one. 
 And primary DN invokes the commitBlockSynchronization and block got updated 
 in NN side. Also block got marked as complete, since the closeFile flag was 
 true. Updated with new genstamp.
 Now blockReport started processing at NN side. This particular block from RBW 
 (when it generated the BR at DN), and file was completed at NN side.
 Finally block will be marked as corrupt because of genstamp mismatch.
 {code}
 case RWR:
   if (!storedBlock.isComplete()) {
     return null; // not corrupt
   } else if (storedBlock.getGenerationStamp() != iblk.getGenerationStamp()) {
     return new BlockToMarkCorrupt(storedBlock,
         "reported " + reportedState + " replica with genstamp " +
         iblk.getGenerationStamp() + " does not match COMPLETE block's " +
         "genstamp in block map " + storedBlock.getGenerationStamp());
   } else { // COMPLETE block, same genstamp
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (HDFS-2734) Even if we configure the property fs.checkpoint.size in both core-site.xml and hdfs-site.xml the values are not been considered

2014-04-22 Thread Mit Desai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mit Desai resolved HDFS-2734.
-

  Resolution: Not a Problem
Target Version/s: 0.23.0, 0.20.1  (was: 0.20.1, 0.23.0)

I think this issue is not a problem, so I am resolving it as Not a Problem. 
Feel free to reopen this jira if you still feel there is a problem.

 Even if we configure the property fs.checkpoint.size in both core-site.xml 
 and hdfs-site.xml  the values are not been considered
 

 Key: HDFS-2734
 URL: https://issues.apache.org/jira/browse/HDFS-2734
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 0.20.1, 0.23.0
Reporter: J.Andreina
Priority: Minor

 Even if we configure the property fs.checkpoint.size in both core-site.xml 
 and hdfs-site.xml  the values are not been considered



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-5892) TestDeleteBlockPool fails in branch-2

2014-04-16 Thread Mit Desai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13971592#comment-13971592
 ] 

Mit Desai commented on HDFS-5892:
-

[~yuzhih...@gmail.com] [~dandan]: Are you still seeing this issue? This test 
still fails randomly in our nightly builds.

 TestDeleteBlockPool fails in branch-2
 -

 Key: HDFS-5892
 URL: https://issues.apache.org/jira/browse/HDFS-5892
 Project: Hadoop HDFS
  Issue Type: Test
Reporter: Ted Yu
Priority: Minor
 Fix For: 2.4.0

 Attachments: HDFS-5892.patch, 
 org.apache.hadoop.hdfs.server.datanode.TestDeleteBlockPool-output.txt


 Running test suite on Linux, I got:
 {code}
 testDeleteBlockPool(org.apache.hadoop.hdfs.server.datanode.TestDeleteBlockPool)
   Time elapsed: 8.143 sec   ERROR!
 java.io.IOException: All datanodes 127.0.0.1:43721 are bad. Aborting...
 at 
 org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1023)
 at 
 org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:838)
 at 
 org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:483)
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-4587) Webhdfs secure clients are incompatible with non-secure NN

2014-04-14 Thread Mit Desai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mit Desai updated HDFS-4587:


Target Version/s: 3.0.0  (was: 3.0.0, 0.23.11)

 Webhdfs secure clients are incompatible with non-secure NN
 --

 Key: HDFS-4587
 URL: https://issues.apache.org/jira/browse/HDFS-4587
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode, webhdfs
Affects Versions: 3.0.0, 0.23.7, 2.0.4-alpha
Reporter: Daryn Sharp

 A secure webhdfs client will receive an exception from a non-secure NN.  For 
 a NN in non-secure mode, {{FSNamesystem#getDelegationToken}} will return 
 null to indicate no token is required.  Hdfs will send back the null to the 
 client, but webhdfs uses {{DelegationTokenSecretManager.createCredentials}} 
 which instead throws an exception.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-4587) Webhdfs secure clients are incompatible with non-secure NN

2014-04-14 Thread Mit Desai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13968382#comment-13968382
 ] 

Mit Desai commented on HDFS-4587:
-

As 0.23 is going into maintenance and this bug will not be fixed there, I am 
removing the 0.23.11 target version.

 Webhdfs secure clients are incompatible with non-secure NN
 --

 Key: HDFS-4587
 URL: https://issues.apache.org/jira/browse/HDFS-4587
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode, webhdfs
Affects Versions: 3.0.0, 0.23.7, 2.0.4-alpha
Reporter: Daryn Sharp

 A secure webhdfs client will receive an exception from a non-secure NN.  For 
 a NN in non-secure mode, {{FSNamesystem#getDelegationToken}} will return 
 null to indicate no token is required.  Hdfs will send back the null to the 
 client, but webhdfs uses {{DelegationTokenSecretManager.createCredentials}} 
 which instead throws an exception.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (HDFS-4576) Webhdfs authentication issues

2014-04-14 Thread Mit Desai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mit Desai resolved HDFS-4576.
-

   Resolution: Fixed
Fix Version/s: 0.23.11
   3.0.0

Resolving this task, as all of its subtasks are now resolved.

 Webhdfs authentication issues
 -

 Key: HDFS-4576
 URL: https://issues.apache.org/jira/browse/HDFS-4576
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: webhdfs
Affects Versions: 2.0.0-alpha, 3.0.0, 0.23.7
Reporter: Daryn Sharp
Assignee: Daryn Sharp
 Fix For: 3.0.0, 0.23.11


 Umbrella jira to track the webhdfs authentication issues as subtasks.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-4587) Webhdfs secure clients are incompatible with non-secure NN

2014-04-14 Thread Mit Desai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mit Desai updated HDFS-4587:


Issue Type: Bug  (was: Sub-task)
Parent: (was: HDFS-4576)

 Webhdfs secure clients are incompatible with non-secure NN
 --

 Key: HDFS-4587
 URL: https://issues.apache.org/jira/browse/HDFS-4587
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode, webhdfs
Affects Versions: 3.0.0, 0.23.7, 2.0.4-alpha
Reporter: Daryn Sharp

 A secure webhdfs client will receive an exception from a non-secure NN.  For 
 a NN in non-secure mode, {{FSNamesystem#getDelegationToken}} will return 
 null to indicate no token is required.  Hdfs will send back the null to the 
 client, but webhdfs uses {{DelegationTokenSecretManager.createCredentials}} 
 which instead throws an exception.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-2734) Even if we configure the property fs.checkpoint.size in both core-site.xml and hdfs-site.xml the values are not been considered

2014-04-11 Thread Mit Desai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13966723#comment-13966723
 ] 

Mit Desai commented on HDFS-2734:
-

I see that there has been no activity on this Jira for a long time.
[~andreina], is this still reproducible on your side? If it is still an issue, 
can you provide the information [~qwertymaniac] requested?
From the analysis that Harsh did, it looks like this is not reproducible on his 
side, and I have not seen anyone else raising this concern. In that case, if I 
do not hear back by 4/17/14, I will go ahead and close this issue as Not A Problem.

-Mit

 Even if we configure the property fs.checkpoint.size in both core-site.xml 
 and hdfs-site.xml  the values are not been considered
 

 Key: HDFS-2734
 URL: https://issues.apache.org/jira/browse/HDFS-2734
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 0.20.1, 0.23.0
Reporter: J.Andreina
Priority: Minor

 Even if we configure the property fs.checkpoint.size in both core-site.xml 
 and hdfs-site.xml  the values are not been considered



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-5983) TestSafeMode#testInitializeReplQueuesEarly fails

2014-04-09 Thread Mit Desai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13964459#comment-13964459
 ] 

Mit Desai commented on HDFS-5983:
-

Reviewed the patch. LGTM
+1 (non-binding)

 TestSafeMode#testInitializeReplQueuesEarly fails
 

 Key: HDFS-5983
 URL: https://issues.apache.org/jira/browse/HDFS-5983
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Kihwal Lee
Assignee: Ming Ma
 Attachments: HDFS-5983-updated.patch, HDFS-5983.patch, testlog.txt


 It was seen from one of the precommit build of HDFS-5962.  The test case 
 creates 15 blocks and then shuts down all datanodes. Then the namenode is 
 restarted with a low safe block threshold and one datanode is restarted. The 
 idea is that the initial block report from the restarted datanode will make 
 the namenode leave the safemode and initialize the replication queues.
 According to the log, the datanode reported 3 blocks, but slightly before 
 that the namenode did repl queue init with 1 block.  I will attach the log.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-5983) TestSafeMode#testInitializeReplQueuesEarly fails

2014-04-09 Thread Mit Desai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mit Desai updated HDFS-5983:


Status: Patch Available  (was: Open)

 TestSafeMode#testInitializeReplQueuesEarly fails
 

 Key: HDFS-5983
 URL: https://issues.apache.org/jira/browse/HDFS-5983
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Kihwal Lee
Assignee: Ming Ma
 Attachments: HDFS-5983-updated.patch, HDFS-5983.patch, testlog.txt


 It was seen from one of the precommit build of HDFS-5962.  The test case 
 creates 15 blocks and then shuts down all datanodes. Then the namenode is 
 restarted with a low safe block threshold and one datanode is restarted. The 
 idea is that the initial block report from the restarted datanode will make 
 the namenode leave the safemode and initialize the replication queues.
 According to the log, the datanode reported 3 blocks, but slightly before 
 that the namenode did repl queue init with 1 block.  I will attach the log.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-5983) TestSafeMode#testInitializeReplQueuesEarly fails

2014-04-09 Thread Mit Desai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13964462#comment-13964462
 ] 

Mit Desai commented on HDFS-5983:
-

One note: you need to hit Submit Patch once you upload the patch to get the 
Hadoop QA comment. I just did that.

 TestSafeMode#testInitializeReplQueuesEarly fails
 

 Key: HDFS-5983
 URL: https://issues.apache.org/jira/browse/HDFS-5983
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Kihwal Lee
Assignee: Ming Ma
 Attachments: HDFS-5983-updated.patch, HDFS-5983.patch, testlog.txt


 It was seen from one of the precommit build of HDFS-5962.  The test case 
 creates 15 blocks and then shuts down all datanodes. Then the namenode is 
 restarted with a low safe block threshold and one datanode is restarted. The 
 idea is that the initial block report from the restarted datanode will make 
 the namenode leave the safemode and initialize the replication queues.
 According to the log, the datanode reported 3 blocks, but slightly before 
 that the namenode did repl queue init with 1 block.  I will attach the log.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-5983) TestSafeMode#testInitializeReplQueuesEarly fails

2014-04-09 Thread Mit Desai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13964491#comment-13964491
 ] 

Mit Desai commented on HDFS-5983:
-

[~airbots], [~mingma] : Can any of you regenerate the patch and attach it to 
make sure it applies successfully?

Mit

 TestSafeMode#testInitializeReplQueuesEarly fails
 

 Key: HDFS-5983
 URL: https://issues.apache.org/jira/browse/HDFS-5983
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Kihwal Lee
Assignee: Ming Ma
 Attachments: HDFS-5983-updated.patch, HDFS-5983.patch, testlog.txt


 It was seen from one of the precommit build of HDFS-5962.  The test case 
 creates 15 blocks and then shuts down all datanodes. Then the namenode is 
 restarted with a low safe block threshold and one datanode is restarted. The 
 idea is that the initial block report from the restarted datanode will make 
 the namenode leave the safemode and initialize the replication queues.
 According to the log, the datanode reported 3 blocks, but slightly before 
 that the namenode did repl queue init with 1 block.  I will attach the log.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-5983) TestSafeMode#testInitializeReplQueuesEarly fails

2014-04-09 Thread Mit Desai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mit Desai updated HDFS-5983:


Status: Open  (was: Patch Available)

 TestSafeMode#testInitializeReplQueuesEarly fails
 

 Key: HDFS-5983
 URL: https://issues.apache.org/jira/browse/HDFS-5983
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Kihwal Lee
Assignee: Ming Ma
 Attachments: HDFS-5983-updated.patch, HDFS-5983.patch, testlog.txt


 It was seen from one of the precommit build of HDFS-5962.  The test case 
 creates 15 blocks and then shuts down all datanodes. Then the namenode is 
 restarted with a low safe block threshold and one datanode is restarted. The 
 idea is that the initial block report from the restarted datanode will make 
 the namenode leave the safemode and initialize the replication queues.
 According to the log, the datanode reported 3 blocks, but slightly before 
 that the namenode did repl queue init with 1 block.  I will attach the log.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-5983) TestSafeMode#testInitializeReplQueuesEarly fails

2014-04-09 Thread Mit Desai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13964579#comment-13964579
 ] 

Mit Desai commented on HDFS-5983:
-

Already fixed by HDFS-6160, so closing it.

 TestSafeMode#testInitializeReplQueuesEarly fails
 

 Key: HDFS-5983
 URL: https://issues.apache.org/jira/browse/HDFS-5983
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.2.0
Reporter: Kihwal Lee
Assignee: Ming Ma
 Attachments: HDFS-5983-updated.patch, HDFS-5983.patch, testlog.txt


 It was seen from one of the precommit build of HDFS-5962.  The test case 
 creates 15 blocks and then shuts down all datanodes. Then the namenode is 
 restarted with a low safe block threshold and one datanode is restarted. The 
 idea is that the initial block report from the restarted datanode will make 
 the namenode leave the safemode and initialize the replication queues.
 According to the log, the datanode reported 3 blocks, but slightly before 
 that the namenode did repl queue init with 1 block.  I will attach the log.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HDFS-6195) TestRMApplicationHistoryWriter#testRMWritingMassiveHistory runs slow and intermittently fails on trunk and branch2

2014-04-07 Thread Mit Desai (JIRA)
Mit Desai created HDFS-6195:
---

 Summary: 
TestRMApplicationHistoryWriter#testRMWritingMassiveHistory runs slow and 
intermittently fails on trunk and branch2
 Key: HDFS-6195
 URL: https://issues.apache.org/jira/browse/HDFS-6195
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 3.0.0, 2.5.0
Reporter: Mit Desai
Assignee: Mit Desai


The test has 1 containers that it tries to clean up.
The cleanup has a timeout of 2ms, within which the test sometimes cannot finish 
the cleanup and fails with an assertion error.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6195) TestRMApplicationHistoryWriter#testRMWritingMassiveHistory runs slow and intermittently fails on trunk and branch2

2014-04-07 Thread Mit Desai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13961887#comment-13961887
 ] 

Mit Desai commented on HDFS-6195:
-

Analyzing the cause. Will post the analysis/fix soon.

 TestRMApplicationHistoryWriter#testRMWritingMassiveHistory runs slow and 
 intermittently fails on trunk and branch2
 --

 Key: HDFS-6195
 URL: https://issues.apache.org/jira/browse/HDFS-6195
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 3.0.0, 2.5.0
Reporter: Mit Desai
Assignee: Mit Desai

 The test has 1 containers that it tries to clean up.
 The cleanup has a timeout of 2ms, within which the test sometimes cannot 
 finish the cleanup and fails with an assertion error.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6195) TestRMApplicationHistoryWriter#testRMWritingMassiveHistory runs slow and intermittently fails on trunk and branch2

2014-04-07 Thread Mit Desai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mit Desai updated HDFS-6195:


Attachment: HDFS-6195.patch

Attaching the patch for trunk and branch2

 TestRMApplicationHistoryWriter#testRMWritingMassiveHistory runs slow and 
 intermittently fails on trunk and branch2
 --

 Key: HDFS-6195
 URL: https://issues.apache.org/jira/browse/HDFS-6195
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 3.0.0, 2.5.0
Reporter: Mit Desai
Assignee: Mit Desai
 Attachments: HDFS-6195.patch


 The test has 1 containers that it tries to clean up.
 The cleanup has a timeout of 2ms, within which the test sometimes cannot 
 finish the cleanup and fails with an assertion error.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6195) TestRMApplicationHistoryWriter#testRMWritingMassiveHistory runs slow and intermittently fails on trunk and branch2

2014-04-07 Thread Mit Desai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mit Desai updated HDFS-6195:


Fix Version/s: 2.5.0
   3.0.0
   Status: Patch Available  (was: Open)

 TestRMApplicationHistoryWriter#testRMWritingMassiveHistory runs slow and 
 intermittently fails on trunk and branch2
 --

 Key: HDFS-6195
 URL: https://issues.apache.org/jira/browse/HDFS-6195
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 3.0.0, 2.5.0
Reporter: Mit Desai
Assignee: Mit Desai
 Fix For: 3.0.0, 2.5.0

 Attachments: HDFS-6195.patch


 The test has 1 containers that it tries to clean up.
 The cleanup has a timeout of 2ms, within which the test sometimes cannot 
 finish the cleanup and fails with an assertion error.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6195) TestRMApplicationHistoryWriter#testRMWritingMassiveHistory runs slow and intermittently fails on trunk and branch2

2014-04-07 Thread Mit Desai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13961984#comment-13961984
 ] 

Mit Desai commented on HDFS-6195:
-

While cleaning up the containers,
{code}
while (cleanedSize < allocatedSize && waitCount++ < 200) {
  Thread.sleep(100);
  resp = nm.nodeHeartbeat(true);
  cleaned = resp.getContainersToCleanup();
  cleanedSize += cleaned.size();
}
{code}

the test sometimes cannot finish the cleanup and some of the 1 containers are 
left uncleaned, resulting in an assertion error at 
{{Assert.assertEquals(allocatedSize, cleanedSize);}}.

This test has been failing in our nightly builds for the past couple of days. I 
was able to reproduce it consistently in eclipse but not using maven, so I 
think it is an environment issue that cannot be reproduced everywhere.

As a fix, I have increased the thread sleep time in the while loop, which gives 
the container cleanup some extra time. Since the while loop also checks the 
cleaned size against the allocated size, the test will not always take up all 
the cycles in the loop.
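A sketch of the adjusted loop (the sleep value shown here is illustrative, not 
necessarily the one in the attached patch):
{code}
// Give the NM heartbeat/cleanup more time per iteration; the loop still exits
// early once all allocated containers have been reported as cleaned up.
while (cleanedSize < allocatedSize && waitCount++ < 200) {
  Thread.sleep(300);   // was 100ms; illustrative value only
  resp = nm.nodeHeartbeat(true);
  cleaned = resp.getContainersToCleanup();
  cleanedSize += cleaned.size();
}
Assert.assertEquals(allocatedSize, cleanedSize);
{code}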

 TestRMApplicationHistoryWriter#testRMWritingMassiveHistory runs slow and 
 intermittently fails on trunk and branch2
 --

 Key: HDFS-6195
 URL: https://issues.apache.org/jira/browse/HDFS-6195
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 3.0.0, 2.5.0
Reporter: Mit Desai
Assignee: Mit Desai
 Fix For: 3.0.0, 2.5.0

 Attachments: HDFS-6195.patch


 The test has 1 containers that it tries to clean up.
 The cleanup has a timeout of 2ms, within which the test sometimes cannot 
 finish the cleanup and fails with an assertion error.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6195) TestRMApplicationHistoryWriter#testRMWritingMassiveHistory runs slow and intermittently fails on trunk and branch2

2014-04-07 Thread Mit Desai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13962088#comment-13962088
 ] 

Mit Desai commented on HDFS-6195:
-

TestRMRestart is a different issue related to YARN-1906

 TestRMApplicationHistoryWriter#testRMWritingMassiveHistory runs slow and 
 intermittently fails on trunk and branch2
 --

 Key: HDFS-6195
 URL: https://issues.apache.org/jira/browse/HDFS-6195
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 3.0.0, 2.5.0
Reporter: Mit Desai
Assignee: Mit Desai
 Fix For: 3.0.0, 2.5.0

 Attachments: HDFS-6195.patch


 The test has 1 containers that it tries to clean up.
 The cleanup has a timeout of 2ms, within which the test sometimes cannot 
 finish the cleanup and fails with an assertion error.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-5807) TestBalancerWithNodeGroup.testBalancerWithNodeGroup fails intermittently on Branch-2

2014-03-26 Thread Mit Desai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13947957#comment-13947957
 ] 

Mit Desai commented on HDFS-5807:
-

Thanks [~airbots]

 TestBalancerWithNodeGroup.testBalancerWithNodeGroup fails intermittently on 
 Branch-2
 

 Key: HDFS-5807
 URL: https://issues.apache.org/jira/browse/HDFS-5807
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: test
Affects Versions: 2.3.0
Reporter: Mit Desai
Assignee: Chen He
 Fix For: 3.0.0, 2.4.0

 Attachments: HDFS-5807.patch


 The test times out after some time.
 {noformat}
 java.util.concurrent.TimeoutException: Rebalancing expected avg utilization 
 to become 0.16, but on datanode 127.0.0.1:42451 it remains at 0.3 after more 
 than 2 msec.
   at 
 org.apache.hadoop.hdfs.server.balancer.TestBalancerWithNodeGroup.waitForBalancer(TestBalancerWithNodeGroup.java:151)
   at 
 org.apache.hadoop.hdfs.server.balancer.TestBalancerWithNodeGroup.runBalancer(TestBalancerWithNodeGroup.java:178)
   at 
 org.apache.hadoop.hdfs.server.balancer.TestBalancerWithNodeGroup.testBalancerWithNodeGroup(TestBalancerWithNodeGroup.java:302)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Reopened] (HDFS-5807) TestBalancerWithNodeGroup.testBalancerWithNodeGroup fails intermittently on Branch-2

2014-03-24 Thread Mit Desai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mit Desai reopened HDFS-5807:
-


[~airbots], I found this test failing again in our nightly builds. Can you take 
a look at it again?

{noformat}
Error Message

Rebalancing expected avg utilization to become 0.16, but on datanode 
X.X.X.X: it remains at 0.3 after more than 4 msec.

Stacktrace

java.util.concurrent.TimeoutException: Rebalancing expected avg utilization to 
become 0.16, but on datanode X.X.X.X: it remains at 0.3 after more than 
4 msec.
at 
org.apache.hadoop.hdfs.server.balancer.TestBalancerWithNodeGroup.waitForBalancer(TestBalancerWithNodeGroup.java:151)
at 
org.apache.hadoop.hdfs.server.balancer.TestBalancerWithNodeGroup.runBalancer(TestBalancerWithNodeGroup.java:178)
at 
org.apache.hadoop.hdfs.server.balancer.TestBalancerWithNodeGroup.testBalancerWithNodeGroup(TestBalancerWithNodeGroup.java:302)

{noformat}

 TestBalancerWithNodeGroup.testBalancerWithNodeGroup fails intermittently on 
 Branch-2
 

 Key: HDFS-5807
 URL: https://issues.apache.org/jira/browse/HDFS-5807
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: test
Affects Versions: 2.3.0
Reporter: Mit Desai
Assignee: Chen He
 Fix For: 3.0.0, 2.4.0

 Attachments: HDFS-5807.patch


 The test times out after some time.
 {noformat}
 java.util.concurrent.TimeoutException: Rebalancing expected avg utilization 
 to become 0.16, but on datanode 127.0.0.1:42451 it remains at 0.3 after more 
 than 2 msec.
   at 
 org.apache.hadoop.hdfs.server.balancer.TestBalancerWithNodeGroup.waitForBalancer(TestBalancerWithNodeGroup.java:151)
   at 
 org.apache.hadoop.hdfs.server.balancer.TestBalancerWithNodeGroup.runBalancer(TestBalancerWithNodeGroup.java:178)
   at 
 org.apache.hadoop.hdfs.server.balancer.TestBalancerWithNodeGroup.testBalancerWithNodeGroup(TestBalancerWithNodeGroup.java:302)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HDFS-6126) TestnameNodeMetrics#testCorruptBlock fails intermittently

2014-03-19 Thread Mit Desai (JIRA)
Mit Desai created HDFS-6126:
---

 Summary: TestnameNodeMetrics#testCorruptBlock fails intermittently
 Key: HDFS-6126
 URL: https://issues.apache.org/jira/browse/HDFS-6126
 Project: Hadoop HDFS
  Issue Type: Test
Affects Versions: 2.4.0
Reporter: Mit Desai


I get the following error
{noformat}
testCorruptBlock(org.apache.hadoop.hdfs.server.namenode.metrics.TestNameNodeMetrics)
  Time elapsed: 5.556 sec   FAILURE!
java.lang.AssertionError: Bad value for metric CorruptBlocks expected:<1> but 
was:<0>
at org.junit.Assert.fail(Assert.java:93)
at org.junit.Assert.failNotEquals(Assert.java:647)
at org.junit.Assert.assertEquals(Assert.java:128)
at org.junit.Assert.assertEquals(Assert.java:472)
at 
org.apache.hadoop.test.MetricsAsserts.assertGauge(MetricsAsserts.java:190)
at 
org.apache.hadoop.hdfs.server.namenode.metrics.TestNameNodeMetrics.testCorruptBlock(TestNameNodeMetrics.java:247)


Results :

Failed tests: 
  TestNameNodeMetrics.testCorruptBlock:247 Bad value for metric CorruptBlocks 
expected:<1> but was:<0>
{noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HDFS-6104) TestFsLimits#testDefaultMaxComponentLength Fails on branch-2

2014-03-14 Thread Mit Desai (JIRA)
Mit Desai created HDFS-6104:
---

 Summary: TestFsLimits#testDefaultMaxComponentLength Fails on 
branch-2
 Key: HDFS-6104
 URL: https://issues.apache.org/jira/browse/HDFS-6104
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.4.0
Reporter: Mit Desai
Assignee: Mit Desai


testDefaultMaxComponentLength fails intermittently with the following error
{noformat}
java.lang.AssertionError: expected:<0> but was:<255>
at org.junit.Assert.fail(Assert.java:93)
at org.junit.Assert.failNotEquals(Assert.java:647)
at org.junit.Assert.assertEquals(Assert.java:128)
at org.junit.Assert.assertEquals(Assert.java:472)
at org.junit.Assert.assertEquals(Assert.java:456)
at 
org.apache.hadoop.hdfs.server.namenode.TestFsLimits.testDefaultMaxComponentLength(TestFsLimits.java:90)
{noformat}

On doing some research, I found that this is actually a JDK7 issue.
The test always fails when it runs after any other test that calls the 
addChildWithName() method.
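
A minimal sketch of the usual remedy for this kind of ordering dependence, 
re-initializing the shared state before every test (hypothetical code, not the 
actual TestFsLimits fix):
{code}
// Hypothetical: give each test method a fresh Configuration so that whatever
// addChildWithName() sets cannot leak into testDefaultMaxComponentLength when
// JDK7 runs the test methods in a different order.
@Before
public void resetSharedState() {
  conf = new HdfsConfiguration();
}
{code}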



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6035) TestCacheDirectives#testCacheManagerRestart is failing on branch-2

2014-03-11 Thread Mit Desai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13930720#comment-13930720
 ] 

Mit Desai commented on HDFS-6035:
-

[~sathish.gurram], can you let me know which branch you are testing this on?

 TestCacheDirectives#testCacheManagerRestart is failing on branch-2
 --

 Key: HDFS-6035
 URL: https://issues.apache.org/jira/browse/HDFS-6035
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: test
Affects Versions: 2.4.0
Reporter: Mit Desai
Assignee: sathish
 Attachments: HDFS-6035-0001.patch


 {noformat}
 java.io.IOException: Inconsistent checkpoint fields.
 LV = -51 namespaceID = 1641397469 cTime = 0 ; clusterId = testClusterID ; 
 blockpoolId = BP-423574854-x.x.x.x-1393478669835.
 Expecting respectively: -51; 2; 0; testClusterID; 
 BP-2051361571-x.x.x.x-1393478572877.
   at 
 org.apache.hadoop.hdfs.server.namenode.CheckpointSignature.validateStorageInfo(CheckpointSignature.java:133)
   at 
 org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doCheckpoint(SecondaryNameNode.java:526)
   at 
 org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives.testCacheManagerRestart(TestCacheDirectives.java:582)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6035) TestCacheDirectives#testCacheManagerRestart is failing on branch-2

2014-03-05 Thread Mit Desai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13920948#comment-13920948
 ] 

Mit Desai commented on HDFS-6035:
-

I am trying, but I cannot reproduce it in eclipse either. I'll have to put in 
some more effort and will update you once I have some findings.

 TestCacheDirectives#testCacheManagerRestart is failing on branch-2
 --

 Key: HDFS-6035
 URL: https://issues.apache.org/jira/browse/HDFS-6035
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: test
Affects Versions: 2.4.0
Reporter: Mit Desai
Assignee: sathish
 Attachments: HDFS-6035-0001.patch


 {noformat}
 java.io.IOException: Inconsistent checkpoint fields.
 LV = -51 namespaceID = 1641397469 cTime = 0 ; clusterId = testClusterID ; 
 blockpoolId = BP-423574854-x.x.x.x-1393478669835.
 Expecting respectively: -51; 2; 0; testClusterID; 
 BP-2051361571-x.x.x.x-1393478572877.
   at 
 org.apache.hadoop.hdfs.server.namenode.CheckpointSignature.validateStorageInfo(CheckpointSignature.java:133)
   at 
 org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doCheckpoint(SecondaryNameNode.java:526)
   at 
 org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives.testCacheManagerRestart(TestCacheDirectives.java:582)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-5857) TestWebHDFS#testNamenodeRestart fails intermittently with NPE

2014-03-05 Thread Mit Desai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mit Desai updated HDFS-5857:


Attachment: HDFS-5857.patch

Thanks for the input, Haohui.
Attaching the updated patch.

 TestWebHDFS#testNamenodeRestart fails intermittently with NPE
 -

 Key: HDFS-5857
 URL: https://issues.apache.org/jira/browse/HDFS-5857
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 3.0.0, 2.2.0
Reporter: Mit Desai
Assignee: Mit Desai
 Attachments: HDFS-5857.patch, HDFS-5857.patch


 {noformat}
 java.lang.AssertionError: There are 1 exception(s):
   Exception 0: 
 org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): null
   at 
 org.apache.hadoop.hdfs.web.JsonUtil.toRemoteException(JsonUtil.java:157)
   at 
 org.apache.hadoop.hdfs.web.WebHdfsFileSystem.validateResponse(WebHdfsFileSystem.java:315)
   at 
 org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$700(WebHdfsFileSystem.java:105)
   at 
 org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.getResponse(WebHdfsFileSystem.java:625)
   at 
 org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:532)
   at 
 org.apache.hadoop.hdfs.web.WebHdfsFileSystem.run(WebHdfsFileSystem.java:421)
   at 
 org.apache.hadoop.hdfs.web.WebHdfsFileSystem.mkdirs(WebHdfsFileSystem.java:701)
   at org.apache.hadoop.fs.FileSystem.mkdirs(FileSystem.java:1816)
   at org.apache.hadoop.hdfs.DFSTestUtil.createFile(DFSTestUtil.java:196)
   at 
 org.apache.hadoop.hdfs.TestDFSClientRetries$6.run(TestDFSClientRetries.java:920)
   at java.lang.Thread.run(Thread.java:722)
   at org.junit.Assert.fail(Assert.java:93)
   at 
 org.apache.hadoop.hdfs.TestDFSClientRetries.assertEmpty(TestDFSClientRetries.java:1031)
   at 
 org.apache.hadoop.hdfs.TestDFSClientRetries.namenodeRestartTest(TestDFSClientRetries.java:951)
   at 
 org.apache.hadoop.hdfs.web.TestWebHDFS.testNamenodeRestart(TestWebHDFS.java:216)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-5857) TestWebHDFS#testNamenodeRestart fails intermittently with NPE

2014-03-05 Thread Mit Desai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13921181#comment-13921181
 ] 

Mit Desai commented on HDFS-5857:
-

None of the test failures are related to the patch. I have run them manually 
with the patch and they pass on my machine.

 TestWebHDFS#testNamenodeRestart fails intermittently with NPE
 -

 Key: HDFS-5857
 URL: https://issues.apache.org/jira/browse/HDFS-5857
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 3.0.0, 2.2.0
Reporter: Mit Desai
Assignee: Mit Desai
 Attachments: HDFS-5857.patch, HDFS-5857.patch


 {noformat}
 java.lang.AssertionError: There are 1 exception(s):
   Exception 0: 
 org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): null
   at 
 org.apache.hadoop.hdfs.web.JsonUtil.toRemoteException(JsonUtil.java:157)
   at 
 org.apache.hadoop.hdfs.web.WebHdfsFileSystem.validateResponse(WebHdfsFileSystem.java:315)
   at 
 org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$700(WebHdfsFileSystem.java:105)
   at 
 org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.getResponse(WebHdfsFileSystem.java:625)
   at 
 org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:532)
   at 
 org.apache.hadoop.hdfs.web.WebHdfsFileSystem.run(WebHdfsFileSystem.java:421)
   at 
 org.apache.hadoop.hdfs.web.WebHdfsFileSystem.mkdirs(WebHdfsFileSystem.java:701)
   at org.apache.hadoop.fs.FileSystem.mkdirs(FileSystem.java:1816)
   at org.apache.hadoop.hdfs.DFSTestUtil.createFile(DFSTestUtil.java:196)
   at 
 org.apache.hadoop.hdfs.TestDFSClientRetries$6.run(TestDFSClientRetries.java:920)
   at java.lang.Thread.run(Thread.java:722)
   at org.junit.Assert.fail(Assert.java:93)
   at 
 org.apache.hadoop.hdfs.TestDFSClientRetries.assertEmpty(TestDFSClientRetries.java:1031)
   at 
 org.apache.hadoop.hdfs.TestDFSClientRetries.namenodeRestartTest(TestDFSClientRetries.java:951)
   at 
 org.apache.hadoop.hdfs.web.TestWebHDFS.testNamenodeRestart(TestWebHDFS.java:216)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Assigned] (HDFS-5857) TestWebHDFS#testNamenodeRestart fails intermittently with NPE

2014-03-04 Thread Mit Desai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mit Desai reassigned HDFS-5857:
---

Assignee: Mit Desai

 TestWebHDFS#testNamenodeRestart fails intermittently with NPE
 -

 Key: HDFS-5857
 URL: https://issues.apache.org/jira/browse/HDFS-5857
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.2.0
Reporter: Mit Desai
Assignee: Mit Desai

 {noformat}
 java.lang.AssertionError: There are 1 exception(s):
   Exception 0: 
 org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): null
   at 
 org.apache.hadoop.hdfs.web.JsonUtil.toRemoteException(JsonUtil.java:157)
   at 
 org.apache.hadoop.hdfs.web.WebHdfsFileSystem.validateResponse(WebHdfsFileSystem.java:315)
   at 
 org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$700(WebHdfsFileSystem.java:105)
   at 
 org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.getResponse(WebHdfsFileSystem.java:625)
   at 
 org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:532)
   at 
 org.apache.hadoop.hdfs.web.WebHdfsFileSystem.run(WebHdfsFileSystem.java:421)
   at 
 org.apache.hadoop.hdfs.web.WebHdfsFileSystem.mkdirs(WebHdfsFileSystem.java:701)
   at org.apache.hadoop.fs.FileSystem.mkdirs(FileSystem.java:1816)
   at org.apache.hadoop.hdfs.DFSTestUtil.createFile(DFSTestUtil.java:196)
   at 
 org.apache.hadoop.hdfs.TestDFSClientRetries$6.run(TestDFSClientRetries.java:920)
   at java.lang.Thread.run(Thread.java:722)
   at org.junit.Assert.fail(Assert.java:93)
   at 
 org.apache.hadoop.hdfs.TestDFSClientRetries.assertEmpty(TestDFSClientRetries.java:1031)
   at 
 org.apache.hadoop.hdfs.TestDFSClientRetries.namenodeRestartTest(TestDFSClientRetries.java:951)
   at 
 org.apache.hadoop.hdfs.web.TestWebHDFS.testNamenodeRestart(TestWebHDFS.java:216)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-5857) TestWebHDFS#testNamenodeRestart fails intermittently with NPE

2014-03-04 Thread Mit Desai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mit Desai updated HDFS-5857:


Affects Version/s: 3.0.0
   Status: Patch Available  (was: Open)

 TestWebHDFS#testNamenodeRestart fails intermittently with NPE
 -

 Key: HDFS-5857
 URL: https://issues.apache.org/jira/browse/HDFS-5857
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.2.0, 3.0.0
Reporter: Mit Desai
Assignee: Mit Desai
 Attachments: HDFS-5857.patch


 {noformat}
 java.lang.AssertionError: There are 1 exception(s):
   Exception 0: 
 org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): null
   at 
 org.apache.hadoop.hdfs.web.JsonUtil.toRemoteException(JsonUtil.java:157)
   at 
 org.apache.hadoop.hdfs.web.WebHdfsFileSystem.validateResponse(WebHdfsFileSystem.java:315)
   at 
 org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$700(WebHdfsFileSystem.java:105)
   at 
 org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.getResponse(WebHdfsFileSystem.java:625)
   at 
 org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:532)
   at 
 org.apache.hadoop.hdfs.web.WebHdfsFileSystem.run(WebHdfsFileSystem.java:421)
   at 
 org.apache.hadoop.hdfs.web.WebHdfsFileSystem.mkdirs(WebHdfsFileSystem.java:701)
   at org.apache.hadoop.fs.FileSystem.mkdirs(FileSystem.java:1816)
   at org.apache.hadoop.hdfs.DFSTestUtil.createFile(DFSTestUtil.java:196)
   at 
 org.apache.hadoop.hdfs.TestDFSClientRetries$6.run(TestDFSClientRetries.java:920)
   at java.lang.Thread.run(Thread.java:722)
   at org.junit.Assert.fail(Assert.java:93)
   at 
 org.apache.hadoop.hdfs.TestDFSClientRetries.assertEmpty(TestDFSClientRetries.java:1031)
   at 
 org.apache.hadoop.hdfs.TestDFSClientRetries.namenodeRestartTest(TestDFSClientRetries.java:951)
   at 
 org.apache.hadoop.hdfs.web.TestWebHDFS.testNamenodeRestart(TestWebHDFS.java:216)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (HDFS-5839) TestWebHDFS#testNamenodeRestart fails with NullPointerException in trunk

2014-03-04 Thread Mit Desai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mit Desai resolved HDFS-5839.
-

Resolution: Duplicate

HDFS-5857 has a patch for this issue. I am resolving this JIRA so that we have 
a single JIRA tracking it.

 TestWebHDFS#testNamenodeRestart fails with NullPointerException in trunk
 

 Key: HDFS-5839
 URL: https://issues.apache.org/jira/browse/HDFS-5839
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Ted Yu
Assignee: Mit Desai
 Attachments: org.apache.hadoop.hdfs.web.TestWebHDFS-output.txt


 Here is test failure:
 {code}
 testNamenodeRestart(org.apache.hadoop.hdfs.web.TestWebHDFS)  Time elapsed: 
 45.206 sec   FAILURE!
 java.lang.AssertionError: There are 1 exception(s):
   Exception 0: 
 org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): null
 at 
 org.apache.hadoop.hdfs.web.JsonUtil.toRemoteException(JsonUtil.java:157)
 at 
 org.apache.hadoop.hdfs.web.WebHdfsFileSystem.validateResponse(WebHdfsFileSystem.java:315)
 at 
 org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$700(WebHdfsFileSystem.java:104)
 at 
 org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.getResponse(WebHdfsFileSystem.java:615)
 at 
 org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:532)
 at 
 org.apache.hadoop.hdfs.web.WebHdfsFileSystem$OffsetUrlOpener.connect(WebHdfsFileSystem.java:878)
 at 
 org.apache.hadoop.hdfs.web.ByteRangeInputStream.openInputStream(ByteRangeInputStream.java:119)
 at 
 org.apache.hadoop.hdfs.web.ByteRangeInputStream.getInputStream(ByteRangeInputStream.java:103)
 at 
 org.apache.hadoop.hdfs.web.ByteRangeInputStream.read(ByteRangeInputStream.java:180)
 at java.io.FilterInputStream.read(FilterInputStream.java:83)
 at 
 org.apache.hadoop.hdfs.TestDFSClientRetries$5.run(TestDFSClientRetries.java:954)
 at java.lang.Thread.run(Thread.java:724)
 at org.junit.Assert.fail(Assert.java:93)
 at 
 org.apache.hadoop.hdfs.TestDFSClientRetries.assertEmpty(TestDFSClientRetries.java:1083)
 at 
 org.apache.hadoop.hdfs.TestDFSClientRetries.namenodeRestartTest(TestDFSClientRetries.java:1003)
 at 
 org.apache.hadoop.hdfs.web.TestWebHDFS.testNamenodeRestart(TestWebHDFS.java:216)
 {code}
 From test output:
 {code}
 2014-01-27 17:55:59,388 WARN  resources.ExceptionHandler 
 (ExceptionHandler.java:toResponse(92)) - INTERNAL_SERVER_ERROR
 java.lang.NullPointerException
 at 
 org.apache.hadoop.hdfs.server.namenode.web.resources.NamenodeWebHdfsMethods.chooseDatanode(NamenodeWebHdfsMethods.java:166)
 at 
 org.apache.hadoop.hdfs.server.namenode.web.resources.NamenodeWebHdfsMethods.redirectURI(NamenodeWebHdfsMethods.java:231)
 at 
 org.apache.hadoop.hdfs.server.namenode.web.resources.NamenodeWebHdfsMethods.get(NamenodeWebHdfsMethods.java:658)
 at 
 org.apache.hadoop.hdfs.server.namenode.web.resources.NamenodeWebHdfsMethods.access$400(NamenodeWebHdfsMethods.java:116)
 at 
 org.apache.hadoop.hdfs.server.namenode.web.resources.NamenodeWebHdfsMethods$3.run(NamenodeWebHdfsMethods.java:631)
 at 
 org.apache.hadoop.hdfs.server.namenode.web.resources.NamenodeWebHdfsMethods$3.run(NamenodeWebHdfsMethods.java:626)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1560)
 at 
 org.apache.hadoop.hdfs.server.namenode.web.resources.NamenodeWebHdfsMethods.get(NamenodeWebHdfsMethods.java:626)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:606)
 at 
 com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60)
 at 
 com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$ResponseOutInvoker._dispatch(AbstractResourceMethodDispatchProvider.java:205)
 at 
 com.sun.jersey.server.impl.model.method.dispatch.ResourceJavaMethodDispatcher.dispatch(ResourceJavaMethodDispatcher.java:75)
 at 
 com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:288)
 at 
 com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
 at 
 com.sun.jersey.server.impl.uri.rules.ResourceClassRule.accept(ResourceClassRule.java:108)
 at 
 

[jira] [Commented] (HDFS-6035) TestCacheDirectives#testCacheManagerRestart is failing on branch-2

2014-03-03 Thread Mit Desai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13918110#comment-13918110
 ] 

Mit Desai commented on HDFS-6035:
-

Thanks for taking this issue, Sathish. This test is failing in our nightly 
builds, but I am unable to reproduce it. Is there a specific way you were able 
to reproduce it?

 TestCacheDirectives#testCacheManagerRestart is failing on branch-2
 --

 Key: HDFS-6035
 URL: https://issues.apache.org/jira/browse/HDFS-6035
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: test
Affects Versions: 2.4.0
Reporter: Mit Desai
Assignee: sathish
 Attachments: HDFS-6035-0001.patch


 {noformat}
 java.io.IOException: Inconsistent checkpoint fields.
 LV = -51 namespaceID = 1641397469 cTime = 0 ; clusterId = testClusterID ; 
 blockpoolId = BP-423574854-x.x.x.x-1393478669835.
 Expecting respectively: -51; 2; 0; testClusterID; 
 BP-2051361571-x.x.x.x-1393478572877.
   at 
 org.apache.hadoop.hdfs.server.namenode.CheckpointSignature.validateStorageInfo(CheckpointSignature.java:133)
   at 
 org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doCheckpoint(SecondaryNameNode.java:526)
   at 
 org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives.testCacheManagerRestart(TestCacheDirectives.java:582)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-5950) The DFSClient and DataNode should use shared memory segments to communicate short-circuit information

2014-03-03 Thread Mit Desai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13918570#comment-13918570
 ] 

Mit Desai commented on HDFS-5950:
-

Hey, I just found that this check-in causes a Release Audit Warning for the 
empty file _TestShortCircuitShm.java_.

 The DFSClient and DataNode should use shared memory segments to communicate 
 short-circuit information
 -

 Key: HDFS-5950
 URL: https://issues.apache.org/jira/browse/HDFS-5950
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: datanode, hdfs-client
Affects Versions: 2.4.0
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
 Fix For: 2.4.0

 Attachments: HDFS-5950.001.patch, HDFS-5950.003.patch, 
 HDFS-5950.004.patch, HDFS-5950.006.patch, HDFS-5950.007.patch, 
 HDFS-5950.008.patch


 The DFSClient and DataNode should use the shared memory segments and unified 
 cache added in the other HDFS-5182 subtasks to communicate short-circuit 
 information.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HDFS-6035) TestCacheDirectives#testCacheManagerRestart is failing on branch-2

2014-02-28 Thread Mit Desai (JIRA)
Mit Desai created HDFS-6035:
---

 Summary: TestCacheDirectives#testCacheManagerRestart is failing on 
branch-2
 Key: HDFS-6035
 URL: https://issues.apache.org/jira/browse/HDFS-6035
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: test
Affects Versions: 2.4.0
Reporter: Mit Desai


{noformat}
java.io.IOException: Inconsistent checkpoint fields.
LV = -51 namespaceID = 1641397469 cTime = 0 ; clusterId = testClusterID ; 
blockpoolId = BP-423574854-x.x.x.x-1393478669835.
Expecting respectively: -51; 2; 0; testClusterID; 
BP-2051361571-x.x.x.x-1393478572877.
at 
org.apache.hadoop.hdfs.server.namenode.CheckpointSignature.validateStorageInfo(CheckpointSignature.java:133)
at 
org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doCheckpoint(SecondaryNameNode.java:526)
at 
org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives.testCacheManagerRestart(TestCacheDirectives.java:582)
{noformat}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5780) TestRBWBlockInvalidation times out intermittently on branch-2

2014-02-15 Thread Mit Desai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mit Desai updated HDFS-5780:


Attachment: HDFS-5780-v3.patch

New patch attached. There are no code changes from the previous patch; it only 
updates the comment where the thread timeout was changed from 1 sec to 2 sec.

 TestRBWBlockInvalidation times out intermittently on branch-2
 

 Key: HDFS-5780
 URL: https://issues.apache.org/jira/browse/HDFS-5780
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 3.0.0, 2.2.0
Reporter: Mit Desai
Assignee: Mit Desai
 Attachments: HDFS-5780-v3.patch, HDFS-5780.patch, HDFS-5780.patch


 I recently found out that the test 
 TestRBWBlockInvalidation#testBlockInvalidationWhenRBWReplicaMissedInDN times 
 out intermittently.
 I am using Fedora, JDK7



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5780) TestRBWBlockInvalidation times out intermittently on branch-2

2014-02-14 Thread Mit Desai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mit Desai updated HDFS-5780:


Attachment: HDFS-5780.patch

Attaching the patch.
We need to change the conditions in the test because the failure happens when 
the Replication Monitor comes in and modifies the corrupted block before the 
test checks for it; the test then keeps waiting for a change that has already 
taken place.
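
As an illustration of the general approach (not the code in the attached 
patch), the check can poll for the expected replica state with a bounded 
timeout instead of asserting it at one fixed instant, so it passes whether the 
Replication Monitor has already acted or not. The helper below is a minimal 
self-contained sketch; the interval and timeout values are assumptions, and 
any wiring of it to the corrupt-replica count would be hypothetical.

{code}
import java.util.concurrent.TimeoutException;

/** Minimal sketch of a bounded polling wait; illustrative only, not from the patch. */
public final class PollUtil {
  /** Boolean condition to re-evaluate on each poll. */
  public interface Check {
    boolean isMet();
  }

  private PollUtil() {}

  /**
   * Re-checks {@code check} every {@code intervalMs} milliseconds until it holds,
   * or fails with a TimeoutException once {@code timeoutMs} milliseconds elapse.
   */
  public static void waitFor(Check check, long intervalMs, long timeoutMs)
      throws InterruptedException, TimeoutException {
    final long deadline = System.currentTimeMillis() + timeoutMs;
    while (!check.isMet()) {
      if (System.currentTimeMillis() >= deadline) {
        throw new TimeoutException("Condition not met within " + timeoutMs + " ms");
      }
      Thread.sleep(intervalMs);
    }
  }
}
{code}

A call site would wrap the existing assertion's boolean expression in a Check 
and pick a timeout comfortably larger than the Replication Monitor interval.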

 TestRBWBlockInvalidation times out intermittently on branch-2
 

 Key: HDFS-5780
 URL: https://issues.apache.org/jira/browse/HDFS-5780
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 3.0.0, 2.2.0
Reporter: Mit Desai
Assignee: Mit Desai
 Attachments: HDFS-5780.patch


 I recently found out that the test 
 TestRBWBlockInvalidation#testBlockInvalidationWhenRBWReplicaMissedInDN times 
 out intermittently.
 I am using Fedora, JDK7



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5780) TestRBWBlockInvalidation times out intermittently on branch-2

2014-02-14 Thread Mit Desai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mit Desai updated HDFS-5780:


Affects Version/s: 3.0.0
   Status: Patch Available  (was: Open)

 TestRBWBlockInvalidation times out intermittently on branch-2
 

 Key: HDFS-5780
 URL: https://issues.apache.org/jira/browse/HDFS-5780
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.2.0, 3.0.0
Reporter: Mit Desai
Assignee: Mit Desai
 Attachments: HDFS-5780.patch


 I recently found out that the test 
 TestRBWBlockInvalidation#testBlockInvalidationWhenRBWReplicaMissedInDN times 
 out intermittently.
 I am using Fedora, JDK7



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5780) TestRBWBlockInvalidation times out intermittently on branch-2

2014-02-14 Thread Mit Desai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13901969#comment-13901969
 ] 

Mit Desai commented on HDFS-5780:
-

Thanks Arpit. I will address your concerns and post another patch.

 TestRBWBlockInvalidation times out intermittently on branch-2
 

 Key: HDFS-5780
 URL: https://issues.apache.org/jira/browse/HDFS-5780
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 3.0.0, 2.2.0
Reporter: Mit Desai
Assignee: Mit Desai
 Attachments: HDFS-5780.patch


 I recently found out that the test 
 TestRBWBlockInvalidation#testBlockInvalidationWhenRBWReplicaMissedInDN times 
 out intermittently.
 I am using Fedora, JDK7



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5780) TestRBWBlockInvalidation times out intermittently on branch-2

2014-02-14 Thread Mit Desai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mit Desai updated HDFS-5780:


Status: Open  (was: Patch Available)

 TestRBWBlockInvalidation times out intermittently on branch-2
 

 Key: HDFS-5780
 URL: https://issues.apache.org/jira/browse/HDFS-5780
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.2.0, 3.0.0
Reporter: Mit Desai
Assignee: Mit Desai
 Attachments: HDFS-5780.patch


 I recently found out that the test 
 TestRBWBlockInvalidation#testBlockInvalidationWhenRBWReplicaMissedInDN times 
 out intermittently.
 I am using Fedora, JDK7



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5780) TestRBWBlockInvalidation times out intermittently on branch-2

2014-02-14 Thread Mit Desai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mit Desai updated HDFS-5780:


Attachment: HDFS-5780.patch

Attaching the new patch with the requested changes addressed. I have increased 
the timeout to 10 minutes and had to make a few other timing-related changes.
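
For reference only, a minimal sketch of what a 10-minute per-test timeout 
looks like with JUnit 4; the class name and empty body below are placeholders, 
and the previous timeout value is not shown in this thread.

{code}
import org.junit.Test;

public class TenMinuteTimeoutSketchTest {
  // Illustrative only: a JUnit 4 per-test timeout of 10 minutes (600000 ms).
  // The real logic of TestRBWBlockInvalidation is not reproduced here.
  @Test(timeout = 600000)
  public void testBlockInvalidationWhenRBWReplicaMissedInDN() throws Exception {
    // test body omitted in this sketch
  }
}
{code}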

 TestRBWBlockInvalidation times out intermittently on branch-2
 

 Key: HDFS-5780
 URL: https://issues.apache.org/jira/browse/HDFS-5780
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 3.0.0, 2.2.0
Reporter: Mit Desai
Assignee: Mit Desai
 Attachments: HDFS-5780.patch, HDFS-5780.patch


 I recently found out that the test 
 TestRBWBlockInvalidation#testBlockInvalidationWhenRBWReplicaMissedInDN times 
 out intermittently.
 I am using Fedora, JDK7



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

