[jira] [Commented] (HDFS-5919) FileJournalManager doesn't purge empty and corrupt inprogress edits files

2014-02-10 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13897594#comment-13897594
 ] 

Hadoop QA commented on HDFS-5919:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12627976/HDFS-5919.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  
org.apache.hadoop.hdfs.qjournal.client.TestQuorumJournalManager
  org.apache.hadoop.hdfs.server.namenode.TestEditLog
  
org.apache.hadoop.hdfs.server.namenode.ha.TestHAStateTransitions
  org.apache.hadoop.hdfs.qjournal.client.TestQJMWithFaults

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/6106//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6106//console

This message is automatically generated.

> FileJournalManager doesn't purge empty and corrupt inprogress edits files
> -
>
> Key: HDFS-5919
> URL: https://issues.apache.org/jira/browse/HDFS-5919
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: Vinayakumar B
>Assignee: Vinayakumar B
> Attachments: HDFS-5919.patch
>
>
> FileJournalManager doesn't purge empty and corrupt inprogress edit files.
> These stale files accumulate over time.
> They should be cleared along with the purging of other edit logs.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5899) Add configuration flag to disable/enable support for ACLs.

2014-02-10 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13897591#comment-13897591
 ] 

Chris Nauroth commented on HDFS-5899:
-

I've submitted a patch on issue HDFS-5925 to make this change.

> Add configuration flag to disable/enable support for ACLs.
> --
>
> Key: HDFS-5899
> URL: https://issues.apache.org/jira/browse/HDFS-5899
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Affects Versions: HDFS ACLs (HDFS-4685)
>Reporter: Chris Nauroth
>Assignee: Chris Nauroth
> Fix For: HDFS ACLs (HDFS-4685)
>
> Attachments: HDFS-5899.1.patch, HDFS-5899.2.patch
>
>
> Add a new configuration property that allows administrators to toggle support 
> for HDFS ACLs on/off.  By default, the flag will be off.  This is a 
> conservative choice, and administrators interested in using ACLs can enable 
> it explicitly.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5583) Make DN send an OOB Ack on shutdown before restaring

2014-02-10 Thread Vinayakumar B (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13897590#comment-13897590
 ] 

Vinayakumar B commented on HDFS-5583:
-

{code}
+  OOB_TYPE1 = 8;  // Quick restart
+  OOB_TYPE2 = 9;  // Reserved
+  OOB_TYPE3 = 10; // Reserved
+  OOB_TYPE4 = 11; // Reserved
{code}
I think better names could be given instead of OOB_TYPE1, OOB_TYPE2, etc. Any 
thoughts?
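
For example, more descriptive names might look like this (just a naming sketch 
against the snippet above, not something taken from the patch):
{code}
  OOB_RESTART = 8;    // datanode is shutting down for a quick restart
  OOB_RESERVED1 = 9;  // reserved for future use
  OOB_RESERVED2 = 10; // reserved for future use
  OOB_RESERVED3 = 11; // reserved for future use
{code}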

{code}
  if (!responderClosed) { // Abnormal termination.{code}
I think the comment no longer holds good. Maybe it can be removed.

The changes in {{sendAckUpstream()}} are not formatted correctly and contain tab 
characters.

Javadoc could be added for {{Status myStatus}} in {{sendAckUpstream()}}.
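
For instance, something like this would do (suggested wording only; it assumes 
{{myStatus}} is the status this datanode itself contributes to the ack sent 
upstream):
{code}
  /**
   * @param myStatus the status that this datanode reports for itself in the
   *                 ack being sent upstream
   */
{code}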

> Make DN send an OOB Ack on shutdown before restaring
> 
>
> Key: HDFS-5583
> URL: https://issues.apache.org/jira/browse/HDFS-5583
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
> Attachments: HDFS-5583.patch
>
>
> Add an ability for data nodes to send an OOB response in order to indicate an 
> upcoming upgrade-restart. The client should ignore the pipeline error from the 
> node for a configured amount of time and try to reconstruct the pipeline without 
> excluding the restarted node.  If the node does not come back in time, 
> regular pipeline recovery should happen.
> This feature is useful for applications that need to keep blocks local. 
> If the upgrade-restart is fast, the wait is preferable to losing locality.  
> It could also be used in general instead of the draining-writer strategy.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5925) ACL configuration flag must only reject ACL API calls, not ACLs present in fsimage or edits.

2014-02-10 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth updated HDFS-5925:


Attachment: HDFS-5925.1.patch

Attaching patch.

> ACL configuration flag must only reject ACL API calls, not ACLs present in 
> fsimage or edits.
> 
>
> Key: HDFS-5925
> URL: https://issues.apache.org/jira/browse/HDFS-5925
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: HDFS ACLs (HDFS-4685)
>Reporter: Chris Nauroth
>Assignee: Chris Nauroth
> Attachments: HDFS-5925.1.patch
>
>
> In follow-up discussion on HDFS-5899, we decided that it would cause less 
> harm to administrators if setting {{dfs.namenode.acls.enabled}} to false only 
> causes ACL API calls to be rejected.  Existing ACLs found in fsimage or edits 
> will be loaded and enforced.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Work started] (HDFS-5925) ACL configuration flag must only reject ACL API calls, not ACLs present in fsimage or edits.

2014-02-10 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HDFS-5925 started by Chris Nauroth.

> ACL configuration flag must only reject ACL API calls, not ACLs present in 
> fsimage or edits.
> 
>
> Key: HDFS-5925
> URL: https://issues.apache.org/jira/browse/HDFS-5925
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: HDFS ACLs (HDFS-4685)
>Reporter: Chris Nauroth
>Assignee: Chris Nauroth
> Attachments: HDFS-5925.1.patch
>
>
> In follow-up discussion on HDFS-5899, we decided that it would cause less 
> harm to administrators if setting {{dfs.namenode.acls.enabled}} to false only 
> causes ACL API calls to be rejected.  Existing ACLs found in fsimage or edits 
> will be loaded and enforced.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (HDFS-5925) ACL configuration flag must only reject ACL API calls, not ACLs present in fsimage or edits.

2014-02-10 Thread Chris Nauroth (JIRA)
Chris Nauroth created HDFS-5925:
---

 Summary: ACL configuration flag must only reject ACL API calls, 
not ACLs present in fsimage or edits.
 Key: HDFS-5925
 URL: https://issues.apache.org/jira/browse/HDFS-5925
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Affects Versions: HDFS ACLs (HDFS-4685)
Reporter: Chris Nauroth
Assignee: Chris Nauroth


In follow-up discussion on HDFS-5899, we decided that it would cause less harm 
to administrators if setting {{dfs.namenode.acls.enabled}} to false only causes 
ACL API calls to be rejected.  Existing ACLs found in fsimage or edits will be 
loaded and enforced.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5810) Unify mmap cache and short-circuit file descriptor cache

2014-02-10 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13897581#comment-13897581
 ] 

Hadoop QA commented on HDFS-5810:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12628147/HDFS-5810.020.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 12 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The following test timeouts occurred in 
hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs:

org.apache.hadoop.http.TestHttpServerLifecycle

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/6105//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6105//console

This message is automatically generated.

> Unify mmap cache and short-circuit file descriptor cache
> 
>
> Key: HDFS-5810
> URL: https://issues.apache.org/jira/browse/HDFS-5810
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Affects Versions: 2.3.0
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
> Attachments: HDFS-5810.001.patch, HDFS-5810.004.patch, 
> HDFS-5810.006.patch, HDFS-5810.008.patch, HDFS-5810.015.patch, 
> HDFS-5810.016.patch, HDFS-5810.018.patch, HDFS-5810.019.patch, 
> HDFS-5810.020.patch
>
>
> We should unify the client mmap cache and the client file descriptor cache.  
> Since mmaps are granted corresponding to file descriptors in the cache 
> (currently FileInputStreamCache), they have to be tracked together to do 
> "smarter" things like HDFS-5182.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Resolved] (HDFS-5914) Incorporate ACLs with the changes from HDFS-5698

2014-02-10 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth resolved HDFS-5914.
-

   Resolution: Fixed
Fix Version/s: HDFS ACLs (HDFS-4685)
 Hadoop Flags: Reviewed

+1 for the patch.  I committed it to the HDFS-4685 branch.  Thanks again for 
taking care of this, Haohui.

> Incorporate ACLs with the changes from HDFS-5698
> 
>
> Key: HDFS-5914
> URL: https://issues.apache.org/jira/browse/HDFS-5914
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client, namenode, security
>Affects Versions: HDFS ACLs (HDFS-4685)
>Reporter: Haohui Mai
>Assignee: Haohui Mai
> Fix For: HDFS ACLs (HDFS-4685)
>
> Attachments: HDFS-5914.000.patch, HDFS-5914.001.patch
>
>
> HDFS-5698 uses protobuf to serialize the FSImage. The code needs to be 
> updated to work with these changes.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Comment Edited] (HDFS-5583) Make DN send an OOB Ack on shutdown before restaring

2014-02-10 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13897548#comment-13897548
 ] 

Kihwal Lee edited comment on HDFS-5583 at 2/11/14 5:38 AM:
---

The patch makes the DN send OOB acks to clients that are writing.  The added test 
case currently doesn't do much, but it will be updated after the client-side 
changes.

The OOB ack sending can still be verified by running the new test case. The 
test log should show something like the following:

{panel}
[DataNode]
2014-02-10 23:23:52,412 INFO  datanode.DataNode 
(DataXceiverServer.java:run(190)) - Shutting down DataXceiverServer before 
restart
2014-02-10 23:23:52,412 INFO  datanode.DataNode 
(BlockReceiver.java:receiveBlock(731)) - Shutting down for restart 
(BP-203907574-10.0.1.17-1392096230619:blk_1073741825_1002).
2014-02-10 23:23:52,413 INFO  datanode.DataNode 
(BlockReceiver.java:sendOOBResponse(977)) - Sending an out of band ack of type 
OOB_TYPE1

[Upstream Datanode]
2014-02-10 23:23:52,413 INFO  datanode.DataNode (BlockReceiver.java:run(1060)) 
- Relaying an out of band ack of type OOB_TYPE1

[Client]
2014-02-10 23:23:52,414 WARN  hdfs.DFSClient (DFSOutputStream.java:run(784)) - 
DFSOutputStream ResponseProcessor exception  for block 
BP-203907574-10.0.1.17-1392096230619:blk_1073741825_1002
java.io.IOException: Bad response OOB_TYPE1 for block 
BP-203907574-10.0.1.17-1392096230619:blk_1073741825_1002 from datanode 
127.0.0.1:55182
at 
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:732)
{panel}


was (Author: kihwal):
The patch makes DN send OOB acks to clients who are writing.  The added test 
case currently doesn't do much, but after the client-side changes, it will be 
updated.  

The OOB Ack sending can still be verified from running the test new case. The 
test log should show something like following:

{panel}
[DataNode]
2014-02-10 23:23:52,412 INFO  datanode.DataNode 
(DataXceiverServer.java:run(190)) - Shutting down DataXceiverServer before 
restart
2014-02-10 23:23:52,412 INFO  datanode.DataNode 
(BlockReceiver.java:receiveBlock(731)) - Shutting down for restart 
(BP-203907574-10.0.1.17-1392096230619:blk_1073741825_1002).
2014-02-10 23:23:52,413 INFO  datanode.DataNode 
(BlockReceiver.java:sendOOBResponse(977)) - Sending an out of band ack of type 
OOB_TYPE1

[Upstream Datanode]
2014-02-10 23:23:52,413 INFO  datanode.DataNode (BlockReceiver.java:run(1060)) 
- Relaying an out of band ack of type OOB_TYPE1

[Client]
2014-02-10 23:23:52,414 WARN  hdfs.DFSClient (DFSOutputStream.java:run(784)) - 
DFSOutputStream ResponseProcessor exception  for block 
BP-203907574-10.0.1.17-1392096230619:blk_1073741825_1002
java.io.IOException: Bad response OOB_TYPE1 for block 
BP-203907574-10.0.1.17-1392096230619:blk_1073741825_1002 from datanode 
127.0.0.1:55182
at 
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:732)
{panel}

> Make DN send an OOB Ack on shutdown before restaring
> 
>
> Key: HDFS-5583
> URL: https://issues.apache.org/jira/browse/HDFS-5583
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
> Attachments: HDFS-5583.patch
>
>
> Add an ability for data nodes to send an OOB response in order to indicate an 
> upcoming upgrade-restart. The client should ignore the pipeline error from the 
> node for a configured amount of time and try to reconstruct the pipeline without 
> excluding the restarted node.  If the node does not come back in time, 
> regular pipeline recovery should happen.
> This feature is useful for applications that need to keep blocks local. 
> If the upgrade-restart is fast, the wait is preferable to losing locality.  
> It could also be used in general instead of the draining-writer strategy.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Comment Edited] (HDFS-5583) Make DN send an OOB Ack on shutdown before restaring

2014-02-10 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13897548#comment-13897548
 ] 

Kihwal Lee edited comment on HDFS-5583 at 2/11/14 5:37 AM:
---

The patch makes DN send OOB acks to clients who are writing.  The added test 
case currently doesn't do much, but after the client-side changes, it will be 
updated.  

The OOB Ack sending can still be verified from running the test new case. The 
test log should show something like following:

{panel}
[DataNode]
2014-02-10 23:23:52,412 INFO  datanode.DataNode 
(DataXceiverServer.java:run(190)) - Shutting down DataXceiverServer before 
restart
2014-02-10 23:23:52,412 INFO  datanode.DataNode 
(BlockReceiver.java:receiveBlock(731)) - Shutting down for restart 
(BP-203907574-10.0.1.17-1392096230619:blk_1073741825_1002).
2014-02-10 23:23:52,413 INFO  datanode.DataNode 
(BlockReceiver.java:sendOOBResponse(977)) - Sending an out of band ack of type 
OOB_TYPE1

[Upstream Datanode]
2014-02-10 23:23:52,413 INFO  datanode.DataNode (BlockReceiver.java:run(1060)) 
- Relaying an out of band ack of type OOB_TYPE1

[Client]
2014-02-10 23:23:52,414 WARN  hdfs.DFSClient (DFSOutputStream.java:run(784)) - 
DFSOutputStream ResponseProcessor exception  for block 
BP-203907574-10.0.1.17-1392096230619:blk_1073741825_1002
java.io.IOException: Bad response OOB_TYPE1 for block 
BP-203907574-10.0.1.17-1392096230619:blk_1073741825_1002 from datanode 
127.0.0.1:55182
at 
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:732)
{panel}


was (Author: kihwal):
The patch makes DN send OOB acks to clients who are writing.  The added test 
case currently doesn't do much, but after the client-side changes, it will be 
updated.  

The OOB Ack sending can still be verified from running the test new case. The 
test log should show something like following:

{panel}
[DataNode]
2014-02-10 23:23:52,412 INFO  datanode.DataNode 
(DataXceiverServer.java:run(190)) - Shutting down DataXceiverServer before 
restart
2014-02-10 23:23:52,412 INFO  datanode.DataNode 
(BlockReceiver.java:receiveBlock(731)) - Shutting down for restart 
(BP-203907574-10.0.1.17-1392096230619:blk_1073741825_1002).
2014-02-10 23:23:52,413 INFO  datanode.DataNode 
(BlockReceiver.java:sendOOBResponse(977)) - Sending an out of band ack of type 
OOB_TYPE1

[Upstream Datanode]
2014-02-10 23:23:52,413 INFO  datanode.DataNode (BlockReceiver.java:run(1060)) 
- Relaying an out of band ack of type OOB_TYPE

[Client]
2014-02-10 23:23:52,414 WARN  hdfs.DFSClient (DFSOutputStream.java:run(784)) - 
DFSOutputStream ResponseProcessor exception  for block 
BP-203907574-10.0.1.17-1392096230619:blk_1073741825_1002
java.io.IOException: Bad response OOB_TYPE1 for block 
BP-203907574-10.0.1.17-1392096230619:blk_1073741825_1002 from datanode 
127.0.0.1:55182
at 
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:732)
{panel}

> Make DN send an OOB Ack on shutdown before restaring
> 
>
> Key: HDFS-5583
> URL: https://issues.apache.org/jira/browse/HDFS-5583
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
> Attachments: HDFS-5583.patch
>
>
> Add an ability for data nodes to send an OOB response in order to indicate an 
> upcoming upgrade-restart. The client should ignore the pipeline error from the 
> node for a configured amount of time and try to reconstruct the pipeline without 
> excluding the restarted node.  If the node does not come back in time, 
> regular pipeline recovery should happen.
> This feature is useful for applications that need to keep blocks local. 
> If the upgrade-restart is fast, the wait is preferable to losing locality.  
> It could also be used in general instead of the draining-writer strategy.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Comment Edited] (HDFS-5583) Make DN send an OOB Ack on shutdown before restaring

2014-02-10 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13897548#comment-13897548
 ] 

Kihwal Lee edited comment on HDFS-5583 at 2/11/14 5:37 AM:
---

The patch makes DN send OOB acks to clients who are writing.  The added test 
case currently doesn't do much, but after the client-side changes, it will be 
updated.  

The OOB Ack sending can still be verified from running the test new case. The 
test log should show something like following:

{panel}
[DataNode]
2014-02-10 23:23:52,412 INFO  datanode.DataNode 
(DataXceiverServer.java:run(190)) - Shutting down DataXceiverServer before 
restart
2014-02-10 23:23:52,412 INFO  datanode.DataNode 
(BlockReceiver.java:receiveBlock(731)) - Shutting down for restart 
(BP-203907574-10.0.1.17-1392096230619:blk_1073741825_1002).
2014-02-10 23:23:52,413 INFO  datanode.DataNode 
(BlockReceiver.java:sendOOBResponse(977)) - Sending an out of band ack of type 
OOB_TYPE1

[Upstream Datanode]
2014-02-10 23:23:52,413 INFO  datanode.DataNode (BlockReceiver.java:run(1060)) 
- Relaying an out of band ack of type OOB_TYPE

[Client]
2014-02-10 23:23:52,414 WARN  hdfs.DFSClient (DFSOutputStream.java:run(784)) - 
DFSOutputStream ResponseProcessor exception  for block 
BP-203907574-10.0.1.17-1392096230619:blk_1073741825_1002
java.io.IOException: Bad response OOB_TYPE1 for block 
BP-203907574-10.0.1.17-1392096230619:blk_1073741825_1002 from datanode 
127.0.0.1:55182
at 
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:732)
{panel}


was (Author: kihwal):
The patch makes DN send OOB acks to clients who are writing.  The added test 
case currently doesn't do much, but after the client-side changes, it will be 
updated.  

The OOB Ack sending can still be verified from running the test new case. The 
test log should show something like following:

{noformat}
[DataNode]
2014-02-10 23:23:52,412 INFO  datanode.DataNode 
(DataXceiverServer.java:run(190)) - Shutting down DataXceiverServer before 
restart
2014-02-10 23:23:52,412 INFO  datanode.DataNode 
(BlockReceiver.java:receiveBlock(731)) - Shutting down for restart 
(BP-203907574-10.0.1.17-1392096230619:blk_1073741825_1002).
2014-02-10 23:23:52,413 INFO  datanode.DataNode 
(BlockReceiver.java:sendOOBResponse(977)) - Sending an out of band ack of type 
OOB_TYPE1

[Upstream Datanode]
2014-02-10 23:23:52,413 INFO  datanode.DataNode (BlockReceiver.java:run(1060)) 
- Relaying an out of band ack of type OOB_TYPE

[Client]
2014-02-10 23:23:52,414 WARN  hdfs.DFSClient (DFSOutputStream.java:run(784)) - 
DFSOutputStream ResponseProcessor exception  for block 
BP-203907574-10.0.1.17-1392096230619:blk_1073741825_1002
java.io.IOException: Bad response OOB_TYPE1 for block 
BP-203907574-10.0.1.17-1392096230619:blk_1073741825_1002 from datanode 
127.0.0.1:55182
at 
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:732)
{noformat}

> Make DN send an OOB Ack on shutdown before restaring
> 
>
> Key: HDFS-5583
> URL: https://issues.apache.org/jira/browse/HDFS-5583
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
> Attachments: HDFS-5583.patch
>
>
> Add an ability for data nodes to send an OOB response in order to indicate an 
> upcoming upgrade-restart. The client should ignore the pipeline error from the 
> node for a configured amount of time and try to reconstruct the pipeline without 
> excluding the restarted node.  If the node does not come back in time, 
> regular pipeline recovery should happen.
> This feature is useful for applications that need to keep blocks local. 
> If the upgrade-restart is fast, the wait is preferable to losing locality.  
> It could also be used in general instead of the draining-writer strategy.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5583) Make DN send an OOB Ack on shutdown before restaring

2014-02-10 Thread Kihwal Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated HDFS-5583:
-

Attachment: HDFS-5583.patch

The patch makes DN send OOB acks to clients who are writing.  The added test 
case currently doesn't do much, but after the client-side changes, it will be 
updated.  

The OOB Ack sending can still be verified from running the test new case. The 
test log should show something like following:

{noformat}
[DataNode]
2014-02-10 23:23:52,412 INFO  datanode.DataNode 
(DataXceiverServer.java:run(190)) - Shutting down DataXceiverServer before 
restart
2014-02-10 23:23:52,412 INFO  datanode.DataNode 
(BlockReceiver.java:receiveBlock(731)) - Shutting down for restart 
(BP-203907574-10.0.1.17-1392096230619:blk_1073741825_1002).
2014-02-10 23:23:52,413 INFO  datanode.DataNode 
(BlockReceiver.java:sendOOBResponse(977)) - Sending an out of band ack of type 
OOB_TYPE1

[Upstream Datanode]
2014-02-10 23:23:52,413 INFO  datanode.DataNode (BlockReceiver.java:run(1060)) 
- Relaying an out of band ack of type OOB_TYPE

[Client]
2014-02-10 23:23:52,414 WARN  hdfs.DFSClient (DFSOutputStream.java:run(784)) - 
DFSOutputStream ResponseProcessor exception  for block 
BP-203907574-10.0.1.17-1392096230619:blk_1073741825_1002
java.io.IOException: Bad response OOB_TYPE1 for block 
BP-203907574-10.0.1.17-1392096230619:blk_1073741825_1002 from datanode 
127.0.0.1:55182
at 
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:732)
{noformat}

> Make DN send an OOB Ack on shutdown before restaring
> 
>
> Key: HDFS-5583
> URL: https://issues.apache.org/jira/browse/HDFS-5583
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
> Attachments: HDFS-5583.patch
>
>
> Add an ability for data nodes to send an OOB response in order to indicate an 
> upcoming upgrade-restart. The client should ignore the pipeline error from the 
> node for a configured amount of time and try to reconstruct the pipeline without 
> excluding the restarted node.  If the node does not come back in time, 
> regular pipeline recovery should happen.
> This feature is useful for applications that need to keep blocks local. 
> If the upgrade-restart is fast, the wait is preferable to losing locality.  
> It could also be used in general instead of the draining-writer strategy.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5583) Make DN send an OOB Ack on shutdown before restaring

2014-02-10 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13897545#comment-13897545
 ] 

Kihwal Lee commented on HDFS-5583:
--

This jira depends on HDFS-5585. I will post a patch, which applies on top of 
HDFS-5585.

> Make DN send an OOB Ack on shutdown before restaring
> 
>
> Key: HDFS-5583
> URL: https://issues.apache.org/jira/browse/HDFS-5583
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
>
> Add an ability for data nodes to send an OOB response in order to indicate an 
> upcoming upgrade-restart. The client should ignore the pipeline error from the 
> node for a configured amount of time and try to reconstruct the pipeline without 
> excluding the restarted node.  If the node does not come back in time, 
> regular pipeline recovery should happen.
> This feature is useful for applications that need to keep blocks local. 
> If the upgrade-restart is fast, the wait is preferable to losing locality.  
> It could also be used in general instead of the draining-writer strategy.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Reopened] (HDFS-5396) FSImage.getFsImageName should check whether fsimage exists

2014-02-10 Thread zhaoyunjiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhaoyunjiong reopened HDFS-5396:



I made a mistake when I resolved this as Not A Problem.
Because
{code}
for (Iterator it = dirIterator(NameNodeDirType.IMAGE); it.hasNext();)
  sd = it.next();
{code}
will return the last image StorageDirectory, but due to HDFS-5367 that directory 
may not contain an fsimage.
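
A rough sketch of the kind of check needed (the helper names are assumptions 
based on the snippet above, not necessarily the actual branch-1 code):
{code}
File getFsImageName() {
  for (Iterator<StorageDirectory> it = dirIterator(NameNodeDirType.IMAGE);
       it.hasNext();) {
    StorageDirectory sd = it.next();
    File fsImage = getImageFile(sd, NameNodeFile.IMAGE); // assumed helper
    if (fsImage.exists()) {
      // Only return a storage directory that actually contains an fsimage.
      return fsImage;
    }
  }
  return null;
}
{code}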

> FSImage.getFsImageName should check whether fsimage exists
> --
>
> Key: HDFS-5396
> URL: https://issues.apache.org/jira/browse/HDFS-5396
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 1.2.1
>Reporter: zhaoyunjiong
>Assignee: zhaoyunjiong
> Fix For: 1.3.0
>
> Attachments: HDFS-5396-branch-1.2.patch
>
>
> In https://issues.apache.org/jira/browse/HDFS-5367, the fsimage may not be 
> written to every IMAGE dir, so we need to check whether the fsimage exists 
> before FSImage.getFsImageName returns.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5919) FileJournalManager doesn't purge empty and corrupt inprogress edits files

2014-02-10 Thread Vinayakumar B (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinayakumar B updated HDFS-5919:


Status: Patch Available  (was: Open)

> FileJournalManager doesn't purge empty and corrupt inprogress edits files
> -
>
> Key: HDFS-5919
> URL: https://issues.apache.org/jira/browse/HDFS-5919
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: Vinayakumar B
>Assignee: Vinayakumar B
> Attachments: HDFS-5919.patch
>
>
> FileJournalManager doesn't purge empty and corrupt inprogress edit files.
> These stale files accumulate over time.
> They should be cleared along with the purging of other edit logs.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5899) Add configuration flag to disable/enable support for ACLs.

2014-02-10 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13897519#comment-13897519
 ] 

Colin Patrick McCabe commented on HDFS-5899:


bq. Here is a compromise proposal. Let's reject the API calls when 
dfs.namenode.acls.enabled is false, but let's still load and enforce all 
existing ACLs found in fsimage or edits.

Sounds reasonable.
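
For reference, an administrator who wants ACLs would then opt in explicitly in 
hdfs-site.xml. A minimal example (only the property name comes from this issue; 
the rest is standard configuration boilerplate):
{code}
<property>
  <name>dfs.namenode.acls.enabled</name>
  <value>true</value>
  <description>Set to true to enable support for HDFS ACLs; ACL API calls are
  rejected when false.</description>
</property>
{code}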

> Add configuration flag to disable/enable support for ACLs.
> --
>
> Key: HDFS-5899
> URL: https://issues.apache.org/jira/browse/HDFS-5899
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Affects Versions: HDFS ACLs (HDFS-4685)
>Reporter: Chris Nauroth
>Assignee: Chris Nauroth
> Fix For: HDFS ACLs (HDFS-4685)
>
> Attachments: HDFS-5899.1.patch, HDFS-5899.2.patch
>
>
> Add a new configuration property that allows administrators to toggle support 
> for HDFS ACLs on/off.  By default, the flag will be off.  This is a 
> conservative choice, and administrators interested in using ACLs can enable 
> it explicitly.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5810) Unify mmap cache and short-circuit file descriptor cache

2014-02-10 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13897518#comment-13897518
 ] 

Hadoop QA commented on HDFS-5810:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12628100/HDFS-5810.019.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 12 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs:

  org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/6104//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6104//console

This message is automatically generated.

> Unify mmap cache and short-circuit file descriptor cache
> 
>
> Key: HDFS-5810
> URL: https://issues.apache.org/jira/browse/HDFS-5810
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Affects Versions: 2.3.0
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
> Attachments: HDFS-5810.001.patch, HDFS-5810.004.patch, 
> HDFS-5810.006.patch, HDFS-5810.008.patch, HDFS-5810.015.patch, 
> HDFS-5810.016.patch, HDFS-5810.018.patch, HDFS-5810.019.patch, 
> HDFS-5810.020.patch
>
>
> We should unify the client mmap cache and the client file descriptor cache.  
> Since mmaps are granted corresponding to file descriptors in the cache 
> (currently FileInputStreamCache), they have to be tracked together to do 
> "smarter" things like HDFS-5182.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5917) Have an ability to refresh deadNodes list periodically

2014-02-10 Thread Liang Xie (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liang Xie updated HDFS-5917:


Description: In the current HBase + HDFS trunk implementation, once a node is 
added to the deadNodes map, it cannot be chosen again until deadNodes.clear() is 
invoked. When I fixed HDFS-5637 I had a rough thought: since quite a few 
conditions can cause a node to be added to the deadNodes map, it would be better 
to have an ability to refresh this cached map automatically. This is good for 
the HBase scenario at least; e.g. before HDFS-5637 was fixed, if a local node 
was added to deadNodes, reads went remote even though the local node was 
actually live :) Even more unfortunately, if the block is in a huge HFile that 
is not picked up by any minor compaction for a while, the performance penalty 
continues until a major compaction, a region reopen, or deadNodes.clear() is 
invoked...  (was: In current HBase + HDFS trunk 
impl, if one node is inserted into deadNodes list, before deadNodes.clear() be 
invoked, this node could not be choose always. When i fixed HDFS-5637, i had a 
raw thought, since there're not a few conditions could trigger a node be 
inserted into deadNodes,  we should have an ability to refresh this important 
cache list info automaticly. It's benefit for HBase scenario at least, e.g. 
before HDFS-5637 fixed, if a local node be inserted into deadNodes, then it 
will read remotely even the local node is not dead:) if more unfortunately, 
this block is in a huge HFile which doesn't be picked into any minor compaction 
in short period, the performance penality will be continued until a large 
compaction or region reopend or deadNodes.clear() be invoked...)
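
A minimal sketch of the idea (names are illustrative only, not taken from the 
attached patch): record when a node entered deadNodes and stop treating it as 
dead once a configurable interval has elapsed.
{code}
// Sketch: expire deadNodes entries after a configurable refresh interval.
private final Map<DatanodeInfo, Long> deadNodes =
    new ConcurrentHashMap<DatanodeInfo, Long>();

void addToDeadNodes(DatanodeInfo dn) {
  deadNodes.put(dn, System.currentTimeMillis());
}

boolean isDeadNode(DatanodeInfo dn, long refreshIntervalMs) {
  Long deadSince = deadNodes.get(dn);
  if (deadSince == null) {
    return false;
  }
  if (System.currentTimeMillis() - deadSince > refreshIntervalMs) {
    deadNodes.remove(dn); // give the node another chance after the interval
    return false;
  }
  return true;
}
{code}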

> Have an ability to refresh deadNodes list periodically
> --
>
> Key: HDFS-5917
> URL: https://issues.apache.org/jira/browse/HDFS-5917
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.0.0, 2.2.0
>Reporter: Liang Xie
>Assignee: Liang Xie
> Attachments: HDFS-5917.txt
>
>
> In the current HBase + HDFS trunk implementation, once a node is added to the 
> deadNodes map, it cannot be chosen again until deadNodes.clear() is invoked. 
> When I fixed HDFS-5637 I had a rough thought: since quite a few conditions can 
> cause a node to be added to the deadNodes map, it would be better to have an 
> ability to refresh this cached map automatically. This is good for the HBase 
> scenario at least; e.g. before HDFS-5637 was fixed, if a local node was added 
> to deadNodes, reads went remote even though the local node was actually live :) 
> Even more unfortunately, if the block is in a huge HFile that is not picked up 
> by any minor compaction for a while, the performance penalty continues until a 
> major compaction, a region reopen, or deadNodes.clear() is invoked...



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5583) Make DN send an OOB Ack on shutdown before restaring

2014-02-10 Thread Kihwal Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated HDFS-5583:
-

Summary: Make DN send an OOB Ack on shutdown before restaring  (was: Make 
DN send an OOB Ack on upgrade-shutdown)

> Make DN send an OOB Ack on shutdown before restaring
> 
>
> Key: HDFS-5583
> URL: https://issues.apache.org/jira/browse/HDFS-5583
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
>
> Add an ability for data nodes to send an OOB response in order to indicate an 
> upcoming upgrade-restart. The client should ignore the pipeline error from the 
> node for a configured amount of time and try to reconstruct the pipeline without 
> excluding the restarted node.  If the node does not come back in time, 
> regular pipeline recovery should happen.
> This feature is useful for applications that need to keep blocks local. 
> If the upgrade-restart is fast, the wait is preferable to losing locality.  
> It could also be used in general instead of the draining-writer strategy.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5583) Make DN send an OOB Ack on upgrade-shutdown

2014-02-10 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13897509#comment-13897509
 ] 

Kihwal Lee commented on HDFS-5583:
--

The client-side logic will be done in HDFS-5924. If the client-side change is 
missing, the OOB ack will simply be treated as an error by clients.

> Make DN send an OOB Ack on upgrade-shutdown
> ---
>
> Key: HDFS-5583
> URL: https://issues.apache.org/jira/browse/HDFS-5583
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
>
> Add an ability for data nodes to send an OOB response in order to indicate an 
> upcoming upgrade-restart. The client should ignore the pipeline error from the 
> node for a configured amount of time and try to reconstruct the pipeline without 
> excluding the restarted node.  If the node does not come back in time, 
> regular pipeline recovery should happen.
> This feature is useful for applications that need to keep blocks local. 
> If the upgrade-restart is fast, the wait is preferable to losing locality.  
> It could also be used in general instead of the draining-writer strategy.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5583) Make DN send an OOB Ack on upgrade-shutdown

2014-02-10 Thread Kihwal Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated HDFS-5583:
-

Summary: Make DN send an OOB Ack on upgrade-shutdown  (was: Add OOB upgrade 
response and client-side logic for writes)

> Make DN send an OOB Ack on upgrade-shutdown
> ---
>
> Key: HDFS-5583
> URL: https://issues.apache.org/jira/browse/HDFS-5583
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
>
> Add an ability for data nodes to send an OOB response in order to indicate an 
> upcoming upgrade-restart. The client should ignore the pipeline error from the 
> node for a configured amount of time and try to reconstruct the pipeline without 
> excluding the restarted node.  If the node does not come back in time, 
> regular pipeline recovery should happen.
> This feature is useful for applications that need to keep blocks local. 
> If the upgrade-restart is fast, the wait is preferable to losing locality.  
> It could also be used in general instead of the draining-writer strategy.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (HDFS-5924) Client-side OOB upgrade message processing for writes

2014-02-10 Thread Kihwal Lee (JIRA)
Kihwal Lee created HDFS-5924:


 Summary: Client-side OOB upgrade message processing for writes
 Key: HDFS-5924
 URL: https://issues.apache.org/jira/browse/HDFS-5924
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Kihwal Lee
Assignee: Kihwal Lee






--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5810) Unify mmap cache and short-circuit file descriptor cache

2014-02-10 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-5810:
---

Attachment: HDFS-5810.020.patch

> Unify mmap cache and short-circuit file descriptor cache
> 
>
> Key: HDFS-5810
> URL: https://issues.apache.org/jira/browse/HDFS-5810
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Affects Versions: 2.3.0
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
> Attachments: HDFS-5810.001.patch, HDFS-5810.004.patch, 
> HDFS-5810.006.patch, HDFS-5810.008.patch, HDFS-5810.015.patch, 
> HDFS-5810.016.patch, HDFS-5810.018.patch, HDFS-5810.019.patch, 
> HDFS-5810.020.patch
>
>
> We should unify the client mmap cache and the client file descriptor cache.  
> Since mmaps are granted corresponding to file descriptors in the cache 
> (currently FileInputStreamCache), they have to be tracked together to do 
> "smarter" things like HDFS-5182.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-4858) HDFS DataNode to NameNode RPC should timeout

2014-02-10 Thread Wilfred Spiegelenburg (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13897505#comment-13897505
 ] 

Wilfred Spiegelenburg commented on HDFS-4858:
-

I agree with Aaron: this could be a simple change confined to 
DatanodeProtocolClientSideTranslatorPB that leaves the Client as is. That would 
remove the chance of regressions in other areas that rely on the Client.
Whether you want to use Client#getTimeout or Client#getPingInterval is up 
to you to decide.

> HDFS DataNode to NameNode RPC should timeout
> 
>
> Key: HDFS-4858
> URL: https://issues.apache.org/jira/browse/HDFS-4858
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 3.0.0, 2.1.0-beta, 2.0.4-alpha, 2.0.5-alpha
> Environment: Redhat/CentOS 6.4 64 bit Linux
>Reporter: Jagane Sundar
>Assignee: Konstantin Boudnik
>Priority: Minor
> Fix For: 3.0.0, 2.3.0
>
> Attachments: HDFS-4858.patch, HDFS-4858.patch
>
>
> The DataNode is configured with ipc.client.ping false and ipc.ping.interval 
> 14000. This configuration means that the IPC Client (the DataNode, in this case) 
> should time out after 14000 milliseconds if the Standby NameNode does not respond to a 
> sendHeartbeat.
> What we observe is this: If the Standby NameNode happens to reboot for any 
> reason, the DataNodes that are heartbeating to this Standby get stuck forever 
> while trying to sendHeartbeat. See Stack trace included below. When the 
> Standby NameNode comes back up, we find that the DataNode never re-registers 
> with the Standby NameNode. Thereafter failover completely fails.
> The desired behavior is that the DataNode's sendHeartbeat should timeout in 
> 14 seconds, and keep retrying till the Standby NameNode comes back up. When 
> it does, the DataNode should reconnect, re-register, and offer service.
> Specifically, in the class DatanodeProtocolClientSideTranslatorPB.java, the 
> method createNamenode should use RPC.getProtocolProxy and not RPC.getProxy to 
> create the DatanodeProtocolPB object.
> Stack trace of thread stuck in the DataNode after the Standby NN has rebooted:
> Thread 25 (DataNode: [file:///opt/hadoop/data]  heartbeating to 
> vmhost6-vm1/10.10.10.151:8020):
>   State: WAITING
>   Blocked count: 23843
>   Waited count: 45676
>   Waiting on org.apache.hadoop.ipc.Client$Call@305ab6c5
>   Stack:
> java.lang.Object.wait(Native Method)
> java.lang.Object.wait(Object.java:485)
> org.apache.hadoop.ipc.Client.call(Client.java:1220)
> 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
> sun.proxy.$Proxy10.sendHeartbeat(Unknown Source)
> sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
> 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> java.lang.reflect.Method.invoke(Method.java:597)
> 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
> 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
> sun.proxy.$Proxy10.sendHeartbeat(Unknown Source)
> 
> org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolClientSideTranslatorPB.sendHeartbeat(DatanodeProtocolClientSideTranslatorPB.java:167)
> 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.sendHeartBeat(BPServiceActor.java:445)
> 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:525)
> 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:676)
> java.lang.Thread.run(Thread.java:662)
> DataNode RPC to Standby NameNode never times out. 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5888) Cannot get the FileStatus of the root inode from the new Globber

2014-02-10 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13897478#comment-13897478
 ] 

Andrew Wang commented on HDFS-5888:
---

I poked around to debug the failing test. It turns out we have some FC-only code 
in TestGlobPaths#TestGlobFillsInScheme:

{code}
  if (fc != null) {
// If we're using FileContext, then we can list a file:/// URI.
// Since everyone should have the root directory, we list that.
statuses = wrap.globStatus(new Path("file:///"),
new AcceptAllPathFilter());
Assert.assertEquals(1, statuses.length);
Path filePath = statuses[0].getPath();
Assert.assertEquals("file", filePath.toUri().getScheme());
Assert.assertEquals("/", filePath.toUri().getPath());
  }
{code}

The tricky part here is that the default filesystem for this FileContext is an 
HDFS, which is why Jenkins is picking up "localhost:port" for the authority in 
Globber#authorityFromPath:

{code}
authority = fc.getDefaultFileSystem().getUri().getAuthority();
{code}

If I change it to this, the test passes:

{code}
authority = fc.getFSofPath(path).getUri().getAuthority();
{code}

I think the error stems from how file:// URIs have a null authority, and we 
shouldn't fill it in. I think the fix is to use getFSofPath for both FC and FS 
in authorityFromPath.
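
So the FileContext branch of {{Globber#authorityFromPath}} would end up looking 
roughly like this (a sketch; the surrounding method structure is assumed, and the 
FS branch is left as a plain {{fs.getUri()}} lookup here even though it may need 
the same per-path treatment):
{code}
private String authorityFromPath(Path path) throws IOException {
  String authority = path.toUri().getAuthority();
  if (authority == null) {
    if (fs != null) {
      authority = fs.getUri().getAuthority();
    } else {
      // Resolve the filesystem for this particular path rather than the
      // default filesystem, so a file:/// path does not pick up the
      // authority of the default (HDFS) filesystem.
      authority = fc.getFSofPath(path).getUri().getAuthority();
    }
  }
  return authority;
}
{code}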

> Cannot get the FileStatus of the root inode from the new Globber
> 
>
> Key: HDFS-5888
> URL: https://issues.apache.org/jira/browse/HDFS-5888
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.3.0
>Reporter: Andrew Wang
>Assignee: Colin Patrick McCabe
> Attachments: HDFS-5888.002.patch
>
>
> We can no longer get the correct FileStatus of the root inode "/" from the 
> Globber.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5923) Do not persist the ACL bit in the FsPermission

2014-02-10 Thread Fengdong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13897475#comment-13897475
 ] 

Fengdong Yu commented on HDFS-5923:
---

Thanks Zhao Jing.

Another question: HDFS-5698 has serialized the FsImage using protobuf, so does 
that also serialize the ACL state?  I don't think we've done it, because 
HDFS-4685 is not merged to trunk yet.


> Do not persist the ACL bit in the FsPermission
> --
>
> Key: HDFS-5923
> URL: https://issues.apache.org/jira/browse/HDFS-5923
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client, namenode, security
>Reporter: Haohui Mai
>Assignee: Haohui Mai
>
> The current implementation persists an ACL bit in the FSImage and edit logs. 
> Moreover, security decisions also depend on whether the bit is set.
> The problem here is that we have to maintain the implicit invariant that the 
> ACL bit is set if and only if the inode has an AclFeature. The invariant has 
> to be maintained everywhere; otherwise it can lead to a security 
> vulnerability. In the worst case, an attacker could toggle the bit and bypass 
> the ACL checks.
> The jira proposes to treat the ACL bit as a transient bit. The bit should not 
> be persisted onto disk, nor should it affect any security decisions.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-4858) HDFS DataNode to NameNode RPC should timeout

2014-02-10 Thread Aaron T. Myers (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13897468#comment-13897468
 ] 

Aaron T. Myers commented on HDFS-4858:
--

bq. It may seem a simple fix, but you are absolutely right this will affect 
everything that is using Client.java, and there is a lot of things our there, 
such as your TaskTracker, which we don't know about, but can break them because 
of that. Why don't we open a separate jira for your proposal.

If you want to do a small fix that is just isolated to the DN, and has no 
farther-reaching implications, then my suggestion would be to remove the 
changes to {{Client}} from this patch and change the call to 
{{Client#getTimeout}} in 
{{DatanodeProtocolClientSideTranslatorPB#createNamenode}} to instead call 
{{Client#getPingInterval}}. This should have the same net effect for DN RPCs 
without possibly impacting anything else.
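
A sketch of that DN-confined version (the surrounding {{createNamenode}} body and 
the accessibility of {{Client#getPingInterval}} from the translator are assumed 
here, as the comment implies; only the timeout argument changes):
{code}
// Inside DatanodeProtocolClientSideTranslatorPB#createNamenode (sketch):
return RPC.getProxy(DatanodeProtocolPB.class,
    RPC.getProtocolVersion(DatanodeProtocolPB.class),
    nameNodeAddr, ugi, conf,
    NetUtils.getSocketFactory(conf, DatanodeProtocolPB.class),
    Client.getPingInterval(conf));   // was: Client.getTimeout(conf)
{code}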

> HDFS DataNode to NameNode RPC should timeout
> 
>
> Key: HDFS-4858
> URL: https://issues.apache.org/jira/browse/HDFS-4858
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 3.0.0, 2.1.0-beta, 2.0.4-alpha, 2.0.5-alpha
> Environment: Redhat/CentOS 6.4 64 bit Linux
>Reporter: Jagane Sundar
>Assignee: Konstantin Boudnik
>Priority: Minor
> Fix For: 3.0.0, 2.3.0
>
> Attachments: HDFS-4858.patch, HDFS-4858.patch
>
>
> The DataNode is configured with ipc.client.ping false and ipc.ping.interval 
> 14000. This configuration means that the IPC Client (the DataNode, in this case) 
> should time out after 14000 milliseconds if the Standby NameNode does not respond to a 
> sendHeartbeat.
> What we observe is this: If the Standby NameNode happens to reboot for any 
> reason, the DataNodes that are heartbeating to this Standby get stuck forever 
> while trying to sendHeartbeat. See Stack trace included below. When the 
> Standby NameNode comes back up, we find that the DataNode never re-registers 
> with the Standby NameNode. Thereafter failover completely fails.
> The desired behavior is that the DataNode's sendHeartbeat should timeout in 
> 14 seconds, and keep retrying till the Standby NameNode comes back up. When 
> it does, the DataNode should reconnect, re-register, and offer service.
> Specifically, in the class DatanodeProtocolClientSideTranslatorPB.java, the 
> method createNamenode should use RPC.getProtocolProxy and not RPC.getProxy to 
> create the DatanodeProtocolPB object.
> Stack trace of thread stuck in the DataNode after the Standby NN has rebooted:
> Thread 25 (DataNode: [file:///opt/hadoop/data]  heartbeating to 
> vmhost6-vm1/10.10.10.151:8020):
>   State: WAITING
>   Blocked count: 23843
>   Waited count: 45676
>   Waiting on org.apache.hadoop.ipc.Client$Call@305ab6c5
>   Stack:
> java.lang.Object.wait(Native Method)
> java.lang.Object.wait(Object.java:485)
> org.apache.hadoop.ipc.Client.call(Client.java:1220)
> 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
> sun.proxy.$Proxy10.sendHeartbeat(Unknown Source)
> sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
> 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> java.lang.reflect.Method.invoke(Method.java:597)
> 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
> 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
> sun.proxy.$Proxy10.sendHeartbeat(Unknown Source)
> 
> org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolClientSideTranslatorPB.sendHeartbeat(DatanodeProtocolClientSideTranslatorPB.java:167)
> 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.sendHeartBeat(BPServiceActor.java:445)
> 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:525)
> 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:676)
> java.lang.Thread.run(Thread.java:662)
> DataNode RPC to Standby NameNode never times out. 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5923) Do not persist the ACL bit in the FsPermission

2014-02-10 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13897466#comment-13897466
 ] 

Jing Zhao commented on HDFS-5923:
-

Hi Fengdong, here Haohui refers to the ACL bit, not the whole ACL state. The 
ACL information will still be persisted in editlog and fsimage.

> Do not persist the ACL bit in the FsPermission
> --
>
> Key: HDFS-5923
> URL: https://issues.apache.org/jira/browse/HDFS-5923
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client, namenode, security
>Reporter: Haohui Mai
>Assignee: Haohui Mai
>
> The current implementation persists an ACL bit in the FSImage and edit logs. 
> Moreover, security decisions also depend on whether the bit is set.
> The problem here is that we have to maintain the implicit invariant that the 
> ACL bit is set if and only if the inode has an AclFeature. The invariant has 
> to be maintained everywhere; otherwise it can lead to a security 
> vulnerability. In the worst case, an attacker could toggle the bit and bypass 
> the ACL checks.
> The jira proposes to treat the ACL bit as a transient bit. The bit should not 
> be persisted onto disk, nor should it affect any security decisions.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5923) Do not persist the ACL bit in the FsPermission

2014-02-10 Thread Fengdong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13897462#comment-13897462
 ] 

Fengdong Yu commented on HDFS-5923:
---

Does that mean all ACL settings disappear after a NN restart if we don't persist 
the ACL state in the FsImage?

> Do not persist the ACL bit in the FsPermission
> --
>
> Key: HDFS-5923
> URL: https://issues.apache.org/jira/browse/HDFS-5923
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client, namenode, security
>Reporter: Haohui Mai
>Assignee: Haohui Mai
>
> The current implementation persists an ACL bit in the FSImage and edit logs. 
> Moreover, security decisions also depend on whether the bit is set.
> The problem here is that we have to maintain the implicit invariant that the 
> ACL bit is set if and only if the inode has an AclFeature. The invariant has 
> to be maintained everywhere; otherwise it can lead to a security 
> vulnerability. In the worst case, an attacker could toggle the bit and bypass 
> the ACL checks.
> The jira proposes to treat the ACL bit as a transient bit. The bit should not 
> be persisted onto disk, nor should it affect any security decisions.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (HDFS-5923) Do not persist the ACL bit in the FsPermission

2014-02-10 Thread Haohui Mai (JIRA)
Haohui Mai created HDFS-5923:


 Summary: Do not persist the ACL bit in the FsPermission
 Key: HDFS-5923
 URL: https://issues.apache.org/jira/browse/HDFS-5923
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Haohui Mai
Assignee: Haohui Mai


The current implementation persists an ACL bit in the FSImage and editlogs. 
Moreover, security decisions also depend on whether the bit is set.

The problem here is that we have to maintain the implicit invariant that the 
ACL bit is set if and only if the inode has an AclFeature. The invariant has 
to be maintained everywhere, otherwise it can lead to a security 
vulnerability. In the worst case, an attacker can toggle the bit and bypass the 
ACL checks.

This jira proposes to treat the ACL bit as a transient bit. The bit should not 
be persisted onto disk, nor should it affect any security decisions.
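
For illustration only, a minimal sketch of the transient-bit idea (the class below 
is hypothetical and not the actual FsPermission/INode code): the bit is computed 
from the presence of an AclFeature, never written to disk, and never consulted 
when enforcing permissions.

{code}
// Hypothetical sketch, not the real HDFS classes.
class InodeAclView {
  private final Object aclFeature; // null when the inode carries no ACL

  InodeAclView(Object aclFeature) {
    this.aclFeature = aclFeature;
  }

  /** Transient ACL bit: derived on the fly, never persisted. */
  boolean getAclBit() {
    return aclFeature != null;
  }

  /**
   * Enforcement looks at the ACL itself (or the plain permission bits),
   * never at the transient bit, so a flipped bit cannot widen access.
   */
  boolean checkAccess(String user) {
    if (aclFeature != null) {
      // evaluate the ACL entries for this user here
    }
    // otherwise fall back to the plain owner/group/other bits
    return false; // placeholder
  }
}
{code}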





--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-4858) HDFS DataNode to NameNode RPC should timeout

2014-02-10 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13897452#comment-13897452
 ] 

Konstantin Shvachko commented on HDFS-4858:
---

Ok, sounds like you don't want it fixed in this release.

> HDFS DataNode to NameNode RPC should timeout
> 
>
> Key: HDFS-4858
> URL: https://issues.apache.org/jira/browse/HDFS-4858
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 3.0.0, 2.1.0-beta, 2.0.4-alpha, 2.0.5-alpha
> Environment: Redhat/CentOS 6.4 64 bit Linux
>Reporter: Jagane Sundar
>Assignee: Konstantin Boudnik
>Priority: Minor
> Fix For: 3.0.0, 2.3.0
>
> Attachments: HDFS-4858.patch, HDFS-4858.patch
>
>
> The DataNode is configured with ipc.client.ping false and ipc.ping.interval 
> 14000. This configuration means that the IPC Client (the DataNode, in this case) 
> should time out in 14000 milliseconds (14 seconds) if the Standby NameNode does 
> not respond to a sendHeartbeat.
> What we observe is this: If the Standby NameNode happens to reboot for any 
> reason, the DataNodes that are heartbeating to this Standby get stuck forever 
> while trying to sendHeartbeat. See Stack trace included below. When the 
> Standby NameNode comes back up, we find that the DataNode never re-registers 
> with the Standby NameNode. Thereafter failover completely fails.
> The desired behavior is that the DataNode's sendHeartbeat should time out in 
> 14 seconds and keep retrying until the Standby NameNode comes back up. When 
> it does, the DataNode should reconnect, re-register, and offer service.
> Specifically, in the class DatanodeProtocolClientSideTranslatorPB.java, the 
> method createNamenode should use RPC.getProtocolProxy and not RPC.getProxy to 
> create the DatanodeProtocolPB object.
> Stack trace of thread stuck in the DataNode after the Standby NN has rebooted:
> Thread 25 (DataNode: [file:///opt/hadoop/data]  heartbeating to 
> vmhost6-vm1/10.10.10.151:8020):
>   State: WAITING
>   Blocked count: 23843
>   Waited count: 45676
>   Waiting on org.apache.hadoop.ipc.Client$Call@305ab6c5
>   Stack:
> java.lang.Object.wait(Native Method)
> java.lang.Object.wait(Object.java:485)
> org.apache.hadoop.ipc.Client.call(Client.java:1220)
> 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
> sun.proxy.$Proxy10.sendHeartbeat(Unknown Source)
> sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
> 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> java.lang.reflect.Method.invoke(Method.java:597)
> 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
> 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
> sun.proxy.$Proxy10.sendHeartbeat(Unknown Source)
> 
> org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolClientSideTranslatorPB.sendHeartbeat(DatanodeProtocolClientSideTranslatorPB.java:167)
> 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.sendHeartBeat(BPServiceActor.java:445)
> 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:525)
> 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:676)
> java.lang.Thread.run(Thread.java:662)
> DataNode RPC to Standby NameNode never times out. 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5810) Unify mmap cache and short-circuit file descriptor cache

2014-02-10 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13897447#comment-13897447
 ] 

Colin Patrick McCabe commented on HDFS-5810:


munmap is going to be manipulating things in memory; mmap often has to hit 
disk.  That's why the latter is more expensive.  Recent Linux kernels have more 
fine-grained locking in this area, although I'm not an expert on that area of 
the kernel.  We can't do I/O while holding a global client-side lock; clients 
like HBase have on the order of 10k open files and we don't want to block 
everyone.

bq. ClientContext#getFromConf, can we push the creation of a new DFSClient.Conf 
into #get when it's necessary? Seems better to avoid doing all those hash 
lookups.

That method is really only for tests, where it's inconvenient to dig around to 
get a DFSClient.Conf.  I will add a comment explaining that this is mostly for 
testing.  (I think JspHelper uses it too.)

bq. We removed the javadoc parameter descriptions in a few places, some of 
which were helpful (e.g. len of -1 means read as many bytes as possible). Could 
we add the one-line docs back to the builder variables?

Good idea.  I added javadoc for the BlockReaderFactory members.

bq. Mind adding "dfs.client.cached.conn.retry" to hdfs-default.xml?

OK.

bq. cacheTries now counts down instead of counting up, so I think it needs a 
new name. cacheTriesRemaining isn't great, but something like that.

ok

bq. cacheTries used to also only tick when we got a stale peer out of the 
cache. Now, nextTcpPeer and nextDomainPeer tick cacheTries unconditionally.

The effect is the same, since if we get a non-stale (i.e. usable) peer out of 
the cache, we're done.  Centralizing it is a good idea since it avoids the kind 
of bugs we had in the past where we forgot to handle certain kinds of retries 
correctly.

bq. Previously, we would disable domain sockets or throw an exception if we hit 
an error when using a new Peer (domain or TCP respectively). Now, we don't know 
if a peer is cached or new, and spin until we run out of cacheTries (which 
isn't really related here).

OK, that's fair.  That variable is supposed to be about how many times we'll 
try the *cache*, not how many times we'll retry in general.  Fixed.

> Unify mmap cache and short-circuit file descriptor cache
> 
>
> Key: HDFS-5810
> URL: https://issues.apache.org/jira/browse/HDFS-5810
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Affects Versions: 2.3.0
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
> Attachments: HDFS-5810.001.patch, HDFS-5810.004.patch, 
> HDFS-5810.006.patch, HDFS-5810.008.patch, HDFS-5810.015.patch, 
> HDFS-5810.016.patch, HDFS-5810.018.patch, HDFS-5810.019.patch
>
>
> We should unify the client mmap cache and the client file descriptor cache.  
> Since mmaps are granted corresponding to file descriptors in the cache 
> (currently FileInputStreamCache), they have to be tracked together to do 
> "smarter" things like HDFS-5182.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-4858) HDFS DataNode to NameNode RPC should timeout

2014-02-10 Thread Aaron T. Myers (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13897445#comment-13897445
 ] 

Aaron T. Myers commented on HDFS-4858:
--

Even if we choose to not fix this in a more general way in this JIRA, I don't 
think we should be changing the default behavior of whether or not to do client 
pings in this patch. That change also has the potential to affect things well 
beyond HDFS.

> HDFS DataNode to NameNode RPC should timeout
> 
>
> Key: HDFS-4858
> URL: https://issues.apache.org/jira/browse/HDFS-4858
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 3.0.0, 2.1.0-beta, 2.0.4-alpha, 2.0.5-alpha
> Environment: Redhat/CentOS 6.4 64 bit Linux
>Reporter: Jagane Sundar
>Assignee: Konstantin Boudnik
>Priority: Minor
> Fix For: 3.0.0, 2.3.0
>
> Attachments: HDFS-4858.patch, HDFS-4858.patch
>
>
> The DataNode is configured with ipc.client.ping false and ipc.ping.interval 
> 14000. This configuration means that the IPC Client (the DataNode, in this case) 
> should time out in 14000 milliseconds (14 seconds) if the Standby NameNode does 
> not respond to a sendHeartbeat.
> What we observe is this: If the Standby NameNode happens to reboot for any 
> reason, the DataNodes that are heartbeating to this Standby get stuck forever 
> while trying to sendHeartbeat. See Stack trace included below. When the 
> Standby NameNode comes back up, we find that the DataNode never re-registers 
> with the Standby NameNode. Thereafter failover completely fails.
> The desired behavior is that the DataNode's sendHeartbeat should time out in 
> 14 seconds and keep retrying until the Standby NameNode comes back up. When 
> it does, the DataNode should reconnect, re-register, and offer service.
> Specifically, in the class DatanodeProtocolClientSideTranslatorPB.java, the 
> method createNamenode should use RPC.getProtocolProxy and not RPC.getProxy to 
> create the DatanodeProtocolPB object.
> Stack trace of thread stuck in the DataNode after the Standby NN has rebooted:
> Thread 25 (DataNode: [file:///opt/hadoop/data]  heartbeating to 
> vmhost6-vm1/10.10.10.151:8020):
>   State: WAITING
>   Blocked count: 23843
>   Waited count: 45676
>   Waiting on org.apache.hadoop.ipc.Client$Call@305ab6c5
>   Stack:
> java.lang.Object.wait(Native Method)
> java.lang.Object.wait(Object.java:485)
> org.apache.hadoop.ipc.Client.call(Client.java:1220)
> 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
> sun.proxy.$Proxy10.sendHeartbeat(Unknown Source)
> sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
> 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> java.lang.reflect.Method.invoke(Method.java:597)
> 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
> 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
> sun.proxy.$Proxy10.sendHeartbeat(Unknown Source)
> 
> org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolClientSideTranslatorPB.sendHeartbeat(DatanodeProtocolClientSideTranslatorPB.java:167)
> 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.sendHeartBeat(BPServiceActor.java:445)
> 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:525)
> 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:676)
> java.lang.Thread.run(Thread.java:662)
> DataNode RPC to Standby NameNode never times out. 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-4858) HDFS DataNode to NameNode RPC should timeout

2014-02-10 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13897431#comment-13897431
 ] 

Konstantin Shvachko commented on HDFS-4858:
---

I understand now what you mean. So this is the same problem, but you want to 
fix it in a generic way.
It may seem like a simple fix, but you are absolutely right that it will affect 
everything that uses Client.java, and there are a lot of things out there, 
such as your TaskTracker, that we don't know about but could break because 
of this change. Why don't we open a separate jira for your proposal?
Are you OK with committing this?

My +1

> HDFS DataNode to NameNode RPC should timeout
> 
>
> Key: HDFS-4858
> URL: https://issues.apache.org/jira/browse/HDFS-4858
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 3.0.0, 2.1.0-beta, 2.0.4-alpha, 2.0.5-alpha
> Environment: Redhat/CentOS 6.4 64 bit Linux
>Reporter: Jagane Sundar
>Assignee: Konstantin Boudnik
>Priority: Minor
> Fix For: 3.0.0, 2.3.0
>
> Attachments: HDFS-4858.patch, HDFS-4858.patch
>
>
> The DataNode is configured with ipc.client.ping false and ipc.ping.interval 
> 14000. This configuration means that the IPC Client (the DataNode, in this case) 
> should time out in 14000 milliseconds (14 seconds) if the Standby NameNode does 
> not respond to a sendHeartbeat.
> What we observe is this: If the Standby NameNode happens to reboot for any 
> reason, the DataNodes that are heartbeating to this Standby get stuck forever 
> while trying to sendHeartbeat. See Stack trace included below. When the 
> Standby NameNode comes back up, we find that the DataNode never re-registers 
> with the Standby NameNode. Thereafter failover completely fails.
> The desired behavior is that the DataNode's sendHeartbeat should time out in 
> 14 seconds and keep retrying until the Standby NameNode comes back up. When 
> it does, the DataNode should reconnect, re-register, and offer service.
> Specifically, in the class DatanodeProtocolClientSideTranslatorPB.java, the 
> method createNamenode should use RPC.getProtocolProxy and not RPC.getProxy to 
> create the DatanodeProtocolPB object.
> Stack trace of thread stuck in the DataNode after the Standby NN has rebooted:
> Thread 25 (DataNode: [file:///opt/hadoop/data]  heartbeating to 
> vmhost6-vm1/10.10.10.151:8020):
>   State: WAITING
>   Blocked count: 23843
>   Waited count: 45676
>   Waiting on org.apache.hadoop.ipc.Client$Call@305ab6c5
>   Stack:
> java.lang.Object.wait(Native Method)
> java.lang.Object.wait(Object.java:485)
> org.apache.hadoop.ipc.Client.call(Client.java:1220)
> 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
> sun.proxy.$Proxy10.sendHeartbeat(Unknown Source)
> sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
> 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> java.lang.reflect.Method.invoke(Method.java:597)
> 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
> 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
> sun.proxy.$Proxy10.sendHeartbeat(Unknown Source)
> 
> org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolClientSideTranslatorPB.sendHeartbeat(DatanodeProtocolClientSideTranslatorPB.java:167)
> 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.sendHeartBeat(BPServiceActor.java:445)
> 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:525)
> 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:676)
> java.lang.Thread.run(Thread.java:662)
> DataNode RPC to Standby NameNode never times out. 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5920) Support rollback of rolling upgrade in NameNode and JournalNodes

2014-02-10 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-5920:


Description: 
This jira provides rollback functionality for NameNode and JournalNode in 
rolling upgrade.

Currently the proposed rollback for rolling upgrade is:
1. Shut down both NNs
2. Start one of the NNs using the "-rollingUpgrade rollback" option
3. This NN will load the special fsimage right before the upgrade marker, then 
discard all the editlog segments after the txid of the fsimage
4. The NN will also send RPC requests to all the JNs to discard editlog 
segments. This call expects responses from all the JNs. The NN will keep running 
if the call succeeds.
5. We start the other NN using bootstrapstandby rather than "-rollingUpgrade 
rollback"

  was:This jira provides rollback functionality for NameNode and JournalNode in 
rolling upgrade.


> Support rollback of rolling upgrade in NameNode and JournalNodes
> 
>
> Key: HDFS-5920
> URL: https://issues.apache.org/jira/browse/HDFS-5920
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: journal-node, namenode
>Reporter: Jing Zhao
>Assignee: Jing Zhao
> Attachments: HDFS-5920.000.patch, HDFS-5920.000.patch, 
> HDFS-5920.001.patch
>
>
> This jira provides rollback functionality for NameNode and JournalNode in 
> rolling upgrade.
> Currently the proposed rollback for rolling upgrade is:
> 1. Shut down both NNs
> 2. Start one of the NNs using the "-rollingUpgrade rollback" option
> 3. This NN will load the special fsimage right before the upgrade marker, 
> then discard all the editlog segments after the txid of the fsimage
> 4. The NN will also send RPC requests to all the JNs to discard editlog 
> segments. This call expects responses from all the JNs. The NN will keep 
> running if the call succeeds.
> 5. We start the other NN using bootstrapstandby rather than "-rollingUpgrade 
> rollback"



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5920) Support rollback of rolling upgrade in NameNode and JournalNodes

2014-02-10 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-5920:


Attachment: HDFS-5920.001.patch

Update the patch:
# address Suresh's comments
# add unit tests for JN's rollback
# fix a bug in JN to update the committedTxnId after discarding journal 
segments.

> Support rollback of rolling upgrade in NameNode and JournalNodes
> 
>
> Key: HDFS-5920
> URL: https://issues.apache.org/jira/browse/HDFS-5920
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: journal-node, namenode
>Reporter: Jing Zhao
>Assignee: Jing Zhao
> Attachments: HDFS-5920.000.patch, HDFS-5920.000.patch, 
> HDFS-5920.001.patch
>
>
> This jira provides rollback functionality for NameNode and JournalNode in 
> rolling upgrade.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5914) Incorporate ACLs with the changes from HDFS-5698

2014-02-10 Thread Haohui Mai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13897393#comment-13897393
 ] 

Haohui Mai commented on HDFS-5914:
--

Thanks Chris for the comments.

The v1 patch no longer serializes the ACLs for a symlink.

Based on the discussion of HDFS-5899, this patch removes the 
{{TestAclConfigFlag#testFsImage}} test.

> Incorporate ACLs with the changes from HDFS-5698
> 
>
> Key: HDFS-5914
> URL: https://issues.apache.org/jira/browse/HDFS-5914
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client, namenode, security
>Affects Versions: HDFS ACLs (HDFS-4685)
>Reporter: Haohui Mai
>Assignee: Haohui Mai
> Attachments: HDFS-5914.000.patch, HDFS-5914.001.patch
>
>
> HDFS-5698 uses protobuf to serialize the FSImage. The code needs to be 
> updated to work with these changes.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5914) Incorporate ACLs with the changes from HDFS-5698

2014-02-10 Thread Haohui Mai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haohui Mai updated HDFS-5914:
-

Attachment: HDFS-5914.001.patch

> Incorporate ACLs with the changes from HDFS-5698
> 
>
> Key: HDFS-5914
> URL: https://issues.apache.org/jira/browse/HDFS-5914
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client, namenode, security
>Affects Versions: HDFS ACLs (HDFS-4685)
>Reporter: Haohui Mai
>Assignee: Haohui Mai
> Attachments: HDFS-5914.000.patch, HDFS-5914.001.patch
>
>
> HDFS-5698 uses protobuf to serialize the FSImage. The code needs to be 
> updated to work with these changes.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5899) Add configuration flag to disable/enable support for ACLs.

2014-02-10 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13897387#comment-13897387
 ] 

Chris Nauroth commented on HDFS-5899:
-

Both [~cmccabe] and [~wheat9] have expressed concerns about causing pain for 
administrators if we have code that aborts intentionally while loading fsimage 
or edits, so I think I need to reconsider this.

Regarding skipping enforcement, my concern is the risk of unintentionally 
widening permissions due to interactions with the mask entry.  (The full 
explanation is in my prior comment.)

Here is a compromise proposal.  Let's reject the API calls when 
{{dfs.namenode.acls.enabled}} is false, but let's still load *and enforce* all 
existing ACLs found in fsimage or edits.  I expect that addresses the concerns 
about administrative pain, and it addresses my concerns about weakening 
enforcement.  This does mean that the config flag is not a hard restriction, 
but admins who really want to nuke all ACLs can still use the procedure I 
described, and I expect this to be a rare occurrence.
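
For illustration, a rough sketch of the proposed behavior (the class and message 
below are hypothetical, not the actual NameNode code paths): ACL modification 
RPCs are rejected when the flag is off, while ACLs already loaded from fsimage 
or edits keep being enforced.

{code}
import java.io.IOException;

// Hypothetical sketch, not the real FSNamesystem implementation.
class AclConfigGate {
  private final boolean aclsEnabled; // value of dfs.namenode.acls.enabled

  AclConfigGate(boolean aclsEnabled) {
    this.aclsEnabled = aclsEnabled;
  }

  /** Called from the ACL modification RPCs (setAcl, modifyAclEntries, ...). */
  void checkAclsConfigFlag() throws IOException {
    if (!aclsEnabled) {
      throw new IOException("The ACL operation has been rejected: support for "
          + "ACLs is disabled because dfs.namenode.acls.enabled is false.");
    }
  }

  // Note: loading fsimage/edits and permission enforcement deliberately do
  // NOT consult the flag, so existing ACLs remain in effect.
}
{code}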

It looks like an acceptable compromise to me.  Do others agree?  If so, then 
I'll file a new issue for the change.  Thank you, Colin and Haohui.

> Add configuration flag to disable/enable support for ACLs.
> --
>
> Key: HDFS-5899
> URL: https://issues.apache.org/jira/browse/HDFS-5899
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Affects Versions: HDFS ACLs (HDFS-4685)
>Reporter: Chris Nauroth
>Assignee: Chris Nauroth
> Fix For: HDFS ACLs (HDFS-4685)
>
> Attachments: HDFS-5899.1.patch, HDFS-5899.2.patch
>
>
> Add a new configuration property that allows administrators to toggle support 
> for HDFS ACLs on/off.  By default, the flag will be off.  This is a 
> conservative choice, and administrators interested in using ACLs can enable 
> it explicitly.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Resolved] (HDFS-95) UnknownHostException if the system can't determine its own name and you go DNS.getIPs("name-of-an-unknown-interface");

2014-02-10 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-95?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HDFS-95.


   Resolution: Fixed
Fix Version/s: 0.21.0
 Assignee: Steve Loughran

fixed in HADOOP-3426

> UnknownHostException if the system can't determine its own name and you go 
> DNS.getIPs("name-of-an-unknown-interface");
> --
>
> Key: HDFS-95
> URL: https://issues.apache.org/jira/browse/HDFS-95
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Steve Loughran
>Assignee: Steve Loughran
> Fix For: 0.21.0
>
>
> If you give an interface that doesn't exist, DNS.getIPs falls back to 
> InetAddress.getLocalHost().getHostAddress().
> But there's an assumption there: that InetAddress.getLocalHost() is valid. 
> If it doesn't resolve properly, you get an UnknownHostException:
> java.net.UnknownHostException: k2: k2
>   at java.net.InetAddress.getLocalHost(InetAddress.java:1353)
>   at org.apache.hadoop.net.DNS.getIPs(DNS.java:96)
>   at 
> org.apache.hadoop.net.TestDNS.testIPsOfUnknownInterface(TestDNS.java:73)
> It is possible to catch this and return something else. The big question: 
> what to fall back to? 127.0.0.1 would be an obvious choice
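
A minimal sketch of the fallback idea, assuming a loopback fallback is acceptable 
(this is illustration only, not the actual DNS.getIPs implementation):

{code}
import java.net.InetAddress;
import java.net.UnknownHostException;

public class LocalAddressFallback {
  /**
   * Best-effort local address lookup: if the local hostname cannot be
   * resolved, fall back to the loopback address instead of throwing.
   */
  public static String getLocalHostAddressOrLoopback() {
    try {
      return InetAddress.getLocalHost().getHostAddress();
    } catch (UnknownHostException e) {
      return "127.0.0.1"; // the fallback suggested in the issue description
    }
  }
}
{code}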



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5921) Cannot browse file system via NN web UI if any directory has the sticky bit set

2014-02-10 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13897377#comment-13897377
 ] 

Hudson commented on HDFS-5921:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #5142 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/5142/])
HDFS-5921. Cannot browse file system via NN web UI if any directory has the 
sticky bit set. Contributed by Aaron T. Myers. (atm: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1566916)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/webapps/hdfs/explorer.js


> Cannot browse file system via NN web UI if any directory has the sticky bit 
> set
> ---
>
> Key: HDFS-5921
> URL: https://issues.apache.org/jira/browse/HDFS-5921
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.3.0
>Reporter: Aaron T. Myers
>Assignee: Aaron T. Myers
>Priority: Critical
> Fix For: 2.3.0
>
> Attachments: HDFS-5921.patch, HDFS-5921.patch
>
>
> You'll see an error like this in the JS console if any directory has the 
> sticky bit set:
> {noformat}
> 'helper_to_permission': function(chunk, ctx, bodies, params) {
> 
> var exec = ((parms.perm % 10) & 1) == 1;
> Uncaught ReferenceError: parms is not defined
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5921) Cannot browse file system via NN web UI if any directory has the sticky bit set

2014-02-10 Thread Aaron T. Myers (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers updated HDFS-5921:
-

  Resolution: Fixed
   Fix Version/s: 2.3.0
Target Version/s: 2.3.0  (was: 2.4.0)
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

I've just committed this to trunk, branch-2, and branch-2.3.

> Cannot browse file system via NN web UI if any directory has the sticky bit 
> set
> ---
>
> Key: HDFS-5921
> URL: https://issues.apache.org/jira/browse/HDFS-5921
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.3.0
>Reporter: Aaron T. Myers
>Assignee: Aaron T. Myers
>Priority: Critical
> Fix For: 2.3.0
>
> Attachments: HDFS-5921.patch, HDFS-5921.patch
>
>
> You'll see an error like this in the JS console if any directory has the 
> sticky bit set:
> {noformat}
> 'helper_to_permission': function(chunk, ctx, bodies, params) {
> 
> var exec = ((parms.perm % 10) & 1) == 1;
> Uncaught ReferenceError: parms is not defined
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5888) Cannot get the FileStatus of the root inode from the new Globber

2014-02-10 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13897358#comment-13897358
 ] 

Hadoop QA commented on HDFS-5888:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12627683/HDFS-5888.002.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs:

  org.apache.hadoop.fs.TestGlobPaths

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/6102//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6102//console

This message is automatically generated.

> Cannot get the FileStatus of the root inode from the new Globber
> 
>
> Key: HDFS-5888
> URL: https://issues.apache.org/jira/browse/HDFS-5888
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.3.0
>Reporter: Andrew Wang
>Assignee: Colin Patrick McCabe
> Attachments: HDFS-5888.002.patch
>
>
> We can no longer get the correct FileStatus of the root inode "/" from the 
> Globber.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5921) Cannot browse file system via NN web UI if any directory has the sticky bit set

2014-02-10 Thread Aaron T. Myers (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13897357#comment-13897357
 ] 

Aaron T. Myers commented on HDFS-5921:
--

Since Jenkins came back clean I'm going to go ahead and commit this based on 
Andrew and Haohui's +1's.

Thanks for the quick reviews, gents.

> Cannot browse file system via NN web UI if any directory has the sticky bit 
> set
> ---
>
> Key: HDFS-5921
> URL: https://issues.apache.org/jira/browse/HDFS-5921
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.3.0
>Reporter: Aaron T. Myers
>Assignee: Aaron T. Myers
>Priority: Critical
> Attachments: HDFS-5921.patch, HDFS-5921.patch
>
>
> You'll see an error like this in the JS console if any directory has the 
> sticky bit set:
> {noformat}
> 'helper_to_permission': function(chunk, ctx, bodies, params) {
> 
> var exec = ((parms.perm % 10) & 1) == 1;
> Uncaught ReferenceError: parms is not defined
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5921) Cannot browse file system via NN web UI if any directory has the sticky bit set

2014-02-10 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13897297#comment-13897297
 ] 

Hadoop QA commented on HDFS-5921:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12628058/HDFS-5921.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-hdfs-project/hadoop-hdfs.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/6101//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6101//console

This message is automatically generated.

> Cannot browse file system via NN web UI if any directory has the sticky bit 
> set
> ---
>
> Key: HDFS-5921
> URL: https://issues.apache.org/jira/browse/HDFS-5921
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.3.0
>Reporter: Aaron T. Myers
>Assignee: Aaron T. Myers
>Priority: Critical
> Attachments: HDFS-5921.patch, HDFS-5921.patch
>
>
> You'll see an error like this in the JS console if any directory has the 
> sticky bit set:
> {noformat}
> 'helper_to_permission': function(chunk, ctx, bodies, params) {
> 
> var exec = ((parms.perm % 10) & 1) == 1;
> Uncaught ReferenceError: parms is not defined
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5922) DN heartbeat thread can get stuck in tight loop

2014-02-10 Thread Arpit Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13897282#comment-13897282
 ] 

Arpit Agarwal commented on HDFS-5922:
-

That sounds fine too. Thanks.

> DN heartbeat thread can get stuck in tight loop
> ---
>
> Key: HDFS-5922
> URL: https://issues.apache.org/jira/browse/HDFS-5922
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.3.0
>Reporter: Aaron T. Myers
>Assignee: Arpit Agarwal
>
> We saw an issue recently on a test cluster where one of the DN threads was 
> consuming 100% of a single CPU. Running jstack indicated that it was the DN 
> heartbeat thread. I believe I've tracked down the cause to a bug in the 
> accounting around the value of {{pendingReceivedRequests}}.
> More details in the first comment.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5915) Refactor FSImageFormatProtobuf to simplify cross section reads

2014-02-10 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13897283#comment-13897283
 ] 

Hudson commented on HDFS-5915:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #5141 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/5141/])
HDFS-5915. Refactor FSImageFormatProtobuf to simplify cross section reads. 
Contributed by Haohui Mai. (cnauroth: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1566824)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImageFormatPBINode.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImageFormatProtobuf.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/snapshot/FSImageFormatPBSnapshot.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestDeduplicationMap.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFSImageStorageInspector.java


> Refactor FSImageFormatProtobuf to simplify cross section reads
> --
>
> Key: HDFS-5915
> URL: https://issues.apache.org/jira/browse/HDFS-5915
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 3.0.0
>Reporter: Haohui Mai
>Assignee: Haohui Mai
> Fix For: 3.0.0
>
> Attachments: HDFS-5915.000.patch, HDFS-5915.001.patch
>
>
> The PB-based FSImage puts the user name and the group name into a separate 
> section for deduplication. This jira refactors the code so that it is easier 
> to apply the same techniques for other types of data (e.g., 
> {{INodeReference}})
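
For illustration, a minimal sketch of the deduplication idea (a hypothetical, 
simplified map, not the actual SaverContext code): each distinct string gets a 
stable id, the (id, string) pairs are written once in their own section, and 
inode entries store only the id.

{code}
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of string deduplication for image sections.
class StringDedupMap {
  private final Map<String, Integer> ids = new HashMap<>();

  /** Returns a stable id for the string, assigning a new one on first use. */
  int getId(String value) {
    return ids.computeIfAbsent(value, v -> ids.size() + 1);
  }

  int size() {
    return ids.size();
  }
}
// An inode entry would then store getId(userName) instead of the full string,
// and the (id, string) pairs are serialized once in a separate section.
{code}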



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5915) Refactor FSImageFormatProtobuf to simplify cross section reads

2014-02-10 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth updated HDFS-5915:


Affects Version/s: 3.0.0

> Refactor FSImageFormatProtobuf to simplify cross section reads
> --
>
> Key: HDFS-5915
> URL: https://issues.apache.org/jira/browse/HDFS-5915
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 3.0.0
>Reporter: Haohui Mai
>Assignee: Haohui Mai
> Fix For: 3.0.0
>
> Attachments: HDFS-5915.000.patch, HDFS-5915.001.patch
>
>
> The PB-based FSImage puts the user name and the group name into a separate 
> section for deduplication. This jira refactors the code so that it is easier 
> to apply the same techniques for other types of data (e.g., 
> {{INodeReference}})



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5915) Refactor FSImageFormatProtobuf to simplify cross section reads

2014-02-10 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth updated HDFS-5915:


  Resolution: Fixed
   Fix Version/s: 3.0.0
Target Version/s: 3.0.0
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

+1

 I committed this to trunk.  Haohui, thank you for the patch.

> Refactor FSImageFormatProtobuf to simplify cross section reads
> --
>
> Key: HDFS-5915
> URL: https://issues.apache.org/jira/browse/HDFS-5915
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 3.0.0
>Reporter: Haohui Mai
>Assignee: Haohui Mai
> Fix For: 3.0.0
>
> Attachments: HDFS-5915.000.patch, HDFS-5915.001.patch
>
>
> The PB-based FSImage puts the user name and the group name into a separate 
> section for deduplication. This jira refactors the code so that it is easier 
> to apply the same techniques for other types of data (e.g., 
> {{INodeReference}})



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5915) Refactor FSImageFormatProtobuf to simplify cross section reads

2014-02-10 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth updated HDFS-5915:


Component/s: namenode

> Refactor FSImageFormatProtobuf to simplify cross section reads
> --
>
> Key: HDFS-5915
> URL: https://issues.apache.org/jira/browse/HDFS-5915
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 3.0.0
>Reporter: Haohui Mai
>Assignee: Haohui Mai
> Fix For: 3.0.0
>
> Attachments: HDFS-5915.000.patch, HDFS-5915.001.patch
>
>
> The PB-based FSImage puts the user name and the group name into a separate 
> section for deduplication. This jira refactors the code so that it is easier 
> to apply the same techniques for other types of data (e.g., 
> {{INodeReference}})



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5810) Unify mmap cache and short-circuit file descriptor cache

2014-02-10 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13897176#comment-13897176
 ] 

Hadoop QA commented on HDFS-5810:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12628100/HDFS-5810.019.patch
  against trunk revision .

{color:red}-1 patch{color}.  Trunk compilation may be broken.

Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6103//console

This message is automatically generated.

> Unify mmap cache and short-circuit file descriptor cache
> 
>
> Key: HDFS-5810
> URL: https://issues.apache.org/jira/browse/HDFS-5810
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Affects Versions: 2.3.0
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
> Attachments: HDFS-5810.001.patch, HDFS-5810.004.patch, 
> HDFS-5810.006.patch, HDFS-5810.008.patch, HDFS-5810.015.patch, 
> HDFS-5810.016.patch, HDFS-5810.018.patch, HDFS-5810.019.patch
>
>
> We should unify the client mmap cache and the client file descriptor cache.  
> Since mmaps are granted corresponding to file descriptors in the cache 
> (currently FileInputStreamCache), they have to be tracked together to do 
> "smarter" things like HDFS-5182.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5810) Unify mmap cache and short-circuit file descriptor cache

2014-02-10 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13897170#comment-13897170
 ] 

Andrew Wang commented on HDFS-5810:
---

Hi Colin, some replies and new comments. I looked at the remaining parts of the 
previous patch, I haven't looked at the newest rev yet:

Replies:
bq. 
Sure, let's just doc it.

bq. polymorphic Object in SCReplica
Sure, this is just a style nit. If you tried it the other way and it looked 
worse, it's fine to leave it as is.

bq. 
I guess this is for my own edification, but isn't munmap going to be 
approximately the same cost as mmap? Both involve updating the page tables and 
a TLB flush AFAIK, which should be order microseconds. This could be pushed up 
to milliseconds if the page tables are swapped out, but that's again an issue 
for both. I'd like to be internally consistent with regard to our locking, if 
it's a performance argument.

Overall, I feel like microseconds are not a big deal, and mmap/munmap 
themselves have to grab a kernel lock. The code savings from removing the CV 
also aren't bad, since we could reduce the polymorphism of SCReplica#mmapData.

Some new comments too (I think I've looked at all the changed files at this 
point):

ClientContext:
* ClientContext#confAsString has a dupe of socketCacheExpiry. Do we also need 
the mmap cache settings here?
* ClientContext#getFromConf, can we push the creation of a new DFSClient.Conf 
into #get when it's necessary? Seems better to avoid doing all those hash 
lookups.

BlockReaderFactory:
* We removed the javadoc parameter descriptions in a few places, some of which 
were helpful (e.g. {{len}} of {{-1}} means read as many bytes as possible). 
Could we add the one-line docs back to the builder variables?
* Mind adding "dfs.client.cached.conn.retry" to hdfs-default.xml?
* cacheTries now counts down instead of counting up, so I think it needs a new 
name. cacheTriesRemaining isn't great, but something like that.
* cacheTries used to also only tick when we got a stale peer out of the cache. 
Now, nextTcpPeer and nextDomainPeer tick cacheTries unconditionally.
* Previously, we would disable domain sockets or throw an exception if we hit 
an error when using a new Peer (domain or TCP respectively). Now, we don't know 
if a peer is cached or new, and spin until we run out of cacheTries (which 
isn't really related here).

> Unify mmap cache and short-circuit file descriptor cache
> 
>
> Key: HDFS-5810
> URL: https://issues.apache.org/jira/browse/HDFS-5810
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Affects Versions: 2.3.0
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
> Attachments: HDFS-5810.001.patch, HDFS-5810.004.patch, 
> HDFS-5810.006.patch, HDFS-5810.008.patch, HDFS-5810.015.patch, 
> HDFS-5810.016.patch, HDFS-5810.018.patch, HDFS-5810.019.patch
>
>
> We should unify the client mmap cache and the client file descriptor cache.  
> Since mmaps are granted corresponding to file descriptors in the cache 
> (currently FileInputStreamCache), they have to be tracked together to do 
> "smarter" things like HDFS-5182.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Work started] (HDFS-5922) DN heartbeat thread can get stuck in tight loop

2014-02-10 Thread Arpit Agarwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HDFS-5922 started by Arpit Agarwal.

> DN heartbeat thread can get stuck in tight loop
> ---
>
> Key: HDFS-5922
> URL: https://issues.apache.org/jira/browse/HDFS-5922
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.3.0
>Reporter: Aaron T. Myers
>Assignee: Arpit Agarwal
>
> We saw an issue recently on a test cluster where one of the DN threads was 
> consuming 100% of a single CPU. Running jstack indicated that it was the DN 
> heartbeat thread. I believe I've tracked down the cause to a bug in the 
> accounting around the value of {{pendingReceivedRequests}}.
> More details in the first comment.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Assigned] (HDFS-5922) DN heartbeat thread can get stuck in tight loop

2014-02-10 Thread Arpit Agarwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal reassigned HDFS-5922:
---

Assignee: Arpit Agarwal

> DN heartbeat thread can get stuck in tight loop
> ---
>
> Key: HDFS-5922
> URL: https://issues.apache.org/jira/browse/HDFS-5922
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.3.0
>Reporter: Aaron T. Myers
>Assignee: Arpit Agarwal
>
> We saw an issue recently on a test cluster where one of the DN threads was 
> consuming 100% of a single CPU. Running jstack indicated that it was the DN 
> heartbeat thread. I believe I've tracked down the cause to a bug in the 
> accounting around the value of {{pendingReceivedRequests}}.
> More details in the first comment.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5810) Unify mmap cache and short-circuit file descriptor cache

2014-02-10 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13897159#comment-13897159
 ] 

Colin Patrick McCabe commented on HDFS-5810:


I uploaded a new version which is rebased on trunk.  It changes the "caller 
strings" for dumping stack traces, uses 
{{dfs.client.read.shortcircuit.streams.cache.size}} as an upper bound on the 
size of both mmapped and non-mmapped replicas, and uses {{TimeUnit}} for time 
conversions.

I changed the handling of {{outstandingMmapCount}} a little bit.  Although we 
still track this stat, we don't try to cap the number of outstanding mmaps.  
That is up to the caller code, not to us.  This is similar to how we handle 
opening new FDs in general... we do it on request, no matter how many existing 
FDs there are.  Only when something is returned to the cache do we apply the 
limits.
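
A rough sketch of that policy, with hypothetical names (not the actual 
short-circuit cache code): acquiring a replica is always allowed, and the size 
cap is enforced only when a replica is returned to the cache.

{code}
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Supplier;

// Hypothetical sketch: caps are enforced on release, not on acquire.
class ReplicaCacheSketch<T> {
  private final int maxCached;
  private final Deque<T> cached = new ArrayDeque<>();

  ReplicaCacheSketch(int maxCached) {
    this.maxCached = maxCached;
  }

  /** Acquiring never blocks on the cap; callers may hold any number of replicas. */
  T acquire(Supplier<T> factory) {
    T fromCache = cached.pollFirst();
    return fromCache != null ? fromCache : factory.get();
  }

  /** The cap is applied only here, when a replica comes back to the cache. */
  void release(T replica) {
    cached.addFirst(replica);
    while (cached.size() > maxCached) {
      cached.removeLast(); // evict the oldest entry (a real cache would also close it)
    }
  }
}
{code}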

> Unify mmap cache and short-circuit file descriptor cache
> 
>
> Key: HDFS-5810
> URL: https://issues.apache.org/jira/browse/HDFS-5810
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Affects Versions: 2.3.0
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
> Attachments: HDFS-5810.001.patch, HDFS-5810.004.patch, 
> HDFS-5810.006.patch, HDFS-5810.008.patch, HDFS-5810.015.patch, 
> HDFS-5810.016.patch, HDFS-5810.018.patch, HDFS-5810.019.patch
>
>
> We should unify the client mmap cache and the client file descriptor cache.  
> Since mmaps are granted corresponding to file descriptors in the cache 
> (currently FileInputStreamCache), they have to be tracked together to do 
> "smarter" things like HDFS-5182.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5922) DN heartbeat thread can get stuck in tight loop

2014-02-10 Thread Aaron T. Myers (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13897155#comment-13897155
 ] 

Aaron T. Myers commented on HDFS-5922:
--

Hi Arpit, yes please do take a look at fixing it. I was hoping you'd notice it 
since I'm less familiar with this code. :)

I didn't file it as a blocker against 2.3 because the window for hitting this 
is really quite narrow, it's not the end of the world if a DN ends up hitting 
this, and I don't want to further hold up the 2.3.0 release. I personally think 
we should target this for 2.3.1 / 2.4.0.

That said, if you think this is more serious than I do, then we can certainly 
raise the priority and target it for 2.3.0 if you want.

> DN heartbeat thread can get stuck in tight loop
> ---
>
> Key: HDFS-5922
> URL: https://issues.apache.org/jira/browse/HDFS-5922
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.3.0
>Reporter: Aaron T. Myers
>
> We saw an issue recently on a test cluster where one of the DN threads was 
> consuming 100% of a single CPU. Running jstack indicated that it was the DN 
> heartbeat thread. I believe I've tracked down the cause to a bug in the 
> accounting around the value of {{pendingReceivedRequests}}.
> More details in the first comment.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5810) Unify mmap cache and short-circuit file descriptor cache

2014-02-10 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-5810:
---

Attachment: HDFS-5810.019.patch

> Unify mmap cache and short-circuit file descriptor cache
> 
>
> Key: HDFS-5810
> URL: https://issues.apache.org/jira/browse/HDFS-5810
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Affects Versions: 2.3.0
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
> Attachments: HDFS-5810.001.patch, HDFS-5810.004.patch, 
> HDFS-5810.006.patch, HDFS-5810.008.patch, HDFS-5810.015.patch, 
> HDFS-5810.016.patch, HDFS-5810.018.patch, HDFS-5810.019.patch
>
>
> We should unify the client mmap cache and the client file descriptor cache.  
> Since mmaps are granted corresponding to file descriptors in the cache 
> (currently FileInputStreamCache), they have to be tracked together to do 
> "smarter" things like HDFS-5182.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5922) DN heartbeat thread can get stuck in tight loop

2014-02-10 Thread Arpit Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13897151#comment-13897151
 ] 

Arpit Agarwal commented on HDFS-5922:
-

Hi Aaron,

Good catch and thanks for the detailed explanation.

I can fix it today if you haven't started. This probably needs to be in 2.3.



> DN heartbeat thread can get stuck in tight loop
> ---
>
> Key: HDFS-5922
> URL: https://issues.apache.org/jira/browse/HDFS-5922
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.3.0
>Reporter: Aaron T. Myers
>
> We saw an issue recently on a test cluster where one of the DN threads was 
> consuming 100% of a single CPU. Running jstack indicated that it was the DN 
> heartbeat thread. I believe I've tracked down the cause to a bug in the 
> accounting around the value of {{pendingReceivedRequests}}.
> More details in the first comment.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5922) DN heartbeat thread can get stuck in tight loop

2014-02-10 Thread Aaron T. Myers (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13897137#comment-13897137
 ] 

Aaron T. Myers commented on HDFS-5922:
--

In the heartbeat thread in BPServiceActor, we have the following:

{code}
if (waitTime > 0 && pendingReceivedRequests == 0) {
  try {
   pendingIncrementalBRperStorage.wait(waitTime);
{code}

This means that if for some reason the value of {{pendingReceivedRequests}} 
permanently stays positive then we will never sleep in the heartbeat thread. 
The question, then, is what can cause this value to stay positive.

I believe the issue is that in 
{{BPServiceActor#addPendingReplicationBlockInfo}} we might not increase the 
size of the {{PerStoragePendingIncrementalBR}} if there is already an entry for 
a given block in there:

{code}
// Make sure another entry for the same block is first removed.
// There may only be one such entry.
for (Map.Entry entry :
  pendingIncrementalBRperStorage.entrySet()) {
  if (entry.getValue().removeBlockInfo(bInfo)) {
break;
  }
}
getIncrementalBRMapForStorage(storageUuid).putBlockInfo(bInfo);
{code}

But in {{BPServiceActor#notifyNamenodeBlockImmediately}} we will always 
increment {{pendingReceivedRequests}} regardless of whether or not there was 
already an entry for the block:

{code}
  void notifyNamenodeBlockImmediately(
  ReceivedDeletedBlockInfo bInfo, String storageUuid) {
synchronized (pendingIncrementalBRperStorage) {
  addPendingReplicationBlockInfo(bInfo, storageUuid);
  pendingReceivedRequests++;
  pendingIncrementalBRperStorage.notifyAll();
}
  }
{code}

Then, in {{BPServiceActor#reportReceivedDeletedBlocks}}, we will only subtract 
the number of blocks that are actually in the 
{{PerStoragePendingIncrementalBR}} from {{pendingReceivedRequests}}:

{code}
  ReceivedDeletedBlockInfo[] rdbi = perStorageMap.dequeueBlockInfos();
  pendingReceivedRequests =
  (pendingReceivedRequests > rdbi.length ?
  (pendingReceivedRequests - rdbi.length) : 0);
{code}

This means that if we ever call 
{{BPServiceActor#notifyNamenodeBlockImmediately}} twice without calling 
{{BPServiceActor#reportReceivedDeletedBlocks}} in between, we will have 
{{pendingReceivedRequests}} at 2, but then only subtract 1 from it.

[~andrew.wang] also pointed out offline that it is perhaps incorrect to be 
subtracting the number of _deleted_ blocks from {{pendingReceivedRequests}} in 
{{BPServiceActor#reportReceivedDeletedBlocks}}, but the result of that is 
somewhat less serious, since in that case the worst case is just that we send a 
somewhat delayed IBR.
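
One possible shape of a fix, sketched with hypothetical fields and signatures (the 
real BPServiceActor code may differ, and this is not the committed patch): only 
bump the counter when the pending map actually grew, i.e. when no existing entry 
for the same block was replaced.

{code}
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the accounting fix, not the committed patch.
class PendingIbrAccounting {
  private final Map<String, Set<Object>> pendingIncrementalBRperStorage = new HashMap<>();
  private int pendingReceivedRequests = 0;

  void notifyNamenodeBlockImmediately(Object bInfo, String storageUuid) {
    synchronized (pendingIncrementalBRperStorage) {
      // First drop any existing entry for the same block (it may live under
      // a different storage); remember whether one was there.
      boolean replacedExisting = false;
      for (Set<Object> perStorage : pendingIncrementalBRperStorage.values()) {
        if (perStorage.remove(bInfo)) {
          replacedExisting = true;
          break;
        }
      }
      pendingIncrementalBRperStorage
          .computeIfAbsent(storageUuid, k -> new HashSet<>())
          .add(bInfo);
      // Only count a new pending request when the overall map actually grew.
      if (!replacedExisting) {
        pendingReceivedRequests++;
      }
      pendingIncrementalBRperStorage.notifyAll();
    }
  }
}
{code}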

> DN heartbeat thread can get stuck in tight loop
> ---
>
> Key: HDFS-5922
> URL: https://issues.apache.org/jira/browse/HDFS-5922
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.3.0
>Reporter: Aaron T. Myers
>
> We saw an issue recently on a test cluster where one of the DN threads was 
> consuming 100% of a single CPU. Running jstack indicated that it was the DN 
> heartbeat thread. I believe I've tracked down the cause to a bug in the 
> accounting around the value of {{pendingReceivedRequests}}.
> More details in the first comment.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5921) Cannot browse file system via NN web UI if any directory has the sticky bit set

2014-02-10 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13897124#comment-13897124
 ] 

Hadoop QA commented on HDFS-5921:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12628038/HDFS-5921.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-hdfs-project/hadoop-hdfs.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/6100//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6100//console

This message is automatically generated.

> Cannot browse file system via NN web UI if any directory has the sticky bit 
> set
> ---
>
> Key: HDFS-5921
> URL: https://issues.apache.org/jira/browse/HDFS-5921
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.3.0
>Reporter: Aaron T. Myers
>Assignee: Aaron T. Myers
>Priority: Critical
> Attachments: HDFS-5921.patch, HDFS-5921.patch
>
>
> You'll see an error like this in the JS console if any directory has the 
> sticky bit set:
> {noformat}
> 'helper_to_permission': function(chunk, ctx, bodies, params) {
> 
> var exec = ((parms.perm % 10) & 1) == 1;
> Uncaught ReferenceError: parms is not defined
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (HDFS-5922) DN heartbeat thread can get stuck in tight loop

2014-02-10 Thread Aaron T. Myers (JIRA)
Aaron T. Myers created HDFS-5922:


 Summary: DN heartbeat thread can get stuck in tight loop
 Key: HDFS-5922
 URL: https://issues.apache.org/jira/browse/HDFS-5922
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode
Affects Versions: 2.3.0
Reporter: Aaron T. Myers


We saw an issue recently on a test cluster where one of the DN threads was 
consuming 100% of a single CPU. Running jstack indicated that it was the DN 
heartbeat thread. I believe I've tracked down the cause to a bug in the 
accounting around the value of {{pendingReceivedRequests}}.

More details in the first comment.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5914) Incorporate ACLs with the changes from HDFS-5698

2014-02-10 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13897116#comment-13897116
 ] 

Chris Nauroth commented on HDFS-5914:
-

Sorry I missed the HDFS-5915 prerequisite the first time.  A few minor comments:

# {{FSImageFormatPBINode}}: Symlinks don't get ACLs of their own.  Shall we 
skip serialization/deserialization of ACLs here for symlinks?
# {{TestAclConfigFlag#testFsImage}} is failing now, because it allowed loading 
of an fsimage containing an ACL even though ACLs were disabled in 
configuration.  Previously, this was rejected by 
{{FSImageFormat#loadAclFeature}} checking the config flag.
# {{FSImageFormatProtobuf}}: Minor typo: I think {{saveExtendAclSection}} was 
meant to be named {{saveExtendedAclSection}}.


> Incorporate ACLs with the changes from HDFS-5698
> 
>
> Key: HDFS-5914
> URL: https://issues.apache.org/jira/browse/HDFS-5914
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client, namenode, security
>Affects Versions: HDFS ACLs (HDFS-4685)
>Reporter: Haohui Mai
>Assignee: Haohui Mai
> Attachments: HDFS-5914.000.patch
>
>
> HDFS-5698 uses protobuf to serialize the FSImage. The code needs to be 
> updated to work with these changes.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5920) Support rollback of rolling upgrade in NameNode and JournalNodes

2014-02-10 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13897092#comment-13897092
 ] 

Jing Zhao commented on HDFS-5920:
-

Thanks for the comments Suresh!

bq. JournalNodeRpcServer#doRollingRollback is an empty method
Oops... I forgot to put "jn.doRollingRollback(journalId, startTxId)" there... 
The functionality is included in the 000 patch except for that call.

bq. "their editlog starting at or above the given txid."
Here I mean ">=". I will update the wording to make it clearer.

bq. Instead of RollingRollbackRequest in QJournalProtocol, we may be able to 
call it discardSegments?
Will rename it in the next patch.


> Support rollback of rolling upgrade in NameNode and JournalNodes
> 
>
> Key: HDFS-5920
> URL: https://issues.apache.org/jira/browse/HDFS-5920
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: journal-node, namenode
>Reporter: Jing Zhao
>Assignee: Jing Zhao
> Attachments: HDFS-5920.000.patch, HDFS-5920.000.patch
>
>
> This jira provides rollback functionality for NameNode and JournalNode in 
> rolling upgrade.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5920) Support rollback of rolling upgrade in NameNode and JournalNodes

2014-02-10 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13897074#comment-13897074
 ] 

Suresh Srinivas commented on HDFS-5920:
---

A few early comments:
# Instead of RollingRollbackRequest in QJournalProtocol, we may be able to call 
it discardSegments?
# I am assuming, based on your previous comment, that 
JournalNodeRpcServer#doRollingRollback is an empty method and you are still 
implementing the functionality.
# "their editlog starting at or above the given txid." Is this correct? The 
journal must delete records starting from the given txid. If the transaction 
ends before this txid, then the journal can ignore the request.
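
As a rough illustration of the ">= txid" semantics being discussed (this is not 
the JournalNode implementation; the directory layout and file-name patterns are 
assumptions based on how finalized and in-progress edits files are commonly 
named), a discard operation could scan the journal's current directory and drop 
every whole segment whose first transaction id is at or above the given txid, 
leaving segments that end before it untouched:

{code}
import java.io.File;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative sketch only; not the actual JournalNode code. */
public class DiscardSegmentsSketch {
  // Finalized segments: edits_<firstTxId>-<lastTxId>; in-progress: edits_inprogress_<firstTxId>
  private static final Pattern FINALIZED = Pattern.compile("edits_(\\d+)-(\\d+)");
  private static final Pattern IN_PROGRESS = Pattern.compile("edits_inprogress_(\\d+)");

  /** Delete every whole segment whose first txid is >= startTxId. */
  public static void discardSegments(File currentDir, long startTxId) {
    File[] files = currentDir.listFiles();
    if (files == null) {
      return;
    }
    for (File f : files) {
      Long firstTxId = firstTxIdOf(f.getName());
      if (firstTxId != null && firstTxId >= startTxId) {
        if (!f.delete()) {
          throw new IllegalStateException("Could not delete " + f);
        }
      }
      // Segments whose first txid is below startTxId are left alone; a segment
      // that merely spans the boundary is out of scope for this sketch.
    }
  }

  private static Long firstTxIdOf(String name) {
    Matcher m = FINALIZED.matcher(name);
    if (m.matches()) {
      return Long.parseLong(m.group(1));
    }
    m = IN_PROGRESS.matcher(name);
    if (m.matches()) {
      return Long.parseLong(m.group(1));
    }
    return null;   // not an edits segment
  }

  public static void main(String[] args) {
    // e.g. java DiscardSegmentsSketch /data/jn/ns1/current 12345
    discardSegments(new File(args[0]), Long.parseLong(args[1]));
  }
}
{code}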


> Support rollback of rolling upgrade in NameNode and JournalNodes
> 
>
> Key: HDFS-5920
> URL: https://issues.apache.org/jira/browse/HDFS-5920
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: journal-node, namenode
>Reporter: Jing Zhao
>Assignee: Jing Zhao
> Attachments: HDFS-5920.000.patch, HDFS-5920.000.patch
>
>
> This jira provides rollback functionality for NameNode and JournalNode in 
> rolling upgrade.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5915) Refactor FSImageFormatProtobuf to simplify cross section reads

2014-02-10 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13897073#comment-13897073
 ] 

Hadoop QA commented on HDFS-5915:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12628034/HDFS-5915.001.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-hdfs-project/hadoop-hdfs.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/6098//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6098//console

This message is automatically generated.

> Refactor FSImageFormatProtobuf to simplify cross section reads
> --
>
> Key: HDFS-5915
> URL: https://issues.apache.org/jira/browse/HDFS-5915
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Haohui Mai
>Assignee: Haohui Mai
> Attachments: HDFS-5915.000.patch, HDFS-5915.001.patch
>
>
> The PB-based FSImage puts the user name and the group name into a separate 
> section for deduplication. This jira refactors the code so that it is easier 
> to apply the same techniques for other types of data (e.g., 
> {{INodeReference}})



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-4239) Means of telling the datanode to stop using a sick disk

2014-02-10 Thread Jimmy Xiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13897069#comment-13897069
 ] 

Jimmy Xiang commented on HDFS-4239:
---

Ping. Can anyone take a look at patch v4? Thanks.

> Means of telling the datanode to stop using a sick disk
> ---
>
> Key: HDFS-4239
> URL: https://issues.apache.org/jira/browse/HDFS-4239
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: stack
>Assignee: Jimmy Xiang
> Attachments: hdfs-4239.patch, hdfs-4239_v2.patch, hdfs-4239_v3.patch, 
> hdfs-4239_v4.patch, hdfs-4239_v5.patch
>
>
> If a disk has been deemed 'sick' -- i.e. not dead but wounded, failing 
> occasionally, or just exhibiting high latency -- your choices are:
> 1. Decommission the total datanode.  If the datanode is carrying 6 or 12 
> disks of data, especially on a cluster that is smallish -- 5 to 20 nodes -- 
> the rereplication of the downed datanode's data can be pretty disruptive, 
> especially if the cluster is doing low latency serving: e.g. hosting an hbase 
> cluster.
> 2. Stop the datanode, unmount the bad disk, and restart the datanode (You 
> can't unmount the disk while it is in use).  This latter is better in that 
> only the bad disk's data is rereplicated, not all datanode data.
> Is it possible to do better, say, send the datanode a signal to tell it to stop 
> using a disk an operator has designated 'bad'?  This would be like option #2 
> above minus the need to stop and restart the datanode.  Ideally the disk 
> would become unmountable after a while.
> Nice to have would be being able to tell the datanode to restart using a disk 
> after it's been replaced.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5920) Support rollback of rolling upgrade in NameNode and JournalNodes

2014-02-10 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13897059#comment-13897059
 ] 

Suresh Srinivas commented on HDFS-5920:
---

[~tlipcon], if you have time, can you please take a look at this patch as well?

> Support rollback of rolling upgrade in NameNode and JournalNodes
> 
>
> Key: HDFS-5920
> URL: https://issues.apache.org/jira/browse/HDFS-5920
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: journal-node, namenode
>Reporter: Jing Zhao
>Assignee: Jing Zhao
> Attachments: HDFS-5920.000.patch, HDFS-5920.000.patch
>
>
> This jira provides rollback functionality for NameNode and JournalNode in 
> rolling upgrade.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5921) Cannot browse file system via NN web UI if any directory has the sticky bit set

2014-02-10 Thread Haohui Mai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13897054#comment-13897054
 ] 

Haohui Mai commented on HDFS-5921:
--

Looks good to me. +1

> Cannot browse file system via NN web UI if any directory has the sticky bit 
> set
> ---
>
> Key: HDFS-5921
> URL: https://issues.apache.org/jira/browse/HDFS-5921
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.3.0
>Reporter: Aaron T. Myers
>Assignee: Aaron T. Myers
>Priority: Critical
> Attachments: HDFS-5921.patch, HDFS-5921.patch
>
>
> You'll see an error like this in the JS console if any directory has the 
> sticky bit set:
> {noformat}
> 'helper_to_permission': function(chunk, ctx, bodies, params) {
> 
> var exec = ((parms.perm % 10) & 1) == 1;
> Uncaught ReferenceError: parms is not defined
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5920) Support rollback of rolling upgrade in NameNode and JournalNodes

2014-02-10 Thread Suresh Srinivas (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Srinivas updated HDFS-5920:
--

Status: Open  (was: Patch Available)

> Support rollback of rolling upgrade in NameNode and JournalNodes
> 
>
> Key: HDFS-5920
> URL: https://issues.apache.org/jira/browse/HDFS-5920
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: journal-node, namenode
>Reporter: Jing Zhao
>Assignee: Jing Zhao
> Attachments: HDFS-5920.000.patch, HDFS-5920.000.patch
>
>
> This jira provides rollback functionality for NameNode and JournalNode in 
> rolling upgrade.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5920) Support rollback of rolling upgrade in NameNode and JournalNodes

2014-02-10 Thread Suresh Srinivas (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Srinivas updated HDFS-5920:
--

Status: Patch Available  (was: Open)

> Support rollback of rolling upgrade in NameNode and JournalNodes
> 
>
> Key: HDFS-5920
> URL: https://issues.apache.org/jira/browse/HDFS-5920
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: journal-node, namenode
>Reporter: Jing Zhao
>Assignee: Jing Zhao
> Attachments: HDFS-5920.000.patch, HDFS-5920.000.patch
>
>
> This jira provides rollback functionality for NameNode and JournalNode in 
> rolling upgrade.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5921) Cannot browse file system via NN web UI if any directory has the sticky bit set

2014-02-10 Thread Aaron T. Myers (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers updated HDFS-5921:
-

Attachment: HDFS-5921.patch

Here's an updated patch which does 's/slice/substr/g'.

> Cannot browse file system via NN web UI if any directory has the sticky bit 
> set
> ---
>
> Key: HDFS-5921
> URL: https://issues.apache.org/jira/browse/HDFS-5921
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.3.0
>Reporter: Aaron T. Myers
>Assignee: Aaron T. Myers
>Priority: Critical
> Attachments: HDFS-5921.patch, HDFS-5921.patch
>
>
> You'll see an error like this in the JS console if any directory has the 
> sticky bit set:
> {noformat}
> 'helper_to_permission': function(chunk, ctx, bodies, params) {
> 
> var exec = ((parms.perm % 10) & 1) == 1;
> Uncaught ReferenceError: parms is not defined
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5888) Cannot get the FileStatus of the root inode from the new Globber

2014-02-10 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13897004#comment-13897004
 ] 

Hadoop QA commented on HDFS-5888:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12627683/HDFS-5888.002.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs:

  org.apache.hadoop.fs.TestGlobPaths
  
org.apache.hadoop.hdfs.server.balancer.TestBalancerWithEncryptedTransfer

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/6097//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6097//console

This message is automatically generated.

> Cannot get the FileStatus of the root inode from the new Globber
> 
>
> Key: HDFS-5888
> URL: https://issues.apache.org/jira/browse/HDFS-5888
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.3.0
>Reporter: Andrew Wang
>Assignee: Colin Patrick McCabe
> Attachments: HDFS-5888.002.patch
>
>
> We can no longer get the correct FileStatus of the root inode "/" from the 
> Globber.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5921) Cannot browse file system via NN web UI if any directory has the sticky bit set

2014-02-10 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13896948#comment-13896948
 ] 

Andrew Wang commented on HDFS-5921:
---

+1 pending Haohui's comment and Jenkins. I'd like to see this in 2.3.0 too, 
since it's a rather embarrassing bug.

> Cannot browse file system via NN web UI if any directory has the sticky bit 
> set
> ---
>
> Key: HDFS-5921
> URL: https://issues.apache.org/jira/browse/HDFS-5921
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.3.0
>Reporter: Aaron T. Myers
>Assignee: Aaron T. Myers
>Priority: Critical
> Attachments: HDFS-5921.patch
>
>
> You'll see an error like this in the JS console if any directory has the 
> sticky bit set:
> {noformat}
> 'helper_to_permission': function(chunk, ctx, bodies, params) {
> 
> var exec = ((parms.perm % 10) & 1) == 1;
> Uncaught ReferenceError: parms is not defined
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-4564) Webhdfs returns incorrect http response codes for denied operations

2014-02-10 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated HDFS-4564:
--

Target Version/s: 2.4.0  (was: 2.3.0)

From what I understand, this is an existing issue with 2.2 and is NOT a 
regression. This patch can go in if need be, but I am moving it to 2.4 to 
unblock 2.3. Please revert back if you disagree. Thanks!

> Webhdfs returns incorrect http response codes for denied operations
> ---
>
> Key: HDFS-4564
> URL: https://issues.apache.org/jira/browse/HDFS-4564
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: webhdfs
>Affects Versions: 0.23.0, 2.0.0-alpha, 3.0.0
>Reporter: Daryn Sharp
>Assignee: Daryn Sharp
>Priority: Blocker
> Attachments: HDFS-4564.branch-23.patch, HDFS-4564.branch-23.patch, 
> HDFS-4564.branch-23.patch, HDFS-4564.patch
>
>
> Webhdfs is returning 401 (Unauthorized) instead of 403 (Forbidden) when it's 
> denying operations.  Examples including rejecting invalid proxy user attempts 
> and renew/cancel with an invalid user.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5921) Cannot browse file system via NN web UI if any directory has the sticky bit set

2014-02-10 Thread Haohui Mai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13896919#comment-13896919
 ] 

Haohui Mai commented on HDFS-5921:
--

{code}
+var otherExec = ((ctx.current().permission % 10) & 1) == 1;
+res = res.slice(0, res.length - 1) + (otherExec ? 't' : 'T');
{code}

You probably want to use {{substr}} instead of {{slice}}, as {{substr}} usually 
performs better than {{slice}} in this use case. 
(http://jsperf.com/string-slice-vs-substr). Here is an example:

{code}
var exec = ((ctx.current().permission % 10) & 1) == 1;
res = res.substr(0, res.length - 1) + (exec ? 't' : 'T');
{code}

> Cannot browse file system via NN web UI if any directory has the sticky bit 
> set
> ---
>
> Key: HDFS-5921
> URL: https://issues.apache.org/jira/browse/HDFS-5921
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.3.0
>Reporter: Aaron T. Myers
>Assignee: Aaron T. Myers
>Priority: Critical
> Attachments: HDFS-5921.patch
>
>
> You'll see an error like this in the JS console if any directory has the 
> sticky bit set:
> {noformat}
> 'helper_to_permission': function(chunk, ctx, bodies, params) {
> 
> var exec = ((parms.perm % 10) & 1) == 1;
> Uncaught ReferenceError: parms is not defined
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5915) Refactor FSImageFormatProtobuf to simplify cross section reads

2014-02-10 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13896916#comment-13896916
 ] 

Jing Zhao commented on HDFS-5915:
-

+1 pending Jenkins.

> Refactor FSImageFormatProtobuf to simplify cross section reads
> --
>
> Key: HDFS-5915
> URL: https://issues.apache.org/jira/browse/HDFS-5915
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Haohui Mai
>Assignee: Haohui Mai
> Attachments: HDFS-5915.000.patch, HDFS-5915.001.patch
>
>
> The PB-based FSImage puts the user name and the group name into a separate 
> section for deduplication. This jira refactors the code so that it is easier 
> to apply the same techniques for other types of data (e.g., 
> {{INodeReference}})



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5920) Support rollback of rolling upgrade in NameNode and JournalNodes

2014-02-10 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-5920:


Attachment: HDFS-5920.000.patch

Fixed some bugs and added a simple unit test covering the NN's local directory 
rollback. Still need to add tests for JNs' rollback.

> Support rollback of rolling upgrade in NameNode and JournalNodes
> 
>
> Key: HDFS-5920
> URL: https://issues.apache.org/jira/browse/HDFS-5920
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: journal-node, namenode
>Reporter: Jing Zhao
>Assignee: Jing Zhao
> Attachments: HDFS-5920.000.patch, HDFS-5920.000.patch
>
>
> This jira provides rollback functionality for NameNode and JournalNode in 
> rolling upgrade.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5921) Cannot browse file system via NN web UI if any directory has the sticky bit set

2014-02-10 Thread Aaron T. Myers (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers updated HDFS-5921:
-

Attachment: HDFS-5921.patch

Used the wrong file name for the patch.

> Cannot browse file system via NN web UI if any directory has the sticky bit 
> set
> ---
>
> Key: HDFS-5921
> URL: https://issues.apache.org/jira/browse/HDFS-5921
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.3.0
>Reporter: Aaron T. Myers
>Assignee: Aaron T. Myers
>Priority: Critical
> Attachments: HDFS-5921.patch
>
>
> You'll see an error like this in the JS console if any directory has the 
> sticky bit set:
> {noformat}
> 'helper_to_permission': function(chunk, ctx, bodies, params) {
> 
> var exec = ((parms.perm % 10) & 1) == 1;
> Uncaught ReferenceError: parms is not defined
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5921) Cannot browse file system via NN web UI if any directory has the sticky bit set

2014-02-10 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13896904#comment-13896904
 ] 

Hadoop QA commented on HDFS-5921:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org
  against trunk revision .

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6099//console

This message is automatically generated.

> Cannot browse file system via NN web UI if any directory has the sticky bit 
> set
> ---
>
> Key: HDFS-5921
> URL: https://issues.apache.org/jira/browse/HDFS-5921
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.3.0
>Reporter: Aaron T. Myers
>Assignee: Aaron T. Myers
>Priority: Critical
> Attachments: HDFS-5921.patch
>
>
> You'll see an error like this in the JS console if any directory has the 
> sticky bit set:
> {noformat}
> 'helper_to_permission': function(chunk, ctx, bodies, params) {
> 
> var exec = ((parms.perm % 10) & 1) == 1;
> Uncaught ReferenceError: parms is not defined
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5921) Cannot browse file system via NN web UI if any directory has the sticky bit set

2014-02-10 Thread Aaron T. Myers (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers updated HDFS-5921:
-

Attachment: (was: HDFS-5291.patch)

> Cannot browse file system via NN web UI if any directory has the sticky bit 
> set
> ---
>
> Key: HDFS-5921
> URL: https://issues.apache.org/jira/browse/HDFS-5921
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.3.0
>Reporter: Aaron T. Myers
>Assignee: Aaron T. Myers
>Priority: Critical
>
> You'll see an error like this in the JS console if any directory has the 
> sticky bit set:
> {noformat}
> 'helper_to_permission': function(chunk, ctx, bodies, params) {
> 
> var exec = ((parms.perm % 10) & 1) == 1;
> Uncaught ReferenceError: parms is not defined
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5916) provide API to bulk delete directories/files

2014-02-10 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13896902#comment-13896902
 ] 

Sergey Shelukhin commented on HDFS-5916:


1-2-3 are all up to you; for the case I have in mind it should operate like a 
sequence of regular deletes: for (1) probably best-effort, (2) no, (3) 
non-atomically. But that could be controlled by parameters.
4 - what do other operations do? As far as I recall some of them can recover.

Can you provide details on how to batch multiple RPC calls into one for this 
case?  We currently use the FileSystem/DistributedFileSystem interface.
The workaround wouldn't work, due to legacy users as well as the fact that the 
files/dirs are already under the same path; it's just that we don't want to 
delete all of them - e.g. from /path/A, /path/B, /path/C and /path/D we only 
want to delete B and D (of course with longer lists)
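
For reference, the workaround available today is a plain client-side loop over 
the public {{FileSystem}} API, one NameNode round trip per path; the sketch 
below (the NameNode URI and the paths are hypothetical) is exactly the pattern a 
bulk API would collapse into a single call:

{code}
import java.io.IOException;
import java.net.URI;
import java.util.Arrays;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

/** Client-side sketch: one delete RPC per path, which is what this jira wants to avoid. */
public class BulkDeleteToday {
  public static void main(String[] args) throws IOException {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:8020"), conf);

    // Hypothetical selection: only some of the children under /path go away.
    List<Path> doomed = Arrays.asList(new Path("/path/B"), new Path("/path/D"));

    for (Path p : doomed) {
      // recursive=true mirrors deleting whole partition/region directories;
      // each call is a separate round trip to the NameNode.
      boolean deleted = fs.delete(p, true);
      System.out.println(p + " deleted: " + deleted);
    }
  }
}
{code}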

> provide API to bulk delete directories/files
> 
>
> Key: HDFS-5916
> URL: https://issues.apache.org/jira/browse/HDFS-5916
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Sergey Shelukhin
>
> It would be nice to have an API to delete directories and files in bulk - for 
> example, when deleting Hive partitions or HBase regions in large numbers, the 
> code could avoid many trips to NN. 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5921) Cannot browse file system via NN web UI if any directory has the sticky bit set

2014-02-10 Thread Aaron T. Myers (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers updated HDFS-5921:
-

Target Version/s: 2.4.0
  Status: Patch Available  (was: Open)

> Cannot browse file system via NN web UI if any directory has the sticky bit 
> set
> ---
>
> Key: HDFS-5921
> URL: https://issues.apache.org/jira/browse/HDFS-5921
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.3.0
>Reporter: Aaron T. Myers
>Assignee: Aaron T. Myers
>Priority: Critical
> Attachments: HDFS-5291.patch
>
>
> You'll see an error like this in the JS console if any directory has the 
> sticky bit set:
> {noformat}
> 'helper_to_permission': function(chunk, ctx, bodies, params) {
> 
> var exec = ((parms.perm % 10) & 1) == 1;
> Uncaught ReferenceError: parms is not defined
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5921) Cannot browse file system via NN web UI if any directory has the sticky bit set

2014-02-10 Thread Aaron T. Myers (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers updated HDFS-5921:
-

Attachment: HDFS-5291.patch

Simple patch which fixes the issue. The code in question should never have been 
referencing the "params" variable at all, and the code that attempted to replace 
the last character of the string in the sticky-bit case did not in fact replace 
anything in the string. I tested this manually and it seems to work as intended.
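
The fix itself is JavaScript in the NN web UI's dust helper, but the intended 
logic can be mirrored in a short, self-contained Java sketch (purely 
illustrative; none of these names come from the patch): derive other-execute 
from the octal permission, and actually assign the string with its last 
character replaced by 't' or 'T' when the sticky bit is set.

{code}
/** Illustrative Java mirror of the sticky-bit display logic; the real fix is JavaScript. */
public class StickyBitDisplayDemo {
  private static final String[] SYMBOLS =
      {"---", "--x", "-w-", "-wx", "r--", "r-x", "rw-", "rwx"};

  /** octalPerm like 1777 or 755; the sticky flag is taken from the thousands digit. */
  static String toSymbolic(int octalPerm) {
    int perm = octalPerm % 1000;
    boolean sticky = (octalPerm / 1000) == 1;
    String res = SYMBOLS[perm / 100] + SYMBOLS[(perm / 10) % 10] + SYMBOLS[perm % 10];
    if (sticky) {
      boolean otherExec = ((perm % 10) & 1) == 1;
      // Strings are immutable, so the result must be assigned back; a
      // "replacement" whose result is dropped silently changes nothing.
      res = res.substring(0, res.length() - 1) + (otherExec ? 't' : 'T');
    }
    return res;
  }

  public static void main(String[] args) {
    System.out.println(toSymbolic(1777));  // rwxrwxrwt
    System.out.println(toSymbolic(1776));  // rwxrwxrwT
    System.out.println(toSymbolic(755));   // rwxr-xr-x
  }
}
{code}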

> Cannot browse file system via NN web UI if any directory has the sticky bit 
> set
> ---
>
> Key: HDFS-5921
> URL: https://issues.apache.org/jira/browse/HDFS-5921
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.3.0
>Reporter: Aaron T. Myers
>Assignee: Aaron T. Myers
>Priority: Critical
> Attachments: HDFS-5291.patch
>
>
> You'll see an error like this in the JS console if any directory has the 
> sticky bit set:
> {noformat}
> 'helper_to_permission': function(chunk, ctx, bodies, params) {
> 
> var exec = ((parms.perm % 10) & 1) == 1;
> Uncaught ReferenceError: parms is not defined
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5916) provide API to bulk delete directories/files

2014-02-10 Thread Haohui Mai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13896889#comment-13896889
 ] 

Haohui Mai commented on HDFS-5916:
--

I have a few questions:

# What would be the semantics of the call if one of the deletions fails?
# Should this operation be atomic?
# When should the changes propagate to other users?
# What should happen when the operation happens in the middle of an NN failover?

I can't think of good answers to any of these questions, so it looks to me that 
the semantics at the file system layer are unclear.

Maybe it is better to implement this as multiple RPC calls where the RPC 
messages are sent in the same packet. Alternatively, if you are able to put the 
files into a single directory then it might solve your problem :-)

> provide API to bulk delete directories/files
> 
>
> Key: HDFS-5916
> URL: https://issues.apache.org/jira/browse/HDFS-5916
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Sergey Shelukhin
>
> It would be nice to have an API to delete directories and files in bulk - for 
> example, when deleting Hive partitions or HBase regions in large numbers, the 
> code could avoid many trips to NN. 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5837) dfs.namenode.replication.considerLoad does not consider decommissioned nodes

2014-02-10 Thread Tao Luo (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13896883#comment-13896883
 ] 

Tao Luo commented on HDFS-5837:
---

Thanks Konstantin!

> dfs.namenode.replication.considerLoad does not consider decommissioned nodes
> 
>
> Key: HDFS-5837
> URL: https://issues.apache.org/jira/browse/HDFS-5837
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.0.0-alpha, 3.0.0, 2.0.6-alpha, 2.2.0
>Reporter: Bryan Beaudreault
>Assignee: Tao Luo
> Fix For: 2.3.0
>
> Attachments: HDFS-5837.patch, HDFS-5837_B.patch, HDFS-5837_C.patch, 
> HDFS-5837_C_branch_2.2.0.patch, HDFS-5837_branch_2.2.0.patch
>
>
> In DefaultBlockPlacementPolicy, there is a setting 
> dfs.namenode.replication.considerLoad which tries to balance the load of the 
> cluster when choosing replica locations.  This code does not take into 
> account decommissioned nodes.
> The code for considerLoad calculates the load by doing:  TotalClusterLoad / 
> numNodes.  However, numNodes includes decommissioned nodes (which have 0 
> load).  Therefore, the average load is artificially low.  Example:
> TotalLoad = 250
> numNodes = 100
> decommissionedNodes = 70
> remainingNodes = numNodes - decommissionedNodes = 30
> avgLoad = 250/100 = 2.50
> trueAvgLoad = 250 / 30 = 8.33
> If the real load of the remaining 30 nodes is (on average) 8.33, this is more 
> than 2x the calculated average load of 2.50.  This causes these nodes to be 
> rejected as replica locations. The final result is that all nodes are 
> rejected, and no replicas can be placed.  
> See exceptions printed from client during this scenario: 
> https://gist.github.com/bbeaudreault/49c8aa4bb231de54e9c1
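
A quick, self-contained check of the arithmetic above (plain Java, using the 
numbers from the example):

{code}
/** Reproduces the averages from the description above; toy numbers, not cluster code. */
public class ConsiderLoadMath {
  public static void main(String[] args) {
    double totalLoad = 250;
    int numNodes = 100;
    int decommissionedNodes = 70;
    int remainingNodes = numNodes - decommissionedNodes;   // 30

    double avgLoadAsComputed = totalLoad / numNodes;        // 2.50 (counts decommissioned nodes)
    double trueAvgLoad = totalLoad / remainingNodes;        // 8.33 (live nodes only)

    System.out.printf("computed avg = %.2f, true avg = %.2f%n",
        avgLoadAsComputed, trueAvgLoad);
    // A node is rejected when its load exceeds 2x the computed average
    // (2 * 2.50 = 5.0), so a typical live node at ~8.33 gets rejected.
    System.out.println("2x computed avg = " + (2 * avgLoadAsComputed));
  }
}
{code}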



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5915) Refactor FSImageFormatProtobuf to simplify cross section reads

2014-02-10 Thread Haohui Mai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haohui Mai updated HDFS-5915:
-

Attachment: HDFS-5915.001.patch

> Refactor FSImageFormatProtobuf to simplify cross section reads
> --
>
> Key: HDFS-5915
> URL: https://issues.apache.org/jira/browse/HDFS-5915
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Haohui Mai
>Assignee: Haohui Mai
> Attachments: HDFS-5915.000.patch, HDFS-5915.001.patch
>
>
> The PB-based FSImage puts the user name and the group name into a separate 
> section for deduplication. This jira refactors the code so that it is easier 
> to apply the same techniques for other types of data (e.g., 
> {{INodeReference}})



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5899) Add configuration flag to disable/enable support for ACLs.

2014-02-10 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13896858#comment-13896858
 ] 

Chris Nauroth commented on HDFS-5899:
-

bq. I agree that we should never wipe ACLs automatically. But what's the 
problem with just not enforcing them when dfs.namenode.acls.enabled is false? 
Why do we have to fail to start up? That seems like it will introduce problems 
for admins.

If ACLs were defined, but not enforced, then the cluster would be in a state of 
partial enforcement.  The traditional permission bits would be enforced, but 
the ACLs would be ignored during permission checks.  In all respects, it would 
appear to end users that they have set an ACL correctly on their file, but they 
wouldn't know that the rules aren't really being enforced.  This could open a 
risk of unauthorized access.  It's particularly dangerous when we consider that 
for an inode with an ACL, the group permission bits store the mask, not the 
group permissions.  The default setting of the mask is calculated as the union 
of permissions for all named user entries, named group entries, and the unnamed 
group entry in the ACL.  This union may be wider than the permissions intended 
for the file's group.
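
As a rough sketch of that mask calculation (the ACL entries here are made up, 
and this is not the NameNode's actual code path, just FsAction unions in plain 
Java):

{code}
import org.apache.hadoop.fs.permission.FsAction;

/** Sketch of the default mask calculation described above; the entries are hypothetical. */
public class DefaultMaskDemo {
  public static void main(String[] args) {
    // Hypothetical ACL on a file: user:alice:rw-, group:staff:r-x, group::r--
    FsAction namedUserAlice = FsAction.READ_WRITE;
    FsAction namedGroupStaff = FsAction.READ_EXECUTE;
    FsAction unnamedGroup = FsAction.READ;

    // Default mask = union of all named user entries, named group entries,
    // and the unnamed group entry.
    FsAction mask = namedUserAlice.or(namedGroupStaff).or(unnamedGroup);

    // The mask is what ends up in the group permission bits: rwx here, which
    // is wider than the r-- intended for the file's group.
    System.out.println("mask (stored in the group permission bits): " + mask.SYMBOL);
  }
}
{code}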

The combination of {{dfs.permissions.enabled=false}} + 
{{dfs.namenode.acls.enabled=true}} would work for deployments that want to 
allow setting of ACLs but skip enforcement (and also skip enforcement of 
permission bits).

The motivation for this patch was to provide a "feature flag".  (Sorry to bring 
that phrase up again and risk confusion with HDFS-5223, but it's the best 
description.)  An admin can leave this toggled off and be guaranteed that the 
feature is completely off, including no consumption of RAM or disk by ACLs.

Note that in order to reach this state, the admin must have toggled ACL support 
on in configuration at some point.  It's off by default, so turning it on was a 
conscious decision.  Then, the admin has a change of heart and decides to turn 
ACLs off, but meanwhile, a user snuck in with a setfacl.  I expect this to be a 
rare situation.

bq. How do you propose that the admin do this?

Our existing tools have it covered.  Startup with ACLs enabled.  Remove ACLs 
using setfacl -x.  There is a recursive option if it's necessary to remove from 
a whole sub-tree.  Enter safe mode.  Save a new checkpoint.  Restart with ACLs 
disabled.

> Add configuration flag to disable/enable support for ACLs.
> --
>
> Key: HDFS-5899
> URL: https://issues.apache.org/jira/browse/HDFS-5899
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Affects Versions: HDFS ACLs (HDFS-4685)
>Reporter: Chris Nauroth
>Assignee: Chris Nauroth
> Fix For: HDFS ACLs (HDFS-4685)
>
> Attachments: HDFS-5899.1.patch, HDFS-5899.2.patch
>
>
> Add a new configuration property that allows administrators to toggle support 
> for HDFS ACLs on/off.  By default, the flag will be off.  This is a 
> conservative choice, and administrators interested in using ACLs can enable 
> it explicitly.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5837) dfs.namenode.replication.considerLoad does not consider decommissioned nodes

2014-02-10 Thread Tao Luo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tao Luo updated HDFS-5837:
--

Attachment: HDFS-5837_C_branch_2.2.0.patch

> dfs.namenode.replication.considerLoad does not consider decommissioned nodes
> 
>
> Key: HDFS-5837
> URL: https://issues.apache.org/jira/browse/HDFS-5837
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.0.0-alpha, 3.0.0, 2.0.6-alpha, 2.2.0
>Reporter: Bryan Beaudreault
>Assignee: Tao Luo
> Fix For: 2.3.0
>
> Attachments: HDFS-5837.patch, HDFS-5837_B.patch, HDFS-5837_C.patch, 
> HDFS-5837_C_branch_2.2.0.patch, HDFS-5837_branch_2.2.0.patch
>
>
> In DefaultBlockPlacementPolicy, there is a setting 
> dfs.namenode.replication.considerLoad which tries to balance the load of the 
> cluster when choosing replica locations.  This code does not take into 
> account decommissioned nodes.
> The code for considerLoad calculates the load by doing:  TotalClusterLoad / 
> numNodes.  However, numNodes includes decommissioned nodes (which have 0 
> load).  Therefore, the average load is artificially low.  Example:
> TotalLoad = 250
> numNodes = 100
> decommissionedNodes = 70
> remainingNodes = numNodes - decommissionedNodes = 30
> avgLoad = 250/100 = 2.50
> trueAvgLoad = 250 / 30 = 8.33
> If the real load of the remaining 30 nodes is (on average) 8.33, this is more 
> than 2x the calculated average load of 2.50.  This causes these nodes to be 
> rejected as replica locations. The final result is that all nodes are 
> rejected, and no replicas can be placed.  
> See exceptions printed from client during this scenario: 
> https://gist.github.com/bbeaudreault/49c8aa4bb231de54e9c1



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5914) Incorporate ACLs with the changes from HDFS-5698

2014-02-10 Thread Haohui Mai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13896846#comment-13896846
 ] 

Haohui Mai commented on HDFS-5914:
--

You'll need to apply HDFS-5915 before this patch.

> Incorporate ACLs with the changes from HDFS-5698
> 
>
> Key: HDFS-5914
> URL: https://issues.apache.org/jira/browse/HDFS-5914
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client, namenode, security
>Affects Versions: HDFS ACLs (HDFS-4685)
>Reporter: Haohui Mai
>Assignee: Haohui Mai
> Attachments: HDFS-5914.000.patch
>
>
> HDFS-5698 uses protobuf to serialize the FSImage. The code needs to be 
> updated to work with these changes.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (HDFS-5921) Cannot browse file system via NN web UI if any directory has the sticky bit set

2014-02-10 Thread Aaron T. Myers (JIRA)
Aaron T. Myers created HDFS-5921:


 Summary: Cannot browse file system via NN web UI if any directory 
has the sticky bit set
 Key: HDFS-5921
 URL: https://issues.apache.org/jira/browse/HDFS-5921
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.3.0
Reporter: Aaron T. Myers
Assignee: Aaron T. Myers
Priority: Critical


You'll see an error like this in the JS console if any directory has the sticky 
bit set:

{noformat}
'helper_to_permission': function(chunk, ctx, bodies, params) {

var exec = ((parms.perm % 10) & 1) == 1;
Uncaught ReferenceError: parms is not defined
{noformat}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5920) Support rollback of rolling upgrade in NameNode and JournalNodes

2014-02-10 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13896844#comment-13896844
 ] 

Jing Zhao commented on HDFS-5920:
-

I will add unit tests later.

> Support rollback of rolling upgrade in NameNode and JournalNodes
> 
>
> Key: HDFS-5920
> URL: https://issues.apache.org/jira/browse/HDFS-5920
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: journal-node, namenode
>Reporter: Jing Zhao
>Assignee: Jing Zhao
> Attachments: HDFS-5920.000.patch
>
>
> This jira provides rollback functionality for NameNode and JournalNode in 
> rolling upgrade.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5915) Refactor FSImageFormatProtobuf to simplify cross section reads

2014-02-10 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13896841#comment-13896841
 ] 

Jing Zhao commented on HDFS-5915:
-

The patch looks pretty good to me. It would be better to have a unit test for 
the new Loader/SaverContext. +1 after addressing the comment.

> Refactor FSImageFormatProtobuf to simplify cross section reads
> --
>
> Key: HDFS-5915
> URL: https://issues.apache.org/jira/browse/HDFS-5915
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Haohui Mai
>Assignee: Haohui Mai
> Attachments: HDFS-5915.000.patch
>
>
> The PB-based FSImage puts the user name and the group name into a separate 
> section for deduplication. This jira refactors the code so that it is easier 
> to apply the same techniques for other types of data (e.g., 
> {{INodeReference}})



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5920) Support rollback of rolling upgrade in NameNode and JournalNodes

2014-02-10 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-5920:


Attachment: HDFS-5920.000.patch

Preliminary patch for review. The patch still depends on some functionality 
provided by HDFS-5889. 

For rollback (for rolling upgrade) in JNs, this patch simply adds a new RPC 
call "doRollingRollback" (doRollback is already used for rollback in the HA 
setup). This RollingRollback is idempotent and expects a response from all the 
JNs.

> Support rollback of rolling upgrade in NameNode and JournalNodes
> 
>
> Key: HDFS-5920
> URL: https://issues.apache.org/jira/browse/HDFS-5920
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: journal-node, namenode
>Reporter: Jing Zhao
>Assignee: Jing Zhao
> Attachments: HDFS-5920.000.patch
>
>
> This jira provides rollback functionality for NameNode and JournalNode in 
> rolling upgrade.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5920) Support rollback of rolling upgrade in NameNode and JournalNodes

2014-02-10 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13896828#comment-13896828
 ] 

Jing Zhao commented on HDFS-5920:
-

HDFS-5753 already defines the Rollback option for rolling upgrade. Users can 
use "-rollingUpgrade rollback" to start the NameNode and roll it back to the 
state it was in before the rolling upgrade started.

Since the NN does a checkpoint right before the upgrade marker, for rollback we 
only need to go back to that fsimage and delete all the editlog segments on or 
above (marker txid - 1). This editlog deletion should happen in both the NN's 
local directory and the shared storage (JNs for a QJM-based HA setup).

> Support rollback of rolling upgrade in NameNode and JournalNodes
> 
>
> Key: HDFS-5920
> URL: https://issues.apache.org/jira/browse/HDFS-5920
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: journal-node, namenode
>Reporter: Jing Zhao
>Assignee: Jing Zhao
>
> This jira provides rollback functionality for NameNode and JournalNode in 
> rolling upgrade.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5899) Add configuration flag to disable/enable support for ACLs.

2014-02-10 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13896821#comment-13896821
 ] 

Colin Patrick McCabe commented on HDFS-5899:


bq. dfs.permissions.enabled continues to work as expected, suppressing 
permission checks if set to false, whether the permissions are defined via 
permission bits or ACLs.
bq. The superuser is still immune to all permission checks, whether they come 
from permission bits or ACLs.
bq. If ACLs are not in use, then permission checks go through the exact same 
code path that we have in FSPermissionChecker today. We go down a separate path 
only if the inode has an ACL.

That makes sense to me.

bq. When ACLs are disabled, all APIs related to ACLs will fail intentionally, 
an fsimage containing an ACL will cause the NameNode to abort during startup, 
and ACLs present in the edit log will cause the NameNode to abort. 

bq. Existing ACLs never get wiped automatically. This recovery procedure is a 
conscious decision by the cluster admin.

I agree that we should never wipe ACLs automatically.  But what's the problem 
with just not enforcing them when {{dfs.namenode.acls.enabled}} is false?  Why 
do we have to fail to start up?  That seems like it will introduce problems for 
admins.

bq. If ACLs accidentally crept into the fsimage or edits (i.e. accidentally 
started with ACLs enabled, but now the admin wants to switch them off), then 
the recovery procedure would be to restart with ACLs enabled, remove all ACLs, 
save a new checkpoint, and then restart with ACLs disabled.

How do you propose that the admin do this?

> Add configuration flag to disable/enable support for ACLs.
> --
>
> Key: HDFS-5899
> URL: https://issues.apache.org/jira/browse/HDFS-5899
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Affects Versions: HDFS ACLs (HDFS-4685)
>Reporter: Chris Nauroth
>Assignee: Chris Nauroth
> Fix For: HDFS ACLs (HDFS-4685)
>
> Attachments: HDFS-5899.1.patch, HDFS-5899.2.patch
>
>
> Add a new configuration property that allows administrators to toggle support 
> for HDFS ACLs on/off.  By default, the flag will be off.  This is a 
> conservative choice, and administrators interested in using ACLs can enable 
> it explicitly.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (HDFS-5920) Support rollback of rolling upgrade in NameNode and JournalNodes

2014-02-10 Thread Jing Zhao (JIRA)
Jing Zhao created HDFS-5920:
---

 Summary: Support rollback of rolling upgrade in NameNode and 
JournalNodes
 Key: HDFS-5920
 URL: https://issues.apache.org/jira/browse/HDFS-5920
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: journal-node, namenode
Reporter: Jing Zhao
Assignee: Jing Zhao


This jira provides rollback functionality for NameNode and JournalNode in 
rolling upgrade.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5888) Cannot get the FileStatus of the root inode from the new Globber

2014-02-10 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13896814#comment-13896814
 ] 

Colin Patrick McCabe commented on HDFS-5888:


build failed because:
{code}
# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (malloc) failed to allocate 1078712 bytes for 
Chunk::new
# An error report file with more information is saved as:
# 
/home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/trunk/hs_err_pid17045.log
{code}

> Cannot get the FileStatus of the root inode from the new Globber
> 
>
> Key: HDFS-5888
> URL: https://issues.apache.org/jira/browse/HDFS-5888
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.3.0
>Reporter: Andrew Wang
>Assignee: Colin Patrick McCabe
> Attachments: HDFS-5888.002.patch
>
>
> We can no longer get the correct FileStatus of the root inode "/" from the 
> Globber.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Resolved] (HDFS-5805) TestCheckpoint.testCheckpoint fails intermittently on branch2

2014-02-10 Thread Mit Desai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mit Desai resolved HDFS-5805.
-

Resolution: Cannot Reproduce

I was not able to reproduce the test failure even a single time despite many 
attempts. Closing this for now.

> TestCheckpoint.testCheckpoint fails intermittently on branch2
> -
>
> Key: HDFS-5805
> URL: https://issues.apache.org/jira/browse/HDFS-5805
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.2.0
>Reporter: Mit Desai
>Assignee: Mit Desai
>
> {noformat}
> java.lang.AssertionError: Bad value for metric GetEditAvgTime
> Expected: gt(0.0)
>  got: <0.0>
>   at org.junit.Assert.assertThat(Assert.java:780)
>   at 
> org.apache.hadoop.test.MetricsAsserts.assertGaugeGt(MetricsAsserts.java:341)
>   at 
> org.apache.hadoop.hdfs.server.namenode.TestCheckpoint.testCheckpoint(TestCheckpoint.java:1070)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Work started] (HDFS-5805) TestCheckpoint.testCheckpoint fails intermittently on branch2

2014-02-10 Thread Mit Desai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HDFS-5805 started by Mit Desai.

> TestCheckpoint.testCheckpoint fails intermittently on branch2
> -
>
> Key: HDFS-5805
> URL: https://issues.apache.org/jira/browse/HDFS-5805
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.2.0
>Reporter: Mit Desai
>Assignee: Mit Desai
>
> {noformat}
> java.lang.AssertionError: Bad value for metric GetEditAvgTime
> Expected: gt(0.0)
>  got: <0.0>
>   at org.junit.Assert.assertThat(Assert.java:780)
>   at 
> org.apache.hadoop.test.MetricsAsserts.assertGaugeGt(MetricsAsserts.java:341)
>   at 
> org.apache.hadoop.hdfs.server.namenode.TestCheckpoint.testCheckpoint(TestCheckpoint.java:1070)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Work stopped] (HDFS-5805) TestCheckpoint.testCheckpoint fails intermittently on branch2

2014-02-10 Thread Mit Desai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HDFS-5805 stopped by Mit Desai.

> TestCheckpoint.testCheckpoint fails intermittently on branch2
> -
>
> Key: HDFS-5805
> URL: https://issues.apache.org/jira/browse/HDFS-5805
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.2.0
>Reporter: Mit Desai
>Assignee: Mit Desai
>
> {noformat}
> java.lang.AssertionError: Bad value for metric GetEditAvgTime
> Expected: gt(0.0)
>  got: <0.0>
>   at org.junit.Assert.assertThat(Assert.java:780)
>   at 
> org.apache.hadoop.test.MetricsAsserts.assertGaugeGt(MetricsAsserts.java:341)
>   at 
> org.apache.hadoop.hdfs.server.namenode.TestCheckpoint.testCheckpoint(TestCheckpoint.java:1070)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

