[jira] [Commented] (HDFS-3876) NN should not RPC to self to find trash defaults (causes deadlock)

2012-09-05 Thread Hadoop QA (JIRA)

[ https://issues.apache.org/jira/browse/HDFS-3876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13448976#comment-13448976 ]

Hadoop QA commented on HDFS-3876:
-

+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12543870/hdfs-3876.txt
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified test 
files.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 eclipse:eclipse.  The patch built with eclipse:eclipse.

+1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed unit tests in 
hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/3148//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3148//console

This message is automatically generated.

> NN should not RPC to self to find trash defaults (causes deadlock)
> --
>
> Key: HDFS-3876
> URL: https://issues.apache.org/jira/browse/HDFS-3876
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 2.2.0-alpha
>Reporter: Todd Lipcon
>Assignee: Eli Collins
>Priority: Blocker
> Attachments: hdfs-3876.txt, hdfs-3876.txt, hdfs-3876.txt, 
> hdfs-3876.txt
>
>
> When transitioning an SBN to active, I ran into the following situation:
> - the TrashPolicy first gets loaded by an IPC Server Handler thread. The 
> {{initialize}} function then tries to make an RPC to the same node to find 
> out the defaults.
> - This is happening inside the NN write lock (since it's part of the active 
> initialization). Hence, all of the other handler threads are already blocked 
> waiting to get the NN lock.
> - Since no handler threads are free, the RPC blocks forever and the NN never 
> enters active state.
> We need to have a general policy that the NN should never make RPCs to itself 
> for any reason, due to potential for deadlocks like this.
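The deadlock described above is a form of thread-pool starvation, and its shape can be reproduced outside Hadoop. The following is a minimal Java sketch, not NameNode code: a fixed-size `ExecutorService` stands in for the IPC handler pool, a `ReentrantLock` stands in for the NN write lock, and the class and method names are made up for illustration. One handler takes the lock and then submits a "self-RPC" to the same pool and waits for it. When every other handler is parked on the lock, the self-RPC stays in the queue; the wait only ends because the sketch uses a timeout.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch: the pool plays the IPC handlers, the lock plays the NN write lock.
final class SelfRpcDeadlock {

    /** Returns true if the simulated self-RPC finished before the timeout. */
    static boolean selfRpcCompletes(int handlerCount, int blockedClients, long timeoutMs)
            throws Exception {
        if (blockedClients > handlerCount - 1) {
            throw new IllegalArgumentException("at most handlerCount - 1 blocked clients");
        }
        ExecutorService handlers = Executors.newFixedThreadPool(handlerCount);
        ReentrantLock nnLock = new ReentrantLock();
        CountDownLatch lockHeld = new CountDownLatch(1);
        CountDownLatch clientsRunning = new CountDownLatch(blockedClients);
        try {
            // Handler doing the standby-to-active transition while holding the lock.
            Future<Boolean> transition = handlers.submit(() -> {
                nnLock.lock();
                try {
                    lockHeld.countDown();
                    clientsRunning.await(); // the other busy handlers are now occupied
                    // "RPC to self": hand a request to the same pool and wait for it.
                    Future<String> rpc = handlers.submit(() -> "server defaults");
                    try {
                        rpc.get(timeoutMs, TimeUnit.MILLISECONDS);
                        return true;
                    } catch (TimeoutException e) {
                        return false; // no free handler could serve the call
                    }
                } finally {
                    nnLock.unlock();
                }
            });
            lockHeld.await();
            // Ordinary client calls that each take a handler and then wait for the lock.
            for (int i = 0; i < blockedClients; i++) {
                handlers.submit(() -> {
                    clientsRunning.countDown();
                    nnLock.lock(); // blocks until the transition releases the lock
                    nnLock.unlock();
                });
            }
            return transition.get();
        } finally {
            handlers.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(selfRpcCompletes(3, 2, 500)); // false: every handler is busy
        System.out.println(selfRpcCompletes(3, 1, 500)); // true: one handler is still free
    }
}
```

In this sketch, adding handlers would only make the window narrower, not remove it, since enough concurrent clients can still occupy every handler while the lock is held.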

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-3876) NN should not RPC to self to find trash defaults (causes deadlock)

2012-09-05 Thread Todd Lipcon (JIRA)

[ https://issues.apache.org/jira/browse/HDFS-3876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13448873#comment-13448873 ]

Todd Lipcon commented on HDFS-3876:
---

+1 pending Jenkins report



[jira] [Commented] (HDFS-3876) NN should not RPC to self to find trash defaults (causes deadlock)

2012-09-04 Thread Todd Lipcon (JIRA)

[ https://issues.apache.org/jira/browse/HDFS-3876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13447830#comment-13447830 ]

Todd Lipcon commented on HDFS-3876:
---

- When you clobber the trash interval in the configuration, you should do it on 
a copy, rather than modifying the config that the user passed in.
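The copy-before-modify suggestion can be illustrated with a plain `Map` standing in for Hadoop's `Configuration` (which offers a copy constructor, `new Configuration(conf)`, for the same purpose). The `fs.trash.interval` key is the standard Hadoop trash setting, in minutes; the class and method names here are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: a HashMap stands in for org.apache.hadoop.conf.Configuration.
final class TrashConfCopy {
    static final String FS_TRASH_INTERVAL_KEY = "fs.trash.interval";

    /** Returns a conf with the trash interval overridden; the caller's map is untouched. */
    static Map<String, String> withTrashInterval(Map<String, String> userConf, long minutes) {
        Map<String, String> copy = new HashMap<>(userConf); // defensive copy
        copy.put(FS_TRASH_INTERVAL_KEY, Long.toString(minutes));
        return copy;
    }

    public static void main(String[] args) {
        Map<String, String> userConf = new HashMap<>();
        userConf.put(FS_TRASH_INTERVAL_KEY, "60");
        Map<String, String> effective = withTrashInterval(userConf, 0);
        System.out.println(userConf.get(FS_TRASH_INTERVAL_KEY));  // 60
        System.out.println(effective.get(FS_TRASH_INTERVAL_KEY)); // 0
    }
}
```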

{code}
+  // If we can not determine that trash is enabled server side then
+  // bail rather than potentially deleting a file when trash is enabled.
+  System.err.println("Failed to determine server trash configuration: "
+  + e.getMessage());
+  return false;
{code}

This doesn't seem to be what happens. See the TODO in {{Delete.java}}:
{code}
  // TODO: if the user wants the trash to be used but there is any
  // problem (ie. creating the trash dir, moving the item to be deleted,
  // etc), then the path will just be deleted because moveToTrash returns
  // false and it falls thru to fs.delete.  this doesn't seem right
{code}

If you actually want it to bail, it should probably throw an exception, or 
return false but remove this comment and separately address the TODO mentioned 
above.
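One way to make "bail" actually mean bail, sketched under stated assumptions: the check throws instead of returning `false`, so a caller shaped like the `Delete` command cannot fall through to a permanent delete. `ServerTrashConfig`, `trashEnabled`, and `delete` are hypothetical names, not the classes in the patch or in `Delete.java`.

```java
import java.io.IOException;

// Hypothetical sketch: none of these names come from Hadoop or from the patch.
final class TrashOrFail {

    /** Stand-in for however the client learns the server's trash interval. */
    interface ServerTrashConfig {
        long trashIntervalMinutes() throws IOException;
    }

    /** Returns whether trash is enabled; throws rather than guessing when unsure. */
    static boolean trashEnabled(ServerTrashConfig server) throws IOException {
        try {
            return server.trashIntervalMinutes() > 0; // an interval of 0 disables trash
        } catch (IOException e) {
            throw new IOException("Failed to determine server trash configuration", e);
        }
    }

    /** Shell-style delete: an exception aborts the command before any permanent delete. */
    static String delete(String path, ServerTrashConfig server) {
        try {
            return trashEnabled(server) ? "moved " + path + " to trash" : "deleted " + path;
        } catch (IOException e) {
            return "rm: " + e.getMessage() + "; " + path + " left in place";
        }
    }

    public static void main(String[] args) {
        System.out.println(delete("/a", () -> 60L));
        System.out.println(delete("/a", () -> 0L));
        System.out.println(delete("/a", () -> { throw new IOException("connection refused"); }));
    }
}
```

The design choice is that "unknown" is treated as an error rather than as "trash disabled", which is the case the TODO in `Delete.java` worries about.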


{code}
+Configuration clientConf = new Configuration(serverConf);
+if (clientTrash) {
+  clientConf.setLong(FS_TRASH_INTERVAL_KEY, 200);
+}
{code}

Since you instantiate the client conf from {{serverConf}}, won't you end up 
with client trash enabled even if {{clientTrash}} is false?
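The `clientTrash` question can be checked with a small stand-in, assuming `serverConf` already has trash enabled (here `fs.trash.interval` = 100 minutes). A `HashMap` copy plays the role of `new Configuration(serverConf)`, and the method names are hypothetical. Because the copy inherits the server value, the `if (clientTrash)` branch can only turn client trash on, never off; setting the key explicitly in both branches avoids that.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: a HashMap stands in for org.apache.hadoop.conf.Configuration.
final class ClientConfFromServer {
    static final String FS_TRASH_INTERVAL_KEY = "fs.trash.interval";

    /** Mirrors the snippet under review: the server's interval leaks into the client. */
    static Map<String, String> buggyClientConf(Map<String, String> serverConf, boolean clientTrash) {
        Map<String, String> clientConf = new HashMap<>(serverConf);
        if (clientTrash) {
            clientConf.put(FS_TRASH_INTERVAL_KEY, "200");
        }
        return clientConf;
    }

    /** Sets the key in both branches, so nothing is inherited from serverConf. */
    static Map<String, String> fixedClientConf(Map<String, String> serverConf, boolean clientTrash) {
        Map<String, String> clientConf = new HashMap<>(serverConf);
        clientConf.put(FS_TRASH_INTERVAL_KEY, clientTrash ? "200" : "0"); // 0 disables trash
        return clientConf;
    }

    public static void main(String[] args) {
        Map<String, String> serverConf = new HashMap<>();
        serverConf.put(FS_TRASH_INTERVAL_KEY, "100"); // server-side trash enabled
        System.out.println(buggyClientConf(serverConf, false).get(FS_TRASH_INTERVAL_KEY)); // 100
        System.out.println(fixedClientConf(serverConf, false).get(FS_TRASH_INTERVAL_KEY)); // 0
    }
}
```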

> NN should not RPC to self to find trash defaults (causes deadlock)
> --
>
> Key: HDFS-3876
> URL: https://issues.apache.org/jira/browse/HDFS-3876
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 2.2.0-alpha
>Reporter: Todd Lipcon
>Assignee: Eli Collins
>Priority: Blocker
> Attachments: hdfs-3876.txt, hdfs-3876.txt, hdfs-3876.txt


[jira] [Commented] (HDFS-3876) NN should not RPC to self to find trash defaults (causes deadlock)

2012-09-01 Thread Hadoop QA (JIRA)

[ https://issues.apache.org/jira/browse/HDFS-3876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13446862#comment-13446862 ]

Hadoop QA commented on HDFS-3876:
-

+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12543455/hdfs-3876.txt
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified test 
files.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 eclipse:eclipse.  The patch built with eclipse:eclipse.

+1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed unit tests in 
hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/3138//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3138//console

> NN should not RPC to self to find trash defaults (causes deadlock)
> --
>
> Key: HDFS-3876
> URL: https://issues.apache.org/jira/browse/HDFS-3876
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 2.2.0-alpha
>Reporter: Todd Lipcon
>Assignee: Eli Collins
>Priority: Blocker
> Attachments: hdfs-3876.txt, hdfs-3876.txt, hdfs-3876.txt


[jira] [Commented] (HDFS-3876) NN should not RPC to self to find trash defaults (causes deadlock)

2012-09-01 Thread Hadoop QA (JIRA)

[ https://issues.apache.org/jira/browse/HDFS-3876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13446858#comment-13446858 ]

Hadoop QA commented on HDFS-3876:
-

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12543455/hdfs-3876.txt
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified test 
files.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 eclipse:eclipse.  The patch built with eclipse:eclipse.

+1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed these unit tests in 
hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs:

  org.apache.hadoop.hdfs.server.blockmanagement.TestBlocksWithNotEnoughRacks

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/3137//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3137//console

> NN should not RPC to self to find trash defaults (causes deadlock)
> --
>
> Key: HDFS-3876
> URL: https://issues.apache.org/jira/browse/HDFS-3876
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 2.2.0-alpha
>Reporter: Todd Lipcon
>Assignee: Eli Collins
>Priority: Blocker
> Attachments: hdfs-3876.txt, hdfs-3876.txt, hdfs-3876.txt


[jira] [Commented] (HDFS-3876) NN should not RPC to self to find trash defaults (causes deadlock)

2012-08-31 Thread Eli Collins (JIRA)

[ https://issues.apache.org/jira/browse/HDFS-3876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13446623#comment-13446623 ]

Eli Collins commented on HDFS-3876:
---

The TestViewFsTrash failure is related; I will upload a new patch.

> NN should not RPC to self to find trash defaults (causes deadlock)
> --
>
> Key: HDFS-3876
> URL: https://issues.apache.org/jira/browse/HDFS-3876
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 2.2.0-alpha
>Reporter: Todd Lipcon
>Assignee: Eli Collins
>Priority: Blocker
> Attachments: hdfs-3876.txt, hdfs-3876.txt


[jira] [Commented] (HDFS-3876) NN should not RPC to self to find trash defaults (causes deadlock)

2012-08-31 Thread Hadoop QA (JIRA)

[ https://issues.apache.org/jira/browse/HDFS-3876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13446611#comment-13446611 ]

Hadoop QA commented on HDFS-3876:
-

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12543377/hdfs-3876.txt
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 2 new or modified test 
files.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 eclipse:eclipse.  The patch built with eclipse:eclipse.

+1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed these unit tests in 
hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs:

  org.apache.hadoop.fs.viewfs.TestViewFsTrash
  org.apache.hadoop.ha.TestZKFailoverController
  org.apache.hadoop.hdfs.server.blockmanagement.TestBlockTokenWithDFS
  org.apache.hadoop.hdfs.server.namenode.metrics.TestNameNodeMetrics
  org.apache.hadoop.hdfs.TestPersistBlocks

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/3136//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3136//console

> NN should not RPC to self to find trash defaults (causes deadlock)
> --
>
> Key: HDFS-3876
> URL: https://issues.apache.org/jira/browse/HDFS-3876
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 2.2.0-alpha
>Reporter: Todd Lipcon
>Assignee: Eli Collins
>Priority: Blocker
> Attachments: hdfs-3876.txt, hdfs-3876.txt


[jira] [Commented] (HDFS-3876) NN should not RPC to self to find trash defaults (causes deadlock)

2012-08-31 Thread Aaron T. Myers (JIRA)

[ https://issues.apache.org/jira/browse/HDFS-3876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13446500#comment-13446500 ]

Aaron T. Myers commented on HDFS-3876:
--

Makes sense. I was just responding to the comment that "We need to have a 
general policy that the NN should never make RPCs to itself for any reason, due 
to potential for deadlocks like this."

> NN should not RPC to self to find trash defaults (causes deadlock)
> --
>
> Key: HDFS-3876
> URL: https://issues.apache.org/jira/browse/HDFS-3876
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 3.0.0, 2.2.0-alpha
>Reporter: Todd Lipcon
>Assignee: Eli Collins
>Priority: Blocker
> Attachments: hdfs-3876.txt


[jira] [Commented] (HDFS-3876) NN should not RPC to self to find trash defaults (causes deadlock)

2012-08-31 Thread Todd Lipcon (JIRA)

[ https://issues.apache.org/jira/browse/HDFS-3876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13446495#comment-13446495 ]

Todd Lipcon commented on HDFS-3876:
---

bq. Somewhat related: I believe that the trash emptier thread in the NN also 
makes RPCs to the NN to delete the appropriate files, in addition to 
configuring itself. Should we do something about that as well?

It would be a nice improvement, but it can't cause a deadlock, since it's not 
being done from an IPC handler thread. The issue here is that a handler thread 
is making an IPC back to its own server. So when there aren't enough free 
handler threads, that IPC can block forever, because the caller is itself 
occupying one of the handlers.
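The distinction between a handler thread and a background thread such as the trash emptier can be sketched the same way. In this hypothetical Java example (again a thread pool standing in for the IPC server, with made-up names), a single-handler pool serves a simulated RPC. When the caller is itself running on that handler, the request can never be picked up; when the caller is an ordinary thread outside the pool, the handler is free and the call completes.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch: a one-thread pool plays the IPC server's handlers.
final class HandlerSelfCall {

    /** Makes a simulated RPC into the handler pool from inside or outside that pool. */
    static boolean callCompletes(boolean callerIsHandler, long timeoutMs) throws Exception {
        ExecutorService handlers = Executors.newFixedThreadPool(1);
        try {
            Callable<Boolean> caller = () -> {
                Future<String> rpc = handlers.submit(() -> "deleted");
                try {
                    rpc.get(timeoutMs, TimeUnit.MILLISECONDS);
                    return true;
                } catch (TimeoutException e) {
                    return false; // the only handler is the caller itself
                }
            };
            if (callerIsHandler) {
                return handlers.submit(caller).get(); // like an IPC handler calling itself
            }
            ExecutorService emptier = Executors.newSingleThreadExecutor(); // like the emptier
            try {
                return emptier.submit(caller).get();
            } finally {
                emptier.shutdownNow();
            }
        } finally {
            handlers.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(callCompletes(true, 300));  // false
        System.out.println(callCompletes(false, 300)); // true
    }
}
```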

> NN should not RPC to self to find trash defaults (causes deadlock)
> --
>
> Key: HDFS-3876
> URL: https://issues.apache.org/jira/browse/HDFS-3876
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 3.0.0, 2.2.0-alpha
>Reporter: Todd Lipcon
>Assignee: Eli Collins
>Priority: Blocker
> Attachments: hdfs-3876.txt


[jira] [Commented] (HDFS-3876) NN should not RPC to self to find trash defaults (causes deadlock)

2012-08-31 Thread Aaron T. Myers (JIRA)

[ https://issues.apache.org/jira/browse/HDFS-3876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13446477#comment-13446477 ]

Aaron T. Myers commented on HDFS-3876:
--

Somewhat related: I believe that the trash emptier thread in the NN also makes 
RPCs to the NN to delete the appropriate files, in addition to configuring 
itself. Should we do something about that as well?

> NN should not RPC to self to find trash defaults (causes deadlock)
> --
>
> Key: HDFS-3876
> URL: https://issues.apache.org/jira/browse/HDFS-3876
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 3.0.0, 2.2.0-alpha
>Reporter: Todd Lipcon
>Assignee: Eli Collins
>Priority: Blocker
> Attachments: hdfs-3876.txt


[jira] [Commented] (HDFS-3876) NN should not RPC to self to find trash defaults (causes deadlock)

2012-08-30 Thread Eli Collins (JIRA)

[ https://issues.apache.org/jira/browse/HDFS-3876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13445531#comment-13445531 ]

Eli Collins commented on HDFS-3876:
---

I'll try to get a patch up tonight; if it's blocking you, I can revert it.

> NN should not RPC to self to find trash defaults (causes deadlock)
> --
>
> Key: HDFS-3876
> URL: https://issues.apache.org/jira/browse/HDFS-3876
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 3.0.0, 2.2.0-alpha
>Reporter: Todd Lipcon
>Assignee: Eli Collins
>Priority: Blocker