[ 
https://issues.apache.org/jira/browse/HDFS-3876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eli Collins updated HDFS-3876:
------------------------------

    Attachment: hdfs-3876.txt

I considered these approaches:

1. Modify the current check to not use the server defaults if called from 
the NN context. We could try to detect this or plumb a boolean through the 
various constructors. Both are a pain, because this code lives in common, 
there are a fair number of paths, and it would be easy to miss a new one.

2. In FsShell, get the server defaults at shell initialization time and 
clobber the client config with the server config if set. This is simple and 
has the benefit of retrieving the server defaults only once rather than per 
path item. However, operations can span NNs and each NN may have a different 
trash config, so this option doesn't work.

3. Approach #2, but wait to get the server config until we figure out the 
right NN for the path.

I prefer the last one. Here's a patch that implements it; I still need to 
update the tests to match.
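
For illustration only, approach 3 could be sketched roughly as below. This is 
not the code in hdfs-3876.txt: the class LazyTrashDefaults, the method 
intervalFor, and the fetch function that stands in for the RPC to a NN are 
all hypothetical names. It also assumes that a server value greater than 
zero counts as "set".

```java
import java.net.URI;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical sketch: names are illustrative, not taken from the patch.
final class LazyTrashDefaults {
    private final long clientIntervalMin;
    // Stands in for the RPC that asks one NN for its server defaults.
    private final Function<String, Long> fetchServerIntervalMin;
    // Cached answer per NN authority, so a NN that answered is not asked again.
    private final Map<String, Long> perNn = new ConcurrentHashMap<>();

    LazyTrashDefaults(long clientIntervalMin, Function<String, Long> fetch) {
        this.clientIntervalMin = clientIntervalMin;
        this.fetchServerIntervalMin = fetch;
    }

    // Resolve the trash interval only once the NN for this path is known.
    long intervalFor(URI path) {
        String nn = path.getAuthority();
        if (nn == null) {
            // No NN in the path: nothing to ask, keep the client config.
            return clientIntervalMin;
        }
        Long server = perNn.computeIfAbsent(nn, fetchServerIntervalMin);
        // Assumption: a positive server value means "set" and wins.
        return (server != null && server > 0) ? server : clientIntervalMin;
    }
}
```

Because the lookup is keyed by NN, a command spanning two NNs gets each NN's 
own trash config, while repeated paths on the same NN reuse the cached value.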
                
> NN should not RPC to self to find trash defaults (causes deadlock)
> ------------------------------------------------------------------
>
>                 Key: HDFS-3876
>                 URL: https://issues.apache.org/jira/browse/HDFS-3876
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: name-node
>    Affects Versions: 3.0.0, 2.2.0-alpha
>            Reporter: Todd Lipcon
>            Assignee: Eli Collins
>            Priority: Blocker
>         Attachments: hdfs-3876.txt
>
>
> When transitioning a SBN to active, I ran into the following situation:
> - the TrashPolicy first gets loaded by an IPC Server Handler thread. The 
> {{initialize}} function then tries to make an RPC to the same node to find 
> out the defaults.
> - This is happening inside the NN write lock (since it's part of the active 
> initialization). Hence, all of the other handler threads are already blocked 
> waiting to get the NN lock.
> - Since no handler threads are free, the RPC blocks forever and the NN never 
> enters active state.
> We need to have a general policy that the NN should never make RPCs to itself 
> for any reason, due to potential for deadlocks like this.
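
The deadlock described above can be reproduced in miniature with a plain 
thread pool. This is a toy model, not NN code: SelfRpcDeadlock, 
selfCallStalls, and the "server defaults" string are hypothetical, the fixed 
pool stands in for the IPC handler threads, and a ReentrantLock stands in 
for the NN write lock. A timeout replaces "blocks forever" so the sketch 
terminates.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical toy model of the self-RPC deadlock; not Hadoop code.
final class SelfRpcDeadlock {
    // Returns true if the call back into the same pool timed out.
    static boolean selfCallStalls(int handlers, long timeoutMs) {
        ExecutorService pool = Executors.newFixedThreadPool(handlers);
        ReentrantLock nnLock = new ReentrantLock();
        CountDownLatch lockHeld = new CountDownLatch(1);
        CountDownLatch othersStarted = new CountDownLatch(handlers - 1);
        try {
            // Handler A: takes the lock, then "RPCs to itself" via the pool.
            Future<Boolean> a = pool.submit(() -> {
                nnLock.lock();
                try {
                    lockHeld.countDown();
                    othersStarted.await(); // every other handler is occupied
                    Future<String> selfRpc = pool.submit(() -> "server defaults");
                    try {
                        selfRpc.get(timeoutMs, TimeUnit.MILLISECONDS);
                        return false; // a handler was free to serve the call
                    } catch (TimeoutException e) {
                        selfRpc.cancel(false);
                        return true; // no free handler: the call never ran
                    }
                } finally {
                    nnLock.unlock();
                }
            });
            lockHeld.await();
            // Remaining handlers: all end up blocked waiting for the lock.
            for (int i = 1; i < handlers; i++) {
                pool.submit(() -> {
                    othersStarted.countDown();
                    nnLock.lock();
                    nnLock.unlock();
                });
            }
            return a.get();
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdownNow();
        }
    }
}
```

In this model the self-call is queued behind handlers that cannot finish 
until the caller releases the lock, and the caller does not release the lock 
until the self-call returns.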

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira