Re: Long running Spark job on YARN throws "No AMRMToken"

2016-02-13 Thread Steve Loughran

On 11 Feb 2016, at 15:24, Prabhu Joseph wrote:

Steve,


  When an application is submitted to the ResourceManager, AMLauncher creates
the token YARN_AM_RM_TOKEN (the token used between the RM and the AM). When the
ApplicationMaster is launched, it contacts the RM with a register request,
allocate requests to receive containers, and a finish request. In all of these
requests,

yes, see

https://github.com/steveloughran/hadoop-trunk/blob/HADOOP-12649-security/YARN-4653-yarn/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/YarnApplicationSecurity.md


the ResourceManager calls authorizeRequest, which checks whether the current
user has the token YARN_AM_RM_TOKEN; if not, it throws "No AMRMToken".

yes; prior to YARN-3103 it used the login user


   Every yarn.resourcemanager.am-rm-tokens.master-key-rolling-interval-sec, the
ResourceManager rolls the master key. Before rolling it, there is a period of
1.5 * yarn.am.liveness-monitor.expiry-interval-ms during which, if the AM
contacts the RM with an allocate request, the RM checks whether the AM's
YARN_AM_RM_TOKEN was prepared using the previous master key; if so, it updates
the AM user with a YARN_AM_RM_TOKEN prepared using the new master key.

 If the AM contacts the RM with a YARN_AM_RM_TOKEN that was constructed using
neither the current master key nor the previous one, an "Invalid AMRMToken"
message is thrown. This is the error that occurs if the AM has not been updated
with the new RM master key.
[YARN-3103 and YARN-2212]
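
As a rough illustration of the checks described above, here is a toy model in Python. It is not Hadoop code: ToyResourceManager, Token, TokenError and the integer key ids are all hypothetical names invented for this sketch, and the 1.5 * expiry-interval timing window is left out. It only shows how the two messages differ: "No AMRMToken" means the caller presented no token at all, while "Invalid AMRMToken" means the token was built from a key that is neither current nor previous.

```python
from dataclasses import dataclass
from typing import Optional


class TokenError(Exception):
    """Raised when the toy RM rejects a request."""


@dataclass(frozen=True)
class Token:
    key_id: int  # id of the master key this token was built from


class ToyResourceManager:
    """Hypothetical stand-in for the RM side of the AMRM token checks."""

    def __init__(self) -> None:
        self.current_key = 1
        self.previous_key: Optional[int] = None

    def issue_token(self) -> Token:
        # Tokens are always issued against the current master key.
        return Token(self.current_key)

    def roll_master_key(self) -> None:
        # Only the current key and the one before it are remembered.
        self.previous_key = self.current_key
        self.current_key += 1

    def allocate(self, token: Optional[Token]) -> Token:
        if token is None:
            # The caller has no token at all.
            raise TokenError("No AMRMToken")
        if token.key_id == self.current_key:
            return token
        if token.key_id == self.previous_key:
            # Token from the previous key: hand back a refreshed one.
            return self.issue_token()
        # Token from a key older than the previous one.
        raise TokenError("Invalid AMRMToken")
```

For example, an AM that keeps calling allocate between rolls always gets a refreshed token, whereas one that misses two consecutive rolls ends up with "Invalid AMRMToken". Under this model "No AMRMToken" never comes from key rolling, only from a caller with no token attached.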

I need your help finding a scenario where "No AMRMToken" can happen: a user was
added with a token, but later that token is missing. Is the token removed once
it has expired?


...or there's some confusion about the current user

I've got a Java class to help with credential creation and diagnostics, not yet
ported to Hadoop core, which can do some listing and dumping of credentials

https://github.com/apache/incubator-slider/blob/develop/slider-core/src/main/java/org/apache/slider/core/launch/CredentialUtils.java

You may be able to copy that code and use it to print out what tokens the
current user has; otherwise I don't know. I've never personally hit the message.


Re: Spark 1.6.1

2016-02-13 Thread Jong Wook Kim
Is 1.6.1 going to be ready this week? I see that the last two unresolved
issues targeting 1.6.1 are fixed now.

On 3 February 2016 at 08:16, Daniel Darabos <
daniel.dara...@lynxanalytics.com> wrote:

>
> On Tue, Feb 2, 2016 at 7:10 PM, Michael Armbrust wrote:
>
>>> What about the memory leak bug?
>>> https://issues.apache.org/jira/browse/SPARK-11293
>>> Even after the memory rewrite in 1.6.0, it still happens in some cases.
>>> Will it be fixed for 1.6.1?
>>>
>>
>> I think we have enough issues queued up that I would not hold the release
>> for that, but if there is a patch we should try and review it.  We can
>> always do 1.6.2 when more issues have been resolved.  Is this an actual
>> issue that is affecting a production workload or are we concerned about an
>> edge case?
>>
>
> The way we (Lynx Analytics) use RDDs, this affects almost everything we do
> in production. Thankfully it does not cause any issues, it just logs a lot
> of errors. I think the adverse effect may be that the memory manager does
> not have a fully correct picture. But as long as the leak fits in the
> "other" (unmanaged) memory fraction this will not cause issues. We don't
> see this as an urgent issue. Thanks!
>