Re: yarn usercache dir not resolved properly when running an example application

2019-02-21 Thread Vinay Kashyap
Yes Jeff, thanks again.
I was able to run a standalone TF training application with TensorBoard
in a Docker container. I will definitely set up passwordless ssh once I
start with distributed TF.



Re: yarn usercache dir not resolved properly when running an example application

2019-02-19 Thread Jeff Hubbs
Great, Vinay - I'm glad that made a difference. When you get to the 
point where you are running a cluster, the same sort of thing will have 
to carry over to all nodes, with the added issue that ssh and keys must 
be configured such that each of those users can shell to other nodes 
without supplying a password.
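
A rough way to check the passwordless requirement described above is to ask ssh to fail rather than prompt. The sketch below assumes OpenSSH is installed and uses its BatchMode and ConnectTimeout options; the user and host names in the usage comment are hypothetical. It tests the login of whichever local user runs it, so it would have to be run once as each service user.

```python
import subprocess


def build_ssh_check_command(user, host, timeout_seconds=5):
    """Build an ssh command that fails instead of prompting for a password."""
    return [
        "ssh",
        "-o", "BatchMode=yes",  # never prompt; fail if key auth does not work
        "-o", f"ConnectTimeout={timeout_seconds}",
        f"{user}@{host}",
        "true",  # trivial remote command, only the exit status matters
    ]


def can_ssh_without_password(user, host):
    """Return True if a non-interactive ssh session to user@host succeeds."""
    result = subprocess.run(
        build_ssh_check_command(user, host),
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


# Hypothetical usage, run as the yarn user on one node:
#   can_ssh_without_password("yarn", "worker2.example.com")
```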


Re: yarn usercache dir not resolved properly when running an example application

2019-02-18 Thread Vinay Kashyap
Perfect Jeff, I understand clearly.
After changing the setup to use the appropriate users and folder permissions,
I can see some progress.

Cheers.


Re: yarn usercache dir not resolved properly when running an example application

2019-02-14 Thread Jeff Hubbs

On 2/14/19 11:09 PM, Vinay Kashyap wrote:
> I am running hadoop on my mac and all the folders have *myuser:staff*
> as the owner. I have verified the permissions for the local dirs to be
> 755.


This doesn't sound right. By-the-book, there are supposed to be separate 
"users" for hdfs, yarn, and mapred to run their respective daemons. The 
directories they read/write in are supposed to be permed and owned to 
expect that. One possible approach for purposes of log-writing etc. is 
to put those user accounts in a group (perhaps named "hadoop") so that 
read/written areas in common are owned by that group and permed accordingly.


If you're going to ad-lib that arrangement then you'll have to ad-lib a 
lot of the rest of how worker nodes and edge nodes behave accordingly.
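
The shared-group arrangement described above can be checked with a short script. This is only a sketch: the group name "hadoop" and the user names hdfs, yarn, and mapred come from this message, while the function names are made up for illustration. It relies on Python's standard grp and pwd modules, so it runs on Unix-like systems only.

```python
import grp
import pwd


def users_missing_from_group(required_users, group_members, primary_group_users=()):
    """Return the required users that are not in the group, in input order."""
    in_group = set(group_members) | set(primary_group_users)
    return [user for user in required_users if user not in in_group]


def check_shared_group(group_name, required_users):
    """Look up the group on this machine and report which users are missing."""
    group = grp.getgrnam(group_name)  # raises KeyError if the group does not exist
    # gr_mem lists only supplementary members, so also collect the users
    # whose primary group is this one.
    primary = [p.pw_name for p in pwd.getpwall() if p.pw_gid == group.gr_gid]
    return users_missing_from_group(required_users, group.gr_mem, primary)


# Example call on a node:
#   check_shared_group("hadoop", ["hdfs", "yarn", "mapred"])
```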


> I run all hadoop services with myuser and I have configured
> *yarn.nodemanager.linux-container-executor.group=staff* accordingly
> both in *yarn-site.xml* and *container-executor.cfg*.
>
> 1. Is the container-executor binary certified to work as expected on
> OSX?
> 2. When the Linux container executor is configured, is there any hard
> expectation that the users running the hadoop services are among
> [*root, hdfs, yarn...*] and that the group is *hadoop*, so that the
> directory permissions fall in line accordingly?
>
> Can you please help me understand this? I could not find any write-up
> on this.



Re: yarn usercache dir not resolved properly when running an example application

2019-02-14 Thread Vinay Kashyap
I am running hadoop on my mac and all the folders have *myuser:staff* as
the owner. I have verified the permissions for the local dirs to be 755.
I run all hadoop services with myuser and I have configured
*yarn.nodemanager.linux-container-executor.group=staff* accordingly both
in *yarn-site.xml* and *container-executor.cfg*.

1. Is the container-executor binary certified to work as expected on OSX?
2. When the Linux container executor is configured, is there any hard
expectation that the users running the hadoop services are among [*root,
hdfs, yarn...*] and that the group is *hadoop*, so that the directory
permissions fall in line accordingly?

Can you please help me understand this? I could not find any write-up on
this.


Re: yarn usercache dir not resolved properly when running an example application

2019-02-14 Thread Prabhu Josephraj
In the case of a Distributed Shell job, the ApplicationMaster runs in a normal
Linux container and the subsequent shell command runs inside a Docker
container. The job fails even before launching the AM, that is, before
starting the Docker container. I think the Distributed Shell job will fail
even without the Docker settings.

Going by error code 20, it is most likely related to accessing the NM local
directory.

https://www.cloudera.com/documentation/enterprise/5-8-x/topics/cdh_sg_yarn_container_exec_errors.html

20   INITIALIZE_USER_FAILED   Couldn't get, stat, or secure the per-user NodeManager directory.
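
As a small aid for reading such diagnostics, the container-executor code can be pulled out of the message text and looked up. This is a sketch, not part of Hadoop: the function name is invented, and the table holds only the one code quoted in this thread (the linked page lists the others). The pattern matches the "exitCode=" form, not the separate "exitCode: -1000" container status.

```python
import re

# Only the code quoted in this thread; the linked Cloudera page lists the rest.
CONTAINER_EXECUTOR_ERRORS = {
    20: "INITIALIZE_USER_FAILED",
}


def container_executor_exit_code(diagnostics):
    """Return the integer after 'exitCode=' in a diagnostics string, or None."""
    match = re.search(r"exitCode=(-?\d+)", diagnostics)
    return int(match.group(1)) if match else None
```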

Can we try the steps below on every NodeManager machine?

Remove all contents under /data/yarn and make sure that /data and /data/yarn
have permission 755 with owner root:root, and that the local directory
is owned by yarn:hadoop.

[root@tparimi-tarunhdp26-4 ~]# ls -lrt /
drwxr-xr-x.   5 root root   44 Oct 24 11:47 data

[root@tparimi-tarunhdp26-4 ~]# ls -lrt /data/
drwxr-xr-x. 4 root  root   28 Oct 24 14:30 yarn

[root@tparimi-tarunhdp26-4 ~]# ls -lrt /data/yarn/
total 4
drwxr-xr-x.  5 yarn hadoop   54 Feb 14 17:32 local
drwxrwxr-x. 10 yarn hadoop 4096 Feb 14 17:32 log

Also check whether Distributed Shell jobs run fine without the Docker settings.
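
The ownership and mode checks above could also be scripted so they can be repeated on every NodeManager. The sketch below is an illustration only: the helper names are invented, and the expected values are copied from the `ls` listing in this message. It uses the standard os, pwd, grp, and stat modules and therefore assumes a Unix-like system.

```python
import grp
import os
import pwd
import stat


def describe_dir(path):
    """Return (owner, group, octal mode string) for a path, like `ls -ld`."""
    info = os.stat(path)
    try:
        owner = pwd.getpwuid(info.st_uid).pw_name
    except KeyError:
        owner = str(info.st_uid)  # uid without a passwd entry
    try:
        group = grp.getgrgid(info.st_gid).gr_name
    except KeyError:
        group = str(info.st_gid)  # gid without a group entry
    mode = format(stat.S_IMODE(info.st_mode), "03o")
    return owner, group, mode


def find_mismatches(expected):
    """Compare each path against its expected (owner, group, mode) tuple."""
    problems = []
    for path, want in expected.items():
        got = describe_dir(path)
        if got != want:
            problems.append((path, want, got))
    return problems


# Values taken from the listing above.
EXPECTED = {
    "/data": ("root", "root", "755"),
    "/data/yarn": ("root", "root", "755"),
    "/data/yarn/local": ("yarn", "hadoop", "755"),
    "/data/yarn/log": ("yarn", "hadoop", "775"),
}

# Example call on a NodeManager host:
#   for path, want, got in find_mismatches(EXPECTED):
#       print(path, "expected", want, "found", got)
```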






Re: yarn usercache dir not resolved properly when running an example application

2019-02-14 Thread Vinay Kashyap
Hi Prabhu,

Thanks for your reply.
I tried the configurations as per your suggestion, but I get the same error.
Is this related to container localization by any chance?
Also, is there any log or output that shows that the Docker
container runtime has been picked up?




-- 
*Thanks and regards*
*Vinay Kashyap*


Re: yarn usercache dir not resolved properly when running an example application

2019-02-14 Thread Prabhu Josephraj
Hi Vinay,

Can you try specifying the configs below under the Docker section of
container-executor.cfg? They allow Docker containers to use the NM
local dirs.

  docker.allowed.ro-mounts=/data/yarn/local,,/usr/jdk64/jdk1.8.0_112/bin
  docker.allowed.rw-mounts=/data/yarn/local,/data/yarn/log

Thanks,
Prabhu Joseph
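
A quick sanity check of such mount lists might look like the sketch below. It is not how container-executor itself parses the file: the function names are invented, and treating a directory as covered when it equals an allowed mount or lies beneath one is an assumption made for illustration. It also counts empty entries such as the one produced by the doubled comma in the ro-mounts line; whether container-executor tolerates those is not established in this thread.

```python
def parse_mount_list(line):
    """Split a 'key=path1,path2' line into (key, paths, empty_entry_count)."""
    key, _, value = line.strip().partition("=")
    entries = [entry.strip() for entry in value.split(",")]
    paths = [entry for entry in entries if entry]
    return key, paths, len(entries) - len(paths)


def uncovered_dirs(required_dirs, allowed_mounts):
    """Return required dirs that are neither an allowed mount nor below one."""
    def covered(path):
        return any(
            path == mount or path.startswith(mount.rstrip("/") + "/")
            for mount in allowed_mounts
        )
    return [path for path in required_dirs if not covered(path)]
```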

On Thu, Feb 14, 2019 at 9:28 PM Vinay Kashyap wrote:

>
> I am using Hadoop 3.2.0 and trying to run a simple application in a Docker
> container. I have made the required configuration changes in both
> *yarn-site.xml* and *container-executor.cfg* to choose
> LinuxContainerExecutor and the Docker runtime.
>
> I use the distributed shell example from one of the Hortonworks blog posts:
> https://hortonworks.com/blog/trying-containerized-applications-apache-hadoop-yarn-3-1/
>
> The problem I face is that when the application is submitted to YARN, it
> fails with a directory creation error, shown below:
>
> 2019-02-14 20:51:16,450 INFO distributedshell.Client: Got application
> report from ASM for, appId=2, clientToAMToken=null,
> appDiagnostics=Application application_1550156488785_0002 failed 2 times
> due to AM Container for appattempt_1550156488785_0002_02 exited with
> exitCode: -1000 Failing this attempt.Diagnostics: [2019-02-14
> 20:51:16.282]Application application_1550156488785_0002 initialization
> failed (exitCode=20) with output: main : command provided 0 main : user is
> myuser main : requested yarn user is myuser Failed to create directory
> /data/yarn/local/nmPrivate/container_1550156488785_0002_02_01.tokens/usercache/myuser
> - Not a directory
>
> I have configured *yarn.nodemanager.local-dirs* in yarn-site.xml and I
> can see the same reflected in YARN web ui *localhost:8088/conf*
>
> <property>
> <name>yarn.nodemanager.local-dirs</name>
> <value>/data/yarn/local</value>
> <final>false</final>
> <source>yarn-site.xml</source>
> </property>
>
> I do not understand why it is trying to create the usercache dir inside the
> nmPrivate directory.
>
> Note: I have verified the permissions for myuser on the directories and
> have also tried clearing the directories manually, as suggested in a related
> post, but with no success. I do not see any additional information about the
> container launch failure in any other logs.
>
> How do I debug why the usercache dir is not resolved properly?
>
> Really appreciate any help on this.
>
> Thanks
>
> Vinay Kashyap
>
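
One way to start debugging a "Not a directory" failure like the one quoted above is to find which component of the reported path exists but is a regular file. In the quoted error the usercache path is appended to a `.tokens` path, which would give exactly this symptom if the tokens entry is a file. The sketch below only locates that component; it does not explain why the path was assembled that way. The function name is invented for illustration.

```python
import os


def first_non_directory(path):
    """Return the shortest prefix of `path` that exists but is not a directory.

    Returns None if every existing component is a directory.
    """
    parts = os.path.abspath(path).split(os.sep)
    current = os.sep
    for part in parts[1:]:
        current = os.path.join(current, part)
        if not os.path.exists(current):
            return None  # nothing beyond this point exists yet
        if not os.path.isdir(current):
            return current
    return None
```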