[jira] [Updated] (MAPREDUCE-6550) archive-logs tool changes log ownership to the Yarn user when using DefaultContainerExecutor

2015-11-25 Thread Robert Kanter (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Kanter updated MAPREDUCE-6550:
-
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 2.8.0
   Status: Resolved  (was: Patch Available)

Committed to trunk and branch-2!

> archive-logs tool changes log ownership to the Yarn user when using 
> DefaultContainerExecutor
> 
>
> Key: MAPREDUCE-6550
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6550
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 2.8.0
>Reporter: Robert Kanter
>Assignee: Robert Kanter
> Fix For: 2.8.0
>
> Attachments: MAPREDUCE-6550.001.patch, MAPREDUCE-6550.002.patch, 
> MAPREDUCE-6550.003.patch, MAPREDUCE-6550.004.patch, MAPREDUCE-6550.005.patch
>
>
> The archive-logs tool added in MAPREDUCE-6415 leverages the Distributed Shell 
> app.  When using the DefaultContainerExecutor, this means that the job will 
> actually run as the Yarn user, so the resulting har files are owned by the 
> Yarn user instead of the original owner. The permissions are also now 
> world-readable.
> In the below example, the archived logs are owned by 'yarn' instead of 'paul' 
> and are now world-readable:
> {noformat}
> [root@gs28-centos66-5 ~]# sudo -u hdfs hdfs dfs -ls -R /tmp/logs
> ...
> drwxrwx---   - paul  hadoop  0 2015-10-02 13:24 
> /tmp/logs/paul/logs/application_1443805425363_0005
> drwxr-xr-x   - yarn  hadoop  0 2015-10-02 13:24 
> /tmp/logs/paul/logs/application_1443805425363_0005/application_1443805425363_0005.har
> -rw-r--r--   3 yarn  hadoop  0 2015-10-02 13:24 
> /tmp/logs/paul/logs/application_1443805425363_0005/application_1443805425363_0005.har/_SUCCESS
> -rw-r--r--   3 yarn  hadoop   1256 2015-10-02 13:24 
> /tmp/logs/paul/logs/application_1443805425363_0005/application_1443805425363_0005.har/_index
> -rw-r--r--   3 yarn  hadoop 24 2015-10-02 13:24 
> /tmp/logs/paul/logs/application_1443805425363_0005/application_1443805425363_0005.har/_masterindex
> -rw-r--r--   3 yarn  hadoop8451177 2015-10-02 13:24 
> /tmp/logs/paul/logs/application_1443805425363_0005/application_1443805425363_0005.har/part-0
> drwxrwx---   - paul  hadoop  0 2015-10-02 13:24 
> /tmp/logs/paul/logs/application_1443805425363_0006
> -rw-r-   3 paul  hadoop   1155 2015-10-02 13:24 
> /tmp/logs/paul/logs/application_1443805425363_0006/gs-centos66-2.vpc.cloudera.com_8041
> -rw-r-   3 paul  hadoop   4880 2015-10-02 13:24 
> /tmp/logs/paul/logs/application_1443805425363_0006/gs28-centos66-3.vpc.cloudera.com_8041
> ...
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-6550) archive-logs tool changes log ownership to the Yarn user when using DefaultContainerExecutor

2015-11-24 Thread Robert Kanter (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Kanter updated MAPREDUCE-6550:
-
Attachment: MAPREDUCE-6550.005.patch

The 005 patch increases some of the test timeouts, including the one that 
failed (due to a timeout).

> archive-logs tool changes log ownership to the Yarn user when using 
> DefaultContainerExecutor
> 
>
> Key: MAPREDUCE-6550
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6550
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 2.8.0
>Reporter: Robert Kanter
>Assignee: Robert Kanter
> Attachments: MAPREDUCE-6550.001.patch, MAPREDUCE-6550.002.patch, 
> MAPREDUCE-6550.003.patch, MAPREDUCE-6550.004.patch, MAPREDUCE-6550.005.patch
>
>
> The archive-logs tool added in MAPREDUCE-6415 leverages the Distributed Shell 
> app.  When using the DefaultContainerExecutor, this means that the job will 
> actually run as the Yarn user, so the resulting har files are owned by the 
> Yarn user instead of the original owner. The permissions are also now 
> world-readable.
> In the below example, the archived logs are owned by 'yarn' instead of 'paul' 
> and are now world-readable:
> {noformat}
> [root@gs28-centos66-5 ~]# sudo -u hdfs hdfs dfs -ls -R /tmp/logs
> ...
> drwxrwx---   - paul  hadoop  0 2015-10-02 13:24 
> /tmp/logs/paul/logs/application_1443805425363_0005
> drwxr-xr-x   - yarn  hadoop  0 2015-10-02 13:24 
> /tmp/logs/paul/logs/application_1443805425363_0005/application_1443805425363_0005.har
> -rw-r--r--   3 yarn  hadoop  0 2015-10-02 13:24 
> /tmp/logs/paul/logs/application_1443805425363_0005/application_1443805425363_0005.har/_SUCCESS
> -rw-r--r--   3 yarn  hadoop   1256 2015-10-02 13:24 
> /tmp/logs/paul/logs/application_1443805425363_0005/application_1443805425363_0005.har/_index
> -rw-r--r--   3 yarn  hadoop 24 2015-10-02 13:24 
> /tmp/logs/paul/logs/application_1443805425363_0005/application_1443805425363_0005.har/_masterindex
> -rw-r--r--   3 yarn  hadoop8451177 2015-10-02 13:24 
> /tmp/logs/paul/logs/application_1443805425363_0005/application_1443805425363_0005.har/part-0
> drwxrwx---   - paul  hadoop  0 2015-10-02 13:24 
> /tmp/logs/paul/logs/application_1443805425363_0006
> -rw-r-   3 paul  hadoop   1155 2015-10-02 13:24 
> /tmp/logs/paul/logs/application_1443805425363_0006/gs-centos66-2.vpc.cloudera.com_8041
> -rw-r-   3 paul  hadoop   4880 2015-10-02 13:24 
> /tmp/logs/paul/logs/application_1443805425363_0006/gs28-centos66-3.vpc.cloudera.com_8041
> ...
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-6550) archive-logs tool changes log ownership to the Yarn user when using DefaultContainerExecutor

2015-11-24 Thread Robert Kanter (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Kanter updated MAPREDUCE-6550:
-
Attachment: MAPREDUCE-6550.004.patch

The 004 patch fixes the conf checkstyle thing.

> archive-logs tool changes log ownership to the Yarn user when using 
> DefaultContainerExecutor
> 
>
> Key: MAPREDUCE-6550
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6550
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 2.8.0
>Reporter: Robert Kanter
>Assignee: Robert Kanter
> Attachments: MAPREDUCE-6550.001.patch, MAPREDUCE-6550.002.patch, 
> MAPREDUCE-6550.003.patch, MAPREDUCE-6550.004.patch
>
>
> The archive-logs tool added in MAPREDUCE-6415 leverages the Distributed Shell 
> app.  When using the DefaultContainerExecutor, this means that the job will 
> actually run as the Yarn user, so the resulting har files are owned by the 
> Yarn user instead of the original owner. The permissions are also now 
> world-readable.
> In the below example, the archived logs are owned by 'yarn' instead of 'paul' 
> and are now world-readable:
> {noformat}
> [root@gs28-centos66-5 ~]# sudo -u hdfs hdfs dfs -ls -R /tmp/logs
> ...
> drwxrwx---   - paul  hadoop  0 2015-10-02 13:24 
> /tmp/logs/paul/logs/application_1443805425363_0005
> drwxr-xr-x   - yarn  hadoop  0 2015-10-02 13:24 
> /tmp/logs/paul/logs/application_1443805425363_0005/application_1443805425363_0005.har
> -rw-r--r--   3 yarn  hadoop  0 2015-10-02 13:24 
> /tmp/logs/paul/logs/application_1443805425363_0005/application_1443805425363_0005.har/_SUCCESS
> -rw-r--r--   3 yarn  hadoop   1256 2015-10-02 13:24 
> /tmp/logs/paul/logs/application_1443805425363_0005/application_1443805425363_0005.har/_index
> -rw-r--r--   3 yarn  hadoop 24 2015-10-02 13:24 
> /tmp/logs/paul/logs/application_1443805425363_0005/application_1443805425363_0005.har/_masterindex
> -rw-r--r--   3 yarn  hadoop8451177 2015-10-02 13:24 
> /tmp/logs/paul/logs/application_1443805425363_0005/application_1443805425363_0005.har/part-0
> drwxrwx---   - paul  hadoop  0 2015-10-02 13:24 
> /tmp/logs/paul/logs/application_1443805425363_0006
> -rw-r-   3 paul  hadoop   1155 2015-10-02 13:24 
> /tmp/logs/paul/logs/application_1443805425363_0006/gs-centos66-2.vpc.cloudera.com_8041
> -rw-r-   3 paul  hadoop   4880 2015-10-02 13:24 
> /tmp/logs/paul/logs/application_1443805425363_0006/gs28-centos66-3.vpc.cloudera.com_8041
> ...
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-6550) archive-logs tool changes log ownership to the Yarn user when using DefaultContainerExecutor

2015-11-20 Thread Robert Kanter (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Kanter updated MAPREDUCE-6550:
-
Attachment: MAPREDUCE-6550.003.patch

That's also a good idea, and should be more efficient too.

The 003 patch sets the umask for the 'hadoop archive' MR job to 027 so we get 
640 files and 750 dirs.

> archive-logs tool changes log ownership to the Yarn user when using 
> DefaultContainerExecutor
> 
>
> Key: MAPREDUCE-6550
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6550
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 2.8.0
>Reporter: Robert Kanter
>Assignee: Robert Kanter
> Attachments: MAPREDUCE-6550.001.patch, MAPREDUCE-6550.002.patch, 
> MAPREDUCE-6550.003.patch
>
>
> The archive-logs tool added in MAPREDUCE-6415 leverages the Distributed Shell 
> app.  When using the DefaultContainerExecutor, this means that the job will 
> actually run as the Yarn user, so the resulting har files are owned by the 
> Yarn user instead of the original owner. The permissions are also now 
> world-readable.
> In the below example, the archived logs are owned by 'yarn' instead of 'paul' 
> and are now world-readable:
> {noformat}
> [root@gs28-centos66-5 ~]# sudo -u hdfs hdfs dfs -ls -R /tmp/logs
> ...
> drwxrwx---   - paul  hadoop  0 2015-10-02 13:24 
> /tmp/logs/paul/logs/application_1443805425363_0005
> drwxr-xr-x   - yarn  hadoop  0 2015-10-02 13:24 
> /tmp/logs/paul/logs/application_1443805425363_0005/application_1443805425363_0005.har
> -rw-r--r--   3 yarn  hadoop  0 2015-10-02 13:24 
> /tmp/logs/paul/logs/application_1443805425363_0005/application_1443805425363_0005.har/_SUCCESS
> -rw-r--r--   3 yarn  hadoop   1256 2015-10-02 13:24 
> /tmp/logs/paul/logs/application_1443805425363_0005/application_1443805425363_0005.har/_index
> -rw-r--r--   3 yarn  hadoop 24 2015-10-02 13:24 
> /tmp/logs/paul/logs/application_1443805425363_0005/application_1443805425363_0005.har/_masterindex
> -rw-r--r--   3 yarn  hadoop8451177 2015-10-02 13:24 
> /tmp/logs/paul/logs/application_1443805425363_0005/application_1443805425363_0005.har/part-0
> drwxrwx---   - paul  hadoop  0 2015-10-02 13:24 
> /tmp/logs/paul/logs/application_1443805425363_0006
> -rw-r-   3 paul  hadoop   1155 2015-10-02 13:24 
> /tmp/logs/paul/logs/application_1443805425363_0006/gs-centos66-2.vpc.cloudera.com_8041
> -rw-r-   3 paul  hadoop   4880 2015-10-02 13:24 
> /tmp/logs/paul/logs/application_1443805425363_0006/gs28-centos66-3.vpc.cloudera.com_8041
> ...
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-6550) archive-logs tool changes log ownership to the Yarn user when using DefaultContainerExecutor

2015-11-19 Thread Robert Kanter (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Kanter updated MAPREDUCE-6550:
-
Attachment: MAPREDUCE-6550.002.patch

Thanks for taking a look Jason.  Those sound like good ideas.

The 002 patch fixes checkstyle warnings, sets the sticky bit on the working 
dir, and adds a {{-noProxy}} option.

> archive-logs tool changes log ownership to the Yarn user when using 
> DefaultContainerExecutor
> 
>
> Key: MAPREDUCE-6550
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6550
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 2.8.0
>Reporter: Robert Kanter
>Assignee: Robert Kanter
> Attachments: MAPREDUCE-6550.001.patch, MAPREDUCE-6550.002.patch
>
>
> The archive-logs tool added in MAPREDUCE-6415 leverages the Distributed Shell 
> app.  When using the DefaultContainerExecutor, this means that the job will 
> actually run as the Yarn user, so the resulting har files are owned by the 
> Yarn user instead of the original owner. The permissions are also now 
> world-readable.
> In the below example, the archived logs are owned by 'yarn' instead of 'paul' 
> and are now world-readable:
> {noformat}
> [root@gs28-centos66-5 ~]# sudo -u hdfs hdfs dfs -ls -R /tmp/logs
> ...
> drwxrwx---   - paul  hadoop  0 2015-10-02 13:24 
> /tmp/logs/paul/logs/application_1443805425363_0005
> drwxr-xr-x   - yarn  hadoop  0 2015-10-02 13:24 
> /tmp/logs/paul/logs/application_1443805425363_0005/application_1443805425363_0005.har
> -rw-r--r--   3 yarn  hadoop  0 2015-10-02 13:24 
> /tmp/logs/paul/logs/application_1443805425363_0005/application_1443805425363_0005.har/_SUCCESS
> -rw-r--r--   3 yarn  hadoop   1256 2015-10-02 13:24 
> /tmp/logs/paul/logs/application_1443805425363_0005/application_1443805425363_0005.har/_index
> -rw-r--r--   3 yarn  hadoop 24 2015-10-02 13:24 
> /tmp/logs/paul/logs/application_1443805425363_0005/application_1443805425363_0005.har/_masterindex
> -rw-r--r--   3 yarn  hadoop8451177 2015-10-02 13:24 
> /tmp/logs/paul/logs/application_1443805425363_0005/application_1443805425363_0005.har/part-0
> drwxrwx---   - paul  hadoop  0 2015-10-02 13:24 
> /tmp/logs/paul/logs/application_1443805425363_0006
> -rw-r-   3 paul  hadoop   1155 2015-10-02 13:24 
> /tmp/logs/paul/logs/application_1443805425363_0006/gs-centos66-2.vpc.cloudera.com_8041
> -rw-r-   3 paul  hadoop   4880 2015-10-02 13:24 
> /tmp/logs/paul/logs/application_1443805425363_0006/gs28-centos66-3.vpc.cloudera.com_8041
> ...
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-6550) archive-logs tool changes log ownership to the Yarn user when using DefaultContainerExecutor

2015-11-18 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated MAPREDUCE-6550:

Description: 
The archive-logs tool added in MAPREDUCE-6415 leverages the Distributed Shell 
app.  When using the DefaultContainerExecutor, this means that the job will 
actually run as the Yarn user, so the resulting har files are owned by the Yarn 
user instead of the original owner. The permissions are also now world-readable.

In the below example, the archived logs are owned by 'yarn' instead of 'paul' 
and are now world-readable:
{noformat}
[root@gs28-centos66-5 ~]# sudo -u hdfs hdfs dfs -ls -R /tmp/logs
...
drwxrwx---   - paul  hadoop  0 2015-10-02 13:24 
/tmp/logs/paul/logs/application_1443805425363_0005
drwxr-xr-x   - yarn  hadoop  0 2015-10-02 13:24 
/tmp/logs/paul/logs/application_1443805425363_0005/application_1443805425363_0005.har
-rw-r--r--   3 yarn  hadoop  0 2015-10-02 13:24 
/tmp/logs/paul/logs/application_1443805425363_0005/application_1443805425363_0005.har/_SUCCESS
-rw-r--r--   3 yarn  hadoop   1256 2015-10-02 13:24 
/tmp/logs/paul/logs/application_1443805425363_0005/application_1443805425363_0005.har/_index
-rw-r--r--   3 yarn  hadoop 24 2015-10-02 13:24 
/tmp/logs/paul/logs/application_1443805425363_0005/application_1443805425363_0005.har/_masterindex
-rw-r--r--   3 yarn  hadoop8451177 2015-10-02 13:24 
/tmp/logs/paul/logs/application_1443805425363_0005/application_1443805425363_0005.har/part-0
drwxrwx---   - paul  hadoop  0 2015-10-02 13:24 
/tmp/logs/paul/logs/application_1443805425363_0006
-rw-r-   3 paul  hadoop   1155 2015-10-02 13:24 
/tmp/logs/paul/logs/application_1443805425363_0006/gs-centos66-2.vpc.cloudera.com_8041
-rw-r-   3 paul  hadoop   4880 2015-10-02 13:24 
/tmp/logs/paul/logs/application_1443805425363_0006/gs28-centos66-3.vpc.cloudera.com_8041
...
{noformat}

  was:
The archive-logs tool added in MAPREDUCE-6415 leverages the Distributed Shell 
app.  When using the DistributedContainerExecutor, this means that the job will 
actually run as the Yarn user, so the resulting har files are owned by the Yarn 
user instead of the original owner.  The permissions are also now 
world-readable.

In the below example, the archived logs are owned by 'yarn' instead of 'paul' 
and are now world-readable:
{noformat}
[root@gs28-centos66-5 ~]# sudo -u hdfs hdfs dfs -ls -R /tmp/logs
...
drwxrwx---   - paul  hadoop  0 2015-10-02 13:24 
/tmp/logs/paul/logs/application_1443805425363_0005
drwxr-xr-x   - yarn  hadoop  0 2015-10-02 13:24 
/tmp/logs/paul/logs/application_1443805425363_0005/application_1443805425363_0005.har
-rw-r--r--   3 yarn  hadoop  0 2015-10-02 13:24 
/tmp/logs/paul/logs/application_1443805425363_0005/application_1443805425363_0005.har/_SUCCESS
-rw-r--r--   3 yarn  hadoop   1256 2015-10-02 13:24 
/tmp/logs/paul/logs/application_1443805425363_0005/application_1443805425363_0005.har/_index
-rw-r--r--   3 yarn  hadoop 24 2015-10-02 13:24 
/tmp/logs/paul/logs/application_1443805425363_0005/application_1443805425363_0005.har/_masterindex
-rw-r--r--   3 yarn  hadoop8451177 2015-10-02 13:24 
/tmp/logs/paul/logs/application_1443805425363_0005/application_1443805425363_0005.har/part-0
drwxrwx---   - paul  hadoop  0 2015-10-02 13:24 
/tmp/logs/paul/logs/application_1443805425363_0006
-rw-r-   3 paul  hadoop   1155 2015-10-02 13:24 
/tmp/logs/paul/logs/application_1443805425363_0006/gs-centos66-2.vpc.cloudera.com_8041
-rw-r-   3 paul  hadoop   4880 2015-10-02 13:24 
/tmp/logs/paul/logs/application_1443805425363_0006/gs28-centos66-3.vpc.cloudera.com_8041
...
{noformat}


> archive-logs tool changes log ownership to the Yarn user when using 
> DefaultContainerExecutor
> 
>
> Key: MAPREDUCE-6550
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6550
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 2.8.0
>Reporter: Robert Kanter
>Assignee: Robert Kanter
> Attachments: MAPREDUCE-6550.001.patch
>
>
> The archive-logs tool added in MAPREDUCE-6415 leverages the Distributed Shell 
> app.  When using the DefaultContainerExecutor, this means that the job will 
> actually run as the Yarn user, so the resulting har files are owned by the 
> Yarn user instead of the original owner. The permissions are also now 
> world-readable.
> In the below example, the archived logs are owned by 'yarn' instead of 'paul' 
> and are now world-readable:
> {noformat}
> [root@gs28-centos66-5 ~]# sudo -u hdfs hdfs dfs -ls -R /tmp/logs
> ...
> drwxrwx---   - paul  hadoop  0 2015-10-02 13:24 
> /tmp/logs/paul/logs/application_1443805425363_0005
> drwxr-xr-x   - yarn  

[jira] [Updated] (MAPREDUCE-6550) archive-logs tool changes log ownership to the Yarn user when using DefaultContainerExecutor

2015-11-17 Thread Robert Kanter (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Kanter updated MAPREDUCE-6550:
-
Attachment: MAPREDUCE-6550.001.patch

The patch fixes the user problem by using UGI to proxy as the correct user.  It 
fixes the permissions problem by setting the correct permissions on the files.  
Other than those changes, the bulk of the changes in the patch are due to 
moving some things around and indenting.

I've updated the unit tests to check for the permissions and also verified in a 
cluster that it behaves correctly.

> archive-logs tool changes log ownership to the Yarn user when using 
> DefaultContainerExecutor
> 
>
> Key: MAPREDUCE-6550
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6550
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 2.8.0
>Reporter: Robert Kanter
>Assignee: Robert Kanter
> Attachments: MAPREDUCE-6550.001.patch
>
>
> The archive-logs tool added in MAPREDUCE-6415 leverages the Distributed Shell 
> app.  When using the DistributedContainerExecutor, this means that the job 
> will actually run as the Yarn user, so the resulting har files are owned by 
> the Yarn user instead of the original owner.  The permissions are also now 
> world-readable.
> In the below example, the archived logs are owned by 'yarn' instead of 'paul' 
> and are now world-readable:
> {noformat}
> [root@gs28-centos66-5 ~]# sudo -u hdfs hdfs dfs -ls -R /tmp/logs
> ...
> drwxrwx---   - paul  hadoop  0 2015-10-02 13:24 
> /tmp/logs/paul/logs/application_1443805425363_0005
> drwxr-xr-x   - yarn  hadoop  0 2015-10-02 13:24 
> /tmp/logs/paul/logs/application_1443805425363_0005/application_1443805425363_0005.har
> -rw-r--r--   3 yarn  hadoop  0 2015-10-02 13:24 
> /tmp/logs/paul/logs/application_1443805425363_0005/application_1443805425363_0005.har/_SUCCESS
> -rw-r--r--   3 yarn  hadoop   1256 2015-10-02 13:24 
> /tmp/logs/paul/logs/application_1443805425363_0005/application_1443805425363_0005.har/_index
> -rw-r--r--   3 yarn  hadoop 24 2015-10-02 13:24 
> /tmp/logs/paul/logs/application_1443805425363_0005/application_1443805425363_0005.har/_masterindex
> -rw-r--r--   3 yarn  hadoop8451177 2015-10-02 13:24 
> /tmp/logs/paul/logs/application_1443805425363_0005/application_1443805425363_0005.har/part-0
> drwxrwx---   - paul  hadoop  0 2015-10-02 13:24 
> /tmp/logs/paul/logs/application_1443805425363_0006
> -rw-r-   3 paul  hadoop   1155 2015-10-02 13:24 
> /tmp/logs/paul/logs/application_1443805425363_0006/gs-centos66-2.vpc.cloudera.com_8041
> -rw-r-   3 paul  hadoop   4880 2015-10-02 13:24 
> /tmp/logs/paul/logs/application_1443805425363_0006/gs28-centos66-3.vpc.cloudera.com_8041
> ...
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-6550) archive-logs tool changes log ownership to the Yarn user when using DefaultContainerExecutor

2015-11-17 Thread Robert Kanter (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Kanter updated MAPREDUCE-6550:
-
Status: Patch Available  (was: Open)

> archive-logs tool changes log ownership to the Yarn user when using 
> DefaultContainerExecutor
> 
>
> Key: MAPREDUCE-6550
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6550
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 2.8.0
>Reporter: Robert Kanter
>Assignee: Robert Kanter
> Attachments: MAPREDUCE-6550.001.patch
>
>
> The archive-logs tool added in MAPREDUCE-6415 leverages the Distributed Shell 
> app.  When using the DistributedContainerExecutor, this means that the job 
> will actually run as the Yarn user, so the resulting har files are owned by 
> the Yarn user instead of the original owner.  The permissions are also now 
> world-readable.
> In the below example, the archived logs are owned by 'yarn' instead of 'paul' 
> and are now world-readable:
> {noformat}
> [root@gs28-centos66-5 ~]# sudo -u hdfs hdfs dfs -ls -R /tmp/logs
> ...
> drwxrwx---   - paul  hadoop  0 2015-10-02 13:24 
> /tmp/logs/paul/logs/application_1443805425363_0005
> drwxr-xr-x   - yarn  hadoop  0 2015-10-02 13:24 
> /tmp/logs/paul/logs/application_1443805425363_0005/application_1443805425363_0005.har
> -rw-r--r--   3 yarn  hadoop  0 2015-10-02 13:24 
> /tmp/logs/paul/logs/application_1443805425363_0005/application_1443805425363_0005.har/_SUCCESS
> -rw-r--r--   3 yarn  hadoop   1256 2015-10-02 13:24 
> /tmp/logs/paul/logs/application_1443805425363_0005/application_1443805425363_0005.har/_index
> -rw-r--r--   3 yarn  hadoop 24 2015-10-02 13:24 
> /tmp/logs/paul/logs/application_1443805425363_0005/application_1443805425363_0005.har/_masterindex
> -rw-r--r--   3 yarn  hadoop8451177 2015-10-02 13:24 
> /tmp/logs/paul/logs/application_1443805425363_0005/application_1443805425363_0005.har/part-0
> drwxrwx---   - paul  hadoop  0 2015-10-02 13:24 
> /tmp/logs/paul/logs/application_1443805425363_0006
> -rw-r-   3 paul  hadoop   1155 2015-10-02 13:24 
> /tmp/logs/paul/logs/application_1443805425363_0006/gs-centos66-2.vpc.cloudera.com_8041
> -rw-r-   3 paul  hadoop   4880 2015-10-02 13:24 
> /tmp/logs/paul/logs/application_1443805425363_0006/gs28-centos66-3.vpc.cloudera.com_8041
> ...
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)