[jira] [Updated] (MAPREDUCE-6550) archive-logs tool changes log ownership to the Yarn user when using DefaultContainerExecutor
[ https://issues.apache.org/jira/browse/MAPREDUCE-6550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Kanter updated MAPREDUCE-6550: - Attachment: MAPREDUCE-6550.002.patch Thanks for taking a look Jason. Those sound like good ideas. The 002 patch fixes checkstyle warnings, sets the sticky bit on the working dir, and adds a {{-noProxy}} option. > archive-logs tool changes log ownership to the Yarn user when using > DefaultContainerExecutor > > > Key: MAPREDUCE-6550 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6550 > Project: Hadoop Map/Reduce > Issue Type: Bug >Affects Versions: 2.8.0 >Reporter: Robert Kanter >Assignee: Robert Kanter > Attachments: MAPREDUCE-6550.001.patch, MAPREDUCE-6550.002.patch > > > The archive-logs tool added in MAPREDUCE-6415 leverages the Distributed Shell > app. When using the DefaultContainerExecutor, this means that the job will > actually run as the Yarn user, so the resulting har files are owned by the > Yarn user instead of the original owner. The permissions are also now > world-readable. > In the below example, the archived logs are owned by 'yarn' instead of 'paul' > and are now world-readable: > {noformat} > [root@gs28-centos66-5 ~]# sudo -u hdfs hdfs dfs -ls -R /tmp/logs > ... 
> drwxrwx--- - paul hadoop 0 2015-10-02 13:24 > /tmp/logs/paul/logs/application_1443805425363_0005 > drwxr-xr-x - yarn hadoop 0 2015-10-02 13:24 > /tmp/logs/paul/logs/application_1443805425363_0005/application_1443805425363_0005.har > -rw-r--r-- 3 yarn hadoop 0 2015-10-02 13:24 > /tmp/logs/paul/logs/application_1443805425363_0005/application_1443805425363_0005.har/_SUCCESS > -rw-r--r-- 3 yarn hadoop 1256 2015-10-02 13:24 > /tmp/logs/paul/logs/application_1443805425363_0005/application_1443805425363_0005.har/_index > -rw-r--r-- 3 yarn hadoop 24 2015-10-02 13:24 > /tmp/logs/paul/logs/application_1443805425363_0005/application_1443805425363_0005.har/_masterindex > -rw-r--r-- 3 yarn hadoop 8451177 2015-10-02 13:24 > /tmp/logs/paul/logs/application_1443805425363_0005/application_1443805425363_0005.har/part-0 > drwxrwx--- - paul hadoop 0 2015-10-02 13:24 > /tmp/logs/paul/logs/application_1443805425363_0006 > -rw-r----- 3 paul hadoop 1155 2015-10-02 13:24 > /tmp/logs/paul/logs/application_1443805425363_0006/gs-centos66-2.vpc.cloudera.com_8041 > -rw-r----- 3 paul hadoop 4880 2015-10-02 13:24 > /tmp/logs/paul/logs/application_1443805425363_0006/gs28-centos66-3.vpc.cloudera.com_8041 > ... > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
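The 002 patch's sticky-bit change can be sketched as octal bit arithmetic: with the sticky bit set on the shared working dir, any user can create entries there but only delete their own. A minimal sketch (the constant names and helper are illustrative, not from the patch; the actual code would pass an equivalent mode to Hadoop's FsPermission):

```java
public class StickyBitSketch {
    // Illustrative octal mode bits (hypothetical names, standard POSIX values).
    static final short STICKY = 01000;          // sticky bit
    static final short OWNER_GROUP_RWX = 0770;  // rwx for owner and group only

    // Mode for a shared working dir: group-writable, sticky.
    static short workingDirMode() {
        return (short) (STICKY | OWNER_GROUP_RWX); // octal 1770
    }

    public static void main(String[] args) {
        System.out.println(Integer.toOctalString(workingDirMode())); // prints 1770
    }
}
```

The sticky bit is what lets multiple users share the directory without deleting each other's files, which matches the multi-user scenario from MAPREDUCE-6494.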
[jira] [Created] (MAPREDUCE-6550) archive-logs tool changes log ownership to the Yarn user when using DefaultContainerExecutor
Robert Kanter created MAPREDUCE-6550: Summary: archive-logs tool changes log ownership to the Yarn user when using DefaultContainerExecutor Key: MAPREDUCE-6550 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6550 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 2.8.0 Reporter: Robert Kanter Assignee: Robert Kanter The archive-logs tool added in MAPREDUCE-6415 leverages the Distributed Shell app. When using the DefaultContainerExecutor, this means that the job will actually run as the Yarn user, so the resulting har files are owned by the Yarn user instead of the original owner. The permissions are also now world-readable. In the below example, the archived logs are owned by 'yarn' instead of 'paul' and are now world-readable: {noformat} [root@gs28-centos66-5 ~]# sudo -u hdfs hdfs dfs -ls -R /tmp/logs ... drwxrwx--- - paul hadoop 0 2015-10-02 13:24 /tmp/logs/paul/logs/application_1443805425363_0005 drwxr-xr-x - yarn hadoop 0 2015-10-02 13:24 /tmp/logs/paul/logs/application_1443805425363_0005/application_1443805425363_0005.har -rw-r--r-- 3 yarn hadoop 0 2015-10-02 13:24 /tmp/logs/paul/logs/application_1443805425363_0005/application_1443805425363_0005.har/_SUCCESS -rw-r--r-- 3 yarn hadoop 1256 2015-10-02 13:24 /tmp/logs/paul/logs/application_1443805425363_0005/application_1443805425363_0005.har/_index -rw-r--r-- 3 yarn hadoop 24 2015-10-02 13:24 /tmp/logs/paul/logs/application_1443805425363_0005/application_1443805425363_0005.har/_masterindex -rw-r--r-- 3 yarn hadoop 8451177 2015-10-02 13:24 /tmp/logs/paul/logs/application_1443805425363_0005/application_1443805425363_0005.har/part-0 drwxrwx--- - paul hadoop 0 2015-10-02 13:24 /tmp/logs/paul/logs/application_1443805425363_0006 -rw-r----- 3 paul hadoop 1155 2015-10-02 13:24 /tmp/logs/paul/logs/application_1443805425363_0006/gs-centos66-2.vpc.cloudera.com_8041 -rw-r----- 3 paul hadoop 4880 2015-10-02 13:24 /tmp/logs/paul/logs/application_1443805425363_0006/gs28-centos66-3.vpc.cloudera.com_8041 ... 
{noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6550) archive-logs tool changes log ownership to the Yarn user when using DefaultContainerExecutor
[ https://issues.apache.org/jira/browse/MAPREDUCE-6550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Kanter updated MAPREDUCE-6550: - Attachment: MAPREDUCE-6550.001.patch The patch fixes the user problem by using UGI to proxy as the correct user. It fixes the permissions problem by setting the correct permissions on the files. Other than those changes, the bulk of the changes in the patch are due to moving some things around and indenting. I've updated the unit tests to check for the permissions and also verified in a cluster that it behaves correctly. > archive-logs tool changes log ownership to the Yarn user when using > DefaultContainerExecutor > > > Key: MAPREDUCE-6550 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6550 > Project: Hadoop Map/Reduce > Issue Type: Bug >Affects Versions: 2.8.0 >Reporter: Robert Kanter >Assignee: Robert Kanter > Attachments: MAPREDUCE-6550.001.patch > > > The archive-logs tool added in MAPREDUCE-6415 leverages the Distributed Shell > app. When using the DefaultContainerExecutor, this means that the job > will actually run as the Yarn user, so the resulting har files are owned by > the Yarn user instead of the original owner. The permissions are also now > world-readable. > In the below example, the archived logs are owned by 'yarn' instead of 'paul' > and are now world-readable: > {noformat} > [root@gs28-centos66-5 ~]# sudo -u hdfs hdfs dfs -ls -R /tmp/logs > ... 
> drwxrwx--- - paul hadoop 0 2015-10-02 13:24 > /tmp/logs/paul/logs/application_1443805425363_0005 > drwxr-xr-x - yarn hadoop 0 2015-10-02 13:24 > /tmp/logs/paul/logs/application_1443805425363_0005/application_1443805425363_0005.har > -rw-r--r-- 3 yarn hadoop 0 2015-10-02 13:24 > /tmp/logs/paul/logs/application_1443805425363_0005/application_1443805425363_0005.har/_SUCCESS > -rw-r--r-- 3 yarn hadoop 1256 2015-10-02 13:24 > /tmp/logs/paul/logs/application_1443805425363_0005/application_1443805425363_0005.har/_index > -rw-r--r-- 3 yarn hadoop 24 2015-10-02 13:24 > /tmp/logs/paul/logs/application_1443805425363_0005/application_1443805425363_0005.har/_masterindex > -rw-r--r-- 3 yarn hadoop 8451177 2015-10-02 13:24 > /tmp/logs/paul/logs/application_1443805425363_0005/application_1443805425363_0005.har/part-0 > drwxrwx--- - paul hadoop 0 2015-10-02 13:24 > /tmp/logs/paul/logs/application_1443805425363_0006 > -rw-r----- 3 paul hadoop 1155 2015-10-02 13:24 > /tmp/logs/paul/logs/application_1443805425363_0006/gs-centos66-2.vpc.cloudera.com_8041 > -rw-r----- 3 paul hadoop 4880 2015-10-02 13:24 > /tmp/logs/paul/logs/application_1443805425363_0006/gs28-centos66-3.vpc.cloudera.com_8041 > ... > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
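For context on the proxying approach: Hadoop's UserGroupInformation exposes createProxyUser plus a doAs method that runs an action under another user's identity, so the har files end up owned by the log owner rather than the submitter. A shape-only sketch of that run-as pattern (the ProxyUser type here is entirely hypothetical; the real UGI attaches credentials and enforces proxy-user authorization on the cluster):

```java
import java.util.concurrent.Callable;

public class DoAsSketch {
    // Hypothetical stand-in for UserGroupInformation: carries a user name and
    // runs an action "as" that user. The real UGI also carries Kerberos
    // tickets/delegation tokens and checks proxy-user ACLs.
    record ProxyUser(String name) {
        <T> T doAs(Callable<T> action) throws Exception {
            // In Hadoop this would execute inside the proxy user's security context.
            return action.call();
        }
    }

    public static void main(String[] args) throws Exception {
        // Like UserGroupInformation.createProxyUser("paul", loginUser).
        ProxyUser paul = new ProxyUser("paul");
        String owner = paul.doAs(() -> paul.name() + " owns the har files");
        System.out.println(owner); // prints: paul owns the har files
    }
}
```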
[jira] [Updated] (MAPREDUCE-6550) archive-logs tool changes log ownership to the Yarn user when using DefaultContainerExecutor
[ https://issues.apache.org/jira/browse/MAPREDUCE-6550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Kanter updated MAPREDUCE-6550: - Status: Patch Available (was: Open) > archive-logs tool changes log ownership to the Yarn user when using > DefaultContainerExecutor > > > Key: MAPREDUCE-6550 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6550 > Project: Hadoop Map/Reduce > Issue Type: Bug >Affects Versions: 2.8.0 >Reporter: Robert Kanter >Assignee: Robert Kanter > Attachments: MAPREDUCE-6550.001.patch > > > The archive-logs tool added in MAPREDUCE-6415 leverages the Distributed Shell > app. When using the DefaultContainerExecutor, this means that the job > will actually run as the Yarn user, so the resulting har files are owned by > the Yarn user instead of the original owner. The permissions are also now > world-readable. > In the below example, the archived logs are owned by 'yarn' instead of 'paul' > and are now world-readable: > {noformat} > [root@gs28-centos66-5 ~]# sudo -u hdfs hdfs dfs -ls -R /tmp/logs > ... 
> drwxrwx--- - paul hadoop 0 2015-10-02 13:24 > /tmp/logs/paul/logs/application_1443805425363_0005 > drwxr-xr-x - yarn hadoop 0 2015-10-02 13:24 > /tmp/logs/paul/logs/application_1443805425363_0005/application_1443805425363_0005.har > -rw-r--r-- 3 yarn hadoop 0 2015-10-02 13:24 > /tmp/logs/paul/logs/application_1443805425363_0005/application_1443805425363_0005.har/_SUCCESS > -rw-r--r-- 3 yarn hadoop 1256 2015-10-02 13:24 > /tmp/logs/paul/logs/application_1443805425363_0005/application_1443805425363_0005.har/_index > -rw-r--r-- 3 yarn hadoop 24 2015-10-02 13:24 > /tmp/logs/paul/logs/application_1443805425363_0005/application_1443805425363_0005.har/_masterindex > -rw-r--r-- 3 yarn hadoop 8451177 2015-10-02 13:24 > /tmp/logs/paul/logs/application_1443805425363_0005/application_1443805425363_0005.har/part-0 > drwxrwx--- - paul hadoop 0 2015-10-02 13:24 > /tmp/logs/paul/logs/application_1443805425363_0006 > -rw-r----- 3 paul hadoop 1155 2015-10-02 13:24 > /tmp/logs/paul/logs/application_1443805425363_0006/gs-centos66-2.vpc.cloudera.com_8041 > -rw-r----- 3 paul hadoop 4880 2015-10-02 13:24 > /tmp/logs/paul/logs/application_1443805425363_0006/gs28-centos66-3.vpc.cloudera.com_8041 > ... > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6495) Docs for archive-logs tool
[ https://issues.apache.org/jira/browse/MAPREDUCE-6495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Kanter updated MAPREDUCE-6495: - Attachment: MAPREDUCE-6495.002.patch The 002 patch has the super minor change, for reference. > Docs for archive-logs tool > -- > > Key: MAPREDUCE-6495 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6495 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: documentation >Affects Versions: 2.8.0 >Reporter: Robert Kanter >Assignee: Robert Kanter > Fix For: 2.8.0 > > Attachments: MAPREDUCE-6495.001.patch, MAPREDUCE-6495.002.patch > > > Write documentation for the 'mapred archive-logs' tool added in > MAPREDUCE-6415. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6495) Docs for archive-logs tool
[ https://issues.apache.org/jira/browse/MAPREDUCE-6495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Kanter updated MAPREDUCE-6495: - Resolution: Fixed Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Thanks for the review Anubhav. I made the change to the committed patch. Committed to trunk and branch-2! > Docs for archive-logs tool > -- > > Key: MAPREDUCE-6495 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6495 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: documentation >Affects Versions: 2.8.0 >Reporter: Robert Kanter >Assignee: Robert Kanter > Attachments: MAPREDUCE-6495.001.patch, MAPREDUCE-6495.002.patch > > > Write documentation for the 'mapred archive-logs' tool added in > MAPREDUCE-6415. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6495) Docs for archive-logs tool
[ https://issues.apache.org/jira/browse/MAPREDUCE-6495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Kanter updated MAPREDUCE-6495: - Fix Version/s: 2.8.0 > Docs for archive-logs tool > -- > > Key: MAPREDUCE-6495 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6495 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: documentation >Affects Versions: 2.8.0 >Reporter: Robert Kanter >Assignee: Robert Kanter > Fix For: 2.8.0 > > Attachments: MAPREDUCE-6495.001.patch, MAPREDUCE-6495.002.patch > > > Write documentation for the 'mapred archive-logs' tool added in > MAPREDUCE-6415. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6503) archive-logs tool should use HADOOP_PREFIX instead of HADOOP_HOME
[ https://issues.apache.org/jira/browse/MAPREDUCE-6503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Kanter updated MAPREDUCE-6503: - Status: Patch Available (was: Open) > archive-logs tool should use HADOOP_PREFIX instead of HADOOP_HOME > - > > Key: MAPREDUCE-6503 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6503 > Project: Hadoop Map/Reduce > Issue Type: Bug >Affects Versions: 2.8.0 >Reporter: Robert Kanter >Assignee: Robert Kanter > Attachments: MAPREDUCE-6503.001.patch > > > The archive-logs tool currently uses {{HADOOP_HOME}} in the distributed shell > job. It should instead use {{HADOOP_PREFIX}} due to > HADOOP-12451/HADOOP-12456. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MAPREDUCE-6503) archive-logs tool should use HADOOP_PREFIX instead of HADOOP_HOME
Robert Kanter created MAPREDUCE-6503: Summary: archive-logs tool should use HADOOP_PREFIX instead of HADOOP_HOME Key: MAPREDUCE-6503 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6503 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 2.8.0 Reporter: Robert Kanter Assignee: Robert Kanter The archive-logs tool currently uses {{HADOOP_HOME}} in the distributed shell job. It should instead use {{HADOOP_PREFIX}} due to HADOOP-12451/HADOOP-12456. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6503) archive-logs tool should use HADOOP_PREFIX instead of HADOOP_HOME
[ https://issues.apache.org/jira/browse/MAPREDUCE-6503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14945787#comment-14945787 ] Robert Kanter commented on MAPREDUCE-6503: -- Sorry; I forgot to mention that. I verified that {{$HADOOP_PREFIX}} is in the distributed shell's shell and also ran the {{archive-logs}} command against some logs. > archive-logs tool should use HADOOP_PREFIX instead of HADOOP_HOME > - > > Key: MAPREDUCE-6503 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6503 > Project: Hadoop Map/Reduce > Issue Type: Bug >Affects Versions: 2.8.0 >Reporter: Robert Kanter >Assignee: Robert Kanter > Attachments: MAPREDUCE-6503.001.patch > > > The archive-logs tool currently uses {{HADOOP_HOME}} in the distributed shell > job. It should instead use {{HADOOP_PREFIX}} due to > HADOOP-12451/HADOOP-12456. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6503) archive-logs tool should use HADOOP_PREFIX instead of HADOOP_HOME
[ https://issues.apache.org/jira/browse/MAPREDUCE-6503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14945910#comment-14945910 ] Robert Kanter commented on MAPREDUCE-6503: -- The checkstyle warning is a line that's too long. It's the comment containing the example script, so that's fine. The release audit warning is a missing Apache header in an unrelated file. > archive-logs tool should use HADOOP_PREFIX instead of HADOOP_HOME > - > > Key: MAPREDUCE-6503 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6503 > Project: Hadoop Map/Reduce > Issue Type: Bug >Affects Versions: 2.8.0 >Reporter: Robert Kanter >Assignee: Robert Kanter > Attachments: MAPREDUCE-6503.001.patch > > > The archive-logs tool currently uses {{HADOOP_HOME}} in the distributed shell > job. It should instead use {{HADOOP_PREFIX}} due to > HADOOP-12451/HADOOP-12456. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6503) archive-logs tool should use HADOOP_PREFIX instead of HADOOP_HOME
[ https://issues.apache.org/jira/browse/MAPREDUCE-6503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Kanter updated MAPREDUCE-6503: - Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: 2.8.0 Status: Resolved (was: Patch Available) Thanks for the review Karthik. Committed to trunk and branch-2! > archive-logs tool should use HADOOP_PREFIX instead of HADOOP_HOME > - > > Key: MAPREDUCE-6503 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6503 > Project: Hadoop Map/Reduce > Issue Type: Bug >Affects Versions: 2.8.0 >Reporter: Robert Kanter >Assignee: Robert Kanter > Fix For: 2.8.0 > > Attachments: MAPREDUCE-6503.001.patch > > > The archive-logs tool currently uses {{HADOOP_HOME}} in the distributed shell > job. It should instead use {{HADOOP_PREFIX}} due to > HADOOP-12451/HADOOP-12456. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6494) Permission issue when running archive-logs tool as different users
[ https://issues.apache.org/jira/browse/MAPREDUCE-6494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Kanter updated MAPREDUCE-6494: - Attachment: MAPREDUCE-6494.003.patch We actually found one last permissions issue. If using the DefaultContainerExecutor, the shells are run as the Yarn user, who won't have permission to operate on the temp dir when its permissions are 700. The 003 patch changes it to use 770 permissions to fix this. > Permission issue when running archive-logs tool as different users > -- > > Key: MAPREDUCE-6494 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6494 > Project: Hadoop Map/Reduce > Issue Type: Bug >Affects Versions: 2.8.0 >Reporter: Robert Kanter >Assignee: Robert Kanter > Attachments: MAPREDUCE-6494.001.patch, MAPREDUCE-6494.002.patch, > MAPREDUCE-6494.003.patch > > > If the tool is run as user A, it creates {{/tmp/logs/archive-logs-work}} for > temp work, but doesn't delete it. When it's run again as user B, you can run > into permissions problems on {{/tmp/logs/archive-logs-work}} because user B > doesn't have permission to do anything to that dir (it's 700). We should > have the tool delete {{/tmp/logs/archive-logs-work}} when exiting. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
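The 700-to-770 change above can be sketched concretely. This is a local-filesystem stand-in using java.nio (the actual patch would set an HDFS FsPermission through Hadoop's FileSystem API; the path and method name here are illustrative): 770 gives the owner and the shared group rwx, and the yarn user gets access via the group.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.PosixFilePermission;
import java.nio.file.attribute.PosixFilePermissions;
import java.util.Set;

public class WorkingDirPerms {
    // Create the working dir with 770 so both the submitting user (owner)
    // and the yarn user (via the shared group) can operate on it.
    static Set<PosixFilePermission> createGroupWritableDir(Path workDir)
            throws IOException {
        Files.createDirectories(workDir);
        Files.setPosixFilePermissions(workDir,
                PosixFilePermissions.fromString("rwxrwx---")); // 770
        return Files.getPosixFilePermissions(workDir);
    }

    public static void main(String[] args) throws IOException {
        Path workDir = Files.createTempDirectory("t").resolve("archive-logs-work");
        System.out.println(PosixFilePermissions.toString(
                createGroupWritableDir(workDir))); // prints rwxrwx---
        Files.delete(workDir);
    }
}
```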
[jira] [Updated] (MAPREDUCE-6494) Permission issue when running archive-logs tool as different users
[ https://issues.apache.org/jira/browse/MAPREDUCE-6494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Kanter updated MAPREDUCE-6494: - Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: 2.8.0 Status: Resolved (was: Patch Available) Thanks for the review Anubhav. Committed to trunk and branch-2! > Permission issue when running archive-logs tool as different users > -- > > Key: MAPREDUCE-6494 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6494 > Project: Hadoop Map/Reduce > Issue Type: Bug >Affects Versions: 2.8.0 >Reporter: Robert Kanter >Assignee: Robert Kanter > Fix For: 2.8.0 > > Attachments: MAPREDUCE-6494.001.patch, MAPREDUCE-6494.002.patch, > MAPREDUCE-6494.003.patch > > > If the tool is run as user A, it creates {{/tmp/logs/archive-logs-work}} for > temp work, but doesn't delete it. When it's run again as user B, you can run > into permissions problems on {{/tmp/logs/archive-logs-work}} because user B > doesn't have permission to do anything to that dir (it's 700). We should > have the tool delete {{/tmp/logs/archive-logs-work}} when exiting. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6495) Docs for archive-logs tool
[ https://issues.apache.org/jira/browse/MAPREDUCE-6495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Kanter updated MAPREDUCE-6495: - Attachment: MAPREDUCE-6495.001.patch > Docs for archive-logs tool > -- > > Key: MAPREDUCE-6495 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6495 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: documentation >Affects Versions: 2.8.0 >Reporter: Robert Kanter >Assignee: Robert Kanter > Attachments: MAPREDUCE-6495.001.patch > > > Write documentation for the 'mapred archive-logs' tool added in > MAPREDUCE-6415. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6495) Docs for archive-logs tool
[ https://issues.apache.org/jira/browse/MAPREDUCE-6495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Kanter updated MAPREDUCE-6495: - Status: Patch Available (was: Open) > Docs for archive-logs tool > -- > > Key: MAPREDUCE-6495 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6495 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: documentation >Affects Versions: 2.8.0 >Reporter: Robert Kanter >Assignee: Robert Kanter > Attachments: MAPREDUCE-6495.001.patch > > > Write documentation for the 'mapred archive-logs' tool added in > MAPREDUCE-6415. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MAPREDUCE-6494) Permission issue when running archive-logs tool as different users
Robert Kanter created MAPREDUCE-6494: Summary: Permission issue when running archive-logs tool as different users Key: MAPREDUCE-6494 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6494 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 2.8.0 Reporter: Robert Kanter Assignee: Robert Kanter If the tool is run as user A, it creates {{/tmp/logs/archive-logs-work}} for temp work, but doesn't delete it. When it's run again as user B, you can run into permissions problems on {{/tmp/logs/archive-logs-work}} because user B doesn't have permission to do anything to that dir (it's 700). We should have the tool delete {{/tmp/logs/archive-logs-work}} when exiting. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6494) Permission issue when running archive-logs tool as different users
[ https://issues.apache.org/jira/browse/MAPREDUCE-6494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14934086#comment-14934086 ] Robert Kanter commented on MAPREDUCE-6494: -- It uses {code} Path remoteRootLogDir = new Path(conf.get( YarnConfiguration.NM_REMOTE_APP_LOG_DIR, YarnConfiguration.DEFAULT_NM_REMOTE_APP_LOG_DIR)); Path workingDir = new Path(remoteRootLogDir, "archive-logs-work"); {code} and {{YarnConfiguration.DEFAULT_NM_REMOTE_APP_LOG_DIR}} is {{/tmp/logs}}, so the working dir becomes {{/tmp/logs/archive-logs-work}} (by default). > Permission issue when running archive-logs tool as different users > -- > > Key: MAPREDUCE-6494 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6494 > Project: Hadoop Map/Reduce > Issue Type: Bug >Affects Versions: 2.8.0 >Reporter: Robert Kanter >Assignee: Robert Kanter > Attachments: MAPREDUCE-6494.001.patch > > > If the tool is run as user A, it creates {{/tmp/logs/archive-logs-work}} for > temp work, but doesn't delete it. When it's run again as user B, you can run > into permissions problems on {{/tmp/logs/archive-logs-work}} because user B > doesn't have permission to do anything to that dir (it's 700). We should > have the tool delete {{/tmp/logs/archive-logs-work}} when exiting. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6494) Permission issue when running archive-logs tool as different users
[ https://issues.apache.org/jira/browse/MAPREDUCE-6494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Kanter updated MAPREDUCE-6494: - Attachment: MAPREDUCE-6494.001.patch The changes are mostly just from indenting and moving a few things around. Otherwise, the "real" change is only deleting the directory. > Permission issue when running archive-logs tool as different users > -- > > Key: MAPREDUCE-6494 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6494 > Project: Hadoop Map/Reduce > Issue Type: Bug >Affects Versions: 2.8.0 >Reporter: Robert Kanter >Assignee: Robert Kanter > Attachments: MAPREDUCE-6494.001.patch > > > If the tool is run as user A, it creates {{/tmp/logs/archive-logs-work}} for > temp work, but doesn't delete it. When it's run again as user B, you can run > into permissions problems on {{/tmp/logs/archive-logs-work}} because user B > doesn't have permission to do anything to that dir (it's 700). We should > have the tool delete {{/tmp/logs/archive-logs-work}} when exiting. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
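The "real" change described above — deleting the working directory on exit — amounts to a finally-style cleanup. A sketch against the local filesystem (the tool itself would call FileSystem.delete on HDFS; the method and path names here are illustrative):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class CleanupSketch {
    static void runWithWorkingDir(Path workDir) throws IOException {
        Files.createDirectories(workDir);
        try {
            // ... generate scripts, submit the distributed shell job, etc.
        } finally {
            // Always remove the working dir so the next user (with different
            // ownership) doesn't trip over a leftover 700 directory.
            Files.deleteIfExists(workDir);
        }
    }

    public static void main(String[] args) throws IOException {
        Path workDir = Files.createTempDirectory("t").resolve("archive-logs-work");
        runWithWorkingDir(workDir);
        System.out.println(Files.exists(workDir)); // prints false
    }
}
```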
[jira] [Updated] (MAPREDUCE-6494) Permission issue when running archive-logs tool as different users
[ https://issues.apache.org/jira/browse/MAPREDUCE-6494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Kanter updated MAPREDUCE-6494: - Status: Patch Available (was: Open) > Permission issue when running archive-logs tool as different users > -- > > Key: MAPREDUCE-6494 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6494 > Project: Hadoop Map/Reduce > Issue Type: Bug >Affects Versions: 2.8.0 >Reporter: Robert Kanter >Assignee: Robert Kanter > Attachments: MAPREDUCE-6494.001.patch > > > If the tool is run as user A, it creates {{/tmp/logs/archive-logs-work}} for > temp work, but doesn't delete it. When it's run again as user B, you can run > into permissions problems on {{/tmp/logs/archive-logs-work}} because user B > doesn't have permission to do anything to that dir (it's 700). We should > have the tool delete {{/tmp/logs/archive-logs-work}} when exiting. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6494) Permission issue when running archive-logs tool as different users
[ https://issues.apache.org/jira/browse/MAPREDUCE-6494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Kanter updated MAPREDUCE-6494: - Attachment: MAPREDUCE-6494.002.patch The 002 patch uses the working dir to prevent multiple calls to the tool and the {{-force}} option as described in my previous comment. > Permission issue when running archive-logs tool as different users > -- > > Key: MAPREDUCE-6494 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6494 > Project: Hadoop Map/Reduce > Issue Type: Bug >Affects Versions: 2.8.0 >Reporter: Robert Kanter >Assignee: Robert Kanter > Attachments: MAPREDUCE-6494.001.patch, MAPREDUCE-6494.002.patch > > > If the tool is run as user A, it creates {{/tmp/logs/archive-logs-work}} for > temp work, but doesn't delete it. When it's run again as user B, you can run > into permissions problems on {{/tmp/logs/archive-logs-work}} because user B > doesn't have permission to do anything to that dir (it's 700). We should > have the tool delete {{/tmp/logs/archive-logs-work}} when exiting. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MAPREDUCE-6495) Docs for archive-logs tool
Robert Kanter created MAPREDUCE-6495: Summary: Docs for archive-logs tool Key: MAPREDUCE-6495 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6495 Project: Hadoop Map/Reduce Issue Type: Bug Components: documentation Affects Versions: 2.8.0 Reporter: Robert Kanter Assignee: Robert Kanter Write documentation for the 'mapred archive-logs' tool added in MAPREDUCE-6415. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6494) Permission issue when running archive-logs tool as different users
[ https://issues.apache.org/jira/browse/MAPREDUCE-6494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14934397#comment-14934397 ] Robert Kanter commented on MAPREDUCE-6494: -- That's a good idea. The tool is designed to run only one instance at a time, and we had put off figuring out how to enforce that. This sounds like a good way to do it. If the working directory already exists, then the tool can print out a message explaining that. I'll also add a {{-force}} option that will delete the directory in case it was left behind incorrectly somehow. > Permission issue when running archive-logs tool as different users > -- > > Key: MAPREDUCE-6494 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6494 > Project: Hadoop Map/Reduce > Issue Type: Bug >Affects Versions: 2.8.0 >Reporter: Robert Kanter >Assignee: Robert Kanter > Attachments: MAPREDUCE-6494.001.patch > > > If the tool is run as user A, it creates {{/tmp/logs/archive-logs-work}} for > temp work, but doesn't delete it. When it's run again as user B, you can run > into permissions problems on {{/tmp/logs/archive-logs-work}} because user B > doesn't have permission to do anything to that dir (it's 700). We should > have the tool delete {{/tmp/logs/archive-logs-work}} when exiting. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
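Using the working directory's existence as a single-instance guard, with a {{-force}} escape hatch to clear a stale one, could look like this local-filesystem sketch (the real tool would do the same checks against HDFS; the class and method names are illustrative):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class SingleInstanceGuard {
    /** Returns true if we acquired the working dir, false if another run owns it. */
    static boolean acquireWorkingDir(Path workDir, boolean force) throws IOException {
        if (Files.exists(workDir)) {
            if (!force) {
                System.err.println("Another instance appears to be running; "
                        + "pass -force if the directory is stale.");
                return false;
            }
            // Stale dir left by a crashed run; -force clears it.
            Files.delete(workDir);
        }
        Files.createDirectories(workDir);
        return true;
    }

    public static void main(String[] args) throws IOException {
        Path wd = Files.createTempDirectory("t").resolve("archive-logs-work");
        System.out.println(acquireWorkingDir(wd, false)); // true: first run
        System.out.println(acquireWorkingDir(wd, false)); // false: dir already exists
    }
}
```

Note this exists-then-create check isn't atomic; it's a best-effort guard, which matches the informal "one instance at a time" goal in the comment above.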
[jira] [Updated] (MAPREDUCE-6480) archive-logs tool may miss applications
[ https://issues.apache.org/jira/browse/MAPREDUCE-6480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Kanter updated MAPREDUCE-6480: - Attachment: MAPREDUCE-6480.003.patch 003 patch addresses Anubhav's comments: 1. Ya. For FAILED, we don't know why, so I figured it was safest to just not do anything to the logs. For TIMED_OUT, it seemed like the idea is that nothing bad happened, but we're not expecting any more logs. 2. Done 3. This was generated by IntelliJ. I'd rather leave it in as-is to be safe. 4. I've added a {{-verbose}} option, which prints out a lot more details about what's happening. 5. Done 6. Done 7. Done 8. I think we should do both. You're right that we could break out if there's an error with just one app. At the same time, if the user running the tool doesn't have permission to access the logs for a specific user, we don't want to keep trying over and over again. 8 (#2). Done 9. Done > archive-logs tool may miss applications > --- > > Key: MAPREDUCE-6480 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6480 > Project: Hadoop Map/Reduce > Issue Type: Bug >Affects Versions: 2.8.0 >Reporter: Robert Kanter >Assignee: Robert Kanter > Attachments: MAPREDUCE-6480.001.patch, MAPREDUCE-6480.002.patch, > MAPREDUCE-6480.003.patch > > > MAPREDUCE-6415 added a tool to archive aggregated logs into HAR files. It > seeds the initial list of applications to process based on apps which have > finished aggregating, according to the RM. However, the RM doesn't remember > completed applications forever (e.g. failover), so it's possible for the tool > to miss applications if they're no longer in the RM. > Instead, we should do the following: > # Seed the initial list of apps based on the aggregated log directories > # Make the RM not consider applications "complete" until their log > aggregation has reached a terminal state (i.e. DISABLED, SUCCEEDED, FAILED, > TIME_OUT). 
> #2 will allow #1 to assume that any apps not found in the RM are done > aggregating. #1 on its own should cover most cases, though. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6480) archive-logs tool may miss applications
[ https://issues.apache.org/jira/browse/MAPREDUCE-6480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Kanter updated MAPREDUCE-6480: - Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: 2.8.0 Status: Resolved (was: Patch Available) Thanks for the review [~adhoot]. Committed to trunk and branch-2! > archive-logs tool may miss applications > --- > > Key: MAPREDUCE-6480 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6480 > Project: Hadoop Map/Reduce > Issue Type: Bug >Affects Versions: 2.8.0 >Reporter: Robert Kanter >Assignee: Robert Kanter > Fix For: 2.8.0 > > Attachments: MAPREDUCE-6480.001.patch, MAPREDUCE-6480.002.patch, > MAPREDUCE-6480.003.patch > > > MAPREDUCE-6415 added a tool to archive aggregated logs into HAR files. It > seeds the initial list of applications to process based on apps which have > finished aggregating, according to the RM. However, the RM doesn't remember > completed applications forever (e.g. failover), so it's possible for the tool > to miss applications if they're no longer in the RM. > Instead, we should do the following: > # Seed the initial list of apps based on the aggregated log directories > # Make the RM not consider applications "complete" until their log > aggregation has reached a terminal state (i.e. DISABLED, SUCCEEDED, FAILED, > TIME_OUT). > #2 will allow #1 to assume that any apps not found in the RM are done > aggregating. #1 on its own should cover most cases, though. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6480) archive-logs tool may miss applications
[ https://issues.apache.org/jira/browse/MAPREDUCE-6480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14875954#comment-14875954 ] Robert Kanter commented on MAPREDUCE-6480: -- One of the checkstyle warnings is that {{finishTime}} hides a field here: {code:java} public void setFinishTime(long finishTime) { this.finishTime = finishTime; } {code} but this is pretty standard practice for a setter, isn't it? > archive-logs tool may miss applications > --- > > Key: MAPREDUCE-6480 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6480 > Project: Hadoop Map/Reduce > Issue Type: Bug >Affects Versions: 2.8.0 >Reporter: Robert Kanter >Assignee: Robert Kanter > Attachments: MAPREDUCE-6480.001.patch, MAPREDUCE-6480.002.patch > > > MAPREDUCE-6415 added a tool to archive aggregated logs into HAR files. It > seeds the initial list of applications to process based on apps which have > finished aggregated, according to the RM. However, the RM doesn't remember > completed applications forever (e.g. failover), so it's possible for the tool > to miss applications if they're no longer in the RM. > Instead, we should do the following: > # Seed the initial list of apps based on the aggregated log directories > # Make the RM not consider applications "complete" until their log > aggregation has reached a terminal state (i.e. DISABLED, SUCCEEDED, FAILED, > TIME_OUT). > #2 will allow #1 to assume that any apps not found in the RM are done > aggregating. #1 on it's own should cover most cases though -- This message was sent by Atlassian JIRA (v6.3.4#6332)
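For what it's worth, Checkstyle's HiddenField check can be told not to flag standard setters. A config sketch (the module and property names are standard Checkstyle; whether the project's checkstyle.xml is set up this way is an assumption):

```xml
<!-- In checkstyle.xml: don't report "hides a field" for conventional
     setters like 'void setFinishTime(long finishTime)', nor for
     constructor parameters that shadow fields the same way. -->
<module name="HiddenField">
  <property name="ignoreSetter" value="true"/>
  <property name="ignoreConstructorParameter" value="true"/>
</module>
```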
[jira] [Updated] (MAPREDUCE-6480) archive-logs tool may miss applications
[ https://issues.apache.org/jira/browse/MAPREDUCE-6480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Kanter updated MAPREDUCE-6480: - Attachment: MAPREDUCE-6480.002.patch The 002 patch fixes the failing tests, checkstyle, and findbugs. > archive-logs tool may miss applications > --- > > Key: MAPREDUCE-6480 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6480 > Project: Hadoop Map/Reduce > Issue Type: Bug >Affects Versions: 2.8.0 >Reporter: Robert Kanter >Assignee: Robert Kanter > Attachments: MAPREDUCE-6480.001.patch, MAPREDUCE-6480.002.patch > > > MAPREDUCE-6415 added a tool to archive aggregated logs into HAR files. It > seeds the initial list of applications to process based on apps which have > finished aggregated, according to the RM. However, the RM doesn't remember > completed applications forever (e.g. failover), so it's possible for the tool > to miss applications if they're no longer in the RM. > Instead, we should do the following: > # Seed the initial list of apps based on the aggregated log directories > # Make the RM not consider applications "complete" until their log > aggregation has reached a terminal state (i.e. DISABLED, SUCCEEDED, FAILED, > TIME_OUT). > #2 will allow #1 to assume that any apps not found in the RM are done > aggregating. #1 on it's own should cover most cases though -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6460) TestRMContainerAllocator.testAttemptNotFoundCausesRMCommunicatorException fails
[ https://issues.apache.org/jira/browse/MAPREDUCE-6460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14804722#comment-14804722 ] Robert Kanter commented on MAPREDUCE-6460: -- +1 LGTM > TestRMContainerAllocator.testAttemptNotFoundCausesRMCommunicatorException > fails > --- > > Key: MAPREDUCE-6460 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6460 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: test >Reporter: zhihai xu >Assignee: zhihai xu > Attachments: MAPREDUCE-6460.000.patch > > > TestRMContainerAllocator.testAttemptNotFoundCausesRMCommunicatorException > fails with the following logs: > --- > T E S T S > --- > Running org.apache.hadoop.mapreduce.v2.app.rm.TestRMContainerAllocator > Tests run: 24, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 94.525 sec > <<< FAILURE! - in > org.apache.hadoop.mapreduce.v2.app.rm.TestRMContainerAllocator > testAttemptNotFoundCausesRMCommunicatorException(org.apache.hadoop.mapreduce.v2.app.rm.TestRMContainerAllocator) > Time elapsed: 2.606 sec <<< FAILURE! 
> java.lang.AssertionError: Expected exception: > org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocationException > at > org.junit.internal.runners.statements.ExpectException.evaluate(ExpectException.java:32) > at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) > at > org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) > at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50) > at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238) > at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63) > at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236) > at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53) > at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229) > at org.junit.runners.ParentRunner.run(ParentRunner.java:309) > at > org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:264) > at > org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:153) > at > org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:124) > at > org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:200) > at > org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:153) > at > org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:103) > Results : > Failed tests: > TestRMContainerAllocator.testAttemptNotFoundCausesRMCommunicatorException > Expected exception: > org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocationException > Tests run: 24, Failures: 1, Errors: 0, Skipped: 0 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MAPREDUCE-6480) archive-logs tool may miss applications
Robert Kanter created MAPREDUCE-6480: Summary: archive-logs tool may miss applications Key: MAPREDUCE-6480 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6480 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 2.8.0 Reporter: Robert Kanter Assignee: Robert Kanter MAPREDUCE-6415 added a tool to archive aggregated logs into HAR files. It seeds the initial list of applications to process based on apps which have finished aggregating, according to the RM. However, the RM doesn't remember completed applications forever (e.g. failover), so it's possible for the tool to miss applications if they're no longer in the RM. Instead, we should do the following: # Seed the initial list of apps based on the aggregated log directories # Make the RM not consider applications "complete" until their log aggregation has reached a terminal state (i.e. DISABLED, SUCCEEDED, FAILED, TIME_OUT). #2 will allow #1 to assume that any apps not found in the RM are done aggregating. #2 on its own should cover most cases, though. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
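A minimal sketch of the "terminal log-aggregation state" test proposed in item #2 above. The terminal values mirror the states named in the description; the non-terminal values and the helper name are illustrative, not the actual YARN API:

```java
// Illustrative only: treat an app as "complete" for archiving purposes
// only once its log aggregation has reached a terminal state. The terminal
// names mirror the issue description; NOT_START/RUNNING are assumed here.
public class LogAggCheck {
    enum LogAggregationStatus { DISABLED, NOT_START, RUNNING, SUCCEEDED, FAILED, TIME_OUT }

    // True when no further aggregated log files are expected for the app.
    static boolean isTerminal(LogAggregationStatus s) {
        switch (s) {
            case DISABLED:
            case SUCCEEDED:
            case FAILED:
            case TIME_OUT:
                return true;
            default:
                return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isTerminal(LogAggregationStatus.RUNNING));
        System.out.println(isTerminal(LogAggregationStatus.TIME_OUT));
    }
}
```

With this check in place, step #1's directory-based seeding can safely assume that any app absent from the RM has finished aggregating.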
[jira] [Updated] (MAPREDUCE-6480) archive-logs tool may miss applications
[ https://issues.apache.org/jira/browse/MAPREDUCE-6480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Kanter updated MAPREDUCE-6480: - Description: MAPREDUCE-6415 added a tool to archive aggregated logs into HAR files. It seeds the initial list of applications to process based on apps which have finished aggregating, according to the RM. However, the RM doesn't remember completed applications forever (e.g. failover), so it's possible for the tool to miss applications if they're no longer in the RM. Instead, we should do the following: # Seed the initial list of apps based on the aggregated log directories # Make the RM not consider applications "complete" until their log aggregation has reached a terminal state (i.e. DISABLED, SUCCEEDED, FAILED, TIME_OUT). #2 will allow #1 to assume that any apps not found in the RM are done aggregating. #1 on its own should cover most cases though was: MAPREDUCE-6415 added a tool to archive aggregated logs into HAR files. It seeds the initial list of applications to process based on apps which have finished aggregating, according to the RM. However, the RM doesn't remember completed applications forever (e.g. failover), so it's possible for the tool to miss applications if they're no longer in the RM. Instead, we should do the following: # Seed the initial list of apps based on the aggregated log directories # Make the RM not consider applications "complete" until their log aggregation has reached a terminal state (i.e. DISABLED, SUCCEEDED, FAILED, TIME_OUT). #2 will allow #1 to assume that any apps not found in the RM are done aggregating.
#2 on its own should cover most cases though > archive-logs tool may miss applications > --- > > Key: MAPREDUCE-6480 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6480 > Project: Hadoop Map/Reduce > Issue Type: Bug >Affects Versions: 2.8.0 >Reporter: Robert Kanter >Assignee: Robert Kanter > > MAPREDUCE-6415 added a tool to archive aggregated logs into HAR files. It > seeds the initial list of applications to process based on apps which have > finished aggregating, according to the RM. However, the RM doesn't remember > completed applications forever (e.g. failover), so it's possible for the tool > to miss applications if they're no longer in the RM. > Instead, we should do the following: > # Seed the initial list of apps based on the aggregated log directories > # Make the RM not consider applications "complete" until their log > aggregation has reached a terminal state (i.e. DISABLED, SUCCEEDED, FAILED, > TIME_OUT). > #2 will allow #1 to assume that any apps not found in the RM are done > aggregating. #1 on its own should cover most cases though -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6480) archive-logs tool may miss applications
[ https://issues.apache.org/jira/browse/MAPREDUCE-6480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Kanter updated MAPREDUCE-6480: - Attachment: MAPREDUCE-6480.001.patch The patch takes care of #1 by refactoring a bunch of code to seed the list from the aggregated log files and filter out any that haven't finished aggregating according to the RM. We can do #2 in a followup YARN JIRA. > archive-logs tool may miss applications > --- > > Key: MAPREDUCE-6480 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6480 > Project: Hadoop Map/Reduce > Issue Type: Bug >Affects Versions: 2.8.0 >Reporter: Robert Kanter >Assignee: Robert Kanter > Attachments: MAPREDUCE-6480.001.patch > > > MAPREDUCE-6415 added a tool to archive aggregated logs into HAR files. It > seeds the initial list of applications to process based on apps which have > finished aggregated, according to the RM. However, the RM doesn't remember > completed applications forever (e.g. failover), so it's possible for the tool > to miss applications if they're no longer in the RM. > Instead, we should do the following: > # Seed the initial list of apps based on the aggregated log directories > # Make the RM not consider applications "complete" until their log > aggregation has reached a terminal state (i.e. DISABLED, SUCCEEDED, FAILED, > TIME_OUT). > #2 will allow #1 to assume that any apps not found in the RM are done > aggregating. #1 on it's own should cover most cases though -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6480) archive-logs tool may miss applications
[ https://issues.apache.org/jira/browse/MAPREDUCE-6480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Kanter updated MAPREDUCE-6480: - Status: Patch Available (was: Open) > archive-logs tool may miss applications > --- > > Key: MAPREDUCE-6480 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6480 > Project: Hadoop Map/Reduce > Issue Type: Bug >Affects Versions: 2.8.0 >Reporter: Robert Kanter >Assignee: Robert Kanter > Attachments: MAPREDUCE-6480.001.patch > > > MAPREDUCE-6415 added a tool to archive aggregated logs into HAR files. It > seeds the initial list of applications to process based on apps which have > finished aggregated, according to the RM. However, the RM doesn't remember > completed applications forever (e.g. failover), so it's possible for the tool > to miss applications if they're no longer in the RM. > Instead, we should do the following: > # Seed the initial list of apps based on the aggregated log directories > # Make the RM not consider applications "complete" until their log > aggregation has reached a terminal state (i.e. DISABLED, SUCCEEDED, FAILED, > TIME_OUT). > #2 will allow #1 to assume that any apps not found in the RM are done > aggregating. #1 on it's own should cover most cases though -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6415) Create a tool to combine aggregated logs into HAR files
[ https://issues.apache.org/jira/browse/MAPREDUCE-6415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14737909#comment-14737909 ] Robert Kanter commented on MAPREDUCE-6415: -- Thanks everyone! I'm glad we finally have a workable solution to this issue in now > Create a tool to combine aggregated logs into HAR files > --- > > Key: MAPREDUCE-6415 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6415 > Project: Hadoop Map/Reduce > Issue Type: New Feature >Affects Versions: 2.8.0 >Reporter: Robert Kanter >Assignee: Robert Kanter > Attachments: HAR-ableAggregatedLogs_v1.pdf, MAPREDUCE-6415.001.patch, > MAPREDUCE-6415.002.patch, MAPREDUCE-6415.002.patch, MAPREDUCE-6415.003.patch, > MAPREDUCE-6415_branch-2.001.patch, MAPREDUCE-6415_branch-2.002.patch, > MAPREDUCE-6415_branch-2.003.patch, MAPREDUCE-6415_branch-2_prelim_001.patch, > MAPREDUCE-6415_branch-2_prelim_002.patch, MAPREDUCE-6415_prelim_001.patch, > MAPREDUCE-6415_prelim_002.patch > > > While we wait for YARN-2942 to become viable, it would still be great to > improve the aggregated logs problem. We can write a tool that combines > aggregated log files into a single HAR file per application, which should > solve the too many files and too many blocks problems. See the design > document for details. > See YARN-2942 for more context. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6415) Create a tool to combine aggregated logs into HAR files
[ https://issues.apache.org/jira/browse/MAPREDUCE-6415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Kanter updated MAPREDUCE-6415: - Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: 2.8.0 Status: Resolved (was: Patch Available) > Create a tool to combine aggregated logs into HAR files > --- > > Key: MAPREDUCE-6415 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6415 > Project: Hadoop Map/Reduce > Issue Type: New Feature >Affects Versions: 2.8.0 >Reporter: Robert Kanter >Assignee: Robert Kanter > Fix For: 2.8.0 > > Attachments: HAR-ableAggregatedLogs_v1.pdf, MAPREDUCE-6415.001.patch, > MAPREDUCE-6415.002.patch, MAPREDUCE-6415.002.patch, MAPREDUCE-6415.003.patch, > MAPREDUCE-6415_branch-2.001.patch, MAPREDUCE-6415_branch-2.002.patch, > MAPREDUCE-6415_branch-2.003.patch, MAPREDUCE-6415_branch-2_prelim_001.patch, > MAPREDUCE-6415_branch-2_prelim_002.patch, MAPREDUCE-6415_prelim_001.patch, > MAPREDUCE-6415_prelim_002.patch > > > While we wait for YARN-2942 to become viable, it would still be great to > improve the aggregated logs problem. We can write a tool that combines > aggregated log files into a single HAR file per application, which should > solve the too many files and too many blocks problems. See the design > document for details. > See YARN-2942 for more context. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6415) Create a tool to combine aggregated logs into HAR files
[ https://issues.apache.org/jira/browse/MAPREDUCE-6415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Kanter updated MAPREDUCE-6415: - Attachment: MAPREDUCE-6415.003.patch The 003 patch addresses the issues Karthik pointed out. I agree that we can follow up with those other things in new JIRAs. > Create a tool to combine aggregated logs into HAR files > --- > > Key: MAPREDUCE-6415 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6415 > Project: Hadoop Map/Reduce > Issue Type: New Feature >Affects Versions: 2.8.0 >Reporter: Robert Kanter >Assignee: Robert Kanter > Attachments: HAR-ableAggregatedLogs_v1.pdf, MAPREDUCE-6415.001.patch, > MAPREDUCE-6415.002.patch, MAPREDUCE-6415.002.patch, MAPREDUCE-6415.003.patch, > MAPREDUCE-6415_branch-2.001.patch, MAPREDUCE-6415_branch-2.002.patch, > MAPREDUCE-6415_branch-2.003.patch, MAPREDUCE-6415_branch-2_prelim_001.patch, > MAPREDUCE-6415_branch-2_prelim_002.patch, MAPREDUCE-6415_prelim_001.patch, > MAPREDUCE-6415_prelim_002.patch > > > While we wait for YARN-2942 to become viable, it would still be great to > improve the aggregated logs problem. We can write a tool that combines > aggregated log files into a single HAR file per application, which should > solve the too many files and too many blocks problems. See the design > document for details. > See YARN-2942 for more context. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6415) Create a tool to combine aggregated logs into HAR files
[ https://issues.apache.org/jira/browse/MAPREDUCE-6415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Kanter updated MAPREDUCE-6415: - Attachment: MAPREDUCE-6415_branch-2.003.patch > Create a tool to combine aggregated logs into HAR files > --- > > Key: MAPREDUCE-6415 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6415 > Project: Hadoop Map/Reduce > Issue Type: New Feature >Affects Versions: 2.8.0 >Reporter: Robert Kanter >Assignee: Robert Kanter > Attachments: HAR-ableAggregatedLogs_v1.pdf, MAPREDUCE-6415.001.patch, > MAPREDUCE-6415.002.patch, MAPREDUCE-6415.002.patch, > MAPREDUCE-6415_branch-2.001.patch, MAPREDUCE-6415_branch-2.002.patch, > MAPREDUCE-6415_branch-2.003.patch, MAPREDUCE-6415_branch-2_prelim_001.patch, > MAPREDUCE-6415_branch-2_prelim_002.patch, MAPREDUCE-6415_prelim_001.patch, > MAPREDUCE-6415_prelim_002.patch > > > While we wait for YARN-2942 to become viable, it would still be great to > improve the aggregated logs problem. We can write a tool that combines > aggregated log files into a single HAR file per application, which should > solve the too many files and too many blocks problems. See the design > document for details. > See YARN-2942 for more context. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6415) Create a tool to combine aggregated logs into HAR files
[ https://issues.apache.org/jira/browse/MAPREDUCE-6415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Kanter updated MAPREDUCE-6415: - Attachment: MAPREDUCE-6415_branch-2.002.patch MAPREDUCE-6415.002.patch Thanks for the review [~jlowe]! The 002 patch addresses most of the issues Jason brought up: - fixes dependencies, though I had to keep some of the ones that maven didn't think it needed - fixes usage output to use variables for the defaults. I also changed the units for the max total logs size to megabytes instead of bytes to be easier to use. - now SUCCEEDED and FAILED log aggregation statuses are considered. - improves checkFiles to be more efficient - if maxEligible is 0, it will now print out a message and exit right away. I think having 0 be equivalent to all might be confusing? I'm fine either way; let me know if you think it's better to treat it as equivalent to a negative value. I don't think we should add a unique ID to the working directory. The tool won't work correctly with simultaneous runs anyway because it doesn't acquire any sort of "lock" that would stop another instance from trying to process the same application's logs. As it is now, by using a non-unique directory, anything left over will get cleaned up when you run the tool again (presumably, you're running it at some interval). On that last point, it would be good if we could prevent two instances of the tool from running at the same time. I think the best way to do this (without using a lock) is for the tool to check for a RUNNING job named "ArchiveLogs" in the RM, though this won't protect against all situations and will have a false positive if the user has another job named "ArchiveLogs". 
> Create a tool to combine aggregated logs into HAR files > --- > > Key: MAPREDUCE-6415 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6415 > Project: Hadoop Map/Reduce > Issue Type: New Feature >Affects Versions: 2.8.0 >Reporter: Robert Kanter >Assignee: Robert Kanter > Attachments: HAR-ableAggregatedLogs_v1.pdf, MAPREDUCE-6415.001.patch, > MAPREDUCE-6415.002.patch, MAPREDUCE-6415_branch-2.001.patch, > MAPREDUCE-6415_branch-2.002.patch, MAPREDUCE-6415_branch-2_prelim_001.patch, > MAPREDUCE-6415_branch-2_prelim_002.patch, MAPREDUCE-6415_prelim_001.patch, > MAPREDUCE-6415_prelim_002.patch > > > While we wait for YARN-2942 to become viable, it would still be great to > improve the aggregated logs problem. We can write a tool that combines > aggregated log files into a single HAR file per application, which should > solve the too many files and too many blocks problems. See the design > document for details. > See YARN-2942 for more context. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6415) Create a tool to combine aggregated logs into HAR files
[ https://issues.apache.org/jira/browse/MAPREDUCE-6415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14717391#comment-14717391 ] Robert Kanter commented on MAPREDUCE-6415: -- [~jlowe], can you take a look at this? Create a tool to combine aggregated logs into HAR files --- Key: MAPREDUCE-6415 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6415 Project: Hadoop Map/Reduce Issue Type: New Feature Affects Versions: 2.8.0 Reporter: Robert Kanter Assignee: Robert Kanter Attachments: HAR-ableAggregatedLogs_v1.pdf, MAPREDUCE-6415.001.patch, MAPREDUCE-6415_branch-2.001.patch, MAPREDUCE-6415_branch-2_prelim_001.patch, MAPREDUCE-6415_branch-2_prelim_002.patch, MAPREDUCE-6415_prelim_001.patch, MAPREDUCE-6415_prelim_002.patch While we wait for YARN-2942 to become viable, it would still be great to improve the aggregated logs problem. We can write a tool that combines aggregated log files into a single HAR file per application, which should solve the too many files and too many blocks problems. See the design document for details. See YARN-2942 for more context. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6415) Create a tool to combine aggregated logs into HAR files
[ https://issues.apache.org/jira/browse/MAPREDUCE-6415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Kanter updated MAPREDUCE-6415: - Attachment: MAPREDUCE-6415_branch-2.001.patch MAPREDUCE-6415.001.patch MAPREDUCE-6415.001.patch and MAPREDUCE-6415_branch-2.001.patch contain the MapReduce changes, though most of it's actually under hadoop-tools. This includes all of the code to find and process the aggregated log files into HAR files. It's mostly the same as the prelim patch, with some minor changes and unit tests. I've uploaded the YARN changes to YARN-4086. The patches for this and YARN-4086 can be applied independently. Create a tool to combine aggregated logs into HAR files --- Key: MAPREDUCE-6415 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6415 Project: Hadoop Map/Reduce Issue Type: New Feature Affects Versions: 2.8.0 Reporter: Robert Kanter Assignee: Robert Kanter Attachments: HAR-ableAggregatedLogs_v1.pdf, MAPREDUCE-6415.001.patch, MAPREDUCE-6415_branch-2.001.patch, MAPREDUCE-6415_branch-2_prelim_001.patch, MAPREDUCE-6415_branch-2_prelim_002.patch, MAPREDUCE-6415_prelim_001.patch, MAPREDUCE-6415_prelim_002.patch While we wait for YARN-2942 to become viable, it would still be great to improve the aggregated logs problem. We can write a tool that combines aggregated log files into a single HAR file per application, which should solve the too many files and too many blocks problems. See the design document for details. See YARN-2942 for more context. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6415) Create a tool to combine aggregated logs into HAR files
[ https://issues.apache.org/jira/browse/MAPREDUCE-6415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Kanter updated MAPREDUCE-6415: - Status: Patch Available (was: In Progress) Create a tool to combine aggregated logs into HAR files --- Key: MAPREDUCE-6415 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6415 Project: Hadoop Map/Reduce Issue Type: New Feature Affects Versions: 2.8.0 Reporter: Robert Kanter Assignee: Robert Kanter Attachments: HAR-ableAggregatedLogs_v1.pdf, MAPREDUCE-6415.001.patch, MAPREDUCE-6415_branch-2.001.patch, MAPREDUCE-6415_branch-2_prelim_001.patch, MAPREDUCE-6415_branch-2_prelim_002.patch, MAPREDUCE-6415_prelim_001.patch, MAPREDUCE-6415_prelim_002.patch While we wait for YARN-2942 to become viable, it would still be great to improve the aggregated logs problem. We can write a tool that combines aggregated log files into a single HAR file per application, which should solve the too many files and too many blocks problems. See the design document for details. See YARN-2942 for more context. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6415) Create a tool to combine aggregated logs into HAR files
[ https://issues.apache.org/jira/browse/MAPREDUCE-6415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14709812#comment-14709812 ] Robert Kanter commented on MAPREDUCE-6415: -- Ok, I'll change the logging, start adding unit tests, and clean up some things. Create a tool to combine aggregated logs into HAR files --- Key: MAPREDUCE-6415 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6415 Project: Hadoop Map/Reduce Issue Type: New Feature Affects Versions: 2.8.0 Reporter: Robert Kanter Assignee: Robert Kanter Attachments: HAR-ableAggregatedLogs_v1.pdf, MAPREDUCE-6415_branch-2_prelim_001.patch, MAPREDUCE-6415_branch-2_prelim_002.patch, MAPREDUCE-6415_prelim_001.patch, MAPREDUCE-6415_prelim_002.patch While we wait for YARN-2942 to become viable, it would still be great to improve the aggregated logs problem. We can write a tool that combines aggregated log files into a single HAR file per application, which should solve the too many files and too many blocks problems. See the design document for details. See YARN-2942 for more context. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6415) Create a tool to combine aggregated logs into HAR files
[ https://issues.apache.org/jira/browse/MAPREDUCE-6415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14706979#comment-14706979 ] Robert Kanter commented on MAPREDUCE-6415: -- Thanks for the review [~asuresh]. This is just the preliminary patch. I still have to write unit tests, javadocs, and split out the yarn changes into a YARN JIRA. But it sounds like you're good with the approach. [~aw], any other comments? How about you [~jlowe]? Create a tool to combine aggregated logs into HAR files --- Key: MAPREDUCE-6415 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6415 Project: Hadoop Map/Reduce Issue Type: New Feature Affects Versions: 2.8.0 Reporter: Robert Kanter Assignee: Robert Kanter Attachments: HAR-ableAggregatedLogs_v1.pdf, MAPREDUCE-6415_branch-2_prelim_001.patch, MAPREDUCE-6415_branch-2_prelim_002.patch, MAPREDUCE-6415_prelim_001.patch, MAPREDUCE-6415_prelim_002.patch While we wait for YARN-2942 to become viable, it would still be great to improve the aggregated logs problem. We can write a tool that combines aggregated log files into a single HAR file per application, which should solve the too many files and too many blocks problems. See the design document for details. See YARN-2942 for more context. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6443) Add JvmPauseMonitor to Job History Server
[ https://issues.apache.org/jira/browse/MAPREDUCE-6443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14660437#comment-14660437 ] Robert Kanter commented on MAPREDUCE-6443: -- Thanks for the review [~djp] Add JvmPauseMonitor to Job History Server - Key: MAPREDUCE-6443 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6443 Project: Hadoop Map/Reduce Issue Type: Improvement Components: jobhistoryserver Affects Versions: 2.8.0 Reporter: Robert Kanter Assignee: Robert Kanter Fix For: 2.8.0 Attachments: MAPREDUCE-6443.001.patch, MAPREDUCE-6443.002.patch We should add the {{JvmPauseMonitor}} from HADOOP-9618 to the Job History Server. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6443) Add JvmPauseMonitor to Job History Server
[ https://issues.apache.org/jira/browse/MAPREDUCE-6443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Kanter updated MAPREDUCE-6443: - Attachment: MAPREDUCE-6443.002.patch The existing metrics init stuff was in the start method, so I moved them all to the init method too, in the 002 patch. Add JvmPauseMonitor to Job History Server - Key: MAPREDUCE-6443 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6443 Project: Hadoop Map/Reduce Issue Type: Improvement Components: jobhistoryserver Affects Versions: 2.8.0 Reporter: Robert Kanter Assignee: Robert Kanter Attachments: MAPREDUCE-6443.001.patch, MAPREDUCE-6443.002.patch We should add the {{JvmPauseMonitor}} from HADOOP-9618 to the Job History Server. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MAPREDUCE-6443) Add JvmPauseMonitor to Job History Server
Robert Kanter created MAPREDUCE-6443: Summary: Add JvmPauseMonitor to Job History Server Key: MAPREDUCE-6443 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6443 Project: Hadoop Map/Reduce Issue Type: Improvement Components: jobhistoryserver Affects Versions: 2.8.0 Reporter: Robert Kanter Assignee: Robert Kanter We should add the {{JvmPauseMonitor}} from HADOOP-9618 to the Job History Server. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
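For context, the {{JvmPauseMonitor}} from HADOOP-9618 detects pauses by repeatedly sleeping for a short interval and treating any extra elapsed wall-clock time as a probable GC or VM pause. A self-contained sketch of that idea (this is not the actual Hadoop class; the class name, constants, and thresholds here are illustrative):

```java
// Illustrative sketch of the pause-detection idea behind JvmPauseMonitor:
// intend to sleep SLEEP_MS, then measure how much longer we actually slept.
// A large difference suggests the JVM was paused (e.g. by a GC).
public class PauseSketch {
    static final long SLEEP_MS = 500;    // intended sleep per iteration
    static final long WARN_MS = 10000;   // extra time considered a long pause

    // Extra time beyond the intended sleep; ~0 means no detectable pause.
    static long extraSleep(long intendedMs, long actualMs) {
        return actualMs - intendedMs;
    }

    public static void main(String[] args) throws InterruptedException {
        long start = System.currentTimeMillis();
        Thread.sleep(SLEEP_MS);
        long extra = extraSleep(SLEEP_MS, System.currentTimeMillis() - start);
        if (extra > WARN_MS) {
            System.out.println("Detected pause of approximately " + extra + "ms");
        } else {
            System.out.println("No long pause detected");
        }
    }
}
```

The real monitor runs this loop on a daemon thread and logs a warning (with GC counters) when the extra time crosses a threshold, which is what makes it useful on a server like the JHS.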
[jira] [Updated] (MAPREDUCE-6415) Create a tool to combine aggregated logs into HAR files
[ https://issues.apache.org/jira/browse/MAPREDUCE-6415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Kanter updated MAPREDUCE-6415: - Attachment: MAPREDUCE-6415_branch-2_prelim_002.patch MAPREDUCE-6415_prelim_002.patch The prelim_002 patch: - Uses {{YARN_SHELL_ID}} from YARN-3950 instead of parsing {{CONTAINER_ID}} - Runs 'hadoop archive' and the FileSystem commands from a Java program, so we can limit the JVM startup cost Create a tool to combine aggregated logs into HAR files --- Key: MAPREDUCE-6415 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6415 Project: Hadoop Map/Reduce Issue Type: New Feature Affects Versions: 2.8.0 Reporter: Robert Kanter Assignee: Robert Kanter Attachments: HAR-ableAggregatedLogs_v1.pdf, MAPREDUCE-6415_branch-2_prelim_001.patch, MAPREDUCE-6415_branch-2_prelim_002.patch, MAPREDUCE-6415_prelim_001.patch, MAPREDUCE-6415_prelim_002.patch While we wait for YARN-2942 to become viable, it would still be great to improve the aggregated logs problem. We can write a tool that combines aggregated log files into a single HAR file per application, which should solve the too many files and too many blocks problems. See the design document for details. See YARN-2942 for more context. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
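For reference, the 'hadoop archive' step the tool drives looks roughly like the following. This is an illustrative command-line fragment only: the paths and application id are borrowed from the aggregated-log layout discussed in this thread, and the exact arguments the tool passes are its own concern:

```sh
# Sketch (illustrative paths/app id): archive one application's aggregated
# node logs into a single HAR. -p sets the parent directory the sources are
# resolved against; the last argument is where <archiveName>.har is created.
hadoop archive -archiveName application_1443805425363_0005.har \
  -p /tmp/logs/paul/logs/application_1443805425363_0005 \
  '*' \
  /tmp/logs/paul/logs/application_1443805425363_0005
```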
[jira] [Updated] (MAPREDUCE-6443) Add JvmPauseMonitor to Job History Server
[ https://issues.apache.org/jira/browse/MAPREDUCE-6443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Kanter updated MAPREDUCE-6443: - Attachment: MAPREDUCE-6443.001.patch Add JvmPauseMonitor to Job History Server - Key: MAPREDUCE-6443 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6443 Project: Hadoop Map/Reduce Issue Type: Improvement Components: jobhistoryserver Affects Versions: 2.8.0 Reporter: Robert Kanter Assignee: Robert Kanter Attachments: MAPREDUCE-6443.001.patch We should add the {{JvmPauseMonitor}} from HADOOP-9618 to the Job History Server. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6443) Add JvmPauseMonitor to Job History Server
[ https://issues.apache.org/jira/browse/MAPREDUCE-6443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Kanter updated MAPREDUCE-6443: - Status: Patch Available (was: Open) Add JvmPauseMonitor to Job History Server - Key: MAPREDUCE-6443 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6443 Project: Hadoop Map/Reduce Issue Type: Improvement Components: jobhistoryserver Affects Versions: 2.8.0 Reporter: Robert Kanter Assignee: Robert Kanter Attachments: MAPREDUCE-6443.001.patch We should add the {{JvmPauseMonitor}} from HADOOP-9618 to the Job History Server. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6433) launchTime may be negative
[ https://issues.apache.org/jira/browse/MAPREDUCE-6433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14648284#comment-14648284 ] Robert Kanter commented on MAPREDUCE-6433: -- Good catch; I missed the {{VisibleForTesting}} that should have been there. +1 launchTime may be negative -- Key: MAPREDUCE-6433 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6433 Project: Hadoop Map/Reduce Issue Type: Bug Components: jobhistoryserver Affects Versions: 2.4.1 Reporter: Allen Wittenauer Assignee: zhihai xu Attachments: MAPREDUCE-6433.000.patch, MAPREDUCE-6433.001.patch, REPRODUCE.patch Under extremely rare conditions (.0017% in our sample size), launchTime in the jhist files may be set to -1. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6433) launchTime may be negative
[ https://issues.apache.org/jira/browse/MAPREDUCE-6433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14645154#comment-14645154 ] Robert Kanter commented on MAPREDUCE-6433: -- LGTM +1 launchTime may be negative -- Key: MAPREDUCE-6433 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6433 Project: Hadoop Map/Reduce Issue Type: Bug Components: jobhistoryserver Affects Versions: 2.4.1 Reporter: Allen Wittenauer Assignee: zhihai xu Attachments: MAPREDUCE-6433.000.patch, REPRODUCE.patch Under extremely rare conditions (.0017% in our sample size), launchTime in the jhist files may be set to -1. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6415) Create a tool to combine aggregated logs into HAR files
[ https://issues.apache.org/jira/browse/MAPREDUCE-6415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14639513#comment-14639513 ] Robert Kanter commented on MAPREDUCE-6415: -- In that case, I suppose I could write a Java program that calls the 'hadoop archive' command programmatically, and then performs the equivalent 'hadoop fs' operations with the Java API. This would only require one JVM startup. Create a tool to combine aggregated logs into HAR files --- Key: MAPREDUCE-6415 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6415 Project: Hadoop Map/Reduce Issue Type: New Feature Affects Versions: 2.8.0 Reporter: Robert Kanter Assignee: Robert Kanter Attachments: HAR-ableAggregatedLogs_v1.pdf, MAPREDUCE-6415_branch-2_prelim_001.patch, MAPREDUCE-6415_prelim_001.patch While we wait for YARN-2942 to become viable, it would still be great to improve the aggregated logs problem. We can write a tool that combines aggregated log files into a single HAR file per application, which should solve the too many files and too many blocks problems. See the design document for details. See YARN-2942 for more context. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
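The single-JVM approach described in the comment above can be sketched as follows. This is illustrative only, not the actual patch code: the {{Step}} interface below is a hypothetical stand-in for Hadoop's {{Tool}} interface (which 'hadoop archive' implements via {{HadoopArchives}}), and real code would invoke it through {{ToolRunner.run}}.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative pattern: instead of forking a new "hadoop archive" /
// "hadoop fs" process per application, a driver runs each invocation
// serially inside one already-started JVM, paying the JVM startup cost
// only once.
public class SingleJvmDriver {
    // Hypothetical stand-in for org.apache.hadoop.util.Tool.
    interface Step { int run(String[] args); }

    // Runs all invocations serially in this JVM; returns their exit codes.
    static List<Integer> runAll(Step step, List<String[]> invocations) {
        List<Integer> codes = new ArrayList<>();
        for (String[] args : invocations) {
            codes.add(step.run(args));
        }
        return codes;
    }

    public static void main(String[] args) {
        Step echo = a -> { System.out.println(String.join(" ", a)); return 0; };
        List<String[]> calls = new ArrayList<>();
        calls.add(new String[] {"-archiveName", "app1.har"});
        calls.add(new String[] {"-archiveName", "app2.har"});
        System.out.println(runAll(echo, calls));
    }
}
```

The trade-off noted in the thread applies: one JVM means serial execution, which is slower overall than parallel containers but avoids repeated startup cost.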
[jira] [Commented] (MAPREDUCE-6415) Create a tool to combine aggregated logs into HAR files
[ https://issues.apache.org/jira/browse/MAPREDUCE-6415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14637767#comment-14637767 ] Robert Kanter commented on MAPREDUCE-6415: -- {quote}Maybe I'm missing it, but why is this being written in bash instead of as an actual yarn application? The JVM startup costs are going to be massive.{quote} The 'hadoop archive' command starts up a JVM. I don't see how we can get around that unless we call it programmatically from an existing JVM and also do it serially, which is going to take a lot longer overall. I figured it would be simpler to use the DistributedShell because it already exists and does most of what we need, than to write a whole new AM that creates containers to run 'hadoop archive'. {quote}Also, is there something that is guaranteeing that HADOOP_HOME is set?{quote} The shell inherits the env of the NodeManager as a base. HADOOP_HOME should be defined for the NM, so it ends up in env of the shell. I wasn't aware of shellcheck before, but that looks like a really useful tool. I'll fix those. Create a tool to combine aggregated logs into HAR files --- Key: MAPREDUCE-6415 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6415 Project: Hadoop Map/Reduce Issue Type: New Feature Affects Versions: 2.8.0 Reporter: Robert Kanter Assignee: Robert Kanter Attachments: HAR-ableAggregatedLogs_v1.pdf, MAPREDUCE-6415_branch-2_prelim_001.patch, MAPREDUCE-6415_prelim_001.patch While we wait for YARN-2942 to become viable, it would still be great to improve the aggregated logs problem. We can write a tool that combines aggregated log files into a single HAR file per application, which should solve the too many files and too many blocks problems. See the design document for details. See YARN-2942 for more context. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6415) Create a tool to combine aggregated logs into HAR files
[ https://issues.apache.org/jira/browse/MAPREDUCE-6415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14636083#comment-14636083 ] Robert Kanter commented on MAPREDUCE-6415: -- I've created YARN-3950 to add the SHELL_ID and put up a patch there. Create a tool to combine aggregated logs into HAR files --- Key: MAPREDUCE-6415 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6415 Project: Hadoop Map/Reduce Issue Type: New Feature Affects Versions: 2.8.0 Reporter: Robert Kanter Assignee: Robert Kanter Attachments: HAR-ableAggregatedLogs_v1.pdf, MAPREDUCE-6415_branch-2_prelim_001.patch, MAPREDUCE-6415_prelim_001.patch While we wait for YARN-2942 to become viable, it would still be great to improve the aggregated logs problem. We can write a tool that combines aggregated log files into a single HAR file per application, which should solve the too many files and too many blocks problems. See the design document for details. See YARN-2942 for more context. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6415) Create a tool to combine aggregated logs into HAR files
[ https://issues.apache.org/jira/browse/MAPREDUCE-6415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Kanter updated MAPREDUCE-6415: - Attachment: MAPREDUCE-6415_branch-2_prelim_001.patch MAPREDUCE-6415_prelim_001.patch I've uploaded a preliminary patch. It adds a command that looks for eligible apps to process, generates a script that runs the 'hadoop archive' command, and runs that script in the distributed shell. It also modifies the 'yarn logs' command and the JHS to be able to read the har files, all as described in the design document. I still have to write some unit tests and split up the patch into MAPREDUCE and YARN (and HADOOP?) JIRAs. We can also discuss whether we have the right criteria for eligibility. I implemented the ones mentioned in the design document, but it shouldn't be too hard to change them. Here's the CLI usage:
{noformat}
bin/mapred archive-logs -help
usage: yarn archive-logs
 -help                    Prints this message
 -maxEligibleApps n       The maximum number of eligible apps to process
                          (default: -1 (all))
 -maxTotalLogsSize bytes  The maximum total logs size required to be
                          eligible (default: 1GB)
 -memory megabytes        The amount of memory for each container
                          (default: 1024)
 -minNumberLogFiles n     The minimum number of log files required to be
                          eligible (default: 20)
{noformat}
I know it's a bit hard to tell from the Java code what the shell script looks like, so here's an example of one:
{code}
#!/bin/bash
set -e
set -x
CONTAINER_ID_NUM=`echo $CONTAINER_ID | cut -d _ -f 5`
if [ $CONTAINER_ID_NUM == 02 ]; then
  appId=application_1437514991365_0004
  user=rkanter
elif [ $CONTAINER_ID_NUM == 03 ]; then
  appId=application_1437514991365_0005
  user=rkanter
elif [ $CONTAINER_ID_NUM == 04 ]; then
  appId=application_1437514991365_0003
  user=rkanter
elif [ $CONTAINER_ID_NUM == 05 ]; then
  appId=application_1437514991365_0007
  user=rkanter
elif [ $CONTAINER_ID_NUM == 06 ]; then
  appId=application_1437514991365_0006
  user=rkanter
else
  echo Unknown Mapping!
  exit -1
fi
export HADOOP_CLIENT_OPTS=-Xmx1024m
$HADOOP_HOME/bin/hadoop archive -Dmapreduce.framework.name=local -archiveName $appId.har -p /tmp/logs/$user/logs/$appId \* /tmp/logs/archive-logs-work
$HADOOP_HOME/bin/hadoop fs -mv /tmp/logs/archive-logs-work/$appId.har /tmp/logs/$user/logs/$appId/$appId.har
originalLogs=`$HADOOP_HOME/bin/hadoop fs -ls /tmp/logs/$user/logs/$appId | grep ^- | awk '{print $8}'`
if [ ! -z $originalLogs ]; then
  $HADOOP_HOME/bin/hadoop fs -rm $originalLogs
fi
{code}
Create a tool to combine aggregated logs into HAR files --- Key: MAPREDUCE-6415 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6415 Project: Hadoop Map/Reduce Issue Type: New Feature Affects Versions: 2.8.0 Reporter: Robert Kanter Assignee: Robert Kanter Attachments: HAR-ableAggregatedLogs_v1.pdf, MAPREDUCE-6415_branch-2_prelim_001.patch, MAPREDUCE-6415_prelim_001.patch While we wait for YARN-2942 to become viable, it would still be great to improve the aggregated logs problem. We can write a tool that combines aggregated log files into a single HAR file per application, which should solve the too many files and too many blocks problems. See the design document for details. See YARN-2942 for more context. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6415) Create a tool to combine aggregated logs into HAR files
[ https://issues.apache.org/jira/browse/MAPREDUCE-6415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14636027#comment-14636027 ] Robert Kanter commented on MAPREDUCE-6415: -- I didn't realize that that can happen. In that case, having a monotonically increasing number in each container's env independent of the CONTAINER_ID sounds like a good solution. Plus, I won't have to do any parsing to get the unique number. I'll double check, but I think each shell has the same env (other than the CONTAINER_ID), and there's no way to set different ones per shell. If that's the case, it should be fairly easy to add a SHELL_ID env var to the DistributedShell AM that behaves how we want, as a separate JIRA. Create a tool to combine aggregated logs into HAR files --- Key: MAPREDUCE-6415 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6415 Project: Hadoop Map/Reduce Issue Type: New Feature Affects Versions: 2.8.0 Reporter: Robert Kanter Assignee: Robert Kanter Attachments: HAR-ableAggregatedLogs_v1.pdf, MAPREDUCE-6415_branch-2_prelim_001.patch, MAPREDUCE-6415_prelim_001.patch While we wait for YARN-2942 to become viable, it would still be great to improve the aggregated logs problem. We can write a tool that combines aggregated log files into a single HAR file per application, which should solve the too many files and too many blocks problems. See the design document for details. See YARN-2942 for more context. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
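The SHELL_ID behaviour proposed above (later implemented in YARN-3950 as {{YARN_SHELL_ID}}) can be sketched as follows. The class and method names here are illustrative, not the actual DistributedShell AM code.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of the SHELL_ID idea: the AM stamps each launched container's
// environment with a monotonically increasing id, independent of
// CONTAINER_ID, so retried containers still receive fresh ids and the
// shell script needs no parsing to get a unique number.
public class ShellIdAssigner {
    private final AtomicInteger nextId = new AtomicInteger(1);

    // Called once per container launch by the (hypothetical) AM.
    Map<String, String> envForNextContainer() {
        Map<String, String> env = new HashMap<>();
        env.put("YARN_SHELL_ID", String.valueOf(nextId.getAndIncrement()));
        return env;
    }

    public static void main(String[] args) {
        ShellIdAssigner assigner = new ShellIdAssigner();
        System.out.println(assigner.envForNextContainer());
        System.out.println(assigner.envForNextContainer());
    }
}
```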
[jira] [Work started] (MAPREDUCE-6415) Create a tool to combine aggregated logs into HAR files
[ https://issues.apache.org/jira/browse/MAPREDUCE-6415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on MAPREDUCE-6415 started by Robert Kanter. Create a tool to combine aggregated logs into HAR files --- Key: MAPREDUCE-6415 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6415 Project: Hadoop Map/Reduce Issue Type: New Feature Affects Versions: 2.8.0 Reporter: Robert Kanter Assignee: Robert Kanter Attachments: HAR-ableAggregatedLogs_v1.pdf While we wait for YARN-2942 to become viable, it would still be great to improve the aggregated logs problem. We can write a tool that combines aggregated log files into a single HAR file per application, which should solve the too many files and too many blocks problems. See the design document for details. See YARN-2942 for more context. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6121) JobSubmitter.compareFs doesn't handle HA namespaces
[ https://issues.apache.org/jira/browse/MAPREDUCE-6121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14609101#comment-14609101 ] Robert Kanter commented on MAPREDUCE-6121: -- Actually, I just realized that this affects MR and Common. I think for bookkeeping purposes, we should split this up into two JIRAs: one to do the FileUtil and DistCp changes (HADOOP) and one to do the MR change (MAPREDUCE). JobSubmitter.compareFs doesn't handle HA namespaces --- Key: MAPREDUCE-6121 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6121 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2 Affects Versions: 2.5.0 Reporter: Thomas Graves Assignee: Ray Chiang Attachments: MAPREDUCE-6121.001.patch, MAPREDUCE-6121.002.patch Looking at the JobSubmitter.compareFs it doesn't look like it properly handles HA namespaces. The code tries to lookup the hostname using InetAddress.getByName, but if you are using namespaces this is going to fail and its going to copy the file when it doesn't need to. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
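To illustrate the underlying problem in this issue, here is a self-contained sketch (not Hadoop's actual {{compareFs}} code) of why a DNS-based host comparison fails for a logical HA nameservice, and one fix direction: fall back to comparing the raw authorities when resolution fails, instead of concluding the filesystems differ and copying the file.

```java
import java.net.InetAddress;
import java.net.URI;
import java.net.UnknownHostException;

// Sketch: comparing two filesystem URIs by resolved hostname breaks for
// HA namespaces, because a logical nameservice like "nameservice1" is
// not a DNS name; InetAddress.getByName throws UnknownHostException.
public class CompareFsSketch {
    static boolean sameFs(URI a, URI b) {
        if (!a.getScheme().equalsIgnoreCase(b.getScheme())) return false;
        String hostA = a.getHost(), hostB = b.getHost();
        if (hostA == null || hostB == null) return hostA == null && hostB == null;
        try {
            // Works for real hostnames...
            hostA = InetAddress.getByName(hostA).getCanonicalHostName();
            hostB = InetAddress.getByName(hostB).getCanonicalHostName();
        } catch (UnknownHostException e) {
            // ...but an HA nameservice is unresolvable: fall back to
            // comparing the authorities as plain strings.
        }
        return hostA.equalsIgnoreCase(hostB);
    }

    public static void main(String[] args) {
        URI src = URI.create("hdfs://nameservice1/user/alice/input");
        URI dst = URI.create("hdfs://nameservice1/tmp/staging");
        System.out.println(sameFs(src, dst));
    }
}
```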
[jira] [Resolved] (MAPREDUCE-6121) JobResourceUpdater#compareFs() doesn't handle HA namespaces
[ https://issues.apache.org/jira/browse/MAPREDUCE-6121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Kanter resolved MAPREDUCE-6121. -- Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: 2.8.0 Thanks Ray. Committed to trunk and branch-2! JobResourceUpdater#compareFs() doesn't handle HA namespaces --- Key: MAPREDUCE-6121 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6121 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2 Affects Versions: 2.5.0 Reporter: Thomas Graves Assignee: Ray Chiang Fix For: 2.8.0 Attachments: MAPREDUCE-6121.001.patch, MAPREDUCE-6121.002.patch, MAPREDUCE-6121.003.patch Looking at the JobSubmitter.compareFs it doesn't look like it properly handles HA namespaces. The code tries to lookup the hostname using InetAddress.getByName, but if you are using namespaces this is going to fail and its going to copy the file when it doesn't need to. Edit: JobSubmitter was updated to JobResourceUpdater in MAPREDUCE-6267. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6121) JobSubmitter.compareFs doesn't handle HA namespaces
[ https://issues.apache.org/jira/browse/MAPREDUCE-6121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14606836#comment-14606836 ] Robert Kanter commented on MAPREDUCE-6121: -- +1 LGTM JobSubmitter.compareFs doesn't handle HA namespaces --- Key: MAPREDUCE-6121 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6121 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2 Affects Versions: 2.5.0 Reporter: Thomas Graves Assignee: Ray Chiang Attachments: MAPREDUCE-6121.001.patch, MAPREDUCE-6121.002.patch Looking at the JobSubmitter.compareFs it doesn't look like it properly handles HA namespaces. The code tries to lookup the hostname using InetAddress.getByName, but if you are using namespaces this is going to fail and its going to copy the file when it doesn't need to. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6415) Create a tool to combine aggregated logs into HAR files
[ https://issues.apache.org/jira/browse/MAPREDUCE-6415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14601422#comment-14601422 ] Robert Kanter commented on MAPREDUCE-6415: -- Yahoo! also has their own private implementation. So, it seems like there's a need for something like this, and it would be great if everyone could use and contribute to the same version of it. YARN-2942 is being put on hold for the moment because of concerns about HDFS-3689. We can still do it eventually. As for MAPREDUCE-6283, I agree with Vinod's comment there that it seems to be a duplicate of YARN-2942 for the logs part, and a duplicate of the ATSv2 work for the jhist part. Create a tool to combine aggregated logs into HAR files --- Key: MAPREDUCE-6415 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6415 Project: Hadoop Map/Reduce Issue Type: New Feature Affects Versions: 2.8.0 Reporter: Robert Kanter Assignee: Robert Kanter Attachments: HAR-ableAggregatedLogs_v1.pdf While we wait for YARN-2942 to become viable, it would still be great to improve the aggregated logs problem. We can write a tool that combines aggregated log files into a single HAR file per application, which should solve the too many files and too many blocks problems. See the design document for details. See YARN-2942 for more context. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6415) Create a tool to combine aggregated logs into HAR files
[ https://issues.apache.org/jira/browse/MAPREDUCE-6415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Kanter updated MAPREDUCE-6415: - Attachment: HAR-ableAggregatedLogs_v1.pdf Create a tool to combine aggregated logs into HAR files --- Key: MAPREDUCE-6415 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6415 Project: Hadoop Map/Reduce Issue Type: New Feature Affects Versions: 2.8.0 Reporter: Robert Kanter Assignee: Robert Kanter Attachments: HAR-ableAggregatedLogs_v1.pdf While we wait for YARN-2942 to become viable, it would still be great to improve the aggregated logs problem. We can write a tool that combines aggregated log files into a single HAR file per application, which should solve the too many files and too many blocks problems. See the design document for details. See YARN-2942 for more context. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MAPREDUCE-6415) Create a tool to combine aggregated logs into HAR files
Robert Kanter created MAPREDUCE-6415: Summary: Create a tool to combine aggregated logs into HAR files Key: MAPREDUCE-6415 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6415 Project: Hadoop Map/Reduce Issue Type: New Feature Affects Versions: 2.8.0 Reporter: Robert Kanter Assignee: Robert Kanter While we wait for YARN-2942 to become viable, it would still be great to improve the aggregated logs problem. We can write a tool that combines aggregated log files into a single HAR file per application, which should solve the too many files and too many blocks problems. See the design document for details. See YARN-2942 for more context. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6409) NM restarts could lead to app failures
[ https://issues.apache.org/jira/browse/MAPREDUCE-6409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Kanter updated MAPREDUCE-6409: - Attachment: MAPREDUCE-6409.002.patch The new patch renames the enum to {{FAILED_BY_YARN}}. The checkstyle warnings are all for lines in the state machine transitions, which currently match the rest of the lines so I don't want to fix those. NM restarts could lead to app failures -- Key: MAPREDUCE-6409 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6409 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 2.7.0 Reporter: Karthik Kambatla Assignee: Robert Kanter Priority: Critical Attachments: MAPREDUCE-6409.001.patch, MAPREDUCE-6409.002.patch Consider the following scenario: 1. RM assigns a container on node N to an app A. 2. Node N is restarted 3. A tries to launch container on node N. 3 could lead to an NMNotYetReadyException depending on whether NM N has registered with the RM. In MR, this is considered a task attempt failure. A few of these could lead to a task/job failure. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6409) NM restarts could lead to app failures
[ https://issues.apache.org/jira/browse/MAPREDUCE-6409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14594212#comment-14594212 ] Robert Kanter commented on MAPREDUCE-6409: -- I moved this to MAPREDUCE because I'm doing the 3rd suggestion that Karthik mentioned where MR handles this type of failure differently, and doesn't count it against the retries. NM restarts could lead to app failures -- Key: MAPREDUCE-6409 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6409 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 2.7.0 Reporter: Karthik Kambatla Assignee: Robert Kanter Priority: Critical Attachments: MAPREDUCE-6409.001.patch Consider the following scenario: 1. RM assigns a container on node N to an app A. 2. Node N is restarted 3. A tries to launch container on node N. 3 could lead to an NMNotYetReadyException depending on whether NM N has registered with the RM. In MR, this is considered a task attempt failure. A few of these could lead to a task/job failure. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6409) NM restarts could lead to app failures
[ https://issues.apache.org/jira/browse/MAPREDUCE-6409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Kanter updated MAPREDUCE-6409: - Status: Patch Available (was: Open) NM restarts could lead to app failures -- Key: MAPREDUCE-6409 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6409 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 2.7.0 Reporter: Karthik Kambatla Assignee: Robert Kanter Priority: Critical Attachments: MAPREDUCE-6409.001.patch Consider the following scenario: 1. RM assigns a container on node N to an app A. 2. Node N is restarted 3. A tries to launch container on node N. 3 could lead to an NMNotYetReadyException depending on whether NM N has registered with the RM. In MR, this is considered a task attempt failure. A few of these could lead to a task/job failure. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6409) NM restarts could lead to app failures
[ https://issues.apache.org/jira/browse/MAPREDUCE-6409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Kanter updated MAPREDUCE-6409: - Attachment: MAPREDUCE-6409.001.patch NM restarts could lead to app failures -- Key: MAPREDUCE-6409 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6409 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 2.7.0 Reporter: Karthik Kambatla Assignee: Robert Kanter Priority: Critical Attachments: MAPREDUCE-6409.001.patch Consider the following scenario: 1. RM assigns a container on node N to an app A. 2. Node N is restarted 3. A tries to launch container on node N. 3 could lead to an NMNotYetReadyException depending on whether NM N has registered with the RM. In MR, this is considered a task attempt failure. A few of these could lead to a task/job failure. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Moved] (MAPREDUCE-6409) NM restarts could lead to app failures
[ https://issues.apache.org/jira/browse/MAPREDUCE-6409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Kanter moved YARN-3811 to MAPREDUCE-6409: Component/s: (was: nodemanager) Target Version/s: 2.8.0 (was: 2.7.1) Affects Version/s: (was: 2.7.0) 2.7.0 Key: MAPREDUCE-6409 (was: YARN-3811) Project: Hadoop Map/Reduce (was: Hadoop YARN) NM restarts could lead to app failures -- Key: MAPREDUCE-6409 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6409 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 2.7.0 Reporter: Karthik Kambatla Assignee: Robert Kanter Priority: Critical Consider the following scenario: 1. RM assigns a container on node N to an app A. 2. Node N is restarted 3. A tries to launch container on node N. 3 could lead to an NMNotYetReadyException depending on whether NM N has registered with the RM. In MR, this is considered a task attempt failure. A few of these could lead to a task/job failure. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-5965) Hadoop streaming throws error if list of input files is high. Error is: error=7, Argument list too long at if number of input file is high
[ https://issues.apache.org/jira/browse/MAPREDUCE-5965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14571683#comment-14571683 ] Robert Kanter commented on MAPREDUCE-5965: -- +1 LGTM. Will commit this later today if nobody has any other comments. Hadoop streaming throws error if list of input files is high. Error is: error=7, Argument list too long at if number of input file is high Key: MAPREDUCE-5965 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5965 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Arup Malakar Assignee: Wilfred Spiegelenburg Attachments: MAPREDUCE-5965.1.patch, MAPREDUCE-5965.2.patch, MAPREDUCE-5965.3.patch, MAPREDUCE-5965.patch Hadoop streaming exposes all the key values in job conf as environment variables when it forks a process for streaming code to run. Unfortunately the variable mapreduce_input_fileinputformat_inputdir contains the list of input files, and Linux has a limit on size of environment variables + arguments. Based on how long the list of files and their full path is this could be pretty huge. And given all of these variables are not even used it stops user from running hadoop job with large number of files, even though it could be run. Linux throws E2BIG if the size is greater than certain size which is error code 7. And java translates that to error=7, Argument list too long. More: http://man7.org/linux/man-pages/man2/execve.2.html I suggest skipping variables if it is greater than certain length. That way if user code requires the environment variable it would fail. It should also introduce a config variable to skip long variables, and set it to false by default. That way user has to specifically set it to true to invoke this feature. 
Here is the exception:
{code}
Error: java.lang.RuntimeException: Error in configuring object
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:426)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
    ... 9 more
Caused by: java.lang.RuntimeException: Error in configuring object
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
    at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:38)
    ... 14 more
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
    ... 17 more
Caused by: java.lang.RuntimeException: configuration exception
    at org.apache.hadoop.streaming.PipeMapRed.configure(PipeMapRed.java:222)
    at org.apache.hadoop.streaming.PipeMapper.configure(PipeMapper.java:66)
    ... 22 more
Caused by: java.io.IOException: Cannot run program /data/hadoop/hadoop-yarn/cache/yarn/nm-local-dir/usercache/oo-analytics/appcache/application_1403599726264_13177/container_1403599726264_13177_01_06/./rbenv_runner.sh: error=7, Argument list too long
    at java.lang.ProcessBuilder.start(ProcessBuilder.java:1041)
    at org.apache.hadoop.streaming.PipeMapRed.configure(PipeMapRed.java:209)
    ... 23 more
Caused by: java.io.IOException: error=7, Argument list too long
    at java.lang.UNIXProcess.forkAndExec(Native Method)
    at java.lang.UNIXProcess.init(UNIXProcess.java:135)
    at
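The mitigation the reporter proposes (drop over-long job conf values from the forked process's environment) can be sketched as follows. The names and the size limit below are illustrative assumptions, not the actual streaming code; the real patch makes the behaviour configurable and off by default.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: when exporting job conf entries into the streaming child's
// environment, skip any value longer than a limit so the total
// argument+environment size stays below the kernel's execve() limit,
// which otherwise fails with E2BIG ("error=7, Argument list too long").
public class StreamingEnvFilter {
    // Hypothetical default limit, for illustration only.
    static final int DEFAULT_MAX_VALUE_LEN = 20 * 1024;

    static Map<String, String> dropOversized(Map<String, String> env, int maxLen) {
        Map<String, String> out = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : env.entrySet()) {
            if (e.getValue().length() <= maxLen) {
                out.put(e.getKey(), e.getValue());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> env = new LinkedHashMap<>();
        StringBuilder hugeInputDir = new StringBuilder();
        for (int i = 0; i < 100_000; i++) hugeInputDir.append('x');
        env.put("mapreduce_input_fileinputformat_inputdir", hugeInputDir.toString());
        env.put("mapreduce_job_id", "job_1403599726264_13177");
        System.out.println(dropOversized(env, DEFAULT_MAX_VALUE_LEN).keySet());
    }
}
```

As the reporter notes, user code that actually reads a dropped variable would then fail, which is why the feature should be opt-in.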
[jira] [Updated] (MAPREDUCE-5965) Hadoop streaming throws error if list of input files is high. Error is: error=7, Argument list too long at if number of input file is high
[ https://issues.apache.org/jira/browse/MAPREDUCE-5965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Kanter updated MAPREDUCE-5965: - Resolution: Fixed Fix Version/s: 2.8.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Thanks Wilfred. Committed to trunk and branch-2! Hadoop streaming throws error if list of input files is high. Error is: error=7, Argument list too long at if number of input file is high Key: MAPREDUCE-5965 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5965 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Arup Malakar Assignee: Wilfred Spiegelenburg Fix For: 2.8.0 Attachments: MAPREDUCE-5965.1.patch, MAPREDUCE-5965.2.patch, MAPREDUCE-5965.3.patch, MAPREDUCE-5965.patch Hadoop streaming exposes all the key values in job conf as environment variables when it forks a process for streaming code to run. Unfortunately the variable mapreduce_input_fileinputformat_inputdir contains the list of input files, and Linux has a limit on size of environment variables + arguments. Based on how long the list of files and their full path is this could be pretty huge. And given all of these variables are not even used it stops user from running hadoop job with large number of files, even though it could be run. Linux throws E2BIG if the size is greater than certain size which is error code 7. And java translates that to error=7, Argument list too long. More: http://man7.org/linux/man-pages/man2/execve.2.html I suggest skipping variables if it is greater than certain length. That way if user code requires the environment variable it would fail. It should also introduce a config variable to skip long variables, and set it to false by default. That way user has to specifically set it to true to invoke this feature. 
Here is the exception:
{code}
Error: java.lang.RuntimeException: Error in configuring object
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:426)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
    ... 9 more
Caused by: java.lang.RuntimeException: Error in configuring object
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
    at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:38)
    ... 14 more
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
    ... 17 more
Caused by: java.lang.RuntimeException: configuration exception
    at org.apache.hadoop.streaming.PipeMapRed.configure(PipeMapRed.java:222)
    at org.apache.hadoop.streaming.PipeMapper.configure(PipeMapper.java:66)
    ... 22 more
Caused by: java.io.IOException: Cannot run program "/data/hadoop/hadoop-yarn/cache/yarn/nm-local-dir/usercache/oo-analytics/appcache/application_1403599726264_13177/container_1403599726264_13177_01_06/./rbenv_runner.sh": error=7, Argument list too long
    at java.lang.ProcessBuilder.start(ProcessBuilder.java:1041)
    at org.apache.hadoop.streaming.PipeMapRed.configure(PipeMapRed.java:209)
    ... 23 more
Caused by: java.io.IOException: error=7, Argument list too long
    at java.lang.UNIXProcess.forkAndExec(Native Method)
    at java.lang.UNIXProcess.<init>(UNIXProcess.java:135)
    ...
{code}
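The mitigation proposed above (skipping oversized variables before forking the streaming process) can be sketched in plain Java. This is only an illustrative sketch, not the committed patch: the class name, method name, and the 4096-character threshold are invented for the example, and the real fix would gate the behavior behind a config variable that defaults to off.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class EnvFilter {

    /**
     * Returns a copy of env with entries whose values exceed the limit
     * removed, so the child process's environment stays under the kernel's
     * combined argument/environment size cap (E2BIG, error=7).
     */
    static Map<String, String> filterLongVars(Map<String, String> env, int limit) {
        Map<String, String> filtered = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : env.entrySet()) {
            if (e.getValue().length() <= limit) {
                filtered.put(e.getKey(), e.getValue());
            }
        }
        return filtered;
    }

    public static void main(String[] args) {
        Map<String, String> env = new LinkedHashMap<>();
        env.put("mapreduce_input_fileinputformat_inputdir", "x".repeat(100_000));
        env.put("HOME", "/home/user");
        // Only the short variable survives; the huge input-dir list is dropped.
        System.out.println(filterLongVars(env, 4096).keySet());
    }
}
```

A ProcessBuilder would then be handed the filtered map (via its environment() view), keeping the fork under the execve(2) limit.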
[jira] [Created] (MAPREDUCE-6375) Modify the JHS to be able to read the ConcatenatableAggregatedLogFormat
Robert Kanter created MAPREDUCE-6375: Summary: Modify the JHS to be able to read the ConcatenatableAggregatedLogFormat Key: MAPREDUCE-6375 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6375 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 2.8.0 Reporter: Robert Kanter Assignee: Robert Kanter When serving logs, the JHS needs to be able to read the {{ConcatenatableAggregatedLogFormat}} or the {{AggregatedLogFormat}} transparently. (see YARN-2942) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
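The linked YARN-2942 work is not shown here, but reading either of two file formats "transparently" is commonly done by sniffing a leading magic header and delegating to the matching reader. The sketch below illustrates only that dispatch idea; the magic strings and return values are invented, since the real on-disk headers of {{ConcatenatableAggregatedLogFormat}} and {{AggregatedLogFormat}} are not described in this issue.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class LogFormatSniffer {
    // Hypothetical magic strings; real log formats define their own headers.
    static final String CONCATENATABLE_MAGIC = "CLOG";
    static final String AGGREGATED_MAGIC = "ALOG";

    /** Reads the first four bytes and reports which reader should handle the stream. */
    static String detectFormat(InputStream in) throws IOException {
        byte[] head = new byte[4];
        int n = in.read(head);
        String magic = new String(head, 0, Math.max(n, 0), StandardCharsets.US_ASCII);
        if (CONCATENATABLE_MAGIC.equals(magic)) {
            return "concatenatable";
        }
        if (AGGREGATED_MAGIC.equals(magic)) {
            return "aggregated";
        }
        return "unknown";
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new ByteArrayInputStream(
                "CLOG...log data...".getBytes(StandardCharsets.US_ASCII));
        System.out.println(detectFormat(in)); // prints concatenatable
    }
}
```

With a dispatcher like this, the JHS code that serves logs never needs to know in advance which format a given aggregated-log file uses.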
[jira] [Commented] (MAPREDUCE-6288) mapred job -status fails with AccessControlException
[ https://issues.apache.org/jira/browse/MAPREDUCE-6288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14535615#comment-14535615 ] Robert Kanter commented on MAPREDUCE-6288: -- I agree with Gera and Karthik. The permissions change should be benign (plus, as it stands, it seems funny for the user to own the file if they can't actually even read it).

mapred job -status fails with AccessControlException
Key: MAPREDUCE-6288 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6288 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 2.8.0 Reporter: Robert Kanter Assignee: Robert Kanter Priority: Blocker Attachments: MAPREDUCE-6288-gera-001.patch, MAPREDUCE-6288.002.patch, MAPREDUCE-6288.patch

After MAPREDUCE-5875, we're seeing this Exception when trying to do {{mapred job -status job_1427080398288_0001}}
{noformat}
Exception in thread "main" org.apache.hadoop.security.AccessControlException: Permission denied: user=jenkins, access=EXECUTE, inode="/user/history/done":mapred:hadoop:drwxrwx---
    at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkFsPermission(DefaultAuthorizationProvider.java:257)
    at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.check(DefaultAuthorizationProvider.java:238)
    at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkTraverse(DefaultAuthorizationProvider.java:180)
    at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkPermission(DefaultAuthorizationProvider.java:137)
    at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:138)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:6553)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:6535)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPathAccess(FSNamesystem.java:6460)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsUpdateTimes(FSNamesystem.java:1919)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1870)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1850)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1822)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:545)
    at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.getBlockLocations(AuthorizationProviderProxyClientProtocol.java:87)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:363)
    at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1060)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2040)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2038)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
    at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
    at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:73)
    at org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:1213)
    at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1201)
    at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1191)
    at org.apache.hadoop.hdfs.DFSInputStream.fetchLocatedBlocksAndGetLastBlockLength(DFSInputStream.java:299)
    at org.apache.hadoop.hdfs.DFSInputStream.openInfo(DFSInputStream.java:265)
    at org.apache.hadoop.hdfs.DFSInputStream.<init>(DFSInputStream.java:257)
    ...
{noformat}
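The exception above is a directory-traverse check: reaching anything under /user/history/done requires the EXECUTE bit on that directory for the requesting user, and with mode drwxrwx--- a user ('jenkins') who is neither the owner ('mapred') nor in the group ('hadoop') is denied. A minimal sketch of such a POSIX-style mode check (simplified; HDFS additionally consults group membership lists and, in later versions, ACLs):

```java
public class ModeCheck {
    static final int EXECUTE = 1; // the x bit within an rwx triple

    /**
     * Checks one permission bit against a 9-bit POSIX-style mode (e.g. 0770
     * for drwxrwx---), selecting the owner, group, or other triple.
     */
    static boolean allowed(int mode, boolean isOwner, boolean inGroup, int bit) {
        int triple;
        if (isOwner) {
            triple = (mode >> 6) & 7;
        } else if (inGroup) {
            triple = (mode >> 3) & 7;
        } else {
            triple = mode & 7;
        }
        return (triple & bit) != 0;
    }

    public static void main(String[] args) {
        int mode = 0770; // drwxrwx---, as on /user/history/done above
        // 'jenkins' is neither the owner (mapred) nor in the group (hadoop):
        System.out.println(allowed(mode, false, false, EXECUTE)); // prints false
    }
}
```

The "others" triple of 0770 is 0, so the EXECUTE check fails and the NameNode raises the AccessControlException quoted above.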
[jira] [Commented] (MAPREDUCE-6192) Create unit test to automatically compare MR related classes and mapred-default.xml
[ https://issues.apache.org/jira/browse/MAPREDUCE-6192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14529353#comment-14529353 ] Robert Kanter commented on MAPREDUCE-6192: -- +1 Create unit test to automatically compare MR related classes and mapred-default.xml --- Key: MAPREDUCE-6192 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6192 Project: Hadoop Map/Reduce Issue Type: Improvement Affects Versions: 2.6.0 Reporter: Ray Chiang Assignee: Ray Chiang Priority: Minor Labels: supportability Fix For: 2.8.0 Attachments: MAPREDUCE-6192.001.patch, MAPREDUCE-6192.002.patch, MAPREDUCE-6192.003.patch, MAPREDUCE-6192.004.patch, MAPREDUCE-6192.005.patch, MAPREDUCE-6192.006.patch, MAPREDUCE-6192.007.patch, MAPREDUCE-6192.branch-2.007.patch Create a unit test that will automatically compare the fields in the various MapReduce related classes and mapred-default.xml. It should throw an error if a property is missing in either the class or the file. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
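The core of such a test can be sketched with plain reflection: gather the values of the static final String constants declared by a configuration class and diff them against the property names parsed from the XML. The sketch below is simplified relative to the real {{TestConfigurationFieldsBase}} (which also handles deprecated keys and filtering); SampleConfig and its two properties are invented stand-ins for classes like MRJobConfig.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.HashSet;
import java.util.Set;

public class ConfigFieldCheck {
    // Invented stand-in for a real config class such as MRJobConfig.
    public static class SampleConfig {
        public static final String MAP_MEMORY_MB = "mapreduce.map.memory.mb";
        public static final String REDUCE_MEMORY_MB = "mapreduce.reduce.memory.mb";
    }

    /** Collects the values of static final String fields via reflection. */
    static Set<String> constantValues(Class<?> clazz) throws IllegalAccessException {
        Set<String> values = new HashSet<>();
        for (Field f : clazz.getDeclaredFields()) {
            int m = f.getModifiers();
            if (Modifier.isStatic(m) && Modifier.isFinal(m) && f.getType() == String.class) {
                values.add((String) f.get(null));
            }
        }
        return values;
    }

    /** Returns class constants with no matching property name in the XML set. */
    static Set<String> missingFrom(Class<?> clazz, Set<String> xmlProperties)
            throws IllegalAccessException {
        Set<String> missing = new HashSet<>(constantValues(clazz));
        missing.removeAll(xmlProperties);
        return missing;
    }

    public static void main(String[] args) throws Exception {
        Set<String> xml = new HashSet<>();
        xml.add("mapreduce.map.memory.mb"); // pretend the XML only lists this one
        // The reduce-side constant is flagged as missing from the XML file.
        System.out.println(missingFrom(SampleConfig.class, xml));
    }
}
```

The real test runs this diff in both directions, so a property added to only the class or only mapred-default.xml fails the build.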
[jira] [Updated] (MAPREDUCE-6192) Create unit test to automatically compare MR related classes and mapred-default.xml
[ https://issues.apache.org/jira/browse/MAPREDUCE-6192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Kanter updated MAPREDUCE-6192: - Resolution: Fixed Fix Version/s: 2.8.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Thanks Ray. Committed to trunk and branch-2! Create unit test to automatically compare MR related classes and mapred-default.xml --- Key: MAPREDUCE-6192 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6192 Project: Hadoop Map/Reduce Issue Type: Improvement Affects Versions: 2.6.0 Reporter: Ray Chiang Assignee: Ray Chiang Priority: Minor Labels: supportability Fix For: 2.8.0 Attachments: MAPREDUCE-6192.001.patch, MAPREDUCE-6192.002.patch, MAPREDUCE-6192.003.patch, MAPREDUCE-6192.004.patch, MAPREDUCE-6192.005.patch, MAPREDUCE-6192.006.patch, MAPREDUCE-6192.007.patch, MAPREDUCE-6192.branch-2.007.patch Create a unit test that will automatically compare the fields in the various MapReduce related classes and mapred-default.xml. It should throw an error if a property is missing in either the class or the file. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6192) Create unit test to automatically compare MR related classes and mapred-default.xml
[ https://issues.apache.org/jira/browse/MAPREDUCE-6192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14527011#comment-14527011 ] Robert Kanter commented on MAPREDUCE-6192: -- LGTM +1 Create unit test to automatically compare MR related classes and mapred-default.xml --- Key: MAPREDUCE-6192 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6192 Project: Hadoop Map/Reduce Issue Type: Improvement Affects Versions: 2.6.0 Reporter: Ray Chiang Assignee: Ray Chiang Priority: Minor Labels: supportability Attachments: MAPREDUCE-6192.001.patch, MAPREDUCE-6192.002.patch, MAPREDUCE-6192.003.patch, MAPREDUCE-6192.004.patch, MAPREDUCE-6192.005.patch, MAPREDUCE-6192.006.patch, MAPREDUCE-6192.007.patch Create a unit test that will automatically compare the fields in the various MapReduce related classes and mapred-default.xml. It should throw an error if a property is missing in either the class or the file. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6192) Create unit test to automatically compare MR related classes and mapred-default.xml
[ https://issues.apache.org/jira/browse/MAPREDUCE-6192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14527685#comment-14527685 ] Robert Kanter commented on MAPREDUCE-6192: -- [~rchiang], it works fine on trunk, but it looks like there are some differences in branch-2. Can you look into this?
{noformat}
Running org.apache.hadoop.mapreduce.TestMapreduceConfigFields
Tests run: 2, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 0.624 sec <<< FAILURE! - in org.apache.hadoop.mapreduce.TestMapreduceConfigFields
testCompareXmlAgainstConfigurationClass(org.apache.hadoop.mapreduce.TestMapreduceConfigFields)  Time elapsed: 0.497 sec  <<< FAILURE!
java.lang.AssertionError: mapred-default.xml has 4 properties missing in
  interface org.apache.hadoop.mapreduce.MRJobConfig
  interface org.apache.hadoop.mapreduce.MRConfig
  class org.apache.hadoop.mapreduce.v2.jobhistory.JHAdminConfig
  class org.apache.hadoop.mapred.ShuffleHandler
  class org.apache.hadoop.mapreduce.lib.output.FileOutputFormat
  class org.apache.hadoop.mapreduce.lib.input.FileInputFormat
  class org.apache.hadoop.mapreduce.Job
  class org.apache.hadoop.mapreduce.lib.input.NLineInputFormat
  class org.apache.hadoop.mapred.JobConf
  class org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
    at org.junit.Assert.fail(Assert.java:88)
    at org.junit.Assert.assertTrue(Assert.java:41)
    at org.apache.hadoop.conf.TestConfigurationFieldsBase.testCompareXmlAgainstConfigurationClass(TestConfigurationFieldsBase.java:468)
    ...
{noformat} Create unit test to automatically compare MR related classes and mapred-default.xml --- Key: MAPREDUCE-6192 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6192 Project: Hadoop Map/Reduce Issue Type: Improvement Affects Versions: 2.6.0 Reporter: Ray Chiang Assignee: Ray Chiang Priority: Minor Labels: supportability Attachments: MAPREDUCE-6192.001.patch, MAPREDUCE-6192.002.patch, MAPREDUCE-6192.003.patch, MAPREDUCE-6192.004.patch, MAPREDUCE-6192.005.patch, MAPREDUCE-6192.006.patch, MAPREDUCE-6192.007.patch Create a unit test that will automatically compare the fields in the various MapReduce related classes and mapred-default.xml. It should throw an error if a property is missing in either the class or the file. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6165) [JDK8] TestCombineFileInputFormat failed on JDK8
[ https://issues.apache.org/jira/browse/MAPREDUCE-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14527394#comment-14527394 ] Robert Kanter commented on MAPREDUCE-6165: -- I ran the tests in hadoop-mapreduce-client-jobclient and they all passed locally (took about 1.5 hours). I've kicked off Jenkins again just to be extra sure, but it looks like this patch is probably fine.

[JDK8] TestCombineFileInputFormat failed on JDK8
Key: MAPREDUCE-6165 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6165 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Wei Yan Assignee: Akira AJISAKA Priority: Minor Attachments: MAPREDUCE-6165-001.patch, MAPREDUCE-6165-002.patch, MAPREDUCE-6165-003.patch, MAPREDUCE-6165-003.patch, MAPREDUCE-6165-004.patch, MAPREDUCE-6165-reproduce.patch

The error msg:
{noformat}
testSplitPlacementForCompressedFiles(org.apache.hadoop.mapreduce.lib.input.TestCombineFileInputFormat)  Time elapsed: 2.487 sec  <<< FAILURE!
junit.framework.AssertionFailedError: expected:<2> but was:<1>
    at junit.framework.Assert.fail(Assert.java:57)
    at junit.framework.Assert.failNotEquals(Assert.java:329)
    at junit.framework.Assert.assertEquals(Assert.java:78)
    at junit.framework.Assert.assertEquals(Assert.java:234)
    at junit.framework.Assert.assertEquals(Assert.java:241)
    at junit.framework.TestCase.assertEquals(TestCase.java:409)
    at org.apache.hadoop.mapreduce.lib.input.TestCombineFileInputFormat.testSplitPlacementForCompressedFiles(TestCombineFileInputFormat.java:911)

testSplitPlacement(org.apache.hadoop.mapreduce.lib.input.TestCombineFileInputFormat)  Time elapsed: 0.985 sec  <<< FAILURE!
junit.framework.AssertionFailedError: expected:<2> but was:<1>
    at junit.framework.Assert.fail(Assert.java:57)
    at junit.framework.Assert.failNotEquals(Assert.java:329)
    at junit.framework.Assert.assertEquals(Assert.java:78)
    at junit.framework.Assert.assertEquals(Assert.java:234)
    at junit.framework.Assert.assertEquals(Assert.java:241)
    at junit.framework.TestCase.assertEquals(TestCase.java:409)
    at org.apache.hadoop.mapreduce.lib.input.TestCombineFileInputFormat.testSplitPlacement(TestCombineFileInputFormat.java:368)
{noformat}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6165) [JDK8] TestCombineFileInputFormat failed on JDK8
[ https://issues.apache.org/jira/browse/MAPREDUCE-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14521884#comment-14521884 ] Robert Kanter commented on MAPREDUCE-6165: -- I've kicked it off again.

[JDK8] TestCombineFileInputFormat failed on JDK8
Key: MAPREDUCE-6165 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6165 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Wei Yan Assignee: Akira AJISAKA Priority: Minor Attachments: MAPREDUCE-6165-001.patch, MAPREDUCE-6165-002.patch, MAPREDUCE-6165-003.patch, MAPREDUCE-6165-003.patch, MAPREDUCE-6165-004.patch, MAPREDUCE-6165-reproduce.patch

The error msg:
{noformat}
testSplitPlacementForCompressedFiles(org.apache.hadoop.mapreduce.lib.input.TestCombineFileInputFormat)  Time elapsed: 2.487 sec  <<< FAILURE!
junit.framework.AssertionFailedError: expected:<2> but was:<1>
    at junit.framework.Assert.fail(Assert.java:57)
    at junit.framework.Assert.failNotEquals(Assert.java:329)
    at junit.framework.Assert.assertEquals(Assert.java:78)
    at junit.framework.Assert.assertEquals(Assert.java:234)
    at junit.framework.Assert.assertEquals(Assert.java:241)
    at junit.framework.TestCase.assertEquals(TestCase.java:409)
    at org.apache.hadoop.mapreduce.lib.input.TestCombineFileInputFormat.testSplitPlacementForCompressedFiles(TestCombineFileInputFormat.java:911)

testSplitPlacement(org.apache.hadoop.mapreduce.lib.input.TestCombineFileInputFormat)  Time elapsed: 0.985 sec  <<< FAILURE!
junit.framework.AssertionFailedError: expected:<2> but was:<1>
    at junit.framework.Assert.fail(Assert.java:57)
    at junit.framework.Assert.failNotEquals(Assert.java:329)
    at junit.framework.Assert.assertEquals(Assert.java:78)
    at junit.framework.Assert.assertEquals(Assert.java:234)
    at junit.framework.Assert.assertEquals(Assert.java:241)
    at junit.framework.TestCase.assertEquals(TestCase.java:409)
    at org.apache.hadoop.mapreduce.lib.input.TestCombineFileInputFormat.testSplitPlacement(TestCombineFileInputFormat.java:368)
{noformat}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6192) Create unit test to automatically compare MR related classes and mapred-default.xml
[ https://issues.apache.org/jira/browse/MAPREDUCE-6192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14518185#comment-14518185 ] Robert Kanter commented on MAPREDUCE-6192: -- To fix the deprecation warnings, can you add the {{@SuppressWarnings("deprecation")}} annotation where appropriate? Create unit test to automatically compare MR related classes and mapred-default.xml --- Key: MAPREDUCE-6192 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6192 Project: Hadoop Map/Reduce Issue Type: Improvement Affects Versions: 2.6.0 Reporter: Ray Chiang Assignee: Ray Chiang Priority: Minor Labels: supportability Attachments: MAPREDUCE-6192.001.patch, MAPREDUCE-6192.002.patch, MAPREDUCE-6192.003.patch, MAPREDUCE-6192.004.patch, MAPREDUCE-6192.005.patch, MAPREDUCE-6192.006.patch Create a unit test that will automatically compare the fields in the various MapReduce related classes and mapred-default.xml. It should throw an error if a property is missing in either the class or the file. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6165) [JDK8] TestCombineFileInputFormat failed on JDK8
[ https://issues.apache.org/jira/browse/MAPREDUCE-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14518172#comment-14518172 ] Robert Kanter commented on MAPREDUCE-6165: -- LGTM. I've checked locally that it now passes on JDKs 7 and 8. I've just kicked off Jenkins to run again; hopefully it will work correctly this time. +1 pending Jenkins

[JDK8] TestCombineFileInputFormat failed on JDK8
Key: MAPREDUCE-6165 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6165 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Wei Yan Assignee: Akira AJISAKA Priority: Minor Attachments: MAPREDUCE-6165-001.patch, MAPREDUCE-6165-002.patch, MAPREDUCE-6165-003.patch, MAPREDUCE-6165-003.patch, MAPREDUCE-6165-reproduce.patch

The error msg:
{noformat}
testSplitPlacementForCompressedFiles(org.apache.hadoop.mapreduce.lib.input.TestCombineFileInputFormat)  Time elapsed: 2.487 sec  <<< FAILURE!
junit.framework.AssertionFailedError: expected:<2> but was:<1>
    at junit.framework.Assert.fail(Assert.java:57)
    at junit.framework.Assert.failNotEquals(Assert.java:329)
    at junit.framework.Assert.assertEquals(Assert.java:78)
    at junit.framework.Assert.assertEquals(Assert.java:234)
    at junit.framework.Assert.assertEquals(Assert.java:241)
    at junit.framework.TestCase.assertEquals(TestCase.java:409)
    at org.apache.hadoop.mapreduce.lib.input.TestCombineFileInputFormat.testSplitPlacementForCompressedFiles(TestCombineFileInputFormat.java:911)

testSplitPlacement(org.apache.hadoop.mapreduce.lib.input.TestCombineFileInputFormat)  Time elapsed: 0.985 sec  <<< FAILURE!
junit.framework.AssertionFailedError: expected:<2> but was:<1>
    at junit.framework.Assert.fail(Assert.java:57)
    at junit.framework.Assert.failNotEquals(Assert.java:329)
    at junit.framework.Assert.assertEquals(Assert.java:78)
    at junit.framework.Assert.assertEquals(Assert.java:234)
    at junit.framework.Assert.assertEquals(Assert.java:241)
    at junit.framework.TestCase.assertEquals(TestCase.java:409)
    at org.apache.hadoop.mapreduce.lib.input.TestCombineFileInputFormat.testSplitPlacement(TestCombineFileInputFormat.java:368)
{noformat}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6324) Uber jobs fail to update AMRM token when it rolls over
[ https://issues.apache.org/jira/browse/MAPREDUCE-6324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14505767#comment-14505767 ] Robert Kanter commented on MAPREDUCE-6324: -- Thanks for finding this issue and posting it; this'll be helpful for anyone using Uber mode. LGTM +1 on the change. It seems like this might be tricky to do, but would it be possible to add a unit test? Uber jobs fail to update AMRM token when it rolls over -- Key: MAPREDUCE-6324 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6324 Project: Hadoop Map/Reduce Issue Type: Bug Components: mr-am Affects Versions: 2.6.0 Reporter: Jason Lowe Assignee: Jason Lowe Priority: Blocker Attachments: MAPREDUCE-6324.001.patch When the RM rolls a new AMRM master key the AMs are supposed to receive a new AMRM token on subsequent heartbeats between the time when the new key is rolled and when it is activated. This is not occurring for uber jobs. If the connection to the RM needs to be re-established after the new key is activated (e.g.: RM restart or network hiccup) then the uber job AM will be unable to reconnect to the RM. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6238) MR2 can't run local jobs with -libjars command options which is a regression from MR1
[ https://issues.apache.org/jira/browse/MAPREDUCE-6238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Kanter updated MAPREDUCE-6238: - Resolution: Fixed Fix Version/s: 2.8.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Thanks Zhihai. Committed to trunk and branch-2!

MR2 can't run local jobs with -libjars command options which is a regression from MR1
Key: MAPREDUCE-6238 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6238 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2 Reporter: zhihai xu Assignee: zhihai xu Priority: Critical Fix For: 2.8.0 Attachments: MAPREDUCE-6238.000.patch

MR2 can't run local jobs with the -libjars command option, which is a regression from MR1. When running an MR2 job with -jt local and -libjars, the job fails with java.io.FileNotFoundException: File does not exist: hdfs://XXX.jar, but the same command works in MR1. I found two problems: 1. When MR2 runs a local job using LocalJobRunner from JobSubmitter, JobSubmitter#jtFs is the local filesystem, so copyRemoteFiles returns from [the middle of the function|https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/JobSubmitter.java#L138] because the source and destination file systems are the same.
{code}
if (compareFs(remoteFs, jtFs)) {
  return originalPath;
}
{code}
The following code at [JobSubmitter.java|https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/JobSubmitter.java#L219] tries to add the destination file to the DistributedCache, which introduces a bug for local jobs.
{code}
Path newPath = copyRemoteFiles(libjarsDir, tmp, conf, replication);
DistributedCache.addFileToClassPath(
    new Path(newPath.toUri().getPath()), conf);
{code}
Because new Path(newPath.toUri().getPath()) loses the filesystem information from newPath, the file added to the DistributedCache will use the default URI filesystem (hdfs), based on the following code. This causes the FileNotFoundException when we access the file later at [determineTimestampsAndCacheVisibilities|https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/JobSubmitter.java#L270]
{code}
public static void addFileToClassPath(Path file, Configuration conf)
    throws IOException {
  addFileToClassPath(file, conf, file.getFileSystem(conf));
}

public static void addFileToClassPath(Path file, Configuration conf,
    FileSystem fs) throws IOException {
  String classpath = conf.get(MRJobConfig.CLASSPATH_FILES);
  conf.set(MRJobConfig.CLASSPATH_FILES, classpath == null ? file.toString()
      : classpath + "," + file.toString());
  URI uri = fs.makeQualified(file).toUri();
  addCacheFile(uri, conf);
}
{code}
Compare to the following [MR1 code|https://github.com/apache/hadoop/blob/branch-1/src/mapred/org/apache/hadoop/mapred/JobClient.java#L811]:
{code}
Path newPath = copyRemoteFiles(fs, libjarsDir, tmp, job, replication);
DistributedCache.addFileToClassPath(
    new Path(newPath.toUri().getPath()), job, fs);
{code}
You will see why MR1 doesn't have this issue: it passes the local filesystem into DistributedCache#addFileToClassPath instead of using the default URI filesystem (hdfs).

2.
Another incompatible change in MR2 is in [LocalDistributedCacheManager#setup|https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common/src/main/java/org/apache/hadoop/mapred/LocalDistributedCacheManager.java#L113]
{code}
// Find which resources are to be put on the local classpath
Map<String, Path> classpaths = new HashMap<String, Path>();
Path[] archiveClassPaths = DistributedCache.getArchiveClassPaths(conf);
if (archiveClassPaths != null) {
  for (Path p : archiveClassPaths) {
    FileSystem remoteFS = p.getFileSystem(conf);
    p = remoteFS.resolvePath(p.makeQualified(remoteFS.getUri(),
        remoteFS.getWorkingDirectory()));
    classpaths.put(p.toUri().getPath().toString(), p);
  }
}
Path[] fileClassPaths = DistributedCache.getFileClassPaths(conf);
if (fileClassPaths != null) {
  for (Path p : fileClassPaths) {
    FileSystem remoteFS = p.getFileSystem(conf);
    p = ...
{code}
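The scheme loss at the heart of point 1 can be demonstrated with the standard library alone: URI#getPath() returns only the path component, so a path rebuilt from it carries no filesystem scheme and later resolves against the default filesystem (hdfs). A sketch with java.net.URI (Hadoop's Path wraps a URI and behaves analogously):

```java
import java.net.URI;

public class SchemeLoss {
    public static void main(String[] args) {
        URI qualified = URI.create("file:///tmp/libjars/my.jar");
        // getPath() keeps only the path component; the scheme is lost.
        String bare = qualified.getPath();         // "/tmp/libjars/my.jar"
        URI rebuilt = URI.create(bare);            // scheme is now null
        System.out.println(qualified.getScheme()); // prints file
        System.out.println(rebuilt.getScheme());   // prints null
        // Code that later resolves scheme-less paths against the default
        // filesystem (hdfs) then goes looking for the jar on HDFS,
        // producing the FileNotFoundException described above.
    }
}
```

This is why passing the source FileSystem explicitly (as MR1's addFileToClassPath did) avoids the bug: the qualification step puts the correct scheme back.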
[jira] [Commented] (MAPREDUCE-6238) MR2 can't run local jobs with -libjars command options which is a regression from MR1
[ https://issues.apache.org/jira/browse/MAPREDUCE-6238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14499060#comment-14499060 ] Robert Kanter commented on MAPREDUCE-6238: -- Looks good to me. I'd like to get a cleaner build before we commit this, so I've kicked off another one. +1 pending Jenkins

MR2 can't run local jobs with -libjars command options which is a regression from MR1
Key: MAPREDUCE-6238 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6238 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2 Reporter: zhihai xu Assignee: zhihai xu Priority: Critical Attachments: MAPREDUCE-6238.000.patch

MR2 can't run local jobs with the -libjars command option, which is a regression from MR1. When running an MR2 job with -jt local and -libjars, the job fails with java.io.FileNotFoundException: File does not exist: hdfs://XXX.jar, but the same command works in MR1. I found two problems: 1. When MR2 runs a local job using LocalJobRunner from JobSubmitter, JobSubmitter#jtFs is the local filesystem, so copyRemoteFiles returns from [the middle of the function|https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/JobSubmitter.java#L138] because the source and destination file systems are the same.
{code}
if (compareFs(remoteFs, jtFs)) {
  return originalPath;
}
{code}
The following code at [JobSubmitter.java|https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/JobSubmitter.java#L219] tries to add the destination file to the DistributedCache, which introduces a bug for local jobs.
{code}
Path newPath = copyRemoteFiles(libjarsDir, tmp, conf, replication);
DistributedCache.addFileToClassPath(
    new Path(newPath.toUri().getPath()), conf);
{code}
Because new Path(newPath.toUri().getPath()) loses the filesystem information from newPath, the file added to the DistributedCache will use the default URI filesystem (hdfs), based on the following code. This causes the FileNotFoundException when we access the file later at [determineTimestampsAndCacheVisibilities|https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/JobSubmitter.java#L270]
{code}
public static void addFileToClassPath(Path file, Configuration conf)
    throws IOException {
  addFileToClassPath(file, conf, file.getFileSystem(conf));
}

public static void addFileToClassPath(Path file, Configuration conf,
    FileSystem fs) throws IOException {
  String classpath = conf.get(MRJobConfig.CLASSPATH_FILES);
  conf.set(MRJobConfig.CLASSPATH_FILES, classpath == null ? file.toString()
      : classpath + "," + file.toString());
  URI uri = fs.makeQualified(file).toUri();
  addCacheFile(uri, conf);
}
{code}
Compare to the following [MR1 code|https://github.com/apache/hadoop/blob/branch-1/src/mapred/org/apache/hadoop/mapred/JobClient.java#L811]:
{code}
Path newPath = copyRemoteFiles(fs, libjarsDir, tmp, job, replication);
DistributedCache.addFileToClassPath(
    new Path(newPath.toUri().getPath()), job, fs);
{code}
You will see why MR1 doesn't have this issue: it passes the local filesystem into DistributedCache#addFileToClassPath instead of using the default URI filesystem (hdfs).

2.
Another incompatible change in MR2 is in [LocalDistributedCacheManager#setup|https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common/src/main/java/org/apache/hadoop/mapred/LocalDistributedCacheManager.java#L113]
{code}
// Find which resources are to be put on the local classpath
Map<String, Path> classpaths = new HashMap<String, Path>();
Path[] archiveClassPaths = DistributedCache.getArchiveClassPaths(conf);
if (archiveClassPaths != null) {
  for (Path p : archiveClassPaths) {
    FileSystem remoteFS = p.getFileSystem(conf);
    p = remoteFS.resolvePath(p.makeQualified(remoteFS.getUri(),
        remoteFS.getWorkingDirectory()));
    classpaths.put(p.toUri().getPath().toString(), p);
  }
}
Path[] fileClassPaths = DistributedCache.getFileClassPaths(conf);
if (fileClassPaths != null) {
  for (Path p : fileClassPaths) {
    FileSystem remoteFS = p.getFileSystem(conf);
    p = remoteFS.resolvePath(p.makeQualified(remoteFS.getUri(),
        ...
{code}
[jira] [Updated] (MAPREDUCE-6266) Job#getTrackingURL should consistently return a proper URL
[ https://issues.apache.org/jira/browse/MAPREDUCE-6266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Kanter updated MAPREDUCE-6266: - Resolution: Fixed Fix Version/s: 2.8.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Thanks Ray. Committed to trunk and branch-2! Job#getTrackingURL should consistently return a proper URL -- Key: MAPREDUCE-6266 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6266 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 2.6.0 Reporter: Ray Chiang Assignee: Ray Chiang Priority: Minor Labels: supportability Fix For: 2.8.0 Attachments: MAPREDUCE-6266.001.patch, MAPREDUCE-6266.002.patch, MAPREDUCE-6266.003.patch When a job is running, Job#getTrackingURL returns a proper URL like: http://RM_IP:8088/proxy/application_1424910897258_0004/ Once a job is finished and the job has moved to the JHS, then Job#getTrackingURL returns a URL without the protocol like: JHS_IP:19888/jobhistory/job/job_1424910897258_0004 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
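The inconsistency above is easy to guard against on the client side. Below is a minimal hypothetical helper, not the actual MAPREDUCE-6266 patch, that ensures a returned tracking URL always carries a protocol:

```java
public class TrackingUrl {

    // Hedged sketch: if the URL has no scheme separator (the JHS case,
    // e.g. "JHS_IP:19888/jobhistory/job/..."), assume plain HTTP.
    static String ensureProtocol(String url) {
        if (url == null || url.isEmpty()) {
            return url;
        }
        return url.contains("://") ? url : "http://" + url;
    }

    public static void main(String[] args) {
        // JHS-style URL gets a protocol prepended; RM proxy URL is untouched.
        System.out.println(ensureProtocol("JHS_IP:19888/jobhistory/job/job_1424910897258_0004"));
        System.out.println(ensureProtocol("http://RM_IP:8088/proxy/application_1424910897258_0004/"));
    }
}
```

Callers that feed the tracking URL into a browser or an HTTP client then never have to special-case the finished-job form.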
[jira] [Commented] (MAPREDUCE-6266) Job#getTrackingURL should consistently return a proper URL
[ https://issues.apache.org/jira/browse/MAPREDUCE-6266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14393523#comment-14393523 ] Robert Kanter commented on MAPREDUCE-6266: -- The 003 patch looks good to me. +1 [~djp], any additional comments?
[jira] [Updated] (MAPREDUCE-6076) Zero map split input length combine with none zero map split input length will cause MR1 job hung.
[ https://issues.apache.org/jira/browse/MAPREDUCE-6076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Kanter updated MAPREDUCE-6076: - Resolution: Fixed Fix Version/s: 1.3.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Thanks Zhihai. Committed to branch-1!

Zero map split input length combined with non-zero map split input length will cause an MR1 job to hang.

Key: MAPREDUCE-6076 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6076 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv1 Reporter: zhihai xu Assignee: zhihai xu Fix For: 1.3.0 Attachments: MAPREDUCE-6076.branch-1.000.patch

A zero map split input length combined with non-zero map split input lengths can cause an MR1 job to hang. This can happen when using an HBase input split (TableSplit): the HBase split input length is zero for unknown regions and non-zero for known regions, per the following code:
{code}
// TableSplit.java
public long getLength() {
  return length;
}

// RegionSizeCalculator.java
public long getRegionSize(byte[] regionId) {
  Long size = sizeMap.get(regionId);
  if (size == null) {
    LOG.debug("Unknown region:" + Arrays.toString(regionId));
    return 0;
  } else {
    return size;
  }
}
{code}
The TableSplit length comes from RegionSizeCalculator.getRegionSize. The job hangs because, in MR1, if the zero split input length map tasks are scheduled and completed before any non-zero split input length map tasks are scheduled, scheduling a new map task in JobInProgress.java fails the TaskTracker resources check:
{code}
// findNewMapTask
// Check to ensure this TaskTracker has enough resources to
// run tasks from this job
long outSize = resourceEstimator.getEstimatedMapOutputSize();
long availSpace = tts.getResourceStatus().getAvailableSpace();
if(availSpace < outSize) {
  LOG.warn("No room for map task. Node " + tts.getHost() +
           " has " + availSpace +
           " bytes free; but we expect map to take " + outSize);
  return -1; //see if a different TIP might work better.
}
{code}
The resource calculation is:
{code}
// in ResourceEstimator.java
protected synchronized long getEstimatedTotalMapOutputSize() {
  if(completedMapsUpdates < threshholdToUse) {
    return 0;
  } else {
    long inputSize = job.getInputLength() + job.desiredMaps();
    //add desiredMaps() so that randomwriter case doesn't blow up
    //the multiplication might lead to overflow, casting it with
    //double prevents it
    long estimate = Math.round(((double)inputSize *
        completedMapsOutputSize * 2.0)/completedMapsInputSize);
    if (LOG.isDebugEnabled()) {
      LOG.debug("estimate total map output will be " + estimate);
    }
    return estimate;
  }
}

protected synchronized void updateWithCompletedTask(TaskStatus ts,
    TaskInProgress tip) {
  //-1 indicates error, which we don't average in.
  if(tip.isMapTask() && ts.getOutputSize() != -1) {
    completedMapsUpdates++;
    completedMapsInputSize+=(tip.getMapInputSize()+1);
    completedMapsOutputSize+=ts.getOutputSize();
    if(LOG.isDebugEnabled()) {
      LOG.debug("completedMapsUpdates:"+completedMapsUpdates+" "+
          "completedMapsInputSize:"+completedMapsInputSize+" "+
          "completedMapsOutputSize:"+completedMapsOutputSize);
    }
  }
}
{code}
In this calculation, completedMapsInputSize will be a very small number while inputSize * completedMapsOutputSize will be a very big number. For example, with completedMapsInputSize = 1, inputSize = 100 MBytes, and completedMapsOutputSize = 100 MBytes, the estimate will be 5000TB, which is more than most TaskTrackers' disk space. So if the map split input length is 0, the split input length is unknown, and it is reasonable to use the map output size as the input size in the ResourceEstimator calculation. I will upload a fix based on this approach.
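The blow-up in the calculation above can be reproduced with the same arithmetic as {{getEstimatedTotalMapOutputSize}}. The numbers below are illustrative round figures, not cluster measurements:

```java
public class EstimatorBlowUp {

    // Mirrors the arithmetic in ResourceEstimator#getEstimatedTotalMapOutputSize.
    static long estimate(long inputSize, long completedMapsOutputSize,
                         long completedMapsInputSize) {
        return Math.round(((double) inputSize * completedMapsOutputSize * 2.0)
                / completedMapsInputSize);
    }

    public static void main(String[] args) {
        // A completed zero-length split contributes getMapInputSize()+1 = 1
        // to completedMapsInputSize, so the divisor can stay tiny.
        long inputSize = 100000000L;              // ~100 MB of total job input
        long completedMapsOutputSize = 100000000L; // ~100 MB of completed map output
        long completedMapsInputSize = 1L;          // only zero-length splits finished

        long est = estimate(inputSize, completedMapsOutputSize,
                completedMapsInputSize);
        System.out.println(est); // 2.0e16 bytes -- far beyond any TaskTracker disk
    }
}
```

With a sane divisor (say the full 100 MB of completed input) the same formula yields about 200 MB, which is why the bug only bites when zero-length splits finish first.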
[jira] [Commented] (MAPREDUCE-6076) Zero map split input length combine with none zero map split input length will cause MR1 job hung.
[ https://issues.apache.org/jira/browse/MAPREDUCE-6076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14391777#comment-14391777 ] Robert Kanter commented on MAPREDUCE-6076: -- +1
[jira] [Commented] (MAPREDUCE-6288) mapred job -status fails with AccessControlException
[ https://issues.apache.org/jira/browse/MAPREDUCE-6288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14387040#comment-14387040 ] Robert Kanter commented on MAPREDUCE-6288: -- Thanks everyone.

mapred job -status fails with AccessControlException

Key: MAPREDUCE-6288 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6288 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 2.7.0 Reporter: Robert Kanter Assignee: Robert Kanter Priority: Blocker Fix For: 2.7.0 Attachments: MAPREDUCE-6288-gera-001.patch, MAPREDUCE-6288.002.patch, MAPREDUCE-6288.patch

After MAPREDUCE-5875, we're seeing this Exception when trying to do {{mapred job -status job_1427080398288_0001}}:
{noformat}
Exception in thread "main" org.apache.hadoop.security.AccessControlException: Permission denied: user=jenkins, access=EXECUTE, inode="/user/history/done":mapred:hadoop:drwxrwx---
    at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkFsPermission(DefaultAuthorizationProvider.java:257)
    at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.check(DefaultAuthorizationProvider.java:238)
    at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkTraverse(DefaultAuthorizationProvider.java:180)
    at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkPermission(DefaultAuthorizationProvider.java:137)
    at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:138)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:6553)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:6535)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPathAccess(FSNamesystem.java:6460)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsUpdateTimes(FSNamesystem.java:1919)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1870)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1850)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1822)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:545)
    at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.getBlockLocations(AuthorizationProviderProxyClientProtocol.java:87)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:363)
    at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1060)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2040)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2038)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
    at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
    at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:73)
    at org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:1213)
    at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1201)
    at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1191)
    at org.apache.hadoop.hdfs.DFSInputStream.fetchLocatedBlocksAndGetLastBlockLength(DFSInputStream.java:299)
    at org.apache.hadoop.hdfs.DFSInputStream.openInfo(DFSInputStream.java:265)
    at org.apache.hadoop.hdfs.DFSInputStream.<init>(DFSInputStream.java:257)
    at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:1490)
    at
{noformat}
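The EXECUTE failure at {{checkTraverse}} in the trace above can be modeled with a simplified sketch of per-ancestor traversal checking. This is illustrative only, not the HDFS FSPermissionChecker implementation, and the paths/modes are taken from the report:

```java
import java.util.HashMap;
import java.util.Map;

public class TraverseCheck {

    // Simplified model: every ancestor directory of the target must grant
    // EXECUTE to the caller (evaluated here as the "other" class, since
    // user jenkins is neither owner mapred nor in group hadoop).
    // Returns the first ancestor that denies traversal, or null if allowed.
    static String deniedAt(Map<String, Integer> dirModes, String path) {
        String[] parts = path.split("/");
        StringBuilder cur = new StringBuilder();
        for (int i = 1; i < parts.length - 1; i++) { // ancestors only
            cur.append("/").append(parts[i]);
            int mode = dirModes.getOrDefault(cur.toString(), 0755);
            if ((mode & 01) == 0) {                  // other-execute missing
                return cur.toString();               // -> AccessControlException
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Map<String, Integer> modes = new HashMap<>();
        modes.put("/user/history/done", 0770);       // drwxrwx--- from the report
        // Traversal toward a history file stops at the 770 directory.
        System.out.println(deniedAt(modes, "/user/history/done/2015/job.jhist"));
    }
}
```

Flipping the directory to 0771 makes {{deniedAt}} return null for the same path, which is exactly the behavior change the patch proposes.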
[jira] [Commented] (MAPREDUCE-6288) mapred job -status fails with AccessControlException
[ https://issues.apache.org/jira/browse/MAPREDUCE-6288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14384673#comment-14384673 ] Robert Kanter commented on MAPREDUCE-6288: -- I'll post an updated patch later today that makes the JHS update the permissions on startup.
[jira] [Commented] (MAPREDUCE-6288) mapred job -status fails with AccessControlException
[ https://issues.apache.org/jira/browse/MAPREDUCE-6288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14384925#comment-14384925 ] Robert Kanter commented on MAPREDUCE-6288: -- What if we do this fix for now and add a followup JIRA to add a cache? This shouldn't be that huge of a performance hit.
[jira] [Updated] (MAPREDUCE-6288) mapred job -status fails with AccessControlException
[ https://issues.apache.org/jira/browse/MAPREDUCE-6288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Kanter updated MAPREDUCE-6288: - Attachment: MAPREDUCE-6288.002.patch The MAPREDUCE-6288.002.patch builds on the previous MAPREDUCE-6288.patch I originally made. Besides using 771 instead of 770, it also makes the JHS check and correct the permissions on startup. I added a unit test and also verified on a cluster.
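A sketch of the startup-correction idea from the 002 patch, shown with java.nio against a local POSIX filesystem rather than Hadoop's FileSystem API (the class and method names here are hypothetical):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.PosixFilePermission;
import java.nio.file.attribute.PosixFilePermissions;
import java.util.HashSet;
import java.util.Set;

public class FixDonePermissions {

    // Widen a 770-style directory to 771 so other users can traverse it
    // (pass through on the way to files they own) without being able to
    // read/list it. Idempotent: only rewrites permissions when needed.
    static void ensureOthersCanTraverse(Path dir) throws IOException {
        Set<PosixFilePermission> perms =
                new HashSet<>(Files.getPosixFilePermissions(dir));
        if (perms.add(PosixFilePermission.OTHERS_EXECUTE)) {
            Files.setPosixFilePermissions(dir, perms);
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("done");
        Files.setPosixFilePermissions(dir,
                PosixFilePermissions.fromString("rwxrwx---")); // 770, as in the bug
        ensureOthersCanTraverse(dir);                          // now 771
        System.out.println(PosixFilePermissions.toString(
                Files.getPosixFilePermissions(dir)));          // rwxrwx--x
    }
}
```

Running a pass like this over the existing done-dir tree at JHS startup is what lets the fix work on existing clusters without manual admin intervention.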
[jira] [Commented] (MAPREDUCE-6288) mapred job -status fails with AccessControlException
[ https://issues.apache.org/jira/browse/MAPREDUCE-6288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14382626#comment-14382626 ] Robert Kanter commented on MAPREDUCE-6288: -- Exactly. That's what my original patch does. It doesn't change the permissions of the jobconf file itself. It only adds executable permissions to the parent directories. Users will still be prevented from listing the directories or other operations they shouldn't be doing. It simply allows users who own files in that hierarchy the ability to access those files, which they previously couldn't (which in hindsight, seems incorrect regardless). {quote}I quickly checked, and there seems be another not-so-minor problem of this patch not working for existing clusters without needing admins to explicitly go and change the permissions on the directory. History server doesn't seem to correct the permissions if the directory already exists.{quote} It should be pretty easy to make the JHS correct the permissions on startup. 
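The claim above, that adding only executable permission still prevents listing, falls straight out of the permission bits: other-execute grants traversal while listing a directory requires other-read. A simplified model, not HDFS code:

```java
public class DirAccessBits {

    // Octal permission bits for the "other" class (illustrative model of the
    // checks in HDFS's permission checker).
    static final int OTHER_READ = 04;
    static final int OTHER_EXEC = 01;

    static boolean othersCanTraverse(int mode) { return (mode & OTHER_EXEC) != 0; }
    static boolean othersCanList(int mode)     { return (mode & OTHER_READ) != 0; }

    public static void main(String[] args) {
        int done770 = 0770; // the mode that breaks `mapred job -status`
        int done771 = 0771; // the mode proposed in the patch
        System.out.println(othersCanTraverse(done770)); // false -> AccessControlException
        System.out.println(othersCanTraverse(done771)); // true  -> can pass through
        System.out.println(othersCanList(done771));     // false -> still cannot list
    }
}
```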
[jira] [Commented] (MAPREDUCE-6288) mapred job -status fails with AccessControlException
[ https://issues.apache.org/jira/browse/MAPREDUCE-6288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14380601#comment-14380601 ] Robert Kanter commented on MAPREDUCE-6288: -- [~kasha] is correct; executable permissions alone don't let you list the contents of a directory. mapred job -status fails with AccessControlException - Key: MAPREDUCE-6288 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6288 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 2.7.0 Reporter: Robert Kanter Assignee: Robert Kanter Priority: Blocker Attachments: MAPREDUCE-6288-gera-001.patch, MAPREDUCE-6288.patch After MAPREDUCE-5875, we're seeing this Exception when trying to do {{mapred job -status job_1427080398288_0001}} {noformat} Exception in thread main org.apache.hadoop.security.AccessControlException: Permission denied: user=jenkins, access=EXECUTE, inode=/user/history/done:mapred:hadoop:drwxrwx--- at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkFsPermission(DefaultAuthorizationProvider.java:257) at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.check(DefaultAuthorizationProvider.java:238) at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkTraverse(DefaultAuthorizationProvider.java:180) at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkPermission(DefaultAuthorizationProvider.java:137) at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:138) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:6553) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:6535) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPathAccess(FSNamesystem.java:6460) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsUpdateTimes(FSNamesystem.java:1919) at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1870) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1850) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1822) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:545) at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.getBlockLocations(AuthorizationProviderProxyClientProtocol.java:87) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:363) at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1060) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2040) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2038) at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:526) at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106) at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:73) at org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:1213) 
at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1201) at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1191) at org.apache.hadoop.hdfs.DFSInputStream.fetchLocatedBlocksAndGetLastBlockLength(DFSInputStream.java:299) at org.apache.hadoop.hdfs.DFSInputStream.openInfo(DFSInputStream.java:265) at org.apache.hadoop.hdfs.DFSInputStream.init(DFSInputStream.java:257) at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:1490) at
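The distinction drawn in the comment above can be reproduced locally with plain POSIX directory permissions. The following is a hypothetical stdlib sketch (not Hadoop code; all names are invented): execute on a directory only permits traversing to a child whose name you already know, while read is what permits listing. Note the listing check is bypassed when running as root.

```java
import java.io.IOException;
import java.nio.file.*;
import java.nio.file.attribute.PosixFilePermissions;

public class DirPermDemo {
    // Summarizes what execute-only (--x) on a directory allows on a POSIX filesystem.
    static String run() throws IOException {
        Path dir = Files.createTempDirectory("permdemo");
        Path child = Files.createFile(dir.resolve("job_conf.xml"));
        // Owner gets execute but not read, analogous to the world bits in 771 vs 770.
        Files.setPosixFilePermissions(dir, PosixFilePermissions.fromString("--x------"));

        // Traversal: reaching a child by its full, known path still works.
        boolean canTraverse = Files.isReadable(child);

        // Listing: enumerating the directory's contents needs read permission.
        boolean canList;
        try (DirectoryStream<Path> s = Files.newDirectoryStream(dir)) {
            s.iterator().hasNext();
            canList = true;
        } catch (AccessDeniedException e) {
            canList = false;
        }

        // Restore permissions so cleanup can delete the entries.
        Files.setPosixFilePermissions(dir, PosixFilePermissions.fromString("rwx------"));
        Files.delete(child);
        Files.delete(dir);
        return "traverse=" + canTraverse + " list=" + canList;
    }

    public static void main(String[] args) throws IOException {
        // As a regular user this reports traverse=true, list=false;
        // root bypasses the permission check, so no output is hard-coded here.
        System.out.println(run());
    }
}
```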
[jira] [Commented] (MAPREDUCE-5875) Make Counter limits consistent across JobClient, MRAppMaster, and YarnChild
[ https://issues.apache.org/jira/browse/MAPREDUCE-5875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14376897#comment-14376897 ] Robert Kanter commented on MAPREDUCE-5875: -- It looks like this patch breaks {{mapred job -status job-id}}. See MAPREDUCE-6288. Make Counter limits consistent across JobClient, MRAppMaster, and YarnChild --- Key: MAPREDUCE-5875 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5875 Project: Hadoop Map/Reduce Issue Type: Bug Components: applicationmaster, client, task Affects Versions: 2.4.0 Reporter: Gera Shegalov Assignee: Gera Shegalov Fix For: 2.7.0 Attachments: MAPREDUCE-5875.v01.patch, MAPREDUCE-5875.v02.patch, MAPREDUCE-5875.v03.patch, MAPREDUCE-5875.v04.patch, MAPREDUCE-5875.v05.patch, MAPREDUCE-5875.v06.patch, MAPREDUCE-5875.v07.patch, MAPREDUCE-5875.v08.patch, MAPREDUCE-5875.v09.patch Currently, counter limits mapreduce.job.counters.* handled by {{org.apache.hadoop.mapreduce.counters.Limits}} are initialized asymmetrically: on the client side, and on the AM, job.xml is ignored whereas it's taken into account in YarnChild. It would be good to make the Limits job-configurable, such that max counters/groups is only increased when needed. With the current Limits implementation relying on static constants, it's going to be challenging for tools that submit jobs concurrently without resorting to class loading isolation. The patch that I am uploading is not perfect but demonstrates the issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
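For context on why statics bite tools that submit jobs concurrently, here is a hypothetical, stripped-down analogue of the first-caller-wins initialization described above (the class and field names are invented; the real class is {{org.apache.hadoop.mapreduce.counters.Limits}}):

```java
// Hypothetical sketch: a JVM-wide static limit initialized once, mirroring
// the effect of static constants in a counter-limits class.
class CounterLimits {
    static int maxCounters = -1;

    static void init(int configured) {
        if (maxCounters < 0) {   // first caller wins; later configs are ignored
            maxCounters = configured;
        }
    }
}

public class StaticLimitsDemo {
    public static void main(String[] args) {
        CounterLimits.init(120);  // job A submitted with the default limit
        CounterLimits.init(500);  // job B raises the limit -- silently ignored
        System.out.println(CounterLimits.maxCounters);  // prints 120; job B still sees 120
    }
}
```

This is why the JIRA proposes making the limits job-configurable rather than process-wide: without that, the only workaround for concurrent submitters is class-loading isolation.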
[jira] [Updated] (MAPREDUCE-6288) mapred job -status fails with AccessControlException
[ https://issues.apache.org/jira/browse/MAPREDUCE-6288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Kanter updated MAPREDUCE-6288: - Attachment: MAPREDUCE-6288.patch The patch simply changes the permissions for the directories to have world executable (771 instead of 770). I think writing a unit test for this issue might be tricky, but I did verify in a cluster that it works correctly. mapred job -status fails with AccessControlException - Key: MAPREDUCE-6288 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6288 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 2.7.0 Reporter: Robert Kanter Assignee: Robert Kanter Priority: Blocker Attachments: MAPREDUCE-6288.patch After MAPREDUCE-5875, we're seeing this Exception when trying to do {{mapred job -status job_1427080398288_0001}} {noformat} Exception in thread main org.apache.hadoop.security.AccessControlException: Permission denied: user=jenkins, access=EXECUTE, inode=/user/history/done:mapred:hadoop:drwxrwx--- at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkFsPermission(DefaultAuthorizationProvider.java:257) at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.check(DefaultAuthorizationProvider.java:238) at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkTraverse(DefaultAuthorizationProvider.java:180) at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkPermission(DefaultAuthorizationProvider.java:137) at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:138) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:6553) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:6535) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPathAccess(FSNamesystem.java:6460) at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsUpdateTimes(FSNamesystem.java:1919) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1870) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1850) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1822) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:545) at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.getBlockLocations(AuthorizationProviderProxyClientProtocol.java:87) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:363) at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1060) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2040) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2038) at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:526) at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106) at 
org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:73) at org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:1213) at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1201) at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1191) at org.apache.hadoop.hdfs.DFSInputStream.fetchLocatedBlocksAndGetLastBlockLength(DFSInputStream.java:299) at org.apache.hadoop.hdfs.DFSInputStream.openInfo(DFSInputStream.java:265) at org.apache.hadoop.hdfs.DFSInputStream.init(DFSInputStream.java:257) at
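To make the one-bit difference in the patch concrete, here is a small hypothetical helper (not part of the patch) that renders octal modes symbolically: 771 adds only the world execute (traverse) bit over 770, so other users can reach a conf file by its full path without being able to list the history directories.

```java
public class ModeDemo {
    // Renders a 9-bit octal mode as the familiar rwx triplet string.
    static String symbolic(int mode) {
        StringBuilder sb = new StringBuilder();
        String bits = "rwx";
        for (int shift = 8; shift >= 0; shift--) {
            boolean set = ((mode >> shift) & 1) != 0;
            sb.append(set ? bits.charAt((8 - shift) % 3) : '-');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println("0770 -> " + symbolic(0770));  // prints "0770 -> rwxrwx---"
        System.out.println("0771 -> " + symbolic(0771));  // prints "0771 -> rwxrwx--x"
    }
}
```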
[jira] [Updated] (MAPREDUCE-6288) mapred job -status fails with AccessControlException
[ https://issues.apache.org/jira/browse/MAPREDUCE-6288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Kanter updated MAPREDUCE-6288: - Status: Patch Available (was: Open) mapred job -status fails with AccessControlException - Key: MAPREDUCE-6288 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6288 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 2.7.0 Reporter: Robert Kanter Assignee: Robert Kanter Priority: Blocker Attachments: MAPREDUCE-6288.patch After MAPREDUCE-5875, we're seeing this Exception when trying to do {{mapred job -status job_1427080398288_0001}} {noformat} Exception in thread main org.apache.hadoop.security.AccessControlException: Permission denied: user=jenkins, access=EXECUTE, inode=/user/history/done:mapred:hadoop:drwxrwx--- at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkFsPermission(DefaultAuthorizationProvider.java:257) at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.check(DefaultAuthorizationProvider.java:238) at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkTraverse(DefaultAuthorizationProvider.java:180) at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkPermission(DefaultAuthorizationProvider.java:137) at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:138) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:6553) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:6535) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPathAccess(FSNamesystem.java:6460) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsUpdateTimes(FSNamesystem.java:1919) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1870) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1850) at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1822) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:545) at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.getBlockLocations(AuthorizationProviderProxyClientProtocol.java:87) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:363) at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1060) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2040) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2038) at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:526) at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106) at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:73) at org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:1213) at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1201) at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1191) at 
org.apache.hadoop.hdfs.DFSInputStream.fetchLocatedBlocksAndGetLastBlockLength(DFSInputStream.java:299) at org.apache.hadoop.hdfs.DFSInputStream.openInfo(DFSInputStream.java:265) at org.apache.hadoop.hdfs.DFSInputStream.init(DFSInputStream.java:257) at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:1490) at org.apache.hadoop.hdfs.DistributedFileSystem$3.doCall(DistributedFileSystem.java:302) at org.apache.hadoop.hdfs.DistributedFileSystem$3.doCall(DistributedFileSystem.java:298)
[jira] [Commented] (MAPREDUCE-6288) mapred job -status fails with AccessControlException
[ https://issues.apache.org/jira/browse/MAPREDUCE-6288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14376896#comment-14376896 ] Robert Kanter commented on MAPREDUCE-6288: -- Looks like it's this part of the patch from MAPREDUCE-5875:
{code:java}
--- a/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/Cluster.java
+++ b/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/Cluster.java
@@ -182,15 +182,15 @@ public FileSystem run() throws IOException, InterruptedException {
   public Job getJob(JobID jobId) throws IOException, InterruptedException {
     JobStatus status = client.getJobStatus(jobId);
     if (status != null) {
-      JobConf conf;
+      final JobConf conf = new JobConf();
+      final Path jobPath = new Path(client.getFilesystemName(),
+          status.getJobFile());
+      final FileSystem fs = FileSystem.get(jobPath.toUri(), getConf());
       try {
-        conf = new JobConf(status.getJobFile());
-      } catch (RuntimeException ex) {
-        // If job file doesn't exist it means we can't find the job
-        if (ex.getCause() instanceof FileNotFoundException) {
-          return null;
-        } else {
-          throw ex;
+        conf.addResource(fs.open(jobPath), jobPath.toString());
+      } catch (FileNotFoundException fnf) {
+        if (LOG.isWarnEnabled()) {
+          LOG.warn("Job conf missing on cluster", fnf);
         }
       }
       return Job.getInstance(this, status, conf);
{code}
The old code, {{new JobConf(status.getJobFile())}}, seems like it wasn't working correctly in that it never actually loaded the job conf file from HDFS, which would have triggered this same Exception. The new code loads it correctly, and gets the Exception because the parent directories of {{/user/history/done/2015/03/22/00/job_1427080398288_0001_conf.xml}} need to be executable by the user. Changing the permissions of the parents to have world +x confirms this.
mapred job -status fails with AccessControlException - Key: MAPREDUCE-6288 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6288 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 2.7.0 Reporter: Robert Kanter Assignee: Robert Kanter Priority: Blocker After MAPREDUCE-5875, we're seeing this Exception when trying to do {{mapred job -status job_1427080398288_0001}} {noformat} Exception in thread main org.apache.hadoop.security.AccessControlException: Permission denied: user=jenkins, access=EXECUTE, inode=/user/history/done:mapred:hadoop:drwxrwx--- at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkFsPermission(DefaultAuthorizationProvider.java:257) at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.check(DefaultAuthorizationProvider.java:238) at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkTraverse(DefaultAuthorizationProvider.java:180) at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkPermission(DefaultAuthorizationProvider.java:137) at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:138) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:6553) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:6535) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPathAccess(FSNamesystem.java:6460) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsUpdateTimes(FSNamesystem.java:1919) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1870) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1850) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1822) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:545) at 
org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.getBlockLocations(AuthorizationProviderProxyClientProtocol.java:87) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:363) at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619) at
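The error-handling shift in the diff above -- explicitly opening the conf file and downgrading a missing file to a warning, instead of letting a constructor swallow the load -- can be sketched with a stdlib stand-in (hypothetical names, with {{java.util.Properties}} in place of {{JobConf}}):

```java
import java.io.*;
import java.util.Properties;

public class LoadConfDemo {
    // Hypothetical stand-in for the new Cluster#getJob flow: open the conf
    // file explicitly so a missing file is a visible, recoverable condition.
    static Properties load(File f) {
        Properties props = new Properties();
        try (InputStream in = new FileInputStream(f)) {
            props.load(in);
        } catch (FileNotFoundException fnf) {
            // Mirrors LOG.warn("Job conf missing on cluster", fnf) in the patch.
            System.out.println("WARN: conf missing: " + f.getName());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        Properties p = load(new File("job_1427080398288_0001_conf.xml"));
        System.out.println("props=" + p.size());  // "props=0" when the file is absent
    }
}
```

The key point for this JIRA is that actually opening the stream is what forces the filesystem permission check on every parent directory of the conf path, which the old constructor-based code never triggered.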
[jira] [Created] (MAPREDUCE-6288) mapred job -status fails with AccessControlException
Robert Kanter created MAPREDUCE-6288: Summary: mapred job -status fails with AccessControlException Key: MAPREDUCE-6288 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6288 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 2.7.0 Reporter: Robert Kanter Assignee: Robert Kanter Priority: Blocker After MAPREDUCE-5875, we're seeing this Exception when trying to do {{mapred job -status job_1427080398288_0001}} {noformat} Exception in thread main org.apache.hadoop.security.AccessControlException: Permission denied: user=jenkins, access=EXECUTE, inode=/user/history/done:mapred:hadoop:drwxrwx--- at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkFsPermission(DefaultAuthorizationProvider.java:257) at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.check(DefaultAuthorizationProvider.java:238) at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkTraverse(DefaultAuthorizationProvider.java:180) at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkPermission(DefaultAuthorizationProvider.java:137) at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:138) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:6553) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:6535) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPathAccess(FSNamesystem.java:6460) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsUpdateTimes(FSNamesystem.java:1919) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1870) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1850) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1822) at 
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:545) at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.getBlockLocations(AuthorizationProviderProxyClientProtocol.java:87) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:363) at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1060) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2040) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2038) at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:526) at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106) at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:73) at org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:1213) at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1201) at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1191) at org.apache.hadoop.hdfs.DFSInputStream.fetchLocatedBlocksAndGetLastBlockLength(DFSInputStream.java:299) at 
org.apache.hadoop.hdfs.DFSInputStream.openInfo(DFSInputStream.java:265) at org.apache.hadoop.hdfs.DFSInputStream.init(DFSInputStream.java:257) at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:1490) at org.apache.hadoop.hdfs.DistributedFileSystem$3.doCall(DistributedFileSystem.java:302) at org.apache.hadoop.hdfs.DistributedFileSystem$3.doCall(DistributedFileSystem.java:298) at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:298) at
[jira] [Commented] (MAPREDUCE-6288) mapred job -status fails with AccessControlException
[ https://issues.apache.org/jira/browse/MAPREDUCE-6288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14377017#comment-14377017 ] Robert Kanter commented on MAPREDUCE-6288: -- [~jira.shegalov], can you take a look? mapred job -status fails with AccessControlException - Key: MAPREDUCE-6288 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6288 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 2.7.0 Reporter: Robert Kanter Assignee: Robert Kanter Priority: Blocker Attachments: MAPREDUCE-6288.patch After MAPREDUCE-5875, we're seeing this Exception when trying to do {{mapred job -status job_1427080398288_0001}} {noformat} Exception in thread main org.apache.hadoop.security.AccessControlException: Permission denied: user=jenkins, access=EXECUTE, inode=/user/history/done:mapred:hadoop:drwxrwx--- at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkFsPermission(DefaultAuthorizationProvider.java:257) at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.check(DefaultAuthorizationProvider.java:238) at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkTraverse(DefaultAuthorizationProvider.java:180) at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkPermission(DefaultAuthorizationProvider.java:137) at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:138) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:6553) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:6535) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPathAccess(FSNamesystem.java:6460) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsUpdateTimes(FSNamesystem.java:1919) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1870) at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1850) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1822) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:545) at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.getBlockLocations(AuthorizationProviderProxyClientProtocol.java:87) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:363) at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1060) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2040) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2038) at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:526) at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106) at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:73) at org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:1213) at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1201) at 
org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1191) at org.apache.hadoop.hdfs.DFSInputStream.fetchLocatedBlocksAndGetLastBlockLength(DFSInputStream.java:299) at org.apache.hadoop.hdfs.DFSInputStream.openInfo(DFSInputStream.java:265) at org.apache.hadoop.hdfs.DFSInputStream.init(DFSInputStream.java:257) at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:1490) at org.apache.hadoop.hdfs.DistributedFileSystem$3.doCall(DistributedFileSystem.java:302) at
[jira] [Commented] (MAPREDUCE-6282) Reuse historyFileAbsolute.getFileSystem in CompletedJob#loadFullHistoryData for code optimization.
[ https://issues.apache.org/jira/browse/MAPREDUCE-6282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14371987#comment-14371987 ] Robert Kanter commented on MAPREDUCE-6282: -- +1 Reuse historyFileAbsolute.getFileSystem in CompletedJob#loadFullHistoryData for code optimization. -- Key: MAPREDUCE-6282 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6282 Project: Hadoop Map/Reduce Issue Type: Improvement Components: jobhistoryserver Reporter: zhihai xu Assignee: zhihai xu Priority: Trivial Labels: optimization Attachments: MAPREDUCE-6282.000.patch Reuse historyFileAbsolute.getFileSystem in CompletedJob#loadFullHistoryData for code optimization.
[jira] [Updated] (MAPREDUCE-6282) Reuse historyFileAbsolute.getFileSystem in CompletedJob#loadFullHistoryData for code optimization.
[ https://issues.apache.org/jira/browse/MAPREDUCE-6282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Kanter updated MAPREDUCE-6282: - Resolution: Fixed Fix Version/s: 2.8.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Thanks Zhihai. Committed to trunk and branch-2! Reuse historyFileAbsolute.getFileSystem in CompletedJob#loadFullHistoryData for code optimization. -- Key: MAPREDUCE-6282 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6282 Project: Hadoop Map/Reduce Issue Type: Improvement Components: jobhistoryserver Reporter: zhihai xu Assignee: zhihai xu Priority: Trivial Labels: optimization Fix For: 2.8.0 Attachments: MAPREDUCE-6282.000.patch Reuse historyFileAbsolute.getFileSystem in CompletedJob#loadFullHistoryData for code optimization.