[jira] [Commented] (MAPREDUCE-4374) Fix child task environment variable config and add support for Windows
[ https://issues.apache.org/jira/browse/MAPREDUCE-4374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13427851#comment-13427851 ] Bikas Saha commented on MAPREDUCE-4374: --- bq. Notice this is the tmp directory. On Linux /tmp exists by default, which is not the case on Windows. I think this is not limited to /tmp. It could be any directory set in the config. By the way, in the if condition, if tmpDir exists then the check for it being a directory is skipped, I think. bq. Let’s not push this to Shell as I think the code is simple enough to be understood and we can work towards a better abstraction for this (handling ‘set’ vs ‘export’) in the future. I see what the issue is. At the same time, this code is duplicated in 4 different places, so IMO it makes sense to put it in a helper method. When we make a better abstraction, we will know the single place to look for the old code instead of multiple places. Thoughts? Fix child task environment variable config and add support for Windows -- Key: MAPREDUCE-4374 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4374 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 1-win Reporter: Chuan Liu Assignee: Chuan Liu Priority: Minor Attachments: MAPREDUCE-4374-branch-1-win-2.patch, MAPREDUCE-4374-branch-1-win.patch In HADOOP-2838, a new feature was introduced to set environment variables for child tasks via the Hadoop config 'mapred.child.env'. There have been some further fixes and improvements around this feature, e.g. HADOOP-5981 was a bug fix, and MAPREDUCE-478 broke the config into 'mapred.map.child.env' and 'mapred.reduce.child.env'. However, the current implementation is still not complete. I believe it does not match its documentation or original intent.
Also, by using ‘:’ (colon) and ‘;’ (semicolon) in the configuration syntax, we will have problems on Windows, because ‘:’ appears very often in Windows paths, as in “C:\”, and environment variables are often used to hold path names. This JIRA was created to fix these problems and provide support for Windows. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
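A minimal Java sketch of the helper-method idea from the comment above, assuming a comma-separated NAME=VALUE syntax. The class and method names (ChildEnvSketch, envCommand, parseEnv, ensureDirectory) are hypothetical and not taken from the attached patches. It keeps the 'set' vs 'export' choice in one place, splits each pair only on the first '=' so a value like C:\tmp keeps its colon, and checks both that a configured directory exists and that it really is a directory.

```java
import java.io.File;
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper class; not the API introduced by MAPREDUCE-4374.
final class ChildEnvSketch {

  // One place that knows the shell syntax: cmd.exe uses "set", POSIX shells "export".
  static String envCommand(String name, String value, boolean windows) {
    return windows
        ? "set " + name + "=" + value
        : "export " + name + "=\"" + value + "\"";
  }

  // Assumed syntax: "NAME=VALUE" pairs separated by commas. Only the first '='
  // splits a pair, so colons and backslashes in Windows paths survive intact.
  static Map<String, String> parseEnv(String spec) {
    Map<String, String> env = new LinkedHashMap<>();
    for (String pair : spec.split(",")) {
      int eq = pair.indexOf('=');
      if (eq > 0) {
        env.put(pair.substring(0, eq).trim(), pair.substring(eq + 1).trim());
      }
    }
    return env;
  }

  // Any configured directory (not just /tmp) may be missing on Windows: create it
  // if absent, and reject a path that exists but is not a directory.
  static void ensureDirectory(File dir) throws IOException {
    if (dir.isDirectory()) {
      return;
    }
    if (dir.exists()) {
      throw new IOException(dir + " exists but is not a directory");
    }
    // mkdirs() can return false if another process created the directory first.
    if (!dir.mkdirs() && !dir.isDirectory()) {
      throw new IOException("Could not create " + dir);
    }
  }
}
```

Keeping all three concerns behind small static methods means a later, better abstraction only has to replace one class rather than four copies of the same logic.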
[jira] [Updated] (MAPREDUCE-4393) PaaS on YARN: an YARN application to demonstrate that YARN can be used as a PaaS
[ https://issues.apache.org/jira/browse/MAPREDUCE-4393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy updated MAPREDUCE-4393: - Status: Open (was: Patch Available) Jaigak - I spent some more time thinking about this in light of MAPREDUCE-4495. Unfortunately, it seems that we are running the risk of turning YARN into an 'umbrella' project by accepting applications built on top of YARN into the project itself... Essentially, as folks like Chris Mattman have pointed out in MAPREDUCE-4495, the PaaS prototype is better off being a standalone project in the Apache Incubator, since the Apache Software Foundation frowns upon one 'umbrella' project housing several smaller projects, i.e. YARN vis-a-vis PaaS, Workflow AM etc. If you are interested, I'm more than happy to help you through the Apache Incubator process, and we can collaborate via the Incubator. Do you mind doing that? Thanks! PaaS on YARN: an YARN application to demonstrate that YARN can be used as a PaaS Key: MAPREDUCE-4393 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4393 Project: Hadoop Map/Reduce Issue Type: Task Components: examples Affects Versions: 0.23.1 Reporter: Jaigak Song Assignee: Jaigak Song Fix For: 3.0.0 Attachments: HADOOPasPAAS_Architecture.pdf, MAPREDUCE-4393.patch, MAPREDUCE-4393.patch, MAPREDUCE-4393.patch, MAPREDUCE4393.patch, MAPREDUCE4393.patch Original Estimate: 336h Time Spent: 336h Remaining Estimate: 0h This application demonstrates that YARN can be used for non-MapReduce applications. As Hadoop has already been widely adopted and deployed, and its deployment is expected to grow considerably, we thought it has good potential to be used as a PaaS. I have implemented a proof of concept to demonstrate that YARN can be used as a PaaS (Platform as a Service). I have done a gap analysis against VMware's Cloud Foundry and tried to achieve as many PaaS functionalities as possible on YARN. I'd like to check in this POC as a YARN example application.
[jira] [Commented] (MAPREDUCE-4393) PaaS on YARN: an YARN application to demonstrate that YARN can be used as a PaaS
[ https://issues.apache.org/jira/browse/MAPREDUCE-4393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13427854#comment-13427854 ] Arun C Murthy commented on MAPREDUCE-4393: -- Here is more information about proposing this via the Incubator: http://incubator.apache.org/guides/proposal.html I do apologize for not seeing the danger of this (i.e. turning YARN into an umbrella project) earlier - I'm willing to make up for it by helping you through the Incubator. However, it is something the ASF cares deeply about and is something I have to follow as part of the responsibility of the Hadoop PMC. Again, apologies - but I do hope we can collaborate through the Incubator, and my offer of help stands. Thanks! PaaS on YARN: an YARN application to demonstrate that YARN can be used as a PaaS Key: MAPREDUCE-4393 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4393 Project: Hadoop Map/Reduce Issue Type: Task Components: examples Affects Versions: 0.23.1 Reporter: Jaigak Song Assignee: Jaigak Song Fix For: 3.0.0 Attachments: HADOOPasPAAS_Architecture.pdf, MAPREDUCE-4393.patch, MAPREDUCE-4393.patch, MAPREDUCE-4393.patch, MAPREDUCE4393.patch, MAPREDUCE4393.patch Original Estimate: 336h Time Spent: 336h Remaining Estimate: 0h This application demonstrates that YARN can be used for non-MapReduce applications. As Hadoop has already been widely adopted and deployed, and its deployment is expected to grow considerably, we thought it has good potential to be used as a PaaS. I have implemented a proof of concept to demonstrate that YARN can be used as a PaaS (Platform as a Service). I have done a gap analysis against VMware's Cloud Foundry and tried to achieve as many PaaS functionalities as possible on YARN. I'd like to check in this POC as a YARN example application.
[jira] [Commented] (MAPREDUCE-4495) Workflow Application Master in YARN
[ https://issues.apache.org/jira/browse/MAPREDUCE-4495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13427856#comment-13427856 ] Arun C Murthy commented on MAPREDUCE-4495: -- Santosh - I agree that PaaS and Workflow-AM are similar. I think both show that we could easily turn YARN into an umbrella project with a proliferation of YARN applications. Hence, I have reconsidered my opinion on MAPREDUCE-4393 and asked them to go the Incubator route too: http://s.apache.org/1K5 I look forward to collaborating on both projects in the Incubator. Thanks. Workflow Application Master in YARN --- Key: MAPREDUCE-4495 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4495 Project: Hadoop Map/Reduce Issue Type: New Feature Affects Versions: 2.0.0-alpha Reporter: Bo Wang Assignee: Bo Wang It is useful to have a workflow application master capable of running a DAG of jobs. The workflow client submits a DAG request to the AM, and the AM then manages the life cycle of this application: requesting the needed resources from the RM, and starting, monitoring and retrying the application's individual tasks. Compared to running Oozie with the current MapReduce Application Master, this has several advantages: - Fewer consumed resources, since only one application master will be spawned for the whole workflow. - Reuse of resources, since the same resources can be used by multiple consecutive jobs in the workflow (no need to request/wait for resources for every individual job from the central RM). - More optimization opportunities in terms of collective resource requests. - Optimization opportunities in terms of rewriting and composing jobs in the workflow (e.g. pushing down Mappers). - This Application Master can be reused/extended by higher-level systems like Pig and Hive to provide an optimized way of running their workflows.
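The description says the workflow AM should run a DAG of jobs. One plausible way such an AM could decide launch order is a topological sort (Kahn's algorithm), sketched below in Java. The class and method names are hypothetical, and the sketch ignores resource requests, monitoring and retries; a real AM could also launch all currently ready jobs in parallel instead of one at a time.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration; not code from the Workflow AM proposal.
final class WorkflowDagSketch {

  // deps maps each job to the jobs it must wait for. Returns an order in which
  // every job comes after all of its dependencies (Kahn's algorithm).
  static List<String> launchOrder(Map<String, List<String>> deps) {
    Map<String, Integer> pending = new LinkedHashMap<>();   // unfinished parents per job
    Map<String, List<String>> dependents = new HashMap<>(); // parent -> children
    for (Map.Entry<String, List<String>> e : deps.entrySet()) {
      pending.putIfAbsent(e.getKey(), 0);
      for (String parent : e.getValue()) {
        pending.putIfAbsent(parent, 0);
        pending.merge(e.getKey(), 1, Integer::sum);
        dependents.computeIfAbsent(parent, k -> new ArrayList<>()).add(e.getKey());
      }
    }
    // Jobs with no dependencies can start immediately.
    Deque<String> ready = new ArrayDeque<>();
    pending.forEach((job, n) -> {
      if (n == 0) {
        ready.add(job);
      }
    });
    List<String> order = new ArrayList<>();
    while (!ready.isEmpty()) {
      String job = ready.poll();
      order.add(job);  // every parent of this job is already in 'order'
      for (String child : dependents.getOrDefault(job, List.of())) {
        if (pending.merge(child, -1, Integer::sum) == 0) {
          ready.add(child);
        }
      }
    }
    if (order.size() != pending.size()) {
      throw new IllegalArgumentException("workflow contains a cycle");
    }
    return order;
  }
}
```

Rejecting cycles up front matters for a workflow AM: a cyclic "DAG" would otherwise leave jobs waiting forever on each other's completion.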
[jira] [Commented] (MAPREDUCE-4068) Jars in lib subdirectory of the submittable JAR are not added to the classpath
[ https://issues.apache.org/jira/browse/MAPREDUCE-4068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13427876#comment-13427876 ] Harsh J commented on MAPREDUCE-4068: This is a major regression if it's true. Are there no tests covering this unpacking feature? Jars in lib subdirectory of the submittable JAR are not added to the classpath -- Key: MAPREDUCE-4068 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4068 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2 Affects Versions: 0.23.1 Reporter: Ahmed Radwan Fix For: 0.23.2 Prior to Hadoop 0.23, users could add third-party jars to the lib subdirectory of the submitted job jar, and they would become available on the task's classpath. I see this functionality was in TaskRunner.java, but I can't find similar functionality in Hadoop 0.23 (neither in MapReduceChildJVM.java nor elsewhere).
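A hedged Java sketch of the behavior the description attributes to TaskRunner.java in pre-0.23 Hadoop: given the directory a job jar was unpacked into, put the directory itself and every .jar under lib/ on the task classpath. The class name is hypothetical, and the exact entries and their order in the real TaskRunner may differ.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration; not code from TaskRunner.java or MapReduceChildJVM.java.
final class JobJarClasspathSketch {

  // Returns classpath entries for an unpacked job jar: the root first,
  // then lib/*.jar in sorted order so the result is deterministic.
  static List<String> classpathFor(File unpackedJobJar) {
    List<String> entries = new ArrayList<>();
    entries.add(unpackedJobJar.getPath());
    File[] libs = new File(unpackedJobJar, "lib")
        .listFiles((dir, name) -> name.endsWith(".jar"));
    if (libs != null) {  // null when lib/ does not exist
      Arrays.sort(libs);
      for (File jar : libs) {
        entries.add(jar.getPath());
      }
    }
    return entries;
  }

  // Joins entries with the platform separator (':' on Unix, ';' on Windows).
  static String toClasspath(List<String> entries) {
    return String.join(File.pathSeparator, entries);
  }
}
```

A regression test for the feature, as Harsh asks about, could build a small unpacked jar layout like this and assert that every lib/*.jar appears on the computed classpath.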
[jira] [Commented] (MAPREDUCE-4501) couldn't compile hadoop-2.0 successfully because of errors in build files
[ https://issues.apache.org/jira/browse/MAPREDUCE-4501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13427902#comment-13427902 ] Yan Liu commented on MAPREDUCE-4501: This error appeared after MAPREDUCE-4438 was merged, in the pom.xml for hadoop-yarn-applications. It now works in the current trunk version. couldn't compile hadoop-2.0 successfully because of errors in build files - Key: MAPREDUCE-4501 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4501 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Yan Liu The version that hadoop-yarn-applications relies on is 2.0.1-SNAPSHOT; however, the commit changes it to 3.0.0-SNAPSHOT, which makes the compile fail.
[jira] [Created] (MAPREDUCE-4511) Add IFile readahead
Ahmed Radwan created MAPREDUCE-4511: --- Summary: Add IFile readahead Key: MAPREDUCE-4511 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4511 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv1, mrv2 Reporter: Ahmed Radwan Assignee: Ahmed Radwan This ticket is to add IFile readahead as part of HADOOP-7714.
[jira] [Commented] (MAPREDUCE-4511) Add IFile readahead
[ https://issues.apache.org/jira/browse/MAPREDUCE-4511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13427968#comment-13427968 ] Ahmed Radwan commented on MAPREDUCE-4511: - Here is the updated branch-1 patch based on Todd's HADOOP-7714 patches. Note that this patch requires the HADOOP-7754 patch. Add IFile readahead --- Key: MAPREDUCE-4511 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4511 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv1, mrv2 Reporter: Ahmed Radwan Assignee: Ahmed Radwan This ticket is to add IFile readahead as part of HADOOP-7714.
[jira] [Updated] (MAPREDUCE-4511) Add IFile readahead
[ https://issues.apache.org/jira/browse/MAPREDUCE-4511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmed Radwan updated MAPREDUCE-4511: Attachment: MAPREDUCE-4511_branch1.patch Add IFile readahead --- Key: MAPREDUCE-4511 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4511 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv1, mrv2 Reporter: Ahmed Radwan Assignee: Ahmed Radwan Attachments: MAPREDUCE-4511_branch1.patch This ticket is to add IFile readahead as part of HADOOP-7714.
[jira] [Updated] (MAPREDUCE-4275) Plugable process tree
[ https://issues.apache.org/jira/browse/MAPREDUCE-4275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Radim Kolar updated MAPREDUCE-4275: --- Attachment: plugable-pstree-3.txt Do not create the process tree instance from the resource calculator plugin; keep them separate. Plugable process tree - Key: MAPREDUCE-4275 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4275 Project: Hadoop Map/Reduce Issue Type: Improvement Components: nodemanager Affects Versions: 3.0.0 Environment: FreeBSD 64 bit Reporter: Radim Kolar Attachments: plugable-pstree-1.txt, plugable-pstree-2.txt, plugable-pstree-3.txt, plugable-pstree.txt Trunk version of Pluggable process tree. Work based on MAPREDUCE-4204
[jira] [Commented] (MAPREDUCE-4275) Plugable process tree
[ https://issues.apache.org/jira/browse/MAPREDUCE-4275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13427979#comment-13427979 ] Hadoop QA commented on MAPREDUCE-4275: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12539020/plugable-pstree-3.txt against trunk revision . -1 patch. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2702//console This message is automatically generated. Plugable process tree - Key: MAPREDUCE-4275 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4275 Project: Hadoop Map/Reduce Issue Type: Improvement Components: nodemanager Affects Versions: 3.0.0 Environment: FreeBSD 64 bit Reporter: Radim Kolar Attachments: plugable-pstree-1.txt, plugable-pstree-2.txt, plugable-pstree-3.txt, plugable-pstree.txt Trunk version of Pluggable process tree. Work based on MAPREDUCE-4204
[jira] [Commented] (MAPREDUCE-4431) killing already completed job gives ambiguous message as Killed job job id
[ https://issues.apache.org/jira/browse/MAPREDUCE-4431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13427989#comment-13427989 ] Devaraj K commented on MAPREDUCE-4431: -- Hi Harsh, can you take a look at the updated patch when you find some time? killing already completed job gives ambiguous message as Killed job job id -- Key: MAPREDUCE-4431 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4431 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2 Affects Versions: 2.0.0-alpha, 3.0.0 Reporter: Nishan Shetty Assignee: Devaraj K Priority: Minor Attachments: MAPREDUCE-4431-1.patch, MAPREDUCE-4431.patch If we try to kill an already completed job with the following command, it gives the ambiguous message "Killed job <job id>": ./mapred job -kill <already completed job id>
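A minimal Java sketch of the check this issue asks for: only report "Killed job" when a kill was actually issued, and otherwise say plainly that the job has already completed. SketchJobState and KillCommandSketch are hypothetical stand-ins whose state names loosely follow the usual MapReduce job states; this is not the mapred CLI's code, and the Runnable stands in for whatever really asks the cluster to kill the job.

```java
// Hypothetical stand-in for a job's state; not the Hadoop JobStatus API.
enum SketchJobState {
  PREP, RUNNING, SUCCEEDED, FAILED, KILLED;

  boolean isComplete() {
    return this == SUCCEEDED || this == FAILED || this == KILLED;
  }
}

final class KillCommandSketch {
  // killAction stands in for the call that actually kills the job.
  static String kill(String jobId, SketchJobState state, Runnable killAction) {
    if (state.isComplete()) {
      // Nothing was killed, so do not print "Killed job".
      return "Job " + jobId + " has already completed with state " + state
          + "; nothing to kill";
    }
    killAction.run();
    return "Killed job " + jobId;
  }
}
```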
[jira] [Commented] (MAPREDUCE-3193) FileInputFormat doesn't read files recursively in the input path dir
[ https://issues.apache.org/jira/browse/MAPREDUCE-3193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13427990#comment-13427990 ] Devaraj K commented on MAPREDUCE-3193: -- Hi Harsh, can you take a look at the updated patch when you find some time? FileInputFormat doesn't read files recursively in the input path dir Key: MAPREDUCE-3193 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3193 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv1, mrv2 Affects Versions: 1.0.2, 0.23.2, 2.0.0-alpha, 3.0.0 Reporter: Ramgopal N Assignee: Devaraj K Attachments: MAPREDUCE-3193-1.patch, MAPREDUCE-3193-2.patch, MAPREDUCE-3193-2.patch, MAPREDUCE-3193-3.patch, MAPREDUCE-3193.patch, MAPREDUCE-3193.security.patch A java.io.FileNotFoundException is thrown and the job fails if the input file is more than one folder level deep. Example: the input file is /r1/r2/input.txt
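To illustrate the expected behavior, the Java sketch below lists input files recursively, so that /r1/r2/input.txt is found when the input path is /r1. It uses java.nio on the local filesystem purely for illustration; FileInputFormat itself works against Hadoop's FileSystem API, and the class name here is hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical illustration using the local filesystem, not Hadoop's FileSystem.
final class RecursiveInputSketch {

  // Walks every level below inputDir and keeps only regular files, so nested
  // directories are descended into instead of being opened as input files.
  static List<Path> listInputFiles(Path inputDir) throws IOException {
    try (Stream<Path> paths = Files.walk(inputDir)) {
      return paths.filter(Files::isRegularFile)
          .sorted()
          .collect(Collectors.toList());
    }
  }
}
```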
[jira] [Commented] (MAPREDUCE-4507) IdentityMapper is being triggered when the type of the Input Key at class level and method level has a conflict
[ https://issues.apache.org/jira/browse/MAPREDUCE-4507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13427993#comment-13427993 ] Harsh J commented on MAPREDUCE-4507: The {{map()}} function must be properly overridden when using the new API. Using @Override annotations on map() (and for that matter, reduce() too) will help you catch your mistake here. As discussed on http://search-hadoop.com/m/hSxqz1vsQPc, this is a user-side mistake and in no way a bug. See http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapreduce/Mapper.html#map(KEYIN,%20VALUEIN,%20org.apache.hadoop.mapreduce.Mapper.Context). We can add a javadoc improvement (and a tutorial improvement) stating the right way to avoid this issue: always use @Override annotations when overriding methods (any IDE today provides support for this). IdentityMapper is being triggered when the type of the Input Key at class level and method level has a conflict --- Key: MAPREDUCE-4507 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4507 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv1 Affects Versions: 1.0.3 Environment: linux ubuntu Reporter: Bejoy KS If we use the default InputFormat (TextInputFormat) but specify the key type in the mapper as IntWritable instead of LongWritable, the framework is supposed to throw a ClassCastException. Such an exception is thrown only if the key types at class level and method level are the same (IntWritable). But if we declare the input key type as IntWritable at the class level and LongWritable at the method level (the map method), the code compiles fine instead of producing a compile-time error. In addition, on execution the framework triggers the IdentityMapper instead of the custom mapper provided in the configuration.
In this case, 'mapreduce.map.class' in job.xml still shows the custom mapper; it should show IdentityMapper whenever IdentityMapper is triggered, to avoid confusion and make debugging easier.
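The trap Harsh describes can be reproduced without Hadoop. In the Java sketch below, MiniMapper is a hypothetical stand-in for the new-API Mapper, whose default map() passes each record through unchanged. A map() whose key type differs from the class-level type parameter is an overload, not an override, so the default runs; adding @Override to that method would turn the silent bug into a compile error.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for org.apache.hadoop.mapreduce.Mapper: the default
// map() behaves like an identity mapper. Records are collected in a list.
class MiniMapper<KEYIN, VALUEIN> {
  final List<String> output = new ArrayList<>();

  protected void map(KEYIN key, VALUEIN value) {
    output.add("identity:" + key);
  }

  // The "framework" always calls map(KEYIN, VALUEIN).
  void run(KEYIN key, VALUEIN value) {
    map(key, value);
  }
}

// Class-level key type is Integer, method-level key type is Long: this
// compiles, but it is a new overload that run() never calls.
// Annotating it with @Override would fail to compile, exposing the mistake.
class MismatchedMapper extends MiniMapper<Integer, String> {
  protected void map(Long key, String value) {
    output.add("custom:" + key);
  }
}

// Matching types plus @Override: the compiler checks that this really overrides.
class FixedMapper extends MiniMapper<Integer, String> {
  @Override
  protected void map(Integer key, String value) {
    output.add("custom:" + key);
  }
}
```

Running MismatchedMapper records only "identity:" entries, while FixedMapper records "custom:" entries, which mirrors the IdentityMapper behavior reported in the issue.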
[jira] [Updated] (MAPREDUCE-4275) Plugable process tree
[ https://issues.apache.org/jira/browse/MAPREDUCE-4275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Radim Kolar updated MAPREDUCE-4275: --- Attachment: plugable-pstree-4.txt Check if ProcessTree is available before enabling monitoring. Plugable process tree - Key: MAPREDUCE-4275 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4275 Project: Hadoop Map/Reduce Issue Type: Improvement Components: nodemanager Affects Versions: 3.0.0 Environment: FreeBSD 64 bit Reporter: Radim Kolar Attachments: plugable-pstree-1.txt, plugable-pstree-2.txt, plugable-pstree-3.txt, plugable-pstree-4.txt, plugable-pstree.txt Trunk version of Pluggable process tree. Work based on MAPREDUCE-4204
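A minimal Java sketch of the two ideas in these patch comments: the process tree implementation is chosen by class name, kept separate from the resource calculator, and monitoring is enabled only if that implementation reports it is available on the current platform (for example, a /proc-based tree on a system without a Linux-style /proc). The interface, class and method names are hypothetical, not those in the plugable-pstree patches.

```java
import java.io.File;
import java.util.Optional;

// Hypothetical plugin contract for a process tree implementation.
interface ProcessTreeSketch {
  // Whether this implementation can work on the current machine.
  boolean isAvailable();

  // Total resident memory of the tree, in bytes.
  long getCumulativeRssBytes();
}

// Example implementation that depends on /proc; the memory figure is a placeholder.
final class ProcfsTreeSketch implements ProcessTreeSketch {
  @Override
  public boolean isAvailable() {
    return new File("/proc").isDirectory();
  }

  @Override
  public long getCumulativeRssBytes() {
    return 0L;  // placeholder: a real implementation would read per-process data under /proc
  }
}

final class ProcessTreeLoader {
  // Instantiates the configured class by name. An empty result means
  // "disable monitoring" rather than failing the NodeManager.
  static Optional<ProcessTreeSketch> load(String className) {
    try {
      Object o = Class.forName(className).getDeclaredConstructor().newInstance();
      if (!(o instanceof ProcessTreeSketch)) {
        return Optional.empty();
      }
      ProcessTreeSketch tree = (ProcessTreeSketch) o;
      return tree.isAvailable() ? Optional.of(tree) : Optional.empty();
    } catch (ReflectiveOperationException e) {
      return Optional.empty();
    }
  }
}
```

Returning an empty Optional instead of throwing keeps the availability check in one place, which is one way to avoid the kind of null dereference fixed in a later patch revision.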
[jira] [Commented] (MAPREDUCE-4275) Plugable process tree
[ https://issues.apache.org/jira/browse/MAPREDUCE-4275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13427998#comment-13427998 ] Hadoop QA commented on MAPREDUCE-4275: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12539022/plugable-pstree-4.txt against trunk revision . -1 patch. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2703//console This message is automatically generated. Plugable process tree - Key: MAPREDUCE-4275 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4275 Project: Hadoop Map/Reduce Issue Type: Improvement Components: nodemanager Affects Versions: 3.0.0 Environment: FreeBSD 64 bit Reporter: Radim Kolar Attachments: plugable-pstree-1.txt, plugable-pstree-2.txt, plugable-pstree-3.txt, plugable-pstree-4.txt, plugable-pstree.txt Trunk version of Pluggable process tree. Work based on MAPREDUCE-4204
[jira] [Updated] (MAPREDUCE-4275) Plugable process tree
[ https://issues.apache.org/jira/browse/MAPREDUCE-4275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Radim Kolar updated MAPREDUCE-4275: --- Attachment: plugable-pstree-4-with-whitespace.txt Now without removing whitespace lines. Plugable process tree - Key: MAPREDUCE-4275 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4275 Project: Hadoop Map/Reduce Issue Type: Improvement Components: nodemanager Affects Versions: 3.0.0 Environment: FreeBSD 64 bit Reporter: Radim Kolar Attachments: plugable-pstree-1.txt, plugable-pstree-2.txt, plugable-pstree-3.txt, plugable-pstree-4-with-whitespace.txt, plugable-pstree-4.txt, plugable-pstree.txt Trunk version of Pluggable process tree. Work based on MAPREDUCE-4204
[jira] [Commented] (MAPREDUCE-4275) Plugable process tree
[ https://issues.apache.org/jira/browse/MAPREDUCE-4275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13428037#comment-13428037 ] Hadoop QA commented on MAPREDUCE-4275: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12539036/plugable-pstree-4-with-whitespace.txt against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 2 new or modified test files. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 javadoc. The javadoc tool did not generate any warning messages. +1 eclipse:eclipse. The patch built with eclipse:eclipse. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests in hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-common hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager: org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.TestContainersMonitor org.apache.hadoop.yarn.server.nodemanager.TestEventFlow org.apache.hadoop.yarn.server.nodemanager.containermanager.TestContainerManager org.apache.hadoop.yarn.server.nodemanager.TestNodeStatusUpdater org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2704//testReport/ Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2704//console This message is automatically generated. 
Plugable process tree - Key: MAPREDUCE-4275 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4275 Project: Hadoop Map/Reduce Issue Type: Improvement Components: nodemanager Affects Versions: 3.0.0 Environment: FreeBSD 64 bit Reporter: Radim Kolar Attachments: plugable-pstree-1.txt, plugable-pstree-2.txt, plugable-pstree-3.txt, plugable-pstree-4-with-whitespace.txt, plugable-pstree-4.txt, plugable-pstree.txt Trunk version of Pluggable process tree. Work based on MAPREDUCE-4204
[jira] [Commented] (MAPREDUCE-3289) Make use of fadvise in the NM's shuffle handler
[ https://issues.apache.org/jira/browse/MAPREDUCE-3289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13428052#comment-13428052 ] Hudson commented on MAPREDUCE-3289: --- Integrated in Hadoop-Hdfs-trunk #1124 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1124/]) MAPREDUCE-3289. Make use of fadvise in the NM's shuffle handler. (Contributed by Todd Lipcon and Siddharth Seth) (Revision 1368718) Result = SUCCESS sseth : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1368718 Files : * /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle/src/main/java/org/apache/hadoop/mapred/FadvisedChunkedFile.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle/src/main/java/org/apache/hadoop/mapred/FadvisedFileRegion.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle/src/main/java/org/apache/hadoop/mapred/ShuffleHandler.java Make use of fadvise in the NM's shuffle handler --- Key: MAPREDUCE-3289 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3289 Project: Hadoop Map/Reduce Issue Type: Improvement Components: mrv2, nodemanager, performance Affects Versions: 0.23.0 Reporter: Todd Lipcon Assignee: Todd Lipcon Fix For: 1.2.0, 2.2.0-alpha Attachments: 3289-1.txt, 3289-2.txt, MAPREDUCE-3289.branch-1.patch, MAPREDUCE-3289.branch-1.patch, MR3289_trunk.txt, MR3289_trunk_2.txt, MR3289_trunk_3.txt, mr-3289.txt Using the new NativeIO fadvise functions, we can make the NodeManager prefetch map output before it's sent over the socket, and drop it from the fs cache once it's been sent (since it's very rare for an output to have to be re-sent). This improves IO efficiency and reduces cache pollution.
[jira] [Updated] (MAPREDUCE-4275) Plugable process tree
[ https://issues.apache.org/jira/browse/MAPREDUCE-4275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Radim Kolar updated MAPREDUCE-4275: --- Attachment: plugable-pstree-5-with-whitespace.txt Avoid null pointer dereference in init(). Plugable process tree - Key: MAPREDUCE-4275 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4275 Project: Hadoop Map/Reduce Issue Type: Improvement Components: nodemanager Affects Versions: 3.0.0 Environment: FreeBSD 64 bit Reporter: Radim Kolar Attachments: plugable-pstree-1.txt, plugable-pstree-2.txt, plugable-pstree-3.txt, plugable-pstree-4-with-whitespace.txt, plugable-pstree-4.txt, plugable-pstree-5-with-whitespace.txt, plugable-pstree.txt Trunk version of Pluggable process tree. Work based on MAPREDUCE-4204
[jira] [Commented] (MAPREDUCE-4275) Plugable process tree
[ https://issues.apache.org/jira/browse/MAPREDUCE-4275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13428094#comment-13428094 ] Hadoop QA commented on MAPREDUCE-4275: -- +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12539050/plugable-pstree-5-with-whitespace.txt against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 2 new or modified test files. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 javadoc. The javadoc tool did not generate any warning messages. +1 eclipse:eclipse. The patch built with eclipse:eclipse. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed unit tests in hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-common hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2705//testReport/ Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2705//console This message is automatically generated. Plugable process tree - Key: MAPREDUCE-4275 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4275 Project: Hadoop Map/Reduce Issue Type: Improvement Components: nodemanager Affects Versions: 3.0.0 Environment: FreeBSD 64 bit Reporter: Radim Kolar Attachments: plugable-pstree-1.txt, plugable-pstree-2.txt, plugable-pstree-3.txt, plugable-pstree-4-with-whitespace.txt, plugable-pstree-4.txt, plugable-pstree-5-with-whitespace.txt, plugable-pstree.txt Trunk version of Pluggable process tree. Work based on MAPREDUCE-4204
[jira] [Commented] (MAPREDUCE-3289) Make use of fadvise in the NM's shuffle handler
[ https://issues.apache.org/jira/browse/MAPREDUCE-3289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13428100#comment-13428100 ] Hudson commented on MAPREDUCE-3289: --- Integrated in Hadoop-Mapreduce-trunk #1156 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1156/]) MAPREDUCE-3289. Make use of fadvise in the NM's shuffle handler. (Contributed by Todd Lipcon and Siddharth Seth) (Revision 1368718) Result = FAILURE sseth : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1368718 Files :
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle/src/main/java/org/apache/hadoop/mapred/FadvisedChunkedFile.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle/src/main/java/org/apache/hadoop/mapred/FadvisedFileRegion.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle/src/main/java/org/apache/hadoop/mapred/ShuffleHandler.java
Make use of fadvise in the NM's shuffle handler --- Key: MAPREDUCE-3289 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3289 Project: Hadoop Map/Reduce Issue Type: Improvement Components: mrv2, nodemanager, performance Affects Versions: 0.23.0 Reporter: Todd Lipcon Assignee: Todd Lipcon Fix For: 1.2.0, 2.2.0-alpha Attachments: 3289-1.txt, 3289-2.txt, MAPREDUCE-3289.branch-1.patch, MAPREDUCE-3289.branch-1.patch, MR3289_trunk.txt, MR3289_trunk_2.txt, MR3289_trunk_3.txt, mr-3289.txt Using the new NativeIO fadvise functions, we can make the NodeManager prefetch map output before it's sent over the socket, and drop it out of the fs cache once it's been sent (since it's very rare for an output to have to be re-sent). This improves IO efficiency and reduces cache pollution.
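The prefetch-then-drop pattern described above can be sketched in plain Java. This is a minimal sketch, not the ShuffleHandler code: `CacheAdvisor` and `ChunkSink` are hypothetical interfaces standing in for a native posix_fadvise wrapper (WILLNEED / DONTNEED hints) and for the socket write, and the chunk and read-ahead sizes are illustrative.

```java
public class FadviseSketch {

    /** Hypothetical stand-in for a native posix_fadvise wrapper. */
    public interface CacheAdvisor {
        void willNeed(long offset, long len); // hint: start reading this range into the page cache
        void dontNeed(long offset, long len); // hint: this range may be dropped from the page cache
    }

    /** Hypothetical sink that writes one chunk of the file to the network. */
    public interface ChunkSink {
        void send(long offset, long len);
    }

    /**
     * Sends [0, fileLen) in chunks of chunkSize. Before each send it extends a
     * WILLNEED window covering readAheadChunks chunks starting at the current one;
     * after each send it advises DONTNEED for the chunk just sent, since map output
     * is rarely re-sent.
     */
    public static void sendWithAdvice(long fileLen, long chunkSize, int readAheadChunks,
                                      CacheAdvisor advisor, ChunkSink sink) {
        if (chunkSize <= 0 || readAheadChunks < 1) {
            throw new IllegalArgumentException("chunkSize and readAheadChunks must be positive");
        }
        long advisedUpTo = 0; // end of the range already passed to willNeed
        for (long off = 0; off < fileLen; off += chunkSize) {
            long len = Math.min(chunkSize, fileLen - off);
            long wantUpTo = Math.min(fileLen, off + chunkSize * readAheadChunks);
            if (wantUpTo > advisedUpTo) { // only advise the part of the window not yet advised
                advisor.willNeed(advisedUpTo, wantUpTo - advisedUpTo);
                advisedUpTo = wantUpTo;
            }
            sink.send(off, len);
            advisor.dontNeed(off, len);
        }
    }
}
```

With a 10-byte file, 4-byte chunks and a two-chunk window, the advisor sees WILLNEED for bytes 0-7 before the first send, WILLNEED for bytes 8-9 before the second, and DONTNEED after every send.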
[jira] [Commented] (MAPREDUCE-4488) Port MAPREDUCE-463 (The job setup and cleanup tasks should be optional) to branch-1
[ https://issues.apache.org/jira/browse/MAPREDUCE-4488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13428188#comment-13428188 ] Tom White commented on MAPREDUCE-4488: -- Alejandro - the code is from MAPREDUCE-463. Can I make the changes you suggest in another JIRA so that branches 1 and 2 are kept the same? Port MAPREDUCE-463 (The job setup and cleanup tasks should be optional) to branch-1 --- Key: MAPREDUCE-4488 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4488 Project: Hadoop Map/Reduce Issue Type: New Feature Components: mrv1, performance Affects Versions: 1.0.3 Reporter: Tom White Assignee: Tom White Attachments: MAPREDUCE-4488.patch
[jira] [Commented] (MAPREDUCE-4495) Workflow Application Master in YARN
[ https://issues.apache.org/jira/browse/MAPREDUCE-4495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13428194#comment-13428194 ] Alejandro Abdelnur commented on MAPREDUCE-4495: --- I don't think PaaS and Workflow-AM are similar. Workflow-AM aims to provide an AM that can run multiple MR jobs and do intra-AM processing, all from the same AM. This would be enough for projects that typically run multiple MR jobs as a single unit of processing, like Pig/Hive/Sqoop/Oozie. Workflow-AM will need to tap into the MapReduce AM private classes, as the intention is to fully leverage what has been done already. It will most likely also require changes in the MapReduce AM, such as making it thread-safe and multi-MR-job safe (which I believe is not the case today). Because of this, I think that it belongs in MapReduce. Having it outside, at least during its inception, would make its development much more difficult. That said, I have no issue (quite the opposite) with seeing how it can be generalized and moved out once we finalize the initial implementation. Workflow Application Master in YARN --- Key: MAPREDUCE-4495 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4495 Project: Hadoop Map/Reduce Issue Type: New Feature Affects Versions: 2.0.0-alpha Reporter: Bo Wang Assignee: Bo Wang It is useful to have a workflow application master, which will be capable of running a DAG of jobs. The workflow client submits a DAG request to the AM and then the AM will manage the life cycle of this application in terms of requesting the needed resources from the RM, and starting, monitoring and retrying the application's individual tasks. Compared to running Oozie with the current MapReduce Application Master, these are some of the advantages:
- Fewer consumed resources, since only one application master will be spawned for the whole workflow.
- Reuse of resources, since the same resources can be used by multiple consecutive jobs in the workflow (no need to request/wait for resources for every individual job from the central RM).
- More optimization opportunities in terms of collective resource requests.
- Optimization opportunities in terms of rewriting and composing jobs in the workflow (e.g. pushing down Mappers).
- This Application Master can be reused/extended by higher-level systems like Pig and Hive to provide an optimized way of running their workflows.
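To make "running a DAG of jobs" concrete, here is a minimal Java sketch (not the proposed Workflow-AM design) that groups jobs into waves using Kahn's topological sort: every job in a wave depends only on jobs in earlier waves, so an AM could launch a whole wave at once and move on when it completes. The job names and the `waves` helper are hypothetical.

```java
import java.util.*;

public class WorkflowDagSketch {

    /**
     * Groups jobs into waves with Kahn's algorithm. deps maps every job to the jobs it
     * depends on; each job in wave k depends only on jobs in waves 0..k-1.
     * Throws IllegalArgumentException on an unknown dependency or a cycle.
     */
    public static List<List<String>> waves(Map<String, List<String>> deps) {
        Map<String, Integer> unmet = new HashMap<>();           // job -> dependencies not yet done
        Map<String, List<String>> dependents = new HashMap<>(); // job -> jobs waiting on it
        for (Map.Entry<String, List<String>> e : deps.entrySet()) {
            unmet.put(e.getKey(), e.getValue().size());
            for (String d : e.getValue()) {
                if (!deps.containsKey(d)) throw new IllegalArgumentException("unknown job: " + d);
                dependents.computeIfAbsent(d, k -> new ArrayList<>()).add(e.getKey());
            }
        }
        List<String> ready = new ArrayList<>();
        for (Map.Entry<String, Integer> e : unmet.entrySet()) {
            if (e.getValue() == 0) ready.add(e.getKey());
        }
        List<List<String>> result = new ArrayList<>();
        int scheduled = 0;
        while (!ready.isEmpty()) {
            Collections.sort(ready);    // deterministic order within a wave
            result.add(ready);
            scheduled += ready.size();
            List<String> next = new ArrayList<>();
            for (String done : ready) { // treat this wave as complete and release its dependents
                for (String child : dependents.getOrDefault(done, List.of())) {
                    if (unmet.merge(child, -1, Integer::sum) == 0) next.add(child);
                }
            }
            ready = next;
        }
        if (scheduled != deps.size()) throw new IllegalArgumentException("cycle in workflow DAG");
        return result;
    }
}
```

For example, with `ingest` feeding both `dedupe` and `join`, which in turn feed `report`, the waves are `[ingest]`, `[dedupe, join]`, `[report]`; a cyclic graph is rejected.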
[jira] [Commented] (MAPREDUCE-4488) Port MAPREDUCE-463 (The job setup and cleanup tasks should be optional) to branch-1
[ https://issues.apache.org/jira/browse/MAPREDUCE-4488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13428195#comment-13428195 ] Alejandro Abdelnur commented on MAPREDUCE-4488: --- +1 Port MAPREDUCE-463 (The job setup and cleanup tasks should be optional) to branch-1 --- Key: MAPREDUCE-4488 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4488 Project: Hadoop Map/Reduce Issue Type: New Feature Components: mrv1, performance Affects Versions: 1.0.3 Reporter: Tom White Assignee: Tom White Attachments: MAPREDUCE-4488.patch
[jira] [Created] (MAPREDUCE-4512) TextInputFormat delimiter bug:- Input Text portion ends with Delimiter starts with same char/char sequence
Gelesh created MAPREDUCE-4512: - Summary: TextInputFormat delimiter bug:- Input Text portion ends with Delimiter starts with same char/char sequence Key: MAPREDUCE-4512 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4512 Project: Hadoop Map/Reduce Issue Type: Bug Components: contrib/mumak, mr-am, mrv1, mrv2, task Affects Versions: 2.0.0-alpha Environment: Linux Reporter: Gelesh Fix For: 0.20.204.0 TextInputFormat delimiter bug scenario: a character sequence of the input text in which the first character matches the first character of the delimiter, and the characters that follow it match the entire delimiter, starting from the delimiter's first character. E.g. delimiter = "record", and Text = "record 1:- name = Gelesh e mail = gelesh.had...@gmail.com Location Bangalore record 2: name = sdf .. location =Bangalorrecord 3: name". Here the string "=Bangalorrecord 3:" satisfies two conditions: 1) it contains the delimiter "record"; 2) the character (or character sequence) immediately before the delimiter, i.e. 'r', matches the first character (or character sequence) of the delimiter, i.e. "=Bangalor" ends with, and the delimiter starts with, the same character 'r'. In this case the delimiter is skipped.
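The reported symptom is consistent with a delimiter matcher that, on a partial mismatch, resets its match position to zero without re-checking the current character: in `Bangalorrecord`, the first `r` starts a match, the second `r` fails against `e`, and that second `r` is discarded instead of starting a new match. The sketch below (Java, operating on `char`s rather than the bytes `LineReader` actually reads, with hypothetical class and method names) reproduces that failure in `naiveSplit` and shows a Knuth-Morris-Pratt matcher in `kmpSplit`, which falls back through a failure table instead of to zero. It is an illustration, not the attached patch.

```java
import java.util.ArrayList;
import java.util.List;

public class DelimiterSplitSketch {

    /** Reproduces the reported bug: on a mismatch it resets to 0 and never re-checks the current char. */
    public static List<String> naiveSplit(String text, String delim) {
        requireNonEmpty(delim);
        List<String> records = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        int matched = 0; // number of delimiter chars matched so far
        for (char c : text.toCharArray()) {
            cur.append(c);
            if (c == delim.charAt(matched)) {
                if (++matched == delim.length()) {
                    emit(records, cur, delim.length());
                    matched = 0;
                }
            } else {
                matched = 0; // BUG: c itself may be the start of the delimiter
            }
        }
        records.add(cur.toString()); // trailing text after the last delimiter
        return records;
    }

    /** fail[i] = length of the longest proper prefix of delim[0..i] that is also its suffix. */
    static int[] failureTable(String delim) {
        int[] fail = new int[delim.length()];
        for (int i = 1, k = 0; i < delim.length(); i++) {
            while (k > 0 && delim.charAt(i) != delim.charAt(k)) k = fail[k - 1];
            if (delim.charAt(i) == delim.charAt(k)) k++;
            fail[i] = k;
        }
        return fail;
    }

    /** KMP-based split: on a mismatch it falls back through the failure table instead of to 0. */
    public static List<String> kmpSplit(String text, String delim) {
        requireNonEmpty(delim);
        int[] fail = failureTable(delim);
        List<String> records = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        int matched = 0;
        for (char c : text.toCharArray()) {
            cur.append(c);
            while (matched > 0 && c != delim.charAt(matched)) matched = fail[matched - 1];
            if (c == delim.charAt(matched)) matched++;
            if (matched == delim.length()) {
                emit(records, cur, delim.length());
                matched = 0; // a consumed delimiter does not overlap the next one
            }
        }
        records.add(cur.toString());
        return records;
    }

    /** Adds the buffered record minus the trailing delimiter, then clears the buffer. */
    private static void emit(List<String> records, StringBuilder cur, int delimLen) {
        cur.setLength(cur.length() - delimLen);
        records.add(cur.toString());
        cur.setLength(0);
    }

    private static void requireNonEmpty(String delim) {
        if (delim.isEmpty()) throw new IllegalArgumentException("delimiter must be non-empty");
    }
}
```

On `"x=Bangalorrecord 3"` with delimiter `record`, `naiveSplit` returns a single record while `kmpSplit` returns `"x=Bangalor"` and `" 3"`. A simpler fix that only re-examines the mismatching character handles `record`, but can still miss self-overlapping delimiters such as `aab` in `aaab`; the failure table covers that case too.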
[jira] [Updated] (MAPREDUCE-4512) TextInputFormat delimiter bug:- Input Text portion ends with Delimiter starts with same char/char sequence
[ https://issues.apache.org/jira/browse/MAPREDUCE-4512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gelesh updated MAPREDUCE-4512: -- Status: Patch Available (was: Open) Just a one-line code change in LineReader would do. Tested. If there are any issues, please let me know so I can help further: gelesh.had...@gmail.com TextInputFormat delimiter bug:- Input Text portion ends with Delimiter starts with same char/char sequence - Key: MAPREDUCE-4512 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4512 Project: Hadoop Map/Reduce Issue Type: Bug Components: contrib/mumak, mr-am, mrv1, mrv2, task Affects Versions: 2.0.0-alpha Environment: Linux Reporter: Gelesh Labels: patch Fix For: 0.20.204.0 Original Estimate: 1m Remaining Estimate: 1m
[jira] [Updated] (MAPREDUCE-4512) TextInputFormat delimiter bug:- Input Text portion ends with Delimiter starts with same char/char sequence
[ https://issues.apache.org/jira/browse/MAPREDUCE-4512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gelesh updated MAPREDUCE-4512: -- Attachment: MAPREDUCE-4512.txt Just a one-line code change at LineRecord. Tested. In case there is any issue, please mail me at gelesh.had...@gmail.com TextInputFormat delimiter bug:- Input Text portion ends with Delimiter starts with same char/char sequence - Key: MAPREDUCE-4512 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4512 Project: Hadoop Map/Reduce Issue Type: Bug Components: contrib/mumak, mr-am, mrv1, mrv2, task Affects Versions: 2.0.0-alpha Environment: Linux Reporter: Gelesh Labels: patch Fix For: 0.20.204.0 Attachments: MAPREDUCE-4512.txt Original Estimate: 1m Remaining Estimate: 1m
[jira] [Updated] (MAPREDUCE-3902) MR AM should reuse containers for map tasks, there-by allowing fine-grained control on num-maps for users without need for CombineFileInputFormat etc.
[ https://issues.apache.org/jira/browse/MAPREDUCE-3902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA updated MAPREDUCE-3902: -- Attachment: MAPREDUCE-3902.2.patch As a first step, I fixed Arun's patch so that it compiles against the current source code. MR AM should reuse containers for map tasks, there-by allowing fine-grained control on num-maps for users without need for CombineFileInputFormat etc. -- Key: MAPREDUCE-3902 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3902 Project: Hadoop Map/Reduce Issue Type: Improvement Components: applicationmaster, mrv2 Reporter: Arun C Murthy Assignee: Siddharth Seth Attachments: MAPREDUCE-3902.2.patch, MAPREDUCE-3902.patch The MR AM is now in a great position to reuse containers across (map) tasks. This is similar to the JVM re-use we had in 0.20.x, but in a significantly better manner:
# Consider data-locality when re-using containers
# Consider the new shuffle - ensure that reduces fetch output of the whole container at once (i.e. all maps)
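The first point, data-locality when re-using a container, could look roughly like the following minimal Java sketch: when a container on some host finishes a map task, the AM prefers a pending map whose split has a replica on that host, and otherwise either falls back to the oldest pending map or gives the container up. `MapTask` and `nextTaskFor` are hypothetical names; rack-locality and the shuffle change in the second point are not modeled.

```java
import java.util.Deque;
import java.util.Iterator;
import java.util.Set;

public class ContainerReuseSketch {

    /** A pending map task and the hosts that hold a replica of its input split. */
    public static final class MapTask {
        public final String id;
        public final Set<String> splitHosts;

        public MapTask(String id, Set<String> splitHosts) {
            this.id = id;
            this.splitHosts = splitHosts;
        }
    }

    /**
     * Called when a container on `host` finishes a task. Removes and returns a node-local
     * pending task if there is one; otherwise the oldest pending task when allowRemote is
     * true, or null so the caller can release the container.
     */
    public static MapTask nextTaskFor(String host, Deque<MapTask> pending, boolean allowRemote) {
        for (Iterator<MapTask> it = pending.iterator(); it.hasNext(); ) {
            MapTask task = it.next();
            if (task.splitHosts.contains(host)) {
                it.remove();
                return task;
            }
        }
        return allowRemote ? pending.pollFirst() : null;
    }
}
```

Returning null rather than always taking a remote task leaves the choice between "keep the container busy" and "keep locality" to the caller.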
[jira] [Commented] (MAPREDUCE-3902) MR AM should reuse containers for map tasks, there-by allowing fine-grained control on num-maps for users without need for CombineFileInputFormat etc.
[ https://issues.apache.org/jira/browse/MAPREDUCE-3902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13428211#comment-13428211 ] Tsuyoshi OZAWA commented on MAPREDUCE-3902: --- IMHO, the 2nd topic (combining per container) should be moved, because the change seems to be too big. If there are no objections, I'm going to create a new ticket to deal with the 2nd topic as a sub-task of MAPREDUCE-3902. MR AM should reuse containers for map tasks, there-by allowing fine-grained control on num-maps for users without need for CombineFileInputFormat etc. -- Key: MAPREDUCE-3902 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3902 Project: Hadoop Map/Reduce Issue Type: Improvement Components: applicationmaster, mrv2 Reporter: Arun C Murthy Assignee: Siddharth Seth Attachments: MAPREDUCE-3902.2.patch, MAPREDUCE-3902.patch
[jira] [Commented] (MAPREDUCE-3902) MR AM should reuse containers for map tasks, there-by allowing fine-grained control on num-maps for users without need for CombineFileInputFormat etc.
[ https://issues.apache.org/jira/browse/MAPREDUCE-3902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13428212#comment-13428212 ] Tsuyoshi OZAWA commented on MAPREDUCE-3902: --- s/should be moved/should be moved to the new ticket/ MR AM should reuse containers for map tasks, there-by allowing fine-grained control on num-maps for users without need for CombineFileInputFormat etc. -- Key: MAPREDUCE-3902 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3902 Project: Hadoop Map/Reduce Issue Type: Improvement Components: applicationmaster, mrv2 Reporter: Arun C Murthy Assignee: Siddharth Seth Attachments: MAPREDUCE-3902.2.patch, MAPREDUCE-3902.patch
[jira] [Commented] (MAPREDUCE-4512) TextInputFormat delimiter bug:- Input Text portion ends with Delimiter starts with same char/char sequence
[ https://issues.apache.org/jira/browse/MAPREDUCE-4512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13428214#comment-13428214 ] Hadoop QA commented on MAPREDUCE-4512:
-1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12539059/MAPREDUCE-4512.txt against trunk revision .
+1 @author. The patch does not contain any @author tags.
-1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 eclipse:eclipse. The patch built with eclipse:eclipse.
+1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed unit tests in hadoop-common-project/hadoop-common.
+1 contrib tests. The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2706//testReport/
Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2706//console
TextInputFormat delimiter bug:- Input Text portion ends with Delimiter starts with same char/char sequence - Key: MAPREDUCE-4512 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4512 Project: Hadoop Map/Reduce Issue Type: Bug Components: contrib/mumak, mr-am, mrv1, mrv2, task Affects Versions: 2.0.0-alpha Environment: Linux Reporter: Gelesh Labels: patch Fix For: 0.20.204.0 Attachments: MAPREDUCE-4512.txt Original Estimate: 1m Remaining Estimate: 1m
[jira] [Commented] (MAPREDUCE-4495) Workflow Application Master in YARN
[ https://issues.apache.org/jira/browse/MAPREDUCE-4495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13428237#comment-13428237 ] Bo Wang commented on MAPREDUCE-4495: I agree with Alejandro. The goals of the workflow AM go beyond job scheduling and include local resource management and optimization. These goals require tight interaction between the workflow AM and the MR AM, so it can be regarded as an extension to the MR AM. I noticed MAPREDUCE-3902 on reusing containers in the MR AM. The workflow AM can reuse containers across jobs, which is a more general case. Workflow Application Master in YARN --- Key: MAPREDUCE-4495 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4495 Project: Hadoop Map/Reduce Issue Type: New Feature Affects Versions: 2.0.0-alpha Reporter: Bo Wang Assignee: Bo Wang
[jira] [Resolved] (MAPREDUCE-3600) Add Minimal Fair Scheduler to MR2
[ https://issues.apache.org/jira/browse/MAPREDUCE-3600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved MAPREDUCE-3600. Resolution: Fixed Fixed by parent ticket. Add Minimal Fair Scheduler to MR2 - Key: MAPREDUCE-3600 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3600 Project: Hadoop Map/Reduce Issue Type: Sub-task Components: mrv2, scheduler Reporter: Patrick Wendell Assignee: Patrick Wendell Attachments: MAPREDUCE-3600.v1.patch, MAPREDUCE-3600.v2.patch This covers the addition of the Fair Scheduler to the MR2 infrastructure. This patch will represent the minimal functional FairScheduler in MR2. It will be limited to a configuration file reader, functionality to calculate fair shares, and hooks into the actual MR2 scheduling code. It will not include delay scheduling, preemption, or a web UI, which will be handled in separate JIRAs.
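For "functionality to calculate fair shares", one common formulation is weighted max-min fairness: find a ratio r such that giving each pool min(demand, r * weight) uses exactly the available capacity. The Java sketch below finds r by binary search. It is an illustration under that assumption, not the patch's code, and it ignores minimum/maximum shares and any other settings the real scheduler reads from its configuration file.

```java
public class FairShareSketch {

    /**
     * Weighted max-min fair shares: finds r such that sum_i min(demand_i, r * weight_i)
     * equals capacity, then gives pool i min(demand_i, r * weight_i).
     * If total demand fits in capacity, every pool simply gets its demand.
     */
    public static double[] fairShares(double capacity, double[] weights, double[] demands) {
        if (weights.length != demands.length) throw new IllegalArgumentException("length mismatch");
        for (double w : weights) {
            if (w <= 0) throw new IllegalArgumentException("weights must be > 0");
        }
        double totalDemand = 0;
        for (double d : demands) totalDemand += d;
        if (totalDemand <= capacity) return demands.clone();

        // Double r until the pools would use at least the whole capacity.
        double lo = 0, hi = 1;
        while (used(hi, weights, demands) < capacity) hi *= 2;
        // Bisect on r, keeping used(hi) >= capacity.
        for (int iter = 0; iter < 100; iter++) {
            double mid = (lo + hi) / 2;
            if (used(mid, weights, demands) < capacity) lo = mid; else hi = mid;
        }
        double[] shares = new double[weights.length];
        for (int i = 0; i < shares.length; i++) shares[i] = Math.min(demands[i], hi * weights[i]);
        return shares;
    }

    /** Total resources handed out if each pool gets min(demand, r * weight). */
    static double used(double r, double[] weights, double[] demands) {
        double sum = 0;
        for (int i = 0; i < weights.length; i++) sum += Math.min(demands[i], r * weights[i]);
        return sum;
    }
}
```

With capacity 10, equal weights and demands of 2, 8 and 8, the sketch returns shares of 2, 4 and 4: the small pool is fully satisfied and the remainder is split evenly between the other two.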
[jira] [Resolved] (MAPREDUCE-3602) Add Preemption to MR2 Fair Scheduler
[ https://issues.apache.org/jira/browse/MAPREDUCE-3602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved MAPREDUCE-3602. Resolution: Fixed Solved with parent ticket. Add Preemption to MR2 Fair Scheduler Key: MAPREDUCE-3602 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3602 Project: Hadoop Map/Reduce Issue Type: Sub-task Components: scheduler Reporter: Patrick Wendell Assignee: Patrick Wendell Attachments: MAPREDUCE-3602.v1.patch
[jira] [Resolved] (MAPREDUCE-3601) Add Delay Scheduling to MR2 Fair Scheduler
[ https://issues.apache.org/jira/browse/MAPREDUCE-3601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved MAPREDUCE-3601. Resolution: Fixed Fixed with parent ticket. Add Delay Scheduling to MR2 Fair Scheduler -- Key: MAPREDUCE-3601 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3601 Project: Hadoop Map/Reduce Issue Type: Sub-task Components: scheduler Reporter: Patrick Wendell Assignee: Patrick Wendell Attachments: MAPREDUCE-3601.v1.patch JIRA for delay scheduling component.
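The delay-scheduling idea itself fits in a few lines. In the simple variant sketched below (Java, with a hypothetical class and `maxSkips` knob, not the MR2 Fair Scheduler code), a job that has no task local to the offering node declines up to `maxSkips` scheduling opportunities before accepting a non-local launch, and its skip count is reset only when it launches a local task.

```java
public class DelaySchedulingSketch {

    public enum Decision { LAUNCH_LOCAL, LAUNCH_REMOTE, SKIP }

    private final int maxSkips; // hypothetical knob: opportunities to wait for a local slot
    private int skipCount = 0;

    public DelaySchedulingSketch(int maxSkips) {
        this.maxSkips = maxSkips;
    }

    /** Called each time a node offers this job a free slot. */
    public Decision offer(boolean hasLocalTaskOnNode) {
        if (hasLocalTaskOnNode) {
            skipCount = 0;                 // locality achieved: restart the wait
            return Decision.LAUNCH_LOCAL;
        }
        if (skipCount >= maxSkips) {
            return Decision.LAUNCH_REMOTE; // waited long enough: accept a non-local slot
        }
        skipCount++;
        return Decision.SKIP;              // decline and let another job use the slot
    }
}
```

With `maxSkips = 2`, three consecutive non-local offers produce SKIP, SKIP, LAUNCH_REMOTE.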
[jira] [Commented] (MAPREDUCE-4495) Workflow Application Master in YARN
[ https://issues.apache.org/jira/browse/MAPREDUCE-4495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13428257#comment-13428257 ] Arun C Murthy commented on MAPREDUCE-4495: -- Alejandro, making MR AM thread-safe is a good goal. We can do that independently of the new AM. I have opened MAPREDUCE-4513 for the same. I don't know which other 'private' classes you need - frankly that concerns me. It means you are adding new requirements on MR-AM, which isn't an 'api' of that nature. Also, if we are going that route, I strongly suggest we do not import code from Oozie and merely take the JobControl api and support it. That should be a trivial exercise without adding any new 'interfaces' to MapReduce. So, I see two options:
# Enhance JobControl api to work in AM by making MR-AM, specifically MRAppMaster, thread-safe. This will allow for multiple objects of MRAppMaster to be created. This means there are no new interfaces to MapReduce.
# Go the full distance, make it generic, import code from Oozie, come up with a new set of interfaces etc. etc. and do it in a separate Incubator project.
As I indicated previously, my preference is option #2 and I have already offered help to deal with the specifics so you and Bo can concentrate on getting the code out. Thoughts? Workflow Application Master in YARN --- Key: MAPREDUCE-4495 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4495 Project: Hadoop Map/Reduce Issue Type: New Feature Affects Versions: 2.0.0-alpha Reporter: Bo Wang Assignee: Bo Wang
[jira] [Created] (MAPREDUCE-4513) Make MR AM thread-safe
Arun C Murthy created MAPREDUCE-4513: Summary: Make MR AM thread-safe Key: MAPREDUCE-4513 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4513 Project: Hadoop Map/Reduce Issue Type: Improvement Reporter: Arun C Murthy Assignee: Arun C Murthy Currently MR-AM has a bunch of statics, making it thread-unsafe. We should fix that.
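The problem with statics is that every `MRAppMaster` object in the same JVM would share them, which rules out creating several AM objects in one process. Below is a minimal Java sketch of the kind of change involved, using hypothetical classes rather than actual MR AM code: move the state into instance fields and make fields that are updated concurrently atomic.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class AppMasterStateSketch {

    /** Before: state lives in a static field, so all instances in the JVM share it. */
    public static class StaticCounterAM {
        static int completedTasks = 0; // shared across instances; ++ is also not atomic

        public void taskDone() { completedTasks++; }
        public int completed() { return completedTasks; }
    }

    /** After: each instance owns its state, and the counter is safe to update from many threads. */
    public static class InstanceCounterAM {
        private final AtomicInteger completedTasks = new AtomicInteger();

        public void taskDone() { completedTasks.incrementAndGet(); }
        public int completed() { return completedTasks.get(); }
    }
}
```

Two `StaticCounterAM` objects that each finish one task both report 2, while `InstanceCounterAM` objects keep independent counts.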
[jira] [Commented] (MAPREDUCE-4495) Workflow Application Master in YARN
[ https://issues.apache.org/jira/browse/MAPREDUCE-4495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13428270#comment-13428270 ] Owen O'Malley commented on MAPREDUCE-4495: -- The Hadoop project has gone down the path of having large contrib components before, and it created substantial difficulties for the Hadoop community. Hadoop should be about creating a platform for other projects to build on rather than bundling all components within itself. Since many of the people interested in working on this are in the Oozie project, it might make sense to host it there. Otherwise, the Incubator would be a great place to go while you build the project and community. Any work that you can do to help YARN become a better platform is appreciated, but I expect there to be a lot of YARN-based frameworks, and they will all need to be managed from outside of Hadoop. Workflow Application Master in YARN --- Key: MAPREDUCE-4495 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4495 Project: Hadoop Map/Reduce Issue Type: New Feature Affects Versions: 2.0.0-alpha Reporter: Bo Wang Assignee: Bo Wang
[jira] [Commented] (MAPREDUCE-4495) Workflow Application Master in YARN
[ https://issues.apache.org/jira/browse/MAPREDUCE-4495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13428272#comment-13428272 ] Arun C Murthy commented on MAPREDUCE-4495: -- {quote} So, I see two options: # Enhance the JobControl API to work in the AM by making the MR-AM, specifically MRAppMaster, thread-safe. This will allow for multiple objects of MRAppMaster to be created. This means there are no new interfaces to MapReduce. # Go the full distance, make it generic, import code from Oozie, come up with a new set of interfaces for generic DAG mgmt infrastructure etc. etc. and do it in a separate Incubator project. {quote} I think this is coming to a point where we are arguing too much in the abstract. Frankly, this is really not how I want to spend my time. Maybe we can wait for a detailed proposal from Bo or Alejandro and then revisit this discussion. I believe I have laid my thoughts out clearly with respect to the options etc. Let's discuss when we actually have something concrete (design or code). OTOH, if we can agree on the Incubator proposal, I'm happy to do the legwork for Alejandro right away. At least that is tractable and not merely abstract. Workflow Application Master in YARN --- Key: MAPREDUCE-4495 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4495 Project: Hadoop Map/Reduce Issue Type: New Feature Affects Versions: 2.0.0-alpha Reporter: Bo Wang Assignee: Bo Wang It is useful to have a workflow application master, which will be capable of running a DAG of jobs. The workflow client submits a DAG request to the AM and then the AM will manage the life cycle of this application in terms of requesting the needed resources from the RM, and starting, monitoring and retrying the application's individual tasks. Compared to running Oozie with the current MapReduce Application Master, these are some of the advantages: - Fewer consumed resources, since only one application master will be spawned for the whole workflow.
- Reuse of resources, since the same resources can be used by multiple consecutive jobs in the workflow (no need to request/wait for resources for every individual job from the central RM). - More optimization opportunities in terms of collective resource requests. - Optimization opportunities in terms of rewriting and composing jobs in the workflow (e.g. pushing down Mappers). - This Application Master can be reused/extended by higher-level systems like Pig and Hive to provide an optimized way of running their workflows.
[jira] [Commented] (MAPREDUCE-4431) killing already completed job gives ambiguous message as Killed job job id
[ https://issues.apache.org/jira/browse/MAPREDUCE-4431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13428273#comment-13428273 ] Mayank Bansal commented on MAPREDUCE-4431: -- +1 Looks good. Thanks, Mayank killing already completed job gives ambiguous message as Killed job job id -- Key: MAPREDUCE-4431 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4431 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2 Affects Versions: 2.0.0-alpha, 3.0.0 Reporter: Nishan Shetty Assignee: Devaraj K Priority: Minor Attachments: MAPREDUCE-4431-1.patch, MAPREDUCE-4431.patch If we try to kill an already completed job with the following command, it gives the ambiguous message "Killed job <job id>": ./mapred job -kill <already completed job id>
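A minimal Java sketch of the behavior the report asks for, assuming the client can look up the job's state before issuing the kill: report "Killed job" only for a job that had not yet finished. `JobState` and `KillReporter` are hypothetical names, not the Hadoop client API.

```java
// Hypothetical job states; a real client would query the job's status.
enum JobState { PREP, RUNNING, SUCCEEDED, FAILED, KILLED }

final class KillReporter {
    // Builds the message printed after "job -kill", based on the state the
    // job was in when the kill was requested.
    static String killMessage(String jobId, JobState stateBeforeKill) {
        switch (stateBeforeKill) {
            case PREP:
            case RUNNING:
                return "Killed job " + jobId;
            default:
                // SUCCEEDED, FAILED and KILLED are terminal: nothing was killed.
                return "Could not kill job " + jobId
                        + ": job is already in state " + stateBeforeKill;
        }
    }
}
```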
[jira] [Commented] (MAPREDUCE-4275) Plugable process tree
[ https://issues.apache.org/jira/browse/MAPREDUCE-4275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13428279#comment-13428279 ] Bikas Saha commented on MAPREDUCE-4275: --- Thanks for incorporating my comments. +1. Minor typo in "unavailable" (spelled "unavaiable"):
{code}
+if (resourceCalculatorPlugin == null) {
+  LOG.info("ResourceCalculatorPlugin is unavaiable on this system. "
+      + this.getClass().getName() + " is disabled.");
+  return false;
+}
+if (ResourceCalculatorProcessTree.getResourceCalculatorProcessTree("0", processTreeClass, conf) == null) {
+  LOG.info("ResourceCalculatorProcessTree is unavaiable on this system. "
+      + this.getClass().getName() + " is disabled.");
{code}
Plugable process tree - Key: MAPREDUCE-4275 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4275 Project: Hadoop Map/Reduce Issue Type: Improvement Components: nodemanager Affects Versions: 3.0.0 Environment: FreeBSD 64 bit Reporter: Radim Kolar Attachments: plugable-pstree-1.txt, plugable-pstree-2.txt, plugable-pstree-3.txt, plugable-pstree-4-with-whitespace.txt, plugable-pstree-4.txt, plugable-pstree-5-with-whitespace.txt, plugable-pstree.txt Trunk version of Pluggable process tree. Work based on MAPREDUCE-4204
[jira] [Commented] (MAPREDUCE-4508) YARN needs to properly check the NM,AM memory properties in yarn-site.xml and mapred.xml and report errors accordingly.
[ https://issues.apache.org/jira/browse/MAPREDUCE-4508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13428280#comment-13428280 ] Anil Gupta commented on MAPREDUCE-4508: --- Hi Hitesh, If you think that MAPREDUCE-3796 will cover the test case of checking that yarn.nodemanager.resource.memory-mb > yarn.app.mapreduce.am.resource.mb and taking appropriate action, then you can close this as a duplicate of MAPREDUCE-3796. Thanks, Anil Gupta YARN needs to properly check the NM,AM memory properties in yarn-site.xml and mapred.xml and report errors accordingly. --- Key: MAPREDUCE-4508 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4508 Project: Hadoop Map/Reduce Issue Type: Bug Components: nodemanager, resourcemanager Affects Versions: 2.0.0-alpha Environment: CentOS 6.0, Hadoop 2.0.0 Alpha Reporter: Anil Gupta Labels: Map, Reduce, YARN Please refer to this discussion on the Hadoop Mailing list: http://comments.gmane.org/gmane.comp.jakarta.lucene.hadoop.user/33110 Summary: I was running YARN (Hadoop 2.0.0 Alpha) on an 8-datanode, 4-admin-node Hadoop/HBase cluster. My datanodes had only 3.2 GB of memory, so I configured the yarn.nodemanager.resource.memory-mb property in yarn-site.xml to 1200. After setting the property, if I run any YARN job, the NodeManager won't be able to start any Map task, since by default the yarn.app.mapreduce.am.resource.mb property is set to 1500 MB in mapred-site.xml. Expected Behavior: NodeManager should give an error if yarn.app.mapreduce.am.resource.mb >= yarn.nodemanager.resource.memory-mb. Please let me know if more information is required.
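The fail-fast check requested above can be sketched in Java, assuming a homogeneous cluster in which every NodeManager advertises the same `yarn.nodemanager.resource.memory-mb`. `MemoryConfigCheck` is hypothetical; it rejects only the case where the AM request cannot fit on any node at all, which is the situation the reporter hit.

```java
// Hypothetical fail-fast check: the MR AM requests more memory than any
// single NodeManager advertises (assumes all nodes are configured alike).
final class MemoryConfigCheck {
    static final String NM_MEMORY_KEY = "yarn.nodemanager.resource.memory-mb";
    static final String AM_MEMORY_KEY = "yarn.app.mapreduce.am.resource.mb";

    // Throws if the AM container could never be placed on any node.
    static void validate(int nodeManagerMemoryMb, int amResourceMb) {
        if (amResourceMb > nodeManagerMemoryMb) {
            throw new IllegalArgumentException(AM_MEMORY_KEY + " (" + amResourceMb
                    + " MB) exceeds " + NM_MEMORY_KEY + " (" + nodeManagerMemoryMb
                    + " MB); the ApplicationMaster container can never be allocated");
        }
    }
}
```

With the reporter's values, `validate(1200, 1500)` throws immediately instead of leaving the job waiting for a container that never arrives.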
[jira] [Commented] (MAPREDUCE-4495) Workflow Application Master in YARN
[ https://issues.apache.org/jira/browse/MAPREDUCE-4495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13428293#comment-13428293 ] Patrick Wendell commented on MAPREDUCE-4495: Just caught up with this - there are several issues being debated here simultaneously. It is really pointless to start arguing about them until we have a clear and thorough design doc along with a preliminary discussion of technical merit. This description needs a lot more color given the scope of the proposal. I agree with Arun - we should wait until that happens to continue discussion. Workflow Application Master in YARN --- Key: MAPREDUCE-4495 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4495 Project: Hadoop Map/Reduce Issue Type: New Feature Affects Versions: 2.0.0-alpha Reporter: Bo Wang Assignee: Bo Wang It is useful to have a workflow application master, which will be capable of running a DAG of jobs. The workflow client submits a DAG request to the AM and then the AM will manage the life cycle of this application in terms of requesting the needed resources from the RM, and starting, monitoring and retrying the application's individual tasks. Compared to running Oozie with the current MapReduce Application Master, these are some of the advantages: - Fewer consumed resources, since only one application master will be spawned for the whole workflow. - Reuse of resources, since the same resources can be used by multiple consecutive jobs in the workflow (no need to request/wait for resources for every individual job from the central RM). - More optimization opportunities in terms of collective resource requests. - Optimization opportunities in terms of rewriting and composing jobs in the workflow (e.g. pushing down Mappers). - This Application Master can be reused/extended by higher-level systems like Pig and Hive to provide an optimized way of running their workflows.
[jira] [Commented] (MAPREDUCE-4495) Workflow Application Master in YARN
[ https://issues.apache.org/jira/browse/MAPREDUCE-4495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13428297#comment-13428297 ] Alejandro Abdelnur commented on MAPREDUCE-4495: --- bq. Maybe we can wait for a detailed proposal from Bo or Alejandro and then revisit this discussion. I believe I have laid my thoughts out clearly with respect to the options etc. Let's discuss when we actually have something concrete (design or code). Sounds like a plan. Workflow Application Master in YARN --- Key: MAPREDUCE-4495 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4495 Project: Hadoop Map/Reduce Issue Type: New Feature Affects Versions: 2.0.0-alpha Reporter: Bo Wang Assignee: Bo Wang It is useful to have a workflow application master, which will be capable of running a DAG of jobs. The workflow client submits a DAG request to the AM and then the AM will manage the life cycle of this application in terms of requesting the needed resources from the RM, and starting, monitoring and retrying the application's individual tasks. Compared to running Oozie with the current MapReduce Application Master, these are some of the advantages: - Fewer consumed resources, since only one application master will be spawned for the whole workflow. - Reuse of resources, since the same resources can be used by multiple consecutive jobs in the workflow (no need to request/wait for resources for every individual job from the central RM). - More optimization opportunities in terms of collective resource requests. - Optimization opportunities in terms of rewriting and composing jobs in the workflow (e.g. pushing down Mappers). - This Application Master can be reused/extended by higher-level systems like Pig and Hive to provide an optimized way of running their workflows.
[jira] [Commented] (MAPREDUCE-4503) Should throw InvalidJobConfException if duplicates found in cacheArchives or cacheFiles
[ https://issues.apache.org/jira/browse/MAPREDUCE-4503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13428352#comment-13428352 ] Jonathan Eagles commented on MAPREDUCE-4503: +1 Should throw InvalidJobConfException if duplicates found in cacheArchives or cacheFiles --- Key: MAPREDUCE-4503 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4503 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2 Affects Versions: 0.23.3, 3.0.0, 2.2.0-alpha Reporter: Robert Joseph Evans Assignee: Robert Joseph Evans Attachments: MR-4503.txt, MR-4503.txt In 1.0, if a file was in both a job's cache archives and its cache files, an InvalidJobConfException was thrown. We should replicate this behavior in mrv2. We should also extend it so that if a cache archive or cache file is not going to be downloaded at all because of conflicts in the names of the symlinks, a similar exception is thrown.
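The two checks described in the issue can be sketched in Java, assuming hierarchical URIs and that a link name comes from the URI fragment (`x#z` links as `z`) or, failing that, the last path component. The local `InvalidJobConfException` class is a stand-in for Hadoop's exception of the same name so the sketch compiles on its own; `CacheConflictCheck` is hypothetical and is not the `MRApps` change that was committed.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Stand-in for org.apache.hadoop.mapred.InvalidJobConfException so that this
// sketch compiles without Hadoop on the classpath.
class InvalidJobConfException extends Exception {
    InvalidJobConfException(String message) {
        super(message);
    }
}

// Hypothetical validation of distributed-cache entries.
final class CacheConflictCheck {
    // Identifies the underlying resource, ignoring any "#link" fragment.
    static String resourceKey(URI uri) {
        String scheme = uri.getScheme();
        return (scheme == null ? "" : scheme + ":") + uri.getSchemeSpecificPart();
    }

    // Link name: the fragment if given (x#z -> z), else the last path component.
    static String linkName(URI uri) {
        if (uri.getFragment() != null) {
            return uri.getFragment();
        }
        String path = uri.getPath();
        return path.substring(path.lastIndexOf('/') + 1);
    }

    static void check(List<URI> files, List<URI> archives) throws InvalidJobConfException {
        // 1. The same resource may not be both a cache file and a cache archive.
        Set<String> fileResources = new HashSet<>();
        for (URI f : files) {
            fileResources.add(resourceKey(f));
        }
        for (URI a : archives) {
            if (fileResources.contains(resourceKey(a))) {
                throw new InvalidJobConfException(
                        resourceKey(a) + " is both a cache file and a cache archive");
            }
        }
        // 2. Two different resources may not claim the same symlink name.
        Map<String, String> linkOwner = new HashMap<>();
        List<URI> all = new ArrayList<>(files);
        all.addAll(archives);
        for (URI u : all) {
            String link = linkName(u);
            String owner = linkOwner.putIfAbsent(link, resourceKey(u));
            if (owner != null && !owner.equals(resourceKey(u))) {
                throw new InvalidJobConfException("symlink " + link + " is claimed by both "
                        + owner + " and " + resourceKey(u));
            }
        }
    }
}
```

Note that linking the same resource under several names (`x`, `x#z`) passes both checks; only genuine conflicts are rejected.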
[jira] [Updated] (MAPREDUCE-4503) Should throw InvalidJobConfException if duplicates found in cacheArchives or cacheFiles
[ https://issues.apache.org/jira/browse/MAPREDUCE-4503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Eagles updated MAPREDUCE-4503: --- Resolution: Fixed Fix Version/s: 2.2.0-alpha 3.0.0 0.23.3 Status: Resolved (was: Patch Available) Looks great. Thanks, Bobby. Should throw InvalidJobConfException if duplicates found in cacheArchives or cacheFiles --- Key: MAPREDUCE-4503 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4503 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2 Affects Versions: 0.23.3, 3.0.0, 2.2.0-alpha Reporter: Robert Joseph Evans Assignee: Robert Joseph Evans Fix For: 0.23.3, 3.0.0, 2.2.0-alpha Attachments: MR-4503.txt, MR-4503.txt In 1.0, if a file was in both a job's cache archives and its cache files, an InvalidJobConfException was thrown. We should replicate this behavior in mrv2. We should also extend it so that if a cache archive or cache file is not going to be downloaded at all because of conflicts in the names of the symlinks, a similar exception is thrown.
[jira] [Commented] (MAPREDUCE-4323) NM leaks sockets
[ https://issues.apache.org/jira/browse/MAPREDUCE-4323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13428392#comment-13428392 ] Daryn Sharp commented on MAPREDUCE-4323: {{FileSystem.closeAllForUGI}} is actually a reasonable approach. Each request is creating a new ugi so there's no issue with pulling the rug out from underneath other fs users. NM leaks sockets Key: MAPREDUCE-4323 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4323 Project: Hadoop Map/Reduce Issue Type: Bug Components: nodemanager Affects Versions: 0.23.0, 0.24.0, 2.0.0-alpha Reporter: Daryn Sharp Priority: Critical The NM is exhausting its fds because it's not closing fs instances when the app is finished.
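The leak and the suggested fix can be illustrated with a simplified, hypothetical Java model: if cached FileSystem instances are keyed by URI plus UGI identity, every request that creates a new UGI adds entries that are never reused, and closing all entries for that UGI frees only that request's handles. `FsCacheModel` and `Handle` are illustrative stand-ins, not Hadoop's `FileSystem.Cache`.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Simplified, hypothetical model of a FileSystem cache keyed by (URI, user
// identity). It only shows why closing every entry for a request-scoped UGI
// cannot affect other callers.
final class FsCacheModel {
    // Stand-in for an open FileSystem instance that holds a socket.
    static final class Handle {
        final String uri;
        final Object ugi;
        boolean closed;

        Handle(String uri, Object ugi) {
            this.uri = uri;
            this.ugi = ugi;
        }
    }

    private final List<Handle> cache = new ArrayList<>();

    // Returns the cached handle for (uri, ugi); UGIs are compared by identity,
    // so a fresh UGI per request always gets fresh handles.
    synchronized Handle get(String uri, Object ugi) {
        for (Handle h : cache) {
            if (h.uri.equals(uri) && h.ugi == ugi) {
                return h;
            }
        }
        Handle h = new Handle(uri, ugi);
        cache.add(h);
        return h;
    }

    // Closes and evicts every handle opened for this ugi; returns how many.
    synchronized int closeAllFor(Object ugi) {
        int count = 0;
        for (Iterator<Handle> it = cache.iterator(); it.hasNext(); ) {
            Handle h = it.next();
            if (h.ugi == ugi) {
                h.closed = true;
                it.remove();
                count++;
            }
        }
        return count;
    }

    synchronized int size() {
        return cache.size();
    }
}
```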
[jira] [Assigned] (MAPREDUCE-4323) NM leaks sockets
[ https://issues.apache.org/jira/browse/MAPREDUCE-4323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daryn Sharp reassigned MAPREDUCE-4323: -- Assignee: Daryn Sharp NM leaks sockets Key: MAPREDUCE-4323 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4323 Project: Hadoop Map/Reduce Issue Type: Bug Components: nodemanager Affects Versions: 0.23.0, 0.24.0, 2.0.0-alpha Reporter: Daryn Sharp Assignee: Daryn Sharp Priority: Critical The NM is exhausting its fds because it's not closing fs instances when the app is finished.
[jira] [Updated] (MAPREDUCE-4466) Using URI for yarn.nodemanager log dirs fails
[ https://issues.apache.org/jira/browse/MAPREDUCE-4466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated MAPREDUCE-4466: -- Fix Version/s: (was: trunk) Target Version/s: 2.2.0-alpha Affects Version/s: 0.23.3 Status: Open (was: Patch Available) Looks like actual log rendering will also be broken - further up in ContainerLogsPage {{new File(this.dirsHandler.getLogPathToRead(}}. Also, changing {{getContainerLogDirs}} may be a cleaner fix. If testNMWebServer.testNMWebApp is modified to use file://, it ends up creating a dir structure with file:// being the top-level directory under the current working dir. That could be modified to verify the patch. All access to the local-dirs and log-dirs happens via the LocalDirsHandlerService - maybe we should have this convert URIs to simple strings. file:// works in other places since {{Path}} is used there instead of {{File}}. Using URI for yarn.nodemanager log dirs fails - Key: MAPREDUCE-4466 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4466 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 0.23.3 Reporter: Eli Collins Assignee: Mayank Bansal Priority: Minor Attachments: MAPREDUCE-4466-trunk-v1.patch If I use URIs (e.g. file:///home/eli/hadoop/dirs) for yarn.nodemanager.log-dirs or yarn.nodemanager.remote-app-log-dir, the container log servlet fails with an NPE (it works if I remove the file scheme). Using a URI for yarn.nodemanager.local-dirs works.
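One way to "convert URIs to simple strings" for `java.io.File`-based code, sketched in Java: accept either a bare path or a `file://` URI and return a plain local path. `LocalDirNormalizer.toLocalPath` is hypothetical, not a LocalDirsHandlerService method, and it assumes Unix-style paths; Windows drive letters would need extra handling.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper: turns a configured local dir that may be written as a
// plain path or as a file:// URI into a plain path that java.io.File accepts.
final class LocalDirNormalizer {
    static String toLocalPath(String configured) {
        URI uri;
        try {
            uri = new URI(configured);
        } catch (URISyntaxException e) {
            return configured; // not URI syntax at all; treat as a plain path
        }
        if (uri.getScheme() == null) {
            return configured; // already a plain path such as /var/log/yarn
        }
        if ("file".equals(uri.getScheme())) {
            return uri.getPath(); // file:///home/eli/hadoop/dirs -> /home/eli/hadoop/dirs
        }
        throw new IllegalArgumentException("not a local directory: " + configured);
    }
}
```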
[jira] [Commented] (MAPREDUCE-4503) Should throw InvalidJobConfException if duplicates found in cacheArchives or cacheFiles
[ https://issues.apache.org/jira/browse/MAPREDUCE-4503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13428398#comment-13428398 ] Hudson commented on MAPREDUCE-4503: --- Integrated in Hadoop-Mapreduce-trunk-Commit #2572 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/2572/]) MAPREDUCE-4503. Should throw InvalidJobConfException if duplicates found in cacheArchives or cacheFiles (Robert Evans via jeagles) (Revision 1369197) Result = FAILURE jeagles : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1369197 Files : * /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common/src/main/java/org/apache/hadoop/mapreduce/v2/util/MRApps.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common/src/test/java/org/apache/hadoop/mapreduce/v2/util/TestMRApps.java Should throw InvalidJobConfException if duplicates found in cacheArchives or cacheFiles --- Key: MAPREDUCE-4503 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4503 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2 Affects Versions: 0.23.3, 3.0.0, 2.2.0-alpha Reporter: Robert Joseph Evans Assignee: Robert Joseph Evans Fix For: 0.23.3, 3.0.0, 2.2.0-alpha Attachments: MR-4503.txt, MR-4503.txt In 1.0, if a file was in both a job's cache archives and its cache files, an InvalidJobConfException was thrown. We should replicate this behavior in mrv2. We should also extend it so that if a cache archive or cache file is not going to be downloaded at all because of conflicts in the names of the symlinks, a similar exception is thrown.
[jira] [Created] (MAPREDUCE-4514) Symlinks to peer distributed cache files no longer work
Jason Lowe created MAPREDUCE-4514: - Summary: Symlinks to peer distributed cache files no longer work Key: MAPREDUCE-4514 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4514 Project: Hadoop Map/Reduce Issue Type: Bug Components: distributed-cache, mrv2 Affects Versions: 0.23.3, 2.0.1-alpha Reporter: Jason Lowe Assignee: Jason Lowe Trying to create a symlink to another file that is specified for the distributed cache will fail to create the link. For example: hadoop jar ... -files x,y,x#z will localize the files x and y as x and y, but the z symlink for x will not be created. This is a regression from 1.x behavior.
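The (source, link) pairs that the `-files x,y,x#z` example should produce can be sketched in Java, assuming the value is a comma-separated list in which an optional `#name` fragment sets the link name and the base name is used otherwise. `FilesOptionParser` is hypothetical, not Hadoop's `GenericOptionsParser`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical parser for a -files style argument such as "x,y,x#z".
final class FilesOptionParser {
    // One localization request: the source file and the link name to create.
    static final class Entry {
        final String source;
        final String link;

        Entry(String source, String link) {
            this.source = source;
            this.link = link;
        }

        @Override
        public String toString() {
            return source + "->" + link;
        }
    }

    static List<Entry> parse(String filesArg) {
        List<Entry> entries = new ArrayList<>();
        for (String item : filesArg.split(",")) {
            int hash = item.indexOf('#');
            String source = hash < 0 ? item : item.substring(0, hash);
            // Without a fragment, the link defaults to the file's base name.
            String link = hash < 0 ? source.substring(source.lastIndexOf('/') + 1)
                                   : item.substring(hash + 1);
            entries.add(new Entry(source, link));
        }
        return entries;
    }
}
```

For `x,y,x#z` this yields `[x->x, y->y, x->z]`: two entries share the source `x`, and the regression described above is that the `x->z` entry never materializes as a link.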
[jira] [Commented] (MAPREDUCE-3902) MR AM should reuse containers for map tasks, there-by allowing fine-grained control on num-maps for users without need for CombineFileInputFormat etc.
[ https://issues.apache.org/jira/browse/MAPREDUCE-3902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13428412#comment-13428412 ] Siddharth Seth commented on MAPREDUCE-3902: --- @Tsuyoshi; I'd spoken with Vinod and others about this a while ago. Should have posted this earlier. Adding the functionality to the AM in its current state is possible, but it will further complicate some components which are already quite complicated and tough to change. The TaskAttempt state machine is currently really a mix of TaskAttempt transitions as well as Container transitions. The RMContainerAllocator is also dealing with more than it should: Nodes, Containers as well as scheduling. The idea was to split the functionality into separate TaskAttempt, Container and Node state machines, along with reduced functionality in the scheduler (also decoupling the RM request and AM scheduling). This would make the code cleaner and make re-use (as well as other improvements like handling retired nodes) easier to implement. I had worked with Vinod on the state transitions, and have been working on the implementation in bits and pieces to see how feasible it is. The code is at https://github.com/sidseth/h2-container-reuse . It's a little bit of a mess at the moment, with lots of TODOs, etc. splattered all over, but is just about functional. There's no explicit re-use scheduling yet, but re-use can be tested by running a job which requires more containers than are available on the cluster (and some config changes). bq. the 2nd topic(combining per container) should be moved, because the change seems to be too big. I believe this was, at least initially, meant to ensure that output from all taskAttempts in one container would be fetched only once by a reducer (without a common combiner). Either way, that could be a separate jira.
MR AM should reuse containers for map tasks, there-by allowing fine-grained control on num-maps for users without need for CombineFileInputFormat etc. -- Key: MAPREDUCE-3902 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3902 Project: Hadoop Map/Reduce Issue Type: Improvement Components: applicationmaster, mrv2 Reporter: Arun C Murthy Assignee: Siddharth Seth Attachments: MAPREDUCE-3902.2.patch, MAPREDUCE-3902.patch The MR AM is now in a great position to reuse containers across (map) tasks. This is similar to the JVM re-use we had in 0.20.x, but in a significantly better manner: # Consider data-locality when re-using containers # Consider the new shuffle - ensure that reduces fetch output of the whole container at once (i.e. all maps)
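The split described in the comment, with a Container state machine kept separate from TaskAttempt state, might look roughly like the following Java sketch. The states and transitions are assumptions for illustration only, not those in the linked h2-container-reuse branch; reuse corresponds to the `IDLE -> RUNNING_TASK` edge.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

// Hypothetical container lifecycle, tracked separately from TaskAttempt state.
// IDLE -> RUNNING_TASK is the "reuse" edge: a finished task frees the
// container for another task instead of releasing it to the RM.
enum ContainerState { ALLOCATED, LAUNCHED, RUNNING_TASK, IDLE, COMPLETED }

final class ContainerStateMachine {
    private static final Map<ContainerState, Set<ContainerState>> ALLOWED =
            new EnumMap<>(ContainerState.class);

    static {
        ALLOWED.put(ContainerState.ALLOCATED,
                EnumSet.of(ContainerState.LAUNCHED, ContainerState.COMPLETED));
        ALLOWED.put(ContainerState.LAUNCHED,
                EnumSet.of(ContainerState.RUNNING_TASK, ContainerState.COMPLETED));
        ALLOWED.put(ContainerState.RUNNING_TASK,
                EnumSet.of(ContainerState.IDLE, ContainerState.COMPLETED));
        ALLOWED.put(ContainerState.IDLE,
                EnumSet.of(ContainerState.RUNNING_TASK, ContainerState.COMPLETED));
        ALLOWED.put(ContainerState.COMPLETED, EnumSet.noneOf(ContainerState.class));
    }

    private ContainerState state = ContainerState.ALLOCATED;
    private int tasksRun;

    ContainerState state() { return state; }

    int tasksRun() { return tasksRun; }

    // Moves to 'next', rejecting transitions the lifecycle does not allow.
    void transition(ContainerState next) {
        if (!ALLOWED.get(state).contains(next)) {
            throw new IllegalStateException(state + " -> " + next);
        }
        if (next == ContainerState.RUNNING_TASK) {
            tasksRun++;
        }
        state = next;
    }
}
```

Keeping these transitions out of the TaskAttempt machine means a task attempt only needs to know which container it was assigned, while scheduling policy (for example, preferring an idle container on a data-local node) can decide when to take the reuse edge.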
[jira] [Commented] (MAPREDUCE-4503) Should throw InvalidJobConfException if duplicates found in cacheArchives or cacheFiles
[ https://issues.apache.org/jira/browse/MAPREDUCE-4503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13428422#comment-13428422 ] Hudson commented on MAPREDUCE-4503: --- Integrated in Hadoop-Hdfs-trunk-Commit #2618 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/2618/]) MAPREDUCE-4503. Should throw InvalidJobConfException if duplicates found in cacheArchives or cacheFiles (Robert Evans via jeagles) (Revision 1369197) Result = SUCCESS jeagles : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1369197 Files : * /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common/src/main/java/org/apache/hadoop/mapreduce/v2/util/MRApps.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common/src/test/java/org/apache/hadoop/mapreduce/v2/util/TestMRApps.java Should throw InvalidJobConfException if duplicates found in cacheArchives or cacheFiles --- Key: MAPREDUCE-4503 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4503 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2 Affects Versions: 0.23.3, 3.0.0, 2.2.0-alpha Reporter: Robert Joseph Evans Assignee: Robert Joseph Evans Fix For: 0.23.3, 3.0.0, 2.2.0-alpha Attachments: MR-4503.txt, MR-4503.txt In 1.0, if a file was in both a job's cache archives and its cache files, an InvalidJobConfException was thrown. We should replicate this behavior in mrv2. We should also extend it so that if a cache archive or cache file is not going to be downloaded at all because of conflicts in the names of the symlinks, a similar exception is thrown.
[jira] [Commented] (MAPREDUCE-4503) Should throw InvalidJobConfException if duplicates found in cacheArchives or cacheFiles
[ https://issues.apache.org/jira/browse/MAPREDUCE-4503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13428425#comment-13428425 ] Hudson commented on MAPREDUCE-4503: --- Integrated in Hadoop-Common-trunk-Commit #2553 (See [https://builds.apache.org/job/Hadoop-Common-trunk-Commit/2553/]) MAPREDUCE-4503. Should throw InvalidJobConfException if duplicates found in cacheArchives or cacheFiles (Robert Evans via jeagles) (Revision 1369197) Result = SUCCESS jeagles : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1369197 Files : * /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common/src/main/java/org/apache/hadoop/mapreduce/v2/util/MRApps.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common/src/test/java/org/apache/hadoop/mapreduce/v2/util/TestMRApps.java Should throw InvalidJobConfException if duplicates found in cacheArchives or cacheFiles --- Key: MAPREDUCE-4503 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4503 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2 Affects Versions: 0.23.3, 3.0.0, 2.2.0-alpha Reporter: Robert Joseph Evans Assignee: Robert Joseph Evans Fix For: 0.23.3, 3.0.0, 2.2.0-alpha Attachments: MR-4503.txt, MR-4503.txt In 1.0, if a file was in both a job's cache archives and its cache files, an InvalidJobConfException was thrown. We should replicate this behavior in mrv2. We should also extend it so that if a cache archive or cache file is not going to be downloaded at all because of conflicts in the names of the symlinks, a similar exception is thrown.
[jira] [Commented] (MAPREDUCE-4495) Workflow Application Master in YARN
[ https://issues.apache.org/jira/browse/MAPREDUCE-4495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13428429#comment-13428429 ] eric baldeschwieler commented on MAPREDUCE-4495: Agree with discussing a particular proposal. I want to point out that the whole point of YARN is to open up the ability to try lots of different changes to MR and to implement lots of alternatives to it in parallel. As a community, we need to be clear that to move fast we need to let lots of different people try lots of different things on top of a stable platform. Pig and Hive folks want to radically change what MR is. There are lots of different ideas for how to do this. With open APIs, everyone is empowered to try new things without asking to get their code into the core project. If we don't embrace the principle of new AMs starting outside the core, we are going to have a huge number of arguments like this without making anyone happy. That's not the best way for us to spend our time. I'm not trying to stop anyone from trying anything; I'm trying to reduce friction. My last point is the overhead argument. Arguing that one doesn't want to go to the Incubator because that adds cost to one's project really doesn't look at the whole picture. Adding a new module or sub-project to an existing Apache project creates as much work as doing it in the Incubator. It just tosses that work into the lap of the folks maintaining the existing project. When one talks about Apache being about community before code, that doesn't mean one has a right to do anything in the code. One first needs to build consensus that one's coding idea is aligned with the community. Any time you add something to a project, you are implicitly asking the others in the community to do a lot of work to support you. That only makes sense if you are working in a direction that the community sees as aligned with the larger goals of the project.
Going full circle, Yarn's open APIs have as a goal allowing people to try a lot more things much less expensively. They don't need to get permission to merge their work into MR, which is good for experimenters. Hadoop committers are not burdened with vetting and support many different experiments in Hadoop. The experimenters carry the burden of building community and supporting / selling their ideas. This should save us a lot of time arguing on this list! ;-) Workflow Application Master in YARN --- Key: MAPREDUCE-4495 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4495 Project: Hadoop Map/Reduce Issue Type: New Feature Affects Versions: 2.0.0-alpha Reporter: Bo Wang Assignee: Bo Wang It is useful to have a workflow application master, which will be capable of running a DAG of jobs. The workflow client submits a DAG request to the AM and then the AM will manage the life cycle of this application in terms of requesting the needed resources from the RM, and starting, monitoring and retrying the application's individual tasks. Compared to running Oozie with the current MapReduce Application Master, these are some of the advantages: - Less number of consumed resources, since only one application master will be spawned for the whole workflow. - Reuse of resources, since the same resources can be used by multiple consecutive jobs in the workflow (no need to request/wait for resources for every individual job from the central RM). - More optimization opportunities in terms of collective resource requests. - Optimization opportunities in terms of rewriting and composing jobs in the workflow (e.g. pushing down Mappers). - This Application Master can be reused/extended by higher systems like Pig and hive to provide an optimized way of running their workflows. -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
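Bo Wang's proposal centers on an AM that runs a DAG of jobs. The minimal Java sketch below covers only the ordering part of that idea: given a hypothetical map from each job to the jobs it depends on, Kahn's algorithm produces an order in which every job starts after its prerequisites, and any job left over signals a cycle. The job names are invented for illustration; resource requests, retries and the RM protocol are out of scope.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class WorkflowOrder {

    /** deps maps each (hypothetical) job name to the jobs it must wait for. */
    static List<String> topologicalOrder(Map<String, List<String>> deps) {
        Map<String, Integer> pending = new HashMap<>();          // unmet dependency count per job
        Map<String, List<String>> dependents = new HashMap<>();  // reverse edges
        for (Map.Entry<String, List<String>> e : deps.entrySet()) {
            pending.putIfAbsent(e.getKey(), 0);
            for (String dep : e.getValue()) {
                pending.putIfAbsent(dep, 0);
                pending.merge(e.getKey(), 1, Integer::sum);
                dependents.computeIfAbsent(dep, k -> new ArrayList<>()).add(e.getKey());
            }
        }
        Deque<String> ready = new ArrayDeque<>();
        // Seed with jobs that have no dependencies; TreeMap keeps the order deterministic.
        new TreeMap<>(pending).forEach((job, n) -> { if (n == 0) ready.add(job); });
        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String job = ready.poll();
            order.add(job);
            // Finishing this job may make its dependents runnable.
            for (String next : dependents.getOrDefault(job, List.of())) {
                if (pending.merge(next, -1, Integer::sum) == 0) ready.add(next);
            }
        }
        if (order.size() != pending.size()) {
            throw new IllegalArgumentException("workflow contains a cycle");
        }
        return order;
    }

    public static void main(String[] args) {
        Map<String, List<String>> deps = new LinkedHashMap<>();
        deps.put("extract", List.of());
        deps.put("join", List.of("extract", "lookup"));
        deps.put("lookup", List.of());
        deps.put("report", List.of("join"));
        System.out.println(topologicalOrder(deps)); // [extract, lookup, join, report]
    }
}
```

A real workflow AM would more likely launch every job whose dependencies are satisfied concurrently instead of walking a single list; the `ready` queue is where that parallelism would come from.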
[jira] [Updated] (MAPREDUCE-4275) Plugable process tree
[ https://issues.apache.org/jira/browse/MAPREDUCE-4275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Radim Kolar updated MAPREDUCE-4275: --- Attachment: plugable-pstree-6-typofix.txt typo fixed Plugable process tree - Key: MAPREDUCE-4275 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4275 Project: Hadoop Map/Reduce Issue Type: Improvement Components: nodemanager Affects Versions: 3.0.0 Environment: FreeBSD 64 bit Reporter: Radim Kolar Attachments: plugable-pstree-1.txt, plugable-pstree-2.txt, plugable-pstree-3.txt, plugable-pstree-4-with-whitespace.txt, plugable-pstree-4.txt, plugable-pstree-5-with-whitespace.txt, plugable-pstree-6-typofix.txt, plugable-pstree.txt Trunk version of Pluggable process tree. Work based on MAPREDUCE-4204
[jira] [Commented] (MAPREDUCE-4514) Symlinks to peer distributed cache files no longer work
[ https://issues.apache.org/jira/browse/MAPREDUCE-4514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13428472#comment-13428472 ] Jason Lowe commented on MAPREDUCE-4514: --- This also breaks when trying to create multiple symlinks to the same file, e.g.: {{x#a,x#b,x#c}} only creates the symlink for {{a}} instead of all three. The problem is Container holds a map from resource Path to symlink String, but there could be multiple symlinks to the same source Path. Symlinks to peer distributed cache files no longer work --- Key: MAPREDUCE-4514 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4514 Project: Hadoop Map/Reduce Issue Type: Bug Components: distributed-cache, mrv2 Affects Versions: 0.23.3, 2.0.1-alpha Reporter: Jason Lowe Assignee: Jason Lowe Trying to create a symlink to another file that is specified for the distributed cache will fail to create the link. For example: hadoop jar ... -files x,y,x#z will localize the files x and y as x and y, but the z symlink for x will not be created. This is a regression from 1.x behavior.
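To make the failure mode in Jason's comment concrete, here is a hypothetical Java sketch (not the NodeManager's Container code) that collects `-files` entries into a map keyed by source path. It is written so that the first link for a source wins, which is an assumption chosen to match the reported symptoms: `x,y,x#z` loses `z`, and `x#a,x#b,x#c` keeps only `a`. Plain strings stand in for `Path` objects.

```java
import java.util.HashMap;
import java.util.Map;

public class SymlinkCollision {

    /** Parses "src" or "src#link" entries; without '#', the link name is the source name. */
    static Map<String, String> linksBySource(String filesArg) {
        Map<String, String> links = new HashMap<>();
        for (String entry : filesArg.split(",")) {
            int hash = entry.indexOf('#');
            String src = hash < 0 ? entry : entry.substring(0, hash);
            String link = hash < 0 ? entry : entry.substring(hash + 1);
            // Only one link fits per source key, so any further link for src is silently dropped.
            links.putIfAbsent(src, link);
        }
        return links;
    }

    public static void main(String[] args) {
        System.out.println(linksBySource("x,y,x#z").get("x"));     // x (the z link is lost)
        System.out.println(linksBySource("x#a,x#b,x#c").size());   // 1 (only a survives)
    }
}
```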
[jira] [Created] (MAPREDUCE-4515) Add test to check if userlogs are retained across TaskTracker restarts
Karthik Kambatla created MAPREDUCE-4515: --- Summary: Add test to check if userlogs are retained across TaskTracker restarts Key: MAPREDUCE-4515 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4515 Project: Hadoop Map/Reduce Issue Type: Test Reporter: Karthik Kambatla Assignee: Karthik Kambatla
[jira] [Updated] (MAPREDUCE-4514) Symlinks to peer distributed cache files no longer work
[ https://issues.apache.org/jira/browse/MAPREDUCE-4514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated MAPREDUCE-4514: -- Attachment: MAPREDUCE-4514.patch Patch that changes Container to map pending and localized resources to List<String> instead of String so resources can have multiple symlink destinations. Symlinks to peer distributed cache files no longer work --- Key: MAPREDUCE-4514 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4514 Project: Hadoop Map/Reduce Issue Type: Bug Components: distributed-cache, mrv2 Affects Versions: 0.23.3, 2.0.1-alpha Reporter: Jason Lowe Assignee: Jason Lowe Attachments: MAPREDUCE-4514.patch Trying to create a symlink to another file that is specified for the distributed cache will fail to create the link. For example: hadoop jar ... -files x,y,x#z will localize the files x and y as x and y, but the z symlink for x will not be created. This is a regression from 1.x behavior.
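The patch description says Container now maps each resource to a `List<String>` of symlink names. A minimal Java sketch of that idea, assuming plain strings in place of `Path` and a hypothetical parser for the `-files` syntax (this is not the patch itself), shows every link surviving:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class MultiSymlink {

    /** Parses "src" or "src#link" entries, keeping every link name per source. */
    static Map<String, List<String>> linksBySource(String filesArg) {
        Map<String, List<String>> links = new LinkedHashMap<>();
        for (String entry : filesArg.split(",")) {
            int hash = entry.indexOf('#');
            String src = hash < 0 ? entry : entry.substring(0, hash);
            String link = hash < 0 ? entry : entry.substring(hash + 1);
            // One source can now carry several symlink destinations.
            links.computeIfAbsent(src, k -> new ArrayList<>()).add(link);
        }
        return links;
    }

    public static void main(String[] args) {
        System.out.println(linksBySource("x,y,x#z"));      // {x=[x, z], y=[y]}
        System.out.println(linksBySource("x#a,x#b,x#c"));  // {x=[a, b, c]}
    }
}
```

Switching the value type to a list is the smallest change that removes the collision; the cost is that every consumer of the map has to iterate over the links for a resource instead of reading a single one.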
[jira] [Commented] (MAPREDUCE-4367) mapred job -kill tries to connect to history server
[ https://issues.apache.org/jira/browse/MAPREDUCE-4367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13428482#comment-13428482 ] Mayank Bansal commented on MAPREDUCE-4367: -- I don't see this in trunk. Is it still an issue? Thanks, Mayank mapred job -kill tries to connect to history server --- Key: MAPREDUCE-4367 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4367 Project: Hadoop Map/Reduce Issue Type: Bug Components: client, mrv2 Affects Versions: 0.23.3 Reporter: Jason Lowe Priority: Minor The {{mapred job -kill}} command attempts to connect to the history server, even though it is unrelated to the process of killing a job.
[jira] [Updated] (MAPREDUCE-4514) Symlinks to peer distributed cache files no longer work
[ https://issues.apache.org/jira/browse/MAPREDUCE-4514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated MAPREDUCE-4514: -- Target Version/s: 0.23.3, 2.2.0-alpha Status: Patch Available (was: Open) Symlinks to peer distributed cache files no longer work --- Key: MAPREDUCE-4514 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4514 Project: Hadoop Map/Reduce Issue Type: Bug Components: distributed-cache, mrv2 Affects Versions: 0.23.3, 2.0.1-alpha Reporter: Jason Lowe Assignee: Jason Lowe Attachments: MAPREDUCE-4514.patch Trying to create a symlink to another file that is specified for the distributed cache will fail to create the link. For example: hadoop jar ... -files x,y,x#z will localize the files x and y as x and y, but the z symlink for x will not be created. This is a regression from 1.x behavior.
[jira] [Commented] (MAPREDUCE-4367) mapred job -kill tries to connect to history server
[ https://issues.apache.org/jira/browse/MAPREDUCE-4367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13428492#comment-13428492 ] Jason Lowe commented on MAPREDUCE-4367: --- Yes, it's still happening for me. From a recent trunk pull on a single-node cluster where the history server isn't running yet:
{noformat}
$ mapred job -kill job_1344038428359_0002
2012-08-04 00:09:56,871 INFO mapred.ClientServiceDelegate (ClientServiceDelegate.java:getProxy(255)) - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2012-08-04 00:09:57,886 INFO ipc.Client (Client.java:handleConnectionFailure(715)) - Retrying connect to server: includespoke.champ.corp.yahoo.com/10.74.91.112:10020. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2012-08-04 00:09:58,887 INFO ipc.Client (Client.java:handleConnectionFailure(715)) - Retrying connect to server: includespoke.champ.corp.yahoo.com/10.74.91.112:10020. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2012-08-04 00:09:59,890 INFO ipc.Client (Client.java:handleConnectionFailure(715)) - Retrying connect to server: includespoke.champ.corp.yahoo.com/10.74.91.112:10020. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2012-08-04 00:10:00,891 INFO ipc.Client (Client.java:handleConnectionFailure(715)) - Retrying connect to server: includespoke.champ.corp.yahoo.com/10.74.91.112:10020. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
...
{noformat}
And here's what it says after I start the history server:
{noformat}
$ mapred job -kill job_1344038428359_0002
2012-08-04 00:12:52,226 INFO mapred.ClientServiceDelegate (ClientServiceDelegate.java:getProxy(255)) - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2012-08-04 00:12:53,195 INFO mapred.ResourceMgrDelegate (ResourceMgrDelegate.java:killApplication(329)) - Killing application application_1344038428359_0002
Killed job job_1344038428359_0002
{noformat}
Note that in both cases it says the application state is completed and is redirecting. If the application state is completed, there's no point in redirecting to the history server if we're trying to kill the application. Knowing the application state is completed means we can short-circuit the kill attempt before the redirect. mapred job -kill tries to connect to history server --- Key: MAPREDUCE-4367 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4367 Project: Hadoop Map/Reduce Issue Type: Bug Components: client, mrv2 Affects Versions: 0.23.3 Reporter: Jason Lowe Priority: Minor The {{mapred job -kill}} command attempts to connect to the history server, even though it is unrelated to the process of killing a job.
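Jason's suggestion amounts to checking the state the ResourceManager has already reported before doing anything else. A minimal Java sketch of that short-circuit, in which `AppState` and the `killViaRm` callback are hypothetical stand-ins rather than the real `ClientServiceDelegate`/`ResourceMgrDelegate` API:

```java
public class KillShortCircuit {

    /** Hypothetical stand-in for the application states reported by the ResourceManager. */
    enum AppState { RUNNING, FINISHED, FAILED, KILLED }

    /**
     * Handles a kill request given the already-known application state.
     * killViaRm runs only for non-terminal states; the history server is never consulted.
     */
    static String kill(String jobId, AppState state, Runnable killViaRm) {
        switch (state) {
            case FINISHED:
            case FAILED:
            case KILLED:
                // Terminal state: nothing left to kill, so return before any redirect.
                return "Job " + jobId + " has already completed with state " + state;
            default:
                killViaRm.run();
                return "Killed job " + jobId;
        }
    }

    public static void main(String[] args) {
        Runnable rmKill = () -> System.out.println("(asking the ResourceManager to kill)");
        System.out.println(kill("job_1344038428359_0002", AppState.FINISHED, rmKill));
        System.out.println(kill("job_hypothetical_running", AppState.RUNNING, rmKill));
    }
}
```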
[jira] [Commented] (MAPREDUCE-4514) Symlinks to peer distributed cache files no longer work
[ https://issues.apache.org/jira/browse/MAPREDUCE-4514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13428522#comment-13428522 ] Hadoop QA commented on MAPREDUCE-4514: --
+1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12539121/MAPREDUCE-4514.patch against trunk revision .
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 3 new or modified test files.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 eclipse:eclipse. The patch built with eclipse:eclipse.
+1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed unit tests in hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.
+1 contrib tests. The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2707//testReport/
Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2707//console
This message is automatically generated.
Symlinks to peer distributed cache files no longer work --- Key: MAPREDUCE-4514 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4514 Project: Hadoop Map/Reduce Issue Type: Bug Components: distributed-cache, mrv2 Affects Versions: 0.23.3, 2.0.1-alpha Reporter: Jason Lowe Assignee: Jason Lowe Attachments: MAPREDUCE-4514.patch Trying to create a symlink to another file that is specified for the distributed cache will fail to create the link. For example: hadoop jar ... -files x,y,x#z will localize the files x and y as x and y, but the z symlink for x will not be created. This is a regression from 1.x behavior.
[jira] [Commented] (MAPREDUCE-4275) Plugable process tree
[ https://issues.apache.org/jira/browse/MAPREDUCE-4275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13428526#comment-13428526 ] Hadoop QA commented on MAPREDUCE-4275: --
+1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12539110/plugable-pstree-6-typofix.txt against trunk revision .
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 2 new or modified test files.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 eclipse:eclipse. The patch built with eclipse:eclipse.
+1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed unit tests in hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-common hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.
+1 contrib tests. The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2708//testReport/
Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2708//console
This message is automatically generated.
Plugable process tree - Key: MAPREDUCE-4275 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4275 Project: Hadoop Map/Reduce Issue Type: Improvement Components: nodemanager Affects Versions: 3.0.0 Environment: FreeBSD 64 bit Reporter: Radim Kolar Attachments: plugable-pstree-1.txt, plugable-pstree-2.txt, plugable-pstree-3.txt, plugable-pstree-4-with-whitespace.txt, plugable-pstree-4.txt, plugable-pstree-5-with-whitespace.txt, plugable-pstree-6-typofix.txt, plugable-pstree.txt Trunk version of Pluggable process tree. Work based on MAPREDUCE-4204
[jira] [Commented] (MAPREDUCE-4431) killing already completed job gives ambiguous message as Killed job <job id>
[ https://issues.apache.org/jira/browse/MAPREDUCE-4431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13428541#comment-13428541 ] Harsh J commented on MAPREDUCE-4431: +1, looks good to me too. One comment though (just want your thoughts):
{code}
+System.out.println("The job " + jobid + " has already been killed.");
+exitCode = -1;
{code}
In case the job was already killed, should we perhaps return a 0 exitCode (since the kill was (already) successful)? killing already completed job gives ambiguous message as Killed job <job id> -- Key: MAPREDUCE-4431 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4431 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2 Affects Versions: 2.0.0-alpha, 3.0.0 Reporter: Nishan Shetty Assignee: Devaraj K Priority: Minor Attachments: MAPREDUCE-4431-1.patch, MAPREDUCE-4431.patch If we try to kill an already completed job with the following command, it gives the ambiguous message "Killed job <job id>": ./mapred job -kill <already completed job id>
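Harsh's question is about the exit-code convention for `mapred job -kill`. A hypothetical Java sketch of the behavior he proposes, treating an already-killed job as a successful kill; `JobState`, the messages for other terminal states, and the -1 returned for jobs that completed some other way are assumptions, not taken from the patch:

```java
public class KillExitCode {

    /** Hypothetical job states as seen by the client before the kill is issued. */
    enum JobState { RUNNING, SUCCEEDED, FAILED, KILLED }

    /** Prints a message and returns the process exit code for a kill request. */
    static int killExitCode(String jobId, JobState stateBeforeKill) {
        switch (stateBeforeKill) {
            case KILLED:
                // The desired end state already holds, so report success.
                System.out.println("The job " + jobId + " has already been killed.");
                return 0;
            case SUCCEEDED:
            case FAILED:
                // The job ended some other way; the kill could not take effect.
                System.out.println("Could not kill job " + jobId + ": it already completed as " + stateBeforeKill + ".");
                return -1;
            default:
                // A real client would send the kill request here; assumed to succeed in this sketch.
                System.out.println("Killed job " + jobId);
                return 0;
        }
    }

    public static void main(String[] args) {
        System.out.println(killExitCode("job_example", JobState.KILLED));     // exit code 0
        System.out.println(killExitCode("job_example", JobState.SUCCEEDED));  // exit code -1
    }
}
```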
[jira] [Created] (MAPREDUCE-4516) Error reading task output Server returned HTTP response code: 400 for URL: http://hadoop03:8080/tasklog?plaintext=true&attemptid=attempt_1344047400780_0002_m_000000_0
jiafeng.zhang created MAPREDUCE-4516: Summary: Error reading task output Server returned HTTP response code: 400 for URL: http://hadoop03:8080/tasklog?plaintext=true&attemptid=attempt_1344047400780_0002_m_000000_0&filter=stdout Key: MAPREDUCE-4516 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4516 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 0.23.1 Environment: hadoop-0.23.1 JDK_1.6.0_31 Centos-6.0 Reporter: jiafeng.zhang Fix For: 0.23.1
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-0.23.1.jar teragen 100 /in_test
12/08/04 11:01:47 WARN conf.Configuration: fs.default.name is deprecated. Instead, use fs.defaultFS
12/08/04 11:01:47 WARN conf.Configuration: mapred.used.genericoptionsparser is deprecated. Instead, use mapreduce.client.genericoptionsparser.used
12/08/04 11:01:49 INFO terasort.TeraSort: Generating 100 using 2
12/08/04 11:01:50 INFO mapreduce.JobSubmitter: number of splits:2
12/08/04 11:01:52 INFO mapred.ResourceMgrDelegate: Submitted application application_1344047400780_0002 to ResourceManager at hadoop01/192.168.37.101:8032
12/08/04 11:01:52 INFO mapreduce.Job: The url to track the job: http://hadoop01:50030/proxy/application_1344047400780_0002/
12/08/04 11:01:52 INFO mapreduce.Job: Running job: job_1344047400780_0002
12/08/04 11:02:11 INFO mapreduce.Job: Job job_1344047400780_0002 running in uber mode : false
12/08/04 11:02:11 INFO mapreduce.Job: map 0% reduce 0%
12/08/04 11:02:19 INFO mapreduce.Job: Task Id : attempt_1344047400780_0002_m_000000_0, Status : FAILED
12/08/04 11:02:20 WARN mapreduce.Job: Error reading task output Server returned HTTP response code: 400 for URL: http://hadoop03:8080/tasklog?plaintext=true&attemptid=attempt_1344047400780_0002_m_000000_0&filter=stdout
12/08/04 11:02:20 WARN mapreduce.Job: Error reading task output Server returned HTTP response code: 400 for URL: http://hadoop03:8080/tasklog?plaintext=true&attemptid=attempt_1344047400780_0002_m_000000_0&filter=stderr
12/08/04 11:02:25 INFO mapreduce.Job: map 9% reduce 0%
12/08/04 11:02:30 INFO mapreduce.Job: map 13% reduce 0%
12/08/04 11:02:33 INFO mapreduce.Job: map 15% reduce 0%
12/08/04 11:02:40 INFO mapreduce.Job: map 17% reduce 0%
12/08/04 11:02:46 INFO mapreduce.Job: map 18% reduce 0%
12/08/04 11:02:52 INFO mapreduce.Job: map 25% reduce 0%
12/08/04 11:02:56 INFO mapreduce.Job: map 29% reduce 0%
12/08/04 11:03:01 INFO mapreduce.Job: map 31% reduce 0%
12/08/04 11:03:08 INFO mapreduce.Job: map 34% reduce 0%
12/08/04 11:03:11 INFO mapreduce.Job: map 38% reduce 0%
12/08/04 11:03:14 INFO mapreduce.Job: map 42% reduce 0%
12/08/04 11:03:15 INFO mapreduce.Job: map 46% reduce 0%
12/08/04 11:03:17 INFO mapreduce.Job: map 51% reduce 0%
12/08/04 11:03:18 INFO mapreduce.Job: map 55% reduce 0%
12/08/04 11:03:20 INFO mapreduce.Job: map 56% reduce 0%
12/08/04 11:03:24 INFO mapreduce.Job: map 58% reduce 0%
12/08/04 11:03:25 INFO mapreduce.Job: map 59% reduce 0%
12/08/04 11:03:26 INFO mapreduce.Job: map 62% reduce 0%
12/08/04 11:03:28 INFO mapreduce.Job: map 67% reduce 0%
12/08/04 11:03:29 INFO mapreduce.Job: map 71% reduce 0%
12/08/04 11:03:32 INFO mapreduce.Job: map 73% reduce 0%
12/08/04 11:03:33 INFO mapreduce.Job: map 74% reduce 0%
12/08/04 11:03:35 INFO mapreduce.Job: map 76% reduce 0%
12/08/04 11:03:36 INFO mapreduce.Job: map 78% reduce 0%
12/08/04 11:03:38 INFO mapreduce.Job: map 79% reduce 0%
12/08/04 11:03:39 INFO mapreduce.Job: map 81% reduce 0%
12/08/04 11:03:41 INFO mapreduce.Job: map 84% reduce 0%
12/08/04 11:03:44 INFO mapreduce.Job: map 87% reduce 0%
12/08/04 11:03:48 INFO mapreduce.Job: map 90% reduce 0%
12/08/04 11:03:51 INFO mapreduce.Job: map 100% reduce 0%
12/08/04 11:03:52 INFO mapreduce.Job: Job job_1344047400780_0002 completed successfully
12/08/04 11:03:52 INFO mapreduce.Job: Counters: 28
	File System Counters
		FILE: Number of bytes read=240
		FILE: Number of bytes written=118412
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=167
		HDFS: Number of bytes written=1
		HDFS: Number of read operations=8
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=4
	Job Counters
		Failed map tasks=1
		Launched map tasks=3
		Other local map tasks=3
		Total time spent by all maps in occupied slots (ms)=193607
	Map-Reduce Framework
		Map input records=100
		Map output records=100
		Input split bytes=167
		Spilled Records=0
		Failed Shuffles=0
		Merged