[jira] [Resolved] (MAPREDUCE-4365) Shipping Profiler Libraries by DistributedCache
[ https://issues.apache.org/jira/browse/MAPREDUCE-4365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jie Li resolved MAPREDUCE-4365.
---
Resolution: Fixed
Target Version/s: (was: 1.1.0)

One way is to include the profiler library in the job jar and use a relative path like ../../foo.library to locate it. Thanks Deveraj, Sid, Vinod and everyone!

Shipping Profiler Libraries by DistributedCache
---
Key: MAPREDUCE-4365
URL: https://issues.apache.org/jira/browse/MAPREDUCE-4365
Project: Hadoop Map/Reduce
Issue Type: New Feature
Affects Versions: 1.0.3
Reporter: Jie Li

Hadoop profiling is great for performance tuning and debugging, but currently we can only use Java built-in profilers such as HProf. Any other profiler must first be installed on all slave nodes, which is inconvenient for large clusters and sometimes impossible for production clusters. Supporting shipping profiler libraries using DistributedCache will solve this problem. For example, in mapred.task.profile.params, we specify a profiler library from the DistributedCache using special placeholders such as foo.jar, and Hadoop can look at the DistributedCache to replace foo.jar with the localized path before launching the child JVM.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
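As a rough illustration of the proposal in the description, the Java sketch below shows how a TaskTracker-side helper might rewrite mapred.task.profile.params before launching the child JVM. The placeholder syntax `${cache:NAME}`, the class `ProfileParamsResolver`, and the map of localized paths are all hypothetical; Hadoop 1.x has no such API, and a real implementation would have to obtain the localized paths from the DistributedCache entries already localized for the job.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical helper (not a Hadoop API): rewrites profile parameters such as
 * "-javaagent:${cache:foo.jar}" by substituting the localized path of the
 * named DistributedCache entry. The ${cache:NAME} syntax is an assumption.
 */
public final class ProfileParamsResolver {
    // Captures NAME in ${cache:NAME}; NAME may contain anything except '}'.
    private static final Pattern PLACEHOLDER =
            Pattern.compile("\\$\\{cache:([^}]+)\\}");

    private ProfileParamsResolver() {}

    /**
     * @param profileParams  value of mapred.task.profile.params
     * @param localizedPaths cache entry name -> local path on this TaskTracker
     * @return the parameters with every placeholder replaced
     * @throws IllegalArgumentException if a placeholder names an unknown entry
     */
    public static String resolve(String profileParams, Map<String, String> localizedPaths) {
        Matcher m = PLACEHOLDER.matcher(profileParams);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String name = m.group(1);
            String path = localizedPaths.get(name);
            if (path == null) {
                throw new IllegalArgumentException(
                        "No localized DistributedCache entry named " + name);
            }
            // quoteReplacement stops '$' or '\' in the path being read as group references.
            m.appendReplacement(out, Matcher.quoteReplacement(path));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> cache = Map.of("foo.jar", "/tmp/distcache/foo.jar");
        System.out.println(resolve("-javaagent:${cache:foo.jar}", cache));
    }
}
```

Running `main` prints `-javaagent:/tmp/distcache/foo.jar`. Text outside placeholders, including the `%s` that Hadoop substitutes with the profile output file, is left untouched. Failing on an unknown entry seems safer than starting the child JVM with an unresolved agent path.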
[jira] [Commented] (MAPREDUCE-4365) Shipping Profiler Libraries by DistributedCache
[ https://issues.apache.org/jira/browse/MAPREDUCE-4365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13401803#comment-13401803 ]

Jie Li commented on MAPREDUCE-4365:
---
Thanks Arun and Robert. I meant profiling tasks, and I'm actually using [Hadoop profiling|http://hadoop.apache.org/common/docs/r0.20.2/mapred_tutorial.html#Profiling] by setting mapred.task.profile.{maps|reduces}, so Hadoop will automatically send back the profiling output files. The reason your approach couldn't work is that it is currently the task's responsibility to set up the symlink for the distributed cache, so when the TT launches the task, the symlink is not set up yet. Note that TaskRunner#setupWorkDir is called in Child#main. So one solution is to create the symlink before launching tasks. Or, for this particular problem, should we replace the distributed cache entry found in the profiling parameters with the localized path?
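The first solution proposed in this comment, creating the symlinks on the TaskTracker before the child JVM starts instead of in Child#main, might look roughly like the Java sketch below. The class `PreLaunchSymlinks` and its method are hypothetical and not part of Hadoop. The sketch only illustrates the ordering, and it assumes the child JVM runs with the task working directory as its current directory, so that a relative profiler path already resolves at JVM start-up.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.Path;
import java.util.Map;

/**
 * Hypothetical pre-launch step (not a Hadoop API): create DistributedCache
 * symlinks in the task working directory before the child JVM is started,
 * instead of leaving it to the task itself after start-up.
 */
public final class PreLaunchSymlinks {
    private PreLaunchSymlinks() {}

    /**
     * @param workDir the task's working directory (created if missing)
     * @param links   link name, e.g. "foo.jar" -> localized file it points to
     */
    public static void createLinks(Path workDir, Map<String, Path> links) throws IOException {
        Files.createDirectories(workDir);
        for (Map.Entry<String, Path> e : links.entrySet()) {
            Path link = workDir.resolve(e.getKey());
            // Leave existing entries alone so calling this twice is harmless.
            if (Files.exists(link, LinkOption.NOFOLLOW_LINKS)) {
                continue;
            }
            Files.createSymbolicLink(link, e.getValue());
        }
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("distcache");
        Path lib = Files.createFile(root.resolve("foo.jar"));
        Path workDir = root.resolve("work");
        createLinks(workDir, Map.of("foo.jar", lib));
        // Assuming the child JVM's current directory is workDir, it could now
        // be given the relative path foo.jar in its profile parameters.
        System.out.println(Files.readSymbolicLink(workDir.resolve("foo.jar")));
    }
}
```

Skipping names that already exist keeps the step idempotent, which matters if the task-side setup in Child#main still runs afterwards.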
[jira] [Created] (MAPREDUCE-4365) Shipping Profiler Libraries by DistributedCache
Jie Li created MAPREDUCE-4365:
-
Summary: Shipping Profiler Libraries by DistributedCache
[jira] [Commented] (MAPREDUCE-4365) Shipping Profiler Libraries by DistributedCache
[ https://issues.apache.org/jira/browse/MAPREDUCE-4365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13400766#comment-13400766 ]

Jie Li commented on MAPREDUCE-4365:
---
Hi Robert, I don't quite understand your approach, because we need to provide the path of the profiler libraries to the TaskTracker rather than to the tasks. If the libraries only appear in the task's working directory, how can the TaskTracker find them when launching the task? Also, the TT currently doesn't inspect the profile parameters for distributed cache entries.