[jira] [Resolved] (MAPREDUCE-4365) Shipping Profiler Libraries by DistributedCache

2012-06-27 Thread Jie Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jie Li resolved MAPREDUCE-4365.
---

  Resolution: Fixed
Target Version/s:   (was: 1.1.0)

One workaround is to include the profiler library in the job jar and use a 
relative path such as ../../foo.library to locate it.
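
As a rough illustration of that workaround, the sketch below composes a value for mapred.task.profile.params that points at a library by relative path. It rests on several assumptions that are not stated in this thread: that the profiler is a native agent loaded with the JVM's -agentpath option, that the profiler accepts a file=%s option (Hadoop substitutes %s with the profiling output file), and that the relative path resolves from the task's working directory. java.util.Properties is only a stand-in for the job configuration, and the class and method names are hypothetical.

```java
import java.util.Properties;

public class JobJarProfilerParams {

    // Builds a JVM argument that loads a native agent from a path relative
    // to the task's working directory. The agent options are profiler
    // specific; "file=%s" below is only an assumed example.
    public static String agentParams(String relativeLibraryPath, String agentOptions) {
        return "-agentpath:" + relativeLibraryPath + "=" + agentOptions;
    }

    public static void main(String[] args) {
        // Stand-in for the job configuration (hypothetical; a real job
        // would set these keys on its JobConf).
        Properties jobProps = new Properties();
        jobProps.setProperty("mapred.task.profile", "true");
        jobProps.setProperty("mapred.task.profile.params",
                agentParams("../../foo.library", "file=%s"));

        System.out.println(jobProps.getProperty("mapred.task.profile.params"));
    }
}
```

Whether ../../foo.library actually resolves depends on where the job jar is unpacked relative to the task's working directory, so the path would need checking on a real cluster.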

Thanks Deveraj, Sid, Vinod and everyone!

 Shipping Profiler Libraries by DistributedCache
 ---

 Key: MAPREDUCE-4365
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4365
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
Affects Versions: 1.0.3
Reporter: Jie Li

 Hadoop profiling is great for performance tuning and debugging, but currently 
 we can only use Java's built-in profilers such as HProf. Other profilers must 
 first be installed on all slave nodes, which is inconvenient for large 
 clusters and sometimes impossible for production clusters. 
 Support for shipping profiler libraries through the DistributedCache would 
 solve this problem. For example, in mapred.task.profile.params we could 
 specify a profiler library from the DistributedCache using a special 
 placeholder such as foo.jar, and Hadoop would look at the DistributedCache 
 and replace foo.jar with the localized path before launching the child JVM.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (MAPREDUCE-4365) Shipping Profiler Libraries by DistributedCache

2012-06-26 Thread Jie Li (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13401803#comment-13401803
 ] 

Jie Li commented on MAPREDUCE-4365:
---

Thanks Arun and Robert. 

I meant profiling tasks. I'm actually using [Hadoop 
profiling|http://hadoop.apache.org/common/docs/r0.20.2/mapred_tutorial.html#Profiling]
 by setting mapred.task.profile.{maps|reduces}, so Hadoop automatically 
sends back the profiling output files.

Your approach can't work because it is currently the task's responsibility to 
set up the symlinks for the distributed cache, so when the TT launches the 
task, the symlink does not exist yet. Note that TaskRunner#setupWorkDir is 
called in Child#main.

So one solution is to create the symlink before launching tasks. 
Alternatively, for this particular problem, could we replace the distributed 
cache entry found in the profiling parameters with its localized path?
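
The second option could look roughly like the sketch below: before the child JVM arguments are built, each distributed cache entry name found in the profiling parameters is replaced with the path it was localized to. This is only an illustration under stated assumptions, not how Hadoop implements it. The class name, the map of entry names to local paths, the /local/cache/... path, and the use of -javaagent are all hypothetical, and plain substring replacement is a simplification that a real patch would need to tighten.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ProfilerParamResolver {

    // Replaces every distributed cache entry name that occurs in the
    // profiling parameters with its localized path. Uses plain substring
    // matching, so entry names that overlap one another are not handled.
    public static String resolve(String profileParams,
                                 Map<String, String> localizedPaths) {
        String resolved = profileParams;
        for (Map.Entry<String, String> entry : localizedPaths.entrySet()) {
            resolved = resolved.replace(entry.getKey(), entry.getValue());
        }
        return resolved;
    }

    public static void main(String[] args) {
        // Hypothetical mapping from cache entry name to localized path.
        Map<String, String> localized = new LinkedHashMap<>();
        localized.put("foo.jar", "/local/cache/123/foo.jar");

        // Hypothetical profiling parameters that refer to the cache entry.
        String params = "-javaagent:foo.jar=file=%s";

        System.out.println(resolve(params, localized));
    }
}
```

Because the substitution happens before the JVM is launched, it does not depend on the symlink that the task itself creates later in Child#main.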





[jira] [Created] (MAPREDUCE-4365) Shipping Profiler Libraries by DistributedCache

2012-06-25 Thread Jie Li (JIRA)
Jie Li created MAPREDUCE-4365:
-

 Summary: Shipping Profiler Libraries by DistributedCache
 Key: MAPREDUCE-4365
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4365
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
Affects Versions: 1.0.3
Reporter: Jie Li


Hadoop profiling is great for performance tuning and debugging, but currently 
we can only use Java's built-in profilers such as HProf. Other profilers must 
first be installed on all slave nodes, which is inconvenient for large 
clusters and sometimes impossible for production clusters. 

Support for shipping profiler libraries through the DistributedCache would 
solve this problem. For example, in mapred.task.profile.params we could 
specify a profiler library from the DistributedCache using a special 
placeholder such as foo.jar, and Hadoop would look at the DistributedCache 
and replace foo.jar with the localized path before launching the child JVM.







[jira] [Commented] (MAPREDUCE-4365) Shipping Profiler Libraries by DistributedCache

2012-06-25 Thread Jie Li (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13400766#comment-13400766
 ] 

Jie Li commented on MAPREDUCE-4365:
---

Hi Robert,

I don't quite understand your approach, because we need to provide the
path of the profiler libraries to the TaskTracker rather than to the
tasks. If the libraries appear in the task's working directory, how
can the TaskTracker find them when launching the task? Also, the TT
currently doesn't look into the profile parameters to see whether they
contain any distributed cache entry.
