Hi,

As I suspected, cache files are symlinked after the child JVM is started: TaskRunner.setupWorkDir is called from org.apache.hadoop.mapred.Child.main, i.e., from inside the already-running child JVM. This is unfortunate, as it makes it impossible to leverage the distributed cache for deploying JVM agents: a -javaagent jar has to be resolvable at the moment the JVM starts. I could submit a JIRA if there is any interest in getting this to work. Otherwise, I'll think of some other hacks and use a distributed scp as a last resort.
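For anyone following along, here is a toy shell sketch of the ordering issue described above. It uses plain files only, with no Hadoop or JVM involved, and every path and name in it (cache_dir, task_cwd, agent.jar, agent_visible) is made up for illustration. It only mimics the sequence "JVM resolves ./foo.jar, then the symlink is created"; it is not what TaskRunner actually executes.

```shell
#!/bin/sh
# Toy reproduction of the ordering problem using plain files only.
# No Hadoop or JVM is involved; all paths and names below are hypothetical.

work=$(mktemp -d)
cache_dir="$work/cache"      # stand-in for the TaskTracker's local cache
task_cwd="$work/taskdir"     # stand-in for the task's working directory
mkdir -p "$cache_dir" "$task_cwd"
: > "$cache_dir/agent.jar"   # empty placeholder for the localized jar

# Mimics what the JVM must do at startup for -javaagent:./foo.jar:
# resolve the relative path against its current working directory.
agent_visible() {
    if [ -e "$task_cwd/foo.jar" ]; then echo found; else echo missing; fi
}

at_jvm_start=$(agent_visible)                      # the JVM is launched first ...
ln -s "$cache_dir/agent.jar" "$task_cwd/foo.jar"   # ... the symlink appears later
after_setup=$(agent_visible)

echo "at JVM start: $at_jvm_start"
echo "after work dir setup: $after_setup"
rm -rf "$work"
```

The first check reports the jar as missing and the second as found, which matches the "Error opening zip file or JAR manifest missing" failure: the path is valid only once the work directory has been set up, and by then the agent has already failed to load.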
Thanks,

stan

On Thu, Jan 17, 2013 at 2:32 PM, Stan Rosenberg <stan.rosenb...@gmail.com> wrote:
> Hi,
>
> I am back with my original problem. I am trying to bootstrap child
> JVM via -javaagent. I am doing what Harsh and Arun suggested, which
> also agrees with the documentation.
> In theory this should work, but it doesn't. Any ideas before I start
> digging into the code? Thanks.
>
> Here is the command I am using to test:
>
> hadoop jar /usr/lib/hadoop/hadoop-examples-0.20.2-cdh3u3.jar wordcount
> -files "core-tools-0.0.1-SNAPSHOT-common-assembly.jar#foo.jar"
> -Dmapred.map.child.java.opts="-javaagent:./foo.jar=classes=.*" test1
> output
>
> I can see the following (relevant) properties set in job.xml,
>
> mapred.cache.files=/user/srosenberg/.staging/job_201211061805_50132/files/core-tools-0.0.1-SNAPSHOT-common-assembly.jar#foo.jar
> mapred.create.symlink=yes
> mapred.map.child.java.opts=-javaagent:./foo.jar=classes=.*
>
> The map tasks fail with the following stdout/stderr output, resp.,
>
> Error occurred during initialization of VM
> agent library failed to init: instrument
>
> Error opening zip file or JAR manifest missing : ./foo.jar
>
> This seems like the jar is not symlinked into the current working
> directory of the child JVM; or perhaps the symlinking happens after
> the child JVM starts?
>
>
> On Fri, Aug 3, 2012 at 1:31 PM, Harsh J <ha...@cloudera.com> wrote:
>> Stan,
>>
>> What Arun says would surely work.
>>
>> For instance, read this command:
>>
>> hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.0.0.jar pi
>> -files
>> "share/hadoop/mapreduce/hadoop-mapreduce-client-common-2.0.0.jar#foo.jar"
>> -Dmapred.child.java.opts="-javaagent:./foo.jar" 1 1
>>
>> What this would do is merely take your passed -files jar (client-common) and
>> symlink it into the JVM's working directory (the task's working directory)
>> _before_ the JVM is begun, as "foo.jar". So if I pass additionally, JVM opts
>> that refer to this foo.jar under ./, then it would work as you expect it to,
>> as the JVM is begun from that directory (its CWD).
>>
>> Do let us know if this solves it and also makes sense?
>>
>>
>> On Fri, Aug 3, 2012 at 10:02 PM, Stan Rosenberg <stan.rosenb...@gmail.com>
>> wrote:
>>>
>>> Arun,
>>>
>>> I don't believe the symlink is of help. The symlink is created in the
>>> task's current working directory (cwd), but I don't know what cwd is
>>> when I launch with 'hadoop jar ...'.
>>>
>>> Thanks,
>>>
>>> stan
>>>
>>> On Fri, Aug 3, 2012 at 2:39 AM, Arun C Murthy <a...@hortonworks.com> wrote:
>>> > Stan,
>>> >
>>> > You can ask TT to create a symlink to your jar shipped via DistCache:
>>> >
>>> >
>>> > http://hadoop.apache.org/common/docs/r1.0.3/mapred_tutorial.html#DistributedCache
>>> >
>>> > That should give you what you want.
>>> >
>>> > hth,
>>> > Arun
>>> >
>>> > On Jul 30, 2012, at 3:23 PM, Stan Rosenberg wrote:
>>> >
>>> > Hi,
>>> >
>>> > I am seeking a way to leverage hadoop's distributed cache in order to
>>> > ship jars that are required to bootstrap a task's jvm, i.e., before a
>>> > map/reduce task is launched.
>>> > As a concrete example, let's say that I need to launch with
>>> > '-javaagent:/path/profiler.jar'. In theory, the task tracker is
>>> > responsible for downloading cached files onto its local filesystem.
>>> > However, the absolute path to a given cached file is not known a
>>> > priori; however, we need the path in order to configure '-javaagent'.
>>> >
>>> > Is this currently possible with the distributed cache? If not, is the
>>> > use case appealing enough to open a jira ticket?
>>> >
>>> > Thanks,
>>> >
>>> > stan
>>> >
>>> >
>>> > --
>>> > Arun C. Murthy
>>> > Hortonworks Inc.
>>> > http://hortonworks.com/
>>> >
>>> >
>>
>>
>> --
>> Harsh J