It looks like you will have to wait for HADOOP-3570 and then use -libjars for this.

Thanks
Amareshwari
Christian Ulrik Søttrup wrote:
OK, I've tried what you suggested, and all sorts of other combinations, with no luck. Then I went through the source of the Streaming library. It looks like it checks for the existence of the combiner class while it is building the JobConf, i.e. before the job is sent to the nodes. It calls Class.forName() on the combiner in goodClassOrNull() in StreamUtil.java, which is called from setJobconf() in StreamJob.java.
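
A simplified sketch of the kind of check described above may help show why the archive does not help at submission time. It is not the actual StreamUtil source (the real goodClassOrNull() may differ in signature and behavior), and the class name CombinerCheck is hypothetical. It only illustrates that Class.forName() resolves a name against the classpath of the JVM doing the check, i.e. the client building the JobConf, where a jar passed with -cacheArchive has not been unpacked.

```java
// Hypothetical, simplified sketch; NOT Hadoop's StreamUtil code.
// Mimics a "load the class or return null" check run on the client.
public class CombinerCheck {

    // Returns the class if it can be loaded by this JVM's class loader,
    // otherwise null. Only the client's own classpath is consulted here.
    static Class<?> goodClassOrNull(String className) {
        try {
            return Class.forName(className);
        } catch (ClassNotFoundException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        // A JDK class is always on the classpath, so it resolves.
        System.out.println(goodClassOrNull("java.lang.String") != null);  // prints true
        // A class that only exists inside a jar shipped via -cacheArchive
        // (assumption: nothing named testlink.combiner is on this JVM's
        // classpath) cannot be found at this point.
        System.out.println(goodClassOrNull("testlink.combiner") != null); // prints false
    }
}
```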

Does anybody have an idea how I can use a custom combiner? Would I have to package it into the streaming jar?

cheers,
Christian

Dennis Kubes wrote:
If testlink is a package, it should be:

hadoop jar streaming/hadoop-0.17.0-streaming.jar -input store -output cout -mapper MyProg -combiner testlink.combiner -reducer testlink.reduce -file /home/hadoop/MyProg -cacheFile /shared/part-00000#in.cl -cacheArchive /related/MyJar.jar#testlink

If it is not a package, remove the "testlink." prefix.

Dennis

Christian Ulrik Søttrup wrote:
OK, so I added the jar with the -cacheArchive option, and my command now looks like this:

hadoop jar streaming/hadoop-0.17.0-streaming.jar -input /store/ -output /cout/ -mapper MyProg -combiner testlink/combiner.class -reducer testlink/reduce.class -file /home/hadoop/MyProg -cacheFile /shared/part-00000#in.cl -cacheArchive /related/MyJar.jar#testlink

Now it fails because it cannot find the combiner. The -cacheArchive option creates a symlink in the local working directory, just like the -cacheFile option, correct? If not, how can I specify which class to use?

cheers,
Christian

Amareshwari Sriramadasu wrote:
Dennis Kubes wrote:
If I understand what you are asking, you can use -cacheArchive with the path to the jar to include the jar file in the classpath of your streaming job.

Dennis

You can also use the -cacheArchive option to include the jar file and symlink the unjarred directory into the current working directory, by providing the URI as hdfs://<path>#link. You then have to give the -reducer and -combiner options as the appropriate paths inside the unjarred directory.
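
To make the "<path>#link" form concrete, here is an illustrative sketch (not Hadoop code; the class and method names are hypothetical) that splits a -cacheArchive argument into the archive location and the symlink name using java.net.URI, with the value from the command in this thread.

```java
import java.net.URI;

// Illustrative sketch only, not Hadoop code: splits a -cacheArchive
// argument of the form "<archive path>#<link>" into its two parts.
public class CacheArchiveUri {

    // The part before '#': where the jar lives in HDFS.
    static String archivePath(String arg) {
        return URI.create(arg).getPath();
    }

    // The fragment after '#': the name of the symlink that points at the
    // unjarred directory in the task's working directory.
    static String linkName(String arg) {
        return URI.create(arg).getFragment();
    }

    public static void main(String[] args) {
        String arg = "/related/MyJar.jar#testlink";
        System.out.println(archivePath(arg)); // prints /related/MyJar.jar
        System.out.println(linkName(arg));    // prints testlink
    }
}
```

With that argument, the archive's contents would be reachable under ./testlink/ on the task nodes. That is a file path on the task side, which is separate from any class lookup done on the client.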

Thanks
Amareshwari
Christian Søttrup wrote:
Hi all,

I have an application that I run with the "hadoop jar" command.
I have now written an optimized version of the mapper in C.
I have run it using the streaming library and everything looks OK (using num.reducers=0).

Now I want to use this mapper together with the combiner and reducer from my old .jar file, while still running the C program as the mapper in streaming mode. How do I do this? How can I distribute the jar and run the reducer and combiner from it?

cheers,
Christian
