Re: streaming question

Amareshwari Sriramadasu Tue, 16 Sep 2008 02:20:29 -0700

Looks like you have to wait for HADOOP-3570 and use -libjars for the same.


Thanks
Amareshwari
Christian Ulrik Søttrup wrote:

Ok, I've tried what you suggested and all sorts of combinations, with no luck. Then I went through the source of the Streaming lib. It looks like it checks for the existence of the combiner while it is building the jobconf, i.e. before the job is sent to the nodes. It calls Class.forName() on the combiner in goodClassOrNull() from StreamUtil.java, called from setJobConf() in StreamJob.java.
Does anybody have an idea how I can use a custom combiner? Would I have to package it into the streaming jar?
cheers,
Christian
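
The check Christian describes can be sketched roughly as follows. This is a minimal illustration, assuming the lookup amounts to a plain Class.forName() call on the submitting machine. The class name CombinerCheck is hypothetical and the method body is not copied from the Hadoop source. It shows why a class that only exists in a jar shipped to the task nodes is not found at job-setup time.

```java
class CombinerCheck {
    // Hypothetical stand-in for the lookup described above: returns the
    // class if the local classloader can load it, otherwise null.
    static Class<?> goodClassOrNull(String className) {
        try {
            return Class.forName(className);
        } catch (ClassNotFoundException e) {
            // Not on the classpath of the JVM that builds the job.
            return null;
        }
    }

    public static void main(String[] args) {
        // A JDK class is always visible to the client JVM.
        System.out.println(goodClassOrNull("java.lang.String") != null);
        // A class that lives only in a jar distributed to the task nodes
        // is not visible here, so the lookup yields null.
        System.out.println(goodClassOrNull("testlink.combiner") != null);
    }
}
```

Under this assumption, the combiner class has to be on the classpath of the client that submits the job, not only on the nodes that run it.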

Dennis Kubes wrote:
If testlink is a package, it should be:
hadoop -jar streaming/hadoop-0.17.0-streaming.jar -input store -output cout -mapper MyProg -combiner testlink.combiner -reducer testlink.reduce -file /home/hadoop/MyProg -cacheFile /shared/part-00000#in.cl -cacheArchive /related/MyJar.jar#testlink
if not a package, remove the testlink part.

Dennis

Christian Ulrik Søttrup wrote:
Ok, so I added the JAR to the cacheArchive option and my command looks like this:
hadoop jar streaming/hadoop-0.17.0-streaming.jar -input /store/ -output /cout/ -mapper MyProg -combiner testlink/combiner.class -reducer testlink/reduce.class -file /home/hadoop/MyProg -cacheFile /shared/part-00000#in.cl -cacheArchive /related/MyJar.jar#testlink
Now it fails because it cannot find the combiner. The cacheArchive option creates a symlink in the local running directory, correct? Just like the cacheFile option? If not, how can I then specify which class to use?
cheers,
Christian

Amareshwari Sriramadasu wrote:
Dennis Kubes wrote:
If I understand what you are asking, you can use -cacheArchive with the path to the jar to include the jar file in the classpath of your streaming job.
Dennis
You can also use the -cacheArchive option to include the jar file and symlink the unjarred directory from cwd by providing the URI as hdfs://<path>#link. You have to provide the -reducer and -combiner options as appropriate paths in the unjarred directory.
Thanks
Amareshwari
Christian Søttrup wrote:
Hi all,
I have an application that I used to run with the "hadoop jar" command.
I have now written an optimized version of the mapper in C.
I have run this using the streaming library and everything looks ok (using num.reducers=0).
Now I want to use this mapper together with the combiner and reducer from my old .jar file. How do I do this? How can I distribute the jar and run the reducer and combiner from it, while also running the C program as the mapper in streaming mode?

cheers,
Christian
