Hi,

I need to run Hadoop streaming jobs from a Java app and I'm looking for the
best way to do this. I thought there was a discussion about this on the list
a few months back, but I couldn't find it.

Looking through the code, I've found one way to do this, but it seems clunky
and I suspect there's a better approach:

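import org.apache.hadoop.util.RunJar;

// RunJar takes the jar path first; the remaining elements are passed
// through to the main class of that jar (here, the streaming job).
String[] args = new String[] {
     "/path/to/hadoop-0.18.3-streaming.jar",
     "-input", "sample_data",
     "-output", "sample_data/output",
     "-file", "src-python/mapper.py",
     "-file", "src-python/reducer.py",
     "-mapper", "mapper.py",
     "-reducer", "reducer.py",
     "-inputformat", "org.apache.hadoop.mapred.KeyValueTextInputFormat"
    };

RunJar.main(args);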

I'd like a more type-safe way to run this jar than passing a String[] to a
main method (e.g., something like JobConf). I'd also prefer not to hard-code
the location of the streaming jar on disk, and instead have it picked up from
the classpath.
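For reference, here is a rough sketch of the kind of thing I have in mind. It
assumes nothing about Hadoop's own API: StreamingArgs and JarLocator are
hypothetical helpers of my own, not Hadoop classes. The builder only wraps the
same flags used in the RunJar call above, and the locator finds the jar a class
was loaded from using plain ClassLoader.getResource().

```java
import java.io.File;
import java.net.URI;
import java.net.URL;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical builder: each method appends one streaming flag/value pair,
// in call order, using the same flags as the RunJar example.
class StreamingArgs {
    private final List<String> args = new ArrayList<String>();

    StreamingArgs input(String path)  { return add("-input", path); }
    StreamingArgs output(String path) { return add("-output", path); }
    StreamingArgs file(String path)   { return add("-file", path); }
    StreamingArgs mapper(String cmd)  { return add("-mapper", cmd); }
    StreamingArgs reducer(String cmd) { return add("-reducer", cmd); }

    // Taking a Class instead of a String means a misspelled class name fails
    // at compile time. With Hadoop on the classpath this could be narrowed to
    // Class<? extends org.apache.hadoop.mapred.InputFormat>.
    StreamingArgs inputFormat(Class<?> format) {
        return add("-inputformat", format.getName());
    }

    private StreamingArgs add(String flag, String value) {
        args.add(flag);
        args.add(value);
        return this;
    }

    String[] build() { return args.toArray(new String[args.size()]); }
}

// Hypothetical helper: finds the jar on the classpath a class was loaded from.
class JarLocator {
    // Returns the local path of the jar containing clazz, or null if the class
    // came from a directory, the bootstrap loader, or a non-jar URL.
    static String findContainingJar(Class<?> clazz) {
        ClassLoader loader = clazz.getClassLoader();
        if (loader == null) {
            return null; // bootstrap (JDK) classes have no loader to ask
        }
        URL url = loader.getResource(clazz.getName().replace('.', '/') + ".class");
        return url == null ? null : jarPath(url.toString());
    }

    // "jar:file:/x/y.jar!/a/B.class" -> "/x/y.jar", with %-escapes decoded.
    static String jarPath(String resourceUrl) {
        int bang = resourceUrl.indexOf("!/");
        if (!resourceUrl.startsWith("jar:file:") || bang < 0) {
            return null;
        }
        // Drop the "jar:" prefix and the "!/entry" suffix, leaving a file: URI.
        return new File(URI.create(resourceUrl.substring(4, bang))).getPath();
    }
}

class StreamingArgsDemo {
    public static void main(String[] argv) {
        String[] args = new StreamingArgs()
            .input("sample_data")
            .output("sample_data/output")
            .file("src-python/mapper.py")
            .file("src-python/reducer.py")
            .mapper("mapper.py")
            .reducer("reducer.py")
            .build();
        System.out.println(Arrays.toString(args));
        // Prints null unless StreamingArgs itself was loaded from a jar.
        System.out.println(JarLocator.findContainingJar(StreamingArgs.class));
    }
}
```

Assuming the streaming jar is on the classpath, something like
JarLocator.findContainingJar(org.apache.hadoop.streaming.StreamJob.class) could
then supply the first element of the RunJar argument array instead of a
hard-coded path. It still goes through String[] in the end, though, which is
why I'm hoping there is a cleaner, JobConf-style way.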

Does anyone have any examples or suggestions on how to do this?

thanks,
Bill
