Hi users,

I have started writing my first Hadoop project and am looking for some
guidance from more experienced members.

The project involves running some CPU-intensive computations in parallel and
should be a straightforward fit for MapReduce: the input dataset can easily be
partitioned into independent chunks, and the final aggregation is a low-cost
step. The application, however, relies on a legacy command-line .exe (which
runs fine under Wine). It reads about 10 small files (5 MB) from its working
folder and produces another 10 as a result.

I can easily ship those files and the exe to all nodes via DistributedCache,
so that they are stored read-only on the local file system. What I need now is
a local working folder for the task attempt, where I can copy or symlink the
relevant inputs, execute the legacy exe, and read off the output. As far as I
understand, FileOutputFormat.getWorkOutputPath(job) returns an HDFS location
rather than a local path.
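
For context, here is roughly what my driver looks like at the moment (a
simplified sketch using the old mapred API; the class names and the
/apps/legacy/... HDFS paths are just placeholders):

import java.net.URI;

import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class LegacyExeDriver {
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(LegacyExeDriver.class);
        conf.setJobName("legacy-exe-runner");
        conf.setMapperClass(LegacyExeMapper.class);
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(Text.class);

        // Ship the exe and its small input files to every node; the "#name"
        // fragment asks the framework to symlink each one into the task's
        // working folder under that name.
        DistributedCache.createSymlink(conf);
        DistributedCache.addCacheFile(new URI("/apps/legacy/legacy.exe#legacy.exe"), conf);
        DistributedCache.addCacheFile(new URI("/apps/legacy/config1.dat#config1.dat"), conf);
        // ... the remaining input files ...

        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));
        JobClient.runJob(conf);
    }
}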

The docs mention that each task attempt gets a local working directory, but I
am struggling to find a way to get the file-system path to it, so that I can
copy files there and pass it to my app for local processing.
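
What I would like to end up with in the mapper is something like the sketch
below. The line I am unsure about is the one assuming the task JVM's current
directory is that task-attempt working folder; everything else (the "result"
file naming, emitting file sizes) is just illustrative:

import java.io.File;
import java.io.IOException;
import java.io.InputStream;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class LegacyExeMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, Text> {

    public void map(LongWritable key, Text value,
                    OutputCollector<Text, Text> output, Reporter reporter)
            throws IOException {
        // Assumption I am not sure about: the task JVM's current directory is
        // the task-attempt local working folder, so the DistributedCache
        // symlinks (legacy.exe, config1.dat, ...) show up right here.
        File workDir = new File(System.getProperty("user.dir"));

        // Run the legacy exe under Wine inside that folder.
        ProcessBuilder pb = new ProcessBuilder("wine", "legacy.exe");
        pb.directory(workDir);
        pb.redirectErrorStream(true);
        Process proc = pb.start();

        // Drain the exe's output so it cannot block on a full pipe buffer.
        InputStream out = proc.getInputStream();
        byte[] buf = new byte[4096];
        while (out.read(buf) != -1) { /* discard */ }

        try {
            if (proc.waitFor() != 0) {
                throw new IOException("legacy.exe exited with a non-zero code");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IOException("interrupted while waiting for legacy.exe", e);
        }

        // The exe writes its ~10 result files into the same folder; emit their
        // names and sizes for now and do the real aggregation in the reducer.
        File[] results = workDir.listFiles();
        if (results != null) {
            for (File f : results) {
                if (f.getName().startsWith("result")) {
                    output.collect(new Text(f.getName()),
                                   new Text(Long.toString(f.length())));
                }
            }
        }
    }
}

If there is a supported way to get that folder's path explicitly, rather than
relying on the current directory, that is exactly what I am after.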

Tell me it's an easy one that I've missed.

Many Thanks,
Chris
