Hi users, I've started writing my first project on Hadoop and am now seeking some guidance from more experienced members.
The project runs some CPU-intensive computations in parallel and should be a straightforward fit for MapReduce: the input dataset partitions easily into independent jobs, and the final aggregation is a low-cost step. The application, however, relies on a legacy command-line exe (which runs fine under Wine). It reads about 10 small files (around 5 MB) from its working folder and produces another 10 as output. I can easily ship those files and the exe to all nodes via DistributedCache, so that they get stored read-only on each node's local filesystem.

What I need now is a local working folder for the task attempt, where I can copy or symlink the relevant inputs, execute the legacy exe, and read back its output. When I ask for FileOutputFormat.getWorkOutputPath(job), the task returns an HDFS location. The docs say there should be a task-attempt local working folder, but I can't find a way to get its filesystem path, so that I can copy files there and pass it to my app for local processing.

Tell me it's an easy one that I've missed.

Many thanks,
Chris
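P.S. In case it helps make the question concrete, here is a rough sketch of the mapper I have in mind (old org.apache.hadoop.mapred API; the class name, exe name, and key/value types are just placeholders). The line marked TODO is exactly the piece I'm missing.

import java.io.File;
import java.io.IOException;

import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class LegacyExeMapper extends MapReduceBase
    implements Mapper<Text, Text, Text, Text> {

  // Local paths of the exe plus its ~10 small input files, shipped via DistributedCache.
  private Path[] cachedFiles;

  @Override
  public void configure(JobConf job) {
    try {
      // Files were added on the client side with DistributedCache.addCacheFile(...)
      cachedFiles = DistributedCache.getLocalCacheFiles(job);
    } catch (IOException e) {
      throw new RuntimeException("Could not read distributed cache", e);
    }
  }

  @Override
  public void map(Text key, Text value,
                  OutputCollector<Text, Text> output, Reporter reporter)
      throws IOException {
    // TODO: this is the part I can't find -- a local (non-HDFS) working
    // directory private to this task attempt. getWorkOutputPath() gives me
    // an HDFS path instead.
    File workDir = new File("???");

    // Copy or symlink the cached inputs into workDir, run the exe under Wine
    // with workDir as its working folder, then read its output files back
    // and emit the results.
    ProcessBuilder pb = new ProcessBuilder("wine", "legacy.exe");
    pb.directory(workDir);
    // ... start the process, wait for it, read outputs, output.collect(...)
  }
}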