Hi,
I looked through the old posts for a solution to my problem. I am using
Hadoop 0.20.203 streaming with a C++ program. The program loads many
dictionaries stored in local folders. For example:
mainfolder - dir1 -> dicfile 1
mainfolder - dir1 -> dicfile 2
mainfolder - dir2 -> dicfile 3
mainfolder - dir2 -> dicfile 4
I didn't change the dictionary-loading functions in the C++ code, on the
assumption that the whole directory at the mainfolder level could be
passed to streaming. However, it does not seem to work, because I
observed the following error:
java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:311)
at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:545)
at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:132)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:36)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:435)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:371)
at org.apache.hadoop.mapred.Child$4.run(Child.java:259)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
at org.apache.hadoop.mapred.Child.main(Child.java:253)
It seems the program failed to load the dictionaries. What is the most
efficient way to pass multiple files with directory dependencies in
Hadoop streaming? Can I leave the C++ code unchanged, or should I remove
all the directory dependencies from the dictionary loading?
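
Exit code 1 in the trace only says that the C++ subprocess exited with an
error; it does not say which file could not be opened. Below is a minimal
sketch of a startup check, assuming the dictionaries are opened by paths
relative to the task's working directory. The function names
missing_dictionaries and report_missing are hypothetical and not taken from
the actual program. Anything written to stderr by a streaming subprocess
ends up in the task attempt's stderr log, so the failing path becomes visible
there.

```cpp
#include <cstdlib>
#include <fstream>
#include <iostream>
#include <string>
#include <vector>

// Hypothetical helper: returns every path under `root` that cannot be opened
// for reading. Relative roots are resolved against the current working
// directory of the process.
std::vector<std::string> missing_dictionaries(
    const std::string& root, const std::vector<std::string>& relative_paths) {
  std::vector<std::string> missing;
  for (std::size_t i = 0; i < relative_paths.size(); ++i) {
    const std::string full = root + "/" + relative_paths[i];
    std::ifstream in(full.c_str());
    if (!in.is_open()) {
      missing.push_back(full);
    }
  }
  return missing;
}

// Hypothetical helper: writes one line per unreadable path to stderr and
// returns the exit status the program could hand back from main().
int report_missing(const std::vector<std::string>& missing) {
  for (std::size_t i = 0; i < missing.size(); ++i) {
    std::cerr << "cannot open dictionary: " << missing[i] << std::endl;
  }
  return missing.empty() ? EXIT_SUCCESS : EXIT_FAILURE;
}
```

Calling missing_dictionaries("mainfolder", ...) with entries such as
"dir1/<dictionary file>" at the top of main(), before any input is read,
would show whether the directory layout survived the trip to the task node.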
Thanks!
Shi
On 6/29/2011 1:44 AM, Guang-Nan Cheng wrote:
Well, my bad. I ran a simple test and confirmed that -files already works
that way.
On 06/28/2011 11:19 AM, Guang-Nan Cheng wrote:
I'd like to pass a whole Ruby app to streaming, so that I don't need to
bother with Ruby file dependencies.
For example,
./streaming
...
-mapper 'ruby aaa/bbb/ccc'
-files aaa   <--- pass the folder
Is this supported already? If not, any tips on how to make this work? I'm
willing to add some code myself and rebuild the streaming jar.
--
Nick Jones