Hi,

I looked through the old posts to find a solution to my problem. I am using Hadoop 0.20.203 streaming with a C++ program. The program loads many dictionaries stored in local folders. For example:

mainfolder - dir1 ->  dicfile 1
mainfolder - dir1 ->  dicfile 2
mainfolder - dir2 ->  dicfile 3
mainfolder - dir2 ->  dicfile 4

I didn't change the dictionary-loading functions in the C++ code, on the assumption that the whole directory at the mainfolder level could be passed to streaming. However, this does not seem to work, because I observed the following error:

java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
        at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:311)
        at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:545)
        at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:132)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
        at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:36)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:435)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:371)
        at org.apache.hadoop.mapred.Child$4.run(Child.java:259)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
        at org.apache.hadoop.mapred.Child.main(Child.java:253)


It seems the program failed to load the dictionaries. What is the most efficient way to pass multiple files with directory dependencies in Hadoop streaming? Can I keep the C++ code unchanged, or should I remove all the directory dependencies from the dictionary loading?
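
The stack trace above only says that the mapper subprocess exited with code 1; it does not say which file could not be opened. One way to narrow this down is a small check like the sketch below, which tries each relative dictionary path and writes the path, the current working directory, and the errno text to stderr, where it should show up in the task's stderr log. The function names can_open_dictionary and count_missing are made up for this illustration and are not part of the original program, and getcwd assumes a POSIX system.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdio>
#include <cstring>
#include <string>
#include <vector>
#include <unistd.h>  // getcwd (POSIX only)

// Hypothetical helper: try to open one dictionary file for reading.
// On failure, report the path, the current working directory, and the
// errno text on stderr, then return false.
bool can_open_dictionary(const std::string& path) {
    assert(!path.empty());
    std::FILE* f = std::fopen(path.c_str(), "r");
    if (f != NULL) {
        std::fclose(f);
        return true;
    }
    const int saved_errno = errno;  // save before another call overwrites it
    char cwd[4096];
    const char* where = getcwd(cwd, sizeof(cwd)) ? cwd : "(unknown)";
    std::fprintf(stderr, "cannot open dictionary '%s' (cwd: %s): %s\n",
                 path.c_str(), where, std::strerror(saved_errno));
    return false;
}

// Hypothetical helper: count how many of the given paths cannot be opened.
std::size_t count_missing(const std::vector<std::string>& paths) {
    std::size_t missing = 0;
    for (std::size_t i = 0; i < paths.size(); ++i) {
        if (!can_open_dictionary(paths[i])) {
            ++missing;
        }
    }
    return missing;
}
```

Calling count_missing at mapper startup with the relative paths the program expects (for example a placeholder such as mainfolder/dir1/dicfile1) and exiting early if the result is non-zero would show whether the directory layout inside the task's working directory matches what the loading code assumes.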

Thanks!

Shi

On 6/29/2011 1:44 AM, Guang-Nan Cheng wrote:
Well, my bad. I made a simple test and confirmed that -files already works that way.

On 06/28/2011 11:19 AM, Guang-Nan Cheng wrote:

I like the idea of passing a whole Ruby app to streaming, so that I don't need to bother with Ruby file dependencies.

For example,

./streaming

...
-mapper 'ruby aaa/bbb/ccc'
-files aaa    <--- pass the folder




Is this supported already? If not, any tips on how to make this work? I'm willing to add some code myself and rebuild the streaming jar.

--
Nick Jones



