Hi all,

I'm trying to write M/R jobs that output more than one "type" of data. The 
ideal solution would be the MultipleOutputs feature of MapReduce, but in our 
current production version, CDH3 (0.20.2), that support is broken.

So I tried to simulate MultipleOutputs. In the Reducer's setup() I open an HDFS 
output stream, each reduce() call writes to that stream, and cleanup() closes 
it. Each output file name includes the task attempt ID. This works well. 
Speculative execution is disabled, but occasionally a reduce task attempt fails 
and is retried, so I end up with two files from the same reducer covering the 
same data. Is there any way to find out which task attempts were successful, so 
that I can delete the unneeded files once the job has succeeded? I'm using the 
new MapReduce API. Alternatively, is there a better way to achieve this?
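
A minimal sketch of the post-job cleanup described above, using only the Java standard library. It assumes side files are named like "typeA-attempt_201101011200_0001_r_000003_0" (the "typeA-" prefix and the class/method names are hypothetical) and that the set of successful attempt IDs has already been obtained somehow; how to get that set is exactly the open question and is not shown here. It only decides which file names are stale; actually deleting them from HDFS is left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical helper: given side-file names that embed the writing task
 * attempt ID, return those written by attempts NOT known to have succeeded.
 */
public class SideFileCleanup {
    // Task attempt IDs look like attempt_<jobtracker>_<job>_<m|r>_<task>_<attempt>.
    private static final Pattern ATTEMPT_ID =
            Pattern.compile("attempt_\\d+_\\d+_[mr]_\\d+_\\d+");

    public static List<String> staleFiles(List<String> fileNames,
                                          Set<String> successfulAttempts) {
        List<String> stale = new ArrayList<>();
        for (String name : fileNames) {
            Matcher m = ATTEMPT_ID.matcher(name);
            // Files with no recognizable attempt ID are left alone.
            if (m.find() && !successfulAttempts.contains(m.group())) {
                stale.add(name);
            }
        }
        return stale;
    }

    public static void main(String[] args) {
        List<String> files = Arrays.asList(
                "typeA-attempt_201101011200_0001_r_000003_0",  // failed attempt
                "typeA-attempt_201101011200_0001_r_000003_1",  // successful retry
                "typeA-attempt_201101011200_0001_r_000004_0");
        // Assumed to come from elsewhere (e.g. job tracking info); hard-coded here.
        Set<String> ok = new HashSet<>(Arrays.asList(
                "attempt_201101011200_0001_r_000003_1",
                "attempt_201101011200_0001_r_000004_0"));
        // Prints [typeA-attempt_201101011200_0001_r_000003_0]
        System.out.println(staleFiles(files, ok));
    }
}
```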

Best,
Vanja

Komadinovic Vanja
+381 (64) 296 03 43
vanja...@gmail.com