This is puzzling me. With a mapper producing roughly 400 MB of output, which of the following is supposed to be faster?
1) Output collector: which will write to a local file and then copy it to HDFS, since I don't have reducers.
2) Open a unique local file inside "mapred.local.dir" for each mapper.

I expected (2) to be faster, but (1) was actually faster. Can someone explain why?

Thanks,
Mark