I saw some puzzling behavior tonight when running a MapReduce program I wrote.
It performed the mapping just fine and began to shuffle. It got to 33% reduce completion (end of shuffle) and then the task failed, claiming that <output_dir>/_temporary had been deleted.

I didn't touch HDFS while this was going on. I reran the job several times, and the failure repeated twice more.

Puzzlingly, I was running bin/hadoop fs -ls <output_dir> periodically in another window. The _temporary directory got created just fine, but at some point after shuffling began, it was removed. I tried to see if I could manually race this, so I did a mkdir _temporary, and the job proceeded just fine.

Even more bizarre, the removal of the _temporary directory did not occur on any subsequent MR jobs (executions of the same, unmodified program). So I can't reproduce the bug.

This is on 0.18.2. It went away, so I'm not *too* concerned, but I'd rather not deal with heisenbugs if at all possible.

So: has anyone seen this behavior? Have you figured out how to reproduce it or, even better, prevent it?

Thanks,
- Aaron