On Sep 25, 2009, at 11:34 PM, Starry SHI wrote:
> Hi.
> I am wondering where the temp files (intermediate files) are stored. They
> should be located in hadoop.tmp.dir by default, right? Why can't I find
> them in either the local file system or HDFS?
You might look under ${dfs.data.dir}/tmp. Granted, I haven't consulted the
code to verify that this is how the path is built, but that is where I've
seen them on my cluster...
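Since the path is assembled from ${...} property references, here is a minimal sketch of how such references expand into a concrete directory. The default values below are assumptions taken from the 0.20-era *-default.xml files and may be overridden in your site configuration; the user name "hadoop" is made up, and expand() is a hypothetical helper, not a Hadoop API. In those defaults, mapred.local.dir is the property described as holding MapReduce intermediate data, so it may be worth checking that directory on each node as well.

```python
import re

# Assumed defaults (0.20-era *-default.xml); site files may override them.
DEFAULTS = {
    "hadoop.tmp.dir": "/tmp/hadoop-${user.name}",
    "dfs.data.dir": "${hadoop.tmp.dir}/dfs/data",
    "mapred.local.dir": "${hadoop.tmp.dir}/mapred/local",
}

_VAR = re.compile(r"\$\{([^}]+)\}")


def expand(value, props, max_depth=20):
    """Hypothetical helper: substitute ${name} references until none change."""
    for _ in range(max_depth):
        # Unknown names are left as-is so the gap stays visible.
        new = _VAR.sub(lambda m: props.get(m.group(1), m.group(0)), value)
        if new == value:
            return value
        value = new
    raise ValueError("too many nested substitutions")


# "hadoop" is an illustrative user name, not a required value.
props = {**DEFAULTS, "user.name": "hadoop"}
for key in ("dfs.data.dir", "mapred.local.dir"):
    print(key, "->", expand(props[key], props))
```

Running this with your own values substituted shows which local directories to inspect on a task node; the same idea applies to any other property that refers to ${hadoop.tmp.dir}.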
> Another question is about the replication of the intermediate files. By
> default, will the intermediate (tmp) files be written to HDFS?
No, they live on the node that processed the map task. You wouldn't want
to spend the cycles/time replicating this data out to other nodes (and
then cleaning it up) when you can simply rerun the task if the node
holding the data happens to go down (unlikely).
dave bayer