Hi Dave.

Thank you for your reply!

I have checked ${dfs.data.dir}/tmp, and the tmp files are there while the job
is running. However, it seems that the tmp files on each node are the same.
In other words, the whole cluster appears to be sharing the same tmp files.
This looks strange, because each node should be processing its own part of
the data. Do you have any ideas on this point?
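To check whether the files really are identical in content (rather than just sharing names), one rough approach is to compare checksums across nodes. The sketch below uses two local directories as stand-ins for ${dfs.data.dir}/tmp on two datanodes; on a real cluster you would ssh to each node instead. The paths and file names here are invented for illustration:

```shell
# Mock up the tmp dirs of two datanodes (hypothetical stand-ins).
mkdir -p /tmp/nodeA_tmp /tmp/nodeB_tmp
echo "map output from task 0" > /tmp/nodeA_tmp/spill0.out
echo "map output from task 1" > /tmp/nodeB_tmp/spill0.out

# Same file name on both nodes, but are the contents the same?
sumA=$(md5sum /tmp/nodeA_tmp/spill0.out | cut -d' ' -f1)
sumB=$(md5sum /tmp/nodeB_tmp/spill0.out | cut -d' ' -f1)

if [ "$sumA" = "$sumB" ]; then
    result="identical"
else
    # Distinct map tasks should normally produce distinct spill files.
    result="different"
fi
echo "$result"
```

If the checksums really do match on every node, that would be worth digging into; matching names alone only mean each node builds the same directory layout.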

Best regards,
Starry

/* Tomorrow is another day. So is today. */


On Sat, Sep 26, 2009 at 15:07, dave bayer <da...@cloudfactory.org> wrote:

>
> On Sep 25, 2009, at 11:34 PM, Starry SHI wrote:
>
>  Hi.
>>
>> I am wondering where the temp files (intermediate files) are stored. They
>> should be located in hadoop.tmp.dir by default, right? Why can't I find
>> them in either the local file system or HDFS?
>>
>
> You might look under ${dfs.data.dir}/tmp. Granted, I've not consulted the
> code to verify that is how the path is built, but that is where I've seen
> them on my cluster...
>
>  Another question is about the replication of the intermediate files. By
>> default, will the intermediate (tmp) files be written to HDFS?
>>
>
> No, they live on the node that processed the map task. You wouldn't
> want to spend the cycles/time replicating this data out to other nodes
> (and then cleaning it up) when you can simply rerun the task if the
> node holding the data happens to go down (which is unlikely).
>
> dave bayer
>
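For anyone searching the archives later: the intermediate map outputs are actually governed by mapred.local.dir rather than dfs.data.dir. A minimal sketch of the relevant 0.20-era properties follows; the values shown are the shipped defaults, not recommendations:

```xml
<!-- core-site.xml: base directory that other temp paths build on -->
<property>
  <name>hadoop.tmp.dir</name>
  <value>/tmp/hadoop-${user.name}</value>
</property>

<!-- mapred-site.xml: where map tasks spill intermediate output,
     on the local disk of the node running the task (not HDFS) -->
<property>
  <name>mapred.local.dir</name>
  <value>${hadoop.tmp.dir}/mapred/local</value>
</property>
```

Since mapred.local.dir defaults to a path under hadoop.tmp.dir, looking under hadoop.tmp.dir on each worker node's local filesystem is a reasonable place to start.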
