I'm looking at alternative policies for task and data placement. As a
first step, I'd like to be able to observe what Hadoop is doing without
modifying our cluster's software. We saw that the datanodes log every
block that is read from them, but we didn't see any way to map from
those block names to a (filename, chunk) pair.
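[Editorial note: one way to recover that mapping offline is to dump the namespace with `hadoop fsck <path> -files -blocks -locations` and join its block list against the datanode logs. The sketch below is a hypothetical illustration of that join; the log and fsck line formats shown here are simplified assumptions, not Hadoop's exact output.]

```python
import re

def parse_fsck(fsck_lines):
    """Build {block_id: (filename, chunk_index)} from simplified fsck output.

    Assumed (illustrative) format:
      /data/input.txt 134217728 bytes, 2 block(s):
      0. blk_-111 len=67108864 repl=3
      1. blk_-222 len=67108864 repl=3
    """
    mapping = {}
    current_file = None
    for line in fsck_lines:
        # A per-file header line announces which file the following blocks belong to.
        m = re.match(r"^(\S+) \d+ bytes, \d+ block\(s\):", line)
        if m:
            current_file = m.group(1)
            continue
        # A block line gives the chunk index within the file and the block ID.
        m = re.match(r"^(\d+)\. (blk_-?\d+)", line.strip())
        if m and current_file is not None:
            mapping[m.group(2)] = (current_file, int(m.group(1)))
    return mapping

def resolve(block_id, mapping):
    """Map a block ID seen in a datanode read log to (filename, chunk), if known."""
    return mapping.get(block_id)
```

With the assumed input above, `resolve("blk_-222", m)` would return `("/data/input.txt", 1)`; block IDs absent from the fsck dump resolve to `None`.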
Doug Cutting wrote:
A task may read from more than one block. For example, in
line-oriented input, lines frequently cross block boundaries. And a
block may be read from more than one host. For example, if a datanode
dies midway through providing a block, the client will switch to using
a different datanode. So the mapping is not simple. This information
is also not, as you inferred, available to applications. Why do you
need this? Do you have a compelling reason?
Doug
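[Editorial note: the boundary-crossing point above can be made concrete with a little arithmetic. A minimal sketch, assuming a fixed block size (64 MiB was the HDFS default of that era); `blocks_for_range` is a hypothetical helper, not a Hadoop API.]

```python
BLOCK_SIZE = 64 * 1024 * 1024  # assumed HDFS block size in bytes

def blocks_for_range(start, end, block_size=BLOCK_SIZE):
    """Return the list of block indices a byte range [start, end) touches."""
    return list(range(start // block_size, (end - 1) // block_size + 1))

# A line occupying bytes [67108860, 67108870) straddles the first block
# boundary, so reading it requires both block 0 and block 1 -- and those
# two blocks may well be served by different datanodes.
```

This is why a single map task's reads cannot be attributed to one (block, host) pair in general.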
James Cipar wrote:
Is there any way to determine which replica of each chunk is read by
a map-reduce program? I've been looking through the Hadoop code, and
it seems like it tries to hide those kinds of details from the
higher-level API. Ideally, I'd like the host the task was running on,
the file name and chunk number, and the host the chunk was read from.