Re: Lack of data locality in Hadoop-0.20.2

Virajith Jalaparti Tue, 12 Jul 2011 11:35:50 -0700

On 7/12/2011 7:20 PM, Allen Wittenauer wrote:

On Jul 12, 2011, at 10:27 AM, Virajith Jalaparti wrote:

I agree that the scheduler has less leeway when the replication factor is
1. However, I would still expect more than 10% of the tasks to be data-local
even when the replication factor is 1.

        How did you load your data?

        Did you load it from outside the grid or from one of the datanodes?  If 
you loaded from one of the datanodes, you'll basically have no real locality, 
especially with a rep factor of 1.
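
As a rough illustration of the point about load location (a sketch, not HDFS
code): it assumes the default placement rule that the first replica of a block
goes on the writing client's own node when that client is a datanode, and it
approximates placement for an off-grid client as round-robin, where real HDFS
would pick nodes at random. The helpers place_blocks and busiest_node_share are
hypothetical names introduced only for this example.

```python
from collections import Counter


def place_blocks(num_blocks, nodes, writer=None):
    """Return the node holding each block's single replica (rep factor 1).

    Assumption: if the writer is itself a datanode, every block lands on it.
    Otherwise blocks are spread round-robin, a deterministic stand-in for
    the random node choice HDFS makes for an off-grid client.
    """
    if writer in nodes:
        return [writer] * num_blocks
    return [nodes[i % len(nodes)] for i in range(num_blocks)]


def busiest_node_share(placement):
    """Fraction of all blocks stored on the most heavily loaded node."""
    counts = Counter(placement)
    return max(counts.values()) / len(placement)


nodes = [f"dn{i}" for i in range(10)]

# Load run from a datanode: all 100 blocks end up on dn0.
from_datanode = place_blocks(100, nodes, writer="dn0")

# Load run from outside the grid: blocks are spread over all ten nodes.
from_outside = place_blocks(100, nodes, writer=None)

print(busiest_node_share(from_datanode))  # 1.0
print(busiest_node_share(from_outside))   # 0.1
```

With every block on one node, only tasks scheduled on that node can be
data-local, which is why the way the data was loaded matters here.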

I create the data using the randomwriter in the Hadoop examples. I essentially run the example at http://wiki.apache.org/hadoop/Sort with the necessary parameters:

% bin/hadoop jar hadoop-*-examples.jar randomwriter rand
% bin/hadoop jar hadoop-*-examples.jar sort rand rand-sort


-Virajith
