On 7/12/2011 7:20 PM, Allen Wittenauer wrote:
On Jul 12, 2011, at 10:27 AM, Virajith Jalaparti wrote:
I agree that the scheduler has less leeway when the replication factor is
1. However, I would still expect the number of data-local tasks to be more
than 10% even when the replication factor is 1.
How did you load your data?
Did you load it from outside the grid or from one of the datanodes? If
you loaded from one of the datanodes, you'll basically have no real locality,
especially with a rep factor of 1.
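
To illustrate the point about loading from a single datanode, here is a minimal sketch in Python. It assumes HDFS's default placement, which puts the first (and here only) replica on the writing node when the writer is itself a datanode. The scheduler is a toy model, not Hadoop's actual one: nodes take turns requesting one map task each, preferring a block they hold. The helper name `local_fraction`, the 10-node cluster, and the 100-block input are all made up for illustration.

```python
from collections import Counter


def local_fraction(block_hosts, num_nodes):
    """Return the fraction of tasks that run on the node holding their block.

    block_hosts: one entry per block, giving the single node that stores it
                 (replication factor 1).
    Toy scheduler (an assumption, not Hadoop's real one): nodes take turns
    asking for one task; a node takes a block it stores if any is left,
    otherwise it takes a block from the node with the most blocks remaining.
    """
    remaining = Counter(block_hosts)  # node -> unprocessed blocks stored there
    total = len(block_hosts)
    left = total
    local = 0
    node = 0
    while left > 0:
        if remaining[node] > 0:
            remaining[node] -= 1      # data-local task
            local += 1
        else:
            busiest = max(remaining, key=remaining.get)
            remaining[busiest] -= 1   # non-local task, block read over the network
        left -= 1
        node = (node + 1) % num_nodes
    return local / total


NUM_NODES = 10    # hypothetical cluster size
NUM_BLOCKS = 100  # hypothetical input size in blocks

# Case 1: everything was written from datanode 0, so every block lives there.
single_writer = [0] * NUM_BLOCKS
# Case 2: blocks are spread evenly across all datanodes.
spread_out = [i % NUM_NODES for i in range(NUM_BLOCKS)]

print(local_fraction(single_writer, NUM_NODES))  # 0.1
print(local_fraction(spread_out, NUM_NODES))     # 1.0
```

In this model, a single-writer load with a rep factor of 1 leaves only about 1/N of the tasks data-local on an N-node cluster, while an evenly spread load lets every task run next to its block. A real cluster will land somewhere in between, depending on slots per node and scheduler behavior.
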
I create the data using the randomwriter in the Hadoop examples. I
essentially run the example at http://wiki.apache.org/hadoop/Sort with the
necessary parameters:

  % bin/hadoop jar hadoop-*-examples.jar randomwriter rand
  % bin/hadoop jar hadoop-*-examples.jar sort rand rand-sort

-Virajith