Do the jobs run on the whole cluster or on a single rack?
If you write from a single rack, you will get something similar to what you
described, because the default placement policy is to put one replica on the
local node and the other two replicas on the same remote rack. It checks that
there is enough space available, but it does not try to balance the racks.
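
In case it helps to see that logic spelled out, below is a rough, standalone
Java sketch of what the placement amounts to for replication 3 when the
writer runs on a datanode. It is not the actual BlockPlacementPolicyDefault
code, and the node/rack names are made up:

import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Simplified, standalone illustration of default HDFS replica placement
 * for replication = 3 when the writer runs on a datanode. Not the real
 * BlockPlacementPolicyDefault code; node and rack names are hypothetical.
 */
public class PlacementSketch {

    public static void main(String[] args) {
        // Hypothetical cluster: two racks, the writing task runs on r1-n1.
        Map<String, String> nodeToRack = new LinkedHashMap<String, String>();
        nodeToRack.put("r1-n1", "/rack1");
        nodeToRack.put("r1-n2", "/rack1");
        nodeToRack.put("r2-n1", "/rack2");
        nodeToRack.put("r2-n2", "/rack2");

        String writer = "r1-n1";
        String localRack = nodeToRack.get(writer);
        List<String> replicas = new ArrayList<String>();

        // 1st replica: the local node (the node the writer runs on).
        replicas.add(writer);

        // 2nd replica: some node on a different rack than the writer.
        String second = null;
        for (Map.Entry<String, String> e : nodeToRack.entrySet()) {
            if (!e.getValue().equals(localRack)) {
                second = e.getKey();
                break;
            }
        }
        replicas.add(second);

        // 3rd replica: a different node on the SAME rack as the 2nd replica.
        String remoteRack = nodeToRack.get(second);
        for (Map.Entry<String, String> e : nodeToRack.entrySet()) {
            if (e.getValue().equals(remoteRack) && !e.getKey().equals(second)) {
                replicas.add(e.getKey());
                break;
            }
        }

        // Prints [r1-n1, r2-n1, r2-n2]: one replica local, two on the same
        // remote rack, so the remote rack fills up faster.
        System.out.println("Replica placement: " + replicas);
    }
}

The balancer evens out disk usage afterwards, but new writes follow this
policy again, which is why the skew comes back.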


On Thu, Aug 22, 2013 at 9:41 AM, Marc Sturlese <marc.sturl...@gmail.com> wrote:

> Hey there,
> I've set up rack awareness on my Hadoop cluster with replication 3. I have
> 2 racks and each contains 50% of the nodes.
> I can see that the blocks are spread across the 2 racks; the problem is
> that all nodes from one rack are storing 2 replicas and the nodes of the
> other rack just one. If I launch the hadoop balancer script, it will
> properly spread the replicas across the 2 racks, leaving all nodes with
> exactly the same available disk space, but after jobs have been running
> for hours the data becomes unbalanced again (rack1 having all nodes with
> less free disk space than all nodes from rack2).
>
> Any clue what's going on?
> Thanks in advance
>
