On Thu, Mar 12, 2009 at 8:05 AM, Mat Hofschen <[email protected]> wrote:
> ...
>
> I can see that our hardware will not be sufficient. For now this is a
> testlab setup and will have to be upgraded.
>

Unless you do as Jon suggests and split your cluster. Apart from Jon's suggestion, you might also consider running the regionservers and datanodes together and tasktrackers elsewhere.

> One more question to understand the scenario better:
> I have 120 reduce jobs running on all nodes and there is only one node that
> hosts the initial region. Then all 120 reduce jobs are trying to write to
> this one machine?

Yes.

> What happens then if the region is split? Do some of the
> Reduce Jobs notice that write ops go to a new region, or are they still
> writing to the first region which then redirects traffic?

All reducers notice the split and will write to the appropriate region. For example, on first split, assuming your MR job sorted rows, half the load should go to the first region and the other half to the second.

St.Ack
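
P.S. To make the split behaviour concrete, here is a toy sketch in Python. It is not the HBase client API: `RegionTable`, `locate`, `split` and `write_rows` are hypothetical names invented for illustration, and the row keys and the split point `row0500` are made up. It only assumes that regions cover contiguous, sorted key ranges and that each write is routed to the region whose range contains the row key.

```python
from bisect import bisect_right
from collections import Counter


class RegionTable:
    """Toy stand-in for a table's region map (hypothetical, not HBase code)."""

    def __init__(self):
        # One initial region whose start key "" covers every row.
        self.start_keys = [""]

    def locate(self, row_key):
        # Index of the region with the greatest start key <= row_key.
        return bisect_right(self.start_keys, row_key) - 1

    def split(self, split_key):
        # Add a region boundary; rows >= split_key move to the new region.
        if split_key not in self.start_keys:
            idx = bisect_right(self.start_keys, split_key)
            self.start_keys.insert(idx, split_key)


def write_rows(table, rows):
    # Count how many writes land on each region index.
    counts = Counter()
    for row in rows:
        counts[table.locate(row)] += 1
    return counts


# Made-up, evenly spread row keys standing in for the reducers' output.
rows = [f"row{i:04d}" for i in range(1000)]

table = RegionTable()
before = write_rows(table, rows)   # every write hits the single region

table.split("row0500")             # assumed midpoint split
after = write_rows(table, rows)    # writes now divide between two regions

print(dict(before))
print(dict(after))
```

With evenly spread keys and a split at the midpoint, the writes divide evenly between the two regions. In a real cluster the balance depends on where the split key falls and on how the reducers' keys are distributed.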
