That's quite a large cluster and I'm not sure if it's even possible with the current implementation (we are already seeing problems when starting clusters with 20+ nodes).
I think in order to get there we need to start by working on jclouds (http://www.jclouds.org/documentation/reference/pool-design) and by optimising the Apache Hadoop setup scripts for this size. Some of the questions we need to address are:

* How do we provide the needed artefacts to a large number of machines being set up at the same time? In this case the blob store should be good enough (see the first sketch below).

* How do we start the region servers and the task trackers without the risk of DDoS-ing the namenode and the jobtracker? (see the staggered-startup sketch below)

* Is it a good idea to implement this as a cluster resize? (e.g. start with a smaller cluster and keep adding region servers and task trackers until you reach the desired size)

* Should Whirr implement policies for repairing failed nodes? I would expect to see failures at this scale quite often. (see the repair-loop sketch below)
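To make the first point concrete, here is a minimal sketch of pushing the setup artefacts to a blob store with the jclouds BlobStore API. The provider, credentials, container and file names are placeholders, not anything we ship today:

import java.io.File;

import org.jclouds.blobstore.BlobStore;
import org.jclouds.blobstore.BlobStoreContext;
import org.jclouds.blobstore.BlobStoreContextFactory;
import org.jclouds.blobstore.domain.Blob;

public class UploadArtifacts {
    public static void main(String[] args) {
        // Placeholder provider and credentials -- substitute real values.
        BlobStoreContext context = new BlobStoreContextFactory()
            .createContext("aws-s3", "identity", "credential");
        try {
            BlobStore store = context.getBlobStore();
            store.createContainerInLocation(null, "whirr-artifacts");

            // Upload the tarball once; every node then downloads it from
            // the blob store instead of hitting a single origin server.
            Blob blob = store.blobBuilder("hadoop.tar.gz")
                .payload(new File("/tmp/hadoop.tar.gz"))
                .build();
            store.putBlob("whirr-artifacts", blob);
        } finally {
            context.close();
        }
    }
}

The setup scripts would then fetch from the container, so a thousand concurrent downloads hit the provider's storage service rather than any machine of ours.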
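For the second point, the idea would be to register workers in small batches with some random jitter, instead of all at once. A rough sketch of what I mean -- the batch size, the delays and the startWorker helper are all assumptions, not an existing Whirr API:

import java.util.List;
import java.util.Random;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class StaggeredStart {

    // Hypothetical helper: start the region server / task tracker on one
    // host (e.g. over SSH); the mechanism is not the point here.
    static void startWorker(String host) {
        System.out.println("starting services on " + host);
    }

    /** Start workers in small batches with jitter instead of all at once. */
    static void startAll(List<String> hosts) throws InterruptedException {
        final int batchSize = 25;               // tune to what the masters can absorb
        final long pauseBetweenBatches = 30000; // ms
        final Random jitter = new Random();

        ExecutorService pool = Executors.newFixedThreadPool(batchSize);
        for (int i = 0; i < hosts.size(); i += batchSize) {
            int end = Math.min(i + batchSize, hosts.size());
            for (final String host : hosts.subList(i, end)) {
                pool.submit(new Runnable() {
                    public void run() {
                        try {
                            // random delay so registrations from one
                            // batch don't all arrive in the same second
                            Thread.sleep(jitter.nextInt(5000));
                        } catch (InterruptedException e) {
                            Thread.currentThread().interrupt();
                        }
                        startWorker(host);
                    }
                });
            }
            Thread.sleep(pauseBetweenBatches); // let the masters settle
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS);
    }
}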
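And for the repair question, a policy could be as simple as a loop that swaps out unhealthy workers. Again, isHealthy, launchReplacement and decommission are hypothetical helpers -- Whirr has nothing like this today; this is only to show the shape of it:

import java.util.List;

public class RepairPolicy {

    // Hypothetical helpers, standing in for health checks and the
    // provision/teardown path we would have to build.
    static boolean isHealthy(String host) { return true; }
    static String launchReplacement(String role) { return "new-host"; }
    static void decommission(String host) {}

    /** Periodically replace dead workers to keep the cluster at size. */
    static void repairLoop(List<String> workers, String role) throws InterruptedException {
        while (true) {
            for (int i = 0; i < workers.size(); i++) {
                String host = workers.get(i);
                if (!isHealthy(host)) {
                    decommission(host);
                    workers.set(i, launchReplacement(role));
                }
            }
            Thread.sleep(60000); // check once a minute
        }
    }
}

Note that the same incremental launch path would give us the resize approach from the third question for free: growing from a small initial cluster and replacing failed nodes are the same operation.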
I'm sure that there are many other questions relevant here. Let me know what you think. What's the usage scenario you have in mind?

-- Andrei Savu

On Thu, Nov 3, 2011 at 4:30 AM, Edward J. Yoon <[email protected]> wrote:

> Hi,
>
> When user want to deploy Hadoop 1 thousand VMs cluster using Whirr and
> JClouds, how many API calls e.g., auth + VM creation + firewall
> settings, .., etc can be made?
>
> --
> Best Regards, Edward J. Yoon
> @eddieyoon