Edward Capriolo wrote:
'Cloud computing' is a hot term. According to the definition provided
by Wikipedia (http://en.wikipedia.org/wiki/Cloud_computing),
Hadoop+HBase+Lucene+Zookeeper fits some of the criteria, but not well.

Hadoop is scalable; with HOD (Hadoop On Demand) it is dynamically scalable.

I do not think (Hadoop+HBase+Lucene+Zookeeper) can be used for
'utility computing', as managing the stack and getting started is
quite a complex process.

Exactly, which is why the Apache Clouds proposal emphasises:

-Lightweight front end: low-wattage, stateless nodes for the web GUI, bonded to the back end

-Instrumentation for liveness and load monitoring. Hadoop has a lot of this and I'm trying to add more, but we want it everywhere.

-Resource Management: bringing up and tearing down nodes by asking the infrastructure. Some Apache projects have done this, but only for EC2 and only for their layer of the stack. You need something that keeps track of everything and acts in your interests, not those of the datacentre provider.

-Packaging for fully automated install/deploy on Linux systems (=rpm and deb)

-A development process in which the tools push the code out to a targeted infrastructure, even for test runs

Hadoop and friends are part of this: they are a very interesting foundation, but they are only part of the story.

Also, this stack runs best on a LAN with high-speed
interlinks. Historically, the "Cloud" is composed of WAN links. An
implication of Cloud Computing is that different services would be
running in different geographical locations, which is not how Hadoop is
normally deployed.

I believe 'Apache Grid Stack' would be a more fitting name.

http://en.wikipedia.org/wiki/Grid_computing

Grid computing (or the use of computational grids) is the application
of several computers to a single problem at the same time — usually to
a scientific or technical problem that requires a great number of
computer processing cycles or access to large amounts of data.

Classic Grid computing (OGSI/OGSA) is something I want to steer clear of. Historically, you end up in WS-* and computer management politics. Furthermore, OGSA never had a good use case except "rewrite your apps for the cloud and they will be better". They (let's be fair, we) also focused too much on CPU scheduling, not on storage.

Grid computing, as Wikipedia defines it, describes exactly what
Hadoop does. Without Amazon S3 and EC2, Hadoop does not fit well into
'cloud computing', IMHO.

To be precise: without a dynamic infrastructure provider. That is more than just AWS; it could be Sun/Oracle, IBM/Google, HP/Intel/Yahoo!, or it could be your ops team and Eucalyptus.

The other hardware/service vendors are working on this infrastructure. Apache doesn't work at that level, but if we provide the code to run on all of them, we give the users independence from any particular infrastructure provider.
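
To make the resource management point a bit more concrete, here is a rough sketch of what "asking the infrastructure" through a provider-neutral layer might look like. Everything in it is hypothetical: `InfrastructureProvider`, `InMemoryProvider` and `ResourceManager` are names made up for illustration, not part of Hadoop or of any Apache Clouds code, and the in-memory provider allocates nothing real. The idea is only that one interface sits between the manager and whichever infrastructure is underneath (EC2, Eucalyptus, your ops team), while the manager keeps track of every node for every layer of the stack.

```python
from abc import ABC, abstractmethod


class InfrastructureProvider(ABC):
    """Hypothetical interface: one implementation per infrastructure."""

    @abstractmethod
    def start_node(self, role: str) -> str:
        """Ask the infrastructure for a node and return its id."""

    @abstractmethod
    def stop_node(self, node_id: str) -> None:
        """Hand a node back to the infrastructure."""


class InMemoryProvider(InfrastructureProvider):
    """Stand-in provider for illustration only; allocates nothing real."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._counter = 0
        self.live = set()  # ids of nodes currently "running"

    def start_node(self, role: str) -> str:
        self._counter += 1
        node_id = f"{self.name}-{role}-{self._counter}"
        self.live.add(node_id)
        return node_id

    def stop_node(self, node_id: str) -> None:
        self.live.discard(node_id)


class ResourceManager:
    """Tracks the nodes of every role, whichever provider is underneath."""

    def __init__(self, provider: InfrastructureProvider) -> None:
        self.provider = provider
        self.nodes = {}  # role -> list of node ids, oldest first

    def scale_to(self, role: str, count: int) -> list:
        """Bring nodes up or tear them down until `count` run for `role`."""
        running = self.nodes.setdefault(role, [])
        while len(running) < count:
            running.append(self.provider.start_node(role))
        while len(running) > count:
            # Tear down the most recently started node first.
            self.provider.stop_node(running.pop())
        return list(running)
```

Swapping infrastructure then means passing a different provider to `ResourceManager`; the code that calls `scale_to("datanode", 3)` would stay the same.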
