Edward Capriolo wrote:
'cloud computing' is a hot term. According to the definition provided
by Wikipedia (http://en.wikipedia.org/wiki/Cloud_computing),
Hadoop+HBase+Lucene+Zookeeper fits some of the criteria, but not well.
Hadoop is scalable; with HOD (Hadoop On Demand) it is dynamically scalable.
I do not think (Hadoop+HBase+Lucene+Zookeeper) can be used for
'utility computing', as managing the stack and getting started is
quite a complex process.
Exactly. Which is why the Apache Clouds proposal emphasises:
-Lightweight front end: low-wattage, stateless nodes for web GUI, bonded
to the back end
-Instrumentation for liveness and load monitoring. Hadoop has a lot of
this, I'm trying to add more, but we want it everywhere.
-Resource Management: bringing up and tearing down nodes by asking the
infrastructure. Some Apache projects have done this but only for EC2 and
only for their layer of the stack. You need something that keeps track
of everything and acts in your interests, not those of the datacentre
provider.
-Packaging for fully automated install/deploy on Linux systems (=rpm and
deb)
-A development process in which the tools push the code out to a
targeted infrastructure, even for test runs
Hadoop and friends are part of this; they are a very interesting
foundation, but they are only part of the story.
Also, this stack runs best on a LAN with high-speed interlinks.
Historically, the "Cloud" is composed of WAN links. An implication of
Cloud Computing is that different services would be running in
different geographical locations, which is not how Hadoop is normally
deployed.
I believe 'Apache Grid Stack' would be more fitting.
http://en.wikipedia.org/wiki/Grid_computing
Grid computing (or the use of computational grids) is the application
of several computers to a single problem at the same time — usually to
a scientific or technical problem that requires a great number of
computer processing cycles or access to large amounts of data.
Classic Grid computing (OGSI/OGSA) is something I want to steer clear
of. Historically, you end up in WS-* and computer management politics.
Furthermore, OGSA never had a good use case except "rewrite your apps
for the cloud and they will be better". They (let's be fair, we) also
focused too much on CPU scheduling, not on storage.
Grid computing, per the Wikipedia definition, describes exactly what
Hadoop does. Without Amazon S3 and EC2, Hadoop does not fit well into
'cloud computing', IMHO.
To be precise: without a dynamic infrastructure provider, and that means
more than just AWS. It could be Sun/Oracle, IBM/Google, HP/Intel/Yahoo!,
or it could be your ops team and Eucalyptus.
The other hardware/service vendors are working on this infrastructure.
Apache doesn't work at that level, but if we provide the code to run on
all of them, we give the users independence from any particular
infrastructure provider.
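
To make the provider-independence point concrete, here is a rough sketch of what "asking the infrastructure" for nodes could look like behind a common interface. Everything in it is hypothetical: InfrastructureProvider, InMemoryProvider and ResourceManager are names invented for illustration, not part of the Apache Clouds proposal or of any vendor API. A real back end (EC2, Eucalyptus, your ops team's tooling) would replace the in-memory stand-in.

```python
from abc import ABC, abstractmethod
from itertools import count


class InfrastructureProvider(ABC):
    """Hypothetical interface; not an existing Apache or vendor API."""

    @abstractmethod
    def start_node(self, role):
        """Ask the infrastructure for a node and return its identifier."""

    @abstractmethod
    def stop_node(self, node_id):
        """Release a node back to the infrastructure."""


class InMemoryProvider(InfrastructureProvider):
    """Stand-in for a real back end; it only records which nodes are 'running'."""

    def __init__(self, name):
        self.name = name
        self._ids = count(1)
        self.running = set()

    def start_node(self, role):
        node_id = f"{self.name}-{role}-{next(self._ids)}"
        self.running.add(node_id)
        return node_id

    def stop_node(self, node_id):
        self.running.remove(node_id)


class ResourceManager:
    """Keeps track of every node it started, whichever provider supplied it."""

    def __init__(self, provider):
        self.provider = provider
        self.nodes = {}  # node id -> role

    def scale_to(self, role, target):
        """Bring nodes up or tear them down until `target` nodes have this role."""
        current = [n for n, r in self.nodes.items() if r == role]
        # Start nodes if we are below the target (no-op when already at or above it).
        for _ in range(target - len(current)):
            self.nodes[self.provider.start_node(role)] = role
        # Stop the most recently started nodes if we are above the target.
        for node_id in current[target:]:
            self.provider.stop_node(node_id)
            del self.nodes[node_id]


manager = ResourceManager(InMemoryProvider("lab"))
manager.scale_to("datanode", 3)
manager.scale_to("datanode", 1)
print(sorted(manager.nodes))
```

The design choice is that ResourceManager only ever talks to the abstract interface, so moving to a different infrastructure provider would mean writing one new provider class rather than touching the code that tracks the nodes.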