Re: What do we call Hadoop+HBase+Lucene+Zookeeper+etc....

Steve Loughran Wed, 06 May 2009 03:01:29 -0700

Edward Capriolo wrote:

'cloud computing' is a hot term. According to the definition provided
by Wikipedia, http://en.wikipedia.org/wiki/Cloud_computing,
Hadoop+HBase+Lucene+Zookeeper fits some of the criteria, but not well.


Hadoop is scalable, with HOD it is dynamically scalable.

I do not think (Hadoop+HBase+Lucene+Zookeeper) can be used for
'utility computing', as managing the stack and getting started is
quite a complex process.


Exactly. Which is why the Apache Clouds proposal emphasises

-Lightweight front end: low Wattage, stateless nodes for web GUI, bonded to the back end

-instrumentation for liveness and load monitoring. Hadoop has a lot of this, I'm trying to add more, but we want it everywhere.

-Resource Management: bringing up and tearing down nodes by asking the infrastructure. Some Apache projects have done this but only for EC2 and only for their layer of the stack. You need something that keeps track of everything and acts in your interests, not those of the datacentre provider

-Packaging for fully automated install/deploy on Linux systems (=rpm and deb)

-A development process in which the tools push the code out to a targeted infrastructure even for test runs

Hadoop and friends are part of this; they are a very interesting foundation, but they are only part of the story


Also this stack is best running on LAN network with high speed
interlinks. Historically the "Cloud" is composed of WAN links. An
implication of Cloud Computing is that different services would be
running in different geographical locations which is not how hadoop is
normally deployed.

I believe 'Apache Grid Stack' would be more fitting.

http://en.wikipedia.org/wiki/Grid_computing

Grid computing (or the use of computational grids) is the application
of several computers to a single problem at the same time — usually to
a scientific or technical problem that requires a great number of
computer processing cycles or access to large amounts of data.

Classic Grid computing - OGSI/OGSA is something I want to steer clear of. Historically, you end up in WS-* and computer management politics. Furthermore, OGSA never had a good use case except "rewrite your apps for the cloud and they will be better". They (let's be fair, we) also focused too much on CPU scheduling, not on storage.

Grid computing via the Wikipedia definition describes exactly what
hadoop does. Without amazon S3 and EC2 hadoop does not fit well into a
'cloud computing' IMHO

To be precise: without a dynamic infrastructure provider that is more than just AWS: it could be Sun/Oracle, IBM/google, HP/Intel/Yahoo!, it could be your ops team and Eucalyptus.

The other hardware/service vendors are working on this infrastructure. Apache doesn't work at that level, but if we provide the code to run on all of them, we give the users independence from a particular infrastructure provider
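
To make the provider-independence point concrete, here is a rough sketch of what "asking the infrastructure" through a neutral interface might look like. Everything in it is hypothetical: InfrastructureProvider, InMemoryProvider, ResourceManager and scale_to are illustrative names, not code from Hadoop or from any Apache proposal, and the in-memory provider merely stands in for EC2, Eucalyptus or an ops team.

```python
from abc import ABC, abstractmethod


class InfrastructureProvider(ABC):
    """Hypothetical provider-neutral interface (an assumption, not a real API)."""

    @abstractmethod
    def start_node(self, role: str) -> str:
        """Ask the infrastructure for a node and return its id."""

    @abstractmethod
    def stop_node(self, node_id: str) -> None:
        """Hand a node back to the infrastructure."""


class InMemoryProvider(InfrastructureProvider):
    """Stand-in backend; a real one would call a vendor's API."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._next = 0
        self.nodes = {}  # node id -> role, as seen by the provider

    def start_node(self, role: str) -> str:
        node_id = f"{self.name}-{self._next}"
        self._next += 1
        self.nodes[node_id] = role
        return node_id

    def stop_node(self, node_id: str) -> None:
        del self.nodes[node_id]


class ResourceManager:
    """Keeps its own record of every node it asked for, on the user's behalf."""

    def __init__(self, provider: InfrastructureProvider) -> None:
        self.provider = provider
        self.tracked = {}  # node id -> role, as seen by the user

    def scale_to(self, role: str, count: int) -> list:
        if count < 0:
            raise ValueError("count must be non-negative")
        current = [n for n, r in self.tracked.items() if r == role]
        # Bring up nodes if we have too few for this role.
        for _ in range(count - len(current)):
            self.tracked[self.provider.start_node(role)] = role
        # Tear down the newest surplus nodes if we have too many.
        for node_id in current[count:]:
            self.provider.stop_node(node_id)
            del self.tracked[node_id]
        return sorted(n for n, r in self.tracked.items() if r == role)
```

The code that decides how many nodes to run only ever sees the interface, so swapping InMemoryProvider("ec2") for InMemoryProvider("eucalyptus") changes nothing above it. That is the independence being argued for; the real work is in writing the backends.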
