On 2/2/10 9:02 AM, zaki rahaman wrote:
Most of your questions are easily answered by taking a look at the
documentation, FAQs, and some smart Googling/Yahooing/Binging.
1. The main Hadoop project consists of two major components: HDFS (Hadoop
Distributed File System) and MapReduce.
2. Not sure what you mean by your second question.
3. MapReduce is simply a framework for doing distributed computation. And
again, I don't understand what you mean by "clubbing 'Cloud Computing,
Hadoop and Webhosting'".
4. From my understanding, HBase and Hypertable are two different
implementations/approaches to solving the problem of having a low-latency
distributed table system similar to BigTable. One major difference is that
HBase is implemented in Java and built to work on top of HDFS. I don't know
too much about Hypertable other than that it's written in C++.
5. Again, Google is your friend. Most map/reduce implementations are pretty
general purpose data processing tasks (aggregations, sorting, filters, etc.)
that aren't specific to search.
6. See #1. MapReduce is one of the major components of the Hadoop Project
and the solution to doing a lot of the data processing.
7. Yes, of course. The VMware image that Cloudera provides is a good place
to start. I would also watch their videos and presentations.
8. Again, I am not all that familiar with HBase but my understanding is that
this is handled by the regionserver/ZooKeeper setup although I do not know
the details.
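
To make item 5 concrete, here is a minimal single-process sketch of the
map / shuffle / reduce pattern applied to a non-search task: counting hits
per URL in a web server log. This is plain Python, not Hadoop's API; the
function names (map_phase, shuffle, reduce_phase, run_job) and the
"METHOD URL STATUS" log format are assumptions made up for illustration.
In a real Hadoop job the framework does the grouping step and runs the map
and reduce functions on many machines.

```python
from collections import defaultdict


def map_phase(line):
    """Emit one (url, 1) pair for a log line of the assumed form 'METHOD URL STATUS'."""
    _method, url, _status = line.split()
    yield url, 1


def shuffle(pairs):
    """Group values by key; a real framework does this between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Collapse all values seen for one key into a single total."""
    return key, sum(values)


def run_job(lines):
    """Run map, shuffle and reduce in sequence over an iterable of log lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    groups = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in groups.items())


if __name__ == "__main__":
    log = [
        "GET /index.html 200",
        "GET /about.html 200",
        "GET /index.html 304",
    ]
    print(run_job(log))
```

Because each log line is mapped independently and each key is reduced
independently, both phases can be spread across machines, which is where
the parallelism comes from.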
On Tue, Feb 2, 2010 at 11:43 AM, nijil<[email protected]> wrote:
I have read some basic material about Hadoop, but I have a few doubts.
Mind you, I am a beginner.
1: So is Hadoop only a file system?
2: Can HBase be used in place of other databases on other platforms (e.g.
Java)?
For more HBase-related questions, please post to
[email protected].
To add to what zaki mentioned: the Hadoop project consists of the
data structure (HDFS, similar to GFS) and the algorithm (MapReduce,
based on the publicly available paper by Dean and Ghemawat).
HBase, Hypertable, etc. fall in the realm of column-oriented
databases. While it is tempting to use HBase in place of MySQL and
treat the two as analogous, HBase is specifically meant for large-scale
data processing with high transaction volumes and a hands-free
architecture. The process involves unlearning a lot of concepts from the
RDBMS world in order to gain better throughput. By itself, HBase depends
on a distributed file system implementation, primarily HDFS in practice
(although in theory it is possible to plug in other DFS implementations
as well).
HBase concerns itself only with the structured data representation and
the failover mechanisms of the same, while delegating the storage of the
actual data to a distributed file system ( HDFS, say).
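
As a rough illustration of what "structured data representation" means
here, the sketch below models an HBase-style table in memory: rows are
looked up by key, columns are named "family:qualifier" with the families
declared up front, and each cell keeps timestamped versions. It is a toy
written for this thread, not the HBase client API; the class ToyTable and
its methods are hypothetical, and nothing here touches HDFS or handles
failover.

```python
import time
from collections import defaultdict


class ToyTable:
    """In-memory toy of an HBase-like table (hypothetical, for illustration only)."""

    def __init__(self, families):
        # Column families are fixed when the table is created.
        self.families = set(families)
        # row key -> "family:qualifier" -> list of (timestamp, value), newest first
        self.rows = defaultdict(lambda: defaultdict(list))

    def put(self, row, column, value, timestamp=None):
        """Store a new version of a cell; the column family must already exist."""
        family = column.split(":", 1)[0]
        if family not in self.families:
            raise KeyError("unknown column family: " + family)
        ts = time.time() if timestamp is None else timestamp
        cells = self.rows[row][column]
        cells.append((ts, value))
        cells.sort(key=lambda cell: cell[0], reverse=True)

    def get(self, row, column):
        """Return the newest value of a cell, or None if it was never written."""
        cells = self.rows.get(row, {}).get(column)
        return cells[0][1] if cells else None

    def scan(self, start_row, stop_row):
        """Yield (row, newest cell values) for start_row <= row < stop_row, in key order."""
        for row in sorted(self.rows):
            if start_row <= row < stop_row:
                yield row, {col: cells[0][1] for col, cells in self.rows[row].items()}


if __name__ == "__main__":
    table = ToyTable(["info"])
    table.put("user1", "info:name", "nijil", timestamp=1)
    table.put("user1", "info:name", "Nijil", timestamp=2)
    print(table.get("user1", "info:name"))
    print(list(table.scan("user0", "user2")))
```

Note what is missing compared with an RDBMS: no joins, no secondary
indexes, no SQL. Access is by row key or by a range of row keys, which is
part of the unlearning mentioned above.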
To answer #8: refer to the Paxos algorithm for the theory. ZooKeeper,
which HBase uses, implements a variation of the same idea (its Zab atomic
broadcast protocol).
3: What exactly is MapReduce, and how is it related to Hadoop? (I mean, is
it only about parallel computing? I don't understand how much parallel
computing is possible in a Hadoop cloud system which is used for web
hosting.) I require some help on the topic of clubbing "Cloud Computing,
Hadoop and Webhosting". Please, this is really important.
4: Are HBase and Hypertable similar, or is there a big difference?
They are 2 different approaches to solving the same problem.
5: Can someone provide a MapReduce implementation example other than one
related to search engines?
Go through Amdahl's law to get a (theoretical, at least to begin with)
estimate of the parallelism in the code. M-R as a concept can be
applied to the same.
Look for an example where the NY Times applied M-R on EC2 to do some
data-intensive jobs.
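
For reference, Amdahl's law says that if a fraction p of a job can be
parallelized and it runs on n workers, the best possible speedup is
1 / ((1 - p) + p / n). A minimal sketch in Python follows; the function
name amdahl_speedup and the sample numbers are illustrative assumptions,
not measurements of any real Hadoop cluster.

```python
def amdahl_speedup(parallel_fraction, workers):
    """Theoretical speedup from Amdahl's law: 1 / ((1 - p) + p / n)."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


if __name__ == "__main__":
    # Hypothetical job in which 90% of the work can run in parallel.
    for n in (1, 10, 100, 1000):
        print(n, round(amdahl_speedup(0.9, n), 2))
```

With p = 0.9 the speedup can never exceed 10 no matter how many machines
are added, because the serial 10% dominates. The same reasoning applies to
an M-R job: the map and reduce tasks scale out, but job setup, the shuffle
and any single-reducer step do not.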
6: How are MapReduce and Hadoop related?
There is a misconception about the Hadoop terminology. A better question
would be: how are MapReduce and HDFS related? The former is an
algorithm, and the latter can be thought of as a data structure that
supports and complements the algorithm. Of course, these are grossly
oversimplified definitions, with the details spared here, but that helps
with the terminology.
Hadoop, in the original version of the project, had both of them packed
together in a single distro, making the line blurry. But there are
efforts underway to separate them conceptually under different trees,
while maintaining the orthogonality between them. So Hadoop refers to
the ecosystem as a whole, with specific modules addressing specific
problems in the ecosystem.
7: Can I learn Hadoop with the "Cloudera's Distribution for Hadoop" VMware
image?
8: How is database synchronization done in HBase? I believe HBase is a
distributed database.
Can someone provide contact details for further help, if you don't
mind? :)
Mailing lists are your friend. Of course, feel free to use them after
doing the appropriate homework. Good luck!
--
View this message in context:
http://old.nabble.com/help-for-hadoop-begginer-tp27423435p27423435.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.