Re: help for hadoop begginer

Kay Kay Tue, 02 Feb 2010 10:29:53 -0800

On 2/2/10 9:02 AM, zaki rahaman wrote:

Most of your questions are easily answered by taking a look at the
documentation, FAQs, and some smart Googling/Yahooing/Binging.



1. The main Hadoop project consists of two major components: HDFS (Hadoop
Distributed File System) and MapReduce.

2. Not sure what you mean by your second question.

3. MapReduce is simply a framework for doing distributed computation. And
again, I don't understand what you mean by clubbing "Cloud Computing,
Hadoop and Webhosting".

4. From my understanding, HBase and Hypertable are two different
implementations/approaches to solving the problem of having a low-latency
distributed table system similar to BigTable. One major difference is that
HBase is implemented in Java and built to work on top of HDFS. I don't know
too much about Hypertable other than it's written in C++.

5. Again, Google is your friend. Most map/reduce implementations are pretty
general purpose data processing tasks (aggregations, sorting, filters, etc.)
that aren't specific to search.

6. See #1. MapReduce is one of the major components of the Hadoop Project
and the solution to doing a lot of the data processing.

7. Yes, of course. The VMware image that Cloudera provides is a good place
to start. I would also watch their videos and presentations.

8. Again, I am not all that familiar with HBase but my understanding is that
this is handled by the regionserver/ZooKeeper setup although I do not know
the details.
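
To make item 5 concrete, here is a minimal sketch of the map/shuffle/reduce pattern applied to a plain aggregation (total bytes served per HTTP status code) rather than to search. It runs in a single Python process and does not use Hadoop's API; the function names (map_fn, reduce_fn, run_job) and the log format are assumptions made up for illustration.

```python
from collections import defaultdict


def map_fn(line):
    """Map: emit (status_code, bytes_sent) for one log line.

    Assumed (hypothetical) log format: "<path> <status> <bytes>".
    """
    _path, status, nbytes = line.split()
    yield status, int(nbytes)


def reduce_fn(key, values):
    """Reduce: sum all byte counts seen for one status code."""
    return key, sum(values)


def run_job(lines, mapper, reducer):
    """Single-process stand-in for the framework: map, shuffle, reduce."""
    groups = defaultdict(list)
    for line in lines:
        for key, value in mapper(line):
            groups[key].append(value)  # shuffle: group values by key
    return dict(reducer(k, vs) for k, vs in sorted(groups.items()))


if __name__ == "__main__":
    logs = [
        "/index.html 200 512",
        "/missing 404 128",
        "/about.html 200 256",
    ]
    print(run_job(logs, map_fn, reduce_fn))
```

In a real cluster the framework would run many mappers and reducers on different machines and do the grouping over the network; only the two small functions would be yours to write.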

On Tue, Feb 2, 2010 at 11:43 AM, nijil<[email protected]>  wrote:

I have read some basic stuff about Hadoop... err, I have a few doubts...
mind you, I am a beginner.

1: So is Hadoop a file system only?

2: Can HBase be used instead of other databases on other platforms (e.g. Java)?

For more HBase-related questions, please post to -[email protected] .

To add to what zaki mentioned: the Hadoop project consists of the data structure (HDFS, similar to GFS) and the algorithm (MapReduce, based on the publicly available paper by Dean and Ghemawat).

HBase, Hypertable, etc. fall in the realm of column-oriented databases. While it is tempting to use HBase in place of MySQL and to treat the two as analogous, HBase is specifically meant for large-scale data processing with high transaction volumes and a hands-free architecture. The process involves unlearning a lot of concepts from the RDBMS world to gain better throughput. HBase itself depends on a distributed file system implementation, primarily HDFS in practice (although in theory it is possible to plug in other DFS implementations as well).

HBase concerns itself only with the structured data representation and its failover mechanisms, while delegating the storage of the actual data to a distributed file system (HDFS, say).
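
As a rough illustration of what "structured data representation" means here: the BigTable paper describes its data model as a sparse, sorted, multi-dimensional map, and HBase is modeled on the same idea. The sketch below mimics that shape with plain Python dictionaries. It is not HBase's client API, and the class and method names (TinyTable, put, get, scan) are hypothetical.

```python
import time


class TinyTable:
    """Toy in-memory model: row key -> "family:qualifier" -> {timestamp: value}."""

    def __init__(self):
        self.rows = {}

    def put(self, row, column, value, ts=None):
        # Each cell keeps multiple timestamped versions, as in the BigTable model.
        ts = time.time() if ts is None else ts
        self.rows.setdefault(row, {}).setdefault(column, {})[ts] = value

    def get(self, row, column):
        # Return the newest version of a cell, or None if the cell is absent.
        versions = self.rows.get(row, {}).get(column)
        if not versions:
            return None
        return versions[max(versions)]

    def scan(self, start_row, stop_row):
        # Rows are ordered by key, so range scans are the natural access path.
        for row in sorted(self.rows):
            if start_row <= row < stop_row:
                yield row, self.rows[row]
```

Note what is missing compared to an RDBMS: no joins, no secondary indexes, no fixed set of columns per row. That is part of the "unlearning" mentioned above.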

To answer #8, refer to the Paxos algorithm for the theory; ZooKeeper, which HBase uses, implements a variation of it.

3: What is MapReduce exactly, and how is it related to Hadoop? (I mean, is
it only about parallel computing? I don't understand how much parallel
computing is possible in a Hadoop cloud system which is used for
webhosting.) I require some help on the topic of clubbing "Cloud Computing,
Hadoop and Webhosting"... please, this is really important.



4: Are HBase and Hypertable similar, or is there a big difference?


They are 2 different approaches to solving the same problem.

5: Can someone provide a MapReduce implementation example other than one
related to a search engine?

Go through Amdahl's law to get a (theoretical at least, to begin with) estimate of the parallelism in the code. M-R as a concept can be applied to the same. Look for an example where the NY Times applied M-R on EC2 to do some data-intensive jobs.
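
For reference, Amdahl's law says that if a fraction p of a job can be parallelised and the rest is serial, the speedup on n workers is at most 1 / ((1 - p) + p / n). A small sketch, with the function name and the sample numbers chosen only for illustration:

```python
def amdahl_speedup(parallel_fraction, workers):
    """Upper bound on speedup when only part of a job can run in parallel."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


if __name__ == "__main__":
    # A job that is 90% parallelisable, on 10 and on 1000 workers.
    for n in (10, 1000):
        print(n, round(amdahl_speedup(0.9, n), 2))
```

The serial part caps the gain: with 90% parallel work, no number of workers gets past 10x.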

6: How are MapReduce and Hadoop related?

There is a misconception about the Hadoop terminology. A better question would be: how are MapReduce and HDFS related? The former is an algorithm, and the latter can be thought of as a data structure that supports and complements the algorithm. Of course, these are grossly oversimplified definitions, with the details spared here, but they help with the terminology.

Hadoop, in the original version of the project, had both of them packed together in a single distro, making the line blurry. But there are efforts underway to separate them conceptually under different trees, while maintaining the orthogonality between them. So Hadoop refers to the ecosystem as a whole, with specific modules addressing specific problems in the ecosystem.

7: Can I learn Hadoop with the "Cloudera's Distribution for Hadoop" VMware
image?

8: How is database synchronization done in HBase? I believe HBase is a
distributed database.

: Can someone provide contact details for further help, if you don't
mind? :)

Mailing lists are your friend. Of course, feel free to use them after appropriate homework. Good luck!

--
View this message in context:
http://old.nabble.com/help-for-hadoop-begginer-tp27423435p27423435.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.
