Time for my two cents...

For the best understanding of Hadoop, I think Google's original papers on MapReduce and GFS (Google File System) are still the best starting point. If for no other reason, they were written before the Hadoop hype-train left the station, so they don't claim that MapReduce can fix every problem.

http://research.google.com/archive/mapreduce.html
http://research.google.com/archive/gfs.html
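For anyone who hasn't read the MapReduce paper yet, the core programming model is small enough to sketch in a few lines. Here's a single-process toy of the canonical word-count example from the paper; the function names (map_fn, reduce_fn, mapreduce) are my own for illustration, not Hadoop's actual API:

```python
from collections import defaultdict

def map_fn(document):
    # Map phase: emit an intermediate (word, 1) pair for every word.
    for word in document.split():
        yield (word.lower(), 1)

def reduce_fn(word, counts):
    # Reduce phase: combine all values emitted for one key.
    return (word, sum(counts))

def mapreduce(documents):
    # Shuffle phase: group intermediate pairs by key before reducing.
    groups = defaultdict(list)
    for doc in documents:
        for key, value in map_fn(doc):
            groups[key].append(value)
    return dict(reduce_fn(k, v) for k, v in groups.items())

print(mapreduce(["the cat", "the dog"]))
# {'the': 2, 'cat': 1, 'dog': 1}
```

Of course, the whole point of the real thing is that the map, shuffle, and reduce phases are distributed across thousands of machines with fault tolerance; the model itself is this simple.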


On 02/07/2015 01:38 PM, Douglas Eadline wrote:
Hello Jonathan.
Here is a good document to get you thinking:
http://www.cs.berkeley.edu/~rxin/db-papers/WarehouseScaleComputing.pdf

Although Doug said "Oh, and Hadoop clusters are not going to supplant your HPC cluster," I should have continued, "... and there will be overlap."

Definitely. Glenn Lockwood has written a great article on exactly this, which I think sums up the issues perfectly; it has been discussed on this list in the past:

http://glennklockwood.blogspot.com/2014/05/hadoops-uncomfortable-fit-in-hpc.html

In my opinion, a common source of confusion is that people use the term 'big data' to refer to ALL data that takes up GBs, TBs, or PBs. My definition requires 'unstructured data from disparate sources': data from different origins, in different formats. A relational database, like Oracle or MySQL, doesn't fit this definition regardless of its size, since it's structured. I also don't consider the output of HPC simulations to be 'big data', since those output files are usually structured in some way (HDF5, or whatever). Are those examples a LOT of data? Yes, but I don't consider them 'big data'.

I hope I didn't just start a religious war.

--
Prentice


_______________________________________________
Beowulf mailing list, [email protected] sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf
