> Hello Jonathan.
> Here is a good document to get you thinking.
> http://www.cs.berkeley.edu/~rxin/db-papers/WarehouseScaleComputing.pdf
>
> Although Doug said "Oh, and Hadoop clusters are not going to supplant your
> HPC cluster"
I should have continued, ... and there will be overlap.

-- Doug

> I believe that there is an ongoing effort to converge Cloud computing (e.g.
> Hadoop) and HPC.
> The key things are exposed in the link I provided.
> To me the convergence is summarized in:
> -strong scalability.
> -reliability/fault tolerance.
> -programming productivity.
> -standardized/cheap infrastructure.
>
> Joshua
>
> ------ Original Message ------
> Received: 09:20 AM PST, 02/07/2015
> From: Jonathan Aquilina <[email protected]>
> To: Douglas Eadline <[email protected]>
> Cc: Beowulf <[email protected]>
> Subject: Re: [Beowulf] hadoop
>
>> Hey Douglas,
>>
>> Thanks for the information. What has me curious is whether it can be
>> used, for example, in applications which don't involve large amounts
>> of data.
>>
>> If you or anyone has any resources to read up on it, like ebooks or
>> useful websites, it would be great if you could send them. The reason
>> is that where I am working we deal with lots of live telemetry in
>> terms of positioning etc., and since we are going to be moving our
>> system away from Windows to open source technologies such as
>> angular.js for the web site of our platform, as well as mongodb and
>> nodejs, we will be implementing Hadoop from Amazon to take advantage
>> of Amazon's Elastic MapReduce.
>>
>> ---
>> Regards,
>> Jonathan Aquilina
>> Founder Eagle Eye T
>>
>> On 2015-02-07 17:33, Douglas Eadline wrote:
>>
>> > Jonathan
>> >
>> > I understand your confusion. Hadoop and Big Data reached
>> > overused but not well understood status years ago.
>> >
>> > First, Hadoop started out as a MapReduce engine. This all
>> > changed with Hadoop V2 and YARN (Yet Another Resource Negotiator).
>> > Hadoop V2 can be considered a platform on which applications that
>> > need parallel access to large amounts of unstructured data (i.e. raw
>> > data not in a traditional database) can run. It can also be used
>> > with its own database, HBase, which is based on Google Bigtable.
>> >
>> > The idea is this: a "Hadoop" cluster has a large amount of storage
>> > using HDFS (or possibly another parallel filesystem). This is often
>> > referred to as the "Data Lake." Raw data is dumped in the lake.
>> > There is no ETL (Extract, Transform, and Load) step. Various Hadoop
>> > YARN frameworks use this data. YARN provides a very dynamic resource
>> > allocation model and the ability to provide data locality to your
>> > application (i.e. the traditional MapReduce idea was "move the
>> > computation to the data").
>> >
>> > Thus in a Hadoop V2 cluster you can have MapReduce applications
>> > (which support many of the popular apps like Pig and Hive). It also
>> > supports Spark, Storm, Giraph and even MPI (not the most efficient
>> > but it works). There are many other applications being ported to
>> > YARN.
>> >
>> > Second, Big Data is usually defined by Volume, Velocity, and Variety.
>> > The definition seems to be whatever a vendor wants it to be, however.
>> > It reminds me of products that suddenly became "grid ready" in years
>> > past. Again, such designations mean as much as "now works with
>> > binary data."
>> >
>> > Finally, if you are interested in Hadoop YARN you can check out the
>> > book "Apache Hadoop YARN: Moving beyond MapReduce and Batch
>> > Processing with Apache Hadoop 2" (I helped write it). There are also
>> > many online resources. The first chapter of the book has the history
>> > of Hadoop as written by one of the developers. It is quite
>> > interesting to read and helps dispel many of the Hadoop myths. You
>> > can read this chapter for free here:
>> >
>> > http://ptgmedia.pearsoncmg.com/images/9780321934505/samplepages/0321934504.pdf [2]
>> >
>> > That is enough Hadoop for Saturday morning. Oh, and Hadoop clusters
>> > are not going to supplant your HPC cluster.
>> >
>> > --
>> > Doug
>> >
>> >> Can someone explain to me what exactly the purpose of hadoop is and
>> >> what we mean when we say big data?
>> >> Is this for data storage and retrieval? Number crunching?
>> >>
>> >> --
>> >> Regards,
>> >> Jonathan Aquilina
>> >> Founder Eagle Eye T
>> >>
>> >> _______________________________________________
>> >> Beowulf mailing list, [email protected] sponsored by Penguin Computing
>> >> To change your subscription (digest mode or unsubscribe) visit
>> >> http://www.beowulf.org/mailman/listinfo/beowulf [1]
>>
>> Links:
>> ------
>> [1] http://www.beowulf.org/mailman/listinfo/beowulf
>> [2] http://ptgmedia.pearsoncmg.com/images/9780321934505/samplepages/0321934504.pdf
