Can it be used, for example, in a web hosting application to process site requests, in the form of load balancing etc.?
Sent from my iPhone

> On 07 Feb 2015, at 09:45, Matt Wallis <[email protected]> wrote:
>
> Hi Jonathan,
>
>> On 7 Feb 2015, at 6:20 pm, Jonathan Aquilina <[email protected]> wrote:
>>
>> Can someone explain to me what exactly the purpose of hadoop is and what we
>> mean when we say big data? Is this for data storage and retrieval? Number
>> crunching?
>
> Hadoop can be thought of as HTC, High Throughput Computing, over a
> collection of simple servers. Where in HPC you might have hundreds of nodes
> with a shared file system working on the same copy of the data, Hadoop
> distributes the data to local storage in each node of the cluster using the
> Hadoop Filesystem, and then collects the output at the end. I believe it has
> built-in redundancy, allowing you to distribute the same job to 2 or 3 nodes
> for fault tolerance. It means your "cluster" can be very simple: no complex
> parallel filesystems, no specialised networks, no redundancy at the hardware
> level.
>
> Originally built to work with MapReduce as its core application, there are a
> number of other applications that can be found on the Apache website.
>
> As for big data, this is basically about taking things like 10 billion
> tweets, breaking them up into chunks of 500,000 or so, and doing analytics on
> them. Things like that break up very easily for distribution, as there is
> usually very little linkage between each tweet.
>
> Hadoop came out of the need for places like Google, Yahoo, PayPal and eBay to
> process terabytes of transaction logs an hour. They already had the servers,
> but they were in data centres all over the world. Rather than hook them all
> up to some common file server, just build a system to package up the data and
> the application and send it wherever can process it the quickest. Send it 3
> times to make sure it gets done, then pull back the results at the end.
>
> Matt.
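
To make the "break the tweets into chunks, analyse each chunk, then pull the
results back" description above concrete, here is a minimal sketch in plain
Python. It is not Hadoop's API and runs in a single process; it only
illustrates the split / map / reduce pattern. The function names, the sample
tweets and the choice of hashtag counting as the "analytics" are all
assumptions made for illustration.

```python
from collections import Counter
from functools import reduce
from itertools import islice


def chunks(items, size):
    """Yield successive lists of at most `size` items (the "split" step)."""
    it = iter(items)
    while True:
        block = list(islice(it, size))
        if not block:
            return
        yield block


def map_chunk(tweets):
    """Map step: count hashtags in one chunk, independently of every other chunk."""
    counts = Counter()
    for tweet in tweets:
        for word in tweet.split():
            if word.startswith("#"):
                counts[word.lower()] += 1
    return counts


def reduce_counts(left, right):
    """Reduce step: merge two partial results into one."""
    return left + right


def count_hashtags(tweets, chunk_size):
    """Split the input, map each chunk, then reduce the partial counts."""
    # On a real cluster each chunk would be handled on a different node;
    # here the chunks are simply processed one after another.
    partials = (map_chunk(block) for block in chunks(tweets, chunk_size))
    return reduce(reduce_counts, partials, Counter())


if __name__ == "__main__":
    # Hypothetical sample data standing in for the billions of tweets.
    sample = [
        "Loving #Hadoop today",
        "#hadoop and #HPC",
        "no tags here",
        "#hpc #hpc",
    ]
    print(count_hashtags(sample, chunk_size=2).most_common())
```

Because map_chunk never looks outside its own chunk, the chunks could be
processed anywhere and in any order, which is the "very little linkage
between each tweet" property that makes this kind of job easy to distribute.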
> _______________________________________________
> Beowulf mailing list, [email protected] sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit
> http://www.beowulf.org/mailman/listinfo/beowulf
