Re: Hadoop with Netapp

Steve Loughran Thu, 01 Sep 2011 02:49:31 -0700

On 25/08/11 08:20, Sagar Shukla wrote:

Hi Hakan,

         Please find my comments inline in blue :

-----Original Message-----
From: Hakan İlter [mailto:hakanil...@gmail.com]
Sent: Thursday, August 25, 2011 12:28 PM
To: common-user@hadoop.apache.org
Subject: Hadoop with Netapp

Hi everyone,

We are going to create a new Hadoop cluster in our company, and I would like some 
advice from you:

1. Has anyone stored all of their Hadoop data not on local disks but on NetApp 
or another storage system? Do we have to store data on local disks, and if so, is it 
because of performance issues?

<sagar>: Yes, we were using SAN LUNs for storing Hadoop data. SAN works faster 
than NAS in terms of performance while writing the data to the storage. Also SAN LUNs 
can be auto-mounted while booting up the system.

Silly question: why? SANs are SPOFs (Gray & van Ingen, MS, 2005; SAN responsible for 11% of terraserver downtime).

Was it because you had the rack and wanted to run Hadoop, or did you want a more agile cluster? Because it's going to increase your cost of storage dramatically, which means you pay more per TB, or end up with less TB of storage. I wouldn't go this way for a dedicated Hadoop cluster. For a multi-use cluster, it's a different story.
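To put rough numbers on the "more per TB, or less TB" trade-off, here is a minimal sketch. The budget and both prices are made-up placeholders, not real quotes, and the replication factor of 3 is an assumption (it is the usual HDFS default); the function name usable_tb is hypothetical too.

```python
def usable_tb(budget, price_per_raw_tb, replication=3):
    """Usable HDFS capacity (TB) that a fixed budget buys.

    Raw capacity is budget / price; HDFS stores each block
    `replication` times, so usable capacity is raw / replication.
    """
    raw_tb = budget / price_per_raw_tb
    return raw_tb / replication


# Hypothetical figures, purely for illustration.
budget = 100_000.0       # total storage budget
local_price = 100.0      # assumed price per raw TB of local disk
san_price = 1_000.0      # assumed price per raw TB of SAN storage

local_tb = usable_tb(budget, local_price)
san_tb = usable_tb(budget, san_price)
print(f"local disks: {local_tb:.1f} usable TB")
print(f"SAN LUNs:    {san_tb:.1f} usable TB")
```

Whatever the real prices are, the ratio of usable capacity follows the ratio of price per raw TB, unless you also change the replication factor.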




2. What do you think about running Hadoop nodes in virtual (VMware) servers?



<sagar>: If high-speed computing is not a requirement for you, then Hadoop nodes 
in a VM environment could be a good option. One slight drawback is that when a 
VM crashes, its in-memory data is lost. Hadoop takes care of some amount of 
failover, but there is some risk involved, and it requires good HA building 
capabilities.

I do it for dev and test work, and for isolated clusters in a shared environment.

-for CPU bound stuff, it actually works quite well, as there's no significant overhead

-for HDD access, reading from the FS, writing to the FS and to store transient spill data you take a tangible performance hit. That's OK if you can afford to wait or rent a few extra CPUs -and your block size is such that those extra servers can help out -which may be in the map phase more than the reduce phase
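A rough way to see the block size point, as a sketch only: assume one map task per block (the usual behaviour for splittable input) and a fixed number of map slots per node. The helper names, the 10 GB input, the 64 MB block size and the 2 slots per node are all assumptions for illustration.

```python
import math

MB = 1024 ** 2
GB = 1024 ** 3


def map_tasks(input_bytes, block_bytes):
    """Number of map tasks, assuming one task per block of input."""
    return math.ceil(input_bytes / block_bytes)


def map_waves(tasks, nodes, slots_per_node):
    """Rounds of map tasks needed to work through all tasks."""
    return math.ceil(tasks / (nodes * slots_per_node))


# Small blocks: many tasks, so doubling the nodes halves the waves.
small = map_tasks(10 * GB, 64 * MB)
print(small, map_waves(small, 10, 2), map_waves(small, 20, 2))

# Large blocks: few tasks, so the extra nodes have nothing to do.
large = map_tasks(10 * GB, 1 * GB)
print(large, map_waves(large, 10, 2), map_waves(large, 20, 2))
```

With 64 MB blocks the job has 160 map tasks and going from 10 to 20 nodes cuts it from 8 waves to 4. With 1 GB blocks there are only 10 tasks, which already fit in one wave on 10 nodes, so extra servers do not help.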

Some Hadoop-ish projects -Stratosphere from TuB in particular- are designed for VM infrastructure so come up with execution plans to use VMs efficiently.


-steve
