Our Hadoop journey included a brief stint running on our own virtualised 
infrastructure. Our pre-Hadoop application was already running on the VM 
infrastructure so we set up a small cluster as virtual machines on the SAN.

It worked ok for a while but as our usage grew we ditched it for a couple of 
reasons:

1) Performance was inconsistent because the infrastructure was multi-tenanted 
(the VM hosts served other applications, and the SAN backed most storage for 
the company). This became an issue as jobs we'd expect to complete within a few 
minutes or hours would take most of a morning.

2) Cost of growth was a stepped line. When we started we had plenty of space, 
but once we began viewing HDFS (and Hadoop) as a fantastic place to store 
structured and unstructured data, our storage growth accelerated. We could 
buy more expensive disks to grow capacity a bit, or we'd need to buy a whole 
new controller. This actually turned out to be the reason to just buy our own 
physical boxes: the cost of buying 5 decently specced machines was 
significantly less than the cost of another SAN.
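
To make the "stepped line" concrete, here is a rough sketch in Python. Every 
number in it (shelf size, controller price, node price and so on) is a made-up 
placeholder for illustration, not our actual pricing, and the function names 
are hypothetical. The only point is the shape: SAN cost jumps whenever a new 
controller is needed, while commodity nodes grow in small increments.

```python
import math

# All figures below are hypothetical placeholders, not real prices.

def san_cost(tb_needed, shelf_tb=20, shelf_cost=10_000,
             shelves_per_controller=2, controller_cost=80_000):
    """Cost of SAN capacity: disks are bought per shelf, and every
    `shelves_per_controller` shelves require another controller."""
    shelves = math.ceil(tb_needed / shelf_tb)
    controllers = math.ceil(shelves / shelves_per_controller)
    return shelves * shelf_cost + controllers * controller_cost

def node_cost(tb_needed, node_tb=8, node_price=4_000):
    """Cost of commodity boxes: capacity grows one machine at a time."""
    nodes = math.ceil(tb_needed / node_tb)
    return nodes * node_price

# Compare the two growth curves at a few capacity points.
for tb in (20, 40, 41, 80, 81):
    print(f"{tb:>3} TB  SAN: {san_cost(tb):>7}  nodes: {node_cost(tb):>7}")
```

With these placeholder numbers, going from 40 TB to 41 TB adds a whole 
controller on the SAN side but only one more machine on the node side.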

For us SAN storage would never have worked out: we're now at around 90TB of 
capacity (probably small compared to some on this list :) ) and on a SAN that 
would have cost us a small fortune.

I can't say much about SAN performance vs. physical performance other than 
physical was drastically better for us. However, it was the limitations above 
that caused us to make the leap and it's been well worth it!



On 29 Sep 2011, at 07:50, praveenesh kumar wrote:

> Hi,
> 
> I want to know whether we can use SAN storage for a Hadoop cluster setup?
> If yes, what are the best practices?
> 
> Is it a good way to go, considering the fact that "the underlying power of
> Hadoop is co-locating the processing power (CPU) with the data storage and
> thus it must be local storage to be effective"?
> *But also, is it better to say “local is better” in the situation where I
> have a single local 5400 RPM IDE drive, which would be dramatically slower
> than SAN storage striped across many drives spinning at 10k RPM and
> accessed via fiber channel?*
> 
> Thanks,
> Praveenesh
