Our Hadoop journey included a brief stint running on our own virtualised infrastructure. Our pre-Hadoop application was already running on the VM infrastructure, so we set up a small cluster as virtual machines on the SAN.
It worked OK for a while, but as our usage grew we ditched it for a couple of reasons:

1) Performance was inconsistent because the infrastructure was multi-tenanted (the VM hosts served other applications, and the SAN backed most storage for the company). This became an issue as jobs we'd expect to complete within a few minutes or hours would take most of a morning.

2) Cost of growth was a stepped line. When we started we had plenty of space, but as we came to view HDFS (and Hadoop) as a fantastic place to store structured and unstructured data, our storage growth accelerated. We could buy more expensive disks to grow capacity a bit, or we'd need to buy a whole new controller. This actually turned out to be the reason to just buy our own physical boxes: the cost of buying 5 decently specced machines was significantly less than the cost of another SAN.

For us SAN storage would never have worked out. We're now at around 90TB of capacity (probably small compared to some on this list :) but that would have cost us a small fortune.

I can't say much about SAN performance vs. physical performance other than physical was drastically better for us. However, it was the limitations above that caused us to make the leap and it's been well worth it!

On 29 Sep 2011, at 07:50, praveenesh kumar wrote:

> Hi,
>
> I want to know can we use SAN storage for Hadoop cluster setup ?
> If yes, what should be the best practices ?
>
> Is it a good way to do considering the fact "the underlying power of Hadoop
> is co-locating the processing power (CPU) with the data storage and thus it
> must be local storage to be effective".
> *But also, is it better to say “local is better” in the situation where I
> have a single local 5400 RPM IDE drive, which would be dramatically slower
> than SAN storage striped across many drives spinning at 10k RPM and
> accessed via fiber channel ?*
>
> Thanks,
> Praveenesh