I am working out the total cost for a large (er, scalable...) HDFS
deployment to use for persistent storage. We already use an HDFS /
Hadoop setup for computational purposes. The numbers for hardware are
relatively easy depending on design. However, as with all open source
initiatives, calculating the time / resources for administration by
humans is tricky until you have done it for a while. Our company has plenty of
experience using open source tools for the majority of our needs, with
a proprietary solution thrown in where applicable. So I was wondering
if anyone who is attempting to set up an HDFS storage deployment, or who is
using one now, could shed some light on the subject. I figured that
initially, on a new deployment, about 60% of a single administrator's
time would be taken up maintaining a large HDFS cluster. I figure
design and implementation would take much more than that (maybe 2 or 3
people to save time). I am comparing this to a couple of proprietary
solutions we have been looking at that are known to scale well into
multi-TB or even PB setups; they are basically scalable NAS solutions that
use a "brick" mentality. The theory is that you pay upfront for
stability or support if needed. But I am wondering whether, say, 60% for the
first year of learning how to maintain and grow a production HDFS cluster is
close or not. Too high / low?
Any thoughts or real-world data would be great.
Thanks all!!