Bandwidth is definitely better with more active spindles.  I would recommend
several larger disks.  The cost is very nearly the same.
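
A rough way to picture the spindle argument is the sketch below. It assumes sequential throughput adds up across JBOD disks until the controller becomes the bottleneck. The per-disk rate and the controller cap are hypothetical placeholders, not measurements from anyone in this thread.

```python
def aggregate_throughput_mb_s(n_disks, per_disk_mb_s, controller_cap_mb_s):
    """Estimate aggregate sequential throughput for n independent (JBOD) disks.

    Throughput grows linearly with the number of disks until the
    controller cap is reached, after which extra disks add nothing.
    """
    return min(n_disks * per_disk_mb_s, controller_cap_mb_s)


# Hypothetical figures for illustration only.
PER_DISK_MB_S = 100        # assumed sustained rate of one 7200 RPM drive
CONTROLLER_CAP_MB_S = 350  # assumed ceiling of a weak controller

for n in (1, 4, 5):
    total = aggregate_throughput_mb_s(n, PER_DISK_MB_S, CONTROLLER_CAP_MB_S)
    print(f"{n} disk(s): {total} MB/s")
```

Under these assumed numbers, going from four to five disks gains nothing because the controller is already saturated, which is why the controller matters as much as the drive count.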

On Fri, Feb 11, 2011 at 3:52 PM, Shrinivas Joshi <jshrini...@gmail.com> wrote:

> Thanks for your inputs, Michael.  We have 6 open SATA ports on the
> motherboards. That is the reason why we are thinking of 4 to 5 data disks
> and 1 OS disk.
> Are you suggesting the use of one 2TB disk instead of, let's say, four
> 500GB disks?
> I thought that HDFS utilization/throughput increases with the number of
> disks per node (assuming that the total usable IO bandwidth increases
> proportionally).
>
> -Shrinivas
>
> On Thu, Feb 10, 2011 at 4:25 PM, Michael Segel
> <michael_se...@hotmail.com> wrote:
>
> >
> > Shrinivas,
> >
> > Assuming you're in the US, I'd recommend the following:
> >
> > Go with 2TB 7200 SATA hard drives.
> > (Not sure what type of hardware you have)
> >
> > What we've found is that in the data nodes, there's an optimal
> > configuration that balances price versus performance.
> >
> > While your chassis may hold 8 drives, how many open SATA ports are on the
> > motherboard? Since you're using JBOD, you don't want the additional
> > expense of having to purchase a separate controller card for the
> > additional drives.
> >
> > I'm running Seagate drives at home and I haven't had any problems for
> > years.
> > When you look at your drive, you need to know total storage, speed
> > (RPM), and cache size.
> > Looking at Microcenter's pricing:
> >   A 2TB 3.0Gb/s SATA Hitachi was $110.00
> >   A 1TB Seagate was $70.00
> >   A 250GB SATA drive was $45.00
> >
> > So 2TB = 110, 140, 180 (respectively)
> >
> > So you get a better deal on 2TB.
> >
> > So if you go out and get more drives but of lower density, you'll end up
> > spending more money and using more energy, but I doubt you'll see a real
> > performance difference.
> >
> > The other thing is that if you want to add more disk, you have room to
> > grow. (Just add more disk and restart the node, right?)
> > If all of your disk slots are filled, you're SOL. You have to take out
> > the box, replace all of the drives, then add it to the cluster as a
> > 'new' node.
> >
> > Just my $0.02.
> >
> > HTH
> >
> > -Mike
> >
> > > Date: Thu, 10 Feb 2011 15:47:16 -0600
> > > Subject: Re: recommendation on HDDs
> > > From: jshrini...@gmail.com
> > > To: common-user@hadoop.apache.org
> > >
> > > Hi Ted, Chris,
> > >
> > > Much appreciate your quick reply. The reason why we are looking for
> > > smaller capacity drives is that we are not anticipating a huge growth
> > > in data footprint. We also read somewhere that the larger the capacity
> > > of the drive, the greater the number of platters in it, and that could
> > > affect drive performance. But it looks like you can get 1TB drives with
> > > only 2 platters. Large capacity drives should be OK for us as long as
> > > they perform equally well.
> > >
> > > Also, the systems that we have can host up to 8 SATA drives. In that
> > > case, would backplanes offer additional advantages?
> > >
> > > Any suggestions on 5400 vs. 7200 vs. 10000 RPM disks?  I guess 10K RPM
> > > disks would be overkill considering their perf/cost trade-off?
> > >
> > > Thanks for your inputs.
> > >
> > > -Shrinivas
> > >
> > > On Thu, Feb 10, 2011 at 2:48 PM, Chris Collins
> > > <chris_j_coll...@yahoo.com> wrote:
> > >
> > > > Of late we have had serious issues with Seagate drives in our Hadoop
> > > > cluster.  These were purchased over several purchasing cycles, and we
> > > > are pretty sure it wasn't just a single "bad batch".  Because of this
> > > > we switched to buying 2TB Hitachi drives, which seem to have been
> > > > considerably more reliable.
> > > >
> > > > Best
> > > >
> > > > C
> > > > On Feb 10, 2011, at 12:43 PM, Ted Dunning wrote:
> > > >
> > > > > Get bigger disks.  Data only grows and having extra is always good.
> > > > >
> > > > > You can get 2TB drives for <$100 and 1TB for < $75.
> > > > >
> > > > > As far as transfer rates are concerned, any 3Gb/s SATA drive is
> > > > > going to be about the same (ish).  Seek times will vary a bit with
> > > > > rotation speed, but with Hadoop, you will be doing long reads and
> > > > > writes.
> > > > >
> > > > > Your controller and backplane will have a MUCH bigger vote in
> > > > > getting acceptable performance.  With only 4 or 5 drives, you don't
> > > > > have to worry about a super-duper backplane, but you can still kill
> > > > > performance with a lousy controller.
> > > > >
> > > > > On Thu, Feb 10, 2011 at 12:26 PM, Shrinivas Joshi
> > > > > <jshrini...@gmail.com> wrote:
> > > > >
> > > > >> What would be a good hard drive for a 7-node cluster which is
> > > > >> targeted to run a mix of IO and CPU intensive Hadoop workloads? We
> > > > >> are looking for around 1 TB of storage on each node distributed
> > > > >> amongst 4 or 5 disks. So either 250GB * 4 disks or 160GB * 5 disks.
> > > > >> Also it should be less than $100 each ;)
> > > > >>
> > > > >> I looked at HDD benchmark comparisons on tomshardware,
> > > > >> storagereview etc. Got overwhelmed with the # of benchmarks and
> > > > >> different aspects of HDD performance.
> > > > >>
> > > > >> Appreciate your help on this.
> > > > >>
> > > > >> -Shrinivas
> > > > >>
> > > >
> > > >
> > > >
> >
> >
>
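
The price comparison in Mike's message can be checked with a few lines of Python. This sketch takes the quoted per-drive prices at face value and assumes decimal capacities (2TB = 2000GB). The function name `cost_for_capacity` is only an illustrative helper.

```python
import math

# Per-drive figures as quoted in the thread: (capacity in GB, price in USD).
DRIVES = {
    "2TB Hitachi": (2000, 110.00),
    "1TB Seagate": (1000, 70.00),
    "250GB SATA": (250, 45.00),
}


def cost_for_capacity(capacity_gb, price, target_gb=2000):
    """Return (drive count, total cost) needed to reach target_gb."""
    count = math.ceil(target_gb / capacity_gb)
    return count, count * price


for name, (capacity_gb, price) in DRIVES.items():
    count, total = cost_for_capacity(capacity_gb, price)
    print(f"{name}: {count} drive(s), ${total:.2f} to reach 2TB")
```

With these inputs the 250GB option needs eight drives, which comes to $360 and exceeds the six open SATA ports mentioned in the thread. The $180 figure in the message would correspond to four $45 drives, that is, 500GB each. Either way the 2TB drive is the cheapest route to 2TB.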
