Thanks for your input, Michael. We have 6 open SATA ports on the
motherboards, which is why we are thinking of 4 to 5 data disks
and 1 OS disk.
Are you suggesting using one 2TB disk instead of, say, four 500GB disks?
I thought that HDFS utilization/throughput increases with the number of disks
per node (assuming that the total usable IO bandwidth increases
proportionally).

-Shrinivas
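
The assumption in the message above (throughput grows with the number of disks as long as usable IO bandwidth grows with it) can be sketched as a simple model. This is only an idealized illustration, not a measurement: the function name, the per-disk rate, and the controller cap are hypothetical values chosen for the example, and the cap stands in for the controller limit that Ted mentions further down the thread.

```python
def aggregate_throughput_mb_s(num_disks, per_disk_mb_s, controller_cap_mb_s):
    """Idealized sequential throughput for a JBOD node.

    Disks add bandwidth linearly until the controller limit is reached.
    """
    return min(num_disks * per_disk_mb_s, controller_cap_mb_s)


# Hypothetical figures for illustration only; they are not benchmarks.
PER_DISK_MB_S = 100
CONTROLLER_CAP_MB_S = 300

for disks in (1, 4, 5):
    total = aggregate_throughput_mb_s(disks, PER_DISK_MB_S, CONTROLLER_CAP_MB_S)
    print(f"{disks} disk(s): {total} MB/s")
```

Under these made-up numbers, going from 4 to 5 disks gains nothing because the controller is already saturated, which is why the proportional-bandwidth caveat matters.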

On Thu, Feb 10, 2011 at 4:25 PM, Michael Segel <michael_se...@hotmail.com> wrote:

>
> Shrinivas,
>
> Assuming you're in the US, I'd recommend the following:
>
> Go with 2TB 7200 SATA hard drives.
> (Not sure what type of hardware you have)
>
> What we've found is that in the data nodes, there's an optimal
> configuration that balances price versus performance.
>
> While your chassis may hold 8 drives, how many open SATA ports are on the
> motherboard? Since you're using JBOD, you don't want the additional expense
> of having to purchase a separate controller card for the additional drives.
>
> I'm running Seagate drives at home and I haven't had any problems for
> years.
> When you look at your drive, you need to know total storage, speed (rpms),
> and cache size.
> Looking at Microcenter's pricing, a 2TB 3.0Gb/s SATA Hitachi was $110.00, a
> 1TB Seagate was $70.00, and a 250GB SATA drive was $45.00.
>
> So 2TB of capacity costs $110, $140, or $360 (respectively).
>
> So you get a better deal on 2TB.
>
> So if you go out and get more drives but of lower density, you'll end up
> spending more money and use more energy, but I doubt you'll see a real
> performance difference.
>
> The other thing is that if you want to add more disk, you have room to
> grow. (Just add more disk and restart the node, right?)
> If all of your disk slots are filled, you're SOL. You have to take out the
> box, replace all of the drives, then add to cluster as 'new' node.
>
> Just my $0.02.
>
> HTH
>
> -Mike
>
> > Date: Thu, 10 Feb 2011 15:47:16 -0600
> > Subject: Re: recommendation on HDDs
> > From: jshrini...@gmail.com
> > To: common-user@hadoop.apache.org
> >
> > Hi Ted, Chris,
> >
> > Much appreciate your quick reply. The reason we are looking for smaller
> > capacity drives is that we are not anticipating huge growth in our data
> > footprint. We also read somewhere that the larger the capacity of a drive,
> > the more platters it has, and that this could affect drive performance.
> > But it looks like you can get 1TB drives with only 2 platters.
> > Large capacity drives should be OK for us as long as they perform equally
> > well.
> >
> > Also, the systems that we have can host up to 8 SATA drives in them. In
> > that case, would backplanes offer additional advantages?
> >
> > Any suggestions on 5400 vs. 7200 vs. 10000 RPM disks? I guess 10K RPM
> > disks would be overkill given their perf/cost trade-off?
> >
> > Thanks for your input.
> >
> > -Shrinivas
> >
> > On Thu, Feb 10, 2011 at 2:48 PM, Chris Collins <chris_j_coll...@yahoo.com> wrote:
> >
> > > > Of late we have had serious issues with Seagate drives in our Hadoop
> > > > cluster. These were purchased over several purchasing cycles, and we
> > > > are pretty sure it wasn't just a single "bad batch". Because of this we
> > > > switched to buying 2TB Hitachi drives, which seem to have been
> > > > considerably more reliable.
> > >
> > > Best
> > >
> > > C
> > > On Feb 10, 2011, at 12:43 PM, Ted Dunning wrote:
> > >
> > > > Get bigger disks.  Data only grows and having extra is always good.
> > > >
> > > > You can get 2TB drives for <$100 and 1TB for < $75.
> > > >
> > > > > As far as transfer rates are concerned, any 3Gb/s SATA drive is going
> > > > > to be about the same (ish). Seek times will vary a bit with rotation
> > > > > speed, but with Hadoop, you will be doing long reads and writes.
> > > >
> > > > > Your controller and backplane will have a MUCH bigger vote in getting
> > > > > acceptable performance. With only 4 or 5 drives, you don't have to
> > > > > worry about a super-duper backplane, but you can still kill
> > > > > performance with a lousy controller.
> > > >
> > > > > On Thu, Feb 10, 2011 at 12:26 PM, Shrinivas Joshi <jshrini...@gmail.com> wrote:
> > > >
> > > > >> What would be a good hard drive for a 7 node cluster which is
> > > > >> targeted to run a mix of IO and CPU intensive Hadoop workloads? We
> > > > >> are looking for around 1 TB of storage on each node distributed
> > > > >> amongst 4 or 5 disks. So either 250GB * 4 disks or 160GB * 5 disks.
> > > > >> Also, each disk should cost less than $100 ;)
> > > >>
> > > > >> I looked at HDD benchmark comparisons on tomshardware, storagereview
> > > > >> etc. Got overwhelmed with the # of benchmarks and different aspects
> > > > >> of HDD performance.
> > > >>
> > > >> Appreciate your help on this.
> > > >>
> > > >> -Shrinivas
> > > >>
> > >
> > >
> > >
>
>
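
The price comparison in Michael's message can be worked through explicitly. The sketch below uses only the three Microcenter prices quoted in the thread; the helper name `cost_for_capacity` is made up for this example, and it assumes you buy whole drives until the target capacity is reached.

```python
import math

# Prices quoted in the thread (drive capacity in GB -> price in USD).
PRICES = {2000: 110.00, 1000: 70.00, 250: 45.00}


def cost_for_capacity(target_gb, drive_gb, drive_price):
    """Return (number of drives, total cost) needed to reach at least target_gb."""
    count = math.ceil(target_gb / drive_gb)
    return count, count * drive_price


# Compare the cost of reaching 2TB with each drive size.
for drive_gb, price in PRICES.items():
    count, total = cost_for_capacity(2000, drive_gb, price)
    print(f"{count} x {drive_gb}GB: ${total:.2f} total, ${total / 2:.2f} per TB")
```

The smaller drives also occupy more SATA ports for the same capacity (8 ports for 250GB drives versus 1 for a 2TB drive), which is the room-to-grow point made in the same message.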
