Ryan Byrd wrote:
Let's say I need to store a petabyte of data. I need fast access (tape/DVDs
aren't fast enough) and redundancy like RAID.

By "fast access" do you mean both read and write, mainly read (once it's 
written), or...?  What RAID level would you run?  This will affect both your
access speed and the raw capacity you need.
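
To put rough numbers on the RAID question, here is a minimal Python sketch of how much raw capacity 1PB usable would need at a few common RAID levels. The 14-drive group size is my assumption (borrowed from the 14-drive shelf in Scenario 1), the function names are made up for illustration, and the ratios are the textbook ones; hot spares, formatting, and vendor overhead are ignored, so real numbers will be worse.

```python
def usable_fraction(level, drives_per_group):
    """Textbook fraction of raw capacity left usable in one RAID group."""
    if level == "raid10":
        return 0.5  # every block is mirrored once
    if level == "raid5":
        return (drives_per_group - 1) / drives_per_group  # one drive of parity
    if level == "raid6":
        if drives_per_group < 3:
            raise ValueError("raid6 needs at least 3 drives in this sketch")
        return (drives_per_group - 2) / drives_per_group  # two drives of parity
    raise ValueError(f"unknown RAID level: {level}")


def raw_tb_needed(usable_tb, level, drives_per_group):
    """Raw terabytes to buy in order to end up with usable_tb of space."""
    return usable_tb / usable_fraction(level, drives_per_group)


USABLE_TB = 1000  # 1PB, counted in decimal terabytes
GROUP = 14        # assumed drives per RAID group

for level in ("raid10", "raid5", "raid6"):
    raw = raw_tb_needed(USABLE_TB, level, GROUP)
    print(f"{level}: about {raw:.0f} TB raw for {USABLE_TB} TB usable")
```

Mirroring doubles the raw requirement, while the parity levels cost far less space but behave differently on writes and rebuilds, which is why the read/write mix matters.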

How should I do it? Initially the data will be in large (several MB) binary
objects and could be stored as files, but eventually, it will need to be
placed into a relational database like Oracle.

How will the data be accessed?  Are the "several MB binary objects" to be read 
as chunks or streamed, accessed at random locations, indexed, searched?

Let's say I have access to racks that are 44U tall. I've listed several very
different scenarios. Which is best? Is there a better way? How would *you*
store 1PB? What do you/your company currently use to store your large
datasets?
Large SANs.

What kind of drives should I use? Do SCSI drives last longer than
SATA?
Depends on who you ask, but I doubt you're going to find an enterprise-class 
storage system running on SATA drives.

What about IDE or SAS?
SAS has probably already surpassed regular old SCSI.

Do higher RPM drives have a shorter mean time to failure?
Perhaps, but that's going to be the least of your worries...  With the 
thousands of drives required to make up your 1PB, you'll be lucky if only a few 
fail before even spinning up.
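
For a sense of scale, here is a small Python sketch of the drive count. The 300GB drive size comes from Scenario 1; the 3% annual failure rate is purely an illustrative placeholder, not a measured figure for any drive type, and both function names are made up for this example.

```python
import math


def drives_needed(raw_tb, drive_gb):
    """Number of drives needed to reach raw_tb of raw capacity (decimal units)."""
    return math.ceil(raw_tb * 1000 / drive_gb)


def expected_failures_per_year(drive_count, annual_failure_rate):
    """Average number of drive failures per year for a given failure rate."""
    return drive_count * annual_failure_rate


count = drives_needed(1000, 300)  # 1PB raw on 300GB drives, before RAID overhead
print(f"{count} drives")

ASSUMED_AFR = 0.03  # hypothetical 3% per year, for illustration only
print(f"about {expected_failures_per_year(count, ASSUMED_AFR):.0f} failures per year")
```

Whatever the real failure rate turns out to be, multiplying it by a few thousand spindles means drive replacement becomes a routine operation rather than an exception.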


How will the data be sucked off the disk?  Several large servers?  A large farm 
of clients?  You won't want to connect it all to a single machine, so...where 
are your real bottlenecks?

Do you need enterprise features such as snapshotting?  What about redundancy 
(not just disk, but connectivity, power feeds and supplies, etc.)?

Scenario 1:
----------------
EMC CLARiiON CX300 PSI w/ Fiber channel
14x300GB ultrawide SCSI

Sounds like small potatoes.  If you're going to get something from EMC, why not 
something larger than the CX300?  DMX series perhaps 
(http://www.emc.com/products/systems/dmx_compare.jsp)?  Rather than 33 racks full of 
tiny boxes at only 31TB per rack, there are larger boxes (storage-wise) that have a 
smaller footprint.  In fact <VENDOR REDACTED> will be coming out with a very 
nice rackmount storage box that...er...never mind...I can't talk about that.
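
The rack count is simple division, sketched here in Python. The 31TB per rack is the figure from this scenario; the 100TB-per-rack alternative is a hypothetical number I picked only to show how quickly density changes the footprint, not a spec for any real product.

```python
import math


def racks_needed(total_tb, tb_per_rack):
    """Whole racks required to hold total_tb at a given density."""
    return math.ceil(total_tb / tb_per_rack)


TOTAL_TB = 1000  # 1PB in decimal terabytes

print(racks_needed(TOTAL_TB, 31))   # density quoted for this scenario
print(racks_needed(TOTAL_TB, 100))  # hypothetical denser box, for comparison
```

Every rack saved also saves floor space, power feeds, and cooling, so density is worth pricing out even when the denser box costs more per terabyte.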

Scenario 2:
----------------
Dell PowerVault MD3000

And connect them all to what?  Dell jokes aside, is this really what you'd want?

Scenario 3 & 4:
----------------
HP ProLiant DL* Server
Yes, it'd be a lot cheaper starting out, but what about the maintenance 
associated with an operating system, etc.?

Do you have lots of datacenter space that you really need to use up, or does 
physical size matter (I mean, you're talking about solutions that take up 
nearly 50 racks!!!)?  What are your power and cooling capacities and costs in 
your datacenter?  With that much hardware, your environmentals are going to 
become a factor.  What are your maintenance and support costs?

I suspect you don't already have 1PB of data to fill up all this storage right 
away, so can you use a phased approach that allows you to get some storage now, 
more later on, etc.?  Take advantage of falling costs, improving technology, 
and what the vendor you choose has on their roadmap as you fill out the entire 
1PB.  Is the 1PB expected to grow?  Where is the data coming from, and how 
quickly will it be generated, etc.?
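
A phased rollout can be sketched in Python once you know your ingest rate. Everything numeric here is hypothetical: the 250TB phase size and the 25TB per month of new data are placeholders to be replaced with your own figures, and `phase_schedule` is just a name I made up for this example.

```python
import math


def phase_schedule(target_tb, phase_tb, ingest_tb_per_month):
    """Return (month, installed_tb) pairs: when each phase must be online.

    Assumes data arrives at a constant rate and that each new phase has to
    be in place by the month the previously installed capacity fills up.
    """
    if phase_tb <= 0 or ingest_tb_per_month <= 0:
        raise ValueError("phase size and ingest rate must be positive")
    schedule = []
    installed = 0
    while installed < target_tb:
        # Month at which stored data reaches the capacity installed so far.
        month = math.floor(installed / ingest_tb_per_month)
        installed = min(installed + phase_tb, target_tb)
        schedule.append((month, installed))
    return schedule


# Hypothetical inputs: 1PB target, 250TB per phase, 25TB of new data per month.
for month, installed in phase_schedule(1000, 250, 25):
    print(f"month {month:>2}: {installed} TB installed")
```

With those made-up inputs the last quarter of the storage is not needed until month 30, which is a long time for prices to fall and for the vendor's roadmap to deliver.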

There are lots more questions to answer before you can really decide on a 
solution, and with that kind of change floating around, you'll want to define 
your requirements a lot more.  I suggest you spend the time to do that, then 
request proposals from several vendors.  With the appropriate reciprocal NDAs 
and those proposals in hand, you should be able to make a more informed 
decision.

If you have additional questions, or need help contacting a vendor, let me 
know.  My finder's fee and consulting costs are quite reasonable ;)

Frank

/*
PLUG: http://plug.org, #utah on irc.freenode.net
Unsubscribe: http://plug.org/mailman/options/plug
Don't fear the penguin.
*/
