We are looking at building a 50TB DSpace repository in AWS and are new to
DSpace.  At this scale, it does not look like Amazon Relational Database
Service (RDS) can meet the 50TB requirement, since RDS has a 3TB maximum size
limit.  Answers to the following questions will help us decide how to go about
this task:

1.      Can DSpace store content external to the database?  Amazon S3 is a
good place to store the data, but it is object storage, not a database.  The
database could then store a pointer to that external data.  A URL in DSpace
would be a good way to access S3 data.  Comparing S3 (Simple Storage Service)
and EBS (Elastic Block Store) costs for 50TB makes S3 look very attractive.

2.      What are the High Availability solutions for DSpace?

3.      Is there a replication mechanism in DSpace for High Availability if we
store in Amazon ephemeral (instance store) storage, which is not persistent?
This replication would synchronize the database across multiple Amazon
Availability Zones in the same Region.  Ephemeral storage is another, much
less costly alternative to EBS.  It is not all that reliable, though: when an
instance fails, its data is lost.

4.      How far along is the MySQL implementation in DSpace?  I saw an article
about MySQL in the email lists, but it was several years old.

5.      Is there a Hadoop alternative for DSpace storage?

6.      Is there a NoSQL alternative for DSpace storage?
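
To make question 1 concrete, here is a minimal sketch of the pointer idea: the
content lives in S3 and the database row holds only a URL.  Everything in it is
an assumption for illustration (the table name content_pointer, the bucket
name, and the helper functions are hypothetical); it is not DSpace's schema or
API, and it uses an in-memory SQLite database only to stay self-contained.

```python
import sqlite3
from urllib.parse import quote


def s3_url(bucket: str, key: str) -> str:
    """Build a virtual-hosted-style S3 URL (bucket and key are hypothetical)."""
    return f"https://{bucket}.s3.amazonaws.com/{quote(key)}"


def store_pointer(conn: sqlite3.Connection, item_id: int, bucket: str, key: str) -> None:
    """Record where an item's content lives; the content itself stays in S3."""
    conn.execute(
        "INSERT INTO content_pointer (item_id, url) VALUES (?, ?)",
        (item_id, s3_url(bucket, key)),
    )


def lookup_pointer(conn: sqlite3.Connection, item_id: int):
    """Return the stored URL for an item, or None if no pointer exists."""
    row = conn.execute(
        "SELECT url FROM content_pointer WHERE item_id = ?", (item_id,)
    ).fetchone()
    return row[0] if row else None


# Hypothetical table: one small row per item, regardless of content size.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE content_pointer (item_id INTEGER PRIMARY KEY, url TEXT NOT NULL)"
)
store_pointer(conn, 1, "example-repository-bucket", "assets/0001.pdf")
print(lookup_pointer(conn, 1))
```

With this layout the database only grows with the number of items, not with
the 50TB of content, which is why the RDS size limit would matter much less.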

Thank you.  The storage requirement grew from 100GB to 50TB in the blink of an
eye.  Now comes the scaling part of it.

Charles Keagle
Sr. Cloud Engineer | 2nd Watch
603 Stewart St, Suite 707 | Seattle, WA | 98101
Mobile 425-417-3434 | Office 888.747.8254
http://www.2ndwatch.com
_______________________________________________
DSpace-tech mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dspace-tech
List Etiquette: https://wiki.duraspace.org/display/DSPACE/Mailing+List+Etiquette
