We are planning to build a 50 TB DSpace repository in AWS and are new to DSpace. At this scale, Amazon Relational Database Service (RDS) does not look able to meet the requirement, because RDS has a 3 TB maximum size limit. Answers to the following questions will help us decide how to approach the task:
1. Can DSpace store content outside the database? Amazon S3 looks like a good place for the data, but it is object storage, not a database. The database could then store a pointer to the external data, and a URL in DSpace would be a good way to access the S3 objects. Comparing S3 (Simple Storage Service) and EBS (Elastic Block Store) costs for 50 TB makes S3 look very attractive.
2. What are the high-availability solutions for DSpace?
3. Is there a replication mechanism in DSpace for high availability if we store data on Amazon ephemeral storage, which is not persistent? The replication would synchronize the database across multiple Amazon Availability Zones in the same Region. This is a much less costly alternative to EBS, but it is not very reliable on its own: when an instance fails, its data is lost.
4. How far along is the MySQL implementation in DSpace? I saw a post on the mailing lists about MySQL, but it was several years old.
5. Is there a Hadoop alternative for DSpace storage?
6. Is there a NoSQL alternative for DSpace storage?

Thank you. The storage requirement grew from 100 GB to 50 TB in the blink of an eye. Now comes the scaling part.

Charles Keagle
Sr. Cloud Engineer | 2nd Watch
603 Stewart St, Suite 707 | Seattle, WA | 98101
Mobile 425-417-3434 | Office 888.747.8254
http://www.2ndwatch.com
_______________________________________________ DSpace-tech mailing list [email protected] https://lists.sourceforge.net/lists/listinfo/dspace-tech List Etiquette: https://wiki.duraspace.org/display/DSPACE/Mailing+List+Etiquette

