I have been looking at making DSpace store content in an S3 bucket, so that 
the metadata goes into the Oracle database and the content goes directly to 
S3.  This is far less costly than using EBS volumes at the 50 TB scale we are 
looking at.

With s3cmd, HTTP access to S3 performs much better than going through the 
operating system call.  The HTTP put or get goes straight from the file's 
current storage to S3.  The operating system call first copies the file down 
to the EC2 instance, then copies it over to S3.  That S3 copy is sequential, 
so start it and take a nap.

When DSpace writes the content outside the database, the command would be 
something like:
s3cmd put filename.pdf s3://something/subdir1/subdir2/filename.pdf
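
As a rough sketch of how an application might wrap that call, the Python 
below builds the s3cmd argument list and runs it.  s3cmd takes the local file 
first and an s3://BUCKET/KEY URI second.  The function names and the bucket 
name "my-bucket" are hypothetical, and this is not how DSpace does it today.

```python
import subprocess


def build_put_command(local_file, bucket, key):
    """Return the argument list for uploading one local file with s3cmd."""
    # s3cmd syntax: s3cmd put FILE s3://BUCKET/KEY
    return ["s3cmd", "put", str(local_file), f"s3://{bucket}/{key}"]


def put_file(local_file, bucket, key):
    """Run the upload; raises CalledProcessError if s3cmd exits non-zero."""
    # Assumes s3cmd is installed and already configured with credentials.
    subprocess.run(build_put_command(local_file, bucket, key), check=True)


if __name__ == "__main__":
    # "my-bucket" is a placeholder bucket name, used only for illustration.
    print(build_put_command("filename.pdf", "my-bucket",
                            "subdir1/subdir2/filename.pdf"))
```

Passing the arguments as a list rather than a single shell string avoids 
quoting problems with file names that contain spaces.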

Note that I made S3 look like a file system, which it is not.  But naming 
the objects this way makes S3 look like a file system to the end user.
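
To illustrate the point, here is a minimal sketch of how the "subdirectories" 
are really just part of a flat object key, and how filtering on a key prefix 
gives the appearance of a directory listing.  The helper names are 
hypothetical and the key list is made-up sample data, not real bucket 
contents.

```python
def object_key(*parts):
    """Join path-like parts into one S3 object key.

    S3 has no directories; the "/" characters are simply part of the key.
    """
    cleaned = [p.strip("/") for p in parts if p and p.strip("/")]
    return "/".join(cleaned)


def keys_under(keys, subdir):
    """Return the keys that appear to live "inside" a pseudo-directory."""
    prefix = object_key(subdir)
    if prefix:
        prefix += "/"
    # A prefix match is all that a "directory listing" amounts to.
    return [k for k in keys if k.startswith(prefix)]


if __name__ == "__main__":
    sample = [
        object_key("subdir1", "subdir2", "filename.pdf"),
        object_key("subdir1", "other.pdf"),
        object_key("subdir3", "third.pdf"),
    ]
    print(keys_under(sample, "subdir1/subdir2"))
```

The trailing "/" on the prefix matters: without it, a prefix of "subdir1" 
would also match keys under a sibling such as "subdir10".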

Operating system file access through s3fs is really slow and not that 
reliable.  The mount tends to fail and has to be remounted from time to time.

I think the DSpace ItemImport.java class can be modified to write the external 
data to S3 this way.  Has this been looked at in the past?  Is there a clean 
way to do it?

Thank you.


Charles Keagle
Sr. Cloud Engineer | 2nd Watch
603 Stewart St, Suite 707 | Seattle, WA | 98101
Mobile 425-417-3434 | Office 888.747.8254
http://www.2ndwatch.com
_______________________________________________
DSpace-tech mailing list
DSpace-tech@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dspace-tech
List Etiquette: https://wiki.duraspace.org/display/DSPACE/Mailing+List+Etiquette
