Hi Joseph,
Going with S3 would actually be a great way to break the "we can't put
it in the repository because we'll run out of disk space" barrier, and
for cheap. Many repo admins will likely also want to consult the
trusted repository handbook, and to check their legal rights to move
uploaded files to other storage silos. Concerns like those are why our
university has a massive data center (and massive data center costs).
The s3fs option sounds like the quickest to accomplish.
Refactoring DSpace to have a pluggable asset-storage system, and then
implementing it for S3, would take some effort. Hopefully
someone more knowledgeable can chime in (there may be some prior art).
The downside of having to make a network connection for disk access
shows up when you run index-init or filter-media, which then have to make
network requests for what are typically fast disk accesses. It should work
fine, but those tasks will be much slower. This is likely less
of a problem if you're going with Amazon EC2.
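If you want a rough number for "much slower" before committing, a small
script along these lines could read a sample of files from a directory and
time it. This is only a sketch: the function name, the sample size, and the
paths you pass in are all made up for illustration, and it has not been run
against a real s3fs mount.

```python
import os
import sys
import time


def time_reads(root, max_files=100, chunk_size=1024 * 1024):
    """Read up to max_files files under root.

    Returns (files_read, bytes_read, elapsed_seconds).
    The name and defaults are illustrative, not part of DSpace or s3fs.
    """
    files_read = 0
    bytes_read = 0
    start = time.perf_counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if files_read >= max_files:
                return files_read, bytes_read, time.perf_counter() - start
            # Read in chunks so large bitstreams do not fill memory.
            with open(os.path.join(dirpath, name), "rb") as handle:
                while True:
                    chunk = handle.read(chunk_size)
                    if not chunk:
                        break
                    bytes_read += len(chunk)
            files_read += 1
    return files_read, bytes_read, time.perf_counter() - start


if __name__ == "__main__":
    # Pass one or more directories, e.g. a local assetstore and a mounted one.
    for root in sys.argv[1:]:
        count, size, seconds = time_reads(root)
        print(f"{root}: {count} files, {size} bytes in {seconds:.2f}s")
```

Run it once against the local assetstore and once against the mounted
directory. Keep in mind that the OS page cache can make a second local run
look faster than it really is.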
All that said, having S3 would be useful for managing multiple
development environments, where rsyncing production's assetstore to an
external drive connected to each computer becomes a chore. Not sure if
rsync to S3 is much better, though.
Also, the demo.dspace.org site resides wholly in Amazon EC2, likely
on an EBS filesystem. So there's nothing wrong with Amazon; it's just
that, whatever solution you use, your virtual CPU and virtual disk
should be as close together as possible. Or, perhaps, as close as
possible to the end user.
@Hardy, I don't think the 64GB max per file is going to slow me down
at all. Our entire repo is about that size, and that's thousands of files.
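For anyone who wants to check their own assetstore against that limit, a
minimal Python sketch like the one below would total up the files and flag
any that are too big. The function name is hypothetical, and treating 64GB as
64 * 1024^3 bytes is my assumption; check the s3fs documentation for the
exact figure.

```python
import os

# Assumption: the 64GB s3fs limit interpreted as 64 * 1024^3 bytes.
S3FS_MAX_FILE_BYTES = 64 * 1024 ** 3


def summarize_assetstore(root, limit=S3FS_MAX_FILE_BYTES):
    """Walk root and return (file_count, total_bytes, oversized_paths).

    oversized_paths lists every file strictly larger than limit.
    """
    file_count = 0
    total_bytes = 0
    oversized_paths = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            size = os.path.getsize(path)
            file_count += 1
            total_bytes += size
            if size > limit:
                oversized_paths.append(path)
    return file_count, total_bytes, oversized_paths
```

Calling summarize_assetstore() on the assetstore directory and printing the
three values is enough to see whether any single file would be a problem.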
On 3/18/11, Pottinger, Hardy J. pottinge...@umsystem.edu wrote:
Hi, I'm certainly not an expert in this area, but from my quick read,
depending on the use case for your repository, this looks like something
that might work. One thing to be aware of is the 64GB max file size imposed
by s3fs, and the potential for S3's Eventual Consistency model to cause
problems with user submissions. More details on the s3fs wiki:
http://code.google.com/p/s3fs/wiki/EventualConsistency
I'm interested in hearing more about this, if anyone has actually played
around with putting an assetstore on s3fs.
--Hardy
-Original Message-
From: Joseph Rhoads [mailto:jrho...@westga.edu]
Sent: Friday, March 18, 2011 1:12 PM
To: dspace-devel@lists.sourceforge.net
Subject: [Dspace-devel] Using Amazon S3 for an Assetstore
I've seen some talk about integrating Amazon S3 as an assetstore (or
bitstream store as it's sometimes called).
Has anyone tried using something like s3fs, a FUSE-based file system on
Amazon S3?
(I know there are several flavors of the same idea around but
http://code.google.com/p/s3fs/ seems like a fairly mature one. Another
is http://code.google.com/p/s3ql/ )
And then just using a directory on the mounted fs as your directory for
the assetstore.
Are there subtleties that I haven't noticed (after a 10 minute first
glance) that would make it apparent that this is a bad idea?
Has anyone done this successfully?
-Joseph
--
Colocation vs. Managed Hosting
A question and answer guide to determining the best fit
for your organization - today and in the future.
http://p.sf.net/sfu/internap-sfd2d
___
Dspace-devel mailing list
Dspace-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dspace-devel
--
Peter Dietz