Glacier sounds even better than S3 for what we're looking for. We will only be 
retrieving the files in the event of corruption, so the pay-per-retrieval model 
would work well. I had heard of Glacier in the past but forgot all about it. 
Thank you.

Josh Welker


-----Original Message-----
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Roy 
Tennant
Sent: Thursday, January 10, 2013 4:56 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] Digital collection backups

I'd also take a look at Amazon Glacier. Recently I parked about 50 GB of data 
files in logical tar'd and gzip'd chunks, and it's costing my employer less than 
50 cents/month. Glacier, however, is best for "park it and forget it" kinds of 
needs, as the real cost is in data flow. Storage is cheap, but it must be 
considered "offline" or "near-line": you first request retrieval of a file, wait 
about a day, and then download it. And the download throughput is charged at a 
higher rate than just about anything else.
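
As a rough illustration of the upload side, here is a minimal sketch in Python 
with boto3 (not the Perl client referenced below); the vault name and file name 
are placeholders:

import boto3

glacier = boto3.client("glacier", region_name="us-east-1")

# Upload one tar'd and gzip'd chunk; boto3 computes the required tree-hash
# checksum automatically.
with open("collection-chunk-001.tar.gz", "rb") as body:
    resp = glacier.upload_archive(
        vaultName="digital-collection-backups",   # placeholder vault name
        archiveDescription="collection chunk 001",
        body=body,
    )

# The returned archive ID is the only handle for retrieving this chunk later,
# so record it somewhere durable.
print(resp["archiveId"])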

I'm using a Unix client to handle all of the heavy lifting of uploading and 
downloading, as Glacier is meant to be used via an API rather than a web 
client.[1] If anyone is interested, I have local documentation on usage that I 
could probably genericize. And yes, I did round-trip a file to make sure it 
functioned as advertised.

Roy

[1] https://github.com/vsespb/mt-aws-glacier
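
For the retrieval side, a similar sketch of the request-wait-download cycle 
(again boto3, with placeholder names; a production script would use an SNS 
notification rather than a polling loop):

import time
import boto3

glacier = boto3.client("glacier", region_name="us-east-1")
vault = "digital-collection-backups"   # placeholder vault name
archive_id = "EXAMPLE-ARCHIVE-ID"      # placeholder; saved from the upload step

# Ask Glacier to stage the archive for download.
job = glacier.initiate_job(
    vaultName=vault,
    jobParameters={"Type": "archive-retrieval", "ArchiveId": archive_id},
)

# Retrieval jobs take hours to complete, hence the "wait about a day" above.
while not glacier.describe_job(vaultName=vault, jobId=job["jobId"])["Completed"]:
    time.sleep(3600)

# Once the job is done, download the staged bytes and write them back to disk.
output = glacier.get_job_output(vaultName=vault, jobId=job["jobId"])
with open("collection-chunk-001.tar.gz", "wb") as f:
    f.write(output["body"].read())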

On Thu, Jan 10, 2013 at 2:29 PM,  <ddwigg...@historicnewengland.org> wrote:
> We built our own solution for this by creating a plugin that works with our 
> digital asset management system (ResourceSpace) to individually back up files 
> to Amazon S3. Because S3 is replicated to multiple data centers, this 
> provides a fairly high level of redundancy. And because it's an object-based 
> web service, we can access any given object individually by using a URL 
> related to the original storage URL within our system.
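>
> As a rough sketch of what such a hook might look like (Python/boto3 here, 
> with an invented bucket name, paths, and helper function rather than the 
> actual ResourceSpace plugin):
>
> import boto3
>
> s3 = boto3.client("s3")
>
> def backup_asset(local_path, dam_relative_path):
>     """Copy one asset to S3 under a key that mirrors its path in the DAM."""
>     # e.g. dam_relative_path = "filestore/1/2/3_abc/photo.tif" (invented)
>     s3.upload_file(local_path, "example-dam-backup", dam_relative_path)
>     return "https://example-dam-backup.s3.amazonaws.com/" + dam_relative_path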
>
> This also allows us to take advantage of S3 for images on our website. All of 
> the images in our online collections database are being served straight 
> from S3, which diverts the load from our public web server. When we launch 
> zoomable images later this year, all of the tiles will also be generated 
> locally in the DAM and then served to the public via the mirrored copy in S3.
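>
> A minimal sketch of publishing one locally generated tile to that mirrored 
> copy (again boto3, with invented names; the real setup might use a bucket 
> policy rather than per-object ACLs):
>
> import boto3
>
> s3 = boto3.client("s3")
>
> # Push one tile to the mirror bucket and mark it public so it can be served
> # straight from S3 instead of the public web server.
> s3.upload_file(
>     "tiles/obj123/z4/x10_y7.jpg",
>     "example-dam-backup",
>     "tiles/obj123/z4/x10_y7.jpg",
>     ExtraArgs={"ContentType": "image/jpeg", "ACL": "public-read"},
> )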
>
> The current pricing is around $0.08/GB/month for 1-50 TB, which I think is 
> fairly reasonable for what we're getting. They just dropped the price 
> substantially a few months ago.
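>
> (At that rate, a 1 TB collection works out to roughly 1,024 GB x $0.08, or 
> about $82/month, before request and transfer charges.)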
>
> DuraCloud http://www.duracloud.org/ supposedly offers a way to add another 
> abstraction layer so you can build something like this that is portable 
> between different cloud storage providers, but I haven't really looked into 
> it yet.
>
> -David
>
>
> __________
>
> David Dwiggins
> Systems Librarian/Archivist, Historic New England
> 141 Cambridge Street, Boston, MA 02114
> (617) 994-5948
> ddwigg...@historicnewengland.org
> http://www.historicnewengland.org
>>>> Joshua Welker <jwel...@sbuniv.edu> 1/10/2013 5:20 PM >>>
> Hi everyone,
>
> We are starting a digitization project for some of our special collections, 
> and we are having a hard time setting up a backup system that meets the 
> long-term preservation needs of digital archives. The backup mechanisms 
> currently used by campus IT are short-term full-server backups. What we are 
> looking for is more granular, file-level backup over the very long term. Does 
> anyone have any recommendations for software, a service, or a technique? We 
> are looking into LOCKSS but haven't dug too deeply yet. Can anyone who uses 
> LOCKSS tell me a bit about their experiences with it?
>
> Josh Welker
> Electronic/Media Services Librarian
> College Liaison
> University Libraries
> Southwest Baptist University
> 417.328.1624
