Re: [CODE4LIB] Digital collection backups

2013-01-11 Thread Joshua Welker
David,

That sounds like a definite option. Thanks. Does S3 have an API for uploading so 
that the upload process could be scripted, or do you manually upload each file?
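
For what it's worth, S3 does expose a REST API with official and third-party SDKs, so the
upload step can be scripted rather than done by hand. A minimal sketch in Python with the
boto3 SDK (an assumption on my part; the thread does not name a client, and the bucket and
directory below are hypothetical):

import os
import boto3  # AWS SDK for Python; any S3 client library would work the same way

s3 = boto3.client("s3")  # credentials are read from the environment or ~/.aws

def backup_directory(local_root, bucket):
    # Upload every file under local_root, reusing its relative path as the S3 key.
    for dirpath, _dirnames, filenames in os.walk(local_root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            key = os.path.relpath(path, local_root).replace(os.sep, "/")
            s3.upload_file(path, bucket, key)
            print("uploaded", key)

backup_directory("/data/digitized", "example-preservation-bucket")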

Josh Welker


-Original Message-
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of 
ddwigg...@historicnewengland.org
Sent: Thursday, January 10, 2013 4:29 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] Digital collection backups

We built our own solution for this by creating a plugin that works with our 
digital asset management system (ResourceSpace) to individually back up files to 
Amazon S3. Because S3 is replicated to multiple data centers, this provides a 
fairly high level of redundancy. And because it's an object-based web service, 
we can access any given object individually by using a URL related to the 
original storage URL within our system.
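
As an illustration of that kind of URL mapping, a minimal sketch; the storage root,
bucket, and file path below are hypothetical, not how ResourceSpace actually lays out
its filestore:

# Derive an S3 key and public URL from a file's path inside the local DAM storage.
# Everything below the storage root is reused as the S3 key, so the local URL and
# the mirrored URL stay related by a simple prefix swap.
LOCAL_ROOT = "/var/www/resourcespace/filestore"             # hypothetical
BUCKET_URL = "https://example-dam-mirror.s3.amazonaws.com"  # hypothetical bucket

def s3_url_for(local_path):
    key = local_path[len(LOCAL_ROOT):].lstrip("/")
    return BUCKET_URL + "/" + key

print(s3_url_for("/var/www/resourcespace/filestore/123/456_scan.tif"))
# -> https://example-dam-mirror.s3.amazonaws.com/123/456_scan.tif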
 
This also allows us to take advantage of S3 for images on our website. All of 
the images in our online collections database are being served straight 
from S3, which diverts the load from our public web server. When we launch 
zoomable images later this year, all of the tiles will also be generated 
locally in the DAM and then served to the public via the mirrored copy in S3.
 
The current pricing is around $0.08/GB/month for 1-50 TB, which I think is 
fairly reasonable for what we're getting. They just dropped the price 
substantially a few months ago.
 
DuraCloud http://www.duracloud.org/ supposedly offers a way to add another 
abstraction layer so you can build something like this that is portable between 
different cloud storage providers. But I haven't really looked into this as of 
yet.
 
-David

 
__
 
David Dwiggins
Systems Librarian/Archivist, Historic New England
141 Cambridge Street, Boston, MA 02114
(617) 994-5948
ddwigg...@historicnewengland.org
http://www.historicnewengland.org
 Joshua Welker jwel...@sbuniv.edu 1/10/2013 5:20 PM 
Hi everyone,

We are starting a digitization project for some of our special collections, and 
we are having a hard time setting up a backup system that meets the long-term 
preservation needs of digital archives. The backup mechanisms currently used by 
campus IT are short-term full-server backups. What we are looking for is more 
granular, file-level backup over the very long term. Does anyone have any 
recommendations of software or some service or technique? We are looking into 
LOCKSS but haven't dug too deeply yet. Can anyone who uses LOCKSS tell me a bit 
of their experiences with it?

Josh Welker
Electronic/Media Services Librarian
College Liaison
University Libraries
Southwest Baptist University
417.328.1624


Re: [CODE4LIB] Digital collection backups

2013-01-11 Thread Gary McGath
Concerns have been raised about how expensive Glacier gets if you need
to recover a lot of files in a short time period.

http://www.wired.com/wiredenterprise/2012/08/glacier/

On 1/10/13 5:56 PM, Roy Tennant wrote:
 I'd also take a look at Amazon Glacier. Recently I parked about 50GB
 of data files in logical tar'd and gzip'd chunks and it's costing my
 employer less than 50 cents/month. Glacier, however, is best for "park
 it and forget" kinds of needs, as the real cost is in data flow.
 Storage is cheap, but must be considered offline or near-line, as
 you must first request to retrieve a file, wait for about a day, and
 then retrieve the file. And you're charged more for the download
 throughput than just about anything.
 
 I'm using a Unix client to handle all of the heavy lifting of
 uploading and downloading, as Glacier is meant to be used via an API
 rather than a web client.[1] If anyone is interested, I have local
 documentation on usage that I could probably genericize. And yes, I
 did round-trip a file to make sure it functioned as advertised.
 Roy
 
 [1] https://github.com/vsespb/mt-aws-glacier
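
Roy's client above is the Perl tool mt-aws-glacier. Purely to illustrate the same round
trip against the Glacier API, a sketch in Python with boto3; the vault name and file names
are hypothetical, and error handling is omitted:

import boto3

glacier = boto3.client("glacier")
VAULT = "example-preservation-vault"  # hypothetical

# Upload: Glacier returns an archive ID that you must record yourself.
with open("specialcollections-2013-01.tar.gz", "rb") as fh:
    archive = glacier.upload_archive(vaultName=VAULT, body=fh)
print("archive id:", archive["archiveId"])

# Retrieval is asynchronous: start a job, wait until it completes (typically hours),
# then download the job output.
job = glacier.initiate_job(
    vaultName=VAULT,
    jobParameters={"Type": "archive-retrieval", "ArchiveId": archive["archiveId"]},
)
# ... later, once describe_job reports the job as Completed ...
output = glacier.get_job_output(vaultName=VAULT, jobId=job["jobId"])
open("restored.tar.gz", "wb").write(output["body"].read())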
 
 On Thu, Jan 10, 2013 at 2:29 PM,  ddwigg...@historicnewengland.org wrote:
 We built our own solution for this by creating a plugin that works with our 
 digital asset management system (ResourceSpace) to individually back up files 
 to Amazon S3. Because S3 is replicated to multiple data centers, this 
 provides a fairly high level of redundancy. And because it's an object-based 
 web service, we can access any given object individually by using a URL 
 related to the original storage URL within our system.

 This also allows us to take advantage of S3 for images on our website. All 
 of the images in our online collections database are being served 
 straight from S3, which diverts the load from our public web server. When we 
 launch zoomable images later this year, all of the tiles will also be 
 generated locally in the DAM and then served to the public via the mirrored 
 copy in S3.

 The current pricing is around $0.08/GB/month for 1-50 TB, which I think is 
 fairly reasonable for what we're getting. They just dropped the price 
 substantially a few months ago.

 DuraCloud http://www.duracloud.org/ supposedly offers a way to add another 
 abstraction layer so you can build something like this that is portable 
 between different cloud storage providers. But I haven't really looked into 
 this as of yet.


-- 
Gary McGath, Professional Software Developer
http://www.garymcgath.com


Re: [CODE4LIB] Digital collection backups

2013-01-11 Thread Joshua Welker
Glacier sounds even better than S3 for what we're looking for. We are only 
going to be retrieving the files in the case of corruption, so the 
pay-per-retrieval model would work well. I had heard of Glacier in the past but 
forgot all about it. Thank you.

Josh Welker


-Original Message-
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Roy 
Tennant
Sent: Thursday, January 10, 2013 4:56 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] Digital collection backups

I'd also take a look at Amazon Glacier. Recently I parked about 50GB of data 
files in logical tar'd and gzip'd chunks and it's costing my employer less than 
50 cents/month. Glacier, however, is best for "park it and forget" kinds of 
needs, as the real cost is in data flow.
Storage is cheap, but must be considered offline or near-line, as you must 
first request to retrieve a file, wait for about a day, and then retrieve the 
file. And you're charged more for the download throughput than just about 
anything.

I'm using a Unix client to handle all of the heavy lifting of uploading and 
downloading, as Glacier is meant to be used via an API rather than a web 
client.[1] If anyone is interested, I have local documentation on usage that I 
could probably genericize. And yes, I did round-trip a file to make sure it 
functioned as advertised.
Roy

[1] https://github.com/vsespb/mt-aws-glacier

On Thu, Jan 10, 2013 at 2:29 PM,  ddwigg...@historicnewengland.org wrote:
 We built our own solution for this by creating a plugin that works with our 
 digital asset management system (ResourceSpace) to individually back up files 
 to Amazon S3. Because S3 is replicated to multiple data centers, this 
 provides a fairly high level of redundancy. And because it's an object-based 
 web service, we can access any given object individually by using a URL 
 related to the original storage URL within our system.

 This also allows us to take advantage of S3 for images on our website. All of 
 the images in our online collections database are being served straight 
 from S3, which diverts the load from our public web server. When we launch 
 zoomable images later this year, all of the tiles will also be generated 
 locally in the DAM and then served to the public via the mirrored copy in S3.

 The current pricing is around $0.08/GB/month for 1-50 TB, which I think is 
 fairly reasonable for what we're getting. They just dropped the price 
 substantially a few months ago.

 DuraCloud http://www.duracloud.org/ supposedly offers a way to add another 
 abstraction layer so you can build something like this that is portable 
 between different cloud storage providers. But I haven't really looked into 
 this as of yet.

 -David


 __

 David Dwiggins
 Systems Librarian/Archivist, Historic New England
 141 Cambridge Street, Boston, MA 02114
 (617) 994-5948
 ddwigg...@historicnewengland.org
 http://www.historicnewengland.org
 Joshua Welker jwel...@sbuniv.edu 1/10/2013 5:20 PM 
 Hi everyone,

 We are starting a digitization project for some of our special collections, 
 and we are having a hard time setting up a backup system that meets the 
 long-term preservation needs of digital archives. The backup mechanisms 
 currently used by campus IT are short-term full-server backups. What we are 
 looking for is more granular, file-level backup over the very long term. Does 
 anyone have any recommendations of software or some service or technique? We 
 are looking into LOCKSS but haven't dug too deeply yet. Can anyone who uses 
 LOCKSS tell me a bit of their experiences with it?

 Josh Welker
 Electronic/Media Services Librarian
 College Liaison
 University Libraries
 Southwest Baptist University
 417.328.1624


Re: [CODE4LIB] Digital collection backups

2013-01-11 Thread Joshua Welker
Good point. But since campus IT will be creating regular disaster-recovery 
backups, the odds that we'd ever need to retrieve more than a handful of 
files from Glacier at a time are pretty low.

Josh Welker


-Original Message-
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Gary 
McGath
Sent: Friday, January 11, 2013 8:03 AM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] Digital collection backups

Concerns have been raised about how expensive Glacier gets if you need to 
recover a lot of files in a short time period.

http://www.wired.com/wiredenterprise/2012/08/glacier/

On 1/10/13 5:56 PM, Roy Tennant wrote:
 I'd also take a look at Amazon Glacier. Recently I parked about 50GB 
 of data files in logical tar'd and gzip'd chunks and it's costing my 
 employer less than 50 cents/month. Glacier, however, is best for "park 
 it and forget" kinds of needs, as the real cost is in data flow.
 Storage is cheap, but must be considered offline or near-line, as 
 you must first request to retrieve a file, wait for about a day, and 
 then retrieve the file. And you're charged more for the download 
 throughput than just about anything.
 
 I'm using a Unix client to handle all of the heavy lifting of 
 uploading and downloading, as Glacier is meant to be used via an API 
 rather than a web client.[1] If anyone is interested, I have local 
 documentation on usage that I could probably genericize. And yes, I 
 did round-trip a file to make sure it functioned as advertised.
 Roy
 
 [1] https://github.com/vsespb/mt-aws-glacier
 
 On Thu, Jan 10, 2013 at 2:29 PM,  ddwigg...@historicnewengland.org wrote:
 We built our own solution for this by creating a plugin that works with our 
 digital asset management system (ResourceSpace) to individually back up files 
 to Amazon S3. Because S3 is replicated to multiple data centers, this 
 provides a fairly high level of redundancy. And because it's an object-based 
 web service, we can access any given object individually by using a URL 
 related to the original storage URL within our system.

 This also allows us to take advantage of S3 for images on our website. All 
 of the images in our online collections database are being served 
 straight from S3, which diverts the load from our public web server. When we 
 launch zoomable images later this year, all of the tiles will also be 
 generated locally in the DAM and then served to the public via the mirrored 
 copy in S3.

 The current pricing is around $0.08/GB/month for 1-50 TB, which I think is 
 fairly reasonable for what we're getting. They just dropped the price 
 substantially a few months ago.

 DuraCloud http://www.duracloud.org/ supposedly offers a way to add another 
 abstraction layer so you can build something like this that is portable 
 between different cloud storage providers. But I haven't really looked into 
 this as of yet.


--
Gary McGath, Professional Software Developer http://www.garymcgath.com


Re: [CODE4LIB] Digital collection backups

2013-01-11 Thread Matt Schultz
Hi Josh,

Glad you are looking into LOCKSS as a potential solution for your needs and
that you are thinking beyond simple backup solutions for more long-term
preservation. Here at MetaArchive Cooperative we make use of LOCKSS to
preserve a range of content/collections from our member institutions.

The nice thing (I think) about our approach and our use of LOCKSS as an
embedded technology is that you as an institution retain full control over
your collections in the preservation network and get to play an active and
on-going part in their preservation treatment over time. Storage costs in
MetaArchive are competitive ($1/GB/year), and with that you get up to 7
geographic replications. MetaArchive is international at this point and so
your collections really do achieve some safe distance from any disasters
that may hit close to home.

I'd be more than happy to talk with you further about your collection
needs, why we like LOCKSS, and any interest your institution may have in
being part of a collaborative approach to preserving your content above and
beyond simple backup. Feel free to contact me directly.

Matt Schultz
Program Manager
Educopia Institute, MetaArchive Cooperative
http://www.metaarchive.org
matt.schu...@metaarchive.org
616-566-3204

On Thu, Jan 10, 2013 at 5:20 PM, Joshua Welker jwel...@sbuniv.edu wrote:

 Hi everyone,

 We are starting a digitization project for some of our special
 collections, and we are having a hard time setting up a backup system that
 meets the long-term preservation needs of digital archives. The backup
 mechanisms currently used by campus IT are short-term full-server backups.
 What we are looking for is more granular, file-level backup over the very
 long term. Does anyone have any recommendations of software or some service
 or technique? We are looking into LOCKSS but haven't dug too deeply yet.
 Can anyone who uses LOCKSS tell me a bit of their experiences with it?

 Josh Welker
 Electronic/Media Services Librarian
 College Liaison
 University Libraries
 Southwest Baptist University
 417.328.1624




-- 
Matt Schultz
Program Manager
Educopia Institute, MetaArchive Cooperative
http://www.metaarchive.org
matt.schu...@metaarchive.org
616-566-3204


Re: [CODE4LIB] Digital collection backups

2013-01-11 Thread Al Matthews
We use LOCKSS as part of MetaArchive. LOCKSS as I understand it is
typically spec-d for consumer hardware, and so, presumably as a result of
SE Asia flooding, there have been some drive failures and cache downtimes
and adjustments accordingly.

However, that is the worst of it, first.

LOCKSS is, to some perhaps even considerable degree, tamper-resistant, since
it relies on mechanisms of collective polling among multiple copies to
preserve integrity, as opposed to static checksums or some other solution.

As such, it seems to me important to run a LOCKSS box with other LOCKSS
boxes; the MA cooperative specifies six or so distributed locations for each
cache.

The economic sustainability of such an enterprise is a valid question.
David S H Rosenthal at Stanford seems to lead the charge for this research.

e.g. http://blog.dshr.org/2012/08/amazons-announcement-of-glacier.html#more

I've heard mention from other players that they watch MA carefully for
such sustainability considerations, especially because MA uses LOCKSS for
non-journal content. In some sense this may extend LOCKSS beyond its
original design.

MetaArchive has in my opinion been extremely responsible in designating
succession scenarios and disaster recovery scenarios, going so far as to
fund, develop and test services for migration out of the system, into an
IRODS repository in the initial case.


Al Matthews
AUC Robert W. Woodruff Library

On 1/11/13 9:10 AM, Joshua Welker jwel...@sbuniv.edu wrote:

Good point. But since campus IT will be creating regular
disaster-recovery backups, the odds that we'd ever need to retrieve
more than a handful of files from Glacier at a time are pretty low.

Josh Welker


-Original Message-
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of
Gary McGath
Sent: Friday, January 11, 2013 8:03 AM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] Digital collection backups

Concerns have been raised about how expensive Glacier gets if you need to
recover a lot of files in a short time period.

http://www.wired.com/wiredenterprise/2012/08/glacier/

On 1/10/13 5:56 PM, Roy Tennant wrote:
 I'd also take a look at Amazon Glacier. Recently I parked about 50GB
 of data files in logical tar'd and gzip'd chunks and it's costing my
 employer less than 50 cents/month. Glacier, however, is best for "park
 it and forget" kinds of needs, as the real cost is in data flow.
 Storage is cheap, but must be considered offline or near-line, as
 you must first request to retrieve a file, wait for about a day, and
 then retrieve the file. And you're charged more for the download
 throughput than just about anything.

 I'm using a Unix client to handle all of the heavy lifting of
 uploading and downloading, as Glacier is meant to be used via an API
 rather than a web client.[1] If anyone is interested, I have local
 documentation on usage that I could probably genericize. And yes, I
 did round-trip a file to make sure it functioned as advertised.
 Roy

 [1] https://github.com/vsespb/mt-aws-glacier

 On Thu, Jan 10, 2013 at 2:29 PM,  ddwigg...@historicnewengland.org
wrote:
 We built our own solution for this by creating a plugin that works
with our digital asset management system (ResourceSpace) to individually
back up files to Amazon S3. Because S3 is replicated to multiple data
centers, this provides a fairly high level of redundancy. And because
it's an object-based web service, we can access any given object
individually by using a URL related to the original storage URL within
our system.

 This also allows us to take advantage of S3 for images on our website.
All of the images in our online collections database are being
served straight from S3, which diverts the load from our public web
server. When we launch zoomable images later this year, all of the
tiles will also be generated locally in the DAM and then served to the
public via the mirrored copy in S3.

 The current pricing is around $0.08/GB/month for 1-50 TB, which I
think is fairly reasonable for what we're getting. They just dropped
the price substantially a few months ago.

 DuraCloud http://www.duracloud.org/ supposedly offers a way to add
another abstraction layer so you can build something like this that is
portable between different cloud storage providers. But I haven't
really looked into this as of yet.


--
Gary McGath, Professional Software Developer http://www.garymcgath.com




Re: [CODE4LIB] Digital collection backups

2013-01-11 Thread Joshua Welker
Thanks, Al. I think we'd join a LOCKSS network rather than run multiple LOCKSS 
boxes ourselves. Does anyone have any experience with one of those, like the 
LOCKSS Global Alliance?

Josh Welker


-Original Message-
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Al 
Matthews
Sent: Friday, January 11, 2013 8:50 AM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] Digital collection backups

We use LOCKSS as part of MetaArchive. LOCKSS as I understand it is typically 
spec-d for consumer hardware, and so, presumably as a result of SE Asia 
flooding, there have been some drive failures and cache downtimes and 
adjustments accordingly.

However, that is the worst of it, first.

LOCKSS is, to some perhaps even considerable degree, tamper-resistant, since it 
relies on mechanisms of collective polling among multiple copies to preserve 
integrity, as opposed to static checksums or some other solution.

As such, it seems to me important to run a LOCKSS box with other LOCKSS boxes; 
the MA cooperative specifies six or so distributed locations for each cache.

The economic sustainability of such an enterprise is a valid question.
David S H Rosenthal at Stanford seems to lead the charge for this research.

e.g. http://blog.dshr.org/2012/08/amazons-announcement-of-glacier.html#more

I've heard mention from other players that they watch MA carefully for such 
sustainability considerations, especially because MA uses LOCKSS for 
non-journal content. In some sense this may extend LOCKSS beyond its original 
design.

MetaArchive has in my opinion been extremely responsible in designating 
succession scenarios and disaster recovery scenarios, going so far as to fund, 
develop and test services for migration out of the system, into an IRODS 
repository in the initial case.


Al Matthews
AUC Robert W. Woodruff Library

On 1/11/13 9:10 AM, Joshua Welker jwel...@sbuniv.edu wrote:

Good point. But since campus IT will be creating regular 
disaster-recovery backups, the odds that we'd ever need to 
retrieve more than a handful of files from Glacier at a time are pretty low.

Josh Welker


-Original Message-
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of 
Gary McGath
Sent: Friday, January 11, 2013 8:03 AM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] Digital collection backups

Concerns have been raised about how expensive Glacier gets if you need 
to recover a lot of files in a short time period.

http://www.wired.com/wiredenterprise/2012/08/glacier/

On 1/10/13 5:56 PM, Roy Tennant wrote:
 I'd also take a look at Amazon Glacier. Recently I parked about 50GB 
 of data files in logical tar'd and gzip'd chunks and it's costing my 
 employer less than 50 cents/month. Glacier, however, is best for 
 park it and forget kinds of needs, as the real cost is in data flow.
 Storage is cheap, but must be considered offline or near line as 
 you must first request to retrieve a file, wait for about a day, and 
 then retrieve the file. And you're charged more for the download 
 throughput than just about anything.

 I'm using a Unix client to handle all of the heavy lifting of 
 uploading and downloading, as Glacier is meant to be used via an API 
 rather than a web client.[1] If anyone is interested, I have local 
 documentation on usage that I could probably genericize. And yes, I 
 did round-trip a file to make sure it functioned as advertised.
 Roy

 [1] https://github.com/vsespb/mt-aws-glacier

 On Thu, Jan 10, 2013 at 2:29 PM,  ddwigg...@historicnewengland.org
wrote:
 We built our own solution for this by creating a plugin that works 
with our digital asset management system (ResourceSpace) to 
individually back up files to Amazon S3. Because S3 is replicated to 
multiple data centers, this provides a fairly high level of 
redundancy. And because it's an object-based web service, we can 
access any given object individually by using a URL related to the 
original storage URL within our system.

 This also allows us to take advantage of S3 for images on our website.
All of the images in our online collections database are being 
served straight from S3, which diverts the load from our public web 
server. When we launch zoomable images later this year, all of the 
tiles will also be generated locally in the DAM and then served to 
the public via the mirrored copy in S3.

 The current pricing is around $0.08/GB/month for 1-50 TB, which I 
think is fairly reasonable for what we're getting. They just dropped 
the price substantially a few months ago.

 DuraCloud http://www.duracloud.org/ supposedly offers a way to add 
another abstraction layer so you can build something like this that 
is portable between different cloud storage providers. But I haven't 
really looked into this as of yet.


--
Gary McGath, Professional Software Developer http://www.garymcgath.com

Re: [CODE4LIB] Digital collection backups

2013-01-11 Thread Al Matthews
http://metaarchive.org/costs in our case. Interested to hear other
experiences. Al


On 1/11/13 10:01 AM, Joshua Welker jwel...@sbuniv.edu wrote:

Thanks, Al. I think we'd join a LOCKSS network rather than run multiple
LOCKSS boxes ourselves. Does anyone have any experience with one of
those, like the LOCKSS Global Alliance?

Josh Welker


-Original Message-
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of
Al Matthews
Sent: Friday, January 11, 2013 8:50 AM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] Digital collection backups

We use LOCKSS as part of MetaArchive. LOCKSS as I understand it is
typically spec-d for consumer hardware, and so, presumably as a result of
SE Asia flooding, there have been some drive failures and cache downtimes
and adjustments accordingly.

However, that is the worst of it, first.

LOCKSS is, to some perhaps even considerable degree, tamper-resistant,
since it relies on mechanisms of collective polling among multiple copies
to preserve integrity, as opposed to static checksums or some other solution.

As such, it seems to me important to run a LOCKSS box with other LOCKSS
boxes; the MA cooperative specifies six or so distributed locations for each
cache.

The economic sustainability of such an enterprise is a valid question.
David S H Rosenthal at Stanford seems to lead the charge for this
research.

e.g.
http://blog.dshr.org/2012/08/amazons-announcement-of-glacier.html#more

I've heard mention from other players that they watch MA carefully for
such sustainability considerations, especially because MA uses LOCKSS for
non-journal content. In some sense this may extend LOCKSS beyond its
original design.

MetaArchive has in my opinion been extremely responsible in designating
succession scenarios and disaster recovery scenarios, going so far as to
fund, develop and test services for migration out of the system, into an
IRODS repository in the initial case.


Al Matthews
AUC Robert W. Woodruff Library

On 1/11/13 9:10 AM, Joshua Welker jwel...@sbuniv.edu wrote:

Good point. But since campus IT will be creating regular
disaster-recovery backups, the odds that we'd ever need to
retrieve more than a handful of files from Glacier at a time are pretty
low.

Josh Welker


-Original Message-
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of
Gary McGath
Sent: Friday, January 11, 2013 8:03 AM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] Digital collection backups

Concerns have been raised about how expensive Glacier gets if you need
to recover a lot of files in a short time period.

http://www.wired.com/wiredenterprise/2012/08/glacier/

On 1/10/13 5:56 PM, Roy Tennant wrote:
 I'd also take a look at Amazon Glacier. Recently I parked about 50GB
 of data files in logical tar'd and gzip'd chunks and it's costing my
 employer less than 50 cents/month. Glacier, however, is best for
 "park it and forget" kinds of needs, as the real cost is in data flow.
 Storage is cheap, but must be considered offline or near-line, as
 you must first request to retrieve a file, wait for about a day, and
 then retrieve the file. And you're charged more for the download
 throughput than just about anything.

 I'm using a Unix client to handle all of the heavy lifting of
 uploading and downloading, as Glacier is meant to be used via an API
 rather than a web client.[1] If anyone is interested, I have local
 documentation on usage that I could probably genericize. And yes, I
 did round-trip a file to make sure it functioned as advertised.
 Roy

 [1] https://github.com/vsespb/mt-aws-glacier

 On Thu, Jan 10, 2013 at 2:29 PM,  ddwigg...@historicnewengland.org
wrote:
 We built our own solution for this by creating a plugin that works
with our digital asset management system (ResourceSpace) to
individually back up files to Amazon S3. Because S3 is replicated to
multiple data centers, this provides a fairly high level of
redundancy. And because it's an object-based web service, we can
access any given object individually by using a URL related to the
original storage URL within our system.

 This also allows us to take advantage of S3 for images on our website.
All of the images in our online collections database are being
served straight from S3, which diverts the load from our public web
server. When we launch zoomable images later this year, all of the
tiles will also be generated locally in the DAM and then served to
the public via the mirrored copy in S3.

 The current pricing is around $0.08/GB/month for 1-50 TB, which I
think is fairly reasonable for what we're getting. They just dropped
the price substantially a few months ago.

 DuraCloud http://www.duracloud.org/ supposedly offers a way to add
another abstraction layer so you can build something like this that
is portable between different cloud storage providers. But I haven't
really looked into this as of yet.


--
Gary McGath, Professional Software Developer http://www.garymcgath.com

Re: [CODE4LIB] Digital collection backups

2013-01-11 Thread James Gilbert
Hi Josh,

I lurked on this thread, as I did not know the size of your institution.

Being a public library serving about 24,000 residents - we have the
small-institution issues as well for this type of project. We recently
tackled a similar situation and the solution:

1) Purchase a 3TB SeaGate external network storage device (residential drive
from Best Buy)
2) Burn archived materials to DVD
3) Copy files to external storage (on site in my server room)
4) DVDs reside off-site (we are still determining where this would be, as
the library does not have a Safe Deposit Box)

This removes external companies, and the data is a quick trip home and back.

I know it is not elaborate and fancy, and there is very little code... but it was $150
for the drive, plus the cost of DVDs. 
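
A sketch of step 3 as a small script, assuming a Unix-like host and hypothetical mount
points; hashing both sides turns the copy into a self-verifying one:

import hashlib
import shutil
from pathlib import Path

SRC = Path("/data/archive")          # hypothetical local archive
DST = Path("/mnt/seagate/archive")   # hypothetical external drive mount

def md5(path):
    h = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

for src in SRC.rglob("*"):
    if src.is_file():
        dst = DST / src.relative_to(SRC)
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)
        if md5(src) != md5(dst):
            print("copy verification failed:", src)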

James Gilbert, BS, MLIS
Systems Librarian
Whitehall Township Public Library
3700 Mechanicsville Road
Whitehall, PA 18052
 
610-432-4330 ext: 203


-Original Message-
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of
Joshua Welker
Sent: Friday, January 11, 2013 10:09 AM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] Digital collection backups

Matt,

I appreciate the information. At that price, it looks like MetaArchive would
be a better option than most of the other services mentioned in this thread.
At this point, I think it is going to come down to a LOCKSS solution such as
what MetaArchive provides or Amazon Glacier. We anticipate our digital
collection growing to about 3TB in the first two years. With Glacier, that
would be $368 per year vs $3,072 per year for MetaArchive and LOCKSS. As
much as I would like to support library initiatives like LOCKSS, we are a
small institution with a very small budget, and the pricing of Glacier is
starting to look too good to pass up.
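
For anyone checking those figures, they follow from Glacier storage at roughly
$0.01/GB/month at the time (an assumption, but consistent with the $368 number) and the
$1/GB/year MetaArchive rate quoted earlier, ignoring upload and retrieval fees:

gigabytes = 3 * 1024                  # ~3 TB expressed in GB
glacier = gigabytes * 0.01 * 12       # ~$0.01/GB/month for Glacier storage
metaarchive = gigabytes * 1.00        # $1/GB/year for MetaArchive
print(glacier, metaarchive)           # -> 368.64 vs 3072.0 per year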

Josh Welker


-Original Message-
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Matt
Schultz
Sent: Friday, January 11, 2013 8:49 AM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] Digital collection backups

Hi Josh,

Glad you are looking into LOCKSS as a potential solution for your needs and
that you are thinking beyond simple backup solutions for more long-term
preservation. Here at MetaArchive Cooperative we make use of LOCKSS to
preserve a range of content/collections from our member institutions.

The nice thing (I think) about our approach and our use of LOCKSS as an
embedded technology is that you as an institution retain full control over
your collections in the preservation network and get to play an active and
on-going part in their preservation treatment over time. Storage costs in
MetaArchive are competitive ($1/GB/year), and with that you get up to 7
geographic replications. MetaArchive is international at this point and so
your collections really do achieve some safe distance from any disasters
that may hit close to home.

I'd be more than happy to talk with you further about your collection needs,
why we like LOCKSS, and any interest your institution may have in being part
of a collaborative approach to preserving your content above and beyond
simple backup. Feel free to contact me directly.

Matt Schultz
Program Manager
Educopia Institute, MetaArchive Cooperative http://www.metaarchive.org
matt.schu...@metaarchive.org
616-566-3204

On Thu, Jan 10, 2013 at 5:20 PM, Joshua Welker jwel...@sbuniv.edu wrote:

 Hi everyone,

 We are starting a digitization project for some of our special 
 collections, and we are having a hard time setting up a backup system 
 that meets the long-term preservation needs of digital archives. The 
 backup mechanisms currently used by campus IT are short-term full-server
backups.
 What we are looking for is more granular, file-level backup over the 
 very long term. Does anyone have any recommendations of software or 
 some service or technique? We are looking into LOCKSS but haven't dug too
deeply yet.
 Can anyone who uses LOCKSS tell me a bit of their experiences with it?

 Josh Welker
 Electronic/Media Services Librarian
 College Liaison
 University Libraries
 Southwest Baptist University
 417.328.1624




--
Matt Schultz
Program Manager
Educopia Institute, MetaArchive Cooperative http://www.metaarchive.org
matt.schu...@metaarchive.org
616-566-3204


Re: [CODE4LIB] Digital collection backups

2013-01-11 Thread Joshua Welker
James,

Definitely a simple and elegant solution, but that is not a viable long-term 
option for us. We currently have tons of old CDs and DVDs full of data, and one 
of our goals is to wean off those media completely.  Most consumer-grade CDs 
and DVDs are very poor in terms of long-term data integrity. Those discs have a 
shelf life of probably a decade or two, tops. Plus, we want more 
redundancy than what is offered by having the backups as a collection of discs 
in a single physical location. But if that works for you guys, power to you. 
Cheap is good.

Josh Welker


-Original Message-
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of James 
Gilbert
Sent: Friday, January 11, 2013 9:34 AM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] Digital collection backups

Hi Josh,

I lurked on this thread, as I did not know the size of your institution.

Being a public library serving about 24,000 residents - we have the 
small-institution issues as well for this type of project. We recently tackled 
a similar situation and the solution:

1) Purchase a 3TB SeaGate external network storage device (residential drive 
from Best Buy)
2) Burn archived materials to DVD
3) Copy files to external storage (on site in my server room)
4) DVDs reside off-site (we are still determining where this would be, as the 
library does not have a Safe Deposit Box)

This removes external companies, and the data is a quick trip home and back.

I know it is not elaborate and fancy, and there is very little code... but it was $150 for 
the drive, plus the cost of DVDs. 

James Gilbert, BS, MLIS
Systems Librarian
Whitehall Township Public Library
3700 Mechanicsville Road
Whitehall, PA 18052
 
610-432-4330 ext: 203


-Original Message-
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Joshua 
Welker
Sent: Friday, January 11, 2013 10:09 AM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] Digital collection backups

Matt,

I appreciate the information. At that price, it looks like MetaArchive would be 
a better option than most of the other services mentioned in this thread.
At this point, I think it is going to come down to a LOCKSS solution such as 
what MetaArchive provides or Amazon Glacier. We anticipate our digital 
collection growing to about 3TB in the first two years. With Glacier, that 
would be $368 per year vs $3,072 per year for MetaArchive and LOCKSS. As much 
as I would like to support library initiatives like LOCKSS, we are a small 
institution with a very small budget, and the pricing of Glacier is starting to 
look too good to pass up.

Josh Welker


-Original Message-
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Matt 
Schultz
Sent: Friday, January 11, 2013 8:49 AM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] Digital collection backups

Hi Josh,

Glad you are looking into LOCKSS as a potential solution for your needs and 
that you are thinking beyond simple backup solutions for more long-term 
preservation. Here at MetaArchive Cooperative we make use of LOCKSS to preserve 
a range of content/collections from our member institutions.

The nice thing (I think) about our approach and our use of LOCKSS as an 
embedded technology is that you as an institution retain full control over your 
collections in the preservation network and get to play an active and on-going 
part in their preservation treatment over time. Storage costs in MetaArchive 
are competitive ($1/GB/year), and with that you get up to 7 geographic 
replications. MetaArchive is international at this point and so your 
collections really do achieve some safe distance from any disasters that may 
hit close to home.

I'd be more than happy to talk with you further about your collection needs, 
why we like LOCKSS, and any interest your institution may have in being part of 
a collaborative approach to preserving your content above and beyond simple 
backup. Feel free to contact me directly.

Matt Schultz
Program Manager
Educopia Institute, MetaArchive Cooperative http://www.metaarchive.org 
matt.schu...@metaarchive.org
616-566-3204

On Thu, Jan 10, 2013 at 5:20 PM, Joshua Welker jwel...@sbuniv.edu wrote:

 Hi everyone,

 We are starting a digitization project for some of our special 
 collections, and we are having a hard time setting up a backup system 
 that meets the long-term preservation needs of digital archives. The 
 backup mechanisms currently used by campus IT are short-term 
 full-server
backups.
 What we are looking for is more granular, file-level backup over the 
 very long term. Does anyone have any recommendations of software or 
 some service or technique? We are looking into LOCKSS but haven't dug 
 too
deeply yet.
 Can anyone who uses LOCKSS tell me a bit of their experiences with it?

 Josh Welker
 Electronic/Media Services Librarian
 College Liaison
 University Libraries
 Southwest Baptist University
 417.328.1624




--
Matt Schultz
Program Manager

Re: [CODE4LIB] Digital collection backups

2013-01-11 Thread Matt Schultz
Josh,

Totally understand the resource constraints and the price comparison
up-front. As Roy alluded to earlier, it pays with Glacier to envision what
your content retrieval scenarios might be, because that $368 up-front could
very easily balloon in situations where you are needing to restore a
collection(s) en-masse at a later date. Amazon Glacier as a service makes
their money on that end. In MetaArchive there is currently no charge for
collection retrieval for the sake of a restoration. You are also subject
to, and powerless over, Amazon's price hikes with Glacier in the long term.
Because we are a Cooperative, our members collaboratively work together
annually to determine technology preferences, vendors, pricing, cost
control, etc. You have a direct seat at the table to help steer the
solution in your direction.

On Fri, Jan 11, 2013 at 10:09 AM, Joshua Welker jwel...@sbuniv.edu wrote:

 Matt,

 I appreciate the information. At that price, it looks like MetaArchive
 would be a better option than most of the other services mentioned in this
 thread. At this point, I think it is going to come down to a LOCKSS
 solution such as what MetaArchive provides or Amazon Glacier. We anticipate
 our digital collection growing to about 3TB in the first two years. With
 Glacier, that would be $368 per year vs $3,072 per year for MetaArchive and
 LOCKSS. As much as I would like to support library initiatives like LOCKSS,
 we are a small institution with a very small budget, and the pricing of
 Glacier is starting to look too good to pass up.

 Josh Welker


 -Original Message-
 From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of
 Matt Schultz
 Sent: Friday, January 11, 2013 8:49 AM
 To: CODE4LIB@LISTSERV.ND.EDU
 Subject: Re: [CODE4LIB] Digital collection backups

 Hi Josh,

 Glad you are looking into LOCKSS as a potential solution for your needs
 and that you are thinking beyond simple backup solutions for more long-term
 preservation. Here at MetaArchive Cooperative we make use of LOCKSS to
 preserve a range of content/collections from our member institutions.

 The nice thing (I think) about our approach and our use of LOCKSS as an
 embedded technology is that you as an institution retain full control over
 your collections in the preservation network and get to play an active and
 on-going part in their preservation treatment over time. Storage costs in
 MetaArchive are competitive ($1/GB/year), and with that you get up to 7
 geographic replications. MetaArchive is international at this point and so
 your collections really do achieve some safe distance from any disasters
 that may hit close to home.

 I'd be more than happy to talk with you further about your collection
 needs, why we like LOCKSS, and any interest your institution may have in
 being part of a collaborative approach to preserving your content above and
 beyond simple backup. Feel free to contact me directly.

 Matt Schultz
 Program Manager
 Educopia Institute, MetaArchive Cooperative http://www.metaarchive.org
 matt.schu...@metaarchive.org
 616-566-3204

 On Thu, Jan 10, 2013 at 5:20 PM, Joshua Welker jwel...@sbuniv.edu wrote:

  Hi everyone,
 
  We are starting a digitization project for some of our special
  collections, and we are having a hard time setting up a backup system
  that meets the long-term preservation needs of digital archives. The
  backup mechanisms currently used by campus IT are short-term full-server
 backups.
  What we are looking for is more granular, file-level backup over the
  very long term. Does anyone have any recommendations of software or
  some service or technique? We are looking into LOCKSS but haven't dug
 too deeply yet.
  Can anyone who uses LOCKSS tell me a bit of their experiences with it?
 
  Josh Welker
  Electronic/Media Services Librarian
  College Liaison
  University Libraries
  Southwest Baptist University
  417.328.1624
 



 --
 Matt Schultz
 Program Manager
 Educopia Institute, MetaArchive Cooperative http://www.metaarchive.org
 matt.schu...@metaarchive.org
 616-566-3204




-- 
Matt Schultz
Program Manager
Educopia Institute, MetaArchive Cooperative
http://www.metaarchive.org
matt.schu...@metaarchive.org
616-566-3204


Re: [CODE4LIB] Digital collection backups

2013-01-11 Thread Cary Gordon
Restoring 3 TB from Glacier is $370. Add about $90 if you use AWS
Import/Export (you provide the device).

Hopefully, this is not something that you would do often.

Cary

On Fri, Jan 11, 2013 at 8:14 AM, Matt Schultz
matt.schu...@metaarchive.org wrote:
 Josh,

 Totally understand the resource constraints and the price comparison
 up-front. As Roy alluded to earlier, it pays with Glacier to envision what
 your content retrieval scenarios might be, because that $368 up-front could
 very easily balloon in situations where you are needing to restore a
 collection(s) en-masse at a later date. Amazon Glacier as a service makes
 their money on that end. In MetaArchive there is currently no charge for
 collection retrieval for the sake of a restoration. You are also subject
 to, and powerless over, Amazon's price hikes with Glacier in the long term.
 Because we are a Cooperative, our members collaboratively work together
 annually to determine technology preferences, vendors, pricing, cost
 control, etc. You have a direct seat at the table to help steer the
 solution in your direction.

 On Fri, Jan 11, 2013 at 10:09 AM, Joshua Welker jwel...@sbuniv.edu wrote:

 Matt,

 I appreciate the information. At that price, it looks like MetaArchive
 would be a better option than most of the other services mentioned in this
 thread. At this point, I think it is going to come down to a LOCKSS
 solution such as what MetaArchive provides or Amazon Glacier. We anticipate
 our digital collection growing to about 3TB in the first two years. With
 Glacier, that would be $368 per year vs $3,072 per year for MetaArchive and
 LOCKSS. As much as I would like to support library initiatives like LOCKSS,
 we are a small institution with a very small budget, and the pricing of
 Glacier is starting to look too good to pass up.

 Josh Welker


 -Original Message-
 From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of
 Matt Schultz
 Sent: Friday, January 11, 2013 8:49 AM
 To: CODE4LIB@LISTSERV.ND.EDU
 Subject: Re: [CODE4LIB] Digital collection backups

 Hi Josh,

 Glad you are looking into LOCKSS as a potential solution for your needs
 and that you are thinking beyond simple backup solutions for more long-term
 preservation. Here at MetaArchive Cooperative we make use of LOCKSS to
 preserve a range of content/collections from our member institutions.

 The nice thing (I think) about our approach and our use of LOCKSS as an
 embedded technology is that you as an institution retain full control over
 your collections in the preservation network and get to play an active and
 on-going part in their preservation treatment over time. Storage costs in
 MetaArchive are competitive ($1/GB/year), and with that you get up to 7
 geographic replications. MetaArchive is international at this point and so
 your collections really do achieve some safe distance from any disasters
 that may hit close to home.

 I'd be more than happy to talk with you further about your collection
 needs, why we like LOCKSS, and any interest your institution may have in
 being part of a collaborative approach to preserving your content above and
 beyond simple backup. Feel free to contact me directly.

 Matt Schultz
 Program Manager
 Educopia Institute, MetaArchive Cooperative http://www.metaarchive.org
 matt.schu...@metaarchive.org
 616-566-3204

 On Thu, Jan 10, 2013 at 5:20 PM, Joshua Welker jwel...@sbuniv.edu wrote:

  Hi everyone,
 
  We are starting a digitization project for some of our special
  collections, and we are having a hard time setting up a backup system
  that meets the long-term preservation needs of digital archives. The
  backup mechanisms currently used by campus IT are short-term full-server
 backups.
  What we are looking for is more granular, file-level backup over the
  very long term. Does anyone have any recommendations of software or
  some service or technique? We are looking into LOCKSS but haven't dug
 too deeply yet.
  Can anyone who uses LOCKSS tell me a bit of their experiences with it?
 
  Josh Welker
  Electronic/Media Services Librarian
  College Liaison
  University Libraries
  Southwest Baptist University
  417.328.1624
 



 --
 Matt Schultz
 Program Manager
 Educopia Institute, MetaArchive Cooperative http://www.metaarchive.org
 matt.schu...@metaarchive.org
 616-566-3204




 --
 Matt Schultz
 Program Manager
 Educopia Institute, MetaArchive Cooperative
 http://www.metaarchive.org
 matt.schu...@metaarchive.org
 616-566-3204



-- 
Cary Gordon
The Cherry Hill Company
http://chillco.com


Re: [CODE4LIB] Digital collection backups

2013-01-11 Thread Aaron Trehub
Hello Josh,

Auburn University is a member of two Private LOCKSS Networks: the MetaArchive 
Cooperative and the Alabama Digital Preservation Network (ADPNet).  Here's a 
link to a recent conference paper that describes both networks, including their 
current pricing structures:

http://conference.ifla.org/past/ifla78/216-trehub-en.pdf

LOCKSS has worked well for us so far, in part because supporting 
community-based solutions is important to us.  As you point out, however, 
Glacier is an attractive alternative, especially for institutions that may be 
more interested in low-cost, low-throughput storage and less concerned about 
entrusting their content to a commercial outfit or having to pay extra to get 
it back out.  As with most things, you pay your money--more or less, 
depending--and make your choice.  And take your risks.

Good luck with whatever solution(s) you decide on.  They need not be mutually 
exclusive.

Best,

Aaron

Aaron Trehub
Assistant Dean for Technology and Technical Services
Auburn University Libraries
231 Mell Street, RBD Library
Auburn, AL 36849-5606
Phone: (334) 844-1716
Skype: ajtrehub
E-mail: treh...@auburn.edu
URL: http://lib.auburn.edu/

-Original Message-
From: Code for Libraries [mailto:CODE4LIB@listserv.nd.edu] On Behalf Of Joshua 
Welker
Sent: Friday, January 11, 2013 9:09 AM
To: CODE4LIB@listserv.nd.edu
Subject: Re: [CODE4LIB] Digital collection backups

Matt,

I appreciate the information. At that price, it looks like MetaArchive would be 
a better option than most of the other services mentioned in this thread. At 
this point, I think it is going to come down to a LOCKSS solution such as what 
MetaArchive provides or Amazon Glacier. We anticipate our digital collection 
growing to about 3TB in the first two years. With Glacier, that would be $368 
per year vs $3,072 per year for MetaArchive and LOCKSS. As much as I would like 
to support library initiatives like LOCKSS, we are a small institution with a 
very small budget, and the pricing of Glacier is starting to look too good to 
pass up.

Josh Welker


-Original Message-
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Matt 
Schultz
Sent: Friday, January 11, 2013 8:49 AM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] Digital collection backups

Hi Josh,

Glad you are looking into LOCKSS as a potential solution for your needs and 
that you are thinking beyond simple backup solutions for more long-term 
preservation. Here at MetaArchive Cooperative we make use of LOCKSS to preserve 
a range of content/collections from our member institutions.

The nice thing (I think) about our approach and our use of LOCKSS as an 
embedded technology is that you as an institution retain full control over your 
collections in the preservation network and get to play an active and on-going 
part in their preservation treatment over time. Storage costs in MetaArchive 
are competitive ($1/GB/year), and with that you get up to 7 geographic 
replications. MetaArchive is international at this point and so your 
collections really do achieve some safe distance from any disasters that may 
hit close to home.

I'd be more than happy to talk with you further about your collection needs, 
why we like LOCKSS, and any interest your institution may have in being part of 
a collaborative approach to preserving your content above and beyond simple 
backup. Feel free to contact me directly.

Matt Schultz
Program Manager
Educopia Institute, MetaArchive Cooperative http://www.metaarchive.org 
matt.schu...@metaarchive.org
616-566-3204

On Thu, Jan 10, 2013 at 5:20 PM, Joshua Welker jwel...@sbuniv.edu wrote:

 Hi everyone,

 We are starting a digitization project for some of our special 
 collections, and we are having a hard time setting up a backup system 
 that meets the long-term preservation needs of digital archives. The 
 backup mechanisms currently used by campus IT are short-term full-server 
 backups.
 What we are looking for is more granular, file-level backup over the 
 very long term. Does anyone have any recommendations of software or 
 some service or technique? We are looking into LOCKSS but haven't dug too 
 deeply yet.
 Can anyone who uses LOCKSS tell me a bit of their experiences with it?

 Josh Welker
 Electronic/Media Services Librarian
 College Liaison
 University Libraries
 Southwest Baptist University
 417.328.1624


--
Matt Schultz
Program Manager
Educopia Institute, MetaArchive Cooperative http://www.metaarchive.org 
matt.schu...@metaarchive.org
616-566-3204


Re: [CODE4LIB] Digital collection backups

2013-01-11 Thread Joshua Welker
Thanks, I missed the part about DuraCloud as an abstraction layer. I might look 
into hosting an install of it on the primary server running the digitization 
platform.

Josh Welker


-Original Message-
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Tim 
Donohue
Sent: Friday, January 11, 2013 12:39 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] Digital collection backups

Hi all,

Just wanted to add some additional details about DuraCloud (mentioned earlier 
in this thread), in case it is of interest to anyone.

DuraCloud essentially provides an abstraction layer (as previously
mentioned) above several cloud storage providers.  DuraCloud also provides 
additional preservation services to help manage your content in the cloud (e.g. 
integrity checks, replication across several storage providers, migration 
between storage providers, various health/status reports).

The currently supported cloud storage providers include:
- Amazon S3
- Rackspace
- SDSC

There are several other cloud storage providers that are beta-level or in 
development. These include:
- Amazon Glacier (in development)
- Chronopolis (in development)
- Azure (beta)
- iRODS (beta)
- HP Cloud (beta)

DuraCloud is open source (so you could run it on your own server), but it is 
also offered as a hosted service (through DuraSpace, my employer). 
You can also try out the hosted service for free for two months.

For much more info, see:
- http://www.duracloud.org
- Pricing for hosted service: http://duracloud.org/content/pricing
* The pricing has dropped recently to reflect market changes
- More technical info / documentation: 
https://wiki.duraspace.org/display/DURACLOUD/DuraCloud
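
Not DuraCloud's actual API, but as a sketch of what such an abstraction layer means in
practice: calling code talks to one small interface, and the storage provider behind it
can be swapped or multiplied without touching the caller. The class and bucket names are
hypothetical:

from typing import Protocol

class StorageProvider(Protocol):
    def put(self, content_id: str, data: bytes) -> None: ...
    def get(self, content_id: str) -> bytes: ...

class S3Provider:
    def __init__(self, bucket: str):
        import boto3
        self.s3, self.bucket = boto3.client("s3"), bucket
    def put(self, content_id: str, data: bytes) -> None:
        self.s3.put_object(Bucket=self.bucket, Key=content_id, Body=data)
    def get(self, content_id: str) -> bytes:
        return self.s3.get_object(Bucket=self.bucket, Key=content_id)["Body"].read()

def replicate(content_id: str, data: bytes, providers) -> None:
    # The caller never needs to know which cloud is underneath.
    for p in providers:
        p.put(content_id, data)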

If it's of interest, I can put folks in touch with the DuraCloud team for more 
info (or you can email i...@duracloud.org).

- Tim

--
Tim Donohue
Technical Lead for DSpace Project
DuraSpace.org


Re: [CODE4LIB] Digital collection backups

2013-01-11 Thread Joshua Welker
The only scenario I can think of where we'd need to do a full restore is if the 
server crashes, and for those cases, we are going to have typical short-term 
imaging setups in place. Our needs beyond that are to make sure our original 
files are backed up redundantly in some non-volatile location so that in the 
event a file on the local server becomes corrupt, we have a high-fidelity copy 
of the original on hand to use to restore it. Since data decay, I assume, happens 
rather infrequently and over a long period of time, it's not important for us 
to be able to restore all the files at once. Like I said, if the server catches 
on fire and crashes, we have regular off-site tape-based storage to fix those 
short-term problems.

Josh Welker


-Original Message-
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Cary 
Gordon
Sent: Friday, January 11, 2013 10:27 AM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] Digital collection backups

Restoring 3 TB from Glacier is $370. Add about $90 if you use AWS Import/Export 
(you provide the device).

Hopefully, this is not something that you would do often.

Cary

On Fri, Jan 11, 2013 at 8:14 AM, Matt Schultz matt.schu...@metaarchive.org 
wrote:
 Josh,

 Totally understand the resource constraints and the price comparison 
 up-front. As Roy alluded to earlier, it pays with Glacier to envision 
 what your content retrieval scenarios might be, because that $368 
 up-front could very easily balloon in situations where you are needing 
 to restore a
 collection(s) en-masse at a later date. Amazon Glacier as a service 
 makes their money on that end. In MetaArchive there is currently no 
 charge for collection retrieval for the sake of a restoration. You are 
 also subject to, and powerless over, Amazon's price hikes with 
 Glacier in the long term.
 Because we are a Cooperative, our members collaboratively work 
 together annually to determine technology preferences, vendors, 
 pricing, cost control, etc. You have a direct seat at the table to 
 help steer the solution in your direction.

 On Fri, Jan 11, 2013 at 10:09 AM, Joshua Welker jwel...@sbuniv.edu wrote:

 Matt,

 I appreciate the information. At that price, it looks like 
 MetaArchive would be a better option than most of the other services 
 mentioned in this thread. At this point, I think it is going to come 
 down to a LOCKSS solution such as what MetaArchive provides or Amazon 
 Glacier. We anticipate our digital collection growing to about 3TB in 
 the first two years. With Glacier, that would be $368 per year vs 
 $3,072 per year for MetaArchive and LOCKSS. As much as I would like 
 to support library initiatives like LOCKSS, we are a small 
 institution with a very small budget, and the pricing of Glacier is starting 
 to look too good to pass up.
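 (Those figures appear to follow from straight per-GB arithmetic, assuming 
 Glacier's then-current $0.01/GB/month storage rate and the $1/GB/year 
 MetaArchive rate quoted below: 3 TB is roughly 3,072 GB, so 3,072 GB × 
 $0.01/GB/month × 12 months comes to about $368/year for Glacier, versus 
 3,072 GB × $1/GB/year = $3,072/year for MetaArchive.)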

 Josh Welker


 -Original Message-
 From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf 
 Of Matt Schultz
 Sent: Friday, January 11, 2013 8:49 AM
 To: CODE4LIB@LISTSERV.ND.EDU
 Subject: Re: [CODE4LIB] Digital collection backups

 Hi Josh,

 Glad you are looking into LOCKSS as a potential solution for your 
 needs and that you are thinking beyond simple backup solutions for 
 more long-term preservation. Here at MetaArchive Cooperative we make 
 use of LOCKSS to preserve a range of content/collections from our member 
 institutions.

 The nice thing (I think) about our approach and our use of LOCKSS as 
 an embedded technology is that you as an institution retain full 
 control over your collections in the preservation network and get to 
 play an active and on-going part in their preservation treatment over 
 time. Storage costs in MetaArchive are competitive ($1/GB/year), and 
 with that you get up to 7 geographic replications. MetaArchive is 
 international at this point and so your collections really do achieve 
 some safe distance from any disasters that may hit close to home.

 I'd be more than happy to talk with you further about your collection 
 needs, why we like LOCKSS, and any interest your institution may have 
 in being part of a collaborative approach to preserving your content 
 above and beyond simple backup. Feel free to contact me directly.

 Matt Schultz
 Program Manager
 Educopia Institute, MetaArchive Cooperative 
 http://www.metaarchive.org matt.schu...@metaarchive.org
 616-566-3204

 On Thu, Jan 10, 2013 at 5:20 PM, Joshua Welker jwel...@sbuniv.edu wrote:

  Hi everyone,
 
  We are starting a digitization project for some of our special 
  collections, and we are having a hard time setting up a backup 
  system that meets the long-term preservation needs of digital 
  archives. The backup mechanisms currently used by campus IT are 
  short-term full-server
 backups.
  What we are looking for is more granular, file-level backup over 
  the very long term. Does anyone have any recommendations of 
  software or some service or technique? We are looking into LOCKSS 
  but haven't dug

Re: [CODE4LIB] Digital collection backups

2013-01-11 Thread Joshua Welker
Thanks for bringing up the issue of the cost of making sure the data is 
consistent. We will be using DSpace for now, and I know DSpace has some 
checksum functionality built in out-of-the-box. It shouldn't be too difficult 
to write a script that loops through DSpace's checksum data and compares it 
against the files in Glacier. Reading the Glacier FAQ on Amazon's site, it 
looks like they provide an archive inventory (updated daily) that can be 
downloaded as JSON. I read some users saying that this inventory includes 
checksum data. So hopefully it will just be a matter of comparing the local 
checksum to the Glacier checksum, and that would be easy enough to script.
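One wrinkle worth noting: the checksum Glacier reports in its inventory is a 
SHA-256 tree hash (built from 1 MiB chunks), not a plain MD5 like the one 
DSpace stores, so the comparison script has to compute the same tree hash 
locally. A minimal sketch, assuming Amazon's documented inventory format and a 
hypothetical inventory.json download plus a mapping of archive descriptions to 
local master files:

    import hashlib
    import json

    CHUNK = 1024 * 1024  # Glacier tree hashes are built from 1 MiB chunks

    def sha256_tree_hash(path):
        """Compute the SHA-256 tree hash that Glacier reports for an archive."""
        hashes = []
        with open(path, 'rb') as f:
            for chunk in iter(lambda: f.read(CHUNK), b''):
                hashes.append(hashlib.sha256(chunk).digest())
        if not hashes:
            return hashlib.sha256(b'').hexdigest()
        # Combine adjacent pairs of hashes until a single root hash remains.
        while len(hashes) > 1:
            paired = []
            for i in range(0, len(hashes), 2):
                if i + 1 < len(hashes):
                    paired.append(hashlib.sha256(hashes[i] + hashes[i + 1]).digest())
                else:
                    paired.append(hashes[i])
            hashes = paired
        return hashes[0].hex()

    # inventory.json is the vault inventory downloaded from Glacier; local_files
    # maps the archive description used at upload time to the local master file.
    with open('inventory.json') as f:
        inventory = json.load(f)
    local_files = {'item-001.tif': '/data/masters/item-001.tif'}  # hypothetical

    for entry in inventory.get('ArchiveList', []):
        desc = entry.get('ArchiveDescription', '')
        if desc in local_files:
            ok = sha256_tree_hash(local_files[desc]) == entry['SHA256TreeHash']
            print(desc, 'OK' if ok else 'MISMATCH')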

Josh Welker


-Original Message-
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Ryan Eby
Sent: Friday, January 11, 2013 11:37 AM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] Digital collection backups

As Aaron alludes to, your decision should be based on your real needs, and the 
options might not be mutually exclusive.

LOCKSS/MetaArchive might be worth the money if it is the community archival 
aspect you are going for. Depending on your institution, being a participant 
might make political/mission sense regardless of the storage needs, and it 
could be that only a specific collection makes sense to include.

Glacier is a great choice if you are looking to spread a backup across 
regions. S3 is similar if you also want to benefit from CloudFront (the CDN 
setup) to take load off your institution's server (you can now use CloudFront 
in front of your own origin server as well). Depending on your bandwidth this 
might be worth the money regardless of LOCKSS participation (which can be more 
of a dark archive). Amazon also tends to drop prices over time rather than 
raise them, but as with any outsourcing you have to plan for the possibility 
that it won't exist in the future. Also look closely at Glacier prices in terms 
of checking your data for consistency. There have been a few papers on the 
costs of making sure Amazon really has the proper data, depending on how often 
your requirements want you to check.

Another option, if you are just looking for more geographic placement, is 
finding an institution or service provider that will colocate. There may be 
another small institution that would love to shove a cheap box with hard drives 
on your network in exchange for the same. It is not as involved/formal as 
LOCKSS, but it gives you something you control to satisfy your requirements. It 
could also be as low tech as shipping SSDs to another institution that then 
runs some BagIt checksums on the drive, etc.

All of the above should be scriptable in your workflow. Just need to decide 
what you really want out of it.
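For the BagIt step mentioned above, a minimal sketch using the bagit-python 
library (the paths and the bag-info value are hypothetical placeholders):

    import bagit

    # Sending side: turn the directory of master files on the drive into a bag,
    # which writes checksum manifests alongside the payload.
    bagit.make_bag('/mnt/ssd/collection-001',
                   {'Source-Organization': 'Example University Libraries'})

    # Receiving side: open the bag from the shipped drive and verify every file
    # against the manifests that travelled with it.
    bag = bagit.Bag('/mnt/ssd/collection-001')
    print('valid' if bag.is_valid() else 'FAILED validation')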

Eby


On Fri, Jan 11, 2013 at 11:52 AM, Aaron Trehub treh...@auburn.edu wrote:

 Hello Josh,

 Auburn University is a member of two Private LOCKSS Networks: the 
 MetaArchive Cooperative and the Alabama Digital Preservation Network 
 (ADPNet).  Here's a link to a recent conference paper that describes 
 both networks, including their current pricing structures:

 http://conference.ifla.org/past/ifla78/216-trehub-en.pdf

 LOCKSS has worked well for us so far, in part because supporting 
 community-based solutions is important to us.  As you point out, 
 however, Glacier is an attractive alternative, especially for 
 institutions that may be more interested in low-cost, low-throughput 
 storage and less concerned about entrusting their content to a 
 commercial outfit or having to pay extra to get it back out.  As with 
 most things, you pay your money--more or less, depending--and make your 
 choice.  And take your risks.

 Good luck with whatever solution(s) you decide on.  They need not be 
 mutually exclusive.

 Best,

 Aaron

 Aaron Trehub
 Assistant Dean for Technology and Technical Services Auburn University 
 Libraries
 231 Mell Street, RBD Library
 Auburn, AL 36849-5606
 Phone: (334) 844-1716
 Skype: ajtrehub
 E-mail: treh...@auburn.edu
 URL: http://lib.auburn.edu/




Re: [CODE4LIB] Digital collection backups

2013-01-11 Thread Thomas Kula
On Fri, Jan 11, 2013 at 07:45:21PM +, Joshua Welker wrote:
 Thanks for bringing up the issue of the cost of making sure the data is 
 consistent. We will be using DSpace for now, and I know DSpace has some 
 checksum functionality built in out-of-the-box. It shouldn't be too difficult 
 to write a script that loops through DSpace's checksum data and compares it 
 against the files in Glacier. Reading the Glacier FAQ on Amazon's site, it 
 looks like they provide an archive inventory (updated daily) that can be 
 downloaded as JSON. I read some users saying that this inventory includes 
 checksum data. So hopefully it will just be a matter of comparing the local 
 checksum to the Glacier checksum, and that would be easy enough to script.

An important question to ask here, though, is whether that included checksum
data is the same data Amazon uses to perform the systematic data
integrity checks they mention in the Glacier FAQ, or whether it's just
catalog data --- "here's the checksum when we put it in." This is always
the question we run into when we consider services like this: can we
tease enough information out to convince ourselves that their checking
is sufficient?

--
Thomas L. Kula | tlk2...@columbia.edu
Systems Engineer | Library Information Technology Office
The Libraries, Columbia University in the City of New York


Re: [CODE4LIB] Digital collection backups

2013-01-11 Thread Tim Donohue

Hi Josh,

Now that you bring up DSpace as being part of the equation...

You might want to look at the newly released Replication Task Suite 
plugin/addon for DSpace (supports DSpace versions 1.8.x and 3.0):


https://wiki.duraspace.org/display/DSPACE/ReplicationTaskSuite

This DSpace plugin does essentially what you are talking about...

It allows you to back up (i.e. replicate) DSpace content files and 
metadata (in the form of a set of AIPs, Archival Information Packages) 
to a local filesystem/drive or to cloud storage.  Plus it provides an 
auditing tool to audit changes between DSpace and the cloud storage 
provider.  Currently, the only cloud storage plugin we have created for the 
Replication Task Suite is for DuraCloud. But it wouldn't be too 
hard to create a new plugin for Glacier (if you wanted to send DSpace 
content directly to Glacier without DuraCloud in between).


The code is in GitHub at:
https://github.com/DSpace/dspace-replicate

If you decide to use it and create anything cool, feel free to send us a 
pull request.


Good luck,

- Tim

--
Tim Donohue
Technical Lead for DSpace Project
DuraSpace.org

On 1/11/2013 1:45 PM, Joshua Welker wrote:

Thanks for bringing up the issue of the cost of making sure the data is 
consistent. We will be using DSpace for now, and I know DSpace has some 
checksum functionality built in out-of-the-box. It shouldn't be too difficult 
to write a script that loops through DSpace's checksum data and compares it 
against the files in Glacier. Reading the Glacier FAQ on Amazon's site, it 
looks like they provide an archive inventory (updated daily) that can be 
downloaded as JSON. I read some users saying that this inventory includes 
checksum data. So hopefully it will just be a matter of comparing the local 
checksum to the Glacier checksum, and that would be easy enough to script.

Josh Welker


-Original Message-
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Ryan Eby
Sent: Friday, January 11, 2013 11:37 AM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] Digital collection backups

As Aaron alludes to, your decision should be based on your real needs, and the 
options might not be mutually exclusive.

LOCKSS/MetaArchive might be worth the money if it is the community archival 
aspect you are going for. Depending on your institution being a participant 
might make political/mission sense regardless of the storage needs and it could 
just be a specific collection that makes sense.

Glacier is a great choice if you are looking for spreading a backup across 
regions. S3 similarly if you also want to benefit from CloudFront (the CDN
setup) to take load off your institutions server (you can now use cloudfront 
off your own origin server as well). Depending on your bandwidth this might be 
worth the money regardless of LOCKSS participation (which can be more dark). 
Amazon also tends to be dropping prices over time vs raising but as any 
outsource you have to plan that it might not exist in the future. Also look 
more at Glacier prices in terms of checking your data for consistency. There 
have been a few papers on the costs of making sure Amazon really has the proper 
data depending on how often your requirements want you to check.

Another option if you are just looking for more geo placement is finding an 
institution or service provider that will colocate. There may be another small 
institution that would love to shove a cheap box with hard drives on your 
network in exchange for the same. Not as involved/formal as LOCKSS but gives 
you something you control to satisfy your requirements. It could also be as low 
tech as shipping SSDs to another institution who then runs some bagit checksums 
on the drive, etc.

All of the above should be scriptable in your workflow. Just need to decide 
what you really want out of it.

Eby


On Fri, Jan 11, 2013 at 11:52 AM, Aaron Trehub treh...@auburn.edu wrote:


Hello Josh,

Auburn University is a member of two Private LOCKSS Networks: the
MetaArchive Cooperative and the Alabama Digital Preservation Network
(ADPNet).  Here's a link to a recent conference paper that describes
both networks, including their current pricing structures:

http://conference.ifla.org/past/ifla78/216-trehub-en.pdf

LOCKSS has worked well for us so far, in part because supporting
community-based solutions is important to us.  As you point out,
however, Glacier is an attractive alternative, especially for
institutions that may be more interested in low-cost, low-throughput
storage and less concerned about entrusting their content to a
commercial outfit or having to pay extra to get it back out.  As with
most things, you pay your money--more or less, depending--and make your choice. 
 And take your risks.

Good luck with whatever solution(s) you decide on.  They need not be
mutually exclusive.

Best,

Aaron

Aaron Trehub
Assistant Dean for Technology and Technical Services Auburn University
Libraries
231 Mell Street, RBD Library

Re: [CODE4LIB] Digital collection backups

2013-01-11 Thread Joshua Welker
Awesome! Thanks. I will look into this for sure.

Josh Welker


-Original Message-
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Tim 
Donohue
Sent: Friday, January 11, 2013 2:30 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] Digital collection backups

Hi Josh,

Now that you bring up DSpace as being part of the equation...

You might want to look at the newly released Replication Task Suite 
plugin/addon for DSpace (supports DSpace versions 1.8.x and 3.0):

https://wiki.duraspace.org/display/DSPACE/ReplicationTaskSuite

This DSpace plugin does essentially what you are talking about...

It allows you to back up (i.e. replicate) DSpace content files and metadata (in 
the form of a set of AIPs, Archival Information Packages) to a local 
filesystem/drive or to cloud storage.  Plus it provides an auditing tool to 
audit changes between DSpace and the cloud storage provider.  Currently, the 
only cloud storage plugin we have created for the Replication Task Suite is 
for DuraCloud. But it wouldn't be too hard to create a new plugin for Glacier 
(if you wanted to send DSpace content directly to Glacier without DuraCloud in 
between).

The code is in GitHub at:
https://github.com/DSpace/dspace-replicate

If you decide to use it and create anything cool, feel free to send us a pull 
request.

Good luck,

- Tim

--
Tim Donohue
Technical Lead for DSpace Project
DuraSpace.org

On 1/11/2013 1:45 PM, Joshua Welker wrote:
 Thanks for bringing up the issue of the cost of making sure the data is 
 consistent. We will be using DSpace for now, and I know DSpace has some 
 checksum functionality built in out-of-the-box. It shouldn't be too difficult 
 to write a script that loops through DSpace's checksum data and compares it 
 against the files in Glacier. Reading the Glacier FAQ on Amazon's site, it 
 looks like they provide an archive inventory (updated daily) that can be 
 downloaded as JSON. I read some users saying that this inventory includes 
 checksum data. So hopefully it will just be a matter of comparing the local 
 checksum to the Glacier checksum, and that would be easy enough to script.

 Josh Welker


 -Original Message-
 From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf 
 Of Ryan Eby
 Sent: Friday, January 11, 2013 11:37 AM
 To: CODE4LIB@LISTSERV.ND.EDU
 Subject: Re: [CODE4LIB] Digital collection backups

 As Aaron alludes to, your decision should be based on your real needs, and the 
 options might not be mutually exclusive.

 LOCKSS/MetaArchive might be worth the money if it is the community archival 
 aspect you are going for. Depending on your institution being a participant 
 might make political/mission sense regardless of the storage needs and it 
 could just be a specific collection that makes sense.

 Glacier is a great choice if you are looking for spreading a backup 
 across regions. S3 similarly if you also want to benefit from 
 CloudFront (the CDN
 setup) to take load off your institutions server (you can now use cloudfront 
 off your own origin server as well). Depending on your bandwidth this might 
 be worth the money regardless of LOCKSS participation (which can be more 
 dark). Amazon also tends to be dropping prices over time vs raising but as 
 any outsource you have to plan that it might not exist in the future. Also 
 look more at Glacier prices in terms of checking your data for consistency. 
 There have been a few papers on the costs of making sure Amazon really has 
 the proper data depending on how often your requirements want you to check.

 Another option if you are just looking for more geo placement is finding an 
 institution or service provider that will colocate. There may be another 
 small institution that would love to shove a cheap box with hard drives on 
 your network in exchange for the same. Not as involved/formal as LOCKSS but 
 gives you something you control to satisfy your requirements. It could also 
 be as low tech as shipping SSDs to another institution who then runs some 
 bagit checksums on the drive, etc.

 All of the above should be scriptable in your workflow. Just need to decide 
 what you really want out of it.

 Eby


 On Fri, Jan 11, 2013 at 11:52 AM, Aaron Trehub treh...@auburn.edu wrote:

 Hello Josh,

 Auburn University is a member of two Private LOCKSS Networks: the 
 MetaArchive Cooperative and the Alabama Digital Preservation Network 
 (ADPNet).  Here's a link to a recent conference paper that describes 
 both networks, including their current pricing structures:

 http://conference.ifla.org/past/ifla78/216-trehub-en.pdf

 LOCKSS has worked well for us so far, in part because supporting 
 community-based solutions is important to us.  As you point out, 
 however, Glacier is an attractive alternative, especially for 
 institutions that may be more interested in low-cost, low-throughput 
 storage and less concerned about entrusting their content to a 
 commercial outfit or having to pay extra to get

Re: [CODE4LIB] Digital collection backups

2013-01-11 Thread ddwiggins
Be careful about assuming too much on this.
 
When I started working with S3, the system required an MD5 sum to upload, and 
would respond to requests with this "etag" in the header as well. I therefore 
assumed that this was integral to the system and was a good way to compare 
local files against the remote copies.
 
Then, maybe a year or two ago, Amazon introduced chunked uploads, so that you 
could send files in pieces and reassemble them once they got to S3. This was 
good, because it eliminated problems with huge files failing to upload due to 
network hiccups. I went ahead and implemented it in my scripts. Then, all of a 
sudden, I started getting invalid checksums. It turns out that for multipart 
file uploads, they now create etag identifiers that are not the MD5 sum of the 
underlying files. 
 
I now store the checksum as a separate piece of header metadata. And my sync 
script does periodically compare against this. But since this is just metadata, 
checking it doesn't really prove anything about the underlying file that Amazon 
has. To do this I would need to write a script that would actually retrieve the 
file and rerun the checksum. I have not done this yet, although it is on my 
to-do list at some point. This would ideally happen on an Amazon server so that 
I wouldn't have to send the file back and forth.
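A minimal sketch of that pattern, using boto3 (the AWS SDK for Python) with 
hypothetical bucket, key, and path names; the checksum travels as user metadata 
on upload, and real verification later means pulling the object back and 
recomputing the digest rather than trusting the stored value or the ETag:

    import hashlib
    import boto3

    s3 = boto3.client('s3')
    BUCKET = 'example-preservation-bucket'   # hypothetical
    KEY = 'masters/item-001.tif'             # hypothetical
    PATH = '/data/masters/item-001.tif'      # hypothetical

    def md5_hex(path):
        h = hashlib.md5()
        with open(path, 'rb') as f:
            for chunk in iter(lambda: f.read(1 << 20), b''):
                h.update(chunk)
        return h.hexdigest()

    # Upload, recording our own MD5 as user metadata (x-amz-meta-md5). For
    # multipart uploads the ETag is not the file's MD5, so it can't be trusted.
    s3.upload_file(PATH, BUCKET, KEY,
                   ExtraArgs={'Metadata': {'md5': md5_hex(PATH)}})

    # Later: fetch the object itself and recompute the checksum. Comparing only
    # the stored metadata proves nothing about the bytes Amazon is holding.
    body = s3.get_object(Bucket=BUCKET, Key=KEY)['Body']
    h = hashlib.md5()
    for chunk in body.iter_chunks(1 << 20):
        h.update(chunk)
    stored = s3.head_object(Bucket=BUCKET, Key=KEY)['Metadata'].get('md5')
    print('match' if h.hexdigest() == stored else 'MISMATCH')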
 
In any case, my main point is: don't assume that you can just check against a 
checksum from the API to verify a file for digital preservation purposes.
 
-David
 
 
 
 
 
__
 
David Dwiggins
Systems Librarian/Archivist, Historic New England
141 Cambridge Street, Boston, MA 02114
(617) 994-5948
ddwigg...@historicnewengland.org
http://www.historicnewengland.org
 Joshua Welker jwel...@sbuniv.edu 1/11/2013 2:45 PM 
Thanks for bringing up the issue of the cost of making sure the data is 
consistent. We will be using DSpace for now, and I know DSpace has some 
checksum functionality built in out-of-the-box. It shouldn't be too difficult 
to write a script that loops through DSpace's checksum data and compares it 
against the files in Glacier. Reading the Glacier FAQ on Amazon's site, it 
looks like they provide an archive inventory (updated daily) that can be 
downloaded as JSON. I read some users saying that this inventory includes 
checksum data. So hopefully it will just be a matter of comparing the local 
checksum to the Glacier checksum, and that would be easy enough to script.

Josh Welker


-Original Message-
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Ryan Eby
Sent: Friday, January 11, 2013 11:37 AM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] Digital collection backups

As Aaron alludes to, your decision should be based on your real needs, and the 
options might not be mutually exclusive.

LOCKSS/MetaArchive might be worth the money if it is the community archival 
aspect you are going for. Depending on your institution being a participant 
might make political/mission sense regardless of the storage needs and it could 
just be a specific collection that makes sense.

Glacier is a great choice if you are looking for spreading a backup across 
regions. S3 similarly if you also want to benefit from CloudFront (the CDN
setup) to take load off your institutions server (you can now use cloudfront 
off your own origin server as well). Depending on your bandwidth this might be 
worth the money regardless of LOCKSS participation (which can be more dark). 
Amazon also tends to be dropping prices over time vs raising but as any 
outsource you have to plan that it might not exist in the future. Also look 
more at Glacier prices in terms of checking your data for consistency. There 
have been a few papers on the costs of making sure Amazon really has the proper 
data depending on how often your requirements want you to check.

Another option if you are just looking for more geo placement is finding an 
institution or service provider that will colocate. There may be another small 
institution that would love to shove a cheap box with hard drives on your 
network in exchange for the same. Not as involved/formal as LOCKSS but gives 
you something you control to satisfy your requirements. It could also be as low 
tech as shipping SSDs to another institution who then runs some bagit checksums 
on the drive, etc.

All of the above should be scriptable in your workflow. Just need to decide 
what you really want out of it.

Eby


On Fri, Jan 11, 2013 at 11:52 AM, Aaron Trehub treh...@auburn.edu wrote:

 Hello Josh,

 Auburn University is a member of two Private LOCKSS Networks: the 
 MetaArchive Cooperative and the Alabama Digital Preservation Network 
 (ADPNet).  Here's a link to a recent conference paper that describes 
 both networks, including their current pricing structures:

 http://conference.ifla.org/past/ifla78/216-trehub-en.pdf

 LOCKSS has worked well for us so far, in part because supporting 
 community-based solutions is important to us.  As you point out

Re: [CODE4LIB] Digital collection backups

2013-01-11 Thread Randy Fischer
On Fri, Jan 11, 2013 at 2:45 PM, Joshua Welker jwel...@sbuniv.edu wrote:

 Reading the Glacier FAQ on Amazon's site, it looks like they provide an
 archive inventory (updated daily) that can be downloaded as JSON. I read
 some users saying that this inventory includes checksum data. So hopefully
 it will just be a matter of comparing the local checksum to the Glacier
 checksum, and that would be easy enough to script.



One could also occasionally spin up local EC2 instances to do the checksums
in the same data center, and ship just that metadata down - you would not
incur any bulk transfer costs in that case (if memory serves). DAITSS
uses both MD5 and SHA-1 checksums in combination; other preservation systems
might require similar.
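For the local side of that, a minimal sketch that reads each file once and 
returns both digests (the path is an illustrative placeholder):

    import hashlib

    def md5_and_sha1(path, chunk_size=1 << 20):
        """Read the file once and return both hex digests."""
        md5, sha1 = hashlib.md5(), hashlib.sha1()
        with open(path, 'rb') as f:
            for chunk in iter(lambda: f.read(chunk_size), b''):
                md5.update(chunk)
                sha1.update(chunk)
        return md5.hexdigest(), sha1.hexdigest()

    print(md5_and_sha1('/data/masters/item-001.tif'))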

-Randy Fischer


[CODE4LIB] Digital collection backups

2013-01-10 Thread Joshua Welker
Hi everyone,

We are starting a digitization project for some of our special collections, and 
we are having a hard time setting up a backup system that meets the long-term 
preservation needs of digital archives. The backup mechanisms currently used by 
campus IT are short-term full-server backups. What we are looking for is more 
granular, file-level backup over the very long term. Does anyone have any 
recommendations of software or some service or technique? We are looking into 
LOCKSS but haven't dug too deeply yet. Can anyone who uses LOCKSS tell me a bit 
of their experiences with it?

Josh Welker
Electronic/Media Services Librarian
College Liaison
University Libraries
Southwest Baptist University
417.328.1624


Re: [CODE4LIB] Digital collection backups

2013-01-10 Thread Roy Tennant
I'd also take a look at Amazon Glacier. Recently I parked about 50GB
of data files in logical tar'd and gzip'd chunks and it's costing my
employer less than 50 cents/month. Glacier, however, is best for "park
it and forget" kinds of needs, as the real cost is in data flow.
Storage is cheap, but must be considered "offline" or "near line," as
you must first request to retrieve a file, wait for about a day, and
then retrieve the file. And you're charged more for the download
throughput than just about anything.

I'm using a Unix client to handle all of the heavy lifting of
uploading and downloading, as Glacier is meant to be used via an API
rather than a web client.[1] If anyone is interested, I have local
documentation on usage that I could probably genericize. And yes, I
did round-trip a file to make sure it functioned as advertised.
Roy

[1] https://github.com/vsespb/mt-aws-glacier
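If you'd rather script the upload yourself than use that client, a minimal 
sketch using boto3 (the AWS SDK for Python); the vault name and file name are 
hypothetical, and the returned archiveId is what you must keep in order to ask 
for the archive back later:

    import boto3

    glacier = boto3.client('glacier')

    # Upload one tar'd and gzip'd chunk to a vault. The '-' account ID means
    # "the account that owns these credentials." Glacier has no browsable
    # listing, so record the archiveId that comes back.
    with open('collection-001.tar.gz', 'rb') as f:
        response = glacier.upload_archive(
            accountId='-',
            vaultName='digital-collections-backup',
            archiveDescription='collection-001.tar.gz',
            body=f)

    print(response['archiveId'])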

On Thu, Jan 10, 2013 at 2:29 PM,  ddwigg...@historicnewengland.org wrote:
 We built our own solution for this by creating a plugin that works with our 
 digital asset management system (ResourceSpace) to individually back up files 
 to Amazon S3. Because S3 is replicated to multiple data centers, this 
 provides a fairly high level of redundancy. And because it's an object-based 
 web service, we can access any given object individually by using a URL 
 related to the original storage URL within our system.

 This also allows us to take advantage of S3 for images on our website. All of 
 the images from our online collections database are being served straight 
 from S3, which diverts the load from our public web server. When we launch 
 zoomable images later this year, all of the tiles will also be generated 
 locally in the DAM and then served to the public via the mirrored copy in S3.

 The current pricing is around $0.08/GB/month for 1-50 TB, which I think is 
 fairly reasonable for what we're getting. They just dropped the price 
 substantially a few months ago.

 DuraCloud http://www.duracloud.org/ supposedly offers a way to add another 
 abstraction layer so you can build something like this that is portable 
 between different cloud storage providers. But I haven't really looked into 
 this as of yet.

 -David


 __

 David Dwiggins
 Systems Librarian/Archivist, Historic New England
 141 Cambridge Street, Boston, MA 02114
 (617) 994-5948
 ddwigg...@historicnewengland.org
 http://www.historicnewengland.org
 Joshua Welker jwel...@sbuniv.edu 1/10/2013 5:20 PM 
 Hi everyone,

 We are starting a digitization project for some of our special collections, 
 and we are having a hard time setting up a backup system that meets the 
 long-term preservation needs of digital archives. The backup mechanisms 
 currently used by campus IT are short-term full-server backups. What we are 
 looking for is more granular, file-level backup over the very long term. Does 
 anyone have any recommendations of software or some service or technique? We 
 are looking into LOCKSS but haven't dug too deeply yet. Can anyone who uses 
 LOCKSS tell me a bit of their experiences with it?

 Josh Welker
 Electronic/Media Services Librarian
 College Liaison
 University Libraries
 Southwest Baptist University
 417.328.1624


Re: [CODE4LIB] Digital collection backups

2013-01-10 Thread Fleming, Declan
Hi - you might look into Chronopolis (which can be front ended by DuraCloud or 
not)  http://chronopolis.sdsc.edu/

Declan

-Original Message-
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Roy 
Tennant
Sent: Thursday, January 10, 2013 2:56 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] Digital collection backups

I'd also take a look at Amazon Glacier. Recently I parked about 50GB of data 
files in logical tar'd and gzip'd chunks and it's costing my employer less than 
50 cents/month. Glacier, however, is best for "park it and forget" kinds of 
needs, as the real cost is in data flow.
Storage is cheap, but must be considered "offline" or "near line," as you must 
first request to retrieve a file, wait for about a day, and then retrieve the 
file. And you're charged more for the download throughput than just about 
anything.

I'm using a Unix client to handle all of the heavy lifting of uploading and 
downloading, as Glacier is meant to be used via an API rather than a web 
client.[1] If anyone is interested, I have local documentation on usage that I 
could probably genericize. And yes, I did round-trip a file to make sure it 
functioned as advertised.
Roy

[1] https://github.com/vsespb/mt-aws-glacier

On Thu, Jan 10, 2013 at 2:29 PM,  ddwigg...@historicnewengland.org wrote:
 We built our own solution for this by creating a plugin that works with our 
 digital asset management system (ResourceSpace) to individually back up files 
 to Amazon S3. Because S3 is replicated to multiple data centers, this 
 provides a fairly high level of redundancy. And because it's an object-based 
 web service, we can access any given object individually by using a URL 
 related to the original storage URL within our system.

 This also allows us to take advantage of S3 for images on our website. All of 
 the images from our online collections database are being served straight 
 from S3, which diverts the load from our public web server. When we launch 
 zoomable images later this year, all of the tiles will also be generated 
 locally in the DAM and then served to the public via the mirrored copy in S3.

 The current pricing is around $0.08/GB/month for 1-50 TB, which I think is 
 fairly reasonable for what we're getting. They just dropped the price 
 substantially a few months ago.

 DuraCloud http://www.duracloud.org/ supposedly offers a way to add another 
 abstraction layer so you can build something like this that is portable 
 between different cloud storage providers. But I haven't really looked into 
 this as of yet.

 -David


 __

 David Dwiggins
 Systems Librarian/Archivist, Historic New England
 141 Cambridge Street, Boston, MA 02114
 (617) 994-5948
 ddwigg...@historicnewengland.org
 http://www.historicnewengland.org
 Joshua Welker jwel...@sbuniv.edu 1/10/2013 5:20 PM 
 Hi everyone,

 We are starting a digitization project for some of our special collections, 
 and we are having a hard time setting up a backup system that meets the 
 long-term preservation needs of digital archives. The backup mechanisms 
 currently used by campus IT are short-term full-server backups. What we are 
 looking for is more granular, file-level backup over the very long term. Does 
 anyone have any recommendations of software or some service or technique? We 
 are looking into LOCKSS but haven't dug too deeply yet. Can anyone who uses 
 LOCKSS tell me a bit of their experiences with it?

 Josh Welker
 Electronic/Media Services Librarian
 College Liaison
 University Libraries
 Southwest Baptist University
 417.328.1624


Re: [CODE4LIB] Digital collection backups

2013-01-10 Thread Chris Cormack
Obnam http://liw.fi/obnam/ might do what you need with the minimum of fuss

Chris

On 11 January 2013 12:05, Fleming, Declan dflem...@ucsd.edu wrote:
 Hi - you might look into Chronopolis (which can be front ended by DuraCloud 
 or not)  http://chronopolis.sdsc.edu/

 Declan

 -Original Message-
 From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Roy 
 Tennant
 Sent: Thursday, January 10, 2013 2:56 PM
 To: CODE4LIB@LISTSERV.ND.EDU
 Subject: Re: [CODE4LIB] Digital collection backups

 I'd also take a look at Amazon Glacier. Recently I parked about 50GB of data 
 files in logical tar'd and gzip'd chunks and it's costing my employer less 
 than 50 cents/month. Glacier, however, is best for "park it and forget" kinds 
 of needs, as the real cost is in data flow.
 Storage is cheap, but must be considered "offline" or "near line," as you must 
 first request to retrieve a file, wait for about a day, and then retrieve the 
 file. And you're charged more for the download throughput than just about 
 anything.

 I'm using a Unix client to handle all of the heavy lifting of uploading and 
 downloading, as Glacier is meant to be used via an API rather than a web 
 client.[1] If anyone is interested, I have local documentation on usage that 
 I could probably genericize. And yes, I did round-trip a file to make sure it 
 functioned as advertised.
 Roy

 [1] https://github.com/vsespb/mt-aws-glacier

 On Thu, Jan 10, 2013 at 2:29 PM,  ddwigg...@historicnewengland.org wrote:
 We built our own solution for this by creating a plugin that works with our 
 digital asset management system (ResourceSpace) to individually back up files 
 to Amazon S3. Because S3 is replicated to multiple data centers, this 
 provides a fairly high level of redundancy. And because it's an object-based 
 web service, we can access any given object individually by using a URL 
 related to the original storage URL within our system.

 This also allows us to take advantage of S3 for images on our website. All 
 of the images from our online collections database are being served 
 straight from S3, which diverts the load from our public web server. When we 
 launch zoomable images later this year, all of the tiles will also be 
 generated locally in the DAM and then served to the public via the mirrored 
 copy in S3.

 The current pricing is around $0.08/GB/month for 1-50 TB, which I think is 
 fairly reasonable for what we're getting. They just dropped the price 
 substantially a few months ago.

 DuraCloud http://www.duracloud.org/ supposedly offers a way to add another 
 abstraction layer so you can build something like this that is portable 
 between different cloud storage providers. But I haven't really looked into 
 this as of yet.

 -David


 __

 David Dwiggins
 Systems Librarian/Archivist, Historic New England
 141 Cambridge Street, Boston, MA 02114
 (617) 994-5948
 ddwigg...@historicnewengland.org
 http://www.historicnewengland.org
 Joshua Welker jwel...@sbuniv.edu 1/10/2013 5:20 PM 
 Hi everyone,

 We are starting a digitization project for some of our special collections, 
 and we are having a hard time setting up a backup system that meets the 
 long-term preservation needs of digital archives. The backup mechanisms 
 currently used by campus IT are short-term full-server backups. What we are 
 looking for is more granular, file-level backup over the very long term. 
 Does anyone have any recommendations of software or some service or 
 technique? We are looking into LOCKSS but haven't dug too deeply yet. Can 
 anyone who uses LOCKSS tell me a bit of their experiences with it?

 Josh Welker
 Electronic/Media Services Librarian
 College Liaison
 University Libraries
 Southwest Baptist University
 417.328.1624