Re: [CODE4LIB] Digital collection backups

2013-01-11 Thread Joshua Welker
I think we have a winning idea here. Thanks.

Josh Welker


-Original Message-
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Randy 
Fischer
Sent: Friday, January 11, 2013 3:46 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] Digital collection backups

On Fri, Jan 11, 2013 at 2:45 PM, Joshua Welker  wrote:

> Reading the Glacier FAQ on Amazon's site, it looks like they provide 
> an archive inventory (updated daily) that can be downloaded as JSON. I 
> read some users saying that this inventory includes checksum data. So 
> hopefully it will just be a matter of comparing the local checksum to 
> the Glacier checksum, and that would be easy enough to script.
>
>

One could also occasionally spin up EC2 instances to do the checksums in the 
same data center and ship just that metadata down; you would not incur any bulk 
transfer costs in that case (if memory serves). DAITSS uses both md5 and sha1 
checksums in combination; other preservation systems might require similar.

-Randy Fischer


Re: [CODE4LIB] Digital collection backups

2013-01-11 Thread Randy Fischer
On Fri, Jan 11, 2013 at 2:45 PM, Joshua Welker  wrote:

> Reading the Glacier FAQ on Amazon's site, it looks like they provide an
> archive inventory (updated daily) that can be downloaded as JSON. I read
> some users saying that this inventory includes checksum data. So hopefully
> it will just be a matter of comparing the local checksum to the Glacier
> checksum, and that would be easy enough to script.
>
>

One could also occasionally spin up EC2 instances to do the checksums in the
same data center and ship just that metadata down; you would not incur any bulk
transfer costs in that case (if memory serves). DAITSS uses both md5 and sha1
checksums in combination; other preservation systems might require similar.
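
A minimal sketch of that combination on the local side (plain Python, standard
library only; the path argument is illustrative):

import hashlib

def md5_and_sha1(path, chunk_size=1024 * 1024):
    # Compute both digests in a single pass over the file; the same code
    # could run on an EC2 instance in the region so that only the hex
    # digests need to travel back over the network.
    md5, sha1 = hashlib.md5(), hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            md5.update(chunk)
            sha1.update(chunk)
    return md5.hexdigest(), sha1.hexdigest()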

-Randy Fischer


Re: [CODE4LIB] Digital collection backups

2013-01-11 Thread ddwiggins
Be careful about assuming too much on this.
 
When I started working with S3, the system required an MD5 sum to upload, and 
would respond to requests with this "etag" in the header as well. I therefore 
assumed that this was integral to the system, and was a good way to compare 
local files against the remote copies.
 
Then, maybe a year or two ago, Amazon introduced chunked uploads, so that you 
could send files in pieces and reassemble them once they got to S3. This was 
good, because it eliminated problems with huge files failing to upload due to 
network hiccups. I went ahead and implemented it in my scripts. Then, all of a 
sudden I started getting invalid checksums. Turns out that for multipart file 
uploads, they now create etag identifiers that are not the md5 sum of the 
underlying files. 
 
I now store the checksum as a separate piece of header metadata. And my sync 
script does periodically compare against this. But since this is just metadata, 
checking it doesn't really prove anything about the underlying file that Amazon 
has. To actually verify the file, I would need to write a script that retrieves 
it and reruns the checksum. I have not done this yet, although it is on my 
to-do list at some point. This would ideally happen on an Amazon server so that 
I wouldn't have to send the file back and forth.
 
In any case, my main point is: don't assume that you can just check against a 
checksum from the API to verify a file for digital preservation purposes.
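 
For what it's worth, a rough sketch of that two-part approach (store your own 
checksum as object metadata at upload time, and separately re-download and 
re-hash when you want real assurance about the stored bytes). This assumes the 
boto3 AWS SDK for Python; bucket and key names are illustrative:

import hashlib
import boto3

s3 = boto3.client("s3")

def file_md5(path):
    md5 = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            md5.update(chunk)
    return md5.hexdigest()

def upload_with_checksum(path, bucket, key):
    # Record our own checksum as user metadata; the ETag is not a reliable
    # MD5 once multipart uploads are involved.
    digest = file_md5(path)
    s3.upload_file(path, bucket, key, ExtraArgs={"Metadata": {"md5": digest}})
    return digest

def verify_by_download(bucket, key, expected_md5):
    # The only check that says anything about the stored bytes: pull the
    # object back down and re-hash it (cheapest from an instance in the
    # same region).
    body = s3.get_object(Bucket=bucket, Key=key)["Body"]
    md5 = hashlib.md5()
    for chunk in body.iter_chunks(1024 * 1024):
        md5.update(chunk)
    return md5.hexdigest() == expected_md5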
 
-David
 
 
 
 
 
__
 
David Dwiggins
Systems Librarian/Archivist, Historic New England
141 Cambridge Street, Boston, MA 02114
(617) 994-5948
ddwigg...@historicnewengland.org
http://www.historicnewengland.org
>>> Joshua Welker  1/11/2013 2:45 PM >>>
Thanks for bringing up the issue of the cost of making sure the data is 
consistent. We will be using DSpace for now, and I know DSpace has some 
checksum functionality built in out-of-the-box. It shouldn't be too difficult 
to write a script that loops through DSpace's checksum data and compares it 
against the files in Glacier. Reading the Glacier FAQ on Amazon's site, it 
looks like they provide an archive inventory (updated daily) that can be 
downloaded as JSON. I read some users saying that this inventory includes 
checksum data. So hopefully it will just be a matter of comparing the local 
checksum to the Glacier checksum, and that would be easy enough to script.

Josh Welker


-Original Message-
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Ryan Eby
Sent: Friday, January 11, 2013 11:37 AM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] Digital collection backups

As Aaron alludes to, your decision should be based on your real needs, and the 
options might not be mutually exclusive.

LOCKSS/MetaArchive might be worth the money if it is the community archival 
aspect you are going for. Depending on your institution, being a participant 
might make political/mission sense regardless of the storage needs, and it could 
be just a specific collection that makes sense.

Glacier is a great choice if you are looking to spread a backup across 
regions. S3 is similar if you also want to benefit from CloudFront (the CDN 
setup) to take load off your institution's server (you can now use CloudFront 
off your own origin server as well). Depending on your bandwidth, this might be 
worth the money regardless of LOCKSS participation (which can be more of a dark 
archive). Amazon also tends to drop prices over time rather than raise them, but 
as with any outsourcing you have to plan for the possibility that the service 
might not exist in the future. Also look more closely at Glacier prices in terms 
of checking your data for consistency. There have been a few papers on the cost 
of making sure Amazon really has the proper data, depending on how often your 
requirements call for checking.

Another option, if you are just looking for more geographic placement, is 
finding an institution or service provider that will colocate. There may be 
another small institution that would love to shove a cheap box with hard drives 
onto your network in exchange for the same. It is not as involved/formal as 
LOCKSS, but it gives you something you control to satisfy your requirements. It 
could also be as low tech as shipping SSDs to another institution that then runs 
some BagIt checksums on the drive, etc.

All of the above should be scriptable in your workflow. You just need to decide 
what you really want out of it.

Eby


On Fri, Jan 11, 2013 at 11:52 AM, Aaron Trehub  wrote:

> Hello Josh,
>
> Auburn University is a member of two Private LOCKSS Networks: the 
> MetaArchive Cooperative and the Alabama Digital Preservation Network 
> (ADPNet).  Here's a link to a recent conference paper that describes 
> both networks, including their current pricing structures:
>
> http://conference.ifla.org/past/ifla78/216-trehub-en.pdf
>
> LOCKSS has worked well for us so far, in part because supporti

Re: [CODE4LIB] Digital collection backups

2013-01-11 Thread Joshua Welker
Awesome! Thanks. I will look into this for sure.

Josh Welker


-Original Message-
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Tim 
Donohue
Sent: Friday, January 11, 2013 2:30 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] Digital collection backups

Hi Josh,

Now that you bring up DSpace as being part of the equation...

You might want to look at the newly released "Replication Task Suite" 
plugin/addon for DSpace (supports DSpace versions 1.8.x & 3.0):

https://wiki.duraspace.org/display/DSPACE/ReplicationTaskSuite

This DSpace plugin does essentially what you are talking about...

It allows you to backup (i.e. replicate) DSpace content files and metadata (in 
the form of a set of AIPs, Archival Information Packages) to a local 
filesystem/drive or to cloud storage.  Plus it provides an "auditing" tool to 
audit changes between DSpace and the cloud storage provider.  Currently, for 
the Replication Task Suite, the only cloud storage plugin we have created is 
for DuraCloud. But, it wouldn't be too hard to create a new plugin for Glacier 
(if you wanted to send DSpace content directly to Glacier without DuraCloud in 
between).

The code is in GitHub at:
https://github.com/DSpace/dspace-replicate

If you decide to use it and create anything cool, feel free to send us a pull 
request.

Good luck,

- Tim

--
Tim Donohue
Technical Lead for DSpace Project
DuraSpace.org

On 1/11/2013 1:45 PM, Joshua Welker wrote:
> Thanks for bringing up the issue of the cost of making sure the data is 
> consistent. We will be using DSpace for now, and I know DSpace has some 
> checksum functionality built in out-of-the-box. It shouldn't be too difficult 
> to write a script that loops through DSpace's checksum data and compares it 
> against the files in Glacier. Reading the Glacier FAQ on Amazon's site, it 
> looks like they provide an archive inventory (updated daily) that can be 
> downloaded as JSON. I read some users saying that this inventory includes 
> checksum data. So hopefully it will just be a matter of comparing the local 
> checksum to the Glacier checksum, and that would be easy enough to script.
>
> Josh Welker
>
>
> -Original Message-
> From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf 
> Of Ryan Eby
> Sent: Friday, January 11, 2013 11:37 AM
> To: CODE4LIB@LISTSERV.ND.EDU
> Subject: Re: [CODE4LIB] Digital collection backups
>
> As Aaron alludes to your decision should base off your real needs and they 
> might not be exclusive.
>
> LOCKSS/MetaArchive might be worth the money if it is the community archival 
> aspect you are going for. Depending on your institution being a participant 
> might make political/mission sense regardless of the storage needs and it 
> could just be a specific collection that makes sense.
>
> Glacier is a great choice if you are looking for spreading a backup 
> across regions. S3 similarly if you also want to benefit from 
> CloudFront (the CDN
> setup) to take load off your institutions server (you can now use cloudfront 
> off your own origin server as well). Depending on your bandwidth this might 
> be worth the money regardless of LOCKSS participation (which can be more 
> dark). Amazon also tends to be dropping prices over time vs raising but as 
> any outsource you have to plan that it might not exist in the future. Also 
> look more at Glacier prices in terms of checking your data for consistency. 
> There have been a few papers on the costs of making sure Amazon really has 
> the proper data depending on how often your requirements want you to check.
>
> Another option if you are just looking for more geo placement is finding an 
> institution or service provider that will colocate. There may be another 
> small institution that would love to shove a cheap box with hard drives on 
> your network in exchange for the same. Not as involved/formal as LOCKSS but 
> gives you something you control to satisfy your requirements. It could also 
> be as low tech as shipping SSDs to another institution who then runs some 
> bagit checksums on the drive, etc.
>
> All of the above should be scriptable in your workflow. Just need to decide 
> what you really want out of it.
>
> Eby
>
>
> On Fri, Jan 11, 2013 at 11:52 AM, Aaron Trehub  wrote:
>
>> Hello Josh,
>>
>> Auburn University is a member of two Private LOCKSS Networks: the 
>> MetaArchive Cooperative and the Alabama Digital Preservation Network 
>> (ADPNet).  Here's a link to a recent conference paper that describes 
>> both networks, including their current pricing structures:
>>
>> http://conference.ifla.org/past/ifla78/216-trehub-en.pdf
>>
>> LOCKSS has worked well for us so

Re: [CODE4LIB] Digital collection backups

2013-01-11 Thread Tim Donohue

Hi Josh,

Now that you bring up DSpace as being part of the equation...

You might want to look at the newly released "Replication Task Suite" 
plugin/addon for DSpace (supports DSpace versions 1.8.x & 3.0):


https://wiki.duraspace.org/display/DSPACE/ReplicationTaskSuite

This DSpace plugin does essentially what you are talking about...

It allows you to backup (i.e. replicate) DSpace content files and 
metadata (in the form of a set of AIPs, Archival Information Packages) 
to a local filesystem/drive or to cloud storage.  Plus it provides an 
"auditing" tool to audit changes between DSpace and the cloud storage 
provider.  Currently, for the Replication Task Suite, the only cloud 
storage plugin we have created is for DuraCloud. But, it wouldn't be too 
hard to create a new plugin for Glacier (if you wanted to send DSpace 
content directly to Glacier without DuraCloud in between).


The code is in GitHub at:
https://github.com/DSpace/dspace-replicate

If you decide to use it and create anything cool, feel free to send us a 
pull request.


Good luck,

- Tim

--
Tim Donohue
Technical Lead for DSpace Project
DuraSpace.org

On 1/11/2013 1:45 PM, Joshua Welker wrote:

Thanks for bringing up the issue of the cost of making sure the data is 
consistent. We will be using DSpace for now, and I know DSpace has some 
checksum functionality built in out-of-the-box. It shouldn't be too difficult 
to write a script that loops through DSpace's checksum data and compares it 
against the files in Glacier. Reading the Glacier FAQ on Amazon's site, it 
looks like they provide an archive inventory (updated daily) that can be 
downloaded as JSON. I read some users saying that this inventory includes 
checksum data. So hopefully it will just be a matter of comparing the local 
checksum to the Glacier checksum, and that would be easy enough to script.

Josh Welker


-Original Message-
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Ryan Eby
Sent: Friday, January 11, 2013 11:37 AM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] Digital collection backups

As Aaron alludes to, your decision should be based on your real needs, and the 
options might not be mutually exclusive.

LOCKSS/MetaArchive might be worth the money if it is the community archival 
aspect you are going for. Depending on your institution, being a participant 
might make political/mission sense regardless of the storage needs, and it could 
be just a specific collection that makes sense.

Glacier is a great choice if you are looking to spread a backup across 
regions. S3 is similar if you also want to benefit from CloudFront (the CDN 
setup) to take load off your institution's server (you can now use CloudFront 
off your own origin server as well). Depending on your bandwidth, this might be 
worth the money regardless of LOCKSS participation (which can be more of a dark 
archive). Amazon also tends to drop prices over time rather than raise them, but 
as with any outsourcing you have to plan for the possibility that the service 
might not exist in the future. Also look more closely at Glacier prices in terms 
of checking your data for consistency. There have been a few papers on the cost 
of making sure Amazon really has the proper data, depending on how often your 
requirements call for checking.

Another option, if you are just looking for more geographic placement, is 
finding an institution or service provider that will colocate. There may be 
another small institution that would love to shove a cheap box with hard drives 
onto your network in exchange for the same. It is not as involved/formal as 
LOCKSS, but it gives you something you control to satisfy your requirements. It 
could also be as low tech as shipping SSDs to another institution that then runs 
some BagIt checksums on the drive, etc.

All of the above should be scriptable in your workflow. You just need to decide 
what you really want out of it.

Eby


On Fri, Jan 11, 2013 at 11:52 AM, Aaron Trehub  wrote:


Hello Josh,

Auburn University is a member of two Private LOCKSS Networks: the
MetaArchive Cooperative and the Alabama Digital Preservation Network
(ADPNet).  Here's a link to a recent conference paper that describes
both networks, including their current pricing structures:

http://conference.ifla.org/past/ifla78/216-trehub-en.pdf

LOCKSS has worked well for us so far, in part because supporting
community-based solutions is important to us.  As you point out,
however, Glacier is an attractive alternative, especially for
institutions that may be more interested in low-cost, low-throughput
storage and less concerned about entrusting their content to a
commercial outfit or having to pay extra to get it back out.  As with
most things, you pay your money--more or less, depending--and make your choice. 
 And take your risks.

Good luck with whatever solution(s) you decide on.  They need not be
mutually exclusive.

Best,

Aaron

Aaron Trehub
Assistant Dean for Technology and Technical Services Auburn University
Librar

Re: [CODE4LIB] Digital collection backups

2013-01-11 Thread Thomas Kula
On Fri, Jan 11, 2013 at 07:45:21PM +, Joshua Welker wrote:
> Thanks for bringing up the issue of the cost of making sure the data is 
> consistent. We will be using DSpace for now, and I know DSpace has some 
> checksum functionality built in out-of-the-box. It shouldn't be too difficult 
> to write a script that loops through DSpace's checksum data and compares it 
> against the files in Glacier. Reading the Glacier FAQ on Amazon's site, it 
> looks like they provide an archive inventory (updated daily) that can be 
> downloaded as JSON. I read some users saying that this inventory includes 
> checksum data. So hopefully it will just be a matter of comparing the local 
> checksum to the Glacier checksum, and that would be easy enough to script.

An important question to ask here, though, is whether that included checksum
data is the same data Amazon uses to perform the "systematic data
integrity checks" they mention in the Glacier FAQ, or whether it's just
catalog data --- "here's the checksum when we put it in." This is always
the question we run into when we consider services like this: can we
tease enough information out to convince ourselves that their checking
is sufficient?

--
Thomas L. Kula | tlk2...@columbia.edu
Systems Engineer | Library Information Technology Office
The Libraries, Columbia University in the City of New York


Re: [CODE4LIB] Digital collection backups

2013-01-11 Thread Joshua Welker
Thanks for bringing up the issue of the cost of making sure the data is 
consistent. We will be using DSpace for now, and I know DSpace has some 
checksum functionality built in out-of-the-box. It shouldn't be too difficult 
to write a script that loops through DSpace's checksum data and compares it 
against the files in Glacier. Reading the Glacier FAQ on Amazon's site, it 
looks like they provide an archive inventory (updated daily) that can be 
downloaded as JSON. I read some users saying that this inventory includes 
checksum data. So hopefully it will just be a matter of comparing the local 
checksum to the Glacier checksum, and that would be easy enough to script.
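
One caveat worth planning for before scripting this: the Glacier inventory's 
checksum field is a SHA-256 tree hash rather than an MD5, so a DSpace-style MD5 
can't be compared to it directly; the local side has to compute the same tree 
hash. A rough sketch (assuming the boto3 SDK; the vault name and the mapping 
from archive descriptions to local hashes are illustrative):

import hashlib
import json
import boto3

MB = 1024 * 1024

def sha256_tree_hash(path):
    # Glacier's published scheme: SHA-256 each 1 MiB chunk, then repeatedly
    # hash adjacent pairs of digests until one remains (an odd leftover is
    # carried up unchanged).
    with open(path, "rb") as f:
        level = [hashlib.sha256(chunk).digest()
                 for chunk in iter(lambda: f.read(MB), b"")]
    if not level:
        level = [hashlib.sha256(b"").digest()]
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level), 2):
            pair = level[i:i + 2]
            nxt.append(hashlib.sha256(pair[0] + pair[1]).digest()
                       if len(pair) == 2 else pair[0])
        level = nxt
    return level[0].hex()

glacier = boto3.client("glacier")

def request_inventory(vault):
    # Inventory retrieval is an asynchronous job that typically takes hours.
    job = glacier.initiate_job(vaultName=vault,
                               jobParameters={"Type": "inventory-retrieval"})
    return job["jobId"]

def compare_inventory(vault, job_id, local_hashes):
    # local_hashes: {archive description: expected tree hash}
    out = glacier.get_job_output(vaultName=vault, jobId=job_id)
    inventory = json.loads(out["body"].read())
    for entry in inventory["ArchiveList"]:
        expected = local_hashes.get(entry.get("ArchiveDescription"))
        if expected and expected != entry["SHA256TreeHash"]:
            print("MISMATCH:", entry["ArchiveDescription"])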

Josh Welker


-Original Message-
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Ryan Eby
Sent: Friday, January 11, 2013 11:37 AM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] Digital collection backups

As Aaron alludes to, your decision should be based on your real needs, and the 
options might not be mutually exclusive.

LOCKSS/MetaArchive might be worth the money if it is the community archival 
aspect you are going for. Depending on your institution, being a participant 
might make political/mission sense regardless of the storage needs, and it could 
be just a specific collection that makes sense.

Glacier is a great choice if you are looking to spread a backup across 
regions. S3 is similar if you also want to benefit from CloudFront (the CDN 
setup) to take load off your institution's server (you can now use CloudFront 
off your own origin server as well). Depending on your bandwidth, this might be 
worth the money regardless of LOCKSS participation (which can be more of a dark 
archive). Amazon also tends to drop prices over time rather than raise them, but 
as with any outsourcing you have to plan for the possibility that the service 
might not exist in the future. Also look more closely at Glacier prices in terms 
of checking your data for consistency. There have been a few papers on the cost 
of making sure Amazon really has the proper data, depending on how often your 
requirements call for checking.

Another option, if you are just looking for more geographic placement, is 
finding an institution or service provider that will colocate. There may be 
another small institution that would love to shove a cheap box with hard drives 
onto your network in exchange for the same. It is not as involved/formal as 
LOCKSS, but it gives you something you control to satisfy your requirements. It 
could also be as low tech as shipping SSDs to another institution that then runs 
some BagIt checksums on the drive, etc.

All of the above should be scriptable in your workflow. You just need to decide 
what you really want out of it.

Eby


On Fri, Jan 11, 2013 at 11:52 AM, Aaron Trehub  wrote:

> Hello Josh,
>
> Auburn University is a member of two Private LOCKSS Networks: the 
> MetaArchive Cooperative and the Alabama Digital Preservation Network 
> (ADPNet).  Here's a link to a recent conference paper that describes 
> both networks, including their current pricing structures:
>
> http://conference.ifla.org/past/ifla78/216-trehub-en.pdf
>
> LOCKSS has worked well for us so far, in part because supporting 
> community-based solutions is important to us.  As you point out, 
> however, Glacier is an attractive alternative, especially for 
> institutions that may be more interested in low-cost, low-throughput 
> storage and less concerned about entrusting their content to a 
> commercial outfit or having to pay extra to get it back out.  As with 
> most things, you pay your money--more or less, depending--and make your 
> choice.  And take your risks.
>
> Good luck with whatever solution(s) you decide on.  They need not be 
> mutually exclusive.
>
> Best,
>
> Aaron
>
> Aaron Trehub
> Assistant Dean for Technology and Technical Services Auburn University 
> Libraries
> 231 Mell Street, RBD Library
> Auburn, AL 36849-5606
> Phone: (334) 844-1716
> Skype: ajtrehub
> E-mail: treh...@auburn.edu
> URL: http://lib.auburn.edu/
>
>


Re: [CODE4LIB] Digital collection backups

2013-01-11 Thread Joshua Welker
The only scenario I can think of where we'd need to do a full restore is if the 
server crashes, and for those cases, we are going to have typical short-term 
imaging setups in place. Our needs beyond that are to make sure our original 
files are backed up redundantly in some non-volatile location so that in the 
event a file on the local server becomes corrupt, we have a high-fidelity copy 
of the original on hand to restore it from. Since data decay, I assume, happens 
rather infrequently and over a long period of time, it's not important for us 
to be able to restore all the files at once. Like I said, if the server catches 
on fire and crashes, we have regular off-site tape-based storage to fix those 
short-term problems.
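
For the single-corrupt-file case, pulling one archive back out of Glacier is 
also an asynchronous, per-archive job; a minimal sketch (assuming the boto3 SDK; 
vault name and archive ID are illustrative):

import boto3

glacier = boto3.client("glacier")

def request_restore(vault, archive_id):
    # Kick off a retrieval job for a single archive; it typically completes
    # in a few hours, at which point the data can be downloaded.
    job = glacier.initiate_job(
        vaultName=vault,
        jobParameters={"Type": "archive-retrieval", "ArchiveId": archive_id})
    return job["jobId"]

def download_if_ready(vault, job_id, dest_path):
    # Returns False if the retrieval job is still running; otherwise writes
    # the recovered archive to dest_path.
    if not glacier.describe_job(vaultName=vault, jobId=job_id)["Completed"]:
        return False
    body = glacier.get_job_output(vaultName=vault, jobId=job_id)["body"]
    with open(dest_path, "wb") as f:
        for chunk in iter(lambda: body.read(1024 * 1024), b""):
            f.write(chunk)
    return True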

Josh Welker


-Original Message-
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Cary 
Gordon
Sent: Friday, January 11, 2013 10:27 AM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] Digital collection backups

Restoring 3 TB from Glacier is $370. Add about $90 if you use AWS Import/Export 
(you provide the device).

Hopefully, this is not something that you would do often.

Cary

On Fri, Jan 11, 2013 at 8:14 AM, Matt Schultz  
wrote:
> Josh,
>
> Totally understand the resource constraints and the price comparison 
> up-front. As Roy alluded to earlier, it pays with Glacier to envision 
> what your content retrieval scenarios might be, because that $368 
> up-front could very easily balloon in situations where you are needing 
> to restore a
> collection(s) en-masse at a later date. Amazon Glacier as a service 
> makes their money on that end. In MetaArchive there is currently no 
> charge for collection retrieval for the sake of a restoration. You are 
> also subject and powerless over the long-term to Amazon's price hikes with 
> Glacier.
> Because we are a Cooperative, our members collaboratively work 
> together annually to determine technology preferences, vendors, 
> pricing, cost control, etc. You have a direct seat at the table to 
> help steer the solution in your direction.
>
> On Fri, Jan 11, 2013 at 10:09 AM, Joshua Welker  wrote:
>
>> Matt,
>>
>> I appreciate the information. At that price, it looks like 
>> MetaArchive would be a better option than most of the other services 
>> mentioned in this thread. At this point, I think it is going to come 
>> down to a LOCKSS solution such as what MetaArchive provides or Amazon 
>> Glacier. We anticipate our digital collection growing to about 3TB in 
>> the first two years. With Glacier, that would be $368 per year vs 
>> $3,072 per year for MetaArchive and LOCKSS. As much as I would like 
>> to support library initiatives like LOCKSS, we are a small 
>> institution with a very small budget, and the pricing of Glacier is starting 
>> to look too good to pass up.
>>
>> Josh Welker
>>
>>
>> -Original Message-
>> From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf 
>> Of Matt Schultz
>> Sent: Friday, January 11, 2013 8:49 AM
>> To: CODE4LIB@LISTSERV.ND.EDU
>> Subject: Re: [CODE4LIB] Digital collection backups
>>
>> Hi Josh,
>>
>> Glad you are looking into LOCKSS as a potential solution for your 
>> needs and that you are thinking beyond simple backup solutions for 
>> more long-term preservation. Here at MetaArchive Cooperative we make 
>> use of LOCKSS to preserve a range of content/collections from our member 
>> institutions.
>>
>> The nice thing (I think) about our approach and our use of LOCKSS as 
>> an embedded technology is that you as an institution retain full 
>> control over your collections in the preservation network and get to 
>> play an active and on-going part in their preservation treatment over 
>> time. Storage costs in MetaArchive are competitive ($1/GB/year), and 
>> with that you get up to 7 geographic replications. MetaArchive is 
>> international at this point and so your collections really do achieve 
>> some safe distance from any disasters that may hit close to home.
>>
>> I'd be more than happy to talk with you further about your collection 
>> needs, why we like LOCKSS, and any interest your institution may have 
>> in being part of a collaborative approach to preserving your content 
>> above and beyond simple backup. Feel free to contact me directly.
>>
>> Matt Schultz
>> Program Manager
>> Educopia Institute, MetaArchive Cooperative 
>> http://www.metaarchive.org matt.schu...@metaarchive.org
>> 616-566-3204
>>
>> On Thu, Jan 10, 2013 at 5:20 PM, Joshua Welker  wrote:
>>
>> > Hi everyone,
>> >
>> > We are starting a digiti

Re: [CODE4LIB] Digital collection backups

2013-01-11 Thread Joshua Welker
Thanks, I missed the part about DuraCloud as an abstraction layer. I might look 
into hosting an install of it on the primary server running the digitization 
platform.

Josh Welker


-Original Message-
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Tim 
Donohue
Sent: Friday, January 11, 2013 12:39 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] Digital collection backups

Hi all,

Just wanted to add some additional details about DuraCloud (mentioned earlier 
in this thread), in case it is of interest to anyone.

DuraCloud essentially provides an "abstraction layer" (as previously
mentioned) above several cloud storage providers.  DuraCloud also provides 
additional preservation services to help manage your content in the cloud (e.g. 
integrity checks, replication across several storage providers, migration 
between storage providers, various health/status reports).

The currently supported cloud storage providers include:
- Amazon S3
- Rackspace
- SDSC

There are several other cloud storage providers that are "beta-level" or in 
development. These include:
- Amazon Glacier (in development)
- Chronopolis (in development)
- Azure (beta)
- iRODS (beta)
- HP Cloud (beta)

DuraCloud is open source (so you could run it on your own server), but it is 
also offered as a hosted service (through DuraSpace, my employer). 
You can also try out the hosted service for free for two months.

For much more info, see:
- http://www.duracloud.org
- Pricing for hosted service: http://duracloud.org/content/pricing
* The pricing has dropped recently to reflect market changes
- More technical info / documentation: 
https://wiki.duraspace.org/display/DURACLOUD/DuraCloud

If it's of interest, I can put folks in touch with the DuraCloud team for more 
info (or you can email i...@duracloud.org).

- Tim

--
Tim Donohue
Technical Lead for DSpace Project
DuraSpace.org


Re: [CODE4LIB] Digital collection backups

2013-01-11 Thread Tim Donohue

Hi all,

Just wanted to add some additional details about DuraCloud (mentioned 
earlier in this thread), in case it is of interest to anyone.


DuraCloud essentially provides an "abstraction layer" (as previously 
mentioned) above several cloud storage providers.  DuraCloud also 
provides additional preservation services to help manage your content in 
the cloud (e.g. integrity checks, replication across several storage 
providers, migration between storage providers, various health/status 
reports).


The currently supported cloud storage providers include:
- Amazon S3
- Rackspace
- SDSC

There are several other cloud storage providers that are "beta-level" or 
in development. These include:

- Amazon Glacier (in development)
- Chronopolis (in development)
- Azure (beta)
- iRODS (beta)
- HP Cloud (beta)

DuraCloud is open source (so you could run it on your own server), but 
it is also offered as a hosted service (through DuraSpace, my employer). 
You can also try out the hosted service for free for two months.


For much more info, see:
- http://www.duracloud.org
- Pricing for hosted service: http://duracloud.org/content/pricing
   * The pricing has dropped recently to reflect market changes
- More technical info / documentation: 
https://wiki.duraspace.org/display/DURACLOUD/DuraCloud


If it's of interest, I can put folks in touch with the DuraCloud team 
for more info (or you can email i...@duracloud.org).


- Tim

--
Tim Donohue
Technical Lead for DSpace Project
DuraSpace.org


Re: [CODE4LIB] Digital collection backups

2013-01-11 Thread Ryan Eby
As Aaron alludes to, your decision should be based on your real needs, and the
options might not be mutually exclusive.

LOCKSS/MetaArchive might be worth the money if it is the community archival
aspect you are going for. Depending on your institution, being a participant
might make political/mission sense regardless of the storage needs, and it
could be just a specific collection that makes sense.

Glacier is a great choice if you are looking to spread a backup across
regions. S3 is similar if you also want to benefit from CloudFront (the CDN
setup) to take load off your institution's server (you can now use
CloudFront off your own origin server as well). Depending on your bandwidth,
this might be worth the money regardless of LOCKSS participation (which can
be more of a dark archive). Amazon also tends to drop prices over time rather
than raise them, but as with any outsourcing you have to plan for the
possibility that the service might not exist in the future. Also look more
closely at Glacier prices in terms of checking your data for consistency.
There have been a few papers on the cost of making sure Amazon really has the
proper data, depending on how often your requirements call for checking.

Another option, if you are just looking for more geographic placement, is
finding an institution or service provider that will colocate. There may be
another small institution that would love to shove a cheap box with hard
drives onto your network in exchange for the same. It is not as
involved/formal as LOCKSS, but it gives you something you control to satisfy
your requirements. It could also be as low tech as shipping SSDs to another
institution that then runs some BagIt checksums on the drive, etc.

All of the above should be scriptable in your workflow. You just need to decide
what you really want out of it.
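
The drive-swap idea is easy to script as well; a short sketch assuming the
bagit-python library (directory names are illustrative):

import bagit

# Sending side: turn a plain directory into a BagIt bag with md5 and sha1
# payload manifests before the drive ships.
bag = bagit.make_bag("/mnt/swap_drive/collection", checksums=["md5", "sha1"])

# Receiving side: re-open the bag and verify every payload file against the
# manifests.
bag = bagit.Bag("/mnt/swap_drive/collection")
print("valid" if bag.is_valid() else "checksum or manifest failure")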

Eby


On Fri, Jan 11, 2013 at 11:52 AM, Aaron Trehub  wrote:

> Hello Josh,
>
> Auburn University is a member of two Private LOCKSS Networks: the
> MetaArchive Cooperative and the Alabama Digital Preservation Network
> (ADPNet).  Here's a link to a recent conference paper that describes both
> networks, including their current pricing structures:
>
> http://conference.ifla.org/past/ifla78/216-trehub-en.pdf
>
> LOCKSS has worked well for us so far, in part because supporting
> community-based solutions is important to us.  As you point out, however,
> Glacier is an attractive alternative, especially for institutions that may
> be more interested in low-cost, low-throughput storage and less concerned
> about entrusting their content to a commercial outfit or having to pay
> extra to get it back out.  As with most things, you pay your money--more or
> less, depending--and make your choice.  And take your risks.
>
> Good luck with whatever solution(s) you decide on.  They need not be
> mutually exclusive.
>
> Best,
>
> Aaron
>
> Aaron Trehub
> Assistant Dean for Technology and Technical Services
> Auburn University Libraries
> 231 Mell Street, RBD Library
> Auburn, AL 36849-5606
> Phone: (334) 844-1716
> Skype: ajtrehub
> E-mail: treh...@auburn.edu
> URL: http://lib.auburn.edu/
>
>


Re: [CODE4LIB] Digital collection backups

2013-01-11 Thread Aaron Trehub
Hello Josh,

Auburn University is a member of two Private LOCKSS Networks: the MetaArchive 
Cooperative and the Alabama Digital Preservation Network (ADPNet).  Here's a 
link to a recent conference paper that describes both networks, including their 
current pricing structures:

http://conference.ifla.org/past/ifla78/216-trehub-en.pdf

LOCKSS has worked well for us so far, in part because supporting 
community-based solutions is important to us.  As you point out, however, 
Glacier is an attractive alternative, especially for institutions that may be 
more interested in low-cost, low-throughput storage and less concerned about 
entrusting their content to a commercial outfit or having to pay extra to get 
it back out.  As with most things, you pay your money--more or less, 
depending--and make your choice.  And take your risks.

Good luck with whatever solution(s) you decide on.  They need not be mutually 
exclusive.

Best,

Aaron

Aaron Trehub
Assistant Dean for Technology and Technical Services
Auburn University Libraries
231 Mell Street, RBD Library
Auburn, AL 36849-5606
Phone: (334) 844-1716
Skype: ajtrehub
E-mail: treh...@auburn.edu
URL: http://lib.auburn.edu/

-Original Message-
From: Code for Libraries [mailto:CODE4LIB@listserv.nd.edu] On Behalf Of Joshua 
Welker
Sent: Friday, January 11, 2013 9:09 AM
To: CODE4LIB@listserv.nd.edu
Subject: Re: [CODE4LIB] Digital collection backups

Matt,

I appreciate the information. At that price, it looks like MetaArchive would be 
a better option than most of the other services mentioned in this thread. At 
this point, I think it is going to come down to a LOCKSS solution such as what 
MetaArchive provides or Amazon Glacier. We anticipate our digital collection 
growing to about 3TB in the first two years. With Glacier, that would be $368 
per year vs $3,072 per year for MetaArchive and LOCKSS. As much as I would like 
to support library initiatives like LOCKSS, we are a small institution with a 
very small budget, and the pricing of Glacier is starting to look too good to 
pass up.
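
For reference, the rough arithmetic behind those two figures (assuming roughly 
$0.01/GB/month for Glacier storage at the time, and MetaArchive's $1/GB/year):

collection_gb = 3 * 1024                      # ~3 TB
glacier_per_year = collection_gb * 0.01 * 12  # about $368.64
metaarchive_per_year = collection_gb * 1.00   # $3,072.00
print(glacier_per_year, metaarchive_per_year)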

Josh Welker


-Original Message-
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Matt 
Schultz
Sent: Friday, January 11, 2013 8:49 AM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] Digital collection backups

Hi Josh,

Glad you are looking into LOCKSS as a potential solution for your needs and 
that you are thinking beyond simple backup solutions for more long-term 
preservation. Here at MetaArchive Cooperative we make use of LOCKSS to preserve 
a range of content/collections from our member institutions.

The nice thing (I think) about our approach and our use of LOCKSS as an 
embedded technology is that you as an institution retain full control over your 
collections in the preservation network and get to play an active and on-going 
part in their preservation treatment over time. Storage costs in MetaArchive 
are competitive ($1/GB/year), and with that you get up to 7 geographic 
replications. MetaArchive is international at this point and so your 
collections really do achieve some safe distance from any disasters that may 
hit close to home.

I'd be more than happy to talk with you further about your collection needs, 
why we like LOCKSS, and any interest your institution may have in being part of 
a collaborative approach to preserving your content above and beyond simple 
backup. Feel free to contact me directly.

Matt Schultz
Program Manager
Educopia Institute, MetaArchive Cooperative http://www.metaarchive.org 
matt.schu...@metaarchive.org
616-566-3204

On Thu, Jan 10, 2013 at 5:20 PM, Joshua Welker  wrote:

> Hi everyone,
>
> We are starting a digitization project for some of our special 
> collections, and we are having a hard time setting up a backup system 
> that meets the long-term preservation needs of digital archives. The 
> backup mechanisms currently used by campus IT are short-term full-server 
> backups.
> What we are looking for is more granular, file-level backup over the 
> very long term. Does anyone have any recommendations of software or 
> some service or technique? We are looking into LOCKSS but haven't dug too 
> deeply yet.
> Can anyone who uses LOCKSS tell me a bit of their experiences with it?
>
> Josh Welker
> Electronic/Media Services Librarian
> College Liaison
> University Libraries
> Southwest Baptist University
> 417.328.1624
>

--
Matt Schultz
Program Manager
Educopia Institute, MetaArchive Cooperative http://www.metaarchive.org 
matt.schu...@metaarchive.org
616-566-3204


Re: [CODE4LIB] Digital collection backups

2013-01-11 Thread Edward M. Corrado
Without looking into any other issues with Glacier (such as privacy,
security, etc.), it seems like it could be a good solution for
long-term backups for digital preservation. I am not sure I would use
it for regular backups of my digital preservation system, but as a
long-term off-site storage "insurance policy" it is worth looking
into. I can picture using it for bi-monthly or quarterly backups, for
instance. In that case it would be something you would hope never to
use, but it could be good to have in case of a major disaster.

Edward

On Fri, Jan 11, 2013 at 11:27 AM, Cary Gordon  wrote:
> Restoring 3 Tb from Glacier is $370. Add about $90 if you use AWS
> Import/Export (you provide the device).
>
> Hopefully, this is not something that you would do often.
>
> Cary
>
> On Fri, Jan 11, 2013 at 8:14 AM, Matt Schultz
>  wrote:
>> Josh,
>>
>> Totally understand the resource constraints and the price comparison
>> up-front. As Roy alluded to earlier, it pays with Glacier to envision what
>> your content retrieval scenarios might be, because that $368 up-front could
>> very easily balloon in situations where you are needing to restore a
>> collection(s) en-masse at a later date. Amazon Glacier as a service makes
>> their money on that end. In MetaArchive there is currently no charge for
>> collection retrieval for the sake of a restoration. You are also subject
>> and powerless over the long-term to Amazon's price hikes with Glacier.
>> Because we are a Cooperative, our members collaboratively work together
>> annually to determine technology preferences, vendors, pricing, cost
>> control, etc. You have a direct seat at the table to help steer the
>> solution in your direction.
>>
>> On Fri, Jan 11, 2013 at 10:09 AM, Joshua Welker  wrote:
>>
>>> Matt,
>>>
>>> I appreciate the information. At that price, it looks like MetaArchive
>>> would be a better option than most of the other services mentioned in this
>>> thread. At this point, I think it is going to come down to a LOCKSS
>>> solution such as what MetaArchive provides or Amazon Glacier. We anticipate
>>> our digital collection growing to about 3TB in the first two years. With
>>> Glacier, that would be $368 per year vs $3,072 per year for MetaArchive and
>>> LOCKSS. As much as I would like to support library initiatives like LOCKSS,
>>> we are a small institution with a very small budget, and the pricing of
>>> Glacier is starting to look too good to pass up.
>>>
>>> Josh Welker
>>>
>>>
>>> -Original Message-
>>> From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of
>>> Matt Schultz
>>> Sent: Friday, January 11, 2013 8:49 AM
>>> To: CODE4LIB@LISTSERV.ND.EDU
>>> Subject: Re: [CODE4LIB] Digital collection backups
>>>
>>> Hi Josh,
>>>
>>> Glad you are looking into LOCKSS as a potential solution for your needs
>>> and that you are thinking beyond simple backup solutions for more long-term
>>> preservation. Here at MetaArchive Cooperative we make use of LOCKSS to
>>> preserve a range of content/collections from our member institutions.
>>>
>>> The nice thing (I think) about our approach and our use of LOCKSS as an
>>> embedded technology is that you as an institution retain full control over
>>> your collections in the preservation network and get to play an active and
>>> on-going part in their preservation treatment over time. Storage costs in
>>> MetaArchive are competitive ($1/GB/year), and with that you get up to 7
>>> geographic replications. MetaArchive is international at this point and so
>>> your collections really do achieve some safe distance from any disasters
>>> that may hit close to home.
>>>
>>> I'd be more than happy to talk with you further about your collection
>>> needs, why we like LOCKSS, and any interest your institution may have in
>>> being part of a collaborative approach to preserving your content above and
>>> beyond simple backup. Feel free to contact me directly.
>>>
>>> Matt Schultz
>>> Program Manager
>>> Educopia Institute, MetaArchive Cooperative http://www.metaarchive.org
>>> matt.schu...@metaarchive.org
>>> 616-566-3204
>>>
>>> On Thu, Jan 10, 2013 at 5:20 PM, Joshua Welker  wrote:
>>>
>>> > Hi everyone,
>>> >
>>> > We are starting a digitization project for some of our special
>>> > collections, and we are having a hard time setting u

Re: [CODE4LIB] Digital collection backups

2013-01-11 Thread Cary Gordon
Restoring 3 TB from Glacier is $370. Add about $90 if you use AWS
Import/Export (you provide the device).

Hopefully, this is not something that you would do often.

Cary

On Fri, Jan 11, 2013 at 8:14 AM, Matt Schultz
 wrote:
> Josh,
>
> Totally understand the resource constraints and the price comparison
> up-front. As Roy alluded to earlier, it pays with Glacier to envision what
> your content retrieval scenarios might be, because that $368 up-front could
> very easily balloon in situations where you are needing to restore a
> collection(s) en-masse at a later date. Amazon Glacier as a service makes
> their money on that end. In MetaArchive there is currently no charge for
> collection retrieval for the sake of a restoration. You are also subject
> and powerless over the long-term to Amazon's price hikes with Glacier.
> Because we are a Cooperative, our members collaboratively work together
> annually to determine technology preferences, vendors, pricing, cost
> control, etc. You have a direct seat at the table to help steer the
> solution in your direction.
>
> On Fri, Jan 11, 2013 at 10:09 AM, Joshua Welker  wrote:
>
>> Matt,
>>
>> I appreciate the information. At that price, it looks like MetaArchive
>> would be a better option than most of the other services mentioned in this
>> thread. At this point, I think it is going to come down to a LOCKSS
>> solution such as what MetaArchive provides or Amazon Glacier. We anticipate
>> our digital collection growing to about 3TB in the first two years. With
>> Glacier, that would be $368 per year vs $3,072 per year for MetaArchive and
>> LOCKSS. As much as I would like to support library initiatives like LOCKSS,
>> we are a small institution with a very small budget, and the pricing of
>> Glacier is starting to look too good to pass up.
>>
>> Josh Welker
>>
>>
>> -Original Message-
>> From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of
>> Matt Schultz
>> Sent: Friday, January 11, 2013 8:49 AM
>> To: CODE4LIB@LISTSERV.ND.EDU
>> Subject: Re: [CODE4LIB] Digital collection backups
>>
>> Hi Josh,
>>
>> Glad you are looking into LOCKSS as a potential solution for your needs
>> and that you are thinking beyond simple backup solutions for more long-term
>> preservation. Here at MetaArchive Cooperative we make use of LOCKSS to
>> preserve a range of content/collections from our member institutions.
>>
>> The nice thing (I think) about our approach and our use of LOCKSS as an
>> embedded technology is that you as an institution retain full control over
>> your collections in the preservation network and get to play an active and
>> on-going part in their preservation treatment over time. Storage costs in
>> MetaArchive are competitive ($1/GB/year), and with that you get up to 7
>> geographic replications. MetaArchive is international at this point and so
>> your collections really do achieve some safe distance from any disasters
>> that may hit close to home.
>>
>> I'd be more than happy to talk with you further about your collection
>> needs, why we like LOCKSS, and any interest your institution may have in
>> being part of a collaborative approach to preserving your content above and
>> beyond simple backup. Feel free to contact me directly.
>>
>> Matt Schultz
>> Program Manager
>> Educopia Institute, MetaArchive Cooperative http://www.metaarchive.org
>> matt.schu...@metaarchive.org
>> 616-566-3204
>>
>> On Thu, Jan 10, 2013 at 5:20 PM, Joshua Welker  wrote:
>>
>> > Hi everyone,
>> >
>> > We are starting a digitization project for some of our special
>> > collections, and we are having a hard time setting up a backup system
>> > that meets the long-term preservation needs of digital archives. The
>> > backup mechanisms currently used by campus IT are short-term full-server
>> backups.
>> > What we are looking for is more granular, file-level backup over the
>> > very long term. Does anyone have any recommendations of software or
>> > some service or technique? We are looking into LOCKSS but haven't dug
>> too deeply yet.
>> > Can anyone who uses LOCKSS tell me a bit of their experiences with it?
>> >
>> > Josh Welker
>> > Electronic/Media Services Librarian
>> > College Liaison
>> > University Libraries
>> > Southwest Baptist University
>> > 417.328.1624
>> >
>>
>>
>>
>> --
>> Matt Schultz
>> Program Manager
>> Educopia Institute, MetaArchive Cooperative http://www.metaarchive.org
>> matt.schu...@metaarchive.org
>> 616-566-3204
>>
>
>
>
> --
> Matt Schultz
> Program Manager
> Educopia Institute, MetaArchive Cooperative
> http://www.metaarchive.org
> matt.schu...@metaarchive.org
> 616-566-3204



-- 
Cary Gordon
The Cherry Hill Company
http://chillco.com


Re: [CODE4LIB] Digital collection backups

2013-01-11 Thread Cary Gordon
In my experience, writable DVDs are not a stable backup medium. If you
really want to go the DIY, simple-as-possible route, I suggest that you
get 3-4 of the drives and rotate them.

On Fri, Jan 11, 2013 at 7:34 AM, James Gilbert
 wrote:
> Hi Josh,
>
> I lurked on this thread, as I did not know the size of your institution.
>
> Being a public library serving about 24,000 residents - we have the
> small-institution issues as well for this type of project. We recently
> tackled a similar situation and the solution:
>
> 1) Purchase a 3TB SeaGate external network storage device (residential drive
> from Best Buy)
> 2) Burn archived materials to DVD
> 3) Copy files to external storage (on site in my server room)
> 4) DVDs reside off-site (we are still determining where this would be, as
> the library does not have a Safe Deposit Box)
>
> This removes external companies, and the data is quick trip home and back.
>
> I know it is not elaborate and fancy, very little code... but it was $150
> for the drive; and cost of DVDs.
>
> James Gilbert, BS, MLIS
> Systems Librarian
> Whitehall Township Public Library
> 3700 Mechanicsville Road
> Whitehall, PA 18052
>
> 610-432-4330 ext: 203
>
>
> -Original Message-
> From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of
> Joshua Welker
> Sent: Friday, January 11, 2013 10:09 AM
> To: CODE4LIB@LISTSERV.ND.EDU
> Subject: Re: [CODE4LIB] Digital collection backups
>
> Matt,
>
> I appreciate the information. At that price, it looks like MetaArchive would
> be a better option than most of the other services mentioned in this thread.
> At this point, I think it is going to come down to a LOCKSS solution such as
> what MetaArchive provides or Amazon Glacier. We anticipate our digital
> collection growing to about 3TB in the first two years. With Glacier, that
> would be $368 per year vs $3,072 per year for MetaArchive and LOCKSS. As
> much as I would like to support library initiatives like LOCKSS, we are a
> small institution with a very small budget, and the pricing of Glacier is
> starting to look too good to pass up.
>
> Josh Welker
>
>
> -Original Message-
> From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Matt
> Schultz
> Sent: Friday, January 11, 2013 8:49 AM
> To: CODE4LIB@LISTSERV.ND.EDU
> Subject: Re: [CODE4LIB] Digital collection backups
>
> Hi Josh,
>
> Glad you are looking into LOCKSS as a potential solution for your needs and
> that you are thinking beyond simple backup solutions for more long-term
> preservation. Here at MetaArchive Cooperative we make use of LOCKSS to
> preserve a range of content/collections from our member institutions.
>
> The nice thing (I think) about our approach and our use of LOCKSS as an
> embedded technology is that you as an institution retain full control over
> your collections in the preservation network and get to play an active and
> on-going part in their preservation treatment over time. Storage costs in
> MetaArchive are competitive ($1/GB/year), and with that you get up to 7
> geographic replications. MetaArchive is international at this point and so
> your collections really do achieve some safe distance from any disasters
> that may hit close to home.
>
> I'd be more than happy to talk with you further about your collection needs,
> why we like LOCKSS, and any interest your institution may have in being part
> of a collaborative approach to preserving your content above and beyond
> simple backup. Feel free to contact me directly.
>
> Matt Schultz
> Program Manager
> Educopia Institute, MetaArchive Cooperative http://www.metaarchive.org
> matt.schu...@metaarchive.org
> 616-566-3204
>
> On Thu, Jan 10, 2013 at 5:20 PM, Joshua Welker  wrote:
>
>> Hi everyone,
>>
>> We are starting a digitization project for some of our special
>> collections, and we are having a hard time setting up a backup system
>> that meets the long-term preservation needs of digital archives. The
>> backup mechanisms currently used by campus IT are short-term full-server
> backups.
>> What we are looking for is more granular, file-level backup over the
>> very long term. Does anyone have any recommendations of software or
>> some service or technique? We are looking into LOCKSS but haven't dug too
> deeply yet.
>> Can anyone who uses LOCKSS tell me a bit of their experiences with it?
>>
>> Josh Welker
>> Electronic/Media Services Librarian
>> College Liaison
>> University Libraries
>> Southwest Baptist University
>> 417.328.1624
>>
>
>
>
> --
> Matt Schultz
> Program Manager
> Educopia Institute, MetaArchive Cooperative http://www.metaarchive.org
> matt.schu...@metaarchive.org
> 616-566-3204



-- 
Cary Gordon
The Cherry Hill Company
http://chillco.com


Re: [CODE4LIB] Digital collection backups

2013-01-11 Thread Matt Schultz
Josh,

Totally understand the resource constraints and the price comparison
up-front. As Roy alluded to earlier, it pays with Glacier to envision what
your content retrieval scenarios might be, because that $368 up-front figure
could very easily balloon in situations where you need to restore a
collection (or collections) en masse at a later date. Amazon Glacier as a
service makes its money on that end. In MetaArchive there is currently no
charge for collection retrieval for the sake of a restoration. You are also
subject to Amazon's price hikes with Glacier over the long term, and
powerless to do anything about them.
Because we are a Cooperative, our members collaboratively work together
annually to determine technology preferences, vendors, pricing, cost
control, etc. You have a direct seat at the table to help steer the
solution in your direction.

On Fri, Jan 11, 2013 at 10:09 AM, Joshua Welker  wrote:

> Matt,
>
> I appreciate the information. At that price, it looks like MetaArchive
> would be a better option than most of the other services mentioned in this
> thread. At this point, I think it is going to come down to a LOCKSS
> solution such as what MetaArchive provides or Amazon Glacier. We anticipate
> our digital collection growing to about 3TB in the first two years. With
> Glacier, that would be $368 per year vs $3,072 per year for MetaArchive and
> LOCKSS. As much as I would like to support library initiatives like LOCKSS,
> we are a small institution with a very small budget, and the pricing of
> Glacier is starting to look too good to pass up.
>
> Josh Welker
>
>
> -Original Message-
> From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of
> Matt Schultz
> Sent: Friday, January 11, 2013 8:49 AM
> To: CODE4LIB@LISTSERV.ND.EDU
> Subject: Re: [CODE4LIB] Digital collection backups
>
> Hi Josh,
>
> Glad you are looking into LOCKSS as a potential solution for your needs
> and that you are thinking beyond simple backup solutions for more long-term
> preservation. Here at MetaArchive Cooperative we make use of LOCKSS to
> preserve a range of content/collections from our member institutions.
>
> The nice thing (I think) about our approach and our use of LOCKSS as an
> embedded technology is that you as an institution retain full control over
> your collections in the preservation network and get to play an active and
> on-going part in their preservation treatment over time. Storage costs in
> MetaArchive are competitive ($1/GB/year), and with that you get up to 7
> geographic replications. MetaArchive is international at this point and so
> your collections really do achieve some safe distance from any disasters
> that may hit close to home.
>
> I'd be more than happy to talk with you further about your collection
> needs, why we like LOCKSS, and any interest your institution may have in
> being part of a collaborative approach to preserving your content above and
> beyond simple backup. Feel free to contact me directly.
>
> Matt Schultz
> Program Manager
> Educopia Institute, MetaArchive Cooperative http://www.metaarchive.org
> matt.schu...@metaarchive.org
> 616-566-3204
>
> On Thu, Jan 10, 2013 at 5:20 PM, Joshua Welker  wrote:
>
> > Hi everyone,
> >
> > We are starting a digitization project for some of our special
> > collections, and we are having a hard time setting up a backup system
> > that meets the long-term preservation needs of digital archives. The
> > backup mechanisms currently used by campus IT are short-term full-server
> backups.
> > What we are looking for is more granular, file-level backup over the
> > very long term. Does anyone have any recommendations of software or
> > some service or technique? We are looking into LOCKSS but haven't dug
> too deeply yet.
> > Can anyone who uses LOCKSS tell me a bit of their experiences with it?
> >
> > Josh Welker
> > Electronic/Media Services Librarian
> > College Liaison
> > University Libraries
> > Southwest Baptist University
> > 417.328.1624
> >
>
>
>
> --
> Matt Schultz
> Program Manager
> Educopia Institute, MetaArchive Cooperative http://www.metaarchive.org
> matt.schu...@metaarchive.org
> 616-566-3204
>



-- 
Matt Schultz
Program Manager
Educopia Institute, MetaArchive Cooperative
http://www.metaarchive.org
matt.schu...@metaarchive.org
616-566-3204


Re: [CODE4LIB] Digital collection backups

2013-01-11 Thread Joshua Welker
James,

Definitely a simple and elegant solution, but that is not a viable long-term 
option for us. We currently have tons of old CDs and DVDs full of data, and one 
of our goals is to wean off those media completely.  Most consumer-grade CDs 
and DVDs are very poor in terms of long-term data integrity. Those discs have a 
shelf life of probably a decade or two, tops. Plus we want more redundancy than 
a collection of discs in a single physical location can offer. But if that works 
for you guys, power to you. 
Cheap is good.

Josh Welker


-Original Message-
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of James 
Gilbert
Sent: Friday, January 11, 2013 9:34 AM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] Digital collection backups

Hi Josh,

I lurked on this thread, as I did not know the size of your institution.

Being a public library serving about 24,000 residents, we have the 
small-institution issues as well for this type of project. We recently tackled 
a similar situation, and here is our solution:

1) Purchase a 3TB SeaGate external network storage device (residential drive 
from Best Buy)
2) Burn archived materials to DVD
3) Copy files to external storage (on site in my server room)
4) DVDs reside off-site (we are still determining where this would be, as the 
library does not have a Safe Deposit Box)

This removes external companies, and the data is a quick trip home and back.

I know it is not elaborate and fancy, very little code... but it was $150 for 
the drive, plus the cost of DVDs. 

James Gilbert, BS, MLIS
Systems Librarian
Whitehall Township Public Library
3700 Mechanicsville Road
Whitehall, PA 18052
 
610-432-4330 ext: 203


-Original Message-
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Joshua 
Welker
Sent: Friday, January 11, 2013 10:09 AM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] Digital collection backups

Matt,

I appreciate the information. At that price, it looks like MetaArchive would be 
a better option than most of the other services mentioned in this thread.
At this point, I think it is going to come down to a LOCKSS solution such as 
what MetaArchive provides or Amazon Glacier. We anticipate our digital 
collection growing to about 3TB in the first two years. With Glacier, that 
would be $368 per year vs $3,072 per year for MetaArchive and LOCKSS. As much 
as I would like to support library initiatives like LOCKSS, we are a small 
institution with a very small budget, and the pricing of Glacier is starting to 
look too good to pass up.

Josh Welker


-Original Message-
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Matt 
Schultz
Sent: Friday, January 11, 2013 8:49 AM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] Digital collection backups

Hi Josh,

Glad you are looking into LOCKSS as a potential solution for your needs and 
that you are thinking beyond simple backup solutions for more long-term 
preservation. Here at MetaArchive Cooperative we make use of LOCKSS to preserve 
a range of content/collections from our member institutions.

The nice thing (I think) about our approach and our use of LOCKSS as an 
embedded technology is that you as an institution retain full control over your 
collections in the preservation network and get to play an active and on-going 
part in their preservation treatment over time. Storage costs in MetaArchive 
are competitive ($1/GB/year), and with that you get up to 7 geographic 
replications. MetaArchive is international at this point and so your 
collections really do achieve some safe distance from any disasters that may 
hit close to home.

I'd be more than happy to talk with you further about your collection needs, 
why we like LOCKSS, and any interest your institution may have in being part of 
a collaborative approach to preserving your content above and beyond simple 
backup. Feel free to contact me directly.

Matt Schultz
Program Manager
Educopia Institute, MetaArchive Cooperative http://www.metaarchive.org 
matt.schu...@metaarchive.org
616-566-3204

On Thu, Jan 10, 2013 at 5:20 PM, Joshua Welker  wrote:

> Hi everyone,
>
> We are starting a digitization project for some of our special 
> collections, and we are having a hard time setting up a backup system 
> that meets the long-term preservation needs of digital archives. The 
> backup mechanisms currently used by campus IT are short-term 
> full-server
backups.
> What we are looking for is more granular, file-level backup over the 
> very long term. Does anyone have any recommendations of software or 
> some service or technique? We are looking into LOCKSS but haven't dug 
> too
deeply yet.
> Can anyone who uses LOCKSS tell me a bit of their experiences with it?
>
> Josh Welker
> Electronic/Media Services Librarian
> College Liaison
> University Libraries
> Southwes

Re: [CODE4LIB] Digital collection backups

2013-01-11 Thread James Gilbert
Hi Josh,

I lurked on this thread, as I did not know the size of your institution.

Being a public library serving about 24,000 residents, we have the
small-institution issues as well for this type of project. We recently
tackled a similar situation, and here is our solution:

1) Purchase a 3TB SeaGate external network storage device (residential drive
from Best Buy)
2) Burn archived materials to DVD
3) Copy files to external storage (on site in my server room)
4) DVDs reside off-site (we are still determining where this would be, as
the library does not have a Safe Deposit Box)

This removes external companies, and the data is a quick trip home and back.

I know it is not elaborate and fancy, very little code... but it was $150
for the drive, plus the cost of DVDs. 

James Gilbert, BS, MLIS
Systems Librarian
Whitehall Township Public Library
3700 Mechanicsville Road
Whitehall, PA 18052
 
610-432-4330 ext: 203


-Original Message-
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of
Joshua Welker
Sent: Friday, January 11, 2013 10:09 AM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] Digital collection backups

Matt,

I appreciate the information. At that price, it looks like MetaArchive would
be a better option than most of the other services mentioned in this thread.
At this point, I think it is going to come down to a LOCKSS solution such as
what MetaArchive provides or Amazon Glacier. We anticipate our digital
collection growing to about 3TB in the first two years. With Glacier, that
would be $368 per year vs $3,072 per year for MetaArchive and LOCKSS. As
much as I would like to support library initiatives like LOCKSS, we are a
small institution with a very small budget, and the pricing of Glacier is
starting to look too good to pass up.

Josh Welker


-Original Message-
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Matt
Schultz
Sent: Friday, January 11, 2013 8:49 AM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] Digital collection backups

Hi Josh,

Glad you are looking into LOCKSS as a potential solution for your needs and
that you are thinking beyond simple backup solutions for more long-term
preservation. Here at MetaArchive Cooperative we make use of LOCKSS to
preserve a range of content/collections from our member institutions.

The nice thing (I think) about our approach and our use of LOCKSS as an
embedded technology is that you as an institution retain full control over
your collections in the preservation network and get to play an active and
on-going part in their preservation treatment over time. Storage costs in
MetaArchive are competitive ($1/GB/year), and with that you get up to 7
geographic replications. MetaArchive is international at this point and so
your collections really do achieve some safe distance from any disasters
that may hit close to home.

I'd be more than happy to talk with you further about your collection needs,
why we like LOCKSS, and any interest your institution may have in being part
of a collaborative approach to preserving your content above and beyond
simple backup. Feel free to contact me directly.

Matt Schultz
Program Manager
Educopia Institute, MetaArchive Cooperative http://www.metaarchive.org
matt.schu...@metaarchive.org
616-566-3204

On Thu, Jan 10, 2013 at 5:20 PM, Joshua Welker  wrote:

> Hi everyone,
>
> We are starting a digitization project for some of our special 
> collections, and we are having a hard time setting up a backup system 
> that meets the long-term preservation needs of digital archives. The 
> backup mechanisms currently used by campus IT are short-term full-server
backups.
> What we are looking for is more granular, file-level backup over the 
> very long term. Does anyone have any recommendations of software or 
> some service or technique? We are looking into LOCKSS but haven't dug too
deeply yet.
> Can anyone who uses LOCKSS tell me a bit of their experiences with it?
>
> Josh Welker
> Electronic/Media Services Librarian
> College Liaison
> University Libraries
> Southwest Baptist University
> 417.328.1624
>



--
Matt Schultz
Program Manager
Educopia Institute, MetaArchive Cooperative http://www.metaarchive.org
matt.schu...@metaarchive.org
616-566-3204


Re: [CODE4LIB] Digital collection backups

2013-01-11 Thread Al Matthews
http://metaarchive.org/costs in our case. Interested to hear other
experiences. Al


On 1/11/13 10:01 AM, "Joshua Welker"  wrote:

>Thanks, Al. I think we'd join a LOCKSS network rather than run multiple
>LOCKSS boxes ourselves. Does anyone have any experience with one of
>those, like the LOCKSS Global Alliance?
>
>Josh Welker
>
>
>-Original Message-
>From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of
>Al Matthews
>Sent: Friday, January 11, 2013 8:50 AM
>To: CODE4LIB@LISTSERV.ND.EDU
>Subject: Re: [CODE4LIB] Digital collection backups
>
>We use LOCKSS as part of MetaArchive. LOCKSS as I understand it is
>typically spec-d for consumer hardware, and so, presumably as a result of
>SE Asia flooding, there have been some drive failures and cache downtimes
>and adjustments accordingly.
>
>However, that is the worst of it, first.
>
>LOCKSS is to some perhaps even considerable degree, tamper-resistant
>since it relies on mechanisms of collective polling among multiple copies
>to preserve integrity. This, as opposed to static checksums or some other
>solution.
>
>As such, it seems to me important to run a LOCKSS box with other LOCKSS
>boxes; MA cooperative specifies six or so, distributed locations for each
>cache.
>
>The economic sustainability of such an enterprise is a valid question.
>David S H Rosenthal at Stanford seems to lead the charge for this
>research.
>
>e.g.
>http://blog.dshr.org/2012/08/amazons-announcement-of-glacier.html#more
>
>I've heard mention from other players that they watch MA carefully for
>such sustainability considerations, especially because MA uses LOCKSS for
>non-journal content. In some sense this may extend LOCKSS beyond its
>original design.
>
>MetaArchive has in my opinion been extremely responsible in designating
>succession scenarios and disaster recovery scenarios, going so far as to
>fund, develop and test services for migration out of the system, into an
>IRODS repository in the initial case.
>
>
>Al Matthews
>AUC Robert W. Woodruff Library
>
>On 1/11/13 9:10 AM, "Joshua Welker"  wrote:
>
>>Good point. But since campus IT will be creating regular
>>disaster-recovery backups, the odds that we'd ever need to retrieve
>>more than a handful of files from Glacier at a time are pretty low.
>>
>>Josh Welker
>>
>>
>>-Original Message-
>>From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of
>>Gary McGath
>>Sent: Friday, January 11, 2013 8:03 AM
>>To: CODE4LIB@LISTSERV.ND.EDU
>>Subject: Re: [CODE4LIB] Digital collection backups
>>
>>Concerns have been raised about how expensive Glacier gets if you need
>>to recover a lot of files in a short time period.
>>
>>http://www.wired.com/wiredenterprise/2012/08/glacier/
>>
>>On 1/10/13 5:56 PM, Roy Tennant wrote:
>>> I'd also take a look at Amazon Glacier. Recently I parked about 50GB
>>> of data files in logical tar'd and gzip'd chunks and it's costing my
>>> employer less than 50 cents/month. Glacier, however, is best for
>>> "park it and forget" kinds of needs, as the real cost is in data flow.
>>> Storage is cheap, but must be considered "offline" or "near line" as
>>> you must first request to retrieve a file, wait for about a day, and
>>> then retrieve the file. And you're charged more for the download
>>> throughput than just about anything.
>>>
>>> I'm using a Unix client to handle all of the heavy lifting of
>>> uploading and downloading, as Glacier is meant to be used via an API
>>> rather than a web client.[1] If anyone is interested, I have local
>>> documentation on usage that I could probably genericize. And yes, I
>>> did round-trip a file to make sure it functioned as advertised.
>>> Roy
>>>
>>> [1] https://github.com/vsespb/mt-aws-glacier
>>>
>>> On Thu, Jan 10, 2013 at 2:29 PM,  
>>>wrote:
>>>> We built our own solution for this by creating a plugin that works
>>>>with our digital asset management system (ResourceSpace) to
>>>>individually back up files to Amazon S3. Because S3 is replicated to
>>>>multiple data centers, this provides a fairly high level of
>>>>redundancy. And because it's an object-based web service, we can
>>>>access any given object individually by using a URL related to the
>>>>original storage URL within our system.
>>>>
>>>> This also allows us to take advantage of S3 for images on our

Re: [CODE4LIB] Digital collection backups

2013-01-11 Thread Joshua Welker
Matt,

I appreciate the information. At that price, it looks like MetaArchive would be 
a better option than most of the other services mentioned in this thread. At 
this point, I think it is going to come down to a LOCKSS solution such as what 
MetaArchive provides or Amazon Glacier. We anticipate our digital collection 
growing to about 3TB in the first two years. With Glacier, that would be $368 
per year vs $3,072 per year for MetaArchive and LOCKSS. As much as I would like 
to support library initiatives like LOCKSS, we are a small institution with a 
very small budget, and the pricing of Glacier is starting to look too good to 
pass up.
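
For what it's worth, here is the arithmetic behind those figures, as a rough
Python sketch (storage costs only, assuming Glacier's roughly $0.01/GB/month
storage rate and the $1/GB/year MetaArchive rate quoted earlier in this thread,
and ignoring Glacier's retrieval and transfer fees):

    # Rough cost check for ~3 TB of preservation storage, using the rates
    # quoted in this thread (storage only; retrieval, request, and transfer
    # fees are ignored).
    collection_gb = 3 * 1024                  # ~3 TB expressed in GB

    glacier_per_gb_month = 0.01               # Glacier storage rate, $/GB/month
    metaarchive_per_gb_year = 1.00            # MetaArchive rate, $/GB/year

    glacier_yearly = collection_gb * glacier_per_gb_month * 12
    metaarchive_yearly = collection_gb * metaarchive_per_gb_year

    print(f"Glacier:     ${glacier_yearly:,.2f}/year")      # $368.64
    print(f"MetaArchive: ${metaarchive_yearly:,.2f}/year")  # $3,072.00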

Josh Welker


-Original Message-
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Matt 
Schultz
Sent: Friday, January 11, 2013 8:49 AM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] Digital collection backups

Hi Josh,

Glad you are looking into LOCKSS as a potential solution for your needs and 
that you are thinking beyond simple backup solutions for more long-term 
preservation. Here at MetaArchive Cooperative we make use of LOCKSS to preserve 
a range of content/collections from our member institutions.

The nice thing (I think) about our approach and our use of LOCKSS as an 
embedded technology is that you as an institution retain full control over your 
collections in the preservation network and get to play an active and on-going 
part in their preservation treatment over time. Storage costs in MetaArchive 
are competitive ($1/GB/year), and with that you get up to 7 geographic 
replications. MetaArchive is international at this point and so your 
collections really do achieve some safe distance from any disasters that may 
hit close to home.

I'd be more than happy to talk with you further about your collection needs, 
why we like LOCKSS, and any interest your institution may have in being part of 
a collaborative approach to preserving your content above and beyond simple 
backup. Feel free to contact me directly.

Matt Schultz
Program Manager
Educopia Institute, MetaArchive Cooperative http://www.metaarchive.org 
matt.schu...@metaarchive.org
616-566-3204

On Thu, Jan 10, 2013 at 5:20 PM, Joshua Welker  wrote:

> Hi everyone,
>
> We are starting a digitization project for some of our special 
> collections, and we are having a hard time setting up a backup system 
> that meets the long-term preservation needs of digital archives. The 
> backup mechanisms currently used by campus IT are short-term full-server 
> backups.
> What we are looking for is more granular, file-level backup over the 
> very long term. Does anyone have any recommendations of software or 
> some service or technique? We are looking into LOCKSS but haven't dug too 
> deeply yet.
> Can anyone who uses LOCKSS tell me a bit of their experiences with it?
>
> Josh Welker
> Electronic/Media Services Librarian
> College Liaison
> University Libraries
> Southwest Baptist University
> 417.328.1624
>



--
Matt Schultz
Program Manager
Educopia Institute, MetaArchive Cooperative http://www.metaarchive.org 
matt.schu...@metaarchive.org
616-566-3204


Re: [CODE4LIB] Digital collection backups

2013-01-11 Thread Joshua Welker
Thanks, Al. I think we'd join a LOCKSS network rather than run multiple LOCKSS 
boxes ourselves. Does anyone have any experience with one of those, like the 
LOCKSS Global Alliance?

Josh Welker


-Original Message-
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Al 
Matthews
Sent: Friday, January 11, 2013 8:50 AM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] Digital collection backups

We use LOCKSS as part of MetaArchive. LOCKSS as I understand it is typically 
spec-d for consumer hardware, and so, presumably as a result of SE Asia 
flooding, there have been some drive failures and cache downtimes and 
adjustments accordingly.

However, that is the worst of it, first.

LOCKSS is to some perhaps even considerable degree, tamper-resistant since it 
relies on mechanisms of collective polling among multiple copies to preserve 
integrity. This, as opposed to static checksums or some other solution.

As such, it seems to me important to run a LOCKSS box with other LOCKSS boxes; 
MA cooperative specifies six or so, distributed locations for each cache.

The economic sustainability of such an enterprise is a valid question.
David S H Rosenthal at Stanford seems to lead the charge for this research.

e.g. http://blog.dshr.org/2012/08/amazons-announcement-of-glacier.html#more

I've heard mention from other players that they watch MA carefully for such 
sustainability considerations, especially because MA uses LOCKSS for 
non-journal content. In some sense this may extend LOCKSS beyond its original 
design.

MetaArchive has in my opinion been extremely responsible in designating 
succession scenarios and disaster recovery scenarios, going so far as to fund, 
develop and test services for migration out of the system, into an IRODS 
repository in the initial case.


Al Matthews
AUC Robert W. Woodruff Library

On 1/11/13 9:10 AM, "Joshua Welker"  wrote:

>Good point. But since campus IT will be creating regular 
>disaster-recovery backups, the odds that we'd ever need to retrieve 
>more than a handful of files from Glacier at a time are pretty low.
>
>Josh Welker
>
>
>-Original Message-
>From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of 
>Gary McGath
>Sent: Friday, January 11, 2013 8:03 AM
>To: CODE4LIB@LISTSERV.ND.EDU
>Subject: Re: [CODE4LIB] Digital collection backups
>
>Concerns have been raised about how expensive Glacier gets if you need 
>to recover a lot of files in a short time period.
>
>http://www.wired.com/wiredenterprise/2012/08/glacier/
>
>On 1/10/13 5:56 PM, Roy Tennant wrote:
>> I'd also take a look at Amazon Glacier. Recently I parked about 50GB 
>> of data files in logical tar'd and gzip'd chunks and it's costing my 
>> employer less than 50 cents/month. Glacier, however, is best for 
>> "park it and forget" kinds of needs, as the real cost is in data flow.
>> Storage is cheap, but must be considered "offline" or "near line" as 
>> you must first request to retrieve a file, wait for about a day, and 
>> then retrieve the file. And you're charged more for the download 
>> throughput than just about anything.
>>
>> I'm using a Unix client to handle all of the heavy lifting of 
>> uploading and downloading, as Glacier is meant to be used via an API 
>> rather than a web client.[1] If anyone is interested, I have local 
>> documentation on usage that I could probably genericize. And yes, I 
>> did round-trip a file to make sure it functioned as advertised.
>> Roy
>>
>> [1] https://github.com/vsespb/mt-aws-glacier
>>
>> On Thu, Jan 10, 2013 at 2:29 PM,  
>>wrote:
>>> We built our own solution for this by creating a plugin that works 
>>>with our digital asset management system (ResourceSpace) to 
>>>individually back up files to Amazon S3. Because S3 is replicated to 
>>>multiple data centers, this provides a fairly high level of 
>>>redundancy. And because it's an object-based web service, we can 
>>>access any given object individually by using a URL related to the 
>>>original storage URL within our system.
>>>
>>> This also allows us to take advantage of S3 for images on our website.
>>>All of the images in our online collections database are being 
>>>served straight from S3, which diverts the load from our public web 
>>>server. When we launch zoomable images later this year, all of the 
>>>tiles will also be generated locally in the DAM and then served to 
>>>the public via the mirrored copy in S3.
>>>
>>> The current pricing is around $0.08/GB/month for 1-50 TB, which I 
>>>think is fairly reasonable for

Re: [CODE4LIB] Digital collection backups

2013-01-11 Thread Al Matthews
We use LOCKSS as part of MetaArchive. LOCKSS as I understand it is
typically spec-d for consumer hardware, and so, presumably as a result of
SE Asia flooding, there have been some drive failures and cache downtimes
and adjustments accordingly.

However, that is the worst of it, first.

LOCKSS is, to some perhaps even considerable degree, tamper-resistant, since
it relies on mechanisms of collective polling among multiple copies to
preserve integrity. This, as opposed to static checksums or some other
solution.

As such, it seems to me important to run a LOCKSS box with other LOCKSS
boxes; MA cooperative specifies six or so distributed locations for each
cache.

The economic sustainability of such an enterprise is a valid question.
David S H Rosenthal at Stanford seems to lead the charge for this research.

e.g. http://blog.dshr.org/2012/08/amazons-announcement-of-glacier.html#more

I've heard mention from other players that they watch MA carefully for
such sustainability considerations, especially because MA uses LOCKSS for
non-journal content. In some sense this may extend LOCKSS beyond its
original design.

MetaArchive has in my opinion been extremely responsible in designating
>succession scenarios and disaster recovery scenarios, going so far as to
fund, develop and test services for migration out of the system, into an
IRODS repository in the initial case.
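
To illustrate the polling idea in miniature (this is only a toy sketch of the
general voting-and-repair concept, not the actual LOCKSS protocol; the cache
names and file contents are made up): with several independent copies, a
damaged replica is detected and repaired by comparison against its peers
rather than against a single stored checksum.

    import hashlib
    from collections import Counter

    def digest(data):
        """Fingerprint one cache's copy of a file."""
        return hashlib.sha256(data).hexdigest()

    def poll_and_repair(replicas):
        """Toy 'poll': the version held by most caches wins, and any
        disagreeing cache is repaired from a peer in the majority."""
        votes = Counter(digest(data) for data in replicas.values())
        winning_hash, _ = votes.most_common(1)[0]
        good_copy = next(d for d in replicas.values() if digest(d) == winning_hash)
        return {name: data if digest(data) == winning_hash else good_copy
                for name, data in replicas.items()}

    # Five independent caches; one has silently flipped a byte.
    caches = {f"cache{i}": b"TIFF master file bytes ..." for i in range(1, 6)}
    caches["cache3"] = b"TIFF master file bytes .,."   # bit rot, unnoticed locally
    repaired = poll_and_repair(caches)
    assert len({digest(d) for d in repaired.values()}) == 1   # all copies agree again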


Al Matthews
AUC Robert W. Woodruff Library

On 1/11/13 9:10 AM, "Joshua Welker"  wrote:

>Good point. But since campus IT will be creating regular
>disaster-recovery backups, the odds that we'd ever need to retrieve
>more than a handful of files from Glacier at a time are pretty low.
>
>Josh Welker
>
>
>-Original Message-
>From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of
>Gary McGath
>Sent: Friday, January 11, 2013 8:03 AM
>To: CODE4LIB@LISTSERV.ND.EDU
>Subject: Re: [CODE4LIB] Digital collection backups
>
>Concerns have been raised about how expensive Glacier gets if you need to
>recover a lot of files in a short time period.
>
>http://www.wired.com/wiredenterprise/2012/08/glacier/
>
>On 1/10/13 5:56 PM, Roy Tennant wrote:
>> I'd also take a look at Amazon Glacier. Recently I parked about 50GB
>> of data files in logical tar'd and gzip'd chunks and it's costing my
>> employer less than 50 cents/month. Glacier, however, is best for "park
>> it and forget" kinds of needs, as the real cost is in data flow.
>> Storage is cheap, but must be considered "offline" or "near line" as
>> you must first request to retrieve a file, wait for about a day, and
>> then retrieve the file. And you're charged more for the download
>> throughput than just about anything.
>>
>> I'm using a Unix client to handle all of the heavy lifting of
>> uploading and downloading, as Glacier is meant to be used via an API
>> rather than a web client.[1] If anyone is interested, I have local
>> documentation on usage that I could probably genericize. And yes, I
>> did round-trip a file to make sure it functioned as advertised.
>> Roy
>>
>> [1] https://github.com/vsespb/mt-aws-glacier
>>
>> On Thu, Jan 10, 2013 at 2:29 PM,  
>>wrote:
>>> We built our own solution for this by creating a plugin that works
>>>with our digital asset management system (ResourceSpace) to individually
>>>back up files to Amazon S3. Because S3 is replicated to multiple data
>>>centers, this provides a fairly high level of redundancy. And because
>>>it's an object-based web service, we can access any given object
>>>individually by using a URL related to the original storage URL within
>>>our system.
>>>
>>> This also allows us to take advantage of S3 for images on our website.
>>>All of the images in our online collections database are being
>>>served straight from S3, which diverts the load from our public web
>>>server. When we launch zoomable images later this year, all of the
>>>tiles will also be generated locally in the DAM and then served to the
>>>public via the mirrored copy in S3.
>>>
>>> The current pricing is around $0.08/GB/month for 1-50 TB, which I
>>>think is fairly reasonable for what we're getting. They just dropped
>>>the price substantially a few months ago.
>>>
>>> DuraCloud http://www.duracloud.org/ supposedly offers a way to add
>>>another abstraction layer so you can build something like this that is
>>>portable between different cloud storage providers. But I haven't
>>>really looked into this as of yet.
>
>
>--
>Gary McGath, Professional Software Developer http

Re: [CODE4LIB] Digital collection backups

2013-01-11 Thread Matt Schultz
Hi Josh,

Glad you are looking into LOCKSS as a potential solution for your needs and
that you are thinking beyond simple backup solutions for more long-term
preservation. Here at MetaArchive Cooperative we make use of LOCKSS to
preserve a range of content/collections from our member institutions.

The nice thing (I think) about our approach and our use of LOCKSS as an
embedded technology is that you as an institution retain full control over
your collections in the preservation network and get to play an active and
on-going part in their preservation treatment over time. Storage costs in
MetaArchive are competitive ($1/GB/year), and with that you get up to 7
geographic replications. MetaArchive is international at this point and so
your collections really do achieve some safe distance from any disasters
that may hit close to home.

I'd be more than happy to talk with you further about your collection
needs, why we like LOCKSS, and any interest your institution may have in
being part of a collaborative approach to preserving your content above and
beyond simple backup. Feel free to contact me directly.

Matt Schultz
Program Manager
Educopia Institute, MetaArchive Cooperative
http://www.metaarchive.org
matt.schu...@metaarchive.org
616-566-3204

On Thu, Jan 10, 2013 at 5:20 PM, Joshua Welker  wrote:

> Hi everyone,
>
> We are starting a digitization project for some of our special
> collections, and we are having a hard time setting up a backup system that
> meets the long-term preservation needs of digital archives. The backup
> mechanisms currently used by campus IT are short-term full-server backups.
> What we are looking for is more granular, file-level backup over the very
> long term. Does anyone have any recommendations of software or some service
> or technique? We are looking into LOCKSS but haven't dug too deeply yet.
> Can anyone who uses LOCKSS tell me a bit of their experiences with it?
>
> Josh Welker
> Electronic/Media Services Librarian
> College Liaison
> University Libraries
> Southwest Baptist University
> 417.328.1624
>



-- 
Matt Schultz
Program Manager
Educopia Institute, MetaArchive Cooperative
http://www.metaarchive.org
matt.schu...@metaarchive.org
616-566-3204


Re: [CODE4LIB] Digital collection backups

2013-01-11 Thread Joshua Welker
Good point. But since campus IT will be creating regular disaster-recovery 
backups, the odds that we'd ever need to retrieve more than a handful of 
files from Glacier at a time are pretty low. 

Josh Welker


-Original Message-
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Gary 
McGath
Sent: Friday, January 11, 2013 8:03 AM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] Digital collection backups

Concerns have been raised about how expensive Glacier gets if you need to 
recover a lot of files in a short time period.

http://www.wired.com/wiredenterprise/2012/08/glacier/

On 1/10/13 5:56 PM, Roy Tennant wrote:
> I'd also take a look at Amazon Glacier. Recently I parked about 50GB 
> of data files in logical tar'd and gzip'd chunks and it's costing my 
> employer less than 50 cents/month. Glacier, however, is best for "park 
> it and forget" kinds of needs, as the real cost is in data flow.
> Storage is cheap, but must be considered "offline" or "near line" as 
> you must first request to retrieve a file, wait for about a day, and 
> then retrieve the file. And you're charged more for the download 
> throughput than just about anything.
> 
> I'm using a Unix client to handle all of the heavy lifting of 
> uploading and downloading, as Glacier is meant to be used via an API 
> rather than a web client.[1] If anyone is interested, I have local 
> documentation on usage that I could probably genericize. And yes, I 
> did round-trip a file to make sure it functioned as advertised.
> Roy
> 
> [1] https://github.com/vsespb/mt-aws-glacier
> 
> On Thu, Jan 10, 2013 at 2:29 PM,   wrote:
>> We built our own solution for this by creating a plugin that works with our 
>> digital asset management system (ResourceSpace) to individually back up files 
>> to Amazon S3. Because S3 is replicated to multiple data centers, this 
>> provides a fairly high level of redundancy. And because it's an object-based 
>> web service, we can access any given object individually by using a URL 
>> related to the original storage URL within our system.
>>
>> This also allows us to take advantage of S3 for images on our website. All 
>> of the images in our online collections database are being served 
>> straight from S3, which diverts the load from our public web server. When we 
>> launch zoomable images later this year, all of the tiles will also be 
>> generated locally in the DAM and then served to the public via the mirrored 
>> copy in S3.
>>
>> The current pricing is around $0.08/GB/month for 1-50 TB, which I think is 
>> fairly reasonable for what we're getting. They just dropped the price 
>> substantially a few months ago.
>>
>> DuraCloud http://www.duracloud.org/ supposedly offers a way to add another 
>> abstraction layer so you can build something like this that is portable 
>> between different cloud storage providers. But I haven't really looked into 
>> this as of yet.


--
Gary McGath, Professional Software Developer http://www.garymcgath.com


Re: [CODE4LIB] Digital collection backups

2013-01-11 Thread Joshua Welker
Glacier sounds even better than S3 for what we're looking for. We are only 
going to be retrieving the files in the case of corruption, so the 
pay-per-retrieval model would work well. I heard of Glacier in the past but 
forgot all about it. Thank you.

Josh Welker


-Original Message-
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Roy 
Tennant
Sent: Thursday, January 10, 2013 4:56 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] Digital collection backups

I'd also take a look at Amazon Glacier. Recently I parked about 50GB of data 
files in logical tar'd and gzip'd chunks and it's costing my employer less than 
50 cents/month. Glacier, however, is best for "park it and forget" kinds of 
needs, as the real cost is in data flow.
Storage is cheap, but must be considered "offline" or "near line" as you must 
first request to retrieve a file, wait for about a day, and then retrieve the 
file. And you're charged more for the download throughput than just about 
anything.

I'm using a Unix client to handle all of the heavy lifting of uploading and 
downloading, as Glacier is meant to be used via an API rather than a web 
client.[1] If anyone is interested, I have local documentation on usage that I 
could probably genericize. And yes, I did round-trip a file to make sure it 
functioned as advertised.
Roy

[1] https://github.com/vsespb/mt-aws-glacier

On Thu, Jan 10, 2013 at 2:29 PM,   wrote:
> We built our own solution for this by creating a plugin that works with our 
> digital asset management system (ResourceSpace) to individually back up files 
> to Amazon S3. Because S3 is replicated to multiple data centers, this 
> provides a fairly high level of redundancy. And because it's an object-based 
> web service, we can access any given object individually by using a URL 
> related to the original storage URL within our system.
>
> This also allows us to take advantage of S3 for images on our website. All of 
> the images in our online collections database are being served straight 
> from S3, which diverts the load from our public web server. When we launch 
> zoomable images later this year, all of the tiles will also be generated 
> locally in the DAM and then served to the public via the mirrored copy in S3.
>
> The current pricing is around $0.08/GB/month for 1-50 TB, which I think is 
> fairly reasonable for what we're getting. They just dropped the price 
> substantially a few months ago.
>
> DuraCloud http://www.duracloud.org/ supposedly offers a way to add another 
> abstraction layer so you can build something like this that is portable 
> between different cloud storage providers. But I haven't really looked into 
> this as of yet.
>
> -David
>
>
> __
>
> David Dwiggins
> Systems Librarian/Archivist, Historic New England
> 141 Cambridge Street, Boston, MA 02114
> (617) 994-5948
> ddwigg...@historicnewengland.org
> http://www.historicnewengland.org
>>>> Joshua Welker  1/10/2013 5:20 PM >>>
> Hi everyone,
>
> We are starting a digitization project for some of our special collections, 
> and we are having a hard time setting up a backup system that meets the 
> long-term preservation needs of digital archives. The backup mechanisms 
> currently used by campus IT are short-term full-server backups. What we are 
> looking for is more granular, file-level backup over the very long term. Does 
> anyone have any recommendations of software or some service or technique? We 
> are looking into LOCKSS but haven't dug too deeply yet. Can anyone who uses 
> LOCKSS tell me a bit of their experiences with it?
>
> Josh Welker
> Electronic/Media Services Librarian
> College Liaison
> University Libraries
> Southwest Baptist University
> 417.328.1624


Re: [CODE4LIB] Digital collection backups

2013-01-11 Thread Gary McGath
Concerns have been raised about how expensive Glacier gets if you need
to recover a lot of files in a short time period.

http://www.wired.com/wiredenterprise/2012/08/glacier/

On 1/10/13 5:56 PM, Roy Tennant wrote:
> I'd also take a look at Amazon Glacier. Recently I parked about 50GB
> of data files in logical tar'd and gzip'd chunks and it's costing my
> employer less than 50 cents/month. Glacier, however, is best for "park
> it and forget" kinds of needs, as the real cost is in data flow.
> Storage is cheap, but must be considered "offline" or "near line" as
> you must first request to retrieve a file, wait for about a day, and
> then retrieve the file. And you're charged more for the download
> throughput than just about anything.
> 
> I'm using a Unix client to handle all of the heavy lifting of
> uploading and downloading, as Glacier is meant to be used via an API
> rather than a web client.[1] If anyone is interested, I have local
> documentation on usage that I could probably genericize. And yes, I
> did round-trip a file to make sure it functioned as advertised.
> Roy
> 
> [1] https://github.com/vsespb/mt-aws-glacier
> 
> On Thu, Jan 10, 2013 at 2:29 PM,   wrote:
>> We built our own solution for this by creating a plugin that works with our 
>> digital asset management system (ResourceSpace) to individually back up files 
>> to Amazon S3. Because S3 is replicated to multiple data centers, this 
>> provides a fairly high level of redundancy. And because it's an object-based 
>> web service, we can access any given object individually by using a URL 
>> related to the original storage URL within our system.
>>
>> This also allows us to take advantage of S3 for images on our website. All 
>> of the images in our online collections database are being served 
>> straight from S3, which diverts the load from our public web server. When we 
>> launch zoomable images later this year, all of the tiles will also be 
>> generated locally in the DAM and then served to the public via the mirrored 
>> copy in S3.
>>
>> The current pricing is around $0.08/GB/month for 1-50 TB, which I think is 
>> fairly reasonable for what we're getting. They just dropped the price 
>> substantially a few months ago.
>>
>> DuraCloud http://www.duracloud.org/ supposedly offers a way to add another 
>> abstraction layer so you can build something like this that is portable 
>> between different cloud storage providers. But I haven't really looked into 
>> this as of yet.


-- 
Gary McGath, Professional Software Developer
http://www.garymcgath.com


Re: [CODE4LIB] Digital collection backups

2013-01-11 Thread Joshua Welker
David,

That sounds like a definite option. Thanks. Does S3 have an API for uploading so 
that the upload process could be scripted, or do you manually upload each file?
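
S3 does have an upload API that is easy to script. A minimal sketch with the
current AWS SDK for Python (boto3) is below; the bucket name and local
directory are placeholders, and at the time of this thread the older boto
library or a command-line tool such as s3cmd would have been the likelier
choice.

    import boto3
    from pathlib import Path

    BUCKET = "my-library-preservation-bucket"      # placeholder bucket name
    MASTERS = Path("/archives/digitized/masters")  # placeholder local directory

    s3 = boto3.client("s3")  # credentials come from the usual AWS config/env vars

    for path in MASTERS.rglob("*"):
        if path.is_file():
            # Mirror the local directory layout into the object key.
            key = path.relative_to(MASTERS).as_posix()
            s3.upload_file(str(path), BUCKET, key)
            print(f"uploaded {path} -> s3://{BUCKET}/{key}")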

Josh Welker


-Original Message-
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of 
ddwigg...@historicnewengland.org
Sent: Thursday, January 10, 2013 4:29 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] Digital collection backups

We built our own solution for this by creating a plugin that works with our 
digital asset management system (ResourceSpace) to individually back up files to 
Amazon S3. Because S3 is replicated to multiple data centers, this provides a 
fairly high level of redundancy. And because it's an object-based web service, 
we can access any given object individually by using a URL related to the 
original storage URL within our system.
 
This also allows us to take advantage of S3 for images on our website. All of 
the images in our online collections database are being served straight 
from S3, which diverts the load from our public web server. When we launch 
zoomable images later this year, all of the tiles will also be generated 
locally in the DAM and then served to the public via the mirrored copy in S3.
 
The current pricing is around $0.08/GB/month for 1-50 TB, which I think is 
fairly reasonable for what we're getting. They just dropped the price 
substantially a few months ago.
 
DuraCloud http://www.duracloud.org/ supposedly offers a way to add another 
abstraction layer so you can build something like this that is portable between 
different cloud storage providers. But I haven't really looked into this as of 
yet.
 
-David

 
__
 
David Dwiggins
Systems Librarian/Archivist, Historic New England
141 Cambridge Street, Boston, MA 02114
(617) 994-5948
ddwigg...@historicnewengland.org
http://www.historicnewengland.org
>>> Joshua Welker  1/10/2013 5:20 PM >>>
Hi everyone,

We are starting a digitization project for some of our special collections, and 
we are having a hard time setting up a backup system that meets the long-term 
preservation needs of digital archives. The backup mechanisms currently used by 
campus IT are short-term full-server backups. What we are looking for is more 
granular, file-level backup over the very long term. Does anyone have any 
recommendations of software or some service or technique? We are looking into 
LOCKSS but haven't dug too deeply yet. Can anyone who uses LOCKSS tell me a bit 
of their experiences with it?

Josh Welker
Electronic/Media Services Librarian
College Liaison
University Libraries
Southwest Baptist University
417.328.1624


Re: [CODE4LIB] Digital collection backups

2013-01-10 Thread Chris Cormack
Obnam http://liw.fi/obnam/ might do what you need with the minimum of fuss

Chris

On 11 January 2013 12:05, Fleming, Declan  wrote:
> Hi - you might look into Chronopolis (which can be front ended by DuraCloud 
> or not)  http://chronopolis.sdsc.edu/
>
> Declan
>
> -Original Message-
> From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Roy 
> Tennant
> Sent: Thursday, January 10, 2013 2:56 PM
> To: CODE4LIB@LISTSERV.ND.EDU
> Subject: Re: [CODE4LIB] Digital collection backups
>
> I'd also take a look at Amazon Glacier. Recently I parked about 50GB of data 
> files in logical tar'd and gzip'd chunks and it's costing my employer less 
> than 50 cents/month. Glacier, however, is best for "park it and forget" kinds 
> of needs, as the real cost is in data flow.
> Storage is cheap, but must be considered "offline" or "near line" as you must 
> first request to retrieve a file, wait for about a day, and then retrieve the 
> file. And you're charged more for the download throughput than just about 
> anything.
>
> I'm using a Unix client to handle all of the heavy lifting of uploading and 
> downloading, as Glacier is meant to be used via an API rather than a web 
> client.[1] If anyone is interested, I have local documentation on usage that 
> I could probably genericize. And yes, I did round-trip a file to make sure it 
> functioned as advertised.
> Roy
>
> [1] https://github.com/vsespb/mt-aws-glacier
>
> On Thu, Jan 10, 2013 at 2:29 PM,   wrote:
>> We built our own solution for this by creating a plugin that works with our 
>> digital asset management system (ResourceSpace) to individually back up files 
>> to Amazon S3. Because S3 is replicated to multiple data centers, this 
>> provides a fairly high level of redundancy. And because it's an object-based 
>> web service, we can access any given object individually by using a URL 
>> related to the original storage URL within our system.
>>
>> This also allows us to take advantage of S3 for images on our website. All 
>> of the images in our online collections database are being served 
>> straight from S3, which diverts the load from our public web server. When we 
>> launch zoomable images later this year, all of the tiles will also be 
>> generated locally in the DAM and then served to the public via the mirrored 
>> copy in S3.
>>
>> The current pricing is around $0.08/GB/month for 1-50 TB, which I think is 
>> fairly reasonable for what we're getting. They just dropped the price 
>> substantially a few months ago.
>>
>> DuraCloud http://www.duracloud.org/ supposedly offers a way to add another 
>> abstraction layer so you can build something like this that is portable 
>> between different cloud storage providers. But I haven't really looked into 
>> this as of yet.
>>
>> -David
>>
>>
>> __
>>
>> David Dwiggins
>> Systems Librarian/Archivist, Historic New England
>> 141 Cambridge Street, Boston, MA 02114
>> (617) 994-5948
>> ddwigg...@historicnewengland.org
>> http://www.historicnewengland.org
>>>>> Joshua Welker  1/10/2013 5:20 PM >>>
>> Hi everyone,
>>
>> We are starting a digitization project for some of our special collections, 
>> and we are having a hard time setting up a backup system that meets the 
>> long-term preservation needs of digital archives. The backup mechanisms 
>> currently used by campus IT are short-term full-server backups. What we are 
>> looking for is more granular, file-level backup over the very long term. 
>> Does anyone have any recommendations of software or some service or 
>> technique? We are looking into LOCKSS but haven't dug too deeply yet. Can 
>> anyone who uses LOCKSS tell me a bit of their experiences with it?
>>
>> Josh Welker
>> Electronic/Media Services Librarian
>> College Liaison
>> University Libraries
>> Southwest Baptist University
>> 417.328.1624


Re: [CODE4LIB] Digital collection backups

2013-01-10 Thread Fleming, Declan
Hi - you might look into Chronopolis (which can be front ended by DuraCloud or 
not)  http://chronopolis.sdsc.edu/

Declan

-Original Message-
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Roy 
Tennant
Sent: Thursday, January 10, 2013 2:56 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] Digital collection backups

I'd also take a look at Amazon Glacier. Recently I parked about 50GB of data 
files in logical tar'd and gzip'd chunks and it's costing my employer less than 
50 cents/month. Glacier, however, is best for "park it and forget" kinds of 
needs, as the real cost is in data flow.
Storage is cheap, but must be considered "offline" or "near line" as you must 
first request to retrieve a file, wait for about a day, and then retrieve the 
file. And you're charged more for the download throughput than just about 
anything.

I'm using a Unix client to handle all of the heavy lifting of uploading and 
downloading, as Glacier is meant to be used via an API rather than a web 
client.[1] If anyone is interested, I have local documentation on usage that I 
could probably genericize. And yes, I did round-trip a file to make sure it 
functioned as advertised.
Roy

[1] https://github.com/vsespb/mt-aws-glacier

On Thu, Jan 10, 2013 at 2:29 PM,   wrote:
> We built our own solution for this by creating a plugin that works with our 
> digital asset management system (ResourceSpace) to individually back up files 
> to Amazon S3. Because S3 is replicated to multiple data centers, this 
> provides a fairly high level of redundancy. And because it's an object-based 
> web service, we can access any given object individually by using a URL 
> related to the original storage URL within our system.
>
> This also allows us to take advantage of S3 for images on our website. All of 
> the images in our online collections database are being served straight 
> from S3, which diverts the load from our public web server. When we launch 
> zoomable images later this year, all of the tiles will also be generated 
> locally in the DAM and then served to the public via the mirrored copy in S3.
>
> The current pricing is around $0.08/GB/month for 1-50 TB, which I think is 
> fairly reasonable for what we're getting. They just dropped the price 
> substantially a few months ago.
>
> DuraCloud http://www.duracloud.org/ supposedly offers a way to add another 
> abstraction layer so you can build something like this that is portable 
> between different cloud storage providers. But I haven't really looked into 
> this as of yet.
>
> -David
>
>
> __
>
> David Dwiggins
> Systems Librarian/Archivist, Historic New England
> 141 Cambridge Street, Boston, MA 02114
> (617) 994-5948
> ddwigg...@historicnewengland.org
> http://www.historicnewengland.org
>>>> Joshua Welker  1/10/2013 5:20 PM >>>
> Hi everyone,
>
> We are starting a digitization project for some of our special collections, 
> and we are having a hard time setting up a backup system that meets the 
> long-term preservation needs of digital archives. The backup mechanisms 
> currently used by campus IT are short-term full-server backups. What we are 
> looking for is more granular, file-level backup over the very long term. Does 
> anyone have any recommendations of software or some service or technique? We 
> are looking into LOCKSS but haven't dug too deeply yet. Can anyone who uses 
> LOCKSS tell me a bit of their experiences with it?
>
> Josh Welker
> Electronic/Media Services Librarian
> College Liaison
> University Libraries
> Southwest Baptist University
> 417.328.1624


Re: [CODE4LIB] Digital collection backups

2013-01-10 Thread Roy Tennant
I'd also take a look at Amazon Glacier. Recently I parked about 50GB
of data files in logical tar'd and gzip'd chunks and it's costing my
employer less than 50 cents/month. Glacier, however, is best for "park
it and forget" kinds of needs, as the real cost is in data flow.
Storage is cheap, but must be considered "offline" or "near line" as
you must first request to retrieve a file, wait for about a day, and
then retrieve the file. And you're charged more for the download
throughput than just about anything.

I'm using a Unix client to handle all of the heavy lifting of
uploading and downloading, as Glacier is meant to be used via an API
rather than a web client.[1] If anyone is interested, I have local
documentation on usage that I could probably genericize. And yes, I
did round-trip a file to make sure it functioned as advertised.
Roy

[1] https://github.com/vsespb/mt-aws-glacier
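
The same round trip can also be sketched directly against the Glacier API with
the current AWS SDK for Python (boto3); the vault and file names below are
placeholders. Note that the retrieval half only initiates a job, since the
archive is not available for download until the job completes, typically
several hours later.

    import boto3

    VAULT = "digital-collections"   # placeholder vault name
    glacier = boto3.client("glacier")

    # Upload: one tar'd and gzip'd chunk becomes one Glacier archive.
    with open("collection-2013-01.tar.gz", "rb") as f:
        resp = glacier.upload_archive(vaultName=VAULT,
                                      archiveDescription="collection-2013-01",
                                      body=f)
    archive_id = resp["archiveId"]   # keep this; it is the only handle for retrieval

    # Retrieval is asynchronous: initiate a job now, then fetch the output
    # later (describe_job / get_job_output) once the job has completed.
    job = glacier.initiate_job(vaultName=VAULT,
                               jobParameters={"Type": "archive-retrieval",
                                              "ArchiveId": archive_id})
    print("retrieval job started:", job["jobId"])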

On Thu, Jan 10, 2013 at 2:29 PM,   wrote:
> We built our own solution for this by creating a plugin that works with our 
> digital asset management system (ResourceSpace) to individually back up files 
> to Amazon S3. Because S3 is replicated to multiple data centers, this 
> provides a fairly high level of redundancy. And because it's an object-based 
> web service, we can access any given object individually by using a URL 
> related to the original storage URL within our system.
>
> This also allows us to take advantage of S3 for images on our website. All of 
> the images in our online collections database are being served straight 
> from S3, which diverts the load from our public web server. When we launch 
> zoomable images later this year, all of the tiles will also be generated 
> locally in the DAM and then served to the public via the mirrored copy in S3.
>
> The current pricing is around $0.08/GB/month for 1-50 TB, which I think is 
> fairly reasonable for what we're getting. They just dropped the price 
> substantially a few months ago.
>
> DuraCloud http://www.duracloud.org/ supposedly offers a way to add another 
> abstraction layer so you can build something like this that is portable 
> between different cloud storage providers. But I haven't really looked into 
> this as of yet.
>
> -David
>
>
> __
>
> David Dwiggins
> Systems Librarian/Archivist, Historic New England
> 141 Cambridge Street, Boston, MA 02114
> (617) 994-5948
> ddwigg...@historicnewengland.org
> http://www.historicnewengland.org
>>>> Joshua Welker  1/10/2013 5:20 PM >>>
> Hi everyone,
>
> We are starting a digitization project for some of our special collections, 
> and we are having a hard time setting up a backup system that meets the 
> long-term preservation needs of digital archives. The backup mechanisms 
> currently used by campus IT are short-term full-server backups. What we are 
> looking for is more granular, file-level backup over the very long term. Does 
> anyone have any recommendations of software or some service or technique? We 
> are looking into LOCKSS but haven't dug too deeply yet. Can anyone who uses 
> LOCKSS tell me a bit of their experiences with it?
>
> Josh Welker
> Electronic/Media Services Librarian
> College Liaison
> University Libraries
> Southwest Baptist University
> 417.328.1624


Re: [CODE4LIB] Digital collection backups

2013-01-10 Thread ddwiggins
We built our own solution for this by creating a plugin that works with our 
digital asset management system (ResourceSpace) to individually back up files to 
Amazon S3. Because S3 is replicated to multiple data centers, this provides a 
fairly high level of redundancy. And because it's an object-based web service, 
we can access any given object individually by using a URL related to the 
original storage URL within our system.
 
This also allows us to take advantage of S3 for images on our website. All of 
the images in our online collections database are being served straight 
from S3, which diverts the load from our public web server. When we launch 
zoomable images later this year, all of the tiles will also be generated 
locally in the DAM and then served to the public via the mirrored copy in S3.
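 
The path-to-URL mapping is the key piece: because the object key is derived
from the file's location in the DAM file store, the mirrored copy and its
public URL are always predictable from the original storage path. A rough
sketch of the idea (not the actual ResourceSpace plugin; the bucket name and
file-store layout below are made up):

    BUCKET = "collections-mirror"                      # placeholder bucket name
    LOCAL_ROOT = "/var/www/resourcespace/filestore"    # placeholder DAM file store

    def s3_key(local_path):
        """Reuse the DAM's own storage layout as the S3 object key."""
        return local_path.removeprefix(LOCAL_ROOT).lstrip("/")

    def public_url(local_path):
        """Public URL of the mirrored object, usable directly in web pages."""
        return f"https://{BUCKET}.s3.amazonaws.com/{s3_key(local_path)}"

    print(public_url("/var/www/resourcespace/filestore/1/2/3_abc/image.jpg"))
    # -> https://collections-mirror.s3.amazonaws.com/1/2/3_abc/image.jpg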
 
The current pricing is around $0.08/GB/month for 1-50 TB, which I think is 
fairly reasonable for what we're getting. They just dropped the price 
substantially a few months ago.
 
DuraCloud http://www.duracloud.org/ supposedly offers a way to add another 
abstraction layer so you can build something like this that is portable between 
different cloud storage providers. But I haven't really looked into this as of 
yet.
 
-David

 
__
 
David Dwiggins
Systems Librarian/Archivist, Historic New England
141 Cambridge Street, Boston, MA 02114
(617) 994-5948
ddwigg...@historicnewengland.org
http://www.historicnewengland.org
>>> Joshua Welker  1/10/2013 5:20 PM >>>
Hi everyone,

We are starting a digitization project for some of our special collections, and 
we are having a hard time setting up a backup system that meets the long-term 
preservation needs of digital archives. The backup mechanisms currently used by 
campus IT are short-term full-server backups. What we are looking for is more 
granular, file-level backup over the very long term. Does anyone have any 
recommendations of software or some service or technique? We are looking into 
LOCKSS but haven't dug too deeply yet. Can anyone who uses LOCKSS tell me a bit 
of their experiences with it?

Josh Welker
Electronic/Media Services Librarian
College Liaison
University Libraries
Southwest Baptist University
417.328.1624