Re: [CODE4LIB] Digital collection backups
I think we have a winning idea here. Thanks.

Josh Welker

-----Original Message-----
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Randy Fischer
Sent: Friday, January 11, 2013 3:46 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] Digital collection backups

On Fri, Jan 11, 2013 at 2:45 PM, Joshua Welker wrote:
> Reading the Glacier FAQ on Amazon's site, it looks like they provide an archive inventory (updated daily) that can be downloaded as JSON. I read some users saying that this inventory includes checksum data. So hopefully it will just be a matter of comparing the local checksum to the Glacier checksum, and that would be easy enough to script.

One could also occasionally spin up local EC2 instances to do the checksums in the same data center, and ship just that metadata down - you would not incur any bulk transfer costs in that case (if memory serves).

DAITSS uses both md5 and sha1 checksums in combination; other preservation systems might require similar.

-Randy Fischer
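For a sense of what that comparison script could look like, here is a minimal sketch (an illustration, not code from the thread). One wrinkle worth knowing up front: Glacier reports a SHA-256 *tree* hash (SHA-256 over 1 MiB leaf chunks, combined pairwise up to a root), not a flat digest of the file, so the local side has to compute the same construction. The audit helper and the manifest mapping archive descriptions to local paths are assumptions about how you track your uploads.

import hashlib
import json

CHUNK = 1024 * 1024  # Glacier tree hashes are defined over 1 MiB leaf chunks

def tree_hash(path):
    """SHA-256 tree hash: hash each 1 MiB chunk, then combine adjacent
    pairs level by level until a single root digest remains."""
    with open(path, "rb") as f:
        level = [hashlib.sha256(chunk).digest()
                 for chunk in iter(lambda: f.read(CHUNK), b"")]
    if not level:  # empty-file edge case
        level = [hashlib.sha256(b"").digest()]
    while len(level) > 1:
        pairs = [level[i:i + 2] for i in range(0, len(level), 2)]
        level = [hashlib.sha256(b"".join(p)).digest() if len(p) == 2 else p[0]
                 for p in pairs]
    return level[0].hex()

def audit(inventory_path, manifest):
    """manifest maps ArchiveDescription -> local path (hypothetical: assumes
    you set meaningful archive descriptions at upload time)."""
    with open(inventory_path) as f:
        inventory = json.load(f)
    for entry in inventory.get("ArchiveList", []):
        local = manifest.get(entry.get("ArchiveDescription", ""))
        if local is None:
            print("not held locally:", entry["ArchiveId"][:16], "...")
        elif tree_hash(local) != entry["SHA256TreeHash"]:
            print("MISMATCH:", local)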
Re: [CODE4LIB] Digital collection backups
On Fri, Jan 11, 2013 at 2:45 PM, Joshua Welker wrote:
> Reading the Glacier FAQ on Amazon's site, it looks like they provide an archive inventory (updated daily) that can be downloaded as JSON. I read some users saying that this inventory includes checksum data. So hopefully it will just be a matter of comparing the local checksum to the Glacier checksum, and that would be easy enough to script.

One could also occasionally spin up local EC2 instances to do the checksums in the same data center, and ship just that metadata down - you would not incur any bulk transfer costs in that case (if memory serves).

DAITSS uses both md5 and sha1 checksums in combination; other preservation systems might require similar.

-Randy Fischer
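Retrieving that daily inventory is itself an asynchronous Glacier job. A minimal sketch of fetching it follows; note this uses boto3, which postdates this 2013 thread (boto was the SDK of the day), and the vault name is hypothetical. An SNS notification is the politer alternative to the polling loop shown here.

import json
import time

import boto3  # modern SDK, for illustration; the thread's era used boto

glacier = boto3.client("glacier")
VAULT = "example-vault"  # hypothetical vault name

# Inventory retrieval is asynchronous and can take hours to complete.
job = glacier.initiate_job(
    accountId="-",  # "-" means the account owning the credentials
    vaultName=VAULT,
    jobParameters={"Type": "inventory-retrieval", "Format": "JSON"},
)

while not glacier.describe_job(accountId="-", vaultName=VAULT,
                               jobId=job["jobId"])["Completed"]:
    time.sleep(900)

out = glacier.get_job_output(accountId="-", vaultName=VAULT, jobId=job["jobId"])
inventory = json.loads(out["body"].read())
print(len(inventory["ArchiveList"]), "archives in inventory")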
Re: [CODE4LIB] Digital collection backups
Be careful about assuming too much on this. When I started working with S3, the system required an MD5 sum to upload, and would respond to requests with this "etag" in the header as well. I therefore assumed that this was integral to the system and was a good way to compare local files against the remote copies.

Then, maybe a year or two ago, Amazon introduced chunked uploads, so that you could send files in pieces and reassemble them once they got to S3. This was good, because it eliminated problems with huge files failing to upload due to network hiccups. I went ahead and implemented it in my scripts. Then, all of a sudden, I started getting invalid checksums. It turns out that for multipart file uploads, they now create etag identifiers that are not the MD5 sum of the underlying files.

I now store the checksum as a separate piece of header metadata, and my sync script does periodically compare against this. But since this is just metadata, checking it doesn't really prove anything about the underlying file that Amazon has. To do that, I would need to write a script that would actually retrieve the file and rerun the checksum. I have not done this yet, although it is on my to-do list. This would ideally happen on an Amazon server so that I wouldn't have to send the file back and forth.

In any case, my main point is: don't assume that you can just check against a checksum from the API to verify a file for digital preservation purposes.

-David

__
David Dwiggins
Systems Librarian/Archivist, Historic New England
141 Cambridge Street, Boston, MA 02114
(617) 994-5948
ddwigg...@historicnewengland.org
http://www.historicnewengland.org

>>> Joshua Welker 1/11/2013 2:45 PM >>>
Thanks for bringing up the issue of the cost of making sure the data is consistent. We will be using DSpace for now, and I know DSpace has some checksum functionality built in out-of-the-box. It shouldn't be too difficult to write a script that loops through DSpace's checksum data and compares it against the files in Glacier. Reading the Glacier FAQ on Amazon's site, it looks like they provide an archive inventory (updated daily) that can be downloaded as JSON. I read some users saying that this inventory includes checksum data. So hopefully it will just be a matter of comparing the local checksum to the Glacier checksum, and that would be easy enough to script.

Josh Welker

-----Original Message-----
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Ryan Eby
Sent: Friday, January 11, 2013 11:37 AM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] Digital collection backups

As Aaron alludes to, your decision should be based on your real needs, and the options might not be exclusive.

LOCKSS/MetaArchive might be worth the money if it is the community archival aspect you are going for. Depending on your institution, being a participant might make political/mission sense regardless of the storage needs, and it could just be a specific collection that makes sense.

Glacier is a great choice if you are looking to spread a backup across regions. S3 similarly, if you also want to benefit from CloudFront (the CDN setup) to take load off your institution's server (you can now use CloudFront off your own origin server as well). Depending on your bandwidth, this might be worth the money regardless of LOCKSS participation (which can be more dark). Amazon also tends to drop prices over time rather than raise them, but as with any outsourcing you have to plan for the possibility that it might not exist in the future.

Also look more at Glacier prices in terms of checking your data for consistency. There have been a few papers on the costs of making sure Amazon really has the proper data, depending on how often your requirements want you to check.

Another option, if you are just looking for more geo placement, is finding an institution or service provider that will colocate. There may be another small institution that would love to shove a cheap box with hard drives on your network in exchange for the same. Not as involved/formal as LOCKSS, but it gives you something you control to satisfy your requirements. It could also be as low tech as shipping SSDs to another institution who then runs some bagit checksums on the drive, etc.

All of the above should be scriptable in your workflow. Just need to decide what you really want out of it.

Eby

On Fri, Jan 11, 2013 at 11:52 AM, Aaron Trehub wrote:
> Hello Josh,
>
> Auburn University is a member of two Private LOCKSS Networks: the MetaArchive Cooperative and the Alabama Digital Preservation Network (ADPNet). Here's a link to a recent conference paper that describes both networks, including their current pricing structures:
>
> http://conference.ifla.org/past/ifla78/216-trehub-en.pdf
>
> LOCKSS has worked well for us so far, in part because supporting community-based solutions is important to us. [...]
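David's multipart-ETag caveat above is easy to reproduce and work around: write your own digest into the object's user metadata at upload time, as he describes. A sketch, again using boto3 for illustration (the bucket name is hypothetical):

import hashlib

import boto3  # illustrative; see the SDK-vintage caveat earlier

s3 = boto3.client("s3")
BUCKET = "example-preservation-bucket"  # hypothetical

def file_md5(path):
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def upload_with_checksum(path, key):
    # Store our own digest as user metadata, because the ETag of a
    # multipart upload is NOT the MD5 of the assembled file.
    s3.upload_file(path, BUCKET, key,
                   ExtraArgs={"Metadata": {"md5": file_md5(path)}})

def metadata_matches(path, key):
    # David's caveat still applies: this only compares metadata we wrote
    # ourselves. Proving the stored bytes are intact means re-downloading
    # and re-hashing, ideally from an EC2 instance in the same region.
    remote = s3.head_object(Bucket=BUCKET, Key=key)["Metadata"].get("md5")
    return remote == file_md5(path)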
Re: [CODE4LIB] Digital collection backups
Awesome! Thanks. I will look into this for sure.

Josh Welker

-----Original Message-----
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Tim Donohue
Sent: Friday, January 11, 2013 2:30 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] Digital collection backups

Hi Josh,

Now that you bring up DSpace as being part of the equation, you might want to look at the newly released "Replication Task Suite" plugin/addon for DSpace (supports DSpace versions 1.8.x & 3.0):

https://wiki.duraspace.org/display/DSPACE/ReplicationTaskSuite

This DSpace plugin does essentially what you are talking about. It allows you to back up (i.e. replicate) DSpace content files and metadata (in the form of a set of AIPs, Archival Information Packages) to a local filesystem/drive or to cloud storage. Plus, it provides an "auditing" tool to audit changes between DSpace and the cloud storage provider.

Currently, the only cloud storage plugin we have created for the Replication Task Suite is for DuraCloud. But it wouldn't be too hard to create a new plugin for Glacier (if you wanted to send DSpace content directly to Glacier without DuraCloud in between). The code is on GitHub at:

https://github.com/DSpace/dspace-replicate

If you decide to use it and create anything cool, feel free to send us a pull request.

Good luck,

- Tim

--
Tim Donohue
Technical Lead for DSpace Project
DuraSpace.org

On 1/11/2013 1:45 PM, Joshua Welker wrote:
> Thanks for bringing up the issue of the cost of making sure the data is consistent. We will be using DSpace for now, and I know DSpace has some checksum functionality built in out-of-the-box. It shouldn't be too difficult to write a script that loops through DSpace's checksum data and compares it against the files in Glacier. Reading the Glacier FAQ on Amazon's site, it looks like they provide an archive inventory (updated daily) that can be downloaded as JSON. I read some users saying that this inventory includes checksum data. So hopefully it will just be a matter of comparing the local checksum to the Glacier checksum, and that would be easy enough to script.
>
> Josh Welker
>
> -----Original Message-----
> From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Ryan Eby
> Sent: Friday, January 11, 2013 11:37 AM
> To: CODE4LIB@LISTSERV.ND.EDU
> Subject: Re: [CODE4LIB] Digital collection backups
>
> As Aaron alludes to, your decision should be based on your real needs, and the options might not be exclusive.
>
> LOCKSS/MetaArchive might be worth the money if it is the community archival aspect you are going for. Depending on your institution, being a participant might make political/mission sense regardless of the storage needs, and it could just be a specific collection that makes sense.
>
> Glacier is a great choice if you are looking to spread a backup across regions. S3 similarly, if you also want to benefit from CloudFront (the CDN setup) to take load off your institution's server (you can now use CloudFront off your own origin server as well). Depending on your bandwidth, this might be worth the money regardless of LOCKSS participation (which can be more dark). Amazon also tends to drop prices over time rather than raise them, but as with any outsourcing you have to plan for the possibility that it might not exist in the future. Also look more at Glacier prices in terms of checking your data for consistency. There have been a few papers on the costs of making sure Amazon really has the proper data, depending on how often your requirements want you to check.
>
> Another option, if you are just looking for more geo placement, is finding an institution or service provider that will colocate. There may be another small institution that would love to shove a cheap box with hard drives on your network in exchange for the same. Not as involved/formal as LOCKSS, but it gives you something you control to satisfy your requirements. It could also be as low tech as shipping SSDs to another institution who then runs some bagit checksums on the drive, etc.
>
> All of the above should be scriptable in your workflow. Just need to decide what you really want out of it.
>
> Eby
>
> On Fri, Jan 11, 2013 at 11:52 AM, Aaron Trehub wrote:
>> Hello Josh,
>>
>> Auburn University is a member of two Private LOCKSS Networks: the MetaArchive Cooperative and the Alabama Digital Preservation Network (ADPNet). Here's a link to a recent conference paper that describes both networks, including their current pricing structures:
>>
>> http://conference.ifla.org/past/ifla78/216-trehub-en.pdf
>>
>> LOCKSS has worked well for us so far [...]
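The Replication Task Suite tasks Tim describes run through DSpace's standard curation command line, so they can be scheduled from cron. A sketch of what that wiring might look like; the install path, handle, and task names here are assumptions for illustration, so check the wiki page Tim links for the actual task list.

import subprocess

DSPACE = "/dspace/bin/dspace"  # hypothetical install path

# e.g. package an item/collection as AIPs and push them to storage
subprocess.run([DSPACE, "curate", "-t", "transmitaip", "-i", "123456789/42"],
               check=True)  # handle is an example

# ...and later, audit what is in storage against the repository
subprocess.run([DSPACE, "curate", "-t", "verifyaip", "-i", "123456789/42"],
               check=True)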
Re: [CODE4LIB] Digital collection backups
Hi Josh,

Now that you bring up DSpace as being part of the equation, you might want to look at the newly released "Replication Task Suite" plugin/addon for DSpace (supports DSpace versions 1.8.x & 3.0):

https://wiki.duraspace.org/display/DSPACE/ReplicationTaskSuite

This DSpace plugin does essentially what you are talking about. It allows you to back up (i.e. replicate) DSpace content files and metadata (in the form of a set of AIPs, Archival Information Packages) to a local filesystem/drive or to cloud storage. Plus, it provides an "auditing" tool to audit changes between DSpace and the cloud storage provider.

Currently, the only cloud storage plugin we have created for the Replication Task Suite is for DuraCloud. But it wouldn't be too hard to create a new plugin for Glacier (if you wanted to send DSpace content directly to Glacier without DuraCloud in between). The code is on GitHub at:

https://github.com/DSpace/dspace-replicate

If you decide to use it and create anything cool, feel free to send us a pull request.

Good luck,

- Tim

--
Tim Donohue
Technical Lead for DSpace Project
DuraSpace.org

On 1/11/2013 1:45 PM, Joshua Welker wrote:

Thanks for bringing up the issue of the cost of making sure the data is consistent. We will be using DSpace for now, and I know DSpace has some checksum functionality built in out-of-the-box. It shouldn't be too difficult to write a script that loops through DSpace's checksum data and compares it against the files in Glacier. Reading the Glacier FAQ on Amazon's site, it looks like they provide an archive inventory (updated daily) that can be downloaded as JSON. I read some users saying that this inventory includes checksum data. So hopefully it will just be a matter of comparing the local checksum to the Glacier checksum, and that would be easy enough to script.

Josh Welker

-----Original Message-----
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Ryan Eby
Sent: Friday, January 11, 2013 11:37 AM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] Digital collection backups

As Aaron alludes to, your decision should be based on your real needs, and the options might not be exclusive.

LOCKSS/MetaArchive might be worth the money if it is the community archival aspect you are going for. Depending on your institution, being a participant might make political/mission sense regardless of the storage needs, and it could just be a specific collection that makes sense.

Glacier is a great choice if you are looking to spread a backup across regions. S3 similarly, if you also want to benefit from CloudFront (the CDN setup) to take load off your institution's server (you can now use CloudFront off your own origin server as well). Depending on your bandwidth, this might be worth the money regardless of LOCKSS participation (which can be more dark). Amazon also tends to drop prices over time rather than raise them, but as with any outsourcing you have to plan for the possibility that it might not exist in the future. Also look more at Glacier prices in terms of checking your data for consistency. There have been a few papers on the costs of making sure Amazon really has the proper data, depending on how often your requirements want you to check.

Another option, if you are just looking for more geo placement, is finding an institution or service provider that will colocate. There may be another small institution that would love to shove a cheap box with hard drives on your network in exchange for the same. Not as involved/formal as LOCKSS, but it gives you something you control to satisfy your requirements. It could also be as low tech as shipping SSDs to another institution who then runs some bagit checksums on the drive, etc.

All of the above should be scriptable in your workflow. Just need to decide what you really want out of it.

Eby

On Fri, Jan 11, 2013 at 11:52 AM, Aaron Trehub wrote:

Hello Josh,

Auburn University is a member of two Private LOCKSS Networks: the MetaArchive Cooperative and the Alabama Digital Preservation Network (ADPNet). Here's a link to a recent conference paper that describes both networks, including their current pricing structures:

http://conference.ifla.org/past/ifla78/216-trehub-en.pdf

LOCKSS has worked well for us so far, in part because supporting community-based solutions is important to us. As you point out, however, Glacier is an attractive alternative, especially for institutions that may be more interested in low-cost, low-throughput storage and less concerned about entrusting their content to a commercial outfit or having to pay extra to get it back out. As with most things, you pay your money--more or less, depending--and make your choice. And take your risks.

Good luck with whatever solution(s) you decide on. They need not be mutually exclusive.

Best,

Aaron

Aaron Trehub
Assistant Dean for Technology and Technical Services
Auburn University Libraries [...]
Re: [CODE4LIB] Digital collection backups
On Fri, Jan 11, 2013 at 07:45:21PM +0000, Joshua Welker wrote:
> Thanks for bringing up the issue of the cost of making sure the data is consistent. We will be using DSpace for now, and I know DSpace has some checksum functionality built in out-of-the-box. It shouldn't be too difficult to write a script that loops through DSpace's checksum data and compares it against the files in Glacier. Reading the Glacier FAQ on Amazon's site, it looks like they provide an archive inventory (updated daily) that can be downloaded as JSON. I read some users saying that this inventory includes checksum data. So hopefully it will just be a matter of comparing the local checksum to the Glacier checksum, and that would be easy enough to script.

An important question to ask here, though, is whether that included checksum data is the same checksum Amazon uses to perform the "systematic data integrity checks" they mention in the Glacier FAQ, or if it's just catalog data --- "here's the checksum when we put it in." This is always the question we run into when we consider services like this: can we tease out enough information to convince ourselves that their checking is sufficient?

--
Thomas L. Kula | tlk2...@columbia.edu
Systems Engineer | Library Information Technology Office
The Libraries, Columbia University in the City of New York
Re: [CODE4LIB] Digital collection backups
Thanks for bringing up the issue of the cost of making sure the data is consistent. We will be using DSpace for now, and I know DSpace has some checksum functionality built in out-of-the-box. It shouldn't be too difficult to write a script that loops through DSpace's checksum data and compares it against the files in Glacier. Reading the Glacier FAQ on Amazon's site, it looks like they provide an archive inventory (updated daily) that can be downloaded as JSON. I read some users saying that this inventory includes checksum data. So hopefully it will just be a matter of comparing the local checksum to the Glacier checksum, and that would be easy enough to script.

Josh Welker

-----Original Message-----
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Ryan Eby
Sent: Friday, January 11, 2013 11:37 AM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] Digital collection backups

As Aaron alludes to, your decision should be based on your real needs, and the options might not be exclusive.

LOCKSS/MetaArchive might be worth the money if it is the community archival aspect you are going for. Depending on your institution, being a participant might make political/mission sense regardless of the storage needs, and it could just be a specific collection that makes sense.

Glacier is a great choice if you are looking to spread a backup across regions. S3 similarly, if you also want to benefit from CloudFront (the CDN setup) to take load off your institution's server (you can now use CloudFront off your own origin server as well). Depending on your bandwidth, this might be worth the money regardless of LOCKSS participation (which can be more dark). Amazon also tends to drop prices over time rather than raise them, but as with any outsourcing you have to plan for the possibility that it might not exist in the future. Also look more at Glacier prices in terms of checking your data for consistency. There have been a few papers on the costs of making sure Amazon really has the proper data, depending on how often your requirements want you to check.

Another option, if you are just looking for more geo placement, is finding an institution or service provider that will colocate. There may be another small institution that would love to shove a cheap box with hard drives on your network in exchange for the same. Not as involved/formal as LOCKSS, but it gives you something you control to satisfy your requirements. It could also be as low tech as shipping SSDs to another institution who then runs some bagit checksums on the drive, etc.

All of the above should be scriptable in your workflow. Just need to decide what you really want out of it.

Eby

On Fri, Jan 11, 2013 at 11:52 AM, Aaron Trehub wrote:
> Hello Josh,
>
> Auburn University is a member of two Private LOCKSS Networks: the MetaArchive Cooperative and the Alabama Digital Preservation Network (ADPNet). Here's a link to a recent conference paper that describes both networks, including their current pricing structures:
>
> http://conference.ifla.org/past/ifla78/216-trehub-en.pdf
>
> LOCKSS has worked well for us so far, in part because supporting community-based solutions is important to us. As you point out, however, Glacier is an attractive alternative, especially for institutions that may be more interested in low-cost, low-throughput storage and less concerned about entrusting their content to a commercial outfit or having to pay extra to get it back out. As with most things, you pay your money--more or less, depending--and make your choice. And take your risks.
>
> Good luck with whatever solution(s) you decide on. They need not be mutually exclusive.
>
> Best,
>
> Aaron
>
> Aaron Trehub
> Assistant Dean for Technology and Technical Services
> Auburn University Libraries
> 231 Mell Street, RBD Library
> Auburn, AL 36849-5606
> Phone: (334) 844-1716
> Skype: ajtrehub
> E-mail: treh...@auburn.edu
> URL: http://lib.auburn.edu/
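The DSpace checksum functionality Josh mentions is the built-in checksum checker, normally run from a scheduled job. A sketch of kicking it off (the install path is hypothetical, and the single-pass flag is from memory, so confirm it against your DSpace version's documentation):

import subprocess

# Run DSpace's built-in checksum checker once over all bitstreams.
# "-l" (single pass through every bitstream) is an assumption here.
subprocess.run(["/dspace/bin/dspace", "checker", "-l"], check=True)

# Results accumulate in DSpace's checksum history, which a reconciliation
# script can then compare against the Glacier inventory JSON.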
Re: [CODE4LIB] Digital collection backups
The only scenario I can think of where we'd need to do a full restore is if the server crashes, and for those cases, we are going to have typical short-term imaging setups in place. Our needs beyond that are to make sure our original files are backed up redundantly in some non-volatile location, so that in the event a file on the local server becomes corrupt, we have a high-fidelity copy of the original on hand to use to restore it. Since data decay, I assume, happens rather infrequently and over a long period of time, it's not important for us to be able to restore all the files at once. Like I said, if the server catches on fire and crashes, we have regular off-site tape-based storage to fix those short-term problems.

Josh Welker

-----Original Message-----
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Cary Gordon
Sent: Friday, January 11, 2013 10:27 AM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] Digital collection backups

Restoring 3 TB from Glacier is $370. Add about $90 if you use AWS Import/Export (you provide the device).

Hopefully, this is not something that you would do often.

Cary

On Fri, Jan 11, 2013 at 8:14 AM, Matt Schultz wrote:
> Josh,
>
> Totally understand the resource constraints and the price comparison up-front. As Roy alluded to earlier, it pays with Glacier to envision what your content retrieval scenarios might be, because that $368 up-front could very easily balloon in situations where you need to restore a collection or collections en masse at a later date. Amazon Glacier as a service makes its money on that end. In MetaArchive there is currently no charge for collection retrieval for the sake of a restoration. You are also subject to, and powerless over, Amazon's long-term price hikes with Glacier. Because we are a Cooperative, our members collaboratively work together annually to determine technology preferences, vendors, pricing, cost control, etc. You have a direct seat at the table to help steer the solution in your direction.
>
> On Fri, Jan 11, 2013 at 10:09 AM, Joshua Welker wrote:
>> Matt,
>>
>> I appreciate the information. At that price, it looks like MetaArchive would be a better option than most of the other services mentioned in this thread. At this point, I think it is going to come down to a LOCKSS solution such as what MetaArchive provides or Amazon Glacier. We anticipate our digital collection growing to about 3 TB in the first two years. With Glacier, that would be $368 per year vs $3,072 per year for MetaArchive and LOCKSS. As much as I would like to support library initiatives like LOCKSS, we are a small institution with a very small budget, and the pricing of Glacier is starting to look too good to pass up.
>>
>> Josh Welker
>>
>> -----Original Message-----
>> From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Matt Schultz
>> Sent: Friday, January 11, 2013 8:49 AM
>> To: CODE4LIB@LISTSERV.ND.EDU
>> Subject: Re: [CODE4LIB] Digital collection backups
>>
>> Hi Josh,
>>
>> Glad you are looking into LOCKSS as a potential solution for your needs and that you are thinking beyond simple backup solutions for more long-term preservation. Here at MetaArchive Cooperative we make use of LOCKSS to preserve a range of content/collections from our member institutions.
>>
>> The nice thing (I think) about our approach and our use of LOCKSS as an embedded technology is that you as an institution retain full control over your collections in the preservation network and get to play an active and on-going part in their preservation treatment over time. Storage costs in MetaArchive are competitive ($1/GB/year), and with that you get up to 7 geographic replications. MetaArchive is international at this point, and so your collections really do achieve some safe distance from any disasters that may hit close to home.
>>
>> I'd be more than happy to talk with you further about your collection needs, why we like LOCKSS, and any interest your institution may have in being part of a collaborative approach to preserving your content above and beyond simple backup. Feel free to contact me directly.
>>
>> Matt Schultz
>> Program Manager
>> Educopia Institute, MetaArchive Cooperative
>> http://www.metaarchive.org
>> matt.schu...@metaarchive.org
>> 616-566-3204
>>
>> On Thu, Jan 10, 2013 at 5:20 PM, Joshua Welker wrote:
>>> Hi everyone,
>>>
>>> We are starting a digitization project for some of our special collections [...]
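For the record, the storage figures being traded here reconcile cleanly; the back-of-envelope below assumes Glacier's early-2013 $0.01/GB-month rate (which is what makes Josh's $368 work) and the $1/GB/year MetaArchive rate Matt quotes. Glacier's pricing has changed since, so treat the constants as historical.

GLACIER_PER_GB_MONTH = 0.01     # early-2013 Glacier storage rate (USD)
METAARCHIVE_PER_GB_YEAR = 1.00  # rate quoted by Matt Schultz above

gb = 3 * 1024                   # 3 TB collection
print(f"Glacier:     ${gb * GLACIER_PER_GB_MONTH * 12:,.2f}/year")   # ~$368.64
print(f"MetaArchive: ${gb * METAARCHIVE_PER_GB_YEAR:,.2f}/year")     # $3,072.00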
Re: [CODE4LIB] Digital collection backups
Thanks, I missed the part about DuraCloud as an abstraction layer. I might look into hosting an install of it on the primary server running the digitization platform.

Josh Welker

-----Original Message-----
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Tim Donohue
Sent: Friday, January 11, 2013 12:39 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] Digital collection backups

Hi all,

Just wanted to add some additional details about DuraCloud (mentioned earlier in this thread), in case it is of interest to anyone.

DuraCloud essentially provides an "abstraction layer" (as previously mentioned) above several cloud storage providers. DuraCloud also provides additional preservation services to help manage your content in the cloud (e.g. integrity checks, replication across several storage providers, migration between storage providers, various health/status reports).

The currently supported cloud storage providers include:

- Amazon S3
- Rackspace
- SDSC

There are several other cloud storage providers which are "beta-level" or in development. These include:

- Amazon Glacier (in development)
- Chronopolis (in development)
- Azure (beta)
- iRODS (beta)
- HP Cloud (beta)

DuraCloud is open source (so you could run it on your own server), but it is also offered as a hosted service (through DuraSpace, my employer). You can also try out the hosted service for free for two months.

For much more info, see:

- http://www.duracloud.org
- Pricing for hosted service: http://duracloud.org/content/pricing (the pricing has dropped recently to reflect market changes)
- More technical info / documentation: https://wiki.duraspace.org/display/DURACLOUD/DuraCloud

If it's of interest, I can put folks in touch with the DuraCloud team for more info (or you can email i...@duracloud.org).

- Tim

--
Tim Donohue
Technical Lead for DSpace Project
DuraSpace.org
Re: [CODE4LIB] Digital collection backups
Hi all,

Just wanted to add some additional details about DuraCloud (mentioned earlier in this thread), in case it is of interest to anyone.

DuraCloud essentially provides an "abstraction layer" (as previously mentioned) above several cloud storage providers. DuraCloud also provides additional preservation services to help manage your content in the cloud (e.g. integrity checks, replication across several storage providers, migration between storage providers, various health/status reports).

The currently supported cloud storage providers include:

- Amazon S3
- Rackspace
- SDSC

There are several other cloud storage providers which are "beta-level" or in development. These include:

- Amazon Glacier (in development)
- Chronopolis (in development)
- Azure (beta)
- iRODS (beta)
- HP Cloud (beta)

DuraCloud is open source (so you could run it on your own server), but it is also offered as a hosted service (through DuraSpace, my employer). You can also try out the hosted service for free for two months.

For much more info, see:

- http://www.duracloud.org
- Pricing for hosted service: http://duracloud.org/content/pricing (the pricing has dropped recently to reflect market changes)
- More technical info / documentation: https://wiki.duraspace.org/display/DURACLOUD/DuraCloud

If it's of interest, I can put folks in touch with the DuraCloud team for more info (or you can email i...@duracloud.org).

- Tim

--
Tim Donohue
Technical Lead for DSpace Project
DuraSpace.org
Re: [CODE4LIB] Digital collection backups
As Aaron alludes to, your decision should be based on your real needs, and the options might not be exclusive.

LOCKSS/MetaArchive might be worth the money if it is the community archival aspect you are going for. Depending on your institution, being a participant might make political/mission sense regardless of the storage needs, and it could just be a specific collection that makes sense.

Glacier is a great choice if you are looking to spread a backup across regions. S3 similarly, if you also want to benefit from CloudFront (the CDN setup) to take load off your institution's server (you can now use CloudFront off your own origin server as well). Depending on your bandwidth, this might be worth the money regardless of LOCKSS participation (which can be more dark). Amazon also tends to drop prices over time rather than raise them, but as with any outsourcing you have to plan for the possibility that it might not exist in the future. Also look more at Glacier prices in terms of checking your data for consistency. There have been a few papers on the costs of making sure Amazon really has the proper data, depending on how often your requirements want you to check.

Another option, if you are just looking for more geo placement, is finding an institution or service provider that will colocate. There may be another small institution that would love to shove a cheap box with hard drives on your network in exchange for the same. Not as involved/formal as LOCKSS, but it gives you something you control to satisfy your requirements. It could also be as low tech as shipping SSDs to another institution who then runs some bagit checksums on the drive, etc.

All of the above should be scriptable in your workflow. Just need to decide what you really want out of it.

Eby

On Fri, Jan 11, 2013 at 11:52 AM, Aaron Trehub wrote:
> Hello Josh,
>
> Auburn University is a member of two Private LOCKSS Networks: the MetaArchive Cooperative and the Alabama Digital Preservation Network (ADPNet). Here's a link to a recent conference paper that describes both networks, including their current pricing structures:
>
> http://conference.ifla.org/past/ifla78/216-trehub-en.pdf
>
> LOCKSS has worked well for us so far, in part because supporting community-based solutions is important to us. As you point out, however, Glacier is an attractive alternative, especially for institutions that may be more interested in low-cost, low-throughput storage and less concerned about entrusting their content to a commercial outfit or having to pay extra to get it back out. As with most things, you pay your money--more or less, depending--and make your choice. And take your risks.
>
> Good luck with whatever solution(s) you decide on. They need not be mutually exclusive.
>
> Best,
>
> Aaron
>
> Aaron Trehub
> Assistant Dean for Technology and Technical Services
> Auburn University Libraries
> 231 Mell Street, RBD Library
> Auburn, AL 36849-5606
> Phone: (334) 844-1716
> Skype: ajtrehub
> E-mail: treh...@auburn.edu
> URL: http://lib.auburn.edu/
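Eby's "ship SSDs, run bagit checksums" option is easy to script with the Library of Congress bagit-python package. A sketch follows; the mount paths are hypothetical, and the "checksums" keyword is from recent bagit-python releases (older versions spelled it differently). Pairing md5 with sha1 echoes the DAITSS practice Randy mentions elsewhere in the thread.

import bagit  # Library of Congress bagit-python: pip install bagit

# Bag the drive's payload in place before shipping; make_bag moves the
# files into a data/ directory and writes checksum manifests beside it.
bag = bagit.make_bag("/mnt/transfer-drive/collection",
                     checksums=["md5", "sha1"])

# At the receiving institution, re-open and validate the bag:
bag = bagit.Bag("/mnt/transfer-drive/collection")
bag.validate()  # raises bagit.BagValidationError on any fixity failure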
Re: [CODE4LIB] Digital collection backups
Hello Josh,

Auburn University is a member of two Private LOCKSS Networks: the MetaArchive Cooperative and the Alabama Digital Preservation Network (ADPNet). Here's a link to a recent conference paper that describes both networks, including their current pricing structures:

http://conference.ifla.org/past/ifla78/216-trehub-en.pdf

LOCKSS has worked well for us so far, in part because supporting community-based solutions is important to us. As you point out, however, Glacier is an attractive alternative, especially for institutions that may be more interested in low-cost, low-throughput storage and less concerned about entrusting their content to a commercial outfit or having to pay extra to get it back out. As with most things, you pay your money--more or less, depending--and make your choice. And take your risks.

Good luck with whatever solution(s) you decide on. They need not be mutually exclusive.

Best,

Aaron

Aaron Trehub
Assistant Dean for Technology and Technical Services
Auburn University Libraries
231 Mell Street, RBD Library
Auburn, AL 36849-5606
Phone: (334) 844-1716
Skype: ajtrehub
E-mail: treh...@auburn.edu
URL: http://lib.auburn.edu/

-----Original Message-----
From: Code for Libraries [mailto:CODE4LIB@listserv.nd.edu] On Behalf Of Joshua Welker
Sent: Friday, January 11, 2013 9:09 AM
To: CODE4LIB@listserv.nd.edu
Subject: Re: [CODE4LIB] Digital collection backups

Matt,

I appreciate the information. At that price, it looks like MetaArchive would be a better option than most of the other services mentioned in this thread. At this point, I think it is going to come down to a LOCKSS solution such as what MetaArchive provides or Amazon Glacier. We anticipate our digital collection growing to about 3 TB in the first two years. With Glacier, that would be $368 per year vs $3,072 per year for MetaArchive and LOCKSS. As much as I would like to support library initiatives like LOCKSS, we are a small institution with a very small budget, and the pricing of Glacier is starting to look too good to pass up.

Josh Welker

-----Original Message-----
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Matt Schultz
Sent: Friday, January 11, 2013 8:49 AM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] Digital collection backups

Hi Josh,

Glad you are looking into LOCKSS as a potential solution for your needs and that you are thinking beyond simple backup solutions for more long-term preservation. Here at MetaArchive Cooperative we make use of LOCKSS to preserve a range of content/collections from our member institutions.

The nice thing (I think) about our approach and our use of LOCKSS as an embedded technology is that you as an institution retain full control over your collections in the preservation network and get to play an active and on-going part in their preservation treatment over time. Storage costs in MetaArchive are competitive ($1/GB/year), and with that you get up to 7 geographic replications. MetaArchive is international at this point, and so your collections really do achieve some safe distance from any disasters that may hit close to home.

I'd be more than happy to talk with you further about your collection needs, why we like LOCKSS, and any interest your institution may have in being part of a collaborative approach to preserving your content above and beyond simple backup. Feel free to contact me directly.

Matt Schultz
Program Manager
Educopia Institute, MetaArchive Cooperative
http://www.metaarchive.org
matt.schu...@metaarchive.org
616-566-3204

On Thu, Jan 10, 2013 at 5:20 PM, Joshua Welker wrote:
> Hi everyone,
>
> We are starting a digitization project for some of our special collections, and we are having a hard time setting up a backup system that meets the long-term preservation needs of digital archives. The backup mechanisms currently used by campus IT are short-term full-server backups. What we are looking for is more granular, file-level backup over the very long term. Does anyone have any recommendations of software or some service or technique? We are looking into LOCKSS but haven't dug too deeply yet. Can anyone who uses LOCKSS tell me a bit of their experiences with it?
>
> Josh Welker
> Electronic/Media Services Librarian
> College Liaison
> University Libraries
> Southwest Baptist University
> 417.328.1624

--
Matt Schultz
Program Manager
Educopia Institute, MetaArchive Cooperative
http://www.metaarchive.org
matt.schu...@metaarchive.org
616-566-3204
Re: [CODE4LIB] Digital collection backups
Without looking into any other issues with Glacier (such as privacy, security, etc.), it seems like it could be a good solution for long-term backups in a digital preservation program. I am not sure I would use it for regular backups of my digital preservation system, but for a long-term off-site storage "insurance policy" it is worth looking into. I can picture using it for bi-monthly or quarterly backups, for instance. In this case it would be something you would never hope to use, but it could be good to have in case of a major disaster.

Edward

On Fri, Jan 11, 2013 at 11:27 AM, Cary Gordon wrote:
> Restoring 3 TB from Glacier is $370. Add about $90 if you use AWS Import/Export (you provide the device).
>
> Hopefully, this is not something that you would do often.
>
> Cary
>
> On Fri, Jan 11, 2013 at 8:14 AM, Matt Schultz wrote:
>> Josh,
>>
>> Totally understand the resource constraints and the price comparison up-front. As Roy alluded to earlier, it pays with Glacier to envision what your content retrieval scenarios might be, because that $368 up-front could very easily balloon in situations where you need to restore a collection or collections en masse at a later date. Amazon Glacier as a service makes its money on that end. In MetaArchive there is currently no charge for collection retrieval for the sake of a restoration. You are also subject to, and powerless over, Amazon's long-term price hikes with Glacier. Because we are a Cooperative, our members collaboratively work together annually to determine technology preferences, vendors, pricing, cost control, etc. You have a direct seat at the table to help steer the solution in your direction.
>>
>> On Fri, Jan 11, 2013 at 10:09 AM, Joshua Welker wrote:
>>> Matt,
>>>
>>> I appreciate the information. At that price, it looks like MetaArchive would be a better option than most of the other services mentioned in this thread. At this point, I think it is going to come down to a LOCKSS solution such as what MetaArchive provides or Amazon Glacier. We anticipate our digital collection growing to about 3 TB in the first two years. With Glacier, that would be $368 per year vs $3,072 per year for MetaArchive and LOCKSS. As much as I would like to support library initiatives like LOCKSS, we are a small institution with a very small budget, and the pricing of Glacier is starting to look too good to pass up.
>>>
>>> Josh Welker
>>>
>>> -----Original Message-----
>>> From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Matt Schultz
>>> Sent: Friday, January 11, 2013 8:49 AM
>>> To: CODE4LIB@LISTSERV.ND.EDU
>>> Subject: Re: [CODE4LIB] Digital collection backups
>>>
>>> Hi Josh,
>>>
>>> Glad you are looking into LOCKSS as a potential solution for your needs and that you are thinking beyond simple backup solutions for more long-term preservation. Here at MetaArchive Cooperative we make use of LOCKSS to preserve a range of content/collections from our member institutions.
>>>
>>> The nice thing (I think) about our approach and our use of LOCKSS as an embedded technology is that you as an institution retain full control over your collections in the preservation network and get to play an active and on-going part in their preservation treatment over time. Storage costs in MetaArchive are competitive ($1/GB/year), and with that you get up to 7 geographic replications. MetaArchive is international at this point, and so your collections really do achieve some safe distance from any disasters that may hit close to home.
>>>
>>> I'd be more than happy to talk with you further about your collection needs, why we like LOCKSS, and any interest your institution may have in being part of a collaborative approach to preserving your content above and beyond simple backup. Feel free to contact me directly.
>>>
>>> Matt Schultz
>>> Program Manager
>>> Educopia Institute, MetaArchive Cooperative
>>> http://www.metaarchive.org
>>> matt.schu...@metaarchive.org
>>> 616-566-3204
>>>
>>> On Thu, Jan 10, 2013 at 5:20 PM, Joshua Welker wrote:
>>>> Hi everyone,
>>>>
>>>> We are starting a digitization project for some of our special collections, and we are having a hard time setting up a backup system [...]
Re: [CODE4LIB] Digital collection backups
Restoring 3 TB from Glacier is $370. Add about $90 if you use AWS Import/Export (you provide the device).

Hopefully, this is not something that you would do often.

Cary

On Fri, Jan 11, 2013 at 8:14 AM, Matt Schultz wrote:
> Josh,
>
> Totally understand the resource constraints and the price comparison up-front. As Roy alluded to earlier, it pays with Glacier to envision what your content retrieval scenarios might be, because that $368 up-front could very easily balloon in situations where you need to restore a collection or collections en masse at a later date. Amazon Glacier as a service makes its money on that end. In MetaArchive there is currently no charge for collection retrieval for the sake of a restoration. You are also subject to, and powerless over, Amazon's long-term price hikes with Glacier. Because we are a Cooperative, our members collaboratively work together annually to determine technology preferences, vendors, pricing, cost control, etc. You have a direct seat at the table to help steer the solution in your direction.
>
> On Fri, Jan 11, 2013 at 10:09 AM, Joshua Welker wrote:
>> Matt,
>>
>> I appreciate the information. At that price, it looks like MetaArchive would be a better option than most of the other services mentioned in this thread. At this point, I think it is going to come down to a LOCKSS solution such as what MetaArchive provides or Amazon Glacier. We anticipate our digital collection growing to about 3 TB in the first two years. With Glacier, that would be $368 per year vs $3,072 per year for MetaArchive and LOCKSS. As much as I would like to support library initiatives like LOCKSS, we are a small institution with a very small budget, and the pricing of Glacier is starting to look too good to pass up.
>>
>> Josh Welker
>>
>> -----Original Message-----
>> From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Matt Schultz
>> Sent: Friday, January 11, 2013 8:49 AM
>> To: CODE4LIB@LISTSERV.ND.EDU
>> Subject: Re: [CODE4LIB] Digital collection backups
>>
>> Hi Josh,
>>
>> Glad you are looking into LOCKSS as a potential solution for your needs and that you are thinking beyond simple backup solutions for more long-term preservation. Here at MetaArchive Cooperative we make use of LOCKSS to preserve a range of content/collections from our member institutions.
>>
>> The nice thing (I think) about our approach and our use of LOCKSS as an embedded technology is that you as an institution retain full control over your collections in the preservation network and get to play an active and on-going part in their preservation treatment over time. Storage costs in MetaArchive are competitive ($1/GB/year), and with that you get up to 7 geographic replications. MetaArchive is international at this point, and so your collections really do achieve some safe distance from any disasters that may hit close to home.
>>
>> I'd be more than happy to talk with you further about your collection needs, why we like LOCKSS, and any interest your institution may have in being part of a collaborative approach to preserving your content above and beyond simple backup. Feel free to contact me directly.
>>
>> Matt Schultz
>> Program Manager
>> Educopia Institute, MetaArchive Cooperative
>> http://www.metaarchive.org
>> matt.schu...@metaarchive.org
>> 616-566-3204
>>
>> On Thu, Jan 10, 2013 at 5:20 PM, Joshua Welker wrote:
>>> Hi everyone,
>>>
>>> We are starting a digitization project for some of our special collections, and we are having a hard time setting up a backup system that meets the long-term preservation needs of digital archives. The backup mechanisms currently used by campus IT are short-term full-server backups. What we are looking for is more granular, file-level backup over the very long term. Does anyone have any recommendations of software or some service or technique? We are looking into LOCKSS but haven't dug too deeply yet. Can anyone who uses LOCKSS tell me a bit of their experiences with it?
>>>
>>> Josh Welker
>>> Electronic/Media Services Librarian
>>> College Liaison
>>> University Libraries
>>> Southwest Baptist University
>>> 417.328.1624
>
> --
> Matt Schultz
> Program Manager
> Educopia Institute, MetaArchive Cooperative
> http://www.metaarchive.org
> matt.schu...@metaarchive.org
> 616-566-3204

--
Cary Gordon
The Cherry Hill Company
http://chillco.com
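Cary's $370 figure looks consistent with AWS's early-2013 data-transfer-out rate of roughly $0.12/GB for the first 10 TB; take both the rate and the reconstruction below as assumptions about pricing of the day. Glacier's separate retrieval-request fee (free up to 5% of stored data per month, peak-rate billed beyond that) would come on top.

# Sanity check on Cary's figure, using the assumed early-2013 rate:
gb = 3 * 1024
print(f"transfer out: ${gb * 0.12:,.2f}")  # ~$368.64, i.e. roughly $370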
Re: [CODE4LIB] Digital collection backups
In my experience, writable DVDs are not a stable backup medium. If you really want to go the DIY, simple-as-possible route, I suggest that you get 3-4 of the drives and rotate them.

On Fri, Jan 11, 2013 at 7:34 AM, James Gilbert wrote:
> Hi Josh,
>
> I lurked on this thread, as I did not know the size of your institution.
>
> Being a public library serving about 24,000 residents, we have the small-institution issues as well for this type of project. We recently tackled a similar situation, and the solution was:
>
> 1) Purchase a 3TB Seagate external network storage device (residential drive from Best Buy)
> 2) Burn archived materials to DVD
> 3) Copy files to external storage (on site in my server room)
> 4) DVDs reside off-site (we are still determining where this would be, as the library does not have a Safe Deposit Box)
>
> This removes external companies, and the data is a quick trip home and back.
>
> I know it is not elaborate and fancy, very little code... but it was $150 for the drive, plus the cost of DVDs.
>
> James Gilbert, BS, MLIS
> Systems Librarian
> Whitehall Township Public Library
> 3700 Mechanicsville Road
> Whitehall, PA 18052
>
> 610-432-4330 ext: 203
>
> -----Original Message-----
> From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Joshua Welker
> Sent: Friday, January 11, 2013 10:09 AM
> To: CODE4LIB@LISTSERV.ND.EDU
> Subject: Re: [CODE4LIB] Digital collection backups
>
> Matt,
>
> I appreciate the information. At that price, it looks like MetaArchive would be a better option than most of the other services mentioned in this thread. At this point, I think it is going to come down to a LOCKSS solution such as what MetaArchive provides or Amazon Glacier. We anticipate our digital collection growing to about 3 TB in the first two years. With Glacier, that would be $368 per year vs $3,072 per year for MetaArchive and LOCKSS. As much as I would like to support library initiatives like LOCKSS, we are a small institution with a very small budget, and the pricing of Glacier is starting to look too good to pass up.
>
> Josh Welker
>
> -----Original Message-----
> From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Matt Schultz
> Sent: Friday, January 11, 2013 8:49 AM
> To: CODE4LIB@LISTSERV.ND.EDU
> Subject: Re: [CODE4LIB] Digital collection backups
>
> Hi Josh,
>
> Glad you are looking into LOCKSS as a potential solution for your needs and that you are thinking beyond simple backup solutions for more long-term preservation. Here at MetaArchive Cooperative we make use of LOCKSS to preserve a range of content/collections from our member institutions.
>
> The nice thing (I think) about our approach and our use of LOCKSS as an embedded technology is that you as an institution retain full control over your collections in the preservation network and get to play an active and on-going part in their preservation treatment over time. Storage costs in MetaArchive are competitive ($1/GB/year), and with that you get up to 7 geographic replications. MetaArchive is international at this point, and so your collections really do achieve some safe distance from any disasters that may hit close to home.
>
> I'd be more than happy to talk with you further about your collection needs, why we like LOCKSS, and any interest your institution may have in being part of a collaborative approach to preserving your content above and beyond simple backup. Feel free to contact me directly.
>
> Matt Schultz
> Program Manager
> Educopia Institute, MetaArchive Cooperative
> http://www.metaarchive.org
> matt.schu...@metaarchive.org
> 616-566-3204
>
> On Thu, Jan 10, 2013 at 5:20 PM, Joshua Welker wrote:
>> Hi everyone,
>>
>> We are starting a digitization project for some of our special collections, and we are having a hard time setting up a backup system that meets the long-term preservation needs of digital archives. The backup mechanisms currently used by campus IT are short-term full-server backups. What we are looking for is more granular, file-level backup over the very long term. Does anyone have any recommendations of software or some service or technique? We are looking into LOCKSS but haven't dug too deeply yet. Can anyone who uses LOCKSS tell me a bit of their experiences with it?
>>
>> Josh Welker
>> Electronic/Media Services Librarian
>> College Liaison
>> University Libraries
>> Southwest Baptist University
>> 417.328.1624
>
> --
> Matt Schultz
> Program Manager
> Educopia Institute, MetaArchive Cooperative
> http://www.metaarchive.org
> matt.schu...@metaarchive.org
> 616-566-3204

--
Cary Gordon
The Cherry Hill Company
http://chillco.com
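Whether the medium is rotated drives or DVDs, a fixity manifest makes the rotation trustworthy: build it when the copy is made, then re-verify before treating any drive as the good copy. A small sketch (the mount paths and manifest filename are hypothetical):

import hashlib
import json
import os

def build_manifest(root):
    """Record a SHA-256 digest for every file under root."""
    digests = {}
    for dirpath, _, names in os.walk(root):
        for name in names:
            path = os.path.join(dirpath, name)
            h = hashlib.sha256()
            with open(path, "rb") as f:
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    h.update(chunk)
            digests[os.path.relpath(path, root)] = h.hexdigest()
    return digests

# When the drive is loaded:
#   json.dump(build_manifest("/mnt/archive"), open("manifest.json", "w"))
# On every rotation, rebuild and diff against the stored manifest so a
# silently corrupted drive is never mistaken for the good copy.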
Re: [CODE4LIB] Digital collection backups
Josh,

Totally understand the resource constraints and the price comparison up-front. As Roy alluded to earlier, it pays with Glacier to envision what your content retrieval scenarios might be, because that $368 up-front could very easily balloon in situations where you need to restore a collection or collections en masse at a later date. Amazon Glacier as a service makes its money on that end. In MetaArchive there is currently no charge for collection retrieval for the sake of a restoration. You are also subject to, and powerless over, Amazon's long-term price hikes with Glacier. Because we are a Cooperative, our members collaboratively work together annually to determine technology preferences, vendors, pricing, cost control, etc. You have a direct seat at the table to help steer the solution in your direction.

On Fri, Jan 11, 2013 at 10:09 AM, Joshua Welker wrote:
> Matt,
>
> I appreciate the information. At that price, it looks like MetaArchive would be a better option than most of the other services mentioned in this thread. At this point, I think it is going to come down to a LOCKSS solution such as what MetaArchive provides or Amazon Glacier. We anticipate our digital collection growing to about 3 TB in the first two years. With Glacier, that would be $368 per year vs $3,072 per year for MetaArchive and LOCKSS. As much as I would like to support library initiatives like LOCKSS, we are a small institution with a very small budget, and the pricing of Glacier is starting to look too good to pass up.
>
> Josh Welker
>
> -----Original Message-----
> From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Matt Schultz
> Sent: Friday, January 11, 2013 8:49 AM
> To: CODE4LIB@LISTSERV.ND.EDU
> Subject: Re: [CODE4LIB] Digital collection backups
>
> Hi Josh,
>
> Glad you are looking into LOCKSS as a potential solution for your needs and that you are thinking beyond simple backup solutions for more long-term preservation. Here at MetaArchive Cooperative we make use of LOCKSS to preserve a range of content/collections from our member institutions.
>
> The nice thing (I think) about our approach and our use of LOCKSS as an embedded technology is that you as an institution retain full control over your collections in the preservation network and get to play an active and on-going part in their preservation treatment over time. Storage costs in MetaArchive are competitive ($1/GB/year), and with that you get up to 7 geographic replications. MetaArchive is international at this point, and so your collections really do achieve some safe distance from any disasters that may hit close to home.
>
> I'd be more than happy to talk with you further about your collection needs, why we like LOCKSS, and any interest your institution may have in being part of a collaborative approach to preserving your content above and beyond simple backup. Feel free to contact me directly.
>
> Matt Schultz
> Program Manager
> Educopia Institute, MetaArchive Cooperative
> http://www.metaarchive.org
> matt.schu...@metaarchive.org
> 616-566-3204
>
> On Thu, Jan 10, 2013 at 5:20 PM, Joshua Welker wrote:
>> Hi everyone,
>>
>> We are starting a digitization project for some of our special collections, and we are having a hard time setting up a backup system that meets the long-term preservation needs of digital archives. The backup mechanisms currently used by campus IT are short-term full-server backups. What we are looking for is more granular, file-level backup over the very long term. Does anyone have any recommendations of software or some service or technique? We are looking into LOCKSS but haven't dug too deeply yet. Can anyone who uses LOCKSS tell me a bit of their experiences with it?
>>
>> Josh Welker
>> Electronic/Media Services Librarian
>> College Liaison
>> University Libraries
>> Southwest Baptist University
>> 417.328.1624

--
Matt Schultz
Program Manager
Educopia Institute, MetaArchive Cooperative
http://www.metaarchive.org
matt.schu...@metaarchive.org
616-566-3204
Re: [CODE4LIB] Digital collection backups
James,

Definitely a simple and elegant solution, but that is not a viable long-term option for us. We currently have tons of old CDs and DVDs full of data, and one of our goals is to wean off those media completely. Most consumer-grade CDs and DVDs are very poor in terms of long-term data integrity; those discs have a shelf life of probably a decade or two, tops. Plus, we want more redundancy than what is offered by having the backups as a collection of discs in a single physical location. But if that works for you guys, power to you. Cheap is good.

Josh Welker

-----Original Message-----
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of James Gilbert
Sent: Friday, January 11, 2013 9:34 AM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] Digital collection backups

Hi Josh,

I lurked on this thread, as I did not know the size of your institution.

Being a public library serving about 24,000 residents, we have the small-institution issues as well for this type of project. We recently tackled a similar situation, and the solution was:

1) Purchase a 3TB Seagate external network storage device (residential drive from Best Buy)
2) Burn archived materials to DVD
3) Copy files to external storage (on site in my server room)
4) DVDs reside off-site (we are still determining where this would be, as the library does not have a Safe Deposit Box)

This removes external companies, and the data is a quick trip home and back.

I know it is not elaborate and fancy, very little code... but it was $150 for the drive, plus the cost of DVDs.

James Gilbert, BS, MLIS
Systems Librarian
Whitehall Township Public Library
3700 Mechanicsville Road
Whitehall, PA 18052

610-432-4330 ext: 203

-----Original Message-----
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Joshua Welker
Sent: Friday, January 11, 2013 10:09 AM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] Digital collection backups

Matt,

I appreciate the information. At that price, it looks like MetaArchive would be a better option than most of the other services mentioned in this thread. At this point, I think it is going to come down to a LOCKSS solution such as what MetaArchive provides or Amazon Glacier. We anticipate our digital collection growing to about 3 TB in the first two years. With Glacier, that would be $368 per year vs $3,072 per year for MetaArchive and LOCKSS. As much as I would like to support library initiatives like LOCKSS, we are a small institution with a very small budget, and the pricing of Glacier is starting to look too good to pass up.

Josh Welker

-----Original Message-----
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Matt Schultz
Sent: Friday, January 11, 2013 8:49 AM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] Digital collection backups

Hi Josh,

Glad you are looking into LOCKSS as a potential solution for your needs and that you are thinking beyond simple backup solutions for more long-term preservation. Here at MetaArchive Cooperative we make use of LOCKSS to preserve a range of content/collections from our member institutions.

The nice thing (I think) about our approach and our use of LOCKSS as an embedded technology is that you as an institution retain full control over your collections in the preservation network and get to play an active and on-going part in their preservation treatment over time. Storage costs in MetaArchive are competitive ($1/GB/year), and with that you get up to 7 geographic replications. MetaArchive is international at this point, and so your collections really do achieve some safe distance from any disasters that may hit close to home.

I'd be more than happy to talk with you further about your collection needs, why we like LOCKSS, and any interest your institution may have in being part of a collaborative approach to preserving your content above and beyond simple backup. Feel free to contact me directly.

Matt Schultz
Program Manager
Educopia Institute, MetaArchive Cooperative
http://www.metaarchive.org
matt.schu...@metaarchive.org
616-566-3204

On Thu, Jan 10, 2013 at 5:20 PM, Joshua Welker wrote:
> Hi everyone,
>
> We are starting a digitization project for some of our special collections, and we are having a hard time setting up a backup system that meets the long-term preservation needs of digital archives. The backup mechanisms currently used by campus IT are short-term full-server backups. What we are looking for is more granular, file-level backup over the very long term. Does anyone have any recommendations of software or some service or technique? We are looking into LOCKSS but haven't dug too deeply yet. Can anyone who uses LOCKSS tell me a bit of their experiences with it?
>
> Josh Welker
> Electronic/Media Services Librarian
> College Liaison
> University Libraries
> Southwest Baptist University
> 417.328.1624
Re: [CODE4LIB] Digital collection backups
Hi Josh,

I lurked on this thread, as I did not know the size of your institution. Being a public library serving about 24,000 residents, we have the small-institution issues as well for this type of project. We recently tackled a similar situation, and the solution was:

1) Purchase a 3TB Seagate external network storage device (a residential drive from Best Buy)
2) Burn archived materials to DVD
3) Copy files to external storage (on site in my server room)
4) DVDs reside off-site (we are still determining where this would be, as the library does not have a safe deposit box)

This removes external companies, and the data is a quick trip home and back. I know it is not elaborate and fancy, very little code... but it was $150 for the drive, plus the cost of DVDs.

James Gilbert, BS, MLIS
Systems Librarian
Whitehall Township Public Library
3700 Mechanicsville Road
Whitehall, PA 18052
610-432-4330 ext: 203
Re: [CODE4LIB] Digital collection backups
http://metaarchive.org/costs in our case. Interested to hear other experiences.

Al

On 1/11/13 10:01 AM, "Joshua Welker" wrote:
>Thanks, Al. I think we'd join a LOCKSS network rather than run multiple
>LOCKSS boxes ourselves. Does anyone have any experience with one of
>those, like the LOCKSS Global Alliance?
Re: [CODE4LIB] Digital collection backups
Matt,

I appreciate the information. At that price, it looks like MetaArchive would be a better option than most of the other services mentioned in this thread. At this point, I think it is going to come down to a LOCKSS solution such as what MetaArchive provides or Amazon Glacier. We anticipate our digital collection growing to about 3TB in the first two years. With Glacier, that would be $368 per year vs. $3,072 per year for MetaArchive and LOCKSS. As much as I would like to support library initiatives like LOCKSS, we are a small institution with a very small budget, and the pricing of Glacier is starting to look too good to pass up.

Josh Welker
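For reference, the arithmetic behind those two figures works out as follows. A back-of-the-envelope sketch in Python, assuming the $0.01/GB/month Glacier storage rate implied by the $368 figure and MetaArchive's stated $1/GB/year, and ignoring upload-request and retrieval fees:

    # Back-of-the-envelope comparison of the two annual storage costs
    # quoted above. Rates are assumptions: $0.01/GB/month for Glacier
    # (implied by the $368 figure) and $1/GB/year for MetaArchive.
    gb = 3 * 1024                  # 3 TB expressed in GB
    glacier = gb * 0.01 * 12       # about $368.64/year
    metaarchive = gb * 1.00        # $3,072.00/year
    print(f"Glacier:     ${glacier:,.2f}/year")
    print(f"MetaArchive: ${metaarchive:,.2f}/year")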
Re: [CODE4LIB] Digital collection backups
Thanks, Al. I think we'd join a LOCKSS network rather than run multiple LOCKSS boxes ourselves. Does anyone have any experience with one of those, like the LOCKSS Global Alliance?

Josh Welker
Re: [CODE4LIB] Digital collection backups
We use LOCKSS as part of MetaArchive. LOCKSS as I understand it is typically spec'd for consumer hardware, and so, presumably as a result of the SE Asia flooding, there have been some drive failures and cache downtimes and adjustments accordingly.

That, however, is the worst of it.

LOCKSS is to some perhaps even considerable degree tamper-resistant, since it relies on mechanisms of collective polling among multiple copies to preserve integrity, as opposed to static checksums or some other solution.

As such, it seems to me important to run a LOCKSS box with other LOCKSS boxes; the MetaArchive cooperative specifies six or so distributed locations for each cache.

The economic sustainability of such an enterprise is a valid question. David S. H. Rosenthal at Stanford seems to lead the charge for this research, e.g. http://blog.dshr.org/2012/08/amazons-announcement-of-glacier.html#more

I've heard mention from other players that they watch MetaArchive carefully for such sustainability considerations, especially because MetaArchive uses LOCKSS for non-journal content. In some sense this may extend LOCKSS beyond its original design.

MetaArchive has in my opinion been extremely responsible in designating succession scenarios and disaster recovery scenarios, going so far as to fund, develop, and test services for migration out of the system, into an iRODS repository in the initial case.

Al Matthews
AUC Robert W. Woodruff Library
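To make the polling idea concrete, here is a toy sketch, not the actual LOCKSS protocol (which is considerably more involved): each cache hashes its own copy and the caches vote, so a damaged copy is detected by being outvoted rather than by trusting any single stored checksum.

    # Toy illustration only (not the real LOCKSS protocol). Several
    # caches hash their copies of the same file and vote; a damaged
    # copy loses the vote and can be repaired from the majority.
    import hashlib
    from collections import Counter

    replicas = {
        "cache-a": b"original content",
        "cache-b": b"original content",
        "cache-c": b"0riginal content",  # bit rot on one cache
    }

    digests = {name: hashlib.sha256(data).hexdigest()
               for name, data in replicas.items()}
    consensus, _ = Counter(digests.values()).most_common(1)[0]

    for name, d in digests.items():
        print(name, "ok" if d == consensus else "damaged; repair from peers")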
Re: [CODE4LIB] Digital collection backups
Hi Josh,

Glad you are looking into LOCKSS as a potential solution for your needs and that you are thinking beyond simple backup solutions for more long-term preservation. Here at MetaArchive Cooperative we make use of LOCKSS to preserve a range of content/collections from our member institutions. The nice thing (I think) about our approach and our use of LOCKSS as an embedded technology is that you as an institution retain full control over your collections in the preservation network and get to play an active and ongoing part in their preservation treatment over time. Storage costs in MetaArchive are competitive ($1/GB/year), and with that you get up to 7 geographic replications. MetaArchive is international at this point, and so your collections really do achieve some safe distance from any disasters that may hit close to home.

I'd be more than happy to talk with you further about your collection needs, why we like LOCKSS, and any interest your institution may have in being part of a collaborative approach to preserving your content above and beyond simple backup. Feel free to contact me directly.

Matt Schultz
Program Manager
Educopia Institute, MetaArchive Cooperative
http://www.metaarchive.org
matt.schu...@metaarchive.org
616-566-3204
Re: [CODE4LIB] Digital collection backups
Good point. But since campus IT will be creating regular disaster-recovery backups, the odds that we'd ever need to retrieve more than a handful of files from Glacier at a time are pretty low.

Josh Welker
Re: [CODE4LIB] Digital collection backups
Glacier sounds even better than S3 for what we're looking for. We are only going to be retrieving the files in the case of corruption, so the pay-per-retrieval model would work well. I heard of Glacier in the past but forgot all about it. Thank you.

Josh Welker
Re: [CODE4LIB] Digital collection backups
Concerns have been raised about how expensive Glacier gets if you need to recover a lot of files in a short time period.

http://www.wired.com/wiredenterprise/2012/08/glacier/

--
Gary McGath, Professional Software Developer
http://www.garymcgath.com
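The rough numbers behind that concern, under the early pricing model as it was generally described (a free allowance of 5% of stored data per month, pro-rated daily, with peak-rate fees beyond that; those terms are an assumption here, not a figure from this thread):

    # Sketch of why bulk restores were the expensive case. The 5%/month
    # free retrieval allowance (pro-rated daily) is an assumption based
    # on descriptions of Glacier's early pricing, not from this thread.
    stored_gb = 3 * 1024                  # a 3 TB archive
    free_month = stored_gb * 0.05         # ~153.6 GB retrievable free per month
    free_day = free_month / 30            # ~5.1 GB per day before fees apply
    print(f"Free allowance: ~{free_day:.1f} GB/day")
    # Restoring a few corrupted files stays inside the allowance;
    # pulling back all 3 TB in a hurry runs into peak-rate fees.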
Re: [CODE4LIB] Digital collection backups
David,

That sounds like a definite option. Thanks. Does S3 have an API for uploading so that the upload process could be scripted, or do you manually upload each file?

Josh Welker
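S3 is in fact driven entirely by an API, so uploads are easy to script. A minimal sketch using the boto3 Python SDK, chosen here purely for illustration (it is not necessarily what David's ResourceSpace plugin uses, and the bucket and key names are placeholders):

    # Minimal scripted S3 backup sketch using boto3 (illustrative only;
    # bucket and paths are made up). A locally computed MD5 is stored
    # as object metadata so fixity can be compared later.
    import hashlib
    import boto3

    s3 = boto3.client("s3")

    def backup_file(local_path: str, bucket: str, key: str) -> None:
        with open(local_path, "rb") as f:
            md5 = hashlib.md5(f.read()).hexdigest()
        s3.upload_file(local_path, bucket, key,
                       ExtraArgs={"Metadata": {"md5": md5}})

    backup_file("masters/img_0001.tif",
                "example-preservation-bucket",
                "collections/img_0001.tif")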
Re: [CODE4LIB] Digital collection backups
Obnam http://liw.fi/obnam/ might do what you need with the minimum of fuss.

Chris
Re: [CODE4LIB] Digital collection backups
Hi - you might look into Chronopolis (which can be front-ended by DuraCloud or not): http://chronopolis.sdsc.edu/

Declan
Re: [CODE4LIB] Digital collection backups
I'd also take a look at Amazon Glacier. Recently I parked about 50GB of data files in logical tar'd and gzip'd chunks and it's costing my employer less than 50 cents/month. Glacier, however, is best for "park it and forget" kinds of needs, as the real cost is in data flow. Storage is cheap, but must be considered "offline" or "near line" as you must first request to retrieve a file, wait for about a day, and then retrieve the file. And you're charged more for the download throughput than just about anything.

I'm using a Unix client to handle all of the heavy lifting of uploading and downloading, as Glacier is meant to be used via an API rather than a web client.[1] If anyone is interested, I have local documentation on usage that I could probably genericize. And yes, I did round-trip a file to make sure it functioned as advertised.

Roy

[1] https://github.com/vsespb/mt-aws-glacier
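The request-wait-retrieve cycle described above maps onto a two-step job API. Here is a sketch using the boto3 Python SDK, for illustration only (the client Roy links in [1] is a separate Perl tool, and the vault name and archive ID below are placeholders):

    # Glacier's two-step retrieval, sketched with boto3 (illustrative;
    # vault and archive ID are placeholders). Step 1 asks Glacier to
    # stage the archive; step 2 polls and downloads once it is ready.
    import time
    import boto3

    glacier = boto3.client("glacier")

    job = glacier.initiate_job(
        vaultName="example-vault",
        jobParameters={"Type": "archive-retrieval",
                       "ArchiveId": "EXAMPLE-ARCHIVE-ID"})
    job_id = job["jobId"]

    # Staging typically took hours, hence the long polling interval.
    while not glacier.describe_job(vaultName="example-vault",
                                   jobId=job_id)["Completed"]:
        time.sleep(3600)

    out = glacier.get_job_output(vaultName="example-vault", jobId=job_id)
    with open("restored.tar.gz", "wb") as f:
        f.write(out["body"].read())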
Re: [CODE4LIB] Digital collection backups
We built our own solution for this by creating a plugin that works with our digital asset management system (ResourceSpace) to individually back up files to Amazon S3. Because S3 is replicated to multiple data centers, this provides a fairly high level of redundancy. And because it's an object-based web service, we can access any given object individually by using a URL related to the original storage URL within our system.

This also allows us to take advantage of S3 for images on our website. All of the images from our online collections database are being served straight from S3, which diverts the load from our public web server. When we launch zoomable images later this year, all of the tiles will also be generated locally in the DAM and then served to the public via the mirrored copy in S3.

The current pricing is around $0.08/GB/month for 1-50 TB, which I think is fairly reasonable for what we're getting. They just dropped the price substantially a few months ago.

DuraCloud http://www.duracloud.org/ supposedly offers a way to add another abstraction layer so you can build something like this that is portable between different cloud storage providers. But I haven't really looked into this as of yet.

-David

__

David Dwiggins
Systems Librarian/Archivist, Historic New England
141 Cambridge Street, Boston, MA 02114
(617) 994-5948
ddwigg...@historicnewengland.org
http://www.historicnewengland.org

>>> Joshua Welker 1/10/2013 5:20 PM >>>
Hi everyone,

We are starting a digitization project for some of our special collections, and we are having a hard time setting up a backup system that meets the long-term preservation needs of digital archives. The backup mechanisms currently used by campus IT are short-term full-server backups. What we are looking for is more granular, file-level backup over the very long term. Does anyone have any recommendations of software or some service or technique? We are looking into LOCKSS but haven't dug too deeply yet. Can anyone who uses LOCKSS tell me a bit of their experiences with it?

Josh Welker
Electronic/Media Services Librarian
College Liaison
University Libraries
Southwest Baptist University
417.328.1624