Re: [CODE4LIB] Digital collection backups
David, That sounds like a definite option. Thanks. Does S3 have an API for uploading so that the upload process could be scripted, or do you manually upload each file?

Josh Welker

-Original Message-
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of ddwigg...@historicnewengland.org
Sent: Thursday, January 10, 2013 4:29 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] Digital collection backups

We built our own solution for this by creating a plugin that works with our digital asset management system (ResourceSpace) to individually back up files to Amazon S3. Because S3 is replicated to multiple data centers, this provides a fairly high level of redundancy. And because it's an object-based web service, we can access any given object individually by using a URL related to the original storage URL within our system.

This also allows us to take advantage of S3 for images on our website. All of the images in our online collections database are being served straight from S3, which diverts the load from our public web server. When we launch zoomable images later this year, all of the tiles will also be generated locally in the DAM and then served to the public via the mirrored copy in S3.

The current pricing is around $0.08/GB/month for 1-50 TB, which I think is fairly reasonable for what we're getting. They just dropped the price substantially a few months ago. DuraCloud http://www.duracloud.org/ supposedly offers a way to add another abstraction layer so you can build something like this that is portable between different cloud storage providers. But I haven't really looked into this as of yet.

-David
__
David Dwiggins
Systems Librarian/Archivist, Historic New England
141 Cambridge Street, Boston, MA 02114
(617) 994-5948
ddwigg...@historicnewengland.org
http://www.historicnewengland.org

Joshua Welker jwel...@sbuniv.edu 1/10/2013 5:20 PM

Hi everyone, We are starting a digitization project for some of our special collections, and we are having a hard time setting up a backup system that meets the long-term preservation needs of digital archives. The backup mechanisms currently used by campus IT are short-term full-server backups. What we are looking for is more granular, file-level backup over the very long term. Does anyone have recommendations for software, services, or techniques? We are looking into LOCKSS but haven't dug too deeply yet. Can anyone who uses LOCKSS tell me a bit about their experiences with it?

Josh Welker
Electronic/Media Services Librarian
College Liaison
University Libraries
Southwest Baptist University
417.328.1624
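To Josh's question: S3 has always exposed an HTTP API, so uploads are straightforward to script. Here is a minimal sketch using Amazon's Python SDK (boto3, which is newer than this 2013 thread); the bucket name and source directory are placeholders, and keys are derived from each file's path relative to the root:

```python
from pathlib import Path

def iter_files(root):
    """Yield (path, s3_key) pairs for every file under root, keyed by relative path."""
    root = Path(root)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            yield path, path.relative_to(root).as_posix()

def upload_tree(root, bucket):
    """Mirror a local directory tree into an S3 bucket."""
    import boto3  # assumed available: pip install boto3
    s3 = boto3.client("s3")
    for path, key in iter_files(root):
        s3.upload_file(str(path), bucket, key)

if __name__ == "__main__":
    # Placeholder values for illustration only.
    upload_tree("/var/digitized/masters", "example-backup-bucket")
```

Because the key mirrors the relative path, each object remains individually addressable by a URL related to its original storage location, as David describes.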
Re: [CODE4LIB] Digital collection backups
Concerns have been raised about how expensive Glacier gets if you need to recover a lot of files in a short time period. http://www.wired.com/wiredenterprise/2012/08/glacier/

On 1/10/13 5:56 PM, Roy Tennant wrote:

I'd also take a look at Amazon Glacier. Recently I parked about 50GB of data files in logical tar'd and gzip'd chunks and it's costing my employer less than 50 cents/month. Glacier, however, is best for "park it and forget it" kinds of needs, as the real cost is in data flow. Storage is cheap, but must be considered offline or nearline, as you must first request to retrieve a file, wait for about a day, and then retrieve the file. And you're charged more for the download throughput than just about anything. I'm using a Unix client to handle all of the heavy lifting of uploading and downloading, as Glacier is meant to be used via an API rather than a web client.[1] If anyone is interested, I have local documentation on usage that I could probably genericize. And yes, I did round-trip a file to make sure it functioned as advertised.

Roy

[1] https://github.com/vsespb/mt-aws-glacier

On Thu, Jan 10, 2013 at 2:29 PM, ddwigg...@historicnewengland.org wrote: [...]

--
Gary McGath, Professional Software Developer http://www.garymcgath.com
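Roy's request-wait-retrieve cycle corresponds to Glacier's job-based API: you initiate an archive-retrieval job, poll until it completes (historically around four hours), then download the job output. A rough sketch with boto3; the vault name, archive ID, and output path are placeholders, and the mt-aws-glacier client Roy cites wraps this same workflow for you:

```python
import time

def wait_for_job(glacier, vault, job_id, poll_seconds=3600):
    """Poll a Glacier job until it reports completion, then return its description."""
    while True:
        job = glacier.describe_job(vaultName=vault, jobId=job_id)
        if job["Completed"]:
            return job
        time.sleep(poll_seconds)

def retrieve_archive(vault, archive_id, out_path):
    """Initiate an archive-retrieval job, wait for it, and save the output locally."""
    import boto3  # assumed available: pip install boto3
    glacier = boto3.client("glacier")
    job = glacier.initiate_job(
        vaultName=vault,
        jobParameters={"Type": "archive-retrieval", "ArchiveId": archive_id},
    )
    wait_for_job(glacier, vault, job["jobId"])
    out = glacier.get_job_output(vaultName=vault, jobId=job["jobId"])
    with open(out_path, "wb") as f:
        f.write(out["body"].read())
```

The long polling interval is deliberate: Glacier jobs take hours, which is exactly why Roy calls the storage "offline or nearline."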
Re: [CODE4LIB] Digital collection backups
Glacier sounds even better than S3 for what we're looking for. We are only going to be retrieving the files in the case of corruption, so the pay-per-retrieval model would work well. I heard of Glacier in the past but forgot all about it. Thank you.

Josh Welker

-Original Message-
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Roy Tennant
Sent: Thursday, January 10, 2013 4:56 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] Digital collection backups
[...]
Re: [CODE4LIB] Digital collection backups
Good point. But since campus IT will be creating regular disaster-recovery backups, the odds that we'd ever need to retrieve more than a handful of files from Glacier at a time are pretty low.

Josh Welker

-Original Message-
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Gary McGath
Sent: Friday, January 11, 2013 8:03 AM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] Digital collection backups
[...]
Re: [CODE4LIB] Digital collection backups
Hi Josh, Glad you are looking into LOCKSS as a potential solution for your needs, and that you are thinking beyond simple backup solutions toward long-term preservation. Here at the MetaArchive Cooperative we make use of LOCKSS to preserve a range of content/collections from our member institutions. The nice thing (I think) about our approach and our use of LOCKSS as an embedded technology is that you as an institution retain full control over your collections in the preservation network and get to play an active and ongoing part in their preservation treatment over time.

Storage costs in MetaArchive are competitive ($1/GB/year), and with that you get up to 7 geographic replications. MetaArchive is international at this point, so your collections really do achieve some safe distance from any disasters that may hit close to home.

I'd be more than happy to talk with you further about your collection needs, why we like LOCKSS, and any interest your institution may have in being part of a collaborative approach to preserving your content above and beyond simple backup. Feel free to contact me directly.

Matt Schultz
Program Manager
Educopia Institute, MetaArchive Cooperative
http://www.metaarchive.org
matt.schu...@metaarchive.org
616-566-3204

On Thu, Jan 10, 2013 at 5:20 PM, Joshua Welker jwel...@sbuniv.edu wrote: [...]
Re: [CODE4LIB] Digital collection backups
We use LOCKSS as part of MetaArchive. LOCKSS, as I understand it, is typically spec'd for consumer hardware, and so, presumably as a result of the SE Asia flooding, there have been some drive failures, cache downtimes, and adjustments accordingly. However, that is the worst of it.

LOCKSS is, to some perhaps even considerable degree, tamper-resistant, since it relies on mechanisms of collective polling among multiple copies to preserve integrity, as opposed to static checksums or some other single-point solution. As such, it seems to me important to run a LOCKSS box alongside other LOCKSS boxes; the MetaArchive cooperative specifies six or so distributed locations for each cache.

The economic sustainability of such an enterprise is a valid question. David S. H. Rosenthal at Stanford seems to lead the charge for this research, e.g. http://blog.dshr.org/2012/08/amazons-announcement-of-glacier.html#more I've heard mention from other players that they watch MetaArchive carefully for such sustainability considerations, especially because MetaArchive uses LOCKSS for non-journal content. In some sense this may extend LOCKSS beyond its original design.

MetaArchive has in my opinion been extremely responsible in designating succession scenarios and disaster recovery scenarios, going so far as to fund, develop, and test services for migration out of the system, into an iRODS repository in the initial case.

Al Matthews
AUC Robert W. Woodruff Library

On 1/11/13 9:10 AM, Joshua Welker jwel...@sbuniv.edu wrote: [...]
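Al's contrast between collective polling and static checksums can be illustrated with a toy example: instead of trusting one stored checksum (which can be corrupted or tampered with along with the file it describes), independent replicas are hashed and compared against each other, and the majority identifies a damaged copy. This is only an illustration of the underlying idea; LOCKSS's actual polling protocol among peer caches is far more sophisticated:

```python
import hashlib
from collections import Counter

def sha256(data: bytes) -> str:
    """Digest of one replica's content."""
    return hashlib.sha256(data).hexdigest()

def audit_replicas(replicas):
    """replicas: dict of replica name -> content bytes.
    Returns (majority digest, list of replica names that disagree)."""
    digests = {name: sha256(blob) for name, blob in replicas.items()}
    winner, _ = Counter(digests.values()).most_common(1)[0]
    damaged = [name for name, d in digests.items() if d != winner]
    return winner, damaged
```

With six or so distributed caches, as Al describes, a single flipped bit (or a single tampered copy) is outvoted by the intact replicas and can then be repaired from them.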
Re: [CODE4LIB] Digital collection backups
Thanks, Al. I think we'd join a LOCKSS network rather than run multiple LOCKSS boxes ourselves. Does anyone have any experience with one of those, like the LOCKSS Global Alliance?

Josh Welker

-Original Message-
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Al Matthews
Sent: Friday, January 11, 2013 8:50 AM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] Digital collection backups
[...]
Re: [CODE4LIB] Digital collection backups
http://metaarchive.org/costs in our case. Interested to hear other experiences.

Al

On 1/11/13 10:01 AM, Joshua Welker jwel...@sbuniv.edu wrote: [...]
Re: [CODE4LIB] Digital collection backups
Hi Josh, I lurked on this thread, as I did not know the size of your institution. Being a public library serving about 24,000 residents - we have the small-institution issues as well for this type of project. We recently tackled a similar situation and the solution: 1) Purchase a 3TB SeaGate external network storage device (residential drive from Best Buy) 2) Burn archived materials to DVD 3) Copy files to external storage (on site in my server room) 4) DVDs reside off-site (we are still determining where this would be, as the library does not have a Safe Deposit Box) This removes external companies, and the data is quick trip home and back. I know it is not elaborate and fancy, very little code... but it was $150 for the drive; and cost of DVDs. James Gilbert, BS, MLIS Systems Librarian Whitehall Township Public Library 3700 Mechanicsville Road Whitehall, PA 18052 610-432-4330 ext: 203 -Original Message- From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Joshua Welker Sent: Friday, January 11, 2013 10:09 AM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] Digital collection backups Matt, I appreciate the information. At that price, it looks like MetaArchive would be a better option than most of the other services mentioned in this thread. At this point, I think it is going to come down to a LOCKSS solution such as what MetaArchive provides or Amazon Glacier. We anticipate our digital collection growing to about 3TB in the first two years. With Glacier, that would be $368 per year vs $3,072 per year for MetaArchive and LOCKSS. As much as I would like to support library initiatives like LOCKSS, we are a small institution with a very small budget, and the pricing of Glacier is starting to look too good to pass up. 
Josh Welker

-Original Message-
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Matt Schultz
Sent: Friday, January 11, 2013 8:49 AM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] Digital collection backups

Hi Josh,

Glad you are looking into LOCKSS as a potential solution for your needs, and that you are thinking beyond simple backup solutions toward long-term preservation. Here at the MetaArchive Cooperative we use LOCKSS to preserve a range of content/collections from our member institutions. The nice thing (I think) about our approach, with LOCKSS as an embedded technology, is that you as an institution retain full control over your collections in the preservation network and play an active, ongoing part in their preservation treatment over time. Storage costs in MetaArchive are competitive ($1/GB/year), and with that you get up to 7 geographic replications. MetaArchive is international at this point, so your collections really do achieve some safe distance from any disasters that may hit close to home.

I'd be more than happy to talk with you further about your collection needs, why we like LOCKSS, and any interest your institution may have in a collaborative approach to preserving your content above and beyond simple backup. Feel free to contact me directly.

Matt Schultz
Program Manager
Educopia Institute, MetaArchive Cooperative
http://www.metaarchive.org
matt.schu...@metaarchive.org
616-566-3204

On Thu, Jan 10, 2013 at 5:20 PM, Joshua Welker jwel...@sbuniv.edu wrote:

Hi everyone, We are starting a digitization project for some of our special collections, and we are having a hard time setting up a backup system that meets the long-term preservation needs of digital archives. The backup mechanisms currently used by campus IT are short-term full-server backups. What we are looking for is more granular, file-level backup over the very long term. Does anyone have any recommendations of software, or of some service or technique? We are looking into LOCKSS but haven't dug too deeply yet. Can anyone who uses LOCKSS tell me a bit about their experiences with it?

Josh Welker
Electronic/Media Services Librarian
College Liaison
University Libraries
Southwest Baptist University
417.328.1624
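The storage-cost comparison in the exchange above is easy to reproduce. A quick back-of-the-envelope sketch in Python, assuming the 2013 rates implied by the figures in this thread (Glacier at roughly $0.01/GB/month, MetaArchive at $1/GB/year) for a ~3 TB collection; retrieval and transfer fees are extra:

```python
# Rough annual storage-cost comparison for a ~3 TB collection, using the
# 2013 prices discussed in this thread: Amazon Glacier at $0.01/GB/month
# and MetaArchive at $1/GB/year. Retrieval and transfer fees not included.

def annual_storage_cost(size_gb, rate, per_month=False):
    """Annual cost; `rate` is quoted per GB-month or per GB-year."""
    return size_gb * rate * (12 if per_month else 1)

collection_gb = 3 * 1024  # ~3 TB

glacier = annual_storage_cost(collection_gb, 0.01, per_month=True)
metaarchive = annual_storage_cost(collection_gb, 1.00)

print(f"Glacier:     ${glacier:,.2f}/year")      # ~$368, as quoted above
print(f"MetaArchive: ${metaarchive:,.2f}/year")  # $3,072, as quoted above
```

This matches the $368 vs. $3,072 figures in the thread; as later messages point out, the comparison changes once retrieval costs enter the picture.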
Re: [CODE4LIB] Digital collection backups
James,

Definitely a simple and elegant solution, but that is not a viable long-term option for us. We currently have tons of old CDs and DVDs full of data, and one of our goals is to wean off those media completely. Most consumer-grade CDs and DVDs are very poor in terms of long-term data integrity; those discs have a shelf life of probably a decade or two, tops. Plus, we want more redundancy than a collection of discs in a single physical location can offer. But if that works for you guys, more power to you. Cheap is good.

Josh Welker

-Original Message-
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of James Gilbert
Sent: Friday, January 11, 2013 9:34 AM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] Digital collection backups

[...]
Re: [CODE4LIB] Digital collection backups
Josh,

Totally understand the resource constraints and the up-front price comparison. As Roy alluded to earlier, with Glacier it pays to envision what your content retrieval scenarios might be, because that $368 up front could very easily balloon if you need to restore collections en masse at a later date. Amazon Glacier as a service makes its money on that end. In MetaArchive there is currently no charge for collection retrieval for the sake of a restoration. With Glacier you are also subject to, and powerless over, Amazon's long-term price changes. Because we are a cooperative, our members work together annually to determine technology preferences, vendors, pricing, cost control, etc. You have a direct seat at the table to help steer the solution in your direction.

On Fri, Jan 11, 2013 at 10:09 AM, Joshua Welker jwel...@sbuniv.edu wrote:

[...]

--
Matt Schultz
Program Manager
Educopia Institute, MetaArchive Cooperative
http://www.metaarchive.org
matt.schu...@metaarchive.org
616-566-3204
Re: [CODE4LIB] Digital collection backups
Restoring 3 TB from Glacier is $370. Add about $90 if you use AWS Import/Export (you provide the device). Hopefully this is not something that you would do often.

Cary

On Fri, Jan 11, 2013 at 8:14 AM, Matt Schultz matt.schu...@metaarchive.org wrote:

[...]

--
Cary Gordon
The Cherry Hill Company
http://chillco.com
Re: [CODE4LIB] Digital collection backups
Hello Josh,

Auburn University is a member of two Private LOCKSS Networks: the MetaArchive Cooperative and the Alabama Digital Preservation Network (ADPNet). Here's a link to a recent conference paper that describes both networks, including their current pricing structures: http://conference.ifla.org/past/ifla78/216-trehub-en.pdf

LOCKSS has worked well for us so far, in part because supporting community-based solutions is important to us. As you point out, however, Glacier is an attractive alternative, especially for institutions that may be more interested in low-cost, low-throughput storage and less concerned about entrusting their content to a commercial outfit or having to pay extra to get it back out. As with most things, you pay your money--more or less, depending--and make your choice. And take your risks.

Good luck with whatever solution(s) you decide on. They need not be mutually exclusive.

Best,
Aaron

Aaron Trehub
Assistant Dean for Technology and Technical Services
Auburn University Libraries
231 Mell Street, RBD Library
Auburn, AL 36849-5606
Phone: (334) 844-1716
Skype: ajtrehub
E-mail: treh...@auburn.edu
URL: http://lib.auburn.edu/

-Original Message-
From: Code for Libraries [mailto:CODE4LIB@listserv.nd.edu] On Behalf Of Joshua Welker
Sent: Friday, January 11, 2013 9:09 AM
To: CODE4LIB@listserv.nd.edu
Subject: Re: [CODE4LIB] Digital collection backups

[...]
[CODE4LIB] code4lib 2013 location
Hi all,

Apparently code4lib 2013 is going to be held at the UIC Forum http://www.uic.edu/depts/uicforum/

I assumed it would be at the conference hotel. This is just a note so that others do not make the same assumption, since nowhere in the information about the conference is the location made clear. Since the conference hotel is 1 mile from the venue, I assume transportation will be available.

best, Erik Hetzner

Sent from my free software system http://fsf.org/.
Re: [CODE4LIB] Digital collection backups
Thanks, I missed the part about DuraCloud as an abstraction layer. I might look into hosting an install of it on the primary server running the digitization platform.

Josh Welker

-Original Message-
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Tim Donohue
Sent: Friday, January 11, 2013 12:39 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] Digital collection backups

Hi all,

Just wanted to add some additional details about DuraCloud (mentioned earlier in this thread), in case it is of interest to anyone. DuraCloud essentially provides an abstraction layer (as previously mentioned) above several cloud storage providers. DuraCloud also provides additional preservation services to help manage your content in the cloud (e.g. integrity checks, replication across several storage providers, migration between storage providers, various health/status reports).

The currently supported cloud storage providers are:
- Amazon S3
- Rackspace
- SDSC

There are several other cloud storage providers that are beta-level or in development:
- Amazon Glacier (in development)
- Chronopolis (in development)
- Azure (beta)
- iRODS (beta)
- HP Cloud (beta)

DuraCloud is open source (so you could run it on your own server), but it is also offered as a hosted service (through DuraSpace, my employer). You can also try out the hosted service for free for two months.

For much more info, see:
- http://www.duracloud.org
- Pricing for the hosted service: http://duracloud.org/content/pricing (the pricing has dropped recently to reflect market changes)
- More technical info / documentation: https://wiki.duraspace.org/display/DURACLOUD/DuraCloud

If it's of interest, I can put folks in touch with the DuraCloud team for more info (or you can email i...@duracloud.org).

- Tim

--
Tim Donohue
Technical Lead for DSpace Project
DuraSpace.org
Re: [CODE4LIB] Digital collection backups
The only scenario I can think of where we'd need to do a full restore is if the server crashes, and for those cases we are going to have typical short-term imaging setups in place. Our needs beyond that are to make sure our original files are backed up redundantly in some non-volatile location, so that in the event a file on the local server becomes corrupt, we have a high-fidelity copy of the original on hand to restore it from. Since data decay presumably happens infrequently and over a long period of time, it's not important for us to be able to restore all the files at once. Like I said, if the server catches on fire and crashes, we have regular off-site tape-based storage to fix those short-term problems.

Josh Welker

-Original Message-
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Cary Gordon
Sent: Friday, January 11, 2013 10:27 AM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] Digital collection backups

[...]
Re: [CODE4LIB] Digital collection backups
Thanks for bringing up the issue of the cost of making sure the data is consistent. We will be using DSpace for now, and I know DSpace has some checksum functionality built in out of the box. It shouldn't be too difficult to write a script that loops through DSpace's checksum data and compares it against the files in Glacier. Reading the Glacier FAQ on Amazon's site, it looks like they provide an archive inventory (updated daily) that can be downloaded as JSON. I have read some users saying that this inventory includes checksum data. So hopefully it will just be a matter of comparing the local checksum to the Glacier checksum, and that would be easy enough to script.

Josh Welker

-Original Message-
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Ryan Eby
Sent: Friday, January 11, 2013 11:37 AM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] Digital collection backups

As Aaron alludes to, your decision should be based on your real needs, and the options might not be mutually exclusive. LOCKSS/MetaArchive might be worth the money if it is the community archival aspect you are going for. Depending on your institution, being a participant might make political/mission sense regardless of the storage needs, and it could be just a specific collection that makes sense.

Glacier is a great choice if you are looking to spread a backup across regions. S3 similarly, especially if you also want to benefit from CloudFront (the CDN setup) to take load off your institution's server (you can now use CloudFront off your own origin server as well). Depending on your bandwidth, this might be worth the money regardless of LOCKSS participation (which can be more dark). Amazon has also tended to drop prices over time rather than raise them, but as with any outsourcing you have to plan for the possibility that the service might not exist in the future.

Also look more closely at Glacier prices in terms of checking your data for consistency. There have been a few papers on the costs of making sure Amazon really has the proper data, depending on how often your requirements want you to check.

Another option, if you are just looking for more geographic placement, is finding an institution or service provider that will colocate. There may be another small institution that would love to shove a cheap box with hard drives on your network in exchange for the same. Not as involved/formal as LOCKSS, but it gives you something you control to satisfy your requirements. It could also be as low-tech as shipping SSDs to another institution who then runs some BagIt checksums on the drive, etc.

All of the above should be scriptable in your workflow. You just need to decide what you really want out of it.

Eby

On Fri, Jan 11, 2013 at 11:52 AM, Aaron Trehub treh...@auburn.edu wrote:

[...]
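The low-tech BagIt-style checksum run Eby mentions is only a few lines of scripting. A minimal sketch (the manifest name and "digest, two spaces, relative path" line format follow BagIt conventions, but this is an illustration, not a full BagIt implementation):

```python
# Write and verify a BagIt-style SHA-256 manifest for a directory tree.
# manifest-sha256.txt lines look like: "<hexdigest>  <relative path>"
import hashlib
from pathlib import Path

def sha256_file(path, chunk_size=1 << 20):
    """SHA-256 of a file, read in 1 MiB chunks to keep memory flat."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def write_manifest(root, manifest="manifest-sha256.txt"):
    """Checksum every file under root and record it in the manifest."""
    root = Path(root)
    lines = [f"{sha256_file(p)}  {p.relative_to(root)}"
             for p in sorted(root.rglob("*")) if p.is_file()]
    (root / manifest).write_text("\n".join(lines) + "\n")

def verify_manifest(root, manifest="manifest-sha256.txt"):
    """Return the paths whose current checksum no longer matches."""
    root = Path(root)
    failures = []
    for line in (root / manifest).read_text().splitlines():
        digest, rel = line.split("  ", 1)
        if sha256_file(root / rel) != digest:
            failures.append(rel)
    return failures
```

The receiving institution just runs `verify_manifest()` on the drive when it arrives; an empty result means every file still matches its recorded checksum.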
Re: [CODE4LIB] code4lib 2013 location
On Fri, Jan 11, 2013 at 10:41:54AM -0800, Erik Hetzner wrote:

[...]

That's a good assumption to make. As to the confusion: as I said when you asked me about this a couple of days ago, http://www.uic.edu/~kayiwa/code4lib.html was supposed to be our proposal. If you look at the document, it also suggests that we were going to have the conference registration staggered by time zones. We have elected not to update that page because it was our proposal. When preparing it we borrowed heavily from Yale's and IU's proposals, and if someone would like to steal from ours, I think it is fair to leave it as is. If you want the conference page, use the lanyrd.com link below. I can't even take credit for doing that; all of that goes to @pberry.

http://lanyrd.com/2013/c4l13/

Cheers,
./fxk

--
Speed is subsittute fo accurancy.
Re: [CODE4LIB] Digital collection backups
On Fri, Jan 11, 2013 at 07:45:21PM +0000, Joshua Welker wrote:

[...]

An important question to ask here, though, is whether that included checksum data is the same data Amazon uses to perform the systematic integrity checks they mention in the Glacier FAQ, or whether it's just catalog data --- "here's the checksum from when we put it in." This is always the question we run into when we consider services like this: can we tease enough information out to convince ourselves that their checking is sufficient?

--
Thomas L. Kula | tlk2...@columbia.edu
Systems Engineer | Library Information Technology Office
The Libraries, Columbia University in the City of New York
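One way to get an independent check, regardless of how Amazon computes its own, is to recompute Glacier's checksum locally: the tree hash Glacier reports for an archive is a documented construction (SHA-256 over 1 MiB chunks, then pairwise hashing of the digests up to a single root), so you can derive it from your local copy and compare it to the value in the daily inventory. A sketch:

```python
# Compute Amazon Glacier's SHA-256 "tree hash" for a file-like object:
# hash each 1 MiB chunk, then repeatedly hash concatenated pairs of
# digests (an odd digest is carried up unchanged) until one root remains.
import hashlib

MIB = 1024 * 1024

def tree_hash(fileobj):
    # Leaf level: SHA-256 binary digest of each 1 MiB chunk.
    digests = [hashlib.sha256(chunk).digest()
               for chunk in iter(lambda: fileobj.read(MIB), b"")]
    if not digests:  # degenerate case: empty input
        digests = [hashlib.sha256(b"").digest()]
    # Combine pairwise until a single root digest remains.
    while len(digests) > 1:
        pairs = [digests[i:i + 2] for i in range(0, len(digests), 2)]
        digests = [hashlib.sha256(b"".join(p)).digest() if len(p) == 2 else p[0]
                   for p in pairs]
    return digests[0].hex()
```

Comparing this against the tree-hash field in the inventory JSON tells you whether your local copy and the archived copy still agree; a mismatch means corruption on one side or the other, which is exactly the question being raised here.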
Re: [CODE4LIB] Digital collection backups
Hi Josh, Now that you bring up DSpace as being part of the equation... You might want to look at the newly released Replication Task Suite plugin/addon for DSpace (supports DSpace versions 1.8.x and 3.0): https://wiki.duraspace.org/display/DSPACE/ReplicationTaskSuite This DSpace plugin does essentially what you are talking about... It allows you to back up (i.e. replicate) DSpace content files and metadata (in the form of a set of AIPs, Archival Information Packages) to a local filesystem/drive or to cloud storage. Plus it provides an auditing tool to audit changes between DSpace and the cloud storage provider. Currently, the only cloud storage plugin we have created for the Replication Task Suite is for DuraCloud. But it wouldn't be too hard to create a new plugin for Glacier (if you wanted to send DSpace content directly to Glacier without DuraCloud in between). The code is in GitHub at: https://github.com/DSpace/dspace-replicate If you decide to use it and create anything cool, feel free to send us a pull request. Good luck, - Tim -- Tim Donohue Technical Lead for DSpace Project DuraSpace.org On 1/11/2013 1:45 PM, Joshua Welker wrote: [quoted message trimmed]
Josh Welker -Original Message- From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Ryan Eby Sent: Friday, January 11, 2013 11:37 AM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] Digital collection backups As Aaron alludes to, your decision should be based on your real needs, and the options might not be mutually exclusive. LOCKSS/MetaArchive might be worth the money if it is the community archival aspect you are going for. Depending on your institution, being a participant might make political/mission sense regardless of the storage needs, and it could just be a specific collection that makes sense. Glacier is a great choice if you are looking to spread a backup across regions. S3 similarly, if you also want to benefit from CloudFront (the CDN setup) to take load off your institution's server (you can now use CloudFront off your own origin server as well). Depending on your bandwidth, this might be worth the money regardless of LOCKSS participation (which can be more dark). Amazon also tends to drop prices over time rather than raise them, but as with any outsourcing you have to plan for the possibility that it might not exist in the future. Also look more at Glacier prices in terms of checking your data for consistency. There have been a few papers on the costs of making sure Amazon really has the proper data, depending on how often your requirements want you to check. Another option, if you are just looking for more geographic placement, is finding an institution or service provider that will colocate. There may be another small institution that would love to shove a cheap box with hard drives on your network in exchange for the same. Not as involved/formal as LOCKSS, but it gives you something you control to satisfy your requirements. It could also be as low tech as shipping SSDs to another institution who then runs some bagit checksums on the drive, etc. All of the above should be scriptable in your workflow. Just need to decide what you really want out of it.
Eby On Fri, Jan 11, 2013 at 11:52 AM, Aaron Trehub treh...@auburn.edu wrote: Hello Josh, Auburn University is a member of two Private LOCKSS Networks: the MetaArchive Cooperative and the Alabama Digital Preservation Network (ADPNet). Here's a link to a recent conference paper that describes both networks, including their current pricing structures: http://conference.ifla.org/past/ifla78/216-trehub-en.pdf LOCKSS has worked well for us so far, in part because supporting community-based solutions is important to us. As you point out, however, Glacier is an attractive alternative, especially for institutions that may be more interested in low-cost, low-throughput storage and less concerned about entrusting their content to a commercial outfit or having to pay extra to get it back out. As with most things, you pay your money--more or less, depending--and make your choice. And take your risks. Good luck with whatever solution(s) you decide on. They need not be mutually exclusive. Best, Aaron Aaron Trehub Assistant Dean for Technology and Technical Services Auburn University Libraries 231 Mell Street, RBD Library
[CODE4LIB] Job Posting / Metadata Specialist / Washington, DC
Apologies for the cross-postings... LAC Group is seeking a Metadata Specialist to work on a long-term contract for a prestigious government agency located in Washington, DC. This position includes reconciling existing schemas and vocabularies to create an enterprise schema and vocabulary using appropriate software tools. A successful candidate will have experience in crisp execution of projects; understand the role of standards, structure, content analysis, context, and user roles; and understand information lifecycle management. The candidate is expected to understand and help shape the organization's content strategy, including findability, discovery, usability, dissemination, etc., as well as basic information management principles. Responsibilities: § Develop schemas to standardize the business semantics of the agency; § Assist in developing content strategies to facilitate interoperability, exchange, findability, discovery, usability, etc.; § Evaluate software tools to manage the agency's business schema; § Manage projects crisply and produce deliverables on time and within budget.
Qualifications: § Master's Degree in Library and Information Science or equivalent work experience in business or non-profit sectors; § Significant expertise in developing/managing business semantics, business vocabularies, content strategies, business process analysis, systems analysis, etc.; § Experience in the areas of content modeling, content analysis, business process analysis, etc.; § Understanding of emerging information services/technology trends a plus; § Ability to balance business requests with users' growing needs in the development and growth of system taxonomies and metadata; § Experience with controlled reference sources; § Business analysis experience: attending requirement-gathering sessions with stakeholders, extracting information, and managing the requirements process; § Ability to build and maintain strong relationships with team members to meet organizational goals, along with a strong sense of urgency and excellent organizational and time-management skills; § Excellent analytical and communication skills; § Creative problem-solving abilities; § Ability to work effectively in a multicultural, multi-project environment and to respond immediately to often-changing business priorities; § Knowledge of a second language is a plus. For immediate consideration, apply at http://goo.gl/Bd0YB LAC Group is an Equal Opportunity/Affirmative Action employer and values diversity in the workforce. LAC Group is a premier provider of recruiting and consultancy services for information professionals at U.S. and global organizations including Fortune 100 companies, law firms, pharmaceutical companies, large academic institutions and prominent government agencies.
Re: [CODE4LIB] Digital collection backups
Awesome! Thanks. I will look into this for sure. Josh Welker -Original Message- From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Tim Donohue Sent: Friday, January 11, 2013 2:30 PM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] Digital collection backups [quoted message trimmed]
Re: [CODE4LIB] code4lib 2013 location
I'll take this opportunity to remind folks that if you spot anything amiss, please let me know (or sign up and fix it!) and I will clean it up. Thanks! On Fri, Jan 11, 2013 at 11:51 AM, Francis Kayiwa kay...@uic.edu wrote: [quoted message trimmed]
Re: [CODE4LIB] Digital collection backups
Be careful about assuming too much on this. When I started working with S3, the system required an MD5 sum to upload, and would respond to requests with this ETag in the header as well. I therefore assumed that this was integral to the system and was a good way to compare local files against the remote copies. Then, maybe a year or two ago, Amazon introduced chunked uploads, so that you could send files in pieces and reassemble them once they got to S3. This was good, because it eliminated problems with huge files failing to upload due to network hiccups. I went ahead and implemented it in my scripts. Then, all of a sudden, I started getting invalid checksums. It turns out that for multipart file uploads, they now create ETag identifiers that are not the MD5 sum of the underlying files. I now store the checksum as a separate piece of header metadata, and my sync script does periodically compare against this. But since this is just metadata, checking it doesn't really prove anything about the underlying file that Amazon has. To do that I would need to write a script that would actually retrieve the file and rerun the checksum. I have not done this yet, although it is on my to-do list. This would ideally happen on an Amazon server so that I wouldn't have to send the file back and forth. In any case, my main point is: don't assume that you can just check against a checksum from the API to verify a file for digital preservation purposes. -David __ David Dwiggins Systems Librarian/Archivist, Historic New England 141 Cambridge Street, Boston, MA 02114 (617) 994-5948 ddwigg...@historicnewengland.org http://www.historicnewengland.org Joshua Welker jwel...@sbuniv.edu 1/11/2013 2:45 PM [quoted message trimmed]
Re: [CODE4LIB] Digital collection backups
On Fri, Jan 11, 2013 at 2:45 PM, Joshua Welker jwel...@sbuniv.edu wrote: Reading the Glacier FAQ on Amazon's site, it looks like they provide an archive inventory (updated daily) that can be downloaded as JSON. I read some users saying that this inventory includes checksum data. So hopefully it will just be a matter of comparing the local checksum to the Glacier checksum, and that would be easy enough to script. One could also occasionally spin up local EC2 instances to do the checksums in the same data center, and ship just that metadata down - you would not incur any bulk transfer costs in that case (if memory serves). DAITSS uses both md5 and sha1 checksums in combination, other preservation systems might require similar. -Randy Fischer
Re: [CODE4LIB] XMP Metadata to tab-delemited file
Hi, Andrea, XMP is natively an RDF-based format, so getting out XML isn't hard at all. You have a couple of XML-based options with exiftool: exiftool -X foo.jpg # prints the metadata in exiftool's own RDF/XML schema to stdout exiftool -tagsfromfile foo.jpg -o foo.xmp # writes the metadata to an XMP XML file exiftool -tagsfromfile foo.jpg -o -.xmp # writes the metadata in XMP XML to stdout; only works in recentish versions of exiftool exiftool also has a CSV output that might be helpful to you; check `exiftool --help` for details on how that works. Misty On 13-01-10 11:32 AM, Medina-Smith, Andrea andrea.medina-sm...@nist.gov wrote: I can get the data out, and I can even get a single file created with all the metadata for all the images in the collection. It's just that it is unstructured and not useful as such. Anything XML would also be useful, but I haven't found a product that does that. I was really trying not to just call you up ;) -a __ _ Andrea Medina-Smith Metadata Librarian NIST Gaithersburg andrea.medina-sm...@nist.gov 301-975-2592 Be Green! Think before you print this email. -Original Message- From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of ddwigg...@historicnewengland.org Sent: Thursday, January 10, 2013 12:02 PM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] XMP Metadata to tab-delemited file ResourceSpace does this internally to extract metadata. I think it's as simple as exiftool -t -s imagefile.tif metadatafile.tab Does this do what you want? -DD __ David Dwiggins Systems Librarian/Archivist, Historic New England 141 Cambridge Street, Boston, MA 02114 (617) 994-5948 ddwigg...@historicnewengland.org http://www.historicnewengland.org Medina-Smith, Andrea andrea.medina-sm...@nist.gov 1/10/2013 10:57 AM Hello, I need to take XMP metadata that is embedded in TIF images and pull it out into a tab-delimited text file for ingest into our digital repository (CONTENTdm). Has anyone done this using exiftool or the like?
Thanks, A __ _ Andrea Medina-Smith Metadata Librarian NIST Gaithersburg andrea.medina-sm...@nist.gov 301-975-2592 Be Green! Think before you print this email.
Re: [CODE4LIB] code4lib 2013 location
I'm sorry, but that doesn't actually clear up anything for me. The location on the Lanyrd page just says Chicago. So, is the conference still happening at UIC? Since the conference hotel isn't super close, does that mean there will be transportation provided? While we're on the subject, are the pre-conferences happening at the same location? On Fri, Jan 11, 2013 at 2:51 PM, Francis Kayiwa kay...@uic.edu wrote: [quoted message trimmed]
Re: [CODE4LIB] code4lib 2013 location
So, the location field on Lanyrd is not super specific. It likes big things, like cities. On the reg page for code4lib, the UIC Forum is listed as the location though. http://www.regonline.com/builder/site/Default.aspx?EventID=1167723 I'll see if I can put that somewhere in the event on Lanyrd. Cheers, Pat On Fri, Jan 11, 2013 at 3:41 PM, Cynthia Ng cynthia.s...@gmail.com wrote: [quoted message trimmed]
[CODE4LIB] Job: Tools Lab Operations Engineer (Contractor) at Wikimedia Foundation
**Background Information** The Wikimedia Foundation, Inc. is a nonprofit charitable organization dedicated to the growth, development and distribution of free, multilingual content, and to providing the full content of these wiki-based projects to the public free of charge. The Wikimedia Foundation operates some of the largest collaboratively edited reference projects in the world, including Wikipedia, a top-ten internet property. **Statement of Purpose** The Technical Operations team of the Wikimedia Foundation is embarking on a new project to build a flexible and scalable lab infrastructure for our community and volunteers, to support their effort to prototype, develop, test and deploy their tools and extensions. Some of the uses of the Wikimedia Labs infrastructure are for: * Deployment of volunteer-created tools which are independent of MediaWiki, e.g. edit counters, mentoring database, geographic information about articles etc. (essentially the kind of things currently running on the Toolserver) * Prototyping and staging of WMF-developed MediaWiki code * Prototyping and staging of volunteer or chapter-developed MediaWiki code * Development and deployment of new site architecture by staff and volunteers in a code-reviewed, devops-oriented environment * Access for researchers (WMF or external) to live database replication or large datasets, as well as computing resources, for the purpose of running analyses * Serving as an execution and hosting space for bots, so that bots can be more systematically developed and tracked Two full-time Wikimedia Foundation operations engineers are currently building Wikimedia Labs. **Scope of Work** Wikimedia is looking for a contractor whose primary focus will be to assist the community developers to migrate their tools to this new Labs infrastructure, especially those residing in Toolserver today. 
In addition, this person will: * Support enhancement and perform operational duties of the Labs Virtualization Project using OpenStack and LAMP-stack technology. Duties include developing, deploying and supporting tools to provision and manage large networks of virtual machines, creating a redundant and scalable cloud computing platform * Set up monitoring systems * Provide system and database administration duties for the Labs environment **Outcome and Performance Standards** You are expected to work about 40 hours a week, on average. During these (flexible) hours you are required to be available online for collaboration with the (international) Foundation team. Outside these hours, you may incidentally be contacted for emergencies (e.g. during system outages). You will report to the Director of Operations, and will work closely with Operations staff, the Engineering Community Team, and the Toolserver community. Besides maintaining regular communication with your point of contact, you may need to participate in bi-weekly online Operations meetings with the rest of the team. There will be milestone check-ins with the Foundation to discuss progress and activities. You must be willing to travel occasionally for international meetings, as well as to perform your duties. **Term of Contract** Your initial contract will be for a duration of 6 months, and will commence as soon as possible. Renegotiation at the termination of the contract is optional. **Payments, Incentives, and Penalties** Rate will be determined by level of experience and expertise. 
**Contractual Terms and Conditions and Required Qualifications** Respondent parties are expected to: * Have 5+ years of hands-on and strong knowledge of LAMP-stack system administration * Be competent in programming and scripting languages like PHP, Python and bash * Be able to work independently where needed, and work remotely as part of a globally distributed team * Be comfortable in a highly collaborative, consensus-oriented environment * Be a proficient speaker in the English language Furthermore: * Prior work experience in creating provisioning tools is a plus * Prior work experience integrating different types of services together, e.g., LDAP, Puppet and MediaWiki is a plus * Experience with virtualization technologies such as OpenStack or Ganeti is a plus * Experience with clustered filesystems such as GlusterFS or Swift is a plus * Experience with high-traffic web site operations is a plus * Experience with MySQL database administration is a plus * Experience with the Solaris UNIX operating system and Sun Grid Engine is a plus * Understanding of the free culture movement, especially Wikimedia, is a plus The ideal candidate will be creative, highly motivated, and able to operate effectively in multiple cultural contexts. Candidates do not have to live in the San Francisco Bay Area or the USA; remote candidates are welcome. Brought to you by code4lib jobs:
Re: [CODE4LIB] code4lib 2013 location
On Fri, Jan 11, 2013 at 06:41:26PM -0500, Cynthia Ng wrote: I'm sorry, but that doesn't actually clear up anything for me. The location on the Lanyrd page just says Chicago. So, is the conference still happening at UIC? Since the conference hotel isn't super close, does that mean there will be transportation provided? The entire conference and pre-conference is at UIC. The Forum is a revenue-generating part of UIC. The pre-conferences will be at the University Libraries on Monday, with the exception of the Drupal one. The hotel is a mile or thereabouts from the UIC Forum. Here is the problem with us natives doing the planning: it never crossed our minds that walking a mile, which is at the *upper limit* of our own shuttling to and from work, is not the norm for everyone. This was brought to our attention, and we will have a shuttle from the hotel to the conference venue. While we're on the subject, are the pre-conferences happening at the same location? See above. ./fxk On Fri, Jan 11, 2013 at 2:51 PM, Francis Kayiwa kay...@uic.edu wrote: [quoted message trimmed] -- Speed is subsittute fo accurancy.
Re: [CODE4LIB] code4lib 2013 location
Because it seems like it might be useful, I've started a publicly-editable google map at http://goo.gl/maps/LWqay

Right now, it has two points: the hotel and the conference location. Please add stuff as appropriate if the urge strikes you.

-- Bill Dueber Library Systems Programmer University of Michigan Library
Re: [CODE4LIB] code4lib 2013 location
It takes about 15 minutes to walk a mile. It's really not that far for people without health problems that affect mobility. In most cases, driving, then parking will take more time than walking to cover such a short distance. Just saying...

-Wilhelmina Randtke
Re: [CODE4LIB] code4lib 2013 location
FWIW, the #8 bus runs every 10 min.

Cary

-- Cary Gordon The Cherry Hill Company http://chillco.com
Re: [CODE4LIB] code4lib 2013 location
On Fri, Jan 11, 2013 at 05:51:17PM -0800, Cary Gordon wrote:

FWIW, the #8 bus runs every 10 min.

Good point. It may also be worth your while to get the 3-day pass for US$14 http://www.transitchicago.com/travel_information/fares/unlimitedridecards.aspx. Not for traveling to the conference, but for any other travel you may want to do while in town.

./fxk

-- Speed is subsittute fo accurancy.
Re: [CODE4LIB] code4lib 2013 location
Gah, I think I forgot to announce this on the list, but there's also this google map: https://maps.google.com/maps/ms?msid=213549257652679418473.0004ce6c25e6cdeb0319dmsa=0 which I put on the social page http://wiki.code4lib.org/index.php/2013_social_activities

I'll go ahead and add the hotel and conference site to that as well if it's not already there.