Re: [CODE4LIB] digital storage
Thanks to all of you who answered. Crowdsourcing does work if you pick the right crowd. We have been looking at the S3 possibility, but I agree this would have to be a second copy. The policy and institutional support comments from my tocayo (see http://en.wiktionary.org/wiki/tocayo) seem especially appropriate. I am going to include a link on our staff blog to this thread as a resource.

Thanks again,
Edward Iglesias

On Thu, Aug 27, 2009 at 8:59 PM, Edward M. Corrado ecorr...@ecorrado.us wrote:

Joe Atzberger wrote:

On Thu, Aug 27, 2009 at 4:25 PM, Edward M. Corrado ecorr...@ecorrado.us wrote:

Nate Vack wrote:

On Thu, Aug 27, 2009 at 1:57 PM, Ryan Ordway rord...@oregonstate.edu wrote:

$213,360 over 3 years

If you're ONLY looking at storage costs, SATA drives in enterprise RAID systems range from about $1.00/GB to about $1.25/GB for online storage.

Yeah -- but if you're looking only at storage costs, you'll have an inaccurate estimate of your costs. You've got power, cooling, sysadmin time, and replacements for failed disks. If you want an apples-to-apples comparison, you'll want an offsite mirror, as well. I'm not saying S3 is always cost-effective -- but in our experience, the cost of the disks themselves is dwarfed by the costs of the related infrastructure.

I agree that the cost of storage is only one factor. I have to wonder, though, how much more staff time do you need for local storage than cloud storage? I don't know the answer, but I'm not sure it is much more than setting up S3 storage, especially if you have a good partnership with your storage vendor.

Support relationships, especially regarding storage, are very costly. When I worked at a midsize datacenter, we implemented a backup solution with STORServer and Tivoli. Both hardware and software were considerably costly. Initial and ongoing support, while indispensable, was basically as much as the cost of the hardware every few years.
They can be, depending on what you are doing and what choices on software you make, but for long-term preservation purposes they don't have to be nearly as expensive as what Ryan calculated S3 to cost. If you shop around you can get a quality 36TB array with a 3-year warranty for, say, $30,000; that is almost $180,000 less than S3 (probably much less; I'm being less than generous with my Sun discounts and only briefly looked at their prices). Even if we double that cost to cover support, it is still over $50,000 a year less for 3 years. Yes, we might need some expertise, but running a 36TB preservation storage array is not a $50,000 a year job and besides, what is wrong with growing local expertise?

...

Yes, maybe you save on staff time patching software on your storage array, but that is not a significant amount of time - esp. since you are still going to have some local storage, and there isn't much difference in staff time in doing 2 TB vs. 20 TB.

There's a real difference. I can get 2 TB in a single HDD, for example this one for $200 at NewEgg: http://www.newegg.com/Product/Product.aspx?Item=N82E16822148413 Any high school kid can install that. 20 TB requires some kind of additional structure and additional expertise.

Well, building a 20 TB storage device and getting it to work can actually be very cheap and doesn't require a PhD (just a local GNU/Linux geek who likes to play with hardware) if you are OK with a home-grown solution. I wouldn't be satisfied with that, but I don't see how a commercial offering adds up to $150,000 worth of expertise and infrastructure.

You may save some time on the initial configuration, but you still need to configure cloud storage. Is cloud storage that much easier/less time-consuming to configure than an iSCSI device? Replacement for disks would be covered under your warranty or support contract (at least I would hope you would have one).
Warranties expire and force you into ill-timed, hardly-afforded and dangerous-to-your-data upgrades. Sorta like some ILS systems with which we are all familiar.

Yes, some application upgrades can cause issues, but how is that different if your application and/or storage is in a cloud?

The cloud doesn't necessarily stay the same, but the part you care about (data in, data out) does.

How do you know they won't change their cloud models? And you don't even have a warranty with the cloud. They won't even guarantee they won't delete your data. As long as you use a common standards-based method of storage, you won't have any more issues getting it to work than you will getting future application servers to work with the cloud. While I'm not a huge fan of NFS, I've been using it for many years with no problems due to changes in NFS or operating systems or hardware. NFS has been available to the public for about 20 years. Occasionally you may need to migrate it
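
Editorial note: the cost comparison argued in the message above can be checked with a few lines of arithmetic. The sketch below uses only the figures quoted in the thread ($213,360 for S3 over 3 years, $30,000 for a 36TB array, and the "double the cost for support" rule of thumb). The function and variable names are illustrative, and power, cooling, staff time and an offsite mirror are deliberately left out, as they are in the original estimate.

```python
from typing import Tuple

# Figures quoted in the thread; nothing here is a real price list.
S3_COST_3YR = 213_360   # Ryan Ordway's S3 estimate over 3 years
ARRAY_COST = 30_000     # 36TB array with a 3-year warranty
YEARS = 3


def savings(s3_total: int, local_total: int, years: int) -> Tuple[int, float]:
    """Return (total savings, savings per year) of local storage versus S3."""
    total = s3_total - local_total
    return total, total / years


# Hardware only, then hardware with the cost doubled to cover support.
hardware_only = savings(S3_COST_3YR, ARRAY_COST, YEARS)
with_support = savings(S3_COST_3YR, 2 * ARRAY_COST, YEARS)

print(hardware_only)  # (183360, 61120.0)
print(with_support)   # (153360, 51120.0)
```

With the cost doubled for support, the difference is $51,120 a year, which matches the "over $50,000 a year less" claim. The hardware-only difference comes to $183,360, slightly more than the "almost $180,000" stated.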
Re: [CODE4LIB] digital storage
Related to our discussion: http://online.wsj.com/article/SB125139942345664387.html

I particularly like the quote at the end: "Digital information lasts forever -- or five years, whichever comes first," says RAND Corp. computer analyst Jeff Rothenberg.

Tim McGeary
Team Leader, Library Technology
Lehigh University
610-758-4998
tim.mcge...@lehigh.edu
Google Talk: timmcgeary
Yahoo IM: timmcgeary
[CODE4LIB] find in page, diacritics, etc
Hi Folks,

Looking for help/perspectives. Has anyone got any clever solutions for allowing folks to find a word with diacritics in a rendered web page, regardless of whether the user types it with or without diacritics? In indexes this is usually solved by indexing the word both with and without diacritics, so the user gets what they want regardless of how they search.

Thanks in advance for any ideas/enlightenment,
Tim
Re: [CODE4LIB] find in page, diacritics, etc
Something like the jquery highlight function combined with this kind of mapping: http://stackoverflow.com/questions/863800/replacing-diacritics-in-javascript

If you don't mind, you can speed things up by forcing the comparison sets to be in one case or the other.

--Joe
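
Editorial note: the mapping idea suggested above can be sketched without a hand-written character table by using Unicode normalization. This is a minimal sketch in Python rather than the browser-side JavaScript the thread has in mind, and the names `fold` and `matching_words` are hypothetical. It decomposes each character, drops the combining marks, and forces everything to one case, then compares the folded query with the folded words of the page text. Letters that have no decomposition (for example "ø") are not handled by this approach and would still need an explicit mapping.

```python
import re
import unicodedata
from typing import List


def fold(text: str) -> str:
    """Strip combining marks (accents, cedillas) and force a single case."""
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


def matching_words(page_text: str, query: str) -> List[str]:
    """Return the words of page_text whose folded form contains the folded query."""
    target = fold(query)
    # Compose first so that a letter and its accent stay inside one \w+ token.
    words = re.findall(r"\w+", unicodedata.normalize("NFC", page_text))
    return [word for word in words if target in fold(word)]


print(matching_words("La plaça major and the placa sign", "placa"))
```

Both "plaça" and "placa" are returned whether the query is typed with or without the cedilla; a highlighter would then mark up the returned words in the page.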
Re: [CODE4LIB] find in page, diacritics, etc
Hi, Tim. Are you referring to a find in page, where a user presses CTRL-F in the browser? If so, it will depend on the browser. Google Chrome 2.0 will find matches regardless of the diacritics (i.e. the user can type placa and it matches plaça, and vice versa). This doesn't seem to work in Firefox 3.0.13 or IE8.

Keith
Re: [CODE4LIB] find in page, diacritics, etc
Are you referring to a find in page, where a user presses CTRL-F in the browser?

Yes, sorry to be unclear.

If so, it will depend on the browser. Google Chrome 2.0 will find matches regardless of the diacritics (i.e. the user can type placa and it matches plaça, and vice versa). This doesn't seem to work in Firefox 3.0.13 or IE8.

Exactly, and FF and IE are the most common browsers we're seeing. I was wondering if someone (I know this sounds crazy) has explored the idea of marking up the non-diacritic version of the word inline, in a span styled in such a way as to make it findable but not intrusive.

-t
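
Editorial note: the span idea floated above could be prototyped on the server side roughly as follows. This is a sketch under stated assumptions, not a tested solution: the function names and the CSS class `folded` are hypothetical, and the thread does not establish whether any browser's find-in-page will match text in a span styled to be unobtrusive (text hidden with `display: none` is generally not searched). The code only shows how the markup could be generated: every word that contains diacritics is followed by a span holding its plain form.

```python
import html
import re
import unicodedata


def strip_marks(word: str) -> str:
    """Return word without combining marks, recomposed to NFC."""
    decomposed = unicodedata.normalize("NFD", word)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return unicodedata.normalize("NFC", stripped)


def annotate(text: str, css_class: str = "folded") -> str:
    """Escape text as HTML, adding a span with the plain form after accented words."""
    # With one capture group, re.split puts the words at the odd indices.
    pieces = re.split(r"(\w+)", unicodedata.normalize("NFC", text))
    out = []
    for index, piece in enumerate(pieces):
        out.append(html.escape(piece))
        if index % 2 == 1:
            plain = strip_marks(piece)
            if plain != piece:
                out.append(f'<span class="{css_class}">{html.escape(plain)}</span>')
    return "".join(out)


print(annotate("plaça major"))
```

For the input "plaça major" this emits the word "plaça" followed by a span containing "placa", and leaves "major" alone. How to style the span so that it stays searchable without disturbing readers or screen readers is the open question.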
Re: [CODE4LIB] find in page, diacritics, etc
This sounds like a great idea for a Firefox plugin...

Ken
Re: [CODE4LIB] find in page, diacritics, etc
On Fri, 28 Aug 2009, Keith Jenkins wrote:

Hi, Tim. Are you referring to a find in page, where a user presses CTRL-F in the browser? If so, it will depend on the browser. Google Chrome 2.0 will find matches regardless of the diacritics (i.e. the user can type placa and it matches plaça, and vice versa). This doesn't seem to work in Firefox 3.0.13 or IE8.

Works in Safari 4.0.3, fails in Firefox 3.5.2.

Here's a page to test with, using the correct (non-americanized) spelling of my name: http://www.frbr.org/2009/01/15/hourcle-frbr-applied-to-scientific-data

As for tricks to get it to work -- the closest HTML tag that I can think of is 'ABBR', which isn't quite right:

<ABBR lang='fr' title='Hourcle'>Hourcl&eacute;</ABBR>

... and a quick test in Firefox 3.5 shows it doesn't help.

-Joe

(and no, it's not French, but it's a French spelling, so it'd clue the pronunciation for screen readers correctly)
Re: [CODE4LIB] digital storage
Edward Iglesias wrote:

Thanks to all of you who answered. Crowdsourcing does work if you pick the right crowd. We have been looking at the S3 possibility, but I agree this would have to be a second copy.

There has been lots of talk about hardware platforms but not much about software to manage all that data. You might want to take a look at DuraCloud from DuraSpace (http://duraspace.org/duracloud.php). DuraSpace is the recently created organization from the merger of Fedora and DSpace. When I first started at Cornell 12 years ago I sat next to Sandy Payette, who is now the CEO of DuraSpace.