Re: [CODE4LIB] digital storage

2009-08-28 Thread Edward Iglesias
Thanks to all of you who answered.  Crowdsourcing does work if you
pick the right crowd.  We have been looking at the S3 possibility but
I agree this would have to be a second copy.  The policy and
institutional support comments from my tocayo
(see http://en.wiktionary.org/wiki/tocayo) seem especially
appropriate.  I am going to include a link on our staff blog to
this thread as a resource.

Thanks again,

Edward Iglesias



On Thu, Aug 27, 2009 at 8:59 PM, Edward M. Corrado ecorr...@ecorrado.us wrote:
 Joe Atzberger wrote:

 On Thu, Aug 27, 2009 at 4:25 PM, Edward M. Corrado
 ecorr...@ecorrado.us wrote:

 Nate Vack wrote:

 On Thu, Aug 27, 2009 at 1:57 PM, Ryan Ordway rord...@oregonstate.edu
 wrote:

 $213,360 over 3 years

 If you're ONLY looking at storage costs, SATA drives in enterprise RAID
 systems range from about $1.00/GB to about $1.25/GB for online storage.

 Yeah -- but if you're looking only at storage costs, you'll have an
 inaccurate estimate of your costs. You've got power, cooling, sysadmin
 time, and replacements for failed disks. If you want an
 apples-to-apples comparison, you'll want an offsite mirror, as well.

 I'm not saying S3 is always cost-effective -- but in our experience,
 the cost of the disks themselves is dwarfed by the cost of the
 related infrastructure.
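Nate's point is that a fair comparison has to total every cost line, not just $/GB. A minimal Python sketch of that bookkeeping follows. The class and function names are hypothetical, and apart from the $1.00 to $1.25/GB range quoted above, any numbers you plug in are placeholders rather than figures from this thread.

```python
from dataclasses import dataclass


@dataclass
class LocalStorageCosts:
    """Hypothetical cost inputs for locally hosted storage (placeholders)."""
    capacity_gb: float
    price_per_gb: float               # e.g. 1.00 to 1.25 for enterprise SATA RAID
    power_per_year: float             # running costs are totals across all copies
    cooling_per_year: float
    sysadmin_per_year: float
    disk_replacement_per_year: float
    offsite_mirror: bool = True       # second copy, for an apples-to-apples comparison


def disks_only(c: LocalStorageCosts) -> float:
    """The naive estimate: raw capacity times price per GB, one copy."""
    return c.capacity_gb * c.price_per_gb


def total_cost(c: LocalStorageCosts, years: int) -> float:
    """Hardware for every copy plus all running costs over `years`."""
    copies = 2 if c.offsite_mirror else 1
    hardware = disks_only(c) * copies
    running = (c.power_per_year + c.cooling_per_year
               + c.sysadmin_per_year + c.disk_replacement_per_year) * years
    return hardware + running


if __name__ == "__main__":
    # Placeholder inputs, purely illustrative.
    example = LocalStorageCosts(capacity_gb=1000, price_per_gb=1.00,
                                power_per_year=100, cooling_per_year=50,
                                sysadmin_per_year=1000,
                                disk_replacement_per_year=50)
    print(disks_only(example), total_cost(example, years=3))
```

Even with small placeholder numbers the running costs can outweigh the disks, which is the shape of the argument; real figures would have to come from your own institution.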

 I agree that the cost of storage is only one factor. I have to wonder,
 though: how much more staff time do you need for local storage than for
 cloud storage? I don't know the answer, but I'm not sure it is much more
 than setting up S3 storage, especially if you have a good partnership with
 your storage vendor.



 Support relationships, especially regarding storage, are very costly.  When
 I worked at a midsize datacenter, we implemented a backup solution with
 STORServer and Tivoli.  Both the hardware and the software were quite
 expensive.  Initial and ongoing support, while indispensable, cost basically
 as much as the hardware every few years.


 They can be, depending on what you are doing and what software you
 choose, but for long-term preservation purposes they don't have to be nearly
 as expensive as what Ryan calculated S3 to cost. If you shop around you can
 get a quality 36TB array with a 3-year warranty for, say, $30,000 (probably
 much less; I'm being less than generous with my Sun discounts and only
 briefly looked at their prices). That is roughly $180,000 less than S3.
 Even if we double your cost for support, it is still over $50,000 a year
 less for 3 years. Yes, we might need some expertise, but running a 36TB
 preservation storage array is not a $50,000-a-year job, and besides, what
 is wrong with growing local expertise?
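The back-of-the-envelope comparison above can be checked with a few lines of Python. The constants come straight from the thread (Ryan's $213,360 S3 estimate and the $30,000 array). Two things are assumptions: "double your cost for support" is modelled as a multiplier of 2 on the array price, and the $/GB check counts 1 TB as 1,000 GB.

```python
from typing import Tuple

S3_THREE_YEAR_COST = 213_360  # Ryan's S3 estimate over 3 years
ARRAY_COST = 30_000           # 36 TB array with a 3-year warranty, as quoted
YEARS = 3


def savings_vs_s3(support_multiplier: float = 1) -> Tuple[float, float]:
    """Return (3-year savings, per-year savings) of the local array vs. S3.

    support_multiplier=2 models "double your cost for support".
    """
    local_total = ARRAY_COST * support_multiplier
    total = S3_THREE_YEAR_COST - local_total
    return total, total / YEARS


def raw_disk_cost(capacity_tb: float, price_per_gb: float) -> float:
    """Raw disk cost using decimal units (1 TB = 1,000 GB)."""
    return capacity_tb * 1000 * price_per_gb


if __name__ == "__main__":
    print(savings_vs_s3())     # (183360, 61120.0): hardware only
    print(savings_vs_s3(2))    # (153360, 51120.0): support doubles the cost
    print(raw_disk_cost(36, 1.00), raw_disk_cost(36, 1.25))  # 36000.0 45000.0
```

The results line up with the message: about $183,000 saved on hardware alone and about $51,000 a year with support doubled. The $1.00 to $1.25/GB range quoted earlier puts raw disks for 36 TB at $36,000 to $45,000, so the $30,000 figure sits below that range, consistent with the mention of Sun discounts.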

 ...

 Yes, maybe you save on staff time patching software on your storage
 array, but that is not a significant amount of time, especially since you
 are still going to have some local storage, and there isn't much difference
 in staff time in doing 2 TB vs. 20 TB.



 There's a real difference.  I can get 2 TB in a single HDD, for example
 this one for $200 at NewEgg:
 http://www.newegg.com/Product/Product.aspx?Item=N82E16822148413

 Any high school kid can install that.  20 TB requires some kind of
 additional structure and additional expertise.
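To put numbers on the 2 TB versus 20 TB difference, here is a rough Python sketch that counts drives. The RAID levels, and the choice to ignore hot spares and filesystem overhead, are assumptions; the thread doesn't say what redundancy anyone would use.

```python
import math


def drives_needed(usable_tb: float, drive_tb: float, raid: str = "raid6") -> int:
    """Rough count of drives needed for `usable_tb` of usable space.

    Ignores hot spares, filesystem overhead and TB/TiB differences.
    RAID 5 gives up one drive's capacity to parity, RAID 6 two,
    and RAID 10 mirrors every drive.
    """
    data_drives = math.ceil(usable_tb / drive_tb)
    if raid == "none":
        return data_drives
    if raid == "raid5":
        return max(data_drives + 1, 3)   # RAID 5 needs at least 3 drives
    if raid == "raid6":
        return max(data_drives + 2, 4)   # RAID 6 needs at least 4 drives
    if raid == "raid10":
        return max(data_drives * 2, 4)   # mirrored pairs, at least 4 drives
    raise ValueError(f"unknown RAID level: {raid!r}")


if __name__ == "__main__":
    print(drives_needed(2, 2, "none"))    # 1: a single 2 TB drive
    print(drives_needed(20, 2, "raid6"))  # 12
```

A dozen drives means a multi-bay enclosure and a RAID controller, which is the kind of additional structure mentioned above.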


 Well, building a 20 TB storage device and getting it to work can actually be
 very cheap and doesn't require a PhD (just a local GNU/Linux geek who likes
 to play with hardware) if you are OK with a home-grown solution. I wouldn't
 be satisfied with that, but I don't see how a commercial offering adds
 up to $150,000 worth of expertise and infrastructure.

 You may save some time on the initial configuration, but you still need to
 configure cloud storage. Is cloud storage that much easier or less
 time-consuming to configure than an iSCSI device? Replacement of disks would
 be covered under your warranty or support contract (at least I would hope
 you would have one).



 Warranties expire and force you into ill-timed, hardly-afforded and
 dangerous-to-your-data upgrades.  Sorta like some ILS systems with which we
 are all familiar.

 Yes, some application upgrades can cause issues, but how is that different if
 your application and/or storage is in a cloud?

  The cloud doesn't necessarily stay the same, but the part
 you care about (data in, data out) does.


 How do you know they won't change their cloud models? And you don't even
 have a warranty with the cloud. They won't even guarantee they won't delete
 your data.

 As long as you use a common, standards-based method of storage, you won't
 have any more issues getting it to work than you will getting future
 application servers to work with the cloud. While I'm not a huge fan of NFS,
 I've been using it for many years with no problems due to changes in NFS,
 operating systems, or hardware. NFS has been available to the public for
 about 20 years. Occasionally you may need to migrate it

Re: [CODE4LIB] digital storage

2009-08-28 Thread Tim McGeary

Related to our discussion:
http://online.wsj.com/article/SB125139942345664387.html

I particularly like the quote at the end:


Digital information lasts forever -- or five years, says RAND Corp.
computer analyst Jeff Rothenberg, whichever comes first.


Tim McGeary
Team Leader, Library Technology
Lehigh University
610-758-4998
tim.mcge...@lehigh.edu
Google Talk: timmcgeary
Yahoo IM: timmcgeary


[CODE4LIB] find in page, diacritics, etc

2009-08-28 Thread Tim Shearer

Hi Folks,

Looking for help/perspectives.

Anyone got any clever solutions for allowing folks to find a word with
diacritics in a rendered web page regardless of whether the user
searches with or without diacritics?


In indexes this is usually solved by indexing the word with and without, 
so the user gets what they want regardless of how they search.
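For the index side, one common approach (a swapped-in technique, not something described in this thread) is Unicode canonical decomposition. It splits each accented letter into a base letter plus combining marks; you drop the marks and index both the original and the stripped form. Below is a minimal Python sketch using only the standard library's unicodedata module. The function names are hypothetical.

```python
import unicodedata
from typing import Set


def strip_diacritics(text: str) -> str:
    """Drop combining marks after NFD decomposition, e.g. 'plaça' -> 'placa'.

    Letters with no canonical decomposition (such as 'ø', 'ł' or 'ß') are
    left unchanged and would need an explicit mapping table.
    """
    decomposed = unicodedata.normalize("NFD", text)
    base_only = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return unicodedata.normalize("NFC", base_only)


def index_terms(word: str) -> Set[str]:
    """Index terms for `word`: the word as written plus its unaccented form."""
    return {word, strip_diacritics(word)}


if __name__ == "__main__":
    print(sorted(index_terms("plaça")))
```

Folding the query the same way at search time then lets either spelling hit either indexed form.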


Thanks in advance for any ideas/enlightenment,
Tim


Re: [CODE4LIB] find in page, diacritics, etc

2009-08-28 Thread Joe Atzberger
Something like the jQuery highlight function combined with this kind of
mapping:

http://stackoverflow.com/questions/863800/replacing-diacritics-in-javascript

If you don't mind case-insensitive matching, you can speed things up by
forcing the comparison sets to be in one case or the other.
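The suggestion above combines a highlight function, a diacritic-mapping table, and case folding. In practice that means comparing folded copies of the page text and the query, while remembering where each folded character came from, so a match can be highlighted in the original text. A minimal Python sketch of that idea follows. It swaps Unicode decomposition in for a hand-built mapping table, the function names are hypothetical, and it is neither the jQuery plugin nor the Stack Overflow code.

```python
import unicodedata
from typing import List, Tuple


def fold(ch: str) -> str:
    """Lower-case one character and drop its combining marks ('Ç' -> 'c')."""
    decomposed = unicodedata.normalize("NFD", ch)
    return "".join(c for c in decomposed if not unicodedata.combining(c)).lower()


def find_folded(text: str, query: str) -> List[Tuple[int, int]]:
    """(start, end) offsets in `text` where `query` matches, ignoring case
    and diacritics on both sides. Precomposed (NFC) text works best."""
    folded_chars: List[str] = []
    origin: List[int] = []  # origin[k] = index in `text` that folded char k came from
    for i, ch in enumerate(text):
        for f in fold(ch):
            folded_chars.append(f)
            origin.append(i)
    haystack = "".join(folded_chars)
    needle = "".join(fold(ch) for ch in query)
    matches: List[Tuple[int, int]] = []
    if not needle:
        return matches
    pos = haystack.find(needle)
    while pos != -1:
        end = pos + len(needle)
        matches.append((origin[pos], origin[end - 1] + 1))
        pos = haystack.find(needle, pos + 1)
    return matches


if __name__ == "__main__":
    page = "La plaça Major"
    for start, end in find_folded(page, "PLACA"):
        print(page[start:end])  # prints "plaça"
```

In a page script, the same offsets would determine which stretch of original text gets wrapped in a highlight element.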
--Joe

On Fri, Aug 28, 2009 at 12:17 PM, Tim Shearer sh...@ils.unc.edu wrote:

 Hi Folks,

 Looking for help/perspectives.

 Anyone got any clever solutions for allowing folks to find a word with
 diacritics in a rendered web page regardless of whether or not the user
 tries with or without diacritics.

 In indexes this is usually solved by indexing the word with and without, so
 the user gets what they want regardless of how they search.

 Thanks in advance for any ideas/enlightenment,
 Tim



Re: [CODE4LIB] find in page, diacritics, etc

2009-08-28 Thread Keith Jenkins
Hi, Tim.

Are you referring to a find in page, where a user presses CTRL-F
in the browser?

If so, it will depend on the browser.  Google Chrome 2.0 will find
matches regardless of the diacritics (i.e. user can type placa and
it matches plaça, and vice versa).  This doesn't seem to work in
Firefox 3.0.13 or IE8.

Keith





Re: [CODE4LIB] find in page, diacritics, etc

2009-08-28 Thread Tim Shearer

Are you referring to a find in page, where a user presses CTRL-F
in the browser?


Yes, sorry to be unclear.


If so, it will depend on the browser.  Google Chrome 2.0 will find
matches regardless of the diacritics (i.e. user can type placa and
it matches plaça, and vice versa).  This doesn't seem to work in
Firefox 3.0.13 or IE8.


Exactly, and FF and IE are the most common browsers we're seeing.

I was wondering if someone (I know this sounds crazy) has explored the 
idea of marking up the non-diacritic inline version of the word in a span 
styled in such a way as to make it findable but not intrusive.
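The inline, visually unobtrusive undiacritized copy proposed here could be generated server-side. Below is a hedged Python sketch that assumes the stripped form comes from Unicode decomposition. The class name findable-variant and its CSS are hypothetical, and whether any browser's find-in-page matches text styled this way has not been tested here.

```python
import html
import unicodedata

# Hypothetical class name; the CSS is one common "visually hidden" pattern.
# Whether Firefox or IE find-in-page matches text styled this way is untested.
FINDABLE_VARIANT_CSS = (
    ".findable-variant { position: absolute; width: 1px; height: 1px; "
    "overflow: hidden; clip: rect(0 0 0 0); }"
)


def strip_diacritics(text: str) -> str:
    """Remove combining marks after NFD decomposition ('plaça' -> 'placa')."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def with_findable_variant(word: str) -> str:
    """HTML for `word`, plus a hidden span with its unaccented form if different.

    aria-hidden keeps screen readers from reading the word twice.
    """
    plain = strip_diacritics(word)
    if plain == word:
        return html.escape(word)
    return (f'{html.escape(word)}<span class="findable-variant" '
            f'aria-hidden="true">{html.escape(plain)}</span>')


if __name__ == "__main__":
    print(f"<style>{FINDABLE_VARIANT_CSS}</style>")
    print(with_findable_variant("plaça"))
```

Test these caveats before relying on the approach. A match inside the hidden span won't be visibly highlighted. Copying the text would likely pick up both forms. A browser may also skip clipped text when searching.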


-t





Re: [CODE4LIB] find in page, diacritics, etc

2009-08-28 Thread Ken Irwin
This sounds like a great idea for a Firefox plugin...

Ken



Re: [CODE4LIB] find in page, diacritics, etc

2009-08-28 Thread Joe Hourcle



On Fri, 28 Aug 2009, Keith Jenkins wrote:


Hi, Tim.

Are you referring to a find in page, where a user presses CTRL-F
in the browser?

If so, it will depend on the browser.  Google Chrome 2.0 will find
matches regardless of the diacritics (i.e. user can type placa and
it matches plaça, and vice versa).  This doesn't seem to work in
Firefox 3.0.13 or IE8.


Works in Safari 4.0.3, fails in Firefox 3.5.2

Here's a page to test with, using the correct (non-Americanized) spelling
of my name:


http://www.frbr.org/2009/01/15/hourcle-frbr-applied-to-scientific-data


As for tricks to get it to work -- the closest HTML tag that I can think
of is 'ABBR', which isn't quite right:


<ABBR lang='fr' title='Hourcle'>Hourcl&eacute;</ABBR>

... and a quick test in Firefox 3.5 shows it doesn't help.

-Joe

(and no, it's not French, but it's a French spelling, so it'd clue the
pronunciation for screen readers correctly)

Re: [CODE4LIB] digital storage

2009-08-28 Thread John Fereira

Edward Iglesias wrote:

Thanks to all of you who answered.  Crowdsourcing does work if you
pick the right crowd.  We have been looking at the S3 possibility but
I agree this would have to be a second copy.  


There's been a lot of talk about hardware platforms but not much about
software to manage all that data.  You might want to take a look at
DuraCloud from DuraSpace (http://duraspace.org/duracloud.php). 
DuraSpace is the recently created organization from the merger of Fedora 
and DSpace.  When I first started at Cornell 12 years ago I sat next to 
Sandy Payette, who is now the CEO of DuraSpace.