Re: [CODE4LIB] digital storage

2009-08-28 Thread John Fereira

Edward Iglesias wrote:

Thanks to all of you who answered.  Crowdsourcing does work if you
pick the right crowd.  We have been looking at the S3 possibility but
I agree this would have to be a second copy.  


There has been lots of talk about hardware platforms but not much about
software to manage all that data.  You might want to take a look at 
DuraCloud from DuraSpace (http://duraspace.org/duracloud.php). 
DuraSpace is the recently created organization from the merger of Fedora 
and DSpace.  When I first started at Cornell 12 years ago I sat next to 
Sandy Payette, who is now the CEO of DuraSpace.


Re: [CODE4LIB] find in page, diacritics, etc

2009-08-28 Thread Joe Hourcle



On Fri, 28 Aug 2009, Keith Jenkins wrote:


Hi, Tim.

Are you referring to a "find in page", where a user presses CTRL-F
in the browser?

If so, it will depend on the browser.  Google Chrome 2.0 will find
matches regardless of the diacritics (i.e. user can type "placa" and
it matches "plaça", and vice versa).  This doesn't seem to work in
Firefox 3.0.13 or IE8.


Works in Safari 4.0.3, fails in Firefox 3.5.2

Here's a page to test with, using the correct (non-americanized) spelling 
of my name:


http://www.frbr.org/2009/01/15/hourcle-frbr-applied-to-scientific-data


As for tricks to get it to work -- the closest HTML tag that I can think 
of is 'ABBR' which isn't quite right:


Hourclé

... and a quick test in Firefox 3.5 shows it doesn't help.

-Joe

(and no, it's not French, but it's a French spelling, so it'd clue the 
pronunciation for screen readers correctly)

Re: [CODE4LIB] find in page, diacritics, etc

2009-08-28 Thread Ken Irwin
This sounds like a great idea for a Firefox plugin...

Ken

> -----Original Message-----
> From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of
> Tim Shearer
> Sent: Friday, August 28, 2009 12:18 PM
> To: CODE4LIB@LISTSERV.ND.EDU
> Subject: [CODE4LIB] find in page, diacritics, etc
> 
> Hi Folks,
> 
> Looking for help/perspectives.
> 
> Anyone got any clever solutions for allowing folks to find a word with
> diacritics in a rendered web page regardless of whether or not the user
> tries with or without diacritics?
> 
> In indexes this is usually solved by indexing the word with and without,
> so the user gets what they want regardless of how they search.
> 
> Thanks in advance for any ideas/enlightenment,
> Tim


Re: [CODE4LIB] find in page, diacritics, etc

2009-08-28 Thread Tim Shearer

Are you referring to a "find in page", where a user presses CTRL-F
in the browser?


Yes, sorry to be unclear.


If so, it will depend on the browser.  Google Chrome 2.0 will find
matches regardless of the diacritics (i.e. user can type "placa" and
it matches "plaça", and vice versa).  This doesn't seem to work in
Firefox 3.0.13 or IE8.


Exactly, and FF and IE are the most common browsers we're seeing.

I was wondering if someone (I know this sounds crazy) has explored the 
idea of marking up the non-diacritic inline version of the word in a span 
styled in such a way as to make it findable but not intrusive.


-t


Keith


On Fri, Aug 28, 2009 at 12:17 PM, Tim Shearer wrote:

Hi Folks,

Looking for help/perspectives.

Anyone got any clever solutions for allowing folks to find a word with
diacritics in a rendered web page regardless of whether or not the user
tries with or without diacritics?

In indexes this is usually solved by indexing the word with and without, so
the user gets what they want regardless of how they search.

Thanks in advance for any ideas/enlightenment,
Tim



Re: [CODE4LIB] find in page, diacritics, etc

2009-08-28 Thread Keith Jenkins
Hi, Tim.

Are you referring to a "find in page", where a user presses CTRL-F
in the browser?

If so, it will depend on the browser.  Google Chrome 2.0 will find
matches regardless of the diacritics (i.e. user can type "placa" and
it matches "plaça", and vice versa).  This doesn't seem to work in
Firefox 3.0.13 or IE8.

Keith


On Fri, Aug 28, 2009 at 12:17 PM, Tim Shearer wrote:
> Hi Folks,
>
> Looking for help/perspectives.
>
> Anyone got any clever solutions for allowing folks to find a word with
> diacritics in a rendered web page regardless of whether or not the user
> tries with or without diacritics?
>
> In indexes this is usually solved by indexing the word with and without, so
> the user gets what they want regardless of how they search.
>
> Thanks in advance for any ideas/enlightenment,
> Tim
>


Re: [CODE4LIB] find in page, diacritics, etc

2009-08-28 Thread Joe Atzberger
Something like the jquery highlight function combined with this kind of
mapping:

http://stackoverflow.com/questions/863800/replacing-diacritics-in-javascript

If you don't mind case-insensitive matching, you can speed things up by
forcing both comparison sets into one case or the other.
--Joe
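
A rough sketch of the idea above, written in Python rather than jQuery so it can be run outside a browser. It is not the mapping table from the Stack Overflow link; it assumes Unicode decomposition (NFD) is an acceptable stand-in for a hand-built table, and the names fold_char, find_folded and highlight are made up for this example. Both sides of the comparison are forced to lower case, as suggested.

```python
import unicodedata

def fold_char(ch):
    """Lower-case one character and drop its combining marks (may return '')."""
    decomposed = unicodedata.normalize("NFD", ch)
    return "".join(c for c in decomposed if not unicodedata.combining(c)).lower()

def fold(text):
    """Fold a whole string: no diacritics, one case."""
    return "".join(fold_char(ch) for ch in text)

def find_folded(text, query):
    """Return (start, end) offsets into the ORIGINAL text for every match."""
    folded_chars = []
    origin = []  # origin[i] = index in text of the char behind folded char i
    for i, ch in enumerate(text):
        for f in fold_char(ch):
            folded_chars.append(f)
            origin.append(i)
    folded = "".join(folded_chars)
    needle = fold(query)
    matches = []
    if not needle:
        return matches
    pos = folded.find(needle)
    while pos != -1:
        start = origin[pos]
        end = origin[pos + len(needle) - 1] + 1
        # Keep any trailing combining marks with the matched letter.
        while end < len(text) and unicodedata.combining(text[end]):
            end += 1
        matches.append((start, end))
        pos = folded.find(needle, pos + 1)
    return matches

def highlight(text, query, open_tag="[", close_tag="]"):
    """Wrap each non-overlapping match in open_tag/close_tag."""
    out = []
    last = 0
    for start, end in find_folded(text, query):
        if start < last:
            continue  # skip matches that overlap the previous one
        out.append(text[last:start])
        out.append(open_tag + text[start:end] + close_tag)
        last = end
    out.append(text[last:])
    return "".join(out)

print(highlight("La plaça major", "placa"))
```

One known gap: letters such as "ø" have no Unicode decomposition, so they pass through unchanged and would still need an explicit mapping entry.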

On Fri, Aug 28, 2009 at 12:17 PM, Tim Shearer wrote:

> Hi Folks,
>
> Looking for help/perspectives.
>
> Anyone got any clever solutions for allowing folks to find a word with
> diacritics in a rendered web page regardless of whether or not the user
> tries with or without diacritics?
>
> In indexes this is usually solved by indexing the word with and without, so
> the user gets what they want regardless of how they search.
>
> Thanks in advance for any ideas/enlightenment,
> Tim
>


[CODE4LIB] find in page, diacritics, etc

2009-08-28 Thread Tim Shearer

Hi Folks,

Looking for help/perspectives.

Anyone got any clever solutions for allowing folks to find a word with 
diacritics in a rendered web page regardless of whether or not the user 
tries with or without diacritics?


In indexes this is usually solved by indexing the word with and without, 
so the user gets what they want regardless of how they search.


Thanks in advance for any ideas/enlightenment,
Tim
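
For anyone unfamiliar with the index-side approach mentioned above, here is a minimal sketch in Python. It assumes a toy in-memory inverted index; the function names (strip_diacritics, build_index, search) and the sample documents are invented for illustration and do not come from any particular search product.

```python
import re
import unicodedata
from collections import defaultdict

def strip_diacritics(word):
    """Remove combining marks, e.g. 'plaça' -> 'placa'."""
    decomposed = unicodedata.normalize("NFD", word)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return unicodedata.normalize("NFC", stripped)

def build_index(documents):
    """Index every word twice: as written, and with diacritics removed."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        # Normalize first so accented letters are single word characters.
        normalized = unicodedata.normalize("NFC", text).lower()
        for word in re.findall(r"\w+", normalized):
            index[word].add(doc_id)
            index[strip_diacritics(word)].add(doc_id)
    return index

def search(index, term):
    """Look up a term exactly as the user typed it (case aside)."""
    key = unicodedata.normalize("NFC", term).lower()
    return sorted(index.get(key, set()))

docs = {"a": "La plaça major", "b": "Placa base"}
index = build_index(docs)
print(search(index, "placa"))  # both documents
print(search(index, "plaça"))  # only the one written with the cedilla
```

The trade-off in this sketch is that a query typed with diacritics matches only words written that way, while a query typed without them matches both spellings.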


Re: [CODE4LIB] digital storage

2009-08-28 Thread Tim McGeary

Related to our discussion:
http://online.wsj.com/article/SB125139942345664387.html

I particularly like the quote at the end:


"Digital information lasts forever -- or five years," says RAND Corp.
computer analyst Jeff Rothenberg, "whichever comes first."


Tim McGeary
Team Leader, Library Technology
Lehigh University
610-758-4998
tim.mcge...@lehigh.edu
Google Talk: timmcgeary
Yahoo IM: timmcgeary

Edward Iglesias wrote:
Thanks to all of you who answered.  Crowdsourcing does work if you 
pick the right crowd.  We have been looking at the S3 possibility but
 I agree this would have to be a second copy.  The policy and 
institutional support comments from my tokayo


see http://en.wiktionary.org/wiki/tocayo

seem especially appropriate.  I am going to include a link on our 
staff blog to this thread as a resource.


Thanks again,

Edward Iglesias



On Thu, Aug 27, 2009 at 8:59 PM, Edward M.
Corrado wrote:

Joe Atzberger wrote:
On Thu, Aug 27, 2009 at 4:25 PM, Edward M. Corrado 
wrote:




Nate Vack wrote:



On Thu, Aug 27, 2009 at 1:57 PM, Ryan
Ordway wrote:




$213,360 over 3 years




If you're ONLY looking at storage costs, SATA drives in
enterprise RAID systems range from about $1.00/GB to about
$1.25/GB for online storage.



Yeah -- but if you're looking only at storage costs, you'll
have an inaccurate estimate of your costs. You've got power,
cooling, sysadmin time, and replacements for failed disks. If
you want an apples-to-apples comparison, you'll want an
offsite mirror, as well.

I'm not saying S3 is always cost-effective -- but in our
experience, the cost of the disks themselves is dwarfed by
the costs of the related infrastructure.

I agree that the cost of storage is only one factor. I have
to wonder, though, how much more staff time do you need for local
storage than cloud storage? I don't know the answer but I'm not
sure it is much more than setting up S3 storage, especially if you
have a good partnership with your storage vendor.



Support relationships, especially regarding storage, are very
costly.  When I worked at a midsize datacenter, we implemented a
backup solution with STORServer and Tivoli.  Both hardware and
software were considerably costly.  Initial and ongoing support,
while indispensable, was basically as much as the cost of the
hardware every few years.


They can be, depending on what you are doing and what choices on
software you make, but for long term preservation purposes they
don't have to be nearly as expensive as what Ryan calculated S3 to
cost. If you shop around you can get a quality 36TB array with 3 yr
warranty for say $30,000; that is roughly $180,000 less than S3
(probably much less, I'm being less than generous with my Sun
discounts and only briefly looked at their prices). Even if we
double the cost to cover support, it is still over $50,000 a year
less for 3 years. Yes, we might need some expertise, but running a
36TB preservation storage array is not a $50,000 a year job and
besides, what is wrong with growing local expertise?
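
The arithmetic behind those figures can be checked with a few lines of Python. This is only a back-of-the-envelope sketch using the two numbers quoted in this thread ($213,360 for S3 over 3 years, $30,000 for the array); the "double the cost for support" case is the assumption made in the paragraph above, and the helper name savings is invented here.

```python
S3_COST_3YR = 213_360   # Ryan's S3 estimate over 3 years, from this thread
ARRAY_COST = 30_000     # quoted price for the array with 3 yr warranty
YEARS = 3

def savings(s3_total, local_total, years):
    """Return (total saving, saving per year) of local storage vs. S3."""
    total = s3_total - local_total
    return total, total / years

# Hardware only.
hardware_only = savings(S3_COST_3YR, ARRAY_COST, YEARS)

# Assumption from the thread: support costs as much again as the hardware.
with_support = savings(S3_COST_3YR, 2 * ARRAY_COST, YEARS)

print(hardware_only)  # (183360, 61120.0)
print(with_support)   # (153360, 51120.0)
```

Power, cooling, staff time and an offsite mirror are left out, which is exactly the objection raised earlier in the thread.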

...

Yes, maybe you save on staff time patching software on your
storage array, but that is not a significant amount of time -
esp. since you are still going to have some local storage, and
there isn't much difference in staff time in doing 2 TB vs. 20
TB.



There's a real difference.  I can get 2 TB in a single HDD, for
example this one for $200 at NewEgg: 
http://www.newegg.com/Product/Product.aspx?Item=N82E16822148413 




Any high school kid can install that.  20 TB requires some kind
of additional structure and additional expertise.


Well, building a 20 TB storage device and getting it to work can
actually be very cheap and doesn't require a PhD (just a local
GNU/Linux geek who likes to play with hardware) if you are OK with
a home grown solution. I wouldn't be satisfied with that, but I
don't see how a commercial offering adds up to $150,000 worth
of expertise and infrastructure.


You may save some time on the initial configuration, but you still
need to configure cloud storage. Is cloud storage that much
easier/less time consuming to configure than an iSCSI device?
Replacement for disks would be covered under your warranty or
support contract (at least I would hope you would have one).



Warranties expire and force you into ill-timed, hardly-afforded
and dangerous-to-your-data upgrades.  Sorta like some ILS systems
with which we are all familiar.

Yes, some application upgrades can cause issues, but how is that
different if your application and/or storage is in a cloud?


The cloud doesn't necessarily stay the same, but the part you
care about (data in, data out) does.


How do you know they won't change their cloud models? And you don't
even have a warranty with the cloud. They won't even guarantee they
won't delete your data.

As long as you use a common standards based method of storage, you
won't have any more issues getting it to work than you will 

Re: [CODE4LIB] digital storage

2009-08-28 Thread Edward Iglesias
Thanks to all of you who answered.  Crowdsourcing does work if you
pick the right crowd.  We have been looking at the S3 possibility but
I agree this would have to be a second copy.  The policy and
institutional support comments from my tokayo

see http://en.wiktionary.org/wiki/tocayo

seem especially appropriate.  I am going to include a link on our
staff blog to this thread as a resource.

Thanks again,

Edward Iglesias



On Thu, Aug 27, 2009 at 8:59 PM, Edward M. Corrado wrote:
> Joe Atzberger wrote:
>>
>> On Thu, Aug 27, 2009 at 4:25 PM, Edward M. Corrado
>> wrote:
>>
>>
>>>
>>> Nate Vack wrote:
>>>
>>>

 On Thu, Aug 27, 2009 at 1:57 PM, Ryan Ordway
 wrote:



>
> $213,360 over 3 years
>
>
>

> If you're ONLY looking at storage costs, SATA drives in enterprise RAID
> systems range from about $1.00/GB to about $1.25/GB for online storage.
>
>

 Yeah -- but if you're looking only at storage costs, you'll have an
 inaccurate estimate of your costs. You've got power, cooling, sysadmin
 time, and replacements for failed disks. If you want an
 apples-to-apples comparison, you'll want an offsite mirror, as well.

 I'm not saying S3 is always cost-effective -- but in our experience,
 the cost of the disks themselves is dwarfed by the costs of the
 related infrastructure.

>>> I agree that the cost of storage is only one factor. I have to wonder,
>>> though, how much more staff time do you need for local storage than cloud
>>> storage? I don't know the answer but I'm not sure it is much more than
>>> setting up S3 storage, especially if you have a good partnership with
>>> your
>>> storage vendor.
>>>
>>
>>
>> Support relationships, especially regarding storage, are very costly.  When
>> I worked at a midsize datacenter, we implemented a backup solution with
>> STORServer and Tivoli.  Both hardware and software were considerably
>> costly.  Initial and ongoing support, while indispensable, was basically as
>> much as the cost of the hardware every few years.
>>
>
> They can be, depending on what you are doing and what choices on software you
> make, but for long term preservation purposes they don't have to be nearly
> as expensive as what Ryan calculated S3 to cost. If you shop around you can
> get a quality 36TB array with 3 yr warranty for say $30,000; that is roughly
> $180,000 less than S3 (probably much less, I'm being less than generous with my
> Sun discounts and only briefly looked at their prices). Even if we double the
> cost to cover support, it is still over $50,000 a year less for 3
> years. Yes, we might need some expertise, but running a 36TB preservation
> storage array is not a $50,000 a year job and besides, what is wrong with
> growing local expertise?
>
> ...
>>>
>>> Yes, maybe you save on staff time patching software on your storage
>>> array,
>>> but that is not a significant amount of time - esp. since you are still
>>> going to have some local storage, and there isn't much difference in
>>> staff
>>> time in doing 2 TB vs. 20 TB.
>>>
>>
>>
>> There's a real difference.  I can get 2 TB in a single HDD, for example
>> this
>> one for $200 at NewEgg:
>> http://www.newegg.com/Product/Product.aspx?Item=N82E16822148413
>> 
>>
>> Any high school kid can install that.  20 TB requires some kind of
>> additional structure and additional expertise.
>>
>
> Well, building a 20 TB storage device and getting it to work can actually be
> very cheap and doesn't require a PhD (just a local GNU/Linux geek who likes
> to play with hardware) if you are OK with a home grown solution. I wouldn't
> be satisfied with that, but I don't see how a commercial offering adds
> up to $150,000 worth of expertise and infrastructure.
>
>>> You may save some time on the initial configuration, but you still need to
>>> configure cloud storage. Is cloud storage that much easier/less time
>>> consuming to configure than an iSCSI device? Replacement for disks would
>>> be
>>> covered under your warranty or support contract (at least I would hope
>>> you
>>> would have one).
>>>
>>
>>
>> Warranties expire and force you into ill-timed, hardly-afforded and
>> dangerous-to-your-data upgrades.  Sorta like some ILS systems with which
>> we
>> are all familiar.
>
> Yes, some application upgrades can cause issues, but how is that different if
> your application and/or storage is in a cloud?
>
>>  The cloud doesn't necessarily stay the same, but the part
>> you care about (data in, data out) does.
>>
>
> How do you know they won't change their cloud models? And you don't even
> have a warranty with the cloud. They won't even guarantee they won't delete
> your data.
>
> As long as you use a common standards based method of storage, you won't
> have any more issues getting it to work than you will getting future
> application servers to work wi