[CODE4LIB] Unicode persistence (was: it's cool to hate on OpenURL)

2010-04-30 Thread Jakob Voss

Eric Hellman wrote:

May I just add here that of all the things we've talked about in
these threads, perhaps the only thing that will still be in use a
hundred years from now will be Unicode. God willing (إن شاء الله).

Stuart Yeates wrote:

> Sadly, yes, I agree with you on this.
> Do you have any idea how demotivating that is for those of us
> maintaining collections with works containing characters that don't
> qualify for inclusion?

May I just add there that Unicode is evolving too, and you can help to 
get missing characters included. One of the next updates will even 
include hundreds of icons such as a slice of pizza, a kissing couple, 
and Mount Fuji (see this zipped PDF: http://is.gd/bABl9 and ...).
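If you want to check whether a given character has made it into the standard, Python's `unicodedata` module can look characters up by their official names. This is a minimal sketch, assuming a Python build whose Unicode database is version 6.0 or later (the release that added these pictographs); the names below are the official character names as I understand them:

```python
import unicodedata

# Official Unicode names for the icons mentioned above (assumes Unicode >= 6.0).
NAMES = ["SLICE OF PIZZA", "KISS", "MOUNT FUJI"]


def code_points(names):
    """Map each character name to its U+XXXX code point; KeyError if unknown."""
    return {name: f"U+{ord(unicodedata.lookup(name)):04X}" for name in names}


if __name__ == "__main__":
    print("Unicode database version:", unicodedata.unidata_version)
    for name, cp in code_points(NAMES).items():
        print(cp, name)
```

A `KeyError` from `unicodedata.lookup` is a quick signal that the character (or at least that name) is not in the database your runtime ships with.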

I also bet that Unicode will be there a hundred years from now (and 
probably URIs), while things like XML and RDF may be little used then. 
But I fear that Unicode may be used in a different way, just as 
words in natural language change their meanings over the centuries.

And that's why we need libraries (phew, at least one positive claim 
about these institutions we are all bound to ;-)


Jakob Voß , skype: nichtich
Verbundzentrale des GBV (VZG) / Common Library Network
Platz der Goettinger Sieben 1, 37073 Göttingen, Germany
+49 (0)551 39-10242, http://www.gbv.de

Re: [CODE4LIB] it's cool to hate on OpenURL

2010-04-30 Thread Jakob Voss

Stuart Yeates wrote:

A great deal of heat has been vented in this thread, and at least a 
little light.

I'd like to invite everyone to contribute to the wikipedia page at 
http://en.wikipedia.org/wiki/OpenURL in the hopes that it evolves into a 
better overview of the protocol, the ecosystem and their place on the web.

[Hint: the best heading for a rant on Wikipedia is 'Criticisms', but you'll 
still need to reference the key points. Links into this thread count as 
references, if you can't find anything else.]

Good point - but writing Wikipedia articles is more work than discussing 
on mailing lists ;-) Instead of improving the OpenURL article I started 
to add to the more relevant[1] COinS article:


Maybe some of you (Eric Hellman, Richard Cameron, Daniel Chudnov, Ross 
Singer, Herbert Van de Sompel ...) could fix the history section (which I 
tried to reconstruct from historic sources[2] on the Internet) without 
violating Wikipedia's NPOV policy, which is hard when you write about things 
you were involved in.

Am I right that neither OpenURL nor COinS strictly defines a metadata 
model with a set of entities/attributes/fields/you-name-it and their 
definitions? Apparently all ContextObject metadata formats are based on 
non-normative "implementation guidelines" only?


[1] My bet: What will remain from OpenURL will be "a link server base 
URL that you attach a COinS to"

[2] about five years ago, so it's historic in Internet terms ;-) By 
the way, does anyone have a copy of

http://dbk.ch.umist.ac.uk/wiki/index.php?title=Metadata_in_HTML ?


Re: [CODE4LIB] it's cool to hate on OpenURL (was: Twitter annotations...)

2010-04-30 Thread Owen Stephens
Dead ends from OpenURL enabled hyperlinks aren't a result of the standard
though, but rather an aspect of both the problem they are trying to solve,
and the conceptual way they try to do this.

I'd contend these dead ends are an implementation issue - and despite this I
have to say that my experience on the ground is that feedback from library
users on the use of link resolvers is positive - much more so than many of
the other library systems I've been involved with.

What I do see as a problem is that this market seems to have essentially
stagnated, at least as far as I can see. I suspect the reasons for this are
complex, but it would be nice to see some more innovation in this area.


On Thu, Apr 29, 2010 at 6:14 PM, Ed Summers  wrote:

> On Thu, Apr 29, 2010 at 12:08 PM, Eric Hellman  wrote:
> > Since this thread has turned into a discussion on OpenURL...
> >
> > I have to say that during the OpenURL 1.0 standardization process, we
> > definitely had moments of despair. Today, I'm willing to derive satisfaction
> > from "it works" and overlook shortcomings. It might have been otherwise.
> Personally, I've followed enough OpenURL enabled hyperlink dead ends
> to contest "it works".
> //Ed

Owen Stephens
Owen Stephens Consulting
Web: http://www.ostephens.com
Email: o...@ostephens.com

Re: [CODE4LIB] Twitter annotations and library software

2010-04-30 Thread Owen Stephens

Could you expand on how you think the problem that OpenURL tackles would
have been better approached with existing mechanisms? I'm not debating this
necessarily, but from my perspective when OpenURL was first introduced it
solved a real problem that I hadn't seen solved before.


On Thu, Apr 29, 2010 at 11:55 PM, Alexander Johannesen <
alexander.johanne...@gmail.com> wrote:

> Hi,
> On Thu, Apr 29, 2010 at 22:47, Walker, David  wrote:
> > I would suggest it's more because, once you step outside of the
> > primary use case for OpenURL, you end-up bumping into *other* standards.
> These issues were raised all the way back when it was created, as well. I
> guess it's easy to be clever in hindsight. :) Here's what I wrote
> about it 5 years ago (http://shelter.nu/blog-159.html) ;
> So let's talk about 'Not invented here' first, because surely, we're
> all guilty of this one from time to time. For example, lately I dug
> into the ANSI/NISO Z39.88-2004 standard, better known as OpenURL. I
> was looking at it critically, I have to admit, comparing it to what I
> already knew about Web Services, SOA, HTTP,
> Google/Amazon/Flickr/Del.icio.us APIs, and various Topic Maps and
> semantic web technologies (I was the technical editor of Explorer's
> Guide to the Semantic Web).
> I think I can sum up my experiences with OpenURL as such: why? Why
> has the library world invented a new way of doing things that can
> already be done quite well? Now, there is absolutely nothing wrong
> with the standard per se (except a pretty darn awful choice of
> name!!), so I'm not here criticising the technical merits and the work
> put into it. No, it's a simple 'why' that I have yet to get a decent
> answer to, even after talking to the OpenURL bigwigs about it. I mean,
> come on; convince me! I'm not unreasonable, no truly, really, I just
> want to be convinced that we need this over anything else.
> Regards,
> Alex
> --
>  Project Wrangler, SOA, Information Alchemist, UX, RESTafarian, Topic Maps
> --- http://shelter.nu/blog/ --
> -- http://www.google.com/profiles/alexander.johannesen ---


Re: [CODE4LIB] it's cool to hate on OpenURL

2010-04-30 Thread Thomas Berger


Jakob Voss schrieb:

> Am I right that neither OpenURL nor COinS strictly defines a metadata
> model with a set of entities/attributes/fields/you-name-it and their
> definition? Apparently all ContextObjects metadata formats are based on
> non-normative "implementation guidelines" only ??

You are right, the 129-page spec of Z39.88 only deals with these in
examples, and IIRC does not even give decent pointers to the following:

There are some "Core Metadata formats" registered under

notably info:ofi/fmt:kev:mtx:book and info:ofi/fmt:kev:mtx:journal :

< http://www.openurl.info/registry/docs/mtx/info:ofi/fmt:kev:mtx:book >,
< http://www.openurl.info/registry/docs/mtx/info:ofi/fmt:kev:mtx:journal >.
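To make the registered book format concrete, here is a minimal sketch of a KEV ContextObject for a book referent. The key names (`rft_val_fmt`, `rft.genre`, `rft.btitle`, `rft.au`, `rft.isbn`, `rft.date`) are taken from the book format; the function name and the sample values are illustrative only, and, as noted above, what exactly belongs in each field is only loosely described:

```python
from urllib.parse import urlencode


def book_context_object(btitle, au=None, isbn=None, date=None):
    """Build a KEV (key/encoded-value) ContextObject for a book referent.

    Hypothetical helper: only the key names come from the registered
    info:ofi/fmt:kev:mtx:book format.
    """
    pairs = [
        ("url_ver", "Z39.88-2004"),
        ("ctx_ver", "Z39.88-2004"),
        ("rft_val_fmt", "info:ofi/fmt:kev:mtx:book"),
        ("rft.genre", "book"),
        ("rft.btitle", btitle),
    ]
    # Optional fields are emitted only when a value is present.
    for key, value in (("rft.au", au), ("rft.isbn", isbn), ("rft.date", date)):
        if value:
            pairs.append((key, value))
    # urlencode percent-encodes ':' and '/' and turns spaces into '+'.
    return urlencode(pairs)
```

For example, `book_context_object("Example Title", isbn="9780306406157")` yields a query string that can be appended to a link resolver's base URL.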

As for the semantics of the fields defined there, the "description" is
certainly not strictly defined in any sense, but you can already see
that there is potential for confusion and especially no canonical way
(no way at all?) to achieve satisfying descriptions for several kinds of
non-mainstream objects:

* scholarly articles originally published online (when a "journal
  context" cannot be constructed)

* articles in books that do not happen to be conference proceedings


Thomas Berger



Re: [CODE4LIB] Twitter annotations and library software

2010-04-30 Thread Owen Stephens

I'd vote for adopting the same approach as COinS on the basis that it already
has some level of adoption, and we know it covers at least some of the stuff
libraries and academic users (as used by both libraries and consumer tools
such as Zotero) might want to do. We are talking books (from what you've
said), so we don't have to worry about other formats (although it does mean
we can do journal articles and some other stuff as well for no effort).

Mendeley and Zotero already speak COinS, it is pretty simple, and there are
already several code libraries to deal with it.
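As an illustration of how little is involved, here is a minimal sketch of embedding a ContextObject as COinS: the KEV string is HTML-escaped into the `title` attribute of an empty `<span class="Z3988">`. The helper names are hypothetical, and the 512-byte check assumes the annotation budget Tim mentions is measured in UTF-8 bytes:

```python
import html
from urllib.parse import urlencode


def coins_span(kev_pairs):
    """Wrap KEV key/value pairs in a COinS <span class="Z3988"> element."""
    kev = urlencode(kev_pairs)
    # The KEV string lives in an HTML attribute, so '&' must become '&amp;'.
    return f'<span class="Z3988" title="{html.escape(kev, quote=True)}"></span>'


def fits_in(text, limit=512):
    """True if the UTF-8 encoding of text is at most `limit` bytes (assumed budget)."""
    return len(text.encode("utf-8")) <= limit
```

Tools like Zotero look for that `Z3988` class and parse the `title` attribute, which is why no extra markup is needed.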

It isn't where I hope we end up in the long term, but if we talk about this
happening tomorrow, why not use something that is relatively simple, already
has a good set of implementations, and we know works for several cases of
embedding book metadata in a web environment?


On Thu, Apr 29, 2010 at 7:01 PM, Jakob Voss  wrote:

> Dear Tim,
> you wrote:
>> So this is my recommended framework for proceeding. Tim, I'm afraid
>>> you'll actually have to do the hard work yourself.
>> No, I don't. Because the work isn't fundamentally that hard. A
>> complex standard might be, but I never for a moment considered
>> anything like that. We have *512 bytes*, and it needs to be usable by
>> anyone. Library technology is usually fatally over-engineered, but
>> this is a case where that approach isn't even possible.
> Jonathan gave a very good summary - you just have to pick what your main
> focus of embedding bibliographic data is.
> A) I favour using the CSL-Record format which I summarized at
> http://wiki.code4lib.org/index.php/Citation_Style_Language
> because I had in mind that people want to have a nice looking citation of
> the publication that someone tweeted about. The drawback is that CSL is less
> adopted and will not always fit in 512 bytes.
> B) If your main focus is to link Tweets about the same publication (and
> other stuff about this publication) then you must embed identifiers.
> LibraryThing is mainly based on two identifiers
> 1) ISBN to identify editions
> 2) LT work ids to identify works
> I wonder why LT work ids have not picked up more although you thankfully
> provide a full mapping to ISBN at
> http://www.librarything.com/feeds/thingISBN.xml.gz but nevermind. I
> thought that some LT records also contain other identifiers such as OCLC
> number, LOC number etc. but maybe I am wrong. The best way to specify
> identifiers is to use a URI (all relevant identifiers that I know have a
> URI form). For ISBN it is
> urn:isbn:{ISBN13}
> For LT Work-ID you can use the URL with your .com top level domain:
> http://www.librarything.com/work/{LTWORKID}
> That would fit for tweets about books with an ISBN and for tweets about a
> work which will make 99.9% of tweets from LT about single publications
> anyway.
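A minimal sketch of producing those two identifier forms, assuming you want the 13-digit ISBN inside the `urn:isbn:` URI. ISBN-10s are converted by prefixing 978 and recomputing the check digit with alternating weights 1 and 3; the function names are hypothetical, and the ISBN-10 check digit itself is not validated here:

```python
def isbn10_to_isbn13(isbn10):
    """Convert an ISBN-10 to ISBN-13: prefix '978', recompute the check digit."""
    digits = isbn10.replace("-", "").replace(" ", "")
    if len(digits) != 10:
        raise ValueError("expected 10 characters")
    core = "978" + digits[:9]  # drop the old check digit (may be 'X')
    # ISBN-13 check digit: weights alternate 1, 3, 1, 3, ...
    total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(core))
    return core + str((10 - total % 10) % 10)


def isbn_urn(isbn):
    """Return a urn:isbn: URI, normalised to the 13-digit form."""
    digits = isbn.replace("-", "").replace(" ", "")
    if len(digits) == 10:
        digits = isbn10_to_isbn13(digits)
    return "urn:isbn:" + digits


def lt_work_uri(work_id):
    """LibraryThing work URL, following the pattern quoted above."""
    return f"http://www.librarything.com/work/{work_id}"
```

Normalising to one ISBN form matters if the goal is to link tweets about the same edition: `0-306-40615-2` and `978-0-306-40615-7` should end up as the same URI.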
> C) If your focus is to let people search for a publication in libraries,
> and to copy bibliographic data into reference management software, then
> COinS is a way to go. COinS is based on OpenURL, which I and others ranted
> about because it is a crappy library standard like MARC. But unlike other
> metadata formats COinS usually fits in less than 512 bytes. Furthermore you
> may have to deal with it for LibraryThing for Libraries anyway.
> Although I strongly favour CSL as a practising library scientist and
> developer I must admit that for LibraryThing the best way is to embed
> identifiers (ISBN and LT Work-ID) and maybe COinS. As long as LibraryThing
> does not open up to more complex publications like preprints of
> proceeding-articles in series etc. but mainly deals with books and works
> this will make LibraryThing users happy.
>  Then, three years from now, we can all conference-tweet about a CIL talk,
>> about all the cool ways libraries are using Twitter, and how it's such a
>> shame that the annotations standard wasn't designed with libraries in mind.
> How about a bet instead of voting. In three years will there be:
> a) No relevant Twitter annotations anyway
> b) Twitter annotations but not used much for bibliographic data
> c) A rich variety of incompatible bibliographic annotation standards
> d) Semantic Web will have solved every problem anyway
> ..
> Cheers
> Jakob


Re: [CODE4LIB] Twitter annotations and library software

2010-04-30 Thread Alexander Johannesen
On Fri, Apr 30, 2010 at 18:47, Owen Stephens  wrote:
> Could you expand on how you think the problem that OpenURL tackles would
> have been better approached with existing mechanisms?

As we all know, it's pretty much a spec for a way to template incoming
and outgoing URLs, defining some functionality along the way. As such,
URLs with basic URI templates and rewriting have been around for a
long time. Even longer than that is just the basics of HTTP which have
status codes and functionality to do exactly the same. We've been
doing link resolving since the mid-'90s, either as CGI scripts or as
Apache modules, so none of this was new. URI comes in, you look it up
in a database, you cross-check with other REQUEST parameters (or
sessions, if you must, as well as IP addresses) and pop out a 303
(with some possible rewriting of the outgoing URL) (with the hack we
needed at the time to also create dummy pages with META tags *shudder*).
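The pre-OpenURL pattern described here (request comes in, look it up, answer with a 303) can be sketched with nothing but Python's standard library. Everything specific is hypothetical: the `id` parameter, the `TARGETS` table standing in for the database, and the URLs. Cross-checking other request parameters, sessions or IP ranges is left out:

```python
from http.server import BaseHTTPRequestHandler
from urllib.parse import parse_qs, urlparse

# Hypothetical lookup table standing in for the resolver's database.
TARGETS = {
    "10.1000/example": "https://publisher.example.org/article/42",
}


def resolve(path):
    """Return (status, location) for an incoming request path."""
    query = parse_qs(urlparse(path).query)
    key = query.get("id", [None])[0]
    target = TARGETS.get(key)
    if target is None:
        return 404, None
    return 303, target  # 303 See Other: send the browser to the target


class ResolverHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        status, location = resolve(self.path)
        self.send_response(status)
        if location:
            self.send_header("Location", location)
        self.end_headers()
```

To try it locally, run `HTTPServer(("localhost", 8000), ResolverHandler).serve_forever()` (with `HTTPServer` imported from `http.server`) and request `/resolve?id=10.1000/example`.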

So the idea was to standardize on a way to do this, and it was a good
idea as such. OpenURL *could* have had a great potential if it
actually defined something tangible, something concrete like a model
of interaction or basic rules for fishing and catching tokens and the
like, and as someone else mentioned, the 0.1 version was quite a good
start. But by the time 1.0 came out, all the goodness had turned
so generic and flexible in such a complex way that handling it turned
you right off it. The standard also had a very difficult language, and
more specifically didn't use enough of the normal geeky language used
by sysadmins around. The more I tried to wrap my head around it, the
more I felt like just going back to CGI scripts that looked stuff up
in a database. It was easier to hack legacy code, which, well, defeats
the purpose, no?

Also, forgive me if I've forgotten important details; I've suppressed
this part of my life. :)

Kind regards,


Re: [CODE4LIB] Twitter annotations and library software

2010-04-30 Thread Owen Stephens
Thanks Alex,

This makes sense, and yes I see what you're saying - and yes, if you end up
going back to custom coding because it's easier it does seem to defeat the
purpose.
However I'd argue that actually OpenURL 'succeeded' because it did manage to
get some level of acceptance (ignoring the question of whether it is v0.1 or
v1.0) - the cost of developing 'link resolvers' would have been much higher
if we'd been doing something different for each publisher/platform. In this
sense (I'd argue) sometimes crappy standards are better than none.

We've used OpenURL v1.0 in a recent project and because we were able to
simply pick up code already done for Zotero, and we already had an OpenURL
resolver, the amount of new code we needed for this was minimal.

I think the point about Link Resolvers doing stuff that Apache and CGI
scripts were already doing is a good one - and I've argued before that what
we actually should do is separate some of this out (a bit like Jonathan did
with Umlaut) into an application that can answer questions about location
(what is generally called the KnowledgeBase in link resolvers) and the
applications that deal with analysing the context and the redirection.
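One way to picture that split is a knowledge base that only answers "is this held, and where?" plus a separate layer that parses the incoming context and decides on the redirect. This is a minimal sketch under my own assumptions: the class and function names are hypothetical, holdings are keyed by ISSN and year ranges only, and it is not how Umlaut or any commercial resolver is actually built:

```python
from dataclasses import dataclass
from urllib.parse import parse_qs


@dataclass(frozen=True)
class Holding:
    issn: str
    first_year: int
    last_year: int
    base_url: str


class KnowledgeBase:
    """Answers only the location question: is this title/year held, and where?"""

    def __init__(self, holdings):
        self._by_issn = {}
        for h in holdings:
            self._by_issn.setdefault(h.issn, []).append(h)

    def locate(self, issn, year):
        for h in self._by_issn.get(issn, []):
            if h.first_year <= year <= h.last_year:
                return h.base_url
        return None


def parse_context(query):
    """Context analysis: pull the referent's ISSN and year out of a KEV query."""
    q = parse_qs(query)
    issn = q.get("rft.issn", [None])[0]
    date = q.get("rft.date", [""])[0]
    year = int(date[:4]) if date[:4].isdigit() else None
    return issn, year


def redirect_target(query, kb):
    """Redirection logic, kept separate from the knowledge base itself."""
    issn, year = parse_context(query)
    if issn is None or year is None:
        return None
    return kb.locate(issn, year)
```

The point of the separation is that a different front end (or a non-OpenURL request format) could reuse `KnowledgeBase.locate` unchanged.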

(To introduce another tangent in a tangential thread, interestingly (I
think!) I'm having a not dissimilar debate about Linked Data at the moment -
there are many who argue that it is too complex and that as long as you have
a nice RESTful interface you don't need to get bogged down in ontologies and
RDF etc. I'm still struggling with this one - my instinct is that it will
pay to standardise but so far I've not managed to convince even myself this
is more than wishful thinking at the moment)



Re: [CODE4LIB] Twitter annotations and library software

2010-04-30 Thread Alexander Johannesen
On Fri, Apr 30, 2010 at 20:29, Owen Stephens  wrote:
> However I'd argue that actually OpenURL 'succeeded' because it did manage to
> get some level of acceptance (ignoring the question of whether it is v0.1 or
> v1.0) - the cost of developing 'link resolvers' would have been much higher
> if we'd been doing something different for each publisher/platform. In this
> sense (I'd argue) sometimes crappy standards are better than none.

Well, perhaps. I see OpenURL as the natural progression from PURL, in
which both have their degree of "success", however I'm careful using
that word as I live on the outside of the library world. It may well
be a success on the inside. :)

> I think the point about Link Resolvers doing stuff that Apache and CGI
> scripts were already doing is a good one - and I've argued before that what
> we actually should do is separate some of this out (a bit like Jonathan did
> with Umlaut) into an application that can answer questions about location
> (what is generally called the KnowledgeBase in link resolvers) and the
> applications that deal with analysing the context and the redirection

Yes, splitting it into smaller chunks is always smart, especially with
complex issues. For example, in the Topic Maps world, the whole standard
(reference model, data model, query language, constraint language, XML
exchange language, various notational languages) is wrapped up with a
guide in the middle. Make them into smaller parcels, and make your
flexible point there. If you pop it all into one, no one will read it
and fully understand it. (And don't get me started on the WS-* set of
standards on the same issues ...)

> (To introduce another tangent in a tangential thread, interestingly (I
> think!) I'm having a not dissimilar debate about Linked Data at the moment -
> there are many who argue that it is too complex and that as long as you have
> a nice RESTful interface you don't need to get bogged down in ontologies and
> RDF etc. I'm still struggling with this one - my instinct is that it will
> pay to standardise but so far I've not managed to convince even myself this
> is more than wishful thinking at the moment)

Ah, now this is certainly up my alley. As you might have seen, I'm a
Topic Maps guy, and we have in our model a distinction between three
different kinds of identities; internal, external indicators and
published subject identifiers. The RDF world only had rdf:about, so
when you used "www.somewhere.org", are you talking about that thing,
or does that thing represent something you're talking about? Tricky
stuff which has these days become a *huge* problem with Linked Data.
And yes, they're trying to solve that by issuing an HTTP 303 status
code as a means of declaring the identifiers imperative, which is a
*lot* of resolving to do on any substantial set of data, and in my
eyes a huge ugly hack. (And what if your Internet falls down? Tough.)

Anyway, here's more on these identity problems ;

As to the RESTful notions, they only take you as far as content-types
can take you. Sure, you can glean semantics from it, but I reckon
there's an impedance mismatch between just the things librarians have
got down pat: metadata vs. data. CRUD or, in this example, GPPD
(get/post/put/delete), which aren't in a dichotomy btw, can only
determine behavior that enables certain semantic paradigms, but cannot
speak about more complex relationships or even modest models. (Very
often models aren't actionable :)

The funny thing is that after all these years of working with Topic
Maps I find that these hard issues have been solved years ago, and the
rest of the world is slowly catching up to it. I blame the lame
DAML+OIL background of RDF and OWL, to be honest; a model too simple
to be elegantly advanced and too complex to be easily useful.

Kind regards,


Re: [CODE4LIB] it's cool to hate on OpenURL (was: Twitter annotations...)

2010-04-30 Thread Kyle Banerjee
> Dead ends from OpenURL enabled hyperlinks aren't a result of the standard
> though, but rather an aspect of both the problem they are trying to solve,
> and the conceptual way they try to do this.
> I'd contend these dead ends are an implementation issue.

Absolutely. There is no inherent reason that an OpenURL has to be a plain
text link that must be manually clicked nor is there any requirement that
the resolver simply return an HTML page that may or may not contain plain
text links the user can click on.

An obvious thing for a resolver to be able to do is return results in JSON
so the OpenURL can be more than a static link. But since the standard
defines no such response, the site generating the OpenURL would have to know
something about the resolver.
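To show what a machine-readable answer could buy you, here is a minimal sketch of a client that decides, before the user clicks, whether to show a full-text link. The JSON shape (`services`, `type`, `url`) is entirely hypothetical, precisely because OpenURL 1.0 defines no response format; a real client would have to know which resolver it is talking to:

```python
import json

# Hypothetical resolver response; OpenURL 1.0 itself defines no such format.
SAMPLE = """
{"referent": {"title": "Example Article"},
 "services": [
   {"type": "fulltext", "url": "https://example.org/full/1"},
   {"type": "ill", "url": "https://example.org/ill?id=1"}
 ]}
"""


def fulltext_links(body):
    """Return the full-text URLs from a (hypothetical) resolver JSON response."""
    data = json.loads(body)
    return [s["url"] for s in data.get("services", []) if s.get("type") == "fulltext"]


def has_fulltext(body):
    """Lets the calling page hide or relabel the link when there is no full text."""
    return bool(fulltext_links(body))
```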

> What I do see as a problem is that this market seems to have essentially
> stagnated, at least as far as I can see. I suspect the reasons for this are
> complex

I suspect they are simple. The easy part of OpenURL addresses a compelling
use case, but there just isn't that much pressure to take things to the next
level. If people really got that upset about dead links -- and in many
systems this is impossible because you'll be offered ILL fulfillment if no
electronic copy is available -- there would be more incentive to handle
resolution before the user clicks on the link.

In short, you reach a point of diminishing returns with OpenURL very


[CODE4LIB] Job Posting: Database Specialist @ Penn State University Libraries

2010-04-30 Thread Janis Mathewson

Hi all,

We have an application developer job opportunity in my department at the 
Penn State Libraries.


Database Specialist
Level: 03
Work Unit: University Libraries
Department: Department of Information Technologies
Job Number: 32060

Penn State University Libraries is seeking a database/applications 
specialist to support Library initiatives in Web application 
development, data analysis and database design. Design and development 
of Web/data applications incorporating the complete software development 
cycle including needs assessment, definition of requirements, prototype 
development, application design, testing, coding, documentation and 
integration of Web application services into other enterprise platforms 
as appropriate. Typically requires Bachelor’s degree plus four years of 
related experience or an equivalent combination of education and 
experience. Experience should include database design, data 
manipulation, developing and deploying software or an equivalent 
combination of education and experience. The candidate must be able to: 
keep informed of new technologies for use with application development 
initiatives; create complex queries and generate custom reports from 
Oracle tables and other data sources; assist in defining data needs and 
in interpretation of existing data for library administration; provide 
advanced SQL code to aid colleagues in database planning, querying, data 
transformation and analyzing; design database architecture for new 
Web/data applications; lead and participate on project teams; work 
closely with other developers in the unit and serve on cross functional 
teams; disseminate project information in written and oral formats to a 
technical and non-specialized audience in a variety of settings (public, 
small group, individual, etc.). The ideal candidate must have experience 
with: a server side scripting language such as ColdFusion, PHP, JSP, 
.NET etc., advanced SQL and database design skills, XHTML, CSS, 
demonstrated commitment to diversity; ability to effectively communicate 
technical information to a non-technical audience and excellent 
organizational, interpersonal and communication skills; and demonstrated 
ability to work in a team environment. Preferred skills include 
proficiency with Oracle databases and ColdFusion, strong understanding 
of XML/XSLT, AJAX, JavaScript, PL/SQL and familiarity with a web content 
management system. A commitment to providing outstanding customer 
service and a driven passion for technology are essential. The 
successful candidate must also possess the ability to interact 
effectively with individuals from a variety of cultures and backgrounds. 
The University Libraries is a multicultural environment that embraces 
respect and diversity.

To apply please visit: http://www.psu.jobs/Search/Opportunities.html

Janis Mathewson
Department for Information Technologies (I-Tech)
Penn State University Libraries
University Park, PA 16802-1812
(814) 865-4867

Re: [CODE4LIB] it's cool to hate on OpenURL

2010-04-30 Thread Ross Singer
On Fri, Apr 30, 2010 at 4:09 AM, Jakob Voss  wrote:

> Am I right that neither OpenURL nor COinS strictly defines a metadata model
> with a set of entities/attributes/fields/you-name-it and their definition?
> Apparently all ContextObjects metadata formats are based on non-normative
> "implementation guidelines" only ??

You are right.  Z39.88 and (by extension) COinS really only defines
the ContextObject itself.  So it defines the carrier "package", its
administrative elements, referents, referrers, referring entities,
services, requester and resolver and their transports.
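In the KEV serialization those entities are distinguished only by key prefix (`rft` referent, `rfe` referring entity, `req` requester, `svc` service type, `rfr` referrer, `res` resolver). A minimal sketch that sorts a KEV string's keys into those slots; lumping everything else (`url_ver`, `ctx_ver`, ...) into an "other" bucket is my simplification:

```python
from urllib.parse import parse_qsl

# KEV key prefixes for the six ContextObject entities.
ENTITY_PREFIXES = {
    "rft": "referent",
    "rfe": "referring entity",
    "req": "requester",
    "svc": "service type",
    "rfr": "referrer",
    "res": "resolver",
}


def split_entities(kev):
    """Group KEV key/value pairs by entity; unprefixed keys go under 'other'."""
    grouped = {}
    for key, value in parse_qsl(kev):
        # 'rft_val_fmt' -> 'rft', 'rft.btitle' -> 'rft', 'url_ver' -> 'url'
        prefix = key.split("_", 1)[0].split(".", 1)[0]
        entity = ENTITY_PREFIXES.get(prefix, "other")
        grouped.setdefault(entity, []).append((key, value))
    return grouped
```

Note that nothing here says what the values inside each slot mean; that is exactly the part deferred to the community profiles.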

It doesn't really specify what should actually go into any of those
slots.  The idea is that it defers to the community profiles for that.

In the XML context object, you can send more than one metadata-by-val
element (or metadata-by-ref) per entity (rft, rfr, rfe, svc, req, res)
- I'm not sure what is supposed to happen, for example, if you send a
referent that has multiple MBV elements that don't actually describe
the same thing.


Re: [CODE4LIB] it's cool to hate on OpenURL (was: Twitter annotations...)

2010-04-30 Thread Ross Singer
On Fri, Apr 30, 2010 at 7:59 AM, Kyle Banerjee  wrote:

> An obvious thing for a resolver to be able to do is return results in JSON
> so the OpenURL can be more than a static link. But since the standard
> defines no such response, the site generating the OpenURL would have to know
> something about the resolver.
I actually think this lack of any specified response format is a large
factor in the stagnation of OpenURL as a technology.  Since a resolver
is under no obligation to do anything but present a web page it's
difficult for local entrepreneurial types to build upon the
infrastructure simply because there are no guarantees that it will
work anywhere else (or even locally, depending on your vendor, I
suppose), much less contribute back to the ecosystem.

Umlaut was able to exist because (for better or worse) SFX has an XML
output.  It has never been able to scale horizontally, however,
because to work with another vendor's link resolver (which should
actually be quite straightforward) it requires a connector to whatever
*their* proprietary API needs.
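The scaling problem is essentially the adapter pattern: the front end codes against one interface, and every vendor needs its own connector. A minimal sketch, in which both "vendor" formats are hypothetical (this is not SFX's actual XML schema or any real vendor API):

```python
import json
import xml.etree.ElementTree as ET
from abc import ABC, abstractmethod
from typing import List


class ResolverConnector(ABC):
    """The single interface the front end codes against."""

    @abstractmethod
    def fulltext_urls(self, raw_response: str) -> List[str]:
        """Extract full-text target URLs from a resolver's raw response."""


class XmlVendorConnector(ResolverConnector):
    """Hypothetical vendor A: XML with <url> elements somewhere in the tree."""

    def fulltext_urls(self, raw_response: str) -> List[str]:
        root = ET.fromstring(raw_response)
        return [u.text.strip() for u in root.iter("url") if u.text]


class JsonVendorConnector(ResolverConnector):
    """Hypothetical vendor B: JSON with a "targets" list of {"href": ...}."""

    def fulltext_urls(self, raw_response: str) -> List[str]:
        data = json.loads(raw_response)
        return [t["href"] for t in data.get("targets", []) if "href" in t]


def collect(responses):
    """responses: (connector, raw_response) pairs, one per resolver queried."""
    urls = []
    for connector, raw in responses:
        urls.extend(connector.fulltext_urls(raw))
    return urls
```

A specified response format would collapse all of these connectors into one.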

I could definitely see a project like Umlaut providing a 'de facto'
machine readable response for SAP 1/2 requests that content providers
could then use to start offering better integration at *their* end.

This assumes that more than 5 libraries would actually be using it, however.


Re: [CODE4LIB] it's cool to hate on OpenURL (was: Twitter annotations...)

2010-04-30 Thread Ed Summers
On Fri, Apr 30, 2010 at 9:09 AM, Ross Singer  wrote:
> I actually think this lack of any specified response format is a large
> factor in the stagnation of OpenURL as a technology.  Since a resolver
> is under no obligation to do anything but present a web page it's
> difficult for local entrepreneurial types to build upon the
> infrastructure simply because there are no guarantees that it will
> work anywhere else (or even locally, depending on your vendor, I
> suppose), much less contribute back to the ecosystem.

I agree. And that's an issue with the standard, not the implementations.


[CODE4LIB] Approaches to "Did You Mean" Query Spelling Suggestions

2010-04-30 Thread Cory Lown
I'm exploring options for implementing a spelling suggestion or basic query 
reformulation service in our home grown search application (searches library 
website, catalog, summon, and a few other bins). Right now, my thought is to 
provide results for whatever was searched for 'as is' and generate a link for 
an alternate search -- sort of like what The Google does.  I am concerned only 
with correcting spelling errors, not so much with topically related search 

The 3 options I've found that seem worth further investigation are:

- Yahoo Search spellingSuggestion service: 

- GNU Aspell: http://aspell.net

- Ockham Spell service: http://spell.ockham.org/about/index.html . There is a 
thread on Code4Lib back in 2005 about this: 

Anyone doing something like this? What tools are you using? What have you 
tried? What worked well? Have I overlooked an option that I should consider?



Cory Lown
NCSU Libraries
Raleigh, NC

Re: [CODE4LIB] it's cool to hate on OpenURL (was: Twitter annotations...)

2010-04-30 Thread Eric Hellman
OK, what does the EdSuRoSi spec for OpenURL responses say?


On Apr 30, 2010, at 9:40 AM, Ed Summers wrote:

> On Fri, Apr 30, 2010 at 9:09 AM, Ross Singer  wrote:
>> I actually think this lack of any specified response format is a large
>> factor in the stagnation of OpenURL as a technology.  Since a resolver
>> is under no obligation to do anything but present a web page it's
>> difficult for local entrepreneurial types to build upon the
>> infrastructure simply because there are no guarantees that it will
>> work anywhere else (or even locally, depending on your vendor, I
>> suppose), much less contribute back to the ecosystem.
> I agree. And that's an issue with the standard, not the implementations.
> //Ed

Eric Hellman
President, Gluejar, Inc.
41 Watchung Plaza, #132
Montclair, NJ 07042


Re: [CODE4LIB] it's cool to hate on OpenURL (was: Twitter annotations...)

2010-04-30 Thread Owen Stephens
Although part of the problem is that you might want to offer any service on
the basis of an OpenURL, the major use case is supply of a document (either
online or via ILL), so it strikes me you could look at DAIA:
http://www.gbv.de/wikis/cls/DAIA_-_Document_Availability_Information_API
Jakob, does this make sense?


On Fri, Apr 30, 2010 at 3:08 PM, Eric Hellman  wrote:

> OK, what does the EdSuRoSi spec for OpenURL responses say?
> Eric
> On Apr 30, 2010, at 9:40 AM, Ed Summers wrote:
> > On Fri, Apr 30, 2010 at 9:09 AM, Ross Singer wrote:
> >> I actually think this lack of any specified response format is a large
> >> factor in the stagnation of OpenURL as a technology.  Since a resolver
> >> is under no obligation to do anything but present a web page it's
> >> difficult for local entrepreneurial types to build upon the
> >> infrastructure simply because there are no guarantees that it will
> >> work anywhere else (or even locally, depending on your vendor, I
> >> suppose), much less contribute back to the ecosystem.
> >
> > I agree. And that's an issue with the standard, not the implementations.
> >
> > //Ed
> Eric Hellman
> President, Gluejar, Inc.
> 41 Watchung Plaza, #132
> Montclair, NJ 07042
> e...@hellman.net
> http://go-to-hellman.blogspot.com/

Owen Stephens
Owen Stephens Consulting
Web: http://www.ostephens.com
Email: o...@ostephens.com

Re: [CODE4LIB] it's cool to hate on OpenURL (was: Twitter annotations...)

2010-04-30 Thread Ross Singer
On Fri, Apr 30, 2010 at 10:08 AM, Eric Hellman  wrote:
> OK, what does the EdSuRoSi spec for OpenURL responses say?
Well, I don't think it's up to us and I think it's dependent upon
community profile (more than Z39.88 itself), since it would be heavily
influenced by what it is actually trying to accomplish.

I think the basis of a response could actually be another context
object with the 'services' entity containing a list of
services/targets that are formatted in some way that is appropriate
for the context and the referent entity enhanced with whatever the
resolver can add to the puzzle.

This could then be taken to another resolver for more services layered on.

This is just riffing off the top of my head, of course...
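To make the layering idea concrete, here is a minimal Python sketch. It rests on loudly stated assumptions: `ResolverResponse`, its `referent` and `services` fields, and `layer()` are all hypothetical names, and a plain dictionary stands in for a real ContextObject. Nothing here is defined by Z39.88 or by any community profile.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ResolverResponse:
    """Hypothetical response: an enriched referent plus a list of services.

    A plain dict stands in for the referent entity of a ContextObject;
    this is an illustration, not a Z39.88 serialization.
    """
    referent: Dict[str, str]
    services: List[Dict[str, str]] = field(default_factory=list)


def layer(first: ResolverResponse, second: ResolverResponse) -> ResolverResponse:
    """Layer a second resolver's answer on top of a first one.

    Referent fields from the first resolver win; the second resolver only
    fills in fields the first did not supply. Services are concatenated,
    skipping any target URL the first resolver already offered.
    """
    referent = {**second.referent, **first.referent}
    seen = {s["url"] for s in first.services}
    extra = [s for s in second.services if s["url"] not in seen]
    return ResolverResponse(referent, list(first.services) + extra)
```

Passing a local resolver's response and a remote resolver's response to `layer()` yields one referent that carries every field either resolver knew, plus a single de-duplicated list of targets.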

Re: [CODE4LIB] it's cool to hate on OpenURL (was: Twitter annotations...)

2010-04-30 Thread Corey A Harper

Hi All,

Though hesitant to jump in here, I agree with Owen that the dead ends 
aren't a standards issue. The bloat of the standard is, as is the lack 
of a standardized response format, but the dead ends have to do with bad 
metadata being coded into OpenURLs and with breakdowns in the 
connection between content aggregators/providers and knowledge base 

Work in this area isn't completely stagnant, though. The joint NISO/UK 
Serials Group's "Knowledge Bases And Related Tools working group" is 
looking towards solutions to exactly these problems.


Their initial report on best practice for content providers and KB 
maintainers is worth a look.


Owen Stephens wrote:

Dead ends from OpenURL enabled hyperlinks aren't a result of the standard
though, but rather an aspect of both the problem they are trying to solve,
and the conceptual way they try to do this.

I'd contend these dead ends are an implementation issue - and despite this I
have to say that my experience on the ground is that feedback from library
users on the use of link resolvers is positive - much more so than many of
the other library systems I've been involved with.

What I do see as a problem is that this market seems to have essentially
stagnated, at least as far as I can see. I suspect the reasons for this are
complex, but it would be nice to see some more innovation in this area.


On Thu, Apr 29, 2010 at 6:14 PM, Ed Summers  wrote:

On Thu, Apr 29, 2010 at 12:08 PM, Eric Hellman  wrote:

Since this thread has turned into a discussion on OpenURL...

I have to say that during the OpenURL 1.0 standardization process, we
definitely had moments of despair. Today, I'm willing to derive satisfaction
from "it works" and overlook shortcomings. It might have been otherwise.

Personally, I've followed enough OpenURL enabled hyperlink dead ends
to contest "it works".


Corey A Harper
Metadata Services Librarian
New York University Libraries
20 Cooper Square, 3rd Floor
New York, NY 10003-7112

Re: [CODE4LIB] it's cool to hate on OpenURL (was: Twitter annotations...)

2010-04-30 Thread Eric Hellman
Eek. I was hoping for something much simpler. Do you realize that you're asking 
for service taxonomy?

On Apr 30, 2010, at 10:22 AM, Ross Singer wrote:

> I think the basis of a response could actually be another context
> object with the 'services' entity containing a list of
> services/targets that are formatted in some way that is appropriate
> for the context and the referent entity enhanced with whatever the
> resolver can add to the puzzle.

[CODE4LIB] NYTSL Spring Meeting and Program: May 19, 2010

2010-04-30 Thread Lisa Genoese
Cross-posted; apologies for duplication.


Dear colleagues and friends,

Please join us for:

New York Technical Services Librarians

Spring Meeting & Program

Wednesday, May 19, 2010

Online registration is now open!


Space is limited so please register early.  Registration deadline: Friday, May 
14, 2010.

TOPIC: Communities of Interest : A New Model for Institutional Repositories

SPEAKER: Kate Wittenberg

As Project Director, Client and Partnership Development at Ithaka, Kate focuses 
on building partnerships among scholars, publishers, libraries, technology 
providers, and societies with an interest in promoting the development of 
digital scholarship and building and sustaining innovative initiatives. Before 
coming to Ithaka, Kate was the Director of EPIC (the Electronic Publishing 
Initiative at Columbia) a pioneering initiative in digital publishing, and a 
model partnership for libraries, presses, and academic IT departments. Some of 
the ventures produced by EPIC include CIAO (Columbia International Affairs 
Online), Gutenberg-E (a reinvention of the monograph as an electronic work), 
and Jazz Studies Online.


South Court Auditorium

NYPL Humanities & Social Sciences Library

Fifth Avenue and 42nd Street

New York, N.Y. 10018

Wednesday, May 19, 2010

Refreshments, 5:00-6:00 PM

Meeting & Program, 6:00-8:00 PM


NYTSL now offers PayPal as a preferred payment method. Please go to 
http://www.nytsl.org  for more information.  Mail-in 
registration forms are also available on the website.

Program (Members): $15.00

Program + Membership (Non-members and renewals), Sept. 2009-Aug. 2010 Academic 
Year: $20.00

Program only (Non-members): $25.00

For questions about membership status, please contact Lisa Genoese, 

Re: [CODE4LIB] it's cool to hate on OpenURL (was: Twitter annotations...)

2010-04-30 Thread Ross Singer
On Fri, Apr 30, 2010 at 10:43 AM, Eric Hellman  wrote:
> Eek. I was hoping for something much simpler. Do you realize that you're 
> asking for service taxonomy?

Yes.  I think you'd have to have one, otherwise how would you know
what to expect from the results?  If the target only offered TOCs or
something, you would want to distinguish that from a target that
offers fulltext or ILL fulfillment.

I mean, right?

How would you propose a response, Eric?  I'm not sold on the ctx idea
(in fact, I'd love something simpler), I just thought it would tie a
nice bow around the existing spec :)
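The following Python sketch shows why some service taxonomy seems unavoidable. The three-value `ServiceType` enum, the preference order, and `best_delivery_target()` are hypothetical and do not come from any spec. The point is that without the labels, a client could not tell a TOC-only target from a full-text one.

```python
from enum import Enum
from typing import List, Optional, Tuple


class ServiceType(Enum):
    """A hypothetical, deliberately tiny service taxonomy."""
    FULLTEXT = "fulltext"
    TOC = "toc"
    ILL = "ill"


# What a client that wants "the document" would accept, best first.
DELIVERY_PREFERENCE = [ServiceType.FULLTEXT, ServiceType.ILL]


def best_delivery_target(targets: List[Tuple[ServiceType, str]]) -> Optional[str]:
    """Return the URL of the most preferred delivery service, or None.

    TOC targets are never returned because they cannot deliver the document.
    """
    for wanted in DELIVERY_PREFERENCE:
        for kind, url in targets:
            if kind is wanted:
                return url
    return None
```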


> On Apr 30, 2010, at 10:22 AM, Ross Singer wrote:
>> I think the basis of a response could actually be another context
>> object with the 'services' entity containing a list of
>> services/targets that are formatted in some way that is appropriate
>> for the context and the referent entity enhanced with whatever the
>> resolver can add to the puzzle.

Re: [CODE4LIB] it's cool to hate on OpenURL (was: Twitter annotations...)

2010-04-30 Thread Ed Summers
On Fri, Apr 30, 2010 at 10:43 AM, Eric Hellman  wrote:
> Eek. I was hoping for something much simpler. Do you realize that you're 
> asking for service taxonomy?

I doubt I understand the full scope of the problem (never made it
through the spec myself). But I imagine a sensible use of HTTP status
codes would've gotten most of the way there.
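Here is one way to picture "a sensible use of HTTP status codes" for a resolver, as a rough Python sketch. It assumes a toy knowledge base keyed on `rft.issn`, and `resolve()` is a hypothetical function, not part of any resolver product or of OpenURL 1.0. A request that cannot be resolved gets a 400, a request with no matching holdings gets a 404, and a single known target gets a 303 redirect.

```python
from http import HTTPStatus
from typing import Dict, Optional, Tuple


def resolve(params: Dict[str, str],
            holdings: Dict[str, str]) -> Tuple[HTTPStatus, Optional[str]]:
    """Map resolver outcomes onto HTTP status codes (hypothetical sketch).

    `params` holds decoded KEV keys from the request; `holdings` is a toy
    knowledge base mapping an ISSN to a full-text URL. Returns the status
    and, for a redirect, the value for the Location header.
    """
    issn = params.get("rft.issn")
    if not issn:
        # Not enough metadata to attempt resolution.
        return HTTPStatus.BAD_REQUEST, None
    target = holdings.get(issn)
    if target is None:
        # Well-formed request, but nothing in the knowledge base matches.
        return HTTPStatus.NOT_FOUND, None
    # Exactly one known target: send the client straight there.
    return HTTPStatus.SEE_OTHER, target
```

A machine client then needs only the status line to tell "no such item" from "malformed link", which is the kind of guarantee the thread says a bare HTML menu page cannot give.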


[CODE4LIB] New open-source inventory system

2010-04-30 Thread Yitzchak Schaffer

Hello all, apologies for cross-posting.

We have just released a PHP/symfony application for processing the 
inventory of ILS items, under the new BSD license.  This is great for 
those of us with an ILS that can do data export, but lacks an inventory 
module.  Code is included for importing items from III export files, as 
well as checking LC shelf-order.
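The released application is PHP/symfony. What follows is not its code but an independent Python sketch of what an LC shelf-order check involves. It is heavily simplified: it parses only class letters, the class number, and cutters, and it ignores volume, copy, and other real-world quirks. `lc_sort_key()` and `out_of_order()` are hypothetical names.

```python
import re
from typing import List, Tuple

# Simplified LC call number: class letters, class number, then cutters.
# Assumption: volume/copy data, dates and other edge cases are ignored.
_CLASS_RE = re.compile(r"^\s*([A-Z]{1,3})\s*(\d+(?:\.\d+)?)\s*(.*)$")
_CUTTER_RE = re.compile(r"\.?([A-Z])(\d+)")


def lc_sort_key(call_number: str) -> Tuple:
    """Build a sortable key for a (simplified) LC call number."""
    match = _CLASS_RE.match(call_number.upper())
    if match is None:
        raise ValueError(f"unparseable call number: {call_number!r}")
    letters, number, rest = match.groups()
    # Cutter digits are decimal fractions, so comparing them as strings
    # gives the right order (.P123 < .P2 < .P9).
    cutters = tuple(_CUTTER_RE.findall(rest))
    return (letters, float(number), cutters)


def out_of_order(scanned: List[str]) -> List[int]:
    """Return indexes of items that sort before the item scanned just before them."""
    flagged = []
    for i in range(1, len(scanned)):
        if lc_sort_key(scanned[i]) < lc_sort_key(scanned[i - 1]):
            flagged.append(i)
    return flagged
```

Comparing only adjacent items keeps the check simple. The trade-off is that one item shelved far too early or too late may also cause its neighbour to be flagged.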


Have fun; let me know if you want to contribute.

Yitzchak Schaffer
Systems Manager
Touro College Libraries
33 West 23rd Street
New York, NY 10010
Tel (212) 463-0400 x5230
Fax (212) 627-3197
Email yitzchak.schaf...@tourolib.org

Access Problems? Contact systems.libr...@touro.edu

Re: [CODE4LIB] it's cool to hate on OpenURL (was: Twitter annotations...)

2010-04-30 Thread Ross Singer
On Fri, Apr 30, 2010 at 11:21 AM, Ed Summers  wrote:
> I doubt I understand the full scope of the problem (never made it
> through the spec myself). But I imagine a sensible use of HTTP status
> codes would've gotten most of the way there.

Just to clarify -- OpenURL 1.0 does not assume HTTP is being used.

This is not an endorsement of that view, just stating the facts.


Re: [CODE4LIB] it's cool to hate on OpenURL (was: Twitter annotations...)

2010-04-30 Thread Ed Summers
On Fri, Apr 30, 2010 at 11:33 AM, Ross Singer  wrote:
> Just to clarify -- OpenURL 1.0 does not assume HTTP is being used.

Oh, so that's the problem!

Re: [CODE4LIB] it's cool to hate on OpenURL (was: Twitter annotations...)

2010-04-30 Thread Mike Taylor
On 30 April 2010 16:42, Ed Summers  wrote:
> On Fri, Apr 30, 2010 at 11:33 AM, Ross Singer  wrote:
>> Just to clarify -- OpenURL 1.0 does not assume HTTP is being used.
> Oh, so that's the problem!

Yes!  Exactly!

Poor old OpenURL 1.0 is abstracted to hell and back.  The sad old
thing doesn't even know what transport it's running on (why?  Because
Abstraction Is Good, not because anyone actually had any reason for
wanting to use a different transport than HTTP), and as a result it
can't assume it has, for example, the ability for the transport to
report errors.

It's a shame.  I can see the reasons why the committee took it the way
they did, but the whole exercise has ended up smelling of architecture
astronautics.  See this column if you're not familiar with the term,
it's a good read:
http://www.joelonsoftware.com/articles/fog18.html

Re: [CODE4LIB] it's cool to hate on OpenURL (was: Twitter annotations...)

2010-04-30 Thread Ross Singer
On Fri, Apr 30, 2010 at 11:52 AM, Mike Taylor  wrote:
> On 30 April 2010 16:42, Ed Summers  wrote:
>> On Fri, Apr 30, 2010 at 11:33 AM, Ross Singer  wrote:
>>> Just to clarify -- OpenURL 1.0 does not assume HTTP is being used.
>> Oh, so that's the problem!
> Yes!  Exactly!
> Poor old OpenURL 1.0 is abstracted to hell and back.  The sad old
> thing doesn't even know what transport it's running on (why?  Because
> Abstraction Is Good, not because anyone actually had any reason for
> wanting to use a different transport than HTTP), and as a result it
> can't assume it has, for example, the ability for the transport to
> report errors.

Of course, per Eric's earlier comment, there's no reason why we can't
take what's there and refine it so that there are assumptions like
HTTP and optimize it to actually *work* in such an environment.

Is there?


> It's a shame.  I can see the reasons why the committee took it the way
> they did, but the whole exercise has ended up smelling of architecture
> astronautics.  See this column if you're not familiar with the term,
> it's a good read:
>        http://www.joelonsoftware.com/articles/fog18.html

Re: [CODE4LIB] it's cool to hate on OpenURL (was: Twitter annotations...)

2010-04-30 Thread Mike Taylor
On 30 April 2010 16:56, Ross Singer  wrote:
> On Fri, Apr 30, 2010 at 11:52 AM, Mike Taylor  wrote:
>> On 30 April 2010 16:42, Ed Summers  wrote:
>>> On Fri, Apr 30, 2010 at 11:33 AM, Ross Singer  wrote:
>>>> Just to clarify -- OpenURL 1.0 does not assume HTTP is being used.
>>> Oh, so that's the problem!
>> Yes!  Exactly!
>> Poor old OpenURL 1.0 is abstracted to hell and back.  The sad old
>> thing doesn't even know what transport it's running on (why?  Because
>> Abstraction Is Good, not because anyone actually had any reason for
>> wanting to use a different transport than HTTP), and as a result it
>> can't assume it has, for example, the ability for the transport to
>> report errors.
> Of course, per Eric's earlier comment, there's no reason why we can't
> take what's there and refine it so that there are assumptions like
> HTTP and optimize it to actually *work* in such an environment.
> Is there?

Well, that's what the "Community Profiles" are.  So now you have TWO
long, dense, boring documents to read -- the standard and the profile!

The main game in town for Making OpenURL 1.0 Usable (maybe still the
only game, come to think of it) is the San Antonio Profile, or SAP for
short, which you can get here:
Happily, it's only eight pages long (i.e. the same length as the
entire original OpenURL 0.1 specification).  The bad news is, they are
incredibly dense pages.  Sample statement:

D.5.5 Transports
  Transports define how to transport representations of ContextObjects over the
  network. Table D.7 lists the Transports supported by the San Antonio Community
  The By-Value OpenURL Transport and the By-Reference OpenURL
  Transport (over HTTP and HTTPS) may be used to transport ContextObjects
  represented by means of the Key/Encoded-Value and the XML ContextObject
  Format. In case of the Key/Encoded-Value representation, only a single
  ContextObject may be transported. In case of the XML representation, one or more
  ContextObjects may be transported.
  The Inline OpenURL Transport may only be used to transport a single
  ContextObject represented using the KEV ContextObject Format.

Have fun with that!
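Stripped of the dense prose, the simplest case in that excerpt is the By-Value OpenURL Transport carrying one KEV ContextObject in an HTTP GET query string. The rough Python sketch below shows what that looks like. The resolver base URL is a placeholder, and `by_value_openurl()` is a hypothetical helper. The `url_ver`, `url_ctx_fmt`, `ctx_ver`, `rft_val_fmt` and `rft.*` keys are the standard KEV ones.

```python
from urllib.parse import urlencode


def by_value_openurl(resolver_base: str, journal_article: dict) -> str:
    """Build a By-Value OpenURL: one KEV ContextObject in an HTTP GET.

    `journal_article` maps KEV journal-format keys (e.g. "atitle", "issn")
    to values; each becomes an "rft."-prefixed key. KEV can carry only a
    single ContextObject per request.
    """
    params = {
        "url_ver": "Z39.88-2004",                       # OpenURL 1.0 transport version
        "url_ctx_fmt": "info:ofi/fmt:kev:mtx:ctx",      # ContextObject serialized as KEV
        "ctx_ver": "Z39.88-2004",
        "rft_val_fmt": "info:ofi/fmt:kev:mtx:journal",  # referent uses the journal format
    }
    params.update({f"rft.{key}": value for key, value in journal_article.items()})
    return f"{resolver_base}?{urlencode(params)}"
```

For example, `by_value_openurl("http://resolver.example.edu/openurl", {"issn": "1234-5678", "spage": "34"})` produces a link a browser can follow directly.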

Re: [CODE4LIB] it's cool to hate on OpenURL (was: Twitter annotations...)

2010-04-30 Thread MJ Suhonos
> Well, that's what the "Community Profiles" are.  So now you have TWO
> long, dense, boring documents to read -- the standard and the profile!
> The main game in town for Making OpenURL 1.0 Usable (maybe still the
> only game, come to think of it) is the San Antonio Profile, or SAP for
> short, which you can get here:
> Happily, it's only eight pages long (i.e. the same length as the
> entire original OpenURL 0.1 specification).  The bad news is, they are
> incredibly dense pages.  Sample statement:
> Have fun with that!

Funny, I feel the same way when I read many of the Dublin Core specifications, 
and they have helpful diagrams and example code blocks to boot!  :-)


Re: [CODE4LIB] it's cool to hate on OpenURL (was: Twitter annotations...)

2010-04-30 Thread Eric Hellman
COinS showed that it is in fact possible to do so; there are probably more COinS 
in the wild than OpenURLs.
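For anyone who hasn't looked at COinS closely, the convention amounts to little more than putting a KEV ContextObject, HTML-escaped, into the `title` attribute of a `<span class="Z3988">`. The Python sketch below builds one for a book. `coins_span()` is a hypothetical helper, and the sample metadata is made up.

```python
from html import escape
from urllib.parse import urlencode


def coins_span(book: dict) -> str:
    """Embed a KEV ContextObject for a book in an HTML span (COinS).

    The span's class must be "Z3988"; the KEV string goes, HTML-escaped,
    into the title attribute, where client-side tools can pick it up.
    """
    kev = urlencode({
        "ctx_ver": "Z39.88-2004",
        "rft_val_fmt": "info:ofi/fmt:kev:mtx:book",
        **{f"rft.{key}": value for key, value in book.items()},
    })
    return f'<span class="Z3988" title="{escape(kev)}"></span>'
```

No transport or response format is involved. The page simply carries the metadata, and whatever tool finds the span decides what to do with it.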

I was thinking more along the lines of Ed's suggestion (request headers, too), 
although I previously had implemented something along the lines of what Ross 

On Apr 30, 2010, at 11:56 AM, Ross Singer wrote:

> Of course, per Eric's earlier comment, there's no reason why we can't
> take what's there and refine it so that there are assumptions like
> HTTP and optimize it to actually *work* in such an environment.
> Is there?
> -Ross.

On Apr 30, 2010, at 11:21 AM, Ed Summers wrote:

> I doubt I understand the full scope of the problem (never made it
> through the spec myself). But I imagine a sensible use of HTTP status
> codes would've gotten most of the way there.
> //Ed

Eric Hellman
President, Gluejar, Inc.
41 Watchung Plaza, #132
Montclair, NJ 07042


Re: [CODE4LIB] it's cool to hate on OpenURL (was: Twitter annotations...)

2010-04-30 Thread Karen Coyle

Quoting Mike Taylor :

> It's a shame.  I can see the reasons why the committee took it the way
> they did, but the whole exercise has ended up smelling of architecture
> astronautics.  See this column if you're not familiar with the term,
> it's a good read:

Speaking as someone who was on the committee, I can tell you that  
there was not a consensus on "going astronautic." Although some of us  
fought a good (well, at least hard) fight, the astronauts won. And if  
you think the text of the final standard is dense, you should have  
seen version 0.1! Eric Hellman wrote a revised version that 1) was in  
English and 2) made sense, but that, too, was rejected.

If you want to see my reaction to being presented with the "Bison  
Futé" model [1] on the first day of the OpenURL committee meeting,  
download this [2] PPT and play it as a slide show (it is  
self-animated). (It helps you get the joke if you know that "Bison  
Futé" means "wily buffalo".)

[1] http://www.dlib.org/dlib/july01/vandesompel/07vandesompel.html
[2] http://kcoyle.net/presentations/cpm3.ppt

Karen Coyle
kco...@kcoyle.net http://kcoyle.net
ph: 1-510-540-7596
m: 1-510-435-8234
skype: kcoylenet

Re: [CODE4LIB] Approaches to "Did You Mean" Query Spelling Suggestions

2010-04-30 Thread Eric Larson
Bing's API is very nice.  Among many fun services, it includes a JSON  
"Did You Mean" service:


We've a large Solr index (8M+ docs) of multi-language MARC records  
that makes using Solr's internal SpellCheckComponent less than ideal.   
Solr would assume too many poor spellings were correctly spelled.

Bing wins for our use case.

- Eric

Eric Larson
Digital Library Consultant
UW Digital Collections Center

On Apr 30, 2010, at 8:55 AM, Cory Lown wrote:

I'm exploring options for implementing a spelling suggestion or  
basic query reformulation service in our home grown search  
application (searches library website, catalog, summon, and a few  
other bins). Right now, my thought is to provide results for  
whatever was searched for 'as is' and generate a link for an  
alternate search -- sort of like what The Google does.  I am  
concerned only with correcting spelling errors, not so much with  
topically related search suggestions.

The 3 options I've found that seem worth further investigation are:

- Yahoo Search spellingSuggestion service: 

- GNU Aspell: http://aspell.net

- Ockham Spell service: http://spell.ockham.org/about/index.html .  
There is a thread on Code4Lib back in 2005 about this: http://serials.infomotions.com/code4lib/sru/?operation=searchRetrieve&version=1.1&stylesheet=/code4lib/sru/style.xsl&query=spelling+server

Anyone doing something like this? What tools are you using? What  
have you tried? What worked well? Have I overlooked an option that I  
should consider?



Cory Lown
NCSU Libraries
Raleigh, NC

[CODE4LIB] SRU/ZeeRex explain question : record schemas

2010-04-30 Thread Jonathan Rochkind
This page:
http://www.loc.gov/standards/sru/resources/schemas.html
says:

"The Explain document lists the XML schemas for a given database in which 
records may be transferred. Every schemas is unambiguously identified by a URI 
and a server may assign a short name, which may or may not be the same as the 
short name listed in the table below (and may differ from the short name that 
another server assigns)."

But perusing the SRU/ZeeRex Explain documentation I've been able to find, I've 
been unable to find WHERE in the Explain document this information is 
listed/advertised.

Can anyone clue me in?

Re: [CODE4LIB] SRU/ZeeRex explain question : record schemas

2010-04-30 Thread Ray Denenberg, Library of Congress

<schemaInfo> is what you're looking for, I think.

Look at http://z3950.loc.gov:7090/voyager.

Line 74, for example,



Is this what you're looking for?


- Original Message - 
From: "Jonathan Rochkind" 

Sent: Friday, April 30, 2010 3:57 PM
Subject: [CODE4LIB] SRU/ZeeRex explain question : record schemas

This page:


"The Explain document lists the XML schemas for a given database in which 
records may be transferred. Every schemas is unambiguously identified by a 
URI and a server may assign a short name, which may or may not be the same 
as the short name listed in the table below (and may differ from the short 
name that another server assigns)."

But perusing the SRU/ZeeRex Explain documentation I've been able to find, 
I've been unable to find WHERE in the Explain document this information is 

Can anyone clue me in? 

Re: [CODE4LIB] SRU/ZeeRex explain question : record schemas

2010-04-30 Thread LeVan,Ralph
There's a <schemaInfo> element right under the <explain> element that
carries that data.

Here's a pointer to the Explain record for my LCNAF database.


Don't let the browser fool you!  View the source and you'll see the
actual XML that was returned.  The schemaInfo element is towards the


p.s. For my LCNAF friends on the list, note the change to the 
element in the  section.  It now includes the update date and
number of records in the database!  That's being automatically generated
by the SRW server.

> -Original Message-
> From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf
> Jonathan Rochkind
> Sent: Friday, April 30, 2010 3:58 PM
> Subject: [CODE4LIB] SRU/ZeeRex explain question : record schemas
> This page:
> http://www.loc.gov/standards/sru/resources/schemas.html
> says:
> "The Explain document lists the XML schemas for a given database in
> records may be transferred. Every schemas is unambiguously identified by a
> URI and a server may assign a short name, which may or may not be the
> as the short name listed in the table below (and may differ from the short name
> that another server assigns)."
> But perusing the SRU/ZeeRex Explain documentation I've been able to find, I've
> been unable to find WHERE in the Explain document this information is
> listed/advertised.
> Can anyone clue me in?
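To pull the advertised record schemas out of an Explain record programmatically, something along the lines of this Python sketch should work. It assumes the ZeeRex 2.0 namespace `http://explain.z3950.org/dtd/2.0/`, with `<schema>` children of `<schemaInfo>` carrying `name` and `identifier` attributes. `record_schemas()` is a hypothetical helper, and it has only been reasoned through against a hand-made sample, not checked against a live server.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

ZEEREX_NS = {"zr": "http://explain.z3950.org/dtd/2.0/"}


def record_schemas(explain_xml: str) -> List[Tuple[str, str]]:
    """List (short name, identifier URI) for each schema in <schemaInfo>.

    Searching with ".//" also finds the <explain> record when it is
    wrapped inside an SRU explainResponse envelope.
    """
    root = ET.fromstring(explain_xml)
    return [
        (schema.get("name", ""), schema.get("identifier", ""))
        for schema in root.iterfind(".//zr:schemaInfo/zr:schema", ZEEREX_NS)
    ]
```

The identifier URI is the part that is stable across servers. The short name is whatever a given server chose, which matches the caveat quoted from the LoC page.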

Re: [CODE4LIB] Approaches to "Did You Mean" Query Spelling Suggestions

2010-04-30 Thread Genny Engel
I am not a fan of services that give spelling suggestions based on their own 
web-wide universe of terms.  It's better to suggest only terms that are 
actually found within the smaller universe of your own materials.  That way the 
user isn't offered a link that's guaranteed to get them zero results.  However, 
this only works if you're actually indexing the contents of all your sources 
into a local index -- not if you're dynamically retrieving the results from 
different sources.

I don't have personal experience with any of the options you list, but from 
briefly looking at them, I would be inclined toward Aspell since you'd control 
the dictionary.  

Ideally the dictionary would auto-populate from the index the search engine 
builds.  We use Thunderstone Webinator http://www.thunderstone.com for our 
website search and it uses its own index for the spelling suggestions.  It also 
lists in parentheses the number of results that match each suggestion.   
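The "only suggest what's in your own index" approach can be sketched with nothing more than Python's standard `difflib`. This is a minimal illustration, not how Webinator or Aspell work. It assumes you can export a term-to-record-count table from your local index, and `suggest()` and its `cutoff` value are hypothetical choices.

```python
from difflib import get_close_matches
from typing import Dict, List, Tuple


def suggest(query: str, term_counts: Dict[str, int],
            max_suggestions: int = 3, cutoff: float = 0.8) -> List[Tuple[str, int]]:
    """Suggest alternative terms drawn only from the local index.

    `term_counts` maps each indexed term to the number of records it
    matches, so every suggestion returns results and its count can be
    shown next to it.
    """
    term = query.lower().strip()
    if term_counts.get(term, 0) > 0:
        return []  # the query already matches something; no suggestion needed
    candidates = [t for t, n in term_counts.items() if n > 0]
    matches = get_close_matches(term, candidates, n=max_suggestions, cutoff=cutoff)
    return [(m, term_counts[m]) for m in matches]
```

For a real catalog you would want a proper spelling library or a search-engine component rather than a linear `difflib` scan. The design point, keeping suggestions tied to terms that actually return hits, carries over either way.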

Genny Engel
Sonoma County Library
707 545-0831 x581

>>> cory_l...@ncsu.edu 04/30/10 06:55AM >>>
I'm exploring options for implementing a spelling suggestion or basic query 
reformulation service in our home grown search application (searches library 
website, catalog, summon, and a few other bins). Right now, my thought is to 
provide results for whatever was searched for 'as is' and generate a link for 
an alternate search -- sort of like what The Google does.  I am concerned only 
with correcting spelling errors, not so much with topically related search 

The 3 options I've found that seem worth further investigation are:

- Yahoo Search spellingSuggestion service: 

- GNU Aspell: http://aspell.net 

- Ockham Spell service: http://spell.ockham.org/about/index.html . There is a 
thread on Code4Lib back in 2005 about this: 

Anyone doing something like this? What tools are you using? What have you 
tried? What worked well? Have I overlooked an option that I should consider?



Cory Lown
NCSU Libraries
Raleigh, NC

Re: [CODE4LIB] Approaches to "Did You Mean" Query Spelling Suggestions

2010-04-30 Thread Brad Dewar
Seconded.  We use Solr's SpellCheckComponent to accomplish exactly this.
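For reference, asking Solr for suggestions alongside a normal search is a matter of request parameters. The Python sketch below builds such a URL. The core URL is a placeholder, `solr_spellcheck_url()` is a hypothetical helper, and it assumes the `/select` handler has SpellCheckComponent in its component list with a spellchecker built from your own index.

```python
from urllib.parse import urlencode


def solr_spellcheck_url(core_url: str, user_query: str, count: int = 5) -> str:
    """Build a Solr search URL that also asks for spelling suggestions.

    Assumes the /select handler includes SpellCheckComponent and that
    its dictionary is built from the local index.
    """
    params = {
        "q": user_query,
        "spellcheck": "true",            # turn the component on for this request
        "spellcheck.q": user_query,      # text to check
        "spellcheck.count": str(count),  # max suggestions per term
        "spellcheck.collate": "true",    # also return a rewritten whole query
        "wt": "json",
    }
    return f"{core_url.rstrip('/')}/select?{urlencode(params)}"
```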

Brad Dewar

-Original Message-
From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of Genny 
Sent: April-30-10 6:00 PM
Subject: Re: [CODE4LIB] Approaches to "Did You Mean" Query Spelling Suggestions

I am not a fan of services that give spelling suggestions based on their own 
web-wide universe of terms.  It's better to suggest only terms that are 
actually found within the smaller universe of your own materials.  That way the 
user isn't offered a link that's guaranteed to get them zero results.  However, 
this only works if you're actually indexing the contents of all your sources 
into a local index -- not if you're dynamically retrieving the results from 
different sources.

I don't have personal experience with any of the options you list, but from 
briefly looking at them, I would be inclined toward Aspell since you'd control 
the dictionary.  

Ideally the dictionary would auto-populate from the index the search engine 
builds.  We use Thunderstone Webinator http://www.thunderstone.com for our 
website search and it uses its own index for the spelling suggestions.  It also 
lists in parentheses the number of results that match each suggestion.   

Genny Engel
Sonoma County Library
707 545-0831 x581

>>> cory_l...@ncsu.edu 04/30/10 06:55AM >>>
I'm exploring options for implementing a spelling suggestion or basic query 
reformulation service in our home grown search application (searches library 
website, catalog, summon, and a few other bins). Right now, my thought is to 
provide results for whatever was searched for 'as is' and generate a link for 
an alternate search -- sort of like what The Google does.  I am concerned only 
with correcting spelling errors, not so much with topically related search 

The 3 options I've found that seem worth further investigation are:

- Yahoo Search spellingSuggestion service: 

- GNU Aspell: http://aspell.net 

- Ockham Spell service: http://spell.ockham.org/about/index.html . There is a 
thread on Code4Lib back in 2005 about this: 

Anyone doing something like this? What tools are you using? What have you 
tried? What worked well? Have I overlooked an option that I should consider?



Cory Lown
NCSU Libraries
Raleigh, NC

Re: [CODE4LIB] Approaches to "Did You Mean" Query Spelling Suggestions

2010-04-30 Thread Chad Fennell
 "Seconded.  We use Solr's SpellCheckComponent to accomplish exactly this."


[CODE4LIB] It's cool to love milk and cookies

2010-04-30 Thread Simon Spero

I like chocolate milk.