[CODE4LIB] NCSU's Virtual Shelf Browse Project Open Source Release

2010-05-19 Thread Cory Lown
The NCSU Libraries is pleased to release its Virtual Shelf Browse application 
and web service as open source software. Source code is available for viewing, 
download, and checkout from Google Code under the MIT/X11 License. It includes 
a back end web service for retrieving items in shelf order and a front end web 
application for browsing.
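For the curious, a client call against such a shelf-order web service might look like the sketch below. The endpoint path and parameter names here are invented for illustration, not the project's actual API; see the Google Code link for the real interface.

```python
# Hypothetical client for a shelf-order web service. The "/shelf" path and
# the "call"/"before"/"after" parameters are invented for illustration.
import json
from urllib.parse import urlencode
from urllib.request import urlopen

def get_shelf_neighbors(base_url, call_number, before=5, after=5):
    """Fetch items shelved around a given call number (hypothetical API)."""
    query = urlencode({"call": call_number, "before": before, "after": after})
    with urlopen(f"{base_url}/shelf?{query}") as resp:
        return json.load(resp)

# Usage (would require a running service):
# items = get_shelf_neighbors("http://example.org/browse", "QA76.9 .D3")
# for item in items:
#     print(item["call_number"], item["title"])
```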

The Virtual Shelf Browse project is intended to mimic in an online environment 
the discovery opportunities available while browsing library book stacks. We 
intend this as a first step toward providing patrons with Amazon and Netflix 
style recommendations to "find more like this" in the library.

The code:

http://code.google.com/p/virtual-shelf-browse/

Try it here (click on "browse shelf"):

http://www2.lib.ncsu.edu/catalog/record/NCSU2240490

Additional information about the project:

http://www.lib.ncsu.edu/dli/projects/virtualshelfindex/
http://code4lib.org/conference/2010/orphanides_lown_lynema

Do with it what you will -- and enjoy.

The NCSU Libraries

Andreas Orphanides
Emily Lynema
Cory Lown


Re: [CODE4LIB] Multi-server Search Engine response times: was - OASIS SRU and CQL, access to most-current drafts

2010-05-19 Thread Peter Noerr
Agreed, it is a problem. What MSSEs do (when operating this way) is make this 
issue a response-time-dependent one. Users themselves make it a Source-dependent 
one (they only look at results from the sites they decide to search). 
Ranking algorithms make it an algorithm-dependent one (their algorithm will 
determine what is at the top of the list).

In all cases the results are vying for the few slots that the user will 
actually look at - "above the fold", "first 3", "first page", etc. The problem 
is that all results cannot be first, and we do not have any way to insist the 
user look at all of them and make an informed selection. Anyway this can go all 
the way back to the collection policies of the library and the aggregators and 
even the cussedness of authors in not writing articles on exactly the right 
topic. (bad authors!) 

The MSSEs try to be even-handed about it, but it doesn't always work. Possible 
saving technologies here are text analysis and faceting. These can help take 
"horizontal slices" out of the vertically ordered list of results. That means 
the users can select another list which will be ordered a bit differently, and 
with text analysis and facets applied again, give them ways to slice and dice 
those results. But, in the end it requires enough interest from the user to do 
some refinement, and that battles with "good enough".
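A minimal sketch of that slicing idea: counting facet values over a merged, vertically ordered result list and filtering by one of them. The record fields ("title", "source", "year") are invented for illustration.

```python
# "Horizontal slices" from a vertically ordered result list: compute facet
# counts, then filter by a facet value while preserving the original order.
def facet_counts(results, field):
    counts = {}
    for r in results:
        counts[r[field]] = counts.get(r[field], 0) + 1
    return counts

def slice_by_facet(results, field, value):
    # Keeps the original ranking within the selected slice.
    return [r for r in results if r[field] == value]

merged = [
    {"title": "A", "source": "PsycINFO", "year": 2009},
    {"title": "B", "source": "ERIC", "year": 2008},
    {"title": "C", "source": "PsycINFO", "year": 2008},
]
print(facet_counts(merged, "source"))  # {'PsycINFO': 2, 'ERIC': 1}
print([r["title"] for r in slice_by_facet(merged, "year", 2008)])  # ['B', 'C']
```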

Peter

> -Original Message-
> From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of
> Walker, David
> Sent: Wednesday, May 19, 2010 1:18 PM
> To: CODE4LIB@LISTSERV.ND.EDU
> Subject: Re: [CODE4LIB] Multi-server Search Engine response times: was
> - OASIS SRU and CQL, access to most-current drafts
> 
> > And if the majority of users are only looking at results
> > from one resource... why do a broadcast multi-server
> > search in the first place?
> 
> More than just a theoretical concern.  Consider this from an article by
> Nina McHale:
> 
> "[R]eference and instruction staff at Auraria were asked to draw up a
> list of ten or so resources that would be included in a general-focus
> “Quick Search” . . . [h]owever, in practice, the result was
> disappointing. The results returned from the fastest resource were the
> results on top of the pile, and of the twelve resources chosen,
> PsycINFO routinely returned results first. Reference and instruction
> staff rightly felt that this skewed the results for a general query."
> [1]
> 
> One library's perspective, and I'm pretty sure they were not using Muse.
> But conceptually the concern would be the same.
> 
> --Dave
> 
> [1] http://webserviceslibrarian.blogspot.com/2009/01/why-reference-and-
> instruction.html
> 
> ==
> David Walker
> Library Web Services Manager
> California State University
> http://xerxes.calstate.edu

Re: [CODE4LIB] Multi-server Search Engine response times: was - OASIS SRU and CQL, access to most-current drafts

2010-05-19 Thread Walker, David
> And if the majority of users are only looking at results 
> from one resource... why do a broadcast multi-server 
> search in the first place?

More than just a theoretical concern.  Consider this from an article by Nina 
McHale:

"[R]eference and instruction staff at Auraria were asked to draw up a list of 
ten or so resources that would be included in a general-focus “Quick Search” . 
. . [h]owever, in practice, the result was disappointing. The results returned 
from the fastest resource were the results on top of the pile, and of the 
twelve resources chosen, PsycINFO routinely returned results first. Reference 
and instruction staff rightly felt that this skewed the results for a general 
query." [1]

One library's perspective, and I'm pretty sure they were not using Muse.  But 
conceptually the concern would be the same.

--Dave

[1] 
http://webserviceslibrarian.blogspot.com/2009/01/why-reference-and-instruction.html

==
David Walker
Library Web Services Manager
California State University
http://xerxes.calstate.edu

From: Code for Libraries [code4...@listserv.nd.edu] On Behalf Of Jonathan 
Rochkind [rochk...@jhu.edu]
Sent: Wednesday, May 19, 2010 12:45 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] Multi-server Search Engine response times: was - OASIS 
SRU and CQL, access to most-current drafts

Wait, but in the case you suspect is common, where you return results as
soon as the first resource is returned, and subsequent results are added
to the _end_ of the list

I'm thinking that in most of these cases, the subsequent results will be
several pages "in", and the user will never even get there. And if the
majority of users are only looking at results from one resource... why
do a broadcast multi-server search in the first place?

Re: [CODE4LIB] Multi-server Search Engine response times: was - OASIS SRU and CQL, access to most-current drafts

2010-05-19 Thread Peter Noerr
Aha, but we get interleaved results from the different Sources. So the results 
are not "all A", "all B", "all... Even if the results come as complete "sets of 
10", we internally collect them asynchronously as they are processed. The 
number of buffers and processing stages is quite large, so the parallel 
processing nature of multi-tasking means that the results get interleaved. It 
is still possible that one set of results comes in so far in advance of 
everything else that it is completely processed before anything else arrives, 
then the display is "all A", "others".
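The interleaving described above can be sketched with asyncio. The Source names, batch counts, and delays below are invented; the point is only that a collector consuming batches as they complete sees a mixed stream rather than "all A" then "all B".

```python
# Two simulated Sources deliver record batches on their own timescales;
# collecting asynchronously interleaves their arrivals in one stream.
import asyncio

async def search_source(name, batch_delay, batches, out):
    for i in range(batches):
        await asyncio.sleep(batch_delay)   # Source latency per batch
        out.append(f"{name}-batch{i}")

async def federated_search():
    out = []
    await asyncio.gather(
        search_source("A", 0.02, 3, out),  # faster Source
        search_source("B", 0.05, 3, out),  # slower Source
    )
    return out

stream = asyncio.run(federated_search())
print(stream)  # interleaved by arrival time; A-batch0 arrives first
```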

However, the major benefit is that the results from all the Sources are there at 
once, so even if the user uses the system to "skip" from Source to Source, it 
is quicker than running the search on all the Sources individually. And, yes, 
you can individually save "a few here", "one or two there" to make your 
combined chosen few. 

But first-page-only viewing does mean that the fastest Sources get the best 
spots. Is this an incentive to speed up the search systems? (Actually it has 
happened that a couple of the Sources we showed comparative response times 
to did use the figures to get funds for hardware replacement.)

Peter

> -Original Message-
> From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of
> Jonathan Rochkind
> Sent: Wednesday, May 19, 2010 12:45 PM
> To: CODE4LIB@LISTSERV.ND.EDU
> Subject: Re: [CODE4LIB] Multi-server Search Engine response times: was
> - OASIS SRU and CQL, access to most-current drafts
> 
> Wait, but in the case you suspect is common, where you return results
> as
> soon as the first resource is returned, and subsequent results are
> added
> to the _end_ of the list
> 
> I'm thinking that in most of these cases, the subsequent results will
> be
> several pages "in", and the user will never even get there. And if the
> majority of users are only looking at results from one resource... why
> do a broadcast multi-server search in the first place?

Re: [CODE4LIB] OASIS SRU and CQL, access to most-current drafts

2010-05-19 Thread Peter Noerr
Since we generally return results asynchronously to client systems from our 
MSSE (fed/meta/broadcast/aggregated/parallel/Multi-Server/Search Engine) I 
would just point out that we use other protocols than SRU when doing so. When 
we do use SRU on the client side, then we send back the results in a complete 
set. Otherwise we send them in tranches on a timescale controlled by the client 
system, usually about every 2 seconds.
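A toy sketch of that tranche delivery, with a thread-safe queue standing in for the MSSE's internal buffers and a single drain standing in for one tick of the client's periodic poll; all names are invented, and the demo waits for the producer to finish so the result is deterministic.

```python
# Tranche delivery: records accumulate in a buffer as Sources respond,
# and the client drains whatever is there on its own polling timescale.
import queue
import threading
import time

def producer(buf):
    # Stand-in for the MSSE's buffers filling as Sources return records.
    for i in range(6):
        time.sleep(0.01)
        buf.put(f"record-{i}")

def poll_tranche(buf):
    """Drain everything accumulated since the last poll into one tranche."""
    tranche = []
    while True:
        try:
            tranche.append(buf.get_nowait())
        except queue.Empty:
            return tranche

buf = queue.Queue()
t = threading.Thread(target=producer, args=(buf,))
t.start()
t.join()  # demo shortcut: wait for all records, then take one big tranche
tranche = poll_tranche(buf)
print(tranche)  # ['record-0', ..., 'record-5']
```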

Obviously an SRU-async protocol is possible, but would it be used? As an MSSE we 
would use it to get results from Sources, so they could be processed earlier 
(smaller response time) and more smoothly. But that would require Source 
servers to implement it, and what would their incentive be to do so? 

For direct use with end users it would require a browser client capable of 
retrieving and managing the partial data. Middleware systems (between 
the MSSE and the user) would need to support it, and pass the benefit to the 
user. Any system doing heavy analysis of the results would probably not want 
(and may not be able) to start that analysis until all the results are 
obtained, because of the added messiness of handling partial results sets, from 
multiple Sources (it is messy - believe me). 

I would be very happy to see such a protocol (and have it implemented), and if 
Jakub implemented browser code to handle that end, then the users could benefit.

Peter

Peter Noerr
CTO, MuseGlobal
www.museglobal.com

> -Original Message-
> From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of
> Jakub Skoczen
> Sent: Tuesday, May 18, 2010 12:51 PM
> To: CODE4LIB@LISTSERV.ND.EDU
> Subject: Re: [CODE4LIB] OASIS SRU and CQL, access to most-current
> drafts
> 
> On Tue, May 18, 2010 at 9:17 PM, Ray Denenberg, Library of Congress
>  wrote:
> > First, no. There are extensibility features in SRU but nothing that
> would
> > help here.
> >
> > Actually, Jonathan, what I thought you were suggesting was the
> creation of a
> > (I hesitate to say it) metasearch engine. I use that term because it
> is what
> > NISO called it, when they started their metasearch initiative five or
> so
> > years ago, to create a standard for a metasearch engine, but they got
> > distracted and the effort really came to nothing.
> 
> I'm not sure if Jonathan was suggesting that but that's exactly what I
> had in mind - using SRU 2.0 as a front-end protocol for a meta-search
> engine. And yes while creating a third-party, SRU-inspired protocol
> for that purpose could work, I see very little value in such an exercise.
> I suspect that, as any standard, SRU has certain limitations and, as
> an implementer, you have to work around them but you do end up with an
> obvious gain: standards compliance. An SRU-inspired protocol is not quite
> the same thing, and it's probably easier to go all the way and create
> a custom, proprietary protocol.
> 
> > The premise of the metasearch engine is that there exists a single-
> thread
> > protocol, for example, SRU, and the need is to manage many threads,
> which is
> > what the metasearch engine would have done if it had ever been
> defined. This
> > is probably not an area for OASIS work, but if someone wanted to
> revive the
> > effort in NISO (and put it on the right track) it could be useful.
> >
> > --Ray
> >
> >
> > -Original Message-
> > From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf
> Of
> > Jonathan Rochkind
> > Sent: Tuesday, May 18, 2010 2:56 PM
> > To: CODE4LIB@LISTSERV.ND.EDU
> > Subject: Re: [CODE4LIB] OASIS SRU and CQL, access to most-current
> drafts
> >
> > Jakub Skoczen wrote:
> >>
> >>> I wonder if someone, like Kuba, could design an 'extended async
> SRU'
> >>> on top of SRU, that is very SRU like, but builds on top of it to
> add
> >>> just enough operations for Kuba's use case area.  I think that's
> the
> >>> right way to approach it.
> >>>
> >>
> >> Is there a particular "extensibility" feature in the protocol that
> >> allows for this?
> >>
> > I don't know, but that's not what I was suggesting. I was suggesting
> you
> > read the SRU spec, and then design your own "SRU-async" spec, which
> is
> > defined as "exactly like SRU 2.0, except it also has the following
> > operations, and is identified in an Explain document like X."
> >
> > Jonathan
> >
> 
> 
> 
> --
> 
> Cheers,
> Jakub


Re: [CODE4LIB] Multi-server Search Engine response times: was - OASIS SRU and CQL, access to most-current drafts

2010-05-19 Thread Jonathan Rochkind
Wait, but in the case you suspect is common, where you return results as 
soon as the first resource is returned, and subsequent results are added 
to the _end_ of the list


I'm thinking that in most of these cases, the subsequent results will be 
several pages "in", and the user will never even get there. And if the 
majority of users are only looking at results from one resource... why 
do a broadcast multi-server search in the first place?



[CODE4LIB] Multi-server Search Engine response times: was - OASIS SRU and CQL, access to most-current drafts

2010-05-19 Thread Peter Noerr
However things are a bit different now...  At the risk of opening the debate 
once more and lots of lengthy discussion let me say that our experience (as one 
of the handful of commercial providers of "multi-server search engines" (MSSEs? 
- it'll never stick, but I like it)) is:

1) Times are not slow for most installations as they are set by default to 
provide incremental results in the fashion Jakub suggests ("First In, First 
Displayed"). So users see results driven by the time of the fastest Source, not 
the slowest. This means that, on average, getting the 
results from an MSSE can be faster than doing the same search on all of the 
native sites (just talking response times here, not the fact it is one search 
versus N). Do the maths - it's quite fun. 
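The maths in point 1 can be made concrete with a few invented per-source response times: in parallel, time to first display is the fastest source's response, completion is the slowest's, while running the same search on each native site in turn costs the sum.

```python
# Invented per-source response times, in milliseconds.
response_times_ms = [1200, 2500, 3100, 4800, 7000]

first_display = min(response_times_ms)  # parallel: fastest source drives first display
all_results = max(response_times_ms)    # parallel: slowest source drives completion
sequential = sum(response_times_ms)     # searching each native site in turn

print(first_display, all_results, sequential)  # 1200 7000 18600
```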

2) The average "delay" for just processing the results through modern MSSEs is 
about 0.5 sec. Add to this say another 0.2 for two extra network hops and the 
additional response time to first display is about 3/4 of a second. This is a 
time shift all the way down the set of results - most of which the user isn't 
aware of as they are beyond the first 10 on screen, and the system allows 
interaction with those 10 while the rest are getting their act together. So, 
under 1 second is added to response times which average about 5 seconds. Of 
course, waiting for all the results adds this time to the slowest results.

3) Most users seem happy to get things back faster and not worry too much about 
relevance ranking. To combat the response time issue for users who require 
ranked results, the incremental return can be set to show interfiled results as 
the later records come in and rank within the ones displayed to the user. This 
can be disconcerting, but making sure the UI doesn't lose track of the user's 
focus is helpful. Another option is to show that "new results" are available, 
and let the user manually click to get them incorporated - less intrusive, but 
an extra click!
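The interfiling in point 3 might be sketched as follows; the "score" field and the insert-by-score approach are assumptions for illustration, not a description of any particular MSSE.

```python
# Interfiling late arrivals: new records are ranked within the list
# already displayed, rather than appended at the end.
import bisect

def interfile(displayed, new_records):
    """Insert each new record into `displayed`, keeping descending score order."""
    for rec in new_records:
        # bisect expects ascending keys, so negate the descending scores.
        keys = [-r["score"] for r in displayed]
        pos = bisect.bisect_left(keys, -rec["score"])
        displayed.insert(pos, rec)
    return displayed

shown = [{"title": "A", "score": 0.9}, {"title": "B", "score": 0.5}]
late = [{"title": "C", "score": 0.7}]
print([r["title"] for r in interfile(shown, late)])  # ['A', 'C', 'B']
```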

General experience with the incremental displays shows that users are happiest 
with them when there is an obvious and clear reason for the new additions. The 
most accepted case is where the ranking criterion is price, and the user is 
always happy to see a cheaper item arrive. It really doesn't work well for 
titles sorted alphabetically - unless the user is looking for a specific title 
which should occur at the beginning of the list. And these examples illustrate 
the general point - that if the user is focused on specific items at the top of 
the list, then they are generally happy with an updating list; if they are more 
in "browse" mode, then the distraction of the updating list is just that - a 
distraction, if it is on screen. 

Overall our experience from our partner's users is that they would rather see 
things quickly than wait for relevance ranking. I suspect partly (can of worms 
coming) because the existing ranking schemes don't make a lot of difference 
(ducks quickly).

Peter

Peter Noerr
CTO, Museglobal
www.museglobal.com

> -Original Message-
> From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of
> Walker, David
> Sent: Tuesday, May 18, 2010 12:44 PM
> To: CODE4LIB@LISTSERV.ND.EDU
> Subject: Re: [CODE4LIB] OASIS SRU and CQL, access to most-current
> drafts
> 
> > in order to provide decent user experience you need to be
> > able to present some results "sooner" than others.
> 
> I would actually question whether this is really necessary.
> 
> A few years back, I did a big literature review on metasearch, as well
> as looked at a good number of usability studies that libraries did
> with metasearch systems.
> 
> One thing that stood out to me was that the literature (written by
> librarians and technologists) was very concerned about the slow search
> times of metasearch, often seeing it as a deal-breaker.
> 
> And yet, in the usability studies, actual students and faculty were far
> less concerned about the search times -- within reason, of course.
> 
> I thought the UC Santa Cruz study [1] summarized the point well: "Users
> are willing to wait as long as they think that they will get useful
> results. Their perceptions of time depend on this belief."
> 
> Trying to return the results of a metasearch quickly just for the sake
> of returning them quickly I think introduces other problems (in terms
> of relevance ranking and presentation) that do far more to negatively
> impact the user experience.  Just my opinion, of course.
> 
> --Dave
> 
> [1]
> http://www.cdlib.org/services/d2d/metasearch/docs/core_ucsc_oct2004usab
> ility.pdf
> 
> ==
> David Walker
> Library Web Services Manager
> California State University
> http://xerxes.calstate.edu
> 
> From: Code for Libraries [code4...@listserv.nd.edu] On Behalf Of Kuba
> [skoc...@gmail.com]
> Sent: Tuesday, May 18, 2010 9:57 AM
> To: CODE4LIB@LISTSERV.ND.EDU
> Subject: Re: [CODE4LIB] OASIS SRU and CQL, access to most-current
> dra

[CODE4LIB] JOB OPENING: Drupal Developer at the University of Michigan Library

2010-05-19 Thread Varnum, Ken
Apologies for cross-posting.

The University of Michigan Library -- http://www.lib.umich.edu/ -- is seeking 
an experienced Drupal developer and web designer to work on the library's 
public-facing web site, extend our services into new areas, develop a staff 
Intranet, and be part of a dynamic team of web developers working in a 
cutting-edge library.

The University Library web site provides access to resources, services, 
collections, news and staff of the University Library. The University Library 
web environment serves to facilitate and support branch and divisional library 
web sites and resources as well as to create an integrated and coherent 
Library-wide web environment responsive to the information and access needs of 
our users. The web site is built with Drupal 6, makes extensive use of Solr, 
and is integrated with the library's catalog (VuFind) and article discovery 
(Metalib) tools. It serves as a virtual extension and representation of the 
Library, is designed to be easily navigable, and provides timely and accurate 
news and information, customization options, and a sense of identity and 
coherence to the user.

This is a permanent, full-time, salaried position with full benefits including 
generous time-off (24 paid vacation days/year), a retirement plan that provides 
matching contributions with immediate vesting, many choices for comprehensive 
medical insurance, life insurance, a long-term disability option, as well as 
several other features that contribute to peace of mind.

For a detailed job posting and instructions for applying, please visit
http://bit.ly/um-drupal-job

If you have questions about the job, please contact me.



--
Ken Varnum
Web Systems Manager   E: var...@umich.edu
University of Michigan LibraryT: 734-615-3287
309 Hatcher Graduate Library  F: 734-647-6897
Ann Arbor, MI 48109-1190  http://www.lib.umich.edu/


[CODE4LIB] Job Posting: Web Developer / Designer

2010-05-19 Thread Jason Stirnaman
Web Developer / Designer
A.R. Dykes Library/Internet Development
University of Kansas Medical Center, Kansas City, KS

See the full posting at 
https://jobs.kumc.edu/applicants/jsp/shared/position/JobDetails_css.jsp?postingId=371066

Position Summary: This person will work closely with Dykes Library staff, 
faculty and other Information Resources units to raise the visibility of 
expertise, research, publications, grey literature, and collections on the KUMC 
web site. Requires working closely with librarians and library personnel to 
understand their information requirements and then seek solutions to meet 
them. Also requires working with diverse web applications and data sources to 
provide seamless user services.

Required Qualifications:
- 2 or more years of experience developing database-driven web applications
- Degree from an accredited college or university. An equivalent combination 
  of education and experience may be considered; each year of additional 
  experience may be substituted for one year of education.
- Experience programming with Ruby
- Experience working with XSLT and XPath
- Experience developing real-world applications using Ruby on Rails, Java, 
  .NET, or PHP and PostgreSQL, MySQL, SQL Server, or Oracle
- Experience working in a Unix environment
- Willingness to work in a collaborative team setting
- Excellent communication, analytical, and problem-solving skills
  


Re: [CODE4LIB] internet archive experiment -- bad metadata

2010-05-19 Thread Barnett, Jeffrey
How common is the kind of metadata mismatch* associated with this record?
http://openlibrary.org/books/OL23383343M/Cisco_Networking_Academy_Program
What is the point of contact for making corrections?

*The metadata is about Unix (2004); the book is about Ben Franklin (1908).
"Contributed by Google"

-Original Message-
From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of Eric 
Lease Morgan
Sent: Friday, May 14, 2010 2:05 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: [CODE4LIB] internet archive experiment

We are doing a tiny experiment here at Notre Dame with the Internet Archive, 
specifically, we are determining whether or not we can supplement a special 
collection with full text content.

We are hosting a site colloquially called the Catholic Portal -- a collection 
of rare, infrequently held, and uncommon materials of a Catholic nature. [1] 
Much of the content of the Portal is metadata -- MARC and EAD records/files. I 
think the Portal would be more useful if it contained full text content. If it 
did, then indexing would be improved and services against the texts could be 
implemented.

How can we get full text content? This is what we are going to try:

  1. parse out identifying information from
 metadata (author names, titles, dates,
 etc.)

  2. construct a URL in the form of an
     Advanced Search query and send it to the
 Archive

  3. get back a list of matches in an XML
 format

  4. parse the result looking for the "best"
 matches

  5. save Internet Archive keys identifying
 full text items

  6. mirror Internet Archive content locally
 using keys as pointers

  7. update local metadata files pointing to
 Archive content as well as locally
 mirrored content

  8. re-index local metadata

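Steps 2 through 5 above can be sketched in a few lines of Python. This is a 
minimal, hypothetical illustration, not the project's actual code; it assumes 
the Archive's advancedsearch.php endpoint with its JSON output option and its 
creator:/title:/mediatype: query fields.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

ENDPOINT = "https://archive.org/advancedsearch.php"

def build_query_url(author, title, rows=5):
    # Step 2: a Lucene-style query, limited to full-text items
    q = f'creator:("{author}") AND title:("{title}") AND mediatype:texts'
    params = {"q": q, "fl[]": "identifier", "rows": rows, "output": "json"}
    return ENDPOINT + "?" + urlencode(params)

def find_identifiers(author, title, rows=5):
    # Steps 3-5: fetch the result list and keep the Archive keys
    with urlopen(build_query_url(author, title, rows)) as resp:
        docs = json.load(resp)["response"]["docs"]
    return [doc["identifier"] for doc in docs]

# Example (requires network access):
# find_identifiers("Benjamin Franklin", "Autobiography")
```

The keys returned by find_identifiers() would then drive the mirroring and 
metadata-update steps (6 and 7).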
If we are (somewhat) successful, then search results would not only have 
pointers to the physical items, but they would also have pointers to the 
digitized items. Not only could they have pointers to the digitized items, but 
they could also have pointers to "services against the texts" such as make word 
cloud, display concordance, plot word/phrase frequency, etc. These latter
services are spaces where I think there is great potential for librarianship.
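
Two of those text services are simple once the full text is mirrored locally. 
A rough sketch, assuming plain-text files and using only crude tokenization 
(function names and stopword list are illustrative, not from the project):

```python
import re
from collections import Counter

def word_frequencies(text, n=10):
    # Count words, skipping a tiny stopword list; a real service
    # would use a fuller list and better tokenization
    stopwords = {"the", "a", "an", "and", "of", "to", "in", "is", "that", "it"}
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w not in stopwords).most_common(n)

def concordance(text, keyword, width=30):
    # Keyword-in-context lines: each hit with `width` characters around it
    low, key, hits, start = text.lower(), keyword.lower(), [], 0
    while (i := low.find(key, start)) != -1:
        hits.append(text[max(0, i - width):i + len(key) + width].replace("\n", " "))
        start = i + len(key)
    return hits
```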

Frankly, because of the Portal's collection policy, I don't expect to find very 
much material. On the other hand, the same process could be applied to more 
generic library collections where more content may have already been digitized. 

Wish us luck.

[1] Catholic Portal - http://www.catholicresearch.net/
[2] Advanced search - http://www.archive.org/advancedsearch.php

-- 
Eric Lease Morgan
University of Notre Dame