Re: [CODE4LIB] Auto-suggest and the id.loc.gov LCSH web service

2009-12-08 Thread Keith Jenkins
On Mon, Dec 7, 2009 at 5:56 PM, Ed Summers e...@pobox.com wrote:
 It would be great to have some external dataset to use in
 ranking LCSH suggestions at id.loc.gov. But at the moment it's a
 simple mysql db loaded up with some MARC LCSH data. I guess it could
 do something smart with PageRank-like ranking of 'super-concepts'
 (concepts that are linked to a lot)...but that would've taken longer
 than 20 minutes :-)

The frequency of an LCSH term within the LC catalog could also be
useful for ranking, although I'm not sure if such data would be
readily available.

Another possibility would be a simple count of broader terms +
narrower terms + related terms or something like that.  Although
PageRank would probably be better, since even some important terms
might have a relatively small number of immediately-adjacent links.
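
A minimal sketch of the two options in Python, to make the difference concrete. The toy concept graph (links pointing from narrower to broader terms) and the function names are invented for illustration; nothing here reflects how id.loc.gov stores or ranks its data.

```python
from collections import defaultdict


def link_count(graph):
    """Score each concept by how many distinct concepts it is directly linked to."""
    neighbours = defaultdict(set)
    for src, targets in graph.items():
        neighbours[src]  # make sure isolated concepts still get a score of 0
        for dst in targets:
            neighbours[src].add(dst)
            neighbours[dst].add(src)
    return {concept: len(linked) for concept, linked in neighbours.items()}


def pagerank(graph, damping=0.85, iterations=50):
    """Plain power-iteration PageRank over a dict of outgoing links."""
    nodes = set(graph) | {dst for targets in graph.values() for dst in targets}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for src in nodes:
            targets = graph.get(src, [])
            if targets:
                share = damping * rank[src] / len(targets)
                for dst in targets:
                    new_rank[dst] += share
            else:
                # A concept with no outgoing links spreads its rank evenly.
                share = damping * rank[src] / n
                for node in nodes:
                    new_rank[node] += share
        rank = new_rank
    return rank


# Hypothetical data: each concept links to its broader term.
graph = {
    "Cheddar cheese": ["Cheese"],
    "Goat cheese": ["Cheese"],
    "Cheese": ["Dairy products"],
    "Butter": ["Dairy products"],
    "Dairy products": ["Food"],
    "Food": [],
}

counts = link_count(graph)
ranks = pagerank(graph)
for concept in sorted(ranks, key=ranks.get, reverse=True):
    print(f"{concept:16} links={counts[concept]} pagerank={ranks[concept]:.3f}")
```

In this toy graph "Food" has only one immediately adjacent link, so a simple count ranks it below "Cheese", while PageRank puts it first because rank flows up to it from everything beneath it.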

Keith


Re: [CODE4LIB] Auto-suggest and the id.loc.gov LCSH web service

2009-12-08 Thread Karen Coyle

Quoting Keith Jenkins k...@cornell.edu:



 The frequency of an LCSH term within the LC catalog could also be
 useful for ranking, although I'm not sure if such data would be
 readily available.


Couple of things: first, what we have at id.loc.gov is NOT LCSH, but a  
copy of the LC subject authority file. The entries in this file form  
the basis for subject headings, most of which add facets to the  
authority entry when forming the subject heading. One could do a  
left-anchored match against actual headings, and that might provide  
some interesting statistics.
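
A left-anchored match of this kind could be sketched in Python as follows. The sample headings are made up, and the assumption that subdivisions are separated by "--" is only one common way of writing headings out as strings; real bibliographic data would need its own normalization.

```python
from collections import Counter


def normalize(heading):
    """Lowercase and standardize spacing around the '--' subdivision separator."""
    return " ".join(heading.lower().replace("--", " -- ").split())


def left_anchored_counts(authority_entries, bib_headings):
    """Count, for each authority entry, the headings that begin with it.

    A heading is credited to the longest authority entry that matches its
    leading subdivisions, so "Cheesecake" never counts towards "Cheese".
    """
    entries = {normalize(entry): entry for entry in authority_entries}
    counts = Counter({entry: 0 for entry in authority_entries})
    for heading in bib_headings:
        parts = normalize(heading).split(" -- ")
        for i in range(len(parts), 0, -1):
            prefix = " -- ".join(parts[:i])
            if prefix in entries:
                counts[entries[prefix]] += 1
                break
    return counts


# Hypothetical sample data.
authority_entries = ["Cheese", "Cheese -- Varieties", "Cooking"]
bib_headings = [
    "Cheese -- France -- History",
    "Cheese--Varieties",
    "Cheese",
    "Cheesecake (Cooking)",
]

usage = left_anchored_counts(authority_entries, bib_headings)
for entry, count in usage.most_common():
    print(count, entry)
```

Matching on whole subdivisions rather than raw string prefixes is a design choice of this sketch; a plain `startswith` test would be simpler but would over-count.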


Edward Betts of the Open Library project did some casual data  
gathering for subjects, and posted his top 1000 subject headings  
(not subject authorities):

http://edwardbetts.com/ol/top_1000_subjects
The OL has decided to break up the subject headings into their  
subfields, and somewhere there are some pages that show some subfields  
with the highest ranking subfields they appear with. (There must be a  
better way to say that! Sorry, too early, too few cups of tea.) One  
example is here:

http://home.us.archive.org/~edward/related/Cheese.html
I think that something like this will be incorporated into the next  
version of OL, which will be heavily navigation-oriented rather than  
search-oriented.


kc
p.s. Anyone who wants to play with a file can grab the OL data export:

http://openlibrary.org/dev/docs/jsondump

Unfortunately it includes both LC and non-LC subjects (mainly BISAC  
from Amazon)




 Another possibility would be a simple count of broader terms +
 narrower terms + related terms or something like that.  Although
 PageRank would probably be better, since even some important terms
 might have a relatively small number of immediately-adjacent links.

 Keith



--
Karen Coyle
kco...@kcoyle.net http://kcoyle.net
ph: 1-510-540-7596
m: 1-510-435-8234
skype: kcoylenet


Re: [CODE4LIB] Auto-suggest and the id.loc.gov LCSH web service

2009-12-08 Thread Ethan Gruber
If that isn't LCSH, then is the entirety of LCSH available electronically in
some capacity (at least available in some easily accessible file or files
that can be processed)?

Ethan

On Tue, Dec 8, 2009 at 10:16 AM, Karen Coyle li...@kcoyle.net wrote:

 Quoting Keith Jenkins k...@cornell.edu:


 The frequency of an LCSH term within the LC catalog could also be
 useful for ranking, although I'm not sure if such data would be
 readily available.


 Couple of things: first, what we have at id.loc.gov is NOT LCSH, but a
 copy of the LC subject authority file. The entries in this file form the
 basis for subject headings, most of which add facets to the authority
 entry when forming the subject heading. One could do a left-anchored match
 against actual headings, and that might provide some interesting statistics.

 Edward Betts of the Open Library project did some casual data gathering for
 subjects, and posted his top 1000 subject headings (not subject
 authorities):
 http://edwardbetts.com/ol/top_1000_subjects
 The OL has decided to break up the subject headings into their subfields,
 and somewhere there are some pages that show some subfields with the highest
 ranking subfields they appear with. (There must be a better way to say that!
 Sorry, too early, too few cups of tea.) One example is here:
 http://home.us.archive.org/~edward/related/Cheese.html
 I think that something like this will be incorporated into the next version
 of OL, which will be heavily navigation-oriented rather than
 search-oriented.

 kc
 p.s. Anyone who wants to play with a file can grab the OL data export:

 http://openlibrary.org/dev/docs/jsondump

 Unfortunately it includes both LC and non-LC subjects (mainly BISAC from
 Amazon)



 Another possibility would be a simple count of broader terms +
 narrower terms + related terms or something like that.  Although
 PageRank would probably be better, since even some important terms
 might have a relatively small number of immediately-adjacent links.

 Keith


 --
 Karen Coyle
 kco...@kcoyle.net http://kcoyle.net
 ph: 1-510-540-7596
 m: 1-510-435-8234
 skype: kcoylenet



Re: [CODE4LIB] Auto-suggest and the id.loc.gov LCSH web service

2009-12-08 Thread Ed Summers
On Tue, Dec 8, 2009 at 10:16 AM, Karen Coyle li...@kcoyle.net wrote:
 Couple of things: first, what we have at id.loc.gov is NOT LCSH, but a copy
 of the LC subject authority file. The entries in this file form the basis
 for subject headings, most of which add facets to the authority entry when
 forming the subject heading. One could do a left-anchored match against
 actual headings, and that might provide some interesting statistics.

Yes, using the actual headings extracted from bibliographic data seems
to be a better approach. It's easier to rank them, and as Karen points
out you get the actual post-coordinated headings, not just the
headings LC has decided to establish authority records for.
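
As a rough illustration of ranking suggestions by how often a heading actually occurs, here is a small Python sketch. The class name, the sample headings and the tie-breaking rule are all assumptions made for the example, not a description of any existing service.

```python
import bisect
from collections import Counter


class HeadingSuggester:
    """Prefix auto-suggest over headings, most frequently used first."""

    def __init__(self, headings):
        # One occurrence per bibliographic record that uses the heading.
        self.freq = Counter(h.strip() for h in headings if h.strip())
        # Sorted (lowercased key, original heading) pairs for binary search.
        self.keys = sorted((h.lower(), h) for h in self.freq)

    def suggest(self, prefix, limit=5):
        p = prefix.lower()
        start = bisect.bisect_left(self.keys, (p,))
        matches = []
        for key, heading in self.keys[start:]:
            if not key.startswith(p):
                break
            matches.append(heading)
        # Highest frequency first; alphabetical order breaks ties.
        matches.sort(key=lambda h: (-self.freq[h], h))
        return matches[:limit]


# Hypothetical headings extracted from bibliographic records.
suggester = HeadingSuggester([
    "Cheese", "Cheese",
    "Cheese -- Varieties",
    "Cheese -- France", "Cheese -- France", "Cheese -- France",
    "Chess",
    "Cooking",
])

print(suggester.suggest("chee"))
```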

//Ed


Re: [CODE4LIB] Auto-suggest and the id.loc.gov LCSH web service

2009-12-08 Thread Ross Singer
I suppose it would be helpful to know what problem we are actually
trying to solve here (I mean, a lot of people, including myself,
are throwing out solutions to a problem that has never actually been
defined).

Ethan, what, exactly, are you trying to do?  Do you want authorized
headings?  Or do you want LCSH that appears in the wild?

-Ross.

On Tue, Dec 8, 2009 at 10:35 AM, Ed Summers e...@pobox.com wrote:
 On Tue, Dec 8, 2009 at 10:16 AM, Karen Coyle li...@kcoyle.net wrote:
 Couple of things: first, what we have at id.loc.gov is NOT LCSH, but a copy
 of the LC subject authority file. The entries in this file form the basis
 for subject headings, most of which add facets to the authority entry when
 forming the subject heading. One could do a left-anchored match against
 actual headings, and that might provide some interesting statistics.

 Yes, using the actual headings extracted from bibliographic data seems
 to be a better approach. It's easier to rank them, and as Karen points
 out you get the actual post-coordinated headings, not just the
 headings LC has decided to establish authority records for.

 //Ed



Re: [CODE4LIB] Auto-suggest and the id.loc.gov LCSH web service

2009-12-08 Thread Ethan Gruber
I am going to integrate subject headings into an XForms application.  I can
work with the authorized headings.  Before taking the XML file that LC
provides and doing my own thing with Solr, I wanted to see if anyone had
used the headings to build their own kind of autosuggest.  I was especially
interested in whether anyone had done it with data provided dynamically by a
service that may or may not exist at LC.  A Solr index that I populate with
data I download today becomes static; my interest is in how I (or other people
developing their own autosuggest systems based on the subject headings) can
pull updates of the master authority XML file into my index of terms.

Ethan

On Tue, Dec 8, 2009 at 10:42 AM, Ross Singer rossfsin...@gmail.com wrote:

 I suppose it would be helpful to actually know the problem that is
 trying to be solved here (I mean, a lot of people, including myself,
 are throwing out solutions to a problem that's never been actually
 defined).

 Ethan, what, exactly, are you trying to do?  Do you want authorized
 headings?  Or do you want LCSH that appears in the wild?

 -Ross.

 On Tue, Dec 8, 2009 at 10:35 AM, Ed Summers e...@pobox.com wrote:
  On Tue, Dec 8, 2009 at 10:16 AM, Karen Coyle li...@kcoyle.net wrote:
  Couple of things: first, what we have at id.loc.gov is NOT LCSH, but a copy
  of the LC subject authority file. The entries in this file form the basis
  for subject headings, most of which add facets to the authority entry when
  forming the subject heading. One could do a left-anchored match against
  actual headings, and that might provide some interesting statistics.
 
  Yes, using the actual headings extracted from bibliographic data seems
  to be a better approach. It's easier to rank them, and as Karen points
  out you get the actual post-coordinated headings, not just the
  headings LC has decided to establish authority records for.
 
  //Ed
 



Re: [CODE4LIB] Auto-suggest and the id.loc.gov LCSH web service

2009-12-08 Thread Ed Summers
On Tue, Dec 8, 2009 at 11:08 AM, Ethan Gruber ewg4x...@gmail.com wrote:
 my interest is how I (or other people
 developing their own autosuggest systems based on the subject headings) can
 pull updates of the master authority XML file into my index of terms.

Take a look at the Atom feed for id.loc.gov [1]. I think there's
enough information there for you to pull in creates, updates and
deletes. If not, it would be a fun, practical experiment to try to get
it to the point where it can work for you, and hopefully others.
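
A sketch of what consuming such a feed might look like in Python, using only the standard library. The sample feed below is invented, and two things are assumptions rather than facts about the id.loc.gov feed: that deletions are announced with Atom tombstone elements (`at:deleted-entry`, RFC 6721), and that all timestamps are UTC strings in the same format, so plain string comparison orders them correctly.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"
# Assumption: deletions arrive as Atom tombstones (RFC 6721).
TOMBSTONE = "{http://purl.org/atompub/tombstones/1.0}"

# Hypothetical feed content, not copied from id.loc.gov.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:at="http://purl.org/atompub/tombstones/1.0">
  <title>Example authority feed</title>
  <entry>
    <id>urn:example:concept-1</id>
    <title>Cheese</title>
    <updated>2009-12-08T10:00:00Z</updated>
  </entry>
  <entry>
    <id>urn:example:concept-2</id>
    <title>Cheese -- Varieties</title>
    <updated>2009-12-08T11:00:00Z</updated>
  </entry>
  <at:deleted-entry ref="urn:example:concept-3" when="2009-12-08T09:00:00Z"/>
</feed>"""


def apply_feed(index, feed_xml):
    """Apply creates, updates and deletes from an Atom feed to a local index.

    `index` maps an entry id to {"label": ..., "updated": ...} and is
    modified in place. Returns a count of each kind of change.
    """
    root = ET.fromstring(feed_xml)
    stats = {"created": 0, "updated": 0, "deleted": 0}
    for entry in root.findall(ATOM + "entry"):
        entry_id = entry.findtext(ATOM + "id")
        record = {
            "label": entry.findtext(ATOM + "title"),
            "updated": entry.findtext(ATOM + "updated"),
        }
        current = index.get(entry_id)
        if current is None:
            index[entry_id] = record
            stats["created"] += 1
        elif record["updated"] > current["updated"]:
            # String comparison; relies on the same-format UTC assumption.
            index[entry_id] = record
            stats["updated"] += 1
    for gone in root.findall(TOMBSTONE + "deleted-entry"):
        if index.pop(gone.get("ref"), None) is not None:
            stats["deleted"] += 1
    return stats


# A stand-in for the local index of terms (for example, what was loaded into Solr).
index = {
    "urn:example:concept-2": {"label": "Cheese varieties", "updated": "2009-12-01T00:00:00Z"},
    "urn:example:concept-3": {"label": "Obsolete heading", "updated": "2009-11-01T00:00:00Z"},
}
stats = apply_feed(index, SAMPLE_FEED)
print(stats)
```

In practice the feed text would be fetched over HTTP (for instance with `urllib.request.urlopen`) on a schedule, and each change pushed on to the search index rather than a dict.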

//Ed

[1] http://id.loc.gov/authorities/feed/


Re: [CODE4LIB] Auto-suggest and the id.loc.gov LCSH web service

2009-12-08 Thread Ethan Gruber
Thanks a lot, Ed!

I think that could work.

On Tue, Dec 8, 2009 at 11:23 AM, Ed Summers e...@pobox.com wrote:

 On Tue, Dec 8, 2009 at 11:08 AM, Ethan Gruber ewg4x...@gmail.com wrote:
  my interest is how I (or other people
  developing their own autosuggest systems based on the subject headings) can
  pull updates of the master authority XML file into my index of terms.

 Take a look at the Atom feed for id.loc.gov [1]. I think there's
 enough information there for you to pull in creates, updates and
 deletes. If not, it would be a fun, practical experiment to try to get
 it to the point where it can work for you, and hopefully others.

 //Ed

 [1] http://id.loc.gov/authorities/feed/



[CODE4LIB] Position Announcement: Software Developer - Equinox Software, Inc., Norcross, GA, USA

2009-12-08 Thread Jason Etheridge
** Apologies for the cross-posting **

Software Developer 11/13/2009

Equinox Software Inc. (The Evergreen Experts) seeks highly
motivated, experienced Software Developers to contribute to our
dynamic, fast-growing open source support and development company.

About Equinox Software Inc.

Founded by the original designers and developers, Equinox Software
boasts a growing team of skilled developers and professionals who
provide comprehensive services for Evergreen, the enterprise-grade,
open source Integrated Library System (ILS). Evergreen provides back
end services to libraries and library consortia. Visit
http://www.esilibrary.com for more company information or
http://www.evergreen-ils.org to learn more about Evergreen.

Equinox is in Norcross, GA, conveniently located just 20 miles
northeast of metro-Atlanta.

Skills We Are Looking For:

   * Experience with Perl, C, Python, and JavaScript.
   * Familiarity with public and/or academic library operations and
standards a plus.
   * Familiarity with the Evergreen ILS and the open source culture a plus.

What We Have to Offer:

   * Competitive salary based upon experience.
   * Full company-paid medical, dental, and vision insurance; paid
sick and vacation time; and a 401k plan with a matching company
contribution.
   * A challenging environment with opportunities to expand and
improve your skill sets.
   * A humane work environment staffed by dedicated professionals who
share your values for excellence and customer service.

Applications will be accepted until the positions are filled. Please
send resume or c.v. with cover letter, three references, and
compensation requirements to care...@esilibrary.com with the subject
line "Software Developer".

--
Jason Etheridge
 | VP, Tactical Development
 | Equinox Software, Inc. / The Evergreen Experts
 | phone:  1-877-OPEN-ILS (673-6457)
 | email:  ja...@esilibrary.com
 | web:  http://www.esilibrary.com

Please join us for the Evergreen 2010 International Conference, April 20-23,
2010 at the Amway Grand Hotel and Convention Center, Grand Rapids, Michigan.
http://www.evergreen2010.org/