Re: [CODE4LIB] Auto-suggest and the id.loc.gov LCSH web service
On Mon, Dec 7, 2009 at 5:56 PM, Ed Summers e...@pobox.com wrote:
> It would be great to have some external dataset to use in ranking LCSH suggestions at id.loc.gov. But at the moment it's a simple mysql db loaded up with some MARC LCSH data. I guess it could do something smart with PageRank-like ranking of 'super-concepts' (concepts that are linked to a lot)...but that would've taken longer than 20 minutes :-)

The frequency of an LCSH term within the LC catalog could also be useful for ranking, although I'm not sure if such data would be readily available.

Another possibility would be a simple count of broader terms + narrower terms + related terms or something like that. Although PageRank would probably be better, since even some important terms might have a relatively small number of immediately-adjacent links.

Keith
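A rough sketch of the two ranking ideas mentioned above: a plain count of adjacent terms, and a PageRank-style score over the same links. It is only an illustration under stated assumptions. The concept names and links in `EDGES` are made up, broader/narrower/related links are all treated as undirected edges, and nothing here reflects how id.loc.gov actually ranks suggestions.

```python
from collections import defaultdict

# Hypothetical toy graph: each pair is a broader/narrower/related link.
EDGES = [
    ("Cheese", "Dairy products"),
    ("Cheese", "Cheddar cheese"),
    ("Cheese", "Goat cheese"),
    ("Dairy products", "Milk"),
    ("Dairy products", "Food"),
    ("Food", "Cooking"),
]

def build_neighbours(edges):
    """Map each concept to the set of concepts it is directly linked to."""
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    return neighbours

def link_counts(edges):
    """Simple score: broader + narrower + related terms, counted together."""
    return {c: len(n) for c, n in build_neighbours(edges).items()}

def pagerank(edges, damping=0.85, iterations=50):
    """Power-iteration PageRank over the undirected concept graph."""
    neighbours = build_neighbours(edges)
    nodes = sorted(neighbours)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        new_rank = {}
        for n in nodes:
            # Each neighbour passes on an equal share of its current rank.
            incoming = sum(rank[m] / len(neighbours[m]) for m in neighbours[n])
            new_rank[n] = (1 - damping) / len(nodes) + damping * incoming
        rank = new_rank
    return rank

if __name__ == "__main__":
    counts = link_counts(EDGES)
    ranks = pagerank(EDGES)
    for concept in sorted(ranks, key=ranks.get, reverse=True):
        print(f"{concept:16} links={counts[concept]} rank={ranks[concept]:.3f}")
```

The link count only sees immediate neighbours, while the PageRank score also reflects how well connected those neighbours are, which is the difference the message points at.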
Re: [CODE4LIB] Auto-suggest and the id.loc.gov LCSH web service
Quoting Keith Jenkins k...@cornell.edu:
> The frequency of an LCSH term within the LC catalog could also be useful for ranking, although I'm not sure if such data would be readily available.

Couple of things: first, what we have at id.loc.gov is NOT LCSH, but a copy of the LC subject authority file. The entries in this file form the basis for subject headings, most of which add facets to the authority entry when forming the subject heading. One could do a left-anchored match against actual headings, and that might provide some interesting statistics.

Edward Betts of the Open Library project did some casual data gathering for subjects, and posted his top 1000 subject headings (not subject authorities): http://edwardbetts.com/ol/top_1000_subjects

The OL has decided to break up the subject headings into their subfields, and somewhere there are some pages that show some subfields with the highest ranking subfields they appear with. (There must be a better way to say that! Sorry, too early, too few cups of tea.) One example is here: http://home.us.archive.org/~edward/related/Cheese.html

I think that something like this will be incorporated into the next version of OL, which will be heavily navigation-oriented rather than search-oriented.

kc

p.s. Anyone who wants to play with a file can grab the OL data export: http://openlibrary.org/dev/docs/jsondump Unfortunately it includes both LC and non-LC subjects (mainly BISAC from Amazon).

> Another possibility would be a simple count of broader terms + narrower terms + related terms or something like that. Although PageRank would probably be better, since even some important terms might have a relatively small number of immediately-adjacent links.
> Keith

--
Karen Coyle
kco...@kcoyle.net http://kcoyle.net
ph: 1-510-540-7596
m: 1-510-435-8234
skype: kcoylenet
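The left-anchored match suggested above could look something like the following sketch. It assumes headings are written with `--` between subdivisions and matches on whole components, so that "Cheesecake" is not counted under "Cheese". The sample authority terms and bibliographic headings are invented for illustration, not taken from LC or Open Library data.

```python
from collections import Counter

# Hypothetical sample data, for illustration only.
AUTHORITIES = ["Cheese", "Cheese--Varieties", "Cooking"]
HEADINGS = [
    "Cheese",
    "Cheese--Varieties",
    "Cheese--Varieties--France",
    "Cheesecake",
    "Cooking--Italy",
]

def normalize(heading):
    """Split on the '--' separator and lower-case each component."""
    return tuple(part.strip().lower() for part in heading.split("--"))

def left_anchored_counts(authority_terms, bib_headings):
    """Count the headings that begin with each authority term."""
    authorities = {normalize(term): term for term in authority_terms}
    counts = Counter({term: 0 for term in authority_terms})
    for heading in bib_headings:
        parts = normalize(heading)
        # Try every leading run of components: (a,), (a, b), (a, b, c), ...
        for i in range(1, len(parts) + 1):
            term = authorities.get(parts[:i])
            if term is not None:
                counts[term] += 1
    return counts

if __name__ == "__main__":
    for term, n in left_anchored_counts(AUTHORITIES, HEADINGS).most_common():
        print(f"{n:3} {term}")
```

Counts like these could then serve as the frequency signal for ranking suggestions, assuming a large enough set of real headings is available.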
Re: [CODE4LIB] Auto-suggest and the id.loc.gov LCSH web service
If that isn't LCSH, then is the entirety of LCSH available electronically in some capacity (at least available in some easily accessible file or files that can be processed)?

Ethan

On Tue, Dec 8, 2009 at 10:16 AM, Karen Coyle li...@kcoyle.net wrote:
> Couple of things: first, what we have at id.loc.gov is NOT LCSH, but a copy of the LC subject authority file. The entries in this file form the basis for subject headings, most of which add facets to the authority entry when forming the subject heading.
Re: [CODE4LIB] Auto-suggest and the id.loc.gov LCSH web service
On Tue, Dec 8, 2009 at 10:16 AM, Karen Coyle li...@kcoyle.net wrote:
> Couple of things: first, what we have at id.loc.gov is NOT LCSH, but a copy of the LC subject authority file. The entries in this file form the basis for subject headings, most of which add facets to the authority entry when forming the subject heading. One could do a left-anchored match against actual headings, and that might provide some interesting statistics.

Yes, using the actual headings extracted from bibliographic data seems to be a better approach. It's easier to rank them, and as Karen points out you get the actual post-coordinated headings, not just the headings LC has decided to establish authority records for.

//Ed
Re: [CODE4LIB] Auto-suggest and the id.loc.gov LCSH web service
I suppose it would be helpful to actually know the problem that is trying to be solved here (I mean, a lot of people, including myself, are throwing out solutions to a problem that's never been actually defined).

Ethan, what, exactly, are you trying to do? Do you want authorized headings? Or do you want LCSH that appears in the wild?

-Ross.

On Tue, Dec 8, 2009 at 10:35 AM, Ed Summers e...@pobox.com wrote:
> Yes, using the actual headings extracted from bibliographic data seems to be a better approach. It's easier to rank them, and as Karen points out you get the actual post-coordinated headings, not just the headings LC has decided to establish authority records for.
Re: [CODE4LIB] Auto-suggest and the id.loc.gov LCSH web service
I am going to integrate subject headings into an XForms application. I can work with the authorized headings. Before taking the XML file that LC provides and doing my own thing with Solr, I wanted to see if anyone had used the headings to do their own type of autosuggest. I was especially interested in whether anyone had done it with data provided dynamically by a service that may or may not have existed at LC.

A Solr index that I populate with data I download today becomes static. My interest is in how I (or other people developing their own autosuggest systems based on the subject headings) can pull updates of the master authority XML file into my index of terms.

Ethan

On Tue, Dec 8, 2009 at 10:42 AM, Ross Singer rossfsin...@gmail.com wrote:
> Ethan, what, exactly, are you trying to do? Do you want authorized headings? Or do you want LCSH that appears in the wild?
Re: [CODE4LIB] Auto-suggest and the id.loc.gov LCSH web service
On Tue, Dec 8, 2009 at 11:08 AM, Ethan Gruber ewg4x...@gmail.com wrote:
> my interest is how I (or other people developing their own autosuggest systems based on the subject headings) can pull updates of the master authority XML file into my index of terms.

Take a look at the Atom feed for id.loc.gov [1]. I think there's enough information there for you to pull in creates, updates and deletes. If not, it would be a fun, practical experiment to try to get it to the point where it can work for you, and hopefully others.

//Ed

[1] http://id.loc.gov/authorities/feed/
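As a minimal sketch of the feed-driven update idea above: the code below reads an Atom document and separates created/updated entries from deletions, so a local index could be kept in step. Several things are assumptions rather than facts about the id.loc.gov feed: that deletions appear as Atom tombstones (`at:deleted-entry` with `ref` and `when` attributes), that timestamps are all in the same UTC format so they can be compared as strings, and the `SAMPLE` document itself, which is made up. Fetching the feed and writing to Solr are left out.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"
# Assumed: deletions are published as Atom tombstones in this namespace.
TOMB = "{http://purl.org/atompub/tombstones/1.0}"

# Hypothetical feed content, for illustration only.
SAMPLE = """<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:at="http://purl.org/atompub/tombstones/1.0">
  <title>hypothetical sample feed</title>
  <entry>
    <id>urn:example:1</id>
    <title>Cheese</title>
    <updated>2009-12-08T10:00:00Z</updated>
  </entry>
  <entry>
    <id>urn:example:2</id>
    <title>Cooking</title>
    <updated>2009-12-01T10:00:00Z</updated>
  </entry>
  <at:deleted-entry ref="urn:example:3" when="2009-12-08T09:00:00Z"/>
</feed>"""

def read_changes(feed_xml, since=""):
    """Return (upserts, deletes) for feed items newer than `since`.

    Timestamps are compared as strings, which only works when they all
    share one format and time zone (assumed here).
    """
    root = ET.fromstring(feed_xml)
    upserts, deletes = [], []
    for entry in root.findall(ATOM + "entry"):
        updated = entry.findtext(ATOM + "updated", default="")
        if updated > since:
            upserts.append({
                "id": entry.findtext(ATOM + "id"),
                "title": entry.findtext(ATOM + "title"),
                "updated": updated,
            })
    for gone in root.findall(TOMB + "deleted-entry"):
        if gone.get("when", "") > since:
            deletes.append(gone.get("ref"))
    return upserts, deletes

if __name__ == "__main__":
    upserts, deletes = read_changes(SAMPLE, since="2009-12-05T00:00:00Z")
    print("add or update:", [u["title"] for u in upserts])
    print("delete:", deletes)
```

An indexer would remember the newest timestamp it has seen, pass it as `since` on the next poll, and apply the upserts and deletes to its own index of terms.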
Re: [CODE4LIB] Auto-suggest and the id.loc.gov LCSH web service
Thanks a lot, Ed! I think that could work.

On Tue, Dec 8, 2009 at 11:23 AM, Ed Summers e...@pobox.com wrote:
> Take a look at the Atom feed for id.loc.gov [1]. I think there's enough information there for you to pull in creates, updates and deletes.
>
> [1] http://id.loc.gov/authorities/feed/
[CODE4LIB] Position Announcement: Software Developer - Equinox Software, Inc., Norcross, GA, USA
** Apologies for the cross-posting **

Software Developer
11/13/2009

Equinox Software Inc. (The Evergreen Experts) seeks highly motivated, experienced Software Developers to contribute to our dynamic, fast-growing open source support and development company.

About Equinox Software Inc.

Founded by the original designers and developers, Equinox Software boasts a growing team of skilled developers and professionals who provide comprehensive services for Evergreen, the enterprise-grade, open source Integrated Library System (ILS). Evergreen provides back end services to libraries and library consortia. Visit http://www.esilibrary.com for more company information or http://www.evergreen-ils.org to learn more about Evergreen. Equinox is in Norcross, GA, conveniently located just 20 miles northeast of metro-Atlanta.

Skills We Are Looking For:
* Experience with Perl, C, Python, and Javascript.
* Familiarity with public and/or academic library operations and standards a plus.
* Familiarity with the Evergreen ILS and the open source culture a plus.

What We Have to Offer:
* Competitive salary based upon experience.
* Full company-paid medical, dental, and vision insurance; paid sick and vacation time; and a 401k plan with a matching company contribution.
* A challenging environment with opportunities to expand and improve your skill sets.
* A humane work environment staffed by dedicated professionals who share your values for excellence and customer service.

Applications will be accepted until the positions are filled. Please send resume or c.v. with cover letter, three references, and compensation requirements to care...@esilibrary.com with the subject line Software Developer.

--
Jason Etheridge
| VP, Tactical Development
| Equinox Software, Inc. / The Evergreen Experts
| phone: 1-877-OPEN-ILS (673-6457)
| email: ja...@esilibrary.com
| web: http://www.esilibrary.com

Please join us for the Evergreen 2010 International Conference, April 20-23, 2010 at the Amway Grand Hotel and Convention Center, Grand Rapids, Michigan. http://www.evergreen2010.org/