There's no reason to screen scrape the results.

The label service permits the use of the "Accept" header.  For example:

curl -i -L -H "Accept: application/rdf+xml" http://id.loc.gov/authorities/label/orchids

Take note of the initial set of response headers:

HTTP/1.1 302 FOUND
Location: http://id.loc.gov/authorities/subjects/sh85095334
X-URI: http://id.loc.gov/authorities/subjects/sh85095334
X-PrefLabel: Orchids
Cache-Control: public, max-age=1209600
Content-Length: 0
Date: Sat, 29 Jul 2017 12:41:00 GMT
Server: Apache
X-Varnish: 95467183 53781367
Age: 2343793
Via: 1.1 varnish-v4
X-Cache: HIT
X-Cache-Hits: 24
Connection: keep-alive

If you prefer, you can issue only a HEAD request against the label service and read the X-URI and X-PrefLabel headers to get the information you need. NB: the service works on a more or less exact match; drop the 's' from 'orchids' and you'll get an entirely different result.
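
Here is a rough Python sketch of that HEAD approach, using only the standard library. The function names (build_label_url, extract_match, lookup_label) are my own inventions for illustration, and I'm assuming the service still answers with the 302 and X-URI / X-PrefLabel headers shown above; treat it as a starting point, not a tested client.

```python
import urllib.error
import urllib.parse
import urllib.request

# Base URL taken from the curl example above.
LABEL_BASE = "http://id.loc.gov/authorities/label/"


class _NoRedirect(urllib.request.HTTPRedirectHandler):
    """Do not follow the 302, so the first response's headers can be read."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None  # urllib then raises HTTPError for the 3xx response


def build_label_url(label):
    """Percent-encode the label and append it to the label service URL."""
    return LABEL_BASE + urllib.parse.quote(label, safe="")


def extract_match(headers):
    """Return the URI and preferred label from the headers, or None if absent."""
    uri = headers.get("X-URI")
    if uri is None:
        return None
    return {"uri": uri, "pref_label": headers.get("X-PrefLabel")}


def lookup_label(label, timeout=10):
    """Send a HEAD request for the label (hypothetical helper, needs network)."""
    opener = urllib.request.build_opener(_NoRedirect)
    request = urllib.request.Request(build_label_url(label), method="HEAD")
    try:
        headers = opener.open(request, timeout=timeout).headers
    except urllib.error.HTTPError as err:
        # The unfollowed 302 (and a 404 for an unknown label) both land here.
        headers = err.headers
    return extract_match(headers)
```

Calling lookup_label("orchids") should then give you a small dict with the URI and preferred label, or None when the label does not match anything.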

You can also work with the search results (rather than the label service) programmatically. See "Supported Search serialization formats" here: http://id.loc.gov/techcenter/serializations.html There is one XML-based option and a JSON one.
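
For the JSON option, something along these lines might work. Note the assumptions: the /search/ path and the q and format=json query parameters are from my memory of the serializations page, not from anything quoted in this thread, so check them against that page first. The function names are hypothetical, and I have deliberately not guessed at the layout of the returned JSON.

```python
import json
import urllib.parse
import urllib.request

# Assumed search endpoint; verify against the serializations page.
SEARCH_BASE = "http://id.loc.gov/search/"


def build_search_url(query):
    """Build a search URL asking for JSON (parameter names are assumptions)."""
    params = urllib.parse.urlencode({"q": query, "format": "json"})
    return SEARCH_BASE + "?" + params


def search(query, timeout=10):
    """Fetch and decode the search results (hypothetical helper, needs network)."""
    with urllib.request.urlopen(build_search_url(query), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Inspect what search("orchids") returns before writing anything that depends on its structure.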

Yours,
Kevin



On 8/25/17 10:39, Josh Welker wrote:
Thanks, Nathan. That looks like it will work if I do it manually, but there
is no interface for doing it programmatically. Is LC okay with me screen
scraping the search results?

Joshua Welker
Information Technology Librarian
James C. Kirkpatrick Library
University of Central Missouri
Warrensburg, MO 64093
JCKL 2260
660.543.8022


On Fri, Aug 25, 2017 at 10:18 AM, Trail, Nate <n...@loc.gov> wrote:

You can try our "label" service. See under "known label retrieval" here:
http://id.loc.gov/techcenter/searching.html
I would be glad to help further.

Thanks, Nate

-----------------------------------------
Nate Trail
Network Development & MARC Standards Office
LS/ABA/NDMSO
LA308, Mail Stop 4402
Library of Congress
Washington DC 20540




-----Original Message-----
From: Code for Libraries [mailto:CODE4LIB@LISTS.CLIR.ORG] On Behalf Of
Josh Welker
Sent: Friday, August 25, 2017 11:12 AM
To: CODE4LIB@LISTS.CLIR.ORG
Subject: [CODE4LIB] Searching LC Name Authority file programmatically

I have sort of inherited authority control recently at my library, and I
want to find some way to automate some common workflows. I am looking for
an easy way to query blind name references against the LC Name Authority
master file. There is no API for searching it on the web, and the name file
itself is 10+ GB and hard to work with.

Here are options as I see them:


    - Screen scrape the search engine at id.loc.gov.
    - Load the 10+ GB name file into a local database to query
    programmatically.

Does anyone have experience with either method? Does some other method
exist I am not aware of?

Joshua Welker
Information Technology Librarian
James C. Kirkpatrick Library
University of Central Missouri
Warrensburg, MO 64093
JCKL 2260
660.543.8022
