Tim, I think this is a fantastic idea and the only suggestion I would
make is to make sure you get on the Open Library developers list (I'm
looking for the URL... I'll email when I find it unless someone else
beats me to it) and discuss this there. (You may already have done
this, I don't know.) They may be interested in hosting such a
project, and of course it would be helpful to have their knowledge of
the collections and APIs on call. They seem to be keen on involving
developers from outside the Internet Archive's staff, and this seems
like a perfect opportunity.

I would be very interested in helping you test such a service,
though, and I would definitely put links into our library catalogue.


Elizabeth (Bess) Sadler
Research and Development Librarian
Digital Scholarship Services
Box 400129
Alderman Library
University of Virginia
Charlottesville, VA 22904

(434) 243-2305

On Mar 6, 2008, at 8:47 AM, Tim Shearer wrote:

Howdy folks,

I've been playing and thinking.  I'd like to have what amounts to an
identifier index to oca digitized texts.  I want to be able to pull all
the records that have oclc numbers, issns, isbns, etc.  I want it to be
lightweight, fast, searchable.

Would anyone else want/use such a thing?

I'm thinking about building something like this.

If I do, it would be ideal if it weren't a duplication of effort, so has
anyone got this in the works?  And would it meet the needs of others?

My basic notion is to crawl the site (starting with "americana", the
American Libraries), pull the oca unique identifier (e.g.
northcarolinayea1910rale), and associate it with:

unique identifiers (oclc numbers, issns, isbns, lc numbers)
contributing institution's alias and unique catalog identifier
upload date
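
One way the associations above might be laid out in sqlite is sketched
here. This is only an illustration under assumptions: the table names,
column names, and the helper functions open_index and add_item are all
hypothetical, not part of any existing service.

```python
import sqlite3

# Hypothetical schema: every table and column name here is an assumption
# made for illustration only.
SCHEMA = """
CREATE TABLE IF NOT EXISTS item (
    oca_id      TEXT PRIMARY KEY,  -- e.g. northcarolinayea1910rale
    contributor TEXT,              -- contributing institution's alias
    catalog_id  TEXT,              -- contributor's unique catalog identifier
    upload_date TEXT               -- ISO 8601 date string, e.g. 2008-03-06
);
CREATE TABLE IF NOT EXISTS identifier (
    oca_id TEXT NOT NULL REFERENCES item(oca_id),
    scheme TEXT NOT NULL CHECK (scheme IN ('oclc', 'issn', 'isbn', 'lccn')),
    value  TEXT NOT NULL,
    PRIMARY KEY (oca_id, scheme, value)
);
CREATE INDEX IF NOT EXISTS identifier_lookup ON identifier (scheme, value);
"""


def open_index(path=":memory:"):
    """Open (or create) the index database and make sure the tables exist."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def add_item(conn, oca_id, contributor, catalog_id, upload_date, identifiers):
    """Store one crawled item; identifiers is a list of (scheme, value) pairs."""
    with conn:  # commit on success, roll back on error
        conn.execute(
            "INSERT OR REPLACE INTO item VALUES (?, ?, ?, ?)",
            (oca_id, contributor, catalog_id, upload_date),
        )
        # Replace any identifiers recorded by an earlier crawl of this item.
        conn.execute("DELETE FROM identifier WHERE oca_id = ?", (oca_id,))
        conn.executemany(
            "INSERT INTO identifier VALUES (?, ?, ?)",
            [(oca_id, scheme, value) for scheme, value in identifiers],
        )
```

Keeping identifiers in their own table lets one item carry several
numbers of the same kind, and the (scheme, value) index keeps lookups by
a single number cheap.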

That's all I was thinking of.  Then there's what you might be able to
do with it:

        Give me all the oca unique identifiers that have oclc numbers
        Give me all the oca unique identifiers with isbns that were
                uploaded between x and y date
        Give me the oca unique identifier for this oclc number
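
The three requests above could be expressed as sqlite queries roughly as
follows. This is a sketch only: the single flat table identifier_index,
its columns, the function names, and the sample rows are all made up for
illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical flat table: one row per (item, identifier) pair.
conn.execute(
    "CREATE TABLE identifier_index ("
    "oca_id TEXT, scheme TEXT, value TEXT, upload_date TEXT)"
)
# Made-up sample rows; the numbers are placeholders, not real identifiers.
conn.executemany(
    "INSERT INTO identifier_index VALUES (?, ?, ?, ?)",
    [
        ("northcarolinayea1910rale", "oclc", "0000001", "2008-01-15"),
        ("exampleitem0002", "isbn", "0000000000", "2008-02-20"),
        ("exampleitem0003", "isbn", "0000000001", "2007-11-02"),
    ],
)


def ids_with_scheme(conn, scheme):
    """All oca identifiers that have at least one number of this kind."""
    cur = conn.execute(
        "SELECT DISTINCT oca_id FROM identifier_index "
        "WHERE scheme = ? ORDER BY oca_id",
        (scheme,),
    )
    return [row[0] for row in cur]


def ids_with_scheme_between(conn, scheme, start, end):
    """Same, restricted to items uploaded between two ISO dates (inclusive)."""
    cur = conn.execute(
        "SELECT DISTINCT oca_id FROM identifier_index "
        "WHERE scheme = ? AND upload_date BETWEEN ? AND ? ORDER BY oca_id",
        (scheme, start, end),
    )
    return [row[0] for row in cur]


def ids_for_number(conn, scheme, value):
    """The oca identifier(s) matching one specific number."""
    cur = conn.execute(
        "SELECT DISTINCT oca_id FROM identifier_index "
        "WHERE scheme = ? AND value = ? ORDER BY oca_id",
        (scheme, value),
    )
    return [row[0] for row in cur]
```

Storing dates as ISO 8601 strings means the BETWEEN comparison sorts
correctly as plain text, so no date parsing is needed in the query.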

Planning to do:

        keep crawling it and keep it up to date.

Things I wasn't planning to do:

        worry about other unique ids (you'd have to go to xISBN or
                ThingISBN yourself)
        worry about storing anything else from oca.

It would be good for being able to add an 856 to matches in your
catalog.  It would not be good for grabbing all marc records for all of
oca.
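
For the 856 case, a matched oca identifier might be turned into a field
along these lines. This is a rough sketch with assumptions: the function
name make_856, the base URL, the indicator choice (4 for HTTP, 1 for a
version of the resource), and the $z note text are all illustrative and
would need checking against local cataloguing practice.

```python
from urllib.parse import quote

# Assumed URL pattern for an item's landing page; adjust if it differs.
DEFAULT_BASE_URL = "http://www.archive.org/details/"


def make_856(oca_id, base_url=DEFAULT_BASE_URL, note="Digitized copy"):
    """Return a display-form 856 field linking to the digitized text.

    Indicators "41" are assumed here: first indicator 4 (HTTP access),
    second indicator 1 (an electronic version of the described resource).
    """
    url = base_url + quote(oca_id)  # escape anything unsafe in a URL path
    return f"856 41 $u {url} $z {note}"


print(make_856("northcarolinayea1910rale"))
```

The output is a human-readable rendering only; loading it into a catalog
would mean building a real MARC field with whatever library or tool the
local system uses.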

Anyhow, is this duplication of effort?  Would you like something like
this?  What else would you like it to do (keeping in mind this is an
unfunded pet project)?  How would you want to talk to it?  I was
thinking of a web service, but hadn't thought too much about how to
query it or how I'd deliver results.

Of course I'm being an idiot and trying out new tools at the same time
(python to see what the buzz is all about, sqlite just to learn it (it
may not work

Thoughts?  Vicious criticism?

