Re: [CODE4LIB] Voting update

2008-02-26 Thread Ross Singer
Your votes can now be changed.

They actually could be changed before, unless you changed your vote to zero.

Let me know if you see any new problems (like you didn't vote before
and all of a sudden your computer explodes when you press "submit").

-Ross.

On Tue, Feb 26, 2008 at 5:41 PM, Michael J. Giarlo
<[EMAIL PROTECTED]> wrote:
> Hey folks,
>
>  A short voting update:  Ross is working on the voting system and will
>  announce a fix later on.  Your votes are safe -- insofar as we trust
>  Ross -- but there is an issue with folks not being able to change
>  their ratings.
>
>  And here are the results as they stand:
>
>  178  Brown University/Providence, Rhode Island
>  166 Columbus, OH
>  113 Southeast Florida
>  69  Bahía Blanca, Argentina
>
>  Voting ends at 11:59PM PDT tomorrow, so let your voice be heard.
>
>  -Mike
>


[CODE4LIB] Voting update

2008-02-26 Thread Michael J. Giarlo
Hey folks,

A short voting update:  Ross is working on the voting system and will
announce a fix later on.  Your votes are safe -- insofar as we trust
Ross -- but there is an issue with folks not being able to change
their ratings.

And here are the results as they stand:

178  Brown University/Providence, Rhode Island
166 Columbus, OH
113 Southeast Florida
69  Bahía Blanca, Argentina

Voting ends at 11:59PM PDT tomorrow, so let your voice be heard.

-Mike


Re: [CODE4LIB] oca api?

2008-02-26 Thread Steve Toub
It is the same interface Chris described. I had emailed with Brewster directly
to learn about it.

In that email exchange I got the sense that OAI-PMH was better, and my
questions about a staging instance went unanswered. But standing nearby when
Jonathan cornered Brewster, I got the sense he prefers the query interface. He
didn't give concrete guidance about how many queries are too many, but he was
conscious of performance.
   --SET





--- Chris Freeland <[EMAIL PROTECTED]> wrote:

> My guess is that, yes, the query interface we've been discussing here
> and the 'all sorts of interfaces that none of us knew about' are the
> same.  It's not documented that I'm aware of.  We've found out about it
> by literally sitting next to IA developers and asking questions.
>
> Chris
> -Original Message-
> From: Code for Libraries [mailto:[EMAIL PROTECTED] On Behalf Of
> Jonathan Rochkind
> Sent: Tuesday, February 26, 2008 12:18 PM
> To: CODE4LIB@LISTSERV.ND.EDU
> Subject: Re: [CODE4LIB] oca api?
>
> So in answer to my question here at the Code4Lib conference, after
> Brewster's keynote, Brewster suggests there are all sorts of interfaces
> that none of us knew about. Or at least I didn't know about, and haven't
> been able to figure out in months of trying!  I'm going to try and
> corner him and ask for an email of who we should contact.
>
> Perhaps it's the XML interface that you guys know about already. Is that
> documented anywhere? How the heck did you find out about it?
>
> Jonathan
>
>
> >>> Steve Toub <[EMAIL PROTECTED]> 02/25/08 9:41 PM >>>
> I'll add that when IA told me about the
> http://www.archive.org/services/search.php interface to return XML,
> they asked that we not send more than 100 records at a time, since doing
> more would adversely affect production services. Which made it seem like
> OAI-PMH was a better way to go.
>
> Chris, can you explain a bit more about what this means: "We found their
> OAI interface to pull scanned items inconsistently based on date of
> scanning"? I'm having trouble parsing.
>
>
>--SET
>
>
>
>
> --- Chris Freeland <[EMAIL PROTECTED]> wrote:
>
> > Jonathan - No, I don't believe it's documented - at least not anywhere
> > publicly.  If any IA/OCA folks are lurking, here's an opportunity to
> > make a bunch of techies happy...
> >
> > Chris
> >
> > -Original Message-
> > From: Code for Libraries [mailto:[EMAIL PROTECTED] On Behalf Of
> > Jonathan Rochkind
> > Sent: Monday, February 25, 2008 2:48 PM
> > To: CODE4LIB@LISTSERV.ND.EDU
> > Subject: Re: [CODE4LIB] oca api?
> >
> > I hadn't known this "custom query interface" existed! This is welcome
> > news. Is this documented anywhere?
> >
> > Jonathan
> >
> >
> > >>> Chris Freeland <[EMAIL PROTECTED]> 02/25/08 2:51 PM >>>
> > Steve & Tim,
> >
> > I'm the tech director for the Biodiversity Heritage Library (BHL), which
> > is a consortium of 10 natural history libraries who have partnered with
> > Internet Archive (IA)/OCA for scanning our collections.  We've just
> > launched our revamped portal, complete with more than 7,500 books & 2.8
> > million pages scanned by IA & other digitization partners, at:
> > http://www.biodiversitylibrary.org
> >
> > To build this portal we ingest metadata from IA.  We found their OAI
> > interface to pull scanned items inconsistently based on date of
> > scanning, so we switched to using their custom query interface.  Here's
> > an example of a query we fire off:
> >
> > http://www.archive.org/services/search.php?query=collection:(biodiversity)+AND+updatedate:%5b2007-10-31+TO+2007-11-30%5d+AND+-contributor:(MBLWHOI%20Library)&limit=10&submit=submit
> >
> > This is returning scanned items from the "biodiversity" collection,
> > updated between 10/31/2007 - 11/30/2007, restricted to one of our
> > contributing libraries (MBLWHOI Library), and limited to 10 results.
> >
> > The results are styled in the browser; view source to see the good
> > stuff.  We use this list to grab the identifiers we've yet to ingest.
> >
> > Some background: When a book is scanned through IA/OCA scanning, they
> > create their own unique identifier (like "annalesacademiae21univ") and
> > grab a MARC record from the contributing library's catalog.  All of the
> > scanned files, derivatives, and metadata files are stored on IA's
> > clusters in a directory named with the identifier.
> >
> > Steve mentioned using their /details/ directive, then sniffing the page
> > to get the cluster location and the files for downloading.  An easier
> > method is to use their /download/ directive, as in:
> >
> > http://www.archive.org/download/ID$, or in the example above:
> > http://www.archive.org/download/annalesacademiae21univ
> >
> > That automatically does a lookup on the cluster, which means you don't
> > have to scrape info off pages.  You can also address any files within
> > that directory, as in:
> >
> http://www.archive.org/download/annalesacademiae21univ/annalesacade

Re: [CODE4LIB] oca api?

2008-02-26 Thread Chris Freeland
My guess is that, yes, the query interface we've been discussing here
and the 'all sorts of interfaces that none of us knew about' are the
same.  It's not documented that I'm aware of.  We've found out about it
by literally sitting next to IA developers and asking questions.

Chris
-Original Message-
From: Code for Libraries [mailto:[EMAIL PROTECTED] On Behalf Of
Jonathan Rochkind
Sent: Tuesday, February 26, 2008 12:18 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] oca api?

So in answer to my question here at the Code4Lib conference, after
Brewster's keynote, Brewster suggests there are all sorts of interfaces
that none of us knew about. Or at least I didn't know about, and haven't
been able to figure out in months of trying!  I'm going to try and
corner him and ask for an email of who we should contact.

Perhaps it's the XML interface that you guys know about already. Is that
documented anywhere? How the heck did you find out about it?

Jonathan


>>> Steve Toub <[EMAIL PROTECTED]> 02/25/08 9:41 PM >>>
I'll add that when IA told me about the
http://www.archive.org/services/search.php interface to return XML,
they asked that we not send more than 100 records at a time, since doing
more would adversely affect production services. Which made it seem like
OAI-PMH was a better way to go.

Chris, can you explain a bit more about what this means: "We found their
OAI interface to pull scanned items inconsistently based on date of
scanning"? I'm having trouble parsing.


   --SET




--- Chris Freeland <[EMAIL PROTECTED]> wrote:

> Jonathan - No, I don't believe it's documented - at least not anywhere
> publicly.  If any IA/OCA folks are lurking, here's an opportunity to
> make a bunch of techies happy...
>
> Chris
>
> -Original Message-
> From: Code for Libraries [mailto:[EMAIL PROTECTED] On Behalf Of
> Jonathan Rochkind
> Sent: Monday, February 25, 2008 2:48 PM
> To: CODE4LIB@LISTSERV.ND.EDU
> Subject: Re: [CODE4LIB] oca api?
>
> I hadn't known this "custom query interface" existed! This is welcome
> news. Is this documented anywhere?
>
> Jonathan
>
>
> >>> Chris Freeland <[EMAIL PROTECTED]> 02/25/08 2:51 PM >>>
> Steve & Tim,
>
> I'm the tech director for the Biodiversity Heritage Library (BHL), which
> is a consortium of 10 natural history libraries who have partnered with
> Internet Archive (IA)/OCA for scanning our collections.  We've just
> launched our revamped portal, complete with more than 7,500 books & 2.8
> million pages scanned by IA & other digitization partners, at:
> http://www.biodiversitylibrary.org
>
> To build this portal we ingest metadata from IA.  We found their OAI
> interface to pull scanned items inconsistently based on date of
> scanning, so we switched to using their custom query interface.  Here's
> an example of a query we fire off:
>
> http://www.archive.org/services/search.php?query=collection:(biodiversity)+AND+updatedate:%5b2007-10-31+TO+2007-11-30%5d+AND+-contributor:(MBLWHOI%20Library)&limit=10&submit=submit
>
> This is returning scanned items from the "biodiversity" collection,
> updated between 10/31/2007 - 11/30/2007, restricted to one of our
> contributing libraries (MBLWHOI Library), and limited to 10 results.
>
> The results are styled in the browser; view source to see the good
> stuff.  We use this list to grab the identifiers we've yet to ingest.
>
> Some background: When a book is scanned through IA/OCA scanning, they
> create their own unique identifier (like "annalesacademiae21univ") and
> grab a MARC record from the contributing library's catalog.  All of the
> scanned files, derivatives, and metadata files are stored on IA's
> clusters in a directory named with the identifier.
>
> Steve mentioned using their /details/ directive, then sniffing the page
> to get the cluster location and the files for downloading.  An easier
> method is to use their /download/ directive, as in:
>
> http://www.archive.org/download/ID$, or in the example above:
> http://www.archive.org/download/annalesacademiae21univ
>
> That automatically does a lookup on the cluster, which means you don't
> have to scrape info off pages.  You can also address any files within
> that directory, as in:
>
> http://www.archive.org/download/annalesacademiae21univ/annalesacademiae21univ_marc.xml
>
> The only way to get standard identifiers (ISBN, ISSN, OCLC, LCCN) for
> these scanned books is to grab them out of the MARC record.  So the
> long-winded answer to your question, Tim, is no, there's no simple way
> to crossref what IA has scanned with your catalog - THAT I KNOW OF.  Big
> caveat on that last part.
>
> Happy to help with any other questions I can,
>
> Chris Freeland
>
>
> -Original Message-
> From: Code for Libraries [mailto:[EMAIL PROTECTED] On Behalf Of
> Steve Toub
> Sent: Sunday, February 24, 2008 11:20 PM
> To: CODE4LIB@LISTSERV.ND.EDU
> Subject: Re: [CODE4LIB] oca api?
>
> --- Tim Shearer <[EMAIL PROTECTED]> wrote:
>
> > Hi Folks,
> >
> > I'm looking int

Re: [CODE4LIB] oca api?

2008-02-26 Thread Jonathan Rochkind
So in answer to my question here at the Code4Lib conference, after Brewster's 
keynote, Brewster suggests there are all sorts of interfaces that none of us 
knew about. Or at least I didn't know about, and haven't been able to figure 
out in months of trying!  I'm going to try and corner him and ask for an email 
of who we should contact.

Perhaps it's the XML interface that you guys know about already. Is that 
documented anywhere? How the heck did you find out about it?

Jonathan


>>> Steve Toub <[EMAIL PROTECTED]> 02/25/08 9:41 PM >>>
I'll add that when IA told me about the
http://www.archive.org/services/search.php interface to return XML,
they asked that we not send more than 100 records at a time, since doing
more would adversely affect production services. Which made it seem like
OAI-PMH was a better way to go.

Chris, can you explain a bit more about what this means: "We found their
OAI interface to pull scanned items inconsistently based on date of
scanning"? I'm having trouble parsing.


   --SET




--- Chris Freeland <[EMAIL PROTECTED]> wrote:

> Jonathan - No, I don't believe it's documented - at least not anywhere
> publicly.  If any IA/OCA folks are lurking, here's an opportunity to
> make a bunch of techies happy...
>
> Chris
>
> -Original Message-
> From: Code for Libraries [mailto:[EMAIL PROTECTED] On Behalf Of
> Jonathan Rochkind
> Sent: Monday, February 25, 2008 2:48 PM
> To: CODE4LIB@LISTSERV.ND.EDU
> Subject: Re: [CODE4LIB] oca api?
>
> I hadn't known this "custom query interface" existed! This is welcome
> news. Is this documented anywhere?
>
> Jonathan
>
>
> >>> Chris Freeland <[EMAIL PROTECTED]> 02/25/08 2:51 PM >>>
> Steve & Tim,
>
> I'm the tech director for the Biodiversity Heritage Library (BHL), which
> is a consortium of 10 natural history libraries who have partnered with
> Internet Archive (IA)/OCA for scanning our collections.  We've just
> launched our revamped portal, complete with more than 7,500 books & 2.8
> million pages scanned by IA & other digitization partners, at:
> http://www.biodiversitylibrary.org
>
> To build this portal we ingest metadata from IA.  We found their OAI
> interface to pull scanned items inconsistently based on date of
> scanning, so we switched to using their custom query interface.  Here's
> an example of a query we fire off:
>
> http://www.archive.org/services/search.php?query=collection:(biodiversity)+AND+updatedate:%5b2007-10-31+TO+2007-11-30%5d+AND+-contributor:(MBLWHOI%20Library)&limit=10&submit=submit
>
> This is returning scanned items from the "biodiversity" collection,
> updated between 10/31/2007 - 11/30/2007, restricted to one of our
> contributing libraries (MBLWHOI Library), and limited to 10 results.
>
> The results are styled in the browser; view source to see the good
> stuff.  We use this list to grab the identifiers we've yet to ingest.
>
> Some background: When a book is scanned through IA/OCA scanning, they
> create their own unique identifier (like "annalesacademiae21univ") and
> grab a MARC record from the contributing library's catalog.  All of the
> scanned files, derivatives, and metadata files are stored on IA's
> clusters in a directory named with the identifier.
>
> Steve mentioned using their /details/ directive, then sniffing the page
> to get the cluster location and the files for downloading.  An easier
> method is to use their /download/ directive, as in:
>
> http://www.archive.org/download/ID$, or in the example above:
> http://www.archive.org/download/annalesacademiae21univ
>
> That automatically does a lookup on the cluster, which means you don't
> have to scrape info off pages.  You can also address any files within
> that directory, as in:
> http://www.archive.org/download/annalesacademiae21univ/annalesacademiae21univ_marc.xml
>
> The only way to get standard identifiers (ISBN, ISSN, OCLC, LCCN) for
> these scanned books is to grab them out of the MARC record.  So the
> long-winded answer to your question, Tim, is no, there's no simple way
> to crossref what IA has scanned with your catalog - THAT I KNOW OF.  Big
> caveat on that last part.
>
> Happy to help with any other questions I can,
>
> Chris Freeland
>
>
> -Original Message-
> From: Code for Libraries [mailto:[EMAIL PROTECTED] On Behalf Of
> Steve Toub
> Sent: Sunday, February 24, 2008 11:20 PM
> To: CODE4LIB@LISTSERV.ND.EDU
> Subject: Re: [CODE4LIB] oca api?
>
> --- Tim Shearer <[EMAIL PROTECTED]> wrote:
>
> > Hi Folks,
> >
> > I'm looking into tapping the texts in the Open Content Alliance.
> >
> > A few questions...
> >
> > As near as I can tell, they don't expose (perhaps even store?) any common
> > unique identifiers (oclc number, issn, isbn, loc number).
>
> I poked around in this world a few months ago in my previous job at
> California Digital Library, also an OCA partner.
>
> The unique key seems to be text string identifier (one that seems to be
> completely different from
> the text string identifier in O

Re: [CODE4LIB] oca api?

2008-02-26 Thread Eric Lease Morgan

On Feb 26, 2008, at 12:21 PM, Chris Freeland wrote:


The biggest problem we found with the OAI implementation had to do with
pulling incremental updates.  If you ask for a date range like Dec 1 - 5
you get all of Dec.  When we discussed this with IA we were shown the
query interface and just decided to use that instead since we're doing
mostly incremental updates.




Incidentally, I was asked a few months ago about incorporating Open
Library and/or Internet Archive material into a service I (barely)
maintain called Ockham Alert. I told them I would be happy to do so,
but since Ockham Alert relies on OAI date ranges, and their date
ranges did not work, I was unable to oblige them. I suppose the date
issue with their OAI implementation is a known issue.

--
Eric Lease Morgan
University Libraries of Notre Dame

(574) 631-8604


Re: [CODE4LIB] oca api?

2008-02-26 Thread Chris Freeland
Steve - I'm not sure about the scalability of the query interface, so
hopefully someone from IA can comment.

The biggest problem we found with the OAI implementation had to do with
pulling incremental updates.  If you ask for a date range like Dec 1 - 5
you get all of Dec.  When we discussed this with IA we were shown the
query interface and just decided to use that instead since we're doing
mostly incremental updates.
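
For anyone wanting to script this kind of incremental pull, here is a
minimal Python sketch of how such a query URL might be assembled. The
endpoint and the query, limit, and submit parameters are copied from the
example URL quoted further down in this thread; the function name
build_search_url and its arguments are made up for illustration, the cap
of 100 reflects Steve's note about what IA asked for, and since the
interface is undocumented none of this is confirmed by IA.

```python
from datetime import date
from urllib.parse import quote_plus

# Endpoint as it appears in the example URL in this thread.
SEARCH_ENDPOINT = "http://www.archive.org/services/search.php"


def build_search_url(collection, start, end, limit=10):
    """Assemble a search.php URL for items updated between start and end.

    Hypothetical helper: the clause syntax is copied from the example
    query in this thread, not from any published documentation.
    """
    # Assumption: stay at or under the 100-record ceiling IA asked for.
    if not 1 <= limit <= 100:
        raise ValueError("limit must be between 1 and 100")
    query = (
        f"collection:({collection}) AND "
        f"updatedate:[{start.isoformat()} TO {end.isoformat()}]"
    )
    # Spaces become '+', brackets are percent-encoded; ':' '(' ')' stay as-is.
    encoded = quote_plus(query, safe=":()")
    return f"{SEARCH_ENDPOINT}?query={encoded}&limit={limit}&submit=submit"


url = build_search_url("biodiversity", date(2007, 10, 31), date(2007, 11, 30))
print(url)
```

Moving the date window forward on each run is what makes the harvest
incremental; the contributor clause from our real query is left out here
to keep the sketch short.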

The date inconsistency might not be enough to drive folks away from OAI
if you're looking to do one-time, or infrequent, harvests.
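
If you do stay with OAI, one possible workaround for the over-broad date
ranges is to re-filter on your side. A minimal Python sketch, assuming
you have already parsed each harvested record into an (identifier,
datestamp) pair; the filter_by_datestamp helper and the sample records
are hypothetical, not anything IA provides.

```python
from datetime import date


def filter_by_datestamp(records, start, end):
    """Keep identifiers whose datestamp falls within [start, end]."""
    kept = []
    for identifier, datestamp in records:
        # OAI-PMH datestamps begin with YYYY-MM-DD; drop any time part.
        day = date.fromisoformat(datestamp[:10])
        if start <= day <= end:
            kept.append(identifier)
    return kept


# Made-up sample: pretend a request for Dec 1 - 5 returned all of December.
harvested = [
    ("example-item-a", "2007-12-03"),
    ("example-item-b", "2007-12-17T08:30:00Z"),
    ("example-item-c", "2007-12-05"),
]
wanted = filter_by_datestamp(harvested, date(2007, 12, 1), date(2007, 12, 5))
print(wanted)
```

This wastes bandwidth on records you then throw away, which is part of
why it suits infrequent harvests better than frequent incremental ones.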

Chris

-Original Message-
From: Code for Libraries [mailto:[EMAIL PROTECTED] On Behalf Of
Steve Toub
Sent: Monday, February 25, 2008 8:41 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] oca api?

I'll add that when IA told me about the
http://www.archive.org/services/search.php interface to return XML,
they asked that we not send more than 100 records at a time, since doing
more would adversely affect production services. Which made it seem like
OAI-PMH was a better way to go.

Chris, can you explain a bit more about what this means: "We found their
OAI interface to pull scanned items inconsistently based on date of
scanning"? I'm having trouble parsing.


   --SET




--- Chris Freeland <[EMAIL PROTECTED]> wrote:

> Jonathan - No, I don't believe it's documented - at least not anywhere
> publicly.  If any IA/OCA folks are lurking, here's an opportunity to
> make a bunch of techies happy...
>
> Chris
>
> -Original Message-
> From: Code for Libraries [mailto:[EMAIL PROTECTED] On Behalf Of
> Jonathan Rochkind
> Sent: Monday, February 25, 2008 2:48 PM
> To: CODE4LIB@LISTSERV.ND.EDU
> Subject: Re: [CODE4LIB] oca api?
>
> I hadn't known this "custom query interface" existed! This is welcome
> news. Is this documented anywhere?
>
> Jonathan
>
>
> >>> Chris Freeland <[EMAIL PROTECTED]> 02/25/08 2:51 PM >>>
> Steve & Tim,
>
> I'm the tech director for the Biodiversity Heritage Library (BHL), which
> is a consortium of 10 natural history libraries who have partnered with
> Internet Archive (IA)/OCA for scanning our collections.  We've just
> launched our revamped portal, complete with more than 7,500 books & 2.8
> million pages scanned by IA & other digitization partners, at:
> http://www.biodiversitylibrary.org
>
> To build this portal we ingest metadata from IA.  We found their OAI
> interface to pull scanned items inconsistently based on date of
> scanning, so we switched to using their custom query interface.  Here's
> an example of a query we fire off:
>
> http://www.archive.org/services/search.php?query=collection:(biodiversity)+AND+updatedate:%5b2007-10-31+TO+2007-11-30%5d+AND+-contributor:(MBLWHOI%20Library)&limit=10&submit=submit
>
> This is returning scanned items from the "biodiversity" collection,
> updated between 10/31/2007 - 11/30/2007, restricted to one of our
> contributing libraries (MBLWHOI Library), and limited to 10 results.
>
> The results are styled in the browser; view source to see the good
> stuff.  We use this list to grab the identifiers we've yet to ingest.
>
> Some background: When a book is scanned through IA/OCA scanning, they
> create their own unique identifier (like "annalesacademiae21univ") and
> grab a MARC record from the contributing library's catalog.  All of the
> scanned files, derivatives, and metadata files are stored on IA's
> clusters in a directory named with the identifier.
>
> Steve mentioned using their /details/ directive, then sniffing the page
> to get the cluster location and the files for downloading.  An easier
> method is to use their /download/ directive, as in:
>
> http://www.archive.org/download/ID$, or in the example above:
> http://www.archive.org/download/annalesacademiae21univ
>
> That automatically does a lookup on the cluster, which means you don't
> have to scrape info off pages.  You can also address any files within
> that directory, as in:
>
> http://www.archive.org/download/annalesacademiae21univ/annalesacademiae21univ_marc.xml
>
> The only way to get standard identifiers (ISBN, ISSN, OCLC, LCCN) for
> these scanned books is to grab them out of the MARC record.  So the
> long-winded answer to your question, Tim, is no, there's no simple way
> to crossref what IA has scanned with your catalog - THAT I KNOW OF.  Big
> caveat on that last part.
>
> Happy to help with any other questions I can,
>
> Chris Freeland
>
>
> -Original Message-
> From: Code for Libraries [mailto:[EMAIL PROTECTED] On Behalf Of
> Steve Toub
> Sent: Sunday, February 24, 2008 11:20 PM
> To: CODE4LIB@LISTSERV.ND.EDU
> Subject: Re: [CODE4LIB] oca api?
>
> --- Tim Shearer <[EMAIL PROTECTED]> wrote:
>
> > Hi Folks,
> >
> > I'm looking into tapping the texts in the Open Content Alliance.
> >
> > A few questions...
> >
> > As near as I can tell, they don't expose (perhaps even store?) any common
> > unique identifiers (oclc number, issn, isbn, loc number).
>
> I poked around in this world a few months