Re: [CODE4LIB] Voting update
Your votes can now be changed. They actually could be changed before, unless you changed your vote to zero.

Let me know if you see any new problems (like you didn't vote before and all of a sudden your computer explodes when you press "submit").

-Ross

On Tue, Feb 26, 2008 at 5:41 PM, Michael J. Giarlo <[EMAIL PROTECTED]> wrote:
> Hey folks,
>
> A short voting update: Ross is working on the voting system and will
> announce a fix later on. Your votes are safe -- insofar as we trust
> Ross -- but there is an issue with folks not being able to change
> their ratings.
>
> And here are the results as they stand:
>
> 178 Brown University/Providence, Rhode Island
> 166 Columbus, OH
> 113 Southeast Florida
> 69 Bahía Blanca, Argentina
>
> Voting ends at 11:59PM PDT tomorrow, so let your voice be heard.
>
> -Mike
[CODE4LIB] Voting update
Hey folks,

A short voting update: Ross is working on the voting system and will announce a fix later on. Your votes are safe -- insofar as we trust Ross -- but there is an issue with folks not being able to change their ratings.

And here are the results as they stand:

178 Brown University/Providence, Rhode Island
166 Columbus, OH
113 Southeast Florida
69 Bahía Blanca, Argentina

Voting ends at 11:59PM PDT tomorrow, so let your voice be heard.

-Mike
Re: [CODE4LIB] oca api?
It is the same interface Chris described. I had emailed with Brewster directly to learn about it. In that email exchange I got the sense that OAI-PMH was better. And my questions about a staging instance went unanswered.

But in standing in here when Jonathan cornered Brewster, I got the sense he prefers the query interface. He didn't set concrete guidance about how many queries is too much but he was conscious of performance.

--SET

--- Chris Freeland <[EMAIL PROTECTED]> wrote:

> My guess is that, yes, the query interface we've been discussing here
> and the 'all sorts of interfaces that none of us knew about' are the
> same. It's not documented that I'm aware of. We've found out about it
> by literally sitting next to IA developers and asking questions.
>
> Chris
>
> -----Original Message-----
> From: Code for Libraries [mailto:[EMAIL PROTECTED] On Behalf Of Jonathan Rochkind
> Sent: Tuesday, February 26, 2008 12:18 PM
> To: CODE4LIB@LISTSERV.ND.EDU
> Subject: Re: [CODE4LIB] oca api?
>
> So in answer to my question here at the Code4Lib conference, after
> Brewster's keynote, Brewster suggests there are all sorts of interfaces
> that none of us knew about. Or at least I didn't know about, and haven't
> been able to figure out in months of trying! I'm going to try and
> corner him and ask for an email of who we should contact.
>
> Perhaps it's the XML interface that you guys know about already. Is that
> documented anywhere? How the heck did you find out about it?
>
> Jonathan
>
> >>> Steve Toub <[EMAIL PROTECTED]> 02/25/08 9:41 PM >>>
> I'll add that when IA told me about
> http://www.archive.org/services/search.php interface to return
> XML, they asked that we not send more than 100 records at a time since
> doing more would adversely affect production services. Which made it
> seem like OAI-PMH was a better way to go.
>
> Chris, can you explain a bit more about what this means: "We found their
> OAI interface to pull scanned items inconsistently based on date of
> scanning"? I'm having trouble parsing.
>
> --SET
>
> --- Chris Freeland <[EMAIL PROTECTED]> wrote:
>
> > Jonathan - No, I don't believe it's documented - at least not anywhere
> > publicly. If any IA/OCA folks are lurking, here's an opportunity to
> > make a bunch of techies happy...
> >
> > Chris
> >
> > -----Original Message-----
> > From: Code for Libraries [mailto:[EMAIL PROTECTED] On Behalf Of Jonathan Rochkind
> > Sent: Monday, February 25, 2008 2:48 PM
> > To: CODE4LIB@LISTSERV.ND.EDU
> > Subject: Re: [CODE4LIB] oca api?
> >
> > I hadn't known this "custom query interface" existed! This is welcome
> > news. Is this documented anywhere?
> >
> > Jonathan
> >
> > >>> Chris Freeland <[EMAIL PROTECTED]> 02/25/08 2:51 PM >>>
> > Steve & Tim,
> >
> > I'm the tech director for the Biodiversity Heritage Library (BHL), which
> > is a consortium of 10 natural history libraries who have partnered with
> > Internet Archive (IA)/OCA for scanning our collections. We've just
> > launched our revamped portal, complete with more than 7,500 books & 2.8
> > million pages scanned by IA & other digitization partners, at:
> > http://www.biodiversitylibrary.org
> >
> > To build this portal we ingest metadata from IA. We found their OAI
> > interface to pull scanned items inconsistently based on date of
> > scanning, so we switched to using their custom query interface. Here's
> > an example of a query we fire off:
> >
> > http://www.archive.org/services/search.php?query=collection:(biodiversity)+AND+updatedate:%5b2007-10-31+TO+2007-11-30%5d+AND+-contributor:(MBLWHOI%20Library)&limit=10&submit=submit
> >
> > This is returning scanned items from the "biodiversity" collection,
> > updated between 10/31/2007 - 11/30/2007, restricted to one of our
> > contributing libraries (MBLWHOI Library), and limited to 10 results.
> >
> > The results are styled in the browser; view source to see the good
> > stuff. We use this list to grab the identifiers we've yet to ingest.
> >
> > Some background: When a book is scanned through IA/OCA scanning, they
> > create their own unique identifier (like "annalesacademiae21univ") and
> > grab a MARC record from the contributing library's catalog. All of the
> > scanned files, derivatives, and metadata files are stored on IA's
> > clusters in a directory named with the identifier.
> >
> > Steve mentioned using their /details/ directive, then sniffing the page
> > to get the cluster location and the files for downloading. An easier
> > method is to use their /download/ directive, as in:
> >
> > http://www.archive.org/download/ID$, or in the example above:
> > http://www.archive.org/download/annalesacademiae21univ
> >
> > That automatically does a lookup on the cluster, which means you don't
> > have to scrape info off pages. You can also address any files within
> > that directory, as in:
> > http://www.archive.org/download/annalesacademiae21univ/annalesacade
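
A rough sketch of how the example query quoted above could be assembled in Python. The endpoint and the `query`, `limit`, and `submit` parameters come from the URL in the thread. The function names, the `extra_clauses` argument, and the hard cap of 100 are illustrative assumptions: the cap only mirrors IA's informal request, as reported by Steve, to not pull more than 100 records at a time. The interface is undocumented, so nothing here should be read as its official contract. The contributor clause is passed through verbatim, leading "-" included, without interpreting it.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Endpoint as quoted in the thread; the interface itself is undocumented.
SEARCH_ENDPOINT = "http://www.archive.org/services/search.php"

# Assumption: treat IA's informal "no more than 100 records" request as a cap.
MAX_RECORDS = 100


def build_search_url(collection, start_date, end_date, extra_clauses=(), limit=10):
    """Assemble a search.php URL shaped like the example in the thread.

    Dates are YYYY-MM-DD strings. extra_clauses are appended verbatim,
    e.g. "-contributor:(MBLWHOI Library)" exactly as in the example.
    """
    if not 1 <= limit <= MAX_RECORDS:
        raise ValueError(f"limit must be between 1 and {MAX_RECORDS}")
    clauses = [
        f"collection:({collection})",
        f"updatedate:[{start_date} TO {end_date}]",
    ]
    clauses.extend(extra_clauses)
    params = {"query": " AND ".join(clauses), "limit": limit, "submit": "submit"}
    # urlencode percent-encodes more characters than the hand-written URL
    # in the thread, but it decodes to the same query string.
    return SEARCH_ENDPOINT + "?" + urlencode(params)


def decode_search_url(url):
    """Return the decoded query parameters, handy for checking a built URL."""
    return {key: values[0] for key, values in parse_qs(urlsplit(url).query).items()}
```

Calling `build_search_url("biodiversity", "2007-10-31", "2007-11-30", ["-contributor:(MBLWHOI Library)"])` yields a URL whose decoded `query` parameter matches the one in Chris's example.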
Re: [CODE4LIB] oca api?
My guess is that, yes, the query interface we've been discussing here and the 'all sorts of interfaces that none of us knew about' are the same. It's not documented that I'm aware of. We've found out about it by literally sitting next to IA developers and asking questions.

Chris

-----Original Message-----
From: Code for Libraries [mailto:[EMAIL PROTECTED] On Behalf Of Jonathan Rochkind
Sent: Tuesday, February 26, 2008 12:18 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] oca api?

So in answer to my question here at the Code4Lib conference, after Brewster's keynote, Brewster suggests there are all sorts of interfaces that none of us knew about. Or at least I didn't know about, and haven't been able to figure out in months of trying! I'm going to try and corner him and ask for an email of who we should contact.

Perhaps it's the XML interface that you guys know about already. Is that documented anywhere? How the heck did you find out about it?

Jonathan
Re: [CODE4LIB] oca api?
So in answer to my question here at the Code4Lib conference, after Brewster's keynote, Brewster suggests there are all sorts of interfaces that none of us knew about. Or at least I didn't know about, and haven't been able to figure out in months of trying! I'm going to try and corner him and ask for an email of who we should contact.

Perhaps it's the XML interface that you guys know about already. Is that documented anywhere? How the heck did you find out about it?

Jonathan

>>> Steve Toub <[EMAIL PROTECTED]> 02/25/08 9:41 PM >>>
I'll add that when IA told me about http://www.archive.org/services/search.php interface to return XML, they asked that we not send more than 100 records at a time since doing more would adversely affect production services. Which made it seem like OAI-PMH was a better way to go.

Chris, can you explain a bit more about what this means: "We found their OAI interface to pull scanned items inconsistently based on date of scanning"? I'm having trouble parsing.

--SET

--- Chris Freeland <[EMAIL PROTECTED]> wrote:

> Jonathan - No, I don't believe it's documented - at least not anywhere
> publicly. If any IA/OCA folks are lurking, here's an opportunity to
> make a bunch of techies happy...
>
> Chris
>
> -----Original Message-----
> From: Code for Libraries [mailto:[EMAIL PROTECTED] On Behalf Of Jonathan Rochkind
> Sent: Monday, February 25, 2008 2:48 PM
> To: CODE4LIB@LISTSERV.ND.EDU
> Subject: Re: [CODE4LIB] oca api?
>
> I hadn't known this "custom query interface" existed! This is welcome
> news. Is this documented anywhere?
>
> Jonathan
>
> >>> Chris Freeland <[EMAIL PROTECTED]> 02/25/08 2:51 PM >>>
> Steve & Tim,
>
> I'm the tech director for the Biodiversity Heritage Library (BHL), which
> is a consortium of 10 natural history libraries who have partnered with
> Internet Archive (IA)/OCA for scanning our collections. We've just
> launched our revamped portal, complete with more than 7,500 books & 2.8
> million pages scanned by IA & other digitization partners, at:
> http://www.biodiversitylibrary.org
>
> To build this portal we ingest metadata from IA. We found their OAI
> interface to pull scanned items inconsistently based on date of
> scanning, so we switched to using their custom query interface. Here's
> an example of a query we fire off:
>
> http://www.archive.org/services/search.php?query=collection:(biodiversity)+AND+updatedate:%5b2007-10-31+TO+2007-11-30%5d+AND+-contributor:(MBLWHOI%20Library)&limit=10&submit=submit
>
> This is returning scanned items from the "biodiversity" collection,
> updated between 10/31/2007 - 11/30/2007, restricted to one of our
> contributing libraries (MBLWHOI Library), and limited to 10 results.
>
> The results are styled in the browser; view source to see the good
> stuff. We use this list to grab the identifiers we've yet to ingest.
>
> Some background: When a book is scanned through IA/OCA scanning, they
> create their own unique identifier (like "annalesacademiae21univ") and
> grab a MARC record from the contributing library's catalog. All of the
> scanned files, derivatives, and metadata files are stored on IA's
> clusters in a directory named with the identifier.
>
> Steve mentioned using their /details/ directive, then sniffing the page
> to get the cluster location and the files for downloading. An easier
> method is to use their /download/ directive, as in:
>
> http://www.archive.org/download/ID$, or in the example above:
> http://www.archive.org/download/annalesacademiae21univ
>
> That automatically does a lookup on the cluster, which means you don't
> have to scrape info off pages. You can also address any files within
> that directory, as in:
> http://www.archive.org/download/annalesacademiae21univ/annalesacademiae21univ_marc.xml
>
> The only way to get standard identifiers (ISBN, ISSN, OCLC, LCCN) for
> these scanned books is to grab them out of the MARC record. So the
> long-winded answer to your question, Tim, is no, there's no simple way
> to crossref what IA has scanned with your catalog - THAT I KNOW OF. Big
> caveat on that last part.
>
> Happy to help with any other questions I can,
>
> Chris Freeland
>
> -----Original Message-----
> From: Code for Libraries [mailto:[EMAIL PROTECTED] On Behalf Of Steve Toub
> Sent: Sunday, February 24, 2008 11:20 PM
> To: CODE4LIB@LISTSERV.ND.EDU
> Subject: Re: [CODE4LIB] oca api?
>
> --- Tim Shearer <[EMAIL PROTECTED]> wrote:
>
> > Hi Folks,
> >
> > I'm looking into tapping the texts in the Open Content Alliance.
> >
> > A few questions...
> >
> > As near as I can tell, they don't expose (perhaps even store?) any common
> > unique identifiers (oclc number, issn, isbn, loc number).
>
> I poked around in this world a few months ago in my previous job at
> California Digital Library, also an OCA partner.
>
> The unique key seems to be text string identifier (one that seems to be
> completely different from the text string identifier in O
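
Chris's two steps (address the `_marc.xml` file under /download/, then pull standard identifiers out of the MARC record) might be sketched as below. The URL pattern is the one quoted in the thread. The rest is assumption: that the file is MARCXML in the Library of Congress "slim" namespace, and that identifiers sit in the usual MARC 21 fields (010 $a LCCN, 020 $a ISBN, 022 $a ISSN, and OCLC numbers as 035 $a values prefixed with "(OCoLC)"). Individual contributing libraries may catalog differently, so treat this as a starting point, not a description of IA's data.

```python
import xml.etree.ElementTree as ET

DOWNLOAD_BASE = "http://www.archive.org/download"  # pattern quoted in the thread

# Assumption: the _marc.xml file uses the standard MARCXML namespace.
MARC_NS = {"marc": "http://www.loc.gov/MARC21/slim"}
MARC_DATAFIELD = "{http://www.loc.gov/MARC21/slim}datafield"

# Conventional MARC 21 tags whose subfield $a carries the identifier.
SIMPLE_FIELDS = {"010": "lccn", "020": "isbn", "022": "issn"}
OCLC_PREFIX = "(OCoLC)"


def marc_xml_url(identifier):
    """Build the MARC XML URL for an IA identifier via /download/."""
    return f"{DOWNLOAD_BASE}/{identifier}/{identifier}_marc.xml"


def extract_identifiers(marc_xml):
    """Collect LCCN, ISBN, ISSN and OCLC numbers from a MARCXML string."""
    root = ET.fromstring(marc_xml)
    found = {"lccn": [], "isbn": [], "issn": [], "oclc": []}
    for field in root.iter(MARC_DATAFIELD):
        tag = field.get("tag")
        for sub in field.findall("marc:subfield[@code='a']", MARC_NS):
            value = (sub.text or "").strip()
            if not value:
                continue
            if tag in SIMPLE_FIELDS:
                found[SIMPLE_FIELDS[tag]].append(value)
            elif tag == "035" and value.startswith(OCLC_PREFIX):
                # Keep only the number that follows the "(OCoLC)" prefix.
                found["oclc"].append(value[len(OCLC_PREFIX):].strip())
    return found
```

Fetching the XML is left out on purpose; any HTTP client can retrieve `marc_xml_url(identifier)` and hand the response body to `extract_identifiers`.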
Re: [CODE4LIB] oca api?
On Feb 26, 2008, at 12:21 PM, Chris Freeland wrote:

> The biggest problem we found with the OAI implementation had to do
> with pulling incremental updates. If you ask for a date range like
> Dec 1 - 5 you get all of Dec. When we discussed this with IA we were
> shown the query interface and just decided to use that instead since
> we're doing mostly incremental updates.

Incidentally, I was asked a few months ago about incorporating Open Library and/or Internet Archive material into a service I (barely) maintain called Ockham Alert. I told them I would be happy to do so, but since Ockham Alert relies on OAI date ranges, and their date ranges did not work, I was unable to oblige them. I suppose the date issue with their OAI implementation is a known issue.

--
Eric Lease Morgan
University Libraries of Notre Dame

(574) 631-8604
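
For harvesters that still want to use OAI-PMH despite the date-range behavior described above, one possible workaround is to filter client-side: accept whatever the server returns and keep only the records whose header datestamp falls inside the range that was actually requested. The sketch below assumes a standard OAI-PMH 2.0 ListRecords or ListIdentifiers response, where each `header` carries `identifier` and `datestamp` elements. The function name is made up for illustration, and nobody in the thread reports having tried this against IA's endpoint.

```python
import xml.etree.ElementTree as ET
from datetime import date

# Standard OAI-PMH 2.0 response namespace.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def identifiers_in_range(response_xml, start, end):
    """Return identifiers whose datestamp lies within start..end inclusive.

    start and end are datetime.date objects. Datestamps may be day-level
    (YYYY-MM-DD) or second-level (YYYY-MM-DDThh:mm:ssZ); only the date
    part is compared.
    """
    root = ET.fromstring(response_xml)
    kept = []
    for header in root.iterfind(".//oai:header", OAI_NS):
        stamp = header.findtext("oai:datestamp", default="", namespaces=OAI_NS)
        ident = header.findtext("oai:identifier", default="", namespaces=OAI_NS)
        try:
            day = date.fromisoformat(stamp.strip()[:10])
        except ValueError:
            continue  # skip headers with a missing or malformed datestamp
        if start <= day <= end:
            kept.append(ident.strip())
    return kept
```

This trades bandwidth for correctness: the harvester still downloads the whole month, but only the requested days reach the ingest step.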
Re: [CODE4LIB] oca api?
Steve - I'm not sure about the scalability of the query interface, so hopefully someone from IA can comment.

The biggest problem we found with the OAI implementation had to do with pulling incremental updates. If you ask for a date range like Dec 1 - 5 you get all of Dec. When we discussed this with IA we were shown the query interface and just decided to use that instead since we're doing mostly incremental updates. The date inconsistency might not be enough to drive folks away from OAI if you're looking to do one-time, or infrequent, harvests.

Chris

-----Original Message-----
From: Code for Libraries [mailto:[EMAIL PROTECTED] On Behalf Of Steve Toub
Sent: Monday, February 25, 2008 8:41 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] oca api?

I'll add that when IA told me about http://www.archive.org/services/search.php interface to return XML, they asked that we not send more than 100 records at a time since doing more would adversely affect production services. Which made it seem like OAI-PMH was a better way to go.

Chris, can you explain a bit more about what this means: "We found their OAI interface to pull scanned items inconsistently based on date of scanning"? I'm having trouble parsing.

--SET
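
For incremental updates through the query interface, a harvester would typically walk forward in small date windows and issue one `updatedate` range per window. The sketch below covers only the date arithmetic. The window size, the function names, and the assumption that both ends of a bracketed range are inclusive (as in Lucene-style syntax) are not confirmed anywhere in the thread.

```python
from datetime import date, timedelta


def date_windows(start, end, days=7):
    """Yield (window_start, window_end) pairs covering start..end inclusive.

    Windows do not overlap, so a record updated on a given day falls into
    exactly one window (assuming inclusive range ends on the server).
    """
    if days < 1:
        raise ValueError("days must be at least 1")
    cursor = start
    while cursor <= end:
        window_end = min(cursor + timedelta(days=days - 1), end)
        yield cursor, window_end
        cursor = window_end + timedelta(days=1)


def updatedate_clause(window_start, window_end):
    """Format one window as an updatedate range like the one in the thread."""
    return f"updatedate:[{window_start.isoformat()} TO {window_end.isoformat()}]"
```

A harvester could persist the end date of the last successful window and resume from the following day on its next run.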