Re: [CODE4LIB] Internet Archive collection codes?
Ditto. I've mentioned the issue to others (not Alexis) in the past, but it seems that it took code4lib to find the right person. :)

--
Michael B. Klein
Digital Initiatives Technology Librarian
Boston Public Library
(617) 859-2391
[EMAIL PROTECTED]

> From: Eric Lease Morgan <[EMAIL PROTECTED]>
> Reply-To: "Code for Libraries"
> Date: Thu, 5 Jun 2008 15:53:46 -0400
> Subject: Re: [CODE4LIB] Internet Archive collection codes?
>
> On Jun 5, 2008, at 3:49 PM, Alexis Rossi wrote:
>
>> Hmm... well that doesn't seem right at all, does it? Thank you for
>> pointing it out, I've sent this along to our petabox team to see if
>> they can put up the correct error codes.
>
> And I appreciate that you, Alexis, "the vendor", are actively
> listening and participating.
>
> internet_archive++
>
> --
> Eric Lease Morgan
Re: [CODE4LIB] Internet Archive collection codes?
On Jun 5, 2008, at 3:49 PM, Alexis Rossi wrote:

> Hmm... well that doesn't seem right at all, does it? Thank you for
> pointing it out, I've sent this along to our petabox team to see if
> they can put up the correct error codes.

And I appreciate that you, Alexis, "the vendor", are actively listening and participating.

internet_archive++

--
Eric Lease Morgan
Re: [CODE4LIB] Internet Archive collection codes?
Hi Michael,

Hmm... well that doesn't seem right at all, does it? Thank you for pointing it out, I've sent this along to our petabox team to see if they can put up the correct error codes.

Alexis

Klein, Michael wrote:

> Peter,
>
> I've seen no official information or documentation from the Internet
> Archive either. I've actually been quite frustrated by several issues
> for a while now. For example: If you go to
>
> http://www.archive.org/details/nonexistentidentifier
>
> you'll get a human-readable web page stating that the item cannot be
> found. That page, however, is served up with an HTTP status of 200 OK,
> not 404 NOT FOUND.
>
> In addition, I've noticed that when certain requests fail due to system
> load and other issues, I get back an HTML page saying something like
> "the system is experiencing slowness," but again with a 200 OK instead
> of a 503 SERVICE UNAVAILABLE (ideally with a Retry-After header).
>
> These things alone make it extremely difficult to automate any
> large-scale metadata retrieval from the Internet Archive, and that's
> without any attempt to download content.
>
> I'm working on a post documenting some of the techniques and strategies
> that have worked for us, but it's not quite ready for human consumption
> yet.
>
> Michael
>
> --
> Michael B. Klein
> Digital Initiatives Technology Librarian
> Boston Public Library
> [EMAIL PROTECTED]
>
>> From: "Binkley, Peter" <[EMAIL PROTECTED]>
>> Reply-To: "Code for Libraries"
>> Date: Thu, 5 Jun 2008 13:08:13 -0600
>> Conversation: [CODE4LIB] Internet Archive collection codes?
>> Subject: Re: [CODE4LIB] Internet Archive collection codes?
>>
>> While we're on the subject, are there any more up-to-date instructions
>> for harvesting from Internet Archive than these?
>> http://biodiversitylibrary.blogspot.com/2008/03/harvesting-process-from-internet_14.html
>>
>> And does IA provide guidelines for harvesting (traffic limits etc.)? I
>> clicked around the site a bit and didn't find them, but could easily
>> have missed them.
>>
>> Peter
Re: [CODE4LIB] Internet Archive collection codes?
Peter,

I've seen no official information or documentation from the Internet Archive either. I've actually been quite frustrated by several issues for a while now. For example: If you go to

http://www.archive.org/details/nonexistentidentifier

you'll get a human-readable web page stating that the item cannot be found. That page, however, is served up with an HTTP status of 200 OK, not 404 NOT FOUND.

In addition, I've noticed that when certain requests fail due to system load and other issues, I get back an HTML page saying something like "the system is experiencing slowness," but again with a 200 OK instead of a 503 SERVICE UNAVAILABLE (ideally with a Retry-After header).

These things alone make it extremely difficult to automate any large-scale metadata retrieval from the Internet Archive, and that's without any attempt to download content.

I'm working on a post documenting some of the techniques and strategies that have worked for us, but it's not quite ready for human consumption yet.

Michael

--
Michael B. Klein
Digital Initiatives Technology Librarian
Boston Public Library
[EMAIL PROTECTED]

> From: "Binkley, Peter" <[EMAIL PROTECTED]>
> Reply-To: "Code for Libraries"
> Date: Thu, 5 Jun 2008 13:08:13 -0600
> Conversation: [CODE4LIB] Internet Archive collection codes?
> Subject: Re: [CODE4LIB] Internet Archive collection codes?
>
> While we're on the subject, are there any more up-to-date instructions
> for harvesting from Internet Archive than these?
> http://biodiversitylibrary.blogspot.com/2008/03/harvesting-process-from-internet_14.html
>
> And does IA provide guidelines for harvesting (traffic limits etc.)? I
> clicked around the site a bit and didn't find them, but could easily
> have missed them.
>
> Peter
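For anyone scripting against pages that report failures with a 200 OK, one workaround is to inspect the body as well as the status code. The following is only a minimal sketch in Python, not anything the Internet Archive documents: the function names and the marker phrases are hypothetical, the phrases are lifted from the descriptions in the message above, and the real wording of the error pages would need to be checked by hand.

```python
from typing import Mapping

# Hypothetical marker phrases, taken from the descriptions in this thread.
# The actual text of the error pages is an assumption and may differ.
NOT_FOUND_MARKERS = ("cannot be found",)
OVERLOAD_MARKERS = ("experiencing slowness",)


def classify_response(status: int, body: str) -> str:
    """Return 'not_found', 'retry', 'ok', or 'error' for one fetched page.

    The body is checked even when the status is 200, because an error
    page may be served with a success status.
    """
    text = body.lower()
    if status == 404 or any(marker in text for marker in NOT_FOUND_MARKERS):
        return "not_found"
    if status == 503 or any(marker in text for marker in OVERLOAD_MARKERS):
        return "retry"
    if 200 <= status < 300:
        return "ok"
    return "error"


def retry_delay(headers: Mapping[str, str], default: float = 30.0) -> float:
    """Seconds to wait before retrying.

    Uses a numeric Retry-After header when one is present; otherwise
    falls back to `default` (an arbitrary value chosen for this sketch).
    """
    value = headers.get("Retry-After", "")
    return float(value) if value.strip().isdigit() else default
```

A harvester could call `classify_response` on every page it fetches and sleep for `retry_delay(headers)` seconds whenever the result is `retry`. Matching on page text is brittle, so this is a stopgap until the server sends correct status codes.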
Re: [CODE4LIB] Internet Archive collection codes?
While we're on the subject, are there any more up-to-date instructions for harvesting from Internet Archive than these?

http://biodiversitylibrary.blogspot.com/2008/03/harvesting-process-from-internet_14.html

And does IA provide guidelines for harvesting (traffic limits etc.)? I clicked around the site a bit and didn't find them, but could easily have missed them.

Peter
Re: [CODE4LIB] Internet Archive collection codes?
Andrew,

I'm not sure this is the same thing that you were told about, but what I discovered for IA after Jonathan sent out his message is here:

http://www.archive.org/advancedsearch.php#col2

It is just a redirect to a search of a Solr index, so it ought to be easy for you to see what's going on. Do note that the address of the Solr index may change, so you'll want to use the bookmark link. You'll see that changing the xmlsearch param from bookmark to Search bypasses the bookmark page. Using this you can find all collection identifiers and names, as Alexis points out.

Jason

On Wed, Jun 4, 2008 at 4:27 PM, Andrew Nagy <[EMAIL PROTECTED]> wrote:
> Excuse me if I am late to the game on this one - but at the Code4Lib
> conference either Brewster Kahle or Aaron Swartz spoke about an API to either
> the open library or the internet archive. Is this available, or any plans to
> release this? It seems like you are referring to some sort of API.
>
> Andrew
>
>> -----Original Message-----
>> From: Code for Libraries [mailto:[EMAIL PROTECTED] On Behalf Of
>> [Alexis Rossi]
>> Sent: Tuesday, June 03, 2008 10:58 PM
>> To: CODE4LIB@LISTSERV.ND.EDU
>> Subject: Re: [CODE4LIB] Internet Archive collection codes?
>>
>> Hi,
>>
>> You can do a search for mediatype:collection to return results for all
>> 4200+ collections.
>>
>> We have a search interface that will return specific fields for this
>> query in xml format, if you'd like, but I'll need to give you some
>> permissions to access it. Feel free to send me an email if you'd like
>> to use that ([EMAIL PROTECTED]).
>>
>> Alexis
>>
>> > Does anyone know where to get a list of Internet Archive collection
>> > codes and their human-displayable display labels?
>> >
>> > For instance:
>> > americana => "American Libraries"
>> > gutenberg => "Project Gutenberg"
>> > librivoxaudio => [hell if I know]
>> >
>> > Some of these I can 'scrape' from the quick search box popup on the
>> > IA website. But they're not all in there. And maybe there's a better
>> > place to get these?
>> >
>> > Anyone know where the right place to ask this of the IA and/or IA
>> > developer community is?
>> >
>> > Jonathan
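Since the search is described above as a redirect to a Solr index, the results could plausibly be turned into the identifier-to-label list Jonathan asked for. The sketch below, in Python, assumes Solr's standard XML response layout (`<doc>` elements holding `<str name="...">` fields). The field names `identifier` and `title`, the function name, and the sample document are all assumptions made for illustration; the sample is hand-written from the examples in this thread and is not real output from the Internet Archive.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def collection_labels(solr_xml: str) -> Dict[str, str]:
    """Map collection identifiers to display labels.

    Expects Solr's standard XML response format. The field names
    'identifier' and 'title' are assumptions, not confirmed by IA.
    """
    labels: Dict[str, str] = {}
    root = ET.fromstring(solr_xml)
    for doc in root.iter("doc"):
        # Collect the string-valued fields of this document by name.
        fields = {el.get("name"): (el.text or "") for el in doc.findall("str")}
        identifier = fields.get("identifier")
        if identifier:
            labels[identifier] = fields.get("title", "")
    return labels


# Hand-written sample built from the examples in this thread.
# It is NOT a real response from the Internet Archive.
SAMPLE = """<response>
  <result name="response" numFound="2" start="0">
    <doc>
      <str name="identifier">americana</str>
      <str name="title">American Libraries</str>
    </doc>
    <doc>
      <str name="identifier">gutenberg</str>
      <str name="title">Project Gutenberg</str>
    </doc>
  </result>
</response>"""

if __name__ == "__main__":
    for code, label in collection_labels(SAMPLE).items():
        print(f"{code} => {label!r}")
```

If the real response uses different field names or multi-valued fields, only the lookup inside the loop should need to change.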
Re: [CODE4LIB] Internet Archive collection codes? [open library api]
On Jun 4, 2008, at 4:27 PM, Andrew Nagy wrote:

> Excuse me if I am late to the game on this one - but at the Code4Lib
> conference either Brewster Kahle or Aaron Swartz spoke about an API to
> either the open library or the internet archive. Is this available, or
> any plans to release this? It seems like you are referring to some
> sort of API.

Yes, I believe the API for Open Library can be found at:

http://demo.openlibrary.org:8080/dev/docs/api

--
Eric Lease Morgan
Re: [CODE4LIB] Internet Archive collection codes?
Excuse me if I am late to the game on this one - but at the Code4Lib conference either Brewster Kahle or Aaron Swartz spoke about an API to either the open library or the internet archive. Is this available, or any plans to release this? It seems like you are referring to some sort of API.

Andrew

> -----Original Message-----
> From: Code for Libraries [mailto:[EMAIL PROTECTED] On Behalf Of
> [Alexis Rossi]
> Sent: Tuesday, June 03, 2008 10:58 PM
> To: CODE4LIB@LISTSERV.ND.EDU
> Subject: Re: [CODE4LIB] Internet Archive collection codes?
>
> Hi,
>
> You can do a search for mediatype:collection to return results for all
> 4200+ collections.
>
> We have a search interface that will return specific fields for this
> query in xml format, if you'd like, but I'll need to give you some
> permissions to access it. Feel free to send me an email if you'd like
> to use that ([EMAIL PROTECTED]).
>
> Alexis
>
> > Does anyone know where to get a list of Internet Archive collection
> > codes and their human-displayable display labels?
> >
> > For instance:
> > americana => "American Libraries"
> > gutenberg => "Project Gutenberg"
> > librivoxaudio => [hell if I know]
> >
> > Some of these I can 'scrape' from the quick search box popup on the
> > IA website. But they're not all in there. And maybe there's a better
> > place to get these?
> >
> > Anyone know where the right place to ask this of the IA and/or IA
> > developer community is?
> >
> > Jonathan
Re: [CODE4LIB] Internet Archive collection codes?
Jonathan,

I can't answer your question, but librivoxaudio is LibriVox, a project to record books not in copyright. They have over 1500 audio books to download and use freely.

http://librivox.org/

Sincerely,
David Bigwood
[EMAIL PROTECTED]
http://catalogablog.blogspot.com
Twitter LPI_Library

-----Original Message-----
From: Code for Libraries [mailto:[EMAIL PROTECTED] On Behalf Of Jonathan Rochkind
Sent: Tuesday, June 03, 2008 7:22 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: [CODE4LIB] Internet Archive collection codes?

Does anyone know where to get a list of Internet Archive collection codes and their human-displayable display labels?

For instance:
americana => "American Libraries"
gutenberg => "Project Gutenberg"
librivoxaudio => [hell if I know]

Some of these I can 'scrape' from the quick search box popup on the IA website. But they're not all in there. And maybe there's a better place to get these?

Anyone know where the right place to ask this of the IA and/or IA developer community is?

Jonathan
Re: [CODE4LIB] Internet Archive collection codes?
Hi,

You can do a search for mediatype:collection to return results for all 4200+ collections.

We have a search interface that will return specific fields for this query in xml format, if you'd like, but I'll need to give you some permissions to access it. Feel free to send me an email if you'd like to use that ([EMAIL PROTECTED]).

Alexis

> Does anyone know where to get a list of Internet Archive collection
> codes and their human-displayable display labels?
>
> For instance:
> americana => "American Libraries"
> gutenberg => "Project Gutenberg"
> librivoxaudio => [hell if I know]
>
> Some of these I can 'scrape' from the quick search box popup on the IA
> website. But they're not all in there. And maybe there's a better place
> to get these?
>
> Anyone know where the right place to ask this of the IA and/or IA
> developer community is?
>
> Jonathan
[CODE4LIB] Internet Archive collection codes?
Does anyone know where to get a list of Internet Archive collection codes and their human-displayable display labels?

For instance:
americana => "American Libraries"
gutenberg => "Project Gutenberg"
librivoxaudio => [hell if I know]

Some of these I can 'scrape' from the quick search box popup on the IA website. But they're not all in there. And maybe there's a better place to get these?

Anyone know where the right place to ask this of the IA and/or IA developer community is?

Jonathan