Jonathan, Amazon.com doesn't seem to allow HEAD requests: it returns a 405 METHOD NOT ALLOWED status. What's more, GET responses don't seem to include Content-Length headers.
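The content-length shortcut only helps if the server answers HEAD and advertises a Content-Length, so a client may want to detect both failure modes above and fall back to a plain GET. Below is a minimal sketch using only the Python standard library. The function names are hypothetical. Treating 501 Not Implemented the same as 405 is an added assumption, since 501 is the other status a server may send for a method it doesn't support.

```python
import urllib.error
import urllib.request
from typing import Optional


def parse_content_length(value: Optional[str]) -> Optional[int]:
    """Turn a raw Content-Length header value into an int, or None if unusable."""
    if value is None:
        return None  # header missing, as observed on Amazon's GET responses
    try:
        length = int(value.strip())
    except ValueError:
        return None  # malformed header value
    return length if length >= 0 else None


def head_content_length(url: str, timeout: float = 10.0) -> Optional[int]:
    """Send a HEAD request and return the advertised Content-Length, or None.

    None means the length-based shortcut can't be used for this URL: the
    server refused HEAD (405 Method Not Allowed, or 501 Not Implemented)
    or it didn't advertise a length.
    """
    req = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return parse_content_length(resp.headers.get("Content-Length"))
    except urllib.error.HTTPError as err:
        if err.code in (405, 501):
            return None  # method refused; caller should fall back to GET
        raise  # any other HTTP error is a real failure
```

When `head_content_length` returns None, a caller would fetch the page with GET and inspect the body instead of relying on its length.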
One thing I've noticed, though, is that the "unavailable" response doesn't include a <title> element, while the regular reader page does. You may be able to use that to make the check quicker and more reliable than grepping the full text.

Michael

--
Michael B. Klein
Digital Initiatives Technology Librarian
Boston Public Library
(617) 859-2391
[EMAIL PROTECTED]

> From: Jonathan Rochkind <[EMAIL PROTECTED]>
> Reply-To: "Code for Libraries <CODE4LIB@LISTSERV.ND.EDU>"
> <CODE4LIB@LISTSERV.ND.EDU>
> Date: Fri, 27 Jun 2008 12:00:54 -0400
> To: <CODE4LIB@LISTSERV.ND.EDU>
> Subject: Re: [CODE4LIB] Amazon Web Services and search-inside-the-book
>
> Excellent, thanks Charles.
>
> I can tell you that my technique seems to be working fine, if you want
> to try it too.
>
> Construct a URL:
>
> http://www.amazon.com/gp/reader/ASIN
>
> Request the URL. Grep the response for "book is temporarily
> unavailable"; if you get it, there's no search inside the book. If you
> don't get it, there is search inside the book. (Sadly, the response
> still has a 200 HTTP status either way.)
>
> I want to see whether I can just do a HEAD request and tell presence
> from absence of search inside by the advertised length of the response.
> That's Terry Reese's preferred way of checking for legitimate content at
> the end of a URL: guessing from the content length with just a HEAD
> request. Not sure whether that will work here. It would potentially be
> somewhat more efficient if it did.
>
> Jonathan
> --
> Jonathan Rochkind
> Digital Services Software Engineer
> The Sheridan Libraries
> Johns Hopkins University
> 410.516.8886
> rochkind (at) jhu.edu
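Jonathan's phrase check and Michael's <title> observation can both be written as small predicates over the /gp/reader/ASIN page. In this minimal Python sketch the function names are hypothetical. The title test assumes that the <title> element, when present, appears before </head>, so reading can stop there instead of downloading the whole page. Both checks depend on Amazon's page markup as described in this thread, which may since have changed.

```python
import urllib.request
from typing import IO, Iterable, Iterator

READER_URL = "http://www.amazon.com/gp/reader/{asin}"
UNAVAILABLE_PHRASE = "book is temporarily unavailable"


def reader_url(asin: str) -> str:
    """Build the reader URL described in the thread."""
    return READER_URL.format(asin=asin)


def search_inside_by_phrase(html: str) -> bool:
    # Jonathan's check: the phrase appears only when search inside is absent.
    return UNAVAILABLE_PHRASE not in html


def search_inside_by_title(chunks: Iterable[bytes]) -> bool:
    # Michael's observation: the "unavailable" page has no <title>, the reader does.
    # Assumption: <title> sits in <head>, so stop scanning once </head> is seen.
    buffer = b""
    for chunk in chunks:
        buffer += chunk.lower()  # accumulate so tags split across chunks still match
        title_at = buffer.find(b"<title")
        head_end = buffer.find(b"</head>")
        if title_at != -1 and (head_end == -1 or title_at < head_end):
            return True
        if head_end != -1:
            return False  # head closed without a title
    return False  # body ended without a title


def iter_chunks(resp: IO[bytes], size: int = 4096) -> Iterator[bytes]:
    """Yield the response body in fixed-size pieces until it is exhausted."""
    while True:
        chunk = resp.read(size)
        if not chunk:
            return
        yield chunk


def check_asin(asin: str, timeout: float = 10.0) -> bool:
    """GET the reader page and apply the title test, reading only the head."""
    with urllib.request.urlopen(reader_url(asin), timeout=timeout) as resp:
        return search_inside_by_title(iter_chunks(resp))
```

Because `search_inside_by_title` consumes the body lazily, `check_asin` stops reading as soon as the answer is known, which is where the speedup over grepping the full text would come from.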