I always like answering my own questions =) So, the way I fixed this was to hack at the HttpResponse object in the http protocol.
Basically, I added Pragma no-cache headers, keep-alive and keep-alive connection time values, and a last-modified-since header. All of that seemed to work well.

Then I found another issue: we were not checking for a transfer encoding of "chunked". So, if that comes in, I now send the stream to the readChunkedEncoding method. All of my feed readers seem to work now.

Now I just have issues with the Fetcher (and Fetcher2) blocking on socket.read. 1-5 threads seem to work fine, but I get thread waits once I pass the 10-thread mark. Very strange.

sdeck wrote:
>
> Hey all,
> I have played around with the HTTPResponse object for a few days now
> trying to figure this out. Not the httpclient plugin, just the http
> plugin.
> It seems that certain rss feeds don't get fully read. Here is an example
> url: http://blog.news-record.com/sportsextra/index.xml
>
> It does not seem to happen on all of my feeds, just some of them. Let's
> say the content-length comes back as 5K, well the response may read
> something like 3K, but then return -1 (EOF) and the response just goes on.
> No timeout exception, no exception at all.
> I have tried so many different things. Adding in sleeps to pause and then
> try and keep reading data. I have tried switching to httpclient, and it
> does the same thing. The weird thing, I put the url into my browser and
> it loads fine.
>
> So, the question is, has anyone run into the socket not really returning
> all data without throwing an exception? Or, can someone try the above url
> and see if they also run into the issue?
> I have more example urls. The only connection I seem to find, is that
> they all map to
> application/xhtml+xml
>
> Thoughts anyone?
> Scott
>

--
View this message in context: http://www.nabble.com/httpresponse-%2B-xml-%3D-not-reading-all-bytes-tf3146593.html#a8774451
Sent from the Nutch - User mailing list archive at Nabble.com.
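
For anyone who hits the same thing, here is a rough sketch of the chunked part of the fix described in the reply above: check the Transfer-Encoding header and, when it says "chunked", decode the body chunk by chunk instead of trusting Content-Length. This is not the actual Nutch readChunkedEncoding code. The class name ChunkedBodySketch and the helpers isChunked, readLine, readChunked and decodeChunked are all made up for illustration, and it assumes the response headers have already been consumed from the stream.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of Nutch.
final class ChunkedBodySketch {

    /** True when the Transfer-Encoding header value asks for chunked decoding. */
    static boolean isChunked(String transferEncoding) {
        return transferEncoding != null
                && transferEncoding.trim().equalsIgnoreCase("chunked");
    }

    /** Reads one line as ASCII and returns it without the CR/LF ending. */
    static String readLine(InputStream in) throws IOException {
        ByteArrayOutputStream line = new ByteArrayOutputStream();
        int b;
        while ((b = in.read()) != -1) {
            if (b == '\n') {
                break;
            }
            if (b != '\r') {
                line.write(b);
            }
        }
        if (b == -1 && line.size() == 0) {
            throw new EOFException("stream ended before a complete line");
        }
        return new String(line.toByteArray(), StandardCharsets.US_ASCII);
    }

    /** Decodes a chunked body: hex size line, chunk data, CRLF, until size 0. */
    static byte[] readChunked(InputStream in) throws IOException {
        ByteArrayOutputStream body = new ByteArrayOutputStream();
        while (true) {
            String sizeLine = readLine(in);
            int semi = sizeLine.indexOf(';');      // drop any chunk extensions
            if (semi >= 0) {
                sizeLine = sizeLine.substring(0, semi);
            }
            int size = Integer.parseInt(sizeLine.trim(), 16);
            if (size == 0) {
                break;                             // last chunk
            }
            byte[] chunk = new byte[size];
            int off = 0;
            while (off < size) {                   // read() may return short counts
                int n = in.read(chunk, off, size - off);
                if (n == -1) {
                    // Fail loudly instead of silently returning a short body.
                    throw new EOFException("EOF inside a chunk");
                }
                off += n;
            }
            body.write(chunk, 0, size);
            readLine(in);                          // CRLF that follows chunk data
        }
        while (!readLine(in).isEmpty()) {
            // skip optional trailer headers up to the closing blank line
        }
        return body.toByteArray();
    }

    /** Convenience wrapper for trying the decoder on an in-memory string. */
    static String decodeChunked(String wire) {
        byte[] raw = wire.getBytes(StandardCharsets.UTF_8);
        try {
            byte[] body = readChunked(new ByteArrayInputStream(raw));
            return new String(body, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

To try it without a socket, something like ChunkedBodySketch.decodeChunked("4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n") should hand back the reassembled body. The inner read loop is there because a single read() call is allowed to return fewer bytes than requested, and the sketch throws on an early EOF rather than carrying on with a partial body, which is the silent failure described in the original question.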
