I always like answering my own questions =)
So, the way I fixed this was to modify the HttpResponse object in the http
protocol plugin.

Basically, I added:
 - Pragma no-cache headers
 - keep-alive and keep-alive connection time values
 - a last-modified-since header
All of that seemed to work well.
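
For anyone wanting to try the same thing, here is a rough sketch of a request carrying those extra headers. It is not the actual change to HttpResponse: the class and method names are made up for illustration, the header values (such as the Keep-Alive time of 300) are assumptions, and I am assuming "last modified since" corresponds to the standard If-Modified-Since request header.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Nutch: builds a raw GET request
// that includes the extra headers listed above.
final class RequestHeadersSketch {

    static String buildRequest(String host, String path, String ifModifiedSince) {
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("Host", host);
        headers.put("Pragma", "no-cache");        // ask caches for a fresh copy
        headers.put("Connection", "keep-alive");  // keep the connection open
        headers.put("Keep-Alive", "300");         // assumed value, in seconds
        if (ifModifiedSince != null) {
            // Assumption: the "last modified since" header is If-Modified-Since.
            headers.put("If-Modified-Since", ifModifiedSince);
        }

        StringBuilder request = new StringBuilder();
        request.append("GET ").append(path).append(" HTTP/1.1\r\n");
        for (Map.Entry<String, String> header : headers.entrySet()) {
            request.append(header.getKey()).append(": ")
                   .append(header.getValue()).append("\r\n");
        }
        request.append("\r\n"); // blank line ends the header section
        return request.toString();
    }
}
```

Calling buildRequest("blog.news-record.com", "/sportsextra/index.xml", null) gives the request text to write to the socket before reading the response.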
Then I found another issue: we were not checking for a transfer
encoding of "chunked". So, if that comes in, I now send the stream to the
readChunkedEncoding method.
All of my feed readers seem to work now.
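
To show why an unhandled chunked response looks like a short or garbled read, here is a minimal sketch of decoding a chunked body. It is not the readChunkedEncoding method itself; the class and method names are hypothetical, and it assumes the stream is positioned just after the response headers.

```java
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper, not the Nutch implementation.
final class ChunkedBodySketch {

    // Reads one line terminated by LF and returns it without CR/LF.
    static String readLine(InputStream in) throws IOException {
        ByteArrayOutputStream line = new ByteArrayOutputStream();
        while (true) {
            int b = in.read();
            if (b == -1) throw new EOFException("stream ended mid-line");
            if (b == '\n') return line.toString("US-ASCII");
            if (b != '\r') line.write(b);
        }
    }

    // Decodes a "Transfer-Encoding: chunked" body into plain bytes.
    static byte[] readChunkedBody(InputStream in) throws IOException {
        ByteArrayOutputStream body = new ByteArrayOutputStream();
        while (true) {
            String sizeLine = readLine(in);
            int semi = sizeLine.indexOf(';');   // ignore chunk extensions
            if (semi >= 0) sizeLine = sizeLine.substring(0, semi);
            int size = Integer.parseInt(sizeLine.trim(), 16); // size is hex
            if (size == 0) break;               // zero-size chunk ends the body

            byte[] chunk = new byte[size];
            int off = 0;
            while (off < size) {                // read() may return fewer bytes
                int n = in.read(chunk, off, size - off);
                if (n == -1) throw new EOFException("stream ended inside a chunk");
                off += n;
            }
            body.write(chunk, 0, size);
            readLine(in);                       // consume CRLF after chunk data
        }
        // Skip optional trailer headers up to the final blank line.
        while (!readLine(in).isEmpty()) {
            // nothing to do, trailers are discarded in this sketch
        }
        return body.toByteArray();
    }
}
```

The inner loop matters: a single read() can return fewer bytes than requested, so the code keeps reading until the whole chunk has arrived or the stream really ends.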

Now I just have issues with the Fetcher (and Fetcher2) blocking on
socket.read call(s).
1-5 threads seem to work fine, but I get thread waits once I pass
the 10-thread mark. Very strange.



sdeck wrote:
> 
> Hey all,
>  I have played around with the HTTPResponse object for a few days now
> trying to figure this out. Not the httpclient plugin, just the http
> plugin.
> It seems that certain RSS feeds don't get fully read.  Here is an example
> url: http://blog.news-record.com/sportsextra/index.xml
> 
> It does not seem to happen on all of my feeds, just some of them.  Let's
> say the content-length comes back as 5K, well the response may read
> something like 3K, but then return -1 (EOF) and the response just goes on.
> No timeout exception, no exception at all. 
> I have tried so many different things. Adding in sleeps to pause and then
> try and keep reading data. I have tried switching to httpclient, and it
> does the same thing.  The weird thing, I put the url into my browser and
> it loads fine.
> 
> So, the question is, has anyone run into the socket not really returning
> all data without throwing an exception? Or, can someone try the above url
> and see if they also run into the issue?
> I have more example urls.  The only connection I seem to find, is that
> they all map to
> application/xhtml+xml
> 
> Thoughts anyone?
> Scott
> 

-- 
View this message in context: 
http://www.nabble.com/httpresponse-%2B-xml-%3D-not-reading-all-bytes-tf3146593.html#a8774451
Sent from the Nutch - User mailing list archive at Nabble.com.


_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general