Hey all,
I have been playing around with the HTTPResponse object for a few days now,
trying to figure this out. This is with the http plugin, not the httpclient
plugin. It seems that certain RSS feeds don't get fully read. Here is an
example URL: http://blog.news-record.com/sportsextra/index.xml

It does not happen on all of my feeds, just some of them. Say the
Content-Length comes back as 5K: the response may read only about 3K, then
the read returns -1 (EOF) and processing just goes on. No timeout exception,
no exception at all.
I have tried many different things: adding sleeps to pause and then trying
to keep reading data, and switching to httpclient, which does the same
thing. The weird thing is that when I put the URL into my browser, it loads
fine.
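
For anyone who wants to reproduce this outside Nutch, below is a minimal
sketch in plain Java. It is not the Nutch plugin code: the class name
TruncationCheck and its two helper methods are made up for illustration, and
it uses the JDK's HttpURLConnection rather than a raw socket, so it may not
behave exactly like the http plugin. It reads until EOF, then compares the
byte count against the declared Content-Length and prints the headers that
might explain a difference.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;

// Hypothetical diagnostic helper, not part of Nutch.
class TruncationCheck {

    // Reads the stream until read() reports EOF (-1); returns the bytes seen.
    static long countBytes(InputStream in) throws IOException {
        byte[] buf = new byte[4096];
        long total = 0;
        int n;
        while ((n = in.read(buf)) != -1) {
            total += n;
        }
        return total;
    }

    // True when a Content-Length was declared and fewer bytes arrived.
    // A declared value of -1 means the server sent no Content-Length.
    static boolean isTruncated(long declared, long actual) {
        return declared >= 0 && actual < declared;
    }

    public static void main(String[] args) throws IOException {
        // Defaults to the feed from this post; pass another URL as args[0].
        String url = args.length > 0
                ? args[0]
                : "http://blog.news-record.com/sportsextra/index.xml";

        HttpURLConnection conn =
                (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setConnectTimeout(10_000);
        conn.setReadTimeout(10_000);

        long declared = conn.getContentLengthLong();
        long actual;
        try (InputStream in = conn.getInputStream()) {
            actual = countBytes(in);
        }

        System.out.println("Content-Type:      " + conn.getContentType());
        System.out.println("Content-Encoding:  " + conn.getContentEncoding());
        System.out.println("Transfer-Encoding: "
                + conn.getHeaderField("Transfer-Encoding"));
        System.out.println("Declared length:   " + declared);
        System.out.println("Bytes read:        " + actual);
        System.out.println("Truncated:         " + isTruncated(declared, actual));
    }
}
```

Running it against a feed that works and one that doesn't, and comparing the
printed headers, should at least show whether the short read also happens
outside the plugin.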

So the question is: has anyone run into a socket not returning all of the
data without throwing an exception? Or can someone try the above URL and
see if they also run into the issue?
I have more example URLs. The only connection I can find is that they all
map to application/xhtml+xml.

Thoughts anyone?
Scott
-- 
View this message in context: 
http://www.nabble.com/httpresponse-%2B-xml-%3D-not-reading-all-bytes-tf3146593.html#a8722984
Sent from the Nutch - User mailing list archive at Nabble.com.