Hey all, I have played around with the HTTPResponse object for a few days now trying to figure this out. Not the httpclient plugin, just the http plugin. It seems that certain RSS feeds don't get fully read. Here is an example URL: http://blog.news-record.com/sportsextra/index.xml
It does not happen on all of my feeds, just some of them. Say the Content-Length comes back as 5K: the response may read only about 3K, then return -1 (EOF), and processing just carries on. No timeout exception, no exception at all.

I have tried many different things. I added sleeps to pause and then tried to keep reading data. I switched to httpclient, and it does the same thing. The weird thing is that when I put the URL into my browser, it loads fine.

So, the question is: has anyone run into a socket not returning all of the data without throwing an exception? Or can someone try the above URL and see whether they also run into the issue? I have more example URLs. The only connection I can find is that they all map to application/xhtml+xml.

Thoughts, anyone?

Scott
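
For reference, the symptom can be made visible with a read loop along these lines. This is only a minimal sketch, not the actual code in the http plugin: `BodyReader`, `readBody` and `shortfall` are made-up names, and it assumes the Content-Length header has already been parsed into an int. It does not explain why the stream ends early; it only reports how many declared bytes never arrived.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper for illustration; not part of Nutch or its http plugin.
final class BodyReader {

    /**
     * Reads from the stream until expectedLength bytes have arrived
     * or read() returns -1 (EOF). A single read() call may return fewer
     * bytes than requested, so the loop keeps going until one of those happens.
     */
    static byte[] readBody(InputStream in, int expectedLength) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        int total = 0;
        while (total < expectedLength) {
            int want = Math.min(buf.length, expectedLength - total);
            int n = in.read(buf, 0, want);
            if (n == -1) {
                break; // stream ended before the declared length was reached
            }
            out.write(buf, 0, n);
            total += n;
        }
        return out.toByteArray();
    }

    /** Number of declared bytes that never arrived; 0 means the body is complete. */
    static int shortfall(byte[] body, int expectedLength) {
        return Math.max(0, expectedLength - body.length);
    }
}
```

With a loop like this, a non-zero `shortfall` after `readBody` returns is exactly the case described above: EOF arrives early and nothing is thrown.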
