Chirag said:
> Can you send me the link to a page that has this problem -- I'll run some
> tests to see what's causing this.

Unless I'm reading this thread incorrectly, the following log line shows this malady:

fetch okay, but can't parse http://www.tea.state.tx.us/waivers/granted.html, reason: Content-Type not application/msword

The affected URLs very rarely have a .doc extension. In a fetch of some 200,000 pages I got this error about 3,000 times. Here are some other pages with this malady:

http://www.commerce.ubc.ca/ (a redirect)
http://www.siue.edu/BUSINESS/econfin/
http://www.nmhu.edu/business/
http://www.juntadeandalucia.es/economiayhacienda/

- Bill

--
Bill Goffe
Department of Economics
SUNY Oswego
416 Mahar Hall
Oswego, NY 13126
voice: (315) 312-3444
fax: (315) 312-5444
<wuecon.wustl.edu/~goffe>
