> I'm not sure if that is the right thing.
> If the site administrator did a poor job and a wrong media type is
> advertised, it's the site's problem and Nutch shouldn't be fixing it,
> in my opinion. Those sites would not work properly with browsers
> anyway, and Nutch doesn't need to handle them properly either, except
> that it should protect itself from crashing. I tried to visit your
> fake.zip page with IE and Firefox, and both faithfully trusted the
> media type as advertised by the server and asked me whether I wanted
> to open it with WinZip or save it; there was no option to open it as
> HTML.
> Why should Nutch treat it as HTML?

Simply because it is an HTML file. It has a strange name, of course, but it
is an HTML file.
My example is a kind of "caricature", but a more realistic case could be an
HTML file served with a text/plain or text/xml content-type.
Finally, isn't it good news that Nutch seems to be more "intelligent" at
content-type guessing than Firefox or IE?
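For illustration, here is a minimal sketch of the kind of content sniffing discussed above: look at the first bytes of the body and, if they look like HTML, prefer text/html over the advertised Content-Type. This is not Nutch's actual implementation; the class `HtmlSniffer`, its methods, the 512-byte prefix, and the marker list are all hypothetical choices made for this example.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;

/**
 * Hypothetical helper (not Nutch's real API): decides whether the first
 * bytes of a response look like HTML, regardless of the Content-Type header.
 */
public class HtmlSniffer {

    // Assumed markers that commonly open an HTML document (lower-case).
    private static final String[] HTML_MARKERS = {
        "<!doctype html", "<html", "<head", "<body", "<title"
    };

    /** True if the body starts (after a BOM and whitespace) with an HTML marker. */
    public static boolean looksLikeHtml(byte[] content) {
        // Inspect only a short prefix of the body.
        int len = Math.min(content.length, 512);
        // ISO-8859-1 maps each byte to one char, so no decoding errors occur.
        String prefix = new String(content, 0, len, StandardCharsets.ISO_8859_1);
        // Skip a UTF-8 byte-order mark (EF BB BF) if present.
        if (prefix.startsWith("\u00EF\u00BB\u00BF")) {
            prefix = prefix.substring(3);
        }
        String head = prefix.trim().toLowerCase(Locale.ROOT);
        for (String marker : HTML_MARKERS) {
            if (head.startsWith(marker)) {
                return true;
            }
        }
        return false;
    }

    /** Sniffed HTML wins over the advertised type; otherwise trust the server. */
    public static String effectiveType(String advertised, byte[] content) {
        return looksLikeHtml(content) ? "text/html" : advertised;
    }

    public static void main(String[] args) {
        byte[] fakeZip = "<html><body>Hello</body></html>".getBytes(StandardCharsets.UTF_8);
        byte[] realZip = {0x50, 0x4B, 0x03, 0x04}; // "PK\3\4" ZIP local-file signature
        System.out.println(effectiveType("application/zip", fakeZip)); // text/html
        System.out.println(effectiveType("application/zip", realZip)); // application/zip
    }
}
```

The design choice here is deliberately conservative: the advertised type is overridden only when the body positively looks like HTML, so a genuine ZIP served as application/zip is still treated as a ZIP.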

Jérôme

--
http://motrech.free.fr/
http://www.frutch.org/
