On 7/27/07, Carl Cerecke <[EMAIL PROTECTED]> wrote:
> Carl Cerecke wrote:
> > The problem is that the contentType for the page (that it was redirected
> > to) is null.
> >
> > Changing Content.java:165 to:
> > Text.writeString(out, contentType != null ? contentType : ""); // write contentType
> >
> > fixes the problem. But is an empty string better for an unknown content
> > type, or something like text/x-unknown-content-type?
>
> rfc2616 (HTTP/1.1) section 7.2.1
> (http://www.w3.org/Protocols/rfc2616/rfc2616-sec7.html) says:
>
> Any HTTP/1.1 message containing an entity-body SHOULD include a
> Content-Type header field defining the media type of that body. If and
> only if the media type is not given by a Content-Type field, the
> recipient MAY attempt to guess the media type via inspection of its
> content and/or the name extension(s) of the URI used to identify the
> resource. If the media type remains unknown, the recipient SHOULD treat
> it as type "application/octet-stream".

Thanks for the explanation. I think marking unknown content types as
"application/octet-stream" sounds good. Can you open a JIRA issue and
attach a patch against trunk so we can discuss it further? (Here is
a tutorial on how to create patches:
http://wiki.apache.org/nutch/HowToContribute)
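
To make the suggestion concrete, here is a rough sketch of the fallback,
not the actual patch. The class and method names (ContentTypeDefault,
orDefault, writeContentType, readContentType) are made up for illustration,
and java.io.DataOutputStream.writeUTF stands in for the Text.writeString
call in Content.java so that the snippet runs without Nutch or Hadoop on
the classpath.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical helper; none of these names come from the Nutch code base.
final class ContentTypeDefault {
    // RFC 2616 section 7.2.1: an unknown media type SHOULD be treated
    // as "application/octet-stream".
    static final String UNKNOWN_CONTENT_TYPE = "application/octet-stream";

    // Returns the given content type, or the RFC fallback when it is null.
    static String orDefault(String contentType) {
        return contentType != null ? contentType : UNKNOWN_CONTENT_TYPE;
    }

    // Stand-in for the serialization step: writeUTF would throw a
    // NullPointerException on null, so the value is defaulted first.
    static byte[] writeContentType(String contentType) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeUTF(orDefault(contentType));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Reads back a content type written by writeContentType.
    static String readContentType(byte[] bytes) {
        try (DataInputStream in =
                 new DataInputStream(new ByteArrayInputStream(bytes))) {
            return in.readUTF();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // A null content type (e.g. from a redirected page) round-trips
        // as the fallback; a known type is left unchanged.
        System.out.println(readContentType(writeContentType(null)));
        System.out.println(readContentType(writeContentType("text/html")));
    }
}
```

Defaulting at write time keeps null out of the serialized record, but the
same check could just as well live wherever contentType is first assigned.
That is one of the things worth discussing on the issue.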

>
> Cheers,
> Carl.
>


-- 
Doğacan Güney