On 7/27/07, Carl Cerecke <[EMAIL PROTECTED]> wrote:
> Carl Cerecke wrote:
> > The problem is that the contentType for the page (that it was redirected
> > to) is null.
> >
> > Changing Content.java:165 to:
> > Text.writeString(out, contentType != null ? contentType : ""); // write contentType
> >
> > fixes the problem. But is empty string better for an unknown content
> > type or something like text/x-unknown-content-type
>
> rfc2616 (HTTP/1.1) section 7.2.1
> (http://www.w3.org/Protocols/rfc2616/rfc2616-sec7.html) says:
>
> Any HTTP/1.1 message containing an entity-body SHOULD include a
> Content-Type header field defining the media type of that body. If and
> only if the media type is not given by a Content-Type field, the
> recipient MAY attempt to guess the media type via inspection of its
> content and/or the name extension(s) of the URI used to identify the
> resource. If the media type remains unknown, the recipient SHOULD treat
> it as type "application/octet-stream".
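
For illustration only, the null check quoted above could substitute the RFC's fallback type instead of an empty string. This is a rough sketch, not the actual Nutch patch: the class name, the `orDefault` helper, and the `UNKNOWN_TYPE` constant are hypothetical, and `DataOutputStream.writeUTF` stands in for `Text.writeString` so that the example runs without Hadoop on the classpath.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class ContentTypeFallback {
    // Fallback named in RFC 2616 section 7.2.1 for a media type that remains unknown.
    static final String UNKNOWN_TYPE = "application/octet-stream";

    // Hypothetical helper: returns the given type, or the fallback when it is null.
    static String orDefault(String contentType) {
        return contentType != null ? contentType : UNKNOWN_TYPE;
    }

    // Stand-in for the write at Content.java:165 (assumption: the real code
    // calls Text.writeString on a DataOutput; writeUTF is used here instead).
    static byte[] writeContentType(String contentType) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(orDefault(contentType)); // never passes null to the writer
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // A redirected page with no Content-Type no longer fails on write.
        System.out.println(orDefault(null));
        System.out.println(writeContentType(null).length + " bytes written");
    }
}
```

Whether the fallback belongs at write time or earlier, when the content object is constructed, is a design question the sketch does not settle.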

Thanks for the explanation. I think marking unknown content types as
"application/octet-stream" sounds good. Can you open a JIRA issue and
attach a patch against trunk, so we can discuss it further? (Here is a
tutorial on how you can create patches:
http://wiki.apache.org/nutch/HowToContribute )

>
> Cheers,
> Carl.
>

-- 
Doğacan Güney
_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
