BTW, I acknowledge the hard work of you guys on my blog:
http://paraviya.blogspot.com/2006/02/search-your-intranet-with-nutch.html
Hope you don't mind how I color coded Nutch :)

Thanks,
Thushara

-----Original Message-----
From: Thushara Wijeratna 
Sent: Friday, February 10, 2006 5:45 PM
To: 'Teruhiko Kurosaka'
Cc: [email protected]
Subject: RE: Authentication / Content-type 

Yes, your suggested change seems more appropriate. Let me know if you
make the change and want me to test it.

BTW, we didn't have a search for our wiki and thanks to Nutch, we have a
pretty cool search! I've indexed across 4 different intranet sites and
finding stuff is really easy! You guys rock!

Thushara

-----Original Message-----
From: Teruhiko Kurosaka [mailto:[EMAIL PROTECTED] 
Sent: Friday, February 10, 2006 4:24 PM
To: Thushara Wijeratna
Cc: [email protected]
Subject: RE: Authentication / Content-type 

Sorry for the late response.
 
Do you mean there are two kinds of headers, one with a lowercase "t"
and the other with an uppercase "T"?  If you mean that,
there are more possibilities, such as "CONTENT-TYPE", "content-type",
or even "cONtenT-tYPe", because the HTTP spec says header field names
are case-insensitive.

4.2 Message Headers

HTTP header fields, ... follow the same generic format as that given in
Section 3.1 of RFC 822 [9]. ... Field names are case-insensitive.


So the right way seems to be to change the getHeader method implementation
to compare names in a case-insensitive manner.
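
A minimal sketch of that idea, assuming the response keeps its headers
in a map. The class name CaseInsensitiveHeaders and the addHeader method
are hypothetical and only for illustration; this is not the actual Nutch
HttpResponse code.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for the header storage inside HttpResponse.
class CaseInsensitiveHeaders {
    // A TreeMap ordered by String.CASE_INSENSITIVE_ORDER treats
    // "Content-Type", "Content-type" and "CONTENT-TYPE" as the same key.
    private final Map<String, String> headers =
            new TreeMap<>(String.CASE_INSENSITIVE_ORDER);

    // Stores a header; a later value replaces an earlier one whose name
    // differs only in case.
    void addHeader(String name, String value) {
        headers.put(name, value);
    }

    // Looks up a header by name, ignoring case; returns null if absent.
    String getHeader(String name) {
        return headers.get(name);
    }
}
```

With a lookup like this, a single getHeader("Content-Type") call would
cover every capitalization, so a two-step fallback should not be needed.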

Sorry if I missed your point.

-Kuro

> -----Original Message-----
> From: Thushara Wijeratna [mailto:[EMAIL PROTECTED] 
> Sent: 2006-1-19 14:08
> To: [email protected]
> Subject: Authentication / Content-type 
> 
> Hi,
> 
> I used nutch-0.7.1 to index an intranet. It is a really great tool,
> thanks for developing it! I had to hack something quick for
> Authentication (somehow couldn't get the crawler to accept the
> http.auth.basic.user etc). I also found an issue where parsing an html
> page returned an error "Content type is xml not html". Turns out that
> sometimes the string "Content-Type" is used instead of "Content-type".
> So I hacked HttpResponse.java - toContent method like this:
> 
>             String contentType = getHeader("Content-type");
>             // Fall back to the other capitalization if the first lookup fails
>             if (contentType == null) {
>                 contentType = getHeader("Content-Type");
>             }
> 
> Just thought I'd share this with you all.



_______________________________________________
Nutch-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-developers
