BTW, I acknowledge the hard work of you guys on my blog:
http://paraviya.blogspot.com/2006/02/search-your-intranet-with-nutch.html
Hope you don't mind how I color coded Nutch :)

Thanks,
Thushara

-----Original Message-----
From: Thushara Wijeratna
Sent: Friday, February 10, 2006 5:45 PM
To: 'Teruhiko Kurosaka'
Cc: [email protected]
Subject: RE: Authentication / Content-type

Yes, your suggested change seems more appropriate. Let me know if you make the change and want me to test it.

BTW, we didn't have a search for our wiki and thanks to Nutch, we have a pretty cool search! I've indexed across 4 different intranet sites and finding stuff is really easy! You guys rock!

Thushara

-----Original Message-----
From: Teruhiko Kurosaka [mailto:[EMAIL PROTECTED]
Sent: Friday, February 10, 2006 4:24 PM
To: Thushara Wijeratna
Cc: [email protected]
Subject: RE: Authentication / Content-type

Sorry for a late response.

Do you mean there are two kinds of headers, one with lowercase "t" and the other with the uppercase "T"? If you mean that, there are more possibilities, such as "CONTENT-TYPE", "content-type", or even "cONtenT-tYPe", because the HTTP spec says the header field names are case-insensitive.

    4.2 Message Headers
    HTTP header fields,... follow the same generic format as
    that given in Section 3.1 of RFC 822 [9]. ...Field names
    are case-insensitive.

So the right way seems to be to change the getHeader method implementation to compare names in a case-insensitive manner.

Sorry if I missed your point.

-Kuro

> -----Original Message-----
> From: Thushara Wijeratna [mailto:[EMAIL PROTECTED]
> Sent: 2006-1-19 14:08
> To: [email protected]
> Subject: Authentication / Content-type
>
> Hi,
>
> I used nutch-0.7.1 to index an intranet. It is a really great tool,
> thanks for developing it! I had to hack something quick for
> Authentication (somehow couldn't get the crawler to accept the
> http.auth.basic.user etc). I also found an issue where parsing an html
> page returned an error "Content type is xml not html". Turns out that
> sometimes the string "Content-Type" is used instead of "Content-type".
> So I hacked HttpResponse.java - toContent method like this:
>
>     String contentType = getHeader("Content-type");
>     if (contentType == null) {
>         contentType = getHeader("Content-Type");
>     }
>
> Just thought I'll share with you all.

_______________________________________________
Nutch-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-developers
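
For illustration, the case-insensitive lookup Kuro suggests could be sketched roughly as follows. This is only a minimal sketch, not the actual Nutch HttpResponse code: the class name CaseInsensitiveHeaders and its put method are hypothetical, and it assumes the headers are held in a simple name-to-value map.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper for illustration only; not part of Nutch.
class CaseInsensitiveHeaders {
    // A TreeMap ordered by String.CASE_INSENSITIVE_ORDER treats
    // "Content-type", "Content-Type" and "CONTENT-TYPE" as the same key.
    private final Map<String, String> headers =
            new TreeMap<>(String.CASE_INSENSITIVE_ORDER);

    // Store a header as it arrived on the wire, whatever its casing.
    void put(String name, String value) {
        headers.put(name, value);
    }

    // Look up a header without caring about the casing of the name.
    // Returns null when the header is absent.
    String getHeader(String name) {
        return headers.get(name);
    }

    public static void main(String[] args) {
        CaseInsensitiveHeaders h = new CaseInsensitiveHeaders();
        h.put("Content-Type", "text/html");
        // Both spellings from the thread resolve to the same value.
        System.out.println(h.getHeader("Content-type"));
        System.out.println(h.getHeader("cONtenT-tYPe"));
    }
}
```

With a lookup like this, the two-step fallback in toContent would reduce to a single getHeader("Content-Type") call, and any other casing a server sends would be covered as well.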
