Bugs item #988325, was opened at 2004-07-09 18:41
Message generated for change (Comment added) made by cutting
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=491356&aid=988325&group_id=59548

Category: None
Group: None
>Status: Open
>Resolution: None
Priority: 2
Submitted By: Fabio Gasparetti (magnum74)
Assigned to: Nobody/Anonymous (nobody)
Summary: case insensitive hostname

Initial Comment:
The RegexURLFilter does not treat hostnames as case-insensitive. If
your site has two links, mysite.net/ and MySite.net/, you need to
specify something like [Mm][Yy][Ss]... in the urlfilter.txt file to
catch both of them. Perhaps a simple reminder in the accept-host
comment would be useful.

----------------------------------------------------------------------

>Comment By: Doug Cutting (cutting)
Date: 2004-07-10 13:08

Message:
Logged In: YES
user_id=21778

You're right, I spoke too soon. The URL has not yet been normalized
at this point. I think the best fix is to normalize the link in the
Outlink constructor, as is done in Page.java.

----------------------------------------------------------------------

Comment By: Fabio Gasparetti (magnum74)
Date: 2004-07-10 10:38

Message:
Logged In: YES
user_id=666942

Yeah, but as far as I can see in the source, the normalization
happens in the Link constructor, after the URL has already been
filtered by the call

URLFilterFactory.getFilter().filter(url);

in the UpdateDatabaseTool.pageContentsChanged() method.

----------------------------------------------------------------------

Comment By: Doug Cutting (cutting)
Date: 2004-07-10 08:41

Message:
Logged In: YES
user_id=21778

Nutch always normalizes hostnames to lowercase before filtering
them, so checking is already case-insensitive.

----------------------------------------------------------------------
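The ordering issue discussed in this thread (filtering before the hostname is lowercased) can be illustrated with a minimal, standalone sketch. This is not Nutch code: the class HostCaseSketch, the methods lowercaseHost and accepts, and the ACCEPT pattern are all hypothetical names chosen for illustration, and the sketch only assumes that the hostname should be lowercased before a lowercase-only regex rule is applied.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.util.Locale;
import java.util.regex.Pattern;

// Hypothetical illustration only; none of these names come from Nutch.
public class HostCaseSketch {
    // Assumed accept rule, written in lowercase only
    // (no [Mm][Yy][Ss]... character classes needed).
    static final Pattern ACCEPT =
        Pattern.compile("^http://([a-z0-9-]+\\.)*mysite\\.net/");

    // Rebuild the URL with its hostname lowercased.
    // Returns null if the input is not a parseable URL.
    // Note: this 4-argument URL constructor drops any #fragment.
    static String lowercaseHost(String urlString) {
        try {
            URL url = new URL(urlString);
            String host = url.getHost().toLowerCase(Locale.ROOT);
            return new URL(url.getProtocol(), host, url.getPort(), url.getFile())
                .toString();
        } catch (MalformedURLException e) {
            return null;
        }
    }

    // Normalize first, then filter, so the rule matches regardless of host case.
    static boolean accepts(String urlString) {
        String normalized = lowercaseHost(urlString);
        return normalized != null && ACCEPT.matcher(normalized).find();
    }

    public static void main(String[] args) {
        System.out.println(accepts("http://mysite.net/"));
        System.out.println(accepts("http://MySite.net/"));
    }
}
```

With normalization applied ahead of the pattern match, both mysite.net/ and MySite.net/ are handled by the same lowercase rule. Only the hostname is lowercased here; the path is left untouched because paths can be case-sensitive.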
_______________________________________________
Nutch-developers mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/nutch-developers
