problem with RegExURLFilter class

ajaxtrend Sun, 19 Oct 2008 22:55:31 -0700

Hi,
   I am facing a strange problem with the regexes for URLs listed in 
crawl-urlfilter.txt. Before using any regex for URLs, I test it in a 
standalone class, and it works correctly, i.e. pattern.matcher(url).find() 
returns true.
But when the same URL and regex are used during crawling, it returns false. I am 
not sure why it behaves differently.
Let me give an example.


RegEx in crawl-urlfilter.txt :

^http://bangalore.locanto.in/(used-cars|ID_\\d+)/((\\d*/(\\d+/)*)|(.*.html))

URL: http://bangalore.locanto.in/used-cars/902/

During standalone testing (not in the Nutch environment), 
pattern.matcher(url).find() returns true. However, in the Nutch environment 
it returns false.
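
One possible explanation, not confirmed anywhere in this thread, is backslash escaping. In Java source, "\\d" is compiled to the regex \d (a digit). If the filter file is read as plain text with no Java-style escape processing (an assumption about Nutch here), the same characters reach the regex engine as \\d, which means a literal backslash followed by the letter d. The sketch below reproduces both cases outside Nutch; the class and constant names are made up for illustration.

```java
import java.util.regex.Pattern;

class RegexEscapeCheck {
    // Pattern as typed in Java source: the compiler turns "\\d" into \d (a digit).
    static final String FROM_JAVA_LITERAL =
        "^http://bangalore.locanto.in/(used-cars|ID_\\d+)/((\\d*/(\\d+/)*)|(.*.html))";

    // The same line as it would look if read verbatim from a text file:
    // the regex engine sees \\d, i.e. a literal backslash followed by 'd'.
    static final String FROM_TEXT_FILE =
        "^http://bangalore.locanto.in/(used-cars|ID_\\\\d+)/((\\\\d*/(\\\\d+/)*)|(.*.html))";

    // Same check as in the standalone test: compile the regex and call find().
    static boolean finds(String regex, String url) {
        return Pattern.compile(regex).matcher(url).find();
    }

    public static void main(String[] args) {
        String url = "http://bangalore.locanto.in/used-cars/902/";
        System.out.println(finds(FROM_JAVA_LITERAL, url)); // prints "true"
        System.out.println(finds(FROM_TEXT_FILE, url));    // prints "false"
    }
}
```

If this is the cause, writing single backslashes (\d) in crawl-urlfilter.txt should give the same behavior as the standalone test.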

Appreciate your help on this.

- RB

