Hi, I am facing a strange problem with the URL regexes in crawl-urlfilter.txt. Before using any regex for URLs, I test it in a standalone class, where it works correctly, i.e. pattern.matcher(url).find() returns true. But when the same URL and regex are used during crawling, the match returns false. I am not sure why it behaves differently. Here is an example:
Regex in crawl-urlfilter.txt: ^http://bangalore.locanto.in/(used-cars|ID_\\d+)/((\\d*/(\\d+/)*)|(.*.html))
URL: http://bangalore.locanto.in/used-cars/902/

During standalone testing (outside the Nutch environment), pattern.matcher(url).find() returns true. However, in the Nutch environment it returns false.

I would appreciate your help on this.

- RB
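For reference, here is a minimal sketch of the standalone test described above, together with one difference that may be worth checking. It assumes the filter file is read verbatim, with no Java string-literal unescaping, which I have not confirmed for Nutch. Under that assumption, `\\d` in the text file reaches the regex engine as a literal backslash followed by `d`, whereas `\\d` in Java source becomes the digit class `\d`. The class name, constant names, and the `matches` helper are hypothetical and only illustrate the comparison.

```java
import java.util.regex.Pattern;

public class UrlFilterRegexCheck {
    // URL from the example above.
    static final String URL = "http://bangalore.locanto.in/used-cars/902/";

    // The regex written as a Java string literal, as in a standalone test:
    // the compiler turns each \\d in the source into \d (a digit class).
    static final String FROM_JAVA_LITERAL =
        "^http://bangalore.locanto.in/(used-cars|ID_\\d+)/((\\d*/(\\d+/)*)|(.*.html))";

    // The same characters as they sit in the text file, ASSUMING the file is
    // read verbatim: both backslashes survive, so the engine sees \\d, which
    // means a literal backslash followed by the letter d.
    static final String FROM_TEXT_FILE =
        "^http://bangalore.locanto.in/(used-cars|ID_\\\\d+)/((\\\\d*/(\\\\d+/)*)|(.*.html))";

    // Hypothetical helper mirroring the check pattern.matcher(url).find().
    static boolean matches(String regex, String url) {
        return Pattern.compile(regex).matcher(url).find();
    }

    public static void main(String[] args) {
        System.out.println(matches(FROM_JAVA_LITERAL, URL)); // prints "true"
        System.out.println(matches(FROM_TEXT_FILE, URL));    // prints "false"
    }
}
```

If this turns out to be the cause, writing single backslashes (`\d`) in the text file would be the thing to try.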
