eric park wrote:
Hello, the problem is that they are not unwanted URLs.
I crawled the site 'www.qmind.co.kr'. I found that the Nutch 7.0 crawler
works just fine at the first depth. However, at the second depth it filters
out any links that start with 'www.qmind.co.kr'. It only crawls URLs starting
with 'qmind.co.kr'. I can't figure out why it filters out URLs starting with
'www' at the second depth. Nutch 6.0 works just fine. Are there any known bugs
in the Nutch 7.0 crawler?

Could you please show us your URL filter configuration? I presume you are using the regex-urlfilter, in which case it's the regex-urlfilter.txt file.
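
For reference, here is a rough sketch of how a rule list in the "+regex / -regex" style of regex-urlfilter.txt could end up dropping the 'www' host. It is a small Python simulation, not Nutch code. The rules shown are hypothetical and are not the poster's actual configuration, and the helper names (parse_rules, accepts) are made up for illustration. It assumes the usual semantics: rules are tried top to bottom, the first pattern found anywhere in the URL decides, and a URL that matches no rule is rejected.

```python
import re

# Hypothetical rules, for illustration only (not the poster's real file).
RULES = r"""
# skip some file suffixes
-\.(gif|jpg|png|css|zip)$
# accept only the bare host (note: no optional "www." here)
+^http://qmind\.co\.kr/
# reject everything else
-.
"""


def parse_rules(text):
    """Turn '+regex' / '-regex' lines into (allow, compiled_pattern) pairs."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        sign, pattern = line[0], line[1:]
        if sign not in "+-":
            raise ValueError("rule must start with '+' or '-': " + line)
        rules.append((sign == "+", re.compile(pattern)))
    return rules


def accepts(rules, url):
    """Assumed semantics: the first rule whose pattern is found in the URL wins."""
    for allow, pattern in rules:
        if pattern.search(url):
            return allow
    return False  # no rule matched: treat the URL as rejected


rules = parse_rules(RULES)
for url in ("http://qmind.co.kr/a.html", "http://www.qmind.co.kr/a.html"):
    print(url, "->", "accepted" if accepts(rules, url) else "rejected")
```

With rules like these, only the bare-host URL is accepted. A pattern such as `+^http://(www\.)?qmind\.co\.kr/` would accept both host forms, which is why the actual filter file is worth checking first.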

--
Best regards,
Andrzej Bialecki     <><
___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com




