[Nutch-general] Re: Nutch0.6 and Nutch 0.7 crawlers

Andrzej Bialecki Wed, 12 Apr 2006 13:49:15 -0700

eric park wrote:

hello, the problem is that they are not unwanted URLs.
I crawled the site 'www.qmind.co.kr'. I found that the Nutch 0.7 crawler
works just fine at the first depth. However, at the second depth it filters out any
links that start with 'www.qmind.co.kr'. It only crawls URLs starting with
'qmind.co.kr'. I can't figure out why it filters out URLs starting with
'www' at the second depth. Nutch 0.6 works just fine. Are there any known bugs
in the Nutch 0.7 crawler?

Could you please show us your URL filter configuration? (I presume you are using the regex-urlfilter, in which case it is the regex-urlfilter.txt file.)
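
A minimal sketch of why the filter file matters here, assuming a rule list in the regex-urlfilter.txt style: one pattern per line, prefixed with '+' (include) or '-' (exclude), where the first matching pattern decides and a URL that matches nothing is ignored. The helper names load_rules and accepts and both rule strings are hypothetical; they are not taken from Nutch or from the poster's configuration, and Python's re module only approximates the regex engine Nutch uses.

```python
import re


def load_rules(text):
    """Parse '+regex' / '-regex' lines, skipping blanks and '#' comments."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        sign, pattern = line[0], line[1:]
        if sign not in "+-":
            raise ValueError("rule must start with '+' or '-': " + line)
        rules.append((sign == "+", re.compile(pattern)))
    return rules


def accepts(rules, url):
    """Return the verdict of the first matching rule; reject if none match."""
    for include, pattern in rules:
        if pattern.search(url):
            return include
    return False


# Hypothetical rule that names only the bare host: 'www.' URLs match nothing.
narrow = load_rules(r"+^http://qmind\.co\.kr/")

# Hypothetical rule that also allows any subdomain prefix such as 'www.'.
wide = load_rules(r"+^http://([a-z0-9]*\.)*qmind\.co\.kr/")

for url in ("http://qmind.co.kr/a.html", "http://www.qmind.co.kr/a.html"):
    print(url, accepts(narrow, url), accepts(wide, url))
```

If the real filter file contains a pattern like the narrow one, links on 'www.qmind.co.kr' would be dropped as described, which is why seeing the actual file is the first step.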


--
Best regards,
Andrzej Bialecki     <><
___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com




