[email protected] wrote:
Hello,
I use nutch-0.9 and am trying to index URLs containing the ? and & symbols. I have
commented out this line: -[...@=] in the conf/crawl-urlfilter.txt,
conf/automaton-urlfilter and conf/regex-urlfilter.txt files.
However, nutch still ignores these URLs.
Does anyone know how this can be fixed?
Thanks in advance.
A.
Hi,
If you commented out those lines, it should be fine. That part is correct,
so the problem is somewhere else.
You must give us more information, like:
- does your nutch crawl and index "normal" URLs (without ? and &)?
- are you crawling domains that are NOT blocked in crawl-urlfilter?
- does robots.txt on this domain block your URLs?
- are you talking about one specific domain or many different ones?
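
One way to narrow this down is to test a URL against the filter rules outside of nutch. The sketch below is not nutch code; it only imitates the way the regex URL filter files are commonly described as working: each rule is a + (accept) or - (reject) followed by a regular expression, the first rule that matches decides, and a URL that matches no rule is rejected. The function name first_match_filter, the rule -[?&] (a stand-in, not the exact line from the config files) and the example.com URL are all made up for illustration.

```python
import re


def first_match_filter(rules, url):
    """Return True if the first matching rule accepts the URL.

    Hypothetical helper: mimics first-match-wins filtering where each
    rule is '+' or '-' followed by a regular expression.
    """
    for line in rules:
        line = line.strip()
        # Skip blank lines and commented-out rules.
        if not line or line.startswith("#"):
            continue
        sign, pattern = line[0], line[1:]
        if sign not in "+-":
            continue
        # The pattern may match anywhere in the URL.
        if re.search(pattern, url):
            return sign == "+"
    # No rule matched: treat the URL as rejected.
    return False


# Assumed example rules; "-[?&]" is a stand-in for the real reject rule.
RULES_BLOCKING = ["-[?&]", "+."]
RULES_COMMENTED = ["# -[?&]", "+."]
URL = "http://example.com/page?id=1&lang=en"

print(first_match_filter(RULES_BLOCKING, URL))   # False
print(first_match_filter(RULES_COMMENTED, URL))  # True
```

If the URL passes a check like this with your real rules and is still skipped, the cause is more likely one of the other points above (domain restrictions or robots.txt) than the ? and & rule.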
Thanks,
Bartosz