Hello, I have one specific domain. I tested further, and it looks like Nutch fetches this domain's other links, but not the ones containing "?". Nutch does fetch URLs with the "?" symbol on other domains.
How can I tell whether robots.txt on this domain blocks these specific links from being fetched? Thanks. A.

-----Original Message-----
From: Bartosz Gadzimski <[email protected]>
To: [email protected]
Sent: Sun, 1 Mar 2009 11:13 am
Subject: Re: urls with ? and & symbols

[email protected] writes:
> Hello,
>
> I use nutch-0.9 and try to index urls with ? and & symbols. I have commented
> this line -[...@=] in conf/crawl-urlfilter.txt, conf/automaton-urlfilter and
> conf/regex-urlfilter.txt files.
> However nutch still ignores these urls.
>
> Does anyone know how this can be fixed?
>
> Thanks in advance.
> A.

Hi,

If you commented out those lines it should be fine. That part is correct, so the problem is somewhere else.

You need to give us more information, like:
- does your Nutch crawl and index "normal" URLs (without ? and &)?
- are you crawling domains that are NOT blocked in crawl-urlfilter?
- does robots.txt on this domain block your URLs?
- are you talking about one specific domain or many different ones?

Thanks,
Bartosz
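For reference, the filter line being discussed looks roughly like this in a stock Nutch 0.9 conf/regex-urlfilter.txt (shown here already commented out, which is what lets query URLs through; the exact pattern may differ in your copy, so verify it in your own config files):

```
# skip URLs containing certain characters as probable queries, etc.
# -[?*!@=]
```

The same pattern must be commented out in every filter file that is active for your crawl (crawl-urlfilter.txt is the one used by the one-step `bin/nutch crawl` command), otherwise one of the other filters will still drop the query URLs.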

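To answer the robots.txt question above: you can open http://thedomain/robots.txt in a browser and look for Disallow rules that match your paths, or test it programmatically. A minimal sketch using Python's standard urllib.robotparser; the rules, agent name, and URLs below are made-up examples, not the poster's actual site (note that the agent name Nutch sends is whatever http.agent.name is set to in its config):

```python
from urllib.robotparser import RobotFileParser

# A sample robots.txt that blocks a query-serving path. The classic
# robots.txt syntax has no wildcard for "?", so sites typically block
# the script path itself, e.g. /search.
sample = """\
User-agent: *
Disallow: /search
"""

rp = RobotFileParser()
rp.parse(sample.splitlines())

# A query URL under the disallowed path is blocked...
print(rp.can_fetch("MyNutchCrawler", "http://example.com/search?q=test"))
# ...while other pages on the domain remain fetchable.
print(rp.can_fetch("MyNutchCrawler", "http://example.com/index.html"))
```

To test a live site instead of a sample, call `rp.set_url("http://thedomain/robots.txt")` followed by `rp.read()` before `can_fetch()`. If `can_fetch()` returns True for your query URLs, robots.txt is not the culprit and the problem is back in Nutch's URL filters.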