Hi there

I'm having trouble with a recurring URL. For this example,
let's call the site xyz.com.

I've added a regex filter so that it is excluded from
any crawls, added it to the banned-hosts file, and
pruned the segments regularly for any reference to the
domain. However, with each and every fetch I still see the
URL reappear a few times. This is one of those sites that
puts a "nocache tag" in the URL
(xyz.com/adasdasd.asp?nc=329084723 etc.), which creates
thousands of pages to crawl for a 6-page site.
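
For illustration only, here is a minimal Python sketch of the kind of host-exclusion pattern described above. The pattern and the is_excluded helper are assumptions made up for this example. They are not the actual filter rule in use and not Nutch's own API. The sketch only shows that a pattern anchored on the host matches every "nocache" variant regardless of the nc= value.

```python
import re

# Hypothetical exclusion pattern (an assumption, not the poster's real rule):
# matches any http/https URL whose host is xyz.com or one of its subdomains.
EXCLUDE_HOST = re.compile(
    r"^https?://([a-z0-9-]+\.)*xyz\.com(:\d+)?(/|$)",
    re.IGNORECASE,
)


def is_excluded(url: str) -> bool:
    """Return True if the URL belongs to the excluded host."""
    return EXCLUDE_HOST.search(url) is not None


if __name__ == "__main__":
    samples = [
        "http://xyz.com/adasdasd.asp?nc=329084723",
        "http://www.xyz.com/adasdasd.asp?nc=1",
        "http://notxyz.com/index.html",
    ]
    for url in samples:
        print(url, is_excluded(url))
```

Anchoring on the host rather than on the query string means the rule does not depend on the value of the nc parameter.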

Any ideas?

Thanks


_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
