On Wed, 2006-03-08 at 19:15 +0100, Andrzej Bialecki wrote:
> Doug Cutting wrote:
> > [EMAIL PROTECTED] wrote:
> >> Don't generate URLs that don't pass URLFilters.
> >
> > Just to be clear, this is to support folks changing their filters 
> > while they're crawling, right?  We already filter before we 
> 
> Yes, and this seems to be the most common case. This is especially 
> important since there are no tools yet to clean up the DB.

I have this situation now. There are over 100M URLs in my DB from crap
domains that I want to get rid of.

Adding a --refilter option to updatedb seemed like the most obvious
course of action.

A completely separate command that could be run by hand would also
work for me.
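
Either way, the pass itself is simple. Below is a rough sketch of the
idea in Python rather than Nutch's Java. The rule format (a leading +
or - followed by a regex, first match wins, no match means reject) is
only loosely modeled on regex-style URL filter files, and the names
parse_rules, passes and refilter are hypothetical, not part of Nutch.
The DB is stood in for by a plain dict.

```python
import re


def parse_rules(lines):
    """Parse '+regex' / '-regex' lines into (accept, compiled_regex) pairs.

    Assumed format for illustration only: blank lines and lines starting
    with '#' are skipped.
    """
    rules = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        sign, pattern = line[0], line[1:]
        if sign not in ("+", "-"):
            raise ValueError("rule must start with '+' or '-': %r" % line)
        rules.append((sign == "+", re.compile(pattern)))
    return rules


def passes(url, rules):
    """Return the verdict of the first matching rule; reject if none match."""
    for accept, regex in rules:
        if regex.search(url):
            return accept
    return False


def refilter(db, rules):
    """Return a copy of db (url -> data) keeping only URLs that pass."""
    return {url: data for url, data in db.items() if passes(url, rules)}
```

Run against the current filter rules, this would drop every entry
whose URL no longer passes, which is what a --refilter switch or a
standalone cleanup command would have to do against the real DB.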

-- 
Rod Taylor <[EMAIL PROTECTED]>
