On Wed, 2006-03-08 at 19:15 +0100, Andrzej Bialecki wrote:
> Doug Cutting wrote:
> > [EMAIL PROTECTED] wrote:
> >> Don't generate URLs that don't pass URLFilters.
> >
> > Just to be clear, this is to support folks changing their filters
> > while they're crawling, right? We already filter before we
>
> Yes, and this seems to be the most common case. This is especially
> important since there are no tools yet to clean up the DB.
I have this situation now. There are over 100M URLs in my DB from crap
domains that I want to get rid of. Adding a --refilter option to
updatedb seemed like the most obvious course of action. A completely
separate command, so it could be initiated by hand, would also work for
me.

-- 
Rod Taylor <[EMAIL PROTECTED]>
