Doug Cutting wrote:
> [EMAIL PROTECTED] wrote:
>> Don't generate URLs that don't pass URLFilters.
>
> Just to be clear, this is to support folks changing their filters while they're crawling, right? We already filter before we

Yes, and this seems to be the most common case. This is especially important since there are no tools yet to clean up the DB.

> put things into the db, so we're filtering twice now, no? If so, then perhaps there should be an option to disable this second filtering for folks who don't change their filters?

IMHO doing this here has a minimal impact while preventing a common problem, but if you think this would harm many users then we should of course make it optional.
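
To make the optional variant concrete, here is a rough, language-neutral sketch in Python. It is not the actual Nutch code: the names `make_regex_filter`, `generate` and `filter_on_generate` are hypothetical, and the only assumption carried over from the discussion is that a filter returns the URL when it passes and nothing when it is rejected.

```python
import re
from typing import Callable, Iterable, List, Optional

# Hypothetical filter type: returns the URL if accepted, None if rejected.
UrlFilter = Callable[[str], Optional[str]]


def make_regex_filter(deny_patterns: Iterable[str]) -> UrlFilter:
    """Build a filter that rejects any URL matching one of the deny patterns."""
    compiled = [re.compile(p) for p in deny_patterns]

    def url_filter(url: str) -> Optional[str]:
        if any(p.search(url) for p in compiled):
            return None
        return url

    return url_filter


def generate(db_urls: Iterable[str],
             url_filter: UrlFilter,
             filter_on_generate: bool = True) -> List[str]:
    """Select URLs for fetching, optionally re-applying the current filter.

    The db may hold URLs that passed an older filter configuration, so the
    second check drops anything the current filter would reject. Setting
    filter_on_generate=False (a hypothetical option) skips that check.
    """
    selected = []
    for url in db_urls:
        if filter_on_generate and url_filter(url) is None:
            continue  # rejected by the current filter, do not generate it
        selected.append(url)
    return selected
```

With the flag left at its default, a URL that entered the db under an older filter configuration is skipped at generate time. With the flag off, the db contents are passed through unchanged, which is the behaviour wanted by people who never change their filters.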

--
Best regards,
Andrzej Bialecki     <><
___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com

