Doug Cutting wrote:
> [EMAIL PROTECTED] wrote:
>> Don't generate URLs that don't pass URLFilters.
>
> Just to be clear, this is to support folks changing their filters
> while they're crawling, right? We already filter before we

Yes, and this seems to be the most common case. This is especially
important since there are no tools yet to clean up the DB.

> put things into the db, so we're filtering twice now, no? If so, then
> perhaps there should be an option to disable this second filtering for
> folks who don't change their filters?

IMHO doing this here has a minimal impact while preventing a common
problem, but if you think this would harm many users then we should of
course make it optional.
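
For illustration only, an optional second filtering pass at generate time
might look roughly like the sketch below. All names in it
(GenerateFilterSketch, generate, filterOnGenerate) are hypothetical and
not taken from the Nutch code, and the filter is modelled as a plain
Predicate rather than the real URLFilters class.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch; none of these names come from the Nutch code base.
class GenerateFilterSketch {

    // Builds a fetch list from URLs already stored in the db.
    // When filterOnGenerate is true, every URL is re-checked against the
    // current filter, so entries admitted under older filter rules are
    // skipped. When it is false, the db contents are trusted as-is.
    static List<String> generate(List<String> dbUrls,
                                 Predicate<String> currentFilter,
                                 boolean filterOnGenerate) {
        List<String> fetchList = new ArrayList<>();
        for (String url : dbUrls) {
            if (filterOnGenerate && !currentFilter.test(url)) {
                continue; // rejected by the current filter rules
            }
            fetchList.add(url);
        }
        return fetchList;
    }

    public static void main(String[] args) {
        // Assumed example data: the .org URL entered the db under older rules.
        List<String> db = Arrays.asList("http://example.com/a",
                                        "http://example.org/b");
        Predicate<String> onlyCom = u -> u.startsWith("http://example.com/");

        System.out.println(generate(db, onlyCom, true));
        System.out.println(generate(db, onlyCom, false));
    }
}
```

With the flag on, the cost is one extra filter call per candidate URL;
with it off, people who never change their filters skip that work.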
--
Best regards,
Andrzej Bialecki <><
___. ___ ___ ___ _ _ __________________________________
[__ || __|__/|__||\/| Information Retrieval, Semantic Web
___|||__|| \| || | Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com