I understand that "mergedb ... -filter" can be used to remove links that do
not meet the filtering requirements of the active URLFilters. However,
mergedb operates on the whole crawldb and can be very slow. Is there a way
of enforcing filtering at updatedb time, so that unfetchable links never
enter the database in the first place?
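For reference, this is roughly what I run today to clean things up (the
crawldb paths are just placeholders, not my actual layout):

  bin/nutch mergedb crawldb_filtered crawldb -filter

It does remove the unwanted URLs, but it has to rewrite the entire db each
time.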
A similar issue arises with links that result in HTTP timeouts. How can I
get rid of them, so that they don't periodically come back to slow down my
fetching?
Thanks in advance,
Enzo