I understand that "mergedb ... -filter" can be used to remove links that do not pass the active URLFilters. However, mergedb operates on the whole crawldb and can be very slow. Is there a way of enforcing filtering at updatedb time, so that unfetchable links never enter the database in the first place?
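For reference, what I run today is roughly the following (the crawldb paths are just examples):

  bin/nutch mergedb crawldb_filtered crawldb -filter

What I am hoping for is the equivalent of that -filter behaviour applied while updatedb merges in the segments, so the filtering cost is paid only on the newly discovered links rather than on the entire database each time.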

I have a similar issue with links that result in HTTP timeouts. How can I get rid of them, so that they don't come back periodically and slow down my fetching?
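(I assume the relevant settings on my side are http.timeout and db.fetch.retry.max, configured in nutch-site.xml along the lines of:

  <property>
    <name>db.fetch.retry.max</name>
    <value>3</value>
  </property>

but as far as I can tell tuning those only changes how the URLs are retried and marked; it does not actually remove them from the crawldb, which is what I am after.)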

Thanks in advance,

Enzo
