I understand that "mergedb ... -filter" can be used to remove links that do not pass the active URLFilters. However, mergedb operates on the whole crawldb and can be very slow. Is there a way of enforcing URL filtering at updatedb time, so that unfetchable links never enter the database in the first place?
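For reference, this is roughly what I run today (the paths are just placeholders for my own crawl directories):

    bin/nutch mergedb crawl/crawldb_filtered crawl/crawldb -filter

What I am hoping for is something along the lines of a -filter switch on updatedb itself, if such a thing exists or is planned, e.g.:

    bin/nutch updatedb crawl/crawldb crawl/segments/20070601123456 -filter

so that the URLFilters are applied to newly discovered links before they are written to the crawldb, instead of pruning the whole database afterwards.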
I have a similar issue with links that result in HTTP timeouts. How can I get rid of them, so that they don't periodically come back and slow down my fetching?

Thanks in advance,
Enzo
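P.S. The only related settings I have found so far are http.timeout and db.fetch.retry.max in nutch-default.xml. I am not sure whether overriding them actually keeps the timed-out URLs from being re-generated, but this is what I have been experimenting with in nutch-site.xml (the value of 1 is just a guess on my part):

    <property>
      <name>db.fetch.retry.max</name>
      <value>1</value>
      <description>Lower the retry count so that URLs which keep timing out
      are given up on sooner (my assumption, not verified).</description>
    </property>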
