Is there a way to parallelize URL filtering over multiple threads? After
all, the URLFilters themselves must already be thread-safe, or else they
would cause problems during fetching.

I'm asking because I have a custom URLFilter that needs to make calls to
the DNS resolver, and multi-threading the URL filtering would greatly speed
up the steps that, unlike fetching, appear to be single-threaded:
"mergedb -filter", inject, generate, "updatedb -filter", etc. (The most
important is of course "generate" or, even better, "updatedb -filter", to
prevent undesired URLs from reaching the crawldb in the first place.)
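To make the idea concrete, here is a minimal sketch of applying one thread-safe filter to a batch of URLs on a fixed thread pool, so that calls which block (such as DNS lookups) overlap instead of running one after another. It assumes a filter with the same contract as Nutch's URLFilter (return the URL to keep it, null to reject it); the UrlFilter interface, the ParallelUrlFilter class, filterAll and the demo filter are hypothetical names for illustration, and this does not show how to hook into the mergedb/updatedb/generate jobs themselves.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelUrlFilter {

    /** Hypothetical stand-in mirroring the URLFilter contract:
     *  return the (possibly rewritten) URL to keep it, or null to reject it. */
    public interface UrlFilter {
        String filter(String url);
    }

    /** Runs the filter on a fixed thread pool and returns the accepted URLs
     *  in their original order. The filter must be thread-safe. */
    public static List<String> filterAll(List<String> urls, UrlFilter filter, int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<String>> results = new ArrayList<>(urls.size());
            for (String url : urls) {
                // Each call may block (e.g. on a DNS lookup) without stalling the others.
                results.add(pool.submit(() -> filter.filter(url)));
            }
            List<String> accepted = new ArrayList<>();
            for (Future<String> f : results) {
                String kept = f.get(); // waits for this URL's result, preserving input order
                if (kept != null) {
                    accepted.add(kept);
                }
            }
            return accepted;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical demo filter: keep only http(s) URLs. A DNS-based filter
        // would resolve the host here instead (e.g. java.net.InetAddress.getByName).
        UrlFilter httpOnly = url ->
                (url.startsWith("http://") || url.startsWith("https://")) ? url : null;
        List<String> urls = List.of("http://a.example/", "ftp://b.example/", "https://c.example/");
        System.out.println(filterAll(urls, httpOnly, 4)); // [http://a.example/, https://c.example/]
    }
}
```

The pool size bounds how many lookups are in flight at once; since the work is dominated by waiting on the resolver rather than CPU, a pool larger than the core count is usually reasonable.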

Enzo
 


_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
