I have written a custom URLFilter that resolves the hostname into an IP address and checks the latter against a GeoIP database. Unfortunately, the source code was developed under a commercial contract and is not freely available.
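
For readers who want to try the idea, here is a minimal sketch of the approach described above. It is not the commercial code. The class name IpRangeUrlFilter and the CIDR blocks are hypothetical, and a hard-coded list of IPv4 ranges stands in for a real GeoIP database. The class copies the accept/reject convention of Nutch's URLFilter (return the URL to keep it, null to drop it) but does not implement the Nutch interface, so that it compiles on its own.

```java
import java.net.InetAddress;
import java.net.URI;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch: accept a URL only if its host resolves to an IPv4
 * address inside one of the configured CIDR blocks. A real filter would
 * query a GeoIP database instead of a fixed list.
 */
class IpRangeUrlFilter {

    /** One IPv4 CIDR block, stored as a masked network address plus mask. */
    private static final class Cidr {
        final int network;
        final int mask;

        Cidr(String cidr) {
            String[] parts = cidr.split("/");
            int prefix = Integer.parseInt(parts[1]);
            // A shift by 32 is a no-op in Java, so /0 needs its own case.
            this.mask = (prefix == 0) ? 0 : (-1 << (32 - prefix));
            try {
                // An IP literal is parsed locally, without a DNS query.
                byte[] raw = InetAddress.getByName(parts[0]).getAddress();
                this.network = toInt(raw) & mask;
            } catch (UnknownHostException e) {
                throw new IllegalArgumentException("Bad CIDR block: " + cidr, e);
            }
        }

        boolean contains(int address) {
            return (address & mask) == network;
        }
    }

    private final List<Cidr> allowed = new ArrayList<>();

    IpRangeUrlFilter(String... cidrBlocks) {
        for (String block : cidrBlocks) {
            allowed.add(new Cidr(block));
        }
    }

    /** Returns the URL unchanged if it is accepted, or null if it is rejected. */
    String filter(String urlString) {
        try {
            String host = URI.create(urlString).getHost();
            if (host == null) {
                return null;
            }
            // Performs a DNS lookup unless the host is already an IP literal.
            byte[] raw = InetAddress.getByName(host).getAddress();
            if (raw.length != 4) {
                return null; // this sketch handles IPv4 only
            }
            int ip = toInt(raw);
            for (Cidr block : allowed) {
                if (block.contains(ip)) {
                    return urlString;
                }
            }
            return null;
        } catch (IllegalArgumentException | UnknownHostException e) {
            return null; // malformed URL or unresolvable host: reject
        }
    }

    /** Packs four address bytes into one int, most significant byte first. */
    static int toInt(byte[] b) {
        return ((b[0] & 0xFF) << 24) | ((b[1] & 0xFF) << 16)
             | ((b[2] & 0xFF) << 8) | (b[3] & 0xFF);
    }
}
```

With the placeholder block 192.0.2.0/24 (a documentation range, not a real country allocation), new IpRangeUrlFilter("192.0.2.0/24").filter("http://192.0.2.10/") returns the URL, while a URL on a host outside that block returns null. Note that a filter like this does one DNS lookup per URL, so some form of caching would probably be needed in a real crawl.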

Enzo

----- Original Message -----
From: "Cesar Voulgaris" <[EMAIL PROTECTED]>
To: "nutch user" <[EMAIL PROTECTED]>
Sent: Monday, June 11, 2007 9:24 AM
Subject: crawling by ip range

Hi all,

I have had a problem for some time: I want to crawl only sites from my country or related to it. The problem is that crawling only by domain (in my case I set the regex-urlfilter regex to catch "(com|org|..).uy") leaves out a lot of sites which don't end in .uy but in .com, .org, ... I don't want to crawl to a certain depth and expand the crawled pages outside the country.

Is there any clever method to crawl over a range of IPs without touching the code? If not, which plugin or extension point do I have to extend to add something like IP checking for a given URL?

Thanks in advance

_______________________________________________
Nutch-general mailing list
https://lists.sourceforge.net/lists/listinfo/nutch-general
