"Insurance Squared Inc." <[EMAIL PROTECTED]> writes:

> - Can nutch only crawl specific TLD's? (i.e. like .it, or .uk.com). My
> suspicion is that I could easily modify nutch to do this.

You could use regex-urlfilter. Put something like this in
conf/regex-urlfilter.txt, replacing "tld" with the TLD you want:

    +^http://.*\.tld/

Don't forget to remove the "+." line, which otherwise accepts every URL.

> - Can I run crawlers on two separate machines, then merge the results
> for search? I'm guessing yes, just looking for confirmation.

Yes.

> - If I only use a specific TLD, I think I would need a 'submit your
> site' function. Does nutch do this? I didn't see it in our install,
> wondering if it's a common practice.

AFAIK you have to write such a function yourself (unless someone
already did it). But it should be pretty simple: just inject the
submitted URL (maybe after a sanity check).

-- 
\  / [EMAIL PROTECTED]
 \/lad http://www.hashbang.de

_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
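
To try a TLD rule before putting it into conf/regex-urlfilter.txt, here is a rough Python sketch of how such a filter decides. It is not Nutch code. It assumes the filter's first-match-wins behaviour: the first pattern that matches decides, and a URL that matches nothing is ignored. The names `RULES`, `accepts` and `sane_submission` are made up for illustration, ".it" is only an example TLD, and the pattern is a somewhat tighter variant of the one suggested above, so that a ".it/" appearing later in the path does not count as the host's TLD.

```python
import re
from urllib.parse import urlsplit

# Hypothetical rule list standing in for a regex-urlfilter.txt that holds
# a single TLD rule. "+" accepts, "-" rejects. ".it" is only an example.
RULES = [
    ("+", re.compile(r"^http://([a-z0-9-]+\.)+it/")),
]


def accepts(url, rules=RULES):
    """Return True if the first rule that matches the URL is a '+' rule."""
    for sign, pattern in rules:
        if pattern.search(url):
            return sign == "+"
    return False  # no rule matched, so the URL is ignored


def sane_submission(url, rules=RULES):
    """Hypothetical sanity check for a 'submit your site' form.

    Accepts only http URLs with a host name that pass the same rules
    the crawler would apply. Any port or query string is dropped.
    """
    try:
        parts = urlsplit(url.strip())
    except ValueError:
        return False
    if parts.scheme != "http" or not parts.hostname:
        return False
    # Rebuild as http://host/path so the rule's trailing slash can match
    # even when the user typed only "http://example.it".
    normalised = "http://" + parts.hostname + (parts.path or "/")
    return accepts(normalised, rules)
```

A URL that passes `sane_submission` would then be handed to Nutch's inject step. How the submitted URL reaches the injector (a seed file, a queue, something else) is left open here.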
