"Insurance Squared Inc." <[EMAIL PROTECTED]> writes:

> - Can nutch crawl only specific TLDs (e.g. .it or .uk.com)?  My 
> suspicion is that I could easily modify nutch to do this.

You could use the regex URL filter. Put something like this in
conf/regex-urlfilter.txt, replacing tld with the suffix you want
(e.g. it, or uk\.com with the inner dot escaped):

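+^http://[^/]*\.tld/

(Using [^/]* rather than .* keeps the match inside the host name; with .*
a URL such as http://example.com/foo.tld/ would also be accepted.)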

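Don't forget to remove the catch-all "+." line at the end of the file,
which would otherwise accept every URL that no earlier rule rejected.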
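To see why the catch-all line matters, here is a small standalone Java sketch of how an ordered list of +/- rules in that file format can be evaluated. It assumes the behaviour described in the comments of the stock regex-urlfilter.txt: the first rule whose pattern matches decides, and a URL that matches no rule is rejected. The class and method names are hypothetical; this is not Nutch's own filter code. It uses [^/]* instead of .* so the suffix has to appear in the host name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/**
 * Hypothetical illustration of +/- regex rules as used in
 * conf/regex-urlfilter.txt. Assumption: first matching rule wins,
 * and a URL matching no rule is rejected.
 */
class TldRuleSketch {

    /** One parsed rule: accept == true for '+', false for '-'. */
    record Rule(boolean accept, Pattern pattern) {}

    /** Parses rule lines, skipping blank lines and '#' comments. */
    static List<Rule> parse(List<String> lines) {
        List<Rule> rules = new ArrayList<>();
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue;
            }
            char sign = line.charAt(0);
            if (sign != '+' && sign != '-') {
                throw new IllegalArgumentException("Rule must start with + or -: " + line);
            }
            rules.add(new Rule(sign == '+', Pattern.compile(line.substring(1))));
        }
        return rules;
    }

    /** True if the first rule whose pattern is found in the URL is a '+' rule. */
    static boolean accepts(List<Rule> rules, String url) {
        for (Rule rule : rules) {
            if (rule.pattern().matcher(url).find()) {
                return rule.accept();
            }
        }
        return false; // no rule matched: the URL is dropped
    }

    public static void main(String[] args) {
        // Only .it hosts; deliberately no trailing "+." catch-all.
        List<Rule> rules = parse(List.of(
                "# accept only Italian hosts",
                "+^http://[^/]*\\.it/"));
        System.out.println(accepts(rules, "http://www.example.it/index.html"));    // true
        System.out.println(accepts(rules, "http://www.example.com/"));             // false
        System.out.println(accepts(rules, "http://www.example.com/dir.it/page"));  // false
    }
}
```

Appending "+." to the rule list in this sketch would make the second and third URLs pass as well, which is exactly why that line has to go.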

> - Can I run crawlers on two separate machines, then merge the results 
> for search?  I'm guessing yes, just looking for confirmation.

Yes.

> - If I only use a specific TLD, I think I would need a 'submit your 
> site' function.  Does nutch do this?  I didn't see it in our install, 
> wondering if it's a common practice.

AFAIK you have to write such a function yourself (unless someone has
already written one). But it should be pretty simple: just inject the
submitted URL (perhaps after a sanity check).
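As a rough idea of what that sanity check could look like, here is a minimal Java sketch, assuming the submission form hands you a raw string. The class and method names are hypothetical, and reducing every submission to the site's root URL is just an illustrative choice. Writing the result to a seed list and running Nutch's inject step are not shown.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;
import java.util.Optional;

/**
 * Hypothetical check for a "submit your site" form: accept only absolute
 * http(s) URLs whose host ends with the allowed suffix, and return the
 * site's root URL for a seed list. Injecting it is left to Nutch.
 */
class SubmitCheckSketch {

    static Optional<String> sanitize(String submitted, String allowedSuffix) {
        if (submitted == null) {
            return Optional.empty();
        }
        URI uri;
        try {
            uri = new URI(submitted.trim());
        } catch (URISyntaxException e) {
            return Optional.empty(); // not a syntactically valid URI
        }
        String scheme = uri.getScheme();
        String host = uri.getHost();
        if (scheme == null || host == null) {
            return Optional.empty(); // relative URL or no usable host name
        }
        scheme = scheme.toLowerCase(Locale.ROOT);
        if (!scheme.equals("http") && !scheme.equals("https")) {
            return Optional.empty(); // only web pages are crawled
        }
        host = host.toLowerCase(Locale.ROOT);
        if (!host.endsWith(allowedSuffix.toLowerCase(Locale.ROOT))) {
            return Optional.empty(); // outside the TLD we crawl
        }
        // Illustrative choice: keep scheme, host and port, point at the home page.
        String port = uri.getPort() == -1 ? "" : ":" + uri.getPort();
        return Optional.of(scheme + "://" + host + port + "/");
    }

    public static void main(String[] args) {
        System.out.println(sanitize(" http://WWW.Example.it/about ", ".it")); // Optional[http://www.example.it/]
        System.out.println(sanitize("ftp://example.it/", ".it"));             // Optional.empty
        System.out.println(sanitize("http://example.com/", ".it"));           // Optional.empty
    }
}
```

Anything that comes back non-empty could then be appended to the URL list you inject.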


-- 
\  /                                       [EMAIL PROTECTED]
 \/lad                                     http://www.hashbang.de 


_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
