Hello all,

I am using Nutch 0.9, and when I fetch a couple of sites Nutch does not include 
pages other than the main one.
For example, if I have mysite.com/cv.htm, nutch fetches only mysite.com. It 
does not fetch cv.htm and other files in the site.
I noticed that if I run
bin/nutch generate crawl/crawldb crawl/segments -topN 1000
after
bin/nutch generate crawl/crawldb crawl/segments

it includes some of those pages but not all of them.

Is there any way to tell Nutch to crawl all the objects in mysite.com?

Also, I wonder how to put the Nutch search on a website, let's say at mysite.com/search.

Thanks in advance.
Alex.



-----Original Message-----
From: payo <[EMAIL PROTECTED]>
To: [email protected]
Sent: Wed, 9 Jan 2008 10:18 am
Subject: Re: subcollections




Hi all,

I was able to configure this part.

1.- Add the subcollection plugin to nutch-site.xml in Tomcat:

Tomcat\webapps\ROOT\WEB-INF\classes\nutch-site.xml

2.- Add a select element to search.jsp listing the subcollections:

line 147 <form name="search" action="../search.jsp" method="get">
 <select name="subcollection">
   <option selected value="<%=subcoleccion%>"><%=subcoleccion%></option>
   <option value="apache">Apache</option>
   <option value="nutch">Nutch</option>
   <option value="xml">XML</option>
 </select>


Thanks

-- 
View this message in context: 
http://www.nabble.com/subcollections-tp14373976p14716644.html
Sent from the Nutch - User mailing list archive at Nabble.com.


