Hello all,

I am using Nutch 0.9, and when I fetch a couple of sites, Nutch does not include pages other than the main one. For example, if I have mysite.com/cv.htm, Nutch fetches only mysite.com. It does not fetch cv.htm or the other files on the site. I noticed that if I run

    bin/nutch generate crawl/crawldb crawl/segments -topN 1000

after

    bin/nutch generate crawl/crawldb crawl/segments

it includes some of those pages, but not all of them. Is there any way to tell Nutch to crawl all the objects in mysite.com?

Also, I wondered how to put Nutch in a website, let's say at mysite.com/search?

Thanks in advance.
Alex.

-----Original Message-----
From: payo <[EMAIL PROTECTED]>
To: [email protected]
Sent: Wed, 9 Jan 2008 10:18 am
Subject: Re: subcollections

Hi to all,

I was able to configure this part:

1. Add the subcollection plugin in nutch-site.xml in Tomcat:
   Tomcat\webapps\ROOT\WEB-INF\classes\nutch-site.xml

2. Add a select element in search.jsp listing the subcollections (line 147):

    <form name="search" action="../search.jsp" method="get">
    <SELECT NAME="subcollection">
      <option selected value="<%=subcoleccion%>"><%=subcoleccion%></option>
      <OPTION VALUE="apache">Apache</OPTION>
      <OPTION VALUE="nutch">Nutch</OPTION>
      <OPTION VALUE="xml">XML</OPTION>
    </SELECT>

Thanks

--
View this message in context: http://www.nabble.com/subcollections-tp14373976p14716644.html
Sent from the Nutch - User mailing list archive at Nabble.com.
