Dear Wiki user, You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification.
The following page has been changed by MiddleForkMaps:
http://wiki.apache.org/nutch/GettingNutchRunningWithDebian
------------------------------------------------------------------------------
Under Debian Etch, the Catalina configuration files are located under '''/etc/tomcat5.5/policy.d'''. At runtime they are combined into a single file, ''/usr/share/tomcat5.5/conf/catalina.policy''. Do not edit the latter, as it will be overwritten.[[BR]]
At the end of /etc/tomcat5.5/policy.d/04webapps.policy include the following code:[[BR]]
- ''grant codeBase "file:/usr/share/tomcat5.5-webapps/-" {
+ ''grant codeBase "file:/usr/share/tomcat5.5-webapps/-" {[[BR]]
- permission java.util.PropertyPermission "user.dir", "read";
+ permission java.util.PropertyPermission "user.dir", "read";[[BR]]
- permission java.util.PropertyPermission "java.io.tmpdir", "read,write";
+ permission java.util.PropertyPermission "java.io.tmpdir", "read,write";[[BR]]
- permission java.util.PropertyPermission "org.apache.*", "read,execute";
+ permission java.util.PropertyPermission "org.apache.*", "read,execute";[[BR]]
- permission java.io.FilePermission "/usr/local/nutch/crawls/-" , "read";
+ permission java.io.FilePermission "/usr/local/nutch/crawls/-" , "read";[[BR]]
- permission java.io.FilePermission "/var/lib/tomcat5.5/temp", "read";
+ permission java.io.FilePermission "/var/lib/tomcat5.5/temp", "read";[[BR]]
- permission java.io.FilePermission "/var/lib/tomcat5.5/temp/-", "read,write,execute,delete";
+ permission java.io.FilePermission "/var/lib/tomcat5.5/temp/-", "read,write,execute,delete";[[BR]]
- permission java.lang.RuntimePermission "createClassLoader", "";
+ permission java.lang.RuntimePermission "createClassLoader", "";[[BR]]
- permission java.security.AllPermission;
+ permission java.security.AllPermission;[[BR]]
+ };[[BR]]
- };
- ''
- '''Warning: The last line here was necessary in order to make things work for me. If anybody can supply a more restrictive permission set, please do so!!!
The effects of this are unknown'''
+ '''Warning: The last line here was necessary in order to make things work for me. If anybody can supply a more restrictive permission set, please do so!!! The effects of this are unknown'''[[BR]]
== Acquire, install and configure Nutch ==
- Follow '''ONLY''' the section ''Getting Started'' in the Nutch tutorial at http://lucene.apache.org/nutch/tutorial8.html
+ Acquire a copy of Nutch and unpack it in a new directory location. I suggest using /usr/local/nutch as the top-level directory, but this is of course optional.[[BR]]
+
- ===Configure for multiple, independent site crawls and searches===
+ === Configure for multiple, independent site crawls and searches ===
+ Follow the section '''Intranet:Configuration''' from the Nutch tutorial at http://lucene.apache.org/nutch/tutorial8.html. However, plan in advance for crawling and searching sites independently from one another:[[BR]]
- Given two sites, site1 and site2 which you wish to crawl/index (and later search) independently from each other:[[BR]]
+ Given two sites, site1 and site2, which you wish to crawl/index (and later search) independently from each other, you may make multiple copies of the conf directory:[[BR]]
+ ''#cd /usr/local/nutch''[[BR]]
''#cp -rp conf conf.site1''[[BR]]
''#cp -rp conf conf.site2''[[BR]]
+ And then work through steps one through four of the above-mentioned section for '''each''' site.[[BR]]
+
+ Create simple shell scripts which allow for the independent crawling of each site, such as '''/usr/local/nutch/crawl_site1.sh'''[[BR]]
+ ''NUTCH_CONF_DIR=conf.site1''[[BR]]
+ ''export NUTCH_CONF_DIR''[[BR]]
+ ''bin/nutch crawl urls/site1 -dir crawls/site1 -depth 10 -topN 100000''[[BR]]
+ and the same for site2.[[BR]]
+ Crawl each site:[[BR]]
+ ''sh crawl_site1.sh''[[BR]]
+ ''sh crawl_site2.sh''[[BR]]
+
+
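The per-site crawl scripts described in the change above differ only in the site name, so they could be folded into one parameterized script. The following is a minimal sketch, not part of the wiki page: the function name crawl_site and the NUTCH_HOME and DRY_RUN variables are assumptions made for illustration. The crawl command itself, and the use of NUTCH_CONF_DIR with a per-site conf copy, are taken from the page.

```shell
#!/bin/sh
# Sketch only: one script in place of crawl_site1.sh, crawl_site2.sh, ...
# Hypothetical names (not from the wiki page): crawl_site, NUTCH_HOME, DRY_RUN.

# Top-level Nutch directory; the page suggests /usr/local/nutch.
NUTCH_HOME="${NUTCH_HOME:-/usr/local/nutch}"

crawl_site() {
    name="$1"
    # Each site uses its own copy of the conf directory (conf.site1, conf.site2, ...).
    NUTCH_CONF_DIR="conf.$name"
    export NUTCH_CONF_DIR
    if [ "${DRY_RUN:-0}" = "1" ]; then
        # Print the command instead of running it.
        echo "NUTCH_CONF_DIR=$NUTCH_CONF_DIR bin/nutch crawl urls/$name -dir crawls/$name -depth 10 -topN 100000"
    else
        # Run from the Nutch directory so the relative paths resolve as on the page.
        (cd "$NUTCH_HOME" && bin/nutch crawl "urls/$name" -dir "crawls/$name" -depth 10 -topN 100000)
    fi
}

# Crawl every site named on the command line, stopping at the first failure.
for site in "$@"; do
    crawl_site "$site" || exit 1
done
```

Saved under a name such as crawl_sites.sh (also hypothetical), it would be invoked as ''sh crawl_sites.sh site1 site2''. Setting DRY_RUN=1 prints the commands without starting a crawl, which is a cheap way to check the paths first.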
_______________________________________________
Nutch-cvs mailing list
Nutch-cvs@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nutch-cvs