Partial success on the way to installing Nutch 0.8.1 on Debian Etch. http://mfgis.com/docs/nutchconfig.html
I would like to relate here my progress towards implementing Nutch 0.8.1 on Debian Etch, in the hope of receiving help at the stage where I have become stuck. So here goes.

Disclaimer: I know little to nothing about the inner workings of Java, and Tomcat and Nutch were completely unknown to me a week ago.

0. My OS

# uname -a
Linux 2.6.9-023stab033.6-enterprise #1 SMP Tue Nov 7 16:16:56 MSK 2006 i686 GNU/Linux
# cat /etc/debian_version
testing/unstable

I. Install Sun's Java

// Sun Java is available as a set of Debian packages and may be easily installed using apt. (To obtain Sun's Java, ensure that 'non-free' is included in /etc/apt/sources.list.)
# apt-get install sun-java5-bin sun-java5-demo sun-java5-jdk sun-java5-jre
// Since there may be more than one flavor of Java on the system (e.g. kaffe), ensure that Sun Java is the chosen alternative:
# update-alternatives --config java
// then select Sun Java from the menu.
// If necessary, edit /etc/profile to include the following lines:
JAVA_HOME=/usr/lib/jvm/java-1.5.0-sun-1.5.0.10
export JAVA_HOME

II. Install Tomcat 5.5

# apt-get install tomcat5.5 libtomcat5.5-java tomcat5.5-admin tomcat5.5-webapps
// Hopefully, Tomcat is now installed and running, which I was able to verify:
# ps -ef | grep tomcat
tomcat55 8069 1 0 09:11 ? 00:00:00 su -p -s /bin/sh tomcat55 -c /usr/sbin/rotatelogs "/var/lib/tomcat5.5/logs/catalina_%F.log" 86400
tomcat55 8072 8069 0 09:11 ? 00:00:00 /usr/sbin/rotatelogs /var/lib/tomcat5.5/logs/catalina_%F.log 86400
tomcat55 8103 1 0 09:11 ? 00:00:47 /usr/lib/jvm/java-1.5.0-sun-1.5.0.10/bin/java -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager -Djava.util.logging.config.file=/var/lib/tomcat5.5/conf/logging.properties -Djava.awt.headless=true -Xmx128M -Djava.endorsed.dirs=/usr/share/tomcat5.5/common/endorsed -classpath :/usr/lib/jvm/java-1.5.0-sun-1.5.0.10/jre//lib/jcert.jar:/usr/lib/jvm/java-1.5.0-sun-1.5.0.10/jre//lib/jnet.jar:/usr/lib/jvm/java-1.5.0-sun-1.5.0.10/jre//lib/jsse.jar:/usr/share/tomcat5.5/bin/bootstrap.jar:/usr/share/tomcat5.5/bin/commons-logging-api.jar -Djava.security.manager -Djava.security.policy==/var/lib/tomcat5.5/conf/catalina.policy -Dcatalina.base=/var/lib/tomcat5.5 -Dcatalina.home=/usr/share/tomcat5.5 -Djava.io.tmpdir=/var/lib/tomcat5.5/temp org.apache.catalina.startup.Bootstrap start

// So, while the above worked completely smoothly on the machine described above, I am stalled at this stage on a second Debian machine:
# uname -a
Linux amboro 2.6.15-1-486 #2 Mon Mar 6 15:19:16 UTC 2006 i686 GNU/Linux
# cat /etc/debian_version
4.0
// On this second machine, the same
# apt-get install tomcat5.5 libtomcat5.5-java tomcat5.5-admin tomcat5.5-webapps
// also forces the installation of the jsvc package ("native application to launch java packages as daemons"), although at http://packages.debian.org/testing/web/tomcat5.5 jsvc is listed as a suggested package, not a required one (Depends).
// Installing the packages yields the following messages:
Setting up jsvc (1.0.2~svn20061127-4) ...
Setting up libtomcat5.5-java (5.5.20-4) ...
Setting up tomcat5.5 (5.5.20-4) ...
Adding system user `tomcat55' (UID 108) ...
Adding new user `tomcat55' (UID 108) with group `nogroup' ...
Not creating home directory `/usr/share/tomcat5.5'.
Installing /var/lib/tomcat5.5/conf/tomcat-users.xml.
Starting Tomcat servlet engine: tomcat5.5.
Setting up tomcat5.5-admin (5.5.20-4) ...
invoke-rc.d: initscript tomcat5.5, action "status" failed.
Setting up tomcat5.5-webapps (5.5.20-4) ...
invoke-rc.d: initscript tomcat5.5, action "status" failed.
// So something is wrong. Running
# ps -ef | grep tomcat
// shows multiple processes which look like:
root 9136 1 0 12:52 ? 00:00:00 jsvc.exec -user tomcat55 -cp /usr/share/java/commons-daemon.jar:/usr/share/tomcat5.5/bin/bootstrap.jar -outfile /var/lib/tomcat5.5/logs/catalina.out -errfile &1 -pidfile /var/run/tomcat5.5.pid -Djava.awt.headless=true -Xmx128M -Djava.endorsed.dirs=/usr/share/tomcat5.5/common/endorsed -Dcatalina.base=/var/lib/tomcat5.5 -Dcatalina.home=/usr/share/tomcat5.5 -Djava.io.tmpdir=/var/lib/tomcat5.5/temp -Djava.security.manager -Djava.security.policy=/var/lib/tomcat5.5/conf/catalina.policy -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager -Djava.util.logging.config.file=/var/lib/tomcat5.5/conf/logging.properties org.apache.catalina.startup.Bootstrap
// Comparing this to the successful install above, I note that there Tomcat runs directly under the Java virtual machine, whereas here it is (or is trying to be) launched as a daemon via jsvc. I haven't looked into this further; perhaps there is an easy solution?
// To verify that things aren't working:
# /etc/init.d/tomcat5.5 restart
Starting Tomcat servlet engine: tomcat5.5.
# /etc/init.d/tomcat5.5 status
Tomcat servlet engine is not running.
// Looking at the log files, there is only:
# ls -l /var/log/tomcat5.5
total 0
prw------- 1 tomcat55 nogroup 0 2007-01-24 12:52 catalina.out

// So, for now I will set the above aside and return to the server on which Tomcat IS working:
# /etc/init.d/tomcat5.5 status
Tomcat servlet engine is running with Java pid /var/lib/tomcat5.5/temp/tomcat5.5.pid
// And this is a good time to point out where I and/or Debian diverge from the Nutch tutorial at http://lucene.apache.org/nutch/tutorial8.html, namely:

1. Starting and stopping Tomcat (and with it Catalina) is done with /etc/init.d/tomcat5.5 start|stop, as noted above, whereas the tutorial wants you to run `sh catalina.sh start`. (Will this come back to bite me later when Nutch's files can't be found? We shall see.)

2. Config file and webapp paths. This is a REALLY BIG DEAL. Debian has its own locations for the important Catalina configuration files and for the Tomcat webapps root:
# ls /etc/tomcat5.5
policy.d server.xml web.xml
# ls /etc/tomcat5.5/policy.d
01system.policy 02debian.policy 03catalina.policy 04webapps.policy 50user.policy
// Together, these files provide the content of /var/lib/tomcat5.5/conf/catalina.policy and are used in its place. I discovered this a few steps down the line, when changes to catalina.policy were ignored but those made in /etc/tomcat5.5/policy.d took effect.
// Similarly, the root application 'webapps' path is not used under (this) Debian. Instead, the path is /usr/share/tomcat5.5-webapps:
# ls /usr/share/tomcat5.5-webapps
ROOT      balancer.xml      sample.war             tomcat-docs      webdav.xml
ROOT.xml  jsp-examples      servlets-examples      tomcat-docs.xml
balancer  jsp-examples.xml  servlets-examples.xml  webdav
// Not surprisingly, these are the applications provided by the Debian package tomcat5.5-webapps.

// OK, so I am ready to point my browser to the Tomcat home at http://localhost:8080
// Well, that fails with a standard 'unable to connect'.
// So, what port do I really want? Turning to the configuration files in the correct /etc/tomcat5.5 directory and reviewing /etc/tomcat5.5/server.xml, I discover
... Connector port="8180"
// So I return to my browser and point it to http://localhost:8180
"If you're seeing this page via a web browser, it means you've set up Tomcat successfully. Congratulations!"
// All right! I'm on a roll and life is good.
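Since the port turned out to be set in /etc/tomcat5.5/server.xml, it can also be read from the shell rather than by eye. The sketch below is a hypothetical helper (the name connector_ports is mine, not part of Tomcat or Debian). It assumes that port="..." appears on the same line as "<Connector", as in the line quoted above, and it skips anything inside XML comments, because server.xml usually carries commented-out example connectors that are not actually listening.

```shell
#!/bin/sh
# connector_ports: hypothetical helper, not part of Tomcat or Debian.
# Prints the port of every active <Connector> in a Tomcat server.xml.
# Assumption: port="..." sits on the same line as "<Connector".
#
# Step 1 (awk) blanks out XML comments, which may span several lines.
# Step 2 (sed) prints the lowercase port attribute of each <Connector
# line; attributes such as redirectPort do not match "port=".
connector_ports() {
    conf="${1:-/etc/tomcat5.5/server.xml}"
    awk '
        {
            line = $0
            out = ""
            while (length(line) > 0) {
                if (incomment) {
                    i = index(line, "-->")
                    if (i == 0) { line = "" }
                    else { line = substr(line, i + 3); incomment = 0 }
                } else {
                    i = index(line, "<!--")
                    if (i == 0) { out = out line; line = "" }
                    else {
                        out = out substr(line, 1, i - 1)
                        line = substr(line, i + 4)
                        incomment = 1
                    }
                }
            }
            print out
        }
    ' "$conf" |
    sed -n 's/.*<Connector[[:space:]][^>]*port="\([0-9][0-9]*\)".*/\1/p'
}

# Example: connector_ports /etc/tomcat5.5/server.xml
```

On the working machine the output should include 8180, the port that finally answered in the browser. Parsing line by line keeps the helper dependency-free; a real XML parser would be more robust if the attributes were wrapped differently.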
// Returning to the Nutch tutorial at http://lucene.apache.org/nutch/tutorial8.html (skipping the indexing and crawling sections for now and jumping down to the web search section), I am instructed to run 'rm -rf ~/local/tomcat/webapps/ROOT*' and 'cp nutch*.war ~/local/tomcat/webapps/ROOT.war' (remembering that on my Debian system the webapps path is /usr/share/tomcat5.5-webapps rather than ~/local/tomcat/webapps).
// Wait a minute! After struggling to get Tomcat running, I am instructed to throw away all of its webapps in the hope of having Nutch work in the future. I think not. Shouldn't I instead install Nutch as one application among many? Yes, I have seen discussion threads indicating that various files and paths within Nutch are hard-wired to ROOT, but I notice that http://wiki.apache.org/nutch/Nutch_-_The_Java_Search_Engine says: "It is not clear why the developers designed the application to run in the root context. However it is possible to modify the application to enable it to be deployed normally." And so I pin my hopes on this.
// After all, I have the Tomcat Manager at my disposal (http://localhost:8180/manager/html, linked from my Tomcat home page), so I choose to use it in an attempt to install Nutch.
// I must grant myself permission to access the Tomcat Manager pages, and as instructed in (reference?) I do so by modifying /usr/share/tomcat5.5/conf/tomcat-users.xml to include the line:
<user username="me" password="*****" roles="manager"/>
// With access to the Tomcat Manager granted, I can now list the available applications, and not surprisingly I find that those provided by the tomcat5.5-webapps package are both listed and functional.
// Thus, Java and Tomcat are installed and verified to be functional. It is time to turn my attention to:

III. Acquire, configure and install Nutch; build a test index and run a test crawl.
// I follow the tutorial at http://wiki.apache.org/nutch/Nutch_-_The_Java_Search_Engine through Section 3.2 without complication, and verify that I have a successful test index and crawl:
# cd /home/me/nutch-0.8.1
# bin/nutch org.apache.nutch.searcher.NutchBean blahBlahBlah
Total Hits: 8
...
# bin/nutch readdb testcrawl/crawldb -stats
CrawlDb statistics start: testcrawl/crawldb
Statistics for CrawlDb: testcrawl/crawldb
TOTAL urls: 4727
...
CrawlDb statistics: done

// OK, to recap: the Tomcat server is fully functional, as is Nutch as a stand-alone application. It is time to:

IV. Install Nutch as a Tomcat application.

// As noted above, I ignore the advice to wipe Tomcat's ROOT context, opting to (hopefully) install Nutch as one application among many.
// I have read that placing a WAR in the webapps folder results in it being extracted automatically upon Tomcat's next restart, so I
# cp /home/me/nutch-0.8.1/nutch-0.8.1.war /usr/share/tomcat5.5-webapps/nutch-0.8.1.war
// I also create the context file /usr/share/tomcat5.5-webapps/nutch-0.8.1.xml, the contents of which are:
<Context path="/nutch-0.8.1" docBase="/usr/share/tomcat5.5-webapps/nutch-0.8.1" debug="0" privileged="false" allowLinking="true">
</Context>
// I restart Tomcat:
# /etc/init.d/tomcat5.5 restart
// Contrary to expectations, the nutch-0.8.1.war file was *NOT* extracted. I do so manually (in /usr/share/tomcat5.5-webapps):
# mkdir nutch-0.8.1
# mv nutch-0.8.1.war nutch-0.8.1   // move the WAR into the folder
# cd nutch-0.8.1
# jar -xvf nutch-0.8.1.war
# /etc/init.d/tomcat5.5 restart
// I return to my browser and the Tomcat Manager page and click 'List Applications'.
// I find an entry for nutch-0.8.1! and click 'start'.
// A message is returned: 'OK - Started application at context path /nutch-0.8.1'
// Life is good!
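The 'List Applications' and 'start' clicks can also be scripted against the Tomcat Manager's plain-text interface (for example http://localhost:8180/manager/list and http://localhost:8180/manager/start?path=/nutch-0.8.1), which uses the same manager role granted in tomcat-users.xml above. The sketch below is a hypothetical helper, app_state, that reads the list output on stdin. It assumes the response begins with an "OK - ..." line followed by one line per application starting path:state:sessions, and it ignores any further fields.

```shell
#!/bin/sh
# app_state: hypothetical helper. Reads the plain-text output of the
# Tomcat Manager "list" command on stdin and prints the state field
# (e.g. running or stopped) of the context path given as $1.
# Exit status: 0 if found, 1 if the path is not listed, 2 if the
# manager did not answer with an "OK" first line.
app_state() {
    awk -F: -v p="$1" '
        NR == 1 && $0 !~ /^OK/ { bad = 1; exit }
        NR == 1                { next }
        $1 == p                { print $2; found = 1 }
        END                    { if (bad) exit 2; if (!found) exit 1 }
    '
}

# Example (credentials as in tomcat-users.xml):
# wget -q -O - --http-user=me --http-password='*****' \
#     'http://localhost:8180/manager/list' | app_state /nutch-0.8.1
```

Checking the first line separately means a "FAIL - ..." answer (for instance, wrong credentials or an unknown path) is reported as an error rather than silently treated as "not listed".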
// I point my browser to the Nutch home page at http://localhost:8180/nutch-0.8.1, which redirects me to
// http://localhost:8180/nutch-0.8.1/en
// I enter a search term and click 'search'.
// I get an error dump indicating permission errors, which, thanks to http://nutch.sourceforge.net/cgi-bin/twiki/view/Main/GettingNutchRunningOnDebian, I can correct.
// Remembering that Tomcat's policy files live under /etc/tomcat5.5/policy.d, I edit 04webapps.policy and add the following lines:
grant codeBase "file:/usr/share/tomcat5.5-webapps/nutch-0.8.1/-" {
    permission java.util.PropertyPermission "user.dir", "read";
    permission java.io.FilePermission "/home/me/nutch-0.8.1/testcrawl/*", "read";
};
// Restart Tomcat:
# /etc/init.d/tomcat5.5 restart
// Try the search again,
// and receive an HTTP 500 error which begins:
exception
org.apache.jasper.JasperException: Exception in JSP: /search.jsp:49

46: --%>
47:
48: <%
49:   NutchBean bean = NutchBean.get(application, nutchConf);
50:   // set the character encoding to use when interpreting request values
51:   request.setCharacterEncoding("UTF-8");
52:

// And a log file which reads in part:
2007-01-24 17:38:08,470 INFO NutchBean - creating new bean
2007-01-24 17:38:08,472 INFO NutchBean - opening merged index in /home/walker/nutch-0.8.1/crawl2/index
// Everything is fine until here, but then:
2007-01-24 17:38:08,477 ERROR [jsp] - Servlet.service() for servlet jsp threw exception
java.lang.NoClassDefFoundError
    at org.apache.nutch.searcher.IndexSearcher.getDirectory(IndexSearcher.java:83)
    at org.apache.nutch.searcher.IndexSearcher.<init>(IndexSearcher.java:70)
    at org.apache.nutch.searcher.NutchBean.init(NutchBean.java:118)
    at org.apache.nutch.searcher.NutchBean.<init>(NutchBean.java:105)
    at org.apache.nutch.searcher.NutchBean.<init>(NutchBean.java:83)
    at org.apache.nutch.searcher.NutchBean.get(NutchBean.java:70)
// So I am ready to start digging into this error, under the assumption that it may be as simple as a path error.
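One detail in the grant above may matter when chasing path problems: in a java.io.FilePermission, a path ending in "/*" covers only the files directly inside that directory, while a path ending in "/-" covers everything beneath it recursively, and the searcher opens files in subdirectories such as the index directory named in the log. The log also shows the bean opening /home/walker/nutch-0.8.1/crawl2/index, which is not the testcrawl directory named in the grant. The sketch below is a hypothetical helper, nutch_grant, that prints a grant using the recursive form for a given webapp and crawl directory. It is not taken from the Debian wiki page cited above, and it is not claimed to explain the NoClassDefFoundError.

```shell
#!/bin/sh
# nutch_grant: hypothetical helper that prints a policy.d grant letting
# the Nutch webapp read a crawl directory and everything beneath it.
#   $1  webapp docBase, e.g. /usr/share/tomcat5.5-webapps/nutch-0.8.1
#   $2  crawl directory, e.g. /home/me/nutch-0.8.1/testcrawl
nutch_grant() {
    webapp="${1%/}"    # drop one trailing slash, if any
    crawl="${2%/}"
    cat <<EOF
grant codeBase "file:$webapp/-" {
    permission java.util.PropertyPermission "user.dir", "read";
    // the directory itself, then (via "/-") every file and
    // subdirectory below it; "/*" would stop at the first level
    permission java.io.FilePermission "$crawl", "read";
    permission java.io.FilePermission "$crawl/-", "read";
};
EOF
}

# Example (as root), then restart Tomcat as above:
# nutch_grant /usr/share/tomcat5.5-webapps/nutch-0.8.1 \
#     /home/me/nutch-0.8.1/testcrawl >> /etc/tomcat5.5/policy.d/04webapps.policy
```

Generating the grant from the two paths keeps the codeBase and the crawl directory in step if either one moves; the crawl directory passed in should be the one the searcher actually opens according to the log.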
// However, I instead spend a couple of hours preparing this narrative and posting it to the Nutch users' list.
// I will gladly reformat this for inclusion in the wiki if that should prove of interest to anybody. Naturally, I would hope to have a complete solution in hand, and I would appreciate any help along the way.

S.W.
Middle Fork Geographic Information Services
middleforkgis-att-gmail-dott-comm
24 Jan 2007

_______________________________________________
Nutch-general mailing list
Nutch-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nutch-general