Partial success on the way to installing Nutch 0.8.1 With Debian Etch.

http://mfgis.com/docs/nutchconfig.html

I would like to relate here my progress towards implementing Nutch
0.8.1 on Debian Etch in hope of receiving help at the stage where I
have become stuck.

So here goes:
Disclaimer:  I know little to nothing about the inner workings of
Java, and Tomcat & Nutch were completely unknown to me a week ago.

0.  My OS
# uname -a
Linux  2.6.9-023stab033.6-enterprise #1 SMP Tue Nov 7 16:16:56 MSK
2006 i686 GNU/Linux
# cat /etc/debian_version
testing/unstable

I.  Install Sun's Java
//Sun Java is available as a set of Debian packages and may be easily
installed using apt.  (To obtain Sun's Java, ensure that 'non-free' is
included in /etc/apt/sources.list)
# apt-get install sun-java5-bin sun-java5-demo sun-java5-jdk sun-java5-jre

//Since there may be more than one flavor of Java on the system (e.g.
kaffe) ensure that Sun Java is the chosen alternative
# update-alternatives --config java   // then select sun java from the menu

//If necessary edit /etc/profile to include the following lines:
JAVA_HOME=/usr/lib/jvm/java-1.5.0-sun-1.5.0.10
export JAVA_HOME
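// A quick sanity check of the JAVA_HOME setting above could look like
the following sketch.  The function name check_java_home is
hypothetical; it only confirms that the directory holds an executable
bin/java and says nothing about which vendor's JVM it is.

```shell
# Hypothetical helper: succeed only if DIR/bin/java exists and is executable.
check_java_home() {
    dir="$1"
    if [ -z "$dir" ]; then
        echo "JAVA_HOME is not set" >&2
        return 1
    fi
    if [ -x "$dir/bin/java" ]; then
        echo "found JVM launcher: $dir/bin/java"
        return 0
    fi
    echo "no executable bin/java under $dir" >&2
    return 1
}

# Example call against the current environment; `|| true` keeps a calling
# script going if the check fails.
check_java_home "${JAVA_HOME:-}" || true
```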

II.  Install Tomcat5.5
# apt-get install tomcat5.5 libtomcat5.5-java tomcat5.5-admin tomcat5.5-webapps
// Hopefully, tomcat is installed and running, which I was able to verify:
# ps -ef |grep tomcat
tomcat55  8069     1  0 09:11 ?        00:00:00 su -p -s /bin/sh
tomcat55 -c /usr/sbin/rotatelogs
"/var/lib/tomcat5.5/logs/catalina_%F.log" 86400
tomcat55  8072  8069  0 09:11 ?        00:00:00 /usr/sbin/rotatelogs
/var/lib/tomcat5.5/logs/catalina_%F.log 86400
tomcat55  8103     1  0 09:11 ?        00:00:47
/usr/lib/jvm/java-1.5.0-sun-1.5.0.10/bin/java
-Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager
-Djava.util.logging.config.file=/var/lib/tomcat5.5/conf/logging.properties
-Djava.awt.headless=true -Xmx128M
-Djava.endorsed.dirs=/usr/share/tomcat5.5/common/endorsed -classpath
:/usr/lib/jvm/java-1.5.0-sun-1.5.0.10/jre//lib/jcert.jar:/usr/lib/jvm/java-1.5.0-sun-1.5.0.10/jre//lib/jnet.jar:/usr/lib/jvm/java-1.5.0-sun-1.5.0.10/jre//lib/jsse.jar:/usr/share/tomcat5.5/bin/bootstrap.jar:/usr/share/tomcat5.5/bin/commons-logging-api.jar
-Djava.security.manager
-Djava.security.policy==/var/lib/tomcat5.5/conf/catalina.policy
-Dcatalina.base=/var/lib/tomcat5.5
-Dcatalina.home=/usr/share/tomcat5.5
-Djava.io.tmpdir=/var/lib/tomcat5.5/temp
org.apache.catalina.startup.Bootstrap start
//
// So, while the above worked completely smoothly on the machine
described above, I am stalled at this stage on a second Debian machine,
which is:
#uname -a
Linux amboro 2.6.15-1-486 #2 Mon Mar 6 15:19:16 UTC 2006 i686 GNU/Linux
#cat /etc/debian_version
4.0
// On this second machine, the same `apt-get install tomcat5.5
libtomcat5.5-java tomcat5.5-admin tomcat5.5-webapps` also forces the
install of the jsvc ("native application to launch java packages as
daemons") package, although at
http://packages.debian.org/testing/web/tomcat5.5 jsvc is listed as a
suggested package, not a required one (Depends).
// Installing the packages yields the following messages:
Setting up jsvc (1.0.2~svn20061127-4) ...
Setting up libtomcat5.5-java (5.5.20-4) ...
Setting up tomcat5.5 (5.5.20-4) ...
Adding system user `tomcat55' (UID 108) ...
Adding new user `tomcat55' (UID 108) with group `nogroup' ...
Not creating home directory `/usr/share/tomcat5.5'.
Installing /var/lib/tomcat5.5/conf/tomcat-users.xml.
Starting Tomcat servlet engine: tomcat5.5.
Setting up tomcat5.5-admin (5.5.20-4) ...
invoke-rc.d: initscript tomcat5.5, action "status" failed.
Setting up tomcat5.5-webapps (5.5.20-4) ...
invoke-rc.d: initscript tomcat5.5, action "status" failed.
// So something is wrong.  Running:
# ps -ef |grep tomcat
// Shows multiple processes which look like:
root      9136     1  0 12:52 ?        00:00:00 jsvc.exec -user
tomcat55 -cp 
/usr/share/java/commons-daemon.jar:/usr/share/tomcat5.5/bin/bootstrap.jar
-outfile /var/lib/tomcat5.5/logs/catalina.out -errfile &1 -pidfile
/var/run/tomcat5.5.pid -Djava.awt.headless=true -Xmx128M
-Djava.endorsed.dirs=/usr/share/tomcat5.5/common/endorsed
-Dcatalina.base=/var/lib/tomcat5.5
-Dcatalina.home=/usr/share/tomcat5.5
-Djava.io.tmpdir=/var/lib/tomcat5.5/temp -Djava.security.manager
-Djava.security.policy=/var/lib/tomcat5.5/conf/catalina.policy
-Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager
-Djava.util.logging.config.file=/var/lib/tomcat5.5/conf/logging.properties
org.apache.catalina.startup.Bootstrap
// Now, comparing this to the above (successful) install, I note that
in the former Tomcat is launched directly by the Java virtual machine
(bin/java), whereas here it is being launched (or is trying to be)
through jsvc as a daemon.  I haven't looked into this further; perhaps
there is an easy solution?
// To verify that things aren't working:
# /etc/init.d/tomcat5.5 restart
Starting Tomcat servlet engine: tomcat5.5.
# /etc/init.d/tomcat5.5 status
Tomcat servlet engine is not running.
// Looking at the log files, there is only:
# ls -l /var/log/tomcat5.5
total 0
prw------- 1 tomcat55 nogroup 0 2007-01-24 12:52 catalina.out
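// Note that the leading 'p' in 'prw-------' marks catalina.out as a
named pipe (FIFO) rather than a regular file, and a FIFO is always
listed with size 0.  A minimal sketch for telling the two apart;
file_kind is a hypothetical helper name:

```shell
# Hypothetical helper: report whether a path is a FIFO, a regular file,
# a directory, or something else (including a missing path).
file_kind() {
    if [ -p "$1" ]; then
        echo "fifo"
    elif [ -f "$1" ]; then
        echo "regular"
    elif [ -d "$1" ]; then
        echo "directory"
    else
        echo "other-or-missing"
    fi
}

# Path taken from the listing above; adjust as needed.
file_kind /var/log/tomcat5.5/catalina.out
```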
//
// So, for now I will table the above and return to the server in
which tomcat IS working.
# /etc/init.d/tomcat5.5 status
Tomcat servlet engine is running with Java pid
/var/lib/tomcat5.5/temp/tomcat5.5.pid
// And this is a great time to point out where I and/or Debian diverge
from the Nutch tutorial at
http://lucene.apache.org/nutch/tutorial8.html, namely:
1.  Starting|Stopping Tomcat (and with it Catalina) may be achieved
using /etc/init.d/tomcat5.5 start|stop as noted above, where the
tutorial wants you to `sh catalina.sh start`.  (Will this come back to
bite me later when Nutch's files can't be found?  We shall see.)
2.  Config file and webapp paths.  This is a REALLY BIG DEAL.  Debian
has its own location for important Catalina configuration files and
for the Tomcat webapps root:
# ls /etc/tomcat5.5
policy.d  server.xml  web.xml
# ls /etc/tomcat5.5/policy.d
01system.policy  02debian.policy  03catalina.policy  04webapps.policy
50user.policy
// These files collectively provide the content of
/var/lib/tomcat5.5/conf/catalina.policy, and it is these fragments,
not the assembled file, that must be edited.  I discovered this a few
steps down the line when my changes to catalina.policy were ignored
but those made in /etc/tomcat5.5/policy.d took effect.
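// One plausible explanation (an assumption, not verified against the
Debian init script) is that catalina.policy is regenerated from the
policy.d fragments each time Tomcat starts, overwriting any direct
edits.  A sketch of what such a step could look like; assemble_policy
is a hypothetical name, and the example writes to a scratch file
rather than the live policy:

```shell
# Hypothetical sketch: concatenate every *.policy fragment in a directory,
# in sorted (glob) order, into an output file, replacing its old content.
assemble_policy() {
    fragment_dir="$1"
    out_file="$2"
    : > "$out_file" || return 1
    for fragment in "$fragment_dir"/*.policy; do
        [ -f "$fragment" ] || continue   # skip if the glob matched nothing
        cat "$fragment" >> "$out_file"
    done
}

# Example (assumed paths; preview only):
# assemble_policy /etc/tomcat5.5/policy.d /tmp/catalina.policy.preview
```

The numeric prefixes (01, 02, ... 50) would then determine the order
in which the fragments appear in the assembled file.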
// Similarly, the root application 'webapps' path is not used under
(this) Debian.  Instead, the path is:
// /usr/share/tomcat5.5-webapps
#ls /usr/share/tomcat5.5-webapps
ROOT      balancer.xml      sample.war             tomcat-docs      webdav.xml
ROOT.xml  jsp-examples      servlets-examples      tomcat-docs.xml
balancer  jsp-examples.xml  servlets-examples.xml  webdav
// Not surprisingly, these are applications provided by the deb
package tomcat5.5-webapps.
//
// Ok, so I am ready to point my browser to the Tomcat home at
http://localhost:8080
// Well, that fails with a standard 'unable to connect'
// So, what port do I really want?  I turn to the conf files in the
correct /etc/tomcat5.5 directory.
// Reviewing /etc/tomcat5.5/server.xml I discover ... Connector port="8180"
// So I return to my browser and point to http://localhost:8180
// "If you're seeing this page via a web browser, it means you've set
up Tomcat successfully.  Congratulations!"
// All Right!  I'm on a roll and life is good.
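// The port lookup described above can be sketched as a one-liner
wrapped in a function.  connector_ports is a hypothetical name, and
the sketch assumes the port attribute sits on the same line as the
<Connector tag; it will also report connectors that are inside XML
comments, so treat the output as a hint only.

```shell
# Hypothetical helper: print the port="..." value of each <Connector ...> tag
# whose port attribute is on the same line as the tag name.
# Requiring whitespace before "port" keeps redirectPort="..." from matching.
connector_ports() {
    grep -o '<Connector[^>]*[[:space:]]port="[0-9]*"' "$1" |
        sed 's/.*[[:space:]]port="\([0-9]*\)"/\1/'
}

# Example (path from the text above):
# connector_ports /etc/tomcat5.5/server.xml
```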
// Returning to the Nutch tutorial at
http://lucene.apache.org/nutch/tutorial8.html (skipping for now the
indexing and crawling sections and jumping down to the web search
section) I am instructed to 'rm -rf ~/local/tomcat/webapps/ROOT*' and
'cp nutch*.war ~/local/tomcat/webapps/ROOT.war' (remembering that in
the case of my Debian system the webapps path is
/usr/share/tomcat5.5-webapps rather than ~/local/tomcat/webapps).
// Wait a minute!  After struggling to get Tomcat running, I am
instructed to throw away all of its webapps in the hope of having
Nutch work in the future.  I think not.  Shouldn't I instead prefer
to install Nutch as one application among many?  Yes, I have seen
discussion threads indicating that various files and paths within
Nutch are hard-wired to ROOT, but I also notice this at
http://wiki.apache.org/nutch/Nutch_-_The_Java_Search_Engine:
"It is not clear why the developers designed the application to
run in the root context. However it is possible to modify the
application to enable it to be deployed normally."  And so I pin my
hopes on this.
// After all, I have the Tomcat Manager at my disposal
http://localhost:8180/manager/html from my Tomcat home page, so I
choose to use this in an attempt to install nutch.
// I must grant myself permission to access the Tomcat Manager pages,
and as instructed in (reference?) do so by modifying
//    /usr/share/tomcat5.5/conf/tomcat-users.xml to include the line:
//    <user username="me" password="*****" roles="manager"/>
// Granted access to the Tomcat Manager I can now list available
applications and not surprisingly find that those provided by the deb
tomcat5.5-webapps package are both listed and functional.
// Thus, Java and Tomcat are installed and verified to be functional.
It is time to turn my attention to:
//
III.  Acquire, configure and  install Nutch;  Build a test index and
run a test crawl.
// I follow the tutorial at
http://wiki.apache.org/nutch/Nutch_-_The_Java_Search_Engine through
Section 3.2 without complication and verify I have a successful test
index and crawl.
#cd /home/me/nutch-0.8.1
#bin/nutch org.apache.nutch.searcher.NutchBean blahBlahBlah
Total Hits: 8
...
#bin/nutch readdb testcrawl/crawldb -stats
CrawlDb statistics start: testcrawl/crawldb
Statistics for CrawlDb: testcrawl/crawldb
TOTAL urls:     4727
...
CrawlDb statistics: done
//
// OK, to recap:  The Tomcat Server is fully functional, as is Nutch
as a stand-alone. It is time to:

IV.  Install Nutch as a Tomcat application.
// As noted above, I ignore the advice to wipe Tomcat's ROOT context,
opting to (hopefully) install Nutch as one application among many.
// I have read that placing a WAR in the webapps folder will result in
it being extracted automatically upon Tomcat's next restart, so I
# cp /home/me/nutch-0.8.1/nutch-0.8.1.war
/usr/share/tomcat5.5-webapps/nutch-0.8.1.war
// I also create the context file
/usr/share/tomcat5.5-webapps/nutch-0.8.1.xml the contents of which are
<Context path="/nutch-0.8.1" docBase="/usr/share/tomcat5.5-webapps/nutch-0.8.1"
         debug="0" privileged="false" allowLinking="true">
</Context>
// I restart Tomcat
# /etc/init.d/tomcat5.5 restart
// Contrary to expectations, the nutch-0.8.1.war file was *NOT*
extracted, so I extract it manually:
# mkdir nutch-0.8.1  //(in /usr/share/tomcat5.5-webapps)
# mv nutch-0.8.1.war nutch-0.8.1  // move the WAR to the folder
# cd nutch-0.8.1
# jar -xvf nutch-0.8.1.war
# /etc/init.d/tomcat5.5 restart
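// The manual extraction above can also be written so that the WAR is
left in place instead of being moved into the target directory.  A
minimal sketch, assuming the JDK's jar tool is on the PATH;
explode_war is a hypothetical name:

```shell
# Hypothetical helper: unpack a WAR file into a destination directory using
# the JDK's jar tool. The WAR itself is not moved or modified.
explode_war() {
    war_file="$1"
    dest_dir="$2"
    [ -f "$war_file" ] || { echo "no such WAR: $war_file" >&2; return 1; }
    # Resolve to an absolute path, because jar is run from inside dest_dir.
    war_abs="$(cd "$(dirname "$war_file")" && pwd)/$(basename "$war_file")"
    mkdir -p "$dest_dir" || return 1
    ( cd "$dest_dir" && jar -xf "$war_abs" )
}

# Example (paths from the text above):
# explode_war /home/me/nutch-0.8.1/nutch-0.8.1.war \
#     /usr/share/tomcat5.5-webapps/nutch-0.8.1
```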
// I return to my browser and the Tomcat Manager page and 'List Applications'
// I find an entry for nutch-0.8.1! and click 'start'
// A message is returned:  'OK - Started application at context path
/nutch-0.8.1'
// Life is good!
// I point my browser to the Nutch home page
http://localhost:8180/nutch-0.8.1 which redirects me slightly to
// http://localhost:8180/nutch-0.8.1/en
// I enter a search term and click 'search'
// I get an error dump indicating permission errors, which, thanks to
http://nutch.sourceforge.net/cgi-bin/twiki/view/Main/GettingNutchRunningOnDebian
 I can correct.
// Remembering the location of Tomcat's configuration files under
/etc/tomcat5.5/policy.d I edit 04webapps.policy and add the following
lines:
grant codeBase "file:/usr/share/tomcat5.5-webapps/nutch-0.8.1/-" {
    permission java.util.PropertyPermission "user.dir", "read";
    permission java.io.FilePermission
"/home/me/nutch-0.8.1/testcrawl/*" , "read";
};
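// A side note on the FilePermission path in the grant block above: in
Java policy syntax a path ending in "/*" covers only the entries
directly inside that directory, while a path ending in "/-" covers
everything beneath it recursively.  Since a crawl directory contains
nested subdirectories, the recursive form may be what is wanted.
Whether this has any bearing on the error reported below is not
established.  The sketch prints such a block for review;
print_nutch_grant is a hypothetical name.

```shell
# Hypothetical helper: print a grant block like the one above, but with a
# trailing "/-" on the crawl path so that nested files are covered as well.
print_nutch_grant() {
    webapp_dir="$1"
    crawl_dir="$2"
    cat <<EOF
grant codeBase "file:${webapp_dir}/-" {
    permission java.util.PropertyPermission "user.dir", "read";
    permission java.io.FilePermission "${crawl_dir}/-", "read";
};
EOF
}

# Example: review the output before pasting it into a policy.d fragment.
print_nutch_grant /usr/share/tomcat5.5-webapps/nutch-0.8.1 \
    /home/me/nutch-0.8.1/testcrawl
```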
// Restart Tomcat
# /etc/init.d/tomcat5.5 restart
// Try the search again
// And receive an HTTP 500 error which begins:

exception
org.apache.jasper.JasperException: Exception in JSP: /search.jsp:49

46: --%>
47:
48: <%
49:   NutchBean bean = NutchBean.get(application, nutchConf);
50:   // set the character encoding to use when interpreting request values
51:   request.setCharacterEncoding("UTF-8");
52:

// And a log file which reads in part:
2007-01-24 17:38:08,470 INFO  NutchBean - creating new bean
2007-01-24 17:38:08,472 INFO  NutchBean - opening merged index in
/home/walker/nutch-0.8.1/crawl2/index
// Everything is fine until here, but:
2007-01-24 17:38:08,477 ERROR [jsp] - Servlet.service() for servlet
jsp threw exception
java.lang.NoClassDefFoundError
        at 
org.apache.nutch.searcher.IndexSearcher.getDirectory(IndexSearcher.java:83)
        at org.apache.nutch.searcher.IndexSearcher.<init>(IndexSearcher.java:70)
        at org.apache.nutch.searcher.NutchBean.init(NutchBean.java:118)
        at org.apache.nutch.searcher.NutchBean.<init>(NutchBean.java:105)
        at org.apache.nutch.searcher.NutchBean.<init>(NutchBean.java:83)
        at org.apache.nutch.searcher.NutchBean.get(NutchBean.java:70)

// So I am ready to start digging in to this error under the
assumption that it may be as simple as a path error.
// However, I instead spend a couple hours preparing this narrative
and post it up to the Nutch user's list.
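// A first step in testing the "path error" hypothesis could be to
confirm which jars the exploded webapp actually carries, since a
NoClassDefFoundError means a class could not be loaded at run time.
A missing jar is only one possible cause, so this is a starting point
rather than a diagnosis.  list_webapp_jars is a hypothetical name.

```shell
# Hypothetical helper: list the jar files bundled with an exploded webapp,
# one file name per line, sorted.
list_webapp_jars() {
    lib_dir="$1/WEB-INF/lib"
    [ -d "$lib_dir" ] || { echo "no WEB-INF/lib under $1" >&2; return 1; }
    find "$lib_dir" -maxdepth 1 -type f -name '*.jar' -exec basename {} \; | sort
}

# Example (path from the text above):
# list_webapp_jars /usr/share/tomcat5.5-webapps/nutch-0.8.1
```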

// I will gladly reformat this for inclusion in the Wiki if that
should prove of interest to anybody.  Naturally I would hope to have a
complete solution in hand, and would appreciate any help along the
way.

S.W.
Middle Fork Geographic Information Services
middleforkgis-att-gmail-dott-comm
24 Jan 2007
</pre>

_______________________________________________
Nutch-general mailing list
Nutch-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nutch-general