Hi all,
I'm just starting out using Nutch and have a (probably basic) question
about configuration.
I want to use Nutch to provide search facilities for a website. I have
installed it at c:/nutch-0.9, edited the files in c:/nutch-0.9/conf and
created a crawl index at c:/nutch-0.9/crawl.mysite.
Now I'm trying to use NutchBean to return some search results to a page
on my site. I'm using Tomcat and have added nutch-0.9.jar to the
deployed war file. However, I'm having trouble getting Nutch to run
against a directory that lives outside Tomcat.
Specifically, I don't want copies of the Nutch config files in the
deployed webapp, because keeping two copies in sync adds management
overhead on the live site. So I want to convince the webapp to read
nutch-site.xml and the other config files from c:/nutch-0.9/conf.
I can get most of the config loaded by loading the xml files into a
configuration object like this...

    //conf = NutchConfiguration.create();
    conf = new Configuration();

    // Add Nutch config files using nutchPath as a base to over-ride defaults
    File defaultFile = new File(nutchPath + "/conf/nutch-default.xml");
    if (defaultFile.exists()) {
        conf.addDefaultResource(defaultFile.toURL());
    }

    File siteFile = new File(nutchPath + "/conf/nutch-site.xml");
    if (siteFile.exists()) {
        conf.addFinalResource(siteFile.toURL());
    }

    bean = new NutchBean(conf);

but for some reason it won't pick up a file called common-terms.utf8,
although that file is certainly present. It just reports:

2007-05-08 13:50:14,318 [main] INFO [org.apache.hadoop.conf.Configuration] C:/nutch-0.9/conf/common-terms.utf8 not found

and then it throws an NPE because it can't find the file:
java.lang.NullPointerException
at java.io.Reader.<init>(Reader.java:61)
at java.io.BufferedReader.<init>(BufferedReader.java:76)
...
Anyone know where I'm going wrong and how I can configure it so I don't
have to include the config files in my war?
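
One thing that may be worth checking, sketched here with plain JDK
classes only: the "not found" message above comes from
org.apache.hadoop.conf.Configuration, which suggests (this is an
assumption, not something confirmed from the Nutch source) that
common-terms.utf8 is looked up as a class-loader resource rather than
opened as a file. If so, the external conf directory would have to be
visible to the class loader. The class name ExternalConfDir and the
helper methods withConfDir and firstLine are hypothetical names used
only for illustration.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;

public class ExternalConfDir {

    /** Builds a class loader that searches confDir for resources, after the parent. */
    public static ClassLoader withConfDir(File confDir, ClassLoader parent)
            throws IOException {
        // File.toURI() appends a trailing slash for an existing directory,
        // which is what URLClassLoader needs to treat the URL as a directory.
        URL url = confDir.toURI().toURL();
        return new URLClassLoader(new URL[] { url }, parent);
    }

    /** Returns the first line of the named resource, or null if it is not found. */
    public static String firstLine(ClassLoader loader, String name)
            throws IOException {
        InputStream in = loader.getResourceAsStream(name);
        if (in == null) {
            return null; // resource is not visible to this class loader
        }
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            return reader.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        // The default path is the one from this post; pass another as args[0].
        File confDir = new File(args.length > 0 ? args[0] : "c:/nutch-0.9/conf");
        ClassLoader loader = withConfDir(
                confDir, Thread.currentThread().getContextClassLoader());
        String line = firstLine(loader, "common-terms.utf8");
        System.out.println(line == null ? "resource not found" : line);
    }
}
```

If the lookup really is class-loader based, the container-level
equivalent would presumably be to put c:/nutch-0.9/conf on a classpath
that Tomcat shares with the webapp, instead of copying the files into
the war. Whether the Configuration class in Nutch 0.9 can be handed a
custom loader like the one above is not something this sketch shows.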
Cheers,
Ian.
_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general