Hello, I have been working on this since then and have found one problem: the parse-xml plugin does not seem to be loading.
One thing I did was add the plugin to parse-plugins.xml so that nutch-0.8.1 knows parse-xml is the plugin to use for XML content. This step is not mentioned in the plugin's instructions, though. Since then I have been getting the following warning in hadoop.log:

2006-11-06 15:12:33,156 WARN parse.ParserFactory - ParserFactory: Plugin: parse-xml mapped to contentType text/xml via parse-plugins.xml, but not enabled via plugin.includes in nutch-default.xml

The issue is that I do have the plugin enabled in nutch-site.xml. I also tried enabling it in nutch-default.xml, but I still get the same warning.

Any thoughts or pointers on how to make the plugin work?

Thanks and Best Regards,
Jayant Gandhi

On 11/5/06, Jayant Kumar Gandhi <[EMAIL PROTECTED]> wrote:
> I am using the default xmlparser-conf.xml, just copied into the
> nutch/conf dir. To test it I used the XML file given in the sample
> directory, xmltest.xml, which is uploaded at http://www.jkg.in/xmltest.xml
>
> I do not get any errors while indexing or parsing. The crawl log is
> attached. I am able to get the XML file in the results when I search
> for 'XPath', but when I click the explain link, it doesn't show me the
> field dctitle in the index, which it should.
>
> I just noticed that hadoop.log has some errors for handling XML files
> and I cannot see parse-xml loaded, but I have it enabled in my
> nutch-site.conf. I am new to nutch-0.8 and hadoop, so I have no idea
> whether this is expected behaviour or how to fix it.
>
> Thanks and Best Regards,
> Jayant
>
> On 11/5/06, Nutch Newbie <[EMAIL PROTECTED]> wrote:
> > Can you post your "xmlparser-conf.xml" from the nutch/conf dir?
> > Also, what kind of error message do you get when you index?
> > You can use Luke to see the index...
> >
> > Regards,
> >
> > On 11/4/06, Jayant Kumar Gandhi <[EMAIL PROTECTED]> wrote:
> > > Hello Everyone,
> > >
> > > I have just installed nutch-0.8.1 on my dev machine.
> > > I installed a new plugin called XML Parser, available at
> > > http://issues.apache.org/jira/browse/NUTCH-185
> > > The issue is that I am unable to get it to work.
> > > I copied the parse-xml folder to the src/plugin folder. I made the
> > > corresponding deploy/clean entries in the build xml file.
> > >
> > > Also, I have edited the nutch conf to enable the xml plugin.
> > > The plugin is still not working. After compiling using ant, I started
> > > indexing. After the indexing was finished and a query done, I couldn't
> > > see the indexed fields on the explain page.
> > >
> > > Any inputs?
> > >
> > > Thanks,
> > > Jayant
> > >
> >
>
> --
> www.jkg.in | http://www.jkg.in/contact-me/
> Jayant Kr. Gandhi
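
For anyone hitting the same warning, here is a minimal sketch of a sanity check on the plugin.includes setting. It is written in Python and is not part of Nutch. It assumes that Nutch treats the plugin.includes value as a regular expression that must match the whole plugin id (that is my understanding, but please verify against your Nutch version), and that Python's regex syntax is close enough to Java's for typical values. The sample configuration, including the plugin.includes value, and the helper names get_property and plugin_enabled are hypothetical and only for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical nutch-site.xml content. The plugin.includes value below is
# illustrative only; substitute the contents of your own conf file.
SAMPLE_SITE_XML = """
<configuration>
  <property>
    <name>plugin.includes</name>
    <value>protocol-http|urlfilter-regex|parse-(text|html|xml)|index-basic|query-(basic|site|url)</value>
  </property>
</configuration>
"""


def get_property(conf_xml, name):
    """Return the stripped <value> of the named <property>, or None if absent."""
    root = ET.fromstring(conf_xml)
    for prop in root.iter("property"):
        if prop.findtext("name", default="").strip() == name:
            return prop.findtext("value", default="").strip()
    return None


def plugin_enabled(conf_xml, plugin_id):
    """Check whether plugin_id is fully matched by the plugin.includes regex.

    Assumption: a whole-string match is required, mirroring what Nutch is
    believed to do; this is not taken from the Nutch source.
    """
    pattern = get_property(conf_xml, "plugin.includes")
    if pattern is None:
        return False
    return re.fullmatch(pattern, plugin_id) is not None


if __name__ == "__main__":
    for plugin_id in ("parse-xml", "parse-pdf"):
        print(plugin_id, "enabled:", plugin_enabled(SAMPLE_SITE_XML, plugin_id))
```

To check a real setup, read the conf file that the running crawl actually uses and pass its contents in place of SAMPLE_SITE_XML. If parse-xml comes out as not enabled there, the warning in hadoop.log would be consistent with the configuration rather than with the plugin itself.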
