Hello,

I have been working on it since then and have found one problem: it
seems the parse-xml plugin is not loading.

One thing I did was add the plugin to parse-plugins.xml so that
nutch-0.8.1 can detect that parse-xml is the plugin to use for xml
content. This step is not mentioned in the instructions for the plugin.
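
For illustration, a mapping of this kind in parse-plugins.xml would look
roughly like the sketch below. The element layout is assumed from the
parse-plugins.xml that ships with nutch-0.8.x and is not copied from my
actual file. Depending on the Nutch version, an <alias> entry tying the
plugin id to its parser extension id may also be needed; it is left out
here because the extension id is not given in this thread.

```xml
<!-- Sketch only: maps the text/xml content type to the parse-xml plugin. -->
<!-- Assumes the stock nutch-0.8.x parse-plugins.xml layout.              -->
<parse-plugins>
  <mimeType name="text/xml">
    <plugin id="parse-xml" />
  </mimeType>
</parse-plugins>
```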

Since making that change, I get the following error in hadoop.log:

2006-11-06 15:12:33,156 WARN  parse.ParserFactory - ParserFactory:
Plugin: parse-xml mapped to contentType text/xml via
parse-plugins.xml, but not enabled via plugin.includes in
nutch-default.xml

The issue is that I have the plugin enabled in the nutch-site.xml. I
also tried to enable the plugin in nutch-default.xml but I still get
the same error.
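
For reference, enabling the plugin through plugin.includes would look
roughly like the sketch below in nutch-site.xml. The other plugin ids in
the value are assumed from a typical nutch-0.8.x default and are not
taken from my actual configuration. The value is a regular expression
matched against plugin ids, so parse-xml has to be covered by it.

```xml
<!-- Sketch only: nutch-site.xml override of plugin.includes.          -->
<!-- Every id except parse-xml is an assumed default, shown only to    -->
<!-- illustrate where xml is added to the parse-(...) group.           -->
<configuration>
  <property>
    <name>plugin.includes</name>
    <value>protocol-http|urlfilter-regex|parse-(text|html|js|xml)|index-basic|query-(basic|site|url)|summary-basic|scoring-opic</value>
  </property>
</configuration>
```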

Any thoughts or pointers on how to make the plugin work?

Thanks and Best Regards,
Jayant Gandhi


On 11/5/06, Jayant Kumar Gandhi <[EMAIL PROTECTED]> wrote:
> I am using the default xmlparser-conf.xml, just copied into the
> nutch/conf dir. To test it I used xmltest.xml, the xml file given in
> the sample directory, which is uploaded at
> http://www.jkg.in/xmltest.xml
>
> I do not get any errors while indexing or parsing. The crawl log is
> attached. I am able to get the xml file in the results when I search
> for 'XPath' but when I click the explain link, it doesn't show me the
> field dctitle in the index which it should.
>
> I just noticed that hadoop.log has an error about handling xml files
> and I cannot see parse-xml being loaded, although I have it enabled in
> my nutch-site.xml. I am new to nutch-0.8 and hadoop, so I have no idea
> whether this is expected behaviour or how to fix it.
>
> Thanks and Best Regards,
> Jayant
>
> On 11/5/06, Nutch Newbie <[EMAIL PROTECTED]> wrote:
> > Can you post your "xmlparser-conf.xml" from the nutch/conf dir ?
> > Also what kind of error message do you get when you index?
> > You can use Luke to see the index...
> >
> > Regards,
> >
> > On 11/4/06, Jayant Kumar Gandhi <[EMAIL PROTECTED]> wrote:
> > > Hello Everyone,
> > >
> > > I have just installed nutch-0.8.1 on my dev machine. I installed a
> > > new plugin called XML Parser, available at
> > > http://issues.apache.org/jira/browse/NUTCH-185
> > > The issue is that I am unable to get it to work.
> > > I copied the parse-xml folder to the src/plugin folder. I made the
> > > corresponding deploy/clean entries in the build xml file.
> > >
> > > Also, I have edited the nutch conf to enable the xml plugin.
> > > The plugin is still not working. After compiling with ant, I started
> > > indexing. After the indexing finished and I ran a query, I couldn't
> > > see the indexed fields on the explain page.
> > >
> > > Any inputs?
> > >
> > > Thanks,
> > > Jayant
> > >
> >
>
> --
> www.jkg.in | http://www.jkg.in/contact-me/
> Jayant Kr. Gandhi

-- 
www.jkg.in | http://www.jkg.in/contact-me/
Jayant Kr. Gandhi

_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
