Hi,
Here are the steps to install the XML Parser plugin:
1- Copy the parse-xml directory into the src/plugin directory
2- Copy xmlparser-conf.xml into the conf directory
3- Add the following property to nutch-site.xml (conf directory)
<property>
<name>plugin.includes</name>
<value>protocol-http|urlfilter-regex|parse-(text|xml|html|js)|index-basic|query-(basic|site|url)|summary-basic|scoring-opic</value>
<description>Regular expression naming plugin directory names to
include. Any plugin not matching this expression is excluded.
In any case you need at least include the nutch-extensionpoints plugin. By
default Nutch includes crawling just HTML and plain text via HTTP,
and basic indexing and search plugins.
</description>
</property>
4- Modify the text/xml entry in parse-plugins.xml (conf directory) so that it lists parse-xml
<mimeType name="text/xml">
<plugin id="parse-xml" />
<plugin id="parse-text" />
<plugin id="parse-html" />
<plugin id="parse-rss" />
</mimeType>
5- Modify build.xml in the root directory to add parse-xml
6- Modify src/plugin/build.xml to add parse-xml
7- Run ant in the src/plugin directory
8- Run ant in the root directory
9- Copy the parse-xml directory from nutch-0.8.1/build/plugins to
nutch-0.8.1/plugins
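
For step 6, the entries usually look like the sketch below. It assumes that src/plugin/build.xml follows the layout shipped with Nutch 0.8.x, where the deploy and clean targets hold one <ant> line per plugin. The comments stand in for the existing entries, so check the layout against your own copy before editing.

```xml
<!-- src/plugin/build.xml (excerpt, assumed layout): one <ant> line per plugin -->
<target name="deploy">
  <!-- existing <ant dir="..." target="deploy"/> entries stay as they are -->
  <ant dir="parse-xml" target="deploy"/>
</target>

<target name="clean">
  <!-- existing <ant dir="..." target="clean"/> entries stay as they are -->
  <ant dir="parse-xml" target="clean"/>
</target>
```

The dir value has to match the name of the directory copied in step 1, because ant resolves it relative to src/plugin.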
Best regards
Rida Benjelloun
On 11/7/06, Jim Wilson <[EMAIL PROTECTED]> wrote:
I think you should stop sending *bump* emails.
-- Jim
On 11/7/06, Jayant Kumar Gandhi <[EMAIL PROTECTED]> wrote:
>
> *bump*
>
> Any thoughts, anyone?
>
> Thanks,
> Jayant
>
> On 11/6/06, Jayant Kumar Gandhi <[EMAIL PROTECTED]> wrote:
> > Hello,
> >
> > I have been working on it since then. I have found one problem: it
> > seems the parse-xml plugin is not loading.
> >
> > One thing I did was put the plugin in the parse-plugins.xml to enable
> > nutch-0.8.1 to detect that parse-xml is the plugin to be used for xml
> > content. This is not given in the instructions for the plugin though.
> >
> > After that change I started to get the following warning in hadoop.log:
> >
> > 2006-11-06 15:12:33,156 WARN parse.ParserFactory - ParserFactory:
> > Plugin: parse-xml mapped to contentType text/xml via
> > parse-plugins.xml, but not enabled via plugin.includes in
> > nutch-default.xml
> >
> > The issue is that I have the plugin enabled in the nutch-site.xml. I
> > also tried to enable the plugin in nutch-default.xml but I still get
> > the same error.
> >
> > Any thoughts/ pointers on how to make the plugin work?
> >
> > Thanks and Best Regards,
> > Jayant Gandhi
> >
> >
> > On 11/5/06, Jayant Kumar Gandhi <[EMAIL PROTECTED]> wrote:
> > > I am using the default xmlparser-conf.xml, just copied into the
> > > nutch/conf dir. To test it I used xmltest.xml, the XML file given in
> > > the sample directory, which is uploaded at
> > > http://www.jkg.in/xmltest.xml.
> > >
> > > I do not get any errors while indexing or parsing. The crawl log is
> > > attached. I am able to get the xml file in the results when I search
> > > for 'XPath' but when I click the explain link, it doesn't show me the
> > > field dctitle in the index, which it should.
> > >
> > > I just noticed that hadoop.log has some errors about handling xml files
> > > and I cannot see parse-xml loaded, but I have it enabled in my
> > > nutch-site.xml. I am new to nutch-0.8 and hadoop, so I have no idea
> > > whether this is expected behaviour or how to fix it.
> > >
> > > Thanks and Best Regards,
> > > Jayant
> > >
> > > On 11/5/06, Nutch Newbie <[EMAIL PROTECTED]> wrote:
> > > > Can you post your "xmlparser-conf.xml" from the nutch/conf dir ?
> > > > Also what kind of error message do you get when you index?
> > > > You can use Luke to see the index...
> > > >
> > > > Regards,
> > > >
> > > > On 11/4/06, Jayant Kumar Gandhi <[EMAIL PROTECTED]> wrote:
> > > > > Hello Everyone,
> > > > >
> > > > > I have just installed nutch-0.8.1 on my dev machine. I installed a new
> > > > > plugin called XML Parser, available at
> > > > > http://issues.apache.org/jira/browse/NUTCH-185
> > > > > The issue is that I am unable to get it to work.
> > > > > I copied the parse-xml folder to the src/plugin folder. I made the
> > > > > corresponding deploy/clean entries in the build.xml file.
> > > > >
> > > > > Also, I have edited the nutch conf to enable the xml plugin.
> > > > > The plugin is still not working. After compiling using ant, I started
> > > > > indexing. After the indexing was finished and a query done, I couldn't
> > > > > see the indexed fields on the explain page.
> > > > >
> > > > > Any inputs?
> > > > >
> > > > > Thanks,
> > > > > Jayant
> > > > >
> > > >
> > >
> > > --
> > > www.jkg.in | http://www.jkg.in/contact-me/
> > > Jayant Kr. Gandhi
> >
> >
>
>
> --
> www.jkg.in | http://www.jkg.in/contact-me/
> Jayant Kr. Gandhi
> M.Tech. Computer Tech. Class of 2007,
> IIT Delhi
>
_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general