You can use XPath expressions like these, which match elements by name even though the document declares a default namespace:
//*[name()='TitleImage']/*[name()='MediaUri']
//*[name()='TitleVideo']/*[name()='MediaUri']
//*[name()='Comment']/*[name()='FreeTextAnnotation']
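
A minimal sketch of why the unqualified paths match nothing, using Python's standard xml.etree.ElementTree rather than the parse-xml plugin itself (the plugin's XPath engine is not shown in this thread, so this only illustrates the general namespace behaviour). SAMPLE is a trimmed copy of the attached MPEG-7 document, and the helper name texts and the prefix m are made up for illustration. ElementTree does not support the name() function, so the sketch uses the {*} namespace wildcard (Python 3.8+) as the closest analogue.

```python
import xml.etree.ElementTree as ET

MPEG7_NS = "urn:mpeg:mpeg7:schema:2001"

# Trimmed copy of the attached sample: same element names, same default namespace.
SAMPLE = """<Mpeg7 xmlns="urn:mpeg:mpeg7:schema:2001">
  <Description>
    <DescriptionMetadata>
      <Comment>
        <FreeTextAnnotation>tag1 tag2 tag3</FreeTextAnnotation>
      </Comment>
    </DescriptionMetadata>
    <MultimediaContent>
      <Video>
        <CreationInformation>
          <Creation>
            <Title type="popular">video title</Title>
            <TitleMedia>
              <TitleImage><MediaUri>Soccer_2.bmp</MediaUri></TitleImage>
              <TitleVideo><MediaUri>Soccer_2.flv</MediaUri></TitleVideo>
            </TitleMedia>
          </Creation>
        </CreationInformation>
      </Video>
    </MultimediaContent>
  </Description>
</Mpeg7>"""

root = ET.fromstring(SAMPLE)


def texts(path, namespaces=None):
    """Return the text of every element matched by path (hypothetical helper)."""
    return [el.text for el in root.findall(path, namespaces)]


# Unqualified names only match elements in no namespace, so nothing is found.
unqualified = texts(".//TitleImage/MediaUri")

# Binding a prefix to the MPEG-7 namespace makes the same path match.
qualified = texts(".//m:TitleImage/m:MediaUri", {"m": MPEG7_NS})

# '{*}' matches the tag in any namespace, similar in spirit to the name() test.
wildcard = texts(".//{*}TitleVideo/{*}MediaUri")
tags = texts(".//{*}Comment/{*}FreeTextAnnotation")

print(unqualified, qualified, wildcard, tags)
```

The same reasoning applies to //Title in the configuration quoted below: an unprefixed name test does not pick up the document's default namespace.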
Best regards.
---------- Forwarded message ----------
From: Jayant Kumar Gandhi <[EMAIL PROTECTED]>
Date: Nov 8, 2006 4:39 PM
Subject: Re: XMLParser for Nutch
To: Rida Benjelloun <[EMAIL PROTECTED]>
Thanks Rida.
I was missing steps 5 & 9.
One more quick question for you. I am trying to parse an MPEG-7
document; a simple sample is attached. To index and search it
I have the following XPath configuration in xmlparser-conf.xml,
but it doesn't seem to be working. The default sample shipped with
parse-xml works fine, though.
<xmlIndexerProperties type="filePerDocument" namespace="urn:mpeg:mpeg7:schema:2001">
  <field name="videotitle" xpath="//Title" type="Text" boost="1.5"/>
  <field name="screenshoturi" xpath="//TitleImage/MediaUri" type="keyword" boost="1.0"/>
  <field name="videouri" xpath="//TitleVideo/MediaUri" type="keyword" boost="1.0"/>
  <field name="tags" xpath="//Comment/FreeTextAnnotation" type="Text" boost="1.5"/>
</xmlIndexerProperties>
Could you please point out what I did wrong?
Best Regards,
Jayant Gandhi
On 11/8/06, Rida Benjelloun <[EMAIL PROTECTED] > wrote:
> Hi,
> Here are the steps to install the XML Parser plugin:
> 1- Copy parse-xml into the src/plugin directory
>
> 2- Copy xmlparser-conf.xml into the conf directory
> 3- Add the following property to nutch-site.xml (conf directory):
> <property>
> <name>plugin.includes</name>
> <value>protocol-http|urlfilter-regex|parse-(text|xml|html|js)|index-basic|query-(basic|site|url)|summary-basic|scoring-opic</value>
> <description>Regular expression naming plugin directory names to
> include. Any plugin not matching this expression is excluded.
> In any case you need at least include the nutch-extensionpoints plugin. By
> default Nutch includes crawling just HTML and plain text via HTTP,
> and basic indexing and search plugins.
> </description>
> </property>
>
> 4- Modify parse-plugins.xml (conf directory)
> <mimeType name="text/xml">
> <plugin id="parse-xml" />
> <plugin id="parse-text" />
> <plugin id="parse-html" />
> <plugin id="parse-rss" />
> </mimeType>
>
> 5- Modify build.xml in the root directory to add parse-xml
> 6- Modify build.xml in src/plugin to add parse-xml
> 7- Run ant in the src/plugin directory
> 8- Run ant in the root directory
> 9- Copy the parse-xml directory from nutch-0.8.1/build/plugins to
> nutch-0.8.1/plugins
>
> Best regards
>
> Rida Benjelloun
>
>
>
>
> On 11/7/06, Jim Wilson <[EMAIL PROTECTED]> wrote:
> >
> > I think you should stop sending *bump* emails.
> >
> > -- Jim
> >
> > On 11/7/06, Jayant Kumar Gandhi <[EMAIL PROTECTED]> wrote:
> > >
> > > *bump*
> > >
> > > Any thoughts, anyone?
> > >
> > > Thanks,
> > > Jayant
> > >
> > > On 11/6/06, Jayant Kumar Gandhi <[EMAIL PROTECTED]> wrote:
> > > > Hello,
> > > >
> > > > I have been working on it since then. I have found one problem: it
> > > > seems the parse-xml plugin is not loading.
> > > >
> > > > One thing I did was put the plugin in the parse-plugins.xml to enable
> > > > nutch-0.8.1 to detect that parse-xml is the plugin to be used for xml
> > > > content. This is not given in the instructions for the plugin though.
> > > >
> > > > Because of this I started to get the following warning in hadoop.log:
> > > >
> > > > 2006-11-06 15:12:33,156 WARN parse.ParserFactory - ParserFactory:
> > > > Plugin: parse-xml mapped to contentType text/xml via
> > > > parse-plugins.xml, but not enabled via plugin.includes in
> > > > nutch-default.xml
> > > >
> > > > The issue is that I do have the plugin enabled in nutch-site.xml. I
> > > > also tried to enable the plugin in nutch-default.xml, but I still get
> > > > the same warning.
> > > >
> > > > Any thoughts/ pointers on how to make the plugin work?
> > > >
> > > > Thanks and Best Regards,
> > > > Jayant Gandhi
> > > >
> > > >
> > > > On 11/5/06, Jayant Kumar Gandhi < [EMAIL PROTECTED]> wrote:
> > > > > I am using the default xmlparser-conf.xml, just copied into the
> > > > > nutch/conf dir. To test it I used xmltest.xml from the sample
> > > > > directory, which is also uploaded at http://www.jkg.in/xmltest.xml.
> > > > >
> > > > > I do not get any errors while indexing or parsing. The crawl log is
> > > > > attached. I am able to get the xml file in the results when I search
> > > > > for 'XPath', but when I click the explain link, it doesn't show me the
> > > > > field dctitle in the index, which it should.
> > > > >
> > > > > I just noticed that hadoop.log has an error related to handling xml files
> > > > > and I cannot see parse-xml loaded, although I have it enabled in my
> > > > > nutch-site.conf. I am new to nutch-0.8 and hadoop, so I have no idea
> > > > > whether this is expected behaviour or how to fix it.
> > > > >
> > > > > Thanks and Best Regards,
> > > > > Jayant
> > > > >
> > > > > On 11/5/06, Nutch Newbie <[EMAIL PROTECTED]> wrote:
> > > > > > Can you post your "xmlparser-conf.xml" from the nutch/conf dir?
> > > > > > Also what kind of error message do you get when you index?
> > > > > > You can use Luke to see the index...
> > > > > >
> > > > > > Regards,
> > > > > >
> > > > > > On 11/4/06, Jayant Kumar Gandhi <[EMAIL PROTECTED]> wrote:
> > > > > > > Hello Everyone,
> > > > > > >
> > > > > > > I have just installed nutch-0.8.1 on my dev machine. I installed a new
> > > > > > > plugin called XML Parser, available at
> > > > > > > http://issues.apache.org/jira/browse/NUTCH-185
> > > > > > > The issue is that I am unable to get it to work.
> > > > > > > I copied the parse-xml folder to the src/plugin folder. I made the
> > > > > > > corresponding deploy/clean entries in the build xml file.
> > > > > > >
> > > > > > > Also, I have edited the nutch conf to enable the xml plugin.
> > > > > > > The plugin is still not working. After compiling using ant, I started
> > > > > > > indexing. After the indexing was finished and a query done, I couldn't
> > > > > > > see the indexed fields on the explain page.
> > > > > > >
> > > > > > > Any inputs?
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Jayant
> > > > > > >
> > > > > >
> > > > >
> > > > > --
> > > > > www.jkg.in | http://www.jkg.in/contact-me/
> > > > > Jayant Kr. Gandhi
> > > >
> > > > --
> > > > www.jkg.in | http://www.jkg.in/contact-me/
> > > > Jayant Kr. Gandhi
> > > >
> > >
> > >
> > > --
> > > www.jkg.in | http://www.jkg.in/contact-me/
> > > Jayant Kr. Gandhi
> > > M.Tech. Computer Tech. Class of 2007,
> > > IIT Delhi
> > >
> >
> >
>
>
--
www.jkg.in | http://www.jkg.in/contact-me/
Jayant Kr. Gandhi
M.Tech. Computer Tech. Class of 2007,
IIT Delhi
-----------------------------------------------------------
Rida Benjelloun
DocuLibre inc.
Website: http://www.doculibre.com
Email: [EMAIL PROTECTED]
-----------------------------------------------------------
<?xml version="1.0" encoding="iso-8859-1"?> <!-- ********************************************************************************
This XML document was originally developed in the course of development of the ISO/IEC
15938 standard (MPEG-7). This XML document contains either a part of the MPEG-7 schema
implementation for one or more MPEG-7 tools as specified by the MPEG-7 Requirements or
MPEG-7 description examples conformant to the MPEG-7 schema.
ISO/IEC gives users of MPEG-7 free license to this XML document or modifications thereof
for use in hardware or software products claiming conformance to MPEG-7.
Those intending to use this XML document in hardware or software products are advised that
its use may infringe existing patents. The original developers of this XML document and his/her
company, the subsequent editors and their companies, and ISO/IEC have no liability for use of
this XML document or modifications thereof in an implementation.
Copyright is not released for non MPEG-7 conforming products. The organizations who
contributed to this XML document retain the full right to use the code for their own purpose,
assign or donate their contribution to a third party and inhibit third parties from using their
contribution for non MPEG-7 conforming products.
Copyright (c) 1999-2001 ISO/IEC.
This XML document is provided for informative purposes only. If any parts of this XML
document contradict the normative part of the corresponding standard document then the
normative part should be used as the definitive specification.
This notice must be included in all copies or derivative works.
************************************************************************************ -->
<!-- ################################################################################ -->
<!-- ISO/IEC 15938 Information Technology - Multimedia Content Description Interface -->
<!-- MPEG-7 Description Example developed by MPEG MDS Sub-group, -->
<!-- ################################################################################ -->
<Mpeg7 xmlns="urn:mpeg:mpeg7:schema:2001" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:mpeg7="urn:mpeg:mpeg7:schema:2001" xmlns:xml="http://www.w3.org/XML/1998/namespace" xsi:schemaLocation="urn:mpeg:mpeg7:schema:2001 .\Mpeg7-2001.xsd">
<!-- ##################################################### -->
<!-- ### Description Metadata ### -->
<!-- ##################################################### -->
<Description xsi:type="ContentEntityType">
<DescriptionMetadata>
<Comment>
<FreeTextAnnotation>give tags here tag1 tag2 tag3 tag4 tag5 tag6</FreeTextAnnotation>
</Comment>
</DescriptionMetadata>
<MultimediaContent xsi:type="VideoType">
<Video>
<!-- ##################################################### -->
<!-- ### CreationInformation Description ### -->
<!-- ##################################################### -->
<CreationInformation>
<Creation>
<Title type="popular">give title of the video here</Title>
<TitleMedia>
<TitleImage>
<MediaUri>give path of the screenshot here Soccer_2.bmp</MediaUri>
</TitleImage>
<TitleVideo>
<MediaUri>give path of the video here Soccer_2.flv</MediaUri>
</TitleVideo>
</TitleMedia>
<Abstract>
<FreeTextAnnotation>
give description of the video here</FreeTextAnnotation>
</Abstract>
</Creation>
</CreationInformation>
</Video>
</MultimediaContent>
</Description>
</Mpeg7>
_______________________________________________ Nutch-general mailing list [email protected] https://lists.sourceforge.net/lists/listinfo/nutch-general
