Also take a look at the plugin tutorial.

The example actually parses a meta content tag.

http://wiki.apache.org/nutch/WritingPluginExample-0%2e9
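
As a rough sketch of the idea (this is not the tutorial's code, and it uses no Nutch classes): assuming your plugin is handed a W3C DOM node for the parsed page, which is how the tutorial's parse filter appears to work, you can walk the tree and collect the name/content pairs. The class MetaTagExtractor and its methods are hypothetical names made up for this example, and sampleDocument() only builds a stand-in DOM from the tags in your mail.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of Nutch: collects <meta name=... content=...> pairs.
public class MetaTagExtractor {

    // Walks the tree below 'node' and returns name -> content,
    // with names lower-cased and content trimmed.
    public static Map<String, String> extract(Node node) {
        Map<String, String> tags = new LinkedHashMap<String, String>();
        collect(node, tags);
        return tags;
    }

    private static void collect(Node node, Map<String, String> tags) {
        // HTML parsers differ in element-name case, so compare case-insensitively.
        if (node.getNodeType() == Node.ELEMENT_NODE
                && "meta".equalsIgnoreCase(node.getNodeName())) {
            Element meta = (Element) node;
            String name = meta.getAttribute("name");       // "" if absent
            String content = meta.getAttribute("content"); // "" if absent
            if (name.length() > 0) {
                tags.put(name.toLowerCase(Locale.ROOT), content.trim());
            }
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            collect(children.item(i), tags);
        }
    }

    // Builds a small stand-in DOM holding the three tags from the question.
    public static Document sampleDocument() {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element html = doc.createElement("html");
            Element head = doc.createElement("head");
            doc.appendChild(html);
            html.appendChild(head);
            head.appendChild(meta(doc, "Title", " Introduction to computer science "));
            head.appendChild(meta(doc, "Author", " GRAHAM(Neill) "));
            head.appendChild(meta(doc, "Publishers", " West Pub Co "));
            return doc;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    private static Element meta(Document doc, String name, String content) {
        Element e = doc.createElement("meta");
        e.setAttribute("name", name);
        e.setAttribute("content", content);
        return e;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, String> e : extract(sampleDocument()).entrySet()) {
            System.out.println(e.getKey() + " = " + e.getValue());
        }
    }
}
```

In a real plugin you would call something like extract() on the node you are given and add the values to the parse metadata, as the tutorial does for its own tag.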

Regards,

Ronny 

-----Original Message-----
From: karan [mailto:[EMAIL PROTECTED] 
Sent: 20 June 2007 11:56
To: [EMAIL PROTECTED]
Subject: Re: meta data plugin needed

Dear sir,
I have the data in the following format:

<meta name="Title"content=" Introduction to computer science ">
<meta name="Author"content=" GRAHAM(Neill) ">
<meta name="Publishers"content=" West Pub Co ">

How do I extract the data from tags in this format?

On 6/20/07, Thorsten Scherler <[EMAIL PROTECTED]> wrote:
>
> On Wed, 2007-06-20 at 14:33 +0530, karan thakral wrote:
> > Hi,
> >
> > I need to write plugins for extracting the info from meta tags in HTML
> > documents. The HTML documents have meta tags such as Title, Publisher,
> > Creator and date.
> >
> > Are there already built-in plugins available with the Nutch
> > distribution, or will I have to write the plugin myself?
>
> Have a look at
> $NUTCH_HOME/src/plugin/parse-html/src/java/org/apache/nutch/parse/html/HtmlParser.java
>
>
> There is a line
> HTMLMetaProcessor.getMetaTags(metaTags, root, base);
>
> HTH
>
> salu2
> --
> Thorsten Scherler                                 thorsten.at.apache.org
> Open Source Java                      consulting, training and solutions
>
>

