Also take a look at the plugin tutorial. The example actually parses a meta content tag.
http://wiki.apache.org/nutch/WritingPluginExample-0%2e9

Regards,
Ronny

-----Original Message-----
From: karan [mailto:[EMAIL PROTECTED]
Sent: 20 June 2007 11:56
To: [EMAIL PROTECTED]
Subject: Re: meta data plugin needed

Dear sir,

I have the data in the following format:

<meta name="Title"content=" Introduction to computer science ">
<meta name="Author"content=" GRAHAM(Neill) ">
<meta name="Publishers"content=" West Pub Co ">

How do I extract data based on these formats?

On 6/20/07, Thorsten Scherler <[EMAIL PROTECTED]> wrote:
>
> On Wed, 2007-06-20 at 14:33 +0530, karan thakral wrote:
> > hi
> >
> > I need to write plugins for extracting the info from meta tags in HTML
> > documents. The HTML documents have meta tags such as Title, Publisher,
> > Creator and Date.
> >
> > Are there already built-in plugins available with the Nutch distribution,
> > or will I have to write the plugin myself?
>
> Have a look at
> $NUTCH_HOME/src/plugin/parse-html/src/java/org/apache/nutch/parse/html/HtmlParser.java
>
> There is a line
> HTMLMetaProcessor.getMetaTags(metaTags, root, base);
>
> HTH
>
> Regards
> --
> Thorsten Scherler thorsten.at.apache.org
> Open Source Java consulting, training and solutions

_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
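Inside Nutch, the route Thorsten points to is the one to follow: the parse-html plugin already collects meta tags through HTMLMetaProcessor, and the plugin tutorial linked above shows how to act on a meta tag. To make clear what "extracting name/content pairs" means for karan's sample data, here is a minimal standalone sketch. It is not Nutch code and not Nutch's API; the class MetaTagExtractor and its extract method are hypothetical names. It assumes the attributes appear in the order name then content, that values are double-quoted, and it tolerates the missing space between the attributes seen in the sample. A real HTML parser (as Nutch uses) is more robust than a regular expression.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical helper (not part of Nutch): pulls name/content pairs out of
 * <meta> tags with a regular expression.
 */
public class MetaTagExtractor {

    // Matches <meta name="..." content="..."> in that attribute order,
    // case-insensitively, allowing zero or more spaces between attributes.
    private static final Pattern META = Pattern.compile(
            "<meta\\s+name\\s*=\\s*\"([^\"]*)\"\\s*content\\s*=\\s*\"([^\"]*)\"[^>]*>",
            Pattern.CASE_INSENSITIVE);

    /** Maps each meta name (lower-cased) to its trimmed content value, in document order. */
    public static Map<String, String> extract(String html) {
        Map<String, String> result = new LinkedHashMap<>();
        Matcher m = META.matcher(html);
        while (m.find()) {
            String name = m.group(1).trim().toLowerCase(Locale.ROOT);
            String content = m.group(2).trim();
            result.put(name, content);
        }
        return result;
    }

    public static void main(String[] args) {
        // The sample data from the question above, verbatim.
        String html =
                "<meta name=\"Title\"content=\" Introduction to computer science \">\n"
              + "<meta name=\"Author\"content=\" GRAHAM(Neill) \">\n"
              + "<meta name=\"Publishers\"content=\" West Pub Co \">";
        extract(html).forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

Running main prints "title = Introduction to computer science", "author = GRAHAM(Neill)" and "publishers = West Pub Co", one per line. Lower-casing the names means lookups such as "title" work regardless of how the page capitalizes them.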
