On Wed, 2007-06-20 at 14:33 +0530, karan thakral wrote:
> Hi,
> 
> I need to write plugins that extract information from meta tags in HTML
> documents. The HTML documents have Title, Publisher, Creator and Date
> meta tags.
> 
> Are there already built-in plugins for this in the Nutch distribution, or
> will I have to write the plugin myself?

Have a look at 
$NUTCH_HOME/src/plugin/parse-html/src/java/org/apache/nutch/parse/html/HtmlParser.java

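It contains the line
HTMLMetaProcessor.getMetaTags(metaTags, root, base);
which collects the meta tags of the parsed page, so it is a good starting
point for your plugin.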
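
To illustrate the idea behind that call, here is a minimal standalone sketch
that walks a DOM tree and collects name/content pairs from meta elements. It
is not Nutch code: the class MetaTagSketch and the method extractMetaTags are
hypothetical names, it uses only the JDK's XML parser, and it assumes
well-formed XHTML input with lowercase element names. In a real plugin you
would work with the metaTags object that HtmlParser fills in; how to read
values from it depends on the HTMLMetaTags class in your Nutch version.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of Nutch: collects <meta name="..." content="...">
// pairs from a well-formed XHTML string.
class MetaTagSketch {

    static Map<String, String> extractMetaTags(String xhtml) throws Exception {
        // Parse the markup into a DOM tree (assumes well-formed input).
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xhtml)));

        Map<String, String> tags = new LinkedHashMap<>();
        NodeList metas = doc.getElementsByTagName("meta");
        for (int i = 0; i < metas.getLength(); i++) {
            Element meta = (Element) metas.item(i);
            // getAttribute returns "" when the attribute is absent.
            String name = meta.getAttribute("name");
            String content = meta.getAttribute("content");
            if (!name.isEmpty()) {
                // Lowercase the key so "Publisher" and "publisher" are treated alike.
                tags.put(name.toLowerCase(Locale.ROOT), content);
            }
        }
        return tags;
    }
}
```

Calling MetaTagSketch.extractMetaTags(...) on a page with Publisher and
Creator meta tags returns a map keyed by "publisher" and "creator"; meta
elements without a name attribute (for example http-equiv ones) are skipped.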

HTH

salu2
-- 
Thorsten Scherler                                 thorsten.at.apache.org
Open Source Java                      consulting, training and solutions


_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
