Hi, 
I am working on a focused crawler and need the HTML meta tag information in
ParseOutputFormat.java. That class gives me the parse of the HTML page, so is
there a way to extract the HTML meta tag values through parse.getData()?

For example, for this HTML page:

<html lang="hi"><head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8"><!-- Generated by Topcat -->
<title>BBCHindi.com</title>
<meta name="keywords" content="BBC, Hindi News, Politics">

I would like to extract the content of the keywords meta tag through the parse.
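
To make the goal concrete, here is a minimal standalone sketch of the result I
am after. It does not use any Nutch API: it runs a plain regular expression
over the raw HTML. The class name MetaKeywords and the method extractKeywords
are hypothetical names chosen for illustration, and the pattern assumes that
the name attribute comes before the content attribute, as in the page above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MetaKeywords {
    // Assumption: the tag looks like <meta name="keywords" content="...">,
    // with name before content. Attribute names are matched case-insensitively.
    private static final Pattern KEYWORDS = Pattern.compile(
        "<meta\\s+name\\s*=\\s*[\"']keywords[\"']\\s+content\\s*=\\s*[\"']([^\"']*)[\"']",
        Pattern.CASE_INSENSITIVE);

    /** Returns the content of the keywords meta tag, or null if there is none. */
    public static String extractKeywords(String html) {
        Matcher m = KEYWORDS.matcher(html);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // The sample page from this message, written as one string.
        String html = "<html lang=\"hi\"><head>"
            + "<meta http-equiv=\"Content-Type\" content=\"text/html; charset=utf-8\">"
            + "<title>BBCHindi.com</title>"
            + "<meta name=\"keywords\" content=\"BBC, Hindi News, Politics\">";
        System.out.println(extractKeywords(html));
    }
}
```

What I am looking for is the equivalent of this value, but obtained from the
parse inside ParseOutputFormat.java rather than by re-scanning the raw HTML.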
-- 
View this message in context: 
http://www.nabble.com/Problem-Extracting-HTML-Meta-Tags-tf3490430.html#a9747835
Sent from the Nutch - Dev mailing list archive at Nabble.com.


