Hi, I am working on a focused crawler for which I need the HTML meta tag information in ParseOutputFormat.java. It provides me with the parse of the HTML page, so is there a way to extract the values of the HTML meta tags through parse.getData()?
For example, for this HTML page:

<html lang="hi"><head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8"><!-- Generated by Topcat -->
<title>BBCHindi.com</title>
<meta name="keywords" content="BBC, Hindi News, Politics">

I would like to extract the content of the keywords meta tag through the parse.

--
View this message in context: http://www.nabble.com/Problem-Extracting-HTML-Meta-Tags-tf3490430.html#a9747835
Sent from the Nutch - Dev mailing list archive at Nabble.com.
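
To make the goal concrete, here is a minimal standalone sketch of pulling name/content pairs out of meta tags like the ones in the sample page. It does not use any Nutch API: the class name MetaTagSketch and the method extractMetaTags are hypothetical, and a simple regex stands in for a real HTML parser, so it only handles quoted attribute values. Inside Nutch, the usual route is probably an HtmlParseFilter plugin that copies such values into the parse metadata, which could then be read through parse.getData().getParseMeta(); treat that as an assumption to check against your Nutch version.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Nutch: collects <meta name="..." content="..."> pairs.
public class MetaTagSketch {

    // Matches one complete <meta ...> tag, case-insensitively.
    private static final Pattern META_TAG =
            Pattern.compile("<meta\\b[^>]*>", Pattern.CASE_INSENSITIVE);

    // Matches attr="value" or attr='value' inside a tag (unquoted values are ignored).
    private static final Pattern ATTR =
            Pattern.compile("([\\w-]+)\\s*=\\s*(\"([^\"]*)\"|'([^']*)')");

    // Returns a map from lower-cased meta name to its content value.
    // Tags without a name attribute (e.g. http-equiv tags) are skipped.
    public static Map<String, String> extractMetaTags(String html) {
        Map<String, String> result = new LinkedHashMap<>();
        Matcher tag = META_TAG.matcher(html);
        while (tag.find()) {
            String name = null;
            String content = null;
            Matcher attr = ATTR.matcher(tag.group());
            while (attr.find()) {
                String key = attr.group(1).toLowerCase(Locale.ROOT);
                String value = attr.group(3) != null ? attr.group(3) : attr.group(4);
                if (key.equals("name")) {
                    name = value;
                } else if (key.equals("content")) {
                    content = value;
                }
            }
            if (name != null && content != null) {
                result.put(name.toLowerCase(Locale.ROOT), content);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String html = "<html lang=\"hi\"><head>"
                + "<meta http-equiv=\"Content-Type\" content=\"text/html; charset=utf-8\">"
                + "<title>BBCHindi.com</title>"
                + "<meta name=\"keywords\" content=\"BBC, Hindi News, Politics\">";
        // Look up the keywords entry extracted from the sample page.
        System.out.println(extractMetaTags(html).get("keywords"));
    }
}
```

In a real crawler a DOM-based approach would be more robust than a regex, since the parsed document is already available during the parse step; the regex here only keeps the sketch free of external dependencies.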