Actually did it tonight. Go to https://issues.apache.org/jira/browse/NUTCH-460 and download the zip file attachment. The plugin you can use as an example is RDFLinkParseFilter, specifically the functions walk and findRDFLink.
Those functions walk the DOM tree looking for link elements with certain characteristics. Once they're found, the links are logged.

I expect that you want to look for embed tags, and won't wish to stop after you reach the body tag like I do, but the principle is basically the same.

While this code is written against the trunk, I'm sure it'll point you in the right direction if you're using an older version.

I hope this helps you, but do let me know if you have any questions.

Regards,

Ricardo J. Méndez
http://ricardo.strangevistas.net/

On 3/15/07, Ratnesh,V2Solutions India <[EMAIL PROTECTED]> wrote:
>
> Hi
> as per our application I have to index and crawl only those html pages which
> contains videos and mp3.
> I have already studied writing plugin example given in wikki
> http://wiki.apache.org/nutch/WritingPluginExample but what i found here is
> that , this example helps to recommend a perticular page if we search for
> keyword plugin.
> and its necessary that in the html page there should be <meta
> name="recommended" content="plugins" />
> tags.
>
> So the example given in wikki not helping me out.........
>
> Can you please give me some more input , so that I could test my application
> ???
>
> Looking to hear u soon.........
>
> Ratnesh V2solutions India
> --
> View this message in context:
> http://www.nabble.com/help-me-in-writing-plugin-for-extracting-tag-from-a-HTML-page-tf3412605.html#a9508714
> Sent from the Nutch - User mailing list archive at Nabble.com.
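
P.S. To make the DOM walk I described at the top a bit more concrete, here is a minimal sketch of the same idea applied to embed tags. This is not the code from the NUTCH-460 attachment: the class name EmbedTagWalker and the method findEmbedSources are made up for illustration, and it only uses the standard org.w3c.dom classes from the JDK, with no Nutch APIs. I'm also assuming that the src attribute is what you want to collect. The string-parsing helper is only there so the sketch can be run on its own, and it expects well-formed markup.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of Nutch or of the NUTCH-460 attachment.
final class EmbedTagWalker {

    // Visits the node and all of its descendants in document order,
    // recording the src attribute of every embed element that has one.
    static void walk(Node node, List<String> found) {
        // Compare case-insensitively, since some HTML parsers report
        // element names in upper case.
        if (node.getNodeType() == Node.ELEMENT_NODE
                && "embed".equalsIgnoreCase(node.getNodeName())) {
            String src = ((Element) node).getAttribute("src");
            if (!src.isEmpty()) {
                found.add(src);
            }
        }
        // Unlike a head-only search, keep descending through the whole tree.
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            walk(children.item(i), found);
        }
    }

    // Entry point when you already have a DOM node in hand.
    static List<String> findEmbedSources(Node root) {
        List<String> found = new ArrayList<>();
        walk(root, found);
        return found;
    }

    // Convenience for trying the walk outside a crawler: parses a
    // well-formed (XHTML-like) string with the JDK's XML parser.
    static List<String> findEmbedSources(String wellFormedMarkup) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(wellFormedMarkup)));
            return findEmbedSources(doc);
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse markup", e);
        }
    }
}
```

In a parse filter you would skip the string helper and call findEmbedSources on the DOM node the filter is handed, then decide from the returned list whether the page is worth indexing.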
