I actually did this tonight.  Go to
https://issues.apache.org/jira/browse/NUTCH-460 and download the zip
file attachment.  The plugin you can use as an example is
RDFLinkParseFilter, specifically the functions walk and findRDFLink.

Those functions walk the DOM tree looking for a link element with
certain characteristics.  Once the links are found, they are logged.  I
expect that you will want to look for embed tags instead, and that you
won't want to stop once you reach the body tag as I do, but the
principle is basically the same.
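
To give a rough idea of the shape of such a walk, here is a minimal
sketch.  It is not the code from the NUTCH-460 attachment: the class name
EmbedFinder and its methods are made up for illustration, and it assumes
the page is already available as a standard org.w3c.dom tree.  To keep
the example self-contained it builds that tree from a small well-formed
XHTML string with the JDK's own parser; inside a plugin you would walk
whatever DOM your parser hands you instead.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of Nutch: collects the src attribute of
// every embed element found anywhere in a DOM tree.
class EmbedFinder {

    // Assumption: the input is well-formed XHTML, so the stock XML parser
    // can read it. Real-world HTML usually needs a more lenient parser.
    static Document parse(String xhtml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(
                        xhtml.getBytes(StandardCharsets.UTF_8)));
    }

    // Depth-first walk over the whole tree. Unlike a search for link
    // elements, it does not stop at the body tag, since embed tags
    // live inside the body.
    static void walk(Node node, List<String> found) {
        if (node.getNodeType() == Node.ELEMENT_NODE
                && "embed".equalsIgnoreCase(node.getNodeName())) {
            // getAttribute returns an empty string when src is missing.
            String src = ((Element) node).getAttribute("src");
            if (!src.isEmpty()) {
                found.add(src);
            }
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            walk(children.item(i), found);
        }
    }

    static List<String> findEmbedSources(Document doc) {
        List<String> found = new ArrayList<>();
        walk(doc.getDocumentElement(), found);
        return found;
    }

    public static void main(String[] args) throws Exception {
        String page = "<html><head><title>t</title></head><body>"
                + "<p>intro</p><embed src=\"clip.mp3\"/>"
                + "<div><embed src=\"movie.swf\"/></div></body></html>";
        // Prints the src values in document order.
        System.out.println(findEmbedSources(parse(page)));
    }
}
```

A page for which this returns an empty list has no embed tags with a src
attribute, which is the kind of check you could use to decide whether a
page is worth indexing.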

While this code is written against the trunk, I'm sure it'll point you
in the right direction if you're using an older version.

I hope this helps you, but do let me know if you have any questions.

Regards,

Ricardo J. Méndez
http://ricardo.strangevistas.net/



On 3/15/07, Ratnesh,V2Solutions India
<[EMAIL PROTECTED]> wrote:
>
> Hi,
> For our application I have to crawl and index only those HTML pages that
> contain videos and MP3s.
> I have already studied the plugin-writing example given in the wiki,
> http://wiki.apache.org/nutch/WritingPluginExample, but what I found is
> that the example only helps to recommend a particular page when we search
> for the keyword plugin,
> and it requires that the HTML page contain a
> <meta name="recommended" content="plugins" /> tag.
>
> So the example given in the wiki is not helping me.
>
> Can you please give me some more input, so that I can test my application?
>
> Looking forward to hearing from you soon.
>
> Ratnesh V2solutions India
