On Tue, Jan 13, 2009 at 5:38 PM, ahammad <[email protected]> wrote:
>
> Hello,
>
> I have been using Nutch for a few days now, and it seems to be working
> great. One thing that I do need is the ability to index HTML meta tags from
> pages. I'm using Nutch to search some articles, so there are tags like
> "author" in the HTML pages. From searching the mailing list, I saw that
> there were a few requests made last year for this, but that there was no
> built-in functionality. Is this accurate?
>
> A few people suggested writing plug-ins, while others claimed that you
> could modify certain files to do the job. Is there a simple way to do this
> or do I have no choice but to write a plug-in for it?
>

No, unfortunately you will have to write a plug-in for it. I have something
in mind that will make extracting data from HTML pages easier, but that's
for post-1.0.
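As a rough illustration of the extraction step such a plug-in would perform, here is a standalone Java sketch that pulls name/content pairs out of `<meta>` tags. It does not use Nutch's plug-in API. The class `MetaTagExtractor` and its `extract` method are hypothetical, and the regex approach is a simplification: a real plug-in would more likely work from the DOM produced by Nutch's HTML parser and then add the values (e.g. "author") as index fields.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Nutch: extracts <meta name=... content=...> pairs.
public class MetaTagExtractor {
    // Matches a whole <meta ...> tag, case-insensitively (assumes no '>' inside attribute values).
    private static final Pattern META_TAG =
        Pattern.compile("<meta\\b[^>]*>", Pattern.CASE_INSENSITIVE);
    // Matches attr="value" or attr='value' inside a single tag.
    private static final Pattern ATTRIBUTE =
        Pattern.compile("([\\w-]+)\\s*=\\s*(?:\"([^\"]*)\"|'([^']*)')");

    // Returns a map from lower-cased meta name (e.g. "author") to its content value.
    public static Map<String, String> extract(String html) {
        Map<String, String> result = new HashMap<>();
        Matcher tagMatcher = META_TAG.matcher(html);
        while (tagMatcher.find()) {
            String name = null;
            String content = null;
            Matcher attr = ATTRIBUTE.matcher(tagMatcher.group());
            while (attr.find()) {
                String key = attr.group(1).toLowerCase(Locale.ROOT);
                // Group 2 holds double-quoted values, group 3 single-quoted ones.
                String value = attr.group(2) != null ? attr.group(2) : attr.group(3);
                if (key.equals("name")) {
                    name = value.toLowerCase(Locale.ROOT);
                } else if (key.equals("content")) {
                    content = value;
                }
            }
            // Tags without both attributes (e.g. http-equiv) are skipped.
            if (name != null && content != null) {
                result.put(name, content);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String html = "<html><head>"
            + "<meta name=\"author\" content=\"Jane Doe\">"
            + "<meta content='nutch, search' name='keywords'>"
            + "</head><body></body></html>";
        Map<String, String> tags = extract(html);
        System.out.println(tags.get("author"));
        System.out.println(tags.get("keywords"));
    }
}
```

Attribute order does not matter here, which is why each tag's attributes are scanned individually rather than matched with one fixed pattern.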

> I read http://wiki.apache.org/nutch/WritingPluginExample-0%2e9 but it seems
> somewhat overwhelming at this point. Any suggestions would be helpful.
>
> Thanks.
>
> Cheers
> --
> View this message in context: 
> http://www.nabble.com/Indexing-HTML-meta-tags-tp21438171p21438171.html
> Sent from the Nutch - User mailing list archive at Nabble.com.
>
>



-- 
Doğacan Güney
