On Tue, Jan 13, 2009 at 5:38 PM, ahammad <[email protected]> wrote:
>
> Hello,
>
> I have been using Nutch for a few days now, and it seems to be working
> great. One thing that I do need is the ability to index HTML meta tags from
> pages. I'm using Nutch to search some articles, so there are tags like
> "author" in the HTML pages. From searching the mailing list, I saw that
> there were a few requests made last year for this, but that there was no
> built-in functionality. Is this accurate?
>
> A few people suggested writing plug-ins while some others claimed that you
> could modify certain files to do the job. Is there a simple way to do this
> or do I have no choice but to write a plug-in for it?
>
No, unfortunately you will have to write a plug-in for it. I have something
in mind that will make extracting data from HTML pages easier, but that's
for post-1.0.

> I read http://wiki.apache.org/nutch/WritingPluginExample-0%2e9 but it seems
> somewhat overwhelming at this point. Any suggestions would be helpful.
>
> Thanks.
>
> Cheers
> --
> View this message in context:
> http://www.nabble.com/Indexing-HTML-meta-tags-tp21438171p21438171.html
> Sent from the Nutch - User mailing list archive at Nabble.com.
>
>

--
Doğacan Güney
