Hello, I have been using Nutch for a few days now, and it seems to be working great. One thing that I do need is the ability to index HTML meta tags from pages. I'm using Nutch to search some articles, so there are tags like "author" in the HTML pages. From searching the mailing list, I saw that there were a few requests for this last year, but that there was no built-in functionality. Is this accurate?
A few people suggested writing plug-ins, while others claimed that you could modify certain files to do the job. Is there a simple way to do this, or do I have no choice but to write a plug-in for it? I read http://wiki.apache.org/nutch/WritingPluginExample-0%2e9 but it seems somewhat overwhelming at this point. Any suggestions would be helpful. Thanks. Cheers
