Thanks for the reply. I will create a new list about writing plug-ins since it is technically a new topic.
If anyone else has suggestions, please add them. I read somewhere that we can copy the existing index-more plugin and add a few lines so that it reads meta tags and indexes them. Any ideas about that?

Cheers,

Doğacan Güney-3 wrote:
>
> On Tue, Jan 13, 2009 at 5:38 PM, ahammad <[email protected]> wrote:
>>
>> Hello,
>>
>> I have been using Nutch for a few days now, and it seems to be working
>> great. One thing that I do need is the ability to index HTML meta tags
>> from pages. I'm using Nutch to search some article, so there are tags
>> like "author" in the html pages. From searching the mailing list, I saw
>> that there were a few requests made last year for this, but that there
>> was no built-in functionality. Is this accurate?
>>
>> A few people suggested writing plug-ins while some other claimed that
>> you could modify certain files to do the job. Is there a simple way to
>> do this or do I have no choice but to write a plug-in for it?
>>
>
> No unfortunately you will have to write a plug-in for it. I have
> something in mind that will make extracting data from html pages
> easier, but that's for post-1.0.
>
>> I read http://wiki.apache.org/nutch/WritingPluginExample-0%2e9 but it
>> seems somewhat overwhelming at this point. Any suggestions would be
>> helpful.
>>
>> Thanks.
>>
>> Cheers
>> --
>> View this message in context:
>> http://www.nabble.com/Indexing-HTML-meta-tags-tp21438171p21438171.html
>> Sent from the Nutch - User mailing list archive at Nabble.com.
>>
>
> --
> Doğacan Güney
>

--
View this message in context: http://www.nabble.com/Indexing-HTML-meta-tags-tp21438171p21441215.html
Sent from the Nutch - User mailing list archive at Nabble.com.
