If I understand you correctly, you want to enrich your index documents with additional information (metadata)? This is possible, but you have to create new plugins for both indexing and querying.
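To illustrate what the indexing half of such a plugin does, here is a minimal sketch. It is not the Nutch API: documents and parse metadata are modelled as plain `Map`s, and the field names `publisher` and `creator` are assumptions taken from the question further down. In Nutch itself the extension point on the indexing side is `org.apache.nutch.indexer.IndexingFilter`; check the 0.9 source (e.g. index-more) for its exact method signature before writing a real plugin.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical sketch of an indexing filter: copy selected metadata
 * produced by the parser into extra fields of the index document.
 * NOT the Nutch API; documents are modelled as plain maps.
 */
class MetadataIndexingSketch {

    /** Metadata keys to copy into the index (assumed names). */
    static final String[] EXTRA_FIELDS = {"publisher", "creator"};

    /**
     * Returns a copy of the index document with any EXTRA_FIELDS found in
     * the parse metadata added. A field the parser did not extract is
     * simply left out, since there is nothing to index for it.
     */
    static Map<String, String> filter(Map<String, String> indexDoc,
                                      Map<String, String> parseMetadata) {
        Map<String, String> result = new LinkedHashMap<>(indexDoc);
        for (String field : EXTRA_FIELDS) {
            String value = parseMetadata.get(field);
            if (value != null && !value.isEmpty()) {
                result.put(field, value);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("title", "Example page");
        doc.put("content", "Some text");

        Map<String, String> meta = new LinkedHashMap<>();
        meta.put("creator", "Jane Doe"); // the parser found a creator
        // no "publisher" entry: the parser did not provide one

        // prints {title=Example page, content=Some text, creator=Jane Doe}
        System.out.println(filter(doc, meta));
    }
}
```

A matching query-side plugin is still needed before a search such as `creator:doe` can match the new field.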
I had a similar need, but I got the content I needed from the URL with a simple regex. Otherwise, I guess you have to intercept the parsing step for the document type you want to store additional metadata from. I used the plugin documentation/tutorial mentioned here (http://wiki.apache.org/nutch/WritingPluginExample-0%2e9) and created my own plugins. The existing plugins are also a great reference if you download the source from SVN. If the metadata you need is not supported by the parser, I guess it cannot be done.

Regards,
Ronny

-----Original Message-----
From: karan thakral [mailto:[EMAIL PROTECTED]
Sent: 19 June 2007 13:39
To: [EMAIL PROTECTED]
Subject: Re: doubt about indexing

Hi Ronny,

Thanks for your reply. As I told you, I am new to Nutch. I thought that to add new fields you would have to change the indexing code. Actually, I want to add metadata to all the documents. How can this be accomplished, and what changes have to be made in the plugin configuration files of the plugin folders you told me about?

Awaiting reply

On 6/19/07, Naess, Ronny <[EMAIL PROTECTED]> wrote:
>
> Take a look at the plugin directory. You will probably have to make changes
> in some or all of these: index-basic, index-more, query-basic and
> query-more.
>
> Regards,
> Ronny
>
> -----Original Message-----
> From: karan thakral [mailto:[EMAIL PROTECTED]
> Sent: 19 June 2007 12:09
> To: [EMAIL PROTECTED]
> Subject: doubt about indexing
>
> hi
>
> I am new to Nutch. I am using nutch-0.9 on Fedora Core. I want to know
> which fields Nutch indexes by default while performing the crawl
> operation.
> I also want to change the fields of the documents to title, contents,
> publisher and creator. How can I do that? After that I want to
> perform a field-based search; can that search be done using the
> Nutch web interface?
>
> --
> With Regards
> Karan Thakral

--
With Regards
Karan Thakral
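The URL-regex shortcut Ronny describes avoids the parser entirely, because the value is derived from the URL alone. A minimal Java sketch of that idea follows. The URL layout (first path segment names a section, e.g. `http://example.com/reports/2007/q2.html` gives `reports`) and the class and method names are hypothetical; in a real plugin the extracted value would be added as a field by the indexing filter.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch: derive an extra metadata value from the URL alone,
 * so no parser support is needed. Assumes, for illustration only, that the
 * first path segment of the URL names a section.
 */
class UrlMetadataSketch {

    // Scheme and host, then capture the first path segment up to the next '/'.
    private static final Pattern FIRST_SEGMENT =
            Pattern.compile("^https?://[^/]+/([^/?#]+)/");

    /** Returns the first path segment, or empty if the URL has no such segment. */
    static Optional<String> sectionFromUrl(String url) {
        Matcher m = FIRST_SEGMENT.matcher(url);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        // prints Optional[reports]
        System.out.println(sectionFromUrl("http://example.com/reports/2007/q2.html"));
        // prints Optional.empty (no directory segment before the file name)
        System.out.println(sectionFromUrl("http://example.com/index.html"));
    }
}
```

Requiring a trailing `/` after the captured segment means a bare file name directly under the host is not mistaken for a section.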
_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
