If I understand you correctly, you want to enrich your indexed documents
with additional information (metadata)?
This is possible, but you have to create new plugins for both indexing
and querying.

I had a similar need, but I got the content I needed from the URL with a
simple regex. In your case I guess you have to intercept the parsing of
the document type you want to store additional metadata from.
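
The URL approach can be sketched roughly as below. This is only an
illustration, not the actual plugin code: the class name
UrlMetadataSketch, the method publisherFromUrl and the URL layout
(http://host/docs/<publisher>/...) are all assumptions made up for the
example, and it uses only the standard java.util.regex classes.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UrlMetadataSketch {

    // Hypothetical URL layout: http://host/docs/<publisher>/<rest of path>.
    // Group 1 captures the path segment that follows "/docs/".
    private static final Pattern PUBLISHER =
            Pattern.compile("^https?://[^/]+/docs/([^/]+)/");

    // Returns the publisher segment if the URL matches the assumed layout,
    // otherwise an empty Optional.
    public static Optional<String> publisherFromUrl(String url) {
        Matcher m = PUBLISHER.matcher(url);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        String url = "http://example.com/docs/acme/report.html";
        System.out.println(publisherFromUrl(url).orElse("(none)"));
    }
}
```

In an indexing plugin, the extracted value would then be added to the
document as an extra field. How that is done depends on the plugin API
of your Nutch version, so it is left out here.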

I followed the plugin documentation/tutorial here
(http://wiki.apache.org/nutch/WritingPluginExample-0%2e9) and created my
own plugins. The existing plugins are also a great reference if you
download the source from svn. If the metadata you need is not supported
by the parser, I guess it cannot be done.

Regards,
Ronny
 

-----Original message-----
From: karan thakral [mailto:[EMAIL PROTECTED] 
Sent: 19 June 2007 13:39
To: [EMAIL PROTECTED]
Subject: Re: doubt about indexing

Hi Ronny

Thanks for your concerned reply. As I told you, I am new to Nutch. I
thought that to add new fields you might have to change the indexing
code. What I actually want is to add metadata to all the documents. How
can this be accomplished, and what changes have to be made in the plugin
configuration files in the plugin folders you mentioned?

Awaiting your reply.

On 6/19/07, Naess, Ronny <[EMAIL PROTECTED]> wrote:
>
> Take a look at the plugin directory. You will probably have to make
> some changes in some or all of these: index-basic, index-more,
> query-basic and query-more.
>
> Regards,
> Ronny
>
> -----Original message-----
> From: karan thakral [mailto:[EMAIL PROTECTED]
> Sent: 19 June 2007 12:09
> To: [EMAIL PROTECTED]
> Subject: doubt about indexing
>
> Hi,
>
> I am new to Nutch. I am using nutch-0.9 on Fedora Core. I want to know
> which default fields Nutch indexes while performing the crawl
> operation.
> I also want to change the fields of the documents to title, contents,
> publisher and creator. How can I do that? I then want to perform a
> field-based search. Can that search be done using the Nutch web
> interface?
>
> --
> With Regards
> Karan Thakral
>
>
> 
>



--
With Regards
Karan Thakral


_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
