By default, Nutch uses the index-basic plugin (see the plugin.includes
property in nutch-default.xml).
This plugin (org.apache.nutch.indexer.basic.BasicIndexingFilter) indexes
each document with the following fields:

host, site, url, content, anchor, title, tstamp (and cache if allowed)
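
To find out which indexing plugins are active in your own setup, you can
test plugin ids against the value of plugin.includes. The sketch below is
only an illustration: the class name PluginIncludesCheck and the sample
regex are made up for this example, and it assumes that Nutch matches each
plugin id against the whole expression. Copy the real value from your
nutch-default.xml (or nutch-site.xml if you override it there).

```java
import java.util.regex.Pattern;

public class PluginIncludesCheck {

    // Returns true if the plugin id is matched in full by the regex.
    // Assumption: Nutch applies plugin.includes as a full match on the id.
    public static boolean isIncluded(String includesRegex, String pluginId) {
        return Pattern.compile(includesRegex).matcher(pluginId).matches();
    }

    public static void main(String[] args) {
        // Illustrative value only, not necessarily your actual default.
        String includes =
            "protocol-http|urlfilter-regex|parse-(text|html)|index-basic|query-(basic|site|url)";

        System.out.println("index-basic: " + isIncluded(includes, "index-basic")); // true
        System.out.println("index-more:  " + isIncluded(includes, "index-more"));  // false
    }
}
```

Any id for which this returns true would be loaded, so an extra index-*
plugin in the expression means more fields than the ones listed here.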

The fields digest, segment and boost are added to every document by
org.apache.nutch.indexer.Indexer itself, because Nutch needs them
regardless of which indexing filter is used.

Mathijs

Daniel Clark wrote:
> Which indexFilter plugin does Nutch use out-of-the-box?  Or how do I find
> out?  I'm trying to figure out how the following fields are being indexed.
>
> anchor
> boost
> content
> digest
> host
> segment
> site
> title
> tstamp
> url

_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
