By default, Nutch uses the index-basic plugin (see the plugin.includes property in nutch-default.xml). This plugin (org.apache.nutch.indexer.basic.BasicIndexingFilter) indexes a document using the following fields:
host, site, url, content, anchor, title, tstamp (and cache, if allowed).

The fields digest, segment and boost are added to every document by org.apache.nutch.indexer.Indexer, because Nutch needs them regardless of which indexing filter is used.

Mathijs Daniel Clark wrote:
> Which indexFilter plugin does Nutch use out-of-the-box? Or how do I find
> out? I'm trying to figure out how the following fields are being indexed.
>
> anchor
> boost
> content
> digest
> host
> segment
> site
> title
> tstamp
> url

_______________________________________________
Nutch-general mailing list
https://lists.sourceforge.net/lists/listinfo/nutch-general
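
As for the "how do I find out?" part of the question, one way is to read the plugin.includes property directly from the configuration file. The following is only a rough sketch, not anything that ships with Nutch: SAMPLE_CONF is a made-up excerpt in the usual <configuration>/<property> layout with an illustrative value, the helper names get_property and plugin_enabled are hypothetical, and it assumes the property value is a regular expression that has to match the whole plugin id.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical excerpt in the layout of a Nutch configuration file.
# The value is illustrative only; check your own nutch-default.xml.
SAMPLE_CONF = """<configuration>
  <property>
    <name>plugin.includes</name>
    <value>protocol-http|urlfilter-regex|parse-(text|html)|index-basic|query-(basic|site|url)</value>
  </property>
</configuration>
"""


def get_property(conf_xml, name):
    """Return the value of the named property, or None if it is absent."""
    root = ET.fromstring(conf_xml)
    for prop in root.iter("property"):
        if prop.findtext("name") == name:
            return (prop.findtext("value") or "").strip()
    return None


def plugin_enabled(includes, plugin_id):
    """Assumption: the plugin id must match the whole includes regex."""
    return re.fullmatch(includes, plugin_id) is not None


if __name__ == "__main__":
    includes = get_property(SAMPLE_CONF, "plugin.includes")
    for plugin_id in ("index-basic", "index-more"):
        state = "enabled" if plugin_enabled(includes, plugin_id) else "not enabled"
        print(plugin_id, "is", state)
```

To check a real installation, pass the text of conf/nutch-default.xml instead of SAMPLE_CONF, and remember that a plugin.includes entry in nutch-site.xml overrides the default.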
