Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change 
notification.

The "IndexStructure" page has been changed by SebastianNagel:
https://wiki.apache.org/nutch/IndexStructure?action=diff&rev1=20&rev2=21

Comment:
Add field 'id' (cf. NUTCH-1708)

  
  The index structure formed after indexing is shown below : 
  
- ||'''Field Name'''||'''Stored'''||'''Index'''|| '''Plugin/Class''' ||'''Comment'''|| '''version'''||
+ ||'''Field Name'''||'''Stored'''||'''Index'''|| '''Plugin/Class''' ||'''Comment'''||<-2> '''version'''||
  || || || || || || '''1.x''' || '''2.x''' ||
+ ||      id      ||      YES   ||      Indexed, Un-Tokenized   || [[http://nutch.apache.org/apidocs/apidocs-1.8/org/apache/nutch/indexer/IndexerMapReduce.html|IndexerMapReduce]]/[[http://nutch.apache.org/apidocs/apidocs-2.2.1/org/apache/nutch/indexer/IndexUtil.html|IndexUtil]] || '''URL''' used as '''ID''' to update and delete documents || X || X ||
  ||    boost    ||     YES     ||      Not Indexed     || various scoring plugins || Adds a '''score''' value field to a particular document. This is allocated based upon its importance within the webgraph. || ?  || ? ||
  ||    digest  ||      YES     ||      Not Indexed     || org.apache.nutch.indexer.IndexerMapReduce.java || Adds a '''message digest''' field to a document. Can be MD5 over content and headers or more sophisticated text profile of the content. ||  ?  || ? ||
  ||    lang    ||      YES     ||      Un-Tokenized    ||      language-identifier || Add a '''lang''', language field to a document.||  ?  || ? ||
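
As a rough illustration of the four fields listed above, the sketch below builds a single index document as a plain dictionary. It is not Nutch code: the function name `build_index_doc`, the example URL, and the default `boost` and `lang` values are all hypothetical, and the digest is computed as a plain MD5 over the raw content bytes only, which is just one of the options the table mentions.

```python
import hashlib


def build_index_doc(url, content, boost=1.0, lang="en"):
    """Return a dict holding the id, boost, digest and lang fields.

    Hypothetical helper for illustration only; Nutch itself fills these
    fields in its indexer classes and plugins.
    """
    return {
        # The URL doubles as the document ID used for updates and deletes.
        "id": url,
        # Score assigned by a scoring plugin (assumed default of 1.0 here).
        "boost": boost,
        # Message digest of the content; MD5 over the raw bytes in this sketch.
        "digest": hashlib.md5(content).hexdigest(),
        # Language code, as a language identifier would supply it (assumed "en").
        "lang": lang,
    }


doc = build_index_doc("http://example.org/", b"hello world")
print(doc["id"], doc["digest"], doc["lang"])
```

Because the URL is the ID, indexing the same URL again replaces the earlier document rather than adding a duplicate, and a changed digest signals that the content itself changed.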
