Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change 
notification.

The following page has been changed by EnisSoztutar:
http://wiki.apache.org/nutch/IndexStructure

The comment on the change is:
created from scratch

New page:
= The Index Structure =

The index structure formed after indexing is shown below:

||'''FieldName'''||'''Stored'''||'''Index'''||'''IndexingFilter'''||'''Comment'''||
|| boost || YES || NotIndexed || Indexer || ||
|| digest || YES || NotIndexed || Indexer || ||
|| lang || YES || UnTokenized || language-identifier || ||
|| segment || YES || NotIndexed || Indexer || ||
|| tstamp || YES || Tokenized || Indexer || ||
|| anchor || NO || Tokenized || index-basic || ||
|| title || YES || Tokenized || index-basic || also by index-more ||
|| site || NO || UnTokenized || index-basic || ||
|| host || NO || Tokenized || index-basic || hostname ||
|| url || YES || Tokenized || index-basic || ||
|| content || NO || Tokenized || index-basic || content ||
|| lastModified || YES || NotIndexed || index-more || ||
|| date || NO || UnTokenized || index-more || ||
|| contentLength || YES || NotIndexed || index-more || ||
|| type || NO || UnTokenized || index-more || contentType,primaryType,subType (all mime-types) ||
|| primaryType || YES || UnTokenized || index-more || primaryType (mime-type) ||
|| subType || YES || UnTokenized || index-more || subType (mime-type) ||
|| domain || NO || Tokenized || index-domain || see http://issues.apache.org/jira/browse/NUTCH-445 ||
|| tld || YES || UnTokenized / NotStored (based on conf) || tld || see http://issues.apache.org/jira/browse/NUTCH-439 ||
|| category || NO || UnTokenized || index-url-category || see http://issues.apache.org/jira/browse/NUTCH-386 ||
|| subcollection || YES || Tokenized || subcollection || see subcollection plugin ||
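
To make the Stored and Index columns concrete, here is a minimal Python sketch that copies a few rows from the table and derives which fields can be searched and which can be read back from a hit. It assumes the columns follow the usual Lucene meaning (Stored: the original value is kept in the index; Tokenized: indexed after analysis; UnTokenized: indexed as a single term; NotIndexed: not searchable). The names `FieldSpec`, `searchable` and `retrievable` are hypothetical and are not part of Nutch.

```python
from typing import List, NamedTuple


class FieldSpec(NamedTuple):
    """One row of the table (hypothetical helper type, not a Nutch class)."""
    name: str
    stored: bool   # "Stored" column: YES -> True, NO -> False
    index: str     # "Index" column: "Tokenized", "UnTokenized" or "NotIndexed"
    source: str    # "IndexingFilter" column


# A subset of the rows, copied from the table.
FIELDS = [
    FieldSpec("boost", True, "NotIndexed", "Indexer"),
    FieldSpec("lang", True, "UnTokenized", "language-identifier"),
    FieldSpec("anchor", False, "Tokenized", "index-basic"),
    FieldSpec("title", True, "Tokenized", "index-basic"),
    FieldSpec("site", False, "UnTokenized", "index-basic"),
    FieldSpec("content", False, "Tokenized", "index-basic"),
    FieldSpec("contentLength", True, "NotIndexed", "index-more"),
]


def searchable(fields: List[FieldSpec]) -> List[str]:
    """Fields that a query can match (anything that is indexed)."""
    return [f.name for f in fields if f.index != "NotIndexed"]


def retrievable(fields: List[FieldSpec]) -> List[str]:
    """Fields whose original value can be read back from a search hit."""
    return [f.name for f in fields if f.stored]


if __name__ == "__main__":
    print("searchable: ", searchable(FIELDS))
    print("retrievable:", retrievable(FIELDS))
```

Under that reading, a field such as content can be queried but not displayed from the index, while boost is available on a hit but cannot be searched.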

----
Jira issues about indexing and IndexingFilter plugins:

 * [http://issues.apache.org/jira/browse/NUTCH-445 DomainIndexingFilter]
 * [http://issues.apache.org/jira/browse/NUTCH-439 TLDIndexingFilter]
 * [http://issues.apache.org/jira/browse/NUTCH-422 index-extra plugin]
 * [http://issues.apache.org/jira/browse/NUTCH-386 index-url-categories] 


----


The index plugins to include are:

 index-(basic | more | extra | domain | url-category) | tld | subcollection
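
The line above reads like a fragment of the regular expression used in Nutch's `plugin.includes` property. As a rough sketch, the Python snippet below treats it that way, assuming the spaces are only there for readability and that a plugin id must match the whole pattern. Both points are assumptions, and `is_included` is a hypothetical helper, not a Nutch function.

```python
import re

# Assumption: the wiki line is a regex with spaces added for readability,
# so they are removed here.
INDEX_PLUGINS = re.compile(
    r"index-(basic|more|extra|domain|url-category)|tld|subcollection"
)


def is_included(plugin_id: str) -> bool:
    """Return True if the whole plugin id matches the pattern."""
    return INDEX_PLUGINS.fullmatch(plugin_id) is not None


if __name__ == "__main__":
    for plugin in ("index-basic", "index-url-category", "tld", "index-foo"):
        print(plugin, is_included(plugin))
```

With whole-string matching, an id such as `index-foo` is rejected even though it shares the `index-` prefix.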

_______________________________________________
Nutch-cvs mailing list
Nutch-cvs@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nutch-cvs
