Dear Wiki user, You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification.
The following page has been changed by EnisSoztutar:
http://wiki.apache.org/nutch/IndexStructure

The comment on the change is:
created from scratch

New page:
= The Index Structure =

The index structure formed after indexing is shown below:

||'''FieldName'''||'''Stored'''||'''Index'''||'''IndexingFilter'''||'''Comment'''||
|| boost || YES || NotIndexed || Indexer || ||
|| digest || YES || NotIndexed || Indexer || ||
|| lang || YES || UnTokenized || language-identifier || ||
|| segment || YES || NotIndexed || Indexer || ||
|| tstamp || YES || Tokenized || Indexer || ||
|| anchor || NO || Tokenized || index-basic || ||
|| title || YES || Tokenized || index-basic || also by index-more ||
|| site || NO || UnTokenized || index-basic || ||
|| host || NO || Tokenized || index-basic || hostname ||
|| url || YES || Tokenized || index-basic || ||
|| content || NO || Tokenized || index-basic || content ||
|| lastModified || YES || NotIndexed || index-more || ||
|| date || NO || UnTokenized || index-more || ||
|| contentLength || YES || NotIndexed || index-more || ||
|| type || NO || UnTokenized || index-more || contentType, primaryType, subType (all mime-types) ||
|| primaryType || YES || UnTokenized || index-more || primaryType (mime-type) ||
|| subType || YES || UnTokenized || index-more || subType (mime-type) ||
|| domain || NO || Tokenized || index-domain || see http://issues.apache.org/jira/browse/NUTCH-445 ||
|| tld || YES || UnTokenized / NotStored (based on conf) || tld || see http://issues.apache.org/jira/browse/NUTCH-439 ||
|| category || NO || UnTokenized || index-url-category || see http://issues.apache.org/jira/browse/NUTCH-386 ||
|| subcollection || YES || Tokenized || subcollection || see subcollection plugin ||

----
Jira issues about indexing and IndexingFilterPlugins are:
 * [http://issues.apache.org/jira/browse/NUTCH-445 DomainIndexingFilter]
 * [http://issues.apache.org/jira/browse/NUTCH-439 TLDIndexingFilter]
 * [http://issues.apache.org/jira/browse/NUTCH-422 index-extra plugin]
 * [http://issues.apache.org/jira/browse/NUTCH-386 index-url-categories]

----
The index plugins to include are: index-(basic | more | extra | domain | url-category) | tld | subcollection
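
The list of index plugins to include reads like a regular expression over plugin ids. The following is a minimal sketch of how such a pattern selects plugins. It assumes the spaces in the wiki text are only for readability and are dropped, and that a plugin is included when the pattern matches its whole id. The class name `PluginIncludeSketch`, the method `isIncluded`, and the id `index-anchor` are hypothetical and chosen for illustration; this is not Nutch's own code.

```java
import java.util.List;
import java.util.regex.Pattern;

public class PluginIncludeSketch {
    // Assumption: the wiki's plugin list with the spaces removed, used as a regex.
    static final Pattern INDEX_PLUGINS = Pattern.compile(
        "index-(basic|more|extra|domain|url-category)|tld|subcollection");

    // Assumption: a plugin is included only if the pattern matches its entire id.
    static boolean isIncluded(String pluginId) {
        return INDEX_PLUGINS.matcher(pluginId).matches();
    }

    public static void main(String[] args) {
        // "index-anchor" is a hypothetical id used only to show a non-match.
        List<String> ids = List.of(
            "index-basic", "index-more", "index-url-category",
            "tld", "subcollection", "language-identifier", "index-anchor");
        for (String id : ids) {
            System.out.println(id + " -> " + (isIncluded(id) ? "included" : "not included"));
        }
    }
}
```

Under these assumptions, `language-identifier` is not selected by the pattern, so the `lang` field from the table would only appear if that plugin is included separately.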