[ https://issues.apache.org/jira/browse/NUTCH-640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Doğacan Güney closed NUTCH-640.
-------------------------------

Resolution: Fixed

Committed as of rev. 701052.

> confusing description "set it to Integer.MAX_VALUE"
> ---------------------------------------------------
>
> Key: NUTCH-640
> URL: https://issues.apache.org/jira/browse/NUTCH-640
> Project: Nutch
> Issue Type: Improvement
> Components: documentation
> Affects Versions: 0.9.0
> Reporter: Stijn Vermeeren
> Assignee: Doğacan Güney
> Priority: Minor
> Attachments: NUTCH-640.patch
>
>
> This property "indexer.max.tokens" has the following description in
> nutch-default.xml:
> "The maximum number of tokens that will be indexed for a single field
> in a document. This limits the amount of memory required for
> indexing, so that collections with very large files will not crash
> the indexing process by running out of memory.
> Note that this effectively truncates large documents, excluding
> from the index tokens that occur further in the document. If you
> know your source documents are large, be sure to set this value
> high enough to accommodate the expected size. If you set it to
> Integer.MAX_VALUE, then the only limit is your memory, but you
> should anticipate an OutOfMemoryError."
> Apparently, "set it to Integer.MAX_VALUE" here means <<substitute the integer
> value of Integer.MAX_VALUE>>, and not <<put the text "Integer.MAX_VALUE"
> between the value tags>>. I think this is very confusing and the description
> should be improved.
> I first put <value>Integer.MAX_VALUE</value> in my configuration, and it took
> a long time to figure out what was wrong, especially since Nutch fell back
> to the default value of 10000 instead of reporting an error.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
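A minimal Java sketch of the pitfall described in the issue, under stated assumptions. `Integer.MAX_VALUE` is a Java constant equal to 2147483647, so the intended setting is `<value>2147483647</value>`, not the literal text. The helpers `getIntOrDefault` and `getIntStrict` are hypothetical: the first only mimics the lenient behaviour the reporter observed (an unparsable value silently yields the default of 10000), the second shows a fail-fast alternative that reports an error instead. Neither is Nutch's or Hadoop's actual configuration code.

```java
// Hypothetical sketch: NOT Nutch's or Hadoop's actual configuration code.
public class MaxTokensDemo {

    // Lenient lookup mimicking the behaviour reported in NUTCH-640:
    // a value that is not a decimal integer silently yields the default.
    static int getIntOrDefault(String rawValue, int defaultValue) {
        if (rawValue == null) {
            return defaultValue;
        }
        try {
            return Integer.parseInt(rawValue.trim());
        } catch (NumberFormatException e) {
            return defaultValue; // the misconfiguration goes unnoticed
        }
    }

    // Strict lookup: fails fast so a bad value is reported instead of ignored.
    static int getIntStrict(String name, String rawValue) {
        if (rawValue == null) {
            throw new IllegalArgumentException(name + " is not set");
        }
        try {
            return Integer.parseInt(rawValue.trim());
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException(
                name + " must be a decimal integer, got: " + rawValue, e);
        }
    }

    public static void main(String[] args) {
        final int defaultMaxTokens = 10000; // default value cited in the issue

        // The literal text "Integer.MAX_VALUE" is not a number, so it falls back.
        System.out.println(getIntOrDefault("Integer.MAX_VALUE", defaultMaxTokens));

        // Integer.MAX_VALUE is the Java constant 2^31 - 1 = 2147483647.
        System.out.println(Integer.MAX_VALUE);
        System.out.println(getIntOrDefault("2147483647", defaultMaxTokens));

        // The strict variant surfaces the mistake instead of hiding it.
        try {
            getIntStrict("indexer.max.tokens", "Integer.MAX_VALUE");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Running `main` prints `10000`, then `2147483647` twice, then the strict variant's error message, which makes the silent fallback visible.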