confusing description "set it to Integer.MAX_VALUE"
---------------------------------------------------

Key: NUTCH-640
URL: https://issues.apache.org/jira/browse/NUTCH-640
Project: Nutch
Issue Type: Improvement
Components: documentation
Affects Versions: 0.9.0
Reporter: Stijn Vermeeren

The property "indexer.max.tokens" has the following description in nutch-default.xml:

"The maximum number of tokens that will be indexed for a single field in a document. This limits the amount of memory required for indexing, so that collections with very large files will not crash the indexing process by running out of memory. Note that this effectively truncates large documents, excluding from the index tokens that occur further in the document. If you know your source documents are large, be sure to set this value high enough to accomodate the expected size. If you set it to Integer.MAX_VALUE, then the only limit is your memory, but you should anticipate an OutOfMemoryError."

Apparently, "set it to Integer.MAX_VALUE" here means "substitute the integer value of Integer.MAX_VALUE (2147483647)", and not "put the text Integer.MAX_VALUE between the value tags". I think this is very confusing and the description should be improved.

I first put <value>Integer.MAX_VALUE</value> in my configuration, and it took a long time to figure out what was wrong, especially since Nutch silently fell back to the default value of 10000 instead of reporting an error.
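
To illustrate the behaviour described above, here is a minimal Java sketch. It assumes the configuration reader parses the property as a plain decimal integer and quietly uses the default when parsing fails. The class MaxTokensSketch and the helper getInt are hypothetical names made up for this example; they are not Nutch's actual code.

```java
import java.util.HashMap;
import java.util.Map;

public class MaxTokensSketch {

    // Hypothetical helper: read an integer property, falling back to
    // defaultValue when the property is missing or is not a valid integer.
    static int getInt(Map<String, String> props, String name, int defaultValue) {
        String raw = props.get(name);
        if (raw == null) {
            return defaultValue;
        }
        try {
            return Integer.parseInt(raw.trim());
        } catch (NumberFormatException e) {
            // The symbolic name "Integer.MAX_VALUE" ends up here: it is not
            // a number, so the default is used and no error is reported.
            return defaultValue;
        }
    }

    public static void main(String[] args) {
        Map<String, String> props = new HashMap<>();

        // The symbolic name, as it might be typed between the value tags.
        props.put("indexer.max.tokens", "Integer.MAX_VALUE");
        System.out.println(getInt(props, "indexer.max.tokens", 10000));

        // The literal numeric value of Integer.MAX_VALUE.
        props.put("indexer.max.tokens", "2147483647");
        System.out.println(getInt(props, "indexer.max.tokens", 10000));
    }
}
```

Under these assumptions, the first call returns the default of 10000 and the second returns 2147483647, which is why the description would be clearer if it stated the literal number to put between the value tags.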