confusing description "set it to Integer.MAX_VALUE"
---------------------------------------------------

                 Key: NUTCH-640
                 URL: https://issues.apache.org/jira/browse/NUTCH-640
             Project: Nutch
          Issue Type: Improvement
          Components: documentation
    Affects Versions: 0.9.0
            Reporter: Stijn Vermeeren


The property "indexer.max.tokens" has the following description in 
nutch-default.xml:

" The maximum number of tokens that will be indexed for a single field
  in a document. This limits the amount of memory required for
  indexing, so that collections with very large files will not crash
  the indexing process by running out of memory.

  Note that this effectively truncates large documents, excluding
  from the index tokens that occur further in the document. If you
  know your source documents are large, be sure to set this value
  high enough to accomodate the expected size. If you set it to
  Integer.MAX_VALUE, then the only limit is your memory, but you
  should anticipate an OutOfMemoryError."

Apparently, "set it to Integer.MAX_VALUE" here means <<substitute the integer 
value of Integer.MAX_VALUE (2147483647)>>, and not <<put the text 
"Integer.MAX_VALUE" between the value tags>>. I think this is very confusing 
and the description should be improved.

I first put <value>Integer.MAX_VALUE</value> in my configuration, and it took a 
long time to figure out what was wrong, especially since Nutch silently fell 
back to the default value of 10000 instead of reporting an error.
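
To illustrate the behaviour described above, here is a minimal sketch of a 
lenient integer lookup that returns its default when the configured text is 
not a number. The class MaxTokensDemo and the helper getInt are hypothetical 
stand-ins written for this example, not Nutch's actual code; the sketch only 
assumes that the value is parsed as a plain decimal integer and that a parse 
failure is swallowed.

```java
public class MaxTokensDemo {

    // Hypothetical stand-in for a lenient config getter (an assumption, not
    // the real Nutch/Hadoop implementation): returns defaultValue when the
    // raw text is missing or is not a valid decimal integer.
    static int getInt(String raw, int defaultValue) {
        if (raw == null) {
            return defaultValue;
        }
        try {
            return Integer.parseInt(raw.trim());
        } catch (NumberFormatException e) {
            // The symbolic name "Integer.MAX_VALUE" ends up here: it is just
            // text, so the default is used and no error is reported.
            return defaultValue;
        }
    }

    public static void main(String[] args) {
        int fallback = 10000; // default mentioned in this report

        // Symbolic name between the value tags: not parseable as an integer.
        System.out.println(getInt("Integer.MAX_VALUE", fallback));

        // Literal number that Integer.MAX_VALUE stands for: parsed as intended.
        System.out.println(getInt("2147483647", fallback));
    }
}
```

Under these assumptions the first call yields the fallback and the second 
yields Integer.MAX_VALUE, which is why the literal number has to go between 
the value tags.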
