This is an automated email from the ASF dual-hosted git repository.

jzemerick pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/opennlp.git


The following commit(s) were added to refs/heads/master by this push:
     new 22ba38f  OPENNLP-1189: Updating tokenizer input description. (#307)
22ba38f is described below

commit 22ba38f8a902de29e9bc4d40ec5da14fe31cdc8e
Author: Jeff Zemerick <[email protected]>
AuthorDate: Fri May 18 06:37:47 2018 -0400

    OPENNLP-1189: Updating tokenizer input description. (#307)
---
 opennlp-docs/src/docbkx/tokenizer.xml | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/opennlp-docs/src/docbkx/tokenizer.xml b/opennlp-docs/src/docbkx/tokenizer.xml
index 3fb4519..f1fae19 100644
--- a/opennlp-docs/src/docbkx/tokenizer.xml
+++ b/opennlp-docs/src/docbkx/tokenizer.xml
@@ -215,7 +215,8 @@ double tokenProbs[] = tokenizer.getTokenProbabilities();]]>
                                available from the model download page on various corpora. The data
                                can be converted to the OpenNLP Tokenizer training format or used directly.
                 The OpenNLP format contains one sentence per line. Tokens are either separated by a
-                whitespace or by a special &lt;SPLIT&gt; tag.
+                whitespace or by a special &lt;SPLIT&gt; tag. Tokens are split automatically on whitespace
+                and at least one &lt;SPLIT&gt; tag must be present in the training text.
                                
                                The following sample shows the sample from above in the correct format.
                                <screen>

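The training format described in the changed paragraph (one sentence per line, tokens separated by whitespace or by a <SPLIT> tag, at least one <SPLIT> tag somewhere in the training text) can be illustrated with a minimal sketch. This is not OpenNLP code: the class SplitFormatSketch and its methods are hypothetical helpers written only for illustration, the sample sentence is made up, and the sketch assumes the tag appears literally as <SPLIT> in the training file (it is entity-escaped only inside the DocBook source).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical helper illustrating the training format; not part of the OpenNLP API. */
final class SplitFormatSketch {

    private static final String SPLIT_TAG = "<SPLIT>";

    /** Returns the tokens of one training line: split on whitespace, then on the <SPLIT> tag. */
    static List<String> tokens(String line) {
        List<String> result = new ArrayList<>();
        for (String chunk : line.trim().split("\\s+")) {
            for (String token : chunk.split(Pattern.quote(SPLIT_TAG))) {
                if (!token.isEmpty()) {
                    result.add(token);
                }
            }
        }
        return result;
    }

    /** Returns true if at least one line of the training text contains a <SPLIT> tag. */
    static boolean hasSplitTag(List<String> lines) {
        for (String line : lines) {
            if (line.contains(SPLIT_TAG)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Made-up sample line, one sentence per line as the format requires.
        String line = "Pierre Vinken<SPLIT>, 61 years old<SPLIT>.";
        System.out.println(tokens(line));
        System.out.println(hasSplitTag(List.of(line)));
    }
}
```

A check like hasSplitTag could be run over a training file before training to catch text that relies on whitespace alone, which the updated description says is not sufficient.
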
-- 
To stop receiving notification emails like this one, please contact
[email protected].
