This is an automated email from the ASF dual-hosted git repository.
sergeykamov pushed a commit to branch NLPCRAFT-468
in repository https://gitbox.apache.org/repos/asf/incubator-nlpcraft.git
The following commit(s) were added to refs/heads/NLPCRAFT-468 by this push:
new 742b715 WIP.
742b715 is described below
commit 742b715474e7e040405312a2701e4548f092912a
Author: Sergey Kamov <[email protected]>
AuthorDate: Thu Oct 14 10:28:54 2021 +0300
WIP.
---
nlpcraft/src/main/scala/org/apache/nlpcraft/interfaces.txt | 9 +++++----
1 file changed, 5 insertions(+), 4 deletions(-)
diff --git a/nlpcraft/src/main/scala/org/apache/nlpcraft/interfaces.txt b/nlpcraft/src/main/scala/org/apache/nlpcraft/interfaces.txt
index d9c5bc7..9ed5d65 100644
--- a/nlpcraft/src/main/scala/org/apache/nlpcraft/interfaces.txt
+++ b/nlpcraft/src/main/scala/org/apache/nlpcraft/interfaces.txt
@@ -18,10 +18,11 @@
Interfaces (pluggable components). All of them have built-in implementations.
1. Text-to-words tokenizer - org.apache.nlpcraft.model.nlp.NCNlpTokenizer.
-All frameworks allow to configure this low-level component according your requirements
-Look it :
+This component should be pluggable because different tokenization approaches can be needed. Provided default is fine in 90% - 99% (for EN)
+Look at:
https://nlp.stanford.edu/nlp/javadoc/javanlp-3.5.0/edu/stanford/nlp/process/Tokenizer.html,
https://opennlp.apache.org/docs/1.8.2/apidocs/opennlp-tools/opennlp/tools/tokenize/Tokenizer.html
+All frameworks allow to configure this low-level component according users' requirements.
Delivered:
- org.apache.nlpcraft.model.components.tokenizer.NCOpenNlpTokenizer (not configured)
- Stanford impl (not configured)
@@ -29,7 +30,7 @@ Mandatory.
Default in config - NCOpenNlpTokenizer.
When user needs to implement his own:
- another standard logic required (look at different variants by links above)
- - own logic required (tokenization for commands in own format like: 'give_me_coffee_please')
+ - user's own logic required (tokenization for commands in own format like: 'give_me_coffee_please')
- new languages support
2. Ners finder - org.apache.nlpcraft.model.nlp.NCNlpNerParse.
@@ -52,7 +53,7 @@ Optional (if null, stop, swear and suspicious words are not detected, these prop
Default in config - NCDefaultStopWordsDetector, NCDefaultSwearWordsDetector.
(`suspicious` detector is not set by default. Can be configured if necessary by NCConfiguredWordsDetector)
When user needs to implement his own:
- - own sophisticated logic implementation, which cannot be configured by NCConfiguredWordsDetector.
+ - user's own sophisticated logic implementation, which cannot be configured by NCConfiguredWordsDetector.
- new languages support
4. org.apache.nlpcraft.model.NCModelBehaviour