This is an automated email from the ASF dual-hosted git repository.
sergeykamov pushed a commit to branch NLPCRAFT-468
in repository https://gitbox.apache.org/repos/asf/incubator-nlpcraft.git
The following commit(s) were added to refs/heads/NLPCRAFT-468 by this push:
new b65ad81 WIP.
b65ad81 is described below
commit b65ad81a33e6410456701d2421ee3ca10f2da03f
Author: Sergey Kamov <[email protected]>
AuthorDate: Thu Oct 14 10:24:55 2021 +0300
WIP.
---
nlpcraft/src/main/scala/org/apache/nlpcraft/interfaces.txt | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/nlpcraft/src/main/scala/org/apache/nlpcraft/interfaces.txt b/nlpcraft/src/main/scala/org/apache/nlpcraft/interfaces.txt
index f260949..d9c5bc7 100644
--- a/nlpcraft/src/main/scala/org/apache/nlpcraft/interfaces.txt
+++ b/nlpcraft/src/main/scala/org/apache/nlpcraft/interfaces.txt
@@ -18,13 +18,18 @@
Interfaces (pluggable components). All of them have built-in implementations.
1. Text-to-words tokenizer - org.apache.nlpcraft.model.nlp.NCNlpTokenizer.
+All frameworks allow this low-level component to be configured according to your requirements.
+See:
+https://nlp.stanford.edu/nlp/javadoc/javanlp-3.5.0/edu/stanford/nlp/process/Tokenizer.html,
+https://opennlp.apache.org/docs/1.8.2/apidocs/opennlp-tools/opennlp/tools/tokenize/Tokenizer.html
Delivered:
- org.apache.nlpcraft.model.components.tokenizer.NCOpenNlpTokenizer (not configured)
- Stanford impl (not configured)
Mandatory.
Default in config - NCOpenNlpTokenizer.
When a user needs to implement their own:
- - custom logic is required (for example, the `opennlp` implementation is not satisfactory and the `stanford` license is not suitable)
+ - different standard logic is required (see the variants at the links above)
+ - custom logic is required (for example, tokenizing commands in a custom format such as 'give_me_coffee_please')
- support for new languages is required
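
The custom-format case above ('give_me_coffee_please') could be sketched roughly as follows. This is an illustration only: the class UnderscoreTokenizer, its nested Word type, and the tokenize method are hypothetical names. They do not implement NCNlpTokenizer, whose signature is not shown in this commit. The sketch simply splits on underscores and whitespace and keeps character offsets for each word.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical example, not part of NLPCraft: a tokenizer for underscore-delimited commands.
public class UnderscoreTokenizer {
    // A word together with its [start, end) character offsets in the original text.
    public static final class Word {
        public final String text;
        public final int start;
        public final int end;

        Word(String text, int start, int end) {
            this.text = text;
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return text + "[" + start + "," + end + ")";
        }
    }

    // Splits the input on '_' and whitespace; empty fragments are skipped.
    public static List<Word> tokenize(String text) {
        List<Word> words = new ArrayList<>();
        int start = -1; // -1 means "not currently inside a word"
        for (int i = 0; i <= text.length(); i++) {
            boolean separator = i == text.length()
                || text.charAt(i) == '_'
                || Character.isWhitespace(text.charAt(i));
            if (separator) {
                if (start >= 0) {
                    words.add(new Word(text.substring(start, i), start, i));
                    start = -1;
                }
            } else if (start < 0) {
                start = i;
            }
        }
        return words;
    }

    public static void main(String[] args) {
        for (Word w : tokenize("give_me_coffee_please")) {
            System.out.println(w);
        }
    }
}
```

A real implementation would wrap this logic in whatever interface the framework expects. The offsets are kept because downstream components typically need to map words back to the original text.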
2. NER finder - org.apache.nlpcraft.model.nlp.NCNlpNerParse.