This is an automated email from the ASF dual-hosted git repository.
sergeykamov pushed a commit to branch NLPCRAFT-513
in repository https://gitbox.apache.org/repos/asf/incubator-nlpcraft-website.git
The following commit(s) were added to refs/heads/NLPCRAFT-513 by this push:
new 80f3a91 WIP.
80f3a91 is described below
commit 80f3a91a9edc78022e0fab4ed914e3f5fa5f22a5
Author: skhdl <[email protected]>
AuthorDate: Fri Oct 28 12:34:04 2022 +0400
WIP.
---
api-components.html | 51 +++++++++++++++++++++-------------------
built-in-builder.html | 41 ++++++++++++++++++---------------
built-in-entity-parser.html | 55 +++++++++++++++++++++++---------------------
built-in-overview.html | 6 ++---
built-in-token-enricher.html | 15 +++++++-----
built-in-token-parser.html | 11 ++++-----
custom-components.html | 27 +++++++++++-----------
first-example.html | 20 ++++++++--------
8 files changed, 118 insertions(+), 108 deletions(-)
diff --git a/api-components.html b/api-components.html
index c3e82e4..ab17278 100644
--- a/api-components.html
+++ b/api-components.html
@@ -21,7 +21,7 @@ id: api-components
limitations under the License.
-->
-<div class="col-md-8 second-column">
+<div class="col-md-8 second-column" xmlns="http://www.w3.org/1999/html">
<section id="overview">
<h2 class="section-title">API Components<a href="#"><i class="top-link
fas fa-fw fa-angle-double-up"></i></a></h2>
@@ -47,7 +47,7 @@ id: api-components
<p>Typical part of code:</p>
<pre class="brush: scala, highlight: []">
- // Initialized prepared domain model.
+ // Initializes prepared domain model.
val mdl = new CustomNlpModel()
// Creates client for given model.
@@ -79,8 +79,8 @@ id: api-components
<td><code>Token</code></td>
<td>
<code>Token</code> represented as <a
href="apis/latest/org/apache/nlpcraft/NCToken.html">NCToken</a>
- is simple string, part of user input, which split
according to some rules,
- for instance by spaces and some additional conditions,
which depends on language and some expectations.
+ is a simple string, a part of the user input, which is split according to some rules,
+ for instance by spaces and some additional conditions which depend on the language and some expectations.
So user input "<b>Where is it?</b>" contains four tokens:
"<code>Where</code>", "<code>is</code>",
"<code>it</code>", "<code>?</code>".
Usually <code>tokens</code> are words and punctuation
symbols which can also contain some additional
@@ -114,8 +114,8 @@ id: api-components
<tr>
<td><code>Intent</code></td>
<td>
- <code>Intent</code> is user defined callback and rule,
according to which this callback should be called.
- Rule is most often some template, based on expected set of
<code>entities</code> in user input,
+ <code>Intent</code> is a user-defined callback and a rule according to which this callback should be called.
+ The rule is most often a template based on an expected set of <code>entities</code> in the user input,
but it can be more flexible.
Parameters extracted from the user text input are passed into callback methods.
These methods' execution results are provided to the user as the answer to his request.
@@ -189,7 +189,8 @@ id: api-components
<ul>
<li>
- <code>Line 1</code> defines intent <code>call</code>
with two conditions.
+ <code>Line 1</code> defines intent <code>call</code> with two conditions
+ which expect two named entities in the user input text.
</li>
<li>
<code>Line 2</code> defines related callback method
<code>onCommand()</code>.
@@ -216,8 +217,8 @@ id: api-components
<h2 class="section-title">Client responsibility<a href="#"><i
class="top-link fas fa-fw fa-angle-double-up"></i></a></h2>
<p>
- Client which is represented as <a
href="apis/latest/org/apache/nlpcraft/NCModelClient.html">NCModelClient</a>
- is necessary for communication with the model. Base client methods
are described below.
+ <code>Client</code>, represented as <a href="apis/latest/org/apache/nlpcraft/NCModelClient.html">NCModelClient</a>,
+ is necessary for communication with the <code>Data Model</code>. The base client methods are described below.
</p>
<table class="gradient-table">
@@ -244,7 +245,7 @@ id: api-components
Passes user text input to the model and receives back the callback and its parameters or
a rejection exception if there aren't any triggered intents.
The main difference from <code>ask</code> is that the triggered intent callback method is not called.
- This method and this parameter can be useful for tests
scenarios.
+ This method and this parameter can be useful in test scenarios.
</td>
</tr>
<tr>
@@ -252,7 +253,7 @@ id: api-components
<td>
Clears the STM state. Memory is cleared wholly or with some predicate.
Look at the <a href="short-term-memory.html">Conversation</a> chapter to get more details.
- Second variant odf given method with another parameters is
here: <a
href="apis/latest/org/apache/nlpcraft/NCModelClient.html#clearStm-1d8">clearStm()</a>.
+ A second variant of this method with other parameters is here: <a href="apis/latest/org/apache/nlpcraft/NCModelClient.html#clearStm-1d8">clearStm()</a>.
</td>
</tr>
<tr>
@@ -260,7 +261,7 @@ id: api-components
<td>
Clears the dialog state. The dialog is cleared wholly or with some predicate.
Look at the <a href="short-term-memory.html">Conversation</a> chapter to get more details.
- Second variant odf given method with another parameters is
here: <a
href="apis/latest/org/apache/nlpcraft/NCModelClient.html#clearDialog-1d8">clearDialog()</a>.
+ A second variant of this method with other parameters is here: <a href="apis/latest/org/apache/nlpcraft/NCModelClient.html#clearDialog-1d8">clearDialog()</a>.
</td>
</tr>
<tr>
@@ -277,8 +278,9 @@ id: api-components
<h2 class="section-title">Model configuration <a href="#"><i
class="top-link fas fa-fw fa-angle-double-up"></i></a></h2>
<p>
- Data Model configuration <a
href="apis/latest/org/apache/nlpcraft/NCModelConfig.html">NCModelConfig</a>
represents set of model parameter values.
- Its properties are described below.
+ <code>Data Model</code> configuration, represented as
+ <a href="apis/latest/org/apache/nlpcraft/NCModelConfig.html">NCModelConfig</a>,
+ contains a set of parameters which are described below.
</p>
<table class="gradient-table">
<thead>
@@ -306,7 +308,7 @@ id: api-components
Timeout of the user's conversation.
If the user doesn't communicate with the model during this time period the STM is going to be cleared.
Look at the <a href="short-term-memory.html">Conversation</a> chapter to get more details.
- Mandatory parameter with default value.
+ It is a mandatory parameter with a default value.
</td>
</tr>
<tr>
@@ -314,7 +316,7 @@ id: api-components
<td>
Maximum supported depth of the user's conversation.
Look at the <a href="short-term-memory.html">Conversation</a> chapter to get more details.
- Mandatory parameter with default value.
+ It is a mandatory parameter with a default value.
</td>
</tr>
</tbody>
@@ -342,9 +344,9 @@ id: api-components
<td><a
href="apis/latest/org/apache/nlpcraft/NCTokenParser.html">NCTokenParser</a></td>
<td>Mandatory single</td>
<td>
- <code>Token parser</code> should parse user input plain
text and split this text
+ <code>Token parser</code> should be able to parse user input plain text and split this text
into a <code>tokens</code> list.
- NLPCraft provides default English language implementation
of token parser.
+ NLPCraft provides two default English language implementations of the token parser.
Also, the project contains example token parser implementations for the <a href="examples/light_switch_fr.html">French</a> and
<a href="examples/light_switch_ru.html">Russian</a> languages.
</td>
@@ -353,9 +355,9 @@ id: api-components
<td> <a
href="apis/latest/org/apache/nlpcraft/NCTokenEnricher.html">NCTokenEnricher</a></td>
<td>Optional list</td>
<td>
- <code>Tokens enricher</code> is a component which allow to
add additional properties to prepared tokens,
+ <code>Tokens enricher</code> is a component which allows adding additional properties to prepared tokens,
like part of speech, quote, stop-word flags or any other.
- NLPCraft provides English language default set of token
enrichers implementations.
+ NLPCraft provides a built-in English language set of token enricher implementations.
</td>
</tr>
<tr>
@@ -372,8 +374,9 @@ id: api-components
<td>
<code>Entity parser</code> is a component which allows finding user-defined named entities
based on prepared tokens as input.
- NLPCraft provides wrappers for named-entity recognition
components of <a href="https://opennlp.apache.org/">Apache OpenNLP</a> and
- <a href="https://nlp.stanford.edu/">Stanford NLP</a>.
+ NLPCraft provides wrappers for the named-entity recognition components of
+ <a href="https://opennlp.apache.org/">Apache OpenNLP</a> and
+ <a href="https://nlp.stanford.edu/">Stanford NLP</a> as well as its own implementations.
Note that at least one entity parser must be defined.
</td>
</tr>
@@ -381,7 +384,7 @@ id: api-components
<td> <a
href="apis/latest/org/apache/nlpcraft/NCEntityEnricher.html">NCEntityEnricher</a></td>
<td>Optional list</td>
<td>
- <code>Entity enricher</code> is component which allows to
add additional properties to prepared entities.
+ <code>Entity enricher</code> is a component which allows adding additional properties to prepared entities.
Can be useful for extending existing entity enrichers' functionality.
</td>
</tr>
@@ -389,7 +392,7 @@ id: api-components
<td> <a
href="apis/latest/org/apache/nlpcraft/NCEntityMapper.html">NCEntityMapper</a></td>
<td>Optional list</td>
<td>
- <code>Entity mappers</code> is component which allows to
map one set of entities into another after the entities
+ <code>Entity mapper</code> is a component which allows mapping one set of entities to another after the entities
were parsed and enriched. Can be useful for building complex parsers based on existing ones.
</td>
</tr>
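Note for reviewers: the client responsibilities described in the api-components.html changes above (ask, clearStm, clearDialog) could be sketched in the docs roughly as follows. This is a hedged sketch, not part of the commit: <code>CustomNlpModel</code> is a hypothetical model class and the exact method signatures may differ between NLPCraft versions.

```scala
import org.apache.nlpcraft.*
import scala.util.Using

// Sketch only: 'CustomNlpModel' stands in for your own NCModel implementation.
Using.resource(new NCModelClient(new CustomNlpModel())) { client =>
    // Passes user text to the model and triggers the matched intent callback.
    val result = client.ask("Turn the lights off", "userId")

    // Clears conversation STM and dialog state, e.g. between test scenarios.
    client.clearStm("userId")
    client.clearDialog("userId")
}
```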
diff --git a/built-in-builder.html b/built-in-builder.html
index cc9ba8a..c2d2f04 100644
--- a/built-in-builder.html
+++ b/built-in-builder.html
@@ -28,11 +28,12 @@ id: built-in-builder
<p>
<a
href="apis/latest/org/apache/nlpcraft/NCPipelineBuilder.html">NCPipelineBuilder</a>
class
is designed to simplify preparing an <a href="apis/latest/org/apache/nlpcraft/NCPipeline.html">NCPipeline</a> instance.
- It contains a number of methods <a
href="apis/latest/org/apache/nlpcraft/NCPipelineBuilder.html#withSemantic-fffff4b0">withSemantic()</a>
+ It allows constructing an <a href="apis/latest/org/apache/nlpcraft/NCPipeline.html">NCPipeline</a> instance
+ by adding nested components via its methods.
+ It also contains a number of <a href="apis/latest/org/apache/nlpcraft/NCPipelineBuilder.html#withSemantic-fffff4b0">withSemantic()</a> methods
which allow preparing a pipeline instance based on
<a
href="apis/latest/org/apache/nlpcraft/nlp/parsers/NCSemanticEntityParser">NCSemanticEntityParser</a>
and configured language.
- Currently only English language is supported.
- It also adds following English built-in components into pipeline:
+ Currently only the <b>English</b> language is supported with a broad set of built-in components:
<a
href="apis/latest/org/apache/nlpcraft/nlp/parsers/NCOpenNLPTokenParser.html">NCOpenNLPTokenParser</a>,
<a
href="apis/latest/org/apache/nlpcraft/nlp/enrichers/NCOpenNLPLemmaPosTokenEnricher.html">NCOpenNLPLemmaPosTokenEnricher</a>,
<a
href="apis/latest/org/apache/nlpcraft/nlp/enrichers/NCEnStopWordsTokenEnricher.html">NCEnStopWordsTokenEnricher</a>,
@@ -53,14 +54,14 @@ id: built-in-builder
</pre>
<ul>
<li>
- It defines pipeline with all default English language
components and one semantic entity parser with
+ It defines a pipeline with all built-in English language components and one semantic entity parser with the
model defined in <code>lightswitch_model.yaml</code>.
</li>
</ul>
- <p><b>Example with pipeline configured by built-in components:</b></p>
+ <p><b>Example of a pipeline constructed from built-in components:</b></p>
- <pre class="brush: scala, highlight: [2, 6, 7, 12, 13, 14, 15]">
+ <pre class="brush: scala, highlight: [2, 6, 7, 12, 13, 14, 15, 16]">
val pipeline =
val stanford =
val props = new Properties()
@@ -74,36 +75,38 @@ id: built-in-builder
new NCPipelineBuilder().
withTokenParser(tokParser).
withTokenEnricher(new NCEnStopWordsTokenEnricher()).
+ withEntityParser(NCSemanticEntityParser(stemmer,
tokParser, "pizzeria_model.yaml")).
withEntityParser(new NCStanfordNLPEntityParser(stanford,
Set("number"))).
build
</pre>
<ul>
<li>
- <code>Line 2</code> defines configured
<code>StanfordCoreNLP</code> class instance.
+ <code>Line 2</code> defines a configured <code>StanfordCoreNLP</code> class instance.
Look at <a href="https://nlp.stanford.edu/">Stanford NLP</a>
documentation for more details.
</li>
<li>
- <code>Line 6</code> defines token parser
<code>NCStanfordNLPTokenParser</code>, pipeline mandatory component.
+ <code>Line 6</code> defines the token parser <code>NCStanfordNLPTokenParser</code>, a mandatory pipeline component.
Note that this one instance is used in two places: in the pipeline definition on <code>line 12</code> and
- in <code>NCSemanticEntityParser</code> definition on
<code>line 15</code>.
- </li>
- <li>
- <code>Line 7</code> defines simple implementation of semantic
stemmer which is necessary part
- of <code>NCSemanticEntityParser</code>.
+ in <a
href="apis/latest/org/apache/nlpcraft/nlp/parsers/NCSemanticEntityParser.html">NCSemanticEntityParser</a>
definition on <code>line 14</code>.
</li>
+
<li>
- <code>Line 13</code> defines configured
<code>NCEnStopWordsTokenEnricher</code> token enricher.
+ <code>Line 13</code> defines a configured
+ <a href="apis/latest/org/apache/nlpcraft/nlp/enrichers/NCEnStopWordsTokenEnricher.html">NCEnStopWordsTokenEnricher</a>
+ token enricher.
</li>
<li>
- <code>Line 14</code> defines
<code>NCStanfordNLPEntityParser</code> entity parser based on Stanford NER
- configured for number values detection.
+ <code>Line 14</code> defines <a
href="apis/latest/org/apache/nlpcraft/nlp/parsers/NCSemanticEntityParser.html">NCSemanticEntityParser</a>
+ configured in YAML file <code>pizzeria_model.yaml</code>.
+ It also uses a simple implementation of <a href="apis/latest/org/apache/nlpcraft/nlp/parsers/NCSemanticStemmer.html">NCSemanticStemmer</a>
+ created on <code>line 7</code> and the token parser prepared on <code>line 6</code>.
</li>
<li>
- <code>Line 14</code> defines
<code>NCStanfordNLPEntityParser</code> entity parser based on Stanford NER
+ <code>Line 15</code> defines <code>NCStanfordNLPEntityParser</code> based on Stanford NER
configured for number values detection.
</li>
<li>
- <code>Line 15</code> defines pipeline building.
+ <code>Line 16</code> defines pipeline building.
</li>
</ul>
@@ -125,7 +128,7 @@ id: built-in-builder
You can get more information in the example description chapters:
<a href="examples/light_switch_fr.html">Light Switch FR</a> and
<a href="examples/light_switch_ru.html">Light Switch RU</a>.
- Note that these custom components are mostly wrappers on
existing solutions and
+ Note that these custom components are mostly wrappers over existing open source or NLPCraft built-in solutions and
should be prepared just once when you start working with a new language.
</li>
</ul>
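The <code>NCPipelineBuilder</code> usage described in the built-in-builder.html changes above could be condensed into a minimal docs sketch like this; it is an assumption-laden sketch, not part of the commit, and assumes a <code>withSemantic(language, modelResource)</code> overload as described in that chapter.

```scala
import org.apache.nlpcraft.*

// Minimal sketch: builds a pipeline from the built-in English components plus
// one semantic entity parser; assumes the withSemantic(language, model) overload.
val pipeline = new NCPipelineBuilder().
    withSemantic("en", "lightswitch_model.yaml").
    build
```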
diff --git a/built-in-entity-parser.html b/built-in-entity-parser.html
index 0ab5dee..1cd3868 100644
--- a/built-in-entity-parser.html
+++ b/built-in-entity-parser.html
@@ -27,19 +27,22 @@ id: built-in-entity-parser
<p>
<a
href="apis/latest/org/apache/nlpcraft/NCEntityParser.html">NCEntityParser</a>
- is a component which allow to find user defined named entities
+ is a component which allows to find user defined named entities
based on prepared tokens as input.
</p>
<p>
- There are proved:
+ The following built-in parsers are provided:
</p>
<ul>
<li>
- Wrappers for <a href="https://opennlp.apache.org/">Apache
OpenNLP</a> and
- <a href="https://nlp.stanford.edu/">Stanford NLP</a> NER
components.
- Their models support English and some another languages for
their NER recognition.
+ Wrapper for the <a href="https://opennlp.apache.org/">Apache OpenNLP</a> named entity finder whose
+ prepared models support English and some other languages.
+ </li>
+ <li>
+ Wrapper for the <a href="https://nlp.stanford.edu/">Stanford NLP</a> named entity finder whose
+ prepared models support English and some other languages.
</li>
<li>
NLP data wrapper implementation. It does not depend on language.
@@ -55,10 +58,10 @@ id: built-in-entity-parser
<p>
<a
href="apis/latest/org/apache/nlpcraft/nlp/parsers/NCOpenNLPTokenParser.html">NCOpenNLPTokenParser</a>
is a wrapper over <a href="https://opennlp.apache.org/">Apache OpenNLP</a> NER components.
- Look at the supported <b>Name Finder</b> models <a
href="https://opennlp.sourceforge.net/models-1.5/">here</a>.
+ Look at the supported NER finder models <a href="https://opennlp.sourceforge.net/models-1.5/">here</a>.
For example, for the English language the following are accessible:
<code>Location</code>, <code>Money</code>,
<code>Person</code>, <code>Organization</code>, <code>Date</code>,
<code>Time</code> and <code>Percentage</code>.
- There are also accessible dome models for another languages.
+ Models for other languages are also accessible.
</p>
</section>
@@ -67,10 +70,11 @@ id: built-in-entity-parser
<p>
<code>NCStanfordNLPEntityParser</code> is a wrapper over <a href="https://nlp.stanford.edu/">Stanford NLP</a> NER components.
+ Look at the supported NER finder models <a href="https://nlp.stanford.edu/software/CRF-NER.shtml">here</a>.
For example, for the English language the following are accessible:
<code>Location</code>, <code>Money</code>,
<code>Person</code>, <code>Organization</code>, <code>Date</code>,
<code>Time</code> and <code>Percent</code>.
- There are also accessible dome models for another languages.
- Look at the detailed information <a
href="https://nlp.stanford.edu/software/CRF-NER.shtml">here</a>.
+ Models for other languages are also accessible.
+
</p>
</section>
@@ -80,9 +84,11 @@ id: built-in-entity-parser
<p>
<a
href="apis/latest/org/apache/nlpcraft/nlp/parsers/NCOpenNLPTokenParser.html">NCOpenNLPTokenParser</a>
converts NLP tokens into entities with four mandatory properties:
<code>nlp:token:text</code>, <code>nlp:token:index</code>,
<code>nlp:token:startCharIndex</code> and
- <code>nlp:token:endCharIndex</code>. However, if any other
properties were added into
- processed tokens by <a
href="apis/latest/org/apache/nlpcraft/NCTokenEnricher.html">NCTokenEnricher</a>
components, they will be copied also with names
- prefixed with <code>nlp:token:</code>.
+ <code>nlp:token:endCharIndex</code>.
+ However, if any other <a
href="apis/latest/org/apache/nlpcraft/NCTokenEnricher.html">NCTokenEnricher</a>
components
+ are registered in the <a
href="apis/latest/org/apache/nlpcraft/NCPipeline.html">NCPipeline</a>
+ and they add other properties into the tokens,
+ these properties will also be copied with names prefixed with <code>nlp:token:</code>.
It is a language independent component.
Note that the converted token set can be restricted by a predicate.
</p>
@@ -94,8 +100,7 @@ id: built-in-entity-parser
<p>
Semantic entity parser
<a
href="apis/latest/org/apache/nlpcraft/nlp/parsers/NCSemanticEntityParser.html">NCSemanticEntityParser</a>
- is the implementation of <a
href="apis/latest/org/apache/nlpcraft/NCEntityParser.html">NCEntityParser</a>,
- which in turn is component of the model pipeline <a
href="apis/latest/org/apache/nlpcraft/NCPipeline.html">NCPipeline</a>.
+ is a synonym-based implementation of <a href="apis/latest/org/apache/nlpcraft/NCEntityParser.html">NCEntityParser</a>.
This parser provides a simple but very powerful way to find domain specific data in the input text.
It defines a list of <a href="apis/latest/org/apache/nlpcraft/nlp/parsers/NCSemanticElement.html">NCSemanticElement</a>
which represent <a href="https://en.wikipedia.org/wiki/Named-entity_recognition">Named entities</a>.
@@ -630,13 +635,14 @@ id: built-in-entity-parser
</pre>
<ul>
<li>
- <code>Line 5</code> shows <code>macro</code> parameter
usage.
+ <code>Line 5</code> shows <code>macro</code> parameter
definition.
</li>
<li>
<code>Line 10</code> shows <code>macro</code> list of <a
href="apis/latest/org/apache/nlpcraft/nlp/parsers/NCSemanticElement.html">NCSemanticElement</a>
parameter usage.
</li>
<li>
- Note that usage <code>withSemantic()</code> method which
represented on <code>line 3</code> i optional.
+ Note that usage of the <a href="apis/latest/org/apache/nlpcraft/NCPipelineBuilder.html#withSemantic-fffff4b0">withSemantic()</a>
+ method, which is represented on <code>line 3</code>, is optional.
You can add <a
href="apis/latest/org/apache/nlpcraft/nlp/parsers/NCNLPEntityParser.html">NCNLPEntityParser</a>
as usual <a
href="apis/latest/org/apache/nlpcraft/NCEntityParser.html">NCEntityParser</a>
when you define your <a
href="apis/latest/org/apache/nlpcraft/NCPipeline.html">NCPipeline</a>.
@@ -644,7 +650,7 @@ id: built-in-entity-parser
</ul>
<p>
- The following examples is based on YAML semantic elements
representation.
+ The following example is based on YAML semantic elements
representation.
</p>
<pre class="brush: js, highlight: []">
@@ -673,7 +679,7 @@ id: built-in-entity-parser
</pre>
<ul>
<li>
- <code>Line 3</code> makes semantic model which defined in
<code>time_model.yaml</code> YAML file.
+ <code>Line 3</code> makes a semantic model whose elements are defined in the <code>time_model.yaml</code> YAML file.
</li>
</ul>
@@ -723,11 +729,12 @@ id: built-in-entity-parser
<h3 class="sub-section-title">Languages Extending</h3>
<p>
- If you want to use <a
href="apis/latest/org/apache/nlpcraft/nlp/parsers/NCSemanticEntityParser.html">NCSemanticEntityParser</a>
- with not English language, you have to provide custom
+ If you want to use
+ <a
href="apis/latest/org/apache/nlpcraft/nlp/parsers/NCSemanticEntityParser.html">NCSemanticEntityParser</a>
+ with any non-English language, you have to provide custom
<a
href="apis/latest/org/apache/nlpcraft/nlp/parsers/NCSemanticStemmer.html">NCSemanticStemmer</a>
and
<a
href="apis/latest/org/apache/nlpcraft/NCTokenParser.html">NCTokenParser</a>
- implementations for required language.
+ implementations for the desired language.
Look at the <a href="examples/light_switch_fr.html">Light
Switch FR</a> for more details.
</p>
</section>
@@ -746,8 +753,4 @@ id: built-in-entity-parser
<li><a href="#parser-semantic-extending">SemanticParser Languages
Extending</a></li>
{% include quick-links.html %}
</ul>
-</div>
-
-
-
-
+</div>
\ No newline at end of file
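For the "Languages Extending" section touched above, assembling a semantic parser for a custom language could be sketched roughly as below. This is a hedged sketch only: the <code>NCSemanticStemmer</code> method name, the naive lowercase "stemming", and the parser/constructor argument shapes are assumptions that should be checked against the current Scaladoc.

```scala
import org.apache.nlpcraft.*
import org.apache.nlpcraft.nlp.parsers.*

// Sketch of assembling a semantic parser from the components named in this
// chapter; constructor arguments may require model resources in some versions.
val tokParser = new NCOpenNLPTokenParser()
val stemmer = new NCSemanticStemmer:
    // Placeholder "stemming" for illustration only.
    override def stem(txt: String): String = txt.toLowerCase
val parser = NCSemanticEntityParser(stemmer, tokParser, "time_model.yaml")
```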
diff --git a/built-in-overview.html b/built-in-overview.html
index 4309382..f65663e 100644
--- a/built-in-overview.html
+++ b/built-in-overview.html
@@ -34,13 +34,13 @@ id: built-in-overview
<p>There are two kinds of pipeline components:</p>
<ul>
<li>
- Following pipeline components have built-in implementations
and described in related sections:
+ Pipeline components which have built-in implementations and can have a broad range of uses:
<a
href="apis/latest/org/apache/nlpcraft/NCTokenParser.html">NCTokenParser</a>,
<a
href="apis/latest/org/apache/nlpcraft/NCTokenEnricher.html">NCTokenEnricher</a>,
<a
href="apis/latest/org/apache/nlpcraft/NCEntityParser.html">NCEntityParser</a>.
</li>
<li>
- Following pipeline components cannot have build implementation
because their logic are depended on concrete user model:
+ Pipeline components which can't have built-in implementations because their logic depends on the concrete user model:
<a
href="apis/latest/org/apache/nlpcraft/NCTokenValidator.html">NCTokenValidator</a>,
<a
href="apis/latest/org/apache/nlpcraft/NCEntityEnricher.html">NCEntityEnricher</a>,
<a
href="apis/latest/org/apache/nlpcraft/NCEntityValidator.html">NCEntityValidator</a>,
@@ -52,7 +52,7 @@ id: built-in-overview
<div class="bq info">
<p><b>Built-in component licenses.</b></p>
<p>
- All built-in components which are based on <a
href="https://nlp.stanford.edu/">Stanford NLP</a> models and classes
+ All built-in components which are based on <a href="https://nlp.stanford.edu/">Stanford NLP</a> models and classes
are provided with <a
href="http://www.gnu.org/licenses/gpl-2.0.html">GNU General Public License</a>.
Look at Stanford NLP <a
href="https://nlp.stanford.edu/software/">Software</a> page.
All such components are placed in special project module
<code>nlpcraft-stanford</code>.
diff --git a/built-in-token-enricher.html b/built-in-token-enricher.html
index ddb6929..b0ea4ed 100644
--- a/built-in-token-enricher.html
+++ b/built-in-token-enricher.html
@@ -39,8 +39,9 @@ id: built-in-token-enricher
<p>
<a
href="apis/latest/org/apache/nlpcraft/nlp/enrichers/NCOpenNLPLemmaPosTokenEnricher.html">NCOpenNLPLemmaPosTokenEnricher</a>
-
this component allows adding <code>lemma</code> and <code>pos</code> values to the processed token.
- Look at these links fpr more details: <a
href="https://www.wikiwand.com/en/Lemma_(morphology)">Lemma</a> and
- <a href="https://www.wikiwand.com/en/Part-of-speech_tagging">Part
of speech</a>.
+ Look at these links for more details:
+ <a
href="https://en.wikipedia.org/wiki/Lemma_(morphology)">Lemma</a> and
+ <a
href="https://en.wikipedia.org/wiki/Part-of-speech_tagging">Part of speech</a>.
The current implementation is based on <a href="https://opennlp.apache.org/">Apache OpenNLP</a> project components.
It uses Apache OpenNLP models, which are accessible
<a href="http://opennlp.sourceforge.net/models-1.5/">here</a> for POS taggers.
@@ -71,9 +72,10 @@ id: built-in-token-enricher
<p>
<a
href="apis/latest/org/apache/nlpcraft/nlp/enrichers/NCEnDictionaryTokenEnricher.html">NCEnDictionaryTokenEnricher</a>
-
this component allows to add <code>dict</code> boolean flag to
processed token.
- Note that it requires already defined <code>lemma</code> token
property,
+ Note that it requires already defined <code>lemma</code> token
property.
You can use <a
href="apis/latest/org/apache/nlpcraft/nlp/enrichers/NCOpenNLPLemmaPosTokenEnricher.html">NCOpenNLPLemmaPosTokenEnricher</a>
or any other component which sets
- <code>lemma</code> into the token.
+ <code>lemma</code> into the token. Note that you have to define it in the model pipeline token enricher list before
+ <a href="apis/latest/org/apache/nlpcraft/nlp/enrichers/NCEnDictionaryTokenEnricher.html">NCEnDictionaryTokenEnricher</a>.
</p>
</section>
<section id="enricher-opennlp-stopword">
@@ -83,9 +85,10 @@ id: built-in-token-enricher
<a
href="apis/latest/org/apache/nlpcraft/nlp/enrichers/NCEnStopWordsTokenEnricher.html">NCEnStopWordsTokenEnricher</a>
-
this component allows to add <code>stopword</code> boolean flag to
processed token.
It is based on predefined rules for the English language, but it can also be extended by a custom user word list and an exclusion list.
- Note that it requires already defined <code>lemma</code> token
property,
+ Note that it requires already defined <code>lemma</code> token
property.
You can use <a
href="apis/latest/org/apache/nlpcraft/nlp/enrichers/NCOpenNLPLemmaPosTokenEnricher.html">NCOpenNLPLemmaPosTokenEnricher</a>
or any other component which sets
- <code>lemma</code> into the token.
+ <code>lemma</code> into the token. Note that you have to define it in the model pipeline token enricher list before
+ <a href="apis/latest/org/apache/nlpcraft/nlp/enrichers/NCEnStopWordsTokenEnricher.html">NCEnStopWordsTokenEnricher</a>.
</p>
</section>
<section id="enricher-opennlp-swearword">
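The enricher ordering requirement stated in the built-in-token-enricher.html changes above (the lemma/POS enricher must come before enrichers that read <code>lemma</code>) could be sketched like this. It is a hedged sketch, not commit content: the model resource paths and constructor arguments are assumptions.

```scala
import org.apache.nlpcraft.*
import org.apache.nlpcraft.nlp.enrichers.*
import org.apache.nlpcraft.nlp.parsers.*

// Sketch of enricher ordering; model paths are hypothetical placeholders.
val pipeline = new NCPipelineBuilder().
    withTokenParser(new NCOpenNLPTokenParser("opennlp/en-token.bin")).
    // Sets 'lemma' and 'pos' and therefore must come first.
    withTokenEnricher(new NCOpenNLPLemmaPosTokenEnricher("opennlp/en-pos-maxent.bin", "opennlp/en-lemmatizer.dict")).
    // Both of these rely on the 'lemma' property set above.
    withTokenEnricher(new NCEnStopWordsTokenEnricher()).
    withTokenEnricher(new NCEnDictionaryTokenEnricher()).
    build
```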
diff --git a/built-in-token-parser.html b/built-in-token-parser.html
index a961125..267a914 100644
--- a/built-in-token-parser.html
+++ b/built-in-token-parser.html
@@ -30,7 +30,7 @@ id: built-in-token-parser
component implementation should parse user input plain text and
split this text
into <code>tokens</code> list.
- NLPCraft provides default English language implementation of token
parser.
+ NLPCraft provides two English language token parser
implementations.
Also, the project contains example token parser implementations for the <a href="examples/light_switch_fr.html">French</a> and
<a href="examples/light_switch_ru.html">Russian</a> languages.
@@ -43,7 +43,7 @@ id: built-in-token-parser
<p>
There is <a
href="apis/latest/org/apache/nlpcraft/nlp/parsers/NCOpenNLPTokenParser.html">NCOpenNLPTokenParser</a>
implementation.
- It is token parser implementation which is wrapper on
+ This implementation is a wrapper over
<a href="https://opennlp.apache.org/">Apache OpenNLP</a> project
tokenizer.
</p>
</section>
@@ -53,7 +53,7 @@ id: built-in-token-parser
<p>
There is <a
href="NCStanfordNLPTokenParser.html">NCStanfordNLPTokenParser</a>
implementation.
- It is token parser implementation which is wrapper on
+ This implementation is a wrapper over
<a href="https://nlp.stanford.edu/">Stanford NLP</a> project
tokenizer.
</p>
</section>
@@ -61,10 +61,9 @@ id: built-in-token-parser
<h2 class="section-title">Remarks<a href="#"><i class="top-link fas
fa-fw fa-angle-double-up"></i></a></h2>
<p>
- There are two built-in token parsers added for one English
language because they have some difference
- in their algorithm and can provide different list of tokens for
same user text input.
+ Two different English language implementations are provided because they differ
+ in their algorithms and can produce different token lists for the same user text input.
Some built-in components require a token parser instance as their parameter.
-
</p>
<ul>
<li>
diff --git a/custom-components.html b/custom-components.html
index fb6d403..7dec715 100644
--- a/custom-components.html
+++ b/custom-components.html
@@ -41,7 +41,7 @@ id: custom-components
It's not a frequent situation when you need to prepare your own language tokenizer.
Mostly it can be necessary if you want to work with some new language.
You have to prepare new implementation once and can use it for all
projects on this language.
- Usually you just should find open source solution and wrap it for
+ Usually you just should find an open source solution and wrap it.
You have to implement <a
href="apis/latest/org/apache/nlpcraft/NCTokenParser.html">NCTokenParser</a>
trait.
</p>
<pre class="brush: scala, highlight: [2, 6]">
@@ -73,7 +73,8 @@ id: custom-components
</pre>
<ul>
<li>
- <code>NCFrTokenParser</code> is a simple wrapper which
implements <code>NCTokenParser</code> based on
+ <code>NCFrTokenParser</code> is a simple wrapper which
implements
+ <a
href="apis/latest/org/apache/nlpcraft/NCTokenParser.html">NCTokenParser</a>
methods based on
open source <a href="https://languagetool.org">Language
Tool</a> library.
</li>
</ul>
@@ -86,7 +87,7 @@ id: custom-components
</p>
<p>
Tokens enricher is a component which allows adding additional properties to prepared tokens.
- These tokens properties are used later when entities detection.
+ On the next pipeline processing steps you can define entity detection conditions based on these token properties.
</p>
<pre class="brush: scala, highlight: [25, 26]">
import org.apache.nlpcraft.*
@@ -128,13 +129,13 @@ id: custom-components
</section>
<section id="token-validator">
- <h2 class="section-title">Token validator <a href="#"><i
class="top-link fas fa-fw fa-angle-double-up"></i></a></h2>
+ <h2 class="section-title">Token validator<a href="#"><i
class="top-link fas fa-fw fa-angle-double-up"></i></a></h2>
<p>
You have to implement <a
href="apis/latest/org/apache/nlpcraft/NCTokenValidator.html">NCTokenValidator</a>
trait.
</p>
<p>
- There are tokens are inspected and exception can be thrown from
user code to break user input processing.
+ This component is designed for token inspection; an exception can be thrown from user code to break user input processing.
</p>
<pre class="brush: scala, highlight: [3]">
@@ -164,9 +165,8 @@ id: custom-components
<p>
The most important component which finds user-specific data.
These defined entities are input for <a
href="intent-matching.html">Intent matching</a> conditions.
- If built-in <a
href="apis/latest/org/apache/nlpcraft/nlp/parsers/NCSemanticEntityParser.html">NCSemanticEntityParser</a>
- is not enough, you can implement your own NER searching here.
- There is point for potential integrations with neural networks or
any other solutions which
+ You can implement your own custom logic for named entity detection here.
+ Also, there is a point for potential integrations with neural networks or any other solutions which
help you find and mark your domain specific named entities.
</p>
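At its core an entity parser turns tokens into marked entities that intent matching can consume. A dictionary-lookup sketch in plain Scala (the `Entity` type, the `city` ID, and the dictionary are all made up for illustration):

```scala
// Conceptual sketch of an entity parser: it scans tokens and marks those
// found in a domain dictionary as entities with an ID. A production
// parser could instead call an NER model or an external service.
case class Entity(id: String, text: String)

val cities = Set("paris", "tokyo", "madrid")

def parseEntities(tokens: List[String]): List[Entity] =
  tokens.collect { case t if cities.contains(t.toLowerCase) => Entity("city", t) }
```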
@@ -203,7 +203,7 @@ id: custom-components
Can be useful for extending existing entity enrichers
functionality.
</p>
- <pre class="brush: scala, highlight: [4, 10, 11]">
+ <pre class="brush: scala, highlight: [4, 11, 12]">
import org.apache.nlpcraft.*
object CityPopulationEntityEnricher:
@@ -223,10 +223,10 @@ id: custom-components
<code>Line 4</code> defines getting cities population data
from some external service.
</li>
<li>
- <code>Line 10</code> filters entities by <code>ID</code>.
+ <code>Line 11</code> filters entities by <code>ID</code>.
</li>
<li>
- <code>Line 11</code> enriches entities by new
<code>city:population</code> property.
+ <code>Line 12</code> enriches entities with the new
<code>city:population</code> property.
</li>
</ul>
</section>
@@ -238,7 +238,7 @@ id: custom-components
</p>
<p>
- Entity mapper is component which allows to map one set of entities
into another after the entities
+ Entity mapper is a component which allows you to map one set of entities
to another after the entities
were parsed and enriched. Can be useful for building complex
parsers based on existing.
</p>
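Mapping one entity set into another can be sketched without the NLPCraft API; here a number entity followed by a currency entity is collapsed into a composite one (all IDs are illustrative):

```scala
// Conceptual sketch of an entity mapper: it rewrites one list of parsed
// entities into another, here merging adjacent "num" + "currency"
// entities into a single "money" entity. Ent is a made-up type.
case class Ent(id: String, text: String)

def mapEntities(es: List[Ent]): List[Ent] = es match
  case Ent("num", n) :: Ent("currency", c) :: rest =>
    Ent("money", s"$n $c") :: mapEntities(rest)
  case e :: rest => e :: mapEntities(rest)
  case Nil => Nil
```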
@@ -285,8 +285,7 @@ id: custom-components
You have to implement <a
href="apis/latest/org/apache/nlpcraft/NCEntityValidator.html">NCEntityValidator</a>
trait.
</p>
<p>
- Entities validator is user defined component, where prepared
entities are inspected and exceptions
- can be thrown from user code to break user input processing.
+ This component is designed for entity inspection; an exception
can be thrown from user code to break user input processing.
</p>
<pre class="brush: scala, highlight: [3]">
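The entity validator mirrors the token validator: inspect prepared entities, throw to abort. A stand-alone sketch (the duplicate-ID rule is just an example policy):

```scala
// Conceptual sketch of an entity validator: prepared entity IDs are
// inspected and an exception aborts the request. Rejecting duplicate
// IDs is an arbitrary example rule, not NLPCraft behavior.
def validateEntities(ids: List[String]): Unit =
  val dups = ids.groupBy(identity).collect { case (id, xs) if xs.size > 1 => id }
  if dups.nonEmpty then
    throw new IllegalStateException(s"Duplicate entity IDs: ${dups.mkString(", ")}")
```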
diff --git a/first-example.html b/first-example.html
index f1ae623..2f2fd84 100644
--- a/first-example.html
+++ b/first-example.html
@@ -39,7 +39,7 @@ id: first-example
You can create new Scala projects in many ways - we'll use SBT
to accomplish this task. Make sure that <code>build.sbt</code>
file has the following content:
</p>
- <pre class="brush: js, highlight: []">
+ <pre class="brush: js, highlight: [7]">
ThisBuild / version := "0.1.0-SNAPSHOT"
ThisBuild / scalaVersion := "3.1.3"
lazy val root = (project in file("."))
@@ -53,9 +53,9 @@ id: first-example
<p><b>NOTE: </b>use the latest versions of Scala and ScalaTest.</p>
<p>Create the following files so that resulting project structure
would look like the following:</p>
<ul>
- <li><code>lightswitch_model.yaml</code> - YAML configuration file,
which contains model description.</li>
- <li><code>LightSwitchModel.scala</code> - Scala class, model
implementation.</li>
- <li><code>LightSwitchModelSpec.scala</code> - Scala tests class,
which allows to test your model.</li>
+ <li><code>lightswitch_model.yaml</code> - YAML configuration file
which contains model description.</li>
+ <li><code>LightSwitchModel.scala</code> - Model
implementation.</li>
+ <li><code>LightSwitchModelSpec.scala</code> - Test class that
verifies your model.</li>
</ul>
<pre class="brush: plain, highlight: [7, 10, 14]">
| build.sbt
@@ -78,7 +78,7 @@ id: first-example
<h2 class="section-title">Data Model<a href="#"><i class="top-link fas
fa-fw fa-angle-double-up"></i></a></h2>
<p>
We are going to start with declaring the static part of our model
using YAML which we will later load using
- <code>NCModelAdapter</code> in our Scala-based model
implementation.
+ <a
href="../apis/latest/org/apache/nlpcraft/NCModelAdapter.html">NCModelAdapter</a>
in our Scala-based model implementation.
Open <code>src/main/resources/<b>light_switch.yaml</b></code>
file and replace its content with the following YAML:
</p>
@@ -177,18 +177,18 @@ id: first-example
</p>
<ul>
<li>
- On <code>line 5</code> our class extends
<code>NCModelAdapter</code> that allows us to pass
+ On <code>line 6</code> our class extends <a
href="../apis/latest/org/apache/nlpcraft/NCModelAdapter.html">NCModelAdapter</a>
that allows us to pass
prepared configuration and pipeline into model.
</li>
<li>
- On <code>line 6</code> created model configuration with most
default parameters.
+ <code>Line 7</code> creates the model configuration with mostly
default parameters.
</li>
<li>
- On <code>line 7</code> created pipeline, based on semantic
model definition,
+ <code>Line 8</code> creates the pipeline based on the semantic model
definition,
described in <code>lightswitch_model.yaml</code> file.
</li>
<li>
- <code>Lines 10 and 11</code> annotates intents <code>ls</code>
and its callback method <code>onMatch()</code>.
+ <code>Lines 10 and 11</code> annotate the intent <code>ls</code>
and its callback method <code>onMatch()</code>.
Intent <code>ls</code> requires one action (a token belonging
to the group <code>act</code>) and optional list of light locations
(zero or more tokens with ID <code>ls:loc</code>) - by default
we assume the entire house as a default location.
</li>
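The matching rule described above (one required action, zero or more locations, whole house by default) can be sketched as a plain callback, independent of the intent DSL (the signature and message are illustrative only):

```scala
// Conceptual sketch of the ls intent callback: one required action
// entity plus zero or more ls:loc location entities, defaulting to the
// entire house when no location is given. Not the actual NLPCraft API.
def onMatch(action: String, locations: List[String]): String =
  val where = if locations.isEmpty then "entire house" else locations.mkString(", ")
  s"Lights are $action in: $where."
```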
@@ -264,7 +264,7 @@ id: first-example
</li>
<li>
<code>Line 11</code> calls a special method
- <a
href="apis/latest/org/apache/nlpcraft/NCModelClient.html#debugAsk-fffff96c">debugAsk()</a>.
+ <a
href="../apis/latest/org/apache/nlpcraft/NCModelClient.html#debugAsk-fffff96c">debugAsk()</a>.
It allows to check the winning intent and its callback
parameters without actually
calling the intent.
</li>