This is an automated email from the ASF dual-hosted git repository.
sergeykamov pushed a commit to branch NLPCRAFT-513-intents
in repository https://gitbox.apache.org/repos/asf/incubator-nlpcraft-website.git
The following commit(s) were added to refs/heads/NLPCRAFT-513-intents by this
push:
new b9497fb WIP.
b9497fb is described below
commit b9497fb45dd51a2f5c9a268ac02bff7779d79489
Author: Sergey Khisamov <[email protected]>
AuthorDate: Tue Dec 6 17:08:26 2022 +0400
WIP.
---
_data/idl-fns.yml | 41 +++++++-------
intent-matching.html | 149 +++++++++++++++------------------------------------
2 files changed, 64 insertions(+), 126 deletions(-)
diff --git a/_data/idl-fns.yml b/_data/idl-fns.yml
index 1f1aa6f..8cbd4bd 100644
--- a/_data/idl-fns.yml
+++ b/_data/idl-fns.yml
@@ -23,9 +23,9 @@ fn-ent:
- name: ent_id
sig: |
<b>ent_id</b>(t: Entity<em><sub>opt</sub></em>) ⇒ String, # ⇒ String
- synopsis: Returns {% scaladoc NCEntity.html#getId-0 entity ID() %}
+ synopsis: Returns <a
href="/apis/latest/org/apache/nlpcraft/NCEntity.html#getId-0">entity ID</a>
desc: |
- Returns <a class="not-code" target="javadoc"
href="/apis/latest/org/apache/nlpcraft/model/NCEntity.html#getId()">entity
ID</a>
+ Returns <a
href="/apis/latest/org/apache/nlpcraft/NCEntity.html#getId-0">entity ID</a>
for the current entity (default) or the provided one by the optional
parameter <code><b>t</b></code>. Note that this
functions has a special shorthand <code><b>#</b></code>.
usage: |
@@ -38,9 +38,9 @@ fn-ent:
- name: ent_groups
sig: |
<b>ent_groups</b>(t: Entity<em><sub>opt</sub></em>) ⇒ List[String]
- synopsis: Gets the list of <a class="not-code" target="javadoc"
href="/apis/latest/org/apache/nlpcraft/model/NCEntity.html#getGroups()">groups</a>
this entity belongs to
+ synopsis: Gets the list of <a class="not-code"
href="/apis/latest/org/apache/nlpcraft/NCEntity.html#getGroups-0">groups</a>
this entity belongs to
desc: |
- Gets the list of <a class="not-code" target="javadoc"
href="/apis/latest/org/apache/nlpcraft/model/NCEntity.html#getGroups()">groups</a>
+ Gets the list of <a class="not-code"
href="/apis/latest/org/apache/nlpcraft/NCEntity.html#getGroups-0">groups</a>
the current entity (default) or the provided one by the optional
parameter <code><b>t</b></code> belongs to. Note that,
by default, if not specified explicitly, entity always belongs to one
group with ID equal to entity ID.
May return an empty list but never a <code>null</code>.
@@ -64,7 +64,7 @@ fn-ent:
<b>ent_text</b>(t: Entity<em><sub>opt</sub></em>) ⇒ String
synopsis: Returns entity's original text
desc: |
- Returns <a class="not-code" target="javadoc"
href="/apis/latest/org/apache/nlpcraft/model/NCEntity.html#getOriginalText()">entity's
original text</a>.
+ Returns <a class="not-code"
href="/apis/latest/org/apache/nlpcraft/NCEntity.html#mkText-0">entity's
original text</a>.
If <code>t</code> is not provided the current entity is assumed.
usage: |
// Result: entity original input text.
@@ -75,7 +75,7 @@ fn-ent:
<b>ent_index</b>(t: Entity<em><sub>opt</sub></em>) ⇒ Long
synopsis: Returns entity's index in the original input
desc: |
- Returns <a class="not-code" target="javadoc"
href="/apis/latest/org/apache/nlpcraft/model/NCEntity.html#getIndex()">entity's
index</a> in the original input. Note that this is an index of the entity and
not of the character.
+ Returns entity's index in the original input. Note that this is an index
of the entity and not of the character.
If <code>t</code> is not provided the current entity is assumed.
usage: |
// Result: 'true' if index of this entity in the original input is equal
to 1.
@@ -151,11 +151,11 @@ fn-ent:
<b>ent_is_before_group</b>(grp: String) ⇒ Boolean
synopsis: |
Returns <code>true</code> if there is a entity that belongs to the
- <a class="not-code" target="javadoc"
href="/apis/latest/org/apache/nlpcraft/model/NCEntity.html#getGroups()">group</a>
+ <a class="not-code"
href="/apis/latest/org/apache/nlpcraft/NCEntity.html#getGroups-0">group</a>
<code>grp</code> after this entity
desc: |
Returns <code>true</code> if there is a entity that belongs to the
- <a class="not-code" target="javadoc"
href="/apis/latest/org/apache/nlpcraft/model/NCEntity.html#getGroups()">group</a>
+ <a class="not-code"
href="/apis/latest/org/apache/nlpcraft/NCEntity.html#getGroups-0">group</a>
<code>grp</code> after this entity.
usage: |
// Result: 'true' if there is a entity that belongs to the group 'grp'
after this entity.
@@ -166,11 +166,11 @@ fn-ent:
<b>ent_is_after_group</b>(grp: String) ⇒ Boolean
synopsis: |
Returns <code>true</code> if there is a entity that belongs to the
- <a class="not-code" target="javadoc"
href="/apis/latest/org/apache/nlpcraft/model/NCEntity.html#getGroups()">group</a>
+ <a class="not-code"
href="/apis/latest/org/apache/nlpcraft/NCEntity.html#getGroups-0">group</a>
<code>grp</code> before this entity
desc: |
Returns <code>true</code> if there is a entity that belongs to the
- <a class="not-code" target="javadoc"
href="/apis/latest/org/apache/nlpcraft/model/NCEntity.html#getGroups()">group</a>
+ <a class="not-code"
href="/apis/latest/org/apache/nlpcraft/NCEntity.html#getGroups-0">group</a>
<code>grp</code> before this entity.
usage: |
// Result: 'true' if there is a entity that belongs to the group 'grp'
before this entity.
@@ -357,19 +357,19 @@ fn-req:
- name: req_id
sig: |
<b>req_id</b> ⇒ String
- synopsis: Returns <a class="not-code" target="javadoc"
href="/apis/latest/org/apache/nlpcraft/model/NCRequest.html#getServerRequestId()">server
request ID</a>
+ synopsis: Returns <a class="not-code"
href="/apis/latest/org/apache/nlpcraft/NCRequest.html#getRequestId-0">request
ID</a>
desc: |
- Returns <a class="not-code" target="javadoc"
href="/apis/latest/org/apache/nlpcraft/model/NCRequest.html#getServerRequestId()">server
request ID</a>.
+ Returns <a class="not-code"
href="/apis/latest/org/apache/nlpcraft/NCRequest.html#getRequestId-0">request
ID</a>.
usage: |
- // Result: server request ID.
+ // Result: request ID.
req_id
- name: req_text
sig: |
<b>req_text</b> ⇒ String
- synopsis: Returns request <a class="not-code" target="javadoc"
href="/apis/latest/org/apache/nlpcraft/model/NCRequest.html#getNormalizedText()">normalied
text</a>
+ synopsis: Returns request <a class="not-code" target="javadoc"
href="/apis/latest/org/apache/nlpcraft/NCRequest.html#getText-0">text</a>
desc: |
- Returns request <a class="not-code" target="javadoc"
href="/apis/latest/org/apache/nlpcraft/model/NCRequest.html#getNormalizedText()">normalized
text</a>.
+ Returns request <a class="not-code" target="javadoc"
href="/apis/latest/org/apache/nlpcraft/NCRequest.html#getText-0">text</a>.
usage: |
// Result: request text.
req_text
@@ -377,9 +377,9 @@ fn-req:
- name: req_tstamp
sig: |
<b>req_tstamp</b> ⇒ Long
- synopsis: Gets UTC/GMT receive <a class="not-code" target="javadoc"
href="/apis/latest/org/apache/nlpcraft/model/NCRequest.html#getNormalizedText()">timestamp</a>
+ synopsis: Gets UTC/GMT receive <a class="not-code" target="javadoc"
href="/apis/latest/org/apache/nlpcraft/NCRequest.html#getReceiveTimestamp-0">timestamp</a>
desc: |
- Gets UTC/GMT <a class="not-code" target="javadoc"
href="/apis/latest/org/apache/nlpcraft/model/NCRequest.html#getNormalizedText()">timestamp</a>
+ Gets UTC/GMT <a class="not-code" target="javadoc"
href="/apis/latest/org/apache/nlpcraft/NCRequest.html#getReceiveTimestamp-0">timestamp</a>
in ms when user input was received.
usage: |
// Result: input receive timsstamp in ms.
@@ -388,9 +388,9 @@ fn-req:
- name: user_id
sig: |
<b>user_id</b> ⇒ String
- synopsis: Returns <a class="not-code" target="javadoc"
href="/apis/latest/org/apache/nlpcraft/model/NCUser.html#getId()">user ID</a>
+ synopsis: Returns <code>user ID</code>
desc: |
- Returns <a class="not-code" target="javadoc"
href="/apis/latest/org/apache/nlpcraft/model/NCUser.html#getId()">user ID</a>.
+ Returns <code>user ID</code>.
usage: |
// Result: user ID.
user_id
@@ -897,8 +897,7 @@ fn-metadata:
<b>meta_ent</b>(p: String) ⇒ Any
synopsis: Gets entity metadata property <code><b>p</b></code>
desc: |
- Gets entity metadata property <code><b>p</b></code>. See
- <a href="/data-model.html#meta">entity metadata</a> for more information.
+ Gets entity metadata property <code><b>p</b></code>.
usage: |
// Result: 'nlp:token:text' entity metadata property.
meta_ent('nlp:token:text')
diff --git a/intent-matching.html b/intent-matching.html
index 7e5fe35..c8317cb 100644
--- a/intent-matching.html
+++ b/intent-matching.html
@@ -1144,123 +1144,62 @@ id: intent_matching
</section>
<section id="logic">
<h2 class="section-title">Intent Matching Logic <a href="#"><i
class="top-link fas fa-fw fa-angle-double-up"></i></a></h2>
- <p>
- In order to understand the intent matching logic lets review the
overall user request processing workflow:
- </p>
- <figure>
- <img class="img-fluid" style="border: none; padding: 0;"
src="/images/intent_matching1.png" alt="">
- <figcaption><b>Fig. 1</b> User Request Workflow</figcaption>
- </figure>
- <ul>
- <li>
- <b>Step: 0</b><br>
- <p>
- Server receives REST call <code>/ask</code> or
<code>/ask/sync</code> that contains the text
- of the sentence that needs to be processed.
- </p>
- </li>
- <li>
- <b>Step: 1</b><br>
- <p>
- At this step the server attempts to find additional
variations of the input sentence by substituting
- certain words in the original text with synonyms from
Google's BERT dataset. Note that server will not use the synonyms that
- are already defined in the model itself - it only tries to
compensate for the potential incompleteness
- of the model. The result of this step is one or more
sentences that all have the same meaning as the
- original text.
- </p>
- </li>
- <li>
- <b>Step: 2</b><br>
- <p>
- At this step the server takes one or more sentences from
the previous step and tokenizes them. This
- process involves converting the text into a sequence of
enriched entities representing named entities.
- This step also performs the initial server-side enrichment
and detection of the
- <a href="/data-model.html#builtin">built-in named
entities</a>.
- </p>
- <p>
- The result of this step is a sequence of converted
sentences, where each element is a sequence
- of entities. These sequences are send down to the data
probe that has requested data model deployed.
- </p>
- </li>
- <li>
- <b>Step: 3</b><br>
- <p>
- This is the first step of the probe-side processing. At
this point the data probe receives one or more
- sequences of entities. Probe then takes each sequence and
performs the final enrichment by detecting user-defined
- elements additionally to the built-in entities that were
detected on the server during step 2 above.
- </p>
- </li>
- <li>
- <b>Step: 4</b><br>
- <p>
- This is an important step for understanding intent
matching logic. At this step the data probe
- takes sequences of entities generated at the last step and
comes up with one or more parsing
- variants. A parsing variant is a sequence of entities that
is free from entity overlapping and other parsing
- ambiguities. Typically, a single sequence of entities can
produce one (always) or more parsing variants.
- </p>
- <p>
- Let's consider the input text <code>'A B C D'</code> and
the following elements defined in our model:
- </p>
- <pre class="brush: js">
- "elements": [
- {
- "id": "elm1",
- "synonyms": ["A B"]
- },
- {
- "id": "elm2",
- "synonyms": ["B C"]
- },
- {
- "id": "elm3",
- "synonyms": ["D"]
- }
- ],
- </pre>
- <p>
- All of these elements will be detected but since two of
them are overlapping (<code>elm1</code> and
- <code>elm2</code>) there should be <b>two</b> parsing
variants at the output of this step:
- </p>
+ <p>
+ The result of {% scaladoc NCPipeline NCPipeline %} processing is a
collection of {% scaladoc NCVariant NCVariant %} instances.
+ As an example, let's consider the input text <code>'A B C
D'</code> and the following elements defined in our model:
+ </p>
+ <pre class="brush: js">
+ "elements": [
+ {
+ "id": "elm1",
+ "synonyms": ["A B"]
+ },
+ {
+ "id": "elm2",
+ "synonyms": ["B C"]
+ },
+ {
+ "id": "elm3",
+ "synonyms": ["D"]
+ }
+ ],
+ </pre>
+ <p>
+ All of these elements will be detected, but since two of them
overlap (<code>elm1</code> and
+ <code>elm2</code>) there will be <b>two</b> parsing variants
in the pipeline result:
+ </p>
<ol>
<li><code>elm1</code>('A', 'B') <code>freeword</code>('C')
<code>elm3</code>('D')</li>
<li><code>freeword</code>('A') <code>elm2</code>('B', 'C')
<code>elm3</code>('D')</li>
</ol>
<p></p>
<p>
- Note that at this point the <em>system cannot determine
which of these variants is the best one
+ Note that initially the <em>system cannot determine which
of these variants is the best one
for matching - there's simply not enough information at
this stage</em>. It can only be determined
- when each variant is matched against model's intents -
which happens in the next step.
- </p>
- </li>
- <li>
- <b>Step: 5</b><br>
- <p>
- At this step the actual matching between intents and
variants happens. Each parsing variant from the previous
- step is matched against each intent. Each matching pair of
a variant and an intent produce a match with a
+ when each variant is matched against the model's intents.
+ So, each parsing variant is matched against each intent.
Each matching pair of a variant and an intent produces a match with a
<em>certain weight</em>. If there are no matches at all -
an error is returned. If matches were found, the match
with the biggest weight is selected as a winning match. If
multiple matches have the same weight, their
respective variants' weights will be used to further sort
them out. Finally, the intent's callback from the winning match is
called.
</p>
- <p>
- Although details on exact algorithm on weight calculation
are too complex, here's the general guidelines
- on what determines the weight of the match between a
parsing variant and the intent. Note that these rules
- coalesce around the principle idea that the <b>more
specific match always wins</b>:
- </p>
- <ul>
- <li>
- A match that captures more entities has more weight
than a match with less entities. As a corollary, the match
- with less free words (i.e. unused words) has bigger
weight than a match with more free words.
- </li>
- <li>
- Entities for user-defined elements are more important
than built-in entities.
- </li>
- <li>
- A more specific match has bigger weight. In other
words, a match that uses an entity from the conversation
- context (i.e short-term-memory) has less weight than a
match that only uses entities from the current request. In the same
- way older entities from the conversation give less
weight than the more recent ones.
- </li>
- </ul>
+ <p>
+ Although the exact weight calculation algorithm is too complex to
describe here, these are the general guidelines
+ on what determines the weight of the match between a parsing
variant and the intent. Note that these rules
+ coalesce around the principal idea that the <b>more specific match
always wins</b>:
+ </p>
+ <ul>
+ <li>
+ A match that captures more entities has more weight than a
match with fewer entities. As a corollary, the match
+ with fewer free words (i.e. unused words) has bigger weight
than a match with more free words.
+ </li>
+ <li>
+ Entities for user-defined elements are more important than
built-in entities.
+ </li>
+ <li>
+ A more specific match has bigger weight. In other words, a
match that uses an entity from the conversation
+ context (i.e. short-term memory) has less weight than a match
that only uses entities from the current request. In the same
+ way, older entities from the conversation give less weight than
the more recent ones.
</li>
</ul>
</section>
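
Illustration (not part of the commit): the 'A B C D' example and the weight-based selection described in the intent-matching.html hunk above can be sketched in a few lines of Python. This is only a toy model of the idea, not NLPCraft's implementation. The ELEMENTS and INTENTS tables, the function names, and the weight formula (matched entity count first, then fewer unused words) are all assumptions made for illustration; the actual weight calculation is described in the text as considerably more complex.

```python
from itertools import combinations

# Hypothetical model: element ID -> single synonym, as a tuple of words.
ELEMENTS = {"elm1": ("A", "B"), "elm2": ("B", "C"), "elm3": ("D",)}
# Hypothetical intents: intent ID -> set of element IDs it requires.
INTENTS = {"i1": {"elm1", "elm3"}, "i2": {"elm3"}}


def detect(words):
    """Return every (element_id, start, end) span where a synonym occurs."""
    found = []
    for elem_id, syn in ELEMENTS.items():
        n = len(syn)
        for i in range(len(words) - n + 1):
            if tuple(words[i:i + n]) == syn:
                found.append((elem_id, i, i + n))
    return found


def overlaps(a, b):
    """True if two detected spans share at least one word position."""
    return a[1] < b[2] and b[1] < a[2]


def variants(words):
    """Enumerate maximal overlap-free sets of detections.

    Words not covered by any detection in a variant become free words.
    """
    spans = detect(words)
    compatible = [
        set(c)
        for r in range(len(spans) + 1)
        for c in combinations(spans, r)
        if all(not overlaps(x, y) for x, y in combinations(c, 2))
    ]
    maximal = [s for s in compatible if not any(s < o for o in compatible)]
    result = []
    for s in maximal:
        covered = {i for (_, a, b) in s for i in range(a, b)}
        items = [(e, tuple(words[a:b]), a) for (e, a, b) in s]
        items += [("freeword", (w,), i)
                  for i, w in enumerate(words) if i not in covered]
        items.sort(key=lambda t: t[2])  # restore input order
        result.append([(e, ws) for e, ws, _ in items])
    return sorted(result)


def best_match(words):
    """Match every variant against every intent and pick the heaviest match.

    Assumed toy weight: (number of matched entities, -number of unused items).
    """
    matches = []
    for variant in variants(words):
        ids = {e for e, _ in variant if e != "freeword"}
        for intent_id, required in INTENTS.items():
            if required <= ids:
                unused = sum(1 for e, _ in variant if e not in required)
                matches.append(((len(required), -unused), intent_id, variant))
    if not matches:
        raise LookupError("no matching intent")  # "an error is returned"
    return max(matches, key=lambda m: m[0])
```

Calling variants("A B C D".split()) yields the two variants listed in the hunk (elm1 + freeword 'C' + elm3, and freeword 'A' + elm2 + elm3). With the hypothetical intents above, best_match selects intent i1 on the first variant, because that match uses more entities and leaves fewer unused words.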