This is an automated email from the ASF dual-hosted git repository.
aradzinski pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-nlpcraft-website.git
The following commit(s) were added to refs/heads/master by this push:
new 51d192c WIP.
51d192c is described below
commit 51d192c1a1907373983059d5614bd2735be05049
Author: Aaron Radzinski <[email protected]>
AuthorDate: Wed Jul 28 13:28:05 2021 -0700
WIP.
---
_layouts/documentation.html | 7 -
basic-concepts.html | 271 ---------------------------------
blogs/quick_intro_apache_nlpcraft.html | 10 +-
data-model.html | 69 +++++++++
4 files changed, 74 insertions(+), 283 deletions(-)
diff --git a/_layouts/documentation.html b/_layouts/documentation.html
index 9a8c8f4..74d0301 100644
--- a/_layouts/documentation.html
+++ b/_layouts/documentation.html
@@ -56,13 +56,6 @@ layout: interior
</li>
<li class="side-nav-title">Developer Guide</li>
<li>
- {% if page.id == "basic_concepts" %}
- <a class="active" href="/basic-concepts.html">Basic
Concepts</a>
- {% else %}
- <a href="/basic-concepts.html">Basic Concepts</a>
- {% endif %}
- </li>
- <li>
{% if page.id == "first_example" %}
<a class="active" href="/first-example.html">First Example</a>
{% else %}
diff --git a/basic-concepts.html b/basic-concepts.html
deleted file mode 100644
index 72638e7..0000000
--- a/basic-concepts.html
+++ /dev/null
@@ -1,271 +0,0 @@
----
-active_crumb: Basic Concepts
-layout: documentation
-id: basic_concepts
----
-
-<!--
- Licensed to the Apache Software Foundation (ASF) under one or more
- contributor license agreements. See the NOTICE file distributed with
- this work for additional information regarding copyright ownership.
- The ASF licenses this file to You under the Apache License, Version 2.0
- (the "License"); you may not use this file except in compliance with
- the License. You may obtain a copy of the License at
-
- http://www.apache.org/licenses/LICENSE-2.0
-
- Unless required by applicable law or agreed to in writing, software
- distributed under the License is distributed on an "AS IS" BASIS,
- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- See the License for the specific language governing permissions and
- limitations under the License.
--->
-
-<div id="basic-concepts" class="col-md-8 second-column">
- <section id="overview">
- <h2 class="section-title">Basic Concepts <a href="#"><i
class="top-link fas fa-fw fa-angle-double-up"></i></a></h2>
- <p>
- Below we’ll cover some of the key concepts that are important for
NLPCraft:
- </p>
- <ul>
- <li><a href="#model">Data Model</a></li>
- <li><a href="#ne">Named Entities</a></li>
- <li><a href="#intent">Intent Matching</a></li>
- <li><a href="#stm">Conversation <span class="amp">&</span>
STM</a></li>
- </ul>
- </section>
- <section id="model">
- <h2 class="section-sub-title">Data Model <a href="#"><i
class="top-link fas fa-fw fa-angle-double-up"></i></a></h2>
- <p>
- Data model is a central concept in NLPCraft. It defines natural
language interface to your public or
- private data sources like on-premise database or a cloud SaaS
application.
- NLPCraft employs a <em>model-as-a-code</em> approach where entire
data model is
- an implementation of <a target="javadoc"
href="/apis/latest/org/apache/nlpcraft/model/NCModel.html">NCModel</a>
- interface which can be developed using any JVM programming
language like Java, Scala, Kotlin, or Groovy.
- </p>
- <p>
- A data model defines:
- </p>
- <ul>
- <li>Set of model <a href="data-model.html#elements">elements</a>
(a.k.a. named entities) to be detected in the user input.</li>
- <li>Zero or more intents and their callbacks.</li>
- <li>Common model configuration and various life-cycle
callbacks.</li>
- </ul>
- <p>
- Note that model-as-a-code approach allows you to use any software
lifecycle tools and
- frameworks like various build tools, CI/SCM tools, IDEs, etc. to
develop and maintain your data model.
- You don't have to use additional web-based tools to manage some
aspects of your
- data models - your entire model and all of its components are part
of your project source code.
- </p>
- <p>
- Read more about data models <a href="data-model.html">here</a>.
- </p>
- </section>
- <section id="ne">
- <h2 class="section-sub-title">Named Entities <a href="#"><i
class="top-link fas fa-fw fa-angle-double-up"></i></a></h2>
- <p>
- Named entity, also known as a model element or a token, is one of
the main a components defined by the NLPCraft data model.
- A named entity is one or more individual words that have a
consistent semantic meaning and typically denote a
- real-world object, such as persons, locations, number, date and
time, organizations, products, etc. Such
- object can be abstract or have a physical existence.
- </p>
- <p>
- For example, in the following sentence:
- </p>
- <p>
- <i class="fa fa-fw fa-angle-right"></i><code>Meeting is set for
12pm today in San Francisco.</code>
- </p>
- <p>
- the following named entities can be detected:
- </p>
- <table class="gradient-table">
- <thead>
- <tr>
- <th>Words</th>
- <th>Type</th>
- <th>Normalized Value</th>
- </tr>
- </thead>
- <tbody>
- <tr>
- <td><code>Meeting</code></td>
- <td>CUSTOM_OBJ</td>
- <td>meeting</td>
- </tr>
- <tr>
- <td><code>set</code></td>
- <td>CUSTOM_ACT</td>
- <td>set</td>
- </tr>
- <tr>
- <td><code>12pm today</code></td>
- <td>DATE_TIME</td>
- <td>12:00 September 1, 2019 GMT</td>
- </tr>
- <tr>
- <td><code>San Francisco</code></td>
- <td>GEO_CITY</td>
- <td>San Francisco, CA USA</td>
- </tr>
- </tbody>
- </table>
- <p>
- In most cases named entities will have associated <em>normalized
value</em>. It is especially important for named entities that have many
- different notational forms such as time and date, currency,
geographical locations, etc. For example, <code>New York</code>,
- <code>New York City</code> and <code>NYC</code> all refer to the
same "New York City, NY USA" location which is a standard normalized form.
- </p>
- <p>
- The process of detecting named entities is called Named Entity
Recognition (NER). There are many
- different ways of how a certain named entity can be detected:
through list of synonyms, by name, rule-based or by using
- statistical techniques like neural networks with large corpus of
predefined data. NLPCraft natively supports synonym-based
- named entities definition as well as the ability to compose
compose new named entities through powerful <a
href="/intent-matching.html">Intent Definition Language</a> (IDL)
- combining other named entities including named entities from
external projects such OpenNLP, spaCy or Stanford CoreNLP.
- </p>
- <p>
- Named entities allow you to abstract from basic linguistic forms
like nouns and verbs to deal with the higher level semantic
- abstractions like geographical location or time when you are
trying to understand the meaning of the sentence.
- One of the main goals of named entities is to act as an input
ingredients for intent matching.
- </p>
- <p>
- Read more in-depth about named entities <a
href="data-model.html">here</a>.
- </p>
- </section>
- <section id="intent">
- <h2 class="section-sub-title">Intent Matching <a href="#"><i
class="top-link fas fa-fw fa-angle-double-up"></i></a></h2>
- <p>
- You can think of intent matching as regular expression matching
where instead of characters you deal with detected named entities.
- Intent defines a pattern in terms of detected named entities (or
tokens) and a callback to call when submitted sentence
- matches that pattern.
- </p>
- <p>
- Intents can also match on the <em>dialog flow</em> additionally to
the matching on the current user sentence.
- Dialog flow matching means matching an intent based on what
intents were matched previously for the same user
- and data model, i.e. the flow of the dialog. Note that you should
not confuse dialog flow intent matching with
- conversational STM that is used to fill in missing tokens from
memory.
- </p>
- <div class="bq success">
- <div class="bq-idea-container">
- <div><div>💡</div></div>
- <div>
- You can think of NLPCraft data model as a mechanism to
define named entities and intents that use
- these named entities to pattern match the user input.
- </div>
- </div>
- </div>
- <p>
- Learn more details about intent matching <a
href="intent-matching.html">here</a>.
- </p>
- </section>
- <section id="stm">
- <h2 class="section-sub-title">Conversation <span
class="amp">&</span> STM <a href="#"><i class="top-link fas fa-fw
fa-angle-double-up"></i></a></h2>
- <div class="bq info">
- <b>Short-Term Memory</b>
- <p>
- Read more in-depth explanation about maintaining
conversational context and
- Short-Term Memory in Aaron Radzinski <a
href="/blogs/short_term_memory.html">blog.</a>
- </p>
- </div>
-
- <p>
- NLPCraft provides automatic conversation management right out of
the box.
- Conversation management is based on the idea of short-term memory
(STM). STM is automatically
- maintained by NLPCraft per each user and data model. Essentially,
NLPCraft "remembers"
- the context of the conversation and can supply the currently
missing elements from its memory (i.e. from STM).
- STM implementation is also fully integrated with intent matching.
- </p>
- <p>
- Maintaining conversation state is necessary for effective context
resolution, so that users
- could ask, for example, the following sequence of questions using
example weather model:
- </p>
- <dl class="stm-example">
- <dd><i class="fa fa-fw fa-angle-right"></i>What’s the weather in
London today?</dd>
- <dt>
- <p>
- User gets the current London’s weather.
- STM is empty at this moment so NLPCraft expects to get all
necessary information from
- the user sentence. Meaningful parts of the sentence get
stored in STM.
- </p>
- <div class="stm-state">
- <div class="stm">
- <label>STM Before:</label>
- <span> </span>
- </div>
- <div class="stm">
- <label>STM After:</label>
- <span>weather</span>
- <span>London</span>
- <span>today</span>
- </div>
- </div>
- </dt>
- <dd><i class="fa fa-fw fa-angle-right"></i>And what about
Berlin?</dd>
- <dt>
- <p>
- User gets the current Berlin’s weather.
- The only useful data in the user sentence is name of the
city <code>Berlin</code>. But since
- NLPCraft now has data from the previous question in its
STM it can safely deduce that we
- are asking about <code>weather</code> for
<code>today</code>.
- <code>Berlin</code> overrides <code>London</code> in STM.
- </p>
- <div class="stm-state">
- <div class="stm">
- <label>STM Before:</label>
- <span>weather</span>
- <span>London</span>
- <span>today</span>
- </div>
- <div class="stm">
- <label>STM After:</label>
- <span>weather</span>
- <span><b>Berlin</b></span>
- <span>today</span>
- </div>
- </div>
- </dt>
- <dd><i class="fa fa-fw fa-angle-right"></i>Next week forecast?</dd>
- <dt>
- <p>
- User gets the next week forecast for Berlin.
- Again, the only useful data in the user sentence is
<code>next week</code> and <code>forecast</code>.
- STM supplies <code>Berlin</code>. <code>Next week</code>
override <code>today</code>, and
- <code>forecast</code> override <code>weather</code> in STM.
- </p>
- <div class="stm-state">
- <div class="stm">
- <label>STM Before:</label>
- <span>weather</span>
- <span>Berlin</span>
- <span>today</span>
- </div>
- <div class="stm">
- <label>STM After:</label>
- <span><b>forecast</b></span>
- <span>Berlin</span>
- <span><b>Next week</b></span>
- </div>
- </div>
- </dt>
- </dl>
- <p>
- Note that STM is maintained per user and per data model.
- Conversation management implementation is also smart enough to
clear STM after certain
- period of time, i.e. it “forgets” the conversational context after
few minutes of inactivity.
- Note also that conversational context can also be cleared
explicitly
- via <a
href="https://github.com/apache/incubator-nlpcraft/blob/master/openapi/nlpcraft_swagger.yml"
target="github">REST API</a>.
- </p>
- </section>
-</div>
-<div class="col-md-2 third-column">
- <ul class="side-nav">
- <li class="side-nav-title">On This Page</li>
- <li><a href="#model">Data Model</a></li>
- <li><a href="#ne">Named Entities</a></li>
- <li><a href="#intent">Intent Matching</a></li>
- <li><a href="#stm">Conversation <span class="amp">&</span>
STM</a></li>
- {% include quick-links.html %}
- </ul>
-</div>
-
-
-
-
diff --git a/blogs/quick_intro_apache_nlpcraft.html
b/blogs/quick_intro_apache_nlpcraft.html
index f3cb800..cfade05 100644
--- a/blogs/quick_intro_apache_nlpcraft.html
+++ b/blogs/quick_intro_apache_nlpcraft.html
@@ -98,10 +98,10 @@ publish_date: November 16, 2020
NLPCraft natively integrates with 3rd party libraries for
basic NLP processing and named entity recognition:
</p>
<div style="display: inline-block; margin-bottom: 20px">
- <a style="margin-right: 10px" target="opennlp"
href="https://opennlp.apache.org"><img src="/images/opennlp-logo.png"
height="32px" alt=""></a>
- <a style="margin-right: 10px" target="google"
href="https://cloud.google.com/natural-language/"><img
src="/images/google-cloud-logo-small.png" height="32px" alt=""></a>
- <a style="margin-right: 10px" target="stanford"
href="https://stanfordnlp.github.io/CoreNLP"><img
src="/images/corenlp-logo.gif" height="48px" alt=""></a>
- <a style="margin-right: 10px" target="spacy"
href="https://spacy.io"><img src="/images/spacy-logo.png" height="32px"
alt=""></a>
+ <a style="margin-right: 10px" target=_
href="https://opennlp.apache.org"><img src="/images/opennlp-logo-h32.png"
alt=""></a>
+ <a style="margin-right: 10px" target=_
href="https://cloud.google.com/natural-language/"><img
src="/images/google-cloud-logo-small-h32.png" alt=""></a>
+ <a style="margin-right: 10px" target=_
href="https://stanfordnlp.github.io/CoreNLP"><img
src="/images/corenlp-logo-h48.png" alt=""></a>
+ <a style="margin-right: 10px" target=_
href="https://spacy.io"><img src="/images/spacy-logo-h32.png" alt=""></a>
</div>
</div>
<div class="col-6">
@@ -366,7 +366,7 @@ publish_date: November 16, 2020
<p>
We’ll leave outside of this article the details of the particular
integration with HomeKit or Arduino devices. We’ll also defer
to the NLPCraft <a href="/docs.html">documentation</a> to learn about
other topics such as
- <a href="/basic-concepts.html#stm">conversation management</a>,
+ <a href="/short-term-memory.html">conversation management</a>,
details of <a href="/data-model.html#macros">Macro DSL</a> and <a
href="/intent-matching.html">Intent Definition Language</a>,
built-in <a href="/tools/test_framework.html">testing tools</a>,
3rd party NER <a href="/integrations.html">integrations</a>, etc.
diff --git a/data-model.html b/data-model.html
index 66a66c8..14cf1ec 100644
--- a/data-model.html
+++ b/data-model.html
@@ -472,6 +472,75 @@ intents:
</div>
</div>
</section>
+ <section id="ne">
+ <h2 class="section-sub-title">Named Entities <a href="#"><i
class="top-link fas fa-fw fa-angle-double-up"></i></a></h2>
+ <p>
+ A named entity, also known as a model element or a token, is one of
the main components defined by the NLPCraft data model.
+ A named entity is one or more individual words that have a
consistent semantic meaning and typically denote a
+ real-world object, such as a person, location, number, date and
time, organization, product, etc. Such
+ an object can be abstract or have a physical existence.
+ </p>
+ <p>
+ For example, in the following sentence:
+ </p>
+ <p>
+ <i class="fa fa-fw fa-angle-right"></i><code>Meeting is set for
12pm today in San Francisco.</code>
+ </p>
+ <p>
+ the following named entities can be detected:
+ </p>
+ <table class="gradient-table">
+ <thead>
+ <tr>
+ <th>Words</th>
+ <th>Type</th>
+ <th>Normalized Value</th>
+ </tr>
+ </thead>
+ <tbody>
+ <tr>
+ <td><code>Meeting</code></td>
+ <td>CUSTOM_OBJ</td>
+ <td>meeting</td>
+ </tr>
+ <tr>
+ <td><code>set</code></td>
+ <td>CUSTOM_ACT</td>
+ <td>set</td>
+ </tr>
+ <tr>
+ <td><code>12pm today</code></td>
+ <td>DATE_TIME</td>
+ <td>12:00 September 1, 2019 GMT</td>
+ </tr>
+ <tr>
+ <td><code>San Francisco</code></td>
+ <td>GEO_CITY</td>
+ <td>San Francisco, CA USA</td>
+ </tr>
+ </tbody>
+ </table>
+ <p>
+ In most cases, a named entity will have an associated <em>normalized
value</em>. This is especially important for named entities that have many
+ different notational forms such as time and date, currency,
geographical locations, etc. For example, <code>New York</code>,
+ <code>New York City</code> and <code>NYC</code> all refer to the
same "New York City, NY USA" location, which is its standard normalized form.
+ </p>
+ <p>
+ The process of detecting named entities is called Named Entity
Recognition (NER). There are many
+ different ways a named entity can be detected:
through a list of synonyms, by name, by rules, or by using
+ statistical techniques like neural networks with a large corpus of
predefined data. NLPCraft natively supports synonym-based
+ named entity definitions as well as the ability to compose
new named entities through the powerful <a
href="/intent-matching.html">Intent Definition Language</a> (IDL)
+ by combining other named entities, including named entities from
external projects such as OpenNLP, spaCy or Stanford CoreNLP.
+ </p>
+ <p>
+ Named entities allow you to abstract from basic linguistic forms
like nouns and verbs and deal with higher-level semantic
+ abstractions like geographical location or time when you are
trying to understand the meaning of the sentence.
+ One of the main goals of named entities is to act as input
ingredients for intent matching.
+ </p>
+ <p>
+ Read more in-depth about named entities <a
href="data-model.html">here</a>.
+ </p>
+ </section>
<section id="elements">
<h2 class="section-title">Model Elements <a href="#"><i
class="top-link fas fa-fw fa-angle-double-up"></i></a></h2>
<p>