[incubator-nlpcraft-website] 02/02: WIP.

sergeykamov Tue, 18 Oct 2022 23:11:44 -0700

This is an automated email from the ASF dual-hosted git repository.

sergeykamov pushed a commit to branch NLPCRAFT-513
in repository https://gitbox.apache.org/repos/asf/incubator-nlpcraft-website.git


commit e54bd225c0de70b42c9be7bd8d1998068c30622b
Author: skhdl <[email protected]>
AuthorDate: Wed Oct 19 10:11:13 2022 +0400

    WIP.
---
 examples/light_switch_ru.html | 468 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 468 insertions(+)

diff --git a/examples/light_switch_ru.html b/examples/light_switch_ru.html
new file mode 100644
index 0000000..80d8d44
--- /dev/null
+++ b/examples/light_switch_ru.html
@@ -0,0 +1,468 @@
+---
+active_crumb: Light Switch RU <code><sub>ex</sub></code>
+layout: documentation
+id: light_switch_ru
+fa_icon: fa-cube
+---
+
+<!--
+ Licensed to the Apache Software Foundation (ASF) under one or more
+ contributor license agreements.  See the NOTICE file distributed with
+ this work for additional information regarding copyright ownership.
+ The ASF licenses this file to You under the Apache License, Version 2.0
+ (the "License"); you may not use this file except in compliance with
+ the License.  You may obtain a copy of the License at
+
+      http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
+-->
+
+<div class="col-md-8 second-column example">
+    <section id="overview">
+        <h2 class="section-title">Overview <a href="#"><i class="top-link fas 
fa-fw fa-angle-double-up"></i></a></h2>
+        <p>
+            This example provides a very simple Russian language 
implementation for NLI-powered light switch. You can say something like
+            "Выключи свет по всем доме" or "Включи свет в детской".
+            By modifying intent callbacks using, for example, HomeKit or 
Arduino-based controllers you can provide the
+            actual light switching.
+        </p>
+        <p>
+            <b>Complexity:</b> <span class="complexity-two-star"><i class="fas 
fa-square"></i> <i class="fas fa-square"></i> <i class="far 
fa-square"></i></span><br/>
+            <span class="ex-src">Source code: <a target="github" 
href="https://github.com/apache/incubator-nlpcraft/tree/master/nlpcraft-examples/lightswitch_ru";>GitHub
 <i class="fab fa-fw fa-github"></i></a><br/></span>
+            <span class="ex-review-all">Review: <a target="github" 
href="https://github.com/apache/incubator-nlpcraft/tree/master/nlpcraft-examples";>All
 Examples at GitHub <i class="fab fa-fw fa-github"></i></a></span>
+        </p>
+    </section>
+    <section id="new_project">
+        <h2 class="section-title">Create New Project <a href="#"><i 
class="top-link fas fa-fw fa-angle-double-up"></i></a></h2>
+        <p>
+            You can create new Scala projects in many ways - we'll use SBT
+            to accomplish this task. Make sure that <code>build.sbt</code> 
file has the following content:
+        </p>
+        <pre class="brush: js, highlight: [8, 9, 10]">
+            ThisBuild / version := "0.1.0-SNAPSHOT"
+            ThisBuild / scalaVersion := "3.1.3"
+            lazy val root = (project in file("."))
+              .settings(
+                name := "NLPCraft LightSwitch FR Example",
+                version := "{{site.latest_version}}",
+                libraryDependencies += "org.apache.nlpcraft" % "nlpcraft" % 
"{{site.latest_version}}",
+                libraryDependencies += "org.apache.lucene" % 
"lucene-analyzers-common" % "8.11.2",
+                libraryDependencies += "org.languagetool" % 
"languagetool-core" % "5.8",
+                libraryDependencies += "org.languagetool" % "language-ru" % 
"5.8"
+                libraryDependencies += "org.scalatest" %% "scalatest" % 
"3.2.14" % "test"
+              )
+        </pre>
+
+        <p>
+            On <code>lines 8, 9 and 10</code> added libraries, which used for 
support base NLP operations with Russian language.
+        </p>
+
+        <p><b>NOTE: </b>use the latest versions of Scala and ScalaTest.</p>
+        <p>Create the following files so that resulting project structure 
would look like the following:</p>
+        <ul>
+            <li><code>lightswitch_model_ru.yaml</code> - YAML configuration 
file, which contains model description.</li>
+            <li><code>LightSwitchRuModel.scala</code> - Scala class, model 
implementation.</li>
+            <li><code>NCRuSemanticEntityParser.scala</code> - Scala class, 
semantic entity parser, custom implementation for Russian language.</li>
+            <li><code>NCRuLemmaPosTokenEnricher.scala</code> - Scala class, 
lemma and point of speech token enricher, custom implementation for Russian 
language.</li>
+            <li><code>NCRuStopWordsTokenEnricher.scala</code> - Scala class, 
stop-words token enricher, custom implementation for Russian language.</li>
+            <li><code>NCRuTokenParser.scala</code> - Scala class, token 
parser, custom implementation for Russian language.</li>
+            <li><code>LightSwitchRuModelSpec.scala</code> - Scala tests class, 
which allows to test your model.</li>
+        </ul>
+        <pre class="brush: plain, highlight: [7, 10, 14, 17, 18, 20, 24]">
+            |  build.sbt
+            +--project
+            |    build.properties
+            \--src
+               +--main
+               |  +--resources
+               |  |  lightswitch_model_ru.yaml
+               |  \--scala
+               |     \--demo
+               |        |  LightSwitchRuModel.scala
+               |        \--nlp
+               |           +--entity
+               |           |  \--parser
+               |           |       NCRuSemanticEntityParser.scala
+               |           \--token
+               |              +--enricher
+               |              |    NCRuLemmaPosTokenEnricher.scala
+               |              |    NCRuStopWordsTokenEnricher.scala
+               |              \--parser
+               |                   NCRuTokenParser.scala
+               \--test
+                   \--scala
+                       \--demo
+                            LightSwitchRuModelSpec.scala
+        </pre>
+    </section>
+    <section id="model">
+        <h2 class="section-title">Data Model<a href="#"><i class="top-link fas 
fa-fw fa-angle-double-up"></i></a></h2>
+        <p>
+            We are going to start with declaring the static part of our model 
using YAML which we will later load using
+            <code>NCModelAdapter</code> in our Scala-based model 
implementation.
+            Open <code>src/main/resources/<b>light_switch_ru.yaml</b></code>
+            file and replace its content with the following YAML:
+        </p>
+        <pre class="brush: js, highlight: [1, 8, 13, 21]">
+            macros:
+              "&lt;TURN_ON&gt;" : 
"{включить|включать|врубить|врубать|запустить|запускать|зажигать|зажечь}"
+              "&lt;TURN_OFF&gt;" : 
"{погасить|загасить|гасить|выключить|выключать|вырубить|вырубать|отключить|отключать|убрать|убирать|приглушить|приглушать|стоп}"
+              "&lt;ENTIRE_OPT&gt;" : 
"{весь|все|всё|повсюду|вокруг|полностью|везде|_}"
+              "&lt;LIGHT_OPT&gt;" : 
"{это|лампа|бра|люстра|светильник|лампочка|лампа|освещение|свет|электричество|электрика|_}"
+
+            elements:
+              - id: "ls:loc"
+                description: "Location of lights."
+                synonyms:
+                  - "&lt;ENTIRE_OPT&gt; 
{здание|помещение|дом|кухня|детская|кабинет|гостиная|спальня|ванная|туалет|{большая|обеденная|ванная|детская|туалетная}
 комната}"
+
+              - id: "ls:on"
+                groups:
+                  - "act"
+                description: "Light switch ON action."
+                synonyms:
+                  - "&lt;LIGHT_OPT&gt; &lt;ENTIRE_OPT&gt; &lt;TURN_ON&gt;"
+                  - "&lt;TURN_ON&gt; &lt;ENTIRE_OPT&gt; &lt;LIGHT_OPT&gt;"
+
+              - id: "ls:off"
+                groups:
+                  - "act"
+                description: "Light switch OFF action."
+                synonyms:
+                  - "&lt;LIGHT_OPT&gt; &lt;ENTIRE_OPT&gt; &lt;TURN_OFF&gt;"
+                  - "&lt;TURN_OFF&gt; &lt;ENTIRE_OPT&gt; &lt;LIGHT_OPT&gt;"
+                  - "без &lt;ENTIRE_OPT&gt; &lt;LIGHT_OPT&gt;"
+        </pre>
+        <p>There are number of important points here:</p>
+        <ul>
+            <li>
+                <code>Line 1</code> defines several macros that are used later 
on throughout the model's elements
+                to shorten the synonym declarations. Note how macros coupled 
with option groups  
+                shorten overall synonym declarations 1000:1 vs. manually 
listing all possible word permutations.
+            </li>
+            <li>
+                <code>Lines 8, 13, 21</code> define three model elements: the 
location of the light, and actions to turn
+                the light on and off. Action elements belong to the same group 
<code>act</code> which
+                will be used in our intent, defined in 
<code>LightSwitchRuModel</code> class. Note that these model
+                elements are defined mostly through macros we have defined 
above.
+
+            </li>
+        </ul>
+        <div class="bq info">
+            <p><b>YAML vs. API</b></p>
+            <p>
+                As usual, this YAML-based static model definition is 
convenient but totally optional. All elements definitions
+                can be provided programmatically inside Scala model 
<code>LightSwitchRuModel</code> class as well.
+            </p>
+        </div>
+    </section>
+    <section id="code">
+        <h2 class="section-title">Model Class <a href="#"><i class="top-link 
fas fa-fw fa-angle-double-up"></i></a></h2>
+        <p>
+            Open 
<code>src/main/scala/demo/<b>LightSwitchRuModel.scala</b></code> file and 
replace its content with the following code:
+        </p>
+        <pre class="brush: scala, highlight: [11, 12, 13, 20, 21, 24, 25, 32]">
+            package demo
+
+            import com.google.gson.Gson
+            import org.apache.nlpcraft.*
+            import org.apache.nlpcraft.annotations.*
+            import demo.nlp.entity.parser.NCRuSemanticEntityParser
+            import demo.lightswitch.nlp.token.enricher.*
+            import demo.lightswitch.nlp.token.parser.NCRuTokenParser
+            import scala.jdk.CollectionConverters.*
+
+            class LightSwitchRuModel extends NCModelAdapter(
+                NCModelConfig("nlpcraft.lightswitch.ru.ex", "LightSwitch 
Example Model RU", "1.0"),
+                new NCPipelineBuilder().
+                    withTokenParser(new NCRuTokenParser()).
+                    withTokenEnricher(new NCRuLemmaPosTokenEnricher()).
+                    withTokenEnricher(new NCRuStopWordsTokenEnricher()).
+                    withEntityParser(new 
NCRuSemanticEntityParser("lightswitch_model_ru.yaml")).
+                    build
+            ):
+                @NCIntent("intent=ls term(act)={has(ent_groups, 'act')} 
term(loc)={# == 'ls:loc'}*")
+                def onMatch(
+                    ctx: NCContext,
+                    im: NCIntentMatch,
+                    @NCIntentTerm("act") actEnt: NCEntity,
+                    @NCIntentTerm("loc") locEnts: List[NCEntity]
+                ): NCResult =
+                    val action = if actEnt.getId == "ls:on" then "включить" 
else "выключить"
+                    val locations = if locEnts.isEmpty then "весь дом" else 
locEnts.map(_.mkText).mkString(", ")
+
+                    // Add HomeKit, Arduino or other integration here.
+                    // By default - just return a descriptive action string.
+                    NCResult(new Gson().toJson(Map("locations" -> locations, 
"action" -> action).asJava))
+        </pre>
+        <p>
+            The intent callback logic is very simple - we return a descriptive 
confirmation message
+            back (explaining what lights were changed). With action and 
location detected, you can add
+            the actual light switching using HomeKit or Arduino devices. Let's 
review this implementation step by step:
+        </p>
+        <ul>
+            <li>
+                On <code>line 11</code> our class extends 
<code>NCModelAdapter</code> that allows us to pass
+                prepared configuration and pipeline into model.
+            </li>
+            <li>
+                On <code>line 12</code> created model configuration with most 
default parameters.
+            </li>
+            <li>
+                On <code>line 13</code> created pipeline, based on custom 
Russian language components, which are described below.
+                <ul>
+                    <li><code>NCRuTokenParser</code>. Token parser.</li>
+                    <li><code>NCRuLemmaPosTokenEnricher</code>. Lemma and 
point of speech token enricher.</li>
+                    <li><code>NCRuStopWordsTokenEnricher</code>. Stop-words 
token enricher.</li>
+                    <li><code>NCRuSemanticEntityParser</code>. Semantic entity 
parser extending.</li>
+                </ul>
+                Note that <code>NCRuSemanticEntityParser</code> is based on 
semantic model definition,
+                described in <code>lightswitch_model_ru.yaml</code> file.
+            </li>
+            <li>
+                <code>Lines 20 and 21</code> annotates intents <code>ls</code> 
and its callback method <code>onMatch</code>.
+                Intent <code>ls</code> requires one action (a token belonging 
to the group act) and optional list of light locations
+                (zero or more tokens with ID ls:loc) - by default we assume 
the entire house as a default location.
+            </li>
+            <li>
+                <code>Lines 24 and 25</code> map terms from detected intent to 
the formal method parameters of the
+                <code>onMatch</code> method.
+            </li>
+            <li>
+                On the <code>line 32</code> the intent callback simply returns 
a confirmation message.
+            </li>
+        </ul>
+
+        <p>
+            Lets review each custom pipeline components.
+        </p>
+
+        <p>
+            Open 
<code>src/main/scala/demo/nlp/token/parser/<b>NCRuTokenParser.scala</b></code> 
file and replace its content with the following code:
+        </p>
+
+        <pre class="brush: scala, highlight: [19]">
+            package demo.nlp.token.parser
+
+            import org.apache.nlpcraft.*
+            import org.languagetool.tokenizers.WordTokenizer
+            import scala.jdk.CollectionConverters.*
+
+            class NCRuTokenParser extends NCTokenParser:
+                private val tokenizer = new WordTokenizer
+
+                override def tokenize(text: String): List[NCToken] =
+                    val toks = collection.mutable.ArrayBuffer.empty[NCToken]
+                    var sumLen = 0
+
+                    for ((word, idx) <- 
tokenizer.tokenize(text).asScala.zipWithIndex)
+                        val start = sumLen
+                        val end = sumLen + word.length
+
+                        if word.strip.nonEmpty then
+                            toks += new NCPropertyMapAdapter with NCToken:
+                                override def getText: String = word
+                                override def getIndex: Int = idx
+                                override def getStartCharIndex: Int = start
+                                override def getEndCharIndex: Int = end
+
+                        sumLen = end
+
+                    toks.toList
+        </pre>
+        <p>
+            <code>NCRuTokenParser</code> is simple wrapper, which implements 
<code>NCTokenParser</code> based on
+            open source solution <a href="https://languagetool.org";>Language 
Tool</a> solution.
+            On <code>line 19</code> <code>NCToken</code> instances created.
+        </p>
+
+        <p>
+            Open 
<code>src/main/scala/demo/nlp/token/enricher/<b>NCRuLemmaPosTokenEnricher.scala</b></code>
 file and replace its content with the following code:
+        </p>
+        <pre class="brush: scala, highlight: [27, 28]">
+            package demo.nlp.token.enricher
+
+            import org.apache.nlpcraft.*
+            import org.languagetool.AnalyzedToken
+            import org.languagetool.tagging.ru.RussianTagger
+            import scala.jdk.CollectionConverters.*
+
+            class NCRuLemmaPosTokenEnricher extends NCTokenEnricher:
+                private def nvl(v: String, dflt : => String): String = if v != 
null then v else dflt
+
+                override def enrich(req: NCRequest, cfg: NCModelConfig, toks: 
List[NCToken]): Unit =
+                    val tags = 
RussianTagger.INSTANCE.tag(toks.map(_.getText).asJava).asScala
+
+                    require(toks.size == tags.size)
+
+                    toks.zip(tags).foreach { case (tok, tag) =>
+                        val readings = tag.getReadings.asScala
+
+                        val (lemma, pos) = readings.size match
+                            // No data. Lemma is word as is, POS is undefined.
+                            case 0 => (tok.getText, "")
+                            // Takes first. Other variants ignored.
+                            case _ =>
+                                val aTok: AnalyzedToken = readings.head
+                                (nvl(aTok.getLemma, tok.getText), 
nvl(aTok.getPOSTag, ""))
+
+                        tok.put("pos", pos)
+                        tok.put("lemma", lemma)
+
+                        () // Otherwise NPE.
+                    }
+        </pre>
+        <p>
+            <code>NCRuLemmaPosTokenEnricher</code> lemma and point of speech 
tokens enricher, based on
+            open source solution <a href="https://languagetool.org";>Language 
Tool</a> solution.
+            On <code>line 27 and 28</code> tokens are enriched by 
<code>pos</code> and <code>lemma</code> data.
+        </p>
+
+        <p>
+            Open 
<code>src/main/scala/demo/nlp/token/enricher/<b>NCRuStopWordsTokenEnricher.scala</b></code>
 file and replace its content with the following code:
+        </p>
+
+        <pre class="brush: scala, highlight: [17]">
+            package demo.nlp.token.enricher
+
+            import org.apache.lucene.analysis.ru.RussianAnalyzer
+            import org.apache.nlpcraft.*
+
+            class NCRuStopWordsTokenEnricher extends NCTokenEnricher:
+                private val stops = RussianAnalyzer.getDefaultStopSet
+
+                private def getPos(t: NCToken): String = 
t.get("pos").getOrElse(throw new NCException("POS not found in token."))
+                private def getLemma(t: NCToken): String = 
t.get("lemma").getOrElse(throw new NCException("Lemma not found in token."))
+
+                override def enrich(req: NCRequest, cfg: NCModelConfig, toks: 
List[NCToken]): Unit =
+                    for (t <- toks)
+                        val lemma = getLemma(t)
+                        lazy val pos = getPos(t)
+
+                        t.put(
+                            "stopword",
+                            lemma.length == 1 && 
!Character.isLetter(lemma.head) && !Character.isDigit(lemma.head) ||
+                            stops.contains(lemma.toLowerCase) ||
+                            pos.startsWith("PARTICLE") ||
+                            pos.startsWith("INTERJECTION") ||
+                            pos.startsWith("PREP")
+                        )
+        </pre>
+        <p>
+            <code>NCRuStopWordsTokenEnricher</code> stop-words tokens 
enricher, based on
+            open source solution <a href="https://lucene.apache.org/";>Apache 
Lucene</a> solution.
+            On <code>line 17</code> tokens are enriched by 
<code>stopword</code> flags data.
+        </p>
+
+        <p>
+            Open 
<code>src/main/scala/demo/nlp/entity/parser/<b>NCRuSemanticEntityParser.scala</b></code>
 file and replace its content with the following code:
+        </p>
+
+        <pre class="brush: scala, highlight: [8, 12]">
+            package demo.nlp.entity.parser
+
+            import opennlp.tools.stemmer.snowball.SnowballStemmer
+            import demo.nlp.token.parser.NCRuTokenParser
+            import org.apache.nlpcraft.nlp.parsers.*
+
+            class NCRuSemanticEntityParser(src: String) extends 
NCSemanticEntityParser(
+                new NCSemanticStemmer:
+                    private val stemmer = new 
SnowballStemmer(SnowballStemmer.ALGORITHM.RUSSIAN)
+                    override def stem(txt: String): String = 
stemmer.synchronized { stemmer.stem(txt.toLowerCase).toString }
+                ,
+                new NCRuTokenParser(),
+                mdlSrcOpt = Option(src)
+            )
+        </pre>
+        <p>
+            <code>NCRuSemanticEntityParser</code> extends 
<code>NCSemanticEntityParser</code>
+            It uses stemmer implementation from <a 
href="https://opennlp.apache.org/";>Apache OpenNLP</a> solution
+            on <code>line 8</code> and already described 
<code>NCRuTokenParser</code> token parser implementation on <code>line 
12</code>.
+        </p>
+    </section>
+
+    <section id="testing">
+        <h2 class="section-title">Testing <a href="#"><i class="top-link fas 
fa-fw fa-angle-double-up"></i></a></h2>
+        <p>
+            The test defined in <code>LightSwitchRuModelSpec</code> allows to 
check that all input test sentences are
+            processed correctly and trigger the expected intent 
<code>ls</code>:
+        </p>
+        <pre class="brush: scala, highlight: [9, 11]">
+            package demo
+
+            import org.apache.nlpcraft.*
+            import org.scalatest.funsuite.AnyFunSuite
+            import scala.util.Using
+
+            class LightSwitchRuModelSpec extends AnyFunSuite:
+                test("test") {
+                    Using.resource(new NCModelClient(new LightSwitchRuModel)) 
{ client =>
+                        def check(txt: String): Unit =
+                            require(client.debugAsk(txt, "userId", 
true).getIntentId == "ls")
+
+                        check("Выключи свет по всем доме")
+                        check("Выруби электричество!")
+                        check("Включи свет в детской")
+                        check("Включай повсюду освещение")
+                        check("Включайте лампы в детской комнате")
+                        check("Свет на кухне, пожалуйста, приглуши")
+                        check("Нельзя ли повсюду выключить свет?")
+                        check("Пожалуйста без света")
+                        check("Отключи электричество в ванной")
+                        check("Выключи, пожалуйста, тут всюду свет")
+                        check("Выключай все!")
+                        check("Свет пожалуйста везде включи")
+                        check("Зажги лампу на кухне")
+                    }
+                }
+        </pre>
+        <ul>
+            <li>
+                On <code>line 9</code> the client for our model is created.
+            </li>
+            <li>
+                On <code>line 11</code> a special method <code>debugAsk</code> 
is called.
+                It allows to check the winning intent and its callback 
parameters without actually
+                calling the intent.
+            </li>
+            <li>
+                <code>Lines 13-25</code> define all the test input sentences 
that should all
+                trigger <code>ls</code> intent.
+            </li>
+        </ul>
+        <p>
+            You can run this test via SBT task <code>executeTests</code> or 
using IDE.
+        </p>
+        <pre class="brush: scala, highlight: []">
+            PS C:\apache\incubator-nlpcraft-examples\lightswitch_ru> sbt 
executeTests
+        </pre>
+    </section>
+    <section>
+        <h2 class="section-title">Done! 👌 <a href="#"><i class="top-link fas 
fa-fw fa-angle-double-up"></i></a></h2>
+        <p>
+            You've created light switch data model and tested it.
+        </p>
+    </section>
+</div>
+<div class="col-md-2 third-column">
+    <ul class="side-nav">
+        <li class="side-nav-title">On This Page</li>
+        <li><a href="#overview">Overview</a></li>
+        <li><a href="#new_project">New Project</a></li>
+        <li><a href="#model">Data Model</a></li>
+        <li><a href="#code">Model Class</a></li>
+        <li><a href="#testing">Testing</a></li>
+        {% include quick-links.html %}
+    </ul>
+</div>
+
+
+
+
+
+

[incubator-nlpcraft-website] 02/02: WIP.

Reply via email to