This is an automated email from the ASF dual-hosted git repository. sergeykamov pushed a commit to branch NLPCRAFT-513 in repository https://gitbox.apache.org/repos/asf/incubator-nlpcraft-website.git
commit e54bd225c0de70b42c9be7bd8d1998068c30622b Author: skhdl <[email protected]> AuthorDate: Wed Oct 19 10:11:13 2022 +0400 WIP. --- examples/light_switch_ru.html | 468 ++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 468 insertions(+) diff --git a/examples/light_switch_ru.html b/examples/light_switch_ru.html new file mode 100644 index 0000000..80d8d44 --- /dev/null +++ b/examples/light_switch_ru.html @@ -0,0 +1,468 @@ +--- +active_crumb: Light Switch RU <code><sub>ex</sub></code> +layout: documentation +id: light_switch_ru +fa_icon: fa-cube +--- + +<!-- + Licensed to the Apache Software Foundation (ASF) under one or more + contributor license agreements. See the NOTICE file distributed with + this work for additional information regarding copyright ownership. + The ASF licenses this file to You under the Apache License, Version 2.0 + (the "License"); you may not use this file except in compliance with + the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. +--> + +<div class="col-md-8 second-column example"> + <section id="overview"> + <h2 class="section-title">Overview <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2> + <p> + This example provides a very simple Russian-language implementation of an NLI-powered light switch. You can say something like + "Выключи свет по всем доме" or "Включи свет в детской". + By modifying the intent callbacks to use, for example, HomeKit or Arduino-based controllers, you can provide the + actual light switching. 
+ </p> + <p> + <b>Complexity:</b> <span class="complexity-two-star"><i class="fas fa-square"></i> <i class="fas fa-square"></i> <i class="far fa-square"></i></span><br/> + <span class="ex-src">Source code: <a target="github" href="https://github.com/apache/incubator-nlpcraft/tree/master/nlpcraft-examples/lightswitch_ru">GitHub <i class="fab fa-fw fa-github"></i></a><br/></span> + <span class="ex-review-all">Review: <a target="github" href="https://github.com/apache/incubator-nlpcraft/tree/master/nlpcraft-examples">All Examples at GitHub <i class="fab fa-fw fa-github"></i></a></span> + </p> + </section> + <section id="new_project"> + <h2 class="section-title">Create New Project <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2> + <p> + You can create new Scala projects in many ways - we'll use SBT + to accomplish this task. Make sure that the <code>build.sbt</code> file has the following content: + </p> + <pre class="brush: js, highlight: [8, 9, 10]"> + ThisBuild / version := "0.1.0-SNAPSHOT" + ThisBuild / scalaVersion := "3.1.3" + lazy val root = (project in file(".")) + .settings( + name := "NLPCraft LightSwitch RU Example", + version := "{{site.latest_version}}", + libraryDependencies += "org.apache.nlpcraft" % "nlpcraft" % "{{site.latest_version}}", + libraryDependencies += "org.apache.lucene" % "lucene-analyzers-common" % "8.11.2", + libraryDependencies += "org.languagetool" % "languagetool-core" % "5.8", + libraryDependencies += "org.languagetool" % "language-ru" % "5.8", + libraryDependencies += "org.scalatest" %% "scalatest" % "3.2.14" % "test" + ) + </pre> + + <p> + <code>Lines 8, 9 and 10</code> add the libraries that support basic NLP operations for the Russian language. 
+ </p> + + <p><b>NOTE: </b>use the latest versions of Scala and ScalaTest.</p> + <p>Create the following files so that the resulting project structure looks like this:</p> + <ul> + <li><code>lightswitch_model_ru.yaml</code> - YAML configuration file that contains the model description.</li> + <li><code>LightSwitchRuModel.scala</code> - Scala class, model implementation.</li> + <li><code>NCRuSemanticEntityParser.scala</code> - Scala class, semantic entity parser, custom implementation for the Russian language.</li> + <li><code>NCRuLemmaPosTokenEnricher.scala</code> - Scala class, lemma and part-of-speech token enricher, custom implementation for the Russian language.</li> + <li><code>NCRuStopWordsTokenEnricher.scala</code> - Scala class, stop-words token enricher, custom implementation for the Russian language.</li> + <li><code>NCRuTokenParser.scala</code> - Scala class, token parser, custom implementation for the Russian language.</li> + <li><code>LightSwitchRuModelSpec.scala</code> - Scala test class that lets you test your model.</li> + </ul> + <pre class="brush: plain, highlight: [7, 10, 14, 17, 18, 20, 24]"> + | build.sbt + +--project + | build.properties + \--src + +--main + | +--resources + | | lightswitch_model_ru.yaml + | \--scala + | \--demo + | | LightSwitchRuModel.scala + | \--nlp + | +--entity + | | \--parser + | | NCRuSemanticEntityParser.scala + | \--token + | +--enricher + | | NCRuLemmaPosTokenEnricher.scala + | | NCRuStopWordsTokenEnricher.scala + | \--parser + | NCRuTokenParser.scala + \--test + \--scala + \--demo + LightSwitchRuModelSpec.scala + </pre> + </section> + <section id="model"> + <h2 class="section-title">Data Model <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2> + <p> + We are going to start by declaring the static part of our model in YAML, which we will later load using + <code>NCModelAdapter</code> in our Scala-based model implementation. 
+ Open the <code>src/main/resources/<b>lightswitch_model_ru.yaml</b></code> + file and replace its content with the following YAML: + </p> + <pre class="brush: js, highlight: [1, 8, 13, 21]"> + macros: + "<TURN_ON>" : "{включить|включать|врубить|врубать|запустить|запускать|зажигать|зажечь}" + "<TURN_OFF>" : "{погасить|загасить|гасить|выключить|выключать|вырубить|вырубать|отключить|отключать|убрать|убирать|приглушить|приглушать|стоп}" + "<ENTIRE_OPT>" : "{весь|все|всё|повсюду|вокруг|полностью|везде|_}" + "<LIGHT_OPT>" : "{это|лампа|бра|люстра|светильник|лампочка|лампа|освещение|свет|электричество|электрика|_}" + + elements: + - id: "ls:loc" + description: "Location of lights." + synonyms: + - "<ENTIRE_OPT> {здание|помещение|дом|кухня|детская|кабинет|гостиная|спальня|ванная|туалет|{большая|обеденная|ванная|детская|туалетная} комната}" + + - id: "ls:on" + groups: + - "act" + description: "Light switch ON action." + synonyms: + - "<LIGHT_OPT> <ENTIRE_OPT> <TURN_ON>" + - "<TURN_ON> <ENTIRE_OPT> <LIGHT_OPT>" + + - id: "ls:off" + groups: + - "act" + description: "Light switch OFF action." + synonyms: + - "<LIGHT_OPT> <ENTIRE_OPT> <TURN_OFF>" + - "<TURN_OFF> <ENTIRE_OPT> <LIGHT_OPT>" + - "без <ENTIRE_OPT> <LIGHT_OPT>" + </pre> + <p>There are a number of important points here:</p> + <ul> + <li> + <code>Line 1</code> defines several macros that are used later on throughout the model's elements + to shorten the synonym declarations. Note how macros combined with option groups + make the synonym declarations far shorter than manually listing all possible word permutations. + </li> + <li> + <code>Lines 8, 13, 21</code> define three model elements: the location of the light, and actions to turn + the light on and off. Action elements belong to the same group <code>act</code> which + will be used in our intent, defined in the <code>LightSwitchRuModel</code> class. Note that these model + elements are defined mostly through the macros we have defined above. 
+ + </li> + </ul> + <div class="bq info"> + <p><b>YAML vs. API</b></p> + <p> + As usual, this YAML-based static model definition is convenient but totally optional. All element definitions + can be provided programmatically inside the Scala <code>LightSwitchRuModel</code> model class as well. + </p> + </div> + </section> + <section id="code"> + <h2 class="section-title">Model Class <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2> + <p> + Open the <code>src/main/scala/demo/<b>LightSwitchRuModel.scala</b></code> file and replace its content with the following code: + </p> + <pre class="brush: scala, highlight: [11, 12, 13, 20, 21, 24, 25, 32]"> + package demo + + import com.google.gson.Gson + import org.apache.nlpcraft.* + import org.apache.nlpcraft.annotations.* + import demo.nlp.entity.parser.NCRuSemanticEntityParser + import demo.nlp.token.enricher.* + import demo.nlp.token.parser.NCRuTokenParser + import scala.jdk.CollectionConverters.* + + class LightSwitchRuModel extends NCModelAdapter( + NCModelConfig("nlpcraft.lightswitch.ru.ex", "LightSwitch Example Model RU", "1.0"), + new NCPipelineBuilder(). + withTokenParser(new NCRuTokenParser()). + withTokenEnricher(new NCRuLemmaPosTokenEnricher()). + withTokenEnricher(new NCRuStopWordsTokenEnricher()). + withEntityParser(new NCRuSemanticEntityParser("lightswitch_model_ru.yaml")). + build + ): + @NCIntent("intent=ls term(act)={has(ent_groups, 'act')} term(loc)={# == 'ls:loc'}*") + def onMatch( + ctx: NCContext, + im: NCIntentMatch, + @NCIntentTerm("act") actEnt: NCEntity, + @NCIntentTerm("loc") locEnts: List[NCEntity] + ): NCResult = + val action = if actEnt.getId == "ls:on" then "включить" else "выключить" + val locations = if locEnts.isEmpty then "весь дом" else locEnts.map(_.mkText).mkString(", ") + + // Add HomeKit, Arduino or other integration here. + // By default - just return a descriptive action string. 
+ NCResult(new Gson().toJson(Map("locations" -> locations, "action" -> action).asJava)) + </pre> + <p> + The intent callback logic is very simple - we return a descriptive confirmation message + (explaining which lights were changed). With the action and location detected, you can add + the actual light switching using HomeKit or Arduino devices. Let's review this implementation step by step: + </p> + <ul> + <li> + On <code>line 11</code> our class extends <code>NCModelAdapter</code>, which allows us to pass + the prepared configuration and pipeline into the model. + </li> + <li> + On <code>line 12</code> the model configuration is created, mostly with default parameters. + </li> + <li> + On <code>line 13</code> the pipeline is created from custom Russian-language components, which are described below. + <ul> + <li><code>NCRuTokenParser</code>. Token parser.</li> + <li><code>NCRuLemmaPosTokenEnricher</code>. Lemma and part-of-speech token enricher.</li> + <li><code>NCRuStopWordsTokenEnricher</code>. Stop-words token enricher.</li> + <li><code>NCRuSemanticEntityParser</code>. Semantic entity parser.</li> + </ul> + Note that <code>NCRuSemanticEntityParser</code> is based on the semantic model definition + described in the <code>lightswitch_model_ru.yaml</code> file. + </li> + <li> + <code>Lines 20 and 21</code> annotate the intent <code>ls</code> and its callback method <code>onMatch</code>. + Intent <code>ls</code> requires one action (an entity belonging to the group <code>act</code>) and an optional list of light locations + (zero or more entities with ID <code>ls:loc</code>) - when no location is given we assume the entire house. + </li> + <li> + <code>Lines 24 and 25</code> map terms from the detected intent to the formal parameters of the + <code>onMatch</code> method. + </li> + <li> + On <code>line 32</code> the intent callback simply returns a confirmation message. + </li> + </ul> + + <p> + Let's review each custom pipeline component. 
+ </p> + + <p> + Open the <code>src/main/scala/demo/nlp/token/parser/<b>NCRuTokenParser.scala</b></code> file and replace its content with the following code: + </p> + + <pre class="brush: scala, highlight: [19]"> + package demo.nlp.token.parser + + import org.apache.nlpcraft.* + import org.languagetool.tokenizers.WordTokenizer + import scala.jdk.CollectionConverters.* + + class NCRuTokenParser extends NCTokenParser: + private val tokenizer = new WordTokenizer + + override def tokenize(text: String): List[NCToken] = + val toks = collection.mutable.ArrayBuffer.empty[NCToken] + var sumLen = 0 + + for ((word, idx) <- tokenizer.tokenize(text).asScala.zipWithIndex) + val start = sumLen + val end = sumLen + word.length + + if word.strip.nonEmpty then + toks += new NCPropertyMapAdapter with NCToken: + override def getText: String = word + override def getIndex: Int = idx + override def getStartCharIndex: Int = start + override def getEndCharIndex: Int = end + + sumLen = end + + toks.toList + </pre> + <p> + <code>NCRuTokenParser</code> is a simple wrapper that implements <code>NCTokenParser</code> on top of the + open source <a href="https://languagetool.org">Language Tool</a> library. + On <code>line 19</code> the <code>NCToken</code> instances are created. 
+ </p> + + <p> + Open the <code>src/main/scala/demo/nlp/token/enricher/<b>NCRuLemmaPosTokenEnricher.scala</b></code> file and replace its content with the following code: + </p> + <pre class="brush: scala, highlight: [27, 28]"> + package demo.nlp.token.enricher + + import org.apache.nlpcraft.* + import org.languagetool.AnalyzedToken + import org.languagetool.tagging.ru.RussianTagger + import scala.jdk.CollectionConverters.* + + class NCRuLemmaPosTokenEnricher extends NCTokenEnricher: + private def nvl(v: String, dflt : => String): String = if v != null then v else dflt + + override def enrich(req: NCRequest, cfg: NCModelConfig, toks: List[NCToken]): Unit = + val tags = RussianTagger.INSTANCE.tag(toks.map(_.getText).asJava).asScala + + require(toks.size == tags.size) + + toks.zip(tags).foreach { case (tok, tag) => + val readings = tag.getReadings.asScala + + val (lemma, pos) = readings.size match + // No data. Lemma is word as is, POS is undefined. + case 0 => (tok.getText, "") + // Takes first. Other variants ignored. + case _ => + val aTok: AnalyzedToken = readings.head + (nvl(aTok.getLemma, tok.getText), nvl(aTok.getPOSTag, "")) + + tok.put("pos", pos) + tok.put("lemma", lemma) + + () // Otherwise NPE. + } + </pre> + <p> + <code>NCRuLemmaPosTokenEnricher</code> is a lemma and part-of-speech token enricher based on the + open source <a href="https://languagetool.org">Language Tool</a> library. + On <code>lines 27 and 28</code> the tokens are enriched with <code>pos</code> and <code>lemma</code> data. 
+ </p> + + <p> + Open the <code>src/main/scala/demo/nlp/token/enricher/<b>NCRuStopWordsTokenEnricher.scala</b></code> file and replace its content with the following code: + </p> + + <pre class="brush: scala, highlight: [17]"> + package demo.nlp.token.enricher + + import org.apache.lucene.analysis.ru.RussianAnalyzer + import org.apache.nlpcraft.* + + class NCRuStopWordsTokenEnricher extends NCTokenEnricher: + private val stops = RussianAnalyzer.getDefaultStopSet + + private def getPos(t: NCToken): String = t.get("pos").getOrElse(throw new NCException("POS not found in token.")) + private def getLemma(t: NCToken): String = t.get("lemma").getOrElse(throw new NCException("Lemma not found in token.")) + + override def enrich(req: NCRequest, cfg: NCModelConfig, toks: List[NCToken]): Unit = + for (t <- toks) + val lemma = getLemma(t) + lazy val pos = getPos(t) + + t.put( + "stopword", + lemma.length == 1 && !Character.isLetter(lemma.head) && !Character.isDigit(lemma.head) || + stops.contains(lemma.toLowerCase) || + pos.startsWith("PARTICLE") || + pos.startsWith("INTERJECTION") || + pos.startsWith("PREP") + ) + </pre> + <p> + <code>NCRuStopWordsTokenEnricher</code> is a stop-words token enricher based on the + open source <a href="https://lucene.apache.org/">Apache Lucene</a> library. + On <code>line 17</code> the tokens are enriched with the <code>stopword</code> flag. 
+ </p> + + <p> + Open the <code>src/main/scala/demo/nlp/entity/parser/<b>NCRuSemanticEntityParser.scala</b></code> file and replace its content with the following code: + </p> + + <pre class="brush: scala, highlight: [8, 12]"> + package demo.nlp.entity.parser + + import opennlp.tools.stemmer.snowball.SnowballStemmer + import demo.nlp.token.parser.NCRuTokenParser + import org.apache.nlpcraft.nlp.parsers.* + + class NCRuSemanticEntityParser(src: String) extends NCSemanticEntityParser( + new NCSemanticStemmer: + private val stemmer = new SnowballStemmer(SnowballStemmer.ALGORITHM.RUSSIAN) + override def stem(txt: String): String = stemmer.synchronized { stemmer.stem(txt.toLowerCase).toString } + , + new NCRuTokenParser(), + mdlSrcOpt = Option(src) + ) + </pre> + <p> + <code>NCRuSemanticEntityParser</code> extends <code>NCSemanticEntityParser</code>. + It uses the stemmer implementation from <a href="https://opennlp.apache.org/">Apache OpenNLP</a> + on <code>line 8</code> and the previously described <code>NCRuTokenParser</code> token parser implementation on <code>line 12</code>. 
+ </p> + </section> + + <section id="testing"> + <h2 class="section-title">Testing <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2> + <p> + The test defined in <code>LightSwitchRuModelSpec</code> checks that all input test sentences are + processed correctly and trigger the expected intent <code>ls</code>: + </p> + <pre class="brush: scala, highlight: [9, 11]"> + package demo + + import org.apache.nlpcraft.* + import org.scalatest.funsuite.AnyFunSuite + import scala.util.Using + + class LightSwitchRuModelSpec extends AnyFunSuite: + test("test") { + Using.resource(new NCModelClient(new LightSwitchRuModel)) { client => + def check(txt: String): Unit = + require(client.debugAsk(txt, "userId", true).getIntentId == "ls") + + check("Выключи свет по всем доме") + check("Выруби электричество!") + check("Включи свет в детской") + check("Включай повсюду освещение") + check("Включайте лампы в детской комнате") + check("Свет на кухне, пожалуйста, приглуши") + check("Нельзя ли повсюду выключить свет?") + check("Пожалуйста без света") + check("Отключи электричество в ванной") + check("Выключи, пожалуйста, тут всюду свет") + check("Выключай все!") + check("Свет пожалуйста везде включи") + check("Зажги лампу на кухне") + } + } + </pre> + <ul> + <li> + On <code>line 9</code> the client for our model is created. + </li> + <li> + On <code>line 11</code> a special method <code>debugAsk</code> is called. + It lets you check the winning intent and its callback parameters without actually + invoking the intent callback. + </li> + <li> + <code>Lines 13-25</code> define all the test input sentences, each of which should + trigger the <code>ls</code> intent. + </li> + </ul> + <p> + You can run this test via the SBT task <code>executeTests</code> or from an IDE. + </p> + <pre class="brush: scala, highlight: []"> + PS C:\apache\incubator-nlpcraft-examples\lightswitch_ru> sbt executeTests + </pre> + </section> + <section> + <h2 class="section-title">Done! 
👌 <a href="#"><i class="top-link fas fa-fw fa-angle-double-up"></i></a></h2> + <p> + You've created the light switch data model and tested it. + </p> + </section> +</div> +<div class="col-md-2 third-column"> + <ul class="side-nav"> + <li class="side-nav-title">On This Page</li> + <li><a href="#overview">Overview</a></li> + <li><a href="#new_project">New Project</a></li> + <li><a href="#model">Data Model</a></li> + <li><a href="#code">Model Class</a></li> + <li><a href="#testing">Testing</a></li> + {% include quick-links.html %} + </ul> +</div> + + + + + +
