[incubator-nlpcraft] branch master updated: Update NCPipeline.java

aradzinski Tue, 29 Mar 2022 15:27:50 -0700

This is an automated email from the ASF dual-hosted git repository.

aradzinski pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-nlpcraft.git



The following commit(s) were added to refs/heads/master by this push:
     new 74b7997  Update NCPipeline.java
74b7997 is described below

commit 74b7997e51172d171593dcfde90ca0dfdc90fe6f
Author: Aaron Radzinski <[email protected]>
AuthorDate: Tue Mar 29 15:27:40 2022 -0700

    Update NCPipeline.java
---
 .../main/scala/org/apache/nlpcraft/NCPipeline.java | 115 ++++++++++++++++-----
 1 file changed, 92 insertions(+), 23 deletions(-)

diff --git a/nlpcraft/src/main/scala/org/apache/nlpcraft/NCPipeline.java b/nlpcraft/src/main/scala/org/apache/nlpcraft/NCPipeline.java
index 867d20e..f25dd95 100644
--- a/nlpcraft/src/main/scala/org/apache/nlpcraft/NCPipeline.java
+++ b/nlpcraft/src/main/scala/org/apache/nlpcraft/NCPipeline.java
@@ -28,81 +28,150 @@ import java.util.Optional;
  * pipeline and produce the list of {@link NCEntity entities} at the end of the pipeline.
  * Schematically the pipeline looks like this:
  * <pre>
- *                                   +----------+        +-----------+
- * *=========*    +---------+    +---+-------+  |    +---+-------+   |
- * :  Text   : -> |  Token  | -> | Token     |  | -> | Token      |  | ----.
- * :  Input  :    |  Parser |    | Enrichers |--+    | Validators |--+      \
- * *=========*    +---------+    +-----------+       +------------+          \
- *                                                                            }
- *                    +-----------+        +----------+        +--------+    /
- * *=========*    +---+--------+  |    +---+-------+  |    +---+-----+  |   /
- * :  Entity : <- | Entity     |  | <- | Entity    |  | <- | Entity  |  | <-
- * :  List   :    | Validators |--+    | Enrichers |--+    | Parsers |--+
- * *=========*    +------------+       +-----------+       +---------+
+ *                                                ,---------.        ,----------.        ,-------.
+ *   o/         *=========*    ,---------.    ,---'-------. |    ,---'--------. |    ,---'-----. |
+ *  /|     ->   :  Text   : -> |  Token  | -> | Token     | | -> | Token      | | -> | Entity  | |
+ *  / \         :  Input  :    |  Parser |    | Enrichers |-'    | Validators |-'    | Parsers |-'
+ *              *=========*    `---------'    `-----------'      `------------'      `---------'
+ *                                                                                         |
+ *                                                   ,----------.        ,---------.       |
+ *              *============*    ,---------.    ,---'--------. |    ,---'-------. |       |
+ * Intent   <-  :  Entity    : <- | Variant | <- | Entity     | | <- | Entity    | | <-----'
+ * Matching     :  Variants  :    | Filter  |    | Validators |-'    | Enrichers |-'
+ *              *============*    `---------'    `------------'      `-----------'
  * </pre>
  * <p>
  * Pipeline has the following components:
  * <ul>
  *     <li>
- *         {@link NCTokenParser} is responsible for taking the input text and tokenize it into a list of
- *         {@link NCToken}. This process is called tokenization, i.e. the process of demarcating and
- *         classifying sections of a string of input characters. There's only one token parser for the pipeline.
+ *         <p>
+ *              {@link NCTokenParser} is responsible for taking the input text and tokenizing it into a list of
+ *              {@link NCToken} instances. This process is called tokenization, i.e. the process of demarcating and
+ *              classifying sections of a string of input characters. There is exactly one token parser in the pipeline,
+ *              and it is a mandatory part of the pipeline.
+ *         </p>
  *     </li>
  *     <li>
- *         After the initial list of token is
+ *         <p>
+ *              After the initial list of tokens is created, zero or more {@link NCTokenEnricher}s are called to enrich
+ *              each token. Enrichment consists of adding properties to an {@link NCToken} instance. Examples of enrichers
+ *              include stopword detection, geo-location detection, POS tagging, etc. Token enrichers are optional, and
+ *              by default the list of token enrichers is empty.
+ *         </p>
+ *     </li>
+ *     <li>
+ *         <p>
+ *              After all tokens are enriched, the {@link NCTokenValidator}s are called. Token validators provide an opportunity
+ *              to reject the input request at an early stage of token processing. Examples of token validation
+ *              include curse word filtering, privacy checks, adult content blocking, etc. Token validators are optional,
+ *              and by default the list of token validators is empty.
+ *         </p>
+ *     </li>
+ *     <li>
+ *         <p>
+ *              Once tokens are parsed, enriched and validated, they are passed into one or more {@link NCEntityParser}s.
+ *              An entity parser is responsible for taking a list of tokens and converting it into a list of entities, where
+ *              an entity typically has a consistent semantic meaning and usually denotes a real-world object, such as
+ *              a person, location, number, date and time, organization, product, etc. Such objects can be
+ *              abstract or have a physical existence.
+ *         </p>
+ *         <p>
+ *              At least one entity parser must be defined in the pipeline. If multiple parsers are defined, their collective
+ *              output is combined for further processing. Note that it is possible, and in many cases necessary, for a single
+ *              list of tokens to be converted into a list of entities in more than one way; each such alternative is called a variant ({@link NCVariant}).
+ *              Having multiple entity parsers allows this logic to be compartmentalized.
+ *         </p>
+ *     </li>
+ *     <li>
+ *         <p>
+ *              Just like with tokens, once the entity list (or lists) is obtained, it goes through the {@link NCEntityEnricher}s.
+ *              Entity enrichment consists of adding properties to an {@link NCEntity} instance. Entity enrichers are optional,
+ *              and by default the list of entity enrichers is empty. Entity enrichment is always application-specific
+ *              since it deals with application-specific entities; examples include adding access tokens, special
+ *              markers, etc.
+ *         </p>
+ *     </li>
+ *     <li>
+ *         <p>
+ *              After entity enrichment is done, the list(s) of entities go through the {@link NCEntityValidator}s. Just like
+ *              token validators, entity validators allow the input request to be rejected at the level of entity processing.
+ *              Entity validators are optional, and by default the list of entity validators is empty. Examples of entity
+ *              validators include security checks, authentication and authorization, ACL checks, etc.
+ *         </p>
+ *     </li>
+ *     <li>
+ *         <p>
+ *              Finally, there is an optional filter for {@link NCVariant} instances that is applied before they reach intent matching.
+ *              This filter allows unnecessary (or spurious) parsing variants to be removed based on application-specific logic.
+ *              Note that the number of parsing variants directly affects the overall performance of intent matching.
+ *         </p>
  *     </li>
  * </ul>
  *
- *
+ * @see NCEntity
+ * @see NCToken
+ * @see NCTokenParser
+ * @see NCTokenEnricher
+ * @see NCTokenValidator
+ * @see NCEntityParser
+ * @see NCEntityEnricher
+ * @see NCEntityValidator
  */
 public interface NCPipeline {
     /**
+     * Gets mandatory token parser.
      *
-     * @return
+     * @return Token parser.
      */
     NCTokenParser getTokenParser();
 
     /**
+     * Gets the list of entity parsers. At least one entity parser is required.
      *
-     * @return
+     * @return List of entity parsers. The list should contain at least one entity parser.
      */
     List<NCEntityParser> getEntityParsers();
 
     /**
+     * Gets optional list of token enrichers.
      *
-     * @return
+     * @return Optional list of token enrichers. Can be empty but never {@code null}.
      */
     default List<NCTokenEnricher> getTokenEnrichers() {
         return Collections.emptyList();
     }
 
     /**
+     * Gets optional list of entity enrichers.
      *
-     * @return
+     * @return Optional list of entity enrichers. Can be empty but never {@code null}.
      */
     default List<NCEntityEnricher> getEntityEnrichers() {
         return Collections.emptyList();
     }
 
     /**
+     * Gets optional list of token validators.
      *
-     * @return
+     * @return Optional list of token validators. Can be empty but never {@code null}.
      */
     default List<NCTokenValidator> getTokenValidators() {
         return Collections.emptyList();
     }
 
     /**
+     * Gets optional list of entity validators.
      *
-     * @return
+     * @return Optional list of entity validators. Can be empty but never {@code null}.
      */
     default List<NCEntityValidator> getEntityValidators() {
         return Collections.emptyList();
     }
 
     /**
+     * Gets optional variant filter.
      *
-     * @return
+     * @return Optional variant filter.
      */
     default Optional<NCVariantFilter> getVariantFilter() {
         return Optional.empty();
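
The Javadoc in this diff fixes the stage order and which pipeline parts are mandatory. The following is a minimal, self-contained Java sketch of that flow under stated assumptions: Token, Entity, Pipeline, run and demo are hypothetical stand-ins, not NLPCraft's real NCTokenParser, NCEntityParser, NCVariantFilter, etc. signatures (this commit does not show those). The sketch also assumes that validators reject a request by throwing an exception, and that the variants produced by multiple entity parsers are simply concatenated.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.function.Consumer;
import java.util.function.Function;
import java.util.function.UnaryOperator;

// Hypothetical sketch: none of these types are the real NLPCraft API.
public class PipelineSketch {
    /** Hypothetical token: surface text plus properties added by token enrichers. */
    public static final class Token {
        public final String text;
        public final Map<String, Object> props = new HashMap<>();
        public Token(String text) { this.text = text; }
    }

    /** Hypothetical entity: an ID plus properties added by entity enrichers. */
    public static final class Entity {
        public final String id;
        public final Map<String, Object> props = new HashMap<>();
        public Entity(String id) { this.id = id; }
    }

    /** Mirrors the NCPipeline shape: token parser and entity parsers are mandatory, the rest default to empty. */
    public interface Pipeline {
        Function<String, List<Token>> getTokenParser();
        // Each entity parser may return several alternative entity lists (i.e. variants).
        List<Function<List<Token>, List<List<Entity>>>> getEntityParsers();
        default List<Consumer<List<Token>>> getTokenEnrichers() { return Collections.emptyList(); }
        default List<Consumer<List<Token>>> getTokenValidators() { return Collections.emptyList(); }
        default List<Consumer<List<Entity>>> getEntityEnrichers() { return Collections.emptyList(); }
        default List<Consumer<List<Entity>>> getEntityValidators() { return Collections.emptyList(); }
        default Optional<UnaryOperator<List<List<Entity>>>> getVariantFilter() { return Optional.empty(); }
    }

    /** Runs the stages in the documented order; validators reject the request by throwing. */
    public static List<List<Entity>> run(Pipeline p, String text) {
        List<Token> toks = p.getTokenParser().apply(text);
        p.getTokenEnrichers().forEach(e -> e.accept(toks));
        p.getTokenValidators().forEach(v -> v.accept(toks));

        if (p.getEntityParsers().isEmpty())
            throw new IllegalStateException("At least one entity parser is required.");

        // Assumption: the variants produced by all entity parsers are simply concatenated.
        List<List<Entity>> variants = new ArrayList<>();
        for (Function<List<Token>, List<List<Entity>>> parser : p.getEntityParsers())
            variants.addAll(parser.apply(toks));

        for (List<Entity> variant : variants) {
            p.getEntityEnrichers().forEach(e -> e.accept(variant));
            p.getEntityValidators().forEach(v -> v.accept(variant));
        }
        return p.getVariantFilter().map(f -> f.apply(variants)).orElse(variants);
    }

    /** Toy pipeline: "paris" is ambiguous, so the entity parser emits two variants. */
    public static Pipeline demo() {
        return new Pipeline() {
            public Function<String, List<Token>> getTokenParser() {
                return s -> { // Naive whitespace tokenizer.
                    List<Token> out = new ArrayList<>();
                    for (String w : s.trim().toLowerCase(java.util.Locale.ROOT).split("\\s+")) out.add(new Token(w));
                    return out;
                };
            }
            public List<Function<List<Token>, List<List<Entity>>>> getEntityParsers() {
                return Collections.singletonList(toks -> {
                    List<Entity> asCity = new ArrayList<>(), asPerson = new ArrayList<>();
                    for (Token t : toks) {
                        if (Boolean.TRUE.equals(t.props.get("stopword"))) continue; // Skip stopwords.
                        boolean amb = t.text.equals("paris");
                        asCity.add(new Entity(amb ? "city" : t.text));
                        asPerson.add(new Entity(amb ? "person" : t.text));
                    }
                    return Arrays.asList(asCity, asPerson);
                });
            }
            public List<Consumer<List<Token>>> getTokenEnrichers() { // Tiny hard-coded stopword list.
                return Collections.singletonList(toks -> toks.forEach(t -> t.props.put("stopword", t.text.equals("in"))));
            }
            public List<Consumer<List<Token>>> getTokenValidators() { // Rejects a placeholder "bad" word.
                return Collections.singletonList(toks -> toks.forEach(t -> {
                    if (t.text.equals("badword")) throw new IllegalArgumentException("Rejected: " + t.text);
                }));
            }
            public Optional<UnaryOperator<List<List<Entity>>>> getVariantFilter() { // Drops variants with a "person".
                return Optional.of(vs -> {
                    List<List<Entity>> kept = new ArrayList<>();
                    for (List<Entity> v : vs) if (v.stream().noneMatch(e -> e.id.equals("person"))) kept.add(v);
                    return kept;
                });
            }
        };
    }
}
```

Calling run(demo(), "Weather in Paris") should yield a single variant with entity IDs weather and city: the ambiguous "paris" produces a second, person-based variant that the toy variant filter removes, while "in" is skipped as a stopword. As with the default methods on NCPipeline, only the token parser and the entity parsers have to be supplied; every other stage defaults to an empty list or Optional.empty().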
