GitHub user chenlica created a discussion: Operator property specification 
(from old wiki)

>From the page 
>https://github.com/apache/texera/wiki/Operator-property-specification (may be 
>dangling)

=====

This document describes the properties for each operator in Texera. It serves 
as the communication API of the operators and query plans between `Texera-GUI` 
and `Texera-Web`.

Authors: Zuozhi Wang, Kishore Narendran

```json

All operators listed below share one required property, "attributes", and two 
optional properties, "limit" and "offset".

{
        "attributes" : "attr1_name, attr2_name, attr3_name",
        "limit" : "10 (this property is optional)",
        "offset" : "5 (this property is optional)"
}

Matcher operators:
{
        "operator_type" : "KeywordMatcher",
        "keyword" : "a_keyword",
        "matching_type" : "one of: [conjunction, phrase, substring]"
}

{
        "operator_type" : "DictionaryMatcher",
        "dictionary" : "dict_entry_1, dict_entry_2, dict_entry_3",
        "matching_type" : "one of: [conjunction, phrase, substring]"
}

{
        "operator_type" : "RegexMatcher",
        "regex" : "a_regex"
}

{
        "operator_type" : "FuzzyTokenMatcher",
        "query" : "a query of fuzzy token matcher",
        "threshold_ratio" : "0.8"
}

{
        "operator_type" : "NlpExtractor",
        "nlp_type" : "one of: [Noun, Verb, Adjective, Adverb, NE_ALL, Number, Location, Person, Organization, Money, Percent, Date, Time] (case insensitive)"
}


{
        "operator_type" : "Join",
        "inner_attribute" : "inner_attr_name",
        "outer_attribute" : "outer_attr_name",
        "predicate_type" : "one of: [CharacterDistance, SimilarityJoin]",
        "threshold" : "10"
}
Note that Join does not have "attributes"; instead, it has "inner_attribute" 
and "outer_attribute".

{
        "operator_type" : "Projection",
        "attributes" : "attr_1_name, attr_2_name"
}

Source Operators:
Keyword, Regex, FuzzyToken, and Dictionary each have a corresponding source 
operator, which adds another property, "data_source".

{
        "operator_type" : "KeywordSource",
        "data_source" : "data_source_name",
        "keyword" : "a_keyword",
        "matching_type" : "one of: [conjunction, phrase, substring]"
}
{
        "operator_type" : "DictionarySource",
        "data_source" : "data_source_name",
        "dictionary" : "dict_entry_1, dict_entry_2, dict_entry_3",
        "matching_type" : "one of: [conjunction, phrase, substring]"
}

{
        "operator_type" : "RegexSource",
        "data_source" : "data_source_name",
        "regex" : "a_regex"
}

{
        "operator_type" : "FuzzyTokenSource",
        "data_source" : "data_source_name",
        "query" : "a query of fuzzy token matcher",
        "threshold_ratio" : "0.8"
}

Sink Operators:
{
        "operator_type" : "FileSink",
        "file_path" : "file_path"
}

{
        "operator_type" : "IndexSink",
        "index_path" : "index_path",
        "index_name" : "name_of_index"
}

{
        "operator_type" : "TupleStreamSink"
}


The JSON format representing the operator graph will be:
{
        "operators" : [
        {
                "operator_id" : "operator_1_id",
                "operator properties as mentioned above" : "some properties"
        },
        {
                "operator_id" : "operator_2_id",
                "operator properties as mentioned above" : "some properties"
        }
        ],
        "links" : [
        {
                "from" : "operator_1_id",
                "to" : "operator_2_id"
        }
        ]
}

```
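
To make the operator graph format above concrete, here is a minimal Python sketch that assembles a three-operator plan and serializes it to JSON. It is an illustration only, not code from Texera. The helper names (`make_operator`, `make_plan`), the operator IDs, the data source name `tweets`, and the attribute name `text` are all hypothetical. The sketch also assumes that the common properties ("attributes", "limit", "offset") sit at the same level as the operator-specific ones, that property values are strings as in the examples above, and that a sink can be written without "attributes" as in the sink examples.

```python
import json


def make_operator(operator_id, operator_type, **properties):
    """Return one entry of the "operators" array (hypothetical helper)."""
    op = {"operator_id": operator_id, "operator_type": operator_type}
    op.update(properties)
    return op


def make_plan(operators, links):
    """Assemble the operator graph; reject links to unknown operator IDs."""
    ids = {op["operator_id"] for op in operators}
    for src, dst in links:
        if src not in ids or dst not in ids:
            raise ValueError(
                f"link {src!r} -> {dst!r} references an unknown operator_id"
            )
    return {
        "operators": operators,
        "links": [{"from": src, "to": dst} for src, dst in links],
    }


# Hypothetical plan: KeywordSource -> RegexMatcher -> TupleStreamSink.
plan = make_plan(
    operators=[
        make_operator(
            "source_1", "KeywordSource",
            data_source="tweets", keyword="texera",
            matching_type="phrase", attributes="text", limit="10",
        ),
        make_operator("regex_1", "RegexMatcher", regex="[a-z]+", attributes="text"),
        make_operator("sink_1", "TupleStreamSink"),
    ],
    links=[("source_1", "regex_1"), ("regex_1", "sink_1")],
)

print(json.dumps(plan, indent=4))
```

Checking the links while the plan is built catches a dangling `from` or `to` before the plan leaves the client; whether the receiving side performs the same check is not stated in this document.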

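The per-operator property lists above can also be turned into a small lookup table for checking an operator object before it is sent. The following Python sketch is one possible way to do that and is not part of Texera. It assumes that every operator-specific property shown in the examples is required (the document only marks "limit" and "offset" as optional), that the property is spelled `data_source` as in most of the source examples, and it does not check the common "attributes" property. The names `REQUIRED`, `MATCHING_TYPES`, and `validate_operator` are hypothetical.

```python
MATCHING_TYPES = {"conjunction", "phrase", "substring"}

# Operator-specific properties taken from the examples in this document.
REQUIRED = {
    "KeywordMatcher": {"keyword", "matching_type"},
    "DictionaryMatcher": {"dictionary", "matching_type"},
    "RegexMatcher": {"regex"},
    "FuzzyTokenMatcher": {"query", "threshold_ratio"},
    "NlpExtractor": {"nlp_type"},
    "Join": {"inner_attribute", "outer_attribute", "predicate_type", "threshold"},
    "Projection": {"attributes"},
    "FileSink": {"file_path"},
    "IndexSink": {"index_path", "index_name"},
    "TupleStreamSink": set(),
}

# Each source operator takes its matcher's properties plus "data_source".
for name in ("Keyword", "Dictionary", "Regex", "FuzzyToken"):
    REQUIRED[name + "Source"] = REQUIRED[name + "Matcher"] | {"data_source"}


def validate_operator(op):
    """Return a list of problems found in one operator object (empty if none)."""
    op_type = op.get("operator_type")
    if op_type not in REQUIRED:
        return [f"unknown operator_type: {op_type!r}"]
    missing = sorted(REQUIRED[op_type] - set(op))
    problems = [f"missing property: {name}" for name in missing]
    if "matching_type" in op and op["matching_type"] not in MATCHING_TYPES:
        problems.append(f"invalid matching_type: {op['matching_type']!r}")
    return problems


print(validate_operator({"operator_type": "RegexMatcher", "regex": "a+"}))
print(validate_operator({"operator_type": "KeywordSource", "keyword": "k"}))
```

Returning a list of problems rather than raising on the first one lets a caller report every issue with an operator at once.
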
GitHub link: https://github.com/apache/texera/discussions/3977
