GitHub user chenlica closed a discussion: Operator property specification (from old wiki)
From the page https://github.com/apache/texera/wiki/Operator-property-specification (may be dangling)

=====

This document describes the properties for each operator in Texera. It serves as the communication API of the operators and query plans between `Texera-GUI` and `Texera-Web`.

Author: Zuozhi Wang, Kishore Narendran

```json
All operators mentioned below share one required property, "attributes", and two optional properties, "limit" and "offset".
{
    "attributes" : "attr1_name, attr2_name, attr3_name",
    "limit" : "10 (this property is optional)",
    "offset" : "5 (this property is optional)"
}

Matcher operators:
{
    "operator_type" : "KeywordMatcher",
    "keyword" : "a_keyword",
    "matching_type" : "one of: [conjunction, phrase, substring]"
}
{
    "operator_type" : "DictionaryMatcher",
    "dictionary" : "dict_entry_1, dict_entry_2, dict_entry_3",
    "matching_type" : "one of: [conjunction, phrase, substring]"
}
{
    "operator_type" : "RegexMatcher",
    "regex" : "a_regex"
}
{
    "operator_type" : "FuzzyTokenMatcher",
    "query" : "a query of fuzzy token matcher",
    "threshold_ratio" : "0.8"
}
{
    "operator_type" : "NlpExtractor",
    "nlp_type" : "one of: [Noun, Verb, Adjective, Adverb, NE_ALL, Number, Location, Person, Organization, Money, Percent, Date, Time] (case insensitive)"
}
{
    "operator_type" : "Join",
    "inner_attribute" : "inner_attr_name",
    "outer_attribute" : "outer_attr_name",
    "predicate_type" : "one of: [CharacterDistance, SimilarityJoin]",
    "threshold" : "10"
}
Note that Join doesn't have "attributes"; instead, it has "inner_attribute" and "outer_attribute".
{
    "operator_type" : "Projection",
    "attributes" : "attr_1_name, attr_2_name"
}

Source Operators:
Keyword, Regex, FuzzyToken, and Dictionary each have a corresponding source operator, which adds another property, "data_source".
{
    "operator_type" : "KeywordSource",
    "data_source" : "data_source_name",
    "keyword" : "a_keyword",
    "matching_type" : "one of: [conjunction, phrase, substring]"
}
{
    "operator_type" : "DictionarySource",
    "data_source" : "data_source_name",
    "dictionary" : "dict_entry_1, dict_entry_2, dict_entry_3",
    "matching_type" : "one of: [conjunction, phrase, substring]"
}
{
    "operator_type" : "RegexSource",
    "data_source" : "data_source_name",
    "regex" : "a_regex"
}
{
    "operator_type" : "FuzzyTokenSource",
    "data_source" : "data_source_name",
    "query" : "a query of fuzzy token matcher",
    "threshold_ratio" : "0.8"
}

Sink Operators:
{
    "operator_type" : "FileSink",
    "file_path" : "file_path"
}
{
    "operator_type" : "IndexSink",
    "index_path" : "index_path",
    "index_name" : "name_of_index"
}
{
    "operator_type" : "TupleStreamSink"
}

The JSON format representing the operator graph will be:
{
    "operators" : [
        {
            "operator_id" : "operator_1_id",
            "operator properties as mentioned above" : "some properties"
        },
        {
            "operator_id" : "operator_2_id",
            "operator properties as mentioned above" : "some properties"
        }
    ],
    "links" : [
        {
            "from" : "operator_1_id",
            "to" : "operator_2_id"
        }
    ]
}
```

GitHub link: https://github.com/apache/texera/discussions/3977
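To illustrate how the operator graph format above fits together, here is a minimal Python sketch that assembles a two-operator plan (a `KeywordSource` feeding a `TupleStreamSink`) and serializes it to JSON. The helper names `make_operator` and `build_query_plan` are hypothetical and not part of Texera. The check that every link points at a declared `operator_id` is an assumption about what a consumer would want, not documented behavior. The `data_source` spelling follows most of the examples in the specification.

```python
import json


def make_operator(operator_id, operator_type, **properties):
    """Return one operator entry: its ID, its type, and its type-specific properties."""
    return {"operator_id": operator_id, "operator_type": operator_type, **properties}


def build_query_plan(operators, links):
    """Assemble the operator graph, rejecting links that name an undeclared operator.

    `links` is a list of (from_id, to_id) pairs. This validation is an
    illustrative assumption, not something the specification requires.
    """
    known_ids = {op["operator_id"] for op in operators}
    for from_id, to_id in links:
        for endpoint in (from_id, to_id):
            if endpoint not in known_ids:
                raise ValueError(f"link refers to unknown operator_id: {endpoint}")
    return {
        "operators": operators,
        "links": [{"from": from_id, "to": to_id} for from_id, to_id in links],
    }


# Property values are the placeholder strings used in the specification.
plan = build_query_plan(
    operators=[
        make_operator(
            "operator_1_id",
            "KeywordSource",
            data_source="data_source_name",
            keyword="a_keyword",
            matching_type="phrase",
            attributes="attr1_name, attr2_name",
        ),
        make_operator("operator_2_id", "TupleStreamSink"),
    ],
    links=[("operator_1_id", "operator_2_id")],
)

print(json.dumps(plan, indent=4))
```

Property values are kept as strings here because every example in the specification quotes them, including numeric ones such as `"limit"` and `"threshold"`.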
