Hi Jark,

Thank you for your e-mail. I agree, let's engage all interested parties in this 
discussion. I'm sending this e-mail to both the Flink and Calcite dev mailing 
lists.

I'll repeat the proposal here to present it to the Calcite community.

I would like to propose extending the existing Flink SQL MATCH_RECOGNIZE 
syntax to cover the absence of an event. Such an extension would help our 
company solve a business case that involves handling timed-out patterns. An 
example from the Flink training exercises is the task of identifying taxi 
rides with a START event that is not followed by an END event within two 
hours. Currently, this task can be solved with CEP and a timeout handler. 
However, as far as I know, it cannot be expressed in Flink SQL syntax.
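
To make the intended semantics concrete, here is a minimal sketch in plain 
Python (not Flink or CEP code) of the taxi ride case. The function name 
timed_out_rides and the (ride_id, kind, timestamp) tuple layout are 
assumptions made for illustration only, and the timestamp of each incoming 
event is used as a stand-in for the watermark.

```python
from datetime import datetime, timedelta


def timed_out_rides(events, timeout=timedelta(hours=2)):
    """Return ids of rides whose START is not followed by an END within `timeout`.

    `events` is an iterable of (ride_id, kind, timestamp) tuples sorted by
    timestamp (hypothetical layout). Event time only advances when a new
    event arrives, so a ride is reported once a later event shows that its
    deadline has passed.
    """
    open_starts = {}  # ride_id -> timestamp of the START event
    timed_out = []
    for ride_id, kind, ts in events:
        # Expire every open START whose deadline has passed at time `ts`.
        for rid, start in list(open_starts.items()):
            if ts - start > timeout:
                timed_out.append(rid)
                del open_starts[rid]
        if kind == "START":
            open_starts[ride_id] = ts
        elif kind == "END":
            # Either the ride finished in time, or it was already reported.
            open_starts.pop(ride_id, None)
    return timed_out


t0 = datetime(2020, 9, 18, 10, 0)
events = [
    (1, "START", t0),
    (2, "START", t0 + timedelta(minutes=10)),
    (1, "END", t0 + timedelta(minutes=30)),
    (3, "START", t0 + timedelta(hours=1)),
    (3, "END", t0 + timedelta(hours=3, minutes=30)),
]
print(timed_out_rides(events))
```

Ride 1 ends in time, ride 2 never ends, and ride 3 ends too late, so only 
rides 2 and 3 are reported.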

I can think of two ways to incorporate such a feature into the existing 
MATCH_RECOGNIZE syntax:
- In analogy to CEP, a keyword could be added that determines whether timed-out 
matches are dropped altogether or made available through either a side output 
or the main output. The SQL usage could be similar to the current WITHIN 
clause, e.g. "PATTERN (A B C) TIMEOUT INTERVAL '30' SECOND" would output 
partially matched patterns 30 seconds after the A event appears.

- Add the possibility to define the absence of an event inside the pattern 
definition. For example, "PATTERN (A B !C) WITHIN INTERVAL '30' SECOND" would 
output partial matches containing the A and B events 30 seconds after the A 
event appears.
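
The output I have in mind for both variants can be sketched in plain Python 
as follows. This is only an illustration of the semantics under simplifying 
assumptions (a single partition, strict contiguity, one active partial match 
at a time, timestamps in seconds); the function name match_with_timeout is 
hypothetical and none of this is Flink or Calcite code.

```python
def match_with_timeout(events, pattern=("A", "B", "C"), timeout=30):
    """Return (complete_matches, timed_out_partial_matches).

    `events` is a list of (name, ts_seconds) tuples sorted by ts. A partial
    match times out once an event arrives more than `timeout` seconds after
    the first event of that partial match.
    """
    matches, timed_out = [], []
    partial = []  # events matched so far for the current attempt
    for name, ts in events:
        # Emit the partial match if its time window has elapsed.
        if partial and ts - partial[0][1] > timeout:
            timed_out.append(partial)
            partial = []
        if name == pattern[len(partial)]:
            partial = partial + [(name, ts)]
            if len(partial) == len(pattern):
                matches.append(partial)
                partial = []
        elif partial:
            # Strict contiguity: a non-matching event discards the attempt,
            # but may itself start a new one.
            partial = [(name, ts)] if name == pattern[0] else []
    return matches, timed_out


events = [("A", 0), ("B", 10), ("A", 50), ("B", 55), ("C", 60)]
matches, timed_out = match_with_timeout(events)
print(matches)
print(timed_out)
```

Here the first A and B never see a C within 30 seconds, so they come out as a 
timed-out partial match, while the second A, B, C sequence is a regular match.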

In our company we did some basic testing of this concept: we modified the 
existing MatchCodeGenerator to add a processTimedOutMatch function based on a 
boolean trigger and tested it against the aforementioned business case of 
handling timed-out patterns.

I'm interested to hear your thoughts on how we could help Flink SQL express 
these kinds of cases.

With regards,
Kosma Grochowski



> On 21 Sep 2020, at 05:12, Jark Wu <imj...@gmail.com> wrote:
> 
> Hi Kosma,
> 
> Thanks for the proposal. I like it and we also have supported similar
> syntax in our company.
> The problem is that Flink SQL leverages Calcite as the query parser, so if
> we want to support this syntax, we may have to push this syntax back to the
> Calcite community.
> Besides, the SQL standard doesn't define the timeout syntax for MATCH
> RECOGNIZE. So we have to extend the standard and this is usually not
> trivial.
> 
> So I think it would be better to have a joint discussion with the Calcite
> and Flink community together. What do you think?
> 
> Best,
> Jark
> 
> 
> 
> 
> 
> On Fri, 18 Sep 2020 at 22:48, Kosma Grochowski <
> kosma.grochow...@getindata.com> wrote:
> 
>> Hello,
>> 
>> I would like to propose an enrichment of existing Flink SQL
>> MATCH_RECOGNIZE syntax to cover for the case of the absence of an event.
>> Such an enrichment would help our company solve a business case containing
>> timed-out patterns handling. An example of usage of such a clause from
>> Flink training exercises could be a task of identification of taxi rides
>> with a START event that is not followed by an END event within two hours.
>> Currently, a solution to such a task could be achieved with the use of CEP
>> and a timeout handler. However, as far as I know, it is impossible to take
>> advantage of Flink SQL syntax for this task.
>> 
>> I can think of two ways for such a feature to be incorporated into
>> existing MATCH_RECOGNIZE syntax:
>> - In analogy to CEP, a keyword could be added which would determine, if
>> timed out matches should be dropped altogether or available either through
>> side output or main output. SQL usage could be similar to the current
>> WITHIN clause, f.e. "PATTERN (A B C) TIMEOUT INTERVAL '30' SECOND" would
>> output partially matched patterns 30 seconds after A event appearance.
>> 
>> - Add possibility to define absence of event inside pattern definition -
>> for example "PATTERN (A B !C) WITHIN INTERVAL '30' SECOND" would output
>> partially matched patterns with the occurrence of A and B event 30 seconds
>> after A event appearance.
>> 
>> In our company we did some basic testing of this concept - we modified
>> existing MatchCodeGenerator to add processTimedOutMatch function based on a
>> boolean trigger and tested it against the aforementioned business case
>> containing timed-out patterns handling.
>> 
>> 
>> I'm interested to hear your thoughts about how we could help Flink SQL be
>> able to express these kinds of cases.
>> 
>> With regards,
>> Kosma Grochowski
>> 
>> 
>> 
>> 
