Thank you, Julian, for mentioning the anti-join. With its help, I managed to solve our particular case along the following lines:
```
SELECT e.*
FROM events e
LEFT JOIN patterns p ON e.record_id = p.begin_record_id
WHERE e.pattern_val = 'BEGIN'
  AND p.begin_record_id IS NULL
```

However, I think such an approach will fail for patterns more complicated than `BEGIN !END`. For example, determining on which event the pattern `A B{1,N} A{1,N} B` timed out does not seem feasible this way. Moreover, this way of proceeding seems like a workaround for the limitations of MATCH_RECOGNIZE in dealing with absent events. I can’t think of a way to solve these cases generically with the existing syntax, and such pattern extensions would be the way to do that.

With regards,
Kosma

> On 22 Sep 2020, at 20:29, Julian Hyde <jh...@apache.org> wrote:
>
> Is there a better way?
>
> I'm an idealist with regard to streaming SQL semantics, and I'm going
> to make the 'slippery slope' argument that if we add a TIMEOUT
> parameter to MATCH_RECOGNIZE, won't we also need to add it to GROUP BY
> and JOIN? (Because those are also "blocking" operators.)
>
> Maybe JOIN and GROUP BY are simpler because (absent retractions) they
> are monotonic. If more data arrives, it will not cause rows to
> disappear from your result. So, maybe anti-join is the best
> comparison. How does Flink deal with, say "show me all orders from
> customers who have not made a product return in the last 3 months"?
> You'd need a timeout on the PRODUCT_RETURNS stream, right?
>
> My hunch is that Flink can express these semantics without extending
> the syntax of JOIN, and if so, we could use the same approach to make
> MATCH_RECOGNIZE work with late data.
>
> Julian
>
> On Mon, Sep 21, 2020 at 12:05 AM Kosma Grochowski
> <kosma.grochow...@getindata.com> wrote:
>>
>> Hi Jark,
>>
>> Thank you for your e-mail. I agree, let's engage all interested parties in
>> this discussion - I'm writing this e-mail to both Flink and Calcite dev
>> mailing lists.
>>
>> I'll repeat myself to present the proposal to the Calcite community.
>>
>> I would like to propose an enrichment of existing Flink SQL MATCH_RECOGNIZE
>> syntax to cover for the case of the absence of an event. Such an enrichment
>> would help our company solve a business case containing timed-out patterns
>> handling. An example of usage of such a clause from Flink training exercises
>> could be a task of identification of taxi rides with a START event that is
>> not followed by an END event within two hours. Currently, a solution to such
>> a task could be achieved with the use of CEP and a timeout handler. However,
>> as far as I know, it is impossible to take advantage of Flink SQL syntax for
>> this task.
>>
>> I can think of two ways for such a feature to be incorporated into existing
>> MATCH_RECOGNIZE syntax:
>> - In analogy to CEP, a keyword could be added which would determine, if
>> timed out matches should be dropped altogether or available either through
>> side output or main output. SQL usage could be similar to the current WITHIN
>> clause, f.e. "PATTERN (A B C) TIMEOUT INTERVAL '30' SECOND" would output
>> partially matched patterns 30 seconds after A event appearance.
>>
>> - Add possibility to define absence of event inside pattern definition - for
>> example "PATTERN (A B !C) WITHIN INTERVAL '30' SECOND" would output
>> partially matched patterns with the occurrence of A and B event 30 seconds
>> after A event appearance.
>>
>> In our company we did some basic testing of this concept - we modified
>> existing MatchCodeGenerator to add processTimedOutMatch function based on a
>> boolean trigger and tested it against the aforementioned business case
>> containing timed-out patterns handling.
>>
>> I'm interested to hear your thoughts about how we could help Flink SQL be
>> able to express these kinds of cases.
>>
>> With regards,
>> Kosma Grochowski
>>
>>
>>> On 21 Sep 2020, at 05:12, Jark Wu <imj...@gmail.com> wrote:
>>>
>>> Hi Kosma,
>>>
>>> Thanks for the proposal.
>>> I like it and we also have supported similar
>>> syntax in our company.
>>> The problem is that Flink SQL leverages Calcite as the query parser, so if
>>> we want to support this syntax, we may have to push this syntax back to the
>>> Calcite community.
>>> Besides, the SQL standard doesn't define the timeout syntax for MATCH
>>> RECOGNIZE. So we have to extend the standard and this is usually not
>>> trivial.
>>>
>>> So I think it would be better to have a joint discussion with the Calcite
>>> and Flink community together. What do you think?
>>>
>>> Best,
>>> Jark
>>>
>>>
>>> On Fri, 18 Sep 2020 at 22:48, Kosma Grochowski <
>>> kosma.grochow...@getindata.com> wrote:
>>>
>>>> Hello,
>>>>
>>>> I would like to propose an enrichment of existing Flink SQL
>>>> MATCH_RECOGNIZE syntax to cover for the case of the absence of an event.
>>>> Such an enrichment would help our company solve a business case containing
>>>> timed-out patterns handling. An example of usage of such a clause from
>>>> Flink training exercises could be a task of identification of taxi rides
>>>> with a START event that is not followed by an END event within two hours.
>>>> Currently, a solution to such a task could be achieved with the use of CEP
>>>> and a timeout handler. However, as far as I know, it is impossible to take
>>>> advantage of Flink SQL syntax for this task.
>>>>
>>>> I can think of two ways for such a feature to be incorporated into
>>>> existing MATCH_RECOGNIZE syntax:
>>>> - In analogy to CEP, a keyword could be added which would determine, if
>>>> timed out matches should be dropped altogether or available either through
>>>> side output or main output. SQL usage could be similar to the current
>>>> WITHIN clause, f.e. "PATTERN (A B C) TIMEOUT INTERVAL '30' SECOND" would
>>>> output partially matched patterns 30 seconds after A event appearance.
>>>>
>>>> - Add possibility to define absence of event inside pattern definition -
>>>> for example "PATTERN (A B !C) WITHIN INTERVAL '30' SECOND" would output
>>>> partially matched patterns with the occurrence of A and B event 30 seconds
>>>> after A event appearance.
>>>>
>>>> In our company we did some basic testing of this concept - we modified
>>>> existing MatchCodeGenerator to add processTimedOutMatch function based on a
>>>> boolean trigger and tested it against the aforementioned business case
>>>> containing timed-out patterns handling.
>>>>
>>>>
>>>> I'm interested to hear your thoughts about how we could help Flink SQL be
>>>> able to express these kinds of cases.
>>>>
>>>> With regards,
>>>> Kosma Grochowski
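
For anyone who wants to try the anti-join from the top of this message on a static dataset, here is a minimal sketch. It runs the same query through Python's built-in `sqlite3` module, not through Flink, so it shows only the join logic and nothing about streaming, watermarks, or timeouts. The table layouts (`events(record_id, pattern_val)` and `patterns(begin_record_id)`), the helper name `unmatched_begins`, and the sample rows are assumptions made for illustration; the thread does not define the real schema.

```python
import sqlite3


def unmatched_begins(events, completed_begin_ids):
    """Return BEGIN events that have no matching row in `patterns`.

    events: iterable of (record_id, pattern_val) tuples (hypothetical schema).
    completed_begin_ids: record_ids of BEGIN events whose pattern completed.
    """
    conn = sqlite3.connect(":memory:")
    # Assumed table layouts; the original thread does not spell them out.
    conn.execute("CREATE TABLE events (record_id INTEGER, pattern_val TEXT)")
    conn.execute("CREATE TABLE patterns (begin_record_id INTEGER)")
    conn.executemany("INSERT INTO events VALUES (?, ?)", list(events))
    conn.executemany(
        "INSERT INTO patterns VALUES (?)",
        [(record_id,) for record_id in completed_begin_ids],
    )
    # Anti-join: keep BEGIN rows for which the LEFT JOIN found no partner.
    rows = conn.execute(
        """
        SELECT e.record_id, e.pattern_val
        FROM events e
        LEFT JOIN patterns p ON e.record_id = p.begin_record_id
        WHERE e.pattern_val = 'BEGIN'
          AND p.begin_record_id IS NULL
        ORDER BY e.record_id
        """
    ).fetchall()
    conn.close()
    return rows


# Made-up sample data: BEGIN events 1 and 4 completed, BEGIN event 3 did not.
sample_events = [(1, "BEGIN"), (2, "END"), (3, "BEGIN"), (4, "BEGIN"), (5, "END")]
sample_completed = [1, 4]
result = unmatched_begins(sample_events, sample_completed)
print(result)
```

On the sample data only the BEGIN event with `record_id` 3 is returned, because it is the only one without a partner row in `patterns`. This also shows the limitation discussed above: the query can say that a BEGIN was never completed, but not at which step a longer pattern stopped matching.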