Re: Query on retract stream

2019-01-27 Thread Gagan Agrawal
Thanks Hequn for sharing those details. Looking forward to the Blink integration. I have one doubt about one of your earlier statements: *> Also, currently, the window doesn't have the ability to handle retraction messages.* When we use multi-window (as you suggested), it is able to handle updates.

Re: Query on retract stream

2019-01-26 Thread Hequn Cheng
Hi Gagan, Besides the event-time and processing-time difference, there is another difference between the two ways. The window aggregates on bounded data, while the unbounded aggregate works on unbounded data, i.e., newly arriving data can update very old data. As for the performance, I think the two ways may have

Re: Query on retract stream

2019-01-25 Thread Hequn Cheng
Hi Gagan, Time attribute fields will be materialized by the unbounded groupBy. Also, currently, the window doesn't have the ability to handle retraction messages. I see two ways to solve the problem. - Use multi-window: the first window computes lastValue, the second performs the count. - Use two
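
A minimal Flink SQL sketch of this multi-window idea, assuming a hypothetical source table orders(userId, orderId, status, event_time) with event_time as the event-time attribute; LAST_VALUE stands for the user-defined aggregate mentioned elsewhere in the thread (names are illustrative, not from the message):

    -- Inner window: per order, keep only the latest status in each 1-minute window.
    -- Outer window: count the orders whose latest status is still pending.
    SELECT
      TUMBLE_END(w_rowtime, INTERVAL '1' MINUTE) AS window_end,
      COUNT(*) AS pending_count
    FROM (
      SELECT
        orderId,
        LAST_VALUE(status) AS last_status,
        TUMBLE_ROWTIME(event_time, INTERVAL '1' MINUTE) AS w_rowtime
      FROM orders
      GROUP BY orderId, TUMBLE(event_time, INTERVAL '1' MINUTE)
    )
    WHERE last_status = 'PENDING'
    GROUP BY TUMBLE(w_rowtime, INTERVAL '1' MINUTE)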

Re: Query on retract stream

2019-01-25 Thread Gagan Agrawal
Based on the suggestions in this mail thread, I tried out a few experiments on upsert streams with Flink 1.7.1, and here is the issue I am facing with the window stream. *1. Global pending order count.* The following query works fine and is able to handle updates as per the original requirement. select
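
A sketch of the kind of global (unbounded) pending-count query being discussed, with illustrative table and column names; the inner groupBy keeps the latest status per orderId and retracts outdated rows, so the outer count stays correct under updates:

    SELECT COUNT(*) AS pending_count
    FROM (
      SELECT orderId, LAST_VALUE(status) AS last_status
      FROM orders
      GROUP BY orderId
    )
    WHERE last_status = 'PENDING'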

Re: Query on retract stream

2019-01-22 Thread Gagan Agrawal
Thanks Hequn for your response. I initially thought of trying the "over window" clause; however, as per the documentation, there seems to be a limitation in the "orderBy" clause, which allows only a single event-time/processing-time attribute. Whereas in my case, events are generated from the MySQL binlog

Re: Query on retract stream

2019-01-21 Thread Hequn Cheng
Hi Gagan, > But I also have a requirement for event-time-based sliding window aggregation Yes, you can achieve this with Flink TableAPI/SQL. However, currently, sliding windows don't support early fire, i.e., they only output results when the event time reaches the end of the window. Once the window fires,
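
For reference, an event-time sliding window in Flink SQL uses the HOP group window; a minimal sketch with assumed names (a 10-minute window sliding every minute, emitting results only at window end, since early fire is unsupported):

    SELECT
      HOP_END(event_time, INTERVAL '1' MINUTE, INTERVAL '10' MINUTE) AS window_end,
      COUNT(*) AS order_count
    FROM orders
    GROUP BY HOP(event_time, INTERVAL '1' MINUTE, INTERVAL '10' MINUTE)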

Re: Query on retract stream

2019-01-21 Thread Gagan Agrawal
Thank you guys. It's great to hear multiple solutions to achieve this. I understand that records, once emitted to Kafka, cannot be deleted, and that's acceptable for our use case, as the last updated value should always be correct. However, as I understand it, most of these solutions will work for global

Re: Query on retract stream

2019-01-21 Thread Piotr Nowojski
@Jeff: It depends on whether the user can define a time window for his condition. As Gagan described his problem, it was about a "global" threshold of pending orders. I have just thought of another solution that should work without any custom code. Convert the "status" field to a status_value int: - "+1"
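
One plausible reading of this truncated idea (an assumption on my part, not confirmed by the message): map each status change to +1 when an order enters the pending state and -1 when it leaves it, so that a plain SUM yields the live pending count:

    -- Assumes each order emits exactly one PENDING event and one terminal event.
    SELECT SUM(CASE WHEN status = 'PENDING' THEN 1 ELSE -1 END) AS pending_orders
    FROM orders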

Re: Query on retract stream

2019-01-21 Thread Jeff Zhang
I am thinking of another approach instead of a retract stream. Is it possible to define a custom window to do this? The window would be defined per order, and then you just need to analyze the events within that window. Piotr Nowojski wrote on Mon, Jan 21, 2019 at 8:44 PM: > Hi, > > There is a missing feature in

Re: Query on retract stream

2019-01-21 Thread Piotr Nowojski
Hi, At the moment there is a missing feature in Flink Table API/SQL: support for retraction streams as input (or conversions from an append stream to a retraction stream). With that, your problem would simplify to one simple `SELECT uid, count(*) FROM Changelog GROUP BY uid`. There is an

Re: Query on retract stream

2019-01-21 Thread Hequn Cheng
Hi Gagan, Yes, you can achieve this with Flink TableAPI/SQL. However, you have to pay attention to the following things: 1) Currently, Flink only ingests append streams. In order to ingest upsert streams (streams with keys), you can use groupBy with a user-defined LAST_VALUE aggregate function. For
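
A sketch of that ingestion step, assuming LAST_VALUE has been registered as a user-defined aggregate and the append stream is exposed as a table order_events (names are illustrative):

    -- Collapse the append stream into an upsert view keyed by orderId;
    -- each new event for an orderId retracts and replaces the previous row.
    SELECT
      orderId,
      LAST_VALUE(userId) AS userId,
      LAST_VALUE(status) AS status
    FROM order_events
    GROUP BY orderId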

Query on retract stream

2019-01-18 Thread Gagan Agrawal
Hi, I have a requirement and need to understand if the same can be achieved with a Flink retract stream. Let's say we have a stream with 4 attributes: userId, orderId, status, and event_time, where orderId is unique, and hence any change to the same orderId updates the previous value, as below. *Changelog* *Event Stream*
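
A hypothetical changelog of the shape described (values invented for illustration):

    userId | orderId | status  | event_time
    u1     | o1      | PENDING | 10:00:00
    u1     | o2      | PENDING | 10:00:05
    u1     | o1      | SHIPPED | 10:02:00   (replaces o1's earlier PENDING row)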