Re: Query on retract stream

2019-01-27 Thread Gagan Agrawal
…FLIP-32 … On Sat, Jan 26, 2019 at 4:11 PM Gagan Agrawal wrote: > Thanks Hequn for the suggested solutions; I think this should really work and I will give it a try. As I understand, the first solution of using multiple windows…

Re: Query on retract stream

2019-01-25 Thread Gagan Agrawal
…05) [error] at org.apache.flink.table.api.scala.StreamTableEnvironment.toRetractStream(StreamTableEnvironment.scala:185). Gagan. On Tue, Jan 22, 2019 at 7:01 PM Gagan Agrawal wrote: > Thanks Hequn for your response. I initially thought of trying out the "over window" clause, however as per…
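For context, a minimal sketch of the call that stack frame points at, assuming the Flink 1.7 Scala Table API and a previously registered table `changelog`:

    import org.apache.flink.streaming.api.scala._
    import org.apache.flink.table.api.TableEnvironment
    import org.apache.flink.table.api.scala._
    import org.apache.flink.types.Row

    val env = StreamExecutionEnvironment.getExecutionEnvironment
    val tableEnv = TableEnvironment.getTableEnvironment(env)

    // toRetractStream converts a dynamic table into a stream of
    // (flag, row) pairs: flag = true is an add, flag = false a retraction.
    val result = tableEnv.sqlQuery("SELECT uid, COUNT(*) FROM changelog GROUP BY uid")
    val retracts: DataStream[(Boolean, Row)] = tableEnv.toRetractStream[Row](result)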

Re: Query on retract stream

2019-01-22 Thread Gagan Agrawal
[1] https://ci.apache.org/projects/flink/flink-docs-release-1.7/dev/table/tableApi.html#over-windows
[2] https://ci.apache.org/projects/flink/flink-docs-master/dev/stream/operators/windows.html
On Tue, Jan 22, 2019 at 12:58 AM Gagan Agrawal wrote: …
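From the first link, an over-window variant of the query could look like the following sketch (assuming `event_time` is registered as a rowtime attribute; the interval is illustrative):

    // Running count of orders per user over the last hour, computed as an
    // over window: one output row per input row, no retractions needed.
    val overQuery = tableEnv.sqlQuery(
      """
        |SELECT uid,
        |       COUNT(oid) OVER (
        |         PARTITION BY uid
        |         ORDER BY event_time
        |         RANGE BETWEEN INTERVAL '1' HOUR PRECEDING AND CURRENT ROW
        |       ) AS orders_last_hour
        |FROM changelog
      """.stripMargin)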

Re: Query on retract stream

2019-01-21 Thread Gagan Agrawal
…GROUP BY uid

Where `changelog` is an append-only stream with the following content:

*user, order, status, event_time*
u1, o1, pending, t1
u2, o2, failed, t2
*u1, o3, pending, t3*
*u1, o3, success, t4*
u2, o4, pending, …
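To make the retraction mechanics concrete: assuming the query counts pending orders per user and an order's latest status wins, the retract stream for the sample rows could emit the following (an illustration, not captured output):

    (true,  (u1, 1))   // t1: o1 pending, u1 has 1 pending order
                       // t2: o2 is failed, filtered out, nothing emitted
    (false, (u1, 1))   // t3: retract u1's old count...
    (true,  (u1, 2))   // ...o3 pending, u1 now has 2
    (false, (u1, 2))   // t4: o3 flips to success, retract again...
    (true,  (u1, 1))   // ...u1 is back to 1 pending order
    (true,  (u2, 1))   // o4 pending, u2 has 1 pending order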

Query on retract stream

2019-01-18 Thread Gagan Agrawal
Hi, I have a requirement and need to understand if the same can be achieved with a Flink retract stream. Let's say we have a stream with 4 attributes (userId, orderId, status, event_time) where orderId is unique, and hence any change to the same orderId updates the previous value, as below: *Changelog* *Event Stream* …
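One way to express "latest status per orderId, then aggregate per user" in SQL is sketched below. Note LAST_VALUE is only built in on newer planners (Blink, Flink 1.9+); on 1.7 it would have to be a user-defined aggregate:

    // Assumes a registered table changelog(userId, orderId, status, event_time).
    // The inner query keeps only the latest status per orderId; an update to an
    // orderId retracts the old row, and the retraction flows through the outer
    // GROUP BY so the per-user count stays correct.
    val pendingPerUser = tableEnv.sqlQuery(
      """
        |SELECT userId, COUNT(*) AS pending_orders
        |FROM (
        |  SELECT orderId,
        |         LAST_VALUE(userId) AS userId,
        |         LAST_VALUE(status) AS status
        |  FROM changelog
        |  GROUP BY orderId
        |)
        |WHERE status = 'pending'
        |GROUP BY userId
      """.stripMargin)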

Re: Multiple MapState vs single nested MapState in stateful Operator

2019-01-11 Thread Gagan Agrawal
…is de-serialized (and serialized in case of a put()). Given this, it is more efficient to have many keys with small state than fewer keys with huge state. Cheers, Kostas. On Thu, Jan 10, 2019 at 12:34 PM Congxian Qiu wrote: >> Hi, …
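In code, that advice maps to one MapState handle per stream rather than a single state holding a map of maps; a minimal sketch with assumed record shapes:

    import org.apache.flink.api.common.state.{MapState, MapStateDescriptor}
    import org.apache.flink.configuration.Configuration
    import org.apache.flink.streaming.api.functions.KeyedProcessFunction
    import org.apache.flink.util.Collector

    case class Event(key: String, payload: String)   // assumed shapes
    case class Output(key: String, result: String)

    // With the RocksDB backend each MapState entry is its own RocksDB key,
    // so an access (de)serializes only the touched entry, not the whole map.
    class EnrichFunction extends KeyedProcessFunction[String, Event, Output] {
      @transient private var streamAState: MapState[String, Event] = _
      @transient private var streamBState: MapState[String, Event] = _

      override def open(parameters: Configuration): Unit = {
        streamAState = getRuntimeContext.getMapState(
          new MapStateDescriptor[String, Event]("streamA", classOf[String], classOf[Event]))
        streamBState = getRuntimeContext.getMapState(
          new MapStateDescriptor[String, Event]("streamB", classOf[String], classOf[Event]))
        // ...and likewise for streams C and D
      }

      override def processElement(
          e: Event,
          ctx: KeyedProcessFunction[String, Event, Output]#Context,
          out: Collector[Output]): Unit = {
        // Illustrative only: stash the event in its stream's own map state.
        streamAState.put(e.payload, e)
      }
    }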

Custom Serializer for Avro GenericRecord

2019-01-09 Thread Gagan Agrawal
Hi, I am using Avro GenericRecord for most IN/OUT types of my custom functions. What I have noticed is that the default Avro GenericRecord serializer also serializes the schema, which makes messages very heavy and hence impacts overall performance. In my case I already know the schema beforehand…
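When the schema is known up front, newer Flink versions ship GenericRecordAvroTypeInfo in flink-avro, which serializes records against a fixed schema instead of shipping the schema with every record; on 1.7 a custom TypeInformation along the same lines would be needed. A hedged sketch, where schemaJson and transform are placeholders:

    import org.apache.avro.Schema
    import org.apache.avro.generic.GenericRecord
    import org.apache.flink.api.common.typeinfo.TypeInformation
    import org.apache.flink.formats.avro.typeutils.GenericRecordAvroTypeInfo
    import org.apache.flink.streaming.api.scala._

    // Parse the schema once; it is known before the job starts.
    val schema: Schema = new Schema.Parser().parse(schemaJson)

    // The Scala API picks this up as an implicit TypeInformation, so
    // downstream GenericRecord values carry data only, not the schema.
    implicit val recordInfo: TypeInformation[GenericRecord] =
      new GenericRecordAvroTypeInfo(schema)

    val out: DataStream[GenericRecord] = input.map(r => transform(r))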

Multiple MapState vs single nested MapState in stateful Operator

2019-01-09 Thread Gagan Agrawal
Hi, I have a use case where 4 streams get merged (union), grouped on a common key (keyBy), and a custom KeyedProcessFunction is called. Now I need to keep state (RocksDB backend) for all 4 streams in my custom KeyedProcessFunction, where each of these 4 streams would be stored as a map. So I have 2 …
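For reference, a sketch of the described topology (the stream types and key field are assumptions; EnrichFunction is the state-holding function sketched earlier in this page):

    import org.apache.flink.streaming.api.scala._

    // Four source streams of a common Event type, merged into one,
    // partitioned by the shared key, and handed to the custom function.
    val merged: DataStream[Event] = a.union(b, c, d)

    val output: DataStream[Output] = merged
      .keyBy(_.key)
      .process(new EnrichFunction)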

Re: Buffer stats when Back Pressure is high

2019-01-08 Thread Gagan Agrawal
…to use a broadcast join that distributes the second stream to all operators, such that all operators can perform the enrichment step in a round-robin fashion. Regards, Timo. On 07.01.19 at 14:45, Gagan Agrawal wrote: > Flink version is 1.7. Thanks…
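A sketch of the broadcast variant Timo describes, using Flink's broadcast state (available since 1.5); the names and record shapes are assumptions, with mainStream: DataStream[Event] and dimStream: DataStream[Dim] assumed to exist:

    import org.apache.flink.api.common.state.MapStateDescriptor
    import org.apache.flink.streaming.api.functions.co.BroadcastProcessFunction
    import org.apache.flink.streaming.api.scala._
    import org.apache.flink.util.Collector

    case class Event(key: String, payload: String)   // assumed shapes
    case class Dim(key: String, attr: String)
    case class Output(event: Event, dim: Dim)

    val dimDescriptor =
      new MapStateDescriptor[String, Dim]("dim", classOf[String], classOf[Dim])

    // Every parallel instance receives the full dimension stream, so the
    // enrichment needs no keyed shuffle of the (possibly skewed) main stream.
    val enriched: DataStream[Output] = mainStream
      .connect(dimStream.broadcast(dimDescriptor))
      .process(new BroadcastProcessFunction[Event, Dim, Output] {
        override def processElement(
            e: Event,
            ctx: BroadcastProcessFunction[Event, Dim, Output]#ReadOnlyContext,
            out: Collector[Output]): Unit = {
          val dim = ctx.getBroadcastState(dimDescriptor).get(e.key)
          if (dim != null) out.collect(Output(e, dim))
        }
        override def processBroadcastElement(
            d: Dim,
            ctx: BroadcastProcessFunction[Event, Dim, Output]#Context,
            out: Collector[Output]): Unit = {
          ctx.getBroadcastState(dimDescriptor).put(d.key, d)
        }
      })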

Re: Buffer stats when Back Pressure is high

2019-01-07 Thread Gagan Agrawal
Flink version is 1.7. Thanks Zhijiang for your pointer. Initially I was checking only a few. However, I just checked all of them and found a couple having a queue length of 40+, which seems to be due to skewness in the data. Are there any general guidelines on how to handle skewed data? In my case I…
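One common mitigation, named plainly since it is not spelled out in this thread, is two-stage aggregation with a salted key: pre-aggregate per (key, salt) so hot keys spread over several subtasks, then merge the partials. A sketch for a windowed count, with events: DataStream[Event] assumed:

    import org.apache.flink.streaming.api.scala._
    import org.apache.flink.streaming.api.windowing.time.Time
    import scala.util.Random

    val salts = 16  // illustrative fan-out for hot keys

    // Stage 1: partial counts per (key, salt) per tumbling window.
    val partials = events
      .map(e => (e.key, Random.nextInt(salts), 1L))
      .keyBy(t => (t._1, t._2))
      .timeWindow(Time.minutes(1))
      .sum(2)

    // Stage 2: merge the at-most-`salts` partials per key; the windows
    // line up because both stages use the same tumbling size.
    val totals = partials
      .map(t => (t._1, t._3))
      .keyBy(_._1)
      .timeWindow(Time.minutes(1))
      .sum(1)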

Buffer stats when Back Pressure is high

2019-01-06 Thread Gagan Agrawal
Hi, I want to understand whether any of the buffer stats help in debugging / validating that a downstream operator is performing slowly when back pressure is high. Say I have operators A -> B, and A shows high back pressure, which indicates that something is wrong or not performing well on B's side, which is slowing down…
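For reference, the per-task buffer metrics Flink exposes (metrics docs, `buffers` group); as a rough reading, a saturated outPoolUsage on A combined with high inPoolUsage / inputQueueLength on B would point at B as the slow side:

    buffers.inputQueueLength    // number of queued input buffers
    buffers.outputQueueLength   // number of queued output buffers
    buffers.inPoolUsage         // share of the input buffer pool in use
    buffers.outPoolUsage        // share of the output buffer pool in use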

Joining more than 2 streams

2018-11-24 Thread Gagan Agrawal
Hi, I want to do a window join on multiple Kafka streams (say a, b, c) on a field common to all 3 streams and apply some custom function on the joined stream. As I understand, we can join only 2 streams at a time via the DataStream API. So maybe I need to join a and b first and then join the first joined stream…
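A sketch of that chained pairwise join (record shapes and the shared key field are assumptions; the window size is illustrative):

    import org.apache.flink.streaming.api.scala._
    import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows
    import org.apache.flink.streaming.api.windowing.time.Time

    case class EvA(key: String, va: Int)             // assumed shapes
    case class EvB(key: String, vb: Int)
    case class EvC(key: String, vc: Int)
    case class AB(a: EvA, b: EvB) { def key: String = a.key }
    case class ABC(ab: AB, c: EvC)

    // First join a with b on the common field...
    val ab: DataStream[AB] = a
      .join(b)
      .where(_.key).equalTo(_.key)
      .window(TumblingEventTimeWindows.of(Time.minutes(5)))
      .apply((l, r) => AB(l, r))

    // ...then join the intermediate stream with c on the same field.
    val abc: DataStream[ABC] = ab
      .join(c)
      .where(_.key).equalTo(_.key)
      .window(TumblingEventTimeWindows.of(Time.minutes(5)))
      .apply((l, r) => ABC(l, r))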

Flink job failing due to "Container is running beyond physical memory limits" error.

2018-11-23 Thread Gagan Agrawal
Hi, I am running a Flink job on YARN; it ran fine so far (4-5 days) but has now started failing with the following errors. 2018-11-24 03:46:21,029 INFO org.apache.flink.yarn.YarnResourceManager - Closing TaskExecutor connection container_1542008917197_0038_01_06 because: …
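This error usually means native (off-heap) memory, often RocksDB, grew past the YARN container cap. On Flink 1.x before 1.10 the usual knob was the containerized heap cutoff, which reserves container memory for off-heap use before sizing the JVM heap; the keys are real, the values below are illustrative:

    # flink-conf.yaml
    # Reserve a larger share of the container for off-heap memory
    # (RocksDB, network buffers) before sizing the JVM heap.
    containerized.heap-cutoff-ratio: 0.4   # default 0.25
    containerized.heap-cutoff-min: 1024    # in MB, default 600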

Re: Savepoint failed with error "Checkpoint expired before completing"

2018-11-19 Thread Gagan Agrawal
…read my email <http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/Why-documentation-always-say-checkpoint-does-not-support-Flink-specific-features-like-rescaling-td23982.html> posted in the dev mailing list. Best, Yun Tang

Re: Savepoint failed with error "Checkpoint expired before completing"

2018-11-02 Thread Gagan Agrawal
…n-always-say-checkpoint-does-not-support-Flink-specific-features-like-rescaling-td23982.html> posted in the dev mailing list. Best, Yun Tang ---------- *From:* Gagan Agrawal *Sent:* Thursday, November 1, 2018 13:38 *To:* myas...@live.com *Cc:* hap…

Re: Savepoint failed with error "Checkpoint expired before completing"

2018-10-31 Thread Gagan Agrawal
…etained-checkpoint> (link preview: "Checkpoints; Retained Checkpoints: Directory Structure; Difference to Savepoints; Resuming from a retained checkpoint", ci.apache.org). Best, Yun Tang …

Re: Savepoint failed with error "Checkpoint expired before completing"

2018-10-31 Thread Gagan Agrawal
…it is the case you have met. Best, Henry. On Oct 30, 2018, at 6:07 PM, Gagan Agrawal wrote: > Hi, we have a Flink job (Flink version 1.6.1) which unions 2 streams to pass through a custom KeyedProcessFunction with a RocksDB state store, which finally c…

Savepoint failed with error "Checkpoint expired before completing"

2018-10-30 Thread Gagan Agrawal
Hi, we have a Flink job (Flink version 1.6.1) which unions 2 streams to pass through a custom KeyedProcessFunction with a RocksDB state store, which finally creates another stream into Kafka. The current size of the checkpoint is around 100 GB, and checkpoints are saved to S3 at a 5-minute interval with incremental…
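Since savepoints run through the same mechanism as checkpoints, "Checkpoint expired before completing" typically means the snapshot hit the checkpoint timeout (default 10 minutes). A sketch of raising it on the 1.6 API; the value is illustrative:

    import org.apache.flink.streaming.api.scala._

    val env = StreamExecutionEnvironment.getExecutionEnvironment

    // 5-minute checkpoint interval, as described above...
    env.enableCheckpointing(5 * 60 * 1000L)

    // ...but give large (savepoint-sized) snapshots longer to finish
    // before they are expired.
    env.getCheckpointConfig.setCheckpointTimeout(30 * 60 * 1000L)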