Re: Issue with Incremental window aggregation using Aggregate function.

2023-05-18 Thread Sumanta Majumdar
Any thoughts on this?

On Fri, Apr 21, 2023 at 4:10 PM Sumanta Majumdar 
wrote:

> Hi,
>
> Currently we have a streaming use case where we have a flink application
> which runs on a session cluster which is responsible for reading data from
> Kafka source which is basically table transaction events getting ingested
> into the upstream kafka topic which is converted to a row and then
> deduplicated to extract distinct rows and persist via a sink function to an
> external warehouse such as vertica or snowflake schema.
>
> Now initially what we have observed by using TumblingWindow windows
> assigner implementation is that the state sizes are growing unconditionally
> even when we have tuned rocksdb options and provided a good chunk of
> managed memory.
>
> We are able to read more than 15 records within a period of 4 mins
> which is our time window set based on our requirements.
>
> Now one optimization which I see is suggested through the flink docs in
> order to reduce the state size is to use incremental aggregation using
> reduce or aggregate functions available.
>
> Now I did use aggregate function along with window in order implement the
> same but now I am observing that the consumption rate has reduced
> drastically post this implementation which has increased the overall
> throughput of the pipeline.
>
> Any thoughts as to why this can happen?
>
> --
> Thanks and Regards,
> Sumanta Majumdar
>


-- 
Thanks and Regards,
Sumanta Majumdar


Issue with Incremental window aggregation using Aggregate function.

2023-04-21 Thread Sumanta Majumdar
Hi,

Currently we have a streaming use case where we have a flink application
which runs on a session cluster which is responsible for reading data from
Kafka source which is basically table transaction events getting ingested
into the upstream kafka topic which is converted to a row and then
deduplicated to extract distinct rows and persist via a sink function to an
external warehouse such as vertica or snowflake schema.

Now initially what we have observed by using TumblingWindow windows
assigner implementation is that the state sizes are growing unconditionally
even when we have tuned rocksdb options and provided a good chunk of
managed memory.

We are able to read more than 15 records within a period of 4 mins
which is our time window set based on our requirements.

Now one optimization which I see is suggested through the flink docs in
order to reduce the state size is to use incremental aggregation using
reduce or aggregate functions available.

Now I did use aggregate function along with window in order implement the
same but now I am observing that the consumption rate has reduced
drastically post this implementation which has increased the overall
throughput of the pipeline.

Any thoughts as to why this can happen?

-- 
Thanks and Regards,
Sumanta Majumdar