flink batch execution mode

2023-04-26 Thread Lu Niu
Hi, Flink users

I am trying to understand the internals of Flink batch mode. Some questions:

1. Does Flink batch mode use a columnar in-memory format?
2. Does Flink batch mode use vectorization techniques?
3. Are any performance benchmarks available that compare it with batch
engines like Spark or Presto?

Some code pointers would be great! Thanks!

Best
Lu


Flink Batch Execution Mode

2024-03-12 Thread irakli.keshel...@sony.com
Hello,

I have a Flink job running in batch mode. The source for the job is a Kafka
topic that holds a limited number of events. The job starts fine and consumes
the events, but it never makes it past the first task and becomes idle. The
Kafka source is defined as bounded with the following call:
"KafkaSource.builder().setBounded(OffsetsInitializer.latest())".
I expect the job to consume all the events in the Kafka topic and then move on
to the next task, but I'm not sure whether "OffsetsInitializer.latest()" is the
right OffsetsInitializer. Can anyone help me out here? Thanks!
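
For context, the behavior expected here can be modeled in a few lines of plain Java. This is a toy sketch and not Flink or Kafka code: the class BoundedReadSketch, the method readBounded, and the list standing in for a partition are all made up for illustration. It assumes that the stopping offset is captured once, when the job starts, and that the reader finishes as soon as it reaches that offset; the starting offset of 0 is likewise an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class BoundedReadSketch {

    // Reads from startOffset up to, but not including, stoppingOffset.
    // stoppingOffset models the "latest" offset captured when the job starts.
    static List<String> readBounded(List<String> partitionLog, int startOffset, int stoppingOffset) {
        List<String> consumed = new ArrayList<>();
        for (int offset = startOffset;
             offset < stoppingOffset && offset < partitionLog.size();
             offset++) {
            consumed.add(partitionLog.get(offset));
        }
        // Returning models the source finishing so that the next task can run.
        return consumed;
    }

    public static void main(String[] args) {
        // Hypothetical partition contents at job start.
        List<String> log = new ArrayList<>(Arrays.asList("e0", "e1", "e2"));
        int stoppingOffset = log.size(); // snapshot of the end offset at start
        log.add("e3");                   // appended after the snapshot, so not read
        System.out.println(readBounded(log, 0, stoppingOffset)); // prints [e0, e1, e2]
    }
}
```

In this model an event appended after the snapshot is never read, and the read always terminates.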

Cheers,
Irakli


Re: flink batch execution mode

2023-04-26 Thread Shammon FY
Hi Lu,

Currently, Flink does not have official benchmark results comparing it to
Spark and Presto. Flink supports all TPC-DS queries, so you can run the
TPC-DS benchmark yourself to compare engines, for example with the
benchmark project [1]. Other companies have published similar comparisons,
such as this document [2], but those results are for reference only.

[1] https://github.com/ververica/flink-sql-benchmark
[2] https://files.alicdn.com/tpsservice/c399186c83bb17dc5dd72cbd0c6ccf71.pdf?spm=a2csy.flink.0.0.49496ea8lXSLb8&file=c399186c83bb17dc5dd72cbd0c6ccf71.pdf

Best,
Shammon FY



Re: flink batch execution mode

2023-04-26 Thread Lu Niu
Thanks, Shammon! Do you also have insights on the first two questions?

1. Does Flink batch mode use a columnar in-memory format?
2. Does Flink batch mode use vectorization techniques?

Best
Lu




Re: flink batch execution mode

2023-04-26 Thread weijie guo
Hi Lu,

At present, Flink's runtime execution layer is still based on a row format,
and vectorization relies mostly on the automatic compiler optimizations of
the JVM.
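
To make the distinction concrete, here is a minimal plain-Java sketch. It is not Flink's internal code; the class RowVsColumnar, the Row type, and both method names are hypothetical and exist only for this illustration. The first method processes one row object at a time, while the second works on one primitive array per field, which is the kind of simple counted loop a JIT compiler can auto-vectorize. Whether that actually happens depends on the JVM version and the CPU.

```java
import java.util.Arrays;

public class RowVsColumnar {

    // Row layout: one object per record, with all fields of a record together.
    static final class Row {
        final int price;
        final int quantity;

        Row(int price, int quantity) {
            this.price = price;
            this.quantity = quantity;
        }
    }

    // Row-at-a-time: every iteration follows a reference to a separate Row object.
    static int[] totalsFromRows(Row[] rows) {
        int[] totals = new int[rows.length];
        for (int i = 0; i < rows.length; i++) {
            totals[i] = rows[i].price * rows[i].quantity;
        }
        return totals;
    }

    // Columnar: one primitive array per field, read contiguously in a counted loop.
    static int[] totalsFromColumns(int[] prices, int[] quantities) {
        int[] totals = new int[prices.length];
        for (int i = 0; i < prices.length; i++) {
            totals[i] = prices[i] * quantities[i];
        }
        return totals;
    }

    public static void main(String[] args) {
        Row[] rows = { new Row(10, 2), new Row(5, 3), new Row(7, 1) };
        int[] prices = { 10, 5, 7 };
        int[] quantities = { 2, 3, 1 };
        System.out.println(Arrays.toString(totalsFromRows(rows)));                  // prints [20, 15, 7]
        System.out.println(Arrays.toString(totalsFromColumns(prices, quantities))); // prints [20, 15, 7]
    }
}
```

Both methods compute the same result; only the memory layout and the loop shape differ.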


Best regards,

Weijie




Re: Flink Batch Execution Mode

2024-03-12 Thread Feng Jin
Hi Irakli

Which version of flink-connector-kafka are you using?
You may have hit a bug [1] in older versions that prevents the source task
from entering the finished state.


[1] https://issues.apache.org/jira/browse/FLINK-31319

Best,
Feng




Re: Flink Batch Execution Mode

2024-03-13 Thread irakli.keshel...@sony.com
Hi Feng,

I'm using flink-connector-kafka 3.0.1-1.17. I see that 1.17 is affected, but
the ticket is marked as fixed, so I'm not sure that is actually the issue.

Best,
Irakli
