flink batch execution mode
Hi, Flink users,

I am trying to understand the internals of Flink batch mode. Some questions:

1. Does Flink batch mode use a columnar in-memory format?
2. Does Flink batch mode use vectorization techniques?
3. Are any performance benchmarks available comparing it with batch engines like Spark or Presto?

Some code pointers would be great! Thanks!

Best,
Lu
Flink Batch Execution Mode
Hello,

I have a Flink job running in batch mode. The source for the job is a Kafka topic that holds a limited number of events. I can see that the job starts fine and consumes the events, but it never makes it past the first task and becomes idle. The Kafka source is defined to be bounded with the following call: "KafkaSource.builder().setBounded(OffsetsInitializer.latest())".

I expect the job to consume all the events in the Kafka topic and then move on to the next task, but I'm not sure whether "OffsetsInitializer.latest()" is the right OffsetsInitializer. Can anyone help me out here? Thanks!

Cheers,
Irakli
Re: flink batch execution mode
Hi Lu,

Currently, Flink has no official benchmark results comparing it to Spark and Presto. You can run the TPC-DS benchmark yourself to compare the engines; Flink supports all TPC-DS queries. See, for example, the benchmark project [1]. Other companies have made similar comparisons, such as this document [2], but those results are for reference only.

[1] https://github.com/ververica/flink-sql-benchmark
[2] https://files.alicdn.com/tpsservice/c399186c83bb17dc5dd72cbd0c6ccf71.pdf?spm=a2csy.flink.0.0.49496ea8lXSLb8&file=c399186c83bb17dc5dd72cbd0c6ccf71.pdf

Best,
Shammon FY

On Thu, Apr 27, 2023 at 1:33 AM Lu Niu wrote:
> 3. any performance benchmark available compared with batch engines like
> spark or presto?
Re: flink batch execution mode
Thanks, Shammon! Do you also have insights on the first two questions?

1. Does Flink batch mode use a columnar in-memory format?
2. Does Flink batch mode use vectorization techniques?

Best,
Lu

On Wed, Apr 26, 2023 at 7:51 PM Shammon FY wrote:
> Currently, Flink does not have official benchmark results compared to
> Spark and Presto.
Re: flink batch execution mode
Hi Lu,

At present, Flink's runtime execution layer is still based on a row format, and vectorization relies mostly on the JVM's automatic compiler optimizations.

Best regards,
Weijie

On Thu, Apr 27, 2023 at 10:52, Shammon FY wrote:
> On Thu, Apr 27, 2023 at 1:33 AM Lu Niu wrote:
>> 1. Does flink batch mode use columnar in-memory format?
>> 2. Does flink batch mode use vectorization technique?
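To make the row-versus-columnar distinction in this reply concrete, here is a minimal plain-Java sketch. It does not use any Flink classes and is not how Flink lays out its data internally; the names `OrderRow`, `OrderColumns`, `totalRowFormat`, and `totalColumnar` are hypothetical and exist only for illustration. It shows the same aggregation written against a row-oriented layout (one object per record) and a column-oriented layout (one primitive array per field).

```java
import java.util.Arrays;
import java.util.List;

public class RowVsColumnarSketch {

    /** Row format (hypothetical type): all fields of one record are kept together. */
    public static final class OrderRow {
        public final long id;
        public final int quantity;
        public final long amountCents;

        public OrderRow(long id, int quantity, long amountCents) {
            this.id = id;
            this.quantity = quantity;
            this.amountCents = amountCents;
        }
    }

    /** Columnar format (hypothetical type): one primitive array per field. */
    public static final class OrderColumns {
        public final long[] ids;
        public final int[] quantities;
        public final long[] amountCents;

        public OrderColumns(long[] ids, int[] quantities, long[] amountCents) {
            this.ids = ids;
            this.quantities = quantities;
            this.amountCents = amountCents;
        }
    }

    /** Row-at-a-time aggregation: each step dereferences a whole record object. */
    public static long totalRowFormat(List<OrderRow> rows) {
        long total = 0L;
        for (OrderRow row : rows) {
            total += row.amountCents;
        }
        return total;
    }

    /** Column-at-a-time aggregation: a simple loop over a single primitive array. */
    public static long totalColumnar(OrderColumns columns) {
        long total = 0L;
        for (long amount : columns.amountCents) {
            total += amount;
        }
        return total;
    }

    /** Copies a list of rows into per-field arrays. */
    public static OrderColumns toColumns(List<OrderRow> rows) {
        int n = rows.size();
        long[] ids = new long[n];
        int[] quantities = new int[n];
        long[] amounts = new long[n];
        for (int i = 0; i < n; i++) {
            OrderRow row = rows.get(i);
            ids[i] = row.id;
            quantities[i] = row.quantity;
            amounts[i] = row.amountCents;
        }
        return new OrderColumns(ids, quantities, amounts);
    }

    public static void main(String[] args) {
        List<OrderRow> rows = Arrays.asList(
                new OrderRow(1L, 2, 1000L),
                new OrderRow(2L, 3, 2500L),
                new OrderRow(3L, 5, 4000L));

        System.out.println(totalRowFormat(rows));           // prints 7500
        System.out.println(totalColumnar(toColumns(rows))); // prints 7500
    }
}
```

Both methods return the same result; only the memory layout differs. The columnar loop is the kind of simple array loop that a JIT compiler may be able to auto-vectorize, whereas the row loop has to follow an object reference for every record.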
Re: Flink Batch Execution Mode
Hi Irakli,

Which version of flink-connector-kafka are you using? You may have hit a bug [1] in an older version that prevents the source task from entering the finished state.

[1] https://issues.apache.org/jira/browse/FLINK-31319

Best,
Feng

On Tue, Mar 12, 2024 at 7:21 PM irakli.keshel...@sony.com wrote:
> The Kafka source is defined to be bounded by following command:
> "KafkaSource.builder().setBounded(OffsetsInitializer.latest())".
Re: Flink Batch Execution Mode
Hi Feng,

I'm using flink-connector-kafka 3.0.1-1.17. I see that 1.17 is affected, but the ticket is marked as fixed, so I'm not sure whether that is actually the issue.

Best,
Irakli

From: Feng Jin
Sent: 12 March 2024 18:28
To: Keshelava, Irakli
Cc: user@flink.apache.org
Subject: Re: Flink Batch Execution Mode

> What version of flink-connector-kafka are you using? You may have
> encountered a bug [1] in the old version that prevents the source task
> from entering the finished state.
>
> [1]. https://issues.apache.org/jira/browse/FLINK-31319
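On the question of whether "OffsetsInitializer.latest()" is a sensible stopping offset: the following is a toy plain-Java model of stopping-offset semantics, not the connector's implementation. It assumes the stopping offset is resolved once, as the partition's end offset when the reader is created, and that the reader counts as finished once its position reaches that offset. `PartitionLog` and `BoundedReader` are hypothetical names used only for this sketch.

```java
import java.util.ArrayList;
import java.util.List;

public class BoundedReadSketch {

    /** Toy stand-in for one Kafka partition: an append-only list where offset == index. */
    public static final class PartitionLog {
        private final List<String> records = new ArrayList<>();

        public void append(String value) {
            records.add(value);
        }

        /** Offset that the next appended record would receive. */
        public long endOffset() {
            return records.size();
        }

        public String read(long offset) {
            return records.get((int) offset);
        }
    }

    /** Reads from a start offset up to (excluding) a stopping offset fixed at creation. */
    public static final class BoundedReader {
        private final PartitionLog log;
        private final long stoppingOffset;
        private long position;

        public BoundedReader(PartitionLog log, long startOffset) {
            this.log = log;
            this.position = startOffset;
            // Assumption of this sketch: "latest" is snapshotted once, here.
            this.stoppingOffset = log.endOffset();
        }

        public boolean isFinished() {
            return position >= stoppingOffset;
        }

        /** Drains every record below the stopping offset, then stops. */
        public List<String> readAll() {
            List<String> out = new ArrayList<>();
            while (!isFinished()) {
                out.add(log.read(position));
                position++;
            }
            return out;
        }
    }

    public static void main(String[] args) {
        PartitionLog log = new PartitionLog();
        log.append("a");
        log.append("b");
        log.append("c");

        BoundedReader reader = new BoundedReader(log, 0L);
        log.append("d"); // arrives after the stopping offset was fixed

        System.out.println(reader.readAll());    // prints [a, b, c]
        System.out.println(reader.isFinished()); // prints true
    }
}
```

Under these assumptions, a reader bounded at the latest offset consumes everything that existed when it started and then finishes, which matches the behaviour expected in the original question. A reader whose start offset already equals the stopping offset would finish immediately without reading anything.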