RE: [ANNOUNCE] Release 1.18.0, release candidate #0

2023-10-06 Thread David Radley
Hi Martjin,
Thanks for your comments. I also think it is better to decouple the connectors 
– I agree they need to have their own release cycles. . I was worried that 
moving to Flink 1.118 is somehow causing the Kafka connector to fail – i.e. a 
regression. I think you are saying that there is no regression like this,
  Kind regards, David.
From: Martijn Visser 
Date: Thursday, 5 October 2023 at 21:39
To: dev@flink.apache.org 
Subject: [EXTERNAL] Re: [ANNOUNCE] Release 1.18.0, release candidate #0
Hi David,

It’s a deliberate choice to decouple the connectors. We shouldn’t block
Flink 1.18 on connector statuses. There’s already work being done to fix
the Flink Kafka connector. Any Flink connector comes after the new minor
version, similar to how it has been for all other connectors with Flink
1.17.

Best regards,

Martijn Visser

Op do 5 okt 2023 om 11:33 schreef David Radley 

> Hi Jing,
> Yes I agree that if we can get them resolved then that would be ideal.
>
> I guess the worry is that at 1.17, we had a released Flink core and Kafka
> connector.
> At 1.18 we will have a released Core Flink but no new Kafka connector. So
> the last released Kafka connector would now be
> https://mvnrepository.com/artifact/org.apache.flink/flink-connector-kafka/3.0.0-1.17
> which should be the same as the Kafka connector in 1.17. I guess this is
> the combination that people would pick up to deploy in production – and I
> assume this has been tested.
>
> This issues with the nightly builds refers to kafka connector main
> branch.  If they are not regressions, you are suggesting that pragmatically
> we go forward with the release; I think that makes sense to do, but do
> these issues effect 3.0.0.-1.117.
>
> I suspect we should release a new Kafka connector asap, so we have a
> matching connector built outside of the Flink repo. We may want to not
> include the Flink core version in the connector – or we might end up
> wanting to release a Kafka connector when there are no changes just to have
> a match with the Flink core version.
>
> Kind regards, David.
>
>
>
> From: Jing Ge 
> Date: Wednesday, 4 October 2023 at 17:36
> To: dev@flink.apache.org 
> Subject: [EXTERNAL] Re: [ANNOUNCE] Release 1.18.0, release candidate #0
> Hi David,
>
> First of all, we should have enough time to wait for those issues to
> be resolved. Secondly, it makes less sense to block upstream release by
> downstream build issues. In case, those issues might need more time, we
> should move forward with the Flink release without waiting for them. WDYT?
>
> Best regards,
> Jing
>
> On Wed, Oct 4, 2023 at 6:15 PM David Radley 
> wrote:
>
> > Hi ,
> > As release 1.18 removes  the kafka connector from the core Flink
> > repository, I assume we will wait until the kafka connector nightly build
> > issues https://issues.apache.org/jira/browse/FLINK-33104   and
> > https://issues.apache.org/jira/browse/FLINK-33017   are resolved before
> > releasing 1.18?
> >
> >  Kind regards, David.
> >
> >
> > From: Jing Ge 
> > Date: Wednesday, 27 September 2023 at 15:11
> > To: dev@flink.apache.org 
> > Subject: [EXTERNAL] Re: [ANNOUNCE] Release 1.18.0, release candidate #0
> > Hi Folks,
> >
> > @Ryan FYI: CI passed and the PR has been merged. Thanks!
> >
> > If there are no more other concerns, I will start publishing 1.18-rc1.
> >
> > Best regards,
> > Jing
> >
> > On Mon, Sep 25, 2023 at 1:40 PM Jing Ge  wrote:
> >
> > > Hi Ryan,
> > >
> > > Thanks for reaching out. It is fine to include it but we need to wait
> > > until the CI passes. I am not sure how long it will take, since there
> > seems
> > > to be some infra issues.
> > >
> > > Best regards,
> > > Jing
> > >
> > > On Mon, Sep 25, 2023 at 11:34 AM Ryan Skraba
> > 
> > > wrote:
> > >
> > >> Hello!  There's a security fix that probably should be applied to 1.18
> > >> in the next RC1 : https://github.com/apache/flink/pull/23461(bump
> to
> > >> snappy-java).  Do you think this would be possible to include?
> > >>
> > >> All my best, Ryan
> > >>
> > >> [1]: https://issues.apache.org/jira/browse/FLINK-33149"Bump
> > >> snappy-java to 1.1.10.4"
> > >>
> > >>
> > >>
> > >> On Mon, Sep 25, 2023 at 3:54 PM Jing Ge 
> > >> wrote:
> > >> >
> > >> > Thanks Zakelly for the update! Appreciate it!
> > >> >
> > >> > @Piotr Nowojski  If you do not have any other
> > >> > concerns, I will move forward to create 1.18 rc1 and start voting.
> > WDYT?
> > >> >
> > >> > Best regards,
> > >> > Jing
> > >> >
> > >> > On Mon, Sep 25, 2023 at 2:20 AM Zakelly Lan 
> > >> wrote:
> > >> >
> > >> > > Hi Jing and everyone,
> > >> > >
> > >> > > I have conducted three rounds of benchmarking with Java11,
> comparing
> > >> > > release 1.18 (commit: deb07e99560[1]) with commit 6d62f9918ea[2].
> > The
> > >> > > results are attached[3]. Most of the tests show no obvious
> > regression.
> > >> > > However, I did observe significant change in several tests. Upon
> > >> > > reviewing the historical results from the previous pipelin

Re: [ANNOUNCE] Release 1.18.0, release candidate #1

2023-10-06 Thread Sergey Nuyanzin
sorry for not mentioning it in previous mail

based on the reason above I'm
-1 (non-binding)

also there is one more issue [1]
which blocks all the externalised connectors testing against the most
recent commits in
to corresponding branches
[1] https://issues.apache.org/jira/browse/FLINK-33175


On Thu, Oct 5, 2023 at 11:19 PM Sergey Nuyanzin  wrote:

> Thanks for creating RC1
>
> * Downloaded artifacts
> * Built from sources
> * Verified checksums and gpg signatures
> * Verified versions in pom files
> * Checked NOTICE, LICENSE files
>
> The strange thing I faced is
> CheckpointAfterAllTasksFinishedITCase.testRestoreAfterSomeTasksFinished
> fails on AZP [1]
>
> which looks like it is related to [2], [3] fixed  in 1.18.0 (not 100%
> sure).
>
>
> [1] https://issues.apache.org/jira/browse/FLINK-33186
> [2] https://issues.apache.org/jira/browse/FLINK-32996
> [3] https://issues.apache.org/jira/browse/FLINK-32907
>
> On Tue, Oct 3, 2023 at 2:53 PM Ferenc Csaky 
> wrote:
>
>> Thanks everyone for the efforts!
>>
>> Checked the following:
>>
>> - Downloaded artifacts
>> - Built Flink from source
>> - Verified checksums/signatures
>> - Verified NOTICE, LICENSE files
>> - Deployed dummy SELECT job via SQL gateway on standalone cluster, things
>> seemed fine according to the log files
>>
>> +1 (non-binding)
>>
>> Best,
>> Ferenc
>>
>>
>> --- Original Message ---
>> On Friday, September 29th, 2023 at 22:12, Gabor Somogyi <
>> gabor.g.somo...@gmail.com> wrote:
>>
>>
>> >
>> >
>> > Thanks for the efforts!
>> >
>> > +1 (non-binding)
>> >
>> > * Verified versions in the poms
>> > * Built from source
>> > * Verified checksums and signatures
>> > * Started basic workloads with kubernetes operator
>> > * Verified NOTICE and LICENSE files
>> >
>> > G
>> >
>> > On Fri, Sep 29, 2023, 18:16 Matthias Pohl matthias.p...@aiven.io.invalid
>> >
>> > wrote:
>> >
>> > > Thanks for creating RC1. I did the following checks:
>> > >
>> > > * Downloaded artifacts
>> > > * Built Flink from sources
>> > > * Verified SHA512 checksums GPG signatures
>> > > * Compared checkout with provided sources
>> > > * Verified pom file versions
>> > > * Went over NOTICE file/pom files changes without finding anything
>> > > suspicious
>> > > * Deployed standalone session cluster and ran WordCount example in
>> batch
>> > > and streaming: Nothing suspicious in log files found
>> > >
>> > > +1 (binding)
>> > >
>> > > On Fri, Sep 29, 2023 at 10:34 AM Etienne Chauchot
>> echauc...@apache.org
>> > > wrote:
>> > >
>> > > > Hi all,
>> > > >
>> > > > Thanks to the team for this RC.
>> > > >
>> > > > I did a quick check of this RC against user pipelines (1) coded with
>> > > > DataSet (even if deprecated and soon removed), DataStream and SQL
>> APIs
>> > > >
>> > > > based on the small scope of this test, LGTM
>> > > >
>> > > > +1 (non-binding)
>> > > >
>> > > > [1] https://github.com/echauchot/tpcds-benchmark-flink
>> > > >
>> > > > Best
>> > > > Etienne
>> > > >
>> > > > Le 28/09/2023 à 19:35, Jing Ge a écrit :
>> > > >
>> > > > > Hi everyone,
>> > > > >
>> > > > > The RC1 for Apache Flink 1.18.0 has been created. The related
>> voting
>> > > > > process will be triggered once the announcement is ready. The RC1
>> has
>> > > > > all
>> > > > > the artifacts that we would typically have for a release, except
>> for
>> > > > > the
>> > > > > release note and the website pull request for the release
>> announcement.
>> > > > >
>> > > > > The following contents are available for your review:
>> > > > >
>> > > > > - Confirmation of no benchmarks regression at the thread[1].
>> > > > > - The preview source release and binary convenience releases [2],
>> which
>> > > > > are signed with the key with fingerprint 96AE0E32CBE6E0753CE6 [3].
>> > > > > - all artifacts that would normally be deployed to the Maven
>> > > > > Central Repository [4].
>> > > > > - source code tag "release-1.18.0-rc1" [5]
>> > > > >
>> > > > > Your help testing the release will be greatly appreciated! And
>> we'll
>> > > > > create the rc1 release and the voting thread as soon as all the
>> efforts
>> > > > > are
>> > > > > finished.
>> > > > >
>> > > > > [1]
>> https://lists.apache.org/thread/yxyphglwwvq57wcqlfrnk3qo9t3sr2ro
>> > > > > [2]https://dist.apache.org/repos/dist/dev/flink/flink-1.18.0-rc1/
>> > > > > [3]https://dist.apache.org/repos/dist/release/flink/KEYS
>> > > > > [4]
>> > > > >
>> https://repository.apache.org/content/repositories/orgapacheflink-1657
>> > > > > [5]
>> https://github.com/apache/flink/releases/tag/release-1.18.0-rc1
>> > > > >
>> > > > > Best regards,
>> > > > > Qingsheng, Sergei, Konstantin and Jing
>>
>
>
> --
> Best regards,
> Sergey
>


-- 
Best regards,
Sergey


[jira] [Created] (FLINK-33197) PyFlink support for ByteArraySchema

2023-10-06 Thread Liu Chong (Jira)
Liu Chong created FLINK-33197:
-

 Summary: PyFlink support for ByteArraySchema
 Key: FLINK-33197
 URL: https://issues.apache.org/jira/browse/FLINK-33197
 Project: Flink
  Issue Type: New Feature
  Components: API / Python
Affects Versions: 1.17.0
Reporter: Liu Chong


Currently in Python Flink API, when reading messages from a Kafka source, only 
SimpleStringSchema is available.
If the data is in arbitary binary format(e.g. marshalled Protocol Buffer msg) 
it may not be decodable with the default 'utf-8' encoding. 
There's currently a workaround which is to manually set the encoding to 
'ISO-8859-1' which supports all possible byte combinations. 
However this is not an elegant solution.
We should support ByteArraySchema which outputs a raw byte array for subsequent 
unmarshalling.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] FLIP-370 : Support Balanced Tasks Scheduling

2023-10-06 Thread xiangyu feng
Thanks Yuepeng and Rui for driving this Discussion.

Internally when we try to use Flink 1.17.1 in production, we are also
suffering from the unbalanced task distribution problem for jobs with high
qps and complex dag. So +1 for the overall proposal.

Some questions about the details:

1, About the waiting mechanism: Will the waiting mechanism happen only in
the second level 'assigning slots to TM'?  IIUC, the first level 'assigning
Tasks to Slots' needs only the asynchronous slot result from slotpool.

2, About the slot LoadingWeight: it is reasonable to use the number of
tasks by default in the beginning, but it would be better if this could be
easily extended in future to distinguish between CPU-intensive and
IO-intensive workloads. In some cases, TMs may have IO bottlenecks but
others have CPU bottlenecks.

Regards,
Xiangyu


Yuepeng Pan  于2023年10月5日周四 18:34写道:

> Hi, Zhu Zhu,
>
> Thanks for your feedback!
>
> > I think we can introduce a new config option
> > `taskmanager.load-balance.mode`,
> > which accepts "None"/"Slots"/"Tasks". `cluster.evenly-spread-out-slots`
> > can be superseded by the "Slots" mode and get deprecated. In the future
> > it can support more mode, e.g. "CpuCores", to work better for jobs with
> > fine-grained resources. The proposed config option
> > `slot.request.max-interval`
> > then can be renamed to
> `taskmanager.load-balance.request-stablizing-timeout`
> > to show its relation with the feature. The proposed
> `slot.sharing-strategy`
> > is not needed, because the configured "Tasks" mode will do the work.
>
> The new proposed configuration option sounds good to me.
>
> I have a small question, If we set our configuration value to 'Tasks,' it
> will initiate two processes: balancing the allocation of task quantities at
> the slot level and balancing the number of tasks across TaskManagers (TMs).
> Alternatively, if we configure it as 'Slots,' the system will employ the
> LocalPreferred allocation policy (which is the default) when assigning
> tasks to slots, and it will ensure that the number of slots used across TMs
> is balanced.
> Does  this configuration essentially combine a balanced selection strategy
> across two dimensions into fixed configuration items, right?
>
> I would appreciate it if you could correct me if I've made any errors.
>
> Best,
> Yuepeng.
>


Re: [ANNOUNCE] Release 1.18.0, release candidate #1

2023-10-06 Thread Konstantin Knauf
Hi everyone,

I've just opened a PR for the release announcement [1] and I am looking
forward to reviews and feedback.

Cheers,

Konstantin

[1] https://github.com/apache/flink-web/pull/680

Am Fr., 6. Okt. 2023 um 11:03 Uhr schrieb Sergey Nuyanzin <
snuyan...@gmail.com>:

> sorry for not mentioning it in previous mail
>
> based on the reason above I'm
> -1 (non-binding)
>
> also there is one more issue [1]
> which blocks all the externalised connectors testing against the most
> recent commits in
> to corresponding branches
> [1] https://issues.apache.org/jira/browse/FLINK-33175
>
>
> On Thu, Oct 5, 2023 at 11:19 PM Sergey Nuyanzin 
> wrote:
>
> > Thanks for creating RC1
> >
> > * Downloaded artifacts
> > * Built from sources
> > * Verified checksums and gpg signatures
> > * Verified versions in pom files
> > * Checked NOTICE, LICENSE files
> >
> > The strange thing I faced is
> > CheckpointAfterAllTasksFinishedITCase.testRestoreAfterSomeTasksFinished
> > fails on AZP [1]
> >
> > which looks like it is related to [2], [3] fixed  in 1.18.0 (not 100%
> > sure).
> >
> >
> > [1] https://issues.apache.org/jira/browse/FLINK-33186
> > [2] https://issues.apache.org/jira/browse/FLINK-32996
> > [3] https://issues.apache.org/jira/browse/FLINK-32907
> >
> > On Tue, Oct 3, 2023 at 2:53 PM Ferenc Csaky 
> > wrote:
> >
> >> Thanks everyone for the efforts!
> >>
> >> Checked the following:
> >>
> >> - Downloaded artifacts
> >> - Built Flink from source
> >> - Verified checksums/signatures
> >> - Verified NOTICE, LICENSE files
> >> - Deployed dummy SELECT job via SQL gateway on standalone cluster,
> things
> >> seemed fine according to the log files
> >>
> >> +1 (non-binding)
> >>
> >> Best,
> >> Ferenc
> >>
> >>
> >> --- Original Message ---
> >> On Friday, September 29th, 2023 at 22:12, Gabor Somogyi <
> >> gabor.g.somo...@gmail.com> wrote:
> >>
> >>
> >> >
> >> >
> >> > Thanks for the efforts!
> >> >
> >> > +1 (non-binding)
> >> >
> >> > * Verified versions in the poms
> >> > * Built from source
> >> > * Verified checksums and signatures
> >> > * Started basic workloads with kubernetes operator
> >> > * Verified NOTICE and LICENSE files
> >> >
> >> > G
> >> >
> >> > On Fri, Sep 29, 2023, 18:16 Matthias Pohl
> matthias.p...@aiven.io.invalid
> >> >
> >> > wrote:
> >> >
> >> > > Thanks for creating RC1. I did the following checks:
> >> > >
> >> > > * Downloaded artifacts
> >> > > * Built Flink from sources
> >> > > * Verified SHA512 checksums GPG signatures
> >> > > * Compared checkout with provided sources
> >> > > * Verified pom file versions
> >> > > * Went over NOTICE file/pom files changes without finding anything
> >> > > suspicious
> >> > > * Deployed standalone session cluster and ran WordCount example in
> >> batch
> >> > > and streaming: Nothing suspicious in log files found
> >> > >
> >> > > +1 (binding)
> >> > >
> >> > > On Fri, Sep 29, 2023 at 10:34 AM Etienne Chauchot
> >> echauc...@apache.org
> >> > > wrote:
> >> > >
> >> > > > Hi all,
> >> > > >
> >> > > > Thanks to the team for this RC.
> >> > > >
> >> > > > I did a quick check of this RC against user pipelines (1) coded
> with
> >> > > > DataSet (even if deprecated and soon removed), DataStream and SQL
> >> APIs
> >> > > >
> >> > > > based on the small scope of this test, LGTM
> >> > > >
> >> > > > +1 (non-binding)
> >> > > >
> >> > > > [1] https://github.com/echauchot/tpcds-benchmark-flink
> >> > > >
> >> > > > Best
> >> > > > Etienne
> >> > > >
> >> > > > Le 28/09/2023 à 19:35, Jing Ge a écrit :
> >> > > >
> >> > > > > Hi everyone,
> >> > > > >
> >> > > > > The RC1 for Apache Flink 1.18.0 has been created. The related
> >> voting
> >> > > > > process will be triggered once the announcement is ready. The
> RC1
> >> has
> >> > > > > all
> >> > > > > the artifacts that we would typically have for a release, except
> >> for
> >> > > > > the
> >> > > > > release note and the website pull request for the release
> >> announcement.
> >> > > > >
> >> > > > > The following contents are available for your review:
> >> > > > >
> >> > > > > - Confirmation of no benchmarks regression at the thread[1].
> >> > > > > - The preview source release and binary convenience releases
> [2],
> >> which
> >> > > > > are signed with the key with fingerprint 96AE0E32CBE6E0753CE6
> [3].
> >> > > > > - all artifacts that would normally be deployed to the Maven
> >> > > > > Central Repository [4].
> >> > > > > - source code tag "release-1.18.0-rc1" [5]
> >> > > > >
> >> > > > > Your help testing the release will be greatly appreciated! And
> >> we'll
> >> > > > > create the rc1 release and the voting thread as soon as all the
> >> efforts
> >> > > > > are
> >> > > > > finished.
> >> > > > >
> >> > > > > [1]
> >> https://lists.apache.org/thread/yxyphglwwvq57wcqlfrnk3qo9t3sr2ro
> >> > > > > [2]
> https://dist.apache.org/repos/dist/dev/flink/flink-1.18.0-rc1/
> >> > > > > [3]https://dist.apache.org/repos/dist/release/flink/KEYS
> >> > > > > [4]
> >> > > > >
> >> https://reposi

[jira] [Created] (FLINK-33198) Add timestamp with local time zone support in Avro converters

2023-10-06 Thread Zhenqiu Huang (Jira)
Zhenqiu Huang created FLINK-33198:
-

 Summary: Add timestamp with local time zone support in Avro 
converters
 Key: FLINK-33198
 URL: https://issues.apache.org/jira/browse/FLINK-33198
 Project: Flink
  Issue Type: Improvement
  Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile)
Reporter: Zhenqiu Huang
 Fix For: 1.18.1


Currently, RowDataToAvroConverters doesn't handle with LogicType 
TIMESTAMP_WITH_LOCAL_TIME_ZONE. We should add the corresponding conversion.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-33199) ArchitectureTests should test for canonical class names instead of Class objects

2023-10-06 Thread Alexander Fedulov (Jira)
Alexander Fedulov created FLINK-33199:
-

 Summary: ArchitectureTests should test for canonical class names 
instead of Class objects
 Key: FLINK-33199
 URL: https://issues.apache.org/jira/browse/FLINK-33199
 Project: Flink
  Issue Type: Improvement
  Components: Tests
Reporter: Alexander Fedulov


Currently architecture tests rely on importing such classes as 
MiniClusterExtension. This introduces a production scope dependency on 
flink-test-utils which in turn depends on flink-streaming-java. This is 
problematic because adding architecture tests to any direct or transitive 
dependency of flink-streaming-java creates a dependency cycle.

Example: https://github.com/apache/flink/pull/22850#discussion_r1243343382

In general, since architecture tests are supposed to be used freely in any 
submodule, it is desirable to reduce its dependency surface as much as possible 
to prevent such cycles. 

This can be achieved by moving away from using Class objects and employing 
fully qualified type names checks instead.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


dev@flink.apache.org

2023-10-06 Thread Joao Boto
Hi all, Thank you to everyone for the feedback on FLIP-239[1]. Based on the
discussion thread [2] and some offline discussions, we have come to a
consensus on the design and are ready to take a vote to contribute this to
Flink. I'd like to start a vote for it. The vote will be open for at least
72 hours(excluding weekends, unless there is an objection or an
insufficient number of votes. [1]
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=217386271
[2]https://lists.apache.org/thread/yx833h5h3fjlyor0bfm32chy3sjw8hwt  Best,
Joao Boto


UDF support parity in Scala

2023-10-06 Thread mojhaha kiklasds
Hi team,

I am looking to understand parity in Table API and SQL API.
Would you say that we have parity for UDF?
I see the TableAggregateFunction can not be represented in Flink SQL but
only in Table API. Have you known it to work like other UDFs?


[jira] [Created] (FLINK-33200) ItemAt Expression validation fail in Table API due to type mismatch

2023-10-06 Thread Zhenqiu Huang (Jira)
Zhenqiu Huang created FLINK-33200:
-

 Summary: ItemAt Expression validation fail in Table API due to 
type mismatch
 Key: FLINK-33200
 URL: https://issues.apache.org/jira/browse/FLINK-33200
 Project: Flink
  Issue Type: Bug
  Components: Table SQL / API
Reporter: Zhenqiu Huang
 Fix For: 1.8.4


The table schema is defined as below:

public static final DataType DATA_TYPE = DataTypes.ROW(
DataTypes.FIELD("id", DataTypes.STRING()),
DataTypes.FIELD("events", 
DataTypes.ARRAY(DataTypes.MAP(DataTypes.STRING(), DataTypes.STRING(
);

public static final Schema SCHEMA = 
Schema.newBuilder().fromRowDataType(DATA_TYPE).build();


inTable.select(Expressions.$("events").at(1).at("eventType").as("firstEventType")

The validation fail as "eventType" is inferred as 
BasicTypeInfo.STRING_TYPE_INFO, the table key internally is a 
StringDataTypeInfo. The validation fail at 

case mti: MapTypeInfo[_, _] =>
if (key.resultType == mti.getKeyTypeInfo) {
  ValidationSuccess
} else {
  ValidationFailure(
s"Map entry access needs a valid key of type " +
  s"'${mti.getKeyTypeInfo}', found '${key.resultType}'.")
}








--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] FLIP-370 : Support Balanced Tasks Scheduling

2023-10-06 Thread Shammon FY
Thanks Rui, I check the codes and you're right.

As you described above, the entire process is actually two independent
steps from slot to TM and task to slot. Currenlty we use option
`cluster.evenly-spread-out-slots` for both of them. Can we provide
different options for the two steps, such as ANY/SLOTS for RM and ANY/TASKS
for slot pool?

I want this because we would like to add a new slot to TM strategy such as
SLOTS_NUM in the future for OLAP to improve the performance for olap jobs,
which will use TASKS strategy for task to slot. cc Guoyangze

Best,
Shammon FY

On Fri, Oct 6, 2023 at 6:19 PM xiangyu feng  wrote:

> Thanks Yuepeng and Rui for driving this Discussion.
>
> Internally when we try to use Flink 1.17.1 in production, we are also
> suffering from the unbalanced task distribution problem for jobs with high
> qps and complex dag. So +1 for the overall proposal.
>
> Some questions about the details:
>
> 1, About the waiting mechanism: Will the waiting mechanism happen only in
> the second level 'assigning slots to TM'?  IIUC, the first level 'assigning
> Tasks to Slots' needs only the asynchronous slot result from slotpool.
>
> 2, About the slot LoadingWeight: it is reasonable to use the number of
> tasks by default in the beginning, but it would be better if this could be
> easily extended in future to distinguish between CPU-intensive and
> IO-intensive workloads. In some cases, TMs may have IO bottlenecks but
> others have CPU bottlenecks.
>
> Regards,
> Xiangyu
>
>
> Yuepeng Pan  于2023年10月5日周四 18:34写道:
>
> > Hi, Zhu Zhu,
> >
> > Thanks for your feedback!
> >
> > > I think we can introduce a new config option
> > > `taskmanager.load-balance.mode`,
> > > which accepts "None"/"Slots"/"Tasks". `cluster.evenly-spread-out-slots`
> > > can be superseded by the "Slots" mode and get deprecated. In the future
> > > it can support more mode, e.g. "CpuCores", to work better for jobs with
> > > fine-grained resources. The proposed config option
> > > `slot.request.max-interval`
> > > then can be renamed to
> > `taskmanager.load-balance.request-stablizing-timeout`
> > > to show its relation with the feature. The proposed
> > `slot.sharing-strategy`
> > > is not needed, because the configured "Tasks" mode will do the work.
> >
> > The new proposed configuration option sounds good to me.
> >
> > I have a small question, If we set our configuration value to 'Tasks,' it
> > will initiate two processes: balancing the allocation of task quantities
> at
> > the slot level and balancing the number of tasks across TaskManagers
> (TMs).
> > Alternatively, if we configure it as 'Slots,' the system will employ the
> > LocalPreferred allocation policy (which is the default) when assigning
> > tasks to slots, and it will ensure that the number of slots used across
> TMs
> > is balanced.
> > Does  this configuration essentially combine a balanced selection
> strategy
> > across two dimensions into fixed configuration items, right?
> >
> > I would appreciate it if you could correct me if I've made any errors.
> >
> > Best,
> > Yuepeng.
> >
>


[RESULT][VOTE] FLIP-314: Support Customized Job Lineage Listener

2023-10-06 Thread Shammon FY
Hi devs,

Thanks everyone for your review and the votes!

I am happy to announce that FLIP-314: Support Customized Job Lineage
Listener [1] has been accepted.

There are 5 binding votes and 4 non-binding vote [2]:
- Jing Ge(binding)
- Feng Jin
- Leonard Xu(binding)
- Yangze Guo(binding)
- Yun Tang(binding)
- 曹帝胄
- Chen Zhanghao
- Rui Fan(binding)
- Yuepeng Pan

There is no disapproving vote.

Best,
Shammon FY

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-314%3A+Support+Customized+Job+Lineage+Listener
[2] https://lists.apache.org/thread/dxdqjc0dd8rf1vbdg755zo1n2zj1tj8d


[jira] [Created] (FLINK-33201) Memory leak in CachingTopicSelector

2023-10-06 Thread Sunyeop Lee (Jira)
Sunyeop Lee created FLINK-33201:
---

 Summary: Memory leak in CachingTopicSelector
 Key: FLINK-33201
 URL: https://issues.apache.org/jira/browse/FLINK-33201
 Project: Flink
  Issue Type: Bug
  Components: Connectors / Kafka
 Environment: I am out of office now, so this is what I remember (the 
flink version may not be correct). Because I already identified the cause, this 
should not matter anyway.

EKS 1.24, x86_64, Bottlerocket OS, flink 1.14, scala 2.12
Reporter: Sunyeop Lee
 Attachments: 273084767-29bc0d8a-7445-4a74-a6e1-7c836775c7b1.png

Pull Request available at: 
https://github.com/apache/flink-connector-kafka/pull/55

 

In the CachingTopicSelector, a memory leak may occur when the internal logic 
fails to check the cache size due to a race condition. 
([https://github.com/apache/flink-connector-kafka/blob/d89a082180232bb79e3c764228c4e7dbb9eb6b8b/flink-connector-kafka/src/main/java/org/apache/flink/connector/kafka/sink/KafkaRecordSerializationSchemaBuilder.java#L287-L289)]

 

By analyzing a Java heap dump, I identified a memory leak in the 
CachingTopicSelector. As in the screenshot, cache has 47,769 elements. If the 
internal logic were functioning correctly, the number of elements should be 
less than or equal to CACHE_RESET_SIZE (which is 5).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-33202) FLIP-327: Support switching from batch to stream mode to improve throughput when processing backlog data

2023-10-06 Thread Xuannan Su (Jira)
Xuannan Su created FLINK-33202:
--

 Summary: FLIP-327: Support switching from batch to stream mode to 
improve throughput when processing backlog data
 Key: FLINK-33202
 URL: https://issues.apache.org/jira/browse/FLINK-33202
 Project: Flink
  Issue Type: New Feature
  Components: Runtime / Task
Reporter: Xuannan Su
 Fix For: 1.19.0


Umbrella issue for 
[https://cwiki.apache.org/confluence/display/FLINK/FLIP-327%3A+Support+switching+from+batch+to+stream+mode+to+improve+throughput+when+processing+backlog+data|https://cwiki.apache.org/confluence/display/FLINK/FLIP-309%3A+Support+using+larger+checkpointing+interval+when+source+is+processing+backlog]
h4.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)