Re: [VOTE] FLIP-391: Deprecate RuntimeContext#getExecutionConfig
+1 (binding)

Thanks,
Zhu

Rui Fan <1996fan...@gmail.com> wrote on Tue, Nov 28, 2023 at 13:11:
> +1(binding)
>
> Best,
> Rui
>
> On Tue, Nov 28, 2023 at 12:34 PM Junrui Lee wrote:
> >
> > Hi everyone,
> >
> > Thank you to everyone for the feedback on FLIP-391: Deprecate
> > RuntimeContext#getExecutionConfig [1], which has been discussed in this
> > thread [2].
> >
> > I would like to start a vote for it. The vote will be open for at least 72
> > hours unless there is an objection or not enough votes.
> >
> > [1] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=278465937
> > [2] https://lists.apache.org/thread/cv4x3372g5txw1j20r5l1vwlqrvjoqq5
[jira] [Created] (FLINK-33670) Public operators cannot be reused in multi sinks
Lyn Zhang created FLINK-33670:
---------------------------------
             Summary: Public operators cannot be reused in multi sinks
                 Key: FLINK-33670
                 URL: https://issues.apache.org/jira/browse/FLINK-33670
             Project: Flink
          Issue Type: Improvement
          Components: Table SQL / Planner
    Affects Versions: 1.18.0
            Reporter: Lyn Zhang
         Attachments: image-2023-11-28-14-31-30-153.png

Dear all:

I find that some public operators cannot be reused when submitting a job with multiple sinks. An example follows:

{code:java}
CREATE TABLE source (
    id STRING,
    ts TIMESTAMP(3),
    v BIGINT,
    WATERMARK FOR ts AS ts - INTERVAL '3' SECOND
) WITH (
    'connector' = 'socket',
    'hostname' = 'localhost',
    'port' = '',
    'byte-delimiter' = '10',
    'format' = 'json'
);

CREATE VIEW source_distinct AS
SELECT * FROM (
    SELECT *, ROW_NUMBER() OVER w AS row_nu
    FROM source
    WINDOW w AS (PARTITION BY id ORDER BY proctime() ASC)
) WHERE row_nu = 1;

CREATE TABLE print1 (
    id STRING,
    ts TIMESTAMP(3)
) WITH ('connector' = 'blackhole');

INSERT INTO print1 SELECT id, ts FROM source_distinct;

CREATE TABLE print2 (
    id STRING,
    ts TIMESTAMP(3),
    v BIGINT
) WITH ('connector' = 'blackhole');

INSERT INTO print2
SELECT id, TUMBLE_START(ts, INTERVAL '20' SECOND), SUM(v)
FROM source_distinct
GROUP BY TUMBLE(ts, INTERVAL '20' SECOND), id;
{code}

!image-2023-11-28-14-31-30-153.png|width=384,height=145!

I checked the code: Flink adds the rule CoreRules.PROJECT_MERGE by default. This produces different rel digests for the deduplicate operator and ultimately causes the matching of common operators to fail. In real production environments, reusing common operators like deduplicate is more valuable than project merge. A good solution would be to interrupt the project merge across shuffle operators in multi-sink cases. How do you see it? Looking forward to your reply.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[jira] [Created] (FLINK-33669) Update the documentation for RestartStrategy, Checkpoint Storage, and State Backend.
Junrui Li created FLINK-33669:
---------------------------------
             Summary: Update the documentation for RestartStrategy, Checkpoint Storage, and State Backend.
                 Key: FLINK-33669
                 URL: https://issues.apache.org/jira/browse/FLINK-33669
             Project: Flink
          Issue Type: Technical Debt
          Components: Documentation
            Reporter: Junrui Li

After the deprecation of complex Java object getter and setter methods in FLIP-381, Flink now recommends the use of ConfigOptions for configuring RestartStrategy, Checkpoint Storage, and State Backend. We need to update the Flink documentation to clearly instruct users on this new recommended approach.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
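For context, the ConfigOption-based style the ticket refers to can be sketched as a flink-conf.yaml fragment. This is an illustration only; the option keys below are taken from recent Flink releases and may differ between versions:

```yaml
# Restart strategy configured via options instead of
# RestartStrategies.fixedDelayRestart(...) in code:
restart-strategy.type: fixed-delay
restart-strategy.fixed-delay.attempts: 3
restart-strategy.fixed-delay.delay: 10 s

# State backend and checkpoint storage configured via options instead of
# env.setStateBackend(...) / setCheckpointStorage(...):
state.backend.type: rocksdb
state.checkpoint-storage: filesystem
state.checkpoints.dir: file:///tmp/flink-checkpoints
```

The same options can also be set programmatically on a `Configuration` object, which is what makes the Java getter/setter objects redundant.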
Re: [VOTE] FLIP-391: Deprecate RuntimeContext#getExecutionConfig
+1(binding)

Best,
Rui

On Tue, Nov 28, 2023 at 12:34 PM Junrui Lee wrote:
> Hi everyone,
>
> Thank you to everyone for the feedback on FLIP-391: Deprecate
> RuntimeContext#getExecutionConfig [1], which has been discussed in this
> thread [2].
>
> I would like to start a vote for it. The vote will be open for at least 72
> hours unless there is an objection or not enough votes.
>
> [1] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=278465937
> [2] https://lists.apache.org/thread/cv4x3372g5txw1j20r5l1vwlqrvjoqq5
[VOTE] FLIP-391: Deprecate RuntimeContext#getExecutionConfig
Hi everyone,

Thank you to everyone for the feedback on FLIP-391: Deprecate RuntimeContext#getExecutionConfig [1], which has been discussed in this thread [2].

I would like to start a vote for it. The vote will be open for at least 72 hours unless there is an objection or not enough votes.

[1] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=278465937
[2] https://lists.apache.org/thread/cv4x3372g5txw1j20r5l1vwlqrvjoqq5
[jira] [Created] (FLINK-33668) Decoupling Shuffle network memory and job topology
Jiang Xin created FLINK-33668:
---------------------------------
             Summary: Decoupling Shuffle network memory and job topology
                 Key: FLINK-33668
                 URL: https://issues.apache.org/jira/browse/FLINK-33668
             Project: Flink
          Issue Type: Improvement
          Components: Runtime / Network
            Reporter: Jiang Xin
             Fix For: 1.19.0

With [FLINK-30469|https://issues.apache.org/jira/browse/FLINK-30469] and [FLINK-31643|https://issues.apache.org/jira/browse/FLINK-31643], we have decoupled the shuffle network memory from the parallelism of tasks by limiting the number of buffers for each InputGate and ResultPartition. However, when too many shuffle tasks are running simultaneously on the same TaskManager, "Insufficient number of network buffers" errors can still occur. This usually happens when Slot Sharing Group is enabled or a TaskManager contains multiple slots. We need to make sure that the TaskManager does not encounter "Insufficient number of network buffers" even if there are dozens of InputGates and ResultPartitions running on the same TaskManager simultaneously.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[RESULT][VOTE] FLIP-378: Support Avro timestamp with local timezone
Dear developers, FLIP-378 [1] has been accepted and voted through this thread [2]. The proposal received 6 approving votes, 5 of which are binding, and there is no disapproval. - Gyula Fora (binding) - Jark Wu (binding) - Jing Ge (binding) - Leonard Xu (binding) - Matyas Orhidi (binding) - MingLiang Liu (non-binding) Thanks to all participants for the discussion, voting, and providing valuable feedback. [1] https://cwiki.apache.org/confluence/x/Hgt1E [2] https://lists.apache.org/thread/7hls4813xmq01wbmo90jtfb5chr3mpr2 Best Regards, Peter Huang
[jira] [Created] (FLINK-33667) Implement restore tests for MatchRecognize node
Jim Hughes created FLINK-33667:
----------------------------------
             Summary: Implement restore tests for MatchRecognize node
                 Key: FLINK-33667
                 URL: https://issues.apache.org/jira/browse/FLINK-33667
             Project: Flink
          Issue Type: Sub-task
            Reporter: Jim Hughes
            Assignee: Jim Hughes

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
Re: [DISCUSS] FLIP-394: Add Metrics for Connector Agnostic Autoscaling
Hi Leonard,

Thanks for the review! See my responses below:

1. Let's take for example the case where a split is a file (i.e. what I call bounded). To calculate the pending records, the connector needs to know how many splits are left and the number of records in each split. If a reader only knows its own split (i.e. file), then it wouldn't be able to calculate the pending records. Furthermore, unlike Kafka, there is no concept of a partition--a reader might be assigned any file, under any file path, etc.

2. Thanks for the catch; it looks like I didn't commit my changes.

3. Similar to my response for (1), the metric implemented depends on whether the split is bounded or unbounded. It wouldn't make much sense for a connector to implement both. We could unify reporting the metric in the enumerator, but see the Rejected Alternatives section for why not.

4. Same issue here as in (2) regarding the implementation; it looks like I didn't commit my changes. The autoscaling algorithm needs to determine the maximum theoretical parallelism of a source (e.g. the max number of splits). To handle the case when a split is completed, we need to report the splits that are assigned to a reader and decrement the gauge when the split is completed. If we want to introduce an additional RPC to the enumerator when a split is completed, we can send an event from the reader in that scenario. However, note that the enumerator doesn't know how many splits are assigned in all cases, e.g. in the case that a job restarts. I generally agree with you that the final status is awkward, and I do agree that reporting this metric from the enumerator is cleaner, interface-wise. To solve the problem of tracking splits across job restarts, we can send the splits back to the enumerator from the readers on restore (the enumerator already has an API to handle `addSplitsBack`) and let the enumerator re-assign the splits.
Best,
Mason

On Sat, Nov 18, 2023 at 8:43 PM Leonard Xu wrote:
> Thanks Mason for starting this discussion thread, generally +1 for the
> motivation and proposal.
>
> I have some questions about the details after reading the FLIP.
>
> 1. The FLIP says "However, pendingRecords is currently only reported by
> the SourceReader and doesn’t cover the case for sources that only the
> SourceEnumerator can calculate those metrics e.g. bounded split
> implementations like Iceberg." Could you explain more why we cannot
> calculate metrics on the SourceReader side?
>
> 2. Minor: Looks like you mixed SplitEnumeratorMetricGroup and
> SourceReaderMetricGroup in the public interface part:
>
> @PublicEvolving
> public interface SplitEnumeratorMetricGroup extends
>         OperatorCoordinatorMetricGroup {
>     // IIUC, these methods belong to SourceReader?
>     Counter getNumRecordsInErrorsCounter();
>     void setPendingBytesGauge(Gauge pendingBytesGauge);
>     void setPendingRecordsGauge(Gauge pendingRecordsGauge);
>
>     /** new addition */
>     void setAssignedSplitsGauge(Gauge assignedSplitsGauge);
> }
>
> 3. Could you explain the relation between your proposed
> PendingRecordsGauge in SplitEnumerator and PendingRecordsGauge in
> SourceReader? E.g. which kind of connector developers need to
> care about/implement the two metrics?
>
> 4. The discussion thread context shows me that you want to introduce a
> setAssignedSplits method on the SourceReader side, but the FLIP hasn't been
> updated yet, e.g. the implementation part. Also, the final status looks
> strange to me: we calculate the total assignedSplits in the SplitEnumerator
> but the total unAssignedSplits on the SourceReader side following your
> design. The 'assign' action happens on the SplitEnumerator side and should
> surely be managed by the SplitEnumerator; why does a SourceReader, which
> never assigns any splits, need to report its assigned splits?
> Best,
> Leonard
>
> > On Nov 17, 2023, at 6:52 AM, Mason Chen wrote:
> >
> > Hi all,
> >
> > I would like to start a discussion on FLIP-394: Add Metrics for Connector
> > Agnostic Autoscaling [1].
> >
> > This FLIP recommends adding two metrics to make autoscaling work for
> > bounded split source implementations like IcebergSource. These metrics are
> > required by the Flink Kubernetes Operator autoscaler algorithm [2] to
> > retrieve information for the backlog and the maximum source parallelism.
> > The changes would affect the `@PublicEvolving` `SplitEnumeratorMetricGroup`
> > API of the source connector framework.
> >
> > Best,
> > Mason
> >
> > [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-394%3A+Add+Metrics+for+Connector+Agnostic+Autoscaling
> > [2] https://nightlies.apache.org/flink/flink-kubernetes-operator-docs-main/docs/custom-resource/autoscaler/#limitations
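The bookkeeping Mason describes in point 4 of the thread above — incrementing an assigned-splits count on assignment and decrementing it when a reader reports a split as completed — can be sketched with plain Java stand-ins (these are hypothetical classes for illustration, not Flink's real enumerator or metric types):

```java
import java.util.HashSet;
import java.util.Set;

/**
 * Minimal sketch of enumerator-side split tracking: the value exposed by a
 * Gauge registered via something like setAssignedSplitsGauge(...) would be
 * the current size of the assigned-split set.
 */
public class AssignedSplitsTracker {
    private final Set<String> assignedSplits = new HashSet<>();

    /** Called when the enumerator assigns a split to a reader. */
    public void onSplitAssigned(String splitId) {
        assignedSplits.add(splitId);
    }

    /** Called e.g. when a reader sends a "split finished" event back. */
    public void onSplitCompleted(String splitId) {
        assignedSplits.remove(splitId);
    }

    /** The value a Gauge<Long> on the metric group would report. */
    public long assignedSplitsGaugeValue() {
        return assignedSplits.size();
    }
}
```

Because the set is keyed by split id, re-reporting a completed split is idempotent, which matters when readers resend events after a restart.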
Re: [DISCUSS] Resolve diamond inheritance of Sink.createWriter
I think we should try to separate the discussion into a few different topics:

- Concrete issues
  - How to solve this problem in 1.19 wrt the affected createWriter interface
  - Update the documentation [1], so FLIP-321 is visible to every contributor
- Generic issues
  - API stability
  - Connector dependencies

*CreateWriter interface*

The change to createWriter is not strictly required for the functionality defined by the requirements of the FLIP. If the only goal is to have a backward compatible API, we can simply create a separate `*CommitterInitContext*` object and not touch the writer `*InitContext*`, as was done in the original PR [2]. The issue is that this would result in an implementation with duplicated methods/implementations (an internal issue only) and inconsistent naming (an issue for external users). If we want to create an API which is consistent (and I agree with the reviewer's comments), then we need to rename the parameter type (*WriterInitContext*) of the createWriter method. I have tried to keep backward compatibility by creating a new method and providing a default implementation for it which calls the original method after converting the WriterInitContext to InitContext.
This failed because of the following details:

- *org.apache.flink.api.connector.sink2.Sink* defines `*SinkWriter createWriter(InitContext context)*`
- *org.apache.flink.api.connector.sink2.StatefulSink* narrows it down to `*StatefulSinkWriter createWriter(InitContext context)*`
- *org.apache.flink.api.connector.sink2.TwoPhaseCommittingSink* narrows it down to `*PrecommittingSinkWriter createWriter(WriterInitContext context)*`
- *org.apache.flink.streaming.runtime.operators.sink.TestSinkV2.TestStatefulSinkV2* implements *StatefulSink* and *TwoPhaseCommittingSink* too

*TestStatefulSinkV2* is a good example where we cannot achieve backward compatibility, since the compiler will fail with unrelated default methods [3].

I am open to any suggestions on how to move to the new API and keep backward compatibility. If we do not find a way to keep backward compatibility, and we decide that we would like to honour FLIP-321, then we can revert to the original solution and keep only the changes for the `*createCommitter*` method.

*Update the documentation*

I have found only one place in the docs [1] where we talk about the compatibility guarantees. Based on FLIP-321 and the result of the discussion here, we should update this page.

*API stability*

I agree with the general sentiment of FLIP-321 to keep changes backward compatible as much as possible. But the issue above highlights that there can be situations where it is not possible to achieve backward compatibility. We should probably provide exceptions to handle these kinds of situations - minimally for PublicEvolving interfaces. After we agree on the long-term goals - allowing exceptions, being more lenient on backward compatibility guarantees, or sticking to FLIP-321 to the letter - we can discuss how to apply them to the current situation.

*Connector dependencies*

I think it is generally good practice to depend on the stable version of Flink (or any other downstream project).
This is how we do it in Iceberg, and how it was implemented in the Kafka connector as well. This results in more stable connector builds. The only issue I see is that situations like this one would take longer to surface, but I fully expect us to get better at compatibility after we have vetted the process.

[1] - https://nightlies.apache.org/flink/flink-docs-master/docs/ops/upgrading/#api-compatibility-guarantees
[2] - https://github.com/apache/flink/pull/23555/commits/2b9adeb20e55c33a623115efa97d3149c11e9ca4
[3] - https://github.com/apache/flink/pull/23555#discussion_r1371740397

Martijn Visser wrote (Mon, Nov 27, 2023, 11:21):
> Hi all,
>
> I'm opening this discussion thread to bring a discussion that's
> happening on a completed Jira ticket back to the mailing list [1]
>
> In summary:
>
> * There was a discussion and a vote on FLIP-371 [2]
> * During implementation, it was determined that there's a diamond
> inheritance problem on the Sink.createWriter method, making a
> backwards compatible change hard/impossible (I think this is where the
> main discussion point actually is) [3]
> * The PR was merged, causing a backwards incompatible change without a
> discussion on the Dev mailing list
>
> I think that in hindsight, even though there was a FLIP on this topic,
> the finding of the diamond inheritance issue should have been brought
> back to the Dev mailing list in order to agree on how to resolve it.
> Since 1.19 is still under way, we still have time to fix this.
>
> I think there's two things we can improve:
>
> 1) Next time during implementation of a FLIP/PR which involves a
> non-backward compatible change of an API that wasn't accounted for,
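The "unrelated default methods" failure described in the thread above can be reproduced with a simplified, self-contained Java model (hypothetical names, not the real Sink interfaces): two super-interfaces each provide a default method with the same signature, so a class implementing both no longer compiles unless it overrides the method itself — which is exactly the kind of change that breaks existing implementers.

```java
/**
 * Simplified model of the diamond problem: StatefulLike and TwoPhaseLike
 * stand in for StatefulSink and TwoPhaseCommittingSink, each contributing
 * an unrelated default createWriter-style method.
 */
public class DiamondDemo {
    interface StatefulLike {
        default String createWriter() { return "stateful-writer"; }
    }

    interface TwoPhaseLike {
        default String createWriter() { return "two-phase-writer"; }
    }

    // Without this explicit override, javac rejects the class with
    // "inherits unrelated defaults for createWriter() from types ...".
    static class CombinedSink implements StatefulLike, TwoPhaseLike {
        @Override
        public String createWriter() {
            // Resolve the diamond by delegating to one super-interface.
            return StatefulLike.super.createWriter();
        }
    }

    public static String create() {
        return new CombinedSink().createWriter();
    }
}
```

This is why adding a default bridge method to `Sink` does not help: any test or user class in the `TestStatefulSinkV2` position must still be edited to resolve the conflict, so the change is source-incompatible either way.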
[jira] [Created] (FLINK-33666) MergeTableLikeUtil uses different constraint name than Schema
Timo Walther created FLINK-33666:
------------------------------------
             Summary: MergeTableLikeUtil uses different constraint name than Schema
                 Key: FLINK-33666
                 URL: https://issues.apache.org/jira/browse/FLINK-33666
             Project: Flink
          Issue Type: Bug
          Components: Table SQL / Planner
            Reporter: Timo Walther

{{MergeTableLikeUtil}} uses a different algorithm to name constraints than {{Schema}}. {{Schema}} includes the column names. {{MergeTableLikeUtil}} uses a hashCode, which means it might depend on JVM specifics. For consistency we should use the same algorithm. I propose to use {{Schema}}'s logic.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
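The determinism argument in the ticket can be illustrated with a tiny stand-in helper (hypothetical code, not Flink's actual implementation): a constraint name derived from the column names is the same on every JVM and every run, whereas a name derived from a hashCode can vary for objects without a stable hashCode.

```java
import java.util.List;

/** Illustration of column-name-based constraint naming (the Schema-style approach). */
public class ConstraintNames {
    // Deterministic: depends only on the declared column names.
    public static String fromColumns(List<String> columns) {
        return "PK_" + String.join("_", columns);
    }
}
```

A hashCode-based name like `"PK_" + someObject.hashCode()` would, for classes using the default identity hash, produce a different name per JVM instance, which is what makes it unsuitable for anything persisted or compared across runs.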
Python connector question
Hi Team,

During some mostly unrelated work, we came to realize that during the externalization of the connectors, the Python modules and the related tests were not moved to the respective connector repositories. We created the Jiras [1] to create the infra and to move the Python code and the tests to the respective repos.

When working on this code, I found several oddities on which I would like to hear the community's opinion:

- Does anyone know what site-packages/pyflink/opt/flink-sql-client-1.18.0.jar is supposed to do? We specifically put it there [2], but then we ignore it when we check the directory of jars [3]. If there is no objection, I would remove it.
- I would like to use the `opt` directory to contain optional jars created by the connectors, like flink-sql-connector-kafka-3.1.x.jar.

Also, lint-python.sh [4] and install_command.sh [5] provide the base of the testing infra. Would there be any objections to marking these as public APIs for the connectors?

Thanks,
Peter

[1] - https://issues.apache.org/jira/browse/FLINK-33528
[2] - https://github.com/apache/flink/blob/2da9a9639216b8c48850ee714065f090a80dcd65/flink-python/apache-flink-libraries/setup.py#L129-L130
[3] - https://github.com/apache/flink/blob/2da9a9639216b8c48850ee714065f090a80dcd65/flink-python/pyflink/pyflink_gateway_server.py#L183-L190
[4] - https://github.com/apache/flink/blob/master/flink-python/dev/lint-python.sh
[5] - https://github.com/apache/flink/blob/master/flink-python/dev/install_command.sh
Re: Re:Re: [DISCUSS][FLINK-32993] Datagen connector handles length-constrained fields according to the schema definition by default
Hi Yubin,

Thanks for the update! +1 for keeping the default length of String and Bytes unchanged.

Best,
Lincoln Lee

lixin58...@163.com wrote on Fri, Nov 24, 2023 at 12:04:
> Thanks Jane for providing examples to make the discussion clearer.
> Thanks Lincoln and Xuyang for your feedback; I agree with you wholeheartedly
> that it is better to throw an error instead of ignoring it directly.
> Extending datagen to generate variable-length values is really an excellent
> idea; I will create another Jira to follow up.
>
> Taking the examples provided:
>
> 1. For fixed-length data types (char, binary), the two DDLs with a custom
> length should throw an exception like 'User-defined length of the
> fixed-length field f0 is not supported.'
>
> CREATE TABLE foo (
>     f0 CHAR(5)
> ) WITH ('connector' = 'datagen', 'fields.f0.length' = '10');
>
> CREATE TABLE bar (
>     f0 CHAR(5)
> ) WITH ('connector' = 'datagen', 'fields.f0.length' = '1');
>
> 2. For variable-length data types (varchar, varbinary), the following DDL
> can be executed legally; if an illegal user-defined length is configured,
> it will throw an exception like 'User-defined length of the VARCHAR field
> %s should be shorter than the schema definition.'
>
> CREATE TABLE meow (
>     f0 VARCHAR(20)
> ) WITH ('connector' = 'datagen', 'fields.f0.length' = '10');
>
> 3. For special variable-length data types, since the maximum length of
> String and Bytes is very large (2^31 - 1), when the user does not specify
> a smaller field length, fields occupying a huge amount of memory (estimated
> at more than 2GB) would be generated by default, which can easily lead to
> "java.lang.OutOfMemoryError: Java heap space". So I recommend that the
> default length of these two types stays 100, just like before, but the
> length can be configured to less than 2^31-1.
>
> CREATE TABLE purr (
>     f0 STRING
> ) WITH ('connector' = 'datagen', 'fields.f0.length' = '10');
>
> Updates have been synchronized to the merge request [1].
>
> WDYT?
>
> [1] https://github.com/apache/flink/pull/23678
>
> Best!
> Yubin
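The validation rules agreed in the thread above can be sketched as a small helper. This is a hypothetical standalone illustration (not the actual datagen connector code), with the error messages taken from the thread:

```java
/**
 * Sketch of the agreed validation: a user-defined length is rejected for
 * fixed-length types (CHAR, BINARY), and for variable-length types
 * (VARCHAR, VARBINARY) it must not exceed the declared schema length.
 */
public class DatagenLengthValidation {
    public static void validate(boolean isFixedLength, String fieldName,
                                int schemaLength, Integer userLength) {
        if (userLength == null) {
            return; // fall back to the schema-defined (or default) length
        }
        if (isFixedLength) {
            throw new IllegalArgumentException(
                "User-defined length of the fixed-length field " + fieldName
                    + " is not supported.");
        }
        if (userLength > schemaLength) {
            throw new IllegalArgumentException(
                "User-defined length of the VARCHAR field " + fieldName
                    + " should be shorter than the schema definition.");
        }
    }
}
```

Under these rules, `CHAR(5)` with any `fields.f0.length` fails, `VARCHAR(20)` with length 10 passes, and `VARCHAR(20)` with length 30 fails.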
Re: [DISCUSS] Contribute Flink Doris Connector to the Flink community
Hi di.wu,

Thanks for the proposal, +1

Best regards,

Martijn

On Mon, Nov 27, 2023 at 1:05 PM Zhanghao Chen wrote:
>
> Sounds great. Thanks for driving this.
>
> Best,
> Zhanghao Chen
>
> From: wudi <676366...@qq.com.INVALID>
> Sent: Sunday, November 26, 2023 18:22
> To: dev@flink.apache.org
> Subject: [DISCUSS] Contribute Flink Doris Connector to the Flink community
>
> Hi all,
>
> At present, Flink connectors and Flink's repository have been decoupled [1].
> At the same time, the Flink-Doris-Connector [3] has been maintained in the
> Apache Doris [2] community. I think the Flink Doris Connector can be migrated
> to the Flink community, because it is part of the Flink connectors and can
> also expand the ecosystem of Flink connectors.
>
> I volunteer to move this forward if I can.
>
> [1] https://cwiki.apache.org/confluence/display/FLINK/Externalized+Connector+development
> [2] https://doris.apache.org/
> [3] https://github.com/apache/doris-flink-connector
>
> --
>
> Brs,
> di.wu
[jira] [Created] (FLINK-33665) Invalid code execution
Peihui He created FLINK-33665:
---------------------------------
             Summary: Invalid code execution
                 Key: FLINK-33665
                 URL: https://issues.apache.org/jira/browse/FLINK-33665
             Project: Flink
          Issue Type: Improvement
          Components: Library / CEP
            Reporter: Peihui He
         Attachments: image-2023-11-27-20-00-28-715.png, image-2023-11-27-20-00-59-544.png

!image-2023-11-27-20-00-28-715.png!

!image-2023-11-27-20-00-59-544.png!

As shown in the pictures above:

{code:java}
// code placeholder
nfaState.getPartialMatches()
        .removeIf(pm -> pm.getStartEventID() != null && !partialMatches.contains(pm));
{code}

This code is unnecessary. It is very time-consuming when there are many partial matches.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
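The performance complaint in the ticket follows a general pattern worth spelling out: `removeIf` combined with `List.contains` scans the whole list for every element, giving O(n*m) behavior. The following is a self-contained illustration in plain Java (not Flink's NFA classes); if the filtering were actually needed, replacing the list lookup with a `HashSet` yields the same result in roughly O(n + m):

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;

/** Demonstrates the quadratic removeIf+contains pattern and its linear-time equivalent. */
public class RemoveIfCost {
    // O(n*m): keep.contains(x) is a linear scan per element of `all`.
    public static List<Integer> filterQuadratic(List<Integer> all, List<Integer> keep) {
        List<Integer> copy = new ArrayList<>(all);
        copy.removeIf(x -> !keep.contains(x));
        return copy;
    }

    // Roughly O(n + m): one pass to build the set, O(1) lookups afterwards.
    public static List<Integer> filterLinear(List<Integer> all, List<Integer> keep) {
        HashSet<Integer> keepSet = new HashSet<>(keep);
        List<Integer> copy = new ArrayList<>(all);
        copy.removeIf(x -> !keepSet.contains(x));
        return copy;
    }
}
```

Both variants produce identical results; only the lookup cost differs, which is why the pattern becomes noticeable precisely "when there are many partial matches".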
Re: [DISCUSS] Contribute Flink Doris Connector to the Flink community
Sounds great. Thanks for driving this.

Best,
Zhanghao Chen

From: wudi <676366...@qq.com.INVALID>
Sent: Sunday, November 26, 2023 18:22
To: dev@flink.apache.org
Subject: [DISCUSS] Contribute Flink Doris Connector to the Flink community

Hi all,

At present, Flink connectors and Flink's repository have been decoupled [1]. At the same time, the Flink-Doris-Connector [3] has been maintained in the Apache Doris [2] community. I think the Flink Doris Connector can be migrated to the Flink community, because it is part of the Flink connectors and can also expand the ecosystem of Flink connectors.

I volunteer to move this forward if I can.

[1] https://cwiki.apache.org/confluence/display/FLINK/Externalized+Connector+development
[2] https://doris.apache.org/
[3] https://github.com/apache/doris-flink-connector

--

Brs,
di.wu
[DISCUSS] FLIP-397: Add config options for administrator JVM options
Hi devs,

I'd like to start a discussion on FLIP-397: Add config options for administrator JVM options [1].

In production environments, users typically develop and operate their Flink jobs through a managed platform. Users may need to add JVM options to their Flink applications (e.g. to tune GC options). They typically use the env.java.opts.x series of options to do so. Platform administrators also have a set of JVM options to apply by default, e.g. to use JVM 17, enable GC logging, or apply pretuned GC options, etc. Both use cases need to set the same series of options and will clobber one another. Similar issues have been described in SPARK-23472 [2].

Therefore, I propose adding a set of default JVM options for administrator use that prepends the user-set extra JVM options.

Looking forward to hearing from you.

[1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-397%3A+Add+config+options+for+administrator+JVM+options
[2] https://issues.apache.org/jira/browse/SPARK-23472

Best,
Zhanghao Chen
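The proposed layering could look like the following flink-conf.yaml fragment. This is a sketch only: `env.java.opts.all` is an existing option, while the `env.java.default-opts.all` key is a hypothetical name following the FLIP's direction, and the final naming may differ:

```yaml
# Administrator defaults (hypothetical key per the FLIP), prepended first,
# e.g. GC choice and GC logging tuned by the platform team:
env.java.default-opts.all: -XX:+UseZGC -Xlog:gc*:file=/var/log/flink/gc.log

# User-supplied extras via the existing option, appended after the defaults
# so user settings take precedence where the JVM honors the last occurrence:
env.java.opts.all: -XX:MaxMetaspaceSize=512m
```

Because the administrator options come first on the JVM command line, users keep the ability to override individual flags without wiping out the platform-wide defaults.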
[RESULT] [VOTE] Release 1.17.2, release candidate #1
I'm happy to announce that we have unanimously approved this release.

There are 13 approving votes, 4 of which are binding:
- Rui Fan
- Matthias Pohl (binding)
- Danny Cranmer (binding)
- Leonard Xu (binding)
- Sergey Nuyanzin
- Jiabao Sun
- Hang Ruan
- Feng Jin
- Yuxin Tan
- Weijie Guo
- Jing Ge
- Martijn Visser (binding)
- Lincoln Lee

There are no disapproving votes.

I'll work on the steps to finalize the release and will send out the announcement as soon as that has been completed.

Thanks, everyone!

Best,
Yun Tang
[jira] [Created] (FLINK-33664) Setup cron build for java 21
Sergey Nuyanzin created FLINK-33664:
---------------------------------------
             Summary: Setup cron build for java 21
                 Key: FLINK-33664
                 URL: https://issues.apache.org/jira/browse/FLINK-33664
             Project: Flink
          Issue Type: Sub-task
          Components: Build System / CI
            Reporter: Sergey Nuyanzin
            Assignee: Sergey Nuyanzin

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
Re: [DISCUSS] Contribute Flink Doris Connector to the Flink community
+1 as well

Best
Etienne

On 27/11/2023 at 06:22, Jing Ge wrote:
> That sounds great! +1
>
> Best regards
> Jing
>
> On Mon, Nov 27, 2023 at 3:38 AM Leonard Xu wrote:
>> Thanks wudi for kicking off the discussion, +1 for the idea from my side.
>> A FLIP like Yun posted is required if there are no other objections.
>>
>> Best,
>> Leonard
>>
>>> On Nov 26, 2023, at 6:22 PM, wudi <676366...@qq.com.INVALID> wrote:
>>>
>>> Hi all,
>>>
>>> At present, Flink connectors and Flink's repository have been decoupled [1].
>>> At the same time, the Flink-Doris-Connector [3] has been maintained in the
>>> Apache Doris [2] community. I think the Flink Doris Connector can be
>>> migrated to the Flink community, because it is part of the Flink connectors
>>> and can also expand the ecosystem of Flink connectors.
>>>
>>> I volunteer to move this forward if I can.
>>>
>>> [1] https://cwiki.apache.org/confluence/display/FLINK/Externalized+Connector+development
>>> [2] https://doris.apache.org/
>>> [3] https://github.com/apache/doris-flink-connector
>>>
>>> --
>>>
>>> Brs,
>>> di.wu
Re: Sending a hi!
Welcome to the community Pranav!

Best
Etienne

On 25/11/2023 at 17:59, Pranav Sharma wrote:
> Hi everyone,
>
> I am Pranav, and I am getting started with contributing to the Apache Flink
> project. I have previously been contributing to another Apache project,
> Allura.
>
> I came across Flink during my day job as a data engineer, and I am hoping
> to contribute to the codebase as I learn more about the internal workings
> of the framework.
>
> Regards,
> Pranav S
Re: [DISCUSSION] flink-connector-shared-utils release process
Sure!

On 23/11/2023 at 02:57, Leonard Xu wrote:
> Thanks Etienne for driving this.
>
>> - a flink-connector-shared-utils-*test* clone repo and a
>> *io.github.user.flink*:flink-connector-parent custom artifact to be able
>> to directly commit and install the artifact in the CI
>> - a custom ci script that does the cloning and mvn install in the ci.yml
>> github action script for testing with the new flink-connector-parent
>> artifact
>> If people agree on the process and location
>
> +1 for the process and location. Could we also add a short paragraph and
> the location link as well in [1] to remind connector developers?
>
> Best,
> Leonard
>
> [1] https://cwiki.apache.org/confluence/display/FLINK/Externalized+Connector+development
[jira] [Created] (FLINK-33663) Serialize CallExpressions into SQL
Dawid Wysakowicz created FLINK-33663:
----------------------------------------
             Summary: Serialize CallExpressions into SQL
                 Key: FLINK-33663
                 URL: https://issues.apache.org/jira/browse/FLINK-33663
             Project: Flink
          Issue Type: Sub-task
          Components: Table SQL / API
            Reporter: Dawid Wysakowicz
            Assignee: Dawid Wysakowicz
             Fix For: 1.19.0

The task is about introducing {{CallSyntax}} and implementing versions for non-standard SQL functions.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[DISCUSS] Resolve diamond inheritance of Sink.createWriter
Hi all,

I'm opening this discussion thread to bring a discussion that's happening on a completed Jira ticket back to the mailing list [1]

In summary:

* There was a discussion and a vote on FLIP-371 [2]
* During implementation, it was determined that there's a diamond inheritance problem on the Sink.createWriter method, making a backwards compatible change hard/impossible (I think this is where the main discussion point actually is) [3]
* The PR was merged, causing a backwards incompatible change without a discussion on the Dev mailing list

I think that in hindsight, even though there was a FLIP on this topic, the finding of the diamond inheritance issue should have been brought back to the Dev mailing list in order to agree on how to resolve it. Since 1.19 is still under way, we still have time to fix this.

I think there's two things we can improve:

1) Next time during implementation of a FLIP/PR which involves a non-backward compatible change of an API that wasn't accounted for, the discussion should be brought back to the Dev mailing list. I think we can just add that to the FLIP bylaws.

2) How do we actually resolve the problem: is there anyone who has an idea on how we could introduce the proposed change while maintaining backwards compatibility, or do we agree that while this is a non-desired situation, there is unfortunately no better alternative?

Best regards,

Martijn

[1] https://issues.apache.org/jira/browse/FLINK-25857
[2] https://cwiki.apache.org/confluence/display/FLINK/FLIP-371%3A+Provide+initialization+context+for+Committer+creation+in+TwoPhaseCommittingSink
[3] https://github.com/apache/flink/pull/23555#discussion_r1371740397
[jira] [Created] (FLINK-33662) Bump com.h2database:h2
Martijn Visser created FLINK-33662:
--------------------------------------
             Summary: Bump com.h2database:h2
                 Key: FLINK-33662
                 URL: https://issues.apache.org/jira/browse/FLINK-33662
             Project: Flink
          Issue Type: Technical Debt
          Components: Connectors / JDBC
            Reporter: Martijn Visser
            Assignee: Martijn Visser

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[jira] [Created] (FLINK-33661) 'RocksDB Memory Management end-to-end test' failed due to unexpected error in logs
Matthias Pohl created FLINK-33661:
-------------------------------------
             Summary: 'RocksDB Memory Management end-to-end test' failed due to unexpected error in logs
                 Key: FLINK-33661
                 URL: https://issues.apache.org/jira/browse/FLINK-33661
             Project: Flink
          Issue Type: Bug
          Components: Runtime / State Backends
    Affects Versions: 1.19.0
            Reporter: Matthias Pohl

https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=54942=logs=bea52777-eaf8-5663-8482-18fbc3630e81=43ba8ce7-ebbf-57cd-9163-444305d74117=5132

This seems to be the same issue as FLINK-30785.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[jira] [Created] (FLINK-33660) AggregateITCase.testConstantGroupKeyWithUpsertSink due to unexpected behavior
Matthias Pohl created FLINK-33660:
----------------------------------

Summary: AggregateITCase.testConstantGroupKeyWithUpsertSink fails due to unexpected behavior
Key: FLINK-33660
URL: https://issues.apache.org/jira/browse/FLINK-33660
Project: Flink
Issue Type: Bug
Components: Runtime / Coordination, Table SQL / Runtime
Affects Versions: 1.19.0
Reporter: Matthias Pohl

https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=54895=logs=a9db68b9-a7e0-54b6-0f98-010e0aff39e2=cdd32e0b-6047-565b-c58f-14054472f1be=11063

{code}
Nov 25 05:02:47 05:02:47.850 [ERROR] org.apache.flink.table.planner.runtime.stream.sql.AggregateITCase.testConstantGroupKeyWithUpsertSink  Time elapsed: 107.302 s  <<< ERROR!
Nov 25 05:02:47 java.util.concurrent.ExecutionException: org.apache.flink.table.api.TableException: Failed to wait job finish
Nov 25 05:02:47 	at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
Nov 25 05:02:47 	at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908)
Nov 25 05:02:47 	at org.apache.flink.table.api.internal.TableResultImpl.awaitInternal(TableResultImpl.java:118)
Nov 25 05:02:47 	at org.apache.flink.table.api.internal.TableResultImpl.await(TableResultImpl.java:81)
Nov 25 05:02:47 	at org.apache.flink.table.planner.runtime.stream.sql.AggregateITCase.testConstantGroupKeyWithUpsertSink(AggregateITCase.scala:1620)
Nov 25 05:02:47 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
Nov 25 05:02:47 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
Nov 25 05:02:47 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
Nov 25 05:02:47 	at java.lang.reflect.Method.invoke(Method.java:498)
Nov 25 05:02:47 	at org.junit.platform.commons.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:727)
Nov 25 05:02:47 	at org.junit.jupiter.engine.execution.MethodInvocation.proceed(MethodInvocation.java:60)
Nov 25 05:02:47 	at org.junit.jupiter.engine.execution.InvocationInterceptorChain$ValidatingInvocation.proceed(InvocationInterceptorChain.java:131)
Nov 25 05:02:47 	at org.junit.jupiter.engine.extension.TimeoutExtension.intercept(TimeoutExtension.java:156)
Nov 25 05:02:47 	at org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestableMethod(TimeoutExtension.java:147)
Nov 25 05:02:47 	at org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestTemplateMethod(TimeoutExtension.java:94)
Nov 25 05:02:47 	at org.junit.jupiter.engine.execution.InterceptingExecutableInvoker$ReflectiveInterceptorCall.lambda$ofVoidMethod$0(InterceptingExecutableInvoker.java:103)
Nov 25 05:02:47 	at org.junit.jupiter.engine.execution.InterceptingExecutableInvoker.lambda$invoke$0(InterceptingExecutableInvoker.java:93)
Nov 25 05:02:47 	at org.junit.jupiter.engine.execution.InvocationInterceptorChain$InterceptedInvocation.proceed(InvocationInterceptorChain.java:106)
Nov 25 05:02:47 	at org.junit.jupiter.engine.execution.InvocationInterceptorChain.proceed(InvocationInterceptorChain.java:64)
Nov 25 05:02:47 	at org.junit.jupiter.engine.execution.InvocationInterceptorChain.chainAndInvoke(InvocationInterceptorChain.java:45)
Nov 25 05:02:47 	at org.junit.jupiter.engine.execution.InvocationInterceptorChain.invoke(InvocationInterceptorChain.java:37)
Nov 25 05:02:47 	at org.junit.jupiter.engine.execution.InterceptingExecutableInvoker.invoke(InterceptingExecutableInvoker.java:92)
Nov 25 05:02:47 	at org.junit.jupiter.engine.execution.InterceptingExecutableInvoker.invoke(InterceptingExecutableInvoker.java:86)
Nov 25 05:02:47 	at org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.lambda$invokeTestMethod$7(TestMethodTestDescriptor.java:217)
Nov 25 05:02:47 	at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
Nov 25 05:02:47 	at org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.invokeTestMethod(TestMethodTestDescriptor.java:213)
Nov 25 05:02:47 	at org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.execute(TestMethodTestDescriptor.java:138)
Nov 25 05:02:47 	at org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.execute(TestMethodTestDescriptor.java:68)
Nov 25 05:02:47 	at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$6(NodeTestTask.java:151)
Nov 25 05:02:47 	at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
Nov 25 05:02:47 	at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$8(NodeTestTask.java:141)
Nov 25 05:02:47 	at
[jira] [Created] (FLINK-33659) Avoid unnecessary retries when restoring from a savepoint fails
zhouli created FLINK-33659:
---------------------------

Summary: Avoid unnecessary retries when restoring from a savepoint fails
Key: FLINK-33659
URL: https://issues.apache.org/jira/browse/FLINK-33659
Project: Flink
Issue Type: Improvement
Components: Runtime / Checkpointing
Reporter: zhouli

When restoring a job from a savepoint fails and a restart strategy is enabled, Flink will try to restart the job, and the restore will fail again. We may wrap the exception as [SuppressRestartsException|https://github.com/apache/flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/execution/SuppressRestartsException.java] to avoid unnecessary retries.
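The idea in FLINK-33659 can be sketched with a toy restart loop. This is hypothetical illustration code, not Flink's actual scheduler: the class names `RestoreRetrySketch` and the nested `SuppressRestartsException` are local stand-ins (the latter mirrors only the intent of Flink's `org.apache.flink.runtime.execution.SuppressRestartsException`). The point is that a failure wrapped in the marker exception fails fast, while ordinary failures are retried up to the configured limit.

```java
// Hypothetical sketch of suppressing restarts; not Flink's actual code.
public class RestoreRetrySketch {

    /** Local stand-in mirroring the intent of Flink's SuppressRestartsException. */
    static class SuppressRestartsException extends RuntimeException {
        SuppressRestartsException(Throwable cause) {
            super(cause);
        }
    }

    /** Runs the job, restarting on failure up to maxRestarts times; returns the attempt count. */
    static int runWithRestarts(Runnable job, int maxRestarts) {
        int attempts = 0;
        while (true) {
            attempts++;
            try {
                job.run();
                return attempts;
            } catch (SuppressRestartsException e) {
                // Restore from savepoint failed: retrying cannot succeed, so stop here.
                return attempts;
            } catch (RuntimeException e) {
                if (attempts > maxRestarts) {
                    return attempts;
                }
                // Transient failure: fall through and restart.
            }
        }
    }

    public static void main(String[] args) {
        // A savepoint-restore failure wrapped in the marker exception fails on the first attempt.
        int attempts = runWithRestarts(() -> {
            throw new SuppressRestartsException(new IllegalStateException("restore failed"));
        }, 3);
        System.out.println("attempts=" + attempts); // prints attempts=1
    }
}
```

With the marker exception the loop exits after one attempt; a plain `IllegalStateException` would be retried `maxRestarts` times before giving up.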
[jira] [Created] (FLINK-33658) Hosted runner lost communication
Matthias Pohl created FLINK-33658:
----------------------------------

Summary: Hosted runner lost communication
Key: FLINK-33658
URL: https://issues.apache.org/jira/browse/FLINK-33658
Project: Flink
Issue Type: Sub-task
Components: Test Infrastructure
Reporter: Matthias Pohl

Some jobs failed due to lost communication: https://github.com/XComp/flink/actions/runs/6997726518
{quote}
The hosted runner: GitHub Actions 15 lost communication with the server. Anything in your workflow that terminates the runner process, starves it for CPU/Memory, or blocks its network access can cause this error.
{quote}
This is not really something we can fix. The issue is created for documentation purposes.
[jira] [Created] (FLINK-33657) Insert messages in top-N without row number don't consider their order and may not be correct
zlzhang0122 created FLINK-33657:
--------------------------------

Summary: Insert messages in top-N without row number don't consider their order and may not be correct
Key: FLINK-33657
URL: https://issues.apache.org/jira/browse/FLINK-33657
Project: Flink
Issue Type: Improvement
Components: Table SQL / Runtime
Affects Versions: 1.17.1, 1.16.2
Reporter: zlzhang0122

The new insert message in top-N without row number doesn't consider its order and is simply forwarded (collectInsert) to the next operator. This may not be correct when the next operator collects all the top-N records and aggregates them. For example:

{code}
create table user_info(
  user_id int,
  item_id int,
  app string,
  dt timestamp
) with(
  'connector'='kafka',
  ...
);

create table redis_sink (
  redis_key string,
  hash_key string,
  hash_value string
) with (
  'connector' = 'redis',
  'command' = 'hmset',
  'nodes' = 'xxx',
  'additional-ttl' = 'xx'
);

create view base_lastn as
select * from(
  select user_id, item_id, app, dt,
         row_number() over(partition by item_id, app order by dt desc) as rn
  from user_info
)t where rn<=5;

insert into redis_sink
select concat("id", item_id) as redis_key, app as hash_key, user_id as hash_value
from base_lastn where rn=1;

insert into redis_sink
select concat("list_", item_id) as redis_key, app as hash_key, listagg(user_id, ",") as hash_value
from base_lastn group by item_id, app;
{code}

There will be cases where the top-1 value does not appear as either the first or the last value of the top 5.
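The rank-order mismatch reported in FLINK-33657 can be illustrated with a toy simulation. This is hypothetical code, not Flink's top-N operator: `TopNOrderSketch` and `Rec` are made-up names, and the logic only mimics the reported behavior of forwarding each qualifying record as a plain insert without rank information, so a downstream consumer that aggregates in arrival order sees a different order than the actual rank order.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration of top-N without row number; not Flink's operator code.
public class TopNOrderSketch {
    record Rec(String user, long ts) {}

    /** Feeds records through a toy top-N; returns {rank order, arrival order} as comma-joined users. */
    static String[] run(List<Rec> input, int n) {
        List<Rec> ranked = new ArrayList<>();   // what the top-N state holds (rank order by ts desc)
        List<Rec> emitted = new ArrayList<>();  // what downstream receives (arrival order)
        for (Rec r : input) {
            ranked.add(r);
            ranked.sort(Comparator.comparingLong((Rec x) -> x.ts).reversed());
            if (ranked.size() > n) {
                ranked.remove(n);
            }
            if (ranked.contains(r)) {
                emitted.add(r); // plain insert forwarded, no rank attached
            }
        }
        String rankOrder = String.join(",", ranked.stream().map(Rec::user).toList());
        String arrivalOrder = String.join(",", emitted.stream().map(Rec::user).toList());
        return new String[] {rankOrder, arrivalOrder};
    }

    public static void main(String[] args) {
        String[] out = run(List.of(new Rec("a", 10), new Rec("b", 30), new Rec("c", 20)), 3);
        System.out.println("rank order:    " + out[0]); // prints rank order:    b,c,a
        System.out.println("arrival order: " + out[1]); // prints arrival order: a,b,c
    }
}
```

Here the rank-1 record is `b`, yet a downstream `listagg` over the arrival order would place `a` first and `c` last, which matches the report that the top-1 value may appear at neither end of the aggregated top 5.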
Re: [VOTE] Apache Flink Kafka Connectors v3.0.2, RC #1
+1 (binding)

- Validated hashes
- Verified signature
- Verified that no binaries exist in the source archive
- Built the source with Maven
- Verified licenses
- Verified web PRs

On Mon, Nov 27, 2023 at 4:17 AM Leonard Xu wrote:
>
> +1 (binding)
>
> - checked the flink-connector-base dependency scope has been changed to provided
> - built from source code succeeded
> - verified signatures
> - verified hashsums
> - checked the contents contain jar and pom files in apache repo
> - checked Github release tag
> - checked release notes
> - reviewed the web PR
>
> Best,
> Leonard
>
> > On Nov 26, 2023 at 4:40 PM, Jing Ge wrote:
> >
> > +1 (non-binding)
> >
> > - verified signature and hash
> > - checked repo
> > - checked tag, BTW, the tag link at [5] should be
> > https://github.com/apache/flink-connector-kafka/releases/tag/v3.0.2-rc1
> > - verified source archives do not contain any binaries
> > - built source with Maven 3.8.6 and JDK 11
> > - verified web PR
> >
> > Best regards,
> > Jing
> >
> > On Sat, Nov 25, 2023 at 6:44 AM Rui Fan <1996fan...@gmail.com> wrote:
> >
> >> +1 (non-binding)
> >>
> >> - Validated checksum hash
> >> - Verified signature
> >> - Verified that no binaries exist in the source archive
> >> - Built the source with Maven and JDK 8
> >> - Verified licenses
> >> - Verified web PRs
> >>
> >> Best,
> >> Rui
> >>
> >> On Sat, Nov 25, 2023 at 2:05 AM Tzu-Li (Gordon) Tai
> >> wrote:
> >>
> >>> +1 (binding)
> >>>
> >>> - Verified signature and hashes
> >>> - Verified mvn dependency:tree for a typical user job jar [1]. When using
> >>> Flink 1.18.0, flink-connector-base is no longer getting bundled, and all
> >>> Flink dependencies resolve as 1.18.0 / provided.
> >>> - Submitting user job jar to local Flink 1.18.0 cluster works and job runs
> >>>
> >>> note: If running in the IDE, the flink-connector-base dependency is
> >>> explicitly required when using KafkaSource. Otherwise, if submitting an
> >>> uber jar, the flink-connector-base dependency should not be bundled as
> >>> it'll be provided by the Flink distribution and will already be on the
> >>> classpath.
> >>>
> >>> [1] mvn dependency:tree output
> >>> ```
> >>> [INFO] com.tzulitai:testing-kafka:jar:1.0-SNAPSHOT
> >>> [INFO] +- org.apache.flink:flink-streaming-java:jar:1.18.0:provided
> >>> [INFO] | +- org.apache.flink:flink-core:jar:1.18.0:provided
> >>> [INFO] | | +- org.apache.flink:flink-annotations:jar:1.18.0:provided
> >>> [INFO] | | +- org.apache.flink:flink-metrics-core:jar:1.18.0:provided
> >>> [INFO] | | +- org.apache.flink:flink-shaded-asm-9:jar:9.5-17.0:provided
> >>> [INFO] | | +- org.apache.flink:flink-shaded-jackson:jar:2.14.2-17.0:provided
> >>> [INFO] | | +- org.apache.commons:commons-lang3:jar:3.12.0:provided
> >>> [INFO] | | +- org.apache.commons:commons-text:jar:1.10.0:provided
> >>> [INFO] | | +- com.esotericsoftware.kryo:kryo:jar:2.24.0:provided
> >>> [INFO] | | | +- com.esotericsoftware.minlog:minlog:jar:1.2:provided
> >>> [INFO] | | | \- org.objenesis:objenesis:jar:2.1:provided
> >>> [INFO] | | +- commons-collections:commons-collections:jar:3.2.2:provided
> >>> [INFO] | | \- org.apache.commons:commons-compress:jar:1.21:provided
> >>> [INFO] | +- org.apache.flink:flink-file-sink-common:jar:1.18.0:provided
> >>> [INFO] | +- org.apache.flink:flink-runtime:jar:1.18.0:provided
> >>> [INFO] | | +- org.apache.flink:flink-rpc-core:jar:1.18.0:provided
> >>> [INFO] | | +- org.apache.flink:flink-rpc-akka-loader:jar:1.18.0:provided
> >>> [INFO] | | +- org.apache.flink:flink-queryable-state-client-java:jar:1.18.0:provided
> >>> [INFO] | | +- org.apache.flink:flink-hadoop-fs:jar:1.18.0:provided
> >>> [INFO] | | +- commons-io:commons-io:jar:2.11.0:provided
> >>> [INFO] | | +- org.apache.flink:flink-shaded-netty:jar:4.1.91.Final-17.0:provided
> >>> [INFO] | | +- org.apache.flink:flink-shaded-zookeeper-3:jar:3.7.1-17.0:provided
> >>> [INFO] | | +- org.javassist:javassist:jar:3.24.0-GA:provided
> >>> [INFO] | | +- org.xerial.snappy:snappy-java:jar:1.1.10.4:runtime
> >>> [INFO] | | \- org.lz4:lz4-java:jar:1.8.0:runtime
> >>> [INFO] | +- org.apache.flink:flink-java:jar:1.18.0:provided
> >>> [INFO] | | \- com.twitter:chill-java:jar:0.7.6:provided
> >>> [INFO] | +- org.apache.flink:flink-shaded-guava:jar:31.1-jre-17.0:provided
> >>> [INFO] | +- org.apache.commons:commons-math3:jar:3.6.1:provided
> >>> [INFO] | +- org.slf4j:slf4j-api:jar:1.7.36:runtime
> >>> [INFO] | \- com.google.code.findbugs:jsr305:jar:1.3.9:provided
> >>> [INFO] +- org.apache.flink:flink-clients:jar:1.18.0:provided
> >>> [INFO] | +- org.apache.flink:flink-optimizer:jar:1.18.0:provided
> >>> [INFO] | \- commons-cli:commons-cli:jar:1.5.0:provided
> >>> [INFO] +- org.apache.flink:flink-connector-kafka:jar:3.0.2-1.18:compile
> >>> [INFO] | +- org.apache.kafka:kafka-clients:jar:3.2.3:compile
> >>> [INFO] | | \- com.github.luben:zstd-jni:jar:1.5.2-1:runtime
> >>> [INFO] |