Re: [DISCUSS] FLIP-212: Introduce Flink Kubernetes Operator
Thanks for the feedback!

> # 1 Flink Native vs Standalone integration
> Maybe we should make this more clear in the FLIP but we agreed to do the
> first version of the operator based on the native integration.
> While this clearly does not cover all use-cases and requirements, it seems
> this would lead to a much smaller initial effort and a nicer first version.

I'm also leaning towards the native integration, as long as it reduces the MVP effort. Ultimately the operator will need to also support the standalone mode. I would like to gain more confidence that native integration reduces the effort. While it cuts the effort to handle the TM pod creation, some mapping code from the CR to the native integration client and config needs to be created. As mentioned in the FLIP, native integration requires the Flink job manager to have access to the k8s API to create pods, which in some scenarios may be seen as unfavorable.

> > > # Pod Template
> > > Is the pod template in CR same with what Flink has already supported[4]?
> > > Then I am afraid not the arbitrary field (e.g. cpu/memory resources) could
> > > take effect.

Yes, pod template would look almost identical. There are a few settings that the operator will control (and that may need to be blacklisted), but in general we would not want to place restrictions. I think a mechanism where a pod template is merged from multiple layers would also be interesting to make this more flexible.

Cheers,
Thomas
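For illustration, a layered pod-template merge as discussed above could look roughly like the following CR fragment. All field names here are hypothetical sketches, not part of the FLIP; a deployment-level template would be merged over operator-level defaults:

```yaml
# Hypothetical FlinkDeployment CR fragment (illustrative names only).
# The operator would merge this deployment-level podTemplate layer
# over its own defaults, while keeping operator-controlled fields reserved.
apiVersion: flink.apache.org/v1alpha1
kind: FlinkDeployment
metadata:
  name: example-job
spec:
  podTemplate:
    spec:
      containers:
        - name: flink-main-container
          resources:
            requests:
              cpu: "1"
              memory: "2048Mi"
```

The point of the layering is that platform teams could set defaults once at the operator level, while individual jobs only override what they need.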
[jira] [Created] (FLINK-25861) Move states of AbstractAvroBulkFormat into its reader
Caizhi Weng created FLINK-25861: --- Summary: Move states of AbstractAvroBulkFormat into its reader Key: FLINK-25861 URL: https://issues.apache.org/jira/browse/FLINK-25861 Project: Flink Issue Type: Bug Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile) Reporter: Caizhi Weng Fix For: 1.15.0

FLINK-24565 ports the Avro format to {{BulkReaderFormatFactory}}. However, the implementation keeps some state in the format factory itself. This state should live in the readers.

-- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (FLINK-25860) Move read buffer allocation and output file creation to setup method for sort-shuffle result partition to avoid blocking main thread
Yingjie Cao created FLINK-25860: --- Summary: Move read buffer allocation and output file creation to setup method for sort-shuffle result partition to avoid blocking main thread Key: FLINK-25860 URL: https://issues.apache.org/jira/browse/FLINK-25860 Project: Flink Issue Type: Sub-task Components: Runtime / Network Reporter: Yingjie Cao Fix For: 1.15.0 Currently, the read buffer allocation and output file creation of sort-shuffle is performed by the main thread. These operations are a little heavy and can block the main thread for a while which may influence other RPC calls including follow-up task deployment. This ticket aims to solve the issue by moving read buffer allocation and output file creation to setup method. -- This message was sent by Atlassian Jira (v8.20.1#820001)
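The pattern the ticket describes, keeping construction on the main (RPC) thread cheap and deferring heavy work to a setup method, can be sketched in miniature. This is an illustration of the pattern only, not Flink's actual ResultPartition API:

```python
class SortShuffleResultPartition:
    """Toy illustration: defer heavy initialization from construction to setup()."""

    def __init__(self, num_buffers):
        # Construction runs on the main (RPC) thread: keep it cheap,
        # so follow-up RPC calls such as task deployment are not blocked.
        self.num_buffers = num_buffers
        self.read_buffers = None
        self.output_file = None

    def setup(self):
        # Heavy work (read buffer allocation, output file creation)
        # happens here, off the main thread.
        self.read_buffers = [bytearray(32 * 1024) for _ in range(self.num_buffers)]
        self.output_file = "shuffle.data"  # stands in for real file creation


partition = SortShuffleResultPartition(num_buffers=4)
assert partition.read_buffers is None  # nothing allocated on the main thread
partition.setup()
assert len(partition.read_buffers) == 4
```

The construction/setup split is the whole fix: the expensive allocations simply move to a phase that does not run on the RPC main thread.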
Re: [VOTE] Apache Flink Stateful Functions 3.2.0, release candidate #1
+1 (non-binding) - verified checksum and signatures - build from source code - version checked - test docker PR - test flink-statefun-playground/greeter with 3.2.0 Misc, do we want to upgrade flink-statefun-playground together? Currently the README information is a little behind. B.R. Mingmin On Wed, Jan 26, 2022 at 4:55 AM Till Rohrmann wrote: > Hi everyone, > > a quick update on the vote: > > The correct link for the artifacts at the Apache Nexus repository is > https://repository.apache.org/content/repositories/orgapacheflink-1485/. > > Moreover, there is now also a tag for the GoLang SDK: > https://github.com/apache/flink-statefun/tree/statefun-sdk-go/v3.2.0-rc1. > > Cheers, > Till > > On Tue, Jan 25, 2022 at 10:49 PM Till Rohrmann > wrote: > > > Hi everyone, > > > > Please review and vote on the release candidate #1 for the version 3.2.0 > > of Apache Flink Stateful Functions, as follows: > > > > [ ] +1, Approve the release > > [ ] -1, Do not approve the release (please provide specific comments) > > > > **Release Overview** > > > > As an overview, the release consists of the following: > > a) Stateful Functions canonical source distribution, to be deployed to > the > > release repository at dist.apache.org > > b) Stateful Functions Python SDK distributions to be deployed to PyPI > > c) Maven artifacts to be deployed to the Maven Central Repository > > d) New Dockerfiles for the release > > e) GoLang SDK (contained in the repository) > > f) JavaScript SDK (contained in the repository; will be uploaded to npm > > after the release) > > > > **Staging Areas to Review** > > > > The staging areas containing the above mentioned artifacts are as > follows, > > for your review: > > * All artifacts for a) and b) can be found in the corresponding dev > > repository at dist.apache.org [2] > > * All artifacts for c) can be found at the Apache Nexus Repository [3] > > > > All artifacts are signed with the key > > B9499FA69EFF5DEEEBC3C1F5BA7E4187C6F73D82 [4] > > > > Other 
links for your review: > > * JIRA release notes [5] > > * source code tag "release-3.2.0-rc1" [6] > > * PR for the new Dockerfiles [7] > > * PR to update the website Downloads page to include Stateful Functions > > links [8] > > * GoLang SDK [9] > > * JavaScript SDK [10] > > > > **Vote Duration** > > > > The voting time will run for at least 72 hours. > > It is adopted by majority approval, with at least 3 PMC affirmative > votes. > > > > Thanks, > > Till > > > > [1] > > > https://cwiki.apache.org/confluence/display/FLINK/Verifying+a+Flink+Stateful+Functions+Release > > [2] > https://dist.apache.org/repos/dist/dev/flink/flink-statefun-3.2.0-rc1/ > > [3] > > https://repository.apache.org/content/repositories/orgapacheflink-1483/ > > [4] https://dist.apache.org/repos/dist/release/flink/KEYS > > [5] > > > https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12350540 > > [6] https://github.com/apache/flink-statefun/tree/release-3.2.0-rc1 > > [7] https://github.com/apache/flink-statefun-docker/pull/19 > > [8] https://github.com/apache/flink-web/pull/501 > > [9] > > > https://github.com/apache/flink-statefun/tree/release-3.2.0-rc1/statefun-sdk-go > > [10] > > > https://github.com/apache/flink-statefun/tree/release-3.2.0-rc1/statefun-sdk-js > > >
[jira] [Created] (FLINK-25859) Add documentation for DynamoDB Async Sink
Yuri Gusev created FLINK-25859: -- Summary: Add documentation for DynamoDB Async Sink Key: FLINK-25859 URL: https://issues.apache.org/jira/browse/FLINK-25859 Project: Flink Issue Type: New Feature Components: Connectors / Kinesis, Documentation Reporter: Yuri Gusev Assignee: Zichen Liu Fix For: 1.15.0 h2. Motivation _FLINK-24227 introduces a new sink for Kinesis Data Streams that supersedes the existing one based on KPL._ *Scope:* * Deprecate the current section in the docs for the Kinesis KPL sink and write documentation and usage guide for the new sink. h2. References More details to be found [https://cwiki.apache.org/confluence/display/FLINK/FLIP-171%3A+Async+Sink] -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (FLINK-25858) Remove ArchUnit rules for JUnit 4 in ITCaseRules once the JUnit 4->5 migration is closed
Jing Ge created FLINK-25858: --- Summary: Remove ArchUnit rules for JUnit 4 in ITCaseRules once the JUnit 4->5 migration is closed Key: FLINK-25858 URL: https://issues.apache.org/jira/browse/FLINK-25858 Project: Flink Issue Type: Improvement Components: Tests Reporter: Jing Ge

Some ArchUnit rules have been created for JUnit 4 tests during the JUnit 4->5 migration. Remove them after the migration is closed. To make the work easier, comments with the text "JUnit 4" have been added. See org.apache.flink.architecture.rules.ITCaseRules.

-- This message was sent by Atlassian Jira (v8.20.1#820001)
Re: [DISCUSS] Pushing Apache Flink 1.15 Feature Freeze
As there have been only votes in favour of pushing the feature freeze we pushed it to the 14th of February. The bi-weekly sync meeting will happen every week now. A nice holiday to all the Chinese contributors. Best, Joe > On 26.01.2022, at 09:26, David Morávek wrote: > > +1, especially for the reasons Yuan has mentioned > > D. > > On Wed, Jan 26, 2022 at 9:15 AM Yu Li wrote: > >> +1 to extend the feature freeze date to Feb. 14th, which might be a good >> Valentine's Day present for all Flink developers as well (smile). >> >> Best Regards, >> Yu >> >> >> On Wed, 26 Jan 2022 at 14:50, Yuan Mei wrote: >> >>> +1 extending feature freeze for one week. >>> >>> Code Freeze on 6th (end of Spring Festival) is equivalent to say code >>> freeze at the end of this week for Chinese buddies, since Spring Festival >>> starts next week. >>> It also means they should be partially available during the holiday, >>> otherwise they would block the release if any unexpected issues arise. >>> >>> The situation sounds a bit stressed and can be resolved very well by >>> extending the freeze date for a bit. >>> >>> Best >>> Yuan >>> >>> On Wed, Jan 26, 2022 at 11:18 AM Yun Tang wrote: >>> Since the official Spring Festival holidays in China starts from Jan >> 31th to Feb 6th, and many developers in China would enjoy the holidays at >> that time. +1 for extending the feature freeze. Best Yun Tang From: Jingsong Li Sent: Wednesday, January 26, 2022 10:32 To: dev Subject: Re: [DISCUSS] Pushing Apache Flink 1.15 Feature Freeze +1 for extending the feature freeze. Thanks Joe for driving. Best, Jingsong On Wed, Jan 26, 2022 at 12:04 AM Martijn Visser >> wrote: > > Hi all, > > +1 for extending the feature freeze. We could use the time to try to >>> wrap > up some important SQL related features and improvements. 
> > Best regards, > > Martijn > > On Tue, 25 Jan 2022 at 16:38, Johannes Moser >>> wrote: > >> Dear Flink community, >> >> as mentioned in the summary mail earlier some contributors voiced >>> that >> they would benefit from pushing the feature freeze for 1.15. by a >>> week. >> This would mean Monday, 14th of February 2022, end of business >> CEST. >> >> Please let us know in case you got any concerns. >> >> >> Best, >> Till, Yun Gao & Joe >>> >>
Re: [DISCUSS] FLIP-212: Introduce Flink Kubernetes Operator
Hi All!

Thanks for the questions, there are still quite a few unknowns and decisions to be made but here are my current thoughts:

# 1 Flink Native vs Standalone integration
Maybe we should make this more clear in the FLIP but we agreed to do the first version of the operator based on the native integration. While this clearly does not cover all use-cases and requirements, it seems this would lead to a much smaller initial effort and a nicer first version.

# How do we run a Flink job from a CR?
I am very much leaning toward using the ApplicationDeployer interface to submit jobs directly from Java. Again this would be a very nice and simple Java solution. I think this will also help make the deployment interfaces more solid so we can then make them public. If there is no way around it we could also invoke the CLI classes from within the application, but I would prefer not to.

# Pod template
I cannot comment on this yet :D

Cheers,
Gyula

On Wed, Jan 26, 2022 at 12:38 PM Yang Wang wrote:
> Hi Biao,
>
> # 1 Flink Native vs Standalone integration
> I think we have got a trend in this discussion[1] that the newly introduced
> Flink K8s operator will start with native K8s integration first.
> Do you have some concerns about this?
>
> # 2 K8S StatefulSet v.s. K8S Deployment
> IIUC, the FlinkDeployment is just a custom resource name. It does not mean
> that we need to create a corresponding K8s deployment for JobManager or
> TaskManager.
> If we are using native K8s integration, the JobManager is started with a K8s
> deployment while TaskManagers are naked pods managed by
> FlinkResourceManager.
>
> Actually, I think "FlinkDeployment" is easier to understand than
> "FlinkStatefulSet" :)
>
> [1]. https://lists.apache.org/thread/l1dkp8v4bhlcyb4tdts99g7w4wdglfy4
>
> Best,
> Yang
>
> Biao Geng wrote on Wed, Jan 26, 2022 at 18:00:
>
> > Hi Thomas,
> > Thanks a lot for the great efforts in this well-organized FLIP!
After > > reading the FLIP carefully, I think Yang has given some great feedback > and > > I just want to share some of my concerns: > > # 1 Flink Native vs Standalone integration > > I believe it is reasonable to support both modes in the long run but in > the > > FLIP and previous thread[1], it seems that we have not made a decision on > > which one to implement initially. The FLIP mentioned "Maybe start with > > support for Flink Native" for reusing codes in [2]. Is it the selected > one > > finally? > > # 2 K8S StatefulSet v.s. K8S Deployment > > In the CR Example, I notice that the kind we use is FlinkDeployment. I > > would like to check if we have made the decision to use K8S Deployment > > workload resource. As the name implies, StatefulSet is for stateful apps > > while Deployment is usually for stateless apps. I think it is worthwhile > to > > consider the choice more carefully due to some user case in gcp > > operator[3], which may influence our other design choices(like the Flink > > application deletion strategy). > > > > Again, thanks for the work and I believe this FLIP is pretty useful for > > many customers and I hope I can make some contributions to this FLIP > impl! > > > > Best regard, > > Biao Geng > > > > [1] https://lists.apache.org/thread/l1dkp8v4bhlcyb4tdts99g7w4wdglfy4 > > [2] https://github.com/wangyang0918/flink-native-k8s-operator > > [3] > https://github.com/GoogleCloudPlatform/flink-on-k8s-operator/pull/354 > > > > Yang Wang 于2022年1月26日周三 15:25写道: > > > > > Thanks Thomas for creating FLIP-212 to introduce the Flink Kubernetes > > > Operator. > > > > > > The proposal looks already very good to me and has integrated all the > > input > > > in the previous discussion(e.g. native K8s VS standalone, Go VS java). > > > > > > I read the FLIP carefully and have some questions that need to be > > > clarified. > > > > > > # How do we run a Flink job from a CR? > > > 1. 
Start a session cluster and then followed by submitting the Flink > job > > > via rest API > > > 2. Start a Flink application cluster which bundles one or more Flink > jobs > > > It is not clear enough to me which way we will choose. It seems that > the > > > existing google/lyft K8s operator is using #1. But I lean to #2 in the > > new > > > introduced K8s operator. > > > If #2 is the case, how could we get the job status when it finished or > > > failed? Maybe FLINK-24113[1] and FLINK-25715[2] could help. Or we may > > need > > > to enable the Flink history server[3]. > > > > > > > > > # ApplicationDeployer Interface or "flink run-application" / > > > "kubernetes-session.sh" > > > How do we start the Flink application or session cluster? > > > It will be great if we have the public and stable interfaces for > > deployment > > > in Flink. But currently we only have an internal interface > > > *ApplicationDeployer* to deploy the application cluster and > > > no interfaces for deploying session cluster. > > > Of cause, we could also use the CLI command for submission. However, it > > > will have poor performance when launching
[jira] [Created] (FLINK-25857) Add committer metrics to track the status of committables
Fabian Paul created FLINK-25857: --- Summary: Add committer metrics to track the status of committables Key: FLINK-25857 URL: https://issues.apache.org/jira/browse/FLINK-25857 Project: Flink Issue Type: Sub-task Components: Connectors / Common Reporter: Fabian Paul With Sink V2 we can now track the progress of a committable during committing and show metrics about the committing status. (i.e. failed, retried, succeeded). -- This message was sent by Atlassian Jira (v8.20.1#820001)
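Tracking committable outcomes as counters can be sketched like this. The class below is a toy illustration of the idea, not Flink's Sink V2 metric API, which exposes such counts through its operator metric groups:

```python
from collections import Counter


class CommitterMetrics:
    """Toy tracker for committable outcomes (failed / retried / succeeded).

    Illustrative only: a real sink committer would register these
    counters with the framework's metric group instead.
    """

    OUTCOMES = {"failed", "retried", "succeeded"}

    def __init__(self):
        self.counts = Counter()

    def record(self, outcome):
        # Reject unknown outcome names early so metric names stay stable.
        if outcome not in self.OUTCOMES:
            raise ValueError(f"unknown outcome: {outcome}")
        self.counts[outcome] += 1


metrics = CommitterMetrics()
for outcome in ["succeeded", "retried", "succeeded", "failed"]:
    metrics.record(outcome)
assert metrics.counts["succeeded"] == 2
assert metrics.counts["retried"] == 1
```

Exposing the three states separately is what makes the committer's progress observable: a climbing "retried" count with a flat "succeeded" count points at a stuck external system.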
[jira] [Created] (FLINK-25856) Fix use of UserDefinedType in from_elements
Huang Xingbo created FLINK-25856: Summary: Fix use of UserDefinedType in from_elements Key: FLINK-25856 URL: https://issues.apache.org/jira/browse/FLINK-25856 Project: Flink Issue Type: Bug Components: API / Python Affects Versions: 1.14.3, 1.15.0 Reporter: Huang Xingbo Assignee: Huang Xingbo Fix For: 1.15.0, 1.14.4

If we define a new UserDefinedType and use it in `from_elements`, it will fail.

{code:python}
class VectorUDT(UserDefinedType):

    @classmethod
    def sql_type(cls):
        return DataTypes.ROW(
            [
                DataTypes.FIELD("type", DataTypes.TINYINT()),
                DataTypes.FIELD("size", DataTypes.INT()),
                DataTypes.FIELD("indices", DataTypes.ARRAY(DataTypes.INT())),
                DataTypes.FIELD("values", DataTypes.ARRAY(DataTypes.DOUBLE())),
            ]
        )

    @classmethod
    def module(cls):
        return "pyflink.ml.core.linalg"

    def serialize(self, obj):
        if isinstance(obj, SparseVector):
            indices = [int(i) for i in obj._indices]
            values = [float(v) for v in obj._values]
            return 0, obj.size(), indices, values
        elif isinstance(obj, DenseVector):
            values = [float(v) for v in obj._values]
            return 1, None, None, values
        else:
            raise TypeError("Cannot serialize %r of type %r" % (obj, type(obj)))
{code}

{code:python}
self.t_env.from_elements([
    (Vectors.dense([1, 2, 3, 4]), 0., 1.),
    (Vectors.dense([2, 2, 3, 4]), 0., 2.),
    (Vectors.dense([3, 2, 3, 4]), 0., 3.),
    (Vectors.dense([4, 2, 3, 4]), 0., 4.),
    (Vectors.dense([5, 2, 3, 4]), 0., 5.),
    (Vectors.dense([11, 2, 3, 4]), 1., 1.),
    (Vectors.dense([12, 2, 3, 4]), 1., 2.),
    (Vectors.dense([13, 2, 3, 4]), 1., 3.),
    (Vectors.dense([14, 2, 3, 4]), 1., 4.),
    (Vectors.dense([15, 2, 3, 4]), 1., 5.),
], DataTypes.ROW([
    DataTypes.FIELD("features", VectorUDT()),
    DataTypes.FIELD("label", DataTypes.DOUBLE()),
    DataTypes.FIELD("weight", DataTypes.DOUBLE())]))
{code}

-- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (FLINK-25855) DefaultDeclarativeSlotPool rejects offered slots when the job is restarting
Till Rohrmann created FLINK-25855: - Summary: DefaultDeclarativeSlotPool rejects offered slots when the job is restarting Key: FLINK-25855 URL: https://issues.apache.org/jira/browse/FLINK-25855 Project: Flink Issue Type: Sub-task Components: Runtime / Coordination Affects Versions: 1.14.3, 1.15.0 Reporter: Till Rohrmann

The {{DefaultDeclarativeSlotPool}} rejects offered slots if the job is currently restarting. The problem is that in case of a job restart, the scheduler sets the required resources to zero. Hence, all offered slots will be rejected. This is a problem for local recovery because rejected slots will be freed by the {{TaskExecutor}} and thereby all local state will be deleted. Hence, in order to properly support local recovery, we need to handle this situation somehow. I do see different options here:

h3. Accept excess slots
Accepting excess slots means that the {{DefaultDeclarativeSlotPool}} accepts slots which exceed the currently required set of slots.
Advantages:
* Easy to implement
Disadvantages:
* Offered slots that are not really needed will only be freed after the idle slot timeout. This means that some resources might be left unused for some time.

h3. Let DefaultDeclarativeSlotPool accept excess slots when the job is restarting
Here the idea is to only accept excess slots when the job is currently restarting. This will require that the scheduler tells the {{DefaultDeclarativeSlotPool}} about the restarting state.
Advantages:
* We would only accept excess slots for the duration of the restart
Disadvantages:
* We are complicating the semantics of the {{DefaultDeclarativeSlotPool}}. Moreover, we are introducing additional signals that communicate the restarting state to the pool.

h3. Don't immediately free slots on the TaskExecutor when they are rejected
Instead of freeing the slot immediately on the {{TaskExecutor}} after it is rejected, we could retry for some time and only free the slot after some timeout.
Advantages: * No changes on the JobMaster side needed. Disadvantages: * Complication of the slot lifecycle on the {{TaskExecutor}} * Unneeded slots are not made available for other jobs as fast as possible -- This message was sent by Atlassian Jira (v8.20.1#820001)
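The second option from the ticket, accepting excess slots only while the job is restarting, boils down to a small acceptance policy. The function below is a toy sketch of that decision, not Flink's actual slot pool code:

```python
def should_accept_slot(required_slots, held_slots, is_restarting):
    """Toy policy mirroring the ticket's second option (illustrative only):
    excess slots are tolerated only while the job is restarting."""
    if held_slots < required_slots:
        return True  # still below declared requirements: always accept
    return is_restarting  # excess slot: keep it only during a restart


# During a restart the scheduler drops requirements to zero, yet offered
# slots are kept, so local state on the TaskExecutor is not deleted.
assert should_accept_slot(required_slots=0, held_slots=0, is_restarting=True)
# Outside a restart the same excess offer is rejected as before.
assert not should_accept_slot(required_slots=0, held_slots=0, is_restarting=False)
```

The trade-off the ticket describes shows up directly in the extra `is_restarting` parameter: the pool's semantics stay simple only if someone (the scheduler) signals the restarting state.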
[jira] [Created] (FLINK-25854) Annotate legacy FileSource as deprecated
Martijn Visser created FLINK-25854: -- Summary: Annotate legacy FileSource as deprecated Key: FLINK-25854 URL: https://issues.apache.org/jira/browse/FLINK-25854 Project: Flink Issue Type: Technical Debt Components: Connectors / FileSystem Reporter: Martijn Visser Flink has a new FileSystem connector, which is compatible with the new Source and Sink APIs. However, there are still references and implementations using the legacy FileSource, like {{ContinuousFileReaderOperator}} and {{FilesystemTableSource}}. We should make sure that we annotate these as deprecated so they can be removed in a next Flink release. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (FLINK-25853) Annotate current JDBC connector as deprecated to prepare for new JDBC connector
Martijn Visser created FLINK-25853: -- Summary: Annotate current JDBC connector as deprecated to prepare for new JDBC connector Key: FLINK-25853 URL: https://issues.apache.org/jira/browse/FLINK-25853 Project: Flink Issue Type: Technical Debt Components: Connectors / JDBC Reporter: Martijn Visser The current JDBC connector still uses the old interfaces (SourceFunction and SinkFunction). In a next release of Flink, we want to replace the current JDBC connector with a successor that does use the new Source and Sink APIs. In order to be able to remove the old JDBC connector in a next release, we need to annotate it as deprecated so it can be removed in a future one. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (FLINK-25852) Annotate SourceFunction and SinkFunction as deprecated
Martijn Visser created FLINK-25852: -- Summary: Annotate SourceFunction and SinkFunction as deprecated Key: FLINK-25852 URL: https://issues.apache.org/jira/browse/FLINK-25852 Project: Flink Issue Type: Technical Debt Components: Connectors / Common Reporter: Martijn Visser The SourceFunction and SinkFunction should not be used by connectors anymore, who should be using the new Source API [(See FLIP-27|https://cwiki.apache.org/confluence/display/FLINK/FLIP-27%3A+Refactor+Source+Interface] + [Sources documentation)|https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/sources/] and/or Sink API [(See FLIP-143|https://cwiki.apache.org/confluence/display/FLINK/FLIP-143%3A+Unified+Sink+API] or [FLIP-171)|https://cwiki.apache.org/confluence/display/FLINK/FLIP-171%3A+Async+Sink]. Therefore we should properly annotate these as deprecated so they can be removed in future versions. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (FLINK-25851) CassandraConnectorITCase.testRetrialAndDropTables shows table already exists errors on AZP
Etienne Chauchot created FLINK-25851: Summary: CassandraConnectorITCase.testRetrialAndDropTables shows table already exists errors on AZP Key: FLINK-25851 URL: https://issues.apache.org/jira/browse/FLINK-25851 Project: Flink Issue Type: Bug Components: Connectors / Cassandra Affects Versions: 1.15.0 Reporter: Etienne Chauchot https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=30050&view=logs&j=d44f43ce-542c-597d-bf94-b0718c71e5e8&s=ae4f8708-9994-57d3-c2d7-b892156e7812&t=ed165f3f-d0f6-524b-5279-86f8ee7d0e2d&l=11999 -- This message was sent by Atlassian Jira (v8.20.1#820001)
[RESULT][VOTE] FLIP-203: Incremental savepoints
Hi,

FLIP-203 [1] has been accepted. There were 4 binding votes and 2 non-binding in favor. None against.

Binding:
Till Rohrmann
Dawid Wysakowicz
Konstantin Knauf
Yu Li

Non-binding:
David Moravek
Anton Kalashnikov

Best,
Piotrek

[1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-203%3A+Incremental+savepoints
Re: [VOTE] FLIP-203: Incremental savepoints
Hi,

Thank you for casting your votes. I have forgotten to set the voting period, but let's implicitly assume it was 3 days from the creation of the voting thread.

FLIP-203 [1] has been accepted. There were 4 binding votes and 2 non-binding in favor. None against.

Best,
Piotrek

[1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-203%3A+Incremental+savepoints

On Wed, 26 Jan 2022 at 18:27, Anton Kalashnikov wrote:
> +1 (non-binding)
>
> Thanks Piotr.
> --
> Best regards,
> Anton Kalashnikov
>
> On 26.01.2022 11:21, David Anderson wrote:
> > +1 (non-binding)
> >
> > I'm pleased to see this significant improvement coming along, as well as
> > the effort made in the FLIP to document what is and isn't supported (and
> > where ??? remain).
> >
> > On Wed, Jan 26, 2022 at 10:58 AM Yu Li wrote:
> >
> >> +1 (binding)
> >>
> >> Thanks for driving this Piotr! Just one more (belated) suggestion: in the
> >> "Checkpoint vs savepoint guarantees" section, there are still question
> >> marks scattered in the table, and I suggest putting all TODO works into the
> >> "Limitations" section, or adding a "Future Work" section, for easier later
> >> tracking.
> >>
> >> Best Regards,
> >> Yu
> >>
> >> On Mon, 24 Jan 2022 at 18:48, Konstantin Knauf wrote:
> >>
> >>> Thanks, Piotr. Proposal looks good.
> >>>
> >>> +1 (binding)
> >>>
> >>> On Mon, Jan 24, 2022 at 11:20 AM David Morávek wrote:
> >>>
> >>>> +1 (non-binding)
> >>>>
> >>>> Best,
> >>>> D.
> >>>>
> >>>> On Mon, Jan 24, 2022 at 10:54 AM Dawid Wysakowicz <dwysakow...@apache.org> wrote:
> >>>>
> >>>>> +1 (binding)
> >>>>>
> >>>>> Best,
> >>>>> Dawid
> >>>>>
> >>>>> On 24/01/2022 09:56, Piotr Nowojski wrote:
> >>>>>> Hi,
> >>>>>>
> >>>>>> As there seems to be no further questions about the FLIP-203 [1] I would
> >>>>>> propose to start a voting thread for it.
> >>>>>>
> >>>>>> For me there are still two unanswered questions, whether we want to support
> >>>>>> schema evolution and State Processor API with native format snapshots or
> >>>>>> not. But I would propose to tackle them as follow ups, since those are
> >>>>>> pre-existing issues of the native format checkpoints, and could be done
> >>>>>> completely independently of providing the native format support in
> >>>>>> savepoints.
> >>>>>>
> >>>>>> Best,
> >>>>>> Piotrek
> >>>>>>
> >>>>>> [1]
> >>>>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-203%3A+Incremental+savepoints
> >>>
> >>> --
> >>> Konstantin Knauf
> >>>
> >>> https://twitter.com/snntrable
> >>>
> >>> https://github.com/knaufk
[jira] [Created] (FLINK-25850) Consider notifying nested state backend about checkpoint abortion
Roman Khachatryan created FLINK-25850: - Summary: Consider notifying nested state backend about checkpoint abortion Key: FLINK-25850 URL: https://issues.apache.org/jira/browse/FLINK-25850 Project: Flink Issue Type: Sub-task Components: Runtime / State Backends Reporter: Roman Khachatryan Fix For: 1.16.0

The notification is optional, but some backends might do GC upon receiving it.
{code}
These notifications are "best effort", meaning they can sometimes be skipped. This method is very rarely necessary to implement.
{code}
The usefulness is also limited by:
- low probability of notification reaching the backend because of the difference in intervals and cleanup on checkpoint completion
- low probability of backends making good use of it because it's delivered after the snapshot is done; and backends must be resilient to missing notifications

There is added complexity and risk (such as FLINK-25816). Probably, the complexity can be eliminated by extracting some Notifier class from ChangelogStateBackend.

cc: [~yunta]

-- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (FLINK-25849) Differentiate TaskManager sessions
Till Rohrmann created FLINK-25849: - Summary: Differentiate TaskManager sessions Key: FLINK-25849 URL: https://issues.apache.org/jira/browse/FLINK-25849 Project: Flink Issue Type: Sub-task Components: Runtime / Coordination Affects Versions: 1.15.0 Reporter: Till Rohrmann Assignee: Till Rohrmann

With the introduction of a configurable {{ResourceID}} for {{TaskManager}} processes, it can happen that a {{TaskManager}} process is restarted with the same {{ResourceID}}. When it then tries to register at the {{JobMaster}}, the {{JobMaster}} won't recognize it as a new instance because it only compares the {{ResourceID}}. As a consequence, the {{JobMaster}} thinks that this is a duplicate registration and ignores it. It would be better if the {{TaskManager}} sent a session id with the registration that could then be used to decide whether a new instance is trying to register at the {{JobMaster}} and, therefore, the old one needs to be disconnected, or whether the registration attempt is a duplicate.

-- This message was sent by Atlassian Jira (v8.20.1#820001)
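The session-id idea proposed in the ticket can be sketched as a small lookup keyed by {{ResourceID}}. This is a toy illustration of the registration check, not Flink's actual JobMaster code:

```python
def is_new_instance(known, resource_id, session_id):
    """Toy registration check (illustrative only): a registration comes from a
    new TaskManager instance if its session id differs, even when the
    ResourceID is reused by a restarted process."""
    if resource_id not in known:
        known[resource_id] = session_id
        return True  # first registration for this ResourceID
    if known[resource_id] != session_id:
        known[resource_id] = session_id
        return True  # same ResourceID, new session: old instance must be disconnected
    return False  # duplicate registration attempt from the same instance


known = {}
assert is_new_instance(known, "tm-1", "session-a")      # first registration
assert not is_new_instance(known, "tm-1", "session-a")  # duplicate, ignored
assert is_new_instance(known, "tm-1", "session-b")      # restarted process, same ResourceID
```

Comparing the {{ResourceID}} alone cannot distinguish the second and third cases above; the session id is exactly the missing bit of information.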
Re: [ANNOUNCE] flink-shaded 15.0 released
Thanks a lot for driving this release Chesnay!

Cheers,
Till

On Thu, Jan 27, 2022 at 10:33 AM Francesco Guardiani <france...@ververica.com> wrote:
> Thanks Chesnay for this!
>
> FG
>
> On Mon, Jan 24, 2022 at 11:20 AM David Morávek wrote:
>
> > That's a great news Chesnay, thanks for driving this! This should unblock
> > some ongoing Flink efforts +1
> >
> > Best,
> > D.
> >
> > On Mon, Jan 24, 2022 at 10:58 AM Chesnay Schepler wrote:
> >
> > > Hello everyone,
> > >
> > > we got a new flink-shaded release, with several nifty things:
> > >
> > > * updated version for ASM, required for Java 17
> > > * jackson extensions for optionals/datetime, which will be used by the
> > > Table API (and maybe REST API)
> > > * a relocated version of swagger, finally unblocking the merge of our
> > > experimental swagger spec
> > > * updated version for Netty, providing a proper fix for FLINK-24197
Re: [ANNOUNCE] flink-shaded 15.0 released
Thanks Chesnay for this!

FG

On Mon, Jan 24, 2022 at 11:20 AM David Morávek wrote:
> That's a great news Chesnay, thanks for driving this! This should unblock
> some ongoing Flink efforts +1
>
> Best,
> D.
>
> On Mon, Jan 24, 2022 at 10:58 AM Chesnay Schepler wrote:
>
> > Hello everyone,
> >
> > we got a new flink-shaded release, with several nifty things:
> >
> > * updated version for ASM, required for Java 17
> > * jackson extensions for optionals/datetime, which will be used by the
> > Table API (and maybe REST API)
> > * a relocated version of swagger, finally unblocking the merge of our
> > experimental swagger spec
> > * updated version for Netty, providing a proper fix for FLINK-24197
[jira] [Created] (FLINK-25848) [FLIP-171] KDS Sink does not fast fail when invalid configuration supplied
Danny Cranmer created FLINK-25848:
-------------------------------------

             Summary: [FLIP-171] KDS Sink does not fast fail when invalid configuration supplied
                 Key: FLINK-25848
                 URL: https://issues.apache.org/jira/browse/FLINK-25848
             Project: Flink
          Issue Type: Sub-task
          Components: Connectors / Kinesis
            Reporter: Danny Cranmer
             Fix For: 1.15.0

h4. Description
The KDS sink does not fail the job when an invalid configuration is provided.

h4. Reproduction Steps
- Start a job using an Async Sink implementation, for example KDS
- Specify an invalid credential provider configuration, for example:
{code}
CREATE TABLE orders (
  `code` STRING,
  `quantity` BIGINT
) WITH (
  'connector' = 'kinesis',
  'stream' = 'source',
  'aws.credentials.provider' = 'ASSUME_ROLE',
  'aws.region' = 'us-east-1',
  'format' = 'json'
);
{code}

h4. Actual Results
- The sink operator transitions to running and consistently retries:
{code}
2022-01-27 08:29:31,582 WARN org.apache.flink.connector.kinesis.sink.KinesisDataStreamsSinkWriter [] - KDS Sink failed to persist 5 entries to KDS
java.util.concurrent.CompletionException: org.apache.flink.kinesis.shaded.software.amazon.awssdk.services.sts.model.StsException: 2 validation errors detected: Value null at 'roleArn' failed to satisfy constraint: Member must not be null; Value null at 'roleSessionName' failed to satisfy constraint: Member must not be null (Service: Sts, Status Code: 400, Request ID: af8f2176-aafa-4230-805b-72d90e418810, Extended Request ID: null)
	at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:331) ~[?:?]
	at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:346) ~[?:?]
	at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:870) ~[?:?]
	at java.util.concurrent.CompletableFuture.uniWhenCompleteStage(CompletableFuture.java:883) ~[?:?]
	at java.util.concurrent.CompletableFuture.whenComplete(CompletableFuture.java:2251) ~[?:?]
	at org.apache.flink.kinesis.shaded.software.amazon.awssdk.services.kinesis.DefaultKinesisAsyncClient.putRecords(DefaultKinesisAsyncClient.java:2112) ~[blob_p-a167cef2d4088e526c4e209bb2b4e5a83f8ab359-fc0375a6ef7545bc316b770d110d4564:1.15-SNAPSHOT]
	at org.apache.flink.connector.kinesis.sink.KinesisDataStreamsSinkWriter.submitRequestEntries(KinesisDataStreamsSinkWriter.java:122) ~[blob_p-a167cef2d4088e526c4e209bb2b4e5a83f8ab359-fc0375a6ef7545bc316b770d110d4564:1.15-SNAPSHOT]
	at org.apache.flink.connector.base.sink.writer.AsyncSinkWriter.flush(AsyncSinkWriter.java:311) ~[flink-connector-files-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
	at org.apache.flink.connector.base.sink.writer.AsyncSinkWriter.prepareCommit(AsyncSinkWriter.java:391) ~[flink-connector-files-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
	at org.apache.flink.streaming.runtime.operators.sink.SinkOperator.endInput(SinkOperator.java:192) ~[flink-dist-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
	at org.apache.flink.streaming.runtime.tasks.StreamOperatorWrapper.endOperatorInput(StreamOperatorWrapper.java:96) ~[flink-dist-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
	at org.apache.flink.streaming.runtime.tasks.RegularOperatorChain.endInput(RegularOperatorChain.java:97) ~[flink-dist-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
	at org.apache.flink.streaming.runtime.io.StreamOneInputProcessor.processInput(StreamOneInputProcessor.java:68) ~[flink-dist-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
	at org.apache.flink.streaming.runtime.tasks.StreamTask.processInput(StreamTask.java:517) ~[flink-dist-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
	at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:203) ~[flink-dist-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
	at org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:802) ~[flink-dist-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
	at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:751) ~[flink-dist-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
	at org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java:948) [flink-dist-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
	at org.apache.flink.runtime.taskmanager.Task.restoreAndInvoke(Task.java:927) [flink-dist-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
	at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:741) [flink-dist-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
	at org.apache.flink.runtime.taskmanager.Task.run(Task.java:563) [flink-dist-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
	at java.lang.Thread.run(Thread.java:829) [?:?]
{code}

h4. Expected Results
- The job fails fast

h4. Suggested Resolution
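The ticket leaves the resolution open. One possible direction is to validate provider-specific options eagerly, at sink construction time, so a misconfigured job fails before the operator starts retrying. The sketch below is only illustrative: the class, method, and option key names are hypothetical and are not the actual Flink Kinesis connector API.

```java
import java.util.Map;

// Hypothetical sketch of eager configuration validation: if the credentials
// provider is ASSUME_ROLE, require its mandatory options up front and throw
// immediately, instead of surfacing the missing values as runtime STS errors.
// All option keys here are illustrative, not the connector's real keys.
class CredentialConfigValidator {

    static void validate(Map<String, String> options) {
        String provider = options.get("aws.credentials.provider");
        if ("ASSUME_ROLE".equals(provider)) {
            requireKey(options, "aws.credentials.role.arn");
            requireKey(options, "aws.credentials.role.sessionName");
        }
    }

    private static void requireKey(Map<String, String> options, String key) {
        String value = options.get(key);
        if (value == null || value.isEmpty()) {
            // Fail fast with a message naming the offending option.
            throw new IllegalArgumentException(
                    "Missing required option '" + key
                            + "' for the ASSUME_ROLE credentials provider");
        }
    }
}
```

Called from the sink builder (or table factory), this would turn the retry loop in the log above into an immediate job submission failure.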
[jira] [Created] (FLINK-25847) KubernetesHighAvailabilityRecoverFromSavepointITCase.testRecoverFromSavepoint failed on azure
Yun Gao created FLINK-25847:
-------------------------------

             Summary: KubernetesHighAvailabilityRecoverFromSavepointITCase.testRecoverFromSavepoint failed on azure
                 Key: FLINK-25847
                 URL: https://issues.apache.org/jira/browse/FLINK-25847
             Project: Flink
          Issue Type: Bug
          Components: Deployment / Kubernetes
    Affects Versions: 1.15.0
            Reporter: Yun Gao

{code:java}
2022-01-27T06:08:57.7214748Z Jan 27 06:08:57 [INFO] Running org.apache.flink.kubernetes.highavailability.KubernetesHighAvailabilityRecoverFromSavepointITCase
2022-01-27T06:10:23.2568324Z Jan 27 06:10:23 [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 85.553 s <<< FAILURE! - in org.apache.flink.kubernetes.highavailability.KubernetesHighAvailabilityRecoverFromSavepointITCase
2022-01-27T06:10:23.2572289Z Jan 27 06:10:23 [ERROR] org.apache.flink.kubernetes.highavailability.KubernetesHighAvailabilityRecoverFromSavepointITCase.testRecoverFromSavepoint  Time elapsed: 84.078 s  <<< ERROR!
2022-01-27T06:10:23.2573945Z Jan 27 06:10:23 java.util.concurrent.TimeoutException
2022-01-27T06:10:23.2574625Z Jan 27 06:10:23 	at java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1784)
2022-01-27T06:10:23.2575381Z Jan 27 06:10:23 	at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1928)
2022-01-27T06:10:23.2576428Z Jan 27 06:10:23 	at org.apache.flink.kubernetes.highavailability.KubernetesHighAvailabilityRecoverFromSavepointITCase.testRecoverFromSavepoint(KubernetesHighAvailabilityRecoverFromSavepointITCase.java:104)
2022-01-27T06:10:23.2578437Z Jan 27 06:10:23 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
2022-01-27T06:10:23.2579141Z Jan 27 06:10:23 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
2022-01-27T06:10:23.2579893Z Jan 27 06:10:23 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
2022-01-27T06:10:23.2594686Z Jan 27 06:10:23 	at java.lang.reflect.Method.invoke(Method.java:498)
2022-01-27T06:10:23.2595622Z Jan 27 06:10:23 	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
2022-01-27T06:10:23.2596397Z Jan 27 06:10:23 	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
2022-01-27T06:10:23.2597158Z Jan 27 06:10:23 	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
2022-01-27T06:10:23.2597900Z Jan 27 06:10:23 	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
2022-01-27T06:10:23.2598630Z Jan 27 06:10:23 	at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
2022-01-27T06:10:23.2599335Z Jan 27 06:10:23 	at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:54)
2022-01-27T06:10:23.2600044Z Jan 27 06:10:23 	at org.apache.flink.util.TestNameProvider$1.evaluate(TestNameProvider.java:45)
2022-01-27T06:10:23.2600736Z Jan 27 06:10:23 	at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:61)
2022-01-27T06:10:23.2601408Z Jan 27 06:10:23 	at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
2022-01-27T06:10:23.2602124Z Jan 27 06:10:23 	at org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
2022-01-27T06:10:23.2602831Z Jan 27 06:10:23 	at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
2022-01-27T06:10:23.2603531Z Jan 27 06:10:23 	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
2022-01-27T06:10:23.2604270Z Jan 27 06:10:23 	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
2022-01-27T06:10:23.2604975Z Jan 27 06:10:23 	at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
2022-01-27T06:10:23.2605641Z Jan 27 06:10:23 	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
2022-01-27T06:10:23.2606313Z Jan 27 06:10:23 	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
2022-01-27T06:10:23.2607713Z Jan 27 06:10:23 	at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
2022-01-27T06:10:23.2608497Z Jan 27 06:10:23 	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
2022-01-27T06:10:23.2609049Z Jan 27 06:10:23 	at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:54)
2022-01-27T06:10:23.2609623Z Jan 27 06:10:23 	at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:54)
2022-01-27T06:10:23.2610165Z Jan 27 06:10:23 	at org.junit.rules.RunRules.evaluate(RunRules.java:20)
2022-01-27T06:10:23.2610700Z Jan 27 06:10:23 	at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
2022-01-27T06:10:23.2611621Z Jan 27 06:10:23 	at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
2022-01-27T06:10:23.2612145Z Jan 27 06:10
{code}
[jira] [Created] (FLINK-25846) Async Sink does not gracefully shutdown on Cancel
Danny Cranmer created FLINK-25846:
-------------------------------------

             Summary: Async Sink does not gracefully shutdown on Cancel
                 Key: FLINK-25846
                 URL: https://issues.apache.org/jira/browse/FLINK-25846
             Project: Flink
          Issue Type: Sub-task
            Reporter: Danny Cranmer

h4. Description
The Async Sink does not react gracefully to the cancellation signal.

h4. Reproduction Steps
- Start a job using an Async Sink implementation, for example KDS
- Navigate to the Flink Dashboard
- Click Job > Cancel

h4. Actual Results
- The sink operator is stuck in Cancelling, retrying

h4. Expected Results
- The sink operator closes

h4. Suggested Resolution
- The Async Sink should treat {{InterruptedException}} as a stop signal
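The suggested resolution can be sketched as a retry loop that distinguishes the task thread's interrupt (cancellation) from ordinary transient failures. This is only an illustration of the pattern; the class and method names below are hypothetical and do not reflect the actual {{AsyncSinkWriter}} implementation.

```java
// Sketch: a flush/retry loop that treats InterruptedException as a stop
// signal. Transient failures are retried; an interrupt restores the thread's
// interrupt flag and exits immediately, so cancellation is not stuck retrying.
class InterruptAwareRetryLoop {

    interface Flusher {
        // One flush attempt; may throw a transient error or InterruptedException.
        void flushOnce() throws Exception;
    }

    /** Returns true if the flush eventually succeeded, false if it was cancelled or exhausted retries. */
    static boolean flushWithRetries(Flusher flusher, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                flusher.flushOnce();
                return true;
            } catch (InterruptedException e) {
                // Cancellation signal: restore the flag for callers and stop retrying.
                Thread.currentThread().interrupt();
                return false;
            } catch (Exception e) {
                // Transient failure: fall through and retry.
            }
        }
        return false;
    }
}
```

The key design point is that the interrupt is re-asserted via {{Thread.currentThread().interrupt()}} before returning, so code further up the call stack can still observe the cancellation.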
[jira] [Created] (FLINK-25845) Expose plan via SQL COMPILE / EXECUTE PLAN
Timo Walther created FLINK-25845:
------------------------------------

             Summary: Expose plan via SQL COMPILE / EXECUTE PLAN
                 Key: FLINK-25845
                 URL: https://issues.apache.org/jira/browse/FLINK-25845
             Project: Flink
          Issue Type: Sub-task
          Components: Table SQL / API
            Reporter: Timo Walther

This includes:

{{EXECUTE PLAN '/mydir/plan.json';}}

as mentioned in:
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=191336489#FLIP190:SupportVersionUpgradesforTableAPI&SQLPrograms-EXECUTE
and
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=191336489#FLIP190:SupportVersionUpgradesforTableAPI&SQLPrograms-COMPILE
with the option {{table.plan.force-recompile}}.
[jira] [Created] (FLINK-25844) Expose plan via StatementSet.compilePlan
Timo Walther created FLINK-25844:
------------------------------------

             Summary: Expose plan via StatementSet.compilePlan
                 Key: FLINK-25844
                 URL: https://issues.apache.org/jira/browse/FLINK-25844
             Project: Flink
          Issue Type: Sub-task
          Components: Table SQL / API
            Reporter: Timo Walther

https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=191336489#FLIP190:SupportVersionUpgradesforTableAPI&SQLPrograms-StatementSets

It should be marked as {{@Experimental}}. We should check whether {{StreamStatementSet}} throws a helpful exception for DataStreams that we don't support yet.
[jira] [Created] (FLINK-25843) Expose plan via Table.compilePlan/TableEnvironment.fromPlan
Timo Walther created FLINK-25843:
------------------------------------

             Summary: Expose plan via Table.compilePlan/TableEnvironment.fromPlan
                 Key: FLINK-25843
                 URL: https://issues.apache.org/jira/browse/FLINK-25843
             Project: Flink
          Issue Type: Sub-task
          Components: Table SQL / API
            Reporter: Timo Walther

https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=191336489#FLIP190:SupportVersionUpgradesforTableAPI&SQLPrograms-SQLQueryandTableQuery

The first version should be marked as {{@Experimental}}. We should also verify the end-to-end story for helpful exceptions, e.g. when coming from a DataStream ({{fromDataStream}}), which is currently not supported.
[jira] [Created] (FLINK-25842) [v2] FLIP-158: Generalized incremental checkpoints
Roman Khachatryan created FLINK-25842:
-----------------------------------------

             Summary: [v2] FLIP-158: Generalized incremental checkpoints
                 Key: FLINK-25842
                 URL: https://issues.apache.org/jira/browse/FLINK-25842
             Project: Flink
          Issue Type: Improvement
          Components: Runtime / State Backends
            Reporter: Roman Khachatryan
             Fix For: 1.16.0

Umbrella ticket for the 2nd iteration of [FLIP-158: Generalized incremental checkpoints|https://cwiki.apache.org/confluence/display/FLINK/FLIP-158%3A+Generalized+incremental+checkpoints]
[jira] [Created] (FLINK-25841) Expose plan via TableEnvironment.compilePlanSql/executePlan
Timo Walther created FLINK-25841:
------------------------------------

             Summary: Expose plan via TableEnvironment.compilePlanSql/executePlan
                 Key: FLINK-25841
                 URL: https://issues.apache.org/jira/browse/FLINK-25841
             Project: Flink
          Issue Type: Sub-task
          Components: Table SQL / API
            Reporter: Timo Walther
            Assignee: Francesco Guardiani

- Introduce the helper API class [CompiledPlan|https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=191336489#FLIP190:SupportVersionUpgradesforTableAPI&SQLPrograms-CompiledPlan]
- Allow [single SQL statements|https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=191336489#FLIP190:SupportVersionUpgradesforTableAPI&SQLPrograms-SingleSQLStatements] to be generated and restored

Mark all interfaces as {{@Experimental}}.
[jira] [Created] (FLINK-25840) Add semantic test support in the connector testframe
Hang Ruan created FLINK-25840:
---------------------------------

             Summary: Add semantic test support in the connector testframe
                 Key: FLINK-25840
                 URL: https://issues.apache.org/jira/browse/FLINK-25840
             Project: Flink
          Issue Type: Sub-task
          Components: Tests
    Affects Versions: 1.15.0
            Reporter: Hang Ruan