[jira] [Comment Edited] (FLINK-34043) Remove deprecated Sink V2 interfaces
[ https://issues.apache.org/jira/browse/FLINK-34043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17878995#comment-17878995 ] Peter Vary edited comment on FLINK-34043 at 9/3/24 7:23 PM: [~guoweijie]: It would be nice if you could remove the deprecated interfaces/methods. Thanks, Peter was (Author: pvary): [~guoweijie]: It would be nice if you could remove the deprecated interfaces/methods? > Remove deprecated Sink V2 interfaces > > > Key: FLINK-34043 > URL: https://issues.apache.org/jira/browse/FLINK-34043 > Project: Flink > Issue Type: Sub-task > Components: API / DataStream >Reporter: Peter Vary >Assignee: Peter Vary >Priority: Major > > In Flink 1.20.0 we should remove the interfaces deprecated by FLINK-33973 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-34043) Remove deprecated Sink V2 interfaces
[ https://issues.apache.org/jira/browse/FLINK-34043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17878995#comment-17878995 ] Peter Vary commented on FLINK-34043: [~guoweijie]: It would be nice if you could remove the deprecated interfaces/methods? > Remove deprecated Sink V2 interfaces > > > Key: FLINK-34043 > URL: https://issues.apache.org/jira/browse/FLINK-34043 > Project: Flink > Issue Type: Sub-task > Components: API / DataStream >Reporter: Peter Vary >Assignee: Peter Vary >Priority: Major > > In Flink 1.20.0 we should remove the interfaces deprecated by FLINK-33973 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-34840) [3.1][pipeline-connectors] Add Implementation of DataSink in Iceberg.
[ https://issues.apache.org/jira/browse/FLINK-34840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17849623#comment-17849623 ] Peter Vary commented on FLINK-34840: If you need help with Iceberg reviews, feel free to ping me. > [3.1][pipeline-connectors] Add Implementation of DataSink in Iceberg. > - > > Key: FLINK-34840 > URL: https://issues.apache.org/jira/browse/FLINK-34840 > Project: Flink > Issue Type: Improvement > Components: Flink CDC >Reporter: Flink CDC Issue Import >Priority: Major > Labels: github-import > > ### Search before asking > - [X] I searched in the > [issues|https://github.com/ververica/flink-cdc-connectors/issues] and found > nothing similar. > ### Motivation > Add pipeline sink Implementation for https://github.com/apache/iceberg. > ### Solution > _No response_ > ### Alternatives > _No response_ > ### Anything else? > _No response_ > ### Are you willing to submit a PR? > - [ ] I'm willing to submit a PR! > Imported from GitHub > Url: https://github.com/apache/flink-cdc/issues/2863 > Created by: [lvyanquan|https://github.com/lvyanquan] > Labels: enhancement, > Created at: Wed Dec 13 14:37:54 CST 2023 > State: open -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-35051) Weird priorities when processing unaligned checkpoints
[ https://issues.apache.org/jira/browse/FLINK-35051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17835142#comment-17835142 ] Peter Vary commented on FLINK-35051: FLINK-34704 is one of the ways this issue materializes > Weird priorities when processing unaligned checkpoints > -- > > Key: FLINK-35051 > URL: https://issues.apache.org/jira/browse/FLINK-35051 > Project: Flink > Issue Type: Improvement > Components: Runtime / Checkpointing, Runtime / Network, Runtime / > Task >Affects Versions: 1.16.3, 1.17.2, 1.19.0, 1.18.1 >Reporter: Piotr Nowojski >Priority: Major > > While looking through the code I noticed that `StreamTask` is processing > unaligned checkpoints in a strange order/priority. The end result is that > unaligned checkpoint `Start Delay` / triggering checkpoints in `StreamTask` > can be unnecessarily delayed by other mailbox actions in the system, like for > example: > * processing time timers > * `AsyncWaitOperator` results > * ... > An incoming UC barrier is treated as a priority event by the network stack (it > will be polled from the input before anything else). This is what we want, > but polling elements from the network stack has lower priority than processing > enqueued mailbox actions. > Secondly, if an AC barrier times out to UC, that's done via a mailbox action, but > this mailbox action is also not prioritised in any way, so other mailbox > actions could be unnecessarily executed first. > On top of that there is a clash of three separate concepts here: > # Mailbox priority: yieldToDownstream - in a sense the reverse of what we > would like to have for triggering checkpoints, but that only kicks in on #yield() > calls, where it is actually correct that an operator in the middle of execution > cannot yield to a checkpoint - it should only yield to downstream. > # Control mails in the mailbox executor - cancellation is done via these; they > bypass the whole mailbox queue. > # Priority events in the network stack. > It's unfortunate that 1. vs 3. has a naming clash, as the priority name is used > in both, and the highest-priority network event containing the UC barrier > actually has the lowest mailbox priority when executed via the mailbox. > The control mails mechanism is a kind of priority mail executed out of order, > but it doesn't generalise well for use in checkpointing. > This whole thing should be re-worked at some point. Ideally, what we would > like to have is that: > * the mail to convert AC barriers to UC > * polling the UC barrier from the network input > * the checkpoint trigger via RPC for source tasks > should be processed first, with the exception of yieldToDownstream, where > the current mailbox priorities should be adhered to. -- This message was sent by Atlassian Jira (v8.20.10#820010)
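The priority inversion described above can be sketched with a toy mailbox model. This is purely illustrative - the class, mail names, and priority values below are made up and are not Flink's actual mailbox implementation - but it shows how a mail carrying the highest-priority network event can still carry the lowest mailbox priority and therefore run last:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class MailboxToy {

    // A toy "mail": a name plus a mailbox priority (higher value runs earlier here).
    record Mail(String name, int mailboxPriority) {}

    static List<String> processingOrder() {
        // Drain highest mailbox priority first (illustrative ordering only).
        PriorityQueue<Mail> mailbox =
                new PriorityQueue<>(Comparator.comparingInt(Mail::mailboxPriority).reversed());

        mailbox.add(new Mail("processing-time timer", 1));
        mailbox.add(new Mail("AsyncWaitOperator result", 1));
        // The UC barrier is the *highest*-priority event in the network stack,
        // yet as a mail it carries the *lowest* mailbox priority - the clash.
        mailbox.add(new Mail("poll UC barrier from network input", 0));

        List<String> order = new ArrayList<>();
        while (!mailbox.isEmpty()) {
            order.add(mailbox.poll().name());
        }
        return order;
    }

    public static void main(String[] args) {
        List<String> order = processingOrder();
        // The UC-barrier mail is executed after everything else.
        System.out.println(order.get(order.size() - 1)); // prints: poll UC barrier from network input
    }
}
```

In other words, network-stack priority and mailbox priority are two independent orderings, which is exactly the naming clash the issue describes.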
[jira] [Commented] (FLINK-34042) Update the documentation for Sink V2
[ https://issues.apache.org/jira/browse/FLINK-34042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17814739#comment-17814739 ] Peter Vary commented on FLINK-34042: Currently there is no documentation for the Sink V2 API. It would be good to create one, but this is not a blocker > Update the documentation for Sink V2 > > > Key: FLINK-34042 > URL: https://issues.apache.org/jira/browse/FLINK-34042 > Project: Flink > Issue Type: Sub-task >Reporter: Peter Vary >Assignee: Peter Vary >Priority: Major > > Check the documentation and update the Sink V2 API usages whenever it is > needed -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Closed] (FLINK-33328) Allow TwoPhaseCommittingSink WithPreCommitTopology to alter the type of the Committable
[ https://issues.apache.org/jira/browse/FLINK-33328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Vary closed FLINK-33328. -- Resolution: Duplicate Solved as part of FLINK-33972 > Allow TwoPhaseCommittingSink WithPreCommitTopology to alter the type of the > Committable > --- > > Key: FLINK-33328 > URL: https://issues.apache.org/jira/browse/FLINK-33328 > Project: Flink > Issue Type: Sub-task > Components: API / DataStream, Connectors / Common >Reporter: Peter Vary >Assignee: Peter Vary >Priority: Major > Labels: pull-request-available > > In case of the Iceberg Sink, we would like to use the _WithPreCommitTopology_ > to aggregate the writer results and create a single committable from them. So > we would like to change both the type, and the number of the messages. Using > the current _WithPreCommitTopology_ interface we can work around the issue by > using a Tuple, or POJO where some of the fields are used only before the > _addPreCommitTopology_ method, and some of the fields are only used after the > method, but this seems more like abusing the interface than using it. > This is a more generic issue where the _WithPreCommitTopology_ should provide > a way to transform not only the data, but the type of the data channelled > through it. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Closed] (FLINK-34307) Release Testing Instructions: Verify FLINK-33972 Enhance and synchronize Sink API to match the Source API
[ https://issues.apache.org/jira/browse/FLINK-34307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Vary closed FLINK-34307. -- Resolution: Won't Fix [~lincoln.86xy]: Thanks for managing the release. For now, the unit tests are sufficient. > Release Testing Instructions: Verify FLINK-33972 Enhance and synchronize Sink > API to match the Source API > - > > Key: FLINK-34307 > URL: https://issues.apache.org/jira/browse/FLINK-34307 > Project: Flink > Issue Type: Sub-task > Components: API / DataStream >Affects Versions: 1.19.0 >Reporter: lincoln lee >Assignee: Peter Vary >Priority: Blocker > Fix For: 1.19.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (FLINK-34306) Release Testing Instructions: Verify FLINK-25857 Add committer metrics to track the status of committables
[ https://issues.apache.org/jira/browse/FLINK-34306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Vary resolved FLINK-34306. Resolution: Won't Fix > Release Testing Instructions: Verify FLINK-25857 Add committer metrics to > track the status of committables > --- > > Key: FLINK-34306 > URL: https://issues.apache.org/jira/browse/FLINK-34306 > Project: Flink > Issue Type: Sub-task > Components: Connectors / Common >Affects Versions: 1.19.0 >Reporter: lincoln lee >Assignee: Peter Vary >Priority: Blocker > Fix For: 1.19.0 > > Attachments: screenshot-1.png > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-34306) Release Testing Instructions: Verify FLINK-25857 Add committer metrics to track the status of committables
[ https://issues.apache.org/jira/browse/FLINK-34306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17814733#comment-17814733 ] Peter Vary commented on FLINK-34306: [~lincoln.86xy]: Thanks for managing the release. For now, the unit tests are sufficient. > Release Testing Instructions: Verify FLINK-25857 Add committer metrics to > track the status of committables > --- > > Key: FLINK-34306 > URL: https://issues.apache.org/jira/browse/FLINK-34306 > Project: Flink > Issue Type: Sub-task > Components: Connectors / Common >Affects Versions: 1.19.0 >Reporter: lincoln lee >Assignee: Peter Vary >Priority: Blocker > Fix For: 1.19.0 > > Attachments: screenshot-1.png > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (FLINK-25857) Add committer metrics to track the status of committables
[ https://issues.apache.org/jira/browse/FLINK-25857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Vary updated FLINK-25857: --- Release Note: The TwoPhaseCommittingSink.createCommitter method parametrization has been changed: a new CommitterInitContext parameter has been added. The original method will remain available during the 1.19 release line, but it will be removed in subsequent releases. When migrating, please also consider the changes introduced by FLINK-33973 and FLIP-372 (https://cwiki.apache.org/confluence/display/FLINK/FLIP-372%3A+Enhance+and+synchronize+Sink+API+to+match+the+Source+API) > Add committer metrics to track the status of committables > - > > Key: FLINK-25857 > URL: https://issues.apache.org/jira/browse/FLINK-25857 > Project: Flink > Issue Type: Sub-task > Components: Connectors / Common >Reporter: Fabian Paul >Assignee: Peter Vary >Priority: Major > Labels: pull-request-available > Fix For: 1.19.0 > > Attachments: image-2023-10-20-17-23-09-595.png, screenshot-1.png > > > With Sink V2 we can now track the progress of a committable during committing > and show metrics about the committing status (i.e. failed, retried, > succeeded). > The voted FLIP > https://cwiki.apache.org/confluence/display/FLINK/FLIP-371%3A+Provide+initialization+context+for+Committer+creation+in+TwoPhaseCommittingSink -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (FLINK-33973) Add new interfaces for SinkV2 to synchronize the API with the SourceV2 API
[ https://issues.apache.org/jira/browse/FLINK-33973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Vary updated FLINK-33973: --- Release Note: According to FLIP-372 (https://cwiki.apache.org/confluence/display/FLINK/FLIP-372%3A+Enhance+and+synchronize+Sink+API+to+match+the+Source+API) the SinkV2 API has been changed. The following interfaces are deprecated: TwoPhaseCommittingSink, StatefulSink, WithPreWriteTopology, WithPreCommitTopology, WithPostCommitTopology. The following new interfaces have been introduced: CommitterInitContext, CommittingSinkWriter, WriterInitContext, StatefulSinkWriter. The following interface method's parameter has been changed: Sink.createWriter. The original interfaces will remain available during the 1.19 release line, but they will be removed in subsequent releases. For the changes required when migrating, please consult the Migration Plan detailed in the FLIP > Add new interfaces for SinkV2 to synchronize the API with the SourceV2 API > -- > > Key: FLINK-33973 > URL: https://issues.apache.org/jira/browse/FLINK-33973 > Project: Flink > Issue Type: Sub-task >Reporter: Peter Vary >Assignee: Peter Vary >Priority: Major > Labels: pull-request-available > Fix For: 1.19.0 > > > Create the new interfaces, set inheritance and deprecation to finalize the > interface. > After this change the new interfaces will exist, but they will not be > functional. > The existing interfaces and tests should keep working without issues, to verify > that adding the API will be backward compatible. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-34228) Add long UTF serializer/deserializer
Peter Vary created FLINK-34228: -- Summary: Add long UTF serializer/deserializer Key: FLINK-34228 URL: https://issues.apache.org/jira/browse/FLINK-34228 Project: Flink Issue Type: Improvement Reporter: Peter Vary DataOutputSerializer.writeUTF has a hard limit on the length of the string (64k). This is inherited from the DataOutput.writeUTF method, where the JDK specifically defines this limit [1]. For our use case we need to be able to serialize longer UTF strings, so we will need to define a writeLongUTF method with a specification similar to writeUTF, but without the length limit. Based on the discussion on the mailing list, this is a good addition to Flink's serialization utilities [2] [1] - https://docs.oracle.com/javase/8/docs/api/java/io/DataOutput.html#writeUTF-java.lang.String- [2] - https://lists.apache.org/thread/ocm6cj0h8o3wbwo7fz2l1b4odss750rk -- This message was sent by Atlassian Jira (v8.20.10#820010)
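A minimal sketch of what such a method pair could look like, using plain java.io: the 2-byte unsigned-short length prefix of DataOutput.writeUTF is replaced by a 4-byte int. The method names and the exact wire format here are assumptions for illustration; the actual Flink implementation (and its choice of encoding) may differ:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

public class LongUtf {

    // Hypothetical sketch: prefix the payload with a 4-byte int length
    // instead of the 2-byte unsigned short that DataOutput.writeUTF uses,
    // which is what caps writeUTF at 65535 encoded bytes.
    static void writeLongUTF(DataOutputStream out, String s) throws IOException {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        out.writeInt(bytes.length);
        out.write(bytes);
    }

    static String readLongUTF(DataInputStream in) throws IOException {
        int length = in.readInt();
        byte[] bytes = new byte[length];
        in.readFully(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // A string well past the 64k writeUTF limit round-trips without issues.
        String big = "x".repeat(100_000);
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        writeLongUTF(new DataOutputStream(buf), big);
        String back = readLongUTF(new DataInputStream(new ByteArrayInputStream(buf.toByteArray())));
        System.out.println(back.length()); // prints 100000
    }
}
```

Note that this sketch writes standard UTF-8 rather than the modified UTF-8 that writeUTF produces, so the two formats are not wire-compatible.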
[jira] [Created] (FLINK-34209) Migrate FileSink to the new SinkV2 API
Peter Vary created FLINK-34209: -- Summary: Migrate FileSink to the new SinkV2 API Key: FLINK-34209 URL: https://issues.apache.org/jira/browse/FLINK-34209 Project: Flink Issue Type: Sub-task Reporter: Peter Vary Currently `FileSink` uses `TwoPhaseCommittingSink` and `StatefulSink` from the SinkV2 API. We should migrate it to use the new SinkV2 API -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-34208) Migrate SinkV1Adapter to the new SinkV2 API
Peter Vary created FLINK-34208: -- Summary: Migrate SinkV1Adapter to the new SinkV2 API Key: FLINK-34208 URL: https://issues.apache.org/jira/browse/FLINK-34208 Project: Flink Issue Type: Sub-task Reporter: Peter Vary Currently SinkV1Adapter still uses `TwoPhaseCommittingSink` and `StatefulSink`. We should migrate it to use the new API -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-34043) Remove deprecated Sink V2 interfaces
Peter Vary created FLINK-34043: -- Summary: Remove deprecated Sink V2 interfaces Key: FLINK-34043 URL: https://issues.apache.org/jira/browse/FLINK-34043 Project: Flink Issue Type: Sub-task Components: API / DataStream Reporter: Peter Vary In Flink 1.20.0 we should remove the interfaces deprecated by FLINK-33973 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (FLINK-34042) Update the documentation for Sink V2
[ https://issues.apache.org/jira/browse/FLINK-34042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Vary updated FLINK-34042: --- Description: Check the documentation and update the Sink V2 API usages whenever it is needed > Update the documentation for Sink V2 > > > Key: FLINK-34042 > URL: https://issues.apache.org/jira/browse/FLINK-34042 > Project: Flink > Issue Type: Sub-task >Reporter: Peter Vary >Priority: Major > > Check the documentation and update the Sink V2 API usages whenever it is > needed -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-34042) Update the documentation
Peter Vary created FLINK-34042: -- Summary: Update the documentation Key: FLINK-34042 URL: https://issues.apache.org/jira/browse/FLINK-34042 Project: Flink Issue Type: Sub-task Reporter: Peter Vary -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (FLINK-34042) Update the documentation for Sink V2
[ https://issues.apache.org/jira/browse/FLINK-34042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Vary updated FLINK-34042: --- Summary: Update the documentation for Sink V2 (was: Update the documentation ) > Update the documentation for Sink V2 > > > Key: FLINK-34042 > URL: https://issues.apache.org/jira/browse/FLINK-34042 > Project: Flink > Issue Type: Sub-task >Reporter: Peter Vary >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-33975) Tests for the new Sink V2 transformations
Peter Vary created FLINK-33975: -- Summary: Tests for the new Sink V2 transformations Key: FLINK-33975 URL: https://issues.apache.org/jira/browse/FLINK-33975 Project: Flink Issue Type: Sub-task Reporter: Peter Vary Create new tests for the SinkV2 API transformations, and migrate some of the tests to use the new API. Some of the old tests should keep using the old API to make sure that backward compatibility is tested until the deprecated interfaces are removed. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-33974) Implement the Sink transformation depending on the new SinkV2 interfaces
Peter Vary created FLINK-33974: -- Summary: Implement the Sink transformation depending on the new SinkV2 interfaces Key: FLINK-33974 URL: https://issues.apache.org/jira/browse/FLINK-33974 Project: Flink Issue Type: Sub-task Reporter: Peter Vary Implement the changes to the Sink transformation which should depend only on the new API interfaces. The tests should remain the same, to ensure backward compatibility. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-33973) Add new interfaces for SinkV2 to synchronize the API with the SourceV2 API
Peter Vary created FLINK-33973: -- Summary: Add new interfaces for SinkV2 to synchronize the API with the SourceV2 API Key: FLINK-33973 URL: https://issues.apache.org/jira/browse/FLINK-33973 Project: Flink Issue Type: Sub-task Reporter: Peter Vary Create the new interfaces, set inheritance and deprecation to finalize the interface. After this change the new interfaces will exist, but they will not be functional. The existing interfaces and tests should keep working without issues, to verify that adding the API will be backward compatible. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-33972) Enhance and synchronize Sink API to match the Source API
Peter Vary created FLINK-33972: -- Summary: Enhance and synchronize Sink API to match the Source API Key: FLINK-33972 URL: https://issues.apache.org/jira/browse/FLINK-33972 Project: Flink Issue Type: Improvement Components: API / DataStream Reporter: Peter Vary Umbrella jira for the implementation of https://cwiki.apache.org/confluence/display/FLINK/FLIP-372%3A+Enhance+and+synchronize+Sink+API+to+match+the+Source+API -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-33528) Externalize Python connector code
[ https://issues.apache.org/jira/browse/FLINK-33528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17799320#comment-17799320 ] Peter Vary commented on FLINK-33528: [~Sergey Nuyanzin]: FLINK-33762 is needed so that the connectors can release their own Python packages. I would remove the code from the Flink repository only after the connector packages are released. > Externalize Python connector code > - > > Key: FLINK-33528 > URL: https://issues.apache.org/jira/browse/FLINK-33528 > Project: Flink > Issue Type: Technical Debt > Components: API / Python, Connectors / Common >Affects Versions: 1.18.0 >Reporter: Márton Balassi >Assignee: Peter Vary >Priority: Major > Fix For: 1.19.0 > > > During the connector externalization effort end to end tests for the python > connectors were left in the main repository under: > [https://github.com/apache/flink/tree/master/flink-python/pyflink/datastream/connectors] > These include both python connector implementations and tests. Currently they > depend on a previously released version of the underlying connectors, > otherwise they would introduce a circular dependency given that they are in > the flink repo at the moment. > This setup prevents us from propagating any breaking change to PublicEvolving > and Internal APIs used by the connectors as they lead to breaking the python > e2e tests. We ran into this while implementing FLINK-25857. > Note that we made the decision to turn off the Python test when merging > FLINK-25857, so now we are forced to fix this by 1.19 such that we can > re-enable the test runs - now in the externalized connector repos. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-33523) DataType ARRAY&lt;INT NOT NULL&gt; fails to cast into Object[]
[ https://issues.apache.org/jira/browse/FLINK-33523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17794128#comment-17794128 ] Peter Vary commented on FLINK-33523: Created a thread about this topic on the mailing list: https://lists.apache.org/thread/m4c879y8mb7hbn2kkjh9h3d8g1jphh3j I would appreciate it if you could share your thoughts there [~prabhujoseph], [~jeyhun], [~aitozi], [~jark], [~xccui] > DataType ARRAY&lt;INT NOT NULL&gt; fails to cast into Object[] > > > Key: FLINK-33523 > URL: https://issues.apache.org/jira/browse/FLINK-33523 > Project: Flink > Issue Type: Bug > Components: Table SQL / API >Affects Versions: 1.18.0 >Reporter: Prabhu Joseph >Priority: Major > > When upgrading Iceberg's Flink version to 1.18, we found the Flink-related > unit test case broken due to this issue. The below code used to work fine in > Flink 1.17 but failed after upgrading to 1.18. DataType ARRAY&lt;INT NOT NULL&gt; > fails to cast into Object[]. > *Error:* > {code} > Exception in thread "main" java.lang.ClassCastException: [I cannot be cast to > [Ljava.lang.Object; > at FlinkArrayIntNotNullTest.main(FlinkArrayIntNotNullTest.java:18) > {code} > *Repro:* > {code} > import org.apache.flink.table.data.ArrayData; > import org.apache.flink.table.data.GenericArrayData; > import org.apache.flink.table.api.EnvironmentSettings; > import org.apache.flink.table.api.TableEnvironment; > import org.apache.flink.table.api.TableResult; > public class FlinkArrayIntNotNullTest { > public static void main(String[] args) throws Exception { > EnvironmentSettings settings = > EnvironmentSettings.newInstance().inBatchMode().build(); > TableEnvironment env = TableEnvironment.create(settings); > env.executeSql("CREATE TABLE filesystemtable2 (id INT, data ARRAY&lt;INT NOT NULL&gt;) WITH ('connector' = 'filesystem', 'path' = > '/tmp/FLINK/filesystemtable2', 'format'='json')"); > env.executeSql("INSERT INTO filesystemtable2 VALUES (4,ARRAY [1,2,3])"); > TableResult tableResult = env.executeSql("SELECT * from > filesystemtable2"); > ArrayData actualArrayData = new GenericArrayData((Object[]) > tableResult.collect().next().getField(1)); > } > } > {code} > *Analysis:* > 1. The code works fine with the ARRAY&lt;INT&gt; datatype. The issue happens when using > ARRAY&lt;INT NOT NULL&gt;. > 2. The code works fine when casting into int[] instead of Object[]. -- This message was sent by Atlassian Jira (v8.20.10#820010)
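Stripped of the Flink types, the analysis above comes down to Java array variance: a boxed Integer[] is a subtype of Object[], but a primitive int[] (the "[I" in the error message) is not, so the cast throws. A standalone demonstration with no Flink dependency:

```java
public class PrimitiveArrayCast {
    public static void main(String[] args) {
        // A nullable INT array surfaces as boxed Integer[], which IS an Object[].
        Object boxed = new Integer[] {1, 2, 3};
        System.out.println(boxed instanceof Object[]); // prints true

        // A NOT NULL INT array can surface as primitive int[] ("[I"),
        // which is NOT a subtype of Object[].
        Object primitive = new int[] {1, 2, 3};
        System.out.println(primitive instanceof Object[]); // prints false

        try {
            Object[] fails = (Object[]) primitive; // same cast as in the repro
            System.out.println(fails.length);
        } catch (ClassCastException e) {
            System.out.println("ClassCastException caught"); // prints this line
        }
    }
}
```

This is why the repro only breaks for ARRAY&lt;INT NOT NULL&gt;: the non-null element type allows a primitive int[] result, and casting that to Object[] fails while casting it to int[] works.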
[jira] [Created] (FLINK-33762) Versioned release of flink-connector-shared-utils python scripts
Peter Vary created FLINK-33762: -- Summary: Versioned release of flink-connector-shared-utils python scripts Key: FLINK-33762 URL: https://issues.apache.org/jira/browse/FLINK-33762 Project: Flink Issue Type: Sub-task Components: API / Python, Connectors / Common Reporter: Peter Vary We need a versioned release of the scripts stored in the flink-connector-shared-utils/python directory. This will allow even incompatible changes to these scripts. The connector developers could choose which version of the scripts they depend on. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-25857) Add committer metrics to track the status of committables
[ https://issues.apache.org/jira/browse/FLINK-25857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17789963#comment-17789963 ] Peter Vary commented on FLINK-25857: If you check the discussion [1], the diamond inheritance of the `Sink.createWriter` method prevents any backward compatible change of the method. One could argue that this is a flawed design. *About the process and the compatibility* [~Weijie Guo]: Here is my understanding of the FLIP process, please correct me if I am wrong somewhere: - If there is a change which modifies an API or creates a new one, we should create a FLIP to discuss the change [2] - We start the discussion on the mailing list, so everyone who is interested could participate [3] - If there is a consensus on the design, we start a voting thread [4] - If the voting is successful, we announce the result and close the FLIP [5] - If we find issues during the implementation, we discuss them there - we do not modify the finalised FLIP [6] Maybe it would be good to have an additional step: when there is a change to the original design of the FLIP, we should send a mail to the mailing list as well, to notify interested parties who are not following the actual implementation. About the deprecation process, I have been working based on the API compatibility guarantees [7] stated in the docs. Based on the table there, a PublicEvolving API should be source and binary compatible for patch releases, but there are no guarantees for minor releases. Maybe the same redesign process happened during the implementation of FLIP-321 [8]? I was not involved there, so I do not have first-hand information. 
[1] - https://github.com/apache/flink/pull/23555#discussion_r1371740397 [2] - https://cwiki.apache.org/confluence/display/FLINK/FLIP-371%3A+Provide+initialization+context+for+Committer+creation+in+TwoPhaseCommittingSink [3] - https://lists.apache.org/thread/v3mrspdlrqrzvbwm0lcgr0j4v03dx97c [4] - https://lists.apache.org/thread/4f7w4n3nywk8ygnwlxk39oncl3cntp3n [5] - https://lists.apache.org/thread/jw39s55tzzpdkzmlh0vshmjnfrjg02nr [6] - https://github.com/apache/flink/pull/23555#discussion_r1369177945 [7] - https://nightlies.apache.org/flink/flink-docs-master/docs/ops/upgrading/#api-compatibility-guarantees [8] - https://cwiki.apache.org/confluence/display/FLINK/FLIP-321%3A+Introduce+an+API+deprecation+process > Add committer metrics to track the status of committables > - > > Key: FLINK-25857 > URL: https://issues.apache.org/jira/browse/FLINK-25857 > Project: Flink > Issue Type: Sub-task > Components: Connectors / Common >Reporter: Fabian Paul >Assignee: Peter Vary >Priority: Major > Labels: pull-request-available > Fix For: 1.19.0 > > Attachments: image-2023-10-20-17-23-09-595.png, screenshot-1.png > > > With Sink V2 we can now track the progress of a committable during committing > and show metrics about the committing status. (i.e. failed, retried, > succeeded). -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-25857) Add committer metrics to track the status of committables
[ https://issues.apache.org/jira/browse/FLINK-25857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17789335#comment-17789335 ] Peter Vary commented on FLINK-25857: For the Iceberg connector we release separate versions for different versions of Flink. If I understand correctly, this was the case before the separation of the connectors too - every Flink version contained different versions of connectors, and the jars might not work across versions. Also, `TwoPhaseCommittingSink` is marked with the `PublicEvolving` annotation, which means it could change between minor versions of Flink. We push forward these changes so that soon the API can be changed to `Public`, and we can avoid these kinds of disruptions in the future. > Add committer metrics to track the status of committables > - > > Key: FLINK-25857 > URL: https://issues.apache.org/jira/browse/FLINK-25857 > Project: Flink > Issue Type: Sub-task > Components: Connectors / Common >Reporter: Fabian Paul >Assignee: Peter Vary >Priority: Major > Labels: pull-request-available > Fix For: 1.19.0 > > Attachments: image-2023-10-20-17-23-09-595.png, screenshot-1.png > > > With Sink V2 we can now track the progress of a committable during committing > and show metrics about the committing status (i.e. failed, retried, > succeeded). -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-33568) SinkV2MetricsITCase.testCommitterMetrics fails with NullPointerException
[ https://issues.apache.org/jira/browse/FLINK-33568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17786947#comment-17786947 ] Peter Vary commented on FLINK-33568: With the patch it looks like this: !Screenshot 2023-11-16 at 22.30.51.png|width=612,height=128! > SinkV2MetricsITCase.testCommitterMetrics fails with NullPointerException > > > Key: FLINK-33568 > URL: https://issues.apache.org/jira/browse/FLINK-33568 > Project: Flink > Issue Type: Bug > Components: Runtime / Metrics >Affects Versions: 1.19.0 >Reporter: Matthias Pohl >Priority: Critical > Labels: pull-request-available > Attachments: Screenshot 2023-11-16 at 22.30.51.png > > > {code} > Nov 16 01:48:57 01:48:57.537 [ERROR] Tests run: 2, Failures: 0, Errors: 1, > Skipped: 0, Time elapsed: 6.023 s <<< FAILURE! - in > org.apache.flink.test.streaming.runtime.SinkV2MetricsITCase > Nov 16 01:48:57 01:48:57.538 [ERROR] > org.apache.flink.test.streaming.runtime.SinkV2MetricsITCase.testCommitterMetrics > Time elapsed: 0.745 s <<< ERROR! > Nov 16 01:48:57 java.lang.NullPointerException > Nov 16 01:48:57 at > org.apache.flink.test.streaming.runtime.SinkV2MetricsITCase.assertSinkCommitterMetrics(SinkV2MetricsITCase.java:254) > Nov 16 01:48:57 at > org.apache.flink.test.streaming.runtime.SinkV2MetricsITCase.testCommitterMetrics(SinkV2MetricsITCase.java:153) > Nov 16 01:48:57 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native > Method) > [...] > {code} > https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=54602&view=logs&j=8fd9202e-fd17-5b26-353c-ac1ff76c8f28&t=ea7cf968-e585-52cb-e0fc-f48de023a7ca&l=8546 > https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=54602&view=logs&j=baf26b34-3c6a-54e8-f93f-cf269b32f802&t=8c9d126d-57d2-5a9e-a8c8-ff53f7b35cd9&l=8605 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (FLINK-33568) SinkV2MetricsITCase.testCommitterMetrics fails with NullPointerException
[ https://issues.apache.org/jira/browse/FLINK-33568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Vary updated FLINK-33568: --- Attachment: Screenshot 2023-11-16 at 22.30.51.png > SinkV2MetricsITCase.testCommitterMetrics fails with NullPointerException > > > Key: FLINK-33568 > URL: https://issues.apache.org/jira/browse/FLINK-33568 > Project: Flink > Issue Type: Bug > Components: Runtime / Metrics >Affects Versions: 1.19.0 >Reporter: Matthias Pohl >Priority: Critical > Labels: pull-request-available > Attachments: Screenshot 2023-11-16 at 22.30.51.png > > > {code} > Nov 16 01:48:57 01:48:57.537 [ERROR] Tests run: 2, Failures: 0, Errors: 1, > Skipped: 0, Time elapsed: 6.023 s <<< FAILURE! - in > org.apache.flink.test.streaming.runtime.SinkV2MetricsITCase > Nov 16 01:48:57 01:48:57.538 [ERROR] > org.apache.flink.test.streaming.runtime.SinkV2MetricsITCase.testCommitterMetrics > Time elapsed: 0.745 s <<< ERROR! > Nov 16 01:48:57 java.lang.NullPointerException > Nov 16 01:48:57 at > org.apache.flink.test.streaming.runtime.SinkV2MetricsITCase.assertSinkCommitterMetrics(SinkV2MetricsITCase.java:254) > Nov 16 01:48:57 at > org.apache.flink.test.streaming.runtime.SinkV2MetricsITCase.testCommitterMetrics(SinkV2MetricsITCase.java:153) > Nov 16 01:48:57 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native > Method) > [...] > {code} > https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=54602&view=logs&j=8fd9202e-fd17-5b26-353c-ac1ff76c8f28&t=ea7cf968-e585-52cb-e0fc-f48de023a7ca&l=8546 > https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=54602&view=logs&j=baf26b34-3c6a-54e8-f93f-cf269b32f802&t=8c9d126d-57d2-5a9e-a8c8-ff53f7b35cd9&l=8605 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-33568) SinkV2MetricsITCase.testCommitterMetrics fails with NullPointerException
[ https://issues.apache.org/jira/browse/FLINK-33568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17786814#comment-17786814 ] Peter Vary commented on FLINK-33568: Checking > SinkV2MetricsITCase.testCommitterMetrics fails with NullPointerException > > > Key: FLINK-33568 > URL: https://issues.apache.org/jira/browse/FLINK-33568 > Project: Flink > Issue Type: Bug > Components: Runtime / Metrics >Affects Versions: 1.19.0 >Reporter: Matthias Pohl >Priority: Critical > > {code} > Nov 16 01:48:57 01:48:57.537 [ERROR] Tests run: 2, Failures: 0, Errors: 1, > Skipped: 0, Time elapsed: 6.023 s <<< FAILURE! - in > org.apache.flink.test.streaming.runtime.SinkV2MetricsITCase > Nov 16 01:48:57 01:48:57.538 [ERROR] > org.apache.flink.test.streaming.runtime.SinkV2MetricsITCase.testCommitterMetrics > Time elapsed: 0.745 s <<< ERROR! > Nov 16 01:48:57 java.lang.NullPointerException > Nov 16 01:48:57 at > org.apache.flink.test.streaming.runtime.SinkV2MetricsITCase.assertSinkCommitterMetrics(SinkV2MetricsITCase.java:254) > Nov 16 01:48:57 at > org.apache.flink.test.streaming.runtime.SinkV2MetricsITCase.testCommitterMetrics(SinkV2MetricsITCase.java:153) > Nov 16 01:48:57 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native > Method) > [...] > {code} > https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=54602&view=logs&j=8fd9202e-fd17-5b26-353c-ac1ff76c8f28&t=ea7cf968-e585-52cb-e0fc-f48de023a7ca&l=8546 > https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=54602&view=logs&j=baf26b34-3c6a-54e8-f93f-cf269b32f802&t=8c9d126d-57d2-5a9e-a8c8-ff53f7b35cd9&l=8605 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Comment Edited] (FLINK-33295) Separate SinkV2 and SinkV1Adapter tests
[ https://issues.apache.org/jira/browse/FLINK-33295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17784689#comment-17784689 ] Peter Vary edited comment on FLINK-33295 at 11/10/23 6:23 AM: -- _InternalSinkWriterMetricGroup_ is an internal class, so in theory connectors should not use it. * How much effort would it be to enable the annotation check for the connectors? * We can expose the _MetricsGroupTestUtils_ in a test jar, if we see that the connectors would like to use it for testing. Thanks for the heads-up! Peter was (Author: pvary): `InternalSinkWriterMetricGroup` is an internal class, so in theory connectors should not use it. * How much effort would it be to enable the annotation check for the connectors? * We can expose the `MetricsGroupTestUtils` in a test jar, if we see that the connectors would like to use it for testing. Thanks for the heads-up! Peter > Separate SinkV2 and SinkV1Adapter tests > --- > > Key: FLINK-33295 > URL: https://issues.apache.org/jira/browse/FLINK-33295 > Project: Flink > Issue Type: Sub-task > Components: API / DataStream, Connectors / Common >Reporter: Peter Vary >Assignee: Peter Vary >Priority: Major > Labels: pull-request-available > Fix For: 1.19.0 > > > Current SinkV2 tests are based on the sink generated by the > _org.apache.flink.streaming.runtime.operators.sink.TestSink_ test class. This > test class does not generate the SinkV2 directly, but generates a SinkV1 and > wraps it with a > _org.apache.flink.streaming.api.transformations.SinkV1Adapter._ While this > tests the SinkV2, it does so only as it is aligned with SinkV1 and the > SinkV1Adapter. > We should have tests where we create a SinkV2 directly and the functionality > is tested without the adapter. > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-33295) Separate SinkV2 and SinkV1Adapter tests
[ https://issues.apache.org/jira/browse/FLINK-33295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17784689#comment-17784689 ] Peter Vary commented on FLINK-33295: `InternalSinkWriterMetricGroup` is an internal class, so in theory connectors should not use it. * How much effort would it be to enable the annotation check for the connectors? * We can expose the `MetricsGroupTestUtils` in a test jar, if we see that the connectors would like to use it for testing. Thanks for the heads-up! Peter > Separate SinkV2 and SinkV1Adapter tests > --- > > Key: FLINK-33295 > URL: https://issues.apache.org/jira/browse/FLINK-33295 > Project: Flink > Issue Type: Sub-task > Components: API / DataStream, Connectors / Common >Reporter: Peter Vary >Assignee: Peter Vary >Priority: Major > Labels: pull-request-available > Fix For: 1.19.0 > > > Current SinkV2 tests are based on the sink generated by the > _org.apache.flink.streaming.runtime.operators.sink.TestSink_ test class. This > test class does not generate the SinkV2 directly, but generates a SinkV1 and > wraps it with a > _org.apache.flink.streaming.api.transformations.SinkV1Adapter._ While this > tests the SinkV2, it does so only as it is aligned with SinkV1 and the > SinkV1Adapter. > We should have tests where we create a SinkV2 directly and the functionality > is tested without the adapter. > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-33328) Allow TwoPhaseCommittingSink WithPreCommitTopology to alter the type of the Committable
Peter Vary created FLINK-33328: -- Summary: Allow TwoPhaseCommittingSink WithPreCommitTopology to alter the type of the Committable Key: FLINK-33328 URL: https://issues.apache.org/jira/browse/FLINK-33328 Project: Flink Issue Type: Sub-task Components: API / DataStream, Connectors / Common Reporter: Peter Vary In case of the Iceberg Sink, we would like to use the _WithPreCommitTopology_ to aggregate the writer results and create a single committable from them. So we would like to change both the type and the number of the messages. Using the current _WithPreCommitTopology_ interface we can work around the issue by using a Tuple or a POJO where some of the fields are used only before the _addPreCommitTopology_ method, and some of the fields are used only after the method, but this seems more like abusing the interface than using it. This is a more generic issue: the _WithPreCommitTopology_ should provide a way to transform not only the data, but also the type of the data channelled through it. -- This message was sent by Atlassian Jira (v8.20.10#820010)
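The request above (let the pre-commit stage change both the type and the cardinality of committables) can be sketched with a hypothetical generic transform. This is not Flink's actual `WithPreCommitTopology` API; the interface and names below are illustrative assumptions only, showing many writer results of one type being aggregated into a single committable of another type.

```java
import java.util.List;

// Hypothetical sketch only: a pre-commit transform that maps a list of writer
// results (WriterResultT) to committables of a *different* type (CommittableT),
// also changing cardinality (many inputs -> one output).
class PreCommitTopologySketch {
    interface PreCommitTransform<WriterResultT, CommittableT> {
        List<CommittableT> transform(List<WriterResultT> writerResults);
    }

    public static void main(String[] args) {
        // Aggregate per-writer byte counts (Integer) into one summary committable (String).
        PreCommitTransform<Integer, String> aggregate = results -> {
            int totalBytes = 0;
            for (int bytes : results) {
                totalBytes += bytes;
            }
            return List.of("commit[" + totalBytes + " bytes]");
        };
        System.out.println(aggregate.transform(List.of(100, 250, 650)));
    }
}
```

With a shape like this there is no need for the Tuple/POJO workaround described above, because the input and output types of the pre-commit stage are independent type parameters.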
[jira] [Created] (FLINK-33295) Separate SinkV2 and SinkV1Adapter tests
Peter Vary created FLINK-33295: -- Summary: Separate SinkV2 and SinkV1Adapter tests Key: FLINK-33295 URL: https://issues.apache.org/jira/browse/FLINK-33295 Project: Flink Issue Type: Sub-task Components: API / DataStream, Connectors / Common Reporter: Peter Vary Current SinkV2 tests are based on the sink generated by the _org.apache.flink.streaming.runtime.operators.sink.TestSink_ test class. This test class does not generate the SinkV2 directly, but generates a SinkV1 and wraps it with a _org.apache.flink.streaming.api.transformations.SinkV1Adapter._ While this tests the SinkV2, it does so only as it is aligned with SinkV1 and the SinkV1Adapter. We should have tests where we create a SinkV2 directly and the functionality is tested without the adapter. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (FLINK-32046) OOM caused by SplitAssignmentTracker.uncheckpointedAssignments
[ https://issues.apache.org/jira/browse/FLINK-32046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Vary updated FLINK-32046: --- Description: If the checkpointing is turned off then the {{SplitAssignmentTracker.uncheckpointedAssignments}} is never cleared and grows indefinitely. Eventually leading to OOM. The only other place which would remove elements from this map is {{{}getAndRemoveUncheckpointedAssignment{}}}, but it is only for failure scenarios. By my understanding this problem exists since the introduction of the new {{Source}} implementation. was: If the checkpointing is turned off then the {{SplitAssignmentTracker.uncheckpointedAssignments}} is never cleared and grows indefinitely. Eventually leading to OOM. The only other place which would remove elements from this map is {{{}getAndRemoveUncheckpointedAssignment{}}}, but it is only for failure scenarios. By my understanding this problem exists since the introduction of the new source code. > OOM caused by SplitAssignmentTracker.uncheckpointedAssignments > -- > > Key: FLINK-32046 > URL: https://issues.apache.org/jira/browse/FLINK-32046 > Project: Flink > Issue Type: Bug > Components: API / Core >Reporter: Peter Vary >Priority: Major > > If the checkpointing is turned off then the > {{SplitAssignmentTracker.uncheckpointedAssignments}} is never cleared and > grows indefinitely. Eventually leading to OOM. > The only other place which would remove elements from this map is > {{{}getAndRemoveUncheckpointedAssignment{}}}, but it is only for failure > scenarios. > By my understanding this problem exists since the introduction of the new > {{Source}} implementation. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-32046) OOM caused by SplitAssignmentTracker.uncheckpointedAssignments
Peter Vary created FLINK-32046: -- Summary: OOM caused by SplitAssignmentTracker.uncheckpointedAssignments Key: FLINK-32046 URL: https://issues.apache.org/jira/browse/FLINK-32046 Project: Flink Issue Type: Bug Components: API / Core Reporter: Peter Vary If checkpointing is turned off, then {{SplitAssignmentTracker.uncheckpointedAssignments}} is never cleared and grows indefinitely, eventually leading to an OOM. The only other place that removes elements from this map is {{getAndRemoveUncheckpointedAssignment}}, but that is only used in failure scenarios. To my understanding, this problem has existed since the introduction of the new source code. -- This message was sent by Atlassian Jira (v8.20.10#820010)
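The leak pattern described above can be illustrated with a minimal, simplified model. This is not Flink's actual `SplitAssignmentTracker` code; the class and method names below are hypothetical, and only demonstrate how a per-checkpoint bookkeeping map grows without bound when the checkpoint hook that drains it is never invoked.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of the bug described above: assignments are
// recorded into a map that is only drained on checkpoint. With checkpointing
// disabled, onCheckpoint() never runs and the map grows indefinitely.
class AssignmentTrackerSketch {
    private final Map<Long, List<String>> uncheckpointedAssignments = new HashMap<>();

    void recordAssignment(long subtaskId, String split) {
        uncheckpointedAssignments
                .computeIfAbsent(subtaskId, k -> new ArrayList<>())
                .add(split);
    }

    // The only regular drain point; never called when checkpointing is off.
    void onCheckpoint() {
        uncheckpointedAssignments.clear();
    }

    int trackedSplits() {
        int sum = 0;
        for (List<String> splits : uncheckpointedAssignments.values()) {
            sum += splits.size();
        }
        return sum;
    }

    public static void main(String[] args) {
        AssignmentTrackerSketch tracker = new AssignmentTrackerSketch();
        for (int i = 0; i < 10_000; i++) {
            tracker.recordAssignment(i % 4, "split-" + i);
        }
        // Without a checkpoint, every assignment ever recorded is still retained.
        System.out.println(tracker.trackedSplits());
    }
}
```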
[jira] [Created] (FLINK-31868) Fix DefaultInputSplitAssigner javadoc for class
Peter Vary created FLINK-31868: -- Summary: Fix DefaultInputSplitAssigner javadoc for class Key: FLINK-31868 URL: https://issues.apache.org/jira/browse/FLINK-31868 Project: Flink Issue Type: Improvement Components: API / Core Reporter: Peter Vary Based on the discussion [1] on the mailing list, {{there is no requirement of the order of splits by Flink itself}}, so we should fix the discrepancy between the code and the comment by updating the comment. [1] https://lists.apache.org/thread/74m7z2kzgpzylhrp1oq4lz37pnqjmbkh -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (FLINK-31246) Beautify the SpecChange message
[ https://issues.apache.org/jira/browse/FLINK-31246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Vary updated FLINK-31246: --- Summary: Beautify the SpecChange message (was: Remove PodTemplate description from the SpecChange message) > Beautify the SpecChange message > --- > > Key: FLINK-31246 > URL: https://issues.apache.org/jira/browse/FLINK-31246 > Project: Flink > Issue Type: Improvement > Components: Kubernetes Operator >Reporter: Peter Vary >Priority: Major > Labels: pull-request-available > > Currently the Spec Change message contains the full PodTemplate twice. > This makes the message seriously big and also contains very little useful > information. > We should abbreviate the message -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Comment Edited] (FLINK-31246) Remove PodTemplate description from the SpecChange message
[ https://issues.apache.org/jira/browse/FLINK-31246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17694443#comment-17694443 ] Peter Vary edited comment on FLINK-31246 at 2/28/23 8:57 AM: - Talked with [~gyfora] about this, and he is concerned that this would be a breaking change for some users and could cause issues for them was (Author: pvary): Talked with [~gyfora] about this, and he is concerned that this would be a breaking change for some users and could cause issues for other users > Remove PodTemplate description from the SpecChange message > -- > > Key: FLINK-31246 > URL: https://issues.apache.org/jira/browse/FLINK-31246 > Project: Flink > Issue Type: Improvement > Components: Kubernetes Operator >Reporter: Peter Vary >Priority: Major > > Currently the Spec Change message contains the full PodTemplate twice. > This makes the message seriously big and also contains very little useful > information. > We should abbreviate the message -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-31246) Remove PodTemplate description from the SpecChange message
[ https://issues.apache.org/jira/browse/FLINK-31246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17694443#comment-17694443 ] Peter Vary commented on FLINK-31246: Talked with [~gyfora] about this, and he is concerned that this would be a breaking change for some users and could cause issues for other users > Remove PodTemplate description from the SpecChange message > -- > > Key: FLINK-31246 > URL: https://issues.apache.org/jira/browse/FLINK-31246 > Project: Flink > Issue Type: Improvement > Components: Kubernetes Operator >Reporter: Peter Vary >Priority: Major > > Currently the Spec Change message contains the full PodTemplate twice. > This makes the message seriously big and also contains very little useful > information. > We should abbreviate the message -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-31246) Remove PodTemplate description from the SpecChange message
Peter Vary created FLINK-31246: -- Summary: Remove PodTemplate description from the SpecChange message Key: FLINK-31246 URL: https://issues.apache.org/jira/browse/FLINK-31246 Project: Flink Issue Type: Improvement Components: Kubernetes Operator Reporter: Peter Vary Currently the Spec Change message contains the full PodTemplate twice. This makes the message excessively large while carrying very little useful information. We should abbreviate the message -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (FLINK-30543) Adding more examples for setting up jobs via operator.
[ https://issues.apache.org/jira/browse/FLINK-30543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Vary updated FLINK-30543: --- Component/s: Kubernetes Operator > Adding more examples for setting up jobs via operator. > -- > > Key: FLINK-30543 > URL: https://issues.apache.org/jira/browse/FLINK-30543 > Project: Flink > Issue Type: Improvement > Components: Kubernetes Operator >Reporter: Sriram Ganesh >Priority: Minor > > Currently, we have only basic examples showing how to run a job > via the operator. Adding more examples covering all upgrade modes would > be helpful. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-30367) Enrich the thread dump info with deeper stack
[ https://issues.apache.org/jira/browse/FLINK-30367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17646136#comment-17646136 ] Peter Vary commented on FLINK-30367: The depth of the stack trace could be set by the {{cluster.thread-dump.stacktrace-max-depth}} configuration value. See: https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/#cluster-thread-dump-stacktrace-max-depth > Enrich the thread dump info with deeper stack > - > > Key: FLINK-30367 > URL: https://issues.apache.org/jira/browse/FLINK-30367 > Project: Flink > Issue Type: Sub-task > Components: Runtime / Task, Runtime / Web Frontend >Reporter: Yun Tang >Assignee: Yu Chen >Priority: Major > Fix For: 1.17.0 > > > Currently, we only have the thread dump info with very few stack depth, and > we cannot see the thread information in details. It would be useful to enrich > the thread dump info. -- This message was sent by Atlassian Jira (v8.20.10#820010)
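The comment above names the `cluster.thread-dump.stacktrace-max-depth` option; as a minimal illustration, it would be set in `flink-conf.yaml` roughly as follows (the value 16 is an example only, not a recommended default):

```yaml
# flink-conf.yaml fragment — example value only.
# A deeper limit makes thread dump entries more informative, at the cost of
# larger dump payloads in the web frontend.
cluster.thread-dump.stacktrace-max-depth: 16
```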
[jira] [Updated] (FLINK-30315) Add more information about image pull failures to the operator log
[ https://issues.apache.org/jira/browse/FLINK-30315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Vary updated FLINK-30315: --- Description: When there is an image pull error, this is what we see in the operator log: {code:java} org.apache.flink.kubernetes.operator.exception.DeploymentFailedException: Back-off pulling image "flink:1.14" at org.apache.flink.kubernetes.operator.observer.deployment.AbstractFlinkDeploymentObserver.checkContainerBackoff(AbstractFlinkDeploymentObserver.java:194) at org.apache.flink.kubernetes.operator.observer.deployment.AbstractFlinkDeploymentObserver.observeJmDeployment(AbstractFlinkDeploymentObserver.java:150) at org.apache.flink.kubernetes.operator.observer.deployment.AbstractFlinkDeploymentObserver.observeInternal(AbstractFlinkDeploymentObserver.java:84) at org.apache.flink.kubernetes.operator.observer.deployment.AbstractFlinkDeploymentObserver.observeInternal(AbstractFlinkDeploymentObserver.java:55) at org.apache.flink.kubernetes.operator.observer.AbstractFlinkResourceObserver.observe(AbstractFlinkResourceObserver.java:56) at org.apache.flink.kubernetes.operator.observer.AbstractFlinkResourceObserver.observe(AbstractFlinkResourceObserver.java:32) at org.apache.flink.kubernetes.operator.controller.FlinkDeploymentController.reconcile(FlinkDeploymentController.java:113) at org.apache.flink.kubernetes.operator.controller.FlinkDeploymentController.reconcile(FlinkDeploymentController.java:54) at io.javaoperatorsdk.operator.processing.Controller$1.execute(Controller.java:136) at io.javaoperatorsdk.operator.processing.Controller$1.execute(Controller.java:94) at org.apache.flink.kubernetes.operator.metrics.OperatorJosdkMetrics.timeControllerExecution(OperatorJosdkMetrics.java:80) at io.javaoperatorsdk.operator.processing.Controller.reconcile(Controller.java:93) at io.javaoperatorsdk.operator.processing.event.ReconciliationDispatcher.reconcileExecution(ReconciliationDispatcher.java:130) at 
io.javaoperatorsdk.operator.processing.event.ReconciliationDispatcher.handleReconcile(ReconciliationDispatcher.java:110) at io.javaoperatorsdk.operator.processing.event.ReconciliationDispatcher.handleDispatch(ReconciliationDispatcher.java:81) at io.javaoperatorsdk.operator.processing.event.ReconciliationDispatcher.handleExecution(ReconciliationDispatcher.java:54) at io.javaoperatorsdk.operator.processing.event.EventProcessor$ReconcilerExecutor.run(EventProcessor.java:406) at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) at java.base/java.lang.Thread.run(Unknown Source) {code} This is the information we have on kubernetes side: {code:java} Normal Scheduled 2m19s default-scheduler Successfully assigned default/quickstart-base-86787586cd-lb7j6 to minikube Warning Failed 20s kubelet Failed to pull image "flink:1.14": rpc error: code = Unknown desc = context deadline exceeded *Warning Failed 20s kubelet Error*: ErrImagePull Normal BackOff 19s kubelet Back-off pulling image "flink:1.14" *Warning Failed 19s kubelet Error*: ImagePullBackOff Normal Pulling 7s (x2 over 2m19s) kubelet Pulling image "flink:1.14" {code} It would be good to add the additional message (in this case {{{}Failed to pull image "flink:1.14": rpc error: code = Unknown desc = context deadline exceeded{}}}) to the message of the {{DeploymentFailedException}} for traceability. 
was: When there is an image pull error, this is what we see in the operator log: {code:java} org.apache.flink.kubernetes.operator.exception.DeploymentFailedException: Back-off pulling image "flink:1.14" at org.apache.flink.kubernetes.operator.observer.deployment.AbstractFlinkDeploymentObserver.checkContainerBackoff(AbstractFlinkDeploymentObserver.java:194) at org.apache.flink.kubernetes.operator.observer.deployment.AbstractFlinkDeploymentObserver.observeJmDeployment(AbstractFlinkDeploymentObserver.java:150) at org.apache.flink.kubernetes.operator.observer.deployment.AbstractFlinkDeploymentObserver.observeInternal(AbstractFlinkDeploymentObserver.java:84) at org.apache.flink.kubernetes.operator.observer.deployment.AbstractFlinkDeploymentObserver.observeInternal(AbstractFlinkDeploymentObserver.java:55) at org.apache.flink.kubernetes.operator.observer.AbstractFlinkResourceObserver.observe(AbstractFlinkResourceObserver.java:56) at org.apache.flink.kubernetes.operator.observer.AbstractFlinkResourceObserver.observe(AbstractFlinkResourceObserver.java:32) at org.apache.flink.kubernetes.o
[jira] [Commented] (FLINK-30315) Add more information about image pull failures to the operator log
[ https://issues.apache.org/jira/browse/FLINK-30315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17643909#comment-17643909 ] Peter Vary commented on FLINK-30315: The {{ContainerStateWaiting}} contains the message that we want. The issue is that: - For {{ErrImagePull}} we have the correct message: {{Failed to pull image "flink:1.14": rpc error: code = Unknown desc = context deadline exceeded}} - For {{ImagePullBackOff}} we only have this message: {{Back-off pulling image "flink:1.14"}}, which is not that useful Based on this, I think we have the following options: # Throw {{DeploymentFailedException}} at {{ErrImagePull}} and provide the enhanced message. Cons: This throws an error on the first image pull error - previously we retried at least once (I am not sure that this is that important, as we continue to monitor the state of the deployment and we act on the state changes anyway) # Store the message in the state and provide it when the ImagePullBackOff occurs I would like to hear your opinions about the options, and I am interested in any alternatives you have in mind. Absent differing opinions, I would go for option 1. 
> Add more information about image pull failures to the operator log > -- > > Key: FLINK-30315 > URL: https://issues.apache.org/jira/browse/FLINK-30315 > Project: Flink > Issue Type: Improvement > Components: Kubernetes Operator >Reporter: Peter Vary >Priority: Major > > When there is an image pull error, this is what we see in the operator log: > {code:java} > org.apache.flink.kubernetes.operator.exception.DeploymentFailedException: > Back-off pulling image "flink:1.14" > at > org.apache.flink.kubernetes.operator.observer.deployment.AbstractFlinkDeploymentObserver.checkContainerBackoff(AbstractFlinkDeploymentObserver.java:194) > at > org.apache.flink.kubernetes.operator.observer.deployment.AbstractFlinkDeploymentObserver.observeJmDeployment(AbstractFlinkDeploymentObserver.java:150) > at > org.apache.flink.kubernetes.operator.observer.deployment.AbstractFlinkDeploymentObserver.observeInternal(AbstractFlinkDeploymentObserver.java:84) > at > org.apache.flink.kubernetes.operator.observer.deployment.AbstractFlinkDeploymentObserver.observeInternal(AbstractFlinkDeploymentObserver.java:55) > at > org.apache.flink.kubernetes.operator.observer.AbstractFlinkResourceObserver.observe(AbstractFlinkResourceObserver.java:56) > at > org.apache.flink.kubernetes.operator.observer.AbstractFlinkResourceObserver.observe(AbstractFlinkResourceObserver.java:32) > at > org.apache.flink.kubernetes.operator.controller.FlinkDeploymentController.reconcile(FlinkDeploymentController.java:113) > at > org.apache.flink.kubernetes.operator.controller.FlinkDeploymentController.reconcile(FlinkDeploymentController.java:54) > at > io.javaoperatorsdk.operator.processing.Controller$1.execute(Controller.java:136) > at > io.javaoperatorsdk.operator.processing.Controller$1.execute(Controller.java:94) > at > org.apache.flink.kubernetes.operator.metrics.OperatorJosdkMetrics.timeControllerExecution(OperatorJosdkMetrics.java:80) > at > 
io.javaoperatorsdk.operator.processing.Controller.reconcile(Controller.java:93) > at > io.javaoperatorsdk.operator.processing.event.ReconciliationDispatcher.reconcileExecution(ReconciliationDispatcher.java:130) > at > io.javaoperatorsdk.operator.processing.event.ReconciliationDispatcher.handleReconcile(ReconciliationDispatcher.java:110) > at > io.javaoperatorsdk.operator.processing.event.ReconciliationDispatcher.handleDispatch(ReconciliationDispatcher.java:81) > at > io.javaoperatorsdk.operator.processing.event.ReconciliationDispatcher.handleExecution(ReconciliationDispatcher.java:54) > at > io.javaoperatorsdk.operator.processing.event.EventProcessor$ReconcilerExecutor.run(EventProcessor.java:406) > at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown > Source) > at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown > Source) > at java.base/java.lang.Thread.run(Unknown Source) {code} > This is the information we have on kubernetes side: > {code} > Normal Scheduled 2m19s default-scheduler Successfully > assigned > default/quickstart-base-86787586cd-lb7j6 to minikube > Warning Failed 20s kubelet Failed to pull > image "flink:1.14": rpc error: code = Unknown desc = context deadline exceeded > *Warning Failed 20s kubelet Error*: > ErrImagePull > Normal BackOff 19s kubelet Back-off pulling > image "flink:1.14" > *Warning Failed 19s kubelet
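Option 1 from the comment above (surface the detailed {{ErrImagePull}} message instead of the terse {{ImagePullBackOff}} one) can be sketched as a plain decision function. This is not the operator's actual code; the class, the method, and the fallback string below are hypothetical, modeling only the reason/message pair a container waiting state carries.

```java
// Hypothetical sketch of option 1: prefer the detailed ErrImagePull message,
// since the ImagePullBackOff state only carries the generic back-off text.
class ImagePullErrorSketch {
    static String failureMessage(String waitingReason, String waitingMessage) {
        if ("ErrImagePull".equals(waitingReason)) {
            // ErrImagePull carries the root cause (e.g. the rpc error detail).
            return waitingMessage;
        }
        // ImagePullBackOff only yields the generic message.
        return "Back-off pulling image \"flink:1.14\"";
    }

    public static void main(String[] args) {
        String detailed = failureMessage(
                "ErrImagePull",
                "Failed to pull image \"flink:1.14\": rpc error: code = Unknown desc = context deadline exceeded");
        System.out.println(detailed);
    }
}
```

In the real operator this message would become the {{DeploymentFailedException}} text, so the actionable detail reaches the operator log instead of only the Kubernetes events.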
[jira] [Created] (FLINK-30315) Add more information about image pull failures to the operator log
Peter Vary created FLINK-30315: -- Summary: Add more information about image pull failures to the operator log Key: FLINK-30315 URL: https://issues.apache.org/jira/browse/FLINK-30315 Project: Flink Issue Type: Improvement Components: Kubernetes Operator Reporter: Peter Vary When there is an image pull error, this is what we see in the operator log: {code:java} org.apache.flink.kubernetes.operator.exception.DeploymentFailedException: Back-off pulling image "flink:1.14" at org.apache.flink.kubernetes.operator.observer.deployment.AbstractFlinkDeploymentObserver.checkContainerBackoff(AbstractFlinkDeploymentObserver.java:194) at org.apache.flink.kubernetes.operator.observer.deployment.AbstractFlinkDeploymentObserver.observeJmDeployment(AbstractFlinkDeploymentObserver.java:150) at org.apache.flink.kubernetes.operator.observer.deployment.AbstractFlinkDeploymentObserver.observeInternal(AbstractFlinkDeploymentObserver.java:84) at org.apache.flink.kubernetes.operator.observer.deployment.AbstractFlinkDeploymentObserver.observeInternal(AbstractFlinkDeploymentObserver.java:55) at org.apache.flink.kubernetes.operator.observer.AbstractFlinkResourceObserver.observe(AbstractFlinkResourceObserver.java:56) at org.apache.flink.kubernetes.operator.observer.AbstractFlinkResourceObserver.observe(AbstractFlinkResourceObserver.java:32) at org.apache.flink.kubernetes.operator.controller.FlinkDeploymentController.reconcile(FlinkDeploymentController.java:113) at org.apache.flink.kubernetes.operator.controller.FlinkDeploymentController.reconcile(FlinkDeploymentController.java:54) at io.javaoperatorsdk.operator.processing.Controller$1.execute(Controller.java:136) at io.javaoperatorsdk.operator.processing.Controller$1.execute(Controller.java:94) at org.apache.flink.kubernetes.operator.metrics.OperatorJosdkMetrics.timeControllerExecution(OperatorJosdkMetrics.java:80) at io.javaoperatorsdk.operator.processing.Controller.reconcile(Controller.java:93) at 
io.javaoperatorsdk.operator.processing.event.ReconciliationDispatcher.reconcileExecution(ReconciliationDispatcher.java:130) at io.javaoperatorsdk.operator.processing.event.ReconciliationDispatcher.handleReconcile(ReconciliationDispatcher.java:110) at io.javaoperatorsdk.operator.processing.event.ReconciliationDispatcher.handleDispatch(ReconciliationDispatcher.java:81) at io.javaoperatorsdk.operator.processing.event.ReconciliationDispatcher.handleExecution(ReconciliationDispatcher.java:54) at io.javaoperatorsdk.operator.processing.event.EventProcessor$ReconcilerExecutor.run(EventProcessor.java:406) at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) at java.base/java.lang.Thread.run(Unknown Source) {code} This is the information we have on kubernetes side: {code} Normal Scheduled 2m19s default-scheduler Successfully assigned default/quickstart-base-86787586cd-lb7j6 to minikube Warning Failed 20s kubelet Failed to pull image "flink:1.14": rpc error: code = Unknown desc = context deadline exceeded *Warning Failed 20s kubelet Error*: ErrImagePull Normal BackOff 19s kubelet Back-off pulling image "flink:1.14" *Warning Failed 19s kubelet Error*: ImagePullBackOff Normal Pulling 7s (x2 over 2m19s) kubelet Pulling image "flink:1.14" {code} It would be good to add the additional message (in this case {{Failed to pull image "flink:1.14": rpc error: code = Unknown desc = context deadline exceeded}}) to the message of the {{DeploymentFailedException}} for tracebility. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-30311) CI error: Back-off pulling image "flink:1.14"
Peter Vary created FLINK-30311: -- Summary: CI error: Back-off pulling image "flink:1.14" Key: FLINK-30311 URL: https://issues.apache.org/jira/browse/FLINK-30311 Project: Flink Issue Type: Sub-task Reporter: Peter Vary CI failed with: {{Flink Deployment failed 2022-12-06T08:45:03.0244383Z org.apache.flink.kubernetes.operator.exception.DeploymentFailedException: Back-off pulling image "flink:1.14"}} We should find the root cause of this issue and try to mitigate it. [https://github.com/apache/flink-kubernetes-operator/actions/runs/3627824632/jobs/6118131271] {code:java} 2022-12-06T08:45:03.0243558Z [m[33m2022-12-06 08:41:44,716[m [36mo.a.f.k.o.c.FlinkDeploymentController[m [1;31m[ERROR][default/flink-example-statemachine] Flink Deployment failed 2022-12-06T08:45:03.0244383Z org.apache.flink.kubernetes.operator.exception.DeploymentFailedException: Back-off pulling image "flink:1.14" 2022-12-06T08:45:03.0245385Zat org.apache.flink.kubernetes.operator.observer.deployment.AbstractFlinkDeploymentObserver.checkContainerBackoff(AbstractFlinkDeploymentObserver.java:194) 2022-12-06T08:45:03.0246604Zat org.apache.flink.kubernetes.operator.observer.deployment.AbstractFlinkDeploymentObserver.observeJmDeployment(AbstractFlinkDeploymentObserver.java:150) 2022-12-06T08:45:03.0247780Zat org.apache.flink.kubernetes.operator.observer.deployment.AbstractFlinkDeploymentObserver.observeInternal(AbstractFlinkDeploymentObserver.java:84) 2022-12-06T08:45:03.0248934Zat org.apache.flink.kubernetes.operator.observer.deployment.AbstractFlinkDeploymentObserver.observeInternal(AbstractFlinkDeploymentObserver.java:55) 2022-12-06T08:45:03.0249941Zat org.apache.flink.kubernetes.operator.observer.AbstractFlinkResourceObserver.observe(AbstractFlinkResourceObserver.java:56) 2022-12-06T08:45:03.0250844Zat org.apache.flink.kubernetes.operator.observer.AbstractFlinkResourceObserver.observe(AbstractFlinkResourceObserver.java:32) 2022-12-06T08:45:03.0252038Zat 
org.apache.flink.kubernetes.operator.controller.FlinkDeploymentController.reconcile(FlinkDeploymentController.java:113) 2022-12-06T08:45:03.0252936Zat org.apache.flink.kubernetes.operator.controller.FlinkDeploymentController.reconcile(FlinkDeploymentController.java:54) 2022-12-06T08:45:03.0253850Zat io.javaoperatorsdk.operator.processing.Controller$1.execute(Controller.java:136) 2022-12-06T08:45:03.0254412Zat io.javaoperatorsdk.operator.processing.Controller$1.execute(Controller.java:94) 2022-12-06T08:45:03.0255322Zat org.apache.flink.kubernetes.operator.metrics.OperatorJosdkMetrics.timeControllerExecution(OperatorJosdkMetrics.java:80) 2022-12-06T08:45:03.0256081Zat io.javaoperatorsdk.operator.processing.Controller.reconcile(Controller.java:93) 2022-12-06T08:45:03.0256872Zat io.javaoperatorsdk.operator.processing.event.ReconciliationDispatcher.reconcileExecution(ReconciliationDispatcher.java:130) 2022-12-06T08:45:03.0257804Zat io.javaoperatorsdk.operator.processing.event.ReconciliationDispatcher.handleReconcile(ReconciliationDispatcher.java:110) 2022-12-06T08:45:03.0258720Zat io.javaoperatorsdk.operator.processing.event.ReconciliationDispatcher.handleDispatch(ReconciliationDispatcher.java:81) 2022-12-06T08:45:03.0259635Zat io.javaoperatorsdk.operator.processing.event.ReconciliationDispatcher.handleExecution(ReconciliationDispatcher.java:54) 2022-12-06T08:45:03.0260448Zat io.javaoperatorsdk.operator.processing.event.EventProcessor$ReconcilerExecutor.run(EventProcessor.java:406) 2022-12-06T08:45:03.0261070Zat java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) 2022-12-06T08:45:03.0261595Zat java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) 2022-12-06T08:45:03.0262005Zat java.base/java.lang.Thread.run(Unknown Source) {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-30150) Evaluate operator error log whitelist entry: REST service in session cluster is bad now
[ https://issues.apache.org/jira/browse/FLINK-30150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17643305#comment-17643305 ] Peter Vary commented on FLINK-30150: This is the exception in the logs: {code:java} 2022-12-05T11:40:59.2665289Z [m[33m2022-12-05 11:40:26,746[m [36mo.a.f.k.o.o.d.SessionObserver [m [1;31m[ERROR][default/session-cluster-1] REST service in session cluster is bad now 2022-12-05T11:40:59.2665851Z java.util.concurrent.TimeoutException 2022-12-05T11:40:59.2666258Zat java.base/java.util.concurrent.CompletableFuture.timedGet(Unknown Source) 2022-12-05T11:40:59.2666841Zat java.base/java.util.concurrent.CompletableFuture.get(Unknown Source) 2022-12-05T11:40:59.2667549Zat org.apache.flink.kubernetes.operator.service.AbstractFlinkService.listJobs(AbstractFlinkService.java:231) 2022-12-05T11:40:59.2668462Zat org.apache.flink.kubernetes.operator.observer.deployment.SessionObserver.observeFlinkCluster(SessionObserver.java:48) 2022-12-05T11:40:59.2669809Zat org.apache.flink.kubernetes.operator.observer.deployment.AbstractFlinkDeploymentObserver.observeInternal(AbstractFlinkDeploymentObserver.java:89) 2022-12-05T11:40:59.2671385Zat org.apache.flink.kubernetes.operator.observer.deployment.AbstractFlinkDeploymentObserver.observeInternal(AbstractFlinkDeploymentObserver.java:55) 2022-12-05T11:40:59.2672514Zat org.apache.flink.kubernetes.operator.observer.AbstractFlinkResourceObserver.observe(AbstractFlinkResourceObserver.java:56) 2022-12-05T11:40:59.2673507Zat org.apache.flink.kubernetes.operator.observer.AbstractFlinkResourceObserver.observe(AbstractFlinkResourceObserver.java:32) 2022-12-05T11:40:59.2674466Zat org.apache.flink.kubernetes.operator.controller.FlinkDeploymentController.reconcile(FlinkDeploymentController.java:113) 2022-12-05T11:40:59.2675692Zat org.apache.flink.kubernetes.operator.controller.FlinkDeploymentController.reconcile(FlinkDeploymentController.java:54) 2022-12-05T11:40:59.2676509Zat 
io.javaoperatorsdk.operator.processing.Controller$1.execute(Controller.java:136) 2022-12-05T11:40:59.2677043Zat io.javaoperatorsdk.operator.processing.Controller$1.execute(Controller.java:94) 2022-12-05T11:40:59.2677741Zat org.apache.flink.kubernetes.operator.metrics.OperatorJosdkMetrics.timeControllerExecution(OperatorJosdkMetrics.java:80) 2022-12-05T11:40:59.2678451Zat io.javaoperatorsdk.operator.processing.Controller.reconcile(Controller.java:93) 2022-12-05T11:40:59.2679180Zat io.javaoperatorsdk.operator.processing.event.ReconciliationDispatcher.reconcileExecution(ReconciliationDispatcher.java:130) 2022-12-05T11:40:59.2680055Zat io.javaoperatorsdk.operator.processing.event.ReconciliationDispatcher.handleReconcile(ReconciliationDispatcher.java:110) 2022-12-05T11:40:59.2681621Zat io.javaoperatorsdk.operator.processing.event.ReconciliationDispatcher.handleDispatch(ReconciliationDispatcher.java:81) 2022-12-05T11:40:59.2682478Zat io.javaoperatorsdk.operator.processing.event.ReconciliationDispatcher.handleExecution(ReconciliationDispatcher.java:54) 2022-12-05T11:40:59.2683241Zat io.javaoperatorsdk.operator.processing.event.EventProcessor$ReconcilerExecutor.run(EventProcessor.java:406) 2022-12-05T11:40:59.2683817Zat java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) 2022-12-05T11:40:59.2684294Zat java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) 2022-12-05T11:40:59.2684676Zat java.base/java.lang.Thread.run(Unknown Source) {code} The log line show 2022-12-05 11:40:26,746 as the timestamp. This is happening when we manually kill the job to test the recovery: {code:java} 2022-12-05T11:40:12.8330378Z Successfully verified that sessionjob/flink-example-statemachine.status.jobStatus.state is in RUNNING state. 2022-12-05T11:40:12.9711940Z Kill the session-cluster-1-7bc5b4d7cb-t5hgq 2022-12-05T11:40:13.3083721Z Waiting for log "Restoring job 9b85cb750001 from Checkpoint"... 
2022-12-05T11:40:35.8208688Z Log "Restoring job 9b85cb750001 from Checkpoint" shows up. {code} I would say that this is expected. > Evaluate operator error log whitelist entry: REST service in session cluster > is bad now > --- > > Key: FLINK-30150 > URL: https://issues.apache.org/jira/browse/FLINK-30150 > Project: Flink > Issue Type: Sub-task > Components: Kubernetes Operator >Reporter: Gabor Somogyi >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-30199) Add a script to run Kubernetes Operator e2e tests manually
Peter Vary created FLINK-30199: -- Summary: Add a script to run Kubernetes Operator e2e tests manually Key: FLINK-30199 URL: https://issues.apache.org/jira/browse/FLINK-30199 Project: Flink Issue Type: Improvement Components: Kubernetes Operator Reporter: Peter Vary Currently it is very hard to run the Kubernetes Operator e2e tests manually, especially on Mac. We should improve this to ease the development process. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-29629) FlameGraph is empty for Legacy Source Threads
[ https://issues.apache.org/jira/browse/FLINK-29629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17629824#comment-17629824 ] Peter Vary commented on FLINK-29629: Most probably I will not have time to work on this part of the code in the near future :(. I mostly filed the ticket to document the current situation, which is not ideal. I fear that by closing the ticket, the three of us will be the only people who remember the issue for a while, but if this is how we handle these situations in Flink, feel free to go ahead and close it (I am still learning the processes :)). Thanks for all the help here, [~zhuzh]! > FlameGraph is empty for Legacy Source Threads > - > > Key: FLINK-29629 > URL: https://issues.apache.org/jira/browse/FLINK-29629 > Project: Flink > Issue Type: Bug > Components: Runtime / Web Frontend >Reporter: Peter Vary >Priority: Major > > Thread dump gets the stack trace for the {{Custom Source}} thread, but this > thread is always in {{TIMED_WAITING}}: > {code} > "Source: Custom Source -> A random source (1/2)#0" ... >java.lang.Thread.State: TIMED_WAITING (parking) > at jdk.internal.misc.Unsafe.park(java.base@11.0.16/Native Method) > - parking to wait for <0xea775750> (a > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) > at java.util.concurrent.locks.LockSupport.parkNanos() > at > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await() > at > org.apache.flink.streaming.runtime.tasks.mailbox.TaskMailboxImpl.take() > at > org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.processMailsWhenDefaultActionUnavailable(MailboxProcessor.java:335) > at > org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.processMail(MailboxProcessor.java:324) > at > org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:201) > [..] 
> {code} > The actual code is run in the {{Legacy Source Thread}}: > {code} > "Legacy Source Thread - Source: Custom Source -> A random source (1/2)#0" ... >java.lang.Thread.State: RUNNABLE > {code} > This causes the WebUI FlameGraph to be empty of any useful data. > This is an example code to reproduce: > {code} > DataStream inputStream = env.addSource(new > RandomRecordSource(recordSize)); > inputStream = inputStream.map(new CounterMapper()); > FlinkSink.forRowData(inputStream).tableLoader(loader).append(); > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-29754) HadoopConfigLoader should consider Hadoop configuration files
[ https://issues.apache.org/jira/browse/FLINK-29754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17623882#comment-17623882 ] Peter Vary commented on FLINK-29754: [This|https://github.com/apache/flink/blob/0e612856772d5f469c7d4a4fff90a58b6e0f5578/flink-filesystems/flink-hadoop-fs/src/main/java/org/apache/flink/runtime/util/HadoopUtils.java#L59] is the line that causes the issue: instantiating {{HdfsConfiguration}} requires {{hadoop-hdfs}} on the classpath. > HadoopConfigLoader should consider Hadoop configuration files > - > > Key: FLINK-29754 > URL: https://issues.apache.org/jira/browse/FLINK-29754 > Project: Flink > Issue Type: Bug > Components: FileSystems >Reporter: Peter Vary >Priority: Major > > Currently > [HadoopConfigLoader|https://github.com/apache/flink/blob/master/flink-filesystems/flink-hadoop-fs/src/main/java/org/apache/flink/runtime/util/HadoopConfigLoader.java] > considers Hadoop configurations on the classpath, but does not consider > Hadoop configuration files which are set in another way. > So if the Hadoop configuration is set through the {{HADOOP_CONF_DIR}} > environment variable, then the configuration loaded by the HadoopConfigLoader > will not contain the values set there. > This can cause unexpected behaviour when setting checkpoint / savepoint dirs > on S3, and the specific S3 configurations are set in the Hadoop configuration > files -- This message was sent by Atlassian Jira (v8.20.10#820010)
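A minimal stdlib-only sketch of the failure mode described above (the class and helper names here are hypothetical, not Flink code): {{HdfsConfiguration}} lives in the {{hadoop-hdfs}} artifact, so a direct {{new HdfsConfiguration()}} without that jar fails at runtime. A reflective classpath probe makes the dependency explicit:

```java
// Hypothetical helper illustrating why the HadoopUtils line fails:
// org.apache.hadoop.hdfs.HdfsConfiguration is only present when the
// hadoop-hdfs artifact is on the classpath; without it, a direct
// `new HdfsConfiguration()` throws NoClassDefFoundError.
public class HdfsClasspathCheck {

    /** Returns true only when hadoop-hdfs is on the classpath. */
    static boolean hdfsConfigurationAvailable() {
        try {
            Class.forName("org.apache.hadoop.hdfs.HdfsConfiguration");
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("hadoop-hdfs on classpath: " + hdfsConfigurationAvailable());
    }
}
```

On a JVM without the Hadoop jars, the probe returns false, which is exactly the situation the comment describes.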
[jira] [Commented] (FLINK-29754) HadoopConfigLoader should consider Hadoop configuration files
[ https://issues.apache.org/jira/browse/FLINK-29754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17623704#comment-17623704 ] Peter Vary commented on FLINK-29754: I created a configuration file like this: {code:java} hadoop.fs.s3a.buffer.dir /flink-data fs.s3a.bucket.probe 0 [..] {code} The configuration values set in the configuration files are available and used when accessing S3 if this configuration file is on the classpath (packaged in the jar). OTOH, if I create a HADOOP_CONF_DIR and put the config files there, then the configuration values are not available and not used when accessing S3. > HadoopConfigLoader should consider Hadoop configuration files > - > > Key: FLINK-29754 > URL: https://issues.apache.org/jira/browse/FLINK-29754 > Project: Flink > Issue Type: Bug > Components: FileSystems >Reporter: Peter Vary >Priority: Major > > Currently > [HadoopConfigLoader|https://github.com/apache/flink/blob/master/flink-filesystems/flink-hadoop-fs/src/main/java/org/apache/flink/runtime/util/HadoopConfigLoader.java] > considers Hadoop configurations on the classpath, but does not consider > Hadoop configuration files which are set in another way. > So if the Hadoop configuration is set through the {{HADOOP_CONF_DIR}} > environment variable, then the configuration loaded by the HadoopConfigLoader > will not contain the values set there. > This can cause unexpected behaviour when setting checkpoint / savepoint dirs > on S3, and the specific S3 configurations are set in the Hadoop configuration > files -- This message was sent by Atlassian Jira (v8.20.10#820010)
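The difference between the two setups can be sketched with a stdlib-only helper (class and method names are hypothetical): when HADOOP_CONF_DIR is set, the standard Hadoop config files would have to be located and added as resources explicitly, which is the step the reported behaviour suggests is missing:

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: the config files that would need to be added as
// resources when HADOOP_CONF_DIR is set, mirroring what Hadoop itself reads.
public class HadoopConfDirResolver {

    static List<Path> candidateConfigFiles(String confDir) {
        List<Path> files = new ArrayList<>();
        for (String name : new String[] {"core-site.xml", "hdfs-site.xml"}) {
            files.add(Paths.get(confDir, name));
        }
        return files;
    }

    public static void main(String[] args) {
        String confDir = System.getenv("HADOOP_CONF_DIR");
        if (confDir != null) {
            for (Path p : candidateConfigFiles(confDir)) {
                // Each existing file would be passed to Configuration.addResource(...)
                System.out.println(p + (Files.exists(p) ? " (found)" : " (missing)"));
            }
        }
    }
}
```

This only illustrates the lookup; the classpath case works today because Hadoop's Configuration picks those resources up automatically.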
[jira] [Created] (FLINK-29754) HadoopConfigLoader should consider Hadoop configuration files
Peter Vary created FLINK-29754: -- Summary: HadoopConfigLoader should consider Hadoop configuration files Key: FLINK-29754 URL: https://issues.apache.org/jira/browse/FLINK-29754 Project: Flink Issue Type: Bug Components: FileSystems Reporter: Peter Vary Currently [HadoopConfigLoader|https://github.com/apache/flink/blob/master/flink-filesystems/flink-hadoop-fs/src/main/java/org/apache/flink/runtime/util/HadoopConfigLoader.java] considers Hadoop configurations on the classpath, but does not consider Hadoop configuration files which are set in another way. So if the Hadoop configuration is set through the {{HADOOP_CONF_DIR}} environment variable, then the configuration loaded by the HadoopConfigLoader will not contain the values set there. This can cause unexpected behaviour when setting checkpoint / savepoint dirs on S3, and the specific S3 configurations are set in the Hadoop configuration files -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-29629) FlameGraph is empty for Legacy Source Threads
[ https://issues.apache.org/jira/browse/FLINK-29629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17623647#comment-17623647 ] Peter Vary commented on FLINK-29629: Thanks [~zhuzh]! Thanks for the info, I checked the linked document and agree with you and [~chesnay] that we should not sink more resources in the Legacy Sources than needed. Also +1 on adding an option to add the stack trace of the extra threads for the operator FlameGraph. In some cases they are not that important as they are not on the critical path, but they are consuming resources and may become a bottleneck, so it would be good to have an option to display them. > FlameGraph is empty for Legacy Source Threads > - > > Key: FLINK-29629 > URL: https://issues.apache.org/jira/browse/FLINK-29629 > Project: Flink > Issue Type: Bug > Components: Runtime / Web Frontend >Reporter: Peter Vary >Priority: Major > > Thread dump gets the stack trace for the {{Custom Source}} thread, but this > thread is always in {{TIMED_WAITING}}: > {code} > "Source: Custom Source -> A random source (1/2)#0" ... >java.lang.Thread.State: TIMED_WAITING (parking) > at jdk.internal.misc.Unsafe.park(java.base@11.0.16/Native Method) > - parking to wait for <0xea775750> (a > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) > at java.util.concurrent.locks.LockSupport.parkNanos() > at > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await() > at > org.apache.flink.streaming.runtime.tasks.mailbox.TaskMailboxImpl.take() > at > org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.processMailsWhenDefaultActionUnavailable(MailboxProcessor.java:335) > at > org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.processMail(MailboxProcessor.java:324) > at > org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:201) > [..] 
> {code} > The actual code is run in the {{Legacy Source Thread}}: > {code} > "Legacy Source Thread - Source: Custom Source -> A random source (1/2)#0" ... >java.lang.Thread.State: RUNNABLE > {code} > This causes the WebUI FlameGraph to be empty of any useful data. > This is an example code to reproduce: > {code} > DataStream inputStream = env.addSource(new > RandomRecordSource(recordSize)); > inputStream = inputStream.map(new CounterMapper()); > FlinkSink.forRowData(inputStream).tableLoader(loader).append(); > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-29713) Kubernetes operator should restart failed jobs
[ https://issues.apache.org/jira/browse/FLINK-29713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17622561#comment-17622561 ] Peter Vary commented on FLINK-29713: The main goal is to have the possibility to add a new sink when a new type of data arrives. Imagine a job where the incoming data defines the sinks, like Kafka topics or database tables. We start with a set of known data types. The main task will create the appropriate topics and sinks based on the currently known data types. Later, a new record arrives with a new type that needs a new sink. The job needs to be reconfigured and a new sink needs to be added. This way the Flink job can dynamically adjust itself to handle the incoming data. > Kubernetes operator should restart failed jobs > -- > > Key: FLINK-29713 > URL: https://issues.apache.org/jira/browse/FLINK-29713 > Project: Flink > Issue Type: Improvement > Components: Kubernetes Operator >Reporter: Peter Vary >Assignee: Peter Vary >Priority: Major > Fix For: kubernetes-operator-1.3.0 > > > It would be good to have the possibility to restart the Flink Application if > it goes to {{FAILED}} state. > This could be used to restart, and reconfigure the job dynamically in the > application {{main}} method if the current application can not handle the > incoming data -- This message was sent by Atlassian Jira (v8.20.10#820010)
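The restart-to-reconfigure pattern described above can be sketched with a stdlib-only toy model (all names hypothetical; the real job would build Flink sinks and fail the application instead of returning a value):

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch of the pattern: main() builds one sink per known data
// type; when a record of an unknown type arrives the job fails, the operator
// restarts it, and the rebuilt plan includes the new sink.
public class DynamicSinkPlanner {

    /** The sinks main() would create given the currently known data types. */
    static Set<String> planSinks(Set<String> knownTypes) {
        return new TreeSet<>(knownTypes);
    }

    /** Returns the first record type that has no sink yet, or null if all are covered. */
    static String firstUnknownType(Set<String> sinks, List<String> incoming) {
        for (String type : incoming) {
            if (!sinks.contains(type)) {
                // In the real job this is where the application would fail,
                // so that the operator restarts it with the new type known.
                return type;
            }
        }
        return null;
    }
}
```

The operator-side restart-on-FAILED behaviour requested in the ticket is what closes the loop: fail, restart, replan with the extended type set.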
[jira] [Created] (FLINK-29713) Kubernetes operator should restart failed jobs
Peter Vary created FLINK-29713: -- Summary: Kubernetes operator should restart failed jobs Key: FLINK-29713 URL: https://issues.apache.org/jira/browse/FLINK-29713 Project: Flink Issue Type: Improvement Components: Kubernetes Operator Reporter: Peter Vary It would be good to have the possibility to restart the Flink Application if it goes to {{FAILED}} state. This could be used to restart, and reconfigure the job dynamically in the application {{main}} method if the current application can not handle the incoming data -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-29629) FlameGraph is empty for Legacy Source Threads
[ https://issues.apache.org/jira/browse/FLINK-29629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17619053#comment-17619053 ] Peter Vary commented on FLINK-29629: [~chesnay]: Does that mean that some of the code below uses a deprecated API for creating the {{Source}}? {code} DataStream inputStream = env.addSource(new RandomRecordSource(recordSize)); inputStream = inputStream.map(new CounterMapper()); FlinkSink.forRowData(inputStream).tableLoader(loader).append(); {code} Or is it just that there is an ongoing effort to replace the implementation, which ATM uses the {{Legacy}} Source, and once that is finished it will use the new Source? Thanks, Peter > FlameGraph is empty for Legacy Source Threads > - > > Key: FLINK-29629 > URL: https://issues.apache.org/jira/browse/FLINK-29629 > Project: Flink > Issue Type: Bug > Components: Runtime / Web Frontend >Reporter: Peter Vary >Priority: Major > > Thread dump gets the stack trace for the {{Custom Source}} thread, but this > thread is always in {{TIMED_WAITING}}: > {code} > "Source: Custom Source -> A random source (1/2)#0" ... >java.lang.Thread.State: TIMED_WAITING (parking) > at jdk.internal.misc.Unsafe.park(java.base@11.0.16/Native Method) > - parking to wait for <0xea775750> (a > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) > at java.util.concurrent.locks.LockSupport.parkNanos() > at > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await() > at > org.apache.flink.streaming.runtime.tasks.mailbox.TaskMailboxImpl.take() > at > org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.processMailsWhenDefaultActionUnavailable(MailboxProcessor.java:335) > at > org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.processMail(MailboxProcessor.java:324) > at > org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:201) > [..] 
> {code} > The actual code is run in the {{Legacy Source Thread}}: > {code} > "Legacy Source Thread - Source: Custom Source -> A random source (1/2)#0" ... >java.lang.Thread.State: RUNNABLE > {code} > This causes the WebUI FlameGraph to be empty of any useful data. > This is an example code to reproduce: > {code} > DataStream inputStream = env.addSource(new > RandomRecordSource(recordSize)); > inputStream = inputStream.map(new CounterMapper()); > FlinkSink.forRowData(inputStream).tableLoader(loader).append(); > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
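For reference, a pseudocode sketch (not compiled here; {{fromSource}}, {{WatermarkStrategy}}, and {{NumberSequenceSource}}-style sources come from the FLIP-27 Source API available since Flink 1.12, and this assumes {{RandomRecordSource}} were reimplemented as a new-style {{Source}}) of how the reproducer could avoid the {{Legacy Source Thread}} entirely:

```
// Pseudocode sketch - a FLIP-27 style source runs in the task mailbox thread,
// so the thread dump samples the actual work and the FlameGraph is populated:
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
DataStream<RowData> inputStream = env.fromSource(
        new RandomRecordSource(recordSize),   // hypothetical: would need to implement Source<...>
        WatermarkStrategy.noWatermarks(),
        "A random source");
inputStream = inputStream.map(new CounterMapper());
FlinkSink.forRowData(inputStream).tableLoader(loader).append();
```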
[jira] [Created] (FLINK-29629) FlameGraph is empty for Legacy Source Threads
Peter Vary created FLINK-29629: -- Summary: FlameGraph is empty for Legacy Source Threads Key: FLINK-29629 URL: https://issues.apache.org/jira/browse/FLINK-29629 Project: Flink Issue Type: Bug Components: Runtime / Web Frontend Reporter: Peter Vary Thread dump gets the stack trace for the {{Custom Source}} thread, but this thread is always in {{TIMED_WAITING}}: {code} "Source: Custom Source -> A random source (1/2)#0" ... java.lang.Thread.State: TIMED_WAITING (parking) at jdk.internal.misc.Unsafe.park(java.base@11.0.16/Native Method) - parking to wait for <0xea775750> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at java.util.concurrent.locks.LockSupport.parkNanos() at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await() at org.apache.flink.streaming.runtime.tasks.mailbox.TaskMailboxImpl.take() at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.processMailsWhenDefaultActionUnavailable(MailboxProcessor.java:335) at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.processMail(MailboxProcessor.java:324) at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:201) [..] {code} The actual code is run in the {{Legacy Source Thread}}: {code} "Legacy Source Thread - Source: Custom Source -> A random source (1/2)#0" ... java.lang.Thread.State: RUNNABLE {code} This causes the WebUI FlameGraph to be empty of any useful data. This is an example code to reproduce: {code} DataStream inputStream = env.addSource(new RandomRecordSource(recordSize)); inputStream = inputStream.map(new CounterMapper()); FlinkSink.forRowData(inputStream).tableLoader(loader).append(); {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (FLINK-29123) Dynamic paramters are not pushed to working with kubernetes
[ https://issues.apache.org/jira/browse/FLINK-29123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Vary updated FLINK-29123: --- Summary: Dynamic paramters are not pushed to working with kubernetes (was: Dynamic paramters are not pushed to working with kubertenes) > Dynamic paramters are not pushed to working with kubernetes > --- > > Key: FLINK-29123 > URL: https://issues.apache.org/jira/browse/FLINK-29123 > Project: Flink > Issue Type: Bug > Components: Deployment / Kubernetes >Affects Versions: 1.15.2 >Reporter: Peter Vary >Priority: Major > > It is not possible to push dynamic parameters for the kubernetes deployments -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-29123) Dynamic paramters are not pushed to working with kubertenes
Peter Vary created FLINK-29123: -- Summary: Dynamic paramters are not pushed to working with kubertenes Key: FLINK-29123 URL: https://issues.apache.org/jira/browse/FLINK-29123 Project: Flink Issue Type: Bug Components: Deployment / Kubernetes Affects Versions: 1.15.2 Reporter: Peter Vary It is not possible to push dynamic parameters for the kubernetes deployments -- This message was sent by Atlassian Jira (v8.20.10#820010)