[jira] [Comment Edited] (FLINK-34043) Remove deprecated Sink V2 interfaces

2024-09-03 Thread Peter Vary (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17878995#comment-17878995
 ] 

Peter Vary edited comment on FLINK-34043 at 9/3/24 7:23 PM:


[~guoweijie]: It would be nice if you could remove the deprecated 
interfaces/methods.

Thanks, Peter 


was (Author: pvary):
[~guoweijie]: It would be nice if you could remove the deprecated 
interfaces/methods?

> Remove deprecated Sink V2 interfaces
> 
>
> Key: FLINK-34043
> URL: https://issues.apache.org/jira/browse/FLINK-34043
> Project: Flink
>  Issue Type: Sub-task
>  Components: API / DataStream
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>
> In Flink 1.20.0 we should remove the interfaces deprecated by FLINK-33973



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-34043) Remove deprecated Sink V2 interfaces

2024-09-03 Thread Peter Vary (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17878995#comment-17878995
 ] 

Peter Vary commented on FLINK-34043:


[~guoweijie]: It would be nice if you could remove the deprecated 
interfaces/methods?

> Remove deprecated Sink V2 interfaces
> 
>
> Key: FLINK-34043
> URL: https://issues.apache.org/jira/browse/FLINK-34043
> Project: Flink
>  Issue Type: Sub-task
>  Components: API / DataStream
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>
> In Flink 1.20.0 we should remove the interfaces deprecated by FLINK-33973



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-34840) [3.1][pipeline-connectors] Add Implementation of DataSink in Iceberg.

2024-05-26 Thread Peter Vary (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17849623#comment-17849623
 ] 

Peter Vary commented on FLINK-34840:


If you need help with Iceberg reviews, feel free to ping me.

> [3.1][pipeline-connectors] Add Implementation of DataSink in Iceberg.
> -
>
> Key: FLINK-34840
> URL: https://issues.apache.org/jira/browse/FLINK-34840
> Project: Flink
>  Issue Type: Improvement
>  Components: Flink CDC
>Reporter: Flink CDC Issue Import
>Priority: Major
>  Labels: github-import
>
> ### Search before asking
> - [X] I searched in the 
> [issues|https://github.com/ververica/flink-cdc-connectors/issues] and found 
> nothing similar.
> ### Motivation
> Add a pipeline sink implementation for https://github.com/apache/iceberg.
> ### Solution
> _No response_
> ### Alternatives
> _No response_
> ### Anything else?
> _No response_
> ### Are you willing to submit a PR?
> - [ ] I'm willing to submit a PR!
>  Imported from GitHub 
> Url: https://github.com/apache/flink-cdc/issues/2863
> Created by: [lvyanquan|https://github.com/lvyanquan]
> Labels: enhancement, 
> Created at: Wed Dec 13 14:37:54 CST 2023
> State: open



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-35051) Weird priorities when processing unaligned checkpoints

2024-04-08 Thread Peter Vary (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-35051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17835142#comment-17835142
 ] 

Peter Vary commented on FLINK-35051:


FLINK-34704 is one of the ways this issue materializes.

> Weird priorities when processing unaligned checkpoints
> --
>
> Key: FLINK-35051
> URL: https://issues.apache.org/jira/browse/FLINK-35051
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Checkpointing, Runtime / Network, Runtime / 
> Task
>Affects Versions: 1.16.3, 1.17.2, 1.19.0, 1.18.1
>Reporter: Piotr Nowojski
>Priority: Major
>
> While looking through the code I noticed that `StreamTask` is processing 
> unaligned checkpoints in a strange order/priority. The end result is that 
> the unaligned checkpoint `Start Delay` / triggering of checkpoints in `StreamTask` 
> can be unnecessarily delayed by other mailbox actions in the system, like for 
> example:
> * processing time timers
> * `AsyncWaitOperator` results
> * ... 
> An incoming UC barrier is treated as a priority event by the network stack (it 
> will be polled from the input before anything else). This is what we want, 
> but polling elements from the network stack has lower priority than processing 
> enqueued mailbox actions.
> Secondly, if an AC barrier times out to UC, that's done via a mailbox action, but 
> this mailbox action is also not prioritised in any way, so other mailbox 
> actions could be unnecessarily executed first. 
> On top of that there is a clash of three separate concepts here:
> # Mailbox priority. yieldToDownstream - so in a sense the reverse of what we 
> would like to have for triggering a checkpoint, but it only kicks in for #yield() 
> calls, where it is actually correct that an operator in the middle of execution 
> can not yield to a checkpoint - it should only yield to downstream.
> # Control mails in the mailbox executor - cancellation is done via this, and it 
> bypasses the whole mailbox queue.
> # Priority events in the network stack.
> It's unfortunate that 1. and 3. have a naming clash, as the name "priority" is 
> used for both, and the highest-priority network event containing the UC barrier 
> actually has the lowest mailbox priority when executed via the mailbox.
> The control mail mechanism is a kind of priority mail executed out of order, 
> but it doesn't generalise well for use in checkpointing.
> This whole thing should be re-worked at some point. Ideally what we would 
> like to have is that:
> * the mail converting AC barriers to UC
> * polling a UC barrier from the network input
> * the checkpoint trigger via RPC for source tasks
> should be processed first, with the exception of yieldToDownstream, where the 
> current mailbox priorities should be adhered to.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-34042) Update the documentation for Sink V2

2024-02-06 Thread Peter Vary (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17814739#comment-17814739
 ] 

Peter Vary commented on FLINK-34042:


Currently there is no existing documentation for the Sink V2 API.

It would be good to create one, but this is not a blocker.

> Update the documentation for Sink V2
> 
>
> Key: FLINK-34042
> URL: https://issues.apache.org/jira/browse/FLINK-34042
> Project: Flink
>  Issue Type: Sub-task
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>
> Check the documentation and update the Sink V2 API usages whenever it is 
> needed



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (FLINK-33328) Allow TwoPhaseCommittingSink WithPreCommitTopology to alter the type of the Committable

2024-02-06 Thread Peter Vary (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Vary closed FLINK-33328.
--
Resolution: Duplicate

Solved as part of FLINK-33972

> Allow TwoPhaseCommittingSink WithPreCommitTopology to alter the type of the 
> Committable
> ---
>
> Key: FLINK-33328
> URL: https://issues.apache.org/jira/browse/FLINK-33328
> Project: Flink
>  Issue Type: Sub-task
>  Components: API / DataStream, Connectors / Common
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
>
> In case of the Iceberg Sink, we would like to use the _WithPreCommitTopology_ 
> to aggregate the writer results and create a single committable from them. So 
> we would like to change both the type, and the number of the messages. Using 
> the current _WithPreCommitTopology_ interface we can work around the issue by 
> using a Tuple, or POJO where some of the fields are used only before the 
> _addPreCommitTopology_ method, and some of the fields are only used after the 
> method, but this seems more like abusing the interface than using it.
> This is a more generic issue where the _WithPreCommitTopology_ should provide 
> a way to transform not only the data, but the type of the data channelled 
> through it.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (FLINK-34307) Release Testing Instructions: Verify FLINK-33972 Enhance and synchronize Sink API to match the Source API

2024-02-06 Thread Peter Vary (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Vary closed FLINK-34307.
--
Resolution: Won't Fix

[~lincoln.86xy]: Thanks for managing the release.
For now, the unit tests are sufficient.

> Release Testing Instructions: Verify FLINK-33972 Enhance and synchronize Sink 
> API to match the Source API
> -
>
> Key: FLINK-34307
> URL: https://issues.apache.org/jira/browse/FLINK-34307
> Project: Flink
>  Issue Type: Sub-task
>  Components: API / DataStream
>Affects Versions: 1.19.0
>Reporter: lincoln lee
>Assignee: Peter Vary
>Priority: Blocker
> Fix For: 1.19.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (FLINK-34306) Release Testing Instructions: Verify FLINK-25857 Add committer metrics to track the status of committables

2024-02-06 Thread Peter Vary (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Vary resolved FLINK-34306.

Resolution: Won't Fix

> Release Testing Instructions: Verify FLINK-25857 Add committer metrics to 
> track the status of committables 
> ---
>
> Key: FLINK-34306
> URL: https://issues.apache.org/jira/browse/FLINK-34306
> Project: Flink
>  Issue Type: Sub-task
>  Components: Connectors / Common
>Affects Versions: 1.19.0
>Reporter: lincoln lee
>Assignee: Peter Vary
>Priority: Blocker
> Fix For: 1.19.0
>
> Attachments: screenshot-1.png
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-34306) Release Testing Instructions: Verify FLINK-25857 Add committer metrics to track the status of committables

2024-02-06 Thread Peter Vary (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17814733#comment-17814733
 ] 

Peter Vary commented on FLINK-34306:


[~lincoln.86xy]: Thanks for managing the release.
For now, the unit tests are sufficient.

> Release Testing Instructions: Verify FLINK-25857 Add committer metrics to 
> track the status of committables 
> ---
>
> Key: FLINK-34306
> URL: https://issues.apache.org/jira/browse/FLINK-34306
> Project: Flink
>  Issue Type: Sub-task
>  Components: Connectors / Common
>Affects Versions: 1.19.0
>Reporter: lincoln lee
>Assignee: Peter Vary
>Priority: Blocker
> Fix For: 1.19.0
>
> Attachments: screenshot-1.png
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-25857) Add committer metrics to track the status of committables

2024-01-31 Thread Peter Vary (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-25857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Vary updated FLINK-25857:
---
Release Note: 
The TwoPhaseCommittingSink.createCommitter method parametrization has been 
changed: a new CommitterInitContext parameter has been added.

The original method will remain available during the 1.19 release line, but 
it will be removed in subsequent releases.

When migrating, please also consider the changes introduced by FLINK-33973 and 
FLIP-372 
(https://cwiki.apache.org/confluence/display/FLINK/FLIP-372%3A+Enhance+and+synchronize+Sink+API+to+match+the+Source+API)
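
For illustration, a simplified sketch of the signature change described above 
(abbreviated, assumed signatures rather than the Flink source; FLIP-371 is the 
authoritative reference):

{code:java}
// Sketch only - simplified from the release note above.
public interface TwoPhaseCommittingSink<InputT, CommT> extends Sink<InputT> {

    // Original variant, deprecated but kept through the 1.19 release line:
    @Deprecated
    Committer<CommT> createCommitter() throws IOException;

    // New variant with the added CommitterInitContext parameter:
    Committer<CommT> createCommitter(CommitterInitContext context) throws IOException;
}
{code}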

> Add committer metrics to track the status of committables
> -
>
> Key: FLINK-25857
> URL: https://issues.apache.org/jira/browse/FLINK-25857
> Project: Flink
>  Issue Type: Sub-task
>  Components: Connectors / Common
>Reporter: Fabian Paul
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.19.0
>
> Attachments: image-2023-10-20-17-23-09-595.png, screenshot-1.png
>
>
> With Sink V2 we can now track the progress of a committable during committing 
> and show metrics about the committing status. (i.e. failed, retried, 
> succeeded).
> The voted FLIP 
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-371%3A+Provide+initialization+context+for+Committer+creation+in+TwoPhaseCommittingSink



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-33973) Add new interfaces for SinkV2 to synchronize the API with the SourceV2 API

2024-01-31 Thread Peter Vary (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Vary updated FLINK-33973:
---
Release Note: 
According to FLIP-372 
(https://cwiki.apache.org/confluence/display/FLINK/FLIP-372%3A+Enhance+and+synchronize+Sink+API+to+match+the+Source+API)
 the SinkV2 API has been changed.

The following interfaces are deprecated: TwoPhaseCommittingSink, StatefulSink, 
WithPreWriteTopology, WithPreCommitTopology, WithPostCommitTopology.

The following new interfaces have been introduced: CommitterInitContext, 
CommittingSinkWriter, WriterInitContext, StatefulSinkWriter.

The parameter list of the following interface method has been changed: 
Sink.createWriter.

The original interfaces will remain available during the 1.19 release line, but 
they will be removed in subsequent releases. For the changes required when 
migrating, please consult the Migration Plan detailed in the FLIP.
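
For illustration, a simplified sketch of the Sink.createWriter parameter change 
mentioned above (abbreviated, assumed signatures rather than the Flink source; 
the FLIP is the authoritative reference):

{code:java}
// Sketch only - simplified from the release note above.
public interface Sink<InputT> {

    // Original variant, deprecated but kept through the 1.19 release line:
    @Deprecated
    SinkWriter<InputT> createWriter(InitContext context) throws IOException;

    // New variant taking the newly introduced WriterInitContext:
    SinkWriter<InputT> createWriter(WriterInitContext context) throws IOException;
}
{code}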

> Add new interfaces for SinkV2 to synchronize the API with the SourceV2 API
> --
>
> Key: FLINK-33973
> URL: https://issues.apache.org/jira/browse/FLINK-33973
> Project: Flink
>  Issue Type: Sub-task
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.19.0
>
>
> Create the new interfaces, set up inheritance and deprecation to finalize the 
> interface.
> After this change the new interfaces will exist, but they will not be 
> functional.
> The existing interfaces and tests should keep working without issue, to verify 
> that adding the API is backward compatible.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-34228) Add long UTF serializer/deserializer

2024-01-24 Thread Peter Vary (Jira)
Peter Vary created FLINK-34228:
--

 Summary: Add long UTF serializer/deserializer
 Key: FLINK-34228
 URL: https://issues.apache.org/jira/browse/FLINK-34228
 Project: Flink
  Issue Type: Improvement
Reporter: Peter Vary


DataOutputSerializer.writeUTF has a hard limit on the length of the string 
(64k). This is inherited from the DataOutput.writeUTF method, where the JDK 
specifically defines this limit [1].

For our use-case we need to be able to serialize longer UTF 
strings, so we will need to define a writeLongUTF method with a specification 
similar to writeUTF, but without the length limit.

Based on the discussion on the mailing list, this is a good additional 
serialization utility for Flink [2].

[1] - 
https://docs.oracle.com/javase/8/docs/api/java/io/DataOutput.html#writeUTF-java.lang.String-
[2] - https://lists.apache.org/thread/ocm6cj0h8o3wbwo7fz2l1b4odss750rk
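
For illustration, a minimal sketch of what such a writeLongUTF could look like, 
assuming a 4-byte length prefix instead of writeUTF's unsigned 2-byte one (the 
method name comes from the description above; the encoding details are an 
assumption of this sketch, not the final implementation):

{code:java}
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

public final class LongUtf {
    // Sketch: an int length prefix lifts the 64k limit of writeUTF.
    // (writeUTF uses modified UTF-8; plain UTF-8 is used here for simplicity.)
    public static void writeLongUTF(DataOutputStream out, String str) throws IOException {
        byte[] bytes = str.getBytes(StandardCharsets.UTF_8);
        out.writeInt(bytes.length);
        out.write(bytes);
    }
}
{code}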



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-34209) Migrate FileSink to the new SinkV2 API

2024-01-23 Thread Peter Vary (Jira)
Peter Vary created FLINK-34209:
--

 Summary: Migrate FileSink to the new SinkV2 API
 Key: FLINK-34209
 URL: https://issues.apache.org/jira/browse/FLINK-34209
 Project: Flink
  Issue Type: Sub-task
Reporter: Peter Vary


Currently `FileSink` uses `TwoPhaseCommittingSink` and `StatefulSink` from the 
SinkV2 API. We should migrate it to use the new SinkV2 API.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-34208) Migrate SinkV1Adapter to the new SinkV2 API

2024-01-23 Thread Peter Vary (Jira)
Peter Vary created FLINK-34208:
--

 Summary: Migrate SinkV1Adapter to the new SinkV2 API
 Key: FLINK-34208
 URL: https://issues.apache.org/jira/browse/FLINK-34208
 Project: Flink
  Issue Type: Sub-task
Reporter: Peter Vary


Currently SinkV1Adapter is still using `TwoPhaseCommittingSink` and `StatefulSink`.
We should migrate it to use the new API.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-34043) Remove deprecated Sink V2 interfaces

2024-01-09 Thread Peter Vary (Jira)
Peter Vary created FLINK-34043:
--

 Summary: Remove deprecated Sink V2 interfaces
 Key: FLINK-34043
 URL: https://issues.apache.org/jira/browse/FLINK-34043
 Project: Flink
  Issue Type: Sub-task
  Components: API / DataStream
Reporter: Peter Vary


In Flink 1.20.0 we should remove the interfaces deprecated by FLINK-33973



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-34042) Update the documentation for Sink V2

2024-01-09 Thread Peter Vary (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Vary updated FLINK-34042:
---
Description: Check the documentation and update the Sink V2 API usages 
whenever it is needed

> Update the documentation for Sink V2
> 
>
> Key: FLINK-34042
> URL: https://issues.apache.org/jira/browse/FLINK-34042
> Project: Flink
>  Issue Type: Sub-task
>Reporter: Peter Vary
>Priority: Major
>
> Check the documentation and update the Sink V2 API usages whenever it is 
> needed



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-34042) Update the documentation

2024-01-09 Thread Peter Vary (Jira)
Peter Vary created FLINK-34042:
--

 Summary: Update the documentation 
 Key: FLINK-34042
 URL: https://issues.apache.org/jira/browse/FLINK-34042
 Project: Flink
  Issue Type: Sub-task
Reporter: Peter Vary






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-34042) Update the documentation for Sink V2

2024-01-09 Thread Peter Vary (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Vary updated FLINK-34042:
---
Summary: Update the documentation for Sink V2  (was: Update the 
documentation )

> Update the documentation for Sink V2
> 
>
> Key: FLINK-34042
> URL: https://issues.apache.org/jira/browse/FLINK-34042
> Project: Flink
>  Issue Type: Sub-task
>Reporter: Peter Vary
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-33975) Tests for the new Sink V2 transformations

2024-01-03 Thread Peter Vary (Jira)
Peter Vary created FLINK-33975:
--

 Summary: Tests for the new Sink V2 transformations
 Key: FLINK-33975
 URL: https://issues.apache.org/jira/browse/FLINK-33975
 Project: Flink
  Issue Type: Sub-task
Reporter: Peter Vary


Create new tests for the SinkV2 API transformations, and migrate some of the 
existing tests to the new API. Some of the old tests should keep using the old API 
to make sure that backward compatibility is tested until the deprecated 
interfaces are removed.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-33974) Implement the Sink transformation depending on the new SinkV2 interfaces

2024-01-03 Thread Peter Vary (Jira)
Peter Vary created FLINK-33974:
--

 Summary: Implement the Sink transformation depending on the new 
SinkV2 interfaces
 Key: FLINK-33974
 URL: https://issues.apache.org/jira/browse/FLINK-33974
 Project: Flink
  Issue Type: Sub-task
Reporter: Peter Vary


Implement the changes to the Sink transformation which should depend only on 
the new API interfaces. The tests should remain the same, to ensure backward 
compatibility.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-33973) Add new interfaces for SinkV2 to synchronize the API with the SourceV2 API

2024-01-03 Thread Peter Vary (Jira)
Peter Vary created FLINK-33973:
--

 Summary: Add new interfaces for SinkV2 to synchronize the API with 
the SourceV2 API
 Key: FLINK-33973
 URL: https://issues.apache.org/jira/browse/FLINK-33973
 Project: Flink
  Issue Type: Sub-task
Reporter: Peter Vary


Create the new interfaces, set up inheritance and deprecation to finalize the 
interface.
After this change the new interfaces will exist, but they will not be 
functional.

The existing interfaces and tests should keep working without issue, to verify 
that adding the API is backward compatible.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-33972) Enhance and synchronize Sink API to match the Source API

2024-01-03 Thread Peter Vary (Jira)
Peter Vary created FLINK-33972:
--

 Summary: Enhance and synchronize Sink API to match the Source API
 Key: FLINK-33972
 URL: https://issues.apache.org/jira/browse/FLINK-33972
 Project: Flink
  Issue Type: Improvement
  Components: API / DataStream
Reporter: Peter Vary


Umbrella jira for the implementation of 
https://cwiki.apache.org/confluence/display/FLINK/FLIP-372%3A+Enhance+and+synchronize+Sink+API+to+match+the+Source+API



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-33528) Externalize Python connector code

2023-12-21 Thread Peter Vary (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17799320#comment-17799320
 ] 

Peter Vary commented on FLINK-33528:


[~Sergey Nuyanzin]: FLINK-33762 is needed so that the connectors can release 
their own Python packages. I would remove the code from the Flink repository 
only after the connector packages are released.

> Externalize Python connector code
> -
>
> Key: FLINK-33528
> URL: https://issues.apache.org/jira/browse/FLINK-33528
> Project: Flink
>  Issue Type: Technical Debt
>  Components: API / Python, Connectors / Common
>Affects Versions: 1.18.0
>Reporter: Márton Balassi
>Assignee: Peter Vary
>Priority: Major
> Fix For: 1.19.0
>
>
> During the connector externalization effort end to end tests for the python 
> connectors were left in the main repository under:
> [https://github.com/apache/flink/tree/master/flink-python/pyflink/datastream/connectors]
> These include both python connector implementation and tests. Currently they 
> depend on a previously released version of the underlying connectors, 
> otherwise they would introduce a circular dependency given that they are in 
> the flink repo at the moment.
> This setup prevents us from propagating any breaking change to PublicEvolving 
> and Internal APIs used by the connectors, as such changes break the python 
> e2e tests. We ran into this while implementing FLINK-25857.
> Note that we made the decision to turn off the Python test when merging 
> FLINK-25857, so now we are forced to fix this before 1.19 so that we can 
> re-enable the test runs - now in the externalized connector repos.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-33523) DataType ARRAY&lt;INT NOT NULL&gt; fails to cast into Object[]

2023-12-07 Thread Peter Vary (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17794128#comment-17794128
 ] 

Peter Vary commented on FLINK-33523:


Created a thread about this topic on the mailing list: 
https://lists.apache.org/thread/m4c879y8mb7hbn2kkjh9h3d8g1jphh3j
I would appreciate if you can share your thoughts there [~prabhujoseph], 
[~jeyhun], [~aitozi], [~jark], [~xccui]

> DataType ARRAY&lt;INT NOT NULL&gt; fails to cast into Object[]
> 
>
> Key: FLINK-33523
> URL: https://issues.apache.org/jira/browse/FLINK-33523
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / API
>Affects Versions: 1.18.0
>Reporter: Prabhu Joseph
>Priority: Major
>
> When upgrading Iceberg's Flink version to 1.18, we found the Flink-related 
> unit test case broken due to this issue. The below code used to work fine in 
> Flink 1.17 but failed after upgrading to 1.18. DataType ARRAY&lt;INT NOT NULL&gt; 
> fails to cast into Object[].
> *Error:*
> {code}
> Exception in thread "main" java.lang.ClassCastException: [I cannot be cast to 
> [Ljava.lang.Object;
> at FlinkArrayIntNotNullTest.main(FlinkArrayIntNotNullTest.java:18)
> {code}
> *Repro:*
> {code}
>   import org.apache.flink.table.data.ArrayData;
>   import org.apache.flink.table.data.GenericArrayData;
>   import org.apache.flink.table.api.EnvironmentSettings;
>   import org.apache.flink.table.api.TableEnvironment;
>   import org.apache.flink.table.api.TableResult;
>   public class FlinkArrayIntNotNullTest {
> public static void main(String[] args) throws Exception {
>   EnvironmentSettings settings = 
> EnvironmentSettings.newInstance().inBatchMode().build();
>   TableEnvironment env = TableEnvironment.create(settings);
>   env.executeSql("CREATE TABLE filesystemtable2 (id INT, data ARRAY NOT NULL>) WITH ('connector' = 'filesystem', 'path' = 
> '/tmp/FLINK/filesystemtable2', 'format'='json')");
>   env.executeSql("INSERT INTO filesystemtable2 VALUES (4,ARRAY [1,2,3])");
>   TableResult tableResult = env.executeSql("SELECT * from 
> filesystemtable2");
>   ArrayData actualArrayData = new GenericArrayData((Object[]) 
> tableResult.collect().next().getField(1));
> }
>   }
> {code}
> *Analysis:*
> 1. The code works fine with the ARRAY<INT> datatype. The issue happens when 
> using ARRAY<INT NOT NULL>.
> 2. The code works fine when casting into int[] instead of Object[].



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-33762) Versioned release of flink-connector-shared-utils python scripts

2023-12-06 Thread Peter Vary (Jira)
Peter Vary created FLINK-33762:
--

 Summary: Versioned release of flink-connector-shared-utils python 
scripts
 Key: FLINK-33762
 URL: https://issues.apache.org/jira/browse/FLINK-33762
 Project: Flink
  Issue Type: Sub-task
  Components: API / Python, Connectors / Common
Reporter: Peter Vary


We need a versioned release of the scripts stored in the 
flink-connector-shared-utils/python directory. This will allow even 
incompatible changes to these scripts, since connector developers can choose 
which version of the scripts they depend on.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-25857) Add committer metrics to track the status of committables

2023-11-27 Thread Peter Vary (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-25857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17789963#comment-17789963
 ] 

Peter Vary commented on FLINK-25857:


If you check the discussion [1], the diamond inheritance of the 
`Sink.createWriter` method prevents any backward compatible change of the 
method. One could argue that this is a flawed design.

*About the process and the compatibility*
[~Weijie Guo]: Here is my understanding of the FLIP process, please correct me 
if I am wrong somewhere:
- If there is a change which modifies or creates a new API, we should create a 
FLIP to discuss the change [2]
- We start the discussion on the mailing list, so everyone who is interested 
could participate [3]
- If there is a consensus on the design, we start a voting thread [4]
- If the voting is successful, we announce the result and close the FLIP [5]
- If we find issues during the implementation, we discuss them there - we do 
not modify the finalised FLIP [6]

Maybe it would be good to have an additional step: when there is a change 
related to the original design of the FLIP, we should send a message to the 
mailing list as well, to notify interested parties who are not following the 
actual implementation.

About the deprecation process, I have been working based on the API 
compatibility guarantees [7] stated in the docs. Based on the table there, a 
PublicEvolving API should be source and binary compatible for patch releases, 
but there are no guarantees for minor releases. Maybe the same redesign process 
happened during the implementation of FLIP-321 [8]? I was not involved there, 
so I do not have first-hand information.

[1] - https://github.com/apache/flink/pull/23555#discussion_r1371740397
[2] - 
https://cwiki.apache.org/confluence/display/FLINK/FLIP-371%3A+Provide+initialization+context+for+Committer+creation+in+TwoPhaseCommittingSink
[3] - https://lists.apache.org/thread/v3mrspdlrqrzvbwm0lcgr0j4v03dx97c
[4] - https://lists.apache.org/thread/4f7w4n3nywk8ygnwlxk39oncl3cntp3n
[5] - https://lists.apache.org/thread/jw39s55tzzpdkzmlh0vshmjnfrjg02nr
[6] - https://github.com/apache/flink/pull/23555#discussion_r1369177945
[7] - 
https://nightlies.apache.org/flink/flink-docs-master/docs/ops/upgrading/#api-compatibility-guarantees
[8] - 
https://cwiki.apache.org/confluence/display/FLINK/FLIP-321%3A+Introduce+an+API+deprecation+process



> Add committer metrics to track the status of committables
> -
>
> Key: FLINK-25857
> URL: https://issues.apache.org/jira/browse/FLINK-25857
> Project: Flink
>  Issue Type: Sub-task
>  Components: Connectors / Common
>Reporter: Fabian Paul
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.19.0
>
> Attachments: image-2023-10-20-17-23-09-595.png, screenshot-1.png
>
>
> With Sink V2 we can now track the progress of a committable during committing 
> and show metrics about the committing status. (i.e. failed, retried, 
> succeeded).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-25857) Add committer metrics to track the status of committables

2023-11-23 Thread Peter Vary (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-25857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17789335#comment-17789335
 ] 

Peter Vary commented on FLINK-25857:


For the Iceberg connector we release separate versions for different versions 
of Flink.

If I understand correctly, this was the case before the separation of the 
connectors too - every Flink version contained different versions of the 
connectors, and the jars might not work across versions.

Also, the `TwoPhaseCommittingSink` is marked with the `PublicEvolving` 
annotation, which means it could change between minor versions of Flink. We are 
pushing these changes forward, so the API could soon be changed to `Public`, 
and we can avoid these kinds of disruptions in the future. 

> Add committer metrics to track the status of committables
> -
>
> Key: FLINK-25857
> URL: https://issues.apache.org/jira/browse/FLINK-25857
> Project: Flink
>  Issue Type: Sub-task
>  Components: Connectors / Common
>Reporter: Fabian Paul
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.19.0
>
> Attachments: image-2023-10-20-17-23-09-595.png, screenshot-1.png
>
>
> With Sink V2 we can now track the progress of a committable during committing 
> and show metrics about the committing status. (i.e. failed, retried, 
> succeeded).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-33568) SinkV2MetricsITCase.testCommitterMetrics fails with NullPointerException

2023-11-16 Thread Peter Vary (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17786947#comment-17786947
 ] 

Peter Vary commented on FLINK-33568:


With the patch it looks like this:

!Screenshot 2023-11-16 at 22.30.51.png|width=612,height=128!

> SinkV2MetricsITCase.testCommitterMetrics fails with NullPointerException
> 
>
> Key: FLINK-33568
> URL: https://issues.apache.org/jira/browse/FLINK-33568
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Metrics
>Affects Versions: 1.19.0
>Reporter: Matthias Pohl
>Priority: Critical
>  Labels: pull-request-available
> Attachments: Screenshot 2023-11-16 at 22.30.51.png
>
>
> {code}
> Nov 16 01:48:57 01:48:57.537 [ERROR] Tests run: 2, Failures: 0, Errors: 1, 
> Skipped: 0, Time elapsed: 6.023 s <<< FAILURE! - in 
> org.apache.flink.test.streaming.runtime.SinkV2MetricsITCase
> Nov 16 01:48:57 01:48:57.538 [ERROR] 
> org.apache.flink.test.streaming.runtime.SinkV2MetricsITCase.testCommitterMetrics
>   Time elapsed: 0.745 s  <<< ERROR!
> Nov 16 01:48:57 java.lang.NullPointerException
> Nov 16 01:48:57   at 
> org.apache.flink.test.streaming.runtime.SinkV2MetricsITCase.assertSinkCommitterMetrics(SinkV2MetricsITCase.java:254)
> Nov 16 01:48:57   at 
> org.apache.flink.test.streaming.runtime.SinkV2MetricsITCase.testCommitterMetrics(SinkV2MetricsITCase.java:153)
> Nov 16 01:48:57   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native 
> Method)
> [...]
> {code}
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=54602&view=logs&j=8fd9202e-fd17-5b26-353c-ac1ff76c8f28&t=ea7cf968-e585-52cb-e0fc-f48de023a7ca&l=8546
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=54602&view=logs&j=baf26b34-3c6a-54e8-f93f-cf269b32f802&t=8c9d126d-57d2-5a9e-a8c8-ff53f7b35cd9&l=8605



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-33568) SinkV2MetricsITCase.testCommitterMetrics fails with NullPointerException

2023-11-16 Thread Peter Vary (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Vary updated FLINK-33568:
---
Attachment: Screenshot 2023-11-16 at 22.30.51.png

> SinkV2MetricsITCase.testCommitterMetrics fails with NullPointerException
> 
>
> Key: FLINK-33568
> URL: https://issues.apache.org/jira/browse/FLINK-33568
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Metrics
>Affects Versions: 1.19.0
>Reporter: Matthias Pohl
>Priority: Critical
>  Labels: pull-request-available
> Attachments: Screenshot 2023-11-16 at 22.30.51.png
>
>
> {code}
> Nov 16 01:48:57 01:48:57.537 [ERROR] Tests run: 2, Failures: 0, Errors: 1, 
> Skipped: 0, Time elapsed: 6.023 s <<< FAILURE! - in 
> org.apache.flink.test.streaming.runtime.SinkV2MetricsITCase
> Nov 16 01:48:57 01:48:57.538 [ERROR] 
> org.apache.flink.test.streaming.runtime.SinkV2MetricsITCase.testCommitterMetrics
>   Time elapsed: 0.745 s  <<< ERROR!
> Nov 16 01:48:57 java.lang.NullPointerException
> Nov 16 01:48:57   at 
> org.apache.flink.test.streaming.runtime.SinkV2MetricsITCase.assertSinkCommitterMetrics(SinkV2MetricsITCase.java:254)
> Nov 16 01:48:57   at 
> org.apache.flink.test.streaming.runtime.SinkV2MetricsITCase.testCommitterMetrics(SinkV2MetricsITCase.java:153)
> Nov 16 01:48:57   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native 
> Method)
> [...]
> {code}
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=54602&view=logs&j=8fd9202e-fd17-5b26-353c-ac1ff76c8f28&t=ea7cf968-e585-52cb-e0fc-f48de023a7ca&l=8546
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=54602&view=logs&j=baf26b34-3c6a-54e8-f93f-cf269b32f802&t=8c9d126d-57d2-5a9e-a8c8-ff53f7b35cd9&l=8605



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-33568) SinkV2MetricsITCase.testCommitterMetrics fails with NullPointerException

2023-11-16 Thread Peter Vary (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17786814#comment-17786814
 ] 

Peter Vary commented on FLINK-33568:


Checking

> SinkV2MetricsITCase.testCommitterMetrics fails with NullPointerException
> 
>
> Key: FLINK-33568
> URL: https://issues.apache.org/jira/browse/FLINK-33568
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Metrics
>Affects Versions: 1.19.0
>Reporter: Matthias Pohl
>Priority: Critical
>
> {code}
> Nov 16 01:48:57 01:48:57.537 [ERROR] Tests run: 2, Failures: 0, Errors: 1, 
> Skipped: 0, Time elapsed: 6.023 s <<< FAILURE! - in 
> org.apache.flink.test.streaming.runtime.SinkV2MetricsITCase
> Nov 16 01:48:57 01:48:57.538 [ERROR] 
> org.apache.flink.test.streaming.runtime.SinkV2MetricsITCase.testCommitterMetrics
>   Time elapsed: 0.745 s  <<< ERROR!
> Nov 16 01:48:57 java.lang.NullPointerException
> Nov 16 01:48:57   at 
> org.apache.flink.test.streaming.runtime.SinkV2MetricsITCase.assertSinkCommitterMetrics(SinkV2MetricsITCase.java:254)
> Nov 16 01:48:57   at 
> org.apache.flink.test.streaming.runtime.SinkV2MetricsITCase.testCommitterMetrics(SinkV2MetricsITCase.java:153)
> Nov 16 01:48:57   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native 
> Method)
> [...]
> {code}
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=54602&view=logs&j=8fd9202e-fd17-5b26-353c-ac1ff76c8f28&t=ea7cf968-e585-52cb-e0fc-f48de023a7ca&l=8546
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=54602&view=logs&j=baf26b34-3c6a-54e8-f93f-cf269b32f802&t=8c9d126d-57d2-5a9e-a8c8-ff53f7b35cd9&l=8605



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (FLINK-33295) Separate SinkV2 and SinkV1Adapter tests

2023-11-09 Thread Peter Vary (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17784689#comment-17784689
 ] 

Peter Vary edited comment on FLINK-33295 at 11/10/23 6:23 AM:
--

_InternalSinkWriterMetricGroup_ is an internal class, so in theory connectors 
should not use it.
 * How much effort would it be to enable the annotation check for the 
connectors?
 * We can expose the _MetricsGroupTestUtils_ in a test jar, if we see that the 
connectors would like to use it for testing.

Thanks for the heads-up!

 

Peter


was (Author: pvary):
`InternalSinkWriterMetricGroup` is an internal class, so in theory connectors 
should not use it.
 * How much effort would it be to enable the annotation check for the 
connectors?
 * We can expose the `MetricsGroupTestUtils` in a test jar, if we see that the 
connectors would like to use it for testing.

Thanks for the heads-up!

 

Peter

> Separate SinkV2 and SinkV1Adapter tests
> ---
>
> Key: FLINK-33295
> URL: https://issues.apache.org/jira/browse/FLINK-33295
> Project: Flink
>  Issue Type: Sub-task
>  Components: API / DataStream, Connectors / Common
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.19.0
>
>
> Current SinkV2 tests are based on the sink generated by the 
> _org.apache.flink.streaming.runtime.operators.sink.TestSink_ test class. This 
> test class does not generate the SinkV2 directly, but generates a SinkV1 and 
> wraps it with a 
> _org.apache.flink.streaming.api.transformations.SinkV1Adapter_. This 
> tests the SinkV2, but only as far as it is aligned with SinkV1 and the 
> SinkV1Adapter.
> We should have tests where we create a SinkV2 directly and the functionality 
> is tested without the adapter.
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-33295) Separate SinkV2 and SinkV1Adapter tests

2023-11-09 Thread Peter Vary (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17784689#comment-17784689
 ] 

Peter Vary commented on FLINK-33295:


`InternalSinkWriterMetricGroup` is an internal class, so in theory connectors 
should not use it.
 * How much effort would it be to enable the annotation check for the 
connectors?
 * We can expose the `MetricsGroupTestUtils` in a test jar, if we see that the 
connectors would like to use it for testing.

Thanks for the heads-up!

 

Peter

> Separate SinkV2 and SinkV1Adapter tests
> ---
>
> Key: FLINK-33295
> URL: https://issues.apache.org/jira/browse/FLINK-33295
> Project: Flink
>  Issue Type: Sub-task
>  Components: API / DataStream, Connectors / Common
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.19.0
>
>
> Current SinkV2 tests are based on the sink generated by the 
> _org.apache.flink.streaming.runtime.operators.sink.TestSink_ test class. This 
> test class does not generate the SinkV2 directly, but generates a SinkV1 and 
> wraps it with a 
> _org.apache.flink.streaming.api.transformations.SinkV1Adapter_. This 
> tests the SinkV2, but only as far as it is aligned with SinkV1 and the 
> SinkV1Adapter.
> We should have tests where we create a SinkV2 directly and the functionality 
> is tested without the adapter.
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-33328) Allow TwoPhaseCommittingSink WithPreCommitTopology to alter the type of the Committable

2023-10-20 Thread Peter Vary (Jira)
Peter Vary created FLINK-33328:
--

 Summary: Allow TwoPhaseCommittingSink WithPreCommitTopology to 
alter the type of the Committable
 Key: FLINK-33328
 URL: https://issues.apache.org/jira/browse/FLINK-33328
 Project: Flink
  Issue Type: Sub-task
  Components: API / DataStream, Connectors / Common
Reporter: Peter Vary


In case of the Iceberg Sink, we would like to use the _WithPreCommitTopology_ 
to aggregate the writer results and create a single committable from them. So 
we would like to change both the type, and the number of the messages. Using 
the current _WithPreCommitTopology_ interface we can work around the issue by 
using a Tuple, or POJO where some of the fields are used only before the 
_addPreCommitTopology_ method, and some of the fields are only used after the 
method, but this seems more like abusing the interface than using it.

This is a more generic issue where the _WithPreCommitTopology_ should provide a 
way to transform not only the data, but the type of the data channelled through 
it.
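
For illustration, a rough sketch of the requested generalization - an extra 
type parameter that lets the pre-commit topology change the committable type 
(all names here are illustrative assumptions, not the final Flink API):

{code:java}
// Sketch only: WriterResultT is what the writers emit, CommT is what the
// committer receives; the pre-commit topology may change both the type and
// the number of messages (e.g. aggregate many writer results into one).
public interface WithPreCommitTopology<InputT, WriterResultT, CommT> {
    DataStream<CommittableMessage<CommT>> addPreCommitTopology(
            DataStream<CommittableMessage<WriterResultT>> writerResults);
}
{code}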



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-33295) Separate SinkV2 and SinkV1Adapter tests

2023-10-17 Thread Peter Vary (Jira)
Peter Vary created FLINK-33295:
--

 Summary: Separate SinkV2 and SinkV1Adapter tests
 Key: FLINK-33295
 URL: https://issues.apache.org/jira/browse/FLINK-33295
 Project: Flink
  Issue Type: Sub-task
  Components: API / DataStream, Connectors / Common
Reporter: Peter Vary


Current SinkV2 tests are based on the sink generated by the 
_org.apache.flink.streaming.runtime.operators.sink.TestSink_ test class. This 
test class does not generate the SinkV2 directly, but generates a SinkV1 and 
wraps it with a _org.apache.flink.streaming.api.transformations.SinkV1Adapter_. 
This tests the SinkV2, but only as far as it is aligned with SinkV1 and the 
SinkV1Adapter.

We should have tests where we create a SinkV2 directly and the functionality is 
tested without the adapter.

 

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-32046) OOM caused by SplitAssignmentTracker.uncheckpointedAssignments

2023-05-10 Thread Peter Vary (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-32046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Vary updated FLINK-32046:
---
Description: 
If the checkpointing is turned off then the 
{{SplitAssignmentTracker.uncheckpointedAssignments}} is never cleared and grows 
indefinitely. Eventually leading to OOM.

The only other place which would remove elements from this map is 
{{{}getAndRemoveUncheckpointedAssignment{}}}, but it is only for failure 
scenarios.

By my understanding this problem exists since the introduction of the new 
{{Source}} implementation.

  was:
If the checkpointing is turned off then the 
{{SplitAssignmentTracker.uncheckpointedAssignments}} is never cleared and grows 
indefinitely. Eventually leading to OOM.

The only other place which would remove elements from this map is 
{{{}getAndRemoveUncheckpointedAssignment{}}}, but it is only for failure 
scenarios.

By my understanding this problem exists since the introduction of the new 
source code.


> OOM caused by SplitAssignmentTracker.uncheckpointedAssignments
> --
>
> Key: FLINK-32046
> URL: https://issues.apache.org/jira/browse/FLINK-32046
> Project: Flink
>  Issue Type: Bug
>  Components: API / Core
>Reporter: Peter Vary
>Priority: Major
>
> If the checkpointing is turned off then the 
> {{SplitAssignmentTracker.uncheckpointedAssignments}} is never cleared and 
> grows indefinitely. Eventually leading to OOM.
> The only other place which would remove elements from this map is 
> {{{}getAndRemoveUncheckpointedAssignment{}}}, but it is only for failure 
> scenarios.
> By my understanding this problem exists since the introduction of the new 
> {{Source}} implementation.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-32046) OOM caused by SplitAssignmentTracker.uncheckpointedAssignments

2023-05-10 Thread Peter Vary (Jira)
Peter Vary created FLINK-32046:
--

 Summary: OOM caused by 
SplitAssignmentTracker.uncheckpointedAssignments
 Key: FLINK-32046
 URL: https://issues.apache.org/jira/browse/FLINK-32046
 Project: Flink
  Issue Type: Bug
  Components: API / Core
Reporter: Peter Vary


If the checkpointing is turned off then the 
{{SplitAssignmentTracker.uncheckpointedAssignments}} is never cleared and grows 
indefinitely. Eventually leading to OOM.

The only other place which would remove elements from this map is 
{{{}getAndRemoveUncheckpointedAssignment{}}}, but it is only for failure 
scenarios.

By my understanding this problem exists since the introduction of the new 
source code.
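
For illustration, a minimal sketch of the growth pattern described above (the 
field and method names follow the report; the body is illustrative, not the 
actual Flink code):

{code:java}
// Sketch only: assignments accumulate per subtask, and with checkpointing
// turned off nothing ever drains the map.
private final Map<Integer, LinkedHashSet<SplitT>> uncheckpointedAssignments =
        new HashMap<>();

void recordSplitAssignment(int subtask, List<SplitT> splits) {
    uncheckpointedAssignments
            .computeIfAbsent(subtask, s -> new LinkedHashSet<>())
            .addAll(splits); // only drained on checkpoint, or by
                             // getAndRemoveUncheckpointedAssignment on failures
}
{code}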



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-31868) Fix DefaultInputSplitAssigner javadoc for class

2023-04-20 Thread Peter Vary (Jira)
Peter Vary created FLINK-31868:
--

 Summary: Fix DefaultInputSplitAssigner javadoc for class
 Key: FLINK-31868
 URL: https://issues.apache.org/jira/browse/FLINK-31868
 Project: Flink
  Issue Type: Improvement
  Components: API / Core
Reporter: Peter Vary


Based on the discussion [1] on the mailing list ({{there is no requirement of 
the order of splits by Flink itself}}), we should fix the discrepancy between 
the code and the comment by updating the comment.

 

[1] https://lists.apache.org/thread/74m7z2kzgpzylhrp1oq4lz37pnqjmbkh

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-31246) Beautify the SpecChange message

2023-03-01 Thread Peter Vary (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-31246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Vary updated FLINK-31246:
---
Summary: Beautify the SpecChange message  (was: Remove PodTemplate 
description from the SpecChange message)

> Beautify the SpecChange message
> ---
>
> Key: FLINK-31246
> URL: https://issues.apache.org/jira/browse/FLINK-31246
> Project: Flink
>  Issue Type: Improvement
>  Components: Kubernetes Operator
>Reporter: Peter Vary
>Priority: Major
>  Labels: pull-request-available
>
> Currently the Spec Change message contains the full PodTemplate twice.
> This makes the message very large while adding very little useful 
> information.
> We should abbreviate the message.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (FLINK-31246) Remove PodTemplate description from the SpecChange message

2023-02-28 Thread Peter Vary (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-31246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17694443#comment-17694443
 ] 

Peter Vary edited comment on FLINK-31246 at 2/28/23 8:57 AM:
-

Talked with [~gyfora] about this, and he is concerned that this would be a 
breaking change for some users and could cause issues for them


was (Author: pvary):
Talked with [~gyfora] about this, and he is concerned that this would be a 
breaking change for some users and could cause issues for other users

> Remove PodTemplate description from the SpecChange message
> --
>
> Key: FLINK-31246
> URL: https://issues.apache.org/jira/browse/FLINK-31246
> Project: Flink
>  Issue Type: Improvement
>  Components: Kubernetes Operator
>Reporter: Peter Vary
>Priority: Major
>
> Currently the Spec Change message contains the full PodTemplate twice.
> This makes the message very large while adding very little useful 
> information.
> We should abbreviate the message.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-31246) Remove PodTemplate description from the SpecChange message

2023-02-28 Thread Peter Vary (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-31246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17694443#comment-17694443
 ] 

Peter Vary commented on FLINK-31246:


Talked with [~gyfora] about this, and he is concerned that this would be a 
breaking change for some users and could cause issues for other users

> Remove PodTemplate description from the SpecChange message
> --
>
> Key: FLINK-31246
> URL: https://issues.apache.org/jira/browse/FLINK-31246
> Project: Flink
>  Issue Type: Improvement
>  Components: Kubernetes Operator
>Reporter: Peter Vary
>Priority: Major
>
> Currently the Spec Change message contains the full PodTemplate twice.
> This makes the message very large while adding very little useful 
> information.
> We should abbreviate the message.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-31246) Remove PodTemplate description from the SpecChange message

2023-02-27 Thread Peter Vary (Jira)
Peter Vary created FLINK-31246:
--

 Summary: Remove PodTemplate description from the SpecChange message
 Key: FLINK-31246
 URL: https://issues.apache.org/jira/browse/FLINK-31246
 Project: Flink
  Issue Type: Improvement
  Components: Kubernetes Operator
Reporter: Peter Vary


Currently the Spec Change message contains the full PodTemplate twice.
This makes the message very large while adding very little useful 
information.

We should abbreviate the message.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-30543) Adding more examples for setting up jobs via operator.

2023-01-03 Thread Peter Vary (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-30543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Vary updated FLINK-30543:
---
Component/s: Kubernetes Operator

> Adding more examples for setting up jobs via operator.
> --
>
> Key: FLINK-30543
> URL: https://issues.apache.org/jira/browse/FLINK-30543
> Project: Flink
>  Issue Type: Improvement
>  Components: Kubernetes Operator
>Reporter: Sriram Ganesh
>Priority: Minor
>
> Currently, we have only basic examples which show how to run a job via the 
> operator. Adding more examples covering all upgrade modes would be more 
> helpful.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-30367) Enrich the thread dump info with deeper stack

2022-12-12 Thread Peter Vary (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-30367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17646136#comment-17646136
 ] 

Peter Vary commented on FLINK-30367:


The depth of the stack trace can be set via the 
{{cluster.thread-dump.stacktrace-max-depth}} configuration option.

See: 
https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/#cluster-thread-dump-stacktrace-max-depth
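
For example, in flink-conf.yaml (the value below is just an illustration):

{code}
cluster.thread-dump.stacktrace-max-depth: 32
{code}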

> Enrich the thread dump info with deeper stack
> -
>
> Key: FLINK-30367
> URL: https://issues.apache.org/jira/browse/FLINK-30367
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Task, Runtime / Web Frontend
>Reporter: Yun Tang
>Assignee: Yu Chen
>Priority: Major
> Fix For: 1.17.0
>
>
> Currently, the thread dump info only has a very shallow stack depth, and 
> we cannot see the thread information in detail. It would be useful to enrich 
> the thread dump info.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-30315) Add more information about image pull failures to the operator log

2022-12-06 Thread Peter Vary (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-30315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Vary updated FLINK-30315:
---
Description: 
When there is an image pull error, this is what we see in the operator log:
{code:java}
org.apache.flink.kubernetes.operator.exception.DeploymentFailedException: 
Back-off pulling image "flink:1.14"
at 
org.apache.flink.kubernetes.operator.observer.deployment.AbstractFlinkDeploymentObserver.checkContainerBackoff(AbstractFlinkDeploymentObserver.java:194)
at 
org.apache.flink.kubernetes.operator.observer.deployment.AbstractFlinkDeploymentObserver.observeJmDeployment(AbstractFlinkDeploymentObserver.java:150)
at 
org.apache.flink.kubernetes.operator.observer.deployment.AbstractFlinkDeploymentObserver.observeInternal(AbstractFlinkDeploymentObserver.java:84)
at 
org.apache.flink.kubernetes.operator.observer.deployment.AbstractFlinkDeploymentObserver.observeInternal(AbstractFlinkDeploymentObserver.java:55)
at 
org.apache.flink.kubernetes.operator.observer.AbstractFlinkResourceObserver.observe(AbstractFlinkResourceObserver.java:56)
at 
org.apache.flink.kubernetes.operator.observer.AbstractFlinkResourceObserver.observe(AbstractFlinkResourceObserver.java:32)
at 
org.apache.flink.kubernetes.operator.controller.FlinkDeploymentController.reconcile(FlinkDeploymentController.java:113)
at 
org.apache.flink.kubernetes.operator.controller.FlinkDeploymentController.reconcile(FlinkDeploymentController.java:54)
at 
io.javaoperatorsdk.operator.processing.Controller$1.execute(Controller.java:136)
at 
io.javaoperatorsdk.operator.processing.Controller$1.execute(Controller.java:94)
at 
org.apache.flink.kubernetes.operator.metrics.OperatorJosdkMetrics.timeControllerExecution(OperatorJosdkMetrics.java:80)
at 
io.javaoperatorsdk.operator.processing.Controller.reconcile(Controller.java:93)
at 
io.javaoperatorsdk.operator.processing.event.ReconciliationDispatcher.reconcileExecution(ReconciliationDispatcher.java:130)
at 
io.javaoperatorsdk.operator.processing.event.ReconciliationDispatcher.handleReconcile(ReconciliationDispatcher.java:110)
at 
io.javaoperatorsdk.operator.processing.event.ReconciliationDispatcher.handleDispatch(ReconciliationDispatcher.java:81)
at 
io.javaoperatorsdk.operator.processing.event.ReconciliationDispatcher.handleExecution(ReconciliationDispatcher.java:54)
at 
io.javaoperatorsdk.operator.processing.event.EventProcessor$ReconcilerExecutor.run(EventProcessor.java:406)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown 
Source)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown 
Source)
at java.base/java.lang.Thread.run(Unknown Source) {code}
This is the information we have on kubernetes side:
{code:java}
Normal   Scheduled  2m19s               default-scheduler  Successfully assigned
default/quickstart-base-86787586cd-lb7j6 to minikube
Warning  Failed     20s                 kubelet            Failed to pull image 
"flink:1.14": rpc error: code = Unknown desc = context deadline exceeded
*Warning  Failed     20s                 kubelet            Error*: ErrImagePull
Normal   BackOff    19s                 kubelet            Back-off pulling 
image "flink:1.14"
*Warning  Failed     19s                 kubelet            Error*: 
ImagePullBackOff
Normal   Pulling    7s (x2 over 2m19s)  kubelet            Pulling image 
"flink:1.14"
{code}
It would be good to add the additional message (in this case {{{}Failed to pull 
image "flink:1.14": rpc error: code = Unknown desc = context deadline 
exceeded{}}}) to the message of the {{DeploymentFailedException}} for 
traceability.

  was:
When there is an image pull error, this is what we see in the operator log:
{code:java}
org.apache.flink.kubernetes.operator.exception.DeploymentFailedException: 
Back-off pulling image "flink:1.14"
at 
org.apache.flink.kubernetes.operator.observer.deployment.AbstractFlinkDeploymentObserver.checkContainerBackoff(AbstractFlinkDeploymentObserver.java:194)
at 
org.apache.flink.kubernetes.operator.observer.deployment.AbstractFlinkDeploymentObserver.observeJmDeployment(AbstractFlinkDeploymentObserver.java:150)
at 
org.apache.flink.kubernetes.operator.observer.deployment.AbstractFlinkDeploymentObserver.observeInternal(AbstractFlinkDeploymentObserver.java:84)
at 
org.apache.flink.kubernetes.operator.observer.deployment.AbstractFlinkDeploymentObserver.observeInternal(AbstractFlinkDeploymentObserver.java:55)
at 
org.apache.flink.kubernetes.operator.observer.AbstractFlinkResourceObserver.observe(AbstractFlinkResourceObserver.java:56)
at 
org.apache.flink.kubernetes.operator.observer.AbstractFlinkResourceObserver.observe(AbstractFlinkResourceObserver.java:32)
at 
org.apache.flink.kubernetes.o

[jira] [Commented] (FLINK-30315) Add more information about image pull failures to the operator log

2022-12-06 Thread Peter Vary (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-30315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17643909#comment-17643909
 ] 

Peter Vary commented on FLINK-30315:


The {{ContainerStateWaiting}} contains the message that we want.
The issue is that:
 - For {{ErrImagePull}} we have the correct message: {{Failed to pull image 
"flink:1.14": rpc error: code = Unknown desc = context deadline exceeded}}
 - For {{ImagePullBackOff}} we only have this message: {{Back-off pulling image 
"flink:1.14"}} which is not that useful

Based on this, I think we have the following options:
 # Throw {{DeploymentFailedException}} at {{ErrImagePull}} and provide the 
enhanced message. Cons: this throws an error on the first image pull failure, 
whereas previously we retried at least once (I am not sure this is that 
important, as we continue to monitor the state of the deployment and act on 
the state changes anyway)
 # Store the message in the state and provide it when the {{ImagePullBackOff}} 
occurs

I would like to hear your opinions about these options, and I am interested in 
any alternatives you have in mind.



Without any different opinions, I would go for option 1.
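
For illustration, a minimal sketch of option 1 (the fabric8 {{ContainerStateWaiting}} accessors are real; the {{DeploymentFailedException}} constructor shape is an assumption):
{code:java}
import io.fabric8.kubernetes.api.model.ContainerStateWaiting;

// Hedged sketch, not the actual operator code: throw immediately on
// ErrImagePull so the detailed kubelet message is not lost by the time
// the container reaches ImagePullBackOff.
void checkImagePullError(ContainerStateWaiting waiting) {
    if ("ErrImagePull".equals(waiting.getReason())) {
        // getMessage() carries the detail, e.g. 'Failed to pull image
        // "flink:1.14": rpc error: code = Unknown desc = context deadline exceeded'
        throw new DeploymentFailedException(
                waiting.getReason() + ": " + waiting.getMessage(),
                waiting.getReason()); // constructor shape assumed
    }
}
{code}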

> Add more information about image pull failures to the operator log
> --
>
> Key: FLINK-30315
> URL: https://issues.apache.org/jira/browse/FLINK-30315
> Project: Flink
>  Issue Type: Improvement
>  Components: Kubernetes Operator
>Reporter: Peter Vary
>Priority: Major
>
> When there is an image pull error, this is what we see in the operator log:
> {code:java}
> org.apache.flink.kubernetes.operator.exception.DeploymentFailedException: 
> Back-off pulling image "flink:1.14"
>   at 
> org.apache.flink.kubernetes.operator.observer.deployment.AbstractFlinkDeploymentObserver.checkContainerBackoff(AbstractFlinkDeploymentObserver.java:194)
>   at 
> org.apache.flink.kubernetes.operator.observer.deployment.AbstractFlinkDeploymentObserver.observeJmDeployment(AbstractFlinkDeploymentObserver.java:150)
>   at 
> org.apache.flink.kubernetes.operator.observer.deployment.AbstractFlinkDeploymentObserver.observeInternal(AbstractFlinkDeploymentObserver.java:84)
>   at 
> org.apache.flink.kubernetes.operator.observer.deployment.AbstractFlinkDeploymentObserver.observeInternal(AbstractFlinkDeploymentObserver.java:55)
>   at 
> org.apache.flink.kubernetes.operator.observer.AbstractFlinkResourceObserver.observe(AbstractFlinkResourceObserver.java:56)
>   at 
> org.apache.flink.kubernetes.operator.observer.AbstractFlinkResourceObserver.observe(AbstractFlinkResourceObserver.java:32)
>   at 
> org.apache.flink.kubernetes.operator.controller.FlinkDeploymentController.reconcile(FlinkDeploymentController.java:113)
>   at 
> org.apache.flink.kubernetes.operator.controller.FlinkDeploymentController.reconcile(FlinkDeploymentController.java:54)
>   at 
> io.javaoperatorsdk.operator.processing.Controller$1.execute(Controller.java:136)
>   at 
> io.javaoperatorsdk.operator.processing.Controller$1.execute(Controller.java:94)
>   at 
> org.apache.flink.kubernetes.operator.metrics.OperatorJosdkMetrics.timeControllerExecution(OperatorJosdkMetrics.java:80)
>   at 
> io.javaoperatorsdk.operator.processing.Controller.reconcile(Controller.java:93)
>   at 
> io.javaoperatorsdk.operator.processing.event.ReconciliationDispatcher.reconcileExecution(ReconciliationDispatcher.java:130)
>   at 
> io.javaoperatorsdk.operator.processing.event.ReconciliationDispatcher.handleReconcile(ReconciliationDispatcher.java:110)
>   at 
> io.javaoperatorsdk.operator.processing.event.ReconciliationDispatcher.handleDispatch(ReconciliationDispatcher.java:81)
>   at 
> io.javaoperatorsdk.operator.processing.event.ReconciliationDispatcher.handleExecution(ReconciliationDispatcher.java:54)
>   at 
> io.javaoperatorsdk.operator.processing.event.EventProcessor$ReconcilerExecutor.run(EventProcessor.java:406)
>   at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown 
> Source)
>   at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown 
> Source)
>   at java.base/java.lang.Thread.run(Unknown Source) {code}
> This is the information we have on the Kubernetes side:
> {code}
> Normal   Scheduled  2m19s               default-scheduler  Successfully 
> assigned
> default/quickstart-base-86787586cd-lb7j6 to minikube
> Warning  Failed     20s                 kubelet            Failed to pull 
> image "flink:1.14": rpc error: code = Unknown desc = context deadline exceeded
> *Warning  Failed     20s                 kubelet            Error*: 
> ErrImagePull
> Normal   BackOff    19s                 kubelet            Back-off pulling 
> image "flink:1.14"
> *Warning  Failed     19s                 kubelet   

[jira] [Created] (FLINK-30315) Add more information about image pull failures to the operator log

2022-12-06 Thread Peter Vary (Jira)
Peter Vary created FLINK-30315:
--

 Summary: Add more information about image pull failures to the 
operator log
 Key: FLINK-30315
 URL: https://issues.apache.org/jira/browse/FLINK-30315
 Project: Flink
  Issue Type: Improvement
  Components: Kubernetes Operator
Reporter: Peter Vary


When there is an image pull error, this is what we see in the operator log:
{code:java}
org.apache.flink.kubernetes.operator.exception.DeploymentFailedException: 
Back-off pulling image "flink:1.14"
at 
org.apache.flink.kubernetes.operator.observer.deployment.AbstractFlinkDeploymentObserver.checkContainerBackoff(AbstractFlinkDeploymentObserver.java:194)
at 
org.apache.flink.kubernetes.operator.observer.deployment.AbstractFlinkDeploymentObserver.observeJmDeployment(AbstractFlinkDeploymentObserver.java:150)
at 
org.apache.flink.kubernetes.operator.observer.deployment.AbstractFlinkDeploymentObserver.observeInternal(AbstractFlinkDeploymentObserver.java:84)
at 
org.apache.flink.kubernetes.operator.observer.deployment.AbstractFlinkDeploymentObserver.observeInternal(AbstractFlinkDeploymentObserver.java:55)
at 
org.apache.flink.kubernetes.operator.observer.AbstractFlinkResourceObserver.observe(AbstractFlinkResourceObserver.java:56)
at 
org.apache.flink.kubernetes.operator.observer.AbstractFlinkResourceObserver.observe(AbstractFlinkResourceObserver.java:32)
at 
org.apache.flink.kubernetes.operator.controller.FlinkDeploymentController.reconcile(FlinkDeploymentController.java:113)
at 
org.apache.flink.kubernetes.operator.controller.FlinkDeploymentController.reconcile(FlinkDeploymentController.java:54)
at 
io.javaoperatorsdk.operator.processing.Controller$1.execute(Controller.java:136)
at 
io.javaoperatorsdk.operator.processing.Controller$1.execute(Controller.java:94)
at 
org.apache.flink.kubernetes.operator.metrics.OperatorJosdkMetrics.timeControllerExecution(OperatorJosdkMetrics.java:80)
at 
io.javaoperatorsdk.operator.processing.Controller.reconcile(Controller.java:93)
at 
io.javaoperatorsdk.operator.processing.event.ReconciliationDispatcher.reconcileExecution(ReconciliationDispatcher.java:130)
at 
io.javaoperatorsdk.operator.processing.event.ReconciliationDispatcher.handleReconcile(ReconciliationDispatcher.java:110)
at 
io.javaoperatorsdk.operator.processing.event.ReconciliationDispatcher.handleDispatch(ReconciliationDispatcher.java:81)
at 
io.javaoperatorsdk.operator.processing.event.ReconciliationDispatcher.handleExecution(ReconciliationDispatcher.java:54)
at 
io.javaoperatorsdk.operator.processing.event.EventProcessor$ReconcilerExecutor.run(EventProcessor.java:406)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown 
Source)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown 
Source)
at java.base/java.lang.Thread.run(Unknown Source) {code}

This is the information we have on the Kubernetes side:
{code}
Normal   Scheduled  2m19s               default-scheduler  Successfully assigned
default/quickstart-base-86787586cd-lb7j6 to minikube
Warning  Failed     20s                 kubelet            Failed to pull image 
"flink:1.14": rpc error: code = Unknown desc = context deadline exceeded
*Warning  Failed     20s                 kubelet            Error*: ErrImagePull
Normal   BackOff    19s                 kubelet            Back-off pulling 
image "flink:1.14"
*Warning  Failed     19s                 kubelet            Error*: 
ImagePullBackOff
Normal   Pulling    7s (x2 over 2m19s)  kubelet            Pulling image 
"flink:1.14"
{code}

It would be good to add the additional message (in this case {{Failed to pull 
image "flink:1.14": rpc error: code = Unknown desc = context deadline 
> exceeded}}) to the message of the {{DeploymentFailedException}} for traceability.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-30311) CI error: Back-off pulling image "flink:1.14"

2022-12-06 Thread Peter Vary (Jira)
Peter Vary created FLINK-30311:
--

 Summary: CI error: Back-off pulling image "flink:1.14"
 Key: FLINK-30311
 URL: https://issues.apache.org/jira/browse/FLINK-30311
 Project: Flink
  Issue Type: Sub-task
Reporter: Peter Vary


CI failed with: {{Flink Deployment failed 2022-12-06T08:45:03.0244383Z 
org.apache.flink.kubernetes.operator.exception.DeploymentFailedException: 
Back-off pulling image "flink:1.14"}}

We should find the root cause of this issue and try to mitigate it.

[https://github.com/apache/flink-kubernetes-operator/actions/runs/3627824632/jobs/6118131271]

 
{code:java}
2022-12-06T08:45:03.0243558Z 2022-12-06 08:41:44,716 
o.a.f.k.o.c.FlinkDeploymentController 
[ERROR][default/flink-example-statemachine] Flink Deployment failed
2022-12-06T08:45:03.0244383Z 
org.apache.flink.kubernetes.operator.exception.DeploymentFailedException: 
Back-off pulling image "flink:1.14"
2022-12-06T08:45:03.0245385Zat 
org.apache.flink.kubernetes.operator.observer.deployment.AbstractFlinkDeploymentObserver.checkContainerBackoff(AbstractFlinkDeploymentObserver.java:194)
2022-12-06T08:45:03.0246604Zat 
org.apache.flink.kubernetes.operator.observer.deployment.AbstractFlinkDeploymentObserver.observeJmDeployment(AbstractFlinkDeploymentObserver.java:150)
2022-12-06T08:45:03.0247780Zat 
org.apache.flink.kubernetes.operator.observer.deployment.AbstractFlinkDeploymentObserver.observeInternal(AbstractFlinkDeploymentObserver.java:84)
2022-12-06T08:45:03.0248934Zat 
org.apache.flink.kubernetes.operator.observer.deployment.AbstractFlinkDeploymentObserver.observeInternal(AbstractFlinkDeploymentObserver.java:55)
2022-12-06T08:45:03.0249941Zat 
org.apache.flink.kubernetes.operator.observer.AbstractFlinkResourceObserver.observe(AbstractFlinkResourceObserver.java:56)
2022-12-06T08:45:03.0250844Zat 
org.apache.flink.kubernetes.operator.observer.AbstractFlinkResourceObserver.observe(AbstractFlinkResourceObserver.java:32)
2022-12-06T08:45:03.0252038Zat 
org.apache.flink.kubernetes.operator.controller.FlinkDeploymentController.reconcile(FlinkDeploymentController.java:113)
2022-12-06T08:45:03.0252936Zat 
org.apache.flink.kubernetes.operator.controller.FlinkDeploymentController.reconcile(FlinkDeploymentController.java:54)
2022-12-06T08:45:03.0253850Zat 
io.javaoperatorsdk.operator.processing.Controller$1.execute(Controller.java:136)
2022-12-06T08:45:03.0254412Zat 
io.javaoperatorsdk.operator.processing.Controller$1.execute(Controller.java:94)
2022-12-06T08:45:03.0255322Zat 
org.apache.flink.kubernetes.operator.metrics.OperatorJosdkMetrics.timeControllerExecution(OperatorJosdkMetrics.java:80)
2022-12-06T08:45:03.0256081Zat 
io.javaoperatorsdk.operator.processing.Controller.reconcile(Controller.java:93)
2022-12-06T08:45:03.0256872Zat 
io.javaoperatorsdk.operator.processing.event.ReconciliationDispatcher.reconcileExecution(ReconciliationDispatcher.java:130)
2022-12-06T08:45:03.0257804Zat 
io.javaoperatorsdk.operator.processing.event.ReconciliationDispatcher.handleReconcile(ReconciliationDispatcher.java:110)
2022-12-06T08:45:03.0258720Zat 
io.javaoperatorsdk.operator.processing.event.ReconciliationDispatcher.handleDispatch(ReconciliationDispatcher.java:81)
2022-12-06T08:45:03.0259635Zat 
io.javaoperatorsdk.operator.processing.event.ReconciliationDispatcher.handleExecution(ReconciliationDispatcher.java:54)
2022-12-06T08:45:03.0260448Zat 
io.javaoperatorsdk.operator.processing.event.EventProcessor$ReconcilerExecutor.run(EventProcessor.java:406)
2022-12-06T08:45:03.0261070Zat 
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
2022-12-06T08:45:03.0261595Zat 
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
2022-12-06T08:45:03.0262005Zat java.base/java.lang.Thread.run(Unknown 
Source) {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-30150) Evaluate operator error log whitelist entry: REST service in session cluster is bad now

2022-12-05 Thread Peter Vary (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-30150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17643305#comment-17643305
 ] 

Peter Vary commented on FLINK-30150:


This is the exception in the logs:
{code:java}
2022-12-05T11:40:59.2665289Z 2022-12-05 11:40:26,746 
o.a.f.k.o.o.d.SessionObserver  
[ERROR][default/session-cluster-1] REST service in session cluster is 
bad now
2022-12-05T11:40:59.2665851Z java.util.concurrent.TimeoutException
2022-12-05T11:40:59.2666258Zat 
java.base/java.util.concurrent.CompletableFuture.timedGet(Unknown Source)
2022-12-05T11:40:59.2666841Zat 
java.base/java.util.concurrent.CompletableFuture.get(Unknown Source)
2022-12-05T11:40:59.2667549Zat 
org.apache.flink.kubernetes.operator.service.AbstractFlinkService.listJobs(AbstractFlinkService.java:231)
2022-12-05T11:40:59.2668462Zat 
org.apache.flink.kubernetes.operator.observer.deployment.SessionObserver.observeFlinkCluster(SessionObserver.java:48)
2022-12-05T11:40:59.2669809Zat 
org.apache.flink.kubernetes.operator.observer.deployment.AbstractFlinkDeploymentObserver.observeInternal(AbstractFlinkDeploymentObserver.java:89)
2022-12-05T11:40:59.2671385Zat 
org.apache.flink.kubernetes.operator.observer.deployment.AbstractFlinkDeploymentObserver.observeInternal(AbstractFlinkDeploymentObserver.java:55)
2022-12-05T11:40:59.2672514Zat 
org.apache.flink.kubernetes.operator.observer.AbstractFlinkResourceObserver.observe(AbstractFlinkResourceObserver.java:56)
2022-12-05T11:40:59.2673507Zat 
org.apache.flink.kubernetes.operator.observer.AbstractFlinkResourceObserver.observe(AbstractFlinkResourceObserver.java:32)
2022-12-05T11:40:59.2674466Zat 
org.apache.flink.kubernetes.operator.controller.FlinkDeploymentController.reconcile(FlinkDeploymentController.java:113)
2022-12-05T11:40:59.2675692Zat 
org.apache.flink.kubernetes.operator.controller.FlinkDeploymentController.reconcile(FlinkDeploymentController.java:54)
2022-12-05T11:40:59.2676509Zat 
io.javaoperatorsdk.operator.processing.Controller$1.execute(Controller.java:136)
2022-12-05T11:40:59.2677043Zat 
io.javaoperatorsdk.operator.processing.Controller$1.execute(Controller.java:94)
2022-12-05T11:40:59.2677741Zat 
org.apache.flink.kubernetes.operator.metrics.OperatorJosdkMetrics.timeControllerExecution(OperatorJosdkMetrics.java:80)
2022-12-05T11:40:59.2678451Zat 
io.javaoperatorsdk.operator.processing.Controller.reconcile(Controller.java:93)
2022-12-05T11:40:59.2679180Zat 
io.javaoperatorsdk.operator.processing.event.ReconciliationDispatcher.reconcileExecution(ReconciliationDispatcher.java:130)
2022-12-05T11:40:59.2680055Zat 
io.javaoperatorsdk.operator.processing.event.ReconciliationDispatcher.handleReconcile(ReconciliationDispatcher.java:110)
2022-12-05T11:40:59.2681621Zat 
io.javaoperatorsdk.operator.processing.event.ReconciliationDispatcher.handleDispatch(ReconciliationDispatcher.java:81)
2022-12-05T11:40:59.2682478Zat 
io.javaoperatorsdk.operator.processing.event.ReconciliationDispatcher.handleExecution(ReconciliationDispatcher.java:54)
2022-12-05T11:40:59.2683241Zat 
io.javaoperatorsdk.operator.processing.event.EventProcessor$ReconcilerExecutor.run(EventProcessor.java:406)
2022-12-05T11:40:59.2683817Zat 
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
2022-12-05T11:40:59.2684294Zat 
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
2022-12-05T11:40:59.2684676Zat java.base/java.lang.Thread.run(Unknown 
Source) {code}
The log line shows 2022-12-05 11:40:26,746 as the timestamp.

This is happening when we manually kill the job to test the recovery:
{code:java}
2022-12-05T11:40:12.8330378Z Successfully verified that 
sessionjob/flink-example-statemachine.status.jobStatus.state is in RUNNING 
state.
2022-12-05T11:40:12.9711940Z Kill the session-cluster-1-7bc5b4d7cb-t5hgq
2022-12-05T11:40:13.3083721Z Waiting for log "Restoring job 
9b85cb750001 from Checkpoint"...
2022-12-05T11:40:35.8208688Z Log "Restoring job 
9b85cb750001 from Checkpoint" shows up. {code}
I would say that this is expected.
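
For context, this is roughly the call that times out (a hedged sketch built on the standard Flink {{ClusterClient}} API; the timeout value is illustrative):
{code:java}
import java.util.Collection;
import java.util.concurrent.TimeUnit;
import org.apache.flink.client.program.ClusterClient;
import org.apache.flink.runtime.client.JobStatusMessage;

// While the JobManager pod is down, the REST future cannot complete,
// so get(...) throws the TimeoutException seen in the log above.
Collection<JobStatusMessage> listJobs(ClusterClient<String> client) throws Exception {
    return client.listJobs().get(10, TimeUnit.SECONDS); // illustrative timeout
}
{code}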

> Evaluate operator error log whitelist entry: REST service in session cluster 
> is bad now
> ---
>
> Key: FLINK-30150
> URL: https://issues.apache.org/jira/browse/FLINK-30150
> Project: Flink
>  Issue Type: Sub-task
>  Components: Kubernetes Operator
>Reporter: Gabor Somogyi
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-30199) Add a script to run Kubernetes Operator e2e tests manually

2022-11-24 Thread Peter Vary (Jira)
Peter Vary created FLINK-30199:
--

 Summary: Add a script to run Kubernetes Operator e2e tests manually
 Key: FLINK-30199
 URL: https://issues.apache.org/jira/browse/FLINK-30199
 Project: Flink
  Issue Type: Improvement
  Components: Kubernetes Operator
Reporter: Peter Vary


Currently it is very hard to run the Kubernetes Operator e2e tests manually, 
especially on macOS. We should improve this to ease the development process.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-29629) FlameGraph is empty for Legacy Source Threads

2022-11-07 Thread Peter Vary (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-29629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17629824#comment-17629824
 ] 

Peter Vary commented on FLINK-29629:


Most probably I will not have time to work on this part of the code in the near 
future :(. I mostly filed the ticket to document the current situation, which 
is not ideal. I fear that by closing the ticket the three of us will be the only 
people who remember the issue for a while, but if this is how we handle these 
situations in Flink, feel free to go ahead and close the ticket (I am just 
learning the processes :)).
Thanks for all the help here, [~zhuzh]!

> FlameGraph is empty for Legacy Source Threads
> -
>
> Key: FLINK-29629
> URL: https://issues.apache.org/jira/browse/FLINK-29629
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Web Frontend
>Reporter: Peter Vary
>Priority: Major
>
> Thread dump gets the stack trace for the {{Custom Source}} thread, but this 
> thread is always in {{TIMED_WAITING}}:
> {code}
> "Source: Custom Source -> A random source (1/2)#0" ...
>java.lang.Thread.State: TIMED_WAITING (parking)
>   at jdk.internal.misc.Unsafe.park(java.base@11.0.16/Native Method)
>   - parking to wait for  <0xea775750> (a 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>   at java.util.concurrent.locks.LockSupport.parkNanos()
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await()
>   at 
> org.apache.flink.streaming.runtime.tasks.mailbox.TaskMailboxImpl.take()
>   at 
> org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.processMailsWhenDefaultActionUnavailable(MailboxProcessor.java:335)
>   at 
> org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.processMail(MailboxProcessor.java:324)
>   at 
> org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:201)
> [..]
> {code}
> The actual code is run in the {{Legacy Source Thread}}:
> {code}
> "Legacy Source Thread - Source: Custom Source -> A random source (1/2)#0" ...
>java.lang.Thread.State: RUNNABLE
> {code}
> This causes the WebUI FlameGraph to be empty of any useful data.
> This is an example code to reproduce:
> {code}
> DataStream<RowData> inputStream = env.addSource(new 
> RandomRecordSource(recordSize));
> inputStream = inputStream.map(new CounterMapper());
> FlinkSink.forRowData(inputStream).tableLoader(loader).append();
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-29754) HadoopConfigLoader should consider Hadoop configuration files

2022-10-25 Thread Peter Vary (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-29754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17623882#comment-17623882
 ] 

Peter Vary commented on FLINK-29754:


[This|https://github.com/apache/flink/blob/0e612856772d5f469c7d4a4fff90a58b6e0f5578/flink-filesystems/flink-hadoop-fs/src/main/java/org/apache/flink/runtime/util/HadoopUtils.java#L59]
 is the line which causes the issue.

When you try to instantiate {{HdfsConfiguration}}, you need {{hadoop-hdfs}} on 
the classpath.
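
To illustrate, a hedged sketch of the difference (the class names are from Hadoop itself; the failure mode is the point, not the exact code):
{code:java}
// HdfsConfiguration lives in the hadoop-hdfs artifact, so this line fails
// with NoClassDefFoundError when only hadoop-common is on the classpath:
org.apache.hadoop.conf.Configuration conf =
        new org.apache.hadoop.hdfs.HdfsConfiguration();

// A plain Configuration only needs hadoop-common:
org.apache.hadoop.conf.Configuration plain =
        new org.apache.hadoop.conf.Configuration();
{code}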

> HadoopConfigLoader should consider Hadoop configuration files
> -
>
> Key: FLINK-29754
> URL: https://issues.apache.org/jira/browse/FLINK-29754
> Project: Flink
>  Issue Type: Bug
>  Components: FileSystems
>Reporter: Peter Vary
>Priority: Major
>
> Currently 
> [HadoopConfigLoader|https://github.com/apache/flink/blob/master/flink-filesystems/flink-hadoop-fs/src/main/java/org/apache/flink/runtime/util/HadoopConfigLoader.java]
>  considers Hadoop configurations on the classpath, but does not consider 
> Hadoop configuration files which are set in another way.
> So if the Hadoop configuration is set through the {{HADOOP_CONF_DIR}} 
> environment variable, then the configuration loaded by the HadoopConfigLoader 
> will not contain the values set there.
> This can cause unexpected behaviour when setting checkpoint / savepoint dirs 
> on S3, and the specific S3 configurations are set in the Hadoop configuration 
> files



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-29754) HadoopConfigLoader should consider Hadoop configuration files

2022-10-25 Thread Peter Vary (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-29754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17623704#comment-17623704
 ] 

Peter Vary commented on FLINK-29754:


I create a configuration file like this:
{code:java}
<configuration>
  <property>
    <name>hadoop.fs.s3a.buffer.dir</name>
    <value>/flink-data</value>
  </property>
  <property>
    <name>fs.s3a.bucket.probe</name>
    <value>0</value>
  </property>
[..]
{code}

The configuration values set in this file are available and used when accessing 
S3 when the file is on the classpath (packaged in the jar). OTOH, if I create a 
HADOOP_CONF_DIR and put the config files there, then the configuration values 
are not available and not used when accessing S3.
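
A quick way to verify which source a value was loaded from (a hedged snippet using the standard Hadoop {{Configuration}} API; the key comes from the file above):
{code:java}
import java.util.Arrays;
import org.apache.hadoop.conf.Configuration;

Configuration conf = new Configuration();
// Prints the effective value and the resources it came from;
// getPropertySources(...) returns null when the key was never loaded.
System.out.println(conf.get("fs.s3a.bucket.probe"));
System.out.println(Arrays.toString(conf.getPropertySources("fs.s3a.bucket.probe")));
{code}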

> HadoopConfigLoader should consider Hadoop configuration files
> -
>
> Key: FLINK-29754
> URL: https://issues.apache.org/jira/browse/FLINK-29754
> Project: Flink
>  Issue Type: Bug
>  Components: FileSystems
>Reporter: Peter Vary
>Priority: Major
>
> Currently 
> [HadoopConfigLoader|https://github.com/apache/flink/blob/master/flink-filesystems/flink-hadoop-fs/src/main/java/org/apache/flink/runtime/util/HadoopConfigLoader.java]
>  considers Hadoop configurations on the classpath, but does not consider 
> Hadoop configuration files which are set in another way.
> So if the Hadoop configuration is set through the {{HADOOP_CONF_DIR}} 
> environment variable, then the configuration loaded by the HadoopConfigLoader 
> will not contain the values set there.
> This can cause unexpected behaviour when setting checkpoint / savepoint dirs 
> on S3, and the specific S3 configurations are set in the Hadoop configuration 
> files



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-29754) HadoopConfigLoader should consider Hadoop configuration files

2022-10-25 Thread Peter Vary (Jira)
Peter Vary created FLINK-29754:
--

 Summary: HadoopConfigLoader should consider Hadoop configuration 
files
 Key: FLINK-29754
 URL: https://issues.apache.org/jira/browse/FLINK-29754
 Project: Flink
  Issue Type: Bug
  Components: FileSystems
Reporter: Peter Vary


Currently 
[HadoopConfigLoader|https://github.com/apache/flink/blob/master/flink-filesystems/flink-hadoop-fs/src/main/java/org/apache/flink/runtime/util/HadoopConfigLoader.java]
 considers Hadoop configurations on the classpath, but does not consider Hadoop 
configuration files which are set in another way.

So if the Hadoop configuration is set through the {{HADOOP_CONF_DIR}} 
environment variable, then the configuration loaded by the HadoopConfigLoader 
will not contain the values set there.

This can cause unexpected behaviour when the checkpoint / savepoint dirs are set 
on S3 and the relevant S3 configurations are set in the Hadoop configuration files.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-29629) FlameGraph is empty for Legacy Source Threads

2022-10-25 Thread Peter Vary (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-29629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17623647#comment-17623647
 ] 

Peter Vary commented on FLINK-29629:


Thanks [~zhuzh]!

Thanks for the info. I checked the linked document and agree with you and 
[~chesnay] that we should not sink more resources into the Legacy Sources than 
needed. Also +1 on adding an option to include the stack traces of the extra 
threads in the operator FlameGraph. In some cases they are not that important, 
as they are not on the critical path, but they do consume resources and may 
become a bottleneck, so it would be good to have an option to display them.
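
As a rough illustration of such an option (a sketch only; the thread-name filter is an assumption based on the naming seen in the thread dumps):
{code:java}
import java.util.Map;

// Sample all threads of the task, not only the mailbox thread, so that
// Legacy Source Threads also contribute to the FlameGraph aggregation.
Map<Thread, StackTraceElement[]> traces = Thread.getAllStackTraces();
traces.forEach((thread, stack) -> {
    if (thread.getName().startsWith("Legacy Source Thread")) {
        // feed 'stack' into the per-operator FlameGraph samples here
    }
});
{code}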

> FlameGraph is empty for Legacy Source Threads
> -
>
> Key: FLINK-29629
> URL: https://issues.apache.org/jira/browse/FLINK-29629
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Web Frontend
>Reporter: Peter Vary
>Priority: Major
>
> Thread dump gets the stack trace for the {{Custom Source}} thread, but this 
> thread is always in {{TIMED_WAITING}}:
> {code}
> "Source: Custom Source -> A random source (1/2)#0" ...
>java.lang.Thread.State: TIMED_WAITING (parking)
>   at jdk.internal.misc.Unsafe.park(java.base@11.0.16/Native Method)
>   - parking to wait for  <0xea775750> (a 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>   at java.util.concurrent.locks.LockSupport.parkNanos()
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await()
>   at 
> org.apache.flink.streaming.runtime.tasks.mailbox.TaskMailboxImpl.take()
>   at 
> org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.processMailsWhenDefaultActionUnavailable(MailboxProcessor.java:335)
>   at 
> org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.processMail(MailboxProcessor.java:324)
>   at 
> org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:201)
> [..]
> {code}
> The actual code is run in the {{Legacy Source Thread}}:
> {code}
> "Legacy Source Thread - Source: Custom Source -> A random source (1/2)#0" ...
>java.lang.Thread.State: RUNNABLE
> {code}
> This causes the WebUI FlameGraph to be empty of any useful data.
> This is an example code to reproduce:
> {code}
> DataStream<RowData> inputStream = env.addSource(new 
> RandomRecordSource(recordSize));
> inputStream = inputStream.map(new CounterMapper());
> FlinkSink.forRowData(inputStream).tableLoader(loader).append();
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-29713) Kubernetes operator should restart failed jobs

2022-10-21 Thread Peter Vary (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-29713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17622561#comment-17622561
 ] 

Peter Vary commented on FLINK-29713:


The main goal is to make it possible to add a new sink when a new type of data 
arrives.

Imagine a job where the incoming data defines the sinks, such as Kafka topics or 
database tables. We start with a set of known data types. The main task creates 
the appropriate topics and sinks based on the currently known data types. Later 
a record arrives with a new type which needs a new sink, so the job needs to be 
reconfigured and a new sink has to be added.

This way the Flink job can dynamically adjust itself to handle the incoming 
data.
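
A hedged sketch of the pattern this would enable ({{discoverDataTypes}} and {{addSinkFor}} are hypothetical helpers, not an existing API):
{code:java}
import java.util.List;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

// Hypothetical application: rebuild the topology from the currently known
// data types on every (re)start. When a record with an unknown type arrives,
// the job fails, the operator restarts it, and the new sink is wired in here.
public class DynamicSinkJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        for (String type : discoverDataTypes()) {
            addSinkFor(env, type);
        }
        env.execute("dynamic-sinks");
    }

    // Hypothetical: read the known types from an external catalog or state.
    static List<String> discoverDataTypes() { return List.of("orders", "users"); }

    // Hypothetical: create the source -> sink pipeline for one data type.
    static void addSinkFor(StreamExecutionEnvironment env, String type) {
        env.fromElements(type).print(); // placeholder wiring
    }
}
{code}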

> Kubernetes operator should restart failed jobs
> --
>
> Key: FLINK-29713
> URL: https://issues.apache.org/jira/browse/FLINK-29713
> Project: Flink
>  Issue Type: Improvement
>  Components: Kubernetes Operator
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
> Fix For: kubernetes-operator-1.3.0
>
>
> It would be good to have the possibility to restart the Flink Application if 
> it goes to {{FAILED}} state.
> This could be used to restart and reconfigure the job dynamically in the 
> application {{main}} method if the current application cannot handle the 
> incoming data.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-29713) Kubernetes operator should restart failed jobs

2022-10-21 Thread Peter Vary (Jira)
Peter Vary created FLINK-29713:
--

 Summary: Kubernetes operator should restart failed jobs
 Key: FLINK-29713
 URL: https://issues.apache.org/jira/browse/FLINK-29713
 Project: Flink
  Issue Type: Improvement
  Components: Kubernetes Operator
Reporter: Peter Vary


It would be good to have the possibility to restart the Flink Application if it 
goes to {{FAILED}} state.
This could be used to restart and reconfigure the job dynamically in the 
application {{main}} method if the current application cannot handle the 
incoming data.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-29629) FlameGraph is empty for Legacy Source Threads

2022-10-17 Thread Peter Vary (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-29629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17619053#comment-17619053
 ] 

Peter Vary commented on FLINK-29629:


[~chesnay]: Does that mean that some of the code below uses a deprecated API 
for creating the {{Source}}?
{code}
DataStream<RowData> inputStream = env.addSource(new RandomRecordSource(recordSize));
inputStream = inputStream.map(new CounterMapper());
FlinkSink.forRowData(inputStream).tableLoader(loader).append();
{code}
Or is it just that there is an ongoing effort to replace the implementation, 
which currently uses the {{Legacy}} Source, and once that is finished it will 
use the new Source?
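
For comparison, a sketch of the non-deprecated (FLIP-27) entry point using a built-in source ({{RandomRecordSource}} has no FLIP-27 counterpart here, so {{NumberSequenceSource}} stands in):
{code:java}
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.connector.source.lib.NumberSequenceSource;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
// env.fromSource(...) is the FLIP-27 replacement for env.addSource(...)
DataStream<Long> numbers = env.fromSource(
        new NumberSequenceSource(0, 1000),
        WatermarkStrategy.noWatermarks(),
        "numbers");
{code}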
Thanks,
Peter

> FlameGraph is empty for Legacy Source Threads
> -
>
> Key: FLINK-29629
> URL: https://issues.apache.org/jira/browse/FLINK-29629
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Web Frontend
>Reporter: Peter Vary
>Priority: Major
>
> Thread dump gets the stack trace for the {{Custom Source}} thread, but this 
> thread is always in {{TIMED_WAITING}}:
> {code}
> "Source: Custom Source -> A random source (1/2)#0" ...
>java.lang.Thread.State: TIMED_WAITING (parking)
>   at jdk.internal.misc.Unsafe.park(java.base@11.0.16/Native Method)
>   - parking to wait for  <0xea775750> (a 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>   at java.util.concurrent.locks.LockSupport.parkNanos()
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await()
>   at 
> org.apache.flink.streaming.runtime.tasks.mailbox.TaskMailboxImpl.take()
>   at 
> org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.processMailsWhenDefaultActionUnavailable(MailboxProcessor.java:335)
>   at 
> org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.processMail(MailboxProcessor.java:324)
>   at 
> org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:201)
> [..]
> {code}
> The actual code is run in the {{Legacy Source Thread}}:
> {code}
> "Legacy Source Thread - Source: Custom Source -> A random source (1/2)#0" ...
>java.lang.Thread.State: RUNNABLE
> {code}
> This causes the WebUI FlameGraph to be empty of any useful data.
> This is an example code to reproduce:
> {code}
> DataStream<RowData> inputStream = env.addSource(new 
> RandomRecordSource(recordSize));
> inputStream = inputStream.map(new CounterMapper());
> FlinkSink.forRowData(inputStream).tableLoader(loader).append();
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-29629) FlameGraph is empty for Legacy Source Threads

2022-10-13 Thread Peter Vary (Jira)
Peter Vary created FLINK-29629:
--

 Summary: FlameGraph is empty for Legacy Source Threads
 Key: FLINK-29629
 URL: https://issues.apache.org/jira/browse/FLINK-29629
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Web Frontend
Reporter: Peter Vary


Thread dump gets the stack trace for the {{Custom Source}} thread, but this 
thread is always in {{TIMED_WAITING}}:
{code}
"Source: Custom Source -> A random source (1/2)#0" ...
   java.lang.Thread.State: TIMED_WAITING (parking)
at jdk.internal.misc.Unsafe.park(java.base@11.0.16/Native Method)
- parking to wait for  <0xea775750> (a 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.parkNanos()
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await()
at 
org.apache.flink.streaming.runtime.tasks.mailbox.TaskMailboxImpl.take()
at 
org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.processMailsWhenDefaultActionUnavailable(MailboxProcessor.java:335)
at 
org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.processMail(MailboxProcessor.java:324)
at 
org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:201)
[..]
{code}

The actual code is run in the {{Legacy Source Thread}}:
{code}
"Legacy Source Thread - Source: Custom Source -> A random source (1/2)#0" ...
   java.lang.Thread.State: RUNNABLE
{code}

This causes the WebUI FlameGraph to be empty of any useful data.

This is an example code to reproduce:
{code}
DataStream<RowData> inputStream = env.addSource(new 
RandomRecordSource(recordSize));
inputStream = inputStream.map(new CounterMapper());
FlinkSink.forRowData(inputStream).tableLoader(loader).append();
{code}





--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-29123) Dynamic parameters are not pushed to working with kubernetes

2022-08-26 Thread Peter Vary (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-29123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Vary updated FLINK-29123:
---
Summary: Dynamic parameters are not pushed to working with kubernetes  (was: 
Dynamic parameters are not pushed to working with kubertenes)

> Dynamic parameters are not pushed to working with kubernetes
> ---
>
> Key: FLINK-29123
> URL: https://issues.apache.org/jira/browse/FLINK-29123
> Project: Flink
>  Issue Type: Bug
>  Components: Deployment / Kubernetes
>Affects Versions: 1.15.2
>Reporter: Peter Vary
>Priority: Major
>
> It is not possible to push dynamic parameters for the kubernetes deployments



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-29123) Dynamic parameters are not pushed to working with kubertenes

2022-08-26 Thread Peter Vary (Jira)
Peter Vary created FLINK-29123:
--

 Summary: Dynamic parameters are not pushed to working with 
kubertenes
 Key: FLINK-29123
 URL: https://issues.apache.org/jira/browse/FLINK-29123
 Project: Flink
  Issue Type: Bug
  Components: Deployment / Kubernetes
Affects Versions: 1.15.2
Reporter: Peter Vary


It is not possible to push dynamic parameters for the kubernetes deployments



--
This message was sent by Atlassian Jira
(v8.20.10#820010)