[jira] [Updated] (FLINK-26958) Add Github Actions build pipeline for flink-connector-elasticsearch

2022-08-12 Thread Chesnay Schepler (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-26958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chesnay Schepler updated FLINK-26958:
-
Issue Type: Technical Debt  (was: Improvement)

> Add Github Actions build pipeline for flink-connector-elasticsearch
> ---
>
> Key: FLINK-26958
> URL: https://issues.apache.org/jira/browse/FLINK-26958
> Project: Flink
>  Issue Type: Technical Debt
>  Components: Build System, Connectors / Common, Connectors / 
> ElasticSearch
>Reporter: Martijn Visser
>Assignee: Martijn Visser
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.16.0
>
>
> With connectors being moved to their own repositories, we need a pipeline 
> that can run the necessary compile and test steps. 
> With Elasticsearch in the process of being moved out (see FLINK-26884), we 
> should add this pipeline to make sure it is executed on pushes and pull 
> requests. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-25806) Remove legacy high availability services

2022-08-12 Thread Chesnay Schepler (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-25806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chesnay Schepler updated FLINK-25806:
-
Issue Type: Technical Debt  (was: Improvement)

> Remove legacy high availability services
> 
>
> Key: FLINK-25806
> URL: https://issues.apache.org/jira/browse/FLINK-25806
> Project: Flink
>  Issue Type: Technical Debt
>  Components: Runtime / Coordination
>Affects Versions: 1.16.0
>Reporter: Till Rohrmann
>Assignee: Chesnay Schepler
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 1.16.0
>
>
> After FLINK-24038, we should consider removing the legacy high availability 
> services {{ZooKeeperHaServices}} and {{KubernetesHaServices}} since they are 
> now subsumed by the multiple component leader election service that only uses 
> a single leader election per component.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-28722) Hybrid Source should use .equals() for Integer comparison

2022-08-12 Thread Chesnay Schepler (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-28722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chesnay Schepler updated FLINK-28722:
-
Issue Type: Bug  (was: Improvement)

> Hybrid Source should use .equals() for Integer comparison
> -
>
> Key: FLINK-28722
> URL: https://issues.apache.org/jira/browse/FLINK-28722
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Common
>Affects Versions: 1.15.1
>Reporter: Mason Chen
>Priority: Major
> Fix For: 1.16.0, 1.15.2
>
>
> HybridSource should use .equals() for Integer comparison when filtering out 
> the underlying sources. Reference comparison with == only holds for boxed 
> Integers in the JVM's cache range (-128 to 127), so the HybridSource stops 
> working when it hits the 128th source (any source index past 127 fails the 
> comparison).
> https://github.com/apache/flink/blob/release-1.14.3-rc1/flink-connectors/flink-connector-base/src/main/java/org/apache/flink/connector/base/source/hybrid/HybridSourceSplitEnumerator.java#L358
>  
> A user reported this issue here: 
> https://lists.apache.org/thread/7h2rblsdt7rjf85q9mhfht77bghtbswh
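For illustration, a minimal, self-contained Java sketch of the underlying JVM behaviour (plain JDK semantics, not Flink code): boxed {{Integer}} values are cached only for -128 to 127, so reference comparison with {{==}} silently breaks at 128.

{code:java}
public class IntegerCacheDemo {
    public static void main(String[] args) {
        Integer a = 127, b = 127;
        System.out.println(a == b);      // true: both refer to the cached instance
        Integer c = 128, d = 128;
        System.out.println(c == d);      // false: two distinct boxed objects
        System.out.println(c.equals(d)); // true: value comparison is always safe
    }
}
{code}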



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-19358) when submit job on application mode with HA,the jobid will be 0000000000

2022-08-12 Thread Chesnay Schepler (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-19358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chesnay Schepler updated FLINK-19358:
-
Issue Type: Improvement  (was: Bug)

> when submit job on application mode with HA,the jobid will be 0000000000
> 
>
> Key: FLINK-19358
> URL: https://issues.apache.org/jira/browse/FLINK-19358
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Coordination
>Affects Versions: 1.11.0
>Reporter: Jun Zhang
>Assignee: Yangze Guo
>Priority: Minor
>  Labels: auto-deprioritized-major, pull-request-available, 
> usability
> Fix For: 1.16.0
>
>
> When submitting a Flink job in application mode with HA, the Flink job ID 
> will always be 0000000000. When I have many jobs, they all have the same 
> job ID, which leads to checkpoint errors.
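For context, application mode with HA pins the job ID to a constant built from two zero longs, which renders as a run of zeros. A minimal sketch of why every such deployment collides on the same ID, assuming only the public {{JobID}} API:

{code:java}
import org.apache.flink.api.common.JobID;

public class ZeroJobIdSketch {
    public static void main(String[] args) {
        // The constant used for HA application-mode deployments is
        // equivalent to new JobID(0, 0); printed, it is all zeros.
        System.out.println(new JobID(0L, 0L));
        // Two clusters deployed this way therefore share the same job ID,
        // e.g. in shared checkpoint or HA storage paths.
    }
}
{code}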



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-19358) HA should not always use jobid 0000000000

2022-08-12 Thread Chesnay Schepler (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-19358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chesnay Schepler updated FLINK-19358:
-
Summary: HA should not always use jobid 0000000000  (was: when submit job 
on application mode with HA,the jobid will be 0000000000)

> HA should not always use jobid 0000000000
> -
>
> Key: FLINK-19358
> URL: https://issues.apache.org/jira/browse/FLINK-19358
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Coordination
>Affects Versions: 1.11.0
>Reporter: Jun Zhang
>Assignee: Yangze Guo
>Priority: Minor
>  Labels: auto-deprioritized-major, pull-request-available, 
> usability
> Fix For: 1.16.0
>
>
> When submitting a Flink job in application mode with HA, the Flink job ID 
> will always be 0000000000. When I have many jobs, they all have the same 
> job ID, which leads to checkpoint errors.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-14998) Remove FileUtils#deletePathIfEmpty

2022-08-12 Thread Chesnay Schepler (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-14998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chesnay Schepler updated FLINK-14998:
-
Issue Type: Technical Debt  (was: Bug)

> Remove FileUtils#deletePathIfEmpty
> --
>
> Key: FLINK-14998
> URL: https://issues.apache.org/jira/browse/FLINK-14998
> Project: Flink
>  Issue Type: Technical Debt
>  Components: FileSystems
>Reporter: Yun Tang
>Assignee: jay li
>Priority: Minor
>  Labels: auto-deprioritized-major, pull-request-available, starter
> Fix For: 1.16.0
>
>
> With the lesson learned from FLINK-7266 and the refactoring in FLINK-8540, 
> the method {{FileUtils#deletePathIfEmpty}} is no longer used anywhere in 
> Flink production code. From my point of view, it is not wise to keep a 
> method with a known high-risk defect in the official codebase, so I suggest 
> removing it.
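The high-risk defect referenced above is a check-then-act race: between listing a directory and deleting it, a concurrent writer can create a file. A hypothetical {{java.nio}} sketch of the pattern (Flink's real implementation works against its own {{FileSystem}} abstraction, where the listing alone can be very expensive, e.g. on S3):

{code:java}
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

public final class DeleteIfEmptySketch {
    static boolean deletePathIfEmpty(Path dir) throws IOException {
        // Step 1: check emptiness via a directory listing.
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
            if (entries.iterator().hasNext()) {
                return false; // not empty, nothing to do
            }
        }
        // Race window: a concurrent writer may create a file here. With
        // java.nio this surfaces as a DirectoryNotEmptyException; on other
        // file systems the outcome depends on their delete semantics.
        Files.delete(dir);
        return true;
    }

    private DeleteIfEmptySketch() {}
}
{code}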



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-26553) Enable scalafmt for scala codebase

2022-08-12 Thread Chesnay Schepler (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-26553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chesnay Schepler updated FLINK-26553:
-
Issue Type: Technical Debt  (was: Bug)

> Enable scalafmt for scala codebase
> --
>
> Key: FLINK-26553
> URL: https://issues.apache.org/jira/browse/FLINK-26553
> Project: Flink
>  Issue Type: Technical Debt
>  Components: Build System
>Reporter: Francesco Guardiani
>Assignee: Francesco Guardiani
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.16.0
>
>
> As discussed in 
> https://lists.apache.org/thread/97398pc9cb8y922xlb6mzlsbjtjf5jnv, we should 
> enable scalafmt in our codebase



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-26710) TestLoggerResource hides log lines

2022-08-12 Thread Chesnay Schepler (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-26710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chesnay Schepler updated FLINK-26710:
-
Issue Type: Technical Debt  (was: Bug)

> TestLoggerResource hides log lines
> --
>
> Key: FLINK-26710
> URL: https://issues.apache.org/jira/browse/FLINK-26710
> Project: Flink
>  Issue Type: Technical Debt
>  Components: Test Infrastructure
>Affects Versions: 1.15.0, 1.16.0
>Reporter: Niklas Semmler
>Assignee: Niklas Semmler
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.16.0
>
>
> {{org.apache.flink.testutils.logging.TestLoggerResource}} makes log lines 
> accessible to tests. It extends {{org.junit.rules.ExternalResource}} and thus 
> can be used as a rule. An example of its use can be found 
> [here|https://github.com/apache/flink/blob/c6997c97c575d334679915c328792b8a3067cfb5/flink-runtime/src/test/java/org/apache/flink/runtime/jobmanager/JobManagerProcessUtilsTest.java#L145].
> Unfortunately, the current implementation consumes *all* log lines of a 
> {{Logger}} in such a way that they are not forwarded to the general log 
> output. As such, someone debugging a test will not have access to log lines 
> and this can complicate debugging. The implementation needs to be changed to 
> non-exclusively consume log lines.
> In a first attempt, we enabled the {{additivity}} of the {{Logger}} created 
> by {{TestLoggerResource}}. This works only for the case where both the 
> {{Logger}} created by {{TestLoggerResource}} and the parent {{Logger}} use 
> the same log level. When they use different log levels, the 
> {{TestLoggerResource}} {{Logger}} overwrites the log level.
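For reference, a sketch of the rule-based usage, assuming the {{(Class, Level)}} constructor and {{getMessages()}} accessor used in the linked test; {{MyComponent}} and its method are hypothetical stand-ins for the class under test:

{code:java}
import org.apache.flink.testutils.logging.TestLoggerResource;
import org.apache.logging.log4j.Level;
import org.junit.Rule;
import org.junit.Test;

import static org.assertj.core.api.Assertions.assertThat;

public class MyComponentTest {
    @Rule
    public final TestLoggerResource logs =
            new TestLoggerResource(MyComponent.class, Level.WARN); // hypothetical class under test

    @Test
    public void warningIsLogged() {
        new MyComponent().doSomethingRisky(); // hypothetical call that logs a warning
        // The captured lines are visible to the assertion below -- but, per
        // this ticket, they are currently swallowed from the general log
        // output, which is what makes debugging harder.
        assertThat(logs.getMessages()).anyMatch(line -> line.contains("risky"));
    }
}
{code}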



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-27163) Fix typo issue in Flink Metrics documentation

2022-08-12 Thread Chesnay Schepler (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-27163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chesnay Schepler updated FLINK-27163:
-
Issue Type: Technical Debt  (was: Bug)

> Fix typo issue in Flink Metrics documentation
> -
>
> Key: FLINK-27163
> URL: https://issues.apache.org/jira/browse/FLINK-27163
> Project: Flink
>  Issue Type: Technical Debt
>  Components: Documentation
>Affects Versions: 1.15.0, 1.16.0
>Reporter: hao wang
>Assignee: hao wang
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.16.0
>
> Attachments: 20220411145958283.png
>
>
> The Cluster module in the metrics documentation lists five items, but only 
> four are specified.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-27485) Documentation build pipeline is broken

2022-08-12 Thread Chesnay Schepler (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-27485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chesnay Schepler updated FLINK-27485:
-
Issue Type: Technical Debt  (was: Bug)

> Documentation build pipeline is broken 
> ---
>
> Key: FLINK-27485
> URL: https://issues.apache.org/jira/browse/FLINK-27485
> Project: Flink
>  Issue Type: Technical Debt
>  Components: Build System, Documentation
>Reporter: Martijn Visser
>Assignee: Martijn Visser
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.16.0
>
>
> The current documentation build pipeline is broken due to two failures:
> - It uses the git command {{git branch --show-current}}, which isn't supported 
> by the Git version installed on the Docker image. We can switch to {{git 
> rev-parse --abbrev-ref HEAD}} as an alternative.
> - The manual Hugo download and installation is outdated and doesn't add Hugo 
> to the PATH.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-27751) Dependency resolution from repository.jboss.org fails on CI

2022-08-12 Thread Chesnay Schepler (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-27751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chesnay Schepler updated FLINK-27751:
-
Issue Type: Technical Debt  (was: Improvement)

> Dependency resolution from repository.jboss.org fails on CI
> ---
>
> Key: FLINK-27751
> URL: https://issues.apache.org/jira/browse/FLINK-27751
> Project: Flink
>  Issue Type: Technical Debt
>  Components: Build System / Azure Pipelines
>Affects Versions: 1.15.0, 1.16.0
>Reporter: Huang Xingbo
>Assignee: Chesnay Schepler
>Priority: Blocker
>  Labels: pull-request-available, test-stability
> Fix For: 1.14.5, 1.15.1, 1.16.0
>
>
> {code:java}
> 2022-05-24T03:50:20.5443243Z 03:50:20,543 ERROR 
> org.apache.flink.tools.ci.suffixcheck.ScalaSuffixChecker [] - Violations 
> found:
> 2022-05-24T03:50:20.5444210Z  Scala-free module 'flink-hadoop-compatibility' 
> is referenced with scala suffix in 
> 'flink-formats/flink-sequence-file/pom.xml'.
> 2022-05-24T03:50:20.5445185Z  Scala-free module 'flink-hadoop-compatibility' 
> is referenced with scala suffix in 
> 'flink-connectors/flink-hadoop-compatibility/pom.xml'.
> 2022-05-24T03:50:20.5446207Z  Scala-free module 'flink-hadoop-compatibility' 
> is referenced with scala suffix in 
> 'flink-connectors/flink-connector-hbase-1.4/pom.xml'.
> 2022-05-24T03:50:20.5447186Z  Scala-free module 'flink-hadoop-compatibility' 
> is referenced with scala suffix in 
> 'flink-connectors/flink-connector-hbase-2.2/pom.xml'.
> 2022-05-24T03:50:20.5448135Z  Scala-free module 'flink-hadoop-compatibility' 
> is referenced with scala suffix in 'flink-connectors/flink-hcatalog/pom.xml'.
> 2022-05-24T03:50:20.5449237Z  Scala-free module 'flink-hadoop-compatibility' 
> is referenced with scala suffix in 
> 'flink-connectors/flink-connector-hive/pom.xml'.
> 2022-05-24T03:50:20.5450180Z  Scala-free module 'flink-hadoop-compatibility' 
> is referenced with scala suffix in 'flink-table/flink-sql-client/pom.xml'.
> 2022-05-24T03:50:20.5451049Z  Scala-free module 'flink-hadoop-compatibility' 
> is referenced with scala suffix in 'flink-tests/pom.xml'.
> 2022-05-24T03:50:20.5452020Z  Scala-free module 'flink-hcatalog' is 
> referenced with scala suffix in 'flink-connectors/flink-hcatalog/pom.xml'.
> 2022-05-24T03:50:20.5453369Z  Scala-free module 
> 'flink-sql-connector-hive-2.3.9' is referenced with scala suffix in 
> 'flink-connectors/flink-sql-connector-hive-2.3.9/pom.xml'.
> 2022-05-24T03:50:20.5454388Z  Scala-free module 
> 'flink-sql-connector-hive-3.1.2' is referenced with scala suffix in 
> 'flink-connectors/flink-sql-connector-hive-3.1.2/pom.xml'.
> 2022-05-24T03:50:20.6860887Z 
> ==
> 2022-05-24T03:50:20.6861601Z Suffix Check failed. See previous output for 
> details.
> 2022-05-24T03:50:20.6862335Z 
> ==
> 2022-05-24T03:50:20.6942100Z ##[error]Bash exited with code '1'.
> {code}
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=35994&view=logs&j=52b61abe-a3cc-5bde-cc35-1bbe89bb7df5&t=54421a62-0c80-5aad-3319-094ff69180bb



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-27751) Dependency resolution from repository.jboss.org fails on CI

2022-08-12 Thread Chesnay Schepler (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-27751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chesnay Schepler updated FLINK-27751:
-
Issue Type: Improvement  (was: Bug)

> Dependency resolution from repository.jboss.org fails on CI
> ---
>
> Key: FLINK-27751
> URL: https://issues.apache.org/jira/browse/FLINK-27751
> Project: Flink
>  Issue Type: Improvement
>  Components: Build System / Azure Pipelines
>Affects Versions: 1.15.0, 1.16.0
>Reporter: Huang Xingbo
>Assignee: Chesnay Schepler
>Priority: Blocker
>  Labels: pull-request-available, test-stability
> Fix For: 1.14.5, 1.15.1, 1.16.0
>
>
> {code:java}
> 2022-05-24T03:50:20.5443243Z 03:50:20,543 ERROR 
> org.apache.flink.tools.ci.suffixcheck.ScalaSuffixChecker [] - Violations 
> found:
> 2022-05-24T03:50:20.5444210Z  Scala-free module 'flink-hadoop-compatibility' 
> is referenced with scala suffix in 
> 'flink-formats/flink-sequence-file/pom.xml'.
> 2022-05-24T03:50:20.5445185Z  Scala-free module 'flink-hadoop-compatibility' 
> is referenced with scala suffix in 
> 'flink-connectors/flink-hadoop-compatibility/pom.xml'.
> 2022-05-24T03:50:20.5446207Z  Scala-free module 'flink-hadoop-compatibility' 
> is referenced with scala suffix in 
> 'flink-connectors/flink-connector-hbase-1.4/pom.xml'.
> 2022-05-24T03:50:20.5447186Z  Scala-free module 'flink-hadoop-compatibility' 
> is referenced with scala suffix in 
> 'flink-connectors/flink-connector-hbase-2.2/pom.xml'.
> 2022-05-24T03:50:20.5448135Z  Scala-free module 'flink-hadoop-compatibility' 
> is referenced with scala suffix in 'flink-connectors/flink-hcatalog/pom.xml'.
> 2022-05-24T03:50:20.5449237Z  Scala-free module 'flink-hadoop-compatibility' 
> is referenced with scala suffix in 
> 'flink-connectors/flink-connector-hive/pom.xml'.
> 2022-05-24T03:50:20.5450180Z  Scala-free module 'flink-hadoop-compatibility' 
> is referenced with scala suffix in 'flink-table/flink-sql-client/pom.xml'.
> 2022-05-24T03:50:20.5451049Z  Scala-free module 'flink-hadoop-compatibility' 
> is referenced with scala suffix in 'flink-tests/pom.xml'.
> 2022-05-24T03:50:20.5452020Z  Scala-free module 'flink-hcatalog' is 
> referenced with scala suffix in 'flink-connectors/flink-hcatalog/pom.xml'.
> 2022-05-24T03:50:20.5453369Z  Scala-free module 
> 'flink-sql-connector-hive-2.3.9' is referenced with scala suffix in 
> 'flink-connectors/flink-sql-connector-hive-2.3.9/pom.xml'.
> 2022-05-24T03:50:20.5454388Z  Scala-free module 
> 'flink-sql-connector-hive-3.1.2' is referenced with scala suffix in 
> 'flink-connectors/flink-sql-connector-hive-3.1.2/pom.xml'.
> 2022-05-24T03:50:20.6860887Z 
> ==
> 2022-05-24T03:50:20.6861601Z Suffix Check failed. See previous output for 
> details.
> 2022-05-24T03:50:20.6862335Z 
> ==
> 2022-05-24T03:50:20.6942100Z ##[error]Bash exited with code '1'.
> {code}
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=35994&view=logs&j=52b61abe-a3cc-5bde-cc35-1bbe89bb7df5&t=54421a62-0c80-5aad-3319-094ff69180bb



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-28198) CassandraConnectorITCase fails with timeout

2022-08-12 Thread Chesnay Schepler (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-28198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chesnay Schepler updated FLINK-28198:
-
Issue Type: Technical Debt  (was: Bug)

> CassandraConnectorITCase fails with timeout
> ---
>
> Key: FLINK-28198
> URL: https://issues.apache.org/jira/browse/FLINK-28198
> Project: Flink
>  Issue Type: Technical Debt
>  Components: Connectors / Cassandra
>Affects Versions: 1.16.0
>Reporter: Martijn Visser
>Assignee: Etienne Chauchot
>Priority: Critical
>  Labels: pull-request-available, test-stability
> Fix For: 1.16.0
>
>
> {code:java}
> Jun 22 07:57:37 [ERROR] 
> org.apache.flink.streaming.connectors.cassandra.CassandraConnectorITCase.testRaiseCassandraRequestsTimeouts
>   Time elapsed: 12.067 s  <<< ERROR!
> Jun 22 07:57:37 
> com.datastax.driver.core.exceptions.OperationTimedOutException: 
> [/172.17.0.1:59915] Timed out waiting for server response
> Jun 22 07:57:37   at 
> com.datastax.driver.core.exceptions.OperationTimedOutException.copy(OperationTimedOutException.java:43)
> Jun 22 07:57:37   at 
> com.datastax.driver.core.exceptions.OperationTimedOutException.copy(OperationTimedOutException.java:25)
> Jun 22 07:57:37   at 
> com.datastax.driver.core.DriverThrowables.propagateCause(DriverThrowables.java:35)
> Jun 22 07:57:37   at 
> com.datastax.driver.core.DefaultResultSetFuture.getUninterruptibly(DefaultResultSetFuture.java:293)
> Jun 22 07:57:37   at 
> com.datastax.driver.core.AbstractSession.execute(AbstractSession.java:58)
> Jun 22 07:57:37   at 
> com.datastax.driver.core.AbstractSession.execute(AbstractSession.java:39)
> {code}
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=37037&view=logs&j=4eda0b4a-bd0d-521a-0916-8285b9be9bb5&t=2ff6d5fa-53a6-53ac-bff7-fa524ea361a9&l=13736



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-28263) TPC-DS Bash e2e tests don't clean up after completing

2022-08-12 Thread Chesnay Schepler (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-28263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chesnay Schepler updated FLINK-28263:
-
Issue Type: Technical Debt  (was: Bug)

> TPC-DS Bash e2e tests don't clean up after completing
> -
>
> Key: FLINK-28263
> URL: https://issues.apache.org/jira/browse/FLINK-28263
> Project: Flink
>  Issue Type: Technical Debt
>  Components: Tests
>Affects Versions: 1.16.0
>Reporter: Martijn Visser
>Assignee: Martijn Visser
>Priority: Critical
>  Labels: pull-request-available, test-stability
> Fix For: 1.16.0, 1.15.2, 1.14.6
>
>
> When debugging the disk space usage for the e2e tests, the top 20 folders 
> with the largest file size are:
> {code:java}
> 2022-06-27T09:32:59.8000587Z Jun 27 09:32:59 List top 20 directories with 
> largest file size
> 2022-06-27T09:33:00.9811803Z Jun 27 09:33:00 4088524  .
> 2022-06-27T09:33:00.9813428Z Jun 27 09:33:00 1277080  ./flink-end-to-end-tests
> 2022-06-27T09:33:00.9814324Z Jun 27 09:33:00 624512   ./flink-dist
> 2022-06-27T09:33:00.9815152Z Jun 27 09:33:00 624124   ./flink-dist/target
> 2022-06-27T09:33:00.9816093Z Jun 27 09:33:00 500032   
> ./flink-dist/target/flink-1.16-SNAPSHOT-bin
> 2022-06-27T09:33:00.9817429Z Jun 27 09:33:00 500028   
> ./flink-dist/target/flink-1.16-SNAPSHOT-bin/flink-1.16-SNAPSHOT
> 2022-06-27T09:33:00.9818167Z Jun 27 09:33:00 486412   ./.git
> 2022-06-27T09:33:00.9819096Z Jun 27 09:33:00 479416   ./.git/objects
> 2022-06-27T09:33:00.9819512Z Jun 27 09:33:00 479408   ./.git/objects/pack
> 2022-06-27T09:33:00.9820584Z Jun 27 09:33:00 461456   ./flink-connectors
> 2022-06-27T09:33:00.9821403Z Jun 27 09:33:00 449832   
> ./.git/objects/pack/pack-0bdd9e3186d0cb404910c5843d19b5cb80b84fe0.pack
> 2022-06-27T09:33:00.9821992Z Jun 27 09:33:00 349236   ./flink-table
> 2022-06-27T09:33:00.9822631Z Jun 27 09:33:00 293008   
> ./flink-dist/target/flink-1.16-SNAPSHOT-bin/flink-1.16-SNAPSHOT/opt
> 2022-06-27T09:33:00.9823233Z Jun 27 09:33:00 251272   ./flink-filesystems
> 2022-06-27T09:33:00.9823818Z Jun 27 09:33:00 246588   
> ./flink-end-to-end-tests/flink-streaming-kinesis-test
> 2022-06-27T09:33:00.9824502Z Jun 27 09:33:00 246464   
> ./flink-end-to-end-tests/flink-streaming-kinesis-test/target
> 2022-06-27T09:33:00.9825210Z Jun 27 09:33:00 196656   
> ./flink-dist/target/flink-1.16-SNAPSHOT-bin/flink-1.16-SNAPSHOT/lib
> 2022-06-27T09:33:00.9825966Z Jun 27 09:33:00 184364   
> ./flink-end-to-end-tests/flink-streaming-kinesis-test/target/KinesisExample.jar
> 2022-06-27T09:33:00.9826652Z Jun 27 09:33:00 156136   
> ./flink-end-to-end-tests/flink-tpcds-test
> 2022-06-27T09:33:00.9827284Z Jun 27 09:33:00 151180   
> ./flink-end-to-end-tests/flink-tpcds-test/target
> {code}
> See 
> https://dev.azure.com/martijn0323/Flink/_build/results?buildId=2732&view=logs&j=0e31ee24-31a6-528c-a4bf-45cde9b2a14e&t=ff03a8fa-e84e-5199-efb2-5433077ce8e2&l=5093
> After running the {{TPC-DS end-to-end test}} and after the clean-up, the 
> following directories are listed in the top 20:
> {code:java}
> 2022-06-27T09:49:51.7694429Z Jun 27 09:49:51 List top 20 directories with 
> largest file size AFTER cleaning temorary folders and files
> 2022-06-27T09:49:52.9617221Z Jun 27 09:49:52 5315996  .
> 2022-06-27T09:49:52.9618830Z Jun 27 09:49:52 2504556  ./flink-end-to-end-tests
> 2022-06-27T09:49:52.9619848Z Jun 27 09:49:52 1383612  
> ./flink-end-to-end-tests/flink-tpcds-test
> 2022-06-27T09:49:52.9620796Z Jun 27 09:49:52 1378656  
> ./flink-end-to-end-tests/flink-tpcds-test/target
> 2022-06-27T09:49:52.9621730Z Jun 27 09:49:52 1223944  
> ./flink-end-to-end-tests/flink-tpcds-test/target/table
> 2022-06-27T09:49:52.9622844Z Jun 27 09:49:52 624508   ./flink-dist
> 2022-06-27T09:49:52.9623585Z Jun 27 09:49:52 624120   ./flink-dist/target
> 2022-06-27T09:49:52.9624398Z Jun 27 09:49:52 500028   
> ./flink-dist/target/flink-1.16-SNAPSHOT-bin
> 2022-06-27T09:49:52.9625366Z Jun 27 09:49:52 500024   
> ./flink-dist/target/flink-1.16-SNAPSHOT-bin/flink-1.16-SNAPSHOT
> 2022-06-27T09:49:52.9625994Z Jun 27 09:49:52 486412   ./.git
> 2022-06-27T09:49:52.9626514Z Jun 27 09:49:52 479416   ./.git/objects
> 2022-06-27T09:49:52.9631740Z Jun 27 09:49:52 479408   ./.git/objects/pack
> 2022-06-27T09:49:52.9632755Z Jun 27 09:49:52 461456   ./flink-connectors
> 2022-06-27T09:49:52.9633717Z Jun 27 09:49:52 449832   
> ./.git/objects/pack/pack-0bdd9e3186d0cb404910c5843d19b5cb80b84fe0.pack
> 2022-06-27T09:49:52.9634769Z Jun 27 09:49:52 379348   
> ./flink-end-to-end-tests/flink-tpcds-test/target/table/store_sales.dat
> 2022-06-27T09:49:52.9635596Z Jun 27 09:49:52 349236   ./flink-table
> 2022-06-27T09:49:52.9636489Z Jun 27 09:49:52 293008   
> ./flink-dist/target/flink-1.16-SNAPSHOT-bin/flink-1.16-SNAPSHOT/opt
> 2022-06-27T09:49:52.

[jira] [Updated] (FLINK-28388) Python doc build breaking nightly docs

2022-08-12 Thread Chesnay Schepler (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-28388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chesnay Schepler updated FLINK-28388:
-
Issue Type: Technical Debt  (was: Bug)

> Python doc build breaking nightly docs
> --
>
> Key: FLINK-28388
> URL: https://issues.apache.org/jira/browse/FLINK-28388
> Project: Flink
>  Issue Type: Technical Debt
>  Components: API / Python, Documentation
>Affects Versions: 1.16.0
>Reporter: Márton Balassi
>Assignee: Dian Fu
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 1.16.0
>
>
> For the past 5 days the nightly doc builds via GHA have been broken:
> https://github.com/apache/flink/actions/workflows/docs.yml
> {noformat}
> Exception occurred:
>   File "/root/flink/flink-python/pyflink/java_gateway.py", line 86, in 
> launch_gateway
> raise Exception("It's launching the PythonGatewayServer during Python UDF 
> execution "
> Exception: It's launching the PythonGatewayServer during Python UDF execution 
> which is unexpected. It usually happens when the job codes are in the top 
> level of the Python script file and are not enclosed in a `if __name__ == '__main__'` 
> statement.
> The full traceback has been saved in /tmp/sphinx-err-3thh_wi2.log, if you 
> want to report the issue to the developers.
> Please also report this if it was a user error, so that a better error 
> message can be provided next time.
> A bug report can be filed in the tracker at 
> . Thanks!
> Makefile:76: recipe for target 'html' failed
> make: *** [html] Error 2
> ==sphinx checks... [FAILED]===
> Error: Process completed with exit code 1.
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-28781) Hybrid Shuffle should support compression

2022-08-12 Thread Chesnay Schepler (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-28781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chesnay Schepler updated FLINK-28781:
-
Issue Type: Improvement  (was: Bug)

> Hybrid Shuffle should support compression
> -
>
> Key: FLINK-28781
> URL: https://issues.apache.org/jira/browse/FLINK-28781
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Network
>Reporter: Weijie Guo
>Assignee: Weijie Guo
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.16.0
>
>
> Compression is a useful feature for batch jobs, which can significantly 
> reduce disk load and the amount of data transferred over the network. Hybrid 
> shuffle should also support the compression of spilled data, especially under 
> the full spilling strategy.
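A hedged configuration sketch: the option keys below are assumptions based on the existing batch-shuffle settings ({{execution.batch-shuffle-mode}} and {{taskmanager.network.batch-shuffle.compression.enabled}}); this ticket is about making the latter cover hybrid shuffle's spilled data as well.

{code:java}
import org.apache.flink.api.common.RuntimeExecutionMode;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class HybridShuffleCompressionSketch {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Hybrid shuffle with the full spilling strategy (assumed 1.16 value).
        conf.setString("execution.batch-shuffle-mode", "ALL_EXCHANGES_HYBRID_FULL");
        // Compression of shuffle data spilled to disk.
        conf.setString("taskmanager.network.batch-shuffle.compression.enabled", "true");

        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment(conf);
        env.setRuntimeMode(RuntimeExecutionMode.BATCH);
        // ... define the batch job on env as usual ...
    }
}
{code}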



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-28844) YARNHighAvailabilityITCase fails with NoSuchMethod of org.apache.curator

2022-08-12 Thread Chesnay Schepler (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-28844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chesnay Schepler updated FLINK-28844:
-
Issue Type: Technical Debt  (was: Bug)

> YARNHighAvailabilityITCase fails with NoSuchMethod of org.apache.curator
> 
>
> Key: FLINK-28844
> URL: https://issues.apache.org/jira/browse/FLINK-28844
> Project: Flink
>  Issue Type: Technical Debt
>  Components: Test Infrastructure
>Affects Versions: 1.16.0
>Reporter: Jark Wu
>Assignee: Chesnay Schepler
>Priority: Blocker
> Fix For: 1.16.0
>
>
> This keeps failing on master since the commit 
> https://github.com/flink-ci/flink-mirror/commit/6335b573863af2b30a6541f910be96ddf61f9c84
>  which removed the curator-test dependency from the flink-test-utils module. 
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=39394&view=logs&j=fc5181b0-e452-5c8f-68de-1097947f6483&t=995c650b-6573-581c-9ce6-7ad4cc038461
> {code}
> 2022-08-05T18:31:47.0438160Z Aug 05 18:31:47 [ERROR] Tests run: 1, Failures: 
> 0, Errors: 1, Skipped: 0, Time elapsed: 5.237 s <<< FAILURE! - in 
> org.apache.flink.yarn.YARNHighAvailabilityITCase
> 2022-08-05T18:31:47.0439564Z Aug 05 18:31:47 [ERROR] 
> org.apache.flink.yarn.YARNHighAvailabilityITCase  Time elapsed: 5.237 s  <<< 
> ERROR!
> 2022-08-05T18:31:47.0440370Z Aug 05 18:31:47 java.lang.NoSuchMethodError: 
> org.apache.curator.test.InstanceSpec.getHostname()Ljava/lang/String;
> 2022-08-05T18:31:47.0441582Z Aug 05 18:31:47  at 
> org.apache.flink.runtime.testutils.ZooKeeperTestUtils.getZookeeperInstanceSpecWithIncreasedSessionTimeout(ZooKeeperTestUtils.java:71)
> 2022-08-05T18:31:47.0442643Z Aug 05 18:31:47  at 
> org.apache.flink.runtime.testutils.ZooKeeperTestUtils.createAndStartZookeeperTestingServer(ZooKeeperTestUtils.java:49)
> 2022-08-05T18:31:47.0443461Z Aug 05 18:31:47  at 
> org.apache.flink.yarn.YARNHighAvailabilityITCase.setup(YARNHighAvailabilityITCase.java:114)
> 2022-08-05T18:31:47.0444094Z Aug 05 18:31:47  at 
> sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 2022-08-05T18:31:47.0444717Z Aug 05 18:31:47  at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> 2022-08-05T18:31:47.0445424Z Aug 05 18:31:47  at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 2022-08-05T18:31:47.0446063Z Aug 05 18:31:47  at 
> java.lang.reflect.Method.invoke(Method.java:498)
> 2022-08-05T18:31:47.0446818Z Aug 05 18:31:47  at 
> org.junit.platform.commons.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:725)
> 2022-08-05T18:31:47.0447822Z Aug 05 18:31:47  at 
> org.junit.jupiter.engine.execution.MethodInvocation.proceed(MethodInvocation.java:60)
> 2022-08-05T18:31:47.0448657Z Aug 05 18:31:47  at 
> org.junit.jupiter.engine.execution.InvocationInterceptorChain$ValidatingInvocation.proceed(InvocationInterceptorChain.java:131)
> 2022-08-05T18:31:47.0449692Z Aug 05 18:31:47  at 
> org.junit.jupiter.engine.extension.TimeoutExtension.intercept(TimeoutExtension.java:149)
> 2022-08-05T18:31:47.0450637Z Aug 05 18:31:47  at 
> org.junit.jupiter.engine.extension.TimeoutExtension.interceptLifecycleMethod(TimeoutExtension.java:126)
> 2022-08-05T18:31:47.0451443Z Aug 05 18:31:47  at 
> org.junit.jupiter.engine.extension.TimeoutExtension.interceptBeforeAllMethod(TimeoutExtension.java:68)
> 2022-08-05T18:31:47.0452304Z Aug 05 18:31:47  at 
> org.junit.jupiter.engine.execution.ExecutableInvoker$ReflectiveInterceptorCall.lambda$ofVoidMethod$0(ExecutableInvoker.java:115)
> 2022-08-05T18:31:47.0453162Z Aug 05 18:31:47  at 
> org.junit.jupiter.engine.execution.ExecutableInvoker.lambda$invoke$0(ExecutableInvoker.java:105)
> 2022-08-05T18:31:47.0454013Z Aug 05 18:31:47  at 
> org.junit.jupiter.engine.execution.InvocationInterceptorChain$InterceptedInvocation.proceed(InvocationInterceptorChain.java:106)
> 2022-08-05T18:31:47.0454882Z Aug 05 18:31:47  at 
> org.junit.jupiter.engine.execution.InvocationInterceptorChain.proceed(InvocationInterceptorChain.java:64)
> 2022-08-05T18:31:47.0455716Z Aug 05 18:31:47  at 
> org.junit.jupiter.engine.execution.InvocationInterceptorChain.chainAndInvoke(InvocationInterceptorChain.java:45)
> 2022-08-05T18:31:47.0456525Z Aug 05 18:31:47  at 
> org.junit.jupiter.engine.execution.InvocationInterceptorChain.invoke(InvocationInterceptorChain.java:37)
> 2022-08-05T18:31:47.0457512Z Aug 05 18:31:47  at 
> org.junit.jupiter.engine.execution.ExecutableInvoker.invoke(ExecutableInvoker.java:104)
> 2022-08-05T18:31:47.0458637Z Aug 05 18:31:47  at 
> org.junit.jupiter.engine.execution.ExecutableInvoker.invoke(ExecutableInvoker.java:98)
> 2022-08-05T18:31:47.0459742Z Aug 05 18:31:47  at 
> org.junit.jupiter.engine.descriptor.ClassBasedTestDescriptor.lambd

[jira] [Updated] (FLINK-28856) YARNHighAvailabilityITCase tests failed with NoSuchMethodError

2022-08-12 Thread Chesnay Schepler (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-28856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chesnay Schepler updated FLINK-28856:
-
Issue Type: Technical Debt  (was: Bug)

> YARNHighAvailabilityITCase tests failed with NoSuchMethodError
> --
>
> Key: FLINK-28856
> URL: https://issues.apache.org/jira/browse/FLINK-28856
> Project: Flink
>  Issue Type: Technical Debt
>  Components: Build System / Azure Pipelines, Tests
>Affects Versions: 1.16.0
>Reporter: Huang Xingbo
>Priority: Blocker
>  Labels: test-stability
>
> {code:java}
> 2022-08-07T15:54:12.7203154Z Aug 07 15:54:12 [ERROR] 
> org.apache.flink.yarn.YARNHighAvailabilityITCase  Time elapsed: 4.606 s  <<< 
> ERROR!
> 2022-08-07T15:54:12.7203828Z Aug 07 15:54:12 java.lang.NoSuchMethodError: 
> org.apache.curator.test.InstanceSpec.getHostname()Ljava/lang/String;
> 2022-08-07T15:54:12.7204675Z Aug 07 15:54:12  at 
> org.apache.flink.runtime.testutils.ZooKeeperTestUtils.getZookeeperInstanceSpecWithIncreasedSessionTimeout(ZooKeeperTestUtils.java:71)
> 2022-08-07T15:54:12.7205582Z Aug 07 15:54:12  at 
> org.apache.flink.runtime.testutils.ZooKeeperTestUtils.createAndStartZookeeperTestingServer(ZooKeeperTestUtils.java:49)
> 2022-08-07T15:54:12.7206508Z Aug 07 15:54:12  at 
> org.apache.flink.yarn.YARNHighAvailabilityITCase.setup(YARNHighAvailabilityITCase.java:114)
> {code}
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=39502&view=logs&j=fc5181b0-e452-5c8f-68de-1097947f6483&t=995c650b-6573-581c-9ce6-7ad4cc038461



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-28856) YARNHighAvailabilityITCase tests failed with NoSuchMethodError

2022-08-12 Thread Chesnay Schepler (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-28856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chesnay Schepler updated FLINK-28856:
-
Fix Version/s: (was: 1.16.0)

> YARNHighAvailabilityITCase tests failed with NoSuchMethodError
> --
>
> Key: FLINK-28856
> URL: https://issues.apache.org/jira/browse/FLINK-28856
> Project: Flink
>  Issue Type: Bug
>  Components: Build System / Azure Pipelines, Tests
>Affects Versions: 1.16.0
>Reporter: Huang Xingbo
>Priority: Blocker
>  Labels: test-stability
>
> {code:java}
> 2022-08-07T15:54:12.7203154Z Aug 07 15:54:12 [ERROR] 
> org.apache.flink.yarn.YARNHighAvailabilityITCase  Time elapsed: 4.606 s  <<< 
> ERROR!
> 2022-08-07T15:54:12.7203828Z Aug 07 15:54:12 java.lang.NoSuchMethodError: 
> org.apache.curator.test.InstanceSpec.getHostname()Ljava/lang/String;
> 2022-08-07T15:54:12.7204675Z Aug 07 15:54:12  at 
> org.apache.flink.runtime.testutils.ZooKeeperTestUtils.getZookeeperInstanceSpecWithIncreasedSessionTimeout(ZooKeeperTestUtils.java:71)
> 2022-08-07T15:54:12.7205582Z Aug 07 15:54:12  at 
> org.apache.flink.runtime.testutils.ZooKeeperTestUtils.createAndStartZookeeperTestingServer(ZooKeeperTestUtils.java:49)
> 2022-08-07T15:54:12.7206508Z Aug 07 15:54:12  at 
> org.apache.flink.yarn.YARNHighAvailabilityITCase.setup(YARNHighAvailabilityITCase.java:114)
> {code}
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=39502&view=logs&j=fc5181b0-e452-5c8f-68de-1097947f6483&t=995c650b-6573-581c-9ce6-7ad4cc038461



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-28931) BlockingPartitionBenchmark doesn't compile

2022-08-12 Thread Chesnay Schepler (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-28931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chesnay Schepler updated FLINK-28931:
-
Issue Type: Technical Debt  (was: Bug)

> BlockingPartitionBenchmark doesn't compile
> --
>
> Key: FLINK-28931
> URL: https://issues.apache.org/jira/browse/FLINK-28931
> Project: Flink
>  Issue Type: Technical Debt
>  Components: Benchmarks
>Affects Versions: 1.16.0
>Reporter: Roman Khachatryan
>Assignee: Roman Khachatryan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.16.0
>
>
> {code}
> 10:15:12  [ERROR] 
> /home/jenkins/workspace/flink-master-benchmarks-java8/flink-benchmarks/src/main/java/org/apache/flink/benchmark/BlockingPartitionBenchmark.java:117:50:
>   error: cannot find symbol
> {code}
> Caused by
> https://github.com/apache/flink/commit/9f5d0c48f198ff69a175f630832687ba02cf4c3e#diff-f72e79ebd747b6fde91988d65de9121a5907c97e4630cb1e30ab65601b4d9753R79



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (FLINK-28719) Mapping a data source before window aggregation causes Flink to stop handling late events correctly.

2022-08-12 Thread Chesnay Schepler (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-28719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17578890#comment-17578890
 ] 

Chesnay Schepler edited comment on FLINK-28719 at 8/12/22 9:42 AM:
---

??As far as I understand, op1 and op2 should have watermark 1 and 2 
respectively, because those subtasks don't have any events in them (besides 1 
and 2, of course, which create those watermarks). Then, why do they get the 
maximum watermark of 7 after first step???

Watermarks are broadcasted to all downstream operators. So, if a source emits 
watermark 7, then all maps get it as an input.

??Also, why after op1 and op2, op4 and op5 get consumed? Is there any strategy 
that dictates in which order to process subtasks???

Whichever arrives first at the downstream operator. You have 7 map subtasks all 
sending data to the window operators at the same time, and the order in which 
they arrive is not deterministic. So they _may_ arrive in a perfect round-robin 
pattern, or sequentially, or in any other pattern. The only guarantee is that 
the elements from a particular map subtask arrive in the same order that they 
were sent in.

??Also, why op3 to op7 have such watermarks: W2, W5, W6, W6,  W6, after first 
step? I thought those subtasks should have watermark of Long.MinValue, because 
there were no elements before???

The above 2 comments should answer it; all watermarks are sent to all map 
subtasks, and are consumed by the window operator in any order.


was (Author: zentol):
??As far as I understand, op1 and op2 should have watermark 1 and 2 
respectively, because those subtasks don't have any events in them (besides 1 
and 2, of course, which create those watermarks). Then, why do they get the 
maximum watermark of 7 after first step???

Watermarks are broadcasted to all downstream operators. So, if a source emits 
watermark 7, then all maps get it as an input.

??Also, why after op1 and op2, op4 and op5 get consumed? Is there any strategy 
that dictates in which order to process subtasks?

Whichever arrives first at the downstream operator. You have 7 map subtasks all 
sending data to the window operators at the same time, and the order in which 
they arrive is not deterministic. So they _may_ arrive in a perfect round-robin 
pattern, or sequentially, or in any other pattern. The only guarantee is that 
the elements from a particular map subtask arrive in the same order that they 
were sent in.

??Also, why op3 to op7 have such watermarks: W2, W5, W6, W6,  W6, after first 
step? I thought those subtasks should have watermark of Long.MinValue, because 
there were no elements before?

The above 2 comments should answer it; all watermarks are sent to all map 
subtasks, and are consumed by the window operator in any order.

> Mapping a data source before window aggregation causes Flink to stop 
> handling late events correctly.
> --
>
> Key: FLINK-28719
> URL: https://issues.apache.org/jira/browse/FLINK-28719
> Project: Flink
>  Issue Type: Bug
>  Components: API / DataStream
>Affects Versions: 1.15.1
>Reporter: Mykyta Mykhailenko
>Priority: Major
>
> I have created a 
> [repository|https://github.com/mykytamykhailenko/flink-map-with-issue] where 
> I describe this issue in detail. 
> I have provided a few tests and source code so that you can reproduce the 
> issue on your own machine. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (FLINK-28719) Mapping a data source before window aggregation causes Flink to stop handling late events correctly.

2022-08-12 Thread Chesnay Schepler (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-28719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17578890#comment-17578890
 ] 

Chesnay Schepler edited comment on FLINK-28719 at 8/12/22 9:43 AM:
---

??As far as I understand, op1 and op2 should have watermark 1 and 2 
respectively, because those subtasks don't have any events in them (besides 1 
and 2, of course, which create those watermarks). Then, why do they get the 
maximum watermark of 7 after first step???

Watermarks are broadcasted to all downstream operators. So, if a source emits 
watermark 7, then all map subtasks get it as an input.

??Also, why after op1 and op2, op4 and op5 get consumed? Is there any strategy 
that dictates in which order to process subtasks???

Whichever arrives first at the downstream operator. You have 7 map subtasks all 
sending data to the window operators at the same time, and the order in which 
they arrive is not deterministic. So they _may_ arrive in a perfect round-robin 
pattern, or sequentially, or in any other pattern. The only guarantee is that 
the elements from a particular map subtask arrive in the same order that they 
were sent in.

??Also, why op3 to op7 have such watermarks: W2, W5, W6, W6,  W6, after first 
step? I thought those subtasks should have watermark of Long.MinValue, because 
there were no elements before???

The above 2 comments should answer it; all watermarks are sent to all map 
subtasks, and are consumed by the window operator in any order.


was (Author: zentol):
??As far as I understand, op1 and op2 should have watermark 1 and 2 
respectively, because those subtasks don't have any events in them (besides 1 
and 2, of course, which create those watermarks). Then, why do they get the 
maximum watermark of 7 after first step???

Watermarks are broadcasted to all downstream operators. So, if a source emits 
watermark 7, then all maps get it as an input.

??Also, why after op1 and op2, op4 and op5 get consumed? Is there any strategy 
that dictates in which order to process subtasks???

Whichever arrives first at the downstream operator. You have 7 map subtasks all 
sending data to the window operators at the same time, and the order in which 
they arrive is not deterministic. So they _may_ arrive in a perfect round-robin 
pattern, or sequentially, or in any other pattern. The only guarantee is that 
the elements from a particular map subtask arrive in the same order that they 
were sent in.

??Also, why op3 to op7 have such watermarks: W2, W5, W6, W6,  W6, after first 
step? I thought those subtasks should have watermark of Long.MinValue, because 
there were no elements before???

The above 2 comments should answer it; all watermarks are sent to all map 
subtasks, and are consumed by the window operator in any order.

> Mapping a data source before window aggregation causes Flink to stop 
> handling late events correctly.
> --
>
> Key: FLINK-28719
> URL: https://issues.apache.org/jira/browse/FLINK-28719
> Project: Flink
>  Issue Type: Bug
>  Components: API / DataStream
>Affects Versions: 1.15.1
>Reporter: Mykyta Mykhailenko
>Priority: Major
>
> I have created a 
> [repository|https://github.com/mykytamykhailenko/flink-map-with-issue] where 
> I describe this issue in detail. 
> I have provided a few tests and source code so that you can reproduce the 
> issue on your own machine. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (FLINK-28719) Mapping a data source before window aggregation causes Flink to stop handling late events correctly.

2022-08-12 Thread Chesnay Schepler (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-28719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17578890#comment-17578890
 ] 

Chesnay Schepler edited comment on FLINK-28719 at 8/12/22 9:42 AM:
---

??As far as I understand, op1 and op2 should have watermark 1 and 2 
respectively, because those subtasks don't have any events in them (besides 1 
and 2, of course, which create those watermarks). Then, why do they get the 
maximum watermark of 7 after first step???

Watermarks are broadcasted to all downstream operators. So, if a source emits 
watermark 7, then all maps get it as an input.

??Also, why after op1 and op2, op4 and op5 get consumed? Is there any strategy 
that dictates in which order to process subtasks?

Whichever arrives first at the downstream operator. You have 7 map subtasks all 
sending data to the window operators at the same time, and the order in which 
they arrive is not deterministic. So they _may_ arrive in a perfect round-robin 
pattern, or sequentially, or in any other pattern. The only guarantee is that 
the elements from a particular map subtask arrive in the same order that they 
were sent in.

??Also, why op3 to op7 have such watermarks: W2, W5, W6, W6,  W6, after first 
step? I thought those subtasks should have watermark of Long.MinValue, because 
there were no elements before?

The above 2 comments should answer it; all watermarks are sent to all map 
subtasks, and are consumed by the window operator in any order.


was (Author: zentol):
> As far as I understand, op1 and op2 should have watermark 1 and 2 
> respectively, because those subtasks don't have any events in them (besides 1 
> and 2, of course, which create those watermarks). Then, why do they get the 
> maximum watermark of 7 after first step?

Watermarks are broadcasted to all downstream operators. So, if a source emits 
watermark 7, then all maps get it as an input.

> Also, why after op1 and op2, op4 and op5 get consumed? Is there any strategy 
> that dictates in which order to process subtasks?

Whichever arrives first at the downstream operator. You have 7 map subtasks all 
sending data to the window operators at the same time, and the order in which 
they arrive is not deterministic. So they _may_ arrive in a perfect round-robin 
pattern, or sequentially, or in any other pattern. The only guarantee is that 
the elements from a particular map subtask arrive in the same order that they 
were sent in.

> Also, why op3 to op7 have such watermarks: W2, W5, W6, W6,  W6, after first 
> step? I thought those subtasks should have watermark of Long.MinValue, 
> because there were no elements before?

The above 2 comments should answer it; all watermarks are sent to all map 
subtasks, and are consumed by the window operator in any order.

> Mapping a data source before window aggregation causes Flink to stop 
> handling late events correctly.
> --
>
> Key: FLINK-28719
> URL: https://issues.apache.org/jira/browse/FLINK-28719
> Project: Flink
>  Issue Type: Bug
>  Components: API / DataStream
>Affects Versions: 1.15.1
>Reporter: Mykyta Mykhailenko
>Priority: Major
>
> I have created a 
> [repository|https://github.com/mykytamykhailenko/flink-map-with-issue] where 
> I describe this issue in detail. 
> I have provided a few tests and source code so that you can reproduce the 
> issue on your own machine. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-28719) Mapping a data source before window aggregation causes Flink to stop handling late events correctly.

2022-08-12 Thread Chesnay Schepler (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-28719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17578890#comment-17578890
 ] 

Chesnay Schepler commented on FLINK-28719:
--

> As far as I understand, op1 and op2 should have watermark 1 and 2 
> respectively, because those subtasks don't have any events in them (besides 1 
> and 2, of course, which create those watermarks). Then, why do they get the 
> maximum watermark of 7 after first step?

Watermarks are broadcasted to all downstream operators. So, if a source emits 
watermark 7, then all maps get it as an input.

> Also, why after op1 and op2, op4 and op5 get consumed? Is there any strategy 
> that dictates in which order to process subtasks?

Whichever arrives first at the downstream operator. You have 7 map subtasks all 
sending data to the window operators at the same time, and the order in which 
they arrive is not deterministic. So they _may_ arrive in a perfect round-robin 
pattern, or sequentially, or in any other pattern. The only guarantee is that 
the elements from a particular map subtask arrive in the same order that they 
were sent in.

> Also, why op3 to op7 have such watermarks: W2, W5, W6, W6,  W6, after first 
> step? I thought those subtasks should have watermark of Long.MinValue, 
> because there were no elements before?

The above 2 comments should answer it; all watermarks are sent to all map 
subtasks, and are consumed by the window operator in any order.
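To make the above concrete, a minimal runnable DataStream sketch (illustrative only, not the reporter's job): element values double as event-time timestamps, and every watermark emitted after the source is broadcast to all seven map subtasks, including those that see no records.

{code:java}
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;

public class WatermarkBroadcastSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        env.fromElements(1L, 2L, 7L)
                // Event timestamps equal the element values.
                .assignTimestampsAndWatermarks(
                        WatermarkStrategy.<Long>forMonotonousTimestamps()
                                .withTimestampAssigner((value, ts) -> value))
                // Seven map subtasks; each one receives *every* upstream
                // watermark, regardless of which records it sees.
                .map(value -> value).returns(Types.LONG).setParallelism(7)
                // The window operator advances on the minimum watermark
                // across its inputs, which arrive in no deterministic order.
                .windowAll(TumblingEventTimeWindows.of(Time.milliseconds(5)))
                .reduce(Long::sum)
                .print();

        env.execute("watermark-broadcast-sketch");
    }
}
{code}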

> Mapping a data source before window aggregation causes Flink to stop 
> handling late events correctly.
> --
>
> Key: FLINK-28719
> URL: https://issues.apache.org/jira/browse/FLINK-28719
> Project: Flink
>  Issue Type: Bug
>  Components: API / DataStream
>Affects Versions: 1.15.1
>Reporter: Mykyta Mykhailenko
>Priority: Major
>
> I have created a 
> [repository|https://github.com/mykytamykhailenko/flink-map-with-issue] where 
> I describe this issue in detail. 
> I have provided a few tests and source code so that you can reproduce the 
> issue on your own machine. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-26682) Migrate regression check script to python and to main repository

2022-08-10 Thread Chesnay Schepler (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-26682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chesnay Schepler updated FLINK-26682:
-
Issue Type: Technical Debt  (was: New Feature)

> Migrate regression check script to python and to main repository
> 
>
> Key: FLINK-26682
> URL: https://issues.apache.org/jira/browse/FLINK-26682
> Project: Flink
>  Issue Type: Technical Debt
>  Components: Benchmarks
>Affects Versions: 1.15.0
>Reporter: Piotr Nowojski
>Assignee: Piotr Nowojski
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.16.0
>
>
> The script provided by Roman, 
> https://github.com/rkhachatryan/flink-benchmarks/blob/regression-alerts/regression-alert.sh,
>  should be merged into the main repo and migrated to Python.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-24433) "No space left on device" in Azure e2e tests

2022-08-10 Thread Chesnay Schepler (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-24433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chesnay Schepler updated FLINK-24433:
-
Issue Type: Technical Debt  (was: Bug)

> "No space left on device" in Azure e2e tests
> 
>
> Key: FLINK-24433
> URL: https://issues.apache.org/jira/browse/FLINK-24433
> Project: Flink
>  Issue Type: Technical Debt
>  Components: Build System / Azure Pipelines
>Affects Versions: 1.15.0, 1.16.0
>Reporter: Dawid Wysakowicz
>Assignee: Martijn Visser
>Priority: Blocker
>  Labels: auto-deprioritized-critical, pull-request-available, 
> test-stability
> Fix For: 1.16.0
>
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=24668&view=logs&j=c88eea3b-64a0-564d-0031-9fdcd7b8abee&t=070ff179-953e-5bda-71fa-d6599415701c&l=19772
> {code}
> Sep 30 17:08:42 Job has been submitted with JobID 
> 5594c18e128a328ede39cfa59cb3cb07
> Sep 30 17:08:56 2021-09-30 17:08:56,809 main ERROR Recovering from 
> StringBuilderEncoder.encode('2021-09-30 17:08:56,807 WARN  
> org.apache.flink.streaming.api.operators.collect.CollectResultFetcher [] - An 
> exception occurred when fetching query results
> Sep 30 17:08:56 java.util.concurrent.ExecutionException: 
> org.apache.flink.runtime.rest.util.RestClientException: [Internal server 
> error.,  Sep 30 17:08:56 org.apache.flink.runtime.messages.FlinkJobNotFoundException: 
> Could not find Flink job (5594c18e128a328ede39cfa59cb3cb07)
> Sep 30 17:08:56   at 
> org.apache.flink.runtime.dispatcher.Dispatcher.getJobMasterGateway(Dispatcher.java:923)
> Sep 30 17:08:56   at 
> org.apache.flink.runtime.dispatcher.Dispatcher.performOperationOnJobMasterGateway(Dispatcher.java:937)
> Sep 30 17:08:56   at 
> org.apache.flink.runtime.dispatcher.Dispatcher.deliverCoordinationRequestToCoordina2021-09-30T17:08:57.1584224Z
>  ##[error]No space left on device
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (FLINK-28060) Kafka Commit on checkpointing fails repeatedly after a broker restart

2022-08-10 Thread Chesnay Schepler (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-28060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chesnay Schepler closed FLINK-28060.

Resolution: Fixed

master: bc9b401ed1f2e7257c7b44c9838e34ede9c52ed5

> Kafka Commit on checkpointing fails repeatedly after a broker restart
> -
>
> Key: FLINK-28060
> URL: https://issues.apache.org/jira/browse/FLINK-28060
> Project: Flink
>  Issue Type: Bug
>  Components: API / DataStream, Connectors / Kafka
>Affects Versions: 1.15.0
> Environment: Reproduced on MacOS and Linux.
> Using java 8, Flink 1.15.0, Kafka 2.8.1.
>Reporter: Christian Lorenz
>Assignee: Chesnay Schepler
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.16.0
>
> Attachments: flink-kafka-testjob.zip
>
>
> When Kafka offset committing is enabled and performed on Flink's 
> checkpointing, an error might occur if a Kafka broker is shut down while 
> being the leader of that partition in Kafka's internal __consumer_offsets 
> topic.
> This is expected behaviour. But once the broker is started up again, the 
> next checkpoint issued by Flink should commit the offsets processed in the 
> meantime back to Kafka. In Flink 1.15.0 this no longer seems to happen 
> reliably, and offset committing stays broken. A warning like the following 
> is logged on each checkpoint:
> {code}
> [info] 14:33:13.684 WARN  [Source Data Fetcher for Source: input-kafka-source 
> -> Sink: output-stdout-sink (1/1)#1] o.a.f.c.k.s.reader.KafkaSourceReader - 
> Failed to commit consumer offsets for checkpoint 35
> [info] org.apache.kafka.clients.consumer.RetriableCommitFailedException: 
> Offset commit failed with a retriable exception. You should retry committing 
> the latest consumed offsets.
> [info] Caused by: 
> org.apache.kafka.common.errors.CoordinatorNotAvailableException: The 
> coordinator is not available.
> {code}
> To reproduce this I've attached a small Flink job program. Executing it 
> requires Java 8, Scala sbt, and docker / docker-compose; see readme.md for 
> more details.
> The job can be run with `sbt run`; the Kafka cluster is started by 
> `docker-compose up`. If the Kafka brokers are then restarted gracefully, 
> e.g. `docker-compose stop kafka1` followed by `docker-compose start kafka1`, 
> and likewise for kafka2 and kafka3, this warning occurs and no offsets are 
> committed to Kafka.
> This is not reproducible in Flink 1.14.4.
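A minimal sketch of the affected setup, assuming the 1.15 {{KafkaSource}} builder API and its {{commit.offsets.on.checkpoint}} property (offsets are only committed when a checkpoint completes); the broker address and topic name are placeholders:

{code:java}
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.connector.kafka.source.KafkaSource;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class KafkaCommitOnCheckpointSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.enableCheckpointing(10_000); // offsets are committed on these checkpoints

        KafkaSource<String> source = KafkaSource.<String>builder()
                .setBootstrapServers("localhost:9092") // placeholder cluster address
                .setTopics("input")
                .setGroupId("test-group")
                .setValueOnlyDeserializer(new SimpleStringSchema())
                .setProperty("commit.offsets.on.checkpoint", "true") // assumed default: true
                .build();

        env.fromSource(source, WatermarkStrategy.noWatermarks(), "input-kafka-source")
                .print(); // stand-in for the job's stdout sink
        env.execute("kafka-commit-on-checkpoint");
    }
}
{code}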



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-28857) Add Document for DataStream Cache API

2022-08-10 Thread Chesnay Schepler (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-28857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chesnay Schepler updated FLINK-28857:
-
Fix Version/s: 1.16.0
   (was: 1.16])

> Add Document for DataStream Cache API
> -
>
> Key: FLINK-28857
> URL: https://issues.apache.org/jira/browse/FLINK-28857
> Project: Flink
>  Issue Type: Sub-task
>Reporter: Xuannan Su
>Assignee: Xuannan Su
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.16.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-28060) Kafka Commit on checkpointing fails repeatedly after a broker restart

2022-08-09 Thread Chesnay Schepler (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-28060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chesnay Schepler updated FLINK-28060:
-
Fix Version/s: 1.16.0

> Kafka Commit on checkpointing fails repeatedly after a broker restart
> -
>
> Key: FLINK-28060
> URL: https://issues.apache.org/jira/browse/FLINK-28060
> Project: Flink
>  Issue Type: Bug
>  Components: API / DataStream, Connectors / Kafka
>Affects Versions: 1.15.0
> Environment: Reproduced on MacOS and Linux.
> Using java 8, Flink 1.15.0, Kafka 2.8.1.
>Reporter: Christian Lorenz
>Assignee: Chesnay Schepler
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.16.0
>
> Attachments: flink-kafka-testjob.zip
>
>
> When Kafka offset committing is enabled and tied to Flink's checkpointing, an 
> error might occur if a Kafka broker is shut down while it is the leader 
> of the affected partition in Kafka's internal __consumer_offsets topic.
> This is expected behaviour. But once the broker is started up again, the 
> next checkpoint issued by Flink should commit the offsets processed in the 
> meantime back to Kafka. In Flink 1.15.0 this no longer seems to happen 
> reliably and offset committing stays broken. A warning like the following 
> is logged on each checkpoint:
> {code}
> [info] 14:33:13.684 WARN  [Source Data Fetcher for Source: input-kafka-source 
> -> Sink: output-stdout-sink (1/1)#1] o.a.f.c.k.s.reader.KafkaSourceReader - 
> Failed to commit consumer offsets for checkpoint 35
> [info] org.apache.kafka.clients.consumer.RetriableCommitFailedException: 
> Offset commit failed with a retriable exception. You should retry committing 
> the latest consumed offsets.
> [info] Caused by: 
> org.apache.kafka.common.errors.CoordinatorNotAvailableException: The 
> coordinator is not available.
> {code}
> To reproduce this I've attached a small Flink job program. To execute it, 
> Java 8, Scala sbt and docker / docker-compose are required. Also see readme.md 
> for more details.
> The job can be run with `sbt run`; the Kafka cluster is started by 
> `docker-compose up`. If the Kafka brokers are then restarted gracefully, 
> e.g. `docker-compose stop kafka1` and `docker-compose start kafka1`, followed 
> by kafka2 and kafka3, this warning will occur and no offsets will be 
> committed to Kafka.
> This is not reproducible in Flink 1.14.4.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (FLINK-28060) Kafka Commit on checkpointing fails repeatedly after a broker restart

2022-08-09 Thread Chesnay Schepler (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-28060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chesnay Schepler reassigned FLINK-28060:


Assignee: Chesnay Schepler

> Kafka Commit on checkpointing fails repeatedly after a broker restart
> -
>
> Key: FLINK-28060
> URL: https://issues.apache.org/jira/browse/FLINK-28060
> Project: Flink
>  Issue Type: Bug
>  Components: API / DataStream, Connectors / Kafka
>Affects Versions: 1.15.0
> Environment: Reproduced on MacOS and Linux.
> Using java 8, Flink 1.15.0, Kafka 2.8.1.
>Reporter: Christian Lorenz
>Assignee: Chesnay Schepler
>Priority: Major
>  Labels: pull-request-available
> Attachments: flink-kafka-testjob.zip
>
>
> When Kafka offset committing is enabled and tied to Flink's checkpointing, an 
> error might occur if a Kafka broker is shut down while it is the leader 
> of the affected partition in Kafka's internal __consumer_offsets topic.
> This is expected behaviour. But once the broker is started up again, the 
> next checkpoint issued by Flink should commit the offsets processed in the 
> meantime back to Kafka. In Flink 1.15.0 this no longer seems to happen 
> reliably and offset committing stays broken. A warning like the following 
> is logged on each checkpoint:
> {code}
> [info] 14:33:13.684 WARN  [Source Data Fetcher for Source: input-kafka-source 
> -> Sink: output-stdout-sink (1/1)#1] o.a.f.c.k.s.reader.KafkaSourceReader - 
> Failed to commit consumer offsets for checkpoint 35
> [info] org.apache.kafka.clients.consumer.RetriableCommitFailedException: 
> Offset commit failed with a retriable exception. You should retry committing 
> the latest consumed offsets.
> [info] Caused by: 
> org.apache.kafka.common.errors.CoordinatorNotAvailableException: The 
> coordinator is not available.
> {code}
> To reproduce this I've attached a small Flink job program. To execute it, 
> Java 8, Scala sbt and docker / docker-compose are required. Also see readme.md 
> for more details.
> The job can be run with `sbt run`; the Kafka cluster is started by 
> `docker-compose up`. If the Kafka brokers are then restarted gracefully, 
> e.g. `docker-compose stop kafka1` and `docker-compose start kafka1`, followed 
> by kafka2 and kafka3, this warning will occur and no offsets will be 
> committed to Kafka.
> This is not reproducible in Flink 1.14.4.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-28731) Logging of global config should take dynamic properties into account

2022-08-09 Thread Chesnay Schepler (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-28731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chesnay Schepler updated FLINK-28731:
-
Priority: Minor  (was: Major)

> Logging of global config should take dynamic properties into account
> 
>
> Key: FLINK-28731
> URL: https://issues.apache.org/jira/browse/FLINK-28731
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Configuration
>Reporter: Chesnay Schepler
>Assignee: Chesnay Schepler
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.16.0
>
>
> When a Flink process is started, the first thing it does is load the global 
> configuration from the flink-conf.yaml and log what was read.
> When additional options are set by the user via dynamic properties, 
> they are not reflected in the logging of the global configuration.
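
Conceptually the fix boils down to overlaying the parsed dynamic properties
before the configuration is read and logged; a hedged sketch using the existing
{{GlobalConfiguration}} overload (the option key below is only an example):

{code:java}
import org.apache.flink.configuration.Configuration;
import org.apache.flink.configuration.GlobalConfiguration;

class ConfigLoggingSketch {
    static Configuration load(String configDir) {
        // Dynamic properties parsed from the command line (-D key=value).
        Configuration dynamicProperties = new Configuration();
        dynamicProperties.setString("taskmanager.numberOfTaskSlots", "4");

        // Loads flink-conf.yaml and overlays the dynamic properties, so the
        // configuration that gets logged reflects both sources.
        return GlobalConfiguration.loadConfiguration(configDir, dynamicProperties);
    }
}
{code}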



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (FLINK-28731) Logging of global config should take dynamic properties into account

2022-08-09 Thread Chesnay Schepler (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-28731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chesnay Schepler closed FLINK-28731.

Resolution: Fixed

master: 6b20433e7f007a97a1a958ef28e1c8f58277d940

> Logging of global config should take dynamic properties into account
> 
>
> Key: FLINK-28731
> URL: https://issues.apache.org/jira/browse/FLINK-28731
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Configuration
>Reporter: Chesnay Schepler
>Assignee: Chesnay Schepler
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.16.0
>
>
> When a Flink process is started, the first thing it does is load the global 
> configuration from the flink-conf.yaml and log what was read.
> When additional options are set by the user via dynamic properties, 
> they are not reflected in the logging of the global configuration.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-28060) Kafka Commit on checkpointing fails repeatedly after a broker restart

2022-08-09 Thread Chesnay Schepler (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-28060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17577368#comment-17577368
 ] 

Chesnay Schepler commented on FLINK-28060:
--

I ran CI with Kafka 3.2.1 and it passed without requiring any other changes.
https://dev.azure.com/chesnay/flink/_build/results?buildId=2944&view=results

We could think about including it in 1.16.0, although I'm not too fond of 
bumping dependencies so close before the feature freeze. (To be fair though, we 
would have a few weeks to observe it).

> Kafka Commit on checkpointing fails repeatedly after a broker restart
> -
>
> Key: FLINK-28060
> URL: https://issues.apache.org/jira/browse/FLINK-28060
> Project: Flink
>  Issue Type: Bug
>  Components: API / DataStream, Connectors / Kafka
>Affects Versions: 1.15.0
> Environment: Reproduced on MacOS and Linux.
> Using java 8, Flink 1.15.0, Kafka 2.8.1.
>Reporter: Christian Lorenz
>Priority: Major
>  Labels: pull-request-available
> Attachments: flink-kafka-testjob.zip
>
>
> When Kafka offset committing is enabled and tied to Flink's checkpointing, an 
> error might occur if a Kafka broker is shut down while it is the leader 
> of the affected partition in Kafka's internal __consumer_offsets topic.
> This is expected behaviour. But once the broker is started up again, the 
> next checkpoint issued by Flink should commit the offsets processed in the 
> meantime back to Kafka. In Flink 1.15.0 this no longer seems to happen 
> reliably and offset committing stays broken. A warning like the following 
> is logged on each checkpoint:
> {code}
> [info] 14:33:13.684 WARN  [Source Data Fetcher for Source: input-kafka-source 
> -> Sink: output-stdout-sink (1/1)#1] o.a.f.c.k.s.reader.KafkaSourceReader - 
> Failed to commit consumer offsets for checkpoint 35
> [info] org.apache.kafka.clients.consumer.RetriableCommitFailedException: 
> Offset commit failed with a retriable exception. You should retry committing 
> the latest consumed offsets.
> [info] Caused by: 
> org.apache.kafka.common.errors.CoordinatorNotAvailableException: The 
> coordinator is not available.
> {code}
> To reproduce this I've attached a small Flink job program. To execute it, 
> Java 8, Scala sbt and docker / docker-compose are required. Also see readme.md 
> for more details.
> The job can be run with `sbt run`; the Kafka cluster is started by 
> `docker-compose up`. If the Kafka brokers are then restarted gracefully, 
> e.g. `docker-compose stop kafka1` and `docker-compose start kafka1`, followed 
> by kafka2 and kafka3, this warning will occur and no offsets will be 
> committed to Kafka.
> This is not reproducible in Flink 1.14.4.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (FLINK-28811) Enable Java 11 tests for Hbase 2

2022-08-09 Thread Chesnay Schepler (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-28811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chesnay Schepler closed FLINK-28811.

Resolution: Fixed

master: 5fb135e23e3251bcacf19d8db8cf358ddda76d6e

> Enable Java 11 tests for Hbase 2
> 
>
> Key: FLINK-28811
> URL: https://issues.apache.org/jira/browse/FLINK-28811
> Project: Flink
>  Issue Type: Sub-task
>  Components: Connectors / HBase
>Reporter: Chesnay Schepler
>Assignee: Chesnay Schepler
>Priority: Major
> Fix For: 1.16.0
>
>
> AFAICT the HBase 2.2 unit/IT cases succeed on Java 11.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-28352) [Umbrella] Make Pulsar connector stable

2022-08-09 Thread Chesnay Schepler (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-28352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chesnay Schepler updated FLINK-28352:
-
Priority: Critical  (was: Blocker)

> [Umbrella] Make Pulsar connector stable
> ---
>
> Key: FLINK-28352
> URL: https://issues.apache.org/jira/browse/FLINK-28352
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Pulsar
>Affects Versions: 1.16.0, 1.15.2, 1.14.6
>Reporter: Martijn Visser
>Assignee: Yufan Sheng
>Priority: Critical
>  Labels: test-stability
> Fix For: 1.16.0
>
>
> This ticket is an umbrella ticket to keep track of all currently known Pulsar 
> connector test instabilities. These need to be resolved as soon as possible 
> and before other new Pulsar features can be added. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-28865) Add updated Print sink for new interfaces

2022-08-08 Thread Chesnay Schepler (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-28865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chesnay Schepler updated FLINK-28865:
-
Description: The built-in print sink still uses the old sink interfaces. 
Add a new implementation for the new sink interfaces.  (was: The built-in print 
sink still uses the old sink interfaces. Add a new implementation for the new 
sink interfaces and deprecate the old sink.)

> Add updated Print sink for new interfaces
> -
>
> Key: FLINK-28865
> URL: https://issues.apache.org/jira/browse/FLINK-28865
> Project: Flink
>  Issue Type: Improvement
>  Components: API / DataStream
>Reporter: Chesnay Schepler
>Assignee: Chesnay Schepler
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.16.0
>
>
> The built-in print sink still uses the old sink interfaces. Add a new 
> implementation for the new sink interfaces.
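
For illustration, a bare-bones sketch of what a print sink on the new unified
sink interfaces ({{org.apache.flink.api.connector.sink2}}) could look like;
this is a sketch only, not the implementation that was merged:

{code:java}
import org.apache.flink.api.connector.sink2.Sink;
import org.apache.flink.api.connector.sink2.SinkWriter;

class PrintSinkSketch<T> implements Sink<T> {
    @Override
    public SinkWriter<T> createWriter(InitContext context) {
        return new SinkWriter<T>() {
            @Override
            public void write(T element, Context ctx) {
                // The real sink also supports a prefix and printing to stderr.
                System.out.println(element);
            }

            @Override
            public void flush(boolean endOfInput) {
                // Nothing is buffered, so there is nothing to flush.
            }

            @Override
            public void close() {}
        };
    }
}
{code}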



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (FLINK-28865) Add updated Print sink for new interfaces

2022-08-08 Thread Chesnay Schepler (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-28865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chesnay Schepler closed FLINK-28865.

Resolution: Fixed

master: aaea1adc155122f066736a4e2a4a287a40a77969

> Add updated Print sink for new interfaces
> -
>
> Key: FLINK-28865
> URL: https://issues.apache.org/jira/browse/FLINK-28865
> Project: Flink
>  Issue Type: Improvement
>  Components: API / DataStream
>Reporter: Chesnay Schepler
>Assignee: Chesnay Schepler
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.16.0
>
>
> The built-in print sink still uses the old sink interfaces. Add a new 
> implementation for the new sink interfaces and deprecate the old sink.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-28874) Python tests fail locally with flink-connector-hive_2.12 is missing jhyde pentaho-aggdesigner-algorithm:jar:5.1.5-jhyde

2022-08-08 Thread Chesnay Schepler (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-28874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17576907#comment-17576907
 ] 

Chesnay Schepler commented on FLINK-28874:
--

You should never run that script locally outside of a flink-ci docker image.

If you want to run the python tests locally, call 
{{flink-python/dev/lint-python.sh}}.

> Python tests fail locally with flink-connector-hive_2.12 is missing jhyde 
> pentaho-aggdesigner-algorithm:jar:5.1.5-jhyde 
> 
>
> Key: FLINK-28874
> URL: https://issues.apache.org/jira/browse/FLINK-28874
> Project: Flink
>  Issue Type: Bug
>  Components: Test Infrastructure, Tests
>Reporter: Sergey Nuyanzin
>Priority: Minor
>  Labels: pull-request-available
>
> This command fails: {{./tools/ci/test_controller.sh python}}
> The reason is that a newer Maven version blocks HTTP repositories.
> There were similar issues e.g. 
> https://issues.apache.org/jira/browse/FLINK-27640
> The suggestion is to use maven wrapper in {{run_mvn}} in 
> {{tools/ci/maven-utils.sh}}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (FLINK-28621) Register Java 8 modules in all internal object mappers

2022-08-08 Thread Chesnay Schepler (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-28621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chesnay Schepler closed FLINK-28621.

Fix Version/s: (was: 1.15.2)
   Resolution: Fixed

master:
9a967c010a58e0b2277516068256ef45ee711edc
c0bf0ac3fb1fe4814bff09807ed2040bb13da052
328007f0b9a3e4da31b20e75b94d9c339b168af0

> Register Java 8 modules in all internal object mappers
> --
>
> Key: FLINK-28621
> URL: https://issues.apache.org/jira/browse/FLINK-28621
> Project: Flink
>  Issue Type: Improvement
>  Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile)
>Reporter: Chesnay Schepler
>Assignee: Chesnay Schepler
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.16.0
>
>
> In FLINK-25588 we extended flink-shaded-jackson to also bundle the Jackson 
> extensions for handling Java 8 time / optional classes, but barely any of the 
> internal object mappers were adjusted to register said modules.
> We can improve the user experience by always registering these modules (in 
> cases where users can provide a mapper), and solve some incompatibilities 
> elsewhere (like the JsonNodeDeserializationSchema).
>  
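
With plain (unshaded) Jackson the registration amounts to the following; in
Flink the same classes are used via the flink-shaded-jackson relocated
packages:

{code:java}
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.datatype.jdk8.Jdk8Module;
import com.fasterxml.jackson.datatype.jsr310.JavaTimeModule;

class MapperFactory {
    static ObjectMapper createMapper() {
        return new ObjectMapper()
                .registerModule(new JavaTimeModule())  // java.time.* support
                .registerModule(new Jdk8Module());     // java.util.Optional support
    }
}
{code}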



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (FLINK-28333) GlueSchemaRegistryAvroKinesisITCase is being Ignored due to `Access key not configured`

2022-08-08 Thread Chesnay Schepler (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-28333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chesnay Schepler closed FLINK-28333.

Resolution: Not A Problem

> GlueSchemaRegistryAvroKinesisITCase is being Ignored due to `Access key not 
> configured`
> ---
>
> Key: FLINK-28333
> URL: https://issues.apache.org/jira/browse/FLINK-28333
> Project: Flink
>  Issue Type: Bug
>  Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile), Tests
>Affects Versions: 1.15.0
>Reporter: Ahmed Hamdy
>Assignee: Danny Cranmer
>Priority: Major
>
> h1. Description
> {{GlueSchemaRegistryAvroKinesisITCase}} test is not being run on CI and is 
> skipped due to {{Access key not configured}}. 
> Access Key and Secret Key should be added to the test environment variables 
> to enable the test.
> Currently, on adding these keys to the environment variables, the test fails 
> with 
> {quote}AWSSchemaRegistryException: Exception occurred while fetching or 
> registering schema definition = 
> {"type":"record","name":"User","namespace":"org.apache.flink.glue.schema.registry.test","fields":[{"name":"name","type":"string"},{"name":"favorite_number","type":["int","null"]},{"name":"favorite_color","type":["string","null"]}]},
>  schema name = gsr_avro_input_stream 
>   at 
> com.amazonaws.services.schemaregistry.common.AWSSchemaRegistryClient.getORRegisterSchemaVersionId(AWSSchemaRegistryClient.java:202){quote}
> These tests should be enabled and fixed.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (FLINK-28675) Avro Schemas should eagerly validate that class is SpecificRecord

2022-08-08 Thread Chesnay Schepler (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-28675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chesnay Schepler closed FLINK-28675.

Fix Version/s: (was: 1.16.0)
   Resolution: Invalid

This is already enforced by generics.

> Avro Schemas should eagerly validate that class is SpecificRecord
> -
>
> Key: FLINK-28675
> URL: https://issues.apache.org/jira/browse/FLINK-28675
> Project: Flink
>  Issue Type: Improvement
>  Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile)
>Affects Versions: 1.15.1
>Reporter: Chesnay Schepler
>Priority: Major
>
> The AvroDeserializationSchema supports both generic and specific records, 
> with dedicated factory methods.
> It does, however, not validate in any way whether the classes passed to the 
> factory methods are actually generic/specific records respectively, which 
> can result in Flink attempting to read generic records (and failing with an 
> NPE) even though the user told us to read specific records.
> We should validate this eagerly and fail early.
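
The eager check the ticket asked for would have amounted to something like
this sketch (as the resolution notes, the generic factory signatures already
enforce it at compile time):

{code:java}
import org.apache.avro.specific.SpecificRecord;

class AvroSchemaPreconditions {
    static void validateIsSpecificRecord(Class<?> recordClazz) {
        if (!SpecificRecord.class.isAssignableFrom(recordClazz)) {
            throw new IllegalArgumentException(
                    recordClazz.getName() + " is not an Avro SpecificRecord; "
                            + "use the GenericRecord factory methods instead.");
        }
    }
}
{code}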



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-28865) Add updated Print sink for new interfaces

2022-08-08 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-28865:


 Summary: Add updated Print sink for new interfaces
 Key: FLINK-28865
 URL: https://issues.apache.org/jira/browse/FLINK-28865
 Project: Flink
  Issue Type: Improvement
  Components: API / DataStream
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler
 Fix For: 1.16.0


The built-in print sink still uses the old sink interfaces. Add a new 
implementation for the new sink interfaces and deprecate the old sink.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (FLINK-28844) YARNHighAvailabilityITCase fails with NoSuchMethod of org.apache.curator

2022-08-08 Thread Chesnay Schepler (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-28844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chesnay Schepler closed FLINK-28844.

Resolution: Fixed

master: caef5b7a5c1b980cff926fec5dce3a34ad1b354f

> YARNHighAvailabilityITCase fails with NoSuchMethod of org.apache.curator
> 
>
> Key: FLINK-28844
> URL: https://issues.apache.org/jira/browse/FLINK-28844
> Project: Flink
>  Issue Type: Bug
>  Components: Test Infrastructure
>Affects Versions: 1.16.0
>Reporter: Jark Wu
>Assignee: Chesnay Schepler
>Priority: Blocker
> Fix For: 1.16.0
>
>
> This keeps failing on master since the commit 
> https://github.com/flink-ci/flink-mirror/commit/6335b573863af2b30a6541f910be96ddf61f9c84
>  which removed the curator-test dependency from the flink-test-utils module. 
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=39394&view=logs&j=fc5181b0-e452-5c8f-68de-1097947f6483&t=995c650b-6573-581c-9ce6-7ad4cc038461
> {code}
> 2022-08-05T18:31:47.0438160Z Aug 05 18:31:47 [ERROR] Tests run: 1, Failures: 
> 0, Errors: 1, Skipped: 0, Time elapsed: 5.237 s <<< FAILURE! - in 
> org.apache.flink.yarn.YARNHighAvailabilityITCase
> 2022-08-05T18:31:47.0439564Z Aug 05 18:31:47 [ERROR] 
> org.apache.flink.yarn.YARNHighAvailabilityITCase  Time elapsed: 5.237 s  <<< 
> ERROR!
> 2022-08-05T18:31:47.0440370Z Aug 05 18:31:47 java.lang.NoSuchMethodError: 
> org.apache.curator.test.InstanceSpec.getHostname()Ljava/lang/String;
> 2022-08-05T18:31:47.0441582Z Aug 05 18:31:47  at 
> org.apache.flink.runtime.testutils.ZooKeeperTestUtils.getZookeeperInstanceSpecWithIncreasedSessionTimeout(ZooKeeperTestUtils.java:71)
> 2022-08-05T18:31:47.0442643Z Aug 05 18:31:47  at 
> org.apache.flink.runtime.testutils.ZooKeeperTestUtils.createAndStartZookeeperTestingServer(ZooKeeperTestUtils.java:49)
> 2022-08-05T18:31:47.0443461Z Aug 05 18:31:47  at 
> org.apache.flink.yarn.YARNHighAvailabilityITCase.setup(YARNHighAvailabilityITCase.java:114)
> 2022-08-05T18:31:47.0444094Z Aug 05 18:31:47  at 
> sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 2022-08-05T18:31:47.0444717Z Aug 05 18:31:47  at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> 2022-08-05T18:31:47.0445424Z Aug 05 18:31:47  at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 2022-08-05T18:31:47.0446063Z Aug 05 18:31:47  at 
> java.lang.reflect.Method.invoke(Method.java:498)
> 2022-08-05T18:31:47.0446818Z Aug 05 18:31:47  at 
> org.junit.platform.commons.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:725)
> 2022-08-05T18:31:47.0447822Z Aug 05 18:31:47  at 
> org.junit.jupiter.engine.execution.MethodInvocation.proceed(MethodInvocation.java:60)
> 2022-08-05T18:31:47.0448657Z Aug 05 18:31:47  at 
> org.junit.jupiter.engine.execution.InvocationInterceptorChain$ValidatingInvocation.proceed(InvocationInterceptorChain.java:131)
> 2022-08-05T18:31:47.0449692Z Aug 05 18:31:47  at 
> org.junit.jupiter.engine.extension.TimeoutExtension.intercept(TimeoutExtension.java:149)
> 2022-08-05T18:31:47.0450637Z Aug 05 18:31:47  at 
> org.junit.jupiter.engine.extension.TimeoutExtension.interceptLifecycleMethod(TimeoutExtension.java:126)
> 2022-08-05T18:31:47.0451443Z Aug 05 18:31:47  at 
> org.junit.jupiter.engine.extension.TimeoutExtension.interceptBeforeAllMethod(TimeoutExtension.java:68)
> 2022-08-05T18:31:47.0452304Z Aug 05 18:31:47  at 
> org.junit.jupiter.engine.execution.ExecutableInvoker$ReflectiveInterceptorCall.lambda$ofVoidMethod$0(ExecutableInvoker.java:115)
> 2022-08-05T18:31:47.0453162Z Aug 05 18:31:47  at 
> org.junit.jupiter.engine.execution.ExecutableInvoker.lambda$invoke$0(ExecutableInvoker.java:105)
> 2022-08-05T18:31:47.0454013Z Aug 05 18:31:47  at 
> org.junit.jupiter.engine.execution.InvocationInterceptorChain$InterceptedInvocation.proceed(InvocationInterceptorChain.java:106)
> 2022-08-05T18:31:47.0454882Z Aug 05 18:31:47  at 
> org.junit.jupiter.engine.execution.InvocationInterceptorChain.proceed(InvocationInterceptorChain.java:64)
> 2022-08-05T18:31:47.0455716Z Aug 05 18:31:47  at 
> org.junit.jupiter.engine.execution.InvocationInterceptorChain.chainAndInvoke(InvocationInterceptorChain.java:45)
> 2022-08-05T18:31:47.0456525Z Aug 05 18:31:47  at 
> org.junit.jupiter.engine.execution.InvocationInterceptorChain.invoke(InvocationInterceptorChain.java:37)
> 2022-08-05T18:31:47.0457512Z Aug 05 18:31:47  at 
> org.junit.jupiter.engine.execution.ExecutableInvoker.invoke(ExecutableInvoker.java:104)
> 2022-08-05T18:31:47.0458637Z Aug 05 18:31:47  at 
> org.junit.jupiter.engine.execution.ExecutableInvoker.invoke(ExecutableInvoker.java:98)
> 2022-08-05T18:31:47.0459742Z Aug 05 18:31:47  at 
> org.junit.jupiter.engine.descriptor.ClassBasedTest

[jira] [Updated] (FLINK-28844) YARNHighAvailabilityITCase fails with NoSuchMethod of org.apache.curator

2022-08-07 Thread Chesnay Schepler (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-28844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chesnay Schepler updated FLINK-28844:
-
Affects Version/s: 1.16.0

> YARNHighAvailabilityITCase fails with NoSuchMethod of org.apache.curator
> 
>
> Key: FLINK-28844
> URL: https://issues.apache.org/jira/browse/FLINK-28844
> Project: Flink
>  Issue Type: Bug
>  Components: Test Infrastructure
>Affects Versions: 1.16.0
>Reporter: Jark Wu
>Assignee: Chesnay Schepler
>Priority: Blocker
> Fix For: 1.16.0
>
>
> This keeps failing on master since the commit 
> https://github.com/flink-ci/flink-mirror/commit/6335b573863af2b30a6541f910be96ddf61f9c84
>  which removed the curator-test dependency from the flink-test-utils module. 
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=39394&view=logs&j=fc5181b0-e452-5c8f-68de-1097947f6483&t=995c650b-6573-581c-9ce6-7ad4cc038461
> {code}
> 2022-08-05T18:31:47.0438160Z Aug 05 18:31:47 [ERROR] Tests run: 1, Failures: 
> 0, Errors: 1, Skipped: 0, Time elapsed: 5.237 s <<< FAILURE! - in 
> org.apache.flink.yarn.YARNHighAvailabilityITCase
> 2022-08-05T18:31:47.0439564Z Aug 05 18:31:47 [ERROR] 
> org.apache.flink.yarn.YARNHighAvailabilityITCase  Time elapsed: 5.237 s  <<< 
> ERROR!
> 2022-08-05T18:31:47.0440370Z Aug 05 18:31:47 java.lang.NoSuchMethodError: 
> org.apache.curator.test.InstanceSpec.getHostname()Ljava/lang/String;
> 2022-08-05T18:31:47.0441582Z Aug 05 18:31:47  at 
> org.apache.flink.runtime.testutils.ZooKeeperTestUtils.getZookeeperInstanceSpecWithIncreasedSessionTimeout(ZooKeeperTestUtils.java:71)
> 2022-08-05T18:31:47.0442643Z Aug 05 18:31:47  at 
> org.apache.flink.runtime.testutils.ZooKeeperTestUtils.createAndStartZookeeperTestingServer(ZooKeeperTestUtils.java:49)
> 2022-08-05T18:31:47.0443461Z Aug 05 18:31:47  at 
> org.apache.flink.yarn.YARNHighAvailabilityITCase.setup(YARNHighAvailabilityITCase.java:114)
> 2022-08-05T18:31:47.0444094Z Aug 05 18:31:47  at 
> sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 2022-08-05T18:31:47.0444717Z Aug 05 18:31:47  at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> 2022-08-05T18:31:47.0445424Z Aug 05 18:31:47  at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 2022-08-05T18:31:47.0446063Z Aug 05 18:31:47  at 
> java.lang.reflect.Method.invoke(Method.java:498)
> 2022-08-05T18:31:47.0446818Z Aug 05 18:31:47  at 
> org.junit.platform.commons.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:725)
> 2022-08-05T18:31:47.0447822Z Aug 05 18:31:47  at 
> org.junit.jupiter.engine.execution.MethodInvocation.proceed(MethodInvocation.java:60)
> 2022-08-05T18:31:47.0448657Z Aug 05 18:31:47  at 
> org.junit.jupiter.engine.execution.InvocationInterceptorChain$ValidatingInvocation.proceed(InvocationInterceptorChain.java:131)
> 2022-08-05T18:31:47.0449692Z Aug 05 18:31:47  at 
> org.junit.jupiter.engine.extension.TimeoutExtension.intercept(TimeoutExtension.java:149)
> 2022-08-05T18:31:47.0450637Z Aug 05 18:31:47  at 
> org.junit.jupiter.engine.extension.TimeoutExtension.interceptLifecycleMethod(TimeoutExtension.java:126)
> 2022-08-05T18:31:47.0451443Z Aug 05 18:31:47  at 
> org.junit.jupiter.engine.extension.TimeoutExtension.interceptBeforeAllMethod(TimeoutExtension.java:68)
> 2022-08-05T18:31:47.0452304Z Aug 05 18:31:47  at 
> org.junit.jupiter.engine.execution.ExecutableInvoker$ReflectiveInterceptorCall.lambda$ofVoidMethod$0(ExecutableInvoker.java:115)
> 2022-08-05T18:31:47.0453162Z Aug 05 18:31:47  at 
> org.junit.jupiter.engine.execution.ExecutableInvoker.lambda$invoke$0(ExecutableInvoker.java:105)
> 2022-08-05T18:31:47.0454013Z Aug 05 18:31:47  at 
> org.junit.jupiter.engine.execution.InvocationInterceptorChain$InterceptedInvocation.proceed(InvocationInterceptorChain.java:106)
> 2022-08-05T18:31:47.0454882Z Aug 05 18:31:47  at 
> org.junit.jupiter.engine.execution.InvocationInterceptorChain.proceed(InvocationInterceptorChain.java:64)
> 2022-08-05T18:31:47.0455716Z Aug 05 18:31:47  at 
> org.junit.jupiter.engine.execution.InvocationInterceptorChain.chainAndInvoke(InvocationInterceptorChain.java:45)
> 2022-08-05T18:31:47.0456525Z Aug 05 18:31:47  at 
> org.junit.jupiter.engine.execution.InvocationInterceptorChain.invoke(InvocationInterceptorChain.java:37)
> 2022-08-05T18:31:47.0457512Z Aug 05 18:31:47  at 
> org.junit.jupiter.engine.execution.ExecutableInvoker.invoke(ExecutableInvoker.java:104)
> 2022-08-05T18:31:47.0458637Z Aug 05 18:31:47  at 
> org.junit.jupiter.engine.execution.ExecutableInvoker.invoke(ExecutableInvoker.java:98)
> 2022-08-05T18:31:47.0459742Z Aug 05 18:31:47  at 
> org.junit.jupiter.engine.descriptor.ClassBasedTestDescriptor.lambda$invokeBeforeAllMethods

[jira] [Commented] (FLINK-28844) YARNHighAvailabilityITCase fails with NoSuchMethod of org.apache.curator

2022-08-07 Thread Chesnay Schepler (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-28844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17576370#comment-17576370
 ] 

Chesnay Schepler commented on FLINK-28844:
--

No clue why this happens but it should be easy to fix at least.

> YARNHighAvailabilityITCase fails with NoSuchMethod of org.apache.curator
> 
>
> Key: FLINK-28844
> URL: https://issues.apache.org/jira/browse/FLINK-28844
> Project: Flink
>  Issue Type: Bug
>  Components: Test Infrastructure
>Reporter: Jark Wu
>Assignee: Chesnay Schepler
>Priority: Blocker
> Fix For: 1.16.0
>
>
> This keeps failing on master since the commit 
> https://github.com/flink-ci/flink-mirror/commit/6335b573863af2b30a6541f910be96ddf61f9c84
>  which removed the curator-test dependency from the flink-test-utils module. 
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=39394&view=logs&j=fc5181b0-e452-5c8f-68de-1097947f6483&t=995c650b-6573-581c-9ce6-7ad4cc038461
> {code}
> 2022-08-05T18:31:47.0438160Z Aug 05 18:31:47 [ERROR] Tests run: 1, Failures: 
> 0, Errors: 1, Skipped: 0, Time elapsed: 5.237 s <<< FAILURE! - in 
> org.apache.flink.yarn.YARNHighAvailabilityITCase
> 2022-08-05T18:31:47.0439564Z Aug 05 18:31:47 [ERROR] 
> org.apache.flink.yarn.YARNHighAvailabilityITCase  Time elapsed: 5.237 s  <<< 
> ERROR!
> 2022-08-05T18:31:47.0440370Z Aug 05 18:31:47 java.lang.NoSuchMethodError: 
> org.apache.curator.test.InstanceSpec.getHostname()Ljava/lang/String;
> 2022-08-05T18:31:47.0441582Z Aug 05 18:31:47  at 
> org.apache.flink.runtime.testutils.ZooKeeperTestUtils.getZookeeperInstanceSpecWithIncreasedSessionTimeout(ZooKeeperTestUtils.java:71)
> 2022-08-05T18:31:47.0442643Z Aug 05 18:31:47  at 
> org.apache.flink.runtime.testutils.ZooKeeperTestUtils.createAndStartZookeeperTestingServer(ZooKeeperTestUtils.java:49)
> 2022-08-05T18:31:47.0443461Z Aug 05 18:31:47  at 
> org.apache.flink.yarn.YARNHighAvailabilityITCase.setup(YARNHighAvailabilityITCase.java:114)
> 2022-08-05T18:31:47.0444094Z Aug 05 18:31:47  at 
> sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 2022-08-05T18:31:47.0444717Z Aug 05 18:31:47  at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> 2022-08-05T18:31:47.0445424Z Aug 05 18:31:47  at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 2022-08-05T18:31:47.0446063Z Aug 05 18:31:47  at 
> java.lang.reflect.Method.invoke(Method.java:498)
> 2022-08-05T18:31:47.0446818Z Aug 05 18:31:47  at 
> org.junit.platform.commons.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:725)
> 2022-08-05T18:31:47.0447822Z Aug 05 18:31:47  at 
> org.junit.jupiter.engine.execution.MethodInvocation.proceed(MethodInvocation.java:60)
> 2022-08-05T18:31:47.0448657Z Aug 05 18:31:47  at 
> org.junit.jupiter.engine.execution.InvocationInterceptorChain$ValidatingInvocation.proceed(InvocationInterceptorChain.java:131)
> 2022-08-05T18:31:47.0449692Z Aug 05 18:31:47  at 
> org.junit.jupiter.engine.extension.TimeoutExtension.intercept(TimeoutExtension.java:149)
> 2022-08-05T18:31:47.0450637Z Aug 05 18:31:47  at 
> org.junit.jupiter.engine.extension.TimeoutExtension.interceptLifecycleMethod(TimeoutExtension.java:126)
> 2022-08-05T18:31:47.0451443Z Aug 05 18:31:47  at 
> org.junit.jupiter.engine.extension.TimeoutExtension.interceptBeforeAllMethod(TimeoutExtension.java:68)
> 2022-08-05T18:31:47.0452304Z Aug 05 18:31:47  at 
> org.junit.jupiter.engine.execution.ExecutableInvoker$ReflectiveInterceptorCall.lambda$ofVoidMethod$0(ExecutableInvoker.java:115)
> 2022-08-05T18:31:47.0453162Z Aug 05 18:31:47  at 
> org.junit.jupiter.engine.execution.ExecutableInvoker.lambda$invoke$0(ExecutableInvoker.java:105)
> 2022-08-05T18:31:47.0454013Z Aug 05 18:31:47  at 
> org.junit.jupiter.engine.execution.InvocationInterceptorChain$InterceptedInvocation.proceed(InvocationInterceptorChain.java:106)
> 2022-08-05T18:31:47.0454882Z Aug 05 18:31:47  at 
> org.junit.jupiter.engine.execution.InvocationInterceptorChain.proceed(InvocationInterceptorChain.java:64)
> 2022-08-05T18:31:47.0455716Z Aug 05 18:31:47  at 
> org.junit.jupiter.engine.execution.InvocationInterceptorChain.chainAndInvoke(InvocationInterceptorChain.java:45)
> 2022-08-05T18:31:47.0456525Z Aug 05 18:31:47  at 
> org.junit.jupiter.engine.execution.InvocationInterceptorChain.invoke(InvocationInterceptorChain.java:37)
> 2022-08-05T18:31:47.0457512Z Aug 05 18:31:47  at 
> org.junit.jupiter.engine.execution.ExecutableInvoker.invoke(ExecutableInvoker.java:104)
> 2022-08-05T18:31:47.0458637Z Aug 05 18:31:47  at 
> org.junit.jupiter.engine.execution.ExecutableInvoker.invoke(ExecutableInvoker.java:98)
> 2022-08-05T18:31:47.0459742Z Aug 05 18:31:47  at 
> org.junit.jupiter.engine.des

[jira] [Assigned] (FLINK-28844) YARNHighAvailabilityITCase fails with NoSuchMethod of org.apache.curator

2022-08-07 Thread Chesnay Schepler (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-28844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chesnay Schepler reassigned FLINK-28844:


Assignee: Chesnay Schepler

> YARNHighAvailabilityITCase fails with NoSuchMethod of org.apache.curator
> 
>
> Key: FLINK-28844
> URL: https://issues.apache.org/jira/browse/FLINK-28844
> Project: Flink
>  Issue Type: Bug
>  Components: Test Infrastructure
>Reporter: Jark Wu
>Assignee: Chesnay Schepler
>Priority: Blocker
> Fix For: 1.16.0
>
>
> This keeps failing on master since the commit 
> https://github.com/flink-ci/flink-mirror/commit/6335b573863af2b30a6541f910be96ddf61f9c84
>  which removed the curator-test dependency from the flink-test-utils module. 
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=39394&view=logs&j=fc5181b0-e452-5c8f-68de-1097947f6483&t=995c650b-6573-581c-9ce6-7ad4cc038461
> {code}
> 2022-08-05T18:31:47.0438160Z Aug 05 18:31:47 [ERROR] Tests run: 1, Failures: 
> 0, Errors: 1, Skipped: 0, Time elapsed: 5.237 s <<< FAILURE! - in 
> org.apache.flink.yarn.YARNHighAvailabilityITCase
> 2022-08-05T18:31:47.0439564Z Aug 05 18:31:47 [ERROR] 
> org.apache.flink.yarn.YARNHighAvailabilityITCase  Time elapsed: 5.237 s  <<< 
> ERROR!
> 2022-08-05T18:31:47.0440370Z Aug 05 18:31:47 java.lang.NoSuchMethodError: 
> org.apache.curator.test.InstanceSpec.getHostname()Ljava/lang/String;
> 2022-08-05T18:31:47.0441582Z Aug 05 18:31:47  at 
> org.apache.flink.runtime.testutils.ZooKeeperTestUtils.getZookeeperInstanceSpecWithIncreasedSessionTimeout(ZooKeeperTestUtils.java:71)
> 2022-08-05T18:31:47.0442643Z Aug 05 18:31:47  at 
> org.apache.flink.runtime.testutils.ZooKeeperTestUtils.createAndStartZookeeperTestingServer(ZooKeeperTestUtils.java:49)
> 2022-08-05T18:31:47.0443461Z Aug 05 18:31:47  at 
> org.apache.flink.yarn.YARNHighAvailabilityITCase.setup(YARNHighAvailabilityITCase.java:114)
> 2022-08-05T18:31:47.0444094Z Aug 05 18:31:47  at 
> sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 2022-08-05T18:31:47.0444717Z Aug 05 18:31:47  at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> 2022-08-05T18:31:47.0445424Z Aug 05 18:31:47  at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 2022-08-05T18:31:47.0446063Z Aug 05 18:31:47  at 
> java.lang.reflect.Method.invoke(Method.java:498)
> 2022-08-05T18:31:47.0446818Z Aug 05 18:31:47  at 
> org.junit.platform.commons.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:725)
> 2022-08-05T18:31:47.0447822Z Aug 05 18:31:47  at 
> org.junit.jupiter.engine.execution.MethodInvocation.proceed(MethodInvocation.java:60)
> 2022-08-05T18:31:47.0448657Z Aug 05 18:31:47  at 
> org.junit.jupiter.engine.execution.InvocationInterceptorChain$ValidatingInvocation.proceed(InvocationInterceptorChain.java:131)
> 2022-08-05T18:31:47.0449692Z Aug 05 18:31:47  at 
> org.junit.jupiter.engine.extension.TimeoutExtension.intercept(TimeoutExtension.java:149)
> 2022-08-05T18:31:47.0450637Z Aug 05 18:31:47  at 
> org.junit.jupiter.engine.extension.TimeoutExtension.interceptLifecycleMethod(TimeoutExtension.java:126)
> 2022-08-05T18:31:47.0451443Z Aug 05 18:31:47  at 
> org.junit.jupiter.engine.extension.TimeoutExtension.interceptBeforeAllMethod(TimeoutExtension.java:68)
> 2022-08-05T18:31:47.0452304Z Aug 05 18:31:47  at 
> org.junit.jupiter.engine.execution.ExecutableInvoker$ReflectiveInterceptorCall.lambda$ofVoidMethod$0(ExecutableInvoker.java:115)
> 2022-08-05T18:31:47.0453162Z Aug 05 18:31:47  at 
> org.junit.jupiter.engine.execution.ExecutableInvoker.lambda$invoke$0(ExecutableInvoker.java:105)
> 2022-08-05T18:31:47.0454013Z Aug 05 18:31:47  at 
> org.junit.jupiter.engine.execution.InvocationInterceptorChain$InterceptedInvocation.proceed(InvocationInterceptorChain.java:106)
> 2022-08-05T18:31:47.0454882Z Aug 05 18:31:47  at 
> org.junit.jupiter.engine.execution.InvocationInterceptorChain.proceed(InvocationInterceptorChain.java:64)
> 2022-08-05T18:31:47.0455716Z Aug 05 18:31:47  at 
> org.junit.jupiter.engine.execution.InvocationInterceptorChain.chainAndInvoke(InvocationInterceptorChain.java:45)
> 2022-08-05T18:31:47.0456525Z Aug 05 18:31:47  at 
> org.junit.jupiter.engine.execution.InvocationInterceptorChain.invoke(InvocationInterceptorChain.java:37)
> 2022-08-05T18:31:47.0457512Z Aug 05 18:31:47  at 
> org.junit.jupiter.engine.execution.ExecutableInvoker.invoke(ExecutableInvoker.java:104)
> 2022-08-05T18:31:47.0458637Z Aug 05 18:31:47  at 
> org.junit.jupiter.engine.execution.ExecutableInvoker.invoke(ExecutableInvoker.java:98)
> 2022-08-05T18:31:47.0459742Z Aug 05 18:31:47  at 
> org.junit.jupiter.engine.descriptor.ClassBasedTestDescriptor.lambda$invokeBeforeAllMethods$11(ClassBasedTestDesc

[jira] [Closed] (FLINK-28807) Various components don't respect schema lifecycle

2022-08-06 Thread Chesnay Schepler (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-28807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chesnay Schepler closed FLINK-28807.

Resolution: Fixed

master: fb95798b1c301152b912c4b8ec4a737ea16d8641

> Various components don't respect schema lifecycle
> -
>
> Key: FLINK-28807
> URL: https://issues.apache.org/jira/browse/FLINK-28807
> Project: Flink
>  Issue Type: Bug
>  Components: API / Python, Connectors / ElasticSearch, Connectors / 
> Kafka, Connectors / Kinesis, Formats (JSON, Avro, Parquet, ORC, 
> SequenceFile), Tests
>Affects Versions: 1.15.0
>Reporter: Chesnay Schepler
>Assignee: Chesnay Schepler
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.16.0
>
>
> A surprising number of components never call 
> {{(De)SerializationSchema#open}}, making life very difficult for people who 
> want to make use of said method.
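
The contract in question is small: whoever owns the schema must invoke the
lifecycle hook once before the first call to deserialize. A minimal sketch:

{code:java}
import org.apache.flink.api.common.serialization.DeserializationSchema;

class SchemaLifecycle {
    static <T> T deserializeOnce(
            DeserializationSchema<T> schema,
            DeserializationSchema.InitializationContext context,
            byte[] message) throws Exception {
        // The step many components skipped: open() must run before deserialize().
        schema.open(context);
        return schema.deserialize(message);
    }
}
{code}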



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (FLINK-28733) jobmanager.sh should support dynamic properties

2022-08-05 Thread Chesnay Schepler (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-28733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chesnay Schepler closed FLINK-28733.

Resolution: Fixed

master: ff2f4cb624ab84c43f2fd0daea2daa8ba74b4169

> jobmanager.sh should support dynamic properties
> ---
>
> Key: FLINK-28733
> URL: https://issues.apache.org/jira/browse/FLINK-28733
> Project: Flink
>  Issue Type: Improvement
>  Components: Deployment / Scripts
>Reporter: Chesnay Schepler
>Assignee: Chesnay Schepler
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.16.0
>
>
> {{jobmanager.sh}} throws away all arguments after the host/webui-port 
> settings, in contrast to other scripts like {{taskmanager.sh}}, 
> {{historyserver.sh}} or {{standalone-job.sh}}.
> This prevents users from using dynamic properties.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-28841) Document dynamic property support for startup scripts

2022-08-05 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-28841:


 Summary: Document dynamic property support for startup scripts
 Key: FLINK-28841
 URL: https://issues.apache.org/jira/browse/FLINK-28841
 Project: Flink
  Issue Type: Technical Debt
  Components: Deployment / Scripts, Documentation
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler
 Fix For: 1.16.0


The support for dynamic properties in startup scripts isn't documented anywhere.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (FLINK-28713) Remove unused curator-test dependency from flink-test-utils

2022-08-05 Thread Chesnay Schepler (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-28713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chesnay Schepler closed FLINK-28713.

Resolution: Fixed

master: 6335b573863af2b30a6541f910be96ddf61f9c84

> Remove unused curator-test dependency from flink-test-utils
> ---
>
> Key: FLINK-28713
> URL: https://issues.apache.org/jira/browse/FLINK-28713
> Project: Flink
>  Issue Type: Technical Debt
>  Components: Build System, Tests
>Reporter: Chesnay Schepler
>Assignee: Chesnay Schepler
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.16.0
>
>
> Remove an unused dependency that also pulls log4j 1 into user projects.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (FLINK-28808) CsvFileFormatFactory#createEncodingFormat should create ConverterContext on server-side

2022-08-05 Thread Chesnay Schepler (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-28808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chesnay Schepler closed FLINK-28808.

Resolution: Fixed

master: e0f4a0ca9271f0e9e31011578dbdea4b5a8d30fe

> CsvFileFormatFactory#createEncodingFormat should create ConverterContext on 
> server-side
> ---
>
> Key: FLINK-28808
> URL: https://issues.apache.org/jira/browse/FLINK-28808
> Project: Flink
>  Issue Type: Technical Debt
>  Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile)
>Reporter: Chesnay Schepler
>Assignee: Chesnay Schepler
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.16.0
>
>
> Since we shouldn't assume that Jackson mappers are serializable (due to 
> non-serializable extensions), the RowDataToCsvFormatConverterContext also 
> shouldn't be serializable.
> With that in mind, the CsvFileFormatFactory should create the context when 
> the writer is created, not beforehand.
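
The underlying pattern, sketched with hypothetical names (the actual classes
are Flink internals): keep only serializable state in the factory and build
the Jackson-based context where the writer is created:

{code:java}
import java.io.Serializable;

import com.fasterxml.jackson.dataformat.csv.CsvMapper;

// Hypothetical sketch: WriterContext stands in for the non-serializable
// converter context; only the factory itself crosses the wire.
class CsvWriterFactorySketch implements Serializable {
    static final class WriterContext {
        final CsvMapper mapper;

        WriterContext(CsvMapper mapper) {
            this.mapper = mapper;
        }
    }

    // Called where the writer is created, so the CsvMapper (and anything
    // else non-serializable) never needs to be shipped to the workers.
    WriterContext createWriterContext() {
        return new WriterContext(new CsvMapper());
    }
}
{code}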



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (FLINK-28183) flink-python is lacking several test dependencies

2022-08-05 Thread Chesnay Schepler (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-28183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chesnay Schepler reassigned FLINK-28183:


Assignee: Chesnay Schepler

> flink-python is lacking several test dependencies
> -
>
> Key: FLINK-28183
> URL: https://issues.apache.org/jira/browse/FLINK-28183
> Project: Flink
>  Issue Type: Technical Debt
>  Components: API / Python, Build System
>Affects Versions: 1.16.0
>Reporter: Chesnay Schepler
>Assignee: Chesnay Schepler
>Priority: Critical
> Fix For: 1.16.0
>
>
> The pyflink_gateway_server searches the output directories of various modules 
> to construct a test classpath.
> Half of these are not declared as actual test dependencies in Maven. Because 
> of that, there are no guarantees that these modules are actually built before 
> flink-python.
> Additionally, there seem to be no safeguards in place to verify that these 
> jars actually exist.
> Considering that this is only required for testing, most of this logic should 
> also be moved into Maven, copying these dependencies to some directory under 
> flink-python/target, to make this de-facto build logic more discoverable.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (FLINK-28804) Use proper stand-ins for missing metrics groups

2022-08-04 Thread Chesnay Schepler (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-28804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chesnay Schepler closed FLINK-28804.

Resolution: Fixed

master: fd26e088a43a9426ad5f0b94237495d50d78656b

> Use proper stand-ins for missing metrics groups
> ---
>
> Key: FLINK-28804
> URL: https://issues.apache.org/jira/browse/FLINK-28804
> Project: Flink
>  Issue Type: Technical Debt
>  Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile)
>Reporter: Chesnay Schepler
>Assignee: Chesnay Schepler
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.16.0
>
>
> A few classes call the open() method of schemas with a custom initialization 
> context that either throws an exception or returns null when the metric group 
> is accessed.
> The correct approach is to return an unregistered metrics group.
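
A sketch of such a stand-in context, assuming Flink's
{{UnregisteredMetricsGroup}} and {{SimpleUserCodeClassLoader}} utilities:

{code:java}
import org.apache.flink.api.common.serialization.DeserializationSchema;
import org.apache.flink.metrics.MetricGroup;
import org.apache.flink.metrics.groups.UnregisteredMetricsGroup;
import org.apache.flink.util.SimpleUserCodeClassLoader;
import org.apache.flink.util.UserCodeClassLoader;

class StandInInitializationContext implements DeserializationSchema.InitializationContext {
    @Override
    public MetricGroup getMetricGroup() {
        // Records nothing, but never returns null or throws.
        return new UnregisteredMetricsGroup();
    }

    @Override
    public UserCodeClassLoader getUserCodeClassLoader() {
        return SimpleUserCodeClassLoader.create(getClass().getClassLoader());
    }
}
{code}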



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-24302) Direct buffer memory leak on Pulsar connector with Java 11

2022-08-04 Thread Chesnay Schepler (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-24302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17575445#comment-17575445
 ] 

Chesnay Schepler commented on FLINK-24302:
--

The linked PR was not merged to 2.10. The merged commit 
([https://github.com/apache/pulsar/commit/1a098d5418be7d4674f9ba9dbc4da0018f91654a])
 only exists on the master branch, and it was added to the 2.11 milestone:

{quote}[codelipenghui|https://github.com/codelipenghui] added this to the 
[2.11.0|https://github.com/apache/pulsar/milestone/33] milestone [on 19 
Apr|https://github.com/apache/pulsar/pull/15216#event-6455805257]{quote}

> Direct buffer memory leak on Pulsar connector with Java 11
> --
>
> Key: FLINK-24302
> URL: https://issues.apache.org/jira/browse/FLINK-24302
> Project: Flink
>  Issue Type: Sub-task
>  Components: Connectors / Pulsar
>Affects Versions: 1.14.0
>Reporter: Yufan Sheng
>Priority: Major
>  Labels: test-stability
>
> Running the Pulsar connector with multiple split readers on Java 11 could 
> throw a {{java.lang.OutOfMemoryError}} exception.
>  
> {code:java}
> Caused by: java.util.concurrent.CompletionException: 
> org.apache.pulsar.client.admin.internal.http.AsyncHttpConnector$RetryException:
>  Could not complete the operation. Number of retries has been exhausted. 
> Failed reason: java.lang.OutOfMemoryError: Direct buffer memory
>   at 
> java.base/java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:331)
>   at 
> java.base/java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:346)
>   at 
> java.base/java.util.concurrent.CompletableFuture$OrApply.tryFire(CompletableFuture.java:1503)
>   ... 42 more
> Caused by: 
> org.apache.pulsar.client.admin.internal.http.AsyncHttpConnector$RetryException:
>  Could not complete the operation. Number of retries has been exhausted. 
> Failed reason: java.lang.OutOfMemoryError: Direct buffer memory
>   at 
> org.apache.pulsar.client.admin.internal.http.AsyncHttpConnector.lambda$retryOperation$4(AsyncHttpConnector.java:249)
>   ... 39 more
> Caused by: org.apache.pulsar.shade.io.netty.handler.codec.EncoderException: 
> java.lang.OutOfMemoryError: Direct buffer memory
>   at 
> org.apache.pulsar.shade.io.netty.handler.codec.MessageToMessageEncoder.write(MessageToMessageEncoder.java:104)
>   at 
> org.apache.pulsar.shade.io.netty.channel.CombinedChannelDuplexHandler.write(CombinedChannelDuplexHandler.java:346)
>   at 
> org.apache.pulsar.shade.io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(AbstractChannelHandlerContext.java:717)
>   at 
> org.apache.pulsar.shade.io.netty.channel.AbstractChannelHandlerContext.invokeWrite(AbstractChannelHandlerContext.java:709)
>   at 
> org.apache.pulsar.shade.io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:792)
>   at 
> org.apache.pulsar.shade.io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:702)
>   at 
> org.apache.pulsar.shade.io.netty.handler.stream.ChunkedWriteHandler.doFlush(ChunkedWriteHandler.java:303)
>   at 
> org.apache.pulsar.shade.io.netty.handler.stream.ChunkedWriteHandler.flush(ChunkedWriteHandler.java:132)
>   at 
> org.apache.pulsar.shade.io.netty.channel.AbstractChannelHandlerContext.invokeFlush0(AbstractChannelHandlerContext.java:750)
>   at 
> org.apache.pulsar.shade.io.netty.channel.AbstractChannelHandlerContext.invokeWriteAndFlush(AbstractChannelHandlerContext.java:765)
>   at 
> org.apache.pulsar.shade.io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:790)
>   at 
> org.apache.pulsar.shade.io.netty.channel.AbstractChannelHandlerContext.writeAndFlush(AbstractChannelHandlerContext.java:758)
>   at 
> org.apache.pulsar.shade.io.netty.channel.DefaultChannelPipeline.writeAndFlush(DefaultChannelPipeline.java:1020)
>   at 
> org.apache.pulsar.shade.io.netty.channel.AbstractChannel.writeAndFlush(AbstractChannel.java:311)
>   at 
> org.apache.pulsar.shade.org.asynchttpclient.netty.request.NettyRequestSender.writeRequest(NettyRequestSender.java:420)
>   ... 23 more
> {code}
> The reason is that under Java 11, Netty allocates memory from the 
> pool of Java direct memory and is affected by the MaxDirectMemory limit. 
> Under Java 8, it allocates native memory and is not affected by that setting.
> We have to reduce the direct memory usage by using a newer Pulsar client 
> which has a memory-limit configuration.
> This issue is addressed on the Pulsar side, and 
> [PIP-74|https://github.com/apache/pulsar/wiki/PIP-74%3A-Pulsar-client-memory-limits]
>  has been created to resolve it.
> We should keep this issue open
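
For reference, the client-side limit that PIP-74 introduced is configured on
the client builder; a hedged sketch (whether it also caps consumer buffers
depends on the client version, per the comments above):

{code:java}
import org.apache.pulsar.client.api.PulsarClient;
import org.apache.pulsar.client.api.PulsarClientException;
import org.apache.pulsar.client.api.SizeUnit;

class PulsarClientFactory {
    static PulsarClient create() throws PulsarClientException {
        return PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650")
                // Upper bound for the client's internal buffers.
                .memoryLimit(64, SizeUnit.MEGA_BYTES)
                .build();
    }
}
{code}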

[jira] [Updated] (FLINK-28245) Refactor DependencyParser#parseDependencyTreeOutput to return a Tree

2022-08-04 Thread Chesnay Schepler (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-28245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chesnay Schepler updated FLINK-28245:
-
Fix Version/s: 1.17.0
   (was: 1.16.0)

> Refactor DependencyParser#parseDependencyTreeOutput to return a Tree
> 
>
> Key: FLINK-28245
> URL: https://issues.apache.org/jira/browse/FLINK-28245
> Project: Flink
>  Issue Type: Sub-task
>  Components: Build System / CI
>Reporter: Chesnay Schepler
>Assignee: Chesnay Schepler
>Priority: Major
>  Labels: pull-request-available, stale-assigned
> Fix For: 1.17.0
>
>
> Returning a tree structure makes it easier to infer transitive properties, 
> which will be used by a new safeguard for 
> FLINK-28016.
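
A hypothetical node type illustrates the point; transitive properties then
fall out of a plain traversal instead of string matching on indented output:

{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

final class DependencyNode {
    final String coordinates; // groupId:artifactId:version
    final List<DependencyNode> children = new ArrayList<>();

    DependencyNode(String coordinates) {
        this.coordinates = coordinates;
    }

    // All dependencies reachable through this node, itself included.
    Stream<DependencyNode> transitiveClosure() {
        return Stream.concat(
                Stream.of(this),
                children.stream().flatMap(DependencyNode::transitiveClosure));
    }
}
{code}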



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-28202) Generalize utils around shade-plugin

2022-08-04 Thread Chesnay Schepler (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-28202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chesnay Schepler updated FLINK-28202:
-
Fix Version/s: 1.17.0
   (was: 1.16.0)

> Generalize utils around shade-plugin
> 
>
> Key: FLINK-28202
> URL: https://issues.apache.org/jira/browse/FLINK-28202
> Project: Flink
>  Issue Type: Sub-task
>  Components: Build System, Build System / CI
>Reporter: Chesnay Schepler
>Assignee: Chesnay Schepler
>Priority: Major
>  Labels: pull-request-available, stale-assigned
> Fix For: 1.17.0
>
>
> We'll be adding another safeguard against developer mistakes which, like 
> the license checker, parses the output of the shade plugin.
> We should generalize this parsing such that both checks can use the same code.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-28016) Support Maven 3.3+

2022-08-04 Thread Chesnay Schepler (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-28016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chesnay Schepler updated FLINK-28016:
-
Fix Version/s: 1.17.0
   (was: 1.16.0)

> Support Maven 3.3+
> --
>
> Key: FLINK-28016
> URL: https://issues.apache.org/jira/browse/FLINK-28016
> Project: Flink
>  Issue Type: Technical Debt
>  Components: Build System
>Reporter: Chesnay Schepler
>Assignee: Chesnay Schepler
>Priority: Major
> Fix For: 1.17.0
>
>
> We are currently de-facto limited to Maven 3.2.5 because our packaging relies 
> on the shade-plugin modifying the dependency tree at runtime when bundling 
> dependencies, which is no longer possible on Maven 3.3+.
> Being locked in to such an old Maven version isn't a good state to be in, and 
> the contributor experience suffers as well.
> I've been looking into removing this limitation by explicitly marking every 
> dependency that we bundle as {{optional}} in the poms, which really means 
> {{non-transitive}}. This ensures that everything being bundled by one 
> module is not visible to other modules. Some tooling to catch developer 
> mistakes was also written.
> Overall this is actually quite a nice change, as it makes things more 
> explicit and reduces inconsistencies (e.g., the dependency plugin results are 
> questionable if the shade-plugin didn't run!); and it already highlighted 
> several problems in Flink.
> This change will have no effect on users or the released poms, because the 
> dependency-reduced poms will be generated as before and remove all modified 
> dependencies.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-28246) Store classifier in dependency

2022-08-04 Thread Chesnay Schepler (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-28246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chesnay Schepler updated FLINK-28246:
-
Fix Version/s: 1.17.0
   (was: 1.16.0)

> Store classifier in dependency
> --
>
> Key: FLINK-28246
> URL: https://issues.apache.org/jira/browse/FLINK-28246
> Project: Flink
>  Issue Type: Sub-task
>  Components: Build System / CI
>Reporter: Chesnay Schepler
>Assignee: Chesnay Schepler
>Priority: Major
>  Labels: pull-request-available, stale-assigned
> Fix For: 1.17.0
>
>
> The classifier is required to properly differentiate between jars and 
> test-jars.
> In the future we could also improve the accuracy of the notice checker (which 
> currently doesn't care about classifiers).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-28203) Mark all bundled dependencies as optional

2022-08-04 Thread Chesnay Schepler (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-28203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chesnay Schepler updated FLINK-28203:
-
Fix Version/s: 1.17.0
   (was: 1.16.0)

> Mark all bundled dependencies as optional
> -
>
> Key: FLINK-28203
> URL: https://issues.apache.org/jira/browse/FLINK-28203
> Project: Flink
>  Issue Type: Sub-task
>  Components: Build System
>Reporter: Chesnay Schepler
>Assignee: Chesnay Schepler
>Priority: Major
> Fix For: 1.17.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-25252) Enable Kafka E2E tests on Java 11

2022-08-04 Thread Chesnay Schepler (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-25252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chesnay Schepler updated FLINK-25252:
-
Summary: Enable Kafka E2E tests on Java 11  (was: Check whether Kafka E2E 
tests could be run on Java 11)

> Enable Kafka E2E tests on Java 11
> -
>
> Key: FLINK-25252
> URL: https://issues.apache.org/jira/browse/FLINK-25252
> Project: Flink
>  Issue Type: Sub-task
>  Components: Connectors / Kafka, Tests
>Reporter: Chesnay Schepler
>Assignee: Martijn Visser
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.16.0
>
>
> The Java Kafka E2E tests are currently not run on Java 11. We should check 
> what the actual issue is and whether it can be resolved (e.g., by a Kafka 
> server version bump):



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (FLINK-25252) Check whether Kafka E2E tests could be run on Java 11

2022-08-04 Thread Chesnay Schepler (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-25252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chesnay Schepler closed FLINK-25252.

  Assignee: Martijn Visser
Resolution: Fixed

master: 75a66f921c94ebc5686a3bacf741fee2bdfe6dd6

> Check whether Kafka E2E tests could be run on Java 11
> -
>
> Key: FLINK-25252
> URL: https://issues.apache.org/jira/browse/FLINK-25252
> Project: Flink
>  Issue Type: Sub-task
>  Components: Connectors / Kafka, Tests
>Reporter: Chesnay Schepler
>Assignee: Martijn Visser
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.16.0
>
>
> The Java Kafka E2E tests are currently not run on Java 11. We should check 
> what the actual issue is and whether it can be resolved (e.g., by a Kafka 
> server version bump):



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (FLINK-24302) Direct buffer memory leak on Pulsar connector with Java 11

2022-08-04 Thread Chesnay Schepler (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-24302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17575197#comment-17575197
 ] 

Chesnay Schepler edited comment on FLINK-24302 at 8/4/22 10:29 AM:
---

There is no release yet with the new APIs for controlling the memory 
consumption of consumers; it's only available for producers afaict.

[https://github.com/apache/pulsar/commit/7b2d4c1377607697dcc562b6c85eb447e0af]

[https://github.com/apache/pulsar/commit/1a098d5418be7d4674f9ba9dbc4da0018f91654a]


was (Author: zentol):
There is no release yet with the new APis for controlling the memory 
consumption of consumers; it's only available for producers afaict.

https://github.com/apache/pulsar/commit/7b2d4c1377607697dcc562b6c85eb447e0af

https://github.com/apache/pulsar/commit/1a098d5418be7d4674f9ba9dbc4da0018f91654a

> Direct buffer memory leak on Pulsar connector with Java 11
> --
>
> Key: FLINK-24302
> URL: https://issues.apache.org/jira/browse/FLINK-24302
> Project: Flink
>  Issue Type: Sub-task
>  Components: Connectors / Pulsar
>Affects Versions: 1.14.0
>Reporter: Yufan Sheng
>Priority: Major
>  Labels: test-stability
>
> Running the Pulsar connector with multiple split readers on Java 11 could 
> throw a {{java.lang.OutOfMemoryError}} exception.
>  
> {code:java}
> Caused by: java.util.concurrent.CompletionException: 
> org.apache.pulsar.client.admin.internal.http.AsyncHttpConnector$RetryException:
>  Could not complete the operation. Number of retries has been exhausted. 
> Failed reason: java.lang.OutOfMemoryError: Direct buffer memory
>   at 
> java.base/java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:331)
>   at 
> java.base/java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:346)
>   at 
> java.base/java.util.concurrent.CompletableFuture$OrApply.tryFire(CompletableFuture.java:1503)
>   ... 42 more
> Caused by: 
> org.apache.pulsar.client.admin.internal.http.AsyncHttpConnector$RetryException:
>  Could not complete the operation. Number of retries has been exhausted. 
> Failed reason: java.lang.OutOfMemoryError: Direct buffer memory
>   at 
> org.apache.pulsar.client.admin.internal.http.AsyncHttpConnector.lambda$retryOperation$4(AsyncHttpConnector.java:249)
>   ... 39 more
> Caused by: org.apache.pulsar.shade.io.netty.handler.codec.EncoderException: 
> java.lang.OutOfMemoryError: Direct buffer memory
>   at 
> org.apache.pulsar.shade.io.netty.handler.codec.MessageToMessageEncoder.write(MessageToMessageEncoder.java:104)
>   at 
> org.apache.pulsar.shade.io.netty.channel.CombinedChannelDuplexHandler.write(CombinedChannelDuplexHandler.java:346)
>   at 
> org.apache.pulsar.shade.io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(AbstractChannelHandlerContext.java:717)
>   at 
> org.apache.pulsar.shade.io.netty.channel.AbstractChannelHandlerContext.invokeWrite(AbstractChannelHandlerContext.java:709)
>   at 
> org.apache.pulsar.shade.io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:792)
>   at 
> org.apache.pulsar.shade.io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:702)
>   at 
> org.apache.pulsar.shade.io.netty.handler.stream.ChunkedWriteHandler.doFlush(ChunkedWriteHandler.java:303)
>   at 
> org.apache.pulsar.shade.io.netty.handler.stream.ChunkedWriteHandler.flush(ChunkedWriteHandler.java:132)
>   at 
> org.apache.pulsar.shade.io.netty.channel.AbstractChannelHandlerContext.invokeFlush0(AbstractChannelHandlerContext.java:750)
>   at 
> org.apache.pulsar.shade.io.netty.channel.AbstractChannelHandlerContext.invokeWriteAndFlush(AbstractChannelHandlerContext.java:765)
>   at 
> org.apache.pulsar.shade.io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:790)
>   at 
> org.apache.pulsar.shade.io.netty.channel.AbstractChannelHandlerContext.writeAndFlush(AbstractChannelHandlerContext.java:758)
>   at 
> org.apache.pulsar.shade.io.netty.channel.DefaultChannelPipeline.writeAndFlush(DefaultChannelPipeline.java:1020)
>   at 
> org.apache.pulsar.shade.io.netty.channel.AbstractChannel.writeAndFlush(AbstractChannel.java:311)
>   at 
> org.apache.pulsar.shade.org.asynchttpclient.netty.request.NettyRequestSender.writeRequest(NettyRequestSender.java:420)
>   ... 23 more
> {code}
> The reason is that under Java 11, Netty will allocate memory from the 
> pool of Java Direct Memory and is affected by the MaxDirectMemory limit. 
> Under Java 8, it allocates native memory and is not affected by that setting.
> We have to reduce the direct memory usage by using a newer Pulsar client 
> which has a memory-limits configuration.

[jira] [Commented] (FLINK-24302) Direct buffer memory leak on Pulsar connector with Java 11

2022-08-04 Thread Chesnay Schepler (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-24302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17575197#comment-17575197
 ] 

Chesnay Schepler commented on FLINK-24302:
--

There is no release yet with the new APIs for controlling the memory 
consumption of consumers; it's only available for producers afaict.

https://github.com/apache/pulsar/commit/7b2d4c1377607697dcc562b6c85eb447e0af

https://github.com/apache/pulsar/commit/1a098d5418be7d4674f9ba9dbc4da0018f91654a

> Direct buffer memory leak on Pulsar connector with Java 11
> --
>
> Key: FLINK-24302
> URL: https://issues.apache.org/jira/browse/FLINK-24302
> Project: Flink
>  Issue Type: Sub-task
>  Components: Connectors / Pulsar
>Affects Versions: 1.14.0
>Reporter: Yufan Sheng
>Priority: Major
>  Labels: test-stability
>
> Running the Pulsar connector with multiple split readers on Java 11 could 
> throw a {{java.lang.OutOfMemoryError}} exception.
>  
> {code:java}
> Caused by: java.util.concurrent.CompletionException: 
> org.apache.pulsar.client.admin.internal.http.AsyncHttpConnector$RetryException:
>  Could not complete the operation. Number of retries has been exhausted. 
> Failed reason: java.lang.OutOfMemoryError: Direct buffer memory
>   at 
> java.base/java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:331)
>   at 
> java.base/java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:346)
>   at 
> java.base/java.util.concurrent.CompletableFuture$OrApply.tryFire(CompletableFuture.java:1503)
>   ... 42 more
> Caused by: 
> org.apache.pulsar.client.admin.internal.http.AsyncHttpConnector$RetryException:
>  Could not complete the operation. Number of retries has been exhausted. 
> Failed reason: java.lang.OutOfMemoryError: Direct buffer memory
>   at 
> org.apache.pulsar.client.admin.internal.http.AsyncHttpConnector.lambda$retryOperation$4(AsyncHttpConnector.java:249)
>   ... 39 more
> Caused by: org.apache.pulsar.shade.io.netty.handler.codec.EncoderException: 
> java.lang.OutOfMemoryError: Direct buffer memory
>   at 
> org.apache.pulsar.shade.io.netty.handler.codec.MessageToMessageEncoder.write(MessageToMessageEncoder.java:104)
>   at 
> org.apache.pulsar.shade.io.netty.channel.CombinedChannelDuplexHandler.write(CombinedChannelDuplexHandler.java:346)
>   at 
> org.apache.pulsar.shade.io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(AbstractChannelHandlerContext.java:717)
>   at 
> org.apache.pulsar.shade.io.netty.channel.AbstractChannelHandlerContext.invokeWrite(AbstractChannelHandlerContext.java:709)
>   at 
> org.apache.pulsar.shade.io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:792)
>   at 
> org.apache.pulsar.shade.io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:702)
>   at 
> org.apache.pulsar.shade.io.netty.handler.stream.ChunkedWriteHandler.doFlush(ChunkedWriteHandler.java:303)
>   at 
> org.apache.pulsar.shade.io.netty.handler.stream.ChunkedWriteHandler.flush(ChunkedWriteHandler.java:132)
>   at 
> org.apache.pulsar.shade.io.netty.channel.AbstractChannelHandlerContext.invokeFlush0(AbstractChannelHandlerContext.java:750)
>   at 
> org.apache.pulsar.shade.io.netty.channel.AbstractChannelHandlerContext.invokeWriteAndFlush(AbstractChannelHandlerContext.java:765)
>   at 
> org.apache.pulsar.shade.io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:790)
>   at 
> org.apache.pulsar.shade.io.netty.channel.AbstractChannelHandlerContext.writeAndFlush(AbstractChannelHandlerContext.java:758)
>   at 
> org.apache.pulsar.shade.io.netty.channel.DefaultChannelPipeline.writeAndFlush(DefaultChannelPipeline.java:1020)
>   at 
> org.apache.pulsar.shade.io.netty.channel.AbstractChannel.writeAndFlush(AbstractChannel.java:311)
>   at 
> org.apache.pulsar.shade.org.asynchttpclient.netty.request.NettyRequestSender.writeRequest(NettyRequestSender.java:420)
>   ... 23 more
> {code}
> The reason is that under Java 11, Netty will allocate memory from the 
> pool of Java Direct Memory and is affected by the MaxDirectMemory limit. 
> Under Java 8, it allocates native memory and is not affected by that setting.
> We have to reduce the direct memory usage by using a newer Pulsar client 
> which has a memory-limits configuration.
> This issue is being addressed in Pulsar, and 
> [PIP-74|https://github.com/apache/pulsar/wiki/PIP-74%3A-Pulsar-client-memory-limits]
>  has been created to resolve it.
> We should keep this issue open with no resolved versions until Pulsar 
> provides a new client with memory limits.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Updated] (FLINK-28680) No space left on device on Azure e2e_2_ci tests

2022-08-04 Thread Chesnay Schepler (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-28680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chesnay Schepler updated FLINK-28680:
-
Fix Version/s: 1.16.0
   1.15.2
   1.14.6

> No space left on device on Azure e2e_2_ci tests
> ---
>
> Key: FLINK-28680
> URL: https://issues.apache.org/jira/browse/FLINK-28680
> Project: Flink
>  Issue Type: Technical Debt
>  Components: Build System / Azure Pipelines
>Affects Versions: 1.14.5, 1.16.0
>Reporter: Huang Xingbo
>Assignee: Robert Metzger
>Priority: Blocker
>  Labels: pull-request-available, test-stability
> Fix For: 1.16.0, 1.15.2, 1.14.6
>
>
> Many e2e tests failed due to insufficient disk space. We previously freed up 
> space by cleaning the flink-e2e directory, but at the moment this is no longer 
> enough to solve the problem. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-28680) No space left on device on Azure e2e_2_ci tests

2022-08-04 Thread Chesnay Schepler (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-28680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chesnay Schepler updated FLINK-28680:
-
Issue Type: Technical Debt  (was: Bug)

> No space left on device on Azure e2e_2_ci tests
> ---
>
> Key: FLINK-28680
> URL: https://issues.apache.org/jira/browse/FLINK-28680
> Project: Flink
>  Issue Type: Technical Debt
>  Components: Build System / Azure Pipelines
>Affects Versions: 1.14.5, 1.16.0
>Reporter: Huang Xingbo
>Assignee: Robert Metzger
>Priority: Blocker
>  Labels: pull-request-available, test-stability
>
> Many e2e tests failed due to insufficient disk space. We previously freed up 
> space by cleaning the flink-e2e directory, but at the moment this is no longer 
> enough to solve the problem. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-28811) Enabled Java 11 tests for Hbase 2

2022-08-04 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-28811:


 Summary: Enabled Java 11 tests for Hbase 2
 Key: FLINK-28811
 URL: https://issues.apache.org/jira/browse/FLINK-28811
 Project: Flink
  Issue Type: Sub-task
  Components: Connectors / HBase
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler
 Fix For: 1.16.0


AFAICT the HBase 2.2 unit/IT cases succeed on Java 11.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-28811) Enable Java 11 tests for Hbase 2

2022-08-04 Thread Chesnay Schepler (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-28811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chesnay Schepler updated FLINK-28811:
-
Summary: Enable Java 11 tests for Hbase 2  (was: Enabled Java 11 tests for 
Hbase 2)

> Enable Java 11 tests for Hbase 2
> 
>
> Key: FLINK-28811
> URL: https://issues.apache.org/jira/browse/FLINK-28811
> Project: Flink
>  Issue Type: Sub-task
>  Components: Connectors / HBase
>Reporter: Chesnay Schepler
>Assignee: Chesnay Schepler
>Priority: Major
> Fix For: 1.16.0
>
>
> AFAICT the HBase 2.2 unit/IT cases succeed on Java 11.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-28807) Various components don't respect schema lifecycle

2022-08-04 Thread Chesnay Schepler (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-28807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chesnay Schepler updated FLINK-28807:
-
Description: A surprising number of components never call 
{{(De)SerializationSchema#open}} making life very difficult for people who 
want to make use of said method.  (was: A surprising number of components never 
call {{(De)SerializationSchema#open, }}making life very difficult for people 
who want to make use of said method.)

> Various components don't respect schema lifecycle
> -
>
> Key: FLINK-28807
> URL: https://issues.apache.org/jira/browse/FLINK-28807
> Project: Flink
>  Issue Type: Bug
>  Components: API / Python, Connectors / ElasticSearch, Connectors / 
> Kafka, Connectors / Kinesis, Formats (JSON, Avro, Parquet, ORC, 
> SequenceFile), Tests
>Affects Versions: 1.15.0
>Reporter: Chesnay Schepler
>Assignee: Chesnay Schepler
>Priority: Major
> Fix For: 1.16.0
>
>
> A surprising number of components never call 
> {{(De)SerializationSchema#open}}, making life very difficult for people who 
> want to make use of said method.
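
For reference, a minimal sketch of the lifecycle contract in question (the helper and its wiring are assumptions for illustration, not Flink code):

{code:java}
import org.apache.flink.api.common.serialization.DeserializationSchema;

public class SchemaLifecycleSketch {

    /** A component honoring the lifecycle calls open() once before any record. */
    static <T> T deserializeFirst(
            DeserializationSchema<T> schema,
            DeserializationSchema.InitializationContext context,
            byte[] firstRecord) throws Exception {
        schema.open(context); // the step the affected components skip
        return schema.deserialize(firstRecord);
    }
}
{code}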



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-28807) Various components don't respect schema lifecycle

2022-08-04 Thread Chesnay Schepler (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-28807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chesnay Schepler updated FLINK-28807:
-
Component/s: Connectors / Kinesis

> Various components don't respect schema lifecycle
> -
>
> Key: FLINK-28807
> URL: https://issues.apache.org/jira/browse/FLINK-28807
> Project: Flink
>  Issue Type: Bug
>  Components: API / Python, Connectors / ElasticSearch, Connectors / 
> Kafka, Connectors / Kinesis, Formats (JSON, Avro, Parquet, ORC, 
> SequenceFile), Tests
>Affects Versions: 1.15.0
>Reporter: Chesnay Schepler
>Assignee: Chesnay Schepler
>Priority: Major
> Fix For: 1.16.0
>
>
> A surprising number of components never call {{(De)SerializationSchema#open}}, 
> making life very difficult for people who want to make use of said method.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-28808) CsvFileFormatFactory#createEncodingFormat should create ConverterContext on server-side

2022-08-04 Thread Chesnay Schepler (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-28808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chesnay Schepler updated FLINK-28808:
-
Description: 
Since we shouldn't assume that Jackson mappers are serializable (due to 
non-serializable extension) the RowDataToCsvFormatConverterContext also 
shouldn't be serializable.

With that in mind the CsvFileFormatFactory should create the context when the 
writer is created, not beforehand.

  was:
Since we shouldn't assume that Jackson mappers are serializable (due to 
non-serializable extension) the RowDataToCsvFormatConverterContext also 
shouldn't be serializable.

 

With that in mind the CsvFileFormatFactory should create the context when the 
writer is created, not beforehand.


> CsvFileFormatFactory#createEncodingFormat should create ConverterContext on 
> server-side
> ---
>
> Key: FLINK-28808
> URL: https://issues.apache.org/jira/browse/FLINK-28808
> Project: Flink
>  Issue Type: Technical Debt
>  Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile)
>Reporter: Chesnay Schepler
>Assignee: Chesnay Schepler
>Priority: Major
> Fix For: 1.16.0
>
>
> Since we shouldn't assume that Jackson mappers are serializable (due to 
> non-serializable extension) the RowDataToCsvFormatConverterContext also 
> shouldn't be serializable.
> With that in mind the CsvFileFormatFactory should create the context when the 
> writer is created, not beforehand.
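
A sketch of the intended pattern with entirely hypothetical names (not the actual CsvFileFormatFactory code):

{code:java}
import java.io.Serializable;

import com.fasterxml.jackson.databind.ObjectMapper;

// The factory gets serialized and shipped to the task; the mapper-backed
// converter context does not.
class EncodingFormatFactorySketch implements Serializable {

    // Deliberately no ObjectMapper/context field: mappers may carry
    // non-serializable extensions and must not travel with the factory.

    RowEncoderSketch createEncoder() {
        ObjectMapper mapper = new ObjectMapper(); // built where the writer runs
        return new RowEncoderSketch(mapper);
    }
}

class RowEncoderSketch {
    private final ObjectMapper mapper;

    RowEncoderSketch(ObjectMapper mapper) {
        this.mapper = mapper;
    }
}
{code}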



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-28808) CsvFileFormatFactory#createEncodingFormat should create ConverterContext on server-side

2022-08-04 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-28808:


 Summary: CsvFileFormatFactory#createEncodingFormat should create 
ConverterContext on server-side
 Key: FLINK-28808
 URL: https://issues.apache.org/jira/browse/FLINK-28808
 Project: Flink
  Issue Type: Technical Debt
  Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile)
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler
 Fix For: 1.16.0


Since we shouldn't assume that Jackson mappers are serializable (due to 
non-serializable extension) the RowDataToCsvFormatConverterContext also 
shouldn't be serializable.

 

With that in mind the CsvFileFormatFactory should create the context when the 
writer is created, not beforehand.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-28804) Use proper stand-ins for missing metrics groups

2022-08-04 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-28804:


 Summary: Use proper stand-ins for missing metrics groups
 Key: FLINK-28804
 URL: https://issues.apache.org/jira/browse/FLINK-28804
 Project: Flink
  Issue Type: Technical Debt
  Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile)
Reporter: Chesnay Schepler
 Fix For: 1.16.0


A few classes call the open() method of schemas with a custom initialization 
context that either throws an exception or returns null when the metric group is 
accessed.

The correct approach is to return an unregistered metrics group.
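
A minimal sketch of such a stand-in (the class name and the choice to leave the classloader unimplemented are assumptions for illustration):

{code:java}
import org.apache.flink.api.common.serialization.DeserializationSchema;
import org.apache.flink.metrics.MetricGroup;
import org.apache.flink.metrics.groups.UnregisteredMetricsGroup;
import org.apache.flink.util.UserCodeClassLoader;

/** Hypothetical stand-in context that neither throws nor returns null. */
class StandInInitializationContext implements DeserializationSchema.InitializationContext {

    @Override
    public MetricGroup getMetricGroup() {
        // Metrics registered against this group are accepted but never reported.
        return new UnregisteredMetricsGroup();
    }

    @Override
    public UserCodeClassLoader getUserCodeClassLoader() {
        throw new UnsupportedOperationException("not needed for this sketch");
    }
}
{code}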



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-28807) Various components don't respect schema lifecycle

2022-08-04 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-28807:


 Summary: Various components don't respect schema lifecycle
 Key: FLINK-28807
 URL: https://issues.apache.org/jira/browse/FLINK-28807
 Project: Flink
  Issue Type: Bug
  Components: API / Python, Connectors / ElasticSearch, Connectors / 
Kafka, Formats (JSON, Avro, Parquet, ORC, SequenceFile), Tests
Affects Versions: 1.15.0
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler
 Fix For: 1.16.0


A surprising number of components never call {{(De)SerializationSchema#open}}, 
making life very difficult for people who want to make use of said method.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (FLINK-28804) Use proper stand-ins for missing metrics groups

2022-08-04 Thread Chesnay Schepler (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-28804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chesnay Schepler reassigned FLINK-28804:


Assignee: Chesnay Schepler

> Use proper stand-ins for missing metrics groups
> ---
>
> Key: FLINK-28804
> URL: https://issues.apache.org/jira/browse/FLINK-28804
> Project: Flink
>  Issue Type: Technical Debt
>  Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile)
>Reporter: Chesnay Schepler
>Assignee: Chesnay Schepler
>Priority: Major
> Fix For: 1.16.0
>
>
> A few classes call the open() method of schemas with a custom initialization 
> context that either throws an exception or returns null when the metric group 
> is accessed.
> The correct approach is to return an unregistered metrics group.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (FLINK-28791) flink-sql-gateway-test does not compile on Java 11

2022-08-03 Thread Chesnay Schepler (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-28791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chesnay Schepler closed FLINK-28791.

Resolution: Fixed

master: 9efd97e05717181ee9b5489cddc27b6351127e38

> flink-sql-gateway-test does not compile on Java 11
> --
>
> Key: FLINK-28791
> URL: https://issues.apache.org/jira/browse/FLINK-28791
> Project: Flink
>  Issue Type: Technical Debt
>  Components: Table SQL / Gateway, Tests
>Affects Versions: 1.16.0
>Reporter: Chesnay Schepler
>Assignee: Chesnay Schepler
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 1.16.0
>
>
> {{Could not find artifact jdk.tools:jdk.tools:jar:1.7}}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (FLINK-28791) flink-sql-gateway-test does not compile on Java 11

2022-08-03 Thread Chesnay Schepler (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-28791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chesnay Schepler reassigned FLINK-28791:


Assignee: Chesnay Schepler

> flink-sql-gateway-test does not compile on Java 11
> --
>
> Key: FLINK-28791
> URL: https://issues.apache.org/jira/browse/FLINK-28791
> Project: Flink
>  Issue Type: Technical Debt
>  Components: Table SQL / Gateway, Tests
>Affects Versions: 1.16.0
>Reporter: Chesnay Schepler
>Assignee: Chesnay Schepler
>Priority: Blocker
> Fix For: 1.16.0
>
>
> {{Could not find artifact jdk.tools:jdk.tools:jar:1.7}}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-28791) flink-sql-gateway-test does not compile on Java 11

2022-08-03 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-28791:


 Summary: flink-sql-gateway-test does not compile on Java 11
 Key: FLINK-28791
 URL: https://issues.apache.org/jira/browse/FLINK-28791
 Project: Flink
  Issue Type: Technical Debt
  Components: Table SQL / Gateway, Tests
Affects Versions: 1.16.0
Reporter: Chesnay Schepler
 Fix For: 1.16.0


{{Could not find artifact jdk.tools:jdk.tools:jar:1.7}}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (FLINK-28719) Mapping a data source before window aggregation causes Flink to stop handling late events correctly.

2022-08-03 Thread Chesnay Schepler (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-28719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chesnay Schepler closed FLINK-28719.

Resolution: Information Provided

> Mapping a data source before window aggregation causes Flink to stop handling 
> late events correctly.
> --
>
> Key: FLINK-28719
> URL: https://issues.apache.org/jira/browse/FLINK-28719
> Project: Flink
>  Issue Type: Bug
>  Components: API / DataStream
>Affects Versions: 1.15.1
>Reporter: Mykyta Mykhailenko
>Priority: Major
>
> I have created a 
> [repository|https://github.com/mykytamykhailenko/flink-map-with-issue] where 
> I describe this issue in detail. 
> I have provided a few tests and source code so that you can reproduce the 
> issue on your own machine. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-28784) ParameterTool support read and parse program arguments from yml file

2022-08-03 Thread Chesnay Schepler (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-28784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17574581#comment-17574581
 ] 

Chesnay Schepler commented on FLINK-28784:
--

I would be against this. Arguably Flink should have never offered a tool to 
parse command-line arguments in the first place and should have left that to 
other dedicated projects.

> ParameterTool support read and parse program arguments from yml file
> 
>
> Key: FLINK-28784
> URL: https://issues.apache.org/jira/browse/FLINK-28784
> Project: Flink
>  Issue Type: Improvement
>  Components: API / Core
>Affects Versions: 1.15.1
>Reporter: Aiden Gong
>Priority: Critical
> Fix For: 1.15.2
>
>
> ParameterTool supports reading and parsing program arguments from different 
> sources, but it doesn't support `yml` files.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (FLINK-28653) State Schema Evolution does not work - Flink defaults to Kryo serialization even for POJOs and Avro SpecificRecords

2022-08-02 Thread Chesnay Schepler (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-28653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17574259#comment-17574259
 ] 

Chesnay Schepler edited comment on FLINK-28653 at 8/2/22 2:21 PM:
--

The issue could be that your aggregate function internally uses a {{List<T>}}, and 
there's no way for Flink to know what type T is. The {{returns}} call only locks in 
the output type of the AggregateFunction, but not the accumulator type. AFAICT 
there's also nothing in the API that allows this to be set.

IOW, you may have to remove the generic parameter from the buffer, or (not sure if 
this works) use a similar trick as we do for OutputTags and create an anonymous 
class.
{{.aggregate(new Buffer(){})}}


was (Author: zentol):
The issue is that your aggregate function internally uses a {{List<T>}}, and 
there's no way for Flink to know what type T is. AFAICT there's also nothing in 
the API that allows this to be set.

IOW, you may have to remove the generic parameter from the buffer, or (not sure if 
this works) use a similar trick as we do for OutputTags and create an anonymous 
class.
{{.aggregate(new Buffer(){})}}
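
For reference, a plain-Java sketch of why the anonymous-class trick can work: a type argument is erased from a plain instantiation but preserved in an anonymous subclass's generic superclass signature, which is what type extraction reads for things like OutputTag.

{code:java}
import java.lang.reflect.ParameterizedType;
import java.util.ArrayList;

public class TypeCaptureDemo {
    public static void main(String[] args) {
        ArrayList<String> plain = new ArrayList<>();          // String is erased
        ArrayList<String> anon = new ArrayList<String>() {};  // String is recorded

        ParameterizedType plainSuper =
                (ParameterizedType) plain.getClass().getGenericSuperclass();
        System.out.println(plainSuper.getActualTypeArguments()[0]); // E (type variable)

        ParameterizedType anonSuper =
                (ParameterizedType) anon.getClass().getGenericSuperclass();
        System.out.println(anonSuper.getActualTypeArguments()[0]); // class java.lang.String
    }
}
{code}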

> State Schema Evolution does not work - Flink defaults to Kryo serialization 
> even for POJOs and Avro SpecificRecords
> ---
>
> Key: FLINK-28653
> URL: https://issues.apache.org/jira/browse/FLINK-28653
> Project: Flink
>  Issue Type: Bug
>  Components: API / Type Serialization System, Runtime / State Backends
>Affects Versions: 1.14.3, 1.15.0
> Environment: I ran the job on a Flink cluster I spun up using docker 
> compose:
> ```
> version: "2.2"
> services:
>   jobmanager:
>     image: flink:latest
>     ports:
>       - "8081:8081"
>     command: jobmanager
>     environment:
>       - |
>         FLINK_PROPERTIES=
>         jobmanager.rpc.address: jobmanager
>   taskmanager:
>     image: flink:latest
>     depends_on:
>       - jobmanager
>     command: taskmanager
>     scale: 1
>     environment:
>       - |
>         FLINK_PROPERTIES=
>         jobmanager.rpc.address: jobmanager
>         taskmanager.numberOfTaskSlots: 2
> ```
>  My machine is a MacBook Pro (14-inch, 2021) with the Apple M1 Pro chip.
> I'm running macOS Monterey Version 12.4.
>Reporter: Peleg Tsadok
>Priority: Major
>  Labels: KryoSerializer, State, avro, pojo, schema-evolution
>
> I am trying to do a POC of Flink State Schema Evolution. I am using Flink 
> 1.15.0 and Java 11 but also tested on Flink 1.14.3.
> I tried to create 3 data classes - one for each serialization type:
> 1. `io.peleg.kryo.User` - Uses `java.time.Instant` class which I know is not 
> supported for POJO serialization in Flink.
> 2. `io.peleg.pojo.User` - Uses only classic wrapped primitives - `Integer`, 
> `Long`, `String`. The getters, setters and constructors are generated using 
> Lombok.
> 3. `io.peleg.avro.User` - Generated from Avro schema using Avro Maven Plugin.
> For each class I wrote a stream job that uses a time window to buffer 
> elements and turn them into a list.
> For each class I tried to do the following:
> 1. Run a job
> 2. Stop with savepoint
> 3. Add a field to the data class
> 4. Submit using savepoint
> For all data classes, submitting with a savepoint failed with this exception:
> {code:java}
> java.lang.Exception: Exception while creating StreamOperatorStateContext.
>     at 
> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:255)
>     at 
> org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(AbstractStreamOperator.java:268)
>     at 
> org.apache.flink.streaming.runtime.tasks.RegularOperatorChain.initializeStateAndOpenOperators(RegularOperatorChain.java:106)
>     at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.restoreGates(StreamTask.java:700)
>     at 
> org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$1.call(StreamTaskActionExecutor.java:55)
>     at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.restoreInternal(StreamTask.java:676)
>     at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.restore(StreamTask.java:643)
>     at 
> org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java:948)
>     at 
> org.apache.flink.runtime.taskmanager.Task.restoreAndInvoke(Task.java:917)
>     at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:741)
>     at org.apache.flink.runtime.taskmanager.Task.run(Task.java:563)
>     at java.base/java.lang.Thread.run(Unknown Source)
> Caused by: org.apache.flink.util.FlinkException: Could not restore keyed 
> state backend for WindowOperator_3983d6bb2f0a45b638461bc99138f22f_(2/2) from 
> any of the 1 provided restore options.

[jira] [Commented] (FLINK-28653) State Schema Evolution does not work - Flink defaults to Kryo serialization even for POJOs and Avro SpecificRecords

2022-08-02 Thread Chesnay Schepler (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-28653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17574259#comment-17574259
 ] 

Chesnay Schepler commented on FLINK-28653:
--

The issue is that your aggregate function internally uses a {{List<T>}}, and 
there's no way for Flink to know what type T is. AFAICT there's also nothing in 
the API that allows this to be set.

IOW, you may have to remove the generic parameter from the buffer, or (not sure if 
this works) use a similar trick as we do for OutputTags and create an anonymous 
class.
{{.aggregate(new Buffer(){})}}

> State Schema Evolution does not work - Flink defaults to Kryo serialization 
> even for POJOs and Avro SpecificRecords
> ---
>
> Key: FLINK-28653
> URL: https://issues.apache.org/jira/browse/FLINK-28653
> Project: Flink
>  Issue Type: Bug
>  Components: API / Type Serialization System, Runtime / State Backends
>Affects Versions: 1.14.3, 1.15.0
> Environment: I ran the job on a Flink cluster I spun up using docker 
> compose:
> ```
> version: "2.2"
> services:
>   jobmanager:
>     image: flink:latest
>     ports:
>       - "8081:8081"
>     command: jobmanager
>     environment:
>       - |
>         FLINK_PROPERTIES=
>         jobmanager.rpc.address: jobmanager
>   taskmanager:
>     image: flink:latest
>     depends_on:
>       - jobmanager
>     command: taskmanager
>     scale: 1
>     environment:
>       - |
>         FLINK_PROPERTIES=
>         jobmanager.rpc.address: jobmanager
>         taskmanager.numberOfTaskSlots: 2
> ```
>  My machine is a MacBook Pro (14-inch, 2021) with the Apple M1 Pro chip.
> I'm running macOS Monterey Version 12.4.
>Reporter: Peleg Tsadok
>Priority: Major
>  Labels: KryoSerializer, State, avro, pojo, schema-evolution
>
> I am trying to do a POC of Flink State Schema Evolution. I am using Flink 
> 1.15.0 and Java 11 but also tested on Flink 1.14.3.
> I tried to create 3 data classes - one for each serialization type:
> 1. `io.peleg.kryo.User` - Uses `java.time.Instant` class which I know is not 
> supported for POJO serialization in Flink.
> 2. `io.peleg.pojo.User` - Uses only classic wrapped primitives - `Integer`, 
> `Long`, `String`. The getters, setters and constructors are generated using 
> Lombok.
> 3. `io.peleg.avro.User` - Generated from Avro schema using Avro Maven Plugin.
> For each class I wrote a stream job that uses a time window to buffer 
> elements and turn them into a list.
> For each class I tried to do the following:
> 1. Run a job
> 2. Stop with savepoint
> 3. Add a field to the data class
> 4. Submit using savepoint
> For all data classes, submitting with a savepoint failed with this exception:
> {code:java}
> java.lang.Exception: Exception while creating StreamOperatorStateContext.
>     at 
> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:255)
>     at 
> org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(AbstractStreamOperator.java:268)
>     at 
> org.apache.flink.streaming.runtime.tasks.RegularOperatorChain.initializeStateAndOpenOperators(RegularOperatorChain.java:106)
>     at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.restoreGates(StreamTask.java:700)
>     at 
> org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$1.call(StreamTaskActionExecutor.java:55)
>     at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.restoreInternal(StreamTask.java:676)
>     at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.restore(StreamTask.java:643)
>     at 
> org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java:948)
>     at 
> org.apache.flink.runtime.taskmanager.Task.restoreAndInvoke(Task.java:917)
>     at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:741)
>     at org.apache.flink.runtime.taskmanager.Task.run(Task.java:563)
>     at java.base/java.lang.Thread.run(Unknown Source)
> Caused by: org.apache.flink.util.FlinkException: Could not restore keyed 
> state backend for WindowOperator_3983d6bb2f0a45b638461bc99138f22f_(2/2) from 
> any of the 1 provided restore options.
>     at 
> org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:160)
>     at 
> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.keyedStatedBackend(StreamTaskStateInitializerImpl.java:346)
>     at 
> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:164)
>     ... 11 more
> Caused by: org.apache.flink.runtime.state.BackendBuildingException: Failed 
> when trying to restore hea

[jira] [Updated] (FLINK-25962) Flink generated Avro schemas can't be parsed using Python

2022-08-02 Thread Chesnay Schepler (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-25962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chesnay Schepler updated FLINK-25962:
-
Component/s: Formats (JSON, Avro, Parquet, ORC, SequenceFile)

> Flink generated Avro schemas can't be parsed using Python
> -
>
> Key: FLINK-25962
> URL: https://issues.apache.org/jira/browse/FLINK-25962
> Project: Flink
>  Issue Type: Bug
>  Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile)
>Affects Versions: 1.14.3
>Reporter: Ryan Skraba
>Assignee: Ryan Skraba
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.16.0
>
>
> Flink currently generates Avro schemas as records with the top-level name 
> {{"record"}}
> Unfortunately, there is some inconsistency between Avro implementations in 
> different languages that may prevent this record from being read, notably 
> Python, which generates the error:
> *avro.schema.SchemaParseException: record is a reserved type name*
> (See the comment on FLINK-18096 for the full stack trace).
> The Java SDK accepts this name, and there's an [ongoing 
> discussion|https://lists.apache.org/thread/0wmgyx6z69gy07lvj9ndko75752b8cn2] 
> about what the expected behaviour should be.  This should be clarified and 
> fixed in Avro, of course.
> Regardless of the resolution, the best practice (which is used almost 
> everywhere else in the Flink codebase) is to explicitly specify a top-level 
> namespace for an Avro record. We should use a default like: 
> {{org.apache.flink.avro.generated}}.
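
For illustration, a sketch with Avro's SchemaBuilder (the field is made up; the namespace is the default proposed above):

{code:java}
import org.apache.avro.Schema;
import org.apache.avro.SchemaBuilder;

public class NamespacedSchemaSketch {
    public static void main(String[] args) {
        // An explicit namespace qualifies the record's full name, avoiding the
        // bare reserved name "record" described above.
        Schema schema = SchemaBuilder
                .record("record")
                .namespace("org.apache.flink.avro.generated")
                .fields()
                .requiredString("name") // hypothetical field
                .endRecord();
        System.out.println(schema.toString(true));
    }
}
{code}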



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (FLINK-25962) Flink generated Avro schemas can't be parsed using Python

2022-08-02 Thread Chesnay Schepler (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-25962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chesnay Schepler closed FLINK-25962.

Fix Version/s: 1.16.0
 Release Note: Avro schemas generated by Flink now use the 
"org.apache.flink.avro.generated" namespace for compatibility with the Avro 
Python SDK.
   Resolution: Fixed

master: 2c58dca500d0ec4f5d80852aa96ddb9c06ae4d61

> Flink generated Avro schemas can't be parsed using Python
> -
>
> Key: FLINK-25962
> URL: https://issues.apache.org/jira/browse/FLINK-25962
> Project: Flink
>  Issue Type: Bug
>Affects Versions: 1.14.3
>Reporter: Ryan Skraba
>Assignee: Ryan Skraba
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.16.0
>
>
> Flink currently generates Avro schemas as records with the top-level name 
> {{"record"}}
> Unfortunately, there is some inconsistency between Avro implementations in 
> different languages that may prevent this record from being read, notably 
> Python, which generates the error:
> *avro.schema.SchemaParseException: record is a reserved type name*
> (See the comment on FLINK-18096 for the full stack trace).
> The Java SDK accepts this name, and there's an [ongoing 
> discussion|https://lists.apache.org/thread/0wmgyx6z69gy07lvj9ndko75752b8cn2] 
> about what the expected behaviour should be.  This should be clarified and 
> fixed in Avro, of course.
> Regardless of the resolution, the best practice (which is used almost 
> everywhere else in the Flink codebase) is to explicitly specify a top-level 
> namespace for an Avro record. We should use a default like: 
> {{org.apache.flink.avro.generated}}.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-28719) Mapping a data source before window aggregation causes Flink to stop handling late events correctly.

2022-08-01 Thread Chesnay Schepler (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-28719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17573809#comment-17573809
 ] 

Chesnay Schepler commented on FLINK-28719:
--

So let's dive into what is happening. The source runs with a parallelism of 1, 
and let's say for simplicity that the map runs with p=7 (one subtask for each 
record).
The source sends one record to one map subtask, and then the watermark to all map 
subtasks. These just forward what they receive to the window operator.

We can thus visualize the inputs to the window function as a series of stacks, 
one for each map subtask.

(X is an element where the timestamp is X, WY is a watermark with timestamp Y).
||op1||op2||op3||op4||op5||op6||op7||
|W7|W7|W7|W7|W7|W7|W7|
| | | | | | |7|
|W4|W4|W4|W4|W4|W4|W4|
| | | | | |4| |
|W3|W3|W3|W3|W3|W3|W3|
| | | | |3| | |
|W6|W6|W6|W6|W6|W6|W6|
| | | |6| | | |
|W5|W5|W5|W5|W5|W5|W5|
| | |5| | | | |
|W2|W2|W2|W2|W2|W2|W2|
| |2| | | | | |
|W1|W1|W1|W1|W1|W1|W1|
|1| | | | | | |

Now consume records in an arbitrary order (i.e., pull data from the bottom of a 
stack until you reach the value). Whenever the watermark from each input is 
greater than or equal to T, the window considers T as the current time.

Let's say we consume the input of op1 and op2 completely, and from all other 
inputs consume all watermarks until we hit a record in each.

The inputs then only contain this:
||op1||op2||op3||op4||op5||op6||op7||
| | |W7|W7|W7|W7|W7|
| | | | | | |7|
| | |W4|W4|W4|W4| |
| | | | | |4| |
| | |W3|W3|W3| | |
| | | | |3| | |
| | |W6|W6| | | |
| | | |6| | | |
| | |W5| | | | |
| | |5| | | | |

Whereas the current watermarks for each input are these:
|W7|W7|W2|W5|W6|W6|W6|

Since 2 is the current watermark, we can fire the first window (from timestamp 
0-2).

This is the point where things get interesting. The next window to be fired 
ranges from 3-5.

If we now consume an element from op3, then we read the element 5 and the 
watermark 5.

Our watermarks would then look like this:
|W7|W7|W5|W5|W6|W6|W6|

So we fire the next window, only containing element 5.

However, let's rewind, and instead first read records from other streams, 
like elements 4 and 3.
These still fit into the current window (3-5)
||op1||op2||op3||op4||op5||op6||op7||
| | |W7|W7| | |W7|
| | | | | | |7|
| | |W4|W4| | | |
| | | | | | | |
| | |W3|W3| | | |
| | | | | | | |
| | |W6|W6| | | |
| | | |6| | | |
| | |W5| | | | |
| | |5| | | | |
Now let's look at the watermarks:
|W7|W7|W2|W5|W7|W7|W6|

Since we haven't read from op3, the current time is still 2, so these elements 
are not considered late.
Now we read 5, bumping the time to 5 and firing the window, which contains 3, 4 and 5.
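
The rule driving all of the above fits in a few lines; a sketch using the watermark values from the first table (everything else here is assumed for illustration):

{code:java}
import java.util.Arrays;

public class MinWatermarkDemo {
    public static void main(String[] args) {
        // Current watermark of each of the seven input channels, as above.
        long[] channelWatermarks = {7, 7, 2, 5, 6, 6, 6};
        // The operator's event time is the minimum across all inputs.
        long eventTime = Arrays.stream(channelWatermarks).min().getAsLong();
        System.out.println(eventTime); // 2 -> only the window [0, 2] may fire yet
    }
}
{code}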

> Mapping a data source before window aggregation causes Flink to stop handling 
> late events correctly.
> --
>
> Key: FLINK-28719
> URL: https://issues.apache.org/jira/browse/FLINK-28719
> Project: Flink
>  Issue Type: Bug
>  Components: API / DataStream
>Affects Versions: 1.15.1
>Reporter: Mykyta Mykhailenko
>Priority: Major
>
> I have created a 
> [repository|https://github.com/mykytamykhailenko/flink-map-with-issue] where 
> I describe this issue in detail. 
> I have provided a few tests and source code so that you can reproduce the 
> issue on your own machine. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-28719) Mapping a data source before window aggregation causes Flink to stop handling late events correctly.

2022-08-01 Thread Chesnay Schepler (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-28719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17573804#comment-17573804
 ] 

Chesnay Schepler commented on FLINK-28719:
--

Your test only works when the elements arrive at the window function in the 
same order as in the original sequence.

But this is not the case since you are not setting a parallelism and the 
MiniClusterWithClientResource isn't used properly; it will use some parallelism 
> 1, causing the {{mapWith}} to effectively shuffle values.

This results in the record/watermark streams being interleaved, which produces 
non-deterministic output.
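
A minimal sketch of one way to make such a test deterministic (assuming the job builds its pipeline from this environment):

{code:java}
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class DeterministicTestSetup {
    public static void main(String[] args) {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // With a single subtask end-to-end there is only one input channel, so
        // the window operator sees records and watermarks in the original order.
        env.setParallelism(1);
    }
}
{code}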

> Mapping a data source before window aggregation causes Flink to stop handling 
> late events correctly.
> --
>
> Key: FLINK-28719
> URL: https://issues.apache.org/jira/browse/FLINK-28719
> Project: Flink
>  Issue Type: Bug
>  Components: API / DataStream
>Affects Versions: 1.15.1
>Reporter: Mykyta Mykhailenko
>Priority: Major
>
> I have created a 
> [repository|https://github.com/mykytamykhailenko/flink-map-with-issue] where 
> I describe this issue in detail. 
> I have provided a few tests and source code so that you can reproduce the 
> issue on your own machine. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (FLINK-28653) State Schema Evolution does not work - Flink defaults to Kryo serialization even for POJOs and Avro SpecificRecords

2022-07-29 Thread Chesnay Schepler (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-28653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17572977#comment-17572977
 ] 

Chesnay Schepler edited comment on FLINK-28653 at 7/29/22 1:17 PM:
---

You're also using too many generic parameters; Flink can't infer the type that 
the functions consume, so Kryo is/should be used for everything.

Because the JobRunner has a parameter T, Flink doesn't know anything about the 
actual type (because at runtime it's just {{Object}}). In contrast, if 
JobRunner used the type {{User}} Flink could infer more about the data (e.g., 
that it's a POJO).


was (Author: zentol):
You're also using too many generic parameters; Flink can't infer the type that 
the functions consume, so Kryo is/should be used for everything.

Because the JobRunner has a parameter T, Flink doesn't know anything about the 
actual type (because at runtime it's just {{Object}}). In contrast, if 
JobRunner used the type User Flink could infer more about the data (e.g., that 
it's a POJO).
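
One conventional way to hand Flink the type explicitly when a generic parameter hides it; a sketch with assumed names, not the POC's actual code:

{code:java}
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;

public class ExplicitTypeSketch {

    /** Inside a generic helper, T is erased; passing TypeInformation restores it. */
    static <T> SingleOutputStreamOperator<T> forward(
            DataStream<T> input, TypeInformation<T> type) {
        return input.map(value -> value).returns(type); // avoids GenericTypeInfo/Kryo
    }
}
{code}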

> State Schema Evolution does not work - Flink defaults to Kryo serialization 
> even for POJOs and Avro SpecificRecords
> ---
>
> Key: FLINK-28653
> URL: https://issues.apache.org/jira/browse/FLINK-28653
> Project: Flink
>  Issue Type: Bug
>  Components: API / Type Serialization System, Runtime / State Backends
>Affects Versions: 1.14.3, 1.15.0
> Environment: I ran the job on a Flink cluster I spun up using docker 
> compose:
> ```
> version: "2.2"
> services:
>   jobmanager:
>     image: flink:latest
>     ports:
>       - "8081:8081"
>     command: jobmanager
>     environment:
>       - |
>         FLINK_PROPERTIES=
>         jobmanager.rpc.address: jobmanager
>   taskmanager:
>     image: flink:latest
>     depends_on:
>       - jobmanager
>     command: taskmanager
>     scale: 1
>     environment:
>       - |
>         FLINK_PROPERTIES=
>         jobmanager.rpc.address: jobmanager
>         taskmanager.numberOfTaskSlots: 2
> ```
>  My machine is a MacBook Pro (14-inch, 2021) with the Apple M1 Pro chip.
> I'm running macOS Monterey Version 12.4.
>Reporter: Peleg Tsadok
>Priority: Major
>  Labels: KryoSerializer, State, avro, pojo, schema-evolution
>
> I am trying to do a POC of Flink State Schema Evolution. I am using Flink 
> 1.15.0 and Java 11 but also tested on Flink 1.14.3.
> I tried to create 3 data classes - one for each serialization type:
> 1. `io.peleg.kryo.User` - Uses `java.time.Instant` class which I know is not 
> supported for POJO serialization in Flink.
> 2. `io.peleg.pojo.User` - Uses only classic wrapped primitives - `Integer`, 
> `Long`, `String`. The getters, setters and constructors are generated using 
> Lombok.
> 3. `io.peleg.avro.User` - Generated from Avro schema using Avro Maven Plugin.
> For each class I wrote a stream job that uses a time window to buffer 
> elements and turn them into a list.
> For each class I tried to do the following:
> 1. Run a job
> 2. Stop with savepoint
> 3. Add a field to the data class
> 4. Submit using savepoint
> For all data classes, submitting with a savepoint failed with this exception:
> {code:java}
> java.lang.Exception: Exception while creating StreamOperatorStateContext.
>     at 
> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:255)
>     at 
> org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(AbstractStreamOperator.java:268)
>     at 
> org.apache.flink.streaming.runtime.tasks.RegularOperatorChain.initializeStateAndOpenOperators(RegularOperatorChain.java:106)
>     at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.restoreGates(StreamTask.java:700)
>     at 
> org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$1.call(StreamTaskActionExecutor.java:55)
>     at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.restoreInternal(StreamTask.java:676)
>     at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.restore(StreamTask.java:643)
>     at 
> org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java:948)
>     at 
> org.apache.flink.runtime.taskmanager.Task.restoreAndInvoke(Task.java:917)
>     at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:741)
>     at org.apache.flink.runtime.taskmanager.Task.run(Task.java:563)
>     at java.base/java.lang.Thread.run(Unknown Source)
> Caused by: org.apache.flink.util.FlinkException: Could not restore keyed 
> state backend for WindowOperator_3983d6bb2f0a45b638461bc99138f22f_(2/2) from 
> any of the 1 provided restore options.
>     at 
> org.apache.flink.streaming.api.operators.BackendRestorerProcedure.cre

[jira] [Commented] (FLINK-28653) State Schema Evolution does not work - Flink defaults to Kryo serialization even for POJOs and Avro SpecificRecords

2022-07-29 Thread Chesnay Schepler (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-28653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17572977#comment-17572977
 ] 

Chesnay Schepler commented on FLINK-28653:
--

You're also using too many generic parameters; Flink can't infer the type that 
the functions consume, so Kryo is/should be used for everything.

Because the JobRunner has a parameter T, Flink doesn't know anything about the 
actual type (because at runtime it's just {{Object}}). In contrast, if 
JobRunner used the type User Flink could infer more about the data (e.g., that 
it's a POJO).

> State Schema Evolution does not work - Flink defaults to Kryo serialization 
> even for POJOs and Avro SpecificRecords
> ---
>
> Key: FLINK-28653
> URL: https://issues.apache.org/jira/browse/FLINK-28653
> Project: Flink
>  Issue Type: Bug
>  Components: API / Type Serialization System, Runtime / State Backends
>Affects Versions: 1.14.3, 1.15.0
> Environment: I ran the job on a Flink cluster I spun up using docker 
> compose:
> ```
> version: "2.2"
> services:
>   jobmanager:
>     image: flink:latest
>     ports:
>       - "8081:8081"
>     command: jobmanager
>     environment:
>       - |
>         FLINK_PROPERTIES=
>         jobmanager.rpc.address: jobmanager
>   taskmanager:
>     image: flink:latest
>     depends_on:
>       - jobmanager
>     command: taskmanager
>     scale: 1
>     environment:
>       - |
>         FLINK_PROPERTIES=
>         jobmanager.rpc.address: jobmanager
>         taskmanager.numberOfTaskSlots: 2
> ```
>  My machine is a MacBook Pro (14-inch, 2021) with the Apple M1 Pro chip.
> I'm running macOS Monterey Version 12.4.
>Reporter: Peleg Tsadok
>Priority: Major
>  Labels: KryoSerializer, State, avro, pojo, schema-evolution
>
> I am trying to do a POC of Flink State Schema Evolution. I am using Flink 
> 1.15.0 and Java 11 but also tested on Flink 1.14.3.
> I tried to create 3 data classes - one for each serialization type:
> 1. `io.peleg.kryo.User` - Uses `java.time.Instant` class which I know is not 
> supported for POJO serialization in Flink.
> 2. `io.peleg.pojo.User` - Uses only classic wrapped primitives - `Integer`, 
> `Long`, `String`. The getters, setters and constructors are generated using 
> Lombok.
> 3. `io.peleg.avro.User` - Generated from Avro schema using Avro Maven Plugin.
> For each class I wrote a stream job that uses a time window to buffer 
> elements and turn them into a list.
> For each class I tried to do the following:
> 1. Run a job
> 2. Stop with savepoint
> 3. Add a field to the data class
> 4. Submit using savepoint
> For all data classes, submitting with a savepoint failed with this exception:
> {code:java}
> java.lang.Exception: Exception while creating StreamOperatorStateContext.
>     at 
> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:255)
>     at 
> org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(AbstractStreamOperator.java:268)
>     at 
> org.apache.flink.streaming.runtime.tasks.RegularOperatorChain.initializeStateAndOpenOperators(RegularOperatorChain.java:106)
>     at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.restoreGates(StreamTask.java:700)
>     at 
> org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$1.call(StreamTaskActionExecutor.java:55)
>     at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.restoreInternal(StreamTask.java:676)
>     at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.restore(StreamTask.java:643)
>     at 
> org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java:948)
>     at 
> org.apache.flink.runtime.taskmanager.Task.restoreAndInvoke(Task.java:917)
>     at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:741)
>     at org.apache.flink.runtime.taskmanager.Task.run(Task.java:563)
>     at java.base/java.lang.Thread.run(Unknown Source)
> Caused by: org.apache.flink.util.FlinkException: Could not restore keyed 
> state backend for WindowOperator_3983d6bb2f0a45b638461bc99138f22f_(2/2) from 
> any of the 1 provided restore options.
>     at 
> org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:160)
>     at 
> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.keyedStatedBackend(StreamTaskStateInitializerImpl.java:346)
>     at 
> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:164)
>     ... 11 more
> Caused by: org.apache.flink.runtime.state.BackendBuildingException: Failed 
> when trying to restore 

[jira] [Comment Edited] (FLINK-28653) State Schema Evolution does not work - Flink defaults to Kryo serialization even for POJOs and Avro SpecificRecords

2022-07-29 Thread Chesnay Schepler (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-28653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17572958#comment-17572958
 ] 

Chesnay Schepler edited comment on FLINK-28653 at 7/29/22 1:04 PM:
---

-FYI, these lines are _not_ required for the pojo serializer to be used:-

{code:java}
env.getConfig().registerPojoType(User.class);
env.getConfig().disableForceKryo();
{code}

Dang they are actually necessary for state?

[~peleg68] your JobRunner ignores the ExecutionEnvironment argument.
https://github.com/peleg68/flink-state-schema-evolution/blob/main/src/main/java/io/peleg/JobRunner.java#L17



was (Author: zentol):
FYI, these lines are _not_ required for the pojo serializer to be used:

{code:java}
env.getConfig().registerPojoType(User.class);
env.getConfig().disableForceKryo();
{code}


> State Schema Evolution does not work - Flink defaults to Kryo serialization 
> even for POJOs and Avro SpecificRecords
> ---
>
> Key: FLINK-28653
> URL: https://issues.apache.org/jira/browse/FLINK-28653
> Project: Flink
>  Issue Type: Bug
>  Components: API / Type Serialization System, Runtime / State Backends
>Affects Versions: 1.14.3, 1.15.0
> Environment: I ran the job on a Flink cluster I spun up using docker 
> compose:
> ```
> version: "2.2"
> services:
>   jobmanager:
>     image: flink:latest
>     ports:
>       - "8081:8081"
>     command: jobmanager
>     environment:
>       - |
>         FLINK_PROPERTIES=
>         jobmanager.rpc.address: jobmanager
>   taskmanager:
>     image: flink:latest
>     depends_on:
>       - jobmanager
>     command: taskmanager
>     scale: 1
>     environment:
>       - |
>         FLINK_PROPERTIES=
>         jobmanager.rpc.address: jobmanager
>         taskmanager.numberOfTaskSlots: 2
> ```
>  My machine is a MacBook Pro (14-inch, 2021) with the Apple M1 Pro chip.
> I'm running macOS Monterey Version 12.4.
>Reporter: Peleg Tsadok
>Priority: Major
>  Labels: KryoSerializer, State, avro, pojo, schema-evolution
>
> I am trying to do a POC of Flink State Schema Evolution. I am using Flink 
> 1.15.0 and Java 11 but also tested on Flink 1.14.3.
> I tried to create 3 data classes - one for each serialization type:
> 1. `io.peleg.kryo.User` - Uses `java.time.Instant` class which I know is not 
> supported for POJO serialization in Flink.
> 2. `io.peleg.pojo.User` - Uses only classic wrapped primitives - `Integer`, 
> `Long`, `String`. The getters, setters and constructors are generated using 
> Lombok.
> 3. `io.peleg.avro.User` - Generated from Avro schema using Avro Maven Plugin.
> For each class I wrote a stream job that uses a time window to buffer 
> elements and turn them into a list.
> For each class I tried to do the following:
> 1. Run a job
> 2. Stop with savepoint
> 3. Add a field to the data class
> 4. Submit using savepoint
> For all data classes the submit with savepoint failed with this exception:
> {code:java}
> java.lang.Exception: Exception while creating StreamOperatorStateContext.
>     at 
> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:255)
>     at 
> org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(AbstractStreamOperator.java:268)
>     at 
> org.apache.flink.streaming.runtime.tasks.RegularOperatorChain.initializeStateAndOpenOperators(RegularOperatorChain.java:106)
>     at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.restoreGates(StreamTask.java:700)
>     at 
> org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$1.call(StreamTaskActionExecutor.java:55)
>     at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.restoreInternal(StreamTask.java:676)
>     at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.restore(StreamTask.java:643)
>     at 
> org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java:948)
>     at 
> org.apache.flink.runtime.taskmanager.Task.restoreAndInvoke(Task.java:917)
>     at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:741)
>     at org.apache.flink.runtime.taskmanager.Task.run(Task.java:563)
>     at java.base/java.lang.Thread.run(Unknown Source)
> Caused by: org.apache.flink.util.FlinkException: Could not restore keyed 
> state backend for WindowOperator_3983d6bb2f0a45b638461bc99138f22f_(2/2) from 
> any of the 1 provided restore options.
>     at 
> org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:160)
>     at 
> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.keyedStatedBackend(StreamTaskStateInitiali

[jira] [Comment Edited] (FLINK-28653) State Schema Evolution does not work - Flink defaults to Kryo serialization even for POJOs and Avro SpecificRecords

2022-07-29 Thread Chesnay Schepler (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-28653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17572958#comment-17572958
 ] 

Chesnay Schepler edited comment on FLINK-28653 at 7/29/22 12:49 PM:


FYI, these lines are _not_ required for the pojo serializer to be used:

{code:java}
env.getConfig().registerPojoType(User.class);
env.getConfig().disableForceKryo();
{code}



was (Author: zentol):
FYI, if your Lombok class were recognized as a POJO, then your "kryo" job 
would also use the pojo serializer.
These lines are _not_ required for the pojo serializer to be used:

{code:java}
env.getConfig().registerPojoType(User.class);
env.getConfig().disableForceKryo();
{code}


> State Schema Evolution does not work - Flink defaults to Kryo serialization 
> even for POJOs and Avro SpecificRecords
> ---
>
> Key: FLINK-28653
> URL: https://issues.apache.org/jira/browse/FLINK-28653
> Project: Flink
>  Issue Type: Bug
>  Components: API / Type Serialization System, Runtime / State Backends
>Affects Versions: 1.14.3, 1.15.0
> Environment: I ran the job on a Flink cluster I spun up using docker 
> compose:
> ```
> version: "2.2"
> services:
>   jobmanager:
>     image: flink:latest
>     ports:
>       - "8081:8081"
>     command: jobmanager
>     environment:
>       - |
>         FLINK_PROPERTIES=
>         jobmanager.rpc.address: jobmanager
>   taskmanager:
>     image: flink:latest
>     depends_on:
>       - jobmanager
>     command: taskmanager
>     scale: 1
>     environment:
>       - |
>         FLINK_PROPERTIES=
>         jobmanager.rpc.address: jobmanager
>         taskmanager.numberOfTaskSlots: 2
> ```
>  My machine is a MacBook Pro (14-inch, 2021) with the Apple M1 Pro chip.
> I'm running macOS Monterey Version 12.4.
>Reporter: Peleg Tsadok
>Priority: Major
>  Labels: KryoSerializer, State, avro, pojo, schema-evolution
>
> I am trying to do a POC of Flink State Schema Evolution. I am using Flink 
> 1.15.0 and Java 11 but also tested on Flink 1.14.3.
> I tried to create 3 data classes - one for each serialization type:
> 1. `io.peleg.kryo.User` - Uses `java.time.Instant` class which I know is not 
> supported for POJO serialization in Flink.
> 2. `io.peleg.pojo.User` - Uses only classic wrapped primitives - `Integer`, 
> `Long`, `String`. The getters, setters and constructors are generated using 
> Lombok.
> 3. `io.peleg.avro.User` - Generated from Avro schema using Avro Maven Plugin.
> For each class I wrote a stream job that uses a time window to buffer 
> elements and turn them into a list.
> For each class I tried to do the following:
> 1. Run a job
> 2. Stop with savepoint
> 3. Add a field to the data class
> 4. Submit using savepoint
> For all data classes the submit with savepoint failed with this exception:
> {code:java}
> java.lang.Exception: Exception while creating StreamOperatorStateContext.
>     at 
> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:255)
>     at 
> org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(AbstractStreamOperator.java:268)
>     at 
> org.apache.flink.streaming.runtime.tasks.RegularOperatorChain.initializeStateAndOpenOperators(RegularOperatorChain.java:106)
>     at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.restoreGates(StreamTask.java:700)
>     at 
> org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$1.call(StreamTaskActionExecutor.java:55)
>     at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.restoreInternal(StreamTask.java:676)
>     at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.restore(StreamTask.java:643)
>     at 
> org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java:948)
>     at 
> org.apache.flink.runtime.taskmanager.Task.restoreAndInvoke(Task.java:917)
>     at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:741)
>     at org.apache.flink.runtime.taskmanager.Task.run(Task.java:563)
>     at java.base/java.lang.Thread.run(Unknown Source)
> Caused by: org.apache.flink.util.FlinkException: Could not restore keyed 
> state backend for WindowOperator_3983d6bb2f0a45b638461bc99138f22f_(2/2) from 
> any of the 1 provided restore options.
>     at 
> org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:160)
>     at 
> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.keyedStatedBackend(StreamTaskStateInitializerImpl.java:346)
>     at 
> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperat

[jira] [Commented] (FLINK-28653) State Schema Evolution does not work - Flink defaults to Kryo serialization even for POJOs and Avro SpecificRecords

2022-07-29 Thread Chesnay Schepler (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-28653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17572958#comment-17572958
 ] 

Chesnay Schepler commented on FLINK-28653:
--

FYI, if your Lombok class were recognized as a POJO, then your "kryo" job 
would also use the pojo serializer.
These lines are _not_ required for the pojo serializer to be used:

{code:java}
env.getConfig().registerPojoType(User.class);
env.getConfig().disableForceKryo();
{code}
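
For context, the rules the TypeExtractor applies are roughly: the class must be public, have a public no-argument constructor, and every field must be public or exposed via a matching getter/setter pair. A minimal sketch of a class that satisfies them (field names are illustrative):

{code:java}
// Minimal Flink-recognizable POJO: public class, public no-arg constructor,
// and all fields either public or accessible via getter/setter pairs.
public class User {
    public Integer id;
    public String name;

    // Required public no-argument constructor.
    public User() {}

    public User(Integer id, String name) {
        this.id = id;
        this.name = name;
    }
}
{code}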


> State Schema Evolution does not work - Flink defaults to Kryo serialization 
> even for POJOs and Avro SpecificRecords
> ---
>
> Key: FLINK-28653
> URL: https://issues.apache.org/jira/browse/FLINK-28653
> Project: Flink
>  Issue Type: Bug
>  Components: API / Type Serialization System, Runtime / State Backends
>Affects Versions: 1.14.3, 1.15.0
> Environment: I ran the job on a Flink cluster I spun up using docker 
> compose:
> ```
> version: "2.2"
> services:
>   jobmanager:
>     image: flink:latest
>     ports:
>       - "8081:8081"
>     command: jobmanager
>     environment:
>       - |
>         FLINK_PROPERTIES=
>         jobmanager.rpc.address: jobmanager
>   taskmanager:
>     image: flink:latest
>     depends_on:
>       - jobmanager
>     command: taskmanager
>     scale: 1
>     environment:
>       - |
>         FLINK_PROPERTIES=
>         jobmanager.rpc.address: jobmanager
>         taskmanager.numberOfTaskSlots: 2
> ```
>  My machine is a MacBook Pro (14-inch, 2021) with the Apple M1 Pro chip.
> I'm running macOS Monterey Version 12.4.
>Reporter: Peleg Tsadok
>Priority: Major
>  Labels: KryoSerializer, State, avro, pojo, schema-evolution
>
> I am trying to do a POC of Flink State Schema Evolution. I am using Flink 
> 1.15.0 and Java 11 but also tested on Flink 1.14.3.
> I tried to create 3 data classes - one for each serialization type:
> 1. `io.peleg.kryo.User` - Uses `java.time.Instant` class which I know is not 
> supported for POJO serialization in Flink.
> 2. `io.peleg.pojo.User` - Uses only classic wrapped primitives - `Integer`, 
> `Long`, `String`. The getters, setters and constructors are generated using 
> Lombok.
> 3. `io.peleg.avro.User` - Generated from Avro schema using Avro Maven Plugin.
> For each class I wrote a stream job that uses a time window to buffer 
> elements and turn them into a list.
> For each class I tried to do the following:
> 1. Run a job
> 2. Stop with savepoint
> 3. Add a field to the data class
> 4. Submit using savepoint
> For all data classes the submit with savepoint failed with this exception:
> {code:java}
> java.lang.Exception: Exception while creating StreamOperatorStateContext.
>     at 
> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:255)
>     at 
> org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(AbstractStreamOperator.java:268)
>     at 
> org.apache.flink.streaming.runtime.tasks.RegularOperatorChain.initializeStateAndOpenOperators(RegularOperatorChain.java:106)
>     at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.restoreGates(StreamTask.java:700)
>     at 
> org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$1.call(StreamTaskActionExecutor.java:55)
>     at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.restoreInternal(StreamTask.java:676)
>     at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.restore(StreamTask.java:643)
>     at 
> org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java:948)
>     at 
> org.apache.flink.runtime.taskmanager.Task.restoreAndInvoke(Task.java:917)
>     at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:741)
>     at org.apache.flink.runtime.taskmanager.Task.run(Task.java:563)
>     at java.base/java.lang.Thread.run(Unknown Source)
> Caused by: org.apache.flink.util.FlinkException: Could not restore keyed 
> state backend for WindowOperator_3983d6bb2f0a45b638461bc99138f22f_(2/2) from 
> any of the 1 provided restore options.
>     at 
> org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:160)
>     at 
> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.keyedStatedBackend(StreamTaskStateInitializerImpl.java:346)
>     at 
> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:164)
>     ... 11 more
> Caused by: org.apache.flink.runtime.state.BackendBuildingException: Failed 
> when trying to restore heap backend
>     at 
> org.apache.flink.runtime.state.heap.HeapKeyedStateBackendBuilder

[jira] [Comment Edited] (FLINK-28653) State Schema Evolution does not work - Flink defaults to Kryo serialization even for POJOs and Avro SpecificRecords

2022-07-29 Thread Chesnay Schepler (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-28653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17572946#comment-17572946
 ] 

Chesnay Schepler edited comment on FLINK-28653 at 7/29/22 12:14 PM:


[~peleg68] Please run {{TypeInformation.of(User.class);}} for the Lombok POJO 
with INFO logging enabled and check if the TypeExtractor rejects it as a Pojo.


was (Author: zentol):
[~peleg68] Please run {{TypeInformation.of(User.class);}} with INFO logging 
enabled and check if the TypeExtractor rejects it as a Pojo.

> State Schema Evolution does not work - Flink defaults to Kryo serialization 
> even for POJOs and Avro SpecificRecords
> ---
>
> Key: FLINK-28653
> URL: https://issues.apache.org/jira/browse/FLINK-28653
> Project: Flink
>  Issue Type: Bug
>  Components: API / Type Serialization System, Runtime / State Backends
>Affects Versions: 1.14.3, 1.15.0
> Environment: I ran the job on a Flink cluster I spun up using docker 
> compose:
> ```
> version: "2.2"
> services:
>   jobmanager:
>     image: flink:latest
>     ports:
>       - "8081:8081"
>     command: jobmanager
>     environment:
>       - |
>         FLINK_PROPERTIES=
>         jobmanager.rpc.address: jobmanager
>   taskmanager:
>     image: flink:latest
>     depends_on:
>       - jobmanager
>     command: taskmanager
>     scale: 1
>     environment:
>       - |
>         FLINK_PROPERTIES=
>         jobmanager.rpc.address: jobmanager
>         taskmanager.numberOfTaskSlots: 2
> ```
>  My machine is a MacBook Pro (14-inch, 2021) with the Apple M1 Pro chip.
> I'm running macOS Monterey Version 12.4.
>Reporter: Peleg Tsadok
>Priority: Major
>  Labels: KryoSerializer, State, avro, pojo, schema-evolution
>
> I am trying to do a POC of Flink State Schema Evolution. I am using Flink 
> 1.15.0 and Java 11 but also tested on Flink 1.14.3.
> I tried to create 3 data classes - one for each serialization type:
> 1. `io.peleg.kryo.User` - Uses `java.time.Instant` class which I know is not 
> supported for POJO serialization in Flink.
> 2. `io.peleg.pojo.User` - Uses only classic wrapped primitives - `Integer`, 
> `Long`, `String`. The getters, setters and constructors are generated using 
> Lombok.
> 3. `io.peleg.avro.User` - Generated from Avro schema using Avro Maven Plugin.
> For each class I wrote a stream job that uses a time window to buffer 
> elements and turn them into a list.
> For each class I tried to do the following:
> 1. Run a job
> 2. Stop with savepoint
> 3. Add a field to the data class
> 4. Submit using savepoint
> For all data classes the submit with savepoint failed with this exception:
> {code:java}
> java.lang.Exception: Exception while creating StreamOperatorStateContext.
>     at 
> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:255)
>     at 
> org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(AbstractStreamOperator.java:268)
>     at 
> org.apache.flink.streaming.runtime.tasks.RegularOperatorChain.initializeStateAndOpenOperators(RegularOperatorChain.java:106)
>     at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.restoreGates(StreamTask.java:700)
>     at 
> org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$1.call(StreamTaskActionExecutor.java:55)
>     at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.restoreInternal(StreamTask.java:676)
>     at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.restore(StreamTask.java:643)
>     at 
> org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java:948)
>     at 
> org.apache.flink.runtime.taskmanager.Task.restoreAndInvoke(Task.java:917)
>     at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:741)
>     at org.apache.flink.runtime.taskmanager.Task.run(Task.java:563)
>     at java.base/java.lang.Thread.run(Unknown Source)
> Caused by: org.apache.flink.util.FlinkException: Could not restore keyed 
> state backend for WindowOperator_3983d6bb2f0a45b638461bc99138f22f_(2/2) from 
> any of the 1 provided restore options.
>     at 
> org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:160)
>     at 
> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.keyedStatedBackend(StreamTaskStateInitializerImpl.java:346)
>     at 
> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:164)
>     ... 11 more
> Caused by: org.apache.flink.runtime.state.BackendBuildingException: Failed 
> when trying to restore heap backend
>    

[jira] (FLINK-28653) State Schema Evolution does not work - Flink defaults to Kryo serialization even for POJOs and Avro SpecificRecords

2022-07-29 Thread Chesnay Schepler (Jira)


[ https://issues.apache.org/jira/browse/FLINK-28653 ]


Chesnay Schepler deleted comment on FLINK-28653:
--

was (Author: zentol):
Also, are you intentionally not bundling Lombok in your application jar?

> State Schema Evolution does not work - Flink defaults to Kryo serialization 
> even for POJOs and Avro SpecificRecords
> ---
>
> Key: FLINK-28653
> URL: https://issues.apache.org/jira/browse/FLINK-28653
> Project: Flink
>  Issue Type: Bug
>  Components: API / Type Serialization System, Runtime / State Backends
>Affects Versions: 1.14.3, 1.15.0
> Environment: I ran the job on a Flink cluster I spun up using docker 
> compose:
> ```
> version: "2.2"
> services:
>   jobmanager:
>     image: flink:latest
>     ports:
>       - "8081:8081"
>     command: jobmanager
>     environment:
>       - |
>         FLINK_PROPERTIES=
>         jobmanager.rpc.address: jobmanager
>   taskmanager:
>     image: flink:latest
>     depends_on:
>       - jobmanager
>     command: taskmanager
>     scale: 1
>     environment:
>       - |
>         FLINK_PROPERTIES=
>         jobmanager.rpc.address: jobmanager
>         taskmanager.numberOfTaskSlots: 2
> ```
>  My machine is a MacBook Pro (14-inch, 2021) with the Apple M1 Pro chip.
> I'm running macOS Monterey Version 12.4.
>Reporter: Peleg Tsadok
>Priority: Major
>  Labels: KryoSerializer, State, avro, pojo, schema-evolution
>
> I am trying to do a POC of Flink State Schema Evolution. I am using Flink 
> 1.15.0 and Java 11 but also tested on Flink 1.14.3.
> I tried to create 3 data classes - one for each serialization type:
> 1. `io.peleg.kryo.User` - Uses `java.time.Instant` class which I know is not 
> supported for POJO serialization in Flink.
> 2. `io.peleg.pojo.User` - Uses only classic wrapped primitives - `Integer`, 
> `Long`, `String`. The getters, setters and constructors are generated using 
> Lombok.
> 3. `io.peleg.avro.User` - Generated from Avro schema using Avro Maven Plugin.
> For each class I wrote a stream job that uses a time window to buffer 
> elements and turn them into a list.
> For each class I tried to do the following:
> 1. Run a job
> 2. Stop with savepoint
> 3. Add a field to the data class
> 4. Submit using savepoint
> For all data classes the submit with savepoint failed with this exception:
> {code:java}
> java.lang.Exception: Exception while creating StreamOperatorStateContext.
>     at 
> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:255)
>     at 
> org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(AbstractStreamOperator.java:268)
>     at 
> org.apache.flink.streaming.runtime.tasks.RegularOperatorChain.initializeStateAndOpenOperators(RegularOperatorChain.java:106)
>     at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.restoreGates(StreamTask.java:700)
>     at 
> org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$1.call(StreamTaskActionExecutor.java:55)
>     at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.restoreInternal(StreamTask.java:676)
>     at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.restore(StreamTask.java:643)
>     at 
> org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java:948)
>     at 
> org.apache.flink.runtime.taskmanager.Task.restoreAndInvoke(Task.java:917)
>     at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:741)
>     at org.apache.flink.runtime.taskmanager.Task.run(Task.java:563)
>     at java.base/java.lang.Thread.run(Unknown Source)
> Caused by: org.apache.flink.util.FlinkException: Could not restore keyed 
> state backend for WindowOperator_3983d6bb2f0a45b638461bc99138f22f_(2/2) from 
> any of the 1 provided restore options.
>     at 
> org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:160)
>     at 
> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.keyedStatedBackend(StreamTaskStateInitializerImpl.java:346)
>     at 
> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:164)
>     ... 11 more
> Caused by: org.apache.flink.runtime.state.BackendBuildingException: Failed 
> when trying to restore heap backend
>     at 
> org.apache.flink.runtime.state.heap.HeapKeyedStateBackendBuilder.restoreState(HeapKeyedStateBackendBuilder.java:172)
>     at 
> org.apache.flink.runtime.state.heap.HeapKeyedStateBackendBuilder.build(HeapKeyedStateBackendBuilder.java:106)
>     at 
> org.apache.flink.runtime.state.hashmap.HashMapStateBackend.createKeyedStateBackend(HashMapStateBackend.java:143)
>     

[jira] [Commented] (FLINK-28653) State Schema Evolution does not work - Flink defaults to Kryo serialization even for POJOs and Avro SpecificRecords

2022-07-29 Thread Chesnay Schepler (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-28653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17572947#comment-17572947
 ] 

Chesnay Schepler commented on FLINK-28653:
--

Also, are you intentionally not bundling Lombok in your application jar?

> State Schema Evolution does not work - Flink defaults to Kryo serialization 
> even for POJOs and Avro SpecificRecords
> ---
>
> Key: FLINK-28653
> URL: https://issues.apache.org/jira/browse/FLINK-28653
> Project: Flink
>  Issue Type: Bug
>  Components: API / Type Serialization System, Runtime / State Backends
>Affects Versions: 1.14.3, 1.15.0
> Environment: I ran the job on a Flink cluster I spun up using docker 
> compose:
> ```
> version: "2.2"
> services:
>   jobmanager:
>     image: flink:latest
>     ports:
>       - "8081:8081"
>     command: jobmanager
>     environment:
>       - |
>         FLINK_PROPERTIES=
>         jobmanager.rpc.address: jobmanager
>   taskmanager:
>     image: flink:latest
>     depends_on:
>       - jobmanager
>     command: taskmanager
>     scale: 1
>     environment:
>       - |
>         FLINK_PROPERTIES=
>         jobmanager.rpc.address: jobmanager
>         taskmanager.numberOfTaskSlots: 2
> ```
>  My machine is a MacBook Pro (14-inch, 2021) with the Apple M1 Pro chip.
> I'm running macOS Monterey Version 12.4.
>Reporter: Peleg Tsadok
>Priority: Major
>  Labels: KryoSerializer, State, avro, pojo, schema-evolution
>
> I am trying to do a POC of Flink State Schema Evolution. I am using Flink 
> 1.15.0 and Java 11 but also tested on Flink 1.14.3.
> I tried to create 3 data classes - one for each serialization type:
> 1. `io.peleg.kryo.User` - Uses `java.time.Instant` class which I know is not 
> supported for POJO serialization in Flink.
> 2. `io.peleg.pojo.User` - Uses only classic wrapped primitives - `Integer`, 
> `Long`, `String`. The getters, setters and constructors are generated using 
> Lombok.
> 3. `io.peleg.avro.User` - Generated from Avro schema using Avro Maven Plugin.
> For each class I wrote a stream job that uses a time window to buffer 
> elements and turn them into a list.
> For each class I tried to do the following:
> 1. Run a job
> 2. Stop with savepoint
> 3. Add a field to the data class
> 4. Submit using savepoint
> For all data classes the submit with savepoint failed with this exception:
> {code:java}
> java.lang.Exception: Exception while creating StreamOperatorStateContext.
>     at 
> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:255)
>     at 
> org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(AbstractStreamOperator.java:268)
>     at 
> org.apache.flink.streaming.runtime.tasks.RegularOperatorChain.initializeStateAndOpenOperators(RegularOperatorChain.java:106)
>     at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.restoreGates(StreamTask.java:700)
>     at 
> org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$1.call(StreamTaskActionExecutor.java:55)
>     at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.restoreInternal(StreamTask.java:676)
>     at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.restore(StreamTask.java:643)
>     at 
> org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java:948)
>     at 
> org.apache.flink.runtime.taskmanager.Task.restoreAndInvoke(Task.java:917)
>     at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:741)
>     at org.apache.flink.runtime.taskmanager.Task.run(Task.java:563)
>     at java.base/java.lang.Thread.run(Unknown Source)
> Caused by: org.apache.flink.util.FlinkException: Could not restore keyed 
> state backend for WindowOperator_3983d6bb2f0a45b638461bc99138f22f_(2/2) from 
> any of the 1 provided restore options.
>     at 
> org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:160)
>     at 
> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.keyedStatedBackend(StreamTaskStateInitializerImpl.java:346)
>     at 
> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:164)
>     ... 11 more
> Caused by: org.apache.flink.runtime.state.BackendBuildingException: Failed 
> when trying to restore heap backend
>     at 
> org.apache.flink.runtime.state.heap.HeapKeyedStateBackendBuilder.restoreState(HeapKeyedStateBackendBuilder.java:172)
>     at 
> org.apache.flink.runtime.state.heap.HeapKeyedStateBackendBuilder.build(HeapKeyedStateBackendBuilder.java:106)
>     at 
> org.apache.flink.runtime.state.hashmap.H

[jira] [Commented] (FLINK-28653) State Schema Evolution does not work - Flink defaults to Kryo serialization even for POJOs and Avro SpecificRecords

2022-07-29 Thread Chesnay Schepler (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-28653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17572946#comment-17572946
 ] 

Chesnay Schepler commented on FLINK-28653:
--

[~peleg68] Please run {{TypeInformation.of(User.class);}} with INFO logging 
enabled and check if the TypeExtractor rejects it as a Pojo.
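
A minimal way to run that check (assuming INFO logging is configured; {{User}} stands in for the class under test):

{code:java}
import org.apache.flink.api.common.typeinfo.TypeInformation;

public class PojoCheck {
    public static void main(String[] args) {
        // The TypeExtractor logs at INFO level why a class is rejected as a POJO.
        TypeInformation<User> info = TypeInformation.of(User.class);
        // PojoTypeInfo => POJO serializer; GenericTypeInfo => Kryo fallback.
        System.out.println(info.getClass().getSimpleName());
    }
}
{code}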

> State Schema Evolution does not work - Flink defaults to Kryo serialization 
> even for POJOs and Avro SpecificRecords
> ---
>
> Key: FLINK-28653
> URL: https://issues.apache.org/jira/browse/FLINK-28653
> Project: Flink
>  Issue Type: Bug
>  Components: API / Type Serialization System, Runtime / State Backends
>Affects Versions: 1.14.3, 1.15.0
> Environment: I ran the job on a Flink cluster I spun up using docker 
> compose:
> ```
> version: "2.2"
> services:
>   jobmanager:
>     image: flink:latest
>     ports:
>       - "8081:8081"
>     command: jobmanager
>     environment:
>       - |
>         FLINK_PROPERTIES=
>         jobmanager.rpc.address: jobmanager
>   taskmanager:
>     image: flink:latest
>     depends_on:
>       - jobmanager
>     command: taskmanager
>     scale: 1
>     environment:
>       - |
>         FLINK_PROPERTIES=
>         jobmanager.rpc.address: jobmanager
>         taskmanager.numberOfTaskSlots: 2
> ```
>  My machine is a MacBook Pro (14-inch, 2021) with the Apple M1 Pro chip.
> I'm running macOS Monterey Version 12.4.
>Reporter: Peleg Tsadok
>Priority: Major
>  Labels: KryoSerializer, State, avro, pojo, schema-evolution
>
> I am trying to do a POC of Flink State Schema Evolution. I am using Flink 
> 1.15.0 and Java 11 but also tested on Flink 1.14.3.
> I tried to create 3 data classes - one for each serialization type:
> 1. `io.peleg.kryo.User` - Uses `java.time.Instant` class which I know is not 
> supported for POJO serialization in Flink.
> 2. `io.peleg.pojo.User` - Uses only classic wrapped primitives - `Integer`, 
> `Long`, `String`. The getters, setters and constructors are generated using 
> Lombok.
> 3. `io.peleg.avro.User` - Generated from Avro schema using Avro Maven Plugin.
> For each class I wrote a stream job that uses a time window to buffer 
> elements and turn them into a list.
> For each class I tried to do the following:
> 1. Run a job
> 2. Stop with savepoint
> 3. Add a field to the data class
> 4. Submit using savepoint
> For all data classes the submit with savepoint failed with this exception:
> {code:java}
> java.lang.Exception: Exception while creating StreamOperatorStateContext.
>     at 
> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:255)
>     at 
> org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(AbstractStreamOperator.java:268)
>     at 
> org.apache.flink.streaming.runtime.tasks.RegularOperatorChain.initializeStateAndOpenOperators(RegularOperatorChain.java:106)
>     at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.restoreGates(StreamTask.java:700)
>     at 
> org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$1.call(StreamTaskActionExecutor.java:55)
>     at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.restoreInternal(StreamTask.java:676)
>     at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.restore(StreamTask.java:643)
>     at 
> org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java:948)
>     at 
> org.apache.flink.runtime.taskmanager.Task.restoreAndInvoke(Task.java:917)
>     at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:741)
>     at org.apache.flink.runtime.taskmanager.Task.run(Task.java:563)
>     at java.base/java.lang.Thread.run(Unknown Source)
> Caused by: org.apache.flink.util.FlinkException: Could not restore keyed 
> state backend for WindowOperator_3983d6bb2f0a45b638461bc99138f22f_(2/2) from 
> any of the 1 provided restore options.
>     at 
> org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:160)
>     at 
> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.keyedStatedBackend(StreamTaskStateInitializerImpl.java:346)
>     at 
> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:164)
>     ... 11 more
> Caused by: org.apache.flink.runtime.state.BackendBuildingException: Failed 
> when trying to restore heap backend
>     at 
> org.apache.flink.runtime.state.heap.HeapKeyedStateBackendBuilder.restoreState(HeapKeyedStateBackendBuilder.java:172)
>     at 
> org.apache.flink.runtime.state.heap.HeapKeyedStateBackendBuilder.build(HeapKeyedStateBackendBuild

[jira] [Updated] (FLINK-28675) Avro Schemas should eagerly validate that class is SpecificRecord

2022-07-29 Thread Chesnay Schepler (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-28675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chesnay Schepler updated FLINK-28675:
-
Summary: Avro Schemas should eagerly validate that class is SpecificRecord  
(was: AvroDeserializationSchema should eagerly validate that class is 
SpecificRecord)

> Avro Schemas should eagerly validate that class is SpecificRecord
> -
>
> Key: FLINK-28675
> URL: https://issues.apache.org/jira/browse/FLINK-28675
> Project: Flink
>  Issue Type: Improvement
>  Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile)
>Affects Versions: 1.15.1
>Reporter: Chesnay Schepler
>Priority: Major
> Fix For: 1.16.0
>
>
> The AvroDeserializationSchema supports both generic and specific records, 
> with dedicated factory methods.
> It does, however, not validate in any way whether the classes passed to the 
> factory methods are actually generic/specific records, which can result in 
> Flink attempting to read generic records (and failing with an NPE) even 
> though the user asked to read specific records.
> We should validate this eagerly and fail early.
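
A sketch of what such an eager check could look like (the placement and wording are assumptions, not the actual patch):

{code:java}
import org.apache.avro.specific.SpecificRecord;

public final class SpecificRecordCheck {
    // Fail fast in the factory method instead of NPE-ing during deserialization.
    public static <T> Class<T> checkSpecificRecord(Class<T> recordClazz) {
        if (!SpecificRecord.class.isAssignableFrom(recordClazz)) {
            throw new IllegalArgumentException(
                    recordClazz.getName() + " does not implement SpecificRecord;"
                            + " use the generic-record factory method instead.");
        }
        return recordClazz;
    }
}
{code}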



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-27206) Deprecate reflection-based reporter instantiation

2022-07-28 Thread Chesnay Schepler (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-27206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chesnay Schepler updated FLINK-27206:
-
Release Note: 
Configuring reporters by their class has been deprecated. Reporter 
implementations should provide a MetricReporterFactory, and all configurations 
should be migrated to such a factory.
If the reporter is loaded from the plugins directory, setting 
metrics.reporter.reporter_name.class no longer works.

  was:Configuring reporters by their class has been deprecated. Reporter 
implementations should provide a MetricReporterFactory, and all configurations 
should be migrated to such a factory.


> Deprecate reflection-based reporter instantiation
> -
>
> Key: FLINK-27206
> URL: https://issues.apache.org/jira/browse/FLINK-27206
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Metrics
>Reporter: Chesnay Schepler
>Assignee: Chesnay Schepler
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.16.0
>
>
> Metric reporters can currently be instantiated in one of 2 ways:
> a) the reporter class is loaded via reflection
> b) the reporter factory is loaded via reflection/ServiceLoader (aka, plugins)
> All reporters provided by Flink use the factory approach, and it is 
> preferable because it supports plugins. The plugin approach has also been 
> available since 1.11, and I think it's fair to deprecate the old approach by 
> now.
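
For reporter authors, migrating means providing a factory along the lines of the sketch below. The interface and method are from {{org.apache.flink.metrics.reporter}}; the no-op reporter is a placeholder to keep the sketch self-contained:

{code:java}
import java.util.Properties;
import org.apache.flink.metrics.Metric;
import org.apache.flink.metrics.MetricConfig;
import org.apache.flink.metrics.MetricGroup;
import org.apache.flink.metrics.reporter.MetricReporter;
import org.apache.flink.metrics.reporter.MetricReporterFactory;

// Register via META-INF/services/org.apache.flink.metrics.reporter.MetricReporterFactory
// so the reporter can be discovered and loaded as a plugin.
public class NoOpReporterFactory implements MetricReporterFactory {
    @Override
    public MetricReporter createMetricReporter(Properties properties) {
        return new MetricReporter() {
            @Override public void open(MetricConfig config) {}
            @Override public void close() {}
            @Override public void notifyOfAddedMetric(Metric metric, String name, MetricGroup group) {}
            @Override public void notifyOfRemovedMetric(Metric metric, String name, MetricGroup group) {}
        };
    }
}
{code}

The configuration then points at the factory, e.g. something like {{metrics.reporter.myreporter.factory.class: NoOpReporterFactory}} (key pattern per the release note above; the reporter name is illustrative).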



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-28735) Deprecate host/webuiport parameter of jobmanager.sh

2022-07-28 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-28735:


 Summary: Deprecate host/webuiport parameter of jobmanager.sh
 Key: FLINK-28735
 URL: https://issues.apache.org/jira/browse/FLINK-28735
 Project: Flink
  Issue Type: Improvement
  Components: Deployment / Scripts
Reporter: Chesnay Schepler
 Fix For: 1.16.0


If we fix FLINK-28733, we could also deprecate these 2 parameters while we're 
at it, since you can then control them via dynamic properties.

This would also subsume FLINK-21038.
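
For illustration, the positional parameters would map to dynamic properties roughly like this (the config keys are assumptions about what host/webuiport correspond to):

{code:bash}
# Today: positional parameters
./bin/jobmanager.sh start <host> <webuiport>

# After FLINK-28733: the same expressed via dynamic properties
./bin/jobmanager.sh start -D jobmanager.rpc.address=<host> -D rest.port=<webuiport>
{code}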



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-28733) jobmanager.sh should support dynamic properties

2022-07-28 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-28733:


 Summary: jobmanager.sh should support dynamic properties
 Key: FLINK-28733
 URL: https://issues.apache.org/jira/browse/FLINK-28733
 Project: Flink
  Issue Type: Improvement
  Components: Deployment / Scripts
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler
 Fix For: 1.16.0


{{jobmanager.sh}} throws away all arguments after the host/webui-port settings, 
in contrast to other scripts like {{taskmanager.sh}}, {{historyserver.sh}} or 
{{standalone-job.sh}}.

This prevents users from using dynamic properties.
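
For illustration, this is the kind of invocation that works for the other scripts today and that the fix would enable for {{jobmanager.sh}} (config keys are examples):

{code:bash}
# Works today: taskmanager.sh forwards trailing arguments as dynamic properties.
./bin/taskmanager.sh start -D taskmanager.numberOfTaskSlots=4

# Currently broken: jobmanager.sh drops everything after the host/webui-port
# positional arguments, so this -D override never reaches the process.
./bin/jobmanager.sh start -D rest.port=8082
{code}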



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (FLINK-28634) Add a simple Json (De) SerializationSchema

2022-07-28 Thread Chesnay Schepler (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-28634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chesnay Schepler closed FLINK-28634.

Resolution: Fixed

master: ae45bd3c50abf8b2621a5de31410ae381f7ffa04

> Add a simple Json (De) SerializationSchema
> --
>
> Key: FLINK-28634
> URL: https://issues.apache.org/jira/browse/FLINK-28634
> Project: Flink
>  Issue Type: Improvement
>  Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile)
>Reporter: Chesnay Schepler
>Assignee: Chesnay Schepler
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.16.0
>
>
> Add a basic schema to read/write JSON.
> This is so common that users shouldn't have to implement it themselves.
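
A hedged usage sketch (class names assumed from the commit referenced above; {{Event}} is a placeholder POJO):

{code:java}
import org.apache.flink.formats.json.JsonDeserializationSchema;
import org.apache.flink.formats.json.JsonSerializationSchema;

public class JsonSchemaExample {
    // Placeholder POJO for illustration.
    public static class Event {
        public String name;
        public long timestamp;
    }

    public static void main(String[] args) {
        // Source side (e.g. a Kafka source): bytes -> Event via Jackson.
        JsonDeserializationSchema<Event> deser =
                new JsonDeserializationSchema<>(Event.class);
        // Sink side: Event -> bytes via Jackson.
        JsonSerializationSchema<Event> ser = new JsonSerializationSchema<>();
    }
}
{code}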



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-28731) Logging of global config should take dynamic properties into account

2022-07-28 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-28731:


 Summary: Logging of global config should take dynamic properties 
into account
 Key: FLINK-28731
 URL: https://issues.apache.org/jira/browse/FLINK-28731
 Project: Flink
  Issue Type: Improvement
  Components: Runtime / Configuration
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler
 Fix For: 1.16.0


When a Flink process starts, the first thing it does is load the global 
configuration from flink-conf.yaml and log what was read.

When the user sets additional options via dynamic properties, they are not 
reflected in the logged global configuration.
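
For illustration (the exact log wording is an assumption):

{code:bash}
./bin/taskmanager.sh start -D taskmanager.numberOfTaskSlots=4
# Logged on startup, from flink-conf.yaml only:
#   Loading configuration property: taskmanager.numberOfTaskSlots, 2
# The dynamic override to 4 takes effect but is not reflected in this dump.
{code}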



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

