[jira] [Created] (FLINK-21753) Cycle references between memory manager and gc cleaner action
Kezhu Wang created FLINK-21753: -- Summary: Cycle references between memory manager and gc cleaner action Key: FLINK-21753 URL: https://issues.apache.org/jira/browse/FLINK-21753 Project: Flink Issue Type: Bug Components: Runtime / Task Affects Versions: 1.12.2, 1.11.3, 1.10.3 Reporter: Kezhu Wang {{MemoryManager.allocatePages}} uses {{this::releasePage}} as the cleanup action in {{MemorySegmentFactory.allocateOffHeapUnsafeMemory}}, where it becomes the gc cleaner action. This creates a cyclic reference between the memory manager and the gc cleaner *if the allocated memory segment is not freed via {{MemoryManager.release}} in time.* Symptoms differ by version: * Before 1.12.2: memory will not be reclaimed until gc after {{MemoryManager.release}} * 1.12.2: memory will not be reclaimed until {{MemorySegment.free}} or gc after {{MemoryManager.release}} I quote the javadoc of JDK {{java.lang.ref.Cleaner}} here for reference: {quote}The cleaning action is invoked only after the associated object becomes phantom reachable, so it is important that the object implementing the cleaning action does not hold references to the object. In this example, a static class encapsulates the cleaning state and action. An "inner" class, anonymous or not, must not be used because it implicitly contains a reference to the outer instance, preventing it from becoming phantom reachable. The choice of a new cleaner or sharing an existing cleaner is determined by the use case. {quote} See also FLINK-13985 FLINK-21419. I pushed a [test case|https://github.com/kezhuw/flink/commit/9a5d71d3a6be50611cf2f5c65c39f51353167306] to my repository on top of FLINK-21419 (which was merged in 1.12.2 but not earlier) for evaluation. cc [~azagrebin] [~xintongsong] [~trohrmann] [~ykt836] [~nicholasjiang] -- This message was sent by Atlassian Jira (v8.3.4#803005)
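To illustrate the pattern the quoted javadoc recommends, here is a minimal, hypothetical Java sketch (not Flink code; the class and method names are made up): the cleaning state lives in a static nested class, so the cleaning action holds no reference back to the owner. Registering an owner-bound method reference such as {{this::releasePage}} instead would capture the owner and create exactly the cycle described in this issue.

```java
import java.lang.ref.Cleaner;
import java.util.concurrent.atomic.AtomicBoolean;

public class CleanerDemo {
    private static final Cleaner CLEANER = Cleaner.create();

    // Correct pattern: a static class encapsulating only the cleaning state
    // and action, with no reference back to the owning object.
    static final class ReleaseAction implements Runnable {
        private final AtomicBoolean released;

        ReleaseAction(AtomicBoolean released) {
            this.released = released;
        }

        @Override
        public void run() {
            released.set(true); // e.g. return the page to the pool
        }
    }

    static boolean demo() {
        AtomicBoolean released = new AtomicBoolean(false);
        Object segment = new Object(); // stand-in for a memory segment
        // An owner-bound lambda or `this::releasePage` here would capture the
        // owner and keep it reachable through the cleaner.
        Cleaner.Cleanable cleanable = CLEANER.register(segment, new ReleaseAction(released));
        cleanable.clean(); // runs the action at most once; GC would also trigger it
        return released.get();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints "true"
    }
}
```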
[GitHub] [flink] flinkbot edited a comment on pull request #15136: [FLINK-21622][table-planner] Introduce function TO_TIMESTAMP_LTZ(numeric [, precision])
flinkbot edited a comment on pull request #15136: URL: https://github.com/apache/flink/pull/15136#issuecomment-795141489 ## CI report: * 38f1cea9a24f244241f6e2a9f069feb7f774652c Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=14561) * bc9e5155087db2f58663c0f13164d07c1716f7cd Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=14577) * 30df8333ffb8cb76b6bf46043a3cc8dc1b2d8420 UNKNOWN Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run travis` re-run the last Travis build - `@flinkbot run azure` re-run the last Azure build This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] flinkbot edited a comment on pull request #15174: [FLINK-21651][sql-client] Migrate module related tests in LocalExecutorITCase to the new integration test framework
flinkbot edited a comment on pull request #15174: URL: https://github.com/apache/flink/pull/15174#issuecomment-797861091 ## CI report: * faa510486a60de5f71d9cf924ae47fcac04560ff Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=14576) Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=14572)
[GitHub] [flink] flinkbot edited a comment on pull request #15136: [FLINK-21622][table-planner] Introduce function TO_TIMESTAMP_LTZ(numeric [, precision])
flinkbot edited a comment on pull request #15136: URL: https://github.com/apache/flink/pull/15136#issuecomment-795141489 ## CI report: * 38f1cea9a24f244241f6e2a9f069feb7f774652c Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=14561) * bc9e5155087db2f58663c0f13164d07c1716f7cd UNKNOWN
[GitHub] [flink] flinkbot edited a comment on pull request #15121: The method $(String) is undefined for the type TableExample
flinkbot edited a comment on pull request #15121: URL: https://github.com/apache/flink/pull/15121#issuecomment-793480240 ## CI report: * 7383416dfaf59f0aeebc77450b6c3dc9b495789a Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=14571) * 621d7c223c5e8044ecb7965faf086431500f7d63 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=14575)
[GitHub] [flink] flinkbot edited a comment on pull request #15174: [FLINK-21651][sql-client] Migrate module related tests in LocalExecutorITCase to the new integration test framework
flinkbot edited a comment on pull request #15174: URL: https://github.com/apache/flink/pull/15174#issuecomment-797861091 ## CI report: * faa510486a60de5f71d9cf924ae47fcac04560ff Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=14572) Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=14576)
[GitHub] [flink] flinkbot edited a comment on pull request #15121: The method $(String) is undefined for the type TableExample
flinkbot edited a comment on pull request #15121: URL: https://github.com/apache/flink/pull/15121#issuecomment-793480240 ## CI report: * 7383416dfaf59f0aeebc77450b6c3dc9b495789a Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=14571) * 621d7c223c5e8044ecb7965faf086431500f7d63 UNKNOWN
[GitHub] [flink] LadyForest commented on pull request #15174: [FLINK-21651][sql-client] Migrate module related tests in LocalExecutorITCase to the new integration test framework
LadyForest commented on pull request #15174: URL: https://github.com/apache/flink/pull/15174#issuecomment-797876066 @flinkbot run azure
[jira] [Updated] (FLINK-21523) ArrayIndexOutOfBoundsException occurs while run a hive streaming job with partitioned table source
[ https://issues.apache.org/jira/browse/FLINK-21523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jark Wu updated FLINK-21523: Fix Version/s: 1.12.3
> ArrayIndexOutOfBoundsException occurs while run a hive streaming job with partitioned table source
> ---
> Key: FLINK-21523
> URL: https://issues.apache.org/jira/browse/FLINK-21523
> Project: Flink
> Issue Type: Bug
> Components: Connectors / Hive
> Affects Versions: 1.12.1
> Reporter: zouyunhe
> Assignee: zouyunhe
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.13.0, 1.12.3
>
> We have two hive tables; the DDL is as below:
> {code:java}
> //test_tbl5
> create table test.test_5
> (dpi int,
> uid bigint)
> partitioned by( day string, hour string) stored as parquet;
> //test_tbl3
> create table test.test_3(
> dpi int,
> uid bigint,
> itime timestamp) stored as parquet;{code}
> Then we add a partition to test_tbl5:
> {code:java}
> alter table test_tbl5 add partition(day='2021-02-27',hour='12');
> {code}
> We start a flink streaming job to read hive table test_tbl5 and write the data into test_tbl3; the job's sql is:
> {code:java}
> set test_tbl5.streaming-source.enable = true;
> insert into hive.test.test_tbl3 select dpi, uid,
> cast(to_timestamp('2020-08-09 00:00:00') as timestamp(9)) from
> hive.test.test_tbl5 where `day` = '2021-02-27';
> {code}
> And we see the exception thrown:
> {code:java}
> 2021-02-28 22:33:16,553 ERROR org.apache.flink.runtime.source.coordinator.SourceCoordinatorContext - Exception while handling result from async call in SourceCoordinator-Source: HiveSource-test.test_tbl5. Triggering job failover.
> org.apache.flink.connectors.hive.FlinkHiveException: Failed to enumerate files
> at org.apache.flink.connectors.hive.ContinuousHiveSplitEnumerator.handleNewSplits(ContinuousHiveSplitEnumerator.java:152) ~[flink-connector-hive_2.12-1.12.1.jar:1.12.1]
> at org.apache.flink.runtime.source.coordinator.ExecutorNotifier.lambda$null$4(ExecutorNotifier.java:136) ~[flink-dist_2.12-1.12.1.jar:1.12.1]
> at org.apache.flink.util.ThrowableCatchingRunnable.run(ThrowableCatchingRunnable.java:40) [flink-dist_2.12-1.12.1.jar:1.12.1]
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_60]
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_60]
> at java.lang.Thread.run(Thread.java:745) [?:1.8.0_60]
> Caused by: java.lang.ArrayIndexOutOfBoundsException: -1
> at org.apache.flink.connectors.hive.util.HivePartitionUtils.toHiveTablePartition(HivePartitionUtils.java:184) ~[flink-connector-hive_2.12-1.12.1.jar:1.12.1]
> at org.apache.flink.connectors.hive.HiveTableSource$HiveContinuousPartitionFetcherContext.toHiveTablePartition(HiveTableSource.java:417) ~[flink-connector-hive_2.12-1.12.1.jar:1.12.1]
> at org.apache.flink.connectors.hive.ContinuousHiveSplitEnumerator$PartitionMonitor.call(ContinuousHiveSplitEnumerator.java:237) ~[flink-connector-hive_2.12-1.12.1.jar:1.12.1]
> at org.apache.flink.connectors.hive.ContinuousHiveSplitEnumerator$PartitionMonitor.call(ContinuousHiveSplitEnumerator.java:177) ~[flink-connector-hive_2.12-1.12.1.jar:1.12.1]
> at org.apache.flink.runtime.source.coordinator.ExecutorNotifier.lambda$notifyReadyAsync$5(ExecutorNotifier.java:133) ~[flink-dist_2.12-1.12.1.jar:1.12.1]
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[?:1.8.0_60]
> at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) ~[?:1.8.0_60]
> at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) ~[?:1.8.0_60]
> at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) ~[?:1.8.0_60]
> ... 3 more{code}
> It seems the partitioned field is not found in the source table field list.
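The `ArrayIndexOutOfBoundsException: -1` above is consistent with an index lookup that fails when projection push-down removes the partition column from the field list. A hypothetical simplification (made-up names, not the actual Flink code) of that failure mode:

```java
import java.util.Arrays;
import java.util.List;

public class PartitionIndexDemo {

    // Hypothetical simplification of the failing lookup: the partition value
    // is read at the partition column's index within the projected field list.
    static String partitionValue(List<String> projectedFields, String partitionColumn,
                                 String[] rowValues) {
        // indexOf returns -1 when projection push-down removed the column
        int idx = projectedFields.indexOf(partitionColumn);
        return rowValues[idx]; // throws ArrayIndexOutOfBoundsException for idx == -1
    }

    public static void main(String[] args) {
        // Projection kept only `dpi` and `uid`; the partition column `day` is gone.
        List<String> projected = Arrays.asList("dpi", "uid");
        try {
            partitionValue(projected, "day", new String[] {"1", "42"});
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("lookup failed with index -1");
        }
    }
}
```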
[jira] [Comment Edited] (FLINK-21523) ArrayIndexOutOfBoundsException occurs while run a hive streaming job with partitioned table source
[ https://issues.apache.org/jira/browse/FLINK-21523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17300210#comment-17300210 ] Jark Wu edited comment on FLINK-21523 at 3/13/21, 6:19 AM: --- Fixed in - master: 63a6aba6722ae0e3d17381aaeb2fa464ea15d2f5 - release-1.12: 621d7c223c5e8044ecb7965faf086431500f7d63 was (Author: jark): Fixed in master: 63a6aba6722ae0e3d17381aaeb2fa464ea15d2f5
[GitHub] [flink] wuchong merged pull request #15169: [BP-1.12][FLINK-21523][hive] Fix ArrayIndexOutOfBoundsException when running hive partitioned source with projection push down
wuchong merged pull request #15169: URL: https://github.com/apache/flink/pull/15169
[jira] [Commented] (FLINK-21747) Encounter an exception that contains "max key length exceeded ..." when reporting metrics to influxdb
[ https://issues.apache.org/jira/browse/FLINK-21747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17300736#comment-17300736 ] Jark Wu commented on FLINK-21747: - We had a discussion in FLINK-20388. > Encounter an exception that contains "max key length exceeded ..." when > reporting metrics to influxdb > -- > > Key: FLINK-21747 > URL: https://issues.apache.org/jira/browse/FLINK-21747 > Project: Flink > Issue Type: Bug > Components: Runtime / Metrics >Affects Versions: 1.10.0 >Reporter: tim yu >Priority: Major > > I run a streaming job with an insert statement that is too long; the job reports its metrics to influxdb. I find many influxdb exceptions that contain "max key length exceeded ..." in the log file of the job manager. The job could not write any metrics to influxdb because "task_name" and "operator_name" are too long.
[jira] [Assigned] (FLINK-21466) Make "embedded" parameter optional when start sql-client.sh
[ https://issues.apache.org/jira/browse/FLINK-21466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jark Wu reassigned FLINK-21466: --- Assignee: zck > Make "embedded" parameter optional when start sql-client.sh > --- > > Key: FLINK-21466 > URL: https://issues.apache.org/jira/browse/FLINK-21466 > Project: Flink > Issue Type: Sub-task > Components: Table SQL / API >Reporter: Shengkai Fang >Assignee: zck >Priority: Major > Labels: starter > > Users can use the command > {code:java} > >./sql-client.sh embedded -i init1.sql,init2.sql > {code} > to start the sql client and should not need to add {{embedded}}.
[GitHub] [flink] flinkbot edited a comment on pull request #15175: [FLINK-16829] Refactored Prometheus Metric Reporters to use construct…
flinkbot edited a comment on pull request #15175: URL: https://github.com/apache/flink/pull/15175#issuecomment-797871668 ## CI report: * 62141a2aff3348088ceb5eb6c449e9abfe5f9870 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=14573)
[jira] [Assigned] (FLINK-21702) Support option `sql-client.verbose` to print the exception stack
[ https://issues.apache.org/jira/browse/FLINK-21702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jark Wu reassigned FLINK-21702: --- Assignee: zhuxiaoshang > Support option `sql-client.verbose` to print the exception stack > > > Key: FLINK-21702 > URL: https://issues.apache.org/jira/browse/FLINK-21702 > Project: Flink > Issue Type: Sub-task > Components: Table SQL / Client >Affects Versions: 1.13.0 >Reporter: Shengkai Fang >Assignee: zhuxiaoshang >Priority: Major > Labels: starter > > By enabling this option, users can get the full exception stack.
[GitHub] [flink] flinkbot edited a comment on pull request #15175: [FLINK-16829] Refactored Prometheus Metric Reporters to use construct…
flinkbot edited a comment on pull request #15175: URL: https://github.com/apache/flink/pull/15175#issuecomment-797871668 ## CI report: * 62141a2aff3348088ceb5eb6c449e9abfe5f9870 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=14573)
[GitHub] [flink] flinkbot commented on pull request #15175: [FLINK-16829] Refactored Prometheus Metric Reporters to use construct…
flinkbot commented on pull request #15175: URL: https://github.com/apache/flink/pull/15175#issuecomment-797871668 ## CI report: * 62141a2aff3348088ceb5eb6c449e9abfe5f9870 UNKNOWN
[GitHub] [flink] flinkbot edited a comment on pull request #15174: [FLINK-21651][sql-client] Migrate module related tests in LocalExecutorITCase to the new integration test framework
flinkbot edited a comment on pull request #15174: URL: https://github.com/apache/flink/pull/15174#issuecomment-797861091 ## CI report: * faa510486a60de5f71d9cf924ae47fcac04560ff Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=14572)
[GitHub] [flink] flinkbot edited a comment on pull request #15121: The method $(String) is undefined for the type TableExample
flinkbot edited a comment on pull request #15121: URL: https://github.com/apache/flink/pull/15121#issuecomment-793480240 ## CI report: * 7383416dfaf59f0aeebc77450b6c3dc9b495789a Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=14571)
[GitHub] [flink] rionmonster commented on pull request #15175: [FLINK-16829] Refactored Prometheus Metric Reporters to use construct…
rionmonster commented on pull request #15175: URL: https://github.com/apache/flink/pull/15175#issuecomment-797867198 @zentol I decided to make another, different pass at this guy since the last PR seemed to be a bit all over the place. I migrated almost all of the logic to simplify the constructors as much as possible, but I'm sure some additional eyes and feedback would be appreciated. The only thing that really stuck out was the use of the `MetricConfig` that was used under the hood, however the factory methods (e.g. `createMetricReporter(...)`) accept `Properties`, so I added an explicit cast since `MetricConfig` was just a superset of `Properties`. Additionally, I updated 1-2 of the available unit tests to call the static method that was added to one of the factories and adjusted another that assumed an empty constructor for the `PrometheusReporter` which explicitly now requires the `port` and `httpServer` respectively. Perhaps we need another empty constructor as well? Anyways - it's late on this end, but hopefully this is a step in a better direction. Thanks much, Rion
[GitHub] [flink] flinkbot commented on pull request #15175: [FLINK-16829] Refactored Prometheus Metric Reporters to use construct…
flinkbot commented on pull request #15175: URL: https://github.com/apache/flink/pull/15175#issuecomment-797866980 Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community to review your pull request. We will use this comment to track the progress of the review. ## Automated Checks Last check on commit 62141a2aff3348088ceb5eb6c449e9abfe5f9870 (Sat Mar 13 04:54:28 UTC 2021) **Warnings:** * No documentation files were touched! Remember to keep the Flink docs up to date! Mention the bot in a comment to re-run the automated checks. ## Review Progress * ❓ 1. The [description] looks good. * ❓ 2. There is [consensus] that the contribution should go into Flink. * ❓ 3. Needs [attention] from. * ❓ 4. The change fits into the overall [architecture]. * ❓ 5. Overall code [quality] is good. Please see the [Pull Request Review Guide](https://flink.apache.org/contributing/reviewing-prs.html) for a full explanation of the review process. The Bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer or PMC member is required Bot commands The @flinkbot bot supports the following commands: - `@flinkbot approve description` to approve one or more aspects (aspects: `description`, `consensus`, `architecture` and `quality`) - `@flinkbot approve all` to approve all aspects - `@flinkbot approve-until architecture` to approve everything until `architecture` - `@flinkbot attention @username1 [@username2 ..]` to require somebody's attention - `@flinkbot disapprove architecture` to remove an approval you gave earlier
[GitHub] [flink] rionmonster opened a new pull request #15175: [FLINK-16829] Refactored Prometheus Metric Reporters to use construct…
rionmonster opened a new pull request #15175: URL: https://github.com/apache/flink/pull/15175 ## What is the purpose of the change Clean-up for the initialization of the two Prometheus Metric Reporters (`PrometheusReporter` and `PrometheusPushGatewayReporter`) by explicitly passing in arguments to the constructors as opposed to relying on resolving these values via the internal configuration properties. ## Brief change log - Removed argument resolution against defaults from the `open()` functions and instead resolved those within the factories and passed them into the respective reporter instances (roughly modeled after the `JmxReporter` implementation) ## Verifying this change This change is a trivial rework / code cleanup without any test coverage. ## Does this pull request potentially affect one of the following parts: - Dependencies (does it add or upgrade a dependency): (no) - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: (no) - The serializers: (no) - The runtime per-record code paths (performance sensitive): (no) - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn/Mesos, ZooKeeper: (no) - The S3 file system connector: (no) ## Documentation - Does this pull request introduce a new feature? (no) - If yes, how is the feature documented? (not applicable) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
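The constructor-injection pattern this PR describes can be sketched roughly as follows. This is a hypothetical simplification with made-up class names, not the actual Flink code: the factory resolves configuration (including defaults) once from `Properties` (in Flink, `MetricConfig` extends `Properties`) and hands plain values to the reporter constructor, instead of the reporter reading config inside `open()`.

```java
import java.util.Properties;

public class ReporterFactoryDemo {

    // Hypothetical reporter: its dependencies arrive through the constructor,
    // so the instance is fully configured at construction time.
    static final class PrometheusStyleReporter {
        private final int port;

        PrometheusStyleReporter(int port) {
            this.port = port;
        }

        int getPort() {
            return port;
        }
    }

    static final String PORT_KEY = "port";
    static final int DEFAULT_PORT = 9249; // assumed default, as in Flink's Prometheus reporter docs

    // The factory resolves option values and defaults, then passes them in.
    static PrometheusStyleReporter createMetricReporter(Properties properties) {
        int port = Integer.parseInt(
                properties.getProperty(PORT_KEY, String.valueOf(DEFAULT_PORT)));
        return new PrometheusStyleReporter(port);
    }

    public static void main(String[] args) {
        PrometheusStyleReporter withDefault = createMetricReporter(new Properties());
        Properties props = new Properties();
        props.setProperty(PORT_KEY, "9091");
        PrometheusStyleReporter explicit = createMetricReporter(props);
        System.out.println(withDefault.getPort() + " " + explicit.getPort()); // prints "9249 9091"
    }
}
```

This mirrors the approach the PR says it modeled on the `JmxReporter`: defaults live in one place (the factory), and the reporter itself no longer needs a mutable open-then-configure lifecycle.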
[GitHub] [flink] flinkbot edited a comment on pull request #15174: [FLINK-21651][sql-client] Migrate module related tests in LocalExecutorITCase to the new integration test framework
flinkbot edited a comment on pull request #15174: URL: https://github.com/apache/flink/pull/15174#issuecomment-797861091 ## CI report: * faa510486a60de5f71d9cf924ae47fcac04560ff Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=14572)
[GitHub] [flink] flinkbot commented on pull request #15174: [FLINK-21651][sql-client] Migrate module related tests in LocalExecutorITCase to the new integration test framework
flinkbot commented on pull request #15174: URL: https://github.com/apache/flink/pull/15174#issuecomment-797861091 ## CI report: * faa510486a60de5f71d9cf924ae47fcac04560ff UNKNOWN
[GitHub] [flink] flinkbot commented on pull request #15174: [FLINK-21651][sql-client] Migrate module related tests in LocalExecutorITCase to the new integration test framework
flinkbot commented on pull request #15174: URL: https://github.com/apache/flink/pull/15174#issuecomment-797859298 Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community to review your pull request. We will use this comment to track the progress of the review. ## Automated Checks Last check on commit faa510486a60de5f71d9cf924ae47fcac04560ff (Sat Mar 13 03:36:38 UTC 2021) **Warnings:** * No documentation files were touched! Remember to keep the Flink docs up to date! Mention the bot in a comment to re-run the automated checks.
[jira] [Updated] (FLINK-21651) Migrate module-related tests in LocalExecutorITCase to new integration test framework
[ https://issues.apache.org/jira/browse/FLINK-21651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated FLINK-21651: --- Labels: pull-request-available (was: ) > Migrate module-related tests in LocalExecutorITCase to new integration test > framework > - > > Key: FLINK-21651 > URL: https://issues.apache.org/jira/browse/FLINK-21651 > Project: Flink > Issue Type: Sub-task > Components: Table SQL / Client >Affects Versions: 1.13.0 >Reporter: Jane Chan >Assignee: Jane Chan >Priority: Major > Labels: pull-request-available > > Migrate module-related tests in `LocalExecutorITCase` after FLINK-21614 > merged. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [flink] LadyForest opened a new pull request #15174: [FLINK-21651][sql-client] Migrate module related tests in LocalExecutorITCase to the new integration test framework
LadyForest opened a new pull request #15174: URL: https://github.com/apache/flink/pull/15174 ## Brief change log This PR migrates the tests in `LocalExecutorITCase` to `CliClientITCase`, and removes `#listModules` and `#listFullModules` in `LocalExecutor` since they are not needed anymore. ## Verifying this change This change adds a `module.q` script under the `test/resources/sql/` dir and can be verified by running `CliClientITCase`. ## Does this pull request potentially affect one of the following parts: - Dependencies (does it add or upgrade a dependency): (yes / **no**) - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: (yes / **no**) - The serializers: (yes / **no** / don't know) - The runtime per-record code paths (performance sensitive): (yes / **no** / don't know) - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn/Mesos, ZooKeeper: (yes / **no** / don't know) - The S3 file system connector: (yes / **no** / don't know) ## Documentation - Does this pull request introduce a new feature? (yes / **no**) - If yes, how is the feature documented? (**not applicable** / docs / JavaDocs / not documented) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Commented] (FLINK-21466) Make "embedded" parameter optional when start sql-client.sh
[ https://issues.apache.org/jira/browse/FLINK-21466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17300669#comment-17300669 ] zck commented on FLINK-21466: - [~jark] plz assign to me > Make "embedded" parameter optional when start sql-client.sh > --- > > Key: FLINK-21466 > URL: https://issues.apache.org/jira/browse/FLINK-21466 > Project: Flink > Issue Type: Sub-task > Components: Table SQL / API >Reporter: Shengkai Fang >Priority: Major > Labels: starter > > Users can use command > {code:java} > >./sql-client.sh embedded -i init1.sql,init2.sql > {code} > to start the sql client and don't need to add {{embedded}}. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (FLINK-21466) Make "embedded" parameter optional when start sql-client.sh
[ https://issues.apache.org/jira/browse/FLINK-21466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17300668#comment-17300668 ] zck commented on FLINK-21466: - plz assign to me > Make "embedded" parameter optional when start sql-client.sh > --- > > Key: FLINK-21466 > URL: https://issues.apache.org/jira/browse/FLINK-21466 > Project: Flink > Issue Type: Sub-task > Components: Table SQL / API >Reporter: Shengkai Fang >Priority: Major > Labels: starter > > Users can use command > {code:java} > >./sql-client.sh embedded -i init1.sql,init2.sql > {code} > to start the sql client and don't need to add {{embedded}}. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (FLINK-21674) JDBC sink can't get valid connection after 5 minutes using Oracle JDBC driver
[ https://issues.apache.org/jira/browse/FLINK-21674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17300667#comment-17300667 ] Fuyao commented on FLINK-21674: --- I was not aware of this earlier; finally I found someone who knows these configurations. It is a little bit tricky here. > JDBC sink can't get valid connection after 5 minutes using Oracle JDBC driver > - > > Key: FLINK-21674 > URL: https://issues.apache.org/jira/browse/FLINK-21674 > Project: Flink > Issue Type: Bug > Components: Connectors / JDBC >Affects Versions: 1.12.1 > Environment: Flink version: 1.12.1 Scala version: 2.11 Java version: > 1.11 Flink System parallelism: 1 JDBC Driver: Oracle ojdbc10 Database: Oracle > Autonomous Database on Oracle Cloud Infrastructure version 19c (You can regard > this as a cloud-based Oracle Database) > > Flink user mailing list: > http://mail-archives.apache.org/mod_mbox/flink-user/202103.mbox/%3CCH2PR10MB402466373B33A3BBC5635A0AEC8C9%40CH2PR10MB4024.namprd10.prod.outlook.com%3E >Reporter: Fuyao >Priority: Blocker > > I use the JDBCSink.sink() method to sink data to Oracle Autonomous Data Warehouse > with the Oracle JDBC driver. I can sink data into the Oracle Autonomous database > successfully. If there is idle time of over 5 minutes and then an insertion is performed, > the retry mechanism can't reestablish the JDBC connection and it will run into the error > below. I have set the retry count to 3; even after retrying, it will still > fail. Only restarting the application (an automated process) from the checkpoint could solve the > issue. > 11:41:04,872 ERROR > org.apache.flink.connector.jdbc.internal.JdbcBatchingOutputFormat - JDBC > executeBatch error, retry times = 0 > java.sql.BatchUpdateException: IO Error: Broken pipe > It will fail the application and restart from the checkpoint. After restarting > from the checkpoint, the JDBC connection can be established correctly. 
> The connection timeout can be configured by > alter system set MAX_IDLE_TIME=1440; // the connection will time out after > 1440 minutes. > Such a timeout parameter behavior change can be verified in SQL Developer. > However, Flink still got a connection error after 5 minutes even with this configured. > I suspect there is some issue on the Flink side in reading the configuration needed to establish a successful connection. > Full log: > {code:java} > 11:41:04,872 ERROR > org.apache.flink.connector.jdbc.internal.JdbcBatchingOutputFormat - JDBC > executeBatch error, retry times = 0 > java.sql.BatchUpdateException: IO Error: Broken pipe > at > oracle.jdbc.driver.OraclePreparedStatement.executeLargeBatch(OraclePreparedStatement.java:9711) > at > oracle.jdbc.driver.T4CPreparedStatement.executeLargeBatch(T4CPreparedStatement.java:1447) > at > oracle.jdbc.driver.OraclePreparedStatement.executeBatch(OraclePreparedStatement.java:9487) > at > oracle.jdbc.driver.OracleStatementWrapper.executeBatch(OracleStatementWrapper.java:237) > at > org.apache.flink.connector.jdbc.internal.executor.SimpleBatchStatementExecutor.executeBatch(SimpleBatchStatementExecutor.java:73) > at > org.apache.flink.connector.jdbc.internal.JdbcBatchingOutputFormat.attemptFlush(JdbcBatchingOutputFormat.java:216) > at > org.apache.flink.connector.jdbc.internal.JdbcBatchingOutputFormat.flush(JdbcBatchingOutputFormat.java:184) > at > org.apache.flink.connector.jdbc.internal.JdbcBatchingOutputFormat.writeRecord(JdbcBatchingOutputFormat.java:167) > at > org.apache.flink.connector.jdbc.internal.GenericJdbcSinkFunction.invoke(GenericJdbcSinkFunction.java:54) > at > org.apache.flink.streaming.api.operators.StreamSink.processElement(StreamSink.java:54) > at > org.apache.flink.streaming.runtime.tasks.CopyingChainingOutput.pushToOperator(CopyingChainingOutput.java:71) > at > org.apache.flink.streaming.runtime.tasks.CopyingChainingOutput.collect(CopyingChainingOutput.java:46) > at > 
org.apache.flink.streaming.runtime.tasks.CopyingChainingOutput.collect(CopyingChainingOutput.java:26) > at > org.apache.flink.streaming.runtime.tasks.BroadcastingOutputCollector.collect(BroadcastingOutputCollector.java:75) > at > org.apache.flink.streaming.runtime.tasks.BroadcastingOutputCollector.collect(BroadcastingOutputCollector.java:32) > at > org.apache.flink.streaming.api.operators.CountingOutput.collect(CountingOutput.java:50) > at > org.apache.flink.streaming.api.operators.CountingOutput.collect(CountingOutput.java:28) > at > org.apache.flink.streaming.api.operators.TimestampedCollector.collect(TimestampedCollector.java:50) > at > org.myorg.quickstart.processor.InvoiceBoProcessFunction.onTimer(InvoiceBoProcessFunction.java:613) > at > org.apache.flink.
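The failure mode in this report is an idle proxy/firewall cutting the JDBC connection, after which the sink's batch retry keeps reusing the dead connection. The sketch below is not Flink's actual retry logic; it is a minimal illustration of the workaround pattern: on a failed attempt, rebuild the connection before retrying. The names `callWithReconnect` and the `Runnable reconnect` hook are hypothetical stand-ins; in a real JDBC setup the staleness check would typically be `java.sql.Connection#isValid(int)` and the reconnect would call `DriverManager.getConnection(...)`.

```java
import java.util.concurrent.Callable;

// Sketch (assumption, not Flink code): retry an action after re-establishing
// the connection, as a workaround for idle-timeout disconnects such as the
// proxy-imposed 5-minute cutoff described in this issue.
public final class RetryOnStaleConnection {

    /** Run the action; on failure, rebuild the connection and try again. */
    public static <T> T callWithReconnect(
            Callable<T> action, Runnable reconnect, int maxRetries) throws Exception {
        Exception last = null;
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            try {
                return action.call();
            } catch (Exception e) {
                last = e;          // e.g. BatchUpdateException: IO Error: Broken pipe
                reconnect.run();   // re-establish the connection before the next attempt
            }
        }
        throw last;
    }
}
```

Note this only helps when the connection can actually be rebuilt; in the reporter's case the proxy killed any connection after 5 minutes regardless, so no client-side retry could succeed.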
[jira] [Resolved] (FLINK-21674) JDBC sink can't get valid connection after 5 minutes using Oracle JDBC driver
[ https://issues.apache.org/jira/browse/FLINK-21674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fuyao resolved FLINK-21674. --- Resolution: Fixed > JDBC sink can't get valid connection after 5 minutes using Oracle JDBC driver > - > > Key: FLINK-21674 > URL: https://issues.apache.org/jira/browse/FLINK-21674 > Project: Flink > Issue Type: Bug > Components: Connectors / JDBC >Affects Versions: 1.12.1 > Environment: Flink version: 1.12.1 Scala version: 2.11 Java version: > 1.11 Flink System parallelism: 1 JDBC Driver: Oracle ojdbc10 Database: Oracle > Autonomous Database on Oracle Cloud Infrastructure version 19c (You can regard > this as a cloud-based Oracle Database) > > Flink user mailing list: > http://mail-archives.apache.org/mod_mbox/flink-user/202103.mbox/%3CCH2PR10MB402466373B33A3BBC5635A0AEC8C9%40CH2PR10MB4024.namprd10.prod.outlook.com%3E >Reporter: Fuyao >Priority: Blocker > > I use the JDBCSink.sink() method to sink data to Oracle Autonomous Data Warehouse > with the Oracle JDBC driver. I can sink data into the Oracle Autonomous database > successfully. If there is idle time of over 5 minutes and then an insertion is performed, > the retry mechanism can't reestablish the JDBC connection and it will run into the error > below. I have set the retry count to 3; even after retrying, it will still > fail. Only restarting the application (an automated process) from the checkpoint could solve the > issue. > 11:41:04,872 ERROR > org.apache.flink.connector.jdbc.internal.JdbcBatchingOutputFormat - JDBC > executeBatch error, retry times = 0 > java.sql.BatchUpdateException: IO Error: Broken pipe > It will fail the application and restart from the checkpoint. After restarting > from the checkpoint, the JDBC connection can be established correctly. > The connection timeout can be configured by > alter system set MAX_IDLE_TIME=1440; // the connection will time out after > 1440 minutes. > Such a timeout parameter behavior change can be verified in SQL Developer. 
> However, Flink still got connection error after 5 minutes configuring this. > I suspect this is some issues in reading some configuration problems from > Flink side to establish to sucessful connection. > Full log: > {code:java} > 11:41:04,872 ERROR > org.apache.flink.connector.jdbc.internal.JdbcBatchingOutputFormat - JDBC > executeBatch error, retry times = 0 > java.sql.BatchUpdateException: IO Error: Broken pipe > at > oracle.jdbc.driver.OraclePreparedStatement.executeLargeBatch(OraclePreparedStatement.java:9711) > at > oracle.jdbc.driver.T4CPreparedStatement.executeLargeBatch(T4CPreparedStatement.java:1447) > at > oracle.jdbc.driver.OraclePreparedStatement.executeBatch(OraclePreparedStatement.java:9487) > at > oracle.jdbc.driver.OracleStatementWrapper.executeBatch(OracleStatementWrapper.java:237) > at > org.apache.flink.connector.jdbc.internal.executor.SimpleBatchStatementExecutor.executeBatch(SimpleBatchStatementExecutor.java:73) > at > org.apache.flink.connector.jdbc.internal.JdbcBatchingOutputFormat.attemptFlush(JdbcBatchingOutputFormat.java:216) > at > org.apache.flink.connector.jdbc.internal.JdbcBatchingOutputFormat.flush(JdbcBatchingOutputFormat.java:184) > at > org.apache.flink.connector.jdbc.internal.JdbcBatchingOutputFormat.writeRecord(JdbcBatchingOutputFormat.java:167) > at > org.apache.flink.connector.jdbc.internal.GenericJdbcSinkFunction.invoke(GenericJdbcSinkFunction.java:54) > at > org.apache.flink.streaming.api.operators.StreamSink.processElement(StreamSink.java:54) > at > org.apache.flink.streaming.runtime.tasks.CopyingChainingOutput.pushToOperator(CopyingChainingOutput.java:71) > at > org.apache.flink.streaming.runtime.tasks.CopyingChainingOutput.collect(CopyingChainingOutput.java:46) > at > org.apache.flink.streaming.runtime.tasks.CopyingChainingOutput.collect(CopyingChainingOutput.java:26) > at > org.apache.flink.streaming.runtime.tasks.BroadcastingOutputCollector.collect(BroadcastingOutputCollector.java:75) > at > 
org.apache.flink.streaming.runtime.tasks.BroadcastingOutputCollector.collect(BroadcastingOutputCollector.java:32) > at > org.apache.flink.streaming.api.operators.CountingOutput.collect(CountingOutput.java:50) > at > org.apache.flink.streaming.api.operators.CountingOutput.collect(CountingOutput.java:28) > at > org.apache.flink.streaming.api.operators.TimestampedCollector.collect(TimestampedCollector.java:50) > at > org.myorg.quickstart.processor.InvoiceBoProcessFunction.onTimer(InvoiceBoProcessFunction.java:613) > at > org.apache.flink.streaming.api.operators.KeyedProcessOperator.invokeUserFunction(KeyedProcessOperator.java:91) > at > org.apache.flink.streaming.api.operators.Keyed
[jira] [Updated] (FLINK-21747) Encounter an exception that contains "max key length exceeded ..." when reporting metrics to influxdb
[ https://issues.apache.org/jira/browse/FLINK-21747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] tim yu updated FLINK-21747: --- Description: I run a stream job with an insert statement that is too long; it reports metrics to InfluxDB. I find many InfluxDB exceptions containing "max key length exceeded ..." in the job manager's log file. The job could not write any metrics to InfluxDB because "task_name" and "operator_name" are too long. (was: I run a stream job with insert statement whose size is too long, it report metrics to influxdb. I find many influxdb exceptions that contains "max key length exceeded ..." in the log file of job manager . The job could not write the metrics to the influxdb, because "task_name" and "operator_name" is too long.) > Encounter an exception that contains "max key length exceeded ..." when > reporting metrics to influxdb > -- > > Key: FLINK-21747 > URL: https://issues.apache.org/jira/browse/FLINK-21747 > Project: Flink > Issue Type: Bug > Components: Runtime / Metrics >Affects Versions: 1.10.0 >Reporter: tim yu >Priority: Major > > I run a stream job with an insert statement that is too long; it reports > metrics to InfluxDB. I find many InfluxDB exceptions containing "max key > length exceeded ..." in the job manager's log file. The job could not write > any metrics to InfluxDB because "task_name" and "operator_name" are too > long. -- This message was sent by Atlassian Jira (v8.3.4#803005)
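The failure here is that InfluxDB rejects series whose key (measurement plus tags such as `task_name`/`operator_name`) exceeds its length limit. One mitigation, sketched below as an assumption rather than anything the Flink InfluxDB reporter actually does, is to cap long tag values before reporting, appending a short hash so distinct long names remain distinguishable. `TagValueShortener` and `shorten` are hypothetical names for illustration.

```java
// Sketch (assumption): cap overly long metric tag values, e.g. a generated
// task_name from a huge INSERT statement, before handing them to a reporter.
public final class TagValueShortener {

    /** Cap a tag value at maxLen chars, keeping a hash suffix for uniqueness. */
    public static String shorten(String value, int maxLen) {
        if (value == null || value.length() <= maxLen) {
            return value; // already within the limit, pass through unchanged
        }
        String hash = Integer.toHexString(value.hashCode()); // disambiguating suffix
        int keep = Math.max(0, maxLen - hash.length() - 1);
        return value.substring(0, keep) + "-" + hash;
    }
}
```

The trade-off is that the original name is no longer fully recoverable from the metric, which is why the thread also discusses whether Flink itself should shorten task/operator names.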
[jira] [Commented] (FLINK-21674) JDBC sink can't get valid connection after 5 minutes using Oracle JDBC driver
[ https://issues.apache.org/jira/browse/FLINK-21674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17300666#comment-17300666 ] Fuyao commented on FLINK-21674: --- Hi Roman, thanks for your help. I was able to figure out the root cause. I am an employee at Oracle. I need to be within the OCI VPN instead of the Oracle VPN to get access to ADW. I used to use a proxy to access it. However, the proxy poses a 5-minute hard limit on the connection. After 5 minutes, the JDBC connection will be cut off no matter what you configure on the DB side and client side. With the OCI VPN, I am able to get over the issue. Thanks for your help. > JDBC sink can't get valid connection after 5 minutes using Oracle JDBC driver > - > > Key: FLINK-21674 > URL: https://issues.apache.org/jira/browse/FLINK-21674 > Project: Flink > Issue Type: Bug > Components: Connectors / JDBC >Affects Versions: 1.12.1 > Environment: Flink version: 1.12.1 Scala version: 2.11 Java version: > 1.11 Flink System parallelism: 1 JDBC Driver: Oracle ojdbc10 Database: Oracle > Autonomous Database on Oracle Cloud Infrastructure version 19c (You can regard > this as a cloud-based Oracle Database) > > Flink user mailing list: > http://mail-archives.apache.org/mod_mbox/flink-user/202103.mbox/%3CCH2PR10MB402466373B33A3BBC5635A0AEC8C9%40CH2PR10MB4024.namprd10.prod.outlook.com%3E >Reporter: Fuyao >Priority: Blocker > > I use the JDBCSink.sink() method to sink data to Oracle Autonomous Data Warehouse > with the Oracle JDBC driver. I can sink data into the Oracle Autonomous database > successfully. If there is idle time of over 5 minutes and then an insertion is performed, > the retry mechanism can't reestablish the JDBC connection and it will run into the error > below. I have set the retry count to 3; even after retrying, it will still > fail. Only restarting the application (an automated process) from the checkpoint could solve the > issue. 
> 11:41:04,872 ERROR > org.apache.flink.connector.jdbc.internal.JdbcBatchingOutputFormat - JDBC > executeBatch error, retry times = 0 > java.sql.BatchUpdateException: IO Error: Broken pipe > It will fail the application and restart from checkpoint. After restarting > from checkpoint, the JDBC connection can be established correctly. > The connection timeout can be configured by > alter system set MAX_IDLE_TIME=1440; // Connection will get timeout after > 1440 minutes. > Such timeout parameter behavior change can be verified by SQL developer. > However, Flink still got connection error after 5 minutes configuring this. > I suspect this is some issues in reading some configuration problems from > Flink side to establish to sucessful connection. > Full log: > {code:java} > 11:41:04,872 ERROR > org.apache.flink.connector.jdbc.internal.JdbcBatchingOutputFormat - JDBC > executeBatch error, retry times = 0 > java.sql.BatchUpdateException: IO Error: Broken pipe > at > oracle.jdbc.driver.OraclePreparedStatement.executeLargeBatch(OraclePreparedStatement.java:9711) > at > oracle.jdbc.driver.T4CPreparedStatement.executeLargeBatch(T4CPreparedStatement.java:1447) > at > oracle.jdbc.driver.OraclePreparedStatement.executeBatch(OraclePreparedStatement.java:9487) > at > oracle.jdbc.driver.OracleStatementWrapper.executeBatch(OracleStatementWrapper.java:237) > at > org.apache.flink.connector.jdbc.internal.executor.SimpleBatchStatementExecutor.executeBatch(SimpleBatchStatementExecutor.java:73) > at > org.apache.flink.connector.jdbc.internal.JdbcBatchingOutputFormat.attemptFlush(JdbcBatchingOutputFormat.java:216) > at > org.apache.flink.connector.jdbc.internal.JdbcBatchingOutputFormat.flush(JdbcBatchingOutputFormat.java:184) > at > org.apache.flink.connector.jdbc.internal.JdbcBatchingOutputFormat.writeRecord(JdbcBatchingOutputFormat.java:167) > at > org.apache.flink.connector.jdbc.internal.GenericJdbcSinkFunction.invoke(GenericJdbcSinkFunction.java:54) > at > 
org.apache.flink.streaming.api.operators.StreamSink.processElement(StreamSink.java:54) > at > org.apache.flink.streaming.runtime.tasks.CopyingChainingOutput.pushToOperator(CopyingChainingOutput.java:71) > at > org.apache.flink.streaming.runtime.tasks.CopyingChainingOutput.collect(CopyingChainingOutput.java:46) > at > org.apache.flink.streaming.runtime.tasks.CopyingChainingOutput.collect(CopyingChainingOutput.java:26) > at > org.apache.flink.streaming.runtime.tasks.BroadcastingOutputCollector.collect(BroadcastingOutputCollector.java:75) > at > org.apache.flink.streaming.runtime.tasks.BroadcastingOutputCollector.collect(BroadcastingOutputCollector.java:32) > at > org.apache.flink.streaming.api.operators.CountingOutput.collect(CountingOutput.java:50) > at > org.apac
[jira] [Comment Edited] (FLINK-21747) Encounter an exception that contains "max key length exceeded ..." when reporting metrics to influxdb
[ https://issues.apache.org/jira/browse/FLINK-21747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17300665#comment-17300665 ] tim yu edited comment on FLINK-21747 at 3/13/21, 1:10 AM: -- Hi [~chesnay], Thanks for your suggestion, but that is not a perfect way. Should we shorten task name and operator name of some metrics ? Hi [~jark], Do you have any good suggestion? was (Author: yulei0824): Hi [~chesnay], Thanks for your suggestion, but that is not perfect way. Should we shorten task name and operator name of some metrics ? Hi [~jark], Do you have any good suggestion? > Encounter an exception that contains "max key length exceeded ..." when > reporting metrics to influxdb > -- > > Key: FLINK-21747 > URL: https://issues.apache.org/jira/browse/FLINK-21747 > Project: Flink > Issue Type: Bug > Components: Runtime / Metrics >Affects Versions: 1.10.0 >Reporter: tim yu >Priority: Major > > I run a stream job with insert statement whose size is too long, it report > metrics to influxdb. I find many influxdb exceptions that contains "max key > length exceeded ..." in the log file of job manager . The job could not write > the metrics to the influxdb, because "task_name" and "operator_name" is too > long. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (FLINK-21747) Encounter an exception that contains "max key length exceeded ..." when reporting metrics to influxdb
[ https://issues.apache.org/jira/browse/FLINK-21747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17300665#comment-17300665 ] tim yu commented on FLINK-21747: Hi [~chesnay], Thanks for your suggestion, but that is not perfect way. Should we shorten task name and operator name of some metrics ? Hi [~jark], Do you have any good suggestion? > Encounter an exception that contains "max key length exceeded ..." when > reporting metrics to influxdb > -- > > Key: FLINK-21747 > URL: https://issues.apache.org/jira/browse/FLINK-21747 > Project: Flink > Issue Type: Bug > Components: Runtime / Metrics >Affects Versions: 1.10.0 >Reporter: tim yu >Priority: Major > > I run a stream job with insert statement whose size is too long, it report > metrics to influxdb. I find many influxdb exceptions that contains "max key > length exceeded ..." in the log file of job manager . The job could not write > the metrics to the influxdb, because "task_name" and "operator_name" is too > long. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [flink] flinkbot edited a comment on pull request #15121: The method $(String) is undefined for the type TableExample
flinkbot edited a comment on pull request #15121: URL: https://github.com/apache/flink/pull/15121#issuecomment-793480240 ## CI report: * 76fdf33da7a9809d4c3b5fda8164d86880a39cac Azure: [CANCELED](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=14566) * 7383416dfaf59f0aeebc77450b6c3dc9b495789a Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=14571) Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run travis` re-run the last Travis build - `@flinkbot run azure` re-run the last Azure build This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] flinkbot edited a comment on pull request #15121: The method $(String) is undefined for the type TableExample
flinkbot edited a comment on pull request #15121: URL: https://github.com/apache/flink/pull/15121#issuecomment-793480240 ## CI report: * dfe55bcf15e82fa63e85551fba2e0b415bba8dae Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=14554) * 76fdf33da7a9809d4c3b5fda8164d86880a39cac Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=14566) * 7383416dfaf59f0aeebc77450b6c3dc9b495789a Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=14571) Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run travis` re-run the last Travis build - `@flinkbot run azure` re-run the last Azure build This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] flinkbot edited a comment on pull request #15121: The method $(String) is undefined for the type TableExample
flinkbot edited a comment on pull request #15121: URL: https://github.com/apache/flink/pull/15121#issuecomment-793480240 ## CI report: * dfe55bcf15e82fa63e85551fba2e0b415bba8dae Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=14554) * 76fdf33da7a9809d4c3b5fda8164d86880a39cac Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=14566) * 7383416dfaf59f0aeebc77450b6c3dc9b495789a UNKNOWN Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run travis` re-run the last Travis build - `@flinkbot run azure` re-run the last Azure build This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Commented] (FLINK-21538) Elasticsearch6DynamicSinkITCase.testWritingDocuments fails when submitting job
[ https://issues.apache.org/jira/browse/FLINK-21538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17300636#comment-17300636 ] Austin Cawley-Edwards commented on FLINK-21538: --- Test debug build here: https://dev.azure.com/austincawley0684/flink/_build/results?buildId=3&view=results > Elasticsearch6DynamicSinkITCase.testWritingDocuments fails when submitting job > -- > > Key: FLINK-21538 > URL: https://issues.apache.org/jira/browse/FLINK-21538 > Project: Flink > Issue Type: Bug > Components: Connectors / ElasticSearch, Runtime / Coordination >Affects Versions: 1.12.1 >Reporter: Dawid Wysakowicz >Priority: Major > Labels: test-stability > > https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=13868&view=logs&j=3d12d40f-c62d-5ec4-6acc-0efe94cc3e89&t=5d6e4255-0ea8-5e2a-f52c-c881b7872361 > {code} > 2021-02-27T00:16:06.9493539Z > org.apache.flink.runtime.client.JobExecutionException: Job execution failed. > 2021-02-27T00:16:06.9494494Z at > org.apache.flink.runtime.jobmaster.JobResult.toJobExecutionResult(JobResult.java:144) > 2021-02-27T00:16:06.9495733Z at > org.apache.flink.runtime.minicluster.MiniClusterJobClient.lambda$getJobExecutionResult$2(MiniClusterJobClient.java:117) > 2021-02-27T00:16:06.9496596Z at > java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:616) > 2021-02-27T00:16:06.9497354Z at > java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:591) > 2021-02-27T00:16:06.9525795Z at > java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488) > 2021-02-27T00:16:06.9526744Z at > java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1975) > 2021-02-27T00:16:06.9527784Z at > org.apache.flink.runtime.rpc.akka.AkkaInvocationHandler.lambda$invokeRpc$0(AkkaInvocationHandler.java:237) > 2021-02-27T00:16:06.9528552Z at > java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774) > 
2021-02-27T00:16:06.9529271Z at > java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750) > 2021-02-27T00:16:06.9530013Z at > java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488) > 2021-02-27T00:16:06.9530482Z at > java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1975) > 2021-02-27T00:16:06.9531068Z at > org.apache.flink.runtime.concurrent.FutureUtils$1.onComplete(FutureUtils.java:1046) > 2021-02-27T00:16:06.9531544Z at > akka.dispatch.OnComplete.internal(Future.scala:264) > 2021-02-27T00:16:06.9531908Z at > akka.dispatch.OnComplete.internal(Future.scala:261) > 2021-02-27T00:16:06.9532449Z at > akka.dispatch.japi$CallbackBridge.apply(Future.scala:191) > 2021-02-27T00:16:06.9532860Z at > akka.dispatch.japi$CallbackBridge.apply(Future.scala:188) > 2021-02-27T00:16:06.9533245Z at > scala.concurrent.impl.CallbackRunnable.run(Promise.scala:60) > 2021-02-27T00:16:06.9533721Z at > org.apache.flink.runtime.concurrent.Executors$DirectExecutionContext.execute(Executors.java:73) > 2021-02-27T00:16:06.9534225Z at > scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:68) > 2021-02-27T00:16:06.9534697Z at > scala.concurrent.impl.Promise$DefaultPromise.$anonfun$tryComplete$1(Promise.scala:284) > 2021-02-27T00:16:06.9535217Z at > scala.concurrent.impl.Promise$DefaultPromise.$anonfun$tryComplete$1$adapted(Promise.scala:284) > 2021-02-27T00:16:06.9535718Z at > scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:284) > 2021-02-27T00:16:06.9536127Z at > akka.pattern.PromiseActorRef.$bang(AskSupport.scala:573) > 2021-02-27T00:16:06.9536861Z at > akka.pattern.PipeToSupport$PipeableFuture$$anonfun$pipeTo$1.applyOrElse(PipeToSupport.scala:22) > 2021-02-27T00:16:06.9537394Z at > akka.pattern.PipeToSupport$PipeableFuture$$anonfun$pipeTo$1.applyOrElse(PipeToSupport.scala:21) > 2021-02-27T00:16:06.9537916Z at > scala.concurrent.Future.$anonfun$andThen$1(Future.scala:532) > 
2021-02-27T00:16:06.9605804Z at > scala.concurrent.impl.Promise.liftedTree1$1(Promise.scala:29) > 2021-02-27T00:16:06.9606794Z at > scala.concurrent.impl.Promise.$anonfun$transform$1(Promise.scala:29) > 2021-02-27T00:16:06.9607642Z at > scala.concurrent.impl.CallbackRunnable.run(Promise.scala:60) > 2021-02-27T00:16:06.9608419Z at > akka.dispatch.BatchingExecutor$AbstractBatch.processBatch(BatchingExecutor.scala:55) > 2021-02-27T00:16:06.9609252Z at > akka.dispatch.BatchingExecutor$BlockableBatch.$anonfun$run$1(BatchingExecutor.scala:91) > 2021-02-27T00:16:06.9610024Z at > scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:12) > 2021-02-27T00:16:06.9613676Z at > scala.concurrent.BlockContext$.with
[GitHub] [flink] rionmonster commented on pull request #15165: [FLINK-16829] Refactored Prometheus Metric Reporters to Use Constructor Initialization
rionmonster commented on pull request #15165: URL: https://github.com/apache/flink/pull/15165#issuecomment-797790797 @zentol I'll be submitting another PR once this work is done; ran into some git issues and decided to scorch the earth in the interest of keeping things clean. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] rionmonster closed pull request #15165: [FLINK-16829] Refactored Prometheus Metric Reporters to Use Constructor Initialization
rionmonster closed pull request #15165: URL: https://github.com/apache/flink/pull/15165 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] rionmonster commented on a change in pull request #15165: [FLINK-16829] Refactored Prometheus Metric Reporters to Use Constructor Initialization
rionmonster commented on a change in pull request #15165: URL: https://github.com/apache/flink/pull/15165#discussion_r593477758 ## File path: flink-metrics/flink-metrics-prometheus/src/main/java/org/apache/flink/metrics/prometheus/PrometheusPushGatewayReporter.java ## @@ -56,43 +56,42 @@ private boolean deleteOnShutdown; private Map groupingKey; -@Override -public void open(MetricConfig config) { -super.open(config); - -String host = config.getString(HOST.key(), HOST.defaultValue()); -int port = config.getInteger(PORT.key(), PORT.defaultValue()); -String configuredJobName = config.getString(JOB_NAME.key(), JOB_NAME.defaultValue()); -boolean randomSuffix = -config.getBoolean( -RANDOM_JOB_NAME_SUFFIX.key(), RANDOM_JOB_NAME_SUFFIX.defaultValue()); -deleteOnShutdown = -config.getBoolean(DELETE_ON_SHUTDOWN.key(), DELETE_ON_SHUTDOWN.defaultValue()); -groupingKey = -parseGroupingKey(config.getString(GROUPING_KEY.key(), GROUPING_KEY.defaultValue())); - -if (host == null || host.isEmpty() || port < 1) { +PrometheusPushGatewayReporter( +@Nullable final String hostConfig, +@Nullable final int portConfig, +@Nullable final String jobNameConfig, +@Nullable final boolean randomJobSuffixConfig, +@Nullable final boolean deleteOnShutdownConfig, +@Nullable final Map groupingKeyConfig +) { +deleteOnShutdown = deleteOnShutdownConfig +groupingKey = parseGroupingKey(groupingKeyConfig) + +if (hostConfig == null || hostConfig.isEmpty() || portConfig < 1) { throw new IllegalArgumentException( -"Invalid host/port configuration. Host: " + host + " Port: " + port); +"Invalid host/port configuration. 
Host: " + hostConfig + " Port: " + portConfig); } -if (randomSuffix) { -this.jobName = configuredJobName + new AbstractID(); +if (randomJobSuffixConfig) { +this.jobName = jobNameConfig + new AbstractID(); } else { -this.jobName = configuredJobName; +this.jobName = jobNameConfig; } -pushGateway = new PushGateway(host + ':' + port); +pushGateway = new PushGateway(hostConfig + ':' + portConfig); log.info( "Configured PrometheusPushGatewayReporter with {host:{}, port:{}, jobName:{}, randomJobNameSuffix:{}, deleteOnShutdown:{}, groupingKey:{}}", -host, -port, -jobName, -randomSuffix, +hostConfig, +portConfig, +jobNameConfig, +randomJobSuffixConfig, deleteOnShutdown, groupingKey); } +@Override +public void open(MetricConfig config) { } + Map parseGroupingKey(final String groupingKeyConfig) { Review comment: Okay, totally right. I was _way_ overthinking this. I'll probably burn this branch down and pull latest to avoid too much shuffling around. Much appreciated! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Assigned] (FLINK-21743) JdbcXaSinkFunction throws XAER_RMFAIL when calling snapshotState and beginTx
[ https://issues.apache.org/jira/browse/FLINK-21743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Roman Khachatryan reassigned FLINK-21743: - Assignee: Roman Khachatryan > JdbcXaSinkFunction throws XAER_RMFAIL when calling snapshotState and beginTx > > > Key: FLINK-21743 > URL: https://issues.apache.org/jira/browse/FLINK-21743 > Project: Flink > Issue Type: Test > Components: Connectors / JDBC >Affects Versions: 1.13.0 > Environment: org.apache.flink:flink-streaming-java_2.11:1.12.1 > org.apache.flink:flink-connector-jdbc_2.11:1.13-SNAPSHOT >Reporter: Wei Hao >Assignee: Roman Khachatryan >Priority: Major > > {code:java} > public void snapshotState(FunctionSnapshotContext context) throws Exception { > LOG.debug("snapshot state, checkpointId={}", context.getCheckpointId()); > this.rollbackPreparedFromCheckpoint(context.getCheckpointId()); > this.prepareCurrentTx(context.getCheckpointId()); > this.beginTx(context.getCheckpointId() + 1L); > this.stateHandler.store(JdbcXaSinkFunctionState.of(this.preparedXids, > this.hangingXids)); > } > {code} > When checkpointing starts, it calls snapshotState(), which ends and prepares > the current transaction. The issue I found is with beginTx(), where a new Xid > is generated and xaFacade will run command like 'xa start new_xid', which > will throw the exception as shown below and causes checkpointing failure. 
> {code:java} > Caused by: org.apache.flink.connector.jdbc.xa.XaFacade$TransientXaException: > com.mysql.cj.jdbc.MysqlXAException: XAER_RMFAIL: The command cannot be > executed when global transaction is in the PREPARED stateCaused by: > org.apache.flink.connector.jdbc.xa.XaFacade$TransientXaException: > com.mysql.cj.jdbc.MysqlXAException: XAER_RMFAIL: The command cannot be > executed when global transaction is in the PREPARED state at > org.apache.flink.connector.jdbc.xa.XaFacadeImpl.wrapException(XaFacadeImpl.java:353) > at > org.apache.flink.connector.jdbc.xa.XaFacadeImpl.access$800(XaFacadeImpl.java:66) > at > org.apache.flink.connector.jdbc.xa.XaFacadeImpl$Command.lambda$fromRunnable$0(XaFacadeImpl.java:288) > at > org.apache.flink.connector.jdbc.xa.XaFacadeImpl$Command.lambda$fromRunnable$4(XaFacadeImpl.java:327) > at > org.apache.flink.connector.jdbc.xa.XaFacadeImpl.execute(XaFacadeImpl.java:267) > at > org.apache.flink.connector.jdbc.xa.XaFacadeImpl.start(XaFacadeImpl.java:160) > at > org.apache.flink.connector.jdbc.xa.JdbcXaSinkFunction.beginTx(JdbcXaSinkFunction.java:302) > at > org.apache.flink.connector.jdbc.xa.JdbcXaSinkFunction.snapshotState(JdbcXaSinkFunction.java:241) > at > org.apache.flink.streaming.util.functions.StreamingFunctionUtils.trySnapshotFunctionState(StreamingFunctionUtils.java:118) > at > org.apache.flink.streaming.util.functions.StreamingFunctionUtils.snapshotFunctionState(StreamingFunctionUtils.java:99) > at > org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.snapshotState(AbstractUdfStreamOperator.java:89) > at > org.apache.flink.streaming.api.operators.StreamOperatorStateHandler.snapshotState(StreamOperatorStateHandler.java:205) > ... 23 more > {code} > I think the scenario is quite predictable because it is how xa transaction > works. > The MySQL shell example below behaves quite similar to how JdbcXaSinkFunction > works. 
> {code:java} > xa start ""; > # Inserting some rows > # end the current transaction > xa end ""; > xa prepare ""; > # start a new transaction with the same connection while the previous one is > PREPARED > xa prepare ""; > {code} > This also produces error 'SQL Error [1399] [XAE07]: XAER_RMFAIL: The command > cannot be executed when global transaction is in the PREPARED state'. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
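The MySQL shell example above can be illustrated with a small state-machine model. This is not JDBC or Flink code, only a sketch of the MySQL-side rule that once a branch on a connection is PREPARED, further `XA START` commands on that connection are rejected with XAER_RMFAIL:

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative model of the per-connection XA state machine behind the bug:
// snapshotState() prepares the current transaction and then immediately calls
// beginTx() on the same connection, which MySQL rejects.
public class XaConnectionModel {
    enum State { ACTIVE, ENDED, PREPARED }

    private final Map<String, State> branches = new HashMap<>();

    void start(String xid) {
        // MySQL-like rule: no new work while any branch on this connection is PREPARED.
        if (branches.containsValue(State.PREPARED)) {
            throw new IllegalStateException(
                    "XAER_RMFAIL: The command cannot be executed when global transaction is in the PREPARED state");
        }
        branches.put(xid, State.ACTIVE);
    }

    void end(String xid) { branches.put(xid, State.ENDED); }

    void prepare(String xid) { branches.put(xid, State.PREPARED); }

    void commit(String xid) { branches.remove(xid); }

    public static void main(String[] args) {
        XaConnectionModel conn = new XaConnectionModel();
        conn.start("xid-1");     // transaction for checkpoint N
        conn.end("xid-1");
        conn.prepare("xid-1");
        try {
            conn.start("xid-2"); // beginTx for checkpoint N+1 on the same connection
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The PostgreSQL comment further down suggests this restriction is MySQL-specific (other resource managers allow a new branch while an earlier one is prepared), which is why the same sequence succeeds there.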
[jira] [Commented] (FLINK-21743) JdbcXaSinkFunction throws XAER_RMFAIL when calling snapshotState and beginTx
[ https://issues.apache.org/jira/browse/FLINK-21743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17300627#comment-17300627 ] Roman Khachatryan commented on FLINK-21743: --- Works fine on postgresql (after increasing max_prepared_transactions): {code} postgres=# begin; BEGIN postgres=*# prepare transaction '1'; PREPARE TRANSACTION postgres=# begin; BEGIN postgres=*# prepare transaction '2'; PREPARE TRANSACTION postgres=# commit prepared '1'; COMMIT PREPARED postgres=# commit prepared '2'; COMMIT PREPARED {code} Probably [https://bugs.mysql.com/bug.php?id=17343] > JdbcXaSinkFunction throws XAER_RMFAIL when calling snapshotState and beginTx >
[GitHub] [flink] flinkbot edited a comment on pull request #15134: [FLINK-21654][test] Added @throws information to tests
flinkbot edited a comment on pull request #15134: URL: https://github.com/apache/flink/pull/15134#issuecomment-795077470 ## CI report: * 63851b4ad1d97292257cb8cb02f6bd211b503979 Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=14562) Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run travis` re-run the last Travis build - `@flinkbot run azure` re-run the last Azure build
[GitHub] [flink] flinkbot edited a comment on pull request #15049: [FLINK-21190][runtime-web] Expose exception history
flinkbot edited a comment on pull request #15049: URL: https://github.com/apache/flink/pull/15049#issuecomment-787719691 ## CI report: * 2cbffce55c35f7e163739d07f88e480870a0fc37 UNKNOWN * bd732209f8aaa9653dfeaf7be6ff8396ece70928 UNKNOWN * b8744b240d77fc84dee60042317b79585423d403 Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=14564)
[GitHub] [flink] flinkbot edited a comment on pull request #15121: The method $(String) is undefined for the type TableExample
flinkbot edited a comment on pull request #15121: URL: https://github.com/apache/flink/pull/15121#issuecomment-793480240 ## CI report: * dfe55bcf15e82fa63e85551fba2e0b415bba8dae Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=14554) * 76fdf33da7a9809d4c3b5fda8164d86880a39cac Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=14566)
[GitHub] [flink] flinkbot edited a comment on pull request #15136: [FLINK-21622][table-planner] Introduce function TO_TIMESTAMP_LTZ(numeric [, precision])
flinkbot edited a comment on pull request #15136: URL: https://github.com/apache/flink/pull/15136#issuecomment-795141489 ## CI report: * 38f1cea9a24f244241f6e2a9f069feb7f774652c Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=14561)
[GitHub] [flink] flinkbot edited a comment on pull request #15049: [FLINK-21190][runtime-web] Expose exception history
flinkbot edited a comment on pull request #15049: URL: https://github.com/apache/flink/pull/15049#issuecomment-787719691 ## CI report: * 2cbffce55c35f7e163739d07f88e480870a0fc37 UNKNOWN * bd732209f8aaa9653dfeaf7be6ff8396ece70928 UNKNOWN * d772ca219b9437388a0996cb5498789153ea8b1b Azure: [CANCELED](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=14560) * b8744b240d77fc84dee60042317b79585423d403 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=14564)
[GitHub] [flink] flinkbot edited a comment on pull request #15121: The method $(String) is undefined for the type TableExample
flinkbot edited a comment on pull request #15121: URL: https://github.com/apache/flink/pull/15121#issuecomment-793480240 ## CI report: * dfe55bcf15e82fa63e85551fba2e0b415bba8dae Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=14554) * 76fdf33da7a9809d4c3b5fda8164d86880a39cac UNKNOWN
[GitHub] [flink] twalthr closed pull request #15154: [FLINK-21725][core] Name constructor arguments of Tuples like fields
twalthr closed pull request #15154: URL: https://github.com/apache/flink/pull/15154
[GitHub] [flink] flinkbot commented on pull request #15173: [FLINK-21367][Connectors / JDBC] Support objectReuse in JDBC sink
flinkbot commented on pull request #15173: URL: https://github.com/apache/flink/pull/15173#issuecomment-797709378 ## CI report: * c2d910a15fa478f09f81913837d14aa4782601b2 UNKNOWN
[GitHub] [flink] flinkbot edited a comment on pull request #15171: [FLINK-21558][coordination] Skip check if slotreport indicates no change
flinkbot edited a comment on pull request #15171: URL: https://github.com/apache/flink/pull/15171#issuecomment-797538070 ## CI report: * eaf04db0fc9193bb77e3ecb917f828f2adbbca14 Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=14555)
[GitHub] [flink] flinkbot edited a comment on pull request #15121: The method $(String) is undefined for the type TableExample
flinkbot edited a comment on pull request #15121: URL: https://github.com/apache/flink/pull/15121#issuecomment-793480240 ## CI report: * dfe55bcf15e82fa63e85551fba2e0b415bba8dae Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=14554)
[GitHub] [flink] flinkbot edited a comment on pull request #15172: [FLINK-21388] [flink-parquet] support DECIMAL parquet logical type wh…
flinkbot edited a comment on pull request #15172: URL: https://github.com/apache/flink/pull/15172#issuecomment-797538175 ## CI report: * 06cea2c4d6cc6724f86063ddb5974c88ae9629b1 Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=14556)
[GitHub] [flink] flinkbot edited a comment on pull request #15154: [FLINK-21725][core] Name constructor arguments of Tuples like fields
flinkbot edited a comment on pull request #15154: URL: https://github.com/apache/flink/pull/15154#issuecomment-796737826 ## CI report: * 4572f80bab02248b8d5d1f8d2bda99f9a933e17f Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=14559)
[GitHub] [flink] flinkbot commented on pull request #15173: Flink-21367
flinkbot commented on pull request #15173: URL: https://github.com/apache/flink/pull/15173#issuecomment-797679900 Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community to review your pull request. We will use this comment to track the progress of the review. ## Automated Checks Last check on commit c2d910a15fa478f09f81913837d14aa4782601b2 (Fri Mar 12 18:49:37 UTC 2021) **Warnings:** * No documentation files were touched! Remember to keep the Flink docs up to date! Mention the bot in a comment to re-run the automated checks. ## Review Progress * ❓ 1. The [description] looks good. * ❓ 2. There is [consensus] that the contribution should go into Flink. * ❓ 3. Needs [attention] from. * ❓ 4. The change fits into the overall [architecture]. * ❓ 5. Overall code [quality] is good. Please see the [Pull Request Review Guide](https://flink.apache.org/contributing/reviewing-prs.html) for a full explanation of the review process. The Bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer or PMC member is required Bot commands The @flinkbot bot supports the following commands: - `@flinkbot approve description` to approve one or more aspects (aspects: `description`, `consensus`, `architecture` and `quality`) - `@flinkbot approve all` to approve all aspects - `@flinkbot approve-until architecture` to approve everything until `architecture` - `@flinkbot attention @username1 [@username2 ..]` to require somebody's attention - `@flinkbot disapprove architecture` to remove an approval you gave earlier
[GitHub] [flink] tillrohrmann merged pull request #15107: [BP-1.11][FLINK-21606] Reject JobManager <-> TaskManager connection if JobManager is not responsible for job
tillrohrmann merged pull request #15107: URL: https://github.com/apache/flink/pull/15107
[GitHub] [flink] Bruschkov opened a new pull request #15173: Flink-21367
Bruschkov opened a new pull request #15173: URL: https://github.com/apache/flink/pull/15173 ## What is the purpose of the change * Users should now be able to use the jdbc-sink when object reuse is enabled (or from stateful functions, where object reuse is enabled explicitly) ## Brief change log - Jdbc-sink and JdbcOutputFormat now implement InputTypeConfigurable, through which a serializer is obtained for the incoming records. When object reuse is enabled, a copy of each record (made via the serializer) is stored in the jdbc-batch; when object reuse is disabled, the record itself is stored. ## Verifying this change This change is a trivial rework / code cleanup without any test coverage. ## Does this pull request potentially affect one of the following parts: - Dependencies (does it add or upgrade a dependency): no - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: no - The serializers: no - The runtime per-record code paths (performance sensitive): don't know - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn/Mesos, ZooKeeper: no - The S3 file system connector: no ## Documentation - Does this pull request introduce a new feature? no
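The rationale in the change log above (copying records when object reuse is enabled) can be sketched with a minimal example. The types below are hypothetical, not Flink's API; in the actual PR the copy is made through the record's TypeSerializer rather than a hand-written `copy()`:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch of why a buffering sink must copy records under object reuse: the
// runtime hands the sink the SAME mutable instance for every record, so
// buffering the reference silently clobbers earlier batch entries.
public class ObjectReuseSketch {
    static class Record {
        int value;
        Record copy() { Record r = new Record(); r.value = value; return r; }
    }

    // Buffers the reference, as a naive sink would do.
    static List<Integer> bufferByReference(int... values) {
        List<Record> batch = new ArrayList<>();
        Record reused = new Record();   // one instance, reused for every record
        for (int v : values) {
            reused.value = v;
            batch.add(reused);          // every batch entry is the same object
        }
        return snapshot(batch);
    }

    // Buffers a defensive copy, mirroring what the PR proposes.
    static List<Integer> bufferByCopy(int... values) {
        List<Record> batch = new ArrayList<>();
        Record reused = new Record();
        for (int v : values) {
            reused.value = v;
            batch.add(reused.copy());
        }
        return snapshot(batch);
    }

    static List<Integer> snapshot(List<Record> batch) {
        List<Integer> out = new ArrayList<>();
        for (Record r : batch) out.add(r.value);
        return out;
    }

    public static void main(String[] args) {
        System.out.println(bufferByReference(1, 2, 3)); // [3, 3, 3] - entries clobbered
        System.out.println(bufferByCopy(1, 2, 3));      // [1, 2, 3]
        assert bufferByCopy(1, 2, 3).equals(Arrays.asList(1, 2, 3));
    }
}
```

This also explains the "don't know" on the per-record performance question: the copy is only taken on the object-reuse path, where it is the price of correctness.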
[jira] [Commented] (FLINK-21597) testMapAfterRepartitionHasCorrectParallelism2 Fail because of "NoResourceAvailableException"
[ https://issues.apache.org/jira/browse/FLINK-21597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17300540#comment-17300540 ] Matthias commented on FLINK-21597: -- I attached the test failure's log for easier analysis. We have a suspicion that it could be related to a known race condition in the declarative slot protocol implementation. That can appear in cases where multiple jobs compete for limited available slots. FLINK-21751 is covering this issue. But looking at the logs again, I'm not confident that that's the reason for the failure: We ran into timeouts in the range of 10secs due to the internally used timeout threshold for inactive slots of 10 seconds. The {{PartitionITCase}} fails having a timeout of 300s. > testMapAfterRepartitionHasCorrectParallelism2 Fail because of > "NoResourceAvailableException" > - > > Key: FLINK-21597 > URL: https://issues.apache.org/jira/browse/FLINK-21597 > Project: Flink > Issue Type: Bug > Components: Runtime / Coordination >Affects Versions: 1.13.0 >Reporter: Guowei Ma >Priority: Major > Labels: test-stability > Attachments: FLINK-21597.log > > > {code:java} > 2021-03-04T00:17:41.2017402Z [ERROR] > testMapAfterRepartitionHasCorrectParallelism2[Execution mode = > CLUSTER](org.apache.flink.api.scala.operators.PartitionITCase) Time elapsed: > 300.117 s <<< ERROR! > 2021-03-04T00:17:41.2018058Z > org.apache.flink.runtime.client.JobExecutionException: Job execution failed. 
> 2021-03-04T00:17:41.2018525Z at > org.apache.flink.runtime.jobmaster.JobResult.toJobExecutionResult(JobResult.java:144) > 2021-03-04T00:17:41.2019563Z at > org.apache.flink.runtime.minicluster.MiniClusterJobClient.lambda$getJobExecutionResult$3(MiniClusterJobClient.java:137) > 2021-03-04T00:17:41.2020129Z at > java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:616) > 2021-03-04T00:17:41.2021974Z at > java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:591) > 2021-03-04T00:17:41.2022634Z at > java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488) > 2021-03-04T00:17:41.2023118Z at > java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1975) > 2021-03-04T00:17:41.2023682Z at > org.apache.flink.runtime.rpc.akka.AkkaInvocationHandler.lambda$invokeRpc$0(AkkaInvocationHandler.java:237) > 2021-03-04T00:17:41.2024244Z at > java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774) > 2021-03-04T00:17:41.2024749Z at > java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750) > 2021-03-04T00:17:41.2025261Z at > java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488) > 2021-03-04T00:17:41.2026070Z at > java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1975) > 2021-03-04T00:17:41.2026814Z at > org.apache.flink.runtime.concurrent.FutureUtils$1.onComplete(FutureUtils.java:1066) > 2021-03-04T00:17:41.2027633Z at > akka.dispatch.OnComplete.internal(Future.scala:264) > 2021-03-04T00:17:41.2028245Z at > akka.dispatch.OnComplete.internal(Future.scala:261) > 2021-03-04T00:17:41.2028796Z at > akka.dispatch.japi$CallbackBridge.apply(Future.scala:191) > 2021-03-04T00:17:41.2029327Z at > akka.dispatch.japi$CallbackBridge.apply(Future.scala:188) > 2021-03-04T00:17:41.2030017Z at > scala.concurrent.impl.CallbackRunnable.run(Promise.scala:36) > 2021-03-04T00:17:41.2030795Z at > 
org.apache.flink.runtime.concurrent.Executors$DirectExecutionContext.execute(Executors.java:73) > 2021-03-04T00:17:41.2031885Z at > scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:44) > 2021-03-04T00:17:41.2032678Z at > scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:252) > 2021-03-04T00:17:41.2033428Z at > akka.pattern.PromiseActorRef.$bang(AskSupport.scala:572) > 2021-03-04T00:17:41.2034197Z at > akka.pattern.PipeToSupport$PipeableFuture$$anonfun$pipeTo$1.applyOrElse(PipeToSupport.scala:22) > 2021-03-04T00:17:41.2035094Z at > akka.pattern.PipeToSupport$PipeableFuture$$anonfun$pipeTo$1.applyOrElse(PipeToSupport.scala:21) > 2021-03-04T00:17:41.2035915Z at > scala.concurrent.Future$$anonfun$andThen$1.apply(Future.scala:436) > 2021-03-04T00:17:41.2036617Z at > scala.concurrent.Future$$anonfun$andThen$1.apply(Future.scala:435) > 2021-03-04T00:17:41.2037537Z at > scala.concurrent.impl.CallbackRunnable.run(Promise.scala:36) > 2021-03-04T00:17:41.2038019Z at > akka.dispatch.BatchingExecutor$AbstractBatch.processBatch(BatchingExecutor.scala:55) > 2021-03-04T00:17:41.2038554Z at > akka.dispatch.BatchingExecutor$BlockableBatch$$anonfun$run$1.apply$mcV$sp(BatchingExecutor.scala:91)
[jira] [Updated] (FLINK-21597) testMapAfterRepartitionHasCorrectParallelism2 Fail because of "NoResourceAvailableException"
[ https://issues.apache.org/jira/browse/FLINK-21597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matthias updated FLINK-21597: - Attachment: FLINK-21597.log > testMapAfterRepartitionHasCorrectParallelism2 Fail because of > "NoResourceAvailableException" >
[jira] [Closed] (FLINK-21606) TaskManager connected to invalid JobManager leading to TaskSubmissionException
[ https://issues.apache.org/jira/browse/FLINK-21606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann closed FLINK-21606. - Fix Version/s: 1.12.3 1.11.4 Resolution: Fixed Fixed via 1.13.0: ab2f89940d 42b94df19d 0bd837c4fd 691f87e00f b24c5e6768 1.12.3: 76fdf33da7 b025ed27b9 3f97d0e988 685baca1bf 2bcc56ef04 1.11.4: 7edd724e13 2f094da996 db0a5f5b64 969fda2525 172fdfed4d > TaskManager connected to invalid JobManager leading to TaskSubmissionException > -- > > Key: FLINK-21606 > URL: https://issues.apache.org/jira/browse/FLINK-21606 > Project: Flink > Issue Type: Bug > Components: Runtime / Coordination >Affects Versions: 1.13.0 >Reporter: Robert Metzger >Assignee: Till Rohrmann >Priority: Critical > Labels: pull-request-available > Fix For: 1.11.4, 1.13.0, 1.12.3 > > Attachments: FLINK-21606-logs.tgz > > > While testing reactive mode, I had to start my JobManager a few times to get > the configuration right. While doing that, I had at least one TaskManager > (TM6), which was first connected to the first JobManager (with a running > job), and then to the second one. > On the second JobManager, I was able to execute my test job (on another > TaskManager (TMx)). Once TM6 reconnected and reactive mode tried to utilize > all available resources, I repeatedly ran into this issue: > {code} > 2021-03-04 15:49:36,322 INFO > org.apache.flink.runtime.executiongraph.ExecutionGraph [] - > Window(GlobalWindows(), DeltaTrigger, TimeEvictor, ComparableAggregator, > PassThroughWindowFunction) -> Sink: Print to Std. Out (5/7) > (ae8f39c8dd88148aff93c8f811fab22e) switched from DEPLOYING to FAILED on > 192.168.2.173:64041-4f7521 @ macbook-pro-2.localdomain (dataPort=64044). > java.util.concurrent.CompletionException: > org.apache.flink.runtime.taskexecutor.exceptions.TaskSubmissionException: > Could not submit task because there is no JobManager associated for the job > bbe8634736b5b1d813dd322cfaaa08ea. 
> at > java.util.concurrent.CompletableFuture.encodeRelay(CompletableFuture.java:326) > ~[?:1.8.0_252] > at > java.util.concurrent.CompletableFuture.completeRelay(CompletableFuture.java:338) > ~[?:1.8.0_252] > at > java.util.concurrent.CompletableFuture.uniRelay(CompletableFuture.java:925) > ~[?:1.8.0_252] > at > java.util.concurrent.CompletableFuture$UniRelay.tryFire(CompletableFuture.java:913) > ~[?:1.8.0_252] > at > java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488) > ~[?:1.8.0_252] > at > java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1990) > ~[?:1.8.0_252] > at > org.apache.flink.runtime.rpc.akka.AkkaInvocationHandler.lambda$invokeRpc$0(AkkaInvocationHandler.java:234) > ~[flink-dist_2.11-1.13-SNAPSHOT.jar:1.13-SNAPSHOT] > at > java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774) > ~[?:1.8.0_252] > at > java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750) > ~[?:1.8.0_252] > at > java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488) > ~[?:1.8.0_252] > at > java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1990) > ~[?:1.8.0_252] > at > org.apache.flink.runtime.concurrent.FutureUtils$1.onComplete(FutureUtils.java:1064) > ~[flink-dist_2.11-1.13-SNAPSHOT.jar:1.13-SNAPSHOT] > at akka.dispatch.OnComplete.internal(Future.scala:263) > ~[flink-dist_2.11-1.13-SNAPSHOT.jar:1.13-SNAPSHOT] > at akka.dispatch.OnComplete.internal(Future.scala:261) > ~[flink-dist_2.11-1.13-SNAPSHOT.jar:1.13-SNAPSHOT] > at akka.dispatch.japi$CallbackBridge.apply(Future.scala:191) > ~[flink-dist_2.11-1.13-SNAPSHOT.jar:1.13-SNAPSHOT] > at akka.dispatch.japi$CallbackBridge.apply(Future.scala:188) > ~[flink-dist_2.11-1.13-SNAPSHOT.jar:1.13-SNAPSHOT] > at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:36) > ~[flink-dist_2.11-1.13-SNAPSHOT.jar:1.13-SNAPSHOT] > at > 
org.apache.flink.runtime.concurrent.Executors$DirectExecutionContext.execute(Executors.java:73) > ~[flink-dist_2.11-1.13-SNAPSHOT.jar:1.13-SNAPSHOT] > at > scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:44) > ~[flink-dist_2.11-1.13-SNAPSHOT.jar:1.13-SNAPSHOT] > at > scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:252) > ~[flink-dist_2.11-1.13-SNAPSHOT.jar:1.13-SNAPSHOT] > at akka.pattern.PromiseActorRef.$bang(AskSupport.scala:572) > ~[flink-dist_2.11-1.13-SNAPSHOT.jar:1.13-SNAPSHOT] > at akka.remote.DefaultMessageDispatcher.dispatch(Endpoint.scala:101) >
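The fix referenced in the commit hashes above (and in the PR titles elsewhere in this thread: "Reject JobManager <-> TaskManager connection if JobManager is not responsible for job") boils down to the TaskExecutor refusing task submissions for jobs it has no registered JobManager association for. A minimal self-contained sketch of that guard follows; the class and method names, and the use of plain strings for job and JobManager ids, are illustrative assumptions, not Flink's actual API:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins for illustration; not Flink's actual classes.
class TaskSubmissionException extends Exception {
    TaskSubmissionException(String message) { super(message); }
}

class MiniTaskExecutor {
    // Tracks which JobManager is currently associated with each job.
    private final Map<String, String> jobManagerByJobId = new HashMap<>();

    void registerJobManager(String jobId, String jobManagerId) {
        jobManagerByJobId.put(jobId, jobManagerId);
    }

    void disconnectJobManager(String jobId) {
        jobManagerByJobId.remove(jobId);
    }

    // Rejects submissions for jobs without a matching JobManager association,
    // mirroring the error message in the stack trace above.
    void submitTask(String jobId, String fromJobManagerId) throws TaskSubmissionException {
        String associated = jobManagerByJobId.get(jobId);
        if (associated == null || !associated.equals(fromJobManagerId)) {
            throw new TaskSubmissionException(
                    "Could not submit task because there is no JobManager associated for the job "
                            + jobId + ".");
        }
        // ... otherwise deploy the task ...
    }
}
```

In Flink itself the association is keyed by `JobID` and checked against the JobManager's fencing token rather than plain strings; the sketch only shows the shape of the check.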
[jira] [Commented] (FLINK-16908) FlinkKafkaProducerITCase testScaleUpAfterScalingDown Timeout expired while initializing transactional state in 60000ms.
[ https://issues.apache.org/jira/browse/FLINK-16908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17300526#comment-17300526 ] Till Rohrmann commented on FLINK-16908: --- Another instance: https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=14550&view=logs&j=c5f0071e-1851-543e-9a45-9ac140befc32&t=1fb1a56f-e8b5-5a82-00a0-a2db7757b4f5&l=13043 > FlinkKafkaProducerITCase testScaleUpAfterScalingDown Timeout expired while > initializing transactional state in 60000ms. > --- > > Key: FLINK-16908 > URL: https://issues.apache.org/jira/browse/FLINK-16908 > Project: Flink > Issue Type: Bug > Components: Connectors / Kafka >Affects Versions: 1.11.0, 1.12.0 >Reporter: Piotr Nowojski >Priority: Critical > Labels: test-stability > Fix For: 1.13.0 > > > https://dev.azure.com/rmetzger/Flink/_build/results?buildId=6889&view=logs&j=c5f0071e-1851-543e-9a45-9ac140befc32&t=f66652e3-384e-5b25-be29-abfea69ea8da > {noformat} > [ERROR] > testScaleUpAfterScalingDown(org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducerITCase) > Time elapsed: 64.353 s <<< ERROR! > org.apache.kafka.common.errors.TimeoutException: Timeout expired while > initializing transactional state in 60000ms. > {noformat} > After this initial error many other tests (I think all following unit tests) > failed with errors like: > {noformat} > [ERROR] > testFailAndRecoverSameCheckpointTwice(org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducerITCase) > Time elapsed: 7.895 s <<< FAILURE! > java.lang.AssertionError: Detected producer leak. Thread name: > kafka-producer-network-thread | producer-196 > at > org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducerITCase.checkProducerLeak(FlinkKafkaProducerITCase.java:675) > at > org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducerITCase.testFailAndRecoverSameCheckpointTwice(FlinkKafkaProducerITCase.java:311) > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005)
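For context, the `checkProducerLeak` assertion in the quoted failure scans the JVM's live threads for leftover Kafka producer network threads. A self-contained sketch of that style of leak check follows (the class below is a hypothetical stand-in; the real utility lives in `FlinkKafkaProducerITCase` and may differ in detail):

```java
import java.util.List;
import java.util.stream.Collectors;

class ProducerLeakCheck {
    // Returns the names of live threads that look like leaked Kafka producer
    // network threads (their names start with "kafka-producer-network-thread").
    static List<String> findLeakedProducerThreads() {
        return Thread.getAllStackTraces().keySet().stream()
                .map(Thread::getName)
                .filter(name -> name.startsWith("kafka-producer-network-thread"))
                .collect(Collectors.toList());
    }

    // Fails fast with the same style of message seen in the test output above.
    static void checkProducerLeak() {
        List<String> leaks = findLeakedProducerThreads();
        if (!leaks.isEmpty()) {
            throw new AssertionError("Detected producer leak. Thread name: " + leaks.get(0));
        }
    }
}
```

Because `Thread.getAllStackTraces()` only reports live threads, one leaked producer makes every subsequent test in the same JVM fail this check, which matches the cascade of failures described in the report.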
[GitHub] [flink] tillrohrmann edited a comment on pull request #14948: [FLINK-21333][coordination] Add StopWithSavepoint state to declarative scheduler
tillrohrmann edited a comment on pull request #14948: URL: https://github.com/apache/flink/pull/14948#issuecomment-797663142 Here is [a branch with a simple StopWithSavepoint implementation](https://github.com/tillrohrmann/flink/tree/FLINK-21333). Let me know what you think @rmetzger. This can definitely be polished further, and I might have overlooked some cases. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] tillrohrmann commented on pull request #14948: [FLINK-21333][coordination] Add StopWithSavepoint state to declarative scheduler
tillrohrmann commented on pull request #14948: URL: https://github.com/apache/flink/pull/14948#issuecomment-797663142 Here is [a branch with a simple `StopWithSavepoint` implementation](https://github.com/tillrohrmann/flink/tree/FLINK-21333). Let me know what you think. This can definitely be polished further, and I might have overlooked some cases. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] flinkbot edited a comment on pull request #15134: [FLINK-21654][test] Added @throws information to tests
flinkbot edited a comment on pull request #15134: URL: https://github.com/apache/flink/pull/15134#issuecomment-795077470 ## CI report: * f77abc145b06b6ef3a745ab382c1761ecae752d1 Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=14447) * 63851b4ad1d97292257cb8cb02f6bd211b503979 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=14562) Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run travis` re-run the last Travis build - `@flinkbot run azure` re-run the last Azure build This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] flinkbot edited a comment on pull request #15106: [BP-1.12][FLINK-21606] Reject JobManager <-> TaskManager connection if JobManager is not responsible for job
flinkbot edited a comment on pull request #15106: URL: https://github.com/apache/flink/pull/15106#issuecomment-792330426 ## CI report: * 8bf221d8f00686baafe48f05adc458406189dcd3 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=14550) Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run travis` re-run the last Travis build - `@flinkbot run azure` re-run the last Azure build This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Closed] (FLINK-21709) Officially deprecate the legacy planner
[ https://issues.apache.org/jira/browse/FLINK-21709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Timo Walther closed FLINK-21709. Fix Version/s: 1.13.0 Release Note: The old planner of the Table & SQL API is deprecated and will be dropped in Flink 1.14. This means that both the BatchTableEnvironment and DataSet API interop are reaching end of life. Use the unified TableEnvironment for batch and stream processing with the new planner or the DataStream API in batch execution mode. Resolution: Fixed Fixed in 1.13.0: 42a45a584665e40f9629d4ecc00c8af1316b3db5 > Officially deprecate the legacy planner > --- > > Key: FLINK-21709 > URL: https://issues.apache.org/jira/browse/FLINK-21709 > Project: Flink > Issue Type: Sub-task > Components: Documentation, Table SQL / API >Reporter: Timo Walther >Assignee: Timo Walther >Priority: Major > Labels: pull-request-available > Fix For: 1.13.0 > > > As discussed in > https://lists.apache.org/thread.html/r0851e101e37fbab273775b6a252172c7a9f7c7927107c160de779831%40%3Cdev.flink.apache.org%3E > we will perform the following steps in 1.13: > - Deprecate the `flink-table-planner` module > - Deprecate `BatchTableEnvironment` for Java, Scala, and Python > - Add a notice to release notes and documentation -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [flink] flinkbot edited a comment on pull request #14948: [FLINK-21333][coordination] Add StopWithSavepoint state to declarative scheduler
flinkbot edited a comment on pull request #14948: URL: https://github.com/apache/flink/pull/14948#issuecomment-779946654 ## CI report: * 097472900d4be5c87466bf7678aa506f6fd5d463 UNKNOWN * f13e6af735ee5c4e7917f9d8ee1e8c84208ef2ff Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=14548) Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run travis` re-run the last Travis build - `@flinkbot run azure` re-run the last Azure build This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] zentol commented on a change in pull request #15165: [FLINK-16829] Refactored Prometheus Metric Reporters to Use Constructor Initialization
zentol commented on a change in pull request #15165: URL: https://github.com/apache/flink/pull/15165#discussion_r593355790

## File path: flink-metrics/flink-metrics-prometheus/src/main/java/org/apache/flink/metrics/prometheus/PrometheusPushGatewayReporter.java

## @@ -56,43 +56,42 @@
 private boolean deleteOnShutdown;
 private Map<String, String> groupingKey;

-@Override
-public void open(MetricConfig config) {
-    super.open(config);
-
-    String host = config.getString(HOST.key(), HOST.defaultValue());
-    int port = config.getInteger(PORT.key(), PORT.defaultValue());
-    String configuredJobName = config.getString(JOB_NAME.key(), JOB_NAME.defaultValue());
-    boolean randomSuffix =
-            config.getBoolean(
-                    RANDOM_JOB_NAME_SUFFIX.key(), RANDOM_JOB_NAME_SUFFIX.defaultValue());
-    deleteOnShutdown =
-            config.getBoolean(DELETE_ON_SHUTDOWN.key(), DELETE_ON_SHUTDOWN.defaultValue());
-    groupingKey =
-            parseGroupingKey(config.getString(GROUPING_KEY.key(), GROUPING_KEY.defaultValue()));
-
-    if (host == null || host.isEmpty() || port < 1) {
+PrometheusPushGatewayReporter(
+        @Nullable final String hostConfig,
+        final int portConfig,
+        @Nullable final String jobNameConfig,
+        final boolean randomJobSuffixConfig,
+        final boolean deleteOnShutdownConfig,
+        @Nullable final Map<String, String> groupingKeyConfig) {
+    deleteOnShutdown = deleteOnShutdownConfig;
+    groupingKey = parseGroupingKey(groupingKeyConfig);
+
+    if (hostConfig == null || hostConfig.isEmpty() || portConfig < 1) {
         throw new IllegalArgumentException(
-                "Invalid host/port configuration. Host: " + host + " Port: " + port);
+                "Invalid host/port configuration. Host: " + hostConfig + " Port: " + portConfig);
     }
-    if (randomSuffix) {
-        this.jobName = configuredJobName + new AbstractID();
+    if (randomJobSuffixConfig) {
+        this.jobName = jobNameConfig + new AbstractID();
     } else {
-        this.jobName = configuredJobName;
+        this.jobName = jobNameConfig;
     }
-    pushGateway = new PushGateway(host + ':' + port);
+    pushGateway = new PushGateway(hostConfig + ':' + portConfig);
     log.info(
             "Configured PrometheusPushGatewayReporter with {host:{}, port:{}, jobName:{}, randomJobNameSuffix:{}, deleteOnShutdown:{}, groupingKey:{}}",
-            host,
-            port,
-            jobName,
-            randomSuffix,
+            hostConfig,
+            portConfig,
+            jobNameConfig,
+            randomJobSuffixConfig,
             deleteOnShutdown,
             groupingKey);
 }

+@Override
+public void open(MetricConfig config) {}
+
+Map<String, String> parseGroupingKey(final String groupingKeyConfig) {

Review comment: I think you're overthinking it.
```
public PrometheusPushGatewayReporter createMetricReporter(Properties properties) {
    // do all the stuff the current version does
    groupingKey = parseGroupingKey(groupingKeyConfig);
    return new PrometheusPushGatewayReporter(..., groupingKey);
}

static Map<String, String> parseGroupingKey(String groupingKeyConfig) { ... }
```
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
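The pattern zentol sketches — parse the configuration in the factory, hand immutable values to the constructor, and keep `parseGroupingKey` static so it can be unit-tested in isolation — can be illustrated with a self-contained toy. All names below, and the `k1=v1;k2=v2` grouping-key format, are assumptions for illustration, not Flink's exact API:

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;

// Simplified stand-in for the reporter under review: all configuration is
// parsed up front and handed to the constructor, leaving the instance immutable.
class PushReporter {
    final String host;
    final int port;
    final Map<String, String> groupingKey;

    PushReporter(String host, int port, Map<String, String> groupingKey) {
        if (host == null || host.isEmpty() || port < 1) {
            throw new IllegalArgumentException(
                    "Invalid host/port configuration. Host: " + host + " Port: " + port);
        }
        this.host = host;
        this.port = port;
        this.groupingKey = groupingKey;
    }

    // Static so the factory can call it before construction and tests can
    // exercise the parsing without building a reporter.
    static Map<String, String> parseGroupingKey(String config) {
        if (config == null || config.isEmpty()) {
            return Collections.emptyMap();
        }
        Map<String, String> result = new LinkedHashMap<>();
        for (String pair : config.split(";")) { // assumed "k1=v1;k2=v2" format
            int eq = pair.indexOf('=');
            if (eq > 0) {
                result.put(pair.substring(0, eq), pair.substring(eq + 1));
            }
        }
        return result;
    }

    // The factory does "all the stuff the current version does" and passes
    // the results to the constructor, as suggested in the review.
    static PushReporter createFromProperties(Properties props) {
        return new PushReporter(
                props.getProperty("host", "localhost"),
                Integer.parseInt(props.getProperty("port", "9091")),
                parseGroupingKey(props.getProperty("groupingKey", "")));
    }
}
```

With this split, invalid configuration fails at construction time instead of inside `open`, and nothing needs to be mutable or `@Nullable` on the instance itself.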
[GitHub] [flink] twalthr closed pull request #15144: [FLINK-21709][table] Officially deprecate the legacy planner
twalthr closed pull request #15144: URL: https://github.com/apache/flink/pull/15144 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] flinkbot edited a comment on pull request #15136: [FLINK-21622][table-planner] Introduce function TO_TIMESTAMP_LTZ(numeric [, precision])
flinkbot edited a comment on pull request #15136: URL: https://github.com/apache/flink/pull/15136#issuecomment-795141489 ## CI report: * fce5cb5d02a79b95ceb3a5bbf41be7d186e2fb85 Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=14466) * 38f1cea9a24f244241f6e2a9f069feb7f774652c Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=14561) Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run travis` re-run the last Travis build - `@flinkbot run azure` re-run the last Azure build This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] flinkbot edited a comment on pull request #15134: [FLINK-21654][test] Added @throws information to tests
flinkbot edited a comment on pull request #15134: URL: https://github.com/apache/flink/pull/15134#issuecomment-795077470 ## CI report: * f77abc145b06b6ef3a745ab382c1761ecae752d1 Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=14447) * 63851b4ad1d97292257cb8cb02f6bd211b503979 UNKNOWN Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run travis` re-run the last Travis build - `@flinkbot run azure` re-run the last Azure build This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] flinkbot edited a comment on pull request #15105: [FLINK-21606] Reject JobManager <-> TaskManager connection if JobManager is not responsible for job
flinkbot edited a comment on pull request #15105: URL: https://github.com/apache/flink/pull/15105#issuecomment-792330363 ## CI report: * 6d709fcabe70bf783f83d9906ae4b1550d894096 Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=14549) Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run travis` re-run the last Travis build - `@flinkbot run azure` re-run the last Azure build This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] flinkbot edited a comment on pull request #15107: [BP-1.11][FLINK-21606] Reject JobManager <-> TaskManager connection if JobManager is not responsible for job
flinkbot edited a comment on pull request #15107: URL: https://github.com/apache/flink/pull/15107#issuecomment-792330475 ## CI report: * 7edd724e13616e44c5b9aeff409aecdd910dc3fe Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=14551) Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run travis` re-run the last Travis build - `@flinkbot run azure` re-run the last Azure build This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] XComp commented on pull request #15049: [FLINK-21190][runtime-web] Expose exception history
XComp commented on pull request #15049: URL: https://github.com/apache/flink/pull/15049#issuecomment-797643595 I addressed your comments, squashed and rebased the branch. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] XComp commented on a change in pull request #15049: [FLINK-21190][runtime-web] Expose exception history
XComp commented on a change in pull request #15049: URL: https://github.com/apache/flink/pull/15049#discussion_r593343570 ## File path: flink-runtime/src/main/java/org/apache/flink/runtime/scheduler/ExceptionHistoryEntry.java ## @@ -0,0 +1,95 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.flink.runtime.scheduler; + +import org.apache.flink.annotation.VisibleForTesting; +import org.apache.flink.runtime.executiongraph.AccessExecution; +import org.apache.flink.runtime.executiongraph.ErrorInfo; +import org.apache.flink.runtime.taskmanager.TaskManagerLocation; + +import javax.annotation.Nullable; + +/** + * {@code ExceptionHistoryEntry} collects information about a single task failure that should be + * exposed through the exception history. + */ +public class ExceptionHistoryEntry extends ErrorInfo { + +private static final long serialVersionUID = -3855285510064263701L; + +@Nullable private final String failingTaskName; +@Nullable private final TaskManagerLocation assignedResourceLocation; Review comment: I created a `ArchivedTaskManagerLocation` for now as the refactoring would take a larger effort which is out of scope for this ticket. 
@zentol Do we already have a ticket that addresses this refactoring? If not, what's the headline of the refactoring ticket I should create? I understood that introducing the `TaskManagerLocation` interface was part of a larger effort. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] XComp commented on a change in pull request #15049: [FLINK-21190][runtime-web] Expose exception history
XComp commented on a change in pull request #15049: URL: https://github.com/apache/flink/pull/15049#discussion_r593341506 ## File path: flink-runtime/src/main/java/org/apache/flink/runtime/rest/handler/job/JobExceptionsHandler.java ## @@ -132,7 +142,62 @@ private static JobExceptionsInfo createJobExceptionsInfo( } } +final ErrorInfo rootCause = executionGraph.getFailureInfo(); +return new JobExceptionsInfoWithHistory( +rootCause.getException().getOriginalErrorClassName(), +rootCause.getExceptionAsString(), +rootCause.getTimestamp(), +taskExceptionList, +truncated, +createJobExceptionHistory( +executionGraphInfo.getExceptionHistory(), exceptionToReportMaxSize)); +} + +static JobExceptionsInfoWithHistory.JobExceptionHistory createJobExceptionHistory( +Iterable historyEntries, int limit) { +// we need to reverse the history to have a stable result when doing paging on it +final List reversedHistoryEntries = new ArrayList<>(); +Iterables.addAll(reversedHistoryEntries, historyEntries); +Collections.reverse(reversedHistoryEntries); + +List exceptionHistoryEntries = +reversedHistoryEntries.stream() +.limit(limit) +.map(JobExceptionsHandler::createJobExceptionInfo) +.collect(Collectors.toList()); + +return new JobExceptionsInfoWithHistory.JobExceptionHistory( +exceptionHistoryEntries, +exceptionHistoryEntries.size() < reversedHistoryEntries.size()); +} + +private static JobExceptionsInfo createJobExceptionInfo(ExceptionHistoryEntry historyEntry) { return new JobExceptionsInfo( -rootExceptionMessage, rootTimestamp, taskExceptionList, truncated); +historyEntry.getException().getOriginalErrorClassName(), +historyEntry.getExceptionAsString(), +historyEntry.getTimestamp(), +Collections.singletonList( Review comment: You have a good point here. I refactored the schema not relying on the old structure anymore. The refactoring also covered your concern. The `taskName` and `location` are not shared anymore if `null` This is an automated message from the Apache Git Service. 
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Updated] (FLINK-21752) NullPointerException on restore in PojoSerializer
[ https://issues.apache.org/jira/browse/FLINK-21752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Roman Khachatryan updated FLINK-21752: -- Description: As originally reported in [thread|http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Schema-Evolution-Cannot-restore-from-savepoint-after-deleting-field-from-POJO-td42162.html], after removing a field from a class restore from savepoint fails with the following exception: {code} 2021-03-10T20:51:30.406Z INFO org.apache.flink.runtime.taskmanager.Task:960 … (6/8) (d630d5ff0d7ae4fbc428b151abebab52) switched from RUNNING to FAILED. java.lang.Exception: Exception while creating StreamOperatorStateContext. at org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:195) at org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(AbstractStreamOperator.java:253) at org.apache.flink.streaming.runtime.tasks.StreamTask.initializeState(StreamTask.java:901) at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:415) at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:705) at org.apache.flink.runtime.taskmanager.Task.run(Task.java:530) at java.lang.Thread.run(Thread.java:748) Caused by: org.apache.flink.util.FlinkException: Could not restore keyed state backend for KeyedCoProcessOperator_c535ac415eeb524d67c88f4a481077d2_(6/8) from any of the 1 provided restore options. at org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:135) at org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.keyedStatedBackend(StreamTaskStateInitializerImpl.java:307) at org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:135) ... 
6 common frames omitted Caused by: org.apache.flink.runtime.state.BackendBuildingException: Failed when trying to restore heap backend at org.apache.flink.runtime.state.heap.HeapKeyedStateBackendBuilder.build(HeapKeyedStateBackendBuilder.java:116) at org.apache.flink.runtime.state.memory.MemoryStateBackend.createKeyedStateBackend(MemoryStateBackend.java:347) at org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.lambda$keyedStatedBackend$1(StreamTaskStateInitializerImpl.java:291) at org.apache.flink.streaming.api.operators.BackendRestorerProcedure.attemptCreateAndRestore(BackendRestorerProcedure.java:142) at org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:121) ... 8 common frames omitted Caused by: java.lang.NullPointerException: null at org.apache.flink.api.java.typeutils.runtime.PojoSerializer.(PojoSerializer.java:123) at org.apache.flink.api.java.typeutils.runtime.PojoSerializer.duplicate(PojoSerializer.java:186) at org.apache.flink.api.java.typeutils.runtime.PojoSerializer.duplicate(PojoSerializer.java:56) at org.apache.flink.api.common.typeutils.CompositeSerializer$PrecomputedParameters.precompute(CompositeSerializer.java:228) at org.apache.flink.api.common.typeutils.CompositeSerializer.(CompositeSerializer.java:51) at org.apache.flink.runtime.state.ttl.TtlStateFactory$TtlSerializer.(TtlStateFactory.java:250) at org.apache.flink.runtime.state.ttl.TtlStateFactory$TtlSerializerSnapshot.createOuterSerializerWithNestedSerializers(TtlStateFactory.java:359) at org.apache.flink.runtime.state.ttl.TtlStateFactory$TtlSerializerSnapshot.createOuterSerializerWithNestedSerializers(TtlStateFactory.java:330) at org.apache.flink.api.common.typeutils.CompositeTypeSerializerSnapshot.restoreSerializer(CompositeTypeSerializerSnapshot.java:194) at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193) at 
java.util.Spliterators$ArraySpliterator.forEachRemaining(Spliterators.java:948) at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482) at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472) at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:546) at java.util.stream.AbstractPipeline.evaluateToArrayNode(AbstractPipeline.java:260) at java.util.stream.ReferencePipeline.toArray(ReferencePipeline.java:505) at org.apache.flink.api.common.typeutils.NestedSerializersSnapshotDelegate.snapshotsToRestoreSerializers(NestedSerializersSnapshotDelegate.java:225) at org.apache.flink.api.common.typeutils.NestedSerializersSnapshotDelegate.getRestoredNestedSerializers(NestedSerializersSnapshotDelegate.java:83) at org.apache.flink.api.common.typ
[jira] [Created] (FLINK-21752) NullPointerException on restore in PojoSerializer
Roman Khachatryan created FLINK-21752: - Summary: NullPointerException on restore in PojoSerializer Key: FLINK-21752 URL: https://issues.apache.org/jira/browse/FLINK-21752 Project: Flink Issue Type: Bug Components: API / Type Serialization System Affects Versions: 1.9.3 Reporter: Roman Khachatryan As originally reported in [thread|http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Schema-Evolution-Cannot-restore-from-savepoint-after-deleting-field-from-POJO-td42162.html], after removing a field from a class restore from savepoint fails with the following exception: {code} 2021-03-10T20:51:30.406Z INFO org.apache.flink.runtime.taskmanager.Task:960 … (6/8) (d630d5ff0d7ae4fbc428b151abebab52) switched from RUNNING to FAILED. java.lang.Exception: Exception while creating StreamOperatorStateContext. at org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:195) at org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(AbstractStreamOperator.java:253) at org.apache.flink.streaming.runtime.tasks.StreamTask.initializeState(StreamTask.java:901) at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:415) at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:705) at org.apache.flink.runtime.taskmanager.Task.run(Task.java:530) at java.lang.Thread.run(Thread.java:748) Caused by: org.apache.flink.util.FlinkException: Could not restore keyed state backend for KeyedCoProcessOperator_c535ac415eeb524d67c88f4a481077d2_(6/8) from any of the 1 provided restore options. 
at org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:135) at org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.keyedStatedBackend(StreamTaskStateInitializerImpl.java:307) at org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:135) ... 6 common frames omitted Caused by: org.apache.flink.runtime.state.BackendBuildingException: Failed when trying to restore heap backend at org.apache.flink.runtime.state.heap.HeapKeyedStateBackendBuilder.build(HeapKeyedStateBackendBuilder.java:116) at org.apache.flink.runtime.state.memory.MemoryStateBackend.createKeyedStateBackend(MemoryStateBackend.java:347) at org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.lambda$keyedStatedBackend$1(StreamTaskStateInitializerImpl.java:291) at org.apache.flink.streaming.api.operators.BackendRestorerProcedure.attemptCreateAndRestore(BackendRestorerProcedure.java:142) at org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:121) ... 
8 common frames omitted Caused by: java.lang.NullPointerException: null at org.apache.flink.api.java.typeutils.runtime.PojoSerializer.(PojoSerializer.java:123) at org.apache.flink.api.java.typeutils.runtime.PojoSerializer.duplicate(PojoSerializer.java:186) at org.apache.flink.api.java.typeutils.runtime.PojoSerializer.duplicate(PojoSerializer.java:56) at org.apache.flink.api.common.typeutils.CompositeSerializer$PrecomputedParameters.precompute(CompositeSerializer.java:228) at org.apache.flink.api.common.typeutils.CompositeSerializer.(CompositeSerializer.java:51) at org.apache.flink.runtime.state.ttl.TtlStateFactory$TtlSerializer.(TtlStateFactory.java:250) at org.apache.flink.runtime.state.ttl.TtlStateFactory$TtlSerializerSnapshot.createOuterSerializerWithNestedSerializers(TtlStateFactory.java:359) at org.apache.flink.runtime.state.ttl.TtlStateFactory$TtlSerializerSnapshot.createOuterSerializerWithNestedSerializers(TtlStateFactory.java:330) at org.apache.flink.api.common.typeutils.CompositeTypeSerializerSnapshot.restoreSerializer(CompositeTypeSerializerSnapshot.java:194) at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193) at java.util.Spliterators$ArraySpliterator.forEachRemaining(Spliterators.java:948) at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482) at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472) at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:546) at java.util.stream.AbstractPipeline.evaluateToArrayNode(AbstractPipeline.java:260) at java.util.stream.ReferencePipeline.toArray(ReferencePipeline.java:505) at org.apache.flink.api.common.typeutils.NestedSerializersSnapshotDelegate.snapshotsToRestoreSerial
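The NPE at `PojoSerializer.<init>` is consistent with a null slot in the restored field/serializer array after the field was deleted from the class: duplicating the serializer then dereferences the null entry. A hypothetical, stripped-down illustration of that failure mode (these are not Flink's actual classes):

```java
// Hypothetical sketch: a serializer that, like PojoSerializer, duplicates a
// field-serializer array in its constructor. If schema evolution left a null
// slot (the removed field), the constructor throws NullPointerException.
public class NullFieldNpeDemo {
    static final class FieldLikeSerializer {
        final String fieldName;
        FieldLikeSerializer(String fieldName) { this.fieldName = fieldName; }
        FieldLikeSerializer duplicate() { return new FieldLikeSerializer(fieldName); }
    }

    static final class PojoLikeSerializer {
        final FieldLikeSerializer[] fieldSerializers;
        PojoLikeSerializer(FieldLikeSerializer[] restored) {
            this.fieldSerializers = new FieldLikeSerializer[restored.length];
            for (int i = 0; i < restored.length; i++) {
                // NPE here when a restored slot is null
                this.fieldSerializers[i] = restored[i].duplicate();
            }
        }
    }

    public static boolean restoreFails(FieldLikeSerializer[] restored) {
        try {
            new PojoLikeSerializer(restored);
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        FieldLikeSerializer[] intact = {new FieldLikeSerializer("a"), new FieldLikeSerializer("b")};
        FieldLikeSerializer[] afterFieldRemoval = {new FieldLikeSerializer("a"), null};
        System.out.println(restoreFails(intact));            // false
        System.out.println(restoreFails(afterFieldRemoval)); // true
    }
}
```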
[GitHub] [flink] XComp commented on pull request #15134: [FLINK-21654][test] Added @throws information to tests
XComp commented on pull request #15134: URL: https://github.com/apache/flink/pull/15134#issuecomment-797634153 I applied your comments, squashed the commits and rebased the branch. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] XComp commented on pull request #15134: [FLINK-21654][test] Added @throws information to tests
XComp commented on pull request #15134: URL: https://github.com/apache/flink/pull/15134#issuecomment-797633919 > Thanks for creating this PR @XComp. I think the change makes sense. I had a few minor comments. > > Why is this problem only limited to when we close the `YarnTestBase`? According to YARN-7007 it reads as if the failing of `YarnClient.getApplications` is not restricted to a specific state/timing? Are you referring to the PR description here? You're right. It's not limited to the `close()` call. The description was outdated. I realized during the implementation that `getApplication` is also called in other situations. This is covered by the most recent changes. I updated the PR description accordingly.
[GitHub] [flink] Yanikovic commented on a change in pull request #15140: [FLINK-20628][connectors/rabbitmq2] RabbitMQ connector using new connector API
Yanikovic commented on a change in pull request #15140: URL: https://github.com/apache/flink/pull/15140#discussion_r593330541 ## File path: flink-connectors/flink-connector-rabbitmq2/src/main/java/org/apache/flink/connector/rabbitmq2/sink/RabbitMQSink.java ## @@ -0,0 +1,229 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.flink.connector.rabbitmq2.sink; + +import org.apache.flink.api.common.serialization.DeserializationSchema; +import org.apache.flink.api.common.serialization.SerializationSchema; +import org.apache.flink.api.connector.sink.Committer; +import org.apache.flink.api.connector.sink.GlobalCommitter; +import org.apache.flink.api.connector.sink.Sink; +import org.apache.flink.api.connector.sink.SinkWriter; +import org.apache.flink.connector.rabbitmq2.ConsistencyMode; +import org.apache.flink.connector.rabbitmq2.RabbitMQConnectionConfig; +import org.apache.flink.connector.rabbitmq2.sink.state.RabbitMQSinkWriterState; +import org.apache.flink.connector.rabbitmq2.sink.state.RabbitMQSinkWriterStateSerializer; +import org.apache.flink.connector.rabbitmq2.sink.writer.RabbitMQSinkWriterBase; +import org.apache.flink.connector.rabbitmq2.sink.writer.specalized.RabbitMQSinkWriterAtLeastOnce; +import org.apache.flink.connector.rabbitmq2.sink.writer.specalized.RabbitMQSinkWriterAtMostOnce; +import org.apache.flink.connector.rabbitmq2.sink.writer.specalized.RabbitMQSinkWriterExactlyOnce; +import org.apache.flink.core.io.SimpleVersionedSerializer; +import org.apache.flink.util.Preconditions; + +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import javax.annotation.Nullable; + +import java.util.List; +import java.util.Optional; + +/** + * RabbitMQ sink (publisher) that publishes messages from upstream flink tasks to a RabbitMQ queue. + * It provides at-most-once, at-least-once and exactly-once processing semantics. For at-least-once + * and exactly-once, checkpointing needs to be enabled. The sink operates as a StreamingSink and + * thus works in a streaming fashion. 
+ * + * {@code + * RabbitMQSink + * .builder() + * .setConnectionConfig(connectionConfig) + * .setQueueName("queue") + * .setSerializationSchema(new SimpleStringSchema()) + * .setConsistencyMode(ConsistencyMode.AT_LEAST_ONCE) + * .setMinimalResendInterval(10L) + * .build(); + * } + * + * When creating the sink a {@code connectionConfig} must be specified via {@link + * RabbitMQConnectionConfig}. It contains required information for the RabbitMQ java client to + * connect to the RabbitMQ server. A minimum configuration contains a (virtual) host, a username, a + * password and a port. Besides that, the {@code queueName} to publish to and a {@link + * SerializationSchema} for the sink input type is required. {@code publishOptions} can be added to + * route messages in RabbitMQ. + * + * If at-least-once is required, an optional number of {@code maxRetry} attempts can be specified + * until a failure is triggered. Generally, messages are buffered until an acknowledgement arrives + * because delivery needs to be guaranteed. On each checkpoint, all unacknowledged messages will be + * resent to RabbitMQ. If the checkpointing interval is set low or a high frequency of resending is + * not desired, the {@code minimalResendIntervalMilliseconds} can be specified to prevent the sink + * from resending data that has just arrived. In case of a failure, all unacknowledged messages can + * be restored and resend. + * + * In the case of exactly-once a transactional RabbitMQ channel is used to achieve that all Review comment: Just for clarification: The minimalResendInterval is used only for at-least-once, the maxRetry for both at-least and exactly-once. For at-least-once, we didn't use a transaction but rather resend all unacked messages on snapshotState invocation. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
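The at-least-once bookkeeping described in the javadoc above — buffer until acknowledged, and on each checkpoint resend unacked messages unless they were sent within the minimal resend interval — can be sketched independently of the connector. All names here are illustrative assumptions, not the PR's actual implementation:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the at-least-once buffer: messages stay buffered
// until acknowledged; on snapshotState, unacked messages are resent unless
// they were sent more recently than minimalResendIntervalMillis ago.
public class AtLeastOnceBufferDemo {
    static final class Pending {
        final String payload;
        long lastSentAt;
        Pending(String payload, long sentAt) { this.payload = payload; this.lastSentAt = sentAt; }
    }

    private final Map<Long, Pending> unacked = new LinkedHashMap<>();
    private final long minimalResendIntervalMillis;

    public AtLeastOnceBufferDemo(long minimalResendIntervalMillis) {
        this.minimalResendIntervalMillis = minimalResendIntervalMillis;
    }

    public void send(long seqNo, String payload, long now) {
        unacked.put(seqNo, new Pending(payload, now));
    }

    public void ack(long seqNo) {
        unacked.remove(seqNo);
    }

    // Called on checkpoint: returns the payloads that must be resent now.
    public List<String> onSnapshot(long now) {
        List<String> toResend = new ArrayList<>();
        for (Pending p : unacked.values()) {
            if (now - p.lastSentAt >= minimalResendIntervalMillis) {
                toResend.add(p.payload);
                p.lastSentAt = now;
            }
        }
        return toResend;
    }
}
```

With an interval of 1000 ms, a message sent at t=500 is skipped by a snapshot at t=1000 but resent by one at t=1600, which matches the "don't resend data that has just arrived" behavior the javadoc describes.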
[GitHub] [flink] flinkbot edited a comment on pull request #15136: [FLINK-21622][table-planner] Introduce function TO_TIMESTAMP_LTZ(numeric [, precision])
flinkbot edited a comment on pull request #15136: URL: https://github.com/apache/flink/pull/15136#issuecomment-795141489 ## CI report: * fce5cb5d02a79b95ceb3a5bbf41be7d186e2fb85 Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=14466) * 38f1cea9a24f244241f6e2a9f069feb7f774652c UNKNOWN Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run travis` re-run the last Travis build - `@flinkbot run azure` re-run the last Azure build This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] flinkbot edited a comment on pull request #15054: [FLINK-13550][runtime][ui] Operator's Flame Graph
flinkbot edited a comment on pull request #15054: URL: https://github.com/apache/flink/pull/15054#issuecomment-788337524 ## CI report: * 26a28f2d83f56cb386e1365fd4df4fb8a2f2bf86 UNKNOWN * dfb0e42f28b809d98e5dec29e9540111e1aa7b10 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=14557)
[GitHub] [flink] flinkbot edited a comment on pull request #15049: [FLINK-21190][runtime-web] Expose exception history
flinkbot edited a comment on pull request #15049: URL: https://github.com/apache/flink/pull/15049#issuecomment-787719691 ## CI report: * 2cbffce55c35f7e163739d07f88e480870a0fc37 UNKNOWN * bd732209f8aaa9653dfeaf7be6ff8396ece70928 UNKNOWN * ce6fe841168017d9f5a3d6948d69c060514c7ea9 Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=14495) * d772ca219b9437388a0996cb5498789153ea8b1b Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=14560) * b8744b240d77fc84dee60042317b79585423d403 UNKNOWN
[jira] [Commented] (FLINK-21702) Support option `sql-client.verbose` to print the exception stack
[ https://issues.apache.org/jira/browse/FLINK-21702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17300429#comment-17300429 ] zhuxiaoshang commented on FLINK-21702: -- [~fsk119] [~jark] please assign this to me. > Support option `sql-client.verbose` to print the exception stack > > > Key: FLINK-21702 > URL: https://issues.apache.org/jira/browse/FLINK-21702 > Project: Flink > Issue Type: Sub-task > Components: Table SQL / Client >Affects Versions: 1.13.0 >Reporter: Shengkai Fang >Priority: Major > Labels: starter > > By enabling this option, users can get the full exception stack. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (FLINK-21466) Make "embedded" parameter optional when start sql-client.sh
[ https://issues.apache.org/jira/browse/FLINK-21466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17300427#comment-17300427 ] zhuxiaoshang commented on FLINK-21466: -- [~fsk119] [~jark] please assign this to me. > Make "embedded" parameter optional when start sql-client.sh > --- > > Key: FLINK-21466 > URL: https://issues.apache.org/jira/browse/FLINK-21466 > Project: Flink > Issue Type: Sub-task > Components: Table SQL / API >Reporter: Shengkai Fang >Priority: Major > Labels: starter > > Users can use command > {code:java} > >./sql-client.sh embedded -i init1.sql,init2.sql > {code} > to start the SQL client without having to add {{embedded}}.
[GitHub] [flink] flinkbot edited a comment on pull request #15049: [FLINK-21190][runtime-web] Expose exception history
flinkbot edited a comment on pull request #15049: URL: https://github.com/apache/flink/pull/15049#issuecomment-787719691 ## CI report: * 2cbffce55c35f7e163739d07f88e480870a0fc37 UNKNOWN * bd732209f8aaa9653dfeaf7be6ff8396ece70928 UNKNOWN * ce6fe841168017d9f5a3d6948d69c060514c7ea9 Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=14495) * d772ca219b9437388a0996cb5498789153ea8b1b UNKNOWN
[GitHub] [flink] tillrohrmann commented on a change in pull request #14948: [FLINK-21333][coordination] Add StopWithSavepoint state to declarative scheduler
tillrohrmann commented on a change in pull request #14948: URL: https://github.com/apache/flink/pull/14948#discussion_r593279092 ## File path: flink-runtime/src/main/java/org/apache/flink/runtime/scheduler/adaptive/Executing.java ## @@ -143,6 +145,26 @@ public void notifyNewResourcesAvailable() { } } +CompletableFuture stopWithSavepoint( +@Nullable final String targetDirectory, boolean terminate) { +final ExecutionGraph executionGraph = getExecutionGraph(); + +StopWithSavepointOperationManager.checkStopWithSavepointPreconditions( +executionGraph.getCheckpointCoordinator(), +targetDirectory, +executionGraph.getJobID(), +getLogger()); + +getLogger().info("Triggering stop-with-savepoint for job {}.", executionGraph.getJobID()); Review comment: nit: This sanity check could also go into the `AdaptiveScheduler.goToStopWithSavepoint` implementation if one argues that it is not the responsibility of the `Executing` state to decide whether this operation is possible or not. ## File path: flink-runtime/src/main/java/org/apache/flink/runtime/scheduler/adaptive/StopWithSavepointOperationManagerForAdaptiveScheduler.java ## @@ -0,0 +1,117 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.flink.runtime.scheduler.adaptive; + +import org.apache.flink.runtime.checkpoint.CompletedCheckpoint; +import org.apache.flink.runtime.checkpoint.StopWithSavepointOperations; +import org.apache.flink.runtime.concurrent.FutureUtils; +import org.apache.flink.runtime.execution.ExecutionState; +import org.apache.flink.runtime.executiongraph.Execution; +import org.apache.flink.runtime.executiongraph.ExecutionGraph; +import org.apache.flink.runtime.scheduler.stopwithsavepoint.StopWithSavepointOperationHandler; +import org.apache.flink.runtime.scheduler.stopwithsavepoint.StopWithSavepointOperationHandlerImpl; +import org.apache.flink.util.Preconditions; + +import org.slf4j.Logger; + +import javax.annotation.Nullable; + +import java.util.Collection; +import java.util.concurrent.CompletableFuture; +import java.util.concurrent.Executor; +import java.util.stream.Collectors; + +/** + * A thin wrapper around the StopWithSavepointOperationHandler to handle global failures according + * to the needs of AdaptiveScheduler. AdaptiveScheduler currently doesn't support local failover, + * hence, any reported failure will lead to a transition to Restarting or Failing state. + */ +class StopWithSavepointOperationManagerForAdaptiveScheduler implements GlobalFailureHandler { +private final StopWithSavepointOperationHandler handler; +private final ExecutionGraph executionGraph; +private final StopWithSavepointOperations stopWithSavepointOperations; +private boolean operationFinished = false; + +StopWithSavepointOperationManagerForAdaptiveScheduler( +ExecutionGraph executionGraph, +StopWithSavepointOperations stopWithSavepointOperations, +boolean terminate, +@Nullable String targetDirectory, +Executor mainThreadExecutor, +Logger logger) { + +this.stopWithSavepointOperations = stopWithSavepointOperations; + +// do not trigger checkpoints while creating the final savepoint. We will start the +// scheduler onLeave() again. 
+stopWithSavepointOperations.stopCheckpointScheduler(); + +final CompletableFuture savepointFuture = + stopWithSavepointOperations.triggerSynchronousSavepoint(terminate, targetDirectory); + +this.handler = +new StopWithSavepointOperationHandlerImpl( +executionGraph.getJobID(), this, stopWithSavepointOperations, logger); + +FutureUtils.assertNoException( +savepointFuture +// the completedSavepointFuture could also be completed by +// CheckpointCanceller which doesn't run in the mainThreadExecutor Review comment: This is a very good finding. ## File path: flink-runtime/src/main/java/org/apache/flink/runtime/sch
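The review point that "the completedSavepointFuture could also be completed by CheckpointCanceller which doesn't run in the mainThreadExecutor" maps to the standard `CompletableFuture` pattern of forcing the downstream stage onto a specific executor before touching scheduler state. A minimal sketch under that assumption (names are illustrative, not Flink's actual classes):

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicReference;

// Sketch: a future completed by an arbitrary thread (e.g. a canceller) is
// re-dispatched onto the "main thread" executor before its result is handled,
// so state transitions happen on the expected thread.
public class MainThreadHopDemo {
    public static String handleOnExecutor(CompletableFuture<String> savepointFuture,
                                          ExecutorService mainThreadExecutor) throws Exception {
        AtomicReference<String> handlerThread = new AtomicReference<>();
        savepointFuture
                // whenCompleteAsync guarantees the handler runs on the given executor,
                // regardless of which thread completed the future
                .whenCompleteAsync(
                        (path, err) -> handlerThread.set(Thread.currentThread().getName()),
                        mainThreadExecutor)
                .get();
        return handlerThread.get();
    }

    public static void main(String[] args) throws Exception {
        ExecutorService mainThread =
                Executors.newSingleThreadExecutor(r -> new Thread(r, "main-thread"));
        CompletableFuture<String> future = new CompletableFuture<>();
        // completed from a different thread, as the canceller might do
        new Thread(() -> future.complete("savepoint-path"), "canceller").start();
        System.out.println(handleOnExecutor(future, mainThread)); // prints "main-thread"
        mainThread.shutdown();
    }
}
```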
[GitHub] [flink] flinkbot edited a comment on pull request #15170: [FLINK-21610][tests] Do not attempt to cancel the job
flinkbot edited a comment on pull request #15170: URL: https://github.com/apache/flink/pull/15170#issuecomment-797471124 ## CI report: * 8f387ae8f7bf9ef74c971efd22704d85eec8e68f Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=14547)
[GitHub] [flink] flinkbot edited a comment on pull request #15154: [FLINK-21725][core] Name constructor arguments of Tuples like fields
flinkbot edited a comment on pull request #15154: URL: https://github.com/apache/flink/pull/15154#issuecomment-796737826 ## CI report: * 3040b7bdca9f57ca1a34a89cb057ff7b4d7124fe Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=14476) * 4572f80bab02248b8d5d1f8d2bda99f9a933e17f Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=14559)
[GitHub] [flink] rionmonster commented on a change in pull request #15165: [FLINK-16829] Refactored Prometheus Metric Reporters to Use Constructor Initialization
rionmonster commented on a change in pull request #15165: URL: https://github.com/apache/flink/pull/15165#discussion_r593281214 ## File path: flink-metrics/flink-metrics-prometheus/src/main/java/org/apache/flink/metrics/prometheus/PrometheusPushGatewayReporterFactory.java ## @@ -22,13 +22,38 @@ import java.util.Properties; +import static org.apache.flink.metrics.prometheus.PrometheusPushGatewayReporterOptions.DELETE_ON_SHUTDOWN; +import static org.apache.flink.metrics.prometheus.PrometheusPushGatewayReporterOptions.GROUPING_KEY; +import static org.apache.flink.metrics.prometheus.PrometheusPushGatewayReporterOptions.HOST; +import static org.apache.flink.metrics.prometheus.PrometheusPushGatewayReporterOptions.JOB_NAME; +import static org.apache.flink.metrics.prometheus.PrometheusPushGatewayReporterOptions.PORT; +import static org.apache.flink.metrics.prometheus.PrometheusPushGatewayReporterOptions.RANDOM_JOB_NAME_SUFFIX; + /** {@link MetricReporterFactory} for {@link PrometheusPushGatewayReporter}. */ @InterceptInstantiationViaReflection( reporterClassName = "org.apache.flink.metrics.prometheus.PrometheusPushGatewayReporter") public class PrometheusPushGatewayReporterFactory implements MetricReporterFactory { @Override public PrometheusPushGatewayReporter createMetricReporter(Properties properties) { -return new PrometheusPushGatewayReporter(); +String hostConfig = properties.getString(HOST.key(), HOST.defaultValue()); +int portConfig = properties.getInteger(PORT.key(), PORT.defaultValue()); +String jobNameConfig = properties.getString(JOB_NAME.key(), JOB_NAME.defaultValue()) Review comment: I've been working primarily with Kotlin over the last year or so, semicolons aren't my friends. ;) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] rionmonster commented on a change in pull request #15165: [FLINK-16829] Refactored Prometheus Metric Reporters to Use Constructor Initialization
rionmonster commented on a change in pull request #15165: URL: https://github.com/apache/flink/pull/15165#discussion_r593279557 ## File path: flink-metrics/flink-metrics-prometheus/src/main/java/org/apache/flink/metrics/prometheus/PrometheusPushGatewayReporter.java ## @@ -56,43 +56,42 @@ private boolean deleteOnShutdown; private Map groupingKey; -@Override -public void open(MetricConfig config) { -super.open(config); - -String host = config.getString(HOST.key(), HOST.defaultValue()); -int port = config.getInteger(PORT.key(), PORT.defaultValue()); -String configuredJobName = config.getString(JOB_NAME.key(), JOB_NAME.defaultValue()); -boolean randomSuffix = -config.getBoolean( -RANDOM_JOB_NAME_SUFFIX.key(), RANDOM_JOB_NAME_SUFFIX.defaultValue()); -deleteOnShutdown = -config.getBoolean(DELETE_ON_SHUTDOWN.key(), DELETE_ON_SHUTDOWN.defaultValue()); -groupingKey = -parseGroupingKey(config.getString(GROUPING_KEY.key(), GROUPING_KEY.defaultValue())); - -if (host == null || host.isEmpty() || port < 1) { +PrometheusPushGatewayReporter( +@Nullable final String hostConfig, +@Nullable final int portConfig, +@Nullable final String jobNameConfig, +@Nullable final boolean randomJobSuffixConfig, +@Nullable final boolean deleteOnShutdownConfig, +@Nullable final Map groupingKeyConfig +) { +deleteOnShutdown = deleteOnShutdownConfig +groupingKey = parseGroupingKey(groupingKeyConfig) + +if (hostConfig == null || hostConfig.isEmpty() || portConfig < 1) { throw new IllegalArgumentException( -"Invalid host/port configuration. Host: " + host + " Port: " + port); +"Invalid host/port configuration. 
Host: " + hostConfig + " Port: " + portConfig); } -if (randomSuffix) { -this.jobName = configuredJobName + new AbstractID(); +if (randomJobSuffixConfig) { +this.jobName = jobNameConfig + new AbstractID(); } else { -this.jobName = configuredJobName; +this.jobName = jobNameConfig; } -pushGateway = new PushGateway(host + ':' + port); +pushGateway = new PushGateway(hostConfig + ':' + portConfig); log.info( "Configured PrometheusPushGatewayReporter with {host:{}, port:{}, jobName:{}, randomJobNameSuffix:{}, deleteOnShutdown:{}, groupingKey:{}}", -host, -port, -jobName, -randomSuffix, +hostConfig, +portConfig, +jobNameConfig, +randomJobSuffixConfig, deleteOnShutdown, groupingKey); } +@Override +public void open(MetricConfig config) { } + Map parseGroupingKey(final String groupingKeyConfig) { Review comment: Thanks for the feedback! This makes sense from a testability perspective. Since the factory only presently has a `createMetricReporter` method defined on the interface, I'm guessing this static method would just support the sanitization / cleanup calls prior to just calling the other method? Something like: ``` static PrometheusPushGatewayReporter nameTbd(Properties properties) { // Check for nulls, general sanitization code here return createMetricReporter(...); } @Override public PrometheusPushGatewayReporter createMetricReporter(Properties properties) { // Omitted for brevity return new PrometheusPushGatewayReporter(...); } ``` Because otherwise, it just seems like `open` (or whatever it ends up being named) would just circumvent the need for a factory unless I'm just misunderstanding what the static method would do. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
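The factory/constructor split being discussed — a static helper that parses and sanitizes the `Properties`, feeding already-validated values into the constructor — can be sketched with hypothetical names (this is not the actual Prometheus reporter code):

```java
import java.util.Properties;

// Hypothetical sketch of the pattern under discussion: parsing lives in a
// static helper on the factory side, so the reporter constructor only sees
// validated values and stays easy to unit-test.
public class ReporterFactoryDemo {
    public static final class Reporter {
        public final String host;
        public final int port;
        Reporter(String host, int port) {
            if (host == null || host.isEmpty() || port < 1) {
                throw new IllegalArgumentException(
                        "Invalid host/port configuration. Host: " + host + " Port: " + port);
            }
            this.host = host;
            this.port = port;
        }
    }

    // static parsing helper, analogous to the "nameTbd" idea in the comment
    public static Reporter fromProperties(Properties properties) {
        String host = properties.getProperty("host", "localhost");
        int port = Integer.parseInt(properties.getProperty("port", "9091"));
        return new Reporter(host, port);
    }
}
```

The constructor keeps the invariant check, so any code path (factory or test) that builds a `Reporter` with a bad host/port fails fast with `IllegalArgumentException`.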
[GitHub] [flink] flinkbot edited a comment on pull request #15163: [FLINK-21710][table-planner-blink] FlinkRelMdUniqueKeys gets incorrect result on TableScan after project push-down
flinkbot edited a comment on pull request #15163: URL: https://github.com/apache/flink/pull/15163#issuecomment-797203134 ## CI report: * 6563e261e69a04e6428e7cd766273e5a3967abe9 Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=14541)
[GitHub] [flink] flinkbot edited a comment on pull request #15154: [FLINK-21725][core] Name constructor arguments of Tuples like fields
flinkbot edited a comment on pull request #15154: URL: https://github.com/apache/flink/pull/15154#issuecomment-796737826 ## CI report: * 3040b7bdca9f57ca1a34a89cb057ff7b4d7124fe Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=14476) * 4572f80bab02248b8d5d1f8d2bda99f9a933e17f UNKNOWN
[GitHub] [flink] flinkbot edited a comment on pull request #15144: [FLINK-21709][table] Officially deprecate the legacy planner
flinkbot edited a comment on pull request #15144: URL: https://github.com/apache/flink/pull/15144#issuecomment-796504890 ## CI report: * cf09b912a5b2e94d7df1339d71f532568216e308 Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=14483) * 8296846a743067953fbfd108c22c3fb627f0ef09 UNKNOWN * ef1b016a64f7755582e49e0e2a5d7d67a5e7dc16 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=14558)
[GitHub] [flink] flinkbot edited a comment on pull request #15054: [FLINK-13550][runtime][ui] Operator's Flame Graph
flinkbot edited a comment on pull request #15054: URL: https://github.com/apache/flink/pull/15054#issuecomment-788337524 ## CI report: * 26a28f2d83f56cb386e1365fd4df4fb8a2f2bf86 UNKNOWN * abab5960a130986aca48bd51aec2b897205106c9 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=14545) * dfb0e42f28b809d98e5dec29e9540111e1aa7b10 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=14557)
[GitHub] [flink] tillrohrmann commented on a change in pull request #15153: [FLINK-21707][runtime] Do not trigger scheduling of non-CREATED regions in PipelinedRegionSchedulingStrategy
tillrohrmann commented on a change in pull request #15153: URL: https://github.com/apache/flink/pull/15153#discussion_r593272810 ## File path: flink-tests/src/test/java/org/apache/flink/test/scheduling/PipelinedRegionSchedulingITCase.java ## @@ -204,4 +248,22 @@ public void invoke() throws Exception { } } } + +/** Invokable which fails exactly once with a {@link PartitionException}. */ +public static class OneTimeFailingReceiverWithPartitionException extends AbstractInvokable { + +private static final AtomicBoolean hasFailed = new AtomicBoolean(false); Review comment: I would suggest to make this resettable. Otherwise it won't be possible to run `testRecoverFromPartitionException` repeatedly from your IDE. Only the first run would throw the `PartitionNotFoundException` and all other runs would simply be a `NoOpInvokable`. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
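The resettable-flag suggestion above can be sketched as follows (hypothetical names; the real invokable lives in `PipelinedRegionSchedulingITCase`): the first invocation simulates the one-time failure, later ones succeed, and `reset()` makes the behavior repeatable across IDE test runs.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Sketch of a resettable one-time-failure flag: compareAndSet flips the flag
// exactly once, so only the first invoke() after a reset() reports a failure.
public class OneTimeFailingDemo {
    private static final AtomicBoolean hasFailed = new AtomicBoolean(false);

    // call from the test's setup so repeated runs behave identically
    public static void reset() {
        hasFailed.set(false);
    }

    // returns true when this invocation should simulate the one-time failure
    public static boolean invoke() {
        return hasFailed.compareAndSet(false, true);
    }
}
```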
[GitHub] [flink] knaufk commented on a change in pull request #15064: [FLINK-21074][FLINK-21076][docs][coord] Document Reactive Mode
knaufk commented on a change in pull request #15064:
URL: https://github.com/apache/flink/pull/15064#discussion_r593257043

## File path: docs/content/docs/deployment/elastic_scaling.md

```diff
@@ -0,0 +1,107 @@
+---
+title: Elastic Scaling
+weight: 5
+type: docs
+---
+
+# Elastic Scaling
+
+Flink allows you to adjust your cluster size to your workloads. This is possible manually by stopping a job with a savepoint and restarting it from the savepoint with a different parallelism.
+
+This page describes options where Flink automatically adjusts the parallelism.
+
+## Reactive Mode
+
+{{< hint danger >}}
+Reactive Mode is an MVP ("minimum viable product") feature. The Flink community is actively looking for feedback by users through our mailing lists. Please check the limitations listed on this page.
+{{< /hint >}}
+
+Reactive Mode configures Flink so that it always uses all resources made available to Flink. Adding a TaskManager will scale up your job, removing resources will scale it down. Flink will manage the parallelism of a job's operators, always setting them to the highest possible values.
```

Review comment:

```suggestion
Reactive Mode configures a job so that it always uses all resources available in the cluster. Adding a TaskManager will scale up your job, removing resources will scale it down. Flink will manage the parallelism of the job, always setting it to the highest possible value.
```

## File path: docs/content/docs/deployment/elastic_scaling.md

````diff
@@ -0,0 +1,107 @@
+The Reactive Mode allows Flink users to implement a powerful autoscaling mechanism, by having an external service monitor certain metrics, such as consumer lag, aggregate CPU utilization, throughput or latency. As soon as these metrics are above or below a certain threshold, additional TaskManagers can be added or removed from the Flink cluster. This could be implemented through changing the [replica factor](https://kubernetes.io/docs/concepts/workloads/controllers/deployment/#replicas) of a Kubernetes deployment, or an [autoscaling](https://docs.aws.amazon.com/autoscaling/ec2/userguide/AutoScalingGroup.html) group. This external service only needs to handle the resource allocation and deallocation. Flink will take care of keeping the job running with the resources available.
+
+### Getting started
+
+If you just want to try out Reactive Mode, follow these instructions. They assume that you are deploying Flink on one machine.
+
+```bash
+# these instructions assume you are in the root directory of a Flink distribution.
+
+# Put Job into lib/ directory
+cp ./examples/streaming/TopSpeedWindowing.jar lib/
+# Submit Job in Reactive Mode
+./bin/standalone-job.sh start -Drestart-strategy=fixeddelay -Drestart-strategy.fixed-delay.attempts=10 -Dscheduler-mode=reactive -j org.apache.flink.streaming.examples.windowing.TopSpeedWindowing
+# Start first TaskManager
+./bin/taskmanager.sh start
+```
+
+Let's quickly examine the submission command:
+- `./bin/standalone-job.sh start` deploys Flink in [Application Mode]({{< ref "docs/deployment/overview" >}}#application-mode)
+- `-Drestart-strategy=fixeddelay` and `-Drestart-strategy.fixed-delay.attempts=10` configure the job to restart on failure. This is needed for supporting scale-down.
+- `-Dscheduler-mode=reactive` enables Reactive Mode.
+- the last argument passes the Job's main class name.
+
+You have now started a Flink job in Reactive Mode. The [web interface](http://localhost:8081) shows that the job is running on one TaskManager. If you want to scale up the job, simply add another TaskManager to the cluster:
+```bash
+# Start additional TaskManager
+./bin/taskmanager.sh start
+```
+
+To scale down, remove a TaskManager instance.
+```bash
+# Remove a TaskManager
+./bin/taskmanager.sh stop
+```
+
+### Usage
+
+#### Configuration
+
+To enable Reactive Mode, you need to configure `scheduler-mode` to `reactive`.
````

Review comment:
Why is this configuration called "scheduler-mode"? Is it a
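Based on the quoted documentation, enabling Reactive Mode in a cluster configuration amounts to a single entry, equivalent to the `-Dscheduler-mode=reactive` flag used in the submission command above (a sketch of a `flink-conf.yaml` fragment):

```yaml
# flink-conf.yaml — enables the reactive scheduler mode described above
scheduler-mode: reactive
```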