[jira] [Closed] (FLINK-29156) Support LISTAGG in the Table API

2022-10-03 Thread Dian Fu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-29156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dian Fu closed FLINK-29156.
---
Fix Version/s: 1.17.0
   Resolution: Fixed

Merged to master via 397141ed7c00586e5b5449177840243babe2a776

> Support LISTAGG in the Table API
> 
>
> Key: FLINK-29156
> URL: https://issues.apache.org/jira/browse/FLINK-29156
> Project: Flink
>  Issue Type: New Feature
>  Components: Table SQL / API
>Reporter: zhangjingcun
>Assignee: zhangjingcun
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.17.0
>
>
> Currently, LISTAGG is not supported in the Table API.
> {code:python}
> table.group_by(col("a")) \
>     .select(
>         col("a"),
>         call("LISTAGG", col("b"), ","))
> {code}
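Until native support lands, LISTAGG's per-group semantics can be illustrated with a plain-Python sketch (a toy model of the aggregate, not PyFlink API code; the row and field names are made up for illustration):

```python
from collections import defaultdict

def listagg(rows, key, value, delimiter=","):
    """Per-group string concatenation, mirroring SQL LISTAGG semantics."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row[value])
    return {k: delimiter.join(vals) for k, vals in groups.items()}

rows = [{"a": 1, "b": "x"}, {"a": 1, "b": "y"}, {"a": 2, "b": "z"}]
print(listagg(rows, "a", "b"))  # {1: 'x,y', 2: 'z'}
```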



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-29494) ChangeLogNormalize operator causes unexpected firing of past windows after state restoration

2022-10-03 Thread Rashmin Patel (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-29494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rashmin Patel updated FLINK-29494:
--
Component/s: (was: Runtime / Checkpointing)

> ChangeLogNormalize operator causes unexpected firing of past windows after 
> state restoration
> 
>
> Key: FLINK-29494
> URL: https://issues.apache.org/jira/browse/FLINK-29494
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Kafka, Table SQL / Runtime
>Affects Versions: 1.14.2
> Environment: Flink version: 1.14.2
> API: Flink SQL
>Reporter: Rashmin Patel
>Priority: Critical
>
> *Issue Summary:*
> While doing GroupWindowAggregation on a stream produced by the `upsert-kafka` 
> connector, I am seeing unexpected behaviour: restoring a job from a 
> checkpoint/savepoint causes past windows (wrt the last watermark generated by 
> the previous job run) to fire.
> *Detailed Description:* 
> My program is written in Flink SQL.
> Watermark strategy: max-bounded-out-of-orderness with periodic generation 
> (with the default 200ms interval)
> Rowtime field: `updated_at_ts`, which is a monotonically increasing field in 
> the changelog stream produced by Debezium.
> Below is the runtime topology of the Flink job:
> Kafka Source (upsert mode) >> ChangeLogNormalize >> GroupWindowAggregate >> 
> PostgresSink
> *Job Logic Context:*
> I am reading a cdc-stream from Kafka, and the record schema looks something 
> like this:
> (pk, loan_acc_no, status, created_at, *updated_at,* __op).
> Now I want to count the number of distinct loan_acc_no per *hourly* window. So 
> I have created a watermark on the {{updated_at}} field and hence am also 
> tumbling on {{updated_at}}.
> *Usual scenario which triggers unexpected late windows:*
> Now suppose that for the previous job run, the latest running window 
> was{color:#0747a6} {{2022-09-10 08:59:59}}{color} (win_end time) and job had 
> processed events till {{{}08:30{}}}.
> Now upon restarting a job, suppose I got a first cdc event like (pk1, loan_1, 
> "approved", {color:#00875a}{{2022-09-02 00:00:00}}{color}, 
> {color:#00875a}{{2022-09-10 08:45:00}}{color}, "u")  say it {*}E1{*}, which 
> is not a late event wrt the last watermark generated by source operator in 
> previous job run.
> Now there is ChangeLogNormalize operator in between kafka source and window 
> operator. So, when kafka source forwards this *E1* to ChangeLogNormalize, it 
> will emit two records which will be of type -U and +U, and will be passed as 
> input to window operator.
>  -U (pk1, loan_1, "pending", {color:#00875a}{{2022-09-02 00:00:00}}{color}, 
> {color:#00875a}{{2022-09-05 00:00:00}}{color}, "u") => previous state of 
> record with key `{_}pk1{_}`
> +U (pk1, loan_1, "approved", {color:#00875a}{{2022-09-02 00:00:00}}{color}, 
> {color:#00875a}{{2022-09-10 08:45:00}}{color}, "u") => same as E1
> So these -U events are causing the problem, since their {{updated_at}} 
> can be any timestamp in the past and we are tumbling on this field. With 
> periodic watermarks, during the first watermark interval (i.e. the 200 ms 
> default) no events are considered late, so these -U events create 
> their window state, and upon receiving the first high watermark, the windows 
> created by these events fire.
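The failure mode described above can be reproduced with a toy event-time model (a deliberate simplification: the watermark advances per record rather than via Flink's 200 ms periodic generator, timestamps are plain seconds, and the restored watermark starts at -inf because it is not part of checkpointed state):

```python
def hour_window_end(ts):
    """End timestamp (exclusive) of the 1-hour tumbling window containing ts."""
    return ts - ts % 3600 + 3600

def run(rowtimes, out_of_orderness=5):
    """Feed rowtimes through a tumbling-window operator whose watermark was
    reset by the restore. Returns the window ends that fired."""
    watermark = float("-inf")
    pending = {}   # window end -> record count
    fired = []
    for ts in rowtimes:
        end = hour_window_end(ts)
        if end <= watermark:
            continue  # record is late: dropped, no window state created
        pending[end] = pending.get(end, 0) + 1
        watermark = max(watermark, ts - out_of_orderness)
        for w in sorted(pending):
            if w <= watermark:  # the advanced watermark fires the stale window
                fired.append(w)
                del pending[w]
    return fired

# The -U record carries an old rowtime (1000); the +U carries the real one
# (30000). Right after restore the stale window [0, 3600) is created and fires.
print(run([1000, 30000]))  # [3600]
```

With an already-advanced watermark the same old record is simply dropped as late (`run([30000, 1000])` fires nothing), which is why the problem only shows up in the first instants after restoration.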





[jira] [Updated] (FLINK-29494) ChangeLogNormalize operator causes unexpected firing of past windows after state restoration

2022-10-03 Thread Rashmin Patel (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-29494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rashmin Patel updated FLINK-29494:
--
Component/s: Runtime / Checkpointing

> ChangeLogNormalize operator causes unexpected firing of past windows after 
> state restoration
> 
>
> Key: FLINK-29494
> URL: https://issues.apache.org/jira/browse/FLINK-29494
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Kafka, Runtime / Checkpointing, Table SQL / 
> Runtime
>Affects Versions: 1.14.2
> Environment: Flink version: 1.14.2
> API: Flink SQL
>Reporter: Rashmin Patel
>Priority: Critical
>
> *Issue Summary:*
> While doing GroupWindowAggregation on a stream produced by the `upsert-kafka` 
> connector, I am seeing unexpected behaviour: restoring a job from a 
> checkpoint/savepoint causes past windows (wrt the last watermark generated by 
> the previous job run) to fire.
> *Detailed Description:* 
> My program is written in Flink SQL.
> Watermark strategy: max-bounded-out-of-orderness with periodic generation 
> (with the default 200ms interval)
> Rowtime field: `updated_at_ts`, which is a monotonically increasing field in 
> the changelog stream produced by Debezium.
> Below is the runtime topology of the Flink job:
> Kafka Source (upsert mode) >> ChangeLogNormalize >> GroupWindowAggregate >> 
> PostgresSink
> *Job Logic Context:*
> I am reading a cdc-stream from Kafka, and the record schema looks something 
> like this:
> (pk, loan_acc_no, status, created_at, *updated_at,* __op).
> Now I want to count the number of distinct loan_acc_no per *hourly* window. So 
> I have created a watermark on the {{updated_at}} field and hence am also 
> tumbling on {{updated_at}}.
> *Usual scenario which triggers unexpected late windows:*
> Now suppose that for the previous job run, the latest running window 
> was{color:#0747a6} {{2022-09-10 08:59:59}}{color} (win_end time) and job had 
> processed events till {{{}08:30{}}}.
> Now upon restarting a job, suppose I got a first cdc event like (pk1, loan_1, 
> "approved", {color:#00875a}{{2022-09-02 00:00:00}}{color}, 
> {color:#00875a}{{2022-09-10 08:45:00}}{color}, "u")  say it {*}E1{*}, which 
> is not a late event wrt the last watermark generated by source operator in 
> previous job run.
> Now there is ChangeLogNormalize operator in between kafka source and window 
> operator. So, when kafka source forwards this *E1* to ChangeLogNormalize, it 
> will emit two records which will be of type -U and +U, and will be passed as 
> input to window operator.
>  -U (pk1, loan_1, "pending", {color:#00875a}{{2022-09-02 00:00:00}}{color}, 
> {color:#00875a}{{2022-09-05 00:00:00}}{color}, "u") => previous state of 
> record with key `{_}pk1{_}`
> +U (pk1, loan_1, "approved", {color:#00875a}{{2022-09-02 00:00:00}}{color}, 
> {color:#00875a}{{2022-09-10 08:45:00}}{color}, "u") => same as E1
> So these -U events are causing the problem, since their {{updated_at}} 
> can be any timestamp in the past and we are tumbling on this field. With 
> periodic watermarks, during the first watermark interval (i.e. the 200 ms 
> default) no events are considered late, so these -U events create 
> their window state, and upon receiving the first high watermark, the windows 
> created by these events fire.





[jira] [Commented] (FLINK-29402) Add USE_DIRECT_READ configuration parameter for RocksDB

2022-10-03 Thread Donatien (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-29402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17612208#comment-17612208
 ] 

Donatien commented on FLINK-29402:
--

Thanks for your comment! Considering that Facebook uses DirectIO for reads and 
writes when benchmarking RocksDB 
([https://www.usenix.org/system/files/fast20-cao_zhichao.pdf]), I would say it 
is best practice to also enable DirectIO for Flink benchmarks using RocksDB. 
Disabling DirectIO can lead to unpredictable experiments, depending on (1) the 
container memory limit and (2) the amount of free memory used by the page 
cache. Again, I understand that this is only for research purposes, and I 
agree that it could be done programmatically.
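For illustration, a minimal sketch of what a DirectIO read looks like at the OS level, using only the Python standard library; `os.O_DIRECT` is Linux-specific and some filesystems (e.g. tmpfs) reject it, so the sketch falls back to a buffered read. RocksDB's direct-read mode sets the equivalent flag internally on Linux.

```python
import mmap
import os
import tempfile

def read_first_block(path, size=4096):
    """Read the first `size` bytes of `path`, bypassing the page cache
    via O_DIRECT when the platform and filesystem support it."""
    flags = os.O_RDONLY | getattr(os, "O_DIRECT", 0)
    try:
        fd = os.open(path, flags)
    except OSError:
        # Filesystem rejected O_DIRECT (e.g. tmpfs): buffered fallback.
        fd = os.open(path, os.O_RDONLY)
    try:
        # O_DIRECT needs a page-aligned buffer; anonymous mmap provides one.
        buf = mmap.mmap(-1, size)
        n = os.readv(fd, [buf])
        return bytes(buf[:n])
    finally:
        os.close(fd)
```

With direct reads, repeated runs of the same workload no longer depend on how much of the file the page cache happens to hold, which is the reproducibility point made above.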

> Add USE_DIRECT_READ configuration parameter for RocksDB
> ---
>
> Key: FLINK-29402
> URL: https://issues.apache.org/jira/browse/FLINK-29402
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / State Backends
>Affects Versions: 1.16.0
>Reporter: Donatien
>Priority: Not a Priority
>  Labels: Enhancement, pull-request-available, rocksdb
> Fix For: 1.17.0
>
> Attachments: directIO-performance-comparison.png
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> RocksDB allows the use of DirectIO for read operations to bypass the Linux 
> page cache. To understand the impact of the Linux page cache on performance, 
> one can run a heavy workload on a single-tasked Task Manager with a container 
> memory limit identical to the TM process memory. Running this same workload 
> on a TM with no container memory limit will result in better performance, but 
> with the host memory exceeding the TM requirement.
> The Linux page cache is of course useful, but it can give false results when 
> benchmarking the managed memory used by RocksDB. DirectIO is typically 
> enabled for benchmarks on working set estimation ([Zwaenepoel et 
> al.|https://arxiv.org/abs/1702.04323]).
> I propose to add a configuration key allowing users to enable the use of 
> DirectIO for reads via the RocksDB API. This configuration would be 
> disabled by default.





[jira] [Commented] (FLINK-29494) ChangeLogNormalize operator causes unexpected firing of past windows after state restoration

2022-10-03 Thread David Anderson (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-29494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17612213#comment-17612213
 ] 

David Anderson commented on FLINK-29494:


[~twalthr] [~jingzhang] [~godfrey] This is an interesting problem; it feels to 
me like we've missed something in the overall design of these operators.

One solution might be to implement watermark checkpointing / recovery, either 
globally or just in the ChangeLogNormalize operator. It also seems 
problematic that the window operator is consuming -U events, though I probably 
don't have the big picture. What do you think?

> ChangeLogNormalize operator causes unexpected firing of past windows after 
> state restoration
> 
>
> Key: FLINK-29494
> URL: https://issues.apache.org/jira/browse/FLINK-29494
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Kafka, Table SQL / Runtime
>Affects Versions: 1.14.2
> Environment: Flink version: 1.14.2
> API: Flink SQL
>Reporter: Rashmin Patel
>Priority: Critical
>
> *Issue Summary:*
> While doing GroupWindowAggregation on a stream produced by the `upsert-kafka` 
> connector, I am seeing unexpected behaviour: restoring a job from a 
> checkpoint/savepoint causes past windows (wrt the last watermark generated by 
> the previous job run) to fire.
> *Detailed Description:* 
> My program is written in Flink SQL.
> Watermark strategy: max-bounded-out-of-orderness with periodic generation 
> (with the default 200ms interval)
> Rowtime field: `updated_at_ts`, which is a monotonically increasing field in 
> the changelog stream produced by Debezium.
> Below is the runtime topology of the Flink job:
> Kafka Source (upsert mode) >> ChangeLogNormalize >> GroupWindowAggregate >> 
> PostgresSink
> *Job Logic Context:*
> I am reading a cdc-stream from Kafka, and the record schema looks something 
> like this:
> (pk, loan_acc_no, status, created_at, *updated_at,* __op).
> Now I want to count the number of distinct loan_acc_no per *hourly* window. So 
> I have created a watermark on the {{updated_at}} field and hence am also 
> tumbling on {{updated_at}}.
> *Usual scenario which triggers unexpected late windows:*
> Now suppose that for the previous job run, the latest running window 
> was{color:#0747a6} {{2022-09-10 08:59:59}}{color} (win_end time) and job had 
> processed events till {{{}08:30{}}}.
> Now upon restarting a job, suppose I got a first cdc event like (pk1, loan_1, 
> "approved", {color:#00875a}{{2022-09-02 00:00:00}}{color}, 
> {color:#00875a}{{2022-09-10 08:45:00}}{color}, "u")  say it {*}E1{*}, which 
> is not a late event wrt the last watermark generated by source operator in 
> previous job run.
> Now there is ChangeLogNormalize operator in between kafka source and window 
> operator. So, when kafka source forwards this *E1* to ChangeLogNormalize, it 
> will emit two records which will be of type -U and +U, and will be passed as 
> input to window operator.
>  -U (pk1, loan_1, "pending", {color:#00875a}{{2022-09-02 00:00:00}}{color}, 
> {color:#00875a}{{2022-09-05 00:00:00}}{color}, "u") => previous state of 
> record with key `{_}pk1{_}`
> +U (pk1, loan_1, "approved", {color:#00875a}{{2022-09-02 00:00:00}}{color}, 
> {color:#00875a}{{2022-09-10 08:45:00}}{color}, "u") => same as E1
> So these -U events are causing the problem, since their {{updated_at}} 
> can be any timestamp in the past and we are tumbling on this field. With 
> periodic watermarks, during the first watermark interval (i.e. the 200 ms 
> default) no events are considered late, so these -U events create 
> their window state, and upon receiving the first high watermark, the windows 
> created by these events fire.





[jira] [Resolved] (FLINK-29413) Make it possible to associate triggered and completed savepoints

2022-10-03 Thread Gyula Fora (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-29413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gyula Fora resolved FLINK-29413.

Resolution: Fixed

merged to main 1f6a75056acae90e9fab182fd076ee6755b35bbb

> Make it possible to associate triggered and completed savepoints
> 
>
> Key: FLINK-29413
> URL: https://issues.apache.org/jira/browse/FLINK-29413
> Project: Flink
>  Issue Type: Improvement
>  Components: Kubernetes Operator
>Reporter: Gyula Fora
>Assignee: Matyas Orhidi
>Priority: Major
>  Labels: pull-request-available
> Fix For: kubernetes-operator-1.3.0
>
>
> Currently it is not clear how one would associate completed manual savepoints 
> with savepointTriggerNonce-es when using the operator.
> This makes it difficult to track when a savepoint was completed vs. when it 
> was abandoned.
> One idea would be to add the savepointTriggerNonce to the completed 
> checkpoint info for manual savepoints.





[jira] [Commented] (FLINK-29419) HybridShuffle.testHybridFullExchangesRestart hangs

2022-10-03 Thread Xingbo Huang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-29419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17612296#comment-17612296
 ] 

Xingbo Huang commented on FLINK-29419:
--

https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=41535&view=logs&j=2c3cbe13-dee0-5837-cf47-3053da9a8a78&t=b78d9d30-509a-5cea-1fef-db7abaa325ae

> HybridShuffle.testHybridFullExchangesRestart hangs
> --
>
> Key: FLINK-29419
> URL: https://issues.apache.org/jira/browse/FLINK-29419
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Network
>Affects Versions: 1.16.0, 1.17.0
>Reporter: Huang Xingbo
>Assignee: Weijie Guo
>Priority: Critical
>  Labels: pull-request-available, test-stability
>
> {code:java}
> 2022-09-26T10:56:44.0766792Z Sep 26 10:56:44 "ForkJoinPool-1-worker-25" #27 
> daemon prio=5 os_prio=0 tid=0x7f41a4efa000 nid=0x6d76 waiting on 
> condition [0x7f40ac135000]
> 2022-09-26T10:56:44.0767432Z Sep 26 10:56:44java.lang.Thread.State: 
> WAITING (parking)
> 2022-09-26T10:56:44.0767892Z Sep 26 10:56:44  at sun.misc.Unsafe.park(Native 
> Method)
> 2022-09-26T10:56:44.0768644Z Sep 26 10:56:44  - parking to wait for  
> <0xa0704e18> (a java.util.concurrent.CompletableFuture$Signaller)
> 2022-09-26T10:56:44.0769287Z Sep 26 10:56:44  at 
> java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
> 2022-09-26T10:56:44.0769949Z Sep 26 10:56:44  at 
> java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1707)
> 2022-09-26T10:56:44.0770623Z Sep 26 10:56:44  at 
> java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3313)
> 2022-09-26T10:56:44.0771349Z Sep 26 10:56:44  at 
> java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1742)
> 2022-09-26T10:56:44.0772092Z Sep 26 10:56:44  at 
> java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908)
> 2022-09-26T10:56:44.0772777Z Sep 26 10:56:44  at 
> org.apache.flink.test.runtime.JobGraphRunningUtil.execute(JobGraphRunningUtil.java:57)
> 2022-09-26T10:56:44.0773534Z Sep 26 10:56:44  at 
> org.apache.flink.test.runtime.BatchShuffleITCaseBase.executeJob(BatchShuffleITCaseBase.java:115)
> 2022-09-26T10:56:44.0774333Z Sep 26 10:56:44  at 
> org.apache.flink.test.runtime.HybridShuffleITCase.testHybridFullExchangesRestart(HybridShuffleITCase.java:59)
> {code}
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=41343&view=logs&j=a57e0635-3fad-5b08-57c7-a4142d7d6fa9&t=2ef0effc-1da1-50e5-c2bd-aab434b1c5b7





[jira] [Commented] (FLINK-29387) IntervalJoinITCase.testIntervalJoinSideOutputRightLateData failed with AssertionError

2022-10-03 Thread Xingbo Huang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-29387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17612298#comment-17612298
 ] 

Xingbo Huang commented on FLINK-29387:
--

https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=41535&view=logs&j=8fd9202e-fd17-5b26-353c-ac1ff76c8f28&t=ea7cf968-e585-52cb-e0fc-f48de023a7ca

> IntervalJoinITCase.testIntervalJoinSideOutputRightLateData failed with 
> AssertionError
> -
>
> Key: FLINK-29387
> URL: https://issues.apache.org/jira/browse/FLINK-29387
> Project: Flink
>  Issue Type: Bug
>  Components: API / DataStream
>Affects Versions: 1.17.0
>Reporter: Huang Xingbo
>Priority: Critical
>  Labels: test-stability
>
> {code:java}
> 2022-09-22T04:40:21.9296331Z Sep 22 04:40:21 [ERROR] 
> org.apache.flink.test.streaming.runtime.IntervalJoinITCase.testIntervalJoinSideOutputRightLateData
>   Time elapsed: 2.46 s  <<< FAILURE!
> 2022-09-22T04:40:21.9297487Z Sep 22 04:40:21 java.lang.AssertionError: 
> expected:<[(key,2)]> but was:<[]>
> 2022-09-22T04:40:21.9298208Z Sep 22 04:40:21  at 
> org.junit.Assert.fail(Assert.java:89)
> 2022-09-22T04:40:21.9298927Z Sep 22 04:40:21  at 
> org.junit.Assert.failNotEquals(Assert.java:835)
> 2022-09-22T04:40:21.9299655Z Sep 22 04:40:21  at 
> org.junit.Assert.assertEquals(Assert.java:120)
> 2022-09-22T04:40:21.9300403Z Sep 22 04:40:21  at 
> org.junit.Assert.assertEquals(Assert.java:146)
> 2022-09-22T04:40:21.9301538Z Sep 22 04:40:21  at 
> org.apache.flink.test.streaming.runtime.IntervalJoinITCase.expectInAnyOrder(IntervalJoinITCase.java:521)
> 2022-09-22T04:40:21.9302578Z Sep 22 04:40:21  at 
> org.apache.flink.test.streaming.runtime.IntervalJoinITCase.testIntervalJoinSideOutputRightLateData(IntervalJoinITCase.java:280)
> 2022-09-22T04:40:21.9303641Z Sep 22 04:40:21  at 
> sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 2022-09-22T04:40:21.9304472Z Sep 22 04:40:21  at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> 2022-09-22T04:40:21.9305371Z Sep 22 04:40:21  at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 2022-09-22T04:40:21.9306195Z Sep 22 04:40:21  at 
> java.lang.reflect.Method.invoke(Method.java:498)
> 2022-09-22T04:40:21.9307011Z Sep 22 04:40:21  at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
> 2022-09-22T04:40:21.9308077Z Sep 22 04:40:21  at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> 2022-09-22T04:40:21.9308968Z Sep 22 04:40:21  at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
> 2022-09-22T04:40:21.9309849Z Sep 22 04:40:21  at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> 2022-09-22T04:40:21.9310704Z Sep 22 04:40:21  at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
> 2022-09-22T04:40:21.9311533Z Sep 22 04:40:21  at 
> org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
> 2022-09-22T04:40:21.9312386Z Sep 22 04:40:21  at 
> org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
> 2022-09-22T04:40:21.9313231Z Sep 22 04:40:21  at 
> org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
> 2022-09-22T04:40:21.9314985Z Sep 22 04:40:21  at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
> 2022-09-22T04:40:21.9315857Z Sep 22 04:40:21  at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
> 2022-09-22T04:40:21.9316633Z Sep 22 04:40:21  at 
> org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
> 2022-09-22T04:40:21.9317450Z Sep 22 04:40:21  at 
> org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
> 2022-09-22T04:40:21.9318209Z Sep 22 04:40:21  at 
> org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
> 2022-09-22T04:40:21.9318949Z Sep 22 04:40:21  at 
> org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
> 2022-09-22T04:40:21.9319680Z Sep 22 04:40:21  at 
> org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
> 2022-09-22T04:40:21.9320401Z Sep 22 04:40:21  at 
> org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
> 2022-09-22T04:40:21.9321130Z Sep 22 04:40:21  at 
> org.junit.runners.ParentRunner.run(ParentRunner.java:413)
> 2022-09-22T04:40:21.9321822Z Sep 22 04:40:21  at 
> org.junit.runner.JUnitCore.run(JUnitCore.java:137)
> 2022-09-22T04:40:21.9322498Z Sep 22 04:40:21  at 
> org.junit.runner.JUnitCore.run(JUnitCore.java:115)
> 2022-09-22T04:40:21.9323248Z Sep 22 04:40:21  at 
> org.junit.vintage.engine.execution.RunnerExecutor.execute(RunnerExecutor.java:42)
> 2022-09-22T

[jira] [Created] (FLINK-29495) PulsarSinkE2ECase hang

2022-10-03 Thread Xingbo Huang (Jira)
Xingbo Huang created FLINK-29495:


 Summary: PulsarSinkE2ECase hang
 Key: FLINK-29495
 URL: https://issues.apache.org/jira/browse/FLINK-29495
 Project: Flink
  Issue Type: Bug
  Components: Connectors / Pulsar
Affects Versions: 1.15.2, 1.16.0, 1.17.0
Reporter: Xingbo Huang


{code:java}
2022-10-02T05:53:56.0611489Z "main" #1 prio=5 os_prio=0 cpu=5171.60ms 
elapsed=9072.82s tid=0x7f9508028000 nid=0x54ef1 waiting on condition  
[0x7f950f994000]
2022-10-02T05:53:56.0612041Zjava.lang.Thread.State: TIMED_WAITING (parking)
2022-10-02T05:53:56.0612475Zat 
jdk.internal.misc.Unsafe.park(java.base@11.0.16.1/Native Method)
2022-10-02T05:53:56.0613302Z- parking to wait for  <0x87d261f8> (a 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
2022-10-02T05:53:56.0613959Zat 
java.util.concurrent.locks.LockSupport.parkNanos(java.base@11.0.16.1/LockSupport.java:234)
2022-10-02T05:53:56.0614661Zat 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(java.base@11.0.16.1/AbstractQueuedSynchronizer.java:2123)
2022-10-02T05:53:56.0615428Zat 
org.apache.pulsar.common.util.collections.GrowableArrayBlockingQueue.poll(GrowableArrayBlockingQueue.java:203)
2022-10-02T05:53:56.0616165Zat 
org.apache.pulsar.client.impl.MultiTopicsConsumerImpl.internalReceive(MultiTopicsConsumerImpl.java:370)
2022-10-02T05:53:56.0616807Zat 
org.apache.pulsar.client.impl.ConsumerBase.receive(ConsumerBase.java:198)
2022-10-02T05:53:56.0617486Zat 
org.apache.flink.connector.pulsar.testutils.sink.PulsarPartitionDataReader.poll(PulsarPartitionDataReader.java:72)
 {code}
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=41526&view=logs&j=6e8542d7-de38-5a33-4aca-458d6c87066d&t=5846934b-7a4f-545b-e5b0-eb4d8bda32e1





[jira] [Commented] (FLINK-29495) PulsarSinkE2ECase hang

2022-10-03 Thread Xingbo Huang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-29495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17612301#comment-17612301
 ] 

Xingbo Huang commented on FLINK-29495:
--

cc [~syhily]  Could you help take a look? Thx.

> PulsarSinkE2ECase hang
> --
>
> Key: FLINK-29495
> URL: https://issues.apache.org/jira/browse/FLINK-29495
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Pulsar
>Affects Versions: 1.16.0, 1.17.0, 1.15.2
>Reporter: Xingbo Huang
>Priority: Critical
>  Labels: test-stability
>
> {code:java}
> 2022-10-02T05:53:56.0611489Z "main" #1 prio=5 os_prio=0 cpu=5171.60ms 
> elapsed=9072.82s tid=0x7f9508028000 nid=0x54ef1 waiting on condition  
> [0x7f950f994000]
> 2022-10-02T05:53:56.0612041Zjava.lang.Thread.State: TIMED_WAITING 
> (parking)
> 2022-10-02T05:53:56.0612475Z  at 
> jdk.internal.misc.Unsafe.park(java.base@11.0.16.1/Native Method)
> 2022-10-02T05:53:56.0613302Z  - parking to wait for  <0x87d261f8> (a 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> 2022-10-02T05:53:56.0613959Z  at 
> java.util.concurrent.locks.LockSupport.parkNanos(java.base@11.0.16.1/LockSupport.java:234)
> 2022-10-02T05:53:56.0614661Z  at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(java.base@11.0.16.1/AbstractQueuedSynchronizer.java:2123)
> 2022-10-02T05:53:56.0615428Z  at 
> org.apache.pulsar.common.util.collections.GrowableArrayBlockingQueue.poll(GrowableArrayBlockingQueue.java:203)
> 2022-10-02T05:53:56.0616165Z  at 
> org.apache.pulsar.client.impl.MultiTopicsConsumerImpl.internalReceive(MultiTopicsConsumerImpl.java:370)
> 2022-10-02T05:53:56.0616807Z  at 
> org.apache.pulsar.client.impl.ConsumerBase.receive(ConsumerBase.java:198)
> 2022-10-02T05:53:56.0617486Z  at 
> org.apache.flink.connector.pulsar.testutils.sink.PulsarPartitionDataReader.poll(PulsarPartitionDataReader.java:72)
>  {code}
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=41526&view=logs&j=6e8542d7-de38-5a33-4aca-458d6c87066d&t=5846934b-7a4f-545b-e5b0-eb4d8bda32e1





[jira] [Commented] (FLINK-29495) PulsarSinkE2ECase hang

2022-10-03 Thread Xingbo Huang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-29495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17612302#comment-17612302
 ] 

Xingbo Huang commented on FLINK-29495:
--

https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=41525&view=logs&j=6e8542d7-de38-5a33-4aca-458d6c87066d&t=5846934b-7a4f-545b-e5b0-eb4d8bda32e1&l=21963

> PulsarSinkE2ECase hang
> --
>
> Key: FLINK-29495
> URL: https://issues.apache.org/jira/browse/FLINK-29495
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Pulsar
>Affects Versions: 1.16.0, 1.17.0, 1.15.2
>Reporter: Xingbo Huang
>Priority: Critical
>  Labels: test-stability
>
> {code:java}
> 2022-10-02T05:53:56.0611489Z "main" #1 prio=5 os_prio=0 cpu=5171.60ms 
> elapsed=9072.82s tid=0x7f9508028000 nid=0x54ef1 waiting on condition  
> [0x7f950f994000]
> 2022-10-02T05:53:56.0612041Zjava.lang.Thread.State: TIMED_WAITING 
> (parking)
> 2022-10-02T05:53:56.0612475Z  at 
> jdk.internal.misc.Unsafe.park(java.base@11.0.16.1/Native Method)
> 2022-10-02T05:53:56.0613302Z  - parking to wait for  <0x87d261f8> (a 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> 2022-10-02T05:53:56.0613959Z  at 
> java.util.concurrent.locks.LockSupport.parkNanos(java.base@11.0.16.1/LockSupport.java:234)
> 2022-10-02T05:53:56.0614661Z  at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(java.base@11.0.16.1/AbstractQueuedSynchronizer.java:2123)
> 2022-10-02T05:53:56.0615428Z  at 
> org.apache.pulsar.common.util.collections.GrowableArrayBlockingQueue.poll(GrowableArrayBlockingQueue.java:203)
> 2022-10-02T05:53:56.0616165Z  at 
> org.apache.pulsar.client.impl.MultiTopicsConsumerImpl.internalReceive(MultiTopicsConsumerImpl.java:370)
> 2022-10-02T05:53:56.0616807Z  at 
> org.apache.pulsar.client.impl.ConsumerBase.receive(ConsumerBase.java:198)
> 2022-10-02T05:53:56.0617486Z  at 
> org.apache.flink.connector.pulsar.testutils.sink.PulsarPartitionDataReader.poll(PulsarPartitionDataReader.java:72)
>  {code}
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=41526&view=logs&j=6e8542d7-de38-5a33-4aca-458d6c87066d&t=5846934b-7a4f-545b-e5b0-eb4d8bda32e1





[jira] [Commented] (FLINK-29427) LookupJoinITCase failed with classloader problem

2022-10-03 Thread Xingbo Huang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-29427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17612304#comment-17612304
 ] 

Xingbo Huang commented on FLINK-29427:
--

https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=41527&view=logs&j=0c940707-2659-5648-cbe6-a1ad63045f0a&t=075c2716-8010-5565-fe08-3c4bb45824a4

> LookupJoinITCase failed with classloader problem
> 
>
> Key: FLINK-29427
> URL: https://issues.apache.org/jira/browse/FLINK-29427
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Planner
>Affects Versions: 1.16.0
>Reporter: Huang Xingbo
>Priority: Critical
>  Labels: test-stability
>
> {code:java}
> 2022-09-27T02:49:20.9501313Z Sep 27 02:49:20 Caused by: 
> org.codehaus.janino.InternalCompilerException: Compiling 
> "KeyProjection$108341": Trying to access closed classloader. Please check if 
> you store classloaders directly or indirectly in static fields. If the 
> stacktrace suggests that the leak occurs in a third party library and cannot 
> be fixed immediately, you can disable this check with the configuration 
> 'classloader.check-leaked-classloader'.
> 2022-09-27T02:49:20.9502654Z Sep 27 02:49:20  at 
> org.codehaus.janino.UnitCompiler.compileUnit(UnitCompiler.java:382)
> 2022-09-27T02:49:20.9503366Z Sep 27 02:49:20  at 
> org.codehaus.janino.SimpleCompiler.cook(SimpleCompiler.java:237)
> 2022-09-27T02:49:20.9504044Z Sep 27 02:49:20  at 
> org.codehaus.janino.SimpleCompiler.compileToClassLoader(SimpleCompiler.java:465)
> 2022-09-27T02:49:20.9504704Z Sep 27 02:49:20  at 
> org.codehaus.janino.SimpleCompiler.cook(SimpleCompiler.java:216)
> 2022-09-27T02:49:20.9505341Z Sep 27 02:49:20  at 
> org.codehaus.janino.SimpleCompiler.cook(SimpleCompiler.java:207)
> 2022-09-27T02:49:20.9505965Z Sep 27 02:49:20  at 
> org.codehaus.commons.compiler.Cookable.cook(Cookable.java:80)
> 2022-09-27T02:49:20.9506584Z Sep 27 02:49:20  at 
> org.codehaus.commons.compiler.Cookable.cook(Cookable.java:75)
> 2022-09-27T02:49:20.9507261Z Sep 27 02:49:20  at 
> org.apache.flink.table.runtime.generated.CompileUtils.doCompile(CompileUtils.java:104)
> 2022-09-27T02:49:20.9507883Z Sep 27 02:49:20  ... 30 more
> 2022-09-27T02:49:20.9509266Z Sep 27 02:49:20 Caused by: 
> java.lang.IllegalStateException: Trying to access closed classloader. Please 
> check if you store classloaders directly or indirectly in static fields. If 
> the stacktrace suggests that the leak occurs in a third party library and 
> cannot be fixed immediately, you can disable this check with the 
> configuration 'classloader.check-leaked-classloader'.
> 2022-09-27T02:49:20.9510835Z Sep 27 02:49:20  at 
> org.apache.flink.util.FlinkUserCodeClassLoaders$SafetyNetWrapperClassLoader.ensureInner(FlinkUserCodeClassLoaders.java:184)
> 2022-09-27T02:49:20.9511760Z Sep 27 02:49:20  at 
> org.apache.flink.util.FlinkUserCodeClassLoaders$SafetyNetWrapperClassLoader.loadClass(FlinkUserCodeClassLoaders.java:192)
> 2022-09-27T02:49:20.9512456Z Sep 27 02:49:20  at 
> java.lang.Class.forName0(Native Method)
> 2022-09-27T02:49:20.9513014Z Sep 27 02:49:20  at 
> java.lang.Class.forName(Class.java:348)
> 2022-09-27T02:49:20.9513649Z Sep 27 02:49:20  at 
> org.codehaus.janino.ClassLoaderIClassLoader.findIClass(ClassLoaderIClassLoader.java:89)
> 2022-09-27T02:49:20.9514339Z Sep 27 02:49:20  at 
> org.codehaus.janino.IClassLoader.loadIClass(IClassLoader.java:312)
> 2022-09-27T02:49:20.9514990Z Sep 27 02:49:20  at 
> org.codehaus.janino.UnitCompiler.findTypeByName(UnitCompiler.java:8556)
> 2022-09-27T02:49:20.9515659Z Sep 27 02:49:20  at 
> org.codehaus.janino.UnitCompiler.getReferenceType(UnitCompiler.java:6749)
> 2022-09-27T02:49:20.9516337Z Sep 27 02:49:20  at 
> org.codehaus.janino.UnitCompiler.getReferenceType(UnitCompiler.java:6594)
> 2022-09-27T02:49:20.9516989Z Sep 27 02:49:20  at 
> org.codehaus.janino.UnitCompiler.getType2(UnitCompiler.java:6573)
> 2022-09-27T02:49:20.9517632Z Sep 27 02:49:20  at 
> org.codehaus.janino.UnitCompiler.access$13900(UnitCompiler.java:215)
> 2022-09-27T02:49:20.9518319Z Sep 27 02:49:20  at 
> org.codehaus.janino.UnitCompiler$22$1.visitReferenceType(UnitCompiler.java:6481)
> 2022-09-27T02:49:20.9519018Z Sep 27 02:49:20  at 
> org.codehaus.janino.UnitCompiler$22$1.visitReferenceType(UnitCompiler.java:6476)
> 2022-09-27T02:49:20.9519680Z Sep 27 02:49:20  at 
> org.codehaus.janino.Java$ReferenceType.accept(Java.java:3928)
> 2022-09-27T02:49:20.9520386Z Sep 27 02:49:20  at 
> org.codehaus.janino.UnitCompiler$22.visitType(UnitCompiler.java:6476)
> 2022-09-27T02:49:20.9521042Z Sep 27 02:49:20  at 
> org.codehaus.janino.UnitCompiler$22.visitType(UnitCompiler.java:6469)
> 2022-09-27T02:49:20.9521677Z Sep 27 02:49:20  at 
> org.codehaus.janino.Java$ReferenceType.accept(Java.java:3927)
> {code}
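The stack trace itself names a temporary workaround: the leak check can be switched off via configuration. A minimal flink-conf.yaml fragment (note this only suppresses the error reported above; the underlying classloader leak remains):

```yaml
# flink-conf.yaml
# Disables the leaked-classloader check named in the stack trace above.
# Workaround only -- it hides the symptom, it does not fix the leak.
classloader.check-leaked-classloader: false
```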

[jira] [Commented] (FLINK-29495) PulsarSinkE2ECase hang

2022-10-03 Thread Xingbo Huang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-29495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17612306#comment-17612306
 ] 

Xingbo Huang commented on FLINK-29495:
--

https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=41527&view=logs&j=6e8542d7-de38-5a33-4aca-458d6c87066d&t=5846934b-7a4f-545b-e5b0-eb4d8bda32e1

> PulsarSinkE2ECase hang
> --
>
> Key: FLINK-29495
> URL: https://issues.apache.org/jira/browse/FLINK-29495
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Pulsar
>Affects Versions: 1.16.0, 1.17.0, 1.15.2
>Reporter: Xingbo Huang
>Priority: Critical
>  Labels: test-stability
>
> {code:java}
> 2022-10-02T05:53:56.0611489Z "main" #1 prio=5 os_prio=0 cpu=5171.60ms 
> elapsed=9072.82s tid=0x7f9508028000 nid=0x54ef1 waiting on condition  
> [0x7f950f994000]
> 2022-10-02T05:53:56.0612041Zjava.lang.Thread.State: TIMED_WAITING 
> (parking)
> 2022-10-02T05:53:56.0612475Z  at 
> jdk.internal.misc.Unsafe.park(java.base@11.0.16.1/Native Method)
> 2022-10-02T05:53:56.0613302Z  - parking to wait for  <0x87d261f8> (a 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> 2022-10-02T05:53:56.0613959Z  at 
> java.util.concurrent.locks.LockSupport.parkNanos(java.base@11.0.16.1/LockSupport.java:234)
> 2022-10-02T05:53:56.0614661Z  at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(java.base@11.0.16.1/AbstractQueuedSynchronizer.java:2123)
> 2022-10-02T05:53:56.0615428Z  at 
> org.apache.pulsar.common.util.collections.GrowableArrayBlockingQueue.poll(GrowableArrayBlockingQueue.java:203)
> 2022-10-02T05:53:56.0616165Z  at 
> org.apache.pulsar.client.impl.MultiTopicsConsumerImpl.internalReceive(MultiTopicsConsumerImpl.java:370)
> 2022-10-02T05:53:56.0616807Z  at 
> org.apache.pulsar.client.impl.ConsumerBase.receive(ConsumerBase.java:198)
> 2022-10-02T05:53:56.0617486Z  at 
> org.apache.flink.connector.pulsar.testutils.sink.PulsarPartitionDataReader.poll(PulsarPartitionDataReader.java:72)
>  {code}
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=41526&view=logs&j=6e8542d7-de38-5a33-4aca-458d6c87066d&t=5846934b-7a4f-545b-e5b0-eb4d8bda32e1



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-29496) Unable to configure STS endpoint when using ASSUME_ROLE credential provider in Kinesis connector

2022-10-03 Thread Aleksandr Pilipenko (Jira)
Aleksandr Pilipenko created FLINK-29496:
---

 Summary: Unable to configure STS endpoint when using ASSUME_ROLE 
credential provider in Kinesis connector
 Key: FLINK-29496
 URL: https://issues.apache.org/jira/browse/FLINK-29496
 Project: Flink
  Issue Type: Improvement
  Components: Connectors / Kinesis
Reporter: Aleksandr Pilipenko


When using the Kinesis connector with the credentials provider configured as 
ASSUME_ROLE in a job running in a VPC without internet access, the credentials 
provider logic tries to reach the global STS endpoint, {{sts.amazonaws.com}}. 
However, only regional STS endpoints are available in that case.

The connector needs support for configuring the STS endpoint to allow such a use case.
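To make the request concrete, here is a minimal sketch of what such a configuration might look like on the consumer side. The property key names are illustrative assumptions (the real constants live in the connector's {{AWSConfigConstants}}), and the STS-endpoint key is hypothetical — it is exactly the feature being requested here:

```java
import java.util.Properties;

public class AssumeRoleConfigSketch {
    public static Properties build() {
        Properties config = new Properties();
        // Existing-style ASSUME_ROLE settings (key names are illustrative,
        // not verbatim AWSConfigConstants values):
        config.setProperty("aws.region", "eu-west-1");
        config.setProperty("aws.credentials.provider", "ASSUME_ROLE");
        config.setProperty("aws.credentials.role.arn",
                "arn:aws:iam::123456789012:role/example-role");
        config.setProperty("aws.credentials.role.sessionName", "flink-session");
        // Hypothetical new key for the requested feature: point the
        // ASSUME_ROLE provider at a regional STS endpoint instead of
        // the global sts.amazonaws.com.
        config.setProperty("aws.credentials.role.stsEndpoint",
                "https://sts.eu-west-1.amazonaws.com");
        return config;
    }

    public static void main(String[] args) {
        System.out.println(build().getProperty("aws.credentials.role.stsEndpoint"));
    }
}
```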



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-29496) Unable to configure STS endpoint when using ASSUME_ROLE credential provider in Kinesis connector

2022-10-03 Thread Danny Cranmer (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-29496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Cranmer updated FLINK-29496:
--
Fix Version/s: 1.17.0
   1.15.3
   1.16.1

> Unable to configure STS endpoint when using ASSUME_ROLE credential provider 
> in Kinesis connector
> 
>
> Key: FLINK-29496
> URL: https://issues.apache.org/jira/browse/FLINK-29496
> Project: Flink
>  Issue Type: Improvement
>  Components: Connectors / Kinesis
>Reporter: Aleksandr Pilipenko
>Priority: Major
> Fix For: 1.17.0, 1.15.3, 1.16.1
>
>
> When using the Kinesis connector with the credentials provider configured as 
> ASSUME_ROLE in a job running in a VPC without internet access, the 
> credentials provider logic tries to reach the global STS endpoint, 
> {{sts.amazonaws.com}}. However, only regional STS endpoints are 
> available in that case.
> The connector needs support for configuring the STS endpoint to allow such a use case.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-29496) Unable to configure STS endpoint when using ASSUME_ROLE credential provider in Kinesis connector

2022-10-03 Thread Danny Cranmer (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-29496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17612334#comment-17612334
 ] 

Danny Cranmer commented on FLINK-29496:
---

Thanks [~a.pilipenko], please apply this to Amazon Kinesis Data Firehose and 
Streams connectors. Please cover both AWS V1 (consumer) and V2 (sinks and EFO 
consumer). 

> Unable to configure STS endpoint when using ASSUME_ROLE credential provider 
> in Kinesis connector
> 
>
> Key: FLINK-29496
> URL: https://issues.apache.org/jira/browse/FLINK-29496
> Project: Flink
>  Issue Type: Improvement
>  Components: Connectors / Kinesis
>Reporter: Aleksandr Pilipenko
>Priority: Major
>
> When using the Kinesis connector with the credentials provider configured as 
> ASSUME_ROLE in a job running in a VPC without internet access, the 
> credentials provider logic tries to reach the global STS endpoint, 
> {{sts.amazonaws.com}}. However, only regional STS endpoints are 
> available in that case.
> The connector needs support for configuring the STS endpoint to allow such a use case.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (FLINK-29496) Unable to configure STS endpoint when using ASSUME_ROLE credential provider in Kinesis connector

2022-10-03 Thread Danny Cranmer (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-29496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Cranmer reassigned FLINK-29496:
-

Assignee: Aleksandr Pilipenko

> Unable to configure STS endpoint when using ASSUME_ROLE credential provider 
> in Kinesis connector
> 
>
> Key: FLINK-29496
> URL: https://issues.apache.org/jira/browse/FLINK-29496
> Project: Flink
>  Issue Type: Improvement
>  Components: Connectors / Kinesis
>Reporter: Aleksandr Pilipenko
>Assignee: Aleksandr Pilipenko
>Priority: Major
> Fix For: 1.17.0, 1.15.3, 1.16.1
>
>
> When using the Kinesis connector with the credentials provider configured as 
> ASSUME_ROLE in a job running in a VPC without internet access, the 
> credentials provider logic tries to reach the global STS endpoint, 
> {{sts.amazonaws.com}}. However, only regional STS endpoints are 
> available in that case.
> The connector needs support for configuring the STS endpoint to allow such a use case.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-29496) Unable to configure STS endpoint when using ASSUME_ROLE credential provider in Kinesis connector

2022-10-03 Thread Danny Cranmer (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-29496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Cranmer updated FLINK-29496:
--
Affects Version/s: 1.15.2
   1.16.0

> Unable to configure STS endpoint when using ASSUME_ROLE credential provider 
> in Kinesis connector
> 
>
> Key: FLINK-29496
> URL: https://issues.apache.org/jira/browse/FLINK-29496
> Project: Flink
>  Issue Type: Improvement
>  Components: Connectors / Kinesis
>Affects Versions: 1.16.0, 1.15.2
>Reporter: Aleksandr Pilipenko
>Assignee: Aleksandr Pilipenko
>Priority: Major
> Fix For: 1.17.0, 1.15.3, 1.16.1
>
>
> When using the Kinesis connector with the credentials provider configured as 
> ASSUME_ROLE in a job running in a VPC without internet access, the 
> credentials provider logic tries to reach the global STS endpoint, 
> {{sts.amazonaws.com}}. However, only regional STS endpoints are 
> available in that case.
> The connector needs support for configuring the STS endpoint to allow such a use case.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-29497) Provide an option to publish the flink-dist jar file artifact

2022-10-03 Thread Thomas Weise (Jira)
Thomas Weise created FLINK-29497:


 Summary: Provide an option to publish the flink-dist jar file 
artifact
 Key: FLINK-29497
 URL: https://issues.apache.org/jira/browse/FLINK-29497
 Project: Flink
  Issue Type: Improvement
  Components: Build System
Affects Versions: 1.16.0
Reporter: Thomas Weise
Assignee: Thomas Weise


Currently, deployment is skipped for the flink-dist jar artifact. Instead of 
hardcoding that in pom.xml, use a property so this behavior can be controlled 
from the Maven command line.
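A sketch of what this could look like in the flink-dist pom.xml — the property name here is a placeholder, not necessarily the one the eventual change will use:

```xml
<!-- flink-dist/pom.xml (sketch; property name is hypothetical) -->
<properties>
  <!-- defaults to today's behavior: the dist jar is not deployed -->
  <flink.dist.skip-deploy>true</flink.dist.skip-deploy>
</properties>

<build>
  <plugins>
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-deploy-plugin</artifactId>
      <configuration>
        <skip>${flink.dist.skip-deploy}</skip>
      </configuration>
    </plugin>
  </plugins>
</build>
```

Publishing would then be opt-in from the command line, e.g. `mvn deploy -Dflink.dist.skip-deploy=false`.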

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-29498) Flink Async I/O Retry Strategies Do Not Work for Scala AsyncDataStream API

2022-10-03 Thread Eric Xiao (Jira)
Eric Xiao created FLINK-29498:
-

 Summary: Flink Async I/O Retry Strategies Do Not Work for Scala 
AsyncDataStream API
 Key: FLINK-29498
 URL: https://issues.apache.org/jira/browse/FLINK-29498
 Project: Flink
  Issue Type: Bug
  Components: API / Scala
Affects Versions: 1.15.2
Reporter: Eric Xiao


When I try calling the function `AsyncDataStream.unorderedWaitWithRetry` from 
the Scala API with a retry strategy from the Java API, I get an error because 
`unorderedWaitWithRetry` expects a Scala retry strategy. The problem is that 
retry strategies were implemented only in Java, not Scala, in this PR: 
http://github.com/apache/flink/pull/19983.
{code:java}
import org.apache.flink.streaming.api.scala.AsyncDataStream
import org.apache.flink.streaming.util.retryable.{AsyncRetryStrategies => 
JAsyncRetryStrategies}

val javaAsyncRetryStrategy = new 
JAsyncRetryStrategies.FixedDelayRetryStrategyBuilder[Int](3, 100L)
.build()

val data = AsyncDataStream.unorderedWaitWithRetry(
  source,
  asyncOperator,
  pipelineTimeoutInMs,
  TimeUnit.MILLISECONDS,
  javaAsyncRetryStrategy
){code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-29498) Flink Async I/O Retry Strategies Do Not Work for Scala AsyncDataStream API

2022-10-03 Thread Eric Xiao (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-29498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Xiao updated FLINK-29498:
--
Description: 
When I try calling the function {{AsyncDataStream.unorderedWaitWithRetry}} 
from the Scala API with a retry strategy from the Java API, I get an error because 
{{unorderedWaitWithRetry}} expects a Scala retry strategy. The problem is that 
retry strategies were implemented only in Java, not Scala, in this PR: 
[http://github.com/apache/flink/pull/19983].
{code:java}
import org.apache.flink.streaming.api.scala.AsyncDataStream
import org.apache.flink.streaming.util.retryable.{AsyncRetryStrategies => 
JAsyncRetryStrategies}

val javaAsyncRetryStrategy = new 
JAsyncRetryStrategies.FixedDelayRetryStrategyBuilder[Int](3, 100L)
.build()

val data = AsyncDataStream.unorderedWaitWithRetry(
  source,
  asyncOperator,
  pipelineTimeoutInMs,
  TimeUnit.MILLISECONDS,
  javaAsyncRetryStrategy
){code}

  was:
When I try calling the function `AsyncDataStream.unorderedWaitWithRetry` from 
the scala API I with a retry strategy from the java API I get an error as 
`unorderedWaitWithRetry` expects a scala retry strategy. The problem is that 
retry strategies were only implemented in java and not Scala in this PR: 
http://github.com/apache/flink/pull/19983.
{code:java}
import org.apache.flink.streaming.api.scala.AsyncDataStream
import org.apache.flink.streaming.util.retryable.{AsyncRetryStrategies => 
JAsyncRetryStrategies}

val javaAsyncRetryStrategy = new 
JAsyncRetryStrategies.FixedDelayRetryStrategyBuilder[Int](3, 100L)
.build()

val data = AsyncDataStream.unorderedWaitWithRetry(
  source,
  asyncOperator,
  pipelineTimeoutInMs,
  TimeUnit.MILLISECONDS,
  javaAsyncRetryStrategy
){code}


> Flink Async I/O Retry Strategies Do Not Work for Scala AsyncDataStream API
> --
>
> Key: FLINK-29498
> URL: https://issues.apache.org/jira/browse/FLINK-29498
> Project: Flink
>  Issue Type: Bug
>  Components: API / Scala
>Affects Versions: 1.15.2
>Reporter: Eric Xiao
>Priority: Minor
>
> When I try calling the function {{AsyncDataStream.unorderedWaitWithRetry}} 
> from the Scala API with a retry strategy from the Java API, I get an error 
> because {{unorderedWaitWithRetry}} expects a Scala retry strategy. The problem is 
> that retry strategies were implemented only in Java, not Scala, in this PR: 
> [http://github.com/apache/flink/pull/19983].
> {code:java}
> import org.apache.flink.streaming.api.scala.AsyncDataStream
> import org.apache.flink.streaming.util.retryable.{AsyncRetryStrategies => 
> JAsyncRetryStrategies}
> val javaAsyncRetryStrategy = new 
> JAsyncRetryStrategies.FixedDelayRetryStrategyBuilder[Int](3, 100L)
> .build()
> val data = AsyncDataStream.unorderedWaitWithRetry(
>   source,
>   asyncOperator,
>   pipelineTimeoutInMs,
>   TimeUnit.MILLISECONDS,
>   javaAsyncRetryStrategy
> ){code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-29498) Flink Async I/O Retry Strategies Do Not Work for Scala AsyncDataStream API

2022-10-03 Thread Eric Xiao (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-29498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Xiao updated FLINK-29498:
--
Description: 
We are using the async I/O to make HTTP calls and one of the features we wanted 
to leverage was the retries, so we pulled the newest commit: 
[http://github.com/apache/flink/pull/19983] into our internal Flink fork.

When I try calling the function {{AsyncDataStream.unorderedWaitWithRetry}} from 
the Scala API with a retry strategy from the Java API, I get an error because 
{{unorderedWaitWithRetry}} expects a Scala retry strategy. The problem is that 
retry strategies were implemented only in Java, not Scala, in this PR: 
[http://github.com/apache/flink/pull/19983].

 

Here is some of the code to reproduce the error:
{code:java}
import org.apache.flink.streaming.api.scala.AsyncDataStream
import org.apache.flink.streaming.util.retryable.{AsyncRetryStrategies => 
JAsyncRetryStrategies}

val javaAsyncRetryStrategy = new 
JAsyncRetryStrategies.FixedDelayRetryStrategyBuilder[Int](3, 100L)
.build()

val data = AsyncDataStream.unorderedWaitWithRetry(
  source,
  asyncOperator,
  pipelineTimeoutInMs,
  TimeUnit.MILLISECONDS,
  javaAsyncRetryStrategy
){code}

  was:
When I try calling the function \{{AsyncDataStream.unorderedWaitWithRetry}} 
from the scala API I with a retry strategy from the java API I get an error as 
\{{unorderedWaitWithRetry}} expects a scala retry strategy. The problem is that 
retry strategies were only implemented in java and not Scala in this PR: 
[http://github.com/apache/flink/pull/19983].
{code:java}
import org.apache.flink.streaming.api.scala.AsyncDataStream
import org.apache.flink.streaming.util.retryable.{AsyncRetryStrategies => 
JAsyncRetryStrategies}

val javaAsyncRetryStrategy = new 
JAsyncRetryStrategies.FixedDelayRetryStrategyBuilder[Int](3, 100L)
.build()

val data = AsyncDataStream.unorderedWaitWithRetry(
  source,
  asyncOperator,
  pipelineTimeoutInMs,
  TimeUnit.MILLISECONDS,
  javaAsyncRetryStrategy
){code}


> Flink Async I/O Retry Strategies Do Not Work for Scala AsyncDataStream API
> --
>
> Key: FLINK-29498
> URL: https://issues.apache.org/jira/browse/FLINK-29498
> Project: Flink
>  Issue Type: Bug
>  Components: API / Scala
>Affects Versions: 1.15.2
>Reporter: Eric Xiao
>Priority: Minor
>
> We are using the async I/O to make HTTP calls and one of the features we 
> wanted to leverage was the retries, so we pulled the newest commit: 
> [http://github.com/apache/flink/pull/19983] into our internal Flink fork.
> When I try calling the function {{AsyncDataStream.unorderedWaitWithRetry}} 
> from the Scala API with a retry strategy from the Java API, I get an error 
> because {{unorderedWaitWithRetry}} expects a Scala retry strategy. The problem is 
> that retry strategies were implemented only in Java, not Scala, in this PR: 
> [http://github.com/apache/flink/pull/19983].
>  
> Here is some of the code to reproduce the error:
> {code:java}
> import org.apache.flink.streaming.api.scala.AsyncDataStream
> import org.apache.flink.streaming.util.retryable.{AsyncRetryStrategies => 
> JAsyncRetryStrategies}
> val javaAsyncRetryStrategy = new 
> JAsyncRetryStrategies.FixedDelayRetryStrategyBuilder[Int](3, 100L)
> .build()
> val data = AsyncDataStream.unorderedWaitWithRetry(
>   source,
>   asyncOperator,
>   pipelineTimeoutInMs,
>   TimeUnit.MILLISECONDS,
>   javaAsyncRetryStrategy
> ){code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-29245) [JUnit 5 Migration] Remove RetryRule

2022-10-03 Thread Nagaraj Tantri (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-29245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17612480#comment-17612480
 ] 

Nagaraj Tantri commented on FLINK-29245:


Hi [~mapohl], I would like to work on this. Can I raise a PR?

> [JUnit 5 Migration] Remove RetryRule
> 
>
> Key: FLINK-29245
> URL: https://issues.apache.org/jira/browse/FLINK-29245
> Project: Flink
>  Issue Type: Sub-task
>  Components: Documentation, Tests
>Affects Versions: 1.17.0
>Reporter: Matthias Pohl
>Priority: Major
>  Labels: starter
>
> With the transition to JUnit5, using 
> [RetryExtension|https://github.com/apache/flink/blob/78b231f60aed59061f0f609e0cfd659d78e6fdd5/flink-test-utils-parent/flink-test-utils-junit/src/main/java/org/apache/flink/testutils/junit/extensions/retry/RetryExtension.java#L43]
>  is favored anyway. {{RetryExtension}} also utilizes the annotations 
> {{@RetryOnException}} and {{@RetryOnFailure}}, which still refer to 
> {{RetryRule}} in their JavaDocs.
> This issue is about cleaning up {{RetryRule}} and the related 
> JavaDocs.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-29498) Flink Async I/O Retry Strategies Do Not Work for Scala AsyncDataStream API

2022-10-03 Thread lincoln lee (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-29498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17612486#comment-17612486
 ] 

lincoln lee commented on FLINK-29498:
-

[~eric.xiao] There's a separate 
`org.apache.flink.streaming.api.scala.async.AsyncRetryStrategy` for Scala API 
usage; the Java utilities cannot be used directly in Scala (by design). For 
your case, you can implement your own `AsyncRetryStrategy` to enable 
retries.
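For readers hitting the same wall, the contract such a custom strategy has to satisfy can be sketched in a few lines. The class below is a self-contained illustration with no Flink dependency; the method names mirror the general shape of Flink's retry-strategy interface (canRetry / backoff) but are illustrative, not the exact Scala trait:

```java
// Self-contained sketch of a fixed-delay retry policy. Method names mirror
// the shape of Flink's AsyncRetryStrategy contract; treat them as illustrative.
public class FixedDelayRetrySketch {
    private final int maxAttempts;
    private final long delayMillis;

    public FixedDelayRetrySketch(int maxAttempts, long delayMillis) {
        this.maxAttempts = maxAttempts;
        this.delayMillis = delayMillis;
    }

    /** Attempts are 1-based: attempt 1 is the initial call. */
    public boolean canRetry(int currentAttempts) {
        return currentAttempts <= maxAttempts;
    }

    /** Fixed backoff regardless of the attempt number. */
    public long getBackoffTimeMillis(int currentAttempts) {
        return delayMillis;
    }

    public static void main(String[] args) {
        FixedDelayRetrySketch retry = new FixedDelayRetrySketch(3, 100L);
        System.out.println(retry.canRetry(3)); // true
        System.out.println(retry.canRetry(4)); // false
    }
}
```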

> Flink Async I/O Retry Strategies Do Not Work for Scala AsyncDataStream API
> --
>
> Key: FLINK-29498
> URL: https://issues.apache.org/jira/browse/FLINK-29498
> Project: Flink
>  Issue Type: Bug
>  Components: API / Scala
>Affects Versions: 1.15.2
>Reporter: Eric Xiao
>Priority: Minor
>
> We are using the async I/O to make HTTP calls and one of the features we 
> wanted to leverage was the retries, so we pulled the newest commit: 
> [http://github.com/apache/flink/pull/19983] into our internal Flink fork.
> When I try calling the function {{AsyncDataStream.unorderedWaitWithRetry}} 
> from the Scala API with a retry strategy from the Java API, I get an error 
> because {{unorderedWaitWithRetry}} expects a Scala retry strategy. The problem is 
> that retry strategies were implemented only in Java, not Scala, in this PR: 
> [http://github.com/apache/flink/pull/19983].
>  
> Here is some of the code to reproduce the error:
> {code:java}
> import org.apache.flink.streaming.api.scala.AsyncDataStream
> import org.apache.flink.streaming.util.retryable.{AsyncRetryStrategies => 
> JAsyncRetryStrategies}
> val javaAsyncRetryStrategy = new 
> JAsyncRetryStrategies.FixedDelayRetryStrategyBuilder[Int](3, 100L)
> .build()
> val data = AsyncDataStream.unorderedWaitWithRetry(
>   source,
>   asyncOperator,
>   pipelineTimeoutInMs,
>   TimeUnit.MILLISECONDS,
>   javaAsyncRetryStrategy
> ){code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (FLINK-26469) Adaptive job shows error in WebUI when not enough resource are available

2022-10-03 Thread Dawid Wysakowicz (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-26469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dawid Wysakowicz reassigned FLINK-26469:


Assignee: Dawid Wysakowicz

> Adaptive job shows error in WebUI when not enough resource are available
> 
>
> Key: FLINK-26469
> URL: https://issues.apache.org/jira/browse/FLINK-26469
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Coordination
>Affects Versions: 1.15.0
>Reporter: Niklas Semmler
>Assignee: Dawid Wysakowicz
>Priority: Major
>
> When there are no resources and the job is in the CREATED state, the job page 
> shows the error: "Job failed during initialization of JobManager". 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)