[jira] [Commented] (FLINK-25992) JobDispatcherITCase.testRecoverFromCheckpointAfterLosingAndRegainingLeadership fails on azure

2022-02-07 Thread Roman Khachatryan (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-25992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17488648#comment-17488648
 ] 

Roman Khachatryan commented on FLINK-25992:
---

Sure [~Jiangang], I've attached the log file (you can also download it from the 
main build page following the artifacts link).

> JobDispatcherITCase.testRecoverFromCheckpointAfterLosingAndRegainingLeadership
>  fails on azure
> -
>
> Key: FLINK-25992
> URL: https://issues.apache.org/jira/browse/FLINK-25992
> Project: Flink
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 1.15.0
>Reporter: Roman Khachatryan
>Priority: Major
>  Labels: test-stability
> Fix For: 1.15.0
>
> Attachments: mvn-2.log
>
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=30871&view=logs&j=0da23115-68bb-5dcd-192c-bd4c8adebde1&t=24c3384f-1bcb-57b3-224f-51bf973bbee8&l=9154
> {code}
> 19:41:35,515 [flink-akka.actor.default-dispatcher-9] WARN  
> org.apache.flink.runtime.taskmanager.Task[] - jobVertex 
> (1/1)#0 (7efdea21f5f95490e02117063ce8a314) switched from RUNNING to FAILED 
> with failure cause: java.lang.RuntimeException: Error while notify checkpoint 
> ABORT.
>   at 
> org.apache.flink.runtime.taskmanager.Task.notifyCheckpoint(Task.java:1457)
>   at 
> org.apache.flink.runtime.taskmanager.Task.notifyCheckpointAborted(Task.java:1407)
>   at 
> org.apache.flink.runtime.taskexecutor.TaskExecutor.abortCheckpoint(TaskExecutor.java:1021)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.lambda$handleRpcInvocation$1(AkkaRpcActor.java:316)
>   at 
> org.apache.flink.runtime.concurrent.akka.ClassLoadingUtils.runWithContextClassLoader(ClassLoadingUtils.java:83)
>   at 
> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcInvocation(AkkaRpcActor.java:314)
>   at 
> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:217)
>   at 
> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:163)
>   at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:24)
>   at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:20)
>   at scala.PartialFunction.applyOrElse(PartialFunction.scala:123)
>   at scala.PartialFunction.applyOrElse$(PartialFunction.scala:122)
>   at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:20)
>   at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171)
>   at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:172)
>   at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:172)
>   at akka.actor.Actor.aroundReceive(Actor.scala:537)
>   at akka.actor.Actor.aroundReceive$(Actor.scala:535)
>   at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:220)
>   at akka.actor.ActorCell.receiveMessage(ActorCell.scala:580)
>   at akka.actor.ActorCell.invoke(ActorCell.scala:548)
>   at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:270)
>   at akka.dispatch.Mailbox.run(Mailbox.scala:231)
>   at akka.dispatch.Mailbox.exec(Mailbox.scala:243)
>   at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
>   at 
> java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
>   at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
>   at 
> java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:175)
> Caused by: java.lang.UnsupportedOperationException: 
> notifyCheckpointAbortAsync not supported by 
> org.apache.flink.runtime.dispatcher.JobDispatcherITCase$AtLeastOneCheckpointInvokable
>   at 
> org.apache.flink.runtime.jobgraph.tasks.AbstractInvokable.notifyCheckpointAbortAsync(AbstractInvokable.java:205)
>   at 
> org.apache.flink.runtime.taskmanager.Task.notifyCheckpoint(Task.java:1430)
>   ... 31 more
> {code}
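The `Caused by` above comes from the default checkpoint-notification hook throwing for invokables that do not opt in. A minimal, self-contained sketch of that failure mode and the obvious remedy (overriding the hook) follows; the class names here are hypothetical stand-ins, the real types being Flink's `AbstractInvokable` and the test's `AtLeastOneCheckpointInvokable`:

```java
import java.util.concurrent.CompletableFuture;

// Hypothetical class names; the real types are Flink's AbstractInvokable and
// the test's AtLeastOneCheckpointInvokable.
class BaseInvokable {
    // Mirrors the default behavior: invokables that do not opt in to
    // checkpoint notifications throw on an abort notification.
    public CompletableFuture<Void> notifyCheckpointAbortAsync(long checkpointId) {
        throw new UnsupportedOperationException(
                "notifyCheckpointAbortAsync not supported by " + getClass().getName());
    }
}

class AbortAwareInvokable extends BaseInvokable {
    // Overriding the hook lets the task tolerate aborted checkpoints
    // instead of switching to FAILED.
    @Override
    public CompletableFuture<Void> notifyCheckpointAbortAsync(long checkpointId) {
        return CompletableFuture.completedFuture(null);
    }
}

public class NotifyAbortDemo {
    public static void main(String[] args) {
        boolean threw = false;
        try {
            new BaseInvokable().notifyCheckpointAbortAsync(1L);
        } catch (UnsupportedOperationException e) {
            threw = true; // same exception type as in the log above
        }
        System.out.println("base throws: " + threw);           // prints: base throws: true
        System.out.println("override completes: "
                + new AbortAwareInvokable().notifyCheckpointAbortAsync(1L).isDone());
    }
}
```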



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Comment Edited] (FLINK-25992) JobDispatcherITCase.testRecoverFromCheckpointAfterLosingAndRegainingLeadership fails on azure

2022-02-07 Thread Roman Khachatryan (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-25992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17488648#comment-17488648
 ] 

Roman Khachatryan edited comment on FLINK-25992 at 2/8/22, 7:56 AM:


Sure [~Jiangang], I've attached the log file (you can also download it from the 
build main page following the artifacts link).


was (Author: roman_khachatryan):
Sure [~Jiangang], I've attached the log file (you can also download it from the 
main build page following the artifacts link).






[jira] [Updated] (FLINK-25992) JobDispatcherITCase.testRecoverFromCheckpointAfterLosingAndRegainingLeadership fails on azure

2022-02-07 Thread Roman Khachatryan (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-25992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Roman Khachatryan updated FLINK-25992:
--
Attachment: mvn-2.log






[jira] [Updated] (FLINK-23240) ResumeCheckpointManuallyITCase.testExternalizedFSCheckpointsWithLocalRecoveryZookeeper fails on azure

2022-02-07 Thread Till Rohrmann (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-23240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Rohrmann updated FLINK-23240:
--
Component/s: Runtime / Coordination

> ResumeCheckpointManuallyITCase.testExternalizedFSCheckpointsWithLocalRecoveryZookeeper
>  fails on azure
> -
>
> Key: FLINK-23240
> URL: https://issues.apache.org/jira/browse/FLINK-23240
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Checkpointing, Runtime / Coordination
>Affects Versions: 1.14.0
>Reporter: Xintong Song
>Priority: Critical
>  Labels: test-stability
> Fix For: 1.15.0
>
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=19872&view=logs&j=b0a398c0-685b-599c-eb57-c8c2a771138e&t=d13f554f-d4b9-50f8-30ee-d49c6fb0b3cc&l=10186
> {code}
> Jul 04 22:17:29 [ERROR] Tests run: 12, Failures: 0, Errors: 1, Skipped: 0, 
> Time elapsed: 91.407 s <<< FAILURE! - in 
> org.apache.flink.test.checkpointing.ResumeCheckpointManuallyITCase
> Jul 04 22:17:29 [ERROR] 
> testExternalizedFSCheckpointsWithLocalRecoveryZookeeper(org.apache.flink.test.checkpointing.ResumeCheckpointManuallyITCase)
>   Time elapsed: 31.356 s  <<< ERROR!
> Jul 04 22:17:29 java.util.concurrent.ExecutionException: 
> java.util.concurrent.TimeoutException: Invocation of public abstract 
> java.util.concurrent.CompletableFuture 
> org.apache.flink.runtime.webmonitor.RestfulGateway.cancelJob(org.apache.flink.api.common.JobID,org.apache.flink.api.common.time.Time)
>  timed out.
> Jul 04 22:17:29   at 
> java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:395)
> Jul 04 22:17:29   at 
> java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1999)
> Jul 04 22:17:29   at 
> org.apache.flink.test.checkpointing.ResumeCheckpointManuallyITCase.runJobAndGetExternalizedCheckpoint(ResumeCheckpointManuallyITCase.java:303)
> Jul 04 22:17:29   at 
> org.apache.flink.test.checkpointing.ResumeCheckpointManuallyITCase.testExternalizedCheckpoints(ResumeCheckpointManuallyITCase.java:275)
> Jul 04 22:17:29   at 
> org.apache.flink.test.checkpointing.ResumeCheckpointManuallyITCase.testExternalizedFSCheckpointsWithLocalRecoveryZookeeper(ResumeCheckpointManuallyITCase.java:215)
> Jul 04 22:17:29   at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> Jul 04 22:17:29   at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> Jul 04 22:17:29   at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> Jul 04 22:17:29   at 
> java.base/java.lang.reflect.Method.invoke(Method.java:566)
> Jul 04 22:17:29   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
> Jul 04 22:17:29   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> Jul 04 22:17:29   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
> Jul 04 22:17:29   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> Jul 04 22:17:29   at 
> org.apache.flink.util.TestNameProvider$1.evaluate(TestNameProvider.java:45)
> Jul 04 22:17:29   at 
> org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:61)
> Jul 04 22:17:29   at 
> org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
> Jul 04 22:17:29   at 
> org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
> Jul 04 22:17:29   at 
> org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
> Jul 04 22:17:29   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
> Jul 04 22:17:29   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
> Jul 04 22:17:29   at 
> org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
> Jul 04 22:17:29   at 
> org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
> Jul 04 22:17:29   at 
> org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
> Jul 04 22:17:29   at 
> org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
> Jul 04 22:17:29   at 
> org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
> Jul 04 22:17:29   at 
> org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:54)
> Jul 04 22:17:29   at org.junit.rules.RunRules.evaluate(RunRules.java:20)
> Jul 04 22:17:29   at 
> org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
> Jul 04 22:17:29   at 
> org.junit.runners.ParentRunner.run(ParentRunner.java:413)
> Jul 04 22:17:29  

[jira] [Updated] (FLINK-25992) JobDispatcherITCase.testRecoverFromCheckpointAfterLosingAndRegainingLeadership fails on azure

2022-02-07 Thread Till Rohrmann (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-25992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Rohrmann updated FLINK-25992:
--
Summary: 
JobDispatcherITCase.testRecoverFromCheckpointAfterLosingAndRegainingLeadership 
fails on azure  (was: 
JobDispatcherITCase..testRecoverFromCheckpointAfterLosingAndRegainingLeadership 
fails on azure)






[GitHub] [flink] wangyang0918 commented on a change in pull request #18531: [FLINK-24897] Enable application mode on YARN to use usrlib

2022-02-07 Thread GitBox


wangyang0918 commented on a change in pull request #18531:
URL: https://github.com/apache/flink/pull/18531#discussion_r801344011



##
File path: 
flink-yarn/src/main/java/org/apache/flink/yarn/YarnApplicationFileUploader.java
##
@@ -335,7 +335,7 @@ public YarnLocalResourceDescriptor uploadFlinkDist(final Path localJarPath)
  *
  * @return list of class paths with the file name
  */
-List registerProvidedLocalResources() {
+List registerProvidedLocalResources(boolean shouldUsrLibExistInRemote) {

Review comment:
   I think that no matter how `INCLUDE_USER_JAR` is configured, we should never 
include `usrlib` in the user shipped files. The same holds for the provided lib.
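The point above — always exclude `usrlib` from the shipped files, regardless of configuration — can be sketched as a small filter. This is a hypothetical helper with made-up names; the real logic lives in Flink's YARN uploader/descriptor classes:

```java
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;

public class ShipFilesFilter {
    static final String USRLIB = "usrlib";

    // Drop any usrlib directory from the user's ship files, no matter how
    // user-jar inclusion is configured (hypothetical helper, not Flink API).
    static List<Path> excludeUsrLib(List<Path> shipFiles) {
        return shipFiles.stream()
                .filter(p -> !USRLIB.equals(p.getFileName().toString()))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Path> ship = List.of(Path.of("lib"), Path.of("opt/usrlib"), Path.of("plugins"));
        System.out.println(excludeUsrLib(ship)); // prints [lib, plugins]
    }
}
```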




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #18637: [FLINK-25433][runtime] Adds retry mechanism to DefaultResourceCleaner

2022-02-07 Thread GitBox


flinkbot edited a comment on pull request #18637:
URL: https://github.com/apache/flink/pull/18637#issuecomment-1030235273


   
   ## CI report:
   
   * c57bc053bd614ab5fc2f4f375db37e2caf74c4ee Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=30875)
 
   * 1e7d10905f467a6346959bfadcdeda5c6b7e570b Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=30887)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run azure` re-run the last Azure build
   






[GitHub] [flink] XComp commented on a change in pull request #18637: [FLINK-25433][runtime] Adds retry mechanism to DefaultResourceCleaner

2022-02-07 Thread GitBox


XComp commented on a change in pull request #18637:
URL: https://github.com/apache/flink/pull/18637#discussion_r801341748



##
File path: 
flink-runtime/src/test/java/org/apache/flink/runtime/dispatcher/cleanup/DefaultResourceCleanerTest.java
##
@@ -19,50 +19,52 @@
 package org.apache.flink.runtime.dispatcher.cleanup;
 
 import org.apache.flink.api.common.JobID;
+import org.apache.flink.core.testutils.FlinkAssertions;
 import 
org.apache.flink.runtime.concurrent.ComponentMainThreadExecutorServiceAdapter;
 import org.apache.flink.util.Preconditions;
 import org.apache.flink.util.concurrent.Executors;
+import org.apache.flink.util.concurrent.FixedRetryStrategy;
+import org.apache.flink.util.concurrent.FutureUtils;
+import org.apache.flink.util.concurrent.RetryStrategy;
 
-import org.junit.jupiter.api.BeforeEach;
 import org.junit.jupiter.api.Test;
 
 import java.time.Duration;
+import java.util.ArrayList;
+import java.util.Collection;
 import java.util.concurrent.CompletableFuture;
 import java.util.concurrent.ExecutionException;
 import java.util.concurrent.Executor;
+import java.util.concurrent.atomic.AtomicInteger;
+import java.util.function.BiFunction;
 import java.util.function.Consumer;
 
+import static org.apache.flink.core.testutils.FlinkAssertions.STREAM_THROWABLE;
 import static org.assertj.core.api.Assertions.assertThat;
 
 /** {@code DefaultResourceCleanerTest} tests {@link DefaultResourceCleaner}. */
 public class DefaultResourceCleanerTest {
 
+private static final Duration TIMEOUT = Duration.ofMillis(100);

Review comment:
   Looks like I was wrong ([build 
failure](https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=30875&view=logs&j=0da23115-68bb-5dcd-192c-bd4c8adebde1&t=24c3384f-1bcb-57b3-224f-51bf973bbee8&l=9374)): 
we need the timeout. I added a 1h timeout.
   
   I'm not sure yet why, though. My understanding is that, because the test uses 
`Executors.directExecutors()`, execution should be handled in a non-concurrent 
fashion. That makes me believe the `CompletableFuture` should have finished by 
the time we reach the asserts 🤔 
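The expectation described here can be checked in isolation: with a direct executor, `CompletableFuture` stages run on the calling thread, so they are complete before any assert runs. A minimal sketch using only the JDK (no Flink types; `DIRECT` is a stand-in for Flink's direct executor):

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;

public class DirectExecutorDemo {
    // A direct executor runs each task on the calling thread, so async
    // stages finish before control returns to the caller.
    static final Executor DIRECT = Runnable::run;

    public static void main(String[] args) {
        StringBuilder order = new StringBuilder();
        CompletableFuture<Void> f =
                CompletableFuture.runAsync(() -> order.append("stage;"), DIRECT);
        order.append("assert;");
        // The stage ran synchronously, before the "assert" step.
        System.out.println(order + " done=" + f.isDone()); // prints stage;assert; done=true
    }
}
```

If the future still hangs under a direct executor, the stage is presumably being completed from some other code path (e.g. a timer or a different thread), which would explain why the test needed a timeout after all.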








[GitHub] [flink] XComp commented on a change in pull request #18637: [FLINK-25433][runtime] Adds retry mechanism to DefaultResourceCleaner

2022-02-07 Thread GitBox


XComp commented on a change in pull request #18637:
URL: https://github.com/apache/flink/pull/18637#discussion_r801341748



##
File path: 
flink-runtime/src/test/java/org/apache/flink/runtime/dispatcher/cleanup/DefaultResourceCleanerTest.java
##
(diff hunk identical to the one quoted in the previous message)

Review comment:
   Looks like I was wrong ([build 
failure](https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=30875&view=logs&j=0da23115-68bb-5dcd-192c-bd4c8adebde1&t=24c3384f-1bcb-57b3-224f-51bf973bbee8&l=9374)): 
we need the timeout. Although I'm not sure yet why. My understanding is that, 
because the test uses `Executors.directExecutors()`, execution should be handled 
in a non-concurrent fashion. That makes me believe the `CompletableFuture` 
should have finished by the time we reach the asserts 🤔 








[jira] [Updated] (FLINK-25664) Notify will be not triggered for PipelinedSubpartition if more than one buffer is added during isBlocked == true

2022-02-07 Thread Cai Liuyang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-25664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cai Liuyang updated FLINK-25664:

Affects Version/s: 1.14.3

> Notify will be not triggered for PipelinedSubpartition if more than one 
> buffer is added during isBlocked == true
> 
>
> Key: FLINK-25664
> URL: https://issues.apache.org/jira/browse/FLINK-25664
> Project: Flink
>  Issue Type: Bug
>Affects Versions: 1.14.3
>Reporter: Cai Liuyang
>Priority: Major
>  Labels: pull-request-available
>
> Currently, the following case can occur:
>  # The PipelinedSubPartition only has one aligned-checkpoint-barrier buffer (isBlocked 
> == false)
>  # CreditBasedSequenceNumberingViewReader polls this buffer and the 
> PipelinedSubPartition becomes blocked (isBlocked == true)
>  # Before the downstream calls resumeConsumption, we add two finished buffers to this 
> PipelinedSubPartition (there is no limit on adding buffers to a 
> blocked PipelinedSubPartition)
>  ## adding the first finished buffer does not notifyDataAvailable, because 
> isBlocked == true
>  ## adding the second finished buffer also does not notifyDataAvailable, because 
> isBlocked == true and finishedBuffer > 1
>  # The downstream calls resumeConsumption and the PipelinedSubPartition is unblocked 
> (isBlocked == false)
>  # The OutputFlusher's call into the PipelinedSubPartition does not notifyDataAvailable, 
> because finishedBuffer > 1
> In conclusion, there are three cases in which notifyDataAvailable should be triggered:
>     case 1: only one finished buffer (handled by add)
>     case 2: only one unfinished buffer (handled by flush)
>     case 3: more than one finished buffer, added while the 
> PipelinedSubPartition is blocked (not handled)
> {code:java}
> // test code for this case
> // add this test case to
> // org.apache.flink.runtime.io.network.partition.PipelinedSubpartitionWithReadViewTest
> @Test
> public void testBlockedByCheckpointAndAddTwoDataBufferBeforeResumeConsumption()
>         throws Exception {
>     blockSubpartitionByCheckpoint(1);
>     subpartition.add(createFilledFinishedBufferConsumer(BUFFER_SIZE));
>     subpartition.add(createFilledFinishedBufferConsumer(BUFFER_SIZE));
>     assertEquals(1, availablityListener.getNumNotifications());
>     readView.resumeConsumption();
>     subpartition.flush();
>     assertEquals(2, availablityListener.getNumNotifications());
> } {code}
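The three cases above can be illustrated with a minimal, self-contained model. This is a simplified sketch, not Flink's actual PipelinedSubpartition: class and method names here are invented for illustration, and the "fix" shown is only one possible way to handle case 3 (re-checking availability on resume).

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Simplified model of the notification conditions described above;
// not Flink's real PipelinedSubpartition.
final class SubpartitionModel {
    private final Deque<String> finishedBuffers = new ArrayDeque<>();
    private boolean blocked;
    int notifications;

    void block() {
        blocked = true;
    }

    // Case 1: notify only for the first finished buffer, and only if not blocked.
    void addFinishedBuffer(String buffer) {
        finishedBuffers.add(buffer);
        if (!blocked && finishedBuffers.size() == 1) {
            notifications++;
        }
    }

    // Case 2 (simplified): flush notifies only when exactly one buffer is queued.
    void flush() {
        if (!blocked && finishedBuffers.size() == 1) {
            notifications++;
        }
    }

    // Buggy resume: unblocks without re-checking availability,
    // so buffers queued while blocked are never announced.
    void resumeConsumption() {
        blocked = false;
    }

    // Case 3 (sketched fix): notify on resume if buffers piled up while blocked.
    void resumeConsumptionFixed() {
        blocked = false;
        if (!finishedBuffers.isEmpty()) {
            notifications++;
        }
    }
}
```

With `resumeConsumption()`, adding two finished buffers while blocked leaves `notifications` at 0 even after a later `flush()`; with `resumeConsumptionFixed()` the pending data is announced once on resume.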



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (FLINK-25982) Support idleness with watermark alignment

2022-02-07 Thread Piotr Nowojski (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-25982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Piotr Nowojski updated FLINK-25982:
---
Description: In the PoC, idleness is ignored. This means that even if all 
splits are idle, the last emitted watermark will most likely keep being reported 
to the coordinator, which can block all of the remaining source operators.

> Support idleness with watermark alignment
> -
>
> Key: FLINK-25982
> URL: https://issues.apache.org/jira/browse/FLINK-25982
> Project: Flink
>  Issue Type: Sub-task
>Reporter: Piotr Nowojski
>Priority: Major
>
> In the PoC, idleness is ignored. This means that even if all splits are
> idle, the last emitted watermark will most likely keep being reported to the
> coordinator, which can block all of the remaining source operators.
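One way to read the problem: if the coordinator aggregates the minimum of the last-reported per-subtask watermarks, an idle subtask's stale watermark keeps dragging the aggregate down. A minimal sketch of an aggregation that excludes idle subtasks — all names here are hypothetical, not Flink's actual watermark-alignment API:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.OptionalLong;

// Hypothetical aggregator: idle subtasks are excluded from the minimum,
// so their stale watermarks cannot hold back the remaining sources.
final class WatermarkAggregator {
    private final Map<Integer, Long> watermarks = new HashMap<>();
    private final Map<Integer, Boolean> idle = new HashMap<>();

    void report(int subtask, long watermark, boolean isIdle) {
        watermarks.put(subtask, watermark);
        idle.put(subtask, isIdle);
    }

    // Aggregate only over active subtasks; empty when everything is idle.
    OptionalLong alignedMinimum() {
        return watermarks.entrySet().stream()
                .filter(e -> !idle.getOrDefault(e.getKey(), false))
                .mapToLong(Map.Entry::getValue)
                .min();
    }
}
```

Under this sketch, an idle split's last watermark simply drops out of the alignment computation instead of being re-reported forever.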



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[GitHub] [flink] wangyang0918 commented on a change in pull request #18531: [FLINK-24897] Enable application mode on YARN to use usrlib

2022-02-07 Thread GitBox


wangyang0918 commented on a change in pull request #18531:
URL: https://github.com/apache/flink/pull/18531#discussion_r801341361



##
File path: 
flink-yarn/src/main/java/org/apache/flink/yarn/YarnClusterDescriptor.java
##
@@ -910,14 +910,28 @@ private ApplicationReport startAppMaster(
 }
 
 // Upload and register user jars
-final List userClassPaths =
+List userClassPaths =

Review comment:
   Even then, we could still keep the `final`.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] wangyang0918 commented on a change in pull request #18531: [FLINK-24897] Enable application mode on YARN to use usrlib

2022-02-07 Thread GitBox


wangyang0918 commented on a change in pull request #18531:
URL: https://github.com/apache/flink/pull/18531#discussion_r801340286



##
File path: 
flink-yarn/src/main/java/org/apache/flink/yarn/YarnClusterDescriptor.java
##
@@ -1685,6 +1699,35 @@ void addLibFoldersToShipFiles(Collection 
effectiveShipFiles) {
 }
 }
 
+@VisibleForTesting
+void addUsrLibFolderToShipFiles(
+Collection effectiveShipFiles, Collection 
systemShipFiles) {
+// Add usrlib folder to the ship files if it exists
+// Classes in the folder will be loaded by UserClassLoader if 
CLASSPATH_INCLUDE_USER_JAR is
+// DISABLED.
+final Optional usrLibDir = getLocalUsrLibDirectory();
+
+if (usrLibDir.isPresent()) {
+File usrLibDirFile = usrLibDir.get();
+if (usrLibDirFile.isDirectory()) {
+checkArgument(

Review comment:
   Naming aside, I do not think 
`ClusterEntrypointUtils.tryFindUserLibDirectory()` should work only on the 
cluster side. If a user configures a customized `FLINK_LIB_DIR`, then he/she 
should also have `usrlib` in that location. I think that is reasonable.
   
   As for the naming, maybe we could have `UserLibUtils.java`, which contains 
all the utility methods for `usrlib`. 
   
   Please note that `FLINK_LIB_DIR` will be exported in `bin/config.sh` 
automatically if it is not configured.
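The lookup being discussed can be sketched as follows. This is a hypothetical helper (`UsrLibResolver.findUsrLib` is an invented name, not Flink's `tryFindUserLibDirectory`) that illustrates the idea of resolving `usrlib` as a sibling of the configured lib directory:

```java
import java.io.File;
import java.util.Optional;

// Hypothetical helper mirroring the idea discussed above: resolve "usrlib"
// as a sibling of the configured lib directory (e.g. FLINK_LIB_DIR).
final class UsrLibResolver {
    static Optional<File> findUsrLib(File libDir) {
        if (libDir == null || libDir.getParentFile() == null) {
            return Optional.empty();
        }
        File usrLib = new File(libDir.getParentFile(), "usrlib");
        // Only treat it as the user lib directory if it actually exists.
        return usrLib.isDirectory() ? Optional.of(usrLib) : Optional.empty();
    }
}
```

With this shape, a customized `FLINK_LIB_DIR` automatically implies a customized `usrlib` location next to it, which matches the expectation stated above.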




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #18637: [FLINK-25433][runtime] Adds retry mechanism to DefaultResourceCleaner

2022-02-07 Thread GitBox


flinkbot edited a comment on pull request #18637:
URL: https://github.com/apache/flink/pull/18637#issuecomment-1030235273


   
   ## CI report:
   
   * c57bc053bd614ab5fc2f4f375db37e2caf74c4ee Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=30875)
 
   * 1e7d10905f467a6346959bfadcdeda5c6b7e570b UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (FLINK-24947) Support host network for native K8s integration

2022-02-07 Thread Yuan Huang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-24947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17488642#comment-17488642
 ] 

Yuan Huang  commented on FLINK-24947:
-

Yeah, I agree. And this would defeat the reason we want to use the host 
network. Thank you for your answer~

> Support host network for native K8s integration
> ---
>
> Key: FLINK-24947
> URL: https://issues.apache.org/jira/browse/FLINK-24947
> Project: Flink
>  Issue Type: New Feature
>  Components: Deployment / Kubernetes
>Reporter: liuzhuo
>Assignee: liuzhuo
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.15.0
>
>
> When running Flink on K8s, choosing a CNI plug-in is important for 
> performance. Usually we have two kinds of environments: managed and unmanaged.
>   Managed: cloud vendors usually provide very efficient CNI plug-ins, so we 
> don't need to worry about network performance issues.
>   Unmanaged: on self-built K8s clusters, CNI plug-ins are usually optional, 
> such as Flannel and Calico, but such software network interfaces usually lose 
> some performance or require some additional network policies.
> For an unmanaged environment, if we also want to achieve the best network 
> performance, should we support the *HostNetwork* mode?
> Use the host network to achieve the best performance



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[GitHub] [flink] KarmaGYZ commented on a change in pull request #18360: [FLINK-25329][runtime] Support memory execution graph store in session cluster

2022-02-07 Thread GitBox


KarmaGYZ commented on a change in pull request #18360:
URL: https://github.com/apache/flink/pull/18360#discussion_r801322086



##
File path: 
flink-runtime/src/main/java/org/apache/flink/runtime/dispatcher/MemoryExecutionGraphInfoStore.java
##
@@ -20,36 +20,87 @@
 
 import org.apache.flink.api.common.JobID;
 import org.apache.flink.api.common.JobStatus;
+import org.apache.flink.api.common.time.Time;
 import org.apache.flink.runtime.executiongraph.ArchivedExecutionGraph;
 import org.apache.flink.runtime.messages.webmonitor.JobDetails;
 import org.apache.flink.runtime.messages.webmonitor.JobsOverview;
 import org.apache.flink.runtime.scheduler.ExecutionGraphInfo;
+import org.apache.flink.util.ShutdownHookUtil;
+import org.apache.flink.util.concurrent.ScheduledExecutor;
+
+import org.apache.flink.shaded.guava30.com.google.common.base.Ticker;
+import org.apache.flink.shaded.guava30.com.google.common.cache.Cache;
+import org.apache.flink.shaded.guava30.com.google.common.cache.CacheBuilder;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
 
 import javax.annotation.Nullable;
 
 import java.io.IOException;
 import java.util.Collection;
-import java.util.HashMap;
-import java.util.Map;
+import java.util.concurrent.ScheduledFuture;
+import java.util.concurrent.TimeUnit;
 import java.util.stream.Collectors;
 
 /**
  * {@link ExecutionGraphInfoStore} implementation which stores the {@link 
ArchivedExecutionGraph} in
- * memory.
+ * memory. The store supports keeping a bounded number of job graphs and removing 
expired ones.
  */
 public class MemoryExecutionGraphInfoStore implements ExecutionGraphInfoStore {
 
-private final Map 
serializableExecutionGraphInfos = new HashMap<>(4);
+private static final Logger LOG = 
LoggerFactory.getLogger(MemoryExecutionGraphInfoStore.class);
+
+private final Cache 
serializableExecutionGraphInfos;
+
+@Nullable private final ScheduledFuture cleanupFuture;
+
+private final Thread shutdownHook;
+
+public MemoryExecutionGraphInfoStore() {
+this(Time.milliseconds(0), 0, null, null);
+}
+
+public MemoryExecutionGraphInfoStore(
+Time expirationTime,
+int maximumCapacity,
+@Nullable ScheduledExecutor scheduledExecutor,
+@Nullable Ticker ticker) {
+final long expirationMills = expirationTime.toMilliseconds();
+CacheBuilder cacheBuilder = CacheBuilder.newBuilder();
+if (expirationMills > 0) {
+cacheBuilder.expireAfterWrite(expirationMills, 
TimeUnit.MILLISECONDS);
+}
+if (maximumCapacity > 0) {
+cacheBuilder.maximumSize(maximumCapacity);
+}
+if (ticker != null) {
+cacheBuilder.ticker(ticker);
+}
+
+this.serializableExecutionGraphInfos = cacheBuilder.build();
+if (scheduledExecutor != null) {
+this.cleanupFuture =
+scheduledExecutor.scheduleWithFixedDelay(
+serializableExecutionGraphInfos::cleanUp,
+expirationTime.toMilliseconds(),
+expirationTime.toMilliseconds(),
+TimeUnit.MILLISECONDS);
+} else {
+this.cleanupFuture = null;
+}
+this.shutdownHook = ShutdownHookUtil.addShutdownHook(this, 
getClass().getSimpleName(), LOG);
+}
 
 @Override
 public int size() {
-return serializableExecutionGraphInfos.size();
+return (int) serializableExecutionGraphInfos.size();

Review comment:
   ```suggestion
   return Math.toIntExact(serializableExecutionGraphInfos.size());
   ```
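The suggestion matters because a plain `(int)` cast silently truncates when the cache size exceeds `Integer.MAX_VALUE`, while `Math.toIntExact` fails fast. A small sketch of the difference (class name invented for illustration):

```java
// A narrowing (int) cast wraps silently; Math.toIntExact throws
// ArithmeticException when the value does not fit in an int.
final class SizeConversionDemo {
    static int unsafe(long size) {
        return (int) size; // silent truncation for values > Integer.MAX_VALUE
    }

    static int safe(long size) {
        return Math.toIntExact(size); // throws ArithmeticException on overflow
    }
}
```

For a `size()` method, failing fast on overflow is usually preferable to returning a negative or wrapped count.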

##
File path: 
flink-runtime/src/test/java/org/apache/flink/runtime/dispatcher/MemoryExecutionGraphInfoStoreTest.java
##
@@ -0,0 +1,288 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.runtime.dispatcher;
+
+import org.apache.flink.api.common.JobID;
+import org.apache.flink.api.common.JobStatus;
+import org.apache.flink.api.common.time.Time;
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.configuration.JobManagerOptions;
+import org.apache.flink.runtime

[GitHub] [flink] wangyang0918 commented on a change in pull request #18531: [FLINK-24897] Enable application mode on YARN to use usrlib

2022-02-07 Thread GitBox


wangyang0918 commented on a change in pull request #18531:
URL: https://github.com/apache/flink/pull/18531#discussion_r801299045



##
File path: 
flink-yarn/src/main/java/org/apache/flink/yarn/YarnApplicationFileUploader.java
##
@@ -438,6 +446,18 @@ private static boolean isPlugin(Path path) {
 return false;
 }
 
+private static boolean isUsrLib(Path path) {

Review comment:
   I suggest introducing `isUsrLibDirIncludedInProvidedLib` instead, 
and then calling `checkArgument` in the constructor of `YarnApplicationFileUploader`. 
Because the files in the provided lib dirs will be relativized, we just need to 
check the name of each provided lib dir.
   
   ```
   private boolean isUsrLibDirIncludedInProvidedLib(final List 
providedLibDirs)
   throws IOException {
   for (Path path : providedLibDirs) {
   final FileStatus fileStatus = fileSystem.getFileStatus(path);
   if (fileStatus.isDirectory()
   && 
ConfigConstants.DEFAULT_FLINK_USR_LIB_DIR.equals(path.getName())) {
   return true;
   }
   }
   return false;
   }
   ```





-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] bgeng777 commented on a change in pull request #18531: [FLINK-24897] Enable application mode on YARN to use usrlib

2022-02-07 Thread GitBox


bgeng777 commented on a change in pull request #18531:
URL: https://github.com/apache/flink/pull/18531#discussion_r801334847



##
File path: 
flink-yarn/src/main/java/org/apache/flink/yarn/YarnApplicationFileUploader.java
##
@@ -335,7 +335,7 @@ public YarnLocalResourceDescriptor uploadFlinkDist(final 
Path localJarPath)
  *
  * @return list of class paths with the file name
  */
-List registerProvidedLocalResources() {
+List registerProvidedLocalResources(boolean 
shouldUsrLibExistInRemote) {

Review comment:
   Yes, I totally agree with the assumption and that it has the same 
semantics as ship files. 
   But for ship files, we check whether `usrlib` exists among them, and if so 
while `INCLUDE_USER_JAR` is DISABLED, we throw an exception per our previous 
discussion. 
   The reason for adding such code is this case: users have a local 
`usrlib`, and the remote dirs specified by `provided.lib.dirs` also contain 
a remote `usrlib`. I want to rule out this case, as it could lead to 
overriding files in the YARN cache's `usrlib`.
   
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (FLINK-24947) Support host network for native K8s integration

2022-02-07 Thread Yang Wang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-24947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17488640#comment-17488640
 ] 

Yang Wang commented on FLINK-24947:
---

If we are using ClusterIP for the internal service, then we have the chance to 
reuse the TaskManager pods when JobManager fails over since we could use the 
"internal-service(stable cluster ip): port(6123)" to perform the RPC 
communication.

But I think it is unnecessary. For YARN deployment, the TaskManager containers 
also could not be reused.

> Support host network for native K8s integration
> ---
>
> Key: FLINK-24947
> URL: https://issues.apache.org/jira/browse/FLINK-24947
> Project: Flink
>  Issue Type: New Feature
>  Components: Deployment / Kubernetes
>Reporter: liuzhuo
>Assignee: liuzhuo
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.15.0
>
>
> When running Flink on K8s, choosing a CNI plug-in is important for 
> performance. Usually we have two kinds of environments: managed and unmanaged.
>   Managed: cloud vendors usually provide very efficient CNI plug-ins, so we 
> don't need to worry about network performance issues.
>   Unmanaged: on self-built K8s clusters, CNI plug-ins are usually optional, 
> such as Flannel and Calico, but such software network interfaces usually lose 
> some performance or require some additional network policies.
> For an unmanaged environment, if we also want to achieve the best network 
> performance, should we support the *HostNetwork* mode?
> Use the host network to achieve the best performance



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (FLINK-22588) There is a configuration description error in the document

2022-02-07 Thread Yao Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-22588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17488639#comment-17488639
 ] 

Yao Zhang commented on FLINK-22588:
---

Hi [~jark] and [~chesnay] ,

It seems that this ticket has been left untouched for quite a long time. There 
are still many inconsistencies in the docs. I think we need to discuss whether 
'1 h' or '1h' is the canonical form. I would like to fix the docs.

> There is a configuration description error in the document
> --
>
> Key: FLINK-22588
> URL: https://issues.apache.org/jira/browse/FLINK-22588
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / FileSystem, Documentation, Table SQL / 
> Ecosystem
>Affects Versions: 1.13.0
>Reporter: forideal
>Priority: Minor
>  Labels: pull-request-available
>
> origin:
> ‘sink.partition-commit.delay’=‘1h’ 
> correct:
> ‘sink.partition-commit.delay’=‘1 h’ 
> [One space missing here]
> https://ci.apache.org/projects/flink/flink-docs-release-1.13/docs/connectors/table/filesystem/



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[GitHub] [flink] bgeng777 commented on a change in pull request #18531: [FLINK-24897] Enable application mode on YARN to use usrlib

2022-02-07 Thread GitBox


bgeng777 commented on a change in pull request #18531:
URL: https://github.com/apache/flink/pull/18531#discussion_r801326923



##
File path: 
flink-yarn/src/main/java/org/apache/flink/yarn/YarnClusterDescriptor.java
##
@@ -1799,4 +1842,27 @@ public static void logDetachedClusterInformation(
 yarnApplicationId,
 yarnApplicationId);
 }
+
+/**

Review comment:
   See 
https://github.com/apache/flink/pull/18531/files/5eb98df7422ffca67eb4d258189ff1eadb36731e#discussion_r801320099




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] bgeng777 commented on a change in pull request #18531: [FLINK-24897] Enable application mode on YARN to use usrlib

2022-02-07 Thread GitBox


bgeng777 commented on a change in pull request #18531:
URL: https://github.com/apache/flink/pull/18531#discussion_r801326773



##
File path: 
flink-yarn/src/main/java/org/apache/flink/yarn/YarnClusterDescriptor.java
##
@@ -910,14 +910,28 @@ private ApplicationReport startAppMaster(
 }
 
 // Upload and register user jars
-final List userClassPaths =
+List userClassPaths =

Review comment:
   Because we need to check whether usrlib exists later. If usrlib exists, we 
need `userClassPaths.addAll(usrLibClassPaths);` so that we can reuse the current 
code and make the other options of YarnConfigOptions.UserJarInclusion work as expected.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #18548: [FLINK-25727][hive] fix the UDFArgumentException when call hive `json…

2022-02-07 Thread GitBox


flinkbot edited a comment on pull request #18548:
URL: https://github.com/apache/flink/pull/18548#issuecomment-1023865581


   
   ## CI report:
   
   * 1046ea53f5d166b8c64e35087b8d3c5e20f0932a Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=30880)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] bgeng777 commented on a change in pull request #18531: [FLINK-24897] Enable application mode on YARN to use usrlib

2022-02-07 Thread GitBox


bgeng777 commented on a change in pull request #18531:
URL: https://github.com/apache/flink/pull/18531#discussion_r801324145



##
File path: flink-yarn/pom.xml
##
@@ -130,6 +130,11 @@ under the License.



+   

Review comment:
   My bad. Removed.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (FLINK-25998) Flink akka runs into NoClassDefFoundError on shutdown

2022-02-07 Thread Robert Metzger (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-25998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17488632#comment-17488632
 ] 

Robert Metzger commented on FLINK-25998:


[~chesnay] Could this be related to loading akka using a separate classloader?

> Flink akka runs into NoClassDefFoundError on shutdown
> -
>
> Key: FLINK-25998
> URL: https://issues.apache.org/jira/browse/FLINK-25998
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.15.0
>Reporter: Robert Metzger
>Priority: Major
>
> When trying to start a standalone jobmanager on an unavailable port, I see 
> the following unexpected exception:
> {code}
> 2022-02-08 08:07:18,299 INFO  akka.remote.Remoting
>  [] - Starting remoting
> 2022-02-08 08:07:18,357 ERROR akka.remote.transport.netty.NettyTransport  
>  [] - failed to bind to /0.0.0.0:6123, shutting down Netty 
> transport
> 2022-02-08 08:07:18,373 INFO  
> org.apache.flink.runtime.entrypoint.ClusterEntrypoint[] - Shutting 
> StandaloneApplicationClusterEntryPoint down with application status FAILED. 
> Diagnostics java.net.BindException: Could not start actor system on any port 
> in port range 6123
>   at 
> org.apache.flink.runtime.rpc.akka.AkkaBootstrapTools.startRemoteActorSystem(AkkaBootstrapTools.java:133)
>   at 
> org.apache.flink.runtime.rpc.akka.AkkaRpcServiceUtils$AkkaRpcServiceBuilder.createAndStart(AkkaRpcServiceUtils.java:358)
>   at 
> org.apache.flink.runtime.rpc.akka.AkkaRpcServiceUtils$AkkaRpcServiceBuilder.createAndStart(AkkaRpcServiceUtils.java:327)
>   at 
> org.apache.flink.runtime.rpc.akka.AkkaRpcServiceUtils$AkkaRpcServiceBuilder.createAndStart(AkkaRpcServiceUtils.java:247)
>   at 
> org.apache.flink.runtime.rpc.RpcUtils.createRemoteRpcService(RpcUtils.java:191)
>   at 
> org.apache.flink.runtime.entrypoint.ClusterEntrypoint.initializeServices(ClusterEntrypoint.java:334)
>   at 
> org.apache.flink.runtime.entrypoint.ClusterEntrypoint.runCluster(ClusterEntrypoint.java:253)
>   at 
> org.apache.flink.runtime.entrypoint.ClusterEntrypoint.lambda$startCluster$1(ClusterEntrypoint.java:203)
>   at 
> org.apache.flink.runtime.security.contexts.NoOpSecurityContext.runSecured(NoOpSecurityContext.java:28)
>   at 
> org.apache.flink.runtime.entrypoint.ClusterEntrypoint.startCluster(ClusterEntrypoint.java:200)
>   at 
> org.apache.flink.runtime.entrypoint.ClusterEntrypoint.runClusterEntrypoint(ClusterEntrypoint.java:684)
>   at 
> org.apache.flink.container.entrypoint.StandaloneApplicationClusterEntryPoint.main(StandaloneApplicationClusterEntryPoint.java:82)
> .
> 2022-02-08 08:07:18,377 INFO  
> akka.remote.RemoteActorRefProvider$RemotingTerminator[] - Shutting 
> down remote daemon.
> 2022-02-08 08:07:18,377 ERROR org.apache.flink.util.FatalExitExceptionHandler 
>  [] - FATAL: Thread 
> 'flink-akka.remote.default-remote-dispatcher-6' produced an uncaught 
> exception. Stopping the process...
> java.lang.NoClassDefFoundError: 
> akka/actor/dungeon/FaultHandling$$anonfun$handleNonFatalOrInterruptedException$1
>   at 
> akka.actor.dungeon.FaultHandling.handleNonFatalOrInterruptedException(FaultHandling.scala:334)
>  ~[flink-rpc-akka_ce724655-52fe-4b3a-8cdc-b79ab446e34d.jar:1.15-SNAPSHOT]
>   at 
> akka.actor.dungeon.FaultHandling.handleNonFatalOrInterruptedException$(FaultHandling.scala:334)
>  ~[flink-rpc-akka_ce724655-52fe-4b3a-8cdc-b79ab446e34d.jar:1.15-SNAPSHOT]
>   at 
> akka.actor.ActorCell.handleNonFatalOrInterruptedException(ActorCell.scala:411)
>  ~[flink-rpc-akka_ce724655-52fe-4b3a-8cdc-b79ab446e34d.jar:1.15-SNAPSHOT]
>   at akka.actor.ActorCell.invoke(ActorCell.scala:551) 
> ~[flink-rpc-akka_ce724655-52fe-4b3a-8cdc-b79ab446e34d.jar:1.15-SNAPSHOT]
>   at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:270) 
> ~[flink-rpc-akka_ce724655-52fe-4b3a-8cdc-b79ab446e34d.jar:1.15-SNAPSHOT]
>   at akka.dispatch.Mailbox.run(Mailbox.scala:231) 
> ~[flink-rpc-akka_ce724655-52fe-4b3a-8cdc-b79ab446e34d.jar:1.15-SNAPSHOT]
>   at akka.dispatch.Mailbox.exec(Mailbox.scala:243) 
> [flink-rpc-akka_ce724655-52fe-4b3a-8cdc-b79ab446e34d.jar:1.15-SNAPSHOT]
>   at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289) 
> [?:1.8.0_312]
>   at 
> java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056) 
> [?:1.8.0_312]
>   at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692) 
> [?:1.8.0_312]
>   at 
> java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:175) 
> [?:1.8.0_312]
> Caused by: java.lang.ClassNotFoundException: 
> akka.actor.dungeon.FaultHandling$$anonfun$handleNonFatalOrInterruptedExce

[jira] [Created] (FLINK-25998) Flink akka runs into NoClassDefFoundError on shutdown

2022-02-07 Thread Robert Metzger (Jira)
Robert Metzger created FLINK-25998:
--

 Summary: Flink akka runs into NoClassDefFoundError on shutdown
 Key: FLINK-25998
 URL: https://issues.apache.org/jira/browse/FLINK-25998
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Coordination
Affects Versions: 1.15.0
Reporter: Robert Metzger


When trying to start a standalone jobmanager on an unavailable port, I see the 
following unexpected exception:

{code}
2022-02-08 08:07:18,299 INFO  akka.remote.Remoting  
   [] - Starting remoting
2022-02-08 08:07:18,357 ERROR akka.remote.transport.netty.NettyTransport
   [] - failed to bind to /0.0.0.0:6123, shutting down Netty transport
2022-02-08 08:07:18,373 INFO  
org.apache.flink.runtime.entrypoint.ClusterEntrypoint[] - Shutting 
StandaloneApplicationClusterEntryPoint down with application status FAILED. 
Diagnostics java.net.BindException: Could not start actor system on any port in 
port range 6123
at 
org.apache.flink.runtime.rpc.akka.AkkaBootstrapTools.startRemoteActorSystem(AkkaBootstrapTools.java:133)
at 
org.apache.flink.runtime.rpc.akka.AkkaRpcServiceUtils$AkkaRpcServiceBuilder.createAndStart(AkkaRpcServiceUtils.java:358)
at 
org.apache.flink.runtime.rpc.akka.AkkaRpcServiceUtils$AkkaRpcServiceBuilder.createAndStart(AkkaRpcServiceUtils.java:327)
at 
org.apache.flink.runtime.rpc.akka.AkkaRpcServiceUtils$AkkaRpcServiceBuilder.createAndStart(AkkaRpcServiceUtils.java:247)
at 
org.apache.flink.runtime.rpc.RpcUtils.createRemoteRpcService(RpcUtils.java:191)
at 
org.apache.flink.runtime.entrypoint.ClusterEntrypoint.initializeServices(ClusterEntrypoint.java:334)
at 
org.apache.flink.runtime.entrypoint.ClusterEntrypoint.runCluster(ClusterEntrypoint.java:253)
at 
org.apache.flink.runtime.entrypoint.ClusterEntrypoint.lambda$startCluster$1(ClusterEntrypoint.java:203)
at 
org.apache.flink.runtime.security.contexts.NoOpSecurityContext.runSecured(NoOpSecurityContext.java:28)
at 
org.apache.flink.runtime.entrypoint.ClusterEntrypoint.startCluster(ClusterEntrypoint.java:200)
at 
org.apache.flink.runtime.entrypoint.ClusterEntrypoint.runClusterEntrypoint(ClusterEntrypoint.java:684)
at 
org.apache.flink.container.entrypoint.StandaloneApplicationClusterEntryPoint.main(StandaloneApplicationClusterEntryPoint.java:82)
.
2022-02-08 08:07:18,377 INFO  
akka.remote.RemoteActorRefProvider$RemotingTerminator[] - Shutting down 
remote daemon.
2022-02-08 08:07:18,377 ERROR org.apache.flink.util.FatalExitExceptionHandler   
   [] - FATAL: Thread 'flink-akka.remote.default-remote-dispatcher-6' 
produced an uncaught exception. Stopping the process...
java.lang.NoClassDefFoundError: 
akka/actor/dungeon/FaultHandling$$anonfun$handleNonFatalOrInterruptedException$1
at 
akka.actor.dungeon.FaultHandling.handleNonFatalOrInterruptedException(FaultHandling.scala:334)
 ~[flink-rpc-akka_ce724655-52fe-4b3a-8cdc-b79ab446e34d.jar:1.15-SNAPSHOT]
at 
akka.actor.dungeon.FaultHandling.handleNonFatalOrInterruptedException$(FaultHandling.scala:334)
 ~[flink-rpc-akka_ce724655-52fe-4b3a-8cdc-b79ab446e34d.jar:1.15-SNAPSHOT]
at 
akka.actor.ActorCell.handleNonFatalOrInterruptedException(ActorCell.scala:411) 
~[flink-rpc-akka_ce724655-52fe-4b3a-8cdc-b79ab446e34d.jar:1.15-SNAPSHOT]
at akka.actor.ActorCell.invoke(ActorCell.scala:551) 
~[flink-rpc-akka_ce724655-52fe-4b3a-8cdc-b79ab446e34d.jar:1.15-SNAPSHOT]
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:270) 
~[flink-rpc-akka_ce724655-52fe-4b3a-8cdc-b79ab446e34d.jar:1.15-SNAPSHOT]
at akka.dispatch.Mailbox.run(Mailbox.scala:231) 
~[flink-rpc-akka_ce724655-52fe-4b3a-8cdc-b79ab446e34d.jar:1.15-SNAPSHOT]
at akka.dispatch.Mailbox.exec(Mailbox.scala:243) 
[flink-rpc-akka_ce724655-52fe-4b3a-8cdc-b79ab446e34d.jar:1.15-SNAPSHOT]
at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289) 
[?:1.8.0_312]
at 
java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056) 
[?:1.8.0_312]
at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692) 
[?:1.8.0_312]
at 
java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:175) 
[?:1.8.0_312]
Caused by: java.lang.ClassNotFoundException: 
akka.actor.dungeon.FaultHandling$$anonfun$handleNonFatalOrInterruptedException$1
at java.net.URLClassLoader.findClass(URLClassLoader.java:387) 
~[?:1.8.0_312]
at java.lang.ClassLoader.loadClass(ClassLoader.java:419) ~[?:1.8.0_312]
at 
org.apache.flink.core.classloading.ComponentClassLoader.loadClassFromComponentOnly(ComponentClassLoader.java:149)
 ~[flink-dist-1.15-SNAPSHOT.jar:1.15-SNAPSHOT]
at 
org.apache.flink.core.classloading.ComponentClassLoader.loadClass(ComponentClassLoader

[jira] [Commented] (FLINK-24947) Support host network for native K8s integration

2022-02-07 Thread Yuan Huang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-24947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17488631#comment-17488631
 ] 

Yuan Huang  commented on FLINK-24947:
-

Thank you for your reply. Yes, I think so. And I believe that even if Flink uses 
the headless service, the TaskManagers still cannot be reused if they do not 
want to query the service for the "jobmanager-rpc" port, because the port 
number is not fixed.

I can imagine two reasons why the TaskManagers might not want to access the 
service:
 # Involving a Kube client in every TaskManager would put too much pressure on the 
Kubernetes API server.
 # In non-HA mode, the StandaloneHaService saves a fixed JobManager address 
(IP and port) at the beginning. It is not natural to change that address when 
the TaskManagers lose their connection to the JobManager. 

Am I understanding correctly?

> Support host network for native K8s integration
> ---
>
> Key: FLINK-24947
> URL: https://issues.apache.org/jira/browse/FLINK-24947
> Project: Flink
>  Issue Type: New Feature
>  Components: Deployment / Kubernetes
>Reporter: liuzhuo
>Assignee: liuzhuo
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.15.0
>
>
> For the use of flink on k8s, for performance considerations, it is important 
> to choose a CNI plug-in. Usually we have two environments: Managed and 
> UnManaged.
>   Managed: Cloud vendors usually provide very efficient CNI plug-ins, we 
> don’t need to care about network performance issues
>   UnManaged: On self-built K8s clusters, CNI plug-ins are usually optional, 
> similar to Flannel and Calico, but such software network cards usually lose 
> some performance or require some additional network strategies.
> For an unmanaged environment, if we also want to achieve the best network 
> performance, should we support the *HostNetWork* model?
> Use the host network to achieve the best performance



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[GitHub] [flink] flinkbot edited a comment on pull request #18653: [FLINK-25825][connector-jdbc] MySqlCatalogITCase fails on azure

2022-02-07 Thread GitBox


flinkbot edited a comment on pull request #18653:
URL: https://github.com/apache/flink/pull/18653#issuecomment-1032212433


   
   ## CI report:
   
   * b5b8d7da0088b0918da7b33a983ac41efa2e7da5 Azure: 
[CANCELED](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=30881)
 
   * 065c43692477ad54dc906bc6fb0b6afe8082ed0c Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=30882)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] bgeng777 commented on a change in pull request #18531: [FLINK-24897] Enable application mode on YARN to use usrlib

2022-02-07 Thread GitBox


bgeng777 commented on a change in pull request #18531:
URL: https://github.com/apache/flink/pull/18531#discussion_r801320099



##
File path: 
flink-yarn/src/main/java/org/apache/flink/yarn/YarnClusterDescriptor.java
##
@@ -1685,6 +1699,35 @@ void addLibFoldersToShipFiles(Collection<File> effectiveShipFiles) {
 }
 }
 
+@VisibleForTesting
+void addUsrLibFolderToShipFiles(
+Collection<File> effectiveShipFiles, Collection<File> systemShipFiles) {
+// Add usrlib folder to the ship files if it exists
+// Classes in the folder will be loaded by UserClassLoader if 
CLASSPATH_INCLUDE_USER_JAR is
+// DISABLED.
+final Optional<File> usrLibDir = getLocalUsrLibDirectory();
+
+if (usrLibDir.isPresent()) {
+File usrLibDirFile = usrLibDir.get();
+if (usrLibDirFile.isDirectory()) {
+checkArgument(

Review comment:
   Hi Yang, thanks for the feedback. In fact, my first version of the code was 
quite similar to your code snippet. 
   But after thinking it over more carefully, I chose my current implementation 
(including writing my own getLocalUsrLibDirectory()) for the following reasons:
   1. AFAIK, YarnClusterDescriptor is executed on the Flink client side, while 
ClusterEntrypointUtils is executed on the remote cluster side. If I used 
`ClusterEntrypointUtils.tryFindUserLibDirectory()` on the client side, then due to its 
implementation, the user would have to set FLINK_LIB_DIR correctly. But IIUC, we had 
no such requirement before when using YARN. 
   2. In my `getLocalUsrLibDirectory()`, I use FLINK_HOME as a pivot to 
find `usrlib`. This is not perfect, but compared with FLINK_LIB_DIR, 
FLINK_HOME may be more widely used.
   
   In summary, I think the key question is whether finding `usrlib` should follow 
the same logic (using FLINK_LIB_DIR or the current working directory) on 
both the client and cluster sides. Do you have any suggestions? 
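
The "use FLINK_HOME as a pivot" idea above can be sketched as a small, self-contained helper. The class name `UsrLibLocator`, the method name `findUsrLib`, and the lookup logic are illustrative assumptions for this discussion, not the actual `getLocalUsrLibDirectory()` from the PR:

```java
import java.io.File;
import java.util.Optional;

public class UsrLibLocator {
    // Hypothetical sketch: resolve the local "usrlib" folder relative to
    // FLINK_HOME, as discussed above. The real PR code may differ.
    static Optional<File> findUsrLib(String flinkHome) {
        if (flinkHome == null || flinkHome.isEmpty()) {
            return Optional.empty();
        }
        File usrLib = new File(flinkHome, "usrlib");
        // Only a real directory counts; a plain file named "usrlib" is ignored.
        return usrLib.isDirectory() ? Optional.of(usrLib) : Optional.empty();
    }

    public static void main(String[] args) throws Exception {
        // Demo against a temporary directory standing in for FLINK_HOME.
        File home = java.nio.file.Files.createTempDirectory("flink-home").toFile();
        System.out.println(findUsrLib(home.getAbsolutePath()).isPresent()); // false: no usrlib yet
        new File(home, "usrlib").mkdir();
        System.out.println(findUsrLib(home.getAbsolutePath()).isPresent()); // true
    }
}
```

Returning `Optional.empty()` when the directory is absent matches the existing behavior of shipping `usrlib` only if it exists.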








[GitHub] [flink] flinkbot edited a comment on pull request #18363: [Flink-25600][table-planner] Support new statement set syntax in sql client and update docs

2022-02-07 Thread GitBox


flinkbot edited a comment on pull request #18363:
URL: https://github.com/apache/flink/pull/18363#issuecomment-1012985353


   
   ## CI report:
   
   * 674933c78081256538559fbdd1d414ee068ed7f9 Azure: 
[SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=30879)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run azure` re-run the last Azure build
   






[jira] [Closed] (FLINK-25679) Build arm64 Linux images for Apache Flink

2022-02-07 Thread Robert Metzger (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-25679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Metzger closed FLINK-25679.
--
Fix Version/s: 1.15.0
   Resolution: Fixed

> Build arm64 Linux images for Apache Flink
> -
>
> Key: FLINK-25679
> URL: https://issues.apache.org/jira/browse/FLINK-25679
> Project: Flink
>  Issue Type: Improvement
>  Components: flink-docker
>Affects Versions: 1.15.0
>Reporter: Robert Metzger
>Assignee: Robert Metzger
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.15.0
>
>
> Building Flink images for arm64 Linux should be trivial to support, since 
> upstream docker images support arm64, as well as frocksdb.
> Building the images locally is also easily possible using Docker's buildx 
> features, and the build system of the official docker images most likely 
> supports ARM arch.
> This improvement would allow us supporting development / testing on Apple 
> M1-based systems, as well as ARM architecture at various cloud providers (AWS 
> Graviton)





[jira] [Commented] (FLINK-25679) Build arm64 Linux images for Apache Flink

2022-02-07 Thread Robert Metzger (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-25679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17488628#comment-17488628
 ] 

Robert Metzger commented on FLINK-25679:


It worked (not for all tags, only for some, but I guess that's an inconsistency 
of the Docker Hub build system).
I assume that with the next Flink release, we'll have both architectures for all 
tags.

> Build arm64 Linux images for Apache Flink
> -
>
> Key: FLINK-25679
> URL: https://issues.apache.org/jira/browse/FLINK-25679
> Project: Flink
>  Issue Type: Improvement
>  Components: flink-docker
>Affects Versions: 1.15.0
>Reporter: Robert Metzger
>Assignee: Robert Metzger
>Priority: Major
>  Labels: pull-request-available
>
> Building Flink images for arm64 Linux should be trivial to support, since 
> upstream docker images support arm64, as well as frocksdb.
> Building the images locally is also easily possible using Docker's buildx 
> features, and the build system of the official docker images most likely 
> supports ARM arch.
> This improvement would allow us supporting development / testing on Apple 
> M1-based systems, as well as ARM architecture at various cloud providers (AWS 
> Graviton)





[GitHub] [flink] MrWhiteSike commented on pull request #18655: [FLINK-25799] [docs] Translate table/filesystem.md page into Chinese.

2022-02-07 Thread GitBox


MrWhiteSike commented on pull request #18655:
URL: https://github.com/apache/flink/pull/18655#issuecomment-1032274801


   Hi, [@wuchong](https://github.com/wuchong) Would you mind helping to check 
this pr?






[GitHub] [flink] flinkbot edited a comment on pull request #16105: [FLINK-17775][java] Cannot set batch job name when using collect

2022-02-07 Thread GitBox


flinkbot edited a comment on pull request #16105:
URL: https://github.com/apache/flink/pull/16105#issuecomment-856602024


   
   ## CI report:
   
   * d968667c3ee2d1acb59ca901f7c5f4f2fec1497c Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=18777)
 Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=22031)
 
   * dcd051e8d4a11b38830bce21e04aa6e141b085e7 Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=30886)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run azure` re-run the last Azure build
   






[GitHub] [flink] flinkbot edited a comment on pull request #16105: [FLINK-17775][java] Cannot set batch job name when using collect

2022-02-07 Thread GitBox


flinkbot edited a comment on pull request #16105:
URL: https://github.com/apache/flink/pull/16105#issuecomment-856602024


   
   ## CI report:
   
   * d968667c3ee2d1acb59ca901f7c5f4f2fec1497c Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=18777)
 Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=22031)
 
   * dcd051e8d4a11b38830bce21e04aa6e141b085e7 UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run azure` re-run the last Azure build
   






[GitHub] [flink] flinkbot edited a comment on pull request #18360: [FLINK-25329][runtime] Support memory execution graph store in session cluster

2022-02-07 Thread GitBox


flinkbot edited a comment on pull request #18360:
URL: https://github.com/apache/flink/pull/18360#issuecomment-1012981724


   
   ## CI report:
   
   * 5d6e58afe6c78826c56e5cd802f2f3ef10a0133f Azure: 
[SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=30470)
 
   * 72130f1d774fa5a1d55b58605c226727a2bb4362 Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=30885)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run azure` re-run the last Azure build
   






[GitHub] [flink] zjureel commented on pull request #18360: [FLINK-25329][runtime] Support memory execution graph store in session cluster

2022-02-07 Thread GitBox


zjureel commented on pull request #18360:
URL: https://github.com/apache/flink/pull/18360#issuecomment-1032267886


   Thanks @KarmaGYZ, using a cache in `MemoryExecutionGraphInfoStore` instead of 
a queue and hashmap sounds good to me. I have updated the code in 
`MemoryExecutionGraphInfoStore` and the test case, thanks.






[GitHub] [flink] flinkbot edited a comment on pull request #18655: [FLINK-25799] [docs] Translate table/filesystem.md page into Chinese.

2022-02-07 Thread GitBox


flinkbot edited a comment on pull request #18655:
URL: https://github.com/apache/flink/pull/18655#issuecomment-1032265310


   
   ## CI report:
   
   * a41a95c7619f337c9fd07cdca68e572921f61de8 Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=30884)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run azure` re-run the last Azure build
   






[GitHub] [flink] flinkbot edited a comment on pull request #18360: [FLINK-25329][runtime] Support memory execution graph store in session cluster

2022-02-07 Thread GitBox


flinkbot edited a comment on pull request #18360:
URL: https://github.com/apache/flink/pull/18360#issuecomment-1012981724


   
   ## CI report:
   
   * 5d6e58afe6c78826c56e5cd802f2f3ef10a0133f Azure: 
[SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=30470)
 
   * 72130f1d774fa5a1d55b58605c226727a2bb4362 UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run azure` re-run the last Azure build
   






[GitHub] [flink] flinkbot commented on pull request #18655: [FLINK-25799] [docs] Translate table/filesystem.md page into Chinese.

2022-02-07 Thread GitBox


flinkbot commented on pull request #18655:
URL: https://github.com/apache/flink/pull/18655#issuecomment-1032266881


   Thanks a lot for your contribution to the Apache Flink project. I'm the 
@flinkbot. I help the community
   to review your pull request. We will use this comment to track the progress 
of the review.
   
   
   ## Automated Checks
   Last check on commit a41a95c7619f337c9fd07cdca68e572921f61de8 (Tue Feb 08 
06:48:08 UTC 2022)
   
   **Warnings:**
* No documentation files were touched! Remember to keep the Flink docs up 
to date!
   
   
   Mention the bot in a comment to re-run the automated checks.
   ## Review Progress
   
   * ❓ 1. The [description] looks good.
   * ❓ 2. There is [consensus] that the contribution should go into Flink.
   * ❓ 3. Needs [attention] from.
   * ❓ 4. The change fits into the overall [architecture].
   * ❓ 5. Overall code [quality] is good.
   
   Please see the [Pull Request Review 
Guide](https://flink.apache.org/contributing/reviewing-prs.html) for a full 
explanation of the review process.
The Bot is tracking the review progress through labels. Labels are applied 
according to the order of the review items. For consensus, approval by a Flink 
committer or PMC member is required.
Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot approve description` to approve one or more aspects (aspects: 
`description`, `consensus`, `architecture` and `quality`)
- `@flinkbot approve all` to approve all aspects
- `@flinkbot approve-until architecture` to approve everything until 
`architecture`
- `@flinkbot attention @username1 [@username2 ..]` to require somebody's 
attention
- `@flinkbot disapprove architecture` to remove an approval you gave earlier
   






[GitHub] [flink] flinkbot commented on pull request #18655: [FLINK-25799] [docs] Translate table/filesystem.md page into Chinese.

2022-02-07 Thread GitBox


flinkbot commented on pull request #18655:
URL: https://github.com/apache/flink/pull/18655#issuecomment-1032265310


   
   ## CI report:
   
   * a41a95c7619f337c9fd07cdca68e572921f61de8 UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run azure` re-run the last Azure build
   






[jira] [Updated] (FLINK-25799) Translate table/filesystem.md page into Chinese.

2022-02-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-25799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-25799:
---
Labels: chinese-translation pull-request-available  (was: 
chinese-translation)

> Translate table/filesystem.md page into Chinese.
> 
>
> Key: FLINK-25799
> URL: https://issues.apache.org/jira/browse/FLINK-25799
> Project: Flink
>  Issue Type: Sub-task
>  Components: Documentation
>Reporter: RocMarshal
>Assignee: baisike
>Priority: Minor
>  Labels: chinese-translation, pull-request-available
>
> docs/content.zh/docs/connectors/table/filesystem.md





[jira] [Commented] (FLINK-25992) JobDispatcherITCase.testRecoverFromCheckpointAfterLosingAndRegainingLeadership fails on azure

2022-02-07 Thread Liu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-25992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17488622#comment-17488622
 ] 

Liu commented on FLINK-25992:
-

[~roman] Thanks for the report. Indeed, the class AtLeastOneCheckpointInvokable 
doesn't override the method notifyCheckpointAbortAsync. Since the code '

AtLeastOneCheckpointInvokable.atLeastOneCheckpointCompleted.await();

' has finished, there must be at least one completed checkpoint to recover from 
later. I am confused by the AssertionError. The case is hard to reproduce on my 
local machine. Can you show me the full log?

> JobDispatcherITCase.testRecoverFromCheckpointAfterLosingAndRegainingLeadership
>  fails on azure
> --
>
> Key: FLINK-25992
> URL: https://issues.apache.org/jira/browse/FLINK-25992
> Project: Flink
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 1.15.0
>Reporter: Roman Khachatryan
>Priority: Major
>  Labels: test-stability
> Fix For: 1.15.0
>
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=30871&view=logs&j=0da23115-68bb-5dcd-192c-bd4c8adebde1&t=24c3384f-1bcb-57b3-224f-51bf973bbee8&l=9154
> {code}
> 19:41:35,515 [flink-akka.actor.default-dispatcher-9] WARN  
> org.apache.flink.runtime.taskmanager.Task[] - jobVertex 
> (1/1)#0 (7efdea21f5f95490e02117063ce8a314) switched from RUNNING to FAILED 
> with failure cause: java.lang.RuntimeException: Error while notify checkpoint 
> ABORT.
>   at 
> org.apache.flink.runtime.taskmanager.Task.notifyCheckpoint(Task.java:1457)
>   at 
> org.apache.flink.runtime.taskmanager.Task.notifyCheckpointAborted(Task.java:1407)
>   at 
> org.apache.flink.runtime.taskexecutor.TaskExecutor.abortCheckpoint(TaskExecutor.java:1021)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.lambda$handleRpcInvocation$1(AkkaRpcActor.java:316)
>   at 
> org.apache.flink.runtime.concurrent.akka.ClassLoadingUtils.runWithContextClassLoader(ClassLoadingUtils.java:83)
>   at 
> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcInvocation(AkkaRpcActor.java:314)
>   at 
> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:217)
>   at 
> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:163)
>   at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:24)
>   at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:20)
>   at scala.PartialFunction.applyOrElse(PartialFunction.scala:123)
>   at scala.PartialFunction.applyOrElse$(PartialFunction.scala:122)
>   at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:20)
>   at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171)
>   at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:172)
>   at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:172)
>   at akka.actor.Actor.aroundReceive(Actor.scala:537)
>   at akka.actor.Actor.aroundReceive$(Actor.scala:535)
>   at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:220)
>   at akka.actor.ActorCell.receiveMessage(ActorCell.scala:580)
>   at akka.actor.ActorCell.invoke(ActorCell.scala:548)
>   at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:270)
>   at akka.dispatch.Mailbox.run(Mailbox.scala:231)
>   at akka.dispatch.Mailbox.exec(Mailbox.scala:243)
>   at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
>   at 
> java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
>   at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
>   at 
> java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:175)
> Caused by: java.lang.UnsupportedOperationException: 
> notifyCheckpointAbortAsync not supported by 
> org.apache.flink.runtime.dispatcher.JobDispatcherITCase$AtLeastOneCheckpointInvokable
>   at 
> org.apache.flink.runtime.jobgraph.tasks.AbstractInvokable.notifyCheckpointAbortAsync(AbstractInvokable.java:205)
>   at 
> org.apache.flink.runtime.taskmanager.Task.notifyCheckpoint(Task.java:1430)
>   ... 31 more
> {code}





[GitHub] [flink] MrWhiteSike opened a new pull request #18655: [FLINK-25799] [docs] Translate table/filesystem.md page into Chinese.

2022-02-07 Thread GitBox


MrWhiteSike opened a new pull request #18655:
URL: https://github.com/apache/flink/pull/18655


   
   
   ## What is the purpose of the change
   
   *(For example: This pull request makes task deployment go through the blob 
server, rather than through RPC. That way we avoid re-transferring them on each 
deployment (during recovery).)*
   
   
   ## Brief change log
   
   *(for example:)*
 - *The TaskInfo is stored in the blob store on job creation time as a 
persistent artifact*
 - *Deployments RPC transmits only the blob storage reference*
 - *TaskManagers retrieve the TaskInfo from the blob cache*
   
   
   ## Verifying this change
   
   Please make sure both new and modified tests in this PR follows the 
conventions defined in our code quality guide: 
https://flink.apache.org/contributing/code-style-and-quality-common.html#testing
   
   *(Please pick either of the following options)*
   
   This change is a trivial rework / code cleanup without any test coverage.
   
   *(or)*
   
   This change is already covered by existing tests, such as *(please describe 
tests)*.
   
   *(or)*
   
   This change added tests and can be verified as follows:
   
   *(example:)*
 - *Added integration tests for end-to-end deployment with large payloads 
(100MB)*
 - *Extended integration test for recovery after master (JobManager) 
failure*
 - *Added test that validates that TaskInfo is transferred only once across 
recoveries*
 - *Manually verified the change by running a 4 node cluser with 2 
JobManagers and 4 TaskManagers, a stateful streaming program, and killing one 
JobManager and two TaskManagers during the execution, verifying that recovery 
happens correctly.*
   
   ## Does this pull request potentially affect one of the following parts:
   
 - Dependencies (does it add or upgrade a dependency): (yes / no)
 - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: (yes / no)
 - The serializers: (yes / no / don't know)
 - The runtime per-record code paths (performance sensitive): (yes / no / 
don't know)
 - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (yes / no / don't know)
 - The S3 file system connector: (yes / no / don't know)
   
   ## Documentation
   
 - Does this pull request introduce a new feature? (yes / no)
 - If yes, how is the feature documented? (not applicable / docs / JavaDocs 
/ not documented)
   






[GitHub] [flink] flinkbot edited a comment on pull request #18654: [FLINK-25997][state/changelog] Fix materialization boundary for FsSta…

2022-02-07 Thread GitBox


flinkbot edited a comment on pull request #18654:
URL: https://github.com/apache/flink/pull/18654#issuecomment-1032258711


   
   ## CI report:
   
   * 23cd7d2fe248dc4020a349bf64e7c9cc6e716c4c Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=30883)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run azure` re-run the last Azure build
   






[jira] [Commented] (FLINK-24947) Support host network for native K8s integration

2022-02-07 Thread Yang Wang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-24947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17488620#comment-17488620
 ] 

Yang Wang commented on FLINK-24947:
---

I believe the TaskManager pods could not be reused with host network enabled 
when the JobManager fails over, because we are using the headless service for 
internal communication.

> Support host network for native K8s integration
> ---
>
> Key: FLINK-24947
> URL: https://issues.apache.org/jira/browse/FLINK-24947
> Project: Flink
>  Issue Type: New Feature
>  Components: Deployment / Kubernetes
>Reporter: liuzhuo
>Assignee: liuzhuo
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.15.0
>
>
> For the use of flink on k8s, for performance considerations, it is important 
> to choose a CNI plug-in. Usually we have two environments: Managed and 
> UnManaged.
>   Managed: Cloud vendors usually provide very efficient CNI plug-ins, we 
> don’t need to care about network performance issues
>   UnManaged: On self-built K8s clusters, CNI plug-ins are usually optional, 
> similar to Flannel and Calico, but such software network cards usually lose 
> some performance or require some additional network strategies.
> For an unmanaged environment, if we also want to achieve the best network 
> performance, should we support the *HostNetWork* model?
> Use the host network to achieve the best performance





[GitHub] [flink] flinkbot commented on pull request #18654: [FLINK-25997][state/changelog] Fix materialization boundary for FsSta…

2022-02-07 Thread GitBox


flinkbot commented on pull request #18654:
URL: https://github.com/apache/flink/pull/18654#issuecomment-1032260570


   Thanks a lot for your contribution to the Apache Flink project. I'm the 
@flinkbot. I help the community
   to review your pull request. We will use this comment to track the progress 
of the review.
   
   
   ## Automated Checks
   Last check on commit 23cd7d2fe248dc4020a349bf64e7c9cc6e716c4c (Tue Feb 08 
06:36:56 UTC 2022)
   
   **Warnings:**
* No documentation files were touched! Remember to keep the Flink docs up 
to date!
* **This pull request references an unassigned [Jira 
ticket](https://issues.apache.org/jira/browse/FLINK-25997).** According to the 
[code contribution 
guide](https://flink.apache.org/contributing/contribute-code.html), tickets 
need to be assigned before starting with the implementation work.
   
   
   Mention the bot in a comment to re-run the automated checks.
   ## Review Progress
   
   * ❓ 1. The [description] looks good.
   * ❓ 2. There is [consensus] that the contribution should go into Flink.
   * ❓ 3. Needs [attention] from.
   * ❓ 4. The change fits into the overall [architecture].
   * ❓ 5. Overall code [quality] is good.
   
   Please see the [Pull Request Review 
Guide](https://flink.apache.org/contributing/reviewing-prs.html) for a full 
explanation of the review process.
The Bot is tracking the review progress through labels. Labels are applied 
according to the order of the review items. For consensus, approval by a Flink 
committer or PMC member is required.

Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot approve description` to approve one or more aspects (aspects: 
`description`, `consensus`, `architecture` and `quality`)
- `@flinkbot approve all` to approve all aspects
- `@flinkbot approve-until architecture` to approve everything until 
`architecture`
- `@flinkbot attention @username1 [@username2 ..]` to require somebody's 
attention
- `@flinkbot disapprove architecture` to remove an approval you gave earlier
   






[GitHub] [flink] masteryhx commented on pull request #18654: [FLINK-25997][state/changelog] Fix materialization boundary for FsSta…

2022-02-07 Thread GitBox


masteryhx commented on pull request #18654:
URL: https://github.com/apache/flink/pull/18654#issuecomment-1032260411


   @flinkbot run azure






[GitHub] [flink] flinkbot edited a comment on pull request #18654: [FLINK-25997][state/changelog] Fix materialization boundary for FsSta…

2022-02-07 Thread GitBox


flinkbot edited a comment on pull request #18654:
URL: https://github.com/apache/flink/pull/18654#issuecomment-1032258711


   
   ## CI report:
   
   * 23cd7d2fe248dc4020a349bf64e7c9cc6e716c4c Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=30883)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run azure` re-run the last Azure build
   






[GitHub] [flink] flinkbot commented on pull request #18654: [FLINK-25997][state/changelog] Fix materialization boundary for FsSta…

2022-02-07 Thread GitBox


flinkbot commented on pull request #18654:
URL: https://github.com/apache/flink/pull/18654#issuecomment-1032258711


   
   ## CI report:
   
   * 23cd7d2fe248dc4020a349bf64e7c9cc6e716c4c UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run azure` re-run the last Azure build
   






[jira] [Updated] (FLINK-25997) FsStateChangelogWriter cannot work well when there is no change appended

2022-02-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-25997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-25997:
---
Labels: pull-request-available  (was: )

> FsStateChangelogWriter cannot work well when there is no change appended
> 
>
> Key: FLINK-25997
> URL: https://issues.apache.org/jira/browse/FLINK-25997
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / State Backends
>Affects Versions: 1.15.0
>Reporter: Hangxiang Yu
>Priority: Major
>  Labels: pull-request-available
>
> For FsStateChangelogWriter, if there is no change appended, we should use the 
> old sqn, but currently we always use sqn.next().
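To make the described bug concrete, here is a minimal, hypothetical sketch of the intended sequence-number behavior. The names `SequenceNumberTracker`, `append`, and `nextUpTo` are illustrative only and are not the actual FsStateChangelogWriter API: the point is that the sequence number should advance only when a change was actually appended.

```java
// Hypothetical sketch of the fix described above. When no change has been
// appended since the last materialization, the old sequence number is reused
// instead of being advanced unconditionally.
public class SequenceNumberTracker {
    private long sqn = 0;           // last assigned sequence number
    private boolean hasNewChanges;  // set when a change is appended

    public void append() {
        hasNewChanges = true;
    }

    // Returns the "upTo" boundary for the next materialization.
    public long nextUpTo() {
        if (hasNewChanges) {
            sqn++;                  // advance only when something was appended
            hasNewChanges = false;
        }
        return sqn;                 // otherwise keep the old sqn
    }
}
```

With this logic, two consecutive materializations without any append in between see the same boundary, instead of an ever-growing one.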





[GitHub] [flink] masteryhx opened a new pull request #18654: [FLINK-25997][state/changelog] Fix materialization boundary for FsSta…

2022-02-07 Thread GitBox


masteryhx opened a new pull request #18654:
URL: https://github.com/apache/flink/pull/18654


   
   
   ## What is the purpose of the change
   
   This pull request modifies the way ChangelogKeyedStateBackend uses the sqn 
maintained by FsStateChangelogWriter.
   It avoids always using sqn.next() as the "upTo" sqn, which is not correct 
when there is no change appended.
   
   ## Brief change log
   
   - *Replace "lastAppendedSequenceNumber" with "nextAppendedSequenceNumber"*
   
   
   ## Verifying this change
   
   Please make sure both new and modified tests in this PR follow the 
conventions defined in our code quality guide: 
https://flink.apache.org/contributing/code-style-and-quality-common.html#testing
   
   This change is already covered by existing tests, such as 
*ChangelogDelegateHashMapTest*.
   
   ## Does this pull request potentially affect one of the following parts:
   
 - Dependencies (does it add or upgrade a dependency): no 
 - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: no
 - The serializers: no
 - The runtime per-record code paths (performance sensitive): no
 - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Kubernetes/Yarn, ZooKeeper: no
 - The S3 file system connector: no
   
   ## Documentation
   
 - Does this pull request introduce a new feature? no
 - If yes, how is the feature documented? not applicable
   






[jira] [Updated] (FLINK-24295) Too many requestPartitionState may jam the JobManager during task deployment

2022-02-07 Thread Zhilong Hong (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-24295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhilong Hong updated FLINK-24295:
-
Component/s: Runtime / Network
 (was: Runtime / Coordination)

> Too many requestPartitionState may jam the JobManager during task deployment
> 
>
> Key: FLINK-24295
> URL: https://issues.apache.org/jira/browse/FLINK-24295
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Network
>Affects Versions: 1.14.0, 1.15.0
>Reporter: Zhilong Hong
>Priority: Major
>
> After the phase-2 optimization we did in FLINK-21110, the speed of 
> task deployment has accelerated. However, we find that during task 
> deployment there may be too many {{requestPartitionState}} RPC calls from 
> TaskManagers, which would jam the JobManager.
> Why would there be too many {{requestPartitionState}} RPC calls? After the 
> optimization, the JobManager can submit tasks to TaskManagers quickly. If the 
> JobManager calls {{submitTask}} faster than TaskManagers can process it, 
> some TaskManagers may deploy tasks faster than others.
> When a downstream task is deployed, it would try to request partitions from 
> upstream tasks, which may be located at a remote TaskManager. If the upstream 
> tasks are not deployed, it would request the partition state from JobManager. 
> In the worst case, the complexity of the computation and memory would be 
> O(N^2).
> In our test with a streaming job that has two vertices with a parallelism of 
> 8,000, connected with all-to-all edges, in the worst case there 
> will be 32,000,000 {{requestPartitionState}} RPC calls in the JobManager. 
> Each RPC call requires 1 KiB of space in the heap memory of the JobManager. The 
> overall space cost of {{requestPartitionState}} will be 32 GiB, which is a 
> heavy burden for GC to deal with.
> In our test, the size of the heap memory of the JobManager is 8 GiB. During 
> task deployment the JobManager undergoes more full GCs. The JobManager gets 
> stuck since it is busy with full GCs and has no time to deal with the incoming 
> RPC calls.
> The worst thing is that no log is output for this RPC call. When a 
> user finds the JobManager getting slower or stuck, he/she won't be able to 
> find out why.
> Why did this case rarely happen before? Before the optimization, it took a 
> long time to calculate TaskDeploymentDescriptors and send them to 
> TaskManagers. In most cases the JobManager called {{submitTask}} more slowly 
> than TaskManagers could process it. Since the deployment of tasks is 
> topologically sorted, the upstream tasks were deployed 
> before the downstream tasks, so this case rarely happened.
> In my opinion, the solution to this issue needs more discussion. According to 
> the discussion in the pull request 
> ([https://github.com/apache/flink/pull/6680]), it's not safe to remove this 
> RPC call, because we cannot always guarantee the assumption that an upstream 
> task failure will always fail the downstream consumers.
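As a rough sanity check of the memory figure quoted above, the following sketch computes the heap cost from the report's own numbers (32,000,000 pending calls at ~1 KiB each); the class and method names are illustrative only:

```java
// Back-of-the-envelope check of the heap cost quoted in the report:
// 32,000,000 pending requestPartitionState calls at ~1 KiB each.
public class RpcHeapEstimate {
    public static long heapBytes(long calls, long bytesPerCall) {
        return calls * bytesPerCall;
    }

    public static void main(String[] args) {
        long bytes = heapBytes(32_000_000L, 1024L);
        double gib = bytes / (1024.0 * 1024.0 * 1024.0);
        // ~30.5 GiB, in line with the ~32 GiB burden described above
        System.out.printf("~%.1f GiB%n", gib);
    }
}
```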





[jira] [Commented] (FLINK-25055) Support listen and notify mechanism for PartitionRequest

2022-02-07 Thread Zhilong Hong (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-25055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17488616#comment-17488616
 ] 

Zhilong Hong commented on FLINK-25055:
--

I think this improvement will also solve FLINK-24295. In FLINK-24295, we find 
that too many {{requestPartitionState}} checks will jam the JobManager during 
task deployment. Once the notify mechanism is introduced, 
{{PartitionNotFoundException}} will not be raised for the scenario Till mentioned 
above (for A->B, B gets deployed before A has registered its partition on the 
TaskManager). There will be fewer {{requestPartitionState}} checks during task 
deployment. I'm wondering what the progress of this issue is, [~zjureel]?

> Support listen and notify mechanism for PartitionRequest
> 
>
> Key: FLINK-25055
> URL: https://issues.apache.org/jira/browse/FLINK-25055
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Network
>Affects Versions: 1.14.0, 1.12.5, 1.13.3
>Reporter: Shammon
>Assignee: Shammon
>Priority: Major
>
> We submit batch jobs to a Flink session cluster with the eager scheduler for OLAP. 
> JM deploys subtasks to TaskManagers independently, and the downstream subtasks 
> may start before the upstream ones are ready. A downstream subtask sends a 
> PartitionRequest to the upstream ones and may receive a PartitionNotFoundException 
> from them. Then it will retry sending the PartitionRequest after a few ms until 
> timeout.
> The current approach raises two problems. First, there will be too many retried 
> PartitionRequest messages. Each downstream subtask will send a PartitionRequest 
> to all its upstream subtasks, and the total number of messages will be O(N*N), 
> where N is the parallelism of the subtasks. Secondly, the interval between 
> polling retries will increase the delay for upstream and downstream tasks to 
> confirm the PartitionRequest.
> We want to support a listen-and-notify mechanism for PartitionRequest when the 
> job needs no failover. The upstream TaskManager will add the PartitionRequest to 
> a listen list with a timeout checker, and notify the request when the task 
> registers its partition in the TaskManager.
> [~nkubicek] I noticed that your scenario of using Flink is similar to ours. 
> What do you think? And hope to hear from you, [~trohrmann]. THX
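The listen-and-notify idea in the description can be sketched as a small registry on the upstream TaskManager: pending partition requests are parked instead of being answered with a PartitionNotFoundException, and are completed once the partition is registered. All names below are hypothetical; this is not Flink's actual implementation, and the timeout checker mentioned above is omitted for brevity.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical sketch: park PartitionRequests until the partition is registered,
// instead of answering them with PartitionNotFoundException and forcing retries.
public class PartitionRequestListenerRegistry {
    private final Map<String, List<CompletableFuture<String>>> pending =
            new ConcurrentHashMap<>();

    // Called for a downstream task's request; completes once the partition exists.
    public CompletableFuture<String> listen(String partitionId) {
        CompletableFuture<String> future = new CompletableFuture<>();
        pending.computeIfAbsent(partitionId, k -> new CopyOnWriteArrayList<>())
                .add(future);
        return future;
    }

    // Called when the upstream task registers its partition on this TaskManager.
    public void notifyRegistered(String partitionId) {
        List<CompletableFuture<String>> futures = pending.remove(partitionId);
        if (futures != null) {
            futures.forEach(f -> f.complete(partitionId));
        }
    }
}
```

Compared with the polling approach, each request is sent once and answered as soon as the partition appears, avoiding both the O(N*N) retry traffic and the retry-interval delay.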





[jira] [Updated] (FLINK-24897) Make usrlib could work for YARN application and per-job

2022-02-07 Thread Yang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-24897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Wang updated FLINK-24897:
--
Summary: Make usrlib could work for YARN application and per-job  (was: 
Enable application mode on YARN to use usrlib)

> Make usrlib could work for YARN application and per-job
> ---
>
> Key: FLINK-24897
> URL: https://issues.apache.org/jira/browse/FLINK-24897
> Project: Flink
>  Issue Type: Improvement
>  Components: Deployment / YARN
>Reporter: Biao Geng
>Assignee: Biao Geng
>Priority: Major
>  Labels: pull-request-available
>
> Hi there, 
> I am working to utilize application mode to submit flink jobs to YARN cluster 
> but I find that currently there is no easy way to ship my user-defined 
> jars(e.g. some custom connectors or udf jars that would be shared by some 
> jobs) and ask the FlinkUserCodeClassLoader to load classes in these jars. 
> I checked some relevant jiras, like  FLINK-21289. In k8s mode, there is a 
> solution that users can use `usrlib` directory to store their user-defined 
> jars and these jars would be loaded by FlinkUserCodeClassLoader when the job 
> is executed on JM/TM.
> But on YARN mode, `usrlib` does not work as that:
> In this method(org.apache.flink.yarn.YarnClusterDescriptor#addShipFiles), if 
> I want to use `yarn.ship-files` to ship `usrlib` from my flink client(in my 
> local machine) to remote cluster, I must not set  UserJarInclusion to 
> DISABLED due to the checkArgument(). However, if I do not set that option to 
> DISABLED, the user jars to be shipped will be added into systemClassPaths. As 
> a result, classes in those user jars will be loaded by AppClassLoader. 
> But if I do not ship these jars, there is no convenient way to utilize these 
> jars in my flink run command. Currently, all I can do seems to be using the `-C` 
> option, which means I have to upload my jars to some shared store first and 
> then use these remote paths. This is not ideal, as we have already made it 
> possible to ship jars or files directly, and we also introduced `usrlib` in 
> application mode on YARN. It would be more user-friendly if we could allow 
> shipping `usrlib` from the local machine to the remote cluster while using 
> FlinkUserCodeClassLoader to load classes in the jars in `usrlib`.
>  





[GitHub] [flink] wangyang0918 commented on a change in pull request #18531: [FLINK-24897] Enable application mode on YARN to use usrlib

2022-02-07 Thread GitBox


wangyang0918 commented on a change in pull request #18531:
URL: https://github.com/apache/flink/pull/18531#discussion_r801282398



##
File path: 
flink-yarn/src/main/java/org/apache/flink/yarn/YarnClusterDescriptor.java
##
@@ -1685,6 +1699,35 @@ void addLibFoldersToShipFiles(Collection<File> effectiveShipFiles) {
 }
 }
 
+@VisibleForTesting
+void addUsrLibFolderToShipFiles(
+        Collection<File> effectiveShipFiles, Collection<File> systemShipFiles) {
+// Add usrlib folder to the ship files if it exists
+// Classes in the folder will be loaded by UserClassLoader if 
CLASSPATH_INCLUDE_USER_JAR is
+// DISABLED.
+final Optional<File> usrLibDir = getLocalUsrLibDirectory();
+
+if (usrLibDir.isPresent()) {
+File usrLibDirFile = usrLibDir.get();
+if (usrLibDirFile.isDirectory()) {
+checkArgument(

Review comment:
   Given that the shipped files must not include `usrlib`, I think we 
could reuse the `checkArgument` in `YarnClusterDescriptor#addShipFiles`, and it 
needs to be changed as follows. WDYT?
   
   ```
   checkArgument(
   !isUsrLibDirIncludedInShipFiles(shipFiles),
   "The name of ship directory can not be %s.",
   DEFAULT_FLINK_USR_LIB_DIR);
   ```
   
   After then the implementation of `addUsrLibFolderToShipFiles` could be 
simplified.
   ```
   @VisibleForTesting
   void addUsrLibFolderToShipFiles(Collection<File> effectiveShipFiles) {
   ClusterEntrypointUtils.tryFindUserLibDirectory()
   .ifPresent(
   usrLibDirFile -> {
   effectiveShipFiles.add(usrLibDirFile);
   LOG.info(
   "usrlib: {} will be shipped automatically.",
   usrLibDirFile.getAbsolutePath());
   });
   }
   ```

##
File path: 
flink-yarn/src/main/java/org/apache/flink/yarn/YarnApplicationFileUploader.java
##
@@ -438,6 +446,18 @@ private static boolean isPlugin(Path path) {
 return false;
 }
 
+private static boolean isUsrLib(Path path) {

Review comment:
   I suggest introducing `isUsrLibDirIncludedInProvidedLib` instead, 
and then calling `checkArgument` in the constructor of `YarnApplicationFileUploader`. 
Because the files in the provided lib will be relativized, we just need to 
check the name of the provided lib.
   
   ```
   private boolean isUsrLibDirIncludedInProvidedLib(final List<Path> providedLibDirs)
   throws IOException {
   for (Path path : providedLibDirs) {
   final FileStatus fileStatus = fileSystem.getFileStatus(path);
   if (fileStatus.isDirectory()
   && 
ConfigConstants.DEFAULT_FLINK_USR_LIB_DIR.equals(path.getName())) {
   return true;
   }
   }
   return false;
   }
   ```


##
File path: flink-yarn/pom.xml
##
@@ -130,6 +130,11 @@ under the License.



+   

Review comment:
   Why do we need this change? I think we already have the dependency in 
the parent pom.

##
File path: 
flink-yarn/src/main/java/org/apache/flink/yarn/YarnClusterDescriptor.java
##
@@ -1799,4 +1842,27 @@ public static void logDetachedClusterInformation(
 yarnApplicationId,
 yarnApplicationId);
 }
+
+/**

Review comment:
   Could we use `ClusterEntrypointUtils.tryFindUserLibDirectory()` to avoid 
introducing the `getLocalUsrLibDirectory()`?

##
File path: 
flink-yarn/src/main/java/org/apache/flink/yarn/YarnClusterDescriptor.java
##
@@ -910,14 +910,28 @@ private ApplicationReport startAppMaster(
 }
 
 // Upload and register user jars
-final List userClassPaths =
+List userClassPaths =

Review comment:
   Why do we remove the `final`?

##
File path: 
flink-yarn/src/test/java/org/apache/flink/yarn/YarnClusterDescriptorTest.java
##
@@ -699,6 +700,41 @@ public void 
testDisableSystemClassPathIncludeUserJarAndWithIllegalShipDirectoryN
 }
 }
 
+/**
+ * Tests that the usrlib is added to the ship files again when the local usrlib
+ * has already been shipped automatically.
+ */
+@Test
+public void testShipUsrLibWithExistingLocalUsrLib() throws IOException {

Review comment:
   After moving out the `checkArgument` in 
`YarnClusterDescriptor#addUsrLibFolderToShipFiles`, we just need to test 
whether the `usrlib` is added to the shipped files automatically.

##
File path: 
flink-yarn/src/main/java/org/apache/flink/yarn/YarnClusterDescriptor.java
##
@@ -1752,7 +1795,7 @@ ContainerLaunchContext setupApplicationMasterContainer(
 return config.get(YarnConfigOptions.CLASSPATH_INCLUDE_USER_JAR);
 }
 
-private static 

[jira] [Commented] (FLINK-24307) Some lines in Scala code example end with unnecessary semicolons. Also some Scala code blocks are wrongly typed with Java blocks

2022-02-07 Thread Yao Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-24307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17488608#comment-17488608
 ] 

Yao Zhang commented on FLINK-24307:
---

Hi [~twalthr] ,

Could you please review this issue? Thanks.

> Some lines in Scala code example end with unnecessary semicolons. Also some 
> Scala code blocks are wrongly typed with Java blocks
> 
>
> Key: FLINK-24307
> URL: https://issues.apache.org/jira/browse/FLINK-24307
> Project: Flink
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 1.13.2, shaded-14.0
>Reporter: Yao Zhang
>Assignee: Yao Zhang
>Priority: Minor
>  Labels: pull-request-available, stale-assigned
> Attachments: image-2021-09-16-14-39-05-065.png
>
>
> In Flink docs, some sample of Scala codes end with semicolons.
> Also in some of the markdown files, several the Scala code block types are 
> Java. They should be Scala.
> !image-2021-09-16-14-39-05-065.png!





[GitHub] [flink] flinkbot edited a comment on pull request #18653: [FLINK-25825][connector-jdbc] MySqlCatalogITCase fails on azure

2022-02-07 Thread GitBox


flinkbot edited a comment on pull request #18653:
URL: https://github.com/apache/flink/pull/18653#issuecomment-1032212433


   
   ## CI report:
   
   * b5b8d7da0088b0918da7b33a983ac41efa2e7da5 Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=30881)
 
   * 065c43692477ad54dc906bc6fb0b6afe8082ed0c Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=30882)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run azure` re-run the last Azure build
   






[jira] [Updated] (FLINK-25997) FsStateChangelogWriter cannot work well when there is no change appended

2022-02-07 Thread Hangxiang Yu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-25997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hangxiang Yu updated FLINK-25997:
-
Description: For FsStateChangelogWriter, if there is no change appended, we 
should use the old sqn, but currently we always use sqn.next().  (was: If there 
is no change appended, we should use the old sqn, but we always use sqn.next() 
currently.)

> FsStateChangelogWriter cannot work well when there is no change appended
> 
>
> Key: FLINK-25997
> URL: https://issues.apache.org/jira/browse/FLINK-25997
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / State Backends
>Affects Versions: 1.15.0
>Reporter: Hangxiang Yu
>Priority: Major
>
> For FsStateChangelogWriter, if there is no change appended, we should use the 
> old sqn, but currently we always use sqn.next().





[jira] [Updated] (FLINK-25995) Make implicit assumption of SQL local keyBy/groupBy explicit

2022-02-07 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-25995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu updated FLINK-25995:

Description: 
If there are multiple consecutive identical groupBy (i.e. keyBy) operations, the 
SQL planner will change all of them except the first one to use the forward 
partitioner, so that these operators can be chained to reduce unnecessary 
shuffles.

However, sometimes the local keyBy operators are not chained (e.g. with multiple 
inputs), and this kind of forward partitioner will turn into forward job 
edges. These forward edges still carry the local keyBy assumption, so they 
cannot be changed into rescale/rebalance edges, otherwise it can lead to 
incorrect results. This prevents the adaptive batch scheduler from determining 
the parallelism of other forward-edge downstream job vertices (see FLINK-25046).

To solve it, I propose to introduce a new {{ForwardForLocalKeyByPartitioner}}. 
When the SQL planner optimizes the case of multiple consecutive identical 
groupBy operations, it should use the proposed partitioner, so that the runtime 
framework can further decide whether the partitioner can be changed to rescale 
or not.

  was:
If there are multiple consecutive the same groupBy(i.e. keyBy), SQL planner 
will change them except the first one to use forward partitioner, so that these 
operators can be chained to reduce unnecessary shuffles.
However, sometimes the local keyBy operators are not chained (e.g. multiple 
inputs), and  this kind of forward partitioners will turn into forward job 
edges. These forward edges still have the local keyBy assumption, so that they 
cannot be changed into rescale/rebalance edges, otherwise it can lead to 
incorrect results. This prevents the adaptive batch scheduler from determining 
parallelism for other forward edge downstream job vertices (see FLINK-25046).
To solve it, I propose to introduce a new {{ForwardForRescalePartitioner}}. 
When SQL planner optimizes the case of multiple consecutive the same groupBy, 
it should use the proposed partitioner, so that the runtime framework can 
further decide whether the partitioner can be changed to rescale or not.


> Make implicit assumption of SQL local keyBy/groupBy explicit
> 
>
> Key: FLINK-25995
> URL: https://issues.apache.org/jira/browse/FLINK-25995
> Project: Flink
>  Issue Type: Sub-task
>  Components: Table SQL / Planner
>Affects Versions: 1.15.0
>Reporter: Zhu Zhu
>Priority: Major
> Fix For: 1.15.0
>
>
> If there are multiple consecutive identical groupBy (i.e. keyBy) operations, 
> the SQL planner will change all of them except the first one to use the 
> forward partitioner, so that these operators can be chained to reduce 
> unnecessary shuffles.
> However, sometimes the local keyBy operators are not chained (e.g. with 
> multiple inputs), and this kind of forward partitioner will turn into forward 
> job edges. These forward edges still carry the local keyBy assumption, so 
> they cannot be changed into rescale/rebalance edges, otherwise it can lead to 
> incorrect results. This prevents the adaptive batch scheduler from 
> determining the parallelism of other forward-edge downstream job vertices (see 
> FLINK-25046).
> To solve it, I propose to introduce a new 
> {{ForwardForLocalKeyByPartitioner}}. When the SQL planner optimizes the case 
> of multiple consecutive identical groupBy operations, it should use the 
> proposed partitioner, so that the runtime framework can further decide whether 
> the partitioner can be changed to rescale or not.





[GitHub] [flink] paul8263 closed pull request #16017: [hotfix][doc] Correct code sample issues in Create a TableEnvironment…

2022-02-07 Thread GitBox


paul8263 closed pull request #16017:
URL: https://github.com/apache/flink/pull/16017


   






[GitHub] [flink] flinkbot edited a comment on pull request #18653: [FLINK-25825][connector-jdbc] MySqlCatalogITCase fails on azure

2022-02-07 Thread GitBox


flinkbot edited a comment on pull request #18653:
URL: https://github.com/apache/flink/pull/18653#issuecomment-1032212433


   
   ## CI report:
   
   * b5b8d7da0088b0918da7b33a983ac41efa2e7da5 Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=30881)
 
   * 065c43692477ad54dc906bc6fb0b6afe8082ed0c UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run azure` re-run the last Azure build
   






[jira] [Created] (FLINK-25997) FsStateChangelogWriter cannot work well when there is no change appended

2022-02-07 Thread Hangxiang Yu (Jira)
Hangxiang Yu created FLINK-25997:


 Summary: FsStateChangelogWriter cannot work well when there is no 
change appended
 Key: FLINK-25997
 URL: https://issues.apache.org/jira/browse/FLINK-25997
 Project: Flink
  Issue Type: Bug
  Components: Runtime / State Backends
Affects Versions: 1.15.0
Reporter: Hangxiang Yu


If there is no change appended, we should use the old sqn, but we always use 
sqn.next() currently.





[GitHub] [flink] flinkbot edited a comment on pull request #18653: [FLINK-25825][connector-jdbc] MySqlCatalogITCase fails on azure

2022-02-07 Thread GitBox


flinkbot edited a comment on pull request #18653:
URL: https://github.com/apache/flink/pull/18653#issuecomment-1032212433


   
   ## CI report:
   
   * b5b8d7da0088b0918da7b33a983ac41efa2e7da5 Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=30881)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run azure` re-run the last Azure build
   






[GitHub] [flink] flinkbot edited a comment on pull request #18653: [FLINK-25825][connector-jdbc] MySqlCatalogITCase fails on azure

2022-02-07 Thread GitBox


flinkbot edited a comment on pull request #18653:
URL: https://github.com/apache/flink/pull/18653#issuecomment-1032212433


   
   ## CI report:
   
   * b5b8d7da0088b0918da7b33a983ac41efa2e7da5 Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=30881)
 
   * 065c43692477ad54dc906bc6fb0b6afe8082ed0c UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run azure` re-run the last Azure build
   






[GitHub] [flink] gaoyunhaii commented on a change in pull request #18642: [FLINK-25572][connectors/filesystem] Update File Sink to use decomposed interfaces.

2022-02-07 Thread GitBox


gaoyunhaii commented on a change in pull request #18642:
URL: https://github.com/apache/flink/pull/18642#discussion_r801283600



##
File path: 
flink-connectors/flink-connector-files/src/main/java/org/apache/flink/connector/file/sink/committer/FileCommitter.java
##
@@ -48,9 +47,10 @@ public FileCommitter(BucketWriter bucketWriter) {
 }
 
 @Override
-public List<FileSinkCommittable> commit(List<FileSinkCommittable> committables)
-        throws IOException {
-    for (FileSinkCommittable committable : committables) {
+public void commit(Collection<CommitRequest<FileSinkCommittable>> requests)
+        throws IOException, InterruptedException {
+    for (CommitRequest<FileSinkCommittable> request : requests) {

Review comment:
   I think the key is that we might need to know the result of a single 
request. We could leave the behavior unchanged here for now and open a follow-up 
issue for this part if there is still controversy~




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #18653: [FLINK-25825][connector-jdbc] MySqlCatalogITCase fails on azure

2022-02-07 Thread GitBox


flinkbot edited a comment on pull request #18653:
URL: https://github.com/apache/flink/pull/18653#issuecomment-1032212433


   
   ## CI report:
   
   * b5b8d7da0088b0918da7b33a983ac41efa2e7da5 Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=30881)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #18653: [FLINK-25825][connector-jdbc] MySqlCatalogITCase fails on azure

2022-02-07 Thread GitBox


flinkbot edited a comment on pull request #18653:
URL: https://github.com/apache/flink/pull/18653#issuecomment-1032212433


   
   ## CI report:
   
   * b5b8d7da0088b0918da7b33a983ac41efa2e7da5 Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=30881)
 
   * 065c43692477ad54dc906bc6fb0b6afe8082ed0c UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #18653: [FLINK-25825][connector-jdbc] MySqlCatalogITCase fails on azure

2022-02-07 Thread GitBox


flinkbot edited a comment on pull request #18653:
URL: https://github.com/apache/flink/pull/18653#issuecomment-1032212433


   
   ## CI report:
   
   * b5b8d7da0088b0918da7b33a983ac41efa2e7da5 Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=30881)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot commented on pull request #18653: [FLINK-25825][connector-jdbc] MySqlCatalogITCase fails on azure

2022-02-07 Thread GitBox


flinkbot commented on pull request #18653:
URL: https://github.com/apache/flink/pull/18653#issuecomment-1032213135


   Thanks a lot for your contribution to the Apache Flink project. I'm the 
@flinkbot. I help the community
   to review your pull request. We will use this comment to track the progress 
of the review.
   
   
   ## Automated Checks
   Last check on commit b5b8d7da0088b0918da7b33a983ac41efa2e7da5 (Tue Feb 08 
04:51:58 UTC 2022)
   
   **Warnings:**
* **1 pom.xml files were touched**: Check for build and licensing issues.
* No documentation files were touched! Remember to keep the Flink docs up 
to date!
* **This pull request references an unassigned [Jira 
ticket](https://issues.apache.org/jira/browse/FLINK-25825).** According to the 
[code contribution 
guide](https://flink.apache.org/contributing/contribute-code.html), tickets 
need to be assigned before starting with the implementation work.
   
   
   Mention the bot in a comment to re-run the automated checks.
   ## Review Progress
   
   * ❓ 1. The [description] looks good.
   * ❓ 2. There is [consensus] that the contribution should go into Flink.
   * ❓ 3. Needs [attention] from.
   * ❓ 4. The change fits into the overall [architecture].
   * ❓ 5. Overall code [quality] is good.
   
   Please see the [Pull Request Review 
Guide](https://flink.apache.org/contributing/reviewing-prs.html) for a full 
explanation of the review process.
The Bot is tracking the review progress through labels. Labels are applied 
according to the order of the review items. For consensus, approval by a Flink 
committer or PMC member is required.
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot approve description` to approve one or more aspects (aspects: 
`description`, `consensus`, `architecture` and `quality`)
- `@flinkbot approve all` to approve all aspects
- `@flinkbot approve-until architecture` to approve everything until 
`architecture`
- `@flinkbot attention @username1 [@username2 ..]` to require somebody's 
attention
- `@flinkbot disapprove architecture` to remove an approval you gave earlier
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot commented on pull request #18653: [FLINK-25825][connector-jdbc] MySqlCatalogITCase fails on azure

2022-02-07 Thread GitBox


flinkbot commented on pull request #18653:
URL: https://github.com/apache/flink/pull/18653#issuecomment-1032212433


   
   ## CI report:
   
   * b5b8d7da0088b0918da7b33a983ac41efa2e7da5 UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Updated] (FLINK-25825) MySqlCatalogITCase fails on azure

2022-02-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-25825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-25825:
---
Labels: pull-request-available test-stability  (was: test-stability)

> MySqlCatalogITCase fails on azure
> -
>
> Key: FLINK-25825
> URL: https://issues.apache.org/jira/browse/FLINK-25825
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / JDBC, Table SQL / API
>Affects Versions: 1.15.0
>Reporter: Roman Khachatryan
>Priority: Blocker
>  Labels: pull-request-available, test-stability
> Fix For: 1.15.0
>
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=30189&view=logs&j=e9af9cde-9a65-5281-a58e-2c8511d36983&t=c520d2c3-4d17-51f1-813b-4b0b74a0c307&l=13677
>  
> {code}
> 2022-01-26T06:04:42.8019913Z Jan 26 06:04:42 [ERROR] 
> org.apache.flink.connector.jdbc.catalog.MySqlCatalogITCase.testFullPath  Time 
> elapsed: 2.166 s  <<< FAILURE!
> 2022-01-26T06:04:42.8025522Z Jan 26 06:04:42 java.lang.AssertionError: 
> expected: java.util.ArrayList<[+I[1, -1, 1, null, true, null, hello, 
> 2021-08-04, 2021-08-04T01:54:16, -1, 1, -1.0, 1.0, enum2, -9.1, 9.1, -1, 1, -1, 1, 
> \{"k1": "v1"}, null, col_longtext, null, -1, 1, col_mediumtext, -99, 99, 
> -1.0, 1.0, set_ele1, -1, 1, col_text, 10:32:34, 2021-08-04T01:54:16, 
> col_tinytext, -1, 1, null, col_varchar, 2021-08-04T01:54:16.463, 09:33:43, 
> 2021-08-04T01:54:16.463, null], +I[2, -1, 1, null, true, null, hello, 
> 2021-08-04, 2021-08-04T01:53:19, -1, 1, -1.0, 1.0, enum2, -9.1, 9.1, -1, 1, 
> -1, 1, \{"k1": "v1"}, null, col_longtext, null, -1, 1, col_mediumtext, -99, 
> 99, -1.0, 1.0, set_ele1,set_ele12, -1, 1, col_text, 10:32:34, 
> 2021-08-04T01:53:19, col_tinytext, -1, 1, null, col_varchar, 
> 2021-08-04T01:53:19.098, 09:33:43, 2021-08-04T01:53:19.098, null]]> but was: 
> java.util.ArrayList<[+I[1, -1, 1, null, true, null, hello, 2021-08-04, 
> 2021-08-04T01:54:16, -1, 1, -1.0, 1.0, enum2, -9.1, 9.1, -1, 1, -1, 1, 
> \{"k1": "v1"}, null, col_longtext, null, -1, 1, col_mediumtext, -99, 99, 
> -1.0, 1.0, set_ele1, -1, 1, col_text, 10:32:34, 2021-08-04T01:54:16, 
> col_tinytext, -1, 1, null, col_varchar, 2021-08-04T01:54:16.463, 09:33:43, 
> 2021-08-04T01:54:16.463, null], +I[2, -1, 1, null, true, null, hello, 
> 2021-08-04, 2021-08-04T01:53:19, -1, 1, -1.0, 1.0, enum2, -9.1, 9.1, -1, 1, 
> -1, 1, \{"k1": "v1"}, null, col_longtext, null, -1, 1, col_mediumtext, -99, 
> 99, -1.0, 1.0, set_ele1,set_ele12, -1, 1, col_text, 10:32:34, 
> 2021-08-04T01:53:19, col_tinytext, -1, 1, null, col_varchar, 
> 2021-08-04T01:53:19.098, 09:33:43, 2021-08-04T01:53:19.098, null]]>
> 2022-01-26T06:04:42.8029336Z Jan 26 06:04:42    at 
> org.junit.Assert.fail(Assert.java:89)
> 2022-01-26T06:04:42.8029824Z Jan 26 06:04:42    at 
> org.junit.Assert.failNotEquals(Assert.java:835)
> 2022-01-26T06:04:42.8030319Z Jan 26 06:04:42    at 
> org.junit.Assert.assertEquals(Assert.java:120)
> 2022-01-26T06:04:42.8030815Z Jan 26 06:04:42    at 
> org.junit.Assert.assertEquals(Assert.java:146)
> 2022-01-26T06:04:42.8031419Z Jan 26 06:04:42    at 
> org.apache.flink.connector.jdbc.catalog.MySqlCatalogITCase.testFullPath(MySqlCatalogITCase.java:306)
> 2022-01-26T06:04:42.8029336Z Jan 26 06:04:42    at 
> org.junit.Assert.fail(Assert.java:89)
> 2022-01-26T06:04:42.8029824Z Jan 26 06:04:42    at 
> org.junit.Assert.failNotEquals(Assert.java:835)
> 2022-01-26T06:04:42.8030319Z Jan 26 06:04:42    at 
> org.junit.Assert.assertEquals(Assert.java:120)
> 2022-01-26T06:04:42.8030815Z Jan 26 06:04:42    at 
> org.junit.Assert.assertEquals(Assert.java:146)
> 2022-01-26T06:04:42.8031419Z Jan 26 06:04:42    at 
> org.apache.flink.connector.jdbc.catalog.MySqlCatalogITCase.testFullPath(MySqlCatalogITCase.java*:306)
> {code}
>  
> {code}
> 2022-01-26T06:04:43.2899378Z Jan 26 06:04:43 [ERROR] Failures:
> 2022-01-26T06:04:43.2907942Z Jan 26 06:04:43 [ERROR]   
> MySqlCatalogITCase.testFullPath:306 expected: java.util.ArrayList<[+I[1, -1, 
> 1, null, true,
> 2022-01-26T06:04:43.2914065Z Jan 26 06:04:43 [ERROR]   
> MySqlCatalogITCase.testGetTable:253 expected:<(
> 2022-01-26T06:04:43.2983567Z Jan 26 06:04:43 [ERROR]   
> MySqlCatalogITCase.testSelectToInsert:323 expected: 
> java.util.ArrayList<[+I[1, -1, 1, null,
> 2022-01-26T06:04:43.2997373Z Jan 26 06:04:43 [ERROR]   
> MySqlCatalogITCase.testWithoutCatalog:291 expected: 
> java.util.ArrayList<[+I[1, -1, 1, null,
> 2022-01-26T06:04:43.3010450Z Jan 26 06:04:43 [ERROR]   
> MySqlCatalogITCase.testWithoutCatalogDB:278 expected: 
> java.util.ArrayList<[+I[1, -1, 1, nul
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[GitHub] [flink] RocMarshal opened a new pull request #18653: [FLINK-25825][connector-jdbc] MySqlCatalogITCase fails on azure

2022-02-07 Thread GitBox


RocMarshal opened a new pull request #18653:
URL: https://github.com/apache/flink/pull/18653


   
   
   ## What is the purpose of the change
   
   *Fix `MySqlCatalogITCase` fails on azure*
   
   
   ## Brief change log
   **connector-jdbc module**
 - *Use Testcontainers to refactor `UnsignedTypeConversionITCase`.*
 - *Remove dependencies on MariaDB.*
 - *Adjust some dependencies and the code arrangement of 
`UnsignedTypeConversionITCase`.*
   
   
   ## Verifying this change
   
   
   This change is already covered by existing tests, 
   such as:
   - org.apache.flink.connector.jdbc.catalog.MySqlCatalogITCase
   - org.apache.flink.connector.jdbc.table.UnsignedTypeConversionITCase
   
   
   ## Does this pull request potentially affect one of the following parts:
   
 - Dependencies (does it add or upgrade a dependency): (yes / **no**)
 - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: (yes / **no**)
 - The serializers: (yes / **no** / don't know)
 - The runtime per-record code paths (performance sensitive): (yes / **no** 
/ don't know)
 - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (yes / **no** / don't 
know)
 - The S3 file system connector: (yes / **no** / don't know)
   
   ## Documentation
   
 - Does this pull request introduce a new feature? (yes / **no**)
 - If yes, how is the feature documented? (**not applicable** / docs / 
JavaDocs / not documented)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #18548: [FLINK-25727][hive] fix the UDFArgumentException when call hive `json…

2022-02-07 Thread GitBox


flinkbot edited a comment on pull request #18548:
URL: https://github.com/apache/flink/pull/18548#issuecomment-1023865581


   
   ## CI report:
   
   * b5a79055a0b63e26f0b4d96c1c79a87b42efec33 Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=30349)
 
   * 1046ea53f5d166b8c64e35087b8d3c5e20f0932a Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=30880)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #18548: [FLINK-25727][hive] fix the UDFArgumentException when call hive `json…

2022-02-07 Thread GitBox


flinkbot edited a comment on pull request #18548:
URL: https://github.com/apache/flink/pull/18548#issuecomment-1023865581


   
   ## CI report:
   
   * b5a79055a0b63e26f0b4d96c1c79a87b42efec33 Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=30349)
 
   * 1046ea53f5d166b8c64e35087b8d3c5e20f0932a UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] pltbkd commented on a change in pull request #18642: [FLINK-25572][connectors/filesystem] Update File Sink to use decomposed interfaces.

2022-02-07 Thread GitBox


pltbkd commented on a change in pull request #18642:
URL: https://github.com/apache/flink/pull/18642#discussion_r801253773



##
File path: 
flink-connectors/flink-connector-files/src/main/java/org/apache/flink/connector/file/sink/committer/FileCommitter.java
##
@@ -48,9 +47,10 @@ public FileCommitter(BucketWriter bucketWriter) {
 }
 
 @Override
-public List<FileSinkCommittable> commit(List<FileSinkCommittable> committables)
-throws IOException {
-for (FileSinkCommittable committable : committables) {
+public void commit(Collection<CommitRequest<FileSinkCommittable>> requests)
+throws IOException, InterruptedException {
+for (CommitRequest<FileSinkCommittable> request : requests) {

Review comment:
   This is inherited from the interface. Maybe we can discuss whether to allow 
throwing exceptions here or to reach an agreement that all exceptions while 
committing should be reported via the CommitRequest (unless the exception is 
not bound to a specific request)?
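To make the trade-off being discussed concrete, here is a minimal, self-contained sketch of the "report failures per request" pattern. The names (`CommitRequest`, `signalFailed`, `signalRetryLater`) are illustrative assumptions modeled on the discussion above, not Flink's actual sink interfaces: instead of a single exception aborting the whole batch, each request records its own outcome.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

// Simplified model (not Flink code): a committer that attaches failures to
// individual requests instead of throwing for the whole batch.
public class CommitSketch {

    /** One committable plus the outcome the committer records on it. */
    static final class CommitRequest<T> {
        final T committable;
        Throwable failure;   // set when this single request fails
        boolean retryLater;  // set when the failure is considered retriable

        CommitRequest(T committable) { this.committable = committable; }

        void signalFailed(Throwable t) { this.failure = t; }
        void signalRetryLater()        { this.retryLater = true; }
    }

    /** Commits each request; a failure is bound to that request only. */
    static void commitAll(Collection<CommitRequest<String>> requests) {
        for (CommitRequest<String> request : requests) {
            try {
                // stand-in for the real per-committable commit action
                if (request.committable.startsWith("bad")) {
                    throw new IllegalStateException("cannot commit " + request.committable);
                }
            } catch (IllegalStateException e) {
                // report via the request instead of aborting the batch
                request.signalFailed(e);
                request.signalRetryLater();
            }
        }
    }

    static List<CommitRequest<String>> run() {
        List<CommitRequest<String>> reqs = new ArrayList<>();
        reqs.add(new CommitRequest<>("part-0"));
        reqs.add(new CommitRequest<>("bad-part-1"));
        commitAll(reqs);
        return reqs;
    }

    public static void main(String[] args) {
        for (CommitRequest<String> r : run()) {
            System.out.println(r.committable + " failed=" + (r.failure != null));
        }
    }
}
```

With this shape, only exceptions that are not bound to any specific request would still need to propagate out of `commitAll`, which is the open question in the thread.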




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #18548: [FLINK-25727][hive] fix the UDFArgumentException when call hive `json…

2022-02-07 Thread GitBox


flinkbot edited a comment on pull request #18548:
URL: https://github.com/apache/flink/pull/18548#issuecomment-1023865581


   
   ## CI report:
   
   * b5a79055a0b63e26f0b4d96c1c79a87b42efec33 Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=30349)
 
   * 8061d7c7ae8eef1489a09c86aadad47f3aa0c009 UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Assigned] (FLINK-25940) pyflink/datastream/tests/test_data_stream.py::StreamingModeDataStreamTests::test_keyed_process_function_with_state failed on AZP

2022-02-07 Thread Huang Xingbo (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-25940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Huang Xingbo reassigned FLINK-25940:


Assignee: Huang Xingbo

> pyflink/datastream/tests/test_data_stream.py::StreamingModeDataStreamTests::test_keyed_process_function_with_state
>  failed on AZP
> 
>
> Key: FLINK-25940
> URL: https://issues.apache.org/jira/browse/FLINK-25940
> Project: Flink
>  Issue Type: Bug
>  Components: API / Python
>Affects Versions: 1.15.0
>Reporter: Till Rohrmann
>Assignee: Huang Xingbo
>Priority: Critical
>  Labels: test-stability
>
> The test 
> {{pyflink/datastream/tests/test_data_stream.py::StreamingModeDataStreamTests::test_keyed_process_function_with_state}}
>  fails on AZP:
> {code}
> 2022-02-02T17:44:12.1898582Z Feb 02 17:44:12 
> === FAILURES 
> ===
> 2022-02-02T17:44:12.1899860Z Feb 02 17:44:12 _ 
> StreamingModeDataStreamTests.test_keyed_process_function_with_state __
> 2022-02-02T17:44:12.1900493Z Feb 02 17:44:12 
> 2022-02-02T17:44:12.1901218Z Feb 02 17:44:12 self = 
> <StreamingModeDataStreamTests testMethod=test_keyed_process_function_with_state>
> 2022-02-02T17:44:12.1901948Z Feb 02 17:44:12 
> 2022-02-02T17:44:12.1902745Z Feb 02 17:44:12 def 
> test_keyed_process_function_with_state(self):
> 2022-02-02T17:44:12.1903722Z Feb 02 17:44:12 
> self.env.get_config().set_auto_watermark_interval(2000)
> 2022-02-02T17:44:12.1904473Z Feb 02 17:44:12 
> self.env.set_stream_time_characteristic(TimeCharacteristic.EventTime)
> 2022-02-02T17:44:12.1906780Z Feb 02 17:44:12 data_stream = 
> self.env.from_collection([(1, 'hi', '1603708211000'),
> 2022-02-02T17:44:12.1908034Z Feb 02 17:44:12  
>(2, 'hello', '1603708224000'),
> 2022-02-02T17:44:12.1909166Z Feb 02 17:44:12  
>(3, 'hi', '1603708226000'),
> 2022-02-02T17:44:12.1910122Z Feb 02 17:44:12  
>(4, 'hello', '1603708289000'),
> 2022-02-02T17:44:12.1911099Z Feb 02 17:44:12  
>(5, 'hi', '1603708291000'),
> 2022-02-02T17:44:12.1912451Z Feb 02 17:44:12  
>(6, 'hello', '1603708293000')],
> 2022-02-02T17:44:12.1913456Z Feb 02 17:44:12  
>   type_info=Types.ROW([Types.INT(), Types.STRING(),
> 2022-02-02T17:44:12.1914338Z Feb 02 17:44:12  
>Types.STRING()]))
> 2022-02-02T17:44:12.1914811Z Feb 02 17:44:12 
> 2022-02-02T17:44:12.1915317Z Feb 02 17:44:12 class 
> MyTimestampAssigner(TimestampAssigner):
> 2022-02-02T17:44:12.1915724Z Feb 02 17:44:12 
> 2022-02-02T17:44:12.1916782Z Feb 02 17:44:12 def 
> extract_timestamp(self, value, record_timestamp) -> int:
> 2022-02-02T17:44:12.1917621Z Feb 02 17:44:12 return 
> int(value[2])
> 2022-02-02T17:44:12.1918262Z Feb 02 17:44:12 
> 2022-02-02T17:44:12.1918855Z Feb 02 17:44:12 class 
> MyProcessFunction(KeyedProcessFunction):
> 2022-02-02T17:44:12.1919363Z Feb 02 17:44:12 
> 2022-02-02T17:44:12.1919744Z Feb 02 17:44:12 def __init__(self):
> 2022-02-02T17:44:12.1920143Z Feb 02 17:44:12 self.value_state 
> = None
> 2022-02-02T17:44:12.1920648Z Feb 02 17:44:12 self.list_state 
> = None
> 2022-02-02T17:44:12.1921298Z Feb 02 17:44:12 self.map_state = 
> None
> 2022-02-02T17:44:12.1921864Z Feb 02 17:44:12 
> 2022-02-02T17:44:12.1922479Z Feb 02 17:44:12 def open(self, 
> runtime_context: RuntimeContext):
> 2022-02-02T17:44:12.1923907Z Feb 02 17:44:12 
> value_state_descriptor = ValueStateDescriptor('value_state', Types.INT())
> 2022-02-02T17:44:12.1924922Z Feb 02 17:44:12 self.value_state 
> = runtime_context.get_state(value_state_descriptor)
> 2022-02-02T17:44:12.1925741Z Feb 02 17:44:12 
> list_state_descriptor = ListStateDescriptor('list_state', Types.INT())
> 2022-02-02T17:44:12.1926482Z Feb 02 17:44:12 self.list_state 
> = runtime_context.get_list_state(list_state_descriptor)
> 2022-02-02T17:44:12.1927465Z Feb 02 17:44:12 
> map_state_descriptor = MapStateDescriptor('map_state', Types.INT(), 
> Types.STRING())
> 2022-02-02T17:44:12.1927998Z Feb 02 17:44:12 state_ttl_config 
> = StateTtlConfig \
> 2022-02-02T17:44:12.1928444Z Feb 02 17:44:12 
> .new_builder(Time.seconds(1)) \
> 2022-02-02T17:44:12.1928943Z Feb 02 17:44:12   

[jira] [Commented] (FLINK-24477) Add MongoDB sink

2022-02-07 Thread RocMarshal (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-24477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17488580#comment-17488580
 ] 

RocMarshal commented on FLINK-24477:


Thanks [~nir.tsruya], cc [~monster#12]

> Add MongoDB sink
> 
>
> Key: FLINK-24477
> URL: https://issues.apache.org/jira/browse/FLINK-24477
> Project: Flink
>  Issue Type: New Feature
>  Components: Connectors / Common
>Reporter: Nir Tsruya
>Assignee: Nir Tsruya
>Priority: Minor
>  Labels: pull-request-available
>
> h2. Motivation
> *User stories:*
> As a Flink user, I’d like to use MongoDB as sink for my data pipeline.
> *Scope:*
>  * Implement an asynchronous sink for MongoDB inheriting the AsyncSinkBase 
> class. The implementation can for now reside in its own module in 
> flink-connectors.
>  * Implement an asynchronous sink writer for MongoDB by extending the 
> AsyncSinkWriter. The implemented Sink Writer will be used by the Sink class 
> that will be created as part of this story.
>  * Java / code-level docs.
>  * End to end testing



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[GitHub] [flink] flinkbot edited a comment on pull request #18363: [Flink-25600][table-planner] Support new statement set syntax in sql client and update docs

2022-02-07 Thread GitBox


flinkbot edited a comment on pull request #18363:
URL: https://github.com/apache/flink/pull/18363#issuecomment-1012985353


   
   ## CI report:
   
   * 023810c97360cbec41e825735054561aa9ed2dcd Azure: 
[SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=29599)
 
   * 674933c78081256538559fbdd1d414ee068ed7f9 Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=30879)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #18363: [Flink-25600][table-planner] Support new statement set syntax in sql client and update docs

2022-02-07 Thread GitBox


flinkbot edited a comment on pull request #18363:
URL: https://github.com/apache/flink/pull/18363#issuecomment-1012985353


   
   ## CI report:
   
   * 023810c97360cbec41e825735054561aa9ed2dcd Azure: 
[SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=29599)
 
   * 674933c78081256538559fbdd1d414ee068ed7f9 UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Created] (FLINK-25996) Introduce job property isDynamicGraph to ExecutionConfig

2022-02-07 Thread Zhu Zhu (Jira)
Zhu Zhu created FLINK-25996:
---

 Summary: Introduce job property isDynamicGraph to ExecutionConfig
 Key: FLINK-25996
 URL: https://issues.apache.org/jira/browse/FLINK-25996
 Project: Flink
  Issue Type: Sub-task
  Components: Runtime / Coordination
Reporter: Zhu Zhu
 Fix For: 1.15.0


To enable FLINK-25995 and FLINK-25046 only in the dynamic graph scenario, we need a 
property ExecutionConfig#isDynamicGraph. In the first step, the property will 
be decided automatically: true iff the config {{jobmanager.scheduler}} is 
{{AdaptiveBatch}}.
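The decision rule described above is simple enough to sketch directly. The names below (`SchedulerType`, `isDynamicGraph` as a standalone method) are illustrative assumptions and do not claim to match Flink's actual `ExecutionConfig` API; the point is only that the property is derived from the scheduler setting rather than set by the user.

```java
// Illustrative sketch (not Flink code) of deriving isDynamicGraph from the
// jobmanager.scheduler setting, as proposed in this ticket.
public class DynamicGraphSketch {

    enum SchedulerType { Default, Adaptive, AdaptiveBatch }

    static boolean isDynamicGraph(SchedulerType scheduler) {
        // true iff jobmanager.scheduler == AdaptiveBatch
        return scheduler == SchedulerType.AdaptiveBatch;
    }

    public static void main(String[] args) {
        System.out.println(isDynamicGraph(SchedulerType.AdaptiveBatch)); // true
        System.out.println(isDynamicGraph(SchedulerType.Default));       // false
    }
}
```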



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[GitHub] [flink] zjureel commented on pull request #18303: [FLINK-25085][runtime] Add a scheduled thread pool for periodic tasks in RpcEndpoint

2022-02-07 Thread GitBox


zjureel commented on pull request #18303:
URL: https://github.com/apache/flink/pull/18303#issuecomment-1032181406


   Thanks @XComp @dmvk @KarmaGYZ, I have run this PR in my DevOps pipeline after 
rebasing the commits from master, in builds 20220207.1-20220207.8 at 
https://dev.azure.com/zjureel/flink/_build?definitionId=2, and I find that the 
issue in [FLINK-18356](https://issues.apache.org/jira/browse/FLINK-18356) has 
been fixed. As mentioned above, I will return to the previous implementation in 
https://github.com/apache/flink/pull/18007. What do you think? @KarmaGYZ THX


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Created] (FLINK-25995) Make implicit assumption of SQL local keyBy/groupBy explicit

2022-02-07 Thread Zhu Zhu (Jira)
Zhu Zhu created FLINK-25995:
---

 Summary: Make implicit assumption of SQL local keyBy/groupBy 
explicit
 Key: FLINK-25995
 URL: https://issues.apache.org/jira/browse/FLINK-25995
 Project: Flink
  Issue Type: Sub-task
  Components: Table SQL / Planner
Affects Versions: 1.15.0
Reporter: Zhu Zhu
 Fix For: 1.15.0


If there are multiple consecutive identical groupBy (i.e. keyBy) operations, the 
SQL planner will change all but the first one to use a forward partitioner, so that 
these operators can be chained to reduce unnecessary shuffles.
However, sometimes the local keyBy operators are not chained (e.g. with multiple 
inputs), and this kind of forward partitioner will turn into a forward job 
edge. These forward edges still carry the local keyBy assumption, so they 
cannot be changed into rescale/rebalance edges; otherwise it can lead to 
incorrect results. This prevents the adaptive batch scheduler from determining 
the parallelism of downstream job vertices on other forward edges (see FLINK-25046).
To solve this, I propose to introduce a new {{ForwardForRescalePartitioner}}. 
When the SQL planner optimizes the case of multiple consecutive identical groupBy 
operations, it should use the proposed partitioner, so that the runtime framework can 
further decide whether the partitioner can be changed to rescale or not.
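Why a forward edge carrying a hidden keyBy assumption cannot simply be turned into a rescale/rebalance edge can be illustrated with a small, self-contained routing sketch (this is not Flink code; the method names are illustrative): forward keeps all records of a key in one downstream channel, while round-robin redistribution spreads them across channels, breaking keyed-state locality.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Illustration only: compares which downstream channels a key's records reach
// under a forward partitioner vs. a rebalance-style partitioner.
public class PartitionerSketch {

    // forward: upstream subtask i feeds only downstream channel i
    static int forward(int upstreamSubtask) { return upstreamSubtask; }

    // rebalance/rescale: round-robin over channels, ignoring the key
    static int rebalance(int recordIndex, int downstreamParallelism) {
        return recordIndex % downstreamParallelism;
    }

    /** For each key, collect the set of downstream channels its records hit. */
    static Map<Integer, Set<Integer>> channelsPerKey(boolean useForward, int parallelism) {
        Map<Integer, Set<Integer>> channels = new HashMap<>();
        int recordIndex = 0;
        for (int key = 0; key < 4; key++) {
            int upstream = key % parallelism; // upstream is already keyed
            for (int r = 0; r < 3; r++) {     // three records per key
                int target = useForward
                        ? forward(upstream)
                        : rebalance(recordIndex++, parallelism);
                channels.computeIfAbsent(key, k -> new HashSet<>()).add(target);
            }
        }
        return channels;
    }

    public static void main(String[] args) {
        // forward: every key lands in exactly one downstream channel
        System.out.println(channelsPerKey(true, 2));
        // rebalance: a key's records spread over several channels
        System.out.println(channelsPerKey(false, 2));
    }
}
```

Under forward routing each key maps to a single channel, which is what keyed downstream operators implicitly rely on; under rebalance the same key reaches multiple channels, hence the need for a dedicated partitioner type that tells the runtime whether the switch is safe.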



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (FLINK-25994) Implement FileStoreExpire

2022-02-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-25994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-25994:
---
Labels: pull-request-available  (was: )

> Implement FileStoreExpire
> -
>
> Key: FLINK-25994
> URL: https://issues.apache.org/jira/browse/FLINK-25994
> Project: Flink
>  Issue Type: Sub-task
>  Components: Table Store
>Reporter: Caizhi Weng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.15.0
>
>
> Currently FileStoreExpire does not have an implementation. We need an 
> implementation to clean up old snapshots and related files.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[GitHub] [flink-table-store] tsreaper opened a new pull request #17: [FLINK-25994] Implement FileStoreExpire

2022-02-07 Thread GitBox


tsreaper opened a new pull request #17:
URL: https://github.com/apache/flink-table-store/pull/17


   Currently FileStoreExpire does not have an implementation. We need an 
implementation to clean up old snapshots and related files.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Created] (FLINK-25994) Implement FileStoreExpire

2022-02-07 Thread Caizhi Weng (Jira)
Caizhi Weng created FLINK-25994:
---

 Summary: Implement FileStoreExpire
 Key: FLINK-25994
 URL: https://issues.apache.org/jira/browse/FLINK-25994
 Project: Flink
  Issue Type: Sub-task
  Components: Table Store
Reporter: Caizhi Weng
 Fix For: 1.15.0


Currently FileStoreExpire does not have an implementation. We need an 
implementation to clean up old snapshots and related files.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (FLINK-25152) FLIP-188: Introduce Built-in Dynamic Table Storage

2022-02-07 Thread Caizhi Weng (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-25152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Caizhi Weng updated FLINK-25152:

Labels:   (was: pull-request-available)

> FLIP-188: Introduce Built-in Dynamic Table Storage
> --
>
> Key: FLINK-25152
> URL: https://issues.apache.org/jira/browse/FLINK-25152
> Project: Flink
>  Issue Type: New Feature
>  Components: Table SQL / API, Table SQL / Ecosystem, Table Store
>Reporter: Jingsong Lee
>Assignee: Jingsong Lee
>Priority: Major
> Fix For: 1.15.0, table-store-0.1.0
>
>
> Introduce built-in storage support for dynamic tables: a truly unified 
> changelog & table representation from Flink SQL's perspective. The storage 
> will improve usability a lot.
> More detail see: 
> [https://cwiki.apache.org/confluence/display/FLINK/FLIP-188%3A+Introduce+Built-in+Dynamic+Table+Storage]



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[GitHub] [flink] martin-liu commented on pull request #18561: [FLINK-25874] loose PyFlink dependency of 'python-dateutil' to '>=2.8.0,<3'

2022-02-07 Thread GitBox


martin-liu commented on pull request #18561:
URL: https://github.com/apache/flink/pull/18561#issuecomment-1032150418


   @dianfu The CI check finished successfully. Is there anything else we need?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #18643: [hotfix][docs] replace deprecated class and fix typos

2022-02-07 Thread GitBox


flinkbot edited a comment on pull request #18643:
URL: https://github.com/apache/flink/pull/18643#issuecomment-1031337336


   
   ## CI report:
   
   * ccbf0ea3c22de75e75a10f517c606d83f8c9543c Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=30863)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run azure` re-run the last Azure build
   






[jira] [Commented] (FLINK-25529) java.lang.ClassNotFoundException: org.apache.orc.PhysicalWriter when write bulkly into hive-2.1.1 orc table

2022-02-07 Thread luoyuxia (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-25529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17488558#comment-17488558
 ] 

luoyuxia commented on FLINK-25529:
--

[~straw] I think you can use hive-exec-2.2.0.jar instead. It is backward 
compatible.

> java.lang.ClassNotFoundException: org.apache.orc.PhysicalWriter when write 
> bulkly into hive-2.1.1 orc table
> ---
>
> Key: FLINK-25529
> URL: https://issues.apache.org/jira/browse/FLINK-25529
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Hive
> Environment: hive 2.1.1
> flink 1.12.4
>Reporter: Yuan Zhu
>Priority: Major
>  Labels: pull-request-available
> Attachments: lib.jpg
>
>
> I tried to write data bulkly into hive-2.1.1 with orc format, and encountered 
> java.lang.ClassNotFoundException: org.apache.orc.PhysicalWriter
>  
> Using bulk writer by setting table.exec.hive.fallback-mapred-writer = false;
>  
> {code:java}
> SET 'table.sql-dialect'='hive';
> create table orders(
>     order_id int,
>     order_date timestamp,
>     customer_name string,
>     price decimal(10,3),
>     product_id int,
>     order_status boolean
> )partitioned by (dt string)
> stored as orc;
>  
> SET 'table.sql-dialect'='default';
> create table datagen_source (
> order_id int,
> order_date timestamp(9),
> customer_name varchar,
> price decimal(10,3),
> product_id int,
> order_status boolean
> )with('connector' = 'datagen');
> create catalog myhive with ('type' = 'hive', 'hive-conf-dir' = '/mnt/conf');
> set table.exec.hive.fallback-mapred-writer = false;
> insert into myhive.`default`.orders
> /*+ OPTIONS(
>     'sink.partition-commit.trigger'='process-time',
>     'sink.partition-commit.policy.kind'='metastore,success-file',
>     'sink.rolling-policy.file-size'='128MB',
>     'sink.rolling-policy.rollover-interval'='10s',
>     'sink.rolling-policy.check-interval'='10s',
>     'auto-compaction'='true',
>     'compaction.file-size'='1MB'    ) */
> select *, date_format(now(), 'yyyy-MM-dd') as dt from datagen_source; {code}
> [ERROR] Could not execute SQL statement. Reason:
> java.lang.ClassNotFoundException: org.apache.orc.PhysicalWriter
>  
> My jars in lib dir listed in attachment.
> In HiveTableSink#createStreamSink (line 270), createBulkWriterFactory is called 
> when table.exec.hive.fallback-mapred-writer is false.
> If the table is stored as ORC, HiveShimV200#createOrcBulkWriterFactory is invoked. 
> OrcBulkWriterFactory depends on org.apache.orc.PhysicalWriter from orc-core, 
> but flink-connector-hive excludes orc-core because it conflicts with hive-exec.
>  
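A quick way to confirm the diagnosis above is to probe whether the missing class is actually resolvable from the deployed classpath. This is a generic sketch (the class name comes from the stack trace in this issue; everything else is illustrative):

```java
// Classpath probe for the class the ORC bulk writer needs. If it cannot be
// loaded, orc-core (which provides org.apache.orc.PhysicalWriter) is absent.
public class OrcClasspathCheck {

    /** Returns true if the given class can be loaded from the current classpath. */
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // The ORC bulk writer needs this class; it lives in orc-core, which
        // flink-connector-hive excludes to avoid clashing with hive-exec.
        String cls = "org.apache.orc.PhysicalWriter";
        System.out.println(cls + (isOnClasspath(cls)
                ? " found"
                : " missing: add orc-core or use a hive-exec version that bundles it"));
    }
}
```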





[jira] [Comment Edited] (FLINK-24558) dataStream can not use multiple classloaders

2022-02-07 Thread bai sui (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-24558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17488549#comment-17488549
 ] 

bai sui edited comment on FLINK-24558 at 2/8/22, 2:12 AM:
--

[~RocMarshal] Hi, I have fixed the bugs you commented on in 
https://github.com/apache/flink/pull/17521
Could you take another look at the PR? Thanks :D


was (Author: baisui):
[~RocMarshal]  hi,@RocMarshal ,I have fix the bugs that you have comment on 
https://github.com/apache/flink/pull/17521
     Could you resolve the PR ,thanks:D

> dataStream can not use multiple classloaders 
> -
>
> Key: FLINK-24558
> URL: https://issues.apache.org/jira/browse/FLINK-24558
> Project: Flink
>  Issue Type: Improvement
>  Components: Command Line Client
>Reporter: bai sui
>Assignee: bai sui
>Priority: Minor
>  Labels: pull-request-available
> Attachments: Flink ClassLoader优化 (1).png
>
>
> I created a DataStream demo as shown below. It is a very simple example:
> it reads stream data from a SourceFunction and sends it to a SinkFunction 
> without any processing.
> The point is that the SourceFunction and SinkFunction instances are created 
> through two separate URLClassLoaders with different dependencies, to avoid 
> code conflicts.
> The problem is that after the Flink client submits the job to the server, the 
> server side throws a ClassNotFoundException for a class defined in the dedicated 
> classloader's dependencies; obviously the server side does not use the same 
> classloaders as the client side.
> How can I solve this problem? Can anyone give me some advice?
> Thanks a lot.
>  
> {code:java}
> public class FlinkStreamDemo {
> public static void main(String[] args) throws Exception {
> StreamExecutionEnvironment env = 
> StreamExecutionEnvironment.getExecutionEnvironment();
> SourceFunction sourceFunc = createSourceFunction();
> DataStreamSource dtoDataStreamSource = env.addSource(sourceFunc);
> SinkFunction sinkFunction = createSink();
> dtoDataStreamSource.addSink(sinkFunction);
> env.execute("flink-example");
> }
> private static SinkFunction createSink() {
> URL[] urls = new URL[]{...};
> ClassLoader classLoader = new URLClassLoader(urls);
> ServiceLoader loaders = 
> ServiceLoader.load(ISinkFunctionFactory.class, classLoader);
> Iterator it = loaders.iterator();
> if (it.hasNext()) {
> return it.next().create();
> }
> throw new IllegalStateException();
> }
> private static SourceFunction createSourceFunction() {
> URL[] urls = new URL[]{...};
> ClassLoader classLoader = new URLClassLoader(urls);
> ServiceLoader loaders = 
> ServiceLoader.load(ISourceFunctionFactory.class, classLoader);
> Iterator it = loaders.iterator();
> if (it.hasNext()) {
> return it.next().create();
> }
> throw new IllegalStateException();
> }
> public interface ISinkFunctionFactory {
> SinkFunction create();
> }
> public interface ISourceFunctionFactory {
> SourceFunction create();
> }
> }
> {code}
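The visibility problem described above can be reproduced in isolation: a class available to one classloader is simply not resolvable through a loader that does not know about its location, which is the same shape of failure the server side reports. This sketch is illustrative only and does not use Flink APIs:

```java
import java.net.URL;
import java.net.URLClassLoader;

// Minimal reproduction of the classloader visibility issue: an isolated loader
// cannot see application classes, just as a remote JVM cannot see classes from
// jars that were never shipped to it.
public class ClassLoaderDemo {

    /** Tries to load a class through the given loader, reporting success or failure. */
    static boolean canLoad(ClassLoader loader, String className) {
        try {
            loader.loadClass(className);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Parent = null means only bootstrap classes are visible, mimicking a
        // "server" JVM that never received the client's extra jars.
        ClassLoader isolated = new URLClassLoader(new URL[0], null);

        System.out.println(canLoad(isolated, "java.util.ArrayList")); // bootstrap class: loadable
        System.out.println(canLoad(isolated, "ClassLoaderDemo"));     // application class: not loadable
    }
}
```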





[jira] [Commented] (FLINK-24558) dataStream can not use multiple classloaders

2022-02-07 Thread bai sui (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-24558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17488549#comment-17488549
 ] 

bai sui commented on FLINK-24558:
-

[~RocMarshal] Hi, I have fixed the bugs you commented on in 
https://github.com/apache/flink/pull/17521
Could you take another look at the PR? Thanks :D

> dataStream can not use multiple classloaders 
> -
>
> Key: FLINK-24558
> URL: https://issues.apache.org/jira/browse/FLINK-24558
> Project: Flink
>  Issue Type: Improvement
>  Components: Command Line Client
>Reporter: bai sui
>Assignee: bai sui
>Priority: Minor
>  Labels: pull-request-available
> Attachments: Flink ClassLoader优化 (1).png
>
>
> I created a DataStream demo as shown below. It is a very simple example:
> it reads stream data from a SourceFunction and sends it to a SinkFunction 
> without any processing.
> The point is that the SourceFunction and SinkFunction instances are created 
> through two separate URLClassLoaders with different dependencies, to avoid 
> code conflicts.
> The problem is that after the Flink client submits the job to the server, the 
> server side throws a ClassNotFoundException for a class defined in the dedicated 
> classloader's dependencies; obviously the server side does not use the same 
> classloaders as the client side.
> How can I solve this problem? Can anyone give me some advice?
> Thanks a lot.
>  
> {code:java}
> public class FlinkStreamDemo {
> public static void main(String[] args) throws Exception {
> StreamExecutionEnvironment env = 
> StreamExecutionEnvironment.getExecutionEnvironment();
> SourceFunction sourceFunc = createSourceFunction();
> DataStreamSource dtoDataStreamSource = env.addSource(sourceFunc);
> SinkFunction sinkFunction = createSink();
> dtoDataStreamSource.addSink(sinkFunction);
> env.execute("flink-example");
> }
> private static SinkFunction createSink() {
> URL[] urls = new URL[]{...};
> ClassLoader classLoader = new URLClassLoader(urls);
> ServiceLoader loaders = 
> ServiceLoader.load(ISinkFunctionFactory.class, classLoader);
> Iterator it = loaders.iterator();
> if (it.hasNext()) {
> return it.next().create();
> }
> throw new IllegalStateException();
> }
> private static SourceFunction createSourceFunction() {
> URL[] urls = new URL[]{...};
> ClassLoader classLoader = new URLClassLoader(urls);
> ServiceLoader loaders = 
> ServiceLoader.load(ISourceFunctionFactory.class, classLoader);
> Iterator it = loaders.iterator();
> if (it.hasNext()) {
> return it.next().create();
> }
> throw new IllegalStateException();
> }
> public interface ISinkFunctionFactory {
> SinkFunction create();
> }
> public interface ISourceFunctionFactory {
> SourceFunction create();
> }
> }
> {code}





[jira] [Updated] (FLINK-24558) dataStream can not use multiple classloaders

2022-02-07 Thread bai sui (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-24558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

bai sui updated FLINK-24558:

Labels: pull-request-available  (was: pull-request-available stale-assigned)

> dataStream can not use multiple classloaders 
> -
>
> Key: FLINK-24558
> URL: https://issues.apache.org/jira/browse/FLINK-24558
> Project: Flink
>  Issue Type: Improvement
>  Components: Command Line Client
>Reporter: bai sui
>Assignee: bai sui
>Priority: Minor
>  Labels: pull-request-available
> Attachments: Flink ClassLoader优化 (1).png
>
>
> I created a DataStream demo as shown below. It is a very simple example:
> it reads stream data from a SourceFunction and sends it to a SinkFunction 
> without any processing.
> The point is that the SourceFunction and SinkFunction instances are created 
> through two separate URLClassLoaders with different dependencies, to avoid 
> code conflicts.
> The problem is that after the Flink client submits the job to the server, the 
> server side throws a ClassNotFoundException for a class defined in the dedicated 
> classloader's dependencies; obviously the server side does not use the same 
> classloaders as the client side.
> How can I solve this problem? Can anyone give me some advice?
> Thanks a lot.
>  
> {code:java}
> public class FlinkStreamDemo {
> public static void main(String[] args) throws Exception {
> StreamExecutionEnvironment env = 
> StreamExecutionEnvironment.getExecutionEnvironment();
> SourceFunction sourceFunc = createSourceFunction();
> DataStreamSource dtoDataStreamSource = env.addSource(sourceFunc);
> SinkFunction sinkFunction = createSink();
> dtoDataStreamSource.addSink(sinkFunction);
> env.execute("flink-example");
> }
> private static SinkFunction createSink() {
> URL[] urls = new URL[]{...};
> ClassLoader classLoader = new URLClassLoader(urls);
> ServiceLoader loaders = 
> ServiceLoader.load(ISinkFunctionFactory.class, classLoader);
> Iterator it = loaders.iterator();
> if (it.hasNext()) {
> return it.next().create();
> }
> throw new IllegalStateException();
> }
> private static SourceFunction createSourceFunction() {
> URL[] urls = new URL[]{...};
> ClassLoader classLoader = new URLClassLoader(urls);
> ServiceLoader loaders = 
> ServiceLoader.load(ISourceFunctionFactory.class, classLoader);
> Iterator it = loaders.iterator();
> if (it.hasNext()) {
> return it.next().create();
> }
> throw new IllegalStateException();
> }
> public interface ISinkFunctionFactory {
> SinkFunction create();
> }
> public interface ISourceFunctionFactory {
> SourceFunction create();
> }
> }
> {code}





[GitHub] [flink] mans2singh commented on pull request #18576: [FLINK-25900][Table][Timezone][docs] Added aliases to functions calls to match the columns in docs instead of EXPR$5, EXPR$6, EXPR$7

2022-02-07 Thread GitBox


mans2singh commented on pull request #18576:
URL: https://github.com/apache/flink/pull/18576#issuecomment-1032129266


   @wuchong Can you please review this PR? Thanks.






[jira] [Updated] (FLINK-25801) add cpu processor metric of taskmanager

2022-02-07 Thread Jira


 [ 
https://issues.apache.org/jira/browse/FLINK-25801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

王俊博 updated FLINK-25801:

Priority: Major  (was: Minor)

> add cpu processor metric of taskmanager
> ---
>
> Key: FLINK-25801
> URL: https://issues.apache.org/jira/browse/FLINK-25801
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Metrics
>Reporter: 王俊博
>Priority: Major
>  Labels: pull-request-available
>
> The Flink process already exposes CPU load metrics; if users also know how many 
> CPU processors the environment provides, they can determine whether their job is 
> IO-bound or CPU-bound. But Flink does not expose a metric for the number of CPU 
> processors the container can access, so when the CPU environment (core count) 
> differs across TaskManagers, it is hard to calculate Flink's CPU usage.
>  
> {code:java}
> // code placeholder
> metrics.<Double, Gauge<Double>>gauge("Load", mxBean::getProcessCpuLoad);
> metrics.<Long, Gauge<Long>>gauge("Time", mxBean::getProcessCpuTime); {code}
> Spark provides totalCores in ExecutorSummary to show the number of cores 
> available in each executor.
> [https://spark.apache.org/docs/3.1.1/monitoring.html#:~:text=totalCores,in%20this%20executor.]
> {code:java}
> // code placeholder
> val sb = new StringBuilder
> sb.append(s"""spark_info{version="$SPARK_VERSION_SHORT", 
> revision="$SPARK_REVISION"} 1.0\n""")
> val store = uiRoot.asInstanceOf[SparkUI].store
> store.executorList(true).foreach { executor =>
>   val prefix = "metrics_executor_"
>   val labels = Seq(
> "application_id" -> store.applicationInfo.id,
> "application_name" -> store.applicationInfo.name,
> "executor_id" -> executor.id
>   ).map { case (k, v) => s"""$k="$v"""" }.mkString("{", ", ", "}")
>   sb.append(s"${prefix}rddBlocks$labels ${executor.rddBlocks}\n")
>   sb.append(s"${prefix}memoryUsed_bytes$labels ${executor.memoryUsed}\n")
>   sb.append(s"${prefix}diskUsed_bytes$labels ${executor.diskUsed}\n")
>   sb.append(s"${prefix}totalCores$labels ${executor.totalCores}\n") 
> }{code}
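The gauge the issue asks for could look roughly like the sketch below. A plain map of suppliers stands in for Flink's MetricGroup here so the example is self-contained; the metric name "TotalCores" is borrowed from Spark's terminology and is only a suggestion.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Sketch of the proposed metric: alongside the existing CPU "Load"/"Time"
// gauges, also expose the processor count visible to the JVM. A simple map of
// suppliers stands in for Flink's MetricGroup to keep the sketch self-contained.
public class CpuMetricsSketch {

    static final Map<String, Supplier<Number>> GAUGES = new HashMap<>();

    /** Registers a lazily evaluated gauge under the given name. */
    static void gauge(String name, Supplier<Number> value) {
        GAUGES.put(name, value);
    }

    public static void main(String[] args) {
        // Runtime.availableProcessors() respects container CPU limits on
        // modern JVMs, which is the number this issue asks to expose.
        gauge("TotalCores", () -> Runtime.getRuntime().availableProcessors());
        System.out.println("TotalCores = " + GAUGES.get("TotalCores").get());
    }
}
```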





[GitHub] [flink-statefun] flint-stone closed pull request #299: optimization: flush and lazy synchronize on downstream for propagation

2022-02-07 Thread GitBox


flint-stone closed pull request #299:
URL: https://github.com/apache/flink-statefun/pull/299


   






[jira] [Commented] (FLINK-25916) Using upsert-kafka with a flush buffer results in Null Pointer Exception

2022-02-07 Thread Corey Shaw (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-25916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17488481#comment-17488481
 ] 

Corey Shaw commented on FLINK-25916:


I can confirm that {{timeGMT}} is always non-null.  As mentioned in the 
description, everything works completely fine with no problems of any kind as 
long as I don't use {{sink.buffer-flush.max-rows}} and 
{{sink.buffer-flush.interval}}. The moment I add those configuration options 
(with any values I've tried), I immediately start getting the NullPointerException 
mentioned.

I'm not using checkpoints yet (I'm still just testing), so the flush 
wouldn't be related to a checkpoint.
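For context, the buffering semantics behind these options can be modeled as a reduce-by-primary-key buffer that emits only the latest value per key when it fills up. This is a toy model for illustration, not Flink's actual ReducingUpsertWriter (whose interval-driven flush is where the NPE above is thrown):

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model of sink.buffer-flush.max-rows semantics: rows are reduced by
// primary key, and only the latest value per key is emitted on flush.
public class UpsertBuffer {

    private final int maxRows;
    private final Map<String, String> reduceBuffer = new LinkedHashMap<>();
    private final List<String> emitted = new ArrayList<>();

    UpsertBuffer(int maxRows) {
        this.maxRows = maxRows;
    }

    void add(String key, String value) {
        reduceBuffer.put(key, value); // a later row for the same key overwrites the earlier one
        if (reduceBuffer.size() >= maxRows) {
            flush();
        }
    }

    void flush() {
        reduceBuffer.forEach((k, v) -> emitted.add(k + "=" + v));
        reduceBuffer.clear();
    }

    List<String> emitted() {
        return emitted;
    }
}
```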

> Using upsert-kafka with a flush buffer results in Null Pointer Exception
> 
>
> Key: FLINK-25916
> URL: https://issues.apache.org/jira/browse/FLINK-25916
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Kafka, Table SQL / Runtime
>Affects Versions: 1.15.0, 1.14.3
> Environment: CentOS 7.9 x64
> Intel Xeon Gold 6140 CPU
>Reporter: Corey Shaw
>Priority: Critical
>
> Flink Version: 1.14.3
> upsert-kafka version: 1.14.3
>  
> I have been trying to buffer output from the upsert-kafka connector using the 
> documented parameters {{sink.buffer-flush.max-rows}} and 
> {{sink.buffer-flush.interval}}
> Whenever I attempt to run an INSERT query with buffering, I receive the 
> following error (shortened for brevity):
> {code:java}
> Caused by: java.lang.NullPointerException
> at 
> org.apache.flink.streaming.connectors.kafka.table.ReducingUpsertWriter.flush(ReducingUpsertWriter.java:145)
>  
> at 
> org.apache.flink.streaming.connectors.kafka.table.ReducingUpsertWriter.lambda$registerFlush$3(ReducingUpsertWriter.java:124)
>  
> at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.invokeProcessingTimeCallback(StreamTask.java:1693)
>  
> at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.lambda$null$22(StreamTask.java:1684)
>  
> at 
> org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$1.runThrowing(StreamTaskActionExecutor.java:50)
>  
> at 
> org.apache.flink.streaming.runtime.tasks.mailbox.Mail.run(Mail.java:90) 
> at 
> org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.processMailsWhenDefaultActionUnavailable(MailboxProcessor.java:338)
>  
> at 
> org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.processMail(MailboxProcessor.java:324)
>  
> at 
> org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:201)
>  
> at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:809)
>  
> at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:761)
>  
> at 
> org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java:958)
>  
> at 
> org.apache.flink.runtime.taskmanager.Task.restoreAndInvoke(Task.java:937) 
> at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:766) 
> at org.apache.flink.runtime.taskmanager.Task.run(Task.java:575) 
> at java.lang.Thread.run(Thread.java:829) [?:?] {code}
>  
> If I remove the parameters related to flush buffering, then everything works 
> as expected with no problems at all.  For reference, here is the full setup 
> with source, destination, and queries.  Yes, I realize the INSERT could use 
> an overhaul, but that's not the issue at hand :).
> {code:java}
> CREATE TABLE `source_topic` (
>     `timeGMT` INT,
>     `eventtime` AS TO_TIMESTAMP(FROM_UNIXTIME(`timeGMT`)),
>     `visIdHigh` BIGINT,
>     `visIdLow` BIGINT,
>     `visIdStr` AS CONCAT(IF(`visIdHigh` IS NULL, '', CAST(`visIdHigh` AS 
> STRING)), IF(`visIdLow` IS NULL, '', CAST(`visIdLow` AS STRING))),
>     WATERMARK FOR eventtime AS eventtime - INTERVAL '25' SECONDS
> ) WITH (
>     'connector' = 'kafka',
>     'properties.group.id' = 'flink_metrics',
>     'properties.bootstrap.servers' = 'brokers.example.com:9093',
>     'topic' = 'source_topic',
>     'scan.startup.mode' = 'earliest-offset',
>     'value.format' = 'avro-confluent',
>     'value.avro-confluent.url' = 'http://schema.example.com',
>     'value.fields-include' = 'EXCEPT_KEY'
> );
>  CREATE TABLE dest_topic (
> `messageType` VARCHAR,
> `observationID` BIGINT,
> `obsYear` BIGINT,
> `obsMonth` BIGINT,
> `obsDay` BIGINT,
> `obsHour` BIGINT,
> `obsMinute` BIGINT,
> `obsTz` VARCHAR(5),
> `value` BIGINT,
> PRIMARY KEY (observationID, messageType) NOT ENFORCED
> ) WITH (
> 'connector' = 'upsert-kafka',
> 'key.format' = 'json',
> 'properties.bootstrap.servers' = 'broker
