[jira] [Commented] (FLINK-27900) Decouple the advertisedAddress and rest.bind-address

2023-03-16 Thread Yu Wang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-27900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17701048#comment-17701048
 ] 

Yu Wang commented on FLINK-27900:
-

[~Wencong Liu] Thanks for the reply.

But I have some different view about the advertisedAddress. By checking some 
other open source products, I think the advertise listener(host) is able to 
change. Just to list two of what I know, 
Kafka([https://kafka.apache.org/documentation/#brokerconfigs_advertised.listeners])
 and Pulsar 
([https://pulsar.apache.org/docs/2.11.x/concepts-multiple-advertised-listeners/#advertised-listeners])

And for the aspect of *RedirectingSslHandler.* If we need to put Flink service 
in the internal network and expose the service by a proxy, as the advertised 
address must be same as bind-address, the redirecting handler will always 
return the address that cannot be accessed from the external.

> Decouple the advertisedAddress and rest.bind-address
> 
>
> Key: FLINK-27900
> URL: https://issues.apache.org/jira/browse/FLINK-27900
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / REST
>Affects Versions: 1.10.3, 1.12.0, 1.11.6, 1.13.6, 1.14.4
> Environment: Flink 1.13, 1.12, 1.11, 1.10 with ssl
> Deploy Flink in Kubernetes pod with a nginx sidecar for auth
>Reporter: Yu Wang
>Priority: Minor
>
> Currently the Flink Rest api does not have authentication, according to the 
> doc 
> [https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/deployment/security/security-ssl/#external--rest-connectivity]
>  # We set up the Flink cluster in k8s
>  # We set up a nginx sidecar to enable auth for Flink Rest api.
>  # We set *rest.bind-address* to localhost to hide the original Flink address 
> and port
>  # We enabled the ssl for the Flink Rest api
> It works fine wen the client tried to call the Flink Rest api with *https* 
> scheme.
> But if the client using *http* scheme, the *RedirectingSslHandler* will try 
> to redirect the address to the advertised url. According to 
> {*}RestServerEndpoint{*}, Flink will use the value of *rest.bind-address* as 
> the {*}advertisedAddress{*}. So the client will be redirected to *127.0.0.1* 
> and failed to connect the url.
> So we hope the advertisedAddress can be decoupled with rest.bind-addres, to 
> provide more flexibility to the Flink deployment.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-29234) Dead lock in DefaultLeaderElectionService

2022-09-10 Thread Yu Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-29234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yu Wang updated FLINK-29234:

Affects Version/s: 1.15.2
   1.14.5

> Dead lock in DefaultLeaderElectionService
> -
>
> Key: FLINK-29234
> URL: https://issues.apache.org/jira/browse/FLINK-29234
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.13.5, 1.14.5, 1.15.2
>Reporter: Yu Wang
>Priority: Critical
>
> Jobmanager stop working because the deadlock in DefaultLeaderElectionService.
> The log stopped at
> {code:java}
> org.apache.flink.runtime.leaderelection.DefaultLeaderElectionService [] - 
> Stopping DefaultLeaderElectionService. {code}
> Which may similar to this ticket 
> https://issues.apache.org/jira/browse/FLINK-20008
> Here is the jstack info
> {code:java}
> Found one Java-level deadlock: = 
> "flink-akka.actor.default-dispatcher-18": waiting to lock monitor 
> 0x7f15c7eae3a8 (object 0x000678d395e8, a java.lang.Object), which is 
> held by "main-EventThread" "main-EventThread": waiting to lock monitor 
> 0x7f15a3811258 (object 0x000678cf1be0, a java.lang.Object), which is 
> held by "flink-akka.actor.default-dispatcher-18" Java stack information for 
> the threads listed above: === 
> "flink-akka.actor.default-dispatcher-18": at 
> org.apache.flink.runtime.leaderelection.DefaultLeaderElectionService.stop(DefaultLeaderElectionService.java:104)
>  - waiting to lock <0x000678d395e8> (a java.lang.Object) at 
> org.apache.flink.runtime.jobmaster.JobMasterServiceLeadershipRunner.lambda$closeAsync$0(JobMasterServiceLeadershipRunner.java:147)
>  at 
> org.apache.flink.runtime.jobmaster.JobMasterServiceLeadershipRunner$$Lambda$735/1742012752.run(Unknown
>  Source) at 
> org.apache.flink.runtime.concurrent.FutureUtils.lambda$runAfterwardsAsync$18(FutureUtils.java:687)
>  at 
> org.apache.flink.runtime.concurrent.FutureUtils$$Lambda$736/6716561.accept(Unknown
>  Source) at 
> java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774)
>  at 
> java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750)
>  at 
> java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:456)
>  at 
> org.apache.flink.runtime.concurrent.DirectExecutorService.execute(DirectExecutorService.java:217)
>  at 
> java.util.concurrent.CompletableFuture$UniCompletion.claim(CompletableFuture.java:543)
>  at 
> java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:765)
>  at 
> java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750)
>  at 
> java.util.concurrent.CompletableFuture.uniWhenCompleteStage(CompletableFuture.java:795)
>  at 
> java.util.concurrent.CompletableFuture.whenCompleteAsync(CompletableFuture.java:2163)
>  at 
> org.apache.flink.runtime.concurrent.FutureUtils.runAfterwardsAsync(FutureUtils.java:684)
>  at 
> org.apache.flink.runtime.concurrent.FutureUtils.runAfterwards(FutureUtils.java:651)
>  at 
> org.apache.flink.runtime.jobmaster.JobMasterServiceLeadershipRunner.closeAsync(JobMasterServiceLeadershipRunner.java:143)
>  - locked <0x000678cf1be0> (a java.lang.Object) at 
> org.apache.flink.runtime.dispatcher.Dispatcher.terminateJob(Dispatcher.java:807)
>  at 
> org.apache.flink.runtime.dispatcher.Dispatcher.terminateRunningJobs(Dispatcher.java:799)
>  at 
> org.apache.flink.runtime.dispatcher.Dispatcher.terminateRunningJobsAndGetTerminationFuture(Dispatcher.java:812)
>  at 
> org.apache.flink.runtime.dispatcher.Dispatcher.onStop(Dispatcher.java:268) at 
> org.apache.flink.runtime.rpc.RpcEndpoint.internalCallOnStop(RpcEndpoint.java:214)
>  at 
> org.apache.flink.runtime.rpc.akka.AkkaRpcActor$StartedState.terminate(AkkaRpcActor.java:563)
>  at 
> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleControlMessage(AkkaRpcActor.java:186)
>  at 
> org.apache.flink.runtime.rpc.akka.AkkaRpcActor$$Lambda$444/1289054037.apply(Unknown
>  Source) at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26) at 
> akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21) at 
> scala.PartialFunction.applyOrElse(PartialFunction.scala:123) at 
> scala.PartialFunction.applyOrElse$(PartialFunction.scala:122) at 
> akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:21) at 
> scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171) at 
> scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:172) at 
> akka.actor.Actor.aroundReceive(Actor.scala:517) at 
> akka.actor.Actor.aroundReceive$(Actor.scala:515) at 
> akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:225) at 
> 

[jira] [Comment Edited] (FLINK-29234) Dead lock in DefaultLeaderElectionService

2022-09-10 Thread Yu Wang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-29234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17602689#comment-17602689
 ] 

Yu Wang edited comment on FLINK-29234 at 9/10/22 9:48 AM:
--

[~martijnvisser] I think this issue still exists in the master branch. The 
issue is caused by the ordering of Flink to get the lock.

 

In this line, the class *DefaultLeaderElectionService* will try to get the lock 
of itself, then invoke the method of 

*leaderContender(which is JobMasterServiceLeaderShipRunner in this case).*  

[https://github.com/apache/flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/leaderelection/DefaultLeaderElectionService.java#L204]

So the order of getting locks is
 # DefaultLeaderElectionService
 # JobMasterServiceLeaderShipRunner

 

And in this line *JobMasterServiceLeaderShipRunner* will try to get the lock of 
itself, then invoke the method of *leaderElectionService(which is* 
{*}DefaultLeaderElectionService in this case{*}{*}).{*}

[https://github.com/apache/flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/jobmaster/JobMasterServiceLeadershipRunner.java#L148]

So the order of getting the locks is
 # JobMasterServiceLeaderShipRunner
 # DefaultLeaderElectionService

 

So if these two functions are invoked nearly at the same time, it will cause 
the dead lock issue.

 


was (Author: lucentwong):
[~martijnvisser] I think this issue still exists in the master branch. The 
issue is caused by the ordering of Flink to get the lock.

 

In this line, the class *DefaultLeaderElectionService* will try to get the lock 
of itself, then invoke the method of 

*leaderContender(which is JobMasterServiceLeaderShipRunner in this case).*  

[https://github.com/apache/flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/leaderelection/DefaultLeaderElectionService.java#L204]

So the order of getting locks is
 # DefaultLeaderElectionService
 # JobMasterServiceLeaderShipRunner{*}{*}

 

And in this line *JobMasterServiceLeaderShipRunner* will try to get the lock of 
itself, then invoke the method of *leaderElectionService(which is* 
{*}DefaultLeaderElectionService in this case{*}{*}).{*}

[https://github.com/apache/flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/jobmaster/JobMasterServiceLeadershipRunner.java#L148]

So the order of getting the locks is
 # JobMasterServiceLeaderShipRunner
 # DefaultLeaderElectionService

 

So if these two functions are invoked nearly at the same time, it will cause 
the dead lock issue.

 

> Dead lock in DefaultLeaderElectionService
> -
>
> Key: FLINK-29234
> URL: https://issues.apache.org/jira/browse/FLINK-29234
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.13.5
>Reporter: Yu Wang
>Priority: Critical
>
> Jobmanager stop working because the deadlock in DefaultLeaderElectionService.
> The log stopped at
> {code:java}
> org.apache.flink.runtime.leaderelection.DefaultLeaderElectionService [] - 
> Stopping DefaultLeaderElectionService. {code}
> Which may similar to this ticket 
> https://issues.apache.org/jira/browse/FLINK-20008
> Here is the jstack info
> {code:java}
> Found one Java-level deadlock: = 
> "flink-akka.actor.default-dispatcher-18": waiting to lock monitor 
> 0x7f15c7eae3a8 (object 0x000678d395e8, a java.lang.Object), which is 
> held by "main-EventThread" "main-EventThread": waiting to lock monitor 
> 0x7f15a3811258 (object 0x000678cf1be0, a java.lang.Object), which is 
> held by "flink-akka.actor.default-dispatcher-18" Java stack information for 
> the threads listed above: === 
> "flink-akka.actor.default-dispatcher-18": at 
> org.apache.flink.runtime.leaderelection.DefaultLeaderElectionService.stop(DefaultLeaderElectionService.java:104)
>  - waiting to lock <0x000678d395e8> (a java.lang.Object) at 
> org.apache.flink.runtime.jobmaster.JobMasterServiceLeadershipRunner.lambda$closeAsync$0(JobMasterServiceLeadershipRunner.java:147)
>  at 
> org.apache.flink.runtime.jobmaster.JobMasterServiceLeadershipRunner$$Lambda$735/1742012752.run(Unknown
>  Source) at 
> org.apache.flink.runtime.concurrent.FutureUtils.lambda$runAfterwardsAsync$18(FutureUtils.java:687)
>  at 
> org.apache.flink.runtime.concurrent.FutureUtils$$Lambda$736/6716561.accept(Unknown
>  Source) at 
> java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774)
>  at 
> java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750)
>  at 
> java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:456)
>  at 
> 

[jira] [Comment Edited] (FLINK-29234) Dead lock in DefaultLeaderElectionService

2022-09-10 Thread Yu Wang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-29234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17602689#comment-17602689
 ] 

Yu Wang edited comment on FLINK-29234 at 9/10/22 9:47 AM:
--

[~martijnvisser] I think this issue still exists in the master branch. The 
issue is caused by the ordering of Flink to get the lock.

 

In this line, the class *DefaultLeaderElectionService* will try to get the lock 
of itself, then invoke the method of 

*leaderContender(which is JobMasterServiceLeaderShipRunner in this case).*  

[https://github.com/apache/flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/leaderelection/DefaultLeaderElectionService.java#L204]

So the order of getting locks is
 # DefaultLeaderElectionService
 # JobMasterServiceLeaderShipRunner{*}{*}

 

And in this line *JobMasterServiceLeaderShipRunner* will try to get the lock of 
itself, then invoke the method of *leaderElectionService(which is* 
{*}DefaultLeaderElectionService in this case{*}{*}).{*}

[https://github.com/apache/flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/jobmaster/JobMasterServiceLeadershipRunner.java#L148]

So the order of getting the locks is
 # JobMasterServiceLeaderShipRunner
 # DefaultLeaderElectionService

 

So if these two functions are invoked nearly at the same time, it will cause 
the dead lock issue.

 


was (Author: lucentwong):
[~martijnvisser] I think this issue still exists in the master branch. The 
issue is caused by the ordering of Flink to get the lock.

 

In this line, the class *DefaultLeaderElectionService* will try to get the lock 
of itself, then invoke the method of 

*leaderContender(which is JobMasterServiceLeaderShipRunner in this case).*   

[https://github.com/apache/flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/leaderelection/DefaultLeaderElectionService.java#L204]

So the order of getting locks is
 # DefaultLeaderElectionService
 # JobMasterServiceLeaderShipRunner{*}{*}

 

And in this line *JobMasterServiceLeaderShipRunner* will try to get the lock of 
itself, then invoke the method of *leaderElectionService(which is* 
{*}DefaultLeaderElectionService in this case{*}{*}).{*}

[https://github.com/apache/flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/jobmaster/JobMasterServiceLeadershipRunner.java#L148]

So the order of getting the locks is
 # JobMasterServiceLeaderShipRunner
 # DefaultLeaderElectionService

 

So if these two functions are invoked nearly at the same time, it will cause 
the dead lock issue.

 

> Dead lock in DefaultLeaderElectionService
> -
>
> Key: FLINK-29234
> URL: https://issues.apache.org/jira/browse/FLINK-29234
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.13.5
>Reporter: Yu Wang
>Priority: Critical
>
> Jobmanager stop working because the deadlock in DefaultLeaderElectionService.
> The log stopped at
> {code:java}
> org.apache.flink.runtime.leaderelection.DefaultLeaderElectionService [] - 
> Stopping DefaultLeaderElectionService. {code}
> Which may similar to this ticket 
> https://issues.apache.org/jira/browse/FLINK-20008
> Here is the jstack info
> {code:java}
> Found one Java-level deadlock: = 
> "flink-akka.actor.default-dispatcher-18": waiting to lock monitor 
> 0x7f15c7eae3a8 (object 0x000678d395e8, a java.lang.Object), which is 
> held by "main-EventThread" "main-EventThread": waiting to lock monitor 
> 0x7f15a3811258 (object 0x000678cf1be0, a java.lang.Object), which is 
> held by "flink-akka.actor.default-dispatcher-18" Java stack information for 
> the threads listed above: === 
> "flink-akka.actor.default-dispatcher-18": at 
> org.apache.flink.runtime.leaderelection.DefaultLeaderElectionService.stop(DefaultLeaderElectionService.java:104)
>  - waiting to lock <0x000678d395e8> (a java.lang.Object) at 
> org.apache.flink.runtime.jobmaster.JobMasterServiceLeadershipRunner.lambda$closeAsync$0(JobMasterServiceLeadershipRunner.java:147)
>  at 
> org.apache.flink.runtime.jobmaster.JobMasterServiceLeadershipRunner$$Lambda$735/1742012752.run(Unknown
>  Source) at 
> org.apache.flink.runtime.concurrent.FutureUtils.lambda$runAfterwardsAsync$18(FutureUtils.java:687)
>  at 
> org.apache.flink.runtime.concurrent.FutureUtils$$Lambda$736/6716561.accept(Unknown
>  Source) at 
> java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774)
>  at 
> java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750)
>  at 
> java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:456)
>  at 
> 

[jira] [Commented] (FLINK-29234) Dead lock in DefaultLeaderElectionService

2022-09-10 Thread Yu Wang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-29234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17602689#comment-17602689
 ] 

Yu Wang commented on FLINK-29234:
-

[~martijnvisser] I think this issue still exists in the master branch. The 
issue is caused by the ordering of Flink to get the lock.

 

In this line, the class *DefaultLeaderElectionService* will try to get the lock 
of itself, then invoke the method of 

*leaderContender(which is JobMasterServiceLeaderShipRunner in this case).*   

[https://github.com/apache/flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/leaderelection/DefaultLeaderElectionService.java#L204]

So the order of getting locks is
 # DefaultLeaderElectionService
 # JobMasterServiceLeaderShipRunner{*}{*}

 

And in this line *JobMasterServiceLeaderShipRunner* will try to get the lock of 
itself, then invoke the method of *leaderElectionService(which is* 
{*}DefaultLeaderElectionService in this case{*}{*}).{*}

[https://github.com/apache/flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/jobmaster/JobMasterServiceLeadershipRunner.java#L148]

So the order of getting the locks is
 # JobMasterServiceLeaderShipRunner
 # DefaultLeaderElectionService

 

So if these two functions are invoked nearly at the same time, it will cause 
the dead lock issue.

 

> Dead lock in DefaultLeaderElectionService
> -
>
> Key: FLINK-29234
> URL: https://issues.apache.org/jira/browse/FLINK-29234
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.13.5
>Reporter: Yu Wang
>Priority: Critical
>
> Jobmanager stop working because the deadlock in DefaultLeaderElectionService.
> The log stopped at
> {code:java}
> org.apache.flink.runtime.leaderelection.DefaultLeaderElectionService [] - 
> Stopping DefaultLeaderElectionService. {code}
> Which may similar to this ticket 
> https://issues.apache.org/jira/browse/FLINK-20008
> Here is the jstack info
> {code:java}
> Found one Java-level deadlock: = 
> "flink-akka.actor.default-dispatcher-18": waiting to lock monitor 
> 0x7f15c7eae3a8 (object 0x000678d395e8, a java.lang.Object), which is 
> held by "main-EventThread" "main-EventThread": waiting to lock monitor 
> 0x7f15a3811258 (object 0x000678cf1be0, a java.lang.Object), which is 
> held by "flink-akka.actor.default-dispatcher-18" Java stack information for 
> the threads listed above: === 
> "flink-akka.actor.default-dispatcher-18": at 
> org.apache.flink.runtime.leaderelection.DefaultLeaderElectionService.stop(DefaultLeaderElectionService.java:104)
>  - waiting to lock <0x000678d395e8> (a java.lang.Object) at 
> org.apache.flink.runtime.jobmaster.JobMasterServiceLeadershipRunner.lambda$closeAsync$0(JobMasterServiceLeadershipRunner.java:147)
>  at 
> org.apache.flink.runtime.jobmaster.JobMasterServiceLeadershipRunner$$Lambda$735/1742012752.run(Unknown
>  Source) at 
> org.apache.flink.runtime.concurrent.FutureUtils.lambda$runAfterwardsAsync$18(FutureUtils.java:687)
>  at 
> org.apache.flink.runtime.concurrent.FutureUtils$$Lambda$736/6716561.accept(Unknown
>  Source) at 
> java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774)
>  at 
> java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750)
>  at 
> java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:456)
>  at 
> org.apache.flink.runtime.concurrent.DirectExecutorService.execute(DirectExecutorService.java:217)
>  at 
> java.util.concurrent.CompletableFuture$UniCompletion.claim(CompletableFuture.java:543)
>  at 
> java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:765)
>  at 
> java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750)
>  at 
> java.util.concurrent.CompletableFuture.uniWhenCompleteStage(CompletableFuture.java:795)
>  at 
> java.util.concurrent.CompletableFuture.whenCompleteAsync(CompletableFuture.java:2163)
>  at 
> org.apache.flink.runtime.concurrent.FutureUtils.runAfterwardsAsync(FutureUtils.java:684)
>  at 
> org.apache.flink.runtime.concurrent.FutureUtils.runAfterwards(FutureUtils.java:651)
>  at 
> org.apache.flink.runtime.jobmaster.JobMasterServiceLeadershipRunner.closeAsync(JobMasterServiceLeadershipRunner.java:143)
>  - locked <0x000678cf1be0> (a java.lang.Object) at 
> org.apache.flink.runtime.dispatcher.Dispatcher.terminateJob(Dispatcher.java:807)
>  at 
> org.apache.flink.runtime.dispatcher.Dispatcher.terminateRunningJobs(Dispatcher.java:799)
>  at 
> org.apache.flink.runtime.dispatcher.Dispatcher.terminateRunningJobsAndGetTerminationFuture(Dispatcher.java:812)
>  at 
> 

[jira] [Updated] (FLINK-29234) Dead lock in DefaultLeaderElectionService

2022-09-08 Thread Yu Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-29234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yu Wang updated FLINK-29234:

Description: 
Jobmanager stop working because the deadlock in DefaultLeaderElectionService.

The log stopped at
{code:java}
org.apache.flink.runtime.leaderelection.DefaultLeaderElectionService [] - 
Stopping DefaultLeaderElectionService. {code}
Which may similar to this ticket 
https://issues.apache.org/jira/browse/FLINK-20008

Here is the jstack info
{code:java}
Found one Java-level deadlock: = 
"flink-akka.actor.default-dispatcher-18": waiting to lock monitor 
0x7f15c7eae3a8 (object 0x000678d395e8, a java.lang.Object), which is 
held by "main-EventThread" "main-EventThread": waiting to lock monitor 
0x7f15a3811258 (object 0x000678cf1be0, a java.lang.Object), which is 
held by "flink-akka.actor.default-dispatcher-18" Java stack information for the 
threads listed above: === 
"flink-akka.actor.default-dispatcher-18": at 


org.apache.flink.runtime.leaderelection.DefaultLeaderElectionService.stop(DefaultLeaderElectionService.java:104)
 - waiting to lock <0x000678d395e8> (a java.lang.Object) at 
org.apache.flink.runtime.jobmaster.JobMasterServiceLeadershipRunner.lambda$closeAsync$0(JobMasterServiceLeadershipRunner.java:147)
 at 
org.apache.flink.runtime.jobmaster.JobMasterServiceLeadershipRunner$$Lambda$735/1742012752.run(Unknown
 Source) at 
org.apache.flink.runtime.concurrent.FutureUtils.lambda$runAfterwardsAsync$18(FutureUtils.java:687)
 at 
org.apache.flink.runtime.concurrent.FutureUtils$$Lambda$736/6716561.accept(Unknown
 Source) at 
java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774)
 at 
java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750)
 at 
java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:456)
 at 
org.apache.flink.runtime.concurrent.DirectExecutorService.execute(DirectExecutorService.java:217)
 at 
java.util.concurrent.CompletableFuture$UniCompletion.claim(CompletableFuture.java:543)
 at 
java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:765)
 at 
java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750)
 at 
java.util.concurrent.CompletableFuture.uniWhenCompleteStage(CompletableFuture.java:795)
 at 
java.util.concurrent.CompletableFuture.whenCompleteAsync(CompletableFuture.java:2163)
 at 
org.apache.flink.runtime.concurrent.FutureUtils.runAfterwardsAsync(FutureUtils.java:684)
 at 
org.apache.flink.runtime.concurrent.FutureUtils.runAfterwards(FutureUtils.java:651)
 at 
org.apache.flink.runtime.jobmaster.JobMasterServiceLeadershipRunner.closeAsync(JobMasterServiceLeadershipRunner.java:143)
 - locked <0x000678cf1be0> (a java.lang.Object) at 
org.apache.flink.runtime.dispatcher.Dispatcher.terminateJob(Dispatcher.java:807)
 at 
org.apache.flink.runtime.dispatcher.Dispatcher.terminateRunningJobs(Dispatcher.java:799)
 at 
org.apache.flink.runtime.dispatcher.Dispatcher.terminateRunningJobsAndGetTerminationFuture(Dispatcher.java:812)
 at org.apache.flink.runtime.dispatcher.Dispatcher.onStop(Dispatcher.java:268) 
at 
org.apache.flink.runtime.rpc.RpcEndpoint.internalCallOnStop(RpcEndpoint.java:214)
 at 
org.apache.flink.runtime.rpc.akka.AkkaRpcActor$StartedState.terminate(AkkaRpcActor.java:563)
 at 
org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleControlMessage(AkkaRpcActor.java:186)
 at 
org.apache.flink.runtime.rpc.akka.AkkaRpcActor$$Lambda$444/1289054037.apply(Unknown
 Source) at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26) at 
akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21) at 
scala.PartialFunction.applyOrElse(PartialFunction.scala:123) at 
scala.PartialFunction.applyOrElse$(PartialFunction.scala:122) at 
akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:21) at 
scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171) at 
scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:172) at 
akka.actor.Actor.aroundReceive(Actor.scala:517) at 
akka.actor.Actor.aroundReceive$(Actor.scala:515) at 
akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:225) at 
akka.actor.ActorCell.receiveMessage(ActorCell.scala:592) at 
akka.actor.ActorCell.invoke(ActorCell.scala:561) at 
akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258) at 
akka.dispatch.Mailbox.run(Mailbox.scala:225) at 
akka.dispatch.Mailbox.exec(Mailbox.scala:235) at 
akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) at 
akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) 
at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) at 
akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) 


"main-EventThread": at 

[jira] [Updated] (FLINK-29234) Dead lock in DefaultLeaderElectionService

2022-09-08 Thread Yu Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-29234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yu Wang updated FLINK-29234:

Description: 
Jobmanager stop working because the deadlock in DefaultLeaderElectionService. 
Which may similar to this ticket 
https://issues.apache.org/jira/browse/FLINK-20008

Here is the jstak info
{code:java}
Found one Java-level deadlock: = 
"flink-akka.actor.default-dispatcher-18": waiting to lock monitor 
0x7f15c7eae3a8 (object 0x000678d395e8, a java.lang.Object), which is 
held by "main-EventThread" "main-EventThread": waiting to lock monitor 
0x7f15a3811258 (object 0x000678cf1be0, a java.lang.Object), which is 
held by "flink-akka.actor.default-dispatcher-18" Java stack information for the 
threads listed above: === 
"flink-akka.actor.default-dispatcher-18": at 


org.apache.flink.runtime.leaderelection.DefaultLeaderElectionService.stop(DefaultLeaderElectionService.java:104)
 - waiting to lock <0x000678d395e8> (a java.lang.Object) at 
org.apache.flink.runtime.jobmaster.JobMasterServiceLeadershipRunner.lambda$closeAsync$0(JobMasterServiceLeadershipRunner.java:147)
 at 
org.apache.flink.runtime.jobmaster.JobMasterServiceLeadershipRunner$$Lambda$735/1742012752.run(Unknown
 Source) at 
org.apache.flink.runtime.concurrent.FutureUtils.lambda$runAfterwardsAsync$18(FutureUtils.java:687)
 at 
org.apache.flink.runtime.concurrent.FutureUtils$$Lambda$736/6716561.accept(Unknown
 Source) at 
java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774)
 at 
java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750)
 at 
java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:456)
 at 
org.apache.flink.runtime.concurrent.DirectExecutorService.execute(DirectExecutorService.java:217)
 at 
java.util.concurrent.CompletableFuture$UniCompletion.claim(CompletableFuture.java:543)
 at 
java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:765)
 at 
java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750)
 at 
java.util.concurrent.CompletableFuture.uniWhenCompleteStage(CompletableFuture.java:795)
 at 
java.util.concurrent.CompletableFuture.whenCompleteAsync(CompletableFuture.java:2163)
 at 
org.apache.flink.runtime.concurrent.FutureUtils.runAfterwardsAsync(FutureUtils.java:684)
 at 
org.apache.flink.runtime.concurrent.FutureUtils.runAfterwards(FutureUtils.java:651)
 at 
org.apache.flink.runtime.jobmaster.JobMasterServiceLeadershipRunner.closeAsync(JobMasterServiceLeadershipRunner.java:143)
 - locked <0x000678cf1be0> (a java.lang.Object) at 
org.apache.flink.runtime.dispatcher.Dispatcher.terminateJob(Dispatcher.java:807)
 at 
org.apache.flink.runtime.dispatcher.Dispatcher.terminateRunningJobs(Dispatcher.java:799)
 at 
org.apache.flink.runtime.dispatcher.Dispatcher.terminateRunningJobsAndGetTerminationFuture(Dispatcher.java:812)
 at org.apache.flink.runtime.dispatcher.Dispatcher.onStop(Dispatcher.java:268) 
at 
org.apache.flink.runtime.rpc.RpcEndpoint.internalCallOnStop(RpcEndpoint.java:214)
 at 
org.apache.flink.runtime.rpc.akka.AkkaRpcActor$StartedState.terminate(AkkaRpcActor.java:563)
 at 
org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleControlMessage(AkkaRpcActor.java:186)
 at 
org.apache.flink.runtime.rpc.akka.AkkaRpcActor$$Lambda$444/1289054037.apply(Unknown
 Source) at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26) at 
akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21) at 
scala.PartialFunction.applyOrElse(PartialFunction.scala:123) at 
scala.PartialFunction.applyOrElse$(PartialFunction.scala:122) at 
akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:21) at 
scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171) at 
scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:172) at 
akka.actor.Actor.aroundReceive(Actor.scala:517) at 
akka.actor.Actor.aroundReceive$(Actor.scala:515) at 
akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:225) at 
akka.actor.ActorCell.receiveMessage(ActorCell.scala:592) at 
akka.actor.ActorCell.invoke(ActorCell.scala:561) at 
akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258) at 
akka.dispatch.Mailbox.run(Mailbox.scala:225) at 
akka.dispatch.Mailbox.exec(Mailbox.scala:235) at 
akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) at 
akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) 
at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) at 
akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) 


"main-EventThread": at 
org.apache.flink.runtime.jobmaster.JobMasterServiceLeadershipRunner.runIfStateRunning(JobMasterServiceLeadershipRunner.java:468)
 - waiting to lock <0x000678cf1be0> (a java.lang.Object) at 

[jira] [Updated] (FLINK-29234) Dead lock in DefaultLeaderElectionService

2022-09-08 Thread Yu Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-29234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yu Wang updated FLINK-29234:

Priority: Critical  (was: Major)

> Dead lock in DefaultLeaderElectionService
> -
>
> Key: FLINK-29234
> URL: https://issues.apache.org/jira/browse/FLINK-29234
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.13.5
>Reporter: Yu Wang
>Priority: Critical
>
> Jobmanager stop working because the deadlock in DefaultLeaderElectionService. 
> Which may similar to this ticket 
> https://issues.apache.org/jira/browse/FLINK-20008
> Here is the jstak info
> {code:java}
> Found one Java-level deadlock: = 
> "flink-akka.actor.default-dispatcher-18": waiting to lock monitor 
> 0x7f15c7eae3a8 (object 0x000678d395e8, a java.lang.Object), which is 
> held by "main-EventThread" "main-EventThread": waiting to lock monitor 
> 0x7f15a3811258 (object 0x000678cf1be0, a java.lang.Object), which is 
> held by "flink-akka.actor.default-dispatcher-18" Java stack information for 
> the threads listed above: === 
> "flink-akka.actor.default-dispatcher-18": at 
> org.apache.flink.runtime.leaderelection.DefaultLeaderElectionService.stop(DefaultLeaderElectionService.java:104)
>  - waiting to lock <0x000678d395e8> (a java.lang.Object) at 
> org.apache.flink.runtime.jobmaster.JobMasterServiceLeadershipRunner.lambda$closeAsync$0(JobMasterServiceLeadershipRunner.java:147)
>  at 
> org.apache.flink.runtime.jobmaster.JobMasterServiceLeadershipRunner$$Lambda$735/1742012752.run(Unknown
>  Source) at 
> org.apache.flink.runtime.concurrent.FutureUtils.lambda$runAfterwardsAsync$18(FutureUtils.java:687)
>  at 
> org.apache.flink.runtime.concurrent.FutureUtils$$Lambda$736/6716561.accept(Unknown
>  Source) at 
> java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774)
>  at 
> java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750)
>  at 
> java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:456)
>  at 
> org.apache.flink.runtime.concurrent.DirectExecutorService.execute(DirectExecutorService.java:217)
>  at 
> java.util.concurrent.CompletableFuture$UniCompletion.claim(CompletableFuture.java:543)
>  at 
> java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:765)
>  at 
> java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750)
>  at 
> java.util.concurrent.CompletableFuture.uniWhenCompleteStage(CompletableFuture.java:795)
>  at 
> java.util.concurrent.CompletableFuture.whenCompleteAsync(CompletableFuture.java:2163)
>  at 
> org.apache.flink.runtime.concurrent.FutureUtils.runAfterwardsAsync(FutureUtils.java:684)
>  at 
> org.apache.flink.runtime.concurrent.FutureUtils.runAfterwards(FutureUtils.java:651)
>  at 
> org.apache.flink.runtime.jobmaster.JobMasterServiceLeadershipRunner.closeAsync(JobMasterServiceLeadershipRunner.java:143)
>  - locked <0x000678cf1be0> (a java.lang.Object) at 
> org.apache.flink.runtime.dispatcher.Dispatcher.terminateJob(Dispatcher.java:807)
>  at 
> org.apache.flink.runtime.dispatcher.Dispatcher.terminateRunningJobs(Dispatcher.java:799)
>  at 
> org.apache.flink.runtime.dispatcher.Dispatcher.terminateRunningJobsAndGetTerminationFuture(Dispatcher.java:812)
>  at 
> org.apache.flink.runtime.dispatcher.Dispatcher.onStop(Dispatcher.java:268) at 
> org.apache.flink.runtime.rpc.RpcEndpoint.internalCallOnStop(RpcEndpoint.java:214)
>  at 
> org.apache.flink.runtime.rpc.akka.AkkaRpcActor$StartedState.terminate(AkkaRpcActor.java:563)
>  at 
> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleControlMessage(AkkaRpcActor.java:186)
>  at 
> org.apache.flink.runtime.rpc.akka.AkkaRpcActor$$Lambda$444/1289054037.apply(Unknown
>  Source) at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26) at 
> akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21) at 
> scala.PartialFunction.applyOrElse(PartialFunction.scala:123) at 
> scala.PartialFunction.applyOrElse$(PartialFunction.scala:122) at 
> akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:21) at 
> scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171) at 
> scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:172) at 
> akka.actor.Actor.aroundReceive(Actor.scala:517) at 
> akka.actor.Actor.aroundReceive$(Actor.scala:515) at 
> akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:225) at 
> akka.actor.ActorCell.receiveMessage(ActorCell.scala:592) at 
> akka.actor.ActorCell.invoke(ActorCell.scala:561) at 
> akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258) at 
> akka.dispatch.Mailbox.run(Mailbox.scala:225) at 
> 

[jira] [Created] (FLINK-29234) Dead lock in DefaultLeaderElectionService

2022-09-08 Thread Yu Wang (Jira)
Yu Wang created FLINK-29234:
---

 Summary: Dead lock in DefaultLeaderElectionService
 Key: FLINK-29234
 URL: https://issues.apache.org/jira/browse/FLINK-29234
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Coordination
Affects Versions: 1.13.5
Reporter: Yu Wang


Jobmanager stop working because the deadlock in DefaultLeaderElectionService. 
Which may similar to this ticket 
https://issues.apache.org/jira/browse/FLINK-20008

Here is the jstak info
{code:java}
Found one Java-level deadlock: = 
"flink-akka.actor.default-dispatcher-18": waiting to lock monitor 
0x7f15c7eae3a8 (object 0x000678d395e8, a java.lang.Object), which is 
held by "main-EventThread" "main-EventThread": waiting to lock monitor 
0x7f15a3811258 (object 0x000678cf1be0, a java.lang.Object), which is 
held by "flink-akka.actor.default-dispatcher-18" Java stack information for the 
threads listed above: === 
"flink-akka.actor.default-dispatcher-18": at 
org.apache.flink.runtime.leaderelection.DefaultLeaderElectionService.stop(DefaultLeaderElectionService.java:104)
 - waiting to lock <0x000678d395e8> (a java.lang.Object) at 
org.apache.flink.runtime.jobmaster.JobMasterServiceLeadershipRunner.lambda$closeAsync$0(JobMasterServiceLeadershipRunner.java:147)
 at 
org.apache.flink.runtime.jobmaster.JobMasterServiceLeadershipRunner$$Lambda$735/1742012752.run(Unknown
 Source) at 
org.apache.flink.runtime.concurrent.FutureUtils.lambda$runAfterwardsAsync$18(FutureUtils.java:687)
 at 
org.apache.flink.runtime.concurrent.FutureUtils$$Lambda$736/6716561.accept(Unknown
 Source) at 
java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774)
 at 
java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750)
 at 
java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:456)
 at 
org.apache.flink.runtime.concurrent.DirectExecutorService.execute(DirectExecutorService.java:217)
 at 
java.util.concurrent.CompletableFuture$UniCompletion.claim(CompletableFuture.java:543)
 at 
java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:765)
 at 
java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750)
 at 
java.util.concurrent.CompletableFuture.uniWhenCompleteStage(CompletableFuture.java:795)
 at 
java.util.concurrent.CompletableFuture.whenCompleteAsync(CompletableFuture.java:2163)
 at 
org.apache.flink.runtime.concurrent.FutureUtils.runAfterwardsAsync(FutureUtils.java:684)
 at 
org.apache.flink.runtime.concurrent.FutureUtils.runAfterwards(FutureUtils.java:651)
 at 
org.apache.flink.runtime.jobmaster.JobMasterServiceLeadershipRunner.closeAsync(JobMasterServiceLeadershipRunner.java:143)
 - locked <0x000678cf1be0> (a java.lang.Object) at 
org.apache.flink.runtime.dispatcher.Dispatcher.terminateJob(Dispatcher.java:807)
 at 
org.apache.flink.runtime.dispatcher.Dispatcher.terminateRunningJobs(Dispatcher.java:799)
 at 
org.apache.flink.runtime.dispatcher.Dispatcher.terminateRunningJobsAndGetTerminationFuture(Dispatcher.java:812)
 at org.apache.flink.runtime.dispatcher.Dispatcher.onStop(Dispatcher.java:268) 
at 
org.apache.flink.runtime.rpc.RpcEndpoint.internalCallOnStop(RpcEndpoint.java:214)
 at 
org.apache.flink.runtime.rpc.akka.AkkaRpcActor$StartedState.terminate(AkkaRpcActor.java:563)
 at 
org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleControlMessage(AkkaRpcActor.java:186)
 at 
org.apache.flink.runtime.rpc.akka.AkkaRpcActor$$Lambda$444/1289054037.apply(Unknown
 Source) at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26) at 
akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21) at 
scala.PartialFunction.applyOrElse(PartialFunction.scala:123) at 
scala.PartialFunction.applyOrElse$(PartialFunction.scala:122) at 
akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:21) at 
scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171) at 
scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:172) at 
akka.actor.Actor.aroundReceive(Actor.scala:517) at 
akka.actor.Actor.aroundReceive$(Actor.scala:515) at 
akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:225) at 
akka.actor.ActorCell.receiveMessage(ActorCell.scala:592) at 
akka.actor.ActorCell.invoke(ActorCell.scala:561) at 
akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258) at 
akka.dispatch.Mailbox.run(Mailbox.scala:225) at 
akka.dispatch.Mailbox.exec(Mailbox.scala:235) at 
akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) at 
akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) 
at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) at 
akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) 
"main-EventThread": at 

[jira] [Updated] (FLINK-27900) Decouple the advertisedAddress and rest.bind-address

2022-06-06 Thread Yu Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-27900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yu Wang updated FLINK-27900:

Environment: 
Flink 1.13, 1.12, 1.11, 1.10 with ssl

Deploy Flink in Kubernetes pod with a nginx sidecar for auth

  was:
Flink 1.13, 1.12, 1.11, 1.10

Deploy Flink in Kubernetes pod with a nginx sidecar for auth


> Decouple the advertisedAddress and rest.bind-address
> 
>
> Key: FLINK-27900
> URL: https://issues.apache.org/jira/browse/FLINK-27900
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / REST
>Affects Versions: 1.10.3, 1.12.0, 1.11.6, 1.13.6, 1.14.4
> Environment: Flink 1.13, 1.12, 1.11, 1.10 with ssl
> Deploy Flink in Kubernetes pod with a nginx sidecar for auth
>Reporter: Yu Wang
>Priority: Minor
>
> Currently the Flink Rest api does not have authentication, according to the 
> doc 
> [https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/deployment/security/security-ssl/#external--rest-connectivity]
>  # We set up the Flink cluster in k8s
>  # We set up a nginx sidecar to enable auth for Flink Rest api.
>  # We set *rest.bind-address* to localhost to hide the original Flink address 
> and port
>  # We enabled the ssl for the Flink Rest api
> It works fine wen the client tried to call the Flink Rest api with *https* 
> scheme.
> But if the client using *http* scheme, the *RedirectingSslHandler* will try 
> to redirect the address to the advertised url. According to 
> {*}RestServerEndpoint{*}, Flink will use the value of *rest.bind-address* as 
> the {*}advertisedAddress{*}. So the client will be redirected to *127.0.0.1* 
> and failed to connect the url.
> So we hope the advertisedAddress can be decoupled with rest.bind-addres, to 
> provide more flexibility to the Flink deployment.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Updated] (FLINK-27900) Decouple the advertisedAddress and rest.bind-address

2022-06-06 Thread Yu Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-27900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yu Wang updated FLINK-27900:

Description: 
Currently the Flink Rest api does not have authentication, according to the doc 
[https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/deployment/security/security-ssl/#external--rest-connectivity]
 # We set up the Flink cluster in k8s
 # We set up a nginx sidecar to enable auth for Flink Rest api.
 # We set *rest.bind-address* to localhost to hide the original Flink address 
and port
 # We enabled the ssl for the Flink Rest api

It works fine wen the client tried to call the Flink Rest api with *https* 
scheme.

But if the client using *http* scheme, the *RedirectingSslHandler* will try to 
redirect the address to the advertised url. According to 
{*}RestServerEndpoint{*}, Flink will use the value of *rest.bind-address* as 
the {*}advertisedAddress{*}. So the client will be redirected to *127.0.0.1* 
and failed to connect the url.

So we hope the advertisedAddress can be decoupled with rest.bind-addres, to 
provide more flexibility to the Flink deployment.

  was:
Currently the Flink Rest api does not have authentication, according to the doc 
[https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/deployment/security/security-ssl/#external--rest-connectivity]
 # We set up the Flink cluster in k8s
 # We set up a nginx sidecar to enable auth for Flink Rest api.
 # We set *rest.bind-address* to localhost to hide the original Flink address 
and port
 # We enabled the ssl for the Flink Rest api

It works fine wen the client tried to call the Flink Rest api with *https* 
scheme.

But if the client using *http* scheme, the *RedirectingSslHandler* will try to 
redirect the address to the advertised url. According to 
{*}RestServerEndpoint{*}, Flink will use the value of *rest.bind-address* as 
the {*}advertisedAddress{*}. So the client will be redirect to *127.0.0.1* and 
failed to connect the url.

So we hope the advertisedAddress can be decoupled with rest.bind-addres, to 
provide more flexibility to the Flink deployment.


> Decouple the advertisedAddress and rest.bind-address
> 
>
> Key: FLINK-27900
> URL: https://issues.apache.org/jira/browse/FLINK-27900
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / REST
>Affects Versions: 1.10.3, 1.12.0, 1.11.6, 1.13.6, 1.14.4
> Environment: Flink 1.13, 1.12, 1.11, 1.10
> Deploy Flink in Kubernetes pod with a nginx sidecar for auth
>Reporter: Yu Wang
>Priority: Minor
>
> Currently the Flink Rest api does not have authentication, according to the 
> doc 
> [https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/deployment/security/security-ssl/#external--rest-connectivity]
>  # We set up the Flink cluster in k8s
>  # We set up a nginx sidecar to enable auth for Flink Rest api.
>  # We set *rest.bind-address* to localhost to hide the original Flink address 
> and port
>  # We enabled the ssl for the Flink Rest api
> It works fine wen the client tried to call the Flink Rest api with *https* 
> scheme.
> But if the client using *http* scheme, the *RedirectingSslHandler* will try 
> to redirect the address to the advertised url. According to 
> {*}RestServerEndpoint{*}, Flink will use the value of *rest.bind-address* as 
> the {*}advertisedAddress{*}. So the client will be redirected to *127.0.0.1* 
> and failed to connect the url.
> So we hope the advertisedAddress can be decoupled with rest.bind-addres, to 
> provide more flexibility to the Flink deployment.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Updated] (FLINK-27900) Decouple the advertisedAddress and rest.bind-address

2022-06-06 Thread Yu Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-27900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yu Wang updated FLINK-27900:

Description: 
Currently the Flink Rest api does not have authentication, according to the doc 
[https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/deployment/security/security-ssl/#external--rest-connectivity]
 # We set up the Flink cluster in k8s
 # We set up a nginx sidecar to enable auth for Flink Rest api.
 # We set *rest.bind-address* to localhost to hide the original Flink address 
and port
 # We enabled the ssl for the Flink Rest api

It works fine wen the client tried to call the Flink Rest api with *https* 
scheme.

But if the client using *http* scheme, the *RedirectingSslHandler* will try to 
redirect the address to the advertised url. According to the code of 
{*}RestServerEndpoint{*}, Flink will use the value of *rest.bind-address* as 
the {*}advertisedAddress{*}. So the client will be redirect to *127.0.0.1* and 
failed to connect the url.

So we hope the advertisedAddress can be decoupled with rest.bind-addres, to 
provide more flexibility to the Flink deployment.

  was:
Currently the Flink Rest api does not have authentication, according to the doc 
[https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/deployment/security/security-ssl/#external--rest-connectivity]
 # We set up the Flink cluster in k8s
 # We set up a nginx sidecar to enable auth for Flink Rest api.
 # We set *rest.bind-address* to localhost to hide the original Flink address 
and port
 # We enable the ssl for the Flink Rest api

It works fine wen the client tried to call the Flink Rest api with *https* 
scheme.

But if the client using *http* scheme, the *RedirectingSslHandler* will try to 
redirect the address to the advertised url. According to the code of 
{*}RestServerEndpoint{*}, Flink will use the value of *rest.bind-address* as 
the {*}advertisedAddress{*}. So the client will be redirect to *127.0.0.1* and 
failed to connect the url.

So we hope the advertisedAddress can be decoupled with rest.bind-addres, to 
provide more flexibility to the Flink deployment.


> Decouple the advertisedAddress and rest.bind-address
> 
>
> Key: FLINK-27900
> URL: https://issues.apache.org/jira/browse/FLINK-27900
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / REST
>Affects Versions: 1.10.3, 1.12.0, 1.11.6, 1.13.6, 1.14.4
> Environment: Flink 1.13, 1.12, 1.11, 1.10
> Deploy Flink in Kubernetes pod with a nginx sidecar for auth
>Reporter: Yu Wang
>Priority: Minor
>
> Currently the Flink Rest api does not have authentication, according to the 
> doc 
> [https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/deployment/security/security-ssl/#external--rest-connectivity]
>  # We set up the Flink cluster in k8s
>  # We set up a nginx sidecar to enable auth for Flink Rest api.
>  # We set *rest.bind-address* to localhost to hide the original Flink address 
> and port
>  # We enabled the ssl for the Flink Rest api
> It works fine wen the client tried to call the Flink Rest api with *https* 
> scheme.
> But if the client using *http* scheme, the *RedirectingSslHandler* will try 
> to redirect the address to the advertised url. According to the code of 
> {*}RestServerEndpoint{*}, Flink will use the value of *rest.bind-address* as 
> the {*}advertisedAddress{*}. So the client will be redirect to *127.0.0.1* 
> and failed to connect the url.
> So we hope the advertisedAddress can be decoupled with rest.bind-addres, to 
> provide more flexibility to the Flink deployment.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Updated] (FLINK-27900) Decouple the advertisedAddress and rest.bind-address

2022-06-06 Thread Yu Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-27900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yu Wang updated FLINK-27900:

Description: 
Currently the Flink Rest api does not have authentication, according to the doc 
[https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/deployment/security/security-ssl/#external--rest-connectivity]
 # We set up the Flink cluster in k8s
 # We set up a nginx sidecar to enable auth for Flink Rest api.
 # We set *rest.bind-address* to localhost to hide the original Flink address 
and port
 # We enabled the ssl for the Flink Rest api

It works fine wen the client tried to call the Flink Rest api with *https* 
scheme.

But if the client using *http* scheme, the *RedirectingSslHandler* will try to 
redirect the address to the advertised url. According to 
{*}RestServerEndpoint{*}, Flink will use the value of *rest.bind-address* as 
the {*}advertisedAddress{*}. So the client will be redirect to *127.0.0.1* and 
failed to connect the url.

So we hope the advertisedAddress can be decoupled with rest.bind-addres, to 
provide more flexibility to the Flink deployment.

  was:
Currently the Flink Rest api does not have authentication, according to the doc 
[https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/deployment/security/security-ssl/#external--rest-connectivity]
 # We set up the Flink cluster in k8s
 # We set up a nginx sidecar to enable auth for Flink Rest api.
 # We set *rest.bind-address* to localhost to hide the original Flink address 
and port
 # We enabled the ssl for the Flink Rest api

It works fine wen the client tried to call the Flink Rest api with *https* 
scheme.

But if the client using *http* scheme, the *RedirectingSslHandler* will try to 
redirect the address to the advertised url. According to the code of 
{*}RestServerEndpoint{*}, Flink will use the value of *rest.bind-address* as 
the {*}advertisedAddress{*}. So the client will be redirect to *127.0.0.1* and 
failed to connect the url.

So we hope the advertisedAddress can be decoupled with rest.bind-addres, to 
provide more flexibility to the Flink deployment.


> Decouple the advertisedAddress and rest.bind-address
> 
>
> Key: FLINK-27900
> URL: https://issues.apache.org/jira/browse/FLINK-27900
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / REST
>Affects Versions: 1.10.3, 1.12.0, 1.11.6, 1.13.6, 1.14.4
> Environment: Flink 1.13, 1.12, 1.11, 1.10
> Deploy Flink in Kubernetes pod with a nginx sidecar for auth
>Reporter: Yu Wang
>Priority: Minor
>
> Currently the Flink Rest api does not have authentication, according to the 
> doc 
> [https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/deployment/security/security-ssl/#external--rest-connectivity]
>  # We set up the Flink cluster in k8s
>  # We set up a nginx sidecar to enable auth for Flink Rest api.
>  # We set *rest.bind-address* to localhost to hide the original Flink address 
> and port
>  # We enabled the ssl for the Flink Rest api
> It works fine wen the client tried to call the Flink Rest api with *https* 
> scheme.
> But if the client using *http* scheme, the *RedirectingSslHandler* will try 
> to redirect the address to the advertised url. According to 
> {*}RestServerEndpoint{*}, Flink will use the value of *rest.bind-address* as 
> the {*}advertisedAddress{*}. So the client will be redirect to *127.0.0.1* 
> and failed to connect the url.
> So we hope the advertisedAddress can be decoupled with rest.bind-addres, to 
> provide more flexibility to the Flink deployment.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Created] (FLINK-27900) Decouple the advertisedAddress and rest.bind-address

2022-06-06 Thread Yu Wang (Jira)
Yu Wang created FLINK-27900:
---

 Summary: Decouple the advertisedAddress and rest.bind-address
 Key: FLINK-27900
 URL: https://issues.apache.org/jira/browse/FLINK-27900
 Project: Flink
  Issue Type: Improvement
  Components: Runtime / REST
Affects Versions: 1.14.4, 1.13.6, 1.11.6, 1.12.0, 1.10.3
 Environment: Flink 1.13, 1.12, 1.11, 1.10

Deploy Flink in Kubernetes pod with a nginx sidecar for auth
Reporter: Yu Wang


Currently the Flink Rest api does not have authentication, according to the doc 
[https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/deployment/security/security-ssl/#external--rest-connectivity]
 # We set up the Flink cluster in k8s
 # We set up a nginx sidecar to enable auth for Flink Rest api.
 # We set *rest.bind-address* to localhost to hide the original Flink address 
and port
 # We enable the ssl for the Flink Rest api

It works fine wen the client tried to call the Flink Rest api with *https* 
scheme.

But if the client using *http* scheme, the *RedirectingSslHandler* will try to 
redirect the address to the advertised url. According to the code of 
{*}RestServerEndpoint{*}, Flink will use the value of *rest.bind-address* as 
the {*}advertisedAddress{*}. So the client will be redirect to *127.0.0.1* and 
failed to connect the url.

So we hope the advertisedAddress can be decoupled with rest.bind-addres, to 
provide more flexibility to the Flink deployment.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Reopened] (FLINK-22362) Don't appear target address when taskmanager connect to jobmanager error

2021-04-25 Thread Yu Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-22362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yu Wang reopened FLINK-22362:
-

> Don't appear target address when taskmanager connect to jobmanager error
> 
>
> Key: FLINK-22362
> URL: https://issues.apache.org/jira/browse/FLINK-22362
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Network
>Affects Versions: 1.12.2
>Reporter: Yu Wang
>Priority: Minor
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (FLINK-22362) Don't appear target address when taskmanager connect to jobmanager error

2021-04-25 Thread Yu Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-22362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yu Wang resolved FLINK-22362.
-
Resolution: Fixed

> Don't appear target address when taskmanager connect to jobmanager error
> 
>
> Key: FLINK-22362
> URL: https://issues.apache.org/jira/browse/FLINK-22362
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Network
>Affects Versions: 1.12.2
>Reporter: Yu Wang
>Priority: Minor
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-22362) Don't appear target address when taskmanager connect to jobmanager error

2021-04-19 Thread Yu Wang (Jira)
Yu Wang created FLINK-22362:
---

 Summary: Don't appear target address when taskmanager connect to 
jobmanager error
 Key: FLINK-22362
 URL: https://issues.apache.org/jira/browse/FLINK-22362
 Project: Flink
  Issue Type: Improvement
  Components: Runtime / Network
Affects Versions: 1.12.2
Reporter: Yu Wang






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (FLINK-21430) Appear append data when flink sql sink mysql on key conflict

2021-02-21 Thread Yu Wang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-21430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17288154#comment-17288154
 ] 

Yu Wang edited comment on FLINK-21430 at 2/22/21, 3:23 AM:
---

[~jark] Thanks for your reminding


was (Author: yuwang0...@gmail.com):
[~jark]thanks for your reminding

> Appear append data when flink sql sink mysql on key conflict
> 
>
> Key: FLINK-21430
> URL: https://issues.apache.org/jira/browse/FLINK-21430
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Runtime
>Affects Versions: 1.12.0
>Reporter: Yu Wang
>Priority: Major
>
> kafka data:
> {"building_id":"trRlI9PL","sofa_id":"dJI53xLp","floor_num":10,"occupy_status":1,"start_time":1613802148092,"end_time":1613803144969,"process_time":"2021-02-20
>  06:39:05.088"}
> {"building_id":"trRlI9PL","sofa_id":"dJI53xLp","floor_num":10,"occupy_status":1,"start_time":1613803609813,"end_time":1613803654280,"process_time":"2021-02-20
>  06:47:34.609"}
> kafka ddl :
> CREATE TABLE washroom_detail (
>  building_id STRING,
>  sofa_id STRING,
>  floor_num INT,
>  occupy_status INT,
>  start_time BIGINT,
>  end_time BIGINT,
>  process_time TIMESTAMP,
>  occupy_times as concat(date_format(TIMESTAMPADD(hour, 8, 
> cast(start_time / 1000 as timestamp)), 'HH:mm'), '-', 
> date_format(TIMESTAMPADD(hour, 8, cast(end_time / 1000 as timestamp)), 
> 'HH:mm')),
>  local_date as date_format(cast(start_time / 1000 as 
> timestamp), '-MM-dd'),
>  day_hour as cast(date_format(cast(start_time / 1000 as 
> timestamp), 'HH') as INT) + 8
> ) WITH (
>  'connector' = 'kafka',
>  'topic' = '',
>  'properties.bootstrap.servers' = 'xxx',
>  'properties.group.id' = '',
>  'scan.startup.mode' = 'earliest-offset',
>  'format' = 'json'
> );
> mysql ddl:
>   create table hour_ddl
> (
> building_idSTRING,
> sofa_id  STRING,
> local_date STRING,
> `hour`  INT,
> floor_num INT,
> occupy_frequency INT,
> occupy_times STRING,
> update_time TIMESTAMP,
> process_time TIMESTAMP,
> primary key (building_id, sofa_id, local_date, `hour`) 
> NOT ENFORCED
> ) with (
>   'connector' = 'jdbc',
>   'url' = '',
>   'table-name' = '',
>   'username' = 'x'
>   'password' = 'xx'
>   )
> flink sql dml:
> INSERT INTO hour_ddl (building_id, sofa_id, local_date, `hour`, floor_num, 
> occupy_frequency, occupy_times, update_time, process_time)
> SELECT a.building_id,
>a.sofa_id,
>a.local_date,
>a.day_hour,
>a.floor_num,
>CAST(a.frequency + IF(b.occupy_frequency IS NULL, 0, 
> b.occupy_frequency) AS INT),
>concat(if(b.occupy_times IS NULL, '', b.occupy_times), 
> if(b.occupy_times IS NULL, a.times, concat(',', a.times))),
>NOW(),
>a.process_time
> FROM
> (SELECT building_id,
> sofa_id,
> local_date,
> day_hour,
> floor_num,
> count(1) AS frequency,
> LISTAGG(occupy_times) AS times,
> MAX(process_time) AS process_time,
> PROCTIME() AS compute_time
>  FROM washroom_detail
>  GROUP BY building_id,
>   sofa_id,
>   local_date,
>   day_hour,
>   floor_num) a
> LEFT JOIN hour_ddl
> FOR SYSTEM_TIME AS OF a.compute_time AS b ON a.building_id = b.building_id
> AND a.sofa_id = b.sofa_id
> AND a.local_date = b.local_date
> AND a.day_hour = b.`hour`
> WHERE a.process_time > b.process_time
> OR b.process_time IS NULL
> appearance:
> when mysql has not this record,insert this record:
> occupy_frequencyoccupy_times
>   1  15:01-15:03
> when key conflict , upsert this record:
> occupy_frequencyoccupy_times
>   3  15:01-15:03,15:01-15:03,15:03-15:04
> result should be the following record:
> {color:red}occupy_frequencyoccupy_times
>   2  
> 15:01-15:03,15:03-15:04{color}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-21430) Appear append data when flink sql sink mysql on key conflict

2021-02-21 Thread Yu Wang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-21430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17288154#comment-17288154
 ] 

Yu Wang commented on FLINK-21430:
-

[~jark]thanks for your reminding

> Appear append data when flink sql sink mysql on key conflict
> 
>
> Key: FLINK-21430
> URL: https://issues.apache.org/jira/browse/FLINK-21430
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Runtime
>Affects Versions: 1.12.0
>Reporter: Yu Wang
>Priority: Major
>
> kafka data:
> {"building_id":"trRlI9PL","sofa_id":"dJI53xLp","floor_num":10,"occupy_status":1,"start_time":1613802148092,"end_time":1613803144969,"process_time":"2021-02-20
>  06:39:05.088"}
> {"building_id":"trRlI9PL","sofa_id":"dJI53xLp","floor_num":10,"occupy_status":1,"start_time":1613803609813,"end_time":1613803654280,"process_time":"2021-02-20
>  06:47:34.609"}
> kafka ddl :
> CREATE TABLE washroom_detail (
>  building_id STRING,
>  sofa_id STRING,
>  floor_num INT,
>  occupy_status INT,
>  start_time BIGINT,
>  end_time BIGINT,
>  process_time TIMESTAMP,
>  occupy_times as concat(date_format(TIMESTAMPADD(hour, 8, 
> cast(start_time / 1000 as timestamp)), 'HH:mm'), '-', 
> date_format(TIMESTAMPADD(hour, 8, cast(end_time / 1000 as timestamp)), 
> 'HH:mm')),
>  local_date as date_format(cast(start_time / 1000 as 
> timestamp), '-MM-dd'),
>  day_hour as cast(date_format(cast(start_time / 1000 as 
> timestamp), 'HH') as INT) + 8
> ) WITH (
>  'connector' = 'kafka',
>  'topic' = '',
>  'properties.bootstrap.servers' = 'xxx',
>  'properties.group.id' = '',
>  'scan.startup.mode' = 'earliest-offset',
>  'format' = 'json'
> );
> mysql ddl:
>   create table hour_ddl
> (
> building_idSTRING,
> sofa_id  STRING,
> local_date STRING,
> `hour`  INT,
> floor_num INT,
> occupy_frequency INT,
> occupy_times STRING,
> update_time TIMESTAMP,
> process_time TIMESTAMP,
> primary key (building_id, sofa_id, local_date, `hour`) 
> NOT ENFORCED
> ) with (
>   'connector' = 'jdbc',
>   'url' = '',
>   'table-name' = '',
>   'username' = 'x'
>   'password' = 'xx'
>   )
> flink sql dml:
> INSERT INTO hour_ddl (building_id, sofa_id, local_date, `hour`, floor_num, 
> occupy_frequency, occupy_times, update_time, process_time)
> SELECT a.building_id,
>a.sofa_id,
>a.local_date,
>a.day_hour,
>a.floor_num,
>CAST(a.frequency + IF(b.occupy_frequency IS NULL, 0, 
> b.occupy_frequency) AS INT),
>concat(if(b.occupy_times IS NULL, '', b.occupy_times), 
> if(b.occupy_times IS NULL, a.times, concat(',', a.times))),
>NOW(),
>a.process_time
> FROM
> (SELECT building_id,
> sofa_id,
> local_date,
> day_hour,
> floor_num,
> count(1) AS frequency,
> LISTAGG(occupy_times) AS times,
> MAX(process_time) AS process_time,
> PROCTIME() AS compute_time
>  FROM washroom_detail
>  GROUP BY building_id,
>   sofa_id,
>   local_date,
>   day_hour,
>   floor_num) a
> LEFT JOIN hour_ddl
> FOR SYSTEM_TIME AS OF a.compute_time AS b ON a.building_id = b.building_id
> AND a.sofa_id = b.sofa_id
> AND a.local_date = b.local_date
> AND a.day_hour = b.`hour`
> WHERE a.process_time > b.process_time
> OR b.process_time IS NULL
> appearance:
> when mysql has not this record,insert this record:
> occupy_frequencyoccupy_times
>   1  15:01-15:03
> when key conflict , upsert this record:
> occupy_frequencyoccupy_times
>   3  15:01-15:03,15:01-15:03,15:03-15:04
> result should be the following record:
> {color:red}occupy_frequencyoccupy_times
>   2  
> 15:01-15:03,15:03-15:04{color}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (FLINK-21430) Appear append data when flink sql sink mysql on key conflict

2021-02-21 Thread Yu Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-21430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yu Wang updated FLINK-21430:

Description: 
kafka data:
{"building_id":"trRlI9PL","sofa_id":"dJI53xLp","floor_num":10,"occupy_status":1,"start_time":1613802148092,"end_time":1613803144969,"process_time":"2021-02-20
 06:39:05.088"}
{"building_id":"trRlI9PL","sofa_id":"dJI53xLp","floor_num":10,"occupy_status":1,"start_time":1613803609813,"end_time":1613803654280,"process_time":"2021-02-20
 06:47:34.609"}

kafka ddl :
CREATE TABLE washroom_detail (
 building_id STRING,
 sofa_id STRING,
 floor_num INT,
 occupy_status INT,
 start_time BIGINT,
 end_time BIGINT,
 process_time TIMESTAMP,
 occupy_times as concat(date_format(TIMESTAMPADD(hour, 8, 
cast(start_time / 1000 as timestamp)), 'HH:mm'), '-', 
date_format(TIMESTAMPADD(hour, 8, cast(end_time / 1000 as timestamp)), 
'HH:mm')),
 local_date as date_format(cast(start_time / 1000 as 
timestamp), '-MM-dd'),
 day_hour as cast(date_format(cast(start_time / 1000 as 
timestamp), 'HH') as INT) + 8
) WITH (
 'connector' = 'kafka',
 'topic' = '',
 'properties.bootstrap.servers' = 'xxx',
 'properties.group.id' = '',
 'scan.startup.mode' = 'earliest-offset',
 'format' = 'json'
);


mysql ddl:

  create table hour_ddl
(
building_idSTRING,
sofa_id  STRING,
local_date STRING,
`hour`  INT,
floor_num INT,
occupy_frequency INT,
occupy_times STRING,
update_time TIMESTAMP,
process_time TIMESTAMP,
primary key (building_id, sofa_id, local_date, `hour`) NOT 
ENFORCED
) with (
  'connector' = 'jdbc',
  'url' = '',
  'table-name' = '',
  'username' = 'x'
  'password' = 'xx'
  )


flink sql dml:

INSERT INTO hour_ddl (building_id, sofa_id, local_date, `hour`, floor_num, 
occupy_frequency, occupy_times, update_time, process_time)
SELECT a.building_id,
   a.sofa_id,
   a.local_date,
   a.day_hour,
   a.floor_num,
   CAST(a.frequency + IF(b.occupy_frequency IS NULL, 0, b.occupy_frequency) 
AS INT),
   concat(if(b.occupy_times IS NULL, '', b.occupy_times), if(b.occupy_times 
IS NULL, a.times, concat(',', a.times))),
   NOW(),
   a.process_time
FROM
(SELECT building_id,
sofa_id,
local_date,
day_hour,
floor_num,
count(1) AS frequency,
LISTAGG(occupy_times) AS times,
MAX(process_time) AS process_time,
PROCTIME() AS compute_time
 FROM washroom_detail
 GROUP BY building_id,
  sofa_id,
  local_date,
  day_hour,
  floor_num) a
LEFT JOIN hour_ddl
FOR SYSTEM_TIME AS OF a.compute_time AS b ON a.building_id = b.building_id
AND a.sofa_id = b.sofa_id
AND a.local_date = b.local_date
AND a.day_hour = b.`hour`
WHERE a.process_time > b.process_time
OR b.process_time IS NULL

appearance:
when mysql has not this record,insert this record:
occupy_frequencyoccupy_times
  1  15:01-15:03
when key conflict , upsert this record:
occupy_frequencyoccupy_times
  3  15:01-15:03,15:01-15:03,15:03-15:04
result should be the following record:
{color:red}occupy_frequencyoccupy_times
  2  15:01-15:03,15:03-15:04{color}


  was:
kafka 数据格式:
{"building_id":"trRlI9PL","sofa_id":"dJI53xLp","floor_num":10,"occupy_status":1,"start_time":1613802148092,"end_time":1613803144969,"process_time":"2021-02-20
 06:39:05.088"}
{"building_id":"trRlI9PL","sofa_id":"dJI53xLp","floor_num":10,"occupy_status":1,"start_time":1613803609813,"end_time":1613803654280,"process_time":"2021-02-20
 06:47:34.609"}

kafka ddl :
CREATE TABLE washroom_detail (
 building_id STRING,
 sofa_id STRING,
 floor_num INT,
 occupy_status INT,
 start_time BIGINT,
 end_time BIGINT,
 process_time TIMESTAMP,
 occupy_times as concat(date_format(TIMESTAMPADD(hour, 8, 
cast(start_time / 1000 as timestamp)), 'HH:mm'), '-', 
date_format(TIMESTAMPADD(hour, 8, cast(end_time / 1000 as timestamp)), 
'HH:mm')),
 local_date as date_format(cast(start_time / 1000 as 
timestamp), 

[jira] [Updated] (FLINK-21430) Appear append data when flink sql sink mysql on key conflict

2021-02-21 Thread Yu Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-21430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yu Wang updated FLINK-21430:

Description: 
kafka 数据格式:
{"building_id":"trRlI9PL","sofa_id":"dJI53xLp","floor_num":10,"occupy_status":1,"start_time":1613802148092,"end_time":1613803144969,"process_time":"2021-02-20
 06:39:05.088"}
{"building_id":"trRlI9PL","sofa_id":"dJI53xLp","floor_num":10,"occupy_status":1,"start_time":1613803609813,"end_time":1613803654280,"process_time":"2021-02-20
 06:47:34.609"}

kafka ddl :
CREATE TABLE washroom_detail (
 building_id STRING,
 sofa_id STRING,
 floor_num INT,
 occupy_status INT,
 start_time BIGINT,
 end_time BIGINT,
 process_time TIMESTAMP,
 occupy_times as concat(date_format(TIMESTAMPADD(hour, 8, 
cast(start_time / 1000 as timestamp)), 'HH:mm'), '-', 
date_format(TIMESTAMPADD(hour, 8, cast(end_time / 1000 as timestamp)), 
'HH:mm')),
 local_date as date_format(cast(start_time / 1000 as 
timestamp), '-MM-dd'),
 day_hour as cast(date_format(cast(start_time / 1000 as 
timestamp), 'HH') as INT) + 8
) WITH (
 'connector' = 'kafka',
 'topic' = '',
 'properties.bootstrap.servers' = 'xxx',
 'properties.group.id' = '',
 'scan.startup.mode' = 'earliest-offset',
 'format' = 'json'
);


mysql ddl:
  create table hour_ddl
(
building_idSTRING,
sofa_id  STRING,
local_date STRING,
`hour`  INT,
floor_num INT,
occupy_frequency INT,
occupy_times STRING,
update_time TIMESTAMP,
process_time TIMESTAMP,
primary key (building_id, sofa_id, local_date, `hour`) NOT 
ENFORCED
) with (
  'connector' = 'jdbc',
  'url' = '',
  'table-name' = '',
  'username' = 'x'
  'password' = 'xx'
  )


flink sql dml:
INSERT INTO hour_ddl (building_id, sofa_id, local_date, `hour`, floor_num, 
occupy_frequency, occupy_times, update_time, process_time)
SELECT a.building_id,
   a.sofa_id,
   a.local_date,
   a.day_hour,
   a.floor_num,
   CAST(a.frequency + IF(b.occupy_frequency IS NULL, 0, b.occupy_frequency) 
AS INT),
   concat(if(b.occupy_times IS NULL, '', b.occupy_times), if(b.occupy_times 
IS NULL, a.times, concat(',', a.times))),
   NOW(),
   a.process_time
FROM
(SELECT building_id,
sofa_id,
local_date,
day_hour,
floor_num,
count(1) AS frequency,
LISTAGG(occupy_times) AS times,
MAX(process_time) AS process_time,
PROCTIME() AS compute_time
 FROM washroom_detail
 GROUP BY building_id,
  sofa_id,
  local_date,
  day_hour,
  floor_num) a
LEFT JOIN hour_ddl
FOR SYSTEM_TIME AS OF a.compute_time AS b ON a.building_id = b.building_id
AND a.sofa_id = b.sofa_id
AND a.local_date = b.local_date
AND a.day_hour = b.`hour`
WHERE a.process_time > b.process_time
OR b.process_time IS NULL

现象:
当mysql 没有数据时,插入一条记录:
occupy_frequencyoccupy_times
  1  15:01-15:03
当主键冲突时,现象是:
occupy_frequencyoccupy_times
  3  15:01-15:03,15:01-15:03,15:03-15:04
结果应该是:
{color:red}occupy_frequencyoccupy_times
  2  15:01-15:03,15:03-15:04{color}


  was:
kafka 数据格式:
{"building_id":"trRlI9PL","sofa_id":"dJI53xLp","floor_num":10,"occupy_status":1,"start_time":1613802148092,"end_time":1613803144969,"process_time":"2021-02-20
 06:39:05.088"}
{"building_id":"trRlI9PL","sofa_id":"dJI53xLp","floor_num":10,"occupy_status":1,"start_time":1613803609813,"end_time":1613803654280,"process_time":"2021-02-20
 06:47:34.609"}

kafka ddl :
CREATE TABLE washroom_detail (
 building_id STRING,
 sofa_id STRING,
 floor_num INT,
 occupy_status INT,
 start_time BIGINT,
 end_time BIGINT,
 process_time TIMESTAMP,
 occupy_times as concat(date_format(TIMESTAMPADD(hour, 8, 
cast(start_time / 1000 as timestamp)), 'HH:mm'), '-', 
date_format(TIMESTAMPADD(hour, 8, cast(end_time / 1000 as timestamp)), 
'HH:mm')),
 local_date as date_format(cast(start_time / 1000 as 
timestamp), '-MM-dd'),
 day_hour as cast(date_format(cast(start_time / 1000 as 
timestamp), 'HH') as 

[jira] [Updated] (FLINK-21430) Appear append data when flink sql sink mysql on key conflict

2021-02-21 Thread Yu Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-21430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yu Wang updated FLINK-21430:

Description: 
kafka 数据格式:
{"building_id":"trRlI9PL","sofa_id":"dJI53xLp","floor_num":10,"occupy_status":1,"start_time":1613802148092,"end_time":1613803144969,"process_time":"2021-02-20
 06:39:05.088"}
{"building_id":"trRlI9PL","sofa_id":"dJI53xLp","floor_num":10,"occupy_status":1,"start_time":1613803609813,"end_time":1613803654280,"process_time":"2021-02-20
 06:47:34.609"}

kafka ddl :
CREATE TABLE washroom_detail (
 building_id STRING,
 sofa_id STRING,
 floor_num INT,
 occupy_status INT,
 start_time BIGINT,
 end_time BIGINT,
 process_time TIMESTAMP,
 occupy_times as concat(date_format(TIMESTAMPADD(hour, 8, 
cast(start_time / 1000 as timestamp)), 'HH:mm'), '-', 
date_format(TIMESTAMPADD(hour, 8, cast(end_time / 1000 as timestamp)), 
'HH:mm')),
 local_date as date_format(cast(start_time / 1000 as 
timestamp), '-MM-dd'),
 day_hour as cast(date_format(cast(start_time / 1000 as 
timestamp), 'HH') as INT) + 8
) WITH (
 'connector' = 'kafka',
 'topic' = '',
 'properties.bootstrap.servers' = 'xxx',
 'properties.group.id' = '',
 'scan.startup.mode' = 'earliest-offset',
 'format' = 'json'
);


mysql ddl:
  create table hour_ddl
(
building_idSTRING,
sofa_id  STRING,
local_date STRING,
`hour`  INT,
floor_num INT,
occupy_frequency INT,
occupy_times STRING,
update_time TIMESTAMP,
process_time TIMESTAMP,
primary key (building_id, sofa_id, local_date, `hour`) NOT 
ENFORCED
) with (
  'connector' = 'jdbc',
  'url' = '',
  'table-name' = '',
  'username' = 'x'
  'password' = 'xx'
  )


flink sql dml:
INSERT INTO hour_ddl (building_id, sofa_id, local_date, `hour`, floor_num, 
occupy_frequency, occupy_times, update_time, process_time)
SELECT a.building_id,
   a.sofa_id,
   a.local_date,
   a.day_hour,
   a.floor_num,
   CAST(a.frequency + IF(b.occupy_frequency IS NULL, 0, b.occupy_frequency) 
AS INT),
   concat(if(b.occupy_times IS NULL, '', b.occupy_times), if(b.occupy_times 
IS NULL, a.times, concat(',', a.times))),
   NOW(),
   a.process_time
FROM
(SELECT building_id,
sofa_id,
local_date,
day_hour,
floor_num,
count(1) AS frequency,
LISTAGG(occupy_times) AS times,
MAX(process_time) AS process_time,
PROCTIME() AS compute_time
 FROM washroom_detail
 GROUP BY building_id,
  sofa_id,
  local_date,
  day_hour,
  floor_num) a
LEFT JOIN hour_ddl
FOR SYSTEM_TIME AS OF a.compute_time AS b ON a.building_id = b.building_id
AND a.sofa_id = b.sofa_id
AND a.local_date = b.local_date
AND a.day_hour = b.`hour`
WHERE a.process_time > b.process_time
OR b.process_time IS NULL

现象:
当mysql 没有数据时,插入一条记录:
occupy_frequencyoccupy_times
  1  15:01-15:03
当主键冲突时,现象是:
occupy_frequencyoccupy_times
  3  15:01-15:03,15:01-15:03,15:03-15:04
结果应该是:
{color:red}occupy_frequencyoccupy_times
  2  15:01-15:03,15:03-15:04{color}


  was:
kafka 数据格式:
{"building_id":"trRlI9PL","sofa_id":"dJI53xLp","floor_num":10,"occupy_status":1,"start_time":1613802148092,"end_time":1613803144969,"process_time":"2021-02-20
 06:39:05.088"}
{"building_id":"trRlI9PL","sofa_id":"dJI53xLp","floor_num":10,"occupy_status":1,"start_time":1613803609813,"end_time":1613803654280,"process_time":"2021-02-20
 06:47:34.609"}


> Appear append data when flink sql sink mysql on key conflict
> 
>
> Key: FLINK-21430
> URL: https://issues.apache.org/jira/browse/FLINK-21430
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Runtime
>Affects Versions: 1.12.0
>Reporter: Yu Wang
>Priority: Major
>
> kafka 数据格式:
> {"building_id":"trRlI9PL","sofa_id":"dJI53xLp","floor_num":10,"occupy_status":1,"start_time":1613802148092,"end_time":1613803144969,"process_time":"2021-02-20
>  06:39:05.088"}
> 

[jira] [Updated] (FLINK-21430) Appear append data when flink sql sink mysql on key conflict

2021-02-21 Thread Yu Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-21430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yu Wang updated FLINK-21430:

Description: 
kafka 数据格式:
{"building_id":"trRlI9PL","sofa_id":"dJI53xLp","floor_num":10,"occupy_status":1,"start_time":1613802148092,"end_time":1613803144969,"process_time":"2021-02-20
 06:39:05.088"}
{"building_id":"trRlI9PL","sofa_id":"dJI53xLp","floor_num":10,"occupy_status":1,"start_time":1613803609813,"end_time":1613803654280,"process_time":"2021-02-20
 06:47:34.609"}

  was:

{code:java}
// Some comments here
public String getFoo()
{
return foo;
}
{code}



> Appear append data when flink sql sink mysql on key conflict
> 
>
> Key: FLINK-21430
> URL: https://issues.apache.org/jira/browse/FLINK-21430
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Runtime
>Affects Versions: 1.12.0
>Reporter: Yu Wang
>Priority: Major
>
> kafka 数据格式:
> {"building_id":"trRlI9PL","sofa_id":"dJI53xLp","floor_num":10,"occupy_status":1,"start_time":1613802148092,"end_time":1613803144969,"process_time":"2021-02-20
>  06:39:05.088"}
> {"building_id":"trRlI9PL","sofa_id":"dJI53xLp","floor_num":10,"occupy_status":1,"start_time":1613803609813,"end_time":1613803654280,"process_time":"2021-02-20
>  06:47:34.609"}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-21430) Appear append data when flink sql sink mysql on key conflict

2021-02-21 Thread Yu Wang (Jira)
Yu Wang created FLINK-21430:
---

 Summary: Appear append data when flink sql sink mysql on key 
conflict
 Key: FLINK-21430
 URL: https://issues.apache.org/jira/browse/FLINK-21430
 Project: Flink
  Issue Type: Bug
  Components: Table SQL / Runtime
Affects Versions: 1.12.0
Reporter: Yu Wang



{code:java}
// Some comments here
public String getFoo()
{
return foo;
}
{code}




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (FLINK-18312) SavepointStatusHandler and StaticFileServerHandler not redirect

2021-01-31 Thread Yu Wang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-18312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17139087#comment-17139087
 ] 

Yu Wang edited comment on FLINK-18312 at 2/1/21, 3:04 AM:
--

[~chesnay], [~trohrmann] agree with you, it's better to move the cache layer 
behind the RPC layer.


was (Author: lucentwong):
[~chesnay], [~trohrmann] agree with you, it's better to move the cache layer 
behind the PRC layer.

> SavepointStatusHandler and StaticFileServerHandler not redirect 
> 
>
> Key: FLINK-18312
> URL: https://issues.apache.org/jira/browse/FLINK-18312
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / REST
>Affects Versions: 1.8.0, 1.9.0, 1.10.0
> Environment: 1. Deploy flink cluster in standlone mode on kubernetes 
> and use two Jobmanagers for HA.
> 2. Deploy a kubernetes service for the two jobmanagers to provide a unified 
> url.
>Reporter: Yu Wang
>Priority: Major
>
> Savepoint:
> 1. Deploy our flink cluster in standlone mode on kubernetes and use two 
> Jobmanagers for HA.
> 2. Deploy a kubernetes service for the two jobmanagers to provide a unified 
> url.
> 3. Send a savepoint trigger request to the leader Jobmanager.
> 4. Query the savepoint status from leader Jobmanager, get correct response.
> 5. Query the savepoint status from standby Jobmanager, the response will be 
> 404.
> Jobmanager log:
> 1. Query log from leader Jobmanager, get leader log.
> 2. Query log from standby Jobmanager, get standby log.
>  
> Both these two requests will be redirect to the leader in 1.7.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-18312) SavepointStatusHandler and StaticFileServerHandler not redirect

2020-06-17 Thread Yu Wang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-18312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17139087#comment-17139087
 ] 

Yu Wang commented on FLINK-18312:
-

[~chesnay], [~trohrmann] agree with you, it's better to move the cache layer 
behind the PRC layer.

> SavepointStatusHandler and StaticFileServerHandler not redirect 
> 
>
> Key: FLINK-18312
> URL: https://issues.apache.org/jira/browse/FLINK-18312
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / REST
>Affects Versions: 1.8.0, 1.9.0, 1.10.0
> Environment: 1. Deploy flink cluster in standlone mode on kubernetes 
> and use two Jobmanagers for HA.
> 2. Deploy a kubernetes service for the two jobmanagers to provide a unified 
> url.
>Reporter: Yu Wang
>Priority: Major
>
> Savepoint:
> 1. Deploy our flink cluster in standlone mode on kubernetes and use two 
> Jobmanagers for HA.
> 2. Deploy a kubernetes service for the two jobmanagers to provide a unified 
> url.
> 3. Send a savepoint trigger request to the leader Jobmanager.
> 4. Query the savepoint status from leader Jobmanager, get correct response.
> 5. Query the savepoint status from standby Jobmanager, the response will be 
> 404.
> Jobmanager log:
> 1. Query log from leader Jobmanager, get leader log.
> 2. Query log from standby Jobmanager, get standby log.
>  
> Both these two requests will be redirect to the leader in 1.7.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-18312) SavepointStatusHandler and StaticFileServerHandler not redirect

2020-06-15 Thread Yu Wang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-18312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17136258#comment-17136258
 ] 

Yu Wang commented on FLINK-18312:
-

I think there seems a issue in "AbstractAsynchronousOperationHandlers", in this 
handler, there is a local memory cache "completedOperationCache" to store the 
pending savpoint opeartion before redirect the request to the leader 
jobmanager, which seems not synced between all the jobmanagers. This makes only 
the jobmanager which receive the savepoint trigger requset can lookup the 
status of the savpoint, while the others can only return 404.

> SavepointStatusHandler and StaticFileServerHandler not redirect 
> 
>
> Key: FLINK-18312
> URL: https://issues.apache.org/jira/browse/FLINK-18312
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / REST
>Affects Versions: 1.8.0, 1.9.0, 1.10.0
> Environment: 1. Deploy flink cluster in standlone mode on kubernetes 
> and use two Jobmanagers for HA.
> 2. Deploy a kubernetes service for the two jobmanagers to provide a unified 
> url.
>Reporter: Yu Wang
>Priority: Major
>
> Savepoint:
> 1. Deploy our flink cluster in standlone mode on kubernetes and use two 
> Jobmanagers for HA.
> 2. Deploy a kubernetes service for the two jobmanagers to provide a unified 
> url.
> 3. Send a savepoint trigger request to the leader Jobmanager.
> 4. Query the savepoint status from leader Jobmanager, get correct response.
> 5. Query the savepoint status from standby Jobmanager, the response will be 
> 404.
> Jobmanager log:
> 1. Query log from leader Jobmanager, get leader log.
> 2. Query log from standby Jobmanager, get standby log.
>  
> Both these two requests will be redirect to the leader in 1.7.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-18312) SavepointStatusHandler and StaticFileServerHandler not redirect

2020-06-15 Thread Yu Wang (Jira)
Yu Wang created FLINK-18312:
---

 Summary: SavepointStatusHandler and StaticFileServerHandler not 
redirect 
 Key: FLINK-18312
 URL: https://issues.apache.org/jira/browse/FLINK-18312
 Project: Flink
  Issue Type: Bug
  Components: Runtime / REST
Affects Versions: 1.10.0, 1.9.0, 1.8.0
 Environment: 1. Deploy flink cluster in standlone mode on kubernetes 
and use two Jobmanagers for HA.
2. Deploy a kubernetes service for the two jobmanagers to provide a unified url.
Reporter: Yu Wang


Savepoint:

1. Deploy our flink cluster in standlone mode on kubernetes and use two 
Jobmanagers for HA.

2. Deploy a kubernetes service for the two jobmanagers to provide a unified url.

3. Send a savepoint trigger request to the leader Jobmanager.

4. Query the savepoint status from leader Jobmanager, get correct response.

5. Query the savepoint status from standby Jobmanager, the response will be 404.

Jobmanager log:

1. Query log from leader Jobmanager, get leader log.

2. Query log from standby Jobmanager, get standby log.

 

Both these two requests will be redirect to the leader in 1.7.

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (FLINK-17272) param has space in config.sh

2020-04-20 Thread Yu Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-17272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yu Wang updated FLINK-17272:

Affects Version/s: (was: 1.9.1)

> param has space in config.sh
> 
>
> Key: FLINK-17272
> URL: https://issues.apache.org/jira/browse/FLINK-17272
> Project: Flink
>  Issue Type: Bug
>  Components: Build System
>Reporter: Yu Wang
>Priority: Major
>  Labels: pull-request-available
> Attachments: 屏幕快照 2020-04-20 下午8.25.43.png
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (FLINK-17272) param has space in config.sh

2020-04-20 Thread Yu Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-17272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yu Wang updated FLINK-17272:

Affects Version/s: 1.9.1

> param has space in config.sh
> 
>
> Key: FLINK-17272
> URL: https://issues.apache.org/jira/browse/FLINK-17272
> Project: Flink
>  Issue Type: Bug
>  Components: Build System
>Affects Versions: 1.9.1
>Reporter: Yu Wang
>Priority: Major
>  Labels: pull-request-available
> Attachments: 屏幕快照 2020-04-20 下午8.25.43.png
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (FLINK-17272) param has space in config.sh

2020-04-20 Thread Yu Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-17272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yu Wang updated FLINK-17272:

Component/s: Build System

> param has space in config.sh
> 
>
> Key: FLINK-17272
> URL: https://issues.apache.org/jira/browse/FLINK-17272
> Project: Flink
>  Issue Type: Bug
>  Components: Build System
>Reporter: Yu Wang
>Priority: Major
>  Labels: pull-request-available
> Attachments: 屏幕快照 2020-04-20 下午8.25.43.png
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-17272) param has space in config.sh

2020-04-20 Thread Yu Wang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-17272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17087695#comment-17087695
 ] 

Yu Wang commented on FLINK-17272:
-

https://github.com/apache/flink/pull/11827

> param has space in config.sh
> 
>
> Key: FLINK-17272
> URL: https://issues.apache.org/jira/browse/FLINK-17272
> Project: Flink
>  Issue Type: Bug
>Reporter: Yu Wang
>Priority: Major
>  Labels: pull-request-available
> Attachments: 屏幕快照 2020-04-20 下午8.25.43.png
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (FLINK-17272) param has space in config.sh

2020-04-20 Thread Yu Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-17272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yu Wang updated FLINK-17272:

Attachment: 屏幕快照 2020-04-20 下午8.25.43.png

> param has space in config.sh
> 
>
> Key: FLINK-17272
> URL: https://issues.apache.org/jira/browse/FLINK-17272
> Project: Flink
>  Issue Type: Bug
>Reporter: Yu Wang
>Priority: Major
> Attachments: 屏幕快照 2020-04-20 下午8.25.43.png
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-17272) param has space in config.sh

2020-04-20 Thread Yu Wang (Jira)
Yu Wang created FLINK-17272:
---

 Summary: param has space in config.sh
 Key: FLINK-17272
 URL: https://issues.apache.org/jira/browse/FLINK-17272
 Project: Flink
  Issue Type: Bug
Reporter: Yu Wang






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-17270) param has space in config.sh

2020-04-20 Thread Yu Wang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-17270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17087670#comment-17087670
 ] 

Yu Wang commented on FLINK-17270:
-

https://github.com/apache/flink/pull/11824

> param has space in config.sh
> 
>
> Key: FLINK-17270
> URL: https://issues.apache.org/jira/browse/FLINK-17270
> Project: Flink
>  Issue Type: Bug
>Reporter: Yu Wang
>Priority: Major
>  Labels: pull-request-available
> Attachments: 屏幕快照 2020-04-20 下午8.25.43.png
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Issue Comment Deleted] (FLINK-17270) param has space in config.sh

2020-04-20 Thread Yu Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-17270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yu Wang updated FLINK-17270:

Comment: was deleted

(was: https://github.com/apache/flink/pull/11824)

> param has space in config.sh
> 
>
> Key: FLINK-17270
> URL: https://issues.apache.org/jira/browse/FLINK-17270
> Project: Flink
>  Issue Type: Bug
>Reporter: Yu Wang
>Priority: Major
>  Labels: pull-request-available
> Attachments: 屏幕快照 2020-04-20 下午8.25.43.png
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-17270) param has space in config.sh

2020-04-20 Thread Yu Wang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-17270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17087663#comment-17087663
 ] 

Yu Wang commented on FLINK-17270:
-

https://github.com/apache/flink/pull/11824

> param has space in config.sh
> 
>
> Key: FLINK-17270
> URL: https://issues.apache.org/jira/browse/FLINK-17270
> Project: Flink
>  Issue Type: Bug
>Reporter: Yu Wang
>Priority: Major
>  Labels: pull-request-available
> Attachments: 屏幕快照 2020-04-20 下午8.25.43.png
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (FLINK-17270) param has space in config.sh

2020-04-20 Thread Yu Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-17270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yu Wang updated FLINK-17270:

Attachment: 屏幕快照 2020-04-20 下午8.25.43.png

> param has space in config.sh
> 
>
> Key: FLINK-17270
> URL: https://issues.apache.org/jira/browse/FLINK-17270
> Project: Flink
>  Issue Type: Bug
>Reporter: Yu Wang
>Priority: Major
> Attachments: 屏幕快照 2020-04-20 下午8.25.43.png
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-17270) param has space in config.sh

2020-04-20 Thread Yu Wang (Jira)
Yu Wang created FLINK-17270:
---

 Summary: param has space in config.sh
 Key: FLINK-17270
 URL: https://issues.apache.org/jira/browse/FLINK-17270
 Project: Flink
  Issue Type: Bug
Reporter: Yu Wang






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (FLINK-13895) Client does not exit when bin/yarn-session.sh come fail

2019-08-31 Thread Yu Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-13895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yu Wang updated FLINK-13895:

Description: the hadoop cluster environment java version is 1.7, flink is 
compiled with jdk1.8,I used bin/yarn-session.sh submit it , then client comes 
error and does not exit . I found yarn application which is failed , so then we 
should not kill the yarn application, we can stop the yarn client . attachments 
is operation log  (was: the hadoop cluster environment java version is 1.7, 
flink is compiled with jdk1.8,I used bin/yarn-session.sh submit it , then 
client comes error and does not exit . I found yarn application which is failed 
, so then we should not kill the yarn application . attachments is operation 
log)

> Client does not exit when bin/yarn-session.sh come fail
> ---
>
> Key: FLINK-13895
> URL: https://issues.apache.org/jira/browse/FLINK-13895
> Project: Flink
>  Issue Type: Improvement
>  Components: Deployment / YARN
>Affects Versions: 1.9.0
>Reporter: Yu Wang
>Priority: Minor
>  Labels: pull-request-available
> Attachments: client_exit.txt
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> the hadoop cluster environment java version is 1.7, flink is compiled with 
> jdk1.8,I used bin/yarn-session.sh submit it , then client comes error and 
> does not exit . I found yarn application which is failed , so then we should 
> not kill the yarn application, we can stop the yarn client . attachments is 
> operation log



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Updated] (FLINK-13895) Client does not exit when bin/yarn-session.sh come fail

2019-08-31 Thread Yu Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-13895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yu Wang updated FLINK-13895:

Description: the hadoop cluster environment java version is 1.7, flink is 
compiled with jdk1.8,I used bin/yarn-session.sh submit it , then client comes 
error and does not exit . I found yarn application which is failed , so then we 
should not kill the yarn application . attachments is operation log  (was: the 
hadoop cluster environment java version is 1.7, flink is compiled with jdk1.8,I 
used bin/yarn-session.sh submit it , then client comes error and does not exit 
. I found yarn application which is failed , so then we should not kill the 
yarn application .)

> Client does not exit when bin/yarn-session.sh come fail
> ---
>
> Key: FLINK-13895
> URL: https://issues.apache.org/jira/browse/FLINK-13895
> Project: Flink
>  Issue Type: Improvement
>  Components: Deployment / YARN
>Affects Versions: 1.9.0
>Reporter: Yu Wang
>Priority: Minor
>  Labels: pull-request-available
> Attachments: client_exit.txt
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> the hadoop cluster environment java version is 1.7, flink is compiled with 
> jdk1.8,I used bin/yarn-session.sh submit it , then client comes error and 
> does not exit . I found yarn application which is failed , so then we should 
> not kill the yarn application . attachments is operation log



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Updated] (FLINK-13895) Client does not exit when bin/yarn-session.sh come fail

2019-08-31 Thread Yu Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-13895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yu Wang updated FLINK-13895:

Attachment: client_exit.txt

> Client does not exit when bin/yarn-session.sh come fail
> ---
>
> Key: FLINK-13895
> URL: https://issues.apache.org/jira/browse/FLINK-13895
> Project: Flink
>  Issue Type: Improvement
>  Components: Deployment / YARN
>Affects Versions: 1.9.0
>Reporter: Yu Wang
>Priority: Minor
>  Labels: pull-request-available
> Attachments: client_exit.txt
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> the hadoop cluster environment java version is 1.7, flink is compiled with 
> jdk1.8,I used bin/yarn-session.sh submit it , then client comes error and 
> does not exit . I found yarn application which is failed , so then we should 
> not kill the yarn application .



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Updated] (FLINK-13895) Client does not exit when bin/yarn-session.sh come fail

2019-08-31 Thread Yu Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-13895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yu Wang updated FLINK-13895:

Description: the hadoop cluster environment java version is 1.7, flink is 
compiled with jdk1.8,I used bin/yarn-session.sh submit it , then client comes 
error and does not exit . I found yarn application which is failed , so then we 
should not kill the yarn application .  (was: 2019-08-29 09:42:00,589 INFO  
org.apache.flink.yarn.AbstractYarnClusterDescriptor   - Deploying 
cluster, current state ACCEPTED
2019-08-29 09:42:04,718 ERROR org.apache.flink.yarn.cli.FlinkYarnSessionCli 
- Error while running the Flink Yarn session.
org.apache.flink.client.deployment.ClusterDeploymentException: Couldn't deploy 
Yarn session cluster
at 
org.apache.flink.yarn.AbstractYarnClusterDescriptor.deploySessionCluster(AbstractYarnClusterDescriptor.java:385)
at 
org.apache.flink.yarn.cli.FlinkYarnSessionCli.run(FlinkYarnSessionCli.java:616)
at 
org.apache.flink.yarn.cli.FlinkYarnSessionCli.lambda$main$3(FlinkYarnSessionCli.java:844)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1754)
at 
org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
at 
org.apache.flink.yarn.cli.FlinkYarnSessionCli.main(FlinkYarnSessionCli.java:844)
Caused by: 
org.apache.flink.yarn.AbstractYarnClusterDescriptor$YarnDeploymentException: 
The YARN application unexpectedly switched to state FAILED during deployment. 
Diagnostics from YARN: Application application_1565802461003_0608 failed 1 
times due to AM Container for appattempt_1565802461003_0608_01 exited with  
exitCode: 1
For more detailed output, check application tracking 
page:https://hadoop-btnn9001.eniot.io:8090/cluster/app/application_1565802461003_0608Then,
 click on links to logs of each attempt.
Diagnostics: Exception from container-launch.
Container id: container_e35_1565802461003_0608_01_01
Exit code: 1
Stack trace: ExitCodeException exitCode=1: 
at org.apache.hadoop.util.Shell.runCommand(Shell.java:545)
at org.apache.hadoop.util.Shell.run(Shell.java:456)
at 
org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:722)
at 
org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:387)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)

Shell output: main : command provided 1
main : run as user is flinktest
main : requested yarn user is flinktest


Container exited with a non-zero exit code 1
Failing this attempt. Failing the application.
If log aggregation is enabled on your cluster, use this command to further 
investigate the issue:
yarn logs -applicationId application_1565802461003_0608
at 
org.apache.flink.yarn.AbstractYarnClusterDescriptor.startAppMaster(AbstractYarnClusterDescriptor.java:1024)
at 
org.apache.flink.yarn.AbstractYarnClusterDescriptor.deployInternal(AbstractYarnClusterDescriptor.java:507)
at 
org.apache.flink.yarn.AbstractYarnClusterDescriptor.deploySessionCluster(AbstractYarnClusterDescriptor.java:378)
... 7 more
2019-08-29 09:42:04,723 INFO  
org.apache.flink.yarn.AbstractYarnClusterDescriptor   - Cancelling 
deployment from Deployment Failure Hook
2019-08-29 09:42:04,723 INFO  
org.apache.flink.yarn.AbstractYarnClusterDescriptor   - Killing YARN 
application
2019-08-29 09:42:04,729 INFO  org.apache.hadoop.io.retry.RetryInvocationHandler 
- Exception while invoking forceKillApplication of class 
ApplicationClientProtocolPBClientImpl over rm1. Trying to fail over immediately.
java.io.IOException: The client is stopped
at org.apache.hadoop.ipc.Client.getConnection(Client.java:1508)
at org.apache.hadoop.ipc.Client.call(Client.java:1452)
at org.apache.hadoop.ipc.Client.call(Client.java:1413)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
at com.sun.proxy.$Proxy7.forceKillApplication(Unknown Source)
at 
org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.forceKillApplication(ApplicationClientProtocolPBClientImpl.java:176)

[jira] [Commented] (FLINK-13895) Client does not exit when bin/yarn-session.sh come fail

2019-08-29 Thread Yu Wang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-13895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16918491#comment-16918491
 ] 

Yu Wang commented on FLINK-13895:
-

[~Tison] Please review it , Thanks

> Client does not exit when bin/yarn-session.sh come fail
> ---
>
> Key: FLINK-13895
> URL: https://issues.apache.org/jira/browse/FLINK-13895
> Project: Flink
>  Issue Type: Improvement
>  Components: Deployment / YARN
>Affects Versions: 1.9.0
>Reporter: Yu Wang
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> 2019-08-29 09:42:00,589 INFO  
> org.apache.flink.yarn.AbstractYarnClusterDescriptor   - Deploying 
> cluster, current state ACCEPTED
> 2019-08-29 09:42:04,718 ERROR org.apache.flink.yarn.cli.FlinkYarnSessionCli   
>   - Error while running the Flink Yarn session.
> org.apache.flink.client.deployment.ClusterDeploymentException: Couldn't 
> deploy Yarn session cluster
>   at 
> org.apache.flink.yarn.AbstractYarnClusterDescriptor.deploySessionCluster(AbstractYarnClusterDescriptor.java:385)
>   at 
> org.apache.flink.yarn.cli.FlinkYarnSessionCli.run(FlinkYarnSessionCli.java:616)
>   at 
> org.apache.flink.yarn.cli.FlinkYarnSessionCli.lambda$main$3(FlinkYarnSessionCli.java:844)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1754)
>   at 
> org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
>   at 
> org.apache.flink.yarn.cli.FlinkYarnSessionCli.main(FlinkYarnSessionCli.java:844)
> Caused by: 
> org.apache.flink.yarn.AbstractYarnClusterDescriptor$YarnDeploymentException: 
> The YARN application unexpectedly switched to state FAILED during deployment. 
> Diagnostics from YARN: Application application_1565802461003_0608 failed 1 
> times due to AM Container for appattempt_1565802461003_0608_01 exited 
> with  exitCode: 1
> For more detailed output, check application tracking 
> page:https://hadoop-btnn9001.eniot.io:8090/cluster/app/application_1565802461003_0608Then,
>  click on links to logs of each attempt.
> Diagnostics: Exception from container-launch.
> Container id: container_e35_1565802461003_0608_01_01
> Exit code: 1
> Stack trace: ExitCodeException exitCode=1: 
>   at org.apache.hadoop.util.Shell.runCommand(Shell.java:545)
>   at org.apache.hadoop.util.Shell.run(Shell.java:456)
>   at 
> org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:722)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:387)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
> Shell output: main : command provided 1
> main : run as user is flinktest
> main : requested yarn user is flinktest
> Container exited with a non-zero exit code 1
> Failing this attempt. Failing the application.
> If log aggregation is enabled on your cluster, use this command to further 
> investigate the issue:
> yarn logs -applicationId application_1565802461003_0608
>   at 
> org.apache.flink.yarn.AbstractYarnClusterDescriptor.startAppMaster(AbstractYarnClusterDescriptor.java:1024)
>   at 
> org.apache.flink.yarn.AbstractYarnClusterDescriptor.deployInternal(AbstractYarnClusterDescriptor.java:507)
>   at 
> org.apache.flink.yarn.AbstractYarnClusterDescriptor.deploySessionCluster(AbstractYarnClusterDescriptor.java:378)
>   ... 7 more
> 2019-08-29 09:42:04,723 INFO  
> org.apache.flink.yarn.AbstractYarnClusterDescriptor   - Cancelling 
> deployment from Deployment Failure Hook
> 2019-08-29 09:42:04,723 INFO  
> org.apache.flink.yarn.AbstractYarnClusterDescriptor   - Killing YARN 
> application
> 2019-08-29 09:42:04,729 INFO  
> org.apache.hadoop.io.retry.RetryInvocationHandler - Exception 
> while invoking forceKillApplication of class 
> ApplicationClientProtocolPBClientImpl over rm1. Trying to fail over 
> immediately.
> java.io.IOException: The client is stopped
>   at org.apache.hadoop.ipc.Client.getConnection(Client.java:1508)
>   at 

[jira] [Created] (FLINK-13895) Client does not exit when bin/yarn-session.sh come fail

2019-08-29 Thread Yu Wang (Jira)
Yu Wang created FLINK-13895:
---

 Summary: Client does not exit when bin/yarn-session.sh come fail
 Key: FLINK-13895
 URL: https://issues.apache.org/jira/browse/FLINK-13895
 Project: Flink
  Issue Type: Improvement
  Components: Deployment / YARN
Affects Versions: 1.9.0
Reporter: Yu Wang


2019-08-29 09:42:00,589 INFO  
org.apache.flink.yarn.AbstractYarnClusterDescriptor   - Deploying 
cluster, current state ACCEPTED
2019-08-29 09:42:04,718 ERROR org.apache.flink.yarn.cli.FlinkYarnSessionCli 
- Error while running the Flink Yarn session.
org.apache.flink.client.deployment.ClusterDeploymentException: Couldn't deploy 
Yarn session cluster
at 
org.apache.flink.yarn.AbstractYarnClusterDescriptor.deploySessionCluster(AbstractYarnClusterDescriptor.java:385)
at 
org.apache.flink.yarn.cli.FlinkYarnSessionCli.run(FlinkYarnSessionCli.java:616)
at 
org.apache.flink.yarn.cli.FlinkYarnSessionCli.lambda$main$3(FlinkYarnSessionCli.java:844)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1754)
at 
org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
at 
org.apache.flink.yarn.cli.FlinkYarnSessionCli.main(FlinkYarnSessionCli.java:844)
Caused by: 
org.apache.flink.yarn.AbstractYarnClusterDescriptor$YarnDeploymentException: 
The YARN application unexpectedly switched to state FAILED during deployment. 
Diagnostics from YARN: Application application_1565802461003_0608 failed 1 
times due to AM Container for appattempt_1565802461003_0608_01 exited with  
exitCode: 1
For more detailed output, check application tracking 
page:https://hadoop-btnn9001.eniot.io:8090/cluster/app/application_1565802461003_0608Then,
 click on links to logs of each attempt.
Diagnostics: Exception from container-launch.
Container id: container_e35_1565802461003_0608_01_01
Exit code: 1
Stack trace: ExitCodeException exitCode=1: 
at org.apache.hadoop.util.Shell.runCommand(Shell.java:545)
at org.apache.hadoop.util.Shell.run(Shell.java:456)
at 
org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:722)
at 
org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:387)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)

Shell output: main : command provided 1
main : run as user is flinktest
main : requested yarn user is flinktest


Container exited with a non-zero exit code 1
Failing this attempt. Failing the application.
If log aggregation is enabled on your cluster, use this command to further 
investigate the issue:
yarn logs -applicationId application_1565802461003_0608
at 
org.apache.flink.yarn.AbstractYarnClusterDescriptor.startAppMaster(AbstractYarnClusterDescriptor.java:1024)
at 
org.apache.flink.yarn.AbstractYarnClusterDescriptor.deployInternal(AbstractYarnClusterDescriptor.java:507)
at 
org.apache.flink.yarn.AbstractYarnClusterDescriptor.deploySessionCluster(AbstractYarnClusterDescriptor.java:378)
... 7 more
2019-08-29 09:42:04,723 INFO  
org.apache.flink.yarn.AbstractYarnClusterDescriptor   - Cancelling 
deployment from Deployment Failure Hook
2019-08-29 09:42:04,723 INFO  
org.apache.flink.yarn.AbstractYarnClusterDescriptor   - Killing YARN 
application
2019-08-29 09:42:04,729 INFO  org.apache.hadoop.io.retry.RetryInvocationHandler 
- Exception while invoking forceKillApplication of class 
ApplicationClientProtocolPBClientImpl over rm1. Trying to fail over immediately.
java.io.IOException: The client is stopped
at org.apache.hadoop.ipc.Client.getConnection(Client.java:1508)
at org.apache.hadoop.ipc.Client.call(Client.java:1452)
at org.apache.hadoop.ipc.Client.call(Client.java:1413)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
at com.sun.proxy.$Proxy7.forceKillApplication(Unknown Source)
at 
org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.forceKillApplication(ApplicationClientProtocolPBClientImpl.java:176)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  

[jira] [Updated] (FLINK-13831) Free Slots / All Slots display error

2019-08-23 Thread Yu Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-13831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yu Wang updated FLINK-13831:

Labels: web-dashboard  (was: pull-request-available)

> Free Slots / All Slots display error
> 
>
> Key: FLINK-13831
> URL: https://issues.apache.org/jira/browse/FLINK-13831
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Web Frontend
>Affects Versions: 1.9.0
>Reporter: Yu Wang
>Priority: Critical
>  Labels: web-dashboard
> Attachments: FLINK-13831.patch, slots.png
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Free Slots / All Slots display error



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Updated] (FLINK-13831) Free Slots / All Slots display error

2019-08-23 Thread Yu Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-13831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yu Wang updated FLINK-13831:

Description: Free Slots / All Slots display error  (was: slots total 
display error)

> Free Slots / All Slots display error
> 
>
> Key: FLINK-13831
> URL: https://issues.apache.org/jira/browse/FLINK-13831
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Web Frontend
>Affects Versions: 1.9.0
>Reporter: Yu Wang
>Priority: Critical
> Attachments: FLINK-13831.patch, slots.png
>
>
> Free Slots / All Slots display error



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Updated] (FLINK-13831) Free Slots / All Slots display error

2019-08-23 Thread Yu Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-13831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yu Wang updated FLINK-13831:

Summary: Free Slots / All Slots display error  (was: slots total display 
error)

> Free Slots / All Slots display error
> 
>
> Key: FLINK-13831
> URL: https://issues.apache.org/jira/browse/FLINK-13831
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Web Frontend
>Affects Versions: 1.9.0
>Reporter: Yu Wang
>Priority: Critical
> Attachments: FLINK-13831.patch, slots.png
>
>
> slots total display error



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Commented] (FLINK-13831) slots total display error

2019-08-23 Thread Yu Wang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-13831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16914113#comment-16914113
 ] 

Yu Wang commented on FLINK-13831:
-

[~Tison][~StephanEwen] Please review it , Thanks

> slots total display error
> -
>
> Key: FLINK-13831
> URL: https://issues.apache.org/jira/browse/FLINK-13831
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Web Frontend
>Affects Versions: 1.9.0
>Reporter: Yu Wang
>Priority: Critical
> Attachments: FLINK-13831.patch, slots.png
>
>
> slots total display error



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Updated] (FLINK-13831) slots total display error

2019-08-23 Thread Yu Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-13831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yu Wang updated FLINK-13831:

Attachment: FLINK-13831.patch

> slots total display error
> -
>
> Key: FLINK-13831
> URL: https://issues.apache.org/jira/browse/FLINK-13831
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Web Frontend
>Affects Versions: 1.9.0
>Reporter: Yu Wang
>Priority: Critical
> Attachments: FLINK-13831.patch, slots.png
>
>
> slots total display error



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Updated] (FLINK-13831) slots total display error

2019-08-23 Thread Yu Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-13831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yu Wang updated FLINK-13831:

Attachment: (was: FLINK-13831.patch)

> slots total display error
> -
>
> Key: FLINK-13831
> URL: https://issues.apache.org/jira/browse/FLINK-13831
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Web Frontend
>Affects Versions: 1.9.0
>Reporter: Yu Wang
>Priority: Critical
> Attachments: FLINK-13831.patch, slots.png
>
>
> slots total display error



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Updated] (FLINK-13831) slots total display error

2019-08-23 Thread Yu Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-13831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yu Wang updated FLINK-13831:

Attachment: FLINK-13831.patch

> slots total display error
> -
>
> Key: FLINK-13831
> URL: https://issues.apache.org/jira/browse/FLINK-13831
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Web Frontend
>Affects Versions: 1.9.0
>Reporter: Yu Wang
>Priority: Critical
> Attachments: FLINK-13831.patch, slots.png
>
>
> slots total display error



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Created] (FLINK-13831) slots total display error

2019-08-23 Thread Yu Wang (Jira)
Yu Wang created FLINK-13831:
---

 Summary: slots total display error
 Key: FLINK-13831
 URL: https://issues.apache.org/jira/browse/FLINK-13831
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Web Frontend
Affects Versions: 1.9.0
Reporter: Yu Wang
 Attachments: slots.png

slots total display error



--
This message was sent by Atlassian Jira
(v8.3.2#803003)