Re: Flink Web ui not stable in kubernetes?

2022-03-25 Thread Guillaume Vauvert

Hi,

I agree that the changelog description of 
https://issues.apache.org/jira/browse/FLINK-25732 covers only the 
technical root cause, not the consequences for users.

I have added a comment in https://issues.apache.org/jira/browse/FLINK-25732.

Have a nice day!

Guillaume

On 25/03/2022 10.31, Sebastian Struss wrote:

Hi Guillaume,

Thank you for this great hint! It did indeed fix the issue I mentioned.
Just from reading the changelog of 1.14.4 I would not have known that 
this fix was included, though maybe I was searching for the wrong thing.


Have a great day!
Sebastian


Re: Flink Web ui not stable in kubernetes?

2022-03-25 Thread Guillaume Vauvert

Hello Sebastian,

Several Flink 1.14.x releases are known to have issues with the UI and 
CLI; please upgrade to Flink 1.14.4.
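
To check which release a JobManager is actually running before and after
the upgrade, something like the sketch below may help. It assumes the
REST endpoint /config reports a "flink-version" field and that the
JobManager is reachable at the base URL you pass in (the URL is a
placeholder). The rule "1.14.x below 1.14.4 is affected" is only a
reading of this thread, not an official compatibility statement.

```python
import json
import re
import urllib.request
from typing import Tuple


def parse_version(text: str) -> Tuple[int, int, int]:
    """Parse the leading 'major.minor.patch' of a Flink version string."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"unrecognised version string: {text!r}")
    major, minor, patch = (int(part) for part in match.groups())
    return major, minor, patch


def is_affected(version_text: str) -> bool:
    """Assumption taken from this thread: 1.14.x before 1.14.4 is affected."""
    major, minor, patch = parse_version(version_text)
    return (major, minor) == (1, 14) and patch < 4


def running_version(base_url: str, timeout: float = 5.0) -> str:
    """Read "flink-version" from the /config endpoint of one JobManager."""
    url = base_url.rstrip("/") + "/config"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)["flink-version"]
```

For example, is_affected(running_version("http://jobmanager:8081")) would
be expected to return False once the cluster runs 1.14.4. The jar names in
the stack trace (flink-dist_2.12-1.14.2.jar) also show the running version.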


Best regards,

Guillaume

On 25/03/2022 08.42, Sebastian Struss wrote:

Hello all,

I've been setting up Flink in my Kubernetes cluster with 2 JobManagers 
and 1 TaskManager (using a custom Helm chart I wrote, without the Flink 
CLI).
I can access the web UI, but it often seems to switch the pod I am 
connected to, and as soon as I am connected to the standby JobManager 
it doesn't load at all.
Leader election does seem to work nicely: when I kill the leading pod, 
the standby instance takes over after ~5s.

I do see errors like this when I browse the web UI:

"""
2022-03-24 15:38:17,269 ERROR org.apache.flink.runtime.rest.handler.job.JobDetailsHandler  [] - Unhandled exception.
java.util.concurrent.CancellationException: null
    at java.util.concurrent.CompletableFuture.cancel(CompletableFuture.java:2276) ~[?:1.8.0_302]
    at org.apache.flink.runtime.rest.handler.legacy.DefaultExecutionGraphCache.getExecutionGraphInternal(DefaultExecutionGraphCache.java:98) ~[flink-dist_2.12-1.14.2.jar:1.14.2]
    at org.apache.flink.runtime.rest.handler.legacy.DefaultExecutionGraphCache.getExecutionGraphInfo(DefaultExecutionGraphCache.java:67) ~[flink-dist_2.12-1.14.2.jar:1.14.2]
    at org.apache.flink.runtime.rest.handler.job.AbstractExecutionGraphHandler.handleRequest(AbstractExecutionGraphHandler.java:81) ~[flink-dist_2.12-1.14.2.jar:1.14.2]
    at org.apache.flink.runtime.rest.handler.AbstractRestHandler.respondToRequest(AbstractRestHandler.java:83) ~[flink-dist_2.12-1.14.2.jar:1.14.2]
    at org.apache.flink.runtime.rest.handler.AbstractHandler.respondAsLeader(AbstractHandler.java:195) ~[flink-dist_2.12-1.14.2.jar:1.14.2]
    at org.apache.flink.runtime.rest.handler.LeaderRetrievalHandler.lambda$channelRead0$0(LeaderRetrievalHandler.java:83) ~[flink-dist_2.12-1.14.2.jar:1.14.2]
    at java.util.Optional.ifPresent(Optional.java:159) [?:1.8.0_302]
    at org.apache.flink.util.OptionalConsumer.ifPresent(OptionalConsumer.java:45) [flink-dist_2.12-1.14.2.jar:1.14.2]
    at org.apache.flink.runtime.rest.handler.LeaderRetrievalHandler.channelRead0(LeaderRetrievalHandler.java:80) [flink-dist_2.12-1.14.2.jar:1.14.2]
    at org.apache.flink.runtime.rest.handler.LeaderRetrievalHandler.channelRead0(LeaderRetrievalHandler.java:49) [flink-dist_2.12-1.14.2.jar:1.14.2]
    at org.apache.flink.shaded.netty4.io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:99) [flink-dist_2.12-1.14.2.jar:1.14.2]
    at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) [flink-dist_2.12-1.14.2.jar:1.14.2]
    at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) [flink-dist_2.12-1.14.2.jar:1.14.2]
    at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) [flink-dist_2.12-1.14.2.jar:1.14.2]
    at org.apache.flink.runtime.rest.handler.router.RouterHandler.routed(RouterHandler.java:115) [flink-dist_2.12-1.14.2.jar:1.14.2]
    at org.apache.flink.runtime.rest.handler.router.RouterHandler.channelRead0(RouterHandler.java:94) [flink-dist_2.12-1.14.2.jar:1.14.2]
    at org.apache.flink.runtime.rest.handler.router.RouterHandler.channelRead0(RouterHandler.java:55) [flink-dist_2.12-1.14.2.jar:1.14.2]
    at org.apache.flink.shaded.netty4.io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:99) [flink-dist_2.12-1.14.2.jar:1.14.2]
    at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) [flink-dist_2.12-1.14.2.jar:1.14.2]
    at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) [flink-dist_2.12-1.14.2.jar:1.14.2]
    at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) [flink-dist_2.12-1.14.2.jar:1.14.2]
    at org.apache.flink.shaded.netty4.io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103) [flink-dist_2.12-1.14.2.jar:1.14.2]
    at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) [flink-dist_2.12-1.14.2.jar:1.14.2]
    at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) [flink-dist_2.12-1.14.2.jar:1.14.2]
    at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) [flink-dist_2.12-1.14.2.jar:1.14.2]
    at org.apache.flink.runtime.rest.FileUploadHandler.channelRead0(FileUploadHandler.java:238)

Re: Issue with Flink UI for Flink 1.14.0

2022-02-10 Thread Guillaume Vauvert

Hi,

This issue affects all deployments with two or more JobManagers (HA 
mode), because in that case serialization is used, depending on which 
JobManager responds (the leader or a follower).


It prevents:

* usage of Flink UI

* usage of the Flink CLI command "flink list"

* usage of Flink REST API "/jobs/overview"

There are workarounds for all of these, but they require additional 
work, so the impact is significant.
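
To see which JobManager is failing, a small probe along these lines can
be run against each instance. This is only a sketch: the base URLs are
placeholders, and it assumes that a failing JobManager answers
/jobs/overview with a JSON body containing an "errors" list, as in the
{"errors":["Internal server error.",""]} responses quoted in this thread.

```python
import json
import urllib.error
import urllib.request
from typing import Callable, Dict, Iterable


def http_get(url: str, timeout: float = 5.0) -> str:
    """Return the response body, including the body of HTTP error replies."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.read().decode("utf-8")
    except urllib.error.HTTPError as err:
        # An error status still carries a body; return it for inspection.
        return err.read().decode("utf-8")


def is_error_body(body: str) -> bool:
    """True if the body is not valid JSON or carries an "errors" entry."""
    try:
        payload = json.loads(body)
    except ValueError:
        return True
    return isinstance(payload, dict) and "errors" in payload


def probe_job_managers(
    base_urls: Iterable[str],
    fetch: Callable[[str], str] = http_get,
) -> Dict[str, bool]:
    """Map each JobManager base URL to True (answers) or False (fails)."""
    results: Dict[str, bool] = {}
    for base in base_urls:
        try:
            body = fetch(base.rstrip("/") + "/jobs/overview")
        except OSError:
            # Connection refused, DNS failure, timeout, and similar.
            results[base] = False
            continue
        results[base] = not is_error_body(body)
    return results
```

Calling probe_job_managers(["http://jm-0:8081", "http://jm-1:8081"]) with
your own addresses returns one boolean per JobManager. The fetch parameter
exists only so the probe can be exercised without a live cluster.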


Would it be possible to release sooner than "planned"?

Thanks !

--

Guillaume

On 10/02/2022 11.35, Roman Khachatryan wrote:

Hi,

AFAIK there are no plans currently to release 1.14.4.
The previous one (1.14.3) was released on Jan 20, so I'd expect 
1.14.4 preparation to start in the next several weeks.


Regards,
Roman


On Tue, Feb 8, 2022 at 7:31 PM Sweta Kalakuntla wrote:


I am facing the same issue, do we know when 1.14.4 will be released?

Thanks.

On Fri, Jan 21, 2022 at 3:28 AM Chesnay Schepler wrote:

While FLINK-24550 was indeed fixed, unfortunately a similar bug
was introduced at the same time
(https://issues.apache.org/jira/browse/FLINK-25732).

On 20/01/2022 21:18, Peter Westermann wrote:


Just tried this again with Flink 1.14.3 since
https://issues.apache.org/jira/browse/FLINK-24550 is listed
as fixed. I am running into similar errors when calling the
/v1/jobs/overview endpoint (without any running jobs):

{"errors":["Internal server error.",""]}

Peter Westermann

Team Lead – Realtime Analytics

peter.westerm...@genesys.com



*From: *Dawid Wysakowicz
*Date: *Thursday, October 14, 2021 at 10:00 AM
*To: *Peter Westermann, user@flink.apache.org
*Subject: *Re: Issue with Flink UI for Flink 1.14.0

I am afraid it is a bug in Flink 1.14. I created a ticket for
it: FLINK-24550 [1]. I believe we should pick it up soonish.
Thanks for reporting the issue!

Best,

Dawid

[1] https://issues.apache.org/jira/browse/FLINK-24550

On 13/10/2021 20:32, Peter Westermann wrote:

Hello,

I just started testing Flink 1.14.0 and noticed some
weird behavior. This is for a Flink cluster with
zookeeper for HA and two job managers (one leader, one
backup). The UI on the leader works fine. The UI on the
other job manager does not load any job-specific data.
The same applies to the REST interface. If I request job
data from /v1/jobs/{jobId}, I get the expected response
on the leader but on the other job manager, I only get an
exception stack trace:

{"errors":["Internal server error.",""]}

Peter Westermann

Team Lead – Realtime Analytics

peter.westerm...@genesys.com





[Statefun] Interaction Protocol for Statefun

2021-03-13 Thread Guillaume Vauvert

Hi Statefun folks,

(I already posted this message in the devs mailing list, but as this
feature may be added both as core and as an external package, both
lists are good candidates ...)

Now that we have a powerful framework to manage stateful functions, part
of the algorithmic complexity has moved to the interactions between
functions. Implementing an interaction protocol is complex and
error-prone, so it would be great to provide an official/supported
library of classical interaction protocols, such as (from distributed
algorithms) consensus, election, broadcast, auctions ...

A protocol could define a set of roles, each role implementing the
interaction with the other roles and leaving the developer to implement the
decision logic. For example, in an auction protocol, the "bidder" role
manages the interaction with the auctioneer and asks the developer to
implement the functions onItemForSale, onNewBidMade and onWinnerSelected.
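
To make the idea concrete, here is a rough sketch in plain Python. It
does not use the Statefun SDK; the Message type, the message kinds and
the send callback are all made up for illustration. The three hooks
correspond to onItemForSale, onNewBidMade and onWinnerSelected, written
in Python naming style.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Message:
    """Hypothetical wire message exchanged between auction roles."""
    kind: str            # "item_for_sale", "new_bid", "winner_selected", "bid"
    item: str
    amount: float = 0.0
    bidder: str = ""


class BidderRole(ABC):
    """Protocol side of the bidder: routes messages, sends bids."""

    def __init__(self, name: str, send: Callable[[Message], None]) -> None:
        self.name = name
        self.send = send  # stands in for "send a message to the auctioneer"

    def handle(self, msg: Message) -> None:
        if msg.kind == "item_for_sale":
            bid = self.on_item_for_sale(msg.item, msg.amount)
        elif msg.kind == "new_bid":
            if msg.bidder == self.name:
                return  # our own bid echoed back, nothing to decide
            bid = self.on_new_bid_made(msg.item, msg.amount, msg.bidder)
        elif msg.kind == "winner_selected":
            self.on_winner_selected(msg.item, msg.bidder, msg.amount)
            return
        else:
            raise ValueError(f"unexpected message kind: {msg.kind}")
        if bid is not None:
            self.send(Message("bid", msg.item, bid, self.name))

    # Decision logic left to the developer; return None to stay silent.
    @abstractmethod
    def on_item_for_sale(self, item: str, start_price: float) -> Optional[float]: ...

    @abstractmethod
    def on_new_bid_made(self, item: str, amount: float, bidder: str) -> Optional[float]: ...

    @abstractmethod
    def on_winner_selected(self, item: str, winner: str, amount: float) -> None: ...


class CappedBidder(BidderRole):
    """Example decision logic: outbid by 1.0 until a fixed limit is reached."""

    def __init__(self, name: str, send: Callable[[Message], None], limit: float) -> None:
        super().__init__(name, send)
        self.limit = limit
        self.won = False

    def _raise(self, amount: float) -> Optional[float]:
        offer = amount + 1.0
        return offer if offer <= self.limit else None

    def on_item_for_sale(self, item, start_price):
        return self._raise(start_price)

    def on_new_bid_made(self, item, amount, bidder):
        return self._raise(amount)

    def on_winner_selected(self, item, winner, amount):
        self.won = winner == self.name
```

The developer only writes something like CappedBidder; message routing
and bid sending stay in the role class, which is what a protocol library
would provide.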


The next step could be to allow any interaction protocol to be defined in a
formal specification language, and to compile the specification into Statefun.
But that's a long-term goal. I have found multiple approaches and languages,
more or less expressive/powerful/complex, but I have not started to compare
them.


Have you heard of such work or approaches? Is there any ongoing work in this
direction for Statefun? What do you think about the proposal itself?


Thanks !

--

Guillaume Vauvert