[ 
https://issues.apache.org/jira/browse/FLINK-13660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16959611#comment-16959611
 ] 

Chesnay Schepler commented on FLINK-13660:
------------------------------------------

Proposal (3) is already possible by configuring {{web.upload.dir}}; this option 
controls where uploaded jars are stored and supports distributed filesystems.
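
To illustrate why a shared upload directory addresses the "disappearing jar" 
symptom, here is a small self-contained Java model. JobManagerStub and 
UploadDirDemo are hypothetical classes, not Flink APIs. A local temp directory 
stands in for a distributed filesystem path. In a real cluster the equivalent 
would be pointing {{web.upload.dir}} of every JM at the same shared location. 
Each stub writes uploads to, and lists jars from, its own configured directory. 
This mirrors the per-JM local behaviour described in the issue.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical stand-in for one JobManager's jar REST handlers (not a Flink class):
// uploads are written to, and listings read from, the JM's configured upload directory.
class JobManagerStub {
    private final Path uploadDir;

    JobManagerStub(Path uploadDir) throws IOException {
        this.uploadDir = Files.createDirectories(uploadDir);
    }

    void uploadJar(String name, byte[] content) throws IOException {
        Files.write(uploadDir.resolve(name), content);
    }

    List<String> listJars() throws IOException {
        try (Stream<Path> files = Files.list(uploadDir)) {
            return files.map(p -> p.getFileName().toString())
                        .sorted()
                        .collect(Collectors.toList());
        }
    }
}

class UploadDirDemo {
    // Uploads a jar through JM 0, then asks all three JMs for their jar list.
    // Returns how many JMs report the jar.
    static int jmsSeeingJar(boolean sharedUploadDir) {
        try {
            Path root = Files.createTempDirectory("web-upload-demo");
            List<JobManagerStub> jms = new ArrayList<>();
            for (int i = 0; i < 3; i++) {
                // Shared: all JMs point at one directory (a temp dir standing in for a DFS path).
                // Local: each JM gets its own directory, like per-pod local disk.
                Path dir = sharedUploadDir ? root.resolve("shared") : root.resolve("jm-" + i);
                jms.add(new JobManagerStub(dir));
            }
            jms.get(0).uploadJar("job.jar", new byte[] {1, 2, 3});
            int count = 0;
            for (JobManagerStub jm : jms) {
                if (jm.listJars().contains("job.jar")) {
                    count++;
                }
            }
            return count;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("per-JM local dirs: " + jmsSeeingJar(false) + " of 3 JMs list the jar");
        System.out.println("shared upload dir: " + jmsSeeingJar(true) + " of 3 JMs list the jar");
    }
}
```

With per-JM directories only the JM that received the upload lists the jar. 
A randomly routed "list jars" request therefore sees it roughly one time in 
three. With a shared directory all three JMs report it.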

I'd prefer not to introduce redirections on the REST layer; redirections to the 
leader should happen underneath, on the RPC layer. Redirections alone won't 
solve the problem in any case, since submission can still fail if the leader 
changes between the jar upload and the job submission.
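
The race described above can be reproduced with a toy model. RedirectRaceDemo 
is a hypothetical class, not Flink code. It assumes every REST call is 
redirected to whichever JM is leader at that moment. It also assumes each JM 
keeps uploaded jars only in its own local storage. A leadership change between 
the two calls is enough to lose the jar.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model: REST requests are redirected to the JM that is leader at the
// time of the request, and each JM keeps uploaded jars in its own local storage.
class RedirectRaceDemo {
    private final List<Set<String>> localJars = new ArrayList<>();
    private int leader = 0;

    RedirectRaceDemo(int jobManagers) {
        for (int i = 0; i < jobManagers; i++) {
            localJars.add(new HashSet<>());
        }
    }

    // Upload request: redirected to the current leader and stored on its local disk.
    void upload(String jar) {
        localJars.get(leader).add(jar);
    }

    // Run request: also redirected to the current leader; fails if that JM never saw the jar.
    boolean run(String jar) {
        return localJars.get(leader).contains(jar);
    }

    // Simulates a leader election picking a different JM.
    void failover(int newLeader) {
        leader = newLeader;
    }

    public static void main(String[] args) {
        RedirectRaceDemo stable = new RedirectRaceDemo(3);
        stable.upload("job.jar");
        System.out.println("no failover:   run succeeded = " + stable.run("job.jar"));

        RedirectRaceDemo flaky = new RedirectRaceDemo(3);
        flaky.upload("job.jar");
        flaky.failover(1); // leadership moves between the two REST calls
        System.out.println("with failover: run succeeded = " + flaky.run("job.jar"));
    }
}
```

Redirection makes both calls land on "the leader". Without shared or replicated 
jar storage, "the leader" can be a different JM for each call.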

Storing the jars in the blob service might be feasible, but introduces a fair 
bit of complexity.

Something I've been wondering for a while is whether there should be a way to 
upload and run a jar in a single call, similar to how CLI submission works.
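
As a rough sketch of why a single call sidesteps the problem: the jar bytes 
travel with the submission, so no JM has to have retained an earlier upload. 
The endpoint and names below (SingleCallDemo, uploadAndRun, com.example.Main) 
are hypothetical and do not describe an existing Flink REST API. Forwarding to 
the current leader stands in for the RPC-layer redirection mentioned above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of a combined "upload and run" REST call (not an existing Flink
// endpoint). The jar bytes arrive with the submission, so no JM needs to have kept an
// earlier upload; the receiving JM forwards the job to whichever JM leads at that
// moment, standing in for leader redirection on the RPC layer.
class SingleCallDemo {
    private final List<List<String>> acceptedByJm = new ArrayList<>();
    private int leader = 0;

    SingleCallDemo(int jobManagers) {
        for (int i = 0; i < jobManagers; i++) {
            acceptedByJm.add(new ArrayList<>());
        }
    }

    // Simulates a leader election picking a different JM.
    void failover(int newLeader) {
        leader = newLeader;
    }

    // Any JM may receive the request; it validates the payload and forwards the job
    // to the current leader. Returns the index of the JM that accepted the job.
    int uploadAndRun(int receivingJm, String jarName, byte[] jarBytes, String entryClass) {
        if (jarBytes == null || jarBytes.length == 0) {
            throw new IllegalArgumentException("request must carry the jar bytes");
        }
        String job = jarName + "#" + entryClass + " (received by JM " + receivingJm + ")";
        acceptedByJm.get(leader).add(job);
        return leader;
    }

    List<String> acceptedBy(int jm) {
        return acceptedByJm.get(jm);
    }

    public static void main(String[] args) {
        SingleCallDemo cluster = new SingleCallDemo(3);
        cluster.failover(1); // leadership may change at any time before the call
        int acceptor = cluster.uploadAndRun(2, "job.jar", new byte[] {1, 2, 3}, "com.example.Main");
        System.out.println("accepted by JM " + acceptor + ": " + cluster.acceptedBy(acceptor));
    }
}
```

Because there is no state carried between two separate REST calls, a failover 
before the request does not matter. Only a failover during the RPC forwarding 
itself would, and that is the same concern CLI submission already has.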

> Cannot submit job on Flink session cluster on kubernetes with multiple JM 
> pods (zk HA) through web frontend
> -----------------------------------------------------------------------------------------------------------
>
>                 Key: FLINK-13660
>                 URL: https://issues.apache.org/jira/browse/FLINK-13660
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Coordination, Runtime / Web Frontend
>    Affects Versions: 1.9.0
>            Reporter: MalcolmSanders
>            Priority: Minor
>
> Hi all,
> I have been testing the HighAvailabilityService of Flink 1.9 on k8s. When 
> testing a Flink session cluster with 3 JM pods deployed on k8s, I found that a 
> jar I had uploaded through the web frontend keeps disappearing from the 
> "Uploaded Jars" web page. As a result, it's hard to submit the job.
> After investigating, I found that this comes down to (1) the implementation of 
> the "handleRequest" method of the "JarListHandler" and "JarUploadHandler" 
> RestHandlers, combined with (2) the routing mechanism of k8s services.
> (1) It seems to me that the "handleRequest" method should dispatch the message 
> to the leader JM through the "DispatcherGateway gateway". However, the two 
> RestHandlers don't use the gateway and just handle the request locally. That 
> is to say, if an "upload jar" or "list loaded jars" request is sent to any of 
> the 3 JMs, that JM only stores jars in, or fetches them from, its local 
> directory.
> (2) I use a k8s service to open the Flink web page; the URL pattern is (PS: 
> start "kubectl proxy" locally): 
> http://127.0.0.1:8001/api/v1/namespaces/${my_ns}/services/${my_session_cluster_service}:ui/proxy/#/submit
> Since there are 3 endpoints (3 JMs) behind this k8s service, the k8s routing 
> mechanism randomly chooses which endpoint (JM) each REST message is sent to.
> As a result of these two factors, a Flink session cluster cannot be deployed 
> with multiple JMs using HighAvailabilityService on k8s.
> Proposals:
> (1) redirect jar-related REST messages to the leader JM
> (2) (along with proposal (1)) synchronize jar files to the standby JMs in case 
> a standby JM takes over leadership
> (3) support uploading jars to a global filesystem (e.g. a DFS)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
