Re: Multiple JobManager HA set up for Standalone Kubernetes

2021-03-05 Thread Yang Wang
Hi deepthi,

Thanks for trying the Kubernetes HA service.

> Do I need standby JobManagers?
I think the answer depends on your production requirements. Usually it is
unnecessary to have more than one JobManager, because the Kubernetes
Deployment manages the JobManager: once it crashes unexpectedly, a new one
will be launched. But if you want faster recovery, standby JobManagers
could help, especially in the case of a Kubernetes node failure, since they
save the time needed to schedule and launch a new pod.

> Why do multiple JobManagers not work with the HA service?
I noticed that all the JobManagers are using the "flink-jobmanager" service
name as the rpc address. This should not happen. Instead, each JobManager
should be started with its own pod IP. You can find how to set the pod IP
for "jobmanager.rpc.address" here [1]. The most important change is to add
"--host", "$(_POD_IP_ADDRESS)" to the args.
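
A minimal sketch of what that could look like in the JobManager Deployment
(the container name, image tag, and the rest of the manifest are only
examples with selector/labels omitted; the env var name follows the comment
in [1]):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: flink-jobmanager
spec:
  template:
    spec:
      containers:
        - name: jobmanager
          image: flink:1.12.1          # example image tag, use your own
          env:
            # expose the pod IP through the Kubernetes downward API
            - name: _POD_IP_ADDRESS
              valueFrom:
                fieldRef:
                  apiVersion: v1
                  fieldPath: status.podIP
          # start the JobManager with the pod IP instead of the service name
          args: ["jobmanager", "--host", "$(_POD_IP_ADDRESS)"]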

[1].
https://issues.apache.org/jira/browse/FLINK-20982?focusedCommentId=17265715&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17265715


Best,
Yang


deepthi Sridharan wrote on Fri, Mar 5, 2021 at 2:26 PM:

> I am trying to figure out the right architecture for running Standalone
> Kubernetes with JobManager HA. The documentation for running HA seems to
> always suggest that there should be multiple JobManagers, but there aren't
> many instructions on how to set that up, since most deployment
> recommendations suggest running a single JobManager. A few talks I found
> online only refer to multiple JobManager instances when running with
> ZooKeeper HA, and the ones for K8s HA all seem to have 1 JobManager.
>
> I tried running a setup with 2 replicas for the JobManager and I could see
> one of them getting leadership for a previously submitted job (from when
> the replica count was 1).
>
> 2021-03-04 18:34:19,282 INFO
>  org.apache.flink.runtime.dispatcher.DispatcherRestEndpoint   [] -
> http://flink-jobmanager:8081 was granted leadership with
> leaderSessionID=3e9a9d16-dc30-4ee1-9556-43b199db826d
> 2021-03-04 18:34:20,773 INFO
>  org.apache.flink.kubernetes.kubeclient.resources.KubernetesLeaderElector
> [] - New leader elected e1f3af04-073b-4ab8-9261-9a18b3bf85d7 for
> flink-dks-ha1-dispatcher-leader.
> 2021-03-04 18:34:20,786 INFO
>  org.apache.flink.runtime.dispatcher.runner.SessionDispatcherLeaderProcess
> [] - Start SessionDispatcherLeaderProcess.
> 2021-03-04 18:34:20,787 INFO
>  org.apache.flink.runtime.dispatcher.runner.SessionDispatcherLeaderProcess
> [] - Recover all persisted job graphs.
> 2021-03-04 18:34:20,794 INFO
>  org.apache.flink.runtime.jobmanager.DefaultJobGraphStore [] -
> Retrieved job ids [7dc4e15f392ccd826a0bc95e8755b410] from
> KubernetesStateHandleStore{configMapName='flink-dks-ha1-dispatcher-leader'}
> 2021-03-04 18:34:20,794 INFO
>  org.apache.flink.runtime.dispatcher.runner.SessionDispatcherLeaderProcess
> [] - Trying to recover job with job id 7dc4e15f392ccd826a0bc95e8755b410.
> 2021-03-04 18:34:20,862 INFO
>  org.apache.flink.runtime.dispatcher.runner.SessionDispatcherLeaderProcess
> [] - Successfully recovered 1 persisted job graphs.
> 2021-03-04 18:34:21,134 INFO
>  org.apache.flink.runtime.checkpoint.DefaultCompletedCheckpointStore [] -
> Recovering checkpoints from
> KubernetesStateHandleStore{configMapName='flink-dks-ha1-7dc4e15f392ccd826a0bc95e8755b410-jobmanager-leader'}.
> 2021-03-04 18:34:21,145 INFO
>  org.apache.flink.runtime.checkpoint.DefaultCompletedCheckpointStore [] -
> Found 5 checkpoints in
> KubernetesStateHandleStore{configMapName='flink-dks-ha1-7dc4e15f392ccd826a0bc95e8755b410-jobmanager-leader'}.
>
> But after killing the leader JobManager instance with
>
> kubectl exec {jobmanager_pod_name} -- /bin/sh -c "kill 1"
>
>
> I don't see the task managers resuming processing. I have 2 TaskManager
> pods and they both seem to be stuck here:
>
> 2021-03-04 19:57:47,647
> INFO  org.apache.flink.contrib.streaming.state.RocksDBStateBackend [] -
> Using predefined options: DEFAULT.
>
> 2021-03-04 19:57:47,648
> INFO  org.apache.flink.contrib.streaming.state.RocksDBStateBackend [] -
> Using default options factory:
> DefaultConfigurableOptionsFactory{configuredOptions={}}.
>
> 2021-03-04 19:57:47,648
> INFO  org.apache.flink.contrib.streaming.state.RocksDBStateBackend [] -
> Getting managed memory shared cache for RocksDB.
>
> 2021-03-04 19:57:47,648
> INFO  org.apache.flink.contrib.streaming.state.RocksDBStateBackend [] -
> Obtained shared RocksDB cache of size 268435460 bytes
>
> while the same test with 1 JobManager shows the task managers successfully
> communicating with the restarted JobManager, getting their assignments,
> and resuming processing from saved checkpoints.
>
> Is running multiple JobManagers not the recommended way to run JobManager
> HA for Kubernetes? If so, is there any documentation on how to run
> multiple JMs?

Multiple JobManager HA set up for Standalone Kubernetes

2021-03-04 Thread deepthi Sridharan
I am trying to figure out the right architecture for running Standalone
Kubernetes with JobManager HA. The documentation for running HA seems to
always suggest that there should be multiple JobManagers, but there aren't
many instructions on how to set that up, since most deployment
recommendations suggest running a single JobManager. A few talks I found
online only refer to multiple JobManager instances when running with
ZooKeeper HA, and the ones for K8s HA all seem to have 1 JobManager.

I tried running a setup with 2 replicas for the JobManager and I could see
one of them getting leadership for a previously submitted job (from when
the replica count was 1).
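
For reference, the HA-related parts of the configuration look roughly like
this (the storage path is only a placeholder, and the rest of
flink-conf.yaml and the manifests is omitted):

flink-conf.yaml (from the ConfigMap):
  kubernetes.cluster-id: flink-dks-ha1
  high-availability: org.apache.flink.kubernetes.highavailability.KubernetesHaServicesFactory
  # placeholder path; any durable storage reachable by all JobManagers
  high-availability.storageDir: s3://my-bucket/flink-ha

JobManager Deployment:
  spec:
    replicas: 2   # one expected leader plus one standby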

2021-03-04 18:34:19,282 INFO
 org.apache.flink.runtime.dispatcher.DispatcherRestEndpoint   [] -
http://flink-jobmanager:8081 was granted leadership with
leaderSessionID=3e9a9d16-dc30-4ee1-9556-43b199db826d
2021-03-04 18:34:20,773 INFO
 org.apache.flink.kubernetes.kubeclient.resources.KubernetesLeaderElector
[] - New leader elected e1f3af04-073b-4ab8-9261-9a18b3bf85d7 for
flink-dks-ha1-dispatcher-leader.
2021-03-04 18:34:20,786 INFO
 org.apache.flink.runtime.dispatcher.runner.SessionDispatcherLeaderProcess
[] - Start SessionDispatcherLeaderProcess.
2021-03-04 18:34:20,787 INFO
 org.apache.flink.runtime.dispatcher.runner.SessionDispatcherLeaderProcess
[] - Recover all persisted job graphs.
2021-03-04 18:34:20,794 INFO
 org.apache.flink.runtime.jobmanager.DefaultJobGraphStore [] -
Retrieved job ids [7dc4e15f392ccd826a0bc95e8755b410] from
KubernetesStateHandleStore{configMapName='flink-dks-ha1-dispatcher-leader'}
2021-03-04 18:34:20,794 INFO
 org.apache.flink.runtime.dispatcher.runner.SessionDispatcherLeaderProcess
[] - Trying to recover job with job id 7dc4e15f392ccd826a0bc95e8755b410.
2021-03-04 18:34:20,862 INFO
 org.apache.flink.runtime.dispatcher.runner.SessionDispatcherLeaderProcess
[] - Successfully recovered 1 persisted job graphs.
2021-03-04 18:34:21,134 INFO
 org.apache.flink.runtime.checkpoint.DefaultCompletedCheckpointStore [] -
Recovering checkpoints from
KubernetesStateHandleStore{configMapName='flink-dks-ha1-7dc4e15f392ccd826a0bc95e8755b410-jobmanager-leader'}.
2021-03-04 18:34:21,145 INFO
 org.apache.flink.runtime.checkpoint.DefaultCompletedCheckpointStore [] -
Found 5 checkpoints in
KubernetesStateHandleStore{configMapName='flink-dks-ha1-7dc4e15f392ccd826a0bc95e8755b410-jobmanager-leader'}.

But after killing the leader JobManager instance with

kubectl exec {jobmanager_pod_name} -- /bin/sh -c "kill 1"


I don't see the task managers resuming processing. I have 2 TaskManager
pods and they both seem to be stuck here:

2021-03-04 19:57:47,647
INFO  org.apache.flink.contrib.streaming.state.RocksDBStateBackend [] -
Using predefined options: DEFAULT.

2021-03-04 19:57:47,648
INFO  org.apache.flink.contrib.streaming.state.RocksDBStateBackend [] -
Using default options factory:
DefaultConfigurableOptionsFactory{configuredOptions={}}.

2021-03-04 19:57:47,648
INFO  org.apache.flink.contrib.streaming.state.RocksDBStateBackend [] -
Getting managed memory shared cache for RocksDB.

2021-03-04 19:57:47,648
INFO  org.apache.flink.contrib.streaming.state.RocksDBStateBackend [] -
Obtained shared RocksDB cache of size 268435460 bytes

while the same test with 1 JobManager shows the task managers successfully
communicating with the restarted JobManager, getting their assignments,
and resuming processing from saved checkpoints.

Is running multiple JobManagers not the recommended way to run JobManager
HA for Kubernetes? If so, is there any documentation on how to run multiple
JMs?


-- 
Thank you,
Deepthi