Re: JobGraphs not cleaned up in HA mode

2019-11-28 Thread 曾祥才
The chk-* directory is not found. I think it is missing because the jobmanager
removes it automatically, but why is it still in ZooKeeper?








Re: JobGraphs not cleaned up in HA mode

2019-11-28 Thread Vijay Bhaskar
One more thing:
You configured:
high-availability.cluster-id: /cluster-test
it should be:
high-availability.cluster-id: cluster-test
I don't think this is a major issue, but in case it helps, you can check it.
Can you check one more thing:
Is checkpointing happening or not?
Were you able to see the chk-* folders under the checkpoint directory?

Regards
Bhaskar
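For reference, here is a minimal sketch of the HA-related keys with the suggestion above applied (cluster-id without the leading slash). The quorum address is a placeholder, and the file:// scheme assumes the NAS mount is identical on every JobManager and TaskManager pod:

high-availability: zookeeper
high-availability.cluster-id: cluster-test
high-availability.storageDir: file:///flink/ha
high-availability.zookeeper.quorum: zk-host:2181
high-availability.zookeeper.path.root: /flink/risk-insight
state.backend: filesystem
state.checkpoints.dir: file:///flink/checkpoints
state.checkpoints.num-retained: 2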

On Thu, Nov 28, 2019 at 5:00 PM 曾祥才  wrote:

> hi,
> Is there any difference (for me, using NAS is more convenient to test
> currently)?
> From the docs it seems HDFS, S3, NFS, etc. will all be fine.
>
>
>
> ---- Original Message ----
> *From:* "vino yang";
> *Sent:* Thursday, Nov 28, 2019, 7:17 PM
> *To:* "曾祥才";
> *Cc:* "Vijay Bhaskar";"User-Flink"<
> user@flink.apache.org>;
> *Subject:* Re: JobGraphs not cleaned up in HA mode
>
> Hi,
>
> Why do you not use HDFS directly?
>
> Best,
> Vino
>
曾祥才 wrote on Thu, Nov 28, 2019 at 6:48 PM:
>
>>
>> Does anyone have the same problem? Please help, thanks.
>>
>>
>>
>> ---- Original Message ----
>> *From:* "曾祥才";
>> *Sent:* Thursday, Nov 28, 2019, 2:46 PM
>> *To:* "Vijay Bhaskar";
>> *Cc:* "User-Flink";
>> *Subject:* Re: JobGraphs not cleaned up in HA mode
>>
>> the config (/flink is the NAS directory):
>>
>> jobmanager.rpc.address: flink-jobmanager
>> taskmanager.numberOfTaskSlots: 16
>> web.upload.dir: /flink/webUpload
>> blob.server.port: 6124
>> jobmanager.rpc.port: 6123
>> taskmanager.rpc.port: 6122
>> jobmanager.heap.size: 1024m
>> taskmanager.heap.size: 1024m
>> high-availability: zookeeper
>> high-availability.cluster-id: /cluster-test
>> high-availability.storageDir: /flink/ha
>> high-availability.zookeeper.quorum: :2181
>> high-availability.jobmanager.port: 6123
>> high-availability.zookeeper.path.root: /flink/risk-insight
>> high-availability.zookeeper.path.checkpoints: /flink/zk-checkpoints
>> state.backend: filesystem
>> state.checkpoints.dir: file:///flink/checkpoints
>> state.savepoints.dir: file:///flink/savepoints
>> state.checkpoints.num-retained: 2
>> jobmanager.execution.failover-strategy: region
>> jobmanager.archive.fs.dir: file:///flink/archive/history
>>
>>
>>
>> ---- Original Message ----
>> *From:* "Vijay Bhaskar";
>> *Sent:* Thursday, Nov 28, 2019, 3:12 PM
>> *To:* "曾祥才";
>> *Cc:* "User-Flink";
>> *Subject:* Re: JobGraphs not cleaned up in HA mode
>>
>> Can you share the flink configuration once?
>>
>> Regards
>> Bhaskar
>>
>> On Thu, Nov 28, 2019 at 12:09 PM 曾祥才  wrote:
>>
>>> If I clean the ZooKeeper data, it runs fine. But the next time the
>>> jobmanager fails and is redeployed, the error occurs again.
>>>
>>>
>>>
>>>
>>> ---- Original Message ----
>>> *From:* "Vijay Bhaskar";
>>> *Sent:* Thursday, Nov 28, 2019, 3:05 PM
>>> *To:* "曾祥才";
>>> *Subject:* Re: JobGraphs not cleaned up in HA mode
>>>
>>> Again it could not find the state store file: "Caused by:
>>> java.io.FileNotFoundException: /flink/ha/submittedJobGraph0c6bcff01199 "
>>> Check why it is unable to find it.
>>> A better approach is to clean up the ZooKeeper state, check your configurations,
>>> correct them, and restart the cluster.
>>> Otherwise it will always pick up the corrupted state from ZooKeeper and will
>>> never restart.
>>>
>>> Regards
>>> Bhaskar
>>>
>>> On Thu, Nov 28, 2019 at 11:51 AM 曾祥才  wrote:
>>>
>>>> I've made a mistake (the previous log was from another cluster). The full
>>>> exception log is:
>>>>
>>>>
>>>> INFO  org.apache.flink.runtime.dispatcher.StandaloneDispatcher  -
>>>> Recovering all persisted jobs.
>>>> 2019-11-28 02:33:12,726 INFO
>>>> org.apache.flink.runtime.resourcemanager.slotmanager.SlotManagerImpl  -
>>>> Starting the SlotManager.
>>>> 2019-11-28 02:33:12,743 INFO
>>>> org.apache.flink.runtime.jobmanager.ZooKeeperSubmittedJobGraphStore  -
>>>> Released locks of job graph 639170a9d710bacfd113ca66b2aacefa from
>>>> ZooKeeper.
>>>> 2019-11-28 02:33:12,744 ERROR
>>>> org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Fatal error
>>>> occurred in the cluster entrypoint.
>>>> org.apache.flink.runtime.dispatcher.DispatcherException: Failed to take
>&g

Re: JobGraphs not cleaned up in HA mode

2019-11-28 Thread 曾祥才
hi,
Is there any difference (for me, using NAS is more convenient to test
currently)?
From the docs it seems HDFS, S3, NFS, etc. will all be fine.







Re: JobGraphs not cleaned up in HA mode

2019-11-28 Thread vino yang
Hi,

Why do you not use HDFS directly?

Best,
Vino

曾祥才 wrote on Thu, Nov 28, 2019 at 6:48 PM:

>
> Does anyone have the same problem? Please help, thanks.
>
>
>
> ---- Original Message ----
> *From:* "曾祥才";
> *Sent:* Thursday, Nov 28, 2019, 2:46 PM
> *To:* "Vijay Bhaskar";
> *Cc:* "User-Flink";
> *Subject:* Re: JobGraphs not cleaned up in HA mode
>
> the config (/flink is the NAS directory):
>
> jobmanager.rpc.address: flink-jobmanager
> taskmanager.numberOfTaskSlots: 16
> web.upload.dir: /flink/webUpload
> blob.server.port: 6124
> jobmanager.rpc.port: 6123
> taskmanager.rpc.port: 6122
> jobmanager.heap.size: 1024m
> taskmanager.heap.size: 1024m
> high-availability: zookeeper
> high-availability.cluster-id: /cluster-test
> high-availability.storageDir: /flink/ha
> high-availability.zookeeper.quorum: :2181
> high-availability.jobmanager.port: 6123
> high-availability.zookeeper.path.root: /flink/risk-insight
> high-availability.zookeeper.path.checkpoints: /flink/zk-checkpoints
> state.backend: filesystem
> state.checkpoints.dir: file:///flink/checkpoints
> state.savepoints.dir: file:///flink/savepoints
> state.checkpoints.num-retained: 2
> jobmanager.execution.failover-strategy: region
> jobmanager.archive.fs.dir: file:///flink/archive/history
>
>
>
> ---- Original Message ----
> *From:* "Vijay Bhaskar";
> *Sent:* Thursday, Nov 28, 2019, 3:12 PM
> *To:* "曾祥才";
> *Cc:* "User-Flink";
> *Subject:* Re: JobGraphs not cleaned up in HA mode
>
> Can you share the flink configuration once?
>
> Regards
> Bhaskar
>
> On Thu, Nov 28, 2019 at 12:09 PM 曾祥才  wrote:
>
>> If I clean the ZooKeeper data, it runs fine. But the next time the
>> jobmanager fails and is redeployed, the error occurs again.
>>
>>
>>
>>
>> ---- Original Message ----
>> *From:* "Vijay Bhaskar";
>> *Sent:* Thursday, Nov 28, 2019, 3:05 PM
>> *To:* "曾祥才";
>> *Subject:* Re: JobGraphs not cleaned up in HA mode
>>
>> Again it could not find the state store file: "Caused by:
>> java.io.FileNotFoundException: /flink/ha/submittedJobGraph0c6bcff01199 "
>> Check why it is unable to find it.
>> A better approach is to clean up the ZooKeeper state, check your configurations,
>> correct them, and restart the cluster.
>> Otherwise it will always pick up the corrupted state from ZooKeeper and will
>> never restart.
>>
>> Regards
>> Bhaskar
>>
>> On Thu, Nov 28, 2019 at 11:51 AM 曾祥才  wrote:
>>
>>> I've made a mistake (the previous log was from another cluster). The full
>>> exception log is:
>>>
>>>
>>> INFO  org.apache.flink.runtime.dispatcher.StandaloneDispatcher  -
>>> Recovering all persisted jobs.
>>> 2019-11-28 02:33:12,726 INFO
>>> org.apache.flink.runtime.resourcemanager.slotmanager.SlotManagerImpl  -
>>> Starting the SlotManager.
>>> 2019-11-28 02:33:12,743 INFO
>>> org.apache.flink.runtime.jobmanager.ZooKeeperSubmittedJobGraphStore  -
>>> Released locks of job graph 639170a9d710bacfd113ca66b2aacefa from
>>> ZooKeeper.
>>> 2019-11-28 02:33:12,744 ERROR
>>> org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Fatal error
>>> occurred in the cluster entrypoint.
>>> org.apache.flink.runtime.dispatcher.DispatcherException: Failed to take
>>> leadership with session id 607e1fe0-f1b4-4af0-af16-4137d779defb.
>>> at
>>> org.apache.flink.runtime.dispatcher.Dispatcher.lambda$null$30(Dispatcher.java:915)
>>>
>>> at
>>> java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774)
>>>
>>> at
>>> java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750)
>>>
>>> at
>>> java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
>>>
>>> at
>>> java.util.concurrent.CompletableFuture.postFire(CompletableFuture.java:575)
>>> at
>>> java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:594)
>>>
>>> at
>>> java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:456)
>>>
>>> at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40)
>>> at
>>> akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(ForkJoinExecutorConfigurator.scala:44)
>>>
>>> at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>>> at
>>> akka.dispatch

Re: JobGraphs not cleaned up in HA mode

2019-11-28 Thread 曾祥才
Does anyone have the same problem? Please help, thanks.







Re: JobGraphs not cleaned up in HA mode

2019-11-27 Thread 曾祥才
the config (/flink is the NAS directory):


jobmanager.rpc.address: flink-jobmanager
taskmanager.numberOfTaskSlots: 16
web.upload.dir: /flink/webUpload
blob.server.port: 6124
jobmanager.rpc.port: 6123
taskmanager.rpc.port: 6122
jobmanager.heap.size: 1024m
taskmanager.heap.size: 1024m
high-availability: zookeeper
high-availability.cluster-id: /cluster-test
high-availability.storageDir: /flink/ha
high-availability.zookeeper.quorum: :2181
high-availability.jobmanager.port: 6123
high-availability.zookeeper.path.root: /flink/risk-insight
high-availability.zookeeper.path.checkpoints: /flink/zk-checkpoints
state.backend: filesystem
state.checkpoints.dir: file:///flink/checkpoints
state.savepoints.dir: file:///flink/savepoints
state.checkpoints.num-retained: 2
jobmanager.execution.failover-strategy: region
jobmanager.archive.fs.dir: file:///flink/archive/history







Re: JobGraphs not cleaned up in HA mode

2019-11-27 Thread Vijay Bhaskar
Can you share the flink configuration once?

Regards
Bhaskar

On Thu, Nov 28, 2019 at 12:09 PM 曾祥才  wrote:

> If I clean the ZooKeeper data, it runs fine. But the next time the
> jobmanager fails and is redeployed, the error occurs again.
>
>
>
>
> ---- Original Message ----
> *From:* "Vijay Bhaskar";
> *Sent:* Thursday, Nov 28, 2019, 3:05 PM
> *To:* "曾祥才";
> *Subject:* Re: JobGraphs not cleaned up in HA mode
>
> Again it could not find the state store file: "Caused by:
> java.io.FileNotFoundException: /flink/ha/submittedJobGraph0c6bcff01199 "
> Check why it is unable to find it.
> A better approach is to clean up the ZooKeeper state, check your configurations,
> correct them, and restart the cluster.
> Otherwise it will always pick up the corrupted state from ZooKeeper and will
> never restart.
>
> Regards
> Bhaskar
>
> On Thu, Nov 28, 2019 at 11:51 AM 曾祥才  wrote:
>
>> I've made a mistake (the previous log was from another cluster). The full
>> exception log is:
>>
>>
>> INFO  org.apache.flink.runtime.dispatcher.StandaloneDispatcher  -
>> Recovering all persisted jobs.
>> 2019-11-28 02:33:12,726 INFO
>> org.apache.flink.runtime.resourcemanager.slotmanager.SlotManagerImpl  -
>> Starting the SlotManager.
>> 2019-11-28 02:33:12,743 INFO
>> org.apache.flink.runtime.jobmanager.ZooKeeperSubmittedJobGraphStore  -
>> Released locks of job graph 639170a9d710bacfd113ca66b2aacefa from
>> ZooKeeper.
>> 2019-11-28 02:33:12,744 ERROR
>> org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Fatal error
>> occurred in the cluster entrypoint.
>> org.apache.flink.runtime.dispatcher.DispatcherException: Failed to take
>> leadership with session id 607e1fe0-f1b4-4af0-af16-4137d779defb.
>> at
>> org.apache.flink.runtime.dispatcher.Dispatcher.lambda$null$30(Dispatcher.java:915)
>> at
>> java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774)
>> at
>> java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750)
>> at
>> java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
>> at
>> java.util.concurrent.CompletableFuture.postFire(CompletableFuture.java:575)
>> at
>> java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:594)
>> at
>> java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:456)
>> at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40)
>> at
>> akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(ForkJoinExecutorConfigurator.scala:44)
>> at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>> at
>> akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>> at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>> at
>> akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
>> Caused by: java.lang.RuntimeException:
>> org.apache.flink.util.FlinkException: Could not retrieve submitted JobGraph
>> from state handle under /639170a9d710bacfd113ca66b2aacefa. This indicates
>> that the retrieved state handle is broken. Try cleaning the state handle
>> store.
>> at org.apache.flink.util.ExceptionUtils.rethrow(ExceptionUtils.java:199)
>> at
>> org.apache.flink.util.function.FunctionUtils.lambda$uncheckedFunction$2(FunctionUtils.java:75)
>> at
>> java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:616)
>> at
>> java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:591)
>> ... 7 more
>> Caused by: org.apache.flink.util.FlinkException: Could not retrieve
>> submitted JobGraph from state handle under
>> /639170a9d710bacfd113ca66b2aacefa. This indicates that the retrieved state
>> handle is broken. Try cleaning the state handle store.
>> at
>> org.apache.flink.runtime.jobmanager.ZooKeeperSubmittedJobGraphStore.recoverJobGraph(ZooKeeperSubmittedJobGraphStore.java:190)
>> at
>> org.apache.flink.runtime.dispatcher.Dispatcher.recoverJob(Dispatcher.java:755)
>> at
>> org.apache.flink.runtime.dispatcher.Dispatcher.recoverJobGraphs(Dispatcher.java:740)
>> at
>> org.apache.flink.runtime.dispatcher.Dispatcher.recoverJobs(Dispatcher.java:721)
>> at
>> org.apache.flink.runtime.dispatcher.Dispatcher.lambda$null$27(Dispatcher.java:888)
>> at
>> org.apache.flink.util.function.FunctionUtils.lambda$uncheckedFunction$2(FunctionUtils.java:73)
>> ... 9 more
>> Caused by: java.io.FileNotFoundException:
>> /flink/ha/submittedJobGraph0c6bcff01199 (No

Re: JobGraphs not cleaned up in HA mode

2019-11-27 Thread 曾祥才
If I clean the ZooKeeper data, it runs fine. But the next time the
jobmanager fails and is redeployed, the error occurs again.









Re: JobGraphs not cleaned up in HA mode

2019-11-27 Thread Vijay Bhaskar
Is it a plain filesystem or Hadoop? If it is NAS, then why the exception "Caused by:
org.apache.hadoop.hdfs.BlockMissingException: "?
It seems you configured a Hadoop state store but gave it a NAS mount.

Regards
Bhaskar
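To make the intended filesystem explicit (so a NAS path is not resolved through the Hadoop/HDFS client), the checkpoint and HA directories can carry a scheme. Both variants below are only illustrative, with placeholder hosts:

# NAS or any mount shared by all JobManager/TaskManager pods
state.checkpoints.dir: file:///flink/checkpoints
high-availability.storageDir: file:///flink/ha

# HDFS, which is what the BlockMissingException above points to
# state.checkpoints.dir: hdfs://namenode-host:8020/flink/checkpoints
# high-availability.storageDir: hdfs://namenode-host:8020/flink/ha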



On Thu, Nov 28, 2019 at 11:36 AM 曾祥才  wrote:

> /flink/checkpoints is an external persistent store (a NAS directory mounted
> on the job manager)
>
>
>
>
> ---- Original Message ----
> *From:* "Vijay Bhaskar";
> *Sent:* Thursday, Nov 28, 2019, 2:29 PM
> *To:* "曾祥才";
> *Cc:* "user";
> *Subject:* Re: JobGraphs not cleaned up in HA mode
>
> The following are the mandatory conditions to run in HA:
>
> a) You should have a persistent common external store that the jobmanager and
> task managers can write their state to.
> b) You should have a persistent external store for ZooKeeper to store the
> JobGraph.
>
> ZooKeeper is referring to the path
> /flink/checkpoints/submittedJobGraph480ddf9572ed to get the job graph, but the
> jobmanager is unable to find it.
> It seems /flink/checkpoints is not the external persistent store.
>
>
> Regards
> Bhaskar
>
> On Thu, Nov 28, 2019 at 10:43 AM seuzxc  wrote:
>
>> Hi, I have the same problem with Flink 1.9.1. Is there any solution to fix it?
>> When k8s redeploys the jobmanager, the error looks like this (it seems zk does not
>> remove the submitted job info, but the jobmanager removes the file):
>>
>>
>> Caused by: org.apache.flink.util.FlinkException: Could not retrieve
>> submitted JobGraph from state handle under
>> /147dd022ec91f7381ad4ca3d290387e9. This indicates that the retrieved state
>> handle is broken. Try cleaning the state handle store.
>> at
>>
>> org.apache.flink.runtime.jobmanager.ZooKeeperSubmittedJobGraphStore.recoverJobGraph(ZooKeeperSubmittedJobGraphStore.java:208)
>> at
>>
>> org.apache.flink.runtime.dispatcher.Dispatcher.recoverJob(Dispatcher.java:696)
>> at
>>
>> org.apache.flink.runtime.dispatcher.Dispatcher.recoverJobGraphs(Dispatcher.java:681)
>> at
>>
>> org.apache.flink.runtime.dispatcher.Dispatcher.recoverJobs(Dispatcher.java:662)
>> at
>>
>> org.apache.flink.runtime.dispatcher.Dispatcher.lambda$null$26(Dispatcher.java:821)
>> at
>>
>> org.apache.flink.util.function.FunctionUtils.lambda$uncheckedFunction$2(FunctionUtils.java:72)
>> ... 9 more
>> Caused by: org.apache.hadoop.hdfs.BlockMissingException: Could not obtain
>> block: BP-1651346363-10.20.1.81-1525354906737:blk_1083182315_9441494
>> file=/flink/checkpoints/submittedJobGraph480ddf9572ed
>> at
>>
>> org.apache.hadoop.hdfs.DFSInputStream.refetchLocations(DFSInputStream.java:1052)
>>
>>
>>
>> --
>> Sent from:
>> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/
>>
>


Re: JobGraphs not cleaned up in HA mode

2019-11-27 Thread 曾祥才
/flink/checkpoints is an external persistent store (a NAS directory
mounted on the job manager)









Re: JobGraphs not cleaned up in HA mode

2019-11-27 Thread Vijay Bhaskar
The following are the mandatory conditions to run in HA:

a) You should have a persistent common external store that the jobmanager and task
managers can write their state to.
b) You should have a persistent external store for ZooKeeper to store the
JobGraph.

ZooKeeper is referring to the path
/flink/checkpoints/submittedJobGraph480ddf9572ed to get the job graph, but the
jobmanager is unable to find it.
It seems /flink/checkpoints is not the external persistent store.


Regards
Bhaskar
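A minimal sketch (not from this thread) to verify, from inside a JobManager pod, that the path referenced by ZooKeeper is actually visible on that node; the path below is the one from the exception above and would be adjusted to the job graph in question:

import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class CheckSharedStore {
    public static void main(String[] args) {
        // Path taken from the exception quoted in this thread.
        Path p = Paths.get("/flink/checkpoints/submittedJobGraph480ddf9572ed");
        System.out.println(p + " exists: " + Files.exists(p)
                + ", readable: " + Files.isReadable(p));
    }
}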

On Thu, Nov 28, 2019 at 10:43 AM seuzxc  wrote:

> Hi, I have the same problem with Flink 1.9.1. Is there any solution to fix it?
> When k8s redeploys the jobmanager, the error looks like this (it seems zk does not
> remove the submitted job info, but the jobmanager removes the file):
>
>
> Caused by: org.apache.flink.util.FlinkException: Could not retrieve
> submitted JobGraph from state handle under
> /147dd022ec91f7381ad4ca3d290387e9. This indicates that the retrieved state
> handle is broken. Try cleaning the state handle store.
> at
>
> org.apache.flink.runtime.jobmanager.ZooKeeperSubmittedJobGraphStore.recoverJobGraph(ZooKeeperSubmittedJobGraphStore.java:208)
> at
>
> org.apache.flink.runtime.dispatcher.Dispatcher.recoverJob(Dispatcher.java:696)
> at
>
> org.apache.flink.runtime.dispatcher.Dispatcher.recoverJobGraphs(Dispatcher.java:681)
> at
>
> org.apache.flink.runtime.dispatcher.Dispatcher.recoverJobs(Dispatcher.java:662)
> at
>
> org.apache.flink.runtime.dispatcher.Dispatcher.lambda$null$26(Dispatcher.java:821)
> at
>
> org.apache.flink.util.function.FunctionUtils.lambda$uncheckedFunction$2(FunctionUtils.java:72)
> ... 9 more
> Caused by: org.apache.hadoop.hdfs.BlockMissingException: Could not obtain
> block: BP-1651346363-10.20.1.81-1525354906737:blk_1083182315_9441494
> file=/flink/checkpoints/submittedJobGraph480ddf9572ed
> at
>
> org.apache.hadoop.hdfs.DFSInputStream.refetchLocations(DFSInputStream.java:1052)
>
>
>
> --
> Sent from:
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/
>


Re: JobGraphs not cleaned up in HA mode

2019-11-27 Thread seuzxc
Hi, I have the same problem with Flink 1.9.1. Is there any solution to fix it?
When k8s redeploys the jobmanager, the error looks like this (it seems zk does not
remove the submitted job info, but the jobmanager removes the file):


Caused by: org.apache.flink.util.FlinkException: Could not retrieve
submitted JobGraph from state handle under
/147dd022ec91f7381ad4ca3d290387e9. This indicates that the retrieved state
handle is broken. Try cleaning the state handle store.
at
org.apache.flink.runtime.jobmanager.ZooKeeperSubmittedJobGraphStore.recoverJobGraph(ZooKeeperSubmittedJobGraphStore.java:208)
at
org.apache.flink.runtime.dispatcher.Dispatcher.recoverJob(Dispatcher.java:696)
at
org.apache.flink.runtime.dispatcher.Dispatcher.recoverJobGraphs(Dispatcher.java:681)
at
org.apache.flink.runtime.dispatcher.Dispatcher.recoverJobs(Dispatcher.java:662)
at
org.apache.flink.runtime.dispatcher.Dispatcher.lambda$null$26(Dispatcher.java:821)
at
org.apache.flink.util.function.FunctionUtils.lambda$uncheckedFunction$2(FunctionUtils.java:72)
... 9 more
Caused by: org.apache.hadoop.hdfs.BlockMissingException: Could not obtain
block: BP-1651346363-10.20.1.81-1525354906737:blk_1083182315_9441494
file=/flink/checkpoints/submittedJobGraph480ddf9572ed
at
org.apache.hadoop.hdfs.DFSInputStream.refetchLocations(DFSInputStream.java:1052)



--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/


Re: JobGraphs not cleaned up in HA mode

2018-08-29 Thread Till Rohrmann
Hi Encho,

thanks for sending the first part of the logs. What I would actually be
interested in are the complete logs because somewhere in the jobmanager-2
logs there must be a log statement saying that the respective dispatcher
gained leadership. I would like to see why this happens but for this to
debug the complete logs are necessary. It would be awesome if you could
send them to me. Thanks a lot!

Cheers,
Till

On Wed, Aug 29, 2018 at 2:00 PM Encho Mishinev 
wrote:

> Hi Till,
>
> I will use the approach with a k8s deployment and HA mode with a single
> job manager. Nonetheless, here are the logs I just produced by repeating
> the aforementioned experiment, hope they help in debugging:
>
> *- Starting Jobmanager-1:*
>
> Starting Job Manager
> sed: cannot rename /opt/flink/conf/sedR98XPn: Device or resource busy
> config file:
> jobmanager.rpc.address: flink-jobmanager-1
> jobmanager.rpc.port: 6123
> jobmanager.heap.size: 8192
> taskmanager.heap.size: 8192
> taskmanager.numberOfTaskSlots: 4
> high-availability: zookeeper
> high-availability.storageDir:
> hdfs://hdfs-namenode-0.hdfs-namenode.default.svc.cluster.local:8020/flink/high-availability
> high-availability.zookeeper.quorum: zk-cs:2181
> high-availability.zookeeper.path.root: /flink
> high-availability.jobmanager.port: 50010
> state.backend: filesystem
> state.checkpoints.dir:
> hdfs://hdfs-namenode-0.hdfs-namenode.default.svc.cluster.local:8020/flink/checkpoints
> state.savepoints.dir:
> hdfs://hdfs-namenode-0.hdfs-namenode.default.svc.cluster.local:8020/flink/savepoints
> state.backend.incremental: false
> fs.default-scheme:
> hdfs://hdfs-namenode-0.hdfs-namenode.default.svc.cluster.local:8020
> rest.port: 8081
> web.upload.dir: /opt/flink/upload
> query.server.port: 6125
> taskmanager.numberOfTaskSlots: 4
> classloader.parent-first-patterns.additional: org.apache.xerces.
> blob.storage.directory: /opt/flink/blob-server
> blob.server.port: 6124
> blob.server.port: 6124
> query.server.port: 6125
> Starting standalonesession as a console application on host
> flink-jobmanager-1-f76fd4df8-ftwt9.
> 2018-08-29 11:41:48,806 INFO
> org.apache.flink.runtime.entrypoint.ClusterEntrypoint -
> 
> 2018-08-29 11:41:48,807 INFO
> org.apache.flink.runtime.entrypoint.ClusterEntrypoint -  Starting
> StandaloneSessionClusterEntrypoint (Version: 1.5.3, Rev:614f216,
> Date:16.08.2018 @ 06:39:50 GMT)
> 2018-08-29 11:41:48,807 INFO
> org.apache.flink.runtime.entrypoint.ClusterEntrypoint -  OS current
> user: flink
> 2018-08-29 11:41:49,134 WARN  org.apache.hadoop.util.NativeCodeLoader
>  - Unable to load native-hadoop library for your
> platform... using builtin-java classes where applicable
> 2018-08-29 11:41:49,210 INFO
> org.apache.flink.runtime.entrypoint.ClusterEntrypoint -  Current
> Hadoop/Kerberos user: flink
> 2018-08-29 11:41:49,210 INFO
> org.apache.flink.runtime.entrypoint.ClusterEntrypoint -  JVM:
> OpenJDK 64-Bit Server VM - Oracle Corporation - 1.8/25.181-b13
> 2018-08-29 11:41:49,210 INFO
> org.apache.flink.runtime.entrypoint.ClusterEntrypoint -  Maximum
> heap size: 6702 MiBytes
> 2018-08-29 11:41:49,210 INFO
> org.apache.flink.runtime.entrypoint.ClusterEntrypoint -  JAVA_HOME:
> /docker-java-home/jre
> 2018-08-29 11:41:49,213 INFO
> org.apache.flink.runtime.entrypoint.ClusterEntrypoint -  Hadoop
> version: 2.7.5
> 2018-08-29 11:41:49,213 INFO
> org.apache.flink.runtime.entrypoint.ClusterEntrypoint -  JVM
> Options:
> 2018-08-29 11:41:49,213 INFO
> org.apache.flink.runtime.entrypoint.ClusterEntrypoint -
>  -Dlog4j.configuration=file:/opt/flink/conf/log4j-console.properties
> 2018-08-29 11:41:49,213 INFO
> org.apache.flink.runtime.entrypoint.ClusterEntrypoint -
>  -Dlogback.configurationFile=file:/opt/flink/conf/logback-console.xml
> 2018-08-29 11:41:49,213 INFO
> org.apache.flink.runtime.entrypoint.ClusterEntrypoint -  Program
> Arguments:
> 2018-08-29 11:41:49,213 INFO
> org.apache.flink.runtime.entrypoint.ClusterEntrypoint -
>  --configDir
> 2018-08-29 11:41:49,213 INFO
> org.apache.flink.runtime.entrypoint.ClusterEntrypoint -
>  /opt/flink/conf
> 2018-08-29 11:41:49,213 INFO
> org.apache.flink.runtime.entrypoint.ClusterEntrypoint -
>  --executionMode
> 2018-08-29 11:41:49,213 INFO
> org.apache.flink.runtime.entrypoint.ClusterEntrypoint - cluster
> 2018-08-29 11:41:49,214 INFO
> org.apache.flink.runtime.entrypoint.ClusterEntrypoint - --host
> 2018-08-29 11:41:49,214 INFO
> org.apache.flink.runtime.entrypoint.ClusterEntrypoint - cluster
> 2018-08-29 11:41:49,214 INFO
> org.apache.flink.runtime.entrypoint.ClusterEntrypoint -  Classpath:
> 

Re: JobGraphs not cleaned up in HA mode

2018-08-29 Thread Encho Mishinev
Hi Till,

I will use the approach with a k8s deployment and HA mode with a single job
manager. Nonetheless, here are the logs I just produced by repeating the
aforementioned experiment, hope they help in debugging:

*- Starting Jobmanager-1:*

Starting Job Manager
sed: cannot rename /opt/flink/conf/sedR98XPn: Device or resource busy
config file:
jobmanager.rpc.address: flink-jobmanager-1
jobmanager.rpc.port: 6123
jobmanager.heap.size: 8192
taskmanager.heap.size: 8192
taskmanager.numberOfTaskSlots: 4
high-availability: zookeeper
high-availability.storageDir:
hdfs://hdfs-namenode-0.hdfs-namenode.default.svc.cluster.local:8020/flink/high-availability
high-availability.zookeeper.quorum: zk-cs:2181
high-availability.zookeeper.path.root: /flink
high-availability.jobmanager.port: 50010
state.backend: filesystem
state.checkpoints.dir:
hdfs://hdfs-namenode-0.hdfs-namenode.default.svc.cluster.local:8020/flink/checkpoints
state.savepoints.dir:
hdfs://hdfs-namenode-0.hdfs-namenode.default.svc.cluster.local:8020/flink/savepoints
state.backend.incremental: false
fs.default-scheme:
hdfs://hdfs-namenode-0.hdfs-namenode.default.svc.cluster.local:8020
rest.port: 8081
web.upload.dir: /opt/flink/upload
query.server.port: 6125
taskmanager.numberOfTaskSlots: 4
classloader.parent-first-patterns.additional: org.apache.xerces.
blob.storage.directory: /opt/flink/blob-server
blob.server.port: 6124
blob.server.port: 6124
query.server.port: 6125
Starting standalonesession as a console application on host
flink-jobmanager-1-f76fd4df8-ftwt9.
2018-08-29 11:41:48,806 INFO
org.apache.flink.runtime.entrypoint.ClusterEntrypoint -

2018-08-29 11:41:48,807 INFO
org.apache.flink.runtime.entrypoint.ClusterEntrypoint -  Starting
StandaloneSessionClusterEntrypoint (Version: 1.5.3, Rev:614f216,
Date:16.08.2018 @ 06:39:50 GMT)
2018-08-29 11:41:48,807 INFO
org.apache.flink.runtime.entrypoint.ClusterEntrypoint -  OS current
user: flink
2018-08-29 11:41:49,134 WARN  org.apache.hadoop.util.NativeCodeLoader
 - Unable to load native-hadoop library for your
platform... using builtin-java classes where applicable
2018-08-29 11:41:49,210 INFO
org.apache.flink.runtime.entrypoint.ClusterEntrypoint -  Current
Hadoop/Kerberos user: flink
2018-08-29 11:41:49,210 INFO
org.apache.flink.runtime.entrypoint.ClusterEntrypoint -  JVM:
OpenJDK 64-Bit Server VM - Oracle Corporation - 1.8/25.181-b13
2018-08-29 11:41:49,210 INFO
org.apache.flink.runtime.entrypoint.ClusterEntrypoint -  Maximum
heap size: 6702 MiBytes
2018-08-29 11:41:49,210 INFO
org.apache.flink.runtime.entrypoint.ClusterEntrypoint -  JAVA_HOME:
/docker-java-home/jre
2018-08-29 11:41:49,213 INFO
org.apache.flink.runtime.entrypoint.ClusterEntrypoint -  Hadoop
version: 2.7.5
2018-08-29 11:41:49,213 INFO
org.apache.flink.runtime.entrypoint.ClusterEntrypoint -  JVM
Options:
2018-08-29 11:41:49,213 INFO
org.apache.flink.runtime.entrypoint.ClusterEntrypoint -
 -Dlog4j.configuration=file:/opt/flink/conf/log4j-console.properties
2018-08-29 11:41:49,213 INFO
org.apache.flink.runtime.entrypoint.ClusterEntrypoint -
 -Dlogback.configurationFile=file:/opt/flink/conf/logback-console.xml
2018-08-29 11:41:49,213 INFO
org.apache.flink.runtime.entrypoint.ClusterEntrypoint -  Program
Arguments:
2018-08-29 11:41:49,213 INFO
org.apache.flink.runtime.entrypoint.ClusterEntrypoint -
 --configDir
2018-08-29 11:41:49,213 INFO
org.apache.flink.runtime.entrypoint.ClusterEntrypoint -
 /opt/flink/conf
2018-08-29 11:41:49,213 INFO
org.apache.flink.runtime.entrypoint.ClusterEntrypoint -
 --executionMode
2018-08-29 11:41:49,213 INFO
org.apache.flink.runtime.entrypoint.ClusterEntrypoint - cluster
2018-08-29 11:41:49,214 INFO
org.apache.flink.runtime.entrypoint.ClusterEntrypoint - --host
2018-08-29 11:41:49,214 INFO
org.apache.flink.runtime.entrypoint.ClusterEntrypoint - cluster
2018-08-29 11:41:49,214 INFO
org.apache.flink.runtime.entrypoint.ClusterEntrypoint -  Classpath:
/opt/flink/lib/flink-python_2.11-1.5.3.jar:/opt/flink/lib/flink-shaded-hadoop2-uber-1.5.3.jar:/opt/flink/lib/log4j-1.2.17.jar:/opt/flink/lib/slf4j-log4j12-1.7.7.jar:/opt/flink/lib/flink-dist_2.11-1.5.3.jar:::
2018-08-29 11:41:49,214 INFO
org.apache.flink.runtime.entrypoint.ClusterEntrypoint -

2018-08-29 11:41:49,215 INFO
org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Registered
UNIX signal handlers for [TERM, HUP, INT]
2018-08-29 11:41:49,221 INFO
org.apache.flink.configuration.GlobalConfiguration- Loading
configuration property: jobmanager.rpc.address, flink-jobmanager-1
2018-08-29 11:41:49,221 INFO
org.apache.flink.configuration.GlobalConfiguration- Loading
configuration property: 

Re: JobGraphs not cleaned up in HA mode

2018-08-29 Thread Till Rohrmann
Hi Encho,

it sounds strange that the standby JobManager tries to recover a submitted
job graph. This should only happen if it has been granted leadership. Thus,
it seems as if the standby JobManager thinks that it is also the leader.
Could you maybe share the logs of the two JobManagers/ClusterEntrypoints
with us?

Running only a single JobManager/ClusterEntrypoint in HA mode via a
Kubernetes Deployment should do the trick and there is nothing wrong with
it.

Cheers,
Till
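For illustration only, a minimal single-JobManager Deployment along those lines; the image tag, labels, and ports are assumptions and would need to match your own setup, with the HA keys still coming from flink-conf.yaml:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: flink-jobmanager
spec:
  replicas: 1          # exactly one JobManager; Kubernetes restarts it on failure
  selector:
    matchLabels:
      app: flink-jobmanager
  template:
    metadata:
      labels:
        app: flink-jobmanager
    spec:
      containers:
        - name: jobmanager
          image: flink:1.5.3       # assumed official image
          args: ["jobmanager"]
          ports:
            - containerPort: 6123  # RPC
            - containerPort: 8081  # web UI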

On Wed, Aug 29, 2018 at 11:05 AM Encho Mishinev 
wrote:

> Hello,
>
> Since two job managers don't seem to be working for me I was thinking of
> just using a single job manager in Kubernetes in HA mode with a deployment
> ensuring its restart whenever it fails. Is this approach viable? The
> High-Availability page mentions that you use only one job manager in a
> YARN cluster but does not specify such an option for Kubernetes. Is there
> anything that can go wrong with this approach?
>
> Thanks
>
> On Wed, Aug 29, 2018 at 11:10 AM Encho Mishinev 
> wrote:
>
>> Hi,
>>
>> Unfortunately the thing I described does indeed happen every time. As
>> mentioned in the first email, I am running on Kubernetes so certain things
>> could be different compared to just a standalone cluster.
>>
>> Any ideas for workarounds are welcome, as this problem basically prevents
>> me from using HA.
>>
>> Thanks,
>> Encho
>>
>> On Wed, Aug 29, 2018 at 5:15 AM vino yang  wrote:
>>
>>> Hi Encho,
>>>
>>> From your description, I feel that there are extra bugs.
>>>
>>> About your description:
>>>
>>> *- Start both job managers*
>>> *- Start a batch job in JobManager 1 and let it finish*
>>> *The jobgraphs in both Zookeeper and HDFS remained.*
>>>
>>> Is it necessarily happening every time?
>>>
>>> In the Standalone cluster, the problems we encountered were sporadic.
>>>
>>> Thanks, vino.
>>>
>>> Encho Mishinev  于2018年8月28日周二 下午8:07写道:
>>>
 Hello Till,

 I spend a few more hours testing and looking at the logs and it seems
 like there's a more general problem here. While the two job managers are
 active neither of them can properly delete jobgraphs. The above problem I
 described comes from the fact that Kubernetes gets JobManager 1 quickly
 after I manually kill it, so when I stop the job on JobManager 2 both are
 alive.

 I did a very simple test:

 - Start both job managers
 - Start a batch job in JobManager 1 and let it finish
 The jobgraphs in both Zookeeper and HDFS remained.

 On the other hand if we do:

 - Start only JobManager 1 (again in HA mode)
 - Start a batch job and let it finish
 The jobgraphs in both Zookeeper and HDFS are deleted fine.

 It seems like the standby manager still leaves some kind of lock on the
 jobgraphs. Do you think that's possible? Have you seen a similar problem?
 The only logs that appear on the standby manager while waiting are of
 the type:

 2018-08-28 11:54:10,789 INFO
 org.apache.flink.runtime.jobmanager.ZooKeeperSubmittedJobGraphStore  -
 Recovered SubmittedJobGraph(9e0a109b57511930c95d3b54574a66e3, null).

 Note that this log appears on the standby jobmanager immediately when a
 new job is submitted to the active jobmanager.
 Also note that the blobs and checkpoints are cleared fine. The problem
 is only for jobgraphs both in ZooKeeper and HDFS.

 Trying to access the UI of the standby manager redirects to the active
 one, so it is not a problem of them not knowing who the leader is. Do you
 have any ideas?

 Thanks a lot,
 Encho

 On Tue, Aug 28, 2018 at 10:27 AM Till Rohrmann 
 wrote:

> Hi Encho,
>
> thanks a lot for reporting this issue. The problem arises whenever the
> old leader maintains the connection to ZooKeeper. If this is the case, 
> then
> ephemeral nodes which we create to protect against faulty delete 
> operations
> are not removed and consequently the new leader is not able to delete the
> persisted job graph. So one thing to check is whether the old JM still has
> an open connection to ZooKeeper. The next thing to check is the session
> timeout of your ZooKeeper cluster. If you stop the job within the session
> timeout, then it is also not guaranteed that ZooKeeper has detected that
> the ephemeral nodes of the old JM must be deleted. In order to understand
> this better it would be helpful if you could tell us the timing of the
> different actions.
>
> Cheers,
> Till
>
> On Tue, Aug 28, 2018 at 8:17 AM vino yang 
> wrote:
>
>> Hi Encho,
>>
>> A temporary solution can be used to determine if it has been cleaned
>> up by monitoring the specific JobID under Zookeeper's "/jobgraph".
>> Another solution, modify the source code, rudely modify the cleanup
>> mode to the synchronous form, but the flink operation Zookeeper's 

Re: JobGraphs not cleaned up in HA mode

2018-08-29 Thread Encho Mishinev
Hello,

Since two job managers don't seem to be working for me I was thinking of
just using a single job manager in Kubernetes in HA mode with a deployment
ensuring its restart whenever it fails. Is this approach viable? The
High-Availability page mentions that you use only one job manager in a
YARN cluster but does not specify such an option for Kubernetes. Is there
anything that can go wrong with this approach?

Thanks

On Wed, Aug 29, 2018 at 11:10 AM Encho Mishinev 
wrote:

> Hi,
>
> Unfortunately the thing I described does indeed happen every time. As
> mentioned in the first email, I am running on Kubernetes so certain things
> could be different compared to just a standalone cluster.
>
> Any ideas for workarounds are welcome, as this problem basically prevents
> me from using HA.
>
> Thanks,
> Encho
>
> On Wed, Aug 29, 2018 at 5:15 AM vino yang  wrote:
>
>> Hi Encho,
>>
>> From your description, I feel that there are extra bugs.
>>
>> About your description:
>>
>> *- Start both job managers*
>> *- Start a batch job in JobManager 1 and let it finish*
>> *The jobgraphs in both Zookeeper and HDFS remained.*
>>
>> Is it necessarily happening every time?
>>
>> In the Standalone cluster, the problems we encountered were sporadic.
>>
>> Thanks, vino.
>>
>> Encho Mishinev  于2018年8月28日周二 下午8:07写道:
>>
>>> Hello Till,
>>>
>>> I spend a few more hours testing and looking at the logs and it seems
>>> like there's a more general problem here. While the two job managers are
>>> active neither of them can properly delete jobgraphs. The above problem I
>>> described comes from the fact that Kubernetes gets JobManager 1 quickly
>>> after I manually kill it, so when I stop the job on JobManager 2 both are
>>> alive.
>>>
>>> I did a very simple test:
>>>
>>> - Start both job managers
>>> - Start a batch job in JobManager 1 and let it finish
>>> The jobgraphs in both Zookeeper and HDFS remained.
>>>
>>> On the other hand if we do:
>>>
>>> - Start only JobManager 1 (again in HA mode)
>>> - Start a batch job and let it finish
>>> The jobgraphs in both Zookeeper and HDFS are deleted fine.
>>>
>>> It seems like the standby manager still leaves some kind of lock on the
>>> jobgraphs. Do you think that's possible? Have you seen a similar problem?
>>> The only logs that appear on the standby manager while waiting are of
>>> the type:
>>>
>>> 2018-08-28 11:54:10,789 INFO
>>> org.apache.flink.runtime.jobmanager.ZooKeeperSubmittedJobGraphStore  -
>>> Recovered SubmittedJobGraph(9e0a109b57511930c95d3b54574a66e3, null).
>>>
>>> Note that this log appears on the standby jobmanager immediately when a
>>> new job is submitted to the active jobmanager.
>>> Also note that the blobs and checkpoints are cleared fine. The problem
>>> is only for jobgraphs both in ZooKeeper and HDFS.
>>>
>>> Trying to access the UI of the standby manager redirects to the active
>>> one, so it is not a problem of them not knowing who the leader is. Do you
>>> have any ideas?
>>>
>>> Thanks a lot,
>>> Encho
>>>
>>> On Tue, Aug 28, 2018 at 10:27 AM Till Rohrmann 
>>> wrote:
>>>
 Hi Encho,

 thanks a lot for reporting this issue. The problem arises whenever the
 old leader maintains the connection to ZooKeeper. If this is the case, then
 ephemeral nodes which we create to protect against faulty delete operations
 are not removed and consequently the new leader is not able to delete the
 persisted job graph. So one thing to check is whether the old JM still has
 an open connection to ZooKeeper. The next thing to check is the session
 timeout of your ZooKeeper cluster. If you stop the job within the session
 timeout, then it is also not guaranteed that ZooKeeper has detected that
 the ephemeral nodes of the old JM must be deleted. In order to understand
 this better it would be helpful if you could tell us the timing of the
 different actions.

 Cheers,
 Till

 On Tue, Aug 28, 2018 at 8:17 AM vino yang 
 wrote:

> Hi Encho,
>
> A temporary solution can be used to determine if it has been cleaned
> up by monitoring the specific JobID under Zookeeper's "/jobgraph".
> Another solution, modify the source code, rudely modify the cleanup
> mode to the synchronous form, but the flink operation Zookeeper's path
> needs to obtain the corresponding lock, so it is dangerous to do so, and 
> it
> is not recommended.
> I think maybe this problem can be solved in the next version. It
> depends on Till.
>
> Thanks, vino.
>
> Encho Mishinev  于2018年8月28日周二 下午1:17写道:
>
>> Thank you very much for the info! Will keep track of the progress.
>>
>> In the meantime is there any viable workaround? It seems like HA
>> doesn't really work due to this bug.
>>
>> On Tue, Aug 28, 2018 at 4:52 AM vino yang 
>> wrote:
>>
>>> About some implementation mechanisms.
>>> Flink uses Zookeeper to store JobGraph (Job's 

Re: JobGraphs not cleaned up in HA mode

2018-08-29 Thread Encho Mishinev
Hi,

Unfortunately the thing I described does indeed happen every time. As
mentioned in the first email, I am running on Kubernetes so certain things
could be different compared to just a standalone cluster.

Any ideas for workarounds are welcome, as this problem basically prevents
me from using HA.

Thanks,
Encho

On Wed, Aug 29, 2018 at 5:15 AM vino yang  wrote:

> Hi Encho,
>
> From your description, I feel that there are extra bugs.
>
> About your description:
>
> *- Start both job managers*
> *- Start a batch job in JobManager 1 and let it finish*
> *The jobgraphs in both Zookeeper and HDFS remained.*
>
> Is it necessarily happening every time?
>
> In the Standalone cluster, the problems we encountered were sporadic.
>
> Thanks, vino.
>
> Encho Mishinev  于2018年8月28日周二 下午8:07写道:
>
>> Hello Till,
>>
>> I spend a few more hours testing and looking at the logs and it seems
>> like there's a more general problem here. While the two job managers are
>> active neither of them can properly delete jobgraphs. The above problem I
>> described comes from the fact that Kubernetes gets JobManager 1 quickly
>> after I manually kill it, so when I stop the job on JobManager 2 both are
>> alive.
>>
>> I did a very simple test:
>>
>> - Start both job managers
>> - Start a batch job in JobManager 1 and let it finish
>> The jobgraphs in both Zookeeper and HDFS remained.
>>
>> On the other hand if we do:
>>
>> - Start only JobManager 1 (again in HA mode)
>> - Start a batch job and let it finish
>> The jobgraphs in both Zookeeper and HDFS are deleted fine.
>>
>> It seems like the standby manager still leaves some kind of lock on the
>> jobgraphs. Do you think that's possible? Have you seen a similar problem?
>> The only logs that appear on the standby manager while waiting are of the
>> type:
>>
>> 2018-08-28 11:54:10,789 INFO
>> org.apache.flink.runtime.jobmanager.ZooKeeperSubmittedJobGraphStore  -
>> Recovered SubmittedJobGraph(9e0a109b57511930c95d3b54574a66e3, null).
>>
>> Note that this log appears on the standby jobmanager immediately when a
>> new job is submitted to the active jobmanager.
>> Also note that the blobs and checkpoints are cleared fine. The problem is
>> only for jobgraphs both in ZooKeeper and HDFS.
>>
>> Trying to access the UI of the standby manager redirects to the active
>> one, so it is not a problem of them not knowing who the leader is. Do you
>> have any ideas?
>>
>> Thanks a lot,
>> Encho
>>
>> On Tue, Aug 28, 2018 at 10:27 AM Till Rohrmann 
>> wrote:
>>
>>> Hi Encho,
>>>
>>> thanks a lot for reporting this issue. The problem arises whenever the
>>> old leader maintains the connection to ZooKeeper. If this is the case, then
>>> ephemeral nodes which we create to protect against faulty delete operations
>>> are not removed and consequently the new leader is not able to delete the
>>> persisted job graph. So one thing to check is whether the old JM still has
>>> an open connection to ZooKeeper. The next thing to check is the session
>>> timeout of your ZooKeeper cluster. If you stop the job within the session
>>> timeout, then it is also not guaranteed that ZooKeeper has detected that
>>> the ephemeral nodes of the old JM must be deleted. In order to understand
>>> this better it would be helpful if you could tell us the timing of the
>>> different actions.
>>>
>>> Cheers,
>>> Till
>>>
>>> On Tue, Aug 28, 2018 at 8:17 AM vino yang  wrote:
>>>
 Hi Encho,

 A temporary solution can be used to determine if it has been cleaned up
 by monitoring the specific JobID under Zookeeper's "/jobgraph".
 Another solution, modify the source code, rudely modify the cleanup
 mode to the synchronous form, but the flink operation Zookeeper's path
 needs to obtain the corresponding lock, so it is dangerous to do so, and it
 is not recommended.
 I think maybe this problem can be solved in the next version. It
 depends on Till.

 Thanks, vino.

 Encho Mishinev  于2018年8月28日周二 下午1:17写道:

> Thank you very much for the info! Will keep track of the progress.
>
> In the meantime is there any viable workaround? It seems like HA
> doesn't really work due to this bug.
>
> On Tue, Aug 28, 2018 at 4:52 AM vino yang 
> wrote:
>
>> About some implementation mechanisms.
>> Flink uses Zookeeper to store JobGraph (Job's description information
>> and metadata) as a basis for Job recovery.
>> However, previous implementations may cause this information to not
>> be properly cleaned up because it is asynchronously deleted by a 
>> background
>> thread.
>>
>> Thanks, vino.
>>
>> vino yang  于2018年8月28日周二 上午9:49写道:
>>
>>> Hi Encho,
>>>
>>> This is a problem already known to the Flink community, you can
>>> track its progress through FLINK-10011[1], and currently Till is fixing
>>> this issue.
>>>
>>> [1]: https://issues.apache.org/jira/browse/FLINK-10011
>>>
>>> 

Re: JobGraphs not cleaned up in HA mode

2018-08-28 Thread vino yang
Hi Encho,

From your description, I feel that there are extra bugs.

About your description:

*- Start both job managers*
*- Start a batch job in JobManager 1 and let it finish*
*The jobgraphs in both Zookeeper and HDFS remained.*

Is it necessarily happening every time?

In the Standalone cluster, the problems we encountered were sporadic.

Thanks, vino.

Encho Mishinev wrote on Tue, Aug 28, 2018 at 8:07 PM:

> Hello Till,
>
> I spend a few more hours testing and looking at the logs and it seems like
> there's a more general problem here. While the two job managers are active
> neither of them can properly delete jobgraphs. The above problem I
> described comes from the fact that Kubernetes gets JobManager 1 quickly
> after I manually kill it, so when I stop the job on JobManager 2 both are
> alive.
>
> I did a very simple test:
>
> - Start both job managers
> - Start a batch job in JobManager 1 and let it finish
> The jobgraphs in both Zookeeper and HDFS remained.
>
> On the other hand if we do:
>
> - Start only JobManager 1 (again in HA mode)
> - Start a batch job and let it finish
> The jobgraphs in both Zookeeper and HDFS are deleted fine.
>
> It seems like the standby manager still leaves some kind of lock on the
> jobgraphs. Do you think that's possible? Have you seen a similar problem?
> The only logs that appear on the standby manager while waiting are of the
> type:
>
> 2018-08-28 11:54:10,789 INFO
> org.apache.flink.runtime.jobmanager.ZooKeeperSubmittedJobGraphStore  -
> Recovered SubmittedJobGraph(9e0a109b57511930c95d3b54574a66e3, null).
>
> Note that this log appears on the standby jobmanager immediately when a
> new job is submitted to the active jobmanager.
> Also note that the blobs and checkpoints are cleared fine. The problem is
> only for jobgraphs both in ZooKeeper and HDFS.
>
> Trying to access the UI of the standby manager redirects to the active
> one, so it is not a problem of them not knowing who the leader is. Do you
> have any ideas?
>
> Thanks a lot,
> Encho
>
> On Tue, Aug 28, 2018 at 10:27 AM Till Rohrmann 
> wrote:
>
>> Hi Encho,
>>
>> thanks a lot for reporting this issue. The problem arises whenever the
>> old leader maintains the connection to ZooKeeper. If this is the case, then
>> ephemeral nodes which we create to protect against faulty delete operations
>> are not removed and consequently the new leader is not able to delete the
>> persisted job graph. So one thing to check is whether the old JM still has
>> an open connection to ZooKeeper. The next thing to check is the session
>> timeout of your ZooKeeper cluster. If you stop the job within the session
>> timeout, then it is also not guaranteed that ZooKeeper has detected that
>> the ephemeral nodes of the old JM must be deleted. In order to understand
>> this better it would be helpful if you could tell us the timing of the
>> different actions.
>>
>> Cheers,
>> Till
>>
>> On Tue, Aug 28, 2018 at 8:17 AM vino yang  wrote:
>>
>>> Hi Encho,
>>>
>>> A temporary solution can be used to determine if it has been cleaned up
>>> by monitoring the specific JobID under Zookeeper's "/jobgraph".
>>> Another solution, modify the source code, rudely modify the cleanup mode
>>> to the synchronous form, but the flink operation Zookeeper's path needs to
>>> obtain the corresponding lock, so it is dangerous to do so, and it is not
>>> recommended.
>>> I think maybe this problem can be solved in the next version. It depends
>>> on Till.
>>>
>>> Thanks, vino.
>>>
>>> Encho Mishinev  于2018年8月28日周二 下午1:17写道:
>>>
 Thank you very much for the info! Will keep track of the progress.

 In the meantime is there any viable workaround? It seems like HA
 doesn't really work due to this bug.

 On Tue, Aug 28, 2018 at 4:52 AM vino yang 
 wrote:

> About some implementation mechanisms.
> Flink uses Zookeeper to store JobGraph (Job's description information
> and metadata) as a basis for Job recovery.
> However, previous implementations may cause this information to not be
> properly cleaned up because it is asynchronously deleted by a background
> thread.
>
> Thanks, vino.
>
> vino yang  于2018年8月28日周二 上午9:49写道:
>
>> Hi Encho,
>>
>> This is a problem already known to the Flink community, you can track
>> its progress through FLINK-10011[1], and currently Till is fixing this
>> issue.
>>
>> [1]: https://issues.apache.org/jira/browse/FLINK-10011
>>
>> Thanks, vino.
>>
>> Encho Mishinev  于2018年8月27日周一 下午10:13写道:
>>
>>> I am running Flink 1.5.3 with two job managers and two task managers
>>> in Kubernetes along with HDFS and Zookeeper in high-availability mode.
>>>
>>> My problem occurs after the following actions:
>>> - Upload a .jar file to jobmanager-1
>>> - Run a streaming job from the jar on jobmanager-1
>>> - Wait for 1 or 2 checkpoints to succeed
>>> - Kill pod of jobmanager-1
>>> After a short delay, 

Re: JobGraphs not cleaned up in HA mode

2018-08-28 Thread Encho Mishinev
Hello Till,

I spent a few more hours testing and looking at the logs, and it seems like
there's a more general problem here. While the two job managers are active,
neither of them can properly delete jobgraphs. The problem I
described above comes from the fact that Kubernetes brings JobManager 1 back quickly
after I manually kill it, so when I stop the job on JobManager 2 both are
alive.

I did a very simple test:

- Start both job managers
- Start a batch job in JobManager 1 and let it finish
The jobgraphs in both Zookeeper and HDFS remained.

On the other hand if we do:

- Start only JobManager 1 (again in HA mode)
- Start a batch job and let it finish
The jobgraphs in both Zookeeper and HDFS are deleted fine.

It seems like the standby manager still leaves some kind of lock on the
jobgraphs. Do you think that's possible? Have you seen a similar problem?
The only logs that appear on the standby manager while waiting are of the
type:

2018-08-28 11:54:10,789 INFO
org.apache.flink.runtime.jobmanager.ZooKeeperSubmittedJobGraphStore  -
Recovered SubmittedJobGraph(9e0a109b57511930c95d3b54574a66e3, null).

Note that this log appears on the standby jobmanager immediately when a new
job is submitted to the active jobmanager.
Also note that the blobs and checkpoints are cleared fine. The problem is
only for jobgraphs both in ZooKeeper and HDFS.

Trying to access the UI of the standby manager redirects to the active one,
so it is not a problem of them not knowing who the leader is. Do you have
any ideas?

Thanks a lot,
Encho

On Tue, Aug 28, 2018 at 10:27 AM Till Rohrmann  wrote:

> Hi Encho,
>
> thanks a lot for reporting this issue. The problem arises whenever the old
> leader maintains the connection to ZooKeeper. If this is the case, then
> ephemeral nodes which we create to protect against faulty delete operations
> are not removed and consequently the new leader is not able to delete the
> persisted job graph. So one thing to check is whether the old JM still has
> an open connection to ZooKeeper. The next thing to check is the session
> timeout of your ZooKeeper cluster. If you stop the job within the session
> timeout, then it is also not guaranteed that ZooKeeper has detected that
> the ephemeral nodes of the old JM must be deleted. In order to understand
> this better it would be helpful if you could tell us the timing of the
> different actions.
>
> Cheers,
> Till
>
> On Tue, Aug 28, 2018 at 8:17 AM vino yang  wrote:
>
>> Hi Encho,
>>
>> A temporary solution can be used to determine if it has been cleaned up
>> by monitoring the specific JobID under Zookeeper's "/jobgraph".
>> Another solution, modify the source code, rudely modify the cleanup mode
>> to the synchronous form, but the flink operation Zookeeper's path needs to
>> obtain the corresponding lock, so it is dangerous to do so, and it is not
>> recommended.
>> I think maybe this problem can be solved in the next version. It depends
>> on Till.
>>
>> Thanks, vino.
>>
>> Encho Mishinev  于2018年8月28日周二 下午1:17写道:
>>
>>> Thank you very much for the info! Will keep track of the progress.
>>>
>>> In the meantime is there any viable workaround? It seems like HA doesn't
>>> really work due to this bug.
>>>
>>> On Tue, Aug 28, 2018 at 4:52 AM vino yang  wrote:
>>>
 About some implementation mechanisms.
 Flink uses Zookeeper to store JobGraph (Job's description information
 and metadata) as a basis for Job recovery.
 However, previous implementations may cause this information to not be
 properly cleaned up because it is asynchronously deleted by a background
 thread.

 Thanks, vino.

 vino yang  于2018年8月28日周二 上午9:49写道:

> Hi Encho,
>
> This is a problem already known to the Flink community, you can track
> its progress through FLINK-10011[1], and currently Till is fixing this
> issue.
>
> [1]: https://issues.apache.org/jira/browse/FLINK-10011
>
> Thanks, vino.
>
> Encho Mishinev  于2018年8月27日周一 下午10:13写道:
>
>> I am running Flink 1.5.3 with two job managers and two task managers
>> in Kubernetes along with HDFS and Zookeeper in high-availability mode.
>>
>> My problem occurs after the following actions:
>> - Upload a .jar file to jobmanager-1
>> - Run a streaming job from the jar on jobmanager-1
>> - Wait for 1 or 2 checkpoints to succeed
>> - Kill pod of jobmanager-1
>> After a short delay, jobmanager-2 takes leadership and correctly
>> restores the job and continues it
>> - Stop job from jobmanager-2
>>
>> At this point all seems well, but the problem is that jobmanager-2
>> does not clean up anything that was left from jobmanager-1. This means 
>> that
>> both in HDFS and in Zookeeper remain job graphs, which later on obstruct
>> any work of both managers as after any reset they unsuccessfully try to
>> restore a non-existent job and fail over and over again.
>>
>> I am quite certain that 

Re: JobGraphs not cleaned up in HA mode

2018-08-28 Thread Till Rohrmann
Hi Encho,

thanks a lot for reporting this issue. The problem arises whenever the old
leader maintains the connection to ZooKeeper. If this is the case, then
ephemeral nodes which we create to protect against faulty delete operations
are not removed and consequently the new leader is not able to delete the
persisted job graph. So one thing to check is whether the old JM still has
an open connection to ZooKeeper. The next thing to check is the session
timeout of your ZooKeeper cluster. If you stop the job within the session
timeout, then it is also not guaranteed that ZooKeeper has detected that
the ephemeral nodes of the old JM must be deleted. In order to understand
this better it would be helpful if you could tell us the timing of the
different actions.

Cheers,
Till
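As a way to check this, here is a minimal sketch (not part of Flink) using Apache Curator to see whether a job graph znode still carries lock children left behind by an old JobManager; the connection string and the /flink/default path are placeholders matching the ZooKeeper log excerpts quoted further down in this message:

import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.retry.ExponentialBackoffRetry;

public class InspectJobGraphLocks {
    public static void main(String[] args) throws Exception {
        // Placeholder ZooKeeper quorum; adjust to your cluster.
        CuratorFramework client = CuratorFrameworkFactory.newClient(
                "zk-cs:2181", new ExponentialBackoffRetry(1000, 3));
        client.start();
        String jobGraphsPath = "/flink/default/jobgraphs";
        for (String jobId : client.getChildren().forPath(jobGraphsPath)) {
            // Children of a job graph node are the lock nodes created by JobManagers;
            // leftovers from a dead JM keep the graph from being deleted.
            int locks = client.getChildren().forPath(jobGraphsPath + "/" + jobId).size();
            System.out.println(jobId + " -> " + locks + " lock node(s)");
        }
        client.close();
    }
}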


Re: JobGraphs not cleaned up in HA mode

2018-08-28 Thread vino yang
Hi Encho,

As a temporary workaround, you can monitor the znode for the specific JobID
under ZooKeeper's "/jobgraphs" path to determine whether it has been cleaned up.
Another option is to modify the source code and change the cleanup to run
synchronously, but Flink's operations on these ZooKeeper paths need to acquire
the corresponding lock, so doing this is dangerous and not recommended.
I think this problem can be solved in the next version; it depends on Till.

Thanks, vino.
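
As an illustration of the monitoring workaround described above (it is not an
official Flink tool, and the connect string and path layout are assumptions), a
small sketch could poll ZooKeeper after a job has been cancelled and warn if its
job graph znode is still there:

import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.Stat;

public class StaleJobGraphMonitor {
    public static void main(String[] args) throws Exception {
        String jobId = args[0];                             // e.g. 83bfa359ca59ce1d4635e18e16651e15
        String path = "/flink/default/jobgraphs/" + jobId;  // assumed HA root + cluster-id
        ZooKeeper zk = new ZooKeeper("zookeeper:2181", 60000, event -> { });
        try {
            for (int i = 0; i < 10; i++) {
                Stat stat = zk.exists(path, false);
                if (stat == null) {
                    System.out.println("Job graph for " + jobId + " was cleaned up.");
                    return;
                }
                System.out.println("Job graph for " + jobId + " still present, waiting...");
                Thread.sleep(5_000L);
            }
            System.out.println("Job graph for " + jobId
                    + " was NOT cleaned up - it may need to be removed manually.");
        } finally {
            zk.close();
        }
    }
}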



Re: JobGraphs not cleaned up in HA mode

2018-08-27 Thread Encho Mishinev
Thank you very much for the info! Will keep track of the progress.

In the meantime is there any viable workaround? It seems like HA doesn't
really work due to this bug.



Re: JobGraphs not cleaned up in HA mode

2018-08-27 Thread vino yang
Hi Encho,

This is a problem already known to the Flink community, you can track its
progress through FLINK-10011[1], and currently Till is fixing this issue.

[1]: https://issues.apache.org/jira/browse/FLINK-10011

Thanks, vino.


JobGraphs not cleaned up in HA mode

2018-08-27 Thread Encho Mishinev
I am running Flink 1.5.3 with two job managers and two task managers in 
Kubernetes along with HDFS and Zookeeper in high-availability mode.

My problem occurs after the following actions:
- Upload a .jar file to jobmanager-1
- Run a streaming job from the jar on jobmanager-1
- Wait for 1 or 2 checkpoints to succeed
- Kill pod of jobmanager-1
After a short delay, jobmanager-2 takes leadership and correctly restores the 
job and continues it
- Stop job from jobmanager-2

At this point all seems well, but the problem is that jobmanager-2 does not 
clean up anything that was left by jobmanager-1. This means that job graphs 
remain in both HDFS and ZooKeeper, and they later obstruct any work of both 
managers: after any restart they unsuccessfully try to restore a job that no 
longer exists and fail over and over again.

I am quite certain that jobmanager-2 does not know about any of jobmanager-1’s 
files since the Zookeeper logs reveal that it tries to duplicate job folders:

2018-08-27 13:11:00,038 [myid:] - INFO  [ProcessThread(sid:0 
cport:2181)::PrepRequestProcessor@648] - Got user-level KeeperException when 
processing sessionid:0x1657aa15e480033 type:create cxid:0x46 zxid:0x1ab 
txntype:-1 reqpath:n/a Error 
Path:/flink/default/jobgraphs/83bfa359ca59ce1d4635e18e16651e15/bbb259fd-7826-4950-bc7c-c2be23346c77
 Error:KeeperErrorCode = NodeExists for 
/flink/default/jobgraphs/83bfa359ca59ce1d4635e18e16651e15/bbb259fd-7826-4950-bc7c-c2be23346c77

2018-08-27 13:11:02,296 [myid:] - INFO  [ProcessThread(sid:0 
cport:2181)::PrepRequestProcessor@648] - Got user-level KeeperException when 
processing sessionid:0x1657aa15e480033 type:create cxid:0x5c zxid:0x1ac 
txntype:-1 reqpath:n/a Error 
Path:/flink/default/checkpoint-counter/83bfa359ca59ce1d4635e18e16651e15 
Error:KeeperErrorCode = NodeExists for 
/flink/default/checkpoint-counter/83bfa359ca59ce1d4635e18e16651e15

Also jobmanager-2 attempts to delete the jobgraphs folder in Zookeeper when the 
job is stopped, but fails since there are leftover files in it from 
jobmanager-1:

2018-08-27 13:12:13,406 [myid:] - INFO  [ProcessThread(sid:0 
cport:2181)::PrepRequestProcessor@648] - Got user-level KeeperException when 
processing sessionid:0x1657aa15e480033 type:delete cxid:0xa8 zxid:0x1bd 
txntype:-1 reqpath:n/a Error 
Path:/flink/default/jobgraphs/83bfa359ca59ce1d4635e18e16651e15 
Error:KeeperErrorCode = Directory not empty for 
/flink/default/jobgraphs/83bfa359ca59ce1d4635e18e16651e15

I’ve noticed that when restoring the job, jobmanager-2 does not seem to get 
anything more than the job ID, while it perhaps needs some metadata as well. 
Here is the log that seems suspicious to me:

2018-08-27 13:09:18,113 INFO  
org.apache.flink.runtime.jobmanager.ZooKeeperSubmittedJobGraphStore  - 
Recovered SubmittedJobGraph(83bfa359ca59ce1d4635e18e16651e15, null).

All other logs from jobmanager-2 seem fine; it does not appear to be aware that 
it is overwriting anything or failing to delete properly.

My question is - what is the intended way for the job managers to correctly 
exchange metadata in HA mode and why is it not working for me?

Thanks in advance!
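
For anyone who ends up in the stuck state described above (job graphs left behind
in ZooKeeper and HDFS for a job that no longer exists), one last-resort option is
to stop the job managers and remove the stale entry by hand before restarting.
The sketch below only covers the ZooKeeper side, and every path in it is an
assumption; the matching job graph files under high-availability.storageDir in
HDFS would have to be removed separately.

import org.apache.zookeeper.ZooKeeper;

public class RemoveStaleJobGraph {
    public static void main(String[] args) throws Exception {
        // args[0] = the stale job ID; the path layout is an assumption.
        String path = "/flink/default/jobgraphs/" + args[0];
        ZooKeeper zk = new ZooKeeper("zookeeper:2181", 60000, event -> { });
        try {
            deleteRecursive(zk, path);
            System.out.println("Removed " + path);
        } finally {
            zk.close();
        }
    }

    // Delete the node together with any leftover (lock) children.
    private static void deleteRecursive(ZooKeeper zk, String path) throws Exception {
        for (String child : zk.getChildren(path, false)) {
            deleteRecursive(zk, path + "/" + child);
        }
        zk.delete(path, -1);  // -1 matches any version
    }
}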