Re: JobGraphs not cleaned up in HA mode
The chk-* directory is not found. I think it is missing because the jobmanager removes it automatically, but why is the job still in ZooKeeper?
Re: JobGraphs not cleaned up in HA mode
One more thing: you configured high-availability.cluster-id: /cluster-test; it should be high-availability.cluster-id: cluster-test. I don't think this is a major issue, but in case it helps, you can check. Can you check one more thing: is checkpointing happening or not? Were you able to see the chk-* folder under the checkpoint directory?

Regards
Bhaskar

On Thu, Nov 28, 2019 at 5:00 PM 曾祥才 wrote:
> hi,
> Is there any difference (for me, using NAS is more convenient to test currently)?
> From the docs it seems HDFS, S3, NFS etc. will all be fine.
>
> -- Original Message --
> *From:* "vino yang";
> *Date:* Thursday, Nov 28, 2019, 7:17 PM
> *To:* "曾祥才";
> *Cc:* "Vijay Bhaskar"; "User-Flink" <user@flink.apache.org>;
> *Subject:* Re: JobGraphs not cleaned up in HA mode
>
> Hi,
>
> Why do you not use HDFS directly?
>
> Best,
> Vino
>
> 曾祥才 wrote on Thu, Nov 28, 2019 at 6:48 PM:
>>
>> anyone have the same problem? pls help, thanks
>>
>> -- Original Message --
>> *From:* "曾祥才";
>> *Date:* Thursday, Nov 28, 2019, 2:46 PM
>> *To:* "Vijay Bhaskar";
>> *Cc:* "User-Flink";
>> *Subject:* Re: JobGraphs not cleaned up in HA mode
>>
>> the config (/flink is the NAS directory):
>>
>> jobmanager.rpc.address: flink-jobmanager
>> taskmanager.numberOfTaskSlots: 16
>> web.upload.dir: /flink/webUpload
>> blob.server.port: 6124
>> jobmanager.rpc.port: 6123
>> taskmanager.rpc.port: 6122
>> jobmanager.heap.size: 1024m
>> taskmanager.heap.size: 1024m
>> high-availability: zookeeper
>> high-availability.cluster-id: /cluster-test
>> high-availability.storageDir: /flink/ha
>> high-availability.zookeeper.quorum: :2181
>> high-availability.jobmanager.port: 6123
>> high-availability.zookeeper.path.root: /flink/risk-insight
>> high-availability.zookeeper.path.checkpoints: /flink/zk-checkpoints
>> state.backend: filesystem
>> state.checkpoints.dir: file:///flink/checkpoints
>> state.savepoints.dir: file:///flink/savepoints
>> state.checkpoints.num-retained: 2
>> jobmanager.execution.failover-strategy: region
>> jobmanager.archive.fs.dir: file:///flink/archive/history
>>
>> -- Original Message --
>> *From:* "Vijay Bhaskar";
>> *Date:* Thursday, Nov 28, 2019, 3:12 PM
>> *To:* "曾祥才";
>> *Cc:* "User-Flink";
>> *Subject:* Re: JobGraphs not cleaned up in HA mode
>>
>> Can you share the flink configuration once?
>>
>> Regards
>> Bhaskar
>>
>> On Thu, Nov 28, 2019 at 12:09 PM 曾祥才 wrote:
>>
>>> if i clean the zookeeper data, it runs fine. but the next time the jobmanager fails and is redeployed, the error occurs again
>>>
>>> -- Original Message --
>>> *From:* "Vijay Bhaskar";
>>> *Date:* Thursday, Nov 28, 2019, 3:05 PM
>>> *To:* "曾祥才";
>>> *Subject:* Re: JobGraphs not cleaned up in HA mode
>>>
>>> Again it could not find the state store file: "Caused by: java.io.FileNotFoundException: /flink/ha/submittedJobGraph0c6bcff01199"
>>> Check why it is unable to find it.
>>> The better thing is: clean up the zookeeper state, check your configurations, correct them, and restart the cluster.
>>> Otherwise it always picks up the corrupted state from zookeeper and it will never restart.
>>>
>>> Regards
>>> Bhaskar
>>>
>>> On Thu, Nov 28, 2019 at 11:51 AM 曾祥才 wrote:
>>>
>>>> i've made a mistake (the log before is from another cluster). the full exception log is:
>>>>
>>>> INFO org.apache.flink.runtime.dispatcher.StandaloneDispatcher - Recovering all persisted jobs.
>>>> 2019-11-28 02:33:12,726 INFO org.apache.flink.runtime.resourcemanager.slotmanager.SlotManagerImpl - Starting the SlotManager.
>>>> 2019-11-28 02:33:12,743 INFO org.apache.flink.runtime.jobmanager.ZooKeeperSubmittedJobGraphStore - Released locks of job graph 639170a9d710bacfd113ca66b2aacefa from ZooKeeper.
>>>> 2019-11-28 02:33:12,744 ERROR org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Fatal error occurred in the cluster entrypoint.
>>>> org.apache.flink.runtime.dispatcher.DispatcherException: Failed to take leadership with session id 607e1fe0-f1b4-4af0-af16-4137d779defb.
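Vijay's chk-* check can be scripted. Below is a minimal sketch, assuming the state.checkpoints.dir from the config quoted above (file:///flink/checkpoints) and the filesystem state backend's usual layout of one chk-<n> directory per retained checkpoint under each job ID; the function name is illustrative, not a Flink API.

```python
# Sketch: list completed checkpoints per job under a checkpoint root.
# Assumes the layout <checkpoint-root>/<job-id>/chk-<n>/ used by the
# filesystem state backend; adjust if your layout differs.
from pathlib import Path


def completed_checkpoints(ckpt_root):
    """Map each job ID directory under ckpt_root to its chk-* subdirectories."""
    result = {}
    for job_dir in Path(ckpt_root).iterdir():
        if not job_dir.is_dir():
            continue
        chks = sorted(p.name for p in job_dir.glob("chk-*") if p.is_dir())
        if chks:
            result[job_dir.name] = chks
    return result


# Usage with the path from the config above:
#   completed_checkpoints("/flink/checkpoints")
```

An empty result would mean no checkpoint ever completed, which is worth ruling out before debugging the ZooKeeper side.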
Re: JobGraphs not cleaned up in HA mode
Can you share the flink configuration once?

Regards
Bhaskar

On Thu, Nov 28, 2019 at 12:09 PM 曾祥才 wrote:
> if i clean the zookeeper data, it runs fine. but the next time the jobmanager fails and is redeployed, the error occurs again
>
> -- Original Message --
> *From:* "Vijay Bhaskar";
> *Date:* Thursday, Nov 28, 2019, 3:05 PM
> *To:* "曾祥才";
> *Subject:* Re: JobGraphs not cleaned up in HA mode
>
> Again it could not find the state store file: "Caused by: java.io.FileNotFoundException: /flink/ha/submittedJobGraph0c6bcff01199"
> Check why it is unable to find it.
> The better thing is: clean up the zookeeper state, check your configurations, correct them, and restart the cluster.
> Otherwise it always picks up the corrupted state from zookeeper and it will never restart.
>
> Regards
> Bhaskar
>
> On Thu, Nov 28, 2019 at 11:51 AM 曾祥才 wrote:
>
>> i've made a mistake (the log before is from another cluster). the full exception log is:
>>
>> INFO org.apache.flink.runtime.dispatcher.StandaloneDispatcher - Recovering all persisted jobs.
>> 2019-11-28 02:33:12,726 INFO org.apache.flink.runtime.resourcemanager.slotmanager.SlotManagerImpl - Starting the SlotManager.
>> 2019-11-28 02:33:12,743 INFO org.apache.flink.runtime.jobmanager.ZooKeeperSubmittedJobGraphStore - Released locks of job graph 639170a9d710bacfd113ca66b2aacefa from ZooKeeper.
>> 2019-11-28 02:33:12,744 ERROR org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Fatal error occurred in the cluster entrypoint.
>> org.apache.flink.runtime.dispatcher.DispatcherException: Failed to take leadership with session id 607e1fe0-f1b4-4af0-af16-4137d779defb.
>> at org.apache.flink.runtime.dispatcher.Dispatcher.lambda$null$30(Dispatcher.java:915)
>> at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774)
>> at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750)
>> at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
>> at java.util.concurrent.CompletableFuture.postFire(CompletableFuture.java:575)
>> at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:594)
>> at java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:456)
>> at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40)
>> at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(ForkJoinExecutorConfigurator.scala:44)
>> at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>> at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>> at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>> at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
>> Caused by: java.lang.RuntimeException: org.apache.flink.util.FlinkException: Could not retrieve submitted JobGraph from state handle under /639170a9d710bacfd113ca66b2aacefa. This indicates that the retrieved state handle is broken. Try cleaning the state handle store.
>> at org.apache.flink.util.ExceptionUtils.rethrow(ExceptionUtils.java:199)
>> at org.apache.flink.util.function.FunctionUtils.lambda$uncheckedFunction$2(FunctionUtils.java:75)
>> at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:616)
>> at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:591)
>> ... 7 more
>> Caused by: org.apache.flink.util.FlinkException: Could not retrieve submitted JobGraph from state handle under /639170a9d710bacfd113ca66b2aacefa. This indicates that the retrieved state handle is broken. Try cleaning the state handle store.
>> at org.apache.flink.runtime.jobmanager.ZooKeeperSubmittedJobGraphStore.recoverJobGraph(ZooKeeperSubmittedJobGraphStore.java:190)
>> at org.apache.flink.runtime.dispatcher.Dispatcher.recoverJob(Dispatcher.java:755)
>> at org.apache.flink.runtime.dispatcher.Dispatcher.recoverJobGraphs(Dispatcher.java:740)
>> at org.apache.flink.runtime.dispatcher.Dispatcher.recoverJobs(Dispatcher.java:721)
>> at org.apache.flink.runtime.dispatcher.Dispatcher.lambda$null$27(Dispatcher.java:888)
>> at org.apache.flink.util.function.FunctionUtils.lambda$uncheckedFunction$2(FunctionUtils.java:73)
>> ... 9 more
>> Caused by: java.io.FileNotFoundException: /flink/ha/submittedJobGraph0c6bcff01199
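The exception chain above involves two separate stores: a znode in ZooKeeper that holds only a state handle (a pointer), and the serialized JobGraph file under high-availability.storageDir that the handle points to. The FileNotFoundException means the znode survived while the file was deleted. As a sketch of where the two halves live, assuming the config quoted earlier in the thread and Flink's default jobgraphs sub-path (both are assumptions, and the helper below is illustrative, not a Flink API):

```python
# Sketch: the two locations that must stay consistent for HA recovery.
# Paths are assumptions taken from the flink-conf quoted in this thread.
def ha_locations(job_id,
                 zk_root="/flink/risk-insight",  # high-availability.zookeeper.path.root
                 cluster_id="cluster-test",      # high-availability.cluster-id
                 storage_dir="/flink/ha"):       # high-availability.storageDir
    """Return (znode path, storage file glob) for a submitted job graph.

    The znode stores a pointer; the serialized JobGraph lives in a
    submittedJobGraph* file (random suffix) under storage_dir. A stale
    znode pointing at a deleted file yields exactly the
    FileNotFoundException shown in the log above.
    """
    znode = f"{zk_root}/{cluster_id}/jobgraphs/{job_id}"
    storage_glob = f"{storage_dir}/submittedJobGraph*"
    return znode, storage_glob


# Usage: check both sides for the job ID from the log, e.g. with
# zkCli.sh on the znode and ls on the storage glob.
# ha_locations("639170a9d710bacfd113ca66b2aacefa")
```

If the znode exists but no matching file does, deleting the stale znode (Vijay's "clean up zookeeper state") lets the dispatcher start without the broken handle.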
Re: JobGraphs not cleaned up in HA mode
Is it filesystem or hadoop? If it's NAS, then why the exception "Caused by: org.apache.hadoop.hdfs.BlockMissingException:"? It seems you configured a hadoop state store but are giving a NAS mount.

Regards
Bhaskar

On Thu, Nov 28, 2019 at 11:36 AM 曾祥才 wrote:
> /flink/checkpoints is an external persistent store (a NAS directory mounted to the job manager)
>
> -- Original Message --
> *From:* "Vijay Bhaskar";
> *Date:* Thursday, Nov 28, 2019, 2:29 PM
> *To:* "曾祥才";
> *Cc:* "user";
> *Subject:* Re: JobGraphs not cleaned up in HA mode
>
> The following are the mandatory conditions to run in HA:
>
> a) You should have a persistent common external store for the jobmanager and task managers to write the state to
> b) You should have a persistent external store for zookeeper to store the jobgraph
>
> Zookeeper is referring to the path /flink/checkpoints/submittedJobGraph480ddf9572ed to get the job graph, but the jobmanager is unable to find it.
> It seems /flink/checkpoints is not the external persistent store.
>
> Regards
> Bhaskar
>
> On Thu, Nov 28, 2019 at 10:43 AM seuzxc wrote:
>
>> hi, I've the same problem with flink 1.9.1; any solution to fix it?
>> when k8s redeploys the jobmanager, the error looks like (it seems zk does not remove the submitted job info, but the jobmanager removes the file):
>>
>> Caused by: org.apache.flink.util.FlinkException: Could not retrieve submitted JobGraph from state handle under /147dd022ec91f7381ad4ca3d290387e9. This indicates that the retrieved state handle is broken. Try cleaning the state handle store.
>> at org.apache.flink.runtime.jobmanager.ZooKeeperSubmittedJobGraphStore.recoverJobGraph(ZooKeeperSubmittedJobGraphStore.java:208)
>> at org.apache.flink.runtime.dispatcher.Dispatcher.recoverJob(Dispatcher.java:696)
>> at org.apache.flink.runtime.dispatcher.Dispatcher.recoverJobGraphs(Dispatcher.java:681)
>> at org.apache.flink.runtime.dispatcher.Dispatcher.recoverJobs(Dispatcher.java:662)
>> at org.apache.flink.runtime.dispatcher.Dispatcher.lambda$null$26(Dispatcher.java:821)
>> at org.apache.flink.util.function.FunctionUtils.lambda$uncheckedFunction$2(FunctionUtils.java:72)
>> ... 9 more
>> Caused by: org.apache.hadoop.hdfs.BlockMissingException: Could not obtain block: BP-1651346363-10.20.1.81-1525354906737:blk_1083182315_9441494 file=/flink/checkpoints/submittedJobGraph480ddf9572ed
>> at org.apache.hadoop.hdfs.DFSInputStream.refetchLocations(DFSInputStream.java:1052)
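Vijay's point can be made concrete in the configuration. If the state store really is a NAS mount, every state path should use the file:// scheme so Flink's local filesystem support is used; the BlockMissingException in the quoted log comes from the Hadoop HDFS client, which suggests an hdfs:// path (or an HDFS fs.default-scheme) is in effect somewhere. A sketch of the two variants, reusing path names from this thread (the namenode address is a placeholder, not taken from the thread):

```yaml
# NAS variant: /flink mounted on every JobManager and TaskManager pod.
high-availability.storageDir: file:///flink/ha
state.checkpoints.dir: file:///flink/checkpoints
state.savepoints.dir: file:///flink/savepoints

# HDFS variant (what the BlockMissingException implies is configured):
# high-availability.storageDir: hdfs://namenode-host:8020/flink/ha
# state.checkpoints.dir: hdfs://namenode-host:8020/flink/checkpoints
# state.savepoints.dir: hdfs://namenode-host:8020/flink/savepoints
```

Mixing the two (an hdfs:// scheme resolving to a path that only exists on the NAS) would produce exactly this mismatch between where the handle points and where the file lives.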
Re: JobGraphs not cleaned up in HA mode
Hi Encho,

thanks for sending the first part of the logs. What I would actually be interested in are the complete logs, because somewhere in the jobmanager-2 logs there must be a log statement saying that the respective dispatcher gained leadership. I would like to see why this happens, but to debug it the complete logs are necessary. It would be awesome if you could send them to me. Thanks a lot!

Cheers,
Till

On Wed, Aug 29, 2018 at 2:00 PM Encho Mishinev wrote:
> Hi Till,
>
> I will use the approach with a k8s deployment and HA mode with a single job manager. Nonetheless, here are the logs I just produced by repeating the aforementioned experiment; hope they help in debugging:
>
> *- Starting Jobmanager-1:*
>
> Starting Job Manager
> sed: cannot rename /opt/flink/conf/sedR98XPn: Device or resource busy
> config file:
> jobmanager.rpc.address: flink-jobmanager-1
> jobmanager.rpc.port: 6123
> jobmanager.heap.size: 8192
> taskmanager.heap.size: 8192
> taskmanager.numberOfTaskSlots: 4
> high-availability: zookeeper
> high-availability.storageDir: hdfs://hdfs-namenode-0.hdfs-namenode.default.svc.cluster.local:8020/flink/high-availability
> high-availability.zookeeper.quorum: zk-cs:2181
> high-availability.zookeeper.path.root: /flink
> high-availability.jobmanager.port: 50010
> state.backend: filesystem
> state.checkpoints.dir: hdfs://hdfs-namenode-0.hdfs-namenode.default.svc.cluster.local:8020/flink/checkpoints
> state.savepoints.dir: hdfs://hdfs-namenode-0.hdfs-namenode.default.svc.cluster.local:8020/flink/savepoints
> state.backend.incremental: false
> fs.default-scheme: hdfs://hdfs-namenode-0.hdfs-namenode.default.svc.cluster.local:8020
> rest.port: 8081
> web.upload.dir: /opt/flink/upload
> query.server.port: 6125
> taskmanager.numberOfTaskSlots: 4
> classloader.parent-first-patterns.additional: org.apache.xerces.
> blob.storage.directory: /opt/flink/blob-server
> blob.server.port: 6124
> blob.server.port: 6124
> query.server.port: 6125
> Starting standalonesession as a console application on host flink-jobmanager-1-f76fd4df8-ftwt9.
> 2018-08-29 11:41:48,806 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint -
> 2018-08-29 11:41:48,807 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Starting StandaloneSessionClusterEntrypoint (Version: 1.5.3, Rev:614f216, Date:16.08.2018 @ 06:39:50 GMT)
> 2018-08-29 11:41:48,807 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - OS current user: flink
> 2018-08-29 11:41:49,134 WARN org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> 2018-08-29 11:41:49,210 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Current Hadoop/Kerberos user: flink
> 2018-08-29 11:41:49,210 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - JVM: OpenJDK 64-Bit Server VM - Oracle Corporation - 1.8/25.181-b13
> 2018-08-29 11:41:49,210 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Maximum heap size: 6702 MiBytes
> 2018-08-29 11:41:49,210 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - JAVA_HOME: /docker-java-home/jre
> 2018-08-29 11:41:49,213 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Hadoop version: 2.7.5
> 2018-08-29 11:41:49,213 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - JVM Options:
> 2018-08-29 11:41:49,213 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - -Dlog4j.configuration=file:/opt/flink/conf/log4j-console.properties
> 2018-08-29 11:41:49,213 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - -Dlogback.configurationFile=file:/opt/flink/conf/logback-console.xml
> 2018-08-29 11:41:49,213 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Program Arguments:
> 2018-08-29 11:41:49,213 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - --configDir
> 2018-08-29 11:41:49,213 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - /opt/flink/conf
> 2018-08-29 11:41:49,213 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - --executionMode
> 2018-08-29 11:41:49,213 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - cluster
> 2018-08-29 11:41:49,214 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - --host
> 2018-08-29 11:41:49,214 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - cluster
> 2018-08-29 11:41:49,214 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Classpath: /opt/flink/lib/flink-python_2.11-1.5.3.jar:/opt/flink/lib/flink-shaded-hadoop2-uber-1.5.3.jar:/opt/flink/lib/log4j-1.2.17.jar:/opt/flink/lib/slf4j-log4j12-1.7.7.jar:/opt/flink/lib/flink-dist_2.11-1.5.3.jar:::
Re: JobGraphs not cleaned up in HA mode
Hi Till,

I will use the approach with a k8s deployment and HA mode with a single job manager. Nonetheless, here are the logs I just produced by repeating the aforementioned experiment; hope they help in debugging:

*- Starting Jobmanager-1:*

Starting Job Manager
sed: cannot rename /opt/flink/conf/sedR98XPn: Device or resource busy
config file:
jobmanager.rpc.address: flink-jobmanager-1
jobmanager.rpc.port: 6123
jobmanager.heap.size: 8192
taskmanager.heap.size: 8192
taskmanager.numberOfTaskSlots: 4
high-availability: zookeeper
high-availability.storageDir: hdfs://hdfs-namenode-0.hdfs-namenode.default.svc.cluster.local:8020/flink/high-availability
high-availability.zookeeper.quorum: zk-cs:2181
high-availability.zookeeper.path.root: /flink
high-availability.jobmanager.port: 50010
state.backend: filesystem
state.checkpoints.dir: hdfs://hdfs-namenode-0.hdfs-namenode.default.svc.cluster.local:8020/flink/checkpoints
state.savepoints.dir: hdfs://hdfs-namenode-0.hdfs-namenode.default.svc.cluster.local:8020/flink/savepoints
state.backend.incremental: false
fs.default-scheme: hdfs://hdfs-namenode-0.hdfs-namenode.default.svc.cluster.local:8020
rest.port: 8081
web.upload.dir: /opt/flink/upload
query.server.port: 6125
taskmanager.numberOfTaskSlots: 4
classloader.parent-first-patterns.additional: org.apache.xerces.
blob.storage.directory: /opt/flink/blob-server
blob.server.port: 6124
blob.server.port: 6124
query.server.port: 6125
Starting standalonesession as a console application on host flink-jobmanager-1-f76fd4df8-ftwt9.
2018-08-29 11:41:48,806 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint -
2018-08-29 11:41:48,807 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Starting StandaloneSessionClusterEntrypoint (Version: 1.5.3, Rev:614f216, Date:16.08.2018 @ 06:39:50 GMT)
2018-08-29 11:41:48,807 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - OS current user: flink
2018-08-29 11:41:49,134 WARN org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2018-08-29 11:41:49,210 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Current Hadoop/Kerberos user: flink
2018-08-29 11:41:49,210 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - JVM: OpenJDK 64-Bit Server VM - Oracle Corporation - 1.8/25.181-b13
2018-08-29 11:41:49,210 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Maximum heap size: 6702 MiBytes
2018-08-29 11:41:49,210 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - JAVA_HOME: /docker-java-home/jre
2018-08-29 11:41:49,213 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Hadoop version: 2.7.5
2018-08-29 11:41:49,213 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - JVM Options:
2018-08-29 11:41:49,213 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - -Dlog4j.configuration=file:/opt/flink/conf/log4j-console.properties
2018-08-29 11:41:49,213 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - -Dlogback.configurationFile=file:/opt/flink/conf/logback-console.xml
2018-08-29 11:41:49,213 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Program Arguments:
2018-08-29 11:41:49,213 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - --configDir
2018-08-29 11:41:49,213 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - /opt/flink/conf
2018-08-29 11:41:49,213 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - --executionMode
2018-08-29 11:41:49,213 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - cluster
2018-08-29 11:41:49,214 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - --host
2018-08-29 11:41:49,214 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - cluster
2018-08-29 11:41:49,214 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Classpath: /opt/flink/lib/flink-python_2.11-1.5.3.jar:/opt/flink/lib/flink-shaded-hadoop2-uber-1.5.3.jar:/opt/flink/lib/log4j-1.2.17.jar:/opt/flink/lib/slf4j-log4j12-1.7.7.jar:/opt/flink/lib/flink-dist_2.11-1.5.3.jar:::
2018-08-29 11:41:49,214 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint -
2018-08-29 11:41:49,215 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Registered UNIX signal handlers for [TERM, HUP, INT]
2018-08-29 11:41:49,221 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.rpc.address, flink-jobmanager-1
2018-08-29 11:41:49,221 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmana
Re: JobGraphs not cleaned up in HA mode
Hi Encho,

it sounds strange that the standby JobManager tries to recover a submitted job graph. This should only happen if it has been granted leadership. Thus, it seems as if the standby JobManager thinks that it is also the leader. Could you maybe share the logs of the two JobManagers/ClusterEntrypoints with us?

Running only a single JobManager/ClusterEntrypoint in HA mode via a Kubernetes Deployment should do the trick, and there is nothing wrong with it.

Cheers,
Till

On Wed, Aug 29, 2018 at 11:05 AM Encho Mishinev wrote:
> Hello,
>
> Since two job managers don't seem to be working for me, I was thinking of just using a single job manager in Kubernetes in HA mode, with a deployment ensuring its restart whenever it fails. Is this approach viable? The High Availability page mentions that you use only one job manager in a YARN cluster but does not specify such an option for Kubernetes. Is there anything that can go wrong with this approach?
>
> Thanks
>
> On Wed, Aug 29, 2018 at 11:10 AM Encho Mishinev wrote:
>
>> Hi,
>>
>> Unfortunately the thing I described does indeed happen every time. As mentioned in the first email, I am running on Kubernetes, so certain things could be different compared to just a standalone cluster.
>>
>> Any ideas for workarounds are welcome, as this problem basically prevents me from using HA.
>>
>> Thanks,
>> Encho
>>
>> On Wed, Aug 29, 2018 at 5:15 AM vino yang wrote:
>>
>>> Hi Encho,
>>>
>>> From your description, I feel that there are extra bugs.
>>>
>>> About your description:
>>>
>>> *- Start both job managers*
>>> *- Start a batch job in JobManager 1 and let it finish*
>>> *The jobgraphs in both Zookeeper and HDFS remained.*
>>>
>>> Does it necessarily happen every time?
>>>
>>> In the Standalone cluster, the problems we encountered were sporadic.
>>>
>>> Thanks, vino.
>>>
>>> Encho Mishinev wrote on Tue, Aug 28, 2018 at 8:07 PM:
>>>
>>>> Hello Till,
>>>>
>>>> I spent a few more hours testing and looking at the logs, and it seems like there's a more general problem here. While the two job managers are active, neither of them can properly delete jobgraphs. The above problem I described comes from the fact that Kubernetes brings JobManager 1 back quickly after I manually kill it, so when I stop the job on JobManager 2 both are alive.
>>>>
>>>> I did a very simple test:
>>>>
>>>> - Start both job managers
>>>> - Start a batch job in JobManager 1 and let it finish
>>>> The jobgraphs in both Zookeeper and HDFS remained.
>>>>
>>>> On the other hand, if we do:
>>>>
>>>> - Start only JobManager 1 (again in HA mode)
>>>> - Start a batch job and let it finish
>>>> The jobgraphs in both Zookeeper and HDFS are deleted fine.
>>>>
>>>> It seems like the standby manager still leaves some kind of lock on the jobgraphs. Do you think that's possible? Have you seen a similar problem?
>>>>
>>>> The only logs that appear on the standby manager while waiting are of the type:
>>>>
>>>> 2018-08-28 11:54:10,789 INFO org.apache.flink.runtime.jobmanager.ZooKeeperSubmittedJobGraphStore - Recovered SubmittedJobGraph(9e0a109b57511930c95d3b54574a66e3, null).
>>>>
>>>> Note that this log appears on the standby jobmanager immediately when a new job is submitted to the active jobmanager. Also note that the blobs and checkpoints are cleared fine. The problem is only with jobgraphs, both in ZooKeeper and HDFS. Trying to access the UI of the standby manager redirects to the active one, so it is not a problem of them not knowing who the leader is.
>>>>
>>>> Do you have any ideas?
>>>>
>>>> Thanks a lot,
>>>> Encho
>>>>
>>>> On Tue, Aug 28, 2018 at 10:27 AM Till Rohrmann wrote:
>>>>
>>>>> Hi Encho,
>>>>>
>>>>> thanks a lot for reporting this issue. The problem arises whenever the old leader maintains the connection to ZooKeeper. If this is the case, then the ephemeral nodes which we create to protect against faulty delete operations are not removed, and consequently the new leader is not able to delete the persisted job graph. So one thing to check is whether the old JM still has an open connection to ZooKeeper. The next thing to check is the session timeout of your ZooKeeper cluster. If you stop the job within the session timeout, then it is also not guaranteed that ZooKeeper has detected that the ephemeral nodes of the old JM must be deleted. In order to understand this better, it would be helpful if you could tell us the timing of the different actions.
>>>>>
>>>>> Cheers,
>>>>> Till
>>>>>
>>>>> On Tue, Aug 28, 2018 at 8:17 AM vino yang wrote:
>>>>>
>>>>>> Hi Encho,
>>>>>>
>>>>>> A temporary solution can be used to determine if it has been cleaned up: monitor the specific JobID under Zookeeper's "/jobgraph".
>>>>>> Another solution: modify the source code and crudely change the cleanup mode to the synchronous form, but the flink operation Zookeeper's
Re: JobGraphs not cleaned up in HA mode
Hello,

Since two job managers don't seem to be working for me, I was thinking of just using a single job manager in Kubernetes in HA mode, with a Deployment ensuring its restart whenever it fails. Is this approach viable? The High-Availability page mentions using only one job manager in a YARN cluster, but does not specify such an option for Kubernetes. Is there anything that can go wrong with this approach?

Thanks
Re: JobGraphs not cleaned up in HA mode
Hi,

Unfortunately the thing I described does indeed happen every time. As mentioned in the first email, I am running on Kubernetes, so certain things could be different compared to just a standalone cluster.

Any ideas for workarounds are welcome, as this problem basically prevents me from using HA.

Thanks,
Encho
Re: JobGraphs not cleaned up in HA mode
Hi Encho,

From your description, I feel that there are extra bugs.

About your description:

- Start both job managers
- Start a batch job in JobManager 1 and let it finish
- The jobgraphs in both Zookeeper and HDFS remained.

Does it necessarily happen every time? In the Standalone cluster, the problems we encountered were sporadic.

Thanks, vino.
Re: JobGraphs not cleaned up in HA mode
Hello Till,

I spent a few more hours testing and looking at the logs, and it seems like there's a more general problem here. While the two job managers are active, neither of them can properly delete jobgraphs. The problem I described above comes from the fact that Kubernetes brings JobManager 1 back quickly after I manually kill it, so when I stop the job on JobManager 2 both are alive.

I did a very simple test:

- Start both job managers
- Start a batch job in JobManager 1 and let it finish
The jobgraphs in both Zookeeper and HDFS remained.

On the other hand, if we do:

- Start only JobManager 1 (again in HA mode)
- Start a batch job and let it finish
The jobgraphs in both Zookeeper and HDFS are deleted fine.

It seems like the standby manager still leaves some kind of lock on the jobgraphs. Do you think that's possible? Have you seen a similar problem? The only logs that appear on the standby manager while waiting are of the type:

2018-08-28 11:54:10,789 INFO org.apache.flink.runtime.jobmanager.ZooKeeperSubmittedJobGraphStore - Recovered SubmittedJobGraph(9e0a109b57511930c95d3b54574a66e3, null).

Note that this log appears on the standby jobmanager immediately when a new job is submitted to the active jobmanager. Also note that the blobs and checkpoints are cleared fine. The problem is only with jobgraphs, both in ZooKeeper and HDFS.

Trying to access the UI of the standby manager redirects to the active one, so it is not a problem of them not knowing who the leader is. Do you have any ideas?

Thanks a lot,
Encho
Re: JobGraphs not cleaned up in HA mode
Hi Encho,

thanks a lot for reporting this issue. The problem arises whenever the old leader maintains its connection to ZooKeeper. If this is the case, then the ephemeral nodes which we create to protect against faulty delete operations are not removed, and consequently the new leader is not able to delete the persisted job graph. So one thing to check is whether the old JM still has an open connection to ZooKeeper. The next thing to check is the session timeout of your ZooKeeper cluster. If you stop the job within the session timeout, then it is also not guaranteed that ZooKeeper has detected that the ephemeral nodes of the old JM must be deleted. In order to understand this better, it would be helpful if you could tell us the timing of the different actions.

Cheers,
Till
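Till's explanation above can be reduced to a small invariant: the jobgraph znode is only deletable once every ephemeral lock child is gone, and a lock child only disappears when ZooKeeper expires its owner's session. A minimal Python sketch of that check (the function and data shapes are illustrative, not Flink's actual code; the lock-node and session names are taken from the logs in this thread):

```python
def can_delete_jobgraph(lock_children, dead_sessions):
    """A jobgraph znode is deletable only when no live lock child remains.

    lock_children: mapping of ephemeral lock-node name -> owning ZK session id
    dead_sessions: set of session ids that ZooKeeper has already expired
    """
    live_locks = [name for name, session in lock_children.items()
                  if session not in dead_sessions]
    return len(live_locks) == 0


# The old leader's session is still open (within the session timeout), so its
# ephemeral lock survives and the delete fails with "Directory not empty".
locks = {"bbb259fd-7826-4950-bc7c-c2be23346c77": "0x1657aa15e480033"}
print(can_delete_jobgraph(locks, dead_sessions=set()))    # False: lock still held
print(can_delete_jobgraph(locks, {"0x1657aa15e480033"}))  # True: session expired
```

This is why stopping the job within the session timeout can leave the graph behind: the new leader sees a live lock child from a session ZooKeeper has not yet expired.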
Re: JobGraphs not cleaned up in HA mode
Hi Encho,

As a temporary solution, you can determine whether a job has been cleaned up by monitoring the specific JobID under Zookeeper's "/jobgraphs" path. Another option is to modify the source code and crudely change the cleanup from asynchronous to synchronous; however, Flink's operations on the Zookeeper path need to obtain the corresponding lock, so doing this is dangerous and not recommended. I think this problem may be solved in the next version. It depends on Till.

Thanks, vino.

Encho Mishinev wrote on Mon, Aug 27, 2018 at 10:13 PM:

> I am running Flink 1.5.3 with two job managers and two task managers in Kubernetes, along with HDFS and Zookeeper in high-availability mode.
>
> My problem occurs after the following actions:
> - Upload a .jar file to jobmanager-1
> - Run a streaming job from the jar on jobmanager-1
> - Wait for 1 or 2 checkpoints to succeed
> - Kill the pod of jobmanager-1; after a short delay, jobmanager-2 takes leadership and correctly restores the job and continues it
> - Stop the job from jobmanager-2
>
> At this point all seems well, but the problem is that jobmanager-2 does not clean up anything that was left from jobmanager-1. This means that job graphs remain both in HDFS and in Zookeeper, which later on obstruct any work of both managers, as after any restart they unsuccessfully try to restore a non-existent job and fail over and over again.
>
> I am quite certain that jobmanager-2 does not know about any of jobmanager-1's files, since the Zookeeper logs reveal that it tries to duplicate job folders:
>
> 2018-08-27 13:11:00,038 [myid:] - INFO [ProcessThread(sid:0 cport:2181)::PrepRequestProcessor@648] - Got user-level KeeperException when processing sessionid:0x1657aa15e480033 type:create cxid:0x46 zxid:0x1ab txntype:-1 reqpath:n/a Error Path:/flink/default/jobgraphs/83bfa359ca59ce1d4635e18e16651e15/bbb259fd-7826-4950-bc7c-c2be23346c77 Error:KeeperErrorCode = NodeExists for /flink/default/jobgraphs/83bfa359ca59ce1d4635e18e16651e15/bbb259fd-7826-4950-bc7c-c2be23346c77
>
> 2018-08-27 13:11:02,296 [myid:] - INFO [ProcessThread(sid:0 cport:2181)::PrepRequestProcessor@648] - Got user-level KeeperException when processing sessionid:0x1657aa15e480033 type:create cxid:0x5c zxid:0x1ac txntype:-1 reqpath:n/a Error Path:/flink/default/checkpoint-counter/83bfa359ca59ce1d4635e18e16651e15 Error:KeeperErrorCode = NodeExists for /flink/default/checkpoint-counter/83bfa359ca59ce1d4635e18e16651e15
>
> Also, jobmanager-2 attempts to delete the jobgraphs folder in Zookeeper when the job is stopped, but fails since there are leftover files in it from jobmanager-1:
>
> 2018-08-27 13:12:13,406 [myid:] - INFO [ProcessThread(sid:0 cport:2181)::PrepRequestProcessor@648] - Got user-level KeeperException when processing sessionid:0x1657aa15e480033 type:delete cxid:0xa8 zxid:0x1bd txntype:-1 reqpath:n/a Error Path:/flink/default/jobgraphs/83bfa359ca59ce1d4635e18e16651e15 Error:KeeperErrorCode = Directory not empty for /flink/default/jobgraphs/83bfa359ca59ce1d4635e18e16651e15
>
> I've noticed that when restoring the job, jobmanager-2 does not seem to get anything more than the jobID, while it perhaps needs some metadata? Here is the log that seems suspicious to me:
>
> 2018-08-27 13:09:18,113 INFO org.apache.flink.runtime.jobmanager.ZooKeeperSubmittedJobGraphStore - Recovered SubmittedJobGraph(83bfa359ca59ce1d4635e18e16651e15, null).
>
> All other logs seem fine in jobmanager-2; it doesn't seem to be aware that it's overwriting anything or not deleting properly.
>
> My question is: what is the intended way for the job managers to correctly exchange metadata in HA mode, and why is it not working for me?
>
> Thanks in advance!
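vino's temporary workaround, monitoring whether a JobID still lingers under the jobgraphs path, could look roughly like the sketch below. The function name is hypothetical; `zk_children` stands in for a listing obtained from ZooKeeper (e.g. via `zkCli.sh ls /flink/default/jobgraphs`), and the JobID is the one from the logs above:

```python
def leftover_jobgraphs(zk_children, running_job_ids):
    """Return jobgraph znodes that no currently running job accounts for.

    zk_children: names listed under the jobgraphs path (e.g. /flink/default/jobgraphs)
    running_job_ids: JobIDs the cluster currently reports as running
    """
    return sorted(set(zk_children) - set(running_job_ids))


# The stopped job's graph is still present in ZooKeeper, so it is flagged:
children = ["83bfa359ca59ce1d4635e18e16651e15"]
print(leftover_jobgraphs(children, running_job_ids=[]))
# ['83bfa359ca59ce1d4635e18e16651e15']
```

A cron job running this comparison and alerting (or, with care, deleting the stale znodes) would give early warning of the cleanup failure described in this thread.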
Re: JobGraphs not cleaned up in HA mode
Thank you very much for the info! Will keep track of the progress. In the meantime, is there any viable workaround? It seems like HA doesn't really work due to this bug.
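Until the fix for FLINK-10011 lands, the usual last-resort workaround is to remove the stale znodes and HA files by hand while no JobManager is running. The sketch below only computes the paths to inspect; the znode layout (`/flink/<cluster-id>/jobgraphs/<job-id>`, `/flink/<cluster-id>/checkpoint-counter/<job-id>`) is taken from the logs quoted in this thread, and the `checkpoints` child and default arguments are assumptions based on this setup, so adjust them to your `high-availability.zookeeper.path.root` and `high-availability.cluster-id`.

```python
# Sketch: compute the ZooKeeper znodes (and the HA storage directory)
# that may be left behind for a given job, so an operator can inspect
# and delete them manually, e.g. with zkCli.sh. Path layout is taken
# from the log lines in this thread; your path.root / cluster-id may
# differ, and the "checkpoints" entry is an assumption.

def stale_ha_paths(job_id, zk_root="/flink", cluster_id="default",
                   ha_storage_dir="/flink/ha"):
    base = f"{zk_root}/{cluster_id}"
    return {
        "jobgraph_znode": f"{base}/jobgraphs/{job_id}",
        "checkpoint_counter_znode": f"{base}/checkpoint-counter/{job_id}",
        "checkpoints_znode": f"{base}/checkpoints/{job_id}",
        # submitted job graph files and blobs live under the HA storage dir
        "ha_storage_dir": ha_storage_dir,
    }

paths = stale_ha_paths("83bfa359ca59ce1d4635e18e16651e15")
print(paths["jobgraph_znode"])
# -> /flink/default/jobgraphs/83bfa359ca59ce1d4635e18e16651e15
```

Deleting such a znode recursively in zkCli.sh (`rmr` on older ZooKeeper versions, `deleteall` on newer ones) with all JobManagers stopped should let the cluster come back up cleanly, but treat this as a manual repair, not a supported procedure.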
Re: JobGraphs not cleaned up in HA mode
About some implementation mechanisms: Flink uses Zookeeper to store the JobGraph (the job's description and metadata) as the basis for job recovery. However, previous implementations may leave this information behind, because it is deleted asynchronously by a background thread.

Thanks, vino.
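The failure mode described above (cleanup delegated to a background thread, then lost when the process dies) can be illustrated with a generic toy sketch. This is an analogy only, not Flink's actual code; the dictionary stands in for ZooKeeper and the unset event stands in for a JobManager that dies before its cleanup thread runs.

```python
import threading

# Toy illustration of why asynchronous deletion can leak state: the
# owner schedules the delete on a background thread, but if the process
# dies before the thread gets to run, the entry survives. Analogy for
# the behaviour described above, not Flink's implementation.

store = {"jobgraphs/83bfa359": "job metadata"}   # stands in for ZooKeeper
process_alive = threading.Event()                # never set: the JM "died"

def async_cleanup(key):
    # The real cleanup would delete the znode; here it waits for a
    # go-ahead that never comes, like a thread dying with its process.
    if process_alive.wait(timeout=0.1):
        store.pop(key, None)

t = threading.Thread(target=async_cleanup, args=("jobgraphs/83bfa359",))
t.start()
t.join()

print("jobgraphs/83bfa359" in store)  # -> True: the "znode" leaked
```

A new leader that later tries to recreate or recursively delete that entry then hits exactly the `NodeExists` / `Directory not empty` errors shown in the logs below.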
Re: JobGraphs not cleaned up in HA mode
Hi Encho,

This is a problem already known to the Flink community; you can track its progress through FLINK-10011 [1]. Currently Till is fixing this issue.

[1]: https://issues.apache.org/jira/browse/FLINK-10011

Thanks, vino.
JobGraphs not cleaned up in HA mode
I am running Flink 1.5.3 with two job managers and two task managers in Kubernetes, along with HDFS and Zookeeper in high-availability mode.

My problem occurs after the following actions:
- Upload a .jar file to jobmanager-1
- Run a streaming job from the jar on jobmanager-1
- Wait for 1 or 2 checkpoints to succeed
- Kill the pod of jobmanager-1; after a short delay, jobmanager-2 takes leadership and correctly restores the job and continues it
- Stop the job from jobmanager-2

At this point all seems well, but the problem is that jobmanager-2 does not clean up anything that was left behind by jobmanager-1. Job graphs remain in both HDFS and Zookeeper, and they later obstruct any work of both managers: after any restart they unsuccessfully try to restore a now non-existent job and fail over and over again.

I am quite certain that jobmanager-2 does not know about any of jobmanager-1's files, since the Zookeeper logs reveal that it tries to create duplicate job folders:

2018-08-27 13:11:00,038 [myid:] - INFO [ProcessThread(sid:0 cport:2181)::PrepRequestProcessor@648] - Got user-level KeeperException when processing sessionid:0x1657aa15e480033 type:create cxid:0x46 zxid:0x1ab txntype:-1 reqpath:n/a Error Path:/flink/default/jobgraphs/83bfa359ca59ce1d4635e18e16651e15/bbb259fd-7826-4950-bc7c-c2be23346c77 Error:KeeperErrorCode = NodeExists for /flink/default/jobgraphs/83bfa359ca59ce1d4635e18e16651e15/bbb259fd-7826-4950-bc7c-c2be23346c77

2018-08-27 13:11:02,296 [myid:] - INFO [ProcessThread(sid:0 cport:2181)::PrepRequestProcessor@648] - Got user-level KeeperException when processing sessionid:0x1657aa15e480033 type:create cxid:0x5c zxid:0x1ac txntype:-1 reqpath:n/a Error Path:/flink/default/checkpoint-counter/83bfa359ca59ce1d4635e18e16651e15 Error:KeeperErrorCode = NodeExists for /flink/default/checkpoint-counter/83bfa359ca59ce1d4635e18e16651e15

Also, jobmanager-2 attempts to delete the jobgraphs folder in Zookeeper when the job is stopped, but fails since there are leftover files in it from jobmanager-1:

2018-08-27 13:12:13,406 [myid:] - INFO [ProcessThread(sid:0 cport:2181)::PrepRequestProcessor@648] - Got user-level KeeperException when processing sessionid:0x1657aa15e480033 type:delete cxid:0xa8 zxid:0x1bd txntype:-1 reqpath:n/a Error Path:/flink/default/jobgraphs/83bfa359ca59ce1d4635e18e16651e15 Error:KeeperErrorCode = Directory not empty for /flink/default/jobgraphs/83bfa359ca59ce1d4635e18e16651e15

I've noticed that when restoring the job, jobmanager-2 seems to get nothing more than the job ID, while it perhaps needs some metadata? Here is the log that seems suspicious to me:

2018-08-27 13:09:18,113 INFO org.apache.flink.runtime.jobmanager.ZooKeeperSubmittedJobGraphStore - Recovered SubmittedJobGraph(83bfa359ca59ce1d4635e18e16651e15, null).

All other logs in jobmanager-2 seem fine; it doesn't seem to be aware that it's overwriting anything or failing to delete properly.

My question is: what is the intended way for the job managers to correctly exchange metadata in HA mode, and why is it not working for me?

Thanks in advance!
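When a cluster gets into this state, the ZooKeeper log itself tells you which job IDs are stuck. A minimal sketch for pulling them out of log lines like the ones quoted above (the regex and function name are my own; the path layout `/jobgraphs/<32-hex-char job id>` comes from those logs):

```python
import re

# Sketch: extract the orphaned job IDs from ZooKeeper's KeeperException
# log lines, such as the ones quoted in this thread. Assumes the
# /flink/<cluster-id>/jobgraphs/<job-id> layout visible in those logs.

JOBGRAPH_RE = re.compile(r"/jobgraphs/([0-9a-f]{32})")

def orphaned_job_ids(log_lines):
    """Return the sorted, de-duplicated job IDs seen in jobgraphs paths."""
    return sorted({m.group(1)
                   for line in log_lines
                   for m in JOBGRAPH_RE.finditer(line)})

log = [
    "Error:KeeperErrorCode = Directory not empty for "
    "/flink/default/jobgraphs/83bfa359ca59ce1d4635e18e16651e15",
]
print(orphaned_job_ids(log))  # -> ['83bfa359ca59ce1d4635e18e16651e15']
```

Each ID this turns up is a candidate for manual inspection under the jobgraphs and checkpoint-counter znodes and in the HA storage directory.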