[ https://issues.apache.org/jira/browse/KYLIN-4017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
nichunen updated KYLIN-4017:
----------------------------
    Fix Version/s:     (was: v3.0.0)
                   v3.0.0-beta

> Build engine get zk(zookeeper) lock failed when building job, it causes the whole build engine doesn't work.
> ------------------------------------------------------------------------------------------------------------
>
>                 Key: KYLIN-4017
>                 URL: https://issues.apache.org/jira/browse/KYLIN-4017
>             Project: Kylin
>          Issue Type: Bug
>          Components: Job Engine, Tools, Build and Test
>    Affects Versions: Future, v3.0.0, v3.0.0-alpha
>            Reporter: wangxiaojing
>            Priority: Critical
>              Labels: build
>             Fix For: v3.0.0-beta
>
>         Attachments: zkinstancestart.png
>
>
> Kylin throws an exception while acquiring the ZooKeeper (ZK) lock when it builds a job. Only a restart resolves the problem; otherwise no job can be built and the whole build engine stops working. The problem recurs one day after the restart. The log looks like this:
> {code:java}
> 2019-05-15 11:09:43,209 INFO [FetcherRunner 1910115020-57] threadpool.FetcherRunner:59 : CubingJob{id=878974c4-4c65-88a4-a912-b238fcc33bdc, name=BUILD CUBE - es_report_respnse_rate_cube - 20190513000000_20190514000000 - GMT+08:00 2019-05-15 11:03:15, state=READY} prepare to schedule and its priority is 20
> 2019-05-15 11:09:43,209 INFO [FetcherRunner 1910115020-57] threadpool.FetcherRunner:63 : CubingJob{id=878974c4-4c65-88a4-a912-b238fcc33bdc, name=BUILD CUBE - es_report_respnse_rate_cube - 20190513000000_20190514000000 - GMT+08:00 2019-05-15 11:03:15, state=READY} scheduled
> 2019-05-15 11:09:43,209 DEBUG [Scheduler 719764581 Job 878974c4-4c65-88a4-a912-b238fcc33bdc-132] zookeeper.ZookeeperDistributedLock:92 : 18...@bigdata-kylin-build01.gz01.diditaxi.com trying to lock /job_engine/lock/878974c4-4c65-88a4-a912-b238fcc33bdc
> 2019-05-15 11:09:43,212 ERROR [pool-12-thread-10] threadpool.DistributedScheduler:115 : unknown error execute job:878974c4-4c65-88a4-a912-b238fcc33bdc in server: 18...@bigdata-kylin-build01.gz01.diditaxi.com
> java.lang.IllegalStateException: Error while 18...@bigdata-kylin-build01.gz01.diditaxi.com trying to lock /job_engine/lock/878974c4-4c65-88a4-a912-b238fcc33bdc
>     at org.apache.kylin.job.lock.zookeeper.ZookeeperDistributedLock.lock(ZookeeperDistributedLock.java:99)
>     at org.apache.kylin.job.lock.zookeeper.ZookeeperJobLock.lock(ZookeeperJobLock.java:41)
>     at org.apache.kylin.job.impl.threadpool.DistributedScheduler$JobRunner.run(DistributedScheduler.java:105)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>     at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.IllegalStateException: instance must be started before calling this method
>     at org.apache.curator.shaded.com.google.common.base.Preconditions.checkState(Preconditions.java:176)
>     at org.apache.curator.framework.imps.CuratorFrameworkImpl.create(CuratorFrameworkImpl.java:351)
>     at org.apache.kylin.job.lock.zookeeper.ZookeeperDistributedLock.lock(ZookeeperDistributedLock.java:95)
>     ... 5 more
> {code}

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)