[ https://issues.apache.org/jira/browse/KYLIN-4017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16849583#comment-16849583 ]
ASF GitHub Bot commented on KYLIN-4017:
---------------------------------------

wangxiaojing123 commented on pull request #664: KYLIN-4017 Build engine get zk(zookeeper) lock failed when building job, it causes the whole build engine doesn't work
URL: https://github.com/apache/kylin/pull/664

```
【Type】:BUG
【Severity】:1-Blocker
【Module】:Build Engine
【Description】:The Kylin build engine occasionally fails to acquire the ZK lock and throws an exception. Once this happens, the build engine stops working and only a restart fixes it. The problem usually recurs about one day after the build engine starts.
【Design】:Make the cached curator instance never expire (unless the service stops), and check its state before using it (if it is closed, create a new curator instance and put it into the cache).
```

```
issue: https://issues.apache.org/jira/browse/KYLIN-4017
```

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

> Build engine get zk(zookeeper) lock failed when building job, it causes the whole build engine doesn't work.
> ------------------------------------------------------------------------------------------------------------
>
>                 Key: KYLIN-4017
>                 URL: https://issues.apache.org/jira/browse/KYLIN-4017
>             Project: Kylin
>          Issue Type: Bug
>          Components: Job Engine, Tools, Build and Test
>    Affects Versions: Future, v3.0.0, v3.0.0-alpha
>            Reporter: wangxiaojing
>            Priority: Critical
>              Labels: build
>             Fix For: Future, v3.0.0-alpha
>
>         Attachments: zkinstancestart.png
>
>
> Kylin throws an exception while acquiring the ZK lock when it is building a
> job. Only a restart resolves the problem; otherwise no job can be built and
> the whole build engine stops working. The problem recurs about one day after
> the restart.
> Log looks like below:
> {code:java}
> 2019-05-15 11:09:43,209 INFO [FetcherRunner 1910115020-57] threadpool.FetcherRunner:59 : CubingJob{id=878974c4-4c65-88a4-a912-b238fcc33bdc, name=BUILD CUBE - es_report_respnse_rate_cube - 20190513000000_20190514000000 - GMT+08:00 2019-05-15 11:03:15, state=READY} prepare to schedule and its priority is 20
> 2019-05-15 11:09:43,209 INFO [FetcherRunner 1910115020-57] threadpool.FetcherRunner:63 : CubingJob{id=878974c4-4c65-88a4-a912-b238fcc33bdc, name=BUILD CUBE - es_report_respnse_rate_cube - 20190513000000_20190514000000 - GMT+08:00 2019-05-15 11:03:15, state=READY} scheduled
> 2019-05-15 11:09:43,209 DEBUG [Scheduler 719764581 Job 878974c4-4c65-88a4-a912-b238fcc33bdc-132] zookeeper.ZookeeperDistributedLock:92 : 18...@bigdata-kylin-build01.gz01.diditaxi.com trying to lock /job_engine/lock/878974c4-4c65-88a4-a912-b238fcc33bdc
> 2019-05-15 11:09:43,212 ERROR [pool-12-thread-10] threadpool.DistributedScheduler:115 : unknown error execute job:878974c4-4c65-88a4-a912-b238fcc33bdc in server: 18...@bigdata-kylin-build01.gz01.diditaxi.com
> java.lang.IllegalStateException: Error while 18...@bigdata-kylin-build01.gz01.diditaxi.com trying to lock /job_engine/lock/878974c4-4c65-88a4-a912-b238fcc33bdc
>     at org.apache.kylin.job.lock.zookeeper.ZookeeperDistributedLock.lock(ZookeeperDistributedLock.java:99)
>     at org.apache.kylin.job.lock.zookeeper.ZookeeperJobLock.lock(ZookeeperJobLock.java:41)
>     at org.apache.kylin.job.impl.threadpool.DistributedScheduler$JobRunner.run(DistributedScheduler.java:105)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>     at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.IllegalStateException: instance must be started before calling this method
>     at org.apache.curator.shaded.com.google.common.base.Preconditions.checkState(Preconditions.java:176)
>     at org.apache.curator.framework.imps.CuratorFrameworkImpl.create(CuratorFrameworkImpl.java:351)
>     at org.apache.kylin.job.lock.zookeeper.ZookeeperDistributedLock.lock(ZookeeperDistributedLock.java:95)
>     ... 5 more
> {code}
>

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
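
For illustration only, the design described in the pull request comment (keep the cached curator instance for the lifetime of the service, check its state before use, and replace it if it is closed) might be sketched as below. This is not the code from pull request #664. `ZkClient`, `FakeZkClient` and `ZkClientCache` are hypothetical stand-ins so the sketch runs without ZooKeeper; with the real Curator library the state check would presumably be `client.getState() == CuratorFrameworkState.STARTED`.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

/** Hypothetical stand-in for a curator client; NOT the real CuratorFramework API. */
interface ZkClient {
    boolean isStarted();
    void close();
}

/** In-memory fake used only to make the sketch runnable without ZooKeeper. */
final class FakeZkClient implements ZkClient {
    private volatile boolean started = true;

    @Override
    public boolean isStarted() {
        return started;
    }

    @Override
    public void close() {
        started = false;
    }
}

/** Cache whose entries never expire; a closed client is replaced on the next lookup. */
final class ZkClientCache {
    private final ConcurrentMap<String, ZkClient> cache = new ConcurrentHashMap<>();
    private final Function<String, ZkClient> factory;

    ZkClientCache(Function<String, ZkClient> factory) {
        this.factory = factory;
    }

    ZkClient get(String connectString) {
        // compute() runs atomically per key, so two threads cannot both
        // replace the same closed client.
        return cache.compute(connectString, (key, existing) -> {
            if (existing != null && existing.isStarted()) {
                return existing;          // healthy client: reuse it
            }
            return factory.apply(key);    // missing or closed: create a new one
        });
    }
}
```

With such a check in place, a caller that asks the cache for a client after the previous one was closed would get a fresh, started instance rather than hitting "instance must be started before calling this method".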