[ 
https://issues.apache.org/jira/browse/KYLIN-4017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16849583#comment-16849583
 ] 

ASF GitHub Bot commented on KYLIN-4017:
---------------------------------------

wangxiaojing123 commented on pull request #664: KYLIN-4017 Build engine get 
zk(zookeeper) lock failed when building job, it causes the whole build engine 
doesn't work
URL: https://github.com/apache/kylin/pull/664
 
 
   ```
   【Type】:BUG 
   【Severity】:1-Blocker
   【Module】:Build Engine
   【Description】:The Kylin build engine occasionally fails to acquire the ZK 
lock and throws an exception. Once this happens, the build engine stops working 
and only a restart resolves it. The problem usually recurs about one day after 
the build engine starts.
   【Design】:Make the cached curator instance never expire (unless the service 
stops), and check its state before using it; if it is closed, create a new 
curator instance and put it into the cache.
   ```
   ```
   issue: https://issues.apache.org/jira/browse/KYLIN-4017
   ```
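
The design described above can be illustrated with a minimal sketch. This is not the code from the pull request: `Client`, `FakeClient`, `ClientCache` and the `zk-host:2181` connect string are hypothetical names invented for the example, and the sketch uses only the JDK so that it runs without a ZooKeeper ensemble. With a real Curator client, the `isStarted()` check would presumably map to comparing `CuratorFramework.getState()` against `CuratorFrameworkState.STARTED`.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

public class ClientCacheSketch {
    /** Hypothetical stand-in for a ZooKeeper client such as a curator instance. */
    interface Client {
        boolean isStarted();
        void close();
    }

    /** Trivial in-memory client, used only to exercise the cache. */
    static final class FakeClient implements Client {
        private volatile boolean started = true;
        public boolean isStarted() { return started; }
        public void close() { started = false; }
    }

    /** Cache without expiry: an entry is replaced only when it is found unusable. */
    static final class ClientCache {
        private final ConcurrentHashMap<String, Client> cache = new ConcurrentHashMap<>();
        private final Function<String, Client> factory;

        ClientCache(Function<String, Client> factory) {
            this.factory = factory;
        }

        Client get(String connectString) {
            // compute() is atomic per key, so a closed client is replaced exactly once
            // even if several job threads ask for it at the same time.
            return cache.compute(connectString, (key, existing) ->
                    (existing != null && existing.isStarted()) ? existing : factory.apply(key));
        }
    }

    public static void main(String[] args) {
        AtomicInteger created = new AtomicInteger();
        ClientCache cache = new ClientCache(key -> {
            created.incrementAndGet();
            return new FakeClient();
        });

        Client first = cache.get("zk-host:2181");
        Client again = cache.get("zk-host:2181");
        first.close();                                // simulate the instance being closed
        Client replaced = cache.get("zk-host:2181");

        System.out.println(first == again);      // true: a live instance is reused
        System.out.println(first != replaced);   // true: a closed instance is replaced
        System.out.println(created.get());       // 2
    }
}
```

Checking the state on every lookup costs one extra call per lock attempt, but it means a closed instance is never handed to the lock code, which is the failure shown in the stack trace of the issue.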
 


> Build engine get zk(zookeeper) lock failed when building job, it causes the 
> whole build engine doesn't work.
> ------------------------------------------------------------------------------------------------------------
>
>                 Key: KYLIN-4017
>                 URL: https://issues.apache.org/jira/browse/KYLIN-4017
>             Project: Kylin
>          Issue Type: Bug
>          Components: Job Engine, Tools, Build and Test
>    Affects Versions: Future, v3.0.0, v3.0.0-alpha
>            Reporter: wangxiaojing
>            Priority: Critical
>              Labels: build
>             Fix For: Future, v3.0.0-alpha
>
>         Attachments: zkinstancestart.png
>
>
> When Kylin builds a job, acquiring the ZK lock fails with an exception. Only a 
> restart resolves the problem; until then no job can be built and the whole 
> build engine stops working. The problem recurs about one day after a restart. 
> The log looks like this:
> {code:java}
> 2019-05-15 11:09:43,209 INFO [FetcherRunner 1910115020-57] 
> threadpool.FetcherRunner:59 : 
> CubingJob{id=878974c4-4c65-88a4-a912-b238fcc33bdc, name=BUILD CUBE - 
> es_report_respnse_rate_cube - 20190513000000_20190514000000 - GMT+08:00 
> 2019-05-15 11:03:15, state=READY} prepare to schedule and its priority is 20
> 2019-05-15 11:09:43,209 INFO [FetcherRunner 1910115020-57] 
> threadpool.FetcherRunner:63 : 
> CubingJob{id=878974c4-4c65-88a4-a912-b238fcc33bdc, name=BUILD CUBE - 
> es_report_respnse_rate_cube - 20190513000000_20190514000000 - GMT+08:00 
> 2019-05-15 11:03:15, state=READY} scheduled
> 2019-05-15 11:09:43,209 DEBUG [Scheduler 719764581 Job 
> 878974c4-4c65-88a4-a912-b238fcc33bdc-132] 
> zookeeper.ZookeeperDistributedLock:92 : 
> 18...@bigdata-kylin-build01.gz01.diditaxi.com trying to lock 
> /job_engine/lock/878974c4-4c65-88a4-a912-b238fcc33bdc
> 2019-05-15 11:09:43,212 ERROR [pool-12-thread-10] 
> threadpool.DistributedScheduler:115 : unknown error execute 
> job:878974c4-4c65-88a4-a912-b238fcc33bdc in server: 
> 18...@bigdata-kylin-build01.gz01.diditaxi.com
> java.lang.IllegalStateException: Error while 
> 18...@bigdata-kylin-build01.gz01.diditaxi.com trying to lock 
> /job_engine/lock/878974c4-4c65-88a4-a912-b238fcc33bdc
>  at 
> org.apache.kylin.job.lock.zookeeper.ZookeeperDistributedLock.lock(ZookeeperDistributedLock.java:99)
>  at 
> org.apache.kylin.job.lock.zookeeper.ZookeeperJobLock.lock(ZookeeperJobLock.java:41)
>  at 
> org.apache.kylin.job.impl.threadpool.DistributedScheduler$JobRunner.run(DistributedScheduler.java:105)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.IllegalStateException: instance must be started before 
> calling this method
>  at 
> org.apache.curator.shaded.com.google.common.base.Preconditions.checkState(Preconditions.java:176)
>  at 
> org.apache.curator.framework.imps.CuratorFrameworkImpl.create(CuratorFrameworkImpl.java:351)
>  at 
> org.apache.kylin.job.lock.zookeeper.ZookeeperDistributedLock.lock(ZookeeperDistributedLock.java:95)
>  ... 5 more{code}
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
