[ 
https://issues.apache.org/jira/browse/KYLIN-4165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17368010#comment-17368010
 ] 

Zhong Yanghong edited comment on KYLIN-4165 at 6/23/21, 10:06 AM:
------------------------------------------------------------------

Why do we need a distributed lock that spans two stages? That may introduce other issues.

For example, when the first step fails because the cube is disabled, the lock 
should be released. Currently the lock is released only when the job is 
discarded.

How about fixing it just in *SaveDictStep*?
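
To illustrate the idea of keeping the lock inside a single step, here is a minimal sketch, not Kylin code. It assumes the lock can be acquired and released within one step; `ReentrantLock` is only a single-JVM stand-in for a real distributed lock, and the names `StepScopedLockSketch`, `runLocked`, and `isLockHeld` are hypothetical. The point is that `finally` releases the lock even when the step fails, so nothing stays locked until the job is discarded.

```java
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Supplier;

// Hypothetical sketch: none of these names exist in Kylin.
final class StepScopedLockSketch {
    // Assumption: a single-JVM lock standing in for a per-cube distributed lock.
    private static final ReentrantLock CUBE_LOCK = new ReentrantLock();

    /** Runs the step's work while holding the lock and always releases it afterwards. */
    public static <T> T runLocked(Supplier<T> stepWork) {
        CUBE_LOCK.lock();
        try {
            return stepWork.get();
        } finally {
            // Released on success and on failure alike.
            CUBE_LOCK.unlock();
        }
    }

    /** Reports whether any thread currently holds the lock. */
    public static boolean isLockHeld() {
        return CUBE_LOCK.isLocked();
    }

    public static void main(String[] args) {
        String result = runLocked(() -> "saved");
        System.out.println(result + ", lock held: " + isLockHeld());
        try {
            // Simulates a step that fails, e.g. because the cube is disabled.
            runLocked(() -> { throw new IllegalStateException("cube is disabled"); });
        } catch (IllegalStateException e) {
            System.out.println("step failed: " + e.getMessage() + ", lock held: " + isLockHeld());
        }
    }
}
```

With this shape the lock never outlives the step that took it, which is the property the two-stage lock does not have today.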


was (Author: yaho):
Why do we need a distributed lock that spans two stages? That may introduce other issues.

How about fixing it just in *SaveDictStep*?

> RT OLAP building job on "Save Cube Dictionaries" step concurrency error
> -----------------------------------------------------------------------
>
>                 Key: KYLIN-4165
>                 URL: https://issues.apache.org/jira/browse/KYLIN-4165
>             Project: Kylin
>          Issue Type: Bug
>          Components: Real-time Streaming
>    Affects Versions: v3.0.0-alpha
>            Reporter: wangxiaojing
>            Priority: Major
>             Fix For: v3.0.0
>
>
> There is a dictionary version conflict in the "Save Cube Dictionaries" step 
> when building a real-time segment from remote persisted to ready. This is 
> serious: it causes dictionary updates to fail when multiple jobs run 
> concurrently. It may occur when a cube has many concurrent building jobs on 
> the same step, "Save Cube Dictionaries". 
> Perhaps a global distributed lock is needed to prevent this step from running 
> concurrently for the same cube.
> Save Cube Dictionaries log messages:
> {code:java}
> org.apache.kylin.common.persistence.WriteConflictException: Overwriting conflict /dict/DEFAULT.TASK_SNAPSHOT/GROUPVALUE/5387e747-9649-0b17-5a72-ee17f5baea0a.dict, expect old TS 1568012509090, but it is 1568012509245
>     at org.apache.kylin.storage.hbase.HBaseResourceStore.updateTimestampImpl(HBaseResourceStore.java:372)
>     at org.apache.kylin.common.persistence.ResourceStore$7.call(ResourceStore.java:465)
>     at org.apache.kylin.common.persistence.ExponentialBackoffRetry.doWithRetry(ExponentialBackoffRetry.java:52)
>     at org.apache.kylin.common.persistence.ResourceStore.updateTimestampWithRetry(ResourceStore.java:462)
>     at org.apache.kylin.common.persistence.ResourceStore.updateTimestampCheckPoint(ResourceStore.java:457)
>     at org.apache.kylin.common.persistence.ResourceStore.updateTimestamp(ResourceStore.java:452)
>     at org.apache.kylin.dict.DictionaryManager.updateExistingDictLastModifiedTime(DictionaryManager.java:197)
>     at org.apache.kylin.dict.DictionaryManager.trySaveNewDict(DictionaryManager.java:157)
>     at org.apache.kylin.engine.mr.streaming.SaveDictStep.doWork(SaveDictStep.java:122)
>     at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:179)
>     at org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:71)
>     at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:179)
>     at org.apache.kylin.job.impl.threadpool.DistributedScheduler$JobRunner.run(DistributedScheduler.java:110)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>     at java.lang.Thread.run(Thread.java:745)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
