[ 
https://issues.apache.org/jira/browse/KYLIN-4153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16925703#comment-16925703
 ] 

ASF subversion and git services commented on KYLIN-4153:
--------------------------------------------------------

Commit 7e117e27764dc94cd627b0bd3dc4f4bbbf7f4a3e in kylin's branch 
refs/heads/master from XiaoxiangYu
[ https://gitbox.apache.org/repos/asf?p=kylin.git;h=7e117e2 ]

KYLIN-4153 Delete marker if real file not exists
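
As a rough illustration of the idea in the commit title (not the actual patch), the sketch below models the metadata store and HDFS as in-memory maps. All names here (`StaleMarkerCleanup`, `markers`, `pushdownFiles`, `readBigResource`) are hypothetical and are not Kylin APIs. It assumes that a marker whose backing file is missing can safely be dropped and the resource reported as absent.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory model; none of these names are Kylin APIs.
class StaleMarkerCleanup {
    // Stands in for the metadata store (HBase/RDBMS): path -> marker bytes.
    final Map<String, byte[]> markers = new HashMap<>();
    // Stands in for HDFS: path -> real content of the big resource.
    final Map<String, byte[]> pushdownFiles = new HashMap<>();

    // Returns the resource content, or null if the resource does not exist.
    // A marker without a backing file is treated as stale and deleted.
    byte[] readBigResource(String resPath) {
        if (!markers.containsKey(resPath)) {
            return null; // no marker: the resource was never published
        }
        byte[] content = pushdownFiles.get(resPath);
        if (content == null) {
            markers.remove(resPath); // delete the dangling marker
            return null;
        }
        return content;
    }

    public static void main(String[] args) {
        StaleMarkerCleanup store = new StaleMarkerCleanup();
        store.markers.put("/dict/a.dict", new byte[0]); // marker only, no file
        System.out.println("content found: "
                + (store.readBigResource("/dict/a.dict") != null));
        System.out.println("marker still present: "
                + store.markers.containsKey("/dict/a.dict"));
    }
}
```

Reporting the resource as absent, rather than failing, lets a later build step write the dictionary again instead of hitting the same exception on every retry.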


> Failed to read big resource  /dict/xxxx at "Build Dimension Dictionary" Step
> ----------------------------------------------------------------------------
>
>                 Key: KYLIN-4153
>                 URL: https://issues.apache.org/jira/browse/KYLIN-4153
>             Project: Kylin
>          Issue Type: Bug
>          Components: Metadata
>    Affects Versions: v2.6.0
>            Reporter: Xiaoxiang Yu
>            Assignee: Xiaoxiang Yu
>            Priority: Major
>
> In *Kylin 2.6.0*, the Kylin team introduced an important refactoring of 
> Kylin's Metadata Store, which added many enhancements such as concurrent 
> metadata upload/download and storing metadata via JDBC. Please refer to 
> https://issues.apache.org/jira/browse/KYLIN-3671 for details.
>  
> When Kylin wants to save a *big resource* (such as a dict or snapshot) into 
> the metadata store, it does not write it into the metadata store (HBase or 
> RDBMS) directly. Instead, Kylin first {color:red}saves it into HDFS (Step 
> 1){color}, and then {color:red}writes an empty byte array as a marker into 
> the metadata store (Step 2){color}. If the first action succeeds and the 
> second action fails, a rollback method is called to revert the changes to 
> the HDFS files. Together, the two steps are meant to form a complete, 
> atomic transaction.
>  
> {color:#0747A6}Here is part of the source code added in KYLIN-3671.{color} 
> Check it at 
> https://github.com/apache/kylin/blob/8737bc1f555a2789a67462c8f8420b6ab3be97ce/core-common/src/main/java/org/apache/kylin/common/persistence/PushdownResourceStore.java#L58
>  . 
> {code:java}
> final void putBigResource(String resPath, ContentWriter content, long newTS)
>         throws IOException {
>     // pushdown the big resource to DFS file
>     // Step 1: write big resource into HDFS
>     RollbackablePushdown pushdown = writePushdown(resPath, content);
>     try {
>         // write a marker in resource store, to indicate the resource is now
>         // available
>         logger.debug("Writing marker for big resource {}", resPath);
>         // Step 2: write marker into HBase/RDBMS
>         putResourceWithRetry(resPath,
>                 ContentWriter.create(BytesUtil.EMPTY_BYTE_ARRAY), newTS);
>     } catch (Throwable ex) {
>         pushdown.rollback();
>         throw ex;
>     } finally {
>         pushdown.close();
>     }
> }
> {code}
>  
>  
>  
> But in some cases, both Step 1 and Step 2 succeed, yet an exception is still 
> thrown in Step 2 (for example, when the client call times out, as in the log 
> below). In that case {color:red}the rollback won't clear the marker written 
> in Step 2{color}, which breaks the atomicity of this put action and causes 
> the FileNotFoundException when Kylin later tries to read that dict.
>  
>  
>  
> {color:#0747A6}Here is the part of the reporter's kylin.log that shows the 
> incomplete rollback.{color}
>  
>       
> {noformat}
>  2019-08-29 05:13:51,237 INFO  [Scheduler 169045403 Job 
> ca4a4a08-54e2-b922-70bb-2aa2bf58709f-492] dict.DictionaryManager:388 : Saving 
> dictionary at 
> /dict/KYLIN_VIEW.USER_SECRET_TABLE/COUNTRY/66292068-e8eb-975a-3e44-b56c933c14cc.dict
> 2019-08-29 05:13:51,238 DEBUG [Scheduler 169045403 Job 
> ca4a4a08-54e2-b922-70bb-2aa2bf58709f-492] persistence.HDFSResourceStore:98 : 
> Writing pushdown file 
> /kylin/kylin_metadata/resources/dict/KYLIN_VIEW.USER_SECRET_TABLE/COUNTRY/66292068-e8eb-975a-3e44-b56c933c14cc.dict.temp.-1798610090
> 2019-08-29 05:13:51,256 DEBUG [Scheduler 169045403 Job 
> ca4a4a08-54e2-b922-70bb-2aa2bf58709f-492] persistence.HDFSResourceStore:117 : 
> Move 
> /kylin/kylin_metadata/resources/dict/KYLIN_VIEW.USER_SECRET_TABLE/COUNTRY/66292068-e8eb-975a-3e44-b56c933c14cc.dict.temp.-1798610090
>  to 
> /kylin/kylin_metadata/resources/dict/KYLIN_VIEW.USER_SECRET_TABLE/COUNTRY/66292068-e8eb-975a-3e44-b56c933c14cc.dict
> 2019-08-29 05:13:51,258 DEBUG [Scheduler 169045403 Job 
> ca4a4a08-54e2-b922-70bb-2aa2bf58709f-492] persistence.HDFSResourceStore:65 : 
> Writing marker for big resource 
> /dict/KYLIN_VIEW.USER_SECRET_TABLE/COUNTRY/66292068-e8eb-975a-3e44-b56c933c14cc.dict
> 2019-08-29 05:13:56,263 WARN  
> [hconnection-0x56f3258e-shared--pool10944-t54867] client.AsyncProcess:1263 : 
> #10545, table=kylin_metadata, attempt=1/1 failed=1ops, last exception: 
> java.io.IOException: Call to tx-dn41.data/10.14.243.51:60020 failed on local 
> exception: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call id=2662317, 
> waitTime=5001, operationTimeout=5000 expired. on 
> tx-dn41.data,60020,1565943919204, tracking started Thu Aug 29 05:13:51 
> GMT+08:00 2019; not retrying 1 - final failure
> 2019-08-29 05:13:56,266 ERROR [Scheduler 169045403 Job 
> ca4a4a08-54e2-b922-70bb-2aa2bf58709f-492] persistence.HDFSResourceStore:134 : 
> Rollback 
> /kylin/kylin_metadata/resources/dict/KYLIN_VIEW.USER_SECRET_TABLE/COUNTRY/66292068-e8eb-975a-3e44-b56c933c14cc.dict
>  from <empty>
> 2019-08-29 05:13:56,274 ERROR [Scheduler 169045403 Job 
> ca4a4a08-54e2-b922-70bb-2aa2bf58709f-492] common.HadoopShellExecutable:65 : 
> error execute 
> HadoopShellExecutable{id=ca4a4a08-54e2-b922-70bb-2aa2bf58709f-03, name=Build 
> Dimension Dictionary, state=RUNNING}
> 2019-08-29 05:13:56,274 INFO  [Scheduler 169045403 Job 
> ca4a4a08-54e2-b922-70bb-2aa2bf58709f-492] execution.AbstractExecutable:162 : 
> Retry 1
> {noformat}
>  
>  
>  
>  
> {color:#0747A6}Here is the part of the reporter's kylin.log that shows the 
> "Build Dimension Dictionary" step reading a dict that does not exist in 
> HDFS.{color} 
>        
> {noformat}
> 2019-08-29 14:54:59,602 INFO  [Scheduler 343338459 Job 
> af4b847d-afa6-3729-4c19-03a5db08447b-498] steps.CreateDictionaryJob:110 : 
> DictionaryProvider read dict from file: 
> hdfs://CDH-cluster-main/kylin/kylin_metadata/kylin-af4b847d-afa6-3729-4c19-03a5db08447b/209_new_device/fact_distinct_columns/USER_SECRET_TABLE.COUNTRY/COUNTRY.rldict-r-00004
> 2019-08-29 14:54:59,602 DEBUG [Scheduler 343338459 Job 
> af4b847d-afa6-3729-4c19-03a5db08447b-498] cli.DictionaryGeneratorCLI:73 : 
> Dict for 'COUNTRY' has already been built, save it
> 2019-08-29 14:54:59,720 ERROR [Scheduler 343338459 Job 
> af4b847d-afa6-3729-4c19-03a5db08447b-498] persistence.ResourceStore:233 : 
> Error reading resource 
> /dict/KYLIN_VIEW.USER_SECRET_TABLE/COUNTRY/66292068-e8eb-975a-3e44-b56c933c14cc.dict
> java.io.IOException: Failed to read big resource 
> /dict/KYLIN_VIEW.USER_SECRET_TABLE/COUNTRY/66292068-e8eb-975a-3e44-b56c933c14cc.dict
>        at 
> org.apache.kylin.common.persistence.PushdownResourceStore.openPushdown(PushdownResourceStore.java:176)
>        at 
> org.apache.kylin.storage.hbase.HBaseResourceStore.getInputStream(HBaseResourceStore.java:256)
>        at 
> org.apache.kylin.storage.hbase.HBaseResourceStore.rawResource(HBaseResourceStore.java:226)
>        at 
> org.apache.kylin.storage.hbase.HBaseResourceStore.access$000(HBaseResourceStore.java:64)
>        at 
> org.apache.kylin.storage.hbase.HBaseResourceStore$1.visit(HBaseResourceStore.java:159)
>        at 
> org.apache.kylin.storage.hbase.HBaseResourceStore.visitFolder(HBaseResourceStore.java:204)
>        at 
> org.apache.kylin.storage.hbase.HBaseResourceStore.visitFolderImpl(HBaseResourceStore.java:152)
>        at 
> org.apache.kylin.common.persistence.ResourceStore.visitFolderInner(ResourceStore.java:689)
>        at 
> org.apache.kylin.common.persistence.ResourceStore.visitFolderAndContent(ResourceStore.java:675)
>        at 
> org.apache.kylin.common.persistence.ResourceStore$2.call(ResourceStore.java:224)
>        at 
> org.apache.kylin.common.persistence.ResourceStore$2.call(ResourceStore.java:220)
>        at 
> org.apache.kylin.common.persistence.ExponentialBackoffRetry.doWithRetry(ExponentialBackoffRetry.java:52)
>        at 
> org.apache.kylin.common.persistence.ResourceStore.getAllResources(ResourceStore.java:220)
>        at 
> org.apache.kylin.common.persistence.ResourceStore.getAllResources(ResourceStore.java:209)
>        at 
> org.apache.kylin.dict.DictionaryManager.checkDupByInfo(DictionaryManager.java:334)
>        at 
> org.apache.kylin.dict.DictionaryManager.saveDictionary(DictionaryManager.java:314)
>        at 
> org.apache.kylin.cube.CubeManager$DictionaryAssist.saveDictionary(CubeManager.java:1127)
>        at 
> org.apache.kylin.cube.CubeManager.saveDictionary(CubeManager.java:1089)
>        at 
> org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:74)
>        at 
> org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:55)
>        at 
> org.apache.kylin.engine.mr.steps.CreateDictionaryJob.run(CreateDictionaryJob.java:73)
>        at org.apache.kylin.engine.mr.MRUtil.runMRJob(MRUtil.java:93)
>        at 
> org.apache.kylin.engine.mr.common.HadoopShellExecutable.doWork(HadoopShellExecutable.java:63)
>        at 
> org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:167)
>        at 
> org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:71)
>        at 
> org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:167)
>        at 
> org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:114)
>        at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>        at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>        at java.lang.Thread.run(Thread.java:745)
> Caused by: java.io.FileNotFoundException: 
> /kylin/kylin_metadata/resources/dict/KYLIN_VIEW.USER_SECRET_TABLE/COUNTRY/66292068-e8eb-975a-3e44-b56c933c14cc.dict
>   (FS: DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_-784809092_27, 
> ugi=kylin (auth:SIMPLE)]])
>        at 
> org.apache.kylin.common.persistence.PushdownResourceStore.openPushdown(PushdownResourceStore.java:173)
>        ... 29 more
> {noformat}
>  
> This often happens in Build Step 4: {color:#0747A6}Build Dimension 
> Dictionary{color}. The incomplete metadata entry then causes the same 
> failure (*_FileNotFoundException_*) in {color:#DE350B}*ALL*{color} subsequent 
> cube rebuild jobs.
>  
> As far as I can see, the *{color:#0747A6}workaround{color}* is to delete 
> that marker. Since it is a broken metadata entry, deleting it does no 
> harm. After the deletion, subsequent rebuild jobs will succeed.
>  
> Here are some related reports from the mailing lists:
> 1. 
> http://apache-kylin.74782.x6.nabble.com/How-to-repair-the-cube-that-it-lost-someone-dictionary-td12989.html
> 2. 
> http://mail-archives.apache.org/mod_mbox/kylin-user/201908.mbox/%3c4bcca64e.4af8.16cdb473a62.coremail.itzhangqi...@163.com%3e
>  
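
The failure mode described in the quoted report can be reproduced with a small in-memory sketch. Everything in it is hypothetical (`DanglingMarkerDemo`, its `putBigResource`, the `cleanMarkerOnRollback` flag, and the simulated timeout) and is not Kylin code. It assumes the marker write is applied by the metadata store while the client still sees an exception, which is what the report describes.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical simulation of the two-step write; not Kylin code.
class DanglingMarkerDemo {
    final Map<String, byte[]> hdfs = new HashMap<>();     // stands in for HDFS
    final Map<String, byte[]> metadata = new HashMap<>(); // stands in for HBase/RDBMS

    // Simulated Step 2: the marker is stored, then the client call "times out".
    void putMarkerThenTimeout(String resPath) throws IOException {
        metadata.put(resPath, new byte[0]);
        throw new IOException("simulated call timeout after the write was applied");
    }

    void putBigResource(String resPath, byte[] content, boolean cleanMarkerOnRollback)
            throws IOException {
        hdfs.put(resPath, content);        // Step 1: write the big resource
        try {
            putMarkerThenTimeout(resPath); // Step 2: write the marker
        } catch (IOException ex) {
            hdfs.remove(resPath);          // rollback of Step 1
            if (cleanMarkerOnRollback) {
                metadata.remove(resPath);  // compensating delete for Step 2
            }
            throw ex;
        }
    }

    // True when a marker exists but the file it points to does not.
    boolean hasDanglingMarker(String resPath) {
        return metadata.containsKey(resPath) && !hdfs.containsKey(resPath);
    }

    public static void main(String[] args) {
        for (boolean clean : new boolean[] {false, true}) {
            DanglingMarkerDemo demo = new DanglingMarkerDemo();
            try {
                demo.putBigResource("/dict/a.dict", new byte[] {1}, clean);
            } catch (IOException expected) {
                // the simulated timeout always surfaces to the caller
            }
            System.out.println("cleanMarkerOnRollback=" + clean
                    + " danglingMarker=" + demo.hasDanglingMarker("/dict/a.dict"));
        }
    }
}
```

Note that the compensating delete in this sketch is only appropriate when no earlier version of the resource existed, as in the reported log where the rollback restores from `<empty>`.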



--
This message was sent by Atlassian Jira
(v8.3.2#803003)
