[ https://issues.apache.org/jira/browse/KYLIN-4339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17014390#comment-17014390 ]
weibin0516 commented on KYLIN-4339:
-----------------------------------

This problem only appears in a cluster environment; it does not occur with the Docker trial.

> Extract Fact Table Distinct Columns fail due to no kylin installed on worker node
> ---------------------------------------------------------------------------------
>
>                 Key: KYLIN-4339
>                 URL: https://issues.apache.org/jira/browse/KYLIN-4339
>             Project: Kylin
>          Issue Type: Bug
>    Affects Versions: v3.0.0
>            Reporter: weibin0516
>            Assignee: weibin0516
>            Priority: Major
>
> After setting kylin.engine.spark-fact-distinct to true, building a cube with the Spark engine fails with the following error:
> {code:java}
> 2020-01-13 22:19:23 INFO BlockManagerMaster:54 - BlockManagerMaster stopped
> 2020-01-13 22:19:23 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint:54 - OutputCommitCoordinator stopped!
> 2020-01-13 22:19:23 INFO SparkContext:54 - Successfully stopped SparkContext
> Exception in thread "main" java.lang.RuntimeException: error execute org.apache.kylin.engine.spark.SparkFactDistinct. Root cause: Job aborted due to stage failure: Task 9 in stage 1.0 failed 4 times, most recent failure: Lost task 9.3 in stage 1.0 (TID 32, sql-gateway-eu95-17.gz00c.test.alipay.net, executor 7): org.apache.kylin.common.KylinConfigCannotInitException: Didn't find KYLIN_CONF or KYLIN_HOME, please set one of them
>     at org.apache.kylin.common.KylinConfig.getSitePropertiesFile(KylinConfig.java:336)
>     at org.apache.kylin.common.KylinConfig.buildSiteOrderedProps(KylinConfig.java:378)
>     at org.apache.kylin.common.KylinConfig.buildSiteProperties(KylinConfig.java:358)
>     at org.apache.kylin.common.KylinConfig.getInstanceFromEnv(KylinConfig.java:137)
>     at org.apache.kylin.dict.CacheDictionary.enableCache(CacheDictionary.java:105)
>     at org.apache.kylin.dict.TrieDictionaryForest.initForestCache(TrieDictionaryForest.java:394)
>     at org.apache.kylin.dict.TrieDictionaryForest.init(TrieDictionaryForest.java:77)
>     at org.apache.kylin.dict.TrieDictionaryForest.<init>(TrieDictionaryForest.java:67)
>     at org.apache.kylin.dict.TrieDictionaryForestBuilder.build(TrieDictionaryForestBuilder.java:114)
>     at org.apache.kylin.dict.DictionaryGenerator$NumberTrieDictForestBuilder.build(DictionaryGenerator.java:312)
>     at org.apache.kylin.engine.spark.SparkFactDistinct$MultiOutputFunction.call(SparkFactDistinct.java:774)
>     at org.apache.kylin.engine.spark.SparkFactDistinct$MultiOutputFunction.call(SparkFactDistinct.java:650)
>     at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:186)
>     at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:186)
>     at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:800)
>     at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:800)
>     at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
>     at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
>     at org.apache.spark.scheduler.Task.run(Task.scala:109)
>     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>     at java.lang.Thread.run(Thread.java:756)
> {code}
> To fix this, we should ship kylin.properties into the execution environment of the Spark application (via --files).

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
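
For illustration, the fix proposed in the issue description (shipping kylin.properties to the Spark application via --files) might be sketched as below. This is only a sketch under assumptions, not the change made in Kylin: the helper name `build_spark_submit_args`, the config directory, the jar path, and the main class are all hypothetical placeholders. Only the `spark-submit` options `--class` and `--files` are real. Whether the executors then find the shipped file depends on how Kylin resolves its configuration, which the issue does not describe.

```python
import shlex
from pathlib import PurePosixPath


def build_spark_submit_args(kylin_conf_dir, app_jar, main_class, app_args=()):
    """Build a spark-submit argument list that ships kylin.properties.

    Hypothetical helper for illustration only; it is not part of Kylin.
    """
    # Assumption: kylin.properties lives directly inside the given conf dir.
    props = PurePosixPath(kylin_conf_dir) / "kylin.properties"
    return [
        "spark-submit",
        "--class", main_class,
        # --files copies the listed files into each executor's working directory.
        "--files", str(props),
        # spark-submit options must come before the application jar.
        app_jar,
        *app_args,
    ]


if __name__ == "__main__":
    # All paths and names below are placeholders, not values from the issue.
    args = build_spark_submit_args(
        "/path/to/kylin/conf", "/path/to/job.jar", "com.example.Main"
    )
    print(" ".join(shlex.quote(a) for a in args))
```

Building the command as a list rather than a single string avoids quoting problems if any path contains spaces.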