Hi Experts, any help on this will be appreciated. We are building a cube and the Fact Distinct Columns job step fails with the error below:
2022-06-03 07:41:22,088 ERROR [IPC Server handler 7 on 38377] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Task: attempt_1654084270426_0161_r_000000_0 - exited :
org.apache.kylin.common.exceptions.TooBigDictionaryException: Too big dictionary, dictionary cannot be bigger than 2GB
    at org.apache.kylin.dict.TrieDictionaryForestBuilder.checkDictSize(TrieDictionaryForestBuilder.java:143)
    at org.apache.kylin.dict.TrieDictionaryForestBuilder.addTree(TrieDictionaryForestBuilder.java:132)
    at org.apache.kylin.dict.TrieDictionaryForestBuilder.build(TrieDictionaryForestBuilder.java:104)
    at org.apache.kylin.dict.DictionaryGenerator$NumberTrieDictForestBuilder.build(DictionaryGenerator.java:219)
    at org.apache.kylin.engine.mr.steps.FactDistinctColumnsReducer.doCleanup(FactDistinctColumnsReducer.java:203)
    at org.apache.kylin.engine.mr.KylinReducer.cleanup(KylinReducer.java:96)
    at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:179)
    at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:628)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:390)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:174)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:168)

Please help; we are stuck on this issue. Kylin version: 3.1.3, Hadoop: 3.1.0.

In the code, I raised the default dictionary size limit from 2 GB to 4 GB, and after this change we get the error below:

Failure task diagnostics:
Error: java.lang.NegativeArraySizeException
    at org.apache.commons.io.output.ByteArrayOutputStream.toByteArray(ByteArrayOutputStream.java:366)
    at org.apache.kylin.engine.mr.steps.FactDistinctColumnsReducer.outputDict(FactDistinctColumnsReducer.java:235)
    at org.apache.kylin.engine.mr.steps.FactDistinctColumnsReducer.doCleanup(FactDistinctColumnsReducer.java:204)
    at org.apache.kylin.engine.mr.KylinReducer.cleanup(KylinReducer.java:96)
    at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:179)
    at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:628)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:390)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:174)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:168)
    at org.apache.kylin.engine.mr.common.MapReduceExecutable.doWork(MapReduceExecutable.java:226)
    at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:172)
    at org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:62)
    at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:172)
    at org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:106)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
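For context on this second failure, our understanding is that it is a JVM limit rather than a Kylin setting: the serialized dictionary must fit in a single byte[], Java arrays are indexed by int, and commons-io's ByteArrayOutputStream keeps its size in an int, so once the dictionary passes Integer.MAX_VALUE (about 2.14 GB) the count wraps negative and toByteArray() fails on new byte[count]. A minimal sketch of the wrap-around (the 3 GB figure is illustrative, not measured from our job):

    public class DictSizeOverflowSketch {
        public static void main(String[] args) {
            // A dictionary somewhere between 2 GB and 4 GB, as after our patch.
            long dictBytes = 3L * 1024 * 1024 * 1024; // 3,221,225,472 bytes
            // A stream that tracks its size in an int ends up with a wrapped count.
            int count = (int) dictBytes;
            System.out.println(count); // prints -1073741824
            // toByteArray() effectively does this, hence NegativeArraySizeException:
            byte[] serialized = new byte[count];
            System.out.println(serialized.length); // never reached
        }
    }

If that reading is correct, simply raising the constant checked in TrieDictionaryForestBuilder.checkDictSize cannot get a single dictionary past 2 GB on the JVM.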
Also, we tried this step with the Spark engine and got stuck on the error below (the container path at the start is truncated as pasted):

26_0141/container_e08_1654084270426_0141_01_000010/__app__.jar!/kylin-defaults.properties
2022-06-02 17:48:19,971 WARN common.KylinConfigBase: KYLIN_HOME was not set
2022-06-02 17:48:19,974 ERROR executor.Executor: Exception in task 0.0 in stage 1.0 (TID 226)
org.apache.kylin.common.KylinConfigCannotInitException: Didn't find QUBZ_CONF or QUBZ_HOME, please set one of them
    at org.apache.kylin.common.KylinConfig.getSitePropertiesFile(KylinConfig.java:341)
    at org.apache.kylin.common.KylinConfig.buildSiteOrderedProps(KylinConfig.java:383)
    at org.apache.kylin.common.KylinConfig.buildSiteProperties(KylinConfig.java:363)
    at org.apache.kylin.common.KylinConfig.getInstanceFromEnv(KylinConfig.java:142)
    at org.apache.kylin.dict.TrieDictionaryBuilder.buildTrieBytes(TrieDictionaryBuilder.java:430)
    at org.apache.kylin.dict.TrieDictionaryBuilder.build(TrieDictionaryBuilder.java:415)
    at org.apache.kylin.dict.TrieDictionaryForestBuilder.addValue(TrieDictionaryForestBuilder.java:95)
    at org.apache.kylin.dict.TrieDictionaryForestBuilder.addValue(TrieDictionaryForestBuilder.java:72)
    at org.apache.kylin.dict.DictionaryGenerator$NumberTrieDictForestBuilder.addValue(DictionaryGenerator.java:213)
    at org.apache.kylin.engine.spark.SparkFactDistinct$MultiOutputFunction.calculateColData(SparkFactDistinct.java:823)
    at org.apache.kylin.engine.spark.SparkFactDistinct$MultiOutputFunction.call(SparkFactDistinct.java:758)
    at org.apache.kylin.engine.spark.SparkFactDistinct$MultiOutputFunction.call(SparkFactDistinct.java:642)
    at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:186)
    at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:186)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:823)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:823)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
    at org.apache.spark.scheduler.Task.run(Task.scala:123)
    at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:411)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:417)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
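From the WARN line, the executors appear to fall back to the kylin-defaults.properties packaged in the job jar because neither variable is set in the executor JVMs. Below is a sketch of the kind of kylin.properties setting we understand can pass them through, using Spark's documented spark.executorEnv.<NAME> mechanism via Kylin's kylin.engine.spark-conf.* pass-through; the paths are placeholders, and since the message asks for QUBZ_CONF or QUBZ_HOME, our (apparently rebranded) build may read the QUBZ_* names rather than stock KYLIN_*:

    # Sketch only -- placeholder paths; adjust to the actual install.
    kylin.engine.spark-conf.spark.executorEnv.KYLIN_HOME=/usr/local/kylin
    kylin.engine.spark-conf.spark.executorEnv.KYLIN_CONF=/usr/local/kylin/conf
    # The error mentions QUBZ_* names, so these may be the ones our build reads:
    kylin.engine.spark-conf.spark.executorEnv.QUBZ_HOME=/usr/local/kylin
    kylin.engine.spark-conf.spark.executorEnv.QUBZ_CONF=/usr/local/kylin/conf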
You can reach me at:
Mobile: 7092292112
Email: [email protected]

With regards,
Sonu Kumar Singh