Xiaoxiang Yu created KYLIN-5067:
-----------------------------------

Summary: CubeBuildJob build unnecessary snapshot
Key: KYLIN-5067
URL: https://issues.apache.org/jira/browse/KYLIN-5067
Project: Kylin
Issue Type: Bug
Affects Versions: v4.0.0-beta
Reporter: Xiaoxiang Yu
Fix For: v4.1.0

In the TPC-H benchmark, query 13 contains a 'left outer join' whose right table's join key (o_custkey) is not unique. This causes the build job to fail with the following exception.

{code:java}
java.lang.RuntimeException: Error execute org.apache.kylin.engine.spark.job.CubeBuildJob
    at org.apache.kylin.engine.spark.application.SparkApplication.execute(SparkApplication.java:96)
    at org.apache.spark.application.JobWorker$$anon$2.run(JobWorker.scala:55)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.IllegalStateException: Failed to build lookup table V_ORDERS snapshot for Dup key found, key= O_CUSTKEY
    at org.apache.kylin.engine.spark.builder.CubeSnapshotBuilder$$anonfun$checkDupKey$1.apply(CubeSnapshotBuilder.scala:198)
    at org.apache.kylin.engine.spark.builder.CubeSnapshotBuilder$$anonfun$checkDupKey$1.apply(CubeSnapshotBuilder.scala:190)
    at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
    at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
    at org.apache.kylin.engine.spark.builder.CubeSnapshotBuilder.checkDupKey(CubeSnapshotBuilder.scala:189)
    at org.apache.kylin.engine.spark.job.ParentSourceChooser.decideFlatTableSource(ParentSourceChooser.scala:83)
    at org.apache.kylin.engine.spark.job.ParentSourceChooser$$anonfun$decideSources$1.apply(ParentSourceChooser.scala:71)
    at org.apache.kylin.engine.spark.job.ParentSourceChooser$$anonfun$decideSources$1.apply(ParentSourceChooser.scala:66)
    at scala.collection.Iterator$class.foreach(Iterator.scala:893)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
    at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
    at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
    at org.apache.kylin.engine.spark.job.ParentSourceChooser.decideSources(ParentSourceChooser.scala:66)
    at org.apache.kylin.engine.spark.job.CubeBuildJob.doExecute(CubeBuildJob.java:178)
    at org.apache.kylin.engine.spark.application.SparkApplication.execute(SparkApplication.java:304)
    at org.apache.kylin.engine.spark.application.SparkApplication.execute(SparkApplication.java:93)
    ... 4 more
{code}

--
This message was sent by Atlassian Jira (v8.3.4#803005)