[jira] [Updated] (KYLIN-5187) Support Alluxio Local Cache + Soft Affinity to speed up the query performance on the cloud
[ https://issues.apache.org/jira/browse/KYLIN-5187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhichao Zhang updated KYLIN-5187: -- Description: Support Alluxio Local Cache + Soft Affinity to speed up the query performance on the cloud. Currently, this feature is only supported on Spark 3.1. Refer to: [Presto RaptorX|https://prestodb.io/blog/2021/02/04/raptorx] was: Support Alluxio Local Cache + Soft Affinity to speed up the query performance on the cloud. Refer to: [Presto RaptorX|https://prestodb.io/blog/2021/02/04/raptorx] > Support Alluxio Local Cache + Soft Affinity to speed up the query performance > on the cloud > -- > > Key: KYLIN-5187 > URL: https://issues.apache.org/jira/browse/KYLIN-5187 > Project: Kylin > Issue Type: New Feature > Components: Query Engine >Affects Versions: v4.0.1 >Reporter: Zhichao Zhang >Assignee: Zhichao Zhang >Priority: Major > Fix For: v4.1.0 > > > Support Alluxio Local Cache + Soft Affinity to speed up the query performance > on the cloud. > Currently, this feature is only supported on Spark 3.1. > Refer to: [Presto RaptorX|https://prestodb.io/blog/2021/02/04/raptorx] -- This message was sent by Atlassian Jira (v8.20.7#820007)
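A minimal sketch of what enabling the Alluxio client-side local cache for the Sparder (query) Spark session might look like in kylin.properties, assuming the Alluxio 2.x client jar is already on the classpath. The `alluxio.user.client.cache.*` names follow the Alluxio 2.x client docs; passing them as JVM system properties, and the cache directory and size values, are illustrative assumptions rather than the final design:

```properties
# Hedged sketch: pass Alluxio local-cache settings to the query executors as
# JVM system properties (directory and size values are assumptions).
kylin.query.spark-conf.spark.executor.extraJavaOptions=\
  -Dalluxio.user.client.cache.enabled=true \
  -Dalluxio.user.client.cache.dirs=/tmp/alluxio_cache \
  -Dalluxio.user.client.cache.size=10GB
```

Soft affinity (routing splits to the executor that already cached them) additionally depends on scheduler hooks available in Spark 3.1, which is why the feature is limited to that version.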
[jira] [Created] (KYLIN-5187) Support Alluxio Local Cache + Soft Affinity to speed up the query performance on the cloud
Zhichao Zhang created KYLIN-5187: - Summary: Support Alluxio Local Cache + Soft Affinity to speed up the query performance on the cloud Key: KYLIN-5187 URL: https://issues.apache.org/jira/browse/KYLIN-5187 Project: Kylin Issue Type: New Feature Components: Query Engine Affects Versions: v4.0.1 Reporter: Zhichao Zhang Assignee: Zhichao Zhang Fix For: v4.1.0 Support Alluxio Local Cache + Soft Affinity to speed up the query performance on the cloud. Refer to: [Presto RaptorX|https://prestodb.io/blog/2021/02/04/raptorx] -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Commented] (KYLIN-5057) CubeBuildJob in Kylin 4.0 fails when Spark 3.1 AQE is enabled
[ https://issues.apache.org/jira/browse/KYLIN-5057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17395861#comment-17395861 ] Zhichao Zhang commented on KYLIN-5057: --- Thanks [~tianhui5], I have reproduced this issue and will check it ASAP > CubeBuildJob in Kylin 4.0 fails when Spark 3.1 AQE is enabled > -- > > Key: KYLIN-5057 > URL: https://issues.apache.org/jira/browse/KYLIN-5057 > Project: Kylin > Issue Type: Bug >Reporter: tianhui >Priority: Major > Attachments: errorStack.log > > > When I use the standalone docker image with my own Kylin tar.gz, I add a > configuration to kylin.properties. > `kylin.engine.spark-conf.spark.sql.adaptive.enabled=true` > Then I build the sample cube and it fails. I can see the error stack in the Spark > driver log. -- This message was sent by Atlassian Jira (v8.3.4#803005)
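For anyone hitting this on Kylin 4.0 before a fix lands, a workaround sketch is to leave AQE off for build jobs in kylin.properties. The key is the one quoted in the report, and `false` matches Spark 3.1's default:

```properties
# Workaround sketch: keep Adaptive Query Execution disabled for cube builds
# until KYLIN-5057 is resolved (false is Spark 3.1's default).
kylin.engine.spark-conf.spark.sql.adaptive.enabled=false
```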
[jira] [Updated] (KYLIN-5058) Throws ConcurrentModificationException when building cube
[ https://issues.apache.org/jira/browse/KYLIN-5058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhichao Zhang updated KYLIN-5058: -- Description: When building cubes, there is an error 'ConcurrentModificationException' thrown by Spark, but it does not impact the building job. This is a known issue and was fixed in Spark 3.1.2. https://issues.apache.org/jira/browse/SPARK-34731 Fixed PR: [https://github.com/apache/spark/pull/31826] So the Spark 3.x version needs to be upgraded to 3.1.2. was: When building cubes, there is an error 'ConcurrentModificationException' thrown by Spark, but it does not impact the building job. This is a known issue and was fixed in Spark 3.1.2. So the Spark 3.x version needs to be upgraded to 3.1.2. > Throws ConcurrentModificationException when building cube > - > > Key: KYLIN-5058 > URL: https://issues.apache.org/jira/browse/KYLIN-5058 > Project: Kylin > Issue Type: Bug > Components: Spark Engine >Affects Versions: v4.0.0-beta >Reporter: Zhichao Zhang >Assignee: Zhichao Zhang >Priority: Minor > Fix For: v4.1.0 > > > When building cubes, there is an error 'ConcurrentModificationException' > thrown by Spark, but it does not impact the building job. > This is a known issue and was fixed in Spark 3.1.2. > https://issues.apache.org/jira/browse/SPARK-34731 > Fixed PR: [https://github.com/apache/spark/pull/31826] > So the Spark 3.x version needs to be upgraded to 3.1.2. -- This message was sent by Atlassian Jira (v8.3.4#803005)
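The proposed fix is a dependency bump rather than a code change; a hedged sketch of what it might look like in the Maven build, where the `spark.version` property name is an assumption about Kylin's pom.xml and not verified against the repository:

```xml
<!-- Hypothetical pom.xml fragment: move the Spark 3.x line to 3.1.2, which
     contains the SPARK-34731 fix for the ConcurrentModificationException -->
<properties>
  <spark.version>3.1.2</spark.version>
</properties>
```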
[jira] [Created] (KYLIN-5058) Throws ConcurrentModificationException when building cube
Zhichao Zhang created KYLIN-5058: - Summary: Throws ConcurrentModificationException when building cube Key: KYLIN-5058 URL: https://issues.apache.org/jira/browse/KYLIN-5058 Project: Kylin Issue Type: Bug Components: Spark Engine Affects Versions: v4.0.0-beta Reporter: Zhichao Zhang Assignee: Zhichao Zhang Fix For: v4.1.0 When building cubes, there is an error 'ConcurrentModificationException' thrown by Spark, but it does not impact the building job. This is a known issue and was fixed in Spark 3.1.2. So the Spark 3.x version needs to be upgraded to 3.1.2. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (KYLIN-5057) CubeBuildJob in Kylin 4.0 fails when Spark 3.1 AQE is enabled
[ https://issues.apache.org/jira/browse/KYLIN-5057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17395486#comment-17395486 ] Zhichao Zhang commented on KYLIN-5057: --- [~tianhui5], I can't reproduce this error in my docker env; can you share your 'kylin.properties' so I can check? > CubeBuildJob in Kylin 4.0 fails when Spark 3.1 AQE is enabled > -- > > Key: KYLIN-5057 > URL: https://issues.apache.org/jira/browse/KYLIN-5057 > Project: Kylin > Issue Type: Bug >Reporter: tianhui >Priority: Major > Attachments: errorStack.log > > > When I use the standalone docker image with my own Kylin tar.gz, I add a > configuration to kylin.properties. > `kylin.engine.spark-conf.spark.sql.adaptive.enabled=true` > Then I build the sample cube and it fails. I can see the error stack in the Spark > driver log. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (KYLIN-5057) CubeBuildJob in Kylin 4.0 fails when Spark 3.1 AQE is enabled
[ https://issues.apache.org/jira/browse/KYLIN-5057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17394700#comment-17394700 ] Zhichao Zhang commented on KYLIN-5057: --- Thanks [~tianhui5], I will check this issue. > CubeBuildJob in Kylin 4.0 fails when Spark 3.1 AQE is enabled > -- > > Key: KYLIN-5057 > URL: https://issues.apache.org/jira/browse/KYLIN-5057 > Project: Kylin > Issue Type: Bug >Reporter: tianhui >Priority: Major > Attachments: errorStack.log > > > When I use the standalone docker image with my own Kylin tar.gz, I add a > configuration to kylin.properties. > `kylin.engine.spark-conf.spark.sql.adaptive.enabled=true` > Then I build the sample cube and it fails. I can see the error stack in the Spark > driver log. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Reopened] (KYLIN-4762) Optimize join where there is the same shardby partition num on join key
[ https://issues.apache.org/jira/browse/KYLIN-4762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhichao Zhang reopened KYLIN-4762: --- This issue isn't resolved yet. > Optimize join where there is the same shardby partition num on join key > --- > > Key: KYLIN-4762 > URL: https://issues.apache.org/jira/browse/KYLIN-4762 > Project: Kylin > Issue Type: Improvement > Components: Query Engine >Affects Versions: v4.0.0-alpha >Reporter: Zhichao Zhang >Assignee: Zhichao Zhang >Priority: Minor > Fix For: v4.0.0 > > Attachments: shardby_join.png > > > Optimize the join by reducing shuffle when there is the same shard-by partition > number on the join key. > When executing this SQL, > {code:java} > // code placeholder > select m.seller_id, m.part_dt, sum(m.price) as s > from kylin_sales m > left join ( > select m1.part_dt as pd, count(distinct m1.SELLER_ID) as m1, count(1) as m2 > from kylin_sales m1 > where m1.part_dt = '2012-01-05' > group by m1.part_dt > ) j > on m.part_dt = j.pd > where m.lstg_format_name = 'FP-GTC' > and m.part_dt = '2012-01-05' > group by m.seller_id, m.part_dt limit 100; > {code} > the execution plan is shown below: > !shardby_join.png! > Since the join key part_dt has the same shard-by partition number, the join can be > optimized to reduce shuffle, similar to a bucket join. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (KYLIN-4762) Optimize join where there is the same shardby partition num on join key
[ https://issues.apache.org/jira/browse/KYLIN-4762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhichao Zhang updated KYLIN-4762: -- Fix Version/s: (was: v4.0.0) v4.1.0 > Optimize join where there is the same shardby partition num on join key > --- > > Key: KYLIN-4762 > URL: https://issues.apache.org/jira/browse/KYLIN-4762 > Project: Kylin > Issue Type: Improvement > Components: Query Engine >Affects Versions: v4.0.0-alpha >Reporter: Zhichao Zhang >Assignee: Zhichao Zhang >Priority: Minor > Fix For: v4.1.0 > > Attachments: shardby_join.png > > > Optimize the join by reducing shuffle when there is the same shard-by partition > number on the join key. > When executing this SQL, > {code:java} > // code placeholder > select m.seller_id, m.part_dt, sum(m.price) as s > from kylin_sales m > left join ( > select m1.part_dt as pd, count(distinct m1.SELLER_ID) as m1, count(1) as m2 > from kylin_sales m1 > where m1.part_dt = '2012-01-05' > group by m1.part_dt > ) j > on m.part_dt = j.pd > where m.lstg_format_name = 'FP-GTC' > and m.part_dt = '2012-01-05' > group by m.seller_id, m.part_dt limit 100; > {code} > the execution plan is shown below: > !shardby_join.png! > Since the join key part_dt has the same shard-by partition number, the join can be > optimized to reduce shuffle, similar to a bucket join. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (KYLIN-5014) Spark driver log is abnormal in yarn cluster mode
[ https://issues.apache.org/jira/browse/KYLIN-5014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhichao Zhang resolved KYLIN-5014. --- Resolution: Fixed > Spark driver log is abnormal in yarn cluster mode > - > > Key: KYLIN-5014 > URL: https://issues.apache.org/jira/browse/KYLIN-5014 > Project: Kylin > Issue Type: Bug >Reporter: Yaqian Zhang >Assignee: Yaqian Zhang >Priority: Minor > Fix For: v4.0.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (KYLIN-5008) backend Spark job failed, but the corresponding job status is shown as finished in the WebUI
[ https://issues.apache.org/jira/browse/KYLIN-5008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhichao Zhang reassigned KYLIN-5008: - Assignee: Yaqian Zhang (was: Zhichao Zhang) > backend Spark job failed, but the corresponding job status is shown as finished > in the WebUI > - > > Key: KYLIN-5008 > URL: https://issues.apache.org/jira/browse/KYLIN-5008 > Project: Kylin > Issue Type: Bug >Affects Versions: v4.0.0-beta >Reporter: ZHANGHONGJIA >Assignee: Yaqian Zhang >Priority: Major > Attachments: image-2021-06-10-16-46-35-919.png, merge-job.log > > > According to the log shown below, the Spark job failed because the container was > killed by YARN for exceeding memory limits, but in the Kylin WebUI the status of > the merge job is shown as finished. Besides, the amount of data in the merged > segment is three times the amount of actual data. It seems that Kylin didn't > detect the failure of this merge job. > > Here is the merge job log: > === > at > org.apache.kylin.engine.spark.job.BuildLayoutWithUpdate$1.call(BuildLayoutWithUpdate.java:43) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > ... 3 more > Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: > Task 244 in stage 1108.0 failed 4 times, most recent failure: Lost task 244.3 > in stage 1108.0 (TID 78736, r4200h1-app.travelsky.com, executor 109): > ExecutorLostFailure (executor 109 exited caused by one of the running tasks) > Reason: Container killed by YARN for exceeding memory limits. 39.0 GB of 36 > GB physical memory used. Consider boosting spark.yarn.executor.memoryOverhead > or disabling yarn.nodemanager.vmem-check-enabled because of YARN-4714.
> Driver stacktrace: > at > org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1891) > at > org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1879) > at > org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1878) > at > scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) > at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48) > at > org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1878) > at > org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:927) > at > org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:927) > at scala.Option.foreach(Option.scala:257) > at > org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:927) > at > org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2112) > at > org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2061) > at > org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2050) > at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49) > at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:738) > at org.apache.spark.SparkContext.runJob(SparkContext.scala:2061) > at > org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:167) > ... 
34 more > } > RetryInfo{ > overrideConf : \{spark.executor.memory=36618MB, > spark.executor.memoryOverhead=7323MB}, > throwable : java.lang.RuntimeException: Error execute > org.apache.kylin.engine.spark.job.CubeMergeJob > at > org.apache.kylin.engine.spark.application.SparkApplication.execute(SparkApplication.java:92) > at org.apache.spark.application.JobWorker$$anon$2.run(JobWorker.scala:55) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > Caused by: java.lang.RuntimeException: org.apache.spark.SparkException: Job > aborted. > at > org.apache.kylin.engine.spark.job.BuildLayoutWithUpdate.updateLayout(BuildLayoutWithUpdate.java:70) > at > org.apache.kylin.engine.spark.job.CubeMergeJob.mergeSegments(CubeMergeJob.java:122) > at > org.apache.kylin.engine.spark.job.CubeMergeJob.doExecute(CubeMergeJob.java:82) > at > org.apache.kylin.engine.spark.application.SparkApplication.execute(SparkApplication.java:298) > at > org.apache.kylin.engine.spark.application.SparkApplication.execute(SparkApplication.java:89) > ... 4 more > Caused by:
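Separately from the status-reporting bug, the YARN kill in the log above suggests its own mitigation, echoing the log's own hint about boosting executor memory overhead; a sketch in kylin.properties, where the 8192 MB value is an illustrative assumption to be sized per workload:

```properties
# Executors were killed at 39.0 GB used vs 36 GB allowed; raise the off-heap
# overhead budget for build jobs (value in MB, illustrative only).
kylin.engine.spark-conf.spark.yarn.executor.memoryOverhead=8192
```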
[jira] [Assigned] (KYLIN-5008) backend Spark job failed, but the corresponding job status is shown as finished in the WebUI
[ https://issues.apache.org/jira/browse/KYLIN-5008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhichao Zhang reassigned KYLIN-5008: - Assignee: Zhichao Zhang > backend Spark job failed, but the corresponding job status is shown as finished > in the WebUI > - > > Key: KYLIN-5008 > URL: https://issues.apache.org/jira/browse/KYLIN-5008 > Project: Kylin > Issue Type: Bug >Affects Versions: v4.0.0-beta >Reporter: ZHANGHONGJIA >Assignee: Zhichao Zhang >Priority: Major > Attachments: image-2021-06-10-16-46-35-919.png, merge-job.log > > > According to the log shown below, the Spark job failed because the container was > killed by YARN for exceeding memory limits, but in the Kylin WebUI the status of > the merge job is shown as finished. Besides, the amount of data in the merged > segment is three times the amount of actual data. It seems that Kylin didn't > detect the failure of this merge job. > > Here is the merge job log: > === > at > org.apache.kylin.engine.spark.job.BuildLayoutWithUpdate$1.call(BuildLayoutWithUpdate.java:43) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > ... 3 more > Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: > Task 244 in stage 1108.0 failed 4 times, most recent failure: Lost task 244.3 > in stage 1108.0 (TID 78736, r4200h1-app.travelsky.com, executor 109): > ExecutorLostFailure (executor 109 exited caused by one of the running tasks) > Reason: Container killed by YARN for exceeding memory limits. 39.0 GB of 36 > GB physical memory used. Consider boosting spark.yarn.executor.memoryOverhead > or disabling yarn.nodemanager.vmem-check-enabled because of YARN-4714.
> Driver stacktrace: > at > org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1891) > at > org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1879) > at > org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1878) > at > scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) > at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48) > at > org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1878) > at > org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:927) > at > org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:927) > at scala.Option.foreach(Option.scala:257) > at > org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:927) > at > org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2112) > at > org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2061) > at > org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2050) > at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49) > at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:738) > at org.apache.spark.SparkContext.runJob(SparkContext.scala:2061) > at > org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:167) > ... 
34 more > } > RetryInfo{ > overrideConf : \{spark.executor.memory=36618MB, > spark.executor.memoryOverhead=7323MB}, > throwable : java.lang.RuntimeException: Error execute > org.apache.kylin.engine.spark.job.CubeMergeJob > at > org.apache.kylin.engine.spark.application.SparkApplication.execute(SparkApplication.java:92) > at org.apache.spark.application.JobWorker$$anon$2.run(JobWorker.scala:55) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > Caused by: java.lang.RuntimeException: org.apache.spark.SparkException: Job > aborted. > at > org.apache.kylin.engine.spark.job.BuildLayoutWithUpdate.updateLayout(BuildLayoutWithUpdate.java:70) > at > org.apache.kylin.engine.spark.job.CubeMergeJob.mergeSegments(CubeMergeJob.java:122) > at > org.apache.kylin.engine.spark.job.CubeMergeJob.doExecute(CubeMergeJob.java:82) > at > org.apache.kylin.engine.spark.application.SparkApplication.execute(SparkApplication.java:298) > at > org.apache.kylin.engine.spark.application.SparkApplication.execute(SparkApplication.java:89) > ... 4 more > Caused by:
[jira] [Resolved] (KYLIN-4926) Optimize Global Dict building: replace operation 'mapPartitions.count()' with 'foreachPartitions'
[ https://issues.apache.org/jira/browse/KYLIN-4926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhichao Zhang resolved KYLIN-4926. --- Resolution: Fixed > Optimize Global Dict building: replace operation 'mapPartitions.count()' with > 'foreachPartitions' > - > > Key: KYLIN-4926 > URL: https://issues.apache.org/jira/browse/KYLIN-4926 > Project: Kylin > Issue Type: Improvement > Components: Spark Engine >Affects Versions: v4.0.0-beta >Reporter: Zhichao Zhang >Assignee: Zhichao Zhang >Priority: Minor > Fix For: v4.0.0-GA > > > Replace operation 'mapPartitions.count()' with 'foreachPartitions' when > building Global Dict -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (KYLIN-4741) Support configuring the Sparder application name
[ https://issues.apache.org/jira/browse/KYLIN-4741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhichao Zhang resolved KYLIN-4741. --- Fix Version/s: v4.0.0-beta Resolution: Fixed > Support configuring the Sparder application name > -- > > Key: KYLIN-4741 > URL: https://issues.apache.org/jira/browse/KYLIN-4741 > Project: Kylin > Issue Type: Improvement >Affects Versions: v4.0.0-alpha >Reporter: Zhichao Zhang >Assignee: Zhichao Zhang >Priority: Minor > Fix For: v4.0.0-beta > > > Support configuring the Sparder application name -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (KYLIN-4937) Verify the uniqueness of the global dictionary after building global dictionary
[ https://issues.apache.org/jira/browse/KYLIN-4937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhichao Zhang resolved KYLIN-4937. --- Resolution: Fixed > Verify the uniqueness of the global dictionary after building global > dictionary > --- > > Key: KYLIN-4937 > URL: https://issues.apache.org/jira/browse/KYLIN-4937 > Project: Kylin > Issue Type: Improvement > Components: Spark Engine >Affects Versions: v4.0.0-beta >Reporter: Zhichao Zhang >Assignee: Zhichao Zhang >Priority: Minor > Fix For: v4.0.0-GA > > > Verify the uniqueness of the global dictionary after building global > dictionary -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (KYLIN-4927) Forbid to use AE when building Global Dict
[ https://issues.apache.org/jira/browse/KYLIN-4927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhichao Zhang resolved KYLIN-4927. --- Resolution: Fixed > Forbid to use AE when building Global Dict > -- > > Key: KYLIN-4927 > URL: https://issues.apache.org/jira/browse/KYLIN-4927 > Project: Kylin > Issue Type: Improvement > Components: Spark Engine >Affects Versions: v4.0.0-beta >Reporter: Zhichao Zhang >Assignee: Zhichao Zhang >Priority: Minor > Fix For: v4.0.0-GA > > > When building the Global Dict, using AE (Adaptive Execution) is forbidden, because it > changes the partition number dynamically and leads to a wrong Global Dict result. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (KYLIN-4936) Exact aggregation can't be transformed to a project
[ https://issues.apache.org/jira/browse/KYLIN-4936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhichao Zhang resolved KYLIN-4936. --- Resolution: Fixed > Exact aggregation can't be transformed to a project > -- > > Key: KYLIN-4936 > URL: https://issues.apache.org/jira/browse/KYLIN-4936 > Project: Kylin > Issue Type: Improvement > Components: Query Engine >Affects Versions: v4.0.0-beta >Reporter: ShengJun Zheng >Assignee: Zhichao Zhang >Priority: Major > Fix For: v4.0.0-GA > > > Exact aggregation can't be transformed to a project, causing unnecessary Spark > shuffle! -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (KYLIN-4980) Support pruning segments from complex filter conditions
[ https://issues.apache.org/jira/browse/KYLIN-4980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhichao Zhang resolved KYLIN-4980. --- Resolution: Fixed > Support pruning segments from complex filter conditions > > > Key: KYLIN-4980 > URL: https://issues.apache.org/jira/browse/KYLIN-4980 > Project: Kylin > Issue Type: Improvement > Components: Query Engine >Affects Versions: v4.0.0-beta >Reporter: ShengJun Zheng >Assignee: ShengJun Zheng >Priority: Major > Fix For: v4.0.0-GA > > > The segment pruner can't prune segments from complex filter conditions, like the > filter condition below: > "where (col_a = xxx and col_partition = xxx) or (col_b=xxx and col_partition > = xxx)" -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (KYLIN-4967) Forbid to set 'spark.sql.adaptive.enabled' to true when building cube with Spark 2.X
[ https://issues.apache.org/jira/browse/KYLIN-4967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhichao Zhang updated KYLIN-4967: -- Affects Version/s: v4.0.0-beta > Forbid to set 'spark.sql.adaptive.enabled' to true when building cube with > Spark 2.X > > > Key: KYLIN-4967 > URL: https://issues.apache.org/jira/browse/KYLIN-4967 > Project: Kylin > Issue Type: Bug >Affects Versions: v4.0.0-beta >Reporter: Zhichao Zhang >Assignee: Zhichao Zhang >Priority: Minor > Fix For: v4.0.0-GA > > > With Spark 2.X, setting 'spark.sql.adaptive.enabled' to true affects the > actual partition count when repartitioning with Spark, which leads to wrong > results for the global dict and for repartitioning by the shard-by column. > For example, after writing cuboid data, Kylin will repartition the cuboid > data into 3 partitions if needed, but if 'spark.sql.adaptive.enabled' is true, > Spark may optimize the partition number to 1, which leads to wrong results. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (KYLIN-4967) Forbid to set 'spark.sql.adaptive.enabled' to true when building cube with Spark 2.X
[ https://issues.apache.org/jira/browse/KYLIN-4967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhichao Zhang updated KYLIN-4967: -- Fix Version/s: v4.0.0-GA > Forbid to set 'spark.sql.adaptive.enabled' to true when building cube with > Spark 2.X > > > Key: KYLIN-4967 > URL: https://issues.apache.org/jira/browse/KYLIN-4967 > Project: Kylin > Issue Type: Bug >Reporter: Zhichao Zhang >Assignee: Zhichao Zhang >Priority: Minor > Fix For: v4.0.0-GA > > > With Spark 2.X, setting 'spark.sql.adaptive.enabled' to true affects the > actual partition count when repartitioning with Spark, which leads to wrong > results for the global dict and for repartitioning by the shard-by column. > For example, after writing cuboid data, Kylin will repartition the cuboid > data into 3 partitions if needed, but if 'spark.sql.adaptive.enabled' is true, > Spark may optimize the partition number to 1, which leads to wrong results. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (KYLIN-4965) Using a column from a joined table as a filter condition in the model causes an error during cube build
[ https://issues.apache.org/jira/browse/KYLIN-4965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhichao Zhang updated KYLIN-4965: -- Fix Version/s: (was: Future) v4.0.0-GA > Using a column from a joined table as a filter condition in the model causes an error during cube build > > > Key: KYLIN-4965 > URL: https://issues.apache.org/jira/browse/KYLIN-4965 > Project: Kylin > Issue Type: Bug > Components: Job Engine >Affects Versions: v4.0.0-beta > Environment: cdh 5.14.2, hadoop 2.6.0 > kylin 4.0 beta > spark-2.4.6-bin-hadoop2.7 >Reporter: liulei_first >Assignee: Zhichao Zhang >Priority: Major > Fix For: v4.0.0-GA > > Attachments: kylin_buildcube_error.jpg > > > When a column from a joined (dimension) table is used as a filter condition in the model, > the cube build fails with an error saying the filter column is not in the given input > columns. Looking at the given input columns, they are all columns from the fact table. > In the model, the dimension table's filter column has already been set as a dimension. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (KYLIN-4967) Forbid to set 'spark.sql.adaptive.enabled' to true when building cube with Spark 2.X
Zhichao Zhang created KYLIN-4967: - Summary: Forbid to set 'spark.sql.adaptive.enabled' to true when building cube with Spark 2.X Key: KYLIN-4967 URL: https://issues.apache.org/jira/browse/KYLIN-4967 Project: Kylin Issue Type: Bug Reporter: Zhichao Zhang Assignee: Zhichao Zhang With Spark 2.X, setting 'spark.sql.adaptive.enabled' to true affects the actual partition count when repartitioning with Spark, which leads to wrong results for the global dict and for repartitioning by the shard-by column. For example, after writing cuboid data, Kylin will repartition the cuboid data into 3 partitions if needed, but if 'spark.sql.adaptive.enabled' is true, Spark may optimize the partition number to 1, which leads to wrong results. -- This message was sent by Atlassian Jira (v8.3.4#803005)
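Until Kylin enforces this at submit time, a defensive sketch for Spark 2.X deployments is to pin the setting off explicitly in kylin.properties, so a cluster-wide spark-defaults.conf can't silently re-enable it:

```properties
# Spark 2.X: AQE can collapse a requested 3-partition repartition down to 1,
# corrupting the global dict and the shard-by layout; keep it off for builds.
kylin.engine.spark-conf.spark.sql.adaptive.enabled=false
```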
[jira] [Assigned] (KYLIN-4965) Using a column from a joined table as a filter condition in the model causes an error during cube build
[ https://issues.apache.org/jira/browse/KYLIN-4965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhichao Zhang reassigned KYLIN-4965: - Assignee: Zhichao Zhang > Using a column from a joined table as a filter condition in the model causes an error during cube build > > > Key: KYLIN-4965 > URL: https://issues.apache.org/jira/browse/KYLIN-4965 > Project: Kylin > Issue Type: Bug > Components: Job Engine >Affects Versions: v4.0.0-beta > Environment: cdh 5.14.2, hadoop 2.6.0 > kylin 4.0 beta > spark-2.4.6-bin-hadoop2.7 >Reporter: liulei_first >Assignee: Zhichao Zhang >Priority: Major > Fix For: Future > > Attachments: kylin_buildcube_error.jpg > > > When a column from a joined (dimension) table is used as a filter condition in the model, > the cube build fails with an error saying the filter column is not in the given input > columns. Looking at the given input columns, they are all columns from the fact table. > In the model, the dimension table's filter column has already been set as a dimension. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (KYLIN-4965) Using a column from a joined table as a filter condition in the model causes an error during cube build
[ https://issues.apache.org/jira/browse/KYLIN-4965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17317611#comment-17317611 ] Zhichao Zhang commented on KYLIN-4965: --- I will check this issue and fix it if there is a problem. > Using a column from a joined table as a filter condition in the model causes an error during cube build > > > Key: KYLIN-4965 > URL: https://issues.apache.org/jira/browse/KYLIN-4965 > Project: Kylin > Issue Type: Bug > Components: Job Engine >Affects Versions: v4.0.0-beta > Environment: cdh 5.14.2, hadoop 2.6.0 > kylin 4.0 beta > spark-2.4.6-bin-hadoop2.7 >Reporter: liulei_first >Priority: Major > Fix For: Future > > Attachments: kylin_buildcube_error.jpg > > > When a column from a joined (dimension) table is used as a filter condition in the model, > the cube build fails with an error saying the filter column is not in the given input > columns. Looking at the given input columns, they are all columns from the fact table. > In the model, the dimension table's filter column has already been set as a dimension. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (KYLIN-4944) Upgrade CentOS version, Hadoop version and Spark version for Kylin Docker image
Zhichao Zhang created KYLIN-4944: - Summary: Upgrade CentOS version, Hadoop version and Spark version for Kylin Docker image Key: KYLIN-4944 URL: https://issues.apache.org/jira/browse/KYLIN-4944 Project: Kylin Issue Type: Improvement Components: Others Affects Versions: v4.0.0-beta Reporter: Zhichao Zhang Fix For: v4.0.0-GA Currently, the CentOS version of the Kylin docker image is 6.9, and the yum repositories for CentOS 6.9 are no longer maintained, so it needs to be upgraded to 7+. Hadoop can also be upgraded to 2.8.5 and Spark to 2.4.7. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (KYLIN-4913) Update docker image for Kylin 4.0 Beta
[ https://issues.apache.org/jira/browse/KYLIN-4913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhichao Zhang resolved KYLIN-4913. --- Resolution: Fixed > Update docker image for Kylin 4.0 Beta > -- > > Key: KYLIN-4913 > URL: https://issues.apache.org/jira/browse/KYLIN-4913 > Project: Kylin > Issue Type: Improvement >Affects Versions: v4.0.0-beta >Reporter: Zhichao Zhang >Assignee: Zhichao Zhang >Priority: Minor > Fix For: v4.0.0-beta > > > Update docker image for Kylin 4.0 Beta -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (KYLIN-4937) Verify the uniqueness of the global dictionary after building global dictionary
Zhichao Zhang created KYLIN-4937: - Summary: Verify the uniqueness of the global dictionary after building global dictionary Key: KYLIN-4937 URL: https://issues.apache.org/jira/browse/KYLIN-4937 Project: Kylin Issue Type: Improvement Components: Spark Engine Affects Versions: v4.0.0-beta Reporter: Zhichao Zhang Assignee: Zhichao Zhang Fix For: v4.0.0-GA Verify the uniqueness of the global dictionary after building global dictionary -- This message was sent by Atlassian Jira (v8.3.4#803005)
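The check KYLIN-4937 calls for can be sketched in a few lines (an illustrative Python sketch, not Kylin's actual implementation): a built global dictionary is valid only if no two distinct column values were assigned the same encoded id.

```python
def dictionary_is_unique(encoding):
    """Validate a value -> id mapping: dict keys are unique by
    construction, so only id collisions need checking."""
    ids = list(encoding.values())
    return len(ids) == len(set(ids))

good = {"a": 1, "b": 2, "c": 3}
bad = {"a": 1, "b": 1, "c": 3}   # two values collide on id 1
print(dictionary_is_unique(good))  # -> True
print(dictionary_is_unique(bad))   # -> False
```

In practice the verification would run distributed over the dictionary buckets after the build step, but the invariant being asserted is the same.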
[jira] [Assigned] (KYLIN-4936) Exactly aggregation can't transform to project
[ https://issues.apache.org/jira/browse/KYLIN-4936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhichao Zhang reassigned KYLIN-4936: - Assignee: Zhichao Zhang > Exactly aggregation can't transform to project > -- > > Key: KYLIN-4936 > URL: https://issues.apache.org/jira/browse/KYLIN-4936 > Project: Kylin > Issue Type: Improvement > Components: Query Engine >Affects Versions: v4.0.0-beta >Reporter: ShengJun Zheng >Assignee: Zhichao Zhang >Priority: Major > Fix For: v4.0.0-GA > > > An exact aggregation can't be transformed into a project, causing an unnecessary Spark shuffle! -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (KYLIN-4927) Forbid to use AE when building Global Dict
Zhichao Zhang created KYLIN-4927: - Summary: Forbid to use AE when building Global Dict Key: KYLIN-4927 URL: https://issues.apache.org/jira/browse/KYLIN-4927 Project: Kylin Issue Type: Improvement Components: Spark Engine Affects Versions: v4.0.0-beta Reporter: Zhichao Zhang Assignee: Zhichao Zhang Fix For: v4.0.0-GA When building the Global Dict, it's forbidden to use AE (Adaptive Execution), which changes the partition number dynamically and leads to a wrong Global Dict result. -- This message was sent by Atlassian Jira (v8.3.4#803005)
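A minimal sketch of why a dynamically changing partition count corrupts the build (illustrative Python; the modulo-hash bucket assignment is an assumption standing in for Spark's hash partitioner, not Kylin's exact code):

```python
from zlib import crc32

def bucket_of(value, num_partitions):
    # Which dictionary bucket a value belongs to depends on the current
    # partition count (crc32 is a deterministic stand-in hash).
    return crc32(value.encode()) % num_partitions

# If Adaptive Execution coalesces 8 partitions down to 5 mid-build, the
# same value can map to a different bucket than the one that already
# assigned its id, so the resulting ids are no longer consistent.
b8 = bucket_of("user_42", 8)
b5 = bucket_of("user_42", 5)
print(b8, b5)
```

Pinning the partition count for the dictionary-building stage removes this failure mode, which is why the issue disables AE there.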
[jira] [Updated] (KYLIN-4926) Optimize Global Dict building: replace operation 'mapPartitions.count()' with 'foreachPartitions'
[ https://issues.apache.org/jira/browse/KYLIN-4926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhichao Zhang updated KYLIN-4926: -- Summary: Optimize Global Dict building: replace operation 'mapPartitions.count()' with 'foreachPartitions' (was: Optimize Global Dict build: replace operation 'mapPartitions.count()' with 'foreachPartitions') > Optimize Global Dict building: replace operation 'mapPartitions.count()' with > 'foreachPartitions' > - > > Key: KYLIN-4926 > URL: https://issues.apache.org/jira/browse/KYLIN-4926 > Project: Kylin > Issue Type: Improvement > Components: Spark Engine >Affects Versions: v4.0.0-beta >Reporter: Zhichao Zhang >Assignee: Zhichao Zhang >Priority: Minor > Fix For: v4.0.0-GA > > > Replace operation 'mapPartitions.count()' with 'foreachPartitions' when > building Global Dict -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (KYLIN-4926) Optimize Global Dict build: replace operation 'mapPartitions.count()' with 'foreachPartitions'
Zhichao Zhang created KYLIN-4926: - Summary: Optimize Global Dict build: replace operation 'mapPartitions.count()' with 'foreachPartitions' Key: KYLIN-4926 URL: https://issues.apache.org/jira/browse/KYLIN-4926 Project: Kylin Issue Type: Improvement Components: Spark Engine Affects Versions: v4.0.0-beta Reporter: Zhichao Zhang Assignee: Zhichao Zhang Fix For: v4.0.0-GA Replace operation 'mapPartitions.count()' with 'foreachPartitions' when building Global Dict -- This message was sent by Atlassian Jira (v8.3.4#803005)
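The optimization rests on Spark's laziness: mapPartitions is a transformation that runs only when an action consumes it, so count() was being used purely as a trigger while its result was thrown away, whereas foreachPartition is itself an action that runs the side effect directly. The same shape can be shown with plain Python generators (illustrative only, not Kylin's code):

```python
effects = []
partitions = [[1, 2], [3, 4, 5]]

# 'mapPartitions(...).count()' style: a lazy mapping does nothing until
# an action consumes it, so counting is used purely to force execution.
lazy = (effects.append(len(p)) for p in partitions)
assert effects == []              # nothing has run yet
count = sum(1 for _ in lazy)      # forces the pipeline; count == 2
assert effects == [2, 3]

# 'foreachPartition' style: run the side effect directly; no result is
# materialized just to be discarded.
effects.clear()
for p in partitions:
    effects.append(len(p))
assert effects == [2, 3]
```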
[jira] [Closed] (KYLIN-4916) ShardingReadRDD's partition number was set to shardNum, causing empty spark tasks
[ https://issues.apache.org/jira/browse/KYLIN-4916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhichao Zhang closed KYLIN-4916. - > ShardingReadRDD's partition number was set to shardNum, causing empty spark > tasks > - > > Key: KYLIN-4916 > URL: https://issues.apache.org/jira/browse/KYLIN-4916 > Project: Kylin > Issue Type: Improvement >Reporter: ShengJun Zheng >Priority: Major > Fix For: v4.0.0-beta > > > When creating ShardingReadRDD, the created FileScanRDD's partition number was > set to the shard number, causing too many empty tasks when the shard number is big. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (KYLIN-4916) ShardingReadRDD's partition number was set to shardNum, causing empty spark tasks
[ https://issues.apache.org/jira/browse/KYLIN-4916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhichao Zhang resolved KYLIN-4916. --- Fix Version/s: v4.0.0-beta Resolution: Won't Fix > ShardingReadRDD's partition number was set to shardNum, causing empty spark > tasks > - > > Key: KYLIN-4916 > URL: https://issues.apache.org/jira/browse/KYLIN-4916 > Project: Kylin > Issue Type: Improvement >Reporter: ShengJun Zheng >Priority: Major > Fix For: v4.0.0-beta > > > When creating ShardingReadRDD, the created FileScanRDD's partition number was > set to the shard number, causing too many empty tasks when the shard number is big. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (KYLIN-4916) ShardingReadRDD's partition number was set to shardNum, causing empty spark tasks
[ https://issues.apache.org/jira/browse/KYLIN-4916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17290638#comment-17290638 ] Zhichao Zhang commented on KYLIN-4916: --- This is expected behavior, similar to the Spark bucketing feature. Now I have added a parameter 'kylin.query.spark-engine.max-sharding-size-mb' to control the data size for each task; if the data size exceeds this value, it will fall back to a non-sharding RDD. > ShardingReadRDD's partition number was set to shardNum, causing empty spark > tasks > - > > Key: KYLIN-4916 > URL: https://issues.apache.org/jira/browse/KYLIN-4916 > Project: Kylin > Issue Type: Improvement >Reporter: ShengJun Zheng >Priority: Major > > When creating ShardingReadRDD, the created FileScanRDD's partition number was > set to the shard number, causing too many empty tasks when the shard number is big. -- This message was sent by Atlassian Jira (v8.3.4#803005)
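The fallback logic described in the comment can be sketched as follows (illustrative Python; the function name and the per-shard granularity are assumptions, not Kylin's exact code):

```python
def use_sharded_read(shard_bytes, max_sharding_size_mb):
    """Keep the sharded (one-task-per-shard) read only while every shard
    stays under the configured cap; otherwise fall back to a normal,
    non-sharded RDD whose partitioning follows data size."""
    cap = max_sharding_size_mb * 1024 * 1024
    return all(size <= cap for size in shard_bytes)

shards = [8 * 2 ** 20, 300 * 2 ** 20]   # an 8 MB shard and a 300 MB shard
print(use_sharded_read(shards, 128))    # 300 MB exceeds the 128 MB cap -> False
print(use_sharded_read(shards, 512))    # both shards fit -> True
```

This keeps the shuffle-free sharded path for well-balanced data while avoiding single oversized tasks when shards grow too large.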
[jira] [Comment Edited] (KYLIN-4914) Failed to query "select * from {fact_table}" if a fact table used in two different cubes
[ https://issues.apache.org/jira/browse/KYLIN-4914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17289715#comment-17289715 ] Zhichao Zhang edited comment on KYLIN-4914 at 2/24/21, 7:02 AM: - 'select * from \{fact_table}' queries detailed data, not cube data, so it must throw a 'No model found for OLAPContext' exception. If you enable pushdown, it will query data from an external data source, for example Hive. This behavior is expected. was (Author: zzcclp): 'select * from \{fact_table}' queries detailed data, not cube data, so it must throw a 'No model found for OLAPContext' exception. If you enable pushdown, it will query data from an external data source, for example Hive. > Failed to query "select * from {fact_table}" if a fact table used in two > different cubes > - > > Key: KYLIN-4914 > URL: https://issues.apache.org/jira/browse/KYLIN-4914 > Project: Kylin > Issue Type: Bug > Components: Query Engine >Affects Versions: v3.0.2 >Reporter: xue lin >Priority: Major > > Steps to reproduce: > 1. Create one model that uses only one fact table > 2. Create two cubes with the same model; they have different dimensions and > measures: one cube's measures contain COUNT_DISTINCT (return type: bitmap), the > other cube's measures contain EXTENDED_COLUMN (return type: > extendedcolumn(100)); build the 2 cubes > 3. 
Run query with "select * from > {fact_table} > " with the 2 cubes in ready status, it should be failed with exception > message like > " > No model found for OLAPContext, > CUBE_NOT_CONTAIN_ALL_COLUMN[1_8e4ee38:BOSS_DATABUS.MIRROR_DATABUS_SUBSCRIPTIONFEE.SUBSCRIBER_ID], > > CUBE_NOT_CONTAIN_ALL_COLUMN[1_8e4ee38:BOSS_DATABUS.MIRROR_DATABUS_SUBSCRIPTIONFEE.SUBSCRIBER_STATUS_ID, > 1_8e4ee38:BOSS_DATABUS.MIRROR_DATABUS_SUBSCRIPTIONFEE.PBO_SOURCE_TYPE_ID, > 1_8e4ee38:BOSS_DATABUS.MIRROR_DATABUS_SUBSCRIPTIONFEE.CUSTOMER_CREATE_CHANNEL_TYPE_NAME, > > 1_8e4ee38:BOSS_DATABUS.MIRROR_DATABUS_SUBSCRIPTIONFEE.SUB_AR_RESOURCE_TYPE_NAME, > > 1_8e4ee38:BOSS_DATABUS.MIRROR_DATABUS_SUBSCRIPTIONFEE.CUSTOMER_CREATE_CHANNEL_TYPE_ID, > 1_8e4ee38:BOSS_DATABUS.MIRROR_DATABUS_SUBSCRIPTIONFEE.CUSTOMER_REGION, > 1_8e4ee38:BOSS_DATABUS.MIRROR_DATABUS_SUBSCRIPTIONFEE.CUSTOMER_COMPANY_NAME, > 1_8e4ee38:BOSS_DATABUS.MIRROR_DATABUS_SUBSCRIPTIONFEE.CUSTOMER_CREATE_CHANNEL_ID, > 1_8e4ee38:BOSS_DATABUS.MIRROR_DATABUS_SUBSCRIPTIONFEE.CUSTOMER_SALE_AREA, > 1_8e4ee38:BOSS_DATABUS.MIRROR_DATABUS_SUBSCRIPTIONFEE.SUBSCRIBER_PROTOCOL_TYPE_ID, > 1_8e4ee38:BOSS_DATABUS.MIRROR_DATABUS_SUBSCRIPTIONFEE.RATE_BILLING_CYCLE_ID, > 1_8e4ee38:BOSS_DATABUS.MIRROR_DATABUS_SUBSCRIPTIONFEE.MAIN_AR_RESOURCE_TYPE_NAME, > > 1_8e4ee38:BOSS_DATABUS.MIRROR_DATABUS_SUBSCRIPTIONFEE.PBO_PRESENT_SOURCE_TYPE, > 1_8e4ee38:BOSS_DATABUS.MIRROR_DATABUS_SUBSCRIPTIONFEE.ACTUAL_DAY_AMOUNT, > 1_8e4ee38:BOSS_DATABUS.MIRROR_DATABUS_SUBSCRIPTIONFEE.CUSTOMER_TYPE, > 1_8e4ee38:BOSS_DATABUS.MIRROR_DATABUS_SUBSCRIPTIONFEE.CUSTOMER_ADDRESS_ID, > 1_8e4ee38:BOSS_DATABUS.MIRROR_DATABUS_SUBSCRIPTIONFEE.CUSTOMER_STATUS_ID, > 1_8e4ee38:BOSS_DATABUS.MIRROR_DATABUS_SUBSCRIPTIONFEE.CUSTOMER_TYPE_ID, > 1_8e4ee38:BOSS_DATABUS.MIRROR_DATABUS_SUBSCRIPTIONFEE.SUBSCRIBER_PROTOCOL_TYPE, > 1_8e4ee38:BOSS_DATABUS.MIRROR_DATABUS_SUBSCRIPTIONFEE.DAY_AMOUNT, > 1_8e4ee38:BOSS_DATABUS.MIRROR_DATABUS_SUBSCRIPTIONFEE.PBO_SOURCE_TYPE, > 
1_8e4ee38:BOSS_DATABUS.MIRROR_DATABUS_SUBSCRIPTIONFEE.SUBSCRIBER_BUSINESS_NAME, > > 1_8e4ee38:BOSS_DATABUS.MIRROR_DATABUS_SUBSCRIPTIONFEE.PBO_PRESENT_SOURCE_TYPE_ID, > > 1_8e4ee38:BOSS_DATABUS.MIRROR_DATABUS_SUBSCRIPTIONFEE.CUSTOMER_CREATE_CHANNEL_NAME], > rel#2656421:OLAPTableScan.OLAP.[](table=[BOSS_DATABUS, > MIRROR_DATABUS_SUBSCRIPTIONFEE],ctx=,fields=[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, > 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, > 29, 30, 31, 32, 33]) while executing SQL: "select * from > MIRROR_DATABUS_SUBSCRIPTIONFEE limit 10" > > this issue is similar but different with > https://issues.apache.org/jira/browse/KYLIN-4120 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (KYLIN-4914) Failed to query "select * from {fact_table}" if a fact table used in two different cubes
[ https://issues.apache.org/jira/browse/KYLIN-4914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17289715#comment-17289715 ] Zhichao Zhang commented on KYLIN-4914: --- 'select * from \{fact_table}' queries detailed data, not cube data, so it must throw a 'No model found for OLAPContext' exception. If you enable pushdown, it will query data from an external data source, for example Hive. > Failed to query "select * from {fact_table}" if a fact table used in two > different cubes > - > > Key: KYLIN-4914 > URL: https://issues.apache.org/jira/browse/KYLIN-4914 > Project: Kylin > Issue Type: Bug > Components: Query Engine >Affects Versions: v3.0.2 >Reporter: xue lin >Priority: Major > > Steps to reproduce: > 1. Create one model that uses only one fact table > 2. Create two cubes with the same model; they have different dimensions and > measures: one cube's measures contain COUNT_DISTINCT (return type: bitmap), the > other cube's measures contain EXTENDED_COLUMN (return type: > extendedcolumn(100)); build the 2 cubes > 3. 
Run query with "select * from > {fact_table} > " with the 2 cubes in ready status, it should be failed with exception > message like > " > No model found for OLAPContext, > CUBE_NOT_CONTAIN_ALL_COLUMN[1_8e4ee38:BOSS_DATABUS.MIRROR_DATABUS_SUBSCRIPTIONFEE.SUBSCRIBER_ID], > > CUBE_NOT_CONTAIN_ALL_COLUMN[1_8e4ee38:BOSS_DATABUS.MIRROR_DATABUS_SUBSCRIPTIONFEE.SUBSCRIBER_STATUS_ID, > 1_8e4ee38:BOSS_DATABUS.MIRROR_DATABUS_SUBSCRIPTIONFEE.PBO_SOURCE_TYPE_ID, > 1_8e4ee38:BOSS_DATABUS.MIRROR_DATABUS_SUBSCRIPTIONFEE.CUSTOMER_CREATE_CHANNEL_TYPE_NAME, > > 1_8e4ee38:BOSS_DATABUS.MIRROR_DATABUS_SUBSCRIPTIONFEE.SUB_AR_RESOURCE_TYPE_NAME, > > 1_8e4ee38:BOSS_DATABUS.MIRROR_DATABUS_SUBSCRIPTIONFEE.CUSTOMER_CREATE_CHANNEL_TYPE_ID, > 1_8e4ee38:BOSS_DATABUS.MIRROR_DATABUS_SUBSCRIPTIONFEE.CUSTOMER_REGION, > 1_8e4ee38:BOSS_DATABUS.MIRROR_DATABUS_SUBSCRIPTIONFEE.CUSTOMER_COMPANY_NAME, > 1_8e4ee38:BOSS_DATABUS.MIRROR_DATABUS_SUBSCRIPTIONFEE.CUSTOMER_CREATE_CHANNEL_ID, > 1_8e4ee38:BOSS_DATABUS.MIRROR_DATABUS_SUBSCRIPTIONFEE.CUSTOMER_SALE_AREA, > 1_8e4ee38:BOSS_DATABUS.MIRROR_DATABUS_SUBSCRIPTIONFEE.SUBSCRIBER_PROTOCOL_TYPE_ID, > 1_8e4ee38:BOSS_DATABUS.MIRROR_DATABUS_SUBSCRIPTIONFEE.RATE_BILLING_CYCLE_ID, > 1_8e4ee38:BOSS_DATABUS.MIRROR_DATABUS_SUBSCRIPTIONFEE.MAIN_AR_RESOURCE_TYPE_NAME, > > 1_8e4ee38:BOSS_DATABUS.MIRROR_DATABUS_SUBSCRIPTIONFEE.PBO_PRESENT_SOURCE_TYPE, > 1_8e4ee38:BOSS_DATABUS.MIRROR_DATABUS_SUBSCRIPTIONFEE.ACTUAL_DAY_AMOUNT, > 1_8e4ee38:BOSS_DATABUS.MIRROR_DATABUS_SUBSCRIPTIONFEE.CUSTOMER_TYPE, > 1_8e4ee38:BOSS_DATABUS.MIRROR_DATABUS_SUBSCRIPTIONFEE.CUSTOMER_ADDRESS_ID, > 1_8e4ee38:BOSS_DATABUS.MIRROR_DATABUS_SUBSCRIPTIONFEE.CUSTOMER_STATUS_ID, > 1_8e4ee38:BOSS_DATABUS.MIRROR_DATABUS_SUBSCRIPTIONFEE.CUSTOMER_TYPE_ID, > 1_8e4ee38:BOSS_DATABUS.MIRROR_DATABUS_SUBSCRIPTIONFEE.SUBSCRIBER_PROTOCOL_TYPE, > 1_8e4ee38:BOSS_DATABUS.MIRROR_DATABUS_SUBSCRIPTIONFEE.DAY_AMOUNT, > 1_8e4ee38:BOSS_DATABUS.MIRROR_DATABUS_SUBSCRIPTIONFEE.PBO_SOURCE_TYPE, > 
1_8e4ee38:BOSS_DATABUS.MIRROR_DATABUS_SUBSCRIPTIONFEE.SUBSCRIBER_BUSINESS_NAME, > > 1_8e4ee38:BOSS_DATABUS.MIRROR_DATABUS_SUBSCRIPTIONFEE.PBO_PRESENT_SOURCE_TYPE_ID, > > 1_8e4ee38:BOSS_DATABUS.MIRROR_DATABUS_SUBSCRIPTIONFEE.CUSTOMER_CREATE_CHANNEL_NAME], > rel#2656421:OLAPTableScan.OLAP.[](table=[BOSS_DATABUS, > MIRROR_DATABUS_SUBSCRIPTIONFEE],ctx=,fields=[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, > 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, > 29, 30, 31, 32, 33]) while executing SQL: "select * from > MIRROR_DATABUS_SUBSCRIPTIONFEE limit 10" > > this issue is similar but different with > https://issues.apache.org/jira/browse/KYLIN-4120 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (KYLIN-4913) Update docker image for Kylin 4.0 Beta
Zhichao Zhang created KYLIN-4913: - Summary: Update docker image for Kylin 4.0 Beta Key: KYLIN-4913 URL: https://issues.apache.org/jira/browse/KYLIN-4913 Project: Kylin Issue Type: Improvement Affects Versions: v4.0.0-beta Reporter: Zhichao Zhang Assignee: Zhichao Zhang Fix For: v4.0.0-beta Update docker image for Kylin 4.0 Beta -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (KYLIN-4910) Sparder URL is hardcoded to localhost
[ https://issues.apache.org/jira/browse/KYLIN-4910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhichao Zhang reassigned KYLIN-4910: - Assignee: ShengJun Zheng > Sparder URL is hardcoded to localhost > -- > > Key: KYLIN-4910 > URL: https://issues.apache.org/jira/browse/KYLIN-4910 > Project: Kylin > Issue Type: Improvement > Components: Query Engine >Affects Versions: v4.0.0-beta >Reporter: ShengJun Zheng >Assignee: ShengJun Zheng >Priority: Minor > > When the Spark master is set to local, the Sparder URL is hardcoded to "localhost". -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (KYLIN-4908) Segment pruner support integer partition col in spark query engine
[ https://issues.apache.org/jira/browse/KYLIN-4908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhichao Zhang reassigned KYLIN-4908: - Assignee: ShengJun Zheng > Segment pruner support integer partition col in spark query engine > -- > > Key: KYLIN-4908 > URL: https://issues.apache.org/jira/browse/KYLIN-4908 > Project: Kylin > Issue Type: Improvement >Reporter: ShengJun Zheng >Assignee: ShengJun Zheng >Priority: Major > > It's allowed to use an int/bigint partition column from a Hive table to divide > Kylin's segments, but the segment pruner doesn't support pruning segments based on an > integer-type partition column. -- This message was sent by Atlassian Jira (v8.3.4#803005)
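What the improvement enables can be sketched as a range-overlap test on the integer partition column (illustrative Python; the segment layout and field names are assumptions, not Kylin's data structures):

```python
def prune_segments(segments, lo, hi):
    """Keep only segments whose [start, end) integer partition range can
    contain rows matching the filter range [lo, hi]."""
    return [s for s in segments
            if s["start"] <= hi and s["end"] > lo]

segments = [
    {"name": "seg1", "start": 20210101, "end": 20210201},
    {"name": "seg2", "start": 20210201, "end": 20210301},
]
# A filter 'part_col >= 20210210 and part_col <= 20210220' touches only seg2,
# so seg1 never needs to be scanned.
hit = prune_segments(segments, 20210210, 20210220)
print([s["name"] for s in hit])  # -> ['seg2']
```

Without integer support, this comparison was skipped and every segment was scanned even when the filter range excluded most of them.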
[jira] [Commented] (KYLIN-4905) Support limit .. offset ... in spark query engine
[ https://issues.apache.org/jira/browse/KYLIN-4905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17286269#comment-17286269 ] Zhichao Zhang commented on KYLIN-4905: --- Thanks [~zhengshengjun], please raise a PR. > Support limit .. offset ... in spark query engine > - > > Key: KYLIN-4905 > URL: https://issues.apache.org/jira/browse/KYLIN-4905 > Project: Kylin > Issue Type: New Feature > Components: Query Engine >Affects Versions: v4.0.0-alpha >Reporter: ShengJun Zheng >Priority: Major > Fix For: v4.0.0-GA > > > When using a top-level result offset clause in a query expression (ANSI SQL), > i.e. limit xxx offset xxx, in the Spark query engine, the limit will not be pushed down into > the Spark engine, and the offset will not take effect. This is incompatible with > Kylin 2.x~3.x. > After looking through the code, I found it's because Spark does not support > limit ... offset ... yet. There is a Spark issue in progress: > https://issues.apache.org/jira/browse/SPARK-28330, which was created in 2019 > but is still in progress. > So, should we support this feature temporarily in Kylin? : > 1. Push down the limit to Spark > 2. Take the result from the starting offset in the Kylin query server > -- This message was sent by Atlassian Jira (v8.3.4#803005)
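The two-step workaround proposed in the issue can be sketched as follows (illustrative Python; the idea is that the engine receives a plain limit of limit + offset, since Spark lacks a native OFFSET clause, and the Kylin query server then discards the leading rows):

```python
def limit_offset(rows_from_engine, limit, offset):
    # Step 1 (engine side): push down 'limit + offset' as an ordinary
    # limit, so at most that many rows are ever fetched.
    fetched = rows_from_engine[: limit + offset]
    # Step 2 (server side): drop the first 'offset' rows before
    # returning the result to the client.
    return fetched[offset:]

print(limit_offset(list(range(100)), limit=3, offset=5))  # -> [5, 6, 7]
```

The cost is that offset rows are still transferred from the engine to the query server, which is acceptable for the small offsets typical of pagination but degrades for very large ones.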
[jira] [Commented] (KYLIN-4904) build cubes error
[ https://issues.apache.org/jira/browse/KYLIN-4904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17286256#comment-17286256 ] Zhichao Zhang commented on KYLIN-4904: --- [~stayblank], the version of Spark is not the Apache Spark version, right? According to the error message, it can't create 'YarnClusterManager' for yarn. > build cubes error > --- > > Key: KYLIN-4904 > URL: https://issues.apache.org/jira/browse/KYLIN-4904 > Project: Kylin > Issue Type: Bug > Components: Query Engine >Affects Versions: v4.0.0-alpha > Environment: ubuntu20 >Reporter: stayblank >Priority: Major > Attachments: image-2021-02-08-18-36-21-136.png, > image-2021-02-08-18-37-03-126.png, image-2021-02-08-18-38-10-276.png, > image-2021-02-08-18-41-11-708.png > > > 1. Without adding `spark-yarn_2.11-2.4.8-SNAPSHOT.jar` to $kylin_home/lib, this error is reported: > java.lang.reflect.InvocationTargetException > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.apache.kylin.engine.spark.job.NSparkExecutable.runLocalMode(NSparkExecutable.java:389) > at > org.apache.kylin.engine.spark.job.NSparkExecutable.doWork(NSparkExecutable.java:153) > at > org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:205) > at > org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:94) > at > org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:205) > at > org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:114) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > Caused by: 
java.lang.RuntimeException: Error execute > org.apache.kylin.engine.spark.job.ResourceDetectBeforeCubingJob > at > org.apache.kylin.engine.spark.application.SparkApplication.execute(SparkApplication.java:92) > at > org.apache.kylin.engine.spark.job.ResourceDetectBeforeCubingJob.main(ResourceDetectBeforeCubingJob.java:100) > ... 13 more > Caused by: org.apache.spark.SparkException: Could not parse Master URL: 'yarn' > at > org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:2784) > at org.apache.spark.SparkContext.(SparkContext.scala:493) > at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2520) > at > org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:930) > at > org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:921) > at scala.Option.getOrElse(Option.scala:121) > at > org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:921) > at > org.apache.kylin.engine.spark.application.SparkApplication.execute(SparkApplication.java:283) > at > org.apache.kylin.engine.spark.application.SparkApplication.execute(SparkApplication.java:89) > ... 
14 more > > > > 2. After adding `spark-yarn_2.11-2.4.8-SNAPSHOT.jar` to $kylin_home/lib, this error is reported: > java.lang.reflect.InvocationTargetException > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.apache.kylin.engine.spark.job.NSparkExecutable.runLocalMode(NSparkExecutable.java:389) > at > org.apache.kylin.engine.spark.job.NSparkExecutable.doWork(NSparkExecutable.java:153) > at > org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:205) > at > org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:94) > at > org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:205) > at > org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:114) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > Caused by: java.lang.ExceptionInInitializerError > at >
[jira] [Commented] (KYLIN-4904) build cubes error
[ https://issues.apache.org/jira/browse/KYLIN-4904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17285708#comment-17285708 ] Zhichao Zhang commented on KYLIN-4904: --- [~stayblank], can you show the properties you configured? The first step of the building job runs in local mode, so why does it show the error message 'Could not parse Master URL: 'yarn''? BTW, why do you use the spark-2.4.8-SNAPSHOT version? On the kylin-on-parquet-v2 branch, we have updated the version of Spark to 2.4.7; you can try the latest version of Kylin on the kylin-on-parquet-v2 branch. > build cubes error > --- > > Key: KYLIN-4904 > URL: https://issues.apache.org/jira/browse/KYLIN-4904 > Project: Kylin > Issue Type: Bug > Components: Query Engine >Affects Versions: v4.0.0-alpha > Environment: ubuntu20 >Reporter: stayblank >Priority: Major > Attachments: image-2021-02-08-18-36-21-136.png, > image-2021-02-08-18-37-03-126.png, image-2021-02-08-18-38-10-276.png, > image-2021-02-08-18-41-11-708.png > > > 1. Without adding `spark-yarn_2.11-2.4.8-SNAPSHOT.jar` to $kylin_home/lib, this error is reported: > java.lang.reflect.InvocationTargetException > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.apache.kylin.engine.spark.job.NSparkExecutable.runLocalMode(NSparkExecutable.java:389) > at > org.apache.kylin.engine.spark.job.NSparkExecutable.doWork(NSparkExecutable.java:153) > at > org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:205) > at > org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:94) > at > org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:205) > at > org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:114) > at > 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > Caused by: java.lang.RuntimeException: Error execute > org.apache.kylin.engine.spark.job.ResourceDetectBeforeCubingJob > at > org.apache.kylin.engine.spark.application.SparkApplication.execute(SparkApplication.java:92) > at > org.apache.kylin.engine.spark.job.ResourceDetectBeforeCubingJob.main(ResourceDetectBeforeCubingJob.java:100) > ... 13 more > Caused by: org.apache.spark.SparkException: Could not parse Master URL: 'yarn' > at > org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:2784) > at org.apache.spark.SparkContext.(SparkContext.scala:493) > at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2520) > at > org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:930) > at > org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:921) > at scala.Option.getOrElse(Option.scala:121) > at > org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:921) > at > org.apache.kylin.engine.spark.application.SparkApplication.execute(SparkApplication.java:283) > at > org.apache.kylin.engine.spark.application.SparkApplication.execute(SparkApplication.java:89) > ... 
14 more > > > > 2. After adding `spark-yarn_2.11-2.4.8-SNAPSHOT.jar` to $kylin_home/lib, this error is reported: > java.lang.reflect.InvocationTargetException > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.apache.kylin.engine.spark.job.NSparkExecutable.runLocalMode(NSparkExecutable.java:389) > at > org.apache.kylin.engine.spark.job.NSparkExecutable.doWork(NSparkExecutable.java:153) > at > org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:205) > at > org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:94) > at > org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:205) > at > org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:114) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at >
[jira] [Resolved] (KYLIN-4846) Set the related query id to sparder job description
[ https://issues.apache.org/jira/browse/KYLIN-4846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhichao Zhang resolved KYLIN-4846. --- Fix Version/s: v4.0.0-beta Resolution: Fixed > Set the related query id to sparder job description > --- > > Key: KYLIN-4846 > URL: https://issues.apache.org/jira/browse/KYLIN-4846 > Project: Kylin > Issue Type: New Feature >Reporter: Zhichao Zhang >Assignee: Zhichao Zhang >Priority: Minor > Fix For: v4.0.0-beta > > > Set the related query id to sparder job description -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (KYLIN-4797) Correct inputRecordSizes of segment when there is no data in this segment
[ https://issues.apache.org/jira/browse/KYLIN-4797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhichao Zhang resolved KYLIN-4797. --- Fix Version/s: v4.0.0-alpha Resolution: Fixed > Correct inputRecordSizes of segment when there is no data in this segment > - > > Key: KYLIN-4797 > URL: https://issues.apache.org/jira/browse/KYLIN-4797 > Project: Kylin > Issue Type: Bug > Components: Spark Engine >Affects Versions: v4.0.0-alpha >Reporter: Zhichao Zhang >Assignee: Zhichao Zhang >Priority: Minor > Fix For: v4.0.0-alpha > > > When there is no inputRecord, need to set inputRecordSize to 0. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (KYLIN-4730) Add scan bytes metric to the query results
[ https://issues.apache.org/jira/browse/KYLIN-4730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhichao Zhang resolved KYLIN-4730. --- Fix Version/s: v2.6.6 v4.0.0-alpha Resolution: Fixed > Add scan bytes metric to the query results > -- > > Key: KYLIN-4730 > URL: https://issues.apache.org/jira/browse/KYLIN-4730 > Project: Kylin > Issue Type: Improvement > Components: Query Engine >Affects Versions: v4.0.0-alpha >Reporter: Zhichao Zhang >Assignee: Zhichao Zhang >Priority: Minor > Fix For: v4.0.0-alpha, v2.6.6 > > > Add scan bytes metric to the query results -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (KYLIN-4730) Add scan bytes metric to the query results
[ https://issues.apache.org/jira/browse/KYLIN-4730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhichao Zhang updated KYLIN-4730: -- Fix Version/s: (was: v2.6.6) > Add scan bytes metric to the query results > -- > > Key: KYLIN-4730 > URL: https://issues.apache.org/jira/browse/KYLIN-4730 > Project: Kylin > Issue Type: Improvement > Components: Query Engine >Affects Versions: v4.0.0-alpha >Reporter: Zhichao Zhang >Assignee: Zhichao Zhang >Priority: Minor > Fix For: v4.0.0-alpha > > > Add scan bytes metric to the query results -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (KYLIN-4452) Kylin on Parquet with Docker
[ https://issues.apache.org/jira/browse/KYLIN-4452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhichao Zhang resolved KYLIN-4452. --- Resolution: Fixed > Kylin on Parquet with Docker > > > Key: KYLIN-4452 > URL: https://issues.apache.org/jira/browse/KYLIN-4452 > Project: Kylin > Issue Type: New Feature > Components: Storage - Parquet >Reporter: xuekaiqi >Assignee: Zhichao Zhang >Priority: Minor > Fix For: v4.0.0-alpha > > > Since kylin can run independently of hadoop, containerized deployment is the > next step -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (KYLIN-4894) Upgrade Apache Spark version to 2.4.7
[ https://issues.apache.org/jira/browse/KYLIN-4894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhichao Zhang resolved KYLIN-4894. --- Fix Version/s: v4.0.0-GA Resolution: Fixed > Upgrade Apache Spark version to 2.4.7 > - > > Key: KYLIN-4894 > URL: https://issues.apache.org/jira/browse/KYLIN-4894 > Project: Kylin > Issue Type: Improvement >Reporter: Zhichao Zhang >Assignee: Zhichao Zhang >Priority: Minor > Fix For: v4.0.0-GA > > > Upgrade Apache Spark version to 2.4.7 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (KYLIN-4893) Optimize query performance when using shard by column
[ https://issues.apache.org/jira/browse/KYLIN-4893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhichao Zhang resolved KYLIN-4893. --- Fix Version/s: v4.0.0-GA Resolution: Fixed > Optimize query performance when using shard by column > - > > Key: KYLIN-4893 > URL: https://issues.apache.org/jira/browse/KYLIN-4893 > Project: Kylin > Issue Type: Improvement >Reporter: Zhichao Zhang >Assignee: Zhichao Zhang >Priority: Minor > Fix For: v4.0.0-GA > > > Optimize query performance when using shard by column. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (KYLIN-4892) Reduce the times of fetching files status from HDFS in FilePruner
[ https://issues.apache.org/jira/browse/KYLIN-4892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhichao Zhang resolved KYLIN-4892. --- Resolution: Fixed > Reduce the times of fetching files status from HDFS in FilePruner > - > > Key: KYLIN-4892 > URL: https://issues.apache.org/jira/browse/KYLIN-4892 > Project: Kylin > Issue Type: Improvement > Components: Query Engine >Reporter: Zhichao Zhang >Assignee: Zhichao Zhang >Priority: Minor > Fix For: v4.0.0-GA > > > Reduce the times of fetching files status from HDFS in FilePruner -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (KYLIN-4890) Use numSlices = 1 to reduce task num when executing sparder canary
[ https://issues.apache.org/jira/browse/KYLIN-4890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhichao Zhang resolved KYLIN-4890. --- Resolution: Fixed > Use numSlices = 1 to reduce task num when executing sparder canary > -- > > Key: KYLIN-4890 > URL: https://issues.apache.org/jira/browse/KYLIN-4890 > Project: Kylin > Issue Type: Improvement > Components: Spark Engine >Reporter: Zhichao Zhang >Assignee: Zhichao Zhang >Priority: Minor > Fix For: v4.0.0-GA > > > Use numSlices = 1 to reduce task num when executing sparder canary -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (KYLIN-4894) Upgrade Apache Spark version to 2.4.7
Zhichao Zhang created KYLIN-4894: - Summary: Upgrade Apache Spark version to 2.4.7 Key: KYLIN-4894 URL: https://issues.apache.org/jira/browse/KYLIN-4894 Project: Kylin Issue Type: Improvement Reporter: Zhichao Zhang Assignee: Zhichao Zhang Upgrade Apache Spark version to 2.4.7 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (KYLIN-4893) Optimize query performance when using shard by column
[ https://issues.apache.org/jira/browse/KYLIN-4893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhichao Zhang updated KYLIN-4893: -- Description: Optimize query performance when using shard by column. > Optimize query performance when using shard by column > - > > Key: KYLIN-4893 > URL: https://issues.apache.org/jira/browse/KYLIN-4893 > Project: Kylin > Issue Type: Improvement >Reporter: Zhichao Zhang >Assignee: Zhichao Zhang >Priority: Minor > > Optimize query performance when using shard by column. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (KYLIN-4893) Optimize query performance when using shard by column
Zhichao Zhang created KYLIN-4893: - Summary: Optimize query performance when using shard by column Key: KYLIN-4893 URL: https://issues.apache.org/jira/browse/KYLIN-4893 Project: Kylin Issue Type: Improvement Reporter: Zhichao Zhang Assignee: Zhichao Zhang -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (KYLIN-4891) Set the default value of 'kylin.query.spark-engine.expose-sharding-trait' to false
[ https://issues.apache.org/jira/browse/KYLIN-4891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhichao Zhang resolved KYLIN-4891. --- Resolution: Won't Fix Won't fix; a different approach will be used for this optimization. > Set the default value of 'kylin.query.spark-engine.expose-sharding-trait' to > false > -- > > Key: KYLIN-4891 > URL: https://issues.apache.org/jira/browse/KYLIN-4891 > Project: Kylin > Issue Type: Improvement > Components: Query Engine >Reporter: Zhichao Zhang >Assignee: Zhichao Zhang >Priority: Minor > Fix For: v4.0.0-GA > > > Set the default value of 'kylin.query.spark-engine.expose-sharding-trait' to > false -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (KYLIN-4889) Query error when spark engine in local mode
[ https://issues.apache.org/jira/browse/KYLIN-4889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhichao Zhang updated KYLIN-4889: -- Fix Version/s: (was: v4.0.0-beta) > Query error when spark engine in local mode > --- > > Key: KYLIN-4889 > URL: https://issues.apache.org/jira/browse/KYLIN-4889 > Project: Kylin > Issue Type: Bug >Affects Versions: v4.0.0-alpha >Reporter: Feng Zhu >Assignee: Feng Zhu >Priority: Major > Fix For: v4.0.0-GA > > > When i query with spark engine in local mode, with -Dspark.local=true, the > spark application was still submitted to yarn, and the following error > occurred: > org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in > stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 > (TID 6, sandbox.hortonworks.com, executor 1): java.lang.ClassCastException: > cannot assign instance of scala.collection.immutable.List$SerializationProxy > to field org.apache.spark.rdd.RDD.org$apache$spark$rdd$RDD$$dependencies_ of > type scala.collection.Seq in instance of > org.apache.spark.rdd.MapPartitionsRDD at > java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2233) > at java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1405) > at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2291) > at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2209) at > java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2067) at > java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1571) at > java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2285) at > java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2209) at > java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2067) at > java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1571) at > java.io.ObjectInputStream.readObject(ObjectInputStream.java:431) at > 
org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:75) > at > org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:114) > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:88) at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55) at > org.apache.spark.scheduler.Task.run(Task.scala:123) at > org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408) > at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360) at > org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414) at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) Driver stacktrace: while executing > SQL: "select * from (select KYLIN_SALES.PART_DT , sum(KYLIN_SALES.PRICE ) > from KYLIN_SALES group by KYLIN_SALES.PART_DT union select > KYLIN_SALES.PART_DT , max(KYLIN_SALES.PRICE ) from KYLIN_SALES group by > KYLIN_SALES.PART_DT union select KYLIN_SALES.PART_DT , count(*) from > KYLIN_SALES group by KYLIN_SALES.PART_DT union select KYLIN_SALES.PART_DT , > count(distinct KYLIN_SALES.PRICE) from KYLIN_SALES group by > KYLIN_SALES.PART_DT) limit 501" -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (KYLIN-4889) Query error when spark engine in local mode
[ https://issues.apache.org/jira/browse/KYLIN-4889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhichao Zhang updated KYLIN-4889: -- Fix Version/s: v4.0.0-GA > Query error when spark engine in local mode > --- > > Key: KYLIN-4889 > URL: https://issues.apache.org/jira/browse/KYLIN-4889 > Project: Kylin > Issue Type: Bug >Affects Versions: v4.0.0-alpha >Reporter: Feng Zhu >Assignee: Feng Zhu >Priority: Major > Fix For: v4.0.0-beta, v4.0.0-GA > > > When i query with spark engine in local mode, with -Dspark.local=true, the > spark application was still submitted to yarn, and the following error > occurred: > org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in > stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 > (TID 6, sandbox.hortonworks.com, executor 1): java.lang.ClassCastException: > cannot assign instance of scala.collection.immutable.List$SerializationProxy > to field org.apache.spark.rdd.RDD.org$apache$spark$rdd$RDD$$dependencies_ of > type scala.collection.Seq in instance of > org.apache.spark.rdd.MapPartitionsRDD at > java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2233) > at java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1405) > at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2291) > at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2209) at > java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2067) at > java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1571) at > java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2285) at > java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2209) at > java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2067) at > java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1571) at > java.io.ObjectInputStream.readObject(ObjectInputStream.java:431) at > 
org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:75) > at > org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:114) > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:88) at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55) at > org.apache.spark.scheduler.Task.run(Task.scala:123) at > org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408) > at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360) at > org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414) at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) Driver stacktrace: while executing > SQL: "select * from (select KYLIN_SALES.PART_DT , sum(KYLIN_SALES.PRICE ) > from KYLIN_SALES group by KYLIN_SALES.PART_DT union select > KYLIN_SALES.PART_DT , max(KYLIN_SALES.PRICE ) from KYLIN_SALES group by > KYLIN_SALES.PART_DT union select KYLIN_SALES.PART_DT , count(*) from > KYLIN_SALES group by KYLIN_SALES.PART_DT union select KYLIN_SALES.PART_DT , > count(distinct KYLIN_SALES.PRICE) from KYLIN_SALES group by > KYLIN_SALES.PART_DT) limit 501" -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (KYLIN-4892) Reduce the times of fetching files status from HDFS in FilePruner
Zhichao Zhang created KYLIN-4892: - Summary: Reduce the times of fetching files status from HDFS in FilePruner Key: KYLIN-4892 URL: https://issues.apache.org/jira/browse/KYLIN-4892 Project: Kylin Issue Type: Improvement Components: Query Engine Reporter: Zhichao Zhang Assignee: Zhichao Zhang Fix For: v4.0.0-GA Reduce the times of fetching files status from HDFS in FilePruner -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (KYLIN-4891) Set the default value of 'kylin.query.spark-engine.expose-sharding-trait' to false
Zhichao Zhang created KYLIN-4891: - Summary: Set the default value of 'kylin.query.spark-engine.expose-sharding-trait' to false Key: KYLIN-4891 URL: https://issues.apache.org/jira/browse/KYLIN-4891 Project: Kylin Issue Type: Improvement Components: Query Engine Reporter: Zhichao Zhang Assignee: Zhichao Zhang Fix For: v4.0.0-GA Set the default value of 'kylin.query.spark-engine.expose-sharding-trait' to false -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (KYLIN-4890) Use numSlices = 1 to reduce task num when executing sparder canary
Zhichao Zhang created KYLIN-4890: - Summary: Use numSlices = 1 to reduce task num when executing sparder canary Key: KYLIN-4890 URL: https://issues.apache.org/jira/browse/KYLIN-4890 Project: Kylin Issue Type: Improvement Components: Spark Engine Reporter: Zhichao Zhang Assignee: Zhichao Zhang Fix For: v4.0.0-GA Use numSlices = 1 to reduce task num when executing sparder canary -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (KYLIN-4877) Use all dimension columns as sort columns when saving cuboid data
Zhichao Zhang created KYLIN-4877: - Summary: Use all dimension columns as sort columns when saving cuboid data Key: KYLIN-4877 URL: https://issues.apache.org/jira/browse/KYLIN-4877 Project: Kylin Issue Type: Improvement Components: Spark Engine Affects Versions: v4.0.0-alpha Reporter: Zhichao Zhang Assignee: Zhichao Zhang Fix For: v4.0.0-beta Use all dimension columns as sort columns when saving cuboid data. This change will reduce the size of cuboid data (thanks to Parquet's RLE encoding). -- This message was sent by Atlassian Jira (v8.3.4#803005)
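The space saving in KYLIN-4877 comes from run-length encoding: Parquet's RLE and dictionary encodings compress long runs of identical values, and sorting the cuboid rows on the dimension columns maximizes run lengths. A minimal, Kylin-independent sketch of the effect (the sample column is made up):

```python
from itertools import groupby

def run_count(values):
    """Number of runs of equal adjacent values; fewer runs means better RLE compression."""
    return sum(1 for _ in groupby(values))

# A low-cardinality dimension column with 1000 values.
column = [3, 1, 3, 2, 1, 3, 2, 1, 3, 2] * 100
print(run_count(column))          # 1000 runs when unsorted (no adjacent repeats)
print(run_count(sorted(column)))  # 3 runs when sorted: all 1s, then 2s, then 3s
```

The same intuition carries over to multi-column sorts: leading sort columns compress best, which is why sorting on all dimension columns helps.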
[jira] [Created] (KYLIN-4875) Remove executor configurations when execute resource detect step (local mode)
Zhichao Zhang created KYLIN-4875: - Summary: Remove executor configurations when execute resource detect step (local mode) Key: KYLIN-4875 URL: https://issues.apache.org/jira/browse/KYLIN-4875 Project: Kylin Issue Type: Improvement Components: Spark Engine Affects Versions: v4.0.0-alpha Reporter: Zhichao Zhang Assignee: Zhichao Zhang Fix For: v4.0.0-beta Remove executor configurations when execute resource detect step (local mode) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (KYLIN-4872) Fix NPE when there are more than one segment if cube planner is open
[ https://issues.apache.org/jira/browse/KYLIN-4872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhichao Zhang resolved KYLIN-4872. --- Resolution: Fixed > Fix NPE when there are more than one segment if cube planner is open > > > Key: KYLIN-4872 > URL: https://issues.apache.org/jira/browse/KYLIN-4872 > Project: Kylin > Issue Type: Bug > Components: Spark Engine >Affects Versions: v4.0.0-alpha >Reporter: Zhichao Zhang >Assignee: Zhichao Zhang >Priority: Minor > Fix For: v4.0.0-beta > > > Fix NPE when there are more than one segment if cube planner is open -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (KYLIN-4872) Fix NPE when there are more than one segment if cube planner is open
Zhichao Zhang created KYLIN-4872: - Summary: Fix NPE when there are more than one segment if cube planner is open Key: KYLIN-4872 URL: https://issues.apache.org/jira/browse/KYLIN-4872 Project: Kylin Issue Type: Bug Components: Spark Engine Affects Versions: v4.0.0-alpha Reporter: Zhichao Zhang Assignee: Zhichao Zhang Fix For: v4.0.0-beta Fix NPE when there are more than one segment if cube planner is open -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (KYLIN-4864) Support building and testing Kylin on ARM64 architecture platform
[ https://issues.apache.org/jira/browse/KYLIN-4864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17262346#comment-17262346 ] Zhichao Zhang commented on KYLIN-4864: --- [~seanlau], sounds good. Have you already made Kylin support the ARM64 platform, or have you run into problems? You are welcome to raise a PR for this feature. > Support building and testing Kylin on ARM64 architecture platform > - > > Key: KYLIN-4864 > URL: https://issues.apache.org/jira/browse/KYLIN-4864 > Project: Kylin > Issue Type: Improvement >Reporter: liusheng >Priority: Major > > Currently, many software projects support running on the ARM64 platform. > We have also made many efforts toward making big-data projects support the ARM64 > platform. > For example, Hadoop has published ARM64-specific packages: > [https://www.apache.org/dyn/closer.cgi/hadoop/common/hadoop-3.3.0/hadoop-3.3.0-aarch64.tar.gz] > > and also has an ARM-specific CI job configured: > [https://ci-hadoop.apache.org/job/Hive-trunk-linux-ARM/] > It would be better to also enable ARM support and set up ARM CI for the Kylin > project -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (KYLIN-4849) Support sum(case when...), sum(2*price+1), count(column) and more for Kylin 4
Zhichao Zhang created KYLIN-4849: - Summary: Support sum(case when...), sum(2*price+1), count(column) and more for Kylin 4 Key: KYLIN-4849 URL: https://issues.apache.org/jira/browse/KYLIN-4849 Project: Kylin Issue Type: New Feature Components: Query Engine Affects Versions: v4.0.0-alpha Reporter: Zhichao Zhang Assignee: Zhichao Zhang Fix For: v4.0.0-beta Support sum(case when...), sum(2*price+1), count(column) and more. Please refer to KYLIN-3358. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (KYLIN-4846) Set the related query id to sparder job description
Zhichao Zhang created KYLIN-4846: - Summary: Set the related query id to sparder job description Key: KYLIN-4846 URL: https://issues.apache.org/jira/browse/KYLIN-4846 Project: Kylin Issue Type: New Feature Reporter: Zhichao Zhang Assignee: Zhichao Zhang Set the related query id to sparder job description -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (KYLIN-4828) Add more sql test cases into NBuildAndQueryTest
[ https://issues.apache.org/jira/browse/KYLIN-4828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhichao Zhang resolved KYLIN-4828. --- Resolution: Fixed > Add more sql test cases into NBuildAndQueryTest > --- > > Key: KYLIN-4828 > URL: https://issues.apache.org/jira/browse/KYLIN-4828 > Project: Kylin > Issue Type: Improvement > Components: Query Engine >Reporter: Zhichao Zhang >Assignee: Zhichao Zhang >Priority: Minor > Fix For: v4.0.0-beta > > > Add more sql test cases into NBuildAndQueryTest -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Reopened] (KYLIN-4828) Add more sql test cases into NBuildAndQueryTest
[ https://issues.apache.org/jira/browse/KYLIN-4828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhichao Zhang reopened KYLIN-4828: --- There are still some unsupported SQL statements left. > Add more sql test cases into NBuildAndQueryTest > --- > > Key: KYLIN-4828 > URL: https://issues.apache.org/jira/browse/KYLIN-4828 > Project: Kylin > Issue Type: Improvement > Components: Query Engine >Reporter: Zhichao Zhang >Assignee: Zhichao Zhang >Priority: Minor > Fix For: v4.0.0-beta > > > Add more sql test cases into NBuildAndQueryTest -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (KYLIN-4844) Add lookup table duplicate key check when building job
[ https://issues.apache.org/jira/browse/KYLIN-4844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhichao Zhang resolved KYLIN-4844. --- Resolution: Fixed Please see : [https://github.com/apache/kylin/pull/1514] > Add lookup table duplicate key check when building job > -- > > Key: KYLIN-4844 > URL: https://issues.apache.org/jira/browse/KYLIN-4844 > Project: Kylin > Issue Type: Improvement >Affects Versions: v4.0.0-alpha >Reporter: Yaqian Zhang >Assignee: Yaqian Zhang >Priority: Major > Fix For: v4.0.0-beta > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (KYLIN-4817) Refine Cube Migration Tool for Kylin4
[ https://issues.apache.org/jira/browse/KYLIN-4817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhichao Zhang resolved KYLIN-4817. --- Resolution: Fixed > Refine Cube Migration Tool for Kylin4 > - > > Key: KYLIN-4817 > URL: https://issues.apache.org/jira/browse/KYLIN-4817 > Project: Kylin > Issue Type: Improvement > Components: Client - CLI >Reporter: Xiaoxiang Yu >Assignee: Yaqian Zhang >Priority: Major > Fix For: v4.0.0-beta > > > - Collect and analyse all cube migration tools used in the current Kylin. > - Verify whether they work in Kylin 4; if not, make them work. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (KYLIN-4813) Refine spark logger for Kylin 4 build engine
[ https://issues.apache.org/jira/browse/KYLIN-4813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhichao Zhang resolved KYLIN-4813. --- Resolution: Fixed > Refine spark logger for Kylin 4 build engine > > > Key: KYLIN-4813 > URL: https://issues.apache.org/jira/browse/KYLIN-4813 > Project: Kylin > Issue Type: Improvement >Affects Versions: v4.0.0-alpha >Reporter: Xiaoxiang Yu >Assignee: Yaqian Zhang >Priority: Major > Fix For: v4.0.0-beta > > > - Separate the Spark log from the Kylin log. > - Store driver/executor logs in HDFS. > - Provide an API to view the driver log. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (KYLIN-4820) Can not auto set spark resources configurations when building cube
[ https://issues.apache.org/jira/browse/KYLIN-4820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhichao Zhang resolved KYLIN-4820. --- Resolution: Fixed > Can not auto set spark resources configurations when building cube > -- > > Key: KYLIN-4820 > URL: https://issues.apache.org/jira/browse/KYLIN-4820 > Project: Kylin > Issue Type: Bug > Components: Spark Engine >Affects Versions: v4.0.0-alpha >Reporter: Zhichao Zhang >Assignee: Zhichao Zhang >Priority: Minor > Fix For: v4.0.0-beta > > > Currently there are some spark resources configurations set in the > kylin-default.properties, so these configurations will override the ones set > by Kylin automatically. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (KYLIN-4822) The metrics 'Total spark scan time' of query log is negative in some cases
[ https://issues.apache.org/jira/browse/KYLIN-4822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhichao Zhang resolved KYLIN-4822. --- Resolution: Fixed > The metrics 'Total spark scan time' of query log is negative in some cases > -- > > Key: KYLIN-4822 > URL: https://issues.apache.org/jira/browse/KYLIN-4822 > Project: Kylin > Issue Type: Bug > Components: Query Engine >Affects Versions: v4.0.0-alpha >Reporter: Zhichao Zhang >Assignee: Zhichao Zhang >Priority: Minor > Fix For: v4.0.0-beta > > > The metrics 'Total spark scan time' of query log is negative in some cases. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (KYLIN-4824) The metric 'Total scan bytes' of 'Query Log' is always 0 when querying
[ https://issues.apache.org/jira/browse/KYLIN-4824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhichao Zhang resolved KYLIN-4824. --- Resolution: Fixed > The metric 'Total scan bytes' of 'Query Log' is always 0 when querying > -- > > Key: KYLIN-4824 > URL: https://issues.apache.org/jira/browse/KYLIN-4824 > Project: Kylin > Issue Type: Bug > Components: Query Engine >Affects Versions: v4.0.0-alpha >Reporter: Zhichao Zhang >Assignee: Zhichao Zhang >Priority: Minor > Fix For: v4.0.0-beta > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (KYLIN-4829) Support to use thread-level SparkSession to execute query
[ https://issues.apache.org/jira/browse/KYLIN-4829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhichao Zhang resolved KYLIN-4829. --- Resolution: Fixed > Support to use thread-level SparkSession to execute query > -- > > Key: KYLIN-4829 > URL: https://issues.apache.org/jira/browse/KYLIN-4829 > Project: Kylin > Issue Type: Improvement > Components: Query Engine, Spark Engine >Reporter: Zhichao Zhang >Assignee: Zhichao Zhang >Priority: Minor > Fix For: v4.0.0-beta > > > Currently, when executing a query, it is impossible to configure proper > parameters for each query according to the data that will be scanned, such as > spark.sql.shuffle.partitions; this impacts query performance. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (KYLIN-4843) Support INTERSECT_COUNT/INTERSECT_VALUE function for Kylin 4
[ https://issues.apache.org/jira/browse/KYLIN-4843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhichao Zhang resolved KYLIN-4843. --- Resolution: Fixed > Support INTERSECT_COUNT/INTERSECT_VALUE function for Kylin 4 > > > Key: KYLIN-4843 > URL: https://issues.apache.org/jira/browse/KYLIN-4843 > Project: Kylin > Issue Type: New Feature > Components: Query Engine >Affects Versions: v4.0.0-alpha >Reporter: Zhichao Zhang >Assignee: Zhichao Zhang >Priority: Minor > Fix For: v4.0.0-beta > > > Support INTERSECT_COUNT/INTERSECT_VALUE function for Kylin 4 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (KYLIN-4843) Support INTERSECT_COUNT/INTERSECT_VALUE function for Kylin 4
[ https://issues.apache.org/jira/browse/KYLIN-4843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhichao Zhang updated KYLIN-4843: -- Description: Support INTERSECT_COUNT/INTERSECT_VALUE function for Kylin 4 (was: Support INTERSECT_VALUE function for Kylin 4) > Support INTERSECT_COUNT/INTERSECT_VALUE function for Kylin 4 > > > Key: KYLIN-4843 > URL: https://issues.apache.org/jira/browse/KYLIN-4843 > Project: Kylin > Issue Type: New Feature > Components: Query Engine >Affects Versions: v4.0.0-alpha >Reporter: Zhichao Zhang >Assignee: Zhichao Zhang >Priority: Minor > Fix For: v4.0.0-beta > > > Support INTERSECT_COUNT/INTERSECT_VALUE function for Kylin 4 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (KYLIN-4843) Support INTERSECT_COUNT/INTERSECT_VALUE function for Kylin 4
[ https://issues.apache.org/jira/browse/KYLIN-4843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhichao Zhang updated KYLIN-4843: -- Summary: Support INTERSECT_COUNT/INTERSECT_VALUE function for Kylin 4 (was: Support INTERSECT_VALUE function for Kylin 4) > Support INTERSECT_COUNT/INTERSECT_VALUE function for Kylin 4 > > > Key: KYLIN-4843 > URL: https://issues.apache.org/jira/browse/KYLIN-4843 > Project: Kylin > Issue Type: New Feature > Components: Query Engine >Affects Versions: v4.0.0-alpha >Reporter: Zhichao Zhang >Assignee: Zhichao Zhang >Priority: Minor > Fix For: v4.0.0-beta > > > Support INTERSECT_VALUE function for Kylin 4 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (KYLIN-4843) Support INTERSECT_VALUE function for Kylin 4
Zhichao Zhang created KYLIN-4843: - Summary: Support INTERSECT_VALUE function for Kylin 4 Key: KYLIN-4843 URL: https://issues.apache.org/jira/browse/KYLIN-4843 Project: Kylin Issue Type: New Feature Components: Query Engine Affects Versions: v4.0.0-alpha Reporter: Zhichao Zhang Assignee: Zhichao Zhang Fix For: v4.0.0-beta Support INTERSECT_VALUE function for Kylin 4 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (KYLIN-4842) Supports grouping sets function for Kylin 4
Zhichao Zhang created KYLIN-4842: - Summary: Supports grouping sets function for Kylin 4 Key: KYLIN-4842 URL: https://issues.apache.org/jira/browse/KYLIN-4842 Project: Kylin Issue Type: New Feature Components: Query Engine Affects Versions: v4.0.0-alpha Reporter: Zhichao Zhang Assignee: Zhichao Zhang Fix For: v4.0.0-beta Currently Kylin 4 cannot support the grouping sets function, because it doesn't transform the Calcite grouping sets node into Spark GroupingSets. -- This message was sent by Atlassian Jira (v8.3.4#803005)
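Semantically, GROUP BY GROUPING SETS is equivalent to a union of plain GROUP BY aggregations, one per grouping set, with non-grouped columns returned as NULL. The translation Kylin needs from the Calcite node can be sketched in plain Python (illustrative only; the table and column names are made up):

```python
from collections import defaultdict

def grouping_sets(rows, sets, value_key):
    """Evaluate GROUP BY GROUPING SETS as a union of plain GROUP BY SUM() aggregations.
    Columns absent from a grouping set appear as None (standing in for SQL NULL)."""
    all_cols = sorted({c for s in sets for c in s})
    results = []
    for gset in sets:
        sums = defaultdict(float)
        for row in rows:
            key = tuple(row[c] if c in gset else None for c in all_cols)
            sums[key] += row[value_key]
        results.extend(sums.items())
    return results

rows = [
    {"dt": "2012-01-01", "site": 0, "price": 10.0},
    {"dt": "2012-01-01", "site": 1, "price": 20.0},
    {"dt": "2012-01-02", "site": 0, "price": 5.0},
]
# GROUPING SETS ((dt), (site)): per-date totals, then per-site totals.
for key, total in grouping_sets(rows, [{"dt"}, {"site"}], "price"):
    print(key, total)
```

Spark SQL expresses the same thing natively with `GROUP BY ... GROUPING SETS (...)`, which is the target the Calcite plan would need to be rewritten into.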
[jira] [Updated] (KYLIN-4840) When pushdown is enabled, execute sql which includes subquery will be pushdowned
[ https://issues.apache.org/jira/browse/KYLIN-4840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhichao Zhang updated KYLIN-4840: -- Description: When pushdown is enabled, execute sql which includes subquery will be pushdowned. For example: {code:java} SELECT t1.week_beg_dt, t1.sum_price, t1.lstg_site_id FROM ( select KYLIN_CAL_DT.week_beg_dt, sum(price) as sum_price, lstg_site_id from KYLIN_SALES inner JOIN KYLIN_CAL_DT as KYLIN_CAL_DT ON KYLIN_SALES.PART_DT = KYLIN_CAL_DT.cal_dt inner JOIN kylin_category_groupings ON KYLIN_SALES.leaf_categ_id = kylin_category_groupings.leaf_categ_id AND KYLIN_SALES.lstg_site_id = kylin_category_groupings.site_id group by KYLIN_CAL_DT.week_beg_dt, lstg_site_id ) t1 inner JOIN KYLIN_CAL_DT as KYLIN_CAL_DT on t1.week_beg_dt = KYLIN_CAL_DT.week_beg_dt{code} was: When pushdown is enabled, execute sql which includes subquery will be pushdowned. For example: {code:java} SELECT t1.week_beg_dt, t1.sum_price, t1.lstg_site_id FROM ( select KYLIN_CAL_DT.week_beg_dt, sum(price) as sum_price, lstg_site_id from KYLIN_SALES inner JOIN KYLIN_CAL_DT as KYLIN_CAL_DT ON KYLIN_SALES.PART_DT = KYLIN_CAL_DT.cal_dt inner JOIN kylin_category_groupings ON KYLIN_SALES.leaf_categ_id = kylin_category_groupings.leaf_categ_id AND KYLIN_SALES.lstg_site_id = kylin_category_groupings.site_id group by KYLIN_CAL_DT.week_beg_dt, lstg_site_id ) t1 inner JOIN KYLIN_CAL_DT as KYLIN_CAL_DT on t1.week_beg_dt = KYLIN_CAL_DT.week_beg_dt{code} > When pushdown is enabled, execute sql which includes subquery will be > pushdowned > > > Key: KYLIN-4840 > URL: https://issues.apache.org/jira/browse/KYLIN-4840 > Project: Kylin > Issue Type: Bug > Components: Query Engine >Affects Versions: v4.0.0-alpha, v3.1.1 >Reporter: Zhichao Zhang >Assignee: Zhichao Zhang >Priority: Minor > Fix For: v4.0.0-beta, v3.1.2 > > > When pushdown is enabled, execute sql which includes subquery will be > pushdowned. 
> > For example: > {code:java} > SELECT t1.week_beg_dt, t1.sum_price, t1.lstg_site_id > FROM ( > select KYLIN_CAL_DT.week_beg_dt, sum(price) as sum_price, lstg_site_id > from KYLIN_SALES > inner JOIN KYLIN_CAL_DT as KYLIN_CAL_DT > ON KYLIN_SALES.PART_DT = KYLIN_CAL_DT.cal_dt > inner JOIN kylin_category_groupings > ON KYLIN_SALES.leaf_categ_id = kylin_category_groupings.leaf_categ_id AND > KYLIN_SALES.lstg_site_id = kylin_category_groupings.site_id > group by KYLIN_CAL_DT.week_beg_dt, lstg_site_id > ) t1 > inner JOIN KYLIN_CAL_DT as KYLIN_CAL_DT > on t1.week_beg_dt = KYLIN_CAL_DT.week_beg_dt{code} > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (KYLIN-4841) Spark RDD cache is invalid when building with spark engine
Zhichao Zhang created KYLIN-4841: - Summary: Spark RDD cache is invalid when building with spark engine Key: KYLIN-4841 URL: https://issues.apache.org/jira/browse/KYLIN-4841 Project: Kylin Issue Type: Bug Components: Spark Engine Affects Versions: v3.1.1 Reporter: Zhichao Zhang Assignee: Zhichao Zhang Fix For: v3.1.2 Spark RDD cache is invalid when building with spark engine -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (KYLIN-4840) When pushdown is enabled, execute sql which includes subquery will be pushdowned
Zhichao Zhang created KYLIN-4840: - Summary: When pushdown is enabled, execute sql which includes subquery will be pushdowned Key: KYLIN-4840 URL: https://issues.apache.org/jira/browse/KYLIN-4840 Project: Kylin Issue Type: Bug Components: Query Engine Affects Versions: v3.1.1, v4.0.0-alpha Reporter: Zhichao Zhang Assignee: Zhichao Zhang Fix For: v4.0.0-beta, v3.1.2 When pushdown is enabled, execute sql which includes subquery will be pushdowned. For example: {code:java} SELECT t1.week_beg_dt, t1.sum_price, t1.lstg_site_id FROM ( select KYLIN_CAL_DT.week_beg_dt, sum(price) as sum_price, lstg_site_id from KYLIN_SALES inner JOIN KYLIN_CAL_DT as KYLIN_CAL_DT ON KYLIN_SALES.PART_DT = KYLIN_CAL_DT.cal_dt inner JOIN kylin_category_groupings ON KYLIN_SALES.leaf_categ_id = kylin_category_groupings.leaf_categ_id AND KYLIN_SALES.lstg_site_id = kylin_category_groupings.site_id group by KYLIN_CAL_DT.week_beg_dt, lstg_site_id ) t1 inner JOIN KYLIN_CAL_DT as KYLIN_CAL_DT on t1.week_beg_dt = KYLIN_CAL_DT.week_beg_dt{code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (KYLIN-4828) Add more sql test cases into NBuildAndQueryTest
[ https://issues.apache.org/jira/browse/KYLIN-4828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhichao Zhang resolved KYLIN-4828. --- Resolution: Fixed Done > Add more sql test cases into NBuildAndQueryTest > --- > > Key: KYLIN-4828 > URL: https://issues.apache.org/jira/browse/KYLIN-4828 > Project: Kylin > Issue Type: Improvement > Components: Query Engine >Reporter: Zhichao Zhang >Assignee: Zhichao Zhang >Priority: Minor > Fix For: v4.0.0-beta > > > Add more sql test cases into NBuildAndQueryTest -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (KYLIN-4829) Support to use thread-level SparkSession to execute query
Zhichao Zhang created KYLIN-4829: - Summary: Support to use thread-level SparkSession to execute query Key: KYLIN-4829 URL: https://issues.apache.org/jira/browse/KYLIN-4829 Project: Kylin Issue Type: Improvement Components: Query Engine, Spark Engine Reporter: Zhichao Zhang Assignee: Zhichao Zhang Fix For: v4.0.0-beta Currently, when executing a query, it is impossible to configure proper parameters for each query according to the data will be scanned, such as spark.sql.shuffle.partitions, this will impact the performance of querying. -- This message was sent by Atlassian Jira (v8.3.4#803005)
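What makes this feasible in Spark is `SparkSession.newSession()`, which shares the underlying SparkContext but keeps an isolated SQL configuration per session. The threading pattern can be sketched without Spark using stand-in classes (the 128 MiB-per-partition sizing rule below is an assumption for illustration, not Kylin's actual heuristic):

```python
import threading

class Session:
    """Stand-in for a SparkSession clone: shared engine, isolated per-session conf."""
    def __init__(self, shared_engine):
        self.engine = shared_engine
        self.conf = {}

_engine = object()          # stand-in for the shared SparkContext
_local = threading.local()  # one session per query thread

def get_session():
    if not hasattr(_local, "session"):
        _local.session = Session(_engine)
    return _local.session

results = {}

def run_query(name, scanned_bytes):
    s = get_session()
    # Tune shuffle partitions per query from the data it will scan (~128 MiB each).
    s.conf["spark.sql.shuffle.partitions"] = max(1, scanned_bytes // (128 << 20))
    results[name] = s.conf["spark.sql.shuffle.partitions"]

t1 = threading.Thread(target=run_query, args=("big_query", 10 << 30))   # 10 GiB scan
t2 = threading.Thread(target=run_query, args=("small_query", 64 << 20)) # 64 MiB scan
t1.start(); t2.start(); t1.join(); t2.join()
print(results["big_query"], results["small_query"])  # 80 1
```

Because each thread mutates only its own session's conf, the big query's 80 shuffle partitions never leak into the small query's settings.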
[jira] [Created] (KYLIN-4828) Add more sql test cases into NBuildAndQueryTest
Zhichao Zhang created KYLIN-4828: - Summary: Add more sql test cases into NBuildAndQueryTest Key: KYLIN-4828 URL: https://issues.apache.org/jira/browse/KYLIN-4828 Project: Kylin Issue Type: Improvement Components: Query Engine Reporter: Zhichao Zhang Assignee: Zhichao Zhang Fix For: v4.0.0-beta Add more sql test cases into NBuildAndQueryTest -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (KYLIN-4824) The metric 'Total scan bytes' of 'Query Log' is always 0 when querying
Zhichao Zhang created KYLIN-4824: - Summary: The metric 'Total scan bytes' of 'Query Log' is always 0 when querying Key: KYLIN-4824 URL: https://issues.apache.org/jira/browse/KYLIN-4824 Project: Kylin Issue Type: Bug Components: Query Engine Affects Versions: v4.0.0-alpha Reporter: Zhichao Zhang Assignee: Zhichao Zhang Fix For: v4.0.0-beta -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (KYLIN-4738) The order in the returned result is wrong when using a window function to query in Kylin
[ https://issues.apache.org/jira/browse/KYLIN-4738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhichao Zhang resolved KYLIN-4738. --- Fix Version/s: v4.0.0-beta Resolution: Won't Fix No need to fix. > The order in the returned result is wrong when using a window function to query > in Kylin > > > Key: KYLIN-4738 > URL: https://issues.apache.org/jira/browse/KYLIN-4738 > Project: Kylin > Issue Type: Bug > Components: Query Engine >Affects Versions: v4.0.0-alpha >Reporter: Zhichao Zhang >Assignee: Zhichao Zhang >Priority: Minor > Fix For: v4.0.0-beta > > Attachments: image-2020-09-02-12-15-27-097.png, > image-2020-09-02-12-16-47-699.png > > > Use the SQL below to query in Kylin: > {code:java} > SELECT PART_DT, LSTG_FORMAT_NAME, SUM(PRICE) AS GMV, FIRST_VALUE(SUM(PRICE)) > OVER(PARTITION BY LSTG_FORMAT_NAME ORDER BY PART_DT) AS > "FIRST", LAST_VALUE(SUM(PRICE)) OVER(PARTITION BY LSTG_FORMAT_NAME ORDER BY > PART_DT) AS "CURRENT", LAG(SUM(PRICE), 1, 0.0) OVER(PARTITION BY > LSTG_FORMAT_NAME ORDER BY PART_DT) AS "PREV", LEAD(SUM(PRICE), 1, 0.0) > OVER(PARTITION BY LSTG_FORMAT_NAME ORDER BY PART_DT) AS "NEXT", NTILE(4) OVER > (PARTITION BY LSTG_FORMAT_NAME ORDER BY PART_DT) AS "QUARTER" FROM > KYLIN_SALES INNER JOIN KYLIN_ACCOUNT as SELLER_ACCOUNT ON KYLIN_SALES.SELLER_ID > = SELLER_ACCOUNT.ACCOUNT_ID INNER JOIN KYLIN_COUNTRY as SELLER_COUNTRY ON > SELLER_ACCOUNT.ACCOUNT_COUNTRY = SELLER_COUNTRY.COUNTRY WHERE PART_DT >= > '2012-12-30' and PART_DT < '2013-01-03' and SELLER_COUNTRY.COUNTRY in > ('CN') GROUP BY PART_DT, LSTG_FORMAT_NAME ORDER BY PART_DT LIMIT 5; > {code} > the result is: > !image-2020-09-02-12-15-27-097.png! > > and the expected result is: > !image-2020-09-02-12-16-47-699.png! > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (KYLIN-4822) The metric 'Total spark scan time' in the query log is negative in some cases
Zhichao Zhang created KYLIN-4822: - Summary: The metric 'Total spark scan time' in the query log is negative in some cases Key: KYLIN-4822 URL: https://issues.apache.org/jira/browse/KYLIN-4822 Project: Kylin Issue Type: Bug Components: Query Engine Affects Versions: v4.0.0-alpha Reporter: Zhichao Zhang Assignee: Zhichao Zhang Fix For: v4.0.0-beta The metric 'Total spark scan time' in the query log is negative in some cases. -- This message was sent by Atlassian Jira (v8.3.4#803005)
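The Jira above does not state the cause, but one common way such a duration metric goes negative is computing it from wall-clock timestamps taken on machines with skewed clocks. The sketch below (generic Python, not Kylin's code) shows the failure mode and two usual remedies, clamping and a monotonic clock:

```python
import time

def elapsed_wall(start, end):
    # Wall-clock difference: can go negative if the clock steps backwards
    # (NTP adjustment) or the two timestamps come from machines with skew.
    return end - start

def elapsed_clamped(start, end):
    # Clamp so an aggregated metric can never be negative.
    return max(0.0, end - start)

# Simulate 30 ms of real work measured against a clock 50 ms behind.
start = 1_000_000.000
end = start + 0.030 - 0.050

assert elapsed_wall(start, end) < 0        # the buggy, negative metric
assert elapsed_clamped(start, end) == 0.0  # clamped to zero

# Within one process, a monotonic clock never runs backwards.
t0 = time.monotonic()
t1 = time.monotonic()
assert t1 - t0 >= 0
```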
[jira] [Created] (KYLIN-4820) Cannot automatically set Spark resource configurations when building a cube
Zhichao Zhang created KYLIN-4820: - Summary: Cannot automatically set Spark resource configurations when building a cube Key: KYLIN-4820 URL: https://issues.apache.org/jira/browse/KYLIN-4820 Project: Kylin Issue Type: Bug Components: Spark Engine Affects Versions: v4.0.0-alpha Reporter: Zhichao Zhang Assignee: Zhichao Zhang Fix For: v4.0.0-beta Currently, some Spark resource configurations are set in kylin-default.properties, so they override the ones that Kylin would set automatically. -- This message was sent by Atlassian Jira (v8.3.4#803005)
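As an illustration, hard-coded resource settings of the following kind in kylin-default.properties would take precedence over Kylin's automatic tuning (the exact properties present in that file are an assumption here; the `kylin.engine.spark-conf.*` prefix is Kylin's standard way of passing Spark configuration):

```properties
# If these are pinned in kylin-default.properties, the build engine cannot
# auto-size executors for the job; leaving them unset restores auto-tuning.
# kylin.engine.spark-conf.spark.executor.instances=2
# kylin.engine.spark-conf.spark.executor.memory=4G
# kylin.engine.spark-conf.spark.executor.cores=2
```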
[jira] [Commented] (KYLIN-4812) Create Dimension Dictionary With Spark Failed
[ https://issues.apache.org/jira/browse/KYLIN-4812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17231576#comment-17231576 ] Zhichao Zhang commented on KYLIN-4812: --- The Spark distribution that Kylin uses does not support connecting to Hive 3.1.1; consider setting 'kylin.engine.spark-dimension-dictionary' and 'kylin.engine.spark-udc-dictionary' to false. > Create Dimension Dictionary With Spark Failed > - > > Key: KYLIN-4812 > URL: https://issues.apache.org/jira/browse/KYLIN-4812 > Project: Kylin > Issue Type: Bug > Components: Spark Engine >Affects Versions: v3.1.1 >Reporter: vincent zeng >Priority: Major > > Hi, team. When set `kylin.engine.spark-dimension-dictionary=true`, step > `Build Dimension Dictionary with Spark` failed. > Error Log: > {code:java} > Driver stacktrace: > at > org.apache.kylin.common.util.AbstractApplication.execute(AbstractApplication.java:42) > at org.apache.kylin.common.util.SparkEntry.main(SparkEntry.java:44) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:684) > Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: > Task 2 in stage 0.0 failed 4 times, most recent failure: Lost task 2.3 in > stage 0.0 (TID 8, emr-worker-2.cluster-46685, executor 3): > java.lang.NullPointerException > at org.apache.kylin.common.KylinConfig.getManager(KylinConfig.java:462) > at org.apache.kylin.cube.CubeManager.getInstance(CubeManager.java:106) > at > org.apache.kylin.engine.spark.SparkBuildDictionary$DimensionDictsBuildFunction.init(SparkBuildDictionary.java:246) > at > org.apache.kylin.engine.spark.SparkBuildDictionary$DimensionDictsBuildFunction.call(SparkBuildDictionary.java:257) 
> at > org.apache.kylin.engine.spark.SparkBuildDictionary$DimensionDictsBuildFunction.call(SparkBuildDictionary.java:219) > at > org.apache.spark.api.java.JavaPairRDD$$anonfun$pairFunToScalaFun$1.apply(JavaPairRDD.scala:1043) > at > org.apache.spark.api.java.JavaPairRDD$$anonfun$pairFunToScalaFun$1.apply(JavaPairRDD.scala:1043) > at scala.collection.Iterator$$anon$11.next(Iterator.scala:410) > at scala.collection.Iterator$class.foreach(Iterator.scala:891) > at scala.collection.AbstractIterator.foreach(Iterator.scala:1334) > at > scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59) > at > scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:104) > at > scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:48) > at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:310) > at scala.collection.AbstractIterator.to(Iterator.scala:1334) > at > scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:302) > at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1334) > at > scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:289) > at scala.collection.AbstractIterator.toArray(Iterator.scala:1334) > at > org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$13.apply(RDD.scala:945) > at > org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$13.apply(RDD.scala:945) > at > org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2101) > at > org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2101) > at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90) > at org.apache.spark.scheduler.Task.run(Task.scala:121) > at > org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408) > at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) 
> at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
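The workaround from the comment above can be put in kylin.properties; the property names are taken from the comment itself (the effect, falling back to the non-Spark dictionary builder, is the expected behavior rather than something verified here):

```properties
# Skip building dimension/UHC dictionaries with Spark when the bundled Spark
# cannot connect to the Hive 3.1.1 metastore.
kylin.engine.spark-dimension-dictionary=false
kylin.engine.spark-udc-dictionary=false
```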
[jira] [Resolved] (KYLIN-4737) The precision in the returned result is different from the one by Spark SQL
[ https://issues.apache.org/jira/browse/KYLIN-4737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhichao Zhang resolved KYLIN-4737. --- Fix Version/s: v4.0.0-beta Resolution: Won't Fix The root cause of this issue is that the algorithm used to calculate the 'percentile' values in Kylin 4.0 is different from the one used by Spark SQL, so the results differ slightly. > The precision in the returned result is different from the one by Spark SQL > --- > > Key: KYLIN-4737 > URL: https://issues.apache.org/jira/browse/KYLIN-4737 > Project: Kylin > Issue Type: Bug > Components: Query Engine >Affects Versions: v4.0.0-alpha >Reporter: Zhichao Zhang >Assignee: Zhichao Zhang >Priority: Minor > Fix For: v4.0.0-beta > > Attachments: image-2020-09-02-12-07-18-076.png, > image-2020-09-02-12-07-49-492.png > > > The precision in the returned result is different from the one by Spark SQL, > for example: > the result from Kylin: > !image-2020-09-02-12-07-18-076.png! > the result from Spark SQL: > !image-2020-09-02-12-07-49-492.png! -- This message was sent by Atlassian Jira (v8.3.4#803005)
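This kind of discrepancy is easy to reproduce outside Kylin: two perfectly valid quantile algorithms give slightly different answers on the same data. The stdlib-Python sketch below (unrelated to Kylin's or Spark SQL's actual percentile implementations) contrasts two standard interpolation methods:

```python
import statistics

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

# Two standard quartile algorithms over the same data:
inclusive = statistics.quantiles(data, n=4, method="inclusive")
exclusive = statistics.quantiles(data, n=4, method="exclusive")

# Both are "correct", yet the outer quartiles differ slightly,
# while the median happens to agree.
assert inclusive == [3.25, 5.5, 7.75]
assert exclusive == [2.75, 5.5, 8.25]
assert inclusive[1] == exclusive[1]
assert inclusive[0] != exclusive[0]
```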
[jira] [Commented] (KYLIN-4812) Create Dimension Dictionary With Spark Failed
[ https://issues.apache.org/jira/browse/KYLIN-4812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17230397#comment-17230397 ] Zhichao Zhang commented on KYLIN-4812: --- [~vincentzeng] which versions of Hadoop and Hive did you use? > Create Dimension Dictionary With Spark Failed > - > > Key: KYLIN-4812 > URL: https://issues.apache.org/jira/browse/KYLIN-4812 > Project: Kylin > Issue Type: Bug > Components: Spark Engine >Affects Versions: v3.1.1 >Reporter: vincent zeng >Priority: Major > > Hi, team. When set `kylin.engine.spark-dimension-dictionary=true`, step > `Build Dimension Dictionary with Spark` failed. > Error Log: > {code:java} > Driver stacktrace: > at > org.apache.kylin.common.util.AbstractApplication.execute(AbstractApplication.java:42) > at org.apache.kylin.common.util.SparkEntry.main(SparkEntry.java:44) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:684) > Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: > Task 2 in stage 0.0 failed 4 times, most recent failure: Lost task 2.3 in > stage 0.0 (TID 8, emr-worker-2.cluster-46685, executor 3): > java.lang.NullPointerException > at org.apache.kylin.common.KylinConfig.getManager(KylinConfig.java:462) > at org.apache.kylin.cube.CubeManager.getInstance(CubeManager.java:106) > at > org.apache.kylin.engine.spark.SparkBuildDictionary$DimensionDictsBuildFunction.init(SparkBuildDictionary.java:246) > at > org.apache.kylin.engine.spark.SparkBuildDictionary$DimensionDictsBuildFunction.call(SparkBuildDictionary.java:257) > at > org.apache.kylin.engine.spark.SparkBuildDictionary$DimensionDictsBuildFunction.call(SparkBuildDictionary.java:219) 
> at > org.apache.spark.api.java.JavaPairRDD$$anonfun$pairFunToScalaFun$1.apply(JavaPairRDD.scala:1043) > at > org.apache.spark.api.java.JavaPairRDD$$anonfun$pairFunToScalaFun$1.apply(JavaPairRDD.scala:1043) > at scala.collection.Iterator$$anon$11.next(Iterator.scala:410) > at scala.collection.Iterator$class.foreach(Iterator.scala:891) > at scala.collection.AbstractIterator.foreach(Iterator.scala:1334) > at > scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59) > at > scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:104) > at > scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:48) > at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:310) > at scala.collection.AbstractIterator.to(Iterator.scala:1334) > at > scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:302) > at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1334) > at > scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:289) > at scala.collection.AbstractIterator.toArray(Iterator.scala:1334) > at > org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$13.apply(RDD.scala:945) > at > org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$13.apply(RDD.scala:945) > at > org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2101) > at > org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2101) > at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90) > at org.apache.spark.scheduler.Task.run(Task.scala:121) > at > org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408) > at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at 
java.lang.Thread.run(Thread.java:748) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (KYLIN-4811) Support cube level configuration for BuildingJob
[ https://issues.apache.org/jira/browse/KYLIN-4811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhichao Zhang resolved KYLIN-4811. --- Resolution: Fixed Done > Support cube level configuration for BuildingJob > > > Key: KYLIN-4811 > URL: https://issues.apache.org/jira/browse/KYLIN-4811 > Project: Kylin > Issue Type: Improvement > Components: Job Engine >Affects Versions: v4.0.0-alpha >Reporter: Xiaoxiang Yu >Assignee: Xiaoxiang Yu >Priority: Major > Fix For: v4.0.0-beta > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (KYLIN-4811) Support cube level configuration for BuildingJob
[ https://issues.apache.org/jira/browse/KYLIN-4811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhichao Zhang reassigned KYLIN-4811: - Assignee: Xiaoxiang Yu > Support cube level configuration for BuildingJob > > > Key: KYLIN-4811 > URL: https://issues.apache.org/jira/browse/KYLIN-4811 > Project: Kylin > Issue Type: Improvement > Components: Job Engine >Affects Versions: v4.0.0-alpha >Reporter: Xiaoxiang Yu >Assignee: Xiaoxiang Yu >Priority: Major > Fix For: v4.0.0-beta > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (KYLIN-4762) Optimize join where there is the same shardby partition num on join key
[ https://issues.apache.org/jira/browse/KYLIN-4762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17222000#comment-17222000 ] Zhichao Zhang commented on KYLIN-4762: --- [PR1463|https://github.com/apache/kylin/pull/1463] only implemented this optimization on the Kylin side; some changes on the Spark side still need to be implemented. Currently, on the Spark side, 'F__KYLIN_SALES_PART_DTXXX' is used as the aggregation key, while FileScan uses '17#0', which equals the shard-by column, as the partition key. Although 'F__KYLIN_SALES_PART_DTXXX' is an alias of '17#0', the two are not 'semanticEquals', so Spark's EnsureRequirements still adds an exchange operator. I will raise another PR to solve this. > Optimize join where there is the same shardby partition num on join key > --- > > Key: KYLIN-4762 > URL: https://issues.apache.org/jira/browse/KYLIN-4762 > Project: Kylin > Issue Type: Improvement > Components: Query Engine >Affects Versions: v4.0.0-beta >Reporter: Zhichao Zhang >Assignee: Zhichao Zhang >Priority: Minor > Attachments: shardby_join.png > > > Optimize join by reducing shuffle when there is the same shard by partition > number on join key. > When execute this sql, > {code:java} > select m.seller_id, m.part_dt, sum(m.price) as s > from kylin_sales m > left join ( > select m1.part_dt as pd, count(distinct m1.SELLER_ID) as m1, count(1) as m2 > > from kylin_sales m1 > where m1.part_dt = '2012-01-05' > group by m1.part_dt > ) j > on m.part_dt = j.pd > where m.lstg_format_name = 'FP-GTC' > and m.part_dt = '2012-01-05' > group by m.seller_id, m.part_dt limit 100; > {code} > the execution plan is shown below: > !shardby_join.png! > But since the join key part_dt has the same shard-by partition number, the join can be > optimized to reduce shuffle, similar to a bucket join. -- This message was sent by Atlassian Jira (v8.3.4#803005)
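The "alias is not semanticEquals to its attribute" problem described in the comment can be sketched with a toy expression model (all names hypothetical; this is not Catalyst's actual API): comparing nodes directly makes the planner miss that the aggregation key and the scan's partition key are the same column, while comparing the underlying expression ids recovers it.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Attribute:
    name: str
    expr_id: int  # stable identity, analogous to Catalyst's ExprId

@dataclass(frozen=True)
class Alias:
    child: Attribute
    name: str
    expr_id: int

def semantic_equals_naive(a, b):
    # Comparing the nodes themselves: an Alias never matches the Attribute
    # it wraps, so the planner thinks the output partitioning does not
    # satisfy the join's distribution and inserts an exchange (shuffle).
    return a == b

def semantic_equals_fixed(a, b):
    # Unwrap aliases to the underlying attribute before comparing ids.
    def unwrap(e):
        return e.child if isinstance(e, Alias) else e
    return unwrap(a).expr_id == unwrap(b).expr_id

part_dt = Attribute("17#0", expr_id=17)  # the scan's shard-by/partition key
agg_key = Alias(part_dt, "F__KYLIN_SALES_PART_DTXXX", expr_id=42)  # aggregate's key

assert not semantic_equals_naive(agg_key, part_dt)  # exchange still added
assert semantic_equals_fixed(agg_key, part_dt)      # partitioning recognized
```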
[jira] [Created] (KYLIN-4797) Correct inputRecordSizes of segment when there is no data in this segment
Zhichao Zhang created KYLIN-4797: - Summary: Correct inputRecordSizes of segment when there is no data in this segment Key: KYLIN-4797 URL: https://issues.apache.org/jira/browse/KYLIN-4797 Project: Kylin Issue Type: Bug Components: Spark Engine Affects Versions: v4.0.0-alpha Reporter: Zhichao Zhang Assignee: Zhichao Zhang When there are no input records, inputRecordSize needs to be set to 0. -- This message was sent by Atlassian Jira (v8.3.4#803005)
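A minimal sketch of the guard this fix calls for (field names are hypothetical, not Kylin's actual segment code): when a segment has no input records, report a size of 0 instead of whatever byte count was left behind.

```python
def segment_stats(input_records: int, input_bytes: int) -> dict:
    # When the segment has no input records, force the reported size to 0
    # instead of keeping a stale or default byte count.
    if input_records == 0:
        input_bytes = 0
    return {"inputRecords": input_records, "inputRecordSize": input_bytes}

assert segment_stats(0, 1024) == {"inputRecords": 0, "inputRecordSize": 0}
assert segment_stats(10, 1024) == {"inputRecords": 10, "inputRecordSize": 1024}
```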
[jira] [Commented] (KYLIN-4776) Release Kylin v3.1.1
[ https://issues.apache.org/jira/browse/KYLIN-4776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17217273#comment-17217273 ] Zhichao Zhang commented on KYLIN-4776: --- ||Issue ID||Verified?||Documentation updated?||Others|| |KYLIN-4628| Yes| No need| No| |KYLIN-4585| Yes| No need| No| |KYLIN-4634| Yes| No need| No| |KYLIN-4576| Yes| No need| No| > Release Kylin v3.1.1 > > > Key: KYLIN-4776 > URL: https://issues.apache.org/jira/browse/KYLIN-4776 > Project: Kylin > Issue Type: Test > Components: Release >Affects Versions: v3.1.0 >Reporter: Xiaoxiang Yu >Assignee: Xiaoxiang Yu >Priority: Critical > Original Estimate: 336h > Remaining Estimate: 336h > > h2. Release Plan for Kylin v3.1.1 > > ||Key||Content|| > |Release Manager|Xiaoxiang Yu| > |Voting Date|2020/10/15| > > > h3. Issue List > [https://issues.apache.org/jira/projects/KYLIN/versions/12348354] > h3. Issue Verification Assignee > # Go to [https://issues.apache.org/jira/issues/?jql=] and input the JQL, and you > will know which issues you need to verify. > # After an issue is verified, please add a comment containing a table that shows > the result of each issue. Ask the RM for help if you face any trouble. > ||Assignee ||Issue||Count|| > |Zhichao Zhang|project = 12316121 AND fixVersion = 12348354 and (assignee = > tianhui5 OR assignee = xxyu )|9| > |Yaqian Zhang|project = 12316121 AND fixVersion = 12348354 and (assignee = > gxcheng )|13| > |Rupeng Wang|project = 12316121 AND fixVersion = 12348354 and (assignee = > itzhangqiang or assignee = zhangyaqian or assignee = zhangzc and assignee = > julianpan )|10| > |Xiaoxiang Yu|project = 12316121 AND fixVersion = 12348354 and (assignee = > xiaoge )|14| > > h3. Hadoop3 patch PR > - Patch : [https://github.com/apache/kylin/pull/1434] > - Verification on CDH 6.3: > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (KYLIN-4791) Throws exception 'UnsupportedOperationException: empty.reduceLeft' when there are cast expressions in the filters of FilePruner
[ https://issues.apache.org/jira/browse/KYLIN-4791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhichao Zhang updated KYLIN-4791: -- Description: When executing the 'pruneSegments' function of FilePruner, if there are cast expressions in the filter, it throws the exception 'UnsupportedOperationException: empty.reduceLeft'. Solution: Convert cast expressions in the filter to attributes before translating the filter. was: When executing the 'pruneSegments' function of FilePruner, if there are cast expressions in the filter, it throws the exception 'UnsupportedOperationException: empty.reduceLeft'. > Throws exception 'UnsupportedOperationException: empty.reduceLeft' when there > are cast expressions in the filters of FilePruner > --- > > Key: KYLIN-4791 > URL: https://issues.apache.org/jira/browse/KYLIN-4791 > Project: Kylin > Issue Type: Bug > Components: Query Engine, Spark Engine >Affects Versions: v4.0.0-alpha >Reporter: Zhichao Zhang >Assignee: Zhichao Zhang >Priority: Minor > Fix For: v4.0.0-beta > > > When executing the 'pruneSegments' function of FilePruner, if there are cast > expressions in the filter, it throws the exception > 'UnsupportedOperationException: empty.reduceLeft'. > > Solution: > Convert cast expressions in the filter to attributes before translating the filter. -- This message was sent by Atlassian Jira (v8.3.4#803005)
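The described solution, converting cast expressions to their underlying attribute before translating the filter, can be sketched with a toy expression tree (hypothetical names, not Kylin's actual FilePruner code): a translator that only understands plain attributes produces an empty list for a cast-wrapped column, which is exactly the situation where reducing an empty collection raises `empty.reduceLeft`; stripping the cast first lets the filter survive.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Attribute:
    name: str

@dataclass(frozen=True)
class Cast:
    child: Attribute
    to_type: str

def translate(expr):
    # The translator only understands plain attributes; on anything else it
    # produces nothing, and reducing an empty list of translated filters
    # is what raises 'empty.reduceLeft' in the original bug.
    if isinstance(expr, Attribute):
        return expr.name
    return None

def strip_cast(expr):
    # The fix: unwrap CAST(col AS type) down to the attribute first.
    while isinstance(expr, Cast):
        expr = expr.child
    return expr

filters = [Cast(Attribute("PART_DT"), "date")]

translated_buggy = [t for t in map(translate, filters) if t is not None]
translated_fixed = [t for t in (translate(strip_cast(e)) for e in filters) if t is not None]

assert translated_buggy == []            # nothing left to reduce -> empty.reduceLeft
assert translated_fixed == ["PART_DT"]   # filter survives and can prune segments
```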