[jira] [Updated] (KYLIN-5187) Support Alluxio Local Cache + Soft Affinity to speed up the query performance on the cloud

2022-05-24 Thread Zhichao Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/KYLIN-5187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhichao  Zhang updated KYLIN-5187:
--
Description: 
Support Alluxio Local Cache + Soft Affinity to speed up the query performance 
on the cloud.

Currently, this feature is only supported on Spark 3.1.

Refer to: [Presto RaptorX|https://prestodb.io/blog/2021/02/04/raptorx]

  was:
Support Alluxio Local Cache + Soft Affinity to speed up the query performance 
on the cloud.

Refer to : [Presto RaptorX|https://prestodb.io/blog/2021/02/04/raptorx]


> Support Alluxio Local Cache + Soft Affinity to speed up the query performance 
> on the cloud
> --
>
> Key: KYLIN-5187
> URL: https://issues.apache.org/jira/browse/KYLIN-5187
> Project: Kylin
>  Issue Type: New Feature
>  Components: Query Engine
>Affects Versions: v4.0.1
>Reporter: Zhichao  Zhang
>Assignee: Zhichao  Zhang
>Priority: Major
> Fix For: v4.1.0
>
>
> Support Alluxio Local Cache + Soft Affinity to speed up the query performance 
> on the cloud.
> Currently, this feature is only supported on Spark 3.1.
> Refer to: [Presto RaptorX|https://prestodb.io/blog/2021/02/04/raptorx]
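For readers unfamiliar with the idea, here is a minimal Scala sketch of the
soft-affinity part only (not Kylin's or Alluxio's actual API; the executor
list, replica count, and hashing scheme are assumptions): repeated scans of
the same Parquet file are routed to the same executors, so each executor's
Alluxio local cache stays warm.

{code:scala}
// Sketch only: map a file path to a small, stable set of preferred executors
// so that repeated reads of the same file hit the same local cache.
object SoftAffinitySketch {
  def preferredLocations(filePath: String,
                         executors: IndexedSeq[String],
                         replicas: Int = 2): Seq[String] = {
    if (executors.isEmpty) Seq.empty
    else {
      val start = math.abs(filePath.hashCode % executors.size)
      (0 until math.min(replicas, executors.size))
        .map(i => executors((start + i) % executors.size))
    }
  }
}
{code}

Spark would use such preferred locations as scheduling hints; if the preferred
executor is busy, the task can still run elsewhere, which is why the affinity
is "soft".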



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Created] (KYLIN-5187) Support Alluxio Local Cache + Soft Affinity to speed up the query performance on the cloud

2022-05-24 Thread Zhichao Zhang (Jira)
Zhichao  Zhang created KYLIN-5187:
-

 Summary: Support Alluxio Local Cache + Soft Affinity to speed up 
the query performance on the cloud
 Key: KYLIN-5187
 URL: https://issues.apache.org/jira/browse/KYLIN-5187
 Project: Kylin
  Issue Type: New Feature
  Components: Query Engine
Affects Versions: v4.0.1
Reporter: Zhichao  Zhang
Assignee: Zhichao  Zhang
 Fix For: v4.1.0


Support Alluxio Local Cache + Soft Affinity to speed up the query performance 
on the cloud.

Refer to : [Presto RaptorX|https://prestodb.io/blog/2021/02/04/raptorx]



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Commented] (KYLIN-5057) CubeBuildJob in Kylin4.0 run failed when open Spark3.1 AQE

2021-08-09 Thread Zhichao Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/KYLIN-5057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17395861#comment-17395861
 ] 

Zhichao  Zhang commented on KYLIN-5057:
---

Thanks [~tianhui5], I have reproduced this issue and will check it ASAP.

> CubeBuildJob in Kylin4.0 run failed when open Spark3.1 AQE
> --
>
> Key: KYLIN-5057
> URL: https://issues.apache.org/jira/browse/KYLIN-5057
> Project: Kylin
>  Issue Type: Bug
>Reporter: tianhui
>Priority: Major
> Attachments: errorStack.log
>
>
> When I use the standalone docker image with my own Kylin tar.gz, I add a 
> configuration to kylin.properties:
> `kylin.engine.spark-conf.spark.sql.adaptive.enabled=true`
> Then I build the sample cube and it fails. The error stack is in the Spark 
> driver log.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KYLIN-5058) Throws ConcurrentModificationException when building cube

2021-08-09 Thread Zhichao Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/KYLIN-5058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhichao  Zhang updated KYLIN-5058:
--
Description: 
When building cubes, a 'ConcurrentModificationException' error is thrown by 
Spark, but it does not affect the build job.

This is a known issue and was fixed in Spark 3.1.2.

https://issues.apache.org/jira/browse/SPARK-34731

Fix PR: [https://github.com/apache/spark/pull/31826]

So the Spark 3.x dependency needs to be upgraded to 3.1.2.

  was:
When building cubes, a 'ConcurrentModificationException' error is thrown by 
Spark, but it does not affect the build job.

This is a known issue and was fixed in Spark 3.1.2.

So the Spark 3.x dependency needs to be upgraded to 3.1.2.


> Throws ConcurrentModificationException when building cube
> -
>
> Key: KYLIN-5058
> URL: https://issues.apache.org/jira/browse/KYLIN-5058
> Project: Kylin
>  Issue Type: Bug
>  Components: Spark Engine
>Affects Versions: v4.0.0-beta
>Reporter: Zhichao  Zhang
>Assignee: Zhichao  Zhang
>Priority: Minor
> Fix For: v4.1.0
>
>
> When building cubes, a 'ConcurrentModificationException' error is thrown 
> by Spark, but it does not affect the build job.
> This is a known issue and was fixed in Spark 3.1.2.
> https://issues.apache.org/jira/browse/SPARK-34731
> Fix PR: [https://github.com/apache/spark/pull/31826]
> So the Spark 3.x dependency needs to be upgraded to 3.1.2.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KYLIN-5058) Throws ConcurrentModificationException when building cube

2021-08-09 Thread Zhichao Zhang (Jira)
Zhichao  Zhang created KYLIN-5058:
-

 Summary: Throws ConcurrentModificationException when building cube
 Key: KYLIN-5058
 URL: https://issues.apache.org/jira/browse/KYLIN-5058
 Project: Kylin
  Issue Type: Bug
  Components: Spark Engine
Affects Versions: v4.0.0-beta
Reporter: Zhichao  Zhang
Assignee: Zhichao  Zhang
 Fix For: v4.1.0


When building cubes, a 'ConcurrentModificationException' error is thrown by 
Spark, but it does not affect the build job.

This is a known issue and was fixed in Spark 3.1.2.

So the Spark 3.x dependency needs to be upgraded to 3.1.2.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KYLIN-5057) CubeBuildJob in Kylin4.0 run failed when open Spark3.1 AQE

2021-08-08 Thread Zhichao Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/KYLIN-5057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17395486#comment-17395486
 ] 

Zhichao  Zhang commented on KYLIN-5057:
---

[~tianhui5], I can't reproduce this error in my docker env. Could you share 
your 'kylin.properties' so I can check?

> CubeBuildJob in Kylin4.0 run failed when open Spark3.1 AQE
> --
>
> Key: KYLIN-5057
> URL: https://issues.apache.org/jira/browse/KYLIN-5057
> Project: Kylin
>  Issue Type: Bug
>Reporter: tianhui
>Priority: Major
> Attachments: errorStack.log
>
>
> When I use the standalone docker image with my own Kylin tar.gz, I add a 
> configuration to kylin.properties:
> `kylin.engine.spark-conf.spark.sql.adaptive.enabled=true`
> Then I build the sample cube and it fails. The error stack is in the Spark 
> driver log.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KYLIN-5057) CubeBuildJob in Kylin4.0 run failed when open Spark3.1 AQE

2021-08-06 Thread Zhichao Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/KYLIN-5057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17394700#comment-17394700
 ] 

Zhichao  Zhang commented on KYLIN-5057:
---

Thanks [~tianhui5], I will check this issue.

> CubeBuildJob in Kylin4.0 run failed when open Spark3.1 AQE
> --
>
> Key: KYLIN-5057
> URL: https://issues.apache.org/jira/browse/KYLIN-5057
> Project: Kylin
>  Issue Type: Bug
>Reporter: tianhui
>Priority: Major
> Attachments: errorStack.log
>
>
> When I use the standalone docker image with my own Kylin tar.gz, I add a 
> configuration to kylin.properties:
> `kylin.engine.spark-conf.spark.sql.adaptive.enabled=true`
> Then I build the sample cube and it fails. The error stack is in the Spark 
> driver log.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Reopened] (KYLIN-4762) Optimize join where there is the same shardby partition num on join key

2021-08-03 Thread Zhichao Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/KYLIN-4762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhichao  Zhang reopened KYLIN-4762:
---

This issue isn't resolved yet.

> Optimize join where there is the same shardby partition num on join key
> ---
>
> Key: KYLIN-4762
> URL: https://issues.apache.org/jira/browse/KYLIN-4762
> Project: Kylin
>  Issue Type: Improvement
>  Components: Query Engine
>Affects Versions: v4.0.0-alpha
>Reporter: Zhichao  Zhang
>Assignee: Zhichao  Zhang
>Priority: Minor
> Fix For: v4.0.0
>
> Attachments: shardby_join.png
>
>
> Optimize the join by reducing shuffle when both sides have the same shard-by 
> partition number on the join key.
> When executing this SQL,
> {code:java}
> select m.seller_id, m.part_dt, sum(m.price) as s 
> from kylin_sales m 
> left join (
>   select m1.part_dt as pd, count(distinct m1.SELLER_ID) as m1, count(1) as m2 
>   from kylin_sales m1
>   where m1.part_dt = '2012-01-05'
>   group by m1.part_dt 
>   ) j 
>   on m.part_dt = j.pd
>   where m.lstg_format_name = 'FP-GTC' 
>   and m.part_dt = '2012-01-05'
>   group by m.seller_id, m.part_dt limit 100;
> {code}
> the execution plan is shown below:
> !shardby_join.png!
> Since the join key part_dt has the same shard-by partition number on both 
> sides, the join can be optimized to reduce shuffle, similar to a bucket join.
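For reference, a short Scala sketch of the bucket-join behavior the
description compares against (illustrative only; it uses plain Spark bucketed
tables rather than Kylin's shard-by layout, and the derived table names and
bucket count are made up):

{code:scala}
import org.apache.spark.sql.SparkSession

object BucketJoinSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("bucket-join-sketch")
      .enableHiveSupport()
      .getOrCreate()

    // Write both sides bucketed by the join key with the same bucket count.
    spark.table("kylin_sales")
      .write.bucketBy(8, "part_dt").sortBy("part_dt")
      .saveAsTable("sales_bucketed")
    spark.table("kylin_sales")
      .groupBy("part_dt").count()
      .write.bucketBy(8, "part_dt")
      .saveAsTable("sales_agg_bucketed")

    // Both tables are hash-partitioned on part_dt into 8 buckets, so Spark
    // can plan this join without adding a shuffle exchange on either side.
    spark.table("sales_bucketed")
      .join(spark.table("sales_agg_bucketed"), "part_dt")
      .explain()
  }
}
{code}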



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KYLIN-4762) Optimize join where there is the same shardby partition num on join key

2021-08-03 Thread Zhichao Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/KYLIN-4762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhichao  Zhang updated KYLIN-4762:
--
Fix Version/s: (was: v4.0.0)
   v4.1.0

> Optimize join where there is the same shardby partition num on join key
> ---
>
> Key: KYLIN-4762
> URL: https://issues.apache.org/jira/browse/KYLIN-4762
> Project: Kylin
>  Issue Type: Improvement
>  Components: Query Engine
>Affects Versions: v4.0.0-alpha
>Reporter: Zhichao  Zhang
>Assignee: Zhichao  Zhang
>Priority: Minor
> Fix For: v4.1.0
>
> Attachments: shardby_join.png
>
>
> Optimize the join by reducing shuffle when both sides have the same shard-by 
> partition number on the join key.
> When executing this SQL,
> {code:java}
> select m.seller_id, m.part_dt, sum(m.price) as s 
> from kylin_sales m 
> left join (
>   select m1.part_dt as pd, count(distinct m1.SELLER_ID) as m1, count(1) as m2 
>   from kylin_sales m1
>   where m1.part_dt = '2012-01-05'
>   group by m1.part_dt 
>   ) j 
>   on m.part_dt = j.pd
>   where m.lstg_format_name = 'FP-GTC' 
>   and m.part_dt = '2012-01-05'
>   group by m.seller_id, m.part_dt limit 100;
> {code}
> the execution plan is shown below:
> !shardby_join.png!
> Since the join key part_dt has the same shard-by partition number on both 
> sides, the join can be optimized to reduce shuffle, similar to a bucket join.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (KYLIN-5014) Spark driver log is abnormal in yarn cluster mode

2021-06-24 Thread Zhichao Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/KYLIN-5014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhichao  Zhang resolved KYLIN-5014.
---
Resolution: Fixed

> Spark driver log is abnormal in yarn cluster mode
> -
>
> Key: KYLIN-5014
> URL: https://issues.apache.org/jira/browse/KYLIN-5014
> Project: Kylin
>  Issue Type: Bug
>Reporter: Yaqian Zhang
>Assignee: Yaqian Zhang
>Priority: Minor
> Fix For: v4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (KYLIN-5008) backend spark was failed, but corresponding job status is shown as finished in WebUI

2021-06-11 Thread Zhichao Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/KYLIN-5008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhichao  Zhang reassigned KYLIN-5008:
-

Assignee: Yaqian Zhang  (was: Zhichao  Zhang)

> backend spark was failed, but corresponding job status is shown as finished 
> in WebUI 
> -
>
> Key: KYLIN-5008
> URL: https://issues.apache.org/jira/browse/KYLIN-5008
> Project: Kylin
>  Issue Type: Bug
>Affects Versions: v4.0.0-beta
>Reporter: ZHANGHONGJIA
>Assignee: Yaqian Zhang
>Priority: Major
> Attachments: image-2021-06-10-16-46-35-919.png, merge-job.log
>
>
> According to the log shown below, the Spark job failed because the container 
> was killed by YARN for exceeding memory limits, but in the Kylin WebUI the 
> status of the merge job is finished. Besides, the amount of data in the 
> merged segment is about three times the actual data. It seems that Kylin 
> didn't detect the failure of this merge job.
>  
> Here is the merge job log :
> ===
>  at 
> org.apache.kylin.engine.spark.job.BuildLayoutWithUpdate$1.call(BuildLayoutWithUpdate.java:43)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>  ... 3 more
> Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: 
> Task 244 in stage 1108.0 failed 4 times, most recent failure: Lost task 244.3 
> in stage 1108.0 (TID 78736, r4200h1-app.travelsky.com, executor 109): 
> ExecutorLostFailure (executor 109 exited caused by one of the running tasks) 
> Reason: Container killed by YARN for exceeding memory limits. 39.0 GB of 36 
> GB physical memory used. Consider boosting spark.yarn.executor.memoryOverhead 
> or disabling yarn.nodemanager.vmem-check-enabled because of YARN-4714.
> Driver stacktrace:
>  at 
> org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1891)
>  at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1879)
>  at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1878)
>  at 
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>  at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
>  at 
> org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1878)
>  at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:927)
>  at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:927)
>  at scala.Option.foreach(Option.scala:257)
>  at 
> org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:927)
>  at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2112)
>  at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2061)
>  at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2050)
>  at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
>  at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:738)
>  at org.apache.spark.SparkContext.runJob(SparkContext.scala:2061)
>  at 
> org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:167)
>  ... 34 more
> }
> RetryInfo{
>  overrideConf : \{spark.executor.memory=36618MB, 
> spark.executor.memoryOverhead=7323MB},
>  throwable : java.lang.RuntimeException: Error execute 
> org.apache.kylin.engine.spark.job.CubeMergeJob
>  at 
> org.apache.kylin.engine.spark.application.SparkApplication.execute(SparkApplication.java:92)
>  at org.apache.spark.application.JobWorker$$anon$2.run(JobWorker.scala:55)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.RuntimeException: org.apache.spark.SparkException: Job 
> aborted.
>  at 
> org.apache.kylin.engine.spark.job.BuildLayoutWithUpdate.updateLayout(BuildLayoutWithUpdate.java:70)
>  at 
> org.apache.kylin.engine.spark.job.CubeMergeJob.mergeSegments(CubeMergeJob.java:122)
>  at 
> org.apache.kylin.engine.spark.job.CubeMergeJob.doExecute(CubeMergeJob.java:82)
>  at 
> org.apache.kylin.engine.spark.application.SparkApplication.execute(SparkApplication.java:298)
>  at 
> org.apache.kylin.engine.spark.application.SparkApplication.execute(SparkApplication.java:89)
>  ... 4 more
> Caused by: 

[jira] [Assigned] (KYLIN-5008) backend spark was failed, but corresponding job status is shown as finished in WebUI

2021-06-10 Thread Zhichao Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/KYLIN-5008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhichao  Zhang reassigned KYLIN-5008:
-

Assignee: Zhichao  Zhang

> backend spark was failed, but corresponding job status is shown as finished 
> in WebUI 
> -
>
> Key: KYLIN-5008
> URL: https://issues.apache.org/jira/browse/KYLIN-5008
> Project: Kylin
>  Issue Type: Bug
>Affects Versions: v4.0.0-beta
>Reporter: ZHANGHONGJIA
>Assignee: Zhichao  Zhang
>Priority: Major
> Attachments: image-2021-06-10-16-46-35-919.png, merge-job.log
>
>
> According to the log shown below, the Spark job failed because the container 
> was killed by YARN for exceeding memory limits, but in the Kylin WebUI the 
> status of the merge job is finished. Besides, the amount of data in the 
> merged segment is about three times the actual data. It seems that Kylin 
> didn't detect the failure of this merge job.
>  
> Here is the merge job log :
> ===
>  at 
> org.apache.kylin.engine.spark.job.BuildLayoutWithUpdate$1.call(BuildLayoutWithUpdate.java:43)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>  ... 3 more
> Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: 
> Task 244 in stage 1108.0 failed 4 times, most recent failure: Lost task 244.3 
> in stage 1108.0 (TID 78736, r4200h1-app.travelsky.com, executor 109): 
> ExecutorLostFailure (executor 109 exited caused by one of the running tasks) 
> Reason: Container killed by YARN for exceeding memory limits. 39.0 GB of 36 
> GB physical memory used. Consider boosting spark.yarn.executor.memoryOverhead 
> or disabling yarn.nodemanager.vmem-check-enabled because of YARN-4714.
> Driver stacktrace:
>  at 
> org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1891)
>  at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1879)
>  at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1878)
>  at 
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>  at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
>  at 
> org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1878)
>  at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:927)
>  at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:927)
>  at scala.Option.foreach(Option.scala:257)
>  at 
> org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:927)
>  at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2112)
>  at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2061)
>  at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2050)
>  at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
>  at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:738)
>  at org.apache.spark.SparkContext.runJob(SparkContext.scala:2061)
>  at 
> org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:167)
>  ... 34 more
> }
> RetryInfo{
>  overrideConf : \{spark.executor.memory=36618MB, 
> spark.executor.memoryOverhead=7323MB},
>  throwable : java.lang.RuntimeException: Error execute 
> org.apache.kylin.engine.spark.job.CubeMergeJob
>  at 
> org.apache.kylin.engine.spark.application.SparkApplication.execute(SparkApplication.java:92)
>  at org.apache.spark.application.JobWorker$$anon$2.run(JobWorker.scala:55)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.RuntimeException: org.apache.spark.SparkException: Job 
> aborted.
>  at 
> org.apache.kylin.engine.spark.job.BuildLayoutWithUpdate.updateLayout(BuildLayoutWithUpdate.java:70)
>  at 
> org.apache.kylin.engine.spark.job.CubeMergeJob.mergeSegments(CubeMergeJob.java:122)
>  at 
> org.apache.kylin.engine.spark.job.CubeMergeJob.doExecute(CubeMergeJob.java:82)
>  at 
> org.apache.kylin.engine.spark.application.SparkApplication.execute(SparkApplication.java:298)
>  at 
> org.apache.kylin.engine.spark.application.SparkApplication.execute(SparkApplication.java:89)
>  ... 4 more
> Caused by: 

[jira] [Resolved] (KYLIN-4926) Optimize Global Dict building: replace operation 'mapPartitions.count()' with 'foreachPartitions'

2021-05-10 Thread Zhichao Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/KYLIN-4926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhichao  Zhang resolved KYLIN-4926.
---
Resolution: Fixed

> Optimize Global Dict building: replace operation 'mapPartitions.count()' with 
> 'foreachPartitions'
> -
>
> Key: KYLIN-4926
> URL: https://issues.apache.org/jira/browse/KYLIN-4926
> Project: Kylin
>  Issue Type: Improvement
>  Components: Spark Engine
>Affects Versions: v4.0.0-beta
>Reporter: Zhichao  Zhang
>Assignee: Zhichao  Zhang
>Priority: Minor
> Fix For: v4.0.0-GA
>
>
> Replace operation 'mapPartitions.count()' with 'foreachPartitions' when 
> building Global Dict



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (KYLIN-4741) Support to config the sparder application name

2021-05-10 Thread Zhichao Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/KYLIN-4741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhichao  Zhang resolved KYLIN-4741.
---
Fix Version/s: v4.0.0-beta
   Resolution: Fixed

> Support to config the sparder application name
> --
>
> Key: KYLIN-4741
> URL: https://issues.apache.org/jira/browse/KYLIN-4741
> Project: Kylin
>  Issue Type: Improvement
>Affects Versions: v4.0.0-alpha
>Reporter: Zhichao  Zhang
>Assignee: Zhichao  Zhang
>Priority: Minor
> Fix For: v4.0.0-beta
>
>
> Support to config the sparder application name



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (KYLIN-4937) Verify the uniqueness of the global dictionary after building global dictionary

2021-05-10 Thread Zhichao Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/KYLIN-4937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhichao  Zhang resolved KYLIN-4937.
---
Resolution: Fixed

> Verify the uniqueness of the global dictionary after building global 
> dictionary
> ---
>
> Key: KYLIN-4937
> URL: https://issues.apache.org/jira/browse/KYLIN-4937
> Project: Kylin
>  Issue Type: Improvement
>  Components: Spark Engine
>Affects Versions: v4.0.0-beta
>Reporter: Zhichao  Zhang
>Assignee: Zhichao  Zhang
>Priority: Minor
> Fix For: v4.0.0-GA
>
>
> Verify the uniqueness of the global dictionary after building global 
> dictionary



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (KYLIN-4927) Forbid to use AE when building Global Dict

2021-05-10 Thread Zhichao Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/KYLIN-4927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhichao  Zhang resolved KYLIN-4927.
---
Resolution: Fixed

> Forbid to use AE when building Global Dict
> --
>
> Key: KYLIN-4927
> URL: https://issues.apache.org/jira/browse/KYLIN-4927
> Project: Kylin
>  Issue Type: Improvement
>  Components: Spark Engine
>Affects Versions: v4.0.0-beta
>Reporter: Zhichao  Zhang
>Assignee: Zhichao  Zhang
>Priority: Minor
> Fix For: v4.0.0-GA
>
>
> When building the Global Dict, AE (adaptive execution) must not be used, 
> because it changes the partition number dynamically and leads to a wrong 
> Global Dict result.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (KYLIN-4936) Exactly aggregation can't transform to project

2021-05-10 Thread Zhichao Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/KYLIN-4936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhichao  Zhang resolved KYLIN-4936.
---
Resolution: Fixed

> Exactly aggregation can't transform to project
> --
>
> Key: KYLIN-4936
> URL: https://issues.apache.org/jira/browse/KYLIN-4936
> Project: Kylin
>  Issue Type: Improvement
>  Components: Query Engine
>Affects Versions: v4.0.0-beta
>Reporter: ShengJun Zheng
>Assignee: Zhichao  Zhang
>Priority: Major
> Fix For: v4.0.0-GA
>
>
> An exactly-matched aggregation can't be transformed into a project, causing 
> unnecessary Spark shuffle.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (KYLIN-4980) Support pruning segments from complex filter conditions

2021-04-25 Thread Zhichao Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/KYLIN-4980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhichao  Zhang resolved KYLIN-4980.
---
Resolution: Fixed

> Support pruning segments from complex filter conditions
> 
>
> Key: KYLIN-4980
> URL: https://issues.apache.org/jira/browse/KYLIN-4980
> Project: Kylin
>  Issue Type: Improvement
>  Components: Query Engine
>Affects Versions: v4.0.0-beta
>Reporter: ShengJun Zheng
>Assignee: ShengJun Zheng
>Priority: Major
> Fix For: v4.0.0-GA
>
>
> The segment pruner can't prune segments under complex filter conditions, 
> such as the filter condition below:
> "where (col_a = xxx and col_partition = xxx) or (col_b = xxx and 
> col_partition = xxx)" 
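A tiny Scala sketch of why such OR-of-ANDs filters are still prunable (the
Filter ADT and names below are assumptions for illustration, not Kylin's
classes): a segment can be skipped only when every OR branch excludes it.

{code:scala}
object ComplexFilterPruneSketch {
  sealed trait Filter
  final case class Eq(column: String, value: String) extends Filter
  final case class And(children: Seq[Filter]) extends Filter
  final case class Or(children: Seq[Filter]) extends Filter

  // A segment may match unless the filter provably excludes every row in it;
  // AND prunes if any child prunes, OR prunes only if every branch prunes.
  def segmentMayMatch(f: Filter,
                      partitionCol: String,
                      segmentPartitionValues: Set[String]): Boolean = f match {
    case Eq(`partitionCol`, v) => segmentPartitionValues.contains(v)
    case Eq(_, _)              => true // non-partition filters cannot prune
    case And(cs) => cs.forall(segmentMayMatch(_, partitionCol, segmentPartitionValues))
    case Or(cs)  => cs.exists(segmentMayMatch(_, partitionCol, segmentPartitionValues))
  }
}
{code}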



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KYLIN-4967) Forbid to set 'spark.sql.adaptive.enabled' to true when building cube with Spark 2.X

2021-04-11 Thread Zhichao Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/KYLIN-4967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhichao  Zhang updated KYLIN-4967:
--
Affects Version/s: v4.0.0-beta

> Forbid to set 'spark.sql.adaptive.enabled' to true when building cube with 
> Spark 2.X
> 
>
> Key: KYLIN-4967
> URL: https://issues.apache.org/jira/browse/KYLIN-4967
> Project: Kylin
>  Issue Type: Bug
>Affects Versions: v4.0.0-beta
>Reporter: Zhichao  Zhang
>Assignee: Zhichao  Zhang
>Priority: Minor
> Fix For: v4.0.0-GA
>
>
> With Spark 2.x, setting 'spark.sql.adaptive.enabled' to true affects the 
> actual partition count when repartitioning with Spark, which leads to wrong 
> results for the global dict and for repartitioning by the shard-by column.
> For example, after writing cuboid data, Kylin repartitions the cuboid data 
> into 3 partitions if needed, but if 'spark.sql.adaptive.enabled' is true, 
> Spark may optimize the partition number down to 1, which produces wrong 
> results.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KYLIN-4967) Forbid to set 'spark.sql.adaptive.enabled' to true when building cube with Spark 2.X

2021-04-11 Thread Zhichao Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/KYLIN-4967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhichao  Zhang updated KYLIN-4967:
--
Fix Version/s: v4.0.0-GA

> Forbid to set 'spark.sql.adaptive.enabled' to true when building cube with 
> Spark 2.X
> 
>
> Key: KYLIN-4967
> URL: https://issues.apache.org/jira/browse/KYLIN-4967
> Project: Kylin
>  Issue Type: Bug
>Reporter: Zhichao  Zhang
>Assignee: Zhichao  Zhang
>Priority: Minor
> Fix For: v4.0.0-GA
>
>
> With Spark 2.x, setting 'spark.sql.adaptive.enabled' to true affects the 
> actual partition count when repartitioning with Spark, which leads to wrong 
> results for the global dict and for repartitioning by the shard-by column.
> For example, after writing cuboid data, Kylin repartitions the cuboid data 
> into 3 partitions if needed, but if 'spark.sql.adaptive.enabled' is true, 
> Spark may optimize the partition number down to 1, which produces wrong 
> results.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KYLIN-4965) Using a column from a joined table as a filter condition in the model causes an error during cube build

2021-04-11 Thread Zhichao Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/KYLIN-4965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhichao  Zhang updated KYLIN-4965:
--
Fix Version/s: (was: Future)
   v4.0.0-GA

> Using a column from a joined table as a filter condition in the model causes an error during cube build
> 
>
> Key: KYLIN-4965
> URL: https://issues.apache.org/jira/browse/KYLIN-4965
> Project: Kylin
>  Issue Type: Bug
>  Components: Job Engine
>Affects Versions: v4.0.0-beta
> Environment: cdh 5.14.2,hadoop 2.6.0
> kylin 4.0 beta
> spark-2.4.6-bin-hadoop2.7
>Reporter: liulei_first
>Assignee: Zhichao  Zhang
>Priority: Major
> Fix For: v4.0.0-GA
>
> Attachments: kylin_buildcube_error.jpg
>
>
> When a column from a joined (dimension) table is used as a filter condition 
> in the model, the cube build fails, reporting that the filter column is not 
> among the given input columns; the columns listed as given input columns are 
> all from the fact table.
> In the model, the dimension-table filter column has already been set as a 
> dimension.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KYLIN-4967) Forbid to set 'spark.sql.adaptive.enabled' to true when building cube with Spark 2.X

2021-04-11 Thread Zhichao Zhang (Jira)
Zhichao  Zhang created KYLIN-4967:
-

 Summary: Forbid to set 'spark.sql.adaptive.enabled' to true when 
building cube with Spark 2.X
 Key: KYLIN-4967
 URL: https://issues.apache.org/jira/browse/KYLIN-4967
 Project: Kylin
  Issue Type: Bug
Reporter: Zhichao  Zhang
Assignee: Zhichao  Zhang


With Spark 2.x, setting 'spark.sql.adaptive.enabled' to true affects the 
actual partition count when repartitioning with Spark, which leads to wrong 
results for the global dict and for repartitioning by the shard-by column.

For example, after writing cuboid data, Kylin repartitions the cuboid data 
into 3 partitions if needed, but if 'spark.sql.adaptive.enabled' is true, 
Spark may optimize the partition number down to 1, which produces wrong 
results.
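A minimal Scala sketch of the failure mode with a toy dataframe (not Kylin's
code; the column name and partition count are assumptions): with adaptive
execution disabled, repartition(3, shardby) really yields 3 partitions, while
Spark 2.x adaptive execution may change the post-shuffle partition count and
break the expected shard-by file layout.

{code:scala}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

object AqeRepartitionSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("aqe-repartition-sketch").getOrCreate()

    // The build step expects exactly the requested partition count, so
    // adaptive execution must stay off while writing shard-by data.
    spark.conf.set("spark.sql.adaptive.enabled", "false")

    val cuboid = spark.range(0, 1000).toDF("id")
      .withColumn("shardby", col("id") % 3)

    // With adaptive execution disabled this really yields 3 partitions; with
    // Spark 2.x adaptive execution enabled the post-shuffle partition count
    // can be changed (e.g. merged to 1), breaking the shard-by file layout.
    val repartitioned = cuboid.repartition(3, col("shardby"))
    println(repartitioned.rdd.getNumPartitions) // 3
  }
}
{code}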



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (KYLIN-4965) Using a column from a joined table as a filter condition in the model causes an error during cube build

2021-04-11 Thread Zhichao Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/KYLIN-4965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhichao  Zhang reassigned KYLIN-4965:
-

Assignee: Zhichao  Zhang

> Using a column from a joined table as a filter condition in the model causes an error during cube build
> 
>
> Key: KYLIN-4965
> URL: https://issues.apache.org/jira/browse/KYLIN-4965
> Project: Kylin
>  Issue Type: Bug
>  Components: Job Engine
>Affects Versions: v4.0.0-beta
> Environment: cdh 5.14.2,hadoop 2.6.0
> kylin 4.0 beta
> spark-2.4.6-bin-hadoop2.7
>Reporter: liulei_first
>Assignee: Zhichao  Zhang
>Priority: Major
> Fix For: Future
>
> Attachments: kylin_buildcube_error.jpg
>
>
> When a column from a joined (dimension) table is used as a filter condition 
> in the model, the cube build fails, reporting that the filter column is not 
> among the given input columns; the columns listed as given input columns are 
> all from the fact table.
> In the model, the dimension-table filter column has already been set as a 
> dimension.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KYLIN-4965) Using a column from a joined table as a filter condition in the model causes an error during cube build

2021-04-08 Thread Zhichao Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/KYLIN-4965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17317611#comment-17317611
 ] 

Zhichao  Zhang commented on KYLIN-4965:
---

I will check this issue and fix it if there is a problem.

> Using a column from a joined table as a filter condition in the model causes an error during cube build
> 
>
> Key: KYLIN-4965
> URL: https://issues.apache.org/jira/browse/KYLIN-4965
> Project: Kylin
>  Issue Type: Bug
>  Components: Job Engine
>Affects Versions: v4.0.0-beta
> Environment: cdh 5.14.2,hadoop 2.6.0
> kylin 4.0 beta
> spark-2.4.6-bin-hadoop2.7
>Reporter: liulei_first
>Priority: Major
> Fix For: Future
>
> Attachments: kylin_buildcube_error.jpg
>
>
> When a column from a joined (dimension) table is used as a filter condition 
> in the model, the cube build fails, reporting that the filter column is not 
> among the given input columns; the columns listed as given input columns are 
> all from the fact table.
> In the model, the dimension-table filter column has already been set as a 
> dimension.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KYLIN-4944) Upgrade CentOS version, Hadoop version and Spark version for Kylin Docker image

2021-03-23 Thread Zhichao Zhang (Jira)
Zhichao  Zhang created KYLIN-4944:
-

 Summary: Upgrade CentOS version, Hadoop version and Spark version 
for Kylin Docker image
 Key: KYLIN-4944
 URL: https://issues.apache.org/jira/browse/KYLIN-4944
 Project: Kylin
  Issue Type: Improvement
  Components: Others
Affects Versions: v4.0.0-beta
Reporter: Zhichao  Zhang
 Fix For: v4.0.0-GA


Currently, the CentOS version of the Kylin docker image is 6.9, and the yum 
repository for CentOS 6.9 is no longer maintained, so it needs to be upgraded 
to 7+.
Hadoop can also be upgraded to 2.8.5 and Spark to 2.4.7.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (KYLIN-4913) Update docker image for Kylin 4.0 Beta

2021-03-20 Thread Zhichao Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/KYLIN-4913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhichao  Zhang resolved KYLIN-4913.
---
Resolution: Fixed

> Update docker image for Kylin 4.0 Beta
> --
>
> Key: KYLIN-4913
> URL: https://issues.apache.org/jira/browse/KYLIN-4913
> Project: Kylin
>  Issue Type: Improvement
>Affects Versions: v4.0.0-beta
>Reporter: Zhichao  Zhang
>Assignee: Zhichao  Zhang
>Priority: Minor
> Fix For: v4.0.0-beta
>
>
> Update docker image for Kylin 4.0 Beta



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KYLIN-4937) Verify the uniqueness of the global dictionary after building global dictionary

2021-03-18 Thread Zhichao Zhang (Jira)
Zhichao  Zhang created KYLIN-4937:
-

 Summary: Verify the uniqueness of the global dictionary after 
building global dictionary
 Key: KYLIN-4937
 URL: https://issues.apache.org/jira/browse/KYLIN-4937
 Project: Kylin
  Issue Type: Improvement
  Components: Spark Engine
Affects Versions: v4.0.0-beta
Reporter: Zhichao  Zhang
Assignee: Zhichao  Zhang
 Fix For: v4.0.0-GA


Verify the uniqueness of the global dictionary after building global dictionary



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (KYLIN-4936) Exactly aggregation can't transform to project

2021-03-18 Thread Zhichao Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/KYLIN-4936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhichao  Zhang reassigned KYLIN-4936:
-

Assignee: Zhichao  Zhang

> Exactly aggregation can't transform to project
> --
>
> Key: KYLIN-4936
> URL: https://issues.apache.org/jira/browse/KYLIN-4936
> Project: Kylin
>  Issue Type: Improvement
>  Components: Query Engine
>Affects Versions: v4.0.0-beta
>Reporter: ShengJun Zheng
>Assignee: Zhichao  Zhang
>Priority: Major
> Fix For: v4.0.0-GA
>
>
> An exactly-matched aggregation can't be transformed into a project, causing 
> unnecessary Spark shuffle.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KYLIN-4927) Forbid to use AE when building Global Dict

2021-03-04 Thread Zhichao Zhang (Jira)
Zhichao  Zhang created KYLIN-4927:
-

 Summary: Forbid to use AE when building Global Dict
 Key: KYLIN-4927
 URL: https://issues.apache.org/jira/browse/KYLIN-4927
 Project: Kylin
  Issue Type: Improvement
  Components: Spark Engine
Affects Versions: v4.0.0-beta
Reporter: Zhichao  Zhang
Assignee: Zhichao  Zhang
 Fix For: v4.0.0-GA


When building the Global Dict, AE (adaptive execution) must not be used, 
because it changes the partition number dynamically and leads to a wrong 
Global Dict result.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KYLIN-4926) Optimize Global Dict building: replace operation 'mapPartitions.count()' with 'foreachPartitions'

2021-03-04 Thread Zhichao Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/KYLIN-4926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhichao  Zhang updated KYLIN-4926:
--
Summary: Optimize Global Dict building: replace operation 
'mapPartitions.count()' with 'foreachPartitions'  (was: Optimize Global Dict 
build: replace operation 'mapPartitions.count()' with 'foreachPartitions')

> Optimize Global Dict building: replace operation 'mapPartitions.count()' with 
> 'foreachPartitions'
> -
>
> Key: KYLIN-4926
> URL: https://issues.apache.org/jira/browse/KYLIN-4926
> Project: Kylin
>  Issue Type: Improvement
>  Components: Spark Engine
>Affects Versions: v4.0.0-beta
>Reporter: Zhichao  Zhang
>Assignee: Zhichao  Zhang
>Priority: Minor
> Fix For: v4.0.0-GA
>
>
> Replace operation 'mapPartitions.count()' with 'foreachPartitions' when 
> building Global Dict



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KYLIN-4926) Optimize Global Dict build: replace operation 'mapPartitions.count()' with 'foreachPartitions'

2021-03-04 Thread Zhichao Zhang (Jira)
Zhichao  Zhang created KYLIN-4926:
-

 Summary: Optimize Global Dict build: replace operation 
'mapPartitions.count()' with 'foreachPartitions'
 Key: KYLIN-4926
 URL: https://issues.apache.org/jira/browse/KYLIN-4926
 Project: Kylin
  Issue Type: Improvement
  Components: Spark Engine
Affects Versions: v4.0.0-beta
Reporter: Zhichao  Zhang
Assignee: Zhichao  Zhang
 Fix For: v4.0.0-GA


Replace operation 'mapPartitions.count()' with 'foreachPartitions' when 
building Global Dict
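A small Scala sketch of the intended change (illustrative only; the
per-partition work is a placeholder, not the actual dictionary-building code):

{code:scala}
import org.apache.spark.sql.SparkSession

object ForeachPartitionSketch {
  def buildDictForPartition(it: Iterator[Int]): Unit = {
    it.foreach(_ => ()) // placeholder side effect per partition
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("foreach-partition-sketch").getOrCreate()
    val data = spark.sparkContext.parallelize(1 to 1000, 4)

    // Before: the side effect runs inside mapPartitions and is forced with
    // count(), which keeps a dummy result stage just to trigger execution.
    data.mapPartitions { it => buildDictForPartition(it); Iterator.single(1) }.count()

    // After: foreachPartition is itself an action, so the side effect runs
    // without materializing and counting any dummy output.
    data.foreachPartition(buildDictForPartition _)
  }
}
{code}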



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (KYLIN-4916) ShardingReadRDD's partition number was set to shardNum,causing empty spark tasks

2021-02-25 Thread Zhichao Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/KYLIN-4916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhichao  Zhang closed KYLIN-4916.
-

> ShardingReadRDD's partition number was set to shardNum,causing empty spark 
> tasks 
> -
>
> Key: KYLIN-4916
> URL: https://issues.apache.org/jira/browse/KYLIN-4916
> Project: Kylin
>  Issue Type: Improvement
>Reporter: ShengJun Zheng
>Priority: Major
> Fix For: v4.0.0-beta
>
>
> When creating the ShardingReadRDD, the created FileScanRDD's partition 
> number was set to the shard number, causing too many empty tasks when the 
> shard number is big.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (KYLIN-4916) ShardingReadRDD's partition number was set to shardNum,causing empty spark tasks

2021-02-25 Thread Zhichao Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/KYLIN-4916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhichao  Zhang resolved KYLIN-4916.
---
Fix Version/s: v4.0.0-beta
   Resolution: Won't Fix

> ShardingReadRDD's partition number was set to shardNum,causing empty spark 
> tasks 
> -
>
> Key: KYLIN-4916
> URL: https://issues.apache.org/jira/browse/KYLIN-4916
> Project: Kylin
>  Issue Type: Improvement
>Reporter: ShengJun Zheng
>Priority: Major
> Fix For: v4.0.0-beta
>
>
> When creating the ShardingReadRDD, the created FileScanRDD's partition 
> number was set to the shard number, causing too many empty tasks when the 
> shard number is big.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KYLIN-4916) ShardingReadRDD's partition number was set to shardNum,causing empty spark tasks

2021-02-24 Thread Zhichao Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/KYLIN-4916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17290638#comment-17290638
 ] 

Zhichao  Zhang commented on KYLIN-4916:
---

This is expected behavior, similar to Spark's bucketing feature.

I have now added a parameter 'kylin.query.spark-engine.max-sharding-size-mb' to 
control the data size for each task; if the data size exceeds this value, the 
query will fall back to a non-sharding RDD.
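A rough Scala sketch of that fallback rule (the parameter name comes from the
comment above; the size arithmetic and function shape are assumptions, not the
actual Kylin code):

{code:scala}
object ShardingFallbackSketch {
  def useShardingRead(totalBytes: Long,
                      shardNum: Int,
                      maxShardingSizeMb: Long): Boolean = {
    val maxBytesPerShard = maxShardingSizeMb * 1024L * 1024L
    // If an average shard (one task) would read more than the configured
    // limit, fall back to the normal non-sharding RDD with smaller splits.
    totalBytes / math.max(shardNum, 1) <= maxBytesPerShard
  }
}
{code}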

> ShardingReadRDD's partition number was set to shardNum,causing empty spark 
> tasks 
> -
>
> Key: KYLIN-4916
> URL: https://issues.apache.org/jira/browse/KYLIN-4916
> Project: Kylin
>  Issue Type: Improvement
>Reporter: ShengJun Zheng
>Priority: Major
>
> When creating the ShardingReadRDD, the created FileScanRDD's partition 
> number was set to the shard number, causing too many empty tasks when the 
> shard number is big.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (KYLIN-4914) Failed to query "select * from {fact_table}" if a fact table used in two different cubes

2021-02-23 Thread Zhichao Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/KYLIN-4914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17289715#comment-17289715
 ] 

Zhichao  Zhang edited comment on KYLIN-4914 at 2/24/21, 7:02 AM:
-

'select * from \{fact_table}' queries detailed data rather than cube data, so 
it must throw a 'No model found for OLAPContext' exception. If you enable 
pushdown, the query will fetch data from an external datasource, for example 
Hive.

This behavior is expected.


was (Author: zzcclp):
'select * from \{fact_table}' queries detailed data rather than cube data, so 
it must throw a 'No model found for OLAPContext' exception. If you enable 
pushdown, the query will fetch data from an external datasource, for example 
Hive.

> Failed to query "select * from {fact_table}" if a fact table used  in two 
> different cubes
> -
>
> Key: KYLIN-4914
> URL: https://issues.apache.org/jira/browse/KYLIN-4914
> Project: Kylin
>  Issue Type: Bug
>  Components: Query Engine
>Affects Versions: v3.0.2
>Reporter: xue lin
>Priority: Major
>
> Steps to reproduce:
>  1. Create one model that uses only one fact table
>  2. Create two cubes on the same model with different dimensions and 
> measures: one cube's measures contain COUNT_DISTINCT (return type: bitmap), 
> the other cube's measures contain EXTENDED_COLUMN (return type: 
> extendedcolumn(100)). Build the 2 cubes.
>  3. Run the query "select * from {fact_table}" with the 2 cubes in ready 
> status; it fails with an exception message like
>  "
> No model found for OLAPContext, 
> CUBE_NOT_CONTAIN_ALL_COLUMN[1_8e4ee38:BOSS_DATABUS.MIRROR_DATABUS_SUBSCRIPTIONFEE.SUBSCRIBER_ID],
>  
> CUBE_NOT_CONTAIN_ALL_COLUMN[1_8e4ee38:BOSS_DATABUS.MIRROR_DATABUS_SUBSCRIPTIONFEE.SUBSCRIBER_STATUS_ID,
>  1_8e4ee38:BOSS_DATABUS.MIRROR_DATABUS_SUBSCRIPTIONFEE.PBO_SOURCE_TYPE_ID, 
> 1_8e4ee38:BOSS_DATABUS.MIRROR_DATABUS_SUBSCRIPTIONFEE.CUSTOMER_CREATE_CHANNEL_TYPE_NAME,
>  
> 1_8e4ee38:BOSS_DATABUS.MIRROR_DATABUS_SUBSCRIPTIONFEE.SUB_AR_RESOURCE_TYPE_NAME,
>  
> 1_8e4ee38:BOSS_DATABUS.MIRROR_DATABUS_SUBSCRIPTIONFEE.CUSTOMER_CREATE_CHANNEL_TYPE_ID,
>  1_8e4ee38:BOSS_DATABUS.MIRROR_DATABUS_SUBSCRIPTIONFEE.CUSTOMER_REGION, 
> 1_8e4ee38:BOSS_DATABUS.MIRROR_DATABUS_SUBSCRIPTIONFEE.CUSTOMER_COMPANY_NAME, 
> 1_8e4ee38:BOSS_DATABUS.MIRROR_DATABUS_SUBSCRIPTIONFEE.CUSTOMER_CREATE_CHANNEL_ID,
>  1_8e4ee38:BOSS_DATABUS.MIRROR_DATABUS_SUBSCRIPTIONFEE.CUSTOMER_SALE_AREA, 
> 1_8e4ee38:BOSS_DATABUS.MIRROR_DATABUS_SUBSCRIPTIONFEE.SUBSCRIBER_PROTOCOL_TYPE_ID,
>  1_8e4ee38:BOSS_DATABUS.MIRROR_DATABUS_SUBSCRIPTIONFEE.RATE_BILLING_CYCLE_ID, 
> 1_8e4ee38:BOSS_DATABUS.MIRROR_DATABUS_SUBSCRIPTIONFEE.MAIN_AR_RESOURCE_TYPE_NAME,
>  
> 1_8e4ee38:BOSS_DATABUS.MIRROR_DATABUS_SUBSCRIPTIONFEE.PBO_PRESENT_SOURCE_TYPE,
>  1_8e4ee38:BOSS_DATABUS.MIRROR_DATABUS_SUBSCRIPTIONFEE.ACTUAL_DAY_AMOUNT, 
> 1_8e4ee38:BOSS_DATABUS.MIRROR_DATABUS_SUBSCRIPTIONFEE.CUSTOMER_TYPE, 
> 1_8e4ee38:BOSS_DATABUS.MIRROR_DATABUS_SUBSCRIPTIONFEE.CUSTOMER_ADDRESS_ID, 
> 1_8e4ee38:BOSS_DATABUS.MIRROR_DATABUS_SUBSCRIPTIONFEE.CUSTOMER_STATUS_ID, 
> 1_8e4ee38:BOSS_DATABUS.MIRROR_DATABUS_SUBSCRIPTIONFEE.CUSTOMER_TYPE_ID, 
> 1_8e4ee38:BOSS_DATABUS.MIRROR_DATABUS_SUBSCRIPTIONFEE.SUBSCRIBER_PROTOCOL_TYPE,
>  1_8e4ee38:BOSS_DATABUS.MIRROR_DATABUS_SUBSCRIPTIONFEE.DAY_AMOUNT, 
> 1_8e4ee38:BOSS_DATABUS.MIRROR_DATABUS_SUBSCRIPTIONFEE.PBO_SOURCE_TYPE, 
> 1_8e4ee38:BOSS_DATABUS.MIRROR_DATABUS_SUBSCRIPTIONFEE.SUBSCRIBER_BUSINESS_NAME,
>  
> 1_8e4ee38:BOSS_DATABUS.MIRROR_DATABUS_SUBSCRIPTIONFEE.PBO_PRESENT_SOURCE_TYPE_ID,
>  
> 1_8e4ee38:BOSS_DATABUS.MIRROR_DATABUS_SUBSCRIPTIONFEE.CUSTOMER_CREATE_CHANNEL_NAME],
>  rel#2656421:OLAPTableScan.OLAP.[](table=[BOSS_DATABUS, 
> MIRROR_DATABUS_SUBSCRIPTIONFEE],ctx=,fields=[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 
> 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 
> 29, 30, 31, 32, 33]) while executing SQL: "select * from 
> MIRROR_DATABUS_SUBSCRIPTIONFEE limit 10"
>  
> this issue is similar but different with  
> https://issues.apache.org/jira/browse/KYLIN-4120



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KYLIN-4914) Failed to query "select * from {fact_table}" if a fact table used in two different cubes

2021-02-23 Thread Zhichao Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/KYLIN-4914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17289715#comment-17289715
 ] 

Zhichao  Zhang commented on KYLIN-4914:
---

'select * from \{fact_table}' queries detailed data rather than cube data, so 
it must throw a 'No model found for OLAPContext' exception. If you enable 
pushdown, the query will fetch data from an external datasource, for example 
Hive.

> Failed to query "select * from {fact_table}" if a fact table used  in two 
> different cubes
> -
>
> Key: KYLIN-4914
> URL: https://issues.apache.org/jira/browse/KYLIN-4914
> Project: Kylin
>  Issue Type: Bug
>  Components: Query Engine
>Affects Versions: v3.0.2
>Reporter: xue lin
>Priority: Major
>
> Steps to reproduce:
>  1. Create one model that uses only one fact table
>  2. Create two cubes on the same model with different dimensions and 
> measures: one cube's measures contain COUNT_DISTINCT (return type: bitmap), 
> the other cube's measures contain EXTENDED_COLUMN (return type: 
> extendedcolumn(100)). Build the 2 cubes.
>  3. Run the query "select * from {fact_table}" with the 2 cubes in ready 
> status; it fails with an exception message like
>  "
> No model found for OLAPContext, 
> CUBE_NOT_CONTAIN_ALL_COLUMN[1_8e4ee38:BOSS_DATABUS.MIRROR_DATABUS_SUBSCRIPTIONFEE.SUBSCRIBER_ID],
>  
> CUBE_NOT_CONTAIN_ALL_COLUMN[1_8e4ee38:BOSS_DATABUS.MIRROR_DATABUS_SUBSCRIPTIONFEE.SUBSCRIBER_STATUS_ID,
>  1_8e4ee38:BOSS_DATABUS.MIRROR_DATABUS_SUBSCRIPTIONFEE.PBO_SOURCE_TYPE_ID, 
> 1_8e4ee38:BOSS_DATABUS.MIRROR_DATABUS_SUBSCRIPTIONFEE.CUSTOMER_CREATE_CHANNEL_TYPE_NAME,
>  
> 1_8e4ee38:BOSS_DATABUS.MIRROR_DATABUS_SUBSCRIPTIONFEE.SUB_AR_RESOURCE_TYPE_NAME,
>  
> 1_8e4ee38:BOSS_DATABUS.MIRROR_DATABUS_SUBSCRIPTIONFEE.CUSTOMER_CREATE_CHANNEL_TYPE_ID,
>  1_8e4ee38:BOSS_DATABUS.MIRROR_DATABUS_SUBSCRIPTIONFEE.CUSTOMER_REGION, 
> 1_8e4ee38:BOSS_DATABUS.MIRROR_DATABUS_SUBSCRIPTIONFEE.CUSTOMER_COMPANY_NAME, 
> 1_8e4ee38:BOSS_DATABUS.MIRROR_DATABUS_SUBSCRIPTIONFEE.CUSTOMER_CREATE_CHANNEL_ID,
>  1_8e4ee38:BOSS_DATABUS.MIRROR_DATABUS_SUBSCRIPTIONFEE.CUSTOMER_SALE_AREA, 
> 1_8e4ee38:BOSS_DATABUS.MIRROR_DATABUS_SUBSCRIPTIONFEE.SUBSCRIBER_PROTOCOL_TYPE_ID,
>  1_8e4ee38:BOSS_DATABUS.MIRROR_DATABUS_SUBSCRIPTIONFEE.RATE_BILLING_CYCLE_ID, 
> 1_8e4ee38:BOSS_DATABUS.MIRROR_DATABUS_SUBSCRIPTIONFEE.MAIN_AR_RESOURCE_TYPE_NAME,
>  
> 1_8e4ee38:BOSS_DATABUS.MIRROR_DATABUS_SUBSCRIPTIONFEE.PBO_PRESENT_SOURCE_TYPE,
>  1_8e4ee38:BOSS_DATABUS.MIRROR_DATABUS_SUBSCRIPTIONFEE.ACTUAL_DAY_AMOUNT, 
> 1_8e4ee38:BOSS_DATABUS.MIRROR_DATABUS_SUBSCRIPTIONFEE.CUSTOMER_TYPE, 
> 1_8e4ee38:BOSS_DATABUS.MIRROR_DATABUS_SUBSCRIPTIONFEE.CUSTOMER_ADDRESS_ID, 
> 1_8e4ee38:BOSS_DATABUS.MIRROR_DATABUS_SUBSCRIPTIONFEE.CUSTOMER_STATUS_ID, 
> 1_8e4ee38:BOSS_DATABUS.MIRROR_DATABUS_SUBSCRIPTIONFEE.CUSTOMER_TYPE_ID, 
> 1_8e4ee38:BOSS_DATABUS.MIRROR_DATABUS_SUBSCRIPTIONFEE.SUBSCRIBER_PROTOCOL_TYPE,
>  1_8e4ee38:BOSS_DATABUS.MIRROR_DATABUS_SUBSCRIPTIONFEE.DAY_AMOUNT, 
> 1_8e4ee38:BOSS_DATABUS.MIRROR_DATABUS_SUBSCRIPTIONFEE.PBO_SOURCE_TYPE, 
> 1_8e4ee38:BOSS_DATABUS.MIRROR_DATABUS_SUBSCRIPTIONFEE.SUBSCRIBER_BUSINESS_NAME,
>  
> 1_8e4ee38:BOSS_DATABUS.MIRROR_DATABUS_SUBSCRIPTIONFEE.PBO_PRESENT_SOURCE_TYPE_ID,
>  
> 1_8e4ee38:BOSS_DATABUS.MIRROR_DATABUS_SUBSCRIPTIONFEE.CUSTOMER_CREATE_CHANNEL_NAME],
>  rel#2656421:OLAPTableScan.OLAP.[](table=[BOSS_DATABUS, 
> MIRROR_DATABUS_SUBSCRIPTIONFEE],ctx=,fields=[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 
> 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 
> 29, 30, 31, 32, 33]) while executing SQL: "select * from 
> MIRROR_DATABUS_SUBSCRIPTIONFEE limit 10"
>  
> this issue is similar but different with  
> https://issues.apache.org/jira/browse/KYLIN-4120



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KYLIN-4913) Update docker image for Kylin 4.0 Beta

2021-02-23 Thread Zhichao Zhang (Jira)
Zhichao  Zhang created KYLIN-4913:
-

 Summary: Update docker image for Kylin 4.0 Beta
 Key: KYLIN-4913
 URL: https://issues.apache.org/jira/browse/KYLIN-4913
 Project: Kylin
  Issue Type: Improvement
Affects Versions: v4.0.0-beta
Reporter: Zhichao  Zhang
Assignee: Zhichao  Zhang
 Fix For: v4.0.0-beta


Update docker image for Kylin 4.0 Beta



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (KYLIN-4910) Sparder URL is hardcoded to localhost

2021-02-22 Thread Zhichao Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/KYLIN-4910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhichao  Zhang reassigned KYLIN-4910:
-

Assignee: ShengJun Zheng

> Sparder URL is hardcoded to localhost 
> --
>
> Key: KYLIN-4910
> URL: https://issues.apache.org/jira/browse/KYLIN-4910
> Project: Kylin
>  Issue Type: Improvement
>  Components: Query Engine
>Affects Versions: v4.0.0-beta
>Reporter: ShengJun Zheng
>Assignee: ShengJun Zheng
>Priority: Minor
>
>  When the Spark master is set to local, the sparder URL is hardcoded to "localhost".



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (KYLIN-4908) Segment pruner support integer partition col in spark query engine

2021-02-22 Thread Zhichao Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/KYLIN-4908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhichao  Zhang reassigned KYLIN-4908:
-

Assignee: ShengJun Zheng

> Segment pruner support integer partition col in spark query engine
> --
>
> Key: KYLIN-4908
> URL: https://issues.apache.org/jira/browse/KYLIN-4908
> Project: Kylin
>  Issue Type: Improvement
>Reporter: ShengJun Zheng
>Assignee: ShengJun Zheng
>Priority: Major
>
> It's allowed to use an int/bigint partition column from a Hive table to 
> divide Kylin's segments, but the segment pruner doesn't support pruning 
> segments based on an integer-type partition column.
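A minimal Scala sketch of what integer-typed segment pruning would look like
(the Segment shape and field names are assumptions for illustration, not
Kylin's classes):

{code:scala}
object IntSegmentPruneSketch {
  final case class Segment(name: String, partitionMin: Long, partitionMax: Long)

  // Keep only segments whose [min, max] partition range could contain rows
  // matching an equality filter on the integer partition column.
  def prune(segments: Seq[Segment], filterValue: Long): Seq[Segment] =
    segments.filter(s => filterValue >= s.partitionMin && filterValue <= s.partitionMax)
}
{code}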



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KYLIN-4905) Support limit .. offset ... in spark query engine

2021-02-17 Thread Zhichao Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/KYLIN-4905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17286269#comment-17286269
 ] 

Zhichao  Zhang commented on KYLIN-4905:
---

Thanks [~zhengshengjun], please raise a PR.

> Support limit .. offset ... in spark query engine
> -
>
> Key: KYLIN-4905
> URL: https://issues.apache.org/jira/browse/KYLIN-4905
> Project: Kylin
>  Issue Type: New Feature
>  Components: Query Engine
>Affects Versions: v4.0.0-alpha
>Reporter: ShengJun Zheng
>Priority: Major
> Fix For: v4.0.0-GA
>
>
> When the top-level result offset clause of a query expression (ANSI SQL), 
> i.e. limit xxx offset xxx, is used in the Spark query engine, the limit will 
> not be pushed down into the Spark engine and the offset will not take 
> effect. This is incompatible with Kylin 2.x~3.x.
> After looking through the code, I found it's because Spark does not support 
> limit ... offset ... yet. There is a Spark issue in progress: 
> https://issues.apache.org/jira/browse/SPARK-28330, which was created in 2019 
> but is still in progress.
> So, should we support this feature temporarily in KYLIN? :
>    1. push the limit down to Spark
>    2. take the result from the starting offset in the KYLIN query server
>  
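A minimal Scala sketch of the two-step workaround proposed above
(illustrative, not Kylin code): push only the LIMIT down to Spark, then drop
the first offset rows on the query-server side.

{code:scala}
import org.apache.spark.sql.{DataFrame, Row}

object LimitOffsetSketch {
  def limitOffset(df: DataFrame, limit: Int, offset: Int): Array[Row] = {
    // Spark can execute limit(offset + limit); the offset itself is applied
    // after collecting, because Spark has no OFFSET support yet (SPARK-28330).
    df.limit(offset + limit).collect().drop(offset)
  }
}
{code}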



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KYLIN-4904) build cubes error

2021-02-17 Thread Zhichao Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/KYLIN-4904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17286256#comment-17286256
 ] 

Zhichao  Zhang commented on KYLIN-4904:
---

[~stayblank], the Spark version you are using is not the official Apache Spark 
release, right? According to the error message, it can't create the 
'YarnClusterManager' for YARN.

> build  cubes  error
> ---
>
> Key: KYLIN-4904
> URL: https://issues.apache.org/jira/browse/KYLIN-4904
> Project: Kylin
>  Issue Type: Bug
>  Components: Query Engine
>Affects Versions: v4.0.0-alpha
> Environment: ubuntu20
>Reporter: stayblank
>Priority: Major
> Attachments: image-2021-02-08-18-36-21-136.png, 
> image-2021-02-08-18-37-03-126.png, image-2021-02-08-18-38-10-276.png, 
> image-2021-02-08-18-41-11-708.png
>
>
> 1. Without adding `spark-yarn_2.11-2.4.8-SNAPSHOT.jar` to $kylin_home/lib, this error is reported:
> java.lang.reflect.InvocationTargetException
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.kylin.engine.spark.job.NSparkExecutable.runLocalMode(NSparkExecutable.java:389)
>   at 
> org.apache.kylin.engine.spark.job.NSparkExecutable.doWork(NSparkExecutable.java:153)
>   at 
> org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:205)
>   at 
> org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:94)
>   at 
> org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:205)
>   at 
> org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:114)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.RuntimeException: Error execute 
> org.apache.kylin.engine.spark.job.ResourceDetectBeforeCubingJob
>   at 
> org.apache.kylin.engine.spark.application.SparkApplication.execute(SparkApplication.java:92)
>   at 
> org.apache.kylin.engine.spark.job.ResourceDetectBeforeCubingJob.main(ResourceDetectBeforeCubingJob.java:100)
>   ... 13 more
> Caused by: org.apache.spark.SparkException: Could not parse Master URL: 'yarn'
>   at 
> org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:2784)
>   at org.apache.spark.SparkContext.(SparkContext.scala:493)
>   at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2520)
>   at 
> org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:930)
>   at 
> org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:921)
>   at scala.Option.getOrElse(Option.scala:121)
>   at 
> org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:921)
>   at 
> org.apache.kylin.engine.spark.application.SparkApplication.execute(SparkApplication.java:283)
>   at 
> org.apache.kylin.engine.spark.application.SparkApplication.execute(SparkApplication.java:89)
>   ... 14 more
>  
>  
>  
> 2. After adding `spark-yarn_2.11-2.4.8-SNAPSHOT.jar` to $kylin_home/lib, this error is reported:
> java.lang.reflect.InvocationTargetException
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.kylin.engine.spark.job.NSparkExecutable.runLocalMode(NSparkExecutable.java:389)
>   at 
> org.apache.kylin.engine.spark.job.NSparkExecutable.doWork(NSparkExecutable.java:153)
>   at 
> org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:205)
>   at 
> org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:94)
>   at 
> org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:205)
>   at 
> org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:114)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.ExceptionInInitializerError
>   at 
> 

[jira] [Commented] (KYLIN-4904) build cubes error

2021-02-17 Thread Zhichao Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/KYLIN-4904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17285708#comment-17285708
 ] 

Zhichao  Zhang commented on KYLIN-4904:
---

[~stayblank], can you show the properties you configured? The first step of the 
building job runs in local mode, so why does it show the error message "Could not 
parse Master URL: 'yarn'"?

BTW, why do you use the spark-2.4.8-SNAPSHOT version? On the kylin-on-parquet-v2 
branch we have updated Spark to 2.4.7, so you can try the latest version of 
Kylin on the kylin-on-parquet-v2 branch.
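
For reference, the build-engine Spark settings are usually passed through the 
kylin.engine.spark-conf.* prefix in $KYLIN_HOME/conf/kylin.properties. A minimal 
sketch of the entries worth double-checking (the values below are assumptions; 
adjust them to your cluster):

{code:java}
# Sketch only: properties commonly involved when the build job cannot resolve the
# 'yarn' master; the exact values depend on your environment.
kylin.engine.spark-conf.spark.master=yarn
kylin.engine.spark-conf.spark.submit.deployMode=cluster
{code}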

> build  cubes  error
> ---
>
> Key: KYLIN-4904
> URL: https://issues.apache.org/jira/browse/KYLIN-4904
> Project: Kylin
>  Issue Type: Bug
>  Components: Query Engine
>Affects Versions: v4.0.0-alpha
> Environment: ubuntu20
>Reporter: stayblank
>Priority: Major
> Attachments: image-2021-02-08-18-36-21-136.png, 
> image-2021-02-08-18-37-03-126.png, image-2021-02-08-18-38-10-276.png, 
> image-2021-02-08-18-41-11-708.png
>
>
> 1. Without adding `spark-yarn_2.11-2.4.8-SNAPSHOT.jar` to $kylin_home/lib, this error is reported:
> java.lang.reflect.InvocationTargetException
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.kylin.engine.spark.job.NSparkExecutable.runLocalMode(NSparkExecutable.java:389)
>   at 
> org.apache.kylin.engine.spark.job.NSparkExecutable.doWork(NSparkExecutable.java:153)
>   at 
> org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:205)
>   at 
> org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:94)
>   at 
> org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:205)
>   at 
> org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:114)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.RuntimeException: Error execute 
> org.apache.kylin.engine.spark.job.ResourceDetectBeforeCubingJob
>   at 
> org.apache.kylin.engine.spark.application.SparkApplication.execute(SparkApplication.java:92)
>   at 
> org.apache.kylin.engine.spark.job.ResourceDetectBeforeCubingJob.main(ResourceDetectBeforeCubingJob.java:100)
>   ... 13 more
> Caused by: org.apache.spark.SparkException: Could not parse Master URL: 'yarn'
>   at 
> org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:2784)
>   at org.apache.spark.SparkContext.<init>(SparkContext.scala:493)
>   at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2520)
>   at 
> org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:930)
>   at 
> org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:921)
>   at scala.Option.getOrElse(Option.scala:121)
>   at 
> org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:921)
>   at 
> org.apache.kylin.engine.spark.application.SparkApplication.execute(SparkApplication.java:283)
>   at 
> org.apache.kylin.engine.spark.application.SparkApplication.execute(SparkApplication.java:89)
>   ... 14 more
>  
>  
>  
> 2. After adding `spark-yarn_2.11-2.4.8-SNAPSHOT.jar` to $kylin_home/lib, this error is reported:
> java.lang.reflect.InvocationTargetException
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.kylin.engine.spark.job.NSparkExecutable.runLocalMode(NSparkExecutable.java:389)
>   at 
> org.apache.kylin.engine.spark.job.NSparkExecutable.doWork(NSparkExecutable.java:153)
>   at 
> org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:205)
>   at 
> org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:94)
>   at 
> org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:205)
>   at 
> org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:114)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> 

[jira] [Resolved] (KYLIN-4846) Set the related query id to sparder job description

2021-02-07 Thread Zhichao Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/KYLIN-4846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhichao  Zhang resolved KYLIN-4846.
---
Fix Version/s: v4.0.0-beta
   Resolution: Fixed

> Set the related query id to sparder job description
> ---
>
> Key: KYLIN-4846
> URL: https://issues.apache.org/jira/browse/KYLIN-4846
> Project: Kylin
>  Issue Type: New Feature
>Reporter: Zhichao  Zhang
>Assignee: Zhichao  Zhang
>Priority: Minor
> Fix For: v4.0.0-beta
>
>
> Set the related query id to sparder job description



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (KYLIN-4797) Correct inputRecordSizes of segment when there is no data in this segment

2021-02-07 Thread Zhichao Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/KYLIN-4797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhichao  Zhang resolved KYLIN-4797.
---
Fix Version/s: v4.0.0-alpha
   Resolution: Fixed

> Correct inputRecordSizes of segment when there is no data in this segment
> -
>
> Key: KYLIN-4797
> URL: https://issues.apache.org/jira/browse/KYLIN-4797
> Project: Kylin
>  Issue Type: Bug
>  Components: Spark Engine
>Affects Versions: v4.0.0-alpha
>Reporter: Zhichao  Zhang
>Assignee: Zhichao  Zhang
>Priority: Minor
> Fix For: v4.0.0-alpha
>
>
> When there is no inputRecord, need to set inputRecordSize to 0.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (KYLIN-4730) Add scan bytes metric to the query results

2021-02-07 Thread Zhichao Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/KYLIN-4730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhichao  Zhang resolved KYLIN-4730.
---
Fix Version/s: v2.6.6
   v4.0.0-alpha
   Resolution: Fixed

> Add scan bytes metric to the query results
> --
>
> Key: KYLIN-4730
> URL: https://issues.apache.org/jira/browse/KYLIN-4730
> Project: Kylin
>  Issue Type: Improvement
>  Components: Query Engine
>Affects Versions: v4.0.0-alpha
>Reporter: Zhichao  Zhang
>Assignee: Zhichao  Zhang
>Priority: Minor
> Fix For: v4.0.0-alpha, v2.6.6
>
>
> Add scan bytes metric to the query results



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KYLIN-4730) Add scan bytes metric to the query results

2021-02-07 Thread Zhichao Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/KYLIN-4730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhichao  Zhang updated KYLIN-4730:
--
Fix Version/s: (was: v2.6.6)

> Add scan bytes metric to the query results
> --
>
> Key: KYLIN-4730
> URL: https://issues.apache.org/jira/browse/KYLIN-4730
> Project: Kylin
>  Issue Type: Improvement
>  Components: Query Engine
>Affects Versions: v4.0.0-alpha
>Reporter: Zhichao  Zhang
>Assignee: Zhichao  Zhang
>Priority: Minor
> Fix For: v4.0.0-alpha
>
>
> Add scan bytes metric to the query results



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (KYLIN-4452) Kylin on Parquet with Docker

2021-02-07 Thread Zhichao Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/KYLIN-4452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhichao  Zhang resolved KYLIN-4452.
---
Resolution: Fixed

> Kylin on Parquet with Docker
> 
>
> Key: KYLIN-4452
> URL: https://issues.apache.org/jira/browse/KYLIN-4452
> Project: Kylin
>  Issue Type: New Feature
>  Components: Storage - Parquet
>Reporter: xuekaiqi
>Assignee: Zhichao  Zhang
>Priority: Minor
> Fix For: v4.0.0-alpha
>
>
> Since kylin can run independently of hadoop, containerized deployment is the 
> next step



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (KYLIN-4894) Upgrade Apache Spark version to 2.4.7

2021-02-07 Thread Zhichao Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/KYLIN-4894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhichao  Zhang resolved KYLIN-4894.
---
Fix Version/s: v4.0.0-GA
   Resolution: Fixed

> Upgrade Apache Spark version to 2.4.7
> -
>
> Key: KYLIN-4894
> URL: https://issues.apache.org/jira/browse/KYLIN-4894
> Project: Kylin
>  Issue Type: Improvement
>Reporter: Zhichao  Zhang
>Assignee: Zhichao  Zhang
>Priority: Minor
> Fix For: v4.0.0-GA
>
>
> Upgrade Apache Spark version to 2.4.7



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (KYLIN-4893) Optimize query performance when using shard by column

2021-02-07 Thread Zhichao Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/KYLIN-4893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhichao  Zhang resolved KYLIN-4893.
---
Fix Version/s: v4.0.0-GA
   Resolution: Fixed

> Optimize query performance when using shard by column
> -
>
> Key: KYLIN-4893
> URL: https://issues.apache.org/jira/browse/KYLIN-4893
> Project: Kylin
>  Issue Type: Improvement
>Reporter: Zhichao  Zhang
>Assignee: Zhichao  Zhang
>Priority: Minor
> Fix For: v4.0.0-GA
>
>
> Optimize query performance when using shard by column.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (KYLIN-4892) Reduce the times of fetching files status from HDFS in FilePruner

2021-02-07 Thread Zhichao Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/KYLIN-4892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhichao  Zhang resolved KYLIN-4892.
---
Resolution: Fixed

> Reduce the times of fetching files status from HDFS in FilePruner
> -
>
> Key: KYLIN-4892
> URL: https://issues.apache.org/jira/browse/KYLIN-4892
> Project: Kylin
>  Issue Type: Improvement
>  Components: Query Engine
>Reporter: Zhichao  Zhang
>Assignee: Zhichao  Zhang
>Priority: Minor
> Fix For: v4.0.0-GA
>
>
> Reduce the times of fetching files status from HDFS in FilePruner



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (KYLIN-4890) Use numSlices = 1 to reduce task num when executing sparder canary

2021-02-07 Thread Zhichao Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/KYLIN-4890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhichao  Zhang resolved KYLIN-4890.
---
Resolution: Fixed

> Use numSlices = 1 to reduce task num when executing sparder canary
> --
>
> Key: KYLIN-4890
> URL: https://issues.apache.org/jira/browse/KYLIN-4890
> Project: Kylin
>  Issue Type: Improvement
>  Components: Spark Engine
>Reporter: Zhichao  Zhang
>Assignee: Zhichao  Zhang
>Priority: Minor
> Fix For: v4.0.0-GA
>
>
> Use numSlices = 1 to reduce task num when executing sparder canary



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KYLIN-4894) Upgrade Apache Spark version to 2.4.7

2021-02-02 Thread Zhichao Zhang (Jira)
Zhichao  Zhang created KYLIN-4894:
-

 Summary: Upgrade Apache Spark version to 2.4.7
 Key: KYLIN-4894
 URL: https://issues.apache.org/jira/browse/KYLIN-4894
 Project: Kylin
  Issue Type: Improvement
Reporter: Zhichao  Zhang
Assignee: Zhichao  Zhang


Upgrade Apache Spark version to 2.4.7



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KYLIN-4893) Optimize query performance when using shard by column

2021-02-02 Thread Zhichao Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/KYLIN-4893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhichao  Zhang updated KYLIN-4893:
--
Description: Optimize query performance when using shard by column.

> Optimize query performance when using shard by column
> -
>
> Key: KYLIN-4893
> URL: https://issues.apache.org/jira/browse/KYLIN-4893
> Project: Kylin
>  Issue Type: Improvement
>Reporter: Zhichao  Zhang
>Assignee: Zhichao  Zhang
>Priority: Minor
>
> Optimize query performance when using shard by column.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KYLIN-4893) Optimize query performance when using shard by column

2021-02-02 Thread Zhichao Zhang (Jira)
Zhichao  Zhang created KYLIN-4893:
-

 Summary: Optimize query performance when using shard by column
 Key: KYLIN-4893
 URL: https://issues.apache.org/jira/browse/KYLIN-4893
 Project: Kylin
  Issue Type: Improvement
Reporter: Zhichao  Zhang
Assignee: Zhichao  Zhang






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (KYLIN-4891) Set the default value of 'kylin.query.spark-engine.expose-sharding-trait' to false

2021-02-02 Thread Zhichao Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/KYLIN-4891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhichao  Zhang resolved KYLIN-4891.
---
Resolution: Won't Fix

Won't fix; another approach will be used to optimize.

> Set the default value of 'kylin.query.spark-engine.expose-sharding-trait' to 
> false
> --
>
> Key: KYLIN-4891
> URL: https://issues.apache.org/jira/browse/KYLIN-4891
> Project: Kylin
>  Issue Type: Improvement
>  Components: Query Engine
>Reporter: Zhichao  Zhang
>Assignee: Zhichao  Zhang
>Priority: Minor
> Fix For: v4.0.0-GA
>
>
> Set the default value of 'kylin.query.spark-engine.expose-sharding-trait' to 
> false



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KYLIN-4889) Query error when spark engine in local mode

2021-01-29 Thread Zhichao Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/KYLIN-4889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhichao  Zhang updated KYLIN-4889:
--
Fix Version/s: (was: v4.0.0-beta)

> Query error when spark engine in local mode
> ---
>
> Key: KYLIN-4889
> URL: https://issues.apache.org/jira/browse/KYLIN-4889
> Project: Kylin
>  Issue Type: Bug
>Affects Versions: v4.0.0-alpha
>Reporter: Feng Zhu
>Assignee: Feng Zhu
>Priority: Major
> Fix For: v4.0.0-GA
>
>
> When I query with the spark engine in local mode, with -Dspark.local=true, the 
> spark application was still submitted to yarn, and the following error 
> occurred:
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in 
> stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 
> (TID 6, sandbox.hortonworks.com, executor 1): java.lang.ClassCastException: 
> cannot assign instance of scala.collection.immutable.List$SerializationProxy 
> to field org.apache.spark.rdd.RDD.org$apache$spark$rdd$RDD$$dependencies_ of 
> type scala.collection.Seq in instance of 
> org.apache.spark.rdd.MapPartitionsRDD at 
> java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2233)
>  at java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1405) 
> at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2291) 
> at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2209) at 
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2067) at 
> java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1571) at 
> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2285) at 
> java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2209) at 
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2067) at 
> java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1571) at 
> java.io.ObjectInputStream.readObject(ObjectInputStream.java:431) at 
> org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:75)
>  at 
> org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:114)
>  at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:88) at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55) at 
> org.apache.spark.scheduler.Task.run(Task.scala:123) at 
> org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
>  at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360) at 
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414) at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748) Driver stacktrace: while executing 
> SQL: "select * from (select KYLIN_SALES.PART_DT , sum(KYLIN_SALES.PRICE ) 
> from KYLIN_SALES group by KYLIN_SALES.PART_DT union select 
> KYLIN_SALES.PART_DT , max(KYLIN_SALES.PRICE ) from KYLIN_SALES group by 
> KYLIN_SALES.PART_DT union select KYLIN_SALES.PART_DT , count(*) from 
> KYLIN_SALES group by KYLIN_SALES.PART_DT union select KYLIN_SALES.PART_DT , 
> count(distinct KYLIN_SALES.PRICE) from KYLIN_SALES group by 
> KYLIN_SALES.PART_DT) limit 501"



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KYLIN-4889) Query error when spark engine in local mode

2021-01-29 Thread Zhichao Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/KYLIN-4889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhichao  Zhang updated KYLIN-4889:
--
Fix Version/s: v4.0.0-GA

> Query error when spark engine in local mode
> ---
>
> Key: KYLIN-4889
> URL: https://issues.apache.org/jira/browse/KYLIN-4889
> Project: Kylin
>  Issue Type: Bug
>Affects Versions: v4.0.0-alpha
>Reporter: Feng Zhu
>Assignee: Feng Zhu
>Priority: Major
> Fix For: v4.0.0-beta, v4.0.0-GA
>
>
> When I query with the spark engine in local mode, with -Dspark.local=true, the 
> spark application was still submitted to yarn, and the following error 
> occurred:
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in 
> stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 
> (TID 6, sandbox.hortonworks.com, executor 1): java.lang.ClassCastException: 
> cannot assign instance of scala.collection.immutable.List$SerializationProxy 
> to field org.apache.spark.rdd.RDD.org$apache$spark$rdd$RDD$$dependencies_ of 
> type scala.collection.Seq in instance of 
> org.apache.spark.rdd.MapPartitionsRDD at 
> java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2233)
>  at java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1405) 
> at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2291) 
> at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2209) at 
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2067) at 
> java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1571) at 
> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2285) at 
> java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2209) at 
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2067) at 
> java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1571) at 
> java.io.ObjectInputStream.readObject(ObjectInputStream.java:431) at 
> org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:75)
>  at 
> org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:114)
>  at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:88) at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55) at 
> org.apache.spark.scheduler.Task.run(Task.scala:123) at 
> org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
>  at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360) at 
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414) at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748) Driver stacktrace: while executing 
> SQL: "select * from (select KYLIN_SALES.PART_DT , sum(KYLIN_SALES.PRICE ) 
> from KYLIN_SALES group by KYLIN_SALES.PART_DT union select 
> KYLIN_SALES.PART_DT , max(KYLIN_SALES.PRICE ) from KYLIN_SALES group by 
> KYLIN_SALES.PART_DT union select KYLIN_SALES.PART_DT , count(*) from 
> KYLIN_SALES group by KYLIN_SALES.PART_DT union select KYLIN_SALES.PART_DT , 
> count(distinct KYLIN_SALES.PRICE) from KYLIN_SALES group by 
> KYLIN_SALES.PART_DT) limit 501"



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KYLIN-4892) Reduce the times of fetching files status from HDFS in FilePruner

2021-01-28 Thread Zhichao Zhang (Jira)
Zhichao  Zhang created KYLIN-4892:
-

 Summary: Reduce the times of fetching files status from HDFS in 
FilePruner
 Key: KYLIN-4892
 URL: https://issues.apache.org/jira/browse/KYLIN-4892
 Project: Kylin
  Issue Type: Improvement
  Components: Query Engine
Reporter: Zhichao  Zhang
Assignee: Zhichao  Zhang
 Fix For: v4.0.0-GA


Reduce the times of fetching files status from HDFS in FilePruner



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KYLIN-4891) Set the default value of 'kylin.query.spark-engine.expose-sharding-trait' to false

2021-01-28 Thread Zhichao Zhang (Jira)
Zhichao  Zhang created KYLIN-4891:
-

 Summary: Set the default value of 
'kylin.query.spark-engine.expose-sharding-trait' to false
 Key: KYLIN-4891
 URL: https://issues.apache.org/jira/browse/KYLIN-4891
 Project: Kylin
  Issue Type: Improvement
  Components: Query Engine
Reporter: Zhichao  Zhang
Assignee: Zhichao  Zhang
 Fix For: v4.0.0-GA


Set the default value of 'kylin.query.spark-engine.expose-sharding-trait' to 
false



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KYLIN-4890) Use numSlices = 1 to reduce task num when executing sparder canary

2021-01-28 Thread Zhichao Zhang (Jira)
Zhichao  Zhang created KYLIN-4890:
-

 Summary: Use numSlices = 1 to reduce task num when executing 
sparder canary
 Key: KYLIN-4890
 URL: https://issues.apache.org/jira/browse/KYLIN-4890
 Project: Kylin
  Issue Type: Improvement
  Components: Spark Engine
Reporter: Zhichao  Zhang
Assignee: Zhichao  Zhang
 Fix For: v4.0.0-GA


Use numSlices = 1 to reduce task num when executing sparder canary



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KYLIN-4877) Use all dimension columns as sort columns when saving cuboid data

2021-01-18 Thread Zhichao Zhang (Jira)
Zhichao  Zhang created KYLIN-4877:
-

 Summary: Use all dimension columns as sort columns when saving 
cuboid data
 Key: KYLIN-4877
 URL: https://issues.apache.org/jira/browse/KYLIN-4877
 Project: Kylin
  Issue Type: Improvement
  Components: Spark Engine
Affects Versions: v4.0.0-alpha
Reporter: Zhichao  Zhang
Assignee: Zhichao  Zhang
 Fix For: v4.0.0-beta


Use all dimension columns as sort columns when saving cuboid data.

This change will reduce the size of cuboid data (due to Parquet's RLE encoding).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KYLIN-4875) Remove executor configurations when execute resource detect step (local mode)

2021-01-17 Thread Zhichao Zhang (Jira)
Zhichao  Zhang created KYLIN-4875:
-

 Summary: Remove executor configurations when execute resource 
detect step (local mode)
 Key: KYLIN-4875
 URL: https://issues.apache.org/jira/browse/KYLIN-4875
 Project: Kylin
  Issue Type: Improvement
  Components: Spark Engine
Affects Versions: v4.0.0-alpha
Reporter: Zhichao  Zhang
Assignee: Zhichao  Zhang
 Fix For: v4.0.0-beta


Remove executor configurations when execute resource detect step (local mode)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (KYLIN-4872) Fix NPE when there are more than one segment if cube planner is open

2021-01-14 Thread Zhichao Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/KYLIN-4872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhichao  Zhang resolved KYLIN-4872.
---
Resolution: Fixed

> Fix NPE when there are more than one segment if cube planner is open
> 
>
> Key: KYLIN-4872
> URL: https://issues.apache.org/jira/browse/KYLIN-4872
> Project: Kylin
>  Issue Type: Bug
>  Components: Spark Engine
>Affects Versions: v4.0.0-alpha
>Reporter: Zhichao  Zhang
>Assignee: Zhichao  Zhang
>Priority: Minor
> Fix For: v4.0.0-beta
>
>
> Fix NPE when there are more than one segment if cube planner is open



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KYLIN-4872) Fix NPE when there are more than one segment if cube planner is open

2021-01-13 Thread Zhichao Zhang (Jira)
Zhichao  Zhang created KYLIN-4872:
-

 Summary: Fix NPE when there are more than one segment if cube 
planner is open
 Key: KYLIN-4872
 URL: https://issues.apache.org/jira/browse/KYLIN-4872
 Project: Kylin
  Issue Type: Bug
  Components: Spark Engine
Affects Versions: v4.0.0-alpha
Reporter: Zhichao  Zhang
Assignee: Zhichao  Zhang
 Fix For: v4.0.0-beta


Fix NPE when there are more than one segment if cube planner is open



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KYLIN-4864) Support building and testing Kylin on ARM64 architecture platform

2021-01-10 Thread Zhichao Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/KYLIN-4864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17262346#comment-17262346
 ] 

Zhichao  Zhang commented on KYLIN-4864:
---

[~seanlau], sounds good. Have you already made Kylin support the ARM64 platform, 
or have you run into problems? You are welcome to raise a PR to support this 
feature.

> Support building and testing Kylin on ARM64 architecture platform
> -
>
> Key: KYLIN-4864
> URL: https://issues.apache.org/jira/browse/KYLIN-4864
> Project: Kylin
>  Issue Type: Improvement
>Reporter: liusheng
>Priority: Major
>
> Currently, many software projects support running on the ARM64 platform. 
> We have also put a lot of effort into making big-data projects support the 
> ARM64 platform. 
> For example, Hadoop has published ARM64-specific packages:
> [https://www.apache.org/dyn/closer.cgi/hadoop/common/hadoop-3.3.0/hadoop-3.3.0-aarch64.tar.gz]
>  
> and also has an ARM-specific CI job configured:
> [https://ci-hadoop.apache.org/job/Hive-trunk-linux-ARM/]
> It would be better to also enable ARM support and set up ARM CI for the Kylin 
> project



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KYLIN-4849) Support sum(case when...), sum(2*price+1), count(column) and more for Kylin 4

2020-12-28 Thread Zhichao Zhang (Jira)
Zhichao  Zhang created KYLIN-4849:
-

 Summary: Support sum(case when...), sum(2*price+1), count(column) 
and more for Kylin 4
 Key: KYLIN-4849
 URL: https://issues.apache.org/jira/browse/KYLIN-4849
 Project: Kylin
  Issue Type: New Feature
  Components: Query Engine
Affects Versions: v4.0.0-alpha
Reporter: Zhichao  Zhang
Assignee: Zhichao  Zhang
 Fix For: v4.0.0-beta


Support sum(case when...), sum(2*price+1), count(column) and more

Please refer to KYLIN-3358.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KYLIN-4846) Set the related query id to sparder job description

2020-12-20 Thread Zhichao Zhang (Jira)
Zhichao  Zhang created KYLIN-4846:
-

 Summary: Set the related query id to sparder job description
 Key: KYLIN-4846
 URL: https://issues.apache.org/jira/browse/KYLIN-4846
 Project: Kylin
  Issue Type: New Feature
Reporter: Zhichao  Zhang
Assignee: Zhichao  Zhang


Set the related query id to sparder job description



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (KYLIN-4828) Add more sql test cases into NBuildAndQueryTest

2020-12-18 Thread Zhichao Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/KYLIN-4828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhichao  Zhang resolved KYLIN-4828.
---
Resolution: Fixed

> Add more sql test cases into NBuildAndQueryTest
> ---
>
> Key: KYLIN-4828
> URL: https://issues.apache.org/jira/browse/KYLIN-4828
> Project: Kylin
>  Issue Type: Improvement
>  Components: Query Engine
>Reporter: Zhichao  Zhang
>Assignee: Zhichao  Zhang
>Priority: Minor
> Fix For: v4.0.0-beta
>
>
> Add more sql test cases into NBuildAndQueryTest



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Reopened] (KYLIN-4828) Add more sql test cases into NBuildAndQueryTest

2020-12-17 Thread Zhichao Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/KYLIN-4828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhichao  Zhang reopened KYLIN-4828:
---

There are some unsupported SQLs left.

> Add more sql test cases into NBuildAndQueryTest
> ---
>
> Key: KYLIN-4828
> URL: https://issues.apache.org/jira/browse/KYLIN-4828
> Project: Kylin
>  Issue Type: Improvement
>  Components: Query Engine
>Reporter: Zhichao  Zhang
>Assignee: Zhichao  Zhang
>Priority: Minor
> Fix For: v4.0.0-beta
>
>
> Add more sql test cases into NBuildAndQueryTest



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (KYLIN-4844) Add lookup table duplicate key check when building job

2020-12-17 Thread Zhichao Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/KYLIN-4844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhichao  Zhang resolved KYLIN-4844.
---
Resolution: Fixed

Please see : [https://github.com/apache/kylin/pull/1514] 

> Add lookup table duplicate key check when building job
> --
>
> Key: KYLIN-4844
> URL: https://issues.apache.org/jira/browse/KYLIN-4844
> Project: Kylin
>  Issue Type: Improvement
>Affects Versions: v4.0.0-alpha
>Reporter: Yaqian Zhang
>Assignee: Yaqian Zhang
>Priority: Major
> Fix For: v4.0.0-beta
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (KYLIN-4817) Refine Cube Migration Tool for Kylin4

2020-12-17 Thread Zhichao Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/KYLIN-4817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhichao  Zhang resolved KYLIN-4817.
---
Resolution: Fixed

> Refine Cube Migration Tool for Kylin4
> -
>
> Key: KYLIN-4817
> URL: https://issues.apache.org/jira/browse/KYLIN-4817
> Project: Kylin
>  Issue Type: Improvement
>  Components: Client - CLI
>Reporter: Xiaoxiang Yu
>Assignee: Yaqian Zhang
>Priority: Major
> Fix For: v4.0.0-beta
>
>
> - Collect and analyse all cube migration tools used in the current Kylin.
> - Verify whether they work in Kylin 4; if not, make them work.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (KYLIN-4813) Refine spark logger for Kylin 4 build engine

2020-12-17 Thread Zhichao Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/KYLIN-4813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhichao  Zhang resolved KYLIN-4813.
---
Resolution: Fixed

> Refine spark logger for Kylin 4 build engine
> 
>
> Key: KYLIN-4813
> URL: https://issues.apache.org/jira/browse/KYLIN-4813
> Project: Kylin
>  Issue Type: Improvement
>Affects Versions: v4.0.0-alpha
>Reporter: Xiaoxiang Yu
>Assignee: Yaqian Zhang
>Priority: Major
> Fix For: v4.0.0-beta
>
>
> - Separate spark log from kylin log
> - Store driver/executor log into HDFS.
> - Provide an API to view the driver log.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (KYLIN-4820) Can not auto set spark resources configurations when building cube

2020-12-17 Thread Zhichao Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/KYLIN-4820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhichao  Zhang resolved KYLIN-4820.
---
Resolution: Fixed

> Can not auto set spark resources configurations when building cube
> --
>
> Key: KYLIN-4820
> URL: https://issues.apache.org/jira/browse/KYLIN-4820
> Project: Kylin
>  Issue Type: Bug
>  Components: Spark Engine
>Affects Versions: v4.0.0-alpha
>Reporter: Zhichao  Zhang
>Assignee: Zhichao  Zhang
>Priority: Minor
> Fix For: v4.0.0-beta
>
>
> Currently there are some spark resources configurations set in the 
> kylin-default.properties, so these configurations will override the ones set 
> by Kylin automatically.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (KYLIN-4822) The metrics 'Total spark scan time' of query log is negative in some cases

2020-12-17 Thread Zhichao Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/KYLIN-4822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhichao  Zhang resolved KYLIN-4822.
---
Resolution: Fixed

> The metrics 'Total spark scan time' of query log is negative in some cases
> --
>
> Key: KYLIN-4822
> URL: https://issues.apache.org/jira/browse/KYLIN-4822
> Project: Kylin
>  Issue Type: Bug
>  Components: Query Engine
>Affects Versions: v4.0.0-alpha
>Reporter: Zhichao  Zhang
>Assignee: Zhichao  Zhang
>Priority: Minor
> Fix For: v4.0.0-beta
>
>
> The metrics 'Total spark scan time' of query log is negative in some cases.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (KYLIN-4824) The metric 'Total scan bytes' of 'Query Log' is always 0 when querying

2020-12-17 Thread Zhichao Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/KYLIN-4824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhichao  Zhang resolved KYLIN-4824.
---
Resolution: Fixed

> The metric 'Total scan bytes' of 'Query Log' is always 0 when querying
> --
>
> Key: KYLIN-4824
> URL: https://issues.apache.org/jira/browse/KYLIN-4824
> Project: Kylin
>  Issue Type: Bug
>  Components: Query Engine
>Affects Versions: v4.0.0-alpha
>Reporter: Zhichao  Zhang
>Assignee: Zhichao  Zhang
>Priority: Minor
> Fix For: v4.0.0-beta
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (KYLIN-4829) Support to use thread-level SparkSession to execute query

2020-12-17 Thread Zhichao Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/KYLIN-4829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhichao  Zhang resolved KYLIN-4829.
---
Resolution: Fixed

> Support to use thread-level SparkSession to execute query 
> --
>
> Key: KYLIN-4829
> URL: https://issues.apache.org/jira/browse/KYLIN-4829
> Project: Kylin
>  Issue Type: Improvement
>  Components: Query Engine, Spark Engine
>Reporter: Zhichao  Zhang
>Assignee: Zhichao  Zhang
>Priority: Minor
> Fix For: v4.0.0-beta
>
>
> Currently, when executing a query, it is impossible to configure proper 
> parameters for each query according to the data that will be scanned, such as 
> spark.sql.shuffle.partitions; this impacts query performance.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (KYLIN-4843) Support INTERSECT_COUNT/INTERSECT_VALUE function for Kylin 4

2020-12-17 Thread Zhichao Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/KYLIN-4843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhichao  Zhang resolved KYLIN-4843.
---
Resolution: Fixed

> Support INTERSECT_COUNT/INTERSECT_VALUE function for Kylin 4
> 
>
> Key: KYLIN-4843
> URL: https://issues.apache.org/jira/browse/KYLIN-4843
> Project: Kylin
>  Issue Type: New Feature
>  Components: Query Engine
>Affects Versions: v4.0.0-alpha
>Reporter: Zhichao  Zhang
>Assignee: Zhichao  Zhang
>Priority: Minor
> Fix For: v4.0.0-beta
>
>
> Support INTERSECT_COUNT/INTERSECT_VALUE function for Kylin 4



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KYLIN-4843) Support INTERSECT_COUNT/INTERSECT_VALUE function for Kylin 4

2020-12-15 Thread Zhichao Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/KYLIN-4843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhichao  Zhang updated KYLIN-4843:
--
Description: Support INTERSECT_COUNT/INTERSECT_VALUE function for Kylin 4  
(was: Support INTERSECT_VALUE function for Kylin 4)

> Support INTERSECT_COUNT/INTERSECT_VALUE function for Kylin 4
> 
>
> Key: KYLIN-4843
> URL: https://issues.apache.org/jira/browse/KYLIN-4843
> Project: Kylin
>  Issue Type: New Feature
>  Components: Query Engine
>Affects Versions: v4.0.0-alpha
>Reporter: Zhichao  Zhang
>Assignee: Zhichao  Zhang
>Priority: Minor
> Fix For: v4.0.0-beta
>
>
> Support INTERSECT_COUNT/INTERSECT_VALUE function for Kylin 4



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KYLIN-4843) Support INTERSECT_COUNT/INTERSECT_VALUE function for Kylin 4

2020-12-15 Thread Zhichao Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/KYLIN-4843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhichao  Zhang updated KYLIN-4843:
--
Summary: Support INTERSECT_COUNT/INTERSECT_VALUE function for Kylin 4  
(was: Support INTERSECT_VALUE function for Kylin 4)

> Support INTERSECT_COUNT/INTERSECT_VALUE function for Kylin 4
> 
>
> Key: KYLIN-4843
> URL: https://issues.apache.org/jira/browse/KYLIN-4843
> Project: Kylin
>  Issue Type: New Feature
>  Components: Query Engine
>Affects Versions: v4.0.0-alpha
>Reporter: Zhichao  Zhang
>Assignee: Zhichao  Zhang
>Priority: Minor
> Fix For: v4.0.0-beta
>
>
> Support INTERSECT_VALUE function for Kylin 4



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KYLIN-4843) Support INTERSECT_VALUE function for Kylin 4

2020-12-15 Thread Zhichao Zhang (Jira)
Zhichao  Zhang created KYLIN-4843:
-

 Summary: Support INTERSECT_VALUE function for Kylin 4
 Key: KYLIN-4843
 URL: https://issues.apache.org/jira/browse/KYLIN-4843
 Project: Kylin
  Issue Type: New Feature
  Components: Query Engine
Affects Versions: v4.0.0-alpha
Reporter: Zhichao  Zhang
Assignee: Zhichao  Zhang
 Fix For: v4.0.0-beta


Support INTERSECT_VALUE function for Kylin 4



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KYLIN-4842) Supports grouping sets function for Kylin 4

2020-12-14 Thread Zhichao Zhang (Jira)
Zhichao  Zhang created KYLIN-4842:
-

 Summary: Supports grouping sets function for Kylin 4
 Key: KYLIN-4842
 URL: https://issues.apache.org/jira/browse/KYLIN-4842
 Project: Kylin
  Issue Type: New Feature
  Components: Query Engine
Affects Versions: v4.0.0-alpha
Reporter: Zhichao  Zhang
Assignee: Zhichao  Zhang
 Fix For: v4.0.0-beta


Currently Kylin 4 cannot support the grouping sets function, because it doesn't 
transform the Calcite grouping sets node to Spark GroupingSets.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KYLIN-4840) When pushdown is enabled, execute sql which includes subquery will be pushdowned

2020-12-11 Thread Zhichao Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/KYLIN-4840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhichao  Zhang updated KYLIN-4840:
--
Description: 
When pushdown is enabled, execute sql which includes subquery will be 
pushdowned.

  

For example:
{code:java}
SELECT t1.week_beg_dt, t1.sum_price, t1.lstg_site_id
FROM (
 select KYLIN_CAL_DT.week_beg_dt, sum(price) as sum_price, lstg_site_id
 from KYLIN_SALES
 inner JOIN KYLIN_CAL_DT as KYLIN_CAL_DT
 ON KYLIN_SALES.PART_DT = KYLIN_CAL_DT.cal_dt
 inner JOIN kylin_category_groupings
 ON KYLIN_SALES.leaf_categ_id = kylin_category_groupings.leaf_categ_id AND 
KYLIN_SALES.lstg_site_id = kylin_category_groupings.site_id
 group by KYLIN_CAL_DT.week_beg_dt, lstg_site_id
) t1
inner JOIN KYLIN_CAL_DT as KYLIN_CAL_DT
on t1.week_beg_dt = KYLIN_CAL_DT.week_beg_dt{code}
 

 

  was:
When pushdown is enabled, execute sql which includes subquery will be 
pushdowned.

 

 

For example:
{code:java}
SELECT t1.week_beg_dt, t1.sum_price, t1.lstg_site_id
FROM (
 select KYLIN_CAL_DT.week_beg_dt, sum(price) as sum_price, lstg_site_id
 from KYLIN_SALES
 inner JOIN KYLIN_CAL_DT as KYLIN_CAL_DT
 ON KYLIN_SALES.PART_DT = KYLIN_CAL_DT.cal_dt
 inner JOIN kylin_category_groupings
 ON KYLIN_SALES.leaf_categ_id = kylin_category_groupings.leaf_categ_id AND 
KYLIN_SALES.lstg_site_id = kylin_category_groupings.site_id
 group by KYLIN_CAL_DT.week_beg_dt, lstg_site_id
) t1
inner JOIN KYLIN_CAL_DT as KYLIN_CAL_DT
on t1.week_beg_dt = KYLIN_CAL_DT.week_beg_dt{code}
 

 


> When pushdown is enabled, execute sql which includes subquery will be 
> pushdowned
> 
>
> Key: KYLIN-4840
> URL: https://issues.apache.org/jira/browse/KYLIN-4840
> Project: Kylin
>  Issue Type: Bug
>  Components: Query Engine
>Affects Versions: v4.0.0-alpha, v3.1.1
>Reporter: Zhichao  Zhang
>Assignee: Zhichao  Zhang
>Priority: Minor
> Fix For: v4.0.0-beta, v3.1.2
>
>
> When pushdown is enabled, execute sql which includes subquery will be 
> pushdowned.
>   
> For example:
> {code:java}
> SELECT t1.week_beg_dt, t1.sum_price, t1.lstg_site_id
> FROM (
>  select KYLIN_CAL_DT.week_beg_dt, sum(price) as sum_price, lstg_site_id
>  from KYLIN_SALES
>  inner JOIN KYLIN_CAL_DT as KYLIN_CAL_DT
>  ON KYLIN_SALES.PART_DT = KYLIN_CAL_DT.cal_dt
>  inner JOIN kylin_category_groupings
>  ON KYLIN_SALES.leaf_categ_id = kylin_category_groupings.leaf_categ_id AND 
> KYLIN_SALES.lstg_site_id = kylin_category_groupings.site_id
>  group by KYLIN_CAL_DT.week_beg_dt, lstg_site_id
> ) t1
> inner JOIN KYLIN_CAL_DT as KYLIN_CAL_DT
> on t1.week_beg_dt = KYLIN_CAL_DT.week_beg_dt{code}
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KYLIN-4841) Spark RDD cache is invalid when building with spark engine

2020-12-11 Thread Zhichao Zhang (Jira)
Zhichao  Zhang created KYLIN-4841:
-

 Summary: Spark RDD cache is invalid when building with spark engine
 Key: KYLIN-4841
 URL: https://issues.apache.org/jira/browse/KYLIN-4841
 Project: Kylin
  Issue Type: Bug
  Components: Spark Engine
Affects Versions: v3.1.1
Reporter: Zhichao  Zhang
Assignee: Zhichao  Zhang
 Fix For: v3.1.2


Spark RDD cache is invalid when building with spark engine



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KYLIN-4840) When pushdown is enabled, execute sql which includes subquery will be pushdowned

2020-12-11 Thread Zhichao Zhang (Jira)
Zhichao  Zhang created KYLIN-4840:
-

 Summary: When pushdown is enabled, execute sql which includes 
subquery will be pushdowned
 Key: KYLIN-4840
 URL: https://issues.apache.org/jira/browse/KYLIN-4840
 Project: Kylin
  Issue Type: Bug
  Components: Query Engine
Affects Versions: v3.1.1, v4.0.0-alpha
Reporter: Zhichao  Zhang
Assignee: Zhichao  Zhang
 Fix For: v4.0.0-beta, v3.1.2


When pushdown is enabled, execute sql which includes subquery will be 
pushdowned.

 

 

For example:
{code:java}
SELECT t1.week_beg_dt, t1.sum_price, t1.lstg_site_id
FROM (
 select KYLIN_CAL_DT.week_beg_dt, sum(price) as sum_price, lstg_site_id
 from KYLIN_SALES
 inner JOIN KYLIN_CAL_DT as KYLIN_CAL_DT
 ON KYLIN_SALES.PART_DT = KYLIN_CAL_DT.cal_dt
 inner JOIN kylin_category_groupings
 ON KYLIN_SALES.leaf_categ_id = kylin_category_groupings.leaf_categ_id AND 
KYLIN_SALES.lstg_site_id = kylin_category_groupings.site_id
 group by KYLIN_CAL_DT.week_beg_dt, lstg_site_id
) t1
inner JOIN KYLIN_CAL_DT as KYLIN_CAL_DT
on t1.week_beg_dt = KYLIN_CAL_DT.week_beg_dt{code}
 

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (KYLIN-4828) Add more sql test cases into NBuildAndQueryTest

2020-12-06 Thread Zhichao Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/KYLIN-4828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhichao  Zhang resolved KYLIN-4828.
---
Resolution: Fixed

Done

> Add more sql test cases into NBuildAndQueryTest
> ---
>
> Key: KYLIN-4828
> URL: https://issues.apache.org/jira/browse/KYLIN-4828
> Project: Kylin
>  Issue Type: Improvement
>  Components: Query Engine
>Reporter: Zhichao  Zhang
>Assignee: Zhichao  Zhang
>Priority: Minor
> Fix For: v4.0.0-beta
>
>
> Add more sql test cases into NBuildAndQueryTest



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KYLIN-4829) Support to use thread-level SparkSession to execute query

2020-11-29 Thread Zhichao Zhang (Jira)
Zhichao  Zhang created KYLIN-4829:
-

 Summary: Support to use thread-level SparkSession to execute query 
 Key: KYLIN-4829
 URL: https://issues.apache.org/jira/browse/KYLIN-4829
 Project: Kylin
  Issue Type: Improvement
  Components: Query Engine, Spark Engine
Reporter: Zhichao  Zhang
Assignee: Zhichao  Zhang
 Fix For: v4.0.0-beta


Currently, when executing a query, it is impossible to configure proper 
parameters for each query according to the data that will be scanned, such as 
spark.sql.shuffle.partitions; this impacts query performance.
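
A minimal sketch of the thread-level SparkSession idea (editor's illustration of 
the approach, not the actual Kylin implementation; the class name and config 
values below are assumptions):

{code:java}
import org.apache.spark.sql.SparkSession;

public class ThreadLevelSessionSketch {
    public static void main(String[] args) {
        // One shared SparkContext for the whole query server.
        SparkSession shared = SparkSession.builder()
                .master("local[*]")
                .appName("sparder-sketch")
                .getOrCreate();

        Runnable query = () -> {
            // newSession() shares the SparkContext but isolates the SQL conf,
            // so spark.sql.shuffle.partitions can be tuned per query/thread.
            SparkSession session = shared.newSession();
            session.conf().set("spark.sql.shuffle.partitions", "4");
            session.sql("SELECT 1 AS probe").show();
        };

        new Thread(query).start();
    }
}
{code}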



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KYLIN-4828) Add more sql test cases into NBuildAndQueryTest

2020-11-26 Thread Zhichao Zhang (Jira)
Zhichao  Zhang created KYLIN-4828:
-

 Summary: Add more sql test cases into NBuildAndQueryTest
 Key: KYLIN-4828
 URL: https://issues.apache.org/jira/browse/KYLIN-4828
 Project: Kylin
  Issue Type: Improvement
  Components: Query Engine
Reporter: Zhichao  Zhang
Assignee: Zhichao  Zhang
 Fix For: v4.0.0-beta


Add more sql test cases into NBuildAndQueryTest



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KYLIN-4824) The metric 'Total scan bytes' of 'Query Log' is always 0 when querying

2020-11-22 Thread Zhichao Zhang (Jira)
Zhichao  Zhang created KYLIN-4824:
-

 Summary: The metric 'Total scan bytes' of 'Query Log' is always 0 
when querying
 Key: KYLIN-4824
 URL: https://issues.apache.org/jira/browse/KYLIN-4824
 Project: Kylin
  Issue Type: Bug
  Components: Query Engine
Affects Versions: v4.0.0-alpha
Reporter: Zhichao  Zhang
Assignee: Zhichao  Zhang
 Fix For: v4.0.0-beta






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (KYLIN-4738) The order in the returned result is wrong when use window function to query in kylin

2020-11-20 Thread Zhichao Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/KYLIN-4738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhichao  Zhang resolved KYLIN-4738.
---
Fix Version/s: v4.0.0-beta
   Resolution: Won't Fix

No need to fix.

> The order in the returned result is wrong when use window function to query 
> in kylin
> 
>
> Key: KYLIN-4738
> URL: https://issues.apache.org/jira/browse/KYLIN-4738
> Project: Kylin
>  Issue Type: Bug
>  Components: Query Engine
>Affects Versions: v4.0.0-alpha
>Reporter: Zhichao  Zhang
>Assignee: Zhichao  Zhang
>Priority: Minor
> Fix For: v4.0.0-beta
>
> Attachments: image-2020-09-02-12-15-27-097.png, 
> image-2020-09-02-12-16-47-699.png
>
>
> Use below sql to query in kylin:
> {code:java}
> // code placeholder
> SELECT PART_DT, LSTG_FORMAT_NAME, SUM(PRICE) AS GMV,
>   FIRST_VALUE(SUM(PRICE)) OVER(PARTITION BY LSTG_FORMAT_NAME ORDER BY PART_DT) AS "FIRST",
>   LAST_VALUE(SUM(PRICE)) OVER(PARTITION BY LSTG_FORMAT_NAME ORDER BY PART_DT) AS "CURRENT",
>   LAG(SUM(PRICE), 1, 0.0) OVER(PARTITION BY LSTG_FORMAT_NAME ORDER BY PART_DT) AS "PREV",
>   LEAD(SUM(PRICE), 1, 0.0) OVER(PARTITION BY LSTG_FORMAT_NAME ORDER BY PART_DT) AS "NEXT",
>   NTILE(4) OVER (PARTITION BY LSTG_FORMAT_NAME ORDER BY PART_DT) AS "QUARTER"
> FROM KYLIN_SALES
> INNER JOIN KYLIN_ACCOUNT as SELLER_ACCOUNT ON KYLIN_SALES.SELLER_ID = SELLER_ACCOUNT.ACCOUNT_ID
> INNER JOIN KYLIN_COUNTRY as SELLER_COUNTRY ON SELLER_ACCOUNT.ACCOUNT_COUNTRY = SELLER_COUNTRY.COUNTRY
> WHERE PART_DT >= '2012-12-30' and PART_DT < '2013-01-03' and SELLER_COUNTRY.COUNTRY in ('CN')
> GROUP BY PART_DT, LSTG_FORMAT_NAME
> ORDER BY PART_DT LIMIT 5;
> {code}
> the result  is:
> !image-2020-09-02-12-15-27-097.png!
>  
> and the expected result is :
> !image-2020-09-02-12-16-47-699.png!
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KYLIN-4822) The metrics 'Total spark scan time' of query log is negative in some cases

2020-11-17 Thread Zhichao Zhang (Jira)
Zhichao  Zhang created KYLIN-4822:
-

 Summary: The metrics 'Total spark scan time' of query log is 
negative in some cases
 Key: KYLIN-4822
 URL: https://issues.apache.org/jira/browse/KYLIN-4822
 Project: Kylin
  Issue Type: Bug
  Components: Query Engine
Affects Versions: v4.0.0-alpha
Reporter: Zhichao  Zhang
Assignee: Zhichao  Zhang
 Fix For: v4.0.0-beta


The metrics 'Total spark scan time' of query log is negative in some cases.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KYLIN-4820) Can not auto set spark resources configurations when building cube

2020-11-15 Thread Zhichao Zhang (Jira)
Zhichao  Zhang created KYLIN-4820:
-

 Summary: Can not auto set spark resources configurations when 
building cube
 Key: KYLIN-4820
 URL: https://issues.apache.org/jira/browse/KYLIN-4820
 Project: Kylin
  Issue Type: Bug
  Components: Spark Engine
Affects Versions: v4.0.0-alpha
Reporter: Zhichao  Zhang
Assignee: Zhichao  Zhang
 Fix For: v4.0.0-beta


Currently there are some spark resources configurations set in the 
kylin-default.properties, so these configurations will override the ones set by 
Kylin automatically.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KYLIN-4812) Create Dimension Dictionary With Spark Failed

2020-11-13 Thread Zhichao Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/KYLIN-4812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17231576#comment-17231576
 ] 

Zhichao  Zhang commented on KYLIN-4812:
---

The Spark version that Kylin uses doesn't support connecting to Hive 3.1.1; 
consider setting 'kylin.engine.spark-dimension-dictionary' and 
'kylin.engine.spark-udc-dictionary' to false.
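
A minimal kylin.properties sketch of that workaround (illustration only; the 
property names are the ones mentioned above):

{code:java}
# Workaround sketch: disable the Spark-based dictionary building steps.
kylin.engine.spark-dimension-dictionary=false
kylin.engine.spark-udc-dictionary=false
{code}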

> Create Dimension Dictionary With Spark Failed
> -
>
> Key: KYLIN-4812
> URL: https://issues.apache.org/jira/browse/KYLIN-4812
> Project: Kylin
>  Issue Type: Bug
>  Components: Spark Engine
>Affects Versions: v3.1.1
>Reporter: vincent zeng
>Priority: Major
>
> Hi, team. When set `kylin.engine.spark-dimension-dictionary=true`, step 
> `Build Dimension Dictionary with Spark` failed. 
> Error Log:
> {code:java}
> Driver stacktrace:
>   at 
> org.apache.kylin.common.util.AbstractApplication.execute(AbstractApplication.java:42)
>   at org.apache.kylin.common.util.SparkEntry.main(SparkEntry.java:44)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:684)
> Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: 
> Task 2 in stage 0.0 failed 4 times, most recent failure: Lost task 2.3 in 
> stage 0.0 (TID 8, emr-worker-2.cluster-46685, executor 3): 
> java.lang.NullPointerException
>   at org.apache.kylin.common.KylinConfig.getManager(KylinConfig.java:462)
>   at org.apache.kylin.cube.CubeManager.getInstance(CubeManager.java:106)
>   at 
> org.apache.kylin.engine.spark.SparkBuildDictionary$DimensionDictsBuildFunction.init(SparkBuildDictionary.java:246)
>   at 
> org.apache.kylin.engine.spark.SparkBuildDictionary$DimensionDictsBuildFunction.call(SparkBuildDictionary.java:257)
>   at 
> org.apache.kylin.engine.spark.SparkBuildDictionary$DimensionDictsBuildFunction.call(SparkBuildDictionary.java:219)
>   at 
> org.apache.spark.api.java.JavaPairRDD$$anonfun$pairFunToScalaFun$1.apply(JavaPairRDD.scala:1043)
>   at 
> org.apache.spark.api.java.JavaPairRDD$$anonfun$pairFunToScalaFun$1.apply(JavaPairRDD.scala:1043)
>   at scala.collection.Iterator$$anon$11.next(Iterator.scala:410)
>   at scala.collection.Iterator$class.foreach(Iterator.scala:891)
>   at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
>   at 
> scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59)
>   at 
> scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:104)
>   at 
> scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:48)
>   at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:310)
>   at scala.collection.AbstractIterator.to(Iterator.scala:1334)
>   at 
> scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:302)
>   at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1334)
>   at 
> scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:289)
>   at scala.collection.AbstractIterator.toArray(Iterator.scala:1334)
>   at 
> org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$13.apply(RDD.scala:945)
>   at 
> org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$13.apply(RDD.scala:945)
>   at 
> org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2101)
>   at 
> org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2101)
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
>   at org.apache.spark.scheduler.Task.run(Task.scala:121)
>   at 
> org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
>   at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (KYLIN-4737) The precision in the returned result is different from the one by Spark SQL

2020-11-12 Thread Zhichao Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/KYLIN-4737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhichao  Zhang resolved KYLIN-4737.
---
Fix Version/s: v4.0.0-beta
   Resolution: Won't Fix

The root cause of this issue is that the algorithm used to calculate the 
'percentile' values in Kylin 4.0 is different from the one used by Spark SQL, so 
there is a small difference between the results.

> The precision in the returned result is different from the one by Spark SQL
> ---
>
> Key: KYLIN-4737
> URL: https://issues.apache.org/jira/browse/KYLIN-4737
> Project: Kylin
>  Issue Type: Bug
>  Components: Query Engine
>Affects Versions: v4.0.0-alpha
>Reporter: Zhichao  Zhang
>Assignee: Zhichao  Zhang
>Priority: Minor
> Fix For: v4.0.0-beta
>
> Attachments: image-2020-09-02-12-07-18-076.png, 
> image-2020-09-02-12-07-49-492.png
>
>
> The precision in the returned result is different from the one by Spark SQL, 
> for example:
> the result from kylin:
> !image-2020-09-02-12-07-18-076.png!
> the result from SparkSQL:
> !image-2020-09-02-12-07-49-492.png!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KYLIN-4812) Create Dimension Dictionary With Spark Failed

2020-11-11 Thread Zhichao Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/KYLIN-4812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17230397#comment-17230397
 ] 

Zhichao  Zhang commented on KYLIN-4812:
---

[~vincentzeng], which versions of Hadoop and Hive did you use?

> Create Dimension Dictionary With Spark Failed
> -
>
> Key: KYLIN-4812
> URL: https://issues.apache.org/jira/browse/KYLIN-4812
> Project: Kylin
>  Issue Type: Bug
>  Components: Spark Engine
>Affects Versions: v3.1.1
>Reporter: vincent zeng
>Priority: Major
>
> Hi, team. When set `kylin.engine.spark-dimension-dictionary=true`, step 
> `Build Dimension Dictionary with Spark` failed. 
> Error Log:
> {code:java}
> Driver stacktrace:
>   at 
> org.apache.kylin.common.util.AbstractApplication.execute(AbstractApplication.java:42)
>   at org.apache.kylin.common.util.SparkEntry.main(SparkEntry.java:44)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:684)
> Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: 
> Task 2 in stage 0.0 failed 4 times, most recent failure: Lost task 2.3 in 
> stage 0.0 (TID 8, emr-worker-2.cluster-46685, executor 3): 
> java.lang.NullPointerException
>   at org.apache.kylin.common.KylinConfig.getManager(KylinConfig.java:462)
>   at org.apache.kylin.cube.CubeManager.getInstance(CubeManager.java:106)
>   at 
> org.apache.kylin.engine.spark.SparkBuildDictionary$DimensionDictsBuildFunction.init(SparkBuildDictionary.java:246)
>   at 
> org.apache.kylin.engine.spark.SparkBuildDictionary$DimensionDictsBuildFunction.call(SparkBuildDictionary.java:257)
>   at 
> org.apache.kylin.engine.spark.SparkBuildDictionary$DimensionDictsBuildFunction.call(SparkBuildDictionary.java:219)
>   at 
> org.apache.spark.api.java.JavaPairRDD$$anonfun$pairFunToScalaFun$1.apply(JavaPairRDD.scala:1043)
>   at 
> org.apache.spark.api.java.JavaPairRDD$$anonfun$pairFunToScalaFun$1.apply(JavaPairRDD.scala:1043)
>   at scala.collection.Iterator$$anon$11.next(Iterator.scala:410)
>   at scala.collection.Iterator$class.foreach(Iterator.scala:891)
>   at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
>   at 
> scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59)
>   at 
> scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:104)
>   at 
> scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:48)
>   at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:310)
>   at scala.collection.AbstractIterator.to(Iterator.scala:1334)
>   at 
> scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:302)
>   at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1334)
>   at 
> scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:289)
>   at scala.collection.AbstractIterator.toArray(Iterator.scala:1334)
>   at 
> org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$13.apply(RDD.scala:945)
>   at 
> org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$13.apply(RDD.scala:945)
>   at 
> org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2101)
>   at 
> org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2101)
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
>   at org.apache.spark.scheduler.Task.run(Task.scala:121)
>   at 
> org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
>   at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (KYLIN-4811) Support cube level configuration for BuildingJob

2020-11-08 Thread Zhichao Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/KYLIN-4811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhichao  Zhang resolved KYLIN-4811.
---
Resolution: Fixed

Done

> Support cube level configuration for BuildingJob
> 
>
> Key: KYLIN-4811
> URL: https://issues.apache.org/jira/browse/KYLIN-4811
> Project: Kylin
>  Issue Type: Improvement
>  Components: Job Engine
>Affects Versions: v4.0.0-alpha
>Reporter: Xiaoxiang Yu
>Assignee: Xiaoxiang Yu
>Priority: Major
> Fix For: v4.0.0-beta
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (KYLIN-4811) Support cube level configuration for BuildingJob

2020-11-08 Thread Zhichao Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/KYLIN-4811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhichao  Zhang reassigned KYLIN-4811:
-

Assignee: Xiaoxiang Yu

> Support cube level configuration for BuildingJob
> 
>
> Key: KYLIN-4811
> URL: https://issues.apache.org/jira/browse/KYLIN-4811
> Project: Kylin
>  Issue Type: Improvement
>  Components: Job Engine
>Affects Versions: v4.0.0-alpha
>Reporter: Xiaoxiang Yu
>Assignee: Xiaoxiang Yu
>Priority: Major
> Fix For: v4.0.0-beta
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KYLIN-4762) Optimize join where there is the same shardby partition num on join key

2020-10-28 Thread Zhichao Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/KYLIN-4762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17222000#comment-17222000
 ] 

Zhichao  Zhang commented on KYLIN-4762:
---

[PR1463|https://github.com/apache/kylin/pull/1463] only implements this 
optimization on the Kylin side; some changes on the Spark side still need to be 
implemented.
Currently, on the Spark side, 'F__KYLIN_SALES_PART_DTXXX' is used as the 
aggregation key, while FileScan uses '17#0' (which equals the shard-by column) 
as the partition key. Although 'F__KYLIN_SALES_PART_DTXXX' is an alias of 
'17#0', the two are not 'semanticEquals', so EnsureRequirements on the Spark 
side will still add an exchange operator. I will raise another PR to solve this.
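
A minimal Catalyst sketch of this mismatch (the column names are taken from the 
plan above and are only illustrative):

{code:scala}
import org.apache.spark.sql.catalyst.expressions.{Alias, AttributeReference}
import org.apache.spark.sql.types.StringType

// FileScan reports its output partitioning on the shard-by attribute itself
// (e.g. '17#0' in the plan), while the aggregate requires a distribution on an
// Alias over that attribute; the alias gets its own exprId, so semanticEquals
// returns false and EnsureRequirements still inserts an Exchange.
val shardByCol = AttributeReference("PART_DT", StringType)()
val aggKey = Alias(shardByCol, "F__KYLIN_SALES_PART_DTXXX")()

println(shardByCol.semanticEquals(aggKey.toAttribute)) // false
{code}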

> Optimize join where there is the same shardby partition num on join key
> ---
>
> Key: KYLIN-4762
> URL: https://issues.apache.org/jira/browse/KYLIN-4762
> Project: Kylin
>  Issue Type: Improvement
>  Components: Query Engine
>Affects Versions: v4.0.0-beta
>Reporter: Zhichao  Zhang
>Assignee: Zhichao  Zhang
>Priority: Minor
> Attachments: shardby_join.png
>
>
> Optimize join by reducing shuffle when there is the same shard by partition 
> number on join key.
> When execute this sql,
> {code:java}
> // code placeholder
> select m.seller_id, m.part_dt, sum(m.price) as s 
> from kylin_sales m 
> left join (
>   select m1.part_dt as pd, count(distinct m1.SELLER_ID) as m1, count(1) as m2 
>  
>   from kylin_sales m1
>   where m1.part_dt = '2012-01-05'
>   group by m1.part_dt 
>   ) j 
>   on m.part_dt = j.pd
>   where m.lstg_format_name = 'FP-GTC' 
>   and m.part_dt = '2012-01-05'
>   group by m.seller_id, m.part_dt limit 100;
> {code}
> the execution plan is shown below:
> !shardby_join.png!
> Since the join key part_dt has the same shard-by partition number on both 
> sides, the join can be optimized to reduce shuffle, similar to a bucket join.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KYLIN-4797) Correct inputRecordSizes of segment when there is no data in this segment

2020-10-22 Thread Zhichao Zhang (Jira)
Zhichao  Zhang created KYLIN-4797:
-

 Summary: Correct inputRecordSizes of segment when there is no data 
in this segment
 Key: KYLIN-4797
 URL: https://issues.apache.org/jira/browse/KYLIN-4797
 Project: Kylin
  Issue Type: Bug
  Components: Spark Engine
Affects Versions: v4.0.0-alpha
Reporter: Zhichao  Zhang
Assignee: Zhichao  Zhang


When there are no input records, the inputRecordSize needs to be set to 0.
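
A hypothetical helper (not the actual Kylin code) sketching the intended 
behaviour:

{code:scala}
// a segment that received no input records should also report an input record
// size of 0 instead of an undefined or stale byte count
def inputRecordSizeOf(inputRecordCount: Long, measuredBytes: Long): Long =
  if (inputRecordCount == 0L) 0L else measuredBytes
{code}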



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KYLIN-4776) Release Kylin v3.1.1

2020-10-19 Thread Zhichao Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/KYLIN-4776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17217273#comment-17217273
 ] 

Zhichao  Zhang commented on KYLIN-4776:
---

||Issue ID||Verified?||Documentation updated?||Others||
|KYLIN-4628| Yes| No need| No|
|KYLIN-4585| Yes| No need| No|
|KYLIN-4634| Yes| No need| No|
|KYLIN-4576| Yes| No need| No|

> Release Kylin v3.1.1
> 
>
> Key: KYLIN-4776
> URL: https://issues.apache.org/jira/browse/KYLIN-4776
> Project: Kylin
>  Issue Type: Test
>  Components: Release
>Affects Versions: v3.1.0
>Reporter: Xiaoxiang Yu
>Assignee: Xiaoxiang Yu
>Priority: Critical
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> h2. Release Plan for Kylin v3.1.1 
>  
> ||Key||Content||
> |Release Manager|Xiaoxiang Yu|
> |Voting Date|2020/10/15|
>  
>  
> h3. Issue List
> [https://issues.apache.org/jira/projects/KYLIN/versions/12348354]
> h3. Issue Verification Assignee
>  # Go to [https://issues.apache.org/jira/issues/?jql=], input the JQL, and you 
> will find the issues that you need to verify.
>  # After an issue is verified, please add a comment with a table showing the 
> result of each issue. Ask the RM for help if you face any trouble.
> ||Assignee ||Issue||Count||
> |Zhichao Zhang|project = 12316121 AND fixVersion = 12348354 and (assignee = 
> tianhui5 OR assignee = xxyu )|9|
> |Yaqian Zhang|project = 12316121 AND fixVersion = 12348354 and (assignee = 
> gxcheng  )|13|
> |Rupeng Wang|project = 12316121 AND fixVersion = 12348354 and (assignee = 
> itzhangqiang or assignee = zhangyaqian or assignee = zhangzc and assignee = 
> julianpan )|10|
> |Xiaoxiang Yu|project = 12316121 AND fixVersion = 12348354 and (assignee = 
> xiaoge )|14|
>  
> h3. Hadoop3 patch PR
> - Patch : [https://github.com/apache/kylin/pull/1434]
> - Verification at CDH 6.3: 
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KYLIN-4791) Throws exception 'UnsupportedOperationException: empty.reduceLeft' when there are cast expressions in the filters of FilePruner

2020-10-19 Thread Zhichao Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/KYLIN-4791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhichao  Zhang updated KYLIN-4791:
--
Description: 
When executing the 'pruneSegments' function of FilePruner, if there are cast 
expressions in the filter, it throws the exception 'UnsupportedOperationException: 
empty.reduceLeft'.

 

Solution:

Convert the cast expressions in the filter to attributes before translating the filter.
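
A minimal sketch of this workaround (the helper name is hypothetical and this is 
not the actual FilePruner code), assuming the filter is a Catalyst expression:

{code:scala}
import org.apache.spark.sql.catalyst.expressions.{Attribute, Cast, Expression}

// replace Cast(attr) nodes with the underlying attribute before the filter is
// translated, so the translated filter list does not end up empty (which is
// what triggers 'empty.reduceLeft')
def stripCastsOverAttributes(filter: Expression): Expression = filter.transform {
  case c: Cast if c.child.isInstanceOf[Attribute] => c.child
}
{code}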

  was:When execute function 'pruneSegments' of FilePruner, if there are some 
cast expressions in filter, it will throw exception 
'UnsupportedOperationException: empty.reduceLeft'.


> Throws exception 'UnsupportedOperationException: empty.reduceLeft' when there 
> are cast expressions in the filters of FilePruner
> ---
>
> Key: KYLIN-4791
> URL: https://issues.apache.org/jira/browse/KYLIN-4791
> Project: Kylin
>  Issue Type: Bug
>  Components: Query Engine, Spark Engine
>Affects Versions: v4.0.0-alpha
>Reporter: Zhichao  Zhang
>Assignee: Zhichao  Zhang
>Priority: Minor
> Fix For: v4.0.0-beta
>
>
> When executing the 'pruneSegments' function of FilePruner, if there are cast 
> expressions in the filter, it throws the exception 
> 'UnsupportedOperationException: empty.reduceLeft'.
>  
> Solution:
> Convert the cast expressions in the filter to attributes before translating 
> the filter.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

