[jira] [Assigned] (KYLIN-4906) support query/job server dynamic register and discovery

2021-02-07 Thread Xiaoxiang Yu (Jira)


 [ 
https://issues.apache.org/jira/browse/KYLIN-4906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoxiang Yu reassigned KYLIN-4906:
---

Assignee: ShengJun Zheng  (was: Yaqian Zhang)

> support query/job server dynamic register and discovery
> ---
>
> Key: KYLIN-4906
> URL: https://issues.apache.org/jira/browse/KYLIN-4906
> Project: Kylin
>  Issue Type: Improvement
>  Components: Integration
>Reporter: ShengJun Zheng
>Assignee: ShengJun Zheng
>Priority: Major
> Fix For: v4.0.0-GA
>
> Attachments: screenshot-1.png, screenshot-2.png
>
>
> Hi, we are currently troubled by a static configuration: 
> kylin.server.cluster-servers. There are two situations where it is unfriendly to us.
>   1. When adding a query node to the cluster, we forgot to update this 
> configuration on the job server. As a result, the newly added server was not 
> notified of all metadata update events, which caused wrong query results. 
>   2. We plan to deploy query servers in k8s, and this static configuration is not 
> cloud native (we would have to update a k8s ConfigMap).
> We use Kylin 2.x; this still seems to be a problem in Kylin 3.x (relevant code: 
> org.apache.kylin.metadata.cachesync.Broadcaster).
> I wonder whether I am wrong or whether there is already a solution.
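
The request above can be illustrated with a minimal sketch. It is not Kylin's implementation: the class `ServerRegistrySketch`, its methods, and the heartbeat TTL are all hypothetical, and a real deployment would presumably back the registry with a coordination service rather than an in-memory map. The point is only that broadcast targets are resolved when they are needed, instead of being read from a static list such as kylin.server.cluster-servers.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-memory registry (an assumption for illustration only).
final class ServerRegistrySketch {
    // host:port -> timestamp of the last heartbeat, in milliseconds
    private final Map<String, Long> lastHeartbeatMillis = new ConcurrentHashMap<>();
    private final long ttlMillis;

    ServerRegistrySketch(long ttlMillis) {
        this.ttlMillis = ttlMillis;
    }

    // A node calls this on startup and then periodically as a heartbeat.
    void register(String hostPort, long nowMillis) {
        lastHeartbeatMillis.put(hostPort, nowMillis);
    }

    // Targets are resolved at call time: newly registered nodes are included
    // without editing any configuration, and nodes with stale heartbeats drop out.
    List<String> liveServers(long nowMillis) {
        List<String> live = new ArrayList<>();
        for (Map.Entry<String, Long> e : lastHeartbeatMillis.entrySet()) {
            if (nowMillis - e.getValue() <= ttlMillis) {
                live.add(e.getKey());
            }
        }
        Collections.sort(live);
        return live;
    }

    public static void main(String[] args) {
        ServerRegistrySketch registry = new ServerRegistrySketch(30_000L);
        registry.register("job-1:7070", 0L);
        registry.register("query-1:7070", 0L);
        System.out.println(registry.liveServers(10_000L));
    }
}
```

With a registry like this, a newly added query node only has to register itself; the job server does not need a configuration change to start notifying it.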



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KYLIN-4906) support query/job server dynamic register and discovery

2021-02-07 Thread Xiaoxiang Yu (Jira)


 [ 
https://issues.apache.org/jira/browse/KYLIN-4906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoxiang Yu updated KYLIN-4906:

Affects Version/s: v4.0.0-beta

> support query/job server dynamic register and discovery
> ---
>
> Key: KYLIN-4906
> URL: https://issues.apache.org/jira/browse/KYLIN-4906
> Project: Kylin
>  Issue Type: Improvement
>  Components: Integration
>Affects Versions: v4.0.0-beta
>Reporter: ShengJun Zheng
>Assignee: ShengJun Zheng
>Priority: Major
> Fix For: v4.0.0-GA
>
> Attachments: screenshot-1.png, screenshot-2.png
>
>





[jira] [Updated] (KYLIN-4906) support query/job server dynamic register and discovery

2021-02-07 Thread Xiaoxiang Yu (Jira)


 [ 
https://issues.apache.org/jira/browse/KYLIN-4906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoxiang Yu updated KYLIN-4906:

Fix Version/s: v4.0.0-GA

> support query/job server dynamic register and discovery
> ---
>
> Key: KYLIN-4906
> URL: https://issues.apache.org/jira/browse/KYLIN-4906
> Project: Kylin
>  Issue Type: Improvement
>  Components: Integration
>Reporter: ShengJun Zheng
>Assignee: Yaqian Zhang
>Priority: Major
> Fix For: v4.0.0-GA
>
> Attachments: screenshot-1.png, screenshot-2.png
>
>





[jira] [Commented] (KYLIN-4906) support query/job server dynamic register and discovery

2021-02-07 Thread ShengJun Zheng (Jira)


[ 
https://issues.apache.org/jira/browse/KYLIN-4906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17280809#comment-17280809
 ] 

ShengJun Zheng commented on KYLIN-4906:
---

Thanks to [~zhangyaqian] for your time; the detailed answer is exactly what I need.  
By the way, this feature is not included in Kylin v4.

> support query/job server dynamic register and discovery
> ---
>
> Key: KYLIN-4906
> URL: https://issues.apache.org/jira/browse/KYLIN-4906
> Project: Kylin
>  Issue Type: Improvement
>  Components: Integration
>Reporter: ShengJun Zheng
>Assignee: Yaqian Zhang
>Priority: Major
> Attachments: screenshot-1.png, screenshot-2.png
>
>





[jira] [Commented] (KYLIN-4904) build cubes error

2021-02-07 Thread Xiaoxiang Yu (Jira)


[ 
https://issues.apache.org/jira/browse/KYLIN-4904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17280804#comment-17280804
 ] 

Xiaoxiang Yu commented on KYLIN-4904:
-

Spark 2.4.6

> build  cubes  error
> ---
>
> Key: KYLIN-4904
> URL: https://issues.apache.org/jira/browse/KYLIN-4904
> Project: Kylin
>  Issue Type: Bug
>  Components: Query Engine
>Affects Versions: v4.0.0-alpha
> Environment: ubuntu20
>Reporter: stayblank
>Priority: Major
>
> 1. Without adding `spark-yarn_2.11-2.4.8-SNAPSHOT.jar` to $kylin_home/lib, this error is reported:
> java.lang.reflect.InvocationTargetException
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at org.apache.kylin.engine.spark.job.NSparkExecutable.runLocalMode(NSparkExecutable.java:389)
>   at org.apache.kylin.engine.spark.job.NSparkExecutable.doWork(NSparkExecutable.java:153)
>   at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:205)
>   at org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:94)
>   at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:205)
>   at org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:114)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.RuntimeException: Error execute org.apache.kylin.engine.spark.job.ResourceDetectBeforeCubingJob
>   at org.apache.kylin.engine.spark.application.SparkApplication.execute(SparkApplication.java:92)
>   at org.apache.kylin.engine.spark.job.ResourceDetectBeforeCubingJob.main(ResourceDetectBeforeCubingJob.java:100)
>   ... 13 more
> Caused by: org.apache.spark.SparkException: Could not parse Master URL: 'yarn'
>   at org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:2784)
>   at org.apache.spark.SparkContext.<init>(SparkContext.scala:493)
>   at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2520)
>   at org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:930)
>   at org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:921)
>   at scala.Option.getOrElse(Option.scala:121)
>   at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:921)
>   at org.apache.kylin.engine.spark.application.SparkApplication.execute(SparkApplication.java:283)
>   at org.apache.kylin.engine.spark.application.SparkApplication.execute(SparkApplication.java:89)
>   ... 14 more
>  
>  
>  
> 2. After adding `spark-yarn_2.11-2.4.8-SNAPSHOT.jar` to $kylin_home/lib, this error is reported:
> java.lang.reflect.InvocationTargetException
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at org.apache.kylin.engine.spark.job.NSparkExecutable.runLocalMode(NSparkExecutable.java:389)
>   at org.apache.kylin.engine.spark.job.NSparkExecutable.doWork(NSparkExecutable.java:153)
>   at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:205)
>   at org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:94)
>   at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:205)
>   at org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:114)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.ExceptionInInitializerError
>   at org.apache.spark.scheduler.EventLoggingListener$.initEventLog(EventLoggingListener.scala:307)
>   at org.apache.spark.scheduler.EventLoggingListener.start(EventLoggingListener.scala:126)
>   at org.apache.spark.SparkContext.<init>(SparkContext.scala:523)
>   at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2520)
>   at 
> 

[jira] [Assigned] (KYLIN-4906) support query/job server dynamic register and discovery

2021-02-07 Thread Xiaoxiang Yu (Jira)


 [ 
https://issues.apache.org/jira/browse/KYLIN-4906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoxiang Yu reassigned KYLIN-4906:
---

Assignee: Yaqian Zhang

> support query/job server dynamic register and discovery
> ---
>
> Key: KYLIN-4906
> URL: https://issues.apache.org/jira/browse/KYLIN-4906
> Project: Kylin
>  Issue Type: Improvement
>  Components: Integration
>Reporter: ShengJun Zheng
>Assignee: Yaqian Zhang
>Priority: Major
> Attachments: screenshot-1.png, screenshot-2.png
>
>





[jira] [Commented] (KYLIN-4856) Website guide is not matched with ciwki about Job Scheduler

2021-02-07 Thread Xiaoxiang Yu (Jira)


[ 
https://issues.apache.org/jira/browse/KYLIN-4856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17280801#comment-17280801
 ] 

Xiaoxiang Yu commented on KYLIN-4856:
-

[~zhangyaqian], please help check whether we need to modify the wiki.

> Website guide is not matched with ciwki about Job Scheduler
> ---
>
> Key: KYLIN-4856
> URL: https://issues.apache.org/jira/browse/KYLIN-4856
> Project: Kylin
>  Issue Type: Bug
>  Components: Website
>Affects Versions: v3.1.1
>Reporter: yoonsung.lee
>Assignee: Yaqian Zhang
>Priority: Major
>
> Hi, I'm yoonsung.lee.
> I'm trying to apply distributed multiple job schedulers to my cluster.
> But the explanations of distributed multiple job schedulers differ 
> between the website and the cwiki.
> h3. Website
> The website ({color:red}Enable multiple job engines (HA){color} section in 
> {color:#0747A6}http://kylin.apache.org/docs/install/advance_settings.html{color})
>  says to configure 
> {code}
> kylin.job.scheduler.default=2
> kylin.job.lock=org.apache.kylin.storage.hbase.util.ZookeeperJobLock
> {code}
> for multiple job engines.
> h3. Cwiki
> The cwiki 
> ({color:#0747A6}https://cwiki.apache.org/confluence/display/KYLIN/Comparison+of+Kylin+Job+scheduler{color})
>  says the *DistributedScheduler* (which is selected by 
> *kylin.job.scheduler.default=2*) uses ZookeeperDistributedLock as its 
> implementation class.
> But the website says to set 
> *kylin.job.lock=org.apache.kylin.storage.hbase.util.ZookeeperJobLock*, which 
> the cwiki associates with DefaultScheduler.
> h1. Solution
> If the cwiki is right, the description on the website should be changed.
> If the website is right, the description on the cwiki should be changed.
> Please let me know if I have misunderstood how to set up distributed multiple job 
> schedulers.





[jira] [Assigned] (KYLIN-4856) Website guide is not matched with ciwki about Job Scheduler

2021-02-07 Thread Xiaoxiang Yu (Jira)


 [ 
https://issues.apache.org/jira/browse/KYLIN-4856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoxiang Yu reassigned KYLIN-4856:
---

Assignee: Yaqian Zhang  (was: yoonsung.lee)

> Website guide is not matched with ciwki about Job Scheduler
> ---
>
> Key: KYLIN-4856
> URL: https://issues.apache.org/jira/browse/KYLIN-4856
> Project: Kylin
>  Issue Type: Bug
>  Components: Website
>Affects Versions: v3.1.1
>Reporter: yoonsung.lee
>Assignee: Yaqian Zhang
>Priority: Major
>





[jira] [Resolved] (KYLIN-4859) Log4J reinitialized/reconfigured by Spark Logging

2021-02-07 Thread Xiaoxiang Yu (Jira)


 [ 
https://issues.apache.org/jira/browse/KYLIN-4859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoxiang Yu resolved KYLIN-4859.
-
Resolution: Fixed

> Log4J reinitialized/reconfigured by Spark Logging
> -
>
> Key: KYLIN-4859
> URL: https://issues.apache.org/jira/browse/KYLIN-4859
> Project: Kylin
>  Issue Type: Improvement
>  Components: Others
>Affects Versions: v4.0.0-alpha
>Reporter: ShengJun Zheng
>Assignee: ShengJun Zheng
>Priority: Minor
> Fix For: v4.0.0-beta
>
>
> Spark Logging was introduced in Kylin 4.0-alpha. It uses SLF4J too, but it 
> will reinitialize log4j when the root logger has no appenders.
> See: 
> https://github.com/apache/spark/blob/45e19bb99acd5066723fec2bbdc0c99c696c3daf/core/src/main/scala/org/apache/spark/internal/Logging.scala#L120
> This causes some logs of a query to be written to a file (kylin.log) and the others 
> to stdout (kylin.out), using the configuration file log4j-defaults.properties in 
> spark-core_2.11-2.x.x.jar.
> This is not friendly for reading and analyzing query performance, and may cause other 
> logging confusion.
> To avoid this, default appenders for the root logger should be set in 
> kylin-server-log4j.properties.
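
As an illustration of the proposal above, a root logger with at least one appender could look like the following log4j 1.x fragment. This is a sketch, not the change that was committed for this issue: the appender name `file`, the log path, and the pattern are assumptions. The appender and layout classes are standard log4j 1.2 classes.

```properties
# Assumed example: give the root logger an appender so that Spark's Logging
# trait finds log4j already initialized and does not load log4j-defaults.properties.
log4j.rootLogger=INFO,file

# "file" is a hypothetical appender name; the path below is also an assumption.
log4j.appender.file=org.apache.log4j.DailyRollingFileAppender
log4j.appender.file.File=logs/kylin.log
log4j.appender.file.layout=org.apache.log4j.PatternLayout
log4j.appender.file.layout.ConversionPattern=%d{ISO8601} %-5p [%t] %c{2}:%L : %m%n
```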





[jira] [Assigned] (KYLIN-4859) Log4J reinitialized/reconfigured by Spark Logging

2021-02-07 Thread Xiaoxiang Yu (Jira)


 [ 
https://issues.apache.org/jira/browse/KYLIN-4859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoxiang Yu reassigned KYLIN-4859:
---

Assignee: ShengJun Zheng

> Log4J reinitialized/reconfigured by Spark Logging
> -
>
> Key: KYLIN-4859
> URL: https://issues.apache.org/jira/browse/KYLIN-4859
> Project: Kylin
>  Issue Type: Improvement
>  Components: Others
>Affects Versions: v4.0.0-alpha
>Reporter: ShengJun Zheng
>Assignee: ShengJun Zheng
>Priority: Minor
> Fix For: v4.0.0-beta
>
>





[jira] [Updated] (KYLIN-4859) Log4J reinitialized/reconfigured by Spark Logging

2021-02-07 Thread Xiaoxiang Yu (Jira)


 [ 
https://issues.apache.org/jira/browse/KYLIN-4859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoxiang Yu updated KYLIN-4859:

Fix Version/s: v4.0.0-beta

> Log4J reinitialized/reconfigured by Spark Logging
> -
>
> Key: KYLIN-4859
> URL: https://issues.apache.org/jira/browse/KYLIN-4859
> Project: Kylin
>  Issue Type: Improvement
>  Components: Others
>Affects Versions: v4.0.0-alpha
>Reporter: ShengJun Zheng
>Priority: Minor
> Fix For: v4.0.0-beta
>
>





[jira] [Assigned] (KYLIN-4861) Wrong way to get CubeManager instance in CubeInstance.latestCopyForWrite()

2021-02-07 Thread Xiaoxiang Yu (Jira)


 [ 
https://issues.apache.org/jira/browse/KYLIN-4861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoxiang Yu reassigned KYLIN-4861:
---

Assignee: Xiaoxiang Yu

> Wrong way to get CubeManager instance in CubeInstance.latestCopyForWrite()
> --
>
> Key: KYLIN-4861
> URL: https://issues.apache.org/jira/browse/KYLIN-4861
> Project: Kylin
>  Issue Type: Bug
>Reporter: Zhong Yanghong
>Assignee: Xiaoxiang Yu
>Priority: Major
>
> Each cube can have its own KylinConfig. Then, for the following code:
> {code}
> public CubeInstance latestCopyForWrite() {
>     CubeManager mgr = CubeManager.getInstance(config);
>     CubeInstance latest = mgr.getCube(name); // in case this object is out-of-date
>     return mgr.copyForWrite(latest);
> }
> {code}
> each cube can get a different CubeManager instance, which may easily cause 
> map consistency issues.
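
The concern above can be reproduced in miniature. The sketch below assumes that a manager instance is cached per config object, which is how the report reads; `SketchConfig`, `SketchManager`, and the cube name are hypothetical stand-ins, not Kylin classes. It shows that an update made through the manager of one config is invisible to the manager obtained through another config.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for KylinConfig; equality is object identity.
final class SketchConfig {
    final String label;

    SketchConfig(String label) {
        this.label = label;
    }
}

// Hypothetical stand-in for CubeManager: one instance is cached per config object.
final class SketchManager {
    private static final Map<SketchConfig, SketchManager> CACHE = new ConcurrentHashMap<>();
    private final Map<String, Integer> cubeVersions = new HashMap<>();

    static SketchManager getInstance(SketchConfig config) {
        return CACHE.computeIfAbsent(config, c -> new SketchManager());
    }

    void update(String cube, int version) {
        cubeVersions.put(cube, version);
    }

    Integer version(String cube) {
        return cubeVersions.get(cube);
    }
}

final class ManagerPerConfigSketch {
    // Returns true when managers obtained through different configs disagree.
    static boolean mapsDiverge() {
        SketchConfig global = new SketchConfig("global");
        SketchConfig cubeLevel = new SketchConfig("cube-level override");
        SketchManager a = SketchManager.getInstance(global);
        SketchManager b = SketchManager.getInstance(cubeLevel);
        a.update("sales_cube", 2);
        // b never saw the update because it keeps its own map.
        return a != b && b.version("sales_cube") == null;
    }

    public static void main(String[] args) {
        System.out.println("maps diverge: " + mapsDiverge());
    }
}
```

Under that assumption, looking the manager up through one shared config would avoid the divergence; whether that is the right fix for Kylin is left to the issue discussion.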





[jira] [Updated] (KYLIN-4863) dependency cache script files not fully used

2021-02-07 Thread Xiaoxiang Yu (Jira)


 [ 
https://issues.apache.org/jira/browse/KYLIN-4863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoxiang Yu updated KYLIN-4863:

Fix Version/s: (was: Future)
   v3.1.2

> dependency cache script files not fully used
> 
>
> Key: KYLIN-4863
> URL: https://issues.apache.org/jira/browse/KYLIN-4863
> Project: Kylin
>  Issue Type: Improvement
>  Components: Environment 
>Affects Versions: v3.1.1
>Reporter: Huajie Wang
>Priority: Minor
> Fix For: v3.1.2
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> Kylin's startup-related scripts generate dependency cache scripts at the 
> first startup to speed up subsequent startups. It has been found that the 
> dependency cache scripts are not fully used: only some scripts use this 
> file. It is recommended to use this file in all scripts to speed up startup 
> further, and that the generated cache file name start with ".", so 
> that the dependency cache scripts are not directly visible.





[jira] [Resolved] (KYLIN-4880) Prepare release for Kylin 4.0.0-beta

2021-02-07 Thread Xiaoxiang Yu (Jira)


 [ 
https://issues.apache.org/jira/browse/KYLIN-4880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoxiang Yu resolved KYLIN-4880.
-
Fix Version/s: v4.0.0-beta
   Resolution: Fixed

> Prepare release for Kylin 4.0.0-beta
> 
>
> Key: KYLIN-4880
> URL: https://issues.apache.org/jira/browse/KYLIN-4880
> Project: Kylin
>  Issue Type: Task
>Reporter: Xiaoxiang Yu
>Assignee: Xiaoxiang Yu
>Priority: Major
> Fix For: v4.0.0-beta
>
>   Original Estimate: 504h
>  Remaining Estimate: 504h
>
> According to our plan 
> (https://cwiki.apache.org/confluence/display/KYLIN/2021+Dev+Plan), we are 
> going to release 4.0.0-beta in the next two to three weeks.
> Let me create an issue to record our progress, such as regression testing, 
> packaging, and documentation updates.





[jira] [Closed] (KYLIN-4885) "No dictionary found" after running MetadataCleanupJob

2021-02-07 Thread Xiaoxiang Yu (Jira)


 [ 
https://issues.apache.org/jira/browse/KYLIN-4885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoxiang Yu closed KYLIN-4885.
---
Resolution: Implemented

> "No dictionary found" after running MetadataCleanupJob
> --
>
> Key: KYLIN-4885
> URL: https://issues.apache.org/jira/browse/KYLIN-4885
> Project: Kylin
>  Issue Type: Bug
>Affects Versions: v2.3.0
>Reporter: vergilchiu
>Priority: Major
> Fix For: v3.1.2
>
>
> Some errors happened after I ran the MetadataCleanupJob script.
> -- here is the merge job error --
> java.lang.IllegalStateException: No resource found at -- /dict/DM_BA.V_KYLIN_DM_HIVEBOX_IN_AND_OUT_STATS/P_HOUR/461eb5df-7aa9-413f-9661-0ecfdd1d3498.dict
>   at org.apache.kylin.engine.mr.common.JobRelatedMetaUtil.dumpResources(JobRelatedMetaUtil.java:65)
>   at org.apache.kylin.engine.mr.common.AbstractHadoopJob.dumpKylinPropsAndMetadata(AbstractHadoopJob.java:574)
>   at org.apache.kylin.engine.mr.common.AbstractHadoopJob.attachCubeMetadataWithDict(AbstractHadoopJob.java:519)
>   at org.apache.kylin.engine.mr.steps.MergeCuboidJob.run(MergeCuboidJob.java:64)
>   at org.apache.kylin.engine.mr.common.MapReduceExecutable.doWork(MapReduceExecutable.java:130)
>   at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:162)
>   at org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:67)
>   at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:162)
>   at org.apache.kylin.job.impl.threadpool.DistributedScheduler$JobRunner.run(DistributedScheduler.java:158)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> result code:2
> -- here is the query error --
> No dictionary found by /dict/FACT_RD.FACT_ASR_CONTEXT_REQUESTS/P_HOUR/9295e522-8be6-4e24-800c-e23c4eea4dcf.dict, invalid cube state;
>  
> -- my MetadataCleanupJob script command --
> ${KYLIN_HOME}/bin/metastore.sh clean --jobThreshold 40 --delete true
>  
> PS: the clean script runs every weekend.





[jira] [Resolved] (KYLIN-4889) Query error when spark engine in local mode

2021-02-07 Thread Xiaoxiang Yu (Jira)


 [ 
https://issues.apache.org/jira/browse/KYLIN-4889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoxiang Yu resolved KYLIN-4889.
-
Resolution: Fixed

> Query error when spark engine in local mode
> ---
>
> Key: KYLIN-4889
> URL: https://issues.apache.org/jira/browse/KYLIN-4889
> Project: Kylin
>  Issue Type: Bug
>Affects Versions: v4.0.0-alpha
>Reporter: Feng Zhu
>Assignee: Feng Zhu
>Priority: Major
> Fix For: v4.0.0-GA
>
>
> When I query with the Spark engine in local mode, with -Dspark.local=true, the 
> Spark application was still submitted to YARN, and the following error 
> occurred:
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 6, sandbox.hortonworks.com, executor 1): java.lang.ClassCastException: cannot assign instance of scala.collection.immutable.List$SerializationProxy to field org.apache.spark.rdd.RDD.org$apache$spark$rdd$RDD$$dependencies_ of type scala.collection.Seq in instance of org.apache.spark.rdd.MapPartitionsRDD
>   at java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2233)
>   at java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1405)
>   at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2291)
>   at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2209)
>   at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2067)
>   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1571)
>   at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2285)
>   at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2209)
>   at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2067)
>   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1571)
>   at java.io.ObjectInputStream.readObject(ObjectInputStream.java:431)
>   at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:75)
>   at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:114)
>   at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:88)
>   at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55)
>   at org.apache.spark.scheduler.Task.run(Task.scala:123)
>   at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
>   at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Driver stacktrace:
> while executing SQL: "select * from (select KYLIN_SALES.PART_DT , sum(KYLIN_SALES.PRICE ) from KYLIN_SALES group by KYLIN_SALES.PART_DT union select KYLIN_SALES.PART_DT , max(KYLIN_SALES.PRICE ) from KYLIN_SALES group by KYLIN_SALES.PART_DT union select KYLIN_SALES.PART_DT , count(*) from KYLIN_SALES group by KYLIN_SALES.PART_DT union select KYLIN_SALES.PART_DT , count(distinct KYLIN_SALES.PRICE) from KYLIN_SALES group by KYLIN_SALES.PART_DT) limit 501"





[jira] [Resolved] (KYLIN-4887) Segment pruner support string type partition col in spark query engine

2021-02-07 Thread Xiaoxiang Yu (Jira)


 [ 
https://issues.apache.org/jira/browse/KYLIN-4887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoxiang Yu resolved KYLIN-4887.
-
  Assignee: ShengJun Zheng
Resolution: Fixed

> Segment pruner support string type partition col in spark query engine
> --
>
> Key: KYLIN-4887
> URL: https://issues.apache.org/jira/browse/KYLIN-4887
> Project: Kylin
>  Issue Type: Bug
>Reporter: ShengJun Zheng
>Assignee: ShengJun Zheng
>Priority: Major
> Fix For: v4.0.0-GA
>
>
> When the partition column is of String type, the segment pruner does not take effect.





[jira] [Resolved] (KYLIN-4896) Cube metadata lost during build

2021-02-07 Thread Xiaoxiang Yu (Jira)


 [ 
https://issues.apache.org/jira/browse/KYLIN-4896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoxiang Yu resolved KYLIN-4896.
-
Resolution: Fixed

> Cube metadata lost during build
> ---
>
> Key: KYLIN-4896
> URL: https://issues.apache.org/jira/browse/KYLIN-4896
> Project: Kylin
>  Issue Type: Bug
>Affects Versions: v3.1.1
>Reporter: hejian
>Assignee: Linghui Zeng
>Priority: Major
> Fix For: v3.1.2
>
> Attachments: image-2021-02-03-19-11-09-261.png
>
>
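> {quote}Today the problem of cube metadata being lost during a distributed cube build occurred again.
> The cube's metadata was lost when the build reached step 4 (Build Dimension Dictionary).
> The error log is as follows:
> !image-2021-02-03-19-11-09-261.png!
> The Kylin version is 3.1.1, using kylin.job.scheduler.default=2;
> the rest of the configuration is correct.{quote}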





[jira] [Updated] (KYLIN-4895) change spark deploy mode of kylin4.0 engine from local to cluster

2021-02-07 Thread Xiaoxiang Yu (Jira)


 [ 
https://issues.apache.org/jira/browse/KYLIN-4895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoxiang Yu updated KYLIN-4895:

Fix Version/s: (was: Future)
   v4.0.0-GA

> change spark deploy mode of kylin4.0 engine from local to cluster
> -
>
> Key: KYLIN-4895
> URL: https://issues.apache.org/jira/browse/KYLIN-4895
> Project: Kylin
>  Issue Type: New Feature
>  Components: Job Engine
>Affects Versions: v4.0.0-alpha
>Reporter: tianhui
>Priority: Major
> Fix For: v4.0.0-GA
>
>
> In a cloud native environment, the memory of a pod is quite limited. But the 
> Spark driver can use a huge amount of memory in the job engine pod, which is 
> quite difficult to manage as the number of projects grows. 
> So it's better to put the Spark driver on YARN and have Kylin only maintain its 
> status.
>  
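
For orientation, the request above corresponds to Spark's `spark.submit.deployMode` setting. The fragment below is only a sketch of what the switch might look like in kylin.properties, assuming the `kylin.engine.spark-conf.` prefix is used to pass Spark properties to the build engine; it does not claim that Kylin 4.0 already supports cluster mode at the time of this thread.

```properties
# Assumed example: run the Spark driver on YARN instead of inside the job engine pod.
kylin.engine.spark-conf.spark.master=yarn
kylin.engine.spark-conf.spark.submit.deployMode=cluster
```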





[jira] [Updated] (KYLIN-4896) Cube metadata lost during build

2021-02-07 Thread Xiaoxiang Yu (Jira)


 [ 
https://issues.apache.org/jira/browse/KYLIN-4896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoxiang Yu updated KYLIN-4896:

Fix Version/s: v3.1.2

> Cube metadata lost during build
> ---
>
> Key: KYLIN-4896
> URL: https://issues.apache.org/jira/browse/KYLIN-4896
> Project: Kylin
>  Issue Type: Bug
>Affects Versions: v3.1.1
>Reporter: hejian
>Assignee: Linghui Zeng
>Priority: Major
> Fix For: v3.1.2
>
> Attachments: image-2021-02-03-19-11-09-261.png
>
>
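> {quote}Today the cube metadata was lost again while the cube was being built in distributed mode.
> The metadata of this cube went missing at step 4 (Build Dimension Dictionary).
> The error log is as follows:
> !image-2021-02-03-19-11-09-261.png!
> The Kylin version is 3.1.1, using kylin.job.scheduler.default=2;
> all other configuration is correct.{quote}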





[jira] [Assigned] (KYLIN-4896) Cube metadata lost during build

2021-02-07 Thread Xiaoxiang Yu (Jira)


 [ 
https://issues.apache.org/jira/browse/KYLIN-4896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoxiang Yu reassigned KYLIN-4896:
---

Assignee: Linghui Zeng

> Cube metadata lost during build
> ---
>
> Key: KYLIN-4896
> URL: https://issues.apache.org/jira/browse/KYLIN-4896
> Project: Kylin
>  Issue Type: Bug
>Affects Versions: v3.1.1
>Reporter: hejian
>Assignee: Linghui Zeng
>Priority: Major
> Attachments: image-2021-02-03-19-11-09-261.png
>
>
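> {quote}Today the cube metadata was lost again while the cube was being built in distributed mode.
> The metadata of this cube went missing at step 4 (Build Dimension Dictionary).
> The error log is as follows:
> !image-2021-02-03-19-11-09-261.png!
> The Kylin version is 3.1.1, using kylin.job.scheduler.default=2;
> all other configuration is correct.{quote}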





[jira] [Updated] (KYLIN-4903) cache parent datasource to accelerate next layer's cuboid building

2021-02-07 Thread Xiaoxiang Yu (Jira)


 [ 
https://issues.apache.org/jira/browse/KYLIN-4903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoxiang Yu updated KYLIN-4903:

Affects Version/s: v4.0.0-beta

> cache parent datasource to accelerate next layer's cuboid building
> --
>
> Key: KYLIN-4903
> URL: https://issues.apache.org/jira/browse/KYLIN-4903
> Project: Kylin
>  Issue Type: Improvement
>Affects Versions: v4.0.0-beta
>Reporter: ShengJun Zheng
>Priority: Major
> Fix For: v4.0.0-GA
>
>
> In Kylin V4, the parent datasource is not cached when building the next layer's 
> cuboids, causing repeated reads of HDFS files. Caching the parent datasource in 
> memory improves build performance by 20~30% in our case.





[jira] [Resolved] (KYLIN-4897) Add table snapshot and global dictionary cleaning in StorageCleanupJob

2021-02-07 Thread Xiaoxiang Yu (Jira)


 [ 
https://issues.apache.org/jira/browse/KYLIN-4897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoxiang Yu resolved KYLIN-4897.
-
Resolution: Fixed

> Add table snapshot and global dictionary cleaning in StorageCleanupJob
> --
>
> Key: KYLIN-4897
> URL: https://issues.apache.org/jira/browse/KYLIN-4897
> Project: Kylin
>  Issue Type: Improvement
>  Components: Client - CLI
>Affects Versions: v4.0.0-alpha, v4.0.0-beta
>Reporter: Yaqian Zhang
>Assignee: Yaqian Zhang
>Priority: Major
> Fix For: v4.0.0-GA
>
>






[jira] [Resolved] (KYLIN-4898) Add automated test cases

2021-02-07 Thread Xiaoxiang Yu (Jira)


 [ 
https://issues.apache.org/jira/browse/KYLIN-4898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoxiang Yu resolved KYLIN-4898.
-
Resolution: Fixed

> Add automated test cases
> 
>
> Key: KYLIN-4898
> URL: https://issues.apache.org/jira/browse/KYLIN-4898
> Project: Kylin
>  Issue Type: Improvement
>  Components: Others
>Affects Versions: v4.0.0-alpha
>Reporter: Yaqian Zhang
>Assignee: Linghui Zeng
>Priority: Minor
> Fix For: v4.0.0-GA
>
>






[jira] [Assigned] (KYLIN-4898) Add automated test cases

2021-02-07 Thread Xiaoxiang Yu (Jira)


 [ 
https://issues.apache.org/jira/browse/KYLIN-4898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoxiang Yu reassigned KYLIN-4898:
---

Assignee: Xiaoxiang Yu  (was: Yaqian Zhang)

> Add automated test cases
> 
>
> Key: KYLIN-4898
> URL: https://issues.apache.org/jira/browse/KYLIN-4898
> Project: Kylin
>  Issue Type: Improvement
>  Components: Others
>Affects Versions: v4.0.0-alpha
>Reporter: Yaqian Zhang
>Assignee: Xiaoxiang Yu
>Priority: Minor
> Fix For: v4.0.0-GA
>
>






[jira] [Assigned] (KYLIN-4898) Add automated test cases

2021-02-07 Thread Xiaoxiang Yu (Jira)


 [ 
https://issues.apache.org/jira/browse/KYLIN-4898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoxiang Yu reassigned KYLIN-4898:
---

Assignee: Linghui Zeng  (was: Xiaoxiang Yu)

> Add automated test cases
> 
>
> Key: KYLIN-4898
> URL: https://issues.apache.org/jira/browse/KYLIN-4898
> Project: Kylin
>  Issue Type: Improvement
>  Components: Others
>Affects Versions: v4.0.0-alpha
>Reporter: Yaqian Zhang
>Assignee: Linghui Zeng
>Priority: Minor
> Fix For: v4.0.0-GA
>
>






[jira] [Commented] (KYLIN-4904) build cubes error

2021-02-07 Thread stayblank (Jira)


[ 
https://issues.apache.org/jira/browse/KYLIN-4904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17280792#comment-17280792
 ] 

stayblank commented on KYLIN-4904:
--

What is the version of Spark?

> build  cubes  error
> ---
>
> Key: KYLIN-4904
> URL: https://issues.apache.org/jira/browse/KYLIN-4904
> Project: Kylin
>  Issue Type: Bug
>  Components: Query Engine
>Affects Versions: v4.0.0-alpha
> Environment: ubuntu20
>Reporter: stayblank
>Priority: Major
>
> 1. Without adding `spark-yarn_2.11-2.4.8-SNAPSHOT.jar` to $kylin_home/lib, this error is reported:
> java.lang.reflect.InvocationTargetException
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.kylin.engine.spark.job.NSparkExecutable.runLocalMode(NSparkExecutable.java:389)
>   at 
> org.apache.kylin.engine.spark.job.NSparkExecutable.doWork(NSparkExecutable.java:153)
>   at 
> org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:205)
>   at 
> org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:94)
>   at 
> org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:205)
>   at 
> org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:114)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.RuntimeException: Error execute 
> org.apache.kylin.engine.spark.job.ResourceDetectBeforeCubingJob
>   at 
> org.apache.kylin.engine.spark.application.SparkApplication.execute(SparkApplication.java:92)
>   at 
> org.apache.kylin.engine.spark.job.ResourceDetectBeforeCubingJob.main(ResourceDetectBeforeCubingJob.java:100)
>   ... 13 more
> Caused by: org.apache.spark.SparkException: Could not parse Master URL: 'yarn'
>   at 
> org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:2784)
>   at org.apache.spark.SparkContext.<init>(SparkContext.scala:493)
>   at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2520)
>   at 
> org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:930)
>   at 
> org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:921)
>   at scala.Option.getOrElse(Option.scala:121)
>   at 
> org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:921)
>   at 
> org.apache.kylin.engine.spark.application.SparkApplication.execute(SparkApplication.java:283)
>   at 
> org.apache.kylin.engine.spark.application.SparkApplication.execute(SparkApplication.java:89)
>   ... 14 more
>  
>  
>  
> 2. After adding `spark-yarn_2.11-2.4.8-SNAPSHOT.jar` to $kylin_home/lib, this error is reported:
> java.lang.reflect.InvocationTargetException
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.kylin.engine.spark.job.NSparkExecutable.runLocalMode(NSparkExecutable.java:389)
>   at 
> org.apache.kylin.engine.spark.job.NSparkExecutable.doWork(NSparkExecutable.java:153)
>   at 
> org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:205)
>   at 
> org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:94)
>   at 
> org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:205)
>   at 
> org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:114)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.ExceptionInInitializerError
>   at 
> org.apache.spark.scheduler.EventLoggingListener$.initEventLog(EventLoggingListener.scala:307)
>   at 
> org.apache.spark.scheduler.EventLoggingListener.start(EventLoggingListener.scala:126)
>   at org.apache.spark.SparkContext.<init>(SparkContext.scala:523)
>   at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2520)
>   at 
> 

[jira] [Commented] (KYLIN-4904) build cubes error

2021-02-07 Thread stayblank (Jira)


[ 
https://issues.apache.org/jira/browse/KYLIN-4904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17280791#comment-17280791
 ] 

stayblank commented on KYLIN-4904:
--

Thanks very much, I will try it.

> build  cubes  error
> ---
>
> Key: KYLIN-4904
> URL: https://issues.apache.org/jira/browse/KYLIN-4904
> Project: Kylin
>  Issue Type: Bug
>  Components: Query Engine
>Affects Versions: v4.0.0-alpha
> Environment: ubuntu20
>Reporter: stayblank
>Priority: Major
>
> 1. Without adding `spark-yarn_2.11-2.4.8-SNAPSHOT.jar` to $kylin_home/lib, this error is reported:
> java.lang.reflect.InvocationTargetException
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.kylin.engine.spark.job.NSparkExecutable.runLocalMode(NSparkExecutable.java:389)
>   at 
> org.apache.kylin.engine.spark.job.NSparkExecutable.doWork(NSparkExecutable.java:153)
>   at 
> org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:205)
>   at 
> org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:94)
>   at 
> org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:205)
>   at 
> org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:114)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.RuntimeException: Error execute 
> org.apache.kylin.engine.spark.job.ResourceDetectBeforeCubingJob
>   at 
> org.apache.kylin.engine.spark.application.SparkApplication.execute(SparkApplication.java:92)
>   at 
> org.apache.kylin.engine.spark.job.ResourceDetectBeforeCubingJob.main(ResourceDetectBeforeCubingJob.java:100)
>   ... 13 more
> Caused by: org.apache.spark.SparkException: Could not parse Master URL: 'yarn'
>   at 
> org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:2784)
>   at org.apache.spark.SparkContext.<init>(SparkContext.scala:493)
>   at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2520)
>   at 
> org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:930)
>   at 
> org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:921)
>   at scala.Option.getOrElse(Option.scala:121)
>   at 
> org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:921)
>   at 
> org.apache.kylin.engine.spark.application.SparkApplication.execute(SparkApplication.java:283)
>   at 
> org.apache.kylin.engine.spark.application.SparkApplication.execute(SparkApplication.java:89)
>   ... 14 more
>  
>  
>  
> 2. After adding `spark-yarn_2.11-2.4.8-SNAPSHOT.jar` to $kylin_home/lib, this error is reported:
> java.lang.reflect.InvocationTargetException
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.kylin.engine.spark.job.NSparkExecutable.runLocalMode(NSparkExecutable.java:389)
>   at 
> org.apache.kylin.engine.spark.job.NSparkExecutable.doWork(NSparkExecutable.java:153)
>   at 
> org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:205)
>   at 
> org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:94)
>   at 
> org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:205)
>   at 
> org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:114)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.ExceptionInInitializerError
>   at 
> org.apache.spark.scheduler.EventLoggingListener$.initEventLog(EventLoggingListener.scala:307)
>   at 
> org.apache.spark.scheduler.EventLoggingListener.start(EventLoggingListener.scala:126)
>   at org.apache.spark.SparkContext.<init>(SparkContext.scala:523)
>   at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2520)
>   at 
> 

[jira] [Updated] (KYLIN-4898) Add automated test cases

2021-02-07 Thread Xiaoxiang Yu (Jira)


 [ 
https://issues.apache.org/jira/browse/KYLIN-4898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoxiang Yu updated KYLIN-4898:

Fix Version/s: (was: Future)
   v4.0.0-GA

> Add automated test cases
> 
>
> Key: KYLIN-4898
> URL: https://issues.apache.org/jira/browse/KYLIN-4898
> Project: Kylin
>  Issue Type: Improvement
>  Components: Others
>Affects Versions: v4.0.0-alpha
>Reporter: Yaqian Zhang
>Assignee: Yaqian Zhang
>Priority: Minor
> Fix For: v4.0.0-GA
>
>






[jira] [Commented] (KYLIN-4899) NoClassDefFoundError: org/apache/hadoop/hive/metastore/api/NoSuchObjectException

2021-02-07 Thread Xiaoxiang Yu (Jira)


[ 
https://issues.apache.org/jira/browse/KYLIN-4899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17280790#comment-17280790
 ] 

Xiaoxiang Yu commented on KYLIN-4899:
-

As far as I know, Kylin 3 does not support deployment on Hadoop 2 + Hive 3. 
Please try modifying the source code and compiling it for your Hadoop version.

> NoClassDefFoundError: 
> org/apache/hadoop/hive/metastore/api/NoSuchObjectException
> 
>
> Key: KYLIN-4899
> URL: https://issues.apache.org/jira/browse/KYLIN-4899
> Project: Kylin
>  Issue Type: Bug
>  Components: Others
>Affects Versions: v3.1.1
> Environment:  jdk 1.8.0_222, Hadoop 2.10.1, hive 3.1.2, hbase 1.6.0, 
> zookeeper 3.6.2 and Kylin 3.1.1-bin-hbase1x.
>Reporter: Clement Lee
>Priority: Major
> Attachments: check-env.sh.png, environment path.jpg, log.jpg
>
>
> Hello Kylin team,
>  I'm a new Kylin user and have run into trouble installing Kylin. My 
> environment: jdk 1.8.0_222, Hadoop 2.10.1, hive 3.1.2, hbase 1.6.0, 
> zookeeper 3.6.2 and Kylin 3.1.1-bin-hbase1x. My environment paths are shown in 
> picture 1. After unzipping the package I ran "bin/check-env.sh", which 
> succeeded (picture 2). I then ran "bin/kylin.sh start", which reported that 
> everything was OK (picture 3). However, I cannot open the web page at 
> 'ip:7070/kylin'. The Tomcat logs show '*NoClassDefFoundError: 
> org/apache/hadoop/hive/metastore/api/NoSuchObjectException*' (picture 4). 
> How can I fix this problem?
>    Thank you, looking forward to your reply!





[jira] [Updated] (KYLIN-4901) Query result use diff timezone in real-time stream

2021-02-07 Thread Xiaoxiang Yu (Jira)


 [ 
https://issues.apache.org/jira/browse/KYLIN-4901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoxiang Yu updated KYLIN-4901:

Fix Version/s: v3.1.2

> Query result use diff timezone in real-time stream
> --
>
> Key: KYLIN-4901
> URL: https://issues.apache.org/jira/browse/KYLIN-4901
> Project: Kylin
>  Issue Type: Bug
>  Components: Real-time Streaming
>Reporter: Kun Liu
>Assignee: Kun Liu
>Priority: Major
> Fix For: v3.1.2
>
> Attachments: Different timezones for displaying results.png, other 
> non-derived time columns.png
>
>
> When I test real-time streaming with a timezone configured, I find that the 
> query results use different timezone formats.
> For example, I set `kylin.stream.event.timezone` to GMT-1 (after fixing 
> the issue 
> [kylin-4900|https://issues.apache.org/jira/projects/KYLIN/issues/KYLIN-4900?filter=allopenissues]),
>  and push some data to Kafka.
>  
> The derived time column is returned in GMT-8 format, but the other time/date 
> columns are displayed in GMT+0 format.
>  
> This result is confusing.
>  
> How to reproduce
>  
> The data is produced with `$KYLIN_HOME/bin/kylin.sh 
> org.apache.kylin.source.kafka.util.KafkaSampleProducer --topic 
> kylin_streaming_topic --broker localhost:9092 --interval 1`
>  
> The message template is:
> 2021-02-05 06:32:28,720 INFO [main] util.KafkaSampleProducer:136 : Sending 1 
> message: 
> \{"country":"US","amount":65.78351439157635,"qty":9,"currency":"USD","order_time":1612506748660,"category":"ELECTRONIC","device":"Windows","user":{"gender":"Male","id":"e1f07f05-9eff-46fa-d401-180d0441df13","first_name":"unknown","age":22}}
>  
> The order_time of the first message is 1612506748660, which is 2021-02-05 
> 05:32:28 GMT-1 or 2021-02-05 06:32:28 GMT+0.
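The conversions quoted above can be checked with a small java.time sketch. This is only an illustration, not Kylin's code: the class name EpochInZones and the method render are hypothetical, and the output pattern is an assumption chosen to match the timestamps in the report.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;

// Hypothetical helper, not part of Kylin: renders one epoch-millisecond
// value as wall-clock time in several fixed-offset zones.
final class EpochInZones {
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    /** Formats epochMillis as local date-time in the given zone, e.g. "GMT-1". */
    static String render(long epochMillis, String zoneId) {
        return FORMAT.format(Instant.ofEpochMilli(epochMillis).atZone(ZoneId.of(zoneId)));
    }

    public static void main(String[] args) {
        long orderTime = 1612506748660L; // order_time from the sample message
        for (String zone : new String[] {"GMT-1", "GMT+0", "GMT+8"}) {
            System.out.println(zone + " -> " + render(orderTime, zone));
        }
    }
}
```

The same instant therefore reads as 05:32:28, 06:32:28 or 14:32:28 on 2021-02-05 depending only on the zone used for display, which is the kind of mismatch the report describes.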





[jira] [Updated] (KYLIN-4900) The result of derived time columns are error, when timezone is GMT-1 or GMT-N

2021-02-07 Thread Xiaoxiang Yu (Jira)


 [ 
https://issues.apache.org/jira/browse/KYLIN-4900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoxiang Yu updated KYLIN-4900:

Fix Version/s: v3.1.2

> The result of derived time columns are error, when timezone is GMT-1 or GMT-N
> -
>
> Key: KYLIN-4900
> URL: https://issues.apache.org/jira/browse/KYLIN-4900
> Project: Kylin
>  Issue Type: Bug
>  Components: Real-time Streaming
>Reporter: Kun Liu
>Assignee: Kun Liu
>Priority: Blocker
> Fix For: v3.1.2
>
> Attachments: day_start is error.png, result- day_start.png
>
>
> When `kylin.stream.event.timezone` is set to GMT-1 or GMT-N, the result of 
> DAY_START is wrong.
>  
> The data is produced with `$KYLIN_HOME/bin/kylin.sh 
> org.apache.kylin.source.kafka.util.KafkaSampleProducer --topic 
> kylin_streaming_topic --broker localhost:9092 --interval 1`
>  
> The message template is:
> 2021-02-05 06:32:28,720 INFO [main] util.KafkaSampleProducer:136 : Sending 1 
> message: 
> \{"country":"US","amount":65.78351439157635,"qty":9,"currency":"USD","order_time":1612506748660,"category":"ELECTRONIC","device":"Windows","user":{"gender":"Male","id":"e1f07f05-9eff-46fa-d401-180d0441df13","first_name":"unknown","age":22}}
>  
> The order_time of the first message is 1612506748660, which is 2021-02-05 
> 14:32:28 GMT+8 or 2021-02-05 06:32:28 GMT+0.
>  
> The query result is in the attachments.
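For reference, here is a minimal sketch of what a day-start value would be for the sample timestamp, assuming DAY_START means local midnight in the configured event timezone. That reading is an assumption, and the class DayStart and method dayStartMillis are hypothetical names, not Kylin's implementation.

```java
import java.time.Instant;
import java.time.ZoneId;

// Hypothetical helper, not part of Kylin: computes the start of the local
// day that contains a given instant, for a given zone.
final class DayStart {
    /** Returns epoch millis of local midnight for the day containing epochMillis. */
    static long dayStartMillis(long epochMillis, String zoneId) {
        ZoneId zone = ZoneId.of(zoneId);
        return Instant.ofEpochMilli(epochMillis)
                .atZone(zone)          // wall-clock time in the event timezone
                .toLocalDate()         // drop the time of day
                .atStartOfDay(zone)    // local midnight in the same zone
                .toInstant()
                .toEpochMilli();
    }

    public static void main(String[] args) {
        long orderTime = 1612506748660L; // order_time from the sample message
        System.out.println("GMT-1 day start: " + dayStartMillis(orderTime, "GMT-1"));
        System.out.println("GMT+8 day start: " + dayStartMillis(orderTime, "GMT+8"));
    }
}
```

Under that assumption the day start for GMT-1 is one hour after UTC midnight of 2021-02-05, while for GMT+8 it falls on 2021-02-04 16:00 UTC, so a wrong sign or offset shifts the result by whole hours.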





[jira] [Updated] (KYLIN-4903) cache parent datasource to accelerate next layer's cuboid building

2021-02-07 Thread Xiaoxiang Yu (Jira)


 [ 
https://issues.apache.org/jira/browse/KYLIN-4903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoxiang Yu updated KYLIN-4903:

Fix Version/s: v4.0.0-GA

> cache parent datasource to accelerate next layer's cuboid building
> --
>
> Key: KYLIN-4903
> URL: https://issues.apache.org/jira/browse/KYLIN-4903
> Project: Kylin
>  Issue Type: Improvement
>Reporter: ShengJun Zheng
>Priority: Major
> Fix For: v4.0.0-GA
>
>
> In Kylin V4, the parent datasource is not cached when building the next layer's 
> cuboids, causing repeated reads of HDFS files. Caching the parent datasource in 
> memory improves build performance by 20~30% in our case.





[jira] [Commented] (KYLIN-4904) build cubes error

2021-02-07 Thread Xiaoxiang Yu (Jira)


[ 
https://issues.apache.org/jira/browse/KYLIN-4904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17280787#comment-17280787
 ] 

Xiaoxiang Yu commented on KYLIN-4904:
-

Hi, [~stayblank], I have verified that 4.0.0-beta works on EMR 5.31 (Hadoop 
2.10 and Hive 2.3); you may check whether it helps you. Here is the link: 
https://cwiki.apache.org/confluence/display/KYLIN/Deploy+Kylin+4+on+AWS+EMR .

> build  cubes  error
> ---
>
> Key: KYLIN-4904
> URL: https://issues.apache.org/jira/browse/KYLIN-4904
> Project: Kylin
>  Issue Type: Bug
>  Components: Query Engine
>Affects Versions: v4.0.0-alpha
> Environment: ubuntu20
>Reporter: stayblank
>Priority: Major
>
> 1. Without adding `spark-yarn_2.11-2.4.8-SNAPSHOT.jar` to $kylin_home/lib, this error is reported:
> java.lang.reflect.InvocationTargetException
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.kylin.engine.spark.job.NSparkExecutable.runLocalMode(NSparkExecutable.java:389)
>   at 
> org.apache.kylin.engine.spark.job.NSparkExecutable.doWork(NSparkExecutable.java:153)
>   at 
> org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:205)
>   at 
> org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:94)
>   at 
> org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:205)
>   at 
> org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:114)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.RuntimeException: Error execute 
> org.apache.kylin.engine.spark.job.ResourceDetectBeforeCubingJob
>   at 
> org.apache.kylin.engine.spark.application.SparkApplication.execute(SparkApplication.java:92)
>   at 
> org.apache.kylin.engine.spark.job.ResourceDetectBeforeCubingJob.main(ResourceDetectBeforeCubingJob.java:100)
>   ... 13 more
> Caused by: org.apache.spark.SparkException: Could not parse Master URL: 'yarn'
>   at 
> org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:2784)
>   at org.apache.spark.SparkContext.<init>(SparkContext.scala:493)
>   at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2520)
>   at 
> org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:930)
>   at 
> org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:921)
>   at scala.Option.getOrElse(Option.scala:121)
>   at 
> org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:921)
>   at 
> org.apache.kylin.engine.spark.application.SparkApplication.execute(SparkApplication.java:283)
>   at 
> org.apache.kylin.engine.spark.application.SparkApplication.execute(SparkApplication.java:89)
>   ... 14 more
>  
>  
>  
> 2. After adding `spark-yarn_2.11-2.4.8-SNAPSHOT.jar` to $kylin_home/lib, this error is reported:
> java.lang.reflect.InvocationTargetException
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.kylin.engine.spark.job.NSparkExecutable.runLocalMode(NSparkExecutable.java:389)
>   at 
> org.apache.kylin.engine.spark.job.NSparkExecutable.doWork(NSparkExecutable.java:153)
>   at 
> org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:205)
>   at 
> org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:94)
>   at 
> org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:205)
>   at 
> org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:114)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.ExceptionInInitializerError
>   at 
> org.apache.spark.scheduler.EventLoggingListener$.initEventLog(EventLoggingListener.scala:307)
>   at 
> 

[jira] [Updated] (KYLIN-4907) Cuboid ids missing when hitting the query cache

2021-02-07 Thread Xiaoxiang Yu (Jira)


 [ 
https://issues.apache.org/jira/browse/KYLIN-4907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoxiang Yu updated KYLIN-4907:

Affects Version/s: v4.0.0-alpha
   v4.0.0-beta

> Cuboid ids missing when hitting the query cache
> ---
>
> Key: KYLIN-4907
> URL: https://issues.apache.org/jira/browse/KYLIN-4907
> Project: Kylin
>  Issue Type: Bug
>Affects Versions: v4.0.0-alpha, v4.0.0-beta
>Reporter: Feng Zhu
>Assignee: Feng Zhu
>Priority: Minor
> Fix For: v4.0.0-GA
>
> Attachments: cuboid id miss.png
>
>
> Cuboid ids missing when hitting the query cache !cuboid id miss.png!





[jira] [Commented] (KYLIN-4906) support query/job server dynamic register and discovery

2021-02-07 Thread Yaqian Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/KYLIN-4906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17280770#comment-17280770
 ] 

Yaqian Zhang commented on KYLIN-4906:
-

Hi,
I think this feature has been available since Kylin 3.1.0. You can check this 
issue: https://issues.apache.org/jira/browse/KYLIN-4499. 

If you set "kylin.server.self-discovery-enabled=true", Kylin will automatically 
find all Kylin instances in the cluster, whether query nodes or job nodes, and 
register them in "kylin.server.cluster-servers".

{code:java}
if (kylinConfig.getServerSelfDiscoveryEnabled()) {
    KylinServerDiscovery.getInstance();
}
logger.info("Cluster servers: {}", Lists.newArrayList(kylinConfig.getRestServers()));
{code}

{code:java}
@Override
public void cacheChanged() {
    logger.info("Service discovery get cacheChanged notification");
    final List<ServiceInstance<LinkedHashMap>> instances = serviceCache.getInstances();
    Map<String, String> instanceNodes = Maps.newHashMapWithExpectedSize(instances.size());
    for (ServiceInstance<LinkedHashMap> entry : instances) {
        instanceNodes.put(entry.getAddress() + ":" + entry.getPort(),
                (String) entry.getPayload().get(SERVICE_PAYLOAD_DESCRIPTION));
    }

    logger.info("kylin.server.cluster-servers update to " + instanceNodes);
    // update cluster servers
    System.setProperty("kylin.server.cluster-servers", StringUtil.join(instanceNodes.keySet(), ","));

    // get servers and its mode(query, job, all)
    final String restServersInClusterWithMode = StringUtil.join(instanceNodes.entrySet().stream()
            .map(input -> input.getKey() + ":" + input.getValue()).collect(Collectors.toList()), ",");
    logger.info("kylin.server.cluster-servers-with-mode update to " + restServersInClusterWithMode);

    System.setProperty("kylin.server.cluster-servers-with-mode", restServersInClusterWithMode);
    isFinishInit.set(true);
}
{code}
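To make the two property formats concrete, here is a self-contained sketch of the joining step using only the JDK. The class ClusterServersSketch and its method names are hypothetical, and a TreeMap is used purely to get a stable order for the example; the snippet above uses a hash map, so its ordering may differ.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical illustration, not Kylin code: builds the two comma-separated
// values from a map of "host:port" -> server mode.
final class ClusterServersSketch {
    /** Joins the "host:port" keys, as for kylin.server.cluster-servers. */
    static String clusterServers(Map<String, String> instanceNodes) {
        return String.join(",", instanceNodes.keySet());
    }

    /** Joins "host:port:mode" entries, as for kylin.server.cluster-servers-with-mode. */
    static String clusterServersWithMode(Map<String, String> instanceNodes) {
        return instanceNodes.entrySet().stream()
                .map(e -> e.getKey() + ":" + e.getValue())
                .collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        Map<String, String> nodes = new TreeMap<>(); // sorted only for a stable demo
        nodes.put("cdh-worker-2:7070", "job");
        nodes.put("cdh-worker-2:7072", "query");
        nodes.put("cdh-worker-2:7073", "query");
        System.out.println(clusterServers(nodes));
        System.out.println(clusterServersWithMode(nodes));
    }
}
```

The node names are taken from the log excerpt in this comment; everything else in the sketch is an assumption made for illustration.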

When Broadcaster synchronizes metadata information, it will get all kylin 
instance information from "kylin.server.cluster-servers" and synchronize 
metadata to all nodes.

{code:java}
String[] restServers = config.getRestServers();
logger.debug("Servers in the cluster: {}", Arrays.toString(restServers));
for (final String node : restServers) {
    if (restClientMap.containsKey(node) == false) {
        restClientMap.put(node, new RestClient(node));
    }
}

String toWhere = broadcastEvent.getTargetNode();
if (toWhere == null)
    toWhere = "all";
logger.debug("Announcing new broadcast to {}: {}", toWhere, broadcastEvent);
{code}
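The pattern in the snippet above, one client per server created on first sight, can be sketched in isolation as follows. This is a simplified stand-in, not the Broadcaster itself: BroadcastTargetsSketch and NodeClient are hypothetical names, and the server list is passed in as a plain comma-separated string.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical illustration, not Kylin code: keeps one client object per
// "host:port" node and creates missing ones when the server list is re-read.
final class BroadcastTargetsSketch {
    /** Stand-in for a REST client bound to a single node. */
    static final class NodeClient {
        final String node;

        NodeClient(String node) {
            this.node = node;
        }
    }

    private final Map<String, NodeClient> clients = new ConcurrentHashMap<>();

    /** Parses a comma-separated server list and ensures a client exists for each node. */
    Map<String, NodeClient> refresh(String clusterServers) {
        for (String node : clusterServers.split(",")) {
            String trimmed = node.trim();
            if (!trimmed.isEmpty()) {
                // Existing clients are reused; only unseen nodes get a new one.
                clients.computeIfAbsent(trimmed, NodeClient::new);
            }
        }
        return clients;
    }
}
```

Like the quoted snippet, this sketch only adds clients for newly seen nodes and never removes entries for nodes that have left the list.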

I tested kylin 3.1.0 and started two query nodes and one job node. Their 
configurations are as follows:
{code:java}
kylin.server.mode=job/query
kylin.job.scheduler.default=2
kylin.job.lock=org.apache.kylin.storage.hbase.util.ZookeeperJobLock
kylin.server.cluster-servers=localhost:7070
kylin.server.self-discovery-enabled=true
{code}

And in the logs of the three kylin nodes, you can see the following logs, 
indicating that all nodes have been added to "kylin.server.cluster-servers":
{code:java}
2021-02-08 11:18:57,442 INFO  [localhost-startStop-1] zookeeper.KylinServerDiscovery:126 : Haven't registered, waiting ...
2021-02-08 11:18:57,446 INFO  [KylinServerTracker-0] zookeeper.KylinServerDiscovery:100 : Service discovery get cacheChanged notification
2021-02-08 11:18:57,454 INFO  [KylinServerTracker-0] zookeeper.KylinServerDiscovery:108 : kylin.server.cluster-servers update to {cdh-worker-2:7070=job, cdh-worker-2:7073=query, cdh-worker-2:7072=query}
2021-02-08 11:18:57,464 INFO  [KylinServerTracker-0] zookeeper.KylinServerDiscovery:115 : kylin.server.cluster-servers-with-mode update to cdh-worker-2:7070:job,cdh-worker-2:7073:query,cdh-worker-2:7072:query
2021-02-08 11:18:57,542 INFO  [localhost-startStop-1] service.JobService:141 : Cluster servers: [cdh-worker-2:7070, cdh-worker-2:7073, cdh-worker-2:7072]
{code}

When the metadata of one node is updated, Broadcaster will synchronize the 
update to all other nodes:

{code:java}
2021-02-08 11:32:30,740 DEBUG [pool-6-thread-1] cachesync.Broadcaster:119 : Servers in the cluster: [cdh-worker-2:7070, cdh-worker-2:7073, cdh-worker-2:7072]
2021-02-08 11:32:30,803 INFO  [http-bio-7070-exec-9] service.CubeService:240 : New cube kylin_sales_cube_test_service has 161 cuboids
2021-02-08 11:32:30,804 INFO  [http-bio-7070-exec-9] cube.CubeManager:255 : Creating cube 

[jira] [Updated] (KYLIN-4906) support query/job server dynamic register and discovery

2021-02-07 Thread Yaqian Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/KYLIN-4906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yaqian Zhang updated KYLIN-4906:

Attachment: screenshot-2.png

> support query/job server dynamic register and discovery
> ---
>
> Key: KYLIN-4906
> URL: https://issues.apache.org/jira/browse/KYLIN-4906
> Project: Kylin
>  Issue Type: Improvement
>  Components: Integration
>Reporter: ShengJun Zheng
>Priority: Major
> Attachments: screenshot-1.png, screenshot-2.png
>
>
> Hi, we are currently troubled by a static configuration: 
> kylin.server.cluster-servers. There are two situations that are unfriendly to us.
>   1. When adding a query node to the cluster, we forgot to update this 
> configuration on the job server. As a result the newly added server was not 
> notified of all metadata update events, causing wrong query results. 
>   2. We plan to deploy query servers in k8s, and this static configuration is not 
> cloud native (we would have to update the k8s ConfigMap).
> We use Kylin 2.x; this seems to still be a problem in Kylin 3.x (relevant code: 
> org.apache.kylin.metadata.cachesync.Broadcaster).
> I wonder whether I am wrong or whether there is already a solution.





[jira] [Updated] (KYLIN-4906) support query/job server dynamic register and discovery

2021-02-07 Thread Yaqian Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/KYLIN-4906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yaqian Zhang updated KYLIN-4906:

Attachment: screenshot-1.png

> support query/job server dynamic register and discovery
> ---
>
> Key: KYLIN-4906
> URL: https://issues.apache.org/jira/browse/KYLIN-4906
> Project: Kylin
>  Issue Type: Improvement
>  Components: Integration
>Reporter: ShengJun Zheng
>Priority: Major
> Attachments: screenshot-1.png, screenshot-2.png
>
>
> Hi, we are currently troubled by a static configuration:
> kylin.server.cluster-servers. There are two situations where it is
> unfriendly to us:
>   1. When adding a query node to the cluster, we forgot to update this
> configuration on the job server. As a result, the newly added server was not
> notified of all metadata update events, which caused wrong query results.
>   2. We plan to deploy the query server on k8s, and this static configuration
> is not cloud native (we have to update the k8s ConfigMap).
> We use Kylin 2.x; this seems to still be a problem in Kylin 3.x (relevant code:
> org.apache.kylin.metadata.cachesync.Broadcaster).
> I wonder whether I am wrong, or whether there is already a solution.





[jira] [Commented] (KYLIN-4907) Cuboid ids missing when hitting the query cache

2021-02-07 Thread Xiaoxiang Yu (Jira)


[ 
https://issues.apache.org/jira/browse/KYLIN-4907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17280735#comment-17280735
 ] 

Xiaoxiang Yu commented on KYLIN-4907:
-

Because 4.0.0-beta has already been released, the fixVersion is moved to 4.0.0.

> Cuboid ids missing when hitting the query cache
> ---
>
> Key: KYLIN-4907
> URL: https://issues.apache.org/jira/browse/KYLIN-4907
> Project: Kylin
>  Issue Type: Bug
>Reporter: Feng Zhu
>Assignee: Feng Zhu
>Priority: Minor
> Fix For: v4.0.0-GA
>
> Attachments: cuboid id miss.png
>
>
> Cuboid ids missing when hitting the query cache !cuboid id miss.png!





[jira] [Updated] (KYLIN-4907) Cuboid ids missing when hitting the query cache

2021-02-07 Thread Xiaoxiang Yu (Jira)


 [ 
https://issues.apache.org/jira/browse/KYLIN-4907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoxiang Yu updated KYLIN-4907:

Fix Version/s: (was: v4.0.0-beta)
   v4.0.0-GA

> Cuboid ids missing when hitting the query cache
> ---
>
> Key: KYLIN-4907
> URL: https://issues.apache.org/jira/browse/KYLIN-4907
> Project: Kylin
>  Issue Type: Bug
>Affects Versions: v4.0.0-alpha
>Reporter: Feng Zhu
>Assignee: Feng Zhu
>Priority: Minor
> Fix For: v4.0.0-GA
>
> Attachments: cuboid id miss.png
>
>
> Cuboid ids missing when hitting the query cache !cuboid id miss.png!





[jira] [Resolved] (KYLIN-4846) Set the related query id to sparder job description

2021-02-07 Thread Zhichao Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/KYLIN-4846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhichao  Zhang resolved KYLIN-4846.
---
Fix Version/s: v4.0.0-beta
   Resolution: Fixed

> Set the related query id to sparder job description
> ---
>
> Key: KYLIN-4846
> URL: https://issues.apache.org/jira/browse/KYLIN-4846
> Project: Kylin
>  Issue Type: New Feature
>Reporter: Zhichao  Zhang
>Assignee: Zhichao  Zhang
>Priority: Minor
> Fix For: v4.0.0-beta
>
>
> Set the related query id to sparder job description





[jira] [Resolved] (KYLIN-4797) Correct inputRecordSizes of segment when there is no data in this segment

2021-02-07 Thread Zhichao Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/KYLIN-4797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhichao  Zhang resolved KYLIN-4797.
---
Fix Version/s: v4.0.0-alpha
   Resolution: Fixed

> Correct inputRecordSizes of segment when there is no data in this segment
> -
>
> Key: KYLIN-4797
> URL: https://issues.apache.org/jira/browse/KYLIN-4797
> Project: Kylin
>  Issue Type: Bug
>  Components: Spark Engine
>Affects Versions: v4.0.0-alpha
>Reporter: Zhichao  Zhang
>Assignee: Zhichao  Zhang
>Priority: Minor
> Fix For: v4.0.0-alpha
>
>
> When there is no inputRecord, inputRecordSize needs to be set to 0.





[jira] [Resolved] (KYLIN-4730) Add scan bytes metric to the query results

2021-02-07 Thread Zhichao Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/KYLIN-4730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhichao  Zhang resolved KYLIN-4730.
---
Fix Version/s: v2.6.6
   v4.0.0-alpha
   Resolution: Fixed

> Add scan bytes metric to the query results
> --
>
> Key: KYLIN-4730
> URL: https://issues.apache.org/jira/browse/KYLIN-4730
> Project: Kylin
>  Issue Type: Improvement
>  Components: Query Engine
>Affects Versions: v4.0.0-alpha
>Reporter: Zhichao  Zhang
>Assignee: Zhichao  Zhang
>Priority: Minor
> Fix For: v4.0.0-alpha, v2.6.6
>
>
> Add scan bytes metric to the query results





[jira] [Updated] (KYLIN-4730) Add scan bytes metric to the query results

2021-02-07 Thread Zhichao Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/KYLIN-4730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhichao  Zhang updated KYLIN-4730:
--
Fix Version/s: (was: v2.6.6)

> Add scan bytes metric to the query results
> --
>
> Key: KYLIN-4730
> URL: https://issues.apache.org/jira/browse/KYLIN-4730
> Project: Kylin
>  Issue Type: Improvement
>  Components: Query Engine
>Affects Versions: v4.0.0-alpha
>Reporter: Zhichao  Zhang
>Assignee: Zhichao  Zhang
>Priority: Minor
> Fix For: v4.0.0-alpha
>
>
> Add scan bytes metric to the query results





[jira] [Resolved] (KYLIN-4452) Kylin on Parquet with Docker

2021-02-07 Thread Zhichao Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/KYLIN-4452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhichao  Zhang resolved KYLIN-4452.
---
Resolution: Fixed

> Kylin on Parquet with Docker
> 
>
> Key: KYLIN-4452
> URL: https://issues.apache.org/jira/browse/KYLIN-4452
> Project: Kylin
>  Issue Type: New Feature
>  Components: Storage - Parquet
>Reporter: xuekaiqi
>Assignee: Zhichao  Zhang
>Priority: Minor
> Fix For: v4.0.0-alpha
>
>
> Since Kylin can run independently of Hadoop, containerized deployment is the
> next step.





[jira] [Created] (KYLIN-4907) Cuboid ids missing when hitting the query cache

2021-02-07 Thread Feng Zhu (Jira)
Feng Zhu created KYLIN-4907:
---

 Summary: Cuboid ids missing when hitting the query cache
 Key: KYLIN-4907
 URL: https://issues.apache.org/jira/browse/KYLIN-4907
 Project: Kylin
  Issue Type: Bug
Affects Versions: v4.0.0-alpha
Reporter: Feng Zhu
Assignee: Feng Zhu
 Fix For: v4.0.0-beta
 Attachments: cuboid id miss.png

Cuboid ids missing when hitting the query cache !cuboid id miss.png!





[jira] [Created] (KYLIN-4906) support query/job server dynamic register and discovery

2021-02-07 Thread ShengJun Zheng (Jira)
ShengJun Zheng created KYLIN-4906:
-

 Summary: support query/job server dynamic register and discovery
 Key: KYLIN-4906
 URL: https://issues.apache.org/jira/browse/KYLIN-4906
 Project: Kylin
  Issue Type: Improvement
  Components: Integration
Reporter: ShengJun Zheng


Hi, we are currently troubled by a static configuration:
kylin.server.cluster-servers. There are two situations where it is
unfriendly to us:

  1. When adding a query node to the cluster, we forgot to update this
configuration on the job server. As a result, the newly added server was not
notified of all metadata update events, which caused wrong query results.

  2. We plan to deploy the query server on k8s, and this static configuration
is not cloud native (we have to update the k8s ConfigMap).

We use Kylin 2.x; this seems to still be a problem in Kylin 3.x (relevant code:
org.apache.kylin.metadata.cachesync.Broadcaster).

I wonder whether I am wrong, or whether there is already a solution.





[jira] [Resolved] (KYLIN-4894) Upgrade Apache Spark version to 2.4.7

2021-02-07 Thread Zhichao Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/KYLIN-4894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhichao  Zhang resolved KYLIN-4894.
---
Fix Version/s: v4.0.0-GA
   Resolution: Fixed

> Upgrade Apache Spark version to 2.4.7
> -
>
> Key: KYLIN-4894
> URL: https://issues.apache.org/jira/browse/KYLIN-4894
> Project: Kylin
>  Issue Type: Improvement
>Reporter: Zhichao  Zhang
>Assignee: Zhichao  Zhang
>Priority: Minor
> Fix For: v4.0.0-GA
>
>
> Upgrade Apache Spark version to 2.4.7





[jira] [Resolved] (KYLIN-4893) Optimize query performance when using shard by column

2021-02-07 Thread Zhichao Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/KYLIN-4893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhichao  Zhang resolved KYLIN-4893.
---
Fix Version/s: v4.0.0-GA
   Resolution: Fixed

> Optimize query performance when using shard by column
> -
>
> Key: KYLIN-4893
> URL: https://issues.apache.org/jira/browse/KYLIN-4893
> Project: Kylin
>  Issue Type: Improvement
>Reporter: Zhichao  Zhang
>Assignee: Zhichao  Zhang
>Priority: Minor
> Fix For: v4.0.0-GA
>
>
> Optimize query performance when using shard by column.





[jira] [Resolved] (KYLIN-4892) Reduce the times of fetching files status from HDFS in FilePruner

2021-02-07 Thread Zhichao Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/KYLIN-4892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhichao  Zhang resolved KYLIN-4892.
---
Resolution: Fixed

> Reduce the times of fetching files status from HDFS in FilePruner
> -
>
> Key: KYLIN-4892
> URL: https://issues.apache.org/jira/browse/KYLIN-4892
> Project: Kylin
>  Issue Type: Improvement
>  Components: Query Engine
>Reporter: Zhichao  Zhang
>Assignee: Zhichao  Zhang
>Priority: Minor
> Fix For: v4.0.0-GA
>
>
> Reduce the times of fetching files status from HDFS in FilePruner





[jira] [Resolved] (KYLIN-4890) Use numSlices = 1 to reduce task num when executing sparder canary

2021-02-07 Thread Zhichao Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/KYLIN-4890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhichao  Zhang resolved KYLIN-4890.
---
Resolution: Fixed

> Use numSlices = 1 to reduce task num when executing sparder canary
> --
>
> Key: KYLIN-4890
> URL: https://issues.apache.org/jira/browse/KYLIN-4890
> Project: Kylin
>  Issue Type: Improvement
>  Components: Spark Engine
>Reporter: Zhichao  Zhang
>Assignee: Zhichao  Zhang
>Priority: Minor
> Fix For: v4.0.0-GA
>
>
> Use numSlices = 1 to reduce task num when executing sparder canary





[jira] [Commented] (KYLIN-4893) Optimize query performance when using shard by column

2021-02-07 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/KYLIN-4893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17280586#comment-17280586
 ] 

ASF subversion and git services commented on KYLIN-4893:


Commit b428f1e1ab6a48101bec03e515d135c57af32878 in kylin's branch 
refs/heads/kylin-on-parquet-v2 from Zhichao Zhang
[ https://gitbox.apache.org/repos/asf?p=kylin.git;h=b428f1e ]

KYLIN-4893 Optimize query performance when using shard by column

(cherry picked from commit bd5ab5e61ca4dc0c5ccabb66b17b4be1642ce13d)
(cherry picked from commit 8fa9d8d210b2755325999ed3e7496a320e3bd7f9)


> Optimize query performance when using shard by column
> -
>
> Key: KYLIN-4893
> URL: https://issues.apache.org/jira/browse/KYLIN-4893
> Project: Kylin
>  Issue Type: Improvement
>Reporter: Zhichao  Zhang
>Assignee: Zhichao  Zhang
>Priority: Minor
>
> Optimize query performance when using shard by column.





[jira] [Commented] (KYLIN-4893) Optimize query performance when using shard by column

2021-02-07 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/KYLIN-4893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17280584#comment-17280584
 ] 

ASF GitHub Bot commented on KYLIN-4893:
---

hit-lacus commented on pull request #1577:
URL: https://github.com/apache/kylin/pull/1577#issuecomment-774692034


   @zzcclp You added a new configuration entry; could you please add an
explanation to the Kylin 4 wiki?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Optimize query performance when using shard by column
> -
>
> Key: KYLIN-4893
> URL: https://issues.apache.org/jira/browse/KYLIN-4893
> Project: Kylin
>  Issue Type: Improvement
>Reporter: Zhichao  Zhang
>Assignee: Zhichao  Zhang
>Priority: Minor
>
> Optimize query performance when using shard by column.





[jira] [Commented] (KYLIN-4893) Optimize query performance when using shard by column

2021-02-07 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/KYLIN-4893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17280585#comment-17280585
 ] 

ASF GitHub Bot commented on KYLIN-4893:
---

hit-lacus merged pull request #1577:
URL: https://github.com/apache/kylin/pull/1577


   





> Optimize query performance when using shard by column
> -
>
> Key: KYLIN-4893
> URL: https://issues.apache.org/jira/browse/KYLIN-4893
> Project: Kylin
>  Issue Type: Improvement
>Reporter: Zhichao  Zhang
>Assignee: Zhichao  Zhang
>Priority: Minor
>
> Optimize query performance when using shard by column.





[GitHub] [kylin] hit-lacus merged pull request #1577: KYLIN-4893 Optimize query performance when using shard by column

2021-02-07 Thread GitBox


hit-lacus merged pull request #1577:
URL: https://github.com/apache/kylin/pull/1577


   







[GitHub] [kylin] hit-lacus commented on pull request #1577: KYLIN-4893 Optimize query performance when using shard by column

2021-02-07 Thread GitBox


hit-lacus commented on pull request #1577:
URL: https://github.com/apache/kylin/pull/1577#issuecomment-774692034


   @zzcclp You added a new configuration entry; could you please add an
explanation to the Kylin 4 wiki?







[jira] [Updated] (KYLIN-4905) Support limit .. offset ... in spark query engine

2021-02-07 Thread Xiaoxiang Yu (Jira)


 [ 
https://issues.apache.org/jira/browse/KYLIN-4905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoxiang Yu updated KYLIN-4905:

Fix Version/s: v4.0.0-GA

> Support limit .. offset ... in spark query engine
> -
>
> Key: KYLIN-4905
> URL: https://issues.apache.org/jira/browse/KYLIN-4905
> Project: Kylin
>  Issue Type: New Feature
>  Components: Query Engine
>Affects Versions: v4.0.0-alpha
>Reporter: ShengJun Zheng
>Priority: Major
> Fix For: v4.0.0-GA
>
>
> When a top-level result offset clause (ANSI SQL), i.e. limit xxx offset xxx,
> is used in a query on the Spark query engine, the limit is not pushed down
> into the Spark engine and the offset does not take effect. This is
> incompatible with Kylin 2.x~3.x.
> After looking through the code, I found this is because Spark does not
> support limit ... offset ... yet. There is a Spark issue for it:
> https://issues.apache.org/jira/browse/SPARK-28330, which was created in 2019
> and is still in progress.
> So, should we support this feature temporarily in Kylin?
>    1. Push the limit down to Spark
>    2. Take the result from the starting offset in the Kylin query server
>  
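
The two-step workaround proposed above can be sketched as follows. This is only an illustration under stated assumptions, not Kylin's implementation: it assumes the limit pushed down to the engine must be `limit + offset` so that enough rows survive the skip, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of "push down limit, apply offset in the query server".
public class LimitOffsetSketch {

    /** Row budget to push down so that `limit` rows remain after skipping `offset` rows. */
    public static long pushDownLimit(long limit, long offset) {
        if (limit < 0 || offset < 0) {
            throw new IllegalArgumentException("limit and offset must be non-negative");
        }
        // Assumption: the engine has no OFFSET support, so it must return limit + offset rows.
        return Math.addExact(limit, offset);
    }

    /** Drops the first `offset` rows returned by the engine and keeps at most `limit` rows. */
    public static <T> List<T> applyOffset(List<T> rows, long limit, long offset) {
        if (limit < 0 || offset < 0) {
            throw new IllegalArgumentException("limit and offset must be non-negative");
        }
        int from = (int) Math.min(offset, rows.size());
        int to = (int) Math.min(Math.addExact(offset, limit), rows.size());
        return new ArrayList<>(rows.subList(from, to));
    }

    public static void main(String[] args) {
        long limit = 3;
        long offset = 2;
        // Step 1: the engine is asked for pushDownLimit(limit, offset) = 5 rows.
        List<Integer> engineRows = Arrays.asList(10, 20, 30, 40, 50);
        // Step 2: the query server skips the first `offset` rows.
        System.out.println(pushDownLimit(limit, offset));           // prints 5
        System.out.println(applyOffset(engineRows, limit, offset)); // prints [30, 40, 50]
    }
}
```

As with any limit/offset query, the result is only deterministic if the query also fixes the row order.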





[jira] [Commented] (KYLIN-4892) Reduce the times of fetching files status from HDFS in FilePruner

2021-02-07 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/KYLIN-4892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17280541#comment-17280541
 ] 

ASF GitHub Bot commented on KYLIN-4892:
---

hit-lacus merged pull request #1576:
URL: https://github.com/apache/kylin/pull/1576


   





> Reduce the times of fetching files status from HDFS in FilePruner
> -
>
> Key: KYLIN-4892
> URL: https://issues.apache.org/jira/browse/KYLIN-4892
> Project: Kylin
>  Issue Type: Improvement
>  Components: Query Engine
>Reporter: Zhichao  Zhang
>Assignee: Zhichao  Zhang
>Priority: Minor
> Fix For: v4.0.0-GA
>
>
> Reduce the times of fetching files status from HDFS in FilePruner





[jira] [Commented] (KYLIN-4892) Reduce the times of fetching files status from HDFS in FilePruner

2021-02-07 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/KYLIN-4892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17280540#comment-17280540
 ] 

ASF subversion and git services commented on KYLIN-4892:


Commit b9cd81a0b1710913af469b04d44f3451d6a87f0c in kylin's branch 
refs/heads/kylin-on-parquet-v2 from Zhichao Zhang
[ https://gitbox.apache.org/repos/asf?p=kylin.git;h=b9cd81a ]

KYLIN-4892 Reduce the times of fetching files status from HDFS Namenode in 
FilePruner

(cherry picked from commit 6e4d94d1c027d5877eb3013f37ef223aa0532cc2)
(cherry picked from commit edebb98ca33e1f3ddf000842f12d0bff45109c57)


> Reduce the times of fetching files status from HDFS in FilePruner
> -
>
> Key: KYLIN-4892
> URL: https://issues.apache.org/jira/browse/KYLIN-4892
> Project: Kylin
>  Issue Type: Improvement
>  Components: Query Engine
>Reporter: Zhichao  Zhang
>Assignee: Zhichao  Zhang
>Priority: Minor
> Fix For: v4.0.0-GA
>
>
> Reduce the times of fetching files status from HDFS in FilePruner





[GitHub] [kylin] hit-lacus merged pull request #1576: KYLIN-4892 Reduce the times of fetching files status from HDFS Namenode in FilePruner

2021-02-07 Thread GitBox


hit-lacus merged pull request #1576:
URL: https://github.com/apache/kylin/pull/1576


   







[jira] [Commented] (KYLIN-4667) Automatically set kylin.query.cache-signature-enabled to be true when memcached is enabled

2021-02-07 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/KYLIN-4667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17280538#comment-17280538
 ] 

ASF GitHub Bot commented on KYLIN-4667:
---

hit-lacus commented on a change in pull request #1540:
URL: https://github.com/apache/kylin/pull/1540#discussion_r571624722



##
File path: 
server-base/src/main/java/org/apache/kylin/rest/service/CacheService.java
##
@@ -27,11 +27,11 @@
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
 import org.springframework.beans.factory.InitializingBean;
+import org.springframework.cache.CacheManager;
 import org.springframework.beans.factory.annotation.Autowired;
 import org.springframework.beans.factory.annotation.Qualifier;
 import org.springframework.stereotype.Component;
 
-import net.sf.ehcache.CacheManager;

Review comment:
   I don't understand why this was changed to another class,
`org.springframework.cache.CacheManager`; could you please explain?







> Automatically set kylin.query.cache-signature-enabled to be true when 
> memcached is enabled
> --
>
> Key: KYLIN-4667
> URL: https://issues.apache.org/jira/browse/KYLIN-4667
> Project: Kylin
>  Issue Type: Improvement
>Reporter: Zhong Yanghong
>Assignee: JiangYang
>Priority: Major
> Fix For: v3.1.2
>
>






[GitHub] [kylin] hit-lacus commented on a change in pull request #1540: [KYLIN-4667] Automatically set kylin.query.cache-signature-enabled to…

2021-02-07 Thread GitBox


hit-lacus commented on a change in pull request #1540:
URL: https://github.com/apache/kylin/pull/1540#discussion_r571624722



##
File path: 
server-base/src/main/java/org/apache/kylin/rest/service/CacheService.java
##
@@ -27,11 +27,11 @@
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
 import org.springframework.beans.factory.InitializingBean;
+import org.springframework.cache.CacheManager;
 import org.springframework.beans.factory.annotation.Autowired;
 import org.springframework.beans.factory.annotation.Qualifier;
 import org.springframework.stereotype.Component;
 
-import net.sf.ehcache.CacheManager;

Review comment:
   I don't understand why this was changed to another class,
`org.springframework.cache.CacheManager`; could you please explain?









[jira] [Commented] (KYLIN-4889) Query error when spark engine in local mode

2021-02-07 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/KYLIN-4889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17280523#comment-17280523
 ] 

ASF GitHub Bot commented on KYLIN-4889:
---

hit-lacus closed pull request #1565:
URL: https://github.com/apache/kylin/pull/1565


   





> Query error when spark engine in local mode
> ---
>
> Key: KYLIN-4889
> URL: https://issues.apache.org/jira/browse/KYLIN-4889
> Project: Kylin
>  Issue Type: Bug
>Affects Versions: v4.0.0-alpha
>Reporter: Feng Zhu
>Assignee: Feng Zhu
>Priority: Major
> Fix For: v4.0.0-GA
>
>
> When I query with the Spark engine in local mode, with -Dspark.local=true, the
> Spark application was still submitted to YARN, and the following error
> occurred:
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in
> stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0
> (TID 6, sandbox.hortonworks.com, executor 1): java.lang.ClassCastException:
> cannot assign instance of scala.collection.immutable.List$SerializationProxy
> to field org.apache.spark.rdd.RDD.org$apache$spark$rdd$RDD$$dependencies_ of
> type scala.collection.Seq in instance of
> org.apache.spark.rdd.MapPartitionsRDD
>  at java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2233)
>  at java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1405)
>  at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2291)
>  at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2209)
>  at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2067)
>  at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1571)
>  at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2285)
>  at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2209)
>  at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2067)
>  at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1571)
>  at java.io.ObjectInputStream.readObject(ObjectInputStream.java:431)
>  at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:75)
>  at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:114)
>  at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:88)
>  at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55)
>  at org.apache.spark.scheduler.Task.run(Task.scala:123)
>  at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
>  at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
>  at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
>  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748)
> Driver stacktrace: while executing
> SQL: "select * from (select KYLIN_SALES.PART_DT , sum(KYLIN_SALES.PRICE )
> from KYLIN_SALES group by KYLIN_SALES.PART_DT union select
> KYLIN_SALES.PART_DT , max(KYLIN_SALES.PRICE ) from KYLIN_SALES group by
> KYLIN_SALES.PART_DT union select KYLIN_SALES.PART_DT , count(*) from
> KYLIN_SALES group by KYLIN_SALES.PART_DT union select KYLIN_SALES.PART_DT ,
> count(distinct KYLIN_SALES.PRICE) from KYLIN_SALES group by
> KYLIN_SALES.PART_DT) limit 501"





[jira] [Commented] (KYLIN-4889) Query error when spark engine in local mode

2021-02-07 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/KYLIN-4889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17280522#comment-17280522
 ] 

ASF GitHub Bot commented on KYLIN-4889:
---

hit-lacus commented on pull request #1565:
URL: https://github.com/apache/kylin/pull/1565#issuecomment-774680450


   Merged in https://github.com/apache/kylin/pull/1580





> Query error when spark engine in local mode
> ---
>
> Key: KYLIN-4889
> URL: https://issues.apache.org/jira/browse/KYLIN-4889
> Project: Kylin
>  Issue Type: Bug
>Affects Versions: v4.0.0-alpha
>Reporter: Feng Zhu
>Assignee: Feng Zhu
>Priority: Major
> Fix For: v4.0.0-GA
>
>
> When I query with the Spark engine in local mode, with -Dspark.local=true, the
> Spark application was still submitted to YARN, and the following error
> occurred:
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in
> stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0
> (TID 6, sandbox.hortonworks.com, executor 1): java.lang.ClassCastException:
> cannot assign instance of scala.collection.immutable.List$SerializationProxy
> to field org.apache.spark.rdd.RDD.org$apache$spark$rdd$RDD$$dependencies_ of
> type scala.collection.Seq in instance of
> org.apache.spark.rdd.MapPartitionsRDD
>  at java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2233)
>  at java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1405)
>  at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2291)
>  at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2209)
>  at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2067)
>  at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1571)
>  at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2285)
>  at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2209)
>  at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2067)
>  at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1571)
>  at java.io.ObjectInputStream.readObject(ObjectInputStream.java:431)
>  at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:75)
>  at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:114)
>  at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:88)
>  at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55)
>  at org.apache.spark.scheduler.Task.run(Task.scala:123)
>  at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
>  at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
>  at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
>  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748)
> Driver stacktrace: while executing
> SQL: "select * from (select KYLIN_SALES.PART_DT , sum(KYLIN_SALES.PRICE )
> from KYLIN_SALES group by KYLIN_SALES.PART_DT union select
> KYLIN_SALES.PART_DT , max(KYLIN_SALES.PRICE ) from KYLIN_SALES group by
> KYLIN_SALES.PART_DT union select KYLIN_SALES.PART_DT , count(*) from
> KYLIN_SALES group by KYLIN_SALES.PART_DT union select KYLIN_SALES.PART_DT ,
> count(distinct KYLIN_SALES.PRICE) from KYLIN_SALES group by
> KYLIN_SALES.PART_DT) limit 501"





[GitHub] [kylin] hit-lacus closed pull request #1565: KYLIN-4889, fix spark engine in local mode

2021-02-07 Thread GitBox


hit-lacus closed pull request #1565:
URL: https://github.com/apache/kylin/pull/1565


   







[GitHub] [kylin] hit-lacus commented on pull request #1565: KYLIN-4889, fix spark engine in local mode

2021-02-07 Thread GitBox


hit-lacus commented on pull request #1565:
URL: https://github.com/apache/kylin/pull/1565#issuecomment-774680450


   Merged in https://github.com/apache/kylin/pull/1580







[jira] [Commented] (KYLIN-4889) Query error when spark engine in local mode

2021-02-07 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/KYLIN-4889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17280521#comment-17280521
 ] 

ASF GitHub Bot commented on KYLIN-4889:
---

hit-lacus merged pull request #1580:
URL: https://github.com/apache/kylin/pull/1580


   





> Query error when spark engine in local mode
> ---
>
> Key: KYLIN-4889
> URL: https://issues.apache.org/jira/browse/KYLIN-4889
> Project: Kylin
>  Issue Type: Bug
>Affects Versions: v4.0.0-alpha
>Reporter: Feng Zhu
>Assignee: Feng Zhu
>Priority: Major
> Fix For: v4.0.0-GA
>
>
> When I query with the Spark engine in local mode, with -Dspark.local=true, the 
> Spark application was still submitted to YARN, and the following error 
> occurred:
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in 
> stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 
> (TID 6, sandbox.hortonworks.com, executor 1): java.lang.ClassCastException: 
> cannot assign instance of scala.collection.immutable.List$SerializationProxy 
> to field org.apache.spark.rdd.RDD.org$apache$spark$rdd$RDD$$dependencies_ of 
> type scala.collection.Seq in instance of 
> org.apache.spark.rdd.MapPartitionsRDD at 
> java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2233)
>  at java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1405) 
> at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2291) 
> at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2209) at 
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2067) at 
> java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1571) at 
> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2285) at 
> java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2209) at 
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2067) at 
> java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1571) at 
> java.io.ObjectInputStream.readObject(ObjectInputStream.java:431) at 
> org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:75)
>  at 
> org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:114)
>  at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:88) at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55) at 
> org.apache.spark.scheduler.Task.run(Task.scala:123) at 
> org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
>  at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360) at 
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414) at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748) Driver stacktrace: while executing 
> SQL: "select * from (select KYLIN_SALES.PART_DT , sum(KYLIN_SALES.PRICE ) 
> from KYLIN_SALES group by KYLIN_SALES.PART_DT union select 
> KYLIN_SALES.PART_DT , max(KYLIN_SALES.PRICE ) from KYLIN_SALES group by 
> KYLIN_SALES.PART_DT union select KYLIN_SALES.PART_DT , count(*) from 
> KYLIN_SALES group by KYLIN_SALES.PART_DT union select KYLIN_SALES.PART_DT , 
> count(distinct KYLIN_SALES.PRICE) from KYLIN_SALES group by 
> KYLIN_SALES.PART_DT) limit 501"
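The behaviour the reporter expected can be sketched as follows. This is a hypothetical illustration only, not the code changed in the fix: the function name resolve_spark_master, the default of "yarn" and the choice of "local[*]" as the local master URL are assumptions. The point is that a spark.local=true property should take precedence over any configured cluster master.

```python
def resolve_spark_master(system_props, configured_master="yarn"):
    """Pick the Spark master URL, letting spark.local=true override the configured one.

    Hypothetical helper: system_props stands in for JVM -D properties.
    """
    local_flag = system_props.get("spark.local", "false").strip().lower()
    if local_flag == "true":
        # Run in-process with one worker thread per core.
        return "local[*]"
    return configured_master


print(resolve_spark_master({"spark.local": "true"}))
print(resolve_spark_master({}))
```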





[GitHub] [kylin] hit-lacus merged pull request #1580: KYLIN-4889

2021-02-07 Thread GitBox


hit-lacus merged pull request #1580:
URL: https://github.com/apache/kylin/pull/1580


   







[jira] [Commented] (KYLIN-4889) Query error when spark engine in local mode

2021-02-07 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/KYLIN-4889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17280519#comment-17280519
 ] 

ASF subversion and git services commented on KYLIN-4889:


Commit 591c0e61e5cdd74b2bce2d36d5b3995130673559 in kylin's branch 
refs/heads/kylin-on-parquet-v2 from feng.zhu
[ https://gitbox.apache.org/repos/asf?p=kylin.git;h=591c0e6 ]

KYLIN-4889, fix spark engine in local mode


> Query error when spark engine in local mode
> ---
>
> Key: KYLIN-4889
> URL: https://issues.apache.org/jira/browse/KYLIN-4889
> Project: Kylin
>  Issue Type: Bug
>Affects Versions: v4.0.0-alpha
>Reporter: Feng Zhu
>Assignee: Feng Zhu
>Priority: Major
> Fix For: v4.0.0-GA
>
>
> When I query with the Spark engine in local mode, with -Dspark.local=true, the 
> Spark application was still submitted to YARN, and the following error 
> occurred:
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in 
> stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 
> (TID 6, sandbox.hortonworks.com, executor 1): java.lang.ClassCastException: 
> cannot assign instance of scala.collection.immutable.List$SerializationProxy 
> to field org.apache.spark.rdd.RDD.org$apache$spark$rdd$RDD$$dependencies_ of 
> type scala.collection.Seq in instance of 
> org.apache.spark.rdd.MapPartitionsRDD at 
> java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2233)
>  at java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1405) 
> at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2291) 
> at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2209) at 
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2067) at 
> java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1571) at 
> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2285) at 
> java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2209) at 
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2067) at 
> java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1571) at 
> java.io.ObjectInputStream.readObject(ObjectInputStream.java:431) at 
> org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:75)
>  at 
> org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:114)
>  at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:88) at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55) at 
> org.apache.spark.scheduler.Task.run(Task.scala:123) at 
> org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
>  at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360) at 
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414) at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748) Driver stacktrace: while executing 
> SQL: "select * from (select KYLIN_SALES.PART_DT , sum(KYLIN_SALES.PRICE ) 
> from KYLIN_SALES group by KYLIN_SALES.PART_DT union select 
> KYLIN_SALES.PART_DT , max(KYLIN_SALES.PRICE ) from KYLIN_SALES group by 
> KYLIN_SALES.PART_DT union select KYLIN_SALES.PART_DT , count(*) from 
> KYLIN_SALES group by KYLIN_SALES.PART_DT union select KYLIN_SALES.PART_DT , 
> count(distinct KYLIN_SALES.PRICE) from KYLIN_SALES group by 
> KYLIN_SALES.PART_DT) limit 501"





[jira] [Commented] (KYLIN-4889) Query error when spark engine in local mode

2021-02-07 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/KYLIN-4889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17280518#comment-17280518
 ] 

ASF GitHub Bot commented on KYLIN-4889:
---

hit-lacus opened a new pull request #1580:
URL: https://github.com/apache/kylin/pull/1580


   ## Proposed changes
   
   Describe the big picture of your changes here to communicate to the 
maintainers why we should accept this pull request. If it fixes a bug or 
resolves a feature request, be sure to link to that issue.
   
   ## Types of changes
   
   What types of changes does your code introduce to Kylin?
   _Put an `x` in the boxes that apply_
   
   - [ ] Bugfix (non-breaking change which fixes an issue)
   - [ ] New feature (non-breaking change which adds functionality)
   - [ ] Breaking change (fix or feature that would cause existing 
functionality to not work as expected)
   - [ ] Documentation Update (if none of the other choices apply)
   
   ## Checklist
   
   _Put an `x` in the boxes that apply. You can also fill these out after 
creating the PR. If you're unsure about any of them, don't hesitate to ask. 
We're here to help! This is simply a reminder of what we are going to look for 
before merging your code._
   
   - [ ] I have created an issue on [Kylin's 
jira](https://issues.apache.org/jira/browse/KYLIN), and have described the 
bug/feature there in detail
   - [ ] Commit messages in my PR start with the related jira ID, like 
"KYLIN- Make Kylin project open-source"
   - [ ] Compiling and unit tests pass locally with my changes
   - [ ] I have added tests that prove my fix is effective or that my feature 
works
   - [ ] If this change needs a document change, I will prepare another PR 
against the `document` branch
   - [ ] Any dependent changes have been merged
   
   ## Further comments
   
   If this is a relatively large or complex change, kick off the discussion at 
user@kylin or dev@kylin by explaining why you chose the solution you did and 
what alternatives you considered, etc...
   





> Query error when spark engine in local mode
> ---
>
> Key: KYLIN-4889
> URL: https://issues.apache.org/jira/browse/KYLIN-4889
> Project: Kylin
>  Issue Type: Bug
>Affects Versions: v4.0.0-alpha
>Reporter: Feng Zhu
>Assignee: Feng Zhu
>Priority: Major
> Fix For: v4.0.0-GA
>
>
> When I query with the Spark engine in local mode, with -Dspark.local=true, the 
> Spark application was still submitted to YARN, and the following error 
> occurred:
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in 
> stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 
> (TID 6, sandbox.hortonworks.com, executor 1): java.lang.ClassCastException: 
> cannot assign instance of scala.collection.immutable.List$SerializationProxy 
> to field org.apache.spark.rdd.RDD.org$apache$spark$rdd$RDD$$dependencies_ of 
> type scala.collection.Seq in instance of 
> org.apache.spark.rdd.MapPartitionsRDD at 
> java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2233)
>  at java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1405) 
> at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2291) 
> at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2209) at 
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2067) at 
> java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1571) at 
> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2285) at 
> java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2209) at 
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2067) at 
> java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1571) at 
> java.io.ObjectInputStream.readObject(ObjectInputStream.java:431) at 
> org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:75)
>  at 
> org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:114)
>  at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:88) at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55) at 
> org.apache.spark.scheduler.Task.run(Task.scala:123) at 
> org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
>  at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360) at 
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414) at 
> 

[jira] [Commented] (KYLIN-4889) Query error when spark engine in local mode

2021-02-07 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/KYLIN-4889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17280520#comment-17280520
 ] 

ASF subversion and git services commented on KYLIN-4889:


Commit 8f8bab94ed46ad81b45c3522125d7ab020f0e66f in kylin's branch 
refs/heads/kylin-on-parquet-v2 from feng.zhu
[ https://gitbox.apache.org/repos/asf?p=kylin.git;h=8f8bab9 ]

KYLIN-4889, fix master match error when getting master_app_url


> Query error when spark engine in local mode
> ---
>
> Key: KYLIN-4889
> URL: https://issues.apache.org/jira/browse/KYLIN-4889
> Project: Kylin
>  Issue Type: Bug
>Affects Versions: v4.0.0-alpha
>Reporter: Feng Zhu
>Assignee: Feng Zhu
>Priority: Major
> Fix For: v4.0.0-GA
>
>
> When I query with the Spark engine in local mode, with -Dspark.local=true, the 
> Spark application was still submitted to YARN, and the following error 
> occurred:
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in 
> stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 
> (TID 6, sandbox.hortonworks.com, executor 1): java.lang.ClassCastException: 
> cannot assign instance of scala.collection.immutable.List$SerializationProxy 
> to field org.apache.spark.rdd.RDD.org$apache$spark$rdd$RDD$$dependencies_ of 
> type scala.collection.Seq in instance of 
> org.apache.spark.rdd.MapPartitionsRDD at 
> java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2233)
>  at java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1405) 
> at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2291) 
> at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2209) at 
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2067) at 
> java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1571) at 
> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2285) at 
> java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2209) at 
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2067) at 
> java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1571) at 
> java.io.ObjectInputStream.readObject(ObjectInputStream.java:431) at 
> org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:75)
>  at 
> org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:114)
>  at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:88) at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55) at 
> org.apache.spark.scheduler.Task.run(Task.scala:123) at 
> org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
>  at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360) at 
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414) at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748) Driver stacktrace: while executing 
> SQL: "select * from (select KYLIN_SALES.PART_DT , sum(KYLIN_SALES.PRICE ) 
> from KYLIN_SALES group by KYLIN_SALES.PART_DT union select 
> KYLIN_SALES.PART_DT , max(KYLIN_SALES.PRICE ) from KYLIN_SALES group by 
> KYLIN_SALES.PART_DT union select KYLIN_SALES.PART_DT , count(*) from 
> KYLIN_SALES group by KYLIN_SALES.PART_DT union select KYLIN_SALES.PART_DT , 
> count(distinct KYLIN_SALES.PRICE) from KYLIN_SALES group by 
> KYLIN_SALES.PART_DT) limit 501"





[GitHub] [kylin] hit-lacus opened a new pull request #1580: KYLIN-4889

2021-02-07 Thread GitBox


hit-lacus opened a new pull request #1580:
URL: https://github.com/apache/kylin/pull/1580


   ## Proposed changes
   
   Describe the big picture of your changes here to communicate to the 
maintainers why we should accept this pull request. If it fixes a bug or 
resolves a feature request, be sure to link to that issue.
   
   ## Types of changes
   
   What types of changes does your code introduce to Kylin?
   _Put an `x` in the boxes that apply_
   
   - [ ] Bugfix (non-breaking change which fixes an issue)
   - [ ] New feature (non-breaking change which adds functionality)
   - [ ] Breaking change (fix or feature that would cause existing 
functionality to not work as expected)
   - [ ] Documentation Update (if none of the other choices apply)
   
   ## Checklist
   
   _Put an `x` in the boxes that apply. You can also fill these out after 
creating the PR. If you're unsure about any of them, don't hesitate to ask. 
We're here to help! This is simply a reminder of what we are going to look for 
before merging your code._
   
   - [ ] I have created an issue on [Kylin's 
jira](https://issues.apache.org/jira/browse/KYLIN), and have described the 
bug/feature there in detail
   - [ ] Commit messages in my PR start with the related jira ID, like 
"KYLIN- Make Kylin project open-source"
   - [ ] Compiling and unit tests pass locally with my changes
   - [ ] I have added tests that prove my fix is effective or that my feature 
works
   - [ ] If this change needs a document change, I will prepare another PR 
against the `document` branch
   - [ ] Any dependent changes have been merged
   
   ## Further comments
   
   If this is a relatively large or complex change, kick off the discussion at 
user@kylin or dev@kylin by explaining why you chose the solution you did and 
what alternatives you considered, etc...
   







[jira] [Commented] (KYLIN-4898) Add automated test cases

2021-02-07 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/KYLIN-4898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17280512#comment-17280512
 ] 

ASF GitHub Bot commented on KYLIN-4898:
---

hit-lacus commented on pull request #1570:
URL: https://github.com/apache/kylin/pull/1570#issuecomment-774678779


   Thanks @helenzeng0503, you are great.





> Add automated test cases
> 
>
> Key: KYLIN-4898
> URL: https://issues.apache.org/jira/browse/KYLIN-4898
> Project: Kylin
>  Issue Type: Improvement
>  Components: Others
>Affects Versions: v4.0.0-alpha
>Reporter: Yaqian Zhang
>Assignee: Yaqian Zhang
>Priority: Minor
> Fix For: Future
>
>






[jira] [Commented] (KYLIN-4900) The result of derived time columns are error, when timezone is GMT-1 or GMT-N

2021-02-07 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/KYLIN-4900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17280511#comment-17280511
 ] 

ASF GitHub Bot commented on KYLIN-4900:
---

hit-lacus edited a comment on pull request #1573:
URL: https://github.com/apache/kylin/pull/1573#issuecomment-774678467


   I have checked the source code; it looks like it matches my original design. 
Let's test it manually to see if it works as expected.





> The result of derived time columns are error, when timezone is GMT-1 or GMT-N
> -
>
> Key: KYLIN-4900
> URL: https://issues.apache.org/jira/browse/KYLIN-4900
> Project: Kylin
>  Issue Type: Bug
>  Components: Real-time Streaming
>Reporter: Kun Liu
>Assignee: Kun Liu
>Priority: Blocker
> Attachments: day_start is error.png, result- day_start.png
>
>
> When `kylin.stream.event.timezone` is set to GMT-1 or GMT-N, the result of 
> DAY_START is wrong.
>
> The data is produced with `$KYLIN_HOME/bin/kylin.sh 
> org.apache.kylin.source.kafka.util.KafkaSampleProducer --topic 
> kylin_streaming_topic --broker localhost:9092 --interval 1`
>
> The message template is:
> 2021-02-05 06:32:28,720 INFO [main] util.KafkaSampleProducer:136 : Sending 1 
> message: 
> {"country":"US","amount":65.78351439157635,"qty":9,"currency":"USD","order_time":1612506748660,"category":"ELECTRONIC","device":"Windows","user":{"gender":"Male","id":"e1f07f05-9eff-46fa-d401-180d0441df13","first_name":"unknown","age":22}}
>
> The order_time of the first message is 1612506748660, which is 2021-02-05 
> 14:32:28 GMT+8 or 2021-02-05 06:32:28 GMT+0.
>  
> The query result is in the attachments.
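
To make the timezone dependence concrete, here is a minimal sketch (not Kylin's implementation) that computes the start of the day for the sample order_time under several fixed GMT offsets. The helper name day_start and the use of whole-hour offsets are assumptions for illustration. For this timestamp, GMT+8, GMT+0 and GMT-1 all fall on 2021-02-05, while GMT-8 falls on 2021-02-04, so a correct DAY_START has to apply the configured offset before truncating to the day.

```python
from datetime import datetime, timedelta, timezone

ORDER_TIME_MS = 1612506748660  # order_time from the sample message


def day_start(epoch_ms, offset_hours):
    """Return local midnight of the day containing epoch_ms at a fixed GMT offset."""
    tz = timezone(timedelta(hours=offset_hours))
    # Convert the instant to local wall-clock time first, then truncate to the day.
    local = datetime.fromtimestamp(epoch_ms // 1000, tz)
    return local.replace(hour=0, minute=0, second=0, microsecond=0)


for offset in (8, 0, -1, -8):
    start = day_start(ORDER_TIME_MS, offset)
    print(f"GMT{offset:+d}: {start.isoformat()}")
```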





[GitHub] [kylin] hit-lacus commented on pull request #1570: KYLIN-4898 Automated test

2021-02-07 Thread GitBox


hit-lacus commented on pull request #1570:
URL: https://github.com/apache/kylin/pull/1570#issuecomment-774678779


   Thanks @helenzeng0503, you are great.







[GitHub] [kylin] hit-lacus edited a comment on pull request #1573: KYLIN-4900 real-time stream doesn't support GMT-N timezone and doesn't convert other time columnexcept derived column above day

2021-02-07 Thread GitBox


hit-lacus edited a comment on pull request #1573:
URL: https://github.com/apache/kylin/pull/1573#issuecomment-774678467


   I have checked the source code; it looks like it matches my original design. 
Let's test it manually to see if it works as expected.







[jira] [Commented] (KYLIN-4900) The result of derived time columns are error, when timezone is GMT-1 or GMT-N

2021-02-07 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/KYLIN-4900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17280505#comment-17280505
 ] 

ASF GitHub Bot commented on KYLIN-4900:
---

hit-lacus commented on pull request #1573:
URL: https://github.com/apache/kylin/pull/1573#issuecomment-774678467


   I have checked the source code; it looks like it matches my original design.





> The result of derived time columns are error, when timezone is GMT-1 or GMT-N
> -
>
> Key: KYLIN-4900
> URL: https://issues.apache.org/jira/browse/KYLIN-4900
> Project: Kylin
>  Issue Type: Bug
>  Components: Real-time Streaming
>Reporter: Kun Liu
>Assignee: Kun Liu
>Priority: Blocker
> Attachments: day_start is error.png, result- day_start.png
>
>
> When `kylin.stream.event.timezone` is set to GMT-1 or GMT-N, the result of 
> DAY_START is wrong.
>
> The data is produced with `$KYLIN_HOME/bin/kylin.sh 
> org.apache.kylin.source.kafka.util.KafkaSampleProducer --topic 
> kylin_streaming_topic --broker localhost:9092 --interval 1`
>
> The message template is:
> 2021-02-05 06:32:28,720 INFO [main] util.KafkaSampleProducer:136 : Sending 1 
> message: 
> {"country":"US","amount":65.78351439157635,"qty":9,"currency":"USD","order_time":1612506748660,"category":"ELECTRONIC","device":"Windows","user":{"gender":"Male","id":"e1f07f05-9eff-46fa-d401-180d0441df13","first_name":"unknown","age":22}}
>
> The order_time of the first message is 1612506748660, which is 2021-02-05 
> 14:32:28 GMT+8 or 2021-02-05 06:32:28 GMT+0.
>  
> The query result is in the attachments.





[GitHub] [kylin] hit-lacus commented on pull request #1573: KYLIN-4900 real-time stream doesn't support GMT-N timezone and doesn't convert other time columnexcept derived column above day

2021-02-07 Thread GitBox


hit-lacus commented on pull request #1573:
URL: https://github.com/apache/kylin/pull/1573#issuecomment-774678467


   I have checked the source code; it looks like it matches my original design.







[jira] [Commented] (KYLIN-4898) Add automated test cases

2021-02-07 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/KYLIN-4898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17280501#comment-17280501
 ] 

ASF GitHub Bot commented on KYLIN-4898:
---

hit-lacus merged pull request #1570:
URL: https://github.com/apache/kylin/pull/1570


   





> Add automated test cases
> 
>
> Key: KYLIN-4898
> URL: https://issues.apache.org/jira/browse/KYLIN-4898
> Project: Kylin
>  Issue Type: Improvement
>  Components: Others
>Affects Versions: v4.0.0-alpha
>Reporter: Yaqian Zhang
>Assignee: Yaqian Zhang
>Priority: Minor
> Fix For: Future
>
>






[jira] [Commented] (KYLIN-4898) Add automated test cases

2021-02-07 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/KYLIN-4898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17280500#comment-17280500
 ] 

ASF subversion and git services commented on KYLIN-4898:


Commit 0f94e8165427eeb0cbae516f9516d4742f7a1205 in kylin's branch 
refs/heads/kylin-on-parquet-v2 from helenzeng0503
[ https://gitbox.apache.org/repos/asf?p=kylin.git;h=0f94e81 ]

KYLIN-4898 Automated test


> Add automated test cases
> 
>
> Key: KYLIN-4898
> URL: https://issues.apache.org/jira/browse/KYLIN-4898
> Project: Kylin
>  Issue Type: Improvement
>  Components: Others
>Affects Versions: v4.0.0-alpha
>Reporter: Yaqian Zhang
>Assignee: Yaqian Zhang
>Priority: Minor
> Fix For: Future
>
>






[GitHub] [kylin] hit-lacus merged pull request #1570: KYLIN-4898 Automated test

2021-02-07 Thread GitBox


hit-lacus merged pull request #1570:
URL: https://github.com/apache/kylin/pull/1570


   







[jira] [Created] (KYLIN-4905) Support limit .. offset ... in spark query engine

2021-02-07 Thread ShengJun Zheng (Jira)
ShengJun Zheng created KYLIN-4905:
-

 Summary: Support limit .. offset ... in spark query engine
 Key: KYLIN-4905
 URL: https://issues.apache.org/jira/browse/KYLIN-4905
 Project: Kylin
  Issue Type: New Feature
  Components: Query Engine
Affects Versions: v4.0.0-alpha
Reporter: ShengJun Zheng


When a query uses a top-level result offset clause (ANSI SQL), i.e. limit 
xxx offset xxx, in the Spark query engine, the limit is not pushed down into 
the Spark engine and the offset does not take effect. This is incompatible 
with Kylin 2.x~3.x.

After looking through the code, I found this is because Spark does not support 
limit ... offset ... yet. There is a Spark issue for it, 
https://issues.apache.org/jira/browse/SPARK-28330, which was created in 2019 
but is still in progress.

So, should we support this feature temporarily in Kylin?

   1. push down the limit to Spark

   2. take the result from the starting offset in the Kylin query server
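
The two steps proposed above can be sketched as follows. This is a minimal illustration, not Kylin code: it assumes the limit pushed to Spark is offset + limit, so that enough rows survive for the query server to skip the first offset rows, and the names pushed_down_limit and apply_offset are hypothetical. It also assumes the engine returns rows in a deterministic order, without which an offset is not meaningful.

```python
from itertools import islice


def pushed_down_limit(limit, offset):
    """Row count the engine must return so the offset can be applied afterwards."""
    return limit + offset


def apply_offset(rows, limit, offset):
    """Skip `offset` rows on the query-server side, then keep at most `limit` rows."""
    return list(islice(rows, offset, offset + limit))


# Stand-in for the engine: an ordered result truncated by the pushed-down limit.
all_rows = list(range(10))
limit, offset = 3, 4
engine_rows = all_rows[:pushed_down_limit(limit, offset)]
page = apply_offset(engine_rows, limit, offset)
print(page)
```

With this split the engine never materialises more than offset + limit rows, but deep pages still cost as much as fetching everything before them.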

 





[jira] [Commented] (KYLIN-4894) Upgrade Apache Spark version to 2.4.7

2021-02-07 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/KYLIN-4894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17280434#comment-17280434
 ] 

ASF subversion and git services commented on KYLIN-4894:


Commit 69814e01a28210c95f2d296f6a5ed4402a8ae1e4 in kylin's branch 
refs/heads/kylin-on-parquet-v2 from Zhichao Zhang
[ https://gitbox.apache.org/repos/asf?p=kylin.git;h=69814e0 ]

KYLIN-4894 Upgrade Apache Spark version to 2.4.7

(cherry picked from commit ed87e2d940bca2d518d3ac34fd01c4e5129e8ee7)


> Upgrade Apache Spark version to 2.4.7
> -
>
> Key: KYLIN-4894
> URL: https://issues.apache.org/jira/browse/KYLIN-4894
> Project: Kylin
>  Issue Type: Improvement
>Reporter: Zhichao  Zhang
>Assignee: Zhichao  Zhang
>Priority: Minor
>
> Upgrade Apache Spark version to 2.4.7





[jira] [Commented] (KYLIN-4894) Upgrade Apache Spark version to 2.4.7

2021-02-07 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/KYLIN-4894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17280433#comment-17280433
 ] 

ASF GitHub Bot commented on KYLIN-4894:
---

hit-lacus merged pull request #1578:
URL: https://github.com/apache/kylin/pull/1578


   





> Upgrade Apache Spark version to 2.4.7
> -
>
> Key: KYLIN-4894
> URL: https://issues.apache.org/jira/browse/KYLIN-4894
> Project: Kylin
>  Issue Type: Improvement
>Reporter: Zhichao  Zhang
>Assignee: Zhichao  Zhang
>Priority: Minor
>
> Upgrade Apache Spark version to 2.4.7





[GitHub] [kylin] hit-lacus merged pull request #1578: KYLIN-4894 Upgrade Apache Spark version to 2.4.7

2021-02-07 Thread GitBox


hit-lacus merged pull request #1578:
URL: https://github.com/apache/kylin/pull/1578


   







[jira] [Commented] (KYLIN-4887) Segment pruner support string type partition col in spark query engine

2021-02-07 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/KYLIN-4887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17280431#comment-17280431
 ] 

ASF GitHub Bot commented on KYLIN-4887:
---

hit-lacus merged pull request #1561:
URL: https://github.com/apache/kylin/pull/1561


   





> Segment pruner support string type partition col in spark query engine
> --
>
> Key: KYLIN-4887
> URL: https://issues.apache.org/jira/browse/KYLIN-4887
> Project: Kylin
>  Issue Type: Bug
>Reporter: ShengJun Zheng
>Priority: Major
> Fix For: v4.0.0-GA
>
>
> When the partition column is of String type, the segment pruner does not work.





[jira] [Commented] (KYLIN-4887) Segment pruner support string type partition col in spark query engine

2021-02-07 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/KYLIN-4887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17280430#comment-17280430
 ] 

ASF subversion and git services commented on KYLIN-4887:


Commit 0cf9e3d5951e880f04033efa599059237deba766 in kylin's branch 
refs/heads/kylin-on-parquet-v2 from zhengshengjun
[ https://gitbox.apache.org/repos/asf?p=kylin.git;h=0cf9e3d ]

KYLIN-4887 Segment pruner support string type partition col in spark query 
engine


> Segment pruner support string type partition col in spark query engine
> --
>
> Key: KYLIN-4887
> URL: https://issues.apache.org/jira/browse/KYLIN-4887
> Project: Kylin
>  Issue Type: Bug
>Reporter: ShengJun Zheng
>Priority: Major
> Fix For: v4.0.0-GA
>
>
> When the partition column is of String type, the segment pruner does not work.





[GitHub] [kylin] hit-lacus merged pull request #1561: KYLIN-4887 Segment pruner support string type partition col in spark query engine

2021-02-07 Thread GitBox


hit-lacus merged pull request #1561:
URL: https://github.com/apache/kylin/pull/1561


   







[jira] [Commented] (KYLIN-4888) Performance optimization of union query with spark engine

2021-02-07 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/KYLIN-4888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17280429#comment-17280429
 ] 

ASF subversion and git services commented on KYLIN-4888:


Commit a4a8480a05449b7ab19c7eace096dc3e39697dcb in kylin's branch 
refs/heads/kylin-on-parquet-v2 from feng.zhu
[ https://gitbox.apache.org/repos/asf?p=kylin.git;h=a4a8480 ]

KYLIN-4888, Performance optimization of union query with spark engine


>  Performance optimization of union query with spark engine
> --
>
> Key: KYLIN-4888
> URL: https://issues.apache.org/jira/browse/KYLIN-4888
> Project: Kylin
>  Issue Type: Improvement
>  Components: Spark Engine
>Affects Versions: v4.0.0-alpha
>Reporter: Feng Zhu
>Assignee: Feng Zhu
>Priority: Major
> Fix For: v4.0.0-GA
>
> Attachments: spark_union_plan_comparison, stages before.png, 
> stages_after.png
>
>
> When a union query runs on the Spark engine, UnionPlan transforms OLAPUnionRel
> into a Spark DataFrame. When OLAPUnionRel.all = false, Spark's distinct
> transformation is used, but it is applied inside a loop that traverses the
> DataFrame collection, so we do not get the expected optimized, flattened union
> plan (Spark's CombineUnions rule optimizes the distinct, but the nested union
> plan is not flattened), and the Spark DAG ends up with many stages. Actually,
> the distinct transformation should be applied only once, at the end.
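
The idea in the description above can be illustrated with a minimal sketch. It uses plain Python lists in place of Spark DataFrames, and the function names (`distinct`, `union_distinct_in_loop`, `union_then_distinct_once`) are hypothetical, not Kylin or Spark APIs. It only shows that deduplicating after every union and deduplicating once at the end give the same rows, while the second approach needs a single distinct step.

```python
def distinct(rows):
    """Drop duplicate rows while keeping first-seen order."""
    return list(dict.fromkeys(rows))


def union_distinct_in_loop(frames):
    """Deduplicate after every pairwise union (one distinct per step)."""
    distinct_calls = 0
    result = list(frames[0])
    for frame in frames[1:]:
        result = distinct(result + list(frame))
        distinct_calls += 1
    return result, distinct_calls


def union_then_distinct_once(frames):
    """Concatenate all inputs first, then deduplicate a single time."""
    merged = [row for frame in frames for row in frame]
    return distinct(merged), 1


# Three stand-in "DataFrames" with overlapping rows (illustrative data only).
frames = [
    [(1, "a"), (2, "b")],
    [(2, "b"), (3, "c")],
    [(1, "a"), (4, "d")],
]
loop_rows, loop_calls = union_distinct_in_loop(frames)
once_rows, once_calls = union_then_distinct_once(frames)
print(loop_rows == once_rows, loop_calls, once_calls)
```

In this toy model the number of distinct steps grows with the number of inputs in the loop variant, which is loosely analogous to the extra stages the description reports; the real cost in Spark depends on the query plan.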





[jira] [Commented] (KYLIN-4888) Performance optimization of union query with spark engine

2021-02-07 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/KYLIN-4888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17280428#comment-17280428
 ] 

ASF GitHub Bot commented on KYLIN-4888:
---

hit-lacus merged pull request #1562:
URL: https://github.com/apache/kylin/pull/1562


   





>  Performance optimization of union query with spark engine
> --
>
> Key: KYLIN-4888
> URL: https://issues.apache.org/jira/browse/KYLIN-4888
> Project: Kylin
>  Issue Type: Improvement
>  Components: Spark Engine
>Affects Versions: v4.0.0-alpha
>Reporter: Feng Zhu
>Assignee: Feng Zhu
>Priority: Major
> Fix For: v4.0.0-GA
>
> Attachments: spark_union_plan_comparison, stages before.png, 
> stages_after.png
>
>
> When a union query runs on the Spark engine, UnionPlan transforms OLAPUnionRel
> into a Spark DataFrame. When OLAPUnionRel.all = false, Spark's distinct
> transformation is used, but it is applied inside a loop that traverses the
> DataFrame collection, so we do not get the expected optimized, flattened union
> plan (Spark's CombineUnions rule optimizes the distinct, but the nested union
> plan is not flattened), and the Spark DAG ends up with many stages. Actually,
> the distinct transformation should be applied only once, at the end.





[GitHub] [kylin] hit-lacus merged pull request #1562: KYLIN-4888, Performance optimization of union query with spark engine

2021-02-07 Thread GitBox


hit-lacus merged pull request #1562:
URL: https://github.com/apache/kylin/pull/1562


   







[jira] [Commented] (KYLIN-4896) Cube metadata lost during the build process

2021-02-07 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/KYLIN-4896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17280426#comment-17280426
 ] 

ASF GitHub Bot commented on KYLIN-4896:
---

hit-lacus merged pull request #1569:
URL: https://github.com/apache/kylin/pull/1569


   





> Cube metadata lost during the build process
> ---
>
> Key: KYLIN-4896
> URL: https://issues.apache.org/jira/browse/KYLIN-4896
> Project: Kylin
>  Issue Type: Bug
>Affects Versions: v3.1.1
>Reporter: hejian
>Priority: Major
> Attachments: image-2021-02-03-19-11-09-261.png
>
>
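> {quote}Today the problem of cube metadata being lost during a distributed cube build occurred again.
> The cube's metadata was lost when the build reached step 4 (Build Dimension Dictionary).
> The error log is as follows:
> !image-2021-02-03-19-11-09-261.png!
> The Kylin version is 3.1.1, using kylin.job.scheduler.default=2;
> all other configuration is correct.{quote}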





[jira] [Commented] (KYLIN-4896) Cube metadata lost during the build process

2021-02-07 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/KYLIN-4896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17280425#comment-17280425
 ] 

ASF subversion and git services commented on KYLIN-4896:


Commit 985ff834a2eefc536a3bb2e516ab36d8f5667893 in kylin's branch 
refs/heads/master from helenzeng0503
[ https://gitbox.apache.org/repos/asf?p=kylin.git;h=985ff83 ]

KYLIN-4896 Optimize the process of writing big resource files into HDFS.


> Cube metadata lost during the build process
> ---
>
> Key: KYLIN-4896
> URL: https://issues.apache.org/jira/browse/KYLIN-4896
> Project: Kylin
>  Issue Type: Bug
>Affects Versions: v3.1.1
>Reporter: hejian
>Priority: Major
> Attachments: image-2021-02-03-19-11-09-261.png
>
>
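> {quote}Today the problem of cube metadata being lost during a distributed cube build occurred again.
> The cube's metadata was lost when the build reached step 4 (Build Dimension Dictionary).
> The error log is as follows:
> !image-2021-02-03-19-11-09-261.png!
> The Kylin version is 3.1.1, using kylin.job.scheduler.default=2;
> all other configuration is correct.{quote}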





[GitHub] [kylin] hit-lacus merged pull request #1569: KYLIN-4896 Optimize the process of writing big resource files into HDFS.

2021-02-07 Thread GitBox


hit-lacus merged pull request #1569:
URL: https://github.com/apache/kylin/pull/1569


   







[jira] [Commented] (KYLIN-4890) Use numSlices = 1 to reduce task num when executing sparder canary

2021-02-07 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/KYLIN-4890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17280424#comment-17280424
 ] 

ASF GitHub Bot commented on KYLIN-4890:
---

hit-lacus merged pull request #1575:
URL: https://github.com/apache/kylin/pull/1575


   





> Use numSlices = 1 to reduce task num when executing sparder canary
> --
>
> Key: KYLIN-4890
> URL: https://issues.apache.org/jira/browse/KYLIN-4890
> Project: Kylin
>  Issue Type: Improvement
>  Components: Spark Engine
>Reporter: Zhichao  Zhang
>Assignee: Zhichao  Zhang
>Priority: Minor
> Fix For: v4.0.0-GA
>
>
> Use numSlices = 1 to reduce task num when executing sparder canary





[jira] [Commented] (KYLIN-4890) Use numSlices = 1 to reduce task num when executing sparder canary

2021-02-07 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/KYLIN-4890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17280423#comment-17280423
 ] 

ASF subversion and git services commented on KYLIN-4890:


Commit b171aaccacbee0d41ab1465138677b431e691d81 in kylin's branch 
refs/heads/kylin-on-parquet-v2 from Zhichao Zhang
[ https://gitbox.apache.org/repos/asf?p=kylin.git;h=b171aac ]

KYLIN-4890 Use numSlices = 1 to reduce task num when executing sparder canary

(cherry picked from commit 32f77a0c5569bdd3c1c5ea588d8042c58c493555)
(cherry picked from commit a963db414169d1ca614664268effc74a983174df)


> Use numSlices = 1 to reduce task num when executing sparder canary
> --
>
> Key: KYLIN-4890
> URL: https://issues.apache.org/jira/browse/KYLIN-4890
> Project: Kylin
>  Issue Type: Improvement
>  Components: Spark Engine
>Reporter: Zhichao  Zhang
>Assignee: Zhichao  Zhang
>Priority: Minor
> Fix For: v4.0.0-GA
>
>
> Use numSlices = 1 to reduce task num when executing sparder canary





[GitHub] [kylin] hit-lacus merged pull request #1575: KYLIN-4890 Use numSlices = 1 to reduce task num when executing sparder canary

2021-02-07 Thread GitBox


hit-lacus merged pull request #1575:
URL: https://github.com/apache/kylin/pull/1575


   







[GitHub] [kylin] hit-lacus merged pull request #1579: minor fix for 4.0.0-beta release

2021-02-07 Thread GitBox


hit-lacus merged pull request #1579:
URL: https://github.com/apache/kylin/pull/1579


   







[jira] [Commented] (KYLIN-4897) Add table snapshot and global dictionary cleaning in StorageCleanupJob

2021-02-07 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/KYLIN-4897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17280417#comment-17280417
 ] 

ASF GitHub Bot commented on KYLIN-4897:
---

hit-lacus merged pull request #1571:
URL: https://github.com/apache/kylin/pull/1571


   





> Add table snapshot and global dictionary cleaning in StorageCleanupJob
> --
>
> Key: KYLIN-4897
> URL: https://issues.apache.org/jira/browse/KYLIN-4897
> Project: Kylin
>  Issue Type: Improvement
>  Components: Client - CLI
>Affects Versions: v4.0.0-alpha, v4.0.0-beta
>Reporter: Yaqian Zhang
>Assignee: Yaqian Zhang
>Priority: Major
> Fix For: v4.0.0-GA
>
>






[jira] [Commented] (KYLIN-4897) Add table snapshot and global dictionary cleaning in StorageCleanupJob

2021-02-07 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/KYLIN-4897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17280418#comment-17280418
 ] 

ASF subversion and git services commented on KYLIN-4897:


Commit 23d908208b0fa3ca8fe30ecdf9579ba83aed0abd in kylin's branch 
refs/heads/kylin-on-parquet-v2 from yaqian.zhang
[ https://gitbox.apache.org/repos/asf?p=kylin.git;h=23d9082 ]

KYLIN-4897 Add table snapshot and global dictionary cleaning in 
StorageCleanupJob


> Add table snapshot and global dictionary cleaning in StorageCleanupJob
> --
>
> Key: KYLIN-4897
> URL: https://issues.apache.org/jira/browse/KYLIN-4897
> Project: Kylin
>  Issue Type: Improvement
>  Components: Client - CLI
>Affects Versions: v4.0.0-alpha, v4.0.0-beta
>Reporter: Yaqian Zhang
>Assignee: Yaqian Zhang
>Priority: Major
> Fix For: v4.0.0-GA
>
>





