[jira] [Assigned] (KYLIN-2928) PUSH DOWN query cannot use order by function

2017-10-20 Thread Yang Hao (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-2928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Hao reassigned KYLIN-2928:
---

Assignee: Yang Hao  (was: liyang)

> PUSH DOWN query cannot use order by function
> 
>
> Key: KYLIN-2928
> URL: https://issues.apache.org/jira/browse/KYLIN-2928
> Project: Kylin
>  Issue Type: Improvement
>  Components: Query Engine
>Reporter: Yang Hao
>Assignee: Yang Hao
>
> SQL : select "DATE",count(1) from table_1 group by "DATE" order by "DATE" 
> desc;
> Exception:org.apache.calcite.sql.SqlOrderBy cannot be cast to 
> org.apache.calcite.sql.SqlSelect
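
The exception above is typical of code that casts the parsed statement straight to a SELECT node even though a top-level ORDER BY wraps it in a separate node. The sketch below illustrates that pattern with small stand-in classes; they only borrow the names from the exception message and are not the real Calcite types. The `selectOf` helper is hypothetical and is not necessarily the fix that landed on master.

```java
final class OrderByUnwrapDemo {
    // Stand-ins for the node types named in the exception; NOT the Calcite classes.
    static class SqlNode {}

    static final class SqlSelect extends SqlNode {
        final String text;
        SqlSelect(String text) { this.text = text; }
    }

    static final class SqlOrderBy extends SqlNode {
        final SqlNode query;     // the wrapped query
        final String orderList;  // the ORDER BY items, kept as plain text here
        SqlOrderBy(SqlNode query, String orderList) {
            this.query = query;
            this.orderList = orderList;
        }
    }

    // Hypothetical helper: returns the SELECT beneath an optional ORDER BY
    // wrapper instead of casting blindly.
    static SqlSelect selectOf(SqlNode node) {
        if (node instanceof SqlOrderBy) {
            return selectOf(((SqlOrderBy) node).query);
        }
        if (node instanceof SqlSelect) {
            return (SqlSelect) node;
        }
        throw new IllegalArgumentException("unsupported node: " + node.getClass().getSimpleName());
    }

    public static void main(String[] args) {
        SqlSelect select = new SqlSelect("select \"DATE\", count(1) from table_1 group by \"DATE\"");
        SqlNode parsed = new SqlOrderBy(select, "\"DATE\" desc");
        try {
            SqlSelect broken = (SqlSelect) parsed; // the blind cast fails at runtime
            System.out.println(broken.text);
        } catch (ClassCastException e) {
            System.out.println("blind cast failed");
        }
        System.out.println(selectOf(parsed).text); // unwrapping first succeeds
    }
}
```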



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (KYLIN-2928) PUSH DOWN query cannot use order by function

2017-10-20 Thread Yang Hao (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-2928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Hao updated KYLIN-2928:

Affects Version/s: v2.1.0

> PUSH DOWN query cannot use order by function
> 
>
> Key: KYLIN-2928
> URL: https://issues.apache.org/jira/browse/KYLIN-2928
> Project: Kylin
>  Issue Type: Improvement
>  Components: Query Engine
>Affects Versions: v2.1.0
>Reporter: Yang Hao
>Assignee: Yang Hao
>
> SQL : select "DATE",count(1) from table_1 group by "DATE" order by "DATE" 
> desc;
> Exception:org.apache.calcite.sql.SqlOrderBy cannot be cast to 
> org.apache.calcite.sql.SqlSelect





[jira] [Resolved] (KYLIN-2928) PUSH DOWN query cannot use order by function

2017-10-20 Thread Yang Hao (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-2928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Hao resolved KYLIN-2928.
-
Resolution: Fixed

It has been fixed on master



[jira] [Comment Edited] (KYLIN-2928) PUSH DOWN query cannot use order by function

2017-10-20 Thread Yang Hao (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-2928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16212368#comment-16212368
 ] 

Yang Hao edited comment on KYLIN-2928 at 10/20/17 8:47 AM:
---

The problem has been seen in 2.1.0. It has been fixed on master by someone else.


was (Author: yanghaogn):
It has been fixed on master



[jira] [Resolved] (KYLIN-1530) Measures become empty when click 'prev' then 'next' while editting a model

2017-10-20 Thread Zhixiong Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-1530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhixiong Chen resolved KYLIN-1530.
--
   Resolution: Fixed
Fix Version/s: v2.0.0

> Measures become empty when click 'prev' then 'next' while editting a model
> --
>
> Key: KYLIN-1530
> URL: https://issues.apache.org/jira/browse/KYLIN-1530
> Project: Kylin
>  Issue Type: Bug
>  Components: Web 
>Affects Versions: v1.5.0
>Reporter: Dong Li
>Assignee: Zhong,Jason
>Priority: Minor
> Fix For: v2.0.0
>
>
> 1. Edit a model
> 2. Switch to Measures tabpage
> 3. Add a new measure column
> 4. Click 'Prev' button
> 5. Click 'Next' button
> Now we're at the Measures tabpage again, but the textbox is empty.





[jira] [Commented] (KYLIN-2952) dynamic cube build for time(statTime and endTime)

2017-10-20 Thread Billy Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-2952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16212475#comment-16212475
 ] 

Billy Liu commented on KYLIN-2952:
--

150860160  is GMT: Saturday, October 21, 2017 4:00:00 PM, not 2017-10-22

> dynamic cube build for time(statTime and endTime)
> -
>
> Key: KYLIN-2952
> URL: https://issues.apache.org/jira/browse/KYLIN-2952
> Project: Kylin
>  Issue Type: Improvement
>  Components: REST Service
>Affects Versions: v1.6.0
> Environment: linux
>Reporter: wenxue lin
>Assignee: Zhong,Jason
>Priority: Minor
>
> Example: curl -X PUT -u "ADMIN:KYLIN" -H 
> "Content-Type:application/json;charset=utf-8" -d 
> '{"startTime":150860160,"endTime":150868800,"buildType":"BUILD"}' 
> http://host:port/kylin/api/cubes/bi_dispatch_waiting_service_cube/rebuild
> Description:
> The REST API parameters are startTime:150860160 (2017-10-22) and 
> endTime:150868800 (2017-10-23), but the segment that is actually built 
> lands one calendar day early (8 hours early, to be exact): 
> [2017-10-21 ~ 2017-10-22]. Building from the Kylin UI does not show the 
> problem. Reading the Kylin code, I found the cause: the server-side code 
> is fixed to GMT rather than using the configured timezone, GMT+8. 
> The front-end UI converts the time picked on the page according to the 
> configured timezone (GMT+8), and the back end then converts that GMT+8 
> time into GMT, so a cube built from the Kylin UI gets the right time range. 
> A build through the REST API gets no such timezone conversion, so its 
> time range is off by 8 hours.
> *for code:*
> kylinProperties.js:
>     this.getTimeZone = function () {
>         if (!this.timezone) {
>             this.timezone = this.getProperty("kylin.rest.timezone").trim();
>         }
>         return this.timezone;
>     }
> org.apache.kylin.cube.CubeSegment:
>     public static String makeSegmentName(long startDate, long endDate, long startOffset, long endOffset) {
>         if (startOffset != 0 || endOffset != 0) {
>             if (startOffset == 0 && (endOffset == 0 || endOffset == Long.MAX_VALUE)) {
>                 return "FULL_BUILD";
>             }
>             return startOffset + "_" + endOffset;
>         }
>         // using time
>         SimpleDateFormat dateFormat = new SimpleDateFormat("MMddHHmmss");
>         dateFormat.setTimeZone(TimeZone.getTimeZone("GMT"));
>         return dateFormat.format(startDate) + "_" + dateFormat.format(endDate);
>     }
> org.apache.kylin.common.util.DateFormat:
>     public static FastDateFormat getDateFormat(String datePattern) {
>         FastDateFormat r = formatMap.get(datePattern);
>         if (r == null) {
>             // NOTE: this must be GMT to calculate epoch date correctly
>             r = FastDateFormat.getInstance(datePattern, TimeZone.getTimeZone("GMT"));
>             formatMap.put(datePattern, r);
>         }
>         return r;
>     }
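
The one-day shift described in this report can be reproduced outside Kylin. The sketch below renders a single epoch value in GMT and in GMT+8. It assumes the archived startTime is a truncated form of 1508601600000 milliseconds, which matches the GMT reading given in the comment on this issue; the class and method names are made up for illustration.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

final class EpochZoneDemo {
    private static final DateTimeFormatter FMT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    // Formats an epoch-millisecond value as wall-clock time in the given zone.
    static String render(long epochMillis, ZoneId zone) {
        return FMT.format(Instant.ofEpochMilli(epochMillis).atZone(zone));
    }

    public static void main(String[] args) {
        // Assumption: the full value behind the truncated startTime in the report.
        long start = 1508601600000L;
        System.out.println(render(start, ZoneOffset.UTC));        // 2017-10-21 16:00:00
        System.out.println(render(start, ZoneOffset.ofHours(8))); // 2017-10-22 00:00:00
    }
}
```

The same instant is midnight of 2017-10-22 in GMT+8 but still 2017-10-21 in GMT, which is the date a GMT-based formatter would put into the segment name.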





[jira] [Commented] (KYLIN-2952) dynamic cube build for time(statTime and endTime)

2017-10-20 Thread wenxue lin (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-2952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16212479#comment-16212479
 ] 

wenxue lin commented on KYLIN-2952:
---

Yes, but my Kylin config timezone is GMT+8, and my runtime's timezone is 
GMT+8. However, Kylin computes the cube's build time range in GMT, so a cube 
built through the REST API ends up with the wrong partition time.



[jira] [Commented] (KYLIN-2952) dynamic cube build for time(statTime and endTime)

2017-10-20 Thread Billy Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-2952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16212493#comment-16212493
 ] 

Billy Liu commented on KYLIN-2952:
--

The issue is clear. An epoch value has no timezone; it's just a number. The 
timezone only affects GUI display, not the actual data. Please use GMT 
for the API parameters. 
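
Following the advice above, a client could derive the request values from GMT midnight of the wanted partition dates. This is a minimal sketch that assumes startTime and endTime are epoch milliseconds (the values in the archived report appear truncated, so the unit is an assumption); `GmtBuildRange` and its methods are hypothetical helpers, and the JSON shape is copied from the rebuild request quoted in this issue.

```java
import java.time.LocalDate;
import java.time.ZoneOffset;

final class GmtBuildRange {
    // Epoch milliseconds of 00:00:00 GMT on the given ISO date (yyyy-MM-dd).
    static long gmtMidnightMillis(String isoDate) {
        return LocalDate.parse(isoDate)
                .atStartOfDay(ZoneOffset.UTC)
                .toInstant()
                .toEpochMilli();
    }

    // Request body in the shape of the rebuild call quoted in the issue.
    static String rebuildBody(String startDate, String endDate) {
        return String.format("{\"startTime\":%d,\"endTime\":%d,\"buildType\":\"BUILD\"}",
                gmtMidnightMillis(startDate), gmtMidnightMillis(endDate));
    }

    public static void main(String[] args) {
        // Range covering the 2017-10-22 partition, expressed in GMT.
        System.out.println(rebuildBody("2017-10-22", "2017-10-23"));
    }
}
```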



[jira] [Commented] (KYLIN-2929) speed up Dump file performance

2017-10-20 Thread liyang (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-2929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16213540#comment-16213540
 ] 

liyang commented on KYLIN-2929:
---

nice work~

> speed up Dump file performance
> --
>
> Key: KYLIN-2929
> URL: https://issues.apache.org/jira/browse/KYLIN-2929
> Project: Kylin
>  Issue Type: Bug
>  Components: Query Engine
>Affects Versions: v2.0.0
>Reporter: fengYu
>Assignee: fengYu
>  Labels: Performance
> Attachments: 
> 0002-KYLIN-2929-speed-up-dump-performance-write-dump-file.patch
>
>
> While working on KYLIN-2926, I found that the coprocessor dumps to disk once 
> estimatedMemSize exceeds spillThreshold, and that the spilled data is 
> far smaller than estimatedMemSize: in my case the dump file 
> is about 8MB while spillThreshold is set to 3GB.
> So I keep the spill data in memory rather than writing the file to disk 
> immediately, and when the in-memory spill data reaches the threshold, I write 
> all spill files together.
> In my case, the coprocessor processing time dropped from 22s to 16s, 
> an improvement of about 30%.
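
The buffering idea in this description can be sketched independently of the coprocessor. The example below keeps spill chunks in memory and writes them out together once their combined size reaches a threshold. `BufferedSpill` and its members are hypothetical names, and the attached patch may be structured quite differently.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

final class BufferedSpill {
    private final long flushThresholdBytes;
    private final OutputStream sink;
    private final List<byte[]> pending = new ArrayList<>();
    private long pendingBytes = 0;
    private int flushCount = 0;

    BufferedSpill(long flushThresholdBytes, OutputStream sink) {
        this.flushThresholdBytes = flushThresholdBytes;
        this.sink = sink;
    }

    // Keeps the chunk in memory; flushes once the buffered size reaches the threshold.
    void spill(byte[] chunk) {
        pending.add(chunk);
        pendingBytes += chunk.length;
        if (pendingBytes >= flushThresholdBytes) {
            flush();
        }
    }

    // Writes all buffered chunks to the sink in one pass and clears the buffer.
    void flush() {
        if (pending.isEmpty()) {
            return;
        }
        try {
            for (byte[] chunk : pending) {
                sink.write(chunk);
            }
            sink.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        pending.clear();
        pendingBytes = 0;
        flushCount++;
    }

    int flushCount() {
        return flushCount;
    }

    public static void main(String[] args) {
        ByteArrayOutputStream disk = new ByteArrayOutputStream(); // stands in for a dump file
        BufferedSpill spill = new BufferedSpill(10, disk);
        spill.spill(new byte[4]);
        spill.spill(new byte[4]);
        spill.spill(new byte[4]); // 12 bytes buffered, threshold reached
        System.out.println(spill.flushCount() + " flush, " + disk.size() + " bytes written");
    }
}
```

A caller would still need a final `flush()` when the scan ends so that chunks below the threshold are not lost.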





[jira] [Commented] (KYLIN-2794) MultipleDictionaryValueEnumerator should output values in sorted order

2017-10-20 Thread liyang (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-2794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16213567#comment-16213567
 ] 

liyang commented on KYLIN-2794:
---

How can we reproduce what you tried? Kylin 2.2 is not released yet. Please provide 
more info, such as a stack trace.

> MultipleDictionaryValueEnumerator should output values in sorted order
> --
>
> Key: KYLIN-2794
> URL: https://issues.apache.org/jira/browse/KYLIN-2794
> Project: Kylin
>  Issue Type: Bug
>  Components: Job Engine
>Affects Versions: v2.0.0
> Environment: hadoop hadoop-2.6.0-cdh5.8.2   hive 2.1 hbase 0.98
>Reporter: 翟玉勇
>Assignee: Yifei Wu
>Priority: Critical
>  Labels: scope
> Fix For: v2.2.0
>
>
> Dictionary exception during merge of segments.
> {code}
> 2017-08-18 14:17:48,828 ERROR [pool-11-thread-1] 
> threadpool.DistributedScheduler:188 : ExecuteException 
> job:8d031b5f-2d3f-445f-a62b-7bc560d919ea in server: **
> org.apache.kylin.job.exception.ExecuteException: 
> org.apache.kylin.job.exception.ExecuteException: 
> java.lang.IllegalStateException: Invalid input data. Unordered data cannot be 
> split into multi trees
>   at 
> org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:134)
>   at 
> org.apache.kylin.job.impl.threadpool.DistributedScheduler$JobRunner.run(DistributedScheduler.java:185)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.kylin.job.exception.ExecuteException: 
> java.lang.IllegalStateException: Invalid input data. Unordered data cannot be 
> split into multi trees
>   at 
> org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:134)
>   at 
> org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:64)
>   at 
> org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:124)
>   ... 4 more
> Caused by: java.lang.IllegalStateException: Invalid input data. Unordered 
> data cannot be split into multi trees
>   at 
> org.apache.kylin.dict.TrieDictionaryForestBuilder.addValue(TrieDictionaryForestBuilder.java:92)
>   at 
> org.apache.kylin.dict.TrieDictionaryForestBuilder.addValue(TrieDictionaryForestBuilder.java:78)
>   at 
> org.apache.kylin.dict.DictionaryGenerator$StringTrieDictForestBuilder.addValue(DictionaryGenerator.java:212)
>   at 
> org.apache.kylin.dict.DictionaryGenerator.buildDictionary(DictionaryGenerator.java:79)
>   at 
> org.apache.kylin.dict.DictionaryGenerator.buildDictionary(DictionaryGenerator.java:64)
>   at 
> org.apache.kylin.dict.DictionaryGenerator.mergeDictionaries(DictionaryGenerator.java:104)
>   at 
> org.apache.kylin.dict.DictionaryManager.mergeDictionary(DictionaryManager.java:267)
>   at 
> org.apache.kylin.engine.mr.steps.MergeDictionaryStep.mergeDictionaries(MergeDictionaryStep.java:146)
>   at 
> org.apache.kylin.engine.mr.steps.MergeDictionaryStep.makeDictForNewSegment(MergeDictionaryStep.java:136)
>   at 
> org.apache.kylin.engine.mr.steps.MergeDictionaryStep.doWork(MergeDictionaryStep.java:68)
>   at 
> org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:124)
>   ... 6 more
> {code}





[jira] [Commented] (KYLIN-2932) Simplify the thread model for in-memory cubing

2017-10-20 Thread liyang (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-2932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16213614#comment-16213614
 ] 

liyang commented on KYLIN-2932:
---

Sounds good!

> Simplify the thread model for in-memory cubing
> --
>
> Key: KYLIN-2932
> URL: https://issues.apache.org/jira/browse/KYLIN-2932
> Project: Kylin
>  Issue Type: Improvement
>  Components: Job Engine
>Reporter: Wang Ken
>Assignee: Wang Ken
>
> The current implementation uses split threads, task threads and the main thread 
> to build the cube, with complex join and error-handling logic.
> The new implementation leverages the JDK's ForkJoinPool: the event split logic 
> is handled in the main thread, cuboid tasks and sub-tasks are handled in the 
> fork-join pool, and cube results are collected asynchronously and can be 
> written to output earlier.
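
The fork-join structure described above can be sketched with the JDK classes alone. In the example below each task handles one cuboid and forks a sub-task per child cuboid. `Cuboid`, `CuboidTask` and `build` are hypothetical names, and the real in-memory cubing work is replaced by simply counting the cuboids handled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

final class CuboidTreeDemo {
    // Hypothetical cuboid node: an id plus the child cuboids derived from it.
    static final class Cuboid {
        final long id;
        final List<Cuboid> children = new ArrayList<>();
        Cuboid(long id) { this.id = id; }
        Cuboid add(Cuboid child) { children.add(child); return this; }
    }

    // Handles one cuboid, forks a sub-task per child, returns how many cuboids were handled.
    static final class CuboidTask extends RecursiveTask<Integer> {
        private final Cuboid cuboid;
        CuboidTask(Cuboid cuboid) { this.cuboid = cuboid; }

        @Override
        protected Integer compute() {
            List<CuboidTask> subTasks = new ArrayList<>();
            for (Cuboid child : cuboid.children) {
                CuboidTask task = new CuboidTask(child);
                task.fork(); // schedule the child on the pool
                subTasks.add(task);
            }
            int handled = 1; // this cuboid itself
            for (CuboidTask task : subTasks) {
                handled += task.join();
            }
            return handled;
        }
    }

    // Runs the whole cuboid tree on a fork-join pool and waits for the result.
    static int build(Cuboid root) {
        ForkJoinPool pool = new ForkJoinPool();
        try {
            return pool.invoke(new CuboidTask(root));
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        Cuboid root = new Cuboid(7)
                .add(new Cuboid(3).add(new Cuboid(1)))
                .add(new Cuboid(5));
        System.out.println(build(root) + " cuboids handled");
    }
}
```

Joining inside `compute()` lets the pool propagate a failure in any sub-task to the caller of `invoke`, which is one way the separate join and error-handling logic could collapse into the pool itself.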





[jira] [Commented] (KYLIN-2903) support cardinality calculation for Hive view

2017-10-20 Thread liyang (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-2903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16213625#comment-16213625
 ] 

liyang commented on KYLIN-2903:
---

What's the plan here? Materialize the Hive view first?

> support cardinality calculation for Hive view
> -
>
> Key: KYLIN-2903
> URL: https://issues.apache.org/jira/browse/KYLIN-2903
> Project: Kylin
>  Issue Type: Improvement
>  Components: Job Engine
>Reporter: Wang, Gang
>Assignee: Wang, Gang
>Priority: Minor
>
> Currently, Kylin leverages HCatalog to calculate column cardinality for Hive 
> tables. However, HCatalog does not actually support Hive views. 





[jira] [Updated] (KYLIN-2943) Problems encountered when setting queuename in Kylin

2017-10-20 Thread liyang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-2943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

liyang updated KYLIN-2943:
--
Labels: scope  (was: )

> Problems encountered when setting queuename in Kylin
> ---
>
> Key: KYLIN-2943
> URL: https://issues.apache.org/jira/browse/KYLIN-2943
> Project: Kylin
>  Issue Type: Bug
>  Components: Job Engine
>Affects Versions: v2.1.0
> Environment: CDH5.7
>Reporter: wang20170707
>Assignee: Dong Li
>  Labels: scope
>
> A new Kylin 2.1 environment was set up on CDH 5.7.
> 1. The queue name needs to be changed. The steps taken were as follows.
> In kylin.properties, the queue parameters are configured as:
> kylin.source.kylin.client=beeline
> kylin.engine.mr.config-override.mapreduce.job.queuename=cbasQueue
> kylin.source.hive.config-override.mapreduce.job.queuename=cbasQueue
> In kylin_hive_conf.xml, set:
> mapred.job.queue.name
> In kylin_job_conf_inmem.xml and kylin_job_conf.xml, set:
> mapreduce.job.queuename=cbasQueue
> 2. After changing the configuration and restarting Kylin, the MR job that runs during load hive table fails. Details:
> User: bdcbas1234
> Name: Kylin Hive Column Cardinality Job table=CBAS_DB.S_WEB_APP_PAGE 
> output=hdfs://zhcbdpII/bdcbasApp/kylin/bdcbasApp-kylin_metadata/cardinality/5ae93815-0f53-412e-bb53-07c194c873d7/CBAS_DB.S_WEB_APP_PAGE
> Application Type: MAPREDUCE
> Application Tags: 
> State:FAILED
> FinalStatus:  FAILED
> Started:  Tue Oct 17 16:25:47 +0800 2017
> Elapsed:  8sec
> Tracking URL: History
> Diagnostics:  
> Application application_1507516583223_4851 failed 2 times due to AM Container 
> for appattempt_1507516583223_4851_02 exited with exitCode: 1 due to: 
> Exception from container-launch.
> Container id: container_1507516583223_4851_02_01
> Exit code: 1
> Stack trace: ExitCodeException exitCode=1:
> at org.apache.hadoop.util.Shell.runCommand(Shell.java:538)
> at org.apache.hadoop.util.Shell.run(Shell.java:455)
> at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:702)
> at 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:290)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:299)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:81)
> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> Shell output: main : command provided 1
> main : user is bdcbas1234
> main : requested yarn user is bdcbas1234
> Container exited with a non-zero exit code 1
> .Failing this attempt.. Failing the application.
> 3. BUILD CUBE fails at step 2:
> #1 Step Name: Create Intermediate Flat Hive Table
> Succeeded; the MR job used the newly configured queue, cbasQueue.
> #2 Step Name: Redistribute Flat Hive Table
> Failed; the MR job select count(*) from 
> ...6d_9935_49f0c9600e38(Stage-1) used the queue from before the change, default.
> 4. The kylin.properties parameter 
> kylin.source.kylin.client was changed to cli.
> load hive table fails with the same error as with beeline.
> 5. BUILD CUBE then fails at step 3:
> #1 Step Name: Create Intermediate Flat Hive Table
> Succeeded, same as with beeline.
> #2 Step Name: Redistribute Flat Hive Table
> Succeeded; no MR job ran in the background, and the Kylin log printed:
> Row count is 0, no need to redistribute
> #3 Step Name: Extract Fact Table Distinct Columns
> Failed; the log is as follows:
> User: bdcbas1234
> Name: Kylin_Fact_Distinct_Columns_w_cub_source_Step
> Application Type: MAPREDUCE
> Application Tags: 
> State:FAILED
> FinalStatus:  FAILED
> Started:  Tue Oct 17 13:52:40 +0800 2017
> Elapsed:  26sec
> Tracking URL: History
> Diagnostics:  
> Application application_1507516583223_4639 failed 2 times due to AM Container 
> for appattempt_1507516583223_4639_02 exited with exitCode: 1 due to: 
> Exception from container-launch.
> Container id: container_1507516583223_4639_02_01
> Exit code: 1
> Stack trace: ExitCodeException exitCode=1:
> at org.apache.hadoop.util.Shell.runCommand(Shell.java:538)
> at org.apache.hadoop.util.Shell.run(Shell.java:455)
> at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:702)
> at 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:290)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:299)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:81)
> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> Shell output: main : command provided 1
> main : user is bdcbas1234
> main : reque

[jira] [Commented] (KYLIN-2764) Build the dict for UHC column with MR

2017-10-20 Thread liyang (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-2764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16213631#comment-16213631
 ] 

liyang commented on KYLIN-2764:
---

Understood that this is a 2-level reducer pattern, used to control the total number 
of records one reducer receives and to improve performance. Let's move on with this patch.

> Build the dict for UHC column with MR
> -
>
> Key: KYLIN-2764
> URL: https://issues.apache.org/jira/browse/KYLIN-2764
> Project: Kylin
>  Issue Type: Improvement
>  Components: Job Engine
>Affects Versions: v2.0.0
>Reporter: kangkaisen
>Assignee: kangkaisen
> Attachments: job-memory-after.png, job-memory-before.png
>
>
> KYLIN-2217 built the dict for normal columns with MR, but the dict for UHC 
> columns is still built in the JobServer. As in KYLIN-2217, we could also use MR 
> to build the dict for UHC columns, which would relieve the memory pressure on 
> the JobServer, improve its job concurrency, and speed up the procedure when 
> there are multiple UHC columns.
> The MR input is the output of "Extract Fact Table Distinct Columns"; the MR 
> output is the UHC column dict. Because it is very hard to build a global dict with 
> multiple reducers, I use one reducer per UHC column and allocate enough 
> memory to the reducer. According to my test, 8G of memory is enough.





[jira] [Commented] (KYLIN-2944) HLLCSerializer, RawSerializer, PercentileSerializer returns shared object in serialize()

2017-10-20 Thread liyang (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-2944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16213638#comment-16213638
 ] 

liyang commented on KYLIN-2944:
---

Hold on a bit... by creating new objects, the increased may cause performance 
decrease in many places. Have we tested the performance difference at least?

> HLLCSerializer, RawSerializer, PercentileSerializer returns shared object in 
> serialize()
> 
>
> Key: KYLIN-2944
> URL: https://issues.apache.org/jira/browse/KYLIN-2944
> Project: Kylin
>  Issue Type: Bug
>Reporter: Shaofeng SHI
>Assignee: Shaofeng SHI
> Fix For: v2.2.0
>
>
> This is a bug that causes incorrect query results. See more in KYLIN-2926.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (KYLIN-2944) HLLCSerializer, RawSerializer, PercentileSerializer returns shared object in serialize()

2017-10-20 Thread liyang (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-2944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16213640#comment-16213640
 ] 

liyang commented on KYLIN-2944:
---

If the consumer wants new objects, the consumer can always clone the measure 
objects itself.



[jira] [Comment Edited] (KYLIN-2944) HLLCSerializer, RawSerializer, PercentileSerializer returns shared object in serialize()

2017-10-20 Thread liyang (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-2944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16213638#comment-16213638
 ] 

liyang edited comment on KYLIN-2944 at 10/21/17 1:32 AM:
-

Hold on a bit... by creating new objects, the increased GC may impact 
performance at all places where the serializers are used. Have we tested the 
performance difference at least?


was (Author: liyang.g...@gmail.com):
Hold on a bit... by creating new objects, the increased may cause performance 
decrease in many places. Have we tested the performance difference at least?



[jira] [Comment Edited] (KYLIN-2944) HLLCSerializer, RawSerializer, PercentileSerializer returns shared object in serialize()

2017-10-20 Thread liyang (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-2944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16213638#comment-16213638
 ] 

liyang edited comment on KYLIN-2944 at 10/21/17 1:34 AM:
-

Hold on a bit... by creating new objects, the increased GC may impact 
performance at all places where the serializers are used. Have we tested the 
performance difference at least?

I recall a benchmark where reusing ArrayList vs new ArrayList on every record 
yields 5% performance difference.


was (Author: liyang.g...@gmail.com):
Hold on a bit... by creating new objects, the increased GC may impact 
performance at all places where the serializers are used. Have we tested the 
performance difference at least?



[jira] [Commented] (KYLIN-2926) DumpMerger return incorrect results

2017-10-20 Thread liyang (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-2926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16213750#comment-16213750
 ] 

liyang commented on KYLIN-2926:
---

As mentioned in KYLIN-2944, I do have a concern here. Objects are shared for 
the sake of performance, the same reason Mapper and Reducer reuse objects in 
their interfaces.

I'm doing a quick benchmark to see the performance difference.

> DumpMerger return incorrect results
> ---
>
> Key: KYLIN-2926
> URL: https://issues.apache.org/jira/browse/KYLIN-2926
> Project: Kylin
>  Issue Type: Bug
>Affects Versions: v2.0.0
>Reporter: fengYu
>Assignee: fengYu
> Attachments: 
> 0001-KYLIN-2926-DumpMerger-return-incorrect-results-creat.patch
>
>
> In our scenario, a cube query returns wrong results once the coprocessor needs to 
> spill to disk. Our version is 2.0.0, and I found the root cause in 
> DumpMerger.enqueueFromDump: 
> DataTypeSerializer uses a ThreadLocal variable 'current', which 
> leads different elements in dumpCurrentValues to share the same object, so 
> the next fill of measure values changes the existing values. 
> The incorrect measures are HLLC and raw, which use the current variable in 
> deserialize.
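
The aliasing problem described above is easy to reproduce in isolation. In the sketch below a deserializer hands back a reused per-thread object, so every element collected from it ends up holding the last value read. `Counter`, `deserializeShared` and `deserializeFresh` are hypothetical stand-ins, not Kylin's serializer classes.

```java
import java.util.ArrayList;
import java.util.List;

final class SharedCurrentDemo {
    // Hypothetical mutable measure value.
    static final class Counter {
        long value;
    }

    // Reused per-thread instance, mirroring the 'current' variable named in the report.
    private static final ThreadLocal<Counter> CURRENT = ThreadLocal.withInitial(Counter::new);

    // Returns the same object on every call made from one thread.
    static Counter deserializeShared(long raw) {
        Counter c = CURRENT.get();
        c.value = raw;
        return c;
    }

    // Returns a new object per call, so earlier results are never overwritten.
    static Counter deserializeFresh(long raw) {
        Counter c = new Counter();
        c.value = raw;
        return c;
    }

    public static void main(String[] args) {
        List<Counter> shared = new ArrayList<>();
        shared.add(deserializeShared(1));
        shared.add(deserializeShared(2));
        System.out.println(shared.get(0).value); // 2: the first element was overwritten

        List<Counter> fresh = new ArrayList<>();
        fresh.add(deserializeFresh(1));
        fresh.add(deserializeFresh(2));
        System.out.println(fresh.get(0).value); // 1: each element keeps its own value
    }
}
```

Either the deserializer allocates a new object or the caller copies the returned value before storing it; the thread around this issue discusses the performance trade-off between the two.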



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (KYLIN-2944) HLLCSerializer, RawSerializer, PercentileSerializer returns shared object in serialize()

2017-10-20 Thread liyang (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-2944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16213766#comment-16213766
 ] 

liyang commented on KYLIN-2944:
---

Added a performance test case. The result shows that, for HLLCSerializer, 
returning a shared object or not DOESN'T impact performance. I have no concerns now.


[jira] [Commented] (KYLIN-2926) DumpMerger return incorrect results

2017-10-20 Thread liyang (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-2926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16213767#comment-16213767
 ] 

liyang commented on KYLIN-2926:
---

Added a performance test case. The result shows that, for HLLCSerializer, 
returning a shared object or not DOESN'T impact performance. I have no concerns now.
