[jira] [Assigned] (KYLIN-2928) PUSH DOWN query cannot use order by function
[ https://issues.apache.org/jira/browse/KYLIN-2928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yang Hao reassigned KYLIN-2928:
-------------------------------

    Assignee: Yang Hao  (was: liyang)

> PUSH DOWN query cannot use order by function
> --------------------------------------------
>
>                 Key: KYLIN-2928
>                 URL: https://issues.apache.org/jira/browse/KYLIN-2928
>             Project: Kylin
>          Issue Type: Improvement
>          Components: Query Engine
>            Reporter: Yang Hao
>            Assignee: Yang Hao
>
> SQL: select "DATE",count(1) from table_1 group by "DATE" order by "DATE" desc;
> Exception: org.apache.calcite.sql.SqlOrderBy cannot be cast to org.apache.calcite.sql.SqlSelect

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
[jira] [Updated] (KYLIN-2928) PUSH DOWN query cannot use order by function
[ https://issues.apache.org/jira/browse/KYLIN-2928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yang Hao updated KYLIN-2928:
----------------------------

    Affects Version/s: v2.1.0
[jira] [Resolved] (KYLIN-2928) PUSH DOWN query cannot use order by function
[ https://issues.apache.org/jira/browse/KYLIN-2928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yang Hao resolved KYLIN-2928.
-----------------------------

    Resolution: Fixed

It has been fixed on master
[jira] [Comment Edited] (KYLIN-2928) PUSH DOWN query cannot use order by function
[ https://issues.apache.org/jira/browse/KYLIN-2928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16212368#comment-16212368 ]

Yang Hao edited comment on KYLIN-2928 at 10/20/17 8:47 AM:
-----------------------------------------------------------

The problem was seen in 2.1.0. It has been fixed on master by someone else.

was (Author: yanghaogn):
It has been fixed on master
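
The cast failure reported in KYLIN-2928 arises because a query with a top-level ORDER BY parses to an order-by node that wraps the SELECT, rather than to a SELECT node itself. The thread does not say how master fixed it, so the following is only a minimal sketch of one plausible approach: unwrap the order-by node before casting. The types `Node`, `Select`, `OrderBy` and the class `PushDownUnwrap` are hypothetical stand-ins written for this illustration; they are not Calcite's or Kylin's classes.

```java
// Hypothetical stand-ins for parse-tree nodes; NOT Calcite's real classes.
interface Node {}

final class Select implements Node {
    final String text;
    Select(String text) { this.text = text; }
}

final class OrderBy implements Node {
    final Node query;        // the query wrapped by ORDER BY
    final String orderList;  // the ORDER BY items, kept as plain text here
    OrderBy(Node query, String orderList) {
        this.query = query;
        this.orderList = orderList;
    }
}

final class PushDownUnwrap {
    /** Returns the underlying Select, unwrapping an OrderBy first if one is present. */
    static Select underlyingSelect(Node parsed) {
        Node node = parsed;
        if (node instanceof OrderBy) {
            // A blind cast of the OrderBy itself would throw ClassCastException.
            node = ((OrderBy) node).query;
        }
        if (node instanceof Select) {
            return (Select) node;
        }
        throw new IllegalArgumentException("Unsupported node: " + node.getClass().getName());
    }
}
```

Checking the node type before casting keeps a plain SELECT working unchanged while also accepting the wrapped form.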
[jira] [Resolved] (KYLIN-1530) Measures become empty when click 'prev' then 'next' while editting a model
[ https://issues.apache.org/jira/browse/KYLIN-1530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhixiong Chen resolved KYLIN-1530.
----------------------------------

       Resolution: Fixed
    Fix Version/s: v2.0.0

> Measures become empty when click 'prev' then 'next' while editting a model
> --------------------------------------------------------------------------
>
>                 Key: KYLIN-1530
>                 URL: https://issues.apache.org/jira/browse/KYLIN-1530
>             Project: Kylin
>          Issue Type: Bug
>          Components: Web
>    Affects Versions: v1.5.0
>            Reporter: Dong Li
>            Assignee: Zhong,Jason
>            Priority: Minor
>             Fix For: v2.0.0
>
> 1. Edit a model
> 2. Switch to the Measures tab page
> 3. Add a new measure column
> 4. Click the 'Prev' button
> 5. Click the 'Next' button
> Now we're at the Measures tab page again, but the textbox is empty.
[jira] [Commented] (KYLIN-2952) dynamic cube build for time(statTime and endTime)
[ https://issues.apache.org/jira/browse/KYLIN-2952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16212475#comment-16212475 ]

Billy Liu commented on KYLIN-2952:
----------------------------------

150860160 is GMT: Saturday, October 21, 2017 4:00:00 PM, not 2017-10-22

> dynamic cube build for time(statTime and endTime)
> -------------------------------------------------
>
>                 Key: KYLIN-2952
>                 URL: https://issues.apache.org/jira/browse/KYLIN-2952
>             Project: Kylin
>          Issue Type: Improvement
>          Components: REST Service
>    Affects Versions: v1.6.0
>         Environment: linux
>            Reporter: wenxue lin
>            Assignee: Zhong,Jason
>            Priority: Minor
>
> Example:
> curl -X PUT -u "ADMIN:KYLIN" -H "Content-Type:application/json;charset=utf-8" -d '{"startTime":150860160,"endTime":150868800,"buildType":"BUILD"}' http://host:port/kylin/api/cubes/bi_dispatch_waiting_service_cube/rebuild
>
> Description:
> The REST API parameters are startTime:150860160 (2017-10-22) and endTime:150868800 (2017-10-23), but the cube is actually built for a range one day earlier (in fact 8 hours earlier), i.e. [2017-10-21 ~ 2017-10-22]. Building through the Kylin UI has no such problem.
> Looking at the Kylin code, the cause is that the server-side source is fixed to GMT rather than using the configured timezone, GMT+8. The front-end UI converts the time entered on the page according to the configured timezone (GMT+8), and the back end then turns that GMT+8 time into GMT time, so cubes built through the Kylin UI get the correct time range. A build through the REST API has no such 8-hour timezone adjustment, so its time range is off by that difference.
>
> *for code:*
> kylinProperties.js
>   this.getTimeZone = function () {
>     if (!this.timezone) {
>       this.timezone = this.getProperty("kylin.rest.timezone").trim();
>     }
>     return this.timezone;
>   }
>
> org.apache.kylin.cube.CubeSegment
>   public static String makeSegmentName(long startDate, long endDate, long startOffset, long endOffset) {
>     if (startOffset != 0 || endOffset != 0) {
>       if (startOffset == 0 && (endOffset == 0 || endOffset == Long.MAX_VALUE)) {
>         return "FULL_BUILD";
>       }
>       return startOffset + "_" + endOffset;
>     }
>     // using time
>     SimpleDateFormat dateFormat = new SimpleDateFormat("MMddHHmmss");
>     dateFormat.setTimeZone(TimeZone.getTimeZone("GMT"));
>     return dateFormat.format(startDate) + "_" + dateFormat.format(endDate);
>   }
>
> org.apache.kylin.common.util.DateFormat:
>   public static FastDateFormat getDateFormat(String datePattern) {
>     FastDateFormat r = formatMap.get(datePattern);
>     if (r == null) {
>       r = FastDateFormat.getInstance(datePattern, TimeZone.getTimeZone("GMT")); // NOTE: this must be GMT to calculate epoch date correctly
>       formatMap.put(datePattern, r);
>     }
>     return r;
>   }
[jira] [Commented] (KYLIN-2952) dynamic cube build for time(statTime and endTime)
[ https://issues.apache.org/jira/browse/KYLIN-2952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16212479#comment-16212479 ]

wenxue lin commented on KYLIN-2952:
-----------------------------------

Yes, but my Kylin timezone config is GMT+8, and my runtime's timezone is GMT+8 as well. However, Kylin uses GMT when building the cube, which results in a wrong partition time range when the cube is built through the REST API.
[jira] [Commented] (KYLIN-2952) dynamic cube build for time(statTime and endTime)
[ https://issues.apache.org/jira/browse/KYLIN-2952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16212493#comment-16212493 ]

Billy Liu commented on KYLIN-2952:
----------------------------------

The issue is clear. An epoch value has no timezone itself; it's just a number. The timezone only matters for GUI display, not for the actual data. Please use GMT for the API parameters.
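
To illustrate the point about epoch values in KYLIN-2952, here is a minimal sketch of how a client might compute `startTime` for a given calendar day. It assumes the rebuild API takes epoch milliseconds and that the values quoted in the report lost trailing zeros in the archive; the class `RebuildEpoch` and its method are hypothetical names used only for this example.

```java
import java.time.LocalDate;
import java.time.ZoneId;
import java.time.ZoneOffset;

final class RebuildEpoch {
    /** Epoch milliseconds of midnight at the start of the given date in the given zone. */
    static long startOfDayMillis(LocalDate date, ZoneId zone) {
        return date.atStartOfDay(zone).toInstant().toEpochMilli();
    }

    public static void main(String[] args) {
        LocalDate day = LocalDate.of(2017, 10, 22);
        // Midnight GMT on 2017-10-22: the value to send for a GMT day boundary.
        System.out.println(startOfDayMillis(day, ZoneOffset.UTC));         // 1508630400000
        // Midnight GMT+8 on 2017-10-22 is 16:00 GMT on 2017-10-21.
        System.out.println(startOfDayMillis(day, ZoneOffset.ofHours(8)));  // 1508601600000
    }
}
```

The second value corresponds to the instant Billy Liu identifies (Saturday, October 21, 2017 4:00:00 PM GMT), which is why the segment appears to start a day early when it is formatted in GMT.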
[jira] [Commented] (KYLIN-2929) speed up Dump file performance
[ https://issues.apache.org/jira/browse/KYLIN-2929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16213540#comment-16213540 ]

liyang commented on KYLIN-2929:
-------------------------------

nice work~

> speed up Dump file performance
> ------------------------------
>
>                 Key: KYLIN-2929
>                 URL: https://issues.apache.org/jira/browse/KYLIN-2929
>             Project: Kylin
>          Issue Type: Bug
>          Components: Query Engine
>    Affects Versions: v2.0.0
>            Reporter: fengYu
>            Assignee: fengYu
>              Labels: Performance
>         Attachments: 0002-KYLIN-2929-speed-up-dump-performance-write-dump-file.patch
>
> While working on KYLIN-2926, I found that the coprocessor dumps to disk once estimatedMemSize is bigger than spillThreshold, and that the spilled data is far smaller than estimatedMemSize: in my case the dump file is about 8MB while spillThreshold is set to 3GB.
> So I try to keep the spill data in memory rather than writing the file to disk immediately; when the in-memory spill data reaches the threshold, all spill files are written together.
> In my case, the coprocessor processing time drops from 22s to 16s, an improvement of about 30%.
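
The batching idea described in KYLIN-2929 can be sketched as follows. This is not the attached patch; it is a minimal illustration under assumptions: `BufferedSpill`, `Sink`, and the byte-count threshold are hypothetical names, and the real coprocessor code may track sizes and write files quite differently.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: holds small spill chunks in memory and writes them out together. */
final class BufferedSpill {
    /** Destination for a batch of chunks, e.g. something that writes dump files. */
    interface Sink {
        void write(List<byte[]> chunks);
    }

    private final long flushThresholdBytes;
    private final Sink sink;
    private final List<byte[]> pending = new ArrayList<>();
    private long pendingBytes;

    BufferedSpill(long flushThresholdBytes, Sink sink) {
        this.flushThresholdBytes = flushThresholdBytes;
        this.sink = sink;
    }

    /** Buffers one chunk; flushes only when the buffered bytes reach the threshold. */
    void spill(byte[] chunk) {
        pending.add(chunk);
        pendingBytes += chunk.length;
        if (pendingBytes >= flushThresholdBytes) {
            flush();
        }
    }

    /** Writes all buffered chunks in one batch and resets the buffer. */
    void flush() {
        if (pending.isEmpty()) {
            return;
        }
        sink.write(new ArrayList<>(pending));
        pending.clear();
        pendingBytes = 0;
    }
}
```

A caller would invoke `flush()` once more at the end of processing so that chunks below the threshold are not lost.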
[jira] [Commented] (KYLIN-2794) MultipleDictionaryValueEnumerator should output values in sorted order
[ https://issues.apache.org/jira/browse/KYLIN-2794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16213567#comment-16213567 ]

liyang commented on KYLIN-2794:
-------------------------------

How to reproduce your attempt? Kylin 2.2 is not released yet. Please provide more info like a stacktrace.

> MultipleDictionaryValueEnumerator should output values in sorted order
> ----------------------------------------------------------------------
>
>                 Key: KYLIN-2794
>                 URL: https://issues.apache.org/jira/browse/KYLIN-2794
>             Project: Kylin
>          Issue Type: Bug
>          Components: Job Engine
>    Affects Versions: v2.0.0
>         Environment: hadoop hadoop-2.6.0-cdh5.8.2 hive 2.1 hbase 0.98
>            Reporter: 翟玉勇
>            Assignee: Yifei Wu
>            Priority: Critical
>              Labels: scope
>             Fix For: v2.2.0
>
> Dictionary exception during merge of segments.
> {code}
> 2017-08-18 14:17:48,828 ERROR [pool-11-thread-1] threadpool.DistributedScheduler:188 : ExecuteException job:8d031b5f-2d3f-445f-a62b-7bc560d919ea in server: **
> org.apache.kylin.job.exception.ExecuteException: org.apache.kylin.job.exception.ExecuteException: java.lang.IllegalStateException: Invalid input data. Unordered data cannot be split into multi trees
>     at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:134)
>     at org.apache.kylin.job.impl.threadpool.DistributedScheduler$JobRunner.run(DistributedScheduler.java:185)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>     at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.kylin.job.exception.ExecuteException: java.lang.IllegalStateException: Invalid input data. Unordered data cannot be split into multi trees
>     at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:134)
>     at org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:64)
>     at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:124)
>     ... 4 more
> Caused by: java.lang.IllegalStateException: Invalid input data. Unordered data cannot be split into multi trees
>     at org.apache.kylin.dict.TrieDictionaryForestBuilder.addValue(TrieDictionaryForestBuilder.java:92)
>     at org.apache.kylin.dict.TrieDictionaryForestBuilder.addValue(TrieDictionaryForestBuilder.java:78)
>     at org.apache.kylin.dict.DictionaryGenerator$StringTrieDictForestBuilder.addValue(DictionaryGenerator.java:212)
>     at org.apache.kylin.dict.DictionaryGenerator.buildDictionary(DictionaryGenerator.java:79)
>     at org.apache.kylin.dict.DictionaryGenerator.buildDictionary(DictionaryGenerator.java:64)
>     at org.apache.kylin.dict.DictionaryGenerator.mergeDictionaries(DictionaryGenerator.java:104)
>     at org.apache.kylin.dict.DictionaryManager.mergeDictionary(DictionaryManager.java:267)
>     at org.apache.kylin.engine.mr.steps.MergeDictionaryStep.mergeDictionaries(MergeDictionaryStep.java:146)
>     at org.apache.kylin.engine.mr.steps.MergeDictionaryStep.makeDictForNewSegment(MergeDictionaryStep.java:136)
>     at org.apache.kylin.engine.mr.steps.MergeDictionaryStep.doWork(MergeDictionaryStep.java:68)
>     at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:124)
>     ... 6 more
> {code}
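
The requirement in the title of KYLIN-2794, that values drawn from several dictionaries come out in sorted order, is a k-way merge of already sorted inputs. The sketch below shows that technique with a priority queue. It is an assumption-laden illustration, not Kylin's enumerator: `SortedValueMerge` is a hypothetical class, inputs are plain `String` lists each already sorted, ordering uses `String.compareTo` (Kylin's dictionaries may order by bytes instead), and dropping duplicate values is a choice made here for illustration.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

final class SortedValueMerge {
    /** The current smallest unread value of one input, plus the rest of that input. */
    private static final class Head implements Comparable<Head> {
        final String value;
        final Iterator<String> rest;

        Head(String value, Iterator<String> rest) {
            this.value = value;
            this.rest = rest;
        }

        @Override
        public int compareTo(Head other) {
            return value.compareTo(other.value);
        }
    }

    /** Merges inputs that are each sorted ascending into one ascending, duplicate-free list. */
    static List<String> merge(List<List<String>> sortedInputs) {
        PriorityQueue<Head> heap = new PriorityQueue<>();
        for (List<String> input : sortedInputs) {
            Iterator<String> it = input.iterator();
            if (it.hasNext()) {
                heap.add(new Head(it.next(), it));
            }
        }
        List<String> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            Head head = heap.poll();
            // Emit the smallest pending value unless it repeats the previous one.
            if (out.isEmpty() || !out.get(out.size() - 1).equals(head.value)) {
                out.add(head.value);
            }
            if (head.rest.hasNext()) {
                heap.add(new Head(head.rest.next(), head.rest));
            }
        }
        return out;
    }
}
```

Simply concatenating the inputs would produce a sequence that is sorted within each dictionary but not overall, which is the kind of unordered input the `TrieDictionaryForestBuilder` in the stack trace rejects.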
[jira] [Commented] (KYLIN-2932) Simplify the thread model for in-memory cubing
[ https://issues.apache.org/jira/browse/KYLIN-2932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16213614#comment-16213614 ]

liyang commented on KYLIN-2932:
-------------------------------

Sounds good!

> Simplify the thread model for in-memory cubing
> ----------------------------------------------
>
>                 Key: KYLIN-2932
>                 URL: https://issues.apache.org/jira/browse/KYLIN-2932
>             Project: Kylin
>          Issue Type: Improvement
>          Components: Job Engine
>            Reporter: Wang Ken
>            Assignee: Wang Ken
>
> The current implementation uses split threads, task threads and the main thread to do the cube building, with complex join and error handling logic.
> The new implementation leverages the ForkJoinPool from the JDK; the event split logic is handled in the main thread. Cuboid tasks and sub-tasks are handled in the fork-join pool, and cube results are collected asynchronously and can be written to output earlier.
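
As a rough illustration of the fork-join structure proposed in KYLIN-2932, the sketch below runs one task per node of a tree and forks a sub-task for each child. Everything specific here is an assumption: `CuboidTreeDemo` and `BuildTask` are hypothetical names, the "cuboid tree" is modeled as a binary tree indexed like a heap, and the task only counts nodes instead of building anything.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

final class CuboidTreeDemo {
    /** Hypothetical task: "builds" one node, forks a sub-task per child, and sums the results. */
    static final class BuildTask extends RecursiveTask<Integer> {
        private final int nodeId;
        private final int nodeCount;

        BuildTask(int nodeId, int nodeCount) {
            this.nodeId = nodeId;
            this.nodeCount = nodeCount;
        }

        @Override
        protected Integer compute() {
            List<BuildTask> children = new ArrayList<>();
            // Children of node n are 2n+1 and 2n+2, when they exist.
            for (int child = 2 * nodeId + 1; child <= 2 * nodeId + 2 && child < nodeCount; child++) {
                BuildTask task = new BuildTask(child, nodeCount);
                task.fork();
                children.add(task);
            }
            int built = 1; // this node
            for (BuildTask task : children) {
                built += task.join();
            }
            return built;
        }
    }

    /** Runs the whole tree (nodeCount >= 1) in a fork-join pool and returns the number of nodes built. */
    static int buildAll(int nodeCount) {
        ForkJoinPool pool = new ForkJoinPool();
        try {
            return pool.invoke(new BuildTask(0, nodeCount));
        } finally {
            pool.shutdown();
        }
    }
}
```

With this shape, joining and exception propagation are handled by the pool (`join()` rethrows a child's failure), which is the kind of simplification the description is after.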
[jira] [Commented] (KYLIN-2903) support cardinality calculation for Hive view
[ https://issues.apache.org/jira/browse/KYLIN-2903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16213625#comment-16213625 ]

liyang commented on KYLIN-2903:
-------------------------------

What's the plan here? Materialize the Hive view first?

> support cardinality calculation for Hive view
> ---------------------------------------------
>
>                 Key: KYLIN-2903
>                 URL: https://issues.apache.org/jira/browse/KYLIN-2903
>             Project: Kylin
>          Issue Type: Improvement
>          Components: Job Engine
>            Reporter: Wang, Gang
>            Assignee: Wang, Gang
>            Priority: Minor
>
> Currently, Kylin leverages HCatalog to calculate column cardinality for Hive tables. However, HCatalog does not actually support Hive views.
[jira] [Updated] (KYLIN-2943) Problems encountered when setting queuename in Kylin
[ https://issues.apache.org/jira/browse/KYLIN-2943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

liyang updated KYLIN-2943:
--------------------------

    Labels: scope  (was: )

> Problems encountered when setting queuename in Kylin
> ----------------------------------------------------
>
>                 Key: KYLIN-2943
>                 URL: https://issues.apache.org/jira/browse/KYLIN-2943
>             Project: Kylin
>          Issue Type: Bug
>          Components: Job Engine
>    Affects Versions: v2.1.0
>         Environment: CDH5.7
>            Reporter: wang20170707
>            Assignee: Dong Li
>              Labels: scope
>
> A new Kylin 2.1 environment was set up on CDH5.7.
> 1. The queue name needs to be changed. The steps taken were as follows.
> In kylin.properties, the queue parameters are configured as:
> kylin.source.kylin.client=beeline
> kylin.engine.mr.config-override.mapreduce.job.queuename=cbasQueue
> kylin.source.hive.config-override.mapreduce.job.queuename=cbasQueue
> In kylin_hive_conf.xml, this is configured:
> mapred.job.queue.name
> In the files
> kylin_job_conf_inmem.xml
> kylin_job_conf.xml
> this is configured:
> mapreduce.job.queuename=cbasQueue
> 2. After changing the configuration and restarting Kylin, the MR job that runs during load hive table fails. Details:
> User: bdcbas1234
> Name: Kylin Hive Column Cardinality Job table=CBAS_DB.S_WEB_APP_PAGE output=hdfs://zhcbdpII/bdcbasApp/kylin/bdcbasApp-kylin_metadata/cardinality/5ae93815-0f53-412e-bb53-07c194c873d7/CBAS_DB.S_WEB_APP_PAGE
> Application Type: MAPREDUCE
> Application Tags:
> State: FAILED
> FinalStatus: FAILED
> Started: Tue Oct 17 16:25:47 +0800 2017
> Elapsed: 8sec
> Tracking URL: History
> Diagnostics:
> Application application_1507516583223_4851 failed 2 times due to AM Container for appattempt_1507516583223_4851_02 exited with exitCode: 1 due to: Exception from container-launch.
> Container id: container_1507516583223_4851_02_01
> Exit code: 1
> Stack trace: ExitCodeException exitCode=1:
> at org.apache.hadoop.util.Shell.runCommand(Shell.java:538)
> at org.apache.hadoop.util.Shell.run(Shell.java:455)
> at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:702)
> at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:290)
> at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:299)
> at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:81)
> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> Shell output: main : command provided 1
> main : user is bdcbas1234
> main : requested yarn user is bdcbas1234
> Container exited with a non-zero exit code 1
> .Failing this attempt.. Failing the application.
> 3. When building the cube (BUILD CUBE), step 2 fails:
> #1 Step Name: Create Intermediate Flat Hive Table
> Succeeded; the MR job used the newly configured queue, cbasQueue
> #2 Step Name: Redistribute Flat Hive Table
> Failed; the MR job select count(*) from ...6d_9935_49f0c9600e38(Stage-1) used the queue from before the change, default
> 4. The parameter kylin.source.kylin.client in kylin.properties was then changed to cli.
> Running load hive table fails with the same error as with beeline
> 5. When building the cube (BUILD CUBE), step 3 fails:
> #1 Step Name: Create Intermediate Flat Hive Table
> Succeeded, same as with beeline
> #2 Step Name: Redistribute Flat Hive Table
> Succeeded; no MR job ran in the background, and the Kylin log printed:
> Row count is 0, no need to redistribute
> #3 Step Name: Extract Fact Table Distinct Columns
> Failed; the log is as follows:
> User: bdcbas1234
> Name: Kylin_Fact_Distinct_Columns_w_cub_source_Step
> Application Type: MAPREDUCE
> Application Tags:
> State: FAILED
> FinalStatus: FAILED
> Started: Tue Oct 17 13:52:40 +0800 2017
> Elapsed: 26sec
> Tracking URL: History
> Diagnostics:
> Application application_1507516583223_4639 failed 2 times due to AM Container for appattempt_1507516583223_4639_02 exited with exitCode: 1 due to: Exception from container-launch.
> Container id: container_1507516583223_4639_02_01
> Exit code: 1
> Stack trace: ExitCodeException exitCode=1:
> at org.apache.hadoop.util.Shell.runCommand(Shell.java:538)
> at org.apache.hadoop.util.Shell.run(Shell.java:455)
> at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:702)
> at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:290)
> at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:299)
> at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:81)
> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> Shell output: main : command provided 1
> main : user is bdcbas1234
> main : reque
[jira] [Commented] (KYLIN-2764) Build the dict for UHC column with MR
[ https://issues.apache.org/jira/browse/KYLIN-2764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16213631#comment-16213631 ]

liyang commented on KYLIN-2764:
-------------------------------

Understood that this is a 2-level reducer pattern, to control the total records one reducer receives and improve performance. Let's move on with this patch.

> Build the dict for UHC column with MR
> -------------------------------------
>
>                 Key: KYLIN-2764
>                 URL: https://issues.apache.org/jira/browse/KYLIN-2764
>             Project: Kylin
>          Issue Type: Improvement
>          Components: Job Engine
>    Affects Versions: v2.0.0
>            Reporter: kangkaisen
>            Assignee: kangkaisen
>         Attachments: job-memory-after.png, job-memory-before.png
>
> KYLIN-2217 builds the dict for normal columns with MR, but the dict for UHC columns is still built in the JobServer. Like KYLIN-2217, we could also use MR to build the dict for UHC columns, which would thoroughly relieve the memory pressure on the JobServer, improve its job concurrency, and speed up the procedure when there are multiple UHC columns.
> The MR input is the output of "Extract Fact Table Distinct Columns"; the MR output is the UHC column dict. Because it is very hard to build a global dict with multiple reducers, I use one reducer to handle one UHC column and allocate enough memory to the reducer. According to my test, 8G of memory is enough.
[jira] [Commented] (KYLIN-2944) HLLCSerializer, RawSerializer, PercentileSerializer returns shared object in serialize()
[ https://issues.apache.org/jira/browse/KYLIN-2944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16213638#comment-16213638 ]

liyang commented on KYLIN-2944:
-------------------------------

Hold on a bit... by creating new objects, the increased may cause performance decrease in many places. Have we tested the performance difference at least?

> HLLCSerializer, RawSerializer, PercentileSerializer returns shared object in serialize()
> ----------------------------------------------------------------------------------------
>
>                 Key: KYLIN-2944
>                 URL: https://issues.apache.org/jira/browse/KYLIN-2944
>             Project: Kylin
>          Issue Type: Bug
>            Reporter: Shaofeng SHI
>            Assignee: Shaofeng SHI
>             Fix For: v2.2.0
>
> This is a bug that causes incorrect query results. See more in KYLIN-2926
[jira] [Commented] (KYLIN-2944) HLLCSerializer, RawSerializer, PercentileSerializer returns shared object in serialize()
[ https://issues.apache.org/jira/browse/KYLIN-2944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16213640#comment-16213640 ]

liyang commented on KYLIN-2944:
-------------------------------

If the consumer wants new objects, the consumer can always clone the measure objects by itself.
[jira] [Comment Edited] (KYLIN-2944) HLLCSerializer, RawSerializer, PercentileSerializer returns shared object in serialize()
[ https://issues.apache.org/jira/browse/KYLIN-2944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16213638#comment-16213638 ]

liyang edited comment on KYLIN-2944 at 10/21/17 1:32 AM:
---------------------------------------------------------

Hold on a bit... by creating new objects, the increased GC may impact performance at all places where the serializers are used. Have we tested the performance difference at least?

was (Author: liyang.g...@gmail.com):
Hold on a bit... by creating new objects, the increased may cause performance decrease in many places. Have we tested the performance difference at least?
[jira] [Comment Edited] (KYLIN-2944) HLLCSerializer, RawSerializer, PercentileSerializer returns shared object in serialize()
[ https://issues.apache.org/jira/browse/KYLIN-2944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16213638#comment-16213638 ]

liyang edited comment on KYLIN-2944 at 10/21/17 1:34 AM:
---------------------------------------------------------

Hold on a bit... by creating new objects, the increased GC may impact performance at all places where the serializers are used. Have we tested the performance difference at least?

I recall a benchmark where reusing ArrayList vs new ArrayList on every record yields 5% performance difference.

was (Author: liyang.g...@gmail.com):
Hold on a bit... by creating new objects, the increased GC may impact performance at all places where the serializers are used. Have we tested the performance difference at least?
[jira] [Commented] (KYLIN-2926) DumpMerger return incorrect results
[ https://issues.apache.org/jira/browse/KYLIN-2926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16213750#comment-16213750 ]

liyang commented on KYLIN-2926:
-------------------------------

As mentioned in KYLIN-2944, I do have a concern here. The sharing of objects is for performance's sake, the same reason Mapper and Reducer reuse objects in their interfaces.

I'm doing a quick benchmark to see the performance difference.

> DumpMerger return incorrect results
> -----------------------------------
>
>                 Key: KYLIN-2926
>                 URL: https://issues.apache.org/jira/browse/KYLIN-2926
>             Project: Kylin
>          Issue Type: Bug
>    Affects Versions: v2.0.0
>            Reporter: fengYu
>            Assignee: fengYu
>         Attachments: 0001-KYLIN-2926-DumpMerger-return-incorrect-results-creat.patch
>
> In our scenario, a cube query returns a wrong result once the coprocessor needs to spill to disk. Our version is 2.0.0, and I found the root cause is in DumpMerger.enqueueFromDump:
> DataTypeSerializer uses a ThreadLocal variable 'current', which leads to different elements in dumpCurrentValues sharing the same object, so the next fill of measure values changes the existing values.
> The incorrect measures are HLLC and raw, which use the current variable in deserialize.
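
The aliasing problem described in KYLIN-2926 can be reproduced in miniature. The sketch below is not Kylin code: `SharedObjectDemo`, `Counter`, and both deserializer classes are hypothetical, and a `long` stands in for a serialized measure. It only shows why holding on to values returned from a deserializer that reuses one per-thread object gives wrong results, and how returning a fresh object avoids it.

```java
import java.util.ArrayList;
import java.util.List;

final class SharedObjectDemo {
    /** Hypothetical mutable measure value. */
    static final class Counter {
        long value;
    }

    /** Reuses one per-thread instance, similar in spirit to the 'current' variable in the report. */
    static final class ReusingDeserializer {
        private final ThreadLocal<Counter> current = ThreadLocal.withInitial(Counter::new);

        Counter deserialize(long raw) {
            Counter c = current.get();
            c.value = raw; // overwrites whatever an earlier caller still holds
            return c;
        }
    }

    /** Returns a fresh object per call, so earlier results stay intact. */
    static final class CopyingDeserializer {
        Counter deserialize(long raw) {
            Counter c = new Counter();
            c.value = raw;
            return c;
        }
    }

    /** Keeps every returned object, then reads them back: all entries alias one object. */
    static List<Long> collectReusing(long... raws) {
        ReusingDeserializer d = new ReusingDeserializer();
        List<Counter> held = new ArrayList<>();
        for (long raw : raws) {
            held.add(d.deserialize(raw));
        }
        List<Long> out = new ArrayList<>();
        for (Counter c : held) {
            out.add(c.value);
        }
        return out;
    }

    /** Same as above, but with a deserializer that allocates per call. */
    static List<Long> collectCopying(long... raws) {
        CopyingDeserializer d = new CopyingDeserializer();
        List<Counter> held = new ArrayList<>();
        for (long raw : raws) {
            held.add(d.deserialize(raw));
        }
        List<Long> out = new ArrayList<>();
        for (Counter c : held) {
            out.add(c.value);
        }
        return out;
    }
}
```

For inputs 1, 2, 3, `collectReusing` yields [3, 3, 3] while `collectCopying` yields [1, 2, 3]. The alternative raised in the KYLIN-2944 thread, having the consumer clone the objects it keeps, would address the same aliasing from the caller's side.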
[jira] [Commented] (KYLIN-2944) HLLCSerializer, RawSerializer, PercentileSerializer returns shared object in serialize()
[ https://issues.apache.org/jira/browse/KYLIN-2944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16213766#comment-16213766 ]

liyang commented on KYLIN-2944:
-------------------------------

Added a performance test case, and the result shows that, for HLLCSerializer, returning a shared object or not DOESN'T impact performance. I have no concern now.
[jira] [Commented] (KYLIN-2926) DumpMerger return incorrect results
[ https://issues.apache.org/jira/browse/KYLIN-2926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16213767#comment-16213767 ]

liyang commented on KYLIN-2926:
-------------------------------

Added a performance test case, and the result shows that, for HLLCSerializer, returning a shared object or not DOESN'T impact performance. I have no concern now.