A problem with cube build
Data like:

rowkey  cate1  cate2  cate3  cate4
11      1      2      3      4
11      1      2      3      null
11      1      2      null   null

Dimensions: cate1, cate2, cate3, cate4
Distinct count / global dictionary: rowkey

When building the cube, will these 3 lines of data be built into 3 lines or 1 line?

wang...@snqu.com
Re: Re: kylin 1.6 supports SQL right join
Execute SQL:

(select cate1 from kylinlabel.USER_TAG group by cate1) as t5
left join
(select t3.CATE1, count(*) from
  (select CATE1, CATE2 from kylinlabel.USER_TAG group by CATE1, CATE2) as t3
 group by t3.CATE1) as t4
on (t4.cate1 = t5.cate1)

Get error:

Encountered "as" at line 1, column 56. Was expecting one of:
"ORDER" ... "LIMIT" ... "OFFSET" ... "FETCH" ... "UNION" ... "INTERSECT" ... "EXCEPT" ... "NOT" ... "IN" ... "BETWEEN" ... "LIKE" ... "SIMILAR" ... "=" ... ">" ... "<" ... "<=" ... ">=" ... "<>" ... "+" ... "-" ... "*" ... "/" ... "||" ... "AND" ... "OR" ... "IS" ... "MEMBER" ... "SUBMULTISET" ... "MULTISET" ... "[" ... "OVERLAPS" ... "YEAR" ... "MONTH" ... "DAY" ... "HOUR" ... "MINUTE" ... "SECOND" ...

wang...@snqu.com

From: Billy Liu
Date: 2016-12-21 12:19
To: dev
Subject: Re: kylin 1.6 supports SQL right join

Kylin supports left join and inner join. The right join could be rewritten into a left join. Could you have a try?

2016-12-21 11:52 GMT+08:00 wang...@snqu.com:
> Hi,
> When I execute the SQL:
>
> (select t3.CATE1, count(*) from (select CATE1, CATE2 from
> kylinlabel.USER_TAG group by CATE1, CATE2) as t3 group by t3.CATE1) as t4
> right join
> (select cate1 from kylinlabel.USER_TAG group by cate1) as t5 on
> (t4.cate1=t5.cate1)
>
> I got the error:
> Encountered "as" at line 1, column 129. Was expecting one of:
> "ORDER" ... "LIMIT" ... "OFFSET" ... "FETCH" ... "UNION" ... "INTERSECT"
> ... "EXCEPT" ... "NOT" ... "IN" ... "BETWEEN" ... "LIKE" ... "SIMILAR" ...
> "=" ... ">" ... "<" ... "<=" ... ">=" ... "<>" ... "+" ... "-" ... "*" ...
> "/" ... "||" ... "AND" ... "OR" ... "IS" ... "MEMBER" ... "SUBMULTISET" ...
> "MULTISET" ... "[" ... "OVERLAPS" ... "YEAR" ... "MONTH" ... "DAY" ...
> "HOUR" ... "MINUTE" ... "SECOND" ...
>
> I think the SQL is right but I can't get the result. What can I do?
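Billy Liu's suggestion above, rewriting "A right join B" as "B left join A", can be sketched on a toy table. The sketch below uses SQLite from Python rather than Kylin, with made-up data; note that the two subqueries are wrapped in an outer SELECT ... FROM, since the bare "(subquery) as t5 left join ..." form appears to be what the Encountered "as" parser error is rejecting.

```python
import sqlite3

# Toy stand-in for kylinlabel.USER_TAG; the data is invented for illustration.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE USER_TAG (CATE1 TEXT, CATE2 TEXT);
    INSERT INTO USER_TAG VALUES ('a','x'), ('a','y'), ('b','x');
""")

# t4 RIGHT JOIN t5  ==  t5 LEFT JOIN t4, wrapped in an outer SELECT ... FROM.
rows = con.execute("""
    SELECT t5.cate1, t4.cnt
    FROM (SELECT cate1 FROM USER_TAG GROUP BY cate1) AS t5
    LEFT JOIN (SELECT t3.CATE1 AS cate1, COUNT(*) AS cnt
               FROM (SELECT CATE1, CATE2 FROM USER_TAG
                     GROUP BY CATE1, CATE2) AS t3
               GROUP BY t3.CATE1) AS t4
    ON t4.cate1 = t5.cate1
    ORDER BY t5.cate1
""").fetchall()
print(rows)  # [('a', 2), ('b', 1)]
```

The same swap-and-wrap shape should carry over to the Kylin query, though whether Calcite in Kylin 1.6 accepts it is not verified here.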
kylin 1.6 supports SQL right join
Hi,
When I execute the SQL:

(select t3.CATE1, count(*) from
  (select CATE1, CATE2 from kylinlabel.USER_TAG group by CATE1, CATE2) as t3
 group by t3.CATE1) as t4
right join
(select cate1 from kylinlabel.USER_TAG group by cate1) as t5
on (t4.cate1 = t5.cate1)

I got the error:

Encountered "as" at line 1, column 129. Was expecting one of:
"ORDER" ... "LIMIT" ... "OFFSET" ... "FETCH" ... "UNION" ... "INTERSECT" ... "EXCEPT" ... "NOT" ... "IN" ... "BETWEEN" ... "LIKE" ... "SIMILAR" ... "=" ... ">" ... "<" ... "<=" ... ">=" ... "<>" ... "+" ... "-" ... "*" ... "/" ... "||" ... "AND" ... "OR" ... "IS" ... "MEMBER" ... "SUBMULTISET" ... "MULTISET" ... "[" ... "OVERLAPS" ... "YEAR" ... "MONTH" ... "DAY" ... "HOUR" ... "MINUTE" ... "SECOND" ...

I think the SQL is right but I can't get the result. What can I do?
kylin 1.6.0: cardinality can't be greater than 5000000?
I upgraded from 1.5.4.1 to 1.6.0, modified KYLIN_HOME, and changed "kylin.dictionary.max.cardinality=500" to "kylin.dictionary.max.cardinality=3000" in kylin.properties. Then: start kylin 1.6 --> create model --> create cube --> build cube.

I got the following error message:

java.lang.RuntimeException: Failed to create dictionary on DEFAULT.TEST_500W_TBL.ROWKEY
    at org.apache.kylin.dict.DictionaryManager.buildDictionary(DictionaryManager.java:325)
    at org.apache.kylin.cube.CubeManager.buildDictionary(CubeManager.java:222)
    at org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:50)
    at org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:41)
    at org.apache.kylin.engine.mr.steps.CreateDictionaryJob.run(CreateDictionaryJob.java:54)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
    at org.apache.kylin.engine.mr.common.HadoopShellExecutable.doWork(HadoopShellExecutable.java:63)
    at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:113)
    at org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:57)
    at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:113)
    at org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:136)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.IllegalArgumentException: Too high cardinality is not suitable for dictionary -- cardinality: 5359970
    at org.apache.kylin.dict.DictionaryGenerator.buildDictionary(DictionaryGenerator.java:96)
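Judging from the "Caused by" line, the dictionary builder rejects the ROWKEY column because its cardinality (5,359,970) exceeds the configured ceiling; note that the value set above (3000) is still far below the reported cardinality. Assuming kylin.dictionary.max.cardinality is an absolute distinct-value count, the setting would need to exceed the observed value, for example:

```properties
# conf/kylin.properties -- illustrative only: raise the dictionary
# cardinality ceiling above the observed 5,359,970 distinct values
kylin.dictionary.max.cardinality=6000000
```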
Re: Re: Error using the global dictionary: AppendTrieDictionary can't retrive value from id
Thank you for your answer; I will try my best to ask questions in English.

From: Billy Liu
Date: 2016-12-08 09:15
To: dev
Subject: Re: Re: Error using the global dictionary: AppendTrieDictionary can't retrive value from id

Using GlobalDictionaryBuilder has some benefits, but also some limitations. It is very useful for precise count distinct with a bitmap dictionary, but it cannot preserve the value order. That means it supports the equals operator, but not the greater-than or less-than operators.

Back to your case: you should define precise count distinct on the Cube measure settings page (the Return Type is Precise -- bitmap, actually -- not the hllc12 in your JSON definition). Since you are doing distinct count on column "NAME", please define NAME for the global dictionary. Currently you have defined "ROWKEY", "SEX", "LOCAL" and "JOB", but not "NAME". Then it should work.

By the way, you have many ultra-high-cardinality dimensions, and you have put all of them in as normal dimensions. That will cost too many resources for building and index storage. If you care about a TopN requirement, you could also define a TopN measure, ordered by SUM(1); it will be more efficient for "select label, count(label) from USERCASE_20161204 group by label order by label desc".

If you could, please use English as well. This is a worldwide community, and English is the better language for everyone.
2016-12-06 13:24 GMT+08:00 wang...@snqu.com:
> Hi Roger,
> The detailed cube definition is as follows:
> {
>   "uuid": "9a27b749-1989-482a-a3bb-6f4fe1f8f3a0",
>   "last_modified": 1480931382552,
>   "version": "1.6.0",
>   "name": "dmp_cube_590w",
>   "model_name": "dmp_model_590w",
>   "description": "",
>   "null_string": null,
>   "dimensions": [
>     { "name": "ROWKEY", "table": "DEFAULT.USERCASE_20161204", "column": "ROWKEY", "derived": null },
>     { "name": "TIMESTAMP", "table": "DEFAULT.USERCASE_20161204", "column": "TIMESTAMP", "derived": null },
>     { "name": "NAME", "table": "DEFAULT.USERCASE_20161204", "column": "NAME", "derived": null },
>     { "name": "SEX", "table": "DEFAULT.USERCASE_20161204", "column": "SEX", "derived": null },
>     { "name": "LOCAL", "table": "DEFAULT.USERCASE_20161204", "column": "LOCAL", "derived": null },
>     { "name": "JOB", "table": "DEFAULT.USERCASE_20161204", "column": "JOB", "derived": null },
>     { "name": "LABEL", "table": "DEFAULT.USERCASE_20161204", "column": "LABEL", "derived": null }
>   ],
>   "measures": [
>     { "name": "_COUNT_", "function": { "expression": "COUNT", "parameter": { "type": "constant", "value": "1", "next_parameter": null }, "returntype": "bigint" }, "dependent_measure_ref": null },
>     { "name": "CD_NAME", "function": { "expression": "COUNT_DISTINCT", "parameter": { "type": "column", "value": "NAME", "next_parameter": null }, "returntype": "hllc12" }, "dependent_measure_ref": null }
>   ],
>   "dictionaries": [
>     { "column": "ROWKEY", "builder": "org.apache.kylin.dict.GlobalDictionaryBuilder" },
>     { "column": "SEX", "builder": "org.apache.kylin.dict.GlobalDictionaryBuilder" },
>     { "column": "LOCAL", "builder": "org.apache
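Per Billy Liu's reply above, a sketch of how the two JSON fragments it asks for might look, based on the cube definition quoted in this thread (illustrative, not a verified Kylin 1.6.0 schema): the COUNT_DISTINCT measure with the precise bitmap return type instead of "hllc12", plus a global-dictionary entry for "NAME".

```json
{
  "measures": [
    {
      "name": "CD_NAME",
      "function": {
        "expression": "COUNT_DISTINCT",
        "parameter": { "type": "column", "value": "NAME", "next_parameter": null },
        "returntype": "bitmap"
      },
      "dependent_measure_ref": null
    }
  ],
  "dictionaries": [
    { "column": "NAME", "builder": "org.apache.kylin.dict.GlobalDictionaryBuilder" }
  ]
}
```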
Re: Re: Error using the global dictionary: AppendTrieDictionary can't retrive value from id
Hi Roger,
The detailed cube definition is as follows:
{
  "uuid": "9a27b749-1989-482a-a3bb-6f4fe1f8f3a0",
  "last_modified": 1480931382552,
  "version": "1.6.0",
  "name": "dmp_cube_590w",
  "model_name": "dmp_model_590w",
  "description": "",
  "null_string": null,
  "dimensions": [
    { "name": "ROWKEY", "table": "DEFAULT.USERCASE_20161204", "column": "ROWKEY", "derived": null },
    { "name": "TIMESTAMP", "table": "DEFAULT.USERCASE_20161204", "column": "TIMESTAMP", "derived": null },
    { "name": "NAME", "table": "DEFAULT.USERCASE_20161204", "column": "NAME", "derived": null },
    { "name": "SEX", "table": "DEFAULT.USERCASE_20161204", "column": "SEX", "derived": null },
    { "name": "LOCAL", "table": "DEFAULT.USERCASE_20161204", "column": "LOCAL", "derived": null },
    { "name": "JOB", "table": "DEFAULT.USERCASE_20161204", "column": "JOB", "derived": null },
    { "name": "LABEL", "table": "DEFAULT.USERCASE_20161204", "column": "LABEL", "derived": null }
  ],
  "measures": [
    { "name": "_COUNT_", "function": { "expression": "COUNT", "parameter": { "type": "constant", "value": "1", "next_parameter": null }, "returntype": "bigint" }, "dependent_measure_ref": null },
    { "name": "CD_NAME", "function": { "expression": "COUNT_DISTINCT", "parameter": { "type": "column", "value": "NAME", "next_parameter": null }, "returntype": "hllc12" }, "dependent_measure_ref": null }
  ],
  "dictionaries": [
    { "column": "ROWKEY", "builder": "org.apache.kylin.dict.GlobalDictionaryBuilder" },
    { "column": "SEX", "builder": "org.apache.kylin.dict.GlobalDictionaryBuilder" },
    { "column": "LOCAL", "builder": "org.apache.kylin.dict.GlobalDictionaryBuilder" },
    { "column": "JOB", "builder": "org.apache.kylin.dict.GlobalDictionaryBuilder" }
  ],
  "rowkey": {
    "rowkey_columns": [
      { "column": "ROWKEY", "encoding": "dict", "isShardBy": true },
      { "column": "NAME", "encoding": "dict", "isShardBy": false },
      { "column": "TIMESTAMP", "encoding": "dict", "isShardBy": false },
      { "column": "SEX", "encoding": "dict", "isShardBy": false },
      { "column": "LOCAL", "encoding": "dict", "isShardBy": false },
      { "column": "JOB", "encoding": "dict", "isShardBy": false },
      { "column": "LABEL", "encoding": "dict", "isShardBy": false }
    ]
  },
  "hbase_mapping": {
    "column_family": [
      { "name": "F1", "columns": [ { "qualifier": "M", "measure_refs": [ "_COUNT_" ] } ] },
      { "name": "F2", "columns": [ { "qualifier": "M", "measure_refs": [ "CD_NAME" ] } ] }
    ]
  },
  "aggregation_gr
Re: Error using the global dictionary: AppendTrieDictionary can't retrive value from id
I checked the kylin.properties file and changed kylin.dictionary.max.cardinality=500 to kylin.dictionary.max.cardinality=2000; I also modified the cube to add the rowkey, and the rebuild succeeded.

But when querying, the following two statements succeed:

select label, count(label) from USERCASE_20161204 group by label order by label desc
select name, count(name) from USERCASE_20161204 group by name order by name desc

while the following two statements:

select rowkey, count(rowkey) from USERCASE_20161204 group by rowkey order by rowkey desc
select job, count(job) from USERCASE_20161204 group by job order by job desc

fail when executed:

Error while executing SQL "select rowkey, count(rowkey) from USERCASE_20161204 group by rowkey order by rowkey desc LIMIT 5": AppendTrieDictionary can't retrive value from id
Error while executing SQL "select job, count(job) from USERCASE_20161204 group by job order by job desc LIMIT 5": AppendTrieDictionary can't retrive value from id

The cube definition is as follows:
{
  "uuid": "d4671695-96a1-4981-bb4c-2263de45f2ee",
  "last_modified": 1480939514479,
  "version": "1.6.0",
  "name": "dmp_cube_590w",
  "owner": "ADMIN",
  "descriptor": "dmp_cube_590w",
  "cost": 50,
  "status": "READY",
  "segments": [ {
    "uuid": "4237b09b-8d2e-4c5c-be19-afc67e6524f5",
    "name": "1970010100_2016120500",
    "storage_location_identifier": "KYLIN_IE2V4DQUY4",
    "date_range_start": 0,
    "date_range_end": 148089600,
    "source_offset_start": 0,
    "source_offset_end": 0,
    "status": "READY",
    "size_kb": 8180652,
    "input_records": 5978388,
    "input_records_size": 666419108,
    "last_build_time": 1480939514367,
    "last_build_job_id": "d563a6b8-c6cd-41c7-93c4-47bb319bf21b",
    "create_time_utc": 1480931400144,
    "cuboid_shard_nums": { "1": 2, "2": 2, "3": 3, "4": 2, "5": 3, "6": 3, "7": 4, "8": 2, "9": 3, "10": 3, "11": 4, "12": 3, "13": 4, "14": 4, "15": 5, "32": 2, "33": 3, "34": 3, "35": 4, "36": 3, "37": 4, "38": 4, "39": 5, "40": 3, "41": 4, "42": 4, "43": 5, "44": 4, "45": 5, "46": 5, "47": 6, "64": 6, "65": 6, "66": 6, "67": 6, "68": 6, "69": 6, "70": 6, "71": 6, "72": 6, "73": 6, "74": 6, "75": 6, "76": 6, "77": 6, "78": 6, "79": 6, "96": 6, "97": 6, "98": 6, "99": 6, "100": 6, "101": 6, "102": 6, "103": 6, "104": 6, "105": 6, "106": 6, "107": 6, "108": 6, "109": 6, "110": 6, "111": 6, "127": 6 },
    "total_shards": 11,
    "blackout_cuboids": [],
    "binary_signature": null,
    "dictionaries": {
      "DEFAULT.USERCASE_20161204/SEX": "/dict/DEFAULT.USERCASE_20161204/SEX/17d36c0b-e7a7-4bb4-941f-47bc78a24751.dict",
      "DEFAULT.USERCASE_20161204/TIMESTAMP": "/dict/DEFAULT.USERCASE_20161204/TIMESTAMP/87ce791b-3de3-491f-901f-d28721a25e94.dict",
      "DEFAULT.USERCASE_20161204/NAME": "/dict/DEFAULT.USERCASE_20161204/NAME/73a59cfb-eaa5-4531-ba7e-16ba2adeaea9.dict",
      "DEFAULT.USERCASE_20161204/LABEL": "/dict/DEFAULT.USERCASE_20161204/LABEL/71c633ee-dffb-4d80-9844-768b6ee21782.dict",
      "DEFAULT.USERCASE_20161204/LOCAL": "/dict/DEFAULT.USERCASE_20161204/LOCAL/31ed5b68-aae2-40b7-ba09-83abf1d64953.dict",
      "DEFAULT.USERCASE_20161204/ROWKEY": "/dict/DEFAULT.USERCASE_20161204/ROWKEY/736822fd-5103-4814-bfcd-b6af80609970.dict",
      "DEFAULT.USERCASE_20161204/JOB": "/dict/DEFAULT.USERCASE_20161204/JOB/a47cc0f8-80ab-46fa-953a-59a326412395.dict"
    },
    "snapshots": null,
    "index_path": "/kylin/kylin_metadata/kylin-d563a6b8-c6cd-41c7-93c4-47bb319bf21b/dmp_cube_590w/secondary_index/",
    "rowkey_stats": [
      [ "ROWKEY", 5978389, 4 ],
      [ "NAME", 1195682, 3 ],
      [ "TIMESTAMP", 1, 1 ],
      [ "SEX", 1195680, 4 ],
      [ "LOCAL", 1195679, 4 ],
      [ "JOB", 1195679, 4 ],
      [ "LABEL", 1195676, 3 ]
    ]
  } ],
  "create_time_utc": 1480907805715,
  "size_kb": 8180652,
  "input_records_count": 5978388,
  "input_records_size": 666419108
}

From: wang...@snqu.com
Date: 2016-12-05 15:11
To: dev
Subject: Error using the global dictionary: AppendTrieDictionary can't retrive value from id

Hi,
Each dimension has a cardinality of about 5.9 million. When dict encoding is selected in the rowkey, the build fails with:
"Too high cardinality is not suitable for dictionary -- cardinality: 5978388"
So I modified the model: no rowkey encoding was defined, and a global dictionary was defined for all dimensions. The build succeeded, but queries fail with:
"AppendTrieDictionary can't retrive value from id"
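The pattern in the queries above -- label and name can be grouped and returned, while rowkey and job (the columns defined with GlobalDictionaryBuilder in this cube) fail with "AppendTrieDictionary can't retrive value from id" -- fits a dictionary that only encodes in one direction. A toy one-way dictionary illustrating that asymmetry (purely illustrative; not Kylin's actual AppendTrieDictionary):

```python
class OneWayDictionary:
    """Append-style dictionary: value -> id only, with no reverse lookup."""

    def __init__(self):
        self._ids = {}

    def get_id(self, value):
        # Append-only: an unseen value receives the next id, and a
        # value's id never changes once assigned.
        return self._ids.setdefault(value, len(self._ids))

    def get_value(self, i):
        # No id -> value table is kept, so decoding is impossible --
        # analogous to the query-time error in this thread.
        raise ValueError("can't retrieve value from id")

d = OneWayDictionary()
assert d.get_id("u001") == 0
assert d.get_id("u002") == 1
assert d.get_id("u001") == 0  # stable on re-encode
```

Encoding like this is enough for precise count-distinct (each value maps to a stable bit position), but any query that must print the column values back out has nothing to decode with.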
Error using the global dictionary: AppendTrieDictionary can't retrive value from id
Hi,
Each dimension has a cardinality of about 5.9 million. When dict encoding is selected in the rowkey, the build fails with:
"Too high cardinality is not suitable for dictionary -- cardinality: 5978388"
So I modified the model: no rowkey encoding was defined, and a global dictionary was defined for all dimensions. The build succeeded, but queries fail with:
"AppendTrieDictionary can't retrive value from id"
Re: Re: merge
Thanks! I've found the reason. The MERGE END SEGMENT 2016112000-2016113000 is empty. When the MERGE END SEGMENT was built, there were some rows with date 20161130 in the table, but the segment is built from rows in [20161120, 20161130), so the MERGE END SEGMENT is empty.

wang...@snqu.com

From: ShaoFeng Shi
Date: 2016-12-01 10:19
To: dev
Subject: Re: merge

This warning is to remind the user that there is one segment which has 0 records; we also call it an "empty" segment. Empty segments are allowed in Kylin by default, because it is possible that for a given time period there are no records in the source table (especially in the streaming + tiny time window case). But an empty segment might not be expected; it may indicate something went wrong in the upstream workflow. The user needs to investigate and then refresh the segment.

If you want Kylin to fail the build job when there are 0 records, set "kylin.job.allow.empty.segment=false" in conf/kylin.properties.

After merging the empty segment with another segment, you will not be able to independently refresh that small time period; you have to refresh the whole merged time period, which will cost more resources. That is why Kylin reminds the user here. If you think the empty segment is okay, you can forcibly merge them.
2016-11-30 16:02 GMT+08:00 wang...@snqu.com:
> hi,
> the merge information is:
>
> PARTITION DATE COLUMN  DEFAULT.DMP_USER.TIMESTAMP
> MERGE START SEGMENT    2016111000-2016112000
> MERGE END SEGMENT      2016112000-2016113000
> START SEGMENT DETAIL   Start Date(Include)  2016-11-10 00:00:00
>                        End Date(Exclude)    2016-11-20 00:00:00
>                        Last build Time      2016-11-30 14:18:49 GMT+8
>                        Last build ID        b89bfb86-0395-4d55-8f8f-d25df3f59fdc
> END SEGMENT DETAIL     Start Date(Include)  2016-11-20 00:00:00
>                        End Date(Exclude)    2016-11-30 00:00:00
>                        Last build Time      2016-11-30 14:24:32 GMT+8
>                        Last build ID        26a55d70-9872-42de-8d93-96949a342729
>
> wang...@snqu.com
>
> From: wang...@snqu.com
> Date: 2016-11-30 15:28
> To: dev
> Subject: merge
> Hi,
> I have a problem with merge.
> When I do a merge, the information is the following:
> the two segments are built from the same cube and the same table; after
> clicking "Submit" I got
> "empty cube segment found:[2016112000_2016113000, do you want to
> merge segments forcely?"
>
> wang...@snqu.com

-- Best regards, Shaofeng Shi 史少锋
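The explanation in this thread hinges on segment ranges being half-open, matching the "Start Date(Include) / End Date(Exclude)" fields shown in the merge dialog: a segment covering [20161120, 20161130) excludes rows dated exactly 20161130. A minimal sketch of that boundary effect (the dates and row counts are illustrative):

```python
from datetime import date

# All source rows fall exactly on the segment's end date.
rows = [date(2016, 11, 30), date(2016, 11, 30)]

# Half-open segment range [2016-11-20, 2016-11-30): the end date is excluded.
start, end = date(2016, 11, 20), date(2016, 11, 30)
in_segment = [d for d in rows if start <= d < end]

print(len(in_segment))  # 0 -> the segment comes out "empty"
```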
Re: merge
hi,
the merge information is:

PARTITION DATE COLUMN  DEFAULT.DMP_USER.TIMESTAMP
MERGE START SEGMENT    2016111000-2016112000
MERGE END SEGMENT      2016112000-2016113000
START SEGMENT DETAIL   Start Date(Include)  2016-11-10 00:00:00
                       End Date(Exclude)    2016-11-20 00:00:00
                       Last build Time      2016-11-30 14:18:49 GMT+8
                       Last build ID        b89bfb86-0395-4d55-8f8f-d25df3f59fdc
END SEGMENT DETAIL     Start Date(Include)  2016-11-20 00:00:00
                       End Date(Exclude)    2016-11-30 00:00:00
                       Last build Time      2016-11-30 14:24:32 GMT+8
                       Last build ID        26a55d70-9872-42de-8d93-96949a342729

wang...@snqu.com

From: wang...@snqu.com
Date: 2016-11-30 15:28
To: dev
Subject: merge

Hi,
I have a problem with merge. When I do a merge, the information is the following: the two segments are built from the same cube and the same table. After clicking "Submit" I got:
"empty cube segment found:[2016112000_2016113000, do you want to merge segments forcely?"

wang...@snqu.com
merge
Hi,
I have a problem with merge. When I do a merge, the information is the following: the two segments are built from the same cube and the same table. After clicking "Submit" I got:
"empty cube segment found:[2016112000_2016113000, do you want to merge segments forcely?"

wang...@snqu.com
use RESTful API to create cube and model
Can the RESTful API be used to create a cube and a model? In the RESTful API user manual, there is no API for creating a cube or a model. <http://kylin.apache.org/docs15/howto/howto_use_restapi.html>

wang...@snqu.com
Does kylin support complex data types?
Does Kylin 1.5.4 support complex data types (map, array, struct) in Hive?

wang...@snqu.com