Thank you for your answer. I will do my best to ask questions in English.

From: Billy Liu
Sent: 2016-12-08 09:15
To: dev
Subject: Re: Re: Error when using global dictionary: AppendTrieDictionary can't retrive value from id
GlobalDictionaryBuilder has some benefits, but also some limitations. It
is very useful for precise count distinct backed by a bitmap, but it
cannot preserve the value order. That means it supports the equals
operator, but not the greater-than or less-than operators.
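
To illustrate the ordering limitation, here is a toy Python sketch. It is not Kylin's AppendTrieDictionary implementation; it only assumes, as a simplification, that an append-only dictionary hands out the next free ID to each new value in arrival order. The function names and sample values are hypothetical.

```python
# Toy model only: this is NOT Kylin's AppendTrieDictionary implementation.
# It illustrates why IDs handed out in arrival order can serve equality
# filters but not range (greater/less than) filters.

def build_sorted_dict(values):
    """Order-preserving dictionary: IDs follow the sorted order of values."""
    return {v: i for i, v in enumerate(sorted(set(values)))}

def build_append_dict(batches):
    """Append-only dictionary: each unseen value gets the next free ID."""
    ids = {}
    for batch in batches:
        for v in batch:
            if v not in ids:
                ids[v] = len(ids)
    return ids

def order_preserved(ids):
    """True if comparing IDs gives the same answer as comparing values."""
    keys = sorted(ids)
    return all(ids[a] < ids[b] for a, b in zip(keys, keys[1:]))

# Hypothetical values arriving in two separate builds.
batches = [["carol", "alice"], ["bob"]]
sorted_ids = build_sorted_dict(v for b in batches for v in b)
append_ids = build_append_dict(batches)

# Equality still works: every value maps to exactly one distinct ID.
assert len(set(append_ids.values())) == len(append_ids)

print(order_preserved(sorted_ids))  # True
print(order_preserved(append_ids))  # False
```

In the append-only case "carol" receives a smaller ID than "alice", so a comparison on IDs no longer matches a comparison on values.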
 
Back to your case: you should define a precise count in the Cube measure
settings page (the Return Type should be Precise, which is actually bitmap,
not hllc12 as in your JSON definition).
 
Since you are doing a distinct count on column "NAME", please define NAME
for the global dictionary. Currently, you have defined
"ROWKEY", "SEX", "LOCAL" and "JOB", but not "NAME".
 
Then it should work.
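
The two changes above could be sketched in Python as an edit to the cube descriptor JSON. This is only an illustration under assumptions: the helper name apply_suggested_changes is hypothetical, the descriptor is a trimmed excerpt of the one quoted below, and it assumes the Precise option is stored as "returntype": "bitmap". Making the change in the Cube settings page remains the safer route.

```python
import json

GLOBAL_DICT_BUILDER = "org.apache.kylin.dict.GlobalDictionaryBuilder"

def apply_suggested_changes(cube_desc, column="NAME"):
    """Return a patched copy of the cube descriptor (hypothetical helper)."""
    desc = json.loads(json.dumps(cube_desc))  # deep copy via JSON round trip
    # 1. Switch the COUNT_DISTINCT measure on `column` to precise count.
    #    Assumption: "bitmap" is the stored value for the Precise option.
    for measure in desc["measures"]:
        func = measure["function"]
        if (func["expression"] == "COUNT_DISTINCT"
                and func["parameter"]["value"] == column):
            func["returntype"] = "bitmap"
    # 2. Add a global dictionary entry for `column` if it is missing.
    dictionaries = desc.setdefault("dictionaries", [])
    if not any(d["column"] == column for d in dictionaries):
        dictionaries.append({"column": column, "builder": GLOBAL_DICT_BUILDER})
    return desc

# Trimmed excerpt of the descriptor quoted below.
cube_desc = {
    "measures": [
        {
            "name": "CD_NAME",
            "function": {
                "expression": "COUNT_DISTINCT",
                "parameter": {
                    "type": "column",
                    "value": "NAME",
                    "next_parameter": None,
                },
                "returntype": "hllc12",
            },
            "dependent_measure_ref": None,
        }
    ],
    "dictionaries": [
        {"column": "ROWKEY", "builder": GLOBAL_DICT_BUILDER},
    ],
}

patched = apply_suggested_changes(cube_desc)
print(json.dumps(patched, indent=2))
```

The sketch leaves the existing dictionary entries untouched and only adds the missing one for NAME.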
 
By the way, you have many ultra-high-cardinality dimensions, and all of
them are defined as normal dimensions. This will cost too many resources
for building and index storage. If you care about a TopN requirement, you
could also define a TopN measure, ordered by SUM(1); it will be more
efficient for
"select label, count(label) from USERCASE_20161204 group by label order by
label desc".
 
If you could, please use English as well. This is a worldwide community,
and English is the better language for everyone.
 
2016-12-06 13:24 GMT+08:00 wang...@snqu.com <wang...@snqu.com>:
 
> Hi Roger
>
> The detailed cube definition is as follows:
> {
>   "uuid": "9a27b749-1989-482a-a3bb-6f4fe1f8f3a0",
>   "last_modified": 1480931382552,
>   "version": "1.6.0",
>   "name": "dmp_cube_590w",
>   "model_name": "dmp_model_590w",
>   "description": "",
>   "null_string": null,
>   "dimensions": [
>     {
>       "name": "ROWKEY",
>       "table": "DEFAULT.USERCASE_20161204",
>       "column": "ROWKEY",
>       "derived": null
>     },
>     {
>       "name": "TIMESTAMP",
>       "table": "DEFAULT.USERCASE_20161204",
>       "column": "TIMESTAMP",
>       "derived": null
>     },
>     {
>       "name": "NAME",
>       "table": "DEFAULT.USERCASE_20161204",
>       "column": "NAME",
>       "derived": null
>     },
>     {
>       "name": "SEX",
>       "table": "DEFAULT.USERCASE_20161204",
>       "column": "SEX",
>       "derived": null
>     },
>     {
>       "name": "LOCAL",
>       "table": "DEFAULT.USERCASE_20161204",
>       "column": "LOCAL",
>       "derived": null
>     },
>     {
>       "name": "JOB",
>       "table": "DEFAULT.USERCASE_20161204",
>       "column": "JOB",
>       "derived": null
>     },
>     {
>       "name": "LABEL",
>       "table": "DEFAULT.USERCASE_20161204",
>       "column": "LABEL",
>       "derived": null
>     }
>   ],
>   "measures": [
>     {
>       "name": "_COUNT_",
>       "function": {
>         "expression": "COUNT",
>         "parameter": {
>           "type": "constant",
>           "value": "1",
>           "next_parameter": null
>         },
>         "returntype": "bigint"
>       },
>       "dependent_measure_ref": null
>     },
>     {
>       "name": "CD_NAME",
>       "function": {
>         "expression": "COUNT_DISTINCT",
>         "parameter": {
>           "type": "column",
>           "value": "NAME",
>           "next_parameter": null
>         },
>         "returntype": "hllc12"
>       },
>       "dependent_measure_ref": null
>     }
>   ],
>   "dictionaries": [
>     {
>       "column": "ROWKEY",
>       "builder": "org.apache.kylin.dict.GlobalDictionaryBuilder"
>     },
>     {
>       "column": "SEX",
>       "builder": "org.apache.kylin.dict.GlobalDictionaryBuilder"
>     },
>     {
>       "column": "LOCAL",
>       "builder": "org.apache.kylin.dict.GlobalDictionaryBuilder"
>     },
>     {
>       "column": "JOB",
>       "builder": "org.apache.kylin.dict.GlobalDictionaryBuilder"
>     }
>   ],
>   "rowkey": {
>     "rowkey_columns": [
>       {
>         "column": "ROWKEY",
>         "encoding": "dict",
>         "isShardBy": true
>       },
>       {
>         "column": "NAME",
>         "encoding": "dict",
>         "isShardBy": false
>       },
>       {
>         "column": "TIMESTAMP",
>         "encoding": "dict",
>         "isShardBy": false
>       },
>       {
>         "column": "SEX",
>         "encoding": "dict",
>         "isShardBy": false
>       },
>       {
>         "column": "LOCAL",
>         "encoding": "dict",
>         "isShardBy": false
>       },
>       {
>         "column": "JOB",
>         "encoding": "dict",
>         "isShardBy": false
>       },
>       {
>         "column": "LABEL",
>         "encoding": "dict",
>         "isShardBy": false
>       }
>     ]
>   },
>   "hbase_mapping": {
>     "column_family": [
>       {
>         "name": "F1",
>         "columns": [
>           {
>             "qualifier": "M",
>             "measure_refs": [
>               "_COUNT_"
>             ]
>           }
>         ]
>       },
>       {
>         "name": "F2",
>         "columns": [
>           {
>             "qualifier": "M",
>             "measure_refs": [
>               "CD_NAME"
>             ]
>           }
>         ]
>       }
>     ]
>   },
>   "aggregation_groups": [
>     {
>       "includes": [
>         "ROWKEY",
>         "NAME",
>         "SEX",
>         "LOCAL",
>         "JOB",
>         "LABEL"
>       ],
>       "select_rule": {
>         "hierarchy_dims": [],
>         "mandatory_dims": [],
>         "joint_dims": []
>       }
>     }
>   ],
>   "signature": "wsatHc0Jx6dju5vYyzsgAw==",
>   "notify_list": [],
>   "status_need_notify": [
>     "ERROR",
>     "DISCARDED",
>     "SUCCEED"
>   ],
>   "partition_date_start": 0,
>   "partition_date_end": 3153600000000,
>   "auto_merge_time_ranges": [
>     604800000,
>     2419200000
>   ],
>   "retention_range": 0,
>   "engine_type": 2,
>   "storage_type": 2,
>   "override_kylin_properties": {}
> }
>
> Thanks,
> wangdan
>
> From: roger shi
> Sent: 2016-12-06 10:34
> To: dev@kylin.apache.org
> Subject: Re: Re: Error when using global dictionary: AppendTrieDictionary can't retrive value from id
> Hi wangdan,
>
> Would you please attach the cube desc json? The cube definition in
> previous email is cube instance json.
>
> Thanks,
> Roger
>
> On 06/12/2016, 10:20 AM, "wang...@snqu.com" <wang...@snqu.com> wrote:
>
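>     I checked kylin.properties and changed
>     kylin.dictionary.max.cardinality=5000000 to
>     kylin.dictionary.max.cardinality=20000000. I also modified the cube to
>     add the rowkey, and the rebuild succeeded.
>     However, at query time the following two statements succeed: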
>     select label, count(label) from USERCASE_20161204 group by label order
> by label desc
>     select name, count(name) from USERCASE_20161204 group by name order by
> name desc
>
>     The following two statements
>     select rowkey, count(rowkey) from USERCASE_20161204 group by rowkey
> order by rowkey desc
>     select job, count(job) from USERCASE_20161204 group by job order by
> job desc
>     fail with these errors when executed:
>     Error while executing SQL "select rowkey, count(rowkey) from
> USERCASE_20161204 group by rowkey order by rowkey desc LIMIT 50000":
> AppendTrieDictionary can't retrive value from id
>     Error while executing SQL "select job, count(job) from
> USERCASE_20161204 group by job order by job desc LIMIT 50000":
> AppendTrieDictionary can't retrive value from id
>
>     The cube definition is as follows:
>     {
>       "uuid": "d4671695-96a1-4981-bb4c-2263de45f2ee",
>       "last_modified": 1480939514479,
>       "version": "1.6.0",
>       "name": "dmp_cube_590w",
>       "owner": "ADMIN",
>       "descriptor": "dmp_cube_590w",
>       "cost": 50,
>       "status": "READY",
>       "segments": [
>         {
>           "uuid": "4237b09b-8d2e-4c5c-be19-afc67e6524f5",
>           "name": "19700101000000_20161205000000",
>           "storage_location_identifier": "KYLIN_IE2V4DQUY4",
>           "date_range_start": 0,
>           "date_range_end": 1480896000000,
>           "source_offset_start": 0,
>           "source_offset_end": 0,
>           "status": "READY",
>           "size_kb": 8180652,
>           "input_records": 5978388,
>           "input_records_size": 666419108,
>           "last_build_time": 1480939514367,
>           "last_build_job_id": "d563a6b8-c6cd-41c7-93c4-47bb319bf21b",
>           "create_time_utc": 1480931400144,
>           "cuboid_shard_nums": {
>             "1": 2,
>             "2": 2,
>             "3": 3,
>             "4": 2,
>             "5": 3,
>             "6": 3,
>             "7": 4,
>             "8": 2,
>             "9": 3,
>             "10": 3,
>             "11": 4,
>             "12": 3,
>             "13": 4,
>             "14": 4,
>             "15": 5,
>             "32": 2,
>             "33": 3,
>             "34": 3,
>             "35": 4,
>             "36": 3,
>             "37": 4,
>             "38": 4,
>             "39": 5,
>             "40": 3,
>             "41": 4,
>             "42": 4,
>             "43": 5,
>             "44": 4,
>             "45": 5,
>             "46": 5,
>             "47": 6,
>             "64": 6,
>             "65": 6,
>             "66": 6,
>             "67": 6,
>             "68": 6,
>             "69": 6,
>             "70": 6,
>             "71": 6,
>             "72": 6,
>             "73": 6,
>             "74": 6,
>             "75": 6,
>             "76": 6,
>             "77": 6,
>             "78": 6,
>             "79": 6,
>             "96": 6,
>             "97": 6,
>             "98": 6,
>             "99": 6,
>             "100": 6,
>             "101": 6,
>             "102": 6,
>             "103": 6,
>             "104": 6,
>             "105": 6,
>             "106": 6,
>             "107": 6,
>             "108": 6,
>             "109": 6,
>             "110": 6,
>             "111": 6,
>             "127": 6
>           },
>           "total_shards": 11,
>           "blackout_cuboids": [],
>           "binary_signature": null,
>           "dictionaries": {
>             "DEFAULT.USERCASE_20161204/SEX": "/dict/DEFAULT.USERCASE_20161204/SEX/17d36c0b-e7a7-4bb4-941f-47bc78a24751.dict",
>             "DEFAULT.USERCASE_20161204/TIMESTAMP": "/dict/DEFAULT.USERCASE_20161204/TIMESTAMP/87ce791b-3de3-491f-901f-d28721a25e94.dict",
>             "DEFAULT.USERCASE_20161204/NAME": "/dict/DEFAULT.USERCASE_20161204/NAME/73a59cfb-eaa5-4531-ba7e-16ba2adeaea9.dict",
>             "DEFAULT.USERCASE_20161204/LABEL": "/dict/DEFAULT.USERCASE_20161204/LABEL/71c633ee-dffb-4d80-9844-768b6ee21782.dict",
>             "DEFAULT.USERCASE_20161204/LOCAL": "/dict/DEFAULT.USERCASE_20161204/LOCAL/31ed5b68-aae2-40b7-ba09-83abf1d64953.dict",
>             "DEFAULT.USERCASE_20161204/ROWKEY": "/dict/DEFAULT.USERCASE_20161204/ROWKEY/736822fd-5103-4814-bfcd-b6af80609970.dict",
>             "DEFAULT.USERCASE_20161204/JOB": "/dict/DEFAULT.USERCASE_20161204/JOB/a47cc0f8-80ab-46fa-953a-59a326412395.dict"
>           },
>           "snapshots": null,
>           "index_path": "/kylin/kylin_metadata/kylin-d563a6b8-c6cd-41c7-93c4-47bb319bf21b/dmp_cube_590w/secondary_index/",
>           "rowkey_stats": [
>             [
>               "ROWKEY",
>               5978389,
>               4
>             ],
>             [
>               "NAME",
>               1195682,
>               3
>             ],
>             [
>               "TIMESTAMP",
>               1,
>               1
>             ],
>             [
>               "SEX",
>               1195680,
>               4
>             ],
>             [
>               "LOCAL",
>               1195679,
>               4
>             ],
>             [
>               "JOB",
>               1195679,
>               4
>             ],
>             [
>               "LABEL",
>               1195676,
>               3
>             ]
>           ]
>         }
>       ],
>       "create_time_utc": 1480907805715,
>       "size_kb": 8180652,
>       "input_records_count": 5978388,
>       "input_records_size": 666419108
>     }
>
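>     From: wang...@snqu.com
>     Sent: 2016-12-05 15:11
>     To: dev
>     Subject: Error when using global dictionary: AppendTrieDictionary can't retrive value from id
>     hi,
>      Each dimension has a cardinality of about 5.9 million. When dict is
>     selected as the encoding in the rowkey, the build fails with the error:
>     "Too high cardinality is not suitable for dictionary -- cardinality:
> 5978388"
>
>     So I modified the model: I did not define the rowkey and defined a
>     global dictionary for all dimensions. The build succeeded, but queries
>     fail with the error:
>     "AppendTrieDictionary can't retrive value from id"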
>
>
>
>
>
