Re: Re: Kylin front-end business query question

2018-01-16 Thread
The configuration "kylin.query.timeout-seconds" may help to stop long-running queries.
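For reference, this setting would normally go into conf/kylin.properties on the query
server; the value below is only an illustration, and the exact property name may differ
between Kylin versions, so check it against your version's configuration reference:

    # conf/kylin.properties (illustrative value; typically needs a Kylin restart to take effect)
    kylin.query.timeout-seconds=300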

On 2017-12-29 at 15:13, chenping...@keruyun.com wrote:

> Thanks for your prompt reply.
>
> --
>
> Chen Ping, DBA Engineer
>
>
>
> Chengdu Shishike Technology Co., Ltd. (Keruyun)
>
> Address: Building 1, Floor 3, No. 1268 Tianfu Avenue, High-tech Zone, Chengdu
>
> Postal code: 610041
>
> Mobile: 15108456581
>
> QQ: 625852056
>
> Website: www.keruyun.com
>
> Customer service: 4006-315-666
>
>
>
>
> *From:* Joanna He 
> *Sent:* 2017-12-29 15:05
> *To:* user 
> *Subject:* Re: Kylin front-end business query question
> Translation: Hello my question is when there are multiple queries running
> , how can I know what query is currently running. And how can I kill
> the long-running query?
>
> Answer:
> You can view the currently running queries in logs/kylin.log under
> your Kylin installation directory.
> There is no way to kill a single query in Kylin at the moment; the only way
> to stop a query is to restart the Kylin server.
>
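For reference, a rough way to watch running queries from the command line, assuming the
query entries in logs/kylin.log keep their usual "SQL:" lines (the exact log format may
vary by Kylin version):

    # follow the query log and keep only the SQL lines (log format assumed; adjust to your version)
    tail -f logs/kylin.log | grep -i --line-buffered 'sql:'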
>
>
> 2017-12-29 14:59 GMT+08:00 chenping...@keruyun.com <
> chenping...@keruyun.com>:
>
>>
>> Hi everyone, I have run into a fairly big problem: many queries arrive from the
>> front end at the same time. How can I see which queries the current Kylin instance
>> is running, and how can I kill queries that have been running for a long time?
>>
>>
>>
>> --
>>
>> Chen Ping, DBA Engineer
>>
>>
>>
>> Chengdu Shishike Technology Co., Ltd. (Keruyun)
>>
>> Address: Building 1, Floor 3, No. 1268 Tianfu Avenue, High-tech Zone, Chengdu
>>
>> Postal code: 610041
>>
>> Mobile: 15108456581
>>
>> QQ: 625852056
>>
>> Website: www.keruyun.com
>>
>> Customer service: 4006-315-666
>>
>>
>>
>
>


Re: how big cardinal of a column if we want to code a column as dict?

2017-11-29 Thread
Thank you, it works: the dictionary for the count_distinct column has been
stored in HDFS.

What I worry about is that if a big dict is not in the cache, a query may
be very slow because it has to fetch the dictionary from HBase or HDFS.

2017-11-29 16:58 GMT+08:00 ShaoFeng Shi <shaofeng...@apache.org>:

> Hi Hao,
>
> Kylin will automatically detect whether a resource size exceeds HBase
> cell's max size; if yes, it will save it to HDFS:
> https://github.com/apache/kylin/blob/master/storage-hbase/src/main/java/org/apache/kylin/storage/hbase/HBaseResourceStore.java#L419
>
> Please check whether it works on your side.
>
> 2017-11-29 16:01 GMT+08:00 杨浩 <yangha...@gmail.com>:
>
>> I generated the dict data sizes with TrieDictionaryForestBenchmark.
>> If the cardinality is less than 2 million, the dict size stays below 802KB.
>> Should the cardinality be kept below 2 million for a dict-encoded column if we
>> want to speed up queries and the cell size is limited (to less than 1MB) by the
>> HBase admin?
>>
>> cardinality (millions)   0    1      2      3    4    5    6    7
>> dict size                64B  406KB  802KB  1MB  1MB  1MB  2MB  2MB
>>
>> 2017-11-25 22:42 GMT+08:00 杨浩 <yangha...@gmail.com>:
>>
>>> Thanks. The biggest number is written in the "Kylin Guide", but it may
>>> affect query performance because of HBase's limit on KV cell size. As there are
>>> many cubes in Kylin, the query server would fetch dictionaries from HBase many
>>> times. Our HBase admin says that query performance can be guaranteed if a KV
>>> size stays under about 500 KB, so the dict size should be less than 500KB in
>>> our environment.
>>>
>>> We may choose 1 million, or half of that, as the guideline for using dict
>>> encoding to ensure query performance.
>>>
>>> 2017-11-24 22:25 GMT+08:00 ShaoFeng Shi <shaofeng...@apache.org>:
>>>
>>>> The cap is 5 million, as I remember, but it's better to keep it under
>>>> 1 million.
>>>>
>>>> 2017-11-24 20:33 GMT+08:00 杨浩 <yangha...@gmail.com>:
>>>>
>>>>> There are many cubes in our Kylin environment. Can anyone give a number
>>>>> for how high a column's cardinality can be if we want to encode the column as dict?
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Best regards,
>>>>
>>>> Shaofeng Shi 史少锋
>>>>
>>>>
>>>
>>
>
>
> --
> Best regards,
>
> Shaofeng Shi 史少锋
>
>


Re: how big cardinal of a column if we want to code a column as dict?

2017-11-29 Thread
I generated the dict data sizes with TrieDictionaryForestBenchmark.
If the cardinality is less than 2 million, the dict size stays below 802KB.
Should the cardinality be kept below 2 million for a dict-encoded column if we
want to speed up queries and the cell size is limited (to less than 1MB) by the
HBase admin?

cardinality (millions)   0    1      2      3    4    5    6    7
dict size                64B  406KB  802KB  1MB  1MB  1MB  2MB  2MB
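
Taking the benchmark numbers above at face value, roughly 406 KB per million distinct
values, a 500 KB cell budget works out to about

\[
\frac{500\ \text{KB}}{406\ \text{KB}/10^{6}\ \text{values}} \approx 1.2\times 10^{6}\ \text{distinct values},
\]

which is consistent with the "1 million or half of that" guideline quoted below.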

2017-11-25 22:42 GMT+08:00 杨浩 <yangha...@gmail.com>:

> Thanks. The biggest number is written in the "Kylin Guide", but it may
> affect query performance because of HBase's limit on KV cell size. As there are
> many cubes in Kylin, the query server would fetch dictionaries from HBase many
> times. Our HBase admin says that query performance can be guaranteed if a KV
> size stays under about 500 KB, so the dict size should be less than 500KB in
> our environment.
>
> We may choose 1 million, or half of that, as the guideline for using dict
> encoding to ensure query performance.
>
> 2017-11-24 22:25 GMT+08:00 ShaoFeng Shi <shaofeng...@apache.org>:
>
>> The cap is 5 million, as I remember, but it's better to keep it under
>> 1 million.
>>
>> 2017-11-24 20:33 GMT+08:00 杨浩 <yangha...@gmail.com>:
>>
>>> There are many cubes in our Kylin environment. Can anyone give a number for how
>>> high a column's cardinality can be if we want to encode the column as dict?
>>>
>>
>>
>>
>> --
>> Best regards,
>>
>> Shaofeng Shi 史少锋
>>
>>
>


Re: how big cardinal of a column if we want to code a column as dict?

2017-11-25 Thread
Thanks. The biggest number is written in the "Kylin Guide", but it may
affect query performance because of HBase's limit on KV cell size. As there are
many cubes in Kylin, the query server would fetch dictionaries from HBase many
times. Our HBase admin says that query performance can be guaranteed if a KV
size stays under about 500 KB, so the dict size should be less than 500KB in
our environment.

We may choose 1 million, or half of that, as the guideline for using dict
encoding to ensure query performance.

2017-11-24 22:25 GMT+08:00 ShaoFeng Shi <shaofeng...@apache.org>:

> The cap is 5 million, as I remember, but it's better to keep it under
> 1 million.
>
> 2017-11-24 20:33 GMT+08:00 杨浩 <yangha...@gmail.com>:
>
>> There are many cubes in our Kylin environment. Can anyone give a number for how
>> high a column's cardinality can be if we want to encode the column as dict?
>>
>
>
>
> --
> Best regards,
>
> Shaofeng Shi 史少锋
>
>


how big cardinal of a column if we want to code a column as dict?

2017-11-24 Thread
There are many cubes in our Kylin environment. Can anyone give a number for how
high a column's cardinality can be if we want to encode the column as dict?


Re: Shall we define the base cuboid?

2017-11-22 Thread
 0001111100, est row: 10033, est MB:
>> 168.04, shrink: 100.44%
>
> | Cuboid 0001011100, est row: 10059, est
>> MB: 168.47, shrink: 100.26%
>
> | Cuboid 00010110111100, est row: 9929,
>> est MB: 166.29, shrink: 98.71%
>
> | Cuboid 00010100111100, est row:
>> 9885, est MB: 165.55, shrink: 99.56%
>
> | Cuboid 0001111100, est row:
>> 9342, est MB: 156.46, shrink: 94.51%
>
> | Cuboid 00011101010011, est row: 10052, est MB:
>> 168.36, shrink: 101.32%
>
> | Cuboid 00010101010011, est row: 10083, est MB:
>> 168.88, shrink: 100.31%
>
> | Cuboid 000101101101010011, est row: 10046, est
>> MB: 168.26, shrink: 99.63%
>
> | Cuboid 000101001101010011, est row: 10030,
>> est MB: 167.99, shrink: 99.84%
>
> | Cuboid 00011101010011, est row:
>> 9843, est MB: 164.85, shrink: 98.14%
>
> | Cuboid 0001110101, est row: 10165, est MB:
>> 170.25, shrink: 101.12%
>
> | Cuboid 0001010101, est row: 9954, est
>> MB: 166.71, shrink: 97.92%
>
> | Cuboid 00010110110101, est row: 9995,
>> est MB: 167.4, shrink: 100.41%
>
> | Cuboid 00010100110101, est row:
>> 9984, est MB: 167.21, shrink: 99.89%
>
> | Cuboid 0001110101, est row:
>> 9566, est MB: 160.21, shrink: 95.81%
>
> | Cuboid 00011101011100, est row: 9974, est MB:
>> 167.06, shrink: 100.53%
>
> | Cuboid 00010101011100, est row: 9989, est MB:
>> 167.3, shrink: 100.15%
>
> | Cuboid 000101101101011100, est row: 10067, est
>> MB: 168.61, shrink: 100.78%
>
> | Cuboid 000101001101011100, est row: 10126,
>> est MB: 169.59, shrink: 100.59%
>
> | Cuboid 00011101011100, est row:
>> 9982, est MB: 167.18, shrink: 98.58%
>
> | Cuboid 000010, est row: 10006, est MB: 167.6,
>> shrink: 101.06%
>
> | Cuboid 0001011010, est row: 10057, est MB:
>> 168.45, shrink: 100.51%
>
> | Cuboid 00010110111010, est row: 10052, est MB:
>> 168.36, shrink: 99.95%
>
> | Cuboid 00010100111010, est row: 9997, est
>> MB: 167.44, shrink: 99.45%
>
> | Cuboid 0001111010, est row: 10057,
>> est MB: 168.44, shrink: 100.6%
>
> | Cuboid 0000100011, est row: 10197, est MB:
>> 170.79, shrink: 101.91%
>
> | Cuboid 00010110100011, est row: 10048, est MB:
>> 168.29, shrink: 98.54%
>
> | Cuboid 000101101110100011, est row: 9890, est
>> MB: 165.64, shrink: 98.43%
>
> | Cuboid 000101001110100011, est row: 10083,
>> est MB: 168.87, shrink: 101.95%
>
> | Cuboid 00011110100011, est row:
>> 9772, est MB: 163.66, shrink: 96.92%
>
> | Cuboid 000010, est row: 10176, est MB:
>> 170.43, shrink: 99.79%
>
> | Cuboid 0001011010, est row: 9969, est
>> MB: 166.97, shrink: 97.97%
>
> | Cuboid 00010110111010, est row: 9962,
>> est MB: 166.85, shrink: 99.93%
>
> | Cuboid 00010100111010, est row:
>> 9978, est MB: 167.11, shrink: 100.16%
>
> | Cuboid 0001111010, est row:
>> 9550, est MB: 159.94, shrink: 95.71%
>
> | Cuboid 0000101100, est row: 9987, est MB:
>> 167.27, shrink: 99.81%
>
> | Cuboid 00010110101100, est row: 10083, est MB:
>> 168.88, shrink: 100.96%
>
> | Cuboid 000101101110101100, est row: 10017, est
>> MB: 167.77, shrink: 99.35%
>
> | Cuboid 000101001110101100, est row: 9857,
>> est MB: 165.09, shrink: 98.4%
>
> | Cuboid 00011110101100, est row:
>> 9865, est MB: 165.22, shrink: 100.08%
>
> | Cuboid 0001110011, est row: 9901, est MB: 165.84,
>> shrink: 100%
>
> | Cuboid 0001010011, est row: 10063, est MB:
>> 168.55, shrink: 101.64%
>
> | Cuboid 00010110110011, es

Re: Shall we define the base cuboid?

2017-11-19 Thread
So, shall we change the document to remove AGG1 and say that it is
computed by default?

2017-11-20 8:55 GMT+08:00 ShaoFeng Shi <shaofeng...@apache.org>:

> Yes, the base cuboid is the cuboid of all dimensions on the rowkey. It is the
> parent of all AGGs.
>
> 2017-11-19 16:06 GMT+08:00 杨浩 <yangha...@gmail.com>:
>
>> As in the article http://kylin.apache.org/blog/2016/02/18/new-aggregation-group/,
>> two AGGs are defined at the end. AGG1 is the base cuboid; shall we define it
>> explicitly? It seems the base cuboid is computed implicitly.
>>
>>
>
>
> --
> Best regards,
>
> Shaofeng Shi 史少锋
>
>


Re: how to change the order of rowkey_columns

2017-11-14 Thread
Thanks for that

2017-11-14 22:50 GMT+08:00 Billy Liu <billy...@apache.org>:

> You could change the rowkey order directly in the GUI by drag-and-drop.
>
> 2017-11-14 10:52 GMT+08:00 杨浩 <yangha...@gmail.com>:
>
>> To speed up queries, we should position the most frequently used columns before
>> the others; for example, the partition date column should come first, since every
>> query contains it. So how do we change the order of rowkey_columns, or, in other
>> words, what decides the order of rowkey_columns?
>>
>
>


how to change the order of rowkey_columns

2017-11-13 Thread
To speed up queries, we should position the most frequently used columns before
the others; for example, the partition date column should come first, since every
query contains it. So how do we change the order of rowkey_columns, or, in other
words, what decides the order of rowkey_columns?


what's the best practice for choosing machine for kylin

2017-10-30 Thread
What's the best practice for choosing machines for Kylin?


question about joint dimensions

2017-10-25 Thread
I have used aggregation groups to reduce cuboids, but they seem not to take
effect. A table has dimensions A, B, C, D, E, and I set these aggregation
groups:
group 1: A,B,C, which has joint dimensions A,B,C
group 2: A,B,D, which has joint dimensions A,B,D
group 3: A,B,E, which has joint dimensions A,B,E
But I can still query the table using "group by A B C E"; is that OK?
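
As a worked sketch of what this design should produce, assuming joint dimensions behave
as described in Kylin's aggregation-group documentation (columns in a joint always appear
together, and the base cuboid of all dimensions is always built), the cuboids would be:

\[
\{A,B,C,D,E\}\ (\text{base}),\quad \{A,B,C\},\quad \{A,B,D\},\quad \{A,B,E\}
\]

A "group by A B C E" query has no exact matching cuboid in that list, so it would be
answered from the base cuboid with on-the-fly aggregation; it still works, just less
efficiently than a query that hits a precomputed cuboid.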


Re: [Announce] New Apache Kylin PMC Billy Liu

2017-10-19 Thread
Congratulations to Bill, Guosheng and Cheng Wang!!

2017-10-16 20:41 GMT+08:00 Alberto Ramón :

> Congratulations to Bill, Guosheng and Cheng Wang!!
>
> On 16 October 2017 at 11:33, Luke Han  wrote:
>
>> On behalf of the Apache Kylin PMC, I am very pleased to announce
>> that Billy Liu has accepted the PMC's invitation to become a
>> PMC member on the project.
>>
>> We appreciate all of Billy's generous contributions: many bug fixes,
>> patches, and help for many users. We are so glad to have him as our new PMC
>> member and look forward to his continued involvement.
>>
>> Congratulations and Welcome, Billy!
>>
>
>


How to remove error model?

2017-10-12 Thread
There are some broken models/cubes in Kylin. They cannot be seen in the web UI,
and I cannot create a new model with the same name. How can I delete them?

The command bin/metastore.sh clean --delete true cannot delete the dirty model.
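
If the broken model still has an entry in the metadata store, a more targeted cleanup
might look like the sketch below, assuming your Kylin version's metastore.sh supports
the backup, list and remove subcommands; the resource path is only an example, so back
up the metadata first and adjust the path to whatever the list command actually shows:

    # back up the metadata before touching anything
    bin/metastore.sh backup
    # list the model descriptors to locate the dirty entry
    bin/metastore.sh list /model_desc
    # remove that specific resource (example path), then restart Kylin so caches reload
    bin/metastore.sh remove /model_desc/BROKEN_MODEL.json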


Re: how to filter long tail data

2017-09-06 Thread
It's an elegant implementation. I have read the article
approximate-topn-measure
<http://kylin.apache.org/blog/2016/03/19/approximate-topn-measure/>, but
some problems remain in our situation:

   1. The result is approximate. Our team supplies statistics data for our
   company, a big company, and we don't want to be challenged by our users.
   2. Filtering data after all cuboids are generated is a slightly different
   problem. If we have the dimensions date, appId, appVersion, channel and the
   measures dayActiveUseCount, dayNewUseCount, dayUseCount, 7dayActiveUseCount,
   we would previously filter out rows whose dayActiveUseCount is less than 2.
   That is very hard to implement with Top-N, but filtering by the default
   measure "_COUNT_" after all cuboids are generated may be OK.

It seems we have to change the source code and add a parameter to filter
data by "_COUNT_" after all cuboids are generated.


I have a question about the Top-N measure: does it also filter data for the
default measure _COUNT_, which is not in the Top-N?



2017-09-05 15:28 GMT+08:00 ShaoFeng Shi <shaofeng...@apache.org>:

> Cool, that is the case of top N.
>
> 2017-09-05 12:00 GMT+08:00 杨浩 <yangha...@gmail.com>:
>
>> Thanks. We would like to try the Top-N measure. The "filter condition" filters
>> data from the source, but we want to filter the data after all cuboids are built,
>> because we don't know which data is long-tail until after building.
>>
>>
>> 2017-09-04 11:01 GMT+08:00 ShaoFeng Shi <shaofeng...@apache.org>:
>>
>>> The Top-N measure is aimed at filtering the long tail data. Besides, in the
>>> data model there is a "filter condition", where you can add a filtering
>>> condition to exclude that tail data.
>>>
>>> 2017-09-04 10:54 GMT+08:00 杨浩 <yangha...@gmail.com>:
>>>
>>>> Okay, our team wants to use Kylin as an ETL tool, but there is a lot of
>>>> long tail data after building. Can this data be filtered directly by
>>>> Kylin, or do we have to make some changes to the code?
>>>>
>>>> 2017-09-03 19:42 GMT+08:00 Li Yang <liy...@apache.org>:
>>>>
>>>>> Please ask Kylin related question here.
>>>>>
>>>>> On Fri, Sep 1, 2017 at 2:47 PM, 杨浩 <yangha...@gmail.com> wrote:
>>>>>
>>>>> > If an index is less than 2, we don't want to store it in HBase. How to
>>>>> > filter the long tail data?
>>>>> >
>>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> Best regards,
>>>
>>> Shaofeng Shi 史少锋
>>>
>>>
>>
>
>
> --
> Best regards,
>
> Shaofeng Shi 史少锋
>
>


Re: how to filter long tail data

2017-09-04 Thread
Thanks. We would like to try the Top-N measure. The "filter condition" filters
data from the source, but we want to filter the data after all cuboids are built,
because we don't know which data is long-tail until after building.


2017-09-04 11:01 GMT+08:00 ShaoFeng Shi <shaofeng...@apache.org>:

> The Top-N measure is aimed at filtering the long tail data. Besides, in the
> data model there is a "filter condition", where you can add a filtering
> condition to exclude that tail data.
>
> 2017-09-04 10:54 GMT+08:00 杨浩 <yangha...@gmail.com>:
>
>> Okay, our team wants to use Kylin as an ETL tool, but there is a lot of long
>> tail data after building. Can this data be filtered directly by Kylin, or
>> do we have to make some changes to the code?
>>
>> 2017-09-03 19:42 GMT+08:00 Li Yang <liy...@apache.org>:
>>
>>> Please ask Kylin related question here.
>>>
>>> On Fri, Sep 1, 2017 at 2:47 PM, 杨浩 <yangha...@gmail.com> wrote:
>>>
>>> > If an index is less than 2, we don't want to store it in HBase. How to
>>> > filter the long tail data?
>>> >
>>>
>>
>>
>
>
> --
> Best regards,
>
> Shaofeng Shi 史少锋
>
>


How to filter the long tail data?

2017-09-01 Thread
If an index is less than 2, we don't want to store it in HBase. How to
filter the long tail data?


how to filter long tail data

2017-09-01 Thread
If an index is less than 2, we don't want to store it in HBase. How to
filter the long tail data?


Re: use standalone secure Hive and MR

2017-06-27 Thread
Thank you, it helps.

We have configured our company's Hive, but from the log we can see that the
ugi is not h_yanghao3 and no tables are read.

configuration

kylin.job.status.with.kerberos=true
> kylin.hive.beeline.params=-u 'jdbc:hive2://***/;principal=sql_prc/hadoop@*.HADOOP' -n h_yanghao3

kylin.hive.client=beeline
>

kylin.out


> 2017-06-27 17:34:58,040 INFO  [http-bio-7070-exec-10] metastore.HiveMetaStore: Added admin role in metastore
> 2017-06-27 17:34:58,041 INFO  [http-bio-7070-exec-10] metastore.HiveMetaStore: Added public role in metastore
> 2017-06-27 17:34:58,127 INFO  [http-bio-7070-exec-10] metastore.HiveMetaStore: No user is added in admin role, since config is empty
> 2017-06-27 17:34:58,203 INFO  [http-bio-7070-exec-10] metastore.HiveMetaStore: 0: get_all_databases
> 2017-06-27 17:34:58,204 INFO  [http-bio-7070-exec-10] HiveMetaStore.audit: ugi=grid ip=unknown-ip-addr cmd=get_all_databases
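
One way to rule out the beeline/Kerberos side, independent of Kylin, is a manual check
along these lines; the JDBC URL and realm below are placeholders, not the real values
elided above:

    # obtain a ticket for the expected user, then try the same beeline connection Kylin would use
    kinit h_yanghao3
    beeline -u 'jdbc:hive2://<hive-server>/;principal=sql_prc/hadoop@<REALM>' \
            -n h_yanghao3 -e 'show databases;'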




2017-06-27 20:10 GMT+08:00 Billy Liu <billy...@apache.org>:

> Are you looking for this: http://kylin.apache.org/blog/2016/06/10/standalone-hbase-cluster/ ?
>
>
> 2017-06-27 20:07 GMT+08:00 杨浩 <yangha...@gmail.com>:
>
>> Our company uses a Hadoop version different from Apache Hadoop, and
>> Kerberos is used to keep it secure. Our group wants to use Kylin with the
>> Hive and MR from our company, but with the HBase maintained by our team,
>> which is Apache HBase. We have used Hive beeline, but Kylin cannot read the
>> meta info from Hive. Does anyone know how to configure Kylin, or change the
>> source code, to use a standalone Hive and MR?
>>
>
>


use standalone secure Hive and MR

2017-06-27 Thread
Our company uses a Hadoop version different from Apache Hadoop, and
Kerberos is used to keep it secure. Our group wants to use Kylin with the
Hive and MR from our company, but with the HBase maintained by our team,
which is Apache HBase. We have used Hive beeline, but Kylin cannot read the
meta info from Hive. Does anyone know how to configure Kylin, or change the
source code, to use a standalone Hive and MR?