Re: Consulting "EXTENDED_COLUMN"

2016-12-11 Thread ShaoFeng Shi
Kylin encodes dimension values with a dictionary (the default encoding) or
another encoding method when composing the rowkey, so in most cases the
overhead is smaller than the raw column values would suggest.
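
To illustrate why encoding keeps the rowkey overhead small, here is a minimal
Python sketch of dictionary encoding in general. It is not Kylin's actual
dictionary implementation; the function names and the weekday data are
made up for illustration only.

```python
def build_dictionary(values):
    """Map each distinct value to a small integer ID (sorted for determinism)."""
    return {value: idx for idx, value in enumerate(sorted(set(values)))}


def encoded_width_bytes(dictionary):
    """Bytes needed to store one ID, given the number of distinct values."""
    cardinality = max(len(dictionary), 1)
    # The largest ID is cardinality - 1; at least one bit is always used.
    bits = max(1, (cardinality - 1).bit_length())
    return (bits + 7) // 8


# Hypothetical low-cardinality text column, like the weekday names in this thread.
week_days = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]

dictionary = build_dictionary(week_days)
raw_bytes = max(len(day.encode("utf-8")) for day in week_days)

print("encoded width (bytes):", encoded_width_bytes(dictionary))
print("longest raw value (bytes):", raw_bytes)
```

With only seven distinct values, each ID fits in a single byte, while the
longest raw string needs several bytes.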

-- 
Best regards,

Shaofeng Shi 史少锋


Re: Consulting "EXTENDED_COLUMN"

2016-12-02 Thread Alberto Ramón
Yes, I will accept this overhead in the rowkey.



Re: Consulting "EXTENDED_COLUMN"

2016-12-02 Thread Billy(Yiming) Liu
Using Joint Dimension for your 1:1 relation is the right design.

-- 
With Warm regards

Yiming Liu (刘一鸣)


Re: Consulting "EXTENDED_COLUMN"

2016-12-01 Thread Alberto Ramón
Nice, Liu.

We have some cases like:
DayWeekTXT, DayWeekID
MonthTXT, MonthID

A small proposal:
It would be interesting to be able to create a Derived dimension with a 1:1
relation that supports filters and GROUP BY.



Re: Consulting "EXTENDED_COLUMN"

2016-12-01 Thread Billy(Yiming) Liu
The cost of a joint dimension compared with an extended column is that you
have more columns in the HBase rowkey, which may harm query performance.
Still, a joint dimension is recommended most of the time, since a normal
dimension column supports many more functions than an extended column, such
as count(*).

-- 
With Warm regards

Yiming Liu (刘一鸣)


Re: Consulting "EXTENDED_COLUMN"

2016-12-01 Thread Alberto Ramón
Hello,
I was preparing an email with related questions.

Sometimes we have derived dimensions with a 1:1 relation, for example:
WeekDayID & WeekDayTxt
MonthID & MonthTxt

SOL1: Derived. Use the ID as the Host column and the Txt as the Extended column.
Problem: you can't filter or GROUP BY on the Txt.

SOL2: Joint. Define tuples of ID & Txt.
Any problems or limitations? (I need to test this option.)



Re: Consulting "EXTENDED_COLUMN"

2016-11-30 Thread Billy(Yiming) Liu
Thanks, Alberto. The explanation is accurate. EXTENDED_COLUMN is used only
for display, not for filtering or grouping, which is done on the
HOST_COLUMN. So EXTENDED_COLUMN is not a dimension; it works like a
key/value map keyed by the HOST_COLUMN.
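
The key/value behaviour described above can be sketched in plain Python. This
only models the idea and is not how Kylin stores or queries data; the rows,
column meanings, and function name are hypothetical.

```python
from collections import defaultdict

# Hypothetical fact rows: (weekday_id, weekday_txt, sales).
rows = [
    (1, "Monday", 10.0),
    (2, "Tuesday", 5.0),
    (1, "Monday", 7.5),
]


def aggregate_by_host(fact_rows):
    """Group and sum by the host column; keep one extended value per host key."""
    totals = defaultdict(float)
    extended = {}
    for host, text, amount in fact_rows:
        totals[host] += amount
        # The text is only remembered per host key; it never takes part
        # in grouping or filtering.
        extended[host] = text
    return dict(totals), extended


totals, extended = aggregate_by_host(rows)

# Grouping happens on the host column; the extended value is attached
# afterwards by a lookup on the host key.
report = [(host, extended[host], total) for host, total in sorted(totals.items())]
print(report)
```

Because the text is stored only as a value attached to the host key, a filter
or GROUP BY on the text itself has nothing to work with in this model.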

If the value in EXTENDED_COLUMN is not long, you could just define two
dimensions with the joint dimension setting. It has almost the same
performance impact as EXTENDED_COLUMN, which removes one dimension, but it
is easier to understand.
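
As a rough illustration of why the two options are close, the sketch below
counts dimension combinations (cuboids) under a simplified model: one
aggregation group, no hierarchy or mandatory dimensions, and joint groups that
always appear together or not at all. The column names and the function are
hypothetical, and the counts are an upper bound rather than what Kylin would
report for a real cube.

```python
def cuboid_count(dimensions, joint_groups=()):
    """Upper bound on cuboids when each joint group is all-or-nothing."""
    dims = set(dimensions)
    joined = [d for group in joint_groups for d in group]
    if len(joined) != len(set(joined)):
        raise ValueError("joint groups must not overlap")
    if not set(joined) <= dims:
        raise ValueError("joint groups may only contain known dimensions")
    # Each joint group behaves like a single dimension when combining.
    independent_units = len(dims) - len(joined) + len(joint_groups)
    return 2 ** independent_units


# Hypothetical columns, following the 1:1 ID/text pairs discussed in this thread.
all_dims = ["WEEKDAY_ID", "WEEKDAY_TXT", "MONTH_ID", "MONTH_TXT", "COUNTRY"]
joint = [("WEEKDAY_ID", "WEEKDAY_TXT"), ("MONTH_ID", "MONTH_TXT")]
# With extended columns, the text columns are no longer dimensions at all.
extended_dims = ["WEEKDAY_ID", "MONTH_ID", "COUNTRY"]

plain = cuboid_count(all_dims)
with_joint = cuboid_count(all_dims, joint)
with_extended = cuboid_count(extended_dims)
print(plain, with_joint, with_extended)
```

In this model the joint setting and the extended-column design give the same
number of combinations; the remaining difference is the extra text column in
the rowkey.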

2016-11-30 19:00 GMT+08:00 Alberto Ramón :

> This will help you:
> http://kylin.apache.org/docs/howto/howto_optimize_cubes.html
>
> The idea is always: how can I reduce the number of dimensions?
> If you reduce the dimensions, the time and resources needed to build the
> cube, and its final size, decrease, which is good.
>
> An example can be DIM_Persons: Id_Person, Name, Surname, Address, ...
> Id_Person can be the Host Column, and the other columns, which can be
> looked up from the ID, are Extended Columns.
>
>
>
>
> 2016-11-30 11:35 GMT+01:00 仇同心 :
>
> > Hi all,
> > I don’t understand the usage scenarios of EXTENDED_COLUMN, although I saw
> > this issue: “https://issues.apache.org/jira/browse/KYLIN-1313”.
> > What do the “Host Column” and “Extended Column” parameters mean?
> > Why use this expression, and what aspect of optimization does it
> > address?
> > Could you explain it with a SQL statement?
> >
> >
> > Thanks~
> >
>



-- 
With Warm regards

Yiming Liu (刘一鸣)