Re: Consulting "EXTENDED_COLUMN"
Kylin encodes dimension values with Dictionary (the default encoding) or another encoding method when composing the rowkey, so the overhead will be smaller in most cases.

2016-12-02 17:59 GMT+08:00 Alberto Ramón :
> Yes, I will assume this overhead in the rowkey.

--
Best regards,

Shaofeng Shi 史少锋
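The point about dictionary encoding can be illustrated with a small model. This is only a sketch under stated assumptions: `build_dictionary` and `id_width_bytes` are hypothetical helpers written for this note, not Kylin APIs, and Kylin's real dictionary is more elaborate than a plain Python dict. It shows why a text dimension with few distinct values adds little to the rowkey once each value is replaced by a small fixed-width ID.

```python
def build_dictionary(values):
    """Map each distinct value to a small integer ID (sorted for determinism)."""
    return {v: i for i, v in enumerate(sorted(set(values)))}


def id_width_bytes(cardinality):
    """Bytes needed to store IDs 0..cardinality-1 as a fixed-width integer."""
    bits = max(1, (cardinality - 1).bit_length())
    return (bits + 7) // 8


# Hypothetical values for a WeekDayTxt-style dimension.
weekday_txt = ["Monday", "Tuesday", "Wednesday", "Thursday",
               "Friday", "Saturday", "Sunday"]

dictionary = build_dictionary(weekday_txt)

# Width of the longest raw value versus width of its dictionary ID.
raw_bytes = max(len(v.encode("utf-8")) for v in weekday_txt)
encoded_bytes = id_width_bytes(len(dictionary))

print("raw:", raw_bytes, "bytes; encoded:", encoded_bytes, "byte(s)")
```

In this toy model the text column costs one byte per row in the key instead of up to nine, which is the sense in which the overhead of the extra dimension shrinks.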
Re: Consulting "EXTENDED_COLUMN"
Yes, I will assume this overhead in the rowkey.

2016-12-02 9:58 GMT+01:00 Billy(Yiming) Liu :
> Using Joint Dimension for your 1:1 relation is the right design.
Re: Consulting "EXTENDED_COLUMN"
Using Joint Dimension for your 1:1 relation is the right design.

2016-12-02 0:21 GMT+08:00 Alberto Ramón :
> Nice, Liu
>
> We have some cases like
> DayWeekTXT, DayWeekID
> MonthTXT, MonthID
>
> Small proposal:
> It would be interesting to be able to create a Derived dimension with a
> 1:1 relation, with support for filters and Group by.

--
With Warm regards

Yiming Liu (刘一鸣)
Re: Consulting "EXTENDED_COLUMN"
Nice, Liu

We have some cases like
DayWeekTXT, DayWeekID
MonthTXT, MonthID

Small proposal:
It would be interesting to be able to create a Derived dimension with a 1:1 relation, with support for filters and Group by.

2016-12-01 11:55 GMT+01:00 Billy(Yiming) Liu :
> The cost of a joint dimension compared with an extended column is that you
> have more columns in the HBase rowkey, which may harm query performance.
> Most of the time, though, a joint dimension is still recommended, since a
> normal dimension column supports many more functions than an extended
> column, such as count(*).
Re: Consulting "EXTENDED_COLUMN"
The cost of a joint dimension compared with an extended column is that you have more columns in the HBase rowkey, which may harm query performance. Most of the time, though, a joint dimension is still recommended, since a normal dimension column supports many more functions than an extended column, such as count(*).

2016-12-01 17:07 GMT+08:00 Alberto Ramón :
> Hello
> I was preparing an email with related doubts:
>
> Sometimes we have derived dimensions with a 1:1 relation, for example:
> WeekDayID & WeekDayTxt
> MonthID & MonthTxt
>
> SOL1: Derived. ID as Host and Txt as Extended
> PB: You can't filter / Group by Txt
>
> SOL2: Joint. Define tuples of ID & TXT
> Any PB/limitation? (I need to test this option)

--
With Warm regards

Yiming Liu (刘一鸣)
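The trade-off discussed above can be made concrete by counting dimension combinations. The following is a minimal sketch, assuming the usual rule that dimensions in a joint group appear in a cuboid either all together or not at all. The `cuboids` function and the `REGION`, `WEEKDAY_ID` and `WEEKDAY_TXT` names are hypothetical, and the count includes the empty combination, so it is a model of the combinatorics rather than of Kylin's actual cuboid scheduler.

```python
from itertools import combinations


def cuboids(dimensions, joint_groups=()):
    """Enumerate dimension combinations, treating each joint group as one unit."""
    grouped = {d for group in joint_groups for d in group}
    # Each joint group is one unit; every other dimension is its own unit.
    units = [tuple(group) for group in joint_groups]
    units += [(d,) for d in dimensions if d not in grouped]
    result = []
    for size in range(len(units) + 1):
        for combo in combinations(units, size):
            result.append(tuple(d for unit in combo for d in unit))
    return result


dims = ["REGION", "WEEKDAY_ID", "WEEKDAY_TXT"]

independent = cuboids(dims)
joint = cuboids(dims, joint_groups=[("WEEKDAY_ID", "WEEKDAY_TXT")])

print(len(independent), "combinations with three independent dimensions")
print(len(joint), "combinations with the ID/TXT pair declared joint")
```

Declaring the 1:1 pair as joint gives the same number of combinations as having a single dimension in its place, which matches the remark that the build cost is close to the extended-column design. What remains is the wider rowkey, since both columns are still stored in it.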
Re: Consulting "EXTENDED_COLUMN"
Hello
I was preparing an email with related doubts:

Sometimes we have derived dimensions with a 1:1 relation, for example:
WeekDayID & WeekDayTxt
MonthID & MonthTxt

SOL1: Derived. ID as Host and Txt as Extended
PB: You can't filter / Group by Txt

SOL2: Joint. Define tuples of ID & TXT
Any PB/limitation? (I need to test this option)

2016-12-01 0:35 GMT+01:00 Billy(Yiming) Liu :
> Thanks, Alberto. The explanation is accurate. EXTENDED_COLUMN is used only
> for representation, not for filtering or grouping, which are done by
> HOST_COLUMN. So EXTENDED_COLUMN is not a dimension; it works like a
> key/value map against the HOST_COLUMN.
>
> If the values in EXTENDED_COLUMN are not long, you could just define two
> dimensions with the joint dimension setting. It has almost the same
> performance impact as EXTENDED_COLUMN, which reduces one dimension, but is
> easier to understand.
Re: Consulting "EXTENDED_COLUMN"
Thanks, Alberto. The explanation is accurate. EXTENDED_COLUMN is used only for representation, not for filtering or grouping, which are done by HOST_COLUMN. So EXTENDED_COLUMN is not a dimension; it works like a key/value map against the HOST_COLUMN.

If the values in EXTENDED_COLUMN are not long, you could just define two dimensions with the joint dimension setting. It has almost the same performance impact as EXTENDED_COLUMN, which reduces one dimension, but is easier to understand.

2016-11-30 19:00 GMT+08:00 Alberto Ramón :
> This will help you:
> http://kylin.apache.org/docs/howto/howto_optimize_cubes.html
>
> The idea is always: how can I reduce the number of dimensions?
> If you reduce the dimensions, the time and resources needed to build the
> cube and its final size decrease --> that's good.
>
> An example can be DIM_Persons: Id_Person, Name, Surname, Address, .
> Id_Person can be the Host Column,
> and the other columns can be calculated from the ID --> they are Extended
> Columns.
>
> 2016-11-30 11:35 GMT+01:00 仇同心 :
>
> > Hi all,
> > I don't understand the usage scenarios of EXTENDED_COLUMN, although I saw
> > this article “https://issues.apache.org/jira/browse/KYLIN-1313”.
> > What do the parameters “Host Column” and “Extended Column” mean?
> > Why use this expression, and which aspects of optimization does it
> > address?
> > Could it be explained together with a SQL statement?
> >
> > Thanks~

--
With Warm regards

Yiming Liu (刘一鸣)
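The "key/value map against the HOST_COLUMN" description can be sketched in a few lines. This is an illustrative model only, not how Kylin stores or evaluates EXTENDED_COLUMN internally. The rows, the `query` function and the `host_filter` parameter are hypothetical. Grouping and filtering operate on the host value alone, and the extended value is looked up afterwards for display.

```python
from collections import defaultdict

# Hypothetical fact rows: (weekday_id, weekday_txt, amount)
rows = [
    (1, "Monday", 10.0),
    (2, "Tuesday", 5.0),
    (1, "Monday", 7.5),
    (3, "Wednesday", 2.0),
]

totals = defaultdict(float)  # aggregated by the host column only
extended = {}                # host value -> extended value (the key/value map)

for weekday_id, weekday_txt, amount in rows:
    totals[weekday_id] += amount
    extended[weekday_id] = weekday_txt


def query(host_filter=None):
    """Group by the host column, then attach the extended value for display."""
    return [
        (host, extended[host], total)
        for host, total in sorted(totals.items())
        if host_filter is None or host in host_filter
    ]


print(query())
print(query(host_filter={1}))
```

Nothing in this model is keyed by the text value, which mirrors the limitation raised in the thread: with the extended-column design you can show the text next to each ID, but filtering or grouping has to go through the ID.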