Re: why does MetadataCleanupJob set NEW_RESOURCE_THREADSHOLD_MS=12h limit?

2018-10-30 Thread Alberto Ramón
I think this is done (but undocumented):

KYLIN-2602 (v2.1) adds a job threshold.

I used it, and it works fine.
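The rule being discussed can be sketched as follows (hypothetical helper names; the real logic lives in Kylin's MetadataCleanupJob): a resource is only a cleanup candidate once it is older than the threshold, so metadata just written by a still-running build survives.

```python
import time

# The 12h guard discussed above, in milliseconds; the odd spelling
# NEW_RESOURCE_THREADSHOLD_MS mirrors the constant name in the subject.
NEW_RESOURCE_THRESHOLD_MS = 12 * 60 * 60 * 1000

def is_cleanup_candidate(resource_mtime_ms, now_ms, threshold_ms=NEW_RESOURCE_THRESHOLD_MS):
    """An unused resource is only removed once it is older than the threshold,
    so files just written by a still-running build are never deleted."""
    return now_ms - resource_mtime_ms > threshold_ms

hour_ms = 60 * 60 * 1000
now = int(time.time() * 1000)
print(is_cleanup_candidate(now - hour_ms, now))       # a 1h-old dict survives: False
print(is_cleanup_candidate(now - 48 * hour_ms, now))  # a 2-day-old one is removed: True
```

Making the threshold configurable, as suggested below, would just mean reading `threshold_ms` from configuration instead of a constant.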

On Tue, 30 Oct 2018 at 11:54, ShaoFeng Shi  wrote:

> That is just for safety, I think. We can make it configurable so that users
> can customize it.
>
> you Zhuang wrote on Tuesday, 30 October 2018 at 17:06:
>
>> When I want to clean up a repeatedly failing cube and then rebuild it, a dict
>> error always occurs in the build-dict step; then I clean up again and again, but the
>> dict
>>
>> metadata is never removed from the meta table. I dove into the source code and
>> discovered that MetadataCleanupJob only cleans up resources from more than 12h ago.
>>
>> I think all unused resources can be removed, excluding those of running builds.
>> But the 12h rule does not actually do that.
>
>
>
> --
> Best regards,
>
> Shaofeng Shi 史少锋
>
>


Re: Kylin as a data source supported by Grafana

2018-10-16 Thread Alberto Ramón
If your time column is at hour or day granularity, this use case is a good
fit for Apache Kylin.
If your column is a raw timestamp, it is not the best scenario for Apache Kylin.

This means that, in the best scenario, in Grafana you will see values
grouped by hours.
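To illustrate the point (a toy sketch, not Kylin code): truncating raw timestamps to the hour turns high-cardinality data into the aggregated, low-cardinality shape that a Kylin cube, and a Grafana panel on top of it, handles well.

```python
from collections import defaultdict
from datetime import datetime

# Toy events with raw timestamps (the case that does NOT suit Kylin well).
events = [
    ("2018-10-16 13:20:05", 3),
    ("2018-10-16 13:45:59", 2),
    ("2018-10-16 14:01:00", 7),
]

# Truncating to the hour gives the grouped values a cube would serve.
per_hour = defaultdict(int)
for ts, value in events:
    hour = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S").strftime("%Y-%m-%d %H:00")
    per_hour[hour] += value

print(dict(per_hour))
# {'2018-10-16 13:00': 5, '2018-10-16 14:00': 7}
```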

On Tue, 16 Oct 2018 at 13:20, 潘博存  wrote:

>
>
>-
>1.Grafana is time-based and needs to wrap the time columns, but that 
> doesn't mean that grafana's data sources are all sequential databases, just 
> as grafana supports MySQL and SQL Server。
>-
>2.In our business scenario, we put more emphasis on Grafan's external 
> presentation capabilities, and in terms of timelines we use our business 
> dates by day, hour, etc.
>
>
>
> So I think grafana + kylin is another form of presentation besides saiku, 
> tableup, and so on. In fact, we're trying to put saiku as a grafan's layout 
> plug-in into grafan for data presentation
>
>
>
>
>
> --
> From: Alberto Ramón 
> Sent: Tuesday, 16 October 2018, 17:31
> To: user 
> Cc: 潘博存 ; dev 
> Subject: Re: Kylin as a data source supported by Grafana
>
> I checked this possibility some time ago (2-3 years).
> Grafana is focused on time-series (one column must be a timestamp).
> Working with raw timestamps doesn't make much sense in Apache Kylin, because
> you are not aggregating.
>
> On Tue, 16 Oct 2018 at 06:04, ShaoFeng Shi  wrote:
> Good question, let me translate it to English:
>
> Grafana is one of our important data visualization tools; Kylin is a
> powerful tool for big data query, we want to display Kylin data on Grafana,
> is there anyone already running this solution? Is there a grafana-kylin
> plugin that can be used directly? Currently, Grafana doesn't provide a
> plugin for Kylin.
>
> 潘博存 wrote on Tuesday, 16 October 2018 at 11:27:
>
> hi,all
> For big data visualization, Grafana is one of our important presentation tools, and Kylin's
> fast queries are a powerful tool for big data. We want to display Kylin data on Grafana;
> has anyone used it this way? Is there a grafana-kylin plugin that can be used directly?
> Currently Grafana has no plugin for Kylin.
>
>
> --
> Best regards,
>
> Shaofeng Shi 史少锋
>
>
>


Re: Kylin as a data source supported by Grafana

2018-10-16 Thread Alberto Ramón
I checked this possibility some time ago (2-3 years).
Grafana is focused on time-series (one column must be a timestamp).
Working with raw timestamps doesn't make much sense in Apache Kylin, because
you are not aggregating.

On Tue, 16 Oct 2018 at 06:04, ShaoFeng Shi  wrote:

> Good question, let me translate it to English:
>
> Grafana is one of our important data visualization tools; Kylin is a
> powerful tool for big data query, we want to display Kylin data on Grafana,
> is there anyone already running this solution? Is there a grafana-kylin
> plugin that can be used directly? Currently, Grafana doesn't provide a
> plugin for Kylin.
>
> 潘博存 wrote on Tuesday, 16 October 2018 at 11:27:
>
>>
>> hi,all
>> For big data visualization, Grafana is one of our important presentation tools, and Kylin's
>> fast queries are a powerful tool for big data. We want to display Kylin data on Grafana;
>> has anyone used it this way? Is there a grafana-kylin plugin that can be used directly?
>> Currently Grafana has no plugin for Kylin.
>>
>
>
> --
> Best regards,
>
> Shaofeng Shi 史少锋
>
>


Re: Can I build a hierarchy aggregation group with joint dimensions

2018-09-20 Thread Alberto Ramón
https://issues.apache.org/jira/browse/KYLIN-2149

On Thu, 20 Sep 2018 at 05:52, you Zhuang  wrote:

> Example: (aid,aname),(bid,bname),(cid,cname).  The three joint dimensions
> are also hierarchical .


Re: #3 Step Name: Extract Fact Table Distinct Columns (slow)

2018-03-14 Thread Alberto Ramón
You can monitor your YARN in step 3.
In any case, step 3 samples the flat table to estimate the number of distinct
keys for each dimension.
If this step takes a lot of time, you will need to review your cube design.

Alb
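A toy illustration of what the sampling step does (Kylin's actual implementation uses HyperLogLog counters inside a MapReduce job; the column names and cardinalities here are made up):

```python
import random

random.seed(7)  # deterministic toy data

# Toy flat table: three dimension columns with different cardinalities.
rows = [(random.randint(0, 9), random.randint(0, 99), random.randint(0, 999))
        for _ in range(100_000)]

# Sample the table and count distinct values per dimension column;
# this is the per-dimension cardinality estimate step 3 produces.
sample = random.sample(rows, 5_000)
estimates = {name: len({row[i] for row in sample})
             for i, name in enumerate(["dim_a", "dim_b", "dim_c"])}
print(estimates)
```

If this step is slow it is usually because the flat table itself is huge or the cluster queue is starved, which is why the advice above is to check YARN first and then the cube design.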

On 14 March 2018 at 16:54, Sonny Heer  wrote:

> 8 YARN nodes with 11 slots each.  each slot is configured to ~2gb.  Step
> #3 in Kylin is launching 19 mappers and 5 reducers.  5 reducers when there
> are 88 slots.
>
> btw: kylin version is 1.6
>
> On Wed, Mar 14, 2018 at 9:48 AM, Sonny Heer  wrote:
>
>> YARN is properly configured.  we use many other m/r and spark programs
>> that utilize the full slots.  It's only when building cubes.
>>
>> On Wed, Mar 14, 2018 at 9:46 AM, Alberto Ramón > > wrote:
>>
>>> You need to check your YARN configuration first
>>>
>>> On Wed, 14 Mar 2018, 14:58 Sonny Heer,  wrote:
>>>
>>>> Step 3 isn't using our full cluster.  How can i increase the
>>>> mappers/reducers to use all the slots?  Any config to look at in kylin?
>>>>
>>>> Thanks
>>>>
>>>
>>
>


Re: #3 Step Name: Extract Fact Table Distinct Columns (slow)

2018-03-14 Thread Alberto Ramón
You need to check your YARN configuration first

On Wed, 14 Mar 2018, 14:58 Sonny Heer,  wrote:

> Step 3 isn't using our full cluster.  How can i increase the
> mappers/reducers to use all the slots?  Any config to look at in kylin?
>
> Thanks
>


Re: RAW Measure kylin 2.3

2018-03-06 Thread Alberto Ramón
From this mailing list: questions about 'RAW' measures

MailList thread: KYLIN-3062 (v2.3) proposes to disable RAW in the UI.

On 6 Mar 2018 1:38 p.m., "deva namaste"  wrote:

> Hello,
>
> I do not see RAW measure after I upgraded to kylin version 2.3.
>
> Any other alternative measure we should use to show the raw data as is?
> (Instead of RAW measure, any other alternative which can be used?)
>
> Thanks
> Deva
>


Re: Get daily average for periodic readings

2018-03-01 Thread Alberto Ramón
You can't partition your cube per week; it must be per yyyy-MM-dd.

You can perform your own test, calculating with year as a dimension and the
yearly value as a sum of days.

On 1 Mar 2018 3:50 p.m., "deva namaste"  wrote:

> Hi Alberto,
>
> when I was saying 6 vs 365, it's for one item; for 20 million items it will
> multiply by a lot. Do you think it won't make much difference?
> Also, what is YY-MM-WW? So I can explain. Basically I need the same
> avg() for week, month, year, etc.
>
> Thanks
> Deva
>
> On Thu, Mar 1, 2018 at 8:42 AM, Alberto Ramón 
> wrote:
>
>> - 95% of the response time is latency (= there is no difference
>> between summing one int or 365; I thought the same when I started with Kylin)
>> - The YY-MM-WW partition format is not implemented, but it would be nice if
>> you could contribute it
>>
>> Alb
>>
>> On 28 February 2018 at 22:59, deva namaste  wrote:
>>
>>> I was thinking of saving only 6 records in Kylin instead of splitting
>>> them outside into daily averages and adding 365 records for each item. So is
>>> there any way I can achieve this at the SQL level in Kylin, or change the
>>> model to accommodate the above? Please advise. Thanks
>>>
>>> On Wed, Feb 28, 2018 at 5:51 PM, Alberto Ramón <
>>> a.ramonporto...@gmail.com> wrote:
>>>
>>>> Sounds like:
>>>> - your minimum granularity for queries is weeks, so your fact table
>>>> needs to be on weeks (or finer, like days)
>>>> - you will need to expand your actual fact table to weeks (or finer, days),
>>>> for example using a Hive view
>>>> - as extra: Kylin can't use partition date columns on weeks; the
>>>> minimum is days
>>>>
>>>> Alb
>>>>
>>>> On 28 February 2018 at 21:51, deva namaste  wrote:
>>>>
>>>>> Hello,
>>>>>
>>>>> How would I calculate value for a week while I have bi-monthly values.
>>>>>
>>>>> e.g. Here is my data looks like -
>>>>>
>>>>> Date   -  Value
>>>>> 01/18/2017 -  100
>>>>> 03/27/2017 -  130  (68 Days)
>>>>> 05/17/2017 -  102  (51 Days)
>>>>>
>>>>> I need average value per week, as below. Lets consider between 03/27
>>>>> and 05/17. So total days between period are 51. so Daily average would be
>>>>> 102/51= 2.04
>>>>>
>>>>> Week4 (Starting March 26, #days = 4) = (4 x 2.04) = 8.16
>>>>> Week1 (Starting Apr 2, #days = 7) = 14.28
>>>>> Week2 (starting Apr 9, #days = 7)= 14.28
>>>>> Week3 (starting Apr 16, #days = 7)= 14.28
>>>>> Week4 (starting Apr 23, #days = 7)= 14.28
>>>>> week5 (Starting Apr 30, #days =7)= 14.28
>>>>> week1 (starting May 7, #days = 7)= 14.28
>>>>> Week2 (starting May 14, #days = 4)= 8.16
>>>>>
>>>>> But as you see that period from 01/18 to 03/27, have 68 days and daily
>>>>> average would be 130/68=1.91
>>>>>
>>>>> So really to get complete week I need 3 days from 130 value and 4 days
>>>>> from 102 value.
>>>>>
>>>>> So real total for that first week would be -
>>>>> Week4 (Starting March 26, #days = 4) = (4x2.04=8.16) + (3x1.91=5.73) =
>>>>> 13.89
>>>>>
>>>>> How would I achieve this in Kylin? Any function? or other method I can
>>>>> use?
>>>>> Just for 6 records for year, I dont want to populate daily records.
>>>>> Thanks
>>>>> Deva
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>


Re: Get daily average for periodic readings

2018-03-01 Thread Alberto Ramón
- 95% of the response time is latency (= there is no difference between
summing one int or 365; I thought the same when I started with Kylin)
- The YY-MM-WW partition format is not implemented, but it would be nice if
you could contribute it

Alb

On 28 February 2018 at 22:59, deva namaste  wrote:

> I was thinking of saving only 6 records in Kylin instead of splitting them
> outside into daily averages and adding 365 records for each item. So is there
> any way I can achieve this at the SQL level in Kylin, or change the model to
> accommodate the above? Please advise. Thanks
>
> On Wed, Feb 28, 2018 at 5:51 PM, Alberto Ramón 
> wrote:
>
>> Sounds like:
>> - your minimum granularity for queries is weeks, so your fact table needs
>> to be on weeks (or finer, like days)
>> - you will need to expand your actual fact table to weeks (or finer, days),
>> for example using a Hive view
>> - as extra: Kylin can't use partition date columns on weeks; the
>> minimum is days
>>
>> Alb
>>
>> On 28 February 2018 at 21:51, deva namaste  wrote:
>>
>>> Hello,
>>>
>>> How would I calculate value for a week while I have bi-monthly values.
>>>
>>> e.g. Here is my data looks like -
>>>
>>> Date   -  Value
>>> 01/18/2017 -  100
>>> 03/27/2017 -  130  (68 Days)
>>> 05/17/2017 -  102  (51 Days)
>>>
>>> I need average value per week, as below. Lets consider between 03/27 and
>>> 05/17. So total days between period are 51. so Daily average would be
>>> 102/51= 2.04
>>>
>>> Week4 (Starting March 26, #days = 4) = (4 x 2.04) = 8.16
>>> Week1 (Starting Apr 2, #days = 7) = 14.28
>>> Week2 (starting Apr 9, #days = 7)= 14.28
>>> Week3 (starting Apr 16, #days = 7)= 14.28
>>> Week4 (starting Apr 23, #days = 7)= 14.28
>>> week5 (Starting Apr 30, #days =7)= 14.28
>>> week1 (starting May 7, #days = 7)= 14.28
>>> Week2 (starting May 14, #days = 4)= 8.16
>>>
>>> But as you see that period from 01/18 to 03/27, have 68 days and daily
>>> average would be 130/68=1.91
>>>
>>> So really to get complete week I need 3 days from 130 value and 4 days
>>> from 102 value.
>>>
>>> So real total for that first week would be -
>>> Week4 (Starting March 26, #days = 4) = (4x2.04=8.16) + (3x1.91=5.73) =
>>> 13.89
>>>
>>> How would I achieve this in Kylin? Any function? or other method I can
>>> use?
>>> Just for 6 records for year, I dont want to populate daily records.
>>> Thanks
>>> Deva
>>>
>>>
>>>
>>
>


Re: Questions about 'RAW' measure

2018-03-01 Thread Alberto Ramón
MailList
<http://apache-kylin.74782.x6.nabble.com/Discuss-Disable-hide-RAW-measure-in-Kylin-web-GUI-tp6636.html>:
KYLIN-3062 <https://issues.apache.org/jira/browse/KYLIN-3062> (v2.3) proposes
to disable RAW in the UI.

Nowadays you can't control whether the flat-table creation step is executed;
there is a proposal, KYLIN-2532
<https://issues.apache.org/jira/browse/KYLIN-2532?focusedCommentId=15956535&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-15956535>
(v2.1).



On 1 March 2018 at 08:30, BELLIER Jean-luc 
wrote:

> Hello Alberto,
>
>
>
> Thank you for your answer. I will look further for this mistake on the
> cube building.
>
>
>
> Concerning the RAW measure, are you referring to this discussion  ?
>
> I still can see this option on measures section on Kylin 2.2, that is why
> it kept my attention.
>
> Does it mean that to access raw data, we need to first use an aggregated
> measure ? My final users mainly use raw data (e.g. slicing), so I want to
> be sure on that.
>
>
>
> What about building cubes using only a table of facts with all the data
> inside ? Is it a conceivable way of doing (in terms of space storage,
> efficiency) or is it preferable to use separate tables foe dimensions and
> why ?
>
>
>
> Thank you in advance for your help.
>
> Have a good day.
>
>
>
> Best regards,
>
> Jean-Luc.
>
>
>
> *De :* Alberto Ramón [mailto:a.ramonporto...@gmail.com]
> *Envoyé :* mercredi 28 février 2018 19:04
> *À :* user 
> *Objet :* Re: Questions about 'RAW' measure
>
>
>
> Hello
>
> - RAW format are deprecated. You will find the thread in this MailList
> - "Job hasn't been submitted after" sound a configuration problem with
> your YARN, please find it on Google and review your CPU and RAM resources
>
>
>
> On 28 February 2018 at 16:44, BELLIER Jean-luc <
> jean-luc.bell...@rte-france.com> wrote:
>
> Hello
>
>
>
> I discovered that there was a RAW measure to get raw data instead of
> aggregated data (http://kylin.apache.org/blog/2016/05/29/raw-measure-in-
> kylin/)
>
>
>
> My assumption is that these raw data are stored in HBase, as aggregated
> data are, i.e. these data are duplicated from Hive into HBase.
>
> So my question is : are there limitations on the data volume ? My fact
> tables contain billions of rows and we need to get detailed information
> from them. So what are the restrictions, and also the benefits related to
> querying directly the data into Hive ?
>
>
>
> I have another question: I tested creating a model directly from
> a facts table containing raw data, in order to make a feasibility test
> and avoid transformations (the table is a CSV file provided by an external
> team). As a first step I wanted to avoid creating files for the
> corresponding dimensions, and to generate a “clean” facts table having foreign
> keys corresponding to the primary keys of dimension tables.
>
> The creation of the model was OK.
>
> However the cube generation failed at first step, and I got this message :
>
>
>
> INFO  : Query ID = hive_20180228120101_6990f9d4-
> 182d-4dd9-b319-fce02caf75ef
>
> INFO  : Total jobs = 3
>
> INFO  : Launching Job 1 out of 3
>
> INFO  : Starting task [Stage-1:MAPRED] in serial mode
>
> INFO  : In order to change the average load for a reducer (in bytes):
>
> INFO  :   set hive.exec.reducers.bytes.per.reducer=
>
> INFO  : In order to limit the maximum number of reducers:
>
> INFO  :   set hive.exec.reducers.max=
>
> INFO  : In order to set a constant number of reducers:
>
> INFO  :   set mapreduce.job.reduces=
>
> INFO  : Starting Spark Job = 3556ecc6-2609-4085-bcca-b1b81fa9855c
>
> ERROR : Job hasn't been submitted after 61s. Aborting it.
>
>
>
> How can I proceed to avoid this? Are there Kylin parameters (or others)
> to adjust?
>
>
>
> Thank you in advance for your help. Have a good day.
>
> Best regards,
>
> Jean-Luc
>
>
>
>
>
>
>
>
>
> "Ce message est destiné exclusivement aux personnes ou entités auxquelles
> il est adressé et peut contenir des informations privilégiées ou
> confidentielles. Si vous avez reçu ce document par erreur, merci de nous
> l'indiquer par retour, de ne pas le transmettre et de procéder à sa
> destruction.
>
> This message is solely intended for the use of the individual or entity to
> which it is addressed and may contain information that is privileged or
> confidential. If you have received this communication by error, please
> notify us immediately by electronic mail, do not disclose it and delete the
> original message."

Re: Get daily average for periodic readings

2018-02-28 Thread Alberto Ramón
Sounds like:
- your minimum granularity for queries are on Weeks, your fact table need
be on weeks (or less, like days)
- you will need expand you actual fact table to weeks (or more, days)
Example use a hive view
- as extra:  Kylin can't use partition format columns on weeks, the minimum
es days

Alb

On 28 February 2018 at 21:51, deva namaste  wrote:

> Hello,
>
> How would I calculate value for a week while I have bi-monthly values.
>
> e.g. Here is my data looks like -
>
> Date   -  Value
> 01/18/2017 -  100
> 03/27/2017 -  130  (68 Days)
> 05/17/2017 -  102  (51 Days)
>
> I need average value per week, as below. Lets consider between 03/27 and
> 05/17. So total days between period are 51. so Daily average would be
> 102/51= 2.04
>
> Week4 (Starting March 26, #days = 4) = (4 x 2.04) = 8.16
> Week1 (Starting Apr 2, #days = 7) = 14.28
> Week2 (starting Apr 9, #days = 7)= 14.28
> Week3 (starting Apr 16, #days = 7)= 14.28
> Week4 (starting Apr 23, #days = 7)= 14.28
> week5 (Starting Apr 30, #days =7)= 14.28
> week1 (starting May 7, #days = 7)= 14.28
> Week2 (starting May 14, #days = 4)= 8.16
>
> But as you see that period from 01/18 to 03/27, have 68 days and daily
> average would be 130/68=1.91
>
> So really to get complete week I need 3 days from 130 value and 4 days
> from 102 value.
>
> So real total for that first week would be -
> Week4 (Starting March 26, #days = 4) = (4x2.04=8.16) + (3x1.91=5.73) =
> 13.89
>
> How would I achieve this in Kylin? Any function? or other method I can
> use?
> Just for 6 records for year, I dont want to populate daily records.
> Thanks
> Deva
>
>
>
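The proration described in the thread above — spreading each reading evenly over the days it covers, then summing days into weeks — is plain arithmetic that would be done upstream of Kylin (for example in a Hive view). A quick sketch, under the convention that a reading covers the days up to and including its date (note 102/51 is exactly 2.0, and the week starting March 26 then picks up 2 days from the first period and 5 from the second, so the exact total differs slightly from the hand-rounded figures in the email):

```python
from datetime import date, timedelta

# Bi-monthly readings: each value covers the days since the previous reading.
readings = [(date(2017, 1, 18), 100), (date(2017, 3, 27), 130), (date(2017, 5, 17), 102)]

# Spread each reading evenly over the days it covers.
daily = {}
for (prev_d, _), (cur_d, value) in zip(readings, readings[1:]):
    days = (cur_d - prev_d).days          # 68 days, then 51 days
    for i in range(1, days + 1):
        daily[prev_d + timedelta(days=i)] = value / days

# Sum the per-day averages for the week starting Sunday, March 26.
week_start = date(2017, 3, 26)
week_total = sum(daily.get(week_start + timedelta(days=i), 0) for i in range(7))
print(round(week_total, 2))  # 13.82: 2 days at 130/68 plus 5 days at 102/51
```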


Re: Questions about 'RAW' measure

2018-02-28 Thread Alberto Ramón
Hello

- RAW format are deprecated. You will find the thread in this MailList
- "Job hasn't been submitted after" sound a configuration problem with your
YARN, please find it on Google and review your CPU and RAM resources

On 28 February 2018 at 16:44, BELLIER Jean-luc <
jean-luc.bell...@rte-france.com> wrote:

> Hello
>
>
>
> I discovered that there was a RAW measure to get raw data instead of
> aggregated data (http://kylin.apache.org/blog/2016/05/29/raw-measure-in-
> kylin/)
>
>
>
> My assumption is that these raw data are stored in HBase, as aggregated
> data are, i.e. these data are duplicated from Hive into HBase.
>
> So my question is : are there limitations on the data volume ? My fact
> tables contain billions of rows and we need to get detailed information
> from them. So what are the restrictions, and also the benefits related to
> querying directly the data into Hive ?
>
>
>
> I have another question: I tested creating a model directly from
> a facts table containing raw data, in order to make a feasibility test
> and avoid transformations (the table is a CSV file provided by an external
> team). As a first step I wanted to avoid creating files for the
> corresponding dimensions, and to generate a “clean” facts table having foreign
> keys corresponding to the primary keys of dimension tables.
>
> The creation of the model was OK.
>
> However the cube generation failed at first step, and I got this message :
>
>
>
> INFO  : Query ID = hive_20180228120101_6990f9d4-
> 182d-4dd9-b319-fce02caf75ef
>
> INFO  : Total jobs = 3
>
> INFO  : Launching Job 1 out of 3
>
> INFO  : Starting task [Stage-1:MAPRED] in serial mode
>
> INFO  : In order to change the average load for a reducer (in bytes):
>
> INFO  :   set hive.exec.reducers.bytes.per.reducer=
>
> INFO  : In order to limit the maximum number of reducers:
>
> INFO  :   set hive.exec.reducers.max=
>
> INFO  : In order to set a constant number of reducers:
>
> INFO  :   set mapreduce.job.reduces=
>
> INFO  : Starting Spark Job = 3556ecc6-2609-4085-bcca-b1b81fa9855c
>
> ERROR : Job hasn't been submitted after 61s. Aborting it.
>
>
>
> How can I proceed to avoid this? Are there Kylin parameters (or others)
> to adjust?
>
>
>
> Thank you in advance for your help. Have a good day.
>
> Best regards,
>
> Jean-Luc
>
>
>
>
>
>
>
>


RE: Optimize Cube Build process

2018-02-01 Thread Alberto Ramón
How many processes are you running in parallel in the build cube step?

On 1 Feb 2018 7:39 a.m., "Kumar, Manoj H" 
wrote:

> We have 15 nodes & each node has 8 cores. RAM = 256 MB.
>
>
>
> I don’t think memory is an issue here.
>
>
>
> Regards,
>
> Manoj
>
>
>
> *From:* Alberto Ramón [mailto:a.ramonporto...@gmail.com]
> *Sent:* Thursday, February 01, 2018 1:33 AM
> *To:* user 
> *Subject:* Re: Optimize Cube Build process
>
>
>
> How many nodes do you have?
>
> how many RAM and CPU do you have per node?
>
>
>
> On 31 January 2018 at 05:07, Kumar, Manoj H 
> wrote:
>
> It has close to 68 mappers & 500 reducers. It keeps running on this. Please
> advise.
>
> [image: cid:image001.png@01D39B5D.E7ED9FD0]
>
>
>
> Regards,
>
> Manoj
>
>
>
> *From:* Kumar, Manoj H
> *Sent:* Wednesday, January 31, 2018 9:24 AM
> *To:* 'user@kylin.apache.org' 
> *Subject:* Optimize Cube Build process
>
>
>
> Hi Folks – I have close to 33 million rows of fact data to be processed. The
> data has a lot of unique/distinct values, such as Loan_unique_code,
> Facility_code, card_id. Dimension lookups are made of these.
>
>
>
> Fact table – 33 million rows
>
> Lookup tables have 3 to 4 million rows
>
> Cube build type I have chosen – inmem
>
> Engine – MapReduce
>
>
>
> The cube build step is taking 90 minutes, which seems to be high. What can I
> do to minimize the build time? Which parameters should I tweak so that the
> build time gets reduced? Thanks.
>
>
>
>
>
> I have followed the steps given below, but it doesn’t help in this
> case
>
>
>
> http://kylin.apache.org/docs21/howto/howto_optimize_build.html
>
>
>
>
>
> Regards,
>
> Manoj
>
>
>
> This message is confidential and subject to terms at: http://
> www.jpmorgan.com/emaildisclaimer including on confidentiality, legal
> privilege, viruses and monitoring of electronic messages. If you are not
> the intended recipient, please delete this message and notify the sender
> immediately. Any unauthorized use is strictly prohibited.
>
>


Re: Optimize Cube Build process

2018-01-31 Thread Alberto Ramón
How many nodes do you have?
how many RAM and CPU do you have per node?

On 31 January 2018 at 05:07, Kumar, Manoj H 
wrote:

> It has close to 68 mappers & 500 reducers. It keeps running on this. Please
> advise.
>
>
>
> Regards,
>
> Manoj
>
>
>
> *From:* Kumar, Manoj H
> *Sent:* Wednesday, January 31, 2018 9:24 AM
> *To:* 'user@kylin.apache.org' 
> *Subject:* Optimize Cube Build process
>
>
>
> Hi Folks – I have close to 33 million rows of fact data to be processed. The
> data has a lot of unique/distinct values, such as Loan_unique_code,
> Facility_code, card_id. Dimension lookups are made of these.
>
>
>
> Fact table – 33 million rows
>
> Lookup tables have 3 to 4 million rows
>
> Cube build type I have chosen – inmem
>
> Engine – MapReduce
>
>
>
> The cube build step is taking 90 minutes, which seems to be high. What can I
> do to minimize the build time? Which parameters should I tweak so that the
> build time gets reduced? Thanks.
>
>
>
>
>
> I have followed the steps given below, but it doesn’t help in this
> case
>
>
>
> http://kylin.apache.org/docs21/howto/howto_optimize_build.html
>
>
>
>
>
> Regards,
>
> Manoj
>
>
>
>


Re: segment size estimate when merging

2018-01-27 Thread Alberto Ramón
Could this be related? KYLIN-2779
<https://issues.apache.org/jira/browse/KYLIN-2779> — this JIRA makes a lot of
sense

On 24 January 2018 at 13:43, ShaoFeng Shi  wrote:

> Hi Qilong,
>
> If seg A's estimated size is 10 GB, but the real size is 5 GB, then when we
> merge or build another segment, we can adjust the estimated size by dividing
> by 2. Then it should be closer to the real size.
>
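The correction ShaoFeng describes above can be sketched as a simple scaling factor (hypothetical function name; Kylin's real estimate comes from CubeStatsReader and the merge job):

```python
def corrected_estimate(new_estimate_gb, past_estimated_gb, past_actual_gb):
    """Scale a fresh statistics-based estimate by the ratio observed on
    already-built segments (e.g. estimated 10 GB but real size 5 GB -> x0.5)."""
    return new_estimate_gb * (past_actual_gb / past_estimated_gb)

# Segment A was estimated at 10 GB but landed at 5 GB, so a fresh 8 GB
# estimate is corrected down by the same factor of 2.
print(corrected_estimate(8.0, 10.0, 5.0))  # 4.0
```

This is also what tuning `kylin.cube.size-estimate-ratio` does by hand: it bakes a fixed correction factor into the estimate instead of deriving one from existing segments.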
> 2018-01-24 9:49 GMT+08:00 苏启龙 :
>
>> Many thanks, Shaofeng! We’ll check more on these parameters to see how to
>> make it better.
>>
>> BTW, what do you mean by the last line? In which way can I introduce
>> the actual size to help Kylin adjust the estimation? Currently I can
>> only use the max-regions parameter manually, but this is not convenient for
>> auto-merging.
>>
>> QIlong
>>
>> From: ShaoFeng Shi 
>> Reply-To: "user@kylin.apache.org" 
>> Date: Tuesday, 23 January 2018, 21:49
>>
>> To: user 
>> Cc: 林豪(linhao)-技术产品中心 
>> Subject: Re: segment size estimate when merging
>>
>> Hi Qilong,
>>
>> Does your cube have count-distinct or Top-N measure?
>>
>> If you observed that there are too many or too small hbase regions, you
>> can adjust some parameters:
>>
>> kylin.cube.size-estimate-ratio=0.25
>> kylin.cube.size-estimate-countdistinct-ratio=0.05
>>
>> The default ratio for common case is 0.25, you can set it to smaller if
>> the estimated size is bigger than actual size. These two parameters can be
>> set at Cube level.
>>
>> A better way is when doing merge, using the actual size of existing
>> segments to adjust the estimated size, then get a closer result.
>>
>> 2018-01-23 14:47 GMT+08:00 苏启龙 :
>>
>>> Hi shaofeng,
>>>
>>> Yes, it’s usually smaller than the sum of the segments, but usually by a
>>> small amount compared with the total size.
>>>
>>> But the statistics estimate usually results in a value N times larger than
>>> it actually is, which results in a huge waste of HBase regions.
>>>
>>>
>>>    1. Do you have any data about the deviation of the two ways in
>>>    statistics? I mean, generally, which way will be closer?
>>>    2. Is there any improvement plan for this on the roadmap? Or some
>>>    consideration of giving users more options to select their own estimation
>>>    algorithm?
>>>
>>>
>>> Thanks
>>>
>>> Qilong
>>>
>>> From: ShaoFeng Shi 
>>> Reply-To: "user@kylin.apache.org" 
>>> Date: Tuesday, 23 January 2018, 09:43
>>> To: user 
>>> Cc: 林豪(linhao)-技术产品中心 
>>> Subject: Re: segment size estimate when merging
>>>
>>> Hi Qilong,
>>>
>>> When merging segments, the dimension-measure values (k-v) will be
>>> re-organized and identical keys will be merged, so the merged size is not
>>> simply the sum of each segment; usually, it is smaller than before.
>>>
>>> Always using the statistics to estimate the size is for consistency. Of
>>> course, there is room to improve the estimation accuracy.
>>>
>>>
>>>
>>> 2018-01-22 16:54 GMT+08:00 苏启龙 :
>>>

 Hi,

 We have some unclear points about the segment size estimate when
 merging multi-segments.

 We find that the segment merge job still uses
 CubeStatsReader::getCuboidSizeMap to estimate the total size of the
 merged segment. From our understanding, when building a new segment, Kylin
 uses this way to estimate the total size is OK since no other info we can
 turn to. But in merging we may sum the table size of the segments to be
 merged, which should be more accurate.

 So what is the reason for this design?



 Su Qilong

>>>
>>>
>>>
>>> --
>>> Best regards,
>>>
>>> Shaofeng Shi 史少锋
>>>
>>>
>>
>>
>> --
>> Best regards,
>>
>> Shaofeng Shi 史少锋
>>
>>
>
>
> --
> Best regards,
>
> Shaofeng Shi 史少锋
>
>


Re: MDX queries on kylin cubes.

2018-01-24 Thread Alberto Ramón
You can check this question in older mailing list threads.

Keep in mind that Apache Kylin has been designed to use SQL as its query
language (it uses Apache Calcite for this).

Either way, if you want to use MDX:

Excel using Mondrian

Mondrian


On 24 January 2018 at 20:02, Db-Blog  wrote:

> Hi Prasanna & Team,
> Can you please suggest if you were able to access kylin cube using MDX
> queries?
>
> Thanks,
> Saurabh
>
> Sent from my iPhone, please avoid typos.
>
> On 17-Jan-2018, at 10:06 AM, Prasanna 
> wrote:
>
> Hi all,
>
>
>
>   I am using Kylin version 2.2.0. At present I use only SQL-type queries
> on Kylin cubes, like SELECT with aggregation functions. I would like to use
> MDX queries on cubes. If anybody is doing this, can you please guide me? Is
> any document available regarding this?
>
>
>
>
>
> Thanks,
>
> Prasanna.P
>
>


Re: #20 Step Name: Load HFile to HBase Table failed

2018-01-22 Thread Alberto Ramón
You created HFiles, but Kylin doesn't have permission to execute
CompleteBulkLoad.
It's a typical issue on this mailing list; check the permissions of the user
that starts the Kylin service.

On 22 January 2018 at 09:14, Neters  wrote:

> Hello guys:
>
> I have a problem when the job reaches '#20 Step Name: Load HFile
> to HBase Table';
> the log displays the following:
>
> Could you please advise me on a solution to check it out?
>
> The detailed kylin.log is attached; please check it.
>
> Thank you
>
> Best Regards
>


Re: Re: Kylin front-end query question

2018-01-21 Thread Alberto Ramón
Could you check these notes, too:

To use kylin.query.timeout-seconds, you will need Kylin 2.0
https://issues.apache.org/jira/browse/KYLIN-2847 (v2.4)
https://issues.apache.org/jira/browse/KYLIN-3157 (open)
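Roughly, a query timeout like kylin.query.timeout-seconds means the server stops waiting for (and aborts) a query once the deadline passes. A generic sketch of the pattern — not Kylin's actual implementation; true cancellation of the running work is what the JIRAs above track:

```python
import concurrent.futures
import time

def run_with_timeout(query_fn, timeout_s):
    """Run a query function but give up on its result after timeout_s seconds.
    The worker may keep running in the background; only the caller is freed."""
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    try:
        return pool.submit(query_fn).result(timeout=timeout_s)
    except concurrent.futures.TimeoutError:
        return "query aborted: exceeded timeout"
    finally:
        pool.shutdown(wait=False)

print(run_with_timeout(lambda: "42 rows", 1.0))                 # 42 rows
print(run_with_timeout(lambda: time.sleep(1) or "rows", 0.05))  # query aborted: exceeded timeout
```

The caveat in the comment is exactly the gap the thread discusses: abandoning a result is easy, but killing the underlying work (the HBase scan) is the hard part.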

2018-01-17 5:15 GMT+00:00 杨浩 :

> The config 'kylin.query.timeout-seconds' may help to stop long queries
>
> On 29 December 2017 at 15:13, chenping...@keruyun.com wrote:
>
>> Thanks for your prompt reply.
>>
>> --
>>
>> 陈平 (Chen Ping), DBA Engineer
>>
>>
>>
>> 成都时时客科技有限责任公司 (Chengdu Shishike Technology Co., Ltd.)
>>
>> Address: 3F, Building 1, 1268 Tianfu Avenue, High-tech Zone, Chengdu
>>
>> Postal code: 610041
>>
>> Mobile: 15108456581
>>
>> Online: QQ 625852056
>>
>> Website: www.keruyun.com
>>
>> Support: 4006-315-666
>>
>>
>>
>>
>> *From:* Joanna He 
>> *Sent:* 2017-12-29 15:05
>> *To:* user 
>> *Subject:* Re: Kylin front-end query question
>> Translation: Hello my question is when there are multiple queries running
>> , how can I know what query is currently running. And how can I kill
>> the long-running query?
>>
>> Answer:
>> You can view your currently running query in logs/kylin.property under
>> your kylin installation directory.
>> There is no way to kill single query in kylin at the moment, the only way
>> to stop the query is to stop and start the kylin server.
>>
>>
>>
>> 2017-12-29 14:59 GMT+08:00 chenping...@keruyun.com <
>> chenping...@keruyun.com>:
>>
>>>
>>> Hi all, I've run into a fairly big problem: many queries come in from the front end at the
>>> same time. How can I see which queries the current Kylin instance is running, and how
>>> can I kill queries that have been running for a long time?
>>>
>>>
>>>
>>>
>>>
>>>
>>
>>
>


Re: flat table stored as parquet

2017-12-29 Thread Alberto Ramón
Could you check this: KYLIN-3070
<https://issues.apache.org/jira/browse/KYLIN-3070> (v2.3)

On 29 December 2017 at 22:22, Ruslan Dautkhanov 
wrote:

> Is there a knob I can set to tell Kylin to create the flat table
> as Parquet and not with the default 'text' serialization?
> I mean the "flat" Hive table that Kylin creates when it builds a cube.
>
>
> Thanks!
> Ruslan Dautkhanov
>
>


Doubt about KYLIN-2363

2017-12-09 Thread Alberto Ramón
KYLIN-3067 

If you set dim_cap=3 and the dimensions are A, B, C, D,
and you launch a query ... GROUP BY A, B, C, D, how is this query resolved?
Is the base cuboid A, B, C, D?
Are internal cuboids deleted after use?

BR, Alb


Re: availableVirtualCores

2017-11-29 Thread Alberto Ramón
yes, sorry:

When you execute: ${KYLIN_HOME}/bin/check-env.sh

it creates a file ${KYLIN_HOME}/logs/cluster.info with this text:
availableMB=40460  <- correct
availableVirtualCores=3  <- NOT correct

which is used by check-spark.sh in these lines:
"saveFileName=${KYLIN_HOME}/logs/cluster.info"
"yarn_available_cores=`getValueByKey availableVirtualCores ${saveFileName}`"
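As a hedged, self-contained illustration of that lookup (the file path and the helper body are stand-ins; only the key names come from the thread):

```shell
# Reproduce the reported logs/cluster.info contents in a throwaway file.
saveFileName=/tmp/cluster.info.example
printf 'availableMB=40460\navailableVirtualCores=3\n' > "$saveFileName"

# Minimal stand-in for the getValueByKey helper that check-spark.sh relies on.
getValueByKey() { grep "^$1=" "$2" | cut -d= -f2; }

# This is the value check-spark.sh would pick up: 3, not the 4 Ambari shows.
yarn_available_cores=$(getValueByKey availableVirtualCores "$saveFileName")
echo "$yarn_available_cores"
```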

On 28 November 2017 at 01:36, Li Yang  wrote:

> Where do you see -- Cluster.info: 'availableVirtualCores=3'??
>
> Cannot recognize it.
>
> On Sat, Nov 25, 2017 at 4:29 AM, Alberto Ramón 
> wrote:
>
>> Hello
>>
>> From Ambari, the number of virtual cores is 4:
>> [image: Inline images 1]
>>
>> But in the file Cluster.info: 'availableVirtualCores=3'
>>
>> (RAM is correct)
>>
>> I don't know from where Kylin read this config
>>
>
>


availableVirtualCores

2017-11-24 Thread Alberto Ramón
Hello

From Ambari, the number of virtual cores is 4:
[image: Inline images 1]

But in the file Cluster.info: 'availableVirtualCores=3'

(RAM is correct)

I don't know where Kylin reads this config from.


Re: Can hierarchyDims contain jointDims

2017-11-17 Thread Alberto Ramón
https://issues.apache.org/jira/browse/KYLIN-2149

Check this link; you need to choose between using one or the other.
Sometimes it would be great to use both together.

On 17 November 2017 at 06:43, doom <43535...@qq.com> wrote:

> So what's the second code segment mean in AggregationGroup build step?
> is it means replace the hierarchy dim with the joint dims witch contain it?
>
>
> -- Original message --
> *From:* "ShaoFeng Shi";;
> *Sent:* Friday, 17 November 2017, 14:02
> *To:* "user";
> *Subject:* Re: Can hierarchyDims contain jointDims
>
> Joint could not be used in the hierarchy.
>
> Joint means treating multiple dimensions as one: they either all appeared,
> either all not; It is a conflict with hierarchy.
>
> 2017-11-16 21:29 GMT+08:00 doom <43535...@qq.com>:
>
>> HI ALL:
>> I read the src code of kylin 2.2, and find:
>>
>> In class CubeDes, if hierarchyDims contain jointDims will throw exception.
>> public void validateAggregationGroups() {
>> ...
>> if (CollectionUtils.containsAny(hierarchyDims, jointDims)) {
>> logger.error("Aggregation group " + index + " hierarchy
>> dimensions overlap with joint dimensions");
>> throw new IllegalStateException(
>> "Aggregation group " + index + " hierarchy
>> dimensions overlap with joint dimensions: "
>> + 
>> ensureOrder(CollectionUtils.intersection(hierarchyDims,
>> jointDims)));
>> }
>>
>> But in class AggregationGroup will replace the hierarchy dim with the
>> joint dims witch contain it.
>> private void buildHierarchyMasks(RowKeyDesc rowKeyDesc) {
>> .
>> for (int i = 0; i < hierarchy_dims.length; i++) {
>> TblColRef hColumn = cubeDesc.getModel().findColumn
>> (hierarchy_dims[i]);
>> Integer index = rowKeyDesc.getColumnBitIndex(hColumn);
>> long bit = 1L << index;
>>
>> // combine joint as logic dim
>> if (dim2JointMap.get(bit) != null) {
>> bit = dim2JointMap.get(bit);
>> }
>>
>> mask.fullMask |= bit;
>> allMaskList.add(mask.fullMask);
>> dimList.add(bit);
>> }
>> }
>>
>> do i understand in a wrong way?
>>
>>
>>
>
>
> --
> Best regards,
>
> Shaofeng Shi 史少锋
>
>


Re: [Announce] New Apache Kylin PMC Billy Liu

2017-10-16 Thread Alberto Ramón
Congratulations to Bill, Guosheng and Cheng Wang!!

On 16 October 2017 at 11:33, Luke Han  wrote:

> On behalf of the Apache Kylin PMC, I am very pleased to announce
> that Billy Liu has accepted the PMC's invitation to become a
> PMC member on the project.
>
> We appreciate all of Billy's generous contributions about many bug
> fixes, patches, helped many users. We are so glad to have him to be
> our new PMC and looking forward to his continued involvement.
>
> Congratulations and Welcome, Billy!
>


Kylin and SuperSet

2017-09-12 Thread Alberto Ramón
Hi

Will there be official support for Apache Kylin in Apache Superset?


Re: Some questions about Kylin2.0

2017-06-13 Thread Alberto Ramón
Q1: KYLIN-2633. The actual version of Spark is 1.6.3 (in Kylin 2.0.0).

On 13 June 2017 at 04:41, lxw  wrote:

> Hi,All :
>
>I have some questions about Kylin2.0, and my environment:
> hadoop-2.6.0-cdh5.8.3
> hbase-1.2.0-cdh5.8.3
> apache-kylin-2.0.0-bin-cdh57
> spark-2.1.0-bin-hadoop2.6
>
> *Q1: Kylin2.0 not support Spark2.0?*
>
>  find-spark-dependency.sh:
>  spark_dependency=`find -L $spark_home -name
> 'spark-assembly-[a-z0-9A-Z\.-]*.jar' 
>
> *Q2: I want to use Kylin2.0 without Spark Cubing, but failed.*
>
>  kylin.sh:
>  function retrieveDependency() {
>  #retrive $hive_dependency and $hbase_dependency
>  source ${dir}/find-hive-dependency.sh
>  source ${dir}/find-hbase-dependency.sh
>  source ${dir}/find-hadoop-conf-dir.sh
>  source ${dir}/find-kafka-dependency.sh
>  source ${dir}/find-spark-dependency.sh
>
>  If not found spark dependencies, Kylin can not start :
>
>  [hadoop@hadoop10 bin]$ ./kylin.sh start
>  Retrieving hadoop conf dir...
>  KYLIN_HOME is set to /home/hadoop/bigdata/kylin/current
>  Retrieving hive dependency...
>  Retrieving hbase dependency...
>  Retrieving hadoop conf dir...
>  Retrieving kafka dependency...
>  Retrieving Spark dependency...
>  *spark assembly lib not found.*
>
>  after modify kylin.sh “**source ${dir}/find-spark-dependency.sh”,
> Kylin start success ..
>
> *Q3: Abount kylin_hadoop_conf_dir ?*
>
>  I make some soft link under $KYLIN_HOME/hadoop-conf
> (core-site.xml、yarn-site.xml、hbase-site.xml、hive-site.xml),
>  and set 
> "kylin.env.hadoop-conf-dir=/home/bigdata/kylin/current/hadoop-conf",
> when I execute ./check-env.sh,
>
>  *[hadoop@hadoop10 bin]$ ./check-env.sh *
> * Retrieving hadoop conf dir...*
> */home/bigdata/kylin/current/hadoop-conf is override as the
> kylin_hadoop_conf_dir*
> *KYLIN_HOME is set to /home/hadoop/bigdata/kylin/current*
> *-mkdir: java.net.UnknownHostException: cdh5*
> *Usage: hadoop fs [generic options] -mkdir [-p]  ...*
> *Failed to create /kylin20. Please make sure the user has right to
> access /kylin20*
>
> My HDFS with HA, fs.defaultFS is "cdh5",when I don't set
> "kylin.env.hadoop-conf-dir", and use HADOOP_CONF_DIR, HIVE_CONF, 
> HBASE_CONF_DIR
> from envionment variables (/etc/profile), it was correct.
>
>
> Best Regards!
> lxw
>


Re: Why in the Convert Cuboid Data to HFile step to start too many maps and reduces

2017-05-27 Thread Alberto Ramón
Sounds like a YARN configuration problem.
Parallelism is good :), and not all map/reduce tasks are executed at the same
time. Check some configurations like:

   - yarn.nodemanager.resource.memory-mb per node
   - yarn.nodemanager.resource.cpu-vcores per node

This can help you to start:
https://www.cloudera.com/documentation/enterprise/5-3-x/topics/cdh_ig_yarn_tuning.html

If your cluster is very small, a 256 MB block size can be too big; you
can try 128 MB.

On 27 May 2017 at 08:49, jianhui.yi  wrote:

> My model have 7 tables,a cube have 15 dimensions, in the “Convert Cuboid
> Data to HFile” step to start too many maps and reduces(maps 500+,reduces
> 1.4k+),This step expend all resources of the small cluster.
>
> I set these parameters in the cluster:
>
> dfs.block.size=256M
>
> hive.exec.reducers.bytes.per.reducer=1073741824
>
> hive.merge.mapfiles=true
>
> hive.merge.mapredfiles=true
>
> hive.merge.size.per.task=256M
>
>
>
> kylin_hive_conf.xml this file uses the default settings
>
> Where can I turning performance optimization?
>
> Thanks.
>


Re: Cannot install Kylin on CDH 5.11 CentOS 7

2017-05-22 Thread Alberto Ramón
The recommended Java version is 1.7 (I don't know if it is mandatory).

https://kylin.apache.org/docs20/install/hadoop_env.html


On 22 May 2017 at 22:59, Szalai Gergely  wrote:

> Hi All,
>
> We have a blocking issue installing Kylin on CDH 5.11. Executing
> lines such as the one below on CentOS 7, we always get empty strings.
>
> Could you please advise?
>
> bash $KYLIN_HOME/bin/get-properties.sh kylin.env.hdfs-working-dir
>
> KYLIN_HOME points to the right location, it also gives empty when we call
> directly bash get-properties.sh kylin.env.hdfs-working-dir
>
> Could it be a JAVA issue? we have only 1.6 installed.
>
> Many thanks in advance.
> Regards
>
>
>
>
> Please think of the environment before printing this e-mail!
>


Re: How to apply historical Updates to existing cube data

2017-05-11 Thread Alberto Ramón
Q1 - Check this previous mailing-list thread about late data:
http://apache-kylin.74782.x6.nabble.com/Reloading-data-td5669.html

You will only need to recalculate the segments involved.

Q2 - Check sharding (https://issues.apache.org/jira/browse/KYLIN-1453)
  Partitioning by a time column is not recommended (it will create a hotspot
in HBase).



On 11 May 2017 at 19:43, Nirav Patel  wrote:

> Hi,
>
> Correct me if I am wrong but currently you can not update existing kylin
> cube without refreshing entire cube. Does it mean if I am pulling new data
> from hive based on lets say customerId, Timestamp for which I already built
> cube before I have to rebuild entire cube from scratch? Or can I say
> refresh between startTime and endTime which will update cube data for that
> timeframe only.
>
> Also Hive data can be partitioned by any keys(columns) not just timestamp.
> so why not allow kylin cube updates based on any arbitrary partition
> strategy that user have defined on their hive table?
> e.g. update part of the cube based on timestamp, customerid, batchid etc.
>
> Thanks,
> Nirav
>
>
>
> 


Re: 答复: kylin nonsupport Multi-value dimensions?

2017-05-10 Thread Alberto Ramón
You can convert this dimension to a string and check performance using LIKE
filters.

With Hive, duplicate the rows in the fact table: one for each dimension value.

Another, more complex, solution would be to extend the dictionary-encoded
dimension to understand multi-values.

No more ideas :)
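To make the second idea concrete, here is a toy sketch of the row-duplication step, assuming the multi-value column is stored as a comma-separated string. It uses awk on plain text purely for illustration; in Hive the equivalent flattening is usually done with something like LATERAL VIEW explode():

```shell
# Toy fact rows: order_id|salespeople, where the second column is multi-value.
printf 'order1|alice,bob\norder2|carol\n' |
awk -F'|' '{
  n = split($2, s, ",")      # break the multi-value column into pieces
  for (i = 1; i <= n; i++)   # emit one fact row per salesperson
    print $1 "|" s[i]
}'
# Output:
#   order1|alice
#   order1|bob
#   order2|carol
```

After this flattening, each row carries exactly one salesperson, so the join to the dimension table no longer hits duplicate keys.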


On 10 May 2017 8:51 a.m., "jianhui.yi"  wrote:

Sorry, I wrote it wrongly; this problem is about a multi-value dimension.

Example: I have a fact table named fact_order and a dimension table named
dim_sales.

In the fact_order table, one order row contains multiple salespeople.

When I join fact_order with dim_sales it reports the error: Dup key found.

How can I solve it?



*From:* Alberto Ramón [mailto:a.ramonporto...@gmail.com]
*Sent:* 10 May 2017 15:29
*To:* user 
*Subject:* Re: kylin nonsupport Multi-value dimensions?



Hi,

Not all Hive types are supported.

Check these lines:
https://github.com/apache/kylin/blob/5d4982e247a2172d97d44c85309cef
4b3dbfce09/core-metadata/src/main/java/org/apache/kylin/dimension/
DimensionEncodingFactory.java#L76



On 10 May 2017 at 08:10, jianhui.yi  wrote:

I encountered a multi-value dimension problem, and I used a
bridge table to try to solve it, but when building a cube it reports an error

java.lang.IllegalStateException: The table: DIM_XXX Dup key found,
key=[1446], value1=[1446,29,1,1], value2=[1446,28,0,0]

 at org.apache.kylin.dict.lookup.LookupTable.initRow(
LookupTable.java:86)

 at org.apache.kylin.dict.lookup.LookupTable.init(LookupTable.
java:69)

 at org.apache.kylin.dict.lookup.LookupStringTable.init(
LookupStringTable.java:79)

 at org.apache.kylin.dict.lookup.LookupTable.(
LookupTable.java:57)

 at org.apache.kylin.dict.lookup.LookupStringTable.(
LookupStringTable.java:65)

 at org.apache.kylin.cube.CubeManager.getLookupTable(
CubeManager.java:644)

 at org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(
DictionaryGeneratorCLI.java:98)

 at org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(
DictionaryGeneratorCLI.java:54)

 at org.apache.kylin.engine.mr.steps.CreateDictionaryJob.run(
CreateDictionaryJob.java:66)

 at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)

 at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)

 at org.apache.kylin.engine.mr.common.HadoopShellExecutable.
doWork(HadoopShellExecutable.java:63)

 at org.apache.kylin.job.execution.AbstractExecutable.
execute(AbstractExecutable.java:124)

 at org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(
DefaultChainedExecutable.java:64)

 at org.apache.kylin.job.execution.AbstractExecutable.
execute(AbstractExecutable.java:124)

 at org.apache.kylin.job.impl.threadpool.DefaultScheduler$
JobRunner.run(DefaultScheduler.java:142)

 at java.util.concurrent.ThreadPoolExecutor.runWorker(
ThreadPoolExecutor.java:1145)

 at java.util.concurrent.ThreadPoolExecutor$Worker.run(
ThreadPoolExecutor.java:615)

 at java.lang.Thread.run(Thread.java:745)

result code:2


Re: kylin nonsupport Multi-value dimensions?

2017-05-10 Thread Alberto Ramón
Hi,
Not all Hive types are supported.

Check these lines:
https://github.com/apache/kylin/blob/5d4982e247a2172d97d44c85309cef4b3dbfce09/core-metadata/src/main/java/org/apache/kylin/dimension/DimensionEncodingFactory.java#L76

On 10 May 2017 at 08:10, jianhui.yi  wrote:

> I encountered a multi-value dimension problem, and I used a
> bridge table to try to solve it, but when building a cube it reports an error
>
> java.lang.IllegalStateException: The table: DIM_XXX Dup key found,
> key=[1446], value1=[1446,29,1,1], value2=[1446,28,0,0]
>
>  at org.apache.kylin.dict.lookup.LookupTable.initRow(
> LookupTable.java:86)
>
>  at org.apache.kylin.dict.lookup.LookupTable.init(LookupTable.
> java:69)
>
>  at org.apache.kylin.dict.lookup.LookupStringTable.init(
> LookupStringTable.java:79)
>
>  at org.apache.kylin.dict.lookup.LookupTable.(
> LookupTable.java:57)
>
>  at org.apache.kylin.dict.lookup.LookupStringTable.(
> LookupStringTable.java:65)
>
>  at org.apache.kylin.cube.CubeManager.getLookupTable(
> CubeManager.java:644)
>
>  at org.apache.kylin.cube.cli.DictionaryGeneratorCLI.
> processSegment(DictionaryGeneratorCLI.java:98)
>
>  at org.apache.kylin.cube.cli.DictionaryGeneratorCLI.
> processSegment(DictionaryGeneratorCLI.java:54)
>
>  at org.apache.kylin.engine.mr.steps.CreateDictionaryJob.run(
> CreateDictionaryJob.java:66)
>
>  at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>
>  at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
>
>  at org.apache.kylin.engine.mr.common.HadoopShellExecutable.
> doWork(HadoopShellExecutable.java:63)
>
>  at org.apache.kylin.job.execution.AbstractExecutable.
> execute(AbstractExecutable.java:124)
>
>  at org.apache.kylin.job.execution.DefaultChainedExecutable.
> doWork(DefaultChainedExecutable.java:64)
>
>  at org.apache.kylin.job.execution.AbstractExecutable.
> execute(AbstractExecutable.java:124)
>
>  at org.apache.kylin.job.impl.threadpool.DefaultScheduler$
> JobRunner.run(DefaultScheduler.java:142)
>
>  at java.util.concurrent.ThreadPoolExecutor.runWorker(
> ThreadPoolExecutor.java:1145)
>
>  at java.util.concurrent.ThreadPoolExecutor$Worker.run(
> ThreadPoolExecutor.java:615)
>
>  at java.lang.Thread.run(Thread.java:745)
>
> result code:2
>
>
>
>
>
>
>


Re: [Announce] New Apache Kylin committer Zhixiong Chen

2017-04-29 Thread Alberto Ramón
Congratulations to Roger Shi and Zhixiong!! (and the dev team for the
upcoming 2.0 version)

If you are ever near London or Spain, let me know; having a beer will be
necessary :)

2017-04-29 12:47 GMT+01:00 Dong Li :

> Welcome!
>
> Thanks,
> Dong Li
>
>  Original Message
> *Sender:* Li Yang
> *Recipient:* user
> *Cc:* dev; Apache Kylin PMC;
> chen
> *Date:* Saturday, Apr 29, 2017 19:13
> *Subject:* Re: [Announce] New Apache Kylin committer Zhixiong Chen
>
> Welcome Zhixiong!
>
> Yang
>
> On Sat, Apr 29, 2017 at 6:07 PM, Luke Han  wrote:
>
>> On behalf of the Apache Kylin PMC, I am very pleased to announce
>> that Zhixiong Chen has accepted the PMC's invitation to become a
>> committer on the project.
>>
>> We appreciate all of Zhixiong's generous contributions about many bug
>> fixes, patches, helped many users. We are so glad to have him to be
>> our new committer and looking forward to his continued involvement.
>>
>> Congratulations and Welcome, Zhixiong!
>>
>
>


Re: how to use kylin

2017-03-24 Thread Alberto Ramón
Hmmm, of course.
Apache Kylin is a cube on Hadoop (= read-only).

2017-03-24 7:36 GMT+00:00 mathieu ferlay :

> Hi everybody,
>
> I'm totally new to Kylin and I'm not sure I have really understood how
> to use Kylin and in which cases. I have created my tables in Hive and
> synchronized them with Kylin. I can see them in the Kylin web UI.
>
> I have linked my project to Kylin by using the JDBC driver and I want to
> populate my tables.
>
> My issue is that I obtain an error saying something like "SQL is not
> supported" when I try to do an INSERT. What I am starting to think is that
> Kylin only allows SELECT requests.
>
>
>
> Thanks for your help,
>
> Regards
>
>
>
> *Mathieu FERLAY*
>
> R&D Engineer
>
> *GNUBILA/MAAT France*
> 174, Imp. Pres d'en Bas
> 74370 Argonay (France)
>
> Tel. 0033 450 685 601
> Fax. 0033 972 213 540
>
> www.gnubila.fr
> mfer...@gnubila.fr
>
>
>
>
>
>
>
>
>
> *PRIVACY DESIGNER*
>
>
>


Re: The coprocessor thread stopped itself due to scan timeout or scan threshold

2017-03-18 Thread Alberto Ramón
For the new version, check this:
https://issues.apache.org/jira/browse/KYLIN-2438

but keep in mind that these limits exist to protect the HBase coprocessor; if
your query is too slow ... perhaps you need to re-design the cube.

BR

2017-03-18 8:13 GMT+00:00 java_prog...@aliyun.com :

> Hi,
>    When I execute a query, there is an error shown below.
>
> Error while executing SQL "select t.hotel_id_m,t.live_dt,
> d.day_of_week,sum(rns) from tableT t join tableB d on t.live_dt = d.daY_no
> group by t.hotel_id_m,t.live_dt, d.day_of_week LIMIT 5":  for Query 553d8027-b97f-4e86-9aad-47bb0053b6ee GTScanRequest 1c96c729>The
> coprocessor thread stopped itself due to scan timeout or scan
> threshold(check region server log), failing current query..
>
>  I tried to set kylin.query.coprocessor.mem.gb and kylin.query.mem.budget
> as big as they can be, but it did not work. If I set a small LIMIT number
> like 2, it works well.
>  Could you tell me what I can do if I want to use LIMIT 5, or is
> there any other way to let me get the final result?
>
>
> Best regards,
>
> --
> java_prog...@aliyun.com
>


Re: java.lang.RuntimeException: Too big dictionary, dictionary cannot be bigger than 2GB

2017-02-14 Thread Alberto Ramón
The max cardinality of the default dictionary is 2 million.
Why encode SALE_ORD_ID as a dictionary? If it is an int, you can use the
integer encoding.

Please check:
http://apache-kylin.74782.x6.nabble.com/create-dictionary-error-td7155.html
http://mail-archives.apache.org/mod_mbox/kylin-user/201702.mbox/%3CCAEcyM17BTkhVpFcZLP6%2Boawx%3D1eap%3DZS_ER1HJbhevJPBE71-g%40mail.gmail.com%3E



2017-02-14 10:14 GMT+01:00 仇同心 :

> Hi ,all
>
>   The first step in cube merge, an error :
>
>
>
>java.lang.RuntimeException: Too big dictionary, dictionary cannot be
> bigger than 2GB
>
>at org.apache.kylin.dict.TrieDictionaryBuilder.buildTrieBytes(
> TrieDictionaryBuilder.java:421)
>
>at org.apache.kylin.dict.TrieDictionaryBuilder.build(
> TrieDictionaryBuilder.java:408)
>
>at org.apache.kylin.dict.DictionaryGenerator$
> StringDictBuilder.build(DictionaryGenerator.java:165)
>
>at org.apache.kylin.dict.DictionaryGenerator.buildDictionary(
> DictionaryGenerator.java:81)
>
>at org.apache.kylin.dict.DictionaryGenerator.buildDictionary(
> DictionaryGenerator.java:73)
>
>at org.apache.kylin.dict.DictionaryGenerator.mergeDictionaries(
> DictionaryGenerator.java:102)
>
>at org.apache.kylin.dict.DictionaryManager.mergeDictionary(
> DictionaryManager.java:268)
>
>at org.apache.kylin.engine.mr.steps.MergeDictionaryStep.
> mergeDictionaries(MergeDictionaryStep.java:145)
>
>at org.apache.kylin.engine.mr.steps.MergeDictionaryStep.
> makeDictForNewSegment(MergeDictionaryStep.java:135)
>
>at org.apache.kylin.engine.mr.steps.MergeDictionaryStep.
> doWork(MergeDictionaryStep.java:67)
>
>at org.apache.kylin.job.execution.AbstractExecutable.
> execute(AbstractExecutable.java:113)
>
>at org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(
> DefaultChainedExecutable.java:57)
>
>at org.apache.kylin.job.execution.AbstractExecutable.
> execute(AbstractExecutable.java:113)
>
>at org.apache.kylin.job.impl.threadpool.DefaultScheduler$
> JobRunner.run(DefaultScheduler.java:136)
>
>at java.util.concurrent.ThreadPoolExecutor.runWorker(
> ThreadPoolExecutor.java:1145)
>
>at java.util.concurrent.ThreadPoolExecutor$Worker.run(
> ThreadPoolExecutor.java:615)
>
>at java.lang.Thread.run(Thread.java:745)
>
>
>
>
>
>  “SALE_ORD_ID”  Cardinality :157644463
>
>  SALECOUNT_DISTINCT  Value:SALE_ORD_ID, Type:column   bitmap
>
>
>
> I'm wondering whether high-cardinality fields can't be used for accurate
> count-distinct metrics??
>
>
>
>
>
>
>


Re: kylin job stop accidentally and can resume success!

2017-02-13 Thread Alberto Ramón
Do you have the ResourceManager on a dedicated node (without containers or a
NodeManager)?

2017-02-13 17:38 GMT+01:00 不清 <452652...@qq.com>:

> I check the configure in CM。
>
> Java Heap Size of ResourceManager in Bytes =1536 MiB
> Container Memory Minimum =1GiB
>
> Container Memory Increment =512MiB
>
> Container Memory Maximum =8GiB
>
> -- Original message --
> *From:* "Alberto Ramón";;
> *Sent:* Tuesday, 14 February 2017, 00:34
> *To:* "user";
> *Subject:* Re: kylin job stop accidentally and can resume success!
>
> check this
> <https://www.mapr.com/blog/best-practices-yarn-resource-management>:
> "Basically, it means RM can only allocate memory to containers in
> increments of .  . . "
>
> TIP: is your RM in a work node? If this is true, this can be the problem
> (Its good idea put yarn master, RM, in a dedicated node)
>
>
> 2017-02-13 17:19 GMT+01:00 不清 <452652...@qq.com>:
>
>> how can i get this heap size?
>>
>>
>> -- Original message --
>> *From:* "Alberto Ramón";;
>> *Sent:* Tuesday, 14 February 2017, 00:17
>> *To:* "user";
>> *Subject:* Re: kylin job stop accidentally and can resume success!
>>
>> Sounds like a problem of Resource Manager (RM) of YARN, check the Heap
>> size for RM
>> Kylin loose connectivity whit RM
>>
>> 2017-02-13 17:00 GMT+01:00 不清 <452652...@qq.com>:
>>
>>> hello,kylin community!
>>>
>>> Sometimes my jobs stop accidentally. They can stop at any step.
>>>
>>> kylin log is like :
>>> 2017-02-13 23:27:01,549 DEBUG [pool-8-thread-8]
>>> hbase.HBaseResourceStore:262 : Update row 
>>> /execute_output/48dee96e-10fd-472b-b466-39505b6e57c0-02
>>> from oldTs: 1486999611524, to newTs: 1486999621545, operation result: true
>>> 2017-02-13 23:27:13,384 INFO  [pool-8-thread-8] ipc.Client:842 :
>>> Retrying connect to server: jxhdp1datanode29/10.180.212.61:50504.
>>> Already tried 0 time(s); retry policy is 
>>> RetryUpToMaximumCountWithFixedSleep(maxRetries=3,
>>> sleepTime=1000 MILLISECONDS)
>>> 2017-02-13 23:27:14,387 INFO  [pool-8-thread-8] ipc.Client:842 :
>>> Retrying connect to server: jxhdp1datanode29/10.180.212.61:50504.
>>> Already tried 1 time(s); retry policy is 
>>> RetryUpToMaximumCountWithFixedSleep(maxRetries=3,
>>> sleepTime=1000 MILLISECONDS)
>>> 2017-02-13 23:27:15,388 INFO  [pool-8-thread-8] ipc.Client:842 :
>>> Retrying connect to server: jxhdp1datanode29/10.180.212.61:50504.
>>> Already tried 2 time(s); retry policy is 
>>> RetryUpToMaximumCountWithFixedSleep(maxRetries=3,
>>> sleepTime=1000 MILLISECONDS)
>>> 2017-02-13 23:27:15,495 INFO  [pool-8-thread-8]
>>> mapred.ClientServiceDelegate:273 : Application state is completed.
>>> FinalApplicationStatus=KILLED. Redirecting to job history server
>>> 2017-02-13 23:27:15,539 DEBUG [pool-8-thread-8] dao.ExecutableDao:210 :
>>> updating job output, id: 48dee96e-10fd-472b-b466-39505b6e57c0-02
>>>
>>> CM log is like:
>>> Job Name: Kylin_Cube_Builder_user_all_cube_2_only_msisdn
>>> User Name: tmn
>>> Queue: root.tmn
>>> State: KILLED
>>> Uberized: false
>>> Submitted: Sun Feb 12 19:19:24 CST 2017
>>> Started: Sun Feb 12 19:19:38 CST 2017
>>> Finished: Sun Feb 12 20:30:13 CST 2017
>>> Elapsed: 1hrs, 10mins, 35sec
>>> Diagnostics:
>>> Kill job job_1486825738076_4205 received from tmn (auth:SIMPLE) at
>>> 10.180.212.38
>>> Job received Kill while in RUNNING state.
>>> Average Map Time 24mins, 48sec
>>>
>>> mapreduce job log
>>> Task KILL is received. Killing attempt!
>>>
>>> and when this happened ,by resume job,the job can resume success! I mean
>>>  it is not stop by error!
>>>
>>> what's the problem?
>>>
>>> My hadoop cluster is very busy,this situation happens very often.
>>>
>>> can I set retry time and retry  Interval?
>>>
>>
>>
>


Re: kylin job stop accidentally and can resume success!

2017-02-13 Thread Alberto Ramón
Check this
<https://www.mapr.com/blog/best-practices-yarn-resource-management>:
"Basically, it means RM can only allocate memory to containers in
increments of .  . . "

TIP: is your RM on a worker node? If so, this can be the problem.
(It's a good idea to put the YARN master, the RM, on a dedicated node.)


2017-02-13 17:19 GMT+01:00 不清 <452652...@qq.com>:

> How can I get this heap size?
>
>
> -- Original message --
> *From:* "Alberto Ramón";;
> *Sent:* Tuesday, 14 February 2017, 00:17
> *To:* "user";
> *Subject:* Re: kylin job stop accidentally and can resume success!
>
> Sounds like a problem with YARN's ResourceManager (RM); check the heap
> size for the RM.
> Kylin loses connectivity with the RM.
>
> 2017-02-13 17:00 GMT+01:00 不清 <452652...@qq.com>:
>
>> hello,kylin community!
>>
>> Sometimes my jobs stop accidentally. They can stop at any step.
>>
>> kylin log is like :
>> 2017-02-13 23:27:01,549 DEBUG [pool-8-thread-8]
>> hbase.HBaseResourceStore:262 : Update row 
>> /execute_output/48dee96e-10fd-472b-b466-39505b6e57c0-02
>> from oldTs: 1486999611524, to newTs: 1486999621545, operation result: true
>> 2017-02-13 23:27:13,384 INFO  [pool-8-thread-8] ipc.Client:842 : Retrying
>> connect to server: jxhdp1datanode29/10.180.212.61:50504. Already tried 0
>> time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3,
>> sleepTime=1000 MILLISECONDS)
>> 2017-02-13 23:27:14,387 INFO  [pool-8-thread-8] ipc.Client:842 : Retrying
>> connect to server: jxhdp1datanode29/10.180.212.61:50504. Already tried 1
>> time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3,
>> sleepTime=1000 MILLISECONDS)
>> 2017-02-13 23:27:15,388 INFO  [pool-8-thread-8] ipc.Client:842 : Retrying
>> connect to server: jxhdp1datanode29/10.180.212.61:50504. Already tried 2
>> time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3,
>> sleepTime=1000 MILLISECONDS)
>> 2017-02-13 23:27:15,495 INFO  [pool-8-thread-8]
>> mapred.ClientServiceDelegate:273 : Application state is completed.
>> FinalApplicationStatus=KILLED. Redirecting to job history server
>> 2017-02-13 23:27:15,539 DEBUG [pool-8-thread-8] dao.ExecutableDao:210 :
>> updating job output, id: 48dee96e-10fd-472b-b466-39505b6e57c0-02
>>
>> CM log is like:
>> Job Name: Kylin_Cube_Builder_user_all_cube_2_only_msisdn
>> User Name: tmn
>> Queue: root.tmn
>> State: KILLED
>> Uberized: false
>> Submitted: Sun Feb 12 19:19:24 CST 2017
>> Started: Sun Feb 12 19:19:38 CST 2017
>> Finished: Sun Feb 12 20:30:13 CST 2017
>> Elapsed: 1hrs, 10mins, 35sec
>> Diagnostics:
>> Kill job job_1486825738076_4205 received from tmn (auth:SIMPLE) at
>> 10.180.212.38
>> Job received Kill while in RUNNING state.
>> Average Map Time 24mins, 48sec
>>
>> mapreduce job log
>> Task KILL is received. Killing attempt!
>>
>> and when this happened ,by resume job,the job can resume success! I mean
>>  it is not stop by error!
>>
>> what's the problem?
>>
>> My hadoop cluster is very busy,this situation happens very often.
>>
>> can I set retry time and retry  Interval?
>>
>
>


Re: kylin job stop accidentally and can resume success!

2017-02-13 Thread Alberto Ramón
Sounds like a problem with YARN's ResourceManager (RM); check the heap size
for the RM.
Kylin loses connectivity with the RM.

2017-02-13 17:00 GMT+01:00 不清 <452652...@qq.com>:

> hello,kylin community!
>
> Sometimes my jobs stop accidentally. They can stop at any step.
>
> kylin log is like :
> 2017-02-13 23:27:01,549 DEBUG [pool-8-thread-8]
> hbase.HBaseResourceStore:262 : Update row 
> /execute_output/48dee96e-10fd-472b-b466-39505b6e57c0-02
> from oldTs: 1486999611524, to newTs: 1486999621545, operation result: true
> 2017-02-13 23:27:13,384 INFO  [pool-8-thread-8] ipc.Client:842 : Retrying
> connect to server: jxhdp1datanode29/10.180.212.61:50504. Already tried 0
> time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3,
> sleepTime=1000 MILLISECONDS)
> 2017-02-13 23:27:14,387 INFO  [pool-8-thread-8] ipc.Client:842 : Retrying
> connect to server: jxhdp1datanode29/10.180.212.61:50504. Already tried 1
> time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3,
> sleepTime=1000 MILLISECONDS)
> 2017-02-13 23:27:15,388 INFO  [pool-8-thread-8] ipc.Client:842 : Retrying
> connect to server: jxhdp1datanode29/10.180.212.61:50504. Already tried 2
> time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3,
> sleepTime=1000 MILLISECONDS)
> 2017-02-13 23:27:15,495 INFO  [pool-8-thread-8]
> mapred.ClientServiceDelegate:273 : Application state is completed.
> FinalApplicationStatus=KILLED. Redirecting to job history server
> 2017-02-13 23:27:15,539 DEBUG [pool-8-thread-8] dao.ExecutableDao:210 :
> updating job output, id: 48dee96e-10fd-472b-b466-39505b6e57c0-02
>
> CM log is like:
> Job Name: Kylin_Cube_Builder_user_all_cube_2_only_msisdn
> User Name: tmn
> Queue: root.tmn
> State: KILLED
> Uberized: false
> Submitted: Sun Feb 12 19:19:24 CST 2017
> Started: Sun Feb 12 19:19:38 CST 2017
> Finished: Sun Feb 12 20:30:13 CST 2017
> Elapsed: 1hrs, 10mins, 35sec
> Diagnostics:
> Kill job job_1486825738076_4205 received from tmn (auth:SIMPLE) at
> 10.180.212.38
> Job received Kill while in RUNNING state.
> Average Map Time 24mins, 48sec
>
> mapreduce job log
> Task KILL is received. Killing attempt!
>
> And when this happens, by resuming the job, the job can resume successfully!
> I mean it did not stop due to an error!
>
> What's the problem?
>
> My Hadoop cluster is very busy, so this situation happens very often.
>
> Can I set the retry count and the retry interval?
>


Re: 求助有一个超大维度

2017-02-13 Thread Alberto Ramón
For B: it's a Java option ( . . . java.opts).
  Check whether your JVM is very old; there are a lot of GC optimizations
in the latest versions of Java 8.

TIP 1: Check if you can reduce the dimensionality of the cube or use
aggregation groups (AGGs) to make the build process lighter.
You can take some ideas from this
<https://github.com/albertoRamon/Kylin/tree/master/KylinPerformance>

TIP 2: Solve problem A first, because if you enlarge the heap, B will get
worse.
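For reference, a hedged sketch of how the two overrides quoted below might look in conf/kylin.properties. The values are illustrative and should be tuned to your cluster; the /tmp path here is just so the snippet is self-contained:

```shell
# Illustrative values only; tune -Xmx and the container size to your cluster.
# In a real install these lines go in $KYLIN_HOME/conf/kylin.properties.
cat > /tmp/kylin.properties.heap-example <<'EOF'
kylin.job.mr.config.override.mapred.map.child.java.opts=-Xmx8g -XX:+UseG1GC
kylin.job.mr.config.override.mapreduce.map.memory.mb=8500
EOF

# Show what Kylin would read for the mapper container size.
grep '^kylin.job.mr.config.override.mapreduce.map.memory.mb=' \
  /tmp/kylin.properties.heap-example | cut -d= -f2
```

Note that -XX:+UseG1GC (for the GC-overhead error) is appended to the same java.opts value as the heap size.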


2017-02-13 10:16 GMT+01:00 不清 <452652...@qq.com>:

> Thanks for the reply!
>
> For error A, I can set these parameters in Kylin.
>
> But for error B, should I fix this problem for the whole Hadoop cluster?
> Can you describe the parameter fix in detail?
>
> This really helped us a lot!
>
>
> ---------- Original message --
> *From:* "Alberto Ramón";;
> *Sent:* Monday, 13 February 2017, 15:58
> *To:* "user";
> *Subject:* Re: 求助有一个超大维度
>
> Hello 不清
>
>
> From your errors: "Failed to build cube in mapper " &
> A- "java.lang.OutOfMemoryError: Java heap space at java" &
> B- "java.lang.OutOfMemoryError: GC overhead limit"
>
> For error A: check overriding these parameters from Kylin:
>
>    kylin.job.mr.config.override.mapred.map.child.java.opts=-Xmx8g
>
>    kylin.job.mr.config.override.mapreduce.map.memory.mb=8500
>
> For error B: (this is more complicated)
>
>    Check you are using Java 8 or higher
>
>    Try with this: -XX:+UseG1GC
>
>    Explanation: https://wiki.apache.org/solr/ShawnHeisey
>
>
> Yes, using the integer dictionary is the best option.
>
>
>
> 2017-02-13 3:53 GMT+01:00 不清 <452652...@qq.com>:
>
>> Hello, Kylin community!
>>
>> The dimension is a phone number; its distinct-value count is between 5
>> million and 15 million.
>> I use integer encoding with length set to 8. The test data is about 100
>> million rows. Is there a problem with my settings?
>>
>> For ultra-large dimensions, does Kylin need any special settings?
>>
>> The Kylin version I use is 1.6.
>>
>> Thanks.
>>
>> The failing step is "build cube".
>> The map tasks take a very long time and finally fail with the error below:
>> Error: java.io.IOException: Failed to build cube in mapper 36 at
>> org.apache.kylin.engine.mr.steps.InMemCuboidMapper.cleanup(InMemCuboidMapper.java:145)
>> at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:148) at
>> org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:784) at
>> org.apache.hadoop.mapred.MapTask.run(MapTask.java:341) at
>> org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168) at
>> java.security.AccessController.doPrivileged(Native Method) at
>> javax.security.auth.Subject.doAs(Subject.java:415) at
>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642)
>> at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163) Caused
>> by: java.util.concurrent.ExecutionException: java.lang.RuntimeException:
>> java.io.IOException: java.io.IOException: java.lang.RuntimeException:
>> java.io.IOException: java.lang.OutOfMemoryError: Java heap space at
>> java.util.concurrent.FutureTask.report(FutureTask.java:122) at
>> java.util.concurrent.FutureTask.get(FutureTask.java:188) at
>> org.apache.kylin.engine.mr.steps.InMemCuboidMapper.cleanup(InMemCuboidMapper.java:143)
>> ... 8 more Caused by: java.lang.RuntimeException: java.io.IOException:
>> java.io.IOException: java.lang.RuntimeException: java.io.IOException:
>> java.lang.OutOfMemoryError: Java heap space at
>> org.apache.kylin.cube.inmemcubing.AbstractInMemCubeBuilder$1.run(
>> AbstractInMemCubeBuilder.java:84) at java.util.concurrent.Executors
>> $RunnableAdapter.call(Executors.java:471) at
>> java.util.concurrent.FutureTask.run(FutureTask.java:262) at
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>> at 
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>> at java.lang.Thread.run(Thread.java:745) Caused by: java.io.IOException:
>> java.io.IOException: java.lang.RuntimeException: java.io.IOException:
>> java.lang.OutOfMemoryError: Java heap space at
>> org.apache.kylin.cube.inmemcubing.DoggedCubeBuilder$BuildOnc
>> e.build(DoggedCubeBuilder.java:128) at org.apache.kylin.cube.inmemcub
>> ing.DoggedCubeBuilder.build(DoggedCubeBuilder.java:75) at
>> org.apache.kylin.cube.inmemcubing.AbstractInMemCubeBuilder$1.run(
>> AbstractInMemCubeBuilder.java:82) ... 5 more Caused by:
>> java.io.IOException: java.lang.RuntimeException: java.io.IOException:
>> java.lang.OutOfMemoryError: Java heap space at
>> org.apache.kylin.cube.inmemcubing.DoggedCubeBuilder$BuildOnc
>> e.abort(DoggedCubeBuilder.java:196) at org.apache.kylin.cube.inmemcub
>> ing.DoggedCubeBuilder$BuildOnce.checkException(DoggedCubeBuilder.java:169)
>> at org.apache.kylin.cube.inmemcubing.DoggedCubeBuilder$BuildOn

Re: Help with an ultra-high-cardinality dimension

2017-02-12 Thread Alberto Ramón
Hello 不清


From your errors: "Failed to build cube in mapper " &
A- "java.lang.OutOfMemoryError: Java heap space at java" &
B- "java.lang.OutOfMemoryError: GC overhead limit"

For error A: check overriding these parameters in Kylin:


*   kylin.job.mr.config.override.mapred.map.child.java.opts=-Xmx8g  *

*   kylin.job.mr.config.override.mapreduce.map.memory.mb=8500*



*For error B:  (this is more complicated)*

*   Check you are using Java 8 or higher*

*   Try with this *-XX:+UseG1GC

   Explanation: https://wiki.apache.org/solr/ShawnHeisey
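Putting A and B together, the overrides could look like this in kylin.properties. This is only a sketch: the property names come from this thread, and the sizes are example values for an ~8 GB mapper that must be tuned per cluster.

```properties
# Sketch only — values are examples, tune for your cluster.
# Error A: raise the mapper JVM heap; error B: switch to G1GC.
kylin.job.mr.config.override.mapred.map.child.java.opts=-Xmx8g -XX:+UseG1GC
# The YARN container must leave some headroom above the JVM heap:
kylin.job.mr.config.override.mapreduce.map.memory.mb=8500
```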


Yes, using integer encoding is the best option



2017-02-13 3:53 GMT+01:00 不清 <452652...@qq.com>:

> Hello Kylin community,
>
> The dimension is a phone number; its distinct-value count is about 5–15 million.
> I use integer encoding with length set to 8. The test data is about 100 million rows. Is something wrong with my settings?
>
> For an ultra-high-cardinality dimension, what settings does Kylin need?
>
> My Kylin version is 1.6.
>
> Thanks
>
> The failing step is "build cube".
> The map tasks run for a very long time and finally fail with the error below:
> Error: java.io.IOException: Failed to build cube in mapper 36 at
> org.apache.kylin.engine.mr.steps.InMemCuboidMapper.
> cleanup(InMemCuboidMapper.java:145) at 
> org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:148)
> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:784) at
> org.apache.hadoop.mapred.MapTask.run(MapTask.java:341) at
> org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168) at
> java.security.AccessController.doPrivileged(Native Method) at
> javax.security.auth.Subject.doAs(Subject.java:415) at
> org.apache.hadoop.security.UserGroupInformation.doAs(
> UserGroupInformation.java:1642) at org.apache.hadoop.mapred.
> YarnChild.main(YarnChild.java:163) Caused by: 
> java.util.concurrent.ExecutionException:
> java.lang.RuntimeException: java.io.IOException: java.io.IOException:
> java.lang.RuntimeException: java.io.IOException:
> java.lang.OutOfMemoryError: Java heap space at java.util.concurrent.
> FutureTask.report(FutureTask.java:122) at java.util.concurrent.
> FutureTask.get(FutureTask.java:188) at org.apache.kylin.engine.mr.
> steps.InMemCuboidMapper.cleanup(InMemCuboidMapper.java:143) ... 8 more
> Caused by: java.lang.RuntimeException: java.io.IOException:
> java.io.IOException: java.lang.RuntimeException: java.io.IOException:
> java.lang.OutOfMemoryError: Java heap space at org.apache.kylin.cube.
> inmemcubing.AbstractInMemCubeBuilder$1.run(AbstractInMemCubeBuilder.java:84)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> at java.util.concurrent.FutureTask.run(FutureTask.java:262) at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745) Caused by: java.io.IOException:
> java.io.IOException: java.lang.RuntimeException: java.io.IOException:
> java.lang.OutOfMemoryError: Java heap space at org.apache.kylin.cube.
> inmemcubing.DoggedCubeBuilder$BuildOnce.build(DoggedCubeBuilder.java:128)
> at org.apache.kylin.cube.inmemcubing.DoggedCubeBuilder.
> build(DoggedCubeBuilder.java:75) at org.apache.kylin.cube.inmemcubing.
> AbstractInMemCubeBuilder$1.run(AbstractInMemCubeBuilder.java:82) ... 5
> more Caused by: java.io.IOException: java.lang.RuntimeException:
> java.io.IOException: java.lang.OutOfMemoryError: Java heap space at
> org.apache.kylin.cube.inmemcubing.DoggedCubeBuilder$BuildOnce.abort(DoggedCubeBuilder.java:196)
> at org.apache.kylin.cube.inmemcubing.DoggedCubeBuilder$
> BuildOnce.checkException(DoggedCubeBuilder.java:169) at
> org.apache.kylin.cube.inmemcubing.DoggedCubeBuilder$BuildOnce.build(DoggedCubeBuilder.java:116)
> ... 7 more Caused by: java.lang.RuntimeException: java.io.IOException:
> java.lang.OutOfMemoryError: Java heap space at org.apache.kylin.cube.
> inmemcubing.DoggedCubeBuilder$SplitThread.run(DoggedCubeBuilder.java:289)
> Caused by: java.io.IOException: java.lang.OutOfMemoryError: Java heap space
> at 
> org.apache.kylin.cube.inmemcubing.InMemCubeBuilder.throwExceptionIfAny(InMemCubeBuilder.java:226)
> at org.apache.kylin.cube.inmemcubing.InMemCubeBuilder.
> build(InMemCubeBuilder.java:186) at org.apache.kylin.cube.
> inmemcubing.InMemCubeBuilder.build(InMemCubeBuilder.java:137) at
> org.apache.kylin.cube.inmemcubing.DoggedCubeBuilder$SplitThread.run(DoggedCubeBuilder.java:284)
> Caused by: java.lang.OutOfMemoryError: Java heap space at
> java.math.BigInteger.(BigInteger.java:973) at
> java.math.BigInteger.valueOf(BigInteger.java:957) at
> java.math.BigDecimal.inflate(BigDecimal.java:3519) at
> java.math.BigDecimal.unscaledValue(BigDecimal.java:2205) at
> org.apache.kylin.metadata.datatype.BigDecimalSerializer.serialize(BigDecimalSerializer.java:56)
> at 
> org.apache.kylin.metadata.datatype.BigDecimalSerializer.serialize(BigDecimalSerializer.java:33)
> at org.apache.kylin.measure.MeasureCodec.encode(MeasureCodec.java:76) at
> org.apache.kylin.measure.BufferedMeasureCodec.encode(BufferedMeasureCodec.java:93)
> at org.apache.kylin.gridtable.GTAggregateScanner$AggregationCache$
> ReturningRecord.load(GTAggregateScanner.java:41

Re: create dictionary error

2017-02-10 Thread Alberto Ramón
Hi, moving this thread to the user mailing list.

SALE_ORD_ID is not a dim of the cube, but is it a PK-FK? I think yes :)
Are you using DERIVED dims in this table?

See this:
the 2 GB limit is hardcoded, so I think increasing Xmx won't solve your case.
It says you have a cardinality of more than " final int _2GB = 20;";
can you check whether this is true?
Can you review the statistics for this column?







2017-02-10 6:29 GMT+01:00 仇同心 :

> Hi all,
>
>  The build fails at the step "Build Dimension Dictionary":
>
>
>
> java.lang.RuntimeException: Failed to create dictionary on
> DMT.DMT_KYLIN_JDMALL_ORDR_DTL_I_D.SALE_ORD_ID
>
>  at org.apache.kylin.dict.DictionaryManager.buildDictionary(
> DictionaryManager.java:325)
>
>  at org.apache.kylin.cube.CubeManager.buildDictionary(
> CubeManager.java:185)
>
>  at org.apache.kylin.cube.cli.DictionaryGeneratorCLI.
> processSegment(DictionaryGeneratorCLI.java:50)
>
>  at org.apache.kylin.cube.cli.DictionaryGeneratorCLI.
> processSegment(DictionaryGeneratorCLI.java:41)
>
>  at org.apache.kylin.engine.mr.steps.CreateDictionaryJob.run(
> CreateDictionaryJob.java:56)
>
>  at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>
>  at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
>
>  at org.apache.kylin.engine.mr.common.HadoopShellExecutable.
> doWork(HadoopShellExecutable.java:63)
>
>  at org.apache.kylin.job.execution.AbstractExecutable.
> execute(AbstractExecutable.java:113)
>
>  at org.apache.kylin.job.execution.DefaultChainedExecutable.
> doWork(DefaultChainedExecutable.java:57)
>
>  at org.apache.kylin.job.execution.AbstractExecutable.
> execute(AbstractExecutable.java:113)
>
>  at org.apache.kylin.job.impl.threadpool.DefaultScheduler$
> JobRunner.run(DefaultScheduler.java:136)
>
>  at java.util.concurrent.ThreadPoolExecutor.runWorker(
> ThreadPoolExecutor.java:1145)
>
>  at java.util.concurrent.ThreadPoolExecutor$Worker.run(
> ThreadPoolExecutor.java:615)
>
>  at java.lang.Thread.run(Thread.java:745)
>
> Caused by: java.lang.RuntimeException: Too big dictionary, dictionary
> cannot be bigger than 2GB
>
>  at org.apache.kylin.dict.TrieDictionaryBuilder.buildTrieBytes(
> TrieDictionaryBuilder.java:421)
>
>  at org.apache.kylin.dict.TrieDictionaryBuilder.build(
> TrieDictionaryBuilder.java:408)
>
>  at org.apache.kylin.dict.DictionaryGenerator$
> StringDictBuilder.build(DictionaryGenerator.java:165)
>
>  at org.apache.kylin.dict.DictionaryGenerator.buildDictionary(
> DictionaryGenerator.java:81)
>
>  at org.apache.kylin.dict.DictionaryGenerator.buildDictionary(
> DictionaryGenerator.java:73)
>
>  at org.apache.kylin.dict.DictionaryManager.buildDictionary(
> DictionaryManager.java:321)
>
>  ... 14 more
>
>
>
>   The cardinality of "SALE_ORD_ID" is 157,644,463, but this column was not
> selected as a dimension.
>
>
>
>   In addition, I'm confused: is the data dictionary built over the full
> data set, or only over the data in the selected time range?
>
>
>
>
>
> Thank you~
>


Re: New document: "How to optimize cube build"

2017-01-25 Thread Alberto Ramón
Be careful about partitioning by "FLIGHTDATE".

From https://github.com/albertoRamon/Kylin/tree/master/KylinPerformance

*"Option 1: Use id_date as partition column on the Hive table. This has a big
problem: the Hive metastore is meant for a few hundred partitions, not
thousands (HIVE-9452, an idea to solve this, is not in progress)"*

Hive 2.0 will include a preview (for testing only) that addresses this.
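As an illustration of the point above, a coarser partition key keeps the metastore partition count in the hundreds. This HiveQL is only a sketch; the table and column names are made up.

```sql
-- Sketch: partition by month (~hundreds of partitions over several years)
-- instead of by day (thousands). All names here are hypothetical.
CREATE TABLE flights (
  flight_id  BIGINT,
  flightdate DATE,
  delay_min  INT
)
PARTITIONED BY (flight_month STRING)
STORED AS ORC;
```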

2017-01-25 9:46 GMT+01:00 ShaoFeng Shi :

> Hello,
>
> A new document is added for the practices of cube build. Any suggestion or
> comment is welcomed. We can update the doc later with feedbacks;
>
> Here is the link:
> https://kylin.apache.org/docs16/howto/howto_optimize_build.html
>
> --
> Best regards,
>
> Shaofeng Shi 史少锋
>
>


Re: Jekyll

2017-01-23 Thread Alberto Ramón
The error was that I had two versions of "jekyll-multiple-languages" installed.



2017-01-23 20:26 GMT+01:00 Alberto Ramón :

> I'm trying to add new doc to apache kylin
>
>
> jekyll 2.5.3 | Error:  undefined method `post_read' for class
> `Jekyll::Document'
>
> And this is true: https://github.com/jekyll/jekyll/blob/master/lib/jekyll/
> document.rb
>
> I used:
>
>   git init
>   git clone -b document --single-branch git://git.apache.org/kylin.git
>   cd … website
>   jekyll server
>
> install:
>
> gem uninstall --all
>
> sudo gem install jekyll --version "=2.5.3"
>
>   sudo gem install bundler
>
>   sudo gem install jekyll-multiple-languages kramdown rouge
>
>
>
> versions:
>   ruby 2.3.1p112
>   jekyll 2.5.3
>
>


Re: Jekyll

2017-01-23 Thread Alberto Ramón
The error was that I had two versions of "jekyll-multiple-languages" installed.

thanks

2017-01-23 20:26 GMT+01:00 Alberto Ramón :

> I'm trying to add new doc to apache kylin
>
>
> jekyll 2.5.3 | Error:  undefined method `post_read' for class
> `Jekyll::Document'
>
> And this is true: https://github.com/jekyll/jekyll/blob/master/lib/jekyll/
> document.rb
>
> I used:
>
>   git init
>   git clone -b document --single-branch git://git.apache.org/kylin.git
>   cd … website
>   jekyll server
>
> install:
>
> gem uninstall --all
>
> sudo gem install jekyll --version "=2.5.3"
>
>   sudo gem install bundler
>
>   sudo gem install jekyll-multiple-languages kramdown rouge
>
>
>
> versions:
>   ruby 2.3.1p112
>   jekyll 2.5.3
>
>


Jekyll

2017-01-23 Thread Alberto Ramón
I'm trying to add new doc to apache kylin


jekyll 2.5.3 | Error:  undefined method `post_read' for class
`Jekyll::Document'

And this is true:
https://github.com/jekyll/jekyll/blob/master/lib/jekyll/document.rb

I used:

  git init
  git clone -b document --single-branch git://git.apache.org/kylin.git
  cd … website
  jekyll server

install:

gem uninstall --all

sudo gem install jekyll --version "=2.5.3"

  sudo gem install bundler

  sudo gem install jekyll-multiple-languages kramdown rouge



versions:
  ruby 2.3.1p112
  jekyll 2.5.3


Re: Kylin and BI Tools

2017-01-18 Thread Alberto Ramón
Hello,
https://github.com/albertoRamon/Kylin/tree/master/KylinWithMain

*Changes*:
- Fixed Carabel to Caravel
- Added Zeppelin Reference
- Added Apache Flink

Thanks for all !!

2017-01-17 9:28 GMT+01:00 Alberto Ramón :

> Thanks Anton
> I will complete/fix my report with your suggestions.
>
> 2017-01-17 3:56 GMT+01:00 Anton Bubna-Litic  com.au>:
>
>> I have successfully used Zeppelin’s Kylin interpreter with Kylin 1.6 to
>> run sql queries. It was very straight forward to set up and run commands.
>>
>>
>> *From:* Alberto Ramón [mailto:a.ramonporto...@gmail.com]
>> *Sent:* Tuesday, 17 January 2017 01:49
>> *To:* user 
>> *Subject:* Re: Kylin and BI Tools
>>
>>
>>
>> Somebody has been tested this with last versions of Kylin?:
>> http://zeppelin.apache.org/docs/0.7.0-SNAPSHOT/interpreter/kylin.html
>>
>> If this work OK with Kylin 1.6 or 2.0, I can put a reference directly
>>
>>
>>
>> 2017-01-16 15:31 GMT+01:00 Billy Liu :
>>
>> I have interest on Zeppelin also, please refer to
>> http://zeppelin.apache.org/docs/0.7.0-SNAPSHOT/interpreter/kylin.html
>> first.
>>
>>
>>
>> 2017-01-16 19:14 GMT+08:00 Alberto Ramón :
>>
>> yes,
>>  - I will fix "Carabel" to "Caravel". (It is a shame that this project
>> is not updated, because the quality of the graphics are very good)
>>
>>  - A document about Kylin and Zeppelin would be interesting; I have this
>> in my ToDo list
>>
>>  - More suggestions? bugs ?
>>
>>
>>
>> Thanks !!
>>
>>
>>
>> 2017-01-16 9:37 GMT+01:00 Jian Zhong :
>>
>> very good document.
>>
>>
>>
>> I see "Kylin Carabel" section, maybe need to update to "Kylin Caravel"
>>
>>
>>
>> Thanks
>>
>>
>>
>> On Sun, Jan 1, 2017 at 6:53 AM, Alberto Ramón 
>> wrote:
>>
>> Happy 2017   :)
>>
>> I updated Kylin & BI tools with new notes:
>> https://github.com/albertoRamon/Kylin/tree/master/KylinWithMain
>>
>>
>>
>> 2016-09-28 1:30 GMT+02:00 Li Yang :
>>
>> Base on the great work, we could create more How-To page to add to Kylin
>> document section.
>>
>> Yang
>>
>>
>>
>> On Tue, Sep 20, 2016 at 9:03 AM, Luke Han  wrote:
>>
>> Very nice, thanks Alberto
>>
>>
>>
>>
>> Best Regards!
>> -
>>
>> Luke Han
>>
>>
>>
>> On Mon, Sep 19, 2016 at 10:21 PM, Billy(Yiming) Liu <
>> liuyiming@gmail.com> wrote:
>>
>> So cool, impressive. Thank you, Alberto.
>>
>>
>>
>> 2016-09-19 21:42 GMT+08:00 Alberto Ramón :
>>
>> Hello
>>
>> This is the end of all my previous articles, about Kylin and differents
>> tools
>> With some successful and some failures   :)
>>
>>
>> https://github.com/albertoRamon/Kylin/tree/master/KylinWithMain
>>
>>
>>
>> If you have any comment / improvement, feel free to indicate me the
>> changes
>>
>> A lot of thanks to the "Kylin Team", Alb
>>
>>
>>
>>
>>
>> --
>>
>> With Warm regards
>>
>> Yiming Liu (刘一鸣)
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>
>


Re: Problem with limit and joint aggregation

2017-01-17 Thread Alberto Ramón
Joint dims must be used for:
 - groups of dims with *very* low cardinality, e.g. IdCurrency (most bank
transactions use < 10 currencies)
 - columns with the same cardinality: Country_ID and Country_txt

Check Kylin's TopN feature to precalculate "SUM ... ORDER BY".
You can allocate more memory to the Kylin instance (for the ORDER BY step).
Please read the links I shared with you in the other question; there are some
useful tips and examples.
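The semantics behind this thread can be reproduced outside Kylin. A small pure-Python stand-in (with made-up data and column names) for what the cube does: LIMIT without ORDER BY returns an arbitrary set of groups, while the TopN idea is "order by the sum, descending, then limit":

```python
# Sketch: why LIMIT without ORDER BY gives arbitrary groups, and what a
# precomputed TopN returns instead. Data and keys are made up.
from collections import defaultdict

rows = [("a", 10), ("b", 5), ("a", 7), ("c", 20), ("b", 1)]

# GROUP BY key, SUM(metric) — what the cuboid precomputes
totals = defaultdict(int)
for key, metric in rows:
    totals[key] += metric

# LIMIT n with no ORDER BY: any n groups may come back (order is undefined)
some_groups = list(totals.items())[:2]

# TopN-style: ORDER BY the sum, descending, then limit — deterministic
top2 = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:2]
print(top2)  # [('c', 20), ('a', 17)]
```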

2017-01-17 12:37 GMT+01:00 Phong Pham :

> Hi all,
> I defined some dimensions, for example A, B, C, as a joint aggregation.
> When I executed the query:
>
> SELECT A,B,C, SUM(metrics) as metrics
> FROM table1
> WHERE DateStats <= x and DateStats >= x
> GROUP BY A,B,C
> LIMIT 250
>
> The query is very fast, but the metric value (from SUM(metrics)) only sums
> the data within the limit (250 rows). If I use ORDER BY, the results are
> correct but performance is very bad (if the total scan count is over 2–3
> million).
> Please explain to me this problem.
>
> Thanks.
>


Re:

2017-01-17 Thread Alberto Ramón
A - I sent you a bug (sorry).
Try to use *JOINT* dims with very low cardinality columns, perhaps: TypeID,
NetworkID, LanguageID, IsMobileDevice.


B - You put "DATESTATS" as mandatory, so I imagine you put it in the first
position of the row key.
Be careful with this: you may create hot and cold RegionServers.



2017-01-17 9:26 GMT+01:00 Alberto Ramón :

> Did you compress the output cube? This is very important (see the last link)
>
> About Order BY
>   - Check if TopN can solve your problem:
>  http://kylin.apache.org/blog/2016/03/19/approximate-topn-measure/
>   - Try to reorder the row key to put the ORDER BY columns in the first positions
>   - Try AGG : Make a "sub-cube" with less Dim
>  http://kylin.apache.org/blog/2016/02/18/new-aggregation-group/
>
> 2017-01-17 7:50 GMT+01:00 Phong Pham :
>
>> Hi Alberto,
>>After applying your suggestions, our queries improved a lot.
>> Thanks a lot.
>> However, we have problem with ORDER BY function. When we use ORDER BY
>> with a large data set (for example: with long date-range filter),
>> performance is very slow.
>> Result:
>> *User: ADMIN*
>> *Success: true*
>> *Duration: 23.311*
>> *Project: metrixa_global_database_new*
>> *Realization Names: [account_global_convtrack_summary_daily_by_location]*
>> *Cuboid Ids: [135]*
>> *Total scan count: 2595584*
>> *Result row count: 250*
>> *Accept Partial: true*
>> *Is Partial Result: false*
>> *Hit Exception Cache: false*
>> *Storage cache used: false*
>> *Message: null*
>>
>> ORDER BY performance goes down when Total Scan Count is big. So how can i
>> improve this problem?
>> Thanks
>>
>>
>> 2017-01-16 18:45 GMT+07:00 Alberto Ramón :
>>
>>> Hi Phon, I'm not expert but I have some suggestions:
>>>
>>> - All Dim en are using Dict: you can change a lot to Integer (Fix
>>> length)
>>> - Re-Order row key its a good idea. I always try to first fields of key
>>> have Fix Length. Put mandatory the First its a good Idea
>>> - See hierarchy optimizations, will be very interesting for you:
>>> Country, Region, City, site . Perhaps Company  and Account also can be
>>> included (I don't know your data)
>>> - If you use Left join, the first step of building cube (flat table)
>>> will be more slow
>>> - Check if your ORC input table is compressed
>>> - Try to use derived DIm with very low cardinality columns, perhaps:
>>> TypeID, NetworkID, LanguajeID, IsMovileDevice.
>>>I understand that Affiliated, Account, Company, ... will growth in
>>> the future, because you are working with test data ?
>>>
>>> Check this references:
>>> http://kylin.apache.org/docs/howto/howto_optimize_cubes.html
>>> http://mail-archives.apache.org/mod_mbox/kylin-user/201611.mbox
>>> /%3Ctencent_F5A1E061EFFB778CC5BF9909%40qq.com%3E
>>> http://mail-archives.apache.org/mod_mbox/kylin-user/201607.mbox
>>> /%3C004201d1d4ef%240151b7e0%2403f527a0%24%40fishbowl.com%3E
>>> http://mail-archives.apache.org/mod_mbox/kylin-user/201612.mbox
>>> /%3CCAEcyM171RGhk0QoXJUjjZJeSxXwgUGu0vO%2B_T71KXMU1k00L%2Bg%
>>> 40mail.gmail.com%3E
>>> Check this tunning example:  https://github.com/albertoRamon/Kylin
>>> /tree/master/KylinPerformance
>>>
>>> BR, Alb
>>>
>>>
>>> 2017-01-16 3:47 GMT+01:00 Phong Pham :
>>>
>>>> Hi all,
>>>> Hi all,
>>>>* We still meet problems with query performance. Here is the cube
>>>> info of one cube*:
>>>> {
>>>>  "uuid": "6b2f4643-72a3-4a51-b9f2-47aa8e1322a5",
>>>>  "last_modified": 1484533219336,
>>>>  "version": "1.6.0",
>>>>  "name": "account_global_convtrack_summary_daily_test",
>>>>  "owner": "ADMIN",
>>>>  "descriptor": "account_global_convtrack_summary_daily_test",
>>>>  "cost": 50,
>>>>  "status": "READY",
>>>>  "segments": [
>>>> {
>>>>  "uuid": "85fa970e-6808-47c8-ae35-45d1975bb3bc",
>>>>  "name": "2016010100_2016122600",
>>>>  "storage_location_identifier": "KYLIN_7E4KIJ3YGX",
>>>>  "date_range_start": 145160640,
>>>>  "date_range_end": 148271040,
>>>>  "source_offset_start": 0,
>>>>  "source_o

Re: Kylin and BI Tools

2017-01-17 Thread Alberto Ramón
Thanks Anton
I will complete/fix my report with your suggestions.

2017-01-17 3:56 GMT+01:00 Anton Bubna-Litic <
anton.bubna-li...@quantium.com.au>:

> I have successfully used Zeppelin’s Kylin interpreter with Kylin 1.6 to
> run sql queries. It was very straight forward to set up and run commands.
>
>
> *From:* Alberto Ramón [mailto:a.ramonporto...@gmail.com]
> *Sent:* Tuesday, 17 January 2017 01:49
> *To:* user 
> *Subject:* Re: Kylin and BI Tools
>
>
>
> Somebody has been tested this with last versions of Kylin?:
> http://zeppelin.apache.org/docs/0.7.0-SNAPSHOT/interpreter/kylin.html
>
> If this work OK with Kylin 1.6 or 2.0, I can put a reference directly
>
>
>
> 2017-01-16 15:31 GMT+01:00 Billy Liu :
>
> I have interest on Zeppelin also, please refer to
> http://zeppelin.apache.org/docs/0.7.0-SNAPSHOT/interpreter/kylin.html
> first.
>
>
>
> 2017-01-16 19:14 GMT+08:00 Alberto Ramón :
>
> yes,
>  - I will fix "Carabel" to "Caravel". (It is a shame that this project is
> not updated, because the quality of the graphics are very good)
>
>  - A document about Kylin and Zeppelin would be interesting; I have this in
> my ToDo list
>
>  - More suggestions? bugs ?
>
>
>
> Thanks !!
>
>
>
> 2017-01-16 9:37 GMT+01:00 Jian Zhong :
>
> very good document.
>
>
>
> I see "Kylin Carabel" section, maybe need to update to "Kylin Caravel"
>
>
>
> Thanks
>
>
>
> On Sun, Jan 1, 2017 at 6:53 AM, Alberto Ramón 
> wrote:
>
> Happy 2017   :)
>
> I updated Kylin & BI tools with new notes:
> https://github.com/albertoRamon/Kylin/tree/master/KylinWithMain
>
>
>
> 2016-09-28 1:30 GMT+02:00 Li Yang :
>
> Base on the great work, we could create more How-To page to add to Kylin
> document section.
>
> Yang
>
>
>
> On Tue, Sep 20, 2016 at 9:03 AM, Luke Han  wrote:
>
> Very nice, thanks Alberto
>
>
>
>
> Best Regards!
> -
>
> Luke Han
>
>
>
> On Mon, Sep 19, 2016 at 10:21 PM, Billy(Yiming) Liu <
> liuyiming@gmail.com> wrote:
>
> So cool, impressive. Thank you, Alberto.
>
>
>
> 2016-09-19 21:42 GMT+08:00 Alberto Ramón :
>
> Hello
>
> This is the end of all my previous articles about Kylin and different
> tools,
> with some successes and some failures   :)
>
>
> https://github.com/albertoRamon/Kylin/tree/master/KylinWithMain
>
>
>
> If you have any comment / improvement, feel free to indicate me the changes
>
> A lot of thanks to the "Kylin Team", Alb
>
>
>
>
>
> --
>
> With Warm regards
>
> Yiming Liu (刘一鸣)
>
>


Re:

2017-01-17 Thread Alberto Ramón
Did you compress the output cube? This is very important (see the last link)

About ORDER BY:
  - Check whether TopN can solve your problem:
 http://kylin.apache.org/blog/2016/03/19/approximate-topn-measure/
  - Try to reorder the row key to put the ORDER BY columns in the first positions
  - Try AGG: make a "sub-cube" with fewer dims
 http://kylin.apache.org/blog/2016/02/18/new-aggregation-group/
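As a sketch of the AGG idea, an aggregation-group fragment of a cube descriptor could look like the following. The field names follow Kylin's cube-desc JSON; the dimension names are assumptions taken loosely from this thread, not the user's actual model.

```json
"aggregation_groups": [
  {
    "includes": ["DATESTATS", "COUNTRYID", "REGIONID", "CITYID",
                 "TYPE", "NETWORKID"],
    "select_rule": {
      "mandatory_dims": ["DATESTATS"],
      "hierarchy_dims": [["COUNTRYID", "REGIONID", "CITYID"]],
      "joint_dims": [["TYPE", "NETWORKID"]]
    }
  }
]
```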

2017-01-17 7:50 GMT+01:00 Phong Pham :

> Hi Alberto,
> After applying your suggestions, our queries improved a lot.
> Thanks a lot.
> However, we have a problem with the ORDER BY function. When we use ORDER BY
> with a large data set (for example, with a long date-range filter),
> performance is very slow.
> Result:
> *User: ADMIN*
> *Success: true*
> *Duration: 23.311*
> *Project: metrixa_global_database_new*
> *Realization Names: [account_global_convtrack_summary_daily_by_location]*
> *Cuboid Ids: [135]*
> *Total scan count: 2595584*
> *Result row count: 250*
> *Accept Partial: true*
> *Is Partial Result: false*
> *Hit Exception Cache: false*
> *Storage cache used: false*
> *Message: null*
>
> ORDER BY performance goes down when the total scan count is big. How can I
> improve this?
> Thanks
>
>
> 2017-01-16 18:45 GMT+07:00 Alberto Ramón :
>
>> Hi Phon, I'm not expert but I have some suggestions:
>>
>> - All Dim en are using Dict: you can change a lot to Integer (Fix length)
>> - Re-Order row key its a good idea. I always try to first fields of key
>> have Fix Length. Put mandatory the First its a good Idea
>> - See hierarchy optimizations, will be very interesting for you:
>> Country, Region, City, site . Perhaps Company  and Account also can be
>> included (I don't know your data)
>> - If you use Left join, the first step of building cube (flat table) will
>> be more slow
>> - Check if your ORC input table is compressed
>> - Try to use derived DIm with very low cardinality columns, perhaps:
>> TypeID, NetworkID, LanguajeID, IsMovileDevice.
>>I understand that Affiliated, Account, Company, ... will growth in
>> the future, because you are working with test data ?
>>
>> Check this references:
>> http://kylin.apache.org/docs/howto/howto_optimize_cubes.html
>> http://mail-archives.apache.org/mod_mbox/kylin-user/201611.mbox
>> /%3Ctencent_F5A1E061EFFB778CC5BF9909%40qq.com%3E
>> http://mail-archives.apache.org/mod_mbox/kylin-user/201607.mbox
>> /%3C004201d1d4ef%240151b7e0%2403f527a0%24%40fishbowl.com%3E
>> http://mail-archives.apache.org/mod_mbox/kylin-user/201612.mbox
>> /%3CCAEcyM171RGhk0QoXJUjjZJeSxXwgUGu0vO%2B_T71KXMU1k00L%2Bg%
>> 40mail.gmail.com%3E
>> Check this tunning example:  https://github.com/albertoRamon/Kylin
>> /tree/master/KylinPerformance
>>
>> BR, Alb
>>
>>
>> 2017-01-16 3:47 GMT+01:00 Phong Pham :
>>
>>> Hi all,
>>> Hi all,
>>>* We still meet problems with query performance. Here is the cube
>>> info of one cube*:
>>> {
>>>  "uuid": "6b2f4643-72a3-4a51-b9f2-47aa8e1322a5",
>>>  "last_modified": 1484533219336,
>>>  "version": "1.6.0",
>>>  "name": "account_global_convtrack_summary_daily_test",
>>>  "owner": "ADMIN",
>>>  "descriptor": "account_global_convtrack_summary_daily_test",
>>>  "cost": 50,
>>>  "status": "READY",
>>>  "segments": [
>>> {
>>>  "uuid": "85fa970e-6808-47c8-ae35-45d1975bb3bc",
>>>  "name": "2016010100_2016122600",
>>>  "storage_location_identifier": "KYLIN_7E4KIJ3YGX",
>>>  "date_range_start": 145160640,
>>>  "date_range_end": 148271040,
>>>  "source_offset_start": 0,
>>>  "source_offset_end": 0,
>>>  "status": "READY",
>>>  "size_kb": 9758001,
>>>  "input_records": 8109122,
>>>  "input_records_size": 102078756,
>>>  "last_build_time": 1484533219335,
>>>  "last_build_job_id": "a4f67403-17cb-4474-84d1-21ad64ed17a8",
>>>  "create_time_utc": 1484527504660,
>>>  "cuboid_shard_nums": {},
>>>  "total_shards": 4,
>>>  "blackout_cuboids": [],
>>>  "binary_signature": null,
>>>  "dictionaries": {
>>> "METRIXA_GLOBAL_DATABASE.ACCOUNT_GLOBAL_CONVTRACK_SUMMARY_DAILY_ORC/CIT

Re: Kylin and BI Tools

2017-01-16 Thread Alberto Ramón
Has anybody tested this with the latest versions of Kylin?
http://zeppelin.apache.org/docs/0.7.0-SNAPSHOT/interpreter/kylin.html

If this works OK with Kylin 1.6 or 2.0, I can add a reference directly
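Under the hood, Zeppelin's Kylin interpreter talks to Kylin's REST query endpoint. A hedged sketch of the request it builds (the endpoint path and JSON field names follow Kylin's REST API; the host, project, and table are assumptions, and nothing is actually sent):

```python
# Sketch: build the POST /kylin/api/query request a REST client would send.
# Host/port, project, and table names are assumptions.
import json
from base64 import b64encode

def kylin_query_request(sql, project, user="ADMIN", password="KYLIN"):
    """Return (url, headers, body) for Kylin's query endpoint; does not send."""
    token = b64encode(f"{user}:{password}".encode()).decode()
    headers = {
        "Authorization": f"Basic {token}",   # Kylin uses HTTP Basic auth
        "Content-Type": "application/json",
    }
    body = json.dumps({"sql": sql, "project": project, "limit": 100})
    return "http://localhost:7070/kylin/api/query", headers, body

url, headers, body = kylin_query_request(
    "select count(*) from KYLIN_SALES", "learn_kylin")
```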

2017-01-16 15:31 GMT+01:00 Billy Liu :

> I have interest on Zeppelin also, please refer to
> http://zeppelin.apache.org/docs/0.7.0-SNAPSHOT/interpreter/kylin.html
> first.
>
> 2017-01-16 19:14 GMT+08:00 Alberto Ramón :
>
>> yes,
>>  - I will fix "Carabel" to "Caravel". (It is a shame that this project
>> is not updated, because the quality of the graphics are very good)
>>  - A document about Kylin and Zeppelin would be interesting; I have this
>> in my ToDo list
>>  - More suggestions? bugs ?
>>
>> Thanks !!
>>
>> 2017-01-16 9:37 GMT+01:00 Jian Zhong :
>>
>>> very good document.
>>>
>>> I see "Kylin Carabel" section, maybe need to update to "Kylin Caravel"
>>>
>>> Thanks
>>>
>>> On Sun, Jan 1, 2017 at 6:53 AM, Alberto Ramón >> > wrote:
>>>
>>>> Happy 2017   :)
>>>>
>>>> I updated Kylin & BI tools with new notes:
>>>> https://github.com/albertoRamon/Kylin/tree/master/KylinWithMain
>>>>
>>>>
>>>>
>>>> 2016-09-28 1:30 GMT+02:00 Li Yang :
>>>>
>>>>> Base on the great work, we could create more How-To page to add to
>>>>> Kylin document section.
>>>>>
>>>>> Yang
>>>>>
>>>>> On Tue, Sep 20, 2016 at 9:03 AM, Luke Han  wrote:
>>>>>
>>>>>> Very nice, thanks Alberto
>>>>>>
>>>>>>
>>>>>> Best Regards!
>>>>>> -
>>>>>>
>>>>>> Luke Han
>>>>>>
>>>>>> On Mon, Sep 19, 2016 at 10:21 PM, Billy(Yiming) Liu <
>>>>>> liuyiming@gmail.com> wrote:
>>>>>>
>>>>>>> So cool, impressive. Thank you, Alberto.
>>>>>>>
>>>>>>> 2016-09-19 21:42 GMT+08:00 Alberto Ramón 
>>>>>>> :
>>>>>>>
>>>>>>>> Hello
>>>>>>>>
>>>>>>>> This is the end of all my previous articles, about Kylin and
>>>>>>>> differents tools
>>>>>>>> With some successful and some failures   :)
>>>>>>>>
>>>>>>>>
>>>>>>>> https://github.com/albertoRamon/Kylin/tree/master/KylinWithMain
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> If you have any comment / improvement, feel free to indicate me the
>>>>>>>> changes
>>>>>>>> A lot of thanks to the "Kylin Team", Alb
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> With Warm regards
>>>>>>>
>>>>>>> Yiming Liu (刘一鸣)
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>


Re:

2017-01-16 Thread Alberto Ramón
Hi Phong, I'm not an expert, but I have some suggestions:

- All dims are using Dict encoding: you can change many of them to integer
(fixed length)
- Reordering the row key is a good idea. I always try to put fixed-length
fields first in the key. Putting the mandatory dim first is a good idea
- See the hierarchy optimizations; they will be very interesting for you:
Country, Region, City, Site. Perhaps Company and Account can also be included
(I don't know your data)
- If you use a left join, the first step of the cube build (flat table) will
be slower
- Check whether your ORC input table is compressed
- Try to use derived dims with very low cardinality columns, perhaps: TypeID,
NetworkID, LanguageID, IsMobileDevice.
   I understand that Affiliated, Account, Company, ... will grow in the
future, because you are working with test data?

Check these references:
http://kylin.apache.org/docs/howto/howto_optimize_cubes.html
http://mail-archives.apache.org/mod_mbox/kylin-user/201611.mbox
/%3Ctencent_F5A1E061EFFB778CC5BF9909%40qq.com%3E
http://mail-archives.apache.org/mod_mbox/kylin-user/201607.mbox
/%3C004201d1d4ef%240151b7e0%2403f527a0%24%40fishbowl.com%3E
http://mail-archives.apache.org/mod_mbox/kylin-user/201612.mbox
/%3CCAEcyM171RGhk0QoXJUjjZJeSxXwgUGu0vO%2B_T71KXMU1k00L%2Bg%40mail.gmail.com
%3E
Check this tuning example:  https://github.com/albertoRamon/Kylin
/tree/master/KylinPerformance

BR, Alb


2017-01-16 3:47 GMT+01:00 Phong Pham :

> Hi all,
> Hi all,
>* We still meet problems with query performance. Here is the cube info
> of one cube*:
> {
>  "uuid": "6b2f4643-72a3-4a51-b9f2-47aa8e1322a5",
>  "last_modified": 1484533219336,
>  "version": "1.6.0",
>  "name": "account_global_convtrack_summary_daily_test",
>  "owner": "ADMIN",
>  "descriptor": "account_global_convtrack_summary_daily_test",
>  "cost": 50,
>  "status": "READY",
>  "segments": [
> {
>  "uuid": "85fa970e-6808-47c8-ae35-45d1975bb3bc",
>  "name": "2016010100_2016122600",
>  "storage_location_identifier": "KYLIN_7E4KIJ3YGX",
>  "date_range_start": 145160640,
>  "date_range_end": 148271040,
>  "source_offset_start": 0,
>  "source_offset_end": 0,
>  "status": "READY",
>  "size_kb": 9758001,
>  "input_records": 8109122,
>  "input_records_size": 102078756,
>  "last_build_time": 1484533219335,
>  "last_build_job_id": "a4f67403-17cb-4474-84d1-21ad64ed17a8",
>  "create_time_utc": 1484527504660,
>  "cuboid_shard_nums": {},
>  "total_shards": 4,
>  "blackout_cuboids": [],
>  "binary_signature": null,
>  "dictionaries": {
> "METRIXA_GLOBAL_DATABASE.ACCOUNT_GLOBAL_CONVTRACK_SUMMARY_DAILY_ORC/CITYID":
> "/dict/METRIXA_GLOBAL_DATABASE.ACCOUNT_GLOBAL_CONVTRACK_SUMMARY_DAILY_ORC/
> CITYID/0015e15c-9336-4040-b8ad-b7afba71d51c.dict",
> "METRIXA_GLOBAL_DATABASE.ACCOUNT_GLOBAL_CONVTRACK_SUMMARY_DAILY_ORC/TYPE":
> "/dict/METRIXA_GLOBAL_DATABASE.ACCOUNT_GLOBAL_CONVTRACK_SUMMARY_DAILY_ORC/
> TYPE/56cc3576-3c19-40fb-8704-29dba88e3511.dict",
> "METRIXA_GLOBAL_DATABASE.ACCOUNT_GLOBAL_CONVTRACK_SUMMARY_DAILY_ORC/NETWORKID":
> "/dict/METRIXA_GLOBAL_DATABASE.ACCOUNT_GLOBAL_CONVTRACK_SUMMARY_DAILY_ORC/
> NETWORKID/edc1b900-8b8a-4834-a8ab-4d23e0087d61.dict",
> "METRIXA_GLOBAL_DATABASE.ACCOUNT_GLOBAL_CONVTRACK_SUMMARY_DAILY_ORC/WEEKGROUP":
> "/dict/METRIXA_GLOBAL_DATABASE.ACCOUNT_GLOBAL_CONVTRACK_SUMMARY_DAILY_ORC/
> WEEKGROUP/3c3ae7e2-05a0-49a3-b396-ded7b1faaebd.dict",
> "METRIXA_GLOBAL_DATABASE.ACCOUNT_GLOBAL_CONVTRACK_SUMMARY_DAILY_ORC/DATESTATSBIGINT":
> "/dict/METRIXA_GLOBAL_DATABASE.ACCOUNT_GLOBAL_CONVTRACK_SUMMARY_DAILY_ORC/
> DATESTATSBIGINT/b2003335-f10c-48b5-ac98-6d2ddd25854b.dict",
> "METRIXA_GLOBAL_DATABASE.ACCOUNT_GLOBAL_CONVTRACK_SUMMARY_DAILY_ORC/COUNTRYID":
> "/dict/METRIXA_GLOBAL_DATABASE.ACCOUNT_GLOBAL_CONVTRACK_SUMMARY_DAILY_ORC/
> COUNTRYID/233a3b35-9e0f-46e3-bb01-3330c907ab33.dict",
> "METRIXA_GLOBAL_DATABASE.ACCOUNT_GLOBAL_CONVTRACK_SUMMARY_DAILY_ORC/ACCOUNTID":
> "/dict/METRIXA_GLOBAL_DATABASE.ACCOUNT_GLOBAL_CONVTRACK_SUMMARY_DAILY_ORC/
> ACCOUNTID/612d8a57-8ed8-4fdd-bf99-c64fb2a583fe.dict",
> "METRIXA_GLOBAL_DATABASE.ACCOUNT_GLOBAL_CONVTRACK_SUMMARY_DAILY_ORC/DEVICEID":
> "/dict/METRIXA_GLOBAL_DATABASE.ACCOUNT_GLOBAL_CONVTRACK_SUMMARY_DAILY_ORC/
> DEVICEID/8813544c-aac3-4f26-849b-3e3d1b71d9e2.dict",
> "METRIXA_GLOBAL_DATABASE.ACCOUNT_GLOBAL_CONVTRACK_SUMMARY_DAILY_ORC/LANGUAGEID":
> "/dict/METRIXA_GLOBAL_DATABASE.ACCOUNT_GLOBAL_CONVTRACK_SUMMARY_DAILY_ORC/
> LANGUAGEID/02dea027-86cf-44e6-9bcf-9dbd4c33e54b.dict",
> "METRIXA_GLOBAL_DATABASE.ACCOUNT_GLOBAL_CONVTRACK_SUMMARY_DAILY_ORC/COMPANYID":
> "/dict/METRIXA_GLOBAL_DATABASE.ACCOUNT_GLOBAL_CONVTRACK_SUMMARY_DAILY_ORC/
> COMPANYID/75a5566e-b419-4fc8-9184-757b207a35d2.dict",
> "METRIXA_GLOBAL_DATABASE.ACCOUNT_GLOBAL_CONVTRACK_SUMMARY_DAILY_ORC/REGIONID":
> "/dict/METRIXA_GLOBAL_DATABASE.ACCOUNT_GLOBAL_CONVTRACK_SUMMARY_DAILY_ORC/
> REGIONID/81d5b463-8639-4633-83b9-9ac9e43e32cb.dict",
> "METRIXA_GLOBAL_DATABASE.ACCOUNT_GLOBAL_CONVTRACK_
> SUMMARY_DAILY_ORC/AFFILIATEID": "/dict/METRIXA_GLOBAL_
> DATABASE.ACCOUNT_GLOBAL_CONVTR

Re: Kylin and BI Tools

2017-01-16 Thread Alberto Ramón
Yes,
 - I will fix "Carabel" to "Caravel". (It is a shame that this project is
not updated, because the quality of the graphics is very good)
 - A document about Kylin and Zeppelin would be interesting; I have this on
my ToDo list
 - More suggestions? Bugs?

Thanks !!

2017-01-16 9:37 GMT+01:00 Jian Zhong :

> very good document.
>
> I see "Kylin Carabel" section, maybe need to update to "Kylin Caravel"
>
> Thanks
>
> On Sun, Jan 1, 2017 at 6:53 AM, Alberto Ramón 
> wrote:
>
>> Happy 2017   :)
>>
>> I updated Kylin & BI tools with new notes:
>> https://github.com/albertoRamon/Kylin/tree/master/KylinWithMain
>>
>>
>>
>> 2016-09-28 1:30 GMT+02:00 Li Yang :
>>
>>> Base on the great work, we could create more How-To page to add to Kylin
>>> document section.
>>>
>>> Yang
>>>
>>> On Tue, Sep 20, 2016 at 9:03 AM, Luke Han  wrote:
>>>
>>>> Very nice, thanks Alberto
>>>>
>>>>
>>>> Best Regards!
>>>> -
>>>>
>>>> Luke Han
>>>>
>>>> On Mon, Sep 19, 2016 at 10:21 PM, Billy(Yiming) Liu <
>>>> liuyiming@gmail.com> wrote:
>>>>
>>>>> So cool, impressive. Thank you, Alberto.
>>>>>
>>>>> 2016-09-19 21:42 GMT+08:00 Alberto Ramón :
>>>>>
>>>>>> Hello
>>>>>>
>>>>>> This is the end of all my previous articles, about Kylin and
>>>>>> differents tools
>>>>>> With some successful and some failures   :)
>>>>>>
>>>>>>
>>>>>> https://github.com/albertoRamon/Kylin/tree/master/KylinWithMain
>>>>>>
>>>>>>
>>>>>>
>>>>>> If you have any comment / improvement, feel free to indicate me the
>>>>>> changes
>>>>>> A lot of thanks to the "Kylin Team", Alb
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> With Warm regards
>>>>>
>>>>> Yiming Liu (刘一鸣)
>>>>>
>>>>
>>>>
>>>
>>
>


Re: Kylin and BI Tools

2016-12-31 Thread Alberto Ramón
Happy 2017   :)

I updated Kylin & BI tools with new notes:
https://github.com/albertoRamon/Kylin/tree/master/KylinWithMain



2016-09-28 1:30 GMT+02:00 Li Yang :

> Base on the great work, we could create more How-To page to add to Kylin
> document section.
>
> Yang
>
> On Tue, Sep 20, 2016 at 9:03 AM, Luke Han  wrote:
>
>> Very nice, thanks Alberto
>>
>>
>> Best Regards!
>> -
>>
>> Luke Han
>>
>> On Mon, Sep 19, 2016 at 10:21 PM, Billy(Yiming) Liu <
>> liuyiming@gmail.com> wrote:
>>
>>> So cool, impressive. Thank you, Alberto.
>>>
>>> 2016-09-19 21:42 GMT+08:00 Alberto Ramón :
>>>
>>>> Hello
>>>>
>>>> This is the end of all my previous articles, about Kylin and different
>>>> tools,
>>>> with some successes and some failures   :)
>>>>
>>>>
>>>> https://github.com/albertoRamon/Kylin/tree/master/KylinWithMain
>>>>
>>>>
>>>>
>>>> If you have any comment / improvement, feel free to indicate me the
>>>> changes
>>>> A lot of thanks to the "Kylin Team", Alb
>>>>
>>>
>>>
>>>
>>> --
>>> With Warm regards
>>>
>>> Yiming Liu (刘一鸣)
>>>
>>
>>
>


Re: kylin query with case return error result

2016-12-27 Thread Alberto Ramón
See Calcite's syntax.

I think Agg(DISTINCT CASE ...) isn't allowed.
You can try with Agg(DISTINCT value) instead.
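In standard SQL the two queries should return the same number, because COUNT(DISTINCT ...) ignores the NULLs that the CASE produces for non-matching rows. A small Python sketch of the expected semantics over made-up rows (if the SQL is equivalent, a real mismatch points at the engine rather than the query):

```python
# Made-up sample rows: (pagefiltername, loginkey)
rows = [
    ("homepage", "u1"), ("homepage", "u2"),
    ("search",   "u1"), ("search",   "u3"),
    ("homepage", "u2"),
]

# Query1: COUNT(DISTINCT CASE WHEN pagefiltername = 'homepage'
#                             THEN loginkey END)
# The CASE yields NULL (None here) for non-matching rows, and SQL's
# COUNT(DISTINCT ...) skips NULLs, so we drop None from the set.
q1 = len({k if p == "homepage" else None for p, k in rows} - {None})

# Query2: COUNT(DISTINCT loginkey) ... WHERE pagefiltername = 'homepage'
q2 = len({k for p, k in rows if p == "homepage"})

print(q1, q2)  # 2 2, both count the distinct homepage loginkeys
```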

2016-12-27 9:41 GMT+01:00 Billy Liu :

> When you talk about mismatch result, you'd better provide the sample data
> and actual result. Otherwise, nobody could reproduce your issue easily.
>
> 2016-12-27 16:13 GMT+08:00 徐 鹏 :
>
>> HI all:
>> Query1:
>> SELECT  COUNT(DISTINCT CASE WHEN pagefiltername IN
>> (‘homepage') THEN t.loginkey END) AS homepageuv
>> FROM fly t
>> WHERE mmdd='20161222’
>> Query2:
>> SELECT COUNT(DISTINCT t.loginkey ) AS homepageuv FROM fly
>> t WHERE mmdd='20161222' and pagefiltername IN ('homepage') ;
>>
>> expected :Query1=Query2
>> actual:Query1 !=Query2
>>
>> What’s wrong?
>>
>>
>> Regards,
>> Peng Xu
>> xupeng1...@outlook.com
>>
>>
>>
>>
>>
>>
>>
>
>
>


Re: ArrayIndexOutOfBoundsException: -1

2016-12-26 Thread Alberto Ramón
(merry Christmas)

I found the error:
 *You can't have a column with the same name (cod_producto) in both the Dim
table and the Fact table*  ==> ERROR: java.lang.ArrayIndexOutOfBoundsException: -1
  (If you don't use this Dim in the Cube, there is no problem)
  Open a JIRA ??


I also discovered:
  In the Data Model, you can define the same column from the Fact table both
as a Dim and as a Measure
  Is this the desired behavior ??
  Open a JIRA ??



2016-12-23 0:44 GMT+01:00 Alberto Ramón :

>
> Error on, Extract Fact Table Distinct Columns
>
>
>
>
> *   Insane record: [1, 0600-160077, FVP  DAFUTURO - ESTABLE, COP, 11, 11, 
> Tipo de producto 11, 16.94579786]   java.lang.ArrayIndexOutOfBoundsException: 
> -1
>   at org.apache.kylin.engine.mr 
> <http://org.apache.kylin.engine.mr>.steps.FactDistinctHiveColumnsMapper.map(FactDistinctHiveColumnsMapper.java:140)*
>
>
>
> I see an extra column,  My DIM have 7 columns:
>
> *Original CSV: 7 columns*
> [image: Imágenes integradas 3]
>
>
> *On hive: 7 columns*
> [image: Imágenes integradas 1]
>
>
>
> *On DM: 7 columns*[image: Imágenes integradas 2]
>
>
>
> *On Cube: 7 columns*
>
>   "dimensions": [
> {
>   "name": "ID_PRODUCTO",
>   "table": "HERR_POSITIONS.DIM_PRODUCTOS",
>   "column": "ID_PRODUCTO",
>   "derived": null
> },
> {
>   "name": "COD_PRODUCTO",
>   "table": "HERR_POSITIONS.DIM_PRODUCTOS",
>   "column": "COD_PRODUCTO",
>   "derived": null
> },
> {
>   "name": "PRODUCTO_DESC",
>   "table": "HERR_POSITIONS.DIM_PRODUCTOS",
>   "column": "PRODUCTO_DESC",
>   "derived": null
> },
> {
>   "name": "CURRECY",
>   "table": "HERR_POSITIONS.DIM_PRODUCTOS",
>   "column": "CURRENCY",
>   "derived": null
> },
> {
>   "name": "ISIN",
>   "table": "HERR_POSITIONS.DIM_PRODUCTOS",
>   "column": "ISIN",
>   "derived": null
> },
> {
>   "name": "ID_TIPO_PRODUCTO",
>   "table": "HERR_POSITIONS.DIM_PRODUCTOS",
>   "column": "ID_TIPO_PRODUCTO",
>   "derived": null
> },
> {
>   "name": "TIPO_PRODUCTO_DESC",
>   "table": "HERR_POSITIONS.DIM_PRODUCTOS",
>   "column": "TIPO_PRODUCTO_DESC",
>   "derived": null
> }
>   ],
>
>


ArrayIndexOutOfBoundsException: -1

2016-12-22 Thread Alberto Ramón
Error on, Extract Fact Table Distinct Columns




   Insane record: [1, 0600-160077, FVP  DAFUTURO - ESTABLE, COP, 11,
11, Tipo de producto 11, 16.94579786]
   java.lang.ArrayIndexOutOfBoundsException: -1
   at org.apache.kylin.engine.mr.steps.FactDistinctHiveColumnsMapper.map(FactDistinctHiveColumnsMapper.java:140)



I see an extra column; my DIM has 7 columns:

*Original CSV: 7 columns*
[image: inline image 3]


*On Hive: 7 columns*
[image: inline image 1]



*On DM: 7 columns* [image: inline image 2]



*On Cube: 7 columns*

  "dimensions": [
{
  "name": "ID_PRODUCTO",
  "table": "HERR_POSITIONS.DIM_PRODUCTOS",
  "column": "ID_PRODUCTO",
  "derived": null
},
{
  "name": "COD_PRODUCTO",
  "table": "HERR_POSITIONS.DIM_PRODUCTOS",
  "column": "COD_PRODUCTO",
  "derived": null
},
{
  "name": "PRODUCTO_DESC",
  "table": "HERR_POSITIONS.DIM_PRODUCTOS",
  "column": "PRODUCTO_DESC",
  "derived": null
},
{
  "name": "CURRECY",
  "table": "HERR_POSITIONS.DIM_PRODUCTOS",
  "column": "CURRENCY",
  "derived": null
},
{
  "name": "ISIN",
  "table": "HERR_POSITIONS.DIM_PRODUCTOS",
  "column": "ISIN",
  "derived": null
},
{
  "name": "ID_TIPO_PRODUCTO",
  "table": "HERR_POSITIONS.DIM_PRODUCTOS",
  "column": "ID_TIPO_PRODUCTO",
  "derived": null
},
{
  "name": "TIPO_PRODUCTO_DESC",
  "table": "HERR_POSITIONS.DIM_PRODUCTOS",
  "column": "TIPO_PRODUCTO_DESC",
  "derived": null
}
  ],


Re: Joint and Order in RowKey

2016-12-21 Thread Alberto Ramón
Yes, but I understand that if (ID, TXT) is a Joint dim, in the drag-and-drop
you should see them together as one dim

2016-12-21 11:24 GMT+01:00 Li Yang :

> Maybe I didn't get the question. But the order of rowkey is adjustable by
> drag then move up and down...
>
> On Tue, Dec 20, 2016 at 2:46 AM, Alberto Ramón 
> wrote:
>
>> If we have these columns:
>> [image: Imágenes integradas 1]
>>
>> With There Joints:
>> [image: Imágenes integradas 3]
>>
>> *Why I cant  order these columns individually?*  (Text , Id) now must be
>> a tupple
>> [image: Imágenes integradas 4]
>>
>> (I accept suggestion about order, anyo=year)
>>
>
>


Re: if can add where clause to a measure?

2016-12-20 Thread Alberto Ramón
I never used it, but KYLIN-976
may be useful for you

2016-12-21 8:14 GMT+01:00 ZhouJie :

> hi, everyone
> i want to know if kylin can filter a column which has been measured, as
> follows:
> select sum(price) from hotprice_copy1 where price > 100.0 and price <5000.0
>
> thanks
> joe
>
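
For context on why this is hard for a cube: a pre-computed cuboid stores only aggregates, so a WHERE on the raw price cannot be applied after the fact; price would have to be a dimension (or handled by a custom measure such as the one discussed in KYLIN-976). A toy sketch with made-up numbers:

```python
# Hypothetical row-level prices that the source table holds
prices = [50.0, 120.0, 4999.0, 7000.0]

# What the query asks for: filter the rows first, then aggregate
filtered_sum = sum(p for p in prices if 100.0 < p < 5000.0)

# What a pre-built cuboid stores: only the final aggregate; the
# row-level values are gone, so the filter can no longer be applied
pre_aggregated = sum(prices)

print(filtered_sum)    # 5119.0
print(pre_aggregated)  # 12169.0, a different answer
```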


Re: How to workaround with columns with NULL value?

2016-12-20 Thread Alberto Ramón
About the 1st point: in KYLIN-2049
 there is a comment

from Shaofeng Shi

2016-12-21 6:32 GMT+01:00 Da Tong :

> Hi, all
>
> I am using kylin 1.6.0. I have met three problem:
>
> 1. in one of my Metrics, some of the values are NULL, when I tried to
> calculate the average of the column, the COUNT function will not filter out
> NULL value, which means the average result is biased. One solution I found
> is using another column to mark whether the value is NULL or not, but there
> are hundreds of columns like this. I don't think adding another hundreds of
> mark column as dimensions is a good way. Any suggestions about this
> situation?
>
> 2. I need to do filter using WHERE clause in some metrics columns, such as
> count rows that having value of one field over 100. It seems that I have to
> add new columns such as A_FIELD_OVER_100 to achieve this. But what if the
> *100* is a variable? User of our system need to filter out result based on
> metrics value, should I add metrics into dimensions? Is this requirement an
> uncommon case?
>
> 3. It seems that querying all-null columns issue is fixed in this issue
>  (Kylin 1527). But I
> still got NullPointerError from RawMesureType.valueOf method. I just want
> to make sure that Kylin support columns with all null values, right?
>
> Any suggestion is welcome. Thank you.
> --
> TONG, Da / 佟达
>
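
The bias in point 1 is just a matter of which count the average is divided by: a COUNT measure that counts rows (rather than non-null values) inflates the denominator. A minimal sketch with made-up values, None standing in for NULL:

```python
# Hypothetical metric column containing NULLs (None)
values = [10.0, None, 30.0, None, 20.0]

total = sum(v for v in values if v is not None)

# AVG computed as SUM / COUNT(*): the NULL rows inflate the denominator
biased_avg = total / len(values)                       # 60.0 / 5 = 12.0

# AVG computed as SUM / COUNT(col): NULLs are skipped, as SQL intends
avg = total / sum(1 for v in values if v is not None)  # 60.0 / 3 = 20.0

print(biased_avg, avg)
```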


Re: Error when #2 Step: Redistribute Flat Hive Table - File does not exist

2016-12-19 Thread Alberto Ramón
Another idea:
Could it be a problem with permissions? The user that executes Kylin may not
be able to read data generated by YARN.
Check whether the Kylin user can read your folder  /young/kylin_test/
Which Hadoop user is executing Kylin?

(no more ideas, Good Luck)

2016-12-20 7:51 GMT+01:00 雨日听风 <491245...@qq.com>:

> Thank you!
> We checked the yarn and hard disk. But not found any error. Hard disk
> space and memory and so on is working well.
> Last time its error code was "unknownhost clusterB",now in new server env
> it cant find clusterB(hbase only). but cant find rowCount file.
> ===
> the follow command runs ok:
> hdfs dfs -mkdir /young/kylin_test/kylin_metadata_nokia/
> kylin-678c15ba-5375-4f80-831e-1ae0af8ed576/row_count/tmp
> And "ls" cant find file "00_0"  which it said "file does not exist".
>
> -- 原始邮件 --
> *发件人:* "Alberto Ramón";;
> *发送时间:* 2016年12月19日(星期一) 晚上9:13
> *收件人:* "user";
> *主题:* Re: Error when #2 Step: Redistribute Flat Hive Table - File does
> not exist
>
> i think i had this error last nigth  :)
> (go to yarn to find detailed error & find on internet)
> in my case was free space less than 10% of hard disk. Check this please
>
> El 19/12/2016 11:35, "雨日听风" <491245...@qq.com> escribió:
>
>> When I build a cube in kylin1.6, I get error in step #2: Redistribute
>> Flat Hive Table
>>
>> Please help! Thank you very much!
>>
>> env: kylin1.6 is in a independent server, and have 2 other server
>> cluster: clusterA(hive only) and clusterB(hbase only).
>> Error is:
>>
>> 2016-12-19 10:28:00,641 INFO  [pool-8-thread-7]
>> execution.AbstractExecutable:36 : Compute row count of flat hive table,
>> cmd:
>> 2016-12-19 10:28:00,642 INFO  [pool-8-thread-7]
>> execution.AbstractExecutable:36 : hive -e "USE boco;
>> SET dfs.replication=2;
>> SET hive.exec.compress.output=true;
>> SET hive.auto.convert.join.noconditionaltask=true;
>> SET hive.auto.convert.join.noconditionaltask.size=1;
>> SET mapreduce.output.fileoutputformat.compress.type=BLOCK;
>> SET mapreduce.job.split.metainfo.maxsize=-1;
>> SET mapreduce.job.queuename=young;
>> SET tez.queue.name=young;
>>
>> set hive.exec.compress.output=false;
>>
>> set hive.exec.compress.output=false;
>> INSERT OVERWRITE DIRECTORY '/young/kylin_test/kylin_metad
>> ata_test/kylin-678266c0-ba0e-48b4-bdb5-6e578320375a/row_count' SELECT
>> count(*) FROM kylin_intermediate_hbase_in_testCluster_CUBE_f9468805_eabf_
>> 4b54_bf2b_182e4c86214a;
>>
>> "
>> 2016-12-19 10:28:03,277 INFO  [pool-8-thread-7]
>> execution.AbstractExecutable:36 : WARNING: Use "yarn jar" to launch YARN
>> applications.
>> 2016-12-19 10:28:04,444 INFO  [pool-8-thread-7]
>> execution.AbstractExecutable:36 :
>> 2016-12-19 10:28:04,445 INFO  [pool-8-thread-7]
>> execution.AbstractExecutable:36 : Logging initialized using
>> configuration in file:/etc/hive/conf/hive-log4j.properties
>> 2016-12-19 10:28:14,700 INFO  [pool-8-thread-7]
>> execution.AbstractExecutable:36 : OK
>> 2016-12-19 10:28:14,703 INFO  [pool-8-thread-7]
>> execution.AbstractExecutable:36 : Time taken: 0.935 seconds
>> 2016-12-19 10:28:15,559 INFO  [pool-8-thread-7]
>> execution.AbstractExecutable:36 : Query ID =
>> young_20161219102814_a7104fd4-ba83-47fc-ac0b-0c9bef4e1969
>> 2016-12-19 10:28:15,560 INFO  [pool-8-thread-7]
>> execution.AbstractExecutable:36 : Total jobs = 1
>> 2016-12-19 10:28:15,575 INFO  [pool-8-thread-7]
>> execution.AbstractExecutable:36 : Launching Job 1 out of 1
>> 2016-12-19 10:28:22,842 INFO  [pool-8-thread-7]
>> execution.AbstractExecutable:36 :
>> 2016-12-19 10:28:22,842 INFO  [pool-8-thread-7]
>> execution.AbstractExecutable:36 :
>> 2016-12-19 10:28:23,104 INFO  [pool-8-thread-7]
>> execution.AbstractExecutable:36 : Status: Running (Executing on YARN
>> cluster with App id application_1473415773736_1063281)
>> 2016-12-19 10:28:23,104 INFO  [pool-8-thread-7]
>> execution.AbstractExecutable:36 :
>> 2016-12-19 10:28:23,104 INFO  [pool-8-thread-7]
>> execution.AbstractExecutable:36 : Map 1: -/- Reducer 2: 0/1
>> 2016-12-19 10:28:23,307 INFO  [pool-8-thread-7]
>> execution.AbstractExecutable:36 : Map 1: 0/2 Reducer 2: 0/1
>> 2016-12-19 10:28:26,363 INFO  [pool-8-thread-7]
>> execution.AbstractExecutable:36 : Map 1: 0/2 Reducer 2: 0/1
>> 2016-12-19 10:28:26,567 INFO  [pool-8-thread-7]
>> execution.AbstractExecutable:36 : Map 1: 0(+1)/2 Reducer 2: 0/1
>> 2016-12-19 10:28:26,596 INFO  [pool-7-threa

Joint and Order in RowKey

2016-12-19 Thread Alberto Ramón
If we have these columns:
[image: inline image 1]

With these Joints:
[image: inline image 3]

*Why can't I order these columns individually?*  (Text, Id) must now be a
tuple
[image: inline image 4]

(I accept suggestions about the order; anyo = year)


Re: Error when #2 Step: Redistribute Flat Hive Table - File does not exist

2016-12-19 Thread Alberto Ramón
I think I had this error last night  :)
(Go to YARN to find the detailed error, and search for it on the internet.)
In my case, free space was less than 10% of the hard disk. Please check this.

El 19/12/2016 11:35, "雨日听风" <491245...@qq.com> escribió:

> When I build a cube in kylin1.6, I get error in step #2: Redistribute Flat
> Hive Table
>
> Please help! Thank you very much!
>
> env: kylin1.6 is in a independent server, and have 2 other server cluster:
> clusterA(hive only) and clusterB(hbase only).
> Error is:
>
> 2016-12-19 10:28:00,641 INFO  [pool-8-thread-7]
> execution.AbstractExecutable:36 : Compute row count of flat hive table,
> cmd:
> 2016-12-19 10:28:00,642 INFO  [pool-8-thread-7]
> execution.AbstractExecutable:36 : hive -e "USE boco;
> SET dfs.replication=2;
> SET hive.exec.compress.output=true;
> SET hive.auto.convert.join.noconditionaltask=true;
> SET hive.auto.convert.join.noconditionaltask.size=1;
> SET mapreduce.output.fileoutputformat.compress.type=BLOCK;
> SET mapreduce.job.split.metainfo.maxsize=-1;
> SET mapreduce.job.queuename=young;
> SET tez.queue.name=young;
>
> set hive.exec.compress.output=false;
>
> set hive.exec.compress.output=false;
> INSERT OVERWRITE DIRECTORY '/young/kylin_test/kylin_
> metadata_test/kylin-678266c0-ba0e-48b4-bdb5-6e578320375a/row_count'
> SELECT count(*) FROM kylin_intermediate_hbase_in_
> testCluster_CUBE_f9468805_eabf_4b54_bf2b_182e4c86214a;
>
> "
> 2016-12-19 10:28:03,277 INFO  [pool-8-thread-7]
> execution.AbstractExecutable:36 : WARNING: Use "yarn jar" to launch YARN
> applications.
> 2016-12-19 10:28:04,444 INFO  [pool-8-thread-7]
> execution.AbstractExecutable:36 :
> 2016-12-19 10:28:04,445 INFO  [pool-8-thread-7]
> execution.AbstractExecutable:36 : Logging initialized using configuration
> in file:/etc/hive/conf/hive-log4j.properties
> 2016-12-19 10:28:14,700 INFO  [pool-8-thread-7]
> execution.AbstractExecutable:36 : OK
> 2016-12-19 10:28:14,703 INFO  [pool-8-thread-7]
> execution.AbstractExecutable:36 : Time taken: 0.935 seconds
> 2016-12-19 10:28:15,559 INFO  [pool-8-thread-7]
> execution.AbstractExecutable:36 : Query ID =
> young_20161219102814_a7104fd4-ba83-47fc-ac0b-0c9bef4e1969
> 2016-12-19 10:28:15,560 INFO  [pool-8-thread-7]
> execution.AbstractExecutable:36 : Total jobs = 1
> 2016-12-19 10:28:15,575 INFO  [pool-8-thread-7]
> execution.AbstractExecutable:36 : Launching Job 1 out of 1
> 2016-12-19 10:28:22,842 INFO  [pool-8-thread-7]
> execution.AbstractExecutable:36 :
> 2016-12-19 10:28:22,842 INFO  [pool-8-thread-7]
> execution.AbstractExecutable:36 :
> 2016-12-19 10:28:23,104 INFO  [pool-8-thread-7]
> execution.AbstractExecutable:36 : Status: Running (Executing on YARN
> cluster with App id application_1473415773736_1063281)
> 2016-12-19 10:28:23,104 INFO  [pool-8-thread-7]
> execution.AbstractExecutable:36 :
> 2016-12-19 10:28:23,104 INFO  [pool-8-thread-7]
> execution.AbstractExecutable:36 : Map 1: -/- Reducer 2: 0/1
> 2016-12-19 10:28:23,307 INFO  [pool-8-thread-7]
> execution.AbstractExecutable:36 : Map 1: 0/2 Reducer 2: 0/1
> 2016-12-19 10:28:26,363 INFO  [pool-8-thread-7]
> execution.AbstractExecutable:36 : Map 1: 0/2 Reducer 2: 0/1
> 2016-12-19 10:28:26,567 INFO  [pool-8-thread-7]
> execution.AbstractExecutable:36 : Map 1: 0(+1)/2 Reducer 2: 0/1
> 2016-12-19 10:28:26,596 INFO  [pool-7-thread-1]
> threadpool.DefaultScheduler:118 : Job Fetcher: 1 should running, 1 actual
> running, 0 ready, 0 already succeed, 3 error, 1 discarded, 0 others
> 2016-12-19 10:28:26,769 INFO  [pool-8-thread-7]
> execution.AbstractExecutable:36 : Map 1: 0(+2)/2 Reducer 2: 0/1
> 2016-12-19 10:28:29,810 INFO  [pool-8-thread-7]
> execution.AbstractExecutable:36 : Map 1: 0(+2)/2 Reducer 2: 0/1
> 2016-12-19 10:28:30,217 INFO  [pool-8-thread-7]
> execution.AbstractExecutable:36 : Map 1: 1(+1)/2 Reducer 2: 0(+1)/1
> 2016-12-19 10:28:30,826 INFO  [pool-8-thread-7]
> execution.AbstractExecutable:36 : Map 1: 2/2 Reducer 2: 0(+1)/1
> 2016-12-19 10:28:31,232 INFO  [pool-8-thread-7]
> execution.AbstractExecutable:36 : Map 1: 2/2 Reducer 2: 1/1
> 2016-12-19 10:28:31,319 INFO  [pool-8-thread-7]
> execution.AbstractExecutable:36 : Moving data to: /young/kylin_test/kylin_
> metadata_test/kylin-678266c0-ba0e-48b4-bdb5-6e578320375a/row_count
> 2016-12-19 10:28:31,406 INFO  [pool-8-thread-7]
> execution.AbstractExecutable:36 : OK
> 2016-12-19 10:28:31,454 INFO  [pool-8-thread-7]
> execution.AbstractExecutable:36 : Time taken: 16.701 seconds
> 2016-12-19 10:28:35,074 ERROR [pool-8-thread-7]
> execution.AbstractExecutable:357 : job:678266c0-ba0e-48b4-bdb5-6e578320375a-01
> execute finished with exception
> java.io.FileNotFoundException: File does not exist:
> /young/kylin_test/kylin_metadata_test/kylin-678266c0-
> ba0e-48b4-bdb5-6e578320375a/row_count/00_0
>  at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(
> INodeFile.java:71)
>  at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(
> INodeFile.java:61)
>  at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.
> getBloc

Relocate SQL Query ?

2016-12-12 Thread Alberto Ramón
Hello

From my point of view, the SQL tab would fit better under Data
Model than under Cube

Cube tabs:

[image: inline image 1]

Data Model tabs:
[image: inline image 2]


Re: Cut Size

2016-12-12 Thread Alberto Ramón
"it will do a cap": I don't know what this cap means  :)

Then what is the function of "kylin.storage.hbase.hfile-size-gb=2"?

2016-12-12 2:58 GMT+01:00 ShaoFeng Shi :

> when you have hfile-size-gb, you re-split HFile using max-region-count and
> region-cut-gb ?
>
> --> Yes; Kylin will estimate the total size, then divide by
> "regino-cut-gb" to get the region number; If the region number exceeds
> "max-region-count", it will do a cap.
>
> Medium , small, . ..  is deprecated (KYLIN-1669
> <https://issues.apache.org/jira/browse/KYLIN-1669>)?
> --> Yes, that marker has been removed; Will use same split configuration
> for all cubes; If user want to customize, he can overwrite the config
> values at cube level.
>
> 2016-12-08 21:27 GMT+08:00 Alberto Ramón :
>
>> I'm reading this MailList
>> <http://apache-kylin.74782.x6.nabble.com/Update-default-config-for-sandbox-environment-td6561.html>
>> and have some doubts (Example
>> <https://github.com/apache/kylin/blob/master/examples/test_case_data/sandbox/kylin.properties#L99>
>> ):
>>
>> region-cut-gb
>> max-region-count
>> hfile-size-gb
>>
>> when you have hfile-size-gb, you re-split HFile using max-region-count
>> and region-cut-gb ?? or is for normal ingest, Kylin 1323?
>>
>> Medium , small, . ..  is deprecated (KYLIN-1669
>> <https://issues.apache.org/jira/browse/KYLIN-1669>)? "# E.g, for cube
>> whose capacity be marked as "SMALL", split region per 10GB by default"
>> (From Example)
>>
>
>
>
> --
> Best regards,
>
> Shaofeng Shi 史少锋
>
>
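
The split rule Shaofeng describes (divide the estimated cube size by region-cut-gb, then cap at max-region-count) can be sketched as follows. This is an illustration of the described arithmetic only, not Kylin's actual code:

```python
import math

def estimate_region_count(total_size_gb, region_cut_gb, max_region_count):
    """Estimate HBase regions: size / region-cut-gb, capped at the max."""
    regions = math.ceil(total_size_gb / region_cut_gb)
    return min(regions, max_region_count)

print(estimate_region_count(50, 5, 500))    # 10 regions
print(estimate_region_count(5000, 5, 500))  # 1000 needed -> capped at 500
```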


Re: Use derived or Joint

2016-12-12 Thread Alberto Ramón
Thanks, your notes about Hierarchy are very good and important

2016-12-12 2:49 GMT+01:00 ShaoFeng Shi :

> " define "year" - "IDDate" as a hierarchy", which is the benefit ?
> --> The combination "year" + "IDDate" has the same line number as the
> combination "IDDate"; so aggregate from the former to the latter will not
> aggregate much; Then we can prune the later with the "hierarchy" to reduce
> the cube size;
>
> Nowadays, in derived columns, the Host column, is always the PK of table ?
> --> Yes
>
> 2016-12-10 20:25 GMT+08:00 Alberto Ramón :
>
>> thanks for you clear explanation !!
>>
>>
>> The only point that I can't understand is
>> " define "year" - "IDDate" as a hierarchy", which is the benefit ?
>>   [image: Imágenes integradas 1]
>> Where:
>>
>>-
>>
>>IDData is PK of Dim table, Unique & Identity
>>- Year is a Normal Dim --> I will have precalculated by years
>>
>>
>> Nowadays, in derived columns, the Host column, is always the PK of table ?
>>
>>
>>
>>
>> 2016-12-09 15:25 GMT+01:00 ShaoFeng Shi :
>>
>>> Hi Albert, I think you're raising a good question; Many users face such
>>> questions when using Kylin in their cases. Let me try to share some my
>>> cents.
>>>
>>> "Derived" or "Joint" ?
>>> These are two independent means in Kylin (they're not conflict). Using
>>> which depends on how these dimensions being used I think;
>>>
>>> Take the "IDDate" case you mentioned as an example; If most of you
>>> queries are aggregated at the PK/FK level (which is date), and user just
>>> want to in passing other fields like its "MonthTxt", "DayWeekTxt", defining
>>> them as "derived" will be very good.
>>>
>>> But if you also want to aggregate also at "MonthTxt" or "DayWeekTxt"
>>> level, defining them as "Derived" might not be good; Because Kylin need
>>> translates the condition of "MonthTxt" into a set of PK values ("IDDate"),
>>> and then query from Cube with these values, because the cube only
>>> pre-aggreated at "IDDate"; This will slow down the query; (Ofcause if your
>>> dataset is small it still be acceptable)
>>>
>>> Besides, defining "Year" - "DayWeek_ID" - "Month_ID" as hierarchy is not
>>> suggested, because they are not a hierarchy relationship, but in parallel
>>> here;  ("March" is not a child of "2016", it appears in every year)
>>>
>>> "Joint" can be used in two typical cases:
>>> 1) combine multiple ultra low cardinality dimensions
>>> 2) combine dimensions which has 1:1 (like "Month_ID" "Month_Txt") or
>>> close to 1:1 relationship (like "USER_ID", "USER_EMAIL")
>>>
>>> For case 1, I might design the cube in this way (assume you have the
>>> need to group by year, month, dayweek):
>>> 1) define all them as normal dimension
>>> 2) define "year" - "IDDate" as a hierarchy
>>> 3) define "Month_ID" + "Month_Txt" as a joint, because they're 1:1
>>> 4) define "DayWeek_ID" + "DayWeek_Txt" as a joint, because they're 1:1
>>>
>>>
>>> For case 2, I have the same suggestion as above.
>>>
>>> 2016-12-09 7:10 GMT+08:00 Alberto Ramón :
>>>
>>>> Typical case 1:
>>>>
>>>> *IDDate*
>>>>
>>>> *Month_ID*
>>>>
>>>> *Month_Txt*
>>>>
>>>> *DayWeek_ID*
>>>>
>>>> *DayWeek_Txt*
>>>>
>>>> *Year*
>>>>
>>>> 2016-03-01
>>>>
>>>> 3
>>>>
>>>> March
>>>>
>>>> 2
>>>>
>>>> Wendesday
>>>>
>>>> 2016
>>>>
>>>> 2016-03-02
>>>>
>>>> 3
>>>>
>>>> March
>>>>
>>>> 3
>>>>
>>>> Thursday
>>>>
>>>> 2016
>>>>
>>>> 2016-03-02
>>>>
>>>> 3
>>>>
>>>> March
>>>>
>>>> 4
>>>>
>>>> Friday
>>>>
>>>> 2016
>>>>
&

Cuboid WhiteList

2016-12-11 Thread Alberto Ramón
Hello

Kylin – 242 

Does the cuboid whitelist / partial cube still exist nowadays, or has it been
replaced by aggregation groups?


Re: Use derived or Joint

2016-12-10 Thread Alberto Ramón
Thanks for your clear explanation !!


The only point that I can't understand is
"define "year" - "IDDate" as a hierarchy": what is the benefit ?
  [image: inline image 1]
Where:

   - IDDate is the PK of the Dim table, unique & identity
   - Year is a normal Dim --> I will have it precalculated by year


Nowadays, for derived columns, is the host column always the PK of the table ?




2016-12-09 15:25 GMT+01:00 ShaoFeng Shi :

> Hi Albert, I think you're raising a good question; Many users face such
> questions when using Kylin in their cases. Let me try to share some my
> cents.
>
> "Derived" or "Joint" ?
> These are two independent means in Kylin (they're not conflict). Using
> which depends on how these dimensions being used I think;
>
> Take the "IDDate" case you mentioned as an example; If most of you queries
> are aggregated at the PK/FK level (which is date), and user just want to in
> passing other fields like its "MonthTxt", "DayWeekTxt", defining them as
> "derived" will be very good.
>
> But if you also want to aggregate also at "MonthTxt" or "DayWeekTxt"
> level, defining them as "Derived" might not be good; Because Kylin need
> translates the condition of "MonthTxt" into a set of PK values ("IDDate"),
> and then query from Cube with these values, because the cube only
> pre-aggreated at "IDDate"; This will slow down the query; (Ofcause if your
> dataset is small it still be acceptable)
>
> Besides, defining "Year" - "DayWeek_ID" - "Month_ID" as hierarchy is not
> suggested, because they are not a hierarchy relationship, but in parallel
> here;  ("March" is not a child of "2016", it appears in every year)
>
> "Joint" can be used in two typical cases:
> 1) combine multiple ultra low cardinality dimensions
> 2) combine dimensions which has 1:1 (like "Month_ID" "Month_Txt") or close
> to 1:1 relationship (like "USER_ID", "USER_EMAIL")
>
> For case 1, I might design the cube in this way (assume you have the need
> to group by year, month, dayweek):
> 1) define all them as normal dimension
> 2) define "year" - "IDDate" as a hierarchy
> 3) define "Month_ID" + "Month_Txt" as a joint, because they're 1:1
> 4) define "DayWeek_ID" + "DayWeek_Txt" as a joint, because they're 1:1
>
>
> For case 2, I have the same suggestion as above.
>
> 2016-12-09 7:10 GMT+08:00 Alberto Ramón :
>
>> Typical case 1:
>>
>> IDDate      Month_ID  Month_Txt  DayWeek_ID   DayWeek_Txt  Year
>> 2016-03-01  3         March      2            Wednesday    2016
>> 2016-03-02  3         March      3            Thursday     2016
>> 2016-03-02  3         March      4            Friday       2016
>>
>> IDDate is PK of Dim table and Unique
>>
>>
>> SOL 1: Uses Hierarchy and Derived from non PK column
>>
>>
>> Month_ID     Hierarchy 2   Normal 1
>> Month_Txt                  Derived 1
>> DayWeek_ID   Hierarchy 3   Normal 2
>> DayWeek_Txt                Derived 2
>> Year         Hierarchy 1   Normal 3
>>
>> Year > Month > Day
>>
>> Text are derived from ID (in month and Week)
>>
>> PB1: KYLIN-444 <https://issues.apache.org/jira/browse/KYLIN-444>
>>
>> PB2: I don't know how create Derived column from non PK with actual UI (Kylin
>> – 1313 <https://issues.apache.org/jira/browse/KYLIN-1313> v1.5.2 Kylin
>> 1786 <https://issues.apache.org/jira/browse/KYLIN-1786>, v1.5.3)
>>
>>
>>
>> SOL 2:
>>
>> Month_ID     Hierarchy 2   Join 1
>> Month_Txt                  Join 1
>> DayWeek_ID   Hierarchy 3   Join 2
>> DayWeek_Txt                Join 2
>> Year         Hierarchy 1   Normal 3
>>
>>
>> Is SOL 2 the best solution?
>>
>>
>>
>> Typical case 2:
>>
>> I see the same scenario a lot of times (derived columns with 1:1 Relation)
>>
>> Product_ID *(PK)*
>>
>> Product_TXT
>>
>> TypeProduct_ID
>>
>> TypeProduct_TXT
>>
>> Country_TXT
>>
>> Country_ID
>>
>> Optimized queries by product / category / country are mandatory
>>
>> Perhaps Country (lower cardinality) is a good candidate for a Joint
>>
>> I don't want to put Product_TXT in a Joint, because it is a long text and
>> can affect the HBase rowkey, but I need queries like ... where product_TXT =
>> 'iRobot Roomba 650 Robotic Vacuum Cleaner'
>>
>> suggestions ?
>>
>
>
>
> --
> Best regards,
>
> Shaofeng Shi 史少锋
>
>


Use derived or Joint

2016-12-08 Thread Alberto Ramón
Typical case 1:

IDDate     | Month_ID | Month_Txt | DayWeek_ID | DayWeek_Txt | Year
2016-03-01 | 3        | March     | 2          | Wednesday   | 2016
2016-03-02 | 3        | March     | 3          | Thursday    | 2016
2016-03-02 | 3        | March     | 4          | Friday      | 2016

IDDate is PK of Dim table and Unique


SOL 1: Uses Hierarchy and Derived from non PK column


Column      | Hierarchy   | Dimension
Month_ID    | Hierarchy 2 | Normal 1
Month_Txt   |             | Derived 1
DayWeek_ID  | Hierarchy 3 | Normal 2
DayWeek_Txt |             | Derived 2
Year        | Hierarchy 1 | Normal 3

Year > Month > Day

Texts are derived from IDs (for month and weekday)

PB1: KYLIN-444 

PB2: I don't know how to create a Derived column from a non-PK column with the
current UI (KYLIN-1313, v1.5.2; KYLIN-1786, v1.5.3)



SOL 2:

Column      | Hierarchy   | Dimension
Month_ID    | Hierarchy 2 | Join 1
Month_Txt   |             | Join 1
DayWeek_ID  | Hierarchy 3 | Join 2
DayWeek_Txt |             | Join 2
Year        | Hierarchy 1 | Normal 3


Is SOL 2 the best solution?



Typical case 2:

I see the same scenario a lot of times (derived columns with 1:1 Relation)

Product_ID *(PK)*

Product_TXT

TypeProduct_ID

TypeProduct_TXT

Country_TXT

Country_ID

Optimized queries by product / category / country are mandatory

Perhaps Country (lower cardinality) is a good candidate for a Joint

I don't want to put Product_TXT in a Joint, because it is a long text and can
affect the HBase rowkey, but I need queries like ... where product_TXT =
'iRobot Roomba 650 Robotic Vacuum Cleaner'

suggestions ?
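To weigh SOL 1 against SOL 2, it helps to count the cuboid combinations each setting produces. A rough sketch of the counting rule (my own illustration of the documented behaviour, not Kylin code): each normal dimension doubles the combinations, a joint group toggles as one unit, and a hierarchy of depth h contributes h + 1 prefix choices.

```python
def cuboid_combinations(normal=0, joints=(), hierarchies=()):
    """Rough cuboid-combination count for one aggregation group.

    normal      -- number of plain dimensions (each on/off -> x2)
    joints      -- sizes of joint groups (each group toggles as one unit -> x2)
    hierarchies -- depths of hierarchies (h+1 prefix choices each)
    """
    total = 2 ** normal
    total *= 2 ** len(joints)      # each joint group is one on/off unit
    for h in hierarchies:
        total *= h + 1             # none, top level, top two levels, ...
    return total

# The five SOL columns as plain dimensions, vs. Year normal plus the
# two ID+Txt joints from SOL 2:
print(cuboid_combinations(normal=5))                  # 32
print(cuboid_combinations(normal=1, joints=(2, 2)))   # 8
```

The exact count in Kylin also accounts for mandatory dimensions and pruned invalid combinations, which this sketch ignores; it only shows why the joints shrink the cube.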


Cut Size

2016-12-08 Thread Alberto Ramón
I'm reading this mailing list thread and have some doubts (example):

region-cut-gb
max-region-count
hfile-size-gb

When you have hfile-size-gb, do you re-split the HFile using max-region-count
and region-cut-gb? Or is that only for the normal ingest path (KYLIN-1323)?

Are "medium", "small", ... deprecated (KYLIN-1669)? "# E.g, for cube whose
capacity be marked as "SMALL", split region per 10GB by default" (from the
example)
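My current reading of those parameters, as a hedged sketch (the names mirror the example config, but the arithmetic is my assumption about the semantics, not Kylin's source): the estimated cube size is split into regions of region-cut-gb each, capped by max-region-count.

```python
import math

def estimate_regions(cube_size_gb, region_cut_gb=5.0, max_region_count=500):
    """Illustrative only: divide the estimated cube size into regions of
    region_cut_gb each, and cap the result at max_region_count."""
    regions = math.ceil(cube_size_gb / region_cut_gb)
    return max(1, min(regions, max_region_count))

print(estimate_regions(42))        # 42 GB at 5 GB per region -> 9
print(estimate_regions(10_000))    # capped at max_region_count -> 500
```

If that reading is right, hfile-size-gb would then control a finer split of the HFiles inside those regions, which is exactly the doubt above.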


Re: Consulting "EXTENDED_COLUMN"

2016-12-02 Thread Alberto Ramón
Yes, I will assume this overhead in the rowkey.

2016-12-02 9:58 GMT+01:00 Billy(Yiming) Liu :

> Using Joint Dimension for your 1:1 relation is the right design.
>
> 2016-12-02 0:21 GMT+08:00 Alberto Ramón :
>
>> Nice Liu
>>
>> We have some cases like
>> DayWeekTXT , DayWeekID
>> MonthTXT, MonthID
>>
>> Small proposal:
>> It would be interesting to create a Derived column with a 1:1 relation,
>> with support for filters and GROUP BY.
>>
>> 2016-12-01 11:55 GMT+01:00 Billy(Yiming) Liu :
>>
>>> The cost of joint dimension compared with extended column is you have
>>> more columns in the HBase rowkey. It may harm the query performance. But
>>> most time, joint dimension is still recommended, since the normal dimension
>>> column supports much more functions than extended column, such as count(*).
>>>
>>> 2016-12-01 17:07 GMT+08:00 Alberto Ramón :
>>>
>>>> Hello
>>>> I was preparing a email with related doubts:
>>>>
>>>> Some times we have derived dimensions with relation 1:1, examples:
>>>> WeekDayID & WeekDayTxt
>>>> MonthID & WeekTxt
>>>>
>>>> SOL1: Derived.  ID as Host and Txt Extended
>>>> PB: You can't filter / Group by Txt
>>>>
>>>> SOL2: Joint. Define tuples of ID & TXT
>>>> Some PB/limitation?  (I need test this option)
>>>>
>>>> 2016-12-01 0:35 GMT+01:00 Billy(Yiming) Liu :
>>>>
>>>>> Thanks, Alberto. The explanation is accurate. EXTENDED_COLUMN is only
>>>>> used for representation, but not filtering or grouping which is  done by
>>>>> HOST_COLUMN. So EXTENDED_COLUMN is not a dimension, it works like a
>>>>> key/value map against the HOST_COLUMN.
>>>>>
>>>>> If the value in EXTENDED_COLUMN is not long, you could just define two
>>>>> dimensions with joint dimension setting, it has almost the same 
>>>>> performance
>>>>> impact with EXTENDED_COLUMN which reduces one dimension, but better
>>>>> understanding.
>>>>>
>>>>> 2016-11-30 19:00 GMT+08:00 Alberto Ramón :
>>>>>
>>>>>> This will help you
>>>>>> http://kylin.apache.org/docs/howto/howto_optimize_cubes.html
>>>>>>
>>>>>> The idea is always, How I can reduce the number of Dimension ?
>>>>>> If you reduce Dim, the time / resources to build the cube and final
>>>>>> size of
>>>>>> it decrease --> Its good
>>>>>>
>>>>>> An example can be DIM_Persons: Id_Person , Name, Surname, Address,
>>>>>> .
>>>>>>Id_Person can be HostColumn
>>>>>> and other columns can be calculated from ID --> are Extended
>>>>>> Column
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> 2016-11-30 11:35 GMT+01:00 仇同心 :
>>>>>>
>>>>>> > Hi ,all
>>>>>> > I don’t understand the usage scenarios of  EXTENDED_COLUMN,although
>>>>>> I saw
>>>>>> > this article “https://issues.apache.org/jira/browse/KYLIN-1313”.
>>>>>> > What,s the means about parameters of “Host Column” and “Extended
>>>>>> Column”?
>>>>>> > Why use this expression,and what aspects of optimization that this
>>>>>> > expression solved?
>>>>>> > Can be combined with a SQL statement to explain?
>>>>>> >
>>>>>> >
>>>>>> > Thanks~
>>>>>> >
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> With Warm regards
>>>>>
>>>>> Yiming Liu (刘一鸣)
>>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> With Warm regards
>>>
>>> Yiming Liu (刘一鸣)
>>>
>>
>>
>
>
> --
> With Warm regards
>
> Yiming Liu (刘一鸣)
>


Re: corrupt metastore

2016-12-02 Thread Alberto Ramón
Haha,

I think I lost some HBase data (hbck found some errors).
The Kylin log is OK and it makes a clean start-up, but afterwards the cube
data isn't accessible.

I tried using metastore.sh to delete all Kylin data (the clean and reset
options; I don't know what the difference is).

OK, I'll redeploy a new system; that isn't a problem.
And perhaps I need to schedule a "metastore.sh backup".

Thanks!!

2016-12-02 10:37 GMT+01:00 ShaoFeng Shi :

> There is no such a check tool/command today; Kylin metadata's availability
> relies on HBase and HDFS's replication; If the HBase and HDFS no data loss,
> then Kylin has no data loss;  You can watch the kylin.log during the
> startup, Kylin will report error when finding inconsistent metadata.
>
> 2016-12-02 15:23 GMT+08:00 Alberto Ramón :
>
>> yes, yes,
>> I had this type of problem; I needed to use:
>>   hdfs fsck
>>   hbase hbck
>> That solved all the problems --> perhaps some data has been lost.
>>
>> The next steps will be:
>> - check Kylin's metadata
>> - check consistency between the metadata and Kylin's tables
>>
>>
>> But I don't know if there are tools/commands to do this.
>> I looked at the metastore.sh script, but I can't find this functionality.
>>
>>
>>
>> 2016-12-02 2:46 GMT+01:00 ShaoFeng Shi :
>>
>>> Hi Alberto, It looks like the HBase service is in trouble, please check
>>> it firstly;
>>>
>>> 2016-12-02 8:03 GMT+08:00 Alberto Ramón :
>>>
>>>> I had some problems with corrupt data on HDFS and Meta HDFS
>>>> Now all services started OK
>>>>
>>>> *None query is excuted in none cube *
>>>> *Error while executing SQL "select part_dt, sum(price) as total_selled,
>>>> count(distinct seller_id) as sellers from kylin_sales group by part_dt
>>>> order by part_dt LIMIT 5":
>>>> org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after
>>>> attempts=5, exceptions: Fri Dec 02 07:31:07 GMT+08:00 2016,
>>>> org.apache.hadoop.hbase.client.RpcRetryingCaller@6cb60fb6,
>>>> com.google.protobuf.InvalidProtocolBufferException:
>>>> com.google.protobuf.InvalidProtocolBufferException: Protocol message tag
>>>> had invalid wire type. at
>>>> com.google.protobuf.InvalidProtocolBufferException.invalidWireType(InvalidProtocolBufferException.java:99)
>>>> at com.google.protobuf.UnknownFieldSet$Builder.mergeFieldFrom*
>>>>
>>>>
>>>> *I tried to rebuild cube, but:*
>>>>
>>>>
>>>>
>>>>
>>>> *Could not read JSON: Can not construct instance of long from String
>>>> value '2000-12-07 06:30:00': not a valid Long value at [Source:
>>>> org.apache.catalina.connector.CoyoteInputStream@6fcdf2de; line: 1, column:
>>>> 21] (through reference chain:
>>>> org.apache.kylin.rest.request.JobBuildRequest["startTime"]); nested
>>>> exception is com.fasterxml.jackson.databind.exc.InvalidFormatException: Can
>>>> not construct instance of long from String value '2000-12-07 06:30:00': not
>>>> a valid Long value at [Source:
>>>> org.apache.catalina.connector.CoyoteInputStream@6fcdf2de; line: 1, column:
>>>> 21] (through reference chain:
>>>> org.apache.kylin.rest.request.JobBuildRequest["startTime"]*
>>>>
>>>> *Some idea? I'm trying to metastore.sh, there is some check tool?*
>>>> 2016-12-01 16:21:34,162 ERROR [pool-7-thread-1] dao.ExecutableDao:148 :
>>>> error get all Jobs:
>>>> org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after
>>>> attempts=6, exceptions:
>>>> Fri Dec 02 05:21:34 GMT+08:00 2016, null, java.net
>>>> .SocketTimeoutException: callTimeout=6, callDuration=122823: row
>>>> '/execute/' on table 'kylin_metadata' at region=kylin_metadata,,1477759
>>>> 808710.faab4c9
>>>> 88f06f17d9e903068db5b3b81., hostname=amb0.mycorp.kom,60020,1480614855596,
>>>> seqNum=1664
>>>>
>>>> at org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadRepl
>>>> icas.throwEnrichedException(RpcRetryingCallerWithReadReplicas.java:262)
>>>> at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.c
>>>> all(ScannerCallableWithReplicas.java:199)
>>>>
>>>> Caused by: java.net.SocketTimeoutException: callTimeout=6,
>>>> callDuration=122823: row '/execute/' on table 'kylin_metadata' at
>>>> region=kylin_metadata,,1477759808710.faab4c988f06f17d9e903068db5b3b81.
>>>>
>>>> *(re-deploy all isn't a problem, is only for knowledge)*
>>>>
>>>
>>>
>>>
>>> --
>>> Best regards,
>>>
>>> Shaofeng Shi 史少锋
>>>
>>>
>>
>
>
> --
> Best regards,
>
> Shaofeng Shi 史少锋
>
>


Re: corrupt metastore

2016-12-01 Thread Alberto Ramón
Yes, yes,
I had this type of problem; I needed to use:
  hdfs fsck
  hbase hbck
That solved all the problems --> perhaps some data has been lost.

The next steps will be:
- check Kylin's metadata
- check consistency between the metadata and Kylin's tables


But I don't know if there are tools/commands to do this.
I looked at the metastore.sh script, but I can't find this functionality.



2016-12-02 2:46 GMT+01:00 ShaoFeng Shi :

> Hi Alberto, It looks like the HBase service is in trouble, please check it
> firstly;
>
> 2016-12-02 8:03 GMT+08:00 Alberto Ramón :
>
>> I had some problems with corrupt data on HDFS and Meta HDFS
>> Now all services started OK
>>
>> *No query executes on any cube*
>> *Error while executing SQL "select part_dt, sum(price) as total_selled,
>> count(distinct seller_id) as sellers from kylin_sales group by part_dt
>> order by part_dt LIMIT 5":
>> org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after
>> attempts=5, exceptions: Fri Dec 02 07:31:07 GMT+08:00 2016,
>> org.apache.hadoop.hbase.client.RpcRetryingCaller@6cb60fb6,
>> com.google.protobuf.InvalidProtocolBufferException:
>> com.google.protobuf.InvalidProtocolBufferException: Protocol message tag
>> had invalid wire type. at
>> com.google.protobuf.InvalidProtocolBufferException.invalidWireType(InvalidProtocolBufferException.java:99)
>> at com.google.protobuf.UnknownFieldSet$Builder.mergeFieldFrom*
>>
>>
>> *I tried to rebuild cube, but:*
>>
>>
>>
>>
>> *Could not read JSON: Can not construct instance of long from String
>> value '2000-12-07 06:30:00': not a valid Long value at [Source:
>> org.apache.catalina.connector.CoyoteInputStream@6fcdf2de; line: 1, column:
>> 21] (through reference chain:
>> org.apache.kylin.rest.request.JobBuildRequest["startTime"]); nested
>> exception is com.fasterxml.jackson.databind.exc.InvalidFormatException: Can
>> not construct instance of long from String value '2000-12-07 06:30:00': not
>> a valid Long value at [Source:
>> org.apache.catalina.connector.CoyoteInputStream@6fcdf2de; line: 1, column:
>> 21] (through reference chain:
>> org.apache.kylin.rest.request.JobBuildRequest["startTime"]*
>>
>> *Any idea? I'm trying metastore.sh; is there a check tool?*
>> 2016-12-01 16:21:34,162 ERROR [pool-7-thread-1] dao.ExecutableDao:148 :
>> error get all Jobs:
>> org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after
>> attempts=6, exceptions:
>> Fri Dec 02 05:21:34 GMT+08:00 2016, null, java.net.SocketTimeoutException:
>> callTimeout=6, callDuration=122823: row '/execute/' on table
>> 'kylin_metadata' at region=kylin_metadata,,1477759808710.faab4c9
>> 88f06f17d9e903068db5b3b81., hostname=amb0.mycorp.kom,60020,1480614855596,
>> seqNum=1664
>>
>> at org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadRepl
>> icas.throwEnrichedException(RpcRetryingCallerWithReadReplicas.java:262)
>> at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.c
>> all(ScannerCallableWithReplicas.java:199)
>>
>> Caused by: java.net.SocketTimeoutException: callTimeout=6,
>> callDuration=122823: row '/execute/' on table 'kylin_metadata' at
>> region=kylin_metadata,,1477759808710.faab4c988f06f17d9e903068db5b3b81.
>>
>> *(re-deploy all isn't a problem, is only for knowledge)*
>>
>
>
>
> --
> Best regards,
>
> Shaofeng Shi 史少锋
>
>


corrupt metastore

2016-12-01 Thread Alberto Ramón
I had some problems with corrupt data on HDFS and Meta HDFS
Now all services started OK

*No query executes on any cube*
*Error while executing SQL "select part_dt, sum(price) as total_selled,
count(distinct seller_id) as sellers from kylin_sales group by part_dt
order by part_dt LIMIT 5":
org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after
attempts=5, exceptions: Fri Dec 02 07:31:07 GMT+08:00 2016,
org.apache.hadoop.hbase.client.RpcRetryingCaller@6cb60fb6,
com.google.protobuf.InvalidProtocolBufferException:
com.google.protobuf.InvalidProtocolBufferException: Protocol message tag
had invalid wire type. at
com.google.protobuf.InvalidProtocolBufferException.invalidWireType(InvalidProtocolBufferException.java:99)
at com.google.protobuf.UnknownFieldSet$Builder.mergeFieldFrom*


*I tried to rebuild cube, but:*




*Could not read JSON: Can not construct instance of long from String value
'2000-12-07 06:30:00': not a valid Long value at [Source:
org.apache.catalina.connector.CoyoteInputStream@6fcdf2de; line: 1, column:
21] (through reference chain:
org.apache.kylin.rest.request.JobBuildRequest["startTime"]); nested
exception is com.fasterxml.jackson.databind.exc.InvalidFormatException: Can
not construct instance of long from String value '2000-12-07 06:30:00': not
a valid Long value at [Source:
org.apache.catalina.connector.CoyoteInputStream@6fcdf2de; line: 1, column:
21] (through reference chain:
org.apache.kylin.rest.request.JobBuildRequest["startTime"]*

*Any idea? I'm trying metastore.sh; is there a check tool?*
2016-12-01 16:21:34,162 ERROR [pool-7-thread-1] dao.ExecutableDao:148 :
error get all Jobs:
org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after
attempts=6, exceptions:
Fri Dec 02 05:21:34 GMT+08:00 2016, null, java.net.SocketTimeoutException:
callTimeout=6, callDuration=122823: row '/execute/' on table
'kylin_metadata' at region=kylin_metadata,,1477759808710.faab4c9
88f06f17d9e903068db5b3b81., hostname=amb0.mycorp.kom,60020,1480614855596,
seqNum=1664

at
org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.throwEnrichedException(RpcRetryingCallerWithReadReplicas.java:262)
at
org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:199)

Caused by: java.net.SocketTimeoutException: callTimeout=6,
callDuration=122823: row '/execute/' on table 'kylin_metadata' at
region=kylin_metadata,,1477759808710.faab4c988f06f17d9e903068db5b3b81.

*(re-deploying everything isn't a problem; this is just for knowledge)*
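For reference, the rebuild error quoted above occurs because JobBuildRequest's startTime/endTime fields are epoch timestamps in milliseconds (a Java long), not date strings. A minimal sketch of the conversion for the build payload, assuming the window is interpreted as UTC:

```python
from datetime import datetime, timezone

def to_epoch_millis(date_str):
    """Convert 'YYYY-MM-DD HH:MM:SS' into the epoch-millisecond long
    that the build request expects (UTC assumed here)."""
    dt = datetime.strptime(date_str, "%Y-%m-%d %H:%M:%S")
    return int(dt.replace(tzinfo=timezone.utc).timestamp() * 1000)

# The string from the error message, converted to a valid long:
print(to_epoch_millis("2000-12-07 06:30:00"))   # 976170600000
```

A first full build would send startTime=0 and a real endTime in this millisecond form.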


Re: Consulting "EXTENDED_COLUMN"

2016-12-01 Thread Alberto Ramón
Nice Liu

We have some cases like
DayWeekTXT , DayWeekID
MonthTXT, MonthID

Small proposal:
It would be interesting to create a Derived column with a 1:1 relation, with
support for filters and GROUP BY.

2016-12-01 11:55 GMT+01:00 Billy(Yiming) Liu :

> The cost of joint dimension compared with extended column is you have more
> columns in the HBase rowkey. It may harm the query performance. But most
> time, joint dimension is still recommended, since the normal dimension
> column supports much more functions than extended column, such as count(*).
>
> 2016-12-01 17:07 GMT+08:00 Alberto Ramón :
>
>> Hello
>> I was preparing a email with related doubts:
>>
>> Sometimes we have derived dimensions with a 1:1 relation, for example:
>> WeekDayID & WeekDayTxt
>> MonthID & MonthTxt
>>
>> SOL1: Derived. ID as Host and Txt as Extended.
>> Problem: you can't filter or group by Txt.
>>
>> SOL2: Joint. Define tuples of ID & TXT.
>> Any problems/limitations? (I need to test this option.)
>>
>> 2016-12-01 0:35 GMT+01:00 Billy(Yiming) Liu :
>>
>>> Thanks, Alberto. The explanation is accurate. EXTENDED_COLUMN is only
>>> used for representation, but not filtering or grouping which is  done by
>>> HOST_COLUMN. So EXTENDED_COLUMN is not a dimension, it works like a
>>> key/value map against the HOST_COLUMN.
>>>
>>> If the value in EXTENDED_COLUMN is not long, you could just define two
>>> dimensions with joint dimension setting, it has almost the same performance
>>> impact with EXTENDED_COLUMN which reduces one dimension, but better
>>> understanding.
>>>
>>> 2016-11-30 19:00 GMT+08:00 Alberto Ramón :
>>>
>>>> This will help you
>>>> http://kylin.apache.org/docs/howto/howto_optimize_cubes.html
>>>>
>>>> The idea is always, How I can reduce the number of Dimension ?
>>>> If you reduce Dim, the time / resources to build the cube and final
>>>> size of
>>>> it decrease --> Its good
>>>>
>>>> An example can be DIM_Persons: Id_Person , Name, Surname, Address, .
>>>>Id_Person can be HostColumn
>>>> and other columns can be calculated from ID --> are Extended Column
>>>>
>>>>
>>>>
>>>>
>>>> 2016-11-30 11:35 GMT+01:00 仇同心 :
>>>>
>>>> > Hi ,all
>>>> > I don’t understand the usage scenarios of  EXTENDED_COLUMN,although I
>>>> saw
>>>> > this article “https://issues.apache.org/jira/browse/KYLIN-1313”.
>>>> > What,s the means about parameters of “Host Column” and “Extended
>>>> Column”?
>>>> > Why use this expression,and what aspects of optimization that this
>>>> > expression solved?
>>>> > Can be combined with a SQL statement to explain?
>>>> >
>>>> >
>>>> > Thanks~
>>>> >
>>>>
>>>
>>>
>>>
>>> --
>>> With Warm regards
>>>
>>> Yiming Liu (刘一鸣)
>>>
>>
>>
>
>
> --
> With Warm regards
>
> Yiming Liu (刘一鸣)
>
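The key/value behaviour described in this thread can be pictured with a tiny model (illustrative only, not Kylin internals): the host column participates in the group-by key, while the extended column is merely a per-host value attached to the result afterwards, which is why it can be displayed but not filtered or grouped on.

```python
# Host column values take part in aggregation; the extended column is a
# per-host lookup joined onto the result afterwards (illustrative model).
rows = [
    {"weekday_id": 2, "weekday_txt": "Wednesday", "price": 10.0},
    {"weekday_id": 2, "weekday_txt": "Wednesday", "price": 5.0},
    {"weekday_id": 3, "weekday_txt": "Thursday",  "price": 7.0},
]

totals = {}      # aggregation keyed by the host column only
extended = {}    # host value -> extended value map
for r in rows:
    totals[r["weekday_id"]] = totals.get(r["weekday_id"], 0.0) + r["price"]
    extended[r["weekday_id"]] = r["weekday_txt"]

result = [(k, extended[k], v) for k, v in sorted(totals.items())]
print(result)   # [(2, 'Wednesday', 15.0), (3, 'Thursday', 7.0)]
```

A joint dimension, by contrast, would put both columns into the rowkey, so the Txt column would be filterable and groupable at the cost of a longer key.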


Re: User MailList

2016-12-01 Thread Alberto Ramón
Nice!!
It will be very helpful for finding similar problems.

2016-12-01 13:31 GMT+01:00 Luke Han :

> already working on that
>
> Get Outlook for iOS <https://aka.ms/o0ukef>
>
>
>
>
> On Thu, Dec 1, 2016 at 5:15 PM +0800, "Alberto Ramón" <
> a.ramonporto...@gmail.com> wrote:
>
> Small Proposal:
>>
>> The dev mailing list is on Nabble (more practical than
>> mail-archives.apache.org: you can search by text, see pictures, and it is
>> more readable).
>>
>> Is it possible to do the same with the user list?
>>
>> (Nowadays, a lot of users' questions go to the dev mailing list, or to both.)
>>
>


User MailList

2016-12-01 Thread Alberto Ramón
Small Proposal:

The dev mailing list is on Nabble (more practical than mail-archives.apache.org:
you can search by text, see pictures, and it is more readable).

Is it possible to do the same with the user list?

(Nowadays, a lot of users' questions go to the dev mailing list, or to both.)


Re: Consulting "EXTENDED_COLUMN"

2016-12-01 Thread Alberto Ramón
Hello
I was preparing a email with related doubts:

Sometimes we have derived dimensions with a 1:1 relation, for example:
WeekDayID & WeekDayTxt
MonthID & MonthTxt

SOL1: Derived. ID as Host and Txt as Extended.
Problem: you can't filter or group by Txt.

SOL2: Joint. Define tuples of ID & TXT.
Any problems/limitations? (I need to test this option.)

2016-12-01 0:35 GMT+01:00 Billy(Yiming) Liu :

> Thanks, Alberto. The explanation is accurate. EXTENDED_COLUMN is only used
> for representation, but not filtering or grouping which is  done by
> HOST_COLUMN. So EXTENDED_COLUMN is not a dimension, it works like a
> key/value map against the HOST_COLUMN.
>
> If the value in EXTENDED_COLUMN is not long, you could just define two
> dimensions with joint dimension setting, it has almost the same performance
> impact with EXTENDED_COLUMN which reduces one dimension, but better
> understanding.
>
> 2016-11-30 19:00 GMT+08:00 Alberto Ramón :
>
>> This will help you
>> http://kylin.apache.org/docs/howto/howto_optimize_cubes.html
>>
>> The idea is always: how can I reduce the number of dimensions?
>> If you reduce dimensions, the time/resources to build the cube and its
>> final size decrease --> that's good.
>>
>> An example can be DIM_Persons: Id_Person, Name, Surname, Address, ...
>> Id_Person can be the Host column,
>> and the other columns can be calculated from the ID --> they are Extended
>> columns.
>>
>>
>>
>>
>> 2016-11-30 11:35 GMT+01:00 仇同心 :
>>
>> > Hi ,all
>> > I don’t understand the usage scenarios of  EXTENDED_COLUMN,although I
>> saw
>> > this article “https://issues.apache.org/jira/browse/KYLIN-1313”.
>> > What,s the means about parameters of “Host Column” and “Extended
>> Column”?
>> > Why use this expression,and what aspects of optimization that this
>> > expression solved?
>> > Can be combined with a SQL statement to explain?
>> >
>> >
>> > Thanks~
>> >
>>
>
>
>
> --
> With Warm regards
>
> Yiming Liu (刘一鸣)
>


Re: ODBC 1.6

2016-11-28 Thread Alberto Ramón
The issue occurs if you have a "dirty" system with the previous version (ODBC 1.5) installed.

Tested on:
- Win7 Ultimate (new system) and Tableau 10 ==> OK
- Win7 Ultimate (new system) and PowerBI 2.4 ==> error in preview of TB Kylin
Category (the other 2 work OK)
[image: inline image 1]


Loading the data fails in all tables:
[image: inline image 2]

I attached 2 logs:
- Kylin driver log: no errors found in it
- PowerBI log: with the error

Tip: the error is different from the one with ODBC Driver 1.5

2016-11-28 15:54 GMT+01:00 Dong Li :

> Hello Alberto,
>
> Thanks very much for your feedback.
>
> There're some packaging issues in previous build. We've uploaded a new
> build.
> Please go to download page and find the latest ODBC Drive 1.6. Thanks.
>
> Thanks,
> Dong Li
>
> Thanks,
> Dong Li
>
> 2016-11-28 22:28 GMT+08:00 Alberto Ramón :
>
>> Nice!!
>> Tell me and I will re-check on my Windows machines.
>>
>> (I tried installing C++ 2013, C++ 2015, and C++ 2015 update 4, with the
>> same negative result.)
>>
>> 2016-11-28 15:24 GMT+01:00 ShaoFeng Shi :
>>
>>> I see; We will check and upload a new build soon. will update here once
>>> finished.
>>>
>>> 2016-11-28 22:17 GMT+08:00 Alberto Ramón :
>>>
>>>> x64, all system are 64 bits (win 7 ultimate, and Win Server 2008 R2)
>>>>
>>>> 2016-11-28 14:58 GMT+01:00 ShaoFeng Shi :
>>>>
>>>>> Hi Alberto, Kylin ODBC zip has two exe files; which one are you
>>>>> installing, the x86 one or x64 one?
>>>>>
>>>>> 2016-11-28 21:51 GMT+08:00 Alberto Ramón :
>>>>>
>>>>>> More Info:
>>>>>>
>>>>>> - Same error in Win Srv 2008R2
>>>>>> - I start ODBC config using: C:\Windows\System32\odbcad32.exe  (to
>>>>>> be sure start ODBC 64 bits version)
>>>>>>
>>>>>> 2016-11-28 14:34 GMT+01:00 Alberto Ramón :
>>>>>>
>>>>>>> Hello
>>>>>>>
>>>>>>> I'm try to test New Kylin ODBC Driver 1.6,
>>>>>>> When I try to create New ODBC, I have this error
>>>>>>> [image: inline image 1]
>>>>>>>
>>>>>>> (I tested In two Win7 SP1, I didn't have problems with 1.5)
>>>>>>>
>>>>>>> The dependencies of Ms Visual C++ are the same than old version?
>>>>>>> (C++ 2012)
>>>>>>>
>>>>>>> Also saw the Version identified hasn't been changed: (but is a minor
>>>>>>> problem)
>>>>>>> [image: inline image 2]
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Best regards,
>>>>>
>>>>> Shaofeng Shi 史少锋
>>>>>
>>>>>
>>>>
>>>
>>>
>>> --
>>> Best regards,
>>>
>>> Shaofeng Shi 史少锋
>>>
>>>
>>
>
Log start: Mon Nov 28 21:24:11 2016


[INFO ][2016-11-28.21:24:11]SQLFreeStmt called, 496478464 with option 0
[INFO ][2016-11-28.21:24:11]
[INFO ][2016-11-28.21:24:11]start exec the query: 
[INFO ][2016-11-28.21:24:11]select "PART_DT",

"LEAF_CATEG_ID",

"LSTG_SITE_ID",

"LSTG_FORMAT_NAME",

"PRICE",

"SELLER_ID"

from "DEFAULT"."KYLIN_SALES"
[INFO ][2016-11-28.21:24:11]SQLFreeStmt called, 494018288 with option 0
[INFO ][2016-11-28.21:24:11]
[INFO ][2016-11-28.21:24:11]start exec the query: 
[INFO ][2016-11-28.21:24:11]select "USER_DEFINED_FIELD1",

"USER_DEFINED_FIELD3",

"UPD_DATE",

"UPD_USER",

"LEAF_CATEG_ID",

"SITE_ID",

"META_CATEG_NAME",

"CATEG_LVL2_NAME",

"CATEG_LVL3_NAME"

from "DEFAULT"."KYLIN_CATEGORY_GROUPINGS"
Log start: Mon Nov 28 21:24:11 2016

[INFO ][2016-11-28.21:24:11]Successfully done executing the query
[INFO ][2016-11-28.21:24:11]SQLFreeHandle called, Handle Type: 3, Handle: 494018288
[INFO ][2016-11-28.21:24:11]SQLDisconnect called
[INFO ][2016-11-28.21:24:11]SQLFreeHandle called, Handle Type: 2, Handle: 494059776
[INFO ][2016-11-28.21:24:11]SQLFreeHandle called, Handle Type: 1, Handle: 494176720
[INFO ][2016-11-28.21:24:11]SQLFreeStmt call

Re: ODBC 1.6

2016-11-28 Thread Alberto Ramón
Nice!!
Tell me and I will re-check on my Windows machines.

(I tried installing C++ 2013, C++ 2015, and C++ 2015 update 4, with the same
negative result.)

2016-11-28 15:24 GMT+01:00 ShaoFeng Shi :

> I see; We will check and upload a new build soon. will update here once
> finished.
>
> 2016-11-28 22:17 GMT+08:00 Alberto Ramón :
>
>> x64, all system are 64 bits (win 7 ultimate, and Win Server 2008 R2)
>>
>> 2016-11-28 14:58 GMT+01:00 ShaoFeng Shi :
>>
>>> Hi Alberto, Kylin ODBC zip has two exe files; which one are you
>>> installing, the x86 one or x64 one?
>>>
>>> 2016-11-28 21:51 GMT+08:00 Alberto Ramón :
>>>
>>>> More Info:
>>>>
>>>> - Same error in Win Srv 2008R2
>>>> - I start ODBC config using: C:\Windows\System32\odbcad32.exe  (to be
>>>> sure start ODBC 64 bits version)
>>>>
>>>> 2016-11-28 14:34 GMT+01:00 Alberto Ramón :
>>>>
>>>>> Hello
>>>>>
>>>>> I'm try to test New Kylin ODBC Driver 1.6,
>>>>> When I try to create New ODBC, I have this error
>>>>> [image: inline image 1]
>>>>>
>>>>> (I tested In two Win7 SP1, I didn't have problems with 1.5)
>>>>>
>>>>> The dependencies of Ms Visual C++ are the same than old version? (C++
>>>>> 2012)
>>>>>
>>>>> Also saw the Version identified hasn't been changed: (but is a minor
>>>>> problem)
>>>>> [image: inline image 2]
>>>>>
>>>>>
>>>>
>>>
>>>
>>> --
>>> Best regards,
>>>
>>> Shaofeng Shi 史少锋
>>>
>>>
>>
>
>
> --
> Best regards,
>
> Shaofeng Shi 史少锋
>
>


Re: ODBC 1.6

2016-11-28 Thread Alberto Ramón
x64; all systems are 64-bit (Win 7 Ultimate and Win Server 2008 R2)

2016-11-28 14:58 GMT+01:00 ShaoFeng Shi :

> Hi Alberto, Kylin ODBC zip has two exe files; which one are you
> installing, the x86 one or x64 one?
>
> 2016-11-28 21:51 GMT+08:00 Alberto Ramón :
>
>> More Info:
>>
>> - Same error in Win Srv 2008R2
>> - I start ODBC config using: C:\Windows\System32\odbcad32.exe  (to be
>> sure start ODBC 64 bits version)
>>
>> 2016-11-28 14:34 GMT+01:00 Alberto Ramón :
>>
>>> Hello
>>>
>>> I'm try to test New Kylin ODBC Driver 1.6,
>>> When I try to create New ODBC, I have this error
>>> [image: inline image 1]
>>>
>>> (I tested In two Win7 SP1, I didn't have problems with 1.5)
>>>
>>> The dependencies of Ms Visual C++ are the same than old version? (C++
>>> 2012)
>>>
>>> Also saw the Version identified hasn't been changed: (but is a minor
>>> problem)
>>> [image: inline image 2]
>>>
>>>
>>
>
>
> --
> Best regards,
>
> Shaofeng Shi 史少锋
>
>


Re: ODBC 1.6

2016-11-28 Thread Alberto Ramón
More Info:

- Same error in Win Srv 2008R2
- I start the ODBC config using C:\Windows\System32\odbcad32.exe (to be sure
to start the 64-bit ODBC version)

2016-11-28 14:34 GMT+01:00 Alberto Ramón :

> Hello
>
> I'm trying to test the new Kylin ODBC Driver 1.6.
> When I try to create a new ODBC data source, I get this error:
> [image: inline image 1]
>
> (I tested on two Win7 SP1 machines; I didn't have problems with 1.5.)
>
> Are the MS Visual C++ dependencies the same as in the old version? (C++ 2012)
>
> I also saw that the version identifier hasn't been changed (but that is a
> minor problem):
> [image: inline image 2]
>
>


ODBC 1.6

2016-11-28 Thread Alberto Ramón
Hello

I'm trying to test the new Kylin ODBC Driver 1.6.
When I try to create a new ODBC data source, I get this error:
[image: inline image 1]

(I tested on two Win7 SP1 machines; I didn't have problems with 1.5.)

Are the MS Visual C++ dependencies the same as in the old version? (C++ 2012)

I also saw that the version identifier hasn't been changed (but that is a
minor problem):
[image: inline image 2]


Re: Apache Kylin 1.6.0 released

2016-11-28 Thread Alberto Ramón
Heyyy,   :)

There is a new version of the Kylin ODBC Driver!!

[image: inline image 1]

I will test it with PowerBI.
(Last night I tried Kylin 1.6.0 & Driver 1.5 with PowerBI and it failed.)

2016-11-28 8:01 GMT+01:00 ShaoFeng Shi :

> The Apache Kylin team is pleased to announce the immediate availability of
> the 1.6.0 release.
>
> This is a major release after 1.5, with the support for using Apache Kafka
> as data source and many enhancements as well as bug fixes; All of the
> changes can be found in:
> https://kylin.apache.org/docs16/release_notes.html
>
> You can download the source release and binary packages from
> https://www.apache.org/dyn/closer.cgi?path=/kylin/apache-kylin-1.6.0/
>
> More information about the binary packages is on Kylin's download page
> https://kylin.apache.org/download/
>
> Apache Kylin is an open source Distributed Analytics Engine designed to
> provide SQL interface and multi-dimensional analysis (OLAP) on Hadoop,
> supporting extremely large datasets.
>
> Apache Kylin lets you query massive data set at sub-second latency in 3
> steps:
> 1. Identify a Star Schema data on Hadoop.
> 2. Build Cube on Hadoop.
> 3. Query data with ANSI-SQL and get results in sub-second, via ODBC, JDBC
> or RESTful API.
>
> Thanks everyone who have contributed to the 1.6.0 release.
>
> We welcome your help and feedback. For more information on how to
> report problems, and to get involved, visit the project website at
> https://kylin.apache.org/
>
> --
> Best regards,
>
> Shaofeng Shi 史少锋
>
>


Re: Codec of decimal (10,6)

2016-11-26 Thread Alberto Ramón
@ShaoFeng,  You're right
Queries have the same result in Hive and Kylin.

*I only have a problem with MAX* (it doesn't work, but SUM, MIN, and AVG work
OK) (apache-kylin-1.6.0-SNAPSHOT-bin RC1)

SELECT
SUM (FACT_VALORACIONES.VALORACION) as SUM_Valoracion
,MIN (FACT_VALORACIONES.VALORACION) as MIN_Valoracion
--,MAX (FACT_VALORACIONES.VALORACION) as MAX_Valoracion
,AVG (FACT_VALORACIONES.VALORACION) as AVG_Valoracion

FROM HERR_BANK.FACT_VALORACIONES as FACT_VALORACIONES
INNER JOIN HERR_BANK.DIM_FECHAS as DIM_FECHAS
ON FACT_VALORACIONES.IDFECHAVALORACION = DIM_FECHAS.IDFECHA
Group by DIM_FECHAS.ANYO


ERROR:
 *Can't find any realization. Please confirm with providers*. SQL digest:
fact table HERR_BANK.FACT_VALORACIONES,group by
[HERR_BANK.DIM_FECHAS.ANYO],filter on [],with aggregates[FunctionDesc
[expression=SUM, parameter=ParameterDesc [type=column, value=VALORACION,
nextParam=null], returnType=null], FunctionDesc [expression=COUNT,
parameter=ParameterDesc [type=column, value=VALORACION, nextParam=null],
returnType=null], FunctionDesc [expression=MIN, parameter=ParameterDesc
[type=column, value=VALORACION, nextParam=null], returnType=null],
FunctionDesc [expression=MAX, parameter=ParameterDesc [type=column,
value=VALORACION, nextParam=null], returnType=null]].


The result must be:
[image: inline image 1]

Measure definition:
[image: inline image 2]

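The failure above can be reproduced in miniature: Kylin matches a query's aggregate digest against the measures defined on a cube, and here no MAX measure exists for VALORACION. A toy sketch of that matching rule (the measure labels are simplified, not Kylin's internal representation):

```python
# The cube's defined measures: SUM/MIN/AVG exist, but no MAX on VALORACION.
cube_measures = {"SUM(VALORACION)", "MIN(VALORACION)", "AVG(VALORACION)"}

def can_answer(query_aggregates, cube_measures):
    """Return (answerable, missing_measures) for a query's aggregate digest."""
    missing = set(query_aggregates) - set(cube_measures)
    return (len(missing) == 0, missing)

# The failing query asks for MAX, which the cube does not define:
print(can_answer({"SUM(VALORACION)", "MAX(VALORACION)"}, cube_measures))
# -> (False, {'MAX(VALORACION)'}): no cube matches, hence
#    "Can't find any realization"
```

Adding a MAX measure on the column and rebuilding the cube is the usual remedy for this class of error.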
2016-11-26 8:20 GMT+01:00 ShaoFeng Shi :

> Hi Alberto,
>
> Users need to be aware that a cube only has aggregated data, no raw data;
> at the very beginning Kylin would throw an error on a query like
> "select *", but to provide a better user experience (and also to support
> some BI tools which need to load a subset of data to warm up), Kylin
> answers such a query from the base cuboid (group by all dimensions). The
> measure column value will be the aggregated value, so users cannot
> directly compare the "select *" result from a cube with the source data.
> If you're comparing the aggregated queries, I believe they are totally
> the same.
>
>
>
> 2016-11-26 4:39 GMT+08:00 Alberto Ramón :
>
>> I have a super-Fact Table with 5 rows
>> [image: inline image 2]
>>
>>
>> A- Data in CSV == Hive (OK)
>> B- Select * from Fact, in Kylin some values are different
>>
>> The value 9942758, has been  transformed in 10937033.8 !!!
>>
>
>
>
> --
> Best regards,
>
> Shaofeng Shi 史少锋
>
>
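Shaofeng's point about "select *" being answered from the base cuboid can be illustrated with a toy example (pure Python, invented data): the cube stores one pre-aggregated row per distinct dimension combination, so raw measure values are never returned.

```python
from collections import defaultdict

# Toy "fact table": (date, user) dimensions plus one measure column.
rows = [
    ("2016-11-01", "u1", 10.0),
    ("2016-11-01", "u1", 5.5),   # same dimension values -> merged in the cube
    ("2016-11-01", "u2", 7.0),
    ("2016-11-02", "u1", 2.5),
]

def base_cuboid(rows):
    """Group by ALL dimensions and pre-aggregate the measure (here: SUM),
    which is what the base cuboid stores instead of raw rows."""
    agg = defaultdict(float)
    for date, user, value in rows:
        agg[(date, user)] += value
    return [(d, u, v) for (d, u), v in sorted(agg.items())]

# "SELECT *" answered from the base cuboid: 3 rows, not 4, and the first
# row's measure is the aggregate 15.5, not the raw 10.0 or 5.5.
print(base_cuboid(rows))
# -> [('2016-11-01', 'u1', 15.5), ('2016-11-01', 'u2', 7.0),
#     ('2016-11-02', 'u1', 2.5)]
```

This is why a value like 9942758 can appear "transformed" in a `select *` result: it is the sum over all raw rows sharing the same dimension values.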


Re: Testing 1.6

2016-11-26 Thread Alberto Ramón
Haha, thanks!
I will test it immediately.

2016-11-26 13:02 GMT+01:00 ShaoFeng Shi :

> @Yang, I didn't see the function of "auto refresh" on 1.6.0; also not
> found in JIRA; are you sure it has been implemented?
>
> @Alberto, the upgrade guide for 1.6.0 has been updated in
> https://kylin.apache.org/docs16/howto/howto_upgrade.html , FYI
>
> 2016-11-26 19:36 GMT+08:00 Alberto Ramón :
>
>> I'm re-testing auto refreshing Job Progress , and not work in my case.
>> I used Firefox and Chromium on Ubuntu 16.04
>> Isn't important for me because, the refresh button works OK
>> [image: inline image 1]
>>
>> 2016-11-18 4:30 GMT+01:00 ShaoFeng Shi :
>>
>>> Sure, I will add a section in the "How to upgrade" page for v1.6.0
>>>
>>> 2016-11-18 11:21 GMT+08:00 Li Yang :
>>>
>>>> The auto refreshing job progress is a new feature in 1.6. Earlier
>>>> version won't auto refresh. Maybe wipe out browser cache and try again?
>>>>
>>>> The 1.6 metadata is compatible with previous version. The upgrade shall
>>>> be pretty straightforward. But you are right. It deserves a document.
>>>>
>>>> @Shaofeng, consider an upgrade guide next time.
>>>>
>>>> Cheers
>>>> Yang
>>>>
>>>> On Wed, Nov 16, 2016 at 3:02 AM, Alberto Ramón <
>>>> a.ramonporto...@gmail.com> wrote:
>>>>
>>>>> Hi
>>>>>
>>>>> - When you are building Cube, the web is not auto-refresh of Web page,
>>>>> was there the old behavior of older versions ?   (I use Chrome)
>>>>>
>>>>> - There isn't doc about migration from Old Version
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>>
>>> --
>>> Best regards,
>>>
>>> Shaofeng Shi 史少锋
>>>
>>>
>>
>
>
> --
> Best regards,
>
> Shaofeng Shi 史少锋
>
>


Re: Testing 1.6

2016-11-26 Thread Alberto Ramón
I'm re-testing the auto-refreshing job progress, and it doesn't work in my
case. I used Firefox and Chromium on Ubuntu 16.04.
It isn't important for me, because the refresh button works OK.
[image: inline image 1]

2016-11-18 4:30 GMT+01:00 ShaoFeng Shi :

> Sure, I will add a section in the "How to upgrade" page for v1.6.0
>
> 2016-11-18 11:21 GMT+08:00 Li Yang :
>
>> The auto refreshing job progress is a new feature in 1.6. Earlier version
>> won't auto refresh. Maybe wipe out browser cache and try again?
>>
>> The 1.6 metadata is compatible with previous version. The upgrade shall
>> be pretty straightforward. But you are right. It deserves a document.
>>
>> @Shaofeng, consider an upgrade guide next time.
>>
>> Cheers
>> Yang
>>
>> On Wed, Nov 16, 2016 at 3:02 AM, Alberto Ramón > > wrote:
>>
>>> Hi
>>>
>>> - When you are building Cube, the web is not auto-refresh of Web page,
>>> was there the old behavior of older versions ?   (I use Chrome)
>>>
>>> - There isn't doc about migration from Old Version
>>>
>>>
>>>
>>>
>>
>
>
> --
> Best regards,
>
> Shaofeng Shi 史少锋
>
>


Codec of decimal (10,6)

2016-11-25 Thread Alberto Ramón
I have a super fact table with 5 rows:
[image: inline image 2]


A- Data in CSV == Hive (OK)
B- SELECT * FROM fact: in Kylin some values are different

The value 9942758 has been transformed into 10937033.8 !!!


Re: KYLIN throws null exception when execute specific query

2016-11-25 Thread Alberto Ramón
Hello



I tested your query on "apache-kylin-1.6.0-SNAPSHOT-bin" and it also fails
:(

I tried it with some changes:
SELECT count (1)
FROM
(
SELECT  PART_DT as PART_DT1, COUNT(1) as vv1
FROM KYLIN_SALES
GROUP BY PART_DT
) AS t1
JOIN
(

SELECT  PART_DT as PART_DT2, COUNT(1) as vv2
FROM KYLIN_SALES
GROUP BY PART_DT
) AS t2
ON (t1.PART_DT1 = t2.PART_DT2)
GROUP BY t1.PART_DT1,t2.vv2,t1.vv1, t2.PART_DT2


and it also fails.
It works fine if you remove the GROUP BY and do SELECT *.







2016-11-25 7:21 GMT+01:00 林豪 (linhao) - Technology Product Center :

> Hello,
>
>
>
> I tried to do some analysis using KYLIN, but it throws a null exception
> when executing a query. The query tries to calculate the VV ratio of
> condition A and condition B. For simplicity, I rewrote a minimal query on
> the ‘learn_kylin’ project and removed the condition, so it can be
> reproduced easily.
>
>
>
> KYLIN Version: apache-kylin-1.5.3 for HBase 0.98
>
>
>
> *Query:*
>
> SELECT t1.PART_DT, SUM(t2.vv) / SUM(t1.vv) AS vv_rate
>
> FROM
>
> (
>
> SELECT  PART_DT, COUNT(1) as vv
>
> FROM KYLIN_SALES
>
> GROUP BY PART_DT
>
> ) AS t1
>
> JOIN
>
> (
>
> SELECT  PART_DT, COUNT(1) as vv
>
> FROM KYLIN_SALES
>
> GROUP BY PART_DT
>
> ) AS t2
>
> ON (t1.PART_DT = t2.PART_DT)
>
> GROUP BY t1.PART_DT
>
>
>
> *Error on web page:*
>
> Error while executing SQL “SELECT …” null
>
>
>
> *KYLIN Server LOG:*
>
> null
>
> at org.apache.calcite.avatica.Helper.createException(Helper.
> java:56)
>
> at org.apache.calcite.avatica.Helper.createException(Helper.
> java:41)
>
> at org.apache.calcite.avatica.AvaticaStatement.executeInternal(
> AvaticaStatement.java:143)
>
> at org.apache.calcite.avatica.AvaticaStatement.executeQuery(
> AvaticaStatement.java:186)
>
> at org.apache.kylin.rest.service.QueryService.execute(
> QueryService.java:366)
>
> at org.apache.kylin.rest.service.QueryService.queryWithSqlMassage(
> QueryService.java:278)
>
> at org.apache.kylin.rest.service.QueryService.query(
> QueryService.java:121)
>
> at org.apache.kylin.rest.service.QueryService$$
> FastClassByCGLIB$$4957273f.invoke()
>
> …..
>
> Caused by: java.lang.NullPointerException
>
> at org.apache.kylin.query.relnode.OLAPAggregateRel.
> translateAggregation(OLAPAggregateRel.java:268)
>
> at org.apache.kylin.query.relnode.OLAPAggregateRel.
> implementRewrite(OLAPAggregateRel.java:240)
>
> at org.apache.kylin.query.relnode.OLAPRel$
> RewriteImplementor.visitChild(OLAPRel.java:121)
>
> at org.apache.kylin.query.relnode.OLAPProjectRel.implementRewrite(
> OLAPProjectRel.java:233)
>
> at org.apache.kylin.query.relnode.OLAPRel$
> RewriteImplementor.visitChild(OLAPRel.java:121)
>
> at org.apache.kylin.query.relnode.OLAPLimitRel.
> implementRewrite(OLAPLimitRel.java:101)
>
> at org.apache.kylin.query.relnode.OLAPRel$
> RewriteImplementor.visitChild(OLAPRel.java:121)
>
> at org.apache.kylin.query.relnode.OLAPToEnumerableConverter.
> implement(OLAPToEnumerableConverter.java:95)
>
> at org.apache.calcite.adapter.enumerable.EnumerableRelImplementor.
> implementRoot(EnumerableRelImplementor.java:102)
>


Re: Release apache-kylin-1.6.0 (RC2)

2016-11-24 Thread Alberto Ramón
Left: result of
*my ./build/script/package.sh* (OK)
Right: apache-kylin-1.6.0-SNAPSHOT-bin

Hmm... the result is not the same.
Do I need to move files manually to generate a bin.tar.gz?

[image: inline image 1]


[image: inline image 2]

2016-11-24 3:43 GMT+01:00 ShaoFeng Shi :

> Hi Alberto, thanks for the question; the "/build" folder was excluded by
> the assembly tool by mistake I think (the name is too common). I created a
> JIRA (KYLIN-2229) for solving this. For now please get the "/build" folder
> from Kylin's git repository. Please note: to build a binary package, you
> need to install Maven, npm and Grunt first.
>
> 2016-11-24 5:39 GMT+08:00 Alberto Ramón :
>
>> My vote = Null (I'm rookie)
>>
>> 1 - Using: https://dist.apache.org/repos/dist/dev/kylin/apache-kylin-1.
>> 6.0-rc2/
>> 2 - mvn clean install -DskipTests  -->  OK on Ubuntu 16.04
>>
>> But how create Binary package? :
>> http://kylin.apache.org/development/howto_package.html,  I don't have
>> /build/ folder ?
>>
>
>
>
> --
> Best regards,
>
> Shaofeng Shi 史少锋
>
>


Release apache-kylin-1.6.0 (RC2)

2016-11-23 Thread Alberto Ramón
My vote = null (I'm a rookie)

1 - Using:
https://dist.apache.org/repos/dist/dev/kylin/apache-kylin-1.6.0-rc2/
2 - mvn clean install -DskipTests  -->  OK on Ubuntu 16.04

But how do I create the binary package? Following
http://kylin.apache.org/development/howto_package.html, I don't have a
/build/ folder?


Re: IN_THRESHOLD

2016-11-21 Thread Alberto Ramón
Very, very clear.
Thanks!!

2016-11-18 4:16 GMT+01:00 Li Yang :

> For filter on derived column, it has to translate into a filter on PK.
>
> E.g. say USER_NAME is a derived column (not on cube), USER_ID is its PK
> (on cube). When filter USER_NAME='liyang' comes in, it need to translate
> into USER_ID in (1,211,382), where ID 1, 211, 382 are three users whose
> name is 'liyang'.
>
> Now consider 'liyang' is so common a name that there are thousands of
> 'liyang's. Then the IN clause becomes super long and can cause performance
> problem during storage scanning. In such case, the filter can be translated
> into a range filter instead, like USER_ID between 1 and 382.
>
> The threshold is used to decide whether the translation returns an IN
> condition or a range condition.
>
> Cheers
> Yang
>
> On Wed, Nov 16, 2016 at 12:35 AM, Alberto Ramón  > wrote:
>
>> About Kylin 2193
>> What is the purpose of
>> org.apache.kylin.storage.translate.DerivedFilterTranslator#
>> IN_THRESHOLD ? :)
>> (when is it used?)
>>
>
>
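Yang's explanation above can be sketched as follows; the threshold value and the function shape here are assumptions for illustration, not Kylin's actual code:

```python
def translate_derived_filter(matching_pks, threshold=20):
    """Translate a filter on a derived column into a filter on the PK that
    IS on the cube: a precise IN clause when few PKs match, a coarser but
    cheap range condition otherwise. The threshold value is an assumption."""
    pks = sorted(matching_pks)
    if len(pks) <= threshold:
        return ("IN", pks)
    return ("BETWEEN", pks[0], pks[-1])

# Three users named 'liyang' -> exact IN condition on USER_ID.
print(translate_derived_filter({1, 211, 382}))     # ('IN', [1, 211, 382])
# Thousands of matches -> a range scan instead of a huge IN list.
print(translate_derived_filter(range(1, 5001)))    # ('BETWEEN', 1, 5000)
```

The trade-off is precision vs. scan cost: the range condition may read extra rows, but avoids evaluating an enormous IN list during storage scanning.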


Re: Reply: Re[2]: build cube step 3 error

2016-11-18 Thread Alberto Ramón
Ckeck This , If you have any problem tell me it

https://drive.google.com/drive/folders/0B-6nZ2q-HPTNV0xqRnNtZE03d0E?usp=sharing

2016-11-18 10:51 GMT+01:00 Li Yang :

> Do you (or anyone) still have CDH 5.4 sandbox? We've wanted to test Kylin
> on that env but lack of sandbox for long time.
>
> Yang
>
> On Fri, Nov 18, 2016 at 5:29 PM, ShangYong Li(李尚勇) 
> wrote:
>
>> CDH 5.4.7; Hadoop is 2.6, and it is a distributed cluster
>>
>>
>>
>> Thanks
>>
>> ShangYong Li
>>
>>
>>
>> *From:* Li Yang [mailto:liy...@apache.org]
>> *Sent:* 2016-11-18 17:07
>> *To:* user@kylin.apache.org; Serhat Can
>> *Subject:* Re: Re[2]: build cube step 3 error
>>
>>
>>
>> It looks like inconsistent hadoop versions to me too.
>>
>> What is your Hadoop version / distribution?
>>
>>
>>
>> On Thu, Nov 17, 2016 at 2:34 PM, Serhat Can  wrote:
>>
>> Another point: if you download and install each component from the
>> Apache download site, you will most probably get this error. I faced it
>> once, and after that I installed Kylin on the Hortonworks and MapR
>> Hadoop distributions and it worked fine.
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> -- Original Message --
>>
>> From: "Billy(Yiming) Liu" 
>>
>> To: "user" 
>>
>> Sent: 11/17/2016 9:25:42 AM
>>
>> Subject: Re: build cube step 3 error
>>
>>
>>
>> Kylin depends on Hadoop 2.6 and Yarn 2.6 API.
>>
>> Please check if the Kylin version match your hadoop distribution.
>>
>>
>>
>> 2016-11-17 11:14 GMT+08:00 ShangYong Li(李尚勇) :
>>
>> Kylin-1.5.4.1 Build cube step 3 error, can anyone please help me
>> understand the issue and how to fix it?
>>
>> java.lang.NoSuchMethodError: org.apache.hadoop.yarn.proto.Y
>> arnProtos$LocalResourceProto.hashLong(J)I
>>
>>  at org.apache.hadoop.yarn.proto.YarnProtos$LocalResourceProto.h
>> ashCode(YarnProtos.java:11655)
>>
>>  at org.apache.hadoop.yarn.api.records.impl.pb.LocalResourcePBIm
>> pl.hashCode(LocalResourcePBImpl.java:62)
>>
>>  at java.util.HashMap.hash(HashMap.java:338)
>>
>>  at java.util.HashMap.put(HashMap.java:611)
>>
>>  at org.apache.hadoop.mapred.LocalDistributedCacheManager.setup(
>> LocalDistributedCacheManager.java:133)
>>
>>  at org.apache.hadoop.mapred.LocalJobRunner$Job.(LocalJobR
>> unner.java:163)
>>
>>  at org.apache.hadoop.mapred.LocalJobRunner.submitJob(LocalJobRu
>> nner.java:731)
>>
>>  at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(J
>> obSubmitter.java:536)
>>
>>  at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1306)
>>
>>  at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1303)
>>
>>  at java.security.AccessController.doPrivileged(Native Method)
>>
>>  at javax.security.auth.Subject.doAs(Subject.java:422)
>>
>>  at org.apache.hadoop.security.UserGroupInformation.doAs(UserGro
>> upInformation.java:1671)
>>
>>  at org.apache.hadoop.mapreduce.Job.submit(Job.java:1303)
>>
>>  at org.apache.kylin.engine.mr.common.AbstractHadoopJob.waitForC
>> ompletion(AbstractHadoopJob.java:150)
>>
>>  at org.apache.kylin.engine.mr.steps.FactDistinctColumnsJob.run(
>> FactDistinctColumnsJob.java:108)
>>
>>  at org.apache.kylin.engine.mr.MRUtil.runMRJob(MRUtil.java:88)
>>
>>  at org.apache.kylin.engine.mr.common.MapReduceExecutable.doWork
>> (MapReduceExecutable.java:120)
>>
>>  at org.apache.kylin.job.execution.AbstractExecutable.execute(
>> AbstractExecutable.java:113)
>>
>>  at org.apache.kylin.job.execution.DefaultChainedExecutable.doWo
>> rk(DefaultChainedExecutable.java:57)
>>
>>  at org.apache.kylin.job.execution.AbstractExecutable.execute(
>> AbstractExecutable.java:113)
>>
>>  at org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRun
>> ner.run(DefaultScheduler.java:136)
>>
>>  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPool
>> Executor.java:1142)
>>
>>  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoo
>> lExecutor.java:617)
>>
>>  at java.lang.Thread.run(Thread.java:745)
>>
>> Thanks
>>
>> ShangYong Li
>>
>>
>>
>>
>>
>>
>>
>> --
>>
>> With Warm regards
>>
>> Yiming Liu (刘一鸣)
>>
>>
>> [image: image removed by sender]
>>
>>
>>
>
>


Testing 1.6

2016-11-15 Thread Alberto Ramón
Hi

- When you are building a cube, the web page does not auto-refresh; was
this the behavior of older versions?   (I use Chrome)

- There is no doc about migrating from an old version.


IN_THRESHOLD

2016-11-15 Thread Alberto Ramón
About KYLIN-2193:
what is the purpose of
org.apache.kylin.storage.translate.DerivedFilterTranslator#IN_THRESHOLD ?
:)
(When is it used?)


Re: Reply: Most Used BI Tools

2016-11-07 Thread Alberto Ramón
Hello

I'm trying to document the integration of Kylin with BI tools (I will add
more info in the next weeks):
(https://github.com/albertoRamon/Kylin/tree/master/KylinWithMain)

As a summary:

- Microsoft Power BI: (bug KYLIN-2121) it does not work
- Qlik: very partial support (not recommended for production)
- Tableau: some small issues, ready for production
- Hue: partial support; only works with table output (no graphics), no more
than 1,000 records, no auto-complete support
- SQuirreL: (isn't a BI tool) works fine
- Flink: (isn't a BI tool) works fine (needs v1.2, under development
nowadays)
- Sisense: fails
- Kylin Caravel: fails, and nowadays it is not maintained


Alb


2016-11-07 9:20 GMT+01:00 仇同心 :

> Can you give a detailed introduction of the BI tools?
>
>
>
> *From:* hongbin ma [mailto:mahong...@apache.org]
> *Sent:* 2016-11-07 13:39
> *To:* user.kylin
> *Subject:* Re: Most Used BI Tools
>
>
>
> nice stuff!
>
>
>
> On Mon, Nov 7, 2016 at 1:09 AM, Alberto Ramón 
> wrote:
>
> fyi:
>
> https://image-store.slidesharecdn.com/0d3c9706-
> b8a6-4716-afc1-26beb82c8704-large.png
>
>
>
>
>
> --
>
> Regards,
>
>
> *Bin Mahone | **马洪宾*
>


Most Used BI Tools

2016-11-06 Thread Alberto Ramón
fyi:

https://image-store.slidesharecdn.com/0d3c9706-b8a6-4716-afc1-26beb82c8704-large.png


Re: Kylin Dependencies

2016-11-03 Thread Alberto Ramón
Yes, I saw 1,200 processes.
(This can be OK for an all-in-one Docker image, for development or training.)

I showed this to a "Docker Captain" and (it was funny):

(
Docker is somewhat radical; also:
understand that ideally there is only 1 user process per container, with
PID = 1, and Docker only monitors/checks PID = 1:
if PID 1 is OK  -->  the container is OK
if PID 1 is not OK --> auto-restart the container

Which process will be PID 1? The process that you put in CMD in the
Dockerfile (obviously you can only have one CMD per Dockerfile).
)

I will try to build a "Mini Kylin"... I will keep you informed about my
progress!!

2016-11-03 6:49 GMT+01:00 ShaoFeng Shi :

> are there so many running processes in the kylin docker image? Although
> Kylin relies on Hive/YARN/HDFS/HBase, they are all client jars instead of
> running services; To minimal the docker image, some components can be
> removed like openssh-server openssh-clients snappy snappy-devel
> hadoop-native (from 1.5.4 kylin doesn't use compression by default)
>
> 2016-11-02 23:15 GMT+08:00 Alberto Ramón :
>
>> yes, I tested (and use) this and other previous version
>>
>> BUT the image :
>>   -  more than > 1000 process
>>   -  more than > 3GB
>>
>> This is OK (very OK) for testing / develop / PoC
>>
>> But for production (docker recomendations):
>>   -  Ideally 1 process (5-10 can be acceptable)
>>   -  < 100 MB (200 - 300 MB can be acceptable)
>>
>> The target is: Create Kylin docker (minimal) *with out install *Hive,
>> YARN, HDFS, or HBase 
>>
>> 2016-11-02 15:52 GMT+01:00 Billy(Yiming) Liu :
>>
>>> Here is a quick start for running Kylin on docker,
>>> https://github.com/kyligence/kylin-docker
>>>
>>> From the docker file, you could find the kylin dependencies.
>>>
>>> 2016-11-02 22:46 GMT+08:00 Alberto Ramón :
>>>
>>>> With configs ... I can try it (Will be an interesting exercise for
>>>> me)
>>>> But libraries, ...
>>>>These libraries can be static compiled on Kylin?
>>>> Any Idea / solution about how to solve all dependecies with out
>>>> install HDFS, Yarn, Hive, HBase in this minimal Linux... ?
>>>>
>>>> the idea is make "minimal linux + Kylin" "to docker it"
>>>> (The result must be few MB, < 150 MB)
>>>>
>>>>
>>>> 2016-11-02 14:20 GMT+01:00 Li Yang :
>>>>
>>>>> Kylin needs Hadoop client library and configs, including hdfs, yarn,
>>>>> hive, hbase.
>>>>>
>>>>> On Sun, Oct 30, 2016 at 1:42 AM, Alberto Ramón <
>>>>> a.ramonporto...@gmail.com> wrote:
>>>>>
>>>>>> Hi
>>>>>>
>>>>>> Target:
>>>>>>   All Kylin docker are VERY heavy !! (GB and hundred of process) --->
>>>>>> That Is Good for Develop / testing , but BAD Idea for production
>>>>>> I'm trying to install Kylin on minimal linux, ideally Alpine or
>>>>>> similar
>>>>>>
>>>>>> I have:
>>>>>> -  a clean install of linux (minimal Centos for example) , without
>>>>>> Hadoop, and and install Kylin from binary
>>>>>>  - use remote HBase & Hive
>>>>>>
>>>>>>
>>>>>> Which dependencies of Kylin I Will need on my Centos / Alpine?
>>>>>>
>>>>>> BR, Alb
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>>
>>> --
>>> With Warm regards
>>>
>>> Yiming Liu (刘一鸣)
>>>
>>
>>
>
>
> --
> Best regards,
>
> Shaofeng Shi 史少锋
>
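The PID 1 rule described in this thread (Docker watches only the container's main process and restarts the container when it dies) can be illustrated with a stand-in child process; this is a sketch of the concept, not Docker itself:

```python
import subprocess
import sys

# A child process stands in for the container's CMD, i.e. its PID 1.
child = subprocess.Popen(
    [sys.executable, "-c", "import time; time.sleep(0.5)"])

# While "PID 1" is alive, the container is considered healthy.
assert child.poll() is None

# When "PID 1" exits, the container stops (and is auto-restarted if a
# restart policy is configured).
child.wait()
assert child.returncode == 0
```

This is why a monolithic image with 1,200 processes fights Docker's model: only one of those processes is actually supervised.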
>


Re: Kylin Dependencies

2016-11-02 Thread Alberto Ramón
Yes, I tested (and use) this and other previous versions.

BUT the image has:
  -  more than 1,000 processes
  -  more than 3 GB

This is OK (very OK) for testing / development / PoC.

But for production (Docker recommendations):
  -  ideally 1 process (5-10 can be acceptable)
  -  < 100 MB (200-300 MB can be acceptable)

The target is: create a minimal Kylin Docker image *without installing* Hive,
YARN, HDFS, or HBase 

2016-11-02 15:52 GMT+01:00 Billy(Yiming) Liu :

> Here is a quick start for running Kylin on docker, https://github.com/
> kyligence/kylin-docker
>
> From the docker file, you could find the kylin dependencies.
>
> 2016-11-02 22:46 GMT+08:00 Alberto Ramón :
>
>> With configs ... I can try it (Will be an interesting exercise for
>> me)
>> But libraries, ...
>>These libraries can be static compiled on Kylin?
>> Any Idea / solution about how to solve all dependecies with out
>> install HDFS, Yarn, Hive, HBase in this minimal Linux... ?
>>
>> the idea is make "minimal linux + Kylin" "to docker it"
>> (The result must be few MB, < 150 MB)
>>
>>
>> 2016-11-02 14:20 GMT+01:00 Li Yang :
>>
>>> Kylin needs Hadoop client library and configs, including hdfs, yarn,
>>> hive, hbase.
>>>
>>> On Sun, Oct 30, 2016 at 1:42 AM, Alberto Ramón <
>>> a.ramonporto...@gmail.com> wrote:
>>>
>>>> Hi
>>>>
>>>> Target:
>>>>   All Kylin docker are VERY heavy !! (GB and hundred of process) --->
>>>> That Is Good for Develop / testing , but BAD Idea for production
>>>> I'm trying to install Kylin on minimal linux, ideally Alpine or similar
>>>>
>>>> I have:
>>>> -  a clean install of linux (minimal Centos for example) , without
>>>> Hadoop, and and install Kylin from binary
>>>>  - use remote HBase & Hive
>>>>
>>>>
>>>> Which dependencies of Kylin I Will need on my Centos / Alpine?
>>>>
>>>> BR, Alb
>>>>
>>>
>>>
>>
>
>
> --
> With Warm regards
>
> Yiming Liu (刘一鸣)
>


Re: Kylin Dependencies

2016-11-02 Thread Alberto Ramón
With configs ... I can try it (it will be an interesting exercise for me).
But libraries ...
   Can these libraries be statically compiled into Kylin?
Any idea / solution about how to resolve all the dependencies without
installing HDFS, YARN, Hive, HBase on this minimal Linux ... ?

The idea is to make a "minimal Linux + Kylin" and "dockerize it".
(The result must be a few MB, < 150 MB.)


2016-11-02 14:20 GMT+01:00 Li Yang :

> Kylin needs Hadoop client library and configs, including hdfs, yarn, hive,
> hbase.
>
> On Sun, Oct 30, 2016 at 1:42 AM, Alberto Ramón 
> wrote:
>
>> Hi
>>
>> Target:
>>   All Kylin docker are VERY heavy !! (GB and hundred of process) --->
>> That Is Good for Develop / testing , but BAD Idea for production
>> I'm trying to install Kylin on minimal linux, ideally Alpine or similar
>>
>> I have:
>> -  a clean install of linux (minimal Centos for example) , without Hadoop,
>> and and install Kylin from binary
>>  - use remote HBase & Hive
>>
>>
>> Which dependencies of Kylin I Will need on my Centos / Alpine?
>>
>> BR, Alb
>>
>
>

