from:"Alberto Ramón"

Re: Kylin as a data source supported by Grafana

2018-10-16 Thread Alberto Ramón

If your column is at the grain of hours, days, etc., this use case is a good fit for Apache
Kylin.
If your column is a timestamp, it is not the best scenario for Apache Kylin.

This means that, in the best case, Grafana will show you values
grouped by hour.
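Alberto's point is that Kylin precomputes aggregates at a fixed time grain. A Grafana panel backed by Kylin will therefore show, at best, hour-level buckets rather than raw timestamps. Below is a minimal sketch of that roll-up in plain Java. It assumes raw events arrive as (timestamp, value) pairs. The class name `HourlyRollup` is hypothetical, and this is not how Kylin itself builds cuboids:

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: sums raw (timestamp, value) events into hourly buckets,
// the kind of pre-aggregated grain a Kylin cube would serve to a Grafana panel.
final class HourlyRollup {
    static Map<Instant, Double> sumByHour(List<Map.Entry<Instant, Double>> events) {
        // TreeMap keeps the buckets in time order, as a time-series chart expects.
        Map<Instant, Double> buckets = new TreeMap<>();
        for (Map.Entry<Instant, Double> e : events) {
            Instant hour = e.getKey().truncatedTo(ChronoUnit.HOURS); // drop minutes and seconds
            buckets.merge(hour, e.getValue(), Double::sum);
        }
        return buckets;
    }

    public static void main(String[] args) {
        List<Map.Entry<Instant, Double>> events = List.of(
                Map.entry(Instant.parse("2018-10-16T13:05:00Z"), 2.0),
                Map.entry(Instant.parse("2018-10-16T13:40:00Z"), 3.0),
                Map.entry(Instant.parse("2018-10-16T14:10:00Z"), 1.0));
        // prints {2018-10-16T13:00:00Z=5.0, 2018-10-16T14:00:00Z=1.0}
        System.out.println(sumByHour(events));
    }
}
```

Once the data sits at hourly grain, a timestamp-oriented tool like Grafana can only zoom down to that grain.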

On Tue, 16 Oct 2018 at 13:20, 潘博存  wrote:

>
>
> 1. Grafana is time-based and is built around time columns, but that
> doesn't mean Grafana's data sources must all be time-series databases;
> Grafana also supports MySQL and SQL Server, for example.
> 2. In our business scenario we care more about Grafana's presentation
> capabilities, and for the timeline we use our business dates at day,
> hour, etc. granularity.
>
>
>
> So I think Grafana + Kylin is another form of presentation besides Saiku,
> Tableau, and so on. In fact, we're trying to embed Saiku into Grafana as a
> layout plug-in for data presentation.
>
>
>
>
>
> --
> From: Alberto Ramón
> Sent: Tuesday, 16 October 2018 17:31
> To: user
> CC: 潘博存; dev
> Subject: Re: Kylin as a data source supported by Grafana
>
> I checked this possibility some time ago (2-3 years).
> Grafana is focused on time series (one column must be a timestamp).
> Working with raw timestamps doesn't make sense in Apache Kylin, because you are not aggregating.
>
> On Tue, 16 Oct 2018 at 06:04, ShaoFeng Shi  wrote:
> Good question, let me translate it to English:
>
> Grafana is one of our important data visualization tools; Kylin is a
> powerful tool for big data query, we want to display Kylin data on Grafana,
> is there anyone already running this solution? Is there a grafana-kylin
> plugin that can be used directly? Currently, Grafana doesn't provide a
> plugin for Kylin.
>
> On Tue, 16 Oct 2018 at 11:27 AM, 潘博存 wrote:
>
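> Hi all,
> For big data visualization, Grafana is one of our important presentation
> tools, and Kylin's fast queries make it a powerful tool for big data. We want
> to display Kylin data in Grafana. Is anyone already using it this way? Is
> there a grafana-kylin plugin that can be used directly? Grafana currently
> has no Kylin plugin.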
>
>
> --
> Best regards,
>
> Shaofeng Shi 史少锋
>
>
>

Re: Kylin as a data source supported by Grafana

2018-10-16 Thread Alberto Ramón

I checked this possibility some time ago (2-3 years).
Grafana is focused on time series (one column must be a timestamp).
Working with raw timestamps doesn't make sense in Apache Kylin, because you are not aggregating.

On Tue, 16 Oct 2018 at 06:04, ShaoFeng Shi  wrote:

> Good question, let me translate it to English:
>
> Grafana is one of our important data visualization tools; Kylin is a
> powerful tool for big data query, we want to display Kylin data on Grafana,
> is there anyone already running this solution? Is there a grafana-kylin
> plugin that can be used directly? Currently, Grafana doesn't provide a
> plugin for Kylin.
>
> On Tue, 16 Oct 2018 at 11:27 AM, 潘博存 wrote:
>
>>
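>> Hi all,
>> For big data visualization, Grafana is one of our important presentation
>> tools, and Kylin's fast queries make it a powerful tool for big data. We want
>> to display Kylin data in Grafana. Is anyone already using it this way? Is
>> there a grafana-kylin plugin that can be used directly? Grafana currently
>> has no Kylin plugin.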
>>
>
>
> --
> Best regards,
>
> Shaofeng Shi 史少锋
>
>

Re: TOPN

2018-09-20 Thread Alberto Ramón

TopN is a predefined measure in Apache Kylin, so row_number() is not
needed (and isn't recommended, because you are querying a cube).

"What is the link to the list of functions supported by kylin?" I think you
are referring to the predefined measures (reference:
http://kylin.apache.org/docs/tutorial/create_cube.html): SUM, MAX, MIN, COUNT,
COUNT_DISTINCT, TOP_N, EXTENDED_COLUMN and PERCENTILE.
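As a rough illustration of what a TOP_N measure stores, the sketch below keeps the N items with the largest summed value for each group. With such a precomputed result, there is no need to rank rows with ROW_NUMBER() at query time. This is plain Java that assumes rows are (group, item, value) triples. `TopNPerGroup` and `Row` are hypothetical names. The sketch is exact, whereas Kylin's real TOP_N measure is approximate and implemented differently:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration: exact top-N items per group by summed value.
final class TopNPerGroup {
    static final class Row {
        final String group;
        final String item;
        final double value;

        Row(String group, String item, double value) {
            this.group = group;
            this.item = item;
            this.value = value;
        }
    }

    static Map<String, List<String>> topN(List<Row> rows, int n) {
        // Step 1: aggregate, group -> (item -> summed value)
        Map<String, Map<String, Double>> sums = new TreeMap<>();
        for (Row r : rows) {
            sums.computeIfAbsent(r.group, k -> new HashMap<>()).merge(r.item, r.value, Double::sum);
        }
        // Step 2: rank the items inside each group by descending sum and keep the first n
        Map<String, List<String>> result = new TreeMap<>();
        for (Map.Entry<String, Map<String, Double>> g : sums.entrySet()) {
            List<Map.Entry<String, Double>> ranked = new ArrayList<>(g.getValue().entrySet());
            ranked.sort(Map.Entry.<String, Double>comparingByValue().reversed());
            List<String> top = new ArrayList<>();
            for (int i = 0; i < Math.min(n, ranked.size()); i++) {
                top.add(ranked.get(i).getKey());
            }
            result.put(g.getKey(), top);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Row> rows = List.of(
                new Row("2012-01-01", "seller1", 10),
                new Row("2012-01-01", "seller2", 30),
                new Row("2012-01-01", "seller1", 25),
                new Row("2012-01-01", "seller3", 5));
        System.out.println(topN(rows, 2)); // prints {2012-01-01=[seller1, seller2]}
    }
}
```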

On Thu, 20 Sep 2018 at 04:09, zhengyangju...@163.com 
wrote:

>
> What is the alternative to the row_number() function? I need to get the
> top N within each group.
>
> I am using Kylin 2.4.1. What is the link to the list of functions
> supported by Kylin?
>
>
> zhengyangju...@163.com
>

Lattices

2018-03-05 Thread Alberto Ramón

FYI


I’m reading about Apache Calcite Lattices

Would it make sense to expose Kylin as a materialized view/cube?

Implementing data cubes efficiently
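For context: Calcite lattices and the paper "Implementing data cubes efficiently" both treat a cube as a lattice of cuboids, one for each subset of the dimensions. The materialization question is which of those nodes to precompute. Below is a minimal sketch of that lattice. It assumes cuboids are encoded as bitmasks over an ordered dimension list, similar in spirit to how Kylin identifies cuboids. The class and method names are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: enumerate every cuboid (group-by subset) of a set of dimensions.
final class CuboidLattice {
    // Bit i (counting from the left) of the mask says whether dims.get(i) is in the cuboid.
    static List<List<String>> cuboids(List<String> dims) {
        int n = dims.size();
        if (n > 62) throw new IllegalArgumentException("too many dimensions for a long bitmask");
        List<List<String>> all = new ArrayList<>();
        // From the base cuboid (all dimensions, mask 2^n - 1) down to the apex (no dimensions, mask 0).
        for (long mask = (1L << n) - 1; mask >= 0; mask--) {
            List<String> cuboid = new ArrayList<>();
            for (int i = 0; i < n; i++) {
                if ((mask & (1L << (n - 1 - i))) != 0) {
                    cuboid.add(dims.get(i));
                }
            }
            all.add(cuboid);
        }
        return all;
    }

    // A cuboid can answer a query if it contains every dimension the query groups or filters by.
    static boolean canAnswer(List<String> cuboid, List<String> queryDims) {
        return cuboid.containsAll(queryDims);
    }

    public static void main(String[] args) {
        List<List<String>> lattice = cuboids(List.of("part_dt", "seller_id", "lstg_site_id"));
        System.out.println(lattice.size()); // 8 = 2^3 cuboids
        System.out.println(lattice.get(0)); // [part_dt, seller_id, lstg_site_id]
    }
}
```

The lattice grows as 2^n, which is why choosing which cuboids to materialize matters.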

About GitBox and Jira

2018-02-13 Thread Alberto Ramón

With the "old" Jira emails ("[jira] [Created] (KYLIN-3250) ...")

you had the link to the Jira, its description/target, its status and people's
comments, and you could subscribe to your favorite Jiras.


Have we lost these emails? They were very useful for following the progress of
Apache Kylin.


(With the new system I get a lot of emails containing fragments of code, and I
don't know what they are for.)

Re: [jira] [Created] (KYLIN-3171) Support hadoop 3 release

2018-01-21 Thread Alberto Ramón

Duplicate of KYLIN-2565?

On 17 January 2018 at 03:28, Ted Yu (JIRA)  wrote:

> Ted Yu created KYLIN-3171:
> -
>
>  Summary: Support hadoop 3 release
>  Key: KYLIN-3171
>  URL: https://issues.apache.org/jira/browse/KYLIN-3171
>  Project: Kylin
>   Issue Type: Improvement
> Reporter: Ted Yu
>
>
> When compiling against hadoop 3, I got:
> {code}
> [ERROR] Failed to execute goal org.apache.maven.plugins:
> maven-compiler-plugin:3.5.1:compile (default-compile) on project
> kylin-engine-mr: Compilation failure: Compilation  failure:
> [ERROR] /a/kylin/engine-mr/src/main/java/org/apache/kylin/engine/
> mr/common/DefaultSslProtocolSocketFactory.java:[29,36] error: package
> org.apache.commons.httpclient does not   exist
> [ERROR] /a/kylin/engine-mr/src/main/java/org/apache/kylin/engine/
> mr/common/DefaultSslProtocolSocketFactory.java:[30,36] error: package
> org.apache.commons.httpclient does not   exist
> [ERROR] /a/kylin/engine-mr/src/main/java/org/apache/kylin/engine/
> mr/common/DefaultSslProtocolSocketFactory.java:[31,43] error: package
> org.apache.commons.httpclient.params does not exist
> [ERROR] /a/kylin/engine-mr/src/main/java/org/apache/kylin/engine/
> mr/common/DefaultSslProtocolSocketFactory.java:[32,45] error: package
> org.apache.commons.httpclient.protocol   does not exist
> [ERROR] /a/kylin/engine-mr/src/main/java/org/apache/kylin/engine/
> mr/common/DefaultSslProtocolSocketFactory.java:[33,45] error: package
> org.apache.commons.httpclient.protocol   does not exist
> [ERROR] /a/kylin/engine-mr/src/main/java/org/apache/kylin/engine/
> mr/common/DefaultSslProtocolSocketFactory.java:[41,56] error: cannot find
> symbol
> [ERROR]   symbol: class SecureProtocolSocketFactory
> [ERROR] /a/kylin/engine-mr/src/main/java/org/apache/kylin/engine/
> mr/common/DefaultSslProtocolSocketFactory.java:[94,125] error: cannot
> find symbol
> [ERROR]   symbol:   class HttpConnectionParams
> [ERROR]   location: class DefaultSslProtocolSocketFactory
> [ERROR] /a/kylin/engine-mr/src/main/java/org/apache/kylin/engine/
> mr/common/DefaultSslProtocolSocketFactory.java:[94,196] error: cannot
> find symbol
> [ERROR]   symbol:   class ConnectTimeoutException
> [ERROR]   location: class DefaultSslProtocolSocketFactory
> [ERROR] /a/kylin/engine-mr/src/main/java/org/apache/kylin/engine/
> mr/common/DefaultSslProtocolSocketFactory.java:[105,19] error: cannot
> find symbol
> [ERROR]   symbol:   variable ControllerThreadSocketFactory
> [ERROR]   location: class DefaultSslProtocolSocketFactory
> {code}
> We should allow building against hadoop 3.
>
>
>
> --
> This message was sent by Atlassian JIRA
> (v7.6.3#76005)
>

Re: [jira] [Created] (KYLIN-3186) Add support for partitioning columns that combine date and time (e.g. YYYYMMDDHHMISS)

2018-01-21 Thread Alberto Ramón

Could you check this:
https://issues.apache.org/jira/browse/KYLIN-1427

Alb

On 19 January 2018 at 21:33, Vsevolod Ostapenko (JIRA) 
wrote:

> Vsevolod Ostapenko created KYLIN-3186:
> -
>
>  Summary: Add support for partitioning columns that combine
> date and time (e.g. YYYYMMDDHHMISS)
>  Key: KYLIN-3186
>  URL: https://issues.apache.org/jira/browse/KYLIN-3186
>  Project: Kylin
>   Issue Type: Improvement
>   Components: General
> Affects Versions: v2.2.0
> Reporter: Vsevolod Ostapenko
>
>
> In many existing enterprise applications, partitioning is done on
> a single column that fuses date and time into a single value (string,
> integer or big integer). Typical formats are YYYYMMDDHHMM
> or YYYYMMDDHHMMSS (e.g. 201801181621 and 20180118154734).
> Such a representation is human-readable and provides natural sorting of the
> date/time values.
>
> Lack of support for such a date/time representation requires some ugly
> workarounds, like creating views that split date and time into separate
> columns or copying data into tables with a different partitioning scheme,
> none of which is a particularly good solution.
> Moreover, using views on Hive causes severe performance issues,
> because the Hive optimizer cannot correctly analyze the filtering conditions
> that Kylin auto-generates during the flat table build step.
>
>
>
> --
> This message was sent by Atlassian JIRA
> (v7.6.3#76005)
>
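To make the problem concrete: the view workaround described in the issue amounts to deriving separate date and time values from the fused column. Below is a minimal sketch of that transformation in Java. It assumes the value arrives as a 12- or 14-digit string. `FusedDateTime` is a hypothetical name, and the sketch shows only the transformation, not how KYLIN-3186 would be implemented:

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper: split a fused YYYYMMDDHHMM / YYYYMMDDHHMMSS value into
// separate date and time strings, i.e. what the "view" workaround computes.
final class FusedDateTime {
    private static final DateTimeFormatter TO_MINUTE = DateTimeFormatter.ofPattern("yyyyMMddHHmm");
    private static final DateTimeFormatter TO_SECOND = DateTimeFormatter.ofPattern("yyyyMMddHHmmss");
    private static final DateTimeFormatter DATE = DateTimeFormatter.ofPattern("yyyy-MM-dd");
    private static final DateTimeFormatter TIME = DateTimeFormatter.ofPattern("HH:mm:ss");

    // Accepts the value as a string; an integer column can be passed via Long.toString(value).
    static String[] split(String fused) {
        LocalDateTime t;
        switch (fused.length()) {
            case 12: t = LocalDateTime.parse(fused, TO_MINUTE); break;
            case 14: t = LocalDateTime.parse(fused, TO_SECOND); break;
            default: throw new IllegalArgumentException("unexpected fused date-time: " + fused);
        }
        return new String[] {DATE.format(t), TIME.format(t)};
    }

    public static void main(String[] args) {
        // prints 2018-01-18 16:21:00
        System.out.println(String.join(" ", split("201801181621")));
        // prints 2018-01-18 15:47:34
        System.out.println(String.join(" ", split(Long.toString(20180118154734L))));
    }
}
```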

Re: ODBC connections get error character

2018-01-07 Thread Alberto Ramón

Could you check KYLIN-2816?

BR, Alb

On 7 January 2018 at 05:21, Wangdp  wrote:

> Hi,
>
>   When a column name is in Chinese and I connect via ODBC in Excel, the
> column name is displayed incorrectly.
>
>
>
>
> apache_...@163.com
>

Re: [Announce] New Apache Kylin PMC Billy Liu

2017-10-16 Thread Alberto Ramón

Congratulations to Billy, Guosheng and Cheng Wang!!

On 16 October 2017 at 11:33, Luke Han  wrote:

> On behalf of the Apache Kylin PMC, I am very pleased to announce
> that Billy Liu has accepted the PMC's invitation to become a
> PMC member on the project.
>
> We appreciate all of Billy's generous contributions: many bug
> fixes and patches, and helping many users. We are so glad to have him as
> a new PMC member and look forward to his continued involvement.
>
> Congratulations and Welcome, Billy!
>

Re: Re: kylin encode

2017-09-30 Thread Alberto Ramón

I think so. You have three formats to choose from, or you can compose the column.

On 30 September 2017 at 09:21, 崔苗  wrote:

> The time column in our table is an epoch timestamp such as 1501210920742,
> saved as bigint in Hive, not a date format like yyyy-MM-dd HH:mm:ss. So must
> we convert the timestamp to a date format?
> On 2017-09-30 16:05:23, "Alberto Ramón" wrote:
> >In summary:
> >
> >yyyy-MM-dd / yyyyMMdd / yyyy-MM-dd HH:mm:ss
> >
> >Check these JIRAs:
> >https://issues.apache.org/jira/browse/KYLIN-1101
> >https://issues.apache.org/jira/browse/KYLIN-1441
> >https://issues.apache.org/jira/browse/KYLIN-1427
> >
> >On 30 September 2017 at 04:45, 崔苗  wrote:
> >
> >> We have a timestamp column used as the partition column. What should be
> >> used to encode the column, date or time? What's the difference between
> >> the two encodings? BTW, can Kylin recognize every timestamp precision,
> >> whether it's in seconds or milliseconds?
> >>
> >>
> >> Thanks in advance for your reply.
> >>
> >>
> >>
> >>
> >>
> >>
>
>
>
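If the partition column holds epoch milliseconds, as in the 1501210920742 example, one option is to derive a string column before building the cube. That column would use one of the formats Alberto lists (yyyy-MM-dd, yyyyMMdd or yyyy-MM-dd HH:mm:ss) and could be produced, for instance, in the ETL that fills the Hive table. Below is a minimal sketch of the conversion in Java. `EpochToPartition` is a hypothetical name. The time zone is an explicit parameter because the same instant falls on different dates in different zones:

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical helper: render a BIGINT epoch-millisecond value in one of the
// partition date formats mentioned in the thread.
final class EpochToPartition {
    static String format(long epochMillis, ZoneId zone, String pattern) {
        LocalDateTime t = LocalDateTime.ofInstant(Instant.ofEpochMilli(epochMillis), zone);
        return t.format(DateTimeFormatter.ofPattern(pattern));
    }

    public static void main(String[] args) {
        long ts = 1501210920742L;
        System.out.println(format(ts, ZoneOffset.UTC, "yyyy-MM-dd"));          // 2017-07-28
        System.out.println(format(ts, ZoneOffset.UTC, "yyyyMMdd"));            // 20170728
        System.out.println(format(ts, ZoneOffset.UTC, "yyyy-MM-dd HH:mm:ss")); // 2017-07-28 03:02:00
        System.out.println(format(ts, ZoneId.of("Asia/Shanghai"), "yyyy-MM-dd HH:mm:ss")); // 2017-07-28 11:02:00
    }
}
```

Note that the millisecond part (.742) is dropped by these patterns. If your raw values are in seconds rather than milliseconds, use `Instant.ofEpochSecond` instead.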

Re: kylin encode

2017-09-30 Thread Alberto Ramón

In summary:

yyyy-MM-dd / yyyyMMdd / yyyy-MM-dd HH:mm:ss

Check these JIRAs:
https://issues.apache.org/jira/browse/KYLIN-1101
https://issues.apache.org/jira/browse/KYLIN-1441
https://issues.apache.org/jira/browse/KYLIN-1427

On 30 September 2017 at 04:45, 崔苗  wrote:

> We have a timestamp column used as the partition column. What should be
> used to encode the column, date or time? What's the difference between the
> two encodings? BTW, can Kylin recognize every timestamp precision, whether
> it's in seconds or milliseconds?
>
>
> Thanks in advance for your reply.
>
>
>
>
>
>

Project Level ACL

2017-08-21 Thread Alberto Ramón

About KYLIN-2760 (https://issues.apache.org/jira/browse/KYLIN-2760)

I propose a small change:

   - Only the system admin can create and see connection strings
   - The project admin can't see the connection string


   - Only the system admin can add tables to a data model
   - The project admin is then responsible for defining joins between existing tables


Future:

   - ACL at column level
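The proposal above boils down to a small role-to-permission table. Below is a hypothetical model of it in Java. The enum names are illustrative and are not Kylin's ACL classes. The sketch also assumes the system admin keeps every permission, which the proposal does not state explicitly:

```java
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

// Hypothetical model of the proposed permission split; not Kylin's ACL API.
final class ProposedAcl {
    enum Role { SYSTEM_ADMIN, PROJECT_ADMIN }

    enum Action { CREATE_CONNECTION, VIEW_CONNECTION_STRING, ADD_TABLE_TO_MODEL, DEFINE_JOINS }

    // Assumption: the system admin can do everything; the project admin only defines joins
    // between tables the system admin has already added.
    private static final Map<Role, Set<Action>> ALLOWED = Map.of(
            Role.SYSTEM_ADMIN, EnumSet.allOf(Action.class),
            Role.PROJECT_ADMIN, EnumSet.of(Action.DEFINE_JOINS));

    static boolean can(Role role, Action action) {
        return ALLOWED.get(role).contains(action);
    }

    public static void main(String[] args) {
        System.out.println(can(Role.PROJECT_ADMIN, Action.VIEW_CONNECTION_STRING)); // false
        System.out.println(can(Role.PROJECT_ADMIN, Action.DEFINE_JOINS));           // true
    }
}
```

Column-level ACL, listed as future work, would need a finer-grained key (for example role plus column) rather than this flat per-role set.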

Re: Leap Month calculate error

2017-08-15 Thread Alberto Ramón

Try using a DATE literal instead:

http://apache-kylin.74782.x6.nabble.com/about-kylin-sql-key-words-IN-td5908.html

On 15 August 2017 at 08:59, apache_...@163.com  wrote:

> Hi,
>
> When I run this SQL in the Kylin GUI, I get 2011-03-01, but the right result
> is 2011-02-28. Is this a bug?
>
> select cast('2011-03-31' as date)  - INTERVAL '1' month from KYLIN_CAL_DT
>
>
>
>
>
> apache_...@163.com
>
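For reference, the result the reporter expects comes from month arithmetic that clamps the day of month to the end of the shorter month. Java's `java.time.LocalDate.minusMonths` follows that rule, so it is a handy way to check the expected value. This shows the expected semantics only, not how Kylin or Calcite evaluate the query. `MonthArithmetic` is a hypothetical name:

```java
import java.time.LocalDate;

// Hypothetical check of the expected value for 2011-03-31 minus one month.
final class MonthArithmetic {
    // minusMonths keeps the day of month when possible and otherwise clamps it
    // to the last valid day of the target month.
    static String oneMonthBefore(String isoDate) {
        return LocalDate.parse(isoDate).minusMonths(1).toString();
    }

    public static void main(String[] args) {
        System.out.println(oneMonthBefore("2011-03-31")); // 2011-02-28 (2011 is not a leap year)
        System.out.println(oneMonthBefore("2012-03-31")); // 2012-02-29 (2012 is a leap year)
    }
}
```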

Re: The cube is READY, but queries cannot be executed

2017-07-30 Thread Alberto Ramón

"Timeout visiting cube" means HBase takes too long to respond to
Apache Kylin.

Check the status of your HBase, and whether your cube design suits your
query "select part_dt, sum(price)
as total_selled, count(distinct seller_id) as sellers from kylin_sales group
by part_dt order by part_dt" (for example, the order of the dimensions in the row key).
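The error text quoted below says the coprocessor checks for self-termination every 100 scanned rows against a configured timeout. A slow HBase scan therefore surfaces as "Timeout visiting cube!". Below is a minimal sketch of that pattern, written from the error message alone. It is not Kylin's coprocessor code. `SelfTerminatingScan` and the injectable clock are hypothetical:

```java
import java.util.Iterator;
import java.util.function.LongSupplier;
import java.util.stream.IntStream;

// Hypothetical illustration of a scan that checks its time budget every N rows.
final class SelfTerminatingScan {
    static final int CHECK_INTERVAL = 100; // the error message mentions a check every 100 rows

    // Returns the number of rows scanned, or throws once the deadline has passed.
    static long scan(Iterator<?> rows, long timeoutMillis, LongSupplier clockMillis) {
        long deadline = clockMillis.getAsLong() + timeoutMillis;
        long scanned = 0;
        while (rows.hasNext()) {
            rows.next(); // a real scan would filter and aggregate the row here
            scanned++;
            // Checking the clock only every CHECK_INTERVAL rows keeps the overhead low.
            if (scanned % CHECK_INTERVAL == 0 && clockMillis.getAsLong() > deadline) {
                throw new IllegalStateException("timeout after scanning " + scanned + " rows");
            }
        }
        return scanned;
    }

    public static void main(String[] args) {
        Iterator<Integer> rows = IntStream.range(0, 250).iterator();
        System.out.println(scan(rows, 60_000, System::currentTimeMillis)); // 250
    }
}
```

The practical consequence matches Alberto's advice. If each batch of rows is slow because HBase is unhealthy, or because the row key order forces a wide scan, the deadline is hit before the scan completes.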

On 25 July 2017 at 06:43, zephyrli  wrote:

> All versions are here:
> hadoop: hadoop-2.6.0-cdh5.7.1
> hbase: hbase-1.2.0-cdh5.7.1
> hive: hive-1.1.0-cdh5.7.1
> kylin: 2.0.0
>
> And I just ran the SQL on the sample project and hit the same problem.
> Could you please help me check it?
>
> There are no error logs on the HBase region server (not even before or
> after); the logs look like:
>
> 2017-07-25 10:38:04,042 INFO  [Query
> 58afb8da-5466-45ee-b30a-fa4451b54562-74] common.KylinConfig: Resetting
> SYS_ENV_INSTANCE by a input stream: 320199326
> 2017-07-25 10:38:04,065 INFO  [Query
> 58afb8da-5466-45ee-b30a-fa4451b54562-74] measure.MeasureTypeFactory:
> Checking custom measure types from kylin config
> 2017-07-25 10:38:04,065 INFO  [Query
> 58afb8da-5466-45ee-b30a-fa4451b54562-74] measure.MeasureTypeFactory:
> registering COUNT_DISTINCT(hllc), class
> org.apache.kylin.measure.hllc.HLLCMeasureType$Factory
> 2017-07-25 10:38:04,068 INFO  [Query
> 58afb8da-5466-45ee-b30a-fa4451b54562-74] measure.MeasureTypeFactory:
> registering COUNT_DISTINCT(bitmap), class
> org.apache.kylin.measure.bitmap.BitmapMeasureType$Factory
> 2017-07-25 10:38:04,073 INFO  [Query
> 58afb8da-5466-45ee-b30a-fa4451b54562-74] measure.MeasureTypeFactory:
> registering TOP_N(topn), class
> org.apache.kylin.measure.topn.TopNMeasureType$Factory
> 2017-07-25 10:38:04,074 INFO  [Query
> 58afb8da-5466-45ee-b30a-fa4451b54562-74] measure.MeasureTypeFactory:
> registering RAW(raw), class
> org.apache.kylin.measure.raw.RawMeasureType$Factory
> 2017-07-25 10:38:04,075 INFO  [Query
> 58afb8da-5466-45ee-b30a-fa4451b54562-74] measure.MeasureTypeFactory:
> registering EXTENDED_COLUMN(extendedcolumn), class
> org.apache.kylin.measure.extendedcolumn.ExtendedColumnMeasureType$Factory
> 2017-07-25 10:38:04,076 INFO  [Query
> 58afb8da-5466-45ee-b30a-fa4451b54562-74] measure.MeasureTypeFactory:
> registering PERCENTILE(percentile), class
> org.apache.kylin.measure.percentile.PercentileMeasureType$Factory
> 2017-07-25 10:38:04,100 INFO  [Query
> 58afb8da-5466-45ee-b30a-fa4451b54562-74] gridtable.GTScanRequest: pre
> aggregation is not beneficial, skip it
> 2017-07-25 10:38:04,109 INFO  [Query
> 58afb8da-5466-45ee-b30a-fa4451b54562-74] endpoint.CubeVisitService: Total
> scanned 1 rows and 200 bytes
> 2017-07-25 10:38:04,110 INFO  [Query
> 58afb8da-5466-45ee-b30a-fa4451b54562-74] endpoint.CubeVisitService: Size
> of
> final result = 55 (46 before compressing)
>
>
>
>
> and kylin.log looks like this:
>
>
> 2017-07-25 10:38:02,636 DEBUG [Query
> 58afb8da-5466-45ee-b30a-fa4451b54562-101] enumerator.OLAPEnumerator:122 :
> return TupleIterator...
> 2017-07-25 10:38:02,636 ERROR [Query
> 58afb8da-5466-45ee-b30a-fa4451b54562-101] service.QueryService:382 :
> Exception when execute sql
> java.sql.SQLException: Error while executing SQL "select part_dt,
> sum(price)
> as total_selled, count(distinct seller_id) as sellers from kylin_sales
> group
> by part_dt order by part_dt
> LIMIT 5": Timeout visiting cube! Check why coprocessor exception is not
> sent back? In coprocessor Self-termination is checked every 100 scanned
> rows, the configured timeout(32400) cannot support this many scans?
> at org.apache.calcite.avatica.Helper.createException(Helper.
> java:56)
> at org.apache.calcite.avatica.Helper.createException(Helper.
> java:41)
> at
> org.apache.calcite.avatica.AvaticaStatement.executeInternal(
> AvaticaStatement.java:156)
> at
> org.apache.calcite.avatica.AvaticaStatement.executeQuery(
> AvaticaStatement.java:218)
> at
> org.apache.kylin.rest.service.QueryService.execute(QueryService.java:562)
> at
> org.apache.kylin.rest.service.QueryService.queryWithSqlMassage(
> QueryService.java:466)
> at
> org.apache.kylin.rest.service.QueryService.query(QueryService.java:153)
> at
> org.apache.kylin.rest.service.QueryService.doQueryWithCache(
> QueryService.java:357)
> at
> org.apache.kylin.rest.controller.QueryController.
> query(QueryController.java:69)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:
> 62)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(
> DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at
> org.springframework.web.method.support.InvocableHandlerMethod.doInvoke(
> InvocableHandlerMethod.java:221)
> at
> org.springframework.web.method.support.InvocableHandlerMethod.
> invokeForRequest(InvocableHandlerMethod.java:136)
> at
> org.springframework.web

Re: Cube calculation result is empty

2017-07-30 Thread Alberto Ramón

Check the changes in KYLIN-2513, KYLIN-2601 and KYLIN-2651.


2017-07-27 3:08 GMT+01:00 253719...@qq.com <253719...@qq.com>:

>
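> Kylin v2.0: after the cube computes SUM over a column of type double, the
> query result is empty, while querying that field in the database works fine.
> Does anyone know what is going on?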
> --
>
>

Re: kylin service has collapsed frequently

2017-07-30 Thread Alberto Ramón

Queries like

Error while executing SQL "select count(*) from view_appdata_forkylin where
program_id in catc1484739667844645 and flag=1
"

aren't good for Kylin. Check KYLIN-1792 (v1.5.3) and this mailing list thread.

On 30 July 2017 at 20:40, Li Yang  wrote:

> No obvious reason in the log. Check out "kylin.out"; there could be some
> clues.
>
>
> On Tue, Jul 25, 2017 at 11:18 AM, wangke  wrote:
>
> > hi all,
> >
> > Recently my Kylin service has been crashing frequently. Can you help me
> > analyze the reasons?
> > Please see the attachment for more details.
> >
> > --
> > Best regards,
> > Wang Ke.
> >
> >
>

Re: Kylin2.0 can support create models or cube by restful ?

2017-07-16 Thread Alberto Ramón

Check this mailing list thread:
http://apache-kylin.74782.x6.nabble.com/Re-REST-APIs-for-Create-Update-models-and-cubes-and-selecting-datasources-tt7947.html

On 17 July 2017 at 04:44, 1820983...@qq.com <1820983...@qq.com> wrote:

> Hi,
>
>   pls help:
>
>Can Kylin 2.0 create models or cubes via RESTful APIs? Please share some
> demo code or related info.
>
>
>
> 1820983...@qq.com
>

Re: Query Metadata

2017-07-08 Thread Alberto Ramón

HUE 1.12 and Kylin 1.6.0 work together (no path is needed).

But there is an exception:
An error occurred while calling o.execute. : java.sql.SQLException:
Error while executing SQL "SHOW DATABASES": java.sql.SQLException:
java.io.IOException: POST failed, error code 500 and response: {"url":"http://172.17.0.2:7070/kylin/api/query","exception":"Not Supported SQL."}
at org.apache.kylin.jdbc.shaded.org.apache.calcite.avatica.Helper.createException(Helper.java:56)
at org.apache.kylin.jdbc.shaded.org.apache.calcite.avatica.Helper.createException(Helper.java:41)
at org.apache.kylin.jdbc.shaded.org.apache.calcite.avatica.AvaticaStatement.executeInternal(AvaticaStatement.java:147)
at org.apache.kylin.jdbc.shaded.org.apache.calcite.avatica.AvaticaStatement.execute(AvaticaStatement.java:199)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:381)
at py4j.Gateway.invoke(Gateway.java:259)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:209)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.RuntimeException: java.sql.SQLException: java.io.IOException:
POST failed, error code 500 and response: {"url":"http://172.17.0.2:7070/kylin/api/query","exception":"Not Supported SQL."}

Does it work? Yes, but you must put the cube name in the HUE config.
The connection is fixed to one cube.

On 29 August 2016 at 18:10, Alberto Ramón  wrote:

> Hi (this is ONLY an idea/suggestion)
>
> I have seen some integration problems between Kylin and HUE or Tableau when
> they try to discover metadata, such as the list of databases, the list of
> tables and the list of columns.
>
> The clearest example: HUE-4011
> <https://issues.cloudera.org/browse/HUE-4011>, which only works with
> * Show Databases;
> * Show Tables;
> * Show Columns from X.Y
>
> The result is the same.
> Other programs try "select * from tb" to get the list of columns; you
> know the result of this ;)
>
> Other programs:
> * start every query with "use dbName; select ..."
> * or try "from dbName.tbName" (currently there is a bug)
>
>
> Really, these are small things, but they make integration with other apps
> much harder.
>
>
>
> A lot of thanks for all, Alb
>
>

Re: [jira] [Created] (KYLIN-2679) Report error when a dimension using "dict" encoding and also configured Global dictionary for "distinct_count" measure

2017-06-23 Thread Alberto Ramón

Can the global dictionary be used for dimensions now, or only for measures?

On 22 June 2017 at 07:06, Shaofeng SHI (JIRA)  wrote:

> Shaofeng SHI created KYLIN-2679:
> ---
>
>  Summary: Report error when a dimension using "dict" encoding
> and also configured Global dictionary for "distinct_count" measure
>  Key: KYLIN-2679
>  URL: https://issues.apache.org/jira/browse/KYLIN-2679
>  Project: Kylin
>   Issue Type: Bug
>   Components: Metadata, Query Engine
> Reporter: Shaofeng SHI
>
>
> This is a problem reported by community user Sonny Heer:
>
> After finally getting the global dictionary to work with building the cube
> there are now exceptions during query.
>
> ERROR in query: "AppendTrieDictionary can't retrive value from id"
>
> The cube has a UHC (ultra-high-cardinality) dimension that also appears in a
> count distinct measure, so a global dictionary is created for it. But the
> global dictionary doesn't support decoding, hence this error at query time.
> This wasn't even checked when the cube was created.
>
>
>
> --
> This message was sent by Atlassian JIRA
> (v6.4.14#64029)
>

JARs in Hadoop 3.0

2017-06-17 Thread Alberto Ramón

Could this be useful to avoid problems with JARs,
or when HBase is in another cluster?

https://issues.apache.org/jira/browse/HADOOP-11804

Re: Some questions about Kylin2.0

2017-06-16 Thread Alberto Ramón

About Q2:
I agree with you; I think it is an *issue*.
To start Kylin, the scripts should only check that one source, one engine and
one storage system exist
(for example, it is not necessary to have both Hive and Kafka).

For example, Spark.


On 16 June 2017 at 13:17, skyyws  wrote:

> For Q3, you can try to make soft links for both hdfs-site.xml and
> mapred-site.xml.
>
> 2017-06-16
>
> skyyws
>
>
>
> From: "lxw"
> Sent: 2017-06-13 11:41
> Subject: Some questions about Kylin2.0
> To: "dev","user"
> CC:
>
> Hi all,
>
> I have some questions about Kylin 2.0. My environment:
> hadoop-2.6.0-cdh5.8.3
> hbase-1.2.0-cdh5.8.3
> apache-kylin-2.0.0-bin-cdh57
> spark-2.1.0-bin-hadoop2.6
>
>
> Q1: Does Kylin 2.0 not support Spark 2.0?
>
>  find-spark-dependency.sh：
>  spark_dependency=`find -L $spark_home -name
> 'spark-assembly-[a-z0-9A-Z\.-]*.jar' 
>
>
> Q2: I want to use Kylin 2.0 without Spark cubing, but it failed.
>
>
>  kylin.sh：
>  function retrieveDependency() {
>  #retrive $hive_dependency and $hbase_dependency
>  source ${dir}/find-hive-dependency.sh
>  source ${dir}/find-hbase-dependency.sh
>  source ${dir}/find-hadoop-conf-dir.sh
>  source ${dir}/find-kafka-dependency.sh
>  source ${dir}/find-spark-dependency.sh
>
>
>  If the Spark dependencies are not found, Kylin cannot start:
>
>  [hadoop@hadoop10 bin]$ ./kylin.sh start
>  Retrieving hadoop conf dir...
>  KYLIN_HOME is set to /home/hadoop/bigdata/kylin/current
>  Retrieving hive dependency...
>  Retrieving hbase dependency...
>  Retrieving hadoop conf dir...
>  Retrieving kafka dependency...
>  Retrieving Spark dependency...
>  spark assembly lib not found.
>
>
>  After modifying the “source ${dir}/find-spark-dependency.sh” line in kylin.sh,
> Kylin starts successfully.
>
>
> Q3: About kylin_hadoop_conf_dir?
>
>  I made some soft links under $KYLIN_HOME/hadoop-conf
> (core-site.xml, yarn-site.xml, hbase-site.xml, hive-site.xml)
>  and set
> "kylin.env.hadoop-conf-dir=/home/bigdata/kylin/current/hadoop-conf".
> When I execute ./check-env.sh:
>
>
>  [hadoop@hadoop10 bin]$ ./check-env.sh
>  Retrieving hadoop conf dir...
> /home/bigdata/kylin/current/hadoop-conf is override as the
> kylin_hadoop_conf_dir
> KYLIN_HOME is set to /home/hadoop/bigdata/kylin/current
> -mkdir: java.net.UnknownHostException: cdh5
> Usage: hadoop fs [generic options] -mkdir [-p]  ...
> Failed to create /kylin20. Please make sure the user has right to
> access /kylin20
>
>
> My HDFS uses HA and fs.defaultFS is "cdh5". When I don't set
> "kylin.env.hadoop-conf-dir" and instead use HADOOP_CONF_DIR, HIVE_CONF and
> HBASE_CONF_DIR from environment variables (/etc/profile), it works correctly.
>
>
> Best Regards!
> lxw
>

Re: Re: kylin connnect power bi by odbc

2017-06-13 Thread Alberto Ramón

Hello, some clarifications:

First, the ODBC 1.6 connection can be configured; check Kylin and PowerBi.odt.


Second, there is a bug in Kylin 1.6: KYLIN-2235.


Third, older versions of Kylin (from a year ago) worked fine, but current ones don't.



On 13 June 2017 at 15:37, Billy Liu  wrote:

> I don't know. It's not Kylin's setting page. Your snapshot could not be
> shown. Based on my limited experience on PowerBI, http is supported.
>
> 2017-06-13 19:15 GMT+08:00 cong.xi...@hand-china.com <
> cong.xi...@hand-china.com>:
>
> > Is this a bug? When I use Tableau to connect to Kylin, I can select HTTP in
> > the dialog, but when I use Power BI there is no HTTP option.
> >
> > --
> > cong.xi...@hand-china.com
> >
> >
> > *From:* Billy Liu 
> > *Date:* 2017-06-13 18:29
> > *To:* cong.xi...@hand-china.com
> > *CC:* dev 
> > *Subject:* Re: Re: kylin connnect power bi by odbc
> > How about the HTTP protocol instead of HTTPS?
> >
> > 2017-06-13 15:54 GMT+08:00 cong.xi...@hand-china.com <
> > cong.xi...@hand-china.com>:
> >
> > > driver={KylinODBCDriver};SERVER=xxx;PORT=7070;
> > > This is my driver string.
> > > Then a dialog appears:
> > > protocol:https
> > > server host:
> > > port:7070
> > > username:ADMIN
> > > password:KYLIN
> > > When I click the "connect" button, I get "username/password not
> > > authorized, or server out of service".
> > > Is this a version error?
> > >
> > > --
> > > cong.xi...@hand-china.com
> > >
> > >
> > > *From:* Billy Liu 
> > > *Date:* 2017-06-13 15:16
> > > *To:* dev 
> > > *Subject:* Re: kylin connnect power bi by odbc
> > > No picture attached. Could you describe your error in text?
> > >
> > > 2017-06-13 0:18 GMT+08:00 cong.xi...@hand-china.com <
> > > cong.xi...@hand-china.com>:
> > >
> > > > kylin version 1.6
> > > > power bi  v2.46
> > > > odbc driver v1.6
> > > > When connecting Kylin to Power BI, the error below occurs, but
> > > > connecting Kylin to Tableau succeeds!
> > > >
> > > >
> > > >
> > > > --
> > > > cong.xi...@hand-china.com
> > > >
> > >
> > >
> >
> >
>

Re: Do kylin cache cube/segment data elsewhere other then hbase

2017-05-26 Thread Alberto Ramón

FYI.

From HBase 1.1 you can use Memcached as the block cache in HBase (HBASE-13170).
Memcached is written in C
and bypasses the Java GC problem of the BlockCache, so it is fast.

Improvement:

HBASE-14984 (v1.2.0):
hbase.cache.memcached.spy.optimze=true

On 26 May 2017 at 10:00, Li Yang  wrote:

> All cube data are in HBase now. HBase mem cache helps to keep recently used
> cube in memory.
>
> On Tue, May 16, 2017 at 5:21 AM, Nirav Patel 
> wrote:
>
> > Does Kylin currently store cube data anywhere apart from HBase? Does it
> > maintain an in-memory copy or cache of, say, the latest data, which could
> > be queried faster using bitmaps rather than querying with HBase filters?
> >
> >
>

Several Things

2017-05-22 Thread Alberto Ramón

FYI:

1 - About the error "snapshot more than 300MB": I think all columns of the
table are loaded into RAM, whether or not they are used. Is this
possible?

2 - Apache Flink 1.2 is "ready" for Docker; they made some changes to be more
compatible: FLINK-6369, FLINK-6572, FLINK-4308, FLINK-4326.

(This could be "an inspiration" for Kylin.)

Re: Kylin multiple cube issue from the same data model

2017-05-19 Thread Alberto Ramón

A query can only be answered by one cube.

On 19 May 2017 at 07:58, elicer  wrote:

> Thanks for your answer
> If I have the same aggregation group in different cubes, will it cause
> duplicate results?
>

Re: Kylin multiple cube issue from the same data model

2017-05-18 Thread Alberto Ramón

If there is more than one cube per project, the Kylin optimizer chooses
the best cube to answer the query.
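One way to picture "the optimizer chooses the best cube": each query is answered by exactly one cube, chosen among the cubes that contain every column the query needs. Below is a simplified sketch that assumes a hypothetical cost rule of "fewest rows wins". Kylin's actual selection logic is more involved, and `CubeChooser` is an illustrative name:

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;
import java.util.Set;

// Hypothetical sketch of routing one query to one cube.
final class CubeChooser {
    static final class Cube {
        final String name;
        final Set<String> columns;
        final long rowCount;

        Cube(String name, Set<String> columns, long rowCount) {
            this.name = name;
            this.columns = columns;
            this.rowCount = rowCount;
        }
    }

    // Keep only cubes that cover every queried column, then take the cheapest one.
    static Optional<Cube> choose(List<Cube> cubes, Set<String> queryColumns) {
        return cubes.stream()
                .filter(c -> c.columns.containsAll(queryColumns))
                .min(Comparator.comparingLong(c -> c.rowCount));
    }

    public static void main(String[] args) {
        List<Cube> cubes = List.of(
                new Cube("old_cube", Set.of("part_dt", "seller_id"), 1_000_000),
                new Cube("new_cube", Set.of("part_dt", "seller_id", "lstg_site_id"), 3_000_000));
        // Only new_cube has lstg_site_id, so it must answer this query.
        System.out.println(choose(cubes, Set.of("part_dt", "lstg_site_id")).get().name); // new_cube
        // Both cubes cover part_dt; the smaller one wins under this toy cost rule.
        System.out.println(choose(cubes, Set.of("part_dt")).get().name); // old_cube
    }
}
```

Because a single cube answers each query, the same aggregation group defined in two cubes does not produce duplicate rows.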

On 19 May 2017 at 06:59, elicer  wrote:

> Hi,
> I have a question about creating multiple cubes from the same data model. I
> have a cube built on an existing model (containing multiple aggregation
> groups).
> Now I want to modify one of the aggregation groups to add one more field. I
> don't want to impact the existing cube, so I created a new cube, modified
> only the impacted aggregation group, and rebuilt all the data with the
> new cube. It looks like only one table is available per model, no matter
> how many cubes the data model has.
>
> Now the issue is: I am using JDBC to query the Kylin data, and when I
> initialize the JDBC connection I only need to specify the project name.
> Then how can I tell Kylin that I need to fetch the data from the new cube?
> It looks like both the old and new cube data are in the same Kylin table.
>
>

Re: Review patch

2017-05-14 Thread Alberto Ramón

I made a Google Doc version (there will be some visual differences from the
Jekyll version).

With this link, you can edit the files.

On 14 May 2017 at 10:01, Billy Liu  wrote:

> Hi Alberto,
>
> Could you send out the document on Google Drive for review? The patch is
> not easy to read for long article.
>
> 2017-05-14 16:45 GMT+08:00 Li Yang :
>
>> Forward to dev for wider audience.
>>
>>
>> -- Forwarded message --
>> From: Alberto Ramon 
>> Date: Sat, May 13, 2017 at 5:32 AM
>> Subject: Review patch
>> To: Li Yang 
>>
>>
>> Hi,
>>
>>
>> I made 3 new manuals about Apache Kylin.
>> Can somebody review this patch?
>>
>>
>> Best Regards,
>> Alberto
>>
>
>

Re: REST APIs for Create/Update models and cubes and selecting datasources

2017-05-12 Thread Alberto Ramón

This is another possible solution (on the roadmap).

On 11 May 2017 at 23:40, Nirav Patel  wrote:

> Sure. Does export/import in that JIRA mean GET/CREATE?
>
> On Thu, May 11, 2017 at 3:05 PM, Alberto Ramón 
> wrote:
>
> > (It's not necessary to duplicate every question on both the dev and user mailing lists, thanks.)
> >
> > http://mail-archives.apache.org/mod_mbox/kylin-user/201609.mbox/%3C9EEA4677F012024598DC5D7E4D11FD53886EEAAE%40QTAUSC-VPEXC001.quantium.com.au.local%3E
> >
> > https://issues.apache.org/jira/browse/KYLIN-1605
> >
> >
> >
> > On 11 May 2017 at 22:57, Nirav Patel  wrote:
> >
> > > On Thu, May 11, 2017 at 2:55 PM, Nirav Patel 
> > > wrote:
> > >
> > > > Hi,
> > > >
> > > > I see APIs to build an existing cube, but I don't see anything for
> > > > creating/updating models and cubes themselves. Is there a way to modify
> > > > cube properties (dimensions, measures, refresh settings) programmatically?
> > > >
> > > > Thanks,
> > > > Nirav
> > > >
> > >
> > >
> >
>
>

Re: REST APIs for Create/Update models and cubes and selecting datasources

2017-05-11 Thread Alberto Ramón

(It's not necessary to duplicate every question on both the dev and user mailing lists, thanks.)

http://mail-archives.apache.org/mod_mbox/kylin-user/201609.mbox/%3C9EEA4677F012024598DC5D7E4D11FD53886EEAAE%40QTAUSC-VPEXC001.quantium.com.au.local%3E

https://issues.apache.org/jira/browse/KYLIN-1605



On 11 May 2017 at 22:57, Nirav Patel  wrote:

> On Thu, May 11, 2017 at 2:55 PM, Nirav Patel 
> wrote:
>
> > Hi,
> >
> > I see APIs to build an existing cube, but I don't see anything for
> > creating/updating the model and cubes themselves. Is there a way to modify cube
> > properties (dimensions, measures, refresh settings) programmatically?
> >
> > Thanks,
> > Nirav
> >
>
>

Re: where is the source code kylin-2.0.0-hbase1x

2017-05-04 Thread Alberto Ramón

This may help you (in 2.0 the branching scheme has been changed):

https://issues.apache.org/jira/browse/KYLIN-2413

On 4 May 2017 at 05:36, xl l  wrote:

> Hi all,
> In http://kylin.apache.org/download/, the source code of
> apache-kylin-2.0.0-bin-hbase098.tar.gz is in
> https://github.com/apache/kylin/ under the tag *kylin-2.0.0-hbase0.98*.
> But I can't find the tag *apache-kylin-2.0.0-bin-hbase1x* on GitHub.
>
> So where is the source code of *apache-kylin-2.0.0-hbase1x*?
>
> Which HBase version does the tag *kylin-2.0.0* correspond to?
>
>
>
> --
> * Best Wishes*
>

Re: [Announce] New Apache Kylin committer Zhixiong Chen

2017-04-29 Thread Alberto Ramón

Congratulations to Roger Shi and Zhixiong!! (And to the dev team for the upcoming 2.0
version.)

If you are ever near London or Spain, let me know; a beer will be
necessary :)

2017-04-29 12:47 GMT+01:00 Dong Li :

> Welcome!
>
> Thanks,
> Dong Li
>
>  Original Message
> *Sender:* Li Yang
> *Recipient:* user
> *Cc:* dev; Apache Kylin PMC;
> chen
> *Date:* Saturday, Apr 29, 2017 19:13
> *Subject:* Re: [Announce] New Apache Kylin committer Zhixiong Chen
>
> Welcome Zhixiong!
>
> Yang
>
> On Sat, Apr 29, 2017 at 6:07 PM, Luke Han  wrote:
>
>> On behalf of the Apache Kylin PMC, I am very pleased to announce
>> that Zhixiong Chen has accepted the PMC's invitation to become a
>> committer on the project.
>>
>> We appreciate all of Zhixiong's generous contributions about many bug
>> fixes, patches, helped many users. We are so glad to have him to be
>> our new committer and looking forward to his continued involvement.
>>
>> Congratulations and Welcome, Zhixiong!
>>
>
>

Re: BadQueryDetector (Please ignore the last post)

2017-04-25 Thread Alberto Ramón

Hi,

Check this: https://issues.apache.org/jira/browse/KYLIN-1792
Yes, as you said: a cube is meant to be used with aggregates (i.e. GROUP BY).
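To make the point concrete, here is a minimal sketch of a client-side guard that flags queries unlikely to be answered by a cube: a bare `SELECT *`, or a query with neither `GROUP BY` nor an aggregate function. The function `is_cube_friendly` and its regex rules are hypothetical illustrations, not a Kylin or Tableau feature, and a regex is only a rough heuristic compared with real SQL parsing.

```python
import re

# Hypothetical heuristic (not a Kylin feature): a cube answers aggregate
# queries, so a bare "SELECT *" or a query with neither GROUP BY nor an
# aggregate function is likely to turn into a slow raw-record scan.
_SELECT_STAR = re.compile(r"^\s*select\s+\*", re.IGNORECASE)
_GROUP_BY = re.compile(r"\bgroup\s+by\b", re.IGNORECASE)
_AGGREGATE = re.compile(r"\b(sum|count|min|max|avg)\s*\(", re.IGNORECASE)


def is_cube_friendly(sql: str) -> bool:
    """Return True if the query looks like an aggregate (cube-style) query."""
    if _SELECT_STAR.search(sql):
        return False
    return bool(_GROUP_BY.search(sql) or _AGGREGATE.search(sql))


if __name__ == "__main__":
    for q in [
        "select * from fact_table",
        "SELECT s_store_id, SUM(ss_ext_sales_price) FROM store_sales GROUP BY s_store_id",
    ]:
        print(is_cube_friendly(q), "|", q)
```

A proxy or reporting layer could run such a check before forwarding SQL to Kylin and reject (or warn about) raw scans.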

2017-04-25 16:12 GMT+01:00 rahulsingh :

> Hi all,
>
> We are using Tableau Desktop 10.2.1 with Kylin 1.6. We succeeded in
> connecting Kylin with Tableau via the ODBC driver.
> We have one table in Kylin with 41 columns and around 5 million records,
> and we use this table as a data source in Tableau.
> When we prepare a report in Tableau with some rows and columns, it fires a
> “select * from
> ” query on Kylin. Ideally Tableau should only fire specific GROUP BY queries
> based on the reports.
> Because we have so much data in the cube, the “select * from
> ” query makes Kylin hang, and then it stops responding.
>
> How should we handle this “select * from
> ” query from Tableau? Is there a setting that blocks this type of
> "select *" query?
>
> Thank You,
> Rahul Singh
>
> --
> View this message in context: http://apache-kylin.74782.x6.
> nabble.com/BadQueryDetector-Please-ignore-the-last-post-tp7770.html
> Sent from the Apache Kylin mailing list archive at Nabble.com.
>

Re: cube building taking 8 hours just for 6 thousands records

2017-04-18 Thread Alberto Ramón

Is this your configuration? (Check these properties.)

   - yarn.nodemanager.resource.memory-mb = 150/3
   - yarn.nodemanager.resource.cpu-vcores = 36/3

And check the YARN UI during the build phase (you can see how many mappers and
reducers, and how much RAM and CPU, you are using).
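The arithmetic behind `150/3` and `36/3` can be sketched as follows, assuming the cluster totals are split evenly across 3 NodeManagers. Note that `yarn.nodemanager.resource.memory-mb` is expressed in MB, so 50 GB per node becomes 51200. The helper names are hypothetical, and in practice you would likely leave some headroom on each node for the OS and other daemons rather than give all of it to YARN.

```python
def per_node_yarn_settings(total_mem_gb: int, total_vcores: int, nodes: int) -> dict:
    """Split cluster-wide RAM (GB) and vcores evenly across NodeManagers."""
    return {
        # memory-mb is in megabytes, so convert GB -> MB before dividing
        "yarn.nodemanager.resource.memory-mb": total_mem_gb * 1024 // nodes,
        "yarn.nodemanager.resource.cpu-vcores": total_vcores // nodes,
    }


def to_yarn_site_properties(settings: dict) -> str:
    """Render the settings as <property> entries for yarn-site.xml."""
    return "\n".join(
        f"<property><name>{name}</name><value>{value}</value></property>"
        for name, value in settings.items()
    )


if __name__ == "__main__":
    # Figures from this thread: 150 GB and 36 vcores across 3 nodes
    print(to_yarn_site_properties(per_node_yarn_settings(150, 36, 3)))
```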

2017-04-18 13:45 GMT+01:00 suresh m :

> Sorry, that is for all 3 nodes.
>
> Each node has 55 GB RAM and 12 CPUs.
>
> On Tue, Apr 18, 2017 at 6:09 PM, Alberto Ramón 
> wrote:
>
> > Are these 150 GB and 36 CPUs configured in the following properties?
> >
> >    - yarn.nodemanager.resource.memory-mb
> >    - yarn.nodemanager.resource.cpu-vcores
> >
> > During the build process, you can open the YARN UI to see how many
> > map-reduce tasks you are using.
> >
> > 2017-04-18 13:21 GMT+01:00 suresh m :
> >
> > > Hi,
> > >
> > > I can see 150gb ram and 36 cpu in our cluster.
> > >
> > > On Tue, Apr 18, 2017 at 12:48 PM, Alberto Ramón <
> > a.ramonporto...@gmail.com
> > > >
> > > wrote:
> > >
> > > > Can you check how much RAM and CPU is assigned to YARN?
> > > >
> > > > On 18 Apr 2017 7:40 AM, "suresh m"
> > > wrote:
> > > >
> > > > > Hi, this is the time-consuming step in the entire build process:
> > > > >
> > > > > #24 Step Name: Build Cube
> > > > >
> > > > > Please help me to get it resolved.
> > > > >
> > > > > Regards,
> > > > > Suresh
> > > > >
> > > > > On Mon, Apr 17, 2017 at 5:09 PM, 康凯森  wrote:
> > > > >
> > > > > > what's the slowest step in your cube job?
> > > > > > please refer to http://kylin.apache.org/docs20/howto/howto_optimize_build.html
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > -- Original Message --
> > > > > > From: "suresh m";
> > > > > > Sent: Monday, 17 April 2017, 7:05 PM
> > > > > > To: "dev";
> > > > > >
> > > > > > Subject: cube building taking 8 hours just for 6 thousands records
> > > > > >
> > > > > >
> > > > > >
> > > > > > Hi All,
> > > > > >
> > > > > > One of my cubes takes hours to build even though it has only a few
> > > > > > records (6000). Please suggest some tuning techniques to
> > > > > > improve performance in Kylin.
> > > > > >
> > > > > > Regards.
> > > > > > Suresh
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: cube building taking 8 hours just for 6 thousands records

2017-04-18 Thread Alberto Ramón

Are these 150 GB and 36 CPUs configured in the following properties?

   - yarn.nodemanager.resource.memory-mb
   - yarn.nodemanager.resource.cpu-vcores

During the build process, you can open the YARN UI to see how many
map-reduce tasks you are using.

2017-04-18 13:21 GMT+01:00 suresh m :

> Hi,
>
> I can see 150gb ram and 36 cpu in our cluster.
>
> On Tue, Apr 18, 2017 at 12:48 PM, Alberto Ramón  >
> wrote:
>
> > Can you check how much RAM and CPU is assigned to YARN?
> >
> > On 18 Apr 2017 7:40 AM, "suresh m"
> wrote:
> >
> > > Hi, this is the time-consuming step in the entire build process:
> > >
> > > #24 Step Name: Build Cube
> > >
> > > Please help me to get it resolved.
> > >
> > > Regards,
> > > Suresh
> > >
> > > On Mon, Apr 17, 2017 at 5:09 PM, 康凯森  wrote:
> > >
> > > > what's the slowest step in your cube job?
> > > > please refer to http://kylin.apache.org/docs20/howto/howto_optimize_build.html
> > > >
> > > >
> > > >
> > > >
> > > > -- Original Message --
> > > > From: "suresh m";
> > > > Sent: Monday, 17 April 2017, 7:05 PM
> > > > To: "dev";
> > > >
> > > > Subject: cube building taking 8 hours just for 6 thousands records
> > > >
> > > >
> > > >
> > > > Hi All,
> > > >
> > > > One of my cubes takes hours to build even though it has only a few
> > > > records (6000). Please suggest some tuning techniques to improve
> > > > performance in Kylin.
> > > >
> > > > Regards.
> > > > Suresh
> > > >
> > >
> >
>

Re: cube building taking 8 hours just for 6 thousands records

2017-04-18 Thread Alberto Ramón

Can you check how much RAM and CPU is assigned to YARN?

On 18 Apr 2017 7:40 AM, "suresh m" wrote:

> Hi, this is the time-consuming step in the entire build process:
>
> #24 Step Name: Build Cube
>
> Please help me to get it resolved.
>
> Regards,
> Suresh
>
> On Mon, Apr 17, 2017 at 5:09 PM, 康凯森  wrote:
>
> > what's the slowest step in your cube job?
> > please refer to http://kylin.apache.org/docs20/howto/howto_optimize_build.html
> >
> >
> >
> >
> > -- Original Message --
> > From: "suresh m";
> > Sent: Monday, 17 April 2017, 7:05 PM
> > To: "dev";
> >
> > Subject: cube building taking 8 hours just for 6 thousands records
> >
> >
> >
> > Hi All,
> >
> > One of my cubes takes hours to build even though it has only a few
> > records (6000). Please suggest some tuning techniques to improve
> > performance in Kylin.
> >
> > Regards.
> > Suresh
> >
>

Re: kylin-yarn-ClassNotFoundException

2017-04-09 Thread Alberto Ramón

Hi

I think CDH 5.8 isn't supported yet.
Note that there is a binary for 5.7 but not for 5.8.


On 9 Apr 2017 2:10 AM, "chenping...@keruyun.com" <
chenping...@keruyun.com> wrote:

My environment is CDH 5.8.4 and Kylin is apache-kylin-1.6.0-cdh5.7-bin.tar.gz.
I followed the "Quick Start with Sample Cube".
YARN MR2 throws the following exception:

2017-04-07 15:06:34,630 ERROR [IPC Server handler 4 on 24149] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Task: attempt_1490945407994_0217_m_00_1 - exited : java.lang.ClassNotFoundException: org.apache.hadoop.hive.serde2.typeinfo.TypeInfo
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:274)
    at org.apache.hadoop.conf.Configuration.getClassByNameOrNull(Configuration.java:2138)
    at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2103)
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2197)
    at org.apache.hadoop.mapreduce.task.JobContextImpl.getInputFormatClass(JobContextImpl.java:184)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:749)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1783)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)

My environment has hive-serde.jar, so why the ClassNotFoundException? Is
this a Kylin problem? Thanks.



Re: Cube query failing on changing rowkeys column order

2017-04-04 Thread Alberto Ramón

The recommended number of dimensions is 12 or fewer (it depends on a lot of things).
Keep in mind that derived dimensions are not real dimensions, because they are not
included in the cube, so queries on them will be slower than on normal dimensions.

http://kylin.apache.org/docs/howto/howto_optimize_cubes.html
http://kylin.apache.org/docs16/gettingstarted/concepts.html
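A rough sketch of why the dimension count matters: without aggregation-group pruning, every combination of the cube's dimensions is a potential cuboid, so the count grows as 2^N. Following the point above, derived dimensions are left out of the exponent because they are not materialized in the cube. The helper below is illustrative, not Kylin code; whether the empty (apex) combination is actually built is an implementation detail, and real cubes usually prune well below this upper bound.

```python
def max_cuboids(normal_dims: int, derived_dims: int = 0) -> int:
    """Upper bound on cuboids for a full cube with no aggregation-group pruning.

    Each normal dimension is either in or out of a cuboid, giving 2**N
    combinations. derived_dims is accepted but deliberately ignored,
    because derived dimensions are not materialized in the cube.
    """
    _ = derived_dims
    return 2 ** normal_dims


if __name__ == "__main__":
    for n in (10, 12, 15, 20):
        print(n, "dimensions ->", max_cuboids(n), "cuboids")
```

Going from 12 to 20 normal dimensions multiplies the upper bound by 256, which is one reason the recommendation above stays around 12.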



2017-04-04 10:32 GMT+01:00 Shailesh Prajapati :

> Thanks for the reply. It's not just about query performance; my queries are
> not working. I am actually trying with only 50 fact rows. As per my
> understanding, a query should not fail with any order of rowkeys.
>
> On Tue, Apr 4, 2017 at 2:46 PM, Alberto Ramón 
> wrote:
>
> > Hello, from http://kylin.apache.org/docs16/tutorial/create_cube.html
> >
> > "You can drag & drop a dimension column to adjust its position in rowkey;
> > Put the mandatory dimension at the beginning, then followed the dimensions
> > that heavily involved in filters (where condition). Put high cardinality
> > dimensions ahead of low cardinality dimensions."
> >
> > Another way to improve query performance is to use aggregation groups (AGG).
> >
> > Good luck!!
> >
> >
> >
> > 2017-04-04 10:01 GMT+01:00 Shailesh Prajapati :
> >
> > > Hi,
> > >
> > > I am using Kylin 1.6 and facing a weird issue with the cube description.
> > > Basically, I have two cube descs with the same dimensions, measures, rowkeys,
> > > and aggregation groups. The only difference is the ordering of keys in them.
> > > With the first cube description my queries work, and with the second I am
> > > getting the following exception:
> > >
> > >
> > > java.sql.SQLException: Error while executing SQL "SELECT
> > > sum(ss_ext_sales_price) total_sales,
> > > sum(ss_ext_discount_amt) total_discount,
> > > s_store_id,
> > > s_store_name
> > > FROM
> > > store_sales
> > > LEFT JOIN store
> > > ON (store_sales.ss_store_sk = store.s_store_sk)
> > > GROUP BY
> > > s_store_id,
> > > s_store_name
> > > ORDER BY
> > > total_sales,
> > > total_discount,
> > > s_store_id,
> > > s_store_name
> > > LIMIT 5": null
> > >     at org.apache.calcite.avatica.Helper.createException(Helper.java:56)
> > >     at org.apache.calcite.avatica.Helper.createException(Helper.java:41)
> > >     at org.apache.calcite.avatica.AvaticaStatement.executeInternal(AvaticaStatement.java:147)
> > >     at org.apache.calcite.avatica.AvaticaStatement.executeQuery(AvaticaStatement.java:208)
> > >     at org.apache.kylin.rest.service.QueryService.execute(QueryService.java:538)
> > >     at org.apache.kylin.rest.service.QueryService.queryWithSqlMassage(QueryService.java:452)
> > >     at org.apache.kylin.rest.service.QueryService.query(QueryService.java:151)
> > >     at org.apache.kylin.rest.service.QueryService.doQueryWithCache(QueryService.java:354)
> > >     at org.apache.kylin.rest.controller.QueryController.query(QueryController.java:69)
> > >     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> > >     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> > >     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> > >     at java.lang.reflect.Method.invoke(Method.java:498)
> > >     at org.springframework.web.method.support.InvocableHandlerMethod.doInvoke(InvocableHandlerMethod.java:221)
> > >     at org.springframework.web.method.support.InvocableHandlerMethod.invokeForRequest(InvocableHandlerMethod.java:136)
> > >     at org.springframework.web.servlet.mvc.method.annotation.ServletInvocableHandlerMethod.invokeAndHandle(ServletInvocableHandlerMethod.java:104)
> > >     at org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.invokeHandleMethod(RequestMappingHandlerAdapter.java:743)
> > >     at org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.handleInternal(RequestMappingHandlerAdapter.java:672)
> > >     at org.springframework.web.servlet.mvc.method.AbstractHandlerMethodAdapter.hand

Re: Cube query failing on changing rowkeys column order

2017-04-04 Thread Alberto Ramón

hello, from http://kylin.apache.org/docs16/tutorial/create_cube.html

"You can drag & drop a dimension column to adjust its position in rowkey;
Put the mandatory dimension at the beginning, then followed the dimensions
that heavily involved in filters (where condition). Put high cardinality
dimensions ahead of low cardinality dimensions."
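The ordering rule quoted above can be sketched as a sort key, assuming each dimension is described by a hypothetical `Dim` record with a mandatory flag, a "used in filters" flag and an estimated cardinality. This only illustrates the documented heuristic: it does not call any Kylin API, the flags and cardinalities in the example are made up, and it is not a fix for the query failure reported in this thread.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Dim:
    name: str
    mandatory: bool
    used_in_filters: bool
    cardinality: int


def order_rowkey(dims: List[Dim]) -> List[str]:
    """Mandatory dims first, then filter dims, then higher cardinality first."""
    ordered = sorted(
        dims,
        # False sorts before True, so negate the flags; negate cardinality
        # so that larger values come first
        key=lambda d: (not d.mandatory, not d.used_in_filters, -d.cardinality),
    )
    return [d.name for d in ordered]


if __name__ == "__main__":
    dims = [
        Dim("s_store_name", False, False, 1_000),
        Dim("ss_store_sk", True, True, 2_000),
        Dim("s_store_id", False, True, 500),
    ]
    print(order_rowkey(dims))
```

Python's `sorted` is stable, so dimensions that tie on all three criteria keep their original relative order.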

Another way to improve query performance is to use aggregation groups (AGG).

Good Luck !!



2017-04-04 10:01 GMT+01:00 Shailesh Prajapati :

> Hi,
>
> I am using Kylin 1.6 and facing a weird issue with the cube description.
> Basically, I have two cube descs with the same dimensions, measures, rowkeys,
> and aggregation groups. The only difference is the ordering of keys in them.
> With the first cube description my queries work, and with the second I am
> getting the following exception:
>
>
> java.sql.SQLException: Error while executing SQL "SELECT
> sum(ss_ext_sales_price) total_sales,
> sum(ss_ext_discount_amt) total_discount,
> s_store_id,
> s_store_name
> FROM
> store_sales
> LEFT JOIN store
> ON (store_sales.ss_store_sk = store.s_store_sk)
> GROUP BY
> s_store_id,
> s_store_name
> ORDER BY
> total_sales,
> total_discount,
> s_store_id,
> s_store_name
> LIMIT 5": null
>     at org.apache.calcite.avatica.Helper.createException(Helper.java:56)
>     at org.apache.calcite.avatica.Helper.createException(Helper.java:41)
>     at org.apache.calcite.avatica.AvaticaStatement.executeInternal(AvaticaStatement.java:147)
>     at org.apache.calcite.avatica.AvaticaStatement.executeQuery(AvaticaStatement.java:208)
>     at org.apache.kylin.rest.service.QueryService.execute(QueryService.java:538)
>     at org.apache.kylin.rest.service.QueryService.queryWithSqlMassage(QueryService.java:452)
>     at org.apache.kylin.rest.service.QueryService.query(QueryService.java:151)
>     at org.apache.kylin.rest.service.QueryService.doQueryWithCache(QueryService.java:354)
>     at org.apache.kylin.rest.controller.QueryController.query(QueryController.java:69)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:498)
>     at org.springframework.web.method.support.InvocableHandlerMethod.doInvoke(InvocableHandlerMethod.java:221)
>     at org.springframework.web.method.support.InvocableHandlerMethod.invokeForRequest(InvocableHandlerMethod.java:136)
>     at org.springframework.web.servlet.mvc.method.annotation.ServletInvocableHandlerMethod.invokeAndHandle(ServletInvocableHandlerMethod.java:104)
>     at org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.invokeHandleMethod(RequestMappingHandlerAdapter.java:743)
>     at org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.handleInternal(RequestMappingHandlerAdapter.java:672)
>     at org.springframework.web.servlet.mvc.method.AbstractHandlerMethodAdapter.handle(AbstractHandlerMethodAdapter.java:82)
>     at org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:933)
>     at org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:867)
>     at org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:951)
>     at org.springframework.web.servlet.FrameworkServlet.doPost(FrameworkServlet.java:853)
>     at javax.servlet.http.HttpServlet.service(HttpServlet.java:650)
>     at org.springframework.web.servlet.FrameworkServlet.service(FrameworkServlet.java:827)
>     at javax.servlet.http.HttpServlet.service(HttpServlet.java:731)
>     at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:303)
>     at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
>     at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:52)
>     at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
>     at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
>     at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:330)
>     at org.springframework.security.web.access.intercept.FilterSecurityInterceptor.invoke(FilterSecurityInterceptor.java:118)
>     at org.springframework.security.web.access.intercept.FilterSecurityInterceptor.doFilter(FilterSecurityInterceptor.java:84)
>     at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:342)
>     at org.springframework.security.web.access.Excepti

Re: Can Kylin support derived measures?

2017-03-31 Thread Alberto Ramón

You can check these links:

http://apache-kylin.74782.x6.nabble.com/Derived-measures-in-Kylin-td5513.html

http://apache-kylin.74782.x6.nabble.com/Does-Kylin-support-percent-function-td7061.html

https://issues.apache.org/jira/browse/KYLIN-976

2017-03-31 10:20 GMT+01:00 赵天烁 :

> Simple addition, subtraction, multiplication and division are supported, aren't they? We are already using them.
>
>
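> From: 老衲爱吃肉
> Sent: 2017-03-31 17:16
> To: dev
> Subject: Can Kylin support derived measures?
>
> Dear ALL,
> Hello. We have run into a problem while using Apache Kylin.
> Most of our company's data analysis scenarios need derived measures, but the current Kylin does not support them. Could we do secondary development on top of Kylin to support measures such as sum(a)/b,
> a+b-c, and a+(sum(b)/c)-d?
>
> We would be very grateful for your reply. Thank you!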
>

Re: wait for stpe1 : Create Intermediate Flat Hive Table

2017-03-29 Thread Alberto Ramón

Thanks ShaoFeng Shi!!

I will review your changes and apply them to:
https://github.com/albertoRamon/Kylin/tree/master/KylinWithSQuirreL
https://github.com/albertoRamon/Kylin/tree/master/KylinWithMain

And I will send you a commit with these new tech notes.



2017-03-29 10:16 GMT+01:00 ShaoFeng Shi :

> Hi Alberto,
>
> Sorry for the late response; I merged the patch today and made a minor
> wording update to it. Now these two documents are on the Kylin
> documentation page. Let me know if you have any questions. Thanks for your
> contribution!
>
>
>
> 2017-03-27 7:25 GMT+08:00 Alberto Ramón :
>
> > Oops: "*ezmlm-reject: fatal: Sorry, I don't accept messages larger than
> > 100 bytes*"
> >
> > I put it in my Google Drive:
> > https://drive.google.com/open?id=0B-6nZ2q-HPTNU3BHU3BkMURXYlE
> >
> > 2017-03-27 0:02 GMT+01:00 Alberto Ramón :
> >
> > > Hello
> > >
> > > I attached the patch with two notes (Kylin performance and Hue
> > > integration):
> > >
> > > https://github.com/albertoRamon/Kylin/tree/master/KylinPerformance
> > > https://github.com/albertoRamon/Kylin/tree/master/KylinWithHue
> > >
> > >
> > >
> > > I have also prepared two more notes (SQuirreL integration and Resume
> > > integration); if somebody has any suggestions, write to me. I will wait
> > > until the two previous ones have been committed.
> > >
> > > https://github.com/albertoRamon/Kylin/tree/master/KylinWithSQuirreL
> > > https://github.com/albertoRamon/Kylin/tree/master/KylinWithMain
> > >
> > >
> > > 2017-03-25 1:34 GMT+00:00 Alberto Ramón :
> > >
> > >> OK. I will prepare it for the 2.0 folder.
> > >>
> > >> On 25 Mar 2017 1:24 AM, "Li Yang" wrote:
> > >>
> > >>> Btw, I just created the "_docs20" folder for the latest 2.0 documents.
> > >>> Please make sure this great new doc goes there.
> > >>>
> > >>> Thanks Alb!
> > >>>
> > >>> On Sat, Mar 25, 2017 at 7:46 AM, Li Yang  wrote:
> > >>>
> > >>> > Great document! I love it!!
> > >>> >
> > >>> > On Tue, Mar 21, 2017 at 9:22 AM, Alberto Ramón <
> > >>> a.ramonporto...@gmail.com>
> > >>> > wrote:
> > >>> >
> > >>> >> This is the final text
> > >>> >>
> > >>> >> https://github.com/albertoRamon/Kylin/tree/master/KylinPerformance
> > >>> >>
> > >>> >> Open to suggestions; meanwhile I will prepare the patch for the Kelly version
> > >>> >>
> > >>> >> Alb
> > >>> >>
> > >>> >> 2017-03-20 18:31 GMT+00:00 Li Yang :
> > >>> >>
> > >>> >> > You could pack the changes in a patch or a Pull Request, announce it in a
> > >>> >> > JIRA, then people will be able to review it.
> > >>> >> >
> > >>> >> > :-)
> > >>> >> >
> > >>> >> > On Sat, Mar 18, 2017 at 8:25 PM, Alberto Ramón <
> > >>> >> a.ramonporto...@gmail.com>
> > >>> >> > wrote:
> > >>> >> >
> > >>> >> > > @Li Yang, yes, I have an improved version on my laptop; I'm adapting it to
> > >>> >> > > Kelly (also the other tech notes).
> > >>> >> > >
> > >>> >> > > If somebody wants to review it, feel free to tell me :)
> > >>> >> > >
> > >>> >> > > 2017-03-17 18:33 GMT+00:00 Li Yang :
> > >>> >> > >
> > >>> >> > > > Thank you Alberto! Looking forward to a better Kylin manual. :-)
> > >>> >> > > >
> > >>> >> > > > Yang
> > >>> >> > > >
> > >>> >> > > > On Sat, Mar 11, 2017 at 5:16 AM, Alberto Ramón <
> > >>> >> > > a.ramonporto...@gmail.com>
> > >>> >> > > > wrote:
> > >>> >> > > >
> > >>> >> > > > > Try using ORC format in Hive with compression; the result:
> > >>> >> > > > > https://github.com/albertoRamon/Kylin/raw/master/KylinPerformance/Images/08.png
> > >>> >> > > > > (My apologies, I'm in the process of improving these notes and putting them
> > >>> >> > > > > in the Kylin manual)
> > >>> >> > > > >
> > >>> >> > > > > Also partition the fact table:
> > >>> >> > > > > http://kylin.apache.org/docs16/howto/howto_optimize_build.html
> > >>> >> > > > >
> > >>> >> > > > > 2017-03-06 8:24 GMT+00:00 h...@soonchina.cn <
> > >>> h...@soonchina.cn>:
> > >>> >> > > > >
> > >>> >> > > > > > Hello,
> > >>> >> > > > > >   I have a problem: my Kylin cube job has been stuck at step 1: Create
> > >>> >> > > > > > Intermediate Flat Hive Table for 72 minutes, and I have not found any
> > >>> >> > > > > > error logs. Is something wrong with my configuration?
> > >>> >> > > > > > Thanks
> > >>> >> > > > > >
> > >>> >> > > > > >
> > >>> >> > > > > > h...@soonchina.cn
> > >>> >> > > > > >
> > >>> >> > > > >
> > >>> >> > > >
> > >>> >> > >
> > >>> >> >
> > >>> >>
> > >>> >
> > >>> >
> > >>>
> > >>
> > >
> >
>
>
>
> --
> Best regards,
>
> Shaofeng Shi 史少锋
>

Re: Problem with building other cube than Sample [Cloudera]

2017-03-29 Thread Alberto Ramón

To see current YARN jobs: localhost:8088
To see historical (finished) jobs: localhost:19888

(Check these ports against your Hadoop distribution and current configuration.)
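The same information is available from the YARN ResourceManager REST API (port 8088) and the MapReduce JobHistory Server REST API (port 19888). A minimal sketch, assuming the default ports above and an unsecured (non-Kerberos) cluster; the helper names are hypothetical, and `localhost` should be replaced with your own hosts.

```python
import json
from urllib.request import urlopen


def running_apps_url(rm_host: str = "localhost", port: int = 8088) -> str:
    # ResourceManager REST API: currently running applications
    return f"http://{rm_host}:{port}/ws/v1/cluster/apps?states=RUNNING"


def finished_jobs_url(history_host: str = "localhost", port: int = 19888) -> str:
    # JobHistory Server REST API: completed MapReduce jobs
    return f"http://{history_host}:{port}/ws/v1/history/mapreduce/jobs"


def fetch_json(url: str, timeout: float = 10.0) -> dict:
    """GET a URL and decode its JSON body."""
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def summarize_apps(payload: dict) -> list:
    """Return (id, name, progress) per app; "apps" is null when none match."""
    apps = (payload.get("apps") or {}).get("app", [])
    return [(a["id"], a["name"], a["progress"]) for a in apps]
```

For example, `summarize_apps(fetch_json(running_apps_url("rm-host")))` would list the running applications (including a Kylin cube build stuck at 5%) without opening the web UI.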

2017-03-29 8:26 GMT+01:00 Bart :

> To see what YARN is saying I need to kill my process. It shows me a problem
> with memory, but I'm not sure whether that is true or just a result of killing
> the process.
>
> I've created a database with one fact table and one dimension table (with date
> obły). I put two "facts" into the fact table and two dates into the
> dimension table, so it's really, really small! I made a model and a cube based on
> that model. I did it as the tutorial says and it doesn't work :(
>
> I'm going to do it once again and post the YARN log here.
>
> --
> View this message in context: http://apache-kylin.74782.x6.
> nabble.com/Problem-with-building-other-cube-than-
> Sample-Cloudera-tp7533p7543.html
> Sent from the Apache Kylin mailing list archive at Nabble.com.
>

Re: Problem with building other cube than Sample [Cloudera]

2017-03-28 Thread Alberto Ramón

Did you check the YARN logs? :)

But this kind of question is not for the dev mailing list ...

2017-03-28 23:03 GMT+01:00 Bart :

> Hi there!
>
> I'm new to this stuff, but I'm really enjoying Kylin.
>
> First of all, I'm using Cloudera, CDH 5.8.0, Hadoop 2.6.0. I have Hive,
> HBase and so on; it's all Cloudera based.
>
> I've installed the version of Kylin that is adapted to my Cloudera. I started
> Kylin and built the Sample Cube; everything works fine. The cube was built.
>
> I couldn't build the tutorial cube with these steps:
> http://kylin.apache.org/docs20/tutorial/create_cube.html It doesn't work!
> Other cubes don't either! It stops at Kylin_Cube_Builer_namenamename_Cube (5%)
> [Hue Job Browser].
>
> When I try to build my own cube, it also stops at the "Build Cube" step;
> I can see it in Hue Job Browser. It stops at 5% every time, whether I
> use my own database (built in Hive), the "Default" database (new
> tables), or the same database and tables (sample above).
>
> Of course I can give you more information if you need it, but please
> try to help me. I've been looking for the answer for 2 weeks and there's no end
> to it :(
>
> --
> View this message in context: http://apache-kylin.74782.x6.
> nabble.com/Problem-with-building-other-cube-than-
> Sample-Cloudera-tp7533.html
> Sent from the Apache Kylin mailing list archive at Nabble.com.
>

Re: Reply: How to Custom Aggregate Functions;

2017-03-28 Thread Alberto Ramón

I have never tried to use it (sorry).

The last comment on KYLIN-976 (
https://issues.apache.org/jira/browse/KYLIN-976?focusedCommentId=15331862&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15331862)
says that KYLIN-1186 used it (you can use that as an example).

2017-03-28 8:51 GMT+01:00 quyang :

> Thank you, but I still don't know how to do it. I don't know how to load it
> into Kylin.
>
> -----Original Message-----
> From: Alberto Ramón [mailto:a.ramonporto...@gmail.com]
> Sent: 28 March 2017 15:17
> To: dev
> Subject: Re: How to Custom Aggregate Functions;
>
> Check this
>
> https://issues.apache.org/jira/browse/KYLIN-976
>
>
> 2017-03-28 4:59 GMT+01:00 quyang :
>
> > I wrote a JSON for the metadata; how do I do the rest?
> >
> > Should measureTypeFactory be exported as a jar?
> >
> > I need a tutorial.
> >
> > Excuse me, my English is poor.
> >
> >
> >
> >
>
>

Re: How to Custom Aggregate Functions;

2017-03-28 Thread Alberto Ramón

Check this

https://issues.apache.org/jira/browse/KYLIN-976


2017-03-28 4:59 GMT+01:00 quyang :

> I wrote a JSON for the metadata; how do I do the rest?
>
> Should measureTypeFactory be exported as a jar?
>
> I need a tutorial.
>
> Excuse me, my English is poor.
>
>
>
>

Re: wait for stpe1 : Create Intermediate Flat Hive Table

2017-03-26 Thread Alberto Ramón

Oops: "*ezmlm-reject: fatal: Sorry, I don't accept messages larger than
100 bytes*"

I put it in my Google Drive:
https://drive.google.com/open?id=0B-6nZ2q-HPTNU3BHU3BkMURXYlE

2017-03-27 0:02 GMT+01:00 Alberto Ramón :

> Hello
>
> I attached the patch with two notes (Kylin performance and Hue
> integration):
>
> https://github.com/albertoRamon/Kylin/tree/master/KylinPerformance
> https://github.com/albertoRamon/Kylin/tree/master/KylinWithHue
>
>
>
> I have also prepared two more notes (SQuirreL integration and Resume
> integration); if somebody has any suggestions, write to me. I will wait
> until the two previous ones have been committed.
>
> https://github.com/albertoRamon/Kylin/tree/master/KylinWithSQuirreL
> https://github.com/albertoRamon/Kylin/tree/master/KylinWithMain
>
>
> 2017-03-25 1:34 GMT+00:00 Alberto Ramón :
>
>> OK. I will prepare it for the 2.0 folder.
>>
>> On 25 Mar 2017 1:24 AM, "Li Yang" wrote:
>>
>>> Btw, I just created the "_docs20" folder for the latest 2.0 documents. Please
>>> make sure this great new doc goes there.
>>>
>>> Thanks Alb!
>>>
>>> On Sat, Mar 25, 2017 at 7:46 AM, Li Yang  wrote:
>>>
>>> > Great document! I love it!!
>>> >
>>> > On Tue, Mar 21, 2017 at 9:22 AM, Alberto Ramón <
>>> a.ramonporto...@gmail.com>
>>> > wrote:
>>> >
>>> >> This is the final text
>>> >>
>>> >> https://github.com/albertoRamon/Kylin/tree/master/KylinPerformance
>>> >>
>>> >> Open to suggestions; meanwhile I will prepare the patch for the Kelly version
>>> >>
>>> >> Alb
>>> >>
>>> >> 2017-03-20 18:31 GMT+00:00 Li Yang :
>>> >>
>>> >> > You could pack the changes in a patch or a Pull Request, announce it in a
>>> >> > JIRA, then people will be able to review it.
>>> >> >
>>> >> > :-)
>>> >> >
>>> >> > On Sat, Mar 18, 2017 at 8:25 PM, Alberto Ramón <
>>> >> a.ramonporto...@gmail.com>
>>> >> > wrote:
>>> >> >
>>> >> > > @Li Yang, yes, I have an improved version on my laptop; I'm adapting it to
>>> >> > > Kelly (also the other tech notes).
>>> >> > >
>>> >> > > If somebody wants to review it, feel free to tell me :)
>>> >> > >
>>> >> > > 2017-03-17 18:33 GMT+00:00 Li Yang :
>>> >> > >
>>> >> > > > Thank you Alberto! Looking forward to a better Kylin manual. :-)
>>> >> > > >
>>> >> > > > Yang
>>> >> > > >
>>> >> > > > On Sat, Mar 11, 2017 at 5:16 AM, Alberto Ramón <
>>> >> > > a.ramonporto...@gmail.com>
>>> >> > > > wrote:
>>> >> > > >
>>> >> > > > > Try using ORC format in Hive with compression; the result:
>>> >> > > > > https://github.com/albertoRamon/Kylin/raw/master/KylinPerformance/Images/08.png
>>> >> > > > > (My apologies, I'm in the process of improving these notes and putting them
>>> >> > > > > in the Kylin manual)
>>> >> > > > >
>>> >> > > > > Also partition fact table:
>>> >> > > > > http://kylin.apache.org/docs16/howto/howto_optimize_build.ht
>>> ml
>>> >> > > > >
>>> >> > > > > 2017-03-06 8:24 GMT+00:00 h...@soonchina.cn <
>>> h...@soonchina.cn>:
>>> >> > > > >
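>>> >> > > > > > Hello:
>>> >> > > > > >   I have a problem: my Kylin cube job has been stuck at step 1: Create Intermediate
>>> >> > > > > > Flat Hive Table for 72 minutes, and I haven't found any error logs. Is something
>>> >> > > > > > wrong with my configuration?
>>> >> > > > > > Thanks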
>>> >> > > > > >
>>> >> > > > > >
>>> >> > > > > > h...@soonchina.cn
>>> >> > > > > >
>>> >> > > > >
>>> >> > > >
>>> >> > >
>>> >> >
>>> >>
>>> >
>>> >
>>>
>>
>

Re: wait for stpe1 : Create Intermediate Flat Hive Table

2017-03-24 Thread Alberto Ramón

OK. I will prepare it for the 2.0 folder.

On 25 Mar 2017 1:24 a.m., "Li Yang" wrote:

> Btw, I just created the "_docs20" folder for latest 2.0 documents. Please
> make sure this great new doc goes there.
>
> Thanks Alb!
>
> On Sat, Mar 25, 2017 at 7:46 AM, Li Yang  wrote:
>
> > Great document! I love it!!
> >
> > On Tue, Mar 21, 2017 at 9:22 AM, Alberto Ramón <
> a.ramonporto...@gmail.com>
> > wrote:
> >
> >> This is the final text
> >>
> >> https://github.com/albertoRamon/Kylin/tree/master/KylinPerformance
> >>
> >> Open to suggestions; meanwhile I will prepare the path for the Kelly version
> >>
> >> Alb
> >>
> >> 2017-03-20 18:31 GMT+00:00 Li Yang :
> >>
> >> > You could pack the changes in a patch or a Pull Request, announce it
> in
> >> a
> >> > JIRA, then people will be able to review.
> >> >
> >> > :-)
> >> >
> >> > On Sat, Mar 18, 2017 at 8:25 PM, Alberto Ramón <
> >> a.ramonporto...@gmail.com>
> >> > wrote:
> >> >
> >> > > @Li Yang, yes, I have an improved version on my laptop; I'm adapting it to
> >> > > Kelly (and the other tech notes too)
> >> > >
> >> > > If somebody wants to review it, feel free to tell me :)
> >> > >
> >> > > 2017-03-17 18:33 GMT+00:00 Li Yang :
> >> > >
> >> > > > Thank you Alberto! Looking forward to a better kylin manual.  :-)
> >> > > >
> >> > > > Yang
> >> > > >
> >> > > > On Sat, Mar 11, 2017 at 5:16 AM, Alberto Ramón <
> >> > > a.ramonporto...@gmail.com>
> >> > > > wrote:
> >> > > >
> >> > > > > Try using the ORC format in Hive with compression; the result:
> >> > > > > https://github.com/albertoRamon/Kylin/raw/master/KylinPerformance/Images/08.png
> >> > > > > (My apologies, I'm in the process of improving these notes and putting
> >> > > > > them in the Kylin manual)
> >> > > > >
> >> > > > > Also partition the fact table:
> >> > > > > http://kylin.apache.org/docs16/howto/howto_optimize_build.html
> >> > > > >
> >> > > > > 2017-03-06 8:24 GMT+00:00 h...@soonchina.cn  >:
> >> > > > >
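> >> > > > > > Hello:
> >> > > > > >   I have a problem: my Kylin cube job has been stuck at step 1: Create Intermediate Flat
> >> > > > > > Hive Table for 72 minutes, and I haven't found any error logs. Is something wrong with
> >> > > > > > my configuration?
> >> > > > > > Thanks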
> >> > > > > >
> >> > > > > >
> >> > > > > > h...@soonchina.cn
> >> > > > > >
> >> > > > >
> >> > > >
> >> > >
> >> >
> >>
> >
> >
>

Re: wait for stpe1 : Create Intermediate Flat Hive Table

2017-03-20 Thread Alberto Ramón

This is the final text

https://github.com/albertoRamon/Kylin/tree/master/KylinPerformance

Open to suggestions; meanwhile I will prepare the path for the Kelly version

Alb

2017-03-20 18:31 GMT+00:00 Li Yang :

> You could pack the changes in a patch or a Pull Request, announce it in a
> JIRA, then people will be able to review.
>
> :-)
>
> On Sat, Mar 18, 2017 at 8:25 PM, Alberto Ramón 
> wrote:
>
> > @Li Yang, yes, I have an improved version on my laptop; I'm adapting it to
> > Kelly (and the other tech notes too)
> >
> > If somebody wants to review it, feel free to tell me :)
> >
> > 2017-03-17 18:33 GMT+00:00 Li Yang :
> >
> > > Thank you Alberto! Looking forward to a better kylin manual.  :-)
> > >
> > > Yang
> > >
> > > On Sat, Mar 11, 2017 at 5:16 AM, Alberto Ramón <
> > a.ramonporto...@gmail.com>
> > > wrote:
> > >
> > > > Try using the ORC format in Hive with compression; the result:
> > > > https://github.com/albertoRamon/Kylin/raw/master/KylinPerformance/Images/08.png
> > > > (My apologies, I'm in the process of improving these notes and putting them in the Kylin
> > > > manual)
> > > >
> > > > Also partition the fact table:
> > > > http://kylin.apache.org/docs16/howto/howto_optimize_build.html
> > > >
> > > > 2017-03-06 8:24 GMT+00:00 h...@soonchina.cn :
> > > >
> > > > > Hello:
> > > > >   I have a problem: my Kylin cube job has been stuck at step 1: Create Intermediate Flat Hive
> > > > > Table for 72 minutes, and I haven't found any error logs. Is something wrong with my configuration?
> > > > > Thanks
> > > > >
> > > > >
> > > > > h...@soonchina.cn
> > > > >
> > > >
> > >
> >
>

Ozone

2017-03-18 Thread Alberto Ramón

Hi

I saw Li Yang's PPT on Apache Kylin 2.0 for the Hadoop Summit.

In the "What is next" section, it could be interesting for Apache Kylin to
evaluate Apache Ozone (PPT of Summit 2015, Video, HDFS-7240): the same ideas as Kudu,
but implemented directly in HDFS, and perhaps an "HBase killer" for some
use cases.

BR, Alb

Re: Question Regrading Cube Query Time

2017-03-18 Thread Alberto Ramón

Hi

Can you try to rebuild the cube with a new TopN measure?

2017-03-17 17:58 GMT+00:00 Li Yang :

> You didn't mention the Kylin version. Seems to be 1.6 from the
> configuration property.
>
> The properties related to the region count are (note that the names are slightly
> different in 1.6):
> kylin.storage.hbase.region-cut-gb=5
> kylin.storage.hbase.min-region-count=1
> kylin.storage.hbase.max-region-count=500
>
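As a rough mental model of how these properties interact (an assumption, not Kylin's exact code), the region count for a cube segment can be pictured as the estimated segment size divided by the region cut size, clamped to the min/max region-count settings. A small segment then lands in a single region, which matches the one-busy-RegionServer symptom described in the question below. The helper name and the ceiling rounding are illustrative only.

```python
import math


def estimate_region_count(segment_size_gb, region_cut_gb=5.0,
                          min_regions=1, max_regions=500):
    """Rough model: size / cut, rounded up, clamped to [min, max].

    Hypothetical helper; Kylin's own rounding and size estimate may differ.
    """
    if region_cut_gb <= 0:
        raise ValueError("region_cut_gb must be positive")
    raw = math.ceil(segment_size_gb / region_cut_gb)
    return max(min_regions, min(max_regions, raw))


print(estimate_region_count(2))                                  # 1
print(estimate_region_count(2, region_cut_gb=1, min_regions=8))  # 8
print(estimate_region_count(40))                                 # 8
```

Under this model, lowering the cut size or raising the minimum region count are the two levers that spread a small cube over more RegionServers.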
> As to the query, it is a simple OLAP query and should be lightning fast if
> you have the right cube and model. This talk on Apache Kylin 2.0 touches a
> bit on TPC-H on Kylin, which may give you ideas.
>
> The rowkey order also matters, as HBase does not have secondary indexes. You
> want "d_moy" and "i_manufact_id" to be at (or near) the head of the rowkey to get
> the best performance for this query.
>
> If you still have problem, there are some online tuning tools for Kylin
> that you can try.
>
> Cheers
> Yang
>
>
> On Fri, Mar 10, 2017 at 1:42 AM, 
> wrote:
>
> > Hello,
> > I am doing a POC on Kylin cubes. I have built a cube on TPC-DS data
> > (~40 GB). The build was successful, but I am facing issues with queries.
> > Simple aggregation queries return results in under a second, but
> > queries with ORDER BY/GROUP BY take too much time. At first, the
> > queries were failing with a timeout error because of the record scan threshold,
> > so I increased the "kylin.query.scan.threshold" value in kylin.properties.
> > The threshold error was fixed, but the queries were taking around 200 sec,
> > which is totally unacceptable because Hive returned the result in 10
> > seconds for the same query. Here is one of the queries (standard TPC-DS
> > query q3) I am trying to run:
> > SELECT date_dim.d_year, item.i_brand_id, item.i_brand, sum(facttable.ss_ext_discount_amt) sum_agg
> > FROM store_sales facttable
> > INNER JOIN date_dim date_dim ON (facttable.ss_sold_date_sk = date_dim.d_date_sk)
> > INNER JOIN item item ON (facttable.ss_item_sk = item.i_item_sk)
> > WHERE item.i_manufact_id = 783 and date_dim.d_moy = 11
> > GROUP BY date_dim.d_year, item.i_brand, item.i_brand_id
> > ORDER BY date_dim.d_year, sum_agg DESC, item.i_brand_id
> > LIMIT 100;
> > My cluster details: 10 nodes (each node has 32 cores, 64 GB RAM) with HDP
> > 2.5, HBase 1.1.2.2.5.3.0-37 (fully distributed mode).
> >
> > To investigate, I checked the region server logs on all the nodes and
> > found that during query execution only one region server was doing all the
> > work while the others were idle. My cube's HBase table was also showing a
> > region count of 1, so I tried changing the following properties, but still no luck:
> > kylin.hbase.hfile.size.gb=1
> > kylin.hbase.region.count.min=8
> > Please let me know if any other configuration is needed in order to
> > fix the long query time.
> > Thanks
> >
> >
>
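To illustrate the rowkey advice above, here is a toy model (not Kylin's storage code): rowkeys are kept sorted, as in HBase, so an equality filter on the leading rowkey columns becomes one contiguous range, while a filter on non-leading columns falls back to a full scan in this model. Real Kylin scans can use fuzzy-key filtering, so the penalty there is usually smaller than shown; the column values are made up.

```python
import math
from bisect import bisect_left, bisect_right
from itertools import product


def rows_visited(sorted_keys, filters):
    """Rows a simple scan touches for equality filters {position: value}.

    If the filtered positions are exactly the leading positions 0..k-1,
    matching keys are contiguous and a range scan suffices; otherwise this
    model falls back to a full scan (no fuzzy-key skipping).
    """
    k = len(filters)
    if sorted(filters) == list(range(k)):
        prefix = tuple(filters[i] for i in range(k))
        lo = bisect_left(sorted_keys, prefix)
        # math.inf sorts after any integer in the next rowkey position.
        hi = bisect_right(sorted_keys, prefix + (math.inf,))
        return hi - lo
    return len(sorted_keys)


months, manufacturers, years = range(1, 13), range(1, 101), range(1998, 2003)
# Rowkey (d_moy, i_manufact_id, d_year): filtered columns at the head.
keys_a = sorted(product(months, manufacturers, years))
# Rowkey (d_year, d_moy, i_manufact_id): filtered columns not at the head.
keys_b = sorted(product(years, months, manufacturers))

print(rows_visited(keys_a, {0: 11, 1: 78}))  # 5
print(rows_visited(keys_b, {1: 11, 2: 78}))  # 6000
```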

Re: wait for stpe1 : Create Intermediate Flat Hive Table

2017-03-18 Thread Alberto Ramón

@Li Yang, yes, I have an improved version on my laptop; I'm adapting it to
Kelly (and the other tech notes too)

If somebody wants to review it, feel free to tell me :)

2017-03-17 18:33 GMT+00:00 Li Yang :

> Thank you Alberto! Looking forward to a better kylin manual.  :-)
>
> Yang
>
> On Sat, Mar 11, 2017 at 5:16 AM, Alberto Ramón 
> wrote:
>
> > Try using the ORC format in Hive with compression; the result:
> > https://github.com/albertoRamon/Kylin/raw/master/KylinPerformance/Images/08.png
> > (My apologies, I'm in the process of improving these notes and putting them in the Kylin
> > manual)
> >
> > Also partition the fact table:
> > http://kylin.apache.org/docs16/howto/howto_optimize_build.html
> >
> > 2017-03-06 8:24 GMT+00:00 h...@soonchina.cn :
> >
> > > Hello:
> > >   I have a problem: my Kylin cube job has been stuck at step 1: Create Intermediate Flat Hive
> > > Table for 72 minutes, and I haven't found any error logs. Is something wrong with my configuration?
> > > Thanks
> > >
> > >
> > > h...@soonchina.cn
> > >
> >
>

Re: Question (问题咨询)

2017-03-16 Thread Alberto Ramón

fyi

http://apache-kylin.74782.x6.nabble.com/Hive-server-vs-Impala-server-td5639.html

2017-03-14 5:17 GMT+00:00 罗生亮 :

> Can Kylin connect to Kudu and Impala?
>
>
>
> 罗生亮
>

Re: count(distinct case when condittion) &count(distinct) return same result

2017-03-11 Thread Alberto Ramón

haha, sorry

2017-03-11 8:14 GMT+00:00 Alberto Ramón :

> is this the same problem?
> https://issues.apache.org/jira/browse/KYLIN-2341
>
> 2017-03-11 1:04 GMT+00:00 Li Yang :
>
>> The behavior resembles KYLIN-2341. Still worth a new JIRA to track.
>>
>> https://issues.apache.org/jira/browse/KYLIN-2500
>>
>>
>> On Thu, Mar 9, 2017 at 2:48 PM, Roy  wrote:
>>
>> > Hi there,
>> >
>> > Using Kylin Insight, I submitted the statement below:
>> > --select statement--
>> > select
>> > count(distinct memberid) as memberid,
>> > count(distinct case when issuccess=1 then memberid else -1 end) as
>> > Xmemberid
>> > from
>> > tables
>> > where istest=0 and isvalid=1  and createdate>='2017-03-08'
>> >
>> > results
>> > memberid   Xmemberid
>> >
>> > 863049     863049
>> >
>> >
>> > Both columns return the same result. If I move the condition into the WHERE clause, the result is:
>> > --condition in where--
>> > select
>> > count(distinct memberid) as Xmemberid
>> > from
>> > tables
>> > where istest=0 and isvalid=1 and issuccess=1 and createdate>='2017-03-08'
>> >
>> > results
>> > memberid
>> > 637290
>> > --
>> >
>> > Why does this problem appear? Has anyone encountered a similar situation?
>> >
>> > Best Regards
>> >
>> > Roy
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>>
>
>

Re: count(distinct case when condittion) &count(distinct) return same result

2017-03-11 Thread Alberto Ramón

is this the same problem?
https://issues.apache.org/jira/browse/KYLIN-2341

2017-03-11 1:04 GMT+00:00 Li Yang :

> The behavior resembles KYLIN-2341. Still worth a new JIRA to track.
>
> https://issues.apache.org/jira/browse/KYLIN-2500
>
>
> On Thu, Mar 9, 2017 at 2:48 PM, Roy  wrote:
>
> > Hi there,
> >
> > Using Kylin Insight, I submitted the statement below:
> > --select statement--
> > select
> > count(distinct memberid) as memberid,
> > count(distinct case when issuccess=1 then memberid else -1 end) as
> > Xmemberid
> > from
> > tables
> > where istest=0 and isvalid=1  and createdate>='2017-03-08'
> >
> > results
> > memberid   Xmemberid
> >
> > 863049     863049
> >
> >
> > Both columns return the same result. If I move the condition into the WHERE clause, the result is:
> > --condition in where--
> > select
> > count(distinct memberid) as Xmemberid
> > from
> > tables
> > where istest=0 and isvalid=1 and issuccess=1 and createdate>='2017-03-08'
> >
> > results
> > memberid
> > 637290
> > --
> >
> > Why does this problem appear? Has anyone encountered a similar situation?
> >
> > Best Regards
> >
> > Roy
> >
> >
> >
> >
> >
> >
> >
>
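Under standard SQL semantics the CASE column should not equal the plain count: `COUNT(DISTINCT ...)` ignores NULLs, and `ELSE -1` adds -1 as one extra distinct value. So the CASE column should differ from the WHERE-filtered count by at most one, and `ELSE NULL` would make them equal. A result identical to the unconditional count therefore points at the bug tracked in KYLIN-2500 rather than at the query. The sketch below only simulates those semantics in plain Python on made-up rows; it does not talk to Kylin.

```python
def count_distinct(values):
    """COUNT(DISTINCT expr) semantics: NULLs (None) are ignored."""
    return len({v for v in values if v is not None})


# Hypothetical rows: (memberid, issuccess)
rows = [(1, 1), (1, 0), (2, 0), (3, 1), (4, 0)]

plain = count_distinct(m for m, _ in rows)
case_minus_one = count_distinct(m if s == 1 else -1 for m, s in rows)
case_null = count_distinct(m if s == 1 else None for m, s in rows)
filtered = count_distinct(m for m, s in rows if s == 1)

print(plain, case_minus_one, case_null, filtered)  # 4 3 2 2
```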

Re: wait for stpe1 : Create Intermediate Flat Hive Table

2017-03-10 Thread Alberto Ramón

Try using the ORC format in Hive with compression; the result:
https://github.com/albertoRamon/Kylin/raw/master/KylinPerformance/Images/08.png
(My apologies, I'm in the process of improving these notes and putting them in the Kylin
manual)

Also partition the fact table:
http://kylin.apache.org/docs16/howto/howto_optimize_build.html

2017-03-06 8:24 GMT+00:00 h...@soonchina.cn :

> Hello:
>   I have a problem: my Kylin cube job has been stuck at step 1: Create Intermediate Flat Hive
> Table for 72 minutes, and I haven't found any error logs. Is something wrong with my configuration?
> Thanks
>
>
> h...@soonchina.cn
>

Re: [jira] [Created] (KYLIN-2496) Table snapshot should be no greater than 300MB

2017-03-10 Thread Alberto Ramón

This JIRA is a duplicate.

During the cube build process, the lookup tables are stored in RAM.
Check these:

https://issues.apache.org/jira/browse/KYLIN-1869

*kylin.table.snapshot.max_mb*



2017-03-10 8:42 GMT+01:00 Kailun Zhang (JIRA) :

> Kailun Zhang created KYLIN-2496:
> ---
>
>  Summary: Table snapshot should be no greater than 300MB
>  Key: KYLIN-2496
>  URL: https://issues.apache.org/jira/browse/KYLIN-2496
>  Project: Kylin
>   Issue Type: Bug
> Affects Versions: v1.5.2
> Reporter: Kailun Zhang
>  Fix For: v1.5.2
>
>
> My fact table has 10 million (1000w) rows and joins with a lookup table by userid; the
> lookup table has 6 million (600w) rows. I set the column gender as a dimension to build
> the cube, and it failed, caused by java.lang.IllegalStateException: Table snapshot
> should be no greater than 300 MB, but TableDesc[database=mydatabase name=my
> table name] size is 1442042137.
> Can Kylin afford a high-cardinality dimension in the join?
> How can I resolve the problem and build the cube? Thanks!
>
>
>
> --
> This message was sent by Atlassian JIRA
> (v6.3.15#6346)
>
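A quick check of the numbers in the error above: the lookup-table snapshot is 1442042137 bytes, roughly 1375 MB, against the 300 MB limit, so it is more than four times over. The sketch assumes the limit is measured in binary megabytes (1024 * 1024 bytes); the helper names are hypothetical.

```python
def snapshot_size_mb(size_bytes):
    """Convert a byte count to MB, assuming 1 MB = 1024 * 1024 bytes."""
    return size_bytes / (1024 * 1024)


def snapshot_fits(size_bytes, max_mb=300):
    """True if a lookup-table snapshot stays within the configured limit."""
    return snapshot_size_mb(size_bytes) <= max_mb


size = 1442042137  # bytes, from the exception message
print(round(snapshot_size_mb(size), 1))  # 1375.2
print(snapshot_fits(size))               # False
```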

Re: How kylin limit mapreduce task Memory!!!!

2017-03-07 Thread Alberto Ramón

 ???

Please check these properties:

   - *kylin.job.mr.config.override.mapred.map.child.java.opts=-Xmx8g* (for
   example)
   - *kylin.job.mr.config.override.mapreduce.map.memory.mb=8500* (for example)

Check the prefix "*kylin.job.mr.config.override.*".

2017-03-08 8:28 GMT+01:00 《秦殇》！健 :

> When I build the cube, an error occurs, as follows:
> Invalid resource request, requested memory < 0, or requested memory > max
> configured, requestedMemory=122880, maxMemory=61440.
>
>
> My env is CDH5.7 + Kylin1.6.
>
>
> Please help me T_T!!
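The prefix mechanism in the reply above can be pictured as a simple name mapping: every property that starts with `kylin.job.mr.config.override.` is handed to the MapReduce job with the prefix removed. The sketch below is a hypothetical illustration of that naming convention on a plain dict, not Kylin's configuration loader; the two values are the examples from the reply, and the non-override key is made up.

```python
OVERRIDE_PREFIX = "kylin.job.mr.config.override."


def mr_overrides(kylin_props):
    """Return the Hadoop properties passed through to MR jobs.

    Only keys starting with the override prefix are kept; the prefix is
    stripped so the remainder is the plain Hadoop property name.
    """
    return {
        key[len(OVERRIDE_PREFIX):]: value
        for key, value in kylin_props.items()
        if key.startswith(OVERRIDE_PREFIX)
    }


props = {
    "kylin.job.mr.config.override.mapred.map.child.java.opts": "-Xmx8g",
    "kylin.job.mr.config.override.mapreduce.map.memory.mb": "8500",
    "kylin.example.other.setting": "x",  # hypothetical, not an override
}
print(mr_overrides(props))
```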

Re: Kylin build 13 Step (Build Cube) has error

2017-02-23 Thread Alberto Ramón

This is a very typical problem with YARN; you will find a lot of information on the
Internet.
Review these parameters, and check whether your hardware is sufficient
(the values are an example):

   - yarn.nodemanager.resource.memory-mb = 15 GB
   - yarn.scheduler.maximum-allocation-mb = 8 GB
   - yarn.nodemanager.resource.cpu-vcores = 8 cores


2017-02-23 9:37 GMT+01:00 《秦殇》！健 :

> Hi, I built a cube today and an error suddenly appeared. The error
> message is as follows:
>
>
> MAP capability required is more than the supported max container
> capability in the cluster. Killing the Job. mapResourceRequest:
>  maxContainerCapability:
> Job received Kill while in RUNNING state.
>
>
>
> My environment is Kylin 1.6 + CDH 5.7 + 12 GB memory. What should I do to solve
> it?
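The container-size errors in these threads come down to simple arithmetic: a task's container request must not exceed `yarn.scheduler.maximum-allocation-mb` and must fit within a NodeManager's `yarn.nodemanager.resource.memory-mb`, and the JVM heap set with `-Xmx` should be smaller than the container. Below is a hedged sketch of those checks; the helper names are hypothetical, and it ignores YARN's rounding of requests to its minimum allocation.

```python
import re


def xmx_to_mb(java_opts):
    """Extract the -Xmx value from JVM options, in MB (k/m/g or bytes)."""
    match = re.search(r"-Xmx(\d+)([kKmMgG]?)", java_opts)
    if match is None:
        raise ValueError("no -Xmx option found")
    amount, unit = int(match.group(1)), match.group(2).lower()
    factor = {"k": 1 / 1024, "m": 1, "g": 1024, "": 1 / (1024 * 1024)}[unit]
    return amount * factor


def check_container(request_mb, max_allocation_mb, nodemanager_mb, heap_mb=None):
    """Return a list of problems with a container request (empty if OK)."""
    problems = []
    if request_mb > max_allocation_mb:
        problems.append("request exceeds yarn.scheduler.maximum-allocation-mb")
    if request_mb > nodemanager_mb:
        problems.append("request exceeds yarn.nodemanager.resource.memory-mb")
    if heap_mb is not None and heap_mb >= request_mb:
        problems.append("JVM heap does not leave room inside the container")
    return problems


# The request from the earlier error: 122880 MB against a 61440 MB maximum.
print(check_container(122880, 61440, 61440))
# -Xmx8g inside an 8500 MB container on a 15 GB node.
print(check_container(8500, 15 * 1024, 15 * 1024, heap_mb=xmx_to_mb("-Xmx8g")))
```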

Re: Re: Unkown main cluster host when build new cube

2017-02-19 Thread Alberto Ramón

:)

On 20/2/2017 3:59, "柯南" wrote:

> The question you pointed to is mine :)
>
>
>
>  -- Original message ------
>   From: "Alberto Ramón";;
>  Sent: Sunday, 19 February 2017, 8:10 PM
>  To: "dev";
>
>  Subject: Re: Unkown main cluster host when build new cube
>
>
>
> Check this
>
> http://apache-kylin.74782.x6.nabble.com/Error-in-kylin-with-standalone-HBase-cluster-td6901.html
>
> Anyway, I don't know if this configuration is recommended for a
> production environment.
>
> 2017-02-19 12:25 GMT+01:00 柯南 :
>
> > Hi all:
> >   I want to deploy Apache Kylin with a standalone HBase cluster, and an
> > error occurs at step 15 (Convert Cuboid Data to HFile). I think it's because
> > the servers in the HBase cluster don't know the host of the main cluster, which uses NN
> > HA. The official doc (http://kylin.apache.org/blog/2016/06/10/standalone-hbase-cluster)
> > gives two choices, "merge NN-HA related configs of two clusters" or "Update Hbase and yarn
> > version"; unfortunately, we can do neither of them in our production environment. Do I
> > have other choices? Or can you give me some advice on changing Kylin's source code?
> >  Thank you! Looking forward to your reply.

Re: Unkown main cluster host when build new cube

2017-02-19 Thread Alberto Ramón

Check this

http://apache-kylin.74782.x6.nabble.com/Error-in-kylin-with-standalone-HBase-cluster-td6901.html

Anyway, I don't know if this configuration is recommended for a
production environment.

2017-02-19 12:25 GMT+01:00 柯南 :

> Hi all:
>   I want to deploy Apache Kylin with a standalone HBase cluster, and an
> error occurs at step 15 (Convert Cuboid Data to HFile). I think it's because
> the servers in the HBase cluster don't know the host of the main cluster, which uses NN
> HA. The official doc (http://kylin.apache.org/blog/2016/06/10/standalone-hbase-cluster)
> gives two choices, "merge NN-HA related configs of two clusters" or "Update Hbase and yarn version";
> unfortunately, we can do neither of them in our production environment. Do I
> have other choices? Or can you give me some advice on changing Kylin's source code?
>  Thank you! Looking forward to your reply.

Re: [jira] [Created] (KYLIN-2445) UI: select cube engine in "Advanced setting" page

2017-02-14 Thread Alberto Ramón

Check the "*Fix Version/s:*" field of this JIRA; it's planned for the 2.0 branch.

2017-02-14 7:49 GMT+01:00 rahulsingh :

> In which version of Kylin this is applied?
>
> --
> View this message in context: http://apache-kylin.74782.x6.
> nabble.com/jira-Created-KYLIN-2445-UI-select-cube-engine-in-
> Advanced-setting-page-tp7193p7194.html
> Sent from the Apache Kylin mailing list archive at Nabble.com.
>

Re: about cube building?

2017-02-14 Thread Alberto Ramón

About Node.js, there is another open thread, "Kylin with Node JS":

http://apache-kylin.74782.x6.nabble.com/Kylin-with-Node-JS-td7180.html

2017-02-14 14:43 GMT+01:00 rahulsingh :

> need some clarification.
>
> 1. Is there any kylin package for nodejs?
> 2. Can we use nodejs as backend with kylin?
>
> Thank you
>
>

Re: about cube building?

2017-02-14 Thread Alberto Ramón

Size isn't a problem, thanks to HBase. See this.

Cubes of several TB work fine, and one project can hold several cubes ;)

About the merge slowness, I can't see any problem.

2017-02-14 10:07 GMT+01:00 rahulsingh :

> And how much data(in size) we can keep in cube?
>
>

Re: about cube building?

2017-02-13 Thread Alberto Ramón

I can't see any error.
How many GB are there per day?

2017-02-13 18:16 GMT+01:00 rahulsingh :

> Yes, it is a full date in the format of -MM-DD
>
>

Re: about cube building?

2017-02-13 Thread Alberto Ramón

Merging segments in Kylin means rebuilding the HBase regions.
Is your partition column a day slot?

2017-02-13 15:30 GMT+01:00 rahulsingh :

> Thank you for the solution.
>
> But I have already gone through this documentation.
> Now my scenario: I build the cube with 30 days of data in the first build,
> then another 30 days of data in a second build, and then I merge the cube.
> The merge takes around 150 mins. Then I added 1 day of
> data to the same cube and merged again (60 days and 1 day), and
> it took around 160 mins to merge.
>
> My question is: why does the cube take this much time when merging 1
> day of data?
>
>

Re: Kylin with Node JS

2017-02-13 Thread Alberto Ramón

See "Use Restful API in JavaScript".

2017-02-13 14:20 GMT+01:00 rahulsingh :

> Can we connect kylin with Node Js?? like kylin with JDBC..
>
>
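Kylin's query endpoint is plain HTTP + JSON, so any HTTP client (Node.js included) can call it the same way the JavaScript example does. A minimal Python sketch that only builds the request, assuming a Kylin server at a hypothetical host on the default port 7070, the default ADMIN/KYLIN credentials and the `learn_kylin` sample project; check the body fields against the REST API documentation for your version.

```python
import base64
import json
import urllib.request


def build_kylin_query(host, user, password, sql, project="learn_kylin", limit=100):
    """Build a POST request for Kylin's /kylin/api/query endpoint."""
    body = json.dumps({
        "sql": sql,
        "project": project,
        "offset": 0,
        "limit": limit,
        "acceptPartial": False,
    }).encode("utf-8")
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url=f"http://{host}:7070/kylin/api/query",
        data=body,
        headers={"Authorization": f"Basic {token}",
                 "Content-Type": "application/json"},
        method="POST",
    )


req = build_kylin_query("kylin.example.com", "ADMIN", "KYLIN",
                        "select count(*) from kylin_sales")
# Sending it needs a reachable Kylin server:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```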

Re: about cube building?

2017-02-13 Thread Alberto Ramón

Hello,

For the first steps, you can see "Quick Start with Sample cube".


For the "incremental period for cube", see "Create Cube" and Retention
Range (step 4 of that manual). There is a more detailed description in KYLIN-906.


2017-02-13 14:26 GMT+01:00 rahulsingh :

> what are ideal way to build cube? Like what are the parameters we should
> keep
> in mind before building cube?
> how much data(in days) can be stored in cube? What is the best incremental
> period for cube?
>
>

Re: Re: java.lang.RuntimeException:unexpectedevictreasonCOLLECTED

2017-02-12 Thread Alberto Ramón

https://issues.apache.org/jira/browse/KYLIN-2316?focusedCommentId=15857574&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15857574

This error only happens if you use a global dictionary on a dimension column.

2017-02-12 10:57 GMT+01:00 446463...@qq.com <446463...@qq.com>:

> Thanks. But I rebuilt the cube and it was successful. What happened?
>
>
>
> 446463...@qq.com
>
> From: Alberto Ramón
> Date: 2017-02-12 17:34
> To: dev
> Subject: Re: java.lang.RuntimeException:unexpectedevictreasonCOLLECTED
> Exactly, this patch has been applied only to 2.0.0 (*"Fix Version/s:*
> v2.0.0"
> <https://issues.apache.org/jira/browse/KYLIN/fixforversion/12338647>).
> The current version (1.6.0) doesn't have this commit and has this bug.
>
>
>
> 2017-02-12 9:19 GMT+01:00 446463...@qq.com <446463...@qq.com>:
>
> > Hi all:
> > I ran into this problem today when building a cube.
> > I checked the mailing list and found this problem in JIRA:
> > https://issues.apache.org/jira/browse/KYLIN-2316
> > It's closed and was committed to GitHub 2 days ago.
> > Does that mean this problem will be fixed in the next version?
> > Is it still in the current version? (apache-kylin-1.6.0-cdh5.7-bin.tar.gz)
> >
> >
> >
> > 446463...@qq.com
> >
>

Re: java.lang.RuntimeException:unexpectedevictreasonCOLLECTED

2017-02-12 Thread Alberto Ramón

Exactly, this patch has been applied only to 2.0.0 (*"Fix Version/s:* v2.0.0").
The current version (1.6.0) doesn't have this commit and has this bug.



2017-02-12 9:19 GMT+01:00 446463...@qq.com <446463...@qq.com>:

> Hi all:
> I ran into this problem today when building a cube.
> I checked the mailing list and found this problem in JIRA:
> https://issues.apache.org/jira/browse/KYLIN-2316
> It's closed and was committed to GitHub 2 days ago.
> Does that mean this problem will be fixed in the next version?
> Is it still in the current version? (apache-kylin-1.6.0-cdh5.7-bin.tar.gz)
>
>
>
> 446463...@qq.com
>

Re: Does kylin support the high-availability mode that uses multiple job instance: one active job instance and several backups in case it fails？

2017-02-10 Thread Alberto Ramón

In 2.0, Kylin has full HA!!

In 1.x it is close to HA:
-  it has HA for running your queries
-  it doesn't have HA for building cubes


Check this: you
can have 'n' query nodes (to serve queries) and use a load balancer to
distribute the load (MailList).

About the job node: you need it to build new cubes and to run the job scheduler; with
KYLIN-2006, v2.0.0 has HA for it.

Either way, with 1.6, if the job node fails, it auto-resumes pending tasks at the
failure point (without data loss) (MailList).

2017-02-10 4:07 GMT+01:00 xingpeng :

> HI,
>
> Does kylin support the high-availability mode that uses multiple job
> instance: one active job instance and several backups in case it fails？
>
> I want to enable kylin job instance high availability.What should I do
>
> Thanks
>
>
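The load-balancer idea for the query nodes can be pictured as round-robin over the nodes, skipping any node marked unhealthy. This is a toy sketch only, with hypothetical hostnames; in practice a dedicated load balancer in front of the Kylin query nodes does this job.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Rotate requests across query nodes, skipping nodes marked down."""

    def __init__(self, nodes):
        self._nodes = list(nodes)
        self._ring = cycle(self._nodes)
        self.down = set()

    def next_node(self):
        # Try each node at most once per call before giving up.
        for _ in range(len(self._nodes)):
            node = next(self._ring)
            if node not in self.down:
                return node
        raise RuntimeError("no healthy query node")


lb = RoundRobinBalancer(["q1", "q2", "q3"])  # hypothetical query nodes
print([lb.next_node() for _ in range(4)])    # ['q1', 'q2', 'q3', 'q1']
lb.down.add("q2")
print([lb.next_node() for _ in range(3)])    # ['q3', 'q1', 'q3']
```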

Query error on ODBC

2017-02-10 Thread Alberto Ramón

About KYLIN-2274

The problem is that this query runs on OLEDB but doesn't run on ODBC:

*"Select * from Table where 1=2"*
*"ERROR: SQLColAttribute unknown attr, ColNum: 1"*

This query is used to return column names (read metadata) and
works fine in SQuirreL.


In KO_FETCH.CPP,
there isn't a case for constants (1=2).

Re: Getting this exception at step 18 Build Cube

2017-02-07 Thread Alberto Ramón

You can try:
 - adding dimensions one by one, to find which column has the problem
 - starting with dictionary encoding and, if it works, switching to integer or another encoding
and then:
 - checking that all FKs of the fact table exist in the dimension table
 - checking that there are no duplicate PKs in the dimension table



2017-02-07 8:35 GMT+01:00 rahulsingh :

> When I build with only fact table it build successfully. But when I work
> with
> fact table and lookup table it will throw the above exception.
> Please give if you have solution for this.
> thank you
>
>
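The last two checks in the reply above (every fact-table foreign key exists in the dimension table, and the dimension table has no duplicate primary keys) can be sketched in a few lines over in-memory key lists. This is a hypothetical helper for small samples; on real data you would run the equivalent queries in Hive. A missing key is one possible cause of a "Value not exists!" dictionary error, not necessarily the only one.

```python
from collections import Counter


def find_key_problems(fact_fks, dim_pks):
    """Return (orphan foreign keys, duplicated primary keys), both sorted."""
    pk_counts = Counter(dim_pks)
    orphans = sorted(set(fact_fks) - set(pk_counts))
    duplicates = sorted(k for k, n in pk_counts.items() if n > 1)
    return orphans, duplicates


fact_user_ids = [1, 2, 2, 5, 7]  # made-up sample data
dim_user_ids = [1, 2, 3, 3, 5]
print(find_key_problems(fact_user_ids, dim_user_ids))  # ([7], [3])
```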

Re: Getting this exception at step 18 Build Cube

2017-02-06 Thread Alberto Ramón

What is your Kylin version?
  There were some bugs in old versions: KYLIN-1834
 and KYLIN-1934


Can you check whether some of your dimension data has unusual values? Can you try
using dictionary encoding?
Can you check that all values of the fact table exist in the dimension tables?



2017-02-06 16:05 GMT+01:00 rahulsingh :

> Error: java.io.IOException: Failed to build cube in mapper 0 at
> org.apache.kylin.engine.mr.steps.InMemCuboidMapper.
> cleanup(InMemCuboidMapper.java:145)
> at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:148) at
> org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787) at
> org.apache.hadoop.mapred.MapTask.run(MapTask.java:341) at
> org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164) at
> java.security.AccessController.doPrivileged(Native Method) at
> javax.security.auth.Subject.doAs(Subject.java:415) at
> org.apache.hadoop.security.UserGroupInformation.doAs(
> UserGroupInformation.java:1714)
> at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158) Caused by:
> java.util.concurrent.ExecutionException: java.lang.RuntimeException:
> java.io.IOException: java.io.IOException:
> java.lang.IllegalArgumentException: Value not exists! at
> java.util.concurrent.FutureTask.report(FutureTask.java:122) at
> java.util.concurrent.FutureTask.get(FutureTask.java:188) at
> org.apache.kylin.engine.mr.steps.InMemCuboidMapper.
> cleanup(InMemCuboidMapper.java:143)
> ... 8 more Caused by: java.lang.RuntimeException: java.io.IOException:
> java.io.IOException: java.lang.IllegalArgumentException: Value not exists!
> at
> org.apache.kylin.cube.inmemcubing.AbstractInMemCubeBuilder$1.
> run(AbstractInMemCubeBuilder.java:84)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> at java.util.concurrent.FutureTask.run(FutureTask.java:262) at
> java.util.concurrent.ThreadPoolExecutor.runWorker(
> ThreadPoolExecutor.java:1145)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(
> ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745) Caused by: java.io.IOException:
> java.io.IOException: java.lang.IllegalArgumentException: Value not exists!
> at
> org.apache.kylin.cube.inmemcubing.DoggedCubeBuilder$BuildOnce.build(
> DoggedCubeBuilder.java:128)
> at
> org.apache.kylin.cube.inmemcubing.DoggedCubeBuilder.
> build(DoggedCubeBuilder.java:75)
> at
> org.apache.kylin.cube.inmemcubing.AbstractInMemCubeBuilder$1.
> run(AbstractInMemCubeBuilder.java:82)
> ... 5 more Caused by: java.io.IOException:
> java.lang.IllegalArgumentException: Value not exists! at
> org.apache.kylin.cube.inmemcubing.DoggedCubeBuilder$BuildOnce.abort(
> DoggedCubeBuilder.java:196)
> at
> org.apache.kylin.cube.inmemcubing.DoggedCubeBuilder$
> BuildOnce.checkException(DoggedCubeBuilder.java:169)
> at
> org.apache.kylin.cube.inmemcubing.DoggedCubeBuilder$BuildOnce.build(
> DoggedCubeBuilder.java:116)
> ... 7 more Caused by: java.lang.IllegalArgumentException: Value not
> exists!
> at
> org.apache.kylin.common.util.Dictionary.getIdFromValueBytes(
> Dictionary.java:162)
> at
> org.apache.kylin.dict.TrieDictionary.getIdFromValueImpl(
> TrieDictionary.java:167)
> at
> org.apache.kylin.common.util.Dictionary.getIdFromValue(Dictionary.java:98)
> at
> org.apache.kylin.dimension.DictionaryDimEnc$DictionarySerializer.
> serialize(DictionaryDimEnc.java:121)
> at
> org.apache.kylin.cube.gridtable.CubeCodeSystem.encodeColumnValue(
> CubeCodeSystem.java:121)
> at
> org.apache.kylin.cube.gridtable.CubeCodeSystem.encodeColumnValue(
> CubeCodeSystem.java:110)
> at org.apache.kylin.gridtable.GTRecord.setValues(GTRecord.java:93) at
> org.apache.kylin.gridtable.GTRecord.setValues(GTRecord.java:81) at
> org.apache.kylin.cube.inmemcubing.InMemCubeBuilderInputConverter.convert(
> InMemCubeBuilderInputConverter.java:74)
> at
> org.apache.kylin.cube.inmemcubing.InMemCubeBuilder$InputConverter$1.next(
> InMemCubeBuilder.java:544)
> at
> org.apache.kylin.cube.inmemcubing.InMemCubeBuilder$InputConverter$1.next(
> InMemCubeBuilder.java:525)
> at
> org.apache.kylin.gridtable.GTAggregateScanner.iterator(
> GTAggregateScanner.java:139)
> at
> org.apache.kylin.cube.inmemcubing.InMemCubeBuilder.createBaseCuboid(
> InMemCubeBuilder.java:341)
> at
> org.apache.kylin.cube.inmemcubing.InMemCubeBuilder.
> build(InMemCubeBuilder.java:168)
> at
> org.apache.kylin.cube.inmemcubing.InMemCubeBuilder.
> build(InMemCubeBuilder.java:137)
> at
> org.apache.kylin.cube.inmemcubing.DoggedCubeBuilder$SplitThread.run(
> DoggedCubeBuilder.java:284)
>
>

Re: Hive and Kylin synchronization

2017-02-02 Thread Alberto Ramón

No, they aren't synchronized.
You can sync the metadata from Model > Data Source > (arrow) > Sync.
Changes in the metadata may leave the data model invalid.


2017-02-02 5:53 GMT+01:00 rahulsingh :

> how kylin tables are synchronized with hive. if we drop table from hive
> then
> what would happen?
>
>

Re: Does Kylin support percent function?

2017-02-02 Thread Alberto Ramón

Short answer: no.

Currently we have: Sum, Count, Max, Min, Avg, Count Distinct, TopN
Deprecated: Raw
Future: Percentile (see KYLIN-2396)

Alternative solutions:
http://apache-kylin.74782.x6.nabble.com/Derived-measures-in-Kylin-td5513.html

Tip: see bug KYLIN-2341; I think it can be important for you.

2017-02-01 17:27 GMT+01:00 陈光亮 :

> Hi,
>   Sorry, my English is very poor.
>   I have a problem with a demo I'm doing.
>   I want to apply a percent function to some fields.
>   For example:
>   table1
>   id  host    pid
>   1   local   321
>   2   local   123
>   3   google  987
>   4   google  789
>   SELECT (SUM(CASE  WHEN (host="local") then 1 else 0 end)/count(*)*100)
> as hostpercent FROM table1
>   The SQL result is 50%. MySQL or Hive can do this, but I don't know whether
> Kylin supports it.
>   Please help me, thank you!
>
>
>
>
>
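Since Kylin has no percent measure, one workaround in the spirit of the derived-measures thread is to let the query return the two aggregates (the conditional SUM and COUNT(*)) and compute the ratio in the client. The sketch below only does that arithmetic on the sample table from the question; it does not query Kylin, and the helper name is hypothetical.

```python
rows = [
    (1, "local", 321),
    (2, "local", 123),
    (3, "google", 987),
    (4, "google", 789),
]


def host_percent(rows, host):
    """Share of rows with the given host, as a percentage."""
    if not rows:
        return 0.0
    matches = sum(1 for _, h, _ in rows if h == host)  # conditional SUM
    return matches / len(rows) * 100                   # divided by COUNT(*)


print(host_percent(rows, "local"))  # 50.0
```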

Qlik

2017-01-25 Thread Alberto Ramón

It's in Spanish, but the picture is very clear

https://www.linkedin.com/pulse/qlik-cloudera-bigdatasmartdataanalytics-felipe-trigo?trk=hp-feed-article-title-publish

Re: New document: "How to optimize cube build"

2017-01-25 Thread Alberto Ramón

Be careful about partitioning by "FLIGHTDATE".

From https://github.com/albertoRamon/Kylin/tree/master/KylinPerformance

*"Option 1: Use id_date as the partition column of the Hive table. This has a big
problem: the Hive metastore is meant for a few hundred partitions, not
thousands (HIVE-9452 has an idea to solve this, but it isn't in progress)*"

Hive 2.0 will include a preview (for testing only) that addresses this.
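A rough back-of-the-envelope sketch of why day-level partitions pile up quickly. The ten-year range is a hypothetical assumption (the actual FLIGHTDATE range isn't given here), and month-level partitions are shown only for comparison.

```python
from datetime import date


def daily_partitions(start: date, end: date) -> int:
    """Partitions needed if every day in [start, end] gets its own partition."""
    return (end - start).days + 1


def monthly_partitions(start: date, end: date) -> int:
    """Partitions needed if every calendar month in [start, end] gets its own partition."""
    return (end.year - start.year) * 12 + (end.month - start.month) + 1


start, end = date(2007, 1, 1), date(2016, 12, 31)  # hypothetical 10-year range
print(daily_partitions(start, end))    # 3653
print(monthly_partitions(start, end))  # 120
```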

2017-01-25 9:46 GMT+01:00 ShaoFeng Shi :

> Hello,
>
> A new document has been added about cube build practices. Any suggestions or
> comments are welcome; we can update the doc later based on feedback.
>
> Here is the link:
> https://kylin.apache.org/docs16/howto/howto_optimize_build.html
>
> --
> Best regards,
>
> Shaofeng Shi 史少锋
>
>

Re: hbase Very high read or write request count in a single RegionServer

2017-01-25 Thread Alberto Ramón

Li Yang's solution works from CDH 5.4 onwards, but if your production
environment runs HBase 1.2 (dreaming is free ;) ), you can also try:

https://issues.apache.org/jira/browse/HBASE-10070


2017-01-25 5:52 GMT+01:00 Li Yang :

> Try google 'hbase read replica'.
>
> Cheers
>
> On Tue, Jan 17, 2017 at 10:36 AM, read2me  wrote:
>
> > We scaled our HBase cluster from three RegionServers to twenty and enabled the
> > load balancer. Now the number of regions on each RegionServer is almost the
> > same, but the read or write request count on a single RegionServer is very high.
> > How can we solve it?
> >
> > --
> > View this message in context: http://apache-kylin.74782.x6.nabble.com/hbase-Very-high-read-or-write-request-count-in-a-single-RegionServer-tp6959.html
> > Sent from the Apache Kylin mailing list archive at Nabble.com.
> >
>

Re: Aggregation with calculation

2017-01-11 Thread Alberto Ramón

This will explain the solution:

http://apache-kylin.74782.x6.nabble.com/Derived-measures-in-Kylin-td5513.html

2017-01-09 10:27 GMT+01:00 steelspace :

> Hello,
>
> I am aware that Kylin supports only one measure per cube. What I need is to
> calculate aggregation in the following simple case:
>
> - Table Product with Price fact
> - Table Sale with Quantity fact
>
> I'd like to get a cube (or cubes) that allow me to get aggregated values
> for
> Price * Quantity.
>
> How can I achieve that?
>
> Thanks
>
> --
> View this message in context: http://apache-kylin.74782.x6.nabble.com/Aggregation-with-calculation-tp6891.html
> Sent from the Apache Kylin mailing list archive at Nabble.com.
>
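A small sketch of why the product has to be computed per row before aggregating. It assumes (this is not taken from the linked thread) that price * quantity is materialized as a column, for example in a Hive view over the joined tables, and that a SUM measure is defined on that column. The SaleRow type and the sample values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SaleRow:
    price: float   # from the Product table
    quantity: int  # from the Sale table


rows = [SaleRow(10.0, 3), SaleRow(2.5, 4), SaleRow(7.0, 1)]

# Correct: multiply per row (the derived "amount" column), then SUM.
revenue = sum(r.price * r.quantity for r in rows)

# Wrong: multiplying two separately aggregated SUM measures.
wrong = sum(r.price for r in rows) * sum(r.quantity for r in rows)

print(revenue, wrong)  # 47.0 156.0
```

Because SUM(price * quantity) is additive, it rolls up correctly across any dimension combination, while SUM(price) * SUM(quantity) does not.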

Re: When kylin.job.run.as.remote.cmd=true, can all hadoop, hive, hbase commands run in a remote cluster?

2017-01-07 Thread Alberto Ramón

To the Apache Kylin devs: this may be useful for you:


HDFS-9666 <https://issues.apache.org/jira/browse/HDFS-9666>

(It will be in the next HDFS release, 2.9.0:
<http://mail-archives.apache.org/mod_mbox/hadoop-hdfs-dev/201701.mbox/%3CCAMs9kVjMdiw0AufB5u-HY%2B%2B%3D05ijLswZsQ7eyE7yLJxTh8EvsQ%40mail.gmail.com%3E>)


Re: Hierarchical dimensions in Kylin?

2017-01-07 Thread Alberto Ramón

This will help you:

http://www.slideshare.net/YangLi43/design-cube-in-apache-kylin#9

http://kylin.apache.org/docs/howto/howto_optimize_cubes.html

http://mail-archives.apache.org/mod_mbox/kylin-user/201612.mbox/%3CCANfpUctmQgPQ93mavF4MYJbFiJ2zGbPJuLpMdrcGytTaJrH0%2BQ%40mail.gmail.com%3E



2017-01-07 8:59 GMT+01:00 davout :

> Does Kylin support the notion of pre-defined hierarchical dimensions?
>
> Typically these are used to represent geography or organizational models.
> For example, where countries are the leaf nodes in a hierarchy and regions
> and continents are parent nodes culminating in a single top level 'world'
> node.
>
> --
> View this message in context: http://apache-kylin.74782.x6.nabble.com/Hierarchical-dimensions-in-Kylin-tp6870.html
> Sent from the Apache Kylin mailing list archive at Nabble.com.
>
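As a rough illustration of what the optimization guide linked above describes: declaring dimensions as a hierarchy means a child level is only kept together with all of its parents, so n hierarchical levels give n + 1 combinations instead of 2^n. The sketch below only counts combinations for a hypothetical continent > region > country hierarchy; it does not reproduce Kylin's actual cuboid logic.

```python
from itertools import combinations


def all_cuboids(dims):
    """Every subset of the dimensions: 2**n combinations with no pruning."""
    return [frozenset(c) for r in range(len(dims) + 1) for c in combinations(dims, r)]


def hierarchy_cuboids(levels):
    """Levels ordered parent -> child: a level only appears together with all its parents."""
    return [frozenset(levels[:i]) for i in range(len(levels) + 1)]


geo = ["continent", "region", "country"]  # hypothetical hierarchy, parent first
print(len(all_cuboids(geo)), len(hierarchy_cuboids(geo)))  # 8 4
```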

Re: When kylin.job.run.as.remote.cmd=true, can all hadoop, hive, hbase commands run in a remote cluster?

2017-01-05 Thread Alberto Ramón

Oops,
Sometimes the client and server JARs are the same file (this is a performance
problem).
This can be a problem with Docker: a lot of dependencies and programs
running, breaking the microservice philosophy.

This may help you in the *future*:
https://github.com/apache/hbase/tree/master/hbase-client
https://issues.apache.org/jira/browse/HDFS-6200  Finally finished, nice!!
Artifact org.apache.zookeeper.client (to find the HBase master)

*IDEA*: query nodes are 'only' used to 'resolve user queries', so they only need
access to HBase / metadata (?)




2017-01-05 2:33 GMT+01:00 ShaoFeng Shi :

> They don't need to be Hadoop nodes; the only requirement is that the node running Kylin
> has the Hadoop/Hive/HBase clients installed and configured. A Docker
> container is okay if it meets this.
>
>
>
>
> --
> Best regards,
>
> Shaofeng Shi 史少锋
>

Re: When kylin.job.run.as.remote.cmd=true, can all hadoop, hive, hbase commands run in a remote cluster?

2017-01-04 Thread Alberto Ramón

OK,

I understand that Kylin in cluster mode must run on the same nodes as the Hadoop
cluster. Is that correct?
I was thinking of putting Kylin in cluster mode in Docker containers (isolated from
the Hadoop cluster).

Thanks

2017-01-04 15:19 GMT+01:00 ShaoFeng Shi :

> Here are the explanations in
> examples/test_case_data/sandbox/kylin.properties:
>
> # If true, job engine will not assume that hadoop CLI reside on the same server as it self
> # you will have to specify kylin.job.remote-cli-hostname, kylin.job.remote-cli-username and kylin.job.remote-cli-password
> # It should not be set to "true" unless you're NOT running Kylin.sh on a hadoop client machine
> # (Thus kylin instance has to ssh to another real hadoop client machine to execute hbase,hive,hadoop commands)
> kylin.job.use-remote-cli=false
>
> # Only necessary when kylin.job.use-remote-cli=true
> kylin.job.remote-cli-hostname=sandbox
>
> kylin.job.remote-cli-username=root
>
> # Only necessary when kylin.job.use-remote-cli=true
> kylin.job.remote-cli-password=hadoop
>
>
> Anyway, today Kylin needs to run on a Hadoop client machine.
> Using a jumpbox is not recommended and may not work; the only scenario for it
> is development.
>
>
> 2017-01-04 17:14 GMT+08:00 Alberto Ramón :
>
> > is it the same remote mode than cluster mode ?
> > <http://kylin.apache.org/docs15/install/kylin_cluster.html>
> >
> >
> >
> > 2017-01-04 9:35 GMT+01:00 ShaoFeng Shi :
> >
> > > Hi Lei,
> > >
> > > The remote mode is not recommended and not supported in real
> deployment.
> > > (we only use it in dev environment because local laptop doesn't have
> > > hive/hbase/hadoop configured, need ssh to a remote sandbox vm to run
> > > commands).
> > >
> > > Query doesn't support it;
> > >
> > > 2017-01-04 15:18 GMT+08:00 张磊 <121762...@qq.com>:
> > >
> > > > Hi ,
> > > >  I create a aws emr cluster which contains hadoop、hive、hbase client,
> > and
> > > i
> > > > install kylin instance in another ec2 instance. Can kylin work?
> > > >
> > > >
> > > > In kylin.properties, i find job engine can work in remote mode, but
> > query
> > > > engine can also work well?
> > >
> > >
> > >
> > >
> > > --
> > > Best regards,
> > >
> > > Shaofeng Shi 史少锋
> > >
> >
>
>
>
> --
> Best regards,
>
> Shaofeng Shi 史少锋
>
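The property comments quoted above say that when kylin.job.use-remote-cli=true, the remote hostname, username and password must also be set. A minimal sketch of a pre-flight check for that rule, assuming a plain key=value file (no escapes or line continuations); the helper names are hypothetical and this is not part of Kylin.

```python
REMOTE_CLI_KEYS = (
    "kylin.job.remote-cli-hostname",
    "kylin.job.remote-cli-username",
    "kylin.job.remote-cli-password",
)


def parse_properties(text: str) -> dict:
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:  # ignore lines without '='
            props[key.strip()] = value.strip()
    return props


def missing_remote_cli_settings(props: dict) -> list:
    """Return the remote-cli keys that are required but absent or empty."""
    if props.get("kylin.job.use-remote-cli", "false").lower() != "true":
        return []  # local mode: nothing else is required
    return [key for key in REMOTE_CLI_KEYS if not props.get(key)]


sample = """
# Only necessary when kylin.job.use-remote-cli=true
kylin.job.use-remote-cli=true
kylin.job.remote-cli-hostname=sandbox
"""
print(missing_remote_cli_settings(parse_properties(sample)))
```

Running it on a real deployment's kylin.properties before starting Kylin would flag a half-configured remote mode early; as noted in the thread, remote mode itself is only meant for development.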

Re: When kylin.job.run.as.remote.cmd=true, can all hadoop, hive, hbase commands run in a remote cluster?

2017-01-04 Thread Alberto Ramón

Is remote mode the same as cluster mode?
<http://kylin.apache.org/docs15/install/kylin_cluster.html>

2017-01-04 9:35 GMT+01:00 ShaoFeng Shi :

> Hi Lei,
>
> The remote mode is not recommended and not supported in real deployment.
> (we only use it in dev environment because local laptop doesn't have
> hive/hbase/hadoop configured, need ssh to a remote sandbox vm to run
> commands).
>
> Query doesn't support it;
>
> 2017-01-04 15:18 GMT+08:00 张磊 <121762...@qq.com>:
>
> > Hi,
> >  I created an AWS EMR cluster which contains the hadoop, hive and hbase clients, and
> > installed a Kylin instance on another EC2 instance. Can Kylin work?
> >
> >
> > In kylin.properties, I found that the job engine can work in remote mode, but
> > can the query engine also work well?
>
>
>
>
> --
> Best regards,
>
> Shaofeng Shi 史少锋
>

Re: [jira] [Created] (KYLIN-2355) kylin cube build is failing at #3 Step Name: Extract Fact Table Distinct Columns

2017-01-04 Thread Alberto Ramón

See:

KYLIN-2326

MailList



2017-01-04 9:35 GMT+01:00 prasannaP (JIRA) :

> prasannaP created KYLIN-2355:
> 
>
>  Summary: kylin cube build is failing at #3 Step Name: Extract
> Fact Table Distinct Columns
>  Key: KYLIN-2355
>  URL: https://issues.apache.org/jira/browse/KYLIN-2355
>  Project: Kylin
>   Issue Type: Bug
> Reporter: prasannaP
>
>
> I am new to Kylin. I created a Kylin model and cube by following this URL:
>
> http://kylin.apache.org/
>
> Every cube build fails at this step only, and I am not able to find the
> cause:
>
> #3 Step Name: Extract Fact Table Distinct Columns
>
>
>
>
>
> --
> This message was sent by Atlassian JIRA
> (v6.3.4#6332)
>

Re: Kylin Performance

2016-12-30 Thread Alberto Ramón

On Kylin performance, I have completed some use cases:


https://github.com/albertoRamon/Kylin/tree/master/KylinPerformance


Any contributions or corrections will be appreciated.
BR, Alb


Re: docker version of apache kylin 1.6.0

2016-12-29 Thread Alberto Ramón

You can use my Docker image (https://github.com/albertoRamon/Kylin/tree/master/KylinWithDocker).
It runs OK and uses Kylin 1.6.0.
Notes about installation, upgrades, problems and start/stop:
https://github.com/albertoRamon/Kylin/blob/master/KylinWithDocker/Readme_ARP.txt

The Kyligence team also has a Docker image (https://github.com/Kyligence/kylin-docker/).

2016-12-29 19:51 GMT+01:00 morotor :

> How can I create a Docker version of Apache Kylin 1.6.0?
>
> --
> View this message in context: http://apache-kylin.74782.x6.nabble.com/docker-version-of-apache-kylin-1-6-0-tp6794.html
> Sent from the Apache Kylin mailing list archive at Nabble.com.
>

Re: Kylin Performance

2016-12-28 Thread Alberto Ramón

Don't worry, I'm going to complete my KylinPerformace_I.pdf with new tests
and some notes.

2016-12-28 11:19 GMT+01:00 ShaoFeng Shi :

> Alberto, the image can not be displayed :-<
>
>
>
> --
> Best regards,
>
> Shaofeng Shi 史少锋
>

Re: Kylin Performance

2016-12-27 Thread Alberto Ramón

KYLIN-2165 will be nice.

Yes, 30% of the total cube build time, because the cardinality of the dimensions was low (2K and 11K).

You are right: when the dimension cardinality is 1M, the intermediate
table step is only 5% of the total. Picture (I don't know whether pictures can be seen on
this mailing list):
[image: Inline image 1]


2016-12-27 2:32 GMT+01:00 ShaoFeng Shi :

> Alberto, I didn't test the ORC format; but as you know, Kylin consumes the
> source data row by row (all columns at once), so I guess a columnar format
> like ORC may not benefit much. But this is a good try; if there is a better
> format, we can switch to it.
>
> The "redistribute flat hive table" step adds time, but it can reduce time in
> subsequent cube building (by avoiding data skew), especially when there are lots
> of records. Usually it is fast (a couple of minutes to ten or twenty minutes)
> compared to the cube build time. You mentioned it took 30% of the total time;
> what's the total time and how many input rows are there? When the input is small,
> the overhead may outweigh the benefit.
>
> For the method you mentioned (count on the fact table, then move the
> redistribution to step 1), it is actually supported in Kylin 1.5.4 (maybe
> also 1.5.3) with a config parameter; but that method is not recommended as
> it is unstable: in some cases (e.g., the fact table is a big Hive view, or
> it is a big table but not partitioned by date), a simple "select count(*)
> from fact_table" will cost lots of resources on Hadoop, and a second "create
> intermediate_table as select ..." will start the same mappers again.
>
> In contrast, the as-is method is relatively stable for extreme cases;
> usually the intermediate table is much smaller than the fact table, so counting and
> redistributing it is low-cost. In the next version there will be a
> further optimization (https://issues.apache.org/jira/browse/KYLIN-2165) to
> reduce the time in this step.
>
>
>
>
>
> --
> Best regards,
>
> Shaofeng Shi 史少锋
>
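A toy model of what ShaoFeng describes: once the row count is known, rows are spread over a number of output files by a random key, so one huge input file does not turn into one huge task in later steps. The rows_per_file parameter and the skewed input sizes are hypothetical, and Kylin's real step runs inside Hive with its own sizing logic.

```python
import math
import random
from collections import Counter


def redistribute(num_rows: int, rows_per_file: int, seed: int = 7) -> Counter:
    """Send each row to one of ceil(num_rows / rows_per_file) output files chosen
    by a random key (similar in spirit to Hive's DISTRIBUTE BY RAND());
    returns the number of rows that land in each file."""
    num_files = max(1, math.ceil(num_rows / rows_per_file))
    rng = random.Random(seed)  # seeded so the example is reproducible
    return Counter(rng.randrange(num_files) for _ in range(num_rows))


# Hypothetical skewed input: one big file and four small ones (100,000 rows total).
skewed_input = [90_000, 2_500, 2_500, 2_500, 2_500]
files = redistribute(sum(skewed_input), rows_per_file=10_000)

print(len(files), "files")
print("largest:", max(files.values()), "smallest:", min(files.values()))
```

The count is needed only to pick the number of files; that is why the thread discusses whether it must be exact.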

Re: [jira] [Created] (KYLIN-2324) Kylin web hang with no error after query

2016-12-27 Thread Alberto Ramón

Sounds like HBase / a RegionServer is stopped or dead and Kylin can't read its
metadata.
Can you check whether this is the reason? After starting the RegionServer it should work OK.

2016-12-27 4:18 GMT+01:00 hoangle (JIRA) :

> hoangle created KYLIN-2324:
> --
>
>  Summary: Kylin web hang with no error after  query
>  Key: KYLIN-2324
>  URL: https://issues.apache.org/jira/browse/KYLIN-2324
>  Project: Kylin
>   Issue Type: Bug
> Affects Versions: v1.6.0
>  Environment: CentOS
> Reporter: hoangle
>
>
> After running some queries, the Kylin web UI is very slow. I cannot load a cube or do
> anything.
> Please check the jstack output and image for details.
>
>
>
>
> --
> This message was sent by Atlassian JIRA
> (v6.3.4#6332)
>

Re: Kylin Performance

2016-12-26 Thread Alberto Ramón

Hello,

Since v0, I have corrected the English syntax.


After tuning the cube:
  -  Use a compressed Hive input table
  -  Define Hierarchy, Joint, Dim
  -  . . .

Now 57% of the time is for the first steps (flat table, steps 1, 2, 3) and 43% for building
the cube.

I saw that the flat table uses SEQUENCEFILE, so I tested
   ORC,
   ORC + Snappy,
   ORC + Snappy + Vectorization

without good results. Any more ideas?


I think 'Redistribute Flat Hive Table' is a simple count, yet it uses

*30% of the total time*
  Is this the normal case?
  Could we approximate this count with the count of the fact table (which will be right 99% of the time)
and run it in parallel with step 1? Does it need to be precise?

2016-12-22 14:00 GMT+01:00 Li Yang :

> Very good work!
>
> Btw, we are also doing benchmarks on SSB and TPC-H data sets, based on
> below work. Will share more info soon.
>
> - http://www.cs.umb.edu/~poneil/StarSchemaB.PDF
> - https://github.com/hortonworks/hive-testbench
>
>
> Cheers
> Yang
>
>

Re: Kylin Performance

2016-12-21 Thread Alberto Ramón

When KYLIN-2149 is solved, performance will *improve even more*, because:

you know that 2016-05-05 belongs to May, week 18, and is a Thursday, but Kylin
doesn't know it.
It will try to calculate the combinations of 2016-05-05 with January, February,
March, ... Monday, Tuesday, ..., W1, W2, ..., Q1, Q2, Q3, Q4 ==> a lot of
combinations are wasted.

2016-12-21 12:57 GMT+01:00 Luke_Selina :

> Great, and I agree! But I still have a question, like Alberto: why, in an aggregation group,
> can a dimension use only one rule (mandatory, joint, hierarchy)?
>
> --
> View this message in context: http://apache-kylin.74782.x6.nabble.com/Kylin-Performance-tp6713p6728.html
> Sent from the Apache Kylin mailing list archive at Nabble.com.
>
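On the "one rule per dimension" question, it helps to look at how the rules shrink an aggregation group. The Python sketch below uses the commonly quoted rule of thumb, stated as an assumption rather than Kylin's exact code: a mandatory dimension contributes a factor of 1, a hierarchy of k levels contributes k + 1, a joint group contributes 2, and every other dimension contributes 2. Those factors only multiply cleanly when the rule sets are disjoint, so the sketch rejects a dimension that appears in two rules. All dimension names are hypothetical.

```python
from math import prod


def aggregation_group_cuboids(dims, mandatory=(), hierarchies=(), joints=()):
    """Estimate the number of dimension combinations one aggregation group allows.

    Rule-of-thumb factors: mandatory -> 1, hierarchy of k levels -> k + 1,
    joint group -> 2, any other (normal) dimension -> 2.
    """
    used = list(mandatory)
    used += [d for levels in hierarchies for d in levels]
    used += [d for group in joints for d in group]
    if len(used) != len(set(used)):
        # The factors above assume disjoint rules, so overlapping rules are rejected.
        raise ValueError("a dimension may belong to only one rule")
    unknown = set(used) - set(dims)
    if unknown:
        raise ValueError(f"dimensions not in the group: {sorted(unknown)}")
    normal = len(dims) - len(used)
    return 2 ** normal * prod(len(h) + 1 for h in hierarchies) * 2 ** len(joints)


if __name__ == "__main__":
    dims = ["YEAR", "QUARTER", "MONTH", "SELLER", "BUYER", "SITE"]
    print(aggregation_group_cuboids(dims))
    print(aggregation_group_cuboids(
        dims,
        mandatory=["SITE"],
        hierarchies=[["YEAR", "QUARTER", "MONTH"]],
        joints=[["SELLER", "BUYER"]],
    ))
```

This prints 64 for six unconstrained dimensions and 8 once the mandatory, hierarchy and joint rules are applied.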

Re: Kylin Performance

2016-12-21 Thread Alberto Ramón

Yes (thanks for your help).

My fact table has only 3.9 million rows; I will try Cube_06 with more data.
One of my dimensions has 800K rows; I want to test creating this dimension with buckets in
Hive.

2016-12-21 11:25 GMT+01:00 ShaoFeng Shi :

> Hi Alberto, this is a great test. The only issue might be that the data set is
> too small for Kylin, but the conclusions are the same: a) enabling
> compression can improve overall performance; b) optimizing the cube design
> with "hierarchy"/"joint" can reduce computation and storage, etc.
>
> For the "Cube_06" test: partitioning is usually used for tables with a huge
> amount of data (partitions can be used for data pruning). Lookup tables
> don't need to be partitioned: keeping all records in a single file will be more
> efficient than dividing them into 70 files.
>
> If you want to compare Hive partitioned vs. non-partitioned tables, I suggest you find a
> bigger fact table, e.g. 5 or 10 million rows.
>
> 2016-12-21 17:53 GMT+08:00 Alberto Ramón :
>
> > I attached it as a PDF... I don't know if this is forbidden on the mailing list.
> >
> > Google Drive:
> > <https://drive.google.com/drive/folders/0B-6nZ2q-HPTNem1KTTRHbDhpOG8?usp=sharing>
> > (tell me if there is any problem)
> >
> > 2016-12-21 10:27 GMT+01:00 ShaoFeng Shi :
> >
> > > Hi Alberto, where can I preview this doc? Thanks!
> > >
> > > 2016-12-21 6:46 GMT+08:00 Alberto Ramón :
> > >
> > > > I wrote some short tech notes about my performance tests; the doc is
> > > > unfinished (I need more time, tests, and knowledge).
> > > > Reviewing my English mistakes is still pending.
> > > >
> > > > If anybody has any comments, tests, more experience, ... feel free to
> > > > make any suggestions.
> > > >
> > > > Alb
> > > >
> > >
> > >
> > >
> > > --
> > > Best regards,
> > >
> > > Shaofeng Shi 史少锋
> > >
> >
>
>
>
> --
> Best regards,
>
> Shaofeng Shi 史少锋
>

Re: Kylin Performance

2016-12-21 Thread Alberto Ramón

I attached it as a PDF... I don't know if this is forbidden on the mailing list.

Google Drive:
<https://drive.google.com/drive/folders/0B-6nZ2q-HPTNem1KTTRHbDhpOG8?usp=sharing>
(tell me if there is any problem)

2016-12-21 10:27 GMT+01:00 ShaoFeng Shi :

> Hi Alberto, where can I preview this doc? Thanks!
>
> 2016-12-21 6:46 GMT+08:00 Alberto Ramón :
>
> > I wrote some short tech notes about my performance tests; the doc is
> > unfinished (I need more time, tests, and knowledge).
> > Reviewing my English mistakes is still pending.
> >
> > If anybody has any comments, tests, more experience, ... feel free to make
> > any suggestions.
> >
> > Alb
> >
>
>
>
> --
> Best regards,
>
> Shaofeng Shi 史少锋
>

Kylin Performance

2016-12-20 Thread Alberto Ramón

I wrote some short tech notes about my performance tests; the doc is unfinished
(I need more time, tests, and knowledge).
Reviewing my English mistakes is still pending.

If anybody has any comments, tests, more experience, ... feel free to make
any suggestions.

Alb

Re: Re: Create Aggr Func(SUM,MIN,MAX) For Every Measure

2016-12-18 Thread Alberto Ramón

Yes. Nowadays I'm doing performance testing by creating a lot of cubes,
changing the configuration of dimensions to evaluate their impact on build
time and cube size.

I'm tired of the Create Cube wizard :( , a lot of clicks!!

Some small changes:

* Default names for columns. I think this will be a new feature in 1.6.1??
* Adding normal and derived dimensions could be faster if all available columns
and tables were shown in the main window (similar to adding joint dimensions).
* The Add Measures UI is very heavy... Default names? Add all measures of
one column at the same time (not measure by measure).

Other people will have a lot of better ideas, I'm sure!!

Did I dream that you would be able to load/save the Create Cube wizard configuration
to a file using the UI? Now I can't find the JIRA.
It would be great to save & load the configuration of unfinished cubes.

2016-12-18 11:43 GMT+01:00 Luke Han :

> There's a REST API for cube creation; you could try writing some scripts to
> bulk-load measures.
>
> Thanks.
>
>
> Best Regards!
> -
>
> Luke Han
>
> On Sun, Dec 18, 2016 at 2:23 PM, Li Yang  wrote:
>
> > Btw, MIN/MAX is not often as useful as SUM. For the sake of build
> > time and storage, you may not want to add MAX/MIN for every measure.
> >
> > On Tue, Dec 13, 2016 at 9:44 PM, 汪胜  wrote:
> >
> > > Thanks for your suggestion! I'll try this way.
> > >
> > >
> > >
> > > On 2016-12-13 21:26, Billy Liu wrote:
> > >
> > > Maybe you could edit the cube JSON directly, but always be careful.
> > >
> > > 2016-12-13 15:30 GMT+08:00 汪胜 :
> > >
> > > > Hello all,
> > > > I have many measures (about fifty), and I have to create SUM, MIN and
> > > > MAX for every measure. Is there a good way to create them all at once
> > > > instead of creating them one by one?
> > > >
> > > >
> > >
> >
>
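Following the REST API and cube JSON suggestions above, here is a minimal Python sketch that generates the measure entries for many columns at once, ready to paste into the cube JSON or to feed to a script. The entry layout (`name`, `function.expression`, `function.parameter`, `function.returntype`) reflects the Kylin 1.x cube descriptor as I understand it; compare it with a cube exported from your own instance before relying on it. Column names and return types are hypothetical. In line with Li Yang's remark, the functions default to SUM only.

```python
import json


def build_measures(columns: dict, functions=("SUM",)) -> list:
    """Create one cube-descriptor measure entry per (column, function) pair.

    `columns` maps a column name to the return type its measures should declare;
    the types used below are illustrative assumptions, check them per column.
    """
    measures = []
    for column, return_type in columns.items():
        for func in functions:
            measures.append({
                "name": f"{func}_{column}",
                "function": {
                    "expression": func,
                    "parameter": {"type": "column", "value": column},
                    "returntype": return_type,
                },
            })
    return measures


if __name__ == "__main__":
    # Hypothetical columns; MIN/MAX are added explicitly here only for illustration.
    columns = {"PRICE": "decimal(19,4)", "QUANTITY": "bigint"}
    print(json.dumps(build_measures(columns, ("SUM", "MIN", "MAX")), indent=2))
```

The generated list would extend the `measures` array of the cube JSON; the declared return type may need to differ between SUM and MIN/MAX for some column types, which this sketch does not handle.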

Re: Can kylin intermediate tables in hive be deleted ?

2016-12-13 Thread Alberto Ramón

You will need to run the storage cleanup (cleanUp Storage).


Can it be done while Kylin is online? Yes.

In this mailing list you will find extra info.
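A minimal Python sketch for scripting the storage cleanup, first as a listing run and then for real. It assumes the tool is launched through `bin/kylin.sh` with the `StorageCleanupJob` class and a `--delete true|false` flag, as described in the Kylin 1.5/1.6 cleanup documentation; the class lives in a different package in other releases, so check the docs for your version. The `KYLIN_HOME` fallback path is hypothetical.

```python
import os
import shlex
import subprocess

# Assumed class name, as in the Kylin 1.5/1.6 storage cleanup docs; other
# releases moved this class to a different package, so check your version.
CLEANUP_CLASS = "org.apache.kylin.storage.hbase.util.StorageCleanupJob"


def cleanup_command(kylin_home: str, delete: bool, cleanup_class: str = CLEANUP_CLASS) -> list:
    """Argument list for the cleanup tool; delete=False is documented to only list candidates."""
    script = os.path.join(kylin_home, "bin", "kylin.sh")
    return [script, cleanup_class, "--delete", "true" if delete else "false"]


def run_cleanup(kylin_home: str, delete: bool) -> int:
    """Run the cleanup tool and return its exit code."""
    cmd = cleanup_command(kylin_home, delete)
    print("Running:", shlex.join(cmd))
    return subprocess.run(cmd).returncode


if __name__ == "__main__":
    home = os.environ.get("KYLIN_HOME", "/usr/local/kylin")  # hypothetical fallback path
    # Only print the listing run and the deleting run; call run_cleanup() to execute them.
    for delete in (False, True):
        print(shlex.join(cleanup_command(home, delete)))
```

Reviewing the `--delete false` output before running with `--delete true` keeps the deletion step deliberate.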


2016-12-13 10:14 GMT+01:00 Luke_Selina :

> 
>
> Hi all, as the picture shows, can these intermediate Hive tables be
> deleted manually?
>
> --
> View this message in context: http://apache-kylin.74782.x6.nabble.com/Can-kylin-intermediate-tables-in-hive-be-deleted-tp6617.html
> Sent from the Apache Kylin mailing list archive at Nabble.com.
>

Re: kylin 1.6.0 cardinality can't greater than 5000000 ?

2016-12-08 Thread Alberto Ramón

Hmm, you can try this:

With KYLIN-1705 you can use the Global Dictionary Builder, which supports 2 billion
values (versus 5 million for the previous dictionary).

In theory you can migrate from old dictionaries (KYLIN-1775).
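A minimal Python sketch of the decision described above: compare each column's cardinality with the dictionary limit (the 5 million figure comes from this thread) and emit `dictionaries` entries that switch the offending columns to the global dictionary builder. The `dictionaries` field and the `org.apache.kylin.dict.GlobalDictionaryBuilder` class name reflect the Kylin 1.6 cube JSON as far as I know, and in some releases the global dictionary is aimed mainly at precise COUNT DISTINCT measures, so verify both against your version. ROWKEY's cardinality is taken from the error message; SELLER_ID is a made-up example.

```python
import json

DEFAULT_MAX_CARDINALITY = 5_000_000  # previous dictionary limit mentioned in this thread
GLOBAL_DICT_BUILDER = "org.apache.kylin.dict.GlobalDictionaryBuilder"  # assumed class name


def global_dictionary_entries(cardinalities: dict,
                              max_cardinality: int = DEFAULT_MAX_CARDINALITY) -> list:
    """Return cube-JSON 'dictionaries' entries for columns above the normal limit."""
    return [
        {"column": column, "builder": GLOBAL_DICT_BUILDER}
        for column, cardinality in sorted(cardinalities.items())
        if cardinality > max_cardinality
    ]


if __name__ == "__main__":
    observed = {"ROWKEY": 5_359_970, "SELLER_ID": 120_000}
    print(json.dumps({"dictionaries": global_dictionary_entries(observed)}, indent=2))
```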

2016-12-08 7:57 GMT+01:00 wang...@snqu.com :

> I upgraded from 1.5.4.1 to 1.6.0, modified KYLIN_HOME,
> and changed "kylin.dictionary.max.cardinality=500" to
> "kylin.dictionary.max.cardinality=3000" in kylin.properties.
> Then I started Kylin 1.6 --> created a model --> created a cube --> built the cube,
> and got the following error message:
>
> java.lang.RuntimeException: Failed to create dictionary on DEFAULT.TEST_500W_TBL.ROWKEY
> at org.apache.kylin.dict.DictionaryManager.buildDictionary(DictionaryManager.java:325)
> at org.apache.kylin.cube.CubeManager.buildDictionary(CubeManager.java:222)
> at org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:50)
> at org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:41)
> at org.apache.kylin.engine.mr.steps.CreateDictionaryJob.run(CreateDictionaryJob.java:54)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
> at org.apache.kylin.engine.mr.common.HadoopShellExecutable.doWork(HadoopShellExecutable.java:63)
> at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:113)
> at org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:57)
> at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:113)
> at org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:136)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.IllegalArgumentException: Too high cardinality is not suitable for dictionary -- cardinality: 5359970
> at org.apache.kylin.dict.DictionaryGenerator.buildDictionary(DictionaryGenerator.java:96)
>
>
>
>

Re: A question about establishing Kylin

2016-12-06 Thread Alberto Ramón

Hi,

(I have never tested Kylin with HDP 2.1.)
With HDP 2.2 (HBase 0.98.4.2.2.0) it works OK:
[image: Inline image 1]
[image: Inline image 2]

Perhaps you can try upgrading from HDP 2.1 to HDP 2.2.


2016-12-06 3:05 GMT+01:00 rockteen :

> Hi,
>
> Maybe I've just found the reason: the Hadoop I was working on is HDP 2.1, so it
> might not be compatible with the latest versions of Kylin. From reading the
> release notes I noticed that in v1.5.0 Kylin had "update hdp version in test
> cases to 2.2.4"; does that mean earlier versions such as v1.3.0 will
> still support HDP 2.1?
>
> Regards
>
> --
> View this message in context: http://apache-kylin.74782.x6.nabble.com/A-question-about-establishing-Kylin-tp6499p6503.html
> Sent from the Apache Kylin mailing list archive at Nabble.com.
>

Re: select * clause still case all regionserver crash

2016-12-04 Thread Alberto Ramón

About KYLIN-1936: what does this change do?
   (Does it take GROUP BY into account? Does it take different regions into account? What is the
new limit?)


Thanks

2016-12-04 9:39 GMT+01:00 roger shi :

> The limit push-down issue is completely resolved in JIRA KYLIN-1936, which
> was applied in 1.5.4. So please try Kylin 1.5.4.
>
> On 02/12/2016, 5:51 PM, "ShaoFeng Shi"  wrote:
>
> I remember Hongbin made further optimizations to the limit push-down after
> 1.5.3; @hongbin, can you confirm this?
>
> 2016-12-02 17:02 GMT+08:00 alaleiwang :
>
> > I can see each regionserver scan over 2000+
> >
> > The query "select *" is issued from Tableau by non-technical
> > analysts, so we can't force them to avoid such a clause or to do something
> > in Tableau...
> >
> > How can "limit" work for "select *" without "where"?
> >
> > --
> > View this message in context: http://apache-kylin.74782.x6.nabble.com/select-clause-still-cause-all-regionserver-crash-tp6474p6481.html
> > Sent from the Apache Kylin mailing list archive at Nabble.com.
> >
>
>
>
> --
> Best regards,
>
> Shaofeng Shi 史少锋
>
>
>
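To illustrate how a LIMIT can be honored for a `select *` without a `where` clause: with no aggregation, each region can stop after returning `limit` rows, and the query server only has to concatenate and truncate. The Python sketch below is a conceptual model of that idea, not Kylin's implementation of KYLIN-1936; the "regions" are plain Python lists wrapped in a hypothetical `CountingRegion` class that counts rows read.

```python
from itertools import chain, islice


class CountingRegion:
    """A stand-in for one HBase region: iterates its rows and counts how many were read."""

    def __init__(self, rows):
        self._rows = rows
        self.scanned = 0

    def __iter__(self):
        for row in self._rows:
            self.scanned += 1
            yield row


def select_with_limit(regions, limit):
    """LIMIT pushed down: each region stops after `limit` rows (as if scanned in parallel),
    then the query server concatenates the partial results and truncates them."""
    partials = [list(islice(region, limit)) for region in regions]
    return list(islice(chain.from_iterable(partials), limit))


def select_without_pushdown(regions, limit):
    """No push-down: every row of every region is read before truncating."""
    rows = [row for region in regions for row in region]
    return rows[:limit]


if __name__ == "__main__":
    regions = [CountingRegion([(r, i) for i in range(1_000)]) for r in range(4)]
    result = select_with_limit(regions, limit=50)
    print(len(result), [region.scanned for region in regions])
```

Rows read are bounded by limit × number of regions (200 here) instead of the table size (4,000). A GROUP BY breaks this shortcut, because no region can know which of its rows belong to the first `limit` aggregated results until every row has been read.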

Re: select * clause still case all regionserver crash

2016-12-02 Thread Alberto Ramón

:) We have a similar problem with non-IT users:
"Why do you want a result table of 1 million rows?"

I don't know any solution.
The question is: is this a problem/bug/issue in Kylin? I think NO; the purpose of
this is to protect the coprocessor (it is mandatory).

Select * is not adequate for Kylin.
Impala, Hive, Cassandra, Presto, ... are the best solution for this: viewing
raw data.



2016-12-02 10:02 GMT+01:00 alaleiwang :

> I can see each regionserver scan over 2000+
>
> The query "select *" is issued from Tableau by non-technical
> analysts, so we can't force them to avoid such a clause or to do something
> in Tableau...
>
> How can "limit" work for "select *" without "where"?
>
> --
> View this message in context: http://apache-kylin.74782.x6.nabble.com/select-clause-still-cause-all-regionserver-crash-tp6474p6481.html
> Sent from the Apache Kylin mailing list archive at Nabble.com.
>


