Unsubscribe

2023-05-23 Thread L
Unsubscribe

kylin: loading data from Oracle over JDBC (message garbled)

2019-11-21 Thread (sender garbled)
Can Kylin load data from an Oracle database over JDBC, build a cube, and
then query the cube with SQL? (Only the keywords kylin, oracle, JDBC, load,
build cube, and sql survive an encoding error in the archived message.)

(no subject)

2018-08-08 Thread L
Please remove my email address from the receive list. The automatic removal
email does not seem to work. Thank you.



where is the source code for kylin-2.0.0-hbase1x

2017-05-03 Thread xl l
 hi, all:
On http://kylin.apache.org/download/, the source code for
apache-kylin-2.0.0-bin-hbase098.tar.gz is at
https://github.com/apache/kylin/ under the tag kylin-2.0.0-hbase0.98.
But I can't find a tag for apache-kylin-2.0.0-bin-hbase1x on GitHub.

So where is the source code for apache-kylin-2.0.0-hbase1x?

Which HBase version does the tag kylin-2.0.0 correspond to?



-- 
Best Wishes


Re: kylin sql query with Weird error

2017-02-16 Thread xl l
hi, Billy Liu,
  thanks. I ran the same SQL against the sample cube, and the query fails
with the same error.

sql :
select
META_CATEG_NAME,
sum(price) as total_selled,
count(distinct seller_id) as sellers
from kylin_sales
inner join KYLIN_CATEGORY_GROUPINGS
on KYLIN_CATEGORY_GROUPINGS.LEAF_CATEG_ID=KYLIN_SALES.LEAF_CATEG_ID
and KYLIN_SALES.LSTG_SITE_ID = KYLIN_CATEGORY_GROUPINGS.SITE_ID
where part_dt>='2012-01-01'
and part_dt<='2013-01-01'
and KYLIN_CATEGORY_GROUPINGS.META_CATEG_NAME in ('Toys & Hobbies','Cameras
& Photo')
group by part_dt,KYLIN_CATEGORY_GROUPINGS.META_CATEG_NAME


A detailed description is here:
http://note.youdao.com/noteshare?id=df34c64dcf3cf801a9c085be0c3f5f21=7BB3043221BA44E4BAF5760339280480




2017-02-16 23:27 GMT+08:00 Billy Liu <billy...@apache.org>:

> Could you reproduce this issue on the sample cube? That would help the dev
> team to identify the root cause quickly.
>
> 2017-02-16 20:30 GMT+08:00 xl l <yours...@gmail.com>:
>
> > HI, I am sure hbase is ok.
> > Moreover, only this SQL throws the exception, and it reproduces
> > consistently. A slight change to the SQL makes it run fine.
> >
> > From the exception log, the first exception thrown is
> > Caused by: java.lang.NullPointerException
> > at com.google.common.base.Preconditions.checkNotNull(
> > Preconditions.java:191)
> > at
> > org.apache.kylin.storage.hbase.cube.v2.HBaseReadonlyStore$1$1.next(
> > HBaseReadonlyStore.java:131)
> >
> > which corresponds to this code:
> >
> > Pair<byte[], byte[]> hbaseColumn = hbaseColumns.get(i);
> > Cell cell = findCell(oneRow, hbaseColumn.getFirst(),
> > hbaseColumn.getSecond());
> > Preconditions.checkNotNull(cell);
> >
> > When can cell be null?
> >
> >
> >
> >
> >
> > 2017-02-16 17:47 GMT+08:00 ShaoFeng Shi <shaofeng...@apache.org>:
> >
> > > "java.net.SocketTimeoutException", did you check HBase's healthy
> status?
> > > Are all regions of the table "KYLIN_F2H6NPOLR7" online?
> > >
> > > 2017-02-16 16:10 GMT+08:00 xl l <yours...@gmail.com>:
> > >
> > > > see  :
> > > >
> > > > http://note.youdao.com/noteshare?id=df34c64dcf3cf801a9c085be0c3f5f
> > > 21=
> > > > 7BB3043221BA44E4BAF5760339280480
> > > >
> > > >
> > > > Kylin 1.6 issue notes:
> > > > Version: apache-kylin-1.6.0-hbase1.x-bin
> > > >
> > > >
> > > > select
> > > > cast(SUM(pv) as double) as pv,
> > > > cast( count(distinct user_id) as double) as user_id
> > > > from olap.olap_log_accs_page_di
> > > > inner join DIM.DIM_LOG_USER_LOCATION on
> > > > DIM.DIM_LOG_USER_LOCATION.user_city_code=olap.olap_log_
> > > > accs_page_di.location
> > > >
> > > > inner join DIM.DIM_PUBLIC_DATE_INFO on
> > > > DIM.DIM_PUBLIC_DATE_INFO."DATE"=olap.olap_log_accs_page_di."DATE"
> > > > where
> > > > DIM.DIM_PUBLIC_DATE_INFO."DATE" >=20170117
> > > > and DIM.DIM_PUBLIC_DATE_INFO."DATE" <=20170215
> > > > and DIM.DIM_LOG_USER_LOCATION.user_region_name in ('华东')
> > > > group by
> > > > DIM.DIM_LOG_USER_LOCATION.user_country_name,DIM.DIM_LOG_
> > > > USER_LOCATION.user_province_name,DIM.DIM_LOG_USER_
> > > > LOCATION.user_region_name
> > > >
> > > > order by DIM.DIM_LOG_USER_LOCATION.user_province_name ASC
> > > >
> > > > The SQL above runs OK and gives the expected result.
> > > > But if in ('华东') is merely changed to in ('华东','华南'), the SQL fails.
> > > >
> > > >
> > > > The error message is shown below:
> > > >
> > > >
> > > > The detailed exception from the kylin.log file:
> > > > http://note.youdao.com/noteshare?id=a1c257599774c4bccb0c6763923359
> > > d5=
> > > > 11C6AA36AC894EDD9006DDAE17B16747
> > > > 2017-02-16 15:32:51,080 WARN [kylin-coproc--pool3-t5578]
> > > > ipc.CoprocessorRpcChannel:58 : Call failed on IOException
> > > > java.net.SocketTimeoutException: callTimeout=6,
> > callDuration=114625:
> > > > row ' ' on table 'KYLIN_F2H6NPOLR7' at
> > > > region=KYLIN_F2H6NPOLR7,,1487168742102.
> 433e266be82448c5380610e9e77046
> > > 58.,
> > > > hostname=jx-db-hbase03.22lll.com,16020,1480406440673, seqNum=2
> > > > at
> > > > org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(
> > > > RpcRetryingCaller.java:159)
> > > > at
> > > > org.apache.hadoop.hbase.ipc.RegionCoprocessorRpcChannel.
> > callExecService(
> > > > RegionCoprocessorRpcChannel.java:95)
> > > > at
> > > > org.

Re: kylin sql query with Weird error

2017-02-16 Thread xl l
Hi, I am sure HBase is OK.
Moreover, only this SQL throws the exception, and it reproduces
consistently. A slight change to the SQL makes it run fine.

From the exception log, the first exception thrown is
Caused by: java.lang.NullPointerException
at com.google.common.base.Preconditions.checkNotNull(Preconditions.java:191)
at
org.apache.kylin.storage.hbase.cube.v2.HBaseReadonlyStore$1$1.next(HBaseReadonlyStore.java:131)

which corresponds to this code:

Pair<byte[], byte[]> hbaseColumn = hbaseColumns.get(i);
Cell cell = findCell(oneRow, hbaseColumn.getFirst(), hbaseColumn.getSecond());
Preconditions.checkNotNull(cell);

When can cell be null?
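The quoted stack trace matches a common pattern: a lookup that may return null, followed by an unconditional null-check. The sketch below reproduces that failure mode only in miniature; it uses java.util.Objects.requireNonNull in place of Guava's Preconditions.checkNotNull, and a hypothetical map-backed "row" instead of HBase Cell objects, so it is an illustration of the pattern, not Kylin's actual code:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

public class NullCellSketch {
    // Hypothetical stand-in for scanning a row's cells by family:qualifier.
    static String findCell(Map<String, String> oneRow, String family, String qualifier) {
        return oneRow.get(family + ":" + qualifier); // null when the cell is absent
    }

    public static void main(String[] args) {
        Map<String, String> oneRow = new HashMap<>();
        oneRow.put("F1:M", "metric-bytes"); // the row only has column F1:M

        // Present column: the null-check passes.
        String cell = Objects.requireNonNull(findCell(oneRow, "F1", "M"));
        System.out.println("found: " + cell);

        // Absent column: findCell returns null, so the null-check throws,
        // mirroring Preconditions.checkNotNull(cell) on a row that lacks
        // a cell the reader expects.
        try {
            Objects.requireNonNull(findCell(oneRow, "F2", "M"));
        } catch (NullPointerException e) {
            System.out.println("NPE for missing cell F2:M");
        }
    }
}
```

In other words, the NPE fires exactly when the scanned row is missing one of the expected columns, which is why the question "when can cell be null?" points at the row contents rather than at the check itself.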





2017-02-16 17:47 GMT+08:00 ShaoFeng Shi <shaofeng...@apache.org>:

> "java.net.SocketTimeoutException", did you check HBase's healthy status?
> Are all regions of the table "KYLIN_F2H6NPOLR7" online?
>
> 2017-02-16 16:10 GMT+08:00 xl l <yours...@gmail.com>:
>
> > see  :
> >
> > http://note.youdao.com/noteshare?id=df34c64dcf3cf801a9c085be0c3f5f
> 21=
> > 7BB3043221BA44E4BAF5760339280480
> >
> >
> > Kylin 1.6 issue notes:
> > Version: apache-kylin-1.6.0-hbase1.x-bin
> >
> >
> > select
> > cast(SUM(pv) as double) as pv,
> > cast( count(distinct user_id) as double) as user_id
> > from olap.olap_log_accs_page_di
> > inner join DIM.DIM_LOG_USER_LOCATION on
> > DIM.DIM_LOG_USER_LOCATION.user_city_code=olap.olap_log_
> > accs_page_di.location
> >
> > inner join DIM.DIM_PUBLIC_DATE_INFO on
> > DIM.DIM_PUBLIC_DATE_INFO."DATE"=olap.olap_log_accs_page_di."DATE"
> > where
> > DIM.DIM_PUBLIC_DATE_INFO."DATE" >=20170117
> > and DIM.DIM_PUBLIC_DATE_INFO."DATE" <=20170215
> > and DIM.DIM_LOG_USER_LOCATION.user_region_name in ('华东')
> > group by
> > DIM.DIM_LOG_USER_LOCATION.user_country_name,DIM.DIM_LOG_
> > USER_LOCATION.user_province_name,DIM.DIM_LOG_USER_
> > LOCATION.user_region_name
> >
> > order by DIM.DIM_LOG_USER_LOCATION.user_province_name ASC
> >
> > The SQL above runs OK and gives the expected result.
> > But if in ('华东') is merely changed to in ('华东','华南'), the SQL fails.
> >
> >
> > The error message is shown below:
> >
> >
> > The detailed exception from the kylin.log file:
> > http://note.youdao.com/noteshare?id=a1c257599774c4bccb0c6763923359
> d5=
> > 11C6AA36AC894EDD9006DDAE17B16747
> > 2017-02-16 15:32:51,080 WARN [kylin-coproc--pool3-t5578]
> > ipc.CoprocessorRpcChannel:58 : Call failed on IOException
> > java.net.SocketTimeoutException: callTimeout=6, callDuration=114625:
> > row ' ' on table 'KYLIN_F2H6NPOLR7' at
> > region=KYLIN_F2H6NPOLR7,,1487168742102.433e266be82448c5380610e9e77046
> 58.,
> > hostname=jx-db-hbase03.22lll.com,16020,1480406440673, seqNum=2
> > at
> > org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(
> > RpcRetryingCaller.java:159)
> > at
> > org.apache.hadoop.hbase.ipc.RegionCoprocessorRpcChannel.callExecService(
> > RegionCoprocessorRpcChannel.java:95)
> > at
> > org.apache.hadoop.hbase.ipc.CoprocessorRpcChannel.callMethod(
> > CoprocessorRpcChannel.java:56)
> > at
> > org.apache.kylin.storage.hbase.cube.v2.coprocessor.endpoint.generated.
> > CubeVisitProtos$CubeVisitService$Stub.visitCube(CubeVisitProtos.
> java:4178)
> >
> > Caused by: java.lang.NullPointerException
> > at com.google.common.base.Preconditions.checkNotNull(
> > Preconditions.java:191)
> > at
> > org.apache.kylin.storage.hbase.cube.v2.HBaseReadonlyStore$1$1.next(
> > HBaseReadonlyStore.java:131)
> >
> >
> >
> > Based on the kylin.log exception, my assessment points to
> > HBaseReadonlyStore in the Kylin source.
> >
> > Please help investigate; also, under what circumstances can cell be null
> > in this code?
> >
> >
> >
> >
> > Additional behavior of this SQL:
> > After changing in ('华东') to in ('华东','华南'), if the select includes
> > only one of the two measures sum and count(distinct) instead of both,
> > no error occurs.
> >
> > Note: this odd behavior also exists in Kylin 1.5.4.1.
> >
> >
> > The cube_desc details are attached:
> >
> > { "uuid": "f30d538b-5345-4f77-b8e3-b20ebae8cb8e", "last_modified":
> > 1487147245924, "version": "1.6.0", "name":
> > "olap_log_accs_page_di_cube_0215", "model_name":
> > "olap_log_accs_page_di_cube_0215", "description":
> > "olap_log_accs_page_di_cube_0215", "null_string": null, "dimensions": [
> {
> > "name": "YEAR", "table": "DIM.DIM_PUBLIC_DATE_INFO", "column": "YEAR",
> > "derived": null }, { "name": "QUARTER", "table":
> > "DIM.DIM_PUBLIC_DATE_INFO", "column": "QUARTER_CN", "derived": null }, {
> > "name": "MONTH", "table": "DIM.DIM_PU

kylin sql query with Weird error

2017-02-16 Thread xl l
See:

http://note.youdao.com/noteshare?id=df34c64dcf3cf801a9c085be0c3f5f21=7BB3043221BA44E4BAF5760339280480


Kylin 1.6 issue notes:
Version: apache-kylin-1.6.0-hbase1.x-bin


select
cast(SUM(pv) as double) as pv,
cast( count(distinct user_id) as double) as user_id
from olap.olap_log_accs_page_di
inner join DIM.DIM_LOG_USER_LOCATION on
DIM.DIM_LOG_USER_LOCATION.user_city_code=olap.olap_log_accs_page_di.location

inner join DIM.DIM_PUBLIC_DATE_INFO on
DIM.DIM_PUBLIC_DATE_INFO."DATE"=olap.olap_log_accs_page_di."DATE"
where
DIM.DIM_PUBLIC_DATE_INFO."DATE" >=20170117
and DIM.DIM_PUBLIC_DATE_INFO."DATE" <=20170215
and DIM.DIM_LOG_USER_LOCATION.user_region_name in ('华东')
group by
DIM.DIM_LOG_USER_LOCATION.user_country_name,DIM.DIM_LOG_USER_LOCATION.user_province_name,DIM.DIM_LOG_USER_LOCATION.user_region_name

order by DIM.DIM_LOG_USER_LOCATION.user_province_name ASC

The SQL above runs OK and gives the expected result.
But if in ('华东') is merely changed to in ('华东','华南'), the SQL fails with
an error.


The error message is shown below:


The detailed exception from the kylin.log file:
http://note.youdao.com/noteshare?id=a1c257599774c4bccb0c6763923359d5=11C6AA36AC894EDD9006DDAE17B16747
2017-02-16 15:32:51,080 WARN [kylin-coproc--pool3-t5578]
ipc.CoprocessorRpcChannel:58 : Call failed on IOException
java.net.SocketTimeoutException: callTimeout=6, callDuration=114625:
row ' ' on table 'KYLIN_F2H6NPOLR7' at
region=KYLIN_F2H6NPOLR7,,1487168742102.433e266be82448c5380610e9e7704658.,
hostname=jx-db-hbase03.22lll.com,16020,1480406440673, seqNum=2
at
org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:159)
at
org.apache.hadoop.hbase.ipc.RegionCoprocessorRpcChannel.callExecService(RegionCoprocessorRpcChannel.java:95)
at
org.apache.hadoop.hbase.ipc.CoprocessorRpcChannel.callMethod(CoprocessorRpcChannel.java:56)
at
org.apache.kylin.storage.hbase.cube.v2.coprocessor.endpoint.generated.CubeVisitProtos$CubeVisitService$Stub.visitCube(CubeVisitProtos.java:4178)

Caused by: java.lang.NullPointerException
at com.google.common.base.Preconditions.checkNotNull(Preconditions.java:191)
at
org.apache.kylin.storage.hbase.cube.v2.HBaseReadonlyStore$1$1.next(HBaseReadonlyStore.java:131)



Based on the kylin.log exception, my assessment points to HBaseReadonlyStore
in the Kylin source.

Please help investigate; also, under what circumstances can cell be null in
this code?




Additional behavior of this SQL:
After changing in ('华东') to in ('华东','华南'), if the select includes only
one of the two measures sum and count(distinct) instead of both, no error
occurs.

Note: this odd behavior also exists in Kylin 1.5.4.1.


The cube_desc details are attached:

{ "uuid": "f30d538b-5345-4f77-b8e3-b20ebae8cb8e", "last_modified":
1487147245924, "version": "1.6.0", "name":
"olap_log_accs_page_di_cube_0215", "model_name":
"olap_log_accs_page_di_cube_0215", "description":
"olap_log_accs_page_di_cube_0215", "null_string": null, "dimensions": [ {
"name": "YEAR", "table": "DIM.DIM_PUBLIC_DATE_INFO", "column": "YEAR",
"derived": null }, { "name": "QUARTER", "table":
"DIM.DIM_PUBLIC_DATE_INFO", "column": "QUARTER_CN", "derived": null }, {
"name": "MONTH", "table": "DIM.DIM_PUBLIC_DATE_INFO", "column": "MONTH_CN",
"derived": null }, { "name": "DATE", "table": "DIM.DIM_PUBLIC_DATE_INFO",
"column": "DATE", "derived": null }, { "name": "PROVINCE", "table":
"DIM.DIM_PUBLIC_CITY_INFO", "column": "PROVINCE_NAME", "derived": null }, {
"name": "CITY", "table": "DIM.DIM_PUBLIC_CITY_INFO", "column": "CITY_NAME",
"derived": null }, { "name": "USER_COUNTRY", "table":
"DIM.DIM_LOG_USER_LOCATION", "column": "USER_COUNTRY_NAME", "derived": null
}, { "name": "USER_REGION", "table": "DIM.DIM_LOG_USER_LOCATION", "column":
"USER_REGION_NAME", "derived": null }, { "name": "USER_PROVINCE", "table":
"DIM.DIM_LOG_USER_LOCATION", "column": "USER_PROVINCE_NAME", "derived":
null }, { "name": "USER_CITY", "table": "DIM.DIM_LOG_USER_LOCATION",
"column": "USER_CITY_NAME", "derived": null }, { "name": "USER_TYPE",
"table": "DIM.DIM_LOG_USER_TYPE", "column": "TYPE_NAME", "derived": null },
{ "name": "IS_LOGIN", "table": "DIM.DIM_LOG_IS_LOGIN", "column":
"LOGIN_NAME", "derived": null }, { "name": "IS_REGISTER", "table":
"DIM.DIM_LOG_IS_REGISTER", "column": "REGISTER_NAME", "derived": null }, {
"name": "BROWSER", "table": "DIM.DIM_LOG_BROWSER", "column":
"BROWSER_NAME", "derived": null }, { "name": "APP_VERSION", "table":
"DIM.DIM_LOG_APP_VERSION", "column": "VERSION_NAME", "derived": null } ],
"measures": [ { "name": "_COUNT_", "function": { "expression": "COUNT",
"parameter": { "type": "constant", "value": "1", "next_parameter": null },
"returntype": "bigint" }, "dependent_measure_ref": null }, { "name": "PV",
"function": { "expression": "SUM", "parameter": { "type": "column",
"value": "PV", "next_parameter": null }, "returntype": "bigint" },
"dependent_measure_ref": null }, { "name": "UV", "function": {
"expression": "COUNT_DISTINCT", "parameter": { "type": "column", "value":
"USER_ID", "next_parameter": null }, "returntype": "bitmap" },
"dependent_measure_ref": null }, { "name": "OUT_SESSION", "function": {
"expression": "SUM", "parameter": { "type": "column", "value":
"IS_OUT_SESSION", "next_parameter": null }, "returntype": "bigint" },
"dependent_measure_ref": null } ], 

[jira] [Created] (KYLIN-1844) High cardinality dimensions in memory

2016-07-01 Thread Abhilash L L (JIRA)
Abhilash L L created KYLIN-1844:
---

 Summary: High cardinality dimensions in memory
 Key: KYLIN-1844
 URL: https://issues.apache.org/jira/browse/KYLIN-1844
 Project: Kylin
  Issue Type: Improvement
  Components: Query Engine
Affects Versions: v1.5.2, v1.2
Reporter: Abhilash L L
Assignee: liyang


A whole dimension is kept in memory.

We should have a way to keep only a certain number, or total size, of rows
in memory. An LRU cache for the rows of a dimension would help keep memory
in check.

Alternatively, why not store all dimension data in HBase in a separate
table keyed by a dimension-id prefix, and map every dimension lookup (a get
by dimension key) to HBase?

This does mean a miss will cost more time.
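The LRU cache proposed above can be sketched with an access-ordered LinkedHashMap. This is only an illustration of the suggested eviction policy, not Kylin's actual dictionary structures; the capacity bound and the key/value types are made up:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// A bounded, access-ordered cache: the least recently used row is evicted
// once the entry count exceeds maxRows.
public class DimensionRowCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxRows;

    public DimensionRowCache(int maxRows) {
        super(16, 0.75f, true); // accessOrder = true gives LRU iteration order
        this.maxRows = maxRows;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxRows; // evict the eldest entry when over capacity
    }

    public static void main(String[] args) {
        DimensionRowCache<Integer, String> cache = new DimensionRowCache<>(2);
        cache.put(1, "row-1");
        cache.put(2, "row-2");
        cache.get(1);           // touch row 1 so it becomes most recently used
        cache.put(3, "row-3");  // evicts row 2, the least recently used
        System.out.println(cache.keySet()); // [1, 3]
    }
}
```

A real implementation would likely bound on total byte size rather than row count, but the eviction mechanics are the same.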



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Questions / Clarifications for 1.5.x

2016-06-28 Thread Abhilash L L
Hello,

   We plan to upgrade from the Kylin 1.2 series to the 1.5.x series, and we
want to ask the team a few questions / clarifications before proceeding.
Since we are upgrading, please describe the behaviour of the latest 1.5.x
release.


1) Updating facts
E.g.:
Let's say I'm building a cube for JIRA tickets. The list of tickets is my
fact table, and the list of ticket statuses is my dimension table. Now, if
a ticket is updated from 'open' to 'in progress', how do I tell Kylin about
this change?

2) Multiple dimensions in memory
When we were on the 1.2 versions, all the dimensions were kept in memory,
so memory kept increasing. The in-memory dimensions were not swapped in /
out, causing Tomcat OOM issues. Is this still the case in 1.5.x?

3) Do these dimension snapshots get reloaded on a cube refresh or when data
is appended?

4) Maximum size for dimension
There was an older thread mentioning a 300 MB maximum snapshot size. Does
this limitation still hold? Do very high cardinality dimensions still build
their trees on one node?


5) Multiple query servers
When there are multiple query servers, will one query server serve only one
cube, or will all servers serve all cubes? If it is one cube per server,
how does Kylin handle a server going down?


6) Incremental build
Is there a document on how incremental builds work? We want to understand
the limitations / assumptions involved.

Regards,
Abhilash


Re: Hot swapping cube post build

2016-02-01 Thread Abhilash L L
Sure, we will try out refresh


What do you suggest in case there are changes to the schema, extra
measures, etc.?

Regards,
Abhilash

On Sat, Jan 30, 2016 at 12:28 PM, Li Yang <liy...@apache.org> wrote:

> Assuming the cube definition does not change, all you need is "refresh" an
> existing cube segment. The old cube segment will continue serving until the
> new build is complete. No down time during the whole process.
>
> Try "refresh"
>
>
>
> On Friday, January 29, 2016, hongbin ma <mahong...@apache.org> wrote:
>
> > have you ever checked out the "refresh" function for cubes?
> >
> > On Thu, Jan 28, 2016 at 7:07 PM, Abhilash L L <abhil...@infoworks.io
> > <javascript:;>> wrote:
> >
> > > We have a use case where we want to rebuild the cube with an updated
> data
> > > set without downtime on requests
> > >
> > > Lets say we have cube C1.
> > > We get some new data and we rebuild the cube.
> > > Lets call this C2 with the new data. (Assume no change to cube
> structure)
> > >
> > > When C2 is building, we want C1 to be still serving requests.
> > > Once C2 is done building, we hot swap C1 with C2
> > >
> > > This way there is no downtime on the requests (even if it is, its very
> > > less)
> > >
> > > Another problem is, since it has to use the same hive table names and
> > > schema as C1, we can recreate the tables (external) pointing to the
> data
> > > for C2
> > >
> > > We cannot use  the incremental cube data addition since as of now its
> > hard
> > > to figure out of the change set.
> > >
> > > What is the best way to achieve this ?
> > >
> > > Assumption:
> > > Since we cannot have two cubes with same name under same project, we
> need
> > > two different cubes.
> > >
> > >
> > > Regards,
> > > Abhilash
> > >
> >
> >
> >
> > --
> > Regards,
> >
> > *Bin Mahone | 马洪宾*
> > Apache Kylin: http://kylin.io
> > Github: https://github.com/binmahone
> >
>


Re: Exact distinct count support

2016-02-01 Thread Abhilash L L
Hello,

   Need clarification on one point. From what I understand, the int id for
the bitmap is per cell?

   As long as the maximum distinct count for one cell (a given value for
each dimension in the particular cuboid) does not exceed the int range, we
should be okay?
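For illustration, this is the property being asked about: a bitmap measure keeps one bitmap of int ids per cell, so the count stays exact as long as each counted value maps to a distinct non-negative int. A minimal sketch with java.util.BitSet (Kylin's bitmap return type in the cube_desc above implies a compressed bitmap in practice; this only shows the principle, and the ids are made up):

```java
import java.util.BitSet;

public class BitmapDistinctCount {
    public static void main(String[] args) {
        // One bitmap per cell, i.e. per combination of dimension values.
        BitSet cellBitmap = new BitSet();

        // Adding an id sets one bit; duplicates are deduplicated for free.
        int[] userIds = {42, 7, 42, 99, 7, 7};
        for (int id : userIds) {
            cellBitmap.set(id); // requires a non-negative int id
        }

        // cardinality() is the exact distinct count for this cell.
        System.out.println(cellBitmap.cardinality()); // 3
    }
}
```

This also shows why non-int columns (dates, strings, longs) need an int mapping first: the bitmap can only index values that fit in the non-negative int range.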



Regards,
Abhilash

On Mon, Feb 1, 2016 at 12:11 PM, Abhilash L L <abhil...@infoworks.io> wrote:

> Sorry for the delayed response.
>
> what's the cardinality of the dimension that you want to count distinct
> values?
> --> We might be coming across different types of cardinality for the
> measure. Though unsigned int capacity should cover almost all cases, there
> might be some cases we miss.
>
>
> For example, if you want to count distinct users, use the numeric
> user_id, instead of email address;
> --> We will see if we can come up with a mapping function and use that for
> distinct count
>
>
> cast Long to Int may cause precision losing
> --> I remember seeing something like that; good to know it's removed and
> will be reintroduced later, after the fix
>
>
> Regards,
> Abhilash
>
> On Fri, Jan 29, 2016 at 4:51 PM, Sarnath <stell...@gmail.com> wrote:
>
>> Yes. I was just hinting at practically faster compute using bloom filter.
>> Will need a way to handle probablistic answers
>>
>
>


Hot swapping cube post build

2016-01-28 Thread Abhilash L L
We have a use case where we want to rebuild a cube with an updated data
set, without downtime for requests.

Let's say we have cube C1.
We get some new data and rebuild the cube.
Let's call the result C2, with the new data. (Assume no change to the cube
structure.)

While C2 is building, we want C1 to keep serving requests.
Once C2 is done building, we hot swap C1 with C2.

This way there is no downtime for requests (and if there is any, it is very
short).

Another problem: since C2 has to use the same Hive table names and schema
as C1, we can recreate the (external) tables pointing to the data for C2.

We cannot use incremental cube data addition, since as of now it is hard to
figure out the change set.

What is the best way to achieve this?

Assumption:
Since we cannot have two cubes with the same name under the same project,
we need two different cubes.


Regards,
Abhilash


Re: Exact distinct count support

2016-01-28 Thread Abhilash L L
Thanks ShaoFeng Shi,

We might need this for other data types as well, such as date and string
(e.g., a distinct count of the dates of a certain activity).

So in the REST call, should the return type be bitmap instead of hllc for
int, tinyint, etc.?

And do we still send it as hllc for the other data types?

Also, one of the comments said that long is cast to int. Won't we lose data
due to truncation?

Regards,
Abhilash

On Thu, Jan 28, 2016 at 3:43 PM, ShaoFeng Shi <shaofeng...@apache.org>
wrote:

> is this matched your case?
> https://issues.apache.org/jira/browse/KYLIN-1186
>
> 2016-01-28 17:42 GMT+08:00 Abhilash L L <abhil...@infoworks.io>:
>
> > +user ml
> >
> > Regards,
> > Abhilash
> >
> > On Thu, Jan 28, 2016 at 11:32 AM, Abhilash L L <abhil...@infoworks.io>
> > wrote:
> >
> > > Hello,
> > >
> > >Is there a way to ask Kylin to get exact distinct count ?  From what
> > we
> > > understand, we can choose between hllc(10) to hllc(16)
> > >
> > >I understand that for every cuboid, you will need to go through the
> > > whole data set again, but with the new cubing algo (2.x branch) should
> be
> > > simpler to add ?
> > >
> > >If currently not present are there any plans to introduce this ?
> > >
> > > Regards,
> > > Abhilash
> > >
> >
>
>
>
> --
> Best regards,
>
> Shaofeng Shi
>


Exact distinct count support

2016-01-27 Thread Abhilash L L
Hello,

   Is there a way to ask Kylin to compute an exact distinct count? From
what we understand, we can only choose between hllc(10) and hllc(16).

   I understand that for every cuboid you would need to go through the
whole data set again, but with the new cubing algorithm (2.x branch) this
should be simpler to add?

   If it is not currently available, are there any plans to introduce it?

Regards,
Abhilash


Dropping hbase table on dropping the cube

2015-12-29 Thread Abhilash L L
Hello,

   When we purge a cube, the HBase table in which all the cuboids are
stored is not dropped.

   According to the site, this seems to be a conscious decision:
   http://kylin.apache.org/docs/howto/howto_cleanup_storage.html

   Why don't we disable and drop the HBase table right away when we purge /
delete the cube?

Regards,
Abhilash


Re: Incremental builds assumptions and clarifications

2015-12-25 Thread Abhilash L L
Thanks for the clarification Luke, Li Yang.

Please find my comments / questions inline

>Is there a document explaining the assumptions for incremental builds.
>> *Luke: I'm afraid there's no such doc yet. what's exactly "assumption"
you
>> are looking for, to know the code level implementation or how to
optimize?*
--> Not at the code / implementation level; more at the feature level, like
the point Li Yang shared about a timestamp column for differentiation.
Also, how Kylin breaks the data into segments, and how a user can rebuild
part of the segments.


>Do we allow 'updates' on a facts ?
> 1) Because of some typo the quantity came in as 100 instead of 10. What is
> the suggested approach to handle this.
>>So you want to refresh a built piece of data. And yes, that's doable.
Kylin
>>cut cube into segments by time period. You can refresh (or rebuild) a
>>segment without impacting the rests.
--> a) How does Kylin cut the data (initial / incremental) into segments?
Does one day become one segment? b) When new data comes in, does Kylin
automatically figure out which segments to rebuild? c) How do we rebuild
only part of the data / a single segment via the REST API?


>> Luke: Do you mean data model changes? Then you have to disable that
>>cube, purge data and refine it, the rebuild it.
--> No, only data changes, not model changes. For model changes, as I now
understand it, we have to rebuild the full cube.


>How to support deletes in fact / dimension ?
>
>>*  Luke: delete in fact table is fine, but in dimension should be
>>careful, properly it will require rebuild.*
--> Let's say that for a time period T1-T2 there were 100 records earlier;
now, due to a deletion, there should be only 98 for the same time period.
How do we trigger the deletion of the 2 records? Do we populate all 98
records in the fact table and then ask Kylin to rebuild T1-T2?

Regards,
Abhilash

On Fri, Dec 25, 2015 at 7:35 AM, Li Yang <liy...@apache.org> wrote:

> Em.. don't think Luke has all the questions fully answered. My additions.
>
> >Is there a document explaining the assumptions for incremental builds.
> The only assumption (or requirement) is that there is date or timestamp
> column on the fact table that distinguishes the old from the new.
>
> >Do we allow 'updates' on a facts ?
> > 1) Because of some typo the quantity came in as 100 instead of 10. What
> is
> > the suggested approach to handle this.
> So you want to refresh a built piece of data. And yes, that's doable. Kylin
> cut cube into segments by time period. You can refresh (or rebuild) a
> segment without impacting the rests.
>
> > 2) Lets say the the value for dimension 1 was d1 in the facttable. Now it
> > got updated to d2 for the same dimension. How does it 'deduct' from the
> > aggregation for d1 for all cuboids and 'accumulate' for d2 in all
> cuboids.
> >
> >How to support Slowly Changing Dimensions (SCD). Support for type 2
> and
> > type 3.
> The design is Kylin remembers data at the point it's built. So you may
> build a daily segment on T day with category set C in lookup table; then on
> T+1 day, the category lookup table is updated into C~, and with that build
> a T+1 daily segment. Now if you query the cube, it will report categories
> including both C and C~. More precisely Kylin will return C for T day
> transactions and C~ for T+1 transactions.
>
> If what you want is to reflect C~ in historic data, then earlier segments
> have to be rebuild.
>
> On Thu, Dec 24, 2015 at 10:59 PM, Luke Han <luke...@gmail.com> wrote:
>
> > Hi Abhilash,
> > Please refer to below comments inline.
> >
> > Thanks.
> >
> >
> > Best Regards!
> > -
> >
> > Luke Han
> >
> > On Thu, Dec 10, 2015 at 2:28 PM, Abhilash L L <abhil...@infoworks.io>
> > wrote:
> >
> > > Hello,
> > >
> > >Is there a document explaining the assumptions for incremental
> builds.
> > > *Luke: I'm afraid there's no such doc yet. what's exactly "assumption"
> > you
> > > are looking for, to know the code level implementation or how to
> > optimize?*
> >
> >
> >
> > >
> > >Is it purely additive ? Lets say category id is one my row key
> > > components. I had 10 products on category id 20. Now I got a new
> product
> > > for same category would it add up. Would distinct count also be fine ?
> > >
> > *  Luke:  Kylin performs very well for such case, it will add up to
> 21,
> > also for distinct count, but the result of distinct count is
> > approximately.*
> >
> > >
> > >Do we allow 'updates' 

[jira] [Created] (KYLIN-1255) curator 2.7.1 client incompatibility

2015-12-25 Thread Abhilash L L (JIRA)
Abhilash L L created KYLIN-1255:
---

 Summary: curator 2.7.1 client incompatibility
 Key: KYLIN-1255
 URL: https://issues.apache.org/jira/browse/KYLIN-1255
 Project: Kylin
  Issue Type: Bug
  Components: Environment 
Affects Versions: v1.2
Reporter: Abhilash L L
Assignee: hongbin ma
Priority: Blocker


Currently curator-framework-2.6.0.jar, curator-recipes-2.6.0.jar and
curator-client-2.7.1.jar get bundled with Kylin v1.2.

We get a method-not-found error in the LockInternals constructor, at
PathUtils.validatePath(path);

it finds the 2.7.1 client in WEB-INF/lib.

When I replaced that with curator-client-2.7.1.jar and restarted, the
scheduler was able to take a lock and start up.

Even mvn dependency:tree on the server module shows that curator-client
2.7.1 is present alongside the 2.6.0 versions of the others.

Not sure whether this has anything to do with our environment, or more with
Maven?

Marking it as a blocker, as we are not able to build anything.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KYLIN-1243) Can't get dictionary value for column

2015-12-20 Thread Abhilash L L (JIRA)
Abhilash L L created KYLIN-1243:
---

 Summary: Can't get dictionary value for column
 Key: KYLIN-1243
 URL: https://issues.apache.org/jira/browse/KYLIN-1243
 Project: Kylin
  Issue Type: Bug
  Components: Query Engine
Affects Versions: v0.7.2
Reporter: Abhilash L L
Assignee: liyang


Hello, for one of the dimensions Kylin started giving the following error,
and the response was empty.

[http-bio-7071-exec-11]:[2015-12-17
13:48:23,548][ERROR][org.apache.kylin.cube.kv.RowKeyColumnIO.readColumnString(RowKeyColumnIO.java:113)]
 - Can't get dictionary value for column DIM_FKEY (id = 23561)

After going through the RowKeyColumnIO code a bit, it seems like the
in-memory lookup table is 'corrupt'?

After restarting, it started working fine.

We are on 0.7.2.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Can't get dictionary value for column

2015-12-17 Thread Abhilash L L
Hello, for one of the dimensions Kylin started giving the following error,
and the response was empty.

[http-bio-7071-exec-11]:[2015-12-17
13:48:23,548][ERROR][org.apache.kylin.cube.kv.RowKeyColumnIO.readColumnString(RowKeyColumnIO.java:113)]
- Can't get dictionary value for column DIM_FKEY (id = 23561)

After going through the RowKeyColumnIO code a bit, it seems like the
in-memory lookup table is 'corrupt'?

After restarting, it started working fine.

We are on 0.7.2.

Regards,
Abhilash


Re: Can't get dictionary value for column

2015-12-17 Thread Abhilash L L
Should I raise a JIRA ticket for this?

Regards,
Abhilash

On Fri, Dec 18, 2015 at 12:56 AM, Abhilash L L <abhil...@infoworks.io>
wrote:

> Hello for one of the dimensions it started giving the following error and
> the response was empty.
>
> [http-bio-7071-exec-11]:[2015-12-17
> 13:48:23,548][ERROR][org.apache.kylin.cube.kv.RowKeyColumnIO.readColumnString(RowKeyColumnIO.java:113)]
> - Can't get dictionary value for column DIM_FKEY (id = 23561)
>
> After going through the RowKeyColumnIO code a bit..  Seems like the in
> memory lookup table is 'corrupt' ?
>
> After restarting, it started working fine.
>
> We are on 0.72
>
> Regards,
> Abhilash
>