[jira] [Created] (KYLIN-3663) Failed to delete project when project has more than one table

2018-11-01 Thread rongchuan.jin (JIRA)
rongchuan.jin created KYLIN-3663:


 Summary: Failed to delete project when project has more than one table
 Key: KYLIN-3663
 URL: https://issues.apache.org/jira/browse/KYLIN-3663
 Project: Kylin
  Issue Type: Bug
  Components: Metadata
Affects Versions: v2.5.0
 Environment: MacOSX,JDK1.8+
Reporter: rongchuan.jin
 Fix For: v2.6.0


When I drop a project that has more than one table, the deletion fails. (With only one table, it works well.)

The following error is reported:
{code:java}
org.apache.kylin.rest.exception.InternalErrorException: Failed to delete project. Caused by: null
 at org.apache.kylin.rest.controller.ProjectController.deleteProject(ProjectController.java:199)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:498)
 at org.springframework.web.method.support.InvocableHandlerMethod.doInvoke(InvocableHandlerMethod.java:205)
 at org.springframework.web.method.support.InvocableHandlerMethod.invokeForRequest(InvocableHandlerMethod.java:133)
 at org.springframework.web.servlet.mvc.method.annotation.ServletInvocableHandlerMethod.invokeAndHandle(ServletInvocableHandlerMethod.java:97)
 at org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.invokeHandlerMethod(RequestMappingHandlerAdapter.java:827)
 at org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.handleInternal(RequestMappingHandlerAdapter.java:738)
 at org.springframework.web.servlet.mvc.method.AbstractHandlerMethodAdapter.handle(AbstractHandlerMethodAdapter.java:85)
 at org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:967)
 at org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:901)
 at org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:970)
 at org.springframework.web.servlet.FrameworkServlet.doDelete(FrameworkServlet.java:894)
 at javax.servlet.http.HttpServlet.service(HttpServlet.java:656)
 at org.springframework.web.servlet.FrameworkServlet.service(FrameworkServlet.java:846)
 at javax.servlet.http.HttpServlet.service(HttpServlet.java:731)
 at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:303)
 at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
 at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:317)
 at org.springframework.security.web.access.intercept.FilterSecurityInterceptor.invoke(FilterSecurityInterceptor.java:127)
 at org.springframework.security.web.access.intercept.FilterSecurityInterceptor.doFilter(FilterSecurityInterceptor.java:91)
 at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:331)
 at org.springframework.security.web.access.ExceptionTranslationFilter.doFilter(ExceptionTranslationFilter.java:114)
 at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:331)
 at org.springframework.security.web.session.SessionManagementFilter.doFilter(SessionManagementFilter.java:137)
 at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:331)
 at org.springframework.security.web.authentication.AnonymousAuthenticationFilter.doFilter(AnonymousAuthenticationFilter.java:111)
 at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:331)
 at org.springframework.security.web.servletapi.SecurityContextHolderAwareRequestFilter.doFilter(SecurityContextHolderAwareRequestFilter.java:170)
 at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:331)
 at org.springframework.security.web.savedrequest.RequestCacheAwareFilter.doFilter(RequestCacheAwareFilter.java:63)
 at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:331)
 at org.springframework.security.web.authentication.www.BasicAuthenticationFilter.doFilterInternal(BasicAuthenticationFilter.java:158)
 at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:107)
 at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:331)
 at org.springframework.security.web.authentication.AbstractAuthenticationProcessingFilter.doFilter(AbstractAuthenticationProcessingFilter.java:200)
 at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:331)
 at org.springframework.security.web.authentication.logout.LogoutFilter.doFilter(

Re: [VOTE] Release apache-kylin-2.5.1 (RC1)

2018-11-01 Thread JiaTao Tao
👏👏👏
Here is my vote:

+1 (binding)

ShaoFeng Shi wrote on Friday, November 2, 2018 at 2:10 PM:

> Hi all,
>
> I have created a build for Apache Kylin 2.5.1, release candidate 1.
>
> Changes highlights:
>
> [KYLIN-3531] - Login failed with case-insensitive username
> [KYLIN-3604] - Can't build cube with spark in HBase standalone mode
> [KYLIN-3613] - Kylin with Standalone HBase Cluster could not find the main
> cluster namespace at "Create HTable" step
> [KYLIN-3634] - When the filter column has null value may cause incorrect
> query result
> [KYLIN-3635] - Percentile calculation on Spark engine is wrong
> [KYLIN-3644] - NumberFormatExcetion on null values when building cube with
> Spark
> [KYLIN-3599] - Bulk Add Measures
>
> Thanks to everyone who has contributed to this release.
> Here are the release notes:
>
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12316121&version=12344108
>
> The commit to be voted upon:
>
>
> https://github.com/apache/kylin/commit/24e2452309a450ec4ef62339b003343eabe23016
>
> Its hash is 24e2452309a450ec4ef62339b003343eabe23016.
>
> The artifacts to be voted on are located here:
> https://dist.apache.org/repos/dist/dev/kylin/apache-kylin-2.5.1-rc1/
>
> The hash of the artifact is as follows:
> apache-kylin-2.5.1-source-release.zip.sha256
> 21db5dab4d3900a49237b9083b5d270c8471d1882a5427cddf1cc74873df42f2
>
> A staged Maven repository is available for review at:
> https://repository.apache.org/content/repositories/orgapachekylin-1056/
>
> Release artifacts are signed with the following key:
> https://people.apache.org/keys/committer/shaofengshi.asc
>
> Please vote on releasing this package as Apache Kylin 2.5.1.
>
> The vote is open for the next 72 hours and passes if a majority of
> at least three +1 PPMC votes are cast.
>
> [ ] +1 Release this package as Apache Kylin 2.5.1
> [ ]  0 I don't feel strongly about it, but I'm okay with the release
> [ ] -1 Do not release this package because...
>
>
> Here is my vote:
>
> +1 (binding)
>
> --
> Best regards,
>
> Shaofeng Shi 史少锋
>


-- 


Regards!

Aron Tao


Re: Redistribute intermediate table default not by rand()

2018-11-01 Thread liuzhixin
Hi ShaoFeng Shi,

Thank you for the answer.
#
Step1: Create Intermediate Flat Hive Table
Step2: Redistribute intermediate table
#
Perhaps Kylin could, by default, insert a random column to be used for sharding in the next step.
At the same time, Kylin should support a custom shard column.

Best Wishes.

> On November 2, 2018, at 2:06 PM, ShaoFeng Shi wrote:
>
> Hi Zhixin,
>
> Kylin 2.5.1 will add some tips in the advanced step, hope that can help.
>
> liuzhixin wrote on Friday, November 2, 2018 at 2:05 PM:
> 
>> Hi Chao Long:
>> 
>> Thank you for the answer.
>> #
>> Maybe kylin should provide config for every build step
>> 
>> Best wishes.
>> 
>>> On November 2, 2018, at 1:38 PM, Chao Long wrote:
>>>
>>> Hi zhixin,
>>> Data may become incorrect if you use "distribute by rand()".
>>> https://issues.apache.org/jira/browse/KYLIN-3388
>>>
>>> -- Original message --
>>> From: "liuzhixin";
>>> Sent: Friday, November 2, 2018, 12:53 PM
>>> To: "dev";
>>> Cc: "ShaoFeng Shi";
>>> Subject: Re: Redistribute intermediate table default not by rand()
>>>
>>> Hi kylin team:
>>>
>>> Step: Redistribute intermediate table
>>> #
>>> By default, the first three dimension columns are used as the DISTRIBUTE BY keys; DISTRIBUTE BY RAND() is not used.
>>> If there are no suitable dimension columns, this default strategy makes the data distribution even more unbalanced.
>>>
>>> Best Regards!
>>>
>>>> On November 2, 2018, at 12:03 PM, liuzhixin wrote:
>>>>
>>>> Hi kylin team:
>>>>
>>>> Version: Kylin2.5-hadoop3.1 for hdp3.0
>>>> #
>>>> Step: Redistribute intermediate table
>>>> #
>>>> DISTRIBUTE BY is that:
>>>> INSERT OVERWRITE TABLE table_intermediate SELECT * FROM table_intermediate DISTRIBUTE BY Field1, Field2, Field3;
>>>> #
>>>> Not DISTRIBUTE BY RAND()
>>>> #
>>>> Is this default DISTRIBUTE BY Field1, Field2, Field3? how to DISTRIBUTE BY RAND()?
>>>>
>>>> Best wishes.
>> 
>> 
>> 
> 
> -- 
> Best regards,
> 
> Shaofeng Shi 史少锋




Re: [VOTE] Release apache-kylin-2.5.1 (RC1)

2018-11-01 Thread zhan shaoxiong
+1

On [DATE], "[NAME]" <[ADDRESS]> wrote:

Hi all,

I have created a build for Apache Kylin 2.5.1, release candidate 1.

Changes highlights:

[KYLIN-3531] - Login failed with case-insensitive username
[KYLIN-3604] - Can't build cube with spark in HBase standalone mode
[KYLIN-3613] - Kylin with Standalone HBase Cluster could not find the main
cluster namespace at "Create HTable" step
[KYLIN-3634] - When the filter column has null value may cause incorrect
query result
[KYLIN-3635] - Percentile calculation on Spark engine is wrong
[KYLIN-3644] - NumberFormatExcetion on null values when building cube with
Spark
[KYLIN-3599] - Bulk Add Measures

Thanks to everyone who has contributed to this release.
Here are the release notes:

https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12316121&version=12344108

The commit to be voted upon:


https://github.com/apache/kylin/commit/24e2452309a450ec4ef62339b003343eabe23016

Its hash is 24e2452309a450ec4ef62339b003343eabe23016.

The artifacts to be voted on are located here:
https://dist.apache.org/repos/dist/dev/kylin/apache-kylin-2.5.1-rc1/

The hash of the artifact is as follows:
apache-kylin-2.5.1-source-release.zip.sha256
21db5dab4d3900a49237b9083b5d270c8471d1882a5427cddf1cc74873df42f2

A staged Maven repository is available for review at:
https://repository.apache.org/content/repositories/orgapachekylin-1056/

Release artifacts are signed with the following key:
https://people.apache.org/keys/committer/shaofengshi.asc

Please vote on releasing this package as Apache Kylin 2.5.1.

The vote is open for the next 72 hours and passes if a majority of
at least three +1 PPMC votes are cast.

[ ] +1 Release this package as Apache Kylin 2.5.1
[ ]  0 I don't feel strongly about it, but I'm okay with the release
[ ] -1 Do not release this package because...


Here is my vote:

+1 (binding)

-- 
Best regards,

Shaofeng Shi 史少锋



Re: Re: [VOTE] Release apache-kylin-2.5.1 (RC1)

2018-11-01 Thread zhan shaoxiong
+1

On [DATE], "[NAME]" <[ADDRESS]> wrote:

+1




-- Original message --
From: "ShaoFeng Shi";
Sent: Friday, November 2, 2018, 2:09 PM
To: "dev";

Subject: [VOTE] Release apache-kylin-2.5.1 (RC1)



Hi all,

I have created a build for Apache Kylin 2.5.1, release candidate 1.

Changes highlights:

[KYLIN-3531] - Login failed with case-insensitive username
[KYLIN-3604] - Can't build cube with spark in HBase standalone mode
[KYLIN-3613] - Kylin with Standalone HBase Cluster could not find the main
cluster namespace at "Create HTable" step
[KYLIN-3634] - When the filter column has null value may cause incorrect
query result
[KYLIN-3635] - Percentile calculation on Spark engine is wrong
[KYLIN-3644] - NumberFormatExcetion on null values when building cube with
Spark
[KYLIN-3599] - Bulk Add Measures

Thanks to everyone who has contributed to this release.
Here are the release notes:

https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12316121&version=12344108

The commit to be voted upon:


https://github.com/apache/kylin/commit/24e2452309a450ec4ef62339b003343eabe23016

Its hash is 24e2452309a450ec4ef62339b003343eabe23016.

The artifacts to be voted on are located here:
https://dist.apache.org/repos/dist/dev/kylin/apache-kylin-2.5.1-rc1/

The hash of the artifact is as follows:
apache-kylin-2.5.1-source-release.zip.sha256
21db5dab4d3900a49237b9083b5d270c8471d1882a5427cddf1cc74873df42f2

A staged Maven repository is available for review at:
https://repository.apache.org/content/repositories/orgapachekylin-1056/

Release artifacts are signed with the following key:
https://people.apache.org/keys/committer/shaofengshi.asc

Please vote on releasing this package as Apache Kylin 2.5.1.

The vote is open for the next 72 hours and passes if a majority of
at least three +1 PPMC votes are cast.

[ ] +1 Release this package as Apache Kylin 2.5.1
[ ]  0 I don't feel strongly about it, but I'm okay with the release
[ ] -1 Do not release this package because...


Here is my vote:

+1 (binding)

-- 
Best regards,

Shaofeng Shi 史少锋


Re: [VOTE] Release apache-kylin-2.5.1 (RC1)

2018-11-01 Thread Chao Long
+1




-- Original message --
From: "ShaoFeng Shi";
Sent: Friday, November 2, 2018, 2:09 PM
To: "dev";

Subject: [VOTE] Release apache-kylin-2.5.1 (RC1)



Hi all,

I have created a build for Apache Kylin 2.5.1, release candidate 1.

Changes highlights:

[KYLIN-3531] - Login failed with case-insensitive username
[KYLIN-3604] - Can't build cube with spark in HBase standalone mode
[KYLIN-3613] - Kylin with Standalone HBase Cluster could not find the main
cluster namespace at "Create HTable" step
[KYLIN-3634] - When the filter column has null value may cause incorrect
query result
[KYLIN-3635] - Percentile calculation on Spark engine is wrong
[KYLIN-3644] - NumberFormatExcetion on null values when building cube with
Spark
[KYLIN-3599] - Bulk Add Measures

Thanks to everyone who has contributed to this release.
Here are the release notes:
https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12316121&version=12344108

The commit to be voted upon:

https://github.com/apache/kylin/commit/24e2452309a450ec4ef62339b003343eabe23016

Its hash is 24e2452309a450ec4ef62339b003343eabe23016.

The artifacts to be voted on are located here:
https://dist.apache.org/repos/dist/dev/kylin/apache-kylin-2.5.1-rc1/

The hash of the artifact is as follows:
apache-kylin-2.5.1-source-release.zip.sha256
21db5dab4d3900a49237b9083b5d270c8471d1882a5427cddf1cc74873df42f2

A staged Maven repository is available for review at:
https://repository.apache.org/content/repositories/orgapachekylin-1056/

Release artifacts are signed with the following key:
https://people.apache.org/keys/committer/shaofengshi.asc

Please vote on releasing this package as Apache Kylin 2.5.1.

The vote is open for the next 72 hours and passes if a majority of
at least three +1 PPMC votes are cast.

[ ] +1 Release this package as Apache Kylin 2.5.1
[ ]  0 I don't feel strongly about it, but I'm okay with the release
[ ] -1 Do not release this package because...


Here is my vote:

+1 (binding)

-- 
Best regards,

Shaofeng Shi 史少锋

[VOTE] Release apache-kylin-2.5.1 (RC1)

2018-11-01 Thread ShaoFeng Shi
Hi all,

I have created a build for Apache Kylin 2.5.1, release candidate 1.

Changes highlights:

[KYLIN-3531] - Login failed with case-insensitive username
[KYLIN-3604] - Can't build cube with spark in HBase standalone mode
[KYLIN-3613] - Kylin with Standalone HBase Cluster could not find the main
cluster namespace at "Create HTable" step
[KYLIN-3634] - When the filter column has null value may cause incorrect
query result
[KYLIN-3635] - Percentile calculation on Spark engine is wrong
[KYLIN-3644] - NumberFormatExcetion on null values when building cube with
Spark
[KYLIN-3599] - Bulk Add Measures

Thanks to everyone who has contributed to this release.
Here are the release notes:
https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12316121&version=12344108

The commit to be voted upon:

https://github.com/apache/kylin/commit/24e2452309a450ec4ef62339b003343eabe23016

Its hash is 24e2452309a450ec4ef62339b003343eabe23016.

The artifacts to be voted on are located here:
https://dist.apache.org/repos/dist/dev/kylin/apache-kylin-2.5.1-rc1/

The hash of the artifact is as follows:
apache-kylin-2.5.1-source-release.zip.sha256
21db5dab4d3900a49237b9083b5d270c8471d1882a5427cddf1cc74873df42f2
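
A voter could check a downloaded source package against the digest above with something like the following. This is only a minimal sketch: the helper names `sha256_of` and `matches_published` are made up for illustration, and it assumes the artifact has already been downloaded to a local path.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size chunks so large release archives need not fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path, published_hex):
    """Compare a local file's digest with the published one (case-insensitive)."""
    return sha256_of(path) == published_hex.strip().lower()


# Hypothetical usage, assuming the zip sits in the current directory:
# matches_published(
#     "apache-kylin-2.5.1-source-release.zip",
#     "21db5dab4d3900a49237b9083b5d270c8471d1882a5427cddf1cc74873df42f2",
# )
```

A digest match only shows that the download is intact; checking the signature against the release key is a separate step.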

A staged Maven repository is available for review at:
https://repository.apache.org/content/repositories/orgapachekylin-1056/

Release artifacts are signed with the following key:
https://people.apache.org/keys/committer/shaofengshi.asc

Please vote on releasing this package as Apache Kylin 2.5.1.

The vote is open for the next 72 hours and passes if a majority of
at least three +1 PPMC votes are cast.

[ ] +1 Release this package as Apache Kylin 2.5.1
[ ]  0 I don't feel strongly about it, but I'm okay with the release
[ ] -1 Do not release this package because...


Here is my vote:

+1 (binding)

-- 
Best regards,

Shaofeng Shi 史少锋


Re: Redistribute intermediate table default not by rand()

2018-11-01 Thread ShaoFeng Shi
Hi Zhixin,

Kylin 2.5.1 will add some tips in the advanced step; I hope that helps.

liuzhixin wrote on Friday, November 2, 2018 at 2:05 PM:

> Hi Chao Long:
>
> Thank you for the answer.
> #
> Maybe kylin should provide config for every build step
>
> Best wishes.
>
> > On November 2, 2018, at 1:38 PM, Chao Long wrote:
> >
> > Hi zhixin,
> > Data may become incorrect if you use "distribute by rand()".
> > https://issues.apache.org/jira/browse/KYLIN-3388
> >
> > -- Original message --
> > From: "liuzhixin";
> > Sent: Friday, November 2, 2018, 12:53 PM
> > To: "dev";
> > Cc: "ShaoFeng Shi";
> > Subject: Re: Redistribute intermediate table default not by rand()
> >
> > Hi kylin team:
> >
> > Step: Redistribute intermediate table
> > #
> > By default, the first three dimension columns are used as the DISTRIBUTE BY keys; DISTRIBUTE BY RAND() is not used.
> > If there are no suitable dimension columns, this default strategy makes the data distribution even more unbalanced.
> >
> > Best Regards!
> >
> >> On November 2, 2018, at 12:03 PM, liuzhixin wrote:
> >>
> >> Hi kylin team:
> >>
> >> Version: Kylin2.5-hadoop3.1 for hdp3.0
> >> #
> >> Step: Redistribute intermediate table
> >> #
> >> DISTRIBUTE BY is that:
> >> INSERT OVERWRITE TABLE table_intermediate SELECT * FROM
> table_intermediate DISTRIBUTE BY Field1, Field2, Field3;
> >> #
> >> Not DISTRIBUTE BY RAND()
> >> #
> >> Is this default DISTRIBUTE BY Field1, Field2, Field3? how to DISTRIBUTE
> BY RAND()?
> >>
> >> Best wishes.
>
>
>

-- 
Best regards,

Shaofeng Shi 史少锋


Re: Redistribute intermediate table default not by rand()

2018-11-01 Thread liuzhixin
Hi Chao Long:

Thank you for the answer.
#
Maybe Kylin should provide a configuration option for every build step.

Best wishes.

> On November 2, 2018, at 1:38 PM, Chao Long wrote:
>
> Hi zhixin,
> Data may become incorrect if you use "distribute by rand()".
> https://issues.apache.org/jira/browse/KYLIN-3388
>
> -- Original message --
> From: "liuzhixin";
> Sent: Friday, November 2, 2018, 12:53 PM
> To: "dev";
> Cc: "ShaoFeng Shi";
> Subject: Re: Redistribute intermediate table default not by rand()
>
> Hi kylin team:
>
> Step: Redistribute intermediate table
> #
> By default, the first three dimension columns are used as the DISTRIBUTE BY keys; DISTRIBUTE BY RAND() is not used.
> If there are no suitable dimension columns, this default strategy makes the data distribution even more unbalanced.
>
> Best Regards!
>
>> On November 2, 2018, at 12:03 PM, liuzhixin wrote:
>> 
>> Hi kylin team:
>> 
>> Version: Kylin2.5-hadoop3.1 for hdp3.0
>> #
>> Step: Redistribute intermediate table
>> #
>> DISTRIBUTE BY is that:
>> INSERT OVERWRITE TABLE table_intermediate SELECT * FROM table_intermediate 
>> DISTRIBUTE BY Field1, Field2, Field3;
>> #
>> Not DISTRIBUTE BY RAND()
>> #
>> Is this default DISTRIBUTE BY Field1, Field2, Field3? how to DISTRIBUTE BY 
>> RAND()?
>> 
>> Best wishes.




Re: Redistribute intermediate table default not by rand()

2018-11-01 Thread liuzhixin
Hi ShaoFeng Shi

OK, thank you for the answer.
#
Perhaps Kylin should provide tips or notes about the default sharding.

Best Wishes.

> On November 2, 2018, at 1:42 PM, ShaoFeng Shi wrote:
> 
> Please move the high cardinality dimensions to the leading position of
> rowkey, that will make the data distribution more even;
> 
> Chao Long wrote on Friday, November 2, 2018 at 1:38 PM:
>
>> Hi zhixin,
>> Data may become incorrect if you use "distribute by rand()".
>> https://issues.apache.org/jira/browse/KYLIN-3388
>>
>> -- Original message --
>> From: "liuzhixin";
>> Sent: Friday, November 2, 2018, 12:53 PM
>> To: "dev";
>> Cc: "ShaoFeng Shi";
>> Subject: Re: Redistribute intermediate table default not by rand()
>>
>> Hi kylin team:
>>
>> Step: Redistribute intermediate table
>> #
>> By default, the first three dimension columns are used as the DISTRIBUTE BY keys; DISTRIBUTE BY RAND() is not used.
>> If there are no suitable dimension columns, this default strategy makes the data distribution even more unbalanced.
>>
>> Best Regards!
>>
>>> On November 2, 2018, at 12:03 PM, liuzhixin wrote:
>>> 
>>> Hi kylin team:
>>> 
>>> Version: Kylin2.5-hadoop3.1 for hdp3.0
>>> #
>>> Step: Redistribute intermediate table
>>> #
>>> DISTRIBUTE BY is that:
>>> INSERT OVERWRITE TABLE table_intermediate SELECT * FROM
>> table_intermediate DISTRIBUTE BY Field1, Field2, Field3;
>>> #
>>> Not DISTRIBUTE BY RAND()
>>> #
>>> Is this default DISTRIBUTE BY Field1, Field2, Field3? how to DISTRIBUTE
>> BY RAND()?
>>> 
>>> Best wishes.
>>> 
> 
> 
> 
> -- 
> Best regards,
> 
> Shaofeng Shi 史少锋




Re: Redistribute intermediate table default not by rand()

2018-11-01 Thread ShaoFeng Shi
Please move the high-cardinality dimensions to the leading positions of the
rowkey; that will make the data distribution more even.
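
The effect of the leading key's cardinality can be illustrated with a small simulation. This is a rough sketch, not Kylin or Hive code: an MD5-based hash stands in for the engine's own partitioner, and the row count, reducer count, and sample keys are arbitrary assumptions.

```python
import hashlib
from collections import Counter


def reducer_for(key, num_reducers):
    """Map a distribution key to a reducer with a stable hash (stand-in partitioner)."""
    digest = hashlib.md5(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_reducers


def rows_per_reducer(keys, num_reducers):
    """Count how many rows each reducer receives for the given key column."""
    counts = Counter(reducer_for(k, num_reducers) for k in keys)
    return [counts.get(r, 0) for r in range(num_reducers)]


NUM_ROWS, NUM_REDUCERS = 10_000, 8

# Hypothetical leading dimensions: a two-valued flag versus a unique id.
low_cardinality = [i % 2 for i in range(NUM_ROWS)]
high_cardinality = list(range(NUM_ROWS))

low = rows_per_reducer(low_cardinality, NUM_REDUCERS)
high = rows_per_reducer(high_cardinality, NUM_REDUCERS)

print("low-cardinality key :", low)
print("high-cardinality key:", high)
```

With only two distinct key values, at most two reducers can receive any rows, whereas the unique key spreads rows over all of them.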

Chao Long wrote on Friday, November 2, 2018 at 1:38 PM:

> Hi zhixin,
>  Data may become incorrect if you use "distribute by rand()".
>  https://issues.apache.org/jira/browse/KYLIN-3388
>
> -- Original message --
> From: "liuzhixin";
> Sent: Friday, November 2, 2018, 12:53 PM
> To: "dev";
> Cc: "ShaoFeng Shi";
> Subject: Re: Redistribute intermediate table default not by rand()
>
> Hi kylin team:
>
> Step: Redistribute intermediate table
> #
> By default, the first three dimension columns are used as the DISTRIBUTE BY keys; DISTRIBUTE BY RAND() is not used.
> If there are no suitable dimension columns, this default strategy makes the data distribution even more unbalanced.
>
> Best Regards!
>
> > On November 2, 2018, at 12:03 PM, liuzhixin wrote:
> >
> > Hi kylin team:
> >
> > Version: Kylin2.5-hadoop3.1 for hdp3.0
> > #
> > Step: Redistribute intermediate table
> > #
> > DISTRIBUTE BY is that:
> > INSERT OVERWRITE TABLE table_intermediate SELECT * FROM
> table_intermediate DISTRIBUTE BY Field1, Field2, Field3;
> > #
> > Not DISTRIBUTE BY RAND()
> > #
> > Is this default DISTRIBUTE BY Field1, Field2, Field3? how to DISTRIBUTE
> BY RAND()?
> >
> > Best wishes.
> >



-- 
Best regards,

Shaofeng Shi 史少锋


Re: Redistribute intermediate table default not by rand()

2018-11-01 Thread Chao Long
Hi zhixin,
 Data may become incorrect if you use "distribute by rand()".
 https://issues.apache.org/jira/browse/KYLIN-3388
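
One way a random distribution key can corrupt data is task re-execution: if a map task is retried, `rand()` may route the same row to a different reducer than it did the first time. The sketch below simulates that scenario under the assumption that this is the failure mode meant here; the function names, seeds, and row counts are illustrative only.

```python
import random


def distribute(rows, num_reducers, pick_reducer):
    """Split rows into per-reducer buckets using the given key function."""
    buckets = [[] for _ in range(num_reducers)]
    for row in rows:
        buckets[pick_reducer(row)].append(row)
    return buckets


def outcome(first_attempt, retry_attempt, all_rows):
    """Reducer 0 read the first attempt's output; reducer 1 read the retry's output."""
    received = first_attempt[0] + retry_attempt[1]
    duplicated = len(received) - len(set(received))
    lost = len(set(all_rows) - set(received))
    return duplicated, lost


ROWS = list(range(1000))

# Random key: the retried task draws different random numbers.
rng_first, rng_retry = random.Random(1), random.Random(2)
first = distribute(ROWS, 2, lambda row: rng_first.randrange(2))
retry = distribute(ROWS, 2, lambda row: rng_retry.randrange(2))
rand_dup, rand_lost = outcome(first, retry, ROWS)

# Deterministic key: both attempts route every row identically.
first = distribute(ROWS, 2, lambda row: row % 2)
retry = distribute(ROWS, 2, lambda row: row % 2)
det_dup, det_lost = outcome(first, retry, ROWS)

print("rand() key      : duplicated =", rand_dup, "lost =", rand_lost)
print("column-based key: duplicated =", det_dup, "lost =", det_lost)
```

A key derived from column values is stable across retries, which is why it avoids this problem even when it distributes less evenly.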




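-- Original message --
From: "liuzhixin";
Sent: Friday, November 2, 2018, 12:53 PM
To: "dev";
Cc: "ShaoFeng Shi";
Subject: Re: Redistribute intermediate table default not by rand()



Hi kylin team:

Step: Redistribute intermediate table
#
By default, the first three dimension columns are used as the DISTRIBUTE BY keys; DISTRIBUTE BY RAND() is not used.
If there are no suitable dimension columns, this default strategy makes the data distribution even more unbalanced.

Best Regards!

> On November 2, 2018, at 12:03 PM, liuzhixin wrote: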
> 
> Hi kylin team:
> 
> Version: Kylin2.5-hadoop3.1 for hdp3.0
> #
> Step: Redistribute intermediate table
> #
> DISTRIBUTE BY is that:
> INSERT OVERWRITE TABLE table_intermediate SELECT * FROM table_intermediate 
> DISTRIBUTE BY Field1, Field2, Field3;
> #
> Not DISTRIBUTE BY RAND()
> #
> Is this default DISTRIBUTE BY Field1, Field2, Field3? how to DISTRIBUTE BY 
> RAND()?
> 
> Best wishes.
>

Re: Redistribute intermediate table default not by rand()

2018-11-01 Thread liuzhixin
Hi kylin team:

Step: Redistribute intermediate table
#
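By default, the first three dimension columns are used as the DISTRIBUTE BY keys; DISTRIBUTE BY RAND() is not used.
If there are no suitable dimension columns, this default strategy makes the data distribution even more unbalanced.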

Best Regards!

> On November 2, 2018, at 12:03 PM, liuzhixin wrote:
> 
> Hi kylin team:
> 
> Version: Kylin2.5-hadoop3.1 for hdp3.0
> #
> Step: Redistribute intermediate table
> #
> DISTRIBUTE BY is that:
> INSERT OVERWRITE TABLE table_intermediate SELECT * FROM table_intermediate 
> DISTRIBUTE BY Field1, Field2, Field3;
> #
> Not DISTRIBUTE BY RAND()
> #
> Is this default DISTRIBUTE BY Field1, Field2, Field3? how to DISTRIBUTE BY 
> RAND()?
> 
> Best wishes.
> 



Redistribute intermediate table default not by rand()

2018-11-01 Thread liuzhixin
Hi kylin team:

Version: Kylin2.5-hadoop3.1 for hdp3.0
#
Step: Redistribute intermediate table
#
The DISTRIBUTE BY statement is:
INSERT OVERWRITE TABLE table_intermediate SELECT * FROM table_intermediate DISTRIBUTE BY Field1, Field2, Field3;
#
It does not use DISTRIBUTE BY RAND().
#
Is DISTRIBUTE BY Field1, Field2, Field3 the default? How can I use DISTRIBUTE BY RAND() instead?

Best wishes.



[jira] [Created] (KYLIN-3662) exception message "Cannot find project '%s'." should be formated

2018-11-01 Thread Lingang Deng (JIRA)
Lingang Deng created KYLIN-3662:
---

 Summary: exception message "Cannot find project '%s'." should be formated
 Key: KYLIN-3662
 URL: https://issues.apache.org/jira/browse/KYLIN-3662
 Project: Kylin
  Issue Type: Bug
Affects Versions: v2.5.0
Reporter: Lingang Deng
Assignee: Lingang Deng


When using the Kylin dashboard without a system cube, the following exception is thrown:
{code:java}
org.apache.kylin.rest.exception.BadRequestException: Cannot find project '%s'.
 at org.apache.kylin.rest.service.QueryService.doQueryWithCache(QueryService.java:378)
 at org.apache.kylin.rest.service.QueryService.doQueryWithCache(QueryService.java:359)
 at org.apache.kylin.rest.controller.DashboardController.getQueryMetrics(DashboardController.java:74)
{code}
The log is unfriendly to users.
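
The trace suggests that the message template reached the user without its placeholder being filled in. A minimal sketch of that pattern and the fix, written in Python as an analogue of Java's `String.format`; the constant and function names are hypothetical and are not taken from the Kylin source:

```python
# Message template with a placeholder, as it appears in the exception text.
PROJECT_NOT_FOUND = "Cannot find project '%s'."


def unformatted_message(project):
    """Buggy pattern: the template is returned as-is, so '%s' leaks to the user."""
    return PROJECT_NOT_FOUND


def formatted_message(project):
    """Fixed pattern: the project name is substituted into the template."""
    return PROJECT_NOT_FOUND % project


print(unformatted_message("my_project"))
print(formatted_message("my_project"))
```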



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re:Re: Re: [DISCUSS] New Kylin Streaming Solution From eBay

2018-11-01 Thread Ma Gang
Hi ShaoFeng,
For streaming ingest/query performance, there is a doc:
https://drive.google.com/file/d/1GSBMpRuVQRmr8Ev2BWvssfMd-Rck9vsH/view?ths=true
It is also in the 'performance' section of the design doc attached to the JIRA:
https://issues.apache.org/jira/browse/KYLIN-3654
As for stability, it is very stable in our environment, but it is not yet
widely used at eBay, so it is hard to say.
I will start merging the code into the master branch. It may take some time
because our current version is based on Kylin 2.1.0. I hope it can be done
before Nov. 30, but I cannot guarantee it, as there is a lot of other work to do.

At 2018-11-01 15:08:12, "ShaoFeng Shi"  wrote:
>Hi Gang,
>
>Thank you for the information, that is helpful for understanding the
>overall design and implementation.
>
>Do you have some statistical information, like performance, throughput,
>stability, etc.? Besides, what's the plan of contributing it to the
>community? Thanks!
>
>
>Ma Gang wrote on Thursday, November 1, 2018 at 2:45 PM:
>
>> Thanks Xiaoxiang,
>> Very good questions! Please see my comments starting with [Gang]:
>>
>>
>> 1.  Is it possible to use Yarn as the cluster manager for the index tasks?
>> The coordinator process would set them up at specified periods.
>> [Gang] I think it is possible, but in the current design the indexing task
>> is a long-running task that can also serve queries. This makes the whole
>> system very simple and efficient, so I don't think we need to stop/start
>> the indexing task from time to time. Using Yarn to manage the resources is
>> possible, but we would need to redesign the existing coordinator to make
>> it easy to deploy to Yarn, Kubernetes, etc. I hope this can be done after
>> the contribution to the community.
>>
>> 2.  As far as I know, eBay’s New Kylin Streaming Solution uses a replica set
>> to ensure that incoming messages are not lost if some processes are lost. I
>> think a replica set is a set of Kafka consumer processes that are responsible
>> for ingesting messages and building the base cuboid in memory. Could you
>> please show me some details about how the replica set provides the HA
>> guarantee? How is it configured? A link or paper is OK. I found one, but I
>> don’t know whether it has the same meaning as your replica set.
>>
>>
>> [Gang] Yes, it is similar to MongoDB replication, but currently we don't
>> replicate data from a primary node. We just assign the same Kafka
>> topic/partitions to all receivers in a ReplicaSet, and every receiver in the
>> ReplicaSet consumes data from Kafka. If one receiver is down, the other
>> receivers in the ReplicaSet are still consuming the same Kafka data, so
>> consumption and queries are not impacted. We don't guarantee that the
>> receivers in a ReplicaSet have the same consuming rate, but we can guarantee
>> that the user sees data consistently by sticking the queries for one cube to
>> one receiver.
>> The HA implementation is a little naive, but it is simple and it works. Maybe
>> in the future we can do HA by replication, to support other streaming
>> sources that don't support multiple consumers and don't have a persistent
>> store.
>>
>> 3.  How do you add or remove a node of a replica set in a production
>> environment? How do you monitor the health/pressure of the replica set cluster?
>> [Gang] Currently we have a UI/RESTful API that lets an admin add/remove nodes
>> to/from a ReplicaSet, and a simple UI that lets an admin monitor the health
>> and consuming rate of each receiver/cube. Also, all metrics are collected with
>> the Yammer Metrics framework, so they are easy to expose to other monitoring
>> systems.
>>
>> 4.  Are all measures supported in eBay’s New Kylin Streaming Solution?
>> What about count distinct (bitmap)?
>> [Gang] Most measures are supported, but precise count distinct (bitmap) is
>> not supported when the distinct dimension is not of int type. As you know,
>> supporting precise count distinct on a non-int dimension requires building
>> a global dictionary, which is not possible in the streaming environment.
>>
>>
>> 5.  It seems eBay’s New Kylin Streaming Solution uses a custom columnar
>> storage. Why not use a mature open source columnar storage solution? Have
>> you ever compared the performance of your custom columnar storage with an
>> open source columnar storage solution?
>>
>> [Gang] Most open source columnar formats, like Parquet and ORC, are designed
>> for use in a Hadoop environment, while the streaming data is on local disk, so
>> I didn't consider them at the beginning. It is not very hard to define a
>> columnar format for storing Kylin-specific data. With a customized columnar
>> storage you can use mmap files to scan data and add row-level inverted indexes
>> for all dimensions, so I think the performance will be better than with a
>> common columnar format. I didn't compare the performance, but the storage
>> engine is pluggable, so you may contribute a Parquet storage if you are
>> interested.
>>
>>
>>
>>
>>
>>
>> At 2018-11-01 12:42:25, "Xiaoxiang Yu"  wrote:
>> >Hi Gang, I am so glad to know that eBay has a solution for real-time OLAP
>> on Kylin. I have some small questions:

Re: [DISCUSS] New Kylin Streaming Solution From eBay

2018-11-01 Thread Xiaoxiang Yu
Thank you for your reply. Maybe I can help to improve your Kylin Streaming 
Solution in the future.



Best wishes,
Xiaoxiang Yu





On [DATE], "[NAME]" <[ADDRESS]> wrote:



Thanks Xiaoxiang,

Very good questions! Please see my comments, marked with [Gang]:





1.  Is it possible to use Yarn as the cluster manager for indexing tasks?
The coordinator process would start them at a specified interval.

[Gang] I think it is possible, but in the current design the indexing task is
a long-running task that also serves queries. This keeps the whole system
simple and efficient, so I don't think we need to stop and start indexing
tasks from time to time. Using Yarn to manage resources is possible, but we
would need to redesign the existing coordinator to make it easy to deploy to
Yarn, Kubernetes, etc. I hope this can be done after the contribution to the
community.



2.  As I understand it, eBay's New Kylin Streaming Solution uses a replica
set to ensure that incoming messages are not lost if some processes fail. I
think a replica set is a set of Kafka consumer processes responsible for
ingesting messages and building the base cuboid in memory. Could you please
show me some details about how the replica set provides the HA guarantee? How
is it configured? A link or paper is OK. I found one, but I don't know whether
it means the same thing as your replica set.

[Gang] Yes, it is similar to MongoDB replication, but currently we don't
replicate data from a primary node. We just assign the same Kafka
topic/partitions to all receivers in a ReplicaSet, and every receiver in the
ReplicaSet consumes data from Kafka. If one receiver goes down, the other
receivers in the ReplicaSet are still consuming the same Kafka data, so
consumption and queries are not impacted. We don't guarantee that the
receivers in a ReplicaSet have the same consuming rate, but we can guarantee
that the user sees consistent data by sticking the queries for one cube to
one receiver.

The HA implementation is a little bit naive, but it is simple and it works.
Maybe in the future we can do HA by replication, to support other streaming
sources that don't support multiple consumers and don't have a persistent
store.
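
The behaviour described in this answer can be pictured with a small sketch. It is only an illustration under stated assumptions, not the eBay implementation: `Receiver`, `ReplicaSet`, `route` and the in-memory `log` (standing in for the shared Kafka topic/partitions) are hypothetical names. It shows receivers consuming the same data at different rates, queries for one cube sticking to one receiver, and failover when that receiver goes down.

```python
from dataclasses import dataclass, field


@dataclass
class Receiver:
    """Hypothetical stand-in for one streaming receiver process."""
    name: str
    alive: bool = True
    offset: int = 0  # how far this receiver has consumed the shared log

    def consume(self, log, max_messages):
        # Each receiver reads the same log independently, at its own pace.
        if self.alive:
            self.offset = min(len(log), self.offset + max_messages)


@dataclass
class ReplicaSet:
    """Receivers that are all assigned the same topic/partitions."""
    receivers: list
    sticky: dict = field(default_factory=dict)  # cube name -> receiver name

    def route(self, cube):
        # Reuse the receiver chosen earlier for this cube while it is alive,
        # so one cube is always answered from one receiver's view of the data.
        by_name = {r.name: r for r in self.receivers}
        current = by_name.get(self.sticky.get(cube))
        if current is None or not current.alive:
            live = [r for r in self.receivers if r.alive]
            if not live:
                raise RuntimeError("no live receiver in the replica set")
            current = live[0]
            self.sticky[cube] = current.name
        return current


log = list(range(10))  # stand-in for the messages in the Kafka partitions
a, b = Receiver("receiver-a"), Receiver("receiver-b")
replica_set = ReplicaSet([a, b])

a.consume(log, 6)  # the two receivers progress at different rates
b.consume(log, 4)
first = replica_set.route("sales_cube")   # receiver-a
again = replica_set.route("sales_cube")   # still receiver-a (sticky)

a.alive = False    # receiver-a goes down
b.consume(log, 6)  # receiver-b keeps consuming the same data
after_failover = replica_set.route("sales_cube")  # receiver-b
```

Because no data is copied between receivers, this only works when the source can be read by several consumers and keeps its messages, which matches the limitation mentioned above.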



3.  How do you add or remove a node of a replica set in a production
environment? How do you monitor the health/pressure of the replica set
cluster?

[Gang] Currently we have a UI and a RESTful API that let the admin add or
remove a node to/from a ReplicaSet, and a simple UI that lets the admin
monitor the health and consuming rate of each receiver/cube. All metrics are
collected with the Yammer Metrics framework, so they are easy to expose to
other monitoring systems.



4.  Are all measures supported in eBay's New Kylin Streaming Solution? What
about count distinct (bitmap)?

[Gang] Most measures are supported, but precise count distinct (bitmap) is
not supported when the distinct dimension is not of int type. As you know,
precise count distinct on a non-int dimension requires building a global
dictionary, which is not possible in the streaming environment.
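
A minimal sketch of why the int and non-int cases differ, assuming a plain Python integer as a stand-in for a real bitmap structure (all function names here are hypothetical). An int value can be used directly as the bit position, so bitmaps from different segments merge correctly. A string has to be encoded first, and if each segment encodes with its own local dictionary, the same bit means different values in different segments and the merged count is wrong.

```python
def bitmap_add(bitmap, value):
    """Set bit `value`; this toy version only handles non-negative ints."""
    return bitmap | (1 << value)


def bitmap_count(bitmap):
    """Number of set bits, i.e. the number of distinct values recorded."""
    return bin(bitmap).count("1")


def build_int_bitmap(values):
    # Int dimension: the value itself is the bit position.
    bitmap = 0
    for v in values:
        bitmap = bitmap_add(bitmap, v)
    return bitmap


def build_with_local_dict(values):
    # Non-int dimension encoded with a dictionary private to this segment.
    local_dict = {}
    bitmap = 0
    for v in values:
        code = local_dict.setdefault(v, len(local_dict))
        bitmap = bitmap_add(bitmap, code)
    return bitmap


# Int case: merging two segments with OR gives the exact distinct count.
seg1 = build_int_bitmap([3, 7, 7, 42])
seg2 = build_int_bitmap([7, 42, 100])
int_distinct = bitmap_count(seg1 | seg2)  # distinct values: 3, 7, 42, 100

# String case with per-segment dictionaries: the codes collide across segments.
seg_a = build_with_local_dict(["alice", "bob"])    # alice -> 0, bob -> 1
seg_b = build_with_local_dict(["carol", "alice"])  # carol -> 0, alice -> 1
wrong_distinct = bitmap_count(seg_a | seg_b)       # undercounts the 3 real users
```

A global dictionary avoids the collision by giving every value one code across all segments, which is the step that is hard to do while data is still streaming in.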





5.  It seems eBay's New Kylin Streaming Solution uses a custom columnar
storage. Why not use a mature open source columnar storage solution? Have you
ever compared the performance of your custom columnar storage with an open
source columnar storage solution?

[Gang] Most open source columnar formats, like Parquet and ORC, are designed
for use in a Hadoop environment, while the streaming data is on local disk,
so I didn't consider them at the beginning. It is not very hard to define a
columnar format to store Kylin-specific data. With a customized columnar
storage you can use mmap'd files to scan data and add a row-level inverted
index for all dimensions, so I think the performance will be better than with
a common columnar format. I didn't compare the performance, but the storage
engine is pluggable, so you may contribute a Parquet storage if you are
interested.
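
To make the two techniques named in this answer concrete, here is a small sketch, assuming a made-up file layout (one dimension column stored as little-endian 4-byte dictionary codes). It is not the actual storage format. It scans the column through a memory-mapped file and builds a row-level inverted index from each code to the row ids that contain it.

```python
import mmap
import os
import struct
import tempfile
from collections import defaultdict

# One dimension column, dictionary-encoded: CN -> 0, DE -> 1, US -> 2.
rows = ["CN", "US", "CN", "DE", "US", "CN"]
dictionary = {value: code for code, value in enumerate(sorted(set(rows)))}

# Write the column as fixed-width 4-byte codes to a local file.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    for value in rows:
        f.write(struct.pack("<i", dictionary[value]))

# Scan the column through mmap and build the inverted index: code -> row ids.
inverted = defaultdict(list)
with open(path, "rb") as f:
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        for row_id in range(len(mm) // 4):
            (code,) = struct.unpack_from("<i", mm, row_id * 4)
            inverted[code].append(row_id)
os.remove(path)

# A filter such as country = 'CN' becomes a lookup instead of a full scan.
cn_rows = inverted[dictionary["CN"]]
us_rows = inverted[dictionary["US"]]
```

Fixed-width codes mean that row `i` always starts at byte `4 * i`, so the mapped file can be read at any row without parsing what comes before it.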













At 2018-11-01 12:42:25, "Xiaoxiang Yu"  wrote:

>Hi Gang, I am so glad to know that eBay has a solution for real-time OLAP
>on Kylin. I have some small questions:
>
>1.  Is it possible to use Yarn as the cluster manager for indexing tasks?
>The coordinator process would start them at a specified interval. Yarn
>would manage:
>
>a)   retrying these tasks if some fail
>b)   resource allocation
>c)   log collection
>
>2.  As I understand it, eBay's New Kylin Streaming Solution uses a replica
>set to ensure that incoming messages are not lost if some processes fail. I
>think a replica set is a set of Kafka consumer processes responsible for
>ingesting messages and building the base cuboid in memory. Could you please
>show me some details about how the replica set provides the HA guarantee?
>How is it configured? A link or paper is OK. I found one, but I don't know
>whether it means the same thing as your replica set.
>

>a)   [Mongodb 
replication](https://docs.mongodb.com/manual/repl

Re: Re: [DISCUSS] New Kylin Streaming Solution From eBay

2018-11-01 Thread ShaoFeng Shi
Hi Gang,

Thank you for the information, that is helpful for understanding the
overall design and implementation.

Do you have some statistics, like performance, throughput, stability, etc.?
Besides, what's the plan for contributing it to the community? Thanks!


Ma Gang wrote on Thu, Nov 1, 2018 at 2:45 PM:

> At 2018-11-01 12:42:25, "Xiaoxiang Yu"  wrote:
> >Hi Gang, I am so glad to know that eBay has a solution for real-time OLAP
> on Kylin. I have some small questions:
> >
> >
> >1.  Is it possible to use Yarn as the cluster manager for indexing tasks?
> The coordinator process would start them at a specified interval. Yarn
> would manage:
> >
> >a)   retrying these tasks if some fail
> >
> >b)   resource allocation
> >
> >c)   log collection
> >
> >2.  As I understand it, eBay's New Kylin Streaming Solution uses a replica
> set to ensure that incoming messages are not lost if some processes fail. I
> think a replica set is a set of Kafka consumer processes responsible for
> ingesting messages and building the base cuboid in memory. Could you please
> show me some details about how the replica set provides the HA guarantee?
> How is it configured? A link or paper is OK. I found one, but I don't know
> if it has the same meaning
> for y