[jira] [Commented] (KYLIN-1676) High CPU in TrieDictionary due to incorrect use of HashMap

2016-05-15 Thread liyang (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-1676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15283829#comment-15283829
 ] 

liyang commented on KYLIN-1676:
---

Nice catch!  Yanghong can do the merge and save me some time.  :-)

> High CPU in TrieDictionary due to incorrect use of HashMap
> --
>
> Key: KYLIN-1676
> URL: https://issues.apache.org/jira/browse/KYLIN-1676
> Project: Kylin
>  Issue Type: Bug
>  Components: Metadata
>Affects Versions: v1.4.0
>Reporter: qianqiaoneng
>Assignee: liyang
> Attachments: fix_hashmap_concurrency_issue_1.4rc.patch, 
> fix_hashmap_concurrency_issue_master.patch
>
>
> 10015 b_kylin   20   0 62.5g 6.7g  29m R 99.9  4.7 431:15.42 java
> 10723 b_kylin   20   0 62.5g 6.7g  29m R 99.9  4.7 432:30.48 java
> 10724 b_kylin   20   0 62.5g 6.7g  29m R 99.9  4.7 432:30.76 java
> 10781 b_kylin   20   0 62.5g 6.7g  29m R 99.9  4.7 429:02.64 java
> 30929 b_kylin   20   0 62.5g 6.7g  29m R 99.9  4.7 430:21.31 java
> 10014 b_kylin   20   0 62.5g 6.7g  29m R 99.6  4.7 432:32.71 java
> 10722 b_kylin   20   0 62.5g 6.7g  29m R 99.6  4.7 433:05.26 java
> 10827 b_kylin   20   0 62.5g 6.7g  29m R 99.6  4.7 430:27.80 java
>  
>  
>  
> at java.util.HashMap.getEntry(HashMap.java:465)
> at java.util.HashMap.get(HashMap.java:417)
> at org.apache.kylin.dict.TrieDictionary.getIdFromValueImpl(TrieDictionary.java:151)
> at org.apache.kylin.dict.Dictionary.getIdFromValue(Dictionary.java:98)
> at org.apache.kylin.cube.gridtable.CubeCodeSystem$DictionarySerializer.serializeWithRounding(CubeCodeSystem.java:219)
> at org.apache.kylin.cube.gridtable.CubeCodeSystem.encodeColumnValue(CubeCodeSystem.java:130)
> at org.apache.kylin.gridtable.GTUtil$1.translate(GTUtil.java:207)
> at org.apache.kylin.gridtable.GTUtil$1.encodeConstants(GTUtil.java:140)
> at org.apache.kylin.gridtable.GTUtil$1.onSerialize(GTUtil.java:105)
> at org.apache.kylin.metadata.filter.TupleFilterSerializer.internalSerialize(TupleFilterSerializer.java:63)
> at org.apache.kylin.metadata.filter.TupleFilterSerializer.internalSerialize(TupleFilterSerializer.java:75)
> at org.apache.kylin.metadata.filter.TupleFilterSerializer.serialize(TupleFilterSerializer.java:55)
> at org.apache.kylin.gridtable.GTUtil.convertFilter(GTUtil.java:76)
> at org.apache.kylin.gridtable.GTUtil.convertFilterColumnsAndConstants(GTUtil.java:66)
> at org.apache.kylin.storage.hbase.cube.v2.CubeSegmentScanner.<init>(CubeSegmentScanner.java:89)
> at org.apache.kylin.storage.hbase.cube.v2.CubeStorageQuery.search(CubeStorageQuery.java:120)
> at org.apache.kylin.storage.cache.CacheFledgedStaticQuery.search(CacheFledgedStaticQuery.java:59)
> at org.apache.kylin.query.enumerator.OLAPEnumerator.queryStorage(OLAPEnumerator.java:125)
> at org.apache.kylin.query.enumerator.OLAPEnumerator.moveNext(OLAPEnumerator.java:71)
> at Baz$1$1.moveNext(Unknown Source)
> at org.apache.calcite.linq4j.EnumerableDefaults.aggregate(EnumerableDefaults.java:116)
> at org.apache.calcite.linq4j.DefaultEnumerable.aggregate(DefaultEnumerable.java:107)
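
The trace above shows query threads busy inside HashMap.get. A plain java.util.HashMap is not safe when one thread writes while others read; on JDK 7 and earlier, a racing resize is a known way to leave a bucket chain in a cycle, after which get() spins at full CPU. The attached patches are not reproduced in this message, so the following is only a minimal sketch of one common remedy, assuming the map serves as a value-to-id cache. The class name, the lookupSlow placeholder, and the cache layout are hypothetical and are not taken from Kylin's source.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical stand-in for a dictionary lookup cache; not Kylin's actual code.
class ValueIdCache {
    // ConcurrentHashMap supports concurrent readers and writers,
    // unlike java.util.HashMap.
    private final ConcurrentMap<String, Integer> cache = new ConcurrentHashMap<>();

    // Placeholder for the expensive lookup (an assumption for illustration).
    private int lookupSlow(String value) {
        return value.hashCode() & 0x7fffffff;
    }

    int getId(String value) {
        Integer id = cache.get(value);
        if (id == null) {
            id = lookupSlow(value);
            // putIfAbsent keeps the first stored id if another thread got there first.
            Integer previous = cache.putIfAbsent(value, id);
            if (previous != null) {
                id = previous;
            }
        }
        return id;
    }

    int size() {
        return cache.size();
    }
}
```

Other options, such as a per-thread cache or a synchronized wrapper, trade memory or contention for simplicity; which one the patches chose is not stated here.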



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (KYLIN-1670) Unable to find measures, once after cube built successfully.

2016-05-15 Thread liyang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-1670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

liyang resolved KYLIN-1670.
---
Resolution: Not A Problem

> Unable to find measures, once after cube built successfully.
> 
>
> Key: KYLIN-1670
> URL: https://issues.apache.org/jira/browse/KYLIN-1670
> Project: Kylin
>  Issue Type: Bug
>  Components: Tools, Build and Test
>Affects Versions: v1.5.1
> Environment: Testing
>Reporter: Bhanuprakash
>Assignee: hongbin ma
> Attachments: Cube.PNG, Measure.PNG
>
>
> We are facing a couple of issues after building the cube in Kylin:
> 1) Measures are not displayed after the cube is successfully built.
> 2) We can't find an option to join / group by between Dimensions and Measures
> for Slice & Dice.
> 3) We connected through Excel, but the complete data set is loaded into Excel
> and we again need to define the relationship between Dimension and Fact to
> pivot the data.
> 4) Excel is limited to 1 million rows; how will it support big data?





[jira] [Commented] (KYLIN-1670) Unable to find measures, once after cube built successfully.

2016-05-15 Thread liyang (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-1670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15283832#comment-15283832
 ] 

liyang commented on KYLIN-1670:
---

Unlike a traditional cube engine, Kylin exposes a SQL interface rather than MDX. 
That's why, at query time, what the user sees is a relational model rather than 
a cube model.

Exposing the cube model and supporting an MDX interface are on the long-term 
roadmap.

- https://issues.apache.org/jira/browse/KYLIN-776
- https://issues.apache.org/jira/browse/KYLIN-1525


> Unable to find measures, once after cube built successfully.
> 
>
> Key: KYLIN-1670
> URL: https://issues.apache.org/jira/browse/KYLIN-1670
> Project: Kylin
>  Issue Type: Bug
>  Components: Tools, Build and Test
>Affects Versions: v1.5.1
> Environment: Testing
>Reporter: Bhanuprakash
>Assignee: hongbin ma
> Attachments: Cube.PNG, Measure.PNG
>
>
> We are facing a couple of issues after building the cube in Kylin:
> 1) Measures are not displayed after the cube is successfully built.
> 2) We can't find an option to join / group by between Dimensions and Measures
> for Slice & Dice.
> 3) We connected through Excel, but the complete data set is loaded into Excel
> and we again need to define the relationship between Dimension and Fact to
> pivot the data.
> 4) Excel is limited to 1 million rows; how will it support big data?





[jira] [Commented] (KYLIN-1684) query on table "kylin_sales" return empty resultset after cube "kylin_sales_cube" which generated by sample.sh is ready

2016-05-15 Thread hongbin ma (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-1684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15283851#comment-15283851
 ] 

hongbin ma commented on KYLIN-1684:
---

Hi [~whenwin], the check is for skipping segments that have 0 records. As I 
remember, if you remove the check, the query will throw exceptions at runtime 
when a 0-record segment is met. Have you verified that?

> query on table "kylin_sales" return empty resultset after cube 
> "kylin_sales_cube" which generated by sample.sh is ready
> ---
>
> Key: KYLIN-1684
> URL: https://issues.apache.org/jira/browse/KYLIN-1684
> Project: Kylin
>  Issue Type: Bug
>  Components: Query Engine
>Affects Versions: v1.5.1
> Environment: cluster:
> hadoop-2.6.0
> hbase-0.98.8
> hive-0.14.0
>Reporter: wangxianbin
>Assignee: wangxianbin
> Attachments: 
> 1.5.1-release-hotfix-KYLIN-1684-query-on-table-kylin_sales-return-empty-r.patch,
>  log for Build Base Cuboid Data.png, log when run query.png
>
>
> There is a check for "InputRecords" in the CubeStorageQuery search method
> which seems unnecessary, as follows:
> List<CubeSegmentScanner> scanners = Lists.newArrayList();
> for (CubeSegment cubeSeg : cubeInstance.getSegments(SegmentStatusEnum.READY)) {
>     CubeSegmentScanner scanner;
>     if (cubeSeg.getInputRecords() == 0) {
>         logger.info("Skip cube segment {} because its input record is 0", cubeSeg);
>         continue;
>     }
>     scanner = new CubeSegmentScanner(cubeSeg, cuboid, dimensionsD, groupsD, metrics, filterD, !isExactAggregation);
>     scanners.add(scanner);
> }
> if (scanners.isEmpty())
>     return ITupleIterator.EMPTY_TUPLE_ITERATOR;
> return new SequentialCubeTupleIterator(scanners, cuboid, dimensionsD, metrics, returnTupleInfo, context);
> This check causes the query to return an empty result set even when there is
> data in the storage engine.





[jira] [Commented] (KYLIN-1664) rest api '/kylin/api/admin/config' without security check

2016-05-15 Thread hongbin ma (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-1664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15283862#comment-15283862
 ] 

hongbin ma commented on KYLIN-1664:
---

hi,

This API as well as some others are set to be authentication-free for some CLI 
tool's convenience. The issue is not fixed because kylin configs are treated as 
not sensitive. Do you have any security concerns on this?

> rest api '/kylin/api/admin/config' without security check
> -
>
> Key: KYLIN-1664
> URL: https://issues.apache.org/jira/browse/KYLIN-1664
> Project: Kylin
>  Issue Type: Bug
>  Components: REST Service
>Affects Versions: v1.5.1
> Environment: Ubuntu 14.4
> Jdk 1.7.0
> Kylin 1.5.1 binary
>Reporter: Hanhui LI
>Assignee: Zhong,Jason
>  Labels: test
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> rest api '/kylin/api/admin/config' without security check.
> Please check the following:
> ===
> GET Request: 
> http://127.0.0.1:7070/kylin/api/admin/config
> Response:
> {"config":"kylin.hbase.region.cut.large=50\nkylin.hbase.default.compression.codec=snappy\ndeploy.env=QA\nacl.adminRole=ROLE_ADMIN\nkylin.sandbox=true\nkylin.hdfs.working.dir=/kylin\nldap.user.searchBase=\nkylin.job.concurrent.max.limit=10\nkylin.job.remote.cli.password=\nsaml.metadata.file=classpath:sso_metadata.xml\nkylin.job.yarn.app.rest.check.interval.seconds=10\nmail.sender=\nmail.password=\nkylin.job.remote.cli.username=\nmail.username=\nsaml.context.serverPort=443\nkylin.web.help.length=4\nkylin.job.run.as.remote.cmd=false\nldap.service.searchPattern=\nkylin.web.contact_mail=\nldap.user.groupSearchBase=\nkylin.hbase.region.cut.small=5\nkylin.web.hive.limit=20\nkylin.job.mapreduce.default.reduce.input.mb=500\nkylin.job.hive.database.for.intermediatetable=default\nkylin.metadata.url=kylin_metadata@hbase\nldap.password=\nldap.username=\nkylin.storage.url=hbase\nganglia.port=8664\nldap.user.searchPattern=\nkylin.job.status.with.kerberos=false\nganglia.group=\nkylin.hbase.cluster.fs=\nacl.defaultRole=ROLE_ANALYST,ROLE_MODELER\nsaml.context.contextPath=/kylin\nmail.host=\nkylin.job.remote.cli.working.dir=/tmp/kylin\nkylin.web.diagnostic=\nsaml.context.scheme=https\nkylin.job.cubing.inmem.sampling.percent=100\nldap.service.groupSearchBase=\nsaml.metadata.entityBaseURL=https://hostname/kylin\nkylin.hbase.hfile.size.gb=5\nldap.service.searchBase=\nkylin.owner=who...@kylin.apache.org\nmail.enabled=false\nkylin.rest.servers=localhost:7070\nkylin.security.profile=testing\nkylin.job.retry=0\nsaml.context.serverName=hostname\nldap.server=ldap://ldap_server:389\nkylin.job.remote.cli.hostname=\nkylin.query.security.enabled=true\nkylin.server.mode=all\nkylin.web.help.3=onboard|Cube
>  Design Tutorial|\nkylin.web.help.2=tableau|Tableau 
> Guide|\nkylin.web.help.1=odbc|ODBC 
> Driver|\nkylin.hbase.region.cut.medium=10\nkylin.web.help.0=start|Getting 
> Started|\nkylin.web.hadoop=\nkylin.web.streaming.guide=http://kylin.apache.org/\n"}
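
Whether this response is sensitive depends on which keys it exposes. The "config" value quoted above is a newline-separated list of key=value pairs, which java.util.Properties can parse directly. The sketch below is only an illustration of how one might list the keys that look like credentials; the class name and the "password" substring rule are assumptions made for this example and are not part of Kylin.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Properties;

// Hypothetical helper for auditing a dumped config string; not part of Kylin.
class ConfigAudit {
    // Returns, in sorted order, the keys whose names suggest a credential.
    // The "password" substring rule is an illustrative assumption.
    static List<String> credentialKeys(String configText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(configText));
        } catch (IOException e) {
            // A StringReader should not fail; rethrow unchecked just in case.
            throw new IllegalStateException(e);
        }
        List<String> hits = new ArrayList<>();
        for (String key : props.stringPropertyNames()) {
            if (key.toLowerCase().contains("password")) {
                hits.add(key);
            }
        }
        Collections.sort(hits);
        return hits;
    }
}
```

In the response quoted above, keys such as ldap.password, mail.password and kylin.job.remote.cli.password are present but empty; on a deployment where they are filled in, an unauthenticated endpoint would return their values.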





[jira] [Assigned] (KYLIN-1664) rest api '/kylin/api/admin/config' without security check

2016-05-15 Thread hongbin ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-1664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hongbin ma reassigned KYLIN-1664:
-

Assignee: hongbin ma  (was: Zhong,Jason)

> rest api '/kylin/api/admin/config' without security check
> -
>
> Key: KYLIN-1664
> URL: https://issues.apache.org/jira/browse/KYLIN-1664
> Project: Kylin
>  Issue Type: Bug
>  Components: REST Service
>Affects Versions: v1.5.1
> Environment: Ubuntu 14.4
> Jdk 1.7.0
> Kylin 1.5.1 binary
>Reporter: Hanhui LI
>Assignee: hongbin ma
>  Labels: test
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> rest api '/kylin/api/admin/config' without security check.
> Please check the following:
> ===
> GET Request: 
> http://127.0.0.1:7070/kylin/api/admin/config
> Response:
> {"config":"kylin.hbase.region.cut.large=50\nkylin.hbase.default.compression.codec=snappy\ndeploy.env=QA\nacl.adminRole=ROLE_ADMIN\nkylin.sandbox=true\nkylin.hdfs.working.dir=/kylin\nldap.user.searchBase=\nkylin.job.concurrent.max.limit=10\nkylin.job.remote.cli.password=\nsaml.metadata.file=classpath:sso_metadata.xml\nkylin.job.yarn.app.rest.check.interval.seconds=10\nmail.sender=\nmail.password=\nkylin.job.remote.cli.username=\nmail.username=\nsaml.context.serverPort=443\nkylin.web.help.length=4\nkylin.job.run.as.remote.cmd=false\nldap.service.searchPattern=\nkylin.web.contact_mail=\nldap.user.groupSearchBase=\nkylin.hbase.region.cut.small=5\nkylin.web.hive.limit=20\nkylin.job.mapreduce.default.reduce.input.mb=500\nkylin.job.hive.database.for.intermediatetable=default\nkylin.metadata.url=kylin_metadata@hbase\nldap.password=\nldap.username=\nkylin.storage.url=hbase\nganglia.port=8664\nldap.user.searchPattern=\nkylin.job.status.with.kerberos=false\nganglia.group=\nkylin.hbase.cluster.fs=\nacl.defaultRole=ROLE_ANALYST,ROLE_MODELER\nsaml.context.contextPath=/kylin\nmail.host=\nkylin.job.remote.cli.working.dir=/tmp/kylin\nkylin.web.diagnostic=\nsaml.context.scheme=https\nkylin.job.cubing.inmem.sampling.percent=100\nldap.service.groupSearchBase=\nsaml.metadata.entityBaseURL=https://hostname/kylin\nkylin.hbase.hfile.size.gb=5\nldap.service.searchBase=\nkylin.owner=who...@kylin.apache.org\nmail.enabled=false\nkylin.rest.servers=localhost:7070\nkylin.security.profile=testing\nkylin.job.retry=0\nsaml.context.serverName=hostname\nldap.server=ldap://ldap_server:389\nkylin.job.remote.cli.hostname=\nkylin.query.security.enabled=true\nkylin.server.mode=all\nkylin.web.help.3=onboard|Cube
>  Design Tutorial|\nkylin.web.help.2=tableau|Tableau 
> Guide|\nkylin.web.help.1=odbc|ODBC 
> Driver|\nkylin.hbase.region.cut.medium=10\nkylin.web.help.0=start|Getting 
> Started|\nkylin.web.hadoop=\nkylin.web.streaming.guide=http://kylin.apache.org/\n"}





[jira] [Created] (KYLIN-1689) bug when a column being dimension as well as in a sum metric

2016-05-15 Thread hongbin ma (JIRA)
hongbin ma created KYLIN-1689:
-

 Summary: bug when a column being dimension as well as in a sum 
metric
 Key: KYLIN-1689
 URL: https://issues.apache.org/jira/browse/KYLIN-1689
 Project: Kylin
  Issue Type: Bug
Reporter: hongbin ma
Assignee: hongbin ma


Hi all,
 
I recently built a cube named c1 that uses 2 columns as dimensions, "rule_name" 
and "PARTNER_GAIN_PAY_PT_DOC_CNT", and also uses 
"sum(PARTNER_GAIN_PAY_PT_DOC_CNT)" as a measure. C1 was built successfully.
 
So I wrote a SQL query to test it: "select 
rule_name, PARTNER_GAIN_PAY_PT_DOC_CNT, 
count(*), sum(PARTNER_GAIN_PAY_PT_DOC_CNT) from 
CUB_PARTNER_GAIN_PAY_PT_PRE0_AT0_S where rule_name='1号店3C产品' group by 
rule_name, PARTNER_GAIN_PAY_PT_DOC_CNT;", but the result does not look right.
RULE_NAME     PARTNER...  EXPR$2  EXPR$3
1号店3C产品   1860        30      1860
1号店3C产品   700         2       700
1号店3C产品   7410        38      7410
1号店3C产品   2940        60      2940
 
In my opinion, "count(*)" is the number of records with the same rule_name 
and PARTNER_GAIN_PAY_PT_DOC_CNT, so I expect sum(PARTNER…) to equal 
count(*) * PARTNER_GAIN_PAY_PT_DOC_CNT, but it does not. Is there something 
wrong with my understanding?
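
The expectation in the paragraph above can be written down as a small check. When the summed column is also a group-by column, every row in a group carries the same value, so SUM should equal COUNT(*) times that value. The sketch below is an illustration only; the class and method names are made up for this example, and the figures come from the result row reading 700, 2, 700.

```java
// Hypothetical check of the SUM = COUNT(*) * value invariant; not Kylin code.
class GroupedSumCheck {
    // Every row in a group shares groupValue, so the sum is value times count.
    static long expectedSum(long groupValue, long count) {
        return groupValue * count;
    }

    // True when the reported sum matches the invariant.
    static boolean isConsistent(long groupValue, long count, long reportedSum) {
        return expectedSum(groupValue, count) == reportedSum;
    }
}
```

For that row, expectedSum(700, 2) is 1400, while the query reported 700, so isConsistent(700, 2, 700) is false.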
 
   Insight snapshot as below:





[jira] [Created] (KYLIN-1690) always returning 0 or 1 for sum(a)/sum(b) for integer type a and b

2016-05-15 Thread hongbin ma (JIRA)
hongbin ma created KYLIN-1690:
-

 Summary: always returning 0 or 1 for sum(a)/sum(b) for integer 
type a and b
 Key: KYLIN-1690
 URL: https://issues.apache.org/jira/browse/KYLIN-1690
 Project: Kylin
  Issue Type: Bug
Reporter: hongbin ma
Assignee: hongbin ma



  I want to get a value which is defined as sum(a)/sum(b). How can I do 
this kind of analysis?

  I built a cube which has sum(a) and sum(b). When I execute "select 
sum(a)/sum(b) from table1 group by c", the result is wrong: sum(a)/sum(b) is 
always 0 and sum(b)/sum(a) is always 1.


 MMENE_NAME   SUCC   ATT   SUCC/ATT
 CSMME15BZX   336981   368366   1
 CSMME32BZX   338754   366842   1
 CSMME07BZX   687965   747694   1
 CSMME03BHW   703269   747623   1
 CSMME12BZX   705856   764656   1
 CSMME16BHW   1962293142173   1


   MMENE_NAME   SUCC   ATT   ATT/SUCC
 CSMME15BZX   336981   368366   0
 CSMME32BZX   338754   366842   0
 CSMME07BZX   687965   747694   0
 CSMME03BHW   703269   747623   0
 CSMME12BZX   705856   764656   0
 CSMME16BHW   1962293142173   0
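
Results of exactly 0 or 1 are what integer division produces when both operands are integer-typed and of similar size. The sketch below reproduces the effect in plain Java with the SUCC and ATT values of the first row; it illustrates the arithmetic only and is not Kylin or Calcite code. The class and method names are made up for this example.

```java
// Illustration of integer versus floating-point division; not Kylin code.
class RatioDemo {
    // Integer division truncates toward zero, dropping the fractional part.
    static long integerRatio(long numerator, long denominator) {
        return numerator / denominator;
    }

    // Converting one operand to double first keeps the fractional part.
    static double decimalRatio(long numerator, long denominator) {
        return (double) numerator / denominator;
    }
}
```

With 336981 and 368366, integerRatio gives 0 in one direction and 1 in the other, while decimalRatio(336981, 368366) is roughly 0.915. In SQL the analogous step is usually to cast one operand to a decimal or double type before dividing; whether that is the workaround recorded on the duplicate issue is not stated in this thread.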





[jira] [Commented] (KYLIN-1641) Spark - pagination

2016-05-15 Thread hongbin ma (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-1641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15283870#comment-15283870
 ] 

hongbin ma commented on KYLIN-1641:
---

It seems this was posted in the wrong project.

> Spark - pagination
> --
>
> Key: KYLIN-1641
> URL: https://issues.apache.org/jira/browse/KYLIN-1641
> Project: Kylin
>  Issue Type: Improvement
>Reporter: Dileep
>
> Issue: we have inserted around 10 million records into Hive and show the 
> results in a web interface through a Spark dataframe. We cannot fetch all 10 
> million records and do the pagination in the front end, so we did the 
> pagination in the Spark dataframe using the following approach:
>   df1 = df.limit(rowsperPage * pagenumer)
>   df2 = df1.limit(rowsperPage * (pagenumer - 1))
>   df1.subtract(df2).collect()
> This works fine, but as the page number goes up (toward the last page) it 
> slows down and the results do not get back to the front end. 
> We just want to check whether what we are doing is right, or whether there is 
> any other solution for this problem.
> Thanks





[jira] [Commented] (KYLIN-1641) Spark - pagination

2016-05-15 Thread Dileep (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-1641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15283879#comment-15283879
 ] 

Dileep commented on KYLIN-1641:
---

May I know which project I should post this to?

> Spark - pagination
> --
>
> Key: KYLIN-1641
> URL: https://issues.apache.org/jira/browse/KYLIN-1641
> Project: Kylin
>  Issue Type: Improvement
>Reporter: Dileep
>
> Issue: we have inserted around 10 million records into Hive and show the 
> results in a web interface through a Spark dataframe. We cannot fetch all 10 
> million records and do the pagination in the front end, so we did the 
> pagination in the Spark dataframe using the following approach:
>   df1 = df.limit(rowsperPage * pagenumer)
>   df2 = df1.limit(rowsperPage * (pagenumer - 1))
>   df1.subtract(df2).collect()
> This works fine, but as the page number goes up (toward the last page) it 
> slows down and the results do not get back to the front end. 
> We just want to check whether what we are doing is right, or whether there is 
> any other solution for this problem.
> Thanks





[jira] [Commented] (KYLIN-1641) Spark - pagination

2016-05-15 Thread Dong Li (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-1641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15284044#comment-15284044
 ] 

Dong Li commented on KYLIN-1641:


Is this a Spark issue? If so please go to Spark project.

> Spark - pagination
> --
>
> Key: KYLIN-1641
> URL: https://issues.apache.org/jira/browse/KYLIN-1641
> Project: Kylin
>  Issue Type: Improvement
>Reporter: Dileep
>
> Issue: we have inserted around 10 million records into Hive and show the 
> results in a web interface through a Spark dataframe. We cannot fetch all 10 
> million records and do the pagination in the front end, so we did the 
> pagination in the Spark dataframe using the following approach:
>   df1 = df.limit(rowsperPage * pagenumer)
>   df2 = df1.limit(rowsperPage * (pagenumer - 1))
>   df1.subtract(df2).collect()
> This works fine, but as the page number goes up (toward the last page) it 
> slows down and the results do not get back to the front end. 
> We just want to check whether what we are doing is right, or whether there is 
> any other solution for this problem.
> Thanks





[jira] [Commented] (KYLIN-1690) always returning 0 or 1 for sum(a)/sum(b) for integer type a and b

2016-05-15 Thread Dong Li (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-1690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15284046#comment-15284046
 ] 

Dong Li commented on KYLIN-1690:


This is a duplicated JIRA, which is resolved with workaround.
Link them for tracking.

> always returning 0 or 1 for sum(a)/sum(b) for integer type a and b
> --
>
> Key: KYLIN-1690
> URL: https://issues.apache.org/jira/browse/KYLIN-1690
> Project: Kylin
>  Issue Type: Bug
>Reporter: hongbin ma
>Assignee: hongbin ma
>
>   I want to get a value which is defined as sum(a)/sum(b). How can I do 
> this kind of analysis?
>   I built a cube which has sum(a) and sum(b). When I execute "select 
> sum(a)/sum(b) from table1 group by c", the result is wrong: sum(a)/sum(b) is 
> always 0 and sum(b)/sum(a) is always 1.
>  MMENE_NAME   SUCC   ATT   SUCC/ATT
>  CSMME15BZX   336981   368366   1
>  CSMME32BZX   338754   366842   1
>  CSMME07BZX   687965   747694   1
>  CSMME03BHW   703269   747623   1
>  CSMME12BZX   705856   764656   1
>  CSMME16BHW   1962293142173   1
>MMENE_NAME   SUCC   ATT   ATT/SUCC
>  CSMME15BZX   336981   368366   0
>  CSMME32BZX   338754   366842   0
>  CSMME07BZX   687965   747694   0
>  CSMME03BHW   703269   747623   0
>  CSMME12BZX   705856   764656   0
>  CSMME16BHW   1962293142173   0





[jira] [Comment Edited] (KYLIN-1690) always returning 0 or 1 for sum(a)/sum(b) for integer type a and b

2016-05-15 Thread Dong Li (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-1690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15284046#comment-15284046
 ] 

Dong Li edited comment on KYLIN-1690 at 5/16/16 1:13 AM:
-

This is a duplicated JIRA KYLIN-1630, which is resolved with workaround.
Link them for tracking.


was (Author: lidong_sjtu):
This is a duplicated JIRA, which is resolved with workaround.
Link them for tracking.

> always returning 0 or 1 for sum(a)/sum(b) for integer type a and b
> --
>
> Key: KYLIN-1690
> URL: https://issues.apache.org/jira/browse/KYLIN-1690
> Project: Kylin
>  Issue Type: Bug
>Reporter: hongbin ma
>Assignee: hongbin ma
>
>   I want to get a value which is defined as sum(a)/sum(b). How can I do 
> this kind of analysis?
>   I built a cube which has sum(a) and sum(b). When I execute "select 
> sum(a)/sum(b) from table1 group by c", the result is wrong: sum(a)/sum(b) is 
> always 0 and sum(b)/sum(a) is always 1.
>  MMENE_NAME   SUCC   ATT   SUCC/ATT
>  CSMME15BZX   336981   368366   1
>  CSMME32BZX   338754   366842   1
>  CSMME07BZX   687965   747694   1
>  CSMME03BHW   703269   747623   1
>  CSMME12BZX   705856   764656   1
>  CSMME16BHW   1962293142173   1
>MMENE_NAME   SUCC   ATT   ATT/SUCC
>  CSMME15BZX   336981   368366   0
>  CSMME32BZX   338754   366842   0
>  CSMME07BZX   687965   747694   0
>  CSMME03BHW   703269   747623   0
>  CSMME12BZX   705856   764656   0
>  CSMME16BHW   1962293142173   0





[jira] [Comment Edited] (KYLIN-1690) always returning 0 or 1 for sum(a)/sum(b) for integer type a and b

2016-05-15 Thread Dong Li (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-1690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15284046#comment-15284046
 ] 

Dong Li edited comment on KYLIN-1690 at 5/16/16 1:20 AM:
-

There is a duplicated JIRA KYLIN-1630, which is resolved with workaround.
Link them for tracking.


was (Author: lidong_sjtu):
This is a duplicated JIRA KYLIN-1630, which is resolved with workaround.
Link them for tracking.

> always returning 0 or 1 for sum(a)/sum(b) for integer type a and b
> --
>
> Key: KYLIN-1690
> URL: https://issues.apache.org/jira/browse/KYLIN-1690
> Project: Kylin
>  Issue Type: Bug
>Reporter: hongbin ma
>Assignee: hongbin ma
>
>   I want to get a value which is defined as sum(a)/sum(b). How can I do 
> this kind of analysis?
>   I built a cube which has sum(a) and sum(b). When I execute "select 
> sum(a)/sum(b) from table1 group by c", the result is wrong: sum(a)/sum(b) is 
> always 0 and sum(b)/sum(a) is always 1.
>  MMENE_NAME   SUCC   ATT   SUCC/ATT
>  CSMME15BZX   336981   368366   1
>  CSMME32BZX   338754   366842   1
>  CSMME07BZX   687965   747694   1
>  CSMME03BHW   703269   747623   1
>  CSMME12BZX   705856   764656   1
>  CSMME16BHW   1962293142173   1
>MMENE_NAME   SUCC   ATT   ATT/SUCC
>  CSMME15BZX   336981   368366   0
>  CSMME32BZX   338754   366842   0
>  CSMME07BZX   687965   747694   0
>  CSMME03BHW   703269   747623   0
>  CSMME12BZX   705856   764656   0
>  CSMME16BHW   1962293142173   0





[jira] [Updated] (KYLIN-1672) support kylin on cdh 5.7

2016-05-15 Thread Lingyan Jiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-1672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lingyan Jiang updated KYLIN-1672:
-
Attachment: (was: 0001-KYLIN-1672-support-for-kylin-on-cdh5.7.0.patch)

> support kylin on cdh 5.7
> 
>
> Key: KYLIN-1672
> URL: https://issues.apache.org/jira/browse/KYLIN-1672
> Project: Kylin
>  Issue Type: New Feature
>  Components: Environment 
>Reporter: Dong Li
>Assignee: Lingyan Jiang
>Priority: Minor
>






[jira] [Commented] (KYLIN-1664) rest api '/kylin/api/admin/config' without security check

2016-05-15 Thread Hanhui LI (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-1664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15284084#comment-15284084
 ] 

Hanhui LI commented on KYLIN-1664:
--

Thanks a lot.
It is not a good solution to make APIs authentication-free just for a CLI 
tool's convenience. Is there any sensitive info in these APIs?
I am not sure whether kylin configs are sensitive, but I am sure someone will 
think they are. :)

> rest api '/kylin/api/admin/config' without security check
> -
>
> Key: KYLIN-1664
> URL: https://issues.apache.org/jira/browse/KYLIN-1664
> Project: Kylin
>  Issue Type: Bug
>  Components: REST Service
>Affects Versions: v1.5.1
> Environment: Ubuntu 14.4
> Jdk 1.7.0
> Kylin 1.5.1 binary
>Reporter: Hanhui LI
>Assignee: hongbin ma
>  Labels: test
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> rest api '/kylin/api/admin/config' without security check.
> Please check the following:
> ===
> GET Request: 
> http://127.0.0.1:7070/kylin/api/admin/config
> Response:
> {"config":"kylin.hbase.region.cut.large=50\nkylin.hbase.default.compression.codec=snappy\ndeploy.env=QA\nacl.adminRole=ROLE_ADMIN\nkylin.sandbox=true\nkylin.hdfs.working.dir=/kylin\nldap.user.searchBase=\nkylin.job.concurrent.max.limit=10\nkylin.job.remote.cli.password=\nsaml.metadata.file=classpath:sso_metadata.xml\nkylin.job.yarn.app.rest.check.interval.seconds=10\nmail.sender=\nmail.password=\nkylin.job.remote.cli.username=\nmail.username=\nsaml.context.serverPort=443\nkylin.web.help.length=4\nkylin.job.run.as.remote.cmd=false\nldap.service.searchPattern=\nkylin.web.contact_mail=\nldap.user.groupSearchBase=\nkylin.hbase.region.cut.small=5\nkylin.web.hive.limit=20\nkylin.job.mapreduce.default.reduce.input.mb=500\nkylin.job.hive.database.for.intermediatetable=default\nkylin.metadata.url=kylin_metadata@hbase\nldap.password=\nldap.username=\nkylin.storage.url=hbase\nganglia.port=8664\nldap.user.searchPattern=\nkylin.job.status.with.kerberos=false\nganglia.group=\nkylin.hbase.cluster.fs=\nacl.defaultRole=ROLE_ANALYST,ROLE_MODELER\nsaml.context.contextPath=/kylin\nmail.host=\nkylin.job.remote.cli.working.dir=/tmp/kylin\nkylin.web.diagnostic=\nsaml.context.scheme=https\nkylin.job.cubing.inmem.sampling.percent=100\nldap.service.groupSearchBase=\nsaml.metadata.entityBaseURL=https://hostname/kylin\nkylin.hbase.hfile.size.gb=5\nldap.service.searchBase=\nkylin.owner=who...@kylin.apache.org\nmail.enabled=false\nkylin.rest.servers=localhost:7070\nkylin.security.profile=testing\nkylin.job.retry=0\nsaml.context.serverName=hostname\nldap.server=ldap://ldap_server:389\nkylin.job.remote.cli.hostname=\nkylin.query.security.enabled=true\nkylin.server.mode=all\nkylin.web.help.3=onboard|Cube
>  Design Tutorial|\nkylin.web.help.2=tableau|Tableau 
> Guide|\nkylin.web.help.1=odbc|ODBC 
> Driver|\nkylin.hbase.region.cut.medium=10\nkylin.web.help.0=start|Getting 
> Started|\nkylin.web.hadoop=\nkylin.web.streaming.guide=http://kylin.apache.org/\n"}





[jira] [Created] (KYLIN-1691) can not load project info from hbase when startup.

2016-05-15 Thread Hanhui LI (JIRA)
Hanhui LI created KYLIN-1691:


 Summary: can not load project info from hbase when startup.
 Key: KYLIN-1691
 URL: https://issues.apache.org/jira/browse/KYLIN-1691
 Project: Kylin
  Issue Type: Bug
  Components: Environment 
Affects Versions: v1.5.1
 Environment: Ubuntu 14
JDK 1.7 
kylin 1.5.1
Reporter: Hanhui LI
Assignee: hongbin ma


Project info cannot be loaded from hbase at startup if a directory named 
kylin_metadata@hbase is created in $KYLIN_HOME.





[jira] [Created] (KYLIN-1692) kylin server hanged during startup if hbase is not up.

2016-05-15 Thread Hanhui LI (JIRA)
Hanhui LI created KYLIN-1692:


 Summary: kylin server hanged during startup if hbase is not up.
 Key: KYLIN-1692
 URL: https://issues.apache.org/jira/browse/KYLIN-1692
 Project: Kylin
  Issue Type: Bug
Affects Versions: v1.5.1
 Environment: Ubuntu 14
JDK 1.7
kylin 1.5.1
Reporter: Hanhui LI


The kylin server hangs during startup if hbase is not up.
kylin cannot reconnect to hbase even if hbase comes up later.





[jira] [Updated] (KYLIN-1691) can not load project info from hbase when startup.

2016-05-15 Thread Hanhui LI (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-1691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hanhui LI updated KYLIN-1691:
-
Environment: 
Ubuntu 14
JDK 1.7 
kylin 1.5.1 - Binary Package (for running on HBase 1.x)

  was:
Ubuntu 14
JDK 1.7 
kylin 1.5.1


> can not load project info from hbase when startup.
> --
>
> Key: KYLIN-1691
> URL: https://issues.apache.org/jira/browse/KYLIN-1691
> Project: Kylin
>  Issue Type: Bug
>  Components: Environment 
>Affects Versions: v1.5.1
> Environment: Ubuntu 14
> JDK 1.7 
> kylin 1.5.1 - Binary Package (for running on HBase 1.x)
>Reporter: Hanhui LI
>Assignee: hongbin ma
>
> Project info cannot be loaded from hbase at startup if a directory named 
> kylin_metadata@hbase is created in $KYLIN_HOME.





[jira] [Updated] (KYLIN-1692) kylin server hanged during startup if hbase is not up.

2016-05-15 Thread Hanhui LI (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-1692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hanhui LI updated KYLIN-1692:
-
Environment: 
Ubuntu 14
JDK 1.7
kylin 1.5.1 - Binary Package (for running on HBase 1.x)

  was:
Ubuntu 14
JDK 1.7
kylin 1.5.1


> kylin server hanged during startup if hbase is not up.
> --
>
> Key: KYLIN-1692
> URL: https://issues.apache.org/jira/browse/KYLIN-1692
> Project: Kylin
>  Issue Type: Bug
>Affects Versions: v1.5.1
> Environment: Ubuntu 14
> JDK 1.7
> kylin 1.5.1 - Binary Package (for running on HBase 1.x)
>Reporter: Hanhui LI
>
> The Kylin server hangs during startup if HBase is not up.
> Kylin cannot reconnect to HBase even after HBase comes up later.





[jira] [Created] (KYLIN-1693) Support multiple group-by columns for TOP_N measure

2016-05-15 Thread JunAn Chen (JIRA)
JunAn Chen created KYLIN-1693:
-

 Summary: Support multiple group-by columns for TOP_N measure
 Key: KYLIN-1693
 URL: https://issues.apache.org/jira/browse/KYLIN-1693
 Project: Kylin
  Issue Type: New Feature
  Components: Query Engine
Affects Versions: v1.5.1
Reporter: JunAn Chen
Assignee: liyang


For this case:
table name: "tbl"
columns: (dim_city, dim_industry, keyword, pv)

The "keyword" column has a large cardinality, about ten million.

Currently I can build "top100 pv" in (dim_city), (dim_industry). 
But I also want to build "top100 pv" in (dim_city, dim_industry) and "top100 pv 
of keyword" in (dim_city), (dim_industry) and (dim_city, dim_industry).
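
As a rough illustration of the request above, the result wanted for the 
(dim_city, dim_industry) combination could be computed in memory as in the 
sketch below. It rests on an assumption about the intended semantics (keywords 
ranked by summed pv within each group). The class and method names are 
hypothetical, and this is not how Kylin builds or stores TOP_N measures.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative only: plain in-memory aggregation, not Kylin's TOP_N implementation.
class TopNByGroup {
    /**
     * Each row is {dim_city, dim_industry, keyword, pv}.
     * Returns, per "city|industry" group, the n keywords with the highest summed pv.
     */
    static Map<String, List<String>> topKeywords(List<String[]> rows, int n) {
        // Sum pv per keyword inside each (dim_city, dim_industry) group.
        Map<String, Map<String, Long>> pvByGroup = new TreeMap<>();
        for (String[] r : rows) {
            String group = r[0] + "|" + r[1];
            pvByGroup.computeIfAbsent(group, g -> new HashMap<>())
                     .merge(r[2], Long.parseLong(r[3]), Long::sum);
        }
        // Rank keywords by summed pv, descending, and keep the first n per group.
        Map<String, List<String>> result = new TreeMap<>();
        for (Map.Entry<String, Map<String, Long>> e : pvByGroup.entrySet()) {
            List<Map.Entry<String, Long>> entries = new ArrayList<>(e.getValue().entrySet());
            entries.sort((a, b) -> Long.compare(b.getValue(), a.getValue()));
            List<String> top = new ArrayList<>();
            for (int i = 0; i < Math.min(n, entries.size()); i++) {
                top.add(entries.get(i).getKey());
            }
            result.put(e.getKey(), top);
        }
        return result;
    }
}
```

With ten million distinct keywords an exact in-memory ranking like this would 
not scale, which is why the sketch only shows the expected result, not a way to 
build it.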





[jira] [Created] (KYLIN-1694) make multiply coefficient configurable when estimating cuboid size

2016-05-15 Thread kangkaisen (JIRA)
kangkaisen created KYLIN-1694:
-

 Summary: make multiply coefficient configurable when estimating 
cuboid size
 Key: KYLIN-1694
 URL: https://issues.apache.org/jira/browse/KYLIN-1694
 Project: Kylin
  Issue Type: Bug
  Components: Job Engine
Affects Versions: v1.5.1, v1.5.0
Reporter: kangkaisen
Assignee: Dong Li


In the current version of the MRv2 build engine, CubeStatsReader estimates 
cuboid size as follows: "cube is memory hungry, storage size estimation 
multiply 0.05" and "cube is not memory hungry, storage size estimation 
multiply 0.25".

This has one major problem: the default multiply coefficient is too small, 
which makes the estimated cuboid size much smaller than the actual cuboid 
size. As a result, both the number of HBase regions and the number of 
CubeHFileJob reducers are too small, which obviously makes CubeHFileJob much 
slower.

After we removed the default multiply coefficient, CubeHFileJob became much 
faster.

We had better make the multiply coefficient configurable, which would be 
friendlier for users.
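
A minimal sketch of what a configurable coefficient could look like is given 
below. The class name and the property keys are hypothetical, invented for 
illustration, and are not existing Kylin settings. The defaults simply mirror 
the 0.05 and 0.25 values quoted above.

```java
import java.util.Properties;

// Hypothetical sketch: not Kylin's actual CubeStatsReader code.
class CuboidSizeEstimator {
    // Assumed property names, made up for this example only.
    static final String KEY_MEM_HUNGRY = "example.cuboid.size.ratio.memhungry";
    static final String KEY_DEFAULT = "example.cuboid.size.ratio";

    private final double memHungryRatio;
    private final double defaultRatio;

    CuboidSizeEstimator(Properties conf) {
        // Fall back to the currently hard-coded values when nothing is configured.
        this.memHungryRatio = Double.parseDouble(conf.getProperty(KEY_MEM_HUNGRY, "0.05"));
        this.defaultRatio = Double.parseDouble(conf.getProperty(KEY_DEFAULT, "0.25"));
    }

    /** Scales a raw size estimate (in MB) by the configured coefficient. */
    double estimate(double rawSizeMB, boolean memoryHungry) {
        return rawSizeMB * (memoryHungry ? memHungryRatio : defaultRatio);
    }
}
```

Setting the hypothetical "example.cuboid.size.ratio" to 1.0 would correspond to 
removing the coefficient, as in the experiment described above.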





[jira] [Created] (KYLIN-1695) disable cardinality calculation job when loading hive table

2016-05-15 Thread kangkaisen (JIRA)
kangkaisen created KYLIN-1695:
-

 Summary: disable cardinality calculation job when loading hive 
table
 Key: KYLIN-1695
 URL: https://issues.apache.org/jira/browse/KYLIN-1695
 Project: Kylin
  Issue Type: Bug
  Components: Job Engine
Affects Versions: v1.5.1
Reporter: kangkaisen
Assignee: Dong Li


When a user loads or reloads Hive tables from the web console, Kylin submits 
an MR job asynchronously to calculate column cardinalities. This has four 
major problems:

# the calculated cardinality is stored in table metadata, but never used in 
cubing/querying
# table may change after loading, so the cardinality doesn't necessarily 
reflect the actual value
# the current `HiveColumnCardinalityJob` has many limitations, e.g., it doesn't 
support views
# the `HiveColumnCardinalityJob` may use lots of resources when computing 
cardinality of partitioned table

Due to these problems, we should disable it by default and (maybe) remove it in 
future releases.





[jira] [Commented] (KYLIN-1319) Find a better way to check hadoop job status

2016-05-15 Thread Zhong Yanghong (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-1319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15284137#comment-15284137
 ] 

Zhong Yanghong commented on KYLIN-1319:
---

I have run the CI tests. Cube building seems to work well. 
However, the tests in "ITMassInQueryTest" always fail for other reasons. I'll 
fix that before merging.

> Find a better way to check hadoop job status
> 
>
> Key: KYLIN-1319
> URL: https://issues.apache.org/jira/browse/KYLIN-1319
> Project: Kylin
>  Issue Type: Improvement
>Reporter: liyang
>Assignee: Zhong Yanghong
>  Labels: newbie
> Attachments: 
> Find_better_way_of_checking_hadoop_job_status_by_YarnClient_master.patch, 
> Find_better_way_of_checking_hadoop_job_status_via_job_API_master.patch
>
>
> Currently Kylin retrieves job status via a resource manager web service like 
> {code}https://:/ws/v1/cluster/apps/${job_id}?anonymous=true{code}
> This is not very robust. Some users do not have 
> "yarn.resourcemanager.webapp.address" set in yarn-site.xml, so getting the 
> status fails out of the box. They have to set a Kylin property, 
> "kylin.job.yarn.app.rest.check.status.url", to work around this, which is 
> not user friendly.
> Kerberos authentication might cause problems too if security is enabled.
> Is there a more robust way to check job status? Via the Job API?





[jira] [Created] (KYLIN-1696) Have caught exception when connection issue occurs for some Broker

2016-05-15 Thread Zhong Yanghong (JIRA)
Zhong Yanghong created KYLIN-1696:
-

 Summary: Have caught exception when connection issue occurs for 
some Broker
 Key: KYLIN-1696
 URL: https://issues.apache.org/jira/browse/KYLIN-1696
 Project: Kylin
  Issue Type: Bug
  Components: streaming
Reporter: Zhong Yanghong
Assignee: Zhong Yanghong


2016-05-16 01:50:24,711 ERROR [main StreamingCLI:109]: error start streaming
java.lang.RuntimeException: error when get StreamingMessages
at 
org.apache.kylin.source.kafka.KafkaStreamingInput.getBatchWithTimeWindow(KafkaStreamingInput.java:93)
at 
org.apache.kylin.engine.streaming.OneOffStreamingBuilder$1.run(OneOffStreamingBuilder.java:72)
at 
org.apache.kylin.engine.streaming.cli.StreamingCLI.startOneOffCubeStreaming(StreamingCLI.java:129)
at 
org.apache.kylin.engine.streaming.cli.StreamingCLI.main(StreamingCLI.java:103)
Caused by: java.nio.channels.UnresolvedAddressException
at sun.nio.ch.Net.checkAddress(Net.java:127)
at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:644)
at kafka.network.BlockingChannel.connect(BlockingChannel.scala:57)
at kafka.consumer.SimpleConsumer.connect(SimpleConsumer.scala:44)
at 
kafka.consumer.SimpleConsumer.getOrMakeConnection(SimpleConsumer.scala:142)
at 
kafka.consumer.SimpleConsumer.kafka$consumer$SimpleConsumer$$sendRequest(SimpleConsumer.scala:69)
at kafka.consumer.SimpleConsumer.send(SimpleConsumer.scala:93)
at kafka.javaapi.consumer.SimpleConsumer.send(SimpleConsumer.scala:68)
at 
org.apache.kylin.source.kafka.util.KafkaRequester.getPartitionMetadata(KafkaRequester.java:132)
at 
org.apache.kylin.source.kafka.util.KafkaUtils.getLeadBroker(KafkaUtils.java:53)
at 
org.apache.kylin.source.kafka.util.KafkaUtils.getFirstAndLastOffset(KafkaUtils.java:113)
at 
org.apache.kylin.source.kafka.util.KafkaUtils.findClosestOffsetWithDataTimestamp(KafkaUtils.java:102)
at 
org.apache.kylin.source.kafka.KafkaStreamingInput$StreamingMessageProducer.call(KafkaStreamingInput.java:141)
at 
org.apache.kylin.source.kafka.KafkaStreamingInput$StreamingMessageProducer.call(KafkaStreamingInput.java:104)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)





[jira] [Updated] (KYLIN-1696) Have caught exception when connection issue occurs for some Broker

2016-05-15 Thread Zhong Yanghong (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-1696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhong Yanghong updated KYLIN-1696:
--
Affects Version/s: v1.4.0
   v1.5.0

> Have caught exception when connection issue occurs for some Broker
> --
>
> Key: KYLIN-1696
> URL: https://issues.apache.org/jira/browse/KYLIN-1696
> Project: Kylin
>  Issue Type: Bug
>  Components: streaming
>Affects Versions: v1.5.0, v1.4.0
>Reporter: Zhong Yanghong
>Assignee: Zhong Yanghong
>
> 2016-05-16 01:50:24,711 ERROR [main StreamingCLI:109]: error start streaming
> java.lang.RuntimeException: error when get StreamingMessages
> at 
> org.apache.kylin.source.kafka.KafkaStreamingInput.getBatchWithTimeWindow(KafkaStreamingInput.java:93)
> at 
> org.apache.kylin.engine.streaming.OneOffStreamingBuilder$1.run(OneOffStreamingBuilder.java:72)
> at 
> org.apache.kylin.engine.streaming.cli.StreamingCLI.startOneOffCubeStreaming(StreamingCLI.java:129)
> at 
> org.apache.kylin.engine.streaming.cli.StreamingCLI.main(StreamingCLI.java:103)
> Caused by: java.nio.channels.UnresolvedAddressException
> at sun.nio.ch.Net.checkAddress(Net.java:127)
> at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:644)
> at kafka.network.BlockingChannel.connect(BlockingChannel.scala:57)
> at kafka.consumer.SimpleConsumer.connect(SimpleConsumer.scala:44)
> at 
> kafka.consumer.SimpleConsumer.getOrMakeConnection(SimpleConsumer.scala:142)
> at 
> kafka.consumer.SimpleConsumer.kafka$consumer$SimpleConsumer$$sendRequest(SimpleConsumer.scala:69)
> at kafka.consumer.SimpleConsumer.send(SimpleConsumer.scala:93)
> at kafka.javaapi.consumer.SimpleConsumer.send(SimpleConsumer.scala:68)
> at 
> org.apache.kylin.source.kafka.util.KafkaRequester.getPartitionMetadata(KafkaRequester.java:132)
> at 
> org.apache.kylin.source.kafka.util.KafkaUtils.getLeadBroker(KafkaUtils.java:53)
> at 
> org.apache.kylin.source.kafka.util.KafkaUtils.getFirstAndLastOffset(KafkaUtils.java:113)
> at 
> org.apache.kylin.source.kafka.util.KafkaUtils.findClosestOffsetWithDataTimestamp(KafkaUtils.java:102)
> at 
> org.apache.kylin.source.kafka.KafkaStreamingInput$StreamingMessageProducer.call(KafkaStreamingInput.java:141)
> at 
> org.apache.kylin.source.kafka.KafkaStreamingInput$StreamingMessageProducer.call(KafkaStreamingInput.java:104)
> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)





[jira] [Commented] (KYLIN-1684) query on table "kylin_sales" return empty resultset after cube "kylin_sales_cube" which generated by sample.sh is ready

2016-05-15 Thread wangxianbin (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-1684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15284141#comment-15284141
 ] 

wangxianbin commented on KYLIN-1684:


Hi Hongbin! In your commit for "KYLIN-1465", I noticed that 
NotEnoughGTInfoException is the only exception you try to catch in 
CubeStorageQuery search, regardless of the segment record count, and that 
CubeGridTable newGTInfo is the only place where NotEnoughGTInfoException is 
thrown, when there is a dict info mismatch between CubeManager and Cuboid (in 
which case dict == null). However, it seems you have since refactored it: when 
a dict is not found (dict == null) in CubeDimEncMap, FixedLenDimEnc is used. I 
therefore just removed the check. If there is some other runtime exception I 
should worry about, I may have missed it. Anyway, testing is always the better 
choice.

> query on table "kylin_sales" return empty resultset after cube 
> "kylin_sales_cube" which generated by sample.sh is ready
> ---
>
> Key: KYLIN-1684
> URL: https://issues.apache.org/jira/browse/KYLIN-1684
> Project: Kylin
>  Issue Type: Bug
>  Components: Query Engine
>Affects Versions: v1.5.1
> Environment: cluster:
> hadoop-2.6.0
> hbase-0.98.8
> hive-0.14.0
>Reporter: wangxianbin
>Assignee: wangxianbin
> Attachments: 
> 1.5.1-release-hotfix-KYLIN-1684-query-on-table-kylin_sales-return-empty-r.patch,
>  log for Build Base Cuboid Data.png, log when run query.png
>
>
> There is a check for "InputRecords" in the CubeStorageQuery search method 
> that seems unnecessary, as follows:
>     List<CubeSegmentScanner> scanners = Lists.newArrayList();
>     for (CubeSegment cubeSeg : cubeInstance.getSegments(SegmentStatusEnum.READY)) {
>         CubeSegmentScanner scanner;
>         // segments reporting zero input records are skipped entirely
>         if (cubeSeg.getInputRecords() == 0) {
>             logger.info("Skip cube segment {} because its input record is 0", cubeSeg);
>             continue;
>         }
>         scanner = new CubeSegmentScanner(cubeSeg, cuboid, dimensionsD, groupsD, metrics, filterD, !isExactAggregation);
>         scanners.add(scanner);
>     }
>     if (scanners.isEmpty())
>         return ITupleIterator.EMPTY_TUPLE_ITERATOR;
>     return new SequentialCubeTupleIterator(scanners, cuboid, dimensionsD, metrics, returnTupleInfo, context);
> This check causes the query to return an empty result set even when there is 
> data in the storage engine.





[jira] [Commented] (KYLIN-1696) Have caught exception when connection issue occurs for some Broker

2016-05-15 Thread Zhong Yanghong (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-1696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15284142#comment-15284142
 ] 

Zhong Yanghong commented on KYLIN-1696:
---

To get the leader broker, every broker in the cluster needs to be visited. 
However, exceptions have been caught for those brokers with issues.
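
One way to tolerate unreachable brokers while looking for the leader is 
sketched below, assuming the intent is to skip a broker that fails and move on 
to the next one. The BrokerClient interface and the class name are 
hypothetical and stand in for the real Kafka metadata request. Note that 
java.nio.channels.UnresolvedAddressException is an unchecked exception, so a 
handler for checked exceptions only would not catch it.

```java
import java.util.List;

// Hypothetical sketch; BrokerClient stands in for a real metadata request.
class LeaderLookup {
    interface BrokerClient {
        /** Returns the leader address reported by the given broker, or null if unknown. */
        String askForLeader(String broker);
    }

    /**
     * Asks each broker in turn for the leader. A broker that throws is logged
     * and skipped. Returns null when no broker could answer.
     */
    static String findLeader(List<String> brokers, BrokerClient client) {
        for (String broker : brokers) {
            try {
                String leader = client.askForLeader(broker);
                if (leader != null) {
                    return leader;
                }
            } catch (RuntimeException e) {
                // Covers unchecked failures such as UnresolvedAddressException.
                System.err.println("Skipping broker " + broker + ": " + e);
            }
        }
        return null;
    }
}
```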

> Have caught exception when connection issue occurs for some Broker
> --
>
> Key: KYLIN-1696
> URL: https://issues.apache.org/jira/browse/KYLIN-1696
> Project: Kylin
>  Issue Type: Bug
>  Components: streaming
>Affects Versions: v1.5.0, v1.4.0
>Reporter: Zhong Yanghong
>Assignee: Zhong Yanghong
>
> 2016-05-16 01:50:24,711 ERROR [main StreamingCLI:109]: error start streaming
> java.lang.RuntimeException: error when get StreamingMessages
> at 
> org.apache.kylin.source.kafka.KafkaStreamingInput.getBatchWithTimeWindow(KafkaStreamingInput.java:93)
> at 
> org.apache.kylin.engine.streaming.OneOffStreamingBuilder$1.run(OneOffStreamingBuilder.java:72)
> at 
> org.apache.kylin.engine.streaming.cli.StreamingCLI.startOneOffCubeStreaming(StreamingCLI.java:129)
> at 
> org.apache.kylin.engine.streaming.cli.StreamingCLI.main(StreamingCLI.java:103)
> Caused by: java.nio.channels.UnresolvedAddressException
> at sun.nio.ch.Net.checkAddress(Net.java:127)
> at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:644)
> at kafka.network.BlockingChannel.connect(BlockingChannel.scala:57)
> at kafka.consumer.SimpleConsumer.connect(SimpleConsumer.scala:44)
> at 
> kafka.consumer.SimpleConsumer.getOrMakeConnection(SimpleConsumer.scala:142)
> at 
> kafka.consumer.SimpleConsumer.kafka$consumer$SimpleConsumer$$sendRequest(SimpleConsumer.scala:69)
> at kafka.consumer.SimpleConsumer.send(SimpleConsumer.scala:93)
> at kafka.javaapi.consumer.SimpleConsumer.send(SimpleConsumer.scala:68)
> at 
> org.apache.kylin.source.kafka.util.KafkaRequester.getPartitionMetadata(KafkaRequester.java:132)
> at 
> org.apache.kylin.source.kafka.util.KafkaUtils.getLeadBroker(KafkaUtils.java:53)
> at 
> org.apache.kylin.source.kafka.util.KafkaUtils.getFirstAndLastOffset(KafkaUtils.java:113)
> at 
> org.apache.kylin.source.kafka.util.KafkaUtils.findClosestOffsetWithDataTimestamp(KafkaUtils.java:102)
> at 
> org.apache.kylin.source.kafka.KafkaStreamingInput$StreamingMessageProducer.call(KafkaStreamingInput.java:141)
> at 
> org.apache.kylin.source.kafka.KafkaStreamingInput$StreamingMessageProducer.call(KafkaStreamingInput.java:104)
> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)





[jira] [Commented] (KYLIN-1684) query on table "kylin_sales" return empty resultset after cube "kylin_sales_cube" which generated by sample.sh is ready

2016-05-15 Thread wangxianbin (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-1684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15284177#comment-15284177
 ] 

wangxianbin commented on KYLIN-1684:


Hi [~mahongbin]! I have tried it on a segment with 0 records, and it works well!

> query on table "kylin_sales" return empty resultset after cube 
> "kylin_sales_cube" which generated by sample.sh is ready
> ---
>
> Key: KYLIN-1684
> URL: https://issues.apache.org/jira/browse/KYLIN-1684
> Project: Kylin
>  Issue Type: Bug
>  Components: Query Engine
>Affects Versions: v1.5.1
> Environment: cluster:
> hadoop-2.6.0
> hbase-0.98.8
> hive-0.14.0
>Reporter: wangxianbin
>Assignee: wangxianbin
> Attachments: 
> 1.5.1-release-hotfix-KYLIN-1684-query-on-table-kylin_sales-return-empty-r.patch,
>  log for Build Base Cuboid Data.png, log when run query.png
>
>
> There is a check for "InputRecords" in the CubeStorageQuery search method 
> that seems unnecessary, as follows:
>     List<CubeSegmentScanner> scanners = Lists.newArrayList();
>     for (CubeSegment cubeSeg : cubeInstance.getSegments(SegmentStatusEnum.READY)) {
>         CubeSegmentScanner scanner;
>         // segments reporting zero input records are skipped entirely
>         if (cubeSeg.getInputRecords() == 0) {
>             logger.info("Skip cube segment {} because its input record is 0", cubeSeg);
>             continue;
>         }
>         scanner = new CubeSegmentScanner(cubeSeg, cuboid, dimensionsD, groupsD, metrics, filterD, !isExactAggregation);
>         scanners.add(scanner);
>     }
>     if (scanners.isEmpty())
>         return ITupleIterator.EMPTY_TUPLE_ITERATOR;
>     return new SequentialCubeTupleIterator(scanners, cuboid, dimensionsD, metrics, returnTupleInfo, context);
> This check causes the query to return an empty result set even when there is 
> data in the storage engine.


