[jira] [Updated] (KYLIN-2217) Reducers build dictionaries locally

2016-12-10 Thread XIE FAN (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-2217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

XIE FAN updated KYLIN-2217:
---
Fix Version/s: (was: v1.6.0)
   v1.6.1

> Reducers build dictionaries locally
> ---
>
> Key: KYLIN-2217
> URL: https://issues.apache.org/jira/browse/KYLIN-2217
> Project: Kylin
>  Issue Type: Improvement
>Affects Versions: v1.5.4.1
>Reporter: XIE FAN
>Assignee: XIE FAN
> Fix For: v1.6.1
>
> Attachments: 0001-KYLIN-2217-Reducers-build-dictionaries-locally.patch
>
>
> In KYLIN-1851, we reduced the peak memory usage of the dictionary-building 
> procedure by splitting a single Trie tree structure into a Trie forest. But there 
> still exists a bottleneck: all the dictionaries are built in the Kylin client. 
> In this issue, we want to use multiple reducers to build different dictionaries 
> locally and concurrently, which can further reduce the peak memory usage as 
> well as speed up the dictionary-building procedure.
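
To make the idea in the description concrete, here is a minimal sketch of building one dictionary per column concurrently. It is not Kylin's implementation: worker threads stand in for reducers, a plain dict stands in for a trie-based dictionary, and the names build_dictionary, build_all and the sample columns are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def build_dictionary(values):
    """Map each distinct value to a dense integer ID in sorted order."""
    return {value: idx for idx, value in enumerate(sorted(set(values)))}


def build_all(columns, workers=4):
    """Build one dictionary per column, each in its own worker.

    Assumption: each worker plays the role of one reducer, so no single
    process has to hold every dictionary while it is being built.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {
            name: pool.submit(build_dictionary, values)
            for name, values in columns.items()
        }
        return {name: future.result() for name, future in futures.items()}


# Hypothetical sample data: two columns with a few repeated values.
columns = {
    "country": ["US", "CN", "US", "DE"],
    "device": ["ios", "android", "ios"],
}
dictionaries = build_all(columns)
```

Each column is handled independently, which is what allows the work to be spread across reducers.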



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (KYLIN-2124) Property 'kylin.job.hive.database.for.intermediatetable' does not work for Beeline

2016-12-10 Thread hongbin ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-2124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hongbin ma closed KYLIN-2124.
-
Resolution: Not A Problem
  Assignee: hongbin ma  (was: Shaofeng SHI)

Turns out I defined the property twice, and the later definition was overwriting 
the earlier one.
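
A duplicate definition like this is easy to miss by eye. The following is a rough sketch of a checker for key=value files in which the last definition wins. The parsing is deliberately simplified (it does not handle line continuations or escapes), and the values kylin_tmp and default are made-up examples.

```python
def load_properties(lines):
    """Parse key=value lines; return (properties, duplicated_keys)."""
    properties = {}
    duplicates = []
    for raw in lines:
        line = raw.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith(("#", "!")):
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        if key in properties and key not in duplicates:
            duplicates.append(key)
        # A later definition overwrites an earlier one.
        properties[key] = value.strip()
    return properties, duplicates


# Hypothetical file content: the same key is defined twice.
sample = [
    "kylin.job.hive.database.for.intermediatetable=kylin_tmp",
    "hive.cli=beeline",
    "kylin.job.hive.database.for.intermediatetable=default",
]
props, dups = load_properties(sample)
```

Here dups lists the key that was defined more than once, and props holds the value that actually takes effect.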

> Property 'kylin.job.hive.database.for.intermediatetable' does not work for 
> Beeline
> --
>
> Key: KYLIN-2124
> URL: https://issues.apache.org/jira/browse/KYLIN-2124
> Project: Kylin
>  Issue Type: Bug
>  Components: Metadata
>Reporter: Billy Liu
>Assignee: hongbin ma
>
> hive.cli=beeline
> and changed kylin.job.hive.database.for.intermediatetable
> The property does not take effect with Beeline; it works with the Hive CLI.





[jira] [Commented] (KYLIN-2144) move useful operation tools to org.apache.kylin.tool

2016-12-10 Thread Shaofeng SHI (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-2144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15739262#comment-15739262
 ] 

Shaofeng SHI commented on KYLIN-2144:
-

Can we defer such a reorg to a major version change? 

> move useful operation tools to org.apache.kylin.tool
> 
>
> Key: KYLIN-2144
> URL: https://issues.apache.org/jira/browse/KYLIN-2144
> Project: Kylin
>  Issue Type: Improvement
>Reporter: hongbin ma
>Assignee: hongbin ma
> Fix For: v1.6.1
>
>
> For historical reasons, the following 5 operation tools:
> StorageCleanupJob, MetadataCleanupJob, CubeMigrationCLI, 
> CubeMigrationCheckCLI, ExtendCubeToHybridCLI
> are located in org.apache.kylin.storage.hbase.util, which brings dependency 
> issues and other concerns. 
> In 1.6.1 and later, we'll move the 5 tools to org.apache.kylin.tool. The old 
> Java classes will be marked as deprecated and will no longer be maintained.





[jira] [Resolved] (KYLIN-2248) TopN merge further optimization after KYLIN-1917

2016-12-10 Thread Shaofeng SHI (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-2248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shaofeng SHI resolved KYLIN-2248.
-
Resolution: Fixed

> TopN merge further optimization after KYLIN-1917
> 
>
> Key: KYLIN-2248
> URL: https://issues.apache.org/jira/browse/KYLIN-2248
> Project: Kylin
>  Issue Type: Improvement
>  Components: Job Engine
>Reporter: Shaofeng SHI
>Assignee: Shaofeng SHI
> Fix For: v1.6.1
>
>
> After KYLIN-1917, there is still room for performance optimization when 
> building a cube that has a very large number of rows but whose dimensions 
> all have quite small cardinality.
> In that case, a lot of aggregation happens when building the base cuboid, and 
> the reducer is under heavy CPU pressure. With JStack we observed that the CPU 
> time was spent in TopNCounter.merge(), in the HashMap.get() method.
> {code}
> Thread 28679: (state = IN_JAVA)
>  - java.util.HashMap.getEntry(java.lang.Object) @bci=81, line=465 (Compiled frame; information may be imprecise)
>  - java.util.HashMap.get(java.lang.Object) @bci=11, line=417 (Compiled frame)
>  - org.apache.kylin.measure.topn.TopNCounter.merge(org.apache.kylin.measure.topn.TopNCounter) @bci=117, line=174 (Compiled frame)
>  - org.apache.kylin.measure.topn.TopNAggregator.aggregate(org.apache.kylin.measure.topn.TopNCounter) @bci=38, line=44 (Compiled frame)
>  - org.apache.kylin.measure.topn.TopNAggregator.aggregate(java.lang.Object) @bci=5, line=27 (Compiled frame)
>  - org.apache.kylin.measure.MeasureAggregators.aggregate(java.lang.Object[]) @bci=42, line=76 (Compiled frame)
>  - org.apache.kylin.engine.mr.steps.CuboidReducer.doReduce(org.apache.hadoop.io.Text, java.lang.Iterable, org.apache.hadoop.mapreduce.Reducer$Context) @bci=95, line=97 (Compiled frame)
>  - org.apache.kylin.engine.mr.steps.CuboidReducer.doReduce(java.lang.Object, java.lang.Iterable, org.apache.hadoop.mapreduce.Reducer$Context) @bci=7, line=42 (Interpreted frame)
>  - org.apache.kylin.engine.mr.KylinReducer.reduce(java.lang.Object, java.lang.Iterable, org.apache.hadoop.mapreduce.Reducer$Context) @bci=4, line=40 (Interpreted frame)
>  - org.apache.hadoop.mapreduce.Reducer.run(org.apache.hadoop.mapreduce.Reducer$Context) @bci=22, line=171 (Interpreted frame)
>  - org.apache.hadoop.mapred.ReduceTask.runNewReducer(org.apache.hadoop.mapred.JobConf, org.apache.hadoop.mapred.TaskUmbilicalProtocol, org.apache.hadoop.mapred.Task$TaskReporter, org.apache.hadoop.mapred.RawKeyValueIterator, org.apache.hadoop.io.RawComparator, java.lang.Class, java.lang.Class) @bci=119, line=627 (Interpreted frame)
>  - org.apache.hadoop.mapred.ReduceTask.run(org.apache.hadoop.mapred.JobConf, org.apache.hadoop.mapred.TaskUmbilicalProtocol) @bci=384, line=389 (Interpreted frame)
>  - org.apache.hadoop.mapred.YarnChild$2.run() @bci=36, line=164 (Interpreted frame)
>  - java.security.AccessController.doPrivileged(java.security.PrivilegedExceptionAction, java.security.AccessControlContext) @bci=0 (Interpreted frame)
>  - javax.security.auth.Subject.doAs(javax.security.auth.Subject, java.security.PrivilegedExceptionAction) @bci=42, line=415 (Interpreted frame)
>  - org.apache.hadoop.security.UserGroupInformation.doAs(java.security.PrivilegedExceptionAction) @bci=14, line=1709 (Interpreted frame)
>  - org.apache.hadoop.mapred.YarnChild.main(java.lang.String[]) @bci=514, line=158 (Interpreted frame)
>  
> {code}
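
To illustrate why a merge step can be dominated by hash lookups, here is a simplified sketch of merging two bounded counters. It is an exact dictionary merge written for illustration only, not Kylin's TopNCounter algorithm, and the function name and sample counts are hypothetical.

```python
def merge_counters(left, right, capacity):
    """Merge two item->count dicts and keep the `capacity` largest counts."""
    merged = dict(left)
    for item, count in right.items():
        # One hash lookup per incoming item; with many rows and few distinct
        # keys this lookup runs very often, analogous to the HashMap.get()
        # frame in the trace above.
        merged[item] = merged.get(item, 0) + count
    # Order by descending count, then by item name to break ties.
    top = sorted(merged.items(), key=lambda kv: (-kv[1], kv[0]))[:capacity]
    return dict(top)


# Hypothetical counters produced by two partial aggregations.
a = {"x": 5, "y": 3, "z": 1}
b = {"y": 4, "w": 2, "z": 1}
result = merge_counters(a, b, 3)
```

When the reducer merges a very large number of such counters, the per-item lookup is the natural place to look for savings.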





[jira] [Resolved] (KYLIN-2262) Get "null" error when trigger a build with wrong cube name

2016-12-10 Thread Shaofeng SHI (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-2262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shaofeng SHI resolved KYLIN-2262.
-
Resolution: Fixed

> Get "null" error when trigger a build with wrong cube name
> --
>
> Key: KYLIN-2262
> URL: https://issues.apache.org/jira/browse/KYLIN-2262
> Project: Kylin
>  Issue Type: Bug
>  Components: Client - CLI
>Affects Versions: v1.6.0
> Environment: CDH1.5.7
> Kylin1.6 
> KAFKA-2.0.2-1.2.0.2.p0.5
>Reporter: QiLiFei
>Assignee: Shaofeng SHI
>Priority: Minor
> Fix For: v1.6.1
>
> Attachments: kylin.logError.txt
>
>
> When I build the Kafka streaming cube according to the doc 
> (http://kylin.apache.org/docs16/tutorial/cube_streaming.html), it always 
> raises this error in the CLI: 
> {"url":"http://172.31.18.12:7070/kylin/api/cubes/StreamingCube9/build2","exception":null}
> In kylin.log, only "java.lang.NullPointerException" is present. 
> I'm not sure what exactly happened there. Please give me some 
> support.





[jira] [Updated] (KYLIN-2262) Get "null" error when trigger a build with wrong cube name

2016-12-10 Thread Shaofeng SHI (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-2262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shaofeng SHI updated KYLIN-2262:

   Labels:   (was: features)
 Priority: Minor  (was: Blocker)
Fix Version/s: v1.6.1
   Issue Type: Bug  (was: Test)
  Summary: Get "null" error when trigger a build with wrong cube name  
(was: Kafka Streaming Cube build error  )

I reproduced this issue; the cause is that a wrong cube name was given when 
calling the API. The fix will be included in the next version.

> Get "null" error when trigger a build with wrong cube name
> --
>
> Key: KYLIN-2262
> URL: https://issues.apache.org/jira/browse/KYLIN-2262
> Project: Kylin
>  Issue Type: Bug
>  Components: Client - CLI
>Affects Versions: v1.6.0
> Environment: CDH1.5.7
> Kylin1.6 
> KAFKA-2.0.2-1.2.0.2.p0.5
>Reporter: QiLiFei
>Assignee: Shaofeng SHI
>Priority: Minor
> Fix For: v1.6.1
>
> Attachments: kylin.logError.txt
>
>
> When I build the Kafka streaming cube according to the doc 
> (http://kylin.apache.org/docs16/tutorial/cube_streaming.html), it always 
> raises this error in the CLI: 
> {"url":"http://172.31.18.12:7070/kylin/api/cubes/StreamingCube9/build2","exception":null}
> In kylin.log, only "java.lang.NullPointerException" is present. 
> I'm not sure what exactly happened there. Please give me some 
> support.





[jira] [Commented] (KYLIN-1793) Job couldn't stop when hive commands got error with beeline

2016-12-10 Thread Shaofeng SHI (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-1793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15739229#comment-15739229
 ] 

Shaofeng SHI commented on KYLIN-1793:
-

Got it; thanks, Dong.

> Job couldn't stop when hive commands got error with beeline
> ---
>
> Key: KYLIN-1793
> URL: https://issues.apache.org/jira/browse/KYLIN-1793
> Project: Kylin
>  Issue Type: Bug
>  Components: Job Engine
>Affects Versions: v1.5.0, v1.5.1, v1.5.2
>Reporter: Shaofeng SHI
>Assignee: Dong Li
>  Labels: newbie
> Fix For: v1.6.1
>
>
> Configure Kylin to use beeline as the hive command line and submit a cube build 
> job. The job moves to 100% with success, but I found there was an error in the 
> hive-related steps that wasn't captured by Kylin.





[jira] [Commented] (KYLIN-2217) Reducers build dictionaries locally

2016-12-10 Thread Shaofeng SHI (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-2217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15739225#comment-15739225
 ] 

Shaofeng SHI commented on KYLIN-2217:
-

1.6.0 was already released on Nov 26; should this be in 1.6.1? [~xiefan46]

> Reducers build dictionaries locally
> ---
>
> Key: KYLIN-2217
> URL: https://issues.apache.org/jira/browse/KYLIN-2217
> Project: Kylin
>  Issue Type: Improvement
>Affects Versions: v1.5.4.1
>Reporter: XIE FAN
>Assignee: XIE FAN
> Fix For: v1.6.0
>
> Attachments: 0001-KYLIN-2217-Reducers-build-dictionaries-locally.patch
>
>
> In KYLIN-1851, we reduced the peak memory usage of the dictionary-building 
> procedure by splitting a single Trie tree structure into a Trie forest. But there 
> still exists a bottleneck: all the dictionaries are built in the Kylin client. 
> In this issue, we want to use multiple reducers to build different dictionaries 
> locally and concurrently, which can further reduce the peak memory usage as 
> well as speed up the dictionary-building procedure.





[jira] [Commented] (KYLIN-1851) Improve build dictionary, consider that input is already sorted

2016-12-10 Thread XIE FAN (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-1851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15737984#comment-15737984
 ] 

XIE FAN commented on KYLIN-1851:


It's already ok. The new dictionary class is TrieDictionaryForest.

> Improve build dictionary, consider that input is already sorted
> ---
>
> Key: KYLIN-1851
> URL: https://issues.apache.org/jira/browse/KYLIN-1851
> Project: Kylin
>  Issue Type: Bug
>Reporter: liyang
>Assignee: XIE FAN
> Attachments: 
> 0001-KYLIN-1851-unfinished-add-TrieDictionaryForest-and-N.patch
>
>
> Currently the dictionary build may encounter OOM when cardinality is huge. It 
> can benefit from the fact that the input (which is the output of the 
> FactDistinctColumn reducer) is already sorted.
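
As an illustration of how sorted input helps, the sketch below splits an already-sorted stream of distinct values into bounded chunks, so that no single structure has to hold every value while it is being built, and then resolves IDs by first locating the right chunk. This is only an approximation of the forest idea: sorted lists stand in for tries, and build_forest, lookup_id and max_chunk_size are hypothetical names, not part of TrieDictionaryForest.

```python
from bisect import bisect_right


def build_forest(sorted_values, max_chunk_size):
    """Split an already-sorted stream of distinct values into bounded chunks."""
    chunks, current = [], []
    for value in sorted_values:
        current.append(value)
        if len(current) == max_chunk_size:
            chunks.append(current)
            current = []
    if current:
        chunks.append(current)
    return chunks


def lookup_id(chunks, value):
    """Return the global ID of `value`, or -1 if it is absent."""
    # Because the input was sorted, the first value of each chunk is enough
    # to decide which chunk could contain `value`.
    first_values = [chunk[0] for chunk in chunks]
    chunk_index = bisect_right(first_values, value) - 1
    if chunk_index < 0:
        return -1
    chunk = chunks[chunk_index]
    position = bisect_right(chunk, value) - 1
    if position < 0 or chunk[position] != value:
        return -1
    # Global ID = number of values in earlier chunks + position in this chunk.
    offset = sum(len(c) for c in chunks[:chunk_index])
    return offset + position


forest = build_forest(["a", "b", "c", "d", "e"], 2)
```

Because the stream is sorted, IDs stay globally ordered even though each chunk is built and stored separately.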





[jira] [Resolved] (KYLIN-1851) Improve build dictionary, consider that input is already sorted

2016-12-10 Thread XIE FAN (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-1851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

XIE FAN resolved KYLIN-1851.

Resolution: Fixed

> Improve build dictionary, consider that input is already sorted
> ---
>
> Key: KYLIN-1851
> URL: https://issues.apache.org/jira/browse/KYLIN-1851
> Project: Kylin
>  Issue Type: Bug
>Reporter: liyang
>Assignee: XIE FAN
> Attachments: 
> 0001-KYLIN-1851-unfinished-add-TrieDictionaryForest-and-N.patch
>
>
> Currently the dictionary build may encounter OOM when cardinality is huge. It 
> can benefit from the fact that the input (which is the output of the 
> FactDistinctColumn reducer) is already sorted.





[jira] [Closed] (KYLIN-1773) Model should not be editable if used by cubes

2016-12-10 Thread Dong Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-1773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dong Li closed KYLIN-1773.
--
Resolution: Duplicate

> Model should not be editable if used by cubes
> -
>
> Key: KYLIN-1773
> URL: https://issues.apache.org/jira/browse/KYLIN-1773
> Project: Kylin
>  Issue Type: Bug
>Affects Versions: v1.5.2
>Reporter: Dong Li
>Assignee: Dong Li
>Priority: Minor
>
> With sample data
> 1. build cube kylin_sales_cube.
> 2. edit model kylin_sales_model
> 3. change fact table to other tables
> 4. save
> Actual: the model is saved, and cube/model loading will fail
> Expected: the model cannot be saved, with a warning message like "Cannot edit 
> model because there are cube references."





[jira] [Commented] (KYLIN-2117) support add edit model even cube is ready

2016-12-10 Thread hongbin ma (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-2117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15737967#comment-15737967
 ] 

hongbin ma commented on KYLIN-2117:
---

[~Zhixiong Chen] what's the status now? 

> support add edit model even cube is ready
> -
>
> Key: KYLIN-2117
> URL: https://issues.apache.org/jira/browse/KYLIN-2117
> Project: Kylin
>  Issue Type: Improvement
>  Components: Metadata
>Affects Versions: v1.5.4.1
>Reporter: Zhong,Jason
>Assignee: Zhixiong Chen
> Fix For: v1.6.1
>
>
> Currently, after creating a model and cube, once the cube is built to 'READY' 
> status, we cannot edit the model. But users may want to add a new column to 
> the model, so it's necessary to support editing the model, with checks at the backend.





[jira] [Resolved] (KYLIN-2176) org.apache.kylin.rest.service.JobService#submitJob will leave orphan NEW segment in cube when exception is met

2016-12-10 Thread hongbin ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-2176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hongbin ma resolved KYLIN-2176.
---
   Resolution: Fixed
Fix Version/s: v1.6.1

commit id:a6e8c35da8e2c6d944dce4c7d261a541c312b7f6

> org.apache.kylin.rest.service.JobService#submitJob will leave orphan NEW 
> segment in cube when exception is met
> --
>
> Key: KYLIN-2176
> URL: https://issues.apache.org/jira/browse/KYLIN-2176
> Project: Kylin
>  Issue Type: Bug
>Reporter: hongbin ma
>Assignee: hongbin ma
> Fix For: v1.6.1
>
>






[jira] [Commented] (KYLIN-1793) Job couldn't stop when hive commands got error with beeline

2016-12-10 Thread Dong Li (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-1793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15737952#comment-15737952
 ] 

Dong Li commented on KYLIN-1793:


Hi Hongbin and Shaofeng,

The commit is: 
https://github.com/apache/kylin/commit/99f1dd9d2460566748db2167e69a6a8a7689271d

The root cause is: we previously used "beeline -f xxx.sql;rm xxx.sql" to execute 
the beeline command. The "rm xxx.sql" part always succeeds and swallows the exit 
code of "beeline -f xxx.sql".
The fix is to use a variable to record the exit code and return it at the end.
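
The pattern described above (record the exit code, clean up, then return the recorded code) can be sketched as follows. This is an illustration in Python rather than the actual change in the commit. A small Python script that exits with status 3 stands in for the failing "beeline -f xxx.sql" call, and run_then_cleanup is a hypothetical helper name.

```python
import os
import subprocess
import sys
import tempfile


def run_then_cleanup(command, script_path):
    """Run `command` on `script_path`, delete the script, return the exit code."""
    try:
        # Record the exit code of the real work before any cleanup happens.
        exit_code = subprocess.call(command + [script_path])
    finally:
        # Cleanup always runs, but it cannot replace the recorded exit code.
        if os.path.exists(script_path):
            os.remove(script_path)
    return exit_code


# Stand-in for the failing command: a script that exits with status 3.
with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as handle:
    handle.write("import sys\nsys.exit(3)\n")
    script = handle.name

status = run_then_cleanup([sys.executable], script)
```

The caller sees the failure status of the command itself, and the temporary script is still removed.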

Thanks.

> Job couldn't stop when hive commands got error with beeline
> ---
>
> Key: KYLIN-1793
> URL: https://issues.apache.org/jira/browse/KYLIN-1793
> Project: Kylin
>  Issue Type: Bug
>  Components: Job Engine
>Affects Versions: v1.5.0, v1.5.1, v1.5.2
>Reporter: Shaofeng SHI
>Assignee: Dong Li
>  Labels: newbie
> Fix For: v1.6.1
>
>
> Configure Kylin to use beeline as the hive command line and submit a cube build 
> job. The job moves to 100% with success, but I found there was an error in the 
> hive-related steps that wasn't captured by Kylin.





[jira] [Commented] (KYLIN-1855) Inner-join query partially matches inner-join model can return incorrect result

2016-12-10 Thread hongbin ma (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-1855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15737949#comment-15737949
 ] 

hongbin ma commented on KYLIN-1855:
---

I don't think it's a good idea to reopen this JIRA, which became part of history 
when it went public with prior releases. Maybe it's better to open a new JIRA 
and link it to this one, so that newcomers can figure out clearly when and what 
happened.

> Inner-join query partially matches inner-join model can return incorrect 
> result
> ---
>
> Key: KYLIN-1855
> URL: https://issues.apache.org/jira/browse/KYLIN-1855
> Project: Kylin
>  Issue Type: Improvement
>Reporter: Zhong Yanghong
>Assignee: liyang
> Fix For: v1.5.4
>
> Attachments: exclude_unused_joins.patch
>
>
> A cube is based on a model in which a star schema is defined. In some cases, 
> the cube uses only a few of the lookup tables rather than all of them. In this 
> case, when creating the SQL for the flat table, the unused lookup tables should 
> not be included. Otherwise, it will confuse users when they query: if users query 
> according to the definition of the flat table, a "no realization" error will 
> occur due to the missing join.





[jira] [Commented] (KYLIN-2088) Support intersect count for calculation of retention or conversion rates

2016-12-10 Thread hongbin ma (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-2088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15737937#comment-15737937
 ] 

hongbin ma commented on KYLIN-2088:
---

sorry, it should be Yerui :)

> Support intersect count for calculation of retention or conversion rates
> 
>
> Key: KYLIN-2088
> URL: https://issues.apache.org/jira/browse/KYLIN-2088
> Project: Kylin
>  Issue Type: New Feature
>  Components: Query Engine
>Reporter: Yerui Sun
>Assignee: Yerui Sun
> Fix For: v1.6.0
>
> Attachments: KYLIN-2088.patch
>
>
> Retention or conversion rate is very important in data analysis. 
> It can be calculated from two datasets for two different values of one 
> dimension. For example, we have a count distinct measure, like uv (a dataset of 
> uuids), and one dimension, like date; the retention of uv between 
> '20161015' and '20161016' is the intersection of the two uv datasets.
> Fortunately, we have implemented such datasets in Kylin, as bitmaps, for precise 
> count distinct. Only a UDAF is needed to calculate the intersection of two or 
> more bitmaps.
> I'll try this and post a patch later.
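
The calculation described in the issue can be sketched with plain Python sets, which stand in here for the bitmaps Kylin uses for precise count distinct. The user IDs are made-up sample data and retention is a hypothetical helper, not the UDAF itself.

```python
def retention(uv_by_date, first_day, second_day):
    """Return (retained user count, retention rate) between two dates."""
    first = uv_by_date[first_day]
    second = uv_by_date[second_day]
    # Users seen on both days: the intersection of the two uv datasets.
    retained = first & second
    rate = len(retained) / len(first) if first else 0.0
    return len(retained), rate


# Hypothetical uv datasets keyed by the date dimension.
uv_by_date = {
    "20161015": {"u1", "u2", "u3", "u4"},
    "20161016": {"u2", "u4", "u5"},
}
retained_count, retention_rate = retention(uv_by_date, "20161015", "20161016")
```

With this sample data, two of the four users from the first day return on the second day, so the retention rate is 0.5.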





[jira] [Commented] (KYLIN-2088) Support intersect count for calculation of retention or conversion rates

2016-12-10 Thread hongbin ma (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-2088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15737935#comment-15737935
 ] 

hongbin ma commented on KYLIN-2088:
---

Thanks dayue, the blog is already there: 
http://kylin.apache.org/blog/2016/11/28/intersect-count/

> Support intersect count for calculation of retention or conversion rates
> 
>
> Key: KYLIN-2088
> URL: https://issues.apache.org/jira/browse/KYLIN-2088
> Project: Kylin
>  Issue Type: New Feature
>  Components: Query Engine
>Reporter: Yerui Sun
>Assignee: Yerui Sun
> Fix For: v1.6.0
>
> Attachments: KYLIN-2088.patch
>
>
> Retention or conversion rate is very important in data analysis. 
> It can be calculated from two datasets for two different values of one 
> dimension. For example, we have a count distinct measure, like uv (a dataset of 
> uuids), and one dimension, like date; the retention of uv between 
> '20161015' and '20161016' is the intersection of the two uv datasets.
> Fortunately, we have implemented such datasets in Kylin, as bitmaps, for precise 
> count distinct. Only a UDAF is needed to calculate the intersection of two or 
> more bitmaps.
> I'll try this and post a patch later.





[jira] [Closed] (KYLIN-2210) call CubeStatsReader.print at SaveStatisticsStep

2016-12-10 Thread hongbin ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-2210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hongbin ma closed KYLIN-2210.
-
Resolution: Won't Fix

> call CubeStatsReader.print at SaveStatisticsStep
> 
>
> Key: KYLIN-2210
> URL: https://issues.apache.org/jira/browse/KYLIN-2210
> Project: Kylin
>  Issue Type: Improvement
>Reporter: hongbin ma
>Assignee: hongbin ma
>
> The output of CubeStatsReader is usually helpful to modellers. We'll first 
> output it in kylin.log





[jira] [Commented] (KYLIN-1793) Job couldn't stop when hive commands got error with beeline

2016-12-10 Thread Shaofeng SHI (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-1793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15737905#comment-15737905
 ] 

Shaofeng SHI commented on KYLIN-1793:
-

[~lidong_sjtu] could you please add a description of how this was fixed? Thanks

> Job couldn't stop when hive commands got error with beeline
> ---
>
> Key: KYLIN-1793
> URL: https://issues.apache.org/jira/browse/KYLIN-1793
> Project: Kylin
>  Issue Type: Bug
>  Components: Job Engine
>Affects Versions: v1.5.0, v1.5.1, v1.5.2
>Reporter: Shaofeng SHI
>Assignee: Dong Li
>  Labels: newbie
> Fix For: v1.6.1
>
>
> Configure Kylin to use beeline as the hive command line and submit a cube build 
> job. The job moves to 100% with success, but I found there was an error in the 
> hive-related steps that wasn't captured by Kylin.





[jira] [Updated] (KYLIN-2227) rename kylin-log4j.properties to kylin-tools-log4j.properties and move it to global conf folder

2016-12-10 Thread hongbin ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-2227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hongbin ma updated KYLIN-2227:
--
Fix Version/s: v1.6.1

> rename kylin-log4j.properties to kylin-tools-log4j.properties and move it to 
> global conf folder
> ---
>
> Key: KYLIN-2227
> URL: https://issues.apache.org/jira/browse/KYLIN-2227
> Project: Kylin
>  Issue Type: Improvement
>Reporter: hongbin ma
>Assignee: hongbin ma
> Fix For: v1.6.1
>
>






[jira] [Commented] (KYLIN-227) Support "Pause" on Kylin Job

2016-12-10 Thread hongbin ma (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15737889#comment-15737889
 ] 

hongbin ma commented on KYLIN-227:
--

it's very useful, thanks shaofeng!

> Support "Pause" on Kylin Job
> 
>
> Key: KYLIN-227
> URL: https://issues.apache.org/jira/browse/KYLIN-227
> Project: Kylin
>  Issue Type: Wish
>  Components: Job Engine
>Reporter: Luke Han
>Assignee: Shaofeng SHI
>  Labels: github-import
> Fix For: v1.6.1
>
>
> Add one action called "Pause" to stop current job, user could resume this job 
> later.
> ![image|https://cloud.githubusercontent.com/assets/1104017/5556023/54ae27e2-8d07-11e4-8efb-a22c041243ba.png]
>  Imported from GitHub 
> Url: https://github.com/KylinOLAP/Kylin/issues/278
> Created by: [lukehan|https://github.com/lukehan]
> Labels: newfeature, 
> Created at: Fri Dec 26 13:59:03 CST 2014
> State: open





[jira] [Commented] (KYLIN-1793) Job couldn't stop when hive commands got error with beeline

2016-12-10 Thread hongbin ma (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-1793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15737880#comment-15737880
 ] 

hongbin ma commented on KYLIN-1793:
---

What is the final solution to this JIRA? It was resolved without any explanation. 
I can't find anything in KYLIN-1603 either.

> Job couldn't stop when hive commands got error with beeline
> ---
>
> Key: KYLIN-1793
> URL: https://issues.apache.org/jira/browse/KYLIN-1793
> Project: Kylin
>  Issue Type: Bug
>  Components: Job Engine
>Affects Versions: v1.5.0, v1.5.1, v1.5.2
>Reporter: Shaofeng SHI
>Assignee: Dong Li
>  Labels: newbie
> Fix For: v1.6.1
>
>
> Configure Kylin to use beeline as the hive command line and submit a cube build 
> job. The job moves to 100% with success, but I found there was an error in the 
> hive-related steps that wasn't captured by Kylin.





[jira] [Commented] (KYLIN-2217) Reducers build dictionaries locally

2016-12-10 Thread hongbin ma (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-2217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15737863#comment-15737863
 ] 

hongbin ma commented on KYLIN-2217:
---

hi [~xiefan46] please specify fixed version if possible

> Reducers build dictionaries locally
> ---
>
> Key: KYLIN-2217
> URL: https://issues.apache.org/jira/browse/KYLIN-2217
> Project: Kylin
>  Issue Type: Improvement
>Affects Versions: v1.5.4.1
>Reporter: XIE FAN
>Assignee: XIE FAN
> Fix For: Future
>
> Attachments: 0001-KYLIN-2217-Reducers-build-dictionaries-locally.patch
>
>
> In KYLIN-1851, we reduced the peak memory usage of the dictionary-building 
> procedure by splitting a single Trie tree structure into a Trie forest. But there 
> still exists a bottleneck: all the dictionaries are built in the Kylin client. 
> In this issue, we want to use multiple reducers to build different dictionaries 
> locally and concurrently, which can further reduce the peak memory usage as 
> well as speed up the dictionary-building procedure.





[jira] [Commented] (KYLIN-1851) Improve build dictionary, consider that input is already sorted

2016-12-10 Thread hongbin ma (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-1851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15737861#comment-15737861
 ] 

hongbin ma commented on KYLIN-1851:
---

what's the status of this JIRA and KYLIN-1178?

> Improve build dictionary, consider that input is already sorted
> ---
>
> Key: KYLIN-1851
> URL: https://issues.apache.org/jira/browse/KYLIN-1851
> Project: Kylin
>  Issue Type: Bug
>Reporter: liyang
>Assignee: XIE FAN
> Attachments: 
> 0001-KYLIN-1851-unfinished-add-TrieDictionaryForest-and-N.patch
>
>
> Currently the dictionary build may encounter OOM when cardinality is huge. It 
> can benefit from the fact that the input (which is the output of the 
> FactDistinctColumn reducer) is already sorted.





[jira] [Commented] (KYLIN-2242) Directly write hdfs file in reducer is dangerous

2016-12-10 Thread hongbin ma (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-2242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15737854#comment-15737854
 ] 

hongbin ma commented on KYLIN-2242:
---

+1 

I noticed the same potential issue recently. The whole write procedure finishes 
fast, so it's not easy to reproduce

> Directly write hdfs file in reducer is dangerous
> 
>
> Key: KYLIN-2242
> URL: https://issues.apache.org/jira/browse/KYLIN-2242
> Project: Kylin
>  Issue Type: Bug
>  Components: Job Engine
>Affects Versions: v1.6.0
>Reporter: kangkaisen
>Assignee: kangkaisen
>
> Currently, Kylin writes HDFS files directly in {{FactDistinctColumnsReducer}}, 
> which is dangerous because MapReduce speculative execution can result in 
> more than one reducer writing the same HDFS file at the same time. 
> After KYLIN-2217, I think this issue will occur with higher probability. We 
> should output the values via {{context.write}} in the reducer.
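
The hazard described here is two task attempts writing the same final file at once. Going through the framework's output path avoids it because each attempt's output is kept separate until one attempt is committed. The sketch below imitates that general pattern on a local filesystem: each attempt writes to its own private file and publishes it with an atomic rename. It is an illustration only, not Hadoop or Kylin code. The names write_attempt_output and distinct_values.txt are hypothetical, and unlike the real framework, which commits a single attempt, this simplified version lets the last rename win.

```python
import os
import tempfile


def write_attempt_output(final_path, attempt_id, data):
    """Write to an attempt-private file, then publish it with an atomic rename."""
    attempt_path = "{}.{}.tmp".format(final_path, attempt_id)
    with open(attempt_path, "w") as handle:
        handle.write(data)
    # os.replace swaps the file in as one step, so readers never see a file
    # that two attempts have partially written at the same time.
    os.replace(attempt_path, final_path)
    return final_path


# Two speculative attempts of the same task producing the same output.
output_dir = tempfile.mkdtemp()
target = os.path.join(output_dir, "distinct_values.txt")
write_attempt_output(target, "attempt_0", "a\nb\n")
write_attempt_output(target, "attempt_1", "a\nb\n")
```

After both attempts finish, the directory holds exactly one complete output file and no leftover attempt files.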





[jira] [Resolved] (KYLIN-2246) redesign the way to decide layer cubing reducer count

2016-12-10 Thread hongbin ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-2246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hongbin ma resolved KYLIN-2246.
---
   Resolution: Fixed
Fix Version/s: v1.6.1

> redesign the way to decide layer cubing reducer count
> -
>
> Key: KYLIN-2246
> URL: https://issues.apache.org/jira/browse/KYLIN-2246
> Project: Kylin
>  Issue Type: Improvement
>Reporter: hongbin ma
>Assignee: hongbin ma
> Fix For: v1.6.1
>
>
> currently the sizing algorithm does not leverage CubeStatsReader





[jira] [Commented] (KYLIN-2251) JDBC Driver httpcore dependency conflict

2016-12-10 Thread Billy Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15737820#comment-15737820
 ] 

Billy Liu commented on KYLIN-2251:
--

I know that depending on the Maven loading order is not a common fix. I did it 
because I don't think there was a real conflict here: no exception, no wrong 
result, all ITs passed. So I did a very lightweight hack for this update.

> JDBC Driver httpcore dependency conflict
> 
>
> Key: KYLIN-2251
> URL: https://issues.apache.org/jira/browse/KYLIN-2251
> Project: Kylin
>  Issue Type: Bug
>  Components: Driver - JDBC
>Affects Versions: v1.6.0
>Reporter: Billy Liu
>Assignee: Billy Liu
>Priority: Minor
> Fix For: v1.6.1
>
>
> Report by xwhfcenter from github:
> "There is a conflict in dependency of httpcore in module JDBC Driver"





[jira] [Commented] (KYLIN-2247) Automatically flush cache after executing "sample.sh" or "metadata.sh restore"

2016-12-10 Thread hongbin ma (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-2247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15737818#comment-15737818
 ] 

hongbin ma commented on KYLIN-2247:
---

Is there any plan for authorising such APIs? It doesn't seem proper to leave the 
"reload metadata" permission too open.

> Automatically flush cache after executing "sample.sh" or "metadata.sh restore"
> --
>
> Key: KYLIN-2247
> URL: https://issues.apache.org/jira/browse/KYLIN-2247
> Project: Kylin
>  Issue Type: Improvement
>  Components: Client - CLI
>Reporter: Shaofeng SHI
> Fix For: Backlog
>
>
> Today, after running "sample.sh" to create the sample cube, or after running 
> "metadata.sh restore" to restore the metadata, the user needs to explicitly 
> go to Kylin's web UI and "reload metadata". In a clustered deployment, the 
> user has to do this on each node, which is laborious.
> The scripts should call the Kylin REST API to flush the caches automatically.
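
The per-node flush described above could be sketched roughly as follows. This is only an illustration, not the change proposed in the issue: the endpoint path in CACHE_FLUSH_PATH is an assumption that should be checked against the Kylin version in use, and the node list and credentials are placeholders. The helper names build_flush_requests and flush_all are hypothetical.

```python
import base64
import urllib.request

# Assumption: path of the cache-flush ("reload metadata") REST endpoint.
# Verify it against the REST API of your Kylin version before relying on it.
CACHE_FLUSH_PATH = "/kylin/api/cache/all/all/update"


def build_flush_requests(nodes, user, password):
    """Build one PUT request per Kylin node; nothing is sent here."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    requests = []
    for node in nodes:
        requests.append(
            urllib.request.Request(
                url=f"http://{node}{CACHE_FLUSH_PATH}",
                method="PUT",
                headers={"Authorization": f"Basic {token}"},
            )
        )
    return requests


def flush_all(nodes, user, password, timeout=10):
    """Send the flush request to every node and return {node: HTTP status}."""
    statuses = {}
    for node, req in zip(nodes, build_flush_requests(nodes, user, password)):
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            statuses[node] = resp.status
    return statuses
```

A script such as "sample.sh" could call flush_all(["host1:7070", "host2:7070"], user, password) as its last step, with hypothetical host names, so that no node has to be reloaded by hand. Building the requests separately from sending them keeps the URL and header logic easy to check without a running cluster.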





[jira] [Comment Edited] (KYLIN-2262) Kafka Streaming Cube build error

2016-12-10 Thread Shaofeng SHI (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-2262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15737739#comment-15737739
 ] 

Shaofeng SHI edited comment on KYLIN-2262 at 12/10/16 11:43 AM:


Need Kafka 0.10 or above, as stated in the doc: 
https://kylin.apache.org/docs16/tutorial/cube_streaming.html


was (Author: shaofengshi):
Need Kafka 0.10 or above.

> Kafka Streaming Cube build error  
> --
>
> Key: KYLIN-2262
> URL: https://issues.apache.org/jira/browse/KYLIN-2262
> Project: Kylin
>  Issue Type: Test
>  Components: Client - CLI
>Affects Versions: v1.6.0
> Environment: CDH1.5.7
> Kylin1.6 
> KAFKA-2.0.2-1.2.0.2.p0.5
>Reporter: QiLiFei
>Assignee: Shaofeng SHI
>Priority: Blocker
>  Labels: features
> Attachments: kylin.logError.txt
>
>
> When I build the Kafka streaming cube according to the doc 
> (http://kylin.apache.org/docs16/tutorial/cube_streaming.html), it always 
> raises this error in the CLI: 
> {"url":"http://172.31.18.12:7070/kylin/api/cubes/StreamingCube9/build2","exception":null}
> In kylin.log, there is only a "Java.lang.NullPointerException". 
> I'm not sure what exactly happened there. Please give me some support!





[jira] [Commented] (KYLIN-1936) Improve enable limit logic (exactAggregation is too strict)

2016-12-10 Thread hongbin ma (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-1936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15737532#comment-15737532
 ] 

hongbin ma commented on KYLIN-1936:
---

Hi, are there any abnormal entries in the region server's log? Can you pastebin the 
log? Can you reproduce the issue with our sample cube? 
http://kylin.apache.org/docs16/tutorial/kylin_sample.html

> Improve enable limit logic (exactAggregation is too strict)
> ---
>
> Key: KYLIN-1936
> URL: https://issues.apache.org/jira/browse/KYLIN-1936
> Project: Kylin
>  Issue Type: Improvement
>Affects Versions: v1.5.3
>Reporter: hongbin ma
>Assignee: hongbin ma
> Fix For: v1.5.4
>
>
> From zhaotians...@meizu.com:
> Recently I got the following error while executing a query on a cube which is not 
> that big (about 400 MB, 20 million records):
> ==
> Error while executing SQL "select FCRASHTIME,count(1) from 
> UXIP.EDL_FDT_OUC_UPLOAD_FILES group by FCRASH_ANALYSIS_ID,FCRASHTIME limit 
> 1": Scan row count exceeded threshold: 1000, please add filter condition 
> to narrow down backend scan range, like where clause.
> I guess what it scanned was the intermediate result, but the query doesn't have any 
> order by, and the result count is limited to just 1, so it could scan until it finds 
> any record with those two dimensions and, voilà, be done.





[jira] [Resolved] (KYLIN-2261) Cleanup Hbase Storage issue

2016-12-10 Thread hongbin ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-2261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hongbin ma resolved KYLIN-2261.
---
   Resolution: Fixed
 Assignee: hongbin ma
Fix Version/s: v1.6.0

> Cleanup Hbase Storage issue
> ---
>
> Key: KYLIN-2261
> URL: https://issues.apache.org/jira/browse/KYLIN-2261
> Project: Kylin
>  Issue Type: Test
>  Components: Client - CLI
>Affects Versions: v1.6.0
> Environment: CDH-5.7.2
> Hbase1.2
> Kylin1.6
>Reporter: QiLiFei
>Assignee: hongbin ma
>Priority: Critical
>  Labels: test
> Fix For: v1.6.0
>
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> When I try to run the command below according to the doc 
> (http://kylin.apache.org/docs16/howto/howto_cleanup_storage.html), it 
> always raises the error "Error: Could not find or load main class 
> org.apache.kylin.tool.StorageCleanupJob"
> Command: 
>  /opt/kylin/bin/kylin.sh  org.apache.kylin.tool.StorageCleanupJob --delete 
> false
>  
> Is the class in 'kylin-storage-hbase.jar'?
> And should it be put into $KYLIN_HOME/lib/?
> I've put the jar file in $KYLIN_HOME/lib and $HBase_Home/lib/ and set its 
> permissions to 777. However, it still does not work.
> If I'm wrong, please correct me. Thanks.
>  





[jira] [Commented] (KYLIN-2261) Cleanup Hbase Storage issue

2016-12-10 Thread hongbin ma (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-2261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15737479#comment-15737479
 ] 

hongbin ma commented on KYLIN-2261:
---

As [~yimingliu] said, it was caused by a hasty document update. I have reverted 
the change to the document.

> Cleanup Hbase Storage issue
> ---
>
> Key: KYLIN-2261
> URL: https://issues.apache.org/jira/browse/KYLIN-2261
> Project: Kylin
>  Issue Type: Test
>  Components: Client - CLI
>Affects Versions: v1.6.0
> Environment: CDH-5.7.2
> Hbase1.2
> Kylin1.6
>Reporter: QiLiFei
>Priority: Critical
>  Labels: test
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> When I try to run the command below according to the doc 
> (http://kylin.apache.org/docs16/howto/howto_cleanup_storage.html), it 
> always raises the error "Error: Could not find or load main class 
> org.apache.kylin.tool.StorageCleanupJob"
> Command: 
>  /opt/kylin/bin/kylin.sh  org.apache.kylin.tool.StorageCleanupJob --delete 
> false
>  
> Is the class in 'kylin-storage-hbase.jar'?
> And should it be put into $KYLIN_HOME/lib/?
> I've put the jar file in $KYLIN_HOME/lib and $HBase_Home/lib/ and set its 
> permissions to 777. However, it still does not work.
> If I'm wrong, please correct me. Thanks.
>  





[jira] [Updated] (KYLIN-2144) move useful operation tools to org.apache.kylin.tool

2016-12-10 Thread hongbin ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-2144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hongbin ma updated KYLIN-2144:
--
Description: 
Due to historical reasons, the following 5 operation tools:

StorageCleanupJob, MetadataCleanupJob, CubeMigrationCLI, 
CubeMigrationCheckCLI, ExtendCubeToHybridCLI

are located in org.apache.kylin.storage.hbase.util, which brings dependency issues 
and other concerns. 

In 1.6.1 and later, we'll move the 5 tools to org.apache.kylin.tool. The old 
Java classes will be marked as deprecated and will no longer be maintained.

  was:
Due to historical reasons, the following 5 operation tools:

StorageCleanupJob, MetadataCleanupJob, CubeMigrationCLI, 
CubeMigrationCheckCLI, ExtendCubeToHybridCLI

are located in org.apache.kylin.storage.hbase.util, which brings dependency issues 
and other concerns. 

In 1.6.0 and later, we'll move the 5 tools to org.apache.kylin.tool. The old 
Java classes will be marked as deprecated and will no longer be maintained.


> move useful operation tools to org.apache.kylin.tool
> 
>
> Key: KYLIN-2144
> URL: https://issues.apache.org/jira/browse/KYLIN-2144
> Project: Kylin
>  Issue Type: Improvement
>Reporter: hongbin ma
>Assignee: hongbin ma
> Fix For: v1.6.1
>
>
> Due to historical reasons, the following 5 operation tools:
> StorageCleanupJob, MetadataCleanupJob, CubeMigrationCLI, 
> CubeMigrationCheckCLI, ExtendCubeToHybridCLI
> are located in org.apache.kylin.storage.hbase.util, which brings dependency 
> issues and other concerns. 
> In 1.6.1 and later, we'll move the 5 tools to org.apache.kylin.tool. The old 
> Java classes will be marked as deprecated and will no longer be maintained.
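
For scripts that must work on both sides of this move, the package switch can be sketched as a small lookup. This is a minimal illustration that assumes the move lands in 1.6.1 as the description says; the helper names parse_version and tool_class are hypothetical and not part of Kylin.

```python
# The five tools and the two packages are taken from the issue description.
MOVED_TOOLS = {
    "StorageCleanupJob",
    "MetadataCleanupJob",
    "CubeMigrationCLI",
    "CubeMigrationCheckCLI",
    "ExtendCubeToHybridCLI",
}
OLD_PACKAGE = "org.apache.kylin.storage.hbase.util"
NEW_PACKAGE = "org.apache.kylin.tool"

# Assumption: the first release that ships the tools in the new package.
MOVE_VERSION = (1, 6, 1)


def parse_version(text):
    """Turn a version string such as 'v1.6.1' or '1.5.4.1' into a tuple of ints."""
    return tuple(int(part) for part in text.lstrip("v").split("."))


def tool_class(tool, kylin_version):
    """Return the fully qualified class name to use for a given Kylin version."""
    if tool not in MOVED_TOOLS:
        raise ValueError(f"not one of the moved tools: {tool}")
    moved = parse_version(kylin_version) >= MOVE_VERSION
    package = NEW_PACKAGE if moved else OLD_PACKAGE
    return f"{package}.{tool}"
```

With this, tool_class("StorageCleanupJob", "v1.6.0") yields the old org.apache.kylin.storage.hbase.util name, which is the class a 1.6.0 installation can actually load, while "v1.6.1" yields the org.apache.kylin.tool name.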





[jira] [Updated] (KYLIN-2144) move useful operation tools to org.apache.kylin.tool

2016-12-10 Thread hongbin ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-2144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hongbin ma updated KYLIN-2144:
--
Fix Version/s: v1.6.1

> move useful operation tools to org.apache.kylin.tool
> 
>
> Key: KYLIN-2144
> URL: https://issues.apache.org/jira/browse/KYLIN-2144
> Project: Kylin
>  Issue Type: Improvement
>Reporter: hongbin ma
>Assignee: hongbin ma
> Fix For: v1.6.1
>
>
> Due to historical reasons, the following 5 operation tools:
> StorageCleanupJob, MetadataCleanupJob, CubeMigrationCLI, 
> CubeMigrationCheckCLI, ExtendCubeToHybridCLI
> are located in org.apache.kylin.storage.hbase.util, which brings dependency 
> issues and other concerns. 
> In 1.6.0 and later, we'll move the 5 tools to org.apache.kylin.tool. The old 
> Java classes will be marked as deprecated and will no longer be maintained.


