[jira] [Created] (KYLIN-3694) Kylin On Druid Storage

2018-11-18 Thread kangkaisen (JIRA)
kangkaisen created KYLIN-3694:
-

 Summary: Kylin On Druid Storage
 Key: KYLIN-3694
 URL: https://issues.apache.org/jira/browse/KYLIN-3694
 Project: Kylin
  Issue Type: New Feature
  Components: Job Engine, Metadata, Query Engine
Affects Versions: v2.5.0
Reporter: kangkaisen
Assignee: kangkaisen
 Attachments: Kylin On Druid Storage.pdf

The Meituan Kylin team has implemented a new storage engine for Kylin: the
Druid Storage Engine.
The attached file is the architecture design doc for the Kylin On Druid Storage
Engine.
We would like to contribute this feature to the community; please let us know
if you have any concerns. [^Kylin On Druid Storage.pdf]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (KYLIN-3425) Kylin v2.3.2 Release

2018-06-24 Thread kangkaisen (JIRA)
kangkaisen created KYLIN-3425:
-

 Summary: Kylin v2.3.2 Release
 Key: KYLIN-3425
 URL: https://issues.apache.org/jira/browse/KYLIN-3425
 Project: Kylin
  Issue Type: Task
Reporter: kangkaisen
Assignee: kangkaisen
 Fix For: v2.3.2








[jira] [Created] (KYLIN-3205) Allow one column is used for both dimension and precisely count distinct measure

2018-01-27 Thread kangkaisen (JIRA)
kangkaisen created KYLIN-3205:
-

 Summary: Allow one column is used for both dimension and precisely 
count distinct measure
 Key: KYLIN-3205
 URL: https://issues.apache.org/jira/browse/KYLIN-3205
 Project: Kylin
  Issue Type: Bug
  Components: Metadata
Affects Versions: v2.2.0
Reporter: kangkaisen
Assignee: kangkaisen


I introduced a bug in KYLIN-2316. We should allow one column to be used for
both a dimension and a precise count distinct measure, as long as the dimension
encoding is not dict.
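
The intended rule could be sketched as follows. This is only an illustration:
the class, method and parameter names are hypothetical, and the real check in
Kylin's metadata validation may look different.

```java
// Hypothetical sketch of the rule described in this issue; not Kylin code.
final class DimMeasureRuleSketch {
    // A column may serve as both a dimension and a precise count distinct
    // measure unless the dimension uses the "dict" encoding.
    static boolean allowed(boolean isDimension, boolean isPreciseCountDistinct, String dimEncoding) {
        if (isDimension && isPreciseCountDistinct) {
            return !"dict".equals(dimEncoding);
        }
        return true;
    }
}
```

With this rule, a column that is both a dimension and a precise count distinct
measure is rejected only for the dict encoding.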





[jira] [Created] (KYLIN-3133) Fix KYLIN-2717 compatibility issue

2017-12-26 Thread kangkaisen (JIRA)
kangkaisen created KYLIN-3133:
-

 Summary: Fix KYLIN-2717 compatibility issue
 Key: KYLIN-3133
 URL: https://issues.apache.org/jira/browse/KYLIN-3133
 Project: Kylin
  Issue Type: Bug
  Components: Metadata, Tools, Build and Test
Affects Versions: v2.2.0
Reporter: kangkaisen
Assignee: kangkaisen


Fix the KYLIN-2717 compatibility issues:
1. Keep the old getTableDesc API so that users can do a rolling upgrade to
v2.2.0 when they have dozens of QueryServers.

2. Use tableRef.getTableDesc().getProject() instead of modelDesc.getProject()
to be compatible with the old table resource path format.





[jira] [Created] (KYLIN-3117) Hide project config in cube level

2017-12-19 Thread kangkaisen (JIRA)
kangkaisen created KYLIN-3117:
-

 Summary: Hide project config in cube level
 Key: KYLIN-3117
 URL: https://issues.apache.org/jira/browse/KYLIN-3117
 Project: Kylin
  Issue Type: Improvement
  Components: Metadata
Affects Versions: v2.2.0
Reporter: kangkaisen
Assignee: kangkaisen


Currently, the project configs are put into the overrideKylinProps of the cube,
so normal users can see project configs at the cube level.

Generally, the project configs cover authentication, security, resources, query
restrictions and so on, so we shouldn't let normal users see them. The project
configs should only be visible to the Kylin Admin.





[jira] [Created] (KYLIN-3113) Editing Measure supports fuzzy search in web

2017-12-15 Thread kangkaisen (JIRA)
kangkaisen created KYLIN-3113:
-

 Summary: Editing Measure supports fuzzy search in web
 Key: KYLIN-3113
 URL: https://issues.apache.org/jira/browse/KYLIN-3113
 Project: Kylin
  Issue Type: Improvement
  Components: Web 
Affects Versions: v2.2.0
Reporter: kangkaisen
Assignee: kangkaisen


Since Kylin 2.0, a column in the web UI is shown as table name plus column
name, so prefix search is useless, which is a bad user experience. We should
support fuzzy search when editing a measure.





[jira] [Created] (KYLIN-3002) Use Spark as default engine for none-global-dict cube

2017-11-03 Thread kangkaisen (JIRA)
kangkaisen created KYLIN-3002:
-

 Summary: Use Spark as default engine for none-global-dict cube
 Key: KYLIN-3002
 URL: https://issues.apache.org/jira/browse/KYLIN-3002
 Project: Kylin
  Issue Type: Improvement
  Components: Web 
Reporter: kangkaisen
Assignee: kangkaisen


After KYLIN-2997, and similar to KYLIN-2963, we could use Spark as the default
engine for cubes without a global dict.





[jira] [Created] (KYLIN-3000) Add a tool supporting migrate Cubedesc across different HBase cluster

2017-11-03 Thread kangkaisen (JIRA)
kangkaisen created KYLIN-3000:
-

 Summary: Add a tool supporting migrate Cubedesc across different 
HBase cluster
 Key: KYLIN-3000
 URL: https://issues.apache.org/jira/browse/KYLIN-3000
 Project: Kylin
  Issue Type: New Feature
  Components: Tools, Build and Test
Reporter: kangkaisen
Assignee: kangkaisen
Priority: Major


Add a tool that supports migrating a CubeDesc across different HBase clusters.





[jira] [Created] (KYLIN-2999) One click migrate cube in web

2017-11-03 Thread kangkaisen (JIRA)
kangkaisen created KYLIN-2999:
-

 Summary: One click migrate cube in web
 Key: KYLIN-2999
 URL: https://issues.apache.org/jira/browse/KYLIN-2999
 Project: Kylin
  Issue Type: New Feature
  Components: Tools, Build and Test, Web 
Reporter: kangkaisen
Assignee: kangkaisen
Priority: Major


Currently, cube migration must be done by the Kylin Admin, which costs the
Kylin Admin a lot of time. We should allow users to migrate a cube with one
click in the web UI. Of course, this should be configurable.





[jira] [Created] (KYLIN-2998) Kill spark app when job was discarded

2017-11-03 Thread kangkaisen (JIRA)
kangkaisen created KYLIN-2998:
-

 Summary: Kill spark app when job was discarded
 Key: KYLIN-2998
 URL: https://issues.apache.org/jira/browse/KYLIN-2998
 Project: Kylin
  Issue Type: Improvement
  Components: Spark Engine
Affects Versions: v2.1.0
Reporter: kangkaisen
Assignee: kangkaisen
Priority: Major


Currently, when we discard a Spark job, the Spark job keeps running, and when
we restart the JobServer, the SparkExecutable submits a new Spark job. We
should handle Spark jobs the same way as MR jobs.





[jira] [Created] (KYLIN-2997) Allow change engineType even if there are segments in cube

2017-11-03 Thread kangkaisen (JIRA)
kangkaisen created KYLIN-2997:
-

 Summary: Allow change engineType even if there are segments in cube
 Key: KYLIN-2997
 URL: https://issues.apache.org/jira/browse/KYLIN-2997
 Project: Kylin
  Issue Type: Bug
  Components: Metadata, Web 
Affects Versions: v2.1.0
Reporter: kangkaisen
Assignee: kangkaisen
Priority: Major


Currently, the cube signature contains engineType, so if users want to switch
engines, they must purge the cube first. I think this is unreasonable because
the engine doesn't affect queries or existing segments.





[jira] [Created] (KYLIN-2996) DeployCoprocessorCLI Log failed tables info

2017-11-02 Thread kangkaisen (JIRA)
kangkaisen created KYLIN-2996:
-

 Summary: DeployCoprocessorCLI Log failed tables info
 Key: KYLIN-2996
 URL: https://issues.apache.org/jira/browse/KYLIN-2996
 Project: Kylin
  Issue Type: Improvement
  Components: Storage - HBase
Affects Versions: v2.1.0
Reporter: kangkaisen
Assignee: kangkaisen


Currently, updating the coprocessor occasionally fails for some tables. We
should report the failed tables to the user in the final output.





[jira] [Created] (KYLIN-2995) Set SparkContext.hadoopConfiguration to HadoopUtil in Spark Cuing

2017-11-02 Thread kangkaisen (JIRA)
kangkaisen created KYLIN-2995:
-

 Summary: Set SparkContext.hadoopConfiguration to HadoopUtil in 
Spark Cuing
 Key: KYLIN-2995
 URL: https://issues.apache.org/jira/browse/KYLIN-2995
 Project: Kylin
  Issue Type: Bug
  Components: Spark Engine
Affects Versions: v2.1.0
Reporter: kangkaisen
Assignee: kangkaisen
Priority: Major


Currently, we load metadata from HDFS in
SparkCubing: {{AbstractHadoopJob.loadKylinConfigFromHdfs}}. But HadoopUtil
creates a new Configuration; we should use SparkContext.hadoopConfiguration
instead.





[jira] [Created] (KYLIN-2994) Handle NPE when load dict in DictionaryManager

2017-11-02 Thread kangkaisen (JIRA)
kangkaisen created KYLIN-2994:
-

 Summary: Handle NPE when load dict in DictionaryManager
 Key: KYLIN-2994
 URL: https://issues.apache.org/jira/browse/KYLIN-2994
 Project: Kylin
  Issue Type: Bug
  Components: Metadata
Affects Versions: v2.1.0
Reporter: kangkaisen
Assignee: kangkaisen
Priority: Minor


Currently, the argument {{resourcePath}} in
{{DictionaryManager.getDictionaryInfo}} could be NULL, which leads to an NPE.
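
One way to guard against this, as a minimal sketch. The class below is a
hypothetical stand-in and not Kylin's actual DictionaryManager, and returning
null for a missing path is an assumption about the desired behavior.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for a dictionary cache; not Kylin's DictionaryManager.
final class DictLookupSketch {
    private final Map<String, String> cache = new ConcurrentHashMap<>();

    void put(String resourcePath, String dictInfo) {
        cache.put(resourcePath, dictInfo);
    }

    // Returns null instead of throwing when resourcePath is null.
    // Without the guard, ConcurrentHashMap.get(null) throws a NullPointerException.
    String getDictionaryInfo(String resourcePath) {
        if (resourcePath == null) {
            return null;
        }
        return cache.get(resourcePath);
    }
}
```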





[jira] [Created] (KYLIN-2993) Add special mr config for base cuboid step

2017-11-02 Thread kangkaisen (JIRA)
kangkaisen created KYLIN-2993:
-

 Summary: Add special mr config for base cuboid step
 Key: KYLIN-2993
 URL: https://issues.apache.org/jira/browse/KYLIN-2993
 Project: Kylin
  Issue Type: Improvement
  Components: Job Engine
Affects Versions: v2.1.0
Reporter: kangkaisen
Assignee: kangkaisen
Priority: Major


Refer to http://kylin.apache.org/blog/2016/08/01/count-distinct-in-kylin/.
Currently, if users want to enlarge MR memory for the global dict, they must
use kylin.engine.mr.config-override., which enlarges the memory of all MR jobs.
In fact, we only need to enlarge the memory for "Build Base Cuboid", so we
could add a special MR config for the base cuboid step.
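
A rough sketch of how a step-specific override could be layered on top of the
global one. The kylin.engine.mr.config-override. prefix is from this issue; the
base cuboid prefix, the class and the method names are assumptions made up for
illustration only.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of prefix-based config overrides; not Kylin code.
final class StepConfigSketch {
    static final String GLOBAL_PREFIX = "kylin.engine.mr.config-override.";
    // Assumed prefix for the base cuboid step; the real key name is not given here.
    static final String BASE_CUBOID_PREFIX = "kylin.engine.mr.base-cuboid-config-override.";

    // Collects the entries under a prefix and strips the prefix from their keys.
    static Map<String, String> withPrefix(Map<String, String> props, String prefix) {
        Map<String, String> out = new HashMap<>();
        for (Map.Entry<String, String> e : props.entrySet()) {
            if (e.getKey().startsWith(prefix)) {
                out.put(e.getKey().substring(prefix.length()), e.getValue());
            }
        }
        return out;
    }

    // Global overrides apply to every MR job; the step overrides win,
    // but only for the base cuboid step.
    static Map<String, String> resolve(Map<String, String> props, boolean isBaseCuboidStep) {
        Map<String, String> conf = withPrefix(props, GLOBAL_PREFIX);
        if (isBaseCuboidStep) {
            conf.putAll(withPrefix(props, BASE_CUBOID_PREFIX));
        }
        return conf;
    }
}
```

With such a layout, only the base cuboid step would see the larger memory
setting, and all other MR jobs would keep the global value.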





[jira] [Created] (KYLIN-2992) Avoid OOM in CubeHFileJob.Reducer

2017-11-02 Thread kangkaisen (JIRA)
kangkaisen created KYLIN-2992:
-

 Summary: Avoid OOM in  CubeHFileJob.Reducer
 Key: KYLIN-2992
 URL: https://issues.apache.org/jira/browse/KYLIN-2992
 Project: Kylin
  Issue Type: Improvement
  Components: Storage - HBase
Affects Versions: v2.1.0
Reporter: kangkaisen
Assignee: kangkaisen
Priority: Major


Following HBASE-13897, we could also improve CubeHFileJob.Reducer to avoid OOM.





[jira] [Created] (KYLIN-2838) Should get storageType in changeHtableHost of CubeMigrationCLI

2017-09-03 Thread kangkaisen (JIRA)
kangkaisen created KYLIN-2838:
-

 Summary: Should get storageType in changeHtableHost of 
CubeMigrationCLI
 Key: KYLIN-2838
 URL: https://issues.apache.org/jira/browse/KYLIN-2838
 Project: Kylin
  Issue Type: Bug
  Components: Tools, Build and Test
Affects Versions: v2.1.0
Reporter: kangkaisen
Assignee: kangkaisen
 Fix For: v2.2.0


We should get storageType in changeHtableHost of CubeMigrationCLI, not 
engineType.





[jira] [Created] (KYLIN-2764) Build the dict for UHC column with MR

2017-07-27 Thread kangkaisen (JIRA)
kangkaisen created KYLIN-2764:
-

 Summary: Build the dict for UHC column with MR
 Key: KYLIN-2764
 URL: https://issues.apache.org/jira/browse/KYLIN-2764
 Project: Kylin
  Issue Type: Improvement
  Components: Job Engine
Affects Versions: v2.0.0
Reporter: kangkaisen
Assignee: kangkaisen


KYLIN-2217 builds the dict for normal columns with MR, but the dict for UHC
columns is still built in the JobServer. Like KYLIN-2217, we could also use MR
to build the dict for UHC columns, which would relieve the memory pressure on
the JobServer, improve its job concurrency, and speed up the procedure when
there are multiple UHC columns.

The MR input is the output of "Extract Fact Table Distinct Columns", and the MR
output is the UHC column dict. Because it is very hard to build a global dict
with multiple reducers, I use one reducer per UHC column and allocate enough
memory to the reducer. According to my test, 8G memory is enough.





[jira] [Created] (KYLIN-2744) Should return correct type for SUM measure in web

2017-07-16 Thread kangkaisen (JIRA)
kangkaisen created KYLIN-2744:
-

 Summary: Should return correct type for SUM measure in web
 Key: KYLIN-2744
 URL: https://issues.apache.org/jira/browse/KYLIN-2744
 Project: Kylin
  Issue Type: Bug
  Components: Web 
Affects Versions: v2.0.0
Reporter: kangkaisen
Assignee: kangkaisen


Currently, Kylin returns the decimal type for the SUM measure of a double
column, which leads to wrong results. We should return the correct type for the
SUM measure in the web UI.





[jira] [Created] (KYLIN-2707) Fxi NPE in JobInfoConverter

2017-07-03 Thread kangkaisen (JIRA)
kangkaisen created KYLIN-2707:
-

 Summary: Fxi NPE in JobInfoConverter
 Key: KYLIN-2707
 URL: https://issues.apache.org/jira/browse/KYLIN-2707
 Project: Kylin
  Issue Type: Bug
  Components: Job Engine
Affects Versions: v2.0.0
Reporter: kangkaisen
Assignee: kangkaisen
Priority: Minor


The other day, I couldn't get all job info because the stepOutput for one job
in JobInfoConverter.parseToJobStep was NULL. I didn't dig into why stepOutput
was NULL, but since it can be NULL, I think we should handle it.





[jira] [Created] (KYLIN-2706) Should disable Storage limit push down when singleValuesD doesn't containsAll othersD

2017-07-03 Thread kangkaisen (JIRA)
kangkaisen created KYLIN-2706:
-

 Summary: Should disable Storage limit push down when singleValuesD 
doesn't containsAll othersD
 Key: KYLIN-2706
 URL: https://issues.apache.org/jira/browse/KYLIN-2706
 Project: Kylin
  Issue Type: Bug
  Components: Query Engine
Affects Versions: v2.0.0
Reporter: kangkaisen
Assignee: kangkaisen


Storage limit push down should be disabled for the SQL below, because this SQL
returns more than one record from the HBase tables, but
SortedIteratorMergerWithLimit only returns one record, which gives a wrong
result.

{code:sql}
SELECT sum(A)
FROM TABLE
WHERE date_id >= 20170624 and date_id <= 20170626
limit 1
{code}

We should disable Storage limit push down when singleValuesD doesn't
containsAll othersD.
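
The proposed condition could be sketched as below. singleValuesD and othersD
are the names used in this issue; reading them as "dimensions fixed to a single
value by the filter" and "dimensions used outside the group by" is my
assumption, and the class itself is hypothetical.

```java
import java.util.Set;

// Hypothetical sketch of the proposed check; not the actual Kylin storage code.
final class LimitPushDownSketch {
    // Limit push down is kept only if every "other" dimension is pinned
    // to a single value; otherwise several storage records must be merged.
    static boolean canPushDownLimit(Set<String> singleValuesD, Set<String> othersD) {
        return singleValuesD.containsAll(othersD);
    }
}
```

For the SQL above, date_id is filtered by a range, so under this reading
singleValuesD is empty while othersD contains date_id, and the check returns
false.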





[jira] [Created] (KYLIN-2705) Should allow user to remove partition_date_column for model in web

2017-07-03 Thread kangkaisen (JIRA)
kangkaisen created KYLIN-2705:
-

 Summary: Should allow user to remove partition_date_column for 
model in web
 Key: KYLIN-2705
 URL: https://issues.apache.org/jira/browse/KYLIN-2705
 Project: Kylin
  Issue Type: Bug
  Components: Web 
Affects Versions: v2.0.0
Reporter: kangkaisen
Assignee: kangkaisen
Priority: Minor


Currently, users cannot remove partition_date_column from a model in the web UI.





[jira] [Created] (KYLIN-2695) Should allow user to override spark conf in cube

2017-06-28 Thread kangkaisen (JIRA)
kangkaisen created KYLIN-2695:
-

 Summary: Should allow user to override spark conf in cube
 Key: KYLIN-2695
 URL: https://issues.apache.org/jira/browse/KYLIN-2695
 Project: Kylin
  Issue Type: Improvement
  Components: Spark Engine
Affects Versions: v2.0.0
Reporter: kangkaisen
Assignee: kangkaisen


Currently, we can only get the Spark conf from the Kylin server config. We
should allow users to override the Spark conf per cube.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (KYLIN-2694) Fix ArrayIndexOutOfBoundsException in SparkCubingByLayer

2017-06-28 Thread kangkaisen (JIRA)
kangkaisen created KYLIN-2694:
-

 Summary: Fix ArrayIndexOutOfBoundsException in SparkCubingByLayer
 Key: KYLIN-2694
 URL: https://issues.apache.org/jira/browse/KYLIN-2694
 Project: Kylin
  Issue Type: Bug
  Components: Spark Engine
Affects Versions: v2.0.0
Reporter: kangkaisen
Assignee: kangkaisen
Priority: Minor


cubeDesc.getBuildLevel() could be zero, in which case
allRDDs[totalLevels - 1].unpersist() throws ArrayIndexOutOfBoundsException.
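
A minimal sketch of a bounds guard for this case. The array stands in for the
per-level RDDs, and the class and method names are hypothetical; this is not
the actual SparkCubingByLayer code.

```java
// Hypothetical sketch; the Object[] stands in for the allRDDs array.
final class UnpersistGuardSketch {
    // Releases the last level if there is one; returns false when
    // totalLevels is zero (index -1) or otherwise out of range.
    static boolean releaseLast(Object[] allLevels, int totalLevels) {
        int idx = totalLevels - 1;
        if (idx < 0 || idx >= allLevels.length) {
            return false; // nothing to release; avoids ArrayIndexOutOfBoundsException
        }
        allLevels[idx] = null; // stands in for allRDDs[idx].unpersist()
        return true;
    }
}
```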





[jira] [Created] (KYLIN-2693) Should use overrideHiveConfig for LookupHiveViewMaterialization and RedistributeFlatHiveTable

2017-06-28 Thread kangkaisen (JIRA)
kangkaisen created KYLIN-2693:
-

 Summary: Should use overrideHiveConfig for 
LookupHiveViewMaterialization and RedistributeFlatHiveTable
 Key: KYLIN-2693
 URL: https://issues.apache.org/jira/browse/KYLIN-2693
 Project: Kylin
  Issue Type: Bug
  Components: Job Engine
Affects Versions: v2.0.0
Reporter: kangkaisen
Assignee: kangkaisen


Currently, we use KylinConfig for the LookupHiveViewMaterialization and
RedistributeFlatHiveTable steps. We should use cubeOverrideHiveConfig.





[jira] [Created] (KYLIN-2675) The hfileSizeMB should not relay on kylin.env

2017-06-16 Thread kangkaisen (JIRA)
kangkaisen created KYLIN-2675:
-

 Summary: The hfileSizeMB should not relay on kylin.env
 Key: KYLIN-2675
 URL: https://issues.apache.org/jira/browse/KYLIN-2675
 Project: Kylin
  Issue Type: Bug
  Components: Storage - HBase
Affects Versions: v2.0.0
Reporter: kangkaisen
Assignee: kangkaisen


The default value of kylin.env is DEV. If users don't set kylin.env, this makes
kylin.storage.hbase.hfile-size-gb useless.

So the hfileSizeMB should not rely on kylin.env.





[jira] [Created] (KYLIN-2674) Should not catch OutOfMemoryError in coprocessor

2017-06-16 Thread kangkaisen (JIRA)
kangkaisen created KYLIN-2674:
-

 Summary: Should not catch OutOfMemoryError in coprocessor
 Key: KYLIN-2674
 URL: https://issues.apache.org/jira/browse/KYLIN-2674
 Project: Kylin
  Issue Type: Bug
  Components: Storage - HBase
Affects Versions: v2.0.0
Reporter: kangkaisen
Assignee: kangkaisen


We have almost no reason to catch OutOfMemoryError. Catching it leads to
terrible query behavior when an HBase RegionServer runs out of memory.
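
As an illustration of the idea, a handler can catch Exception only, so that
Errors such as OutOfMemoryError still propagate to the caller. The class and
method below are hypothetical and are not the coprocessor code.

```java
import java.util.function.Supplier;

// Hypothetical sketch; not the actual coprocessor implementation.
final class ErrorHandlingSketch {
    // Catches Exception only. OutOfMemoryError extends Error, not Exception,
    // so it is not swallowed here and reaches the caller.
    static String visit(Supplier<String> scan) {
        try {
            return scan.get();
        } catch (Exception e) {
            return "failed: " + e.getMessage();
        }
    }
}
```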





[jira] [Created] (KYLIN-2673) Should allow user to change fact table as long as the cube is disable

2017-06-16 Thread kangkaisen (JIRA)
kangkaisen created KYLIN-2673:
-

 Summary: Should allow user to change fact table as long as the 
cube is disable
 Key: KYLIN-2673
 URL: https://issues.apache.org/jira/browse/KYLIN-2673
 Project: Kylin
  Issue Type: Bug
  Components: Web 
Affects Versions: v2.0.0
Reporter: kangkaisen
Assignee: kangkaisen


Currently, users cannot change the fact table even when the cube is disabled,
which isn't reasonable. We should allow users to change the fact table as long
as the cube is disabled.





[jira] [Created] (KYLIN-2672) Only clean necessary cache for CubeMigrationCLI

2017-06-16 Thread kangkaisen (JIRA)
kangkaisen created KYLIN-2672:
-

 Summary: Only clean necessary cache for CubeMigrationCLI
 Key: KYLIN-2672
 URL: https://issues.apache.org/jira/browse/KYLIN-2672
 Project: Kylin
  Issue Type: Improvement
  Components: Tools, Build and Test
Affects Versions: v2.0.0
Reporter: kangkaisen
Assignee: kangkaisen


Currently, we simply clear ALL caches in CubeMigrationCLI, which makes some
queries slower in the prod env when we have many tables, models and cubes and
migrate cubes often.

So we should only clean the necessary caches for CubeMigrationCLI.







[jira] [Created] (KYLIN-2665) Add model JSON edit in web

2017-06-09 Thread kangkaisen (JIRA)
kangkaisen created KYLIN-2665:
-

 Summary: Add model JSON edit in web 
 Key: KYLIN-2665
 URL: https://issues.apache.org/jira/browse/KYLIN-2665
 Project: Kylin
  Issue Type: New Feature
  Components: Web 
Affects Versions: v2.0.0
Reporter: kangkaisen
Assignee: kangkaisen


Currently, when the model metadata is broken, we must use {{bin/metastore.sh}}
to fix the metadata, which is troublesome. We should allow the admin to edit the
model JSON in the web UI directly.





[jira] [Created] (KYLIN-2664) Fix Extended column bug in web

2017-06-09 Thread kangkaisen (JIRA)
kangkaisen created KYLIN-2664:
-

 Summary: Fix Extended column bug in web
 Key: KYLIN-2664
 URL: https://issues.apache.org/jira/browse/KYLIN-2664
 Project: Kylin
  Issue Type: Bug
  Components: Web 
Affects Versions: v2.0.0
Reporter: kangkaisen
Assignee: kangkaisen


The option for {{Extended column on fact table}} should be 
{{getCommonMetricColumns()}} not {{getCommonMetricColumns()}}





[jira] [Created] (KYLIN-2653) Spark cubing support HBase cluster with kerberos on Yarn client mode

2017-05-31 Thread kangkaisen (JIRA)
kangkaisen created KYLIN-2653:
-

 Summary: Spark cubing support HBase cluster with kerberos on Yarn 
client mode
 Key: KYLIN-2653
 URL: https://issues.apache.org/jira/browse/KYLIN-2653
 Project: Kylin
  Issue Type: Bug
  Components: Spark Engine
Affects Versions: v2.0.0
Reporter: kangkaisen
Assignee: kangkaisen


Currently, Spark cubing doesn't support HBase clusters with Kerberos.
For now, we could support HBase clusters with Kerberos in Yarn client mode,
because that is easy.
In the long term, we should avoid accessing HBase in Spark cubing.





[jira] [Created] (KYLIN-2652) Make KylinConfig threadsafe in CubeVisitService

2017-05-31 Thread kangkaisen (JIRA)
kangkaisen created KYLIN-2652:
-

 Summary: Make KylinConfig threadsafe in CubeVisitService
 Key: KYLIN-2652
 URL: https://issues.apache.org/jira/browse/KYLIN-2652
 Project: Kylin
  Issue Type: Bug
  Components: Storage - HBase
Affects Versions: v2.0.0
Reporter: kangkaisen
Assignee: kangkaisen


Currently, the KylinConfig in CubeVisitService is not threadsafe. This bug
was not exposed until KYLIN-2195 updated the naming convention for Kylin
properties.

Suppose a user who has set kylin.query.endpoint.compression.result=false
upgrades to Kylin 2.0 and upgrades only one QueryServer first. The config
kylin.query.endpoint.compression.result is renamed to
kylin.storage.hbase.endpoint-compress-result, so the CubeVisitService in HBase
gets true from {{kylinConfig.getCompressionResult()}}, which is inconsistent
with the QueryServer config and makes the query fail.

Because the KylinConfig in CubeVisitService is not threadsafe, this makes
queries fail not only on the upgraded QueryServer, but also on all JobServers
and all other QueryServers.
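
One common way to avoid this kind of problem is to pass the config along with
each request instead of keeping it in state shared across requests. A minimal
sketch, where Config and handle are hypothetical names; this is neither the
actual CubeVisitService code nor necessarily the fix we will choose.

```java
// Hypothetical sketch of per-request configuration; not Kylin code.
final class PerRequestConfigSketch {
    // Immutable config snapshot that travels with a single request.
    static final class Config {
        final boolean compressResult;

        Config(boolean compressResult) {
            this.compressResult = compressResult;
        }
    }

    // The config is a parameter, never a shared field, so concurrent
    // requests cannot overwrite each other's settings.
    static String handle(Config requestConfig, String payload) {
        return requestConfig.compressResult ? "compressed(" + payload + ")" : payload;
    }
}
```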





[jira] [Created] (KYLIN-2647) Should get FileSystem from HBaseConfiguration in HBaseResourceStore

2017-05-25 Thread kangkaisen (JIRA)
kangkaisen created KYLIN-2647:
-

 Summary: Should get FileSystem from HBaseConfiguration in 
HBaseResourceStore
 Key: KYLIN-2647
 URL: https://issues.apache.org/jira/browse/KYLIN-2647
 Project: Kylin
  Issue Type: Bug
  Components: Metadata
Affects Versions: v2.0.0
Reporter: kangkaisen
Assignee: kangkaisen
Priority: Critical


KYLIN-2351 introduced a bug that occurs when users run a standalone HBase
cluster.
{code:java}
   Error while executing SQL "SELECT SUM(revenue) AS revenue, SUM(profit) AS 
profit, SUM(repay_profit) AS repayProfit, SUM(fraud_profit) AS fraudProfit, 
SUM(share_profit) AS shareProfit, SUM(consume) AS consume, SUM(repay_consume) 
AS repayConsume, SUM(fraud_consume) AS fraudConsume, SUM(share_consume) AS 
shareConsume, SUM(cost) AS cost, SUM(fraud_cost) AS fraudCost, SUM(repay_cost) 
AS repayCost, poi_cate2_id AS poiCategory2Id, poi_cate2_name AS 
poiCategory2Name, main_poi_id AS orgId, main_poi_name AS orgName, 
COUNT(DISTINCT NEW_OBJECT) AS newDeal, COUNT(DISTINCT ONLINE_OBJECT) AS 
onlineDeal, partition_date AS dateStr FROM mart_catering.app_shu_v5_trade_view 
WHERE (bd_id = 2084324 AND c_platform IN ('mt', 'dp') AND partition_date = 
'2017-05-24') GROUP BY poi_cate2_id, poi_cate2_name, partition_date, 
main_poi_id, main_poi_name LIMIT 5": java.io.FileNotFoundException: File does not exist:
/user/kylin2x/prod/kylin2x_metadata_prod/resources/dict/MART_CATERING.APP_SHU_V5_TRADE_VIEW/C_OBJECT_ID/854df823-abc8-4e19-9035-def12f8af3e2.dict
    at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:71)
    at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:61)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1850)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1821)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1729)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:589)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:365)
    at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)

    at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:299)
    at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:793)
    at org.apache.kylin.storage.hbase.HBaseResourceStore.getInputStream(HBaseResourceStore.java:206)
    at org.apache.kylin.storage.hbase.HBaseResourceStore.getResourceImpl(HBaseResourceStore.java:226)
    at org.apache.kylin.common.persistence.ResourceStore.getResource(ResourceStore.java:148)
    at org.apache.kylin.dict.DictionaryManager.load(DictionaryManager.java:448)
    at org.apache.kylin.dict.DictionaryManager$1.load(DictionaryManager.java:105)
    at org.apache.kylin.dict.DictionaryManager$1.load(DictionaryManager.java:102)
    at com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3599)
    at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2379)
    at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2342)
    at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2257)
    at com.google.common.cache.LocalCache.get(LocalCache.java:4000)
    at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:4004)
    at com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4874)
    at org.apache.kylin.dict.DictionaryManager.getDictionaryInfo(DictionaryManager.java:122)
{code}





[jira] [Created] (KYLIN-2642) Relax check in RowKeyColDesc to keep backward compatibility

2017-05-23 Thread kangkaisen (JIRA)
kangkaisen created KYLIN-2642:
-

 Summary: Relax check in RowKeyColDesc to keep backward 
compatibility
 Key: KYLIN-2642
 URL: https://issues.apache.org/jira/browse/KYLIN-2642
 Project: Kylin
  Issue Type: Bug
  Components: Metadata
Affects Versions: v2.0.0
Reporter: kangkaisen
Assignee: kangkaisen
Priority: Minor


This check marks the cube as DESCBROKEN if the user used FixedLenDimEnc to
encode an integer column:

{code:java}
if (encodingName.startsWith(FixedLenDimEnc.ENCODING_NAME)
        && (type.isIntegerFamily() || type.isNumberFamily())) {
    throw new IllegalArgumentException(
            colRef + " type is " + type + " and cannot apply fixed_length encoding");
}
{code}





[jira] [Created] (KYLIN-2628) Remove synchronized modifier for reloadCubeLocalAt

2017-05-17 Thread kangkaisen (JIRA)
kangkaisen created KYLIN-2628:
-

 Summary: Remove synchronized modifier for reloadCubeLocalAt
 Key: KYLIN-2628
 URL: https://issues.apache.org/jira/browse/KYLIN-2628
 Project: Kylin
  Issue Type: Improvement
  Components: Metadata
Affects Versions: v2.0.0
Reporter: kangkaisen
Assignee: kangkaisen
Priority: Minor


The synchronized modifier for CubeManager.reloadCubeLocalAt is unnecessary.





[jira] [Created] (KYLIN-2626) Fix InstantiationException in ZookeeperDistributedJobLock

2017-05-16 Thread kangkaisen (JIRA)
kangkaisen created KYLIN-2626:
-

 Summary: Fix InstantiationException in ZookeeperDistributedJobLock
 Key: KYLIN-2626
 URL: https://issues.apache.org/jira/browse/KYLIN-2626
 Project: Kylin
  Issue Type: Bug
  Components: Job Engine
Affects Versions: v2.1.0
Reporter: kangkaisen
Assignee: kangkaisen
Priority: Critical
 Fix For: v2.1.0


KYLIN-2578 introduced this issue:

{code:java}
Caused by: java.lang.RuntimeException: java.lang.InstantiationException: org.apache.kylin.storage.hbase.util.ZookeeperDistributedLock
    at org.apache.kylin.common.util.ClassUtil.newInstance(ClassUtil.java:95)
    at org.apache.kylin.rest.service.JobService.afterPropertiesSet(JobService.java:110)
    at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.invokeInitMethods(AbstractAutowireCapableBeanFactory.java:1573)
    at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.initializeBean(AbstractAutowireCapableBeanFactory.java:1511)
    ... 38 more
Caused by: java.lang.InstantiationException: org.apache.kylin.storage.hbase.util.ZookeeperDistributedLock
    at java.lang.Class.newInstance(Class.java:427)
    at org.apache.kylin.common.util.ClassUtil.newInstance(ClassUtil.java:93)
    ... 41 more
Caused by: java.lang.NoSuchMethodException: org.apache.kylin.storage.hbase.util.ZookeeperDistributedLock.<init>()
    at java.lang.Class.getConstructor0(Class.java:3082)
    at java.lang.Class.newInstance(Class.java:412)
    ... 42 more
{code}

This prevents the Kylin job server from starting.
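
The trace shows a reflective newInstance call failing because the class has no
no-argument constructor. The sketch below reproduces that failure mode in
isolation; LockWithArg and LockNoArg are hypothetical classes, not the Kylin
lock implementation.

```java
// Hypothetical sketch of reflective instantiation; not Kylin code.
final class ReflectiveInitSketch {
    // Only a one-argument constructor: cannot be created without arguments.
    static final class LockWithArg {
        final String zkPath;

        LockWithArg(String zkPath) {
            this.zkPath = zkPath;
        }
    }

    // Has a no-argument constructor: can be created reflectively.
    static final class LockNoArg {
        LockNoArg() {
        }
    }

    // Returns true if the class can be instantiated reflectively with no arguments.
    static boolean canInstantiate(Class<?> clazz) {
        try {
            clazz.getDeclaredConstructor().newInstance();
            return true;
        } catch (ReflectiveOperationException e) {
            // For LockWithArg this is a NoSuchMethodException.
            return false;
        }
    }
}
```

So either the lock class needs a no-argument constructor, or the caller has to
look up and invoke the constructor that takes arguments.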





[jira] [Created] (KYLIN-2622) AppendTrieDictionary support not global

2017-05-15 Thread kangkaisen (JIRA)
kangkaisen created KYLIN-2622:
-

 Summary: AppendTrieDictionary support not global
 Key: KYLIN-2622
 URL: https://issues.apache.org/jira/browse/KYLIN-2622
 Project: Kylin
  Issue Type: Improvement
  Components: Metadata
Affects Versions: v2.0.0
Reporter: kangkaisen
Assignee: kangkaisen


Currently, AppendTrieDictionary only supports the global dict, which means the
dict grows continuously. But for a cube that has no Partition Date Column and
doesn't need aggregate queries across segments, we could build the
AppendTrieDictionary from an empty dict every time.





[jira] [Created] (KYLIN-2619) Use newCachedThreadPool instead of newFixedThreadPool in Broadcaster

2017-05-15 Thread kangkaisen (JIRA)
kangkaisen created KYLIN-2619:
-

 Summary: Use newCachedThreadPool instead of newFixedThreadPool in 
Broadcaster
 Key: KYLIN-2619
 URL: https://issues.apache.org/jira/browse/KYLIN-2619
 Project: Kylin
  Issue Type: Improvement
  Components: Metadata
Affects Versions: v2.0.0
Reporter: kangkaisen
Assignee: kangkaisen
 Fix For: v2.1.0


We should use newCachedThreadPool instead of newFixedThreadPool in Broadcaster 
because newCachedThreadPool is more flexible.
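
For reference, a small sketch of how the two JDK factory methods differ. It
only uses java.util.concurrent; the class and helper names are hypothetical and
it is not the Broadcaster code. A fixed pool keeps a fixed number of threads and
queues extra tasks, while a cached pool creates threads on demand and reclaims
idle ones after 60 seconds.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadPoolExecutor;

// Hypothetical sketch comparing the two pool types; not the Broadcaster code.
final class PoolChoiceSketch {
    // Fixed pool: at most n threads, extra tasks wait in an unbounded queue.
    static ExecutorService fixed(int n) {
        return Executors.newFixedThreadPool(n);
    }

    // Cached pool: no core threads, grows on demand, idle threads are reclaimed.
    static ExecutorService cached() {
        return Executors.newCachedThreadPool();
    }

    // Both factories return a ThreadPoolExecutor in the JDK implementation.
    static int corePoolSize(ExecutorService pool) {
        return ((ThreadPoolExecutor) pool).getCorePoolSize();
    }

    static int maxPoolSize(ExecutorService pool) {
        return ((ThreadPoolExecutor) pool).getMaximumPoolSize();
    }
}
```

Note that a cached pool is unbounded, so the number of threads depends on how
many broadcasts are in flight at the same time.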





[jira] [Created] (KYLIN-2607) Add http timeout for RestClient

2017-05-10 Thread kangkaisen (JIRA)
kangkaisen created KYLIN-2607:
-

 Summary: Add http timeout for RestClient
 Key: KYLIN-2607
 URL: https://issues.apache.org/jira/browse/KYLIN-2607
 Project: Kylin
  Issue Type: Improvement
  Components: General
Affects Versions: v2.0.0
Reporter: kangkaisen
Assignee: kangkaisen
 Fix For: v2.1.0


We should add an HTTP timeout for RestClient in distributed environments.
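
As an illustration only, here is how connect and read timeouts look with the 
JDK's HttpURLConnection. Kylin's RestClient may be built on a different HTTP 
client, and the class name, URL and timeout values below are assumptions.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URL;

// Hypothetical helper; not Kylin's RestClient.
class TimeoutExample {

    // Prepares (but does not yet open) an HTTP connection with explicit
    // timeouts, so a hung peer cannot block the caller forever.
    static HttpURLConnection open(String url, int connectTimeoutMs, int readTimeoutMs) {
        try {
            HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
            conn.setConnectTimeout(connectTimeoutMs); // max time to establish the connection
            conn.setReadTimeout(readTimeoutMs);       // max time to wait for response data
            return conn;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Without these settings both timeouts default to 0, which means "wait 
indefinitely".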





[jira] [Created] (KYLIN-2606) Only return counter for precise count_distinct if query is exactAggregate

2017-05-10 Thread kangkaisen (JIRA)
kangkaisen created KYLIN-2606:
-

 Summary: Only return counter for precise count_distinct if query 
is exactAggregate
 Key: KYLIN-2606
 URL: https://issues.apache.org/jira/browse/KYLIN-2606
 Project: Kylin
  Issue Type: Improvement
  Components: Query Engine
Affects Versions: v2.0.0
Reporter: kangkaisen
Assignee: kangkaisen


If the query is exactAggregation and has some memory-hungry measures, we could 
directly return the final result to speed up the query and reduce the RPC data 
size and the memory usage in the query server.





[jira] [Created] (KYLIN-2604) Use global dict as the default encoding for precise distinct count in web

2017-05-10 Thread kangkaisen (JIRA)
kangkaisen created KYLIN-2604:
-

 Summary: Use global dict as the default encoding for precise 
distinct count in web
 Key: KYLIN-2604
 URL: https://issues.apache.org/jira/browse/KYLIN-2604
 Project: Kylin
  Issue Type: Improvement
  Components: Web 
Affects Versions: v2.0.0
Reporter: kangkaisen
Assignee: kangkaisen
Priority: Minor


We should use global dict as the default encoding for precise distinct count in 
the web UI, which is easier for users.





[jira] [Created] (KYLIN-2602) Add optional job threshold arg for MetadataCleanupJob

2017-05-10 Thread kangkaisen (JIRA)
kangkaisen created KYLIN-2602:
-

 Summary: Add optional job threshold arg for MetadataCleanupJob
 Key: KYLIN-2602
 URL: https://issues.apache.org/jira/browse/KYLIN-2602
 Project: Kylin
  Issue Type: Improvement
  Components: Tools, Build and Test
Affects Versions: v2.0.0
Reporter: kangkaisen
Assignee: kangkaisen
Priority: Minor
 Fix For: v2.1.0


When we have hundreds of cubes, we will have tens of thousands of job metadata 
entries within 30 days, which makes fetching job metadata slow.

So we should add an optional job threshold arg to MetadataCleanupJob so that 
users can reduce the job threshold.





[jira] [Created] (KYLIN-2601) The return type of tinyint for sum measure should be bigint

2017-05-10 Thread kangkaisen (JIRA)
kangkaisen created KYLIN-2601:
-

 Summary: The return type of tinyint for sum measure should be 
bigint
 Key: KYLIN-2601
 URL: https://issues.apache.org/jira/browse/KYLIN-2601
 Project: Kylin
  Issue Type: Bug
  Components: Web 
Affects Versions: v2.0.0
Reporter: kangkaisen
Assignee: kangkaisen
Priority: Critical
 Fix For: v2.1.0


The return type of tinyint for sum measure should be bigint, not decimal.





[jira] [Created] (KYLIN-2563) Fix bug in checkCubeAuthorization

2017-04-24 Thread kangkaisen (JIRA)
kangkaisen created KYLIN-2563:
-

 Summary: Fix bug in checkCubeAuthorization
 Key: KYLIN-2563
 URL: https://issues.apache.org/jira/browse/KYLIN-2563
 Project: Kylin
  Issue Type: Bug
  Components: Query Engine
Affects Versions: v1.6.0
Reporter: kangkaisen
Assignee: kangkaisen


I found that the @PreAuthorize annotation didn't work in 
QueryService.checkCubeAuthorization.

It turned out that annotations have no effect on methods that are called from 
within the same class, whether private or public. The annotations only work on 
public methods called from outside the class.
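
The effect can be reproduced with a plain JDK dynamic proxy. This is a sketch 
of the general proxy limitation, not Spring Security or Kylin code; the 
interface and class names are hypothetical, and the interceptor only records 
method names where a real one would run the security check.

```java
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo; stands in for an annotated service behind a proxy.
class SelfInvocationDemo {

    interface QueryOps {
        void query();
        void checkAuthorization();
    }

    static class QueryOpsImpl implements QueryOps {
        public void query() {
            // Plain call on `this`: it never passes through the proxy.
            checkAuthorization();
        }

        public void checkAuthorization() {
        }
    }

    // Returns the method names the proxy intercepted for one outside call.
    static List<String> interceptedCalls(boolean callCheckDirectly) {
        List<String> seen = new ArrayList<>();
        QueryOps target = new QueryOpsImpl();
        QueryOps proxy = (QueryOps) Proxy.newProxyInstance(
                QueryOps.class.getClassLoader(),
                new Class<?>[] {QueryOps.class},
                (p, method, args) -> {
                    seen.add(method.getName()); // stand-in for the security check
                    return method.invoke(target, args);
                });
        if (callCheckDirectly) {
            proxy.checkAuthorization();
        } else {
            proxy.query();
        }
        return seen;
    }
}
```

Calling query() through the proxy records only "query": the nested 
checkAuthorization() call is a direct call on the target object, so the 
interceptor never sees it.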







[jira] [Created] (KYLIN-2547) Fix the bug of multi-process concurrence in mergeCubeSegment

2017-04-14 Thread kangkaisen (JIRA)
kangkaisen created KYLIN-2547:
-

 Summary: Fix the bug of multi-process concurrence in 
mergeCubeSegment
 Key: KYLIN-2547
 URL: https://issues.apache.org/jira/browse/KYLIN-2547
 Project: Kylin
  Issue Type: Bug
  Components: Metadata
Affects Versions: v2.0.0
Reporter: kangkaisen
Assignee: kangkaisen
Priority: Minor
 Fix For: v2.0.0


There is a minor bug in the "Update Cube Info" step when building a cube in a 
distributed environment.

{code:java}
Caused by: java.lang.IllegalStateException: Segments overlap: waimai_dolphin_topic_flow_activity_expose_food_d_cube[2017040500_2017041200] and waimai_dolphin_topic_flow_activity_expose_food_d_cube[2017040500_2017041200]
    at org.apache.kylin.cube.CubeValidator.validate(CubeValidator.java:85)
    at org.apache.kylin.cube.CubeManager.updateCubeWithRetry(CubeManager.java:359)
    at org.apache.kylin.cube.CubeManager.updateCubeWithRetry(CubeManager.java:386)
    at org.apache.kylin.cube.CubeManager.updateCube(CubeManager.java:302)
    at org.apache.kylin.cube.CubeManager.mergeSegments(CubeManager.java:533)
    at org.apache.kylin.rest.service.CubeService.mergeCubeSegment(CubeService.java:635)
    at org.apache.kylin.rest.service.CubeService.updateOnNewSegmentReady(CubeService.java:587)
    at org.apache.kylin.rest.service.CubeService$$FastClassBySpringCGLIB$$17a07c0e.invoke(<generated>)
    at org.springframework.cglib.proxy.MethodProxy.invoke(MethodProxy.java:204)
    at org.springframework.aop.framework.CglibAopProxy$DynamicAdvisedInterceptor.intercept(CglibAopProxy.java:629)
    at org.apache.kylin.rest.service.CubeService$$EnhancerBySpringCGLIB$$c6fabb3f.updateOnNewSegmentReady(<generated>)
    at org.apache.kylin.rest.service.CacheService.rebuildCubeCache(CacheService.java:237)
    at org.apache.kylin.rest.service.CacheService.access$000(CacheService.java:62)
    at org.apache.kylin.rest.service.CacheService$1.afterCubeUpdate(CacheService.java:86)
    at org.apache.kylin.cube.CubeManager.updateCube(CubeManager.java:305)
    at org.apache.kylin.cube.CubeManager.promoteNewlyBuiltSegments(CubeManager.java:735)
    at org.apache.kylin.engine.mr.steps.UpdateCubeInfoAfterBuildStep.doWork(UpdateCubeInfoAfterBuildStep.java:62)
    at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:113)
    ... 6 more
{code}





[jira] [Created] (KYLIN-2506) Refactor Global Dictionary

2017-03-13 Thread kangkaisen (JIRA)
kangkaisen created KYLIN-2506:
-

 Summary: Refactor Global Dictionary
 Key: KYLIN-2506
 URL: https://issues.apache.org/jira/browse/KYLIN-2506
 Project: Kylin
  Issue Type: Improvement
  Components: General
Affects Versions: v2.0.0
Reporter: kangkaisen
Assignee: kangkaisen
 Fix For: v2.0.0


The main points of this refactor:
1. Fix the bug that the RemovalListener of LoadingCache swallowed any 
exceptions when building the GlobalDict.
2. Fix the bug that the HDFS filename of DictSliceKey had illegal characters.
3. Fix the bug that the HDFS filename of DictSliceKey may be longer than 255.
4. Fix the bug that DictNode split failed if the value length is greater than 
255 bytes.
5. Decouple the build and query of GlobalDict: 
abstract the builder of AppendTrieDictionary into AppendTrieDictionaryBuilder; 
add LoadingCache to AppendTrieDictionary and make AppendTrieDictionary 
read-only.
6. Remove the dependence on LoadingCache when building the GlobalDict.
7. Abstract the HDFS operations into GlobalDictStore.
8. Abstract the metadata of GlobalDict into GlobalDictMetadata.
9. Delete CachedTreeMap.
10. Remove support for multithreaded concurrent builds. I will add a 
distributed lock for GlobalDict later.





[jira] [Created] (KYLIN-2446) Support project names filter in DeployCoprocessorCLI

2017-02-14 Thread kangkaisen (JIRA)
kangkaisen created KYLIN-2446:
-

 Summary: Support project names filter in DeployCoprocessorCLI
 Key: KYLIN-2446
 URL: https://issues.apache.org/jira/browse/KYLIN-2446
 Project: Kylin
  Issue Type: Improvement
  Components: Storage - HBase
Affects Versions: v1.6.0
Reporter: kangkaisen
Assignee: kangkaisen
 Fix For: v2.0.0


We should support updating coprocessors by project name so that users can 
update coprocessors one project at a time.





[jira] [Created] (KYLIN-2433) NPE in MergeCuboidMapper

2017-02-07 Thread kangkaisen (JIRA)
kangkaisen created KYLIN-2433:
-

 Summary: NPE in MergeCuboidMapper
 Key: KYLIN-2433
 URL: https://issues.apache.org/jira/browse/KYLIN-2433
 Project: Kylin
  Issue Type: Bug
  Components: Job Engine
Affects Versions: v1.6.0
Reporter: kangkaisen
Assignee: kangkaisen


If all records of one column are null in a segment, there will be an NPE in 
{{sourceCubeSegment.getDictionary}}.





[jira] [Created] (KYLIN-2430) Unnecessary exception catching in BulkLoadJob

2017-02-06 Thread kangkaisen (JIRA)
kangkaisen created KYLIN-2430:
-

 Summary: Unnecessary exception catching in BulkLoadJob
 Key: KYLIN-2430
 URL: https://issues.apache.org/jira/browse/KYLIN-2430
 Project: Kylin
  Issue Type: Bug
  Components: Storage - HBase
Affects Versions: v1.6.0
Reporter: kangkaisen
Assignee: kangkaisen


FsShell.run catches all exceptions itself, so we should check the exit code 
instead of catching exceptions.
The current code can result in an infinite loop in {{LoadIncrementalHFiles}} 
if the user runs HBase 0.98.13 and doesn't set {{hbase.bulkload.retries.number}}.
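
The pattern can be sketched as follows. The ShellTool interface is a 
hypothetical stand-in for a tool such as FsShell whose run method reports 
failure through its return value; it is not the Hadoop API, and the class name 
is an assumption.

```java
// Hypothetical sketch; ShellTool only models "run returns an exit code".
class ExitCodeCheck {

    interface ShellTool {
        int run(String[] args);
    }

    // Fails fast on a non-zero exit code. A try/catch around run() would
    // never fire, because the tool swallows its own exceptions.
    static void runOrFail(ShellTool tool, String... args) {
        int exitCode = tool.run(args);
        if (exitCode != 0) {
            throw new IllegalStateException(
                    "Command failed with exit code " + exitCode + ": " + String.join(" ", args));
        }
    }
}
```

Failing here stops the job before the later bulk-load step starts, instead of 
letting it run against files the command did not prepare.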





[jira] [Created] (KYLIN-2389) Improve resource utilization for DistributedScheduler

2017-01-14 Thread kangkaisen (JIRA)
kangkaisen created KYLIN-2389:
-

 Summary: Improve resource utilization for DistributedScheduler
 Key: KYLIN-2389
 URL: https://issues.apache.org/jira/browse/KYLIN-2389
 Project: Kylin
  Issue Type: Improvement
  Components: Job Engine
Affects Versions: v2.0.0
Reporter: kangkaisen
Assignee: kangkaisen


Currently, in DistributedScheduler we lock the segment in JobService, which 
means a segment's job is scheduled only on the jobServer it was submitted to 
and cannot fully utilize the thread pool resources of all jobServers.

For example, with two jobServers and a maximum of 10 concurrent jobs, if we 
continuously submit 20 jobs to jobServer1, only 10 jobs will run at the same 
time rather than 20, and no job will run on jobServer2.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KYLIN-2388) Hot load kylin config from web

2017-01-13 Thread kangkaisen (JIRA)
kangkaisen created KYLIN-2388:
-

 Summary: Hot load kylin config from web
 Key: KYLIN-2388
 URL: https://issues.apache.org/jira/browse/KYLIN-2388
 Project: Kylin
  Issue Type: New Feature
  Components: Web 
Affects Versions: v1.6.0
Reporter: kangkaisen
Assignee: kangkaisen
 Fix For: v2.0.0


Allow admin users to reload the Kylin config from the web UI, which could 
improve operational efficiency and service stability.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KYLIN-2379) Add UseCMSInitiatingOccupancyOnly to KYLIN_JVM_SETTINGS

2017-01-10 Thread kangkaisen (JIRA)
kangkaisen created KYLIN-2379:
-

 Summary: Add UseCMSInitiatingOccupancyOnly to KYLIN_JVM_SETTINGS
 Key: KYLIN-2379
 URL: https://issues.apache.org/jira/browse/KYLIN-2379
 Project: Kylin
  Issue Type: Improvement
  Components: Tools, Build and Test
Affects Versions: v1.6.0
Reporter: kangkaisen
Assignee: kangkaisen
Priority: Minor


{{CMSInitiatingOccupancyFraction}} is only used for the 1st collection unless 
{{-XX:+UseCMSInitiatingOccupancyOnly}} is set.

References:
https://books.google.com.hk/books?id=aIhUAwAAQBAJ&pg=PA146&lpg=PA146&dq=UseCMSInitiatingOccupancyOnly&source=bl&ots=E51s7uZ1eH&sig=D9nGk_hJu0IQ7QFymCnoekDrWf4&hl=zh-CN&sa=X&ved=0ahUKEwiI2tnQl63RAhWLL48KHZ5tDzA4ChDoAQg5MAQ#v=onepage&q=UseCMSInitiatingOccupancyOnly&f=false

https://blog.codecentric.de/en/2013/10/useful-jvm-flags-part-7-cms-collector/





[jira] [Created] (KYLIN-2378) Set job thread name with job uuid

2017-01-10 Thread kangkaisen (JIRA)
kangkaisen created KYLIN-2378:
-

 Summary: Set job thread name with job uuid
 Key: KYLIN-2378
 URL: https://issues.apache.org/jira/browse/KYLIN-2378
 Project: Kylin
  Issue Type: Improvement
  Components: Job Engine
Affects Versions: v1.6.0
Reporter: kangkaisen
Assignee: kangkaisen
Priority: Minor


Set the job thread name to the job uuid so that we can quickly diagnose the job.
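
A minimal sketch of the idea (the class name and the "Job <uuid>" name format 
are assumptions, not Kylin's actual code): rename the worker thread while the 
job runs and restore the old name afterwards, so pooled threads do not keep a 
stale job id.

```java
// Hypothetical helper; not part of Kylin.
class JobThreadNaming {

    // Runs `job` with the current thread renamed to include the job id,
    // so thread dumps and log lines can be matched to the job.
    static void runNamed(String jobId, Runnable job) {
        Thread current = Thread.currentThread();
        String originalName = current.getName();
        current.setName("Job " + jobId); // assumed name format
        try {
            job.run();
        } finally {
            current.setName(originalName); // pooled threads get reused
        }
    }
}
```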





[jira] [Created] (KYLIN-2377) Add kylin client query timeout

2017-01-10 Thread kangkaisen (JIRA)
kangkaisen created KYLIN-2377:
-

 Summary: Add kylin client query timeout
 Key: KYLIN-2377
 URL: https://issues.apache.org/jira/browse/KYLIN-2377
 Project: Kylin
  Issue Type: Improvement
  Components: Query Engine
Affects Versions: v1.6.0
Reporter: kangkaisen
Assignee: kangkaisen


Add a Kylin client query timeout to make the query server more robust.
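
One generic way to bound a call, sketched with java.util.concurrent. This is 
not the Kylin client implementation; the class and method names are 
hypothetical.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper; not part of Kylin.
class QueryTimeout {

    // Runs `query` on a helper thread and gives up after `timeoutMs`.
    static <T> T callWithTimeout(Callable<T> query, long timeoutMs) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        Future<T> future = executor.submit(query);
        try {
            return future.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true); // interrupt the still-running query
            throw new IllegalStateException("Query timed out after " + timeoutMs + " ms", e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for the query", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("Query failed", e.getCause());
        } finally {
            executor.shutdownNow();
        }
    }
}
```

The caller gets either a result or a clear failure within the time limit, 
instead of holding a thread for as long as a slow query takes.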





[jira] [Created] (KYLIN-2364) Output table name to error info in LookupTable

2017-01-07 Thread kangkaisen (JIRA)
kangkaisen created KYLIN-2364:
-

 Summary: Output table name to error info in LookupTable
 Key: KYLIN-2364
 URL: https://issues.apache.org/jira/browse/KYLIN-2364
 Project: Kylin
  Issue Type: Improvement
  Components: Metadata
Affects Versions: v1.6.0
Reporter: kangkaisen
Assignee: kangkaisen
Priority: Minor


We should output the table name so that the user knows which LookupTable is broken.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KYLIN-2357) Make ERROR_RECORD_LOG_THRESHOLD configurable

2017-01-04 Thread kangkaisen (JIRA)
kangkaisen created KYLIN-2357:
-

 Summary: Make ERROR_RECORD_LOG_THRESHOLD configurable
 Key: KYLIN-2357
 URL: https://issues.apache.org/jira/browse/KYLIN-2357
 Project: Kylin
  Issue Type: Bug
  Components: Job Engine
Affects Versions: v1.6.0
Reporter: kangkaisen
Assignee: kangkaisen
Priority: Minor


Currently, {{BatchConstants.ERROR_RECORD_LOG_THRESHOLD}} is hardcoded to 100. 
I wonder why we accept error records at all. 

Normally, cubing should produce zero error records. Besides, even a single 
error record makes the query results differ from Hive or Presto.

So I think we could make ERROR_RECORD_LOG_THRESHOLD configurable, with a 
default value of 0.
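
A sketch of what "configurable with a default of 0" could look like. The 
property key and class name are hypothetical; the issue does not name the 
actual config key.

```java
import java.util.Properties;

// Hypothetical helper; not part of Kylin.
class ErrorThresholdConfig {

    // Assumed key name, for illustration only.
    static final String KEY = "kylin.job.error-record-threshold";

    // Reads the threshold, falling back to 0 (no error records tolerated).
    static int errorRecordThreshold(Properties config) {
        return Integer.parseInt(config.getProperty(KEY, "0").trim());
    }
}
```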





[jira] [Created] (KYLIN-2353) Serialize BitmapCounter with distinct count

2017-01-03 Thread kangkaisen (JIRA)
kangkaisen created KYLIN-2353:
-

 Summary: Serialize BitmapCounter with distinct count
 Key: KYLIN-2353
 URL: https://issues.apache.org/jira/browse/KYLIN-2353
 Project: Kylin
  Issue Type: Improvement
  Components: Metadata
Affects Versions: v1.6.0
Reporter: kangkaisen
Assignee: kangkaisen


Currently, we deserialize the bitmap whether we need to aggregate or not.

Actually, we could serialize {{BitmapCounter}} together with its distinct 
count, read only the count when deserializing, and delay deserializing the 
bitmap itself until we need to aggregate.
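
The idea can be sketched as below. java.util.BitSet stands in for the real 
bitmap type, and the [count][length][payload] layout and all names are 
assumptions made for this example, not Kylin's serialization format.

```java
import java.nio.ByteBuffer;
import java.util.BitSet;

// Hypothetical sketch of a counter that decodes its bitmap lazily.
class LazyBitmapCounter {
    private final long count;     // distinct count stored next to the bitmap
    private final byte[] payload; // still-encoded bitmap bytes
    private BitSet bitmap;        // decoded only when aggregation needs it

    private LazyBitmapCounter(long count, byte[] payload) {
        this.count = count;
        this.payload = payload;
    }

    // Assumed layout: [long count][int payloadLength][payload bytes].
    static byte[] serialize(BitSet bits) {
        byte[] payload = bits.toByteArray();
        ByteBuffer out = ByteBuffer.allocate(Long.BYTES + Integer.BYTES + payload.length);
        out.putLong(bits.cardinality());
        out.putInt(payload.length);
        out.put(payload);
        return out.array();
    }

    // Reads the count and copies the raw bytes without decoding them.
    static LazyBitmapCounter deserialize(ByteBuffer in) {
        long count = in.getLong();
        byte[] payload = new byte[in.getInt()];
        in.get(payload);
        return new LazyBitmapCounter(count, payload);
    }

    long getCount() {
        return bitmap == null ? count : bitmap.cardinality();
    }

    boolean isDecoded() {
        return bitmap != null;
    }

    // Aggregation is the only operation that forces decoding.
    void merge(LazyBitmapCounter other) {
        materialize().or(other.materialize());
    }

    private BitSet materialize() {
        if (bitmap == null) {
            bitmap = BitSet.valueOf(payload);
        }
        return bitmap;
    }
}
```

A query that only needs the count of a single counter never pays for decoding. 
The first merge decodes both sides.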





[jira] [Created] (KYLIN-2349) Serialize BitmapCounter with peekLength

2017-01-03 Thread kangkaisen (JIRA)
kangkaisen created KYLIN-2349:
-

 Summary: Serialize BitmapCounter with peekLength
 Key: KYLIN-2349
 URL: https://issues.apache.org/jira/browse/KYLIN-2349
 Project: Kylin
  Issue Type: Improvement
  Components: Metadata
Affects Versions: v1.6.0
Reporter: kangkaisen
Assignee: kangkaisen


Currently, in {{BitmapCounter}} we deserialize the bitmap to get the 
peekLength, which the JMC hot-code view shows is expensive in CPU time.

Actually, we could serialize {{BitmapCounter}} with its peekLength to avoid 
deserializing the bitmap when we get the peekLength.

Of course, we need to keep forward compatibility.
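
A minimal sketch of a length-prefixed layout. The [length][payload] format and 
the names are assumptions for illustration, and the sketch does not address 
compatibility with data that was serialized without the prefix.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch; not Kylin's BitmapCounter serializer.
class PeekLengthExample {

    // Assumed layout: [int payloadLength][payload bytes].
    static byte[] serialize(byte[] payload) {
        ByteBuffer out = ByteBuffer.allocate(Integer.BYTES + payload.length);
        out.putInt(payload.length);
        out.put(payload);
        return out.array();
    }

    // Returns the total serialized length by reading only the prefix.
    // The absolute getInt(index) leaves the buffer position untouched.
    static int peekLength(ByteBuffer in) {
        return Integer.BYTES + in.getInt(in.position());
    }
}
```

Reading four bytes replaces decoding the whole bitmap, which matters when 
peekLength is called for every record.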





[jira] [Created] (KYLIN-2338) refactor BitmapCounter.DataInputByteBuffer

2016-12-29 Thread kangkaisen (JIRA)
kangkaisen created KYLIN-2338:
-

 Summary: refactor BitmapCounter.DataInputByteBuffer
 Key: KYLIN-2338
 URL: https://issues.apache.org/jira/browse/KYLIN-2338
 Project: Kylin
  Issue Type: Improvement
  Components: Metadata
Reporter: kangkaisen
Assignee: kangkaisen
Priority: Minor


Make BitmapCounter.DataInputByteBuffer simpler and more readable.





[jira] [Created] (KYLIN-2337) Remove expensive toString in SortedIteratorMergerWithLimit

2016-12-29 Thread kangkaisen (JIRA)
kangkaisen created KYLIN-2337:
-

 Summary: Remove expensive toString in SortedIteratorMergerWithLimit
 Key: KYLIN-2337
 URL: https://issues.apache.org/jira/browse/KYLIN-2337
 Project: Kylin
  Issue Type: Bug
  Components: Query Engine
Affects Versions: v1.6.0
Reporter: kangkaisen
Assignee: kangkaisen


The toString in {{SortedIteratorMergerWithLimit.MergedIteratorWithLimit.next}} 
is expensive and unnecessary.





[jira] [Created] (KYLIN-2308) Allow user to set more columnFamily in web

2016-12-21 Thread kangkaisen (JIRA)
kangkaisen created KYLIN-2308:
-

 Summary: Allow user to set more columnFamily in web 
 Key: KYLIN-2308
 URL: https://issues.apache.org/jira/browse/KYLIN-2308
 Project: Kylin
  Issue Type: Improvement
  Components: Web 
Affects Versions: v1.6.1
Reporter: kangkaisen
Assignee: kangkaisen


Currently, when a user sets dozens of precise count distinct metrics in one 
cube, we put all the count distinct metric columns in one columnFamily. This 
makes the HBase scan slow because a single {{KeyValue}} is too big. We could 
use more columnFamilies to speed up the HBase scan in this scenario.





[jira] [Created] (KYLIN-2304) Only copy latest version dict for global dict

2016-12-20 Thread kangkaisen (JIRA)
kangkaisen created KYLIN-2304:
-

 Summary: Only copy latest version dict for global dict
 Key: KYLIN-2304
 URL: https://issues.apache.org/jira/browse/KYLIN-2304
 Project: Kylin
  Issue Type: Improvement
Affects Versions: v1.6.1
Reporter: kangkaisen
Assignee: kangkaisen
Priority: Minor


After KYLIN-2192, building the global dict uses multiple versions. When we 
migrate the cube, we only need to copy the latest version of the dict, 
otherwise copying all versions will take a long time.





[jira] [Created] (KYLIN-2287) Speed up model and cube list load in Web

2016-12-15 Thread kangkaisen (JIRA)
kangkaisen created KYLIN-2287:
-

 Summary: Speed up model and cube list load in Web
 Key: KYLIN-2287
 URL: https://issues.apache.org/jira/browse/KYLIN-2287
 Project: Kylin
  Issue Type: Improvement
  Components: Web 
Affects Versions: v1.6.0
Reporter: kangkaisen
Assignee: kangkaisen
Priority: Critical


Currently, if a project has more than one hundred cubes and models, the "Model" 
page takes a long time to load because it issues a lot of HTTP requests. So we 
need to reduce and defer the HTTP requests when initially loading the "Model" 
page.





[jira] [Created] (KYLIN-2270) Reduce MR memory usage for global dict

2016-12-11 Thread kangkaisen (JIRA)
kangkaisen created KYLIN-2270:
-

 Summary: Reduce MR memory usage for global dict
 Key: KYLIN-2270
 URL: https://issues.apache.org/jira/browse/KYLIN-2270
 Project: Kylin
  Issue Type: Improvement
Affects Versions: v1.6.0
Reporter: kangkaisen
Assignee: kangkaisen


Currently, in {{Build Base Cuboid Data}}, if the user uses the global dict and 
the global dict is significantly larger than the mapper memory, the 
{{CachedTreeMap}} will load as many values as possible, and the softly 
referenced objects will stick around for a while during GC. This makes the 
{{Build Base Cuboid Data}} mapper pause for a long time, or even fail to finish.





[jira] [Created] (KYLIN-2269) Reduce MR memory usage for global dict

2016-12-11 Thread kangkaisen (JIRA)
kangkaisen created KYLIN-2269:
-

 Summary: Reduce MR memory usage for global dict
 Key: KYLIN-2269
 URL: https://issues.apache.org/jira/browse/KYLIN-2269
 Project: Kylin
  Issue Type: Improvement
Affects Versions: v1.6.0
Reporter: kangkaisen
Assignee: kangkaisen


Currently, in {{Build Base Cuboid Data}}, if the user uses the global dict and 
the global dict is significantly larger than the mapper memory, the 
{{CachedTreeMap}} will load as many values as possible, and the softly 
referenced objects will stick around for a while during GC. This makes the 
{{Build Base Cuboid Data}} mapper pause for a long time, or even fail to finish.





[jira] [Created] (KYLIN-2266) Reduce memory usage for building global dict

2016-12-11 Thread kangkaisen (JIRA)
kangkaisen created KYLIN-2266:
-

 Summary: Reduce memory usage for building global dict
 Key: KYLIN-2266
 URL: https://issues.apache.org/jira/browse/KYLIN-2266
 Project: Kylin
  Issue Type: Improvement
Affects Versions: v1.6.0
Reporter: kangkaisen
Assignee: kangkaisen


Because the input for building the global dict is sequential, we could set the 
max cache size to 1 to reduce memory usage.

Although users could also set `kylin.dict.append.cache.size` to 1 to reduce 
memory usage, most users don't know about this config.
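
To see why a cache of size 1 is enough for sequential input, here is a small 
LRU sketch built on LinkedHashMap. It assumes that sorted input touches one 
dictionary slice at a time; the class name and the fake "load" are 
hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch; not Kylin's dictionary cache.
class SliceCache {
    private final int maxSize;
    private final LinkedHashMap<String, String> cache;
    private int loads = 0;

    SliceCache(int maxSize) {
        this.maxSize = maxSize;
        // Access-ordered map that evicts the least recently used entry.
        this.cache = new LinkedHashMap<String, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                return size() > SliceCache.this.maxSize;
            }
        };
    }

    // Returns the slice for `sliceKey`, "loading" it on a cache miss.
    String get(String sliceKey) {
        String slice = cache.get(sliceKey);
        if (slice == null) {
            loads++; // stands in for an expensive load from storage
            slice = "slice:" + sliceKey;
            cache.put(sliceKey, slice);
        }
        return slice;
    }

    int loads() {
        return loads;
    }

    int size() {
        return cache.size();
    }
}
```

With the access pattern a, a, a, b, b, c, c each slice is loaded exactly once 
even though the cache never holds more than one entry. A larger cache would 
only keep slices that are never asked for again.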





[jira] [Created] (KYLIN-2242) Directly write hdfs file in reducer is dangerous

2016-12-01 Thread kangkaisen (JIRA)
kangkaisen created KYLIN-2242:
-

 Summary: Directly write hdfs file in reducer is dangerous
 Key: KYLIN-2242
 URL: https://issues.apache.org/jira/browse/KYLIN-2242
 Project: Kylin
  Issue Type: Bug
  Components: Job Engine
Affects Versions: v1.6.0
Reporter: kangkaisen
Assignee: Dong Li


Currently, Kylin writes HDFS files directly in {{FactDistinctColumnsReducer}}, 
which is dangerous because MapReduce speculative execution can result in more 
than one reducer writing the same HDFS file at the same time. 

After KYLIN-2217, I think this issue will occur with higher probability. We 
should output the values via {{context.write}} in the reducer.





[jira] [Created] (KYLIN-2239) Remove refreshSegment in JobService

2016-11-30 Thread kangkaisen (JIRA)
kangkaisen created KYLIN-2239:
-

 Summary: Remove refreshSegment in JobService
 Key: KYLIN-2239
 URL: https://issues.apache.org/jira/browse/KYLIN-2239
 Project: Kylin
  Issue Type: Improvement
Reporter: kangkaisen
Assignee: kangkaisen
Priority: Minor


Currently, we have three build types: build, refresh, and merge. But the build 
and refresh types are really one job type, and the build type could replace 
the refresh type completely. 
So I think the refresh type is redundant. We can first remove refreshSegment 
inside JobService and keep the web API unchanged.





[jira] [Created] (KYLIN-2238) Add query server scan threshold

2016-11-30 Thread kangkaisen (JIRA)
kangkaisen created KYLIN-2238:
-

 Summary: Add query server scan threshold
 Key: KYLIN-2238
 URL: https://issues.apache.org/jira/browse/KYLIN-2238
 Project: Kylin
  Issue Type: Improvement
  Components: Query Engine
Affects Versions: v1.5.4.1
Reporter: kangkaisen
Assignee: kangkaisen


Currently, we have a scan threshold in the HBase RegionServer. We should also 
add a scan threshold in the Kylin query server.





[jira] [Created] (KYLIN-2237) Ensure dimensions and measures of model don't have null column

2016-11-30 Thread kangkaisen (JIRA)
kangkaisen created KYLIN-2237:
-

 Summary: Ensure dimensions and measures of model don't have null 
column
 Key: KYLIN-2237
 URL: https://issues.apache.org/jira/browse/KYLIN-2237
 Project: Kylin
  Issue Type: Bug
  Components: Metadata
Affects Versions: v1.5.4.1
Reporter: kangkaisen
Assignee: kangkaisen


Currently, the dimensions or measures of a model may contain a null column, 
like this: 
{{u'dimensions': [{u'table': u'TEST.KYLIN_CAL_DT_KKS', u'columns': [u'CAL_DT', 
u'YEAR_BEG_DT', u'QTR_BEG_DT', None, u'DAY_OF_CAL_ID_KKS']}],}}

which can be produced by the following steps:

1. Rename a hive column that is used in the model's dimensions or measures.
2. Reload the hive table.
3. Update the model without removing the null column (through carelessness).
4. Edit the model again: the dimensions or measures can no longer be selected.
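
A check of this kind could reject such a model when it is saved. The sketch 
below is an assumption about what the validation might look like; the class 
and method names are hypothetical.

```java
import java.util.List;

// Hypothetical validation helper; not Kylin's model validator.
class ModelColumnCheck {

    // Throws if any column entry of the given table is null.
    static void requireNoNullColumns(String table, List<String> columns) {
        for (int i = 0; i < columns.size(); i++) {
            if (columns.get(i) == null) {
                throw new IllegalArgumentException(
                        "Table " + table + " has a null column at index " + i);
            }
        }
    }
}
```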






[jira] [Created] (KYLIN-2180) Add project config and make config priority become "cube > project > server"

2016-11-13 Thread kangkaisen (JIRA)
kangkaisen created KYLIN-2180:
-

 Summary: Add project config and make config priority become "cube 
> project > server"
 Key: KYLIN-2180
 URL: https://issues.apache.org/jira/browse/KYLIN-2180
 Project: Kylin
  Issue Type: New Feature
  Components: Metadata
Affects Versions: v1.5.4.1
Reporter: kangkaisen
Assignee: kangkaisen


There are cases where we want to override the global kylin.properties in the 
scope of a project, e.g. the queue name of a Hadoop job.

In the end, the config priority for Kylin should be "cube > project > server", 
which I think is reasonable.
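
The lookup order can be sketched with java.util.Properties, whose defaults 
chain gives exactly this kind of fallback. The class name and the property 
keys in the usage note are hypothetical, and this is not how Kylin's 
KylinConfig is implemented.

```java
import java.util.Properties;

// Hypothetical sketch of "cube > project > server" resolution.
class LayeredConfig {

    // Builds a view in which cube settings win over project settings,
    // and project settings win over server settings.
    static Properties resolve(Properties server, Properties project, Properties cube) {
        Properties projectLayer = new Properties(server); // falls back to server
        projectLayer.putAll(project);
        Properties cubeLayer = new Properties(projectLayer); // falls back to project
        cubeLayer.putAll(cube);
        return cubeLayer;
    }
}
```

For example, if the server sets job.queue=default and the project sets 
job.queue=project_queue, getProperty("job.queue") on the resolved view returns 
project_queue unless the cube overrides it again.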





[jira] [Created] (KYLIN-2153) Allow user to skip the check in CubeMetaIngester

2016-11-02 Thread kangkaisen (JIRA)
kangkaisen created KYLIN-2153:
-

 Summary: Allow user to skip the check in CubeMetaIngester
 Key: KYLIN-2153
 URL: https://issues.apache.org/jira/browse/KYLIN-2153
 Project: Kylin
  Issue Type: Bug
  Components: Tools, Build and Test
Affects Versions: v1.5.4.1
Reporter: kangkaisen
Assignee: kangkaisen
Priority: Minor


When the model has multiple cubes, or the user really wants to overwrite the 
model or cube, we should allow the user to skip the check in 
{{CubeMetaIngester}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KYLIN-2135) Enlarge FactDistinctColumns reducer number

2016-10-27 Thread kangkaisen (JIRA)
kangkaisen created KYLIN-2135:
-

 Summary: Enlarge FactDistinctColumns reducer number
 Key: KYLIN-2135
 URL: https://issues.apache.org/jira/browse/KYLIN-2135
 Project: Kylin
  Issue Type: Improvement
  Components: Job Engine
Affects Versions: v1.5.4.1
Reporter: kangkaisen
Assignee: kangkaisen


When the hive table has billions of rows and uses a global dictionary for 
precise count distinct measures, the {{Extract Fact Table Distinct Columns}} 
job will run for a long time.
So we could use more reducers to handle a single column.





[jira] [Created] (KYLIN-2130) QueryMetrics concurrent bug fix

2016-10-26 Thread kangkaisen (JIRA)
kangkaisen created KYLIN-2130:
-

 Summary: QueryMetrics concurrent bug fix
 Key: KYLIN-2130
 URL: https://issues.apache.org/jira/browse/KYLIN-2130
 Project: Kylin
  Issue Type: Bug
  Components: Query Engine
Affects Versions: v1.5.4.1, v1.5.4
Reporter: kangkaisen
Assignee: kangkaisen
Priority: Minor


Recently, I ran a concurrent Kylin query test and found a small bug in 
QueryMetrics:
If the initial queries to a cube or a project are concurrent, registering the 
QueryMetrics will fail and throw a MetricsException.

The exception is like this:
"exception":"Metrics source kylin_test,sub=xxx already exists!"

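A "check, then register" sequence is racy when two first queries arrive 
together. One common remedy, sketched here with ConcurrentHashMap (an 
assumption about the kind of fix, not the actual QueryMetrics code; all names 
are hypothetical), is an atomic get-or-create:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical registry; stands in for a metrics system.
class MetricsRegistry {
    private final ConcurrentMap<String, Object> sources = new ConcurrentHashMap<>();
    final AtomicInteger registrations = new AtomicInteger();

    // Registers the source at most once, even under concurrent first use.
    Object getOrRegister(String name) {
        return sources.computeIfAbsent(name, n -> {
            registrations.incrementAndGet(); // the real registration would go here
            return new Object();
        });
    }

    // Starts `threads` threads that all request the same source at once
    // and returns how many registrations actually happened.
    static int registrationsUnderContention(int threads) {
        MetricsRegistry registry = new MetricsRegistry();
        CountDownLatch start = new CountDownLatch(1);
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                try {
                    start.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                registry.getOrRegister("kylin_test,sub=xxx");
            });
            workers[i].start();
        }
        start.countDown();
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return registry.registrations.get();
    }
}
```

However many threads race on the first query, the registration body runs only 
once, so an "already exists" error cannot occur.
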




[jira] [Created] (KYLIN-2127) UI bug fix for Extend Column

2016-10-25 Thread kangkaisen (JIRA)
kangkaisen created KYLIN-2127:
-

 Summary: UI bug fix for Extend Column
 Key: KYLIN-2127
 URL: https://issues.apache.org/jira/browse/KYLIN-2127
 Project: Kylin
  Issue Type: Bug
  Components: Web 
Affects Versions: v1.5.4.1
Reporter: kangkaisen
Assignee: kangkaisen


In version 1.5.4.1 of Kylin, if we first add a new SUM (MAX, MIN, ...) measure 
and then add an Extend Column measure, saving the cube fails.
This is because the JSON data of the Extend Column measure looks like this:
{{{
  "name": "周起始日",
  "function": {
    "expression": "EXTENDED_COLUMN",
    "returntype": "extendedcolumn(100)",
    "parameter": {
      "type": "column",
      "value": "WK",
      "next_parameter": {
        "type": "column",
        "value": "WK_FROM",
        "next_parameter": {}
      }
    },
    "configuration": null
  }
},}}.
The last {{next_parameter}} is {}, but it should be null.

This bug may have been introduced by KYLIN-1767.





[jira] [Created] (KYLIN-2114) WEB-Global-Dictionary bug fix and improve

2016-10-20 Thread kangkaisen (JIRA)
kangkaisen created KYLIN-2114:
-

 Summary: WEB-Global-Dictionary bug fix and improve
 Key: KYLIN-2114
 URL: https://issues.apache.org/jira/browse/KYLIN-2114
 Project: Kylin
  Issue Type: Bug
  Components: Web 
Affects Versions: v1.5.4.1
Reporter: kangkaisen
Assignee: kangkaisen


In version 1.5.4.1 of Kylin, the web UI for WEB-Global-Dictionary can't select 
a column from the measure columns and requires the user to enter the 
dictionary builder class manually.





[jira] [Created] (KYLIN-2109) Deploy coprocessor only this server own the table

2016-10-19 Thread kangkaisen (JIRA)
kangkaisen created KYLIN-2109:
-

 Summary: Deploy coprocessor only this server own the table
 Key: KYLIN-2109
 URL: https://issues.apache.org/jira/browse/KYLIN-2109
 Project: Kylin
  Issue Type: Bug
  Components: Tools, Build and Test
Affects Versions: v1.5.4.1
Reporter: kangkaisen
Assignee: kangkaisen
Priority: Critical


When a table has been migrated from the test env to the prod env and we update 
the coprocessor in the test env, we should not update the coprocessor of the 
migrated table, otherwise queries to the prod env will fail.





[jira] [Created] (KYLIN-2093) Clear cache in CubeMetaIngester

2016-10-13 Thread kangkaisen (JIRA)
kangkaisen created KYLIN-2093:
-

 Summary: Clear cache in CubeMetaIngester
 Key: KYLIN-2093
 URL: https://issues.apache.org/jira/browse/KYLIN-2093
 Project: Kylin
  Issue Type: Bug
  Components: Tools, Build and Test
Affects Versions: v1.5.4.1
Reporter: kangkaisen
Assignee: kangkaisen


When the target project doesn't have the hive table and we have copied the 
metadata, the {{MetadataManager}} cannot get the hive table from the 
{{srcTableMap}}.







[jira] [Created] (KYLIN-2089) Make update HBase coprocessor concurrent

2016-10-12 Thread kangkaisen (JIRA)
kangkaisen created KYLIN-2089:
-

 Summary: Make update HBase coprocessor concurrent
 Key: KYLIN-2089
 URL: https://issues.apache.org/jira/browse/KYLIN-2089
 Project: Kylin
  Issue Type: Improvement
  Components: Tools, Build and Test
Affects Versions: v1.5.4.1
Reporter: kangkaisen
Assignee: kangkaisen


When we have thousands of HBase tables, updating the coprocessor takes several 
hours. This means we must stop the query service for hours, so we should make 
updating the HBase coprocessor concurrent.
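
A sketch of the concurrent version using a fixed thread pool. The per-table 
update is passed in as a callback because the real HBase calls are out of 
scope here; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;

// Hypothetical sketch; not Kylin's DeployCoprocessorCLI.
class ConcurrentUpdate {

    // Updates all tables in parallel and returns the tables that failed.
    static List<String> updateAll(List<String> tables, int threads, Consumer<String> updateOne) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        List<String> failed = new ArrayList<>();
        try {
            Map<String, Future<?>> pending = new LinkedHashMap<>();
            for (String table : tables) {
                pending.put(table, pool.submit(() -> updateOne.accept(table)));
            }
            for (Map.Entry<String, Future<?>> entry : pending.entrySet()) {
                try {
                    entry.getValue().get(); // wait for this table's update
                } catch (ExecutionException e) {
                    failed.add(entry.getKey()); // the update threw an exception
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    failed.add(entry.getKey());
                }
            }
        } finally {
            pool.shutdown();
        }
        return failed;
    }
}
```

Collecting failures instead of aborting on the first error lets one bad table 
be retried later without redoing the rest.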





[jira] [Created] (KYLIN-2006) Make job build server distributed

2016-09-10 Thread kangkaisen (JIRA)
kangkaisen created KYLIN-2006:
-

 Summary: Make job build server distributed
 Key: KYLIN-2006
 URL: https://issues.apache.org/jira/browse/KYLIN-2006
 Project: Kylin
  Issue Type: New Feature
  Components: Job Engine
Reporter: kangkaisen
Assignee: kangkaisen


Currently, the Kylin job build server is a single point of failure.
To make the Kylin job build server more extensible, available, and reliable, 
we should support a distributed job build server.





[jira] [Created] (KYLIN-1992) Clear ThreadLocal Contexts when query failed before scaning HBase

2016-09-01 Thread kangkaisen (JIRA)
kangkaisen created KYLIN-1992:
-

 Summary: Clear ThreadLocal Contexts when query failed before 
scanning HBase
 Key: KYLIN-1992
 URL: https://issues.apache.org/jira/browse/KYLIN-1992
 Project: Kylin
  Issue Type: Bug
Reporter: kangkaisen
Assignee: kangkaisen
Priority: Minor


Currently, we call `OLAPContext.clearThreadLocalContexts()` before scanning 
HBase.
If a query fails before scanning HBase, a later query may get the wrong 
`realization`, because the Tomcat thread pool reuses threads and does not 
clear the ThreadLocal variable.
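
One way to guard against this can be sketched as follows: clear the 
thread-local state in a `finally` block so it is removed even when the query 
fails early. The class `ThreadLocalCleanupSketch` and its `REALIZATION` field 
are stand-ins for illustration and are not Kylin's `OLAPContext`.

```java
// Hypothetical stand-in for a per-query thread-local context.
class ThreadLocalCleanupSketch {

    static final ThreadLocal<String> REALIZATION = new ThreadLocal<>();

    // Runs one "query". The context is removed in finally, so a failure
    // before the scan cannot leave state behind on a pooled thread.
    static String execute(String realization, boolean failBeforeScan) {
        try {
            REALIZATION.set(realization);
            if (failBeforeScan) {
                throw new IllegalStateException("query failed before scan");
            }
            return REALIZATION.get();
        } finally {
            REALIZATION.remove();
        }
    }

    // What the next query on the same thread would see before setting its own context.
    static String leftover() {
        return REALIZATION.get();
    }
}
```

Because pooled threads outlive individual requests, the cleanup has to run on 
every exit path, not only on the successful one.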





[jira] [Created] (KYLIN-1986) CubeMigrationCLI: make global dictionary unique

2016-08-30 Thread kangkaisen (JIRA)
kangkaisen created KYLIN-1986:
-

 Summary: CubeMigrationCLI: make global dictionary unique
 Key: KYLIN-1986
 URL: https://issues.apache.org/jira/browse/KYLIN-1986
 Project: Kylin
  Issue Type: Bug
  Components: Tools, Build and Test
Affects Versions: v1.5.3
Reporter: kangkaisen
Assignee: kangkaisen


The global dictionary is shared by all segments of one cube, so when we migrate 
the global dictionary, we should copy the global dictionary file only once.
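
A minimal sketch of the deduplication, assuming each segment exposes the list 
of dictionary paths it references. The class `GlobalDictCopyPlan` and the 
example paths are hypothetical and do not reflect Kylin's actual metadata 
layout.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of CubeMigrationCLI.
class GlobalDictCopyPlan {

    // Returns every referenced dictionary path exactly once, in first-seen
    // order, so a dictionary shared by several segments is copied one time.
    static List<String> pathsToCopy(List<List<String>> dictPathsPerSegment) {
        Set<String> unique = new LinkedHashSet<>();
        for (List<String> segmentPaths : dictPathsPerSegment) {
            unique.addAll(segmentPaths);
        }
        return new ArrayList<>(unique);
    }
}
```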





[jira] [Created] (KYLIN-1982) CubeMigrationCLI: associate model_name with project

2016-08-29 Thread kangkaisen (JIRA)
kangkaisen created KYLIN-1982:
-

 Summary: CubeMigrationCLI: associate model_name with project
 Key: KYLIN-1982
 URL: https://issues.apache.org/jira/browse/KYLIN-1982
 Project: Kylin
  Issue Type: Bug
  Components: Tools, Build and Test
Affects Versions: v1.5.3
Reporter: kangkaisen
Assignee: kangkaisen


In the current `CubeMigrationCLI`, when we migrate a cube, the model metadata 
is migrated, but the model is not associated with the project. 
So if we get the model via `getModels` in `ModelController` with "modelName" 
and "projectName", we will get null.





[jira] [Created] (KYLIN-1965) Check duplicated measure name

2016-08-18 Thread kangkaisen (JIRA)
kangkaisen created KYLIN-1965:
-

 Summary: Check duplicated measure name
 Key: KYLIN-1965
 URL: https://issues.apache.org/jira/browse/KYLIN-1965
 Project: Kylin
  Issue Type: Improvement
  Components: Metadata
Affects Versions: v1.5.2, v1.5.3
Reporter: kangkaisen
Assignee: kangkaisen


A duplicated measure name will cause queries to fail, so we should check for 
duplicated measure names.
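
Such a check could look roughly like the sketch below. The class 
`MeasureNameCheck` is hypothetical, and it assumes names are compared exactly 
(case-sensitively), which may differ from how Kylin resolves measure names.

```java
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical validator, not Kylin's actual implementation.
class MeasureNameCheck {

    // Returns the names that occur more than once, in order of first repeat.
    static Set<String> findDuplicates(List<String> measureNames) {
        Set<String> seen = new HashSet<>();
        Set<String> duplicates = new LinkedHashSet<>();
        for (String name : measureNames) {
            if (!seen.add(name)) {
                duplicates.add(name);
            }
        }
        return duplicates;
    }

    // Rejects a measure list that contains any duplicated name.
    static void validate(List<String> measureNames) {
        Set<String> duplicates = findDuplicates(measureNames);
        if (!duplicates.isEmpty()) {
            throw new IllegalArgumentException("Duplicated measure name(s): " + duplicates);
        }
    }
}
```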





[jira] [Created] (KYLIN-1908) Collect Metrics to JMX

2016-07-20 Thread kangkaisen (JIRA)
kangkaisen created KYLIN-1908:
-

 Summary: Collect Metrics to JMX
 Key: KYLIN-1908
 URL: https://issues.apache.org/jira/browse/KYLIN-1908
 Project: Kylin
  Issue Type: New Feature
  Components: Tools, Build and Test
Affects Versions: v1.5.2
Reporter: kangkaisen
Assignee: kangkaisen


As we all know, performance metrics are important for enterprise 
applications, so Kylin should support collecting metrics to JMX.

The approach I have taken is as follows:

1. Use `org.apache.hadoop.metrics2` as the metrics collection framework.
2. Define MBean classes for the metrics that we need to collect.
3. Update the metrics in the right places.
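
For illustration, steps 2 and 3 can be sketched with a plain 
`javax.management` standard MBean. This deliberately swaps in the JDK's JMX API 
rather than `org.apache.hadoop.metrics2`, and the names `JmxMetricsSketch`, 
`QueryMetrics`, and the object name passed to `register` are hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.util.concurrent.atomic.AtomicLong;
import javax.management.JMException;
import javax.management.MBeanServer;
import javax.management.ObjectName;

// Hypothetical sketch using the JDK's JMX API, not hadoop metrics2.
class JmxMetricsSketch {

    // Step 2: a standard MBean interface; each getter becomes a JMX attribute.
    public interface QueryMetricsMBean {
        long getQueryCount();
        long getFailedQueryCount();
    }

    public static class QueryMetrics implements QueryMetricsMBean {
        private final AtomicLong queryCount = new AtomicLong();
        private final AtomicLong failedQueryCount = new AtomicLong();

        // Step 3: called from the place where a query finishes.
        public void recordQuery(boolean failed) {
            queryCount.incrementAndGet();
            if (failed) {
                failedQueryCount.incrementAndGet();
            }
        }

        @Override public long getQueryCount() { return queryCount.get(); }
        @Override public long getFailedQueryCount() { return failedQueryCount.get(); }
    }

    // Registers a new metrics bean with the platform MBean server.
    static QueryMetrics register(String objectName) {
        try {
            QueryMetrics metrics = new QueryMetrics();
            MBeanServer server = ManagementFactory.getPlatformMBeanServer();
            server.registerMBean(metrics, new ObjectName(objectName));
            return metrics;
        } catch (JMException e) {
            throw new IllegalStateException("cannot register " + objectName, e);
        }
    }
}
```

Once registered, the counters appear as the attributes `QueryCount` and 
`FailedQueryCount` in any JMX client such as JConsole.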

The questions I have:
1. Can I depend on `org.apache.hadoop.metrics2` directly?
2. What do you think of this approach?





[jira] [Created] (KYLIN-1896) JDBC support mybatis

2016-07-15 Thread kangkaisen (JIRA)
kangkaisen created KYLIN-1896:
-

 Summary: JDBC support mybatis
 Key: KYLIN-1896
 URL: https://issues.apache.org/jira/browse/KYLIN-1896
 Project: Kylin
  Issue Type: Bug
  Components: Driver - JDBC
Affects Versions: v1.5.2
Reporter: kangkaisen
Assignee: kangkaisen


When one of our users used MyBatis, he found that MyBatis needs 
`columnClassType` in `ColumnMetaData`. But in the current version of Kylin, 
when constructing the `ColumnMetaData`, the last parameter, `columnClassType`, 
is null.
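
A JDBC driver reports this value through 
`ResultSetMetaData.getColumnClassName`. One possible way to derive a non-null 
class name from the column's SQL type is sketched below. The class 
`ColumnClassNameSketch` and the particular type mapping are assumptions for 
illustration, not Kylin's actual fix.

```java
import java.math.BigDecimal;
import java.sql.Date;
import java.sql.Timestamp;
import java.sql.Types;

// Hypothetical mapping from java.sql.Types codes to Java class names.
class ColumnClassNameSketch {

    // Returns a class name for the given SQL type; unknown types fall back
    // to java.lang.Object so the result is never null.
    static String classNameFor(int sqlType) {
        switch (sqlType) {
            case Types.CHAR:
            case Types.VARCHAR:
                return String.class.getName();
            case Types.INTEGER:
                return Integer.class.getName();
            case Types.BIGINT:
                return Long.class.getName();
            case Types.DOUBLE:
                return Double.class.getName();
            case Types.DECIMAL:
                return BigDecimal.class.getName();
            case Types.DATE:
                return Date.class.getName();
            case Types.TIMESTAMP:
                return Timestamp.class.getName();
            default:
                return Object.class.getName();
        }
    }
}
```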





[jira] [Created] (KYLIN-1893) Upgrade spring-boot framework because of security vulnerabilities

2016-07-14 Thread kangkaisen (JIRA)
kangkaisen created KYLIN-1893:
-

 Summary: Upgrade spring-boot framework because of security 
vulnerabilities
 Key: KYLIN-1893
 URL: https://issues.apache.org/jira/browse/KYLIN-1893
 Project: Kylin
  Issue Type: Bug
  Components: REST Service
Affects Versions: v1.5.2
Reporter: kangkaisen
Assignee: Zhong,Jason
Priority: Critical


The Spring Boot framework has a SpEL expression injection vulnerability that 
affects versions 1.1-1.3.0.
We need to upgrade to version 1.3.1 or later.



https://www.chinacybersafety.com/tag/the-common-vulnerabilities-and-high-risk-vulnerabilities-early-warning-framework-spring-boot





[jira] [Created] (KYLIN-1884) Reload metadata automatically after migrating cube

2016-07-13 Thread kangkaisen (JIRA)
kangkaisen created KYLIN-1884:
-

 Summary: Reload metadata automatically after migrating cube
 Key: KYLIN-1884
 URL: https://issues.apache.org/jira/browse/KYLIN-1884
 Project: Kylin
  Issue Type: Improvement
  Components: Tools, Build and Test
Affects Versions: v1.5.2
Reporter: kangkaisen
Assignee: kangkaisen


In the current version of Kylin, we need to reload metadata manually after 
migrating a cube.
In our production environment we have many REST servers, so we hope metadata 
can be reloaded automatically after a cube is migrated.





[jira] [Created] (KYLIN-1695) disable cardinality calculation job when loading hive table

2016-05-15 Thread kangkaisen (JIRA)
kangkaisen created KYLIN-1695:
-

 Summary: disable cardinality calculation job when loading hive 
table
 Key: KYLIN-1695
 URL: https://issues.apache.org/jira/browse/KYLIN-1695
 Project: Kylin
  Issue Type: Bug
  Components: Job Engine
Affects Versions: v1.5.1
Reporter: kangkaisen
Assignee: Dong Li


When a user loads or reloads Hive tables from the web console, Kylin submits 
an MR job asynchronously to calculate column cardinalities. This has four 
major problems:

# the calculated cardinality is stored in table metadata, but never used in 
cubing/querying
# table may change after loading, so the cardinality doesn't necessarily 
reflect the actual value
# the current `HiveColumnCardinalityJob` has many limitations, e.g., it doesn't 
support views
# the `HiveColumnCardinalityJob` may use lots of resources when computing 
cardinality of partitioned table

Due to these problems, we should disable it by default and (maybe) remove it in 
future releases.





[jira] [Created] (KYLIN-1694) make multiply coefficient configurable when estimating cuboid size

2016-05-15 Thread kangkaisen (JIRA)
kangkaisen created KYLIN-1694:
-

 Summary: make multiply coefficient configurable when estimating 
cuboid size
 Key: KYLIN-1694
 URL: https://issues.apache.org/jira/browse/KYLIN-1694
 Project: Kylin
  Issue Type: Bug
  Components: Job Engine
Affects Versions: v1.5.1, v1.5.0
Reporter: kangkaisen
Assignee: Dong Li


In the current version of the MRv2 build engine, CubeStatsReader estimates 
cuboid size as follows: "cube is memory hungry, storage size estimation 
multiply 0.05" and "cube is not memory hungry, storage size estimation 
multiply 0.25".

This has one major problem: the default multiplication coefficient is too 
small, which makes the estimated cuboid size much smaller than the actual 
cuboid size. As a result, both the number of HBase regions and the number of 
CubeHFileJob reducers are too small, which makes CubeHFileJob much slower.

After we removed the default multiplication coefficient, CubeHFileJob became 
much faster.

We should make the multiplication coefficient configurable, which would be 
friendlier for users.
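
A minimal sketch of a configurable coefficient, keeping 0.05 and 0.25 as the 
fallback values quoted above. The class `CuboidSizeEstimateSketch` and both 
property names are hypothetical and are not existing Kylin configuration keys.

```java
import java.util.Properties;

// Hypothetical sketch, not the actual CubeStatsReader code.
class CuboidSizeEstimateSketch {

    // Assumed property names, invented for this example.
    static final String KEY_MEMORY_HUNGRY = "example.cuboid.size.memhungry.ratio";
    static final String KEY_DEFAULT = "example.cuboid.size.ratio";

    // Reads the coefficient from config, falling back to the current
    // hard-coded values (0.05 for memory-hungry cubes, 0.25 otherwise).
    static double coefficient(Properties config, boolean memoryHungry) {
        String key = memoryHungry ? KEY_MEMORY_HUNGRY : KEY_DEFAULT;
        String fallback = memoryHungry ? "0.05" : "0.25";
        return Double.parseDouble(config.getProperty(key, fallback));
    }

    // Scales the raw size estimate by the configured coefficient.
    static double estimateSizeMB(double rawEstimateMB, Properties config, boolean memoryHungry) {
        return rawEstimateMB * coefficient(config, memoryHungry);
    }
}
```

With no properties set, a raw estimate of 1000 MB becomes 250 MB (or 50 MB for 
a memory-hungry cube); setting the coefficient to 1.0 leaves the raw estimate 
unchanged, which corresponds to removing the coefficient as described above.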


