[jira] [Created] (KYLIN-3694) Kylin On Druid Storage
kangkaisen created KYLIN-3694: - Summary: Kylin On Druid Storage Key: KYLIN-3694 URL: https://issues.apache.org/jira/browse/KYLIN-3694 Project: Kylin Issue Type: New Feature Components: Job Engine, Metadata, Query Engine Affects Versions: v2.5.0 Reporter: kangkaisen Assignee: kangkaisen Attachments: Kylin On Druid Storage.pdf The Meituan Kylin team has implemented a new storage engine for Kylin: the Druid Storage Engine. The attached file is the architecture design doc for the Kylin On Druid Storage Engine. We would like to contribute this feature to the community; please let us know if you have any concerns. [^Kylin On Druid Storage.pdf] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (KYLIN-3425) Kylin v2.3.2 Release
kangkaisen created KYLIN-3425: - Summary: Kylin v2.3.2 Release Key: KYLIN-3425 URL: https://issues.apache.org/jira/browse/KYLIN-3425 Project: Kylin Issue Type: Task Reporter: kangkaisen Assignee: kangkaisen Fix For: v2.3.2
[jira] [Created] (KYLIN-3205) Allow one column is used for both dimension and precisely count distinct measure
kangkaisen created KYLIN-3205: - Summary: Allow one column is used for both dimension and precisely count distinct measure Key: KYLIN-3205 URL: https://issues.apache.org/jira/browse/KYLIN-3205 Project: Kylin Issue Type: Bug Components: Metadata Affects Versions: v2.2.0 Reporter: kangkaisen Assignee: kangkaisen I introduced a bug in KYLIN-2316. We should allow one column to be used as both a dimension and a precise count distinct measure, as long as the dimension encoding is not dict.
[jira] [Created] (KYLIN-3133) Fix KYLIN-2717 compatibility issue
kangkaisen created KYLIN-3133: - Summary: Fix KYLIN-2717 compatibility issue Key: KYLIN-3133 URL: https://issues.apache.org/jira/browse/KYLIN-3133 Project: Kylin Issue Type: Bug Components: Metadata, Tools, Build and Test Affects Versions: v2.2.0 Reporter: kangkaisen Assignee: kangkaisen Fix the KYLIN-2717 compatibility issue: 1. Keep the old getTableDesc API so that users can do a rolling upgrade to v2.2.0 when they have dozens of QueryServers. 2. Use tableRef.getTableDesc().getProject() instead of modelDesc.getProject() to stay compatible with the old table resource path format.
[jira] [Created] (KYLIN-3117) Hide project config in cube level
kangkaisen created KYLIN-3117: - Summary: Hide project config in cube level Key: KYLIN-3117 URL: https://issues.apache.org/jira/browse/KYLIN-3117 Project: Kylin Issue Type: Improvement Components: Metadata Affects Versions: v2.2.0 Reporter: kangkaisen Assignee: kangkaisen Currently, the project configs are put into the overrideKylinProps of the cube, so normal users see project configs at the cube level. Generally, the project configs cover authentication, security, resources, query restrictions, and so on, so we shouldn't let normal users see them. The project configs should only be visible to the Kylin Admin.
[jira] [Created] (KYLIN-3113) Editing Measure supports fuzzy search in web
kangkaisen created KYLIN-3113: - Summary: Editing Measure supports fuzzy search in web Key: KYLIN-3113 URL: https://issues.apache.org/jira/browse/KYLIN-3113 Project: Kylin Issue Type: Improvement Components: Web Affects Versions: v2.2.0 Reporter: kangkaisen Assignee: kangkaisen Since Kylin 2.0, a column in the web UI is shown as table name plus column name, so prefix search is useless, which is a bad user experience. We should support fuzzy search when editing a measure.
[jira] [Created] (KYLIN-3002) Use Spark as default engine for none-global-dict cube
kangkaisen created KYLIN-3002: - Summary: Use Spark as default engine for none-global-dict cube Key: KYLIN-3002 URL: https://issues.apache.org/jira/browse/KYLIN-3002 Project: Kylin Issue Type: Improvement Components: Web Reporter: kangkaisen Assignee: kangkaisen After KYLIN-2997, as with KYLIN-2963, we could use Spark as the default engine for non-global-dict cubes.
[jira] [Created] (KYLIN-3000) Add a tool supporting migrate Cubedesc across different HBase cluster
kangkaisen created KYLIN-3000: - Summary: Add a tool supporting migrate Cubedesc across different HBase cluster Key: KYLIN-3000 URL: https://issues.apache.org/jira/browse/KYLIN-3000 Project: Kylin Issue Type: New Feature Components: Tools, Build and Test Reporter: kangkaisen Assignee: kangkaisen Priority: Major Add a tool that supports migrating a Cubedesc across different HBase clusters.
[jira] [Created] (KYLIN-2999) One click migrate cube in web
kangkaisen created KYLIN-2999: - Summary: One click migrate cube in web Key: KYLIN-2999 URL: https://issues.apache.org/jira/browse/KYLIN-2999 Project: Kylin Issue Type: New Feature Components: Tools, Build and Test, Web Reporter: kangkaisen Assignee: kangkaisen Priority: Major Currently, cube migration must be done by the Kylin Admin, which wastes a lot of the Kylin Admin's time. We should allow users to migrate a cube with one click in the web UI. Of course, this should be configurable.
[jira] [Created] (KYLIN-2998) Kill spark app when job was discarded
kangkaisen created KYLIN-2998: - Summary: Kill spark app when job was discarded Key: KYLIN-2998 URL: https://issues.apache.org/jira/browse/KYLIN-2998 Project: Kylin Issue Type: Improvement Components: Spark Engine Affects Versions: v2.1.0 Reporter: kangkaisen Assignee: kangkaisen Priority: Major Currently, when we discard a Spark job, the Spark job keeps running, and when we restart the JobServer, the SparkExecutable submits a new Spark job. We should handle Spark jobs the same way as MR jobs.
[jira] [Created] (KYLIN-2997) Allow change engineType even if there are segments in cube
kangkaisen created KYLIN-2997: - Summary: Allow change engineType even if there are segments in cube Key: KYLIN-2997 URL: https://issues.apache.org/jira/browse/KYLIN-2997 Project: Kylin Issue Type: Bug Components: Metadata, Web Affects Versions: v2.1.0 Reporter: kangkaisen Assignee: kangkaisen Priority: Major Currently, the cube signature contains engineType, so if users want to switch engines, they must purge the cube first. I think this is unreasonable because the engine doesn't affect queries or existing segments.
[jira] [Created] (KYLIN-2996) DeployCoprocessorCLI Log failed tables info
kangkaisen created KYLIN-2996: - Summary: DeployCoprocessorCLI Log failed tables info Key: KYLIN-2996 URL: https://issues.apache.org/jira/browse/KYLIN-2996 Project: Kylin Issue Type: Improvement Components: Storage - HBase Affects Versions: v2.1.0 Reporter: kangkaisen Assignee: kangkaisen Currently, updating the coprocessor can occasionally fail for some tables; we should report the failed tables to the user in the final output.
[jira] [Created] (KYLIN-2995) Set SparkContext.hadoopConfiguration to HadoopUtil in Spark Cuing
kangkaisen created KYLIN-2995: - Summary: Set SparkContext.hadoopConfiguration to HadoopUtil in Spark Cuing Key: KYLIN-2995 URL: https://issues.apache.org/jira/browse/KYLIN-2995 Project: Kylin Issue Type: Bug Components: Spark Engine Affects Versions: v2.1.0 Reporter: kangkaisen Assignee: kangkaisen Priority: Major Currently, we load metadata from HDFS in SparkCubing: {{AbstractHadoopJob.loadKylinConfigFromHdfs}}. But HadoopUtil uses a new Configuration; we should use SparkContext.hadoopConfiguration instead.
[jira] [Created] (KYLIN-2994) Handle NPE when load dict in DictionaryManager
kangkaisen created KYLIN-2994: - Summary: Handle NPE when load dict in DictionaryManager Key: KYLIN-2994 URL: https://issues.apache.org/jira/browse/KYLIN-2994 Project: Kylin Issue Type: Bug Components: Metadata Affects Versions: v2.1.0 Reporter: kangkaisen Assignee: kangkaisen Priority: Minor Currently, the argument {{resourcePath}} in {{DictionaryManager.getDictionaryInfo}} could be NULL
[jira] [Created] (KYLIN-2993) Add special mr config for base cuboid step
kangkaisen created KYLIN-2993: - Summary: Add special mr config for base cuboid step Key: KYLIN-2993 URL: https://issues.apache.org/jira/browse/KYLIN-2993 Project: Kylin Issue Type: Improvement Components: Job Engine Affects Versions: v2.1.0 Reporter: kangkaisen Assignee: kangkaisen Priority: Major Refer to http://kylin.apache.org/blog/2016/08/01/count-distinct-in-kylin/. Currently, if users want to enlarge MR memory for the global dict, they must use kylin.engine.mr.config-override., which enlarges the memory of all MR jobs. In fact, we only need to enlarge the memory for the "Build Base Cuboid" step, so we could add a special MR config for the base cuboid step.
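A minimal sketch of the layering idea in KYLIN-2993, assuming step-specific overrides are applied on top of the general ones. Only the general prefix kylin.engine.mr.config-override. comes from the issue text; the step-specific prefix and the class and method names below are hypothetical and exist only for illustration.

```java
import java.util.HashMap;
import java.util.Map;

final class StepConfigOverrideSketch {
    // General prefix, taken from the issue text.
    static final String GENERAL_PREFIX = "kylin.engine.mr.config-override.";
    // Hypothetical prefix, used only for this illustration.
    static final String BASE_CUBOID_PREFIX = "example.base-cuboid-config-override.";

    // Builds the Hadoop conf for one step: general overrides first,
    // then base-cuboid overrides on top when the step is "Build Base Cuboid".
    static Map<String, String> resolve(Map<String, String> props, boolean baseCuboidStep) {
        Map<String, String> hadoopConf = new HashMap<>();
        copyWithPrefix(props, GENERAL_PREFIX, hadoopConf);
        if (baseCuboidStep) {
            copyWithPrefix(props, BASE_CUBOID_PREFIX, hadoopConf);
        }
        return hadoopConf;
    }

    // Copies every property that starts with the prefix, with the prefix stripped.
    private static void copyWithPrefix(Map<String, String> props, String prefix,
                                       Map<String, String> out) {
        for (Map.Entry<String, String> e : props.entrySet()) {
            if (e.getKey().startsWith(prefix)) {
                out.put(e.getKey().substring(prefix.length()), e.getValue());
            }
        }
    }
}
```

With this layering, only the base cuboid step sees the larger memory setting; every other MR step keeps the general value.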
[jira] [Created] (KYLIN-2992) Avoid OOM in CubeHFileJob.Reducer
kangkaisen created KYLIN-2992: - Summary: Avoid OOM in CubeHFileJob.Reducer Key: KYLIN-2992 URL: https://issues.apache.org/jira/browse/KYLIN-2992 Project: Kylin Issue Type: Improvement Components: Storage - HBase Affects Versions: v2.1.0 Reporter: kangkaisen Assignee: kangkaisen Priority: Major Referring to HBASE-13897, we could also improve CubeHFileJob.Reducer to avoid OOM.
[jira] [Created] (KYLIN-2838) Should get storageType in changeHtableHost of CubeMigrationCLI
kangkaisen created KYLIN-2838: - Summary: Should get storageType in changeHtableHost of CubeMigrationCLI Key: KYLIN-2838 URL: https://issues.apache.org/jira/browse/KYLIN-2838 Project: Kylin Issue Type: Bug Components: Tools, Build and Test Affects Versions: v2.1.0 Reporter: kangkaisen Assignee: kangkaisen Fix For: v2.2.0 We should get storageType, not engineType, in changeHtableHost of CubeMigrationCLI.
[jira] [Created] (KYLIN-2764) Build the dict for UHC column with MR
kangkaisen created KYLIN-2764: - Summary: Build the dict for UHC column with MR Key: KYLIN-2764 URL: https://issues.apache.org/jira/browse/KYLIN-2764 Project: Kylin Issue Type: Improvement Components: Job Engine Affects Versions: v2.0.0 Reporter: kangkaisen Assignee: kangkaisen KYLIN-2217 builds the dict for normal columns with MR, but the dict for UHC columns is still built in the JobServer. As in KYLIN-2217, we could also use MR to build the dict for UHC columns, which would relieve the memory pressure on the JobServer, improve its job concurrency, and speed up the procedure when there are multiple UHC columns. The MR input is the output of "Extract Fact Table Distinct Columns"; the MR output is the UHC column dict. Because it is very hard to build a global dict with multiple reducers, I use one reducer per UHC column and allocate enough memory to that reducer. According to my test, 8G of memory is enough.
[jira] [Created] (KYLIN-2744) Should return correct type for SUM measure in web
kangkaisen created KYLIN-2744: - Summary: Should return correct type for SUM measure in web Key: KYLIN-2744 URL: https://issues.apache.org/jira/browse/KYLIN-2744 Project: Kylin Issue Type: Bug Components: Web Affects Versions: v2.0.0 Reporter: kangkaisen Assignee: kangkaisen Currently, Kylin returns the decimal type for the SUM measure of a double-type column, which leads to wrong results. We should return the correct type for the SUM measure in the web UI.
[jira] [Created] (KYLIN-2707) Fxi NPE in JobInfoConverter
kangkaisen created KYLIN-2707: - Summary: Fxi NPE in JobInfoConverter Key: KYLIN-2707 URL: https://issues.apache.org/jira/browse/KYLIN-2707 Project: Kylin Issue Type: Bug Components: Job Engine Affects Versions: v2.0.0 Reporter: kangkaisen Assignee: kangkaisen Priority: Minor The other day, I couldn't get all job info because the stepOutput for one job in JobInfoConverter.parseToJobStep was NULL. I didn't dig into why stepOutput was NULL, but since it can be NULL, I think we should handle it.
[jira] [Created] (KYLIN-2706) Should disable Storage limit push down when singleValuesD doesn't containsAll othersD
kangkaisen created KYLIN-2706: - Summary: Should disable Storage limit push down when singleValuesD doesn't containsAll othersD Key: KYLIN-2706 URL: https://issues.apache.org/jira/browse/KYLIN-2706 Project: Kylin Issue Type: Bug Components: Query Engine Affects Versions: v2.0.0 Reporter: kangkaisen Assignee: kangkaisen Storage limit push down should be disabled for the following SQL, because the query needs more than one record from the HBase tables while SortedIteratorMergerWithLimit returns only one record, which produces a wrong result. {code:java} SELECT sum(A) FROM TABLE WHERE date_id >= 20170624 and date_id <= 20170626 limit 1 {code} We should disable storage limit push down when singleValuesD doesn't containsAll othersD.
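The effect described in KYLIN-2706 can be sketched with plain Java collections. This is an illustrative model only, not Kylin's storage code: the class and method names are hypothetical, and each list element stands for one pre-aggregated storage record (for example one per date_id).

```java
import java.util.List;

final class LimitPushDownSketch {
    // Sums the measure over the storage records. When pushDownLimit is true,
    // the limit is applied to the storage records BEFORE the final aggregation,
    // which is the situation the issue describes.
    static long sum(List<Long> storageRecords, int limit, boolean pushDownLimit) {
        List<Long> rows = pushDownLimit
                ? storageRecords.subList(0, Math.min(limit, storageRecords.size()))
                : storageRecords;
        long total = 0;
        for (long value : rows) {
            total += value;
        }
        return total;
    }
}
```

With three records 10, 20, and 30 and limit 1, pushing the limit down sums only the first record, while applying the limit after aggregation sums all three and then returns the single result row.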
[jira] [Created] (KYLIN-2705) Should allow user to remove partition_date_column for model in web
kangkaisen created KYLIN-2705: - Summary: Should allow user to remove partition_date_column for model in web Key: KYLIN-2705 URL: https://issues.apache.org/jira/browse/KYLIN-2705 Project: Kylin Issue Type: Bug Components: Web Affects Versions: v2.0.0 Reporter: kangkaisen Assignee: kangkaisen Priority: Minor Currently, users can't remove partition_date_column from a model in the web UI.
[jira] [Created] (KYLIN-2695) Should allow user to override spark conf in cube
kangkaisen created KYLIN-2695: - Summary: Should allow user to override spark conf in cube Key: KYLIN-2695 URL: https://issues.apache.org/jira/browse/KYLIN-2695 Project: Kylin Issue Type: Improvement Components: Spark Engine Affects Versions: v2.0.0 Reporter: kangkaisen Assignee: kangkaisen Currently, we can only get the Spark conf from the Kylin server config. We should allow users to override the Spark conf per cube.
[jira] [Created] (KYLIN-2694) Fix ArrayIndexOutOfBoundsException in SparkCubingByLayer
kangkaisen created KYLIN-2694: - Summary: Fix ArrayIndexOutOfBoundsException in SparkCubingByLayer Key: KYLIN-2694 URL: https://issues.apache.org/jira/browse/KYLIN-2694 Project: Kylin Issue Type: Bug Components: Spark Engine Affects Versions: v2.0.0 Reporter: kangkaisen Assignee: kangkaisen Priority: Minor cubeDesc.getBuildLevel() could be zero, in which case allRDDs[totalLevels - 1].unpersist() throws ArrayIndexOutOfBoundsException.
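One possible guard for the case in KYLIN-2694, sketched without Spark. The names below are hypothetical, a plain Object[] stands in for the RDD array, and clearing a slot stands in for unpersist(); the actual fix in SparkCubingByLayer may look different.

```java
final class UnpersistGuardSketch {
    // Index of the last level's entry, or -1 when there are no levels.
    static int lastLevelIndex(int totalLevels) {
        return totalLevels > 0 ? totalLevels - 1 : -1;
    }

    // Releases the last level only if it exists; returns whether anything was released.
    static boolean unpersistLast(Object[] allRDDs, int totalLevels) {
        int idx = lastLevelIndex(totalLevels);
        if (idx < 0 || idx >= allRDDs.length) {
            return false; // totalLevels == 0 would otherwise index allRDDs[-1]
        }
        allRDDs[idx] = null; // stand-in for allRDDs[idx].unpersist()
        return true;
    }
}
```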
[jira] [Created] (KYLIN-2693) Should use overrideHiveConfig for LookupHiveViewMaterialization and RedistributeFlatHiveTable
kangkaisen created KYLIN-2693: - Summary: Should use overrideHiveConfig for LookupHiveViewMaterialization and RedistributeFlatHiveTable Key: KYLIN-2693 URL: https://issues.apache.org/jira/browse/KYLIN-2693 Project: Kylin Issue Type: Bug Components: Job Engine Affects Versions: v2.0.0 Reporter: kangkaisen Assignee: kangkaisen Currently, we use KylinConfig for the LookupHiveViewMaterialization and RedistributeFlatHiveTable steps. We should use cubeOverrideHiveConfig.
[jira] [Created] (KYLIN-2675) The hfileSizeMB should not relay on kylin.env
kangkaisen created KYLIN-2675: - Summary: The hfileSizeMB should not relay on kylin.env Key: KYLIN-2675 URL: https://issues.apache.org/jira/browse/KYLIN-2675 Project: Kylin Issue Type: Bug Components: Storage - HBase Affects Versions: v2.0.0 Reporter: kangkaisen Assignee: kangkaisen The default value of kylin.env is DEV if users don't set it, which makes kylin.storage.hbase.hfile-size-gb useless. So hfileSizeMB should not rely on kylin.env.
[jira] [Created] (KYLIN-2674) Should not catch OutOfMemoryError in coprocessor
kangkaisen created KYLIN-2674: - Summary: Should not catch OutOfMemoryError in coprocessor Key: KYLIN-2674 URL: https://issues.apache.org/jira/browse/KYLIN-2674 Project: Kylin Issue Type: Bug Components: Storage - HBase Affects Versions: v2.0.0 Reporter: kangkaisen Assignee: kangkaisen We have almost no reason to catch OutOfMemoryError. Catching it results in terrible query behavior when an HBase RegionServer runs out of memory.
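A minimal sketch of the principle in KYLIN-2674, assuming the handler should translate ordinary exceptions into an error response but let OutOfMemoryError propagate. The class and method names are hypothetical and this is not the coprocessor's actual code.

```java
import java.util.function.Supplier;

final class CoprocessorErrorHandlingSketch {
    // Runs the scan. RuntimeExceptions become an error response;
    // Errors such as OutOfMemoryError are deliberately NOT caught.
    static String visit(Supplier<String> scan) {
        try {
            return scan.get();
        } catch (RuntimeException e) {
            return "error: " + e.getMessage();
        }
    }
}
```

Catching Throwable instead of RuntimeException here would swallow the OutOfMemoryError as well, which is the pattern the issue argues against.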
[jira] [Created] (KYLIN-2673) Should allow user to change fact table as long as the cube is disable
kangkaisen created KYLIN-2673: - Summary: Should allow user to change fact table as long as the cube is disable Key: KYLIN-2673 URL: https://issues.apache.org/jira/browse/KYLIN-2673 Project: Kylin Issue Type: Bug Components: Web Affects Versions: v2.0.0 Reporter: kangkaisen Assignee: kangkaisen Currently, users can't change the fact table even when the cube is disabled, which isn't reasonable. We should allow users to change the fact table as long as the cube is disabled.
[jira] [Created] (KYLIN-2672) Only clean necessary cache for CubeMigrationCLI
kangkaisen created KYLIN-2672: - Summary: Only clean necessary cache for CubeMigrationCLI Key: KYLIN-2672 URL: https://issues.apache.org/jira/browse/KYLIN-2672 Project: Kylin Issue Type: Improvement Components: Tools, Build and Test Affects Versions: v2.0.0 Reporter: kangkaisen Assignee: kangkaisen Currently, we simply clear ALL caches in CubeMigrationCLI, which makes some queries slower in the prod env when we have many tables, models, and cubes and migrate cubes often. We could clean only the necessary caches in CubeMigrationCLI.
[jira] [Created] (KYLIN-2665) Add model JSON edit in web
kangkaisen created KYLIN-2665: - Summary: Add model JSON edit in web Key: KYLIN-2665 URL: https://issues.apache.org/jira/browse/KYLIN-2665 Project: Kylin Issue Type: New Feature Components: Web Affects Versions: v2.0.0 Reporter: kangkaisen Assignee: kangkaisen Currently, when the model metadata is broken, we must use {{bin/metastore.sh}} to fix it, which is troublesome. We should allow the admin to edit the model JSON directly in the web UI.
[jira] [Created] (KYLIN-2664) Fix Extended column bug in web
kangkaisen created KYLIN-2664: - Summary: Fix Extended column bug in web Key: KYLIN-2664 URL: https://issues.apache.org/jira/browse/KYLIN-2664 Project: Kylin Issue Type: Bug Components: Web Affects Versions: v2.0.0 Reporter: kangkaisen Assignee: kangkaisen The option for {{Extended column on fact table}} should be {{getCommonMetricColumns()}} not {{getCommonMetricColumns()}}
[jira] [Created] (KYLIN-2653) Spark cubing support HBase cluster with kerberos on Yarn client mode
kangkaisen created KYLIN-2653: - Summary: Spark cubing support HBase cluster with kerberos on Yarn client mode Key: KYLIN-2653 URL: https://issues.apache.org/jira/browse/KYLIN-2653 Project: Kylin Issue Type: Bug Components: Spark Engine Affects Versions: v2.0.0 Reporter: kangkaisen Assignee: kangkaisen Currently, Spark cubing doesn't support HBase clusters with Kerberos. For now, we could support HBase clusters with Kerberos in YARN client mode, because that is easy. In the long term, we should avoid accessing HBase in Spark cubing.
[jira] [Created] (KYLIN-2652) Make KylinConfig threadsafe in CubeVisitService
kangkaisen created KYLIN-2652: - Summary: Make KylinConfig threadsafe in CubeVisitService Key: KYLIN-2652 URL: https://issues.apache.org/jira/browse/KYLIN-2652 Project: Kylin Issue Type: Bug Components: Storage - HBase Affects Versions: v2.0.0 Reporter: kangkaisen Assignee: kangkaisen Currently, the KylinConfig in CubeVisitService is not thread-safe. This bug wasn't exposed until KYLIN-2195 updated the naming convention for Kylin properties. Suppose a user upgrading to Kylin 2.0 has set kylin.query.endpoint.compression.result=false and upgrades only one QueryServer to 2.0 first. The config kylin.query.endpoint.compression.result is renamed to kylin.storage.hbase.endpoint-compress-result, so the CubeVisitService in HBase gets true from {{kylinConfig.getCompressionResult()}}, which is inconsistent with the QueryServer config and makes the query fail. Because the KylinConfig in CubeVisitService is not thread-safe, this fails queries not only from the one upgraded QueryServer but also from all JobServers and all other QueryServers.
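One common way to keep per-request configuration from leaking between concurrent requests is a ThreadLocal. The sketch below illustrates that idea only; the issue text does not say how KylinConfig was actually fixed, and all names here are hypothetical.

```java
import java.util.concurrent.CountDownLatch;

final class ThreadLocalConfigSketch {
    // Per-thread copy of the "compress result" flag sent with each request.
    private static final ThreadLocal<Boolean> COMPRESS_RESULT =
            ThreadLocal.withInitial(() -> Boolean.TRUE);

    // Stores the requested value, waits until the other thread has stored its
    // own value too, then reads back what THIS thread stored.
    private static boolean setThenRead(boolean requested, CountDownLatch bothSet) {
        COMPRESS_RESULT.set(requested);
        bothSet.countDown();
        try {
            bothSet.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        try {
            return COMPRESS_RESULT.get();
        } finally {
            COMPRESS_RESULT.remove(); // avoid leaking state into pooled threads
        }
    }

    // Serves two requests concurrently and returns what each one observed.
    static boolean[] serveConcurrently(boolean first, boolean second) {
        CountDownLatch bothSet = new CountDownLatch(2);
        boolean[] seen = new boolean[2];
        Thread t1 = new Thread(() -> seen[0] = setThenRead(first, bothSet));
        Thread t2 = new Thread(() -> seen[1] = setThenRead(second, bothSet));
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen;
    }
}
```

With a single shared static field instead of the ThreadLocal, both requests could observe whichever value was written last, which matches the cross-server failure the issue describes.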
[jira] [Created] (KYLIN-2647) Should get FileSystem from HBaseConfiguration in HBaseResourceStore
kangkaisen created KYLIN-2647: - Summary: Should get FileSystem from HBaseConfiguration in HBaseResourceStore Key: KYLIN-2647 URL: https://issues.apache.org/jira/browse/KYLIN-2647 Project: Kylin Issue Type: Bug Components: Metadata Affects Versions: v2.0.0 Reporter: kangkaisen Assignee: kangkaisen Priority: Critical KYLIN-2351 introduced a bug if User use Standalone HBase Cluster. {code:java} Error while executing SQL "SELECT SUM(revenue) AS revenue, SUM(profit) AS profit, SUM(repay_profit) AS repayProfit, SUM(fraud_profit) AS fraudProfit, SUM(share_profit) AS shareProfit, SUM(consume) AS consume, SUM(repay_consume) AS repayConsume, SUM(fraud_consume) AS fraudConsume, SUM(share_consume) AS shareConsume, SUM(cost) AS cost, SUM(fraud_cost) AS fraudCost, SUM(repay_cost) AS repayCost, poi_cate2_id AS poiCategory2Id, poi_cate2_name AS poiCategory2Name, main_poi_id AS orgId, main_poi_name AS orgName, COUNT(DISTINCT NEW_OBJECT) AS newDeal, COUNT(DISTINCT ONLINE_OBJECT) AS onlineDeal, partition_date AS dateStr FROM mart_catering.app_shu_v5_trade_view WHERE (bd_id = 2084324 AND c_platform IN ('mt', 'dp') AND partition_date = '2017-05-24') GROUP BY poi_cate2_id, poi_cate2_name, partition_date, main_poi_id, main_poi_name LIMIT 5": java.io.FileNotFoundException: File does not exist: /user/kylin2x/prod/kylin2x_metadata_prod/resources/dict/MART_CATERING.APP_SHU_V5_TRADE_VIEW/C_OBJECT_ID/854df823-abc8-4e19-9035-def12f8af3e2.dict at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:71) at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:61) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1850) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1821) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1729) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:589) at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:365) at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043) at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:299) at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:793) at org.apache.kylin.storage.hbase.HBaseResourceStore.getInputStream(HBaseResourceStore.java:206) at org.apache.kylin.storage.hbase.HBaseResourceStore.getResourceImpl(HBaseResourceStore.java:226) at org.apache.kylin.common.persistence.ResourceStore.getResource(ResourceStore.java:148) at org.apache.kylin.dict.DictionaryManager.load(DictionaryManager.java:448) at org.apache.kylin.dict.DictionaryManager$1.load(DictionaryManager.java:105) at org.apache.kylin.dict.DictionaryManager$1.load(DictionaryManager.java:102) at com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3599) at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2379) at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2342) at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2257) at com.google.common.cache.LocalCache.get(LocalCache.java:4000) at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:4004) at 
com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4874) at org.apache.kylin.dict.DictionaryManager.getDictionaryInfo(DictionaryManager.java:122) {code}
[jira] [Created] (KYLIN-2642) Relax check in RowKeyColDesc to keep backward compatibility
kangkaisen created KYLIN-2642: - Summary: Relax check in RowKeyColDesc to keep backward compatibility Key: KYLIN-2642 URL: https://issues.apache.org/jira/browse/KYLIN-2642 Project: Kylin Issue Type: Bug Components: Metadata Affects Versions: v2.0.0 Reporter: kangkaisen Assignee: kangkaisen Priority: Minor This check makes the cube DESCBROKEN if the user applied FixedLenDimEnc encoding to an integer column: {code:java} if (encodingName.startsWith(FixedLenDimEnc.ENCODING_NAME) && (type.isIntegerFamily() || type.isNumberFamily())) { throw new IllegalArgumentException(colRef + " type is " + type + " and cannot apply fixed_length encoding"); } {code}
[jira] [Created] (KYLIN-2628) Remove synchronized modifier for reloadCubeLocalAt
kangkaisen created KYLIN-2628: - Summary: Remove synchronized modifier for reloadCubeLocalAt Key: KYLIN-2628 URL: https://issues.apache.org/jira/browse/KYLIN-2628 Project: Kylin Issue Type: Improvement Components: Metadata Affects Versions: v2.0.0 Reporter: kangkaisen Assignee: kangkaisen Priority: Minor The synchronized modifier on CubeManager.reloadCubeLocalAt is unnecessary.
[jira] [Created] (KYLIN-2626) Fix InstantiationException in ZookeeperDistributedJobLock
kangkaisen created KYLIN-2626: - Summary: Fix InstantiationException in ZookeeperDistributedJobLock Key: KYLIN-2626 URL: https://issues.apache.org/jira/browse/KYLIN-2626 Project: Kylin Issue Type: Bug Components: Job Engine Affects Versions: v2.1.0 Reporter: kangkaisen Assignee: kangkaisen Priority: Critical Fix For: v2.1.0 KYLIN-2578 introduced this issue: {code:java} Caused by: java.lang.RuntimeException: java.lang.InstantiationException: org.apache.kylin.storage.hbase.util.ZookeeperDistributedLock at org.apache.kylin.common.util.ClassUtil.newInstance(ClassUtil.java:95) at org.apache.kylin.rest.service.JobService.afterPropertiesSet(JobService.java:110) at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.invokeInitMethods(AbstractAutowireCapableBeanFactory.java:1573) at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.initializeBean(AbstractAutowireCapableBeanFactory.java:1511) ... 38 more Caused by: java.lang.InstantiationException: org.apache.kylin.storage.hbase.util.ZookeeperDistributedLock at java.lang.Class.newInstance(Class.java:427) at org.apache.kylin.common.util.ClassUtil.newInstance(ClassUtil.java:93) ... 41 more Caused by: java.lang.NoSuchMethodException: org.apache.kylin.storage.hbase.util.ZookeeperDistributedLock.<init>() at java.lang.Class.getConstructor0(Class.java:3082) at java.lang.Class.newInstance(Class.java:412) ... 42 more {code} This prevents the Kylin job server from starting.
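The trace above ends in a NoSuchMethodException for a no-argument constructor, which is what reflective instantiation reports when a class declares only constructors that take arguments. A small sketch of that behavior follows; the classes are hypothetical stand-ins, and it uses getDeclaredConstructor().newInstance() rather than the Class.newInstance() call seen in the trace.

```java
final class NoArgConstructorSketch {
    // Declares only a one-argument constructor, so no no-arg constructor exists.
    static class WithArgOnly {
        WithArgOnly(String zkPath) { }
    }

    // Declares an explicit no-arg constructor.
    static class WithNoArg {
        WithNoArg() { }
    }

    // Returns true if the class can be created reflectively without arguments.
    static boolean canInstantiate(Class<?> clazz) {
        try {
            clazz.getDeclaredConstructor().newInstance();
            return true;
        } catch (ReflectiveOperationException e) {
            // Covers NoSuchMethodException, InstantiationException,
            // IllegalAccessException, and InvocationTargetException.
            return false;
        }
    }
}
```

A class loaded by name from configuration therefore needs a no-arg constructor, or the loader has to look up and call the constructor that does exist.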
[jira] [Created] (KYLIN-2622) AppendTrieDictionary support not global
kangkaisen created KYLIN-2622: - Summary: AppendTrieDictionary support not global Key: KYLIN-2622 URL: https://issues.apache.org/jira/browse/KYLIN-2622 Project: Kylin Issue Type: Improvement Components: Metadata Affects Versions: v2.0.0 Reporter: kangkaisen Assignee: kangkaisen Currently, AppendTrieDictionary only supports the global dict, which means the dict grows continuously. But for a cube that has no Partition Date Column and doesn't need aggregate queries across segments, we could build the AppendTrieDictionary from an empty dict every time.
[jira] [Created] (KYLIN-2619) Use newCachedThreadPool instead of newFixedThreadPool in Broadcaster
kangkaisen created KYLIN-2619: - Summary: Use newCachedThreadPool instead of newFixedThreadPool in Broadcaster Key: KYLIN-2619 URL: https://issues.apache.org/jira/browse/KYLIN-2619 Project: Kylin Issue Type: Improvement Components: Metadata Affects Versions: v2.0.0 Reporter: kangkaisen Assignee: kangkaisen Fix For: v2.1.0 We should use newCachedThreadPool instead of newFixedThreadPool in Broadcaster because newCachedThreadPool is more flexible.
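For reference, the sizing difference between the two java.util.concurrent.Executors factories can be inspected directly. The helper class below is hypothetical, and the cast to ThreadPoolExecutor relies on the OpenJDK implementation of both factories rather than on a documented return type.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadPoolExecutor;

final class BroadcasterPoolSketch {
    // Returns {corePoolSize, maximumPoolSize} and shuts the pool down afterwards.
    static int[] coreAndMax(ExecutorService pool) {
        // Assumption: both factories return a ThreadPoolExecutor (true in OpenJDK).
        ThreadPoolExecutor tpe = (ThreadPoolExecutor) pool;
        try {
            return new int[] {tpe.getCorePoolSize(), tpe.getMaximumPoolSize()};
        } finally {
            pool.shutdown();
        }
    }

    static int[] fixed(int threads) {
        return coreAndMax(Executors.newFixedThreadPool(threads));
    }

    static int[] cached() {
        return coreAndMax(Executors.newCachedThreadPool());
    }
}
```

A fixed pool of n threads keeps core and maximum size at n and queues extra tasks; a cached pool starts at zero threads, grows on demand, and reclaims idle threads, at the cost of having no upper bound on thread count.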
[jira] [Created] (KYLIN-2607) Add http timeout for RestClient
kangkaisen created KYLIN-2607: - Summary: Add http timeout for RestClient Key: KYLIN-2607 URL: https://issues.apache.org/jira/browse/KYLIN-2607 Project: Kylin Issue Type: Improvement Components: General Affects Versions: v2.0.0 Reporter: kangkaisen Assignee: kangkaisen Fix For: v2.1.0 We should add an HTTP timeout to RestClient in distributed environments.
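A minimal sketch of setting HTTP timeouts, using the JDK's HttpURLConnection as a stand-in. It does not show Kylin's RestClient or its HTTP library; the class name and the timeout values a caller passes are assumptions for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URI;

final class RestTimeoutSketch {
    // Prepares a connection with explicit timeouts. No network traffic happens
    // here; the connection is only opened when the caller connects or reads.
    static HttpURLConnection open(String url, int connectTimeoutMs, int readTimeoutMs) {
        try {
            HttpURLConnection conn =
                    (HttpURLConnection) URI.create(url).toURL().openConnection();
            conn.setConnectTimeout(connectTimeoutMs); // max wait to establish the TCP connection
            conn.setReadTimeout(readTimeoutMs);       // max wait for data once connected
            return conn;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Without explicit values, both timeouts default to 0 in HttpURLConnection, which means waiting indefinitely, so one unresponsive node can block the caller.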
[jira] [Created] (KYLIN-2606) Only return counter for precise count_distinct if query is exactAggregate
kangkaisen created KYLIN-2606: - Summary: Only return counter for precise count_distinct if query is exactAggregate Key: KYLIN-2606 URL: https://issues.apache.org/jira/browse/KYLIN-2606 Project: Kylin Issue Type: Improvement Components: Query Engine Affects Versions: v2.0.0 Reporter: kangkaisen Assignee: kangkaisen If the query is exactAggregation and has some memory-hungry measures, we could directly return the final result to speed up the query and reduce the RPC data size and memory usage in the QueryServer.
[jira] [Created] (KYLIN-2604) Use global dict as the default encoding for precise distinct count in web
kangkaisen created KYLIN-2604: - Summary: Use global dict as the default encoding for precise distinct count in web Key: KYLIN-2604 URL: https://issues.apache.org/jira/browse/KYLIN-2604 Project: Kylin Issue Type: Improvement Components: Web Affects Versions: v2.0.0 Reporter: kangkaisen Assignee: kangkaisen Priority: Minor We should use the global dict as the default encoding for precise distinct count in the web UI, which is easier for users.
[jira] [Created] (KYLIN-2602) Add optional job threshold arg for MetadataCleanupJob
kangkaisen created KYLIN-2602: - Summary: Add optional job threshold arg for MetadataCleanupJob Key: KYLIN-2602 URL: https://issues.apache.org/jira/browse/KYLIN-2602 Project: Kylin Issue Type: Improvement Components: Tools, Build and Test Affects Versions: v2.0.0 Reporter: kangkaisen Assignee: kangkaisen Priority: Minor Fix For: v2.1.0 When we have hundreds of cubes, we accumulate tens of thousands of job metadata entries within 30 days, which makes getting job metadata slow. So we should add an optional job threshold arg to MetadataCleanupJob so that users can reduce the job threshold.
[jira] [Created] (KYLIN-2601) The return type of tinyint for sum measure should be bigint
kangkaisen created KYLIN-2601: - Summary: The return type of tinyint for sum measure should be bigint Key: KYLIN-2601 URL: https://issues.apache.org/jira/browse/KYLIN-2601 Project: Kylin Issue Type: Bug Components: Web Affects Versions: v2.0.0 Reporter: kangkaisen Assignee: kangkaisen Priority: Critical Fix For: v2.1.0 The return type of tinyint for sum measure should be bigint, not decimal.
[jira] [Created] (KYLIN-2563) Fix bug in checkCubeAuthorization
kangkaisen created KYLIN-2563: - Summary: Fix bug in checkCubeAuthorization Key: KYLIN-2563 URL: https://issues.apache.org/jira/browse/KYLIN-2563 Project: Kylin Issue Type: Bug Components: Query Engine Affects Versions: v1.6.0 Reporter: kangkaisen Assignee: kangkaisen I found that the PreAuthorize annotation didn't work in QueryService.checkCubeAuthorization. It turned out that annotations have no effect on methods that are called from within the same class, whether private or public. The annotations only take effect on public methods called from outside the class.
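The behavior described above is typical of proxy-based interception: a call made through `this` never passes through the proxy that carries the check. The following is only a minimal sketch of that effect using a plain JDK dynamic proxy, not Spring Security itself; `GuardedService`, `GuardedServiceImpl` and `SelfInvocationDemo` are hypothetical names introduced for illustration.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a service whose methods are guarded by a proxy.
interface GuardedService {
    void query();
    void checkAuthorization();
}

class GuardedServiceImpl implements GuardedService {
    @Override
    public void query() {
        // Self-invocation: this call goes through "this", not through the proxy.
        checkAuthorization();
    }

    @Override
    public void checkAuthorization() {
        // In a proxy-based setup the real check lives in the interceptor.
    }
}

class SelfInvocationDemo {
    // Returns the names of the methods the interceptor actually saw.
    static List<String> interceptedCalls() {
        List<String> seen = new ArrayList<>();
        GuardedService target = new GuardedServiceImpl();
        InvocationHandler handler = (proxy, method, args) -> {
            seen.add(method.getName());          // where an annotation check would run
            return method.invoke(target, args);  // delegate to the real object
        };
        GuardedService proxied = (GuardedService) Proxy.newProxyInstance(
                GuardedService.class.getClassLoader(),
                new Class<?>[] {GuardedService.class},
                handler);
        proxied.query();
        return seen;
    }
}
```

Only `query` is recorded by the interceptor; the nested `checkAuthorization` call is invisible to it. Moving the guarded method to a separate bean, or calling it through the proxied reference, are the usual ways around this.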
[jira] [Created] (KYLIN-2547) Fix the bug of multi-process concurrence in mergeCubeSegment
kangkaisen created KYLIN-2547: - Summary: Fix the bug of multi-process concurrence in mergeCubeSegment Key: KYLIN-2547 URL: https://issues.apache.org/jira/browse/KYLIN-2547 Project: Kylin Issue Type: Bug Components: Metadata Affects Versions: v2.0.0 Reporter: kangkaisen Assignee: kangkaisen Priority: Minor Fix For: v2.0.0 There is a minor bug in the "Update Cube Info" step when building a cube in a distributed environment.
{code:java}
Caused by: java.lang.IllegalStateException: Segments overlap: waimai_dolphin_topic_flow_activity_expose_food_d_cube[2017040500_2017041200] and waimai_dolphin_topic_flow_activity_expose_food_d_cube[2017040500_2017041200]
    at org.apache.kylin.cube.CubeValidator.validate(CubeValidator.java:85)
    at org.apache.kylin.cube.CubeManager.updateCubeWithRetry(CubeManager.java:359)
    at org.apache.kylin.cube.CubeManager.updateCubeWithRetry(CubeManager.java:386)
    at org.apache.kylin.cube.CubeManager.updateCube(CubeManager.java:302)
    at org.apache.kylin.cube.CubeManager.mergeSegments(CubeManager.java:533)
    at org.apache.kylin.rest.service.CubeService.mergeCubeSegment(CubeService.java:635)
    at org.apache.kylin.rest.service.CubeService.updateOnNewSegmentReady(CubeService.java:587)
    at org.apache.kylin.rest.service.CubeService$$FastClassBySpringCGLIB$$17a07c0e.invoke()
    at org.springframework.cglib.proxy.MethodProxy.invoke(MethodProxy.java:204)
    at org.springframework.aop.framework.CglibAopProxy$DynamicAdvisedInterceptor.intercept(CglibAopProxy.java:629)
    at org.apache.kylin.rest.service.CubeService$$EnhancerBySpringCGLIB$$c6fabb3f.updateOnNewSegmentReady()
    at org.apache.kylin.rest.service.CacheService.rebuildCubeCache(CacheService.java:237)
    at org.apache.kylin.rest.service.CacheService.access$000(CacheService.java:62)
    at org.apache.kylin.rest.service.CacheService$1.afterCubeUpdate(CacheService.java:86)
    at org.apache.kylin.cube.CubeManager.updateCube(CubeManager.java:305)
    at org.apache.kylin.cube.CubeManager.promoteNewlyBuiltSegments(CubeManager.java:735)
    at org.apache.kylin.engine.mr.steps.UpdateCubeInfoAfterBuildStep.doWork(UpdateCubeInfoAfterBuildStep.java:62)
    at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:113)
    ... 6 more
{code}
[jira] [Created] (KYLIN-2506) Refactor Global Dictionary
kangkaisen created KYLIN-2506: - Summary: Refactor Global Dictionary Key: KYLIN-2506 URL: https://issues.apache.org/jira/browse/KYLIN-2506 Project: Kylin Issue Type: Improvement Components: General Affects Versions: v2.0.0 Reporter: kangkaisen Assignee: kangkaisen Fix For: v2.0.0 The main points of this refactor: 1. Fix the bug that the RemovalListener of LoadingCache swallowed any exceptions when building the GlobalDict. 2. Fix the bug that the HDFS filename of DictSliceKey had illegal characters. 3. Fix the bug that the HDFS filename of DictSliceKey may be longer than 255. 4. Fix the bug that DictNode split failed if the value length is greater than 255 bytes. 5. Decouple the build and query of GlobalDict: abstract the builder of AppendTrieDictionary into AppendTrieDictionaryBuilder; add LoadingCache to AppendTrieDictionary and make AppendTrieDictionary read-only. 6. Remove the dependence on LoadingCache when building the GlobalDict. 7. Abstract the HDFS operations into GlobalDictStore. 8. Abstract the metadata of GlobalDict into GlobalDictMetadata. 9. Delete CachedTreeMap. 10. Remove support for multithreaded concurrent build; I will add a distributed lock for GlobalDict later.
[jira] [Created] (KYLIN-2446) Support project names filter in DeployCoprocessorCLI
kangkaisen created KYLIN-2446: - Summary: Support project names filter in DeployCoprocessorCLI Key: KYLIN-2446 URL: https://issues.apache.org/jira/browse/KYLIN-2446 Project: Kylin Issue Type: Improvement Components: Storage - HBase Affects Versions: v1.6.0 Reporter: kangkaisen Assignee: kangkaisen Fix For: v2.0.0 We should support updating the coprocessor by project name so that users can update the coprocessor one project at a time.
[jira] [Created] (KYLIN-2433) NPE in MergeCuboidMapper
kangkaisen created KYLIN-2433: - Summary: NPE in MergeCuboidMapper Key: KYLIN-2433 URL: https://issues.apache.org/jira/browse/KYLIN-2433 Project: Kylin Issue Type: Bug Components: Job Engine Affects Versions: v1.6.0 Reporter: kangkaisen Assignee: kangkaisen If all records of a column are null in a segment, there will be an NPE in {{sourceCubeSegment.getDictionary}}.
[jira] [Created] (KYLIN-2430) Unnecessary exception catching in BulkLoadJob
kangkaisen created KYLIN-2430: - Summary: Unnecessary exception catching in BulkLoadJob Key: KYLIN-2430 URL: https://issues.apache.org/jira/browse/KYLIN-2430 Project: Kylin Issue Type: Bug Components: Storage - HBase Affects Versions: v1.6.0 Reporter: kangkaisen Assignee: kangkaisen FsShell.run already catches all exceptions, so we should check the exit code instead of catching an exception. The current code can potentially result in an infinite loop in {{LoadIncrementalHFiles}} if the user runs HBase 0.98.13 and doesn't set {{hbase.bulkload.retries.number}}.
[jira] [Created] (KYLIN-2389) Improve resource utilization for DistributedScheduler
kangkaisen created KYLIN-2389: - Summary: Improve resource utilization for DistributedScheduler Key: KYLIN-2389 URL: https://issues.apache.org/jira/browse/KYLIN-2389 Project: Kylin Issue Type: Improvement Components: Job Engine Affects Versions: v2.0.0 Reporter: kangkaisen Assignee: kangkaisen Currently, DistributedScheduler locks the segment in JobService, so a segment's job is scheduled only on the jobServer it was submitted to and cannot fully utilize the thread pool resources of all jobServers. For example, with two jobServers and the max concurrent jobs set to 10, if we continuously submit 20 jobs to jobServer1, only 10 jobs will run at the same time instead of 20, and no job will run on jobServer2. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (KYLIN-2388) Hot load kylin config from web
kangkaisen created KYLIN-2388: - Summary: Hot load kylin config from web Key: KYLIN-2388 URL: https://issues.apache.org/jira/browse/KYLIN-2388 Project: Kylin Issue Type: New Feature Components: Web Affects Versions: v1.6.0 Reporter: kangkaisen Assignee: kangkaisen Fix For: v2.0.0 Allow admin users to reload the Kylin config from the web UI, which could improve operational efficiency and service stability.
[jira] [Created] (KYLIN-2379) Add UseCMSInitiatingOccupancyOnly to KYLIN_JVM_SETTINGS
kangkaisen created KYLIN-2379: - Summary: Add UseCMSInitiatingOccupancyOnly to KYLIN_JVM_SETTINGS Key: KYLIN-2379 URL: https://issues.apache.org/jira/browse/KYLIN-2379 Project: Kylin Issue Type: Improvement Components: Tools, Build and Test Affects Versions: v1.6.0 Reporter: kangkaisen Assignee: kangkaisen Priority: Minor {{CMSInitiatingOccupancyFraction}} is only used for the 1st collection unless {{-XX:+UseCMSInitiatingOccupancyOnly}} is set. The reference linking: https://books.google.com.hk/books?id=aIhUAwAAQBAJ&pg=PA146&lpg=PA146&dq=UseCMSInitiatingOccupancyOnly&source=bl&ots=E51s7uZ1eH&sig=D9nGk_hJu0IQ7QFymCnoekDrWf4&hl=zh-CN&sa=X&ved=0ahUKEwiI2tnQl63RAhWLL48KHZ5tDzA4ChDoAQg5MAQ#v=onepage&q=UseCMSInitiatingOccupancyOnly&f=false https://blog.codecentric.de/en/2013/10/useful-jvm-flags-part-7-cms-collector/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (KYLIN-2378) Set job thread name with job uuid
kangkaisen created KYLIN-2378: - Summary: Set job thread name with job uuid Key: KYLIN-2378 URL: https://issues.apache.org/jira/browse/KYLIN-2378 Project: Kylin Issue Type: Improvement Components: Job Engine Affects Versions: v1.6.0 Reporter: kangkaisen Assignee: kangkaisen Priority: Minor Set job thread name with job uuid so that we can quickly diagnose the job.
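One way the renaming could look, as a rough sketch only: the `JobThreadNaming` helper and the `"Job "` prefix are assumptions for illustration, not Kylin's actual code. Restoring the old name matters when the thread comes from a pool and is reused for another job.

```java
class JobThreadNaming {
    // Runs the task with the current thread renamed to include the job uuid,
    // then restores the original name so a pooled thread is not mislabeled later.
    static void runWithJobName(String jobUuid, Runnable task) {
        Thread current = Thread.currentThread();
        String originalName = current.getName();
        current.setName("Job " + jobUuid); // hypothetical naming scheme
        try {
            task.run();
        } finally {
            current.setName(originalName);
        }
    }
}
```

With this in place, a thread dump or a log pattern that prints the thread name shows directly which job a stack or log line belongs to.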
[jira] [Created] (KYLIN-2377) Add kylin client query timeout
kangkaisen created KYLIN-2377: - Summary: Add kylin client query timeout Key: KYLIN-2377 URL: https://issues.apache.org/jira/browse/KYLIN-2377 Project: Kylin Issue Type: Improvement Components: Query Engine Affects Versions: v1.6.0 Reporter: kangkaisen Assignee: kangkaisen Add a Kylin client query timeout to make the query server more robust.
[jira] [Created] (KYLIN-2364) Output table name to error info in LookupTable
kangkaisen created KYLIN-2364: - Summary: Output table name to error info in LookupTable Key: KYLIN-2364 URL: https://issues.apache.org/jira/browse/KYLIN-2364 Project: Kylin Issue Type: Improvement Components: Metadata Affects Versions: v1.6.0 Reporter: kangkaisen Assignee: kangkaisen Priority: Minor We should output the table name so that the user knows which LookupTable is broken.
[jira] [Created] (KYLIN-2357) Make ERROR_RECORD_LOG_THRESHOLD configurable
kangkaisen created KYLIN-2357: - Summary: Make ERROR_RECORD_LOG_THRESHOLD configurable Key: KYLIN-2357 URL: https://issues.apache.org/jira/browse/KYLIN-2357 Project: Kylin Issue Type: Bug Components: Job Engine Affects Versions: v1.6.0 Reporter: kangkaisen Assignee: kangkaisen Priority: Minor Currently, {{BatchConstants.ERROR_RECORD_LOG_THRESHOLD}} is hardcoded to 100. I wonder why we accept error records at all. Normally, cubing should have zero error records. Besides, even a single error record makes the query results differ from Hive or Presto. So I think we could make ERROR_RECORD_LOG_THRESHOLD configurable with a default value of 0.
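A minimal sketch of what a configurable threshold with a default of 0 could look like. The property key `example.error-record-threshold` and the `ErrorRecordConfig` class are hypothetical; the issue does not name the actual key.

```java
import java.util.Properties;

class ErrorRecordConfig {
    // Hypothetical property key, chosen only for this illustration.
    static final String KEY = "example.error-record-threshold";

    // Falls back to 0 when the key is absent, so any error record is rejected by default.
    static int errorRecordThreshold(Properties props) {
        return Integer.parseInt(props.getProperty(KEY, "0").trim());
    }

    // The build should fail once the number of error records exceeds the threshold.
    static boolean shouldFail(long errorRecords, Properties props) {
        return errorRecords > errorRecordThreshold(props);
    }
}
```

Users who really want the old tolerant behavior could then set the key to 100 explicitly.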
[jira] [Created] (KYLIN-2353) Serialize BitmapCounter with distinct count
kangkaisen created KYLIN-2353: - Summary: Serialize BitmapCounter with distinct count Key: KYLIN-2353 URL: https://issues.apache.org/jira/browse/KYLIN-2353 Project: Kylin Issue Type: Improvement Components: Metadata Affects Versions: v1.6.0 Reporter: kangkaisen Assignee: kangkaisen Currently, we deserialize the bitmap whether we need to aggregate or not. Actually, we could serialize {{BitmapCounter}} together with its distinct count, read only the count when deserializing, and delay deserializing the bitmap until we need to aggregate it.
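The idea can be sketched as follows, under stated assumptions: `java.util.BitSet` stands in for the real bitmap, and the byte layout (count, payload length, payload) and the `LazyBitmapCounter` class are made up for this illustration rather than taken from Kylin.

```java
import java.nio.ByteBuffer;
import java.util.BitSet;

class LazyBitmapCounter {
    private final int count;       // distinct count read from the header
    private final byte[] payload;  // bitmap bytes, still serialized
    private BitSet bitmap;         // decoded only when aggregation needs it

    private LazyBitmapCounter(int count, byte[] payload) {
        this.count = count;
        this.payload = payload;
    }

    // Assumed layout: [int count][int payloadLength][payload bytes].
    static void serialize(BitSet bits, ByteBuffer out) {
        byte[] body = bits.toByteArray();
        out.putInt(bits.cardinality());
        out.putInt(body.length);
        out.put(body);
    }

    // Copies the raw bytes but does not decode them into a bitmap.
    static LazyBitmapCounter deserialize(ByteBuffer in) {
        int count = in.getInt();
        byte[] body = new byte[in.getInt()];
        in.get(body);
        return new LazyBitmapCounter(count, body);
    }

    int getCount() {
        return count; // no bitmap decoding needed
    }

    boolean isDecoded() {
        return bitmap != null;
    }

    BitSet bitmap() {
        if (bitmap == null) {
            bitmap = BitSet.valueOf(payload); // decode on first use
        }
        return bitmap;
    }
}
```

A query that only needs the final count calls `getCount()` and never pays for decoding; only a merge of two counters has to call `bitmap()`.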
[jira] [Created] (KYLIN-2349) Serialize BitmapCounter with peekLength
kangkaisen created KYLIN-2349: - Summary: Serialize BitmapCounter with peekLength Key: KYLIN-2349 URL: https://issues.apache.org/jira/browse/KYLIN-2349 Project: Kylin Issue Type: Improvement Components: Metadata Affects Versions: v1.6.0 Reporter: kangkaisen Assignee: kangkaisen Currently, {{BitmapCounter}} deserializes the bitmap to get the peekLength, which the JMC hot code view shows to be expensive in terms of CPU time. Actually, we could serialize {{BitmapCounter}} with the peekLength to avoid deserializing the bitmap when we get the peekLength. Of course, we need to keep forward compatibility.
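A rough sketch of a length-prefixed record, assuming a 4-byte length header in front of the serialized bitmap; the `LengthPrefixedBitmap` class and its layout are illustrative assumptions, and the compatibility handling for data written in the old format is deliberately left out.

```java
import java.nio.ByteBuffer;

class LengthPrefixedBitmap {
    static final int HEADER_BYTES = Integer.BYTES;

    // Assumed layout: [int bodyLength][serialized bitmap bytes].
    static void write(byte[] serializedBitmap, ByteBuffer out) {
        out.putInt(serializedBitmap.length);
        out.put(serializedBitmap);
    }

    // Total record length, read with an absolute get: the buffer position
    // does not move and the bitmap itself is never decoded.
    static int peekLength(ByteBuffer in) {
        return HEADER_BYTES + in.getInt(in.position());
    }

    // Jumps over one record using only the header.
    static void skip(ByteBuffer in) {
        in.position(in.position() + peekLength(in));
    }
}
```

Reading the header costs a single integer read per record, compared with decoding the whole bitmap just to learn how many bytes it occupies.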
[jira] [Created] (KYLIN-2338) refactor BitmapCounter.DataInputByteBuffer
kangkaisen created KYLIN-2338: - Summary: refactor BitmapCounter.DataInputByteBuffer Key: KYLIN-2338 URL: https://issues.apache.org/jira/browse/KYLIN-2338 Project: Kylin Issue Type: Improvement Components: Metadata Reporter: kangkaisen Assignee: kangkaisen Priority: Minor Make BitmapCounter.DataInputByteBuffer simpler and more readable.
[jira] [Created] (KYLIN-2337) Remove expensive toString in SortedIteratorMergerWithLimit
kangkaisen created KYLIN-2337: - Summary: Remove expensive toString in SortedIteratorMergerWithLimit Key: KYLIN-2337 URL: https://issues.apache.org/jira/browse/KYLIN-2337 Project: Kylin Issue Type: Bug Components: Query Engine Affects Versions: v1.6.0 Reporter: kangkaisen Assignee: kangkaisen The toString call in {{SortedIteratorMergerWithLimit.MergedIteratorWithLimit.next}} is expensive and unnecessary.
[jira] [Created] (KYLIN-2308) Allow user to set more columnFamily in web
kangkaisen created KYLIN-2308: - Summary: Allow user to set more columnFamily in web Key: KYLIN-2308 URL: https://issues.apache.org/jira/browse/KYLIN-2308 Project: Kylin Issue Type: Improvement Components: Web Affects Versions: v1.6.1 Reporter: kangkaisen Assignee: kangkaisen Currently, when a user sets dozens of precise count distinct metrics in one cube, we put all the count distinct metric columns in one columnFamily. This makes the HBase scan slow because a single {{KeyValue}} is too big. We could allow more columnFamilies to speed up the HBase scan in this scenario.
[jira] [Created] (KYLIN-2304) Only copy latest version dict for global dict
kangkaisen created KYLIN-2304: - Summary: Only copy latest version dict for global dict Key: KYLIN-2304 URL: https://issues.apache.org/jira/browse/KYLIN-2304 Project: Kylin Issue Type: Improvement Affects Versions: v1.6.1 Reporter: kangkaisen Assignee: kangkaisen Priority: Minor After KYLIN-2192, building the global dict uses multiple versions. When we migrate the cube, we only need to copy the latest version of the dict; otherwise it takes a long time to copy all the versions.
[jira] [Created] (KYLIN-2287) Speed up model and cube list load in Web
kangkaisen created KYLIN-2287: - Summary: Speed up model and cube list load in Web Key: KYLIN-2287 URL: https://issues.apache.org/jira/browse/KYLIN-2287 Project: Kylin Issue Type: Improvement Components: Web Affects Versions: v1.6.0 Reporter: kangkaisen Assignee: kangkaisen Priority: Critical Currently, if a project has more than one hundred cubes and models, loading the "Model" page takes a long time because it issues a lot of HTTP requests. So we need to reduce and defer the HTTP requests made when the "Model" page is initially loaded.
[jira] [Created] (KYLIN-2270) Reduce MR memory usage for global dict
kangkaisen created KYLIN-2270: - Summary: Reduce MR memory usage for global dict Key: KYLIN-2270 URL: https://issues.apache.org/jira/browse/KYLIN-2270 Project: Kylin Issue Type: Improvement Affects Versions: v1.6.0 Reporter: kangkaisen Assignee: kangkaisen Currently, in {{Build Base Cuboid Data}}, if the user uses the global dict and the global dict is significantly larger than the mapper memory, the {{CachedTreeMap}} will load as many values as possible and the softly referenced objects will stick around for a while during GC. This makes the {{Build Base Cuboid Data}} mapper pause for a long time, or even fail to finish.
[jira] [Created] (KYLIN-2269) Reduce MR memory usage for global dict
kangkaisen created KYLIN-2269: - Summary: Reduce MR memory usage for global dict Key: KYLIN-2269 URL: https://issues.apache.org/jira/browse/KYLIN-2269 Project: Kylin Issue Type: Improvement Affects Versions: v1.6.0 Reporter: kangkaisen Assignee: kangkaisen Currently, in {{Build Base Cuboid Data}}, if the user uses the global dict and the global dict is significantly larger than the mapper memory, the {{CachedTreeMap}} will load as many values as possible and the softly referenced objects will stick around for a while during GC. This makes the {{Build Base Cuboid Data}} mapper pause for a long time, or even fail to finish.
[jira] [Created] (KYLIN-2266) Reduce memory usage for building global dict
kangkaisen created KYLIN-2266: - Summary: Reduce memory usage for building global dict Key: KYLIN-2266 URL: https://issues.apache.org/jira/browse/KYLIN-2266 Project: Kylin Issue Type: Improvement Affects Versions: v1.6.0 Reporter: kangkaisen Assignee: kangkaisen Because the input for building the global dict is sequential, we could set the max cache size to 1 to reduce memory usage. Although users could also set `kylin.dict.append.cache.size` to 1 themselves, most users don't know about this config.
[jira] [Created] (KYLIN-2242) Directly write hdfs file in reducer is dangerous
kangkaisen created KYLIN-2242: - Summary: Directly write hdfs file in reducer is dangerous Key: KYLIN-2242 URL: https://issues.apache.org/jira/browse/KYLIN-2242 Project: Kylin Issue Type: Bug Components: Job Engine Affects Versions: v1.6.0 Reporter: kangkaisen Assignee: Dong Li Currently, Kylin writes HDFS files directly in {{FactDistinctColumnsReducer}}, which is dangerous because MapReduce speculative execution can cause more than one reducer to write the same HDFS file at the same time. After KYLIN-2217, I think this issue will occur with higher probability. We should output the values through {{context.write}} in the reducer.
[jira] [Created] (KYLIN-2239) Remove refreshSegment in JobService
kangkaisen created KYLIN-2239: - Summary: Remove refreshSegment in JobService Key: KYLIN-2239 URL: https://issues.apache.org/jira/browse/KYLIN-2239 Project: Kylin Issue Type: Improvement Reporter: kangkaisen Assignee: kangkaisen Priority: Minor Currently, we have three build types: build, refresh, and merge. But the build and refresh types are really one job type, and the build type could replace the refresh type completely. So I think the refresh type is redundant. We can first remove refreshSegment inside JobService and keep the web API unchanged.
[jira] [Created] (KYLIN-2238) Add query server scan threshold
kangkaisen created KYLIN-2238: - Summary: Add query server scan threshold Key: KYLIN-2238 URL: https://issues.apache.org/jira/browse/KYLIN-2238 Project: Kylin Issue Type: Improvement Components: Query Engine Affects Versions: v1.5.4.1 Reporter: kangkaisen Assignee: kangkaisen Currently, we have a scan threshold in the HBase RegionServer; we should also add a scan threshold in the Kylin query server.
[jira] [Created] (KYLIN-2237) Ensure dimensions and measures of model don't have null column
kangkaisen created KYLIN-2237: - Summary: Ensure dimensions and measures of model don't have null column Key: KYLIN-2237 URL: https://issues.apache.org/jira/browse/KYLIN-2237 Project: Kylin Issue Type: Bug Components: Metadata Affects Versions: v1.5.4.1 Reporter: kangkaisen Assignee: kangkaisen Currently, the dimensions or measures of a model may contain a null column, like this: {{u'dimensions': [{u'table': u'TEST.KYLIN_CAL_DT_KKS', u'columns': [u'CAL_DT', u'YEAR_BEG_DT', u'QTR_BEG_DT', None, u'DAY_OF_CAL_ID_KKS']}],}} This can be produced by the following steps: 1. Rename a hive column used in the model dimensions or measures. 2. Reload the hive table. 3. Carelessly update the model without removing the null column. 4. Edit the model again; the dimensions or measures can no longer be selected.
[jira] [Created] (KYLIN-2180) Add project config and make config priority become "cube > project > server"
kangkaisen created KYLIN-2180: - Summary: Add project config and make config priority become "cube > project > server" Key: KYLIN-2180 URL: https://issues.apache.org/jira/browse/KYLIN-2180 Project: Kylin Issue Type: New Feature Components: Metadata Affects Versions: v1.5.4.1 Reporter: kangkaisen Assignee: kangkaisen There are cases where we want to override the global kylin.properties in the scope of a project, e.g. the queue name of a Hadoop job. In the end, the config priority for Kylin should be "cube > project > server", which I think is reasonable.
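The "cube > project > server" priority amounts to a layered lookup. Here is a minimal sketch, assuming each level is just a key/value map; the `LayeredConfig` class and the keys used with it are hypothetical and not Kylin's actual config classes.

```java
import java.util.Map;

class LayeredConfig {
    private final Map<String, String> server;
    private final Map<String, String> project;
    private final Map<String, String> cube;

    LayeredConfig(Map<String, String> server,
                  Map<String, String> project,
                  Map<String, String> cube) {
        this.server = server;
        this.project = project;
        this.cube = cube;
    }

    // The most specific level that defines the key wins: cube, then project, then server.
    String get(String key) {
        if (cube.containsKey(key)) {
            return cube.get(key);
        }
        if (project.containsKey(key)) {
            return project.get(key);
        }
        return server.get(key); // null when no level defines the key
    }
}
```

A project can then override, say, the Hadoop queue name for all of its cubes, while a single cube can still override it again.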
[jira] [Created] (KYLIN-2153) Allow user to skip the check in CubeMetaIngester
kangkaisen created KYLIN-2153: - Summary: Allow user to skip the check in CubeMetaIngester Key: KYLIN-2153 URL: https://issues.apache.org/jira/browse/KYLIN-2153 Project: Kylin Issue Type: Bug Components: Tools, Build and Test Affects Versions: v1.5.4.1 Reporter: kangkaisen Assignee: kangkaisen Priority: Minor When the model has multiple cubes, or the user really does want to overwrite the model or cube, we should allow the user to skip the check in {{CubeMetaIngester}}.
[jira] [Created] (KYLIN-2135) Enlarge FactDistinctColumns reducer number
kangkaisen created KYLIN-2135: - Summary: Enlarge FactDistinctColumns reducer number Key: KYLIN-2135 URL: https://issues.apache.org/jira/browse/KYLIN-2135 Project: Kylin Issue Type: Improvement Components: Job Engine Affects Versions: v1.5.4.1 Reporter: kangkaisen Assignee: kangkaisen When the hive table has billions of rows and uses the global dictionary for precise count distinct measures, the {{Extract Fact Table Distinct Columns}} job runs for a long time. So we could use more reducers to handle a single column.
[jira] [Created] (KYLIN-2130) QueryMetrics concurrent bug fix
kangkaisen created KYLIN-2130: - Summary: QueryMetrics concurrent bug fix Key: KYLIN-2130 URL: https://issues.apache.org/jira/browse/KYLIN-2130 Project: Kylin Issue Type: Bug Components: Query Engine Affects Versions: v1.5.4.1, v1.5.4 Reporter: kangkaisen Assignee: kangkaisen Priority: Minor Recently, I ran a concurrent Kylin query test and found a small bug in QueryMetrics: if the initial queries to a cube or a project are concurrent, registering the QueryMetrics fails with a MetricsException. The exception looks like this: "exception":"Metrics source kylin_test,sub=xxx already exists!"
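The race is a classic check-then-register problem. One possible way to make the first registration atomic is sketched below with `ConcurrentHashMap.computeIfAbsent`; this is an illustration under assumptions, not necessarily the fix applied in Kylin, and `MetricsRegistry` is a hypothetical class with a plain `Object` standing in for a real metrics source.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;

class MetricsRegistry {
    private final ConcurrentMap<String, Object> sources = new ConcurrentHashMap<>();
    private final AtomicInteger registrations = new AtomicInteger();

    // computeIfAbsent applies the factory at most once per key, even when
    // several threads ask for the same name at the same time.
    Object getOrRegister(String name) {
        return sources.computeIfAbsent(name, n -> {
            registrations.incrementAndGet(); // where the real registration would happen
            return new Object();             // stand-in for a metrics source
        });
    }

    int registrationCount() {
        return registrations.get();
    }
}
```

All concurrent first queries for the same cube or project then share one registered source instead of each trying to register it.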
[jira] [Created] (KYLIN-2127) UI bug fix for Extend Column
kangkaisen created KYLIN-2127: - Summary: UI bug fix for Extend Column Key: KYLIN-2127 URL: https://issues.apache.org/jira/browse/KYLIN-2127 Project: Kylin Issue Type: Bug Components: Web Affects Versions: v1.5.4.1 Reporter: kangkaisen Assignee: kangkaisen In Kylin 1.5.4.1, if we first add a new SUM (MAX, MIN...) measure and then add an Extend Column measure, saving the cube fails, because the JSON data of the Extend Column measure looks like this: {{{ "name": "周起始日", "function": { "expression": "EXTENDED_COLUMN", "returntype": "extendedcolumn(100)", "parameter": { "type": "column", "value": "WK", "next_parameter": { "type": "column", "value": "WK_FROM", "next_parameter": {} } }, "configuration": null } },}}. The last {{next_parameter}} is {}, but it should be null. This bug may have been introduced by KYLIN-1767.
[jira] [Created] (KYLIN-2114) WEB-Global-Dictionary bug fix and improve
kangkaisen created KYLIN-2114: - Summary: WEB-Global-Dictionary bug fix and improve Key: KYLIN-2114 URL: https://issues.apache.org/jira/browse/KYLIN-2114 Project: Kylin Issue Type: Bug Components: Web Affects Versions: v1.5.4.1 Reporter: kangkaisen Assignee: kangkaisen In Kylin 1.5.4.1, the web UI for the global dictionary cannot select a column from the measure columns and requires the user to enter the dictionary builder class manually.
[jira] [Created] (KYLIN-2109) Deploy coprocessor only this server own the table
kangkaisen created KYLIN-2109: - Summary: Deploy coprocessor only this server own the table Key: KYLIN-2109 URL: https://issues.apache.org/jira/browse/KYLIN-2109 Project: Kylin Issue Type: Bug Components: Tools, Build and Test Affects Versions: v1.5.4.1 Reporter: kangkaisen Assignee: kangkaisen Priority: Critical When a table has been migrated from the test env to the prod env and we update the coprocessor in the test env, we should not update the coprocessor of the migrated table; otherwise the queries to the prod env will fail.
[jira] [Created] (KYLIN-2093) Clear cache in CubeMetaIngester
kangkaisen created KYLIN-2093: - Summary: Clear cache in CubeMetaIngester Key: KYLIN-2093 URL: https://issues.apache.org/jira/browse/KYLIN-2093 Project: Kylin Issue Type: Bug Components: Tools, Build and Test Affects Versions: v1.5.4.1 Reporter: kangkaisen Assignee: kangkaisen when the target project didn't have the hive table and copied the metadata, the {{MetadataManager}} could not get the hive table from the {{srcTableMap}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (KYLIN-2089) Make update HBase coprocessor concurrent
kangkaisen created KYLIN-2089: - Summary: Make update HBase coprocessor concurrent Key: KYLIN-2089 URL: https://issues.apache.org/jira/browse/KYLIN-2089 Project: Kylin Issue Type: Improvement Components: Tools, Build and Test Affects Versions: v1.5.4.1 Reporter: kangkaisen Assignee: kangkaisen When we have thousands of HBase tables, updating the coprocessor takes several hours, which means we must stop the query service for hours. So we should make updating the HBase coprocessor concurrent.
[jira] [Created] (KYLIN-2006) Make job build server distributed
kangkaisen created KYLIN-2006: - Summary: Make job build server distributed Key: KYLIN-2006 URL: https://issues.apache.org/jira/browse/KYLIN-2006 Project: Kylin Issue Type: New Feature Components: Job Engine Reporter: kangkaisen Assignee: kangkaisen Currently, the Kylin job build server is a single point of failure. To make the Kylin job build server more extensible, available, and reliable, we should support a distributed job build server.
[jira] [Created] (KYLIN-1992) Clear ThreadLocal Contexts when query failed before scaning HBase
kangkaisen created KYLIN-1992: - Summary: Clear ThreadLocal Contexts when query failed before scaning HBase Key: KYLIN-1992 URL: https://issues.apache.org/jira/browse/KYLIN-1992 Project: Kylin Issue Type: Bug Reporter: kangkaisen Assignee: kangkaisen Priority: Minor Currently, we call `OLAPContext.clearThreadLocalContexts()` just before scanning HBase. If a query fails before scanning HBase, a later query may get the wrong `realization`, because the Tomcat thread pool reuses the thread and does not clear the ThreadLocal variable.
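A common way to guarantee the cleanup on every path is a `finally` block. The sketch below illustrates this with a hypothetical `QueryContextHolder`; it does not reproduce `OLAPContext`, and the early failure is simulated with a flag.

```java
class QueryContextHolder {
    private static final ThreadLocal<String> REALIZATION = new ThreadLocal<>();

    static String current() {
        return REALIZATION.get();
    }

    // Sets the per-thread context, runs the "query", and always clears the
    // context afterwards, including when the query fails before the scan.
    static String runQuery(String realization, boolean failBeforeScan) {
        REALIZATION.set(realization);
        try {
            if (failBeforeScan) {
                throw new IllegalStateException("query failed before scanning");
            }
            return "result from " + REALIZATION.get();
        } finally {
            REALIZATION.remove(); // a pooled thread must not carry stale state
        }
    }
}
```

Because the cleanup no longer depends on reaching the scan step, a thread returned to the pool after a failed query starts its next query with an empty context.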
[jira] [Created] (KYLIN-1986) CubeMigrationCLI: make global dictionary unique
kangkaisen created KYLIN-1986: - Summary: CubeMigrationCLI: make global dictionary unique Key: KYLIN-1986 URL: https://issues.apache.org/jira/browse/KYLIN-1986 Project: Kylin Issue Type: Bug Components: Tools, Build and Test Affects Versions: v1.5.3 Reporter: kangkaisen Assignee: kangkaisen The global dictionary is shared by all segments of one cube, so when we migrate the global dictionary, we should copy the global dictionary file only once.
[jira] [Created] (KYLIN-1982) CubeMigrationCLI: associate model_name with project
kangkaisen created KYLIN-1982: - Summary: CubeMigrationCLI: associate model_name with project Key: KYLIN-1982 URL: https://issues.apache.org/jira/browse/KYLIN-1982 Project: Kylin Issue Type: Bug Components: Tools, Build and Test Affects Versions: v1.5.3 Reporter: kangkaisen Assignee: kangkaisen In the current `CubeMigrationCLI`, when we migrate a cube, the model metadata is migrated, but the model is not associated with the project. So if we fetch the model via `getModels` in `ModelController` with "modelName" and "projectName", we get null.
[jira] [Created] (KYLIN-1965) Check duplicated measure name
kangkaisen created KYLIN-1965: - Summary: Check duplicated measure name Key: KYLIN-1965 URL: https://issues.apache.org/jira/browse/KYLIN-1965 Project: Kylin Issue Type: Improvement Components: Metadata Affects Versions: v1.5.2, v1.5.3 Reporter: kangkaisen Assignee: kangkaisen A duplicated measure name causes queries to fail, so we should check for duplicated measure names.
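Such a check can be as small as the sketch below. `MeasureNameCheck` is a hypothetical helper, and it assumes names are compared exactly (case-sensitive), which the issue does not specify.

```java
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class MeasureNameCheck {
    // Returns every name that appears more than once, in order of first repetition.
    static Set<String> findDuplicates(List<String> measureNames) {
        Set<String> seen = new HashSet<>();
        Set<String> duplicates = new LinkedHashSet<>();
        for (String name : measureNames) {
            if (!seen.add(name)) { // add() returns false when the name was already present
                duplicates.add(name);
            }
        }
        return duplicates;
    }
}
```

Rejecting the cube description when the returned set is non-empty lets the error message list the offending names at save time instead of failing later at query time.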
[jira] [Created] (KYLIN-1908) Collect Metrics to JMX
kangkaisen created KYLIN-1908: - Summary: Collect Metrics to JMX Key: KYLIN-1908 URL: https://issues.apache.org/jira/browse/KYLIN-1908 Project: Kylin Issue Type: New Feature Components: Tools, Build and Test Affects Versions: v1.5.2 Reporter: kangkaisen Assignee: kangkaisen As we all know, performance metrics are important for enterprise applications, so we should support collecting metrics to JMX in Kylin. My approach is as follows: 1. Use `org.apache.hadoop.metrics2` as the metrics collection framework. 2. Define an MBean class for the metrics that we need to collect. 3. Update the metrics in the right places. The questions I have: 1. Can I depend on `org.apache.hadoop.metrics2` directly? 2. What do you think about my approach?
[jira] [Created] (KYLIN-1896) JDBC support mybatis
kangkaisen created KYLIN-1896: - Summary: JDBC support mybatis Key: KYLIN-1896 URL: https://issues.apache.org/jira/browse/KYLIN-1896 Project: Kylin Issue Type: Bug Components: Driver - JDBC Affects Versions: v1.5.2 Reporter: kangkaisen Assignee: kangkaisen One of our users found that MyBatis needs `columnClassType` in `ColumnMetaData`. But in the current version of Kylin, when the `ColumnMetaData` is constructed, the last parameter `columnClassType` is null.
[jira] [Created] (KYLIN-1893) Upgrade spring-boot framework because of security vulnerabilities
kangkaisen created KYLIN-1893: - Summary: Upgrade spring-boot framework because of security vulnerabilities Key: KYLIN-1893 URL: https://issues.apache.org/jira/browse/KYLIN-1893 Project: Kylin Issue Type: Bug Components: REST Service Affects Versions: v1.5.2 Reporter: kangkaisen Assignee: Zhong,Jason Priority: Critical The Spring Boot framework has a SpEL expression injection vulnerability that affects versions 1.1 through 1.3.0. We need to upgrade to version 1.3.1 or later. https://www.chinacybersafety.com/tag/the-common-vulnerabilities-and-high-risk-vulnerabilities-early-warning-framework-spring-boot
[jira] [Created] (KYLIN-1884) Reload metadata automatically after migrating cube
kangkaisen created KYLIN-1884: - Summary: Reload metadata automatically after migrating cube Key: KYLIN-1884 URL: https://issues.apache.org/jira/browse/KYLIN-1884 Project: Kylin Issue Type: Improvement Components: Tools, Build and Test Affects Versions: v1.5.2 Reporter: kangkaisen Assignee: kangkaisen In the current version of Kylin, we need to reload metadata manually after migrating a cube. Our production environment has many REST servers, so we would like metadata to be reloaded automatically on all of them after a cube is migrated. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
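A rough sketch of the fan-out this would need: after migration, the tool walks a configured list of REST servers and sends each one a reload request. Everything here is an assumption for illustration: the class `MetadataReloadSketch`, the `host:port` list format, the use of HTTP PUT, and the reload path, which the caller must supply. Authentication is omitted.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

class MetadataReloadSketch {
    /**
     * Builds one reload URI per REST server.
     * restServers holds "host:port" entries; blank entries are skipped.
     * reloadPath is whatever endpoint the target servers expose for reloading metadata.
     */
    public static List<URI> reloadUris(List<String> restServers, String reloadPath) {
        List<URI> uris = new ArrayList<>();
        for (String server : restServers) {
            String hostPort = server.trim();
            if (hostPort.isEmpty()) {
                continue;
            }
            uris.add(URI.create("http://" + hostPort + reloadPath));
        }
        return uris;
    }

    /** Sends a PUT with no body to one server and returns the HTTP status code. */
    public static int sendReload(URI uri) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) uri.toURL().openConnection();
        try {
            conn.setRequestMethod("PUT");
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }
}
```

A migration tool could call `reloadUris` once and then `sendReload` for each result, logging any server that does not answer with a success status so that it can still be reloaded by hand.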
[jira] [Created] (KYLIN-1695) disable cardinality calculation job when loading hive table
kangkaisen created KYLIN-1695: - Summary: disable cardinality calculation job when loading hive table Key: KYLIN-1695 URL: https://issues.apache.org/jira/browse/KYLIN-1695 Project: Kylin Issue Type: Bug Components: Job Engine Affects Versions: v1.5.1 Reporter: kangkaisen Assignee: Dong Li When a user loads or reloads Hive tables from the web console, Kylin asynchronously submits an MR job to calculate column cardinalities. This has four major problems: # the calculated cardinality is stored in table metadata, but never used in cubing/querying # the table may change after loading, so the cardinality doesn't necessarily reflect the actual value # the current `HiveColumnCardinalityJob` has many limitations, e.g., it doesn't support views # the `HiveColumnCardinalityJob` may use lots of resources when computing the cardinality of a partitioned table Due to these problems, we should disable it by default and (maybe) remove it in future releases. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (KYLIN-1694) make multiply coefficient configurable when estimating cuboid size
kangkaisen created KYLIN-1694: - Summary: make multiply coefficient configurable when estimating cuboid size Key: KYLIN-1694 URL: https://issues.apache.org/jira/browse/KYLIN-1694 Project: Kylin Issue Type: Bug Components: Job Engine Affects Versions: v1.5.1, v1.5.0 Reporter: kangkaisen Assignee: Dong Li In the current version of the MRv2 build engine, CubeStatsReader estimates cuboid size with two hard-coded rules: "cube is memory hungry, storage size estimation multiply 0.05" and "cube is not memory hungry, storage size estimation multiply 0.25". This has one major problem: the default coefficient is too small, so the estimated cuboid size is much smaller than the actual cuboid size. As a result, both the number of HBase regions and the number of CubeHFileJob reducers are too small, which makes CubeHFileJob much slower. After we removed the default coefficient, CubeHFileJob became much faster. We should make the coefficient configurable, which would be friendlier to users. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
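A minimal sketch of what a configurable coefficient could look like, assuming the values are read from a `java.util.Properties` object. The class `CuboidSizeEstimateSketch` and both property keys are hypothetical; only the defaults 0.05 and 0.25 come from the description above.

```java
import java.util.Properties;

class CuboidSizeEstimateSketch {
    // Hypothetical property keys, not actual Kylin configuration names.
    static final String KEY_MEM_HUNGRY = "sketch.cuboid.size.ratio.memhungry";
    static final String KEY_NORMAL = "sketch.cuboid.size.ratio";

    /** Returns the configured coefficient, falling back to the current hard-coded values. */
    public static double ratio(Properties config, boolean memoryHungry) {
        String key = memoryHungry ? KEY_MEM_HUNGRY : KEY_NORMAL;
        String fallback = memoryHungry ? "0.05" : "0.25";
        return Double.parseDouble(config.getProperty(key, fallback));
    }

    /** Scales a raw size estimate by the coefficient for this kind of cube. */
    public static double estimateBytes(double rawBytes, Properties config, boolean memoryHungry) {
        return rawBytes * ratio(config, memoryHungry);
    }
}
```

With an empty configuration this reproduces the existing behaviour. Setting the non-memory-hungry key to `1.0` would correspond to removing the coefficient for those cubes, which is the case reported above as making CubeHFileJob much faster.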