[jira] [Assigned] (KYLIN-4372) Docker entrypoint delete file too later cause ZK started by HBase crash
[ https://issues.apache.org/jira/browse/KYLIN-4372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

weibin0516 reassigned KYLIN-4372:
    Assignee: weibin0516

> Docker entrypoint delete file too later cause ZK started by HBase crash
> -----------------------------------------------------------------------
>
>                 Key: KYLIN-4372
>                 URL: https://issues.apache.org/jira/browse/KYLIN-4372
>             Project: Kylin
>          Issue Type: Bug
>          Components: Others
>    Affects Versions: v3.0.0-alpha2
>            Reporter: Yue Zhang
>            Assignee: weibin0516
>            Priority: Critical
>
> In docker/entrypoint.sh:
> {code:java}
> # start hbase
> $HBASE_HOME/bin/start-hbase.sh
> # start kafka
> rm -rf /tmp/kafka-logs
> rm -rf /data/zookeeper/*
> nohup $KAFKA_HOME/bin/kafka-server-start.sh $KAFKA_HOME/config/server.properties &
> {code}
> The line {{rm -rf /data/zookeeper/*}} should run before starting HBase, not before starting Kafka. Executing it after HBase has started causes the ZooKeeper instance started by HBase to crash.
> Crash log from /home/admin/hbase-1.1.2/logs/hbase--master-9aef5f427eb6.log:
> {code:java}
> 2020-02-10 09:25:56,402 INFO [SyncThread:0] persistence.FileTxnLog: Creating new log file: log.1
> 2020-02-10 09:25:56,402 ERROR [SyncThread:0] server.SyncRequestProcessor: Severe unrecoverable error, exiting
> java.io.FileNotFoundException: /data/zookeeper/zookeeper_0/version-2/log.1 (No such file or directory)
>     at java.io.FileOutputStream.open0(Native Method)
>     at java.io.FileOutputStream.open(FileOutputStream.java:270)
>     at java.io.FileOutputStream.<init>(FileOutputStream.java:213)
>     at java.io.FileOutputStream.<init>(FileOutputStream.java:162)
>     at org.apache.zookeeper.server.persistence.FileTxnLog.append(FileTxnLog.java:205)
>     at org.apache.zookeeper.server.persistence.FileTxnSnapLog.append(FileTxnSnapLog.java:314)
>     at org.apache.zookeeper.server.ZKDatabase.append(ZKDatabase.java:476)
>     at org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:140)
> {code}
> I think the script should instead be:
> {code:java}
> # start hbase
> rm -rf /data/zookeeper/*
> $HBASE_HOME/bin/start-hbase.sh
> # start kafka
> rm -rf /tmp/kafka-logs
> nohup $KAFKA_HOME/bin/kafka-server-start.sh $KAFKA_HOME/config/server.properties &
> {code}

-- This message was sent by Atlassian Jira (v8.3.4#803005)
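The ordering bug can be reproduced without HBase or ZooKeeper at all. The sketch below (hypothetical temp paths, not the actual entrypoint) shows why a running service's next write fails once a late cleanup removes its data directory out from under it, which is the shell-level analogue of the FileNotFoundException in the HBase log:

```shell
# Simulate the race: a service creates its transaction log, a cleanup
# step runs too late and removes the directory tree, and the service's
# next file create fails.
workdir=$(mktemp -d)
mkdir -p "$workdir/zookeeper_0/version-2"
touch "$workdir/zookeeper_0/version-2/log.1"   # service is already writing here

rm -rf "$workdir"/*                            # cleanup runs too late

if touch "$workdir/zookeeper_0/version-2/log.2" 2>/dev/null; then
  result="ok"
else
  result="write failed: data directory removed after service start"
fi
echo "$result"
rm -rf "$workdir"
```

Moving the cleanup before the service starts, as the reporter suggests, removes the race entirely because nothing holds the directory open yet.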
[jira] [Commented] (KYLIN-4372) Docker entrypoint delete file too later cause ZK started by HBase crash
[ https://issues.apache.org/jira/browse/KYLIN-4372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17033561#comment-17033561 ]

weibin0516 commented on KYLIN-4372: [~cijianzy], OK, if you don't mind, I can submit a PR to fix this bug.
[jira] [Commented] (KYLIN-4372) Docker entrypoint delete file too later cause ZK started by HBase crash
[ https://issues.apache.org/jira/browse/KYLIN-4372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17033479#comment-17033479 ]

weibin0516 commented on KYLIN-4372: Hi, [~cijianzy], did you encounter this error after restarting the container?
[jira] [Updated] (KYLIN-4340) Fix bug of get value of isSparkFactDistinctEnable for cube not correct
[ https://issues.apache.org/jira/browse/KYLIN-4340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

weibin0516 updated KYLIN-4340:
    Summary: Fix bug of get value of isSparkFactDistinctEnable for cube not correct (was: Cube Configuration Overwrites not effective)

> Fix bug of get value of isSparkFactDistinctEnable for cube not correct
> ----------------------------------------------------------------------
>
>                 Key: KYLIN-4340
>                 URL: https://issues.apache.org/jira/browse/KYLIN-4340
>             Project: Kylin
>          Issue Type: Bug
>            Reporter: weibin0516
>            Assignee: weibin0516
>            Priority: Major
>         Attachments: image-2020-01-13-23-20-23-476.png
>
> In kylin.properties:
> {code:java}
> kylin.engine.spark-fact-distinct=true
> {code}
> !image-2020-01-13-23-20-23-476.png!
> This config is set to false at the cube level, but the override is not effective when building the cube.
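For illustration, the precedence the report expects, a cube-level setting overriding the global kylin.properties value, can be sketched in a few lines of shell. This is a standalone illustration of the intended lookup order, not Kylin's actual configuration code:

```shell
# Hypothetical two-level config lookup: cube-level overrides win over
# the global kylin.properties value.
global_conf="kylin.engine.spark-fact-distinct=true"
cube_conf="kylin.engine.spark-fact-distinct=false"

lookup() {
  key="$1"
  # check the cube-level override first, then fall back to the global value
  for src in "$cube_conf" "$global_conf"; do
    case "$src" in
      "$key="*) printf '%s\n' "${src#*=}"; return;;
    esac
  done
}

effective=$(lookup kylin.engine.spark-fact-distinct)
echo "effective value: $effective"   # prints: effective value: false
```

The bug being reported is that the build path read only the global value, so the cube-level "false" never took effect.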
[jira] [Commented] (KYLIN-4361) Kylin 3.0.0 Release - Not able to submit job with JDBC Data Sources with Sqoop.
[ https://issues.apache.org/jira/browse/KYLIN-4361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17025963#comment-17025963 ]

weibin0516 commented on KYLIN-4361: OK, if there are results, please let me know.

> Kylin 3.0.0 Release - Not able to submit job with JDBC Data Sources with Sqoop.
> -------------------------------------------------------------------------------
>
>                 Key: KYLIN-4361
>                 URL: https://issues.apache.org/jira/browse/KYLIN-4361
>             Project: Kylin
>          Issue Type: Bug
>    Affects Versions: v3.0.0
>         Environment: HDP3.1
>            Reporter: Sonu Singh
>            Assignee: weibin0516
>            Priority: Blocker
>             Fix For: v3.0.0
>         Attachments: image-2020-01-28-11-39-25-860.png
>
> I am trying to submit a job with JDBC data sources and getting a NullPointerException because of the code below.
> File path: kylin/source-jdbc/src/main/java/org/apache/kylin/source/jdbc/JdbcHiveInputBase.java
> Method: createSqoopToFlatHiveStep
> {code:java}
> String partCol = null;
> if (partitionDesc.isPartitioned()) {
>     partCol = partitionDesc.getPartitionDateColumn(); // tablename.colname
> }
> {code}
> For non-partitioned cubes, the value of partCol will always be null, causing an exception in the method below:
> {code:java}
> static String quoteIdentifier(String identifier, SourceDialect dialect) {
>     if (KylinConfig.getInstanceFromEnv().enableHiveDdlQuote()) {
>         String[] identifierArray = identifier.split("\\.");
> {code}
> Environment detail: HDP3.1
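As a sketch of the fix direction, the guard below skips the partition clause entirely when no partition column is configured, instead of unconditionally splitting a null identifier. This is a shell analogue for illustration only, not the actual Java in JdbcHiveInputBase:

```shell
# Backtick-quote each dot-separated part of an identifier,
# e.g. tablename.colname -> `tablename`.`colname`
quote_identifier() {
  printf '%s\n' "$1" | awk -F. '{ for (i = 1; i <= NF; i++) printf "`%s`%s", $i, (i < NF ? "." : "\n") }'
}

part_col=""   # non-partitioned cube: getPartitionDateColumn() would return null
if [ -n "$part_col" ]; then
  part_clause="AND $(quote_identifier "$part_col") >= '20200117'"
else
  part_clause=""   # no partition column, so emit no partition filter at all
fi
echo "partition clause: [$part_clause]"
echo "quoted: $(quote_identifier tablename.colname)"
```

The point is that the quoting helper is only ever reached with a non-null identifier, which is where the reported NullPointerException originates.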
[jira] [Commented] (KYLIN-4361) Kylin 3.0.0 Release - Not able to submit job with JDBC Data Sources with Sqoop.
[ https://issues.apache.org/jira/browse/KYLIN-4361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17025670#comment-17025670 ]

weibin0516 commented on KYLIN-4361: OK, what is your JDBC database? MySQL or something else? Please provide the DDL of the table (using show create table {tableName}) for verification.
[jira] [Commented] (KYLIN-4362) Kylin 3.0.0 Release: MR & Spark Job is failing with JDBC connection and Sqoop.
[ https://issues.apache.org/jira/browse/KYLIN-4362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17025616#comment-17025616 ]

weibin0516 commented on KYLIN-4362: OK ~

> Kylin 3.0.0 Release: MR & Spark Job is failing with JDBC connection and Sqoop.
> ------------------------------------------------------------------------------
>
>                 Key: KYLIN-4362
>                 URL: https://issues.apache.org/jira/browse/KYLIN-4362
>             Project: Kylin
>          Issue Type: Bug
>            Reporter: Sonu Singh
>            Assignee: weibin0516
>            Priority: Blocker
>             Fix For: v3.0.0
>         Attachments: image-2020-01-28-11-39-59-098.png
>
> MR and Spark jobs are failing on HDP3.1 with the error below:
> {code:java}
> -00 execute finished with exception
> java.io.IOException: OS command error exit with return code: 1, error message: Warning: /usr/hdp/3.0.1.0-187/accumulo does not exist! Accumulo imports will fail.
> Please set $ACCUMULO_HOME to the root of your Accumulo installation.
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in [jar:file:/usr/hdp/3.0.1.0-187/hadoop/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in [jar:file:/usr/hdp/3.0.1.0-187/hive/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
> 20/01/27 17:09:19 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7.3.0.1.0-187
> Missing argument for option: split-by
> The command is:
> /usr/hdp/current/sqoop-client/bin/sqoop import -Dorg.apache.sqoop.splitter.allow_text_splitter=true -Dmapreduce.job.queuename=default --connect "jdbc:vdb://XX.XX.XX.XX:XX/X" --driver com..XX.jdbc.Driver --username X --password "XXX" --query "SELECT \`sales\`.\`locationdimensionid\` as \`SALES_LOCATIONDIMENSIONID\` ,\`sales\`.\`storeitemdimensionid\` as \`SALES_STOREITEMDIMENSIONID\` ,\`sales\`.\`basecostperunit\` as \`SALES_BASECOSTPERUNIT\` ,\`sales\`.\`createdby\` as \`SALES_CREATEDBY\` ,\`sales\`.\`updateddate\` as \`SALES_UPDATEDDATE\` FROM \`\`.\`sales\` \`sales\` WHERE 1=1 AND \$CONDITIONS" --target-dir hdfs://XX-master:8020/apps/XXX/XXX/kylin-4f367799-4993-bb67-da69-a9a147c62a1e/kylin_intermediate_cube_11_27012020_1d0a2dfd_bd66_d3e3_304b_9cd7f2018dbc --split-by --boundary-query "SELECT min(\`\`), max(\`\`) FROM \`XX\`.\`sales\` " --null-string '\\N' --null-non-string '\\N' --fields-terminated-by '|' --num-mappers 4
>     at org.apache.kylin.common.util.CliCommandExecutor.execute(CliCommandExecutor.java:88)
>     at org.apache.kylin.source.jdbc.CmdStep.sqoopFlatHiveTable(CmdStep.java:43)
>     at org.apache.kylin.source.jdbc.CmdStep.doWork(CmdStep.java:54)
>     at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:171)
>     at org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:62)
>     at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:171)
>     at org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:106)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>     at java.lang.Thread.run(Thread.java:745)
> 2020-01-27 17:09:19,362 INFO [Scheduler 1642300543 Job 4f367799-4993-bb67-da69-a9a147c62a1e-160] execution.ExecutableManager:466 : job id:4f367799-4993-bb67-da69-a9a147c62a1e-00 from RUNNING to ERROR
> 2020-01-27 17:09:19,365 ERROR [Scheduler 1642300543 Job 4f367799-4993-bb67-da69-a9a147c62a1e-160] execution.AbstractExecutable:173 : error running Executable: CubingJob{id=4f367799-4993-bb67-da69-a9a147c62a1e, name=BUILD CUBE - cube_11_27012020 - FULL_BUILD - UTC 2020-01-27 17:09:00, state=RUNNING}
> 2020-01-27 17:09:19,372 DEBUG [pool-7-thread-1] cachesync.Broadcaster:111 : Servers in the cluster: [localhost:7070]
> 2020-01-27 17:09:19,373 DEBUG [pool-7-thread-1] cachesync.Broadcaster:121 : Announcing new bro
> {code}
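The line "Missing argument for option: split-by" in the log shows the sqoop command was assembled with an empty split column, so the flag immediately swallowed the next option. A minimal sketch of the guard (hypothetical helper, not Kylin's actual command builder) that avoids emitting the bare flag:

```shell
# Append --split-by only when a split column is actually configured,
# so sqoop never sees the flag without an argument.
build_sqoop_args() {
  split_col="$1"
  args="import --num-mappers 4"
  if [ -n "$split_col" ]; then
    args="$args --split-by $split_col"
  fi
  printf '%s\n' "$args"
}

build_sqoop_args ""           # prints: import --num-mappers 4
build_sqoop_args "saledate"   # prints: import --num-mappers 4 --split-by saledate
```

The same guard would apply to the empty backtick identifiers in the --boundary-query above, which also point at a missing column name.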
[jira] [Assigned] (KYLIN-4362) Kylin 3.0.0 Release: MR & Spark Job is failing with JDBC connection and Sqoop.
[ https://issues.apache.org/jira/browse/KYLIN-4362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

weibin0516 reassigned KYLIN-4362:
    Assignee: weibin0516
[jira] [Commented] (KYLIN-4361) Kylin 3.0.0 Release - Not able to submit job with JDBC Data Sources with Sqoop.
[ https://issues.apache.org/jira/browse/KYLIN-4361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17025606#comment-17025606 ]

weibin0516 commented on KYLIN-4361: Hi, Singh, this is indeed a bug in the old version, but in the latest code on the master branch the original buggy code has changed. Can you try to test with the latest code from the master branch? According to the code logic, I think it should also report an error.
[jira] [Issue Comment Deleted] (KYLIN-4361) Kylin 3.0.0 Release - Not able to submit job with JDBC Data Sources with Sqoop.
[ https://issues.apache.org/jira/browse/KYLIN-4361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

weibin0516 updated KYLIN-4361:
    Comment: was deleted (was: I think this is a bug. When the jdbc table is not partitioned, the corresponding partition identifier is null, but splitting the null partition identifier will cause this exception. I will mention a pr to fix the bug.)
[jira] [Commented] (KYLIN-4361) Kylin 3.0.0 Release - Not able to submit job with JDBC Data Sources with Sqoop.
[ https://issues.apache.org/jira/browse/KYLIN-4361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17025588#comment-17025588 ]

weibin0516 commented on KYLIN-4361: I think this is a bug. When the JDBC table is not partitioned, the corresponding partition identifier is null, and splitting the null partition identifier causes this exception. I will submit a PR to fix the bug.
[jira] [Assigned] (KYLIN-4361) Kylin 3.0.0 Release - Not able to submit job with JDBC Data Sources with Sqoop.
[ https://issues.apache.org/jira/browse/KYLIN-4361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

weibin0516 reassigned KYLIN-4361:
    Assignee: weibin0516
[jira] [Commented] (KYLIN-4361) Kylin 3.0.0 Release - Not able to submit job with JDBC Data Sources with Sqoop.
[ https://issues.apache.org/jira/browse/KYLIN-4361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17024834#comment-17024834 ]

weibin0516 commented on KYLIN-4361: Can you show the full error stack information? This will help find the cause of the error.
[jira] [Commented] (KYLIN-4362) Kylin 3.0.0 Release: MR & Spark Job is failing with JDBC connection and Sqoop.
[ https://issues.apache.org/jira/browse/KYLIN-4362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17024833#comment-17024833 ]

weibin0516 commented on KYLIN-4362: Hi, the solution in this link may solve your problem: https://community.cloudera.com/t5/Support-Questions/Warning-usr-lib-sqoop-accumulo-does-not-exist-Accumulo/td-p/22304
[jira] [Commented] (KYLIN-4350) Pushdown improperly rewrites the query causing it to fail
[ https://issues.apache.org/jira/browse/KYLIN-4350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17019162#comment-17019162 ] weibin0516 commented on KYLIN-4350: --- I verified with v3.0.0 and found no such problem. > Pushdown improperly rewrites the query causing it to fail > - > > Key: KYLIN-4350 > URL: https://issues.apache.org/jira/browse/KYLIN-4350 > Project: Kylin > Issue Type: Bug > Components: Query Engine >Affects Versions: v2.6.4 > Environment: HDP 2.6.5, Kylin 2.6.4, CentOS 7.6 >Reporter: Vsevolod Ostapenko >Priority: Major > > A query that uses a WITH clause and is subject to pushdown to Hive (or Impala) > for execution is incorrectly rewritten before being submitted to the > execution engine. Table aliases are prefixed with the database name, which makes the > query invalid. > Sample log excerpts are below: > > {quote}2020-01-17 12:12:21,997 INFO [Query > e844b846-c589-4729-5a04-483f6d73c834-31163] service.QueryService:404 : The > original query: with > t as > ( > SELECT ZETTICSDW.A_VL_HOURLY_V.IMSIID "ZETTICSDW_A_VL_HOURLY_V_IMSIID", > ZETTICSDW.A_VL_HOURLY_V.MEDIA_GAP_CALL_ID > "ZETTICSDW_A_VL_HOURLY_V_MEDIA_GAP_CALL_ID", > count(*) cnt > FROM ZETTICSDW.A_VL_HOURLY_V > WHERE ((ZETTICSDW.A_VL_HOURLY_V.THEDATE = '20200117') > AND ((ZETTICSDW.A_VL_HOURLY_V.THEHOUR >= '10') > AND (ZETTICSDW.A_VL_HOURLY_V.THEHOUR <= '10'))) > GROUP BY ZETTICSDW.A_VL_HOURLY_V.IMSIID, > ZETTICSDW.A_VL_HOURLY_V.MEDIA_GAP_CALL_ID > ) > select t.ZETTICSDW_A_VL_HOURLY_V_IMSIID, > count(*) "vl_aggs_model___CD_MEDIA_GAP_CALL_ID" > *from t* > group by t.ZETTICSDW_A_VL_HOURLY_V_IMSIID > ORDER BY "vl_aggs_model___CD_MEDIA_GAP_CALL_ID" desc > LIMIT 500 > > 2020-01-17 12:12:22,073 INFO [Query > e844b846-c589-4729-5a04-483f6d73c834-31163] > adhocquery.AbstractPushdownRunner:37 : the query is converted to with > t as > ( > SELECT ZETTICSDW.A_VL_HOURLY_V.IMSIID `ZETTICSDW_A_VL_HOURLY_V_IMSIID`, > ZETTICSDW.A_VL_HOURLY_V.MEDIA_GAP_CALL_ID > 
`ZETTICSDW_A_VL_HOURLY_V_MEDIA_GAP_CALL_ID`, > count(*) cnt > FROM ZETTICSDW.A_VL_HOURLY_V > WHERE ((ZETTICSDW.A_VL_HOURLY_V.THEDATE = '20200117') > AND ((ZETTICSDW.A_VL_HOURLY_V.THEHOUR >= '10') > AND (ZETTICSDW.A_VL_HOURLY_V.THEHOUR <= '10'))) > GROUP BY ZETTICSDW.A_VL_HOURLY_V.IMSIID, > ZETTICSDW.A_VL_HOURLY_V.MEDIA_GAP_CALL_ID > ) > select t.ZETTICSDW_A_VL_HOURLY_V_IMSIID, > count(*) `vl_aggs_model___CD_MEDIA_GAP_CALL_ID` > *{color:#FF}from ZETTICSDW.t{color}* > group by t.ZETTICSDW_A_VL_HOURLY_V_IMSIID > ORDER BY `vl_aggs_model___CD_MEDIA_GAP_CALL_ID` desc > LIMIT 500 after applying converter > org.apache.kylin.source.adhocquery.HivePushDownConverter > 2020-01-17 12:12:22,108 ERROR [Query > e844b846-c589-4729-5a04-483f6d73c834-31163] service.QueryService:989 : > pushdown engine failed current query too > org.apache.hive.service.cli.HiveSQLException: AnalysisException: Could not > resolve table reference: '*zetticsdw.t*' > {quote} > Pushdown query should be submitted into query engine as written by the user. > As the best effort Kylin push down executor should issue "use " > over the same JDBC connection right before submitting the query. -- This message was sent by Atlassian Jira (v8.3.4#803005)
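The failure mode in the log above can be reproduced in miniature: a converter that blindly qualifies every unqualified FROM reference with the default database will also qualify the CTE alias `t`, producing a table reference no engine can resolve. The sketch below is illustrative only and is not Kylin's actual HivePushDownConverter logic:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative only: a naive converter that prefixes every unqualified
// FROM reference with the default database. It cannot tell a real table
// from a CTE alias, so "from t" becomes "from ZETTICSDW.t" and the
// rewritten query fails exactly as in the log above.
public class NaiveQualifier {
    static final Pattern FROM_REF = Pattern.compile("(?i)\\bfrom\\s+(\\w+)\\b");

    static String qualify(String sql, String defaultDb) {
        Matcher m = FROM_REF.matcher(sql);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // No dot in the reference -> assume (wrongly, for CTEs) that it
            // needs the database prefix.
            m.appendReplacement(out, "from " + defaultDb + "." + m.group(1));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String userSql = "with t as (select 1) select * from t";
        // The CTE alias t gets qualified into a nonexistent table reference.
        System.out.println(qualify(userSql, "ZETTICSDW"));
    }
}
```

A correct converter would have to track CTE names from the WITH clause and leave references to them untouched.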
[jira] [Commented] (KYLIN-4350) Pushdown improperly rewrites the query causing it to fail
[ https://issues.apache.org/jira/browse/KYLIN-4350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17019161#comment-17019161 ] weibin0516 commented on KYLIN-4350: --- Hi, [~seva_ostapenko], not all databases support USE database; PostgreSQL, for example, does not. > Pushdown improperly rewrites the query causing it to fail > - > > Key: KYLIN-4350 > URL: https://issues.apache.org/jira/browse/KYLIN-4350 > Project: Kylin > Issue Type: Bug > Components: Query Engine >Affects Versions: v2.6.4 > Environment: HDP 2.6.5, Kylin 2.6.4, CentOS 7.6 >Reporter: Vsevolod Ostapenko >Priority: Major > > A query that uses a WITH clause and is subject to pushdown to Hive (or Impala) > for execution is incorrectly rewritten before being submitted to the > execution engine. Table aliases are prefixed with the database name, which makes the > query invalid. > Sample log excerpts are below: > > {quote}2020-01-17 12:12:21,997 INFO [Query > e844b846-c589-4729-5a04-483f6d73c834-31163] service.QueryService:404 : The > original query: with > t as > ( > SELECT ZETTICSDW.A_VL_HOURLY_V.IMSIID "ZETTICSDW_A_VL_HOURLY_V_IMSIID", > ZETTICSDW.A_VL_HOURLY_V.MEDIA_GAP_CALL_ID > "ZETTICSDW_A_VL_HOURLY_V_MEDIA_GAP_CALL_ID", > count(*) cnt > FROM ZETTICSDW.A_VL_HOURLY_V > WHERE ((ZETTICSDW.A_VL_HOURLY_V.THEDATE = '20200117') > AND ((ZETTICSDW.A_VL_HOURLY_V.THEHOUR >= '10') > AND (ZETTICSDW.A_VL_HOURLY_V.THEHOUR <= '10'))) > GROUP BY ZETTICSDW.A_VL_HOURLY_V.IMSIID, > ZETTICSDW.A_VL_HOURLY_V.MEDIA_GAP_CALL_ID > ) > select t.ZETTICSDW_A_VL_HOURLY_V_IMSIID, > count(*) "vl_aggs_model___CD_MEDIA_GAP_CALL_ID" > *from t* > group by t.ZETTICSDW_A_VL_HOURLY_V_IMSIID > ORDER BY "vl_aggs_model___CD_MEDIA_GAP_CALL_ID" desc > LIMIT 500 > > 2020-01-17 12:12:22,073 INFO [Query > e844b846-c589-4729-5a04-483f6d73c834-31163] > adhocquery.AbstractPushdownRunner:37 : the query is converted to with > t as > ( > SELECT ZETTICSDW.A_VL_HOURLY_V.IMSIID `ZETTICSDW_A_VL_HOURLY_V_IMSIID`, > 
ZETTICSDW.A_VL_HOURLY_V.MEDIA_GAP_CALL_ID > `ZETTICSDW_A_VL_HOURLY_V_MEDIA_GAP_CALL_ID`, > count(*) cnt > FROM ZETTICSDW.A_VL_HOURLY_V > WHERE ((ZETTICSDW.A_VL_HOURLY_V.THEDATE = '20200117') > AND ((ZETTICSDW.A_VL_HOURLY_V.THEHOUR >= '10') > AND (ZETTICSDW.A_VL_HOURLY_V.THEHOUR <= '10'))) > GROUP BY ZETTICSDW.A_VL_HOURLY_V.IMSIID, > ZETTICSDW.A_VL_HOURLY_V.MEDIA_GAP_CALL_ID > ) > select t.ZETTICSDW_A_VL_HOURLY_V_IMSIID, > count(*) `vl_aggs_model___CD_MEDIA_GAP_CALL_ID` > *{color:#FF}from ZETTICSDW.t{color}* > group by t.ZETTICSDW_A_VL_HOURLY_V_IMSIID > ORDER BY `vl_aggs_model___CD_MEDIA_GAP_CALL_ID` desc > LIMIT 500 after applying converter > org.apache.kylin.source.adhocquery.HivePushDownConverter > 2020-01-17 12:12:22,108 ERROR [Query > e844b846-c589-4729-5a04-483f6d73c834-31163] service.QueryService:989 : > pushdown engine failed current query too > org.apache.hive.service.cli.HiveSQLException: AnalysisException: Could not > resolve table reference: '*zetticsdw.t*' > {quote} > Pushdown query should be submitted into query engine as written by the user. > As the best effort Kylin push down executor should issue "use " > over the same JDBC connection right before submitting the query. -- This message was sent by Atlassian Jira (v8.3.4#803005)
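The portability concern raised in the comment (PostgreSQL has no USE statement) could be handled with a dialect-aware helper that picks the right session statement to run on the pushdown connection before submitting the user's query unchanged. This is a hypothetical helper, not part of Kylin:

```java
// Hypothetical helper (not a Kylin API): choose the dialect-appropriate
// statement for switching the default database/schema on a JDBC connection
// before submitting the pushed-down query as written by the user.
public class SchemaSwitch {
    static String statementFor(String dialect, String db) {
        switch (dialect) {
            case "hive":
            case "impala":
            case "mysql":
                return "USE " + db;
            case "postgresql":
                // PostgreSQL has no USE; the closest equivalent for
                // unqualified name resolution is the schema search path.
                return "SET search_path TO " + db;
            default:
                throw new IllegalArgumentException("unsupported dialect: " + dialect);
        }
    }

    public static void main(String[] args) {
        System.out.println(statementFor("hive", "zetticsdw"));
        System.out.println(statementFor("postgresql", "zetticsdw"));
    }
}
```

Note that in PostgreSQL a schema and a database are different objects, so this only approximates USE; switching databases there requires a new connection.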
[jira] [Updated] (KYLIN-4349) Close InputStream in RowRecordReader.initReaders()
[ https://issues.apache.org/jira/browse/KYLIN-4349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] weibin0516 updated KYLIN-4349: -- Attachment: image-2020-01-16-10-44-40-119.png Description: Some InputStreams are not closed properly !image-2020-01-16-10-44-40-119.png! > Close InputStream in RowRecordReader.initReaders() > -- > > Key: KYLIN-4349 > URL: https://issues.apache.org/jira/browse/KYLIN-4349 > Project: Kylin > Issue Type: Bug >Affects Versions: v3.0.0 >Reporter: weibin0516 >Assignee: weibin0516 >Priority: Major > Attachments: image-2020-01-16-10-44-40-119.png > > > Some InputStreams are not closed properly > !image-2020-01-16-10-44-40-119.png! -- This message was sent by Atlassian Jira (v8.3.4#803005)
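The general fix pattern for streams that are opened but never closed is try-with-resources, which guarantees `close()` even when reading throws. The sketch below uses a `ByteArrayInputStream` as a stand-in, since the actual `RowRecordReader.initReaders()` code is only shown as a screenshot here:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Fix pattern for KYLIN-4349-style leaks: declare the stream in a
// try-with-resources header so it is closed on every exit path,
// including exceptions thrown mid-read.
public class CloseStreams {
    static int countBytes(byte[] data) throws IOException {
        try (InputStream in = new ByteArrayInputStream(data)) {
            int n = 0;
            while (in.read() != -1) {
                n++;
            }
            return n;
        } // in.close() runs here automatically
    }

    public static void main(String[] args) throws IOException {
        System.out.println(countBytes(new byte[]{1, 2, 3}));
    }
}
```

When the stream must outlive the method (as reader fields typically do), the equivalent fix is closing it in a finally block or in the reader's own close() method.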
[jira] [Created] (KYLIN-4349) Close InputStream in RowRecordReader.initReaders()
weibin0516 created KYLIN-4349: - Summary: Close InputStream in RowRecordReader.initReaders() Key: KYLIN-4349 URL: https://issues.apache.org/jira/browse/KYLIN-4349 Project: Kylin Issue Type: Bug Affects Versions: v3.0.0 Reporter: weibin0516 Assignee: weibin0516 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Closed] (KYLIN-4339) Extract Fact Table Distinct Columns fail due to no kylin installed on worker node
[ https://issues.apache.org/jira/browse/KYLIN-4339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] weibin0516 closed KYLIN-4339. - Resolution: Duplicate > Extract Fact Table Distinct Columns fail due to no kylin installed on worker > node > - > > Key: KYLIN-4339 > URL: https://issues.apache.org/jira/browse/KYLIN-4339 > Project: Kylin > Issue Type: Bug >Affects Versions: v3.0.0 >Reporter: weibin0516 >Assignee: weibin0516 >Priority: Major > > After setting kylin.engine.spark-fact-distinct to true, building a cube with the > Spark engine fails with the following error message > {code:java} > 2020-01-13 22:19:23 INFO BlockManagerMaster:54 - BlockManagerMaster stopped > 2020-01-13 22:19:23 INFO > OutputCommitCoordinator$OutputCommitCoordinatorEndpoint:54 - > OutputCommitCoordinator stopped! > 2020-01-13 22:19:23 INFO SparkContext:54 - Successfully stopped SparkContext > Exception in thread "main" java.lang.RuntimeException: error execute > org.apache.kylin.engine.spark.SparkFactDistinct. 
Root cause: Job aborted due > to stage failure: Task 9 in stage 1.0 failed 4 times, most recent failure: > Lost task 9.3 in stage 1.0 (TID 32, > sql-gateway-eu95-17.gz00c.test.alipay.net, executor 7): > org.apache.kylin.common.KylinConfigCannotInitException: Didn't find > KYLIN_CONF or KYLIN_HOME, please set one of them > at > org.apache.kylin.common.KylinConfig.getSitePropertiesFile(KylinConfig.java:336) > at > org.apache.kylin.common.KylinConfig.buildSiteOrderedProps(KylinConfig.java:378) > at > org.apache.kylin.common.KylinConfig.buildSiteProperties(KylinConfig.java:358) > at > org.apache.kylin.common.KylinConfig.getInstanceFromEnv(KylinConfig.java:137) > at > org.apache.kylin.dict.CacheDictionary.enableCache(CacheDictionary.java:105) > at > org.apache.kylin.dict.TrieDictionaryForest.initForestCache(TrieDictionaryForest.java:394) > at > org.apache.kylin.dict.TrieDictionaryForest.init(TrieDictionaryForest.java:77) > at > org.apache.kylin.dict.TrieDictionaryForest.(TrieDictionaryForest.java:67) > at > org.apache.kylin.dict.TrieDictionaryForestBuilder.build(TrieDictionaryForestBuilder.java:114) > at > org.apache.kylin.dict.DictionaryGenerator$NumberTrieDictForestBuilder.build(DictionaryGenerator.java:312) > at > org.apache.kylin.engine.spark.SparkFactDistinct$MultiOutputFunction.call(SparkFactDistinct.java:774) > at > org.apache.kylin.engine.spark.SparkFactDistinct$MultiOutputFunction.call(SparkFactDistinct.java:650) > at > org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:186) > at > org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:186) > at > org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:800) > at > org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:800) > at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324) > at 
org.apache.spark.rdd.RDD.iterator(RDD.scala:288) > at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87) > at org.apache.spark.scheduler.Task.run(Task.scala:109) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:756) > {code} > we should put kylin.properties in the execution environment of spark > application (via --files) to fix this problem -- This message was sent by Atlassian Jira (v8.3.4#803005)
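The fix proposed above is to ship kylin.properties to every executor via spark-submit's `--files` option, so that `KylinConfig` can initialize on worker nodes with no Kylin installation. A sketch of how the submit command might be assembled follows; this is illustrative only, not Kylin's actual SparkExecutable code, and the paths are placeholders:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of the proposed fix for KYLIN-4339: pass the config
// file to spark-submit via --files so executors can find kylin.properties
// in their working directory. Class/jar/path names are placeholders.
public class SubmitCommand {
    static List<String> build(String mainClass, String appJar, String kylinPropertiesPath) {
        List<String> cmd = new ArrayList<>();
        cmd.add("spark-submit");
        cmd.add("--class");
        cmd.add(mainClass);
        // Distribute the config file to every executor's working directory.
        cmd.add("--files");
        cmd.add(kylinPropertiesPath);
        cmd.add(appJar);
        return cmd;
    }

    public static void main(String[] args) {
        System.out.println(String.join(" ", build(
                "org.apache.kylin.engine.spark.SparkFactDistinct",
                "kylin-job.jar",
                "/etc/kylin/kylin.properties")));
    }
}
```

Executor-side code would still need to point KYLIN_CONF at the distributed copy when the environment variables are absent.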
[jira] [Commented] (KYLIN-4339) Extract Fact Table Distinct Columns fail due to no kylin installed on worker node
[ https://issues.apache.org/jira/browse/KYLIN-4339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17014893#comment-17014893 ] weibin0516 commented on KYLIN-4339: --- OK, I will close the JIRA. > Extract Fact Table Distinct Columns fail due to no kylin installed on worker > node > - > > Key: KYLIN-4339 > URL: https://issues.apache.org/jira/browse/KYLIN-4339 > Project: Kylin > Issue Type: Bug >Affects Versions: v3.0.0 >Reporter: weibin0516 >Assignee: weibin0516 >Priority: Major > > After setting kylin.engine.spark-fact-distinct to true, building a cube with the > Spark engine fails with the following error message > {code:java} > 2020-01-13 22:19:23 INFO BlockManagerMaster:54 - BlockManagerMaster stopped > 2020-01-13 22:19:23 INFO > OutputCommitCoordinator$OutputCommitCoordinatorEndpoint:54 - > OutputCommitCoordinator stopped! > 2020-01-13 22:19:23 INFO SparkContext:54 - Successfully stopped SparkContext > Exception in thread "main" java.lang.RuntimeException: error execute > org.apache.kylin.engine.spark.SparkFactDistinct. 
Root cause: Job aborted due > to stage failure: Task 9 in stage 1.0 failed 4 times, most recent failure: > Lost task 9.3 in stage 1.0 (TID 32, > sql-gateway-eu95-17.gz00c.test.alipay.net, executor 7): > org.apache.kylin.common.KylinConfigCannotInitException: Didn't find > KYLIN_CONF or KYLIN_HOME, please set one of them > at > org.apache.kylin.common.KylinConfig.getSitePropertiesFile(KylinConfig.java:336) > at > org.apache.kylin.common.KylinConfig.buildSiteOrderedProps(KylinConfig.java:378) > at > org.apache.kylin.common.KylinConfig.buildSiteProperties(KylinConfig.java:358) > at > org.apache.kylin.common.KylinConfig.getInstanceFromEnv(KylinConfig.java:137) > at > org.apache.kylin.dict.CacheDictionary.enableCache(CacheDictionary.java:105) > at > org.apache.kylin.dict.TrieDictionaryForest.initForestCache(TrieDictionaryForest.java:394) > at > org.apache.kylin.dict.TrieDictionaryForest.init(TrieDictionaryForest.java:77) > at > org.apache.kylin.dict.TrieDictionaryForest.(TrieDictionaryForest.java:67) > at > org.apache.kylin.dict.TrieDictionaryForestBuilder.build(TrieDictionaryForestBuilder.java:114) > at > org.apache.kylin.dict.DictionaryGenerator$NumberTrieDictForestBuilder.build(DictionaryGenerator.java:312) > at > org.apache.kylin.engine.spark.SparkFactDistinct$MultiOutputFunction.call(SparkFactDistinct.java:774) > at > org.apache.kylin.engine.spark.SparkFactDistinct$MultiOutputFunction.call(SparkFactDistinct.java:650) > at > org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:186) > at > org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:186) > at > org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:800) > at > org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:800) > at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324) > at 
org.apache.spark.rdd.RDD.iterator(RDD.scala:288) > at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87) > at org.apache.spark.scheduler.Task.run(Task.scala:109) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:756) > {code} > we should put kylin.properties in the execution environment of spark > application (via --files) to fix this problem -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (KYLIN-4340) Cube Configuration Overwrites not effective
weibin0516 created KYLIN-4340: - Summary: Cube Configuration Overwrites not effective Key: KYLIN-4340 URL: https://issues.apache.org/jira/browse/KYLIN-4340 Project: Kylin Issue Type: Bug Reporter: weibin0516 Assignee: weibin0516 Attachments: image-2020-01-13-23-20-23-476.png In kylin.properties, {code:java} kylin.engine.spark-fact-distinct=true {code} !image-2020-01-13-23-20-23-476.png! This config was set to false at the cube level, but the override was not effective when building the cube -- This message was sent by Atlassian Jira (v8.3.4#803005)
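The expected behavior in KYLIN-4340 is that a cube-level "Configuration Overwrites" entry takes precedence over the global kylin.properties value. That overlay can be modeled directly with `java.util.Properties` defaults, as a sketch of the intended precedence (not Kylin's actual config code):

```java
import java.util.Properties;

// Expected precedence for KYLIN-4340: the cube-level override should win
// over the global kylin.properties value. Properties models the overlay:
// look up the cube overrides first, fall back to the global defaults.
public class ConfigOverlay {
    static String effectiveValue(String key, Properties global, Properties cubeOverrides) {
        Properties merged = new Properties(global); // global values as fallback
        merged.putAll(cubeOverrides);               // cube-level entries win
        return merged.getProperty(key);
    }

    public static void main(String[] args) {
        Properties global = new Properties();
        global.setProperty("kylin.engine.spark-fact-distinct", "true");

        Properties cube = new Properties();
        cube.setProperty("kylin.engine.spark-fact-distinct", "false");

        // The cube-level value should be the effective one at build time.
        System.out.println(effectiveValue("kylin.engine.spark-fact-distinct", global, cube));
    }
}
```

The bug report says the build step used the global value instead, i.e. the lookup order above was not honored for this key.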
[jira] [Commented] (KYLIN-4339) Extract Fact Table Distinct Columns fail due to no kylin installed on worker node
[ https://issues.apache.org/jira/browse/KYLIN-4339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17014393#comment-17014393 ] weibin0516 commented on KYLIN-4339: --- cc [~temple.zhou], he met the same problem. > Extract Fact Table Distinct Columns fail due to no kylin installed on worker > node > - > > Key: KYLIN-4339 > URL: https://issues.apache.org/jira/browse/KYLIN-4339 > Project: Kylin > Issue Type: Bug >Affects Versions: v3.0.0 >Reporter: weibin0516 >Assignee: weibin0516 >Priority: Major > > After setting kylin.engine.spark-fact-distinct to true, building a cube with the > Spark engine fails with the following error message > {code:java} > 2020-01-13 22:19:23 INFO BlockManagerMaster:54 - BlockManagerMaster stopped > 2020-01-13 22:19:23 INFO > OutputCommitCoordinator$OutputCommitCoordinatorEndpoint:54 - > OutputCommitCoordinator stopped! > 2020-01-13 22:19:23 INFO SparkContext:54 - Successfully stopped SparkContext > Exception in thread "main" java.lang.RuntimeException: error execute > org.apache.kylin.engine.spark.SparkFactDistinct. 
Root cause: Job aborted due > to stage failure: Task 9 in stage 1.0 failed 4 times, most recent failure: > Lost task 9.3 in stage 1.0 (TID 32, > sql-gateway-eu95-17.gz00c.test.alipay.net, executor 7): > org.apache.kylin.common.KylinConfigCannotInitException: Didn't find > KYLIN_CONF or KYLIN_HOME, please set one of them > at > org.apache.kylin.common.KylinConfig.getSitePropertiesFile(KylinConfig.java:336) > at > org.apache.kylin.common.KylinConfig.buildSiteOrderedProps(KylinConfig.java:378) > at > org.apache.kylin.common.KylinConfig.buildSiteProperties(KylinConfig.java:358) > at > org.apache.kylin.common.KylinConfig.getInstanceFromEnv(KylinConfig.java:137) > at > org.apache.kylin.dict.CacheDictionary.enableCache(CacheDictionary.java:105) > at > org.apache.kylin.dict.TrieDictionaryForest.initForestCache(TrieDictionaryForest.java:394) > at > org.apache.kylin.dict.TrieDictionaryForest.init(TrieDictionaryForest.java:77) > at > org.apache.kylin.dict.TrieDictionaryForest.(TrieDictionaryForest.java:67) > at > org.apache.kylin.dict.TrieDictionaryForestBuilder.build(TrieDictionaryForestBuilder.java:114) > at > org.apache.kylin.dict.DictionaryGenerator$NumberTrieDictForestBuilder.build(DictionaryGenerator.java:312) > at > org.apache.kylin.engine.spark.SparkFactDistinct$MultiOutputFunction.call(SparkFactDistinct.java:774) > at > org.apache.kylin.engine.spark.SparkFactDistinct$MultiOutputFunction.call(SparkFactDistinct.java:650) > at > org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:186) > at > org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:186) > at > org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:800) > at > org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:800) > at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324) > at 
org.apache.spark.rdd.RDD.iterator(RDD.scala:288) > at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87) > at org.apache.spark.scheduler.Task.run(Task.scala:109) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:756) > {code} > we should put kylin.properties in the execution environment of spark > application (via --files) to fix this problem -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (KYLIN-4339) Extract Fact Table Distinct Columns fail due to no kylin installed on worker node
[ https://issues.apache.org/jira/browse/KYLIN-4339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17014390#comment-17014390 ] weibin0516 commented on KYLIN-4339: --- This only appears in a cluster environment; there is no such problem in the Docker trial. > Extract Fact Table Distinct Columns fail due to no kylin installed on worker > node > - > > Key: KYLIN-4339 > URL: https://issues.apache.org/jira/browse/KYLIN-4339 > Project: Kylin > Issue Type: Bug >Affects Versions: v3.0.0 >Reporter: weibin0516 >Assignee: weibin0516 >Priority: Major > > After setting kylin.engine.spark-fact-distinct to true, building a cube with the > Spark engine fails with the following error message > {code:java} > 2020-01-13 22:19:23 INFO BlockManagerMaster:54 - BlockManagerMaster stopped > 2020-01-13 22:19:23 INFO > OutputCommitCoordinator$OutputCommitCoordinatorEndpoint:54 - > OutputCommitCoordinator stopped! > 2020-01-13 22:19:23 INFO SparkContext:54 - Successfully stopped SparkContext > Exception in thread "main" java.lang.RuntimeException: error execute > org.apache.kylin.engine.spark.SparkFactDistinct. 
Root cause: Job aborted due > to stage failure: Task 9 in stage 1.0 failed 4 times, most recent failure: > Lost task 9.3 in stage 1.0 (TID 32, > sql-gateway-eu95-17.gz00c.test.alipay.net, executor 7): > org.apache.kylin.common.KylinConfigCannotInitException: Didn't find > KYLIN_CONF or KYLIN_HOME, please set one of them > at > org.apache.kylin.common.KylinConfig.getSitePropertiesFile(KylinConfig.java:336) > at > org.apache.kylin.common.KylinConfig.buildSiteOrderedProps(KylinConfig.java:378) > at > org.apache.kylin.common.KylinConfig.buildSiteProperties(KylinConfig.java:358) > at > org.apache.kylin.common.KylinConfig.getInstanceFromEnv(KylinConfig.java:137) > at > org.apache.kylin.dict.CacheDictionary.enableCache(CacheDictionary.java:105) > at > org.apache.kylin.dict.TrieDictionaryForest.initForestCache(TrieDictionaryForest.java:394) > at > org.apache.kylin.dict.TrieDictionaryForest.init(TrieDictionaryForest.java:77) > at > org.apache.kylin.dict.TrieDictionaryForest.(TrieDictionaryForest.java:67) > at > org.apache.kylin.dict.TrieDictionaryForestBuilder.build(TrieDictionaryForestBuilder.java:114) > at > org.apache.kylin.dict.DictionaryGenerator$NumberTrieDictForestBuilder.build(DictionaryGenerator.java:312) > at > org.apache.kylin.engine.spark.SparkFactDistinct$MultiOutputFunction.call(SparkFactDistinct.java:774) > at > org.apache.kylin.engine.spark.SparkFactDistinct$MultiOutputFunction.call(SparkFactDistinct.java:650) > at > org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:186) > at > org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:186) > at > org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:800) > at > org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:800) > at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324) > at 
org.apache.spark.rdd.RDD.iterator(RDD.scala:288) > at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87) > at org.apache.spark.scheduler.Task.run(Task.scala:109) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:756) > {code} > we should put kylin.properties in the execution environment of spark > application (via --files) to fix this problem -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (KYLIN-4339) Extract Fact Table Distinct Columns fail due to no kylin installed on worker node
weibin0516 created KYLIN-4339: - Summary: Extract Fact Table Distinct Columns fail due to no kylin installed on worker node Key: KYLIN-4339 URL: https://issues.apache.org/jira/browse/KYLIN-4339 Project: Kylin Issue Type: Bug Affects Versions: v3.0.0 Reporter: weibin0516 Assignee: weibin0516 After setting kylin.engine.spark-fact-distinct to true, building a cube with the Spark engine fails with the following error message {code:java} 2020-01-13 22:19:23 INFO BlockManagerMaster:54 - BlockManagerMaster stopped 2020-01-13 22:19:23 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint:54 - OutputCommitCoordinator stopped! 2020-01-13 22:19:23 INFO SparkContext:54 - Successfully stopped SparkContext Exception in thread "main" java.lang.RuntimeException: error execute org.apache.kylin.engine.spark.SparkFactDistinct. Root cause: Job aborted due to stage failure: Task 9 in stage 1.0 failed 4 times, most recent failure: Lost task 9.3 in stage 1.0 (TID 32, sql-gateway-eu95-17.gz00c.test.alipay.net, executor 7): org.apache.kylin.common.KylinConfigCannotInitException: Didn't find KYLIN_CONF or KYLIN_HOME, please set one of them at org.apache.kylin.common.KylinConfig.getSitePropertiesFile(KylinConfig.java:336) at org.apache.kylin.common.KylinConfig.buildSiteOrderedProps(KylinConfig.java:378) at org.apache.kylin.common.KylinConfig.buildSiteProperties(KylinConfig.java:358) at org.apache.kylin.common.KylinConfig.getInstanceFromEnv(KylinConfig.java:137) at org.apache.kylin.dict.CacheDictionary.enableCache(CacheDictionary.java:105) at org.apache.kylin.dict.TrieDictionaryForest.initForestCache(TrieDictionaryForest.java:394) at org.apache.kylin.dict.TrieDictionaryForest.init(TrieDictionaryForest.java:77) at org.apache.kylin.dict.TrieDictionaryForest.(TrieDictionaryForest.java:67) at org.apache.kylin.dict.TrieDictionaryForestBuilder.build(TrieDictionaryForestBuilder.java:114) at org.apache.kylin.dict.DictionaryGenerator$NumberTrieDictForestBuilder.build(DictionaryGenerator.java:312) at 
org.apache.kylin.engine.spark.SparkFactDistinct$MultiOutputFunction.call(SparkFactDistinct.java:774) at org.apache.kylin.engine.spark.SparkFactDistinct$MultiOutputFunction.call(SparkFactDistinct.java:650) at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:186) at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:186) at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:800) at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:800) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324) at org.apache.spark.rdd.RDD.iterator(RDD.scala:288) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87) at org.apache.spark.scheduler.Task.run(Task.scala:109) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:756) {code} we should put kylin.properties in the execution environment of spark application (via --files) to fix this problem -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (KYLIN-4321) Create fact distinct columns using spark by default when build engine is spark
[ https://issues.apache.org/jira/browse/KYLIN-4321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17010215#comment-17010215 ] weibin0516 edited comment on KYLIN-4321 at 1/8/20 12:49 AM: Past experience and a large amount of test data show that Spark's performance is significantly better than Hive(MapReduce). The following pictures are the test result of spark and hive on tpc-ds !screenshot-2.png! !screenshot-1.png! Currently, when the cube is built with the spark engine, the `Create fact distinct columns` step uses mapreduce by default. Here we want to use the spark engine to perform this step by default, that is, modify the` kylin.engine.spark-fact-distinct` value to true. was (Author: codingforfun): Past experience and a large amount of test data show that Spark's performance is significantly better than Hive(MapReduce). !screenshot-2.png! !screenshot-1.png! Currently, when the cube is built with the spark engine, the `Create fact distinct columns` step uses mapreduce by default. Here we want to use the spark engine to perform this step by default, that is, modify the` kylin.engine.spark-fact-distinct` value to true. > Create fact distinct columns using spark by default when build engine is spark > -- > > Key: KYLIN-4321 > URL: https://issues.apache.org/jira/browse/KYLIN-4321 > Project: Kylin > Issue Type: Improvement >Reporter: weibin0516 >Assignee: weibin0516 >Priority: Major > Fix For: v3.1.0 > > Attachments: screenshot-1.png, screenshot-2.png > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (KYLIN-4321) Create fact distinct columns using spark by default when build engine is spark
[ https://issues.apache.org/jira/browse/KYLIN-4321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17010215#comment-17010215 ] weibin0516 edited comment on KYLIN-4321 at 1/8/20 12:46 AM: Past experience and a large amount of test data show that Spark's performance is significantly better than Hive(MapReduce). !screenshot-2.png! !screenshot-1.png! Currently, when the cube is built with the spark engine, the `Create fact distinct columns` step uses mapreduce by default. Here we want to use the spark engine to perform this step by default, that is, modify the` kylin.engine.spark-fact-distinct` value to true. was (Author: codingforfun): Past experience and a large amount of test data show that Spark's performance is significantly better than MapReduce. !screenshot-1.png! !screenshot-2.png! Currently, when the cube is built with the spark engine, the `Create fact distinct columns` step uses mapreduce by default. Here we want to use the spark engine to perform this step by default, that is, modify the` kylin.engine.spark-fact-distinct` value to true. > Create fact distinct columns using spark by default when build engine is spark > -- > > Key: KYLIN-4321 > URL: https://issues.apache.org/jira/browse/KYLIN-4321 > Project: Kylin > Issue Type: Improvement >Reporter: weibin0516 >Assignee: weibin0516 >Priority: Major > Fix For: v3.1.0 > > Attachments: screenshot-1.png, screenshot-2.png > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (KYLIN-4321) Create fact distinct columns using spark by default when build engine is spark
[ https://issues.apache.org/jira/browse/KYLIN-4321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] weibin0516 updated KYLIN-4321: -- Attachment: screenshot-2.png > Create fact distinct columns using spark by default when build engine is spark > -- > > Key: KYLIN-4321 > URL: https://issues.apache.org/jira/browse/KYLIN-4321 > Project: Kylin > Issue Type: Improvement >Reporter: weibin0516 >Assignee: weibin0516 >Priority: Major > Fix For: v3.1.0 > > Attachments: screenshot-1.png, screenshot-2.png > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (KYLIN-4321) Create fact distinct columns using spark by default when build engine is spark
[ https://issues.apache.org/jira/browse/KYLIN-4321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17010215#comment-17010215 ] weibin0516 commented on KYLIN-4321: --- Past experience and a large amount of test data show that Spark's performance is significantly better than MapReduce. !screenshot-1.png! !screenshot-2.png! Currently, when the cube is built with the Spark engine, the `Create fact distinct columns` step uses MapReduce by default. Here we want to use the Spark engine to perform this step by default, that is, change the `kylin.engine.spark-fact-distinct` default value to true. > Create fact distinct columns using spark by default when build engine is spark > -- > > Key: KYLIN-4321 > URL: https://issues.apache.org/jira/browse/KYLIN-4321 > Project: Kylin > Issue Type: Improvement >Reporter: weibin0516 >Assignee: weibin0516 >Priority: Major > Fix For: v3.1.0 > > Attachments: screenshot-1.png, screenshot-2.png > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (KYLIN-4321) Create fact distinct columns using spark by default when build engine is spark
[ https://issues.apache.org/jira/browse/KYLIN-4321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] weibin0516 updated KYLIN-4321: -- Attachment: screenshot-1.png > Create fact distinct columns using spark by default when build engine is spark > -- > > Key: KYLIN-4321 > URL: https://issues.apache.org/jira/browse/KYLIN-4321 > Project: Kylin > Issue Type: Improvement >Reporter: weibin0516 >Assignee: weibin0516 >Priority: Major > Fix For: v3.1.0 > > Attachments: screenshot-1.png, screenshot-2.png > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (KYLIN-4324) User query returns Unknown error
[ https://issues.apache.org/jira/browse/KYLIN-4324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17006567#comment-17006567 ] weibin0516 commented on KYLIN-4324: --- Hi, [~bai], Can you describe how this error occurred? > User query returns Unknown error > > > Key: KYLIN-4324 > URL: https://issues.apache.org/jira/browse/KYLIN-4324 > Project: Kylin > Issue Type: Bug >Reporter: 白云松 >Priority: Major > Attachments: 1577935826(1).png > > > !1577935826(1).png! -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (KYLIN-4104) Support multi jdbc pushdown runners to execute query/update
[ https://issues.apache.org/jira/browse/KYLIN-4104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17005896#comment-17005896 ] weibin0516 commented on KYLIN-4104: --- Hi, [~shaofengshi], please see the previous discussion: http://apache-kylin.74782.x6.nabble.com/DISCUSS-Support-multiple-pushdown-query-engines-td13454.html > Support multi jdbc pushdown runners to execute query/update > --- > > Key: KYLIN-4104 > URL: https://issues.apache.org/jira/browse/KYLIN-4104 > Project: Kylin > Issue Type: New Feature >Reporter: weibin0516 >Assignee: weibin0516 >Priority: Major > > Currently (version 3.0.0-SNAPSHOT), Kylin supports only one kind of pushdown > query engine. In some users' scenarios, queries need to be pushed down to MySQL, Spark > SQL, Hive, etc. > I think Kylin needs to support multiple pushdowns. -- This message was sent by Atlassian Jira (v8.3.4#803005)
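The "multiple pushdown runners" idea from KYLIN-4104 can be sketched as a small dispatch loop that tries each configured engine in order and falls back on failure. The class and function names below are my own illustration, not Kylin's actual implementation:

```python
# Hedged sketch: try each configured pushdown runner (MySQL, Spark SQL,
# Hive, ...) in order and return the first successful result. Runner and
# registry names are made up for illustration.

class PushdownRunner:
    def __init__(self, name, handler):
        self.name = name
        self.handler = handler  # callable: sql -> rows; may raise on failure

    def run(self, sql):
        return self.handler(sql)

def push_down(sql, runners):
    """Try each runner in order; return (engine_name, rows) of the first success."""
    errors = []
    for runner in runners:
        try:
            return runner.name, runner.run(sql)
        except Exception as exc:  # record the failure, fall through to the next engine
            errors.append((runner.name, str(exc)))
    raise RuntimeError("all pushdown runners failed: %s" % errors)

def _broken(sql):
    raise IOError("connection refused")  # simulated MySQL outage

# A MySQL runner that fails, then a Spark SQL runner that answers.
runners = [
    PushdownRunner("mysql", _broken),
    PushdownRunner("spark-sql", lambda sql: [("row1",)]),
]
engine, rows = push_down("SELECT 1", runners)
print(engine, rows)  # spark-sql [('row1',)]
```

The ordering of the runner list would be the user's configuration choice; a real implementation would also have to decide whether an engine error should fall through or be surfaced immediately.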
[jira] [Updated] (KYLIN-4321) Create fact distinct columns using spark by default when build engine is spark
[ https://issues.apache.org/jira/browse/KYLIN-4321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] weibin0516 updated KYLIN-4321: -- Summary: Create fact distinct columns using spark by default when build engine is spark (was: Create fact distinct columns by spark when build engine is spark) > Create fact distinct columns using spark by default when build engine is spark > -- > > Key: KYLIN-4321 > URL: https://issues.apache.org/jira/browse/KYLIN-4321 > Project: Kylin > Issue Type: Improvement >Reporter: weibin0516 >Assignee: weibin0516 >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (KYLIN-4321) Create fact distinct columns by spark when build engine is spark
weibin0516 created KYLIN-4321: - Summary: Create fact distinct columns by spark when build engine is spark Key: KYLIN-4321 URL: https://issues.apache.org/jira/browse/KYLIN-4321 Project: Kylin Issue Type: Improvement Reporter: weibin0516 Assignee: weibin0516 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (KYLIN-4317) Update doc for KYLIN-4104
weibin0516 created KYLIN-4317: - Summary: Update doc for KYLIN-4104 Key: KYLIN-4317 URL: https://issues.apache.org/jira/browse/KYLIN-4317 Project: Kylin Issue Type: Improvement Reporter: weibin0516 Assignee: weibin0516 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (KYLIN-4309) One user's mailbox is not suffixed and other messages cannot be sent
[ https://issues.apache.org/jira/browse/KYLIN-4309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] weibin0516 reassigned KYLIN-4309: - Assignee: weibin0516 > One user's mailbox is not suffixed and other messages cannot be sent > > > Key: KYLIN-4309 > URL: https://issues.apache.org/jira/browse/KYLIN-4309 > Project: Kylin > Issue Type: Bug >Reporter: 白云松 >Assignee: weibin0516 >Priority: Major > Attachments: 1576811897(1).png > > > !1576811897(1).png! > One user's mailbox is not suffixed and other messages cannot be sent > {code} > org.apache.commons.mail.EmailException: javax.mail.internet.AddressException: > Missing final '@domain' in string ``baiyunsong'' > at org.apache.commons.mail.Email.createInternetAddress(Email.java:1974) > at org.apache.commons.mail.Email.addTo(Email.java:846) > at org.apache.commons.mail.Email.addTo(Email.java:829) > at org.apache.commons.mail.Email.addTo(Email.java:778) > at > org.apache.kylin.common.util.MailService.sendMailInternal(MailService.java:136) > at > org.apache.kylin.common.util.MailService.sendMail(MailService.java:107) > at > org.apache.kylin.common.util.MailService.sendMail(MailService.java:76) > at > org.apache.kylin.job.execution.AbstractExecutable.doSendMail(AbstractExecutable.java:381) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
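The direction proposed in the comments on this issue — validate addresses up front, or skip the invalid mailbox and log it so one bad address does not abort the whole notification — can be sketched as follows. The regex and function name are my own illustration, not Kylin's MailService API:

```python
import logging
import re

# Deliberately simple stand-in for real address validation: require a
# local part, an '@', and a dotted domain, which is exactly what the
# address "baiyunsong" in the stack trace above is missing.
ADDR_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

def filter_recipients(addresses):
    """Keep valid addresses; log and drop the rest so sending continues."""
    valid = []
    for addr in addresses:
        if ADDR_RE.match(addr):
            valid.append(addr)
        else:
            logging.error("skipping invalid mail address: %s", addr)
    return valid

print(filter_recipients(["user@example.com", "baiyunsong"]))
# ['user@example.com']
```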
[jira] [Comment Edited] (KYLIN-4309) One user's mailbox is not suffixed and other messages cannot be sent
[ https://issues.apache.org/jira/browse/KYLIN-4309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17000596#comment-17000596 ] weibin0516 edited comment on KYLIN-4309 at 12/20/19 3:49 AM: - Hi, [~bai], this is indeed a bug, we should verify when creating the model / cube, or skip the wrong mailbox when sending (record the error log). I prefer the latter, if you don't mind, I can fix this bug. was (Author: codingforfun): Hi, [~bai], this is indeed a bug, we should verify when creating the model / cube, or skip the wrong mailbox when sending (record the error log) > One user's mailbox is not suffixed and other messages cannot be sent > > > Key: KYLIN-4309 > URL: https://issues.apache.org/jira/browse/KYLIN-4309 > Project: Kylin > Issue Type: Bug >Reporter: 白云松 >Priority: Major > Attachments: 1576811897(1).png > > > !1576811897(1).png! > One user's mailbox is not suffixed and other messages cannot be sent > {code} > org.apache.commons.mail.EmailException: javax.mail.internet.AddressException: > Missing final '@domain' in string ``baiyunsong'' > at org.apache.commons.mail.Email.createInternetAddress(Email.java:1974) > at org.apache.commons.mail.Email.addTo(Email.java:846) > at org.apache.commons.mail.Email.addTo(Email.java:829) > at org.apache.commons.mail.Email.addTo(Email.java:778) > at > org.apache.kylin.common.util.MailService.sendMailInternal(MailService.java:136) > at > org.apache.kylin.common.util.MailService.sendMail(MailService.java:107) > at > org.apache.kylin.common.util.MailService.sendMail(MailService.java:76) > at > org.apache.kylin.job.execution.AbstractExecutable.doSendMail(AbstractExecutable.java:381) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (KYLIN-4309) One user's mailbox is not suffixed and other messages cannot be sent
[ https://issues.apache.org/jira/browse/KYLIN-4309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17000596#comment-17000596 ] weibin0516 commented on KYLIN-4309: --- Hi, [~bai], this is indeed a bug, we should verify when creating the model / cube, or skip the wrong mailbox when sending (record the error log) > One user's mailbox is not suffixed and other messages cannot be sent > > > Key: KYLIN-4309 > URL: https://issues.apache.org/jira/browse/KYLIN-4309 > Project: Kylin > Issue Type: Bug >Reporter: 白云松 >Priority: Major > Attachments: 1576811897(1).png > > > !1576811897(1).png! > One user's mailbox is not suffixed and other messages cannot be sent > {code} > org.apache.commons.mail.EmailException: javax.mail.internet.AddressException: > Missing final '@domain' in string ``baiyunsong'' > at org.apache.commons.mail.Email.createInternetAddress(Email.java:1974) > at org.apache.commons.mail.Email.addTo(Email.java:846) > at org.apache.commons.mail.Email.addTo(Email.java:829) > at org.apache.commons.mail.Email.addTo(Email.java:778) > at > org.apache.kylin.common.util.MailService.sendMailInternal(MailService.java:136) > at > org.apache.kylin.common.util.MailService.sendMail(MailService.java:107) > at > org.apache.kylin.common.util.MailService.sendMail(MailService.java:76) > at > org.apache.kylin.job.execution.AbstractExecutable.doSendMail(AbstractExecutable.java:381) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (KYLIN-4308) Make kylin.sh tips clearer and more explicit
weibin0516 created KYLIN-4308: - Summary: Make kylin.sh tips clearer and more explicit Key: KYLIN-4308 URL: https://issues.apache.org/jira/browse/KYLIN-4308 Project: Kylin Issue Type: Improvement Reporter: weibin0516 Assignee: weibin0516 When the streaming receiver process is already running and another one is started, the error message is not clear enough {code:java} [root@ca301c60c3d6 bin]# ./kylin.sh streaming start Kylin is running, stop it first {code} When stopping the streaming receiver process, `Stopping Kylin...` should not be displayed; it should state explicitly that the streaming receiver was stopped {code:java} [root@ca301c60c3d6 bin]# ./kylin.sh streaming stop stopping streaming:20404 Stopping Kylin: 20404 Kylin with pid 20404 has been stopped. {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (KYLIN-4303) Fix the bug that HBaseAdmin is not closed properly
weibin0516 created KYLIN-4303: - Summary: Fix the bug that HBaseAdmin is not closed properly Key: KYLIN-4303 URL: https://issues.apache.org/jira/browse/KYLIN-4303 Project: Kylin Issue Type: Bug Reporter: weibin0516 Assignee: weibin0516 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (KYLIN-4302) Fix the bug that InputStream is not closed properly
weibin0516 created KYLIN-4302: - Summary: Fix the bug that InputStream is not closed properly Key: KYLIN-4302 URL: https://issues.apache.org/jira/browse/KYLIN-4302 Project: Kylin Issue Type: Bug Reporter: weibin0516 Assignee: weibin0516 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Closed] (KYLIN-4078) Fix DefaultSchedulerTest.testMetaStoreRecover unit test fail
[ https://issues.apache.org/jira/browse/KYLIN-4078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] weibin0516 closed KYLIN-4078. - Resolution: Invalid > Fix DefaultSchedulerTest.testMetaStoreRecover unit test fail > > > Key: KYLIN-4078 > URL: https://issues.apache.org/jira/browse/KYLIN-4078 > Project: Kylin > Issue Type: Test >Affects Versions: v3.0.0 >Reporter: weibin0516 >Assignee: weibin0516 >Priority: Major > Attachments: error.png > > > When run `mvn clean test` got error as follow: > {code:java} > [INFO] > [ERROR] Errors: > [ERROR] > DefaultSchedulerTest.testMetaStoreRecover:189->BaseSchedulerTest.waitForJobFinish:107 > » Runtime > [INFO] > [ERROR] Tests run: 28, Failures: 0, Errors: 1, Skipped: 2 > [INFO] > [INFO] > > [INFO] Reactor Summary: > [INFO] > [INFO] Apache Kylin 3.0.0-SNAPSHOT SUCCESS [ 4.856 > s] > [INFO] Apache Kylin - Core Common . SUCCESS [ 32.858 > s] > [INFO] Apache Kylin - Core Metadata ... SUCCESS [ 59.055 > s] > [INFO] Apache Kylin - Core Dictionary . SUCCESS [03:55 > min] > [INFO] Apache Kylin - Core Cube ... SUCCESS [02:34 > min] > [INFO] Apache Kylin - Core Metrics SUCCESS [ 2.071 > s] > [INFO] Apache Kylin - Core Job FAILURE [02:33 > min] > [INFO] Apache Kylin - Core Storage SKIPPED > [INFO] Apache Kylin - Stream Core . SKIPPED > [INFO] Apache Kylin - MapReduce Engine SKIPPED > [INFO] Apache Kylin - Spark Engine SKIPPED > [INFO] Apache Kylin - Hive Source . SKIPPED > [INFO] Apache Kylin - DataSource SDK .. SKIPPED > [INFO] Apache Kylin - Jdbc Source . SKIPPED > [INFO] Apache Kylin - Kafka Source SKIPPED > [INFO] Apache Kylin - Cache ... SKIPPED > [INFO] Apache Kylin - HBase Storage ... SKIPPED > [INFO] Apache Kylin - Query ... SKIPPED > [INFO] Apache Kylin - Metrics Reporter Hive ... SKIPPED > [INFO] Apache Kylin - Metrics Reporter Kafka .. SKIPPED > [INFO] Apache Kylin - Stream Source Kafka . SKIPPED > [INFO] Apache Kylin - Stream Coordinator .. SKIPPED > [INFO] Apache Kylin - Stream Receiver . 
SKIPPED > [INFO] Apache Kylin - Stream Storage .. SKIPPED > [INFO] Apache Kylin - REST Server Base SKIPPED > [INFO] Apache Kylin - REST Server . SKIPPED > [INFO] Apache Kylin - JDBC Driver . SKIPPED > [INFO] Apache Kylin - Assembly SKIPPED > [INFO] Apache Kylin - Tool SKIPPED > [INFO] Apache Kylin - Tool Assembly ... SKIPPED > [INFO] Apache Kylin - Integration Test SKIPPED > [INFO] Apache Kylin - Tomcat Extension 3.0.0-SNAPSHOT . SKIPPED > [INFO] > > [INFO] BUILD FAILURE > [INFO] > > [INFO] Total time: 10:42 min > [INFO] Finished at: 2019-07-12T08:59:26+08:00 > [INFO] > > [ERROR] Failed to execute goal > org.apache.maven.plugins:maven-surefire-plugin:2.21.0:test (default-test) on > project kylin-core-job: There are test failures. > [ERROR] > [ERROR] Please refer to > /Users/zhuweibin/ant_code/OpenSource/kylin/core-job/../target/surefire-reports > for the individual test results. > [ERROR] Please refer to dump files (if any exist) [date]-jvmRun[N].dump, > [date].dumpstream and [date]-jvmRun[N].dumpstream. > [ERROR] -> [Help 1] > [ERROR] > [ERROR] To see the full stack trace of the errors, re-run Maven with the -e > switch. > [ERROR] Re-run Maven using the -X switch to enable full debug logging. > [ERROR] > [ERROR] For more information about the errors and possible solutions, please > read the following articles: > [ERROR] [Help 1] > http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException > [ERROR] > [ERROR] After correcting the problems, you can resume the build with the > command > [ERROR] mvn -rf :kylin-core-job > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (KYLIN-4296) When an exception occurs, record detailed stack information to the log
weibin0516 created KYLIN-4296: - Summary: When an exception occurs, record detailed stack information to the log Key: KYLIN-4296 URL: https://issues.apache.org/jira/browse/KYLIN-4296 Project: Kylin Issue Type: Improvement Reporter: weibin0516 Assignee: weibin0516 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (KYLIN-4272) problems of docker/build_image.sh
[ https://issues.apache.org/jira/browse/KYLIN-4272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] weibin0516 reassigned KYLIN-4272: - Assignee: (was: weibin0516) > problems of docker/build_image.sh > - > > Key: KYLIN-4272 > URL: https://issues.apache.org/jira/browse/KYLIN-4272 > Project: Kylin > Issue Type: Bug >Reporter: ZhouKang >Priority: Major > > this script can only be executed from the sub-directory "docker"; if you try to execute > it as follows, you get an error: > {code:java} > // code placeholder > bash docker/build_image.sh{code} > Also, the source code's directory name must be kylin; you cannot use any other name. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Issue Comment Deleted] (KYLIN-4272) problems of docker/build_image.sh
[ https://issues.apache.org/jira/browse/KYLIN-4272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] weibin0516 updated KYLIN-4272: -- Comment: was deleted (was: Hi, [~zhoukangcn], thanks for the feedback, I will fix this bug.) > problems of docker/build_image.sh > - > > Key: KYLIN-4272 > URL: https://issues.apache.org/jira/browse/KYLIN-4272 > Project: Kylin > Issue Type: Bug >Reporter: ZhouKang >Assignee: weibin0516 >Priority: Major > > this script can only be executed from the sub-directory "docker"; if you try to execute > it as follows, you get an error: > {code:java} > // code placeholder > bash docker/build_image.sh{code} > Also, the source code's directory name must be kylin; you cannot use any other name. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (KYLIN-4272) problems of docker/build_image.sh
[ https://issues.apache.org/jira/browse/KYLIN-4272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16983275#comment-16983275 ] weibin0516 commented on KYLIN-4272: --- Hi, [~zhoukangcn], thanks for the feedback, I will fix this bug. > problems of docker/build_image.sh > - > > Key: KYLIN-4272 > URL: https://issues.apache.org/jira/browse/KYLIN-4272 > Project: Kylin > Issue Type: Bug >Reporter: ZhouKang >Assignee: weibin0516 >Priority: Major > > this script can only be executed from the sub-directory "docker"; if you try to execute > it as follows, you get an error: > {code:java} > // code placeholder > bash docker/build_image.sh{code} > Also, the source code's directory name must be kylin; you cannot use any other name. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (KYLIN-4272) problems of docker/build_image.sh
[ https://issues.apache.org/jira/browse/KYLIN-4272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] weibin0516 reassigned KYLIN-4272: - Assignee: weibin0516 > problems of docker/build_image.sh > - > > Key: KYLIN-4272 > URL: https://issues.apache.org/jira/browse/KYLIN-4272 > Project: Kylin > Issue Type: Bug >Reporter: ZhouKang >Assignee: weibin0516 >Priority: Major > > this script can only be executed from the sub-directory "docker"; if you try to execute > it as follows, you get an error: > {code:java} > // code placeholder > bash docker/build_image.sh{code} > Also, the source code's directory name must be kylin; you cannot use any other name. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (KYLIN-4008) Real-time Streaming submit streaming job failed for spark engine
[ https://issues.apache.org/jira/browse/KYLIN-4008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] weibin0516 reassigned KYLIN-4008: - Assignee: weibin0516 > Real-time Streaming submit streaming job failed for spark engine > > > Key: KYLIN-4008 > URL: https://issues.apache.org/jira/browse/KYLIN-4008 > Project: Kylin > Issue Type: New Feature > Components: Real-time Streaming >Affects Versions: v3.0.0-alpha >Reporter: zengrui >Assignee: weibin0516 >Priority: Minor > Attachments: error.bmp > > > Create a Realtime Streaming Cube whose Cube Engine is Spark; when the > coordinator node receives a remoteStoreCompelete request and there are some > segments that can be built, the streaming job fails to submit. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (KYLIN-4255) Display detailed error message when using livy build error
[ https://issues.apache.org/jira/browse/KYLIN-4255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] weibin0516 updated KYLIN-4255: -- Description: Currently, when using livy build error, the error message does not show the detailed error reason, which is not conducive to troubleshooting. The following is an example of submit build job error: Current error message: {code:java} java.lang.RuntimeException: livy execute failed. livy get status failed. state is dead at org.apache.kylin.common.livy.LivyRestExecutor.execute(LivyRestExecutor.java:76) at org.apache.kylin.source.hive.MRHiveDictUtil.runLivySqlJob(MRHiveDictUtil.java:144) at org.apache.kylin.source.hive.CreateFlatHiveTableByLivyStep.createFlatHiveTable(CreateFlatHiveTableByLivyStep.java:44) at org.apache.kylin.source.hive.CreateFlatHiveTableByLivyStep.doWork(CreateFlatHiveTableByLivyStep.java:51) at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:179) at org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:71) at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:179) at org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:114) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) {code} Actual reason for the error: {code:java} 2019-11-13 07:40:02 WARN NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable Exception in thread "main" java.io.FileNotFoundException: File hdfs://localhost:9000/kylin/livy/hbase-client-$HBASE_VERSION.jar does not exist. 
at org.apache.hadoop.hdfs.DistributedFileSystem.listStatusInternal(DistributedFileSystem.java:697) at org.apache.hadoop.hdfs.DistributedFileSystem.access$600(DistributedFileSystem.java:105) at org.apache.hadoop.hdfs.DistributedFileSystem$15.doCall(DistributedFileSystem.java:755) at org.apache.hadoop.hdfs.DistributedFileSystem$15.doCall(DistributedFileSystem.java:751) at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) at org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:751) at org.apache.spark.util.Utils$.fetchHcfsFile(Utils.scala:727) at org.apache.spark.util.Utils$.doFetchFile(Utils.scala:695) at org.apache.spark.deploy.DependencyUtils$.downloadFile(DependencyUtils.scala:135) at org.apache.spark.deploy.DependencyUtils$$anonfun$downloadFileList$2.apply(DependencyUtils.scala:102) at org.apache.spark.deploy.DependencyUtils$$anonfun$downloadFileList$2.apply(DependencyUtils.scala:102) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) {code} > Display detailed error message when using livy build error > -- > > Key: KYLIN-4255 > URL: https://issues.apache.org/jira/browse/KYLIN-4255 > Project: Kylin > Issue Type: Improvement > Components: Spark Engine >Reporter: weibin0516 >Assignee: weibin0516 >Priority: Major > > Currently, when using livy build error, the error message does not show the > detailed error reason, which is not conducive to troubleshooting. The > following is an example of submit build job error: > Current error message: > {code:java} > java.lang.RuntimeException: livy execute failed. > livy get status failed. 
state is dead > at > org.apache.kylin.common.livy.LivyRestExecutor.execute(LivyRestExecutor.java:76) > at > org.apache.kylin.source.hive.MRHiveDictUtil.runLivySqlJob(MRHiveDictUtil.java:144) > at > org.apache.kylin.source.hive.CreateFlatHiveTableByLivyStep.createFlatHiveTable(CreateFlatHiveTableByLivyStep.java:44) > at > org.apache.kylin.source.hive.CreateFlatHiveTableByLivyStep.doWork(CreateFlatHiveTableByLivyStep.java:51) > at > org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:179) > at > org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:71) > at > org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:179) > at > org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:114) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Wor
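A concrete way to realize this improvement is to fetch the batch log from Livy when the final state is "dead", so the surfaced error contains the real cause (here, the FileNotFoundException) rather than only "state is dead". Livy's REST API does provide GET /batches/{id} and GET /batches/{id}/log; the endpoint URL, tail length, and function shape below are a hedged sketch, not Kylin's LivyRestExecutor:

```python
import json
from urllib.request import urlopen

def batch_failure_detail(base_url, batch_id, fetch=None):
    """Return (state, log_tail) for a Livy batch; log_tail is non-empty
    only when the batch died, so callers can append it to the error."""
    if fetch is None:
        # default: real HTTP against the Livy server
        fetch = lambda url: json.load(urlopen(url))
    state = fetch("%s/batches/%s" % (base_url, batch_id))["state"]
    if state != "dead":
        return state, []
    log = fetch("%s/batches/%s/log" % (base_url, batch_id)).get("log", [])
    return state, log[-20:]  # last lines of the driver log

# Demo with a canned fetch instead of a live Livy server:
canned = {
    "http://livy:8998/batches/7": {"state": "dead"},
    "http://livy:8998/batches/7/log":
        {"log": ["java.io.FileNotFoundException: file does not exist"]},
}
state, tail = batch_failure_detail("http://livy:8998", 7, fetch=canned.get)
print(state, tail[0])  # dead java.io.FileNotFoundException: file does not exist
```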
[jira] [Updated] (KYLIN-4255) Display detailed error message when using livy build error
[ https://issues.apache.org/jira/browse/KYLIN-4255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] weibin0516 updated KYLIN-4255: -- Attachment: (was: screenshot-1.png) > Display detailed error message when using livy build error > -- > > Key: KYLIN-4255 > URL: https://issues.apache.org/jira/browse/KYLIN-4255 > Project: Kylin > Issue Type: Improvement > Components: Spark Engine >Reporter: weibin0516 >Assignee: weibin0516 >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (KYLIN-4255) Display detailed error message when using livy build error
[ https://issues.apache.org/jira/browse/KYLIN-4255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] weibin0516 updated KYLIN-4255: -- Attachment: screenshot-1.png > Display detailed error message when using livy build error > -- > > Key: KYLIN-4255 > URL: https://issues.apache.org/jira/browse/KYLIN-4255 > Project: Kylin > Issue Type: Improvement > Components: Spark Engine >Reporter: weibin0516 >Assignee: weibin0516 >Priority: Major > Attachments: screenshot-1.png > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (KYLIN-4255) Display detailed error message when using livy build error
[ https://issues.apache.org/jira/browse/KYLIN-4255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] weibin0516 updated KYLIN-4255: -- Description: (was: Currently, when using livy build error, the error message does not show the detailed error reason, which is not conducive to troubleshooting. Here are two examples: ### submit build job failed ### ) > Display detailed error message when using livy build error > -- > > Key: KYLIN-4255 > URL: https://issues.apache.org/jira/browse/KYLIN-4255 > Project: Kylin > Issue Type: Improvement > Components: Spark Engine >Reporter: weibin0516 >Assignee: weibin0516 >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (KYLIN-4255) Display detailed error message when using livy build error
[ https://issues.apache.org/jira/browse/KYLIN-4255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] weibin0516 updated KYLIN-4255: -- Description: Currently, when using livy build error, the error message does not show the detailed error reason, which is not conducive to troubleshooting. Here are two examples: ### submit build job failed ### > Display detailed error message when using livy build error > -- > > Key: KYLIN-4255 > URL: https://issues.apache.org/jira/browse/KYLIN-4255 > Project: Kylin > Issue Type: Improvement > Components: Spark Engine >Reporter: weibin0516 >Assignee: weibin0516 >Priority: Major > > Currently, when using livy build error, the error message does not show the > detailed error reason, which is not conducive to troubleshooting. Here are > two examples: > ### submit build job failed > ### -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (KYLIN-4255) Display detailed error message when using livy build error
weibin0516 created KYLIN-4255: - Summary: Display detailed error message when using livy build error Key: KYLIN-4255 URL: https://issues.apache.org/jira/browse/KYLIN-4255 Project: Kylin Issue Type: Improvement Components: Spark Engine Reporter: weibin0516 Assignee: weibin0516 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (KYLIN-4251) Add livy to docker
weibin0516 created KYLIN-4251: - Summary: Add livy to docker Key: KYLIN-4251 URL: https://issues.apache.org/jira/browse/KYLIN-4251 Project: Kylin Issue Type: Improvement Components: Environment Reporter: weibin0516 Assignee: weibin0516 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (KYLIN-4224) Create flat table with spark sql
[ https://issues.apache.org/jira/browse/KYLIN-4224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16972006#comment-16972006 ] weibin0516 commented on KYLIN-4224: --- Hi [~shaofengshi], thanks for the reminder; this feature supports reading the Spark SQL datasource to build a flat table, which is different from what you mentioned. > Create flat table with spark sql > > > Key: KYLIN-4224 > URL: https://issues.apache.org/jira/browse/KYLIN-4224 > Project: Kylin > Issue Type: Sub-task >Reporter: weibin0516 >Assignee: weibin0516 >Priority: Major > > The Spark SQL datasource jira is https://issues.apache.org/jira/browse/KYLIN-741. > Currently Hive is used to create the flat table, but Hive can't read Spark datasource > data. We need to support creating the flat table with Spark SQL, because > it can read both Hive and Spark datasource data when creating the flat > table. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (KYLIN-4224) Create flat table with spark sql
[ https://issues.apache.org/jira/browse/KYLIN-4224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] weibin0516 reassigned KYLIN-4224: - Assignee: weibin0516 (was: hailin.huang) > Create flat table with spark sql > > > Key: KYLIN-4224 > URL: https://issues.apache.org/jira/browse/KYLIN-4224 > Project: Kylin > Issue Type: Sub-task >Reporter: weibin0516 >Assignee: weibin0516 >Priority: Major > > The Spark SQL datasource jira is https://issues.apache.org/jira/browse/KYLIN-741. > Currently Hive is used to create the flat table, but Hive can't read Spark datasource > data. We need to support creating the flat table with Spark SQL, because > it can read both Hive and Spark datasource data when creating the flat > table. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (KYLIN-4224) Create flat table with spark sql
[ https://issues.apache.org/jira/browse/KYLIN-4224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] weibin0516 updated KYLIN-4224: -- Description: The Spark SQL datasource jira is https://issues.apache.org/jira/browse/KYLIN-741. Currently Hive is used to create the flat table, but Hive can't read Spark datasource data. We need to support creating the flat table with Spark SQL, because it can read both Hive and Spark datasource data when creating the flat table. > Create flat table with spark sql > > > Key: KYLIN-4224 > URL: https://issues.apache.org/jira/browse/KYLIN-4224 > Project: Kylin > Issue Type: Sub-task >Reporter: weibin0516 >Assignee: hailin.huang >Priority: Major > > The Spark SQL datasource jira is https://issues.apache.org/jira/browse/KYLIN-741. > Currently Hive is used to create the flat table, but Hive can't read Spark datasource > data. We need to support creating the flat table with Spark SQL, because > it can read both Hive and Spark datasource data when creating the flat > table. -- This message was sent by Atlassian Jira (v8.3.4#803005)
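The proposal above boils down to generating one CREATE TABLE ... AS SELECT statement that joins the fact table with its lookups, and submitting it through a SparkSession, which can resolve Hive tables and Spark datasource tables in the same query. A minimal sketch of the SQL generation follows; the table and column names are made up, and in practice the statement would be passed to `SparkSession.sql(...)`:

```python
# Hedged sketch: build the flat-table SQL from a fact table and its
# lookup tables. Names are illustrative, not Kylin's real generator.

def flat_table_sql(flat_name, fact, lookups):
    """lookups: list of (lookup_table, join_key) pairs joined to the fact table."""
    joins = " ".join(
        "JOIN %s ON %s.%s = %s.%s" % (t, fact, k, t, k) for t, k in lookups
    )
    return "CREATE TABLE %s AS SELECT * FROM %s %s" % (flat_name, fact, joins)

sql = flat_table_sql("kylin_flat_demo", "hive_fact", [("spark_ds_dim", "cat_id")])
print(sql)
# CREATE TABLE kylin_flat_demo AS SELECT * FROM hive_fact JOIN spark_ds_dim ON hive_fact.cat_id = spark_ds_dim.cat_id
```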
[jira] [Updated] (KYLIN-4104) Support multi jdbc pushdown runners to execute query/update
[ https://issues.apache.org/jira/browse/KYLIN-4104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] weibin0516 updated KYLIN-4104: -- Summary: Support multi jdbc pushdown runners to execute query/update (was: Support multi kinds of pushdown query engines) > Support multi jdbc pushdown runners to execute query/update > --- > > Key: KYLIN-4104 > URL: https://issues.apache.org/jira/browse/KYLIN-4104 > Project: Kylin > Issue Type: New Feature >Reporter: weibin0516 >Assignee: weibin0516 >Priority: Major > > Currently (version 3.0.0-SNAPSHOT), Kylin supports only one kind of pushdown > query engine. In some users' scenarios, queries need to be pushed down to MySQL, Spark > SQL, Hive, etc. > I think Kylin needs to support multiple pushdowns. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (KYLIN-4217) Calcite rel to Spark plan
[ https://issues.apache.org/jira/browse/KYLIN-4217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] weibin0516 reassigned KYLIN-4217: - Assignee: (was: weibin0516) > Calcite rel to Spark plan > - > > Key: KYLIN-4217 > URL: https://issues.apache.org/jira/browse/KYLIN-4217 > Project: Kylin > Issue Type: Sub-task > Components: Query Engine >Reporter: yiming.xu >Priority: Major > > Transform calcite rel to spark plan to implement distributed computing. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (KYLIN-4224) Create flat table with spark sql
[ https://issues.apache.org/jira/browse/KYLIN-4224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16969819#comment-16969819 ] weibin0516 commented on KYLIN-4224: --- Hi, [~aahi], the Spark SQL datasource jira is https://issues.apache.org/jira/browse/KYLIN-741. I have basically completed the development and will push a PR to GitHub in the next two days; reviews are welcome. > Create flat table with spark sql > > > Key: KYLIN-4224 > URL: https://issues.apache.org/jira/browse/KYLIN-4224 > Project: Kylin > Issue Type: Sub-task >Reporter: weibin0516 >Assignee: hailin.huang >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (KYLIN-4224) Create flat table with spark sql
weibin0516 created KYLIN-4224: - Summary: Create flat table with spark sql Key: KYLIN-4224 URL: https://issues.apache.org/jira/browse/KYLIN-4224 Project: Kylin Issue Type: Sub-task Reporter: weibin0516 Assignee: weibin0516 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (KYLIN-4213) The new build engine with Spark-SQL
[ https://issues.apache.org/jira/browse/KYLIN-4213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16962071#comment-16962071 ] weibin0516 commented on KYLIN-4213: --- Great feature, this will bring significant performance improvements. > The new build engine with Spark-SQL > --- > > Key: KYLIN-4213 > URL: https://issues.apache.org/jira/browse/KYLIN-4213 > Project: Kylin > Issue Type: New Feature > Components: Job Engine >Affects Versions: Future >Reporter: yiming.xu >Assignee: yiming.xu >Priority: Major > > 1. Use Spark-SQL to compute cuboids; building cuboid A, B, C, Sum(D) is the sql > "select A, B, C, Sum(D) from table group by A, B, C". > 2. To avoid many memory errors or other exceptions, we can auto-set the spark conf > for the build job, e.g. use adaptive execution. > 3. The snapshot table will be saved in parquet format. -- This message was sent by Atlassian Jira (v8.3.4#803005)
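The cuboid-as-SQL idea in point 1 above can be sketched as a tiny query generator; `cuboid_sql` and `flat_table` are illustrative names for this sketch, not Kylin's actual code:

```python
def cuboid_sql(table, dims, measure):
    # One cuboid = one GROUP BY aggregation over the flat table:
    # cuboid (A, B, C) with measure SUM(D) is just a plain SQL query.
    cols = ", ".join(dims)
    return f"SELECT {cols}, SUM({measure}) FROM {table} GROUP BY {cols}"

print(cuboid_sql("flat_table", ["A", "B", "C"], "D"))
# SELECT A, B, C, SUM(D) FROM flat_table GROUP BY A, B, C
```

Each cuboid in the cube then maps to one such aggregation query that Spark SQL can execute directly.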
[jira] [Commented] (KYLIN-4217) Calcite rel to Spark plan
[ https://issues.apache.org/jira/browse/KYLIN-4217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16962058#comment-16962058 ] weibin0516 commented on KYLIN-4217: --- I have done similar things in our project and I am interested in doing this feature. > Calcite rel to Spark plan > - > > Key: KYLIN-4217 > URL: https://issues.apache.org/jira/browse/KYLIN-4217 > Project: Kylin > Issue Type: Sub-task > Components: Query Engine >Reporter: yiming.xu >Assignee: weibin0516 >Priority: Major > > Transform calcite rel to spark plan to implement distributed computing. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (KYLIN-4217) Calcite rel to Spark plan
[ https://issues.apache.org/jira/browse/KYLIN-4217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] weibin0516 reassigned KYLIN-4217: - Assignee: weibin0516 > Calcite rel to Spark plan > - > > Key: KYLIN-4217 > URL: https://issues.apache.org/jira/browse/KYLIN-4217 > Project: Kylin > Issue Type: Sub-task > Components: Query Engine >Reporter: yiming.xu >Assignee: weibin0516 >Priority: Major > > Transform calcite rel to spark plan to implement distributed computing. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (KYLIN-741) Read data from SparkSQL
[ https://issues.apache.org/jira/browse/KYLIN-741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16927218#comment-16927218 ] weibin0516 commented on KYLIN-741: -- Sounds useful. Spark SQL itself already supports a wide variety of data sources and is highly extensible, which would help Kylin read various data sources to build cubes. I will try to implement this feature. > Read data from SparkSQL > --- > > Key: KYLIN-741 > URL: https://issues.apache.org/jira/browse/KYLIN-741 > Project: Kylin > Issue Type: New Feature > Components: Job Engine, Spark Engine >Reporter: Luke Han >Assignee: weibin0516 >Priority: Major > Labels: scope > Fix For: Backlog > > > Read data from SparkSQL directly. > There are some instances with a SparkSQL interface enabled for data consumption; it > would be great if Kylin could read data directly from SparkSQL. > This feature does not require the Spark Cube Build Engine to be ready. It could > continue to leverage the existing MR cube build engine, process data on the Hadoop > cluster, then persist the cube to HBase. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Assigned] (KYLIN-741) Read data from SparkSQL
[ https://issues.apache.org/jira/browse/KYLIN-741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] weibin0516 reassigned KYLIN-741: Assignee: weibin0516 (was: Dong Li) > Read data from SparkSQL > --- > > Key: KYLIN-741 > URL: https://issues.apache.org/jira/browse/KYLIN-741 > Project: Kylin > Issue Type: New Feature > Components: Job Engine, Spark Engine >Reporter: Luke Han >Assignee: weibin0516 >Priority: Major > Labels: scope > Fix For: Backlog > > > Read data from SparkSQL directly. > There are some instances with a SparkSQL interface enabled for data consumption; it > would be great if Kylin could read data directly from SparkSQL. > This feature does not require the Spark Cube Build Engine to be ready. It could > continue to leverage the existing MR cube build engine, process data on the Hadoop > cluster, then persist the cube to HBase. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Resolved] (KYLIN-4068) Automatically add limit has bug
[ https://issues.apache.org/jira/browse/KYLIN-4068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] weibin0516 resolved KYLIN-4068. --- Resolution: Fixed > Automatically add limit has bug > --- > > Key: KYLIN-4068 > URL: https://issues.apache.org/jira/browse/KYLIN-4068 > Project: Kylin > Issue Type: Bug > Components: Query Engine >Affects Versions: v2.6.2 >Reporter: weibin0516 >Assignee: weibin0516 >Priority: Major > Fix For: v3.0.0-alpha2 > > > {code:sql} > SELECT E_Name FROM Employees_China > UNION > SELECT E_Name FROM Employees_USA > {code} > will convert to > {code:sql} > SELECT E_Name FROM Employees_China > UNION > SELECT E_Name FROM Employees_USA > LIMIT 5 > {code} > This limit does not apply to the result of the union, but only to SELECT E_Name FROM > Employees_USA. > We should use a safer way to apply the limit. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Updated] (KYLIN-4068) Automatically add limit has bug
[ https://issues.apache.org/jira/browse/KYLIN-4068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] weibin0516 updated KYLIN-4068: -- Fix Version/s: v3.0.0-alpha2 Description: {code:sql} SELECT E_Name FROM Employees_China UNION SELECT E_Name FROM Employees_USA {code} will convert to {code:sql} SELECT E_Name FROM Employees_China UNION SELECT E_Name FROM Employees_USA LIMIT 5 {code} This limit does not apply to the result of the union, but only to SELECT E_Name FROM Employees_USA. We should use a safer way to apply the limit. was: {code:sql} SELECT E_Name FROM Employees_China UNION SELECT E_Name FROM Employees_USA {code} will convert to {code:sql} SELECT E_Name FROM Employees_China UNION SELECT E_Name FROM Employees_USA LIMIT 5 {code} This limit is not working on the result of union, but on SELECT E_Name FROM Employees_USA. We should use a more secure way to achieve the limit effect. > Automatically add limit has bug > --- > > Key: KYLIN-4068 > URL: https://issues.apache.org/jira/browse/KYLIN-4068 > Project: Kylin > Issue Type: Bug > Components: Query Engine >Affects Versions: v2.6.2 >Reporter: weibin0516 >Assignee: weibin0516 >Priority: Major > Fix For: v3.0.0-alpha2 > > > {code:sql} > SELECT E_Name FROM Employees_China > UNION > SELECT E_Name FROM Employees_USA > {code} > will convert to > {code:sql} > SELECT E_Name FROM Employees_China > UNION > SELECT E_Name FROM Employees_USA > LIMIT 5 > {code} > This limit does not apply to the result of the union, but only to SELECT E_Name FROM > Employees_USA. > We should use a safer way to apply the limit. -- This message was sent by Atlassian Jira (v8.3.2#803003)
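One safer way to add the limit, as the description suggests, is to wrap the whole statement in a derived table before appending LIMIT, so the limit binds to the union result rather than to the last SELECT branch. This is only a sketch of the idea (the `add_limit` name and the alias `t` are illustrative), not Kylin's actual fix:

```python
def add_limit(sql, n):
    # Wrapping the full statement makes LIMIT apply to the union result
    # instead of being parsed as part of the last SELECT branch.
    return f"SELECT * FROM ({sql.strip()}) t LIMIT {n}"

query = "SELECT E_Name FROM Employees_China UNION SELECT E_Name FROM Employees_USA"
print(add_limit(query, 5))
# SELECT * FROM (SELECT E_Name FROM Employees_China UNION SELECT E_Name FROM Employees_USA) t LIMIT 5
```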
[jira] [Created] (KYLIN-4150) Improve docker for kylin instructions
weibin0516 created KYLIN-4150: - Summary: Improve docker for kylin instructions Key: KYLIN-4150 URL: https://issues.apache.org/jira/browse/KYLIN-4150 Project: Kylin Issue Type: Improvement Reporter: weibin0516 Assignee: weibin0516 -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Created] (KYLIN-4146) Add doc for KYLIN-4114
weibin0516 created KYLIN-4146: - Summary: Add doc for KYLIN-4114 Key: KYLIN-4146 URL: https://issues.apache.org/jira/browse/KYLIN-4146 Project: Kylin Issue Type: Improvement Reporter: weibin0516 Assignee: weibin0516 -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Updated] (KYLIN-4129) Remove useless code
[ https://issues.apache.org/jira/browse/KYLIN-4129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] weibin0516 updated KYLIN-4129: -- Summary: Remove useless code (was: Remove never useless code) > Remove useless code > --- > > Key: KYLIN-4129 > URL: https://issues.apache.org/jira/browse/KYLIN-4129 > Project: Kylin > Issue Type: Improvement >Affects Versions: v3.0.0-alpha2 >Reporter: weibin0516 >Assignee: weibin0516 >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Created] (KYLIN-4129) Remove never useless code
weibin0516 created KYLIN-4129: - Summary: Remove never useless code Key: KYLIN-4129 URL: https://issues.apache.org/jira/browse/KYLIN-4129 Project: Kylin Issue Type: Improvement Affects Versions: v3.0.0-alpha2 Reporter: weibin0516 Assignee: weibin0516 -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Updated] (KYLIN-4127) Remove never called classes
[ https://issues.apache.org/jira/browse/KYLIN-4127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] weibin0516 updated KYLIN-4127: -- Description: Using the FindBugs plugin, I found some classes that are never called; we should remove them to make the code cleaner. was: Class {code:java} org.apache.kylin.stream.core.storage.StreamingCubeSegment.SegmentInfo {code} is never used. We should remove it. Summary: Remove never called classes (was: Delete unused class org.apache.kylin.stream.core.storage.StreamingCubeSegment.SegmentInfo) > Remove never called classes > --- > > Key: KYLIN-4127 > URL: https://issues.apache.org/jira/browse/KYLIN-4127 > Project: Kylin > Issue Type: Improvement >Affects Versions: v3.0.0-alpha2 >Reporter: weibin0516 >Assignee: weibin0516 >Priority: Minor > > Using the FindBugs plugin, I found some classes that are never called; we should remove them > to make the code cleaner. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Created] (KYLIN-4128) Remove never called methods
weibin0516 created KYLIN-4128: - Summary: Remove never called methods Key: KYLIN-4128 URL: https://issues.apache.org/jira/browse/KYLIN-4128 Project: Kylin Issue Type: Improvement Affects Versions: v3.0.0-alpha2 Reporter: weibin0516 Assignee: weibin0516 Using the *FindBugs* plugin, I found some methods that are never called; we should remove them to make the code cleaner. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Created] (KYLIN-4127) Delete unused class org.apache.kylin.stream.core.storage.StreamingCubeSegment.SegmentInfo
weibin0516 created KYLIN-4127: - Summary: Delete unused class org.apache.kylin.stream.core.storage.StreamingCubeSegment.SegmentInfo Key: KYLIN-4127 URL: https://issues.apache.org/jira/browse/KYLIN-4127 Project: Kylin Issue Type: Improvement Affects Versions: v3.0.0-alpha2 Reporter: weibin0516 Assignee: weibin0516 Class {code:java} org.apache.kylin.stream.core.storage.StreamingCubeSegment.SegmentInfo {code} is never used. We should remove it. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Updated] (KYLIN-4116) Package fail due to lack of dependency objenesis
[ https://issues.apache.org/jira/browse/KYLIN-4116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] weibin0516 updated KYLIN-4116: -- Description: Execute the command {code:java} build/script/package.sh {code} got error below due to lack of dependency objenesis: {code:java} [ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.5.1:compile (default-compile) on project kylin-core-metadata: Compilation failure: Compilation failure: [ERROR] /home/admin/kylin_sourcecode/core-metadata/src/main/java/org/apache/kylin/util/KryoUtils.java:[24,29] error: package org.objenesis.strategy does not exist [ERROR] /home/admin/kylin_sourcecode/core-metadata/src/main/java/org/apache/kylin/util/KryoUtils.java:[59,82] error: cannot find symbol [ERROR] symbol: class StdInstantiatorStrategy [ERROR] location: class KryoUtils [ERROR] /home/admin/kylin_sourcecode/core-metadata/src/main/java/org/apache/kylin/util/KryoUtils.java:[59,16] error: cannot access InstantiatorStrategy [ERROR] -> [Help 1] {code} was: Execute the command {code:shell} build/script/package.sh {code} got error below due to lack of dependency objenesis: {code:java} [ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.5.1:compile (default-compile) on project kylin-core-metadata: Compilation failure: Compilation failure: [ERROR] /home/admin/kylin_sourcecode/core-metadata/src/main/java/org/apache/kylin/util/KryoUtils.java:[24,29] error: package org.objenesis.strategy does not exist [ERROR] /home/admin/kylin_sourcecode/core-metadata/src/main/java/org/apache/kylin/util/KryoUtils.java:[59,82] error: cannot find symbol [ERROR] symbol: class StdInstantiatorStrategy [ERROR] location: class KryoUtils [ERROR] /home/admin/kylin_sourcecode/core-metadata/src/main/java/org/apache/kylin/util/KryoUtils.java:[59,16] error: cannot access InstantiatorStrategy [ERROR] -> [Help 1] {code} > Package fail due to lack of dependency objenesis > > > Key: KYLIN-4116 > URL: 
https://issues.apache.org/jira/browse/KYLIN-4116 > Project: Kylin > Issue Type: Bug >Affects Versions: v3.0.0 >Reporter: weibin0516 >Assignee: weibin0516 >Priority: Major > > Execute the command > {code:java} > build/script/package.sh > {code} > got error below due to lack of dependency objenesis: > {code:java} > [ERROR] Failed to execute goal > org.apache.maven.plugins:maven-compiler-plugin:3.5.1:compile > (default-compile) on project kylin-core-metadata: Compilation failure: > Compilation failure: > [ERROR] > /home/admin/kylin_sourcecode/core-metadata/src/main/java/org/apache/kylin/util/KryoUtils.java:[24,29] > error: package org.objenesis.strategy does not exist > [ERROR] > /home/admin/kylin_sourcecode/core-metadata/src/main/java/org/apache/kylin/util/KryoUtils.java:[59,82] > error: cannot find symbol > [ERROR] symbol: class StdInstantiatorStrategy > [ERROR] location: class KryoUtils > [ERROR] > /home/admin/kylin_sourcecode/core-metadata/src/main/java/org/apache/kylin/util/KryoUtils.java:[59,16] > error: cannot access InstantiatorStrategy > [ERROR] -> [Help 1] > {code} -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Created] (KYLIN-4116) Package fail due to lack of dependency objenesis
weibin0516 created KYLIN-4116: - Summary: Package fail due to lack of dependency objenesis Key: KYLIN-4116 URL: https://issues.apache.org/jira/browse/KYLIN-4116 Project: Kylin Issue Type: Bug Affects Versions: v3.0.0 Reporter: weibin0516 Assignee: weibin0516 Execute the command {code:shell} build/script/package.sh {code} got error below due to lack of dependency objenesis: {code:java} [ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.5.1:compile (default-compile) on project kylin-core-metadata: Compilation failure: Compilation failure: [ERROR] /home/admin/kylin_sourcecode/core-metadata/src/main/java/org/apache/kylin/util/KryoUtils.java:[24,29] error: package org.objenesis.strategy does not exist [ERROR] /home/admin/kylin_sourcecode/core-metadata/src/main/java/org/apache/kylin/util/KryoUtils.java:[59,82] error: cannot find symbol [ERROR] symbol: class StdInstantiatorStrategy [ERROR] location: class KryoUtils [ERROR] /home/admin/kylin_sourcecode/core-metadata/src/main/java/org/apache/kylin/util/KryoUtils.java:[59,16] error: cannot access InstantiatorStrategy [ERROR] -> [Help 1] {code} -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Commented] (KYLIN-4107) StorageCleanupJob fails to delete Hive tables with "Argument list too long" error
[ https://issues.apache.org/jira/browse/KYLIN-4107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16894630#comment-16894630 ] weibin0516 commented on KYLIN-4107: --- pr: https://github.com/apache/kylin/pull/776 > StorageCleanupJob fails to delete Hive tables with "Argument list too long" > error > - > > Key: KYLIN-4107 > URL: https://issues.apache.org/jira/browse/KYLIN-4107 > Project: Kylin > Issue Type: Bug > Components: Storage - HBase >Affects Versions: v2.6.2 > Environment: CentOS 7.6, HDP 2.6.5, Kylin 2.6.3 >Reporter: Vsevolod Ostapenko >Assignee: weibin0516 >Priority: Major > Fix For: v3.0.0-beta > > > On a system with multiple Kylin developers who experiment with cube design > and often (re)build/drop cube segments, intermediate Hive tables and leftover HBase > tables accumulate very quickly. > After a certain point, storage cleanup cannot be executed using the suggested > method: > {{${KYLIN_HOME}/bin/kylin.sh org.apache.kylin.tool.StorageCleanupJob --delete > true}} > Apparently, the storage cleanup job creates a single shell command to drop all > Hive tables, which fails to execute because the command line is just too long.
> For example: > {quote} > 2019-07-23 17:47:31,611 ERROR [main] job.StorageCleanupJob:377 : Error during > deleting Hive tables > java.io.IOException: Cannot run program "/bin/bash": error=7, Argument list > too long > at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048) > at > org.apache.kylin.common.util.CliCommandExecutor.runNativeCommand(CliCommandExecutor.java:133) > at > org.apache.kylin.common.util.CliCommandExecutor.execute(CliCommandExecutor.java:89) > at > org.apache.kylin.common.util.CliCommandExecutor.execute(CliCommandExecutor.java:83) > at > org.apache.kylin.rest.job.StorageCleanupJob.deleteHiveTables(StorageCleanupJob.java:409) > at > org.apache.kylin.rest.job.StorageCleanupJob.cleanUnusedIntermediateHiveTableInternal(StorageCleanupJob.java:375) > at > org.apache.kylin.rest.job.StorageCleanupJob.cleanUnusedIntermediateHiveTable(StorageCleanupJob.java:278) > at > org.apache.kylin.rest.job.StorageCleanupJob.cleanup(StorageCleanupJob.java:151) > at > org.apache.kylin.rest.job.StorageCleanupJob.execute(StorageCleanupJob.java:145) > at > org.apache.kylin.common.util.AbstractApplication.execute(AbstractApplication.java:37) > at org.apache.kylin.tool.StorageCleanupJob.main(StorageCleanupJob.java:27) > Caused by: java.io.IOException: error=7, Argument list too long > at java.lang.UNIXProcess.forkAndExec(Native Method) > at java.lang.UNIXProcess.(UNIXProcess.java:247) > at java.lang.ProcessImpl.start(ProcessImpl.java:134) > at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029) > ... 10 more > {quote} > Instead of composing one long command, storage cleanup need to generate a > script and feed that into beeline or hive CLI. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
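The suggested fix, generating a script file and feeding it into the hive CLI or beeline, could look roughly like this. A hedged sketch: `write_drop_script` is a hypothetical helper, and the actual `hive -f <script>` invocation is left out.

```python
import tempfile

def write_drop_script(tables):
    # Emit one DROP TABLE statement per line into a temporary .hql file;
    # the cleanup job can then run `hive -f <script>` (or `beeline -f`)
    # instead of passing every statement on one length-limited command line.
    script = tempfile.NamedTemporaryFile("w", suffix=".hql", delete=False)
    for table in tables:
        script.write(f"DROP TABLE IF EXISTS {table};\n")
    script.close()
    return script.name

path = write_drop_script(["kylin_intermediate_a", "kylin_intermediate_b"])
print(path)  # path of the generated .hql script
```

Because the table names live in a file rather than in the argv of a single `/bin/bash -c` call, the kernel's argument-length limit no longer applies regardless of how many tables need dropping.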
[jira] [Assigned] (KYLIN-4107) StorageCleanupJob fails to delete Hive tables with "Argument list too long" error
[ https://issues.apache.org/jira/browse/KYLIN-4107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] weibin0516 reassigned KYLIN-4107: - Assignee: weibin0516 > StorageCleanupJob fails to delete Hive tables with "Argument list too long" > error > - > > Key: KYLIN-4107 > URL: https://issues.apache.org/jira/browse/KYLIN-4107 > Project: Kylin > Issue Type: Bug > Components: Storage - HBase >Affects Versions: v2.6.2 > Environment: CentOS 7.6, HDP 2.6.5, Kylin 2.6.3 >Reporter: Vsevolod Ostapenko >Assignee: weibin0516 >Priority: Major > Fix For: v3.0.0-beta > > > On a system with multiple Kylin developers who experiment with cube design > and often (re)build/drop cube segments, intermediate Hive tables and leftover HBase > tables accumulate very quickly. > After a certain point, storage cleanup cannot be executed using the suggested > method: > {{${KYLIN_HOME}/bin/kylin.sh org.apache.kylin.tool.StorageCleanupJob --delete > true}} > Apparently, the storage cleanup job creates a single shell command to drop all > Hive tables, which fails to execute because the command line is just too long.
> For example: > {quote} > 2019-07-23 17:47:31,611 ERROR [main] job.StorageCleanupJob:377 : Error during > deleting Hive tables > java.io.IOException: Cannot run program "/bin/bash": error=7, Argument list > too long > at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048) > at > org.apache.kylin.common.util.CliCommandExecutor.runNativeCommand(CliCommandExecutor.java:133) > at > org.apache.kylin.common.util.CliCommandExecutor.execute(CliCommandExecutor.java:89) > at > org.apache.kylin.common.util.CliCommandExecutor.execute(CliCommandExecutor.java:83) > at > org.apache.kylin.rest.job.StorageCleanupJob.deleteHiveTables(StorageCleanupJob.java:409) > at > org.apache.kylin.rest.job.StorageCleanupJob.cleanUnusedIntermediateHiveTableInternal(StorageCleanupJob.java:375) > at > org.apache.kylin.rest.job.StorageCleanupJob.cleanUnusedIntermediateHiveTable(StorageCleanupJob.java:278) > at > org.apache.kylin.rest.job.StorageCleanupJob.cleanup(StorageCleanupJob.java:151) > at > org.apache.kylin.rest.job.StorageCleanupJob.execute(StorageCleanupJob.java:145) > at > org.apache.kylin.common.util.AbstractApplication.execute(AbstractApplication.java:37) > at org.apache.kylin.tool.StorageCleanupJob.main(StorageCleanupJob.java:27) > Caused by: java.io.IOException: error=7, Argument list too long > at java.lang.UNIXProcess.forkAndExec(Native Method) > at java.lang.UNIXProcess.(UNIXProcess.java:247) > at java.lang.ProcessImpl.start(ProcessImpl.java:134) > at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029) > ... 10 more > {quote} > Instead of composing one long command, storage cleanup need to generate a > script and feed that into beeline or hive CLI. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Created] (KYLIN-4104) Support multi kinds of pushdown query engines
weibin0516 created KYLIN-4104: - Summary: Support multi kinds of pushdown query engines Key: KYLIN-4104 URL: https://issues.apache.org/jira/browse/KYLIN-4104 Project: Kylin Issue Type: New Feature Reporter: weibin0516 Assignee: weibin0516 Currently (version 3.0.0-SNAPSHOT), kylin supports only one kind of pushdown query engine. In some users' scenarios, queries need to be pushed down to mysql, spark sql, hive, etc. I think kylin needs to support multiple pushdown engines. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Updated] (KYLIN-4010) TimeZone is hard-coded in function makeSegmentName for class CubeSegment
[ https://issues.apache.org/jira/browse/KYLIN-4010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] weibin0516 updated KYLIN-4010: -- Sprint: Sprint 52 > TimeZone is hard-coded in function makeSegmentName for class CubeSegment > > > Key: KYLIN-4010 > URL: https://issues.apache.org/jira/browse/KYLIN-4010 > Project: Kylin > Issue Type: Improvement > Components: Others >Affects Versions: v3.0.0-alpha >Reporter: zengrui >Assignee: Xiaoxiang Yu >Priority: Minor > Attachments: image-2019-07-15-17-15-31-209.png, > image-2019-07-15-17-17-04-029.png, image-2019-07-15-17-17-39-568.png > > > In a Real-Time Streaming Cube, when I send some records to a kafka topic with a > timestamp of 2019-01-01 00:00:00.000, kylin creates a > segment named 2018123116_2018123117. > Then I found that the TimeZone is hard-coded to "GMT" in function makeSegmentName > of class CubeSegment. I think it should be configurable in kylin.properties. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Updated] (KYLIN-4010) TimeZone is hard-coded in function makeSegmentName for class CubeSegment
[ https://issues.apache.org/jira/browse/KYLIN-4010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] weibin0516 updated KYLIN-4010: -- Sprint: (was: Sprint 52) > TimeZone is hard-coded in function makeSegmentName for class CubeSegment > > > Key: KYLIN-4010 > URL: https://issues.apache.org/jira/browse/KYLIN-4010 > Project: Kylin > Issue Type: Improvement > Components: Others >Affects Versions: v3.0.0-alpha >Reporter: zengrui >Assignee: Xiaoxiang Yu >Priority: Minor > Attachments: image-2019-07-15-17-15-31-209.png, > image-2019-07-15-17-17-04-029.png, image-2019-07-15-17-17-39-568.png > > > In a Real-Time Streaming Cube, when I send some records to a kafka topic with a > timestamp of 2019-01-01 00:00:00.000, kylin creates a > segment named 2018123116_2018123117. > Then I found that the TimeZone is hard-coded to "GMT" in function makeSegmentName > of class CubeSegment. I think it should be configurable in kylin.properties. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
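The reported behavior can be reproduced with a small sketch. Assumptions: the `%Y%m%d%H` format string is inferred from the segment name in the report, and `make_segment_name` with a time-zone parameter is the proposed improvement, not Kylin's actual signature.

```python
from datetime import datetime, timezone, timedelta

def make_segment_name(start_ms, end_ms, tz=timezone.utc):
    # Same formatting idea as CubeSegment.makeSegmentName, but with the
    # time zone as a parameter instead of a hard-coded "GMT".
    fmt = "%Y%m%d%H"
    start = datetime.fromtimestamp(start_ms / 1000, tz).strftime(fmt)
    end = datetime.fromtimestamp(end_ms / 1000, tz).strftime(fmt)
    return f"{start}_{end}"

# A record stamped 2019-01-01 00:00:00 in GMT+8 falls on 2018-12-31 16:00 UTC,
# which reproduces the surprising "2018123116_2018123117" segment name.
gmt8 = timezone(timedelta(hours=8))
start_ms = int(datetime(2019, 1, 1, tzinfo=gmt8).timestamp() * 1000)
print(make_segment_name(start_ms, start_ms + 3600 * 1000))        # GMT view
print(make_segment_name(start_ms, start_ms + 3600 * 1000, gmt8))  # local view
```

Making the zone configurable lets segment names line up with the cluster's local time instead of GMT.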
[jira] [Created] (KYLIN-4078) Fix DefaultSchedulerTest.testMetaStoreRecover unit test fail
weibin0516 created KYLIN-4078: - Summary: Fix DefaultSchedulerTest.testMetaStoreRecover unit test fail Key: KYLIN-4078 URL: https://issues.apache.org/jira/browse/KYLIN-4078 Project: Kylin Issue Type: Test Affects Versions: v3.0.0 Reporter: weibin0516 Assignee: weibin0516 Attachments: error.png When run `mvn clean test` got error as follow: {code:java} [INFO] [ERROR] Errors: [ERROR] DefaultSchedulerTest.testMetaStoreRecover:189->BaseSchedulerTest.waitForJobFinish:107 » Runtime [INFO] [ERROR] Tests run: 28, Failures: 0, Errors: 1, Skipped: 2 [INFO] [INFO] [INFO] Reactor Summary: [INFO] [INFO] Apache Kylin 3.0.0-SNAPSHOT SUCCESS [ 4.856 s] [INFO] Apache Kylin - Core Common . SUCCESS [ 32.858 s] [INFO] Apache Kylin - Core Metadata ... SUCCESS [ 59.055 s] [INFO] Apache Kylin - Core Dictionary . SUCCESS [03:55 min] [INFO] Apache Kylin - Core Cube ... SUCCESS [02:34 min] [INFO] Apache Kylin - Core Metrics SUCCESS [ 2.071 s] [INFO] Apache Kylin - Core Job FAILURE [02:33 min] [INFO] Apache Kylin - Core Storage SKIPPED [INFO] Apache Kylin - Stream Core . SKIPPED [INFO] Apache Kylin - MapReduce Engine SKIPPED [INFO] Apache Kylin - Spark Engine SKIPPED [INFO] Apache Kylin - Hive Source . SKIPPED [INFO] Apache Kylin - DataSource SDK .. SKIPPED [INFO] Apache Kylin - Jdbc Source . SKIPPED [INFO] Apache Kylin - Kafka Source SKIPPED [INFO] Apache Kylin - Cache ... SKIPPED [INFO] Apache Kylin - HBase Storage ... SKIPPED [INFO] Apache Kylin - Query ... SKIPPED [INFO] Apache Kylin - Metrics Reporter Hive ... SKIPPED [INFO] Apache Kylin - Metrics Reporter Kafka .. SKIPPED [INFO] Apache Kylin - Stream Source Kafka . SKIPPED [INFO] Apache Kylin - Stream Coordinator .. SKIPPED [INFO] Apache Kylin - Stream Receiver . SKIPPED [INFO] Apache Kylin - Stream Storage .. SKIPPED [INFO] Apache Kylin - REST Server Base SKIPPED [INFO] Apache Kylin - REST Server . SKIPPED [INFO] Apache Kylin - JDBC Driver . 
SKIPPED [INFO] Apache Kylin - Assembly SKIPPED [INFO] Apache Kylin - Tool SKIPPED [INFO] Apache Kylin - Tool Assembly ... SKIPPED [INFO] Apache Kylin - Integration Test SKIPPED [INFO] Apache Kylin - Tomcat Extension 3.0.0-SNAPSHOT . SKIPPED [INFO] [INFO] BUILD FAILURE [INFO] [INFO] Total time: 10:42 min [INFO] Finished at: 2019-07-12T08:59:26+08:00 [INFO] [ERROR] Failed to execute goal org.apache.maven.plugins:maven-surefire-plugin:2.21.0:test (default-test) on project kylin-core-job: There are test failures. [ERROR] [ERROR] Please refer to /Users/zhuweibin/ant_code/OpenSource/kylin/core-job/../target/surefire-reports for the individual test results. [ERROR] Please refer to dump files (if any exist) [date]-jvmRun[N].dump, [date].dumpstream and [date]-jvmRun[N].dumpstream. [ERROR] -> [Help 1] [ERROR] [ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch. [ERROR] Re-run Maven using the -X switch to enable full debug logging. [ERROR] [ERROR] For more information about the errors and possible solutions, please read the following articles: [ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException [ERROR] [ERROR] After correcting the problems, you can resume the build with the command [ERROR] mvn -rf :kylin-core-job {code} -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Assigned] (KYLIN-2517) Upgrade hbase dependency to 1.4.7
[ https://issues.apache.org/jira/browse/KYLIN-2517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] weibin0516 reassigned KYLIN-2517: - Assignee: weibin0516 > Upgrade hbase dependency to 1.4.7 > - > > Key: KYLIN-2517 > URL: https://issues.apache.org/jira/browse/KYLIN-2517 > Project: Kylin > Issue Type: Improvement >Reporter: Ted Yu >Assignee: weibin0516 >Priority: Major > > There have been major enhancements / bug fixes since the hbase 1.1.1 release. > This issue is to upgrade to 1.4.7 release. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (KYLIN-3519) Upgrade Jacoco version to 0.8.2
[ https://issues.apache.org/jira/browse/KYLIN-3519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] weibin0516 reassigned KYLIN-3519: - Assignee: weibin0516 > Upgrade Jacoco version to 0.8.2 > --- > > Key: KYLIN-3519 > URL: https://issues.apache.org/jira/browse/KYLIN-3519 > Project: Kylin > Issue Type: Improvement >Reporter: Ted Yu >Assignee: weibin0516 >Priority: Minor > > Jacoco 0.8.2 adds Java 11 support: >https://github.com/jacoco/jacoco/releases/tag/v0.8.2 > Java 11 RC1 is out. > We should consider upgrading Jacoco. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (KYLIN-4069) HivePushDownConverter.doConvert will change sql semantics in some scenarios
weibin0516 created KYLIN-4069: - Summary: HivePushDownConverter.doConvert will change sql semantics in some scenarios Key: KYLIN-4069 URL: https://issues.apache.org/jira/browse/KYLIN-4069 Project: Kylin Issue Type: Bug Components: Query Engine Affects Versions: v2.6.2 Reporter: weibin0516 Assignee: weibin0516 HivePushDownConverter.doConvert source code is as follows: {code:java} public static String doConvert(String originStr, boolean isPrepare) { // Step1.Replace " with ` String convertedSql = replaceString(originStr, "\"", "`"); // Step2.Replace extract functions convertedSql = extractReplace(convertedSql); // Step3.Replace cast type string convertedSql = castReplace(convertedSql); // Step4.Replace sub query convertedSql = subqueryReplace(convertedSql); // Step5.Replace char_length with length convertedSql = replaceString(convertedSql, "CHAR_LENGTH", "LENGTH"); convertedSql = replaceString(convertedSql, "char_length", "length"); // Step6.Replace "||" with concat convertedSql = concatReplace(convertedSql); // Step7.Add quote for interval in timestampadd convertedSql = timestampAddDiffReplace(convertedSql); // Step8.Replace integer with int convertedSql = replaceString(convertedSql, "INTEGER", "INT"); convertedSql = replaceString(convertedSql, "integer", "int"); // Step9.Add limit 1 for prepare select sql to speed up if (isPrepare) { convertedSql = addLimit(convertedSql); } return convertedSql; } {code} It is not advisable to directly replace the sql text. The following example will convert sql to another error sql: {code:sql} SELECT "CHAR_LENGTH" FROM datasource.a {code} will convert to {code:sql} SELECT `LENGTH` FROM datasource.a {code} Every use of replaceString in doConvert will cause such problems. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
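A quote-aware replacement avoids this class of bug: skip anything inside back-quoted identifiers (which Step 1 of doConvert produces from double quotes) and only rewrite whole-word matches elsewhere. Below is a minimal Python sketch of the idea, not Kylin's actual fix; `replace_keyword` is an illustrative name.

```python
import re

def replace_keyword(sql, old, new):
    # Split the statement into back-quoted identifier chunks and plain text;
    # only rewrite whole-word matches in the plain-text chunks.
    parts = re.split(r"(`[^`]*`)", sql)
    word = re.compile(r"\b%s\b" % re.escape(old), re.IGNORECASE)
    return "".join(p if p.startswith("`") else word.sub(new, p) for p in parts)

# The quoted column survives; only the real function call is rewritten.
print(replace_keyword("SELECT `CHAR_LENGTH`, CHAR_LENGTH(name) FROM a", "CHAR_LENGTH", "LENGTH"))
# SELECT `CHAR_LENGTH`, LENGTH(name) FROM a
```

A real fix would likely operate on the parsed query rather than on text at all, but even this token-level guard removes the failure mode the issue describes.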
[jira] [Created] (KYLIN-4068) Automatically add limit has bug
weibin0516 created KYLIN-4068: - Summary: Automatically add limit has bug Key: KYLIN-4068 URL: https://issues.apache.org/jira/browse/KYLIN-4068 Project: Kylin Issue Type: Bug Components: Query Engine Affects Versions: v2.6.2 Reporter: weibin0516 Assignee: weibin0516 {code:sql} SELECT E_Name FROM Employees_China UNION SELECT E_Name FROM Employees_USA {code} will convert to {code:sql} SELECT E_Name FROM Employees_China UNION SELECT E_Name FROM Employees_USA LIMIT 5 {code} This limit does not apply to the result of the union, but only to SELECT E_Name FROM Employees_USA. We should use a safer way to apply the limit. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (KYLIN-3832) Kylin pushdown to support postgresql
[ https://issues.apache.org/jira/browse/KYLIN-3832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16875792#comment-16875792 ] weibin0516 commented on KYLIN-3832: --- [~Shaofengshi], thanks for assigning this to me; I will try to implement it. > Kylin pushdown to support postgresql > > > Key: KYLIN-3832 > URL: https://issues.apache.org/jira/browse/KYLIN-3832 > Project: Kylin > Issue Type: New Feature > Components: Query Engine >Affects Versions: v2.5.2 >Reporter: hailin.huang >Assignee: weibin0516 >Priority: Major > > When I run pushdown to postgresql in my environment, I encounter the exception > below. It seems that kylin needs to support more JDBC drivers; > PushDownRunnerJdbcImpl.class should be more general. > 2019-02-26 16:12:53,168 ERROR [Query 207dcf77-7c14-8078-ea8b-79644a0c576d-48] > service.QueryService:989 : pushdown engine failed current query too > java.sql.SQLException: Unrecognized column type: int8 > at > org.apache.kylin.query.adhoc.PushDownRunnerJdbcImpl.toSqlType(PushDownRunnerJdbcImpl.java:260) > at > org.apache.kylin.query.adhoc.PushDownRunnerJdbcImpl.extractColumnMeta(PushDownRunnerJdbcImpl.java:192) > at > org.apache.kylin.query.adhoc.PushDownRunnerJdbcImpl.executeQuery(PushDownRunnerJdbcImpl.java:68) > at > org.apache.kylin.query.util.PushDownUtil.tryPushDownQuery(PushDownUtil.java:122) > at > org.apache.kylin.query.util.PushDownUtil.tryPushDownSelectQuery(PushDownUtil.java:69) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
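A more general toSqlType could normalize vendor-specific aliases before giving up. A hypothetical sketch of the idea; the alias table reflects common PostgreSQL internal type names, not anything taken from the Kylin source:

```python
# PostgreSQL reports internal names like int8/int4; map them onto the
# standard names a PushDownRunnerJdbcImpl-style type switch already handles.
PG_TYPE_ALIASES = {
    "int8": "bigint", "int4": "integer", "int2": "smallint",
    "float8": "double", "float4": "real", "bool": "boolean",
}

def normalize_type(type_name):
    # Unknown names pass through unchanged instead of raising immediately,
    # so the caller can still decide how to handle a truly unsupported type.
    name = type_name.lower()
    return PG_TYPE_ALIASES.get(name, name)

print(normalize_type("int8"))     # bigint
print(normalize_type("varchar"))  # varchar
```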
[jira] [Commented] (KYLIN-4061) Swap inner join's left side, right side table will get different result when query
[ https://issues.apache.org/jira/browse/KYLIN-4061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16875789#comment-16875789 ] weibin0516 commented on KYLIN-4061: --- [~Shaofengshi], thanks for the explanation. > Swap inner join's left side, right side table will get different result when > query > -- > > Key: KYLIN-4061 > URL: https://issues.apache.org/jira/browse/KYLIN-4061 > Project: Kylin > Issue Type: Bug > Components: Query Engine >Affects Versions: v2.5.2 >Reporter: weibin0516 >Priority: Major > Attachments: failed.png, succeed.png > > > When the left side of the inner join is the fact table and the right side is a lookup table, the query hits the cube and returns the correct result. The SQL is as follows. > {code:sql} > SELECT KYLIN_SALES.TRANS_ID, SUM(KYLIN_SALES.PRICE), > COUNT(KYLIN_ACCOUNT.ACCOUNT_ID) > FROM KYLIN_SALES > INNER JOIN KYLIN_ACCOUNT ON KYLIN_SALES.BUYER_ID = KYLIN_ACCOUNT.ACCOUNT_ID > WHERE KYLIN_SALES.LSTG_SITE_ID != 1000 > GROUP BY KYLIN_SALES.TRANS_ID > ORDER BY TRANS_ID > LIMIT 10;{code} > > However, when the left and right sides of the inner join are swapped, the query fails because no realization is found. The SQL is as follows. > {code:sql} > SELECT KYLIN_SALES.TRANS_ID, SUM(KYLIN_SALES.PRICE), > COUNT(KYLIN_ACCOUNT.ACCOUNT_ID) > FROM KYLIN_ACCOUNT > INNER JOIN KYLIN_SALES ON KYLIN_SALES.BUYER_ID = KYLIN_ACCOUNT.ACCOUNT_ID > WHERE KYLIN_SALES.LSTG_SITE_ID != 1000 > GROUP BY KYLIN_SALES.TRANS_ID > ORDER BY TRANS_ID > LIMIT 10;{code} > The two SQL statements are semantically equivalent and should return the same result. > Looking at the source code, Kylin uses context.firstTableScan (assigned in OLAPTableScan.implementOLAP) as the fact table, whether or not it actually is one. The fact table is later the key evidence for choosing a realization, so in the second SQL a lookup table is treated as the fact table and no matching realization can be found. > Is this a bug, and do we need to fix it? 
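The semantic equivalence claimed above can be spot-checked outside Kylin. The SQLite sketch below uses a simplified, made-up subset of the KYLIN_SALES/KYLIN_ACCOUNT schema and verifies that both join orders return the same rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE KYLIN_SALES   (TRANS_ID INT, PRICE REAL, BUYER_ID INT, LSTG_SITE_ID INT);
    CREATE TABLE KYLIN_ACCOUNT (ACCOUNT_ID INT);
    INSERT INTO KYLIN_SALES VALUES (1, 10.0, 100, 0), (1, 5.0, 101, 0),
                                   (2, 7.5, 100, 1000), (3, 2.5, 102, 0);
    INSERT INTO KYLIN_ACCOUNT VALUES (100), (101), (102);
""")

# Same query as the issue; only the FROM / JOIN table order varies.
q = """
    SELECT KYLIN_SALES.TRANS_ID, SUM(KYLIN_SALES.PRICE), COUNT(KYLIN_ACCOUNT.ACCOUNT_ID)
    FROM {} INNER JOIN {} ON KYLIN_SALES.BUYER_ID = KYLIN_ACCOUNT.ACCOUNT_ID
    WHERE KYLIN_SALES.LSTG_SITE_ID != 1000
    GROUP BY KYLIN_SALES.TRANS_ID
    ORDER BY TRANS_ID
    LIMIT 10
"""

fact_first   = conn.execute(q.format("KYLIN_SALES", "KYLIN_ACCOUNT")).fetchall()
lookup_first = conn.execute(q.format("KYLIN_ACCOUNT", "KYLIN_SALES")).fetchall()
assert fact_first == lookup_first  # inner join is symmetric: same rows either way
print(fact_first)
```

Since an inner join is commutative, a query engine that treats the first scanned table as the fact table is relying on syntax rather than semantics, which is exactly the mismatch this issue describes.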
[jira] [Updated] (KYLIN-4061) Swap inner join's left side, right side table will get different result when query
[ https://issues.apache.org/jira/browse/KYLIN-4061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] weibin0516 updated KYLIN-4061: -- Description: When the left side of the inner join is the fact table and the right side is a lookup table, the query hits the cube and returns the correct result. The SQL is as follows. {code:sql} SELECT KYLIN_SALES.TRANS_ID, SUM(KYLIN_SALES.PRICE), COUNT(KYLIN_ACCOUNT.ACCOUNT_ID) FROM KYLIN_SALES INNER JOIN KYLIN_ACCOUNT ON KYLIN_SALES.BUYER_ID = KYLIN_ACCOUNT.ACCOUNT_ID WHERE KYLIN_SALES.LSTG_SITE_ID != 1000 GROUP BY KYLIN_SALES.TRANS_ID ORDER BY TRANS_ID LIMIT 10;{code} However, when the left and right sides of the inner join are swapped, the query fails because no realization is found. The SQL is as follows. {code:sql} SELECT KYLIN_SALES.TRANS_ID, SUM(KYLIN_SALES.PRICE), COUNT(KYLIN_ACCOUNT.ACCOUNT_ID) FROM KYLIN_ACCOUNT INNER JOIN KYLIN_SALES ON KYLIN_SALES.BUYER_ID = KYLIN_ACCOUNT.ACCOUNT_ID WHERE KYLIN_SALES.LSTG_SITE_ID != 1000 GROUP BY KYLIN_SALES.TRANS_ID ORDER BY TRANS_ID LIMIT 10;{code} The two SQL statements are semantically equivalent and should return the same result. Looking at the source code, Kylin uses context.firstTableScan (assigned in OLAPTableScan.implementOLAP) as the fact table, whether or not it actually is one. The fact table is later the key evidence for choosing a realization, so in the second SQL a lookup table is treated as the fact table and no matching realization can be found. Is this a bug, and do we need to fix it? was: When the left side of the inner join is the fact table and the right side is a lookup table, the query hits the cube and returns the correct result. The SQL is as follows.
{code:sql} SELECT KYLIN_SALES.TRANS_ID, SUM(KYLIN_SALES.PRICE), COUNT(KYLIN_ACCOUNT.ACCOUNT_ID) FROM KYLIN_SALES INNER JOIN KYLIN_ACCOUNT ON KYLIN_SALES.BUYER_ID = KYLIN_ACCOUNT.ACCOUNT_ID WHERE KYLIN_SALES.LSTG_SITE_ID != 1000 GROUP BY KYLIN_SALES.TRANS_ID ORDER BY TRANS_ID LIMIT 10;{code} However, when the left and right sides of the inner join are swapped, the query fails because no realization is found. The SQL is as follows. {code:sql} SELECT KYLIN_SALES.TRANS_ID, SUM(KYLIN_SALES.PRICE), COUNT(KYLIN_ACCOUNT.ACCOUNT_ID) FROM KYLIN_ACCOUNT INNER JOIN KYLIN_SALES ON KYLIN_SALES.BUYER_ID = KYLIN_ACCOUNT.ACCOUNT_ID WHERE KYLIN_SALES.LSTG_SITE_ID != 1000 GROUP BY KYLIN_SALES.TRANS_ID ORDER BY TRANS_ID LIMIT 10;{code} The two SQL statements are semantically equivalent and should return the same result. Looking at the source code, Kylin uses context.firstTableScan (assigned in OLAPTableScan.implementOLAP) as the fact table, whether or not it actually is one. Is this a bug, and do we need to fix it? > Swap inner join's left side, right side table will get different result when > query > -- > > Key: KYLIN-4061 > URL: https://issues.apache.org/jira/browse/KYLIN-4061 > Project: Kylin > Issue Type: Bug > Components: Query Engine >Affects Versions: v2.5.2 >Reporter: weibin0516 >Priority: Major > Attachments: failed.png, succeed.png > > > When the left side of the inner join is the fact table and the right side is a lookup table, the query hits the cube and returns the correct result. The SQL is as follows. > {code:sql} > SELECT KYLIN_SALES.TRANS_ID, SUM(KYLIN_SALES.PRICE), > COUNT(KYLIN_ACCOUNT.ACCOUNT_ID) > FROM KYLIN_SALES > INNER JOIN KYLIN_ACCOUNT ON KYLIN_SALES.BUYER_ID = KYLIN_ACCOUNT.ACCOUNT_ID > WHERE KYLIN_SALES.LSTG_SITE_ID != 1000 > GROUP BY KYLIN_SALES.TRANS_ID > ORDER BY TRANS_ID > LIMIT 10;{code} > > However, when the left and right sides of the inner join are swapped, the query fails because no realization is found. The SQL is as follows.
> {code:sql} > SELECT KYLIN_SALES.TRANS_ID, SUM(KYLIN_SALES.PRICE), > COUNT(KYLIN_ACCOUNT.ACCOUNT_ID) > FROM KYLIN_ACCOUNT > INNER JOIN KYLIN_SALES ON KYLIN_SALES.BUYER_ID = KYLIN_ACCOUNT.ACCOUNT_ID > WHERE KYLIN_SALES.LSTG_SITE_ID != 1000 > GROUP BY KYLIN_SALES.TRANS_ID > ORDER BY TRANS_ID > LIMIT 10;{code} > The two SQL statements are semantically equivalent and should return the same result. > Looking at the source code, Kylin uses context.firstTableScan (assigned in OLAPTableScan.implementOLAP) as the fact table, whether or not it actually is one. The fact table is later the key evidence for choosing a realization, so in the second SQL a lookup table is treated as the fact table and no matching realization can be found. > Is this a bug, and do we need to fix it?
[jira] [Updated] (KYLIN-4061) Swap inner join's left side, right side table will get different result when query
[ https://issues.apache.org/jira/browse/KYLIN-4061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] weibin0516 updated KYLIN-4061: -- Description: When the left side of the inner join is the fact table and the right side is a lookup table, the query hits the cube and returns the correct result. The SQL is as follows. {code:sql} SELECT KYLIN_SALES.TRANS_ID, SUM(KYLIN_SALES.PRICE), COUNT(KYLIN_ACCOUNT.ACCOUNT_ID) FROM KYLIN_SALES INNER JOIN KYLIN_ACCOUNT ON KYLIN_SALES.BUYER_ID = KYLIN_ACCOUNT.ACCOUNT_ID WHERE KYLIN_SALES.LSTG_SITE_ID != 1000 GROUP BY KYLIN_SALES.TRANS_ID ORDER BY TRANS_ID LIMIT 10;{code} However, when the left and right sides of the inner join are swapped, the query fails because no realization is found. The SQL is as follows. {code:sql} SELECT KYLIN_SALES.TRANS_ID, SUM(KYLIN_SALES.PRICE), COUNT(KYLIN_ACCOUNT.ACCOUNT_ID) FROM KYLIN_ACCOUNT INNER JOIN KYLIN_SALES ON KYLIN_SALES.BUYER_ID = KYLIN_ACCOUNT.ACCOUNT_ID WHERE KYLIN_SALES.LSTG_SITE_ID != 1000 GROUP BY KYLIN_SALES.TRANS_ID ORDER BY TRANS_ID LIMIT 10;{code} The two SQL statements are semantically equivalent and should return the same result. Looking at the source code, Kylin uses context.firstTableScan (assigned in OLAPTableScan.implementOLAP) as the fact table, whether or not it actually is one. Is this a bug, and do we need to fix it? was: When the left side of the inner join is the fact table and the right side is a lookup table, the query hits the cube and returns the correct result. The SQL is as follows. ``` SELECT KYLIN_SALES.TRANS_ID, SUM(KYLIN_SALES.PRICE), COUNT(KYLIN_ACCOUNT.ACCOUNT_ID) FROM KYLIN_SALES INNER JOIN KYLIN_ACCOUNT ON KYLIN_SALES.BUYER_ID = KYLIN_ACCOUNT.ACCOUNT_ID WHERE KYLIN_SALES.LSTG_SITE_ID != 1000 GROUP BY KYLIN_SALES.TRANS_ID ORDER BY TRANS_ID LIMIT 10; ``` However, when the left and right sides of the inner join are swapped, the query fails because no realization is found. The SQL is as follows.
``` SELECT KYLIN_SALES.TRANS_ID, SUM(KYLIN_SALES.PRICE), COUNT(KYLIN_ACCOUNT.ACCOUNT_ID) FROM KYLIN_ACCOUNT INNER JOIN KYLIN_SALES ON KYLIN_SALES.BUYER_ID = KYLIN_ACCOUNT.ACCOUNT_ID WHERE KYLIN_SALES.LSTG_SITE_ID != 1000 GROUP BY KYLIN_SALES.TRANS_ID ORDER BY TRANS_ID LIMIT 10; ``` The two SQL statements are semantically equivalent and should return the same result. Looking at the source code, Kylin uses context.firstTableScan (assigned in OLAPTableScan.implementOLAP) as the fact table, whether or not it actually is one. Is this a bug, and do we need to fix it? > Swap inner join's left side, right side table will get different result when > query > -- > > Key: KYLIN-4061 > URL: https://issues.apache.org/jira/browse/KYLIN-4061 > Project: Kylin > Issue Type: Bug > Components: Query Engine >Affects Versions: v2.5.2 >Reporter: weibin0516 >Priority: Major > Attachments: failed.png, succeed.png > > > When the left side of the inner join is the fact table and the right side is a lookup table, the query hits the cube and returns the correct result. The SQL is as follows. > {code:sql} > SELECT KYLIN_SALES.TRANS_ID, SUM(KYLIN_SALES.PRICE), > COUNT(KYLIN_ACCOUNT.ACCOUNT_ID) > FROM KYLIN_SALES > INNER JOIN KYLIN_ACCOUNT ON KYLIN_SALES.BUYER_ID = KYLIN_ACCOUNT.ACCOUNT_ID > WHERE KYLIN_SALES.LSTG_SITE_ID != 1000 > GROUP BY KYLIN_SALES.TRANS_ID > ORDER BY TRANS_ID > LIMIT 10;{code} > > However, when the left and right sides of the inner join are swapped, the query fails because no realization is found. The SQL is as follows. > {code:sql} > SELECT KYLIN_SALES.TRANS_ID, SUM(KYLIN_SALES.PRICE), > COUNT(KYLIN_ACCOUNT.ACCOUNT_ID) > FROM KYLIN_ACCOUNT > INNER JOIN KYLIN_SALES ON KYLIN_SALES.BUYER_ID = KYLIN_ACCOUNT.ACCOUNT_ID > WHERE KYLIN_SALES.LSTG_SITE_ID != 1000 > GROUP BY KYLIN_SALES.TRANS_ID > ORDER BY TRANS_ID > LIMIT 10;{code} > The two SQL statements are semantically equivalent and should return the same result.
> Looking at the source code, Kylin uses context.firstTableScan (assigned in OLAPTableScan.implementOLAP) as the fact table, whether or not it actually is one. > Is this a bug, and do we need to fix it?
[jira] [Updated] (KYLIN-4061) Swap inner join's left side, right side table will get different result when query
[ https://issues.apache.org/jira/browse/KYLIN-4061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] weibin0516 updated KYLIN-4061: -- Attachment: succeed.png failed.png > Swap inner join's left side, right side table will get different result when > query > -- > > Key: KYLIN-4061 > URL: https://issues.apache.org/jira/browse/KYLIN-4061 > Project: Kylin > Issue Type: Bug > Components: Query Engine >Affects Versions: v2.5.2 >Reporter: weibin0516 >Priority: Major > Attachments: failed.png, succeed.png > > > When the left side of the inner join is the fact table and the right side is a lookup table, the query hits the cube and returns the correct result. The SQL is as follows. > ``` > SELECT KYLIN_SALES.TRANS_ID, SUM(KYLIN_SALES.PRICE), > COUNT(KYLIN_ACCOUNT.ACCOUNT_ID) > FROM KYLIN_SALES > INNER JOIN KYLIN_ACCOUNT ON KYLIN_SALES.BUYER_ID = KYLIN_ACCOUNT.ACCOUNT_ID > WHERE KYLIN_SALES.LSTG_SITE_ID != 1000 > GROUP BY KYLIN_SALES.TRANS_ID > ORDER BY TRANS_ID > LIMIT 10; > ``` > > However, when the left and right sides of the inner join are swapped, the query fails because no realization is found. The SQL is as follows. > ``` > SELECT KYLIN_SALES.TRANS_ID, SUM(KYLIN_SALES.PRICE), > COUNT(KYLIN_ACCOUNT.ACCOUNT_ID) > FROM KYLIN_ACCOUNT > INNER JOIN KYLIN_SALES ON KYLIN_SALES.BUYER_ID = KYLIN_ACCOUNT.ACCOUNT_ID > WHERE KYLIN_SALES.LSTG_SITE_ID != 1000 > GROUP BY KYLIN_SALES.TRANS_ID > ORDER BY TRANS_ID > LIMIT 10; > ``` > The two SQL statements are semantically equivalent and should return the same result. > Looking at the source code, Kylin uses context.firstTableScan (assigned in OLAPTableScan.implementOLAP) as the fact table, whether or not it actually is one. > Is this a bug, and do we need to fix it?
[jira] [Created] (KYLIN-4061) Swap inner join's left side, right side table will get different result when query
weibin0516 created KYLIN-4061: - Summary: Swap inner join's left side, right side table will get different result when query Key: KYLIN-4061 URL: https://issues.apache.org/jira/browse/KYLIN-4061 Project: Kylin Issue Type: Bug Components: Query Engine Affects Versions: v2.5.2 Reporter: weibin0516 When the left side of the inner join is the fact table and the right side is a lookup table, the query hits the cube and returns the correct result. The SQL is as follows. ``` SELECT KYLIN_SALES.TRANS_ID, SUM(KYLIN_SALES.PRICE), COUNT(KYLIN_ACCOUNT.ACCOUNT_ID) FROM KYLIN_SALES INNER JOIN KYLIN_ACCOUNT ON KYLIN_SALES.BUYER_ID = KYLIN_ACCOUNT.ACCOUNT_ID WHERE KYLIN_SALES.LSTG_SITE_ID != 1000 GROUP BY KYLIN_SALES.TRANS_ID ORDER BY TRANS_ID LIMIT 10; ``` However, when the left and right sides of the inner join are swapped, the query fails because no realization is found. The SQL is as follows. ``` SELECT KYLIN_SALES.TRANS_ID, SUM(KYLIN_SALES.PRICE), COUNT(KYLIN_ACCOUNT.ACCOUNT_ID) FROM KYLIN_ACCOUNT INNER JOIN KYLIN_SALES ON KYLIN_SALES.BUYER_ID = KYLIN_ACCOUNT.ACCOUNT_ID WHERE KYLIN_SALES.LSTG_SITE_ID != 1000 GROUP BY KYLIN_SALES.TRANS_ID ORDER BY TRANS_ID LIMIT 10; ``` The two SQL statements are semantically equivalent and should return the same result. Looking at the source code, Kylin uses context.firstTableScan (assigned in OLAPTableScan.implementOLAP) as the fact table, whether or not it actually is one. Is this a bug, and do we need to fix it?
[jira] [Commented] (KYLIN-3679) Fetch Kafka topic with Spark streaming
[ https://issues.apache.org/jira/browse/KYLIN-3679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16875486#comment-16875486 ] weibin0516 commented on KYLIN-3679: --- Hi [~Shaofengshi], I would like to implement this feature; please assign it to me. Thanks! > Fetch Kafka topic with Spark streaming > -- > > Key: KYLIN-3679 > URL: https://issues.apache.org/jira/browse/KYLIN-3679 > Project: Kylin > Issue Type: New Feature > Components: Spark Engine >Reporter: Shaofeng SHI >Priority: Major > > Now Kylin uses an MR job to fetch Kafka messages in parallel and then persists them > to HDFS for subsequent processing. If the user selects the Spark engine, we > can use the Spark Streaming API to do this. Spark Streaming can read the Kafka > messages in a given offset range as an RDD, which is then easy to process: > https://spark.apache.org/docs/latest/streaming-kafka-0-10-integration.html > With Spark Streaming, Kylin can also easily connect with other data sources > such as Kinesis, Flume, etc.
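The core idea above, reading a fixed Kafka offset range as a bounded, partitionable dataset, can be sketched without Spark. The helper below is purely illustrative (its name and slicing policy are assumptions, not Spark's OffsetRange API): it splits an offset range into contiguous slices, one per parallel task.

```python
# Hypothetical sketch: split a Kafka offset range [start, end) into
# contiguous slices, one per parallel fetch task. This mirrors the idea
# of treating an offset range as a partitioned dataset; it is not
# Spark's actual OffsetRange implementation.
def split_offset_range(start, end, num_slices):
    """Split [start, end) into num_slices contiguous (lo, hi) slices."""
    total = end - start
    base, extra = divmod(total, num_slices)
    slices, cursor = [], start
    for i in range(num_slices):
        # Spread the remainder over the first `extra` slices.
        size = base + (1 if i < extra else 0)
        slices.append((cursor, cursor + size))
        cursor += size
    return slices

print(split_offset_range(0, 10, 3))  # [(0, 4), (4, 7), (7, 10)]
```

Because the range boundaries are fixed before the job starts, each slice can be fetched independently and retried deterministically, which is what makes the offset-range-as-RDD approach a good fit for batch cube building.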
[jira] [Issue Comment Deleted] (KYLIN-3832) Kylin Pushdown query not support postgresql
[ https://issues.apache.org/jira/browse/KYLIN-3832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] weibin0516 updated KYLIN-3832: -- Comment: was deleted (was: I'd like to implement the postgresql data source adapter. Please assign to me.) > Kylin Pushdown query not support postgresql > --- > > Key: KYLIN-3832 > URL: https://issues.apache.org/jira/browse/KYLIN-3832 > Project: Kylin > Issue Type: Bug > Components: Query Engine >Affects Versions: v2.5.2 >Reporter: hailin.huang >Priority: Major > Fix For: Future > > > When I run pushdown to PostgreSQL in my environment, I encounter the exception below. > It seems that Kylin needs to support more JDBC drivers; > PushDownRunnerJdbcImpl.class should be more general. > 2019-02-26 16:12:53,168 ERROR [Query 207dcf77-7c14-8078-ea8b-79644a0c576d-48] > service.QueryService:989 : pushdown engine failed current query too > java.sql.SQLException: Unrecognized column type: int8 > at > org.apache.kylin.query.adhoc.PushDownRunnerJdbcImpl.toSqlType(PushDownRunnerJdbcImpl.java:260) > at > org.apache.kylin.query.adhoc.PushDownRunnerJdbcImpl.extractColumnMeta(PushDownRunnerJdbcImpl.java:192) > at > org.apache.kylin.query.adhoc.PushDownRunnerJdbcImpl.executeQuery(PushDownRunnerJdbcImpl.java:68) > at > org.apache.kylin.query.util.PushDownUtil.tryPushDownQuery(PushDownUtil.java:122) > at > org.apache.kylin.query.util.PushDownUtil.tryPushDownSelectQuery(PushDownUtil.java:69)