[jira] [Work logged] (HIVE-26242) Compaction heartbeater improvements
[ https://issues.apache.org/jira/browse/HIVE-26242?focusedWorklogId=772684&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-772684 ] ASF GitHub Bot logged work on HIVE-26242: - Author: ASF GitHub Bot Created on: 20/May/22 06:48 Start Date: 20/May/22 06:48 Worklog Time Spent: 10m Work Description: veghlaci05 opened a new pull request, #3303: URL: https://github.com/apache/hive/pull/3303 ### What changes were proposed in this pull request? This PR introduces a new component (CompactionHeartbeatService) to centralize the compaction transaction heartbeating and reduce resource usage. It consists of a scheduled executor service and a MetaStoreClient pool. ### Why are the changes needed? The heartbeats of the compaction txns waste resources by having a dedicated separate thread and MetaStore client for every compaction transaction. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Manually and through unit tests. Issue Time Tracking --- Worklog Id: (was: 772684) Remaining Estimate: 0h Time Spent: 10m > Compaction heartbeater improvements > --- > > Key: HIVE-26242 > URL: https://issues.apache.org/jira/browse/HIVE-26242 > Project: Hive > Issue Type: Improvement >Reporter: László Végh >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > The Compaction heartbeater should be improved in the following ways: > * The metastore clients should be reused between heartbeats and closed only > at the end, when the transaction ends > * Instead of having a dedicated heartbeater thread for each Compaction > transaction, there should be a shared heartbeater executor where the > heartbeat tasks can be scheduled/submitted. -- This message was sent by Atlassian Jira (v8.20.7#820007)
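The design the PR describes — one shared scheduled executor heartbeating all compaction transactions, instead of a dedicated thread per transaction — can be sketched roughly as below. This is a minimal illustration under assumed names, not the actual CompactionHeartbeatService; the real component also pools MetaStoreClient instances, which is omitted here.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: one shared scheduler serves heartbeats for all
// compaction transactions, rather than one thread per transaction.
public class SharedHeartbeatService {
  private final ScheduledExecutorService scheduler =
      Executors.newScheduledThreadPool(4);
  private final Map<Long, ScheduledFuture<?>> tasks = new ConcurrentHashMap<>();

  /** Start periodic heartbeats for a compaction transaction. */
  public void startHeartbeat(long txnId, long intervalMs, Runnable heartbeat) {
    tasks.computeIfAbsent(txnId, id ->
        scheduler.scheduleAtFixedRate(heartbeat, intervalMs, intervalMs,
            TimeUnit.MILLISECONDS));
  }

  /** Stop heartbeating once the transaction ends. */
  public void stopHeartbeat(long txnId) {
    ScheduledFuture<?> task = tasks.remove(txnId);
    if (task != null) {
      task.cancel(false);
    }
  }

  /** Number of transactions currently being heartbeated. */
  public int activeCount() {
    return tasks.size();
  }

  /** Shut the shared scheduler down at service teardown. */
  public void shutdown() {
    scheduler.shutdownNow();
  }
}
```

The resource win is that N concurrent compactions share a small fixed thread pool instead of holding N heartbeater threads and N metastore clients alive for their whole duration.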
[jira] [Updated] (HIVE-26242) Compaction heartbeater improvements
[ https://issues.apache.org/jira/browse/HIVE-26242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-26242: -- Labels: pull-request-available (was: ) > Compaction heartbeater improvements > --- > > Key: HIVE-26242 > URL: https://issues.apache.org/jira/browse/HIVE-26242 > Project: Hive > Issue Type: Improvement >Reporter: László Végh >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > The Compaction heartbeater should be improved in the following ways: > * The metastore clients should be reused between heartbeats and closed only > at the end, when the transaction ends > * Instead of having a dedicated heartbeater thread for each Compaction > transaction, there should be a shared heartbeater executor where the > heartbeat tasks can be scheduled/submitted. -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Work logged] (HIVE-26239) Shutdown Hash table load executor service threads when they are interrupted
[ https://issues.apache.org/jira/browse/HIVE-26239?focusedWorklogId=772673&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-772673 ] ASF GitHub Bot logged work on HIVE-26239: - Author: ASF GitHub Bot Created on: 20/May/22 05:57 Start Date: 20/May/22 05:57 Worklog Time Spent: 10m Work Description: ramesh0201 opened a new pull request, #3302: URL: https://github.com/apache/hive/pull/3302 …y are interrupted ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? Issue Time Tracking --- Worklog Id: (was: 772673) Remaining Estimate: 0h Time Spent: 10m > Shutdown Hash table load executor service threads when they are interrupted > --- > > Key: HIVE-26239 > URL: https://issues.apache.org/jira/browse/HIVE-26239 > Project: Hive > Issue Type: Bug >Reporter: Ramesh Kumar Thangarajan >Assignee: Ramesh Kumar Thangarajan >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Updated] (HIVE-26239) Shutdown Hash table load executor service threads when they are interrupted
[ https://issues.apache.org/jira/browse/HIVE-26239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-26239: -- Labels: pull-request-available (was: ) > Shutdown Hash table load executor service threads when they are interrupted > --- > > Key: HIVE-26239 > URL: https://issues.apache.org/jira/browse/HIVE-26239 > Project: Hive > Issue Type: Bug >Reporter: Ramesh Kumar Thangarajan >Assignee: Ramesh Kumar Thangarajan >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.7#820007)
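The fix named in the HIVE-26239 title — shut the hash table load executor service down when the loading thread is interrupted, rather than leaving worker threads running — can be sketched as below. The class and method names are hypothetical illustrations, not the actual patch.

```java
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Sketch: run hash-table load tasks on a pool; if the calling thread is
// interrupted while waiting, shut the pool down immediately so the
// worker threads are interrupted too instead of lingering.
public class HashTableLoadSketch {
  public static void loadAll(List<Callable<Void>> loadTasks) {
    ExecutorService pool =
        Executors.newFixedThreadPool(Math.max(1, loadTasks.size()));
    try {
      pool.invokeAll(loadTasks); // blocks until every load task finishes
      pool.shutdown();           // normal completion: orderly shutdown
    } catch (InterruptedException e) {
      pool.shutdownNow();                 // interrupt and discard workers
      Thread.currentThread().interrupt(); // preserve the interrupt status
    }
  }
}
```

The key point is the catch block: without `shutdownNow()`, an interrupted load leaves pool threads alive and still loading.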
[jira] [Work logged] (HIVE-25813) CREATE TABLE x LIKE storagehandler-based-source fails
[ https://issues.apache.org/jira/browse/HIVE-25813?focusedWorklogId=772672&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-772672 ] ASF GitHub Bot logged work on HIVE-25813: - Author: ASF GitHub Bot Created on: 20/May/22 05:55 Start Date: 20/May/22 05:55 Worklog Time Spent: 10m Work Description: saihemanth-cloudera opened a new pull request, #3301: URL: https://github.com/apache/hive/pull/3301 …over is fixed ### What changes were proposed in this pull request? CTLT commands based on external tables will not copy over the table properties of the source table. ### Why are the changes needed? Because it is consistent with other query engines like MySQL and Redshift. ### Does this PR introduce _any_ user-facing change? Yes. When users execute the CREATE TABLE x LIKE command, we'll no longer copy the table properties of the source table to the destination table; we'll only copy the table/column schema of the source table. ### How was this patch tested? Local machine, unit tests Issue Time Tracking --- Worklog Id: (was: 772672) Time Spent: 50m (was: 40m) > CREATE TABLE x LIKE storagehandler-based-source fails > -- > > Key: HIVE-25813 > URL: https://issues.apache.org/jira/browse/HIVE-25813 > Project: Hive > Issue Type: Bug >Reporter: Sai Hemanth Gantasala >Assignee: Sai Hemanth Gantasala >Priority: Major > Labels: pull-request-available > Time Spent: 50m > Remaining Estimate: 0h > > {code:java} > CREATE EXTERNAL TABLE default.dbs ( > DB_ID bigint, > DB_LOCATION_URI string, > NAME string, > OWNER_NAME string, > OWNER_TYPE string ) > STORED BY 'org.apache.hive.storage.jdbc.JdbcStorageHandler' > TBLPROPERTIES ( > 'hive.sql.database.type' = 'MYSQL', > 'hive.sql.jdbc.driver' = 'com.mysql.jdbc.Driver', > 'hive.sql.jdbc.url' = 'jdbc:mysql://localhost:3306/hive1', > 'hive.sql.dbcp.username' = 'hive1', > 'hive.sql.dbcp.password' = 'cloudera', > 'hive.sql.query' = 'SELECT DB_ID, DB_LOCATION_URI, NAME, OWNER_NAME, > OWNER_TYPE FROM DBS' > ); > CREATE TABLE 
default.dbscopy LIKE default.dbs; > Caused by: java.lang.NullPointerException > at > org.apache.hadoop.hive.metastore.HiveMetaStoreUtils.getFieldsFromDeserializer(HiveMetaStoreUtils.java:186) > {code} -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Reopened] (HIVE-25813) CREATE TABLE x LIKE storagehandler-based-source fails
[ https://issues.apache.org/jira/browse/HIVE-25813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sai Hemanth Gantasala reopened HIVE-25813: -- Reopening this as HIVE-25989 only partially fixes the issues. > CREATE TABLE x LIKE storagehandler-based-source fails > -- > > Key: HIVE-25813 > URL: https://issues.apache.org/jira/browse/HIVE-25813 > Project: Hive > Issue Type: Bug >Reporter: Sai Hemanth Gantasala >Assignee: Sai Hemanth Gantasala >Priority: Major > Labels: pull-request-available > Time Spent: 40m > Remaining Estimate: 0h > > {code:java} > CREATE EXTERNAL TABLE default.dbs ( > DB_ID bigint, > DB_LOCATION_URI string, > NAME string, > OWNER_NAME string, > OWNER_TYPE string ) > STORED BY 'org.apache.hive.storage.jdbc.JdbcStorageHandler' > TBLPROPERTIES ( > 'hive.sql.database.type' = 'MYSQL', > 'hive.sql.jdbc.driver' = 'com.mysql.jdbc.Driver', > 'hive.sql.jdbc.url' = 'jdbc:mysql://localhost:3306/hive1', > 'hive.sql.dbcp.username' = 'hive1', > 'hive.sql.dbcp.password' = 'cloudera', > 'hive.sql.query' = 'SELECT DB_ID, DB_LOCATION_URI, NAME, OWNER_NAME, > OWNER_TYPE FROM DBS' > ); > CREATE TABLE default.dbscopy LIKE default.dbs; > Caused by: java.lang.NullPointerException > at > org.apache.hadoop.hive.metastore.HiveMetaStoreUtils.getFieldsFromDeserializer(HiveMetaStoreUtils.java:186) > {code} -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Commented] (HIVE-26192) JDBC data connector queries occur exception at cbo stage
[ https://issues.apache.org/jira/browse/HIVE-26192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17539890#comment-17539890 ] zhangbutao commented on HIVE-26192: --- [~ngangam] I think other DBs also have this issue. I'll check it out as soon as possible. > JDBC data connector queries occur exception at cbo stage > - > > Key: HIVE-26192 > URL: https://issues.apache.org/jira/browse/HIVE-26192 > Project: Hive > Issue Type: Bug >Affects Versions: 4.0.0-alpha-2 >Reporter: zhangbutao >Assignee: zhangbutao >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0-alpha-2 > > Time Spent: 50m > Remaining Estimate: 0h > > If you do a select query qtest with jdbc data connector, you will see > exception at cbo stage: > {code:java} > [ERROR] Failures: > [ERROR] TestMiniLlapCliDriver.testCliDriver:62 Client execution failed with > error code = 4 > running > select * from country > fname=dataconnector_mysql.q See ./ql/target/tmp/log/hive.log or > ./itests/qtest/target/tmp/log/hive.log, or check ./ql/target/surefire-reports > or ./itests/qtest/target/surefire-reports/ for specific test cases logs. 
> org.apache.hadoop.hive.ql.parse.SemanticException: Table qtestDB.country was > not found in the database > at > org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.genTableLogicalPlan(CalcitePlanner.java:3078) > at > org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.genLogicalPlan(CalcitePlanner.java:5048) > at > org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1665) > at > org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1605) > at > org.apache.calcite.tools.Frameworks.lambda$withPlanner$0(Frameworks.java:131) > at > org.apache.calcite.prepare.CalcitePrepareImpl.perform(CalcitePrepareImpl.java:914) > at > org.apache.calcite.tools.Frameworks.withPrepare(Frameworks.java:180) > at > org.apache.calcite.tools.Frameworks.withPlanner(Frameworks.java:126) > at > org.apache.hadoop.hive.ql.parse.CalcitePlanner.logicalPlan(CalcitePlanner.java:1357) > at > org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:567) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12587) > at > org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:460) > at > org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:317) > at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:224) > at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:106) > at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:500) > at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:452) > at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:416) > at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:410) > at > org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:121) > at > org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:227) > at > 
org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:255) > at > org.apache.hadoop.hive.cli.CliDriver.processCmd1(CliDriver.java:200) > at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:126) > at > org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:421) > at > org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:352) > at > org.apache.hadoop.hive.ql.QTestUtil.executeClientInternal(QTestUtil.java:727) > at > org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:697) > at > org.apache.hadoop.hive.cli.control.CoreCliDriver.runTest(CoreCliDriver.java:114) > at > org.apache.hadoop.hive.cli.control.CliAdapter.runTest(CliAdapter.java:157) > at > org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver(TestMiniLlapCliDriver.java:62) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.Framewo
[jira] [Work logged] (HIVE-26046) MySQL's bit datatype is default to void datatype in hive
[ https://issues.apache.org/jira/browse/HIVE-26046?focusedWorklogId=772656&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-772656 ] ASF GitHub Bot logged work on HIVE-26046: - Author: ASF GitHub Bot Created on: 20/May/22 03:03 Start Date: 20/May/22 03:03 Worklog Time Spent: 10m Work Description: zhangbutao commented on code in PR #3276: URL: https://github.com/apache/hive/pull/3276#discussion_r877690343 ## standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/dataconnector/jdbc/MySQLConnectorProvider.java: ## @@ -90,10 +90,20 @@ protected String getDataType(String dbDataType, int size) { // map any db specific types here. switch (dbDataType.toLowerCase()) { +case "bit": + return toHiveBitType(size); default: mappedType = ColumnType.VOID_TYPE_NAME; break; } return mappedType; } + + private String toHiveBitType(int size) { +if (size <= 1) { + return ColumnType.BOOLEAN_TYPE_NAME; +} else { + return ColumnType.BIGINT_TYPE_NAME; Review Comment: > for BIT(n), if we represent the value as bigint(decimal) in hive, will the original intent have been lost? a select would return 56 which visually doesn't represent how it is originally stored I think users might be able to convert the result **56** to bit **"111000"** themselves instead of hive doing it internally. :) I will research how to convert the result using hive functions or UDFs as you suggested. Or maybe we can rewrite the query sql to "select bin(col)" after hive compile. 
Issue Time Tracking --- Worklog Id: (was: 772656) Time Spent: 1h 50m (was: 1h 40m) > MySQL's bit datatype is default to void datatype in hive > > > Key: HIVE-26046 > URL: https://issues.apache.org/jira/browse/HIVE-26046 > Project: Hive > Issue Type: Sub-task > Components: Standalone Metastore >Affects Versions: 4.0.0 >Reporter: Naveen Gangam >Assignee: zhangbutao >Priority: Major > Labels: pull-request-available > Time Spent: 1h 50m > Remaining Estimate: 0h > > describe on a table that contains a "bit" datatype gets mapped to void. We > need a explicit conversion logic in the MySQL ConnectorProvider to map it to > a suitable datatype in hive. > {noformat} > +---+---++ > | col_name| data_type > | comment | > +---+---++ > | tbl_id| bigint > | from deserializer | > | create_time | int > | from deserializer | > | db_id | bigint > | from deserializer | > | last_access_time | int > | from deserializer | > | owner | varchar(767) > | from deserializer | > | owner_type| varchar(10) > | from deserializer | > | retention | int > | from deserializer | > | sd_id | bigint > | from deserializer | > | tbl_name | varchar(256) > | from deserializer | > | tbl_type | varchar(128) > | from deserializer | > | view_expanded_text| string > | from deserializer | > | view_original_text| string > | from deserializer | > | is_rewrite_enabled| void > | from deserializer | > | write_id | bigint > | from deserializer
[jira] [Work logged] (HIVE-26046) MySQL's bit datatype is default to void datatype in hive
[ https://issues.apache.org/jira/browse/HIVE-26046?focusedWorklogId=772654&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-772654 ] ASF GitHub Bot logged work on HIVE-26046: - Author: ASF GitHub Bot Created on: 20/May/22 02:57 Start Date: 20/May/22 02:57 Worklog Time Spent: 10m Work Description: zhangbutao commented on code in PR #3276: URL: https://github.com/apache/hive/pull/3276#discussion_r877690343 ## standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/dataconnector/jdbc/MySQLConnectorProvider.java: ## @@ -90,10 +90,20 @@ protected String getDataType(String dbDataType, int size) { // map any db specific types here. switch (dbDataType.toLowerCase()) { +case "bit": + return toHiveBitType(size); default: mappedType = ColumnType.VOID_TYPE_NAME; break; } return mappedType; } + + private String toHiveBitType(int size) { +if (size <= 1) { + return ColumnType.BOOLEAN_TYPE_NAME; +} else { + return ColumnType.BIGINT_TYPE_NAME; Review Comment: > for BIT(n), if we represent the value as bigint(decimal) in hive, will the original intent have been lost? a select would return 56 which visually doesn't represent how it is originally stored I think users might be able to convert the result **56** to bit **"111000"** themselves instead of hive doing it internally. :) I will research how to convert the result using hive functions or UDFs as you suggested. Or maybe we can rewrite the query sql to "select bin(col)" after hive compile. 
Issue Time Tracking --- Worklog Id: (was: 772654) Time Spent: 1h 40m (was: 1.5h) > MySQL's bit datatype is default to void datatype in hive > > > Key: HIVE-26046 > URL: https://issues.apache.org/jira/browse/HIVE-26046 > Project: Hive > Issue Type: Sub-task > Components: Standalone Metastore >Affects Versions: 4.0.0 >Reporter: Naveen Gangam >Assignee: zhangbutao >Priority: Major > Labels: pull-request-available > Time Spent: 1h 40m > Remaining Estimate: 0h > > describe on a table that contains a "bit" datatype gets mapped to void. We > need a explicit conversion logic in the MySQL ConnectorProvider to map it to > a suitable datatype in hive. > {noformat} > +---+---++ > | col_name| data_type > | comment | > +---+---++ > | tbl_id| bigint > | from deserializer | > | create_time | int > | from deserializer | > | db_id | bigint > | from deserializer | > | last_access_time | int > | from deserializer | > | owner | varchar(767) > | from deserializer | > | owner_type| varchar(10) > | from deserializer | > | retention | int > | from deserializer | > | sd_id | bigint > | from deserializer | > | tbl_name | varchar(256) > | from deserializer | > | tbl_type | varchar(128) > | from deserializer | > | view_expanded_text| string > | from deserializer | > | view_original_text| string > | from deserializer | > | is_rewrite_enabled| void > | from deserializer | > | write_id | bigint > | from deserializer
[jira] [Work logged] (HIVE-26227) Add support of catalog related statements for Hive ql
[ https://issues.apache.org/jira/browse/HIVE-26227?focusedWorklogId=772653&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-772653 ] ASF GitHub Bot logged work on HIVE-26227: - Author: ASF GitHub Bot Created on: 20/May/22 02:53 Start Date: 20/May/22 02:53 Worklog Time Spent: 10m Work Description: boneanxs commented on PR #3288: URL: https://github.com/apache/hive/pull/3288#issuecomment-1132402520 A very great work! It's much easier for us to manage catalogs with DDL. Issue Time Tracking --- Worklog Id: (was: 772653) Time Spent: 0.5h (was: 20m) > Add support of catalog related statements for Hive ql > - > > Key: HIVE-26227 > URL: https://issues.apache.org/jira/browse/HIVE-26227 > Project: Hive > Issue Type: Task > Components: Hive >Reporter: Wechar >Assignee: Wechar >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0-alpha-2 > > Time Spent: 0.5h > Remaining Estimate: 0h > > The catalog concept was introduced in Hive 3.0 to allow different systems to connect > to different catalogs in the metastore. But so far we cannot manage catalogs > through Hive ql; this task aims to implement the DDL statements related to > catalogs. > *Create Catalog* > {code:sql} > CREATE CATALOG [IF NOT EXISTS] catalog_name > LOCATION hdfs_path > [COMMENT catalog_comment]; > {code} > LOCATION is required for creating a new catalog now. > *Alter Catalog* > {code:sql} > ALTER CATALOG catalog_name SET LOCATION hdfs_path; > {code} > Only location metadata can be altered for a catalog. > *Drop Catalog* > {code:sql} > DROP CATALOG [IF EXISTS] catalog_name; > {code} > DROP CATALOG is always RESTRICT, which means DROP CATALOG will fail if there > are non-default databases in the catalog. > *Show Catalogs* > {code:sql} > SHOW CATALOGS [LIKE 'identifier_with_wildcards']; > {code} > SHOW CATALOGS lists all of the catalogs defined in the metastore. > The optional LIKE clause allows the list of catalogs to be filtered using a > regular expression. 
> *Describe Catalog* > {code:sql} > DESC[RIBE] CATALOG [EXTENDED] cat_name; > {code} > DESCRIBE CATALOG shows the name of the catalog, its comment (if one has been > set), and its root location on the filesystem. > EXTENDED also shows the create time. -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Work logged] (HIVE-25667) Unify code managing JDBC databases in tests
[ https://issues.apache.org/jira/browse/HIVE-25667?focusedWorklogId=772639&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-772639 ] ASF GitHub Bot logged work on HIVE-25667: - Author: ASF GitHub Bot Created on: 20/May/22 00:18 Start Date: 20/May/22 00:18 Worklog Time Spent: 10m Work Description: github-actions[bot] commented on PR #2919: URL: https://github.com/apache/hive/pull/2919#issuecomment-1132319117 This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Feel free to reach out on the d...@hive.apache.org list if the patch is in need of reviews. Issue Time Tracking --- Worklog Id: (was: 772639) Time Spent: 3h (was: 2h 50m) > Unify code managing JDBC databases in tests > --- > > Key: HIVE-25667 > URL: https://issues.apache.org/jira/browse/HIVE-25667 > Project: Hive > Issue Type: Task > Components: Testing Infrastructure >Affects Versions: 4.0.0 >Reporter: Stamatis Zampetakis >Assignee: Mark Bathori >Priority: Major > Labels: pull-request-available > Time Spent: 3h > Remaining Estimate: 0h > > Currently there are two class hierarchies managing JDBC databases in tests, > [DatabaseRule|https://github.com/apache/hive/blob/d35de014dd49fdcfe0aacb68e6c587beff6d1dea/standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/dbinstall/rules/DatabaseRule.java] > and > [AbstractExternalDB|https://github.com/apache/hive/blob/d35de014dd49fdcfe0aacb68e6c587beff6d1dea/itests/util/src/main/java/org/apache/hadoop/hive/ql/externalDB/AbstractExternalDB.java]. > There are many similarities between these hierarchies and certain parts are > duplicated. > The goal of this JIRA is to refactor the aforementioned hierarchies to reduce > code duplication and improve extensibility. -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Work logged] (HIVE-26046) MySQL's bit datatype is default to void datatype in hive
[ https://issues.apache.org/jira/browse/HIVE-26046?focusedWorklogId=772510&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-772510 ] ASF GitHub Bot logged work on HIVE-26046: - Author: ASF GitHub Bot Created on: 19/May/22 15:43 Start Date: 19/May/22 15:43 Worklog Time Spent: 10m Work Description: nrg4878 commented on code in PR #3276: URL: https://github.com/apache/hive/pull/3276#discussion_r877230118 ## standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/dataconnector/jdbc/MySQLConnectorProvider.java: ## @@ -90,10 +90,20 @@ protected String getDataType(String dbDataType, int size) { // map any db specific types here. switch (dbDataType.toLowerCase()) { +case "bit": + return toHiveBitType(size); default: mappedType = ColumnType.VOID_TYPE_NAME; break; } return mappedType; } + + private String toHiveBitType(int size) { +if (size <= 1) { + return ColumnType.BOOLEAN_TYPE_NAME; +} else { + return ColumnType.BIGINT_TYPE_NAME; Review Comment: absolutely agreed on the BIT(1) that the intent is to store a boolean. for BIT(n), if we represent the value as bigint(decimal) in hive, will the original intent have been lost? a select would return **56** which visually doesn't represent how it is originally stored. Just thinking out loud here, in this case are we better off representing these BITS as a string in hive. So we see something like this **"111000"** on a select. I think in both cases, we can apply functions (or custom UDFs) to cast this value back to BITs. 
Issue Time Tracking --- Worklog Id: (was: 772510) Time Spent: 1.5h (was: 1h 20m) > MySQL's bit datatype is default to void datatype in hive > > > Key: HIVE-26046 > URL: https://issues.apache.org/jira/browse/HIVE-26046 > Project: Hive > Issue Type: Sub-task > Components: Standalone Metastore >Affects Versions: 4.0.0 >Reporter: Naveen Gangam >Assignee: zhangbutao >Priority: Major > Labels: pull-request-available > Time Spent: 1.5h > Remaining Estimate: 0h > > describe on a table that contains a "bit" datatype gets mapped to void. We > need a explicit conversion logic in the MySQL ConnectorProvider to map it to > a suitable datatype in hive. > {noformat} > +---+---++ > | col_name| data_type > | comment | > +---+---++ > | tbl_id| bigint > | from deserializer | > | create_time | int > | from deserializer | > | db_id | bigint > | from deserializer | > | last_access_time | int > | from deserializer | > | owner | varchar(767) > | from deserializer | > | owner_type| varchar(10) > | from deserializer | > | retention | int > | from deserializer | > | sd_id | bigint > | from deserializer | > | tbl_name | varchar(256) > | from deserializer | > | tbl_type | varchar(128) > | from deserializer | > | view_expanded_text| string > | from deserializer | > | view_original_text| string > | from deserializer | > | is_rewrite_enabled| void > | from deserializer | > | write_id | bigint
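The round trip the reviewers are discussing — a MySQL BIT(6) value surfacing in Hive as the bigint 56, and rendering it back in its original bit form so a select shows "111000" instead of 56 — is a plain binary-string conversion. A minimal sketch (Hive's bin() UDF does the equivalent server-side, as in the "select bin(col)" rewrite mentioned above):

```java
// Render a BIT(n) value that was read back as a Hive bigint in its
// original bit form, e.g. 56 -> "111000".
public class BitRendering {
  public static String toBitString(long value) {
    return Long.toBinaryString(value);
  }
}
```

Whether this conversion happens client-side, via a UDF, or through a compile-time query rewrite is exactly the open question in the review thread.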
[jira] [Commented] (HIVE-26192) JDBC data connector queries occur exception at cbo stage
[ https://issues.apache.org/jira/browse/HIVE-26192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17539593#comment-17539593 ] Naveen Gangam commented on HIVE-26192: -- Is this an issue specifically with MySQL? Or does it happen with other DBs as well? > JDBC data connector queries occur exception at cbo stage > - > > Key: HIVE-26192 > URL: https://issues.apache.org/jira/browse/HIVE-26192 > Project: Hive > Issue Type: Bug >Affects Versions: 4.0.0-alpha-2 >Reporter: zhangbutao >Assignee: zhangbutao >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0-alpha-2 > > Time Spent: 50m > Remaining Estimate: 0h > > If you do a select query qtest with jdbc data connector, you will see > exception at cbo stage: > {code:java} > [ERROR] Failures: > [ERROR] TestMiniLlapCliDriver.testCliDriver:62 Client execution failed with > error code = 4 > running > select * from country > fname=dataconnector_mysql.q See ./ql/target/tmp/log/hive.log or > ./itests/qtest/target/tmp/log/hive.log, or check ./ql/target/surefire-reports > or ./itests/qtest/target/surefire-reports/ for specific test cases logs. 
> org.apache.hadoop.hive.ql.parse.SemanticException: Table qtestDB.country was > not found in the database > at > org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.genTableLogicalPlan(CalcitePlanner.java:3078) > at > org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.genLogicalPlan(CalcitePlanner.java:5048) > at > org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1665) > at > org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1605) > at > org.apache.calcite.tools.Frameworks.lambda$withPlanner$0(Frameworks.java:131) > at > org.apache.calcite.prepare.CalcitePrepareImpl.perform(CalcitePrepareImpl.java:914) > at > org.apache.calcite.tools.Frameworks.withPrepare(Frameworks.java:180) > at > org.apache.calcite.tools.Frameworks.withPlanner(Frameworks.java:126) > at > org.apache.hadoop.hive.ql.parse.CalcitePlanner.logicalPlan(CalcitePlanner.java:1357) > at > org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:567) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12587) > at > org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:460) > at > org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:317) > at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:224) > at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:106) > at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:500) > at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:452) > at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:416) > at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:410) > at > org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:121) > at > org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:227) > at > 
org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:255) > at > org.apache.hadoop.hive.cli.CliDriver.processCmd1(CliDriver.java:200) > at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:126) > at > org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:421) > at > org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:352) > at > org.apache.hadoop.hive.ql.QTestUtil.executeClientInternal(QTestUtil.java:727) > at > org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:697) > at > org.apache.hadoop.hive.cli.control.CoreCliDriver.runTest(CoreCliDriver.java:114) > at > org.apache.hadoop.hive.cli.control.CliAdapter.runTest(CliAdapter.java:157) > at > org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver(TestMiniLlapCliDriver.java:62) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model
[jira] [Work logged] (HIVE-24484) Upgrade Hadoop to 3.3.1
[ https://issues.apache.org/jira/browse/HIVE-24484?focusedWorklogId=772469&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-772469 ] ASF GitHub Bot logged work on HIVE-24484: - Author: ASF GitHub Bot Created on: 19/May/22 14:34 Start Date: 19/May/22 14:34 Worklog Time Spent: 10m Work Description: ayushtkn commented on PR #3279: URL: https://github.com/apache/hive/pull/3279#issuecomment-1131786459 The build is green with ``3.3.3``. I built the distro and checked if it contains reload4j; it doesn't: ``` lib % ls -l | grep reload4j lib % ``` Deployed and tried with hadoop-3.3.3, Hive on MR, and ran some basic queries and they were working. @steveloughran do we need anything more for 3.3.3 or are we good? Issue Time Tracking --- Worklog Id: (was: 772469) Time Spent: 10.55h (was: 10h 23m) > Upgrade Hadoop to 3.3.1 > --- > > Key: HIVE-24484 > URL: https://issues.apache.org/jira/browse/HIVE-24484 > Project: Hive > Issue Type: Improvement >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Major > Labels: pull-request-available > Time Spent: 10.55h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Commented] (HIVE-26227) Add support of catalog related statements for Hive ql
[ https://issues.apache.org/jira/browse/HIVE-26227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17539577#comment-17539577 ] Wechar commented on HIVE-26227: --- Not quite yet. In our scenario, HMS, as the metadata center, is in charge of handling catalogs; the other computing engines only use catalogs while querying. Simply put, we extend the original `db_name.tbl_name` to `cat_name.db_name.tbl_name` in computing engines to support data from different systems or sources. So we do not plan to manage catalogs in other components now. > Add support of catalog related statements for Hive ql > - > > Key: HIVE-26227 > URL: https://issues.apache.org/jira/browse/HIVE-26227 > Project: Hive > Issue Type: Task > Components: Hive >Reporter: Wechar >Assignee: Wechar >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0-alpha-2 > > Time Spent: 20m > Remaining Estimate: 0h > > The catalog concept was introduced in Hive 3.0 to allow different systems to connect > to different catalogs in the metastore. But so far catalogs cannot be managed > through Hive QL; this task aims to implement the DDL statements related to > catalogs. > *Create Catalog* > {code:sql} > CREATE CATALOG [IF NOT EXISTS] catalog_name > LOCATION hdfs_path > [COMMENT catalog_comment]; > {code} > LOCATION is currently required when creating a new catalog. > *Alter Catalog* > {code:sql} > ALTER CATALOG catalog_name SET LOCATION hdfs_path; > {code} > Only the location metadata of a catalog can be altered. > *Drop Catalog* > {code:sql} > DROP CATALOG [IF EXISTS] catalog_name; > {code} > DROP CATALOG is always RESTRICT, which means DROP CATALOG will fail if there > are non-default databases in the catalog. > *Show Catalogs* > {code:sql} > SHOW CATALOGS [LIKE 'identifier_with_wildcards']; > {code} > SHOW CATALOGS lists all of the catalogs defined in the metastore. > The optional LIKE clause allows the list of catalogs to be filtered using a > regular expression. 
> *Describe Catalog* > {code:sql} > DESC[RIBE] CATALOG [EXTENDED] cat_name; > {code} > DESCRIBE CATALOG shows the name of the catalog, its comment (if one has been > set), and its root location on the filesystem. > EXTENDED also shows the create time. -- This message was sent by Atlassian Jira (v8.20.7#820007)
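Taken together, the statements described above could be exercised in one session roughly as follows (the catalog name and HDFS paths are illustrative, assuming the proposed syntax):

```sql
-- Create a catalog rooted at an illustrative HDFS path
CREATE CATALOG IF NOT EXISTS analytics_cat
LOCATION 'hdfs://namenode:8020/warehouse/analytics'
COMMENT 'catalog for the analytics engines';

-- Only the location metadata can be altered
ALTER CATALOG analytics_cat SET LOCATION 'hdfs://namenode:8020/warehouse/analytics_v2';

-- List and inspect catalogs
SHOW CATALOGS LIKE 'analytics*';
DESCRIBE CATALOG EXTENDED analytics_cat;

-- DROP is always RESTRICT: it fails while non-default databases remain
DROP CATALOG IF EXISTS analytics_cat;
```

Tables in such a catalog would then be referenced with the extended `cat_name.db_name.tbl_name` form described in the comment above.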
[jira] [Work logged] (HIVE-26217) Make CTAS use Direct Insert Semantics
[ https://issues.apache.org/jira/browse/HIVE-26217?focusedWorklogId=772403&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-772403 ] ASF GitHub Bot logged work on HIVE-26217: - Author: ASF GitHub Bot Created on: 19/May/22 11:38 Start Date: 19/May/22 11:38 Worklog Time Spent: 10m Work Description: SourabhBadhya commented on code in PR #3281: URL: https://github.com/apache/hive/pull/3281#discussion_r876944639
## ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java:
@@ -7592,6 +7594,22 @@ protected Operator genFileSinkPlan(String dest, QB qb, Operator input)
     destTableIsTransactional = tblProps != null && AcidUtils.isTablePropertyTransactional(tblProps);
     if (destTableIsTransactional) {
+      isNonNativeTable = AcidUtils.isNonNativeTable(tblProps);
+      boolean isCtas = tblDesc != null && tblDesc.isCTAS();
+      if (AcidUtils.isInsertOnlyTable(tblProps, true)) {
+        isMmTable = isMmCreate = true;
+      }
+      if (!isNonNativeTable && !destTableIsTemporary && isCtas) {
+        destTableIsFullAcid = AcidUtils.isFullAcidTable(tblProps);
+        acidOperation = getAcidType(dest);
+        isDirectInsert = isDirectInsert(destTableIsFullAcid, acidOperation);
+        boolean enableSuffixing = conf.getBoolVar(ConfVars.HIVE_ACID_CREATE_TABLE_USE_SUFFIX)
+            || conf.getBoolVar(ConfVars.HIVE_ACID_LOCKLESS_READS_ENABLED);
+        if (isDirectInsert || isMmTable) {
+          String location = tblDesc.getLocation();
+          destinationPath = location == null ? getCTASDestinationTableLocation(tblDesc, enableSuffixing) : new Path(location);
Review Comment: @pvary Thanks for pointing this out. I have updated the patch to handle the use of MetadataTransformer. Please check and let me know if there are any issues with it. 
Issue Time Tracking --- Worklog Id: (was: 772403) Time Spent: 2h 20m (was: 2h 10m) > Make CTAS use Direct Insert Semantics > - > > Key: HIVE-26217 > URL: https://issues.apache.org/jira/browse/HIVE-26217 > Project: Hive > Issue Type: Improvement >Reporter: Sourabh Badhya >Assignee: Sourabh Badhya >Priority: Major > Labels: pull-request-available > Time Spent: 2h 20m > Remaining Estimate: 0h > > CTAS on transactional tables currently does a copy from staging location to > table location. This can be avoided by using Direct Insert semantics. Added > support for suffixed table locations as well. -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Work logged] (HIVE-26217) Make CTAS use Direct Insert Semantics
[ https://issues.apache.org/jira/browse/HIVE-26217?focusedWorklogId=772401&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-772401 ] ASF GitHub Bot logged work on HIVE-26217: - Author: ASF GitHub Bot Created on: 19/May/22 11:37 Start Date: 19/May/22 11:37 Worklog Time Spent: 10m Work Description: SourabhBadhya commented on code in PR #3281: URL: https://github.com/apache/hive/pull/3281#discussion_r876944852 ## ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java: ## @@ -7592,6 +7594,22 @@ protected Operator genFileSinkPlan(String dest, QB qb, Operator input) destTableIsTransactional = tblProps != null && AcidUtils.isTablePropertyTransactional(tblProps); if (destTableIsTransactional) { +isNonNativeTable = AcidUtils.isNonNativeTable(tblProps); +boolean isCtas = tblDesc != null && tblDesc.isCTAS(); +if (AcidUtils.isInsertOnlyTable(tblProps, true)) { Review Comment: Updated. ## ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java: ## @@ -2876,6 +2876,12 @@ private static boolean isLockableTable(Table t) { } } + public static boolean isNonNativeTable(Map tblProps) { Review Comment: Updated. Issue Time Tracking --- Worklog Id: (was: 772401) Time Spent: 2h 10m (was: 2h) > Make CTAS use Direct Insert Semantics > - > > Key: HIVE-26217 > URL: https://issues.apache.org/jira/browse/HIVE-26217 > Project: Hive > Issue Type: Improvement >Reporter: Sourabh Badhya >Assignee: Sourabh Badhya >Priority: Major > Labels: pull-request-available > Time Spent: 2h 10m > Remaining Estimate: 0h > > CTAS on transactional tables currently does a copy from staging location to > table location. This can be avoided by using Direct Insert semantics. Added > support for suffixed table locations as well. -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Work logged] (HIVE-26217) Make CTAS use Direct Insert Semantics
[ https://issues.apache.org/jira/browse/HIVE-26217?focusedWorklogId=772397&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-772397 ] ASF GitHub Bot logged work on HIVE-26217: - Author: ASF GitHub Bot Created on: 19/May/22 11:34 Start Date: 19/May/22 11:34 Worklog Time Spent: 10m Work Description: SourabhBadhya commented on code in PR #3281: URL: https://github.com/apache/hive/pull/3281#discussion_r876944639 ## ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java: ## @@ -7592,6 +7594,22 @@ protected Operator genFileSinkPlan(String dest, QB qb, Operator input) destTableIsTransactional = tblProps != null && AcidUtils.isTablePropertyTransactional(tblProps); if (destTableIsTransactional) { +isNonNativeTable = AcidUtils.isNonNativeTable(tblProps); +boolean isCtas = tblDesc != null && tblDesc.isCTAS(); +if (AcidUtils.isInsertOnlyTable(tblProps, true)) { + isMmTable = isMmCreate = true; +} +if (!isNonNativeTable && !destTableIsTemporary && isCtas) { + destTableIsFullAcid = AcidUtils.isFullAcidTable(tblProps); + acidOperation = getAcidType(dest); + isDirectInsert = isDirectInsert(destTableIsFullAcid, acidOperation); + boolean enableSuffixing = conf.getBoolVar(ConfVars.HIVE_ACID_CREATE_TABLE_USE_SUFFIX) + || conf.getBoolVar(ConfVars.HIVE_ACID_LOCKLESS_READS_ENABLED); + if (isDirectInsert || isMmTable) { +String location = tblDesc.getLocation(); +destinationPath = location == null ? getCTASDestinationTableLocation(tblDesc, enableSuffixing) : new Path(location); Review Comment: @pvary Thanks for pointing this out. I have edited the patch to handle the use of MetadataTransformer. Please check and let me know if there are any issues with it. 
Issue Time Tracking --- Worklog Id: (was: 772397) Time Spent: 1h 40m (was: 1.5h) > Make CTAS use Direct Insert Semantics > - > > Key: HIVE-26217 > URL: https://issues.apache.org/jira/browse/HIVE-26217 > Project: Hive > Issue Type: Improvement >Reporter: Sourabh Badhya >Assignee: Sourabh Badhya >Priority: Major > Labels: pull-request-available > Time Spent: 1h 40m > Remaining Estimate: 0h > > CTAS on transactional tables currently does a copy from staging location to > table location. This can be avoided by using Direct Insert semantics. Added > support for suffixed table locations as well. -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Work logged] (HIVE-26217) Make CTAS use Direct Insert Semantics
[ https://issues.apache.org/jira/browse/HIVE-26217?focusedWorklogId=772398&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-772398 ] ASF GitHub Bot logged work on HIVE-26217: - Author: ASF GitHub Bot Created on: 19/May/22 11:34 Start Date: 19/May/22 11:34 Worklog Time Spent: 10m Work Description: SourabhBadhya commented on code in PR #3281: URL: https://github.com/apache/hive/pull/3281#discussion_r876944852 ## ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java: ## @@ -7592,6 +7594,22 @@ protected Operator genFileSinkPlan(String dest, QB qb, Operator input) destTableIsTransactional = tblProps != null && AcidUtils.isTablePropertyTransactional(tblProps); if (destTableIsTransactional) { +isNonNativeTable = AcidUtils.isNonNativeTable(tblProps); +boolean isCtas = tblDesc != null && tblDesc.isCTAS(); +if (AcidUtils.isInsertOnlyTable(tblProps, true)) { Review Comment: Done. Issue Time Tracking --- Worklog Id: (was: 772398) Time Spent: 1h 50m (was: 1h 40m) > Make CTAS use Direct Insert Semantics > - > > Key: HIVE-26217 > URL: https://issues.apache.org/jira/browse/HIVE-26217 > Project: Hive > Issue Type: Improvement >Reporter: Sourabh Badhya >Assignee: Sourabh Badhya >Priority: Major > Labels: pull-request-available > Time Spent: 1h 50m > Remaining Estimate: 0h > > CTAS on transactional tables currently does a copy from staging location to > table location. This can be avoided by using Direct Insert semantics. Added > support for suffixed table locations as well. -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Work logged] (HIVE-26217) Make CTAS use Direct Insert Semantics
[ https://issues.apache.org/jira/browse/HIVE-26217?focusedWorklogId=772400&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-772400 ] ASF GitHub Bot logged work on HIVE-26217: - Author: ASF GitHub Bot Created on: 19/May/22 11:34 Start Date: 19/May/22 11:34 Worklog Time Spent: 10m Work Description: SourabhBadhya commented on code in PR #3281: URL: https://github.com/apache/hive/pull/3281#discussion_r876943222 ## ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java: ## @@ -2876,6 +2876,12 @@ private static boolean isLockableTable(Table t) { } } + public static boolean isNonNativeTable(Map tblProps) { Review Comment: Done. Issue Time Tracking --- Worklog Id: (was: 772400) Time Spent: 2h (was: 1h 50m) > Make CTAS use Direct Insert Semantics > - > > Key: HIVE-26217 > URL: https://issues.apache.org/jira/browse/HIVE-26217 > Project: Hive > Issue Type: Improvement >Reporter: Sourabh Badhya >Assignee: Sourabh Badhya >Priority: Major > Labels: pull-request-available > Time Spent: 2h > Remaining Estimate: 0h > > CTAS on transactional tables currently does a copy from staging location to > table location. This can be avoided by using Direct Insert semantics. Added > support for suffixed table locations as well. -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Work logged] (HIVE-26217) Make CTAS use Direct Insert Semantics
[ https://issues.apache.org/jira/browse/HIVE-26217?focusedWorklogId=772396&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-772396 ] ASF GitHub Bot logged work on HIVE-26217: - Author: ASF GitHub Bot Created on: 19/May/22 11:32 Start Date: 19/May/22 11:32 Worklog Time Spent: 10m Work Description: SourabhBadhya commented on code in PR #3281: URL: https://github.com/apache/hive/pull/3281#discussion_r876943222 ## ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java: ## @@ -2876,6 +2876,12 @@ private static boolean isLockableTable(Table t) { } } + public static boolean isNonNativeTable(Map tblProps) { Review Comment: Done Issue Time Tracking --- Worklog Id: (was: 772396) Time Spent: 1.5h (was: 1h 20m) > Make CTAS use Direct Insert Semantics > - > > Key: HIVE-26217 > URL: https://issues.apache.org/jira/browse/HIVE-26217 > Project: Hive > Issue Type: Improvement >Reporter: Sourabh Badhya >Assignee: Sourabh Badhya >Priority: Major > Labels: pull-request-available > Time Spent: 1.5h > Remaining Estimate: 0h > > CTAS on transactional tables currently does a copy from staging location to > table location. This can be avoided by using Direct Insert semantics. Added > support for suffixed table locations as well. -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Work logged] (HIVE-26237) Check if replication cause metastore connection leakage.
[ https://issues.apache.org/jira/browse/HIVE-26237?focusedWorklogId=772367&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-772367 ] ASF GitHub Bot logged work on HIVE-26237: - Author: ASF GitHub Bot Created on: 19/May/22 10:40 Start Date: 19/May/22 10:40 Worklog Time Spent: 10m Work Description: hmangla98 commented on code in PR #3298: URL: https://github.com/apache/hive/pull/3298#discussion_r876899781 ## itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/BaseReplicationAcrossInstances.java: ## @@ -121,6 +122,7 @@ static void internalBeforeClassSetupExclusiveReplica(Map primary public static void classLevelTearDown() throws IOException { primary.close(); replica.close(); +Hive.getThreadLocal().close(true); Review Comment: Agreed. ## itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationScenarios.java: ## @@ -262,17 +263,20 @@ public static void tearDownAfterClass(){ // FIXME : should clean up TEST_PATH, but not doing it now, for debugging's sake //Clean up the warehouse after test run as we are restoring the warehouse path for other metastore creation Path warehousePath = new Path(MetastoreConf.getVar(hconf, MetastoreConf.ConfVars.WAREHOUSE)); -try { - warehousePath.getFileSystem(hconf).delete(warehousePath, true); -} catch (IOException e) { - -} Path warehousePathReplica = new Path(MetastoreConf.getVar(hconfMirror, MetastoreConf.ConfVars.WAREHOUSE)); try { + warehousePath.getFileSystem(hconf).delete(warehousePath, true); warehousePathReplica.getFileSystem(hconfMirror).delete(warehousePathReplica, true); } catch (IOException e) { } +Hive.getThreadLocal().close(true); Review Comment: Done Issue Time Tracking --- Worklog Id: (was: 772367) Time Spent: 2h 50m (was: 2h 40m) > Check if replication cause metastore connection leakage. 
> > > Key: HIVE-26237 > URL: https://issues.apache.org/jira/browse/HIVE-26237 > Project: Hive > Issue Type: Bug >Reporter: Haymant Mangla >Assignee: Haymant Mangla >Priority: Major > Labels: pull-request-available > Time Spent: 2h 50m > Remaining Estimate: 0h > > It is observed that after running replication unit tests, in some cases, > the final number of metastore connections is not logged as 0. > Sample test : TestReplicationScenarios.testBasic > The last entry of hive.log that records the connection count is as follows: > INFO [main] metastore.HiveMetaStoreClient: Closed a connection to metastore, > current connections: 3 -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Work logged] (HIVE-26237) Check if replication cause metastore connection leakage.
[ https://issues.apache.org/jira/browse/HIVE-26237?focusedWorklogId=772364&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-772364 ] ASF GitHub Bot logged work on HIVE-26237: - Author: ASF GitHub Bot Created on: 19/May/22 10:27 Start Date: 19/May/22 10:27 Worklog Time Spent: 10m Work Description: deniskuzZ commented on code in PR #3298: URL: https://github.com/apache/hive/pull/3298#discussion_r876888023
## itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationScenarios.java:
@@ -262,17 +263,20 @@ public static void tearDownAfterClass(){
     // FIXME : should clean up TEST_PATH, but not doing it now, for debugging's sake
     //Clean up the warehouse after test run as we are restoring the warehouse path for other metastore creation
     Path warehousePath = new Path(MetastoreConf.getVar(hconf, MetastoreConf.ConfVars.WAREHOUSE));
-    try {
-      warehousePath.getFileSystem(hconf).delete(warehousePath, true);
-    } catch (IOException e) {
-
-    }
     Path warehousePathReplica = new Path(MetastoreConf.getVar(hconfMirror, MetastoreConf.ConfVars.WAREHOUSE));
     try {
+      warehousePath.getFileSystem(hconf).delete(warehousePath, true);
       warehousePathReplica.getFileSystem(hconfMirror).delete(warehousePathReplica, true);
     } catch (IOException e) {
     }
+    Hive.getThreadLocal().close(true);
Review Comment: same as above
Issue Time Tracking --- Worklog Id: (was: 772364) Time Spent: 2h 40m (was: 2.5h) > Check if replication cause metastore connection leakage. > > > Key: HIVE-26237 > URL: https://issues.apache.org/jira/browse/HIVE-26237 > Project: Hive > Issue Type: Bug >Reporter: Haymant Mangla >Assignee: Haymant Mangla >Priority: Major > Labels: pull-request-available > Time Spent: 2h 40m > Remaining Estimate: 0h > > It is observed that after running replication unit tests, in some cases, > the final number of metastore connections is not logged as 0. 
> Sample test : TestReplicationScenarios.testBasic > The last entry of hive.log which records connection count is as follows: > INFO [main] metastore.HiveMetaStoreClient: Closed a connection to metastore, > current connections: 3 -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Work logged] (HIVE-26237) Check if replication cause metastore connection leakage.
[ https://issues.apache.org/jira/browse/HIVE-26237?focusedWorklogId=772363&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-772363 ] ASF GitHub Bot logged work on HIVE-26237: - Author: ASF GitHub Bot Created on: 19/May/22 10:26 Start Date: 19/May/22 10:26 Worklog Time Spent: 10m Work Description: deniskuzZ commented on code in PR #3298: URL: https://github.com/apache/hive/pull/3298#discussion_r876887257
## itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/BaseReplicationAcrossInstances.java:
@@ -121,6 +122,7 @@ static void internalBeforeClassSetupExclusiveReplica(Map primary
   public static void classLevelTearDown() throws IOException {
     primary.close();
     replica.close();
+    Hive.getThreadLocal().close(true);
Review Comment: Won't Hive.closeCurrent() do here? If not, Hive.getThreadLocal() might be null, so we should handle that.
Issue Time Tracking --- Worklog Id: (was: 772363) Time Spent: 2.5h (was: 2h 20m) > Check if replication cause metastore connection leakage. > > > Key: HIVE-26237 > URL: https://issues.apache.org/jira/browse/HIVE-26237 > Project: Hive > Issue Type: Bug >Reporter: Haymant Mangla >Assignee: Haymant Mangla >Priority: Major > Labels: pull-request-available > Time Spent: 2.5h > Remaining Estimate: 0h > > It is observed that after running replication unit tests, in some cases, > the final number of metastore connections is not logged as 0. > Sample test : TestReplicationScenarios.testBasic > The last entry of hive.log that records the connection count is as follows: > INFO [main] metastore.HiveMetaStoreClient: Closed a connection to metastore, > current connections: 3 -- This message was sent by Atlassian Jira (v8.20.7#820007)
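The null-safe teardown pattern suggested in the review above can be sketched with a stand-in class; the `Session` and `closeCurrent` names below are illustrative only, not Hive's actual `org.apache.hadoop.hive.ql.metadata.Hive` API:

```java
// A minimal sketch of null-safe thread-local teardown: close the
// thread-local instance only if one was actually created on this thread.
public class NullSafeCloseSketch {

    static class Session implements AutoCloseable {
        static final ThreadLocal<Session> CURRENT = new ThreadLocal<>();
        boolean closed = false;

        @Override
        public void close() {
            closed = true;
            CURRENT.remove(); // drop the thread-local reference on close
        }
    }

    // No-op when no thread-local instance exists, instead of an NPE.
    static void closeCurrent() {
        Session s = Session.CURRENT.get();
        if (s != null) {
            s.close();
        }
    }

    public static void main(String[] args) {
        closeCurrent(); // nothing created yet: must not throw

        Session s = new Session();
        Session.CURRENT.set(s);
        closeCurrent(); // closes and clears the instance

        System.out.println(s.closed && Session.CURRENT.get() == null);
    }
}
```

The point of the guard is exactly the review's concern: a test teardown that never touched the session should not fail on a null thread-local.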
[jira] [Updated] (HIVE-26224) Add support for ESRI GeoSpatial SERDE formats
[ https://issues.apache.org/jira/browse/HIVE-26224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-26224: -- Labels: pull-request-available (was: ) > Add support for ESRI GeoSpatial SERDE formats > - > > Key: HIVE-26224 > URL: https://issues.apache.org/jira/browse/HIVE-26224 > Project: Hive > Issue Type: Sub-task >Reporter: Ayush Saxena >Assignee: Ayush Saxena >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Add support to use ESRI geospatial serde formats -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Work logged] (HIVE-26224) Add support for ESRI GeoSpatial SERDE formats
[ https://issues.apache.org/jira/browse/HIVE-26224?focusedWorklogId=772354&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-772354 ] ASF GitHub Bot logged work on HIVE-26224: - Author: ASF GitHub Bot Created on: 19/May/22 10:05 Start Date: 19/May/22 10:05 Worklog Time Spent: 10m Work Description: ayushtkn opened a new pull request, #3300: URL: https://github.com/apache/hive/pull/3300 added esri geospatial serde formats Issue Time Tracking --- Worklog Id: (was: 772354) Remaining Estimate: 0h Time Spent: 10m > Add support for ESRI GeoSpatial SERDE formats > - > > Key: HIVE-26224 > URL: https://issues.apache.org/jira/browse/HIVE-26224 > Project: Hive > Issue Type: Sub-task >Reporter: Ayush Saxena >Assignee: Ayush Saxena >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > Add support to use ESRI geospatial serde formats -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Work logged] (HIVE-26046) MySQL's bit datatype is default to void datatype in hive
[ https://issues.apache.org/jira/browse/HIVE-26046?focusedWorklogId=772330&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-772330 ] ASF GitHub Bot logged work on HIVE-26046: - Author: ASF GitHub Bot Created on: 19/May/22 08:53 Start Date: 19/May/22 08:53 Worklog Time Spent: 10m Work Description: zhangbutao commented on code in PR #3276: URL: https://github.com/apache/hive/pull/3276#discussion_r876791141
## standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/dataconnector/jdbc/MySQLConnectorProvider.java:
@@ -90,10 +90,20 @@ protected String getDataType(String dbDataType, int size) {
     // map any db specific types here.
     switch (dbDataType.toLowerCase()) {
+    case "bit":
+      return toHiveBitType(size);
     default:
       mappedType = ColumnType.VOID_TYPE_NAME;
       break;
     }
     return mappedType;
   }
+
+  private String toHiveBitType(int size) {
+    if (size <= 1) {
+      return ColumnType.BOOLEAN_TYPE_NAME;
+    } else {
+      return ColumnType.BIGINT_TYPE_NAME;
Review Comment: For a MySQL bit column with size <= 1, users usually use the column to express false or true, so it makes sense to recognize it as boolean in Hive. For a bit column with size > 1, users may want to express different semantics, e.g. a weekly working calendar as bit(7); in that case we cannot simply convert bit(7) to boolean. I convert bit(n) to bigint in Hive, e.g. **b'111000' will be read as 56**. In this PR I added a qtest that selects the bit datatype, but HIVE-26192 blocked me; I hope you can give advice about HIVE-26192. > But also to be able to read this datatype, should we read as "select bin(col)" while reading from remote table? I think it is difficult and not worthwhile to read it as "select bin(col)". Because if we wanted to do this, we would need a type check when executing the query and then a conversion of the column type to bit. Also, Hive cannot recognize the bit type, and I think compilation would fail on the bit type before the query is even submitted to MySQL. 
That's just my basic thought. In addition, I have checked the Presto and Spark code: Presto converts every bit type to boolean, while Spark takes a similar approach to this PR. I think the Spark approach is more reasonable. https://github.com/trinodb/trino/blob/543ae143cadeb47ab03af4197dae9d00ff5baf7c/plugin/trino-mysql/src/main/java/io/trino/plugin/mysql/MySqlClient.java#L377-L379 https://github.com/apache/spark/blob/7309e76d8b95e306d6f3d2f611316b748949e9cf/sql/core/src/main/scala/org/apache/spark/sql/jdbc/MySQLDialect.scala#L64-#L71 Issue Time Tracking --- Worklog Id: (was: 772330) Time Spent: 1h 20m (was: 1h 10m) > MySQL's bit datatype is default to void datatype in hive > > > Key: HIVE-26046 > URL: https://issues.apache.org/jira/browse/HIVE-26046 > Project: Hive > Issue Type: Sub-task > Components: Standalone Metastore >Affects Versions: 4.0.0 >Reporter: Naveen Gangam >Assignee: zhangbutao >Priority: Major > Labels: pull-request-available > Time Spent: 1h 20m > Remaining Estimate: 0h > > describe on a table that contains a "bit" datatype gets mapped to void. We > need explicit conversion logic in the MySQL ConnectorProvider to map it to > a suitable datatype in hive. > {noformat}
> +-------------------+---------------+--------------------+
> | col_name          | data_type     | comment            |
> +-------------------+---------------+--------------------+
> | tbl_id            | bigint        | from deserializer  |
> | create_time       | int           | from deserializer  |
> | db_id             | bigint        | from deserializer  |
> | last_access_time  | int           | from deserializer  |
> | owner             | varchar(767)  | from deserializer  |
> | owner_type        | varchar(10)   | from deserializer  |
> | retention
[jira] [Work logged] (HIVE-26046) MySQL's bit datatype is default to void datatype in hive
[ https://issues.apache.org/jira/browse/HIVE-26046?focusedWorklogId=772329&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-772329 ] ASF GitHub Bot logged work on HIVE-26046: - Author: ASF GitHub Bot Created on: 19/May/22 08:49 Start Date: 19/May/22 08:49 Worklog Time Spent: 10m Work Description: zhangbutao commented on code in PR #3276: URL: https://github.com/apache/hive/pull/3276#discussion_r876791141
## standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/dataconnector/jdbc/MySQLConnectorProvider.java:
@@ -90,10 +90,20 @@ protected String getDataType(String dbDataType, int size) {
     // map any db specific types here.
     switch (dbDataType.toLowerCase()) {
+    case "bit":
+      return toHiveBitType(size);
     default:
       mappedType = ColumnType.VOID_TYPE_NAME;
       break;
     }
     return mappedType;
   }
+
+  private String toHiveBitType(int size) {
+    if (size <= 1) {
+      return ColumnType.BOOLEAN_TYPE_NAME;
+    } else {
+      return ColumnType.BIGINT_TYPE_NAME;
Review Comment: For a MySQL bit column with size <= 1, users usually use the column to express false or true, so it makes sense to recognize it as boolean in Hive. For a bit column with size > 1, users may want to express different semantics, e.g. a weekly working calendar as bit(7); in that case we cannot simply convert bit(7) to boolean. I convert bit(n) to bigint in Hive, e.g. **b'111000' will be read as 56**. In this PR I added a qtest that selects the bit datatype, but HIVE-26192 blocked me; I hope you can give advice about HIVE-26192. > But also to be able to read this datatype, should we read as "select bin(col)" while reading from remote table? I think it is difficult and not worthwhile to read it as "select bin(col)". Because if we wanted to do this, we would need a type check when executing the query and then a conversion of the column type to bit. Also, Hive cannot recognize the bit type, and I think compilation would fail on the bit type before the query is even submitted to MySQL. 
That's just my basic thought. In addition, I have checked the Presto and Spark code: Presto converts every bit type to boolean, while Spark takes a similar approach to this PR. I think the Spark approach is more reasonable. https://github.com/trinodb/trino/blob/543ae143cadeb47ab03af4197dae9d00ff5baf7c/plugin/trino-mysql/src/main/java/io/trino/plugin/mysql/MySqlClient.java#L377-L379 https://github.com/apache/spark/blob/7309e76d8b95e306d6f3d2f611316b748949e9cf/sql/core/src/main/scala/org/apache/spark/sql/jdbc/MySQLDialect.scala#L64-#L71 Issue Time Tracking --- Worklog Id: (was: 772329) Time Spent: 1h 10m (was: 1h) > MySQL's bit datatype is default to void datatype in hive > > > Key: HIVE-26046 > URL: https://issues.apache.org/jira/browse/HIVE-26046 > Project: Hive > Issue Type: Sub-task > Components: Standalone Metastore >Affects Versions: 4.0.0 >Reporter: Naveen Gangam >Assignee: zhangbutao >Priority: Major > Labels: pull-request-available > Time Spent: 1h 10m > Remaining Estimate: 0h > > describe on a table that contains a "bit" datatype gets mapped to void. We > need explicit conversion logic in the MySQL ConnectorProvider to map it to > a suitable datatype in hive. > {noformat}
> +-------------------+---------------+--------------------+
> | col_name          | data_type     | comment            |
> +-------------------+---------------+--------------------+
> | tbl_id            | bigint        | from deserializer  |
> | create_time       | int           | from deserializer  |
> | db_id             | bigint        | from deserializer  |
> | last_access_time  | int           | from deserializer  |
> | owner             | varchar(767)  | from deserializer  |
> | owner_type        | varchar(10)   | from deserializer  |
> | retention         | int
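The bigint mapping discussed in the HIVE-26046 review can be sanity-checked with plain Java: interpreting the bit pattern b'111000' as an integer yields 56, matching the review comment. This is just arithmetic on the bit pattern, not Hive's actual read path.

```java
// Sanity check for the proposed BIT(n) -> bigint mapping:
// the value of a bit column is the integer value of its bit pattern.
public class BitToBigintSketch {
    public static void main(String[] args) {
        long value = Long.parseLong("111000", 2); // b'111000' = 32+16+8
        System.out.println(value);

        // BIT(1) maps to boolean instead: nonzero bit pattern -> true
        boolean flag = Long.parseLong("1", 2) != 0;
        System.out.println(flag);
    }
}
```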