[jira] [Created] (HIVE-26545) hive.txn.manager changed
zengxl created HIVE-26545:
-
Summary: hive.txn.manager changed
Key: HIVE-26545
URL: https://issues.apache.org/jira/browse/HIVE-26545
Project: Hive
Issue Type: Bug
Components: HiveServer2
Affects Versions: 3.1.2
Reporter: zengxl

* All my clients and servers are configured as follows:
{code:java}
hive.support.concurrency false
hive.txn.manager org.apache.hadoop.hive.ql.lockmgr.DummyTxnManager
hive.compactor.initiator.on false
{code}
* However, the following message appears in the HiveServer2 log:
2022-09-19T00:10:02,222 INFO [09e6b2b3-4c12-40e8-98ed-417b2c790be9 HiveServer2-Handler-Pool: Thread-92]: metastore.HiveMetaStoreClient (:()) - Mestastore configuration hive.txn.manager changed from org.apache.hadoop.hive.ql.lockmgr.DummyTxnManager to org.apache.hadoop.hive.ql.lockmgr.DbTxnManager
* Why does Hive change this configuration?
** The configuration change leads to the following exception:
{code:java}
2022-09-19T02:27:46,331 INFO [394a5f2b-b657-4654-837b-482fa2bf947d HiveServer2-Handler-Pool: Thread-452]: session.SessionState (:()) - Resetting thread name to HiveServer2-Handler-Pool: Thread-452
2022-09-19T02:27:50,256 ERROR [HiveServer2-Background-Pool: Thread-7336]: ql.Driver (:()) - FAILED: Error in acquiring locks: Error communicating with the metastore
org.apache.hadoop.hive.ql.lockmgr.LockException: Error communicating with the metastore
  at org.apache.hadoop.hive.ql.lockmgr.DbLockManager.lock(DbLockManager.java:178)
  at org.apache.hadoop.hive.ql.lockmgr.DbTxnManager.acquireLocks(DbTxnManager.java:607)
  at org.apache.hadoop.hive.ql.lockmgr.DbTxnManager.acquireLocksWithHeartbeatDelay(DbTxnManager.java:623)
  at org.apache.hadoop.hive.ql.lockmgr.DbTxnManager.acquireLocks(DbTxnManager.java:276)
  at org.apache.hadoop.hive.ql.lockmgr.HiveTxnManagerImpl.acquireLocks(HiveTxnManagerImpl.java:76)
  at org.apache.hadoop.hive.ql.lockmgr.DbTxnManager.acquireLocks(DbTxnManager.java:93)
  at org.apache.hadoop.hive.ql.Driver.acquireLocks(Driver.java:1610)
  at org.apache.hadoop.hive.ql.Driver.lockAndRespond(Driver.java:1795)
  at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1965)
  at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1709)
  at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1703)
  at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:157)
  at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:224)
  at org.apache.hive.service.cli.operation.SQLOperation.access$700(SQLOperation.java:87)
  at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:316)
  at java.security.AccessController.doPrivileged(Native Method)
  at javax.security.auth.Subject.doAs(Subject.java:422)
  at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1878)
  at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:329)
  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
  at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.thrift.TApplicationException: Internal error processing lock
  at org.apache.thrift.TApplicationException.read(TApplicationException.java:111)
  at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:79)
  at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_lock(ThriftHiveMetastore.java:5299)
  at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.lock(ThriftHiveMetastore.java:5286)
  at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.lock(HiveMetaStoreClient.java:2568)
  at sun.reflect.GeneratedMethodAccessor112.invoke(Unknown Source)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:498)
  at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:212)
  at com.sun.proxy.$Proxy40.lock(Unknown Source)
  at sun.reflect.GeneratedMethodAccessor112.invoke(Unknown Source)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:498)
  at org.apache.hadoop.hive.metastore.HiveMetaStoreClient$SynchronizedHandler.invoke(HiveMetaStoreClient.java:2773)
  at com.sun.proxy.$Proxy40.lock(Unknown
{code}
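For illustration only (this is not Hive's actual code): the "changed from ... to ..." line above is the kind of message produced by diffing the configuration a client session started with against the values the session later holds. A minimal sketch, with hypothetical names (`ConfDiffDemo`, `changedKeys`):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ConfDiffDemo {
    // Return the keys whose values differ between the configuration the
    // client expects and what the session actually reports back.
    static Map<String, String[]> changedKeys(Map<String, String> expected,
                                             Map<String, String> actual) {
        Map<String, String[]> changed = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : expected.entrySet()) {
            String now = actual.get(e.getKey());
            if (now != null && !now.equals(e.getValue())) {
                changed.put(e.getKey(), new String[] {e.getValue(), now});
            }
        }
        return changed;
    }

    public static void main(String[] args) {
        Map<String, String> expected = new LinkedHashMap<>();
        expected.put("hive.support.concurrency", "false");
        expected.put("hive.txn.manager",
            "org.apache.hadoop.hive.ql.lockmgr.DummyTxnManager");

        Map<String, String> actual = new LinkedHashMap<>(expected);
        actual.put("hive.txn.manager",
            "org.apache.hadoop.hive.ql.lockmgr.DbTxnManager");

        // Produces a line of the same shape as the HiveMetaStoreClient log entry.
        for (Map.Entry<String, String[]> e : changedKeys(expected, actual).entrySet()) {
            System.out.println("configuration " + e.getKey() + " changed from "
                + e.getValue()[0] + " to " + e.getValue()[1]);
        }
    }
}
```

The point of the diff is that only keys present on both sides with differing values are reported, which is why a single overridden key such as hive.txn.manager shows up alone.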
[jira] [Created] (HIVE-25863) join result is null
zengxl created HIVE-25863:
-
Summary: join result is null
Key: HIVE-25863
URL: https://issues.apache.org/jira/browse/HIVE-25863
Project: Hive
Issue Type: Bug
Components: Hive
Affects Versions: 3.1.2
Environment: hadoop 3.2.1, hive 3.1.2
Reporter: zengxl

When I change the number of reducers, the query result changes. This happens with both inner join and left join: some of the join results for the third table are null. When there is only one reducer, the results are all correct.
With set hive.exec.reducers.bytes.per.reducer=256000 there is only one reducer; with set hive.exec.reducers.bytes.per.reducer=2560 there are four reducers.
Here are my SQL and data:
{code:java}
CREATE TABLE pdwd.hive_ah3_metastore_tbl_partitions_sds_test_20220112(
  tbl_id bigint COMMENT 'TBL_ID',
  tbl_create_time bigint COMMENT 'TBL_CREATE_TIME',
  db_id bigint COMMENT 'DB_ID',
  tbl_last_access_time bigint COMMENT 'TBL_LAST_ACCESS_TIME',
  owner string COMMENT 'OWNER',
  retention bigint COMMENT 'RETENTION',
  sd_id bigint COMMENT 'SD_ID',
  tbl_name string COMMENT 'TBL_NAME',
  tbl_type string COMMENT 'TBL_TYPE',
  view_expanded_text string COMMENT 'VIEW_EXPANDED_TEXT',
  view_original_text string COMMENT 'VIEW_ORIGINAL_TEXT',
  is_rewrite_enabled bigint COMMENT 'IS_REWRITE_ENABLED',
  tbl_owner_type string COMMENT 'TBL_OWNER_TYPE',
  cd_id bigint COMMENT 'CD_ID',
  input_format string COMMENT 'INPUT_FORMAT',
  is_compressed bigint COMMENT 'IS_COMPRESSED',
  is_storedassubdirectories bigint COMMENT 'IS_STOREDASSUBDIRECTORIES',
  tbl_or_part_location string COMMENT 'tbl_or_part_location',
  num_buckets bigint COMMENT 'NUM_BUCKETS',
  output_format string COMMENT 'OUTPUT_FORMAT',
  serde_id bigint COMMENT 'SERDE_ID',
  part_id bigint COMMENT 'PART_ID',
  part_create_time bigint COMMENT 'PART_CREATE_TIME',
  part_last_access_time bigint COMMENT 'PART_LAST_ACCESS_TIME',
  part_name string COMMENT 'PART_NAME')
ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat';

CREATE TABLE pdwd.test_partitions_2021_12_21_shuffle_1(
  part_id bigint, create_time bigint, last_access_time bigint, part_name string, sd_id bigint, tbl_id bigint)
ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat';

CREATE TABLE pdwd.test_tbls_2021_12_21_shuffle(
  tbl_id bigint, create_time bigint, db_id bigint, last_access_time bigint, owner string, retention bigint, sd_id bigint, tbl_name string, tbl_type string, view_expanded_text string, view_original_text string, is_rewrite_enabled bigint, owner_type string)
ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat';

CREATE TABLE pdwd.test_sds_2021_12_21_shuffle_1_new(
  sd_id bigint, cd_id bigint, input_format string, is_compressed bigint, is_storedassubdirectories bigint, _c5 string, num_buckets bigint, output_format string, serde_id bigint)
ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat';

set hive.stats.column.autogather=false;
set hive.exec.reducers.bytes.per.reducer=2560;
set hive.auto.convert.join=false;

insert overwrite table pdwd.hive_ah3_metastore_tbl_partitions_sds_test_20220112
select a.tbl_id, b.create_time as tbl_create_time, b.db_id, b.last_access_time as tbl_last_access_time, b.owner, b.retention, a.sd_id, b.tbl_name, b.tbl_type, b.view_expanded_text, b.view_original_text, b.is_rewrite_enabled, b.owner_type as tbl_owner_type, d.cd_id, d.input_format, d.is_compressed, d.is_storedassubdirectories, d.tbl_location, d.num_buckets, d.output_format, d.serde_id, a.part_id, a.create_time as part_create_time, a.last_access_time as part_last_access_time, a.part_name
from pdwd.test_partitions_2021_12_21_shuffle_1 a
left join pdwd.test_tbls_2021_12_21_shuffle b on a.tbl_id=b.tbl_id
left join pdwd.test_sds_2021_12_21_shuffle_1_new d on a.sd_id=d.sd_id;
{code}
Result of the erroneous execution:
{code:java}
select count(*) from pdwd.hive_ah3_metastore_tbl_partitions_sds_test_20220112 where output_format is null;
Query ID = op_20220112153733_5c3793c8-c0e0-4dda-9212-239b5bd66f19
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum
{code}
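This is not Hive's shuffle implementation, just a sketch of the invariant a reduce-side join depends on: every input table must route its rows to reducers by the same function of the join key, so matching keys always meet on one reducer no matter how many reducers there are. The names (`ShufflePartitionDemo`, `partitionFor`) are made up for illustration:

```java
public class ShufflePartitionDemo {
    // Reduce-side join routing: partition = hash(joinKey) mod numReducers.
    static int partitionFor(long joinKey, int numReducers) {
        return (int) ((Long.hashCode(joinKey) & Integer.MAX_VALUE) % numReducers);
    }

    public static void main(String[] args) {
        long sdId = 21229815L;  // a join key from the reported query
        for (int reducers : new int[] {1, 4}) {
            // Rows from table a and table d carrying the same sd_id must be
            // assigned the same partition; if they were not, the left join
            // would emit NULLs for the d side, as in the bug report.
            int fromA = partitionFor(sdId, reducers);
            int fromD = partitionFor(sdId, reducers);
            System.out.println(reducers + " reducer(s) -> partition " + fromA
                + (fromA == fromD ? " (keys meet)" : " (keys separated: join breaks)"));
        }
    }
}
```

Because both sides use the same `partitionFor`, changing the reducer count from 1 to 4 reshuffles rows but never separates equal keys; correct results should therefore not depend on hive.exec.reducers.bytes.per.reducer.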
[jira] [Created] (HIVE-25614) Mapjoin then left join, the result is incorrect
zengxl created HIVE-25614:
-
Summary: Mapjoin then left join, the result is incorrect
Key: HIVE-25614
URL: https://issues.apache.org/jira/browse/HIVE-25614
Project: Hive
Issue Type: Bug
Components: Hive
Affects Versions: 3.1.2
Reporter: zengxl

Currently I join 3 tables and find that the result of the left join is *{color:#de350b}null{color}*. Here is my SQL; its result is NULL:
{code:java}
CREATE TABLE `pdwd.pdwd_pf_hive_ah3_metastore_tbl_partitions_d_test_0831_3_tbl`(
  `tbl_id` bigint COMMENT 'TBL_ID',
  `tbl_create_time` bigint COMMENT 'TBL_CREATE_TIME',
  `db_id` bigint COMMENT 'DB_ID',
  `tbl_last_access_time` bigint COMMENT 'TBL_LAST_ACCESS_TIME',
  `owner` string COMMENT 'OWNER',
  `retention` bigint COMMENT 'RETENTION',
  `sd_id` bigint COMMENT 'SD_ID',
  `tbl_name` string COMMENT 'TBL_NAME',
  `tbl_type` string COMMENT 'TBL_TYPE',
  `view_expanded_text` string COMMENT 'VIEW_EXPANDED_TEXT',
  `view_original_text` string COMMENT 'VIEW_ORIGINAL_TEXT',
  `is_rewrite_enabled` bigint COMMENT 'IS_REWRITE_ENABLED',
  `tbl_owner_type` string COMMENT 'TBL_OWNER_TYPE',
  `cd_id` bigint COMMENT 'CD_ID',
  `input_format` string COMMENT 'INPUT_FORMAT',
  `is_compressed` bigint COMMENT 'IS_COMPRESSED',
  `is_storedassubdirectories` bigint COMMENT 'IS_STOREDASSUBDIRECTORIES',
  `tbl_or_part_location` string COMMENT 'tbl_or_part_location',
  `num_buckets` bigint COMMENT 'NUM_BUCKETS',
  `output_format` string COMMENT 'OUTPUT_FORMAT',
  `serde_id` bigint COMMENT 'SERDE_ID',
  `part_id` bigint COMMENT 'PART_ID',
  `part_create_time` bigint COMMENT 'PART_CREATE_TIME',
  `part_last_access_time` bigint COMMENT 'PART_LAST_ACCESS_TIME',
  `part_name` string COMMENT 'PART_NAME')
PARTITIONED BY (`pt` string)
ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat';

with tmp1 as (
  select part_id, create_time, last_access_time, part_name, sd_id, tbl_id
  from pods.pods_pf_hive_ah3_metastore_partitions_d where pt='2021-08-12'
), tmp2 as (
  select tbl_id, create_time, db_id, last_access_time, owner, retention, sd_id, tbl_name, tbl_type, view_expanded_text, view_original_text, is_rewrite_enabled, owner_type
  from pods.pods_pf_hive_ah3_metastore_tbls_d where pt='2021-08-12'
), tmp3 as (
  select sd_id, cd_id, input_format, is_compressed, is_storedassubdirectories, location, num_buckets, output_format, serde_id
  from pods.pods_pf_hive_ah3_metastore_sds_d where pt='2021-08-12'
)
insert overwrite table pdwd.pdwd_pf_hive_ah3_metastore_tbl_partitions_d_test_0831_3_tbl PARTITION(pt='2021-08-14')
select a.tbl_id, b.create_time as tbl_create_time, b.db_id, b.last_access_time as tbl_last_access_time, b.owner, b.retention, a.sd_id, b.tbl_name, b.tbl_type, b.view_expanded_text, b.view_original_text, b.is_rewrite_enabled, b.owner_type as tbl_owner_type, d.cd_id, d.input_format, d.is_compressed, d.is_storedassubdirectories, d.location as tbl_location, d.num_buckets, d.output_format, d.serde_id, a.part_id, a.create_time as part_create_time, a.last_access_time as part_last_access_time, a.part_name
from tmp1 a
left join tmp2 b on a.tbl_id=b.tbl_id
left join tmp3 d on a.sd_id=d.sd_id;
{code}
pods.pods_pf_hive_ah3_metastore_partitions_d, pods.pods_pf_hive_ah3_metastore_tbls_d, and pods.pods_pf_hive_ah3_metastore_sds_d come from the {color:#de350b}*Metastore*{color} tables partitions, tbls, and sds. The sizes of the three tables are as follows:
80.3 M 240.9 M hdfs://ctyunns4/apps/hive/warehouse/pods/pods_pf_hive_ah3_metastore_partitions_d/pt=2021-10-09/exchangis_hive_w__2585cbd4_8bf8_4fbb_8a90_f5a7939b62b3.snappy
179.8 K 539.5 K hdfs://ctyunns4/apps/hive/warehouse/pods/pods_pf_hive_ah3_metastore_tbls_d/pt=2021-10-09/exchangis_hive_w__8a62acaa_6f82_442e_97db_ce960833612f.snappy
94.3 M 282.9 M hdfs://ctyunns4/apps/hive/warehouse/pods/pods_pf_hive_ah3_metastore_sds_d/pt=2021-10-09/exchangis_hive_w__d25536e8_7018_4262_a00e_5af3b1f88925.snappy
The result is as follows; the columns selected from the pods.pods_pf_hive_ah3_metastore_sds_d table are *{color:#de350b}null{color}*:
{code:java}
hive> select * from pdwd.pdwd_pf_hive_ah3_metastore_tbl_partitions_d_test_0831_3_tbl where pt='2021-08-14' and sd_id=21229815;
OK
721213 1619403652 528181 0 op 0 21229815 pods_pf_yarn_logic_res_i_d MANAGED_TABLE NULL NULL 0 USER NULL NULL NULL NULL NULL NULL NULL NULL 20302818 1628697610 0 pt=2021-08-11 2021-08-14
{code}
In reality, the data in the pods.pods_pf_hive_ah3_metastore_sds_d table is {color:#de350b}*not null*{color}:
{code:java}
> select * from pods.pods_pf_hive_ah3_metastore_sds_d where pt='2021-08-12' and sd_id=21229815;
OK
Interrupting... Be patient, this might take some time. Press Ctrl+C again to kill JVM
21229815 726105 org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
{code}
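As a sanity check of the expected semantics (plain Java, not Hive code): a left join probes the build side with the join key, and when the key exists on the build side, as sd_id=21229815 does in the sds table, the matched columns must be returned, never NULL. The names here (`LeftJoinDemo`, `leftJoin`) are hypothetical:

```java
import java.util.HashMap;
import java.util.Map;

public class LeftJoinDemo {
    // Left join of one probe-side row against a build-side hash map, the way
    // a map join would evaluate it: a present key must yield its columns.
    static String leftJoin(long sdId, Map<Long, String> sds) {
        String match = sds.get(sdId);
        return match != null ? match : "NULL";  // NULL only when the key is absent
    }

    public static void main(String[] args) {
        Map<Long, String> sds = new HashMap<>();
        sds.put(21229815L, "org.apache.hadoop.hive.ql.io.orc.OrcInputFormat");

        // sd_id 21229815 exists in sds, so the expected join output carries the
        // OrcInputFormat row, not the NULLs the reporter observed.
        System.out.println(leftJoin(21229815L, sds));
        // A genuinely missing key is the only case where NULL is correct:
        System.out.println(leftJoin(1L, sds));
    }
}
```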
[jira] [Created] (HIVE-25559) to_unix_timestamp udf result incorrect
zengxl created HIVE-25559:
-
Summary: to_unix_timestamp udf result incorrect
Key: HIVE-25559
URL: https://issues.apache.org/jira/browse/HIVE-25559
Project: Hive
Issue Type: Bug
Components: UDF
Affects Versions: 3.1.2
Reporter: zengxl
Assignee: zengxl

When I use the *unix_timestamp* UDF, what it actually calls is the *to_unix_timestamp* UDF, and the returned result is incorrect. Here is my SQL:
{code:java}
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/hive/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/hive/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/hadoop-3.2.1/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Hive Session ID = 3a04a9cf-1fdb-4017-a4bb-14763a3163c7
Logging initialized using configuration in file:/usr/local/hive/conf/hive-log4j2.properties Async: true
Hive Session ID = 92ca916b-cfde-43b5-bd86-10d50ff7d861
Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
hive> select unix_timestamp('2021-09-24 00:00:00');
OK
1632441600
Time taken: 3.729 seconds, Fetched: 1 row(s)
{code}
Looking at the GenericUDFToUnixTimeStamp class code, I found that the time zone is fixed to {color:#de350b}UTC{color} instead of following the user's time zone. Time zones vary between users; my time zone is {color:#de350b}Asia/Shanghai{color}. Therefore, the function should use the user's time zone. Here is the code I modified:
{code:java}
SessionState ss = SessionState.get();
String timeZoneStr = ss.getConf().get("hive.local.time.zone");
if (timeZoneStr == null || timeZoneStr.trim().isEmpty() || timeZoneStr.toLowerCase().equals("local")) {
  timeZoneStr = System.getProperty("user.timezone");
}
formatter.setTimeZone(TimeZone.getTimeZone(timeZoneStr));
{code}
-- This message was sent by Atlassian Jira (v8.3.4#803005)
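The numbers in HIVE-25559 can be reproduced with plain java.time, independently of Hive (the class and method names below are made up for illustration): parsing '2021-09-24 00:00:00' as a UTC wall-clock time gives the 1632441600 the reporter saw, while parsing it in Asia/Shanghai (UTC+8) gives 1632412800, the value a user in that zone would expect.

```java
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;

public class ToUnixTimestampDemo {
    // Parse a wall-clock timestamp string in the given zone and
    // return the corresponding epoch seconds.
    static long toEpochSeconds(String ts, String zone) {
        DateTimeFormatter fmt = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");
        LocalDateTime local = LocalDateTime.parse(ts, fmt);
        return local.atZone(ZoneId.of(zone)).toEpochSecond();
    }

    public static void main(String[] args) {
        // Fixed UTC zone: reproduces the value the reporter got from Hive.
        System.out.println(toEpochSeconds("2021-09-24 00:00:00", "UTC"));           // 1632441600
        // Session time zone Asia/Shanghai: 8 hours (28800 s) earlier in epoch terms.
        System.out.println(toEpochSeconds("2021-09-24 00:00:00", "Asia/Shanghai")); // 1632412800
    }
}
```

The 28800-second gap between the two outputs is exactly the UTC+8 offset, which is what the proposed patch recovers by reading hive.local.time.zone instead of hard-coding UTC.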
[jira] [Created] (HIVE-24326) HiveServer memory leak
zengxl created HIVE-24326:
-
Summary: HiveServer memory leak
Key: HIVE-24326
URL: https://issues.apache.org/jira/browse/HIVE-24326
Project: Hive
Issue Type: Bug
Components: HiveServer2
Affects Versions: 1.1.0
Reporter: zengxl
Attachments: QQ图片20201029160447.png

After running for a while, the HiveServer2 instances in our production environment fill up the JVM heap, which makes them unresponsive.
-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-21452) Loss of query conditions when EXISTS is present
zengxl created HIVE-21452:
-
Summary: Loss of query conditions when EXISTS is present
Key: HIVE-21452
URL: https://issues.apache.org/jira/browse/HIVE-21452
Project: Hive
Issue Type: Bug
Affects Versions: 1.1.0
Reporter: zengxl

In our production environment, a query joins four tables and the conditions include an EXISTS. We found that two of the other conditions were lost. The following two conditions are missing: {color:#f79232}t2.cust_no is null and t4.cust_level not in ('4','5'){color}
In the test environment, I prepared the data of the following four tables and reproduced the loss of a condition.
tables:
test_table1 (cust_no, name):
60001,lisa
60002,tina
60003,kylin
60004,jeny
60005,john
60006,jamse
test_table2 (cust_no, acct_type):
60001,1
60001,1
60001,2
60002,1
60003,2
60003,3
test_table3 (cust_no):
60001
60002
60003
60004
60005
60007
test_table4 (cust_no, cust_level):
60001,1
60002,2
60003,3
60004,4
60005,5
create table tmp.test_table1(cust_no string,name string);
create table tmp.test_table2(cust_no string,acct_type string);
create table tmp.test_table3(cust_no string);
create table tmp.test_table4(cust_no string,cust_level string);
insert into tmp.test_table1 select '60001','lisa';
insert into tmp.test_table1 select '60002','tina';
insert into tmp.test_table1 select '60003','kylin';
insert into tmp.test_table1 select '60004','jeny';
insert into tmp.test_table1 select '60005','john';
insert into tmp.test_table1 select '60006','jamse';
insert into tmp.test_table2 select '60001','1';
insert into tmp.test_table2 select '60001','1';
insert into tmp.test_table2 select '60001','2';
insert into tmp.test_table2 select '60002','1';
insert into tmp.test_table2 select '60003','2';
insert into tmp.test_table2 select '60002','3';
insert into tmp.test_table3 select '60001';
insert into tmp.test_table3 select '60002';
insert into tmp.test_table3 select '60003';
insert into tmp.test_table3 select '60004';
insert into tmp.test_table3 select '60005';
insert into tmp.test_table3 select '60007';
insert into tmp.test_table4 select '60001','1';
insert into tmp.test_table4 select '60002','2';
insert into tmp.test_table4 select '60003','3';
insert into tmp.test_table4 select '60004','4';
insert into tmp.test_table4 select '60005','5';
Here is my query SQL, with map join turned off:
set hive.auto.convert.join=false;
select t1.cust_no as cust_no, t2.cust_no as custNO, t1.name
from tmp.test_table1 t1
left join tmp.test_table2 t2 on t1.cust_no=t2.cust_no and t2.acct_type='1'
left join tmp.test_table4 t4 on t1.cust_no=t4.cust_no
where t2.cust_no is null
and t4.cust_level not in ('4','5')
and exists (select 1 from tmp.test_table3 t3 where t1.cust_no=t3.cust_no)
All I want is the row with cust_no 60003, but the result also includes 60004 and 60005, which is wrong. {color:#f79232}In my production environment, 60001 came out as well, because the condition t2.cust_no is null was lost.{color}
{color:#f6c342}Viewing the execution plan, the condition t4.cust_level not in ('4','5') is missing.{color}
*result:*
60003 NULL kylin
60003 NULL kylin
60003 NULL kylin
60004 NULL jeny
60005 NULL john
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
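The expected result can be checked outside Hive. Below is a plain-Java sketch (the class `ExistsJoinDemo` is hypothetical, not Hive code) that applies the three WHERE conditions to the sample rows as SQL defines them, including the rule that NOT IN over a NULL cust_level (the 60006 row, absent from test_table4) filters the row out:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class ExistsJoinDemo {
    // Evaluate the query's WHERE clause over the sample data; a correct plan
    // must keep only cust_no 60003.
    static List<String> run() {
        List<String[]> t1 = List.of(
            new String[] {"60001", "lisa"}, new String[] {"60002", "tina"},
            new String[] {"60003", "kylin"}, new String[] {"60004", "jeny"},
            new String[] {"60005", "john"}, new String[] {"60006", "jamse"});
        Set<String> t2Type1 = Set.of("60001", "60002");   // t2 rows with acct_type='1'
        Map<String, String> t4 = Map.of("60001", "1", "60002", "2",
            "60003", "3", "60004", "4", "60005", "5");
        Set<String> t3 = Set.of("60001", "60002", "60003", "60004", "60005", "60007");

        List<String> result = new ArrayList<>();
        for (String[] row : t1) {
            String custNo = row[0];
            boolean t2IsNull = !t2Type1.contains(custNo);  // t2.cust_no is null
            String level = t4.get(custNo);
            // NOT IN over a NULL cust_level is unknown, so the row is dropped.
            boolean levelOk = level != null && !level.equals("4") && !level.equals("5");
            boolean inT3 = t3.contains(custNo);            // exists (select 1 ...)
            if (t2IsNull && levelOk && inT3) {
                result.add(custNo + " NULL " + row[1]);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(run());  // [60003 NULL kylin]
    }
}
```

Running this confirms the reporter's expectation: only the 60003 row survives all three conditions, so the 60004 and 60005 rows in Hive's output show that the t4.cust_level predicate was dropped from the plan.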