[jira] [Created] (HIVE-26545) hive.txn.manager changed

2022-09-18 Thread zengxl (Jira)
zengxl created HIVE-26545:
-

 Summary: hive.txn.manager changed
 Key: HIVE-26545
 URL: https://issues.apache.org/jira/browse/HIVE-26545
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Affects Versions: 3.1.2
Reporter: zengxl


* All my clients and servers are set as follows
{code:xml}
<property>
    <name>hive.support.concurrency</name>
    <value>false</value>
</property>
<property>
    <name>hive.txn.manager</name>
    <value>org.apache.hadoop.hive.ql.lockmgr.DummyTxnManager</value>
</property>
<property>
    <name>hive.compactor.initiator.on</name>
    <value>false</value>
</property>
{code}

 * However, the following information appears in the HiveServer2 server log:
   2022-09-19T00:10:02,222 INFO  [09e6b2b3-4c12-40e8-98ed-417b2c790be9 HiveServer2-Handler-Pool: Thread-92]: metastore.HiveMetaStoreClient (:()) - Mestastore configuration hive.txn.manager changed from org.apache.hadoop.hive.ql.lockmgr.DummyTxnManager to org.apache.hadoop.hive.ql.lockmgr.DbTxnManager

 * Why would you want to change this configuration?

 ** The following exception occurs due to the configuration change
{code:java}
2022-09-19T02:27:46,331 INFO  [394a5f2b-b657-4654-837b-482fa2bf947d 
HiveServer2-Handler-Pool: Thread-452]: session.SessionState (:()) - Resetting 
thread name to  HiveServer2-Handler-Pool: Thread-452
2022-09-19T02:27:50,256 ERROR [HiveServer2-Background-Pool: Thread-7336]: 
ql.Driver (:()) - FAILED: Error in acquiring locks: Error communicating with 
the metastore
org.apache.hadoop.hive.ql.lockmgr.LockException: Error communicating with the 
metastore
        at 
org.apache.hadoop.hive.ql.lockmgr.DbLockManager.lock(DbLockManager.java:178)
        at 
org.apache.hadoop.hive.ql.lockmgr.DbTxnManager.acquireLocks(DbTxnManager.java:607)
        at 
org.apache.hadoop.hive.ql.lockmgr.DbTxnManager.acquireLocksWithHeartbeatDelay(DbTxnManager.java:623)
        at 
org.apache.hadoop.hive.ql.lockmgr.DbTxnManager.acquireLocks(DbTxnManager.java:276)
        at 
org.apache.hadoop.hive.ql.lockmgr.HiveTxnManagerImpl.acquireLocks(HiveTxnManagerImpl.java:76)
        at 
org.apache.hadoop.hive.ql.lockmgr.DbTxnManager.acquireLocks(DbTxnManager.java:93)
        at org.apache.hadoop.hive.ql.Driver.acquireLocks(Driver.java:1610)
        at org.apache.hadoop.hive.ql.Driver.lockAndRespond(Driver.java:1795)
        at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1965)
        at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1709)
        at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1703)
        at 
org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:157)
        at 
org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:224)
        at 
org.apache.hive.service.cli.operation.SQLOperation.access$700(SQLOperation.java:87)
        at 
org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:316)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1878)
        at 
org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:329)
        at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.thrift.TApplicationException: Internal error processing 
lock
        at 
org.apache.thrift.TApplicationException.read(TApplicationException.java:111)
        at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:79)
        at 
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_lock(ThriftHiveMetastore.java:5299)
        at 
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.lock(ThriftHiveMetastore.java:5286)
        at 
org.apache.hadoop.hive.metastore.HiveMetaStoreClient.lock(HiveMetaStoreClient.java:2568)
        at sun.reflect.GeneratedMethodAccessor112.invoke(Unknown Source)
        at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at 
org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:212)
        at com.sun.proxy.$Proxy40.lock(Unknown Source)
        at sun.reflect.GeneratedMethodAccessor112.invoke(Unknown Source)
        at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at 
org.apache.hadoop.hive.metastore.HiveMetaStoreClient$SynchronizedHandler.invoke(HiveMetaStoreClient.java:2773)
        at com.sun.proxy.$Proxy40.lock(Unknown 

[jira] [Created] (HIVE-25863) join result is null

2022-01-12 Thread zengxl (Jira)
zengxl created HIVE-25863:
-

 Summary: join result is null
 Key: HIVE-25863
 URL: https://issues.apache.org/jira/browse/HIVE-25863
 Project: Hive
  Issue Type: Bug
  Components: Hive
Affects Versions: 3.1.2
 Environment: hadoop 3.2.1

hive 3.1.2
Reporter: zengxl


When I change the number of reducers, the query result changes. The problem 
appears with both inner join and left join: some of the join results for the 
third table come back null. When there is only one reducer, the results are all correct.

With set hive.exec.reducers.bytes.per.reducer=256000 there is only one reducer;

with set hive.exec.reducers.bytes.per.reducer=2560 there are four reducers.
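For context (my note, not from the report): on Hive-on-MR the reducer count is estimated as roughly min(ceil(total input size / hive.exec.reducers.bytes.per.reducer), hive.exec.reducers.max), so a tiny per-reducer budget such as 2560 bytes spreads even a small input over several reducers. A sketch of the two reproduction settings:
{code:sql}
-- single reducer: per the report, the join output is correct
set hive.exec.reducers.bytes.per.reducer=256000;

-- roughly four reducers on the same input: per the report, some sds columns come back NULL
set hive.exec.reducers.bytes.per.reducer=2560;
{code}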

Here is my SQL and data
{code:java}
CREATE TABLE pdwd.hive_ah3_metastore_tbl_partitions_sds_test_20220112(
  tbl_id bigint COMMENT 'TBL_ID', 
  tbl_create_time bigint COMMENT 'TBL_CREATE_TIME', 
  db_id bigint COMMENT 'DB_ID', 
  tbl_last_access_time bigint COMMENT 'TBL_LAST_ACCESS_TIME', 
  owner string COMMENT 'OWNER', 
  retention bigint COMMENT 'RETENTION', 
  sd_id bigint COMMENT 'SD_ID', 
  tbl_name string COMMENT 'TBL_NAME', 
  tbl_type string COMMENT 'TBL_TYPE', 
  view_expanded_text string COMMENT 'VIEW_EXPANDED_TEXT', 
  view_original_text string COMMENT 'VIEW_ORIGINAL_TEXT', 
  is_rewrite_enabled bigint COMMENT 'IS_REWRITE_ENABLED', 
  tbl_owner_type string COMMENT 'TBL_OWNER_TYPE', 
  cd_id bigint COMMENT 'CD_ID', 
  input_format string COMMENT 'INPUT_FORMAT', 
  is_compressed bigint COMMENT 'IS_COMPRESSED', 
  is_storedassubdirectories bigint COMMENT 'IS_STOREDASSUBDIRECTORIES', 
  tbl_or_part_location string COMMENT 'tbl_or_part_location', 
  num_buckets bigint COMMENT 'NUM_BUCKETS', 
  output_format string COMMENT 'OUTPUT_FORMAT', 
  serde_id bigint COMMENT 'SERDE_ID', 
  part_id bigint COMMENT 'PART_ID', 
  part_create_time bigint COMMENT 'PART_CREATE_TIME', 
  part_last_access_time bigint COMMENT 'PART_LAST_ACCESS_TIME', 
  part_name string COMMENT 'PART_NAME')
ROW FORMAT SERDE 
  'org.apache.hadoop.hive.ql.io.orc.OrcSerde' 
STORED AS INPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat' 
OUTPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat';
  
CREATE TABLE pdwd.test_partitions_2021_12_21_shuffle_1(
  part_id bigint, 
  create_time bigint, 
  last_access_time bigint, 
  part_name string, 
  sd_id bigint, 
  tbl_id bigint)
ROW FORMAT SERDE 
  'org.apache.hadoop.hive.ql.io.orc.OrcSerde' 
STORED AS INPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat' 
OUTPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat';
  
CREATE TABLE pdwd.test_tbls_2021_12_21_shuffle(
  tbl_id bigint, 
  create_time bigint, 
  db_id bigint, 
  last_access_time bigint, 
  owner string, 
  retention bigint, 
  sd_id bigint, 
  tbl_name string, 
  tbl_type string, 
  view_expanded_text string, 
  view_original_text string, 
  is_rewrite_enabled bigint, 
  owner_type string)
ROW FORMAT SERDE 
  'org.apache.hadoop.hive.ql.io.orc.OrcSerde' 
STORED AS INPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat' 
OUTPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat';
  
CREATE TABLE pdwd.test_sds_2021_12_21_shuffle_1_new(
  sd_id bigint, 
  cd_id bigint, 
  input_format string, 
  is_compressed bigint, 
  is_storedassubdirectories bigint, 
  _c5 string, 
  num_buckets bigint, 
  output_format string, 
  serde_id bigint)
ROW FORMAT SERDE 
  'org.apache.hadoop.hive.ql.io.orc.OrcSerde' 
STORED AS INPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat' 
OUTPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat';


set hive.stats.column.autogather=false;
set hive.exec.reducers.bytes.per.reducer=2560;
set hive.auto.convert.join=false;  
insert overwrite table pdwd.hive_ah3_metastore_tbl_partitions_sds_test_20220112
select
a.tbl_id,
b.create_time as tbl_create_time,
b.db_id,
b.last_access_time as tbl_last_access_time,
b.owner,
b.retention,
a.sd_id,
b.tbl_name,
b.tbl_type,
b.view_expanded_text,
b.view_original_text,
b.is_rewrite_enabled,
b.owner_type as tbl_owner_type,
d.cd_id,
d.input_format,
d.is_compressed,
d.is_storedassubdirectories,
d.tbl_location,
d.num_buckets,
d.output_format,
d.serde_id,
a.part_id,
a.create_time as part_create_time,
a.last_access_time as part_last_access_time,
a.part_name
from pdwd.test_partitions_2021_12_21_shuffle_1 a
left join pdwd.test_tbls_2021_12_21_shuffle b on a.tbl_id=b.tbl_id
left join pdwd.test_sds_2021_12_21_shuffle_1_new d on a.sd_id=d.sd_id; {code}
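A minimal cross-check after the insert above (my sketch, not part of the original report; it assumes matching sd_id rows in the source sds table always carry a non-NULL output_format):
{code:sql}
-- 0 if the left join carried every matching sds row over;
-- per the description it becomes non-zero when the insert ran with four reducers
select count(*)
from pdwd.hive_ah3_metastore_tbl_partitions_sds_test_20220112 t
join pdwd.test_sds_2021_12_21_shuffle_1_new d
  on t.sd_id = d.sd_id
where t.output_format is null
  and d.output_format is not null;
{code}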
Checking for the erroneous rows (NULL output_format):
{code:java}
select count(*) from pdwd.hive_ah3_metastore_tbl_partitions_sds_test_20220112 
where output_format is  null;
Query ID = op_20220112153733_5c3793c8-c0e0-4dda-9212-239b5bd66f19
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=
In order to limit the maximum

[jira] [Created] (HIVE-25614) Mapjoin then left join, the result is incorrect

2021-10-13 Thread zengxl (Jira)
zengxl created HIVE-25614:
-

 Summary: Mapjoin then left join, the result is incorrect
 Key: HIVE-25614
 URL: https://issues.apache.org/jira/browse/HIVE-25614
 Project: Hive
  Issue Type: Bug
  Components: Hive
Affects Versions: 3.1.2
Reporter: zengxl


Currently I join 3 tables and find that the result of the left join is 
*{color:#de350b}null{color}*.
Here is my SQL; the result of this SQL is NULL:
{code:java}
CREATE TABLE `pdwd.pdwd_pf_hive_ah3_metastore_tbl_partitions_d_test_0831_3_tbl`(
  `tbl_id` bigint COMMENT 'TBL_ID', 
  `tbl_create_time` bigint COMMENT 'TBL_CREATE_TIME', 
  `db_id` bigint COMMENT 'DB_ID', 
  `tbl_last_access_time` bigint COMMENT 'TBL_LAST_ACCESS_TIME', 
  `owner` string COMMENT 'OWNER', 
  `retention` bigint COMMENT 'RETENTION', 
  `sd_id` bigint COMMENT 'SD_ID', 
  `tbl_name` string COMMENT 'TBL_NAME', 
  `tbl_type` string COMMENT 'TBL_TYPE', 
  `view_expanded_text` string COMMENT 'VIEW_EXPANDED_TEXT', 
  `view_original_text` string COMMENT 'VIEW_ORIGINAL_TEXT', 
  `is_rewrite_enabled` bigint COMMENT 'IS_REWRITE_ENABLED', 
  `tbl_owner_type` string COMMENT 'TBL_OWNER_TYPE', 
  `cd_id` bigint COMMENT 'CD_ID', 
  `input_format` string COMMENT 'INPUT_FORMAT', 
  `is_compressed` bigint COMMENT 'IS_COMPRESSED', 
  `is_storedassubdirectories` bigint COMMENT 'IS_STOREDASSUBDIRECTORIES', 
  `tbl_or_part_location` string COMMENT 'tbl_or_part_location', 
  `num_buckets` bigint COMMENT 'NUM_BUCKETS', 
  `output_format` string COMMENT 'OUTPUT_FORMAT', 
  `serde_id` bigint COMMENT 'SERDE_ID',
  `part_id` bigint COMMENT 'PART_ID', 
  `part_create_time` bigint COMMENT 'PART_CREATE_TIME', 
  `part_last_access_time` bigint COMMENT 'PART_LAST_ACCESS_TIME', 
  `part_name` string COMMENT 'PART_NAME')
PARTITIONED BY ( 
  `pt` string)
ROW FORMAT SERDE 
  'org.apache.hadoop.hive.ql.io.orc.OrcSerde' 
STORED AS INPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat' 
OUTPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat';

with tmp1 as (
select
part_id,
create_time,
last_access_time,
part_name,
sd_id,
tbl_id
from
pods.pods_pf_hive_ah3_metastore_partitions_d
where
pt='2021-08-12'
),
tmp2 as (
select
tbl_id,
create_time,
db_id,
last_access_time,
owner,
retention,
sd_id,
tbl_name,
tbl_type,
view_expanded_text,
view_original_text,
is_rewrite_enabled,
owner_type
from
pods.pods_pf_hive_ah3_metastore_tbls_d
where
pt='2021-08-12'
),
tmp3 as (
select
sd_id,
cd_id,
input_format,
is_compressed,
is_storedassubdirectories,
location,
num_buckets,
output_format,
serde_id
from
pods.pods_pf_hive_ah3_metastore_sds_d
where
pt='2021-08-12'
)
insert overwrite table pdwd.pdwd_pf_hive_ah3_metastore_tbl_partitions_d_test_0831_3_tbl
PARTITION(pt='2021-08-14')
select
a.tbl_id,
b.create_time as tbl_create_time,
b.db_id,
b.last_access_time as tbl_last_access_time,
b.owner,
b.retention,
a.sd_id,
b.tbl_name,
b.tbl_type,
b.view_expanded_text,
b.view_original_text,
b.is_rewrite_enabled,
b.owner_type as tbl_owner_type,
d.cd_id,
d.input_format,
d.is_compressed,
d.is_storedassubdirectories,
d.location as tbl_location,
d.num_buckets,
d.output_format,
d.serde_id,
a.part_id,
a.create_time as part_create_time,
a.last_access_time as part_last_access_time,
a.part_name
from tmp1 a 
left join tmp2 b on a.tbl_id=b.tbl_id
left join tmp3 d on a.sd_id=d.sd_id;
{code}
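One diagnostic that follows from the summary (my sketch, not a step taken in the report): rerun the same insert with automatic map-join conversion disabled and re-check the affected row, to see whether the map-join-then-left-join combination is the trigger.
{code:sql}
set hive.auto.convert.join=false;   -- force plain shuffle joins for the INSERT ... SELECT above

-- after re-running the insert, the sds-side columns of the row reported below (sd_id=21229815)
-- should no longer be NULL if the map join was the trigger
select input_format, output_format, serde_id
from pdwd.pdwd_pf_hive_ah3_metastore_tbl_partitions_d_test_0831_3_tbl
where pt='2021-08-14' and sd_id=21229815;
{code}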
pods.pods_pf_hive_ah3_metastore_partitions_d, pods.pods_pf_hive_ah3_metastore_tbls_d 
and pods.pods_pf_hive_ah3_metastore_sds_d are loaded from the {color:#de350b}*Metastore*{color} tables partitions, tbls and sds.

The sizes of the three tables are as follows:

80.3 M 240.9 M 
hdfs://ctyunns4/apps/hive/warehouse/pods/pods_pf_hive_ah3_metastore_partitions_d/pt=2021-10-09/exchangis_hive_w__2585cbd4_8bf8_4fbb_8a90_f5a7939b62b3.snappy
179.8 K 539.5 K 
hdfs://ctyunns4/apps/hive/warehouse/pods/pods_pf_hive_ah3_metastore_tbls_d/pt=2021-10-09/exchangis_hive_w__8a62acaa_6f82_442e_97db_ce960833612f.snappy
94.3 M 282.9 M 
hdfs://ctyunns4/apps/hive/warehouse/pods/pods_pf_hive_ah3_metastore_sds_d/pt=2021-10-09/exchangis_hive_w__d25536e8_7018_4262_a00e_5af3b1f88925.snappy

The result is as follows; the columns selected from the pods.pods_pf_hive_ah3_metastore_sds_d 
table are *{color:#de350b}null{color}*:
{code:java}
hive> select * from 
pdwd.pdwd_pf_hive_ah3_metastore_tbl_partitions_d_test_0831_3_tbl where 
pt='2021-08-14' and sd_id=21229815;
OK
721213 1619403652 528181 0 op 0 21229815 pods_pf_yarn_logic_res_i_d 
MANAGED_TABLE NULL NULL 0 USER NULL NULL NULL NULL NULL NULL NULL NULL 20302818 
1628697610 0 pt=2021-08-11 2021-08-14
{code}
In reality, the corresponding data in the pods.pods_pf_hive_ah3_metastore_sds_d table 
is {color:#de350b}*not null*{color}:
{code:java}
> select * from pods.pods_pf_hive_ah3_metastore_sds_d where pt='2021-08-12' and 
> sd_id=21229815;
OK
Interrupting... 
Be patient, this might take some time.
Press Ctrl+C again to kill JVM
21229815 726105 org.apache.hadoop.hive.ql.io.orc.OrcInputFormat 

[jira] [Created] (HIVE-25559) to_unix_timestamp udf result incorrect

2021-09-26 Thread zengxl (Jira)
zengxl created HIVE-25559:
-

 Summary: to_unix_timestamp udf result incorrect
 Key: HIVE-25559
 URL: https://issues.apache.org/jira/browse/HIVE-25559
 Project: Hive
  Issue Type: Bug
  Components: UDF
Affects Versions: 3.1.2
Reporter: zengxl
Assignee: zengxl


When I use the *unix_timestamp* UDF, what it actually calls is the 
*to_unix_timestamp* UDF. The returned result is incorrect. Here is my SQL:
{code:java}
Hive-on-MR is deprecated in Hive 2 and may not be available in the future 
versions. Consider using a different execution engine (i.e. spark, tez) or 
using Hive 1.X releases.
hive> select unix_timestamp('2021-09-24 00:00:00');
OK
1632441600
Time taken: 3.729 seconds, Fetched: 1 row(s)
{code}
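For reference, the arithmetic behind calling this result incorrect (my numbers, not from the report): the epoch second for 2021-09-24 00:00:00 UTC is 1632441600, which is exactly what is returned, while midnight in Asia/Shanghai (UTC+8) is 1632441600 - 8*3600 = 1632412800.
{code:sql}
-- in a session whose local zone is Asia/Shanghai, the expected value is the local interpretation:
select unix_timestamp('2021-09-24 00:00:00');
-- actual:   1632441600  (2021-09-24 00:00:00 UTC)
-- expected: 1632412800  (2021-09-24 00:00:00 Asia/Shanghai)
{code}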
Looking at the GenericUDFToUnixTimeStamp class, I found that the time zone is fixed to 
{color:#de350b}UTC{color} instead of following the user's time zone. Time zones vary 
by user; my time zone is {color:#de350b}Asia/Shanghai{color}. Therefore, the function 
should use the user's time zone. Here is the code I modified:
{code:java}
// Use the session-level hive.local.time.zone if set; otherwise fall back to the JVM's
// time zone instead of a hard-coded UTC.
SessionState ss = SessionState.get();
String timeZoneStr = ss.getConf().get("hive.local.time.zone");
if (timeZoneStr == null || timeZoneStr.trim().isEmpty() || timeZoneStr.toLowerCase().equals("local")) {
  timeZoneStr = System.getProperty("user.timezone");
}
formatter.setTimeZone(TimeZone.getTimeZone(timeZoneStr));
{code}
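Presumably, with a change along these lines the zone can also be pinned per session via the property the patch reads (a hedged usage sketch, assuming the patch is applied):
{code:sql}
set hive.local.time.zone=Asia/Shanghai;         -- the property read by the modified code above
select unix_timestamp('2021-09-24 00:00:00');   -- expected 1632412800 once the UDF honours it
{code}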
 





[jira] [Created] (HIVE-24326) HiveServer memory leak

2020-10-29 Thread zengxl (Jira)
zengxl created HIVE-24326:
-

 Summary: HiveServer memory leak
 Key: HIVE-24326
 URL: https://issues.apache.org/jira/browse/HIVE-24326
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Affects Versions: 1.1.0
Reporter: zengxl
 Attachments: QQ图片20201029160447.png

After running for a while, the HiveServer2 instances in our production environment 
fill up the JVM heap, which leaves them unresponsive.





[jira] [Created] (HIVE-21452) Loss of query conditions when EXISTS is used

2019-03-15 Thread zengxl (JIRA)
zengxl created HIVE-21452:
-

 Summary: Loss of query conditions when EXISTS is used
 Key: HIVE-21452
 URL: https://issues.apache.org/jira/browse/HIVE-21452
 Project: Hive
  Issue Type: Bug
Affects Versions: 1.1.0
Reporter: zengxl


In our production environment, a query joins four tables. The conditions include an 
EXISTS clause, and we found that the first two conditions were lost.

The following two conditions are missing:

{color:#f79232}t2.cust_no is null and t4.cust_level not in ('4','5'){color}
 
In the test environment, I prepared data for the following four tables and 
reproduced the loss of a condition.

tables:

test_table1
cust_no,name
60001,lisa
60002,tina
60003,kylin
60004,jeny
60005,john
60006,jamse

test_table2
cust_no,acct_type
60001,1
60001,1
60001,2
60002,1
60003,2
60003,3

test_table3
cust_no
60001
60002
60003
60004
60005
60007

test_table4
cust_no,cust_level
60001,1
60002,2
60003,3
60004,4
60005,5

 

create table tmp.test_table1(cust_no string,name string);
create table tmp.test_table2(cust_no string,acct_type string);
create table tmp.test_table3(cust_no string);
create table tmp.test_table4(cust_no string,cust_level string);

insert into tmp.test_table1 select '60001','lisa';
insert into tmp.test_table1 select '60002','tina';
insert into tmp.test_table1 select '60003','kylin';
insert into tmp.test_table1 select '60004','jeny';
insert into tmp.test_table1 select '60005','john';
insert into tmp.test_table1 select '60006','jamse';

insert into tmp.test_table2 select '60001','1';
insert into tmp.test_table2 select '60001','1';
insert into tmp.test_table2 select '60001','2';
insert into tmp.test_table2 select '60002','1';
insert into tmp.test_table2 select '60003','2';
insert into tmp.test_table2 select '60002','3';

insert into tmp.test_table3 select '60001';
insert into tmp.test_table3 select '60002';
insert into tmp.test_table3 select '60003';
insert into tmp.test_table3 select '60004';
insert into tmp.test_table3 select '60005';
insert into tmp.test_table3 select '60007';

insert into tmp.test_table4 select '60001','1';
insert into tmp.test_table4 select '60002','2';
insert into tmp.test_table4 select '60003','3';
insert into tmp.test_table4 select '60004','4';
insert into tmp.test_table4 select '60005','5';
 
Here is my query SQL, with map join disabled:

set hive.auto.convert.join=false;

select t1.cust_no as cust_no, t2.cust_no as custNO, t1.name
from tmp.test_table1 t1
left join tmp.test_table2 t2 on t1.cust_no=t2.cust_no and t2.acct_type='1'
left join tmp.test_table4 t4 on t1.cust_no=t4.cust_no
where t2.cust_no is null and t4.cust_level not in ('4','5')
  and exists (select 1 from tmp.test_table3 t3 where t1.cust_no=t3.cust_no);

 

All I want is the row with cust_no 60003, but the result also includes 60004 and 
60005, which is wrong. {color:#f79232}In my production environment, 60001 came out 
as well, because the t2.cust_no is null condition was lost.{color}

{color:#f6c342}Looking at the execution plan, the t4.cust_level not in ('4','5') 
condition is missing.{color}

*result:*

60003 NULL kylin
60003 NULL kylin
60003 NULL kylin
60004 NULL jeny
60005 NULL john
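As a cross-check (my rewrite, not part of the report): expressing the EXISTS as a LEFT SEMI JOIN should, if Hive keeps all three conditions, return only 60003.
{code:sql}
set hive.auto.convert.join=false;

select t1.cust_no as cust_no, t2.cust_no as custNO, t1.name
from tmp.test_table1 t1
left join tmp.test_table2 t2 on t1.cust_no=t2.cust_no and t2.acct_type='1'
left join tmp.test_table4 t4 on t1.cust_no=t4.cust_no
left semi join tmp.test_table3 t3 on t1.cust_no=t3.cust_no
where t2.cust_no is null and t4.cust_level not in ('4','5');
-- expected: a single row, 60003 NULL kylin
{code}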

 

 


