[ https://issues.apache.org/jira/browse/HIVE-21367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
LEMBARKI Mohamed Amine updated HIVE-21367: ------------------------------------------ Priority: Blocker (was: Major) > Hive returns an incorrect result when using a simple select query > ----------------------------------------------------------------- > > Key: HIVE-21367 > URL: https://issues.apache.org/jira/browse/HIVE-21367 > Project: Hive > Issue Type: Bug > Components: Hive, HiveServer2, JDBC, SQL > Affects Versions: 3.1.0 > Environment: - HDP 3.1 > - Hive 3.1.0 > - Spark 2.3.2 > - Sqoop 1.4.7 > Reporter: LEMBARKI Mohamed Amine > Priority: Blocker > > Hive returns an incorrect result when using a simple select query with a > where clause > While with an aggregation it returns a correct result > The problem arises for tables created by Spark or Sqoop > Also when we use spark-shell with HiveWarehouseConnector it returns a correct > result > > Workflow: > - Loading data with sqoop to hive > - Data processing with spark using HiveWarehouseConnector and Storage to > Hive > > below the error log : > > */-----------------------------------------* > *1 - Executing Query : select code from db1.tbl1 where code = '123'* > */-------------------------------------------------* > {code:java} > [data@data1 ~]$ hive -e "select code from db1.tbl1 where code = '123'" > SLF4J: Class path contains multiple SLF4J bindings. > SLF4J: Found binding in > [jar:file:/usr/hdp/3.1.0.0-78/hive/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class] > SLF4J: Found binding in > [jar:file:/usr/hdp/3.1.0.0-78/hadoop/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class] > SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an > explanation. > SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory] > Connecting to > jdbc:hive2://data2:2181,data1:2181/default;password=data;serviceDiscoveryMode=zooKeeper;user=data;zooKeeperNamespace=hiveserver2 > 19/03/01 10:31:36 [main]: INFO jdbc.HiveConnection: Connected to data2:10000 > Connected to: Apache Hive (version 3.1.0.3.1.0.0-78) > Driver: Hive JDBC (version 3.1.0.3.1.0.0-78) > Transaction isolation: TRANSACTION_REPEATABLE_READ > INFO : Compiling > command(queryId=hive_20190301103129_d48e71f6-a8dd-490e-a574-04d8d4f893e2): > select code from db1.tbl1 where code = '123' > INFO : Semantic Analysis Completed (retrial = false) > INFO : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:code, > type:string, comment:null)], properties:null) > INFO : Completed compiling > command(queryId=hive_20190301103129_d48e71f6-a8dd-490e-a574-04d8d4f893e2); > Time taken: 0.142 seconds > INFO : Executing > command(queryId=hive_20190301103129_d48e71f6-a8dd-490e-a574-04d8d4f893e2): > select code from db1.tbl1 where code = '123' > INFO : Completed executing > command(queryId=hive_20190301103129_d48e71f6-a8dd-490e-a574-04d8d4f893e2); > Time taken: 0.003 seconds > INFO : OK > +------------------+ > | code | > +------------------+ > +------------------+ > No rows selected (4,307 seconds) > Beeline version 3.1.0.3.1.0.0-78 by Apache Hive > Closing: 0: > jdbc:hive2://data2:2181,data1:2181/default;password=data;serviceDiscoveryMode=zooKeeper;user=data;zooKeeperNamespace=hiveserver2 > {code} > */-----------------------------------------* > *2 - Executing Query using count :* > *select count(code) from db1.tbl1 where code = '123'* > */-------------------------------------------------* > {code:java} > [data@data1 ~]$ hive -e "select count(code) from db1.tbl1 where code = '123'" > SLF4J: Class path contains multiple SLF4J bindings. > SLF4J: Found binding in > [jar:file:/usr/hdp/3.1.0.0-78/hive/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class] > SLF4J: Found binding in > [jar:file:/usr/hdp/3.1.0.0-78/hadoop/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class] > SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an > explanation. > SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory] > Connecting to > jdbc:hive2://data2:2181,data1:2181/default;password=data;serviceDiscoveryMode=zooKeeper;user=data;zooKeeperNamespace=hiveserver2 > 19/03/01 10:31:56 [main]: INFO jdbc.HiveConnection: Connected to data2:10000 > Connected to: Apache Hive (version 3.1.0.3.1.0.0-78) > Driver: Hive JDBC (version 3.1.0.3.1.0.0-78) > Transaction isolation: TRANSACTION_REPEATABLE_READ > INFO : Compiling > command(queryId=hive_20190301103149_90aa338b-b99b-4f1c-b7e5-6b285f64cb3e): > select count(code) from db1.tbl1 where code = '123' > INFO : Semantic Analysis Completed (retrial = false) > INFO : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:_c0, > type:bigint, comment:null)], properties:null) > INFO : Completed compiling > command(queryId=hive_20190301103149_90aa338b-b99b-4f1c-b7e5-6b285f64cb3e); > Time taken: 0.166 seconds > INFO : Executing > command(queryId=hive_20190301103149_90aa338b-b99b-4f1c-b7e5-6b285f64cb3e): > select count(code) from db1.tbl1 where code = '123' > INFO : Query ID = hive_20190301103149_90aa338b-b99b-4f1c-b7e5-6b285f64cb3e > INFO : Total jobs = 1 > INFO : Launching Job 1 out of 1 > INFO : Starting task [Stage-1:MAPRED] in serial mode > INFO : Subscribed to counters: [] for queryId: > hive_20190301103149_90aa338b-b99b-4f1c-b7e5-6b285f64cb3e > INFO : Tez session hasn't been created yet. Opening session > INFO : Dag name: select count(code) f...'123' (Stage-1) > INFO : Status: Running (Executing on YARN cluster with App id > application_1549989271932_2371) > ---------------------------------------------------------------------------------------------- > VERTICES MODE STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED > ---------------------------------------------------------------------------------------------- > Map 1 .......... container SUCCEEDED 3 3 0 0 0 0 > Reducer 2 ...... container SUCCEEDED 1 1 0 0 0 0 > ---------------------------------------------------------------------------------------------- > VERTICES: 02/02 [==========================>>] 100% ELAPSED TIME: 15,11 s > ---------------------------------------------------------------------------------------------- > INFO : Status: DAG finished successfully in 7,66 seconds > INFO : > INFO : Query Execution Summary > INFO : > ---------------------------------------------------------------------------------------------- > INFO : OPERATION DURATION > INFO : > ---------------------------------------------------------------------------------------------- > INFO : Compile Query 0,17s > INFO : Prepare Plan 3,33s > INFO : Get Query Coordinator (AM) 0,00s > INFO : Submit Plan 0,28s > INFO : Start DAG 0,88s > INFO : Run DAG 7,66s > INFO : > ---------------------------------------------------------------------------------------------- > INFO : > INFO : Task Execution Summary > INFO : > ---------------------------------------------------------------------------------------------- > INFO : VERTICES DURATION(ms) CPU_TIME(ms) GC_TIME(ms) INPUT_RECORDS > OUTPUT_RECORDS > INFO : > ---------------------------------------------------------------------------------------------- > INFO : Map 1 6020,00 9,790 261 4,240,134 4 > INFO : Reducer 2 2635,00 590 0 3 0 > INFO : > ---------------------------------------------------------------------------------------------- > INFO : > INFO : org.apache.tez.common.counters.DAGCounter: > INFO : NUM_SUCCEEDED_TASKS: 4 > INFO : TOTAL_LAUNCHED_TASKS: 4 > INFO : DATA_LOCAL_TASKS: 1 > INFO : RACK_LOCAL_TASKS: 2 > INFO : AM_CPU_MILLISECONDS: 1800 > INFO : AM_GC_TIME_MILLIS: 0 > INFO : File System Counters: > INFO : FILE_BYTES_READ: 171 > INFO : FILE_BYTES_WRITTEN: 171 > INFO : HDFS_BYTES_READ: 14192292 > INFO : HDFS_BYTES_WRITTEN: 101 > INFO : HDFS_READ_OPS: 1008 > INFO : HDFS_WRITE_OPS: 2 > INFO : HDFS_OP_CREATE: 1 > INFO : HDFS_OP_GET_FILE_STATUS: 204 > INFO : HDFS_OP_OPEN: 603 > INFO : HDFS_OP_RENAME: 1 > INFO : org.apache.tez.common.counters.TaskCounter: > INFO : SPILLED_RECORDS: 0 > INFO : NUM_SHUFFLED_INPUTS: 3 > INFO : NUM_FAILED_SHUFFLE_INPUTS: 0 > INFO : GC_TIME_MILLIS: 261 > INFO : TASK_DURATION_MILLIS: 10632 > INFO : CPU_MILLISECONDS: 10380 > INFO : PHYSICAL_MEMORY_BYTES: 4202692608 > INFO : VIRTUAL_MEMORY_BYTES: 34204336128 > INFO : COMMITTED_HEAP_BYTES: 4202692608 > INFO : INPUT_RECORDS_PROCESSED: 4145 > INFO : INPUT_SPLIT_LENGTH_BYTES: 217593448 > INFO : OUTPUT_RECORDS: 3 > INFO : OUTPUT_LARGE_RECORDS: 0 > INFO : OUTPUT_BYTES: 9 > INFO : OUTPUT_BYTES_WITH_OVERHEAD: 33 > INFO : OUTPUT_BYTES_PHYSICAL: 147 > INFO : ADDITIONAL_SPILLS_BYTES_WRITTEN: 0 > INFO : ADDITIONAL_SPILLS_BYTES_READ: 0 > INFO : ADDITIONAL_SPILL_COUNT: 0 > INFO : SHUFFLE_BYTES: 75 > INFO : SHUFFLE_BYTES_DECOMPRESSED: 33 > INFO : SHUFFLE_BYTES_TO_MEM: 0 > INFO : SHUFFLE_BYTES_TO_DISK: 0 > INFO : SHUFFLE_BYTES_DISK_DIRECT: 75 > INFO : SHUFFLE_PHASE_TIME: 2295 > INFO : FIRST_EVENT_RECEIVED: 219 > INFO : LAST_EVENT_RECEIVED: 2288 > INFO : HIVE: > INFO : CREATED_FILES: 1 > INFO : DESERIALIZE_ERRORS: 0 > INFO : RECORDS_IN_Map_1: 4240134 > INFO : RECORDS_OUT_0: 1 > INFO : RECORDS_OUT_INTERMEDIATE_Map_1: 4 > INFO : RECORDS_OUT_INTERMEDIATE_Reducer_2: 0 > INFO : RECORDS_OUT_OPERATOR_FIL_10: 1 > INFO : RECORDS_OUT_OPERATOR_FS_15: 1 > INFO : RECORDS_OUT_OPERATOR_GBY_12: 3 > INFO : RECORDS_OUT_OPERATOR_GBY_14: 1 > INFO : RECORDS_OUT_OPERATOR_MAP_0: 0 > INFO : RECORDS_OUT_OPERATOR_RS_13: 4 > INFO : RECORDS_OUT_OPERATOR_SEL_11: 1 > INFO : RECORDS_OUT_OPERATOR_TS_0: 4240134 > INFO : TaskCounter_Map_1_INPUT_tbl1: > INFO : INPUT_RECORDS_PROCESSED: 4142 > INFO : INPUT_SPLIT_LENGTH_BYTES: 217593448 > INFO : TaskCounter_Map_1_OUTPUT_Reducer_2: > INFO : ADDITIONAL_SPILLS_BYTES_READ: 0 > INFO : ADDITIONAL_SPILLS_BYTES_WRITTEN: 0 > INFO : ADDITIONAL_SPILL_COUNT: 0 > INFO : OUTPUT_BYTES: 9 > INFO : OUTPUT_BYTES_PHYSICAL: 147 > INFO : OUTPUT_BYTES_WITH_OVERHEAD: 33 > INFO : OUTPUT_LARGE_RECORDS: 0 > INFO : OUTPUT_RECORDS: 3 > INFO : SPILLED_RECORDS: 0 > INFO : TaskCounter_Reducer_2_INPUT_Map_1: > INFO : FIRST_EVENT_RECEIVED: 219 > INFO : INPUT_RECORDS_PROCESSED: 3 > INFO : LAST_EVENT_RECEIVED: 2288 > INFO : NUM_FAILED_SHUFFLE_INPUTS: 0 > INFO : NUM_SHUFFLED_INPUTS: 3 > INFO : SHUFFLE_BYTES: 75 > INFO : SHUFFLE_BYTES_DECOMPRESSED: 33 > INFO : SHUFFLE_BYTES_DISK_DIRECT: 75 > INFO : SHUFFLE_BYTES_TO_DISK: 0 > INFO : SHUFFLE_BYTES_TO_MEM: 0 > INFO : SHUFFLE_PHASE_TIME: 2295 > INFO : TaskCounter_Reducer_2_OUTPUT_out_Reducer_2: > INFO : OUTPUT_RECORDS: 0 > INFO : org.apache.hadoop.hive.ql.exec.tez.HiveInputCounters: > INFO : GROUPED_INPUT_SPLITS_Map_1: 3 > INFO : INPUT_DIRECTORIES_Map_1: 1 > INFO : INPUT_FILES_Map_1: 200 > INFO : RAW_INPUT_SPLITS_Map_1: 201 > INFO : Completed executing > command(queryId=hive_20190301103149_90aa338b-b99b-4f1c-b7e5-6b285f64cb3e); > Time taken: 12.218 seconds > INFO : OK > +------+ > | _c0 | > +------+ > | 1 | > +------+ > 1 row selected (12,458 seconds) > Beeline version 3.1.0.3.1.0.0-78 by Apache Hive > Closing: 0: > jdbc:hive2://data2:2181,data1:2181/default;password=data;serviceDiscoveryMode=zooKeeper;user=data;zooKeeperNamespace=hiveserver2 > {code} > */-------------------------------------------------------------* > *3 - Describe tbl1* > */-------------------------------------------------------------* > {code:java} > SLF4J: Class path contains multiple SLF4J bindings. > SLF4J: Found binding in > [jar:file:/usr/hdp/3.1.0.0-78/hive/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class] > SLF4J: Found binding in > [jar:file:/usr/hdp/3.1.0.0-78/hadoop/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class] > SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an > explanation. > SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory] > Connecting to > jdbc:hive2://data2:2181,data1:2181/default;password=data;serviceDiscoveryMode=zooKeeper;user=data;zooKeeperNamespace=hiveserver2 > 19/03/01 10:40:34 [main]: INFO jdbc.HiveConnection: Connected to data2:10000 > Connected to: Apache Hive (version 3.1.0.3.1.0.0-78) > Driver: Hive JDBC (version 3.1.0.3.1.0.0-78) > Transaction isolation: TRANSACTION_REPEATABLE_READ > INFO : Compiling > command(queryId=hive_20190301104027_818ae55f-3e3f-4754-8706-0279b693d9a8): > describe extended db1.tbl1 > INFO : Semantic Analysis Completed (retrial = false) > INFO : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:col_name, > type:string, comment:from deserializer), FieldSchema(name:data_type, > type:string, comment:from deserializer), FieldSchema(name:comment, > type:string, comment:from deserializer)], properties:null) > INFO : Completed compiling > command(queryId=hive_20190301104027_818ae55f-3e3f-4754-8706-0279b693d9a8); > Time taken: 0.044 seconds > INFO : Executing > command(queryId=hive_20190301104027_818ae55f-3e3f-4754-8706-0279b693d9a8): > describe extended db1.tbl1 > INFO : Starting task [Stage-0:DDL] in serial mode > INFO : Completed executing > command(queryId=hive_20190301104027_818ae55f-3e3f-4754-8706-0279b693d9a8); > Time taken: 0.024 seconds > INFO : OK > +---------------------------------+----------------------------------------------------+----------+ > | col_name | data_type | comment | > +---------------------------------+----------------------------------------------------+----------+ > | code | string | | > | | NULL | NULL | > | Detailed Table Information | Table(tableName:tbl1, dbName:db1, > owner:anonymous, createTime:1551431182, lastAccessTime:0, retention:0, > sd:StorageDescriptor(cols:[FieldSchema(name:code, type:string, > comment:null)], > location:hdfs://data1:8020/warehouse/tablespace/managed/hive/db1.db/tbl1, > inputFormat:org.apache.hadoop.hive.ql.io.orc.OrcInputFormat, > outputFormat:org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat, > compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, > serializationLib:org.apache.hadoop.hive.ql.io.orc.OrcSerde, > parameters:{serialization.format=1}), bucketCols:[], sortCols:[], > parameters:{}, skewedInfo:SkewedInfo(skewedColNames:[], skewedColValues:[], > skewedColValueLocationMaps:{}), storedAsSubDirectories:false), > partitionKeys:[], parameters:{totalSize=217593448, rawDataSize=0, numRows=0, > transactional_properties=default, numFiles=200, > transient_lastDdlTime=1551431187, bucketing_version=2, transactional=true}, > viewOriginalText:null, viewExpandedText:null, tableType:MANAGED_TABLE, > rewriteEnabled:false, catName:hive, ownerType:USER, writeId:1) | | > +---------------------------------+----------------------------------------------------+----------+ > 41 rows selected (0,157 seconds) > Beeline version 3.1.0.3.1.0.0-78 by Apache Hive > Closing: 0: > jdbc:hive2://data2:2181,data1:2181/default;password=data;serviceDiscoveryMode=zooKeeper;user=data;zooKeeperNamespace=hiveserver2{code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)