[jira] [Commented] (HIVE-21367) Hive returns an incorrect result when using a simple select query
[ https://issues.apache.org/jira/browse/HIVE-21367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17007014#comment-17007014 ] Sofia commented on HIVE-21367: -- [~sophie1] we re still using this workaround with the same version. I don't have any idea if this bug has been fixed in the latest version. > Hive returns an incorrect result when using a simple select query > - > > Key: HIVE-21367 > URL: https://issues.apache.org/jira/browse/HIVE-21367 > Project: Hive > Issue Type: Bug > Components: Hive, HiveServer2, JDBC, SQL >Affects Versions: 3.1.0 > Environment: - HDP 3.1 > - Hive 3.1.0 > - Spark 2.3.2 > - Sqoop 1.4.7 >Reporter: LEMBARKI Mohamed Amine >Priority: Blocker > Attachments: mapred_input_dir_recursive.png > > > Hive returns an incorrect result when using a simple select query with a > where clause > While with an aggregation it returns a correct result > The problem arises for tables created by Spark or Sqoop > Also when we use spark-shell with HiveWarehouseConnector it returns a correct > result > > Workflow: > - Loading data with sqoop to hive > - Data processing with spark using HiveWarehouseConnector and Storage to > Hive > > below the error log : > > */-* > *1 - Executing Query : select code from db1.tbl1 where code = '123'* > */-* > {code:java} > [data@data1 ~]$ hive -e "select code from db1.tbl1 where code = '123'" > SLF4J: Class path contains multiple SLF4J bindings. > SLF4J: Found binding in > [jar:file:/usr/hdp/3.1.0.0-78/hive/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class] > SLF4J: Found binding in > [jar:file:/usr/hdp/3.1.0.0-78/hadoop/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class] > SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an > explanation. > SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory] > Connecting to > jdbc:hive2://data2:2181,data1:2181/default;password=data;serviceDiscoveryMode=zooKeeper;user=data;zooKeeperNamespace=hiveserver2 > 19/03/01 10:31:36 [main]: INFO jdbc.HiveConnection: Connected to data2:1 > Connected to: Apache Hive (version 3.1.0.3.1.0.0-78) > Driver: Hive JDBC (version 3.1.0.3.1.0.0-78) > Transaction isolation: TRANSACTION_REPEATABLE_READ > INFO : Compiling > command(queryId=hive_20190301103129_d48e71f6-a8dd-490e-a574-04d8d4f893e2): > select code from db1.tbl1 where code = '123' > INFO : Semantic Analysis Completed (retrial = false) > INFO : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:code, > type:string, comment:null)], properties:null) > INFO : Completed compiling > command(queryId=hive_20190301103129_d48e71f6-a8dd-490e-a574-04d8d4f893e2); > Time taken: 0.142 seconds > INFO : Executing > command(queryId=hive_20190301103129_d48e71f6-a8dd-490e-a574-04d8d4f893e2): > select code from db1.tbl1 where code = '123' > INFO : Completed executing > command(queryId=hive_20190301103129_d48e71f6-a8dd-490e-a574-04d8d4f893e2); > Time taken: 0.003 seconds > INFO : OK > +--+ > | code | > +--+ > +--+ > No rows selected (4,307 seconds) > Beeline version 3.1.0.3.1.0.0-78 by Apache Hive > Closing: 0: > jdbc:hive2://data2:2181,data1:2181/default;password=data;serviceDiscoveryMode=zooKeeper;user=data;zooKeeperNamespace=hiveserver2 > {code} > */-* > *2 - Executing Query using count :* > *select count(code) from db1.tbl1 where code = '123'* > */-* > {code:java} > [data@data1 ~]$ hive -e "select count(code) from db1.tbl1 where code = '123'" > SLF4J: Class path contains multiple SLF4J bindings. > SLF4J: Found binding in > [jar:file:/usr/hdp/3.1.0.0-78/hive/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class] > SLF4J: Found binding in > [jar:file:/usr/hdp/3.1.0.0-78/hadoop/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class] > SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an > explanation. > SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory] > Connecting to > jdbc:hive2://data2:2181,data1:2181/default;password=data;serviceDiscoveryMode=zooKeeper;user=data;zooKeeperNamespace=hiveserver2 > 19/03/01 10:31:56 [main]: INFO jdbc.HiveConnection: Connected to data2:1 > Connected to: Apache Hive (version 3.1.0.3.1.0.0-78) > Driver: Hive JDBC (version 3.1.0.3.1.0.0-78) > Transaction isolation: TRANSACTION_REPEATABLE_READ > INFO : Compiling > command(queryId=hive_20190301103149_90aa338b-b99b-4f1c-b7e5-6b285f64cb3e): > select count(code) from db1.t
[jira] [Commented] (HIVE-21367) Hive returns an incorrect result when using a simple select query
[ https://issues.apache.org/jira/browse/HIVE-21367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16784226#comment-16784226 ] Sofia commented on HIVE-21367: -- Both sqoop and spark makes subdirs. Setting property hive.fetch.task.conversion= to none resolved the issue. Thanks [~starphin] > Hive returns an incorrect result when using a simple select query > - > > Key: HIVE-21367 > URL: https://issues.apache.org/jira/browse/HIVE-21367 > Project: Hive > Issue Type: Bug > Components: Hive, HiveServer2, JDBC, SQL >Affects Versions: 3.1.0 > Environment: - HDP 3.1 > - Hive 3.1.0 > - Spark 2.3.2 > - Sqoop 1.4.7 >Reporter: LEMBARKI Mohamed Amine >Priority: Blocker > Attachments: mapred_input_dir_recursive.png > > > Hive returns an incorrect result when using a simple select query with a > where clause > While with an aggregation it returns a correct result > The problem arises for tables created by Spark or Sqoop > Also when we use spark-shell with HiveWarehouseConnector it returns a correct > result > > Workflow: > - Loading data with sqoop to hive > - Data processing with spark using HiveWarehouseConnector and Storage to > Hive > > below the error log : > > */-* > *1 - Executing Query : select code from db1.tbl1 where code = '123'* > */-* > {code:java} > [data@data1 ~]$ hive -e "select code from db1.tbl1 where code = '123'" > SLF4J: Class path contains multiple SLF4J bindings. > SLF4J: Found binding in > [jar:file:/usr/hdp/3.1.0.0-78/hive/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class] > SLF4J: Found binding in > [jar:file:/usr/hdp/3.1.0.0-78/hadoop/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class] > SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an > explanation. > SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory] > Connecting to > jdbc:hive2://data2:2181,data1:2181/default;password=data;serviceDiscoveryMode=zooKeeper;user=data;zooKeeperNamespace=hiveserver2 > 19/03/01 10:31:36 [main]: INFO jdbc.HiveConnection: Connected to data2:1 > Connected to: Apache Hive (version 3.1.0.3.1.0.0-78) > Driver: Hive JDBC (version 3.1.0.3.1.0.0-78) > Transaction isolation: TRANSACTION_REPEATABLE_READ > INFO : Compiling > command(queryId=hive_20190301103129_d48e71f6-a8dd-490e-a574-04d8d4f893e2): > select code from db1.tbl1 where code = '123' > INFO : Semantic Analysis Completed (retrial = false) > INFO : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:code, > type:string, comment:null)], properties:null) > INFO : Completed compiling > command(queryId=hive_20190301103129_d48e71f6-a8dd-490e-a574-04d8d4f893e2); > Time taken: 0.142 seconds > INFO : Executing > command(queryId=hive_20190301103129_d48e71f6-a8dd-490e-a574-04d8d4f893e2): > select code from db1.tbl1 where code = '123' > INFO : Completed executing > command(queryId=hive_20190301103129_d48e71f6-a8dd-490e-a574-04d8d4f893e2); > Time taken: 0.003 seconds > INFO : OK > +--+ > | code | > +--+ > +--+ > No rows selected (4,307 seconds) > Beeline version 3.1.0.3.1.0.0-78 by Apache Hive > Closing: 0: > jdbc:hive2://data2:2181,data1:2181/default;password=data;serviceDiscoveryMode=zooKeeper;user=data;zooKeeperNamespace=hiveserver2 > {code} > */-* > *2 - Executing Query using count :* > *select count(code) from db1.tbl1 where code = '123'* > */-* > {code:java} > [data@data1 ~]$ hive -e "select count(code) from db1.tbl1 where code = '123'" > SLF4J: Class path contains multiple SLF4J bindings. > SLF4J: Found binding in > [jar:file:/usr/hdp/3.1.0.0-78/hive/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class] > SLF4J: Found binding in > [jar:file:/usr/hdp/3.1.0.0-78/hadoop/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class] > SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an > explanation. > SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory] > Connecting to > jdbc:hive2://data2:2181,data1:2181/default;password=data;serviceDiscoveryMode=zooKeeper;user=data;zooKeeperNamespace=hiveserver2 > 19/03/01 10:31:56 [main]: INFO jdbc.HiveConnection: Connected to data2:1 > Connected to: Apache Hive (version 3.1.0.3.1.0.0-78) > Driver: Hive JDBC (version 3.1.0.3.1.0.0-78) > Transaction isolation: TRANSACTION_REPEATABLE_READ > INFO : Compiling > command(queryId=hive_20190301103149_90aa338b-b99b-4f1c-b7e5-6b285f64cb3e): > select count(code) from db1.tbl1 where code
[jira] [Commented] (HIVE-21367) Hive returns an incorrect result when using a simple select query
[ https://issues.apache.org/jira/browse/HIVE-21367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16783485#comment-16783485 ] Sofia commented on HIVE-21367: -- The target table is from two different sources : * {color:#33}*From SQOOP*{color}: when loading tables we use the following code. {code:java} sqoop import --connect ${CONNECTION} \ --username ${USER} \ --password ${PASSWORD} \ --table $1 \ --hive-database $2 \ --hive-table ${TBNAME} \ --hive-import \ --as-orcfile \ --hive-overwrite \ -m 1 \ --delete-target-dir {code} * *From SPARK*: when processing the data, we store the output as a table in hive using the following code. {code:java} df.write .mode(mode) .format(HiveWarehouseSession.HIVE_WAREHOUSE_CONNECTOR) .option("table",tableName) .save(){code} How do we load the data into the root path of the target table in each case ? > Hive returns an incorrect result when using a simple select query > - > > Key: HIVE-21367 > URL: https://issues.apache.org/jira/browse/HIVE-21367 > Project: Hive > Issue Type: Bug > Components: Hive, HiveServer2, JDBC, SQL >Affects Versions: 3.1.0 > Environment: - HDP 3.1 > - Hive 3.1.0 > - Spark 2.3.2 > - Sqoop 1.4.7 >Reporter: LEMBARKI Mohamed Amine >Priority: Blocker > Attachments: mapred_input_dir_recursive.png > > > Hive returns an incorrect result when using a simple select query with a > where clause > While with an aggregation it returns a correct result > The problem arises for tables created by Spark or Sqoop > Also when we use spark-shell with HiveWarehouseConnector it returns a correct > result > > Workflow: > - Loading data with sqoop to hive > - Data processing with spark using HiveWarehouseConnector and Storage to > Hive > > below the error log : > > */-* > *1 - Executing Query : select code from db1.tbl1 where code = '123'* > */-* > {code:java} > [data@data1 ~]$ hive -e "select code from db1.tbl1 where code = '123'" > SLF4J: Class path contains multiple SLF4J bindings. > SLF4J: Found binding in > [jar:file:/usr/hdp/3.1.0.0-78/hive/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class] > SLF4J: Found binding in > [jar:file:/usr/hdp/3.1.0.0-78/hadoop/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class] > SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an > explanation. > SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory] > Connecting to > jdbc:hive2://data2:2181,data1:2181/default;password=data;serviceDiscoveryMode=zooKeeper;user=data;zooKeeperNamespace=hiveserver2 > 19/03/01 10:31:36 [main]: INFO jdbc.HiveConnection: Connected to data2:1 > Connected to: Apache Hive (version 3.1.0.3.1.0.0-78) > Driver: Hive JDBC (version 3.1.0.3.1.0.0-78) > Transaction isolation: TRANSACTION_REPEATABLE_READ > INFO : Compiling > command(queryId=hive_20190301103129_d48e71f6-a8dd-490e-a574-04d8d4f893e2): > select code from db1.tbl1 where code = '123' > INFO : Semantic Analysis Completed (retrial = false) > INFO : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:code, > type:string, comment:null)], properties:null) > INFO : Completed compiling > command(queryId=hive_20190301103129_d48e71f6-a8dd-490e-a574-04d8d4f893e2); > Time taken: 0.142 seconds > INFO : Executing > command(queryId=hive_20190301103129_d48e71f6-a8dd-490e-a574-04d8d4f893e2): > select code from db1.tbl1 where code = '123' > INFO : Completed executing > command(queryId=hive_20190301103129_d48e71f6-a8dd-490e-a574-04d8d4f893e2); > Time taken: 0.003 seconds > INFO : OK > +--+ > | code | > +--+ > +--+ > No rows selected (4,307 seconds) > Beeline version 3.1.0.3.1.0.0-78 by Apache Hive > Closing: 0: > jdbc:hive2://data2:2181,data1:2181/default;password=data;serviceDiscoveryMode=zooKeeper;user=data;zooKeeperNamespace=hiveserver2 > {code} > */-* > *2 - Executing Query using count :* > *select count(code) from db1.tbl1 where code = '123'* > */-* > {code:java} > [data@data1 ~]$ hive -e "select count(code) from db1.tbl1 where code = '123'" > SLF4J: Class path contains multiple SLF4J bindings. > SLF4J: Found binding in > [jar:file:/usr/hdp/3.1.0.0-78/hive/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class] > SLF4J: Found binding in > [jar:file:/usr/hdp/3.1.0.0-78/hadoop/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class] > SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an > explanation. > SLF4J: Actual binding is of t
[jira] [Commented] (HIVE-21367) Hive returns an incorrect result when using a simple select query
[ https://issues.apache.org/jira/browse/HIVE-21367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16783236#comment-16783236 ] Sofia commented on HIVE-21367: -- Hi [~starphin], why do hive behave that way and create subdirs when executing a simple select ? Is there any workaround for that ? > Hive returns an incorrect result when using a simple select query > - > > Key: HIVE-21367 > URL: https://issues.apache.org/jira/browse/HIVE-21367 > Project: Hive > Issue Type: Bug > Components: Hive, HiveServer2, JDBC, SQL >Affects Versions: 3.1.0 > Environment: - HDP 3.1 > - Hive 3.1.0 > - Spark 2.3.2 > - Sqoop 1.4.7 >Reporter: LEMBARKI Mohamed Amine >Priority: Blocker > > Hive returns an incorrect result when using a simple select query with a > where clause > While with an aggregation it returns a correct result > The problem arises for tables created by Spark or Sqoop > Also when we use spark-shell with HiveWarehouseConnector it returns a correct > result > > Workflow: > - Loading data with sqoop to hive > - Data processing with spark using HiveWarehouseConnector and Storage to > Hive > > below the error log : > > */-* > *1 - Executing Query : select code from db1.tbl1 where code = '123'* > */-* > {code:java} > [data@data1 ~]$ hive -e "select code from db1.tbl1 where code = '123'" > SLF4J: Class path contains multiple SLF4J bindings. > SLF4J: Found binding in > [jar:file:/usr/hdp/3.1.0.0-78/hive/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class] > SLF4J: Found binding in > [jar:file:/usr/hdp/3.1.0.0-78/hadoop/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class] > SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an > explanation. > SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory] > Connecting to > jdbc:hive2://data2:2181,data1:2181/default;password=data;serviceDiscoveryMode=zooKeeper;user=data;zooKeeperNamespace=hiveserver2 > 19/03/01 10:31:36 [main]: INFO jdbc.HiveConnection: Connected to data2:1 > Connected to: Apache Hive (version 3.1.0.3.1.0.0-78) > Driver: Hive JDBC (version 3.1.0.3.1.0.0-78) > Transaction isolation: TRANSACTION_REPEATABLE_READ > INFO : Compiling > command(queryId=hive_20190301103129_d48e71f6-a8dd-490e-a574-04d8d4f893e2): > select code from db1.tbl1 where code = '123' > INFO : Semantic Analysis Completed (retrial = false) > INFO : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:code, > type:string, comment:null)], properties:null) > INFO : Completed compiling > command(queryId=hive_20190301103129_d48e71f6-a8dd-490e-a574-04d8d4f893e2); > Time taken: 0.142 seconds > INFO : Executing > command(queryId=hive_20190301103129_d48e71f6-a8dd-490e-a574-04d8d4f893e2): > select code from db1.tbl1 where code = '123' > INFO : Completed executing > command(queryId=hive_20190301103129_d48e71f6-a8dd-490e-a574-04d8d4f893e2); > Time taken: 0.003 seconds > INFO : OK > +--+ > | code | > +--+ > +--+ > No rows selected (4,307 seconds) > Beeline version 3.1.0.3.1.0.0-78 by Apache Hive > Closing: 0: > jdbc:hive2://data2:2181,data1:2181/default;password=data;serviceDiscoveryMode=zooKeeper;user=data;zooKeeperNamespace=hiveserver2 > {code} > */-* > *2 - Executing Query using count :* > *select count(code) from db1.tbl1 where code = '123'* > */-* > {code:java} > [data@data1 ~]$ hive -e "select count(code) from db1.tbl1 where code = '123'" > SLF4J: Class path contains multiple SLF4J bindings. > SLF4J: Found binding in > [jar:file:/usr/hdp/3.1.0.0-78/hive/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class] > SLF4J: Found binding in > [jar:file:/usr/hdp/3.1.0.0-78/hadoop/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class] > SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an > explanation. > SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory] > Connecting to > jdbc:hive2://data2:2181,data1:2181/default;password=data;serviceDiscoveryMode=zooKeeper;user=data;zooKeeperNamespace=hiveserver2 > 19/03/01 10:31:56 [main]: INFO jdbc.HiveConnection: Connected to data2:1 > Connected to: Apache Hive (version 3.1.0.3.1.0.0-78) > Driver: Hive JDBC (version 3.1.0.3.1.0.0-78) > Transaction isolation: TRANSACTION_REPEATABLE_READ > INFO : Compiling > command(queryId=hive_20190301103149_90aa338b-b99b-4f1c-b7e5-6b285f64cb3e): > select count(code) from db1.tbl1 where code = '123' > INFO : Semantic Analysis Completed (retria