[jira] [Updated] (HIVE-11108) HashTableSinkOperator doesn't support vectorization [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-11108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rui Li updated HIVE-11108:
--------------------------
    Attachment: HIVE-11108.1-spark.patch

The patch enables vectorization for SparkHashTableSinkOperator. I did some local tests. The end-to-end performance gain is not very obvious, as HTS usually processes small tables. But for the specific stage, performance can be improved by about 2x in some cases, e.g. when the work is computing min/max.

HashTableSinkOperator doesn't support vectorization [Spark Branch]
------------------------------------------------------------------
                 Key: HIVE-11108
                 URL: https://issues.apache.org/jira/browse/HIVE-11108
             Project: Hive
          Issue Type: Sub-task
          Components: Spark
            Reporter: Rui Li
            Assignee: Rui Li
         Attachments: HIVE-11108.1-spark.patch

This prevents any BaseWork containing HTS from being vectorized. It's basically specific to Spark, because Tez doesn't use HTS and MR runs HTS in local tasks. We should verify whether it makes sense to make HTS support vectorization.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
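For intuition on the stage-level 2x gain mentioned above: vectorized operators process a whole batch of values per call rather than one row per call, which matters most for cheap per-row work such as min/max. A toy Python sketch of the difference (illustrative only; not Hive's actual VectorizedRowBatch code):

```python
def row_at_a_time_min_max(rows):
    # One operator invocation per row, as a non-vectorized operator would do.
    lo = hi = None
    for v in rows:
        lo = v if lo is None or v < lo else lo
        hi = v if hi is None or v > hi else hi
    return lo, hi

def vectorized_min_max(batches):
    # One invocation per batch (Hive uses batches of ~1024 rows),
    # amortizing per-row call overhead across the whole batch.
    lo = min(min(b) for b in batches)
    hi = max(max(b) for b in batches)
    return lo, hi

rows = [5, 3, 9, 1, 7, 2]
batches = [rows[i:i + 3] for i in range(0, len(rows), 3)]
assert row_at_a_time_min_max(rows) == vectorized_min_max(batches) == (1, 9)
```

Both paths compute the same result; the batched form simply pays the interpretation overhead once per batch instead of once per row.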
[jira] [Commented] (HIVE-11110) Enable HiveJoinAddNotNullRule in CBO
[ https://issues.apache.org/jira/browse/HIVE-11110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14605543#comment-14605543 ]

Jesus Camacho Rodriguez commented on HIVE-11110:
------------------------------------------------

I have analyzed the results; the failures fall into different categories that need to be analyzed further:
- Some of them are benign: additional filters are added or the join inputs are swapped.
- In some cases, 2-input joins are not merged into 3-input joins (join_3.q, annotate_stats_join.q, auto_join3.q, explain_logical.q), which might result in additional execution stages.
- In another case, a new 3-input join that was not identified before is created (correlation_optimizer6.q).
- There seem to be a few cases of lost bucketing when using insert overwrite (infer_bucket_sort.q, join_33.q).

Apart from these, the new rule triggers indefinitely for subquery_views.q. This is solved in [~jpullokkaran]'s patch by putting the HiveJoinAddNotNullRule rule in its own group of rules, but that issue should be studied further too.

Enable HiveJoinAddNotNullRule in CBO
------------------------------------
                 Key: HIVE-11110
                 URL: https://issues.apache.org/jira/browse/HIVE-11110
             Project: Hive
          Issue Type: Bug
          Components: CBO
            Reporter: Jesus Camacho Rodriguez
            Assignee: Jesus Camacho Rodriguez
         Attachments: HIVE-11110.1.patch, HIVE-11110.patch

Query:
{code}
select count(*)
from store_sales
    ,store_returns
    ,date_dim d1
    ,date_dim d2
where d1.d_quarter_name = '2000Q1'
  and d1.d_date_sk = ss_sold_date_sk
  and ss_customer_sk = sr_customer_sk
  and ss_item_sk = sr_item_sk
  and ss_ticket_number = sr_ticket_number
  and sr_returned_date_sk = d2.d_date_sk
  and d2.d_quarter_name in ('2000Q1','2000Q2','2000Q3');
{code}

The store_sales table is partitioned on ss_sold_date_sk, which is also used in a join clause. The join clause should add a filter "filterExpr: ss_sold_date_sk is not null", which should get pushed to the MetaStore when fetching the stats. Currently this is not done in CBO planning, which results in the stats from __HIVE_DEFAULT_PARTITION__ being fetched and considered in the optimization phase. In particular, this increases the NDV for the join columns and may result in wrong planning. Including HiveJoinAddNotNullRule in the optimization phase solves this issue.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
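For intuition, the effect of HiveJoinAddNotNullRule can be sketched in a few lines: rows whose equi-join keys are NULL (such as those from __HIVE_DEFAULT_PARTITION__) can never satisfy an inner-join equality condition, so an IS NOT NULL filter on the keys is safe to add, and it keeps those rows' stats out of planning. A hypothetical Python sketch, not Hive's actual Calcite rule:

```python
def add_not_null_filters(join_keys, rows):
    """Drop rows whose join keys are NULL before joining.

    Mirrors what an IS NOT NULL filter on the join keys does: NULL keys
    cannot match an inner-join equality condition, so filtering them
    early shrinks the input (and the stats the planner sees).
    """
    return [r for r in rows if all(r.get(k) is not None for k in join_keys)]

store_sales = [
    {"ss_sold_date_sk": 2451000, "ss_item_sk": 1},
    {"ss_sold_date_sk": None, "ss_item_sk": 2},  # default-partition-style row
]
filtered = add_not_null_filters(["ss_sold_date_sk"], store_sales)
# Only the row with a non-NULL join key survives.
```

The rule's planning benefit is exactly this pruning applied at the stats level: the NULL-keyed partition no longer inflates the NDV of the join columns.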
[jira] [Updated] (HIVE-11055) HPL/SQL - Implementing Procedural SQL in Hive (PL/HQL Contribution)
[ https://issues.apache.org/jira/browse/HIVE-11055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dmitry Tolpeko updated HIVE-11055:
----------------------------------
    Attachment: HIVE-11055.3.patch

Created patch 3 - made the tool compatible with Hadoop 1.

HPL/SQL - Implementing Procedural SQL in Hive (PL/HQL Contribution)
-------------------------------------------------------------------
                 Key: HIVE-11055
                 URL: https://issues.apache.org/jira/browse/HIVE-11055
             Project: Hive
          Issue Type: Improvement
            Reporter: Dmitry Tolpeko
            Assignee: Dmitry Tolpeko
         Attachments: HIVE-11055.1.patch, HIVE-11055.2.patch, HIVE-11055.3.patch

There is a PL/HQL tool (www.plhql.org) that implements procedural SQL for Hive (in fact, for any SQL-on-Hadoop implementation and any JDBC source). Alan Gates offered to contribute it to Hive under the HPL/SQL name (org.apache.hive.hplsql package). This JIRA is to create a patch to contribute the PL/HQL code.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11134) HS2 should log open session failure
[ https://issues.apache.org/jira/browse/HIVE-11134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14605537#comment-14605537 ]

Vaibhav Gumashta commented on HIVE-11134:
-----------------------------------------

+1

HS2 should log open session failure
-----------------------------------
                 Key: HIVE-11134
                 URL: https://issues.apache.org/jira/browse/HIVE-11134
             Project: Hive
          Issue Type: Bug
          Components: HiveServer2
            Reporter: Thejas M Nair
            Assignee: Thejas M Nair
         Attachments: HIVE-11134.1.patch

HiveServer2 should log OpenSession failures. If Beeline is not run with --verbose=true, the stack trace information is not available for later debugging, as it is not currently logged on the server side.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10895) ObjectStore does not close Query objects in some calls, causing a potential leak in some metastore db resources
[ https://issues.apache.org/jira/browse/HIVE-10895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14605541#comment-14605541 ]

Vaibhav Gumashta commented on HIVE-10895:
-----------------------------------------

[~aihuaxu] Were you able to reproduce the db leak at your end? In our setup, when we used Oracle as the metastore db, we saw Oracle running out of cursors. I'll try to run the patch through that system test as well.

ObjectStore does not close Query objects in some calls, causing a potential leak in some metastore db resources
---------------------------------------------------------------------------------------------------------------
                 Key: HIVE-10895
                 URL: https://issues.apache.org/jira/browse/HIVE-10895
             Project: Hive
          Issue Type: Bug
          Components: Metastore
    Affects Versions: 0.14.0, 1.0.0, 1.2.0, 1.1.0, 0.13
            Reporter: Takahiko Saito
            Assignee: Aihua Xu
         Attachments: HIVE-10895.1.patch, HIVE-10895.2.patch, HIVE-10895.3.patch

During testing, we've noticed the Oracle db running out of cursors. Might be related to this.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-7451) pass function name in create/drop function to authorization api
[ https://issues.apache.org/jira/browse/HIVE-7451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Olaf Flebbe updated HIVE-7451: -- Affects Version/s: (was: 1.2.0) pass function name in create/drop function to authorization api --- Key: HIVE-7451 URL: https://issues.apache.org/jira/browse/HIVE-7451 Project: Hive Issue Type: Bug Components: Authorization Reporter: Thejas M Nair Assignee: Thejas M Nair Fix For: 0.14.0 Attachments: HIVE-7451.1.patch, HIVE-7451.2.patch, HIVE-7451.3.patch, HIVE-7451.4.patch If function names are passed to the authorization api for create/drop function calls, then authorization decisions can be made based on the function names as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10895) ObjectStore does not close Query objects in some calls, causing a potential leak in some metastore db resources
[ https://issues.apache.org/jira/browse/HIVE-10895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14605538#comment-14605538 ] Vaibhav Gumashta commented on HIVE-10895: - [~aihuaxu] I'll be able to look at the patch today. Thanks for the effort. ObjectStore does not close Query objects in some calls, causing a potential leak in some metastore db resources --- Key: HIVE-10895 URL: https://issues.apache.org/jira/browse/HIVE-10895 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 0.14.0, 1.0.0, 1.2.0, 1.1.0, 0.13 Reporter: Takahiko Saito Assignee: Aihua Xu Attachments: HIVE-10895.1.patch, HIVE-10895.2.patch, HIVE-10895.3.patch During testing, we've noticed Oracle db running out of cursors. Might be related to this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7765) Null pointer error with UNION ALL on partitioned tables using Tez
[ https://issues.apache.org/jira/browse/HIVE-7765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14605700#comment-14605700 ]

Alex Bush commented on HIVE-7765:
---------------------------------

A workaround is to create an empty partition by creating the directory in HDFS and running MSCK REPAIR TABLE.

Null pointer error with UNION ALL on partitioned tables using Tez
-----------------------------------------------------------------
                 Key: HIVE-7765
                 URL: https://issues.apache.org/jira/browse/HIVE-7765
             Project: Hive
          Issue Type: Bug
    Affects Versions: 0.14.0, 0.13.1
         Environment: Tez 0.4.1, Ubuntu 12.04, Hadoop 2.4.1.
            Reporter: Chris Dragga
            Priority: Minor

When executing a UNION ALL query in Tez over partitioned tables where at least one table is empty, Hive fails to execute the query, returning the message "FAILED: NullPointerException null". No stack trace accompanies this message. Removing partitioning solves this problem, as does switching to MapReduce as the execution engine.

This can be reproduced using a variant of the example tables from the Getting Started documentation on the Hive wiki. To create the schema, use:

CREATE TABLE invites (foo INT, bar STRING) PARTITIONED BY (ds STRING);
CREATE TABLE empty_invites (foo INT, bar STRING) PARTITIONED BY (ds STRING);

Then, load invites with data (e.g., using the instructions [here|https://cwiki.apache.org/confluence/display/Hive/GettingStarted#GettingStarted-DMLOperations]) and execute the following:

SELECT * FROM invites UNION ALL SELECT * FROM empty_invites;

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7765) Null pointer error with UNION ALL on partitioned tables using Tez
[ https://issues.apache.org/jira/browse/HIVE-7765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14605728#comment-14605728 ]

Alex Bush commented on HIVE-7765:
---------------------------------

Here is how to recreate this bug and use the workaround:

{noformat}
#!/bin/bash
echo "col1,col2" > /tmp/unionall_txt
HIVECONF="--hiveconf hive.root.logger=INFO,console --hiveconf hive.cli.errors.ignore=true"
hive -v $HIVECONF -e "
drop database if exists unionall_test cascade;
create database unionall_test;
use unionall_test;
CREATE TABLE test_a (f1 STRING, f2 STRING) PARTITIONED BY (ds STRING);
CREATE TABLE test_b (f1 STRING, f2 STRING) PARTITIONED BY (ds STRING);
LOAD DATA LOCAL INPATH '/tmp/unionall_txt' OVERWRITE INTO TABLE test_a PARTITION ( ds='a' );
-- Fails: test_b has no partitions.
SELECT * FROM test_a UNION ALL SELECT * FROM test_b;
-- Workaround: add an empty partition to test_b, after which the query succeeds.
alter table test_b add partition ( ds='b' );
SELECT * FROM test_a UNION ALL SELECT * FROM test_b;"
{noformat}

Null pointer error with UNION ALL on partitioned tables using Tez
-----------------------------------------------------------------
                 Key: HIVE-7765
                 URL: https://issues.apache.org/jira/browse/HIVE-7765
             Project: Hive
          Issue Type: Bug
    Affects Versions: 0.14.0, 0.13.1
         Environment: Tez 0.4.1, Ubuntu 12.04, Hadoop 2.4.1.
            Reporter: Chris Dragga
            Priority: Minor

When executing a UNION ALL query in Tez over partitioned tables where at least one table is empty, Hive fails to execute the query, returning the message "FAILED: NullPointerException null". No stack trace accompanies this message. Removing partitioning solves this problem, as does switching to MapReduce as the execution engine.

This can be reproduced using a variant of the example tables from the Getting Started documentation on the Hive wiki. To create the schema, use:

CREATE TABLE invites (foo INT, bar STRING) PARTITIONED BY (ds STRING);
CREATE TABLE empty_invites (foo INT, bar STRING) PARTITIONED BY (ds STRING);

Then, load invites with data (e.g., using the instructions [here|https://cwiki.apache.org/confluence/display/Hive/GettingStarted#GettingStarted-DMLOperations]) and execute the following:

SELECT * FROM invites UNION ALL SELECT * FROM empty_invites;

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-7765) Null pointer error with UNION ALL on partitioned tables using Tez
[ https://issues.apache.org/jira/browse/HIVE-7765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alex Bush updated HIVE-7765: Environment: Tez 0.4.1, Ubuntu 12.04, Hadoop 2.4.1. Hadoop 2.2.6 was:Tez 0.4.1, Ubuntu 12.04, Hadoop 2.4.1. Null pointer error with UNION ALL on partitioned tables using Tez - Key: HIVE-7765 URL: https://issues.apache.org/jira/browse/HIVE-7765 Project: Hive Issue Type: Bug Affects Versions: 0.14.0, 0.13.1 Environment: Tez 0.4.1, Ubuntu 12.04, Hadoop 2.4.1. Hadoop 2.2.6 Reporter: Chris Dragga Priority: Minor When executing a UNION ALL query in Tez over partitioned tables where at least one table is empty, Hive fails to execute the query, returning the message FAILED: NullPointerException null. No stack trace accompanies this message. Removing partitioning solves this problem, as does switching to MapReduce as the execution engine. This can be reproduced using a variant of the example tables from the Getting Started documentation on the Hive wiki. To create the schema, use CREATE TABLE invites (foo INT, bar STRING) PARTITIONED BY (ds STRING); CREATE TABLE empty_invites (foo INT, bar STRING) PARTITIONED BY (ds STRING); Then, load invites with data (e.g., using the instructions [here|https://cwiki.apache.org/confluence/display/Hive/GettingStarted#GettingStarted-DMLOperations]) and execute the following: SELECT * FROM invites UNION ALL SELECT * FROM empty_invites; -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11055) HPL/SQL - Implementing Procedural SQL in Hive (PL/HQL Contribution)
[ https://issues.apache.org/jira/browse/HIVE-11055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14605713#comment-14605713 ]

Hive QA commented on HIVE-11055:
--------------------------------

{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12742497/HIVE-11055.3.patch

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 9033 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_schemeAuthority
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_stats_counter
{noformat}

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4430/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4430/console
Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4430/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12742497 - PreCommit-HIVE-TRUNK-Build

HPL/SQL - Implementing Procedural SQL in Hive (PL/HQL Contribution)
-------------------------------------------------------------------
                 Key: HIVE-11055
                 URL: https://issues.apache.org/jira/browse/HIVE-11055
             Project: Hive
          Issue Type: Improvement
            Reporter: Dmitry Tolpeko
            Assignee: Dmitry Tolpeko
         Attachments: HIVE-11055.1.patch, HIVE-11055.2.patch, HIVE-11055.3.patch

There is PL/HQL tool (www.plhql.org) that implements procedural SQL for Hive (actually any SQL-on-Hadoop implementation and any JDBC source). Alan Gates offered to contribute it to Hive under HPL/SQL name (org.apache.hive.hplsql package). This JIRA is to create a patch to contribute the PL/HQL code.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11108) HashTableSinkOperator doesn't support vectorization [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-11108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14605736#comment-14605736 ]

Hive QA commented on HIVE-11108:
--------------------------------

{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12742505/HIVE-11108.1-spark.patch

{color:red}ERROR:{color} -1 due to 8 failed/errored test(s), 7992 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.initializationError
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_index_bitmap_auto
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_vector_decimal_mapjoin
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_vector_left_outer_join
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_vector_mapjoin_reduce
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_vectorized_mapjoin
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_vectorized_nested_mapjoin
org.apache.hive.jdbc.TestSSL.testSSLConnectionWithProperty
{noformat}

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/915/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/915/console
Test logs: http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-915/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 8 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12742505 - PreCommit-HIVE-SPARK-Build

HashTableSinkOperator doesn't support vectorization [Spark Branch]
------------------------------------------------------------------
                 Key: HIVE-11108
                 URL: https://issues.apache.org/jira/browse/HIVE-11108
             Project: Hive
          Issue Type: Sub-task
          Components: Spark
            Reporter: Rui Li
            Assignee: Rui Li
         Attachments: HIVE-11108.1-spark.patch

This prevents any BaseWork containing HTS from being vectorized. It's basically specific to Spark, because Tez doesn't use HTS and MR runs HTS in local tasks. We should verify whether it makes sense to make HTS support vectorization.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10754) new Job() is deprecated. Replaced all with Job.getInstance() for Hcatalog
[ https://issues.apache.org/jira/browse/HIVE-10754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14605750#comment-14605750 ]

Aihua Xu commented on HIVE-10754:
---------------------------------

[~ctang.ma] I have switched the task to replacing the deprecated calls with the new calls in HCatalog. It should not have any functional impact.

new Job() is deprecated. Replaced all with Job.getInstance() for Hcatalog
-------------------------------------------------------------------------
                 Key: HIVE-10754
                 URL: https://issues.apache.org/jira/browse/HIVE-10754
             Project: Hive
          Issue Type: Sub-task
          Components: HCatalog
    Affects Versions: 1.2.0
            Reporter: Aihua Xu
            Assignee: Aihua Xu
         Attachments: HIVE-10754.patch

Replace all the deprecated new Job() calls with Job.getInstance() in HCatalog.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10895) ObjectStore does not close Query objects in some calls, causing a potential leak in some metastore db resources
[ https://issues.apache.org/jira/browse/HIVE-10895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14605667#comment-14605667 ]

Aihua Xu commented on HIVE-10895:
---------------------------------

I'd really appreciate it if you could review the code and give it a test so that we can move it forward, [~vgumashta]. Customers are actually seeing the out-of-cursors errors in production. I'm trying to reproduce it locally (not able to yet). It would be great if you could try it out on the test system.

ObjectStore does not close Query objects in some calls, causing a potential leak in some metastore db resources
---------------------------------------------------------------------------------------------------------------
                 Key: HIVE-10895
                 URL: https://issues.apache.org/jira/browse/HIVE-10895
             Project: Hive
          Issue Type: Bug
          Components: Metastore
    Affects Versions: 0.14.0, 1.0.0, 1.2.0, 1.1.0, 0.13
            Reporter: Takahiko Saito
            Assignee: Aihua Xu
         Attachments: HIVE-10895.1.patch, HIVE-10895.2.patch, HIVE-10895.3.patch

During testing, we've noticed the Oracle db running out of cursors. Might be related to this.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
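The leak pattern under discussion - a metastore Query object (and its underlying db cursor) left open when a call returns or throws - is typically fixed by releasing the resource in a finally block. A hypothetical Python sketch of that pattern (names invented; the real fix lives in ObjectStore's Java code):

```python
class Query:
    """Hypothetical stand-in for a JDO query holding a db cursor."""
    def __init__(self):
        self.closed = False
    def execute(self):
        return ["row"]
    def close_all(self):
        # Releases the underlying cursor; skipping this is the leak.
        self.closed = True

def list_partitions(make_query):
    # The fix pattern: always release the cursor, even if execute() throws.
    query = make_query()
    try:
        return list(query.execute())
    finally:
        query.close_all()

created = []
def make_query():
    q = Query()
    created.append(q)
    return q

rows = list_partitions(make_query)
# The query is closed whether or not execute() succeeded.
```

With a pool of cursors capped on the db side (as with Oracle's open_cursors limit), any path that returns without the finally-style close eventually exhausts the pool.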
[jira] [Commented] (HIVE-11112) ISO-8859-1 text output has fragments of previous longer rows appended
[ https://issues.apache.org/jira/browse/HIVE-11112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14605669#comment-14605669 ]

Yongzhi Chen commented on HIVE-11112:
-------------------------------------

[~wisgood], you can merge my test case. I will just resolve my jira; you can merge my fixes and commit from your jira. Thanks.

ISO-8859-1 text output has fragments of previous longer rows appended
---------------------------------------------------------------------
                 Key: HIVE-11112
                 URL: https://issues.apache.org/jira/browse/HIVE-11112
             Project: Hive
          Issue Type: Bug
          Components: Serializers/Deserializers
    Affects Versions: 1.2.0
            Reporter: Yongzhi Chen
            Assignee: Yongzhi Chen
         Attachments: HIVE-11112.1.patch

If a LazySimpleSerDe table is created using ISO 8859-1 encoding, query results for a string column are incorrect for any row that was preceded by a row containing a longer string. Example steps to reproduce:

1. Create a table using ISO 8859-1 encoding:
{code:sql}
CREATE TABLE person_lat1 (name STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES ('serialization.encoding'='ISO8859_1');
{code}

2. Copy an ISO-8859-1 encoded text file into the appropriate warehouse folder in HDFS. I'll attach an example file containing the following text:
{noformat}
Müller,Thomas
Jørgensen,Jørgen
Peña,Andrés
Nåm,Fæk
{noformat}

3. Execute {{SELECT * FROM person_lat1}}

Result - the following output appears:
{noformat}
+-------------------+--+
| person_lat1.name  |
+-------------------+--+
| Müller,Thomas     |
| Jørgensen,Jørgen  |
| Peña,Andrésørgen  |
| Nåm,Fækdrésørgen  |
+-------------------+--+
{noformat}

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
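The stale fragments in the output above ("Peña,Andrésørgen") are the classic symptom of decoding a reused buffer past its valid length: Hadoop's Text keeps a backing array that is not shrunk when a shorter value is written into it. A minimal Python analogue of the bug and the fix (ASCII stand-in data; not Hive's actual SerDe code):

```python
# A reused buffer, as Hadoop's Text does: writing a shorter value leaves
# bytes from the previous, longer value beyond the valid length.
buf = bytearray(b"Jorgensen,Jorgen")
new_value = b"Pena,Andres"
buf[:len(new_value)] = new_value
valid_len = len(new_value)  # what Text.getLength() would report

# Bug: decoding the whole backing array picks up stale trailing bytes.
wrong = bytes(buf).decode("latin-1")
# Fix: decode only the valid prefix (what Text.copyBytes() would return).
right = bytes(buf[:valid_len]).decode("latin-1")

assert wrong == "Pena,Andresorgen"  # stale fragment appended, as in the bug
assert right == "Pena,Andres"
```

The same off-by-tail corruption appears in any code that pairs Text.getBytes() with the array's full length instead of Text.getLength().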
[jira] [Updated] (HIVE-7765) Null pointer error with UNION ALL on partitioned tables using Tez
[ https://issues.apache.org/jira/browse/HIVE-7765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alex Bush updated HIVE-7765: Priority: Major (was: Minor) Null pointer error with UNION ALL on partitioned tables using Tez - Key: HIVE-7765 URL: https://issues.apache.org/jira/browse/HIVE-7765 Project: Hive Issue Type: Bug Affects Versions: 0.14.0, 0.13.1 Environment: Tez 0.4.1, Ubuntu 12.04, Hadoop 2.4.1; Hadoop 2.2.6, Tez 0.5.2, Hive 0.14.0, CentOS 6.6 Reporter: Chris Dragga When executing a UNION ALL query in Tez over partitioned tables where at least one table is empty, Hive fails to execute the query, returning the message FAILED: NullPointerException null. No stack trace accompanies this message. Removing partitioning solves this problem, as does switching to MapReduce as the execution engine. This can be reproduced using a variant of the example tables from the Getting Started documentation on the Hive wiki. To create the schema, use CREATE TABLE invites (foo INT, bar STRING) PARTITIONED BY (ds STRING); CREATE TABLE empty_invites (foo INT, bar STRING) PARTITIONED BY (ds STRING); Then, load invites with data (e.g., using the instructions [here|https://cwiki.apache.org/confluence/display/Hive/GettingStarted#GettingStarted-DMLOperations]) and execute the following: SELECT * FROM invites UNION ALL SELECT * FROM empty_invites; -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-7765) Null pointer error with UNION ALL on partitioned tables using Tez
[ https://issues.apache.org/jira/browse/HIVE-7765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alex Bush updated HIVE-7765: Environment: Tez 0.4.1, Ubuntu 12.04, Hadoop 2.4.1; Hadoop 2.2.6, Tez 0.5.2, Hive 0.14.0, CentOS 6.6 was: Tez 0.4.1, Ubuntu 12.04, Hadoop 2.4.1. Hadoop 2.2.6 Null pointer error with UNION ALL on partitioned tables using Tez - Key: HIVE-7765 URL: https://issues.apache.org/jira/browse/HIVE-7765 Project: Hive Issue Type: Bug Affects Versions: 0.14.0, 0.13.1 Environment: Tez 0.4.1, Ubuntu 12.04, Hadoop 2.4.1; Hadoop 2.2.6, Tez 0.5.2, Hive 0.14.0, CentOS 6.6 Reporter: Chris Dragga Priority: Minor When executing a UNION ALL query in Tez over partitioned tables where at least one table is empty, Hive fails to execute the query, returning the message FAILED: NullPointerException null. No stack trace accompanies this message. Removing partitioning solves this problem, as does switching to MapReduce as the execution engine. This can be reproduced using a variant of the example tables from the Getting Started documentation on the Hive wiki. To create the schema, use CREATE TABLE invites (foo INT, bar STRING) PARTITIONED BY (ds STRING); CREATE TABLE empty_invites (foo INT, bar STRING) PARTITIONED BY (ds STRING); Then, load invites with data (e.g., using the instructions [here|https://cwiki.apache.org/confluence/display/Hive/GettingStarted#GettingStarted-DMLOperations]) and execute the following: SELECT * FROM invites UNION ALL SELECT * FROM empty_invites; -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11130) Refactoring the code so that HiveTxnManager interface will support lock/unlock table/database object
[ https://issues.apache.org/jira/browse/HIVE-11130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14605782#comment-14605782 ]

Aihua Xu commented on HIVE-11130:
---------------------------------

It seems the two test failures are not related to this refactoring.

Refactoring the code so that HiveTxnManager interface will support lock/unlock table/database object
----------------------------------------------------------------------------------------------------
                 Key: HIVE-11130
                 URL: https://issues.apache.org/jira/browse/HIVE-11130
             Project: Hive
          Issue Type: Sub-task
          Components: Locking
    Affects Versions: 2.0.0
            Reporter: Aihua Xu
            Assignee: Aihua Xu
         Attachments: HIVE-11130.patch

This is just a refactoring step which keeps the current logic, but it exposes explicit lock/unlock operations for tables and databases in HiveTxnManager. These should be implemented differently by the subclasses (currently they are not; e.g., for the ZooKeeper implementation, we should lock the table and database when we try to lock the table).

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
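The refactoring shape described above - an interface exposing explicit table/database lock operations that subclasses can override - can be sketched as follows (a hypothetical Python stand-in for the Java interface, not the actual HiveTxnManager code):

```python
from abc import ABC, abstractmethod

class TxnManagerSketch(ABC):
    """Hypothetical stand-in for HiveTxnManager after the refactoring:
    explicit table/database lock operations that subclasses (e.g. a
    ZooKeeper-based manager) can implement differently."""

    @abstractmethod
    def lock_table(self, db, table): ...
    @abstractmethod
    def unlock_table(self, db, table): ...
    @abstractmethod
    def lock_database(self, db): ...
    @abstractmethod
    def unlock_database(self, db): ...

class InMemoryTxnManager(TxnManagerSketch):
    """Trivial subclass keeping the current (simple) logic."""
    def __init__(self):
        self.locks = []
    def lock_table(self, db, table):
        self.locks.append((db, table))
    def unlock_table(self, db, table):
        self.locks.remove((db, table))
    def lock_database(self, db):
        self.locks.append((db, None))
    def unlock_database(self, db):
        self.locks.remove((db, None))
```

The point of the refactoring is exactly this split: callers program against the abstract lock/unlock methods, while each subclass decides what acquiring a table or database lock actually entails.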
[jira] [Commented] (HIVE-11112) ISO-8859-1 text output has fragments of previous longer rows appended
[ https://issues.apache.org/jira/browse/HIVE-11112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14605788#comment-14605788 ]

Xuefu Zhang commented on HIVE-11112:
------------------------------------

To accelerate the process, I committed the patch here to branch-1 and master. [~wisgood], could you consolidate HIVE-11095 and HIVE-10983? Add a test case if needed. I should be able to review it quickly. Thanks.

Thanks to Yongzhi and Xiaowei for working on this.

ISO-8859-1 text output has fragments of previous longer rows appended
---------------------------------------------------------------------
                 Key: HIVE-11112
                 URL: https://issues.apache.org/jira/browse/HIVE-11112
             Project: Hive
          Issue Type: Bug
          Components: Serializers/Deserializers
    Affects Versions: 1.2.0
            Reporter: Yongzhi Chen
            Assignee: Yongzhi Chen
             Fix For: 1.3.0, 2.0.0
         Attachments: HIVE-11112.1.patch

If a LazySimpleSerDe table is created using ISO 8859-1 encoding, query results for a string column are incorrect for any row that was preceded by a row containing a longer string. Example steps to reproduce:

1. Create a table using ISO 8859-1 encoding:
{code:sql}
CREATE TABLE person_lat1 (name STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES ('serialization.encoding'='ISO8859_1');
{code}

2. Copy an ISO-8859-1 encoded text file into the appropriate warehouse folder in HDFS. I'll attach an example file containing the following text:
{noformat}
Müller,Thomas
Jørgensen,Jørgen
Peña,Andrés
Nåm,Fæk
{noformat}

3. Execute {{SELECT * FROM person_lat1}}

Result - the following output appears:
{noformat}
+-------------------+--+
| person_lat1.name  |
+-------------------+--+
| Müller,Thomas     |
| Jørgensen,Jørgen  |
| Peña,Andrésørgen  |
| Nåm,Fækdrésørgen  |
+-------------------+--+
{noformat}

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-7765) Null pointer error with UNION ALL on partitioned tables using Tez
[ https://issues.apache.org/jira/browse/HIVE-7765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alex Bush updated HIVE-7765: Affects Version/s: 0.14.0 Null pointer error with UNION ALL on partitioned tables using Tez - Key: HIVE-7765 URL: https://issues.apache.org/jira/browse/HIVE-7765 Project: Hive Issue Type: Bug Affects Versions: 0.14.0, 0.13.1 Environment: Tez 0.4.1, Ubuntu 12.04, Hadoop 2.4.1. Reporter: Chris Dragga Priority: Minor When executing a UNION ALL query in Tez over partitioned tables where at least one table is empty, Hive fails to execute the query, returning the message FAILED: NullPointerException null. No stack trace accompanies this message. Removing partitioning solves this problem, as does switching to MapReduce as the execution engine. This can be reproduced using a variant of the example tables from the Getting Started documentation on the Hive wiki. To create the schema, use CREATE TABLE invites (foo INT, bar STRING) PARTITIONED BY (ds STRING); CREATE TABLE empty_invites (foo INT, bar STRING) PARTITIONED BY (ds STRING); Then, load invites with data (e.g., using the instructions [here|https://cwiki.apache.org/confluence/display/Hive/GettingStarted#GettingStarted-DMLOperations]) and execute the following: SELECT * FROM invites UNION ALL SELECT * FROM empty_invites; -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7765) Null pointer error with UNION ALL on partitioned tables using Tez
[ https://issues.apache.org/jira/browse/HIVE-7765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14605693#comment-14605693 ]

Alex Bush commented on HIVE-7765:
---------------------------------

Stack trace from the error:

{noformat}
SELECT * FROM test5_a UNION ALL SELECT * FROM test5_b
15/06/29 15:11:35 [main]: INFO log.PerfLogger: <PERFLOG method=Driver.run from=org.apache.hadoop.hive.ql.Driver>
15/06/29 15:11:35 [main]: INFO log.PerfLogger: <PERFLOG method=TimeToSubmit from=org.apache.hadoop.hive.ql.Driver>
15/06/29 15:11:35 [main]: INFO ql.Driver: Concurrency mode is disabled, not creating a lock manager
15/06/29 15:11:35 [main]: INFO log.PerfLogger: <PERFLOG method=compile from=org.apache.hadoop.hive.ql.Driver>
15/06/29 15:11:35 [main]: INFO log.PerfLogger: <PERFLOG method=parse from=org.apache.hadoop.hive.ql.Driver>
15/06/29 15:11:35 [main]: INFO parse.ParseDriver: Parsing command: SELECT * FROM test5_a UNION ALL SELECT * FROM test5_b
15/06/29 15:11:35 [main]: INFO parse.ParseDriver: Parse Completed
15/06/29 15:11:35 [main]: INFO log.PerfLogger: </PERFLOG method=parse start=1435587095311 end=1435587095313 duration=2 from=org.apache.hadoop.hive.ql.Driver>
15/06/29 15:11:35 [main]: INFO log.PerfLogger: <PERFLOG method=semanticAnalyze from=org.apache.hadoop.hive.ql.Driver>
15/06/29 15:11:35 [main]: INFO parse.SemanticAnalyzer: Starting Semantic Analysis
15/06/29 15:11:35 [main]: INFO parse.SemanticAnalyzer: Completed phase 1 of Semantic Analysis
15/06/29 15:11:35 [main]: INFO parse.SemanticAnalyzer: Get metadata for source tables
15/06/29 15:11:35 [main]: INFO parse.SemanticAnalyzer: Get metadata for subqueries
15/06/29 15:11:35 [main]: INFO parse.SemanticAnalyzer: Get metadata for source tables
15/06/29 15:11:35 [main]: INFO parse.SemanticAnalyzer: Get metadata for subqueries
15/06/29 15:11:35 [main]: INFO parse.SemanticAnalyzer: Get metadata for destination tables
15/06/29 15:11:35 [main]: INFO parse.SemanticAnalyzer: Get metadata for source tables
15/06/29 15:11:35 [main]: INFO parse.SemanticAnalyzer: Get metadata for subqueries
15/06/29 15:11:35 [main]: INFO parse.SemanticAnalyzer: Get metadata for destination tables
15/06/29 15:11:35 [main]: INFO parse.SemanticAnalyzer: Get metadata for destination tables
15/06/29 15:11:35 [main]: INFO ql.Context: New scratch dir is hdfs://upgtst226/tmp/hive/hdp_batch/57614a3b-aa9a-4bf8-82ca-4451f72b9d28/hive_2015-06-29_15-11-35_310_293431230946478382-1
15/06/29 15:11:35 [main]: INFO parse.SemanticAnalyzer: Completed getting MetaData in Semantic Analysis
15/06/29 15:11:35 [main]: INFO parse.SemanticAnalyzer: Not invoking CBO because the statement has too few joins
15/06/29 15:11:35 [main]: INFO parse.SemanticAnalyzer: Set stats collection dir : hdfs://upgtst226/tmp/hive/hdp_batch/57614a3b-aa9a-4bf8-82ca-4451f72b9d28/hive_2015-06-29_15-11-35_310_293431230946478382-1/-ext-10002
15/06/29 15:11:35 [main]: INFO ppd.OpProcFactory: Processing for FS(6)
15/06/29 15:11:35 [main]: INFO ppd.OpProcFactory: Processing for SEL(5)
15/06/29 15:11:35 [main]: INFO ppd.OpProcFactory: Processing for UNION(4)
15/06/29 15:11:35 [main]: INFO ppd.OpProcFactory: Processing for SEL(1)
15/06/29 15:11:35 [main]: INFO ppd.OpProcFactory: Processing for TS(0)
15/06/29 15:11:35 [main]: INFO ppd.OpProcFactory: Processing for SEL(3)
15/06/29 15:11:35 [main]: INFO ppd.OpProcFactory: Processing for TS(2)
15/06/29 15:11:35 [main]: INFO log.PerfLogger: <PERFLOG method=partition-retrieving from=org.apache.hadoop.hive.ql.optimizer.ppr.PartitionPruner>
15/06/29 15:11:35 [main]: INFO log.PerfLogger: </PERFLOG method=partition-retrieving start=1435587095494 end=1435587095606 duration=112 from=org.apache.hadoop.hive.ql.optimizer.ppr.PartitionPruner>
15/06/29 15:11:35 [main]: INFO log.PerfLogger: <PERFLOG method=partition-retrieving from=org.apache.hadoop.hive.ql.optimizer.ppr.PartitionPruner>
15/06/29 15:11:35 [main]: INFO log.PerfLogger: </PERFLOG method=partition-retrieving start=1435587095606 end=1435587095735 duration=129 from=org.apache.hadoop.hive.ql.optimizer.ppr.PartitionPruner>
15/06/29 15:11:35 [main]: INFO parse.TezCompiler: Cycle free: true
15/06/29 15:11:35 [main]: INFO log.PerfLogger: <PERFLOG method=serializePlan from=org.apache.hadoop.hive.ql.exec.Utilities>
15/06/29 15:11:35 [main]: INFO exec.Utilities: Serializing ArrayList via kryo
15/06/29 15:11:35 [main]: INFO log.PerfLogger: </PERFLOG method=serializePlan start=1435587095740 end=1435587095743 duration=3 from=org.apache.hadoop.hive.ql.exec.Utilities>
15/06/29 15:11:35 [main]: INFO log.PerfLogger: <PERFLOG method=deserializePlan from=org.apache.hadoop.hive.ql.exec.Utilities>
15/06/29 15:11:35 [main]: INFO exec.Utilities: Deserializing ArrayList via kryo
15/06/29 15:11:35 [main]: INFO log.PerfLogger: </PERFLOG method=deserializePlan start=1435587095743 end=1435587095746 duration=3 from=org.apache.hadoop.hive.ql.exec.Utilities>
15/06/29 15:11:35 [main]: INFO log.PerfLogger: PERFLOG
{noformat}
[jira] [Commented] (HIVE-10983) SerDeUtils bug ,when Text is reused
[ https://issues.apache.org/jira/browse/HIVE-10983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14605959#comment-14605959 ] Sushanth Sowmyan commented on HIVE-10983: - Not a problem! As part of the release process, I'm required to go unset all jiras marked for older released releases, and that's what I was doing. :) To expand further, the idea is that Fix Version is set to track which branches the commits got committed to, and thus, should not be set unless this patch has already been committed to those branches. So, now, for example, if this commit is committed to branch-1.2 to track 1.2.x, its fix version would be 1.2.2 once it is committed. Setting it to 1.2.0 would mean that this was included as part of the 1.2.0 release, which it wasn't. So, for this, when a committer commits a patch for this bug, if they commit it to branch-1.2, they should then set the fix version to 1.2.2. SerDeUtils bug ,when Text is reused - Key: HIVE-10983 URL: https://issues.apache.org/jira/browse/HIVE-10983 Project: Hive Issue Type: Bug Components: API, CLI Affects Versions: 0.14.0, 1.0.0, 1.2.0 Environment: Hadoop 2.3.0-cdh5.0.0 Hive 0.14 Reporter: xiaowei wang Assignee: xiaowei wang Labels: patch Fix For: 2.0.0 Attachments: HIVE-10983.1.patch.txt, HIVE-10983.2.patch.txt, HIVE-10983.3.patch.txt, HIVE-10983.4.patch.txt, HIVE-10983.5.patch.txt
{noformat}
The methods transformTextToUTF8 and transformTextFromUTF8 have a bug: they invoke a dangerous method of Text, getBytes(). The getBytes method of Text returns the raw bytes; however, only data up to Text.length is valid. A better way is to use copyBytes() if you need the returned array to be precisely the length of the data. But copyBytes() was only added after Hadoop 1.
{noformat}
When I query data from an LZO table, I found in the results that the length of the current row is always larger than that of the previous row, and sometimes the current row contains the contents of the previous row. For example, I execute a SQL:
{code:sql}
select * from web_searchhub where logdate=2015061003
{code}
The result of the SQL is shown below. Notice that the second row's content contains the first row's content.
{noformat}
INFO [03:00:05.589] HttpFrontServer::FrontSH msgRecv:Remote=/10.13.193.68:42098,session=3151,thread=254 2015061003
INFO [03:00:05.594] 18941e66-9962-44ad-81bc-3519f47ba274 session=901,thread=223ession=3151,thread=254 2015061003
{noformat}
The content of the original LZO file is shown below, just 2 rows.
{noformat}
INFO [03:00:05.635] b88e0473-7530-494c-82d8-e2d2ebd2666c_forweb session=3148,thread=285
INFO [03:00:05.635] HttpFrontServer::FrontSH msgRecv:Remote=/10.13.193.68:42095,session=3148,thread=285
{noformat}
I think this error is caused by the Text reuse, and I found the solution. Additionally, the table create SQL is:
{code:sql}
CREATE EXTERNAL TABLE `web_searchhub`(`line` string)
PARTITIONED BY (`logdate` string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\\U'
WITH SERDEPROPERTIES ('serialization.encoding'='GBK')
STORED AS
  INPUTFORMAT com.hadoop.mapred.DeprecatedLzoTextInputFormat
  OUTPUTFORMAT org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
LOCATION 'viewfs://nsX/user/hive/warehouse/raw.db/web/web_searchhub';
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
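The buffer-reuse pitfall described above can be demonstrated without Hadoop. The class below is a minimal, hypothetical stand-in for Hadoop's Text (for illustration only): its backing array grows but never shrinks, so after a long row, a shorter row leaves stale bytes past the valid length. That is why getBytes() must be paired with getLength(), or replaced with a copyBytes()-style copy.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Minimal stand-in for Hadoop's Text (illustrative only): the backing byte
// array is reused across rows and only ever grows, so a short row leaves
// stale bytes from the previous, longer row beyond getLength().
public class ReusedText {
    private byte[] bytes = new byte[0];
    private int length;

    public void set(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        if (bytes.length < utf8.length) {
            bytes = new byte[utf8.length]; // grow, never shrink
        }
        System.arraycopy(utf8, 0, bytes, 0, utf8.length);
        length = utf8.length;
    }

    // Raw buffer: may contain a stale tail beyond getLength().
    public byte[] getBytes() { return bytes; }

    public int getLength() { return length; }

    // Safe: returns exactly getLength() bytes, in the spirit of Text#copyBytes().
    public byte[] copyBytes() { return Arrays.copyOf(bytes, length); }
}
```

Decoding the raw buffer after a short row reproduces the "previous row appended" symptom from the bug report, while copyBytes() (or any decode that honors getLength()) does not.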
[jira] [Commented] (HIVE-11137) In DateWritable remove the use of LazyBinaryUtils
[ https://issues.apache.org/jira/browse/HIVE-11137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14605940#comment-14605940 ] Nishant Kelkar commented on HIVE-11137: --- LazyBinaryUtils is used only for readVInt() and writeVInt(). Relevant sections of code from LazyBinaryUtils:
{code}
private static ThreadLocal<byte[]> vLongBytesThreadLocal = new ThreadLocal<byte[]>() {
  @Override
  public byte[] initialValue() {
    return new byte[9];
  }
};

public static void writeVLong(RandomAccessOutput byteStream, long l) {
  byte[] vLongBytes = vLongBytesThreadLocal.get();
  int len = LazyBinaryUtils.writeVLongToByteArray(vLongBytes, l);
  byteStream.write(vLongBytes, 0, len);
}
{code}
{code}
/**
 * Reads a zero-compressed encoded int from a byte array and returns it.
 *
 * @param bytes
 *          the byte array
 * @param offset
 *          offset of the array to read from
 * @param vInt
 *          storing the deserialized int and its size in byte
 */
public static void readVInt(byte[] bytes, int offset, VInt vInt) {
  byte firstByte = bytes[offset];
  vInt.length = (byte) WritableUtils.decodeVIntSize(firstByte);
  if (vInt.length == 1) {
    vInt.value = firstByte;
    return;
  }
  int i = 0;
  for (int idx = 0; idx < vInt.length - 1; idx++) {
    byte b = bytes[offset + 1 + idx];
    i = i << 8;
    i = i | (b & 0xFF);
  }
  vInt.value = (WritableUtils.isNegativeVInt(firstByte) ? (i ^ -1) : i);
}
{code}
I could contribute a patch towards this task [~owen.omalley] (I'm a beginner contributor in Hive, looking around for work :)). Thanks, and let me know! In DateWritable remove the use of LazyBinaryUtils - Key: HIVE-11137 URL: https://issues.apache.org/jira/browse/HIVE-11137 Project: Hive Issue Type: Sub-task Reporter: Owen O'Malley Assignee: Owen O'Malley Currently the DateWritable class uses LazyBinaryUtils, which has a lot of dependencies. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
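The decode loop quoted above can be exercised standalone. The sketch below uses a simplified, hypothetical encoding (one explicit length byte followed by the value's big-endian bytes, non-negative values only; this is not Hadoop's actual zero-compressed format) purely to illustrate the shift-and-mask accumulation that readVInt performs.

```java
public class VIntSketch {
    // Hypothetical, simplified encoding: out[0] = payload byte count (1..4),
    // followed by the value's big-endian bytes. Not Hadoop's real wire format.
    public static byte[] writeVInt(int value) {
        if (value < 0) throw new IllegalArgumentException("sketch handles non-negative values only");
        int n = 1;
        while (n < 4 && (value >>> (8 * n)) != 0) n++; // minimal byte count
        byte[] out = new byte[1 + n];
        out[0] = (byte) n;
        for (int idx = 0; idx < n; idx++) {
            out[1 + idx] = (byte) (value >>> (8 * (n - 1 - idx))); // most significant first
        }
        return out;
    }

    // The same accumulation pattern as the quoted readVInt: shift left by 8,
    // then OR in the next byte, masking with 0xFF to undo sign extension.
    public static int readVInt(byte[] bytes, int offset) {
        int len = bytes[offset];
        int i = 0;
        for (int idx = 0; idx < len; idx++) {
            byte b = bytes[offset + 1 + idx];
            i = i << 8;
            i = i | (b & 0xFF);
        }
        return i;
    }
}
```

The `b & 0xFF` mask is the step most often dropped by mistake: without it, a payload byte ≥ 0x80 sign-extends to a negative int and corrupts the accumulated value.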
[jira] [Commented] (HIVE-11141) Improve RuleRegExp when the Expression node stack gets huge
[ https://issues.apache.org/jira/browse/HIVE-11141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14606699#comment-14606699 ] Hive QA commented on HIVE-11141: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12742590/HIVE-11141.2.patch {color:red}ERROR:{color} -1 due to 116 failed/errored test(s), 9034 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join20 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join28 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join29 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join_nulls org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_correlationoptimizer14 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cross_product_check_2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_delete_all_non_partitioned org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_delete_all_partitioned org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_delete_orig_table org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_delete_whole_partition org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_dynpart_sort_optimization2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_dynpart_sort_optimization_acid org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_filter_join_breaktask org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby10 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby11 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby2_map_skew org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby8 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby8_map_skew org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_cube1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_rollup1 
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_infer_bucket_sort org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_insert_update_delete org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join20 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join40 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join_nullsafe org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_limit_partition_metadataonly org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_limit_pushdown org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_mapjoin1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_mapjoin_filter_on_outerjoin org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_metadataonly1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_multi_insert_gby3 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_nonblock_op_deduplicate org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_partInit org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_partition_date2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_partition_timestamp2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ptf org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_reduce_deduplicate_extended org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_reducesink_dedup org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample8 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_notin org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union27 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_unionDistinct_1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_update_all_non_partitioned org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_update_all_partitioned org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_update_two_cols org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_groupby_reduce org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_join_nulls 
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_mr_diff_schema_alias org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorization_limit org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorized_ptf org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_reduce_deduplicate org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_auto_join29 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_auto_sortmerge_join_12 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_bucket_map_join_tez1 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_cross_join org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_cross_product_check_1 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_cross_product_check_2 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_delete_all_non_partitioned org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_delete_all_partitioned org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_delete_orig_table
[jira] [Commented] (HIVE-11138) Query fails when there isn't a comparator for an operator [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-11138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14605375#comment-14605375 ] Hive QA commented on HIVE-11138: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12742480/HIVE-11138.1-spark.patch {color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 6207 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.initializationError org.apache.hadoop.hive.cli.TestHBaseCliDriver.initializationError org.apache.hadoop.hive.cli.TestMiniTezCliDriver.initializationError org.apache.hadoop.hive.cli.TestMinimrCliDriver.initializationError org.apache.hadoop.hive.cli.TestNegativeCliDriver.initializationError org.apache.hadoop.hive.cli.TestNegativeMinimrCliDriver.initializationError {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/914/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/914/console Test logs: http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-914/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 6 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12742480 - PreCommit-HIVE-SPARK-Build Query fails when there isn't a comparator for an operator [Spark Branch] Key: HIVE-11138 URL: https://issues.apache.org/jira/browse/HIVE-11138 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Rui Li Assignee: Rui Li Attachments: HIVE-11138.1-spark.patch, HIVE-11138.1-spark.patch In such case, OperatorComparatorFactory should default to false instead of throw exceptions. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7205) Wrong results when union all of grouping followed by group by with correlation optimization
[ https://issues.apache.org/jira/browse/HIVE-7205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14605462#comment-14605462 ] dima machlin commented on HIVE-7205: Will this patch be merged to future versions? Until what version is it safe to apply this patch? Wrong results when union all of grouping followed by group by with correlation optimization --- Key: HIVE-7205 URL: https://issues.apache.org/jira/browse/HIVE-7205 Project: Hive Issue Type: Bug Affects Versions: 0.12.0, 0.13.0, 0.13.1 Reporter: dima machlin Assignee: Navis Priority: Critical Attachments: HIVE-7205.1.patch.txt, HIVE-7205.2.patch.txt, HIVE-7205.3.patch.txt, HIVE-7205.4.patch.txt use case : table TBL (a string,b string) contains single row : 'a','a' the following query : {code:sql} select b, sum(cc) from ( select b,count(1) as cc from TBL group by b union all select a as b,count(1) as cc from TBL group by a ) z group by b {code} returns a 1 a 1 while set hive.optimize.correlation=true; if we change set hive.optimize.correlation=false; it returns correct results : a 2 The plan with correlation optimization : {code:sql} ABSTRACT SYNTAX TREE: (TOK_QUERY (TOK_FROM (TOK_SUBQUERY (TOK_UNION (TOK_QUERY (TOK_FROM (TOK_TABREF (TOK_TABNAME DB TBL))) (TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR (TOK_TABLE_OR_COL b)) (TOK_SELEXPR (TOK_FUNCTION count 1) cc)) (TOK_GROUPBY (TOK_TABLE_OR_COL b (TOK_QUERY (TOK_FROM (TOK_TABREF (TOK_TABNAME DB TBL))) (TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR (TOK_TABLE_OR_COL a) b) (TOK_SELEXPR (TOK_FUNCTION count 1) cc)) (TOK_GROUPBY (TOK_TABLE_OR_COL a) z)) (TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR (TOK_TABLE_OR_COL b)) (TOK_SELEXPR (TOK_FUNCTION sum (TOK_TABLE_OR_COL cc (TOK_GROUPBY (TOK_TABLE_OR_COL b STAGE DEPENDENCIES: Stage-1 is a root stage Stage-0 is a root stage STAGE PLANS: Stage: Stage-1 Map Reduce Alias - Map Operator Tree: 
null-subquery1:z-subquery1:TBL TableScan alias: TBL Select Operator expressions: expr: b type: string outputColumnNames: b Group By Operator aggregations: expr: count(1) bucketGroup: false keys: expr: b type: string mode: hash outputColumnNames: _col0, _col1 Reduce Output Operator key expressions: expr: _col0 type: string sort order: + Map-reduce partition columns: expr: _col0 type: string tag: 0 value expressions: expr: _col1 type: bigint null-subquery2:z-subquery2:TBL TableScan alias: TBL Select Operator expressions: expr: a type: string outputColumnNames: a Group By Operator aggregations: expr: count(1) bucketGroup: false keys: expr: a type: string mode: hash outputColumnNames: _col0, _col1 Reduce Output Operator key expressions: expr: _col0 type: string sort order: + Map-reduce partition columns: expr: _col0 type: string tag: 1 value expressions: expr: _col1 type: bigint Reduce Operator Tree: Demux Operator Group By Operator aggregations: expr: count(VALUE._col0) bucketGroup: false keys: expr: KEY._col0 type: string mode: mergepartial outputColumnNames: _col0, _col1 Select Operator expressions: expr: _col0 type: string expr: _col1 type: bigint outputColumnNames: _col0, _col1 Union Select Operator
[jira] [Updated] (HIVE-10673) Dynamically partitioned hash join for Tez
[ https://issues.apache.org/jira/browse/HIVE-10673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Dere updated HIVE-10673: -- Issue Type: New Feature (was: Bug) Dynamically partitioned hash join for Tez - Key: HIVE-10673 URL: https://issues.apache.org/jira/browse/HIVE-10673 Project: Hive Issue Type: New Feature Components: Query Planning, Query Processor Reporter: Jason Dere Assignee: Jason Dere Attachments: HIVE-10673.1.patch, HIVE-10673.2.patch, HIVE-10673.3.patch, HIVE-10673.4.patch, HIVE-10673.5.patch Reduce-side hash join (using MapJoinOperator), where the Tez inputs to the reducer are unsorted. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10673) Dynamically partitioned hash join for Tez
[ https://issues.apache.org/jira/browse/HIVE-10673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14606172#comment-14606172 ] Gopal V commented on HIVE-10673: [~xuefuz]: this is a re-use of the custom Tez VertexManager from last year's Hadoop Summit talk, extending it to reducers: http://www.slideshare.net/Hadoop_Summit/w-235phall1pandey/13 Dynamically partitioned hash join for Tez - Key: HIVE-10673 URL: https://issues.apache.org/jira/browse/HIVE-10673 Project: Hive Issue Type: New Feature Components: Query Planning, Query Processor Reporter: Jason Dere Assignee: Jason Dere Attachments: HIVE-10673.1.patch, HIVE-10673.2.patch, HIVE-10673.3.patch, HIVE-10673.4.patch, HIVE-10673.5.patch Reduce-side hash join (using MapJoinOperator), where the Tez inputs to the reducer are unsorted. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10673) Dynamically partitioned hash join for Tez
[ https://issues.apache.org/jira/browse/HIVE-10673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14606189#comment-14606189 ] Xuefu Zhang commented on HIVE-10673: Thanks, guys. I asked the question mainly because the title sounds like a feature while the JIRA was originally marked as a bug, and the description could read as either. It would be nice if the description provided more details so that people with a general interest could understand. Things such as a problem/feature description and a proposed solution would definitely be helpful. Dynamically partitioned hash join for Tez - Key: HIVE-10673 URL: https://issues.apache.org/jira/browse/HIVE-10673 Project: Hive Issue Type: New Feature Components: Query Planning, Query Processor Reporter: Jason Dere Assignee: Jason Dere Attachments: HIVE-10673.1.patch, HIVE-10673.2.patch, HIVE-10673.3.patch, HIVE-10673.4.patch, HIVE-10673.5.patch Reduce-side hash join (using MapJoinOperator), where the Tez inputs to the reducer are unsorted. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11140) auto compute PROJ_HOME in hcatalog/src/test/e2e/templeton/deployers/env.sh
[ https://issues.apache.org/jira/browse/HIVE-11140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-11140: -- Attachment: HIVE-11140.patch auto compute PROJ_HOME in hcatalog/src/test/e2e/templeton/deployers/env.sh -- Key: HIVE-11140 URL: https://issues.apache.org/jira/browse/HIVE-11140 Project: Hive Issue Type: Bug Components: Tests, WebHCat Affects Versions: 1.0.0 Reporter: Eugene Koifman Assignee: Eugene Koifman Attachments: HIVE-11140.patch It's currently set as:
{noformat}
if [ -z ${PROJ_HOME} ]; then
  export PROJ_HOME=/Users/${USER}/dev/hive
fi
{noformat}
but it always points to the project root, so it can be {{export PROJ_HOME=../../../../../..}} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11123) Fix how to confirm the RDBMS product name at Metastore.
[ https://issues.apache.org/jira/browse/HIVE-11123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14606018#comment-14606018 ] Hive QA commented on HIVE-11123: {color:green}Overall{color}: +1 all checks pass Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12742520/HIVE-11123.2.patch {color:green}SUCCESS:{color} +1 9034 tests passed Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4431/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4431/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4431/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase {noformat} This message is automatically generated. ATTACHMENT ID: 12742520 - PreCommit-HIVE-TRUNK-Build Fix how to confirm the RDBMS product name at Metastore. --- Key: HIVE-11123 URL: https://issues.apache.org/jira/browse/HIVE-11123 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 1.2.0 Environment: PostgreSQL Reporter: Shinichi Yamashita Assignee: Shinichi Yamashita Attachments: HIVE-11123.1.patch, HIVE-11123.2.patch I use PostgreSQL to Hive Metastore. And I saw the following message at PostgreSQL log. 
{code}
2015-06-26 10:58:15.488 JST ERROR: syntax error at or near "@@" at character 5
2015-06-26 10:58:15.488 JST STATEMENT: SET @@session.sql_mode=ANSI_QUOTES
2015-06-26 10:58:15.489 JST ERROR: relation "v$instance" does not exist at character 21
2015-06-26 10:58:15.489 JST STATEMENT: SELECT version FROM v$instance
2015-06-26 10:58:15.490 JST ERROR: column "version" does not exist at character 10
2015-06-26 10:58:15.490 JST STATEMENT: SELECT @@version
{code}
When Hive CLI or Beeline embedded mode is run, these messages are output to the PostgreSQL log. These queries are issued from MetaStoreDirectSql#determineDbType. If we use MetaStoreDirectSql#getProductName instead, we do not need to issue these queries. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
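The fix direction described above, deciding the backend from the JDBC product name instead of probing with vendor-specific SQL, can be sketched as follows. The method name and return tags here are illustrative, not Hive's actual constants; in real code the input string would come from the standard JDBC call java.sql.DatabaseMetaData#getDatabaseProductName().

```java
public class DbTypeSniffer {
    // Hypothetical mapping from a JDBC product name to a backend tag.
    // Product-name substrings are the only probe needed; no SQL is sent,
    // so nothing vendor-specific ends up in the backend's error log.
    public static String dbTypeFromProductName(String productName) {
        String name = productName == null ? "" : productName.toLowerCase();
        if (name.contains("postgresql")) return "POSTGRES";
        if (name.contains("mysql")) return "MYSQL";
        if (name.contains("oracle")) return "ORACLE";
        if (name.contains("microsoft sql server")) return "MSSQL";
        if (name.contains("derby")) return "DERBY";
        return "OTHER";
    }
}
```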
[jira] [Commented] (HIVE-10895) ObjectStore does not close Query objects in some calls, causing a potential leak in some metastore db resources
[ https://issues.apache.org/jira/browse/HIVE-10895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14606099#comment-14606099 ] Thejas M Nair commented on HIVE-10895: -- [~aihuaxu] Are those users also seeing the failures when Oracle is used as the metastore database ? In the internal testing at Hortonworks, we have seen it only with Oracle. This happens in our concurrency test suite, where many queries are hitting HS2 in parallel. ObjectStore does not close Query objects in some calls, causing a potential leak in some metastore db resources --- Key: HIVE-10895 URL: https://issues.apache.org/jira/browse/HIVE-10895 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 0.14.0, 1.0.0, 1.2.0, 1.1.0, 0.13 Reporter: Takahiko Saito Assignee: Aihua Xu Attachments: HIVE-10895.1.patch, HIVE-10895.2.patch, HIVE-10895.3.patch During testing, we've noticed Oracle db running out of cursors. Might be related to this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10895) ObjectStore does not close Query objects in some calls, causing a potential leak in some metastore db resources
[ https://issues.apache.org/jira/browse/HIVE-10895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14606129#comment-14606129 ] Aihua Xu commented on HIVE-10895: - [~thejas] Yes. Those users are all using Oracle as the metastore database. ObjectStore does not close Query objects in some calls, causing a potential leak in some metastore db resources --- Key: HIVE-10895 URL: https://issues.apache.org/jira/browse/HIVE-10895 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 0.14.0, 1.0.0, 1.2.0, 1.1.0, 0.13 Reporter: Takahiko Saito Assignee: Aihua Xu Attachments: HIVE-10895.1.patch, HIVE-10895.2.patch, HIVE-10895.3.patch During testing, we've noticed Oracle db running out of cursors. Might be related to this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10673) Dynamically partitioned hash join for Tez
[ https://issues.apache.org/jira/browse/HIVE-10673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14606142#comment-14606142 ] Jason Dere commented on HIVE-10673: --- [~mmokhtar] or [~gopalv] can probably give more detail, but they found that during a shuffle join a large amount of the CPU/IO was spent sorting. While this does not work for MR, for other execution engines (such as Tez), it is possible to create a reduce-side join that uses unsorted inputs, in order to eliminate the sorting. We use the hash join algorithm to perform the join in the reducer, so this requires the small tables in the join to fit in the hash table. Testing with this patch, [~mmokhtar] found some decent time savings. Dynamically partitioned hash join for Tez - Key: HIVE-10673 URL: https://issues.apache.org/jira/browse/HIVE-10673 Project: Hive Issue Type: Bug Components: Query Planning, Query Processor Reporter: Jason Dere Assignee: Jason Dere Attachments: HIVE-10673.1.patch, HIVE-10673.2.patch, HIVE-10673.3.patch, HIVE-10673.4.patch, HIVE-10673.5.patch Reduce-side hash join (using MapJoinOperator), where the Tez inputs to the reducer are unsorted. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
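The reduce-side strategy described in the comment above can be sketched in a few lines: build a hash table from the small input, then probe it while streaming the (unsorted) big input; removing the sort on both inputs is the entire point. This is an illustrative sketch, not Hive's MapJoinOperator.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ReduceSideHashJoin {
    // Sketch of an inner hash join over UNSORTED inputs: build on the small
    // side (which must fit in memory), stream and probe with the big side.
    // Rows are {key, value} pairs; output rows are "key,bigValue,smallValue".
    public static List<String> join(List<String[]> small, List<String[]> big) {
        Map<String, List<String>> built = new HashMap<>(); // key -> small-side values
        for (String[] row : small) {
            built.computeIfAbsent(row[0], k -> new ArrayList<>()).add(row[1]);
        }
        List<String> out = new ArrayList<>();
        for (String[] row : big) {                          // no sort needed here
            List<String> matches = built.get(row[0]);
            if (matches != null) {
                for (String v : matches) out.add(row[0] + "," + row[1] + "," + v);
            }
        }
        return out;
    }
}
```

The memory constraint mentioned in the comment falls out directly: only the build side is materialized, so only the small tables must fit in the hash table.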
[jira] [Updated] (HIVE-11141) Improve RuleRegExp when the Expression node stack gets huge
[ https://issues.apache.org/jira/browse/HIVE-11141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hari Sankar Sivarama Subramaniyan updated HIVE-11141: - Attachment: HIVE-11141.1.patch Improve RuleRegExp when the Expression node stack gets huge --- Key: HIVE-11141 URL: https://issues.apache.org/jira/browse/HIVE-11141 Project: Hive Issue Type: Bug Reporter: Hari Sankar Sivarama Subramaniyan Assignee: Hari Sankar Sivarama Subramaniyan Attachments: HIVE-11141.1.patch, SQLQuery10.sql.mssql, createtable.rtf More and more complex workloads are being migrated to Hive from SQL Server, Teradata, etc., and occasionally Hive gets bottlenecked on generating plans for large queries; in the majority of cases, time is spent fetching metadata, partitions, and in other optimizer transformation related rules. I have attached the query for the test case, which needs to be tested after we set up the database as shown below.
{code}
create database dataset_3;
use database dataset_3;
{code}
createtable.rtf - create table command
SQLQuery10.sql.mssql - explain query
It seems that the most problematic part of the code, as the stack can get arbitrarily long, is in RuleRegExp.java:
{code}
@Override
public int cost(Stack<Node> stack) throws SemanticException {
  int numElems = (stack != null ? stack.size() : 0);
  String name = "";
  for (int pos = numElems - 1; pos >= 0; pos--) {
    name = stack.get(pos).getName() + "%" + name;
    Matcher m = pattern.matcher(name);
    if (m.matches()) {
      return m.group().length();
    }
  }
  return -1;
}
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11123) Fix how to confirm the RDBMS product name at Metastore.
[ https://issues.apache.org/jira/browse/HIVE-11123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14606544#comment-14606544 ] Sergey Shelukhin commented on HIVE-11123: - +1. Small nit: the null check can be done once. Fix how to confirm the RDBMS product name at Metastore. --- Key: HIVE-11123 URL: https://issues.apache.org/jira/browse/HIVE-11123 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 1.2.0 Environment: PostgreSQL Reporter: Shinichi Yamashita Assignee: Shinichi Yamashita Attachments: HIVE-11123.1.patch, HIVE-11123.2.patch I use PostgreSQL for the Hive Metastore, and I saw the following messages in the PostgreSQL log.
{code}
2015-06-26 10:58:15.488 JST ERROR: syntax error at or near "@@" at character 5
2015-06-26 10:58:15.488 JST STATEMENT: SET @@session.sql_mode=ANSI_QUOTES
2015-06-26 10:58:15.489 JST ERROR: relation "v$instance" does not exist at character 21
2015-06-26 10:58:15.489 JST STATEMENT: SELECT version FROM v$instance
2015-06-26 10:58:15.490 JST ERROR: column "version" does not exist at character 10
2015-06-26 10:58:15.490 JST STATEMENT: SELECT @@version
{code}
When Hive CLI or Beeline embedded mode is run, these messages are output to the PostgreSQL log. These queries are issued from MetaStoreDirectSql#determineDbType. If we use MetaStoreDirectSql#getProductName instead, we do not need to issue these queries. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11141) Improve RuleRegExp when the Expression node stack gets huge
[ https://issues.apache.org/jira/browse/HIVE-11141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hari Sankar Sivarama Subramaniyan updated HIVE-11141: - Attachment: HIVE-11141.2.patch Improve RuleRegExp when the Expression node stack gets huge --- Key: HIVE-11141 URL: https://issues.apache.org/jira/browse/HIVE-11141 Project: Hive Issue Type: Bug Reporter: Hari Sankar Sivarama Subramaniyan Assignee: Hari Sankar Sivarama Subramaniyan Attachments: HIVE-11141.1.patch, HIVE-11141.2.patch, SQLQuery10.sql.mssql, createtable.rtf More and more complex workloads are being migrated to Hive from SQL Server, Teradata, etc., and occasionally Hive gets bottlenecked on generating plans for large queries; in the majority of cases, time is spent fetching metadata, partitions, and in other optimizer transformation related rules. I have attached the query for the test case, which needs to be tested after we set up the database as shown below.
{code}
create database dataset_3;
use database dataset_3;
{code}
createtable.rtf - create table command
SQLQuery10.sql.mssql - explain query
It seems that the most problematic part of the code, as the stack can get arbitrarily long, is in RuleRegExp.java:
{code}
@Override
public int cost(Stack<Node> stack) throws SemanticException {
  int numElems = (stack != null ? stack.size() : 0);
  String name = "";
  for (int pos = numElems - 1; pos >= 0; pos--) {
    name = stack.get(pos).getName() + "%" + name;
    Matcher m = pattern.matcher(name);
    if (m.matches()) {
      return m.group().length();
    }
  }
  return -1;
}
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
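To see why the quoted cost() is expensive: at every stack depth it rebuilds the concatenated name string and re-runs the regex, so a stack of depth n incurs O(n^2) string-building work plus up to n regex matches. One illustrative direction (a sketch under the assumption that a rule's pattern is a plain sequence of operator names, not necessarily the committed patch): compare the names directly against the top of the stack, with no string concatenation or regex at all.

```java
import java.util.List;

public class FixedPatternRule {
    // Sketch: match a fixed sequence of operator names against the top of
    // the stack in O(pattern length). Mirrors RuleRegExp.cost() semantics:
    // returns the length of the matched "Name%Name%..." string, or -1.
    public static int cost(List<String> patternNames, List<String> stackNames) {
        int n = patternNames.size();
        if (stackNames.size() < n) return -1;
        int matchedChars = 0;
        for (int i = 0; i < n; i++) {
            String expected = patternNames.get(n - 1 - i);             // pattern, right to left
            String actual = stackNames.get(stackNames.size() - 1 - i); // stack, top down
            if (!expected.equals(actual)) return -1;
            matchedChars += actual.length() + 1; // +1 for the '%' separator
        }
        return matchedChars;
    }
}
```

The cost is now proportional to the pattern length rather than the stack depth, which is what matters when the expression node stack gets huge.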
[jira] [Commented] (HIVE-10141) count(distinct) not supported in Windowing function
[ https://issues.apache.org/jira/browse/HIVE-10141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14606470#comment-14606470 ] Yin Huai commented on HIVE-10141: - Looking at https://github.com/apache/hive/blob/release-1.2.1/ql/src/java/org/apache/hadoop/hive/ql/parse/IdentifiersParser.g#L195-L198. Seems window spec is dropped. count(distinct) not supported in Windowing function --- Key: HIVE-10141 URL: https://issues.apache.org/jira/browse/HIVE-10141 Project: Hive Issue Type: Improvement Components: PTF-Windowing Affects Versions: 1.0.0 Reporter: Yi Zhang Priority: Critical Count(distinct) is a very important function for analysis. For example, unique visitors instead of total visitors. Currently it is missing in Windowing function. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11112) ISO-8859-1 text output has fragments of previous longer rows appended
[ https://issues.apache.org/jira/browse/HIVE-11112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14606565#comment-14606565 ] xiaowei wang commented on HIVE-11112: - OK, I will try to add a test case today for HIVE-10983. HIVE-10983 and HIVE-11112 cover different cases, but HIVE-10983 duplicates both. ISO-8859-1 text output has fragments of previous longer rows appended - Key: HIVE-11112 URL: https://issues.apache.org/jira/browse/HIVE-11112 Project: Hive Issue Type: Bug Components: Serializers/Deserializers Affects Versions: 1.2.0 Reporter: Yongzhi Chen Assignee: Yongzhi Chen Fix For: 1.3.0, 2.0.0 Attachments: HIVE-11112.1.patch If a LazySimpleSerDe table is created using ISO 8859-1 encoding, query results for a string column are incorrect for any row that was preceded by a row containing a longer string. Example steps to reproduce: 1. Create a table using ISO 8859-1 encoding: {code:sql}
CREATE TABLE person_lat1 (name STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES ('serialization.encoding'='ISO8859_1');
{code} 2. Copy an ISO-8859-1 encoded text file into the appropriate warehouse folder in HDFS. I'll attach an example file containing the following text: {noformat}
Müller,Thomas
Jørgensen,Jørgen
Peña,Andrés
Nåm,Fæk
{noformat} 3. Execute {{SELECT * FROM person_lat1}} Result - The following output appears: {noformat}
+-------------------+
| person_lat1.name  |
+-------------------+
| Müller,Thomas     |
| Jørgensen,Jørgen  |
| Peña,Andrésørgen  |
| Nåm,Fækdrésørgen  |
+-------------------+
{noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
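A minimal, self-contained sketch (not Hive's actual LazySimpleSerDe code) of the bug pattern that produces output like Peña,Andrésørgen: a reused byte buffer is refilled with a shorter row, but decoding uses the buffer's full length instead of the number of bytes just written, so the tail of the previous longer row leaks into the result.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class StaleBufferDemo {

  // BUG pattern: decode the entire reused buffer, ignoring how many
  // bytes the current row actually occupies.
  public static String decodeWholeBuffer(byte[] buffer, Charset cs) {
    return new String(buffer, cs);
  }

  // FIX pattern: decode only the bytes belonging to the current row.
  public static String decodeValidBytes(byte[] buffer, int length, Charset cs) {
    return new String(buffer, 0, length, cs);
  }

  public static void main(String[] args) {
    // A long row fills the buffer, then a shorter row overwrites only
    // its leading bytes -- the old tail stays behind.
    byte[] buffer = "Jørgensen,Jørgen".getBytes(StandardCharsets.ISO_8859_1); // 16 bytes
    byte[] shortRow = "Peña,Andrés".getBytes(StandardCharsets.ISO_8859_1);    // 11 bytes
    System.arraycopy(shortRow, 0, buffer, 0, shortRow.length);

    System.out.println(decodeWholeBuffer(buffer, StandardCharsets.ISO_8859_1)); // Peña,Andrésørgen
    System.out.println(decodeValidBytes(buffer, shortRow.length, StandardCharsets.ISO_8859_1)); // Peña,Andrés
  }
}
```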
[jira] [Commented] (HIVE-10673) Dynamically partitioned hash join for Tez
[ https://issues.apache.org/jira/browse/HIVE-10673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14606278#comment-14606278 ] Damien Carol commented on HIVE-10673: - Dynamically partitioned hash join for Tez - Key: HIVE-10673 URL: https://issues.apache.org/jira/browse/HIVE-10673 Project: Hive Issue Type: New Feature Components: Query Planning, Query Processor Reporter: Jason Dere Assignee: Jason Dere Attachments: HIVE-10673.1.patch, HIVE-10673.2.patch, HIVE-10673.3.patch, HIVE-10673.4.patch, HIVE-10673.5.patch Reduce-side hash join (using MapJoinOperator), where the Tez inputs to the reducer are unsorted. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11030) Enhance storage layer to create one delta file per write
[ https://issues.apache.org/jira/browse/HIVE-11030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14606294#comment-14606294 ] Alan Gates commented on HIVE-11030: --- AcidUtils.serializeDeltas and AcidUtils.deserializeDeltas: You changed these to work in the framework of deltas being passed as a list of longs. But this causes double statting of the file system, because now OrcInputFormat.FileGenerator calls AcidUtils.serializeDeltas, which has to figure out all the deltas and then forget about the statementIds; then, when it comes back around in OrcInputFormat.getReader and calls AcidUtils.deserializeDeltas, it has to go back and re-stat the file system to find all the statement ids. Instead you should change de/serializeDeltas to pass a triple (maxtxn, mintxn, stmt). Or, if you prefer to extend the existing hack, it can pass a list of longs but use 3 slots per delta instead of 2. This avoids loss of info in serialize that has to be rediscovered in deserialize. In AcidUtils: {code}
private static ParsedDelta parseDelta(FileStatus path) {
  ParsedDelta p = parsedDelta(path.getPath());
  return new ParsedDelta(p.getMinTransaction(), p.getMaxTransaction(), path, p.statementId);
}
{code} I don't understand this code. Why get a ParsedDelta and turn around and create a new one? In parseDelta, would it be better to split the string on '_' rather than call indexOf twice? In OrcRawRecordMerger, in the constructor (line 489 in your patch) you added a call to AcidUtils.parsedDeltas. This looks like another case where, if the statement id was being properly preserved, we would not need to again parse the file name. OrcRecordUpdater, end of the constructor (line 265 in your patch): you're introducing a file system stat for a sanity check. That doesn't seem worth it. 
Enhance storage layer to create one delta file per write Key: HIVE-11030 URL: https://issues.apache.org/jira/browse/HIVE-11030 Project: Hive Issue Type: Sub-task Components: Transactions Affects Versions: 1.2.0 Reporter: Eugene Koifman Assignee: Eugene Koifman Attachments: HIVE-11030.2.patch, HIVE-11030.3.patch Currently each txn using ACID insert/update/delete will generate a delta directory like delta_100_101. In order to support multi-statement transactions we must generate one delta per operation within the transaction so the deltas would be named like delta_100_101_0001, etc. Support for MERGE (HIVE-10924) would need the same. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
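A sketch (illustrative names, not Hive's actual AcidUtils API) of the proposed on-disk naming from this issue and of the reviewer's suggestion to serialize each delta as three longs (min txn, max txn, statement id), so the statement id survives the serialize/deserialize round trip instead of being rediscovered by re-statting the file system:

```java
import java.util.ArrayList;
import java.util.List;

public class DeltaName {

  // delta_100_101 (pre-patch) vs. delta_100_101_0001 (one delta per statement).
  public static String format(long minTxn, long maxTxn, int stmtId) {
    return String.format("delta_%d_%d_%04d", minTxn, maxTxn, stmtId);
  }

  // Split on '_' (as suggested in the review). Returns {minTxn, maxTxn, stmtId};
  // stmtId is -1 for pre-patch names without a statement id.
  public static long[] parse(String name) {
    String[] parts = name.split("_");
    long stmt = parts.length > 3 ? Long.parseLong(parts[3]) : -1;
    return new long[]{Long.parseLong(parts[1]), Long.parseLong(parts[2]), stmt};
  }

  // Reviewer's suggestion: 3 slots per delta instead of 2, so deserialize
  // never has to re-stat the file system to find statement ids.
  public static List<Long> serializeDeltas(List<long[]> deltas) {
    List<Long> out = new ArrayList<>();
    for (long[] d : deltas) {
      out.add(d[0]);
      out.add(d[1]);
      out.add(d[2]);
    }
    return out;
  }
}
```

(The zero-padding of the statement id is illustrative; the actual width used by Hive may differ.)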
[jira] [Commented] (HIVE-11140) auto compute PROJ_HOME in hcatalog/src/test/e2e/templeton/deployers/env.sh
[ https://issues.apache.org/jira/browse/HIVE-11140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14606417#comment-14606417 ] Eugene Koifman commented on HIVE-11140: --- failure not related. [~thejas] could you review please? auto compute PROJ_HOME in hcatalog/src/test/e2e/templeton/deployers/env.sh -- Key: HIVE-11140 URL: https://issues.apache.org/jira/browse/HIVE-11140 Project: Hive Issue Type: Bug Components: Tests, WebHCat Affects Versions: 1.0.0 Reporter: Eugene Koifman Assignee: Eugene Koifman Attachments: HIVE-11140.patch, HIVE-11140.patch it's currently set as {noformat} if [ -z ${PROJ_HOME} ]; then export PROJ_HOME=/Users/${USER}/dev/hive fi {noformat} but it always points to project root so can be {{export PROJ_HOME=../../../../../..}} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
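A hedged sketch of what the auto-compute could look like: since env.sh lives at hcatalog/src/test/e2e/templeton/deployers, the project root is six directories above the script itself (the path layout is taken from the issue; the exact variable handling here is illustrative, not the committed patch).

```shell
# Default PROJ_HOME to the project root, computed from this script's location
# rather than hard-coding /Users/${USER}/dev/hive.
if [ -z "${PROJ_HOME}" ]; then
  # deployers -> templeton -> e2e -> test -> src -> hcatalog -> project root
  SCRIPT_DIR=$(cd "$(dirname "$0")" && pwd)
  PROJ_HOME=$(cd "${SCRIPT_DIR}/../../../../../.." && pwd)
  export PROJ_HOME
fi
```

Resolving through pwd yields an absolute path, so the value keeps working even if later commands cd elsewhere, which a relative ../../../../../.. would not.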
[jira] [Updated] (HIVE-11009) LLAP: fix TestMiniTezCliDriverLocal on the branch
[ https://issues.apache.org/jira/browse/HIVE-11009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-11009: Assignee: Vikram Dixit K (was: Gunther Hagleitner) LLAP: fix TestMiniTezCliDriverLocal on the branch - Key: HIVE-11009 URL: https://issues.apache.org/jira/browse/HIVE-11009 Project: Hive Issue Type: Sub-task Reporter: Sergey Shelukhin Assignee: Vikram Dixit K See HIVE-10997. All the queries of this test fail on the branch with the same initialization error -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9823) Load spark-defaults.conf from classpath [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14606315#comment-14606315 ] Xuefu Zhang commented on HIVE-9823: --- The document says that for Spark-related properties, you can add them to a file called spark-defaults.conf and add the file to the classpath. The JIRA here says that Hive will load this file from the classpath. Thus, you need both. Load spark-defaults.conf from classpath [Spark Branch] -- Key: HIVE-9823 URL: https://issues.apache.org/jira/browse/HIVE-9823 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Brock Noland Assignee: Brock Noland Fix For: 1.2.0 Attachments: HIVE-9823.1-spark.patch, HIVE-9823.2-spark.patch, HIVE-9823.3-spark.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
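A sketch of the mechanism this issue describes (class and method names here are illustrative, not Hive's actual implementation): spark-defaults.conf is a standard Java properties file with whitespace-separated keys and values, so it can be located via the classpath and parsed with java.util.Properties.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Properties;

public class SparkConfLoader {

  // Parse an already-opened spark-defaults.conf stream.
  // Properties accepts both "key value" and "key=value" lines.
  public static Properties load(InputStream in) {
    Properties props = new Properties();
    try {
      props.load(in);
    } catch (IOException e) {
      throw new RuntimeException("failed to read spark-defaults.conf", e);
    }
    return props;
  }

  // Look the file up on the classpath; a missing file simply yields no overrides.
  public static Properties loadFromClasspath(ClassLoader cl) {
    InputStream in = cl.getResourceAsStream("spark-defaults.conf");
    if (in == null) {
      return new Properties(); // not on the classpath: nothing to load
    }
    try {
      return load(in);
    } finally {
      try {
        in.close();
      } catch (IOException ignored) {
      }
    }
  }
}
```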
[jira] [Updated] (HIVE-11061) Table renames not propagated to partition table in HBase metastore
[ https://issues.apache.org/jira/browse/HIVE-11061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated HIVE-11061: -- Attachment: HIVE-11061.patch This patch does the work in both the tbls and partitions tables to figure out if the table name has changed, and if so delete the existing rows and create new ones. Table renames not propagated to partition table in HBase metastore -- Key: HIVE-11061 URL: https://issues.apache.org/jira/browse/HIVE-11061 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: hbase-metastore-branch Reporter: Alan Gates Assignee: Alan Gates Fix For: hbase-metastore-branch Attachments: HIVE-11061.patch When a table is renamed in the HBase metastore it needs to update relevant rows in the partition table not only in the tbls table. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11139) Emit more lineage information
[ https://issues.apache.org/jira/browse/HIVE-11139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HIVE-11139: --- Attachment: HIVE-11139.1.patch Attached patch v1, which is on RB: https://reviews.apache.org/r/36025/ Emit more lineage information - Key: HIVE-11139 URL: https://issues.apache.org/jira/browse/HIVE-11139 Project: Hive Issue Type: Improvement Reporter: Jimmy Xiang Assignee: Jimmy Xiang Fix For: 2.0.0 Attachments: HIVE-11139.1.patch HIVE-1131 emits some column lineage info, but it doesn't support INSERT or CTAS statements, and it doesn't emit predicate information either. We can enhance and use the dependency information created in HIVE-1131 to generate more complete lineage info. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11141) Improve RuleRegExp when the Expression node stack gets huge
[ https://issues.apache.org/jira/browse/HIVE-11141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hari Sankar Sivarama Subramaniyan updated HIVE-11141: - Attachment: (was: HIVE-11141.1.patch) Improve RuleRegExp when the Expression node stack gets huge --- Key: HIVE-11141 URL: https://issues.apache.org/jira/browse/HIVE-11141 Project: Hive Issue Type: Bug Reporter: Hari Sankar Sivarama Subramaniyan Assignee: Hari Sankar Sivarama Subramaniyan Attachments: HIVE-11141.1.patch, SQLQuery10.sql.mssql, createtable.rtf More and more complex workloads are being migrated to Hive from SQL Server, Teradata, etc., and occasionally Hive gets bottlenecked on generating plans for large queries; in the majority of cases, time is spent fetching metadata and partitions and running optimizer transformation rules. I have attached the query for the test case, which needs to be run after setting up the database as shown below. {code} create database dataset_3; use dataset_3; {code} createtable.rtf - create table command SQLQuery10.sql.mssql - explain query The most problematic part of the code, as the stack gets arbitrarily long, is in RuleRegExp.java: {code}
@Override
public int cost(Stack<Node> stack) throws SemanticException {
  int numElems = (stack != null ? stack.size() : 0);
  String name = "";
  for (int pos = numElems - 1; pos >= 0; pos--) {
    name = stack.get(pos).getName() + "%" + name;
    Matcher m = pattern.matcher(name);
    if (m.matches()) {
      return m.group().length();
    }
  }
  return -1;
}
{code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11141) Improve RuleRegExp when the Expression node stack gets huge
[ https://issues.apache.org/jira/browse/HIVE-11141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hari Sankar Sivarama Subramaniyan updated HIVE-11141: - Description: Hive occasionally gets bottlenecked on generating plans for large queries; in the majority of cases, time is spent fetching metadata and partitions and running optimizer transformation rules. I have attached the query for the test case, which needs to be run after setting up the database as shown below. {code} create database dataset_3; use dataset_3; {code} createtable.rtf - create table command SQLQuery10.sql.mssql - explain query The most problematic part of the code, as the stack gets arbitrarily long, is in RuleRegExp.java: {code}
@Override
public int cost(Stack<Node> stack) throws SemanticException {
  int numElems = (stack != null ? stack.size() : 0);
  String name = "";
  for (int pos = numElems - 1; pos >= 0; pos--) {
    name = stack.get(pos).getName() + "%" + name;
    Matcher m = pattern.matcher(name);
    if (m.matches()) {
      return m.group().length();
    }
  }
  return -1;
}
{code} was: More and more complex workloads are being migrated to Hive from SQL Server, Teradata, etc., and occasionally Hive gets bottlenecked on generating plans for large queries; in the majority of cases, time is spent fetching metadata and partitions and running optimizer transformation rules. I have attached the query for the test case, which needs to be run after setting up the database as shown below. {code} create database dataset_3; use dataset_3; {code} createtable.rtf - create table command SQLQuery10.sql.mssql - explain query The most problematic part of the code, as the stack gets arbitrarily long, is in RuleRegExp.java: {code}
@Override
public int cost(Stack<Node> stack) throws SemanticException {
  int numElems = (stack != null ? stack.size() : 0);
  String name = "";
  for (int pos = numElems - 1; pos >= 0; pos--) {
    name = stack.get(pos).getName() + "%" + name;
    Matcher m = pattern.matcher(name);
    if (m.matches()) {
      return m.group().length();
    }
  }
  return -1;
}
{code} Improve RuleRegExp when the Expression node stack gets huge --- Key: HIVE-11141 URL: https://issues.apache.org/jira/browse/HIVE-11141 Project: Hive Issue Type: Bug Reporter: Hari Sankar Sivarama Subramaniyan Assignee: Hari Sankar Sivarama Subramaniyan Attachments: HIVE-11141.1.patch, HIVE-11141.2.patch, SQLQuery10.sql.mssql, createtable.rtf Hive occasionally gets bottlenecked on generating plans for large queries; in the majority of cases, time is spent fetching metadata and partitions and running optimizer transformation rules. I have attached the query for the test case, which needs to be run after setting up the database as shown below. {code} create database dataset_3; use dataset_3; {code} createtable.rtf - create table command SQLQuery10.sql.mssql - explain query The most problematic part of the code, as the stack gets arbitrarily long, is in RuleRegExp.java: {code}
@Override
public int cost(Stack<Node> stack) throws SemanticException {
  int numElems = (stack != null ? stack.size() : 0);
  String name = "";
  for (int pos = numElems - 1; pos >= 0; pos--) {
    name = stack.get(pos).getName() + "%" + name;
    Matcher m = pattern.matcher(name);
    if (m.matches()) {
      return m.group().length();
    }
  }
  return -1;
}
{code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9557) create UDF to measure strings similarity using Cosine Similarity algo
[ https://issues.apache.org/jira/browse/HIVE-9557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nishant Kelkar updated HIVE-9557: - Attachment: HIVE-9557.3.patch Attaching revision #3 of the patch to remove the hidden dependency on commons-math3's FastMath (commons-math3 comes in via the org.apache.spark:spark-core_2.10 dependency). Using the standard library Math instead. create UDF to measure strings similarity using Cosine Similarity algo - Key: HIVE-9557 URL: https://issues.apache.org/jira/browse/HIVE-9557 Project: Hive Issue Type: Improvement Components: UDF Reporter: Alexander Pivovarov Assignee: Nishant Kelkar Labels: CosineSimilarity, SimilarityMetric, UDF Attachments: HIVE-9557.1.patch, HIVE-9557.2.patch, HIVE-9557.3.patch, udf_cosine_similarity-v01.patch algo description: http://en.wikipedia.org/wiki/Cosine_similarity {code}
-- one word different, total 2 words
str_sim_cosine('Test String1', 'Test String2') = (2 - 1) / 2 = 0.5f
{code} reference implementation: https://github.com/Simmetrics/simmetrics/blob/master/src/uk/ac/shef/wit/simmetrics/similaritymetrics/CosineSimilarity.java -- This message was sent by Atlassian JIRA (v6.3.4#6332)
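A sketch of the metric itself (the Hive UDF wrapper is omitted; names here are illustrative): tokenize both strings on whitespace, build term-frequency vectors, and return dot(a,b) / (|a| * |b|). For the example in the issue, 'Test String1' and 'Test String2' share one of two tokens, giving a similarity of 0.5.

```java
import java.util.HashMap;
import java.util.Map;

public class CosineSimilarity {

  // Term-frequency vector over whitespace-separated tokens.
  private static Map<String, Integer> termFreq(String s) {
    Map<String, Integer> tf = new HashMap<>();
    for (String tok : s.split("\\s+")) {
      if (!tok.isEmpty()) {
        tf.merge(tok, 1, Integer::sum);
      }
    }
    return tf;
  }

  public static double similarity(String a, String b) {
    Map<String, Integer> ta = termFreq(a), tb = termFreq(b);
    double dot = 0, normA = 0, normB = 0;
    for (Map.Entry<String, Integer> e : ta.entrySet()) {
      dot += e.getValue() * tb.getOrDefault(e.getKey(), 0);
      normA += e.getValue() * e.getValue();
    }
    for (int v : tb.values()) {
      normB += v * v;
    }
    if (normA == 0 || normB == 0) {
      return 0.0; // an empty string has no direction; define similarity as 0
    }
    return dot / (Math.sqrt(normA) * Math.sqrt(normB));
  }

  public static void main(String[] args) {
    System.out.println(similarity("Test String1", "Test String2")); // ~0.5
  }
}
```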
[jira] [Commented] (HIVE-11055) HPL/SQL - Implementing Procedural SQL in Hive (PL/HQL Contribution)
[ https://issues.apache.org/jira/browse/HIVE-11055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14606523#comment-14606523 ] Alan Gates commented on HIVE-11055: --- I ran rat on this and all looks good except for a number of generated files: {code} !? /Users/gates/git/apache/hive/hplsql/src/main/java/org/apache/hive/hplsql/Hplsql.tokens !? /Users/gates/git/apache/hive/hplsql/src/main/java/org/apache/hive/hplsql/HplsqlBaseVisitor.java !? /Users/gates/git/apache/hive/hplsql/src/main/java/org/apache/hive/hplsql/HplsqlLexer.java !? /Users/gates/git/apache/hive/hplsql/src/main/java/org/apache/hive/hplsql/HplsqlLexer.tokens !? /Users/gates/git/apache/hive/hplsql/src/main/java/org/apache/hive/hplsql/HplsqlParser.java !? /Users/gates/git/apache/hive/hplsql/src/main/java/org/apache/hive/hplsql/HplsqlVisitor.java {code} Did you intend to check these in rather than have the build generate them? HPL/SQL - Implementing Procedural SQL in Hive (PL/HQL Contribution) --- Key: HIVE-11055 URL: https://issues.apache.org/jira/browse/HIVE-11055 Project: Hive Issue Type: Improvement Reporter: Dmitry Tolpeko Assignee: Dmitry Tolpeko Attachments: HIVE-11055.1.patch, HIVE-11055.2.patch, HIVE-11055.3.patch There is PL/HQL tool (www.plhql.org) that implements procedural SQL for Hive (actually any SQL-on-Hadoop implementation and any JDBC source). Alan Gates offered to contribute it to Hive under HPL/SQL name (org.apache.hive.hplsql package). This JIRA is to create a patch to contribute the PL/HQL code. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11068) Hive throws OOM in client side
[ https://issues.apache.org/jira/browse/HIVE-11068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14606598#comment-14606598 ] Sergey Shelukhin commented on HIVE-11068: - [~gopalv] [~prasanth_j] is that the cycles issue you were talking about? Hive throws OOM in client side -- Key: HIVE-11068 URL: https://issues.apache.org/jira/browse/HIVE-11068 Project: Hive Issue Type: Bug Reporter: Rajesh Balamohan Assignee: Prasanth Jayachandran Attachments: Yourkit_String.png, Yourkit_TablScanDesc.png, hive_cli_debug.log.gz Hive build: (Latest on Jun 21, commit 142426394cfdc8a1fea51f7642c63f43f36b0333). Query: Query 64 of TPC-DS (https://github.com/cartershanklin/hive-testbench/blob/master/sample-queries-tpcds/query64.sql) Hive throws the following OOM on the client side. {noformat}
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
	at java.lang.StringCoding$StringDecoder.decode(StringCoding.java:149)
	at java.lang.StringCoding.decode(StringCoding.java:193)
	at java.lang.String.<init>(String.java:414)
	at java.lang.String.<init>(String.java:479)
	at org.apache.hadoop.hive.ql.exec.Utilities.serializeExpression(Utilities.java:799)
	at org.apache.hadoop.hive.ql.plan.TableScanDesc.setFilterExpr(TableScanDesc.java:153)
	at org.apache.hadoop.hive.ql.ppd.OpProcFactory.pushFilterToStorageHandler(OpProcFactory.java:901)
	at org.apache.hadoop.hive.ql.ppd.OpProcFactory.createFilter(OpProcFactory.java:818)
	at org.apache.hadoop.hive.ql.ppd.OpProcFactory.createFilter(OpProcFactory.java:788)
	at org.apache.hadoop.hive.ql.ppd.OpProcFactory$TableScanPPD.process(OpProcFactory.java:388)
	at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
	at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:95)
	at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:79)
	at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.walk(DefaultGraphWalker.java:133)
	at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:110)
	at org.apache.hadoop.hive.ql.ppd.PredicatePushDown.transform(PredicatePushDown.java:135)
	at org.apache.hadoop.hive.ql.optimizer.Optimizer.optimize(Optimizer.java:192)
	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10171)
	at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:207)
	at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:227)
	at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:424)
	at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:308)
	at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1124)
	at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1172)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1061)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1051)
	at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:213)
	at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:165)
	at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:376)
	at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:311)
	at org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:409)
	at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:425)
{noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11140) auto compute PROJ_HOME in hcatalog/src/test/e2e/templeton/deployers/env.sh
[ https://issues.apache.org/jira/browse/HIVE-11140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14606467#comment-14606467 ] Thejas M Nair commented on HIVE-11140: -- +1 auto compute PROJ_HOME in hcatalog/src/test/e2e/templeton/deployers/env.sh -- Key: HIVE-11140 URL: https://issues.apache.org/jira/browse/HIVE-11140 Project: Hive Issue Type: Bug Components: Tests, WebHCat Affects Versions: 1.0.0 Reporter: Eugene Koifman Assignee: Eugene Koifman Attachments: HIVE-11140.patch, HIVE-11140.patch it's currently set as {noformat} if [ -z ${PROJ_HOME} ]; then export PROJ_HOME=/Users/${USER}/dev/hive fi {noformat} but it always points to project root so can be {{export PROJ_HOME=../../../../../..}} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11140) auto compute PROJ_HOME in hcatalog/src/test/e2e/templeton/deployers/env.sh
[ https://issues.apache.org/jira/browse/HIVE-11140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14606385#comment-14606385 ] Hive QA commented on HIVE-11140: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12742558/HIVE-11140.patch {color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 9034 tests executed *Failed tests:* {noformat} org.apache.hive.jdbc.TestMultiSessionsHS2WithLocalClusterSpark.testSparkQuery {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4432/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4432/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4432/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 1 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12742558 - PreCommit-HIVE-TRUNK-Build auto compute PROJ_HOME in hcatalog/src/test/e2e/templeton/deployers/env.sh -- Key: HIVE-11140 URL: https://issues.apache.org/jira/browse/HIVE-11140 Project: Hive Issue Type: Bug Components: Tests, WebHCat Affects Versions: 1.0.0 Reporter: Eugene Koifman Assignee: Eugene Koifman Attachments: HIVE-11140.patch, HIVE-11140.patch it's currently set as {noformat} if [ -z ${PROJ_HOME} ]; then export PROJ_HOME=/Users/${USER}/dev/hive fi {noformat} but it always points to project root so can be {{export PROJ_HOME=../../../../../..}} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11140) auto compute PROJ_HOME in hcatalog/src/test/e2e/templeton/deployers/env.sh
[ https://issues.apache.org/jira/browse/HIVE-11140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14606555#comment-14606555 ] Hive QA commented on HIVE-11140: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12742568/HIVE-11140.patch {color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 9034 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_schemeAuthority {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4433/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4433/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4433/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 1 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12742568 - PreCommit-HIVE-TRUNK-Build auto compute PROJ_HOME in hcatalog/src/test/e2e/templeton/deployers/env.sh -- Key: HIVE-11140 URL: https://issues.apache.org/jira/browse/HIVE-11140 Project: Hive Issue Type: Bug Components: Tests, WebHCat Affects Versions: 1.0.0 Reporter: Eugene Koifman Assignee: Eugene Koifman Attachments: HIVE-11140.patch, HIVE-11140.patch it's currently set as {noformat} if [ -z ${PROJ_HOME} ]; then export PROJ_HOME=/Users/${USER}/dev/hive fi {noformat} but it always points to project root so can be {{export PROJ_HOME=../../../../../..}} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11068) Hive throws OOM in client side
[ https://issues.apache.org/jira/browse/HIVE-11068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14606602#comment-14606602 ] Prasanth Jayachandran commented on HIVE-11068: -- Yes. It is. Hive throws OOM in client side -- Key: HIVE-11068 URL: https://issues.apache.org/jira/browse/HIVE-11068 Project: Hive Issue Type: Bug Reporter: Rajesh Balamohan Assignee: Prasanth Jayachandran Attachments: Yourkit_String.png, Yourkit_TablScanDesc.png, hive_cli_debug.log.gz Hive build: (Latest on Jun 21, commit 142426394cfdc8a1fea51f7642c63f43f36b0333). Query: Query 64 of TPC-DS (https://github.com/cartershanklin/hive-testbench/blob/master/sample-queries-tpcds/query64.sql) Hive throws the following OOM on the client side. {noformat}
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
	at java.lang.StringCoding$StringDecoder.decode(StringCoding.java:149)
	at java.lang.StringCoding.decode(StringCoding.java:193)
	at java.lang.String.<init>(String.java:414)
	at java.lang.String.<init>(String.java:479)
	at org.apache.hadoop.hive.ql.exec.Utilities.serializeExpression(Utilities.java:799)
	at org.apache.hadoop.hive.ql.plan.TableScanDesc.setFilterExpr(TableScanDesc.java:153)
	at org.apache.hadoop.hive.ql.ppd.OpProcFactory.pushFilterToStorageHandler(OpProcFactory.java:901)
	at org.apache.hadoop.hive.ql.ppd.OpProcFactory.createFilter(OpProcFactory.java:818)
	at org.apache.hadoop.hive.ql.ppd.OpProcFactory.createFilter(OpProcFactory.java:788)
	at org.apache.hadoop.hive.ql.ppd.OpProcFactory$TableScanPPD.process(OpProcFactory.java:388)
	at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
	at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:95)
	at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:79)
	at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.walk(DefaultGraphWalker.java:133)
	at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:110)
	at org.apache.hadoop.hive.ql.ppd.PredicatePushDown.transform(PredicatePushDown.java:135)
	at org.apache.hadoop.hive.ql.optimizer.Optimizer.optimize(Optimizer.java:192)
	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10171)
	at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:207)
	at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:227)
	at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:424)
	at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:308)
	at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1124)
	at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1172)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1061)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1051)
	at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:213)
	at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:165)
	at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:376)
	at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:311)
	at org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:409)
	at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:425)
{noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11143) Tests udf_from_utc_timestamp.q/udf_to_utc_timestamp.q do not work with updated Java timezone information
[ https://issues.apache.org/jira/browse/HIVE-11143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Dere updated HIVE-11143: -- Attachment: HIVE-11143.1.patch Attaching patch v1. This changes the year used in the tests from 2015 to 2012, before the time zone changes. Tests udf_from_utc_timestamp.q/udf_to_utc_timestamp.q do not work with updated Java timezone information Key: HIVE-11143 URL: https://issues.apache.org/jira/browse/HIVE-11143 Project: Hive Issue Type: Bug Components: Tests Reporter: Jason Dere Assignee: Jason Dere Attachments: HIVE-11143.1.patch It looks like there were recent changes to the Europe/Moscow time zone in 2014. When udf_from_utc_timestamp.q/udf_to_utc_timestamp.q are run with more recent versions of JDK or with an updated time zone database, the tests fail. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
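The fix makes sense because Europe/Moscow changed offsets recently: with up-to-date tzdata it is UTC+4 throughout 2012 but UTC+3 after October 2014, so expected test outputs computed for 2015 timestamps differ between old and new JDK timezone databases, while 2012 timestamps are stable. A small java.time illustration (not the test code itself; offsets shown assume current tzdata):

```java
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;

public class MoscowOffset {

  // Offset of Europe/Moscow at the start of the given year, per the
  // JDK's bundled timezone database.
  public static ZoneOffset offsetAt(int year) {
    return ZoneId.of("Europe/Moscow").getRules()
        .getOffset(LocalDateTime.of(year, 1, 1, 0, 0));
  }

  public static void main(String[] args) {
    System.out.println(offsetAt(2012)); // +04:00 with current tzdata
    System.out.println(offsetAt(2015)); // +03:00 with current tzdata
  }
}
```

Pinning the test year to 2012 keeps the expected offset the same regardless of whether the JDK's tzdata predates the October 2014 change.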
[jira] [Updated] (HIVE-11141) Improve RuleRegExp when the Expression node stack gets huge
[ https://issues.apache.org/jira/browse/HIVE-11141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hari Sankar Sivarama Subramaniyan updated HIVE-11141: - Description: Hive occasionally gets bottlenecked on generating plans for large queries; in the majority of cases the time is spent fetching metadata and partitions and in other optimizer transformation related rules. I have attached the query for the test case, which needs to be tested after we set up the database as shown below.
{code}
create database dataset_3;
use dataset_3;
{code}
createtable.rtf - create table command
SQLQuery10.sql.mssql - explain query
The most problematic part of the code, as the stack gets arbitrarily long, seems to be in RuleRegExp.java:
{code}
@Override
public int cost(Stack<Node> stack) throws SemanticException {
  int numElems = (stack != null ? stack.size() : 0);
  String name = "";
  for (int pos = numElems - 1; pos >= 0; pos--) {
    name = stack.get(pos).getName() + "%" + name;
    Matcher m = pattern.matcher(name);
    if (m.matches()) {
      return m.group().length();
    }
  }
  return -1;
}
{code}
was: Hive occasionally gets bottlenecked on generating plans for large queries; in the majority of cases the time is spent fetching metadata and partitions and in other optimizer transformation related rules. I have attached the query for the test case, which needs to be tested after we set up the database as shown below.
{code}
create database dataset_3;
use dataset_3;
{code}
createtable.rtf - create table command
SQLQuery10.sql.mssql - explain query
The most problematic part of the code, as the stack gets arbitrarily long, seems to be in RuleRegExp.java:
{code}
@Override
public int cost(Stack<Node> stack) throws SemanticException {
  int numElems = (stack != null ? stack.size() : 0);
  String name = "";
  for (int pos = numElems - 1; pos >= 0; pos--) {
    name = stack.get(pos).getName() + "%" + name;
    Matcher m = pattern.matcher(name);
    if (m.matches()) {
      return m.group().length();
    }
  }
  return -1;
}
{code}
Improve RuleRegExp when the Expression node stack gets huge --- Key: HIVE-11141 URL: https://issues.apache.org/jira/browse/HIVE-11141 Project: Hive Issue Type: Bug Reporter: Hari Sankar Sivarama Subramaniyan Assignee: Hari Sankar Sivarama Subramaniyan Attachments: HIVE-11141.1.patch, HIVE-11141.2.patch, SQLQuery10.sql.mssql, createtable.rtf
Hive occasionally gets bottlenecked on generating plans for large queries; in the majority of cases the time is spent fetching metadata and partitions and in other optimizer transformation related rules. I have attached the query for the test case, which needs to be tested after we set up the database as shown below.
{code}
create database dataset_3;
use dataset_3;
{code}
createtable.rtf - create table command
SQLQuery10.sql.mssql - explain query
The most problematic part of the code, as the stack gets arbitrarily long, seems to be in RuleRegExp.java:
{code}
@Override
public int cost(Stack<Node> stack) throws SemanticException {
  int numElems = (stack != null ? stack.size() : 0);
  String name = "";
  for (int pos = numElems - 1; pos >= 0; pos--) {
    name = stack.get(pos).getName() + "%" + name;
    Matcher m = pattern.matcher(name);
    if (m.matches()) {
      return m.group().length();
    }
  }
  return -1;
}
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
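To see why this loop hurts on deep stacks: each iteration re-prepends to the joined name and re-runs the full regex, so an n-deep stack does O(n^2) character work per rule. Below is a hedged stand-alone sketch of one way to cheapen the common case — not the attached patch — using plain strings in place of Hive's Node stack: for rule patterns that are simple '%'-joined literals, a direct string comparison with an early exit replaces the Matcher entirely.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical stand-in for RuleRegExp.cost(); strings replace Node.getName(). */
public class RuleCostDemo {

    // Mirrors the original loop: rebuilds the joined name and re-matches every iteration.
    public static int costWithRegex(Pattern pattern, List<String> stack) {
        int numElems = (stack != null ? stack.size() : 0);
        String name = "";
        for (int pos = numElems - 1; pos >= 0; pos--) {
            name = stack.get(pos) + "%" + name;
            Matcher m = pattern.matcher(name);
            if (m.matches()) {
                return m.group().length();
            }
        }
        return -1;
    }

    // Fast path for literal patterns such as "TS%FIL%": the joined name only grows,
    // so once it is longer than the pattern no later iteration can possibly match.
    public static int costLiteral(String literalPattern, List<String> stack) {
        int numElems = (stack != null ? stack.size() : 0);
        StringBuilder name = new StringBuilder();
        for (int pos = numElems - 1; pos >= 0; pos--) {
            name.insert(0, stack.get(pos) + "%");
            if (name.length() > literalPattern.length()) {
                return -1;  // can never match again; bail out early
            }
            if (name.toString().equals(literalPattern)) {
                return name.length();
            }
        }
        return -1;
    }
}
```

For a pattern with no regex metacharacters the two methods agree, but the literal path never compiles or runs a Matcher and exits as soon as the joined name outgrows the pattern.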
[jira] [Commented] (HIVE-10410) Apparent race condition in HiveServer2 causing intermittent query failures
[ https://issues.apache.org/jira/browse/HIVE-10410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14606525#comment-14606525 ] Jimmy Xiang commented on HIVE-10410: HIVE-10956 fixed some HiveMetaStoreClient sync issues. It should help, in case it is a race to the HMS. Apparent race condition in HiveServer2 causing intermittent query failures -- Key: HIVE-10410 URL: https://issues.apache.org/jira/browse/HIVE-10410 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 0.13.1 Environment: CDH 5.3.3 CentOS 6.4 Reporter: Richard Williams Attachments: HIVE-10410.1.patch On our secure Hadoop cluster, queries submitted to HiveServer2 through JDBC occasionally trigger odd Thrift exceptions with messages such as "Read a negative frame size (-2147418110)!" or "out of sequence response" in HiveServer2's connections to the metastore. For certain metastore calls (for example, showDatabases), these Thrift exceptions are converted to MetaExceptions in HiveMetaStoreClient, which prevents RetryingMetaStoreClient from retrying these calls and thus causes the failure to bubble out to the JDBC client. Note that as far as we can tell, this issue appears to only affect queries that are submitted with the runAsync flag on TExecuteStatementReq set to true (which, in practice, seems to mean all JDBC queries), and it appears to only manifest when HiveServer2 is using the new HTTP transport mechanism. When both these conditions hold, we are able to fairly reliably reproduce the issue by spawning about 100 simple, concurrent Hive queries (we have been using {{show databases}}), two or three of which typically fail. However, when either of these conditions does not hold, we are no longer able to reproduce the issue. Some example stack traces from the HiveServer2 logs: {noformat} 2015-04-16 13:54:55,486 ERROR hive.log: Got exception: org.apache.thrift.transport.TTransportException Read a negative frame size (-2147418110)! 
org.apache.thrift.transport.TTransportException: Read a negative frame size (-2147418110)! at org.apache.thrift.transport.TSaslTransport.readFrame(TSaslTransport.java:435) at org.apache.thrift.transport.TSaslTransport.read(TSaslTransport.java:414) at org.apache.thrift.transport.TSaslClientTransport.read(TSaslClientTransport.java:37) at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84) at org.apache.hadoop.hive.thrift.TFilterTransport.readAll(TFilterTransport.java:62) at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:378) at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:297) at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:204) at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69) at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_get_databases(ThriftHiveMetastore.java:600) at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.get_databases(ThriftHiveMetastore.java:587) at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getDatabases(HiveMetaStoreClient.java:837) at org.apache.sentry.binding.metastore.SentryHiveMetaStoreClient.getDatabases(SentryHiveMetaStoreClient.java:60) at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:90) at com.sun.proxy.$Proxy6.getDatabases(Unknown Source) at org.apache.hadoop.hive.ql.metadata.Hive.getDatabasesByPattern(Hive.java:1139) at org.apache.hadoop.hive.ql.exec.DDLTask.showDatabases(DDLTask.java:2445) at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:364) at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:153) at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:85) at 
org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1554) at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1321) at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1139) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:962) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:957) at org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:145) at
[jira] [Commented] (HIVE-11138) Query fails when there isn't a comparator for an operator [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-11138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14605216#comment-14605216 ] Rui Li commented on HIVE-11138: --- cc [~chengxiang li], [~xuefuz] Query fails when there isn't a comparator for an operator [Spark Branch] Key: HIVE-11138 URL: https://issues.apache.org/jira/browse/HIVE-11138 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Rui Li Assignee: Rui Li Attachments: HIVE-11138.1-spark.patch In such a case, OperatorComparatorFactory should default to false instead of throwing an exception. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
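The proposed behavior can be sketched with a minimal comparator registry — hypothetical names, not the actual Hive classes: when a comparator is missing for an operator type, report "not equal" (false) so the planner simply skips the optimization instead of failing the query.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiPredicate;

/** Sketch of a comparator registry that defaults to false for unknown types. */
public class ComparatorRegistry {
    private final Map<Class<?>, BiPredicate<Object, Object>> comparators = new HashMap<>();

    public void register(Class<?> cls, BiPredicate<Object, Object> cmp) {
        comparators.put(cls, cmp);
    }

    public boolean equalOperators(Object op1, Object op2) {
        if (op1.getClass() != op2.getClass()) {
            return false;
        }
        BiPredicate<Object, Object> cmp = comparators.get(op1.getClass());
        // The HIVE-11138 idea: no registered comparator means "assume not equal"
        // rather than throwing, so the query falls back to the unoptimized plan.
        return cmp != null && cmp.test(op1, op2);
    }
}
```

Returning false for an unknown type is conservative: at worst an equivalent pair of operators goes unmerged, which costs performance but never correctness.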
[jira] [Updated] (HIVE-11138) Query fails when there isn't a comparator for an operator [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-11138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rui Li updated HIVE-11138: -- Attachment: HIVE-11138.1-spark.patch Query fails when there isn't a comparator for an operator [Spark Branch] Key: HIVE-11138 URL: https://issues.apache.org/jira/browse/HIVE-11138 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Rui Li Assignee: Rui Li Attachments: HIVE-11138.1-spark.patch In such a case, OperatorComparatorFactory should default to false instead of throwing an exception. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11138) Query fails when there isn't a comparator for an operator [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-11138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rui Li updated HIVE-11138: -- Attachment: HIVE-11138.1-spark.patch Can't reproduce the failures locally. Trying again. Query fails when there isn't a comparator for an operator [Spark Branch] Key: HIVE-11138 URL: https://issues.apache.org/jira/browse/HIVE-11138 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Rui Li Assignee: Rui Li Attachments: HIVE-11138.1-spark.patch, HIVE-11138.1-spark.patch In such a case, OperatorComparatorFactory should default to false instead of throwing an exception. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11014) LLAP: some MiniTez tests have result changes compared to master
[ https://issues.apache.org/jira/browse/HIVE-11014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-11014: Summary: LLAP: some MiniTez tests have result changes compared to master (was: LLAP: MiniTez vector_binary_join_groupby, vector_outer_join1, vector_outer_join2 and cbo_windowing tests have result changes compared to master) LLAP: some MiniTez tests have result changes compared to master --- Key: HIVE-11014 URL: https://issues.apache.org/jira/browse/HIVE-11014 Project: Hive Issue Type: Sub-task Reporter: Sergey Shelukhin Assignee: Matt McCline -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11112) ISO-8859-1 text output has fragments of previous longer rows appended
[ https://issues.apache.org/jira/browse/HIVE-11112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14606922#comment-14606922 ] xiaowei wang commented on HIVE-11112: - I have added a test case in HIVE-11095, so I need a code review. The test has passed. Thanks! ISO-8859-1 text output has fragments of previous longer rows appended - Key: HIVE-11112 URL: https://issues.apache.org/jira/browse/HIVE-11112 Project: Hive Issue Type: Bug Components: Serializers/Deserializers Affects Versions: 1.2.0 Reporter: Yongzhi Chen Assignee: Yongzhi Chen Fix For: 1.3.0, 2.0.0 Attachments: HIVE-11112.1.patch If a LazySimpleSerDe table is created using ISO 8859-1 encoding, query results for a string column are incorrect for any row that was preceded by a row containing a longer string. Example steps to reproduce: 1. Create a table using ISO 8859-1 encoding: {code:sql} CREATE TABLE person_lat1 (name STRING) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' WITH SERDEPROPERTIES ('serialization.encoding'='ISO8859_1'); {code} 2. Copy an ISO-8859-1 encoded text file into the appropriate warehouse folder in HDFS. I'll attach an example file containing the following text:
{noformat}
Müller,Thomas
Jørgensen,Jørgen
Peña,Andrés
Nåm,Fæk
{noformat}
3. Execute {{SELECT * FROM person_lat1}} Result - The following output appears:
{noformat}
+-------------------+--+
| person_lat1.name  |
+-------------------+--+
| Müller,Thomas     |
| Jørgensen,Jørgen  |
| Peña,Andrésørgen  |
| Nåm,Fækdrésørgen  |
+-------------------+--+
{noformat}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11108) HashTableSinkOperator doesn't support vectorization [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-11108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14606938#comment-14606938 ] Hive QA commented on HIVE-11108: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12742687/HIVE-11108.2-spark.patch {color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 7992 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.initializationError org.apache.hive.hcatalog.streaming.TestStreaming.testAddPartition org.apache.hive.jdbc.TestSSL.testSSLConnectionWithProperty org.apache.hive.jdbc.TestSSL.testSSLVersion {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/916/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/916/console Test logs: http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-916/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 4 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12742687 - PreCommit-HIVE-SPARK-Build HashTableSinkOperator doesn't support vectorization [Spark Branch] -- Key: HIVE-11108 URL: https://issues.apache.org/jira/browse/HIVE-11108 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Rui Li Assignee: Rui Li Attachments: HIVE-11108.1-spark.patch, HIVE-11108.2-spark.patch This prevents any BaseWork containing HTS from being vectorized. It's basically specific to spark, because Tez doesn't use HTS and MR runs HTS in local tasks. We should verify if it makes sense to make HTS support vectorization. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-7723) Explain plan for complex query with lots of partitions is slow due to in-efficient collection used to find a matching ReadEntity
[ https://issues.apache.org/jira/browse/HIVE-7723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Laljo John Pullokkaran updated HIVE-7723: - Assignee: Hari Sankar Sivarama Subramaniyan (was: Mostafa Mokhtar) Explain plan for complex query with lots of partitions is slow due to in-efficient collection used to find a matching ReadEntity Key: HIVE-7723 URL: https://issues.apache.org/jira/browse/HIVE-7723 Project: Hive Issue Type: Bug Components: CLI, Physical Optimizer Affects Versions: 0.13.1 Reporter: Mostafa Mokhtar Assignee: Hari Sankar Sivarama Subramaniyan Attachments: HIVE-7723.1.patch, HIVE-7723.10.patch, HIVE-7723.11.patch, HIVE-7723.2.patch, HIVE-7723.3.patch, HIVE-7723.4.patch, HIVE-7723.5.patch, HIVE-7723.6.patch, HIVE-7723.7.patch, HIVE-7723.8.patch, HIVE-7723.9.patch Explain on TPC-DS query 64 took 11 seconds; when the CLI was profiled, it showed that ReadEntity.equals is taking ~40% of the CPU. ReadEntity.equals is called from the snippet below. Again and again the set is iterated over to get the actual match; a HashMap is a better option for this case, as Set doesn't have a get method. Also, for ReadEntity, equals is case-insensitive while hashCode is case-sensitive, which is undesired behavior.
{code}
public static ReadEntity addInput(Set<ReadEntity> inputs, ReadEntity newInput) {
  // If the input is already present, make sure the new parent is added to the input.
  if (inputs.contains(newInput)) {
    for (ReadEntity input : inputs) {
      if (input.equals(newInput)) {
        if ((newInput.getParents() != null) && (!newInput.getParents().isEmpty())) {
          input.getParents().addAll(newInput.getParents());
          input.setDirect(input.isDirect() || newInput.isDirect());
        }
        return input;
      }
    }
    assert false;
  } else {
    inputs.add(newInput);
    return newInput;
  }
  // make compile happy
  return null;
}
{code}
This is the query used: {code} select cs1.product_name ,cs1.store_name ,cs1.store_zip ,cs1.b_street_number ,cs1.b_streen_name ,cs1.b_city ,cs1.b_zip ,cs1.c_street_number ,cs1.c_street_name ,cs1.c_city ,cs1.c_zip ,cs1.syear ,cs1.cnt ,cs1.s1 ,cs1.s2 ,cs1.s3 ,cs2.s1 ,cs2.s2 ,cs2.s3 ,cs2.syear ,cs2.cnt from (select i_product_name as product_name ,i_item_sk as item_sk ,s_store_name as store_name ,s_zip as store_zip ,ad1.ca_street_number as b_street_number ,ad1.ca_street_name as b_streen_name ,ad1.ca_city as b_city ,ad1.ca_zip as b_zip ,ad2.ca_street_number as c_street_number ,ad2.ca_street_name as c_street_name ,ad2.ca_city as c_city ,ad2.ca_zip as c_zip ,d1.d_year as syear ,d2.d_year as fsyear ,d3.d_year as s2year ,count(*) as cnt ,sum(ss_wholesale_cost) as s1 ,sum(ss_list_price) as s2 ,sum(ss_coupon_amt) as s3 FROM store_sales JOIN store_returns ON store_sales.ss_item_sk = store_returns.sr_item_sk and store_sales.ss_ticket_number = store_returns.sr_ticket_number JOIN customer ON store_sales.ss_customer_sk = customer.c_customer_sk JOIN date_dim d1 ON store_sales.ss_sold_date_sk = d1.d_date_sk JOIN date_dim d2 ON customer.c_first_sales_date_sk = d2.d_date_sk JOIN date_dim d3 ON customer.c_first_shipto_date_sk = d3.d_date_sk JOIN store ON store_sales.ss_store_sk = store.s_store_sk JOIN customer_demographics cd1 ON store_sales.ss_cdemo_sk= cd1.cd_demo_sk JOIN customer_demographics cd2 ON customer.c_current_cdemo_sk = cd2.cd_demo_sk JOIN promotion ON store_sales.ss_promo_sk = promotion.p_promo_sk JOIN household_demographics hd1 ON store_sales.ss_hdemo_sk = hd1.hd_demo_sk 
JOIN household_demographics hd2 ON customer.c_current_hdemo_sk = hd2.hd_demo_sk JOIN customer_address ad1 ON store_sales.ss_addr_sk = ad1.ca_address_sk JOIN customer_address ad2 ON customer.c_current_addr_sk = ad2.ca_address_sk JOIN income_band ib1 ON hd1.hd_income_band_sk = ib1.ib_income_band_sk JOIN income_band ib2 ON hd2.hd_income_band_sk = ib2.ib_income_band_sk JOIN item ON store_sales.ss_item_sk = item.i_item_sk JOIN (select cs_item_sk ,sum(cs_ext_list_price) as sale,sum(cr_refunded_cash+cr_reversed_charge+cr_store_credit) as refund from catalog_sales JOIN catalog_returns ON catalog_sales.cs_item_sk = catalog_returns.cr_item_sk and catalog_sales.cs_order_number = catalog_returns.cr_order_number group by cs_item_sk having sum(cs_ext_list_price) > 2*sum(cr_refunded_cash+cr_reversed_charge+cr_store_credit)) cs_ui ON store_sales.ss_item_sk =
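The O(1) lookup the description asks for can be sketched by keying the entities in a map. This is hypothetical stand-alone code with a toy entity class in place of ReadEntity, normalizing the key so lookup matches the case-insensitive equals:

```java
import java.util.Locale;
import java.util.Map;

/** Toy stand-in for ReadEntity, keyed by a case-insensitive name. */
public class EntityMapDemo {
    public static final class Entity {
        public final String name;
        public int parents;  // stands in for the parent set being merged

        public Entity(String name) { this.name = name; }
    }

    // A map keyed by the normalized name gives O(1) lookup, unlike scanning a Set
    // whose contains() hit still forces a linear search to retrieve the element.
    public static Entity addInput(Map<String, Entity> inputs, Entity newInput) {
        String key = newInput.name.toLowerCase(Locale.ROOT);
        Entity existing = inputs.get(key);
        if (existing != null) {
            // Merge the new parents into the existing entry, as the original loop did.
            existing.parents += newInput.parents;
            return existing;
        }
        inputs.put(key, newInput);
        return newInput;
    }
}
```

Lower-casing the key also removes the equals/hashCode mismatch the description complains about: two names differing only in case now land on the same map entry.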
[jira] [Assigned] (HIVE-11102) ReaderImpl: getColumnIndicesFromNames does not work for ACID tables
[ https://issues.apache.org/jira/browse/HIVE-11102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin reassigned HIVE-11102: --- Assignee: Sergey Shelukhin (was: Gopal V) ReaderImpl: getColumnIndicesFromNames does not work for ACID tables --- Key: HIVE-11102 URL: https://issues.apache.org/jira/browse/HIVE-11102 Project: Hive Issue Type: Bug Components: File Formats Affects Versions: 1.3.0, 1.2.1, 2.0.0 Reporter: Gopal V Assignee: Sergey Shelukhin ORC reader impl does not estimate the size of ACID data files correctly. {code} Caused by: java.lang.IndexOutOfBoundsException: Index: 0 at java.util.Collections$EmptyList.get(Collections.java:3212) at org.apache.hadoop.hive.ql.io.orc.OrcProto$Type.getSubtypes(OrcProto.java:12240) at org.apache.hadoop.hive.ql.io.orc.ReaderImpl.getColumnIndicesFromNames(ReaderImpl.java:651) at org.apache.hadoop.hive.ql.io.orc.ReaderImpl.getRawDataSizeOfColumns(ReaderImpl.java:634) at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.populateAndCacheStripeDetails(OrcInputFormat.java:938) at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.call(OrcInputFormat.java:847) at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.call(OrcInputFormat.java:713) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:744) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11014) LLAP: some MiniTez tests have result changes compared to master
[ https://issues.apache.org/jira/browse/HIVE-11014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14606810#comment-14606810 ] Sergey Shelukhin commented on HIVE-11014: - Looks like this no longer happens after recent master merge LLAP: some MiniTez tests have result changes compared to master --- Key: HIVE-11014 URL: https://issues.apache.org/jira/browse/HIVE-11014 Project: Hive Issue Type: Sub-task Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin -vector_binary_join_groupby, vector_outer_join1, vector_outer_join2- and cbo_windowing -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11147) MetaTool doesn't update FS root location for partitions with space in name
[ https://issues.apache.org/jira/browse/HIVE-11147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Zheng updated HIVE-11147: - Attachment: HIVE-11147.01.patch Attach patch 01 MetaTool doesn't update FS root location for partitions with space in name -- Key: HIVE-11147 URL: https://issues.apache.org/jira/browse/HIVE-11147 Project: Hive Issue Type: Bug Components: Metastore Reporter: Wei Zheng Assignee: Wei Zheng Attachments: HIVE-11147.01.patch Problem happens when trying to update the FS root location: {code} # HIVE_CONF_DIR=/etc/hive/conf.server/ hive --service metatool -dryRun -updateLocation hdfs://mycluster hdfs://c6401.ambari.apache.org:8020 ... Looking for LOCATION_URI field in DBS table to update.. Dry Run of updateLocation on table DBS.. old location: hdfs://c6401.ambari.apache.org:8020/apps/hive/warehouse new location: hdfs://mycluster/apps/hive/warehouse Found 1 records in DBS table to update Looking for LOCATION field in SDS table to update.. Dry Run of updateLocation on table SDS.. old location: hdfs://c6401.ambari.apache.org:8020/apps/hive/warehouse/web_sales/ws_web_site_sk=12 new location: hdfs://mycluster/apps/hive/warehouse/web_sales/ws_web_site_sk=12 old location: hdfs://c6401.ambari.apache.org:8020/apps/hive/warehouse/web_sales/ws_web_site_sk=13 new location: hdfs://mycluster/apps/hive/warehouse/web_sales/ws_web_site_sk=13 ... Found 143 records in SDS table to update Warning: Found records with bad LOCATION in SDS table.. 
bad location URI: hdfs://c6401.ambari.apache.org:8020/apps/hive/warehouse/customer_demographics/cd_education_status=Advanced Degree
bad location URI: hdfs://c6401.ambari.apache.org:8020/apps/hive/warehouse/customer_demographics/cd_education_status=Advanced Degree
bad location URI: hdfs://c6401.ambari.apache.org:8020/apps/hive/warehouse/customer_demographics/cd_education_status=4 yr Degree
bad location URI: hdfs://c6401.ambari.apache.org:8020/apps/hive/warehouse/customer_demographics/cd_education_status=4 yr Degree
bad location URI: hdfs://c6401.ambari.apache.org:8020/apps/hive/warehouse/customer_demographics/cd_education_status=2 yr Degree
bad location URI: hdfs://c6401.ambari.apache.org:8020/apps/hive/warehouse/customer_demographics/cd_education_status=2 yr Degree
{code}
The reason some entries are marked as bad locations is that they contain a space character in the partition name. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
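The space in a partition name like cd_education_status=Advanced Degree is what trips the URI handling: java.net.URI rejects a raw space, so such a location must be percent-encoded before parsing. A small illustration, with made-up host and paths:

```java
import java.net.URI;
import java.net.URISyntaxException;

/** Shows why locations with spaces are flagged as bad, and one encoding workaround. */
public class BadLocationDemo {
    public static boolean isParsableUri(String location) {
        try {
            new URI(location);  // single-argument URI constructor does no encoding
            return true;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    // Minimal fix sketch: encode spaces before handing the string to URI.
    public static boolean isParsableAfterEncoding(String location) {
        return isParsableUri(location.replace(" ", "%20"));
    }
}
```

The same location string parses fine once the space is encoded, which is why only partitions with spaces in their names show up in the "bad LOCATION" warning list.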
[jira] [Commented] (HIVE-11145) Remove OFFLINE and NO_DROP from tables and partitions
[ https://issues.apache.org/jira/browse/HIVE-11145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14606882#comment-14606882 ] Sergey Shelukhin commented on HIVE-11145: - Is it better to just do it on master? Remove OFFLINE and NO_DROP from tables and partitions - Key: HIVE-11145 URL: https://issues.apache.org/jira/browse/HIVE-11145 Project: Hive Issue Type: Improvement Components: Metastore, SQL Affects Versions: 2.0.0 Reporter: Alan Gates Assignee: Alan Gates Attachments: HIVE-11145.patch Currently a table or partition can be marked no_drop or offline. This prevents users from dropping or reading (and dropping) the table or partition. This was built in 0.7 before SQL standard authorization was an option. This is an expensive feature as when a table is dropped every partition must be fetched and checked to make sure it can be dropped. This feature is also redundant now that real authorization is available in Hive. This feature should be removed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11102) ReaderImpl: getColumnIndicesFromNames does not work for ACID tables
[ https://issues.apache.org/jira/browse/HIVE-11102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14606779#comment-14606779 ] Sergey Shelukhin commented on HIVE-11102: - The issue is actually that the column is not found. Adding this:
{noformat}
 if (fieldNames.contains(colName)) {
   fieldIdx = fieldNames.indexOf(colName);
+} else {
+  String s = "Cannot find field for: " + colName + " in ";
+  for (String fn : fieldNames) {
+    s += fn + ", ";
+  }
+  LOG.error(s);
+  continue;
 }
{noformat}
to one test that gets this on the llap branch after merge produces
{noformat}
2015-06-29 17:45:56,629 ERROR [ORC_GET_SPLITS #2] orc.ReaderImpl: Cannot find field for: ctinyint in _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8, _col9, _col10, _col11,
{noformat}
ReaderImpl: getColumnIndicesFromNames does not work for ACID tables --- Key: HIVE-11102 URL: https://issues.apache.org/jira/browse/HIVE-11102 Project: Hive Issue Type: Bug Components: File Formats Affects Versions: 1.3.0, 1.2.1, 2.0.0 Reporter: Gopal V Assignee: Sergey Shelukhin ORC reader impl does not estimate the size of ACID data files correctly. 
{code} Caused by: java.lang.IndexOutOfBoundsException: Index: 0 at java.util.Collections$EmptyList.get(Collections.java:3212) at org.apache.hadoop.hive.ql.io.orc.OrcProto$Type.getSubtypes(OrcProto.java:12240) at org.apache.hadoop.hive.ql.io.orc.ReaderImpl.getColumnIndicesFromNames(ReaderImpl.java:651) at org.apache.hadoop.hive.ql.io.orc.ReaderImpl.getRawDataSizeOfColumns(ReaderImpl.java:634) at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.populateAndCacheStripeDetails(OrcInputFormat.java:938) at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.call(OrcInputFormat.java:847) at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.call(OrcInputFormat.java:713) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:744) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11095) SerDeUtils another bug ,when Text is reused
[ https://issues.apache.org/jira/browse/HIVE-11095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14606912#comment-14606912 ] Hive QA commented on HIVE-11095: {color:green}Overall{color}: +1 all checks pass Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12742660/HIVE-11095.3.patch.txt {color:green}SUCCESS:{color} +1 9035 tests passed Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4436/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4436/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4436/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase {noformat} This message is automatically generated. ATTACHMENT ID: 12742660 - PreCommit-HIVE-TRUNK-Build SerDeUtils another bug ,when Text is reused Key: HIVE-11095 URL: https://issues.apache.org/jira/browse/HIVE-11095 Project: Hive Issue Type: Bug Components: API, CLI Affects Versions: 0.14.0, 1.0.0, 1.2.0 Environment: Hadoop 2.3.0-cdh5.0.0 Hive 0.14 Reporter: xiaowei wang Assignee: xiaowei wang Fix For: 2.0.0 Attachments: HIVE-11095.1.patch.txt, HIVE-11095.2.patch.txt, HIVE-11095.3.patch.txt
{noformat}
The method transformTextFromUTF8 has a bug: it invokes a bad method of Text, getBytes(). The getBytes method of Text returns the raw backing array; however, only the data up to Text.length is valid. A better way is to use copyBytes() if you need the returned array to be precisely the length of the data. But copyBytes was only added after hadoop1.
{noformat}
How I found this bug? 
When I query data from an LZO table, I find in the results that the length of the current row is always larger than the previous row, and sometimes the current row contains the contents of the previous row. For example, I execute a sql:
{code:sql}
select * from web_searchhub where logdate=2015061003
{code}
The result of the sql is shown below. Notice that the second row's content contains the first row's content.
{noformat}
INFO [03:00:05.589] HttpFrontServer::FrontSH msgRecv:Remote=/10.13.193.68:42098,session=3151,thread=254 2015061003
INFO [03:00:05.594] 18941e66-9962-44ad-81bc-3519f47ba274 session=901,thread=223ession=3151,thread=254 2015061003
{noformat}
The content of the original lzo file is shown below, just 2 rows.
{noformat}
INFO [03:00:05.635] b88e0473-7530-494c-82d8-e2d2ebd2666c_forweb session=3148,thread=285
INFO [03:00:05.635] HttpFrontServer::FrontSH msgRecv:Remote=/10.13.193.68:42095,session=3148,thread=285
{noformat}
I think this error is caused by the Text reuse, and I found the solution. Additionally, the table create sql is:
{code:sql}
CREATE EXTERNAL TABLE `web_searchhub`(`line` string)
PARTITIONED BY (`logdate` string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ' U'
WITH SERDEPROPERTIES ('serialization.encoding'='GBK')
STORED AS
  INPUTFORMAT "com.hadoop.mapred.DeprecatedLzoTextInputFormat"
  OUTPUTFORMAT "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat"
LOCATION 'viewfs://nsX/user/hive/warehouse/raw.db/web/web_searchhub';
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
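The getBytes()-vs-copyBytes() distinction is easy to demonstrate without Hadoop, using a growable buffer that is reused across records the way Text is. The GrowableBuffer class below is a hypothetical stand-in for org.apache.hadoop.io.Text, not the real class:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical stand-in for Text: a reused, growable byte buffer. */
public class GrowableBuffer {
    private byte[] bytes = new byte[0];
    private int length;

    public void set(String s) {
        byte[] src = s.getBytes(StandardCharsets.UTF_8);
        if (src.length > bytes.length) {
            bytes = new byte[src.length];  // grow on demand, but never shrink (like Text)
        }
        System.arraycopy(src, 0, bytes, 0, src.length);
        length = src.length;
    }

    // Like Text.getBytes(): returns the raw backing array, valid only up to length.
    public byte[] getBytes() { return bytes; }

    // Like Text.copyBytes(): returns exactly length bytes.
    public byte[] copyBytes() { return Arrays.copyOf(bytes, length); }

    public int getLength() { return length; }
}
```

After setting a long row and then a shorter one, decoding getBytes() in full reproduces the symptom reported above: the tail of the long row leaks into the short one, exactly the "Peña,Andrésørgen" pattern. Decoding copyBytes(), or honoring getLength(), does not.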
[jira] [Updated] (HIVE-11014) LLAP: some MiniTez tests have result changes compared to master
[ https://issues.apache.org/jira/browse/HIVE-11014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-11014: Description: vector_binary_join_groupby, -vector_outer_join1, vector_outer_join2- and cbo_windowing was: vector_binary_join_groupby, vector_outer_join1, vector_outer_join2 and cbo_windowing LLAP: some MiniTez tests have result changes compared to master --- Key: HIVE-11014 URL: https://issues.apache.org/jira/browse/HIVE-11014 Project: Hive Issue Type: Sub-task Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin vector_binary_join_groupby, -vector_outer_join1, vector_outer_join2- and cbo_windowing -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11014) LLAP: some MiniTez tests have result changes compared to master
[ https://issues.apache.org/jira/browse/HIVE-11014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-11014: Description: - vector_binary_join_groupby, vector_outer_join1, vector_outer_join2- and cbo_windowing was: vector_binary_join_groupby, -vector_outer_join1, vector_outer_join2- and cbo_windowing LLAP: some MiniTez tests have result changes compared to master --- Key: HIVE-11014 URL: https://issues.apache.org/jira/browse/HIVE-11014 Project: Hive Issue Type: Sub-task Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin - vector_binary_join_groupby, vector_outer_join1, vector_outer_join2- and cbo_windowing -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HIVE-11017) LLAP: disable the flaky TestLlapTaskSchedulerService test
[ https://issues.apache.org/jira/browse/HIVE-11017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin resolved HIVE-11017. - Resolution: Fixed Fix Version/s: llap LLAP: disable the flaky TestLlapTaskSchedulerService test -- Key: HIVE-11017 URL: https://issues.apache.org/jira/browse/HIVE-11017 Project: Hive Issue Type: Sub-task Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Fix For: llap It passes for me locally on both hadoop-1 and hadoop-2. On HiveQA, it fails: {noformat} java.lang.AssertionError: expected:6 but was:4 at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:743) at org.junit.Assert.assertEquals(Assert.java:118) at org.junit.Assert.assertEquals(Assert.java:555) at org.junit.Assert.assertEquals(Assert.java:542) at org.apache.tez.dag.app.rm.TestLlapTaskSchedulerService.testNodeReEnabled(TestLlapTaskSchedulerService.java:264) {noformat} For example http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4264/testReport/org.apache.tez.dag.app.rm/TestLlapTaskSchedulerService/testNodeReEnabled/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11108) HashTableSinkOperator doesn't support vectorization [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-11108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14606905#comment-14606905 ] Xuefu Zhang commented on HIVE-11108: +1 pending on test. HashTableSinkOperator doesn't support vectorization [Spark Branch] -- Key: HIVE-11108 URL: https://issues.apache.org/jira/browse/HIVE-11108 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Rui Li Assignee: Rui Li Attachments: HIVE-11108.1-spark.patch, HIVE-11108.2-spark.patch This prevents any BaseWork containing HTS from being vectorized. It's basically specific to spark, because Tez doesn't use HTS and MR runs HTS in local tasks. We should verify if it makes sense to make HTS support vectorization. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11095) SerDeUtils another bug, when Text is reused
[ https://issues.apache.org/jira/browse/HIVE-11095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14606921#comment-14606921 ] xiaowei wang commented on HIVE-11095: - [~xuefuz] I added a test case, so I need a code review. The tests have passed. SerDeUtils another bug, when Text is reused Key: HIVE-11095 URL: https://issues.apache.org/jira/browse/HIVE-11095 Project: Hive Issue Type: Bug Components: API, CLI Affects Versions: 0.14.0, 1.0.0, 1.2.0 Environment: Hadoop 2.3.0-cdh5.0.0 Hive 0.14 Reporter: xiaowei wang Assignee: xiaowei wang Fix For: 2.0.0 Attachments: HIVE-11095.1.patch.txt, HIVE-11095.2.patch.txt, HIVE-11095.3.patch.txt
{noformat}
The method transformTextFromUTF8 has a bug: it invokes a misleading method of Text, getBytes(). getBytes() returns the raw byte array of the Text, but only data up to Text.getLength() is valid. A better way is to use copyBytes() when the returned array must be precisely the length of the data; however, copyBytes() was only added after hadoop1.
{noformat}
How I found this bug: when I queried data from an LZO table, I found in the results that the current row was always longer than the previous row, and sometimes the current row contained the contents of the previous row. For example, I executed this SQL:
{code:sql}
select * from web_searchhub where logdate=2015061003
{code}
The result of the SQL is below. Notice that the second row contains the content of the first row.
{noformat}
INFO [03:00:05.589] HttpFrontServer::FrontSH msgRecv:Remote=/10.13.193.68:42098,session=3151,thread=254 2015061003
INFO [03:00:05.594] 18941e66-9962-44ad-81bc-3519f47ba274 session=901,thread=223ession=3151,thread=254 2015061003
{noformat}
The content of the original LZO file is below, just 2 rows:
{noformat}
INFO [03:00:05.635] b88e0473-7530-494c-82d8-e2d2ebd2666c_forweb session=3148,thread=285
INFO [03:00:05.635] HttpFrontServer::FrontSH msgRecv:Remote=/10.13.193.68:42095,session=3148,thread=285
{noformat}
I think this error is caused by the Text reuse, and I found the solution. Additionally, the table create SQL is:
{code:sql}
CREATE EXTERNAL TABLE `web_searchhub`(
  `line` string)
PARTITIONED BY (
  `logdate` string)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY ' U'
WITH SERDEPROPERTIES ('serialization.encoding'='GBK')
STORED AS
  INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION 'viewfs://nsX/user/hive/warehouse/raw.db/web/web_searchhub';
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
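The getBytes()/copyBytes() contract described above can be demonstrated without Hadoop. ReusedText below is a hypothetical minimal model of org.apache.hadoop.io.Text's grow-only backing buffer, not the real class; it shows why decoding the raw buffer appends fragments of a previous, longer row:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical minimal model of org.apache.hadoop.io.Text's grow-only
// backing buffer (not the real class): set() never shrinks the array,
// so bytes past `length` can hold leftovers of a previous, longer value.
class ReusedText {
    private byte[] bytes = new byte[0];
    private int length;

    void set(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        if (utf8.length > bytes.length) {
            bytes = new byte[utf8.length]; // grow only, never shrink
        }
        System.arraycopy(utf8, 0, bytes, 0, utf8.length);
        length = utf8.length;
    }

    byte[] getBytes() { return bytes; }                          // raw buffer, may be longer than the data
    int getLength() { return length; }
    byte[] copyBytes() { return Arrays.copyOf(bytes, length); }  // exactly the valid data
}

public class TextReuseDemo {
    public static void main(String[] args) {
        ReusedText t = new ReusedText();
        t.set("a long first row");
        t.set("short");
        // Decoding the raw buffer picks up stale bytes from the first row:
        System.out.println(new String(t.getBytes(), StandardCharsets.UTF_8));  // shortg first row
        System.out.println(new String(t.copyBytes(), StandardCharsets.UTF_8)); // short
    }
}
```

Decoding `getBytes()` without honoring `getLength()` reproduces exactly the "current row contains the previous row" symptom reported in the issue.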
[jira] [Updated] (HIVE-11102) ReaderImpl: getColumnIndicesFromNames does not work for ACID tables
[ https://issues.apache.org/jira/browse/HIVE-11102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-11102: Attachment: HIVE-11102.patch Patch that fixes the exception. The test that was failing on LLAP branch with this error now produces the same result as on master... [~prasanth_j] should there be a separate fix for why the column is not found? ReaderImpl: getColumnIndicesFromNames does not work for ACID tables --- Key: HIVE-11102 URL: https://issues.apache.org/jira/browse/HIVE-11102 Project: Hive Issue Type: Bug Components: File Formats Affects Versions: 1.3.0, 1.2.1, 2.0.0 Reporter: Gopal V Assignee: Sergey Shelukhin Attachments: HIVE-11102.patch ORC reader impl does not estimate the size of ACID data files correctly. {code} Caused by: java.lang.IndexOutOfBoundsException: Index: 0 at java.util.Collections$EmptyList.get(Collections.java:3212) at org.apache.hadoop.hive.ql.io.orc.OrcProto$Type.getSubtypes(OrcProto.java:12240) at org.apache.hadoop.hive.ql.io.orc.ReaderImpl.getColumnIndicesFromNames(ReaderImpl.java:651) at org.apache.hadoop.hive.ql.io.orc.ReaderImpl.getRawDataSizeOfColumns(ReaderImpl.java:634) at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.populateAndCacheStripeDetails(OrcInputFormat.java:938) at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.call(OrcInputFormat.java:847) at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.call(OrcInputFormat.java:713) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:744) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-4897) Hive should handle AlreadyExists on retries when creating tables/partitions
[ https://issues.apache.org/jira/browse/HIVE-4897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matthew Jacobs updated HIVE-4897: - Priority: Major (was: Minor) Description: Creating new tables/partitions may fail with an AlreadyExistsException if there is an error part way through the creation and the HMS tries again without properly cleaning up or checking if this is a retry. While partitioning a new table via a script on distributed hive (MetaStore on the same machine) there was a long timeout and then: {code} FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. AlreadyExistsException(message:Partition already exists:Partition( ... {code} I am assuming this is due to retry. Perhaps already-exists on retry could be handled better. A similar error occurred while creating a table through Impala, which issued a single createTable call that failed with an AlreadyExistsException. See the logs related to table tmp_proc_8_d2b7b0f133be455ca95615818b8a5879_7 in the attached hive-snippet.log was: While partitioning a new table via a script on distributed hive (MetaStore on the same machine) there was a long timeout and then: {code} FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. AlreadyExistsException(message:Partition already exists:Partition( ... {code} I am assuming this is due to retry. Perhaps already-exists on retry could be handled better. 
Summary: Hive should handle AlreadyExists on retries when creating tables/partitions (was: Hive should handle AlreadyExists on retries when creating partitions) Hive should handle AlreadyExists on retries when creating tables/partitions --- Key: HIVE-4897 URL: https://issues.apache.org/jira/browse/HIVE-4897 Project: Hive Issue Type: Bug Reporter: Sergey Shelukhin Attachments: hive-snippet.log Creating new tables/partitions may fail with an AlreadyExistsException if there is an error part way through the creation and the HMS tries again without properly cleaning up or checking if this is a retry. While partitioning a new table via a script on distributed hive (MetaStore on the same machine) there was a long timeout and then: {code} FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. AlreadyExistsException(message:Partition already exists:Partition( ... {code} I am assuming this is due to retry. Perhaps already-exists on retry could be handled better. A similar error occurred while creating a table through Impala, which issued a single createTable call that failed with an AlreadyExistsException. See the logs related to table tmp_proc_8_d2b7b0f133be455ca95615818b8a5879_7 in the attached hive-snippet.log -- This message was sent by Atlassian JIRA (v6.3.4#6332)
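One way "already-exists on retry could be handled better", sketched with a hypothetical in-memory store standing in for the HMS (RetrySafeCreate and createIfAbsentIdempotent are illustrative names, not Hive APIs): a retried create that finds an identical object left behind by a half-completed earlier attempt succeeds quietly, while a genuine conflict still fails.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch only: a ConcurrentHashMap models the metastore's partition table.
// The wrapper tolerates AlreadyExistsException when the stored object matches
// what this (re)try would have created, i.e. the error came from our own
// earlier, partially-completed attempt rather than a real conflict.
public class RetrySafeCreate {
    static class AlreadyExistsException extends Exception {}

    private final Map<String, String> partitions = new ConcurrentHashMap<>();

    void create(String name, String spec) throws AlreadyExistsException {
        if (partitions.putIfAbsent(name, spec) != null) {
            throw new AlreadyExistsException();
        }
    }

    // Retry-safe wrapper around create().
    boolean createIfAbsentIdempotent(String name, String spec) {
        try {
            create(name, spec);
            return true;
        } catch (AlreadyExistsException e) {
            // Treat already-exists as success only for an identical object.
            return spec.equals(partitions.get(name));
        }
    }

    public static void main(String[] args) {
        RetrySafeCreate hms = new RetrySafeCreate();
        System.out.println(hms.createIfAbsentIdempotent("logdate=2015061003", "specA")); // true
        System.out.println(hms.createIfAbsentIdempotent("logdate=2015061003", "specA")); // true (retry)
        System.out.println(hms.createIfAbsentIdempotent("logdate=2015061003", "specB")); // false (conflict)
    }
}
```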
[jira] [Assigned] (HIVE-11014) LLAP: some MiniTez tests have result changes compared to master
[ https://issues.apache.org/jira/browse/HIVE-11014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin reassigned HIVE-11014: --- Assignee: Sergey Shelukhin (was: Matt McCline) LLAP: some MiniTez tests have result changes compared to master --- Key: HIVE-11014 URL: https://issues.apache.org/jira/browse/HIVE-11014 Project: Hive Issue Type: Sub-task Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin vector_binary_join_groupby, vector_outer_join1, vector_outer_join2 and cbo_windowing -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11014) LLAP: some MiniTez tests have result changes compared to master
[ https://issues.apache.org/jira/browse/HIVE-11014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-11014: Description: -vector_binary_join_groupby, vector_outer_join1, vector_outer_join2- and cbo_windowing was: - vector_binary_join_groupby, vector_outer_join1, vector_outer_join2- and cbo_windowing LLAP: some MiniTez tests have result changes compared to master --- Key: HIVE-11014 URL: https://issues.apache.org/jira/browse/HIVE-11014 Project: Hive Issue Type: Sub-task Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin -vector_binary_join_groupby, vector_outer_join1, vector_outer_join2- and cbo_windowing -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11112) ISO-8859-1 text output has fragments of previous longer rows appended
[ https://issues.apache.org/jira/browse/HIVE-11112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14606926#comment-14606926 ] xiaowei wang commented on HIVE-11112: - The above is wrong. I will try to add a test case today for HIVE-11095. HIVE-11095 and HIVE-11112 cover different cases, but HIVE-10983 duplicates both. ISO-8859-1 text output has fragments of previous longer rows appended - Key: HIVE-11112 URL: https://issues.apache.org/jira/browse/HIVE-11112 Project: Hive Issue Type: Bug Components: Serializers/Deserializers Affects Versions: 1.2.0 Reporter: Yongzhi Chen Assignee: Yongzhi Chen Fix For: 1.3.0, 2.0.0 Attachments: HIVE-11112.1.patch If a LazySimpleSerDe table is created using ISO 8859-1 encoding, query results for a string column are incorrect for any row that was preceded by a row containing a longer string. Example steps to reproduce: 1. Create a table using ISO 8859-1 encoding:
{code:sql}
CREATE TABLE person_lat1 (name STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES ('serialization.encoding'='ISO8859_1');
{code}
2. Copy an ISO-8859-1 encoded text file into the appropriate warehouse folder in HDFS. I'll attach an example file containing the following text:
{noformat}
Müller,Thomas
Jørgensen,Jørgen
Peña,Andrés
Nåm,Fæk
{noformat}
3. Execute {{SELECT * FROM person_lat1}} Result - The following output appears:
{noformat}
+-------------------+
| person_lat1.name  |
+-------------------+
| Müller,Thomas     |
| Jørgensen,Jørgen  |
| Peña,Andrésørgen  |
| Nåm,Fækdrésørgen  |
+-------------------+
{noformat}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11095) SerDeUtils another bug, when Text is reused
[ https://issues.apache.org/jira/browse/HIVE-11095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xiaowei wang updated HIVE-11095: Attachment: HIVE-11095.3.patch.txt SerDeUtils another bug, when Text is reused Key: HIVE-11095 URL: https://issues.apache.org/jira/browse/HIVE-11095 Project: Hive Issue Type: Bug -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11014) LLAP: some MiniTez tests have result changes compared to master
[ https://issues.apache.org/jira/browse/HIVE-11014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-11014: Description: vector_binary_join_groupby, vector_outer_join1, vector_outer_join2 and cbo_windowing LLAP: some MiniTez tests have result changes compared to master --- Key: HIVE-11014 URL: https://issues.apache.org/jira/browse/HIVE-11014 Project: Hive Issue Type: Sub-task Reporter: Sergey Shelukhin Assignee: Matt McCline vector_binary_join_groupby, vector_outer_join1, vector_outer_join2 and cbo_windowing -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11095) SerDeUtils another bug, when Text is reused
[ https://issues.apache.org/jira/browse/HIVE-11095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14607202#comment-14607202 ] xiaowei wang commented on HIVE-11095: - Thanks! SerDeUtils another bug, when Text is reused Key: HIVE-11095 URL: https://issues.apache.org/jira/browse/HIVE-11095 Project: Hive Issue Type: Bug -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9566) HiveServer2 fails to start with NullPointerException
[ https://issues.apache.org/jira/browse/HIVE-9566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-9566: -- Attachment: HIVE-9566.patch Renamed patch to trigger the test run. HiveServer2 fails to start with NullPointerException Key: HIVE-9566 URL: https://issues.apache.org/jira/browse/HIVE-9566 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 0.13.0, 0.14.0, 0.13.1 Reporter: Na Yang Assignee: Na Yang Attachments: HIVE-9566-branch-0.13.patch, HIVE-9566-branch-0.14.patch, HIVE-9566-trunk.patch, HIVE-9566.patch hiveserver2 uses embedded metastore with default hive-site.xml configuration. I use hive --stop --service hiveserver2 command to stop the running hiveserver2 process and then use hive --start --service hiveserver2 command to start the hiveserver2 service. I see the following exception in the hive.log file {noformat} java.lang.NullPointerException at org.apache.hive.service.server.HiveServer2.stop(HiveServer2.java:104) at org.apache.hive.service.server.HiveServer2.startHiveServer2(HiveServer2.java:138) at org.apache.hive.service.server.HiveServer2.main(HiveServer2.java:171) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.RunJar.main(RunJar.java:212) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11095) SerDeUtils another bug, when Text is reused
[ https://issues.apache.org/jira/browse/HIVE-11095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14607540#comment-14607540 ] xiaowei wang commented on HIVE-11095: - Is there a problem? SerDeUtils another bug, when Text is reused Key: HIVE-11095 URL: https://issues.apache.org/jira/browse/HIVE-11095 Project: Hive Issue Type: Bug -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11145) Remove OFFLINE and NO_DROP from tables and partitions
[ https://issues.apache.org/jira/browse/HIVE-11145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14607500#comment-14607500 ] Alan Gates commented on HIVE-11145: --- Yes, I rebased it to master. Putting it on hbase-metastore was a mistake. Remove OFFLINE and NO_DROP from tables and partitions - Key: HIVE-11145 URL: https://issues.apache.org/jira/browse/HIVE-11145 Project: Hive Issue Type: Improvement Components: Metastore, SQL Affects Versions: 2.0.0 Reporter: Alan Gates Assignee: Alan Gates Attachments: HIVE-11145.patch Currently a table or partition can be marked no_drop or offline. This prevents users from dropping or reading (and dropping) the table or partition. This was built in 0.7 before SQL standard authorization was an option. This is an expensive feature as when a table is dropped every partition must be fetched and checked to make sure it can be dropped. This feature is also redundant now that real authorization is available in Hive. This feature should be removed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11138) Query fails when there isn't a comparator for an operator [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-11138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14607193#comment-14607193 ] Hive QA commented on HIVE-11138: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12742698/HIVE-11138.1-spark.patch {color:red}ERROR:{color} -1 due to 7 failed/errored test(s), 6222 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.initializationError org.apache.hadoop.hive.cli.TestHBaseCliDriver.initializationError org.apache.hadoop.hive.cli.TestMiniTezCliDriver.initializationError org.apache.hadoop.hive.cli.TestMinimrCliDriver.initializationError org.apache.hadoop.hive.cli.TestNegativeCliDriver.initializationError org.apache.hadoop.hive.cli.TestNegativeMinimrCliDriver.initializationError org.apache.hive.jdbc.TestSSL.testSSLFetchHttp {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/917/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/917/console Test logs: http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-917/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 7 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12742698 - PreCommit-HIVE-SPARK-Build Query fails when there isn't a comparator for an operator [Spark Branch] Key: HIVE-11138 URL: https://issues.apache.org/jira/browse/HIVE-11138 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Rui Li Assignee: Rui Li Attachments: HIVE-11138.1-spark.patch In such case, OperatorComparatorFactory should default to false instead of throw exceptions. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11095) SerDeUtils another bug, when Text is reused
[ https://issues.apache.org/jira/browse/HIVE-11095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14607197#comment-14607197 ] Xuefu Zhang commented on HIVE-11095: +1 SerDeUtils another bug, when Text is reused Key: HIVE-11095 URL: https://issues.apache.org/jira/browse/HIVE-11095 Project: Hive Issue Type: Bug -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10328) Enable new return path for cbo
[ https://issues.apache.org/jira/browse/HIVE-10328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14607549#comment-14607549 ] Hive QA commented on HIVE-10328: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12742669/HIVE-10328.6.patch {color:red}ERROR:{color} -1 due to 1342 failed/errored test(s), 8990 tests executed *Failed tests:* {noformat} TestCliDriver-groupby10.q-timestamp_comparison.q-tez_union.q-and-12-more - did not produce a TEST-*.xml file TestCliDriver-infer_bucket_sort_list_bucket.q-bucketmapjoin4.q-show_tables.q-and-12-more - did not produce a TEST-*.xml file TestCliDriver-skewjoinopt16.q-udf_in_file.q-mapjoin_filter_on_outerjoin.q-and-12-more - did not produce a TEST-*.xml file org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alias_casted_column org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_allcolref_in_udf org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_partition_coltype org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ambiguitycheck org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_analyze_table_null_partition org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_filter org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_groupby org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_groupby2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_join org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_join_pkfk org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_part org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_select org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_table org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_union 
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ansi_sql_arithmetic org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_array_map_access_nonconstant org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_authorization_explain org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join0 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join10 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join11 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join12 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join13 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join14 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join15 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join16 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join18 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join18_multi_distinct org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join19 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join20 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join22 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join24 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join25 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join26 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join27 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join3 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join30 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join32 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join33 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join4 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join5 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join6 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join7 
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join8 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join9 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join_filters org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join_nulls org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join_without_localtask org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_smb_mapjoin_14 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_10 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_11 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_12 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_14 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_15 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_2
[jira] [Updated] (HIVE-11140) auto compute PROJ_HOME in hcatalog/src/test/e2e/templeton/deployers/env.sh
[ https://issues.apache.org/jira/browse/HIVE-11140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-11140: -- Attachment: HIVE-11140.patch auto compute PROJ_HOME in hcatalog/src/test/e2e/templeton/deployers/env.sh -- Key: HIVE-11140 URL: https://issues.apache.org/jira/browse/HIVE-11140 Project: Hive Issue Type: Bug Components: Tests, WebHCat Affects Versions: 1.0.0 Reporter: Eugene Koifman Assignee: Eugene Koifman Attachments: HIVE-11140.patch, HIVE-11140.patch it's currently set as {noformat} if [ -z ${PROJ_HOME} ]; then export PROJ_HOME=/Users/${USER}/dev/hive fi {noformat} but it always points to project root so can be {{export PROJ_HOME=../../../../../..}} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11141) Improve RuleRegExp when the Expression node stack gets huge
[ https://issues.apache.org/jira/browse/HIVE-11141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hari Sankar Sivarama Subramaniyan updated HIVE-11141: - Description: More and more complex workloads are migrated to Hive from SQL Server, Teradata, etc., and occasionally Hive gets bottlenecked on generating plans for large queries; in the majority of cases the time is spent fetching metadata and partitions and in other optimizer transformation rules. I have attached the query for the test case, which needs to be run after we set up the database as shown below.
{code}
create database dataset_3;
use dataset_3;
{code}
createtable.rtf - create table command
SQLQuery10.sql.mssql - explain query
The most problematic part of the code, as the stack gets arbitrarily long, seems to be in RuleRegExp.java:
{code}
@Override
public int cost(Stack<Node> stack) throws SemanticException {
  int numElems = (stack != null ? stack.size() : 0);
  String name = "";
  for (int pos = numElems - 1; pos >= 0; pos--) {
    name = stack.get(pos).getName() + "%" + name;
    Matcher m = pattern.matcher(name);
    if (m.matches()) {
      return m.group().length();
    }
  }
  return -1;
}
{code}
was: More and more complex workloads are migrated to Hive from SQL Server, Teradata, etc., and occasionally Hive gets bottlenecked on generating plans for large queries; in the majority of cases the time is spent fetching metadata and partitions and in other optimizer transformation rules. I have attached the query for the test case, which needs to be run after we set up the database as shown below.
{code}
create database dataset_3;
use dataset_3;
{code}
The most problematic part of the code, as the stack gets arbitrarily long, seems to be in RuleRegExp.java:
{code}
@Override
public int cost(Stack<Node> stack) throws SemanticException {
  int numElems = (stack != null ? stack.size() : 0);
  String name = "";
  for (int pos = numElems - 1; pos >= 0; pos--) {
    name = stack.get(pos).getName() + "%" + name;
    Matcher m = pattern.matcher(name);
    if (m.matches()) {
      return m.group().length();
    }
  }
  return -1;
}
{code}
Improve RuleRegExp when the Expression node stack gets huge --- Key: HIVE-11141 URL: https://issues.apache.org/jira/browse/HIVE-11141 Project: Hive Issue Type: Bug Reporter: Hari Sankar Sivarama Subramaniyan Assignee: Hari Sankar Sivarama Subramaniyan Attachments: SQLQuery10.sql.mssql, createtable.rtf More and more complex workloads are migrated to Hive from SQL Server, Teradata, etc., and occasionally Hive gets bottlenecked on generating plans for large queries; in the majority of cases the time is spent fetching metadata and partitions and in other optimizer transformation rules. I have attached the query for the test case, which needs to be run after we set up the database as shown below.
{code}
create database dataset_3;
use dataset_3;
{code}
createtable.rtf - create table command
SQLQuery10.sql.mssql - explain query
The most problematic part of the code, as the stack gets arbitrarily long, seems to be in RuleRegExp.java:
{code}
@Override
public int cost(Stack<Node> stack) throws SemanticException {
  int numElems = (stack != null ? stack.size() : 0);
  String name = "";
  for (int pos = numElems - 1; pos >= 0; pos--) {
    name = stack.get(pos).getName() + "%" + name;
    Matcher m = pattern.matcher(name);
    if (m.matches()) {
      return m.group().length();
    }
  }
  return -1;
}
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
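The quoted {{cost()}} rebuilds the concatenated name and re-runs the regex once per stack element, which is quadratic in stack depth. For a wildcard-free pattern such as {{TS%FIL%}}, the same result can be obtained by direct name comparison against the top of the stack. Below is a minimal, self-contained sketch of that idea; the class and method names are illustrative, not Hive's actual patch, and the stack is simplified to a list of operator names ordered bottom to top:

```java
import java.util.Arrays;
import java.util.List;

public class WildcardFreePatternRule {
    private final String[] parts;   // node names from the pattern, in order
    private final int matchLength;  // length of the matched string, incl. '%' separators

    public WildcardFreePatternRule(String pattern) {
        // A pattern like "TS%FIL%" has a trailing '%' after every name,
        // so split("%") yields exactly the node names: {"TS", "FIL"}.
        this.parts = pattern.split("%");
        int len = 0;
        for (String p : parts) {
            len += p.length() + 1; // +1 for each '%' separator
        }
        this.matchLength = len;
    }

    /**
     * Returns the matched length if the top parts.length entries of the
     * stack spell out the pattern, -1 otherwise. Runs in O(pattern size)
     * instead of rebuilding a string and re-matching per stack element.
     */
    public int cost(List<String> stackNames) {
        int n = stackNames.size();
        if (n < parts.length) {
            return -1;
        }
        for (int i = 0; i < parts.length; i++) {
            // parts[0] matches the deepest node of the window,
            // parts[parts.length - 1] matches the top of the stack.
            if (!stackNames.get(n - parts.length + i).equals(parts[i])) {
                return -1;
            }
        }
        return matchLength;
    }

    public static void main(String[] args) {
        WildcardFreePatternRule rule = new WildcardFreePatternRule("TS%FIL%");
        System.out.println(rule.cost(Arrays.asList("TS", "FIL")));        // matches: 7
        System.out.println(rule.cost(Arrays.asList("SEL", "TS", "FIL"))); // matches: 7
        System.out.println(rule.cost(Arrays.asList("FIL", "TS")));        // no match: -1
    }
}
```

This preserves the original semantics for wildcard-free patterns: since the built name always ends in '%' and {{matches()}} requires a full match, only the window of exactly {{parts.length}} top entries can ever match.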
[jira] [Updated] (HIVE-11141) Improve RuleRegExp when the Expression node stack gets huge
[ https://issues.apache.org/jira/browse/HIVE-11141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hari Sankar Sivarama Subramaniyan updated HIVE-11141: - Attachment: createtable.rtf SQLQuery10.sql.mssql Improve RuleRegExp when the Expression node stack gets huge --- Key: HIVE-11141 URL: https://issues.apache.org/jira/browse/HIVE-11141 Project: Hive Issue Type: Bug Reporter: Hari Sankar Sivarama Subramaniyan Assignee: Hari Sankar Sivarama Subramaniyan Attachments: SQLQuery10.sql.mssql, createtable.rtf More and more complex workloads are migrated to Hive from SQL Server, Teradata, etc., and occasionally Hive gets bottlenecked on generating plans for large queries; in the majority of cases the time is spent fetching metadata and partitions and in other optimizer transformation rules. I have attached the query for the test case, which needs to be run after we set up the database as shown below.
{code}
create database dataset_3;
use dataset_3;
{code}
The most problematic part of the code, as the stack gets arbitrarily long, seems to be in RuleRegExp.java:
{code}
@Override
public int cost(Stack<Node> stack) throws SemanticException {
  int numElems = (stack != null ? stack.size() : 0);
  String name = "";
  for (int pos = numElems - 1; pos >= 0; pos--) {
    name = stack.get(pos).getName() + "%" + name;
    Matcher m = pattern.matcher(name);
    if (m.matches()) {
      return m.group().length();
    }
  }
  return -1;
}
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11141) Improve RuleRegExp when the Expression node stack gets huge
[ https://issues.apache.org/jira/browse/HIVE-11141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hari Sankar Sivarama Subramaniyan updated HIVE-11141: - Attachment: HIVE-11141.1.patch cc-ing [~jpullokkaran] for review. Improve RuleRegExp when the Expression node stack gets huge --- Key: HIVE-11141 URL: https://issues.apache.org/jira/browse/HIVE-11141 Project: Hive Issue Type: Bug Reporter: Hari Sankar Sivarama Subramaniyan Assignee: Hari Sankar Sivarama Subramaniyan Attachments: HIVE-11141.1.patch, SQLQuery10.sql.mssql, createtable.rtf More and more complex workloads are migrated to Hive from SQL Server, Teradata, etc., and occasionally Hive gets bottlenecked on generating plans for large queries; in the majority of cases the time is spent fetching metadata and partitions and in other optimizer transformation rules. I have attached the query for the test case, which needs to be run after we set up the database as shown below.
{code}
create database dataset_3;
use dataset_3;
{code}
createtable.rtf - create table command
SQLQuery10.sql.mssql - explain query
The most problematic part of the code, as the stack gets arbitrarily long, seems to be in RuleRegExp.java:
{code}
@Override
public int cost(Stack<Node> stack) throws SemanticException {
  int numElems = (stack != null ? stack.size() : 0);
  String name = "";
  for (int pos = numElems - 1; pos >= 0; pos--) {
    name = stack.get(pos).getName() + "%" + name;
    Matcher m = pattern.matcher(name);
    if (m.matches()) {
      return m.group().length();
    }
  }
  return -1;
}
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11100) Beeline should escape semi-colon in queries
[ https://issues.apache.org/jira/browse/HIVE-11100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14606221#comment-14606221 ] Xuefu Zhang commented on HIVE-11100: Okay. +1 Beeline should escape semi-colon in queries --- Key: HIVE-11100 URL: https://issues.apache.org/jira/browse/HIVE-11100 Project: Hive Issue Type: Bug Components: Beeline Affects Versions: 1.2.0, 1.1.0 Reporter: Chaoyu Tang Assignee: Chaoyu Tang Priority: Minor Attachments: HIVE-11100.patch Beeline should escape the semicolon in queries. For example, queries like the following: CREATE TABLE beeline_tb (c1 int, c2 string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ';' LINES TERMINATED BY '\n'; or CREATE TABLE beeline_tb (c1 int, c2 string) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\;' LINES TERMINATED BY '\n'; both fail. But the second query, with the semicolon escaped with \, works in the CLI. -- This message was sent by Atlassian JIRA (v6.3.4#6332)