[jira] [Updated] (HIVE-11108) HashTableSinkOperator doesn't support vectorization [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-11108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rui Li updated HIVE-11108:
--------------------------
    Attachment: HIVE-11108.1-spark.patch

The patch enables vectorization for SparkHashTableSinkOperator. I did some local tests. The end-to-end performance gain is not very obvious, as HTS usually processes small tables. But for the specific stage, performance can be improved by about 2x in some cases, e.g. when the work is computing min/max.

HashTableSinkOperator doesn't support vectorization [Spark Branch]
------------------------------------------------------------------
                 Key: HIVE-11108
                 URL: https://issues.apache.org/jira/browse/HIVE-11108
             Project: Hive
          Issue Type: Sub-task
          Components: Spark
            Reporter: Rui Li
            Assignee: Rui Li
         Attachments: HIVE-11108.1-spark.patch

This prevents any BaseWork containing HTS from being vectorized. It's basically specific to Spark, because Tez doesn't use HTS and MR runs HTS in local tasks. We should verify whether it makes sense to make HTS support vectorization.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
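For intuition on the stage-level 2x gain mentioned above: vectorized operators process a whole batch of values per call rather than one row per call, which matters most for cheap per-row work such as min/max. A toy Python sketch of the difference (illustrative only; not Hive's actual VectorizedRowBatch code):

```python
def row_at_a_time_min_max(rows):
    # One operator invocation per row, as a non-vectorized operator would do.
    lo = hi = None
    for v in rows:
        lo = v if lo is None or v < lo else lo
        hi = v if hi is None or v > hi else hi
    return lo, hi

def vectorized_min_max(batches):
    # One invocation per batch (Hive uses batches of ~1024 rows),
    # amortizing per-row call overhead across the whole batch.
    lo = min(min(b) for b in batches)
    hi = max(max(b) for b in batches)
    return lo, hi

rows = [5, 3, 9, 1, 7, 2]
batches = [rows[i:i + 3] for i in range(0, len(rows), 3)]
assert row_at_a_time_min_max(rows) == vectorized_min_max(batches) == (1, 9)
```

Both paths compute the same result; the batched form simply pays the interpretation overhead once per batch instead of once per row.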
[jira] [Commented] (HIVE-11110) Enable HiveJoinAddNotNullRule in CBO
[ https://issues.apache.org/jira/browse/HIVE-11110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14605543#comment-14605543 ]

Jesus Camacho Rodriguez commented on HIVE-11110:
------------------------------------------------

I have analyzed the results; the failures fall into different categories that need to be analyzed further:
- Some of them are benign: additional filters are added or the join inputs are swapped.
- In some cases, 2-input joins are not merged into 3-input joins (join_3.q, annotate_stats_join.q, auto_join3.q, explain_logical.q), which might result in additional execution stages.
- In another case, a new 3-input join that was not identified before is created (correlation_optimizer6.q).
- There seem to be a few cases of lost bucketing when using insert overwrite (infer_bucket_sort.q, join_33.q).

Apart from these, the new rule triggers indefinitely for subquery_views.q. This is solved in [~jpullokkaran]'s patch by putting the HiveJoinAddNotNullRule rule in its own group of rules, but that issue should be studied further too.

Enable HiveJoinAddNotNullRule in CBO
------------------------------------
                 Key: HIVE-11110
                 URL: https://issues.apache.org/jira/browse/HIVE-11110
             Project: Hive
          Issue Type: Bug
          Components: CBO
            Reporter: Jesus Camacho Rodriguez
            Assignee: Jesus Camacho Rodriguez
         Attachments: HIVE-11110.1.patch, HIVE-11110.patch

Query:
{code}
select count(*)
from store_sales
    ,store_returns
    ,date_dim d1
    ,date_dim d2
where d1.d_quarter_name = '2000Q1'
  and d1.d_date_sk = ss_sold_date_sk
  and ss_customer_sk = sr_customer_sk
  and ss_item_sk = sr_item_sk
  and ss_ticket_number = sr_ticket_number
  and sr_returned_date_sk = d2.d_date_sk
  and d2.d_quarter_name in ('2000Q1','2000Q2','2000Q3');
{code}

The store_sales table is partitioned on ss_sold_date_sk, which is also used in a join clause. The join clause should add a filter "filterExpr: ss_sold_date_sk is not null", which should get pushed to the MetaStore when fetching the stats. Currently this is not done in CBO planning, which results in the stats from __HIVE_DEFAULT_PARTITION__ being fetched and considered in the optimization phase. In particular, this increases the NDV for the join columns and may result in wrong planning. Including HiveJoinAddNotNullRule in the optimization phase solves this issue.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
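For intuition, the effect of HiveJoinAddNotNullRule can be sketched in a few lines: rows whose equi-join keys are NULL (such as those from __HIVE_DEFAULT_PARTITION__) can never satisfy an inner-join equality condition, so an IS NOT NULL filter on the keys is safe to add, and it keeps those rows' stats out of planning. A hypothetical Python sketch, not Hive's actual Calcite rule:

```python
def add_not_null_filters(join_keys, rows):
    """Drop rows whose join keys are NULL before joining.

    Mirrors what an IS NOT NULL filter on the join keys does: NULL keys
    cannot match an inner-join equality condition, so filtering them
    early shrinks the input (and the stats the planner sees).
    """
    return [r for r in rows if all(r.get(k) is not None for k in join_keys)]

store_sales = [
    {"ss_sold_date_sk": 2451000, "ss_item_sk": 1},
    {"ss_sold_date_sk": None, "ss_item_sk": 2},  # default-partition-style row
]
filtered = add_not_null_filters(["ss_sold_date_sk"], store_sales)
# Only the row with a non-NULL join key survives.
```

The rule's planning benefit is exactly this pruning applied at the stats level: the NULL-keyed partition no longer inflates the NDV of the join columns.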
[jira] [Updated] (HIVE-11055) HPL/SQL - Implementing Procedural SQL in Hive (PL/HQL Contribution)
[ https://issues.apache.org/jira/browse/HIVE-11055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dmitry Tolpeko updated HIVE-11055:
----------------------------------
    Attachment: HIVE-11055.3.patch

Created patch 3 - made the tool compatible with Hadoop 1.

HPL/SQL - Implementing Procedural SQL in Hive (PL/HQL Contribution)
-------------------------------------------------------------------
                 Key: HIVE-11055
                 URL: https://issues.apache.org/jira/browse/HIVE-11055
             Project: Hive
          Issue Type: Improvement
            Reporter: Dmitry Tolpeko
            Assignee: Dmitry Tolpeko
         Attachments: HIVE-11055.1.patch, HIVE-11055.2.patch, HIVE-11055.3.patch

There is a PL/HQL tool (www.plhql.org) that implements procedural SQL for Hive (in fact, for any SQL-on-Hadoop implementation and any JDBC source). Alan Gates offered to contribute it to Hive under the HPL/SQL name (org.apache.hive.hplsql package). This JIRA is to create a patch to contribute the PL/HQL code.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11134) HS2 should log open session failure
[ https://issues.apache.org/jira/browse/HIVE-11134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14605537#comment-14605537 ]

Vaibhav Gumashta commented on HIVE-11134:
-----------------------------------------

+1

HS2 should log open session failure
-----------------------------------
                 Key: HIVE-11134
                 URL: https://issues.apache.org/jira/browse/HIVE-11134
             Project: Hive
          Issue Type: Bug
          Components: HiveServer2
            Reporter: Thejas M Nair
            Assignee: Thejas M Nair
         Attachments: HIVE-11134.1.patch

HiveServer2 should log OpenSession failures. If Beeline is not run with --verbose=true, the stack trace information is not available for later debugging, as it is not currently logged on the server side.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10895) ObjectStore does not close Query objects in some calls, causing a potential leak in some metastore db resources
[ https://issues.apache.org/jira/browse/HIVE-10895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14605541#comment-14605541 ]

Vaibhav Gumashta commented on HIVE-10895:
-----------------------------------------

[~aihuaxu] Were you able to reproduce the db leak at your end? In our setup, when we used Oracle as the metastore db, we saw Oracle running out of cursors. I'll try to run the patch through that system test as well.

ObjectStore does not close Query objects in some calls, causing a potential leak in some metastore db resources
---------------------------------------------------------------------------------------------------------------
                 Key: HIVE-10895
                 URL: https://issues.apache.org/jira/browse/HIVE-10895
             Project: Hive
          Issue Type: Bug
          Components: Metastore
    Affects Versions: 0.14.0, 1.0.0, 1.2.0, 1.1.0, 0.13
            Reporter: Takahiko Saito
            Assignee: Aihua Xu
         Attachments: HIVE-10895.1.patch, HIVE-10895.2.patch, HIVE-10895.3.patch

During testing, we've noticed the Oracle db running out of cursors. Might be related to this.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-7451) pass function name in create/drop function to authorization api
[ https://issues.apache.org/jira/browse/HIVE-7451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Olaf Flebbe updated HIVE-7451: -- Affects Version/s: (was: 1.2.0) pass function name in create/drop function to authorization api --- Key: HIVE-7451 URL: https://issues.apache.org/jira/browse/HIVE-7451 Project: Hive Issue Type: Bug Components: Authorization Reporter: Thejas M Nair Assignee: Thejas M Nair Fix For: 0.14.0 Attachments: HIVE-7451.1.patch, HIVE-7451.2.patch, HIVE-7451.3.patch, HIVE-7451.4.patch If function names are passed to the authorization api for create/drop function calls, then authorization decisions can be made based on the function names as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10895) ObjectStore does not close Query objects in some calls, causing a potential leak in some metastore db resources
[ https://issues.apache.org/jira/browse/HIVE-10895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14605538#comment-14605538 ] Vaibhav Gumashta commented on HIVE-10895: - [~aihuaxu] I'll be able to look at the patch today. Thanks for the effort. ObjectStore does not close Query objects in some calls, causing a potential leak in some metastore db resources --- Key: HIVE-10895 URL: https://issues.apache.org/jira/browse/HIVE-10895 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 0.14.0, 1.0.0, 1.2.0, 1.1.0, 0.13 Reporter: Takahiko Saito Assignee: Aihua Xu Attachments: HIVE-10895.1.patch, HIVE-10895.2.patch, HIVE-10895.3.patch During testing, we've noticed Oracle db running out of cursors. Might be related to this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7765) Null pointer error with UNION ALL on partitioned tables using Tez
[ https://issues.apache.org/jira/browse/HIVE-7765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14605700#comment-14605700 ]

Alex Bush commented on HIVE-7765:
---------------------------------

A workaround is to create an empty partition by creating the directory in HDFS and running MSCK REPAIR TABLE.

Null pointer error with UNION ALL on partitioned tables using Tez
-----------------------------------------------------------------
                 Key: HIVE-7765
                 URL: https://issues.apache.org/jira/browse/HIVE-7765
             Project: Hive
          Issue Type: Bug
    Affects Versions: 0.14.0, 0.13.1
         Environment: Tez 0.4.1, Ubuntu 12.04, Hadoop 2.4.1.
            Reporter: Chris Dragga
            Priority: Minor

When executing a UNION ALL query in Tez over partitioned tables where at least one table is empty, Hive fails to execute the query, returning the message "FAILED: NullPointerException null". No stack trace accompanies this message. Removing partitioning solves this problem, as does switching to MapReduce as the execution engine.

This can be reproduced using a variant of the example tables from the Getting Started documentation on the Hive wiki. To create the schema, use:

CREATE TABLE invites (foo INT, bar STRING) PARTITIONED BY (ds STRING);
CREATE TABLE empty_invites (foo INT, bar STRING) PARTITIONED BY (ds STRING);

Then, load invites with data (e.g., using the instructions [here|https://cwiki.apache.org/confluence/display/Hive/GettingStarted#GettingStarted-DMLOperations]) and execute the following:

SELECT * FROM invites UNION ALL SELECT * FROM empty_invites;

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7765) Null pointer error with UNION ALL on partitioned tables using Tez
[ https://issues.apache.org/jira/browse/HIVE-7765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14605728#comment-14605728 ]

Alex Bush commented on HIVE-7765:
---------------------------------

Here is how to recreate this bug and use the workaround:

{noformat}
#!/bin/bash
echo "col1,col2" > /tmp/unionall_txt
HIVECONF="--hiveconf hive.root.logger=INFO,console --hiveconf hive.cli.errors.ignore=true"
hive -v $HIVECONF -e "
drop database if exists unionall_test cascade;
create database unionall_test;
use unionall_test;
CREATE TABLE test_a (f1 STRING, f2 STRING) PARTITIONED BY (ds STRING);
CREATE TABLE test_b (f1 STRING, f2 STRING) PARTITIONED BY (ds STRING);
LOAD DATA LOCAL INPATH '/tmp/unionall_txt' OVERWRITE INTO TABLE test_a PARTITION ( ds='a' );
-- Fails: test_b has no partitions.
SELECT * FROM test_a UNION ALL SELECT * FROM test_b;
-- Workaround: add an empty partition to test_b, after which the query succeeds.
alter table test_b add partition ( ds='b' );
SELECT * FROM test_a UNION ALL SELECT * FROM test_b;"
{noformat}

Null pointer error with UNION ALL on partitioned tables using Tez
-----------------------------------------------------------------
                 Key: HIVE-7765
                 URL: https://issues.apache.org/jira/browse/HIVE-7765
             Project: Hive
          Issue Type: Bug
    Affects Versions: 0.14.0, 0.13.1
         Environment: Tez 0.4.1, Ubuntu 12.04, Hadoop 2.4.1.
            Reporter: Chris Dragga
            Priority: Minor

When executing a UNION ALL query in Tez over partitioned tables where at least one table is empty, Hive fails to execute the query, returning the message "FAILED: NullPointerException null". No stack trace accompanies this message. Removing partitioning solves this problem, as does switching to MapReduce as the execution engine.

This can be reproduced using a variant of the example tables from the Getting Started documentation on the Hive wiki. To create the schema, use:

CREATE TABLE invites (foo INT, bar STRING) PARTITIONED BY (ds STRING);
CREATE TABLE empty_invites (foo INT, bar STRING) PARTITIONED BY (ds STRING);

Then, load invites with data (e.g., using the instructions [here|https://cwiki.apache.org/confluence/display/Hive/GettingStarted#GettingStarted-DMLOperations]) and execute the following:

SELECT * FROM invites UNION ALL SELECT * FROM empty_invites;

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-7765) Null pointer error with UNION ALL on partitioned tables using Tez
[ https://issues.apache.org/jira/browse/HIVE-7765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alex Bush updated HIVE-7765: Environment: Tez 0.4.1, Ubuntu 12.04, Hadoop 2.4.1. Hadoop 2.2.6 was:Tez 0.4.1, Ubuntu 12.04, Hadoop 2.4.1. Null pointer error with UNION ALL on partitioned tables using Tez - Key: HIVE-7765 URL: https://issues.apache.org/jira/browse/HIVE-7765 Project: Hive Issue Type: Bug Affects Versions: 0.14.0, 0.13.1 Environment: Tez 0.4.1, Ubuntu 12.04, Hadoop 2.4.1. Hadoop 2.2.6 Reporter: Chris Dragga Priority: Minor When executing a UNION ALL query in Tez over partitioned tables where at least one table is empty, Hive fails to execute the query, returning the message FAILED: NullPointerException null. No stack trace accompanies this message. Removing partitioning solves this problem, as does switching to MapReduce as the execution engine. This can be reproduced using a variant of the example tables from the Getting Started documentation on the Hive wiki. To create the schema, use CREATE TABLE invites (foo INT, bar STRING) PARTITIONED BY (ds STRING); CREATE TABLE empty_invites (foo INT, bar STRING) PARTITIONED BY (ds STRING); Then, load invites with data (e.g., using the instructions [here|https://cwiki.apache.org/confluence/display/Hive/GettingStarted#GettingStarted-DMLOperations]) and execute the following: SELECT * FROM invites UNION ALL SELECT * FROM empty_invites; -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11055) HPL/SQL - Implementing Procedural SQL in Hive (PL/HQL Contribution)
[ https://issues.apache.org/jira/browse/HIVE-11055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14605713#comment-14605713 ]

Hive QA commented on HIVE-11055:
--------------------------------

{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12742497/HIVE-11055.3.patch

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 9033 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_schemeAuthority
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_stats_counter
{noformat}

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4430/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4430/console
Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4430/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12742497 - PreCommit-HIVE-TRUNK-Build

HPL/SQL - Implementing Procedural SQL in Hive (PL/HQL Contribution)
-------------------------------------------------------------------
                 Key: HIVE-11055
                 URL: https://issues.apache.org/jira/browse/HIVE-11055
             Project: Hive
          Issue Type: Improvement
            Reporter: Dmitry Tolpeko
            Assignee: Dmitry Tolpeko
         Attachments: HIVE-11055.1.patch, HIVE-11055.2.patch, HIVE-11055.3.patch

There is PL/HQL tool (www.plhql.org) that implements procedural SQL for Hive (actually any SQL-on-Hadoop implementation and any JDBC source). Alan Gates offered to contribute it to Hive under HPL/SQL name (org.apache.hive.hplsql package). This JIRA is to create a patch to contribute the PL/HQL code.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11108) HashTableSinkOperator doesn't support vectorization [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-11108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14605736#comment-14605736 ]

Hive QA commented on HIVE-11108:
--------------------------------

{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12742505/HIVE-11108.1-spark.patch

{color:red}ERROR:{color} -1 due to 8 failed/errored test(s), 7992 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.initializationError
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_index_bitmap_auto
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_vector_decimal_mapjoin
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_vector_left_outer_join
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_vector_mapjoin_reduce
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_vectorized_mapjoin
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_vectorized_nested_mapjoin
org.apache.hive.jdbc.TestSSL.testSSLConnectionWithProperty
{noformat}

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/915/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/915/console
Test logs: http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-915/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 8 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12742505 - PreCommit-HIVE-SPARK-Build

HashTableSinkOperator doesn't support vectorization [Spark Branch]
------------------------------------------------------------------
                 Key: HIVE-11108
                 URL: https://issues.apache.org/jira/browse/HIVE-11108
             Project: Hive
          Issue Type: Sub-task
          Components: Spark
            Reporter: Rui Li
            Assignee: Rui Li
         Attachments: HIVE-11108.1-spark.patch

This prevents any BaseWork containing HTS from being vectorized. It's basically specific to Spark, because Tez doesn't use HTS and MR runs HTS in local tasks. We should verify whether it makes sense to make HTS support vectorization.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10754) new Job() is deprecated. Replaced all with Job.getInstance() for Hcatalog
[ https://issues.apache.org/jira/browse/HIVE-10754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14605750#comment-14605750 ]

Aihua Xu commented on HIVE-10754:
---------------------------------

[~ctang.ma] I have switched the task to replacing the deprecated calls with the new calls in HCatalog. It should not have any functional impact.

new Job() is deprecated. Replaced all with Job.getInstance() for Hcatalog
-------------------------------------------------------------------------
                 Key: HIVE-10754
                 URL: https://issues.apache.org/jira/browse/HIVE-10754
             Project: Hive
          Issue Type: Sub-task
          Components: HCatalog
    Affects Versions: 1.2.0
            Reporter: Aihua Xu
            Assignee: Aihua Xu
         Attachments: HIVE-10754.patch

Replace all the deprecated new Job() calls with Job.getInstance() in HCatalog.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10895) ObjectStore does not close Query objects in some calls, causing a potential leak in some metastore db resources
[ https://issues.apache.org/jira/browse/HIVE-10895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14605667#comment-14605667 ]

Aihua Xu commented on HIVE-10895:
---------------------------------

I'd really appreciate it if you could review the code and give it a test so that we can move it forward, [~vgumashta]. Customers are actually seeing the out-of-cursors errors in production. I'm trying to reproduce it locally (not able to yet). It would be great if you could try it out on the test system.

ObjectStore does not close Query objects in some calls, causing a potential leak in some metastore db resources
---------------------------------------------------------------------------------------------------------------
                 Key: HIVE-10895
                 URL: https://issues.apache.org/jira/browse/HIVE-10895
             Project: Hive
          Issue Type: Bug
          Components: Metastore
    Affects Versions: 0.14.0, 1.0.0, 1.2.0, 1.1.0, 0.13
            Reporter: Takahiko Saito
            Assignee: Aihua Xu
         Attachments: HIVE-10895.1.patch, HIVE-10895.2.patch, HIVE-10895.3.patch

During testing, we've noticed the Oracle db running out of cursors. Might be related to this.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
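The leak pattern under discussion - a metastore Query object (and its underlying db cursor) left open when a call returns or throws - is typically fixed by releasing the resource in a finally block. A hypothetical Python sketch of that pattern (names invented; the real fix lives in ObjectStore's Java code):

```python
class Query:
    """Hypothetical stand-in for a JDO query holding a db cursor."""
    def __init__(self):
        self.closed = False
    def execute(self):
        return ["row"]
    def close_all(self):
        # Releases the underlying cursor; skipping this is the leak.
        self.closed = True

def list_partitions(make_query):
    # The fix pattern: always release the cursor, even if execute() throws.
    query = make_query()
    try:
        return list(query.execute())
    finally:
        query.close_all()

created = []
def make_query():
    q = Query()
    created.append(q)
    return q

rows = list_partitions(make_query)
# The query is closed whether or not execute() succeeded.
```

With a pool of cursors capped on the db side (as with Oracle's open_cursors limit), any path that returns without the finally-style close eventually exhausts the pool.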
[jira] [Commented] (HIVE-11112) ISO-8859-1 text output has fragments of previous longer rows appended
[ https://issues.apache.org/jira/browse/HIVE-11112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14605669#comment-14605669 ]

Yongzhi Chen commented on HIVE-11112:
-------------------------------------

[~wisgood], you can merge my test case. I will just resolve my jira; you can merge my fixes and commit from your jira. Thanks.

ISO-8859-1 text output has fragments of previous longer rows appended
---------------------------------------------------------------------
                 Key: HIVE-11112
                 URL: https://issues.apache.org/jira/browse/HIVE-11112
             Project: Hive
          Issue Type: Bug
          Components: Serializers/Deserializers
    Affects Versions: 1.2.0
            Reporter: Yongzhi Chen
            Assignee: Yongzhi Chen
         Attachments: HIVE-11112.1.patch

If a LazySimpleSerDe table is created using ISO 8859-1 encoding, query results for a string column are incorrect for any row that was preceded by a row containing a longer string. Example steps to reproduce:

1. Create a table using ISO 8859-1 encoding:
{code:sql}
CREATE TABLE person_lat1 (name STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES ('serialization.encoding'='ISO8859_1');
{code}

2. Copy an ISO-8859-1 encoded text file into the appropriate warehouse folder in HDFS. I'll attach an example file containing the following text:
{noformat}
Müller,Thomas
Jørgensen,Jørgen
Peña,Andrés
Nåm,Fæk
{noformat}

3. Execute {{SELECT * FROM person_lat1}}

Result - the following output appears:
{noformat}
+-------------------+--+
| person_lat1.name  |
+-------------------+--+
| Müller,Thomas     |
| Jørgensen,Jørgen  |
| Peña,Andrésørgen  |
| Nåm,Fækdrésørgen  |
+-------------------+--+
{noformat}

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
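The stale fragments in the output above ("Peña,Andrésørgen") are the classic symptom of decoding a reused buffer past its valid length: Hadoop's Text keeps a backing array that is not shrunk when a shorter value is written into it. A minimal Python analogue of the bug and the fix (ASCII stand-in data; not Hive's actual SerDe code):

```python
# A reused buffer, as Hadoop's Text does: writing a shorter value leaves
# bytes from the previous, longer value beyond the valid length.
buf = bytearray(b"Jorgensen,Jorgen")
new_value = b"Pena,Andres"
buf[:len(new_value)] = new_value
valid_len = len(new_value)  # what Text.getLength() would report

# Bug: decoding the whole backing array picks up stale trailing bytes.
wrong = bytes(buf).decode("latin-1")
# Fix: decode only the valid prefix (what Text.copyBytes() would return).
right = bytes(buf[:valid_len]).decode("latin-1")

assert wrong == "Pena,Andresorgen"  # stale fragment appended, as in the bug
assert right == "Pena,Andres"
```

The same off-by-tail corruption appears in any code that pairs Text.getBytes() with the array's full length instead of Text.getLength().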
[jira] [Updated] (HIVE-7765) Null pointer error with UNION ALL on partitioned tables using Tez
[ https://issues.apache.org/jira/browse/HIVE-7765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alex Bush updated HIVE-7765: Priority: Major (was: Minor) Null pointer error with UNION ALL on partitioned tables using Tez - Key: HIVE-7765 URL: https://issues.apache.org/jira/browse/HIVE-7765 Project: Hive Issue Type: Bug Affects Versions: 0.14.0, 0.13.1 Environment: Tez 0.4.1, Ubuntu 12.04, Hadoop 2.4.1; Hadoop 2.2.6, Tez 0.5.2, Hive 0.14.0, CentOS 6.6 Reporter: Chris Dragga When executing a UNION ALL query in Tez over partitioned tables where at least one table is empty, Hive fails to execute the query, returning the message FAILED: NullPointerException null. No stack trace accompanies this message. Removing partitioning solves this problem, as does switching to MapReduce as the execution engine. This can be reproduced using a variant of the example tables from the Getting Started documentation on the Hive wiki. To create the schema, use CREATE TABLE invites (foo INT, bar STRING) PARTITIONED BY (ds STRING); CREATE TABLE empty_invites (foo INT, bar STRING) PARTITIONED BY (ds STRING); Then, load invites with data (e.g., using the instructions [here|https://cwiki.apache.org/confluence/display/Hive/GettingStarted#GettingStarted-DMLOperations]) and execute the following: SELECT * FROM invites UNION ALL SELECT * FROM empty_invites; -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-7765) Null pointer error with UNION ALL on partitioned tables using Tez
[ https://issues.apache.org/jira/browse/HIVE-7765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alex Bush updated HIVE-7765: Environment: Tez 0.4.1, Ubuntu 12.04, Hadoop 2.4.1; Hadoop 2.2.6, Tez 0.5.2, Hive 0.14.0, CentOS 6.6 was: Tez 0.4.1, Ubuntu 12.04, Hadoop 2.4.1. Hadoop 2.2.6 Null pointer error with UNION ALL on partitioned tables using Tez - Key: HIVE-7765 URL: https://issues.apache.org/jira/browse/HIVE-7765 Project: Hive Issue Type: Bug Affects Versions: 0.14.0, 0.13.1 Environment: Tez 0.4.1, Ubuntu 12.04, Hadoop 2.4.1; Hadoop 2.2.6, Tez 0.5.2, Hive 0.14.0, CentOS 6.6 Reporter: Chris Dragga Priority: Minor When executing a UNION ALL query in Tez over partitioned tables where at least one table is empty, Hive fails to execute the query, returning the message FAILED: NullPointerException null. No stack trace accompanies this message. Removing partitioning solves this problem, as does switching to MapReduce as the execution engine. This can be reproduced using a variant of the example tables from the Getting Started documentation on the Hive wiki. To create the schema, use CREATE TABLE invites (foo INT, bar STRING) PARTITIONED BY (ds STRING); CREATE TABLE empty_invites (foo INT, bar STRING) PARTITIONED BY (ds STRING); Then, load invites with data (e.g., using the instructions [here|https://cwiki.apache.org/confluence/display/Hive/GettingStarted#GettingStarted-DMLOperations]) and execute the following: SELECT * FROM invites UNION ALL SELECT * FROM empty_invites; -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11130) Refactoring the code so that HiveTxnManager interface will support lock/unlock table/database object
[ https://issues.apache.org/jira/browse/HIVE-11130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14605782#comment-14605782 ]

Aihua Xu commented on HIVE-11130:
---------------------------------

It seems the two test failures are not related to this refactoring.

Refactoring the code so that HiveTxnManager interface will support lock/unlock table/database object
----------------------------------------------------------------------------------------------------
                 Key: HIVE-11130
                 URL: https://issues.apache.org/jira/browse/HIVE-11130
             Project: Hive
          Issue Type: Sub-task
          Components: Locking
    Affects Versions: 2.0.0
            Reporter: Aihua Xu
            Assignee: Aihua Xu
         Attachments: HIVE-11130.patch

This is just a refactoring step which keeps the current logic, but it exposes explicit lock/unlock operations for tables and databases in HiveTxnManager. These should be implemented differently by the subclasses (currently they are not; e.g., for the ZooKeeper implementation, we should lock the table and database when we try to lock the table).

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
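The refactoring shape described above - an interface exposing explicit table/database lock operations that subclasses can override - can be sketched as follows (a hypothetical Python stand-in for the Java interface, not the actual HiveTxnManager code):

```python
from abc import ABC, abstractmethod

class TxnManagerSketch(ABC):
    """Hypothetical stand-in for HiveTxnManager after the refactoring:
    explicit table/database lock operations that subclasses (e.g. a
    ZooKeeper-based manager) can implement differently."""

    @abstractmethod
    def lock_table(self, db, table): ...
    @abstractmethod
    def unlock_table(self, db, table): ...
    @abstractmethod
    def lock_database(self, db): ...
    @abstractmethod
    def unlock_database(self, db): ...

class InMemoryTxnManager(TxnManagerSketch):
    """Trivial subclass keeping the current (simple) logic."""
    def __init__(self):
        self.locks = []
    def lock_table(self, db, table):
        self.locks.append((db, table))
    def unlock_table(self, db, table):
        self.locks.remove((db, table))
    def lock_database(self, db):
        self.locks.append((db, None))
    def unlock_database(self, db):
        self.locks.remove((db, None))
```

The point of the refactoring is exactly this split: callers program against the abstract lock/unlock methods, while each subclass decides what acquiring a table or database lock actually entails.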
[jira] [Commented] (HIVE-11112) ISO-8859-1 text output has fragments of previous longer rows appended
[ https://issues.apache.org/jira/browse/HIVE-11112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14605788#comment-14605788 ]

Xuefu Zhang commented on HIVE-11112:
------------------------------------

To accelerate the process, I committed the patch here to branch-1 and master. [~wisgood], could you consolidate HIVE-11095 and HIVE-10983? Add a test case if needed. I should be able to review it quickly. Thanks.

Thanks to Yongzhi and Xiaowei for working on this.

ISO-8859-1 text output has fragments of previous longer rows appended
---------------------------------------------------------------------
                 Key: HIVE-11112
                 URL: https://issues.apache.org/jira/browse/HIVE-11112
             Project: Hive
          Issue Type: Bug
          Components: Serializers/Deserializers
    Affects Versions: 1.2.0
            Reporter: Yongzhi Chen
            Assignee: Yongzhi Chen
             Fix For: 1.3.0, 2.0.0
         Attachments: HIVE-11112.1.patch

If a LazySimpleSerDe table is created using ISO 8859-1 encoding, query results for a string column are incorrect for any row that was preceded by a row containing a longer string. Example steps to reproduce:

1. Create a table using ISO 8859-1 encoding:
{code:sql}
CREATE TABLE person_lat1 (name STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES ('serialization.encoding'='ISO8859_1');
{code}

2. Copy an ISO-8859-1 encoded text file into the appropriate warehouse folder in HDFS. I'll attach an example file containing the following text:
{noformat}
Müller,Thomas
Jørgensen,Jørgen
Peña,Andrés
Nåm,Fæk
{noformat}

3. Execute {{SELECT * FROM person_lat1}}

Result - the following output appears:
{noformat}
+-------------------+--+
| person_lat1.name  |
+-------------------+--+
| Müller,Thomas     |
| Jørgensen,Jørgen  |
| Peña,Andrésørgen  |
| Nåm,Fækdrésørgen  |
+-------------------+--+
{noformat}

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-7765) Null pointer error with UNION ALL on partitioned tables using Tez
[ https://issues.apache.org/jira/browse/HIVE-7765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alex Bush updated HIVE-7765: Affects Version/s: 0.14.0 Null pointer error with UNION ALL on partitioned tables using Tez - Key: HIVE-7765 URL: https://issues.apache.org/jira/browse/HIVE-7765 Project: Hive Issue Type: Bug Affects Versions: 0.14.0, 0.13.1 Environment: Tez 0.4.1, Ubuntu 12.04, Hadoop 2.4.1. Reporter: Chris Dragga Priority: Minor When executing a UNION ALL query in Tez over partitioned tables where at least one table is empty, Hive fails to execute the query, returning the message FAILED: NullPointerException null. No stack trace accompanies this message. Removing partitioning solves this problem, as does switching to MapReduce as the execution engine. This can be reproduced using a variant of the example tables from the Getting Started documentation on the Hive wiki. To create the schema, use CREATE TABLE invites (foo INT, bar STRING) PARTITIONED BY (ds STRING); CREATE TABLE empty_invites (foo INT, bar STRING) PARTITIONED BY (ds STRING); Then, load invites with data (e.g., using the instructions [here|https://cwiki.apache.org/confluence/display/Hive/GettingStarted#GettingStarted-DMLOperations]) and execute the following: SELECT * FROM invites UNION ALL SELECT * FROM empty_invites; -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7765) Null pointer error with UNION ALL on partitioned tables using Tez
[ https://issues.apache.org/jira/browse/HIVE-7765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14605693#comment-14605693 ]

Alex Bush commented on HIVE-7765:
---------------------------------

Stack trace from the error:

{noformat}
SELECT * FROM test5_a UNION ALL SELECT * FROM test5_b
15/06/29 15:11:35 [main]: INFO log.PerfLogger: <PERFLOG method=Driver.run from=org.apache.hadoop.hive.ql.Driver>
15/06/29 15:11:35 [main]: INFO log.PerfLogger: <PERFLOG method=TimeToSubmit from=org.apache.hadoop.hive.ql.Driver>
15/06/29 15:11:35 [main]: INFO ql.Driver: Concurrency mode is disabled, not creating a lock manager
15/06/29 15:11:35 [main]: INFO log.PerfLogger: <PERFLOG method=compile from=org.apache.hadoop.hive.ql.Driver>
15/06/29 15:11:35 [main]: INFO log.PerfLogger: <PERFLOG method=parse from=org.apache.hadoop.hive.ql.Driver>
15/06/29 15:11:35 [main]: INFO parse.ParseDriver: Parsing command: SELECT * FROM test5_a UNION ALL SELECT * FROM test5_b
15/06/29 15:11:35 [main]: INFO parse.ParseDriver: Parse Completed
15/06/29 15:11:35 [main]: INFO log.PerfLogger: </PERFLOG method=parse start=1435587095311 end=1435587095313 duration=2 from=org.apache.hadoop.hive.ql.Driver>
15/06/29 15:11:35 [main]: INFO log.PerfLogger: <PERFLOG method=semanticAnalyze from=org.apache.hadoop.hive.ql.Driver>
15/06/29 15:11:35 [main]: INFO parse.SemanticAnalyzer: Starting Semantic Analysis
15/06/29 15:11:35 [main]: INFO parse.SemanticAnalyzer: Completed phase 1 of Semantic Analysis
15/06/29 15:11:35 [main]: INFO parse.SemanticAnalyzer: Get metadata for source tables
15/06/29 15:11:35 [main]: INFO parse.SemanticAnalyzer: Get metadata for subqueries
15/06/29 15:11:35 [main]: INFO parse.SemanticAnalyzer: Get metadata for source tables
15/06/29 15:11:35 [main]: INFO parse.SemanticAnalyzer: Get metadata for subqueries
15/06/29 15:11:35 [main]: INFO parse.SemanticAnalyzer: Get metadata for destination tables
15/06/29 15:11:35 [main]: INFO parse.SemanticAnalyzer: Get metadata for source tables
15/06/29 15:11:35 [main]: INFO parse.SemanticAnalyzer: Get metadata for subqueries
15/06/29 15:11:35 [main]: INFO parse.SemanticAnalyzer: Get metadata for destination tables
15/06/29 15:11:35 [main]: INFO parse.SemanticAnalyzer: Get metadata for destination tables
15/06/29 15:11:35 [main]: INFO ql.Context: New scratch dir is hdfs://upgtst226/tmp/hive/hdp_batch/57614a3b-aa9a-4bf8-82ca-4451f72b9d28/hive_2015-06-29_15-11-35_310_293431230946478382-1
15/06/29 15:11:35 [main]: INFO parse.SemanticAnalyzer: Completed getting MetaData in Semantic Analysis
15/06/29 15:11:35 [main]: INFO parse.SemanticAnalyzer: Not invoking CBO because the statement has too few joins
15/06/29 15:11:35 [main]: INFO parse.SemanticAnalyzer: Set stats collection dir : hdfs://upgtst226/tmp/hive/hdp_batch/57614a3b-aa9a-4bf8-82ca-4451f72b9d28/hive_2015-06-29_15-11-35_310_293431230946478382-1/-ext-10002
15/06/29 15:11:35 [main]: INFO ppd.OpProcFactory: Processing for FS(6)
15/06/29 15:11:35 [main]: INFO ppd.OpProcFactory: Processing for SEL(5)
15/06/29 15:11:35 [main]: INFO ppd.OpProcFactory: Processing for UNION(4)
15/06/29 15:11:35 [main]: INFO ppd.OpProcFactory: Processing for SEL(1)
15/06/29 15:11:35 [main]: INFO ppd.OpProcFactory: Processing for TS(0)
15/06/29 15:11:35 [main]: INFO ppd.OpProcFactory: Processing for SEL(3)
15/06/29 15:11:35 [main]: INFO ppd.OpProcFactory: Processing for TS(2)
15/06/29 15:11:35 [main]: INFO log.PerfLogger: <PERFLOG method=partition-retrieving from=org.apache.hadoop.hive.ql.optimizer.ppr.PartitionPruner>
15/06/29 15:11:35 [main]: INFO log.PerfLogger: </PERFLOG method=partition-retrieving start=1435587095494 end=1435587095606 duration=112 from=org.apache.hadoop.hive.ql.optimizer.ppr.PartitionPruner>
15/06/29 15:11:35 [main]: INFO log.PerfLogger: <PERFLOG method=partition-retrieving from=org.apache.hadoop.hive.ql.optimizer.ppr.PartitionPruner>
15/06/29 15:11:35 [main]: INFO log.PerfLogger: </PERFLOG method=partition-retrieving start=1435587095606 end=1435587095735 duration=129 from=org.apache.hadoop.hive.ql.optimizer.ppr.PartitionPruner>
15/06/29 15:11:35 [main]: INFO parse.TezCompiler: Cycle free: true
15/06/29 15:11:35 [main]: INFO log.PerfLogger: <PERFLOG method=serializePlan from=org.apache.hadoop.hive.ql.exec.Utilities>
15/06/29 15:11:35 [main]: INFO exec.Utilities: Serializing ArrayList via kryo
15/06/29 15:11:35 [main]: INFO log.PerfLogger: </PERFLOG method=serializePlan start=1435587095740 end=1435587095743 duration=3 from=org.apache.hadoop.hive.ql.exec.Utilities>
15/06/29 15:11:35 [main]: INFO log.PerfLogger: <PERFLOG method=deserializePlan from=org.apache.hadoop.hive.ql.exec.Utilities>
15/06/29 15:11:35 [main]: INFO exec.Utilities: Deserializing ArrayList via kryo
15/06/29 15:11:35 [main]: INFO log.PerfLogger: </PERFLOG method=deserializePlan start=1435587095743 end=1435587095746 duration=3 from=org.apache.hadoop.hive.ql.exec.Utilities>
15/06/29 15:11:35 [main]: INFO log.PerfLogger: PERFLOG
{noformat}
[jira] [Commented] (HIVE-10983) SerDeUtils bug ,when Text is reused
[ https://issues.apache.org/jira/browse/HIVE-10983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14605959#comment-14605959 ] Sushanth Sowmyan commented on HIVE-10983: - Not a problem! As part of the release process, I'm required to go unset all jiras marked for older released releases, and that's what I was doing. :) To expand further, the idea is that Fix Version is set to track which branches the commits got committed to, and thus, should not be set unless this patch has already been committed to those branches. So, now, for example, if this commit is committed to branch-1.2 to track 1.2.x, its fix version would be 1.2.2 once it is committed. Setting it to 1.2.0 would mean that this was included as part of the 1.2.0 release, which it wasn't. So, for this, when a committer commits a patch for this bug, if they commit it to branch-1.2, they should then set the fix version to 1.2.2. SerDeUtils bug ,when Text is reused - Key: HIVE-10983 URL: https://issues.apache.org/jira/browse/HIVE-10983 Project: Hive Issue Type: Bug Components: API, CLI Affects Versions: 0.14.0, 1.0.0, 1.2.0 Environment: Hadoop 2.3.0-cdh5.0.0 Hive 0.14 Reporter: xiaowei wang Assignee: xiaowei wang Labels: patch Fix For: 2.0.0 Attachments: HIVE-10983.1.patch.txt, HIVE-10983.2.patch.txt, HIVE-10983.3.patch.txt, HIVE-10983.4.patch.txt, HIVE-10983.5.patch.txt
{noformat}
The methods transformTextToUTF8 and transformTextFromUTF8 have a bug: they invoke a dangerous method of Text, getBytes(). The getBytes method of Text returns the raw bytes; however, only data up to Text.length is valid. A better way is to use copyBytes() if you need the returned array to be precisely the length of the data. But copyBytes() was only added after Hadoop 1.
{noformat}
When I query data from an LZO table, I found in the results that the length of the current row is always larger than that of the previous row, and sometimes the current row contains the contents of the previous row. For example, I execute a SQL:
{code:sql}
select * from web_searchhub where logdate=2015061003
{code}
The result of the SQL is shown below. Notice that the second row's content contains the first row's content.
{noformat}
INFO [03:00:05.589] HttpFrontServer::FrontSH msgRecv:Remote=/10.13.193.68:42098,session=3151,thread=254 2015061003
INFO [03:00:05.594] 18941e66-9962-44ad-81bc-3519f47ba274 session=901,thread=223ession=3151,thread=254 2015061003
{noformat}
The content of the original LZO file is shown below, just 2 rows.
{noformat}
INFO [03:00:05.635] b88e0473-7530-494c-82d8-e2d2ebd2666c_forweb session=3148,thread=285
INFO [03:00:05.635] HttpFrontServer::FrontSH msgRecv:Remote=/10.13.193.68:42095,session=3148,thread=285
{noformat}
I think this error is caused by the Text reuse, and I found the solution. Additionally, the table create SQL is:
{code:sql}
CREATE EXTERNAL TABLE `web_searchhub`(`line` string)
PARTITIONED BY (`logdate` string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\\U'
WITH SERDEPROPERTIES ('serialization.encoding'='GBK')
STORED AS
  INPUTFORMAT com.hadoop.mapred.DeprecatedLzoTextInputFormat
  OUTPUTFORMAT org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
LOCATION 'viewfs://nsX/user/hive/warehouse/raw.db/web/web_searchhub';
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
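The buffer-reuse pitfall described above can be demonstrated without Hadoop. The class below is a minimal, hypothetical stand-in for Hadoop's Text (for illustration only): its backing array grows but never shrinks, so after a long row, a shorter row leaves stale bytes past the valid length. That is why getBytes() must be paired with getLength(), or replaced with a copyBytes()-style copy.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Minimal stand-in for Hadoop's Text (illustrative only): the backing byte
// array is reused across rows and only ever grows, so a short row leaves
// stale bytes from the previous, longer row beyond getLength().
public class ReusedText {
    private byte[] bytes = new byte[0];
    private int length;

    public void set(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        if (bytes.length < utf8.length) {
            bytes = new byte[utf8.length]; // grow, never shrink
        }
        System.arraycopy(utf8, 0, bytes, 0, utf8.length);
        length = utf8.length;
    }

    // Raw buffer: may contain a stale tail beyond getLength().
    public byte[] getBytes() { return bytes; }

    public int getLength() { return length; }

    // Safe: returns exactly getLength() bytes, in the spirit of Text#copyBytes().
    public byte[] copyBytes() { return Arrays.copyOf(bytes, length); }
}
```

Decoding the raw buffer after a short row reproduces the "previous row appended" symptom from the bug report, while copyBytes() (or any decode that honors getLength()) does not.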
[jira] [Commented] (HIVE-11137) In DateWritable remove the use of LazyBinaryUtils
[ https://issues.apache.org/jira/browse/HIVE-11137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14605940#comment-14605940 ] Nishant Kelkar commented on HIVE-11137: --- LazyBinaryUtils is used only for readVInt() and writeVInt(). Relevant sections of code from LazyBinaryUtils:
{code}
private static ThreadLocal<byte[]> vLongBytesThreadLocal = new ThreadLocal<byte[]>() {
  @Override
  public byte[] initialValue() {
    return new byte[9];
  }
};

public static void writeVLong(RandomAccessOutput byteStream, long l) {
  byte[] vLongBytes = vLongBytesThreadLocal.get();
  int len = LazyBinaryUtils.writeVLongToByteArray(vLongBytes, l);
  byteStream.write(vLongBytes, 0, len);
}
{code}
{code}
/**
 * Reads a zero-compressed encoded int from a byte array and returns it.
 *
 * @param bytes
 *          the byte array
 * @param offset
 *          offset of the array to read from
 * @param vInt
 *          storing the deserialized int and its size in byte
 */
public static void readVInt(byte[] bytes, int offset, VInt vInt) {
  byte firstByte = bytes[offset];
  vInt.length = (byte) WritableUtils.decodeVIntSize(firstByte);
  if (vInt.length == 1) {
    vInt.value = firstByte;
    return;
  }
  int i = 0;
  for (int idx = 0; idx < vInt.length - 1; idx++) {
    byte b = bytes[offset + 1 + idx];
    i = i << 8;
    i = i | (b & 0xFF);
  }
  vInt.value = (WritableUtils.isNegativeVInt(firstByte) ? (i ^ -1) : i);
}
{code}
I could contribute a patch towards this task [~owen.omalley] (I'm a beginner contributor in Hive, looking around for work :)). Thanks, and let me know! In DateWritable remove the use of LazyBinaryUtils - Key: HIVE-11137 URL: https://issues.apache.org/jira/browse/HIVE-11137 Project: Hive Issue Type: Sub-task Reporter: Owen O'Malley Assignee: Owen O'Malley Currently the DateWritable class uses LazyBinaryUtils, which has a lot of dependencies. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
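The decode loop quoted above can be exercised standalone. The sketch below uses a simplified, hypothetical encoding (one explicit length byte followed by the value's big-endian bytes, non-negative values only; this is not Hadoop's actual zero-compressed format) purely to illustrate the shift-and-mask accumulation that readVInt performs.

```java
public class VIntSketch {
    // Hypothetical, simplified encoding: out[0] = payload byte count (1..4),
    // followed by the value's big-endian bytes. Not Hadoop's real wire format.
    public static byte[] writeVInt(int value) {
        if (value < 0) throw new IllegalArgumentException("sketch handles non-negative values only");
        int n = 1;
        while (n < 4 && (value >>> (8 * n)) != 0) n++; // minimal byte count
        byte[] out = new byte[1 + n];
        out[0] = (byte) n;
        for (int idx = 0; idx < n; idx++) {
            out[1 + idx] = (byte) (value >>> (8 * (n - 1 - idx))); // most significant first
        }
        return out;
    }

    // The same accumulation pattern as the quoted readVInt: shift left by 8,
    // then OR in the next byte, masking with 0xFF to undo sign extension.
    public static int readVInt(byte[] bytes, int offset) {
        int len = bytes[offset];
        int i = 0;
        for (int idx = 0; idx < len; idx++) {
            byte b = bytes[offset + 1 + idx];
            i = i << 8;
            i = i | (b & 0xFF);
        }
        return i;
    }
}
```

The `b & 0xFF` mask is the step most often dropped by mistake: without it, a payload byte ≥ 0x80 sign-extends to a negative int and corrupts the accumulated value.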
[jira] [Commented] (HIVE-11141) Improve RuleRegExp when the Expression node stack gets huge
[ https://issues.apache.org/jira/browse/HIVE-11141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14606699#comment-14606699 ] Hive QA commented on HIVE-11141: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12742590/HIVE-11141.2.patch {color:red}ERROR:{color} -1 due to 116 failed/errored test(s), 9034 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join20 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join28 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join29 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join_nulls org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_correlationoptimizer14 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cross_product_check_2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_delete_all_non_partitioned org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_delete_all_partitioned org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_delete_orig_table org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_delete_whole_partition org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_dynpart_sort_optimization2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_dynpart_sort_optimization_acid org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_filter_join_breaktask org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby10 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby11 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby2_map_skew org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby8 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby8_map_skew org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_cube1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_rollup1 
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_infer_bucket_sort org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_insert_update_delete org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join20 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join40 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join_nullsafe org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_limit_partition_metadataonly org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_limit_pushdown org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_mapjoin1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_mapjoin_filter_on_outerjoin org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_metadataonly1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_multi_insert_gby3 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_nonblock_op_deduplicate org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_partInit org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_partition_date2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_partition_timestamp2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ptf org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_reduce_deduplicate_extended org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_reducesink_dedup org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample8 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_notin org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union27 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_unionDistinct_1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_update_all_non_partitioned org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_update_all_partitioned org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_update_two_cols org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_groupby_reduce org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_join_nulls 
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_mr_diff_schema_alias org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorization_limit org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorized_ptf org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_reduce_deduplicate org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_auto_join29 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_auto_sortmerge_join_12 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_bucket_map_join_tez1 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_cross_join org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_cross_product_check_1 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_cross_product_check_2 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_delete_all_non_partitioned org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_delete_all_partitioned org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_delete_orig_table
[jira] [Commented] (HIVE-11138) Query fails when there isn't a comparator for an operator [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-11138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14605375#comment-14605375 ] Hive QA commented on HIVE-11138: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12742480/HIVE-11138.1-spark.patch {color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 6207 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.initializationError org.apache.hadoop.hive.cli.TestHBaseCliDriver.initializationError org.apache.hadoop.hive.cli.TestMiniTezCliDriver.initializationError org.apache.hadoop.hive.cli.TestMinimrCliDriver.initializationError org.apache.hadoop.hive.cli.TestNegativeCliDriver.initializationError org.apache.hadoop.hive.cli.TestNegativeMinimrCliDriver.initializationError {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/914/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/914/console Test logs: http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-914/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 6 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12742480 - PreCommit-HIVE-SPARK-Build Query fails when there isn't a comparator for an operator [Spark Branch] Key: HIVE-11138 URL: https://issues.apache.org/jira/browse/HIVE-11138 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Rui Li Assignee: Rui Li Attachments: HIVE-11138.1-spark.patch, HIVE-11138.1-spark.patch In such case, OperatorComparatorFactory should default to false instead of throw exceptions. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7205) Wrong results when union all of grouping followed by group by with correlation optimization
[ https://issues.apache.org/jira/browse/HIVE-7205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14605462#comment-14605462 ] dima machlin commented on HIVE-7205: Will this patch be merged to future versions? Until what version is it safe to apply this patch? Wrong results when union all of grouping followed by group by with correlation optimization --- Key: HIVE-7205 URL: https://issues.apache.org/jira/browse/HIVE-7205 Project: Hive Issue Type: Bug Affects Versions: 0.12.0, 0.13.0, 0.13.1 Reporter: dima machlin Assignee: Navis Priority: Critical Attachments: HIVE-7205.1.patch.txt, HIVE-7205.2.patch.txt, HIVE-7205.3.patch.txt, HIVE-7205.4.patch.txt use case : table TBL (a string,b string) contains single row : 'a','a' the following query : {code:sql} select b, sum(cc) from ( select b,count(1) as cc from TBL group by b union all select a as b,count(1) as cc from TBL group by a ) z group by b {code} returns a 1 a 1 while set hive.optimize.correlation=true; if we change set hive.optimize.correlation=false; it returns correct results : a 2 The plan with correlation optimization : {code:sql} ABSTRACT SYNTAX TREE: (TOK_QUERY (TOK_FROM (TOK_SUBQUERY (TOK_UNION (TOK_QUERY (TOK_FROM (TOK_TABREF (TOK_TABNAME DB TBL))) (TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR (TOK_TABLE_OR_COL b)) (TOK_SELEXPR (TOK_FUNCTION count 1) cc)) (TOK_GROUPBY (TOK_TABLE_OR_COL b (TOK_QUERY (TOK_FROM (TOK_TABREF (TOK_TABNAME DB TBL))) (TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR (TOK_TABLE_OR_COL a) b) (TOK_SELEXPR (TOK_FUNCTION count 1) cc)) (TOK_GROUPBY (TOK_TABLE_OR_COL a) z)) (TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR (TOK_TABLE_OR_COL b)) (TOK_SELEXPR (TOK_FUNCTION sum (TOK_TABLE_OR_COL cc (TOK_GROUPBY (TOK_TABLE_OR_COL b STAGE DEPENDENCIES: Stage-1 is a root stage Stage-0 is a root stage STAGE PLANS: Stage: Stage-1 Map Reduce Alias - Map Operator Tree: 
null-subquery1:z-subquery1:TBL TableScan alias: TBL Select Operator expressions: expr: b type: string outputColumnNames: b Group By Operator aggregations: expr: count(1) bucketGroup: false keys: expr: b type: string mode: hash outputColumnNames: _col0, _col1 Reduce Output Operator key expressions: expr: _col0 type: string sort order: + Map-reduce partition columns: expr: _col0 type: string tag: 0 value expressions: expr: _col1 type: bigint null-subquery2:z-subquery2:TBL TableScan alias: TBL Select Operator expressions: expr: a type: string outputColumnNames: a Group By Operator aggregations: expr: count(1) bucketGroup: false keys: expr: a type: string mode: hash outputColumnNames: _col0, _col1 Reduce Output Operator key expressions: expr: _col0 type: string sort order: + Map-reduce partition columns: expr: _col0 type: string tag: 1 value expressions: expr: _col1 type: bigint Reduce Operator Tree: Demux Operator Group By Operator aggregations: expr: count(VALUE._col0) bucketGroup: false keys: expr: KEY._col0 type: string mode: mergepartial outputColumnNames: _col0, _col1 Select Operator expressions: expr: _col0 type: string expr: _col1 type: bigint outputColumnNames: _col0, _col1 Union Select Operator
[jira] [Updated] (HIVE-10673) Dynamically partitioned hash join for Tez
[ https://issues.apache.org/jira/browse/HIVE-10673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Dere updated HIVE-10673: -- Issue Type: New Feature (was: Bug) Dynamically partitioned hash join for Tez - Key: HIVE-10673 URL: https://issues.apache.org/jira/browse/HIVE-10673 Project: Hive Issue Type: New Feature Components: Query Planning, Query Processor Reporter: Jason Dere Assignee: Jason Dere Attachments: HIVE-10673.1.patch, HIVE-10673.2.patch, HIVE-10673.3.patch, HIVE-10673.4.patch, HIVE-10673.5.patch Reduce-side hash join (using MapJoinOperator), where the Tez inputs to the reducer are unsorted. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10673) Dynamically partitioned hash join for Tez
[ https://issues.apache.org/jira/browse/HIVE-10673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14606172#comment-14606172 ] Gopal V commented on HIVE-10673: [~xuefuz]: this is a re-use of the custom Tez VertexManager from last year's Hadoop Summit talk, extending it to reducers: http://www.slideshare.net/Hadoop_Summit/w-235phall1pandey/13 Dynamically partitioned hash join for Tez - Key: HIVE-10673 URL: https://issues.apache.org/jira/browse/HIVE-10673 Project: Hive Issue Type: New Feature Components: Query Planning, Query Processor Reporter: Jason Dere Assignee: Jason Dere Attachments: HIVE-10673.1.patch, HIVE-10673.2.patch, HIVE-10673.3.patch, HIVE-10673.4.patch, HIVE-10673.5.patch Reduce-side hash join (using MapJoinOperator), where the Tez inputs to the reducer are unsorted. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10673) Dynamically partitioned hash join for Tez
[ https://issues.apache.org/jira/browse/HIVE-10673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14606189#comment-14606189 ] Xuefu Zhang commented on HIVE-10673: Thanks, guys. I asked the question mainly because the title sounds like a feature while the JIRA was originally marked as a bug, and the description could read as either. It would be nice if the description provided more details so that people with a general interest could understand. Things such as a problem/feature description and a proposed solution would definitely be helpful. Dynamically partitioned hash join for Tez - Key: HIVE-10673 URL: https://issues.apache.org/jira/browse/HIVE-10673 Project: Hive Issue Type: New Feature Components: Query Planning, Query Processor Reporter: Jason Dere Assignee: Jason Dere Attachments: HIVE-10673.1.patch, HIVE-10673.2.patch, HIVE-10673.3.patch, HIVE-10673.4.patch, HIVE-10673.5.patch Reduce-side hash join (using MapJoinOperator), where the Tez inputs to the reducer are unsorted. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11140) auto compute PROJ_HOME in hcatalog/src/test/e2e/templeton/deployers/env.sh
[ https://issues.apache.org/jira/browse/HIVE-11140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-11140: -- Attachment: HIVE-11140.patch auto compute PROJ_HOME in hcatalog/src/test/e2e/templeton/deployers/env.sh -- Key: HIVE-11140 URL: https://issues.apache.org/jira/browse/HIVE-11140 Project: Hive Issue Type: Bug Components: Tests, WebHCat Affects Versions: 1.0.0 Reporter: Eugene Koifman Assignee: Eugene Koifman Attachments: HIVE-11140.patch It's currently set as:
{noformat}
if [ -z ${PROJ_HOME} ]; then
  export PROJ_HOME=/Users/${USER}/dev/hive
fi
{noformat}
but it always points to the project root, so it can be {{export PROJ_HOME=../../../../../..}} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11123) Fix how to confirm the RDBMS product name at Metastore.
[ https://issues.apache.org/jira/browse/HIVE-11123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14606018#comment-14606018 ] Hive QA commented on HIVE-11123: {color:green}Overall{color}: +1 all checks pass Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12742520/HIVE-11123.2.patch {color:green}SUCCESS:{color} +1 9034 tests passed Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4431/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4431/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4431/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase {noformat} This message is automatically generated. ATTACHMENT ID: 12742520 - PreCommit-HIVE-TRUNK-Build Fix how to confirm the RDBMS product name at Metastore. --- Key: HIVE-11123 URL: https://issues.apache.org/jira/browse/HIVE-11123 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 1.2.0 Environment: PostgreSQL Reporter: Shinichi Yamashita Assignee: Shinichi Yamashita Attachments: HIVE-11123.1.patch, HIVE-11123.2.patch I use PostgreSQL to Hive Metastore. And I saw the following message at PostgreSQL log. 
{code}
2015-06-26 10:58:15.488 JST ERROR: syntax error at or near "@@" at character 5
2015-06-26 10:58:15.488 JST STATEMENT: SET @@session.sql_mode=ANSI_QUOTES
2015-06-26 10:58:15.489 JST ERROR: relation "v$instance" does not exist at character 21
2015-06-26 10:58:15.489 JST STATEMENT: SELECT version FROM v$instance
2015-06-26 10:58:15.490 JST ERROR: column "version" does not exist at character 10
2015-06-26 10:58:15.490 JST STATEMENT: SELECT @@version
{code}
When Hive CLI or Beeline embedded mode is run, these messages are output to the PostgreSQL log. These queries are issued from MetaStoreDirectSql#determineDbType. If we use MetaStoreDirectSql#getProductName instead, we do not need to issue these queries. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
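The fix direction described above, deciding the backend from the JDBC product name instead of probing with vendor-specific SQL, can be sketched as follows. The method name and return tags here are illustrative, not Hive's actual constants; in real code the input string would come from the standard JDBC call java.sql.DatabaseMetaData#getDatabaseProductName().

```java
public class DbTypeSniffer {
    // Hypothetical mapping from a JDBC product name to a backend tag.
    // Product-name substrings are the only probe needed; no SQL is sent,
    // so nothing vendor-specific ends up in the backend's error log.
    public static String dbTypeFromProductName(String productName) {
        String name = productName == null ? "" : productName.toLowerCase();
        if (name.contains("postgresql")) return "POSTGRES";
        if (name.contains("mysql")) return "MYSQL";
        if (name.contains("oracle")) return "ORACLE";
        if (name.contains("microsoft sql server")) return "MSSQL";
        if (name.contains("derby")) return "DERBY";
        return "OTHER";
    }
}
```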
[jira] [Commented] (HIVE-10895) ObjectStore does not close Query objects in some calls, causing a potential leak in some metastore db resources
[ https://issues.apache.org/jira/browse/HIVE-10895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14606099#comment-14606099 ] Thejas M Nair commented on HIVE-10895: -- [~aihuaxu] Are those users also seeing the failures when Oracle is used as the metastore database ? In the internal testing at Hortonworks, we have seen it only with Oracle. This happens in our concurrency test suite, where many queries are hitting HS2 in parallel. ObjectStore does not close Query objects in some calls, causing a potential leak in some metastore db resources --- Key: HIVE-10895 URL: https://issues.apache.org/jira/browse/HIVE-10895 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 0.14.0, 1.0.0, 1.2.0, 1.1.0, 0.13 Reporter: Takahiko Saito Assignee: Aihua Xu Attachments: HIVE-10895.1.patch, HIVE-10895.2.patch, HIVE-10895.3.patch During testing, we've noticed Oracle db running out of cursors. Might be related to this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10895) ObjectStore does not close Query objects in some calls, causing a potential leak in some metastore db resources
[ https://issues.apache.org/jira/browse/HIVE-10895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14606129#comment-14606129 ] Aihua Xu commented on HIVE-10895: - [~thejas] Yes. Those users are all using Oracle as the metastore database. ObjectStore does not close Query objects in some calls, causing a potential leak in some metastore db resources --- Key: HIVE-10895 URL: https://issues.apache.org/jira/browse/HIVE-10895 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 0.14.0, 1.0.0, 1.2.0, 1.1.0, 0.13 Reporter: Takahiko Saito Assignee: Aihua Xu Attachments: HIVE-10895.1.patch, HIVE-10895.2.patch, HIVE-10895.3.patch During testing, we've noticed Oracle db running out of cursors. Might be related to this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10673) Dynamically partitioned hash join for Tez
[ https://issues.apache.org/jira/browse/HIVE-10673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14606142#comment-14606142 ] Jason Dere commented on HIVE-10673: --- [~mmokhtar] or [~gopalv] can probably give more detail, but they found that during a shuffle join a large amount of the CPU/IO was spent sorting. While this does not work for MR, for other execution engines (such as Tez), it is possible to create a reduce-side join that uses unsorted inputs, in order to eliminate the sorting. We use the hash join algorithm to perform the join in the reducer, so this requires the small tables in the join to fit in the hash table. Testing with this patch, [~mmokhtar] found some decent time savings. Dynamically partitioned hash join for Tez - Key: HIVE-10673 URL: https://issues.apache.org/jira/browse/HIVE-10673 Project: Hive Issue Type: Bug Components: Query Planning, Query Processor Reporter: Jason Dere Assignee: Jason Dere Attachments: HIVE-10673.1.patch, HIVE-10673.2.patch, HIVE-10673.3.patch, HIVE-10673.4.patch, HIVE-10673.5.patch Reduce-side hash join (using MapJoinOperator), where the Tez inputs to the reducer are unsorted. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
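The reduce-side strategy described in the comment above can be sketched in a few lines: build a hash table from the small input, then probe it while streaming the (unsorted) big input; removing the sort on both inputs is the entire point. This is an illustrative sketch, not Hive's MapJoinOperator.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ReduceSideHashJoin {
    // Sketch of an inner hash join over UNSORTED inputs: build on the small
    // side (which must fit in memory), stream and probe with the big side.
    // Rows are {key, value} pairs; output rows are "key,bigValue,smallValue".
    public static List<String> join(List<String[]> small, List<String[]> big) {
        Map<String, List<String>> built = new HashMap<>(); // key -> small-side values
        for (String[] row : small) {
            built.computeIfAbsent(row[0], k -> new ArrayList<>()).add(row[1]);
        }
        List<String> out = new ArrayList<>();
        for (String[] row : big) {                          // no sort needed here
            List<String> matches = built.get(row[0]);
            if (matches != null) {
                for (String v : matches) out.add(row[0] + "," + row[1] + "," + v);
            }
        }
        return out;
    }
}
```

The memory constraint mentioned in the comment falls out directly: only the build side is materialized, so only the small tables must fit in the hash table.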
[jira] [Updated] (HIVE-11141) Improve RuleRegExp when the Expression node stack gets huge
[ https://issues.apache.org/jira/browse/HIVE-11141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hari Sankar Sivarama Subramaniyan updated HIVE-11141: - Attachment: HIVE-11141.1.patch Improve RuleRegExp when the Expression node stack gets huge --- Key: HIVE-11141 URL: https://issues.apache.org/jira/browse/HIVE-11141 Project: Hive Issue Type: Bug Reporter: Hari Sankar Sivarama Subramaniyan Assignee: Hari Sankar Sivarama Subramaniyan Attachments: HIVE-11141.1.patch, SQLQuery10.sql.mssql, createtable.rtf More and more complex workloads are being migrated to Hive from SQL Server, Teradata, etc., and occasionally Hive gets bottlenecked on generating plans for large queries; in the majority of cases, time is spent fetching metadata, partitions, and in other optimizer transformation related rules. I have attached the query for the test case, which needs to be tested after we set up the database as shown below.
{code}
create database dataset_3;
use database dataset_3;
{code}
createtable.rtf - create table command
SQLQuery10.sql.mssql - explain query
It seems that the most problematic part of the code, as the stack can get arbitrarily long, is in RuleRegExp.java:
{code}
@Override
public int cost(Stack<Node> stack) throws SemanticException {
  int numElems = (stack != null ? stack.size() : 0);
  String name = "";
  for (int pos = numElems - 1; pos >= 0; pos--) {
    name = stack.get(pos).getName() + "%" + name;
    Matcher m = pattern.matcher(name);
    if (m.matches()) {
      return m.group().length();
    }
  }
  return -1;
}
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11123) Fix how to confirm the RDBMS product name at Metastore.
[ https://issues.apache.org/jira/browse/HIVE-11123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14606544#comment-14606544 ] Sergey Shelukhin commented on HIVE-11123: - +1. Small nit: the null check can be done once. Fix how to confirm the RDBMS product name at Metastore. --- Key: HIVE-11123 URL: https://issues.apache.org/jira/browse/HIVE-11123 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 1.2.0 Environment: PostgreSQL Reporter: Shinichi Yamashita Assignee: Shinichi Yamashita Attachments: HIVE-11123.1.patch, HIVE-11123.2.patch I use PostgreSQL for the Hive Metastore, and I saw the following messages in the PostgreSQL log.
{code}
2015-06-26 10:58:15.488 JST ERROR: syntax error at or near "@@" at character 5
2015-06-26 10:58:15.488 JST STATEMENT: SET @@session.sql_mode=ANSI_QUOTES
2015-06-26 10:58:15.489 JST ERROR: relation "v$instance" does not exist at character 21
2015-06-26 10:58:15.489 JST STATEMENT: SELECT version FROM v$instance
2015-06-26 10:58:15.490 JST ERROR: column "version" does not exist at character 10
2015-06-26 10:58:15.490 JST STATEMENT: SELECT @@version
{code}
When Hive CLI or Beeline embedded mode is run, these messages are output to the PostgreSQL log. These queries are issued from MetaStoreDirectSql#determineDbType. If we use MetaStoreDirectSql#getProductName instead, we do not need to issue these queries. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11141) Improve RuleRegExp when the Expression node stack gets huge
[ https://issues.apache.org/jira/browse/HIVE-11141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hari Sankar Sivarama Subramaniyan updated HIVE-11141: - Attachment: HIVE-11141.2.patch Improve RuleRegExp when the Expression node stack gets huge --- Key: HIVE-11141 URL: https://issues.apache.org/jira/browse/HIVE-11141 Project: Hive Issue Type: Bug Reporter: Hari Sankar Sivarama Subramaniyan Assignee: Hari Sankar Sivarama Subramaniyan Attachments: HIVE-11141.1.patch, HIVE-11141.2.patch, SQLQuery10.sql.mssql, createtable.rtf More and more complex workloads are being migrated to Hive from SQL Server, Teradata, etc., and occasionally Hive gets bottlenecked on generating plans for large queries; in the majority of cases, time is spent fetching metadata, partitions, and in other optimizer transformation related rules. I have attached the query for the test case, which needs to be tested after we set up the database as shown below.
{code}
create database dataset_3;
use database dataset_3;
{code}
createtable.rtf - create table command
SQLQuery10.sql.mssql - explain query
It seems that the most problematic part of the code, as the stack can get arbitrarily long, is in RuleRegExp.java:
{code}
@Override
public int cost(Stack<Node> stack) throws SemanticException {
  int numElems = (stack != null ? stack.size() : 0);
  String name = "";
  for (int pos = numElems - 1; pos >= 0; pos--) {
    name = stack.get(pos).getName() + "%" + name;
    Matcher m = pattern.matcher(name);
    if (m.matches()) {
      return m.group().length();
    }
  }
  return -1;
}
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
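To see why the quoted cost() is expensive: at every stack depth it rebuilds the concatenated name string and re-runs the regex, so a stack of depth n incurs O(n^2) string-building work plus up to n regex matches. One illustrative direction (a sketch under the assumption that a rule's pattern is a plain sequence of operator names, not necessarily the committed patch): compare the names directly against the top of the stack, with no string concatenation or regex at all.

```java
import java.util.List;

public class FixedPatternRule {
    // Sketch: match a fixed sequence of operator names against the top of
    // the stack in O(pattern length). Mirrors RuleRegExp.cost() semantics:
    // returns the length of the matched "Name%Name%..." string, or -1.
    public static int cost(List<String> patternNames, List<String> stackNames) {
        int n = patternNames.size();
        if (stackNames.size() < n) return -1;
        int matchedChars = 0;
        for (int i = 0; i < n; i++) {
            String expected = patternNames.get(n - 1 - i);             // pattern, right to left
            String actual = stackNames.get(stackNames.size() - 1 - i); // stack, top down
            if (!expected.equals(actual)) return -1;
            matchedChars += actual.length() + 1; // +1 for the '%' separator
        }
        return matchedChars;
    }
}
```

The cost is now proportional to the pattern length rather than the stack depth, which is what matters when the expression node stack gets huge.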
[jira] [Commented] (HIVE-10141) count(distinct) not supported in Windowing function
[ https://issues.apache.org/jira/browse/HIVE-10141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14606470#comment-14606470 ] Yin Huai commented on HIVE-10141: - Looking at https://github.com/apache/hive/blob/release-1.2.1/ql/src/java/org/apache/hadoop/hive/ql/parse/IdentifiersParser.g#L195-L198. Seems window spec is dropped. count(distinct) not supported in Windowing function --- Key: HIVE-10141 URL: https://issues.apache.org/jira/browse/HIVE-10141 Project: Hive Issue Type: Improvement Components: PTF-Windowing Affects Versions: 1.0.0 Reporter: Yi Zhang Priority: Critical Count(distinct) is a very important function for analysis. For example, unique visitors instead of total visitors. Currently it is missing in Windowing function. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11112) ISO-8859-1 text output has fragments of previous longer rows appended
[ https://issues.apache.org/jira/browse/HIVE-11112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14606565#comment-14606565 ] xiaowei wang commented on HIVE-11112: - OK, I will try to add a test case today for HIVE-10983. HIVE-10983 and HIVE-11112 cover different cases, but HIVE-10983 duplicates both. ISO-8859-1 text output has fragments of previous longer rows appended - Key: HIVE-11112 URL: https://issues.apache.org/jira/browse/HIVE-11112 Project: Hive Issue Type: Bug Components: Serializers/Deserializers Affects Versions: 1.2.0 Reporter: Yongzhi Chen Assignee: Yongzhi Chen Fix For: 1.3.0, 2.0.0 Attachments: HIVE-11112.1.patch If a LazySimpleSerDe table is created using ISO 8859-1 encoding, query results for a string column are incorrect for any row that was preceded by a row containing a longer string. Example steps to reproduce: 1. Create a table using ISO 8859-1 encoding: {code:sql}
CREATE TABLE person_lat1 (name STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES ('serialization.encoding'='ISO8859_1');
{code} 2. Copy an ISO-8859-1 encoded text file into the appropriate warehouse folder in HDFS. I'll attach an example file containing the following text: {noformat}
Müller,Thomas
Jørgensen,Jørgen
Peña,Andrés
Nåm,Fæk
{noformat} 3. Execute {{SELECT * FROM person_lat1}} Result - The following output appears: {noformat}
+-------------------+
| person_lat1.name  |
+-------------------+
| Müller,Thomas     |
| Jørgensen,Jørgen  |
| Peña,Andrésørgen  |
| Nåm,Fækdrésørgen  |
+-------------------+
{noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
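A minimal, self-contained sketch (not Hive's actual LazySimpleSerDe code) of the bug pattern that produces output like Peña,Andrésørgen: a reused byte buffer is refilled with a shorter row, but decoding uses the buffer's full length instead of the number of bytes just written, so the tail of the previous longer row leaks into the result.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class StaleBufferDemo {

  // BUG pattern: decode the entire reused buffer, ignoring how many
  // bytes the current row actually occupies.
  public static String decodeWholeBuffer(byte[] buffer, Charset cs) {
    return new String(buffer, cs);
  }

  // FIX pattern: decode only the bytes belonging to the current row.
  public static String decodeValidBytes(byte[] buffer, int length, Charset cs) {
    return new String(buffer, 0, length, cs);
  }

  public static void main(String[] args) {
    // A long row fills the buffer, then a shorter row overwrites only
    // its leading bytes -- the old tail stays behind.
    byte[] buffer = "Jørgensen,Jørgen".getBytes(StandardCharsets.ISO_8859_1); // 16 bytes
    byte[] shortRow = "Peña,Andrés".getBytes(StandardCharsets.ISO_8859_1);    // 11 bytes
    System.arraycopy(shortRow, 0, buffer, 0, shortRow.length);

    System.out.println(decodeWholeBuffer(buffer, StandardCharsets.ISO_8859_1)); // Peña,Andrésørgen
    System.out.println(decodeValidBytes(buffer, shortRow.length, StandardCharsets.ISO_8859_1)); // Peña,Andrés
  }
}
```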
[jira] [Commented] (HIVE-10673) Dynamically partitioned hash join for Tez
[ https://issues.apache.org/jira/browse/HIVE-10673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14606278#comment-14606278 ] Damien Carol commented on HIVE-10673: - Dynamically partitioned hash join for Tez - Key: HIVE-10673 URL: https://issues.apache.org/jira/browse/HIVE-10673 Project: Hive Issue Type: New Feature Components: Query Planning, Query Processor Reporter: Jason Dere Assignee: Jason Dere Attachments: HIVE-10673.1.patch, HIVE-10673.2.patch, HIVE-10673.3.patch, HIVE-10673.4.patch, HIVE-10673.5.patch Reduce-side hash join (using MapJoinOperator), where the Tez inputs to the reducer are unsorted. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11030) Enhance storage layer to create one delta file per write
[ https://issues.apache.org/jira/browse/HIVE-11030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14606294#comment-14606294 ] Alan Gates commented on HIVE-11030: --- AcidUtils.serializeDeltas and AcidUtils.deserializeDeltas: You changed these to work in the framework of deltas being passed as a list of longs. But this causes double statting of the file system, because now OrcInputFormat.FileGenerator calls AcidUtils.serializeDeltas, which has to figure out all the deltas and then forget about the statementIds; then, when it comes back around in OrcInputFormat.getReader and calls AcidUtils.deserializeDeltas, it has to go back and re-stat the file system to find all the statement ids. Instead you should change de/serializeDeltas to pass a triple (maxtxn, mintxn, stmt). Or, if you prefer to extend the existing hack, it can pass a list of longs but use 3 slots per delta instead of 2. This avoids loss of info in serialize that has to be rediscovered in deserialize. In AcidUtils: {code}
private static ParsedDelta parseDelta(FileStatus path) {
  ParsedDelta p = parsedDelta(path.getPath());
  return new ParsedDelta(p.getMinTransaction(), p.getMaxTransaction(), path, p.statementId);
}
{code} I don't understand this code. Why get a ParsedDelta and turn around and create a new one? In parseDelta, would it be better to split the string on '_' rather than call indexOf twice? In OrcRawRecordMerger, in the constructor (line 489 in your patch) you added a call to AcidUtils.parsedDeltas. This looks like another case where, if the statement id was being properly preserved, we would not need to again parse the file name. OrcRecordUpdater, end of the constructor (line 265 in your patch): you're introducing a file system stat for a sanity check. That doesn't seem worth it. 
Enhance storage layer to create one delta file per write Key: HIVE-11030 URL: https://issues.apache.org/jira/browse/HIVE-11030 Project: Hive Issue Type: Sub-task Components: Transactions Affects Versions: 1.2.0 Reporter: Eugene Koifman Assignee: Eugene Koifman Attachments: HIVE-11030.2.patch, HIVE-11030.3.patch Currently each txn using ACID insert/update/delete will generate a delta directory like delta_100_101. In order to support multi-statement transactions we must generate one delta per operation within the transaction so the deltas would be named like delta_100_101_0001, etc. Support for MERGE (HIVE-10924) would need the same. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
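A sketch (illustrative names, not Hive's actual AcidUtils API) of the proposed on-disk naming from this issue and of the reviewer's suggestion to serialize each delta as three longs (min txn, max txn, statement id), so the statement id survives the serialize/deserialize round trip instead of being rediscovered by re-statting the file system:

```java
import java.util.ArrayList;
import java.util.List;

public class DeltaName {

  // delta_100_101 (pre-patch) vs. delta_100_101_0001 (one delta per statement).
  public static String format(long minTxn, long maxTxn, int stmtId) {
    return String.format("delta_%d_%d_%04d", minTxn, maxTxn, stmtId);
  }

  // Split on '_' (as suggested in the review). Returns {minTxn, maxTxn, stmtId};
  // stmtId is -1 for pre-patch names without a statement id.
  public static long[] parse(String name) {
    String[] parts = name.split("_");
    long stmt = parts.length > 3 ? Long.parseLong(parts[3]) : -1;
    return new long[]{Long.parseLong(parts[1]), Long.parseLong(parts[2]), stmt};
  }

  // Reviewer's suggestion: 3 slots per delta instead of 2, so deserialize
  // never has to re-stat the file system to find statement ids.
  public static List<Long> serializeDeltas(List<long[]> deltas) {
    List<Long> out = new ArrayList<>();
    for (long[] d : deltas) {
      out.add(d[0]);
      out.add(d[1]);
      out.add(d[2]);
    }
    return out;
  }
}
```

(The zero-padding of the statement id is illustrative; the actual width used by Hive may differ.)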
[jira] [Commented] (HIVE-11140) auto compute PROJ_HOME in hcatalog/src/test/e2e/templeton/deployers/env.sh
[ https://issues.apache.org/jira/browse/HIVE-11140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14606417#comment-14606417 ] Eugene Koifman commented on HIVE-11140: --- failure not related. [~thejas] could you review please? auto compute PROJ_HOME in hcatalog/src/test/e2e/templeton/deployers/env.sh -- Key: HIVE-11140 URL: https://issues.apache.org/jira/browse/HIVE-11140 Project: Hive Issue Type: Bug Components: Tests, WebHCat Affects Versions: 1.0.0 Reporter: Eugene Koifman Assignee: Eugene Koifman Attachments: HIVE-11140.patch, HIVE-11140.patch it's currently set as {noformat} if [ -z ${PROJ_HOME} ]; then export PROJ_HOME=/Users/${USER}/dev/hive fi {noformat} but it always points to project root so can be {{export PROJ_HOME=../../../../../..}} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
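A hedged sketch of what the auto-compute could look like: since env.sh lives at hcatalog/src/test/e2e/templeton/deployers, the project root is six directories above the script itself (the path layout is taken from the issue; the exact variable handling here is illustrative, not the committed patch).

```shell
# Default PROJ_HOME to the project root, computed from this script's location
# rather than hard-coding /Users/${USER}/dev/hive.
if [ -z "${PROJ_HOME}" ]; then
  # deployers -> templeton -> e2e -> test -> src -> hcatalog -> project root
  SCRIPT_DIR=$(cd "$(dirname "$0")" && pwd)
  PROJ_HOME=$(cd "${SCRIPT_DIR}/../../../../../.." && pwd)
  export PROJ_HOME
fi
```

Resolving through pwd yields an absolute path, so the value keeps working even if later commands cd elsewhere, which a relative ../../../../../.. would not.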
[jira] [Updated] (HIVE-11009) LLAP: fix TestMiniTezCliDriverLocal on the branch
[ https://issues.apache.org/jira/browse/HIVE-11009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-11009: Assignee: Vikram Dixit K (was: Gunther Hagleitner) LLAP: fix TestMiniTezCliDriverLocal on the branch - Key: HIVE-11009 URL: https://issues.apache.org/jira/browse/HIVE-11009 Project: Hive Issue Type: Sub-task Reporter: Sergey Shelukhin Assignee: Vikram Dixit K See HIVE-10997. All the queries of this test fail on the branch with the same initialization error -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9823) Load spark-defaults.conf from classpath [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14606315#comment-14606315 ] Xuefu Zhang commented on HIVE-9823: --- The document says that for Spark-related properties, you can add them to a file called spark-defaults.conf and add the file to the classpath. The JIRA here says that Hive will load this file from the classpath. Thus, you need both. Load spark-defaults.conf from classpath [Spark Branch] -- Key: HIVE-9823 URL: https://issues.apache.org/jira/browse/HIVE-9823 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Brock Noland Assignee: Brock Noland Fix For: 1.2.0 Attachments: HIVE-9823.1-spark.patch, HIVE-9823.2-spark.patch, HIVE-9823.3-spark.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
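A sketch of the mechanism this issue describes (class and method names here are illustrative, not Hive's actual implementation): spark-defaults.conf is a standard Java properties file with whitespace-separated keys and values, so it can be located via the classpath and parsed with java.util.Properties.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Properties;

public class SparkConfLoader {

  // Parse an already-opened spark-defaults.conf stream.
  // Properties accepts both "key value" and "key=value" lines.
  public static Properties load(InputStream in) {
    Properties props = new Properties();
    try {
      props.load(in);
    } catch (IOException e) {
      throw new RuntimeException("failed to read spark-defaults.conf", e);
    }
    return props;
  }

  // Look the file up on the classpath; a missing file simply yields no overrides.
  public static Properties loadFromClasspath(ClassLoader cl) {
    InputStream in = cl.getResourceAsStream("spark-defaults.conf");
    if (in == null) {
      return new Properties(); // not on the classpath: nothing to load
    }
    try {
      return load(in);
    } finally {
      try {
        in.close();
      } catch (IOException ignored) {
      }
    }
  }
}
```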
[jira] [Updated] (HIVE-11061) Table renames not propagated to partition table in HBase metastore
[ https://issues.apache.org/jira/browse/HIVE-11061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated HIVE-11061: -- Attachment: HIVE-11061.patch This patch does the work in both the tbls and partitions tables to figure out if the table name has changed, and if so delete the existing rows and create new ones. Table renames not propagated to partition table in HBase metastore -- Key: HIVE-11061 URL: https://issues.apache.org/jira/browse/HIVE-11061 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: hbase-metastore-branch Reporter: Alan Gates Assignee: Alan Gates Fix For: hbase-metastore-branch Attachments: HIVE-11061.patch When a table is renamed in the HBase metastore it needs to update relevant rows in the partition table not only in the tbls table. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11139) Emit more lineage information
[ https://issues.apache.org/jira/browse/HIVE-11139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HIVE-11139: --- Attachment: HIVE-11139.1.patch Attached patch v1, which is on RB: https://reviews.apache.org/r/36025/ Emit more lineage information - Key: HIVE-11139 URL: https://issues.apache.org/jira/browse/HIVE-11139 Project: Hive Issue Type: Improvement Reporter: Jimmy Xiang Assignee: Jimmy Xiang Fix For: 2.0.0 Attachments: HIVE-11139.1.patch HIVE-1131 emits some column lineage info, but it doesn't support INSERT or CTAS statements, and it doesn't emit predicate information either. We can enhance and use the dependency information created in HIVE-1131 to generate more complete lineage info. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11141) Improve RuleRegExp when the Expression node stack gets huge
[ https://issues.apache.org/jira/browse/HIVE-11141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hari Sankar Sivarama Subramaniyan updated HIVE-11141: - Attachment: (was: HIVE-11141.1.patch) Improve RuleRegExp when the Expression node stack gets huge --- Key: HIVE-11141 URL: https://issues.apache.org/jira/browse/HIVE-11141 Project: Hive Issue Type: Bug Reporter: Hari Sankar Sivarama Subramaniyan Assignee: Hari Sankar Sivarama Subramaniyan Attachments: HIVE-11141.1.patch, SQLQuery10.sql.mssql, createtable.rtf More and more complex workloads are being migrated to Hive from SQL Server, Teradata, etc., and occasionally Hive gets bottlenecked on generating plans for large queries; in the majority of cases, time is spent fetching metadata and partitions and running optimizer transformation rules. I have attached the query for the test case, which needs to be run after setting up the database as shown below. {code} create database dataset_3; use dataset_3; {code} createtable.rtf - create table command SQLQuery10.sql.mssql - explain query The most problematic part of the code, as the stack gets arbitrarily long, is in RuleRegExp.java: {code}
@Override
public int cost(Stack<Node> stack) throws SemanticException {
  int numElems = (stack != null ? stack.size() : 0);
  String name = "";
  for (int pos = numElems - 1; pos >= 0; pos--) {
    name = stack.get(pos).getName() + "%" + name;
    Matcher m = pattern.matcher(name);
    if (m.matches()) {
      return m.group().length();
    }
  }
  return -1;
}
{code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11141) Improve RuleRegExp when the Expression node stack gets huge
[ https://issues.apache.org/jira/browse/HIVE-11141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hari Sankar Sivarama Subramaniyan updated HIVE-11141: - Description: Hive occasionally gets bottlenecked on generating plans for large queries; in the majority of cases, time is spent fetching metadata and partitions and running optimizer transformation rules. I have attached the query for the test case, which needs to be run after setting up the database as shown below. {code} create database dataset_3; use dataset_3; {code} createtable.rtf - create table command SQLQuery10.sql.mssql - explain query The most problematic part of the code, as the stack gets arbitrarily long, is in RuleRegExp.java: {code}
@Override
public int cost(Stack<Node> stack) throws SemanticException {
  int numElems = (stack != null ? stack.size() : 0);
  String name = "";
  for (int pos = numElems - 1; pos >= 0; pos--) {
    name = stack.get(pos).getName() + "%" + name;
    Matcher m = pattern.matcher(name);
    if (m.matches()) {
      return m.group().length();
    }
  }
  return -1;
}
{code} was: More and more complex workloads are being migrated to Hive from SQL Server, Teradata, etc., and occasionally Hive gets bottlenecked on generating plans for large queries; in the majority of cases, time is spent fetching metadata and partitions and running optimizer transformation rules. I have attached the query for the test case, which needs to be run after setting up the database as shown below. {code} create database dataset_3; use dataset_3; {code} createtable.rtf - create table command SQLQuery10.sql.mssql - explain query The most problematic part of the code, as the stack gets arbitrarily long, is in RuleRegExp.java: {code}
@Override
public int cost(Stack<Node> stack) throws SemanticException {
  int numElems = (stack != null ? stack.size() : 0);
  String name = "";
  for (int pos = numElems - 1; pos >= 0; pos--) {
    name = stack.get(pos).getName() + "%" + name;
    Matcher m = pattern.matcher(name);
    if (m.matches()) {
      return m.group().length();
    }
  }
  return -1;
}
{code} Improve RuleRegExp when the Expression node stack gets huge --- Key: HIVE-11141 URL: https://issues.apache.org/jira/browse/HIVE-11141 Project: Hive Issue Type: Bug Reporter: Hari Sankar Sivarama Subramaniyan Assignee: Hari Sankar Sivarama Subramaniyan Attachments: HIVE-11141.1.patch, HIVE-11141.2.patch, SQLQuery10.sql.mssql, createtable.rtf Hive occasionally gets bottlenecked on generating plans for large queries; in the majority of cases, time is spent fetching metadata and partitions and running optimizer transformation rules. I have attached the query for the test case, which needs to be run after setting up the database as shown below. {code} create database dataset_3; use dataset_3; {code} createtable.rtf - create table command SQLQuery10.sql.mssql - explain query The most problematic part of the code, as the stack gets arbitrarily long, is in RuleRegExp.java: {code}
@Override
public int cost(Stack<Node> stack) throws SemanticException {
  int numElems = (stack != null ? stack.size() : 0);
  String name = "";
  for (int pos = numElems - 1; pos >= 0; pos--) {
    name = stack.get(pos).getName() + "%" + name;
    Matcher m = pattern.matcher(name);
    if (m.matches()) {
      return m.group().length();
    }
  }
  return -1;
}
{code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9557) create UDF to measure strings similarity using Cosine Similarity algo
[ https://issues.apache.org/jira/browse/HIVE-9557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nishant Kelkar updated HIVE-9557: - Attachment: HIVE-9557.3.patch Attaching revision #3 of the patch to remove the hidden dependency on commons-math3's FastMath (commons-math3 comes in via the org.apache.spark:spark-core_2.10 dependency). Using the standard library Math instead. create UDF to measure strings similarity using Cosine Similarity algo - Key: HIVE-9557 URL: https://issues.apache.org/jira/browse/HIVE-9557 Project: Hive Issue Type: Improvement Components: UDF Reporter: Alexander Pivovarov Assignee: Nishant Kelkar Labels: CosineSimilarity, SimilarityMetric, UDF Attachments: HIVE-9557.1.patch, HIVE-9557.2.patch, HIVE-9557.3.patch, udf_cosine_similarity-v01.patch algo description: http://en.wikipedia.org/wiki/Cosine_similarity {code}
-- one word different, total 2 words
str_sim_cosine('Test String1', 'Test String2') = (2 - 1) / 2 = 0.5f
{code} reference implementation: https://github.com/Simmetrics/simmetrics/blob/master/src/uk/ac/shef/wit/simmetrics/similaritymetrics/CosineSimilarity.java -- This message was sent by Atlassian JIRA (v6.3.4#6332)
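A sketch of the metric itself (the Hive UDF wrapper is omitted; names here are illustrative): tokenize both strings on whitespace, build term-frequency vectors, and return dot(a,b) / (|a| * |b|). For the example in the issue, 'Test String1' and 'Test String2' share one of two tokens, giving a similarity of 0.5.

```java
import java.util.HashMap;
import java.util.Map;

public class CosineSimilarity {

  // Term-frequency vector over whitespace-separated tokens.
  private static Map<String, Integer> termFreq(String s) {
    Map<String, Integer> tf = new HashMap<>();
    for (String tok : s.split("\\s+")) {
      if (!tok.isEmpty()) {
        tf.merge(tok, 1, Integer::sum);
      }
    }
    return tf;
  }

  public static double similarity(String a, String b) {
    Map<String, Integer> ta = termFreq(a), tb = termFreq(b);
    double dot = 0, normA = 0, normB = 0;
    for (Map.Entry<String, Integer> e : ta.entrySet()) {
      dot += e.getValue() * tb.getOrDefault(e.getKey(), 0);
      normA += e.getValue() * e.getValue();
    }
    for (int v : tb.values()) {
      normB += v * v;
    }
    if (normA == 0 || normB == 0) {
      return 0.0; // an empty string has no direction; define similarity as 0
    }
    return dot / (Math.sqrt(normA) * Math.sqrt(normB));
  }

  public static void main(String[] args) {
    System.out.println(similarity("Test String1", "Test String2")); // ~0.5
  }
}
```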
[jira] [Commented] (HIVE-11055) HPL/SQL - Implementing Procedural SQL in Hive (PL/HQL Contribution)
[ https://issues.apache.org/jira/browse/HIVE-11055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14606523#comment-14606523 ] Alan Gates commented on HIVE-11055: --- I ran rat on this and all looks good except for a number of generated files: {code} !? /Users/gates/git/apache/hive/hplsql/src/main/java/org/apache/hive/hplsql/Hplsql.tokens !? /Users/gates/git/apache/hive/hplsql/src/main/java/org/apache/hive/hplsql/HplsqlBaseVisitor.java !? /Users/gates/git/apache/hive/hplsql/src/main/java/org/apache/hive/hplsql/HplsqlLexer.java !? /Users/gates/git/apache/hive/hplsql/src/main/java/org/apache/hive/hplsql/HplsqlLexer.tokens !? /Users/gates/git/apache/hive/hplsql/src/main/java/org/apache/hive/hplsql/HplsqlParser.java !? /Users/gates/git/apache/hive/hplsql/src/main/java/org/apache/hive/hplsql/HplsqlVisitor.java {code} Did you intend to check these in rather than have the build generate them? HPL/SQL - Implementing Procedural SQL in Hive (PL/HQL Contribution) --- Key: HIVE-11055 URL: https://issues.apache.org/jira/browse/HIVE-11055 Project: Hive Issue Type: Improvement Reporter: Dmitry Tolpeko Assignee: Dmitry Tolpeko Attachments: HIVE-11055.1.patch, HIVE-11055.2.patch, HIVE-11055.3.patch There is PL/HQL tool (www.plhql.org) that implements procedural SQL for Hive (actually any SQL-on-Hadoop implementation and any JDBC source). Alan Gates offered to contribute it to Hive under HPL/SQL name (org.apache.hive.hplsql package). This JIRA is to create a patch to contribute the PL/HQL code. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11068) Hive throws OOM in client side
[ https://issues.apache.org/jira/browse/HIVE-11068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14606598#comment-14606598 ] Sergey Shelukhin commented on HIVE-11068: - [~gopalv] [~prasanth_j] is that the cycles issue you were talking about? Hive throws OOM in client side -- Key: HIVE-11068 URL: https://issues.apache.org/jira/browse/HIVE-11068 Project: Hive Issue Type: Bug Reporter: Rajesh Balamohan Assignee: Prasanth Jayachandran Attachments: Yourkit_String.png, Yourkit_TablScanDesc.png, hive_cli_debug.log.gz Hive build: (Latest on Jun 21, commit 142426394cfdc8a1fea51f7642c63f43f36b0333). Query: Query 64 of TPC-DS (https://github.com/cartershanklin/hive-testbench/blob/master/sample-queries-tpcds/query64.sql) Hive throws the following OOM on the client side. {noformat}
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
	at java.lang.StringCoding$StringDecoder.decode(StringCoding.java:149)
	at java.lang.StringCoding.decode(StringCoding.java:193)
	at java.lang.String.<init>(String.java:414)
	at java.lang.String.<init>(String.java:479)
	at org.apache.hadoop.hive.ql.exec.Utilities.serializeExpression(Utilities.java:799)
	at org.apache.hadoop.hive.ql.plan.TableScanDesc.setFilterExpr(TableScanDesc.java:153)
	at org.apache.hadoop.hive.ql.ppd.OpProcFactory.pushFilterToStorageHandler(OpProcFactory.java:901)
	at org.apache.hadoop.hive.ql.ppd.OpProcFactory.createFilter(OpProcFactory.java:818)
	at org.apache.hadoop.hive.ql.ppd.OpProcFactory.createFilter(OpProcFactory.java:788)
	at org.apache.hadoop.hive.ql.ppd.OpProcFactory$TableScanPPD.process(OpProcFactory.java:388)
	at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
	at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:95)
	at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:79)
	at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.walk(DefaultGraphWalker.java:133)
	at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:110)
	at org.apache.hadoop.hive.ql.ppd.PredicatePushDown.transform(PredicatePushDown.java:135)
	at org.apache.hadoop.hive.ql.optimizer.Optimizer.optimize(Optimizer.java:192)
	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10171)
	at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:207)
	at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:227)
	at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:424)
	at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:308)
	at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1124)
	at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1172)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1061)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1051)
	at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:213)
	at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:165)
	at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:376)
	at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:311)
	at org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:409)
	at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:425)
{noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11140) auto compute PROJ_HOME in hcatalog/src/test/e2e/templeton/deployers/env.sh
[ https://issues.apache.org/jira/browse/HIVE-11140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14606467#comment-14606467 ] Thejas M Nair commented on HIVE-11140: -- +1 auto compute PROJ_HOME in hcatalog/src/test/e2e/templeton/deployers/env.sh -- Key: HIVE-11140 URL: https://issues.apache.org/jira/browse/HIVE-11140 Project: Hive Issue Type: Bug Components: Tests, WebHCat Affects Versions: 1.0.0 Reporter: Eugene Koifman Assignee: Eugene Koifman Attachments: HIVE-11140.patch, HIVE-11140.patch it's currently set as {noformat} if [ -z ${PROJ_HOME} ]; then export PROJ_HOME=/Users/${USER}/dev/hive fi {noformat} but it always points to project root so can be {{export PROJ_HOME=../../../../../..}} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11140) auto compute PROJ_HOME in hcatalog/src/test/e2e/templeton/deployers/env.sh
[ https://issues.apache.org/jira/browse/HIVE-11140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14606385#comment-14606385 ] Hive QA commented on HIVE-11140: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12742558/HIVE-11140.patch {color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 9034 tests executed *Failed tests:* {noformat} org.apache.hive.jdbc.TestMultiSessionsHS2WithLocalClusterSpark.testSparkQuery {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4432/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4432/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4432/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 1 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12742558 - PreCommit-HIVE-TRUNK-Build auto compute PROJ_HOME in hcatalog/src/test/e2e/templeton/deployers/env.sh -- Key: HIVE-11140 URL: https://issues.apache.org/jira/browse/HIVE-11140 Project: Hive Issue Type: Bug Components: Tests, WebHCat Affects Versions: 1.0.0 Reporter: Eugene Koifman Assignee: Eugene Koifman Attachments: HIVE-11140.patch, HIVE-11140.patch it's currently set as {noformat} if [ -z ${PROJ_HOME} ]; then export PROJ_HOME=/Users/${USER}/dev/hive fi {noformat} but it always points to project root so can be {{export PROJ_HOME=../../../../../..}} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11140) auto compute PROJ_HOME in hcatalog/src/test/e2e/templeton/deployers/env.sh
[ https://issues.apache.org/jira/browse/HIVE-11140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14606555#comment-14606555 ] Hive QA commented on HIVE-11140: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12742568/HIVE-11140.patch {color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 9034 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_schemeAuthority {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4433/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4433/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4433/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 1 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12742568 - PreCommit-HIVE-TRUNK-Build auto compute PROJ_HOME in hcatalog/src/test/e2e/templeton/deployers/env.sh -- Key: HIVE-11140 URL: https://issues.apache.org/jira/browse/HIVE-11140 Project: Hive Issue Type: Bug Components: Tests, WebHCat Affects Versions: 1.0.0 Reporter: Eugene Koifman Assignee: Eugene Koifman Attachments: HIVE-11140.patch, HIVE-11140.patch it's currently set as {noformat} if [ -z ${PROJ_HOME} ]; then export PROJ_HOME=/Users/${USER}/dev/hive fi {noformat} but it always points to project root so can be {{export PROJ_HOME=../../../../../..}} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11068) Hive throws OOM in client side
[ https://issues.apache.org/jira/browse/HIVE-11068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14606602#comment-14606602 ] Prasanth Jayachandran commented on HIVE-11068: -- Yes. It is. Hive throws OOM in client side -- Key: HIVE-11068 URL: https://issues.apache.org/jira/browse/HIVE-11068 Project: Hive Issue Type: Bug Reporter: Rajesh Balamohan Assignee: Prasanth Jayachandran Attachments: Yourkit_String.png, Yourkit_TablScanDesc.png, hive_cli_debug.log.gz Hive build: (Latest on Jun 21, commit 142426394cfdc8a1fea51f7642c63f43f36b0333). Query: Query 64 of TPC-DS (https://github.com/cartershanklin/hive-testbench/blob/master/sample-queries-tpcds/query64.sql) Hive throws the following OOM on the client side. {noformat}
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
	at java.lang.StringCoding$StringDecoder.decode(StringCoding.java:149)
	at java.lang.StringCoding.decode(StringCoding.java:193)
	at java.lang.String.<init>(String.java:414)
	at java.lang.String.<init>(String.java:479)
	at org.apache.hadoop.hive.ql.exec.Utilities.serializeExpression(Utilities.java:799)
	at org.apache.hadoop.hive.ql.plan.TableScanDesc.setFilterExpr(TableScanDesc.java:153)
	at org.apache.hadoop.hive.ql.ppd.OpProcFactory.pushFilterToStorageHandler(OpProcFactory.java:901)
	at org.apache.hadoop.hive.ql.ppd.OpProcFactory.createFilter(OpProcFactory.java:818)
	at org.apache.hadoop.hive.ql.ppd.OpProcFactory.createFilter(OpProcFactory.java:788)
	at org.apache.hadoop.hive.ql.ppd.OpProcFactory$TableScanPPD.process(OpProcFactory.java:388)
	at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
	at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:95)
	at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:79)
	at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.walk(DefaultGraphWalker.java:133)
	at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:110)
	at org.apache.hadoop.hive.ql.ppd.PredicatePushDown.transform(PredicatePushDown.java:135)
	at org.apache.hadoop.hive.ql.optimizer.Optimizer.optimize(Optimizer.java:192)
	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10171)
	at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:207)
	at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:227)
	at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:424)
	at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:308)
	at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1124)
	at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1172)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1061)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1051)
	at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:213)
	at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:165)
	at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:376)
	at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:311)
	at org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:409)
	at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:425)
{noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11143) Tests udf_from_utc_timestamp.q/udf_to_utc_timestamp.q do not work with updated Java timezone information
[ https://issues.apache.org/jira/browse/HIVE-11143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Dere updated HIVE-11143: -- Attachment: HIVE-11143.1.patch Attaching patch v1. This changes the year used in the tests from 2015 to 2012, before the time zone changes. Tests udf_from_utc_timestamp.q/udf_to_utc_timestamp.q do not work with updated Java timezone information Key: HIVE-11143 URL: https://issues.apache.org/jira/browse/HIVE-11143 Project: Hive Issue Type: Bug Components: Tests Reporter: Jason Dere Assignee: Jason Dere Attachments: HIVE-11143.1.patch It looks like there were recent changes to the Europe/Moscow time zone in 2014. When udf_from_utc_timestamp.q/udf_to_utc_timestamp.q are run with more recent versions of JDK or with an updated time zone database, the tests fail. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
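The fix makes sense because Europe/Moscow changed offsets recently: with up-to-date tzdata it is UTC+4 throughout 2012 but UTC+3 after October 2014, so expected test outputs computed for 2015 timestamps differ between old and new JDK timezone databases, while 2012 timestamps are stable. A small java.time illustration (not the test code itself; offsets shown assume current tzdata):

```java
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;

public class MoscowOffset {

  // Offset of Europe/Moscow at the start of the given year, per the
  // JDK's bundled timezone database.
  public static ZoneOffset offsetAt(int year) {
    return ZoneId.of("Europe/Moscow").getRules()
        .getOffset(LocalDateTime.of(year, 1, 1, 0, 0));
  }

  public static void main(String[] args) {
    System.out.println(offsetAt(2012)); // +04:00 with current tzdata
    System.out.println(offsetAt(2015)); // +03:00 with current tzdata
  }
}
```

Pinning the test year to 2012 keeps the expected offset the same regardless of whether the JDK's tzdata predates the October 2014 change.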
[jira] [Updated] (HIVE-11141) Improve RuleRegExp when the Expression node stack gets huge
[ https://issues.apache.org/jira/browse/HIVE-11141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hari Sankar Sivarama Subramaniyan updated HIVE-11141: - Description: Hive occasionally gets bottlenecked on generating plans for large queries; in the majority of cases the time is spent fetching metadata and partitions and in other optimizer transformation related rules. I have attached the query for the test case, which needs to be tested after we set up the database as shown below.
{code}
create database dataset_3;
use dataset_3;
{code}
createtable.rtf - create table command
SQLQuery10.sql.mssql - explain query
The most problematic part of the code, as the stack gets arbitrarily long, seems to be in RuleRegExp.java:
{code}
@Override
public int cost(Stack<Node> stack) throws SemanticException {
  int numElems = (stack != null ? stack.size() : 0);
  String name = "";
  for (int pos = numElems - 1; pos >= 0; pos--) {
    name = stack.get(pos).getName() + "%" + name;
    Matcher m = pattern.matcher(name);
    if (m.matches()) {
      return m.group().length();
    }
  }
  return -1;
}
{code}
was: Hive occasionally gets bottlenecked on generating plans for large queries; in the majority of cases the time is spent fetching metadata and partitions and in other optimizer transformation related rules. I have attached the query for the test case, which needs to be tested after we set up the database as shown below.
{code}
create database dataset_3;
use dataset_3;
{code}
createtable.rtf - create table command
SQLQuery10.sql.mssql - explain query
The most problematic part of the code, as the stack gets arbitrarily long, seems to be in RuleRegExp.java:
{code}
@Override
public int cost(Stack<Node> stack) throws SemanticException {
  int numElems = (stack != null ? stack.size() : 0);
  String name = "";
  for (int pos = numElems - 1; pos >= 0; pos--) {
    name = stack.get(pos).getName() + "%" + name;
    Matcher m = pattern.matcher(name);
    if (m.matches()) {
      return m.group().length();
    }
  }
  return -1;
}
{code}
Improve RuleRegExp when the Expression node stack gets huge --- Key: HIVE-11141 URL: https://issues.apache.org/jira/browse/HIVE-11141 Project: Hive Issue Type: Bug Reporter: Hari Sankar Sivarama Subramaniyan Assignee: Hari Sankar Sivarama Subramaniyan Attachments: HIVE-11141.1.patch, HIVE-11141.2.patch, SQLQuery10.sql.mssql, createtable.rtf
Hive occasionally gets bottlenecked on generating plans for large queries; in the majority of cases the time is spent fetching metadata and partitions and in other optimizer transformation related rules. I have attached the query for the test case, which needs to be tested after we set up the database as shown below.
{code}
create database dataset_3;
use dataset_3;
{code}
createtable.rtf - create table command
SQLQuery10.sql.mssql - explain query
The most problematic part of the code, as the stack gets arbitrarily long, seems to be in RuleRegExp.java:
{code}
@Override
public int cost(Stack<Node> stack) throws SemanticException {
  int numElems = (stack != null ? stack.size() : 0);
  String name = "";
  for (int pos = numElems - 1; pos >= 0; pos--) {
    name = stack.get(pos).getName() + "%" + name;
    Matcher m = pattern.matcher(name);
    if (m.matches()) {
      return m.group().length();
    }
  }
  return -1;
}
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
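To see why this loop hurts on deep stacks: each iteration re-prepends to the joined name and re-runs the full regex, so an n-deep stack does O(n^2) character work per rule. Below is a hedged stand-alone sketch of one way to cheapen the common case — not the attached patch — using plain strings in place of Hive's Node stack: for rule patterns that are simple '%'-joined literals, a direct string comparison with an early exit replaces the Matcher entirely.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical stand-in for RuleRegExp.cost(); strings replace Node.getName(). */
public class RuleCostDemo {

    // Mirrors the original loop: rebuilds the joined name and re-matches every iteration.
    public static int costWithRegex(Pattern pattern, List<String> stack) {
        int numElems = (stack != null ? stack.size() : 0);
        String name = "";
        for (int pos = numElems - 1; pos >= 0; pos--) {
            name = stack.get(pos) + "%" + name;
            Matcher m = pattern.matcher(name);
            if (m.matches()) {
                return m.group().length();
            }
        }
        return -1;
    }

    // Fast path for literal patterns such as "TS%FIL%": the joined name only grows,
    // so once it is longer than the pattern no later iteration can possibly match.
    public static int costLiteral(String literalPattern, List<String> stack) {
        int numElems = (stack != null ? stack.size() : 0);
        StringBuilder name = new StringBuilder();
        for (int pos = numElems - 1; pos >= 0; pos--) {
            name.insert(0, stack.get(pos) + "%");
            if (name.length() > literalPattern.length()) {
                return -1;  // can never match again; bail out early
            }
            if (name.toString().equals(literalPattern)) {
                return name.length();
            }
        }
        return -1;
    }
}
```

For a pattern with no regex metacharacters the two methods agree, but the literal path never compiles or runs a Matcher and exits as soon as the joined name outgrows the pattern.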
[jira] [Commented] (HIVE-10410) Apparent race condition in HiveServer2 causing intermittent query failures
[ https://issues.apache.org/jira/browse/HIVE-10410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14606525#comment-14606525 ] Jimmy Xiang commented on HIVE-10410: HIVE-10956 fixed some HiveMetaStoreClient sync issues. It should help, in case it is a race to the HMS. Apparent race condition in HiveServer2 causing intermittent query failures -- Key: HIVE-10410 URL: https://issues.apache.org/jira/browse/HIVE-10410 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 0.13.1 Environment: CDH 5.3.3 CentOS 6.4 Reporter: Richard Williams Attachments: HIVE-10410.1.patch On our secure Hadoop cluster, queries submitted to HiveServer2 through JDBC occasionally trigger odd Thrift exceptions with messages such as "Read a negative frame size (-2147418110)!" or "out of sequence response" in HiveServer2's connections to the metastore. For certain metastore calls (for example, showDatabases), these Thrift exceptions are converted to MetaExceptions in HiveMetaStoreClient, which prevents RetryingMetaStoreClient from retrying these calls and thus causes the failure to bubble out to the JDBC client. Note that as far as we can tell, this issue appears to only affect queries that are submitted with the runAsync flag on TExecuteStatementReq set to true (which, in practice, seems to mean all JDBC queries), and it appears to only manifest when HiveServer2 is using the new HTTP transport mechanism. When both these conditions hold, we are able to fairly reliably reproduce the issue by spawning about 100 simple, concurrent Hive queries (we have been using {{show databases}}), two or three of which typically fail. However, when either of these conditions does not hold, we are no longer able to reproduce the issue. Some example stack traces from the HiveServer2 logs: {noformat} 2015-04-16 13:54:55,486 ERROR hive.log: Got exception: org.apache.thrift.transport.TTransportException Read a negative frame size (-2147418110)! 
org.apache.thrift.transport.TTransportException: Read a negative frame size (-2147418110)! at org.apache.thrift.transport.TSaslTransport.readFrame(TSaslTransport.java:435) at org.apache.thrift.transport.TSaslTransport.read(TSaslTransport.java:414) at org.apache.thrift.transport.TSaslClientTransport.read(TSaslClientTransport.java:37) at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84) at org.apache.hadoop.hive.thrift.TFilterTransport.readAll(TFilterTransport.java:62) at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:378) at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:297) at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:204) at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69) at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_get_databases(ThriftHiveMetastore.java:600) at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.get_databases(ThriftHiveMetastore.java:587) at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getDatabases(HiveMetaStoreClient.java:837) at org.apache.sentry.binding.metastore.SentryHiveMetaStoreClient.getDatabases(SentryHiveMetaStoreClient.java:60) at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:90) at com.sun.proxy.$Proxy6.getDatabases(Unknown Source) at org.apache.hadoop.hive.ql.metadata.Hive.getDatabasesByPattern(Hive.java:1139) at org.apache.hadoop.hive.ql.exec.DDLTask.showDatabases(DDLTask.java:2445) at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:364) at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:153) at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:85) at 
org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1554) at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1321) at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1139) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:962) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:957) at org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:145) at
[jira] [Commented] (HIVE-11138) Query fails when there isn't a comparator for an operator [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-11138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14605216#comment-14605216 ] Rui Li commented on HIVE-11138: --- cc [~chengxiang li], [~xuefuz] Query fails when there isn't a comparator for an operator [Spark Branch] Key: HIVE-11138 URL: https://issues.apache.org/jira/browse/HIVE-11138 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Rui Li Assignee: Rui Li Attachments: HIVE-11138.1-spark.patch In such a case, OperatorComparatorFactory should default to false instead of throwing an exception. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
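The proposed behavior can be sketched with a minimal comparator registry — hypothetical names, not the actual Hive classes: when a comparator is missing for an operator type, report "not equal" (false) so the planner simply skips the optimization instead of failing the query.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiPredicate;

/** Sketch of a comparator registry that defaults to false for unknown types. */
public class ComparatorRegistry {
    private final Map<Class<?>, BiPredicate<Object, Object>> comparators = new HashMap<>();

    public void register(Class<?> cls, BiPredicate<Object, Object> cmp) {
        comparators.put(cls, cmp);
    }

    public boolean equalOperators(Object op1, Object op2) {
        if (op1.getClass() != op2.getClass()) {
            return false;
        }
        BiPredicate<Object, Object> cmp = comparators.get(op1.getClass());
        // The HIVE-11138 idea: no registered comparator means "assume not equal"
        // rather than throwing, so the query falls back to the unoptimized plan.
        return cmp != null && cmp.test(op1, op2);
    }
}
```

Returning false for an unknown type is conservative: at worst an equivalent pair of operators goes unmerged, which costs performance but never correctness.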
[jira] [Updated] (HIVE-11138) Query fails when there isn't a comparator for an operator [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-11138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rui Li updated HIVE-11138: -- Attachment: HIVE-11138.1-spark.patch Query fails when there isn't a comparator for an operator [Spark Branch] Key: HIVE-11138 URL: https://issues.apache.org/jira/browse/HIVE-11138 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Rui Li Assignee: Rui Li Attachments: HIVE-11138.1-spark.patch In such a case, OperatorComparatorFactory should default to false instead of throwing an exception. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11138) Query fails when there isn't a comparator for an operator [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-11138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rui Li updated HIVE-11138: -- Attachment: HIVE-11138.1-spark.patch Can't reproduce the failures locally. Trying again. Query fails when there isn't a comparator for an operator [Spark Branch] Key: HIVE-11138 URL: https://issues.apache.org/jira/browse/HIVE-11138 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Rui Li Assignee: Rui Li Attachments: HIVE-11138.1-spark.patch, HIVE-11138.1-spark.patch In such a case, OperatorComparatorFactory should default to false instead of throwing an exception. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11014) LLAP: some MiniTez tests have result changes compared to master
[ https://issues.apache.org/jira/browse/HIVE-11014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-11014: Summary: LLAP: some MiniTez tests have result changes compared to master (was: LLAP: MiniTez vector_binary_join_groupby, vector_outer_join1, vector_outer_join2 and cbo_windowing tests have result changes compared to master) LLAP: some MiniTez tests have result changes compared to master --- Key: HIVE-11014 URL: https://issues.apache.org/jira/browse/HIVE-11014 Project: Hive Issue Type: Sub-task Reporter: Sergey Shelukhin Assignee: Matt McCline -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11112) ISO-8859-1 text output has fragments of previous longer rows appended
[ https://issues.apache.org/jira/browse/HIVE-11112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14606922#comment-14606922 ] xiaowei wang commented on HIVE-11112: - I have added a test case in HIVE-11095, so I need a code review. The test has passed. Thanks! ISO-8859-1 text output has fragments of previous longer rows appended - Key: HIVE-11112 URL: https://issues.apache.org/jira/browse/HIVE-11112 Project: Hive Issue Type: Bug Components: Serializers/Deserializers Affects Versions: 1.2.0 Reporter: Yongzhi Chen Assignee: Yongzhi Chen Fix For: 1.3.0, 2.0.0 Attachments: HIVE-11112.1.patch If a LazySimpleSerDe table is created using ISO 8859-1 encoding, query results for a string column are incorrect for any row that was preceded by a row containing a longer string. Example steps to reproduce: 1. Create a table using ISO 8859-1 encoding: {code:sql} CREATE TABLE person_lat1 (name STRING) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' WITH SERDEPROPERTIES ('serialization.encoding'='ISO8859_1'); {code} 2. Copy an ISO-8859-1 encoded text file into the appropriate warehouse folder in HDFS. I'll attach an example file containing the following text:
{noformat}
Müller,Thomas
Jørgensen,Jørgen
Peña,Andrés
Nåm,Fæk
{noformat}
3. Execute {{SELECT * FROM person_lat1}} Result - The following output appears:
{noformat}
+-------------------+--+
| person_lat1.name  |
+-------------------+--+
| Müller,Thomas     |
| Jørgensen,Jørgen  |
| Peña,Andrésørgen  |
| Nåm,Fækdrésørgen  |
+-------------------+--+
{noformat}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11108) HashTableSinkOperator doesn't support vectorization [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-11108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14606938#comment-14606938 ] Hive QA commented on HIVE-11108: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12742687/HIVE-11108.2-spark.patch {color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 7992 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.initializationError org.apache.hive.hcatalog.streaming.TestStreaming.testAddPartition org.apache.hive.jdbc.TestSSL.testSSLConnectionWithProperty org.apache.hive.jdbc.TestSSL.testSSLVersion {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/916/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/916/console Test logs: http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-916/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 4 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12742687 - PreCommit-HIVE-SPARK-Build HashTableSinkOperator doesn't support vectorization [Spark Branch] -- Key: HIVE-11108 URL: https://issues.apache.org/jira/browse/HIVE-11108 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Rui Li Assignee: Rui Li Attachments: HIVE-11108.1-spark.patch, HIVE-11108.2-spark.patch This prevents any BaseWork containing HTS from being vectorized. It's basically specific to spark, because Tez doesn't use HTS and MR runs HTS in local tasks. We should verify if it makes sense to make HTS support vectorization. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-7723) Explain plan for complex query with lots of partitions is slow due to in-efficient collection used to find a matching ReadEntity
[ https://issues.apache.org/jira/browse/HIVE-7723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Laljo John Pullokkaran updated HIVE-7723: - Assignee: Hari Sankar Sivarama Subramaniyan (was: Mostafa Mokhtar) Explain plan for complex query with lots of partitions is slow due to in-efficient collection used to find a matching ReadEntity Key: HIVE-7723 URL: https://issues.apache.org/jira/browse/HIVE-7723 Project: Hive Issue Type: Bug Components: CLI, Physical Optimizer Affects Versions: 0.13.1 Reporter: Mostafa Mokhtar Assignee: Hari Sankar Sivarama Subramaniyan Attachments: HIVE-7723.1.patch, HIVE-7723.10.patch, HIVE-7723.11.patch, HIVE-7723.2.patch, HIVE-7723.3.patch, HIVE-7723.4.patch, HIVE-7723.5.patch, HIVE-7723.6.patch, HIVE-7723.7.patch, HIVE-7723.8.patch, HIVE-7723.9.patch Explain on TPC-DS query 64 took 11 seconds; when the CLI was profiled, it showed that ReadEntity.equals is taking ~40% of the CPU. ReadEntity.equals is called from the snippet below. Again and again the set is iterated over to get the actual match; a HashMap is a better option for this case, as Set doesn't have a get method. Also, for ReadEntity, equals is case-insensitive while hashCode is case-sensitive, which is undesired behavior.
{code}
public static ReadEntity addInput(Set<ReadEntity> inputs, ReadEntity newInput) {
  // If the input is already present, make sure the new parent is added to the input.
  if (inputs.contains(newInput)) {
    for (ReadEntity input : inputs) {
      if (input.equals(newInput)) {
        if ((newInput.getParents() != null) && (!newInput.getParents().isEmpty())) {
          input.getParents().addAll(newInput.getParents());
          input.setDirect(input.isDirect() || newInput.isDirect());
        }
        return input;
      }
    }
    assert false;
  } else {
    inputs.add(newInput);
    return newInput;
  }
  // make compile happy
  return null;
}
{code}
This is the query used: {code} select cs1.product_name ,cs1.store_name ,cs1.store_zip ,cs1.b_street_number ,cs1.b_streen_name ,cs1.b_city ,cs1.b_zip ,cs1.c_street_number ,cs1.c_street_name ,cs1.c_city ,cs1.c_zip ,cs1.syear ,cs1.cnt ,cs1.s1 ,cs1.s2 ,cs1.s3 ,cs2.s1 ,cs2.s2 ,cs2.s3 ,cs2.syear ,cs2.cnt from (select i_product_name as product_name ,i_item_sk as item_sk ,s_store_name as store_name ,s_zip as store_zip ,ad1.ca_street_number as b_street_number ,ad1.ca_street_name as b_streen_name ,ad1.ca_city as b_city ,ad1.ca_zip as b_zip ,ad2.ca_street_number as c_street_number ,ad2.ca_street_name as c_street_name ,ad2.ca_city as c_city ,ad2.ca_zip as c_zip ,d1.d_year as syear ,d2.d_year as fsyear ,d3.d_year as s2year ,count(*) as cnt ,sum(ss_wholesale_cost) as s1 ,sum(ss_list_price) as s2 ,sum(ss_coupon_amt) as s3 FROM store_sales JOIN store_returns ON store_sales.ss_item_sk = store_returns.sr_item_sk and store_sales.ss_ticket_number = store_returns.sr_ticket_number JOIN customer ON store_sales.ss_customer_sk = customer.c_customer_sk JOIN date_dim d1 ON store_sales.ss_sold_date_sk = d1.d_date_sk JOIN date_dim d2 ON customer.c_first_sales_date_sk = d2.d_date_sk JOIN date_dim d3 ON customer.c_first_shipto_date_sk = d3.d_date_sk JOIN store ON store_sales.ss_store_sk = store.s_store_sk JOIN customer_demographics cd1 ON store_sales.ss_cdemo_sk= cd1.cd_demo_sk JOIN customer_demographics cd2 ON customer.c_current_cdemo_sk = cd2.cd_demo_sk JOIN promotion ON store_sales.ss_promo_sk = promotion.p_promo_sk JOIN household_demographics hd1 ON store_sales.ss_hdemo_sk = hd1.hd_demo_sk 
JOIN household_demographics hd2 ON customer.c_current_hdemo_sk = hd2.hd_demo_sk JOIN customer_address ad1 ON store_sales.ss_addr_sk = ad1.ca_address_sk JOIN customer_address ad2 ON customer.c_current_addr_sk = ad2.ca_address_sk JOIN income_band ib1 ON hd1.hd_income_band_sk = ib1.ib_income_band_sk JOIN income_band ib2 ON hd2.hd_income_band_sk = ib2.ib_income_band_sk JOIN item ON store_sales.ss_item_sk = item.i_item_sk JOIN (select cs_item_sk ,sum(cs_ext_list_price) as sale,sum(cr_refunded_cash+cr_reversed_charge+cr_store_credit) as refund from catalog_sales JOIN catalog_returns ON catalog_sales.cs_item_sk = catalog_returns.cr_item_sk and catalog_sales.cs_order_number = catalog_returns.cr_order_number group by cs_item_sk having sum(cs_ext_list_price) > 2*sum(cr_refunded_cash+cr_reversed_charge+cr_store_credit)) cs_ui ON store_sales.ss_item_sk =
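The O(1) lookup the description asks for can be sketched by keying the entities in a map. This is hypothetical stand-alone code with a toy entity class in place of ReadEntity, normalizing the key so lookup matches the case-insensitive equals:

```java
import java.util.Locale;
import java.util.Map;

/** Toy stand-in for ReadEntity, keyed by a case-insensitive name. */
public class EntityMapDemo {
    public static final class Entity {
        public final String name;
        public int parents;  // stands in for the parent set being merged

        public Entity(String name) { this.name = name; }
    }

    // A map keyed by the normalized name gives O(1) lookup, unlike scanning a Set
    // whose contains() hit still forces a linear search to retrieve the element.
    public static Entity addInput(Map<String, Entity> inputs, Entity newInput) {
        String key = newInput.name.toLowerCase(Locale.ROOT);
        Entity existing = inputs.get(key);
        if (existing != null) {
            // Merge the new parents into the existing entry, as the original loop did.
            existing.parents += newInput.parents;
            return existing;
        }
        inputs.put(key, newInput);
        return newInput;
    }
}
```

Lower-casing the key also removes the equals/hashCode mismatch the description complains about: two names differing only in case now land on the same map entry.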
[jira] [Assigned] (HIVE-11102) ReaderImpl: getColumnIndicesFromNames does not work for ACID tables
[ https://issues.apache.org/jira/browse/HIVE-11102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin reassigned HIVE-11102: --- Assignee: Sergey Shelukhin (was: Gopal V) ReaderImpl: getColumnIndicesFromNames does not work for ACID tables --- Key: HIVE-11102 URL: https://issues.apache.org/jira/browse/HIVE-11102 Project: Hive Issue Type: Bug Components: File Formats Affects Versions: 1.3.0, 1.2.1, 2.0.0 Reporter: Gopal V Assignee: Sergey Shelukhin ORC reader impl does not estimate the size of ACID data files correctly. {code} Caused by: java.lang.IndexOutOfBoundsException: Index: 0 at java.util.Collections$EmptyList.get(Collections.java:3212) at org.apache.hadoop.hive.ql.io.orc.OrcProto$Type.getSubtypes(OrcProto.java:12240) at org.apache.hadoop.hive.ql.io.orc.ReaderImpl.getColumnIndicesFromNames(ReaderImpl.java:651) at org.apache.hadoop.hive.ql.io.orc.ReaderImpl.getRawDataSizeOfColumns(ReaderImpl.java:634) at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.populateAndCacheStripeDetails(OrcInputFormat.java:938) at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.call(OrcInputFormat.java:847) at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.call(OrcInputFormat.java:713) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:744) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11014) LLAP: some MiniTez tests have result changes compared to master
[ https://issues.apache.org/jira/browse/HIVE-11014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14606810#comment-14606810 ] Sergey Shelukhin commented on HIVE-11014: - Looks like this no longer happens after recent master merge LLAP: some MiniTez tests have result changes compared to master --- Key: HIVE-11014 URL: https://issues.apache.org/jira/browse/HIVE-11014 Project: Hive Issue Type: Sub-task Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin -vector_binary_join_groupby, vector_outer_join1, vector_outer_join2- and cbo_windowing -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11147) MetaTool doesn't update FS root location for partitions with space in name
[ https://issues.apache.org/jira/browse/HIVE-11147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Zheng updated HIVE-11147: - Attachment: HIVE-11147.01.patch Attach patch 01 MetaTool doesn't update FS root location for partitions with space in name -- Key: HIVE-11147 URL: https://issues.apache.org/jira/browse/HIVE-11147 Project: Hive Issue Type: Bug Components: Metastore Reporter: Wei Zheng Assignee: Wei Zheng Attachments: HIVE-11147.01.patch Problem happens when trying to update the FS root location: {code} # HIVE_CONF_DIR=/etc/hive/conf.server/ hive --service metatool -dryRun -updateLocation hdfs://mycluster hdfs://c6401.ambari.apache.org:8020 ... Looking for LOCATION_URI field in DBS table to update.. Dry Run of updateLocation on table DBS.. old location: hdfs://c6401.ambari.apache.org:8020/apps/hive/warehouse new location: hdfs://mycluster/apps/hive/warehouse Found 1 records in DBS table to update Looking for LOCATION field in SDS table to update.. Dry Run of updateLocation on table SDS.. old location: hdfs://c6401.ambari.apache.org:8020/apps/hive/warehouse/web_sales/ws_web_site_sk=12 new location: hdfs://mycluster/apps/hive/warehouse/web_sales/ws_web_site_sk=12 old location: hdfs://c6401.ambari.apache.org:8020/apps/hive/warehouse/web_sales/ws_web_site_sk=13 new location: hdfs://mycluster/apps/hive/warehouse/web_sales/ws_web_site_sk=13 ... Found 143 records in SDS table to update Warning: Found records with bad LOCATION in SDS table.. 
bad location URI: hdfs://c6401.ambari.apache.org:8020/apps/hive/warehouse/customer_demographics/cd_education_status=Advanced Degree
bad location URI: hdfs://c6401.ambari.apache.org:8020/apps/hive/warehouse/customer_demographics/cd_education_status=Advanced Degree
bad location URI: hdfs://c6401.ambari.apache.org:8020/apps/hive/warehouse/customer_demographics/cd_education_status=4 yr Degree
bad location URI: hdfs://c6401.ambari.apache.org:8020/apps/hive/warehouse/customer_demographics/cd_education_status=4 yr Degree
bad location URI: hdfs://c6401.ambari.apache.org:8020/apps/hive/warehouse/customer_demographics/cd_education_status=2 yr Degree
bad location URI: hdfs://c6401.ambari.apache.org:8020/apps/hive/warehouse/customer_demographics/cd_education_status=2 yr Degree
{code}
The reason some entries are marked as bad locations is that they contain a space character in the partition name. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
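The space in a partition name like cd_education_status=Advanced Degree is what trips the URI handling: java.net.URI rejects a raw space, so such a location must be percent-encoded before parsing. A small illustration, with made-up host and paths:

```java
import java.net.URI;
import java.net.URISyntaxException;

/** Shows why locations with spaces are flagged as bad, and one encoding workaround. */
public class BadLocationDemo {
    public static boolean isParsableUri(String location) {
        try {
            new URI(location);  // single-argument URI constructor does no encoding
            return true;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    // Minimal fix sketch: encode spaces before handing the string to URI.
    public static boolean isParsableAfterEncoding(String location) {
        return isParsableUri(location.replace(" ", "%20"));
    }
}
```

The same location string parses fine once the space is encoded, which is why only partitions with spaces in their names show up in the "bad LOCATION" warning list.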
[jira] [Commented] (HIVE-11145) Remove OFFLINE and NO_DROP from tables and partitions
[ https://issues.apache.org/jira/browse/HIVE-11145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14606882#comment-14606882 ] Sergey Shelukhin commented on HIVE-11145: - Is it better to just do it on master? Remove OFFLINE and NO_DROP from tables and partitions - Key: HIVE-11145 URL: https://issues.apache.org/jira/browse/HIVE-11145 Project: Hive Issue Type: Improvement Components: Metastore, SQL Affects Versions: 2.0.0 Reporter: Alan Gates Assignee: Alan Gates Attachments: HIVE-11145.patch Currently a table or partition can be marked no_drop or offline. This prevents users from dropping or reading (and dropping) the table or partition. This was built in 0.7 before SQL standard authorization was an option. This is an expensive feature as when a table is dropped every partition must be fetched and checked to make sure it can be dropped. This feature is also redundant now that real authorization is available in Hive. This feature should be removed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11102) ReaderImpl: getColumnIndicesFromNames does not work for ACID tables
[ https://issues.apache.org/jira/browse/HIVE-11102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14606779#comment-14606779 ] Sergey Shelukhin commented on HIVE-11102: - The issue is actually that the column is not found. Adding this:
{noformat}
 if (fieldNames.contains(colName)) {
   fieldIdx = fieldNames.indexOf(colName);
+} else {
+  String s = "Cannot find field for: " + colName + " in ";
+  for (String fn : fieldNames) {
+    s += fn + ", ";
+  }
+  LOG.error(s);
+  continue;
 }
{noformat}
to one test that gets this on the llap branch after merge produces
{noformat}
2015-06-29 17:45:56,629 ERROR [ORC_GET_SPLITS #2] orc.ReaderImpl: Cannot find field for: ctinyint in _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8, _col9, _col10, _col11,
{noformat}
ReaderImpl: getColumnIndicesFromNames does not work for ACID tables --- Key: HIVE-11102 URL: https://issues.apache.org/jira/browse/HIVE-11102 Project: Hive Issue Type: Bug Components: File Formats Affects Versions: 1.3.0, 1.2.1, 2.0.0 Reporter: Gopal V Assignee: Sergey Shelukhin ORC reader impl does not estimate the size of ACID data files correctly. 
{code} Caused by: java.lang.IndexOutOfBoundsException: Index: 0 at java.util.Collections$EmptyList.get(Collections.java:3212) at org.apache.hadoop.hive.ql.io.orc.OrcProto$Type.getSubtypes(OrcProto.java:12240) at org.apache.hadoop.hive.ql.io.orc.ReaderImpl.getColumnIndicesFromNames(ReaderImpl.java:651) at org.apache.hadoop.hive.ql.io.orc.ReaderImpl.getRawDataSizeOfColumns(ReaderImpl.java:634) at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.populateAndCacheStripeDetails(OrcInputFormat.java:938) at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.call(OrcInputFormat.java:847) at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.call(OrcInputFormat.java:713) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:744) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11095) SerDeUtils another bug ,when Text is reused
[ https://issues.apache.org/jira/browse/HIVE-11095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14606912#comment-14606912 ] Hive QA commented on HIVE-11095: {color:green}Overall{color}: +1 all checks pass Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12742660/HIVE-11095.3.patch.txt {color:green}SUCCESS:{color} +1 9035 tests passed Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4436/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4436/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4436/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase {noformat} This message is automatically generated. ATTACHMENT ID: 12742660 - PreCommit-HIVE-TRUNK-Build SerDeUtils another bug ,when Text is reused Key: HIVE-11095 URL: https://issues.apache.org/jira/browse/HIVE-11095 Project: Hive Issue Type: Bug Components: API, CLI Affects Versions: 0.14.0, 1.0.0, 1.2.0 Environment: Hadoop 2.3.0-cdh5.0.0 Hive 0.14 Reporter: xiaowei wang Assignee: xiaowei wang Fix For: 2.0.0 Attachments: HIVE-11095.1.patch.txt, HIVE-11095.2.patch.txt, HIVE-11095.3.patch.txt
{noformat}
The method transformTextFromUTF8 has a bug: it invokes a bad method of Text, getBytes(). The getBytes method of Text returns the raw backing array; however, only the data up to Text.length is valid. A better way is to use copyBytes() if you need the returned array to be precisely the length of the data. But copyBytes was only added after hadoop1.
{noformat}
How I found this bug? 
When I query data from an LZO table, I find in the results that the length of the current row is always larger than the previous row, and sometimes the current row contains the contents of the previous row. For example, I execute a sql:
{code:sql}
select * from web_searchhub where logdate=2015061003
{code}
The result of the sql is shown below. Notice that the second row's content contains the first row's content.
{noformat}
INFO [03:00:05.589] HttpFrontServer::FrontSH msgRecv:Remote=/10.13.193.68:42098,session=3151,thread=254 2015061003
INFO [03:00:05.594] 18941e66-9962-44ad-81bc-3519f47ba274 session=901,thread=223ession=3151,thread=254 2015061003
{noformat}
The content of the original lzo file is shown below, just 2 rows.
{noformat}
INFO [03:00:05.635] b88e0473-7530-494c-82d8-e2d2ebd2666c_forweb session=3148,thread=285
INFO [03:00:05.635] HttpFrontServer::FrontSH msgRecv:Remote=/10.13.193.68:42095,session=3148,thread=285
{noformat}
I think this error is caused by the Text reuse, and I found the solution. Additionally, the table create sql is:
{code:sql}
CREATE EXTERNAL TABLE `web_searchhub`(`line` string)
PARTITIONED BY (`logdate` string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ' U'
WITH SERDEPROPERTIES ('serialization.encoding'='GBK')
STORED AS
  INPUTFORMAT "com.hadoop.mapred.DeprecatedLzoTextInputFormat"
  OUTPUTFORMAT "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat"
LOCATION 'viewfs://nsX/user/hive/warehouse/raw.db/web/web_searchhub';
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
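The getBytes()-vs-copyBytes() distinction is easy to demonstrate without Hadoop, using a growable buffer that is reused across records the way Text is. The GrowableBuffer class below is a hypothetical stand-in for org.apache.hadoop.io.Text, not the real class:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical stand-in for Text: a reused, growable byte buffer. */
public class GrowableBuffer {
    private byte[] bytes = new byte[0];
    private int length;

    public void set(String s) {
        byte[] src = s.getBytes(StandardCharsets.UTF_8);
        if (src.length > bytes.length) {
            bytes = new byte[src.length];  // grow on demand, but never shrink (like Text)
        }
        System.arraycopy(src, 0, bytes, 0, src.length);
        length = src.length;
    }

    // Like Text.getBytes(): returns the raw backing array, valid only up to length.
    public byte[] getBytes() { return bytes; }

    // Like Text.copyBytes(): returns exactly length bytes.
    public byte[] copyBytes() { return Arrays.copyOf(bytes, length); }

    public int getLength() { return length; }
}
```

After setting a long row and then a shorter one, decoding getBytes() in full reproduces the symptom reported above: the tail of the long row leaks into the short one, exactly the "Peña,Andrésørgen" pattern. Decoding copyBytes(), or honoring getLength(), does not.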
[jira] [Updated] (HIVE-11014) LLAP: some MiniTez tests have result changes compared to master
[ https://issues.apache.org/jira/browse/HIVE-11014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-11014: Description: vector_binary_join_groupby, -vector_outer_join1, vector_outer_join2- and cbo_windowing was: vector_binary_join_groupby, vector_outer_join1, vector_outer_join2 and cbo_windowing LLAP: some MiniTez tests have result changes compared to master --- Key: HIVE-11014 URL: https://issues.apache.org/jira/browse/HIVE-11014 Project: Hive Issue Type: Sub-task Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin vector_binary_join_groupby, -vector_outer_join1, vector_outer_join2- and cbo_windowing -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11014) LLAP: some MiniTez tests have result changes compared to master
[ https://issues.apache.org/jira/browse/HIVE-11014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-11014: Description: - vector_binary_join_groupby, vector_outer_join1, vector_outer_join2- and cbo_windowing was: vector_binary_join_groupby, -vector_outer_join1, vector_outer_join2- and cbo_windowing LLAP: some MiniTez tests have result changes compared to master --- Key: HIVE-11014 URL: https://issues.apache.org/jira/browse/HIVE-11014 Project: Hive Issue Type: Sub-task Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin - vector_binary_join_groupby, vector_outer_join1, vector_outer_join2- and cbo_windowing -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HIVE-11017) LLAP: disable the flaky TestLlapTaskSchedulerService test
[ https://issues.apache.org/jira/browse/HIVE-11017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin resolved HIVE-11017. - Resolution: Fixed Fix Version/s: llap LLAP: disable the flaky TestLlapTaskSchedulerService test -- Key: HIVE-11017 URL: https://issues.apache.org/jira/browse/HIVE-11017 Project: Hive Issue Type: Sub-task Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Fix For: llap It passes for me locally on both hadoop-1 and hadoop-2. On HiveQA, it fails: {noformat} java.lang.AssertionError: expected:6 but was:4 at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:743) at org.junit.Assert.assertEquals(Assert.java:118) at org.junit.Assert.assertEquals(Assert.java:555) at org.junit.Assert.assertEquals(Assert.java:542) at org.apache.tez.dag.app.rm.TestLlapTaskSchedulerService.testNodeReEnabled(TestLlapTaskSchedulerService.java:264) {noformat} For example http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4264/testReport/org.apache.tez.dag.app.rm/TestLlapTaskSchedulerService/testNodeReEnabled/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11108) HashTableSinkOperator doesn't support vectorization [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-11108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14606905#comment-14606905 ] Xuefu Zhang commented on HIVE-11108: +1 pending on test. HashTableSinkOperator doesn't support vectorization [Spark Branch] -- Key: HIVE-11108 URL: https://issues.apache.org/jira/browse/HIVE-11108 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Rui Li Assignee: Rui Li Attachments: HIVE-11108.1-spark.patch, HIVE-11108.2-spark.patch This prevents any BaseWork containing HTS from being vectorized. It's basically specific to spark, because Tez doesn't use HTS and MR runs HTS in local tasks. We should verify if it makes sense to make HTS support vectorization. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11095) SerDeUtils another bug, when Text is reused
[ https://issues.apache.org/jira/browse/HIVE-11095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14606921#comment-14606921 ] xiaowei wang commented on HIVE-11095: - [~xuefuz] I added a test case, so I need a code review. The tests have passed. SerDeUtils another bug, when Text is reused Key: HIVE-11095 URL: https://issues.apache.org/jira/browse/HIVE-11095 Project: Hive Issue Type: Bug Components: API, CLI Affects Versions: 0.14.0, 1.0.0, 1.2.0 Environment: Hadoop 2.3.0-cdh5.0.0 Hive 0.14 Reporter: xiaowei wang Assignee: xiaowei wang Fix For: 2.0.0 Attachments: HIVE-11095.1.patch.txt, HIVE-11095.2.patch.txt, HIVE-11095.3.patch.txt
{noformat}
The method transformTextFromUTF8 has a bug: it invokes a misleading method of Text, getBytes(). getBytes() returns the raw byte array of the Text, but only data up to Text.getLength() is valid. A better way is to use copyBytes() when the returned array must be precisely the length of the data; however, copyBytes() was only added after hadoop1.
{noformat}
How I found this bug: when I queried data from an LZO table, I found in the results that the current row was always longer than the previous row, and sometimes the current row contained the contents of the previous row. For example, I executed this SQL:
{code:sql}
select * from web_searchhub where logdate=2015061003
{code}
The result of the SQL is below. Notice that the second row contains the content of the first row.
{noformat}
INFO [03:00:05.589] HttpFrontServer::FrontSH msgRecv:Remote=/10.13.193.68:42098,session=3151,thread=254 2015061003
INFO [03:00:05.594] 18941e66-9962-44ad-81bc-3519f47ba274 session=901,thread=223ession=3151,thread=254 2015061003
{noformat}
The content of the original LZO file is below, just 2 rows:
{noformat}
INFO [03:00:05.635] b88e0473-7530-494c-82d8-e2d2ebd2666c_forweb session=3148,thread=285
INFO [03:00:05.635] HttpFrontServer::FrontSH msgRecv:Remote=/10.13.193.68:42095,session=3148,thread=285
{noformat}
I think this error is caused by the Text reuse, and I found the solution. Additionally, the table create SQL is:
{code:sql}
CREATE EXTERNAL TABLE `web_searchhub`(
  `line` string)
PARTITIONED BY (
  `logdate` string)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY ' U'
WITH SERDEPROPERTIES ('serialization.encoding'='GBK')
STORED AS
  INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION 'viewfs://nsX/user/hive/warehouse/raw.db/web/web_searchhub';
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
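The getBytes()/copyBytes() contract described above can be demonstrated without Hadoop. ReusedText below is a hypothetical minimal model of org.apache.hadoop.io.Text's grow-only backing buffer, not the real class; it shows why decoding the raw buffer appends fragments of a previous, longer row:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical minimal model of org.apache.hadoop.io.Text's grow-only
// backing buffer (not the real class): set() never shrinks the array,
// so bytes past `length` can hold leftovers of a previous, longer value.
class ReusedText {
    private byte[] bytes = new byte[0];
    private int length;

    void set(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        if (utf8.length > bytes.length) {
            bytes = new byte[utf8.length]; // grow only, never shrink
        }
        System.arraycopy(utf8, 0, bytes, 0, utf8.length);
        length = utf8.length;
    }

    byte[] getBytes() { return bytes; }                          // raw buffer, may be longer than the data
    int getLength() { return length; }
    byte[] copyBytes() { return Arrays.copyOf(bytes, length); }  // exactly the valid data
}

public class TextReuseDemo {
    public static void main(String[] args) {
        ReusedText t = new ReusedText();
        t.set("a long first row");
        t.set("short");
        // Decoding the raw buffer picks up stale bytes from the first row:
        System.out.println(new String(t.getBytes(), StandardCharsets.UTF_8));  // shortg first row
        System.out.println(new String(t.copyBytes(), StandardCharsets.UTF_8)); // short
    }
}
```

Decoding `getBytes()` without honoring `getLength()` reproduces exactly the "current row contains the previous row" symptom reported in the issue.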
[jira] [Updated] (HIVE-11102) ReaderImpl: getColumnIndicesFromNames does not work for ACID tables
[ https://issues.apache.org/jira/browse/HIVE-11102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-11102: Attachment: HIVE-11102.patch Patch that fixes the exception. The test that was failing on LLAP branch with this error now produces the same result as on master... [~prasanth_j] should there be a separate fix for why the column is not found? ReaderImpl: getColumnIndicesFromNames does not work for ACID tables --- Key: HIVE-11102 URL: https://issues.apache.org/jira/browse/HIVE-11102 Project: Hive Issue Type: Bug Components: File Formats Affects Versions: 1.3.0, 1.2.1, 2.0.0 Reporter: Gopal V Assignee: Sergey Shelukhin Attachments: HIVE-11102.patch ORC reader impl does not estimate the size of ACID data files correctly. {code} Caused by: java.lang.IndexOutOfBoundsException: Index: 0 at java.util.Collections$EmptyList.get(Collections.java:3212) at org.apache.hadoop.hive.ql.io.orc.OrcProto$Type.getSubtypes(OrcProto.java:12240) at org.apache.hadoop.hive.ql.io.orc.ReaderImpl.getColumnIndicesFromNames(ReaderImpl.java:651) at org.apache.hadoop.hive.ql.io.orc.ReaderImpl.getRawDataSizeOfColumns(ReaderImpl.java:634) at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.populateAndCacheStripeDetails(OrcInputFormat.java:938) at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.call(OrcInputFormat.java:847) at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.call(OrcInputFormat.java:713) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:744) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-4897) Hive should handle AlreadyExists on retries when creating tables/partitions
[ https://issues.apache.org/jira/browse/HIVE-4897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matthew Jacobs updated HIVE-4897: - Priority: Major (was: Minor) Description: Creating new tables/partitions may fail with an AlreadyExistsException if there is an error part way through the creation and the HMS tries again without properly cleaning up or checking if this is a retry. While partitioning a new table via a script on distributed hive (MetaStore on the same machine) there was a long timeout and then: {code} FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. AlreadyExistsException(message:Partition already exists:Partition( ... {code} I am assuming this is due to retry. Perhaps already-exists on retry could be handled better. A similar error occurred while creating a table through Impala, which issued a single createTable call that failed with an AlreadyExistsException. See the logs related to table tmp_proc_8_d2b7b0f133be455ca95615818b8a5879_7 in the attached hive-snippet.log was: While partitioning a new table via a script on distributed hive (MetaStore on the same machine) there was a long timeout and then: {code} FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. AlreadyExistsException(message:Partition already exists:Partition( ... {code} I am assuming this is due to retry. Perhaps already-exists on retry could be handled better. 
Summary: Hive should handle AlreadyExists on retries when creating tables/partitions (was: Hive should handle AlreadyExists on retries when creating partitions) Hive should handle AlreadyExists on retries when creating tables/partitions --- Key: HIVE-4897 URL: https://issues.apache.org/jira/browse/HIVE-4897 Project: Hive Issue Type: Bug Reporter: Sergey Shelukhin Attachments: hive-snippet.log Creating new tables/partitions may fail with an AlreadyExistsException if there is an error part way through the creation and the HMS tries again without properly cleaning up or checking if this is a retry. While partitioning a new table via a script on distributed hive (MetaStore on the same machine) there was a long timeout and then: {code} FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. AlreadyExistsException(message:Partition already exists:Partition( ... {code} I am assuming this is due to retry. Perhaps already-exists on retry could be handled better. A similar error occurred while creating a table through Impala, which issued a single createTable call that failed with an AlreadyExistsException. See the logs related to table tmp_proc_8_d2b7b0f133be455ca95615818b8a5879_7 in the attached hive-snippet.log -- This message was sent by Atlassian JIRA (v6.3.4#6332)
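One way "already-exists on retry could be handled better", sketched with a hypothetical in-memory store standing in for the HMS (RetrySafeCreate and createIfAbsentIdempotent are illustrative names, not Hive APIs): a retried create that finds an identical object left behind by a half-completed earlier attempt succeeds quietly, while a genuine conflict still fails.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch only: a ConcurrentHashMap models the metastore's partition table.
// The wrapper tolerates AlreadyExistsException when the stored object matches
// what this (re)try would have created, i.e. the error came from our own
// earlier, partially-completed attempt rather than a real conflict.
public class RetrySafeCreate {
    static class AlreadyExistsException extends Exception {}

    private final Map<String, String> partitions = new ConcurrentHashMap<>();

    void create(String name, String spec) throws AlreadyExistsException {
        if (partitions.putIfAbsent(name, spec) != null) {
            throw new AlreadyExistsException();
        }
    }

    // Retry-safe wrapper around create().
    boolean createIfAbsentIdempotent(String name, String spec) {
        try {
            create(name, spec);
            return true;
        } catch (AlreadyExistsException e) {
            // Treat already-exists as success only for an identical object.
            return spec.equals(partitions.get(name));
        }
    }

    public static void main(String[] args) {
        RetrySafeCreate hms = new RetrySafeCreate();
        System.out.println(hms.createIfAbsentIdempotent("logdate=2015061003", "specA")); // true
        System.out.println(hms.createIfAbsentIdempotent("logdate=2015061003", "specA")); // true (retry)
        System.out.println(hms.createIfAbsentIdempotent("logdate=2015061003", "specB")); // false (conflict)
    }
}
```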
[jira] [Assigned] (HIVE-11014) LLAP: some MiniTez tests have result changes compared to master
[ https://issues.apache.org/jira/browse/HIVE-11014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin reassigned HIVE-11014: --- Assignee: Sergey Shelukhin (was: Matt McCline) LLAP: some MiniTez tests have result changes compared to master --- Key: HIVE-11014 URL: https://issues.apache.org/jira/browse/HIVE-11014 Project: Hive Issue Type: Sub-task Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin vector_binary_join_groupby, vector_outer_join1, vector_outer_join2 and cbo_windowing -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11014) LLAP: some MiniTez tests have result changes compared to master
[ https://issues.apache.org/jira/browse/HIVE-11014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-11014: Description: -vector_binary_join_groupby, vector_outer_join1, vector_outer_join2- and cbo_windowing was: - vector_binary_join_groupby, vector_outer_join1, vector_outer_join2- and cbo_windowing LLAP: some MiniTez tests have result changes compared to master --- Key: HIVE-11014 URL: https://issues.apache.org/jira/browse/HIVE-11014 Project: Hive Issue Type: Sub-task Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin -vector_binary_join_groupby, vector_outer_join1, vector_outer_join2- and cbo_windowing -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11112) ISO-8859-1 text output has fragments of previous longer rows appended
[ https://issues.apache.org/jira/browse/HIVE-11112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14606926#comment-14606926 ] xiaowei wang commented on HIVE-11112: - The above is wrong. I will try to add a test case today for HIVE-11095. HIVE-11095 and HIVE-11112 cover different cases, but HIVE-10983 duplicates both. ISO-8859-1 text output has fragments of previous longer rows appended - Key: HIVE-11112 URL: https://issues.apache.org/jira/browse/HIVE-11112 Project: Hive Issue Type: Bug Components: Serializers/Deserializers Affects Versions: 1.2.0 Reporter: Yongzhi Chen Assignee: Yongzhi Chen Fix For: 1.3.0, 2.0.0 Attachments: HIVE-11112.1.patch If a LazySimpleSerDe table is created using ISO 8859-1 encoding, query results for a string column are incorrect for any row that was preceded by a row containing a longer string. Example steps to reproduce: 1. Create a table using ISO 8859-1 encoding:
{code:sql}
CREATE TABLE person_lat1 (name STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES ('serialization.encoding'='ISO8859_1');
{code}
2. Copy an ISO-8859-1 encoded text file into the appropriate warehouse folder in HDFS. I'll attach an example file containing the following text:
{noformat}
Müller,Thomas
Jørgensen,Jørgen
Peña,Andrés
Nåm,Fæk
{noformat}
3. Execute {{SELECT * FROM person_lat1}} Result - The following output appears:
{noformat}
+-------------------+
| person_lat1.name  |
+-------------------+
| Müller,Thomas     |
| Jørgensen,Jørgen  |
| Peña,Andrésørgen  |
| Nåm,Fækdrésørgen  |
+-------------------+
{noformat}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11095) SerDeUtils another bug, when Text is reused
[ https://issues.apache.org/jira/browse/HIVE-11095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xiaowei wang updated HIVE-11095: Attachment: HIVE-11095.3.patch.txt SerDeUtils another bug, when Text is reused Key: HIVE-11095 URL: https://issues.apache.org/jira/browse/HIVE-11095 Project: Hive Issue Type: Bug -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11014) LLAP: some MiniTez tests have result changes compared to master
[ https://issues.apache.org/jira/browse/HIVE-11014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-11014: Description: vector_binary_join_groupby, vector_outer_join1, vector_outer_join2 and cbo_windowing LLAP: some MiniTez tests have result changes compared to master --- Key: HIVE-11014 URL: https://issues.apache.org/jira/browse/HIVE-11014 Project: Hive Issue Type: Sub-task Reporter: Sergey Shelukhin Assignee: Matt McCline vector_binary_join_groupby, vector_outer_join1, vector_outer_join2 and cbo_windowing -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11095) SerDeUtils another bug, when Text is reused
[ https://issues.apache.org/jira/browse/HIVE-11095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14607202#comment-14607202 ] xiaowei wang commented on HIVE-11095: - Thanks! SerDeUtils another bug, when Text is reused Key: HIVE-11095 URL: https://issues.apache.org/jira/browse/HIVE-11095 Project: Hive Issue Type: Bug -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9566) HiveServer2 fails to start with NullPointerException
[ https://issues.apache.org/jira/browse/HIVE-9566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-9566: -- Attachment: HIVE-9566.patch Renamed patch to trigger the test run. HiveServer2 fails to start with NullPointerException Key: HIVE-9566 URL: https://issues.apache.org/jira/browse/HIVE-9566 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 0.13.0, 0.14.0, 0.13.1 Reporter: Na Yang Assignee: Na Yang Attachments: HIVE-9566-branch-0.13.patch, HIVE-9566-branch-0.14.patch, HIVE-9566-trunk.patch, HIVE-9566.patch hiveserver2 uses embedded metastore with default hive-site.xml configuration. I use hive --stop --service hiveserver2 command to stop the running hiveserver2 process and then use hive --start --service hiveserver2 command to start the hiveserver2 service. I see the following exception in the hive.log file {noformat} java.lang.NullPointerException at org.apache.hive.service.server.HiveServer2.stop(HiveServer2.java:104) at org.apache.hive.service.server.HiveServer2.startHiveServer2(HiveServer2.java:138) at org.apache.hive.service.server.HiveServer2.main(HiveServer2.java:171) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.RunJar.main(RunJar.java:212) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11095) SerDeUtils another bug, when Text is reused
[ https://issues.apache.org/jira/browse/HIVE-11095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14607540#comment-14607540 ] xiaowei wang commented on HIVE-11095: - Is there a problem? SerDeUtils another bug, when Text is reused Key: HIVE-11095 URL: https://issues.apache.org/jira/browse/HIVE-11095 Project: Hive Issue Type: Bug -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11145) Remove OFFLINE and NO_DROP from tables and partitions
[ https://issues.apache.org/jira/browse/HIVE-11145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14607500#comment-14607500 ] Alan Gates commented on HIVE-11145: --- Yes, I rebased it to master. Putting it on hbase-metastore was a mistake. Remove OFFLINE and NO_DROP from tables and partitions - Key: HIVE-11145 URL: https://issues.apache.org/jira/browse/HIVE-11145 Project: Hive Issue Type: Improvement Components: Metastore, SQL Affects Versions: 2.0.0 Reporter: Alan Gates Assignee: Alan Gates Attachments: HIVE-11145.patch Currently a table or partition can be marked no_drop or offline. This prevents users from dropping or reading (and dropping) the table or partition. This was built in 0.7 before SQL standard authorization was an option. This is an expensive feature as when a table is dropped every partition must be fetched and checked to make sure it can be dropped. This feature is also redundant now that real authorization is available in Hive. This feature should be removed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11138) Query fails when there isn't a comparator for an operator [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-11138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14607193#comment-14607193 ] Hive QA commented on HIVE-11138: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12742698/HIVE-11138.1-spark.patch {color:red}ERROR:{color} -1 due to 7 failed/errored test(s), 6222 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.initializationError org.apache.hadoop.hive.cli.TestHBaseCliDriver.initializationError org.apache.hadoop.hive.cli.TestMiniTezCliDriver.initializationError org.apache.hadoop.hive.cli.TestMinimrCliDriver.initializationError org.apache.hadoop.hive.cli.TestNegativeCliDriver.initializationError org.apache.hadoop.hive.cli.TestNegativeMinimrCliDriver.initializationError org.apache.hive.jdbc.TestSSL.testSSLFetchHttp {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/917/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/917/console Test logs: http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-917/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 7 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12742698 - PreCommit-HIVE-SPARK-Build Query fails when there isn't a comparator for an operator [Spark Branch] Key: HIVE-11138 URL: https://issues.apache.org/jira/browse/HIVE-11138 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Rui Li Assignee: Rui Li Attachments: HIVE-11138.1-spark.patch In such case, OperatorComparatorFactory should default to false instead of throw exceptions. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11095) SerDeUtils another bug, when Text is reused
[ https://issues.apache.org/jira/browse/HIVE-11095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14607197#comment-14607197 ] Xuefu Zhang commented on HIVE-11095: +1 SerDeUtils another bug, when Text is reused Key: HIVE-11095 URL: https://issues.apache.org/jira/browse/HIVE-11095 Project: Hive Issue Type: Bug -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10328) Enable new return path for cbo
[ https://issues.apache.org/jira/browse/HIVE-10328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14607549#comment-14607549 ] Hive QA commented on HIVE-10328: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12742669/HIVE-10328.6.patch {color:red}ERROR:{color} -1 due to 1342 failed/errored test(s), 8990 tests executed *Failed tests:* {noformat} TestCliDriver-groupby10.q-timestamp_comparison.q-tez_union.q-and-12-more - did not produce a TEST-*.xml file TestCliDriver-infer_bucket_sort_list_bucket.q-bucketmapjoin4.q-show_tables.q-and-12-more - did not produce a TEST-*.xml file TestCliDriver-skewjoinopt16.q-udf_in_file.q-mapjoin_filter_on_outerjoin.q-and-12-more - did not produce a TEST-*.xml file org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alias_casted_column org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_allcolref_in_udf org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_partition_coltype org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ambiguitycheck org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_analyze_table_null_partition org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_filter org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_groupby org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_groupby2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_join org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_join_pkfk org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_part org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_select org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_table org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_union 
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ansi_sql_arithmetic org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_array_map_access_nonconstant org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_authorization_explain org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join0 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join10 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join11 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join12 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join13 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join14 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join15 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join16 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join18 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join18_multi_distinct org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join19 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join20 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join22 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join24 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join25 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join26 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join27 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join3 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join30 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join32 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join33 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join4 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join5 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join6 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join7 
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join8 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join9 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join_filters org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join_nulls org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join_without_localtask org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_smb_mapjoin_14 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_10 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_11 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_12 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_14 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_15 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_2
[jira] [Updated] (HIVE-11140) auto compute PROJ_HOME in hcatalog/src/test/e2e/templeton/deployers/env.sh
[ https://issues.apache.org/jira/browse/HIVE-11140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-11140: -- Attachment: HIVE-11140.patch auto compute PROJ_HOME in hcatalog/src/test/e2e/templeton/deployers/env.sh -- Key: HIVE-11140 URL: https://issues.apache.org/jira/browse/HIVE-11140 Project: Hive Issue Type: Bug Components: Tests, WebHCat Affects Versions: 1.0.0 Reporter: Eugene Koifman Assignee: Eugene Koifman Attachments: HIVE-11140.patch, HIVE-11140.patch it's currently set as {noformat} if [ -z ${PROJ_HOME} ]; then export PROJ_HOME=/Users/${USER}/dev/hive fi {noformat} but it always points to project root so can be {{export PROJ_HOME=../../../../../..}} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11141) Improve RuleRegExp when the Expression node stack gets huge
[ https://issues.apache.org/jira/browse/HIVE-11141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hari Sankar Sivarama Subramaniyan updated HIVE-11141: - Description: More and more complex workloads are migrated to Hive from SQL Server, Teradata, etc., and occasionally Hive gets bottlenecked on generating plans for large queries; in the majority of cases the time is spent fetching metadata and partitions and in other optimizer transformation rules. I have attached the query for the test case, which needs to be run after we set up the database as shown below.
{code}
create database dataset_3;
use dataset_3;
{code}
createtable.rtf - create table command
SQLQuery10.sql.mssql - explain query
The most problematic part of the code, as the stack gets arbitrarily long, seems to be in RuleRegExp.java:
{code}
@Override
public int cost(Stack<Node> stack) throws SemanticException {
  int numElems = (stack != null ? stack.size() : 0);
  String name = "";
  for (int pos = numElems - 1; pos >= 0; pos--) {
    name = stack.get(pos).getName() + "%" + name;
    Matcher m = pattern.matcher(name);
    if (m.matches()) {
      return m.group().length();
    }
  }
  return -1;
}
{code}
was: More and more complex workloads are migrated to Hive from SQL Server, Teradata, etc., and occasionally Hive gets bottlenecked on generating plans for large queries; in the majority of cases the time is spent fetching metadata and partitions and in other optimizer transformation rules. I have attached the query for the test case, which needs to be run after we set up the database as shown below.
{code}
create database dataset_3;
use dataset_3;
{code}
The most problematic part of the code, as the stack gets arbitrarily long, seems to be in RuleRegExp.java:
{code}
@Override
public int cost(Stack<Node> stack) throws SemanticException {
  int numElems = (stack != null ? stack.size() : 0);
  String name = "";
  for (int pos = numElems - 1; pos >= 0; pos--) {
    name = stack.get(pos).getName() + "%" + name;
    Matcher m = pattern.matcher(name);
    if (m.matches()) {
      return m.group().length();
    }
  }
  return -1;
}
{code}
Improve RuleRegExp when the Expression node stack gets huge --- Key: HIVE-11141 URL: https://issues.apache.org/jira/browse/HIVE-11141 Project: Hive Issue Type: Bug Reporter: Hari Sankar Sivarama Subramaniyan Assignee: Hari Sankar Sivarama Subramaniyan Attachments: SQLQuery10.sql.mssql, createtable.rtf More and more complex workloads are migrated to Hive from SQL Server, Teradata, etc., and occasionally Hive gets bottlenecked on generating plans for large queries; in the majority of cases the time is spent fetching metadata and partitions and in other optimizer transformation rules. I have attached the query for the test case, which needs to be run after we set up the database as shown below.
{code}
create database dataset_3;
use dataset_3;
{code}
createtable.rtf - create table command
SQLQuery10.sql.mssql - explain query
The most problematic part of the code, as the stack gets arbitrarily long, seems to be in RuleRegExp.java:
{code}
@Override
public int cost(Stack<Node> stack) throws SemanticException {
  int numElems = (stack != null ? stack.size() : 0);
  String name = "";
  for (int pos = numElems - 1; pos >= 0; pos--) {
    name = stack.get(pos).getName() + "%" + name;
    Matcher m = pattern.matcher(name);
    if (m.matches()) {
      return m.group().length();
    }
  }
  return -1;
}
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
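The quoted {{cost()}} rebuilds the concatenated name and re-runs the regex once per stack element, which is quadratic in stack depth. For a wildcard-free pattern such as {{TS%FIL%}}, the same result can be obtained by direct name comparison against the top of the stack. Below is a minimal, self-contained sketch of that idea; the class and method names are illustrative, not Hive's actual patch, and the stack is simplified to a list of operator names ordered bottom to top:

```java
import java.util.Arrays;
import java.util.List;

public class WildcardFreePatternRule {
    private final String[] parts;   // node names from the pattern, in order
    private final int matchLength;  // length of the matched string, incl. '%' separators

    public WildcardFreePatternRule(String pattern) {
        // A pattern like "TS%FIL%" has a trailing '%' after every name,
        // so split("%") yields exactly the node names: {"TS", "FIL"}.
        this.parts = pattern.split("%");
        int len = 0;
        for (String p : parts) {
            len += p.length() + 1; // +1 for each '%' separator
        }
        this.matchLength = len;
    }

    /**
     * Returns the matched length if the top parts.length entries of the
     * stack spell out the pattern, -1 otherwise. Runs in O(pattern size)
     * instead of rebuilding a string and re-matching per stack element.
     */
    public int cost(List<String> stackNames) {
        int n = stackNames.size();
        if (n < parts.length) {
            return -1;
        }
        for (int i = 0; i < parts.length; i++) {
            // parts[0] matches the deepest node of the window,
            // parts[parts.length - 1] matches the top of the stack.
            if (!stackNames.get(n - parts.length + i).equals(parts[i])) {
                return -1;
            }
        }
        return matchLength;
    }

    public static void main(String[] args) {
        WildcardFreePatternRule rule = new WildcardFreePatternRule("TS%FIL%");
        System.out.println(rule.cost(Arrays.asList("TS", "FIL")));        // matches: 7
        System.out.println(rule.cost(Arrays.asList("SEL", "TS", "FIL"))); // matches: 7
        System.out.println(rule.cost(Arrays.asList("FIL", "TS")));        // no match: -1
    }
}
```

This preserves the original semantics for wildcard-free patterns: since the built name always ends in '%' and {{matches()}} requires a full match, only the window of exactly {{parts.length}} top entries can ever match.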
[jira] [Updated] (HIVE-11141) Improve RuleRegExp when the Expression node stack gets huge
[ https://issues.apache.org/jira/browse/HIVE-11141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hari Sankar Sivarama Subramaniyan updated HIVE-11141: - Attachment: createtable.rtf SQLQuery10.sql.mssql Improve RuleRegExp when the Expression node stack gets huge --- Key: HIVE-11141 URL: https://issues.apache.org/jira/browse/HIVE-11141 Project: Hive Issue Type: Bug Reporter: Hari Sankar Sivarama Subramaniyan Assignee: Hari Sankar Sivarama Subramaniyan Attachments: SQLQuery10.sql.mssql, createtable.rtf More and more complex workloads are migrated to Hive from SQL Server, Teradata, etc., and occasionally Hive gets bottlenecked on generating plans for large queries; in the majority of cases the time is spent fetching metadata and partitions and in other optimizer transformation rules. I have attached the query for the test case, which needs to be run after we set up the database as shown below.
{code}
create database dataset_3;
use dataset_3;
{code}
The most problematic part of the code, as the stack gets arbitrarily long, seems to be in RuleRegExp.java:
{code}
@Override
public int cost(Stack<Node> stack) throws SemanticException {
  int numElems = (stack != null ? stack.size() : 0);
  String name = "";
  for (int pos = numElems - 1; pos >= 0; pos--) {
    name = stack.get(pos).getName() + "%" + name;
    Matcher m = pattern.matcher(name);
    if (m.matches()) {
      return m.group().length();
    }
  }
  return -1;
}
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11141) Improve RuleRegExp when the Expression node stack gets huge
[ https://issues.apache.org/jira/browse/HIVE-11141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hari Sankar Sivarama Subramaniyan updated HIVE-11141: - Attachment: HIVE-11141.1.patch cc-ing [~jpullokkaran] for review. Improve RuleRegExp when the Expression node stack gets huge --- Key: HIVE-11141 URL: https://issues.apache.org/jira/browse/HIVE-11141 Project: Hive Issue Type: Bug Reporter: Hari Sankar Sivarama Subramaniyan Assignee: Hari Sankar Sivarama Subramaniyan Attachments: HIVE-11141.1.patch, SQLQuery10.sql.mssql, createtable.rtf More and more complex workloads are migrated to Hive from SQL Server, Teradata, etc., and occasionally Hive gets bottlenecked on generating plans for large queries; in the majority of cases the time is spent fetching metadata and partitions and in other optimizer transformation rules. I have attached the query for the test case, which needs to be run after we set up the database as shown below.
{code}
create database dataset_3;
use dataset_3;
{code}
createtable.rtf - create table command
SQLQuery10.sql.mssql - explain query
The most problematic part of the code, as the stack gets arbitrarily long, seems to be in RuleRegExp.java:
{code}
@Override
public int cost(Stack<Node> stack) throws SemanticException {
  int numElems = (stack != null ? stack.size() : 0);
  String name = "";
  for (int pos = numElems - 1; pos >= 0; pos--) {
    name = stack.get(pos).getName() + "%" + name;
    Matcher m = pattern.matcher(name);
    if (m.matches()) {
      return m.group().length();
    }
  }
  return -1;
}
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11100) Beeline should escape semi-colon in queries
[ https://issues.apache.org/jira/browse/HIVE-11100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14606221#comment-14606221 ] Xuefu Zhang commented on HIVE-11100: Okay. +1 Beeline should escape semi-colon in queries --- Key: HIVE-11100 URL: https://issues.apache.org/jira/browse/HIVE-11100 Project: Hive Issue Type: Bug Components: Beeline Affects Versions: 1.2.0, 1.1.0 Reporter: Chaoyu Tang Assignee: Chaoyu Tang Priority: Minor Attachments: HIVE-11100.patch Beeline should escape the semicolon in queries. For example, queries like the following: CREATE TABLE beeline_tb (c1 int, c2 string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ';' LINES TERMINATED BY '\n'; or CREATE TABLE beeline_tb (c1 int, c2 string) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\;' LINES TERMINATED BY '\n'; both fail. But the second query, with the semicolon escaped with \, works in the CLI. -- This message was sent by Atlassian JIRA (v6.3.4#6332)