[jira] [Updated] (HIVE-8458) Potential null dereference in Utilities#clearWork()
[ https://issues.apache.org/jira/browse/HIVE-8458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HIVE-8458: - Description: {code} Path mapPath = getPlanPath(conf, MAP_PLAN_NAME); Path reducePath = getPlanPath(conf, REDUCE_PLAN_NAME); // if the plan path hasn't been initialized just return, nothing to clean. if (mapPath == null && reducePath == null) { return; } try { FileSystem fs = mapPath.getFileSystem(conf); {code} If mapPath is null but reducePath is not null, the getFileSystem() call would produce an NPE was: {code} Path mapPath = getPlanPath(conf, MAP_PLAN_NAME); Path reducePath = getPlanPath(conf, REDUCE_PLAN_NAME); // if the plan path hasn't been initialized just return, nothing to clean. if (mapPath == null && reducePath == null) { return; } try { FileSystem fs = mapPath.getFileSystem(conf); {code} If mapPath is null but reducePath is not null, the getFileSystem() call would produce an NPE Potential null dereference in Utilities#clearWork() --- Key: HIVE-8458 URL: https://issues.apache.org/jira/browse/HIVE-8458 Project: Hive Issue Type: Bug Affects Versions: 0.13.1 Reporter: Ted Yu Assignee: skrho Priority: Minor Attachments: HIVE-8458_001.patch {code} Path mapPath = getPlanPath(conf, MAP_PLAN_NAME); Path reducePath = getPlanPath(conf, REDUCE_PLAN_NAME); // if the plan path hasn't been initialized just return, nothing to clean. if (mapPath == null && reducePath == null) { return; } try { FileSystem fs = mapPath.getFileSystem(conf); {code} If mapPath is null but reducePath is not null, the getFileSystem() call would produce an NPE -- This message was sent by Atlassian JIRA (v6.3.4#6332)
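The guard above only returns early when both paths are null, yet the next statement dereferences mapPath unconditionally. A minimal standalone sketch of the safer pattern (plain strings stand in for Hadoop Path objects; chooseNonNull is a hypothetical helper, not Hive code):

```java
public class ClearWorkSketch {
    // Hypothetical helper: pick whichever plan path is non-null so that a
    // subsequent getFileSystem()-style dereference cannot NPE.
    static String chooseNonNull(String mapPath, String reducePath) {
        if (mapPath == null && reducePath == null) {
            return null; // nothing to clean, mirrors the early return
        }
        // mapPath may be null while reducePath is not; never dereference
        // mapPath blindly.
        return (mapPath != null) ? mapPath : reducePath;
    }

    public static void main(String[] args) {
        System.out.println(chooseNonNull(null, "reduce_plan")); // reduce_plan
    }
}
```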
[jira] [Updated] (HIVE-8343) Return value from BlockingQueue.offer() is not checked in DynamicPartitionPruner
[ https://issues.apache.org/jira/browse/HIVE-8343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HIVE-8343: - Description: In addEvent() and processVertex(), there are calls such as the following: {code} queue.offer(event); {code} The return value should be checked. If false is returned, the event would not have been queued. Take a look at line 328 in: http://fuseyism.com/classpath/doc/java/util/concurrent/LinkedBlockingQueue-source.html was: In addEvent() and processVertex(), there are calls such as the following: {code} queue.offer(event); {code} The return value should be checked. If false is returned, the event would not have been queued. Take a look at line 328 in: http://fuseyism.com/classpath/doc/java/util/concurrent/LinkedBlockingQueue-source.html Return value from BlockingQueue.offer() is not checked in DynamicPartitionPruner Key: HIVE-8343 URL: https://issues.apache.org/jira/browse/HIVE-8343 Project: Hive Issue Type: Bug Reporter: Ted Yu Assignee: JongWon Park Priority: Minor Attachments: HIVE-8343.patch In addEvent() and processVertex(), there are calls such as the following: {code} queue.offer(event); {code} The return value should be checked. If false is returned, the event would not have been queued. Take a look at line 328 in: http://fuseyism.com/classpath/doc/java/util/concurrent/LinkedBlockingQueue-source.html -- This message was sent by Atlassian JIRA (v6.3.4#6332)
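For context, offer() on a bounded LinkedBlockingQueue returns false instead of blocking when the queue is at capacity, which is exactly the silent-drop case the report worries about. A small standalone demonstration (not Hive code; DynamicPartitionPruner's actual queue may be unbounded, in which case checking the boolean simply makes the intent explicit):

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class OfferDemo {
    // Fill a capacity-1 queue, then try one more offer; returns both boolean
    // results so the dropped second insert is visible to the caller.
    static boolean[] offerTwice() {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>(1);
        return new boolean[] { queue.offer("event-1"), queue.offer("event-2") };
    }

    public static void main(String[] args) {
        boolean[] results = offerTwice();
        System.out.println(results[0]); // true: enqueued
        System.out.println(results[1]); // false: queue full, event lost if unchecked
    }
}
```

Using put() instead of offer() is the usual alternative when blocking until space frees up is acceptable.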
[jira] [Commented] (HIVE-11113) ANALYZE TABLE .. COMPUTE STATISTICS FOR COLUMNS does not work.
[ https://issues.apache.org/jira/browse/HIVE-11113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14635379#comment-14635379 ] Pengcheng Xiong commented on HIVE-11113: [~tfriedr], thanks for your efforts. So, I am going to close this jira, [~shiroy] and [~libing] please feel free to reopen it if the problem remains. Thanks. ANALYZE TABLE .. COMPUTE STATISTICS FOR COLUMNS does not work. --- Key: HIVE-11113 URL: https://issues.apache.org/jira/browse/HIVE-11113 Project: Hive Issue Type: Bug Affects Versions: 0.13.1, 1.2.1 Environment: Reporter: Shiroy Pigarez Assignee: Pengcheng Xiong Priority: Critical I was trying to perform some column statistics using hive as per the documentation https://cwiki.apache.org/confluence/display/Hive/Column+Statistics+in+Hive and was encountering the following errors: Seems like a bug. Can you look into this? Thanks in advance. -- HIVE table {noformat} hive> create table people_part( name string, address string) PARTITIONED BY (dob string, nationality varchar(2)) row format delimited fields terminated by '\t'; {noformat} --Analyze table with partition dob and nationality with FOR COLUMNS {noformat} hive> ANALYZE TABLE people_part PARTITION(dob='2015-10-2',nationality) COMPUTE STATISTICS FOR COLUMNS; NoViableAltException(-1@[]) at org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.identifier(HiveParser_IdentifiersParser.java:11627) at org.apache.hadoop.hive.ql.parse.HiveParser.identifier(HiveParser.java:40215) at org.apache.hadoop.hive.ql.parse.HiveParser.columnName(HiveParser.java:33351) at org.apache.hadoop.hive.ql.parse.HiveParser.columnNameList(HiveParser.java:33219) at org.apache.hadoop.hive.ql.parse.HiveParser.analyzeStatement(HiveParser.java:17764) at org.apache.hadoop.hive.ql.parse.HiveParser.ddlStatement(HiveParser.java:2369) at org.apache.hadoop.hive.ql.parse.HiveParser.execStatement(HiveParser.java:1398) at org.apache.hadoop.hive.ql.parse.HiveParser.statement(HiveParser.java:1036) at 
org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:199) at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:166) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:404) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:322) at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:975) at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1040) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:911) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:901) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:275) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:227) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:430) at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:803) at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:697) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:636) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.RunJar.main(RunJar.java:212) FAILED: ParseException line 1:95 cannot recognize input near '<EOF>' '<EOF>' '<EOF>' in column name {noformat} --Analyze table with partition dob and nationality values specified with FOR COLUMNS {noformat} hive> ANALYZE TABLE people_part PARTITION(dob='2015-10-2',nationality='IE') COMPUTE STATISTICS FOR COLUMNS; NoViableAltException(-1@[]) at org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.identifier(HiveParser_IdentifiersParser.java:11627) at org.apache.hadoop.hive.ql.parse.HiveParser.identifier(HiveParser.java:40215) at org.apache.hadoop.hive.ql.parse.HiveParser.columnName(HiveParser.java:33351) at org.apache.hadoop.hive.ql.parse.HiveParser.columnNameList(HiveParser.java:33219) at 
org.apache.hadoop.hive.ql.parse.HiveParser.analyzeStatement(HiveParser.java:17764) at org.apache.hadoop.hive.ql.parse.HiveParser.ddlStatement(HiveParser.java:2369) at org.apache.hadoop.hive.ql.parse.HiveParser.execStatement(HiveParser.java:1398) at org.apache.hadoop.hive.ql.parse.HiveParser.statement(HiveParser.java:1036) at
[jira] [Updated] (HIVE-11310) Avoid expensive AST tree conversion to String in RowResolver
[ https://issues.apache.org/jira/browse/HIVE-11310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-11310: --- Attachment: HIVE-11310.3.patch Avoid expensive AST tree conversion to String in RowResolver Key: HIVE-11310 URL: https://issues.apache.org/jira/browse/HIVE-11310 Project: Hive Issue Type: Bug Components: Parser Reporter: Jesus Camacho Rodriguez Assignee: Jesus Camacho Rodriguez Attachments: HIVE-11310.1.patch, HIVE-11310.2.patch, HIVE-11310.3.patch, HIVE-11310.patch We use the AST tree String representation of a condition in the WHERE clause to identify its column in the RowResolver. This can lead to OOM Exceptions when the condition is very large. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11328) Avoid String representation of expression nodes in ConstantPropagateProcFactory unless necessary
[ https://issues.apache.org/jira/browse/HIVE-11328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14635344#comment-14635344 ] Hive QA commented on HIVE-11328: {color:green}Overall{color}: +1 all checks pass Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12746339/HIVE-11328.patch {color:green}SUCCESS:{color} +1 9229 tests passed Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4681/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4681/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4681/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase {noformat} This message is automatically generated. ATTACHMENT ID: 12746339 - PreCommit-HIVE-TRUNK-Build Avoid String representation of expression nodes in ConstantPropagateProcFactory unless necessary Key: HIVE-11328 URL: https://issues.apache.org/jira/browse/HIVE-11328 Project: Hive Issue Type: Bug Reporter: Jesus Camacho Rodriguez Assignee: Jesus Camacho Rodriguez Attachments: HIVE-11328.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11327) HiveQL to HBase - Predicate Pushdown for composite key not working
[ https://issues.apache.org/jira/browse/HIVE-11327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yannik Zuehlke updated HIVE-11327: -- Tags: hive, predicatepushdown, hbase (was: hive predicatepushdown) HiveQL to HBase - Predicate Pushdown for composite key not working -- Key: HIVE-11327 URL: https://issues.apache.org/jira/browse/HIVE-11327 Project: Hive Issue Type: Bug Components: HBase Handler, Hive Affects Versions: 0.14.0 Reporter: Yannik Zuehlke Priority: Blocker I am using Hive 0.14 and HBase 0.98.8. I would like to use HiveQL for accessing a HBase table. I created a table with a complex composite rowkey: {quote} CREATE EXTERNAL TABLE db.hive_hbase (rowkey struct<p1:string, p2:string, p3:string>, column1 string, column2 string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' COLLECTION ITEMS TERMINATED BY ';' STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf:c1,cf:c2") TBLPROPERTIES("hbase.table.name"="hbase_table"); {quote} The table is getting successfully created, but the HiveQL query is taking forever: {quote} SELECT * from db.hive_hbase WHERE rowkey.p1 = 'xyz'; {quote} I am working with 1 TB of data (around 1.5 bn records) and this query takes forever (it ran over night, but did not finish in the morning). I changed the log4j properties to 'DEBUG' and found some interesting information: {quote} 2015-07-15 15:56:41,232 INFO ppd.OpProcFactory (OpProcFactory.java:logExpr(823)) - Pushdown Predicates of FIL For Alias : hive_hbase 2015-07-15 15:56:41,232 INFO ppd.OpProcFactory (OpProcFactory.java:logExpr(826)) - (rowkey.p1 = 'xyz') {quote} But some lines later: {quote} 2015-07-15 15:56:41,430 DEBUG ppd.OpProcFactory (OpProcFactory.java:pushFilterToStorageHandler(1051)) - No pushdown possible for predicate: (rowkey.p1 = 'xyz') {quote} So my guess is: HiveQL over HBase does not do any predicate pushdown but starts a MapReduce job. 
The normal HBase scan (via the HBase Shell) takes around 5 seconds. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7723) Explain plan for complex query with lots of partitions is slow due to in-efficient collection used to find a matching ReadEntity
[ https://issues.apache.org/jira/browse/HIVE-7723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14636165#comment-14636165 ] Hive QA commented on HIVE-7723: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12746439/HIVE-7723.12.patch {color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 9245 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_escape1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_escape2 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_tez_union {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4688/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4688/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4688/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 3 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12746439 - PreCommit-HIVE-TRUNK-Build Explain plan for complex query with lots of partitions is slow due to in-efficient collection used to find a matching ReadEntity Key: HIVE-7723 URL: https://issues.apache.org/jira/browse/HIVE-7723 Project: Hive Issue Type: Bug Components: CLI, Physical Optimizer Affects Versions: 0.13.1 Reporter: Mostafa Mokhtar Assignee: Hari Sankar Sivarama Subramaniyan Attachments: HIVE-7723.1.patch, HIVE-7723.10.patch, HIVE-7723.11.patch, HIVE-7723.11.patch, HIVE-7723.12.patch, HIVE-7723.2.patch, HIVE-7723.3.patch, HIVE-7723.4.patch, HIVE-7723.5.patch, HIVE-7723.6.patch, HIVE-7723.7.patch, HIVE-7723.8.patch, HIVE-7723.9.patch Explain on TPC-DS query 64 took 11 seconds; when the CLI was profiled it showed that ReadEntity.equals is taking ~40% of the CPU. ReadEntity.equals is called from the snippet below. Again and again the set is iterated over to get the actual match; a HashMap is a better option for this case as Set doesn't have a get method. Also, for ReadEntity, equals is case-insensitive while hash is not, which is an undesired behavior. {code} public static ReadEntity addInput(Set<ReadEntity> inputs, ReadEntity newInput) { // If the input is already present, make sure the new parent is added to the input. 
if (inputs.contains(newInput)) { for (ReadEntity input : inputs) { if (input.equals(newInput)) { if ((newInput.getParents() != null) && (!newInput.getParents().isEmpty())) { input.getParents().addAll(newInput.getParents()); input.setDirect(input.isDirect() || newInput.isDirect()); } return input; } } assert false; } else { inputs.add(newInput); return newInput; } // make compile happy return null; } {code} This is the query used: {code} select cs1.product_name ,cs1.store_name ,cs1.store_zip ,cs1.b_street_number ,cs1.b_streen_name ,cs1.b_city ,cs1.b_zip ,cs1.c_street_number ,cs1.c_street_name ,cs1.c_city ,cs1.c_zip ,cs1.syear ,cs1.cnt ,cs1.s1 ,cs1.s2 ,cs1.s3 ,cs2.s1 ,cs2.s2 ,cs2.s3 ,cs2.syear ,cs2.cnt from (select i_product_name as product_name ,i_item_sk as item_sk ,s_store_name as store_name ,s_zip as store_zip ,ad1.ca_street_number as b_street_number ,ad1.ca_street_name as b_streen_name ,ad1.ca_city as b_city ,ad1.ca_zip as b_zip ,ad2.ca_street_number as c_street_number ,ad2.ca_street_name as c_street_name ,ad2.ca_city as c_city ,ad2.ca_zip as c_zip ,d1.d_year as syear ,d2.d_year as fsyear ,d3.d_year as s2year ,count(*) as cnt ,sum(ss_wholesale_cost) as s1 ,sum(ss_list_price) as s2 ,sum(ss_coupon_amt) as s3 FROM store_sales JOIN store_returns ON store_sales.ss_item_sk = store_returns.sr_item_sk and store_sales.ss_ticket_number = store_returns.sr_ticket_number JOIN customer ON store_sales.ss_customer_sk = customer.c_customer_sk JOIN date_dim d1 ON store_sales.ss_sold_date_sk = d1.d_date_sk JOIN date_dim d2 ON customer.c_first_sales_date_sk = d2.d_date_sk JOIN date_dim d3 ON customer.c_first_shipto_date_sk = d3.d_date_sk JOIN store ON store_sales.ss_store_sk = store.s_store_sk JOIN customer_demographics cd1 ON store_sales.ss_cdemo_sk= cd1.cd_demo_sk JOIN
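The report's suggested fix — replace the Set scan with a map lookup — can be sketched as follows (generic placeholder types, not the actual ReadEntity class; the Hive-specific parent-merging logic is reduced to a comment):

```java
import java.util.HashMap;
import java.util.Map;

public class AddInputSketch {
    // Sketch of the proposed fix: a Map keyed by the entity itself gives an
    // O(1) lookup of the canonical instance, instead of iterating the whole
    // collection on every addInput() call.
    static <E> E addInput(Map<E, E> inputs, E newInput) {
        E existing = inputs.get(newInput);
        if (existing != null) {
            // In Hive, this is where newInput's parents would be merged into
            // the existing (canonical) entry.
            return existing;
        }
        inputs.put(newInput, newInput);
        return newInput;
    }

    public static void main(String[] args) {
        Map<String, String> inputs = new HashMap<>();
        System.out.println(addInput(inputs, "t1")); // first call inserts
        System.out.println(addInput(inputs, "t1")); // repeat returns the canonical entry
    }
}
```

Note that this only works if equals and hashCode agree on case-sensitivity, which is the second problem the report calls out.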
[jira] [Commented] (HIVE-11335) Multi-Join Inner Query producing incorrect results
[ https://issues.apache.org/jira/browse/HIVE-11335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14636164#comment-14636164 ] fatkun commented on HIVE-11335: --- [~jcamachorodriguez] Could you take a look? Multi-Join Inner Query producing incorrect results -- Key: HIVE-11335 URL: https://issues.apache.org/jira/browse/HIVE-11335 Project: Hive Issue Type: Bug Components: Query Planning Affects Versions: 1.1.0 Environment: CDH5.4.0 Reporter: fatkun Test steps: {code} create table log (uid string, uid2 string); insert into log values ('1', '1'); create table user (uid string, name string); insert into user values ('1', 'test1'); select b.name, c.name from log a left outer join (select uid, name from user) b on (a.uid=b.uid) left outer join user c on (a.uid2=c.uid); {code} This returns the wrong result: 1 test1. Both columns should return test1. While trying to find the error, I noticed that this query returns the right result (the join key is different): {code} select b.name, c.name from log a left outer join (select uid, name from user) b on (a.uid=b.uid) left outer join user c on (a.uid=c.uid); {code} The explain is different; query 1 only selects one column: {code} b:user TableScan alias: user Statistics: Num rows: 1 Data size: 7 Basic stats: COMPLETE Column stats: NONE Select Operator expressions: uid (type: string) outputColumnNames: _col0 {code} I think there is something wrong in ColumnPruner, but I cannot find it. It may be related to HIVE-10996. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11196) Utilities.getPartitionDesc() should try to reuse TableDesc object
[ https://issues.apache.org/jira/browse/HIVE-11196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14636128#comment-14636128 ] Hive QA commented on HIVE-11196: {color:green}Overall{color}: +1 all checks pass Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12746431/HIVE-11196.3.patch {color:green}SUCCESS:{color} +1 9245 tests passed Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4687/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4687/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4687/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase {noformat} This message is automatically generated. ATTACHMENT ID: 12746431 - PreCommit-HIVE-TRUNK-Build Utilities.getPartitionDesc() should try to reuse TableDesc object -- Key: HIVE-11196 URL: https://issues.apache.org/jira/browse/HIVE-11196 Project: Hive Issue Type: Bug Reporter: Hari Sankar Sivarama Subramaniyan Assignee: Hari Sankar Sivarama Subramaniyan Attachments: HIVE-11196.1.patch, HIVE-11196.2.patch, HIVE-11196.3.patch Currently, Utilities.getPartitionDesc() creates a new PartitionDesc object which in turn creates a new TableDesc object via Utilities.getTableDesc(part.getTable()) for every call. This value needs to be reused so that we can avoid the expense of creating a new descriptor object wherever possible. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
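The reuse the issue asks for amounts to memoizing the descriptor per table. A hedged sketch of that idea (hypothetical names; the real TableDesc/PartitionDesc classes are replaced by a plain Object, and this is only safe if the shared descriptor is not mutated per partition):

```java
import java.util.HashMap;
import java.util.Map;

public class TableDescCacheSketch {
    static final Map<String, Object> CACHE = new HashMap<>();

    // Hypothetical stand-in for Utilities.getTableDesc(part.getTable()):
    // build the descriptor once per table name and hand back the shared copy
    // on every subsequent call.
    static Object getTableDesc(String tableName) {
        return CACHE.computeIfAbsent(tableName, name -> new Object());
    }

    public static void main(String[] args) {
        // Two partitions of the same table now share one descriptor object.
        System.out.println(getTableDesc("t") == getTableDesc("t")); // true
    }
}
```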
[jira] [Commented] (HIVE-11055) HPL/SQL - Implementing Procedural SQL in Hive (PL/HQL Contribution)
[ https://issues.apache.org/jira/browse/HIVE-11055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14636202#comment-14636202 ] wangchangchun commented on HIVE-11055: -- My hplsql is OK now. Now I want to use permanent stored procedures, so I should use the .hplsqlrc file. Where should the .hplsqlrc file be placed, and where should the stored procedure bodies go? Can you give me an example? HPL/SQL - Implementing Procedural SQL in Hive (PL/HQL Contribution) --- Key: HIVE-11055 URL: https://issues.apache.org/jira/browse/HIVE-11055 Project: Hive Issue Type: Improvement Reporter: Dmitry Tolpeko Assignee: Dmitry Tolpeko Fix For: 2.0.0 Attachments: HIVE-11055.1.patch, HIVE-11055.2.patch, HIVE-11055.3.patch, HIVE-11055.4.patch, hplsql-site.xml There is a PL/HQL tool (www.plhql.org) that implements procedural SQL for Hive (actually any SQL-on-Hadoop implementation and any JDBC source). Alan Gates offered to contribute it to Hive under the HPL/SQL name (org.apache.hive.hplsql package). This JIRA is to create a patch to contribute the PL/HQL code. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11334) Incorrect answer when facing multiple chars delim and negative count for substring_index
[ https://issues.apache.org/jira/browse/HIVE-11334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14636217#comment-14636217 ] zhichao-li commented on HIVE-11334: --- https://patch-diff.githubusercontent.com/raw/apache/hive/pull/47.patch Incorrect answer when facing multiple chars delim and negative count for substring_index - Key: HIVE-11334 URL: https://issues.apache.org/jira/browse/HIVE-11334 Project: Hive Issue Type: Bug Reporter: zhichao-li Priority: Minor substring_index("www||apache||org", "||", -2) would return "|apache||org" instead of "apache||org" -- This message was sent by Atlassian JIRA (v6.3.4#6332)
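MySQL-style semantics for substring_index with a negative count keep everything to the right of the |count|-th delimiter from the end; the reported off-by-one is what you get when a multi-character delimiter is stepped over as if it were one character long. A reference implementation of the expected behavior (a sketch, not the actual Hive UDF):

```java
public class SubstringIndexSketch {
    static String substringIndex(String str, String delim, int count) {
        if (count == 0 || delim.isEmpty()) {
            return "";
        }
        if (count > 0) {
            // Find the count-th occurrence from the left, advancing by the
            // full delimiter length each time (the crux for multi-char delims).
            int idx = -delim.length();
            for (int i = 0; i < count; i++) {
                idx = str.indexOf(delim, idx + delim.length());
                if (idx < 0) {
                    return str; // fewer than count delimiters: whole string
                }
            }
            return str.substring(0, idx);
        }
        // Negative count: find the |count|-th occurrence from the right, then
        // skip the entire delimiter, not just one character.
        int idx = str.length();
        for (int i = 0; i < -count; i++) {
            idx = str.lastIndexOf(delim, idx - 1);
            if (idx < 0) {
                return str;
            }
        }
        return str.substring(idx + delim.length());
    }

    public static void main(String[] args) {
        System.out.println(substringIndex("www||apache||org", "||", -2)); // apache||org
    }
}
```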
[jira] [Commented] (HIVE-11320) ACID enable predicate pushdown for insert-only delta file
[ https://issues.apache.org/jira/browse/HIVE-11320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14635223#comment-14635223 ] Eugene Koifman commented on HIVE-11320: --- [~alangates], could you review please ACID enable predicate pushdown for insert-only delta file - Key: HIVE-11320 URL: https://issues.apache.org/jira/browse/HIVE-11320 Project: Hive Issue Type: Bug Components: Transactions Affects Versions: 1.0.0 Reporter: Eugene Koifman Assignee: Eugene Koifman Attachments: HIVE-11320.patch Given ACID table T against which some Insert/Update/Delete has been executed but not Major Compaction. This table will have some number of delta files (and possibly base files). Given a query: select * from T where c1 = 5; the OrcRawRecordMerger() c'tor currently disables predicate pushdown in ORC to the delta file via eventOptions.searchArgument(null, null); When a delta file is known to only have Insert events we can safely push the predicate. ORC maintains stats in a footer which have counts of insert/update/delete events in the file - this can be used to determine that a given delta file only has Insert events. See OrcRecordUpdater.parseAcidStats() This will enable PPD for Streaming Ingest (HIVE-5687) use cases which by definition only generate Insert events. PPD for deltas with arbitrary types of events can be achieved but it is more complicated and will be addressed separately. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11320) ACID enable predicate pushdown for insert-only delta file
[ https://issues.apache.org/jira/browse/HIVE-11320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14635403#comment-14635403 ] Alan Gates commented on HIVE-11320: --- +1 ACID enable predicate pushdown for insert-only delta file - Key: HIVE-11320 URL: https://issues.apache.org/jira/browse/HIVE-11320 Project: Hive Issue Type: Bug Components: Transactions Affects Versions: 1.0.0 Reporter: Eugene Koifman Assignee: Eugene Koifman Attachments: HIVE-11320.patch Given ACID table T against which some Insert/Update/Delete has been executed but not Major Compaction. This table will have some number of delta files (and possibly base files). Given a query: select * from T where c1 = 5; the OrcRawRecordMerger() c'tor currently disables predicate pushdown in ORC to the delta file via eventOptions.searchArgument(null, null); When a delta file is known to only have Insert events we can safely push the predicate. ORC maintains stats in a footer which have counts of insert/update/delete events in the file - this can be used to determine that a given delta file only has Insert events. See OrcRecordUpdater.parseAcidStats() This will enable PPD for Streaming Ingest (HIVE-5687) use cases which by definition only generate Insert events. PPD for deltas with arbitrary types of events can be achieved but it is more complicated and will be addressed separately. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8176) Close of FSDataOutputStream in OrcRecordUpdater ctor should be in finally clause
[ https://issues.apache.org/jira/browse/HIVE-8176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HIVE-8176: - Description: {code} try { FSDataOutputStream strm = fs.create(new Path(path, ACID_FORMAT), false); strm.writeInt(ORC_ACID_VERSION); strm.close(); } catch (IOException ioe) { {code} If strm.writeInt() throws IOE, strm would be left unclosed. was: {code} try { FSDataOutputStream strm = fs.create(new Path(path, ACID_FORMAT), false); strm.writeInt(ORC_ACID_VERSION); strm.close(); } catch (IOException ioe) { {code} If strm.writeInt() throws IOE, strm would be left unclosed. Close of FSDataOutputStream in OrcRecordUpdater ctor should be in finally clause Key: HIVE-8176 URL: https://issues.apache.org/jira/browse/HIVE-8176 Project: Hive Issue Type: Bug Reporter: Ted Yu Assignee: SUYEON LEE Priority: Minor Attachments: HIVE-8176.patch {code} try { FSDataOutputStream strm = fs.create(new Path(path, ACID_FORMAT), false); strm.writeInt(ORC_ACID_VERSION); strm.close(); } catch (IOException ioe) { {code} If strm.writeInt() throws IOE, strm would be left unclosed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
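The standard fix for the leak described above is try-with-resources, which closes the stream even when writeInt throws. Sketched here with plain java.io streams standing in for Hadoop's FSDataOutputStream, so the snippet stays self-contained:

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class AcidVersionWriteSketch {
    // Stand-in for the OrcRecordUpdater ctor snippet: the stream is closed by
    // try-with-resources on both the success and the exception path, so it
    // cannot be left unclosed.
    static byte[] writeVersion(int version) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream strm = new DataOutputStream(buf)) {
            strm.writeInt(version);
        } catch (IOException impossible) {
            // An in-memory stream never throws; a real FSDataOutputStream
            // would handle (or propagate) the IOException here instead.
            throw new AssertionError(impossible);
        } // implicit strm.close() has already run at this point
        return buf.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println(writeVersion(0).length); // 4: one int written
    }
}
```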
[jira] [Updated] (HIVE-11301) thrift metastore issue when getting stats results in disconnect
[ https://issues.apache.org/jira/browse/HIVE-11301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-11301: --- Attachment: HIVE-11301.02.patch It seems that the QA run did not complete. Resubmitting the patch for a complete QA run. thrift metastore issue when getting stats results in disconnect --- Key: HIVE-11301 URL: https://issues.apache.org/jira/browse/HIVE-11301 Project: Hive Issue Type: Bug Components: Metastore Reporter: Sergey Shelukhin Assignee: Pengcheng Xiong Attachments: HIVE-11301.01.patch, HIVE-11301.02.patch On metastore side it looks like this: {noformat} 2015-07-17 20:32:27,795 ERROR [pool-3-thread-150]: server.TThreadPoolServer (TThreadPoolServer.java:run(294)) - Thrift error occurred during processing of message. org.apache.thrift.protocol.TProtocolException: Required field 'colStats' is unset! Struct:AggrStats(colStats:null, partsFound:0) at org.apache.hadoop.hive.metastore.api.AggrStats.validate(AggrStats.java:389) at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_aggr_stats_for_result.validate(ThriftHiveMetastore.java) at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_aggr_stats_for_result$get_aggr_stats_for_resultStandardScheme.write(ThriftHiveMetastore.java) at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_aggr_stats_for_result$get_aggr_stats_for_resultStandardScheme.write(ThriftHiveMetastore.java) at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_aggr_stats_for_result.write(ThriftHiveMetastore.java) at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:53) at org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:110) at org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:106) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657) at org.apache.hadoop.hive.metastore.TUGIBasedProcessor.process(TUGIBasedProcessor.java:118) at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:285) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) {noformat} and then {noformat} 2015-07-17 20:32:27,796 WARN [pool-3-thread-150]: transport.TIOStreamTransport (TIOStreamTransport.java:close(112)) - Error closing output stream. java.net.SocketException: Socket closed at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:116) at java.net.SocketOutputStream.write(SocketOutputStream.java:153) at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82) at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140) at java.io.FilterOutputStream.close(FilterOutputStream.java:158) at org.apache.thrift.transport.TIOStreamTransport.close(TIOStreamTransport.java:110) at org.apache.thrift.transport.TSocket.close(TSocket.java:196) at org.apache.hadoop.hive.thrift.TFilterTransport.close(TFilterTransport.java:52) at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:304) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) {noformat} Which on client manifests as {noformat} 2015-07-17 20:32:27,796 WARN [main()]: metastore.RetryingMetaStoreClient (RetryingMetaStoreClient.java:invoke(187)) - MetaStoreClient lost connection. Attempting to reconnect. 
org.apache.thrift.transport.TTransportException at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132) at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86) at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:429) at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:318) at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:219) at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69) at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_get_aggr_stats_for(ThriftHiveMetastore.java:3029) at
[jira] [Resolved] (HIVE-11113) ANALYZE TABLE .. COMPUTE STATISTICS FOR COLUMNS does not work.
[ https://issues.apache.org/jira/browse/HIVE-11113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong resolved HIVE-11113. Resolution: Cannot Reproduce ANALYZE TABLE .. COMPUTE STATISTICS FOR COLUMNS does not work. --- Key: HIVE-11113 URL: https://issues.apache.org/jira/browse/HIVE-11113 Project: Hive Issue Type: Bug Affects Versions: 0.13.1, 1.2.1 Environment: Reporter: Shiroy Pigarez Assignee: Pengcheng Xiong Priority: Critical I was trying to perform some column statistics using hive as per the documentation https://cwiki.apache.org/confluence/display/Hive/Column+Statistics+in+Hive and was encountering the following errors: Seems like a bug. Can you look into this? Thanks in advance. -- HIVE table {noformat} hive> create table people_part( name string, address string) PARTITIONED BY (dob string, nationality varchar(2)) row format delimited fields terminated by '\t'; {noformat} --Analyze table with partition dob and nationality with FOR COLUMNS {noformat} hive> ANALYZE TABLE people_part PARTITION(dob='2015-10-2',nationality) COMPUTE STATISTICS FOR COLUMNS; NoViableAltException(-1@[]) at org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.identifier(HiveParser_IdentifiersParser.java:11627) at org.apache.hadoop.hive.ql.parse.HiveParser.identifier(HiveParser.java:40215) at org.apache.hadoop.hive.ql.parse.HiveParser.columnName(HiveParser.java:33351) at org.apache.hadoop.hive.ql.parse.HiveParser.columnNameList(HiveParser.java:33219) at org.apache.hadoop.hive.ql.parse.HiveParser.analyzeStatement(HiveParser.java:17764) at org.apache.hadoop.hive.ql.parse.HiveParser.ddlStatement(HiveParser.java:2369) at org.apache.hadoop.hive.ql.parse.HiveParser.execStatement(HiveParser.java:1398) at org.apache.hadoop.hive.ql.parse.HiveParser.statement(HiveParser.java:1036) at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:199) at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:166) at 
org.apache.hadoop.hive.ql.Driver.compile(Driver.java:404) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:322) at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:975) at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1040) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:911) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:901) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:275) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:227) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:430) at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:803) at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:697) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:636) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.RunJar.main(RunJar.java:212) FAILED: ParseException line 1:95 cannot recognize input near 'EOF' 'EOF' 'EOF' in column name {noformat} --Analyze table with partition dob and nationality values specified with FOR COLUMNS {noformat} hive> ANALYZE TABLE people_part PARTITION(dob='2015-10-2',nationality='IE') COMPUTE STATISTICS FOR COLUMNS; NoViableAltException(-1@[]) at org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.identifier(HiveParser_IdentifiersParser.java:11627) at org.apache.hadoop.hive.ql.parse.HiveParser.identifier(HiveParser.java:40215) at org.apache.hadoop.hive.ql.parse.HiveParser.columnName(HiveParser.java:33351) at org.apache.hadoop.hive.ql.parse.HiveParser.columnNameList(HiveParser.java:33219) at org.apache.hadoop.hive.ql.parse.HiveParser.analyzeStatement(HiveParser.java:17764) at org.apache.hadoop.hive.ql.parse.HiveParser.ddlStatement(HiveParser.java:2369) at 
org.apache.hadoop.hive.ql.parse.HiveParser.execStatement(HiveParser.java:1398) at org.apache.hadoop.hive.ql.parse.HiveParser.statement(HiveParser.java:1036) at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:199) at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:166) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:404) at
[jira] [Commented] (HIVE-11310) Avoid expensive AST tree conversion to String in RowResolver
[ https://issues.apache.org/jira/browse/HIVE-11310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14635515#comment-14635515 ] Hive QA commented on HIVE-11310: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12746374/HIVE-11310.3.patch {color:red}ERROR:{color} -1 due to 14 failed/errored test(s), 9229 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_multi_single_reducer org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_multi_single_reducer2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_multi_single_reducer3 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_multi_insert org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_multi_insert_gby org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_multi_insert_lateral_view org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_multi_insert_move_tasks_share_dependencies org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_groupby_multi_single_reducer org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_groupby_multi_single_reducer2 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_groupby_multi_single_reducer3 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_multi_insert org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_multi_insert_gby org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_multi_insert_lateral_view org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_multi_insert_move_tasks_share_dependencies {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4682/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4682/console Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4682/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 14 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12746374 - PreCommit-HIVE-TRUNK-Build Avoid expensive AST tree conversion to String in RowResolver Key: HIVE-11310 URL: https://issues.apache.org/jira/browse/HIVE-11310 Project: Hive Issue Type: Bug Components: Parser Reporter: Jesus Camacho Rodriguez Assignee: Jesus Camacho Rodriguez Attachments: HIVE-11310.1.patch, HIVE-11310.2.patch, HIVE-11310.3.patch, HIVE-11310.patch We use the AST tree String representation of a condition in the WHERE clause to identify its column in the RowResolver. This can lead to OOM Exceptions when the condition is very large. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11254) Process result sets returned by a stored procedure
[ https://issues.apache.org/jira/browse/HIVE-11254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14635397#comment-14635397 ] Alan Gates commented on HIVE-11254: --- Why are you including hive-jdbc version 1.2.1? It seems like you want to set it to ${project.version} so that you get whatever was just built. Process result sets returned by a stored procedure -- Key: HIVE-11254 URL: https://issues.apache.org/jira/browse/HIVE-11254 Project: Hive Issue Type: Improvement Components: hpl/sql Reporter: Dmitry Tolpeko Assignee: Dmitry Tolpeko Attachments: HIVE-11254.1.patch, HIVE-11254.2.patch, HIVE-11254.3.patch, HIVE-11254.4.patch Stored procedure can return one or more result sets. A caller should be able to process them. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11330) Add early termination for recursion in StatsRulesProcFactory$FilterStatsRule.evaluateExpression
[ https://issues.apache.org/jira/browse/HIVE-11330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mostafa Mokhtar updated HIVE-11330: --- Attachment: HIVE-11330.patch Add early termination for recursion in StatsRulesProcFactory$FilterStatsRule.evaluateExpression --- Key: HIVE-11330 URL: https://issues.apache.org/jira/browse/HIVE-11330 Project: Hive Issue Type: Bug Components: Hive, Physical Optimizer Affects Versions: 0.14.0 Reporter: Mostafa Mokhtar Assignee: Prasanth Jayachandran Attachments: HIVE-11330.patch Queries with heavily nested filters can cause a StackOverflowError {code} Exception in thread main java.lang.StackOverflowError at org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$FilterStatsRule.evaluateExpression(StatsRulesProcFactory.java:301) at org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$FilterStatsRule.evaluateChildExpr(StatsRulesProcFactory.java:525) at org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$FilterStatsRule.evaluateExpression(StatsRulesProcFactory.java:326) at org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$FilterStatsRule.evaluateChildExpr(StatsRulesProcFactory.java:525) at org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$FilterStatsRule.evaluateExpression(StatsRulesProcFactory.java:326) at org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$FilterStatsRule.evaluateChildExpr(StatsRulesProcFactory.java:525) at org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$FilterStatsRule.evaluateExpression(StatsRulesProcFactory.java:326) at org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$FilterStatsRule.evaluateChildExpr(StatsRulesProcFactory.java:525) at org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$FilterStatsRule.evaluateExpression(StatsRulesProcFactory.java:326) at 
org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$FilterStatsRule.evaluateChildExpr(StatsRulesProcFactory.java:525) at org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$FilterStatsRule.evaluateExpression(StatsRulesProcFactory.java:326) at org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$FilterStatsRule.evaluateChildExpr(StatsRulesProcFactory.java:525) at org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$FilterStatsRule.evaluateExpression(StatsRulesProcFactory.java:326) at org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$FilterStatsRule.evaluateChildExpr(StatsRulesProcFactory.java:525) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
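A depth cutoff is one straightforward way to guarantee termination here. The sketch below is a simplified, hypothetical model — the Expr class, the MAX_DEPTH value, and the -1 "unknown" fallback are all assumptions for illustration, not Hive's actual StatsRulesProcFactory code — showing how bailing out past a fixed nesting depth avoids unbounded recursion on heavily nested filters:

```java
import java.util.ArrayList;
import java.util.List;

// Simplified, hypothetical model of a nested filter expression tree; the real
// Hive code lives in StatsRulesProcFactory$FilterStatsRule.evaluateExpression.
class Expr {
    final List<Expr> children = new ArrayList<>();
    Expr addChild(Expr c) { children.add(c); return this; }
}

class DepthLimitedEvaluator {
    // Assumed cutoff for illustration; a real implementation would tune this.
    static final int MAX_DEPTH = 64;

    // Counts expression nodes exactly up to MAX_DEPTH, then stops recursing
    // and returns -1 ("unknown") so the caller can fall back to a default
    // estimate instead of overflowing the stack on deeply nested filters.
    static long evaluate(Expr e, int depth) {
        if (depth >= MAX_DEPTH) {
            return -1; // early termination: give up on an exact answer
        }
        long total = 1; // count this node
        for (Expr child : e.children) {
            long sub = evaluate(child, depth + 1);
            if (sub < 0) {
                return -1; // propagate the fallback upward
            }
            total += sub;
        }
        return total;
    }
}
```

The key property: recursion depth is bounded by MAX_DEPTH regardless of input shape, so a 1000-level filter chain returns the fallback instead of a StackOverflowError.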
[jira] [Commented] (HIVE-11229) Mutation API: Coordinator communication with meta store should be optional
[ https://issues.apache.org/jira/browse/HIVE-11229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14635470#comment-14635470 ] Alan Gates commented on HIVE-11229: --- +1 Mutation API: Coordinator communication with meta store should be optional -- Key: HIVE-11229 URL: https://issues.apache.org/jira/browse/HIVE-11229 Project: Hive Issue Type: Bug Components: HCatalog Affects Versions: 2.0.0 Reporter: Elliot West Assignee: Elliot West Labels: streaming_api Attachments: HIVE-11229.1.patch [~ekoifman] raised a theoretical issue with the streaming mutation API (HIVE-10165) where worker nodes operating in a distributed cluster might overwhelm a meta store while trying to obtain partition locks. Although this does not happen in practice (see HIVE-11228), the API does communicate with the meta store in this manner to obtain partition paths and create new partitions. Therefore the issue described does in fact exist in the current implementation, albeit in a different code path. I’d like to make such communication optional like so: * When the user chooses not to create partitions on demand, no meta store connection will be created in the {{MutationCoordinators}}. Additionally, partition paths will be resolved using {{org.apache.hadoop.hive.metastore.Warehouse.getPartitionPath(Path, LinkedHashMap<String, String>)}} which should be suitable so long as standard Hive partition layouts are followed. * If the user does choose to create partitions on demand then the system will operate as it does currently; using the meta store to both issue {{add_partition}} events and look up partition meta data. * The documentation will be updated to describe these behaviours and outline alternative approaches to collecting affected partition names and creating partitions in a less intensive manner. 
Side note for follow up: The parameter names {{tblName}} and {{dbName}} seem to be the wrong way around on the method {{org.apache.hadoop.hive.metastore.IMetaStoreClient.getPartition(String, String, List<String>)}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
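For reference, the "standard Hive partition layout" that makes meta-store-free path resolution possible is just table-directory/key1=value1/key2=value2/…, with partition columns in declaration order. A minimal sketch of that layout — ignoring the character escaping that the real Warehouse.getPartitionPath performs, and with illustrative names throughout:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal sketch of the standard Hive partition directory layout. Real
// resolution should go through Warehouse.getPartitionPath, which also escapes
// special characters in keys and values; this omits that for clarity.
class PartitionPathSketch {
    static String partitionPath(String tableDir, LinkedHashMap<String, String> spec) {
        StringBuilder path = new StringBuilder(tableDir);
        for (Map.Entry<String, String> part : spec.entrySet()) {
            // One directory level per partition column: key=value.
            path.append('/').append(part.getKey()).append('=').append(part.getValue());
        }
        return path.toString();
    }
}
```

A LinkedHashMap is used (as in the getPartitionPath signature quoted above) precisely because the key order determines the directory nesting order.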
[jira] [Commented] (HIVE-11334) Incorrect answer when facing multiple chars delim and negative count for substring_index
[ https://issues.apache.org/jira/browse/HIVE-11334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14636126#comment-14636126 ] ASF GitHub Bot commented on HIVE-11334: --- GitHub user zhichao-li opened a pull request: https://github.com/apache/hive/pull/47 HIVE-11334-fix substring_index for multiple chars delim You can merge this pull request into a Git repository by running: $ git pull https://github.com/zhichao-li/hive substringindex Alternatively you can review and apply these changes as the patch at: https://github.com/apache/hive/pull/47.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #47 commit 2acedeef1ca4ff1211410e9ffe9c437f2902de0d Author: zhichao.li zhichao...@intel.com Date: 2015-07-22T01:50:03Z fix substring_index for multiple chars delim Incorrect answer when facing multiple chars delim and negative count for substring_index - Key: HIVE-11334 URL: https://issues.apache.org/jira/browse/HIVE-11334 Project: Hive Issue Type: Bug Reporter: zhichao-li Priority: Minor substring_index("www||apache||org", "||", -2) would return "|apache||org" instead of "apache||org" -- This message was sent by Atlassian JIRA (v6.3.4#6332)
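For context, substring_index follows MySQL semantics: a positive count returns everything before the count-th occurrence of the delimiter from the left, and a negative count returns everything after the count-th occurrence from the right — counting whole multi-character delimiters, not individual characters. A standalone sketch of those semantics (an illustration of the expected behavior, not Hive's actual UDF code):

```java
// Standalone sketch of MySQL-style substring_index semantics; this is an
// illustration of the expected behavior, not Hive's implementation.
class SubstringIndexSketch {
    static String substringIndex(String str, String delim, int count) {
        if (count == 0 || delim.isEmpty()) {
            return "";
        }
        if (count > 0) {
            // Scan left-to-right for the count-th occurrence of the delimiter.
            int idx = -1;
            while (count-- > 0) {
                idx = str.indexOf(delim, idx + 1);
                if (idx < 0) {
                    return str; // fewer occurrences than count: whole string
                }
            }
            return str.substring(0, idx);
        }
        // Negative count: scan right-to-left, stepping over the delimiter as
        // a unit -- treating it character-by-character is what produces the
        // stray leading "|" reported above.
        int idx = str.length();
        while (count++ < 0) {
            idx = str.lastIndexOf(delim, idx - 1);
            if (idx < 0) {
                return str;
            }
        }
        return str.substring(idx + delim.length());
    }
}
```

With this, substringIndex("www||apache||org", "||", -2) skips back over two full "||" delimiters and returns "apache||org", matching the expected answer in the report.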
[jira] [Commented] (HIVE-11328) Avoid String representation of expression nodes in ConstantPropagateProcFactory unless necessary
[ https://issues.apache.org/jira/browse/HIVE-11328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14636155#comment-14636155 ] Ashutosh Chauhan commented on HIVE-11328: - +1 Avoid String representation of expression nodes in ConstantPropagateProcFactory unless necessary Key: HIVE-11328 URL: https://issues.apache.org/jira/browse/HIVE-11328 Project: Hive Issue Type: Bug Reporter: Jesus Camacho Rodriguez Assignee: Jesus Camacho Rodriguez Attachments: HIVE-11328.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10673) Dynamically partitioned hash join for Tez
[ https://issues.apache.org/jira/browse/HIVE-10673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14636347#comment-14636347 ] Lefty Leverenz commented on HIVE-10673: --- Doc note: *hive.optimize.dynamic.partition.hashjoin* should be documented in the wiki. Does it belong in the Tez section of Configuration Properties, or should it go in the general query execution section and just be added to the list of related parameters at the beginning of the Tez section? * [Configuration Properties -- Tez | https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-Tez] * [Configuration Properties -- Query and DDL Execution | https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-QueryandDDLExecution] Commit error: The commit to master was mislabeled HIVE-11303: Getting Tez LimitExceededException after dag execution on large query (commit ID 04d54f61c9f56906160936751e772080c079498c). The actual HIVE-11303 has commit ID 72f97fc7760134465333983fc40766e9e864e643. Dynamically partitioned hash join for Tez - Key: HIVE-10673 URL: https://issues.apache.org/jira/browse/HIVE-10673 Project: Hive Issue Type: New Feature Components: Query Planning, Query Processor Reporter: Jason Dere Assignee: Jason Dere Labels: TODOC1.3 Fix For: 1.3.0, 2.0.0 Attachments: HIVE-10673.1.patch, HIVE-10673.10.patch, HIVE-10673.11.patch, HIVE-10673.12, HIVE-10673.2.patch, HIVE-10673.3.patch, HIVE-10673.4.patch, HIVE-10673.5.patch, HIVE-10673.6.patch, HIVE-10673.7.patch, HIVE-10673.8.patch, HIVE-10673.9.patch Some analysis of shuffle join queries by [~mmokhtar]/[~gopalv] found about 2/3 of the CPU was spent during sorting/merging. While this does not work for MR, for other execution engines (such as Tez), it is possible to create a reduce-side join that uses unsorted inputs in order to eliminate the sorting, which may be faster than a shuffle join. 
To join on unsorted inputs, we can use the hash join algorithm to perform the join in the reducer. This will require the small tables in the join to fit in the reducer/hash table for this to work. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
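The reduce-side hash join described above boils down to a classic build/probe loop. A toy sketch with string-keyed rows (the names and row shapes are illustrative only — the real Tez operator works on serialized row containers and has spilling and vectorization concerns):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy build/probe hash join over unsorted inputs: each row is a {key, value}
// pair. The small side must fit in memory; no sorting or merging is needed,
// which is the CPU saving motivating this feature.
class HashJoinSketch {
    static List<String> innerJoin(List<String[]> smallSide, List<String[]> bigSide) {
        // Build phase: hash the small side by join key.
        Map<String, List<String>> built = new HashMap<>();
        for (String[] row : smallSide) {
            built.computeIfAbsent(row[0], k -> new ArrayList<>()).add(row[1]);
        }
        // Probe phase: stream the large side in whatever order it arrives.
        List<String> out = new ArrayList<>();
        for (String[] row : bigSide) {
            for (String match : built.getOrDefault(row[0], Collections.emptyList())) {
                out.add(row[0] + "," + match + "," + row[1]);
            }
        }
        return out;
    }
}
```

Because only the build side is materialized, the probe side can arrive unsorted from the shuffle, eliminating the sort/merge step that dominated CPU in the profiled queries.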
[jira] [Commented] (HIVE-11310) Avoid expensive AST tree conversion to String in RowResolver
[ https://issues.apache.org/jira/browse/HIVE-11310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14636063#comment-14636063 ] Hive QA commented on HIVE-11310: {color:green}Overall{color}: +1 all checks pass Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12746399/HIVE-11310.4.patch {color:green}SUCCESS:{color} +1 9245 tests passed Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4686/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4686/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4686/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase {noformat} This message is automatically generated. ATTACHMENT ID: 12746399 - PreCommit-HIVE-TRUNK-Build Avoid expensive AST tree conversion to String in RowResolver Key: HIVE-11310 URL: https://issues.apache.org/jira/browse/HIVE-11310 Project: Hive Issue Type: Bug Components: Parser Reporter: Jesus Camacho Rodriguez Assignee: Jesus Camacho Rodriguez Attachments: HIVE-11310.1.patch, HIVE-11310.2.patch, HIVE-11310.3.patch, HIVE-11310.4.patch, HIVE-11310.patch We use the AST tree String representation of a condition in the WHERE clause to identify its column in the RowResolver. This can lead to OOM Exceptions when the condition is very large. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10799) Refactor the SearchArgumentFactory to remove the dependence on ExprNodeGenericFuncDesc
[ https://issues.apache.org/jira/browse/HIVE-10799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14636064#comment-14636064 ] Prasanth Jayachandran commented on HIVE-10799: -- PPD on Char types is broken for bloom filters. Char object is not trimmed before inserting into bloom filter. So when we convert the stats object to string the hashcodes will not match. We are hitting this here https://issues.apache.org/jira/browse/HIVE-11312?focusedCommentId=14634349page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14634349 Would it make sense to add CHAR to predicate types? Refactor the SearchArgumentFactory to remove the dependence on ExprNodeGenericFuncDesc -- Key: HIVE-10799 URL: https://issues.apache.org/jira/browse/HIVE-10799 Project: Hive Issue Type: Sub-task Reporter: Owen O'Malley Assignee: Owen O'Malley Attachments: HIVE-10799.patch, HIVE-10799.patch, HIVE-10799.patch, HIVE-10799.patch, HIVE-10799.patch SearchArgumentFactory and SearchArgumentImpl are high level and shouldn't depend on the internals of Hive's AST model. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
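The mismatch described in the comment above comes from CHAR(n) values being blank-padded: hashing the padded form on the insert path and the trimmed form on the lookup path yields different hash codes, so the bloom filter never matches. A minimal illustration of the normalization needed on both paths (the helper below is hypothetical, not Hive's code):

```java
// CHAR(n) values are blank-padded to n characters. If one code path hashes
// the padded form and another hashes the trimmed form, bloom filter lookups
// silently miss. Stripping trailing spaces on both the insert and lookup
// paths keeps the hash codes consistent. Illustrative helper only.
class CharNormalize {
    static String stripTrailingSpaces(String charValue) {
        int end = charValue.length();
        while (end > 0 && charValue.charAt(end - 1) == ' ') {
            end--; // drop CHAR padding, but keep interior/leading spaces
        }
        return charValue.substring(0, end);
    }
}
```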
[jira] [Updated] (HIVE-11333) CBO: Calcite Operator To Hive Operator (Calcite Return Path): ColumnPruner prunes columns of UnionOperator that should be kept
[ https://issues.apache.org/jira/browse/HIVE-11333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-11333: --- Attachment: HIVE-11333.01.patch CBO: Calcite Operator To Hive Operator (Calcite Return Path): ColumnPruner prunes columns of UnionOperator that should be kept -- Key: HIVE-11333 URL: https://issues.apache.org/jira/browse/HIVE-11333 Project: Hive Issue Type: Sub-task Components: CBO Reporter: Pengcheng Xiong Assignee: Pengcheng Xiong Attachments: HIVE-11333.01.patch unionOperator will have the schema following the operator in the first branch. Because ColumnPruner prunes columns based on the internal name, the column in other branches may be pruned due to a different internal name from the first branch. To repro, run rcfile_union.q with return path turned on. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11334) Incorrect answer when facing multiple chars delim and negative count for substring_index
[ https://issues.apache.org/jira/browse/HIVE-11334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhichao-li updated HIVE-11334: -- Priority: Major (was: Minor) Incorrect answer when facing multiple chars delim and negative count for substring_index - Key: HIVE-11334 URL: https://issues.apache.org/jira/browse/HIVE-11334 Project: Hive Issue Type: Bug Reporter: zhichao-li substring_index("www||apache||org", "||", -2) would return "|apache||org" instead of "apache||org" -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10673) Dynamically partitioned hash join for Tez
[ https://issues.apache.org/jira/browse/HIVE-10673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lefty Leverenz updated HIVE-10673: -- Labels: TODOC1.3 (was: ) Dynamically partitioned hash join for Tez - Key: HIVE-10673 URL: https://issues.apache.org/jira/browse/HIVE-10673 Project: Hive Issue Type: New Feature Components: Query Planning, Query Processor Reporter: Jason Dere Assignee: Jason Dere Labels: TODOC1.3 Fix For: 1.3.0, 2.0.0 Attachments: HIVE-10673.1.patch, HIVE-10673.10.patch, HIVE-10673.11.patch, HIVE-10673.12, HIVE-10673.2.patch, HIVE-10673.3.patch, HIVE-10673.4.patch, HIVE-10673.5.patch, HIVE-10673.6.patch, HIVE-10673.7.patch, HIVE-10673.8.patch, HIVE-10673.9.patch Some analysis of shuffle join queries by [~mmokhtar]/[~gopalv] found about 2/3 of the CPU was spent during sorting/merging. While this does not work for MR, for other execution engines (such as Tez), it is possible to create a reduce-side join that uses unsorted inputs in order to eliminate the sorting, which may be faster than a shuffle join. To join on unsorted inputs, we can use the hash join algorithm to perform the join in the reducer. This will require the small tables in the join to fit in the reducer/hash table for this to work. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8128) Improve Parquet Vectorization
[ https://issues.apache.org/jira/browse/HIVE-8128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dong Chen updated HIVE-8128: Attachment: testParquetFile Uploaded the Parquet file for the qfile test; it should be put at ./data/files/ Improve Parquet Vectorization - Key: HIVE-8128 URL: https://issues.apache.org/jira/browse/HIVE-8128 Project: Hive Issue Type: Sub-task Reporter: Brock Noland Assignee: Dong Chen Fix For: parquet-branch Attachments: HIVE-8128-parquet.patch.POC, HIVE-8128.1-parquet.patch, HIVE-8128.6-parquet.patch, testParquetFile NO PRECOMMIT TESTS What we'll want to do is finish the vectorization work (e.g. VectorizedOrcSerde) which was partially done in HIVE-5998. As discussed in PARQUET-131, we will work out a Hive POC based on the new Parquet vectorized API, and then finish the implementation after it is finalized. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11335) Multi-Join Inner Query producing incorrect results
[ https://issues.apache.org/jira/browse/HIVE-11335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] fatkun updated HIVE-11335: -- Description: test steps: {code} create table log (uid string, uid2 string); insert into log values ('1', '1'); create table user (uid string, name string); insert into user values ('1', 'test1'); select b.name, c.name from log a left outer join (select uid, name from user) b on (a.uid=b.uid) left outer join user c on (a.uid2=c.uid); {code} This returns the wrong result: 1 test1. It should return test1 for both columns. While trying to find the error, I noticed that this query (with a different join key) returns the right result: {code} select b.name, c.name from log a left outer join (select uid, name from user) b on (a.uid=b.uid) left outer join user c on (a.uid=c.uid); {code} The explain output is different; Query 1 only selects one column. It should select both uid and name. {code} b:user TableScan alias: user Statistics: Num rows: 1 Data size: 7 Basic stats: COMPLETE Column stats: NONE Select Operator expressions: uid (type: string) outputColumnNames: _col0 {code} I think there is something wrong in ColumnPruner, but I cannot pin it down. 
It may be related to HIVE-10996 was: test steps: {code} create table log (uid string, uid2 string); insert into log values ('1', '1'); create table user (uid string, name string); insert into user values ('1', 'test1'); select b.name, c.name from log a left outer join (select uid, name from user) b on (a.uid=b.uid) left outer join user c on (a.uid2=c.uid); {code} This returns the wrong result: 1 test1. It should return test1 for both columns. While trying to find the error, I noticed that this query (with a different join key) returns the right result: {code} select b.name, c.name from log a left outer join (select uid, name from user) b on (a.uid=b.uid) left outer join user c on (a.uid=c.uid); {code} The explain output is different; Query 1 only selects one column. {code} b:user TableScan alias: user Statistics: Num rows: 1 Data size: 7 Basic stats: COMPLETE Column stats: NONE Select Operator expressions: uid (type: string) outputColumnNames: _col0 {code} I think there is something wrong in ColumnPruner, but I cannot pin it down. It may be related to HIVE-10996 Multi-Join Inner Query producing incorrect results -- Key: HIVE-11335 URL: https://issues.apache.org/jira/browse/HIVE-11335 Project: Hive Issue Type: Bug Components: Query Planning Affects Versions: 1.1.0 Environment: CDH5.4.0 Reporter: fatkun test steps: {code} create table log (uid string, uid2 string); insert into log values ('1', '1'); create table user (uid string, name string); insert into user values ('1', 'test1'); select b.name, c.name from log a left outer join (select uid, name from user) b on (a.uid=b.uid) left outer join user c on (a.uid2=c.uid); {code} This returns the wrong result: 1 test1. It should return test1 for both columns. While trying to find the error, I noticed that this query (with a different join key) returns the right result: {code} select b.name, c.name from log a left outer join (select uid, name from user) b on (a.uid=b.uid) left outer join user c on (a.uid=c.uid); {code} The explain output is different; Query 1 only selects one column. It should select both uid and name. 
{code} b:user TableScan alias: user Statistics: Num rows: 1 Data size: 7 Basic stats: COMPLETE Column stats: NONE Select Operator expressions: uid (type: string) outputColumnNames: _col0 {code} I think there is something wrong in ColumnPruner, but I cannot pin it down. It may be related to HIVE-10996 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11271) java.lang.IndexOutOfBoundsException when union all with if function
[ https://issues.apache.org/jira/browse/HIVE-11271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14636169#comment-14636169 ] Ashutosh Chauhan commented on HIVE-11271: - [~ychena] Can you check if this issue is fixed by HIVE-11333? If so, I think that is a better fix, since such issues should be resolved at compile time, not run time (i.e., Operators should not participate in this) cc: [~pxiong] java.lang.IndexOutOfBoundsException when union all with if function --- Key: HIVE-11271 URL: https://issues.apache.org/jira/browse/HIVE-11271 Project: Hive Issue Type: Bug Components: Logical Optimizer Affects Versions: 0.14.0, 1.0.0, 1.2.0 Reporter: Yongzhi Chen Assignee: Yongzhi Chen Attachments: HIVE-11271.1.patch Some queries with Union all as subquery fail in MapReduce task with stacktrace: {noformat} 15/07/15 14:19:30 [pool-13-thread-1]: INFO exec.UnionOperator: Initializing operator UNION[104] 15/07/15 14:19:30 [Thread-72]: INFO mapred.LocalJobRunner: Map task executor complete. 
15/07/15 14:19:30 [Thread-72]: WARN mapred.LocalJobRunner: job_local826862759_0005 java.lang.Exception: java.lang.RuntimeException: Error in configuring object at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:354) Caused by: java.lang.RuntimeException: Error in configuring object at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93) at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:426) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:366) at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:223) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.reflect.InvocationTargetException at sun.reflect.GeneratedMethodAccessor53.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88) ... 10 more Caused by: java.lang.RuntimeException: Error in configuring object at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93) at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117) at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:34) ... 
14 more Caused by: java.lang.reflect.InvocationTargetException at sun.reflect.GeneratedMethodAccessor53.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88) ... 17 more Caused by: java.lang.RuntimeException: Map operator initialization failed at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.configure(ExecMapper.java:140) ... 21 more Caused by: java.lang.IndexOutOfBoundsException: Index: 1, Size: 1 at java.util.ArrayList.rangeCheck(ArrayList.java:635) at java.util.ArrayList.get(ArrayList.java:411) at org.apache.hadoop.hive.ql.exec.UnionOperator.initializeOp(UnionOperator.java:86) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:362) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:481) at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:438) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:375) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:481) at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:438) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:375) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:481) at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:438) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:375) at
[jira] [Commented] (HIVE-11271) java.lang.IndexOutOfBoundsException when union all with if function
[ https://issues.apache.org/jira/browse/HIVE-11271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14636085#comment-14636085 ] Yongzhi Chen commented on HIVE-11271: - Thanks [~szehon] for reviewing it. java.lang.IndexOutOfBoundsException when union all with if function --- Key: HIVE-11271 URL: https://issues.apache.org/jira/browse/HIVE-11271 Project: Hive Issue Type: Bug Components: Logical Optimizer Affects Versions: 0.14.0, 1.0.0, 1.2.0 Reporter: Yongzhi Chen Assignee: Yongzhi Chen Attachments: HIVE-11271.1.patch Some queries with Union all as subquery fail in MapReduce task with stacktrace: {noformat} 15/07/15 14:19:30 [pool-13-thread-1]: INFO exec.UnionOperator: Initializing operator UNION[104] 15/07/15 14:19:30 [Thread-72]: INFO mapred.LocalJobRunner: Map task executor complete. 15/07/15 14:19:30 [Thread-72]: WARN mapred.LocalJobRunner: job_local826862759_0005 java.lang.Exception: java.lang.RuntimeException: Error in configuring object at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:354) Caused by: java.lang.RuntimeException: Error in configuring object at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93) at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:426) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:366) at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:223) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.reflect.InvocationTargetException at 
sun.reflect.GeneratedMethodAccessor53.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88) ... 10 more Caused by: java.lang.RuntimeException: Error in configuring object at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93) at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117) at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:34) ... 14 more Caused by: java.lang.reflect.InvocationTargetException at sun.reflect.GeneratedMethodAccessor53.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88) ... 17 more Caused by: java.lang.RuntimeException: Map operator initialization failed at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.configure(ExecMapper.java:140) ... 
21 more Caused by: java.lang.IndexOutOfBoundsException: Index: 1, Size: 1 at java.util.ArrayList.rangeCheck(ArrayList.java:635) at java.util.ArrayList.get(ArrayList.java:411) at org.apache.hadoop.hive.ql.exec.UnionOperator.initializeOp(UnionOperator.java:86) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:362) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:481) at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:438) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:375) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:481) at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:438) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:375) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:481) at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:438) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:375) at org.apache.hadoop.hive.ql.exec.MapOperator.initializeMapOperator(MapOperator.java:442) at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.configure(ExecMapper.java:119) ... 21 more {noformat} Reproduce: {noformat}
[jira] [Updated] (HIVE-11335) Multi-Join Inner Query producing incorrect results
[ https://issues.apache.org/jira/browse/HIVE-11335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] fatkun updated HIVE-11335: -- Attachment: query1.txt query2.txt Attached the query explain plans. Multi-Join Inner Query producing incorrect results -- Key: HIVE-11335 URL: https://issues.apache.org/jira/browse/HIVE-11335 Project: Hive Issue Type: Bug Components: Query Planning Affects Versions: 1.1.0 Environment: CDH5.4.0 Reporter: fatkun Attachments: query1.txt, query2.txt Test steps: {code} create table log (uid string, uid2 string); insert into log values ('1', '1'); create table user (uid string, name string); insert into user values ('1', 'test1'); select b.name, c.name from log a left outer join (select uid, name from user) b on (a.uid=b.uid) left outer join user c on (a.uid2=c.uid); {code} This returns the wrong result: 1 test1. Both columns should return test1. While looking for the error I found that this query (with a different join key) returns the right result: {code} select b.name, c.name from log a left outer join (select uid, name from user) b on (a.uid=b.uid) left outer join user c on (a.uid=c.uid); {code} The explain output is different: Query 1 selects only one column, but it should select both uid and name. {code} b:user TableScan alias: user Statistics: Num rows: 1 Data size: 7 Basic stats: COMPLETE Column stats: NONE Select Operator expressions: uid (type: string) outputColumnNames: _col0 {code} I think there is something wrong in ColumnPruner, but I cannot find it. It may be related to HIVE-10996 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11335) Multi-Join Inner Query producing incorrect results
[ https://issues.apache.org/jira/browse/HIVE-11335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] fatkun updated HIVE-11335: -- Description: Test steps: {code} create table log (uid string, uid2 string); insert into log values ('1', '1'); create table user (uid string, name string); insert into user values ('1', 'test1'); select b.name, c.name from log a left outer join (select uid, name from user) b on (a.uid=b.uid) left outer join user c on (a.uid2=c.uid); {code} This returns the wrong result: 1 test1. Both columns should return test1. While looking for the error I found that this query (with a different join key) returns the right result: {code} select b.name, c.name from log a left outer join (select uid, name from user) b on (a.uid=b.uid) left outer join user c on (a.uid=c.uid); {code} The explain output is different: Query 1 selects only one column. {code} b:user TableScan alias: user Statistics: Num rows: 1 Data size: 7 Basic stats: COMPLETE Column stats: NONE Select Operator expressions: uid (type: string) outputColumnNames: _col0 {code} I think there is something wrong in ColumnPruner, but I cannot find it.
It may be related to HIVE-10996. was: test step ``` create table log (uid string, uid2 string); insert into log values ('1', '1'); create table user (uid string, name string); insert into user values ('1', test1); select b.name, c.name from log a left outer join (select uid, name from user) b on (a.uid=b.uid) left outer join user c on (a.uid2=c.uid); ``` return wrong result: 1 test1 It should be both return test1 I try to find error, if I use this query, return right result.(join key different) ``` select b.name, c.name from log a left outer join (select uid, name from user) b on (a.uid=b.uid) left outer join user c on (a.uid=c.uid); ``` The explain is different,Query1 only select one colum ``` b:user TableScan alias: user Statistics: Num rows: 1 Data size: 7 Basic stats: COMPLETE Column stats: NONE Select Operator expressions: uid (type: string) outputColumnNames: _col0 ``` I think there is something wrong in ColumnPruner.But i cannot find it out. It may relate HIVE-10996 Multi-Join Inner Query producing incorrect results -- Key: HIVE-11335 URL: https://issues.apache.org/jira/browse/HIVE-11335 Project: Hive Issue Type: Bug Components: Query Planning Affects Versions: 1.1.0 Environment: CDH5.4.0 Reporter: fatkun Test steps: {code} create table log (uid string, uid2 string); insert into log values ('1', '1'); create table user (uid string, name string); insert into user values ('1', 'test1'); select b.name, c.name from log a left outer join (select uid, name from user) b on (a.uid=b.uid) left outer join user c on (a.uid2=c.uid); {code} This returns the wrong result: 1 test1. Both columns should return test1. While looking for the error I found that this query (with a different join key) returns the right result: {code} select b.name, c.name from log a left outer join (select uid, name from user) b on (a.uid=b.uid) left outer join user c on (a.uid=c.uid); {code} The explain output is different: Query 1 selects only one column. {code} b:user TableScan alias: user Statistics: Num rows: 1 Data size: 7 Basic stats: COMPLETE Column stats: NONE Select Operator expressions: uid (type: string) outputColumnNames: _col0 {code} I think there is something wrong in ColumnPruner, but I cannot find it. It may be related to HIVE-10996. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11077) Add support in parser and wire up to txn manager
[ https://issues.apache.org/jira/browse/HIVE-11077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14635611#comment-14635611 ] Alan Gates commented on HIVE-11077: --- Comments posted to review board. Add support in parser and wire up to txn manager Key: HIVE-11077 URL: https://issues.apache.org/jira/browse/HIVE-11077 Project: Hive Issue Type: Sub-task Components: SQL, Transactions Affects Versions: 1.2.0 Reporter: Eugene Koifman Assignee: Eugene Koifman Attachments: HIVE-11077.3.patch, HIVE-11077.5.patch, HIVE-11077.6.patch, HIVE-11077.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11310) Avoid expensive AST tree conversion to String in RowResolver
[ https://issues.apache.org/jira/browse/HIVE-11310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-11310: --- Attachment: HIVE-11310.4.patch Avoid expensive AST tree conversion to String in RowResolver Key: HIVE-11310 URL: https://issues.apache.org/jira/browse/HIVE-11310 Project: Hive Issue Type: Bug Components: Parser Reporter: Jesus Camacho Rodriguez Assignee: Jesus Camacho Rodriguez Attachments: HIVE-11310.1.patch, HIVE-11310.2.patch, HIVE-11310.3.patch, HIVE-11310.4.patch, HIVE-11310.patch We use the AST tree String representation of a condition in the WHERE clause to identify its column in the RowResolver. This can lead to OOM Exceptions when the condition is very large. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11271) java.lang.IndexOutOfBoundsException when union all with if function
[ https://issues.apache.org/jira/browse/HIVE-11271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14635581#comment-14635581 ] Szehon Ho commented on HIVE-11271: -- Sorry for late reply. The overall idea makes sense (keeping track of corresponding columns in parent filter condition), so +1 from me. java.lang.IndexOutOfBoundsException when union all with if function --- Key: HIVE-11271 URL: https://issues.apache.org/jira/browse/HIVE-11271 Project: Hive Issue Type: Bug Components: Logical Optimizer Affects Versions: 0.14.0, 1.0.0, 1.2.0 Reporter: Yongzhi Chen Assignee: Yongzhi Chen Attachments: HIVE-11271.1.patch Some queries with Union all as subquery fail in MapReduce task with stacktrace: {noformat} 15/07/15 14:19:30 [pool-13-thread-1]: INFO exec.UnionOperator: Initializing operator UNION[104] 15/07/15 14:19:30 [Thread-72]: INFO mapred.LocalJobRunner: Map task executor complete. 15/07/15 14:19:30 [Thread-72]: WARN mapred.LocalJobRunner: job_local826862759_0005 java.lang.Exception: java.lang.RuntimeException: Error in configuring object at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:354) Caused by: java.lang.RuntimeException: Error in configuring object at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93) at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:426) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:366) at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:223) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at 
java.lang.Thread.run(Thread.java:745) Caused by: java.lang.reflect.InvocationTargetException at sun.reflect.GeneratedMethodAccessor53.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88) ... 10 more Caused by: java.lang.RuntimeException: Error in configuring object at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93) at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117) at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:34) ... 14 more Caused by: java.lang.reflect.InvocationTargetException at sun.reflect.GeneratedMethodAccessor53.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88) ... 17 more Caused by: java.lang.RuntimeException: Map operator initialization failed at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.configure(ExecMapper.java:140) ... 
21 more Caused by: java.lang.IndexOutOfBoundsException: Index: 1, Size: 1 at java.util.ArrayList.rangeCheck(ArrayList.java:635) at java.util.ArrayList.get(ArrayList.java:411) at org.apache.hadoop.hive.ql.exec.UnionOperator.initializeOp(UnionOperator.java:86) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:362) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:481) at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:438) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:375) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:481) at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:438) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:375) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:481) at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:438) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:375) at org.apache.hadoop.hive.ql.exec.MapOperator.initializeMapOperator(MapOperator.java:442) at
[jira] [Commented] (HIVE-11301) thrift metastore issue when getting stats results in disconnect
[ https://issues.apache.org/jira/browse/HIVE-11301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14635694#comment-14635694 ] Hive QA commented on HIVE-11301: {color:green}Overall{color}: +1 all checks pass Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12746384/HIVE-11301.02.patch {color:green}SUCCESS:{color} +1 9230 tests passed Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4683/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4683/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4683/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase {noformat} This message is automatically generated. ATTACHMENT ID: 12746384 - PreCommit-HIVE-TRUNK-Build thrift metastore issue when getting stats results in disconnect --- Key: HIVE-11301 URL: https://issues.apache.org/jira/browse/HIVE-11301 Project: Hive Issue Type: Bug Components: Metastore Reporter: Sergey Shelukhin Assignee: Pengcheng Xiong Attachments: HIVE-11301.01.patch, HIVE-11301.02.patch On metastore side it looks like this: {noformat} 2015-07-17 20:32:27,795 ERROR [pool-3-thread-150]: server.TThreadPoolServer (TThreadPoolServer.java:run(294)) - Thrift error occurred during processing of message. org.apache.thrift.protocol.TProtocolException: Required field 'colStats' is unset! 
Struct:AggrStats(colStats:null, partsFound:0) at org.apache.hadoop.hive.metastore.api.AggrStats.validate(AggrStats.java:389) at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_aggr_stats_for_result.validate(ThriftHiveMetastore.java) at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_aggr_stats_for_result$get_aggr_stats_for_resultStandardScheme.write(ThriftHiveMetastore.java) at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_aggr_stats_for_result$get_aggr_stats_for_resultStandardScheme.write(ThriftHiveMetastore.java) at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_aggr_stats_for_result.write(ThriftHiveMetastore.java) at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:53) at org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:110) at org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:106) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657) at org.apache.hadoop.hive.metastore.TUGIBasedProcessor.process(TUGIBasedProcessor.java:118) at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:285) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) {noformat} and then {noformat} 2015-07-17 20:32:27,796 WARN [pool-3-thread-150]: transport.TIOStreamTransport (TIOStreamTransport.java:close(112)) - Error closing output stream. 
java.net.SocketException: Socket closed at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:116) at java.net.SocketOutputStream.write(SocketOutputStream.java:153) at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82) at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140) at java.io.FilterOutputStream.close(FilterOutputStream.java:158) at org.apache.thrift.transport.TIOStreamTransport.close(TIOStreamTransport.java:110) at org.apache.thrift.transport.TSocket.close(TSocket.java:196) at org.apache.hadoop.hive.thrift.TFilterTransport.close(TFilterTransport.java:52) at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:304) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) {noformat} Which on client manifests as {noformat} 2015-07-17 20:32:27,796 WARN [main()]: metastore.RetryingMetaStoreClient (RetryingMetaStoreClient.java:invoke(187)) - MetaStoreClient lost
[jira] [Commented] (HIVE-11301) thrift metastore issue when getting stats results in disconnect
[ https://issues.apache.org/jira/browse/HIVE-11301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14635698#comment-14635698 ] Pengcheng Xiong commented on HIVE-11301: [~sershe], could you please take a look? IMHO, I think the problem of invalid thrift can be handled in future, in a separate JIRA. Thanks. thrift metastore issue when getting stats results in disconnect --- Key: HIVE-11301 URL: https://issues.apache.org/jira/browse/HIVE-11301 Project: Hive Issue Type: Bug Components: Metastore Reporter: Sergey Shelukhin Assignee: Pengcheng Xiong Attachments: HIVE-11301.01.patch, HIVE-11301.02.patch On metastore side it looks like this: {noformat} 2015-07-17 20:32:27,795 ERROR [pool-3-thread-150]: server.TThreadPoolServer (TThreadPoolServer.java:run(294)) - Thrift error occurred during processing of message. org.apache.thrift.protocol.TProtocolException: Required field 'colStats' is unset! Struct:AggrStats(colStats:null, partsFound:0) at org.apache.hadoop.hive.metastore.api.AggrStats.validate(AggrStats.java:389) at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_aggr_stats_for_result.validate(ThriftHiveMetastore.java) at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_aggr_stats_for_result$get_aggr_stats_for_resultStandardScheme.write(ThriftHiveMetastore.java) at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_aggr_stats_for_result$get_aggr_stats_for_resultStandardScheme.write(ThriftHiveMetastore.java) at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_aggr_stats_for_result.write(ThriftHiveMetastore.java) at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:53) at org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:110) at org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:106) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657) at org.apache.hadoop.hive.metastore.TUGIBasedProcessor.process(TUGIBasedProcessor.java:118) at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:285) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) {noformat} and then {noformat} 2015-07-17 20:32:27,796 WARN [pool-3-thread-150]: transport.TIOStreamTransport (TIOStreamTransport.java:close(112)) - Error closing output stream. java.net.SocketException: Socket closed at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:116) at java.net.SocketOutputStream.write(SocketOutputStream.java:153) at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82) at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140) at java.io.FilterOutputStream.close(FilterOutputStream.java:158) at org.apache.thrift.transport.TIOStreamTransport.close(TIOStreamTransport.java:110) at org.apache.thrift.transport.TSocket.close(TSocket.java:196) at org.apache.hadoop.hive.thrift.TFilterTransport.close(TFilterTransport.java:52) at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:304) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) {noformat} Which on client manifests as {noformat} 2015-07-17 20:32:27,796 WARN [main()]: metastore.RetryingMetaStoreClient (RetryingMetaStoreClient.java:invoke(187)) - MetaStoreClient lost connection. Attempting to reconnect. 
org.apache.thrift.transport.TTransportException at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132) at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86) at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:429) at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:318) at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:219) at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69) at
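The chain of errors quoted above follows from how generated Thrift code validates required fields: the server-side handler returns an `AggrStats` with a null required `colStats`, validation throws during the write path, and the server closes the socket, so the client only ever sees a bare `TTransportException`. A minimal Java sketch of that pattern, with hypothetical class names standing in for the generated Thrift code (not Hive's actual classes):

```java
// Hypothetical miniature of the Thrift "validate on write" pattern behind this
// failure. Generated structs check required fields while serializing, so a
// struct with a null required field blows up only after the handler has
// already returned it -- the client just sees the connection drop.
class ProtocolException extends RuntimeException {
    ProtocolException(String msg) { super(msg); }
}

class AggrStatsLike {
    Object colStats;   // declared "required" in the IDL
    long partsFound;

    void validate() {
        if (colStats == null) {
            throw new ProtocolException("Required field 'colStats' is unset! "
                + "Struct:AggrStats(colStats:null, partsFound:" + partsFound + ")");
        }
    }

    // Serialization validates first, as the generated write() methods do.
    String write() {
        validate();
        return "AggrStats(colStats:" + colStats + ", partsFound:" + partsFound + ")";
    }
}
```

Under this reading, the fix discussed in the JIRA is to stop the metastore from ever handing a struct with an unset required field to the serializer.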
[jira] [Commented] (HIVE-11296) Merge from master to spark branch [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-11296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14635578#comment-14635578 ] Hive QA commented on HIVE-11296: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12746164/HIVE-11296.1-spark.patch {color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 7689 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.initializationError org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_dynamic_rdd_cache org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_load_dyn_part14 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_vector_count_distinct org.apache.hive.jdbc.TestSSL.testSSLConnectionWithProperty {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/937/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/937/console Test logs: http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-937/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 5 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12746164 - PreCommit-HIVE-SPARK-Build Merge from master to spark branch [Spark Branch] Key: HIVE-11296 URL: https://issues.apache.org/jira/browse/HIVE-11296 Project: Hive Issue Type: Bug Components: Spark Reporter: Chao Sun Assignee: Chao Sun Attachments: HIVE-11296.1-spark.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8678) Pig fails to correctly load DATE fields using HCatalog
[ https://issues.apache.org/jira/browse/HIVE-8678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14635548#comment-14635548 ] Sushanth Sowmyan commented on HIVE-8678: What storage format are you using for the table in question? (i.e. is it Text, RCFile, ORC, something else?) Pig fails to correctly load DATE fields using HCatalog -- Key: HIVE-8678 URL: https://issues.apache.org/jira/browse/HIVE-8678 Project: Hive Issue Type: Bug Components: HCatalog Affects Versions: 0.13.1 Reporter: Michael McLellan Assignee: Sushanth Sowmyan Using: Hadoop 2.5.0-cdh5.2.0 Pig 0.12.0-cdh5.2.0 Hive 0.13.1-cdh5.2.0 When using pig -useHCatalog to load a Hive table that has a DATE field, when trying to DUMP the field, the following error occurs: {code} 2014-10-30 22:58:05,469 [main] ERROR org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher - org.apache.pig.backend.executionengine.ExecException: ERROR 6018: Error converting read value to tuple at org.apache.hive.hcatalog.pig.HCatBaseLoader.getNext(HCatBaseLoader.java:76) at org.apache.hive.hcatalog.pig.HCatLoader.getNext(HCatLoader.java:58) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:211) at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:553) at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80) at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91) at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144) at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:784) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341) at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614) at 
org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163) Caused by: java.lang.ClassCastException: java.lang.String cannot be cast to java.sql.Date at org.apache.hive.hcatalog.pig.PigHCatUtil.extractPigObject(PigHCatUtil.java:420) at org.apache.hive.hcatalog.pig.PigHCatUtil.transformToTuple(PigHCatUtil.java:457) at org.apache.hive.hcatalog.pig.PigHCatUtil.transformToTuple(PigHCatUtil.java:375) at org.apache.hive.hcatalog.pig.HCatBaseLoader.getNext(HCatBaseLoader.java:64) 2014-10-30 22:58:05,469 [main] ERROR org.apache.pig.tools.pigstats.SimplePigStats - ERROR 6018: Error converting read value to tuple {code} It seems to be occurring here: https://github.com/apache/hive/blob/trunk/hcatalog/hcatalog-pig-adapter/src/main/java/org/apache/hive/hcatalog/pig/PigHCatUtil.java#L433 and that it should be: {code}Date d = Date.valueOf(o);{code} instead of {code}Date d = (Date) o;{code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
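The reporter's proposed fix can be illustrated in isolation. Assuming (as the stack trace suggests) that the deserialized field arrives as an `Object` that actually holds a `String` such as `"2014-10-30"`, a blind cast fails while `java.sql.Date.valueOf` parses it; the helper names below are hypothetical, not Hive's actual code:

```java
import java.sql.Date;

// Hypothetical illustration of the fix proposed above: the value read back
// is a String payload, so casting it to java.sql.Date throws, while
// Date.valueOf parses the "yyyy-[m]m-[d]d" form.
class DateConvertDemo {
    static Date asSqlDate(Object o) {
        if (o instanceof Date) {
            return (Date) o;                 // already the right type
        }
        return Date.valueOf(o.toString());   // parse "yyyy-[m]m-[d]d"
    }

    static boolean castFails(Object o) {
        try {
            Date d = (Date) o;               // the code path from the stack trace
            return d == null;                // unreachable for a String payload
        } catch (ClassCastException e) {
            return true;
        }
    }
}
```

Note that `Date.valueOf` only accepts the JDBC date-escape format, which is likely why Sushanth asks about the table's storage format: whether the value arrives as a `String` at all depends on the SerDe in use.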
[jira] [Commented] (HIVE-11254) Process result sets returned by a stored procedure
[ https://issues.apache.org/jira/browse/HIVE-11254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14635560#comment-14635560 ] Dmitry Tolpeko commented on HIVE-11254: --- This was the latest version available in the Maven repo. Actually, I do not use hive-jdbc in tests yet; is it OK if I modify the pom later (when tests requiring a Hive connection are added)? Process result sets returned by a stored procedure -- Key: HIVE-11254 URL: https://issues.apache.org/jira/browse/HIVE-11254 Project: Hive Issue Type: Improvement Components: hpl/sql Reporter: Dmitry Tolpeko Assignee: Dmitry Tolpeko Attachments: HIVE-11254.1.patch, HIVE-11254.2.patch, HIVE-11254.3.patch, HIVE-11254.4.patch A stored procedure can return one or more result sets. A caller should be able to process them. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11316) Use datastructure that doesnt duplicate any part of string for ASTNode::toStringTree()
[ https://issues.apache.org/jira/browse/HIVE-11316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hari Sankar Sivarama Subramaniyan updated HIVE-11316: - Attachment: HIVE-11316.3.patch [~jcamachorodriguez] Can you please review this patch? Patch #3 also addresses the issue raised by [~eugene.koifman] in HIVE-11281. Thanks Hari Use datastructure that doesnt duplicate any part of string for ASTNode::toStringTree() -- Key: HIVE-11316 URL: https://issues.apache.org/jira/browse/HIVE-11316 Project: Hive Issue Type: Bug Reporter: Hari Sankar Sivarama Subramaniyan Assignee: Hari Sankar Sivarama Subramaniyan Attachments: HIVE-11316.1.patch, HIVE-11316.2.patch, HIVE-11316.3.patch HIVE-11281 uses an approach to memoize toStringTree() for ASTNode. This jira is supposed to alter the string memoization to use a different data structure that doesn't duplicate any part of the string, so that we do not run into OOM. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
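The OOM risk described in this JIRA comes from each node memoizing its own copy of its subtree's string, so every subtree's text is duplicated in every ancestor. One way to avoid the duplication, sketched below with hypothetical names (this is an illustration of the idea, not the actual HIVE-11316 patch): render the root's string once into a single shared buffer and have each node remember only its [start, end) offsets into it, so any node's toStringTree() is a substring view rather than a fresh concatenation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: one shared buffer per tree, per-node offsets into it.
// No subtree's text is ever stored twice.
class Node {
    final String token;
    final List<Node> children = new ArrayList<>();
    int start = -1, end = -1;   // this node's slice of the shared buffer
    StringBuilder shared;       // buffer filled by the first render of the tree

    Node(String token) { this.token = token; }
    Node add(Node c) { children.add(c); return this; }

    String toStringTree() {
        if (shared == null) {   // render once; memoizes offsets for all nodes
            render(new StringBuilder());
        }
        return shared.substring(start, end);
    }

    private void render(StringBuilder sb) {
        shared = sb;
        start = sb.length();
        if (children.isEmpty()) {
            sb.append(token);
        } else {
            sb.append('(').append(token);
            for (Node c : children) { sb.append(' '); c.render(sb); }
            sb.append(')');
        }
        end = sb.length();
    }
}
```

With per-node memoized strings, a tree of depth d stores each leaf's text O(d) times; with shared offsets it is stored exactly once, which is the kind of saving the JIRA is after.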
[jira] [Commented] (HIVE-11305) LLAP: Hybrid Map-join cache returns invalid data
[ https://issues.apache.org/jira/browse/HIVE-11305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14635576#comment-14635576 ] Sergey Shelukhin commented on HIVE-11305: - also requires hive.vectorized.execution.mapjoin.native.fast.hashtable.enabled to be true LLAP: Hybrid Map-join cache returns invalid data - Key: HIVE-11305 URL: https://issues.apache.org/jira/browse/HIVE-11305 Project: Hive Issue Type: Sub-task Affects Versions: llap Environment: TPC-DS 200 scale data Reporter: Gopal V Assignee: Sergey Shelukhin Priority: Critical Fix For: llap Attachments: q55-test.sql Start a 1-node LLAP cluster with 16 executors and run attached test-case on the single node instance. {code} Caused by: java.lang.ClassCastException: org.apache.hadoop.hive.ql.exec.persistence.HybridHashTableContainer cannot be cast to org.apache.hadoop.hive.ql.exec.vector.mapjoin.hashtable.VectorMapJoinTableContainer at org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinCommonOperator.loadHashTable(VectorMapJoinCommonOperator.java:648) at org.apache.hadoop.hive.ql.exec.MapJoinOperator.cleanUpInputFileChangedOp(MapJoinOperator.java:314) at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1104) at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1108) at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1108) at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1108) at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1108) at org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:37) at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:86) ... 17 more {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11301) thrift metastore issue when getting stats results in disconnect
[ https://issues.apache.org/jira/browse/HIVE-11301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14635710#comment-14635710 ] Sergey Shelukhin commented on HIVE-11301: - +1 thrift metastore issue when getting stats results in disconnect --- Key: HIVE-11301 URL: https://issues.apache.org/jira/browse/HIVE-11301 Project: Hive Issue Type: Bug Components: Metastore Reporter: Sergey Shelukhin Assignee: Pengcheng Xiong Attachments: HIVE-11301.01.patch, HIVE-11301.02.patch On metastore side it looks like this: {noformat} 2015-07-17 20:32:27,795 ERROR [pool-3-thread-150]: server.TThreadPoolServer (TThreadPoolServer.java:run(294)) - Thrift error occurred during processing of message. org.apache.thrift.protocol.TProtocolException: Required field 'colStats' is unset! Struct:AggrStats(colStats:null, partsFound:0) at org.apache.hadoop.hive.metastore.api.AggrStats.validate(AggrStats.java:389) at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_aggr_stats_for_result.validate(ThriftHiveMetastore.java) at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_aggr_stats_for_result$get_aggr_stats_for_resultStandardScheme.write(ThriftHiveMetastore.java) at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_aggr_stats_for_result$get_aggr_stats_for_resultStandardScheme.write(ThriftHiveMetastore.java) at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_aggr_stats_for_result.write(ThriftHiveMetastore.java) at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:53) at org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:110) at org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:106) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657) at 
org.apache.hadoop.hive.metastore.TUGIBasedProcessor.process(TUGIBasedProcessor.java:118) at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:285) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) {noformat} and then {noformat} 2015-07-17 20:32:27,796 WARN [pool-3-thread-150]: transport.TIOStreamTransport (TIOStreamTransport.java:close(112)) - Error closing output stream. java.net.SocketException: Socket closed at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:116) at java.net.SocketOutputStream.write(SocketOutputStream.java:153) at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82) at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140) at java.io.FilterOutputStream.close(FilterOutputStream.java:158) at org.apache.thrift.transport.TIOStreamTransport.close(TIOStreamTransport.java:110) at org.apache.thrift.transport.TSocket.close(TSocket.java:196) at org.apache.hadoop.hive.thrift.TFilterTransport.close(TFilterTransport.java:52) at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:304) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) {noformat} Which on client manifests as {noformat} 2015-07-17 20:32:27,796 WARN [main()]: metastore.RetryingMetaStoreClient (RetryingMetaStoreClient.java:invoke(187)) - MetaStoreClient lost connection. Attempting to reconnect. 
org.apache.thrift.transport.TTransportException at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132) at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86) at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:429) at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:318) at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:219) at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69) at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_get_aggr_stats_for(ThriftHiveMetastore.java:3029) at
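The server-side failure above happens because the handler hands Thrift an AggrStats whose required colStats field is null; validate() throws during serialization, the socket is torn down, and the client only sees a bare TTransportException. A minimal guard, sketched with a stand-in struct (this is illustrative, not the actual HIVE-11301 patch):

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in for the generated Thrift struct, just to show the invariant:
// a field marked "required" in the IDL must be non-null before write().
class AggrStats {
    List<String> colStats;   // required in the real IDL
    long partsFound;
}

class StatsHandler {
    // Hypothetical helper: whatever the metastore lookup returns, normalize
    // null to an empty list so Thrift's validate() cannot fail on the way out.
    static AggrStats getAggrStatsFor(List<String> lookedUp, long partsFound) {
        AggrStats r = new AggrStats();
        r.colStats = (lookedUp != null) ? lookedUp : new ArrayList<>();
        r.partsFound = partsFound;
        return r;
    }
}
```

Returning an empty list instead of null lets the client distinguish "no stats available" from a transport failure.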
[jira] [Commented] (HIVE-11305) LLAP: Hybrid Map-join cache returns invalid data
[ https://issues.apache.org/jira/browse/HIVE-11305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14635835#comment-14635835 ] Sergey Shelukhin commented on HIVE-11305: - hmm, no, it's a different cache and queryId LLAP: Hybrid Map-join cache returns invalid data - Key: HIVE-11305 URL: https://issues.apache.org/jira/browse/HIVE-11305 Project: Hive Issue Type: Sub-task Affects Versions: llap Environment: TPC-DS 200 scale data Reporter: Gopal V Assignee: Sergey Shelukhin Priority: Critical Fix For: llap Attachments: q55-test.sql Start a 1-node LLAP cluster with 16 executors and run attached test-case on the single node instance. {code} Caused by: java.lang.ClassCastException: org.apache.hadoop.hive.ql.exec.persistence.HybridHashTableContainer cannot be cast to org.apache.hadoop.hive.ql.exec.vector.mapjoin.hashtable.VectorMapJoinTableContainer at org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinCommonOperator.loadHashTable(VectorMapJoinCommonOperator.java:648) at org.apache.hadoop.hive.ql.exec.MapJoinOperator.cleanUpInputFileChangedOp(MapJoinOperator.java:314) at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1104) at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1108) at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1108) at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1108) at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1108) at org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:37) at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:86) ... 17 more {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11305) LLAP: Hybrid Map-join cache returns invalid data
[ https://issues.apache.org/jira/browse/HIVE-11305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14635839#comment-14635839 ] Sergey Shelukhin commented on HIVE-11305: - ah nm, stupid LLAP: Hybrid Map-join cache returns invalid data - Key: HIVE-11305 URL: https://issues.apache.org/jira/browse/HIVE-11305 Project: Hive Issue Type: Sub-task Affects Versions: llap Environment: TPC-DS 200 scale data Reporter: Gopal V Assignee: Sergey Shelukhin Priority: Critical Fix For: llap Attachments: q55-test.sql Start a 1-node LLAP cluster with 16 executors and run attached test-case on the single node instance. {code} Caused by: java.lang.ClassCastException: org.apache.hadoop.hive.ql.exec.persistence.HybridHashTableContainer cannot be cast to org.apache.hadoop.hive.ql.exec.vector.mapjoin.hashtable.VectorMapJoinTableContainer at org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinCommonOperator.loadHashTable(VectorMapJoinCommonOperator.java:648) at org.apache.hadoop.hive.ql.exec.MapJoinOperator.cleanUpInputFileChangedOp(MapJoinOperator.java:314) at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1104) at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1108) at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1108) at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1108) at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1108) at org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:37) at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:86) ... 17 more {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11294) Use HBase to cache aggregated stats
[ https://issues.apache.org/jira/browse/HIVE-11294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14635866#comment-14635866 ] Alan Gates commented on HIVE-11294: --- Incorporated most of the feedback. Responses to a couple of the comments where I disagreed. Use HBase to cache aggregated stats --- Key: HIVE-11294 URL: https://issues.apache.org/jira/browse/HIVE-11294 Project: Hive Issue Type: Improvement Components: Metastore Affects Versions: hbase-metastore-branch Reporter: Alan Gates Assignee: Alan Gates Fix For: hbase-metastore-branch Attachments: HIVE-11294.patch Currently stats are cached only in the memory of the client. Given that HBase can easily manage the scale of caching aggregated stats we should be using it to do so. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
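The shape of the cache the description proposes can be sketched as a get-or-compute lookup keyed by the aggregation request, with the backing map standing in for an HBase table. Everything here is illustrative (no real HBase client calls, and the key format is an assumption):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Illustrative only: check the shared store for a previously aggregated
// result before recomputing, keyed by something like
// (db, table, columns, partition-list digest).
class AggrStatsCache {
    // stand-in for an HBase table of serialized aggregates
    private final Map<String, String> store = new ConcurrentHashMap<>();
    private int recomputes = 0;

    String getOrCompute(String key, Supplier<String> aggregate) {
        return store.computeIfAbsent(key, k -> { recomputes++; return aggregate.get(); });
    }

    int recomputes() { return recomputes; }
}
```

The point of moving this out of client memory is that every client then shares one cache, so repeated aggregations over the same partition set are computed once cluster-wide rather than once per client.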
[jira] [Updated] (HIVE-11196) Utilities.getPartitionDesc() should try to reuse TableDesc object
[ https://issues.apache.org/jira/browse/HIVE-11196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hari Sankar Sivarama Subramaniyan updated HIVE-11196: - Attachment: HIVE-11196.3.patch [~jcamachorodriguez] Can you please review this patch #3. Thanks Hari Utilities.getPartitionDesc() should try to reuse TableDesc object -- Key: HIVE-11196 URL: https://issues.apache.org/jira/browse/HIVE-11196 Project: Hive Issue Type: Bug Reporter: Hari Sankar Sivarama Subramaniyan Assignee: Hari Sankar Sivarama Subramaniyan Attachments: HIVE-11196.1.patch, HIVE-11196.2.patch, HIVE-11196.3.patch Currently, Utilities.getPartitionDesc() creates a new PartitionDesc object which inturn creates new TableDesc object via Utilities.getTableDesc(part.getTable()) for every call. This value needs to be reused so that we can avoid the expense of creating new Descriptor object wherever possible -- This message was sent by Atlassian JIRA (v6.3.4#6332)
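The reuse the description asks for is simple memoization: partitions of the same table share one descriptor instead of building a fresh one per call. A hypothetical sketch (the class names are illustrative, not Hive's Utilities API):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the reuse Utilities.getPartitionDesc() is after:
// all partitions of one table share a single TableDesc instance.
class TableDescCache {
    static final class TableDesc {
        final String tableName;
        TableDesc(String n) { tableName = n; }
    }

    private final Map<String, TableDesc> cache = new HashMap<>();
    private int built = 0;

    TableDesc forTable(String tableName) {
        // construct only on first request; later calls return the cached instance
        return cache.computeIfAbsent(tableName, n -> { built++; return new TableDesc(n); });
    }

    int descriptorsBuilt() { return built; }
}
```

For a query touching thousands of partitions of a handful of tables, this turns thousands of descriptor constructions into a handful.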
[jira] [Commented] (HIVE-11330) Add early termination for recursion in StatsRulesProcFactory$FilterStatsRule.evaluateExpression
[ https://issues.apache.org/jira/browse/HIVE-11330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14635830#comment-14635830 ] Hive QA commented on HIVE-11330: {color:green}Overall{color}: +1 all checks pass Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12746390/HIVE-11330.patch {color:green}SUCCESS:{color} +1 9241 tests passed Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4684/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4684/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4684/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase {noformat} This message is automatically generated. ATTACHMENT ID: 12746390 - PreCommit-HIVE-TRUNK-Build Add early termination for recursion in StatsRulesProcFactory$FilterStatsRule.evaluateExpression --- Key: HIVE-11330 URL: https://issues.apache.org/jira/browse/HIVE-11330 Project: Hive Issue Type: Bug Components: Hive, Physical Optimizer Affects Versions: 0.14.0 Reporter: Mostafa Mokhtar Assignee: Prasanth Jayachandran Attachments: HIVE-11330.patch Queries with heavily nested filters can cause a StackOverflowError {code} Exception in thread main java.lang.StackOverflowError at org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$FilterStatsRule.evaluateExpression(StatsRulesProcFactory.java:301) at org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$FilterStatsRule.evaluateChildExpr(StatsRulesProcFactory.java:525) at org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$FilterStatsRule.evaluateExpression(StatsRulesProcFactory.java:326) at 
org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$FilterStatsRule.evaluateChildExpr(StatsRulesProcFactory.java:525) at org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$FilterStatsRule.evaluateExpression(StatsRulesProcFactory.java:326) at org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$FilterStatsRule.evaluateChildExpr(StatsRulesProcFactory.java:525) at org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$FilterStatsRule.evaluateExpression(StatsRulesProcFactory.java:326) at org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$FilterStatsRule.evaluateChildExpr(StatsRulesProcFactory.java:525) at org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$FilterStatsRule.evaluateExpression(StatsRulesProcFactory.java:326) at org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$FilterStatsRule.evaluateChildExpr(StatsRulesProcFactory.java:525) at org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$FilterStatsRule.evaluateExpression(StatsRulesProcFactory.java:326) at org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$FilterStatsRule.evaluateChildExpr(StatsRulesProcFactory.java:525) at org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$FilterStatsRule.evaluateExpression(StatsRulesProcFactory.java:326) at org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$FilterStatsRule.evaluateChildExpr(StatsRulesProcFactory.java:525) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
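The evaluateExpression/evaluateChildExpr pair above recurses once per nesting level, so a deeply nested filter exhausts the stack. One form of the early termination the title describes can be sketched as: when estimating rows for a conjunction, stop descending as soon as the running estimate reaches zero, since deeper children cannot raise it. This is an illustration of the idea, not the actual Hive patch:

```java
// Illustrative sketch: row estimation over a filter expression tree with
// AND semantics at inner nodes; recursion is cut off once the estimate
// cannot change any further.
class Expr {
    final double selectivity;   // leaf selectivity in [0,1]
    final Expr[] children;      // null for leaves; AND semantics otherwise
    Expr(double s) { this.selectivity = s; this.children = null; }
    Expr(Expr... cs) { this.selectivity = 1.0; this.children = cs; }
}

class FilterStats {
    static long evaluate(Expr e, long numRows) {
        if (numRows == 0) return 0;                  // early termination
        if (e.children == null) {
            return (long) (numRows * e.selectivity);
        }
        long rows = numRows;
        for (Expr child : e.children) {
            rows = evaluate(child, rows);
            if (rows == 0) return 0;                 // prune remaining children
        }
        return rows;
    }
}
```

The pruning does not change the result, only the number of stack frames spent reaching it; a complementary fix is to raise or cap the recursion depth, but short-circuiting avoids the frames outright for the common zero-estimate case.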
[jira] [Commented] (HIVE-11330) Add early termination for recursion in StatsRulesProcFactory$FilterStatsRule.evaluateExpression
[ https://issues.apache.org/jira/browse/HIVE-11330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14635799#comment-14635799 ] Hari Sankar Sivarama Subramaniyan commented on HIVE-11330: -- +1 pending tests. Add early termination for recursion in StatsRulesProcFactory$FilterStatsRule.evaluateExpression --- Key: HIVE-11330 URL: https://issues.apache.org/jira/browse/HIVE-11330 Project: Hive Issue Type: Bug Components: Hive, Physical Optimizer Affects Versions: 0.14.0 Reporter: Mostafa Mokhtar Assignee: Prasanth Jayachandran Attachments: HIVE-11330.patch Queries with heavily nested filters can cause a StackOverflowError {code} Exception in thread main java.lang.StackOverflowError at org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$FilterStatsRule.evaluateExpression(StatsRulesProcFactory.java:301) at org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$FilterStatsRule.evaluateChildExpr(StatsRulesProcFactory.java:525) at org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$FilterStatsRule.evaluateExpression(StatsRulesProcFactory.java:326) at org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$FilterStatsRule.evaluateChildExpr(StatsRulesProcFactory.java:525) at org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$FilterStatsRule.evaluateExpression(StatsRulesProcFactory.java:326) at org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$FilterStatsRule.evaluateChildExpr(StatsRulesProcFactory.java:525) at org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$FilterStatsRule.evaluateExpression(StatsRulesProcFactory.java:326) at org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$FilterStatsRule.evaluateChildExpr(StatsRulesProcFactory.java:525) at 
org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$FilterStatsRule.evaluateExpression(StatsRulesProcFactory.java:326) at org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$FilterStatsRule.evaluateChildExpr(StatsRulesProcFactory.java:525) at org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$FilterStatsRule.evaluateExpression(StatsRulesProcFactory.java:326) at org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$FilterStatsRule.evaluateChildExpr(StatsRulesProcFactory.java:525) at org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$FilterStatsRule.evaluateExpression(StatsRulesProcFactory.java:326) at org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$FilterStatsRule.evaluateChildExpr(StatsRulesProcFactory.java:525) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8678) Pig fails to correctly load DATE fields using HCatalog
[ https://issues.apache.org/jira/browse/HIVE-8678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14635822#comment-14635822 ] Michael McLellan commented on HIVE-8678: ORC Pig fails to correctly load DATE fields using HCatalog -- Key: HIVE-8678 URL: https://issues.apache.org/jira/browse/HIVE-8678 Project: Hive Issue Type: Bug Components: HCatalog Affects Versions: 0.13.1 Reporter: Michael McLellan Assignee: Sushanth Sowmyan Using: Hadoop 2.5.0-cdh5.2.0 Pig 0.12.0-cdh5.2.0 Hive 0.13.1-cdh5.2.0 When using pig -useHCatalog to load a Hive table that has a DATE field, when trying to DUMP the field, the following error occurs: {code} 2014-10-30 22:58:05,469 [main] ERROR org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher - org.apache.pig.backend.executionengine.ExecException: ERROR 6018: Error converting read value to tuple at org.apache.hive.hcatalog.pig.HCatBaseLoader.getNext(HCatBaseLoader.java:76) at org.apache.hive.hcatalog.pig.HCatLoader.getNext(HCatLoader.java:58) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:211) at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:553) at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80) at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91) at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144) at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:784) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341) at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614) at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163) Caused by: java.lang.ClassCastException: 
java.lang.String cannot be cast to java.sql.Date at org.apache.hive.hcatalog.pig.PigHCatUtil.extractPigObject(PigHCatUtil.java:420) at org.apache.hive.hcatalog.pig.PigHCatUtil.transformToTuple(PigHCatUtil.java:457) at org.apache.hive.hcatalog.pig.PigHCatUtil.transformToTuple(PigHCatUtil.java:375) at org.apache.hive.hcatalog.pig.HCatBaseLoader.getNext(HCatBaseLoader.java:64) 2014-10-30 22:58:05,469 [main] ERROR org.apache.pig.tools.pigstats.SimplePigStats - ERROR 6018: Error converting read value to tuple {code} It seems to be occurring here: https://github.com/apache/hive/blob/trunk/hcatalog/hcatalog-pig-adapter/src/main/java/org/apache/hive/hcatalog/pig/PigHCatUtil.java#L433 and that it should be: {code}Date d = Date.valueOf(o);{code} instead of {code}Date d = (Date) o;{code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
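The failure and the proposed fix can be shown in isolation. One caveat to the suggestion above: java.sql.Date.valueOf takes a String, so the Object still needs a String cast first. This sketch is an illustration of that point, not the HCatalog code itself:

```java
import java.sql.Date;

class PigHCatDateDemo {
    // The reader hands back the DATE column's value as a java.lang.String,
    // so the blind cast in PigHCatUtil blows up:
    static Date unsafe(Object o) {
        return (Date) o;   // ClassCastException when o is a String
    }

    // Date.valueOf parses "yyyy-[m]m-[d]d", which is exactly what the
    // string holds; keep the cast path for values that already are Dates.
    static Date safe(Object o) {
        return (o instanceof Date) ? (Date) o : Date.valueOf((String) o);
    }
}
```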
[jira] [Commented] (HIVE-11305) LLAP: Hybrid Map-join cache returns invalid data
[ https://issues.apache.org/jira/browse/HIVE-11305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14635781#comment-14635781 ] Sergey Shelukhin commented on HIVE-11305: - looks like mapjoin cache is being reused between queries. There's either a bug in recent patch, or some fundamental issue (i.e. if query ID is duplicated between queries, I am not sure how it ever worked) LLAP: Hybrid Map-join cache returns invalid data - Key: HIVE-11305 URL: https://issues.apache.org/jira/browse/HIVE-11305 Project: Hive Issue Type: Sub-task Affects Versions: llap Environment: TPC-DS 200 scale data Reporter: Gopal V Assignee: Sergey Shelukhin Priority: Critical Fix For: llap Attachments: q55-test.sql Start a 1-node LLAP cluster with 16 executors and run attached test-case on the single node instance. {code} Caused by: java.lang.ClassCastException: org.apache.hadoop.hive.ql.exec.persistence.HybridHashTableContainer cannot be cast to org.apache.hadoop.hive.ql.exec.vector.mapjoin.hashtable.VectorMapJoinTableContainer at org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinCommonOperator.loadHashTable(VectorMapJoinCommonOperator.java:648) at org.apache.hadoop.hive.ql.exec.MapJoinOperator.cleanUpInputFileChangedOp(MapJoinOperator.java:314) at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1104) at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1108) at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1108) at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1108) at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1108) at org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:37) at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:86) ... 17 more {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
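If the cache really is leaking entries across queries, the usual remedy is to scope entries to the query id, so a table container built for one query (e.g. a HybridHashTableContainer) can never be handed to an operator from a different query that expects a different container type. A hypothetical sketch, not the LLAP cache implementation:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative only: entries are keyed by (queryId, key) and are invisible
// to any other query; evicting the query drops all of its entries at once.
class QueryScopedCache<V> {
    private final Map<String, Map<String, V>> byQuery = new ConcurrentHashMap<>();

    V get(String queryId, String key) {
        Map<String, V> m = byQuery.get(queryId);
        return (m == null) ? null : m.get(key);
    }

    void put(String queryId, String key, V value) {
        byQuery.computeIfAbsent(queryId, q -> new ConcurrentHashMap<>()).put(key, value);
    }

    // must be called when the query finishes, or entries accumulate
    void evictQuery(String queryId) {
        byQuery.remove(queryId);
    }
}
```

As the later comments note, this only helps if query ids are actually unique across submissions; with duplicated ids, scoping by id reintroduces the same collision.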
[jira] [Updated] (HIVE-10673) Dynamically partitioned hash join for Tez
[ https://issues.apache.org/jira/browse/HIVE-10673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Dere updated HIVE-10673: -- Release Note: This adds configuration parameter hive.optimize.dynamic.partition.hashjoin, which enables selection of the dynamically partitioned hash join with the Tez execution engine Dynamically partitioned hash join for Tez - Key: HIVE-10673 URL: https://issues.apache.org/jira/browse/HIVE-10673 Project: Hive Issue Type: New Feature Components: Query Planning, Query Processor Reporter: Jason Dere Assignee: Jason Dere Fix For: 1.3.0, 2.0.0 Attachments: HIVE-10673.1.patch, HIVE-10673.10.patch, HIVE-10673.11.patch, HIVE-10673.12, HIVE-10673.2.patch, HIVE-10673.3.patch, HIVE-10673.4.patch, HIVE-10673.5.patch, HIVE-10673.6.patch, HIVE-10673.7.patch, HIVE-10673.8.patch, HIVE-10673.9.patch Some analysis of shuffle join queries by [~mmokhtar]/[~gopalv] found about 2/3 of the CPU was spent during sorting/merging. While this does not work for MR, for other execution engines (such as Tez), it is possible to create a reduce-side join that uses unsorted inputs in order to eliminate the sorting, which may be faster than a shuffle join. To join on unsorted inputs, we can use the hash join algorithm to perform the join in the reducer. This will require the small tables in the join to fit in the reducer/hash table for this to work. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
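The core claim in the description — that unsorted inputs suffice once the reducer joins by hashing — is the classic hash join: load the small side into a hash table, then stream the big side against it, with no sort or merge of either input. A toy illustration (names and row layout are invented for the sketch):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class HashJoinDemo {
    // rows are String[]{key, payload}; small side must fit in memory,
    // which mirrors the constraint stated in the description
    static List<String> hashJoin(List<String[]> small, List<String[]> big) {
        Map<String, List<String[]>> table = new HashMap<>();
        for (String[] row : small) {                       // build phase
            table.computeIfAbsent(row[0], k -> new ArrayList<>()).add(row);
        }
        List<String> out = new ArrayList<>();
        for (String[] row : big) {                         // probe phase, unsorted
            for (String[] match : table.getOrDefault(row[0], List.of())) {
                out.add(row[0] + ":" + match[1] + "," + row[1]);
            }
        }
        return out;
    }
}
```

This is why eliminating the sort can win: the shuffle still partitions rows by key, but the ~2/3 of CPU measured in sorting/merging is replaced by O(1) hash probes.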
[jira] [Commented] (HIVE-11254) Process result sets returned by a stored procedure
[ https://issues.apache.org/jira/browse/HIVE-11254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14635932#comment-14635932 ] Alan Gates commented on HIVE-11254: --- I think changing the pom entry to: {code}
<dependency>
  <groupId>org.apache.hive</groupId>
  <artifactId>hive-jdbc</artifactId>
  <version>${project.version}</version>
  <scope>test</scope>
</dependency>
{code} will do what you want. But if you want to drop the pom.xml changes that's fine too. Other than this I'm +1 on the patch. Process result sets returned by a stored procedure -- Key: HIVE-11254 URL: https://issues.apache.org/jira/browse/HIVE-11254 Project: Hive Issue Type: Improvement Components: hpl/sql Reporter: Dmitry Tolpeko Assignee: Dmitry Tolpeko Attachments: HIVE-11254.1.patch, HIVE-11254.2.patch, HIVE-11254.3.patch, HIVE-11254.4.patch Stored procedure can return one or more result sets. A caller should be able to process them. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11253) Move SearchArgument and VectorizedRowBatch classes to storage-api.
[ https://issues.apache.org/jira/browse/HIVE-11253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14635951#comment-14635951 ] Prasanth Jayachandran commented on HIVE-11253: -- LGTM, +1 Move SearchArgument and VectorizedRowBatch classes to storage-api. -- Key: HIVE-11253 URL: https://issues.apache.org/jira/browse/HIVE-11253 Project: Hive Issue Type: Sub-task Reporter: Owen O'Malley Assignee: Owen O'Malley Fix For: 2.0.0 Attachments: HIVE-11253.patch, HIVE-11253.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11294) Use HBase to cache aggregated stats
[ https://issues.apache.org/jira/browse/HIVE-11294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated HIVE-11294: -- Attachment: HIVE-11294.2.patch New patch that incorporates changes based on Thejas' feedback. Use HBase to cache aggregated stats --- Key: HIVE-11294 URL: https://issues.apache.org/jira/browse/HIVE-11294 Project: Hive Issue Type: Improvement Components: Metastore Affects Versions: hbase-metastore-branch Reporter: Alan Gates Assignee: Alan Gates Fix For: hbase-metastore-branch Attachments: HIVE-11294.2.patch, HIVE-11294.patch Currently stats are cached only in the memory of the client. Given that HBase can easily manage the scale of caching aggregated stats we should be using it to do so. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HIVE-11318) Move ORC table properties from OrcFile to OrcOutputFormat
[ https://issues.apache.org/jira/browse/HIVE-11318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Owen O'Malley resolved HIVE-11318. -- Resolution: Duplicate Move ORC table properties from OrcFile to OrcOutputFormat - Key: HIVE-11318 URL: https://issues.apache.org/jira/browse/HIVE-11318 Project: Hive Issue Type: Sub-task Reporter: Prasanth Jayachandran Fix For: 2.0.0 OrcFile contains TableProperties which can be moved to OrcOutputFormat. Also remove deprecated configs that are no longer used. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11316) Use datastructure that doesnt duplicate any part of string for ASTNode::toStringTree()
[ https://issues.apache.org/jira/browse/HIVE-11316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14635983#comment-14635983 ] Hari Sankar Sivarama Subramaniyan commented on HIVE-11316: -- The failures on patch#3 are unrelated to the changes. Use datastructure that doesnt duplicate any part of string for ASTNode::toStringTree() -- Key: HIVE-11316 URL: https://issues.apache.org/jira/browse/HIVE-11316 Project: Hive Issue Type: Bug Reporter: Hari Sankar Sivarama Subramaniyan Assignee: Hari Sankar Sivarama Subramaniyan Attachments: HIVE-11316.1.patch, HIVE-11316.2.patch, HIVE-11316.3.patch HIVE-11281 uses an approach to memoize toStringTree() for ASTNode. This jira is supposed to alter the string memoization to use a different data structure that doesn't duplicate any part of the string so that we do not run into OOM. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11321) Move OrcFile.OrcTableProperties from OrcFile into OrcConf.
[ https://issues.apache.org/jira/browse/HIVE-11321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14635988#comment-14635988 ] Owen O'Malley commented on HIVE-11321: -- So the path looks like:
* table properties using the orc.* name
* config using the orc.* name
* config using the current hive.exec.ql.orc.* name
* default
Move OrcFile.OrcTableProperties from OrcFile into OrcConf. -- Key: HIVE-11321 URL: https://issues.apache.org/jira/browse/HIVE-11321 Project: Hive Issue Type: Sub-task Reporter: Owen O'Malley Assignee: Owen O'Malley Fix For: 2.0.0 Attachments: HIVE-11321.patch We should pull all of the configuration/table property knobs into a single list. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
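The four-step precedence Owen lists can be sketched as a single resolution function. The property names below (orc.compress and its hive.exec.ql.orc.* counterpart) are illustrative stand-ins, and this is not the OrcConf implementation:

```java
import java.util.Map;
import java.util.Properties;

class OrcConfDemo {
    // Lookup order: table property (orc.*) -> config (orc.*)
    //            -> config (legacy hive.exec.ql.orc.*) -> default.
    static String resolve(String orcName, String legacyName, String dflt,
                          Properties tableProps, Map<String, String> conf) {
        if (tableProps != null && tableProps.getProperty(orcName) != null) {
            return tableProps.getProperty(orcName);
        }
        if (conf.containsKey(orcName)) return conf.get(orcName);
        if (conf.containsKey(legacyName)) return conf.get(legacyName);
        return dflt;
    }
}
```

Table properties winning over the config matches the usual expectation that per-table settings override cluster-wide ones, and keeping the legacy name in the chain preserves old configurations.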
[jira] [Updated] (HIVE-7723) Explain plan for complex query with lots of partitions is slow due to in-efficient collection used to find a matching ReadEntity
[ https://issues.apache.org/jira/browse/HIVE-7723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hari Sankar Sivarama Subramaniyan updated HIVE-7723: Attachment: HIVE-7723.12.patch Explain plan for complex query with lots of partitions is slow due to in-efficient collection used to find a matching ReadEntity Key: HIVE-7723 URL: https://issues.apache.org/jira/browse/HIVE-7723 Project: Hive Issue Type: Bug Components: CLI, Physical Optimizer Affects Versions: 0.13.1 Reporter: Mostafa Mokhtar Assignee: Hari Sankar Sivarama Subramaniyan Attachments: HIVE-7723.1.patch, HIVE-7723.10.patch, HIVE-7723.11.patch, HIVE-7723.11.patch, HIVE-7723.12.patch, HIVE-7723.2.patch, HIVE-7723.3.patch, HIVE-7723.4.patch, HIVE-7723.5.patch, HIVE-7723.6.patch, HIVE-7723.7.patch, HIVE-7723.8.patch, HIVE-7723.9.patch Explain on TPC-DS query 64 took 11 seconds, when the CLI was profiled it showed that ReadEntity.equals is taking ~40% of the CPU. ReadEntity.equals is called from the snippet below. Again and again the set is iterated over to get the actual match, a HashMap is a better option for this case as Set doesn't have a Get method. Also for ReadEntity equals is case-insensitive while hash is case-sensitive, which is an undesired behavior. {code} public static ReadEntity addInput(Set<ReadEntity> inputs, ReadEntity newInput) { // If the input is already present, make sure the new parent is added to the input. 
if (inputs.contains(newInput)) { for (ReadEntity input : inputs) { if (input.equals(newInput)) { if ((newInput.getParents() != null) && (!newInput.getParents().isEmpty())) { input.getParents().addAll(newInput.getParents()); input.setDirect(input.isDirect() || newInput.isDirect()); } return input; } } assert false; } else { inputs.add(newInput); return newInput; } // make compile happy return null; } {code} This is the query used : {code} select cs1.product_name ,cs1.store_name ,cs1.store_zip ,cs1.b_street_number ,cs1.b_streen_name ,cs1.b_city ,cs1.b_zip ,cs1.c_street_number ,cs1.c_street_name ,cs1.c_city ,cs1.c_zip ,cs1.syear ,cs1.cnt ,cs1.s1 ,cs1.s2 ,cs1.s3 ,cs2.s1 ,cs2.s2 ,cs2.s3 ,cs2.syear ,cs2.cnt from (select i_product_name as product_name ,i_item_sk as item_sk ,s_store_name as store_name ,s_zip as store_zip ,ad1.ca_street_number as b_street_number ,ad1.ca_street_name as b_streen_name ,ad1.ca_city as b_city ,ad1.ca_zip as b_zip ,ad2.ca_street_number as c_street_number ,ad2.ca_street_name as c_street_name ,ad2.ca_city as c_city ,ad2.ca_zip as c_zip ,d1.d_year as syear ,d2.d_year as fsyear ,d3.d_year as s2year ,count(*) as cnt ,sum(ss_wholesale_cost) as s1 ,sum(ss_list_price) as s2 ,sum(ss_coupon_amt) as s3 FROM store_sales JOIN store_returns ON store_sales.ss_item_sk = store_returns.sr_item_sk and store_sales.ss_ticket_number = store_returns.sr_ticket_number JOIN customer ON store_sales.ss_customer_sk = customer.c_customer_sk JOIN date_dim d1 ON store_sales.ss_sold_date_sk = d1.d_date_sk JOIN date_dim d2 ON customer.c_first_sales_date_sk = d2.d_date_sk JOIN date_dim d3 ON customer.c_first_shipto_date_sk = d3.d_date_sk JOIN store ON store_sales.ss_store_sk = store.s_store_sk JOIN customer_demographics cd1 ON store_sales.ss_cdemo_sk= cd1.cd_demo_sk JOIN customer_demographics cd2 ON customer.c_current_cdemo_sk = cd2.cd_demo_sk JOIN promotion ON store_sales.ss_promo_sk = promotion.p_promo_sk JOIN household_demographics hd1 ON store_sales.ss_hdemo_sk = hd1.hd_demo_sk 
JOIN household_demographics hd2 ON customer.c_current_hdemo_sk = hd2.hd_demo_sk JOIN customer_address ad1 ON store_sales.ss_addr_sk = ad1.ca_address_sk JOIN customer_address ad2 ON customer.c_current_addr_sk = ad2.ca_address_sk JOIN income_band ib1 ON hd1.hd_income_band_sk = ib1.ib_income_band_sk JOIN income_band ib2 ON hd2.hd_income_band_sk = ib2.ib_income_band_sk JOIN item ON store_sales.ss_item_sk = item.i_item_sk JOIN (select cs_item_sk ,sum(cs_ext_list_price) as sale,sum(cr_refunded_cash+cr_reversed_charge+cr_store_credit) as refund from catalog_sales JOIN catalog_returns ON catalog_sales.cs_item_sk = catalog_returns.cr_item_sk and catalog_sales.cs_order_number = catalog_returns.cr_order_number group by cs_item_sk having sum(cs_ext_list_price) > 2*sum(cr_refunded_cash+cr_reversed_charge+cr_store_credit)) cs_ui ON
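The HashMap-based lookup suggested in the report can be sketched as follows. This is a minimal illustration, not Hive's actual code: `Entity` is a hypothetical stand-in for `ReadEntity` with only the fields needed to show the pattern, and its `hashCode` is made consistent with the case-insensitive `equals` (the mismatch the report also calls out).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified stand-in for Hive's ReadEntity.
class Entity {
    final String name;          // compared case-insensitively, like ReadEntity.equals
    boolean direct;

    Entity(String name, boolean direct) { this.name = name; this.direct = direct; }

    @Override public boolean equals(Object o) {
        return o instanceof Entity && name.equalsIgnoreCase(((Entity) o).name);
    }
    // Keep hashCode consistent with the case-insensitive equals.
    @Override public int hashCode() { return name.toLowerCase().hashCode(); }
}

public class AddInputSketch {
    // O(1) variant of addInput: a Map gives direct lookup of the stored
    // instance, instead of scanning a Set to find the equal element.
    static Entity addInput(Map<Entity, Entity> inputs, Entity newInput) {
        Entity existing = inputs.get(newInput);
        if (existing == null) {
            inputs.put(newInput, newInput);
            return newInput;
        }
        existing.direct = existing.direct || newInput.direct;
        return existing;
    }

    public static void main(String[] args) {
        Map<Entity, Entity> inputs = new HashMap<>();
        Entity a = addInput(inputs, new Entity("db.Tbl", false));
        Entity b = addInput(inputs, new Entity("db.tbl", true)); // same entity, different case
        System.out.println(a == b);        // true: merged into the stored instance
        System.out.println(a.direct);      // true: direct flag propagated
        System.out.println(inputs.size()); // 1
    }
}
```

With a map keyed by the entity itself, each `addInput` call is a single hash lookup rather than a full iteration of the input set.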
[jira] [Commented] (HIVE-11172) Vectorization wrong results for aggregate query with where clause without group by
[ https://issues.apache.org/jira/browse/HIVE-11172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14636001#comment-14636001 ] Mostafa Mokhtar commented on HIVE-11172: [~hsubramaniyan] Is this getting backported? Vectorization wrong results for aggregate query with where clause without group by -- Key: HIVE-11172 URL: https://issues.apache.org/jira/browse/HIVE-11172 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 0.14.0 Reporter: Yi Zhang Assignee: Hari Sankar Sivarama Subramaniyan Priority: Critical Fix For: 2.0.0 Attachments: HIVE-11172.1.patch, HIVE-11172.2.patch, HIVE-11172.3.patch create table testvec(id int, dt int, greg_dt string) stored as orc; insert into table testvec values (1,20150330, '2015-03-30'), (2,20150301, '2015-03-01'), (3,20150502, '2015-05-02'), (4,20150401, '2015-04-01'), (5,20150313, '2015-03-13'), (6,20150314, '2015-03-14'), (7,20150404, '2015-04-04'); hive> select dt, greg_dt from testvec where id=5; OK 20150313 2015-03-13 Time taken: 4.435 seconds, Fetched: 1 row(s) hive> set hive.vectorized.execution.enabled=true; hive> set hive.map.aggr; hive.map.aggr=true hive> select max(dt), max(greg_dt) from testvec where id=5; OK 20150313 2015-03-30 hive> set hive.vectorized.execution.enabled=false; hive> select max(dt), max(greg_dt) from testvec where id=5; OK 20150313 2015-03-13 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11259) LLAP: clean up ORC dependencies part 1
[ https://issues.apache.org/jira/browse/HIVE-11259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14635943#comment-14635943 ] Prasanth Jayachandran commented on HIVE-11259: -- Moving interfaces to common looks good to me. Some comments related to simplifying the code: 1) There are many Chunk classes that inherit from DiskRange: BufferChunk, CacheChunk, ProcCacheChunk, TrackedCacheChunk, UncompressedCacheChunk, DiskRange, DiskRangeList. I am not sure if we need all of them. I see the purpose of BufferChunk, CacheChunk and DiskRange, but the others seem to be overkill. Can you move the flags/variables (isCompressed, isReleased etc.) to CacheChunk? 2) Can DiskRangeList be replaced by a TreeMap, using the offset as the key and the lowerKey()/higherKey() methods for merging in case of any overlap? 3) Replace BooleanRef with Boolean? LLAP: clean up ORC dependencies part 1 -- Key: HIVE-11259 URL: https://issues.apache.org/jira/browse/HIVE-11259 Project: Hive Issue Type: Sub-task Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HIVE-11259.patch Before there's storage handler module, we can clean some things up -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11259) LLAP: clean up ORC dependencies part 1
[ https://issues.apache.org/jira/browse/HIVE-11259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14635948#comment-14635948 ] Sergey Shelukhin commented on HIVE-11259: - 1) These classes existed before, I will see if they can be merged. 2) I don't think removing the intrusive linked list is a good idea; there are problems with java LinkedList - it's hard to keep a pointer to an element, and as soon as the list is modified all the iterators are invalidated. So, for example, reading multiple columns RG by RG, keeping the pointer to the end of the last RG (where the next RG, that may be in a separate buffer due to SARG filtering, starts) becomes impossible - as soon as one column read replaces a buffer chunk with a cache chunk, pointers for all other columns become invalid. TreeMap will have the same problem as far as I can tell. It's really not that complicated to have a linked list, if Java was a real programming language we could even make it an aspect sort of thing via multiple inheritance :) 3) Ok LLAP: clean up ORC dependencies part 1 -- Key: HIVE-11259 URL: https://issues.apache.org/jira/browse/HIVE-11259 Project: Hive Issue Type: Sub-task Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HIVE-11259.patch Before there's storage handler module, we can clean some things up -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (HIVE-11259) LLAP: clean up ORC dependencies part 1
[ https://issues.apache.org/jira/browse/HIVE-11259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14635948#comment-14635948 ] Sergey Shelukhin edited comment on HIVE-11259 at 7/21/15 10:33 PM: --- 1) These classes existed before, I will see if they can be merged. 2) I don't think removing the intrusive linked list is a good idea; there are problems with java LinkedList - it's hard to keep a pointer to an element and modify it in place in general; also, as soon as the list is modified all the iterators are invalidated; so, for example, reading multiple columns RG by RG, keeping the pointer to the end of the last RG (where the next RG, that may be in a separate buffer due to SARG filtering, starts) becomes impossible - as soon as one column read replaces a buffer chunk with a cache chunk, iterators for all other columns become invalid. TreeMap will have the same problem as far as I can tell. It's really not that complicated to have a linked list, if Java was a real programming language we could even make it an aspect sort of thing via multiple inheritance :) 3) Ok was (Author: sershe): 1) These classes existed before, I will see if they can be merged. 2) I don't think removing the intrusive linked list is a good idea; there are problems with java LinkedList - it's hard to keep a pointer to an element, and as soon as the list is modified all the iterators are invalidated. So, for example, reading multiple columns RG by RG, keeping the pointer to the end of the last RG (where the next RG, that may be in a separate buffer due to SARG filtering, starts) becomes impossible - as soon as one column read replaces a buffer chunk with a cache chunk, pointers for all other columns become invalid. TreeMap will have the same problem as far as I can tell.
It's really not that complicated to have a linked list, if Java was a real programming language we could even make it an aspect sort of thing via multiple inheritance :) 3) Ok LLAP: clean up ORC dependencies part 1 -- Key: HIVE-11259 URL: https://issues.apache.org/jira/browse/HIVE-11259 Project: Hive Issue Type: Sub-task Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HIVE-11259.patch Before there's storage handler module, we can clean some things up -- This message was sent by Atlassian JIRA (v6.3.4#6332)
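The intrusive-list argument above can be sketched in a few lines. This is a simplified stand-in for DiskRangeList, not Hive's actual class: each element owns its prev/next links, so a reference held to one node stays valid when a neighbouring node is replaced in place, which is exactly what breaks with a `java.util.LinkedList` iterator.

```java
// Minimal intrusive doubly-linked node, in the spirit of DiskRangeList:
// each element carries its own prev/next pointers, so replacing one node
// does not invalidate references held to other nodes.
class Node {
    final String tag;
    Node prev, next;

    Node(String tag) { this.tag = tag; }

    // Splice 'replacement' into this node's position in the chain.
    Node replaceWith(Node replacement) {
        replacement.prev = prev;
        replacement.next = next;
        if (prev != null) prev.next = replacement;
        if (next != null) next.prev = replacement;
        prev = next = null; // detach the old node
        return replacement;
    }
}

public class IntrusiveListSketch {
    public static void main(String[] args) {
        Node a = new Node("buffer-1");
        Node b = new Node("buffer-2");
        Node c = new Node("buffer-3");
        a.next = b; b.prev = a; b.next = c; c.prev = b;

        // One "column read" replaces b with a cached chunk...
        b.replaceWith(new Node("cache-2"));

        // ...and pointers held by other readers (a, c) are still valid:
        System.out.println(c.prev.tag); // cache-2
        System.out.println(a.next.tag); // cache-2
    }
}
```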
[jira] [Comment Edited] (HIVE-11253) Move SearchArgument and VectorizedRowBatch classes to storage-api.
[ https://issues.apache.org/jira/browse/HIVE-11253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14635951#comment-14635951 ] Prasanth Jayachandran edited comment on HIVE-11253 at 7/21/15 10:35 PM: LGTM, +1. Pending tests. was (Author: prasanth_j): LGTM, +1 Move SearchArgument and VectorizedRowBatch classes to storage-api. -- Key: HIVE-11253 URL: https://issues.apache.org/jira/browse/HIVE-11253 Project: Hive Issue Type: Sub-task Reporter: Owen O'Malley Assignee: Owen O'Malley Fix For: 2.0.0 Attachments: HIVE-11253.patch, HIVE-11253.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11321) Move OrcFile.OrcTableProperties from OrcFile into OrcConf.
[ https://issues.apache.org/jira/browse/HIVE-11321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Owen O'Malley updated HIVE-11321: - Attachment: HIVE-11321.patch This patch pulls all of the configuration knobs into OrcConf including all of the table properties. Move OrcFile.OrcTableProperties from OrcFile into OrcConf. -- Key: HIVE-11321 URL: https://issues.apache.org/jira/browse/HIVE-11321 Project: Hive Issue Type: Sub-task Reporter: Owen O'Malley Assignee: Owen O'Malley Fix For: 2.0.0 Attachments: HIVE-11321.patch We should pull all of the configuration/table property knobs into a single list. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11321) Move OrcFile.OrcTableProperties from OrcFile into OrcConf.
[ https://issues.apache.org/jira/browse/HIVE-11321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Owen O'Malley updated HIVE-11321: - Attachment: (was: HIVE-11321.patch) Move OrcFile.OrcTableProperties from OrcFile into OrcConf. -- Key: HIVE-11321 URL: https://issues.apache.org/jira/browse/HIVE-11321 Project: Hive Issue Type: Sub-task Reporter: Owen O'Malley Assignee: Owen O'Malley Fix For: 2.0.0 We should pull all of the configuration/table property knobs into a single list. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11321) Move OrcFile.OrcTableProperties from OrcFile into OrcConf.
[ https://issues.apache.org/jira/browse/HIVE-11321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Owen O'Malley updated HIVE-11321: - Attachment: HIVE-11321.patch Sorry, I uploaded the wrong version of the patch. Move OrcFile.OrcTableProperties from OrcFile into OrcConf. -- Key: HIVE-11321 URL: https://issues.apache.org/jira/browse/HIVE-11321 Project: Hive Issue Type: Sub-task Reporter: Owen O'Malley Assignee: Owen O'Malley Fix For: 2.0.0 Attachments: HIVE-11321.patch We should pull all of the configuration/table property knobs into a single list. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11305) LLAP: Hybrid Map-join cache returns invalid data
[ https://issues.apache.org/jira/browse/HIVE-11305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-11305: Attachment: HIVE-11305.patch LLAP: Hybrid Map-join cache returns invalid data - Key: HIVE-11305 URL: https://issues.apache.org/jira/browse/HIVE-11305 Project: Hive Issue Type: Sub-task Affects Versions: llap Environment: TPC-DS 200 scale data Reporter: Gopal V Assignee: Sergey Shelukhin Priority: Critical Fix For: llap Attachments: HIVE-11305.patch, q55-test.sql Start a 1-node LLAP cluster with 16 executors and run attached test-case on the single node instance. {code} Caused by: java.lang.ClassCastException: org.apache.hadoop.hive.ql.exec.persistence.HybridHashTableContainer cannot be cast to org.apache.hadoop.hive.ql.exec.vector.mapjoin.hashtable.VectorMapJoinTableContainer at org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinCommonOperator.loadHashTable(VectorMapJoinCommonOperator.java:648) at org.apache.hadoop.hive.ql.exec.MapJoinOperator.cleanUpInputFileChangedOp(MapJoinOperator.java:314) at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1104) at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1108) at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1108) at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1108) at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1108) at org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:37) at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:86) ... 17 more {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11316) Use datastructure that doesnt duplicate any part of string for ASTNode::toStringTree()
[ https://issues.apache.org/jira/browse/HIVE-11316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14635982#comment-14635982 ] Hive QA commented on HIVE-11316: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12746398/HIVE-11316.3.patch {color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 9245 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_schemeAuthority org.apache.hive.hcatalog.streaming.TestStreaming.testTransactionBatchEmptyCommit {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4685/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4685/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4685/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 2 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12746398 - PreCommit-HIVE-TRUNK-Build Use datastructure that doesnt duplicate any part of string for ASTNode::toStringTree() -- Key: HIVE-11316 URL: https://issues.apache.org/jira/browse/HIVE-11316 Project: Hive Issue Type: Bug Reporter: Hari Sankar Sivarama Subramaniyan Assignee: Hari Sankar Sivarama Subramaniyan Attachments: HIVE-11316.1.patch, HIVE-11316.2.patch, HIVE-11316.3.patch HIVE-11281 uses an approach to memoize toStringTree() for ASTNode. This jira is suppose to alter the string memoization to use a different data structure that doesn't duplicate any part of the string so that we do not run into OOM. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
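The general idea behind HIVE-11316 - memoizing a tree's string form without duplicating substrings at every level - can be illustrated with a small sketch. This is not the actual patch; `Ast` is a hypothetical stand-in for ASTNode, and the point is that all nodes append into one shared StringBuilder instead of each node materializing (and re-copying) its subtree's string.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simplified AST node, used only to show the buffer-sharing pattern.
class Ast {
    final String token;
    final List<Ast> children = new ArrayList<>();

    Ast(String token) { this.token = token; }
    Ast add(Ast child) { children.add(child); return this; }

    // One buffer for the whole tree: O(total output) characters written,
    // versus O(depth * output) if each level concatenated its children's strings.
    void appendTo(StringBuilder sb) {
        if (children.isEmpty()) { sb.append(token); return; }
        sb.append('(').append(token);
        for (Ast c : children) { sb.append(' '); c.appendTo(sb); }
        sb.append(')');
    }
}

public class ToStringTreeSketch {
    public static void main(String[] args) {
        Ast root = new Ast("select").add(new Ast("col")).add(new Ast("tbl"));
        StringBuilder sb = new StringBuilder();
        root.appendTo(sb);
        System.out.println(sb); // (select col tbl)
    }
}
```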
[jira] [Updated] (HIVE-11332) Unicode table comments do not work
[ https://issues.apache.org/jira/browse/HIVE-11332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-11332: Description: Noticed by accident. {noformat} select ' ', count(*) from moo; Query ID = sershe_20150721190413_979e1b6f-86d6-436f-b8e6-d6785b9d3b83 Total jobs = 1 Launching Job 1 out of 1 [snip] OK 0 Time taken: 13.347 seconds, Fetched: 1 row(s) hive> ALTER TABLE moo SET TBLPROPERTIES ('comment' = ' '); OK Time taken: 0.292 seconds hive> desc extended moo; OK i int Detailed Table Information Table(tableName:moo, dbName:default, owner:sershe, createTime:1437519787, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:i, type:int, comment:null)], location:hdfs://cn108-10.l42scl.hortonworks.com:8020/apps/hive/warehouse/moo, inputFormat:org.apache.hadoop.mapred.TextInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, parameters:{serialization.format=1}), bucketCols:[], sortCols:[], parameters:{}, skewedInfo:SkewedInfo(skewedColNames:[], skewedColValues:[], skewedColValueLocationMaps:{}), storedAsSubDirectories:false), partitionKeys:[], parameters:{last_modified_time=1437519883, totalSize=0, numRows=-1, rawDataSize=-1, COLUMN_STATS_ACCURATE=false, numFiles=0, transient_lastDdlTime=1437519883, comment=?? , last_modified_by=sershe}, viewOriginalText:null, viewExpandedText:null, tableType:MANAGED_TABLE) Time taken: 0.347 seconds, Fetched: 3 row(s) {noformat} was: Noticed by accident.
{noformat} select ' ', count(*) from moo; Query ID = sershe_20150721190413_979e1b6f-86d6-436f-b8e6-d6785b9d3b83 Total jobs = 1 Launching Job 1 out of 1 [snip] OK 0 Time taken: 13.347 seconds, Fetched: 1 row(s) hive> ALTER TABLE moo SET TBLPROPERTIES ('comment' = ' '); OK Time taken: 0.292 seconds hive> desc extended moo; OK i int Detailed Table Information Table(tableName:moo, dbName:default, owner:sershe, createTime:1437519787, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:i, type:int, comment:null)], location:hdfs://cn108-10.l42scl.hortonworks.com:8020/apps/hive/warehouse/moo, inputFormat:org.apache.hadoop.mapred.TextInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, parameters:{serialization.format=1}), bucketCols:[], sortCols:[], parameters:{}, skewedInfo:SkewedInfo(skewedColNames:[], skewedColValues:[], skewedColValueLocationMaps:{}), storedAsSubDirectories:false), partitionKeys:[], parameters:{last_modified_time=1437519883, totalSize=0, numRows=-1, rawDataSize=-1, COLUMN_STATS_ACCURATE=false, numFiles=0, transient_lastDdlTime=1437519883, comment=?? , last_modified_by=sershe}, viewOriginalText:null, viewExpandedText:null, tableType:MANAGED_TABLE) Time taken: 0.347 seconds, Fetched: 3 row(s) {noformat} Unicode table comments do not work -- Key: HIVE-11332 URL: https://issues.apache.org/jira/browse/HIVE-11332 Project: Hive Issue Type: Bug Reporter: Sergey Shelukhin Noticed by accident. 
{noformat} select ' ', count(*) from moo; Query ID = sershe_20150721190413_979e1b6f-86d6-436f-b8e6-d6785b9d3b83 Total jobs = 1 Launching Job 1 out of 1 [snip] OK 0 Time taken: 13.347 seconds, Fetched: 1 row(s) hive> ALTER TABLE moo SET TBLPROPERTIES ('comment' = ' '); OK Time taken: 0.292 seconds hive> desc extended moo; OK i int Detailed Table Information Table(tableName:moo, dbName:default, owner:sershe, createTime:1437519787, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:i, type:int, comment:null)], location:hdfs://cn108-10.l42scl.hortonworks.com:8020/apps/hive/warehouse/moo, inputFormat:org.apache.hadoop.mapred.TextInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, parameters:{serialization.format=1}), bucketCols:[], sortCols:[], parameters:{}, skewedInfo:SkewedInfo(skewedColNames:[], skewedColValues:[], skewedColValueLocationMaps:{}), storedAsSubDirectories:false), partitionKeys:[], parameters:{last_modified_time=1437519883, totalSize=0, numRows=-1, rawDataSize=-1, COLUMN_STATS_ACCURATE=false, numFiles=0,
[jira] [Commented] (HIVE-11172) Vectorization wrong results for aggregate query with where clause without group by
[ https://issues.apache.org/jira/browse/HIVE-11172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14636005#comment-14636005 ] Hari Sankar Sivarama Subramaniyan commented on HIVE-11172: -- Yes, this will be a good candidate for backporting to 1.2.1. [~sushanth], what do you think? Thanks, Hari Vectorization wrong results for aggregate query with where clause without group by -- Key: HIVE-11172 URL: https://issues.apache.org/jira/browse/HIVE-11172 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 0.14.0 Reporter: Yi Zhang Assignee: Hari Sankar Sivarama Subramaniyan Priority: Critical Fix For: 2.0.0 Attachments: HIVE-11172.1.patch, HIVE-11172.2.patch, HIVE-11172.3.patch create table testvec(id int, dt int, greg_dt string) stored as orc; insert into table testvec values (1,20150330, '2015-03-30'), (2,20150301, '2015-03-01'), (3,20150502, '2015-05-02'), (4,20150401, '2015-04-01'), (5,20150313, '2015-03-13'), (6,20150314, '2015-03-14'), (7,20150404, '2015-04-04'); hive> select dt, greg_dt from testvec where id=5; OK 20150313 2015-03-13 Time taken: 4.435 seconds, Fetched: 1 row(s) hive> set hive.vectorized.execution.enabled=true; hive> set hive.map.aggr; hive.map.aggr=true hive> select max(dt), max(greg_dt) from testvec where id=5; OK 20150313 2015-03-30 hive> set hive.vectorized.execution.enabled=false; hive> select max(dt), max(greg_dt) from testvec where id=5; OK 20150313 2015-03-13 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11172) Vectorization wrong results for aggregate query with where clause without group by
[ https://issues.apache.org/jira/browse/HIVE-11172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14636010#comment-14636010 ] Sushanth Sowmyan commented on HIVE-11172: - Incorrect results make it a good candidate for a backport to branch-1.2. Pedantic note: 1.2.1 has already shipped. This would go in 1.2.2; please set the fix version appropriately after committing. Vectorization wrong results for aggregate query with where clause without group by -- Key: HIVE-11172 URL: https://issues.apache.org/jira/browse/HIVE-11172 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 0.14.0 Reporter: Yi Zhang Assignee: Hari Sankar Sivarama Subramaniyan Priority: Critical Fix For: 2.0.0 Attachments: HIVE-11172.1.patch, HIVE-11172.2.patch, HIVE-11172.3.patch create table testvec(id int, dt int, greg_dt string) stored as orc; insert into table testvec values (1,20150330, '2015-03-30'), (2,20150301, '2015-03-01'), (3,20150502, '2015-05-02'), (4,20150401, '2015-04-01'), (5,20150313, '2015-03-13'), (6,20150314, '2015-03-14'), (7,20150404, '2015-04-04'); hive> select dt, greg_dt from testvec where id=5; OK 20150313 2015-03-13 Time taken: 4.435 seconds, Fetched: 1 row(s) hive> set hive.vectorized.execution.enabled=true; hive> set hive.map.aggr; hive.map.aggr=true hive> select max(dt), max(greg_dt) from testvec where id=5; OK 20150313 2015-03-30 hive> set hive.vectorized.execution.enabled=false; hive> select max(dt), max(greg_dt) from testvec where id=5; OK 20150313 2015-03-13 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11172) Vectorization wrong results for aggregate query with where clause without group by
[ https://issues.apache.org/jira/browse/HIVE-11172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sushanth Sowmyan updated HIVE-11172: Fix Version/s: 1.3.0 Vectorization wrong results for aggregate query with where clause without group by -- Key: HIVE-11172 URL: https://issues.apache.org/jira/browse/HIVE-11172 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 0.14.0 Reporter: Yi Zhang Assignee: Hari Sankar Sivarama Subramaniyan Priority: Critical Fix For: 1.3.0, 2.0.0 Attachments: HIVE-11172.1.patch, HIVE-11172.2.patch, HIVE-11172.3.patch create table testvec(id int, dt int, greg_dt string) stored as orc; insert into table testvec values (1,20150330, '2015-03-30'), (2,20150301, '2015-03-01'), (3,20150502, '2015-05-02'), (4,20150401, '2015-04-01'), (5,20150313, '2015-03-13'), (6,20150314, '2015-03-14'), (7,20150404, '2015-04-04'); hive> select dt, greg_dt from testvec where id=5; OK 20150313 2015-03-13 Time taken: 4.435 seconds, Fetched: 1 row(s) hive> set hive.vectorized.execution.enabled=true; hive> set hive.map.aggr; hive.map.aggr=true hive> select max(dt), max(greg_dt) from testvec where id=5; OK 20150313 2015-03-30 hive> set hive.vectorized.execution.enabled=false; hive> select max(dt), max(greg_dt) from testvec where id=5; OK 20150313 2015-03-13 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11209) Clean up dependencies in HiveDecimalWritable
[ https://issues.apache.org/jira/browse/HIVE-11209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14636017#comment-14636017 ] Prasanth Jayachandran commented on HIVE-11209: -- The Vint object is no longer reusable with this patch. Vint allocation in an inner loop will hit performance, right? Clean up dependencies in HiveDecimalWritable Key: HIVE-11209 URL: https://issues.apache.org/jira/browse/HIVE-11209 Project: Hive Issue Type: Sub-task Reporter: Owen O'Malley Assignee: Owen O'Malley Fix For: 2.0.0 Attachments: HIVE-11209.patch, HIVE-11209.patch, HIVE-11209.patch, HIVE-11209.patch Currently HiveDecimalWritable depends on: * org.apache.hadoop.hive.serde2.ByteStream * org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryUtils * org.apache.hadoop.hive.serde2.typeinfo.HiveDecimalUtils since we need HiveDecimalWritable for the decimal VectorizedColumnBatch, breaking these dependencies will improve things. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
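The scratch-object reuse pattern that review comment is defending can be sketched as follows. This is an illustration only: `ScratchVInt` is a hypothetical, toy stand-in for the variable-length-int holder (in Hive, `LazyBinaryUtils.VInt`), and its decoding is deliberately trivial; the point is that one mutable holder is allocated once and refilled per row, instead of allocating a new object on every loop iteration.

```java
// A mutable holder, allocated once and refilled per row.
class ScratchVInt {
    int value;
    int length;

    // Decode a value from buf at offset into this holder (toy single-byte decoding).
    void readFrom(byte[] buf, int offset) {
        value = buf[offset];
        length = 1;
    }
}

public class VIntReuseSketch {
    public static void main(String[] args) {
        byte[] rows = {1, 2, 3, 4};
        ScratchVInt scratch = new ScratchVInt(); // one allocation, reused for every row
        int sum = 0;
        for (int off = 0; off < rows.length; off += scratch.length) {
            scratch.readFrom(rows, off); // refill in place: no garbage in the inner loop
            sum += scratch.value;
        }
        System.out.println(sum); // 10
    }
}
```

Losing this reuse means one short-lived allocation per decoded value, which is the GC pressure the comment is asking about.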
[jira] [Commented] (HIVE-11152) Swapping join inputs in ASTConverter
[ https://issues.apache.org/jira/browse/HIVE-11152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14636039#comment-14636039 ] Jason Dere commented on HIVE-11152: --- FYI, this was listed as being fixed on 1.2.2, but I do not see any such commit on either branch-1 or branch-1.2. Swapping join inputs in ASTConverter Key: HIVE-11152 URL: https://issues.apache.org/jira/browse/HIVE-11152 Project: Hive Issue Type: Bug Components: CBO Reporter: Jesus Camacho Rodriguez Assignee: Jesus Camacho Rodriguez Fix For: 2.0.0 Attachments: HIVE-11152.02.patch, HIVE-11152.patch We want that multijoin optimization in SemanticAnalyzer always kicks in when CBO is enabled (if possible). For that, we may need to swap the join inputs when we return from CBO through the Hive AST. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11316) Use datastructure that doesnt duplicate any part of string for ASTNode::toStringTree()
[ https://issues.apache.org/jira/browse/HIVE-11316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hari Sankar Sivarama Subramaniyan updated HIVE-11316: - Attachment: HIVE-11316.2.patch Use datastructure that doesnt duplicate any part of string for ASTNode::toStringTree() -- Key: HIVE-11316 URL: https://issues.apache.org/jira/browse/HIVE-11316 Project: Hive Issue Type: Bug Reporter: Hari Sankar Sivarama Subramaniyan Assignee: Hari Sankar Sivarama Subramaniyan Attachments: HIVE-11316.1.patch, HIVE-11316.2.patch HIVE-11281 uses an approach to memoize toStringTree() for ASTNode. This jira is suppose to alter the string memoization to use a different data structure that doesn't duplicate any part of the string so that we do not run into OOM. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11055) HPL/SQL - Implementing Procedural SQL in Hive (PL/HQL Contribution)
[ https://issues.apache.org/jira/browse/HIVE-11055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14634593#comment-14634593 ] wangchangchun commented on HIVE-11055: -- Hello, I downloaded the latest Hive source code on July 17. I want to test the PL/HQL function. I built the Hive package, and beeline is OK now. But hplsql cannot be used. The ERROR is like this: ./hplsql -e select * from hive_tables Unhandled exception in PL/HQL java.lang.Exception: Unknown connection profile: null at org.apache.hive.hplsql.Conn.getConnection(Conn.java:127) at org.apache.hive.hplsql.Conn.executeQuery(Conn.java:55) at org.apache.hive.hplsql.Exec.executeQuery(Exec.java:412) at org.apache.hive.hplsql.Exec.executeQuery(Exec.java:421) at org.apache.hive.hplsql.Select.select(Select.java:73) And I cannot find hplsql-site.xml in the package. Can you tell me where the problem is? HPL/SQL - Implementing Procedural SQL in Hive (PL/HQL Contribution) --- Key: HIVE-11055 URL: https://issues.apache.org/jira/browse/HIVE-11055 Project: Hive Issue Type: Improvement Reporter: Dmitry Tolpeko Assignee: Dmitry Tolpeko Fix For: 2.0.0 Attachments: HIVE-11055.1.patch, HIVE-11055.2.patch, HIVE-11055.3.patch, HIVE-11055.4.patch There is PL/HQL tool (www.plhql.org) that implements procedural SQL for Hive (actually any SQL-on-Hadoop implementation and any JDBC source). Alan Gates offered to contribute it to Hive under HPL/SQL name (org.apache.hive.hplsql package). This JIRA is to create a patch to contribute the PL/HQL code. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11304) Migrate to Log4j2 from Log4j 1.x
[ https://issues.apache.org/jira/browse/HIVE-11304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-11304: - Attachment: HIVE-11304.patch This is WIP patch to trigger test run with new log4j2 properties. Migrate to Log4j2 from Log4j 1.x Key: HIVE-11304 URL: https://issues.apache.org/jira/browse/HIVE-11304 Project: Hive Issue Type: Improvement Affects Versions: 2.0.0 Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran Attachments: HIVE-11304.patch Log4J2 has some great benefits and can benefit hive significantly. Some notable features include 1) Performance (parametrized logging, performance when logging is disabled etc.) More details can be found here https://logging.apache.org/log4j/2.x/performance.html 2) RoutingAppender - Route logs to different log files based on MDC context (useful for HS2, LLAP etc.) 3) Asynchronous logging This is an umbrella jira to track changes related to Log4j2 migration. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
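The "parametrized logging" benefit the HIVE-11304 description cites is about not paying for message construction when the level is disabled. Log4j2's `{}` placeholder style is not in the JDK, so this sketch illustrates the same idea with `java.util.logging`, which offers a `Supplier`-based overload that defers evaluation entirely; the `expensive()` helper is a hypothetical stand-in for any costly argument.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class ParamLoggingSketch {
    private static final Logger LOG = Logger.getLogger("demo");

    // Stand-in for an expensive value that should not be computed when logging is off.
    static int calls = 0;
    static String expensive() { calls++; return "details"; }

    public static void main(String[] args) {
        LOG.setLevel(Level.INFO); // FINE messages are disabled

        // Plain string concatenation always evaluates its arguments,
        // so an explicit level guard is needed:
        if (LOG.isLoggable(Level.FINE)) {
            LOG.fine("state=" + expensive());
        }

        // Deferred form: the Supplier is only invoked if FINE is enabled
        // (log4j2's "{}" parameters and lambda methods serve the same purpose).
        LOG.log(Level.FINE, () -> "state=" + expensive());

        System.out.println(calls); // 0: expensive() never ran
    }
}
```

With log4j2 the equivalent call is `logger.debug("state={}", value)`, which skips message formatting when debug is disabled without any guard at the call site.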
[jira] [Updated] (HIVE-11303) Getting Tez LimitExceededException after dag execution on large query
[ https://issues.apache.org/jira/browse/HIVE-11303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gopal V updated HIVE-11303: --- Affects Version/s: 2.0.0 Getting Tez LimitExceededException after dag execution on large query - Key: HIVE-11303 URL: https://issues.apache.org/jira/browse/HIVE-11303 Project: Hive Issue Type: Bug Components: Tez Affects Versions: 1.2.0, 1.3.0, 2.0.0 Reporter: Jason Dere Assignee: Jason Dere Attachments: HIVE-11303.1.patch {noformat} 2015-07-17 18:18:11,830 INFO [main]: counters.Limits (Limits.java:ensureInitialized(59)) - Counter limits initialized with parameters: GROUP_NAME_MAX=256, MAX_GROUPS=500, COUNTER_NAME_MAX=64, MAX_COUNTERS=1200 2015-07-17 18:18:11,841 ERROR [main]: exec.Task (TezTask.java:execute(189)) - Failed to execute tez graph. org.apache.tez.common.counters.LimitExceededException: Too many counters: 1201 max=1200 at org.apache.tez.common.counters.Limits.checkCounters(Limits.java:87) at org.apache.tez.common.counters.Limits.incrCounters(Limits.java:94) at org.apache.tez.common.counters.AbstractCounterGroup.addCounter(AbstractCounterGroup.java:76) at org.apache.tez.common.counters.AbstractCounterGroup.addCounterImpl(AbstractCounterGroup.java:93) at org.apache.tez.common.counters.AbstractCounterGroup.findCounter(AbstractCounterGroup.java:104) at org.apache.tez.dag.api.DagTypeConverters.convertTezCountersFromProto(DagTypeConverters.java:567) at org.apache.tez.dag.api.client.DAGStatus.getDAGCounters(DAGStatus.java:148) at org.apache.hadoop.hive.ql.exec.tez.TezTask.execute(TezTask.java:175) at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160) at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:89) at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1673) at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1432) at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1213) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1064) at 
org.apache.hadoop.hive.ql.Driver.run(Driver.java:1054) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:213) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:165) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:376) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:311) at org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:409) at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:425) at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:714) at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:621) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:497) at org.apache.hadoop.util.RunJar.run(RunJar.java:221) at org.apache.hadoop.util.RunJar.main(RunJar.java:136) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11055) HPL/SQL - Implementing Procedural SQL in Hive (PL/HQL Contribution)
[ https://issues.apache.org/jira/browse/HIVE-11055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14634905#comment-14634905 ] Dmitry Tolpeko commented on HIVE-11055: --- Note that the hive/hplsql/src/main/resources/hplsql-site.xml file appears after you apply the HIVE-11254 patch. HPL/SQL - Implementing Procedural SQL in Hive (PL/HQL Contribution) --- Key: HIVE-11055 URL: https://issues.apache.org/jira/browse/HIVE-11055 Project: Hive Issue Type: Improvement Reporter: Dmitry Tolpeko Assignee: Dmitry Tolpeko Fix For: 2.0.0 Attachments: HIVE-11055.1.patch, HIVE-11055.2.patch, HIVE-11055.3.patch, HIVE-11055.4.patch There is PL/HQL tool (www.plhql.org) that implements procedural SQL for Hive (actually any SQL-on-Hadoop implementation and any JDBC source). Alan Gates offered to contribute it to Hive under HPL/SQL name (org.apache.hive.hplsql package). This JIRA is to create a patch to contribute the PL/HQL code. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11303) Getting Tez LimitExceededException after dag execution on large query
[ https://issues.apache.org/jira/browse/HIVE-11303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14634624#comment-14634624 ] Hive QA commented on HIVE-11303: {color:green}Overall{color}: +1 all checks pass Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12746194/HIVE-11303.1.patch {color:green}SUCCESS:{color} +1 9228 tests passed Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4675/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4675/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4675/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase {noformat} This message is automatically generated. ATTACHMENT ID: 12746194 - PreCommit-HIVE-TRUNK-Build Getting Tez LimitExceededException after dag execution on large query - Key: HIVE-11303 URL: https://issues.apache.org/jira/browse/HIVE-11303 Project: Hive Issue Type: Bug Components: Tez Reporter: Jason Dere Assignee: Jason Dere Attachments: HIVE-11303.1.patch {noformat} 2015-07-17 18:18:11,830 INFO [main]: counters.Limits (Limits.java:ensureInitialized(59)) - Counter limits initialized with parameters: GROUP_NAME_MAX=256, MAX_GROUPS=500, COUNTER_NAME_MAX=64, MAX_COUNTERS=1200 2015-07-17 18:18:11,841 ERROR [main]: exec.Task (TezTask.java:execute(189)) - Failed to execute tez graph. 
org.apache.tez.common.counters.LimitExceededException: Too many counters: 1201 max=1200 at org.apache.tez.common.counters.Limits.checkCounters(Limits.java:87) at org.apache.tez.common.counters.Limits.incrCounters(Limits.java:94) at org.apache.tez.common.counters.AbstractCounterGroup.addCounter(AbstractCounterGroup.java:76) at org.apache.tez.common.counters.AbstractCounterGroup.addCounterImpl(AbstractCounterGroup.java:93) at org.apache.tez.common.counters.AbstractCounterGroup.findCounter(AbstractCounterGroup.java:104) at org.apache.tez.dag.api.DagTypeConverters.convertTezCountersFromProto(DagTypeConverters.java:567) at org.apache.tez.dag.api.client.DAGStatus.getDAGCounters(DAGStatus.java:148) at org.apache.hadoop.hive.ql.exec.tez.TezTask.execute(TezTask.java:175) at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160) at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:89) at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1673) at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1432) at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1213) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1064) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1054) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:213) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:165) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:376) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:311) at org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:409) at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:425) at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:714) at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:621) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:497) at org.apache.hadoop.util.RunJar.run(RunJar.java:221) at org.apache.hadoop.util.RunJar.main(RunJar.java:136) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11055) HPL/SQL - Implementing Procedural SQL in Hive (PL/HQL Contribution)
[ https://issues.apache.org/jira/browse/HIVE-11055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14634988#comment-14634988 ] wangchangchun commented on HIVE-11055: -- Sorry, I cannot find it in hive/hplsql/src/main/resources HPL/SQL - Implementing Procedural SQL in Hive (PL/HQL Contribution) --- Key: HIVE-11055 URL: https://issues.apache.org/jira/browse/HIVE-11055 Project: Hive Issue Type: Improvement Reporter: Dmitry Tolpeko Assignee: Dmitry Tolpeko Fix For: 2.0.0 Attachments: HIVE-11055.1.patch, HIVE-11055.2.patch, HIVE-11055.3.patch, HIVE-11055.4.patch There is PL/HQL tool (www.plhql.org) that implements procedural SQL for Hive (actually any SQL-on-Hadoop implementation and any JDBC source). Alan Gates offered to contribute it to Hive under HPL/SQL name (org.apache.hive.hplsql package). This JIRA is to create a patch to contribute the PL/HQL code. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11304) Migrate to Log4j2 from Log4j 1.x
[ https://issues.apache.org/jira/browse/HIVE-11304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14634997#comment-14634997 ] Hive QA commented on HIVE-11304: {color:red}Overall{color}: -1 at least one test failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12746277/HIVE-11304.patch {color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 9231 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestContribNegativeCliDriver.testNegativeCliDriver_case_with_row_sequence org.apache.hive.hplsql.TestHplsqlLocal.testException org.apache.hive.service.cli.operation.TestOperationLoggingAPIWithMr.testFetchResultsOfLogWithVerboseMode org.apache.hive.service.cli.operation.TestOperationLoggingAPIWithTez.testFetchResultsOfLogWithVerboseMode {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4678/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4678/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4678/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 4 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12746277 - PreCommit-HIVE-TRUNK-Build Migrate to Log4j2 from Log4j 1.x Key: HIVE-11304 URL: https://issues.apache.org/jira/browse/HIVE-11304 Project: Hive Issue Type: Improvement Affects Versions: 2.0.0 Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran Attachments: HIVE-11304.patch Log4j2 has some great features and can benefit Hive significantly. Some notable features include 1) Performance (parametrized logging, performance when logging is disabled etc.) 
More details can be found here https://logging.apache.org/log4j/2.x/performance.html 2) RoutingAppender - Route logs to different log files based on MDC context (useful for HS2, LLAP etc.) 3) Asynchronous logging This is an umbrella jira to track changes related to Log4j2 migration. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
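The "performance when logging is disabled" point above comes down to deferring message construction behind the level check. A minimal self-contained sketch of that idea, with no dependency on the Log4j2 jars (the class and method names here are illustrative, not the Log4j2 API):

```java
// Illustrates why parameterized logging is cheap when a level is disabled:
// the message is only assembled if the level guard passes. This mimics the
// shape of logger.debug("rows={}", n) without using Log4j2 itself.
public class ParamLogDemo {
    public static boolean debugEnabled = false;
    public static int formatCalls = 0; // counts how often we actually format

    // Stands in for an expensive message-formatting step.
    public static String expensiveFormat(Object arg) {
        formatCalls++;
        return "rows=" + arg;
    }

    // Parameterized style: formatting is deferred behind the level check,
    // so a disabled level costs only a boolean test.
    public static void debug(String template, Object arg) {
        if (debugEnabled) {
            System.out.println(expensiveFormat(arg));
        }
    }

    public static void main(String[] args) {
        debug("rows={}", 42);            // level off: no formatting happens
        System.out.println(formatCalls); // prints 0
    }
}
```

The contrast is with eager string concatenation (`log.debug("rows=" + n)`), which pays the formatting cost even when debug logging is off.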
[jira] [Updated] (HIVE-11055) HPL/SQL - Implementing Procedural SQL in Hive (PL/HQL Contribution)
[ https://issues.apache.org/jira/browse/HIVE-11055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitry Tolpeko updated HIVE-11055: -- Attachment: hplsql-site.xml HPL/SQL - Implementing Procedural SQL in Hive (PL/HQL Contribution) --- Key: HIVE-11055 URL: https://issues.apache.org/jira/browse/HIVE-11055 Project: Hive Issue Type: Improvement Reporter: Dmitry Tolpeko Assignee: Dmitry Tolpeko Fix For: 2.0.0 Attachments: HIVE-11055.1.patch, HIVE-11055.2.patch, HIVE-11055.3.patch, HIVE-11055.4.patch, hplsql-site.xml There is PL/HQL tool (www.plhql.org) that implements procedural SQL for Hive (actually any SQL-on-Hadoop implementation and any JDBC source). Alan Gates offered to contribute it to Hive under HPL/SQL name (org.apache.hive.hplsql package). This JIRA is to create a patch to contribute the PL/HQL code. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11055) HPL/SQL - Implementing Procedural SQL in Hive (PL/HQL Contribution)
[ https://issues.apache.org/jira/browse/HIVE-11055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14635014#comment-14635014 ] wangchangchun commented on HIVE-11055: -- I put an hplsql-site.xml in the conf dir. The content is from HIVE-11254.4.patch ./hplsql -e select * from hive_tables; Unhandled exception in PL/HQL java.lang.Exception: Unknown connection profile: hiveconn at org.apache.hive.hplsql.Conn.getConnection(Conn.java:127) at org.apache.hive.hplsql.Conn.executeQuery(Conn.java:55) at org.apache.hive.hplsql.Exec.executeQuery(Exec.java:412) at org.apache.hive.hplsql.Exec.executeQuery(Exec.java:421) at org.apache.hive.hplsql.Select.select(Select.java:73) ```
<configuration>
  <property>
    <name>hplsql.conn.default</name>
    <value>hiveconn</value>
    <description>The default connection profile</description>
  </property>
  <property>
    <name>hiveconn</name>
    <value>org.apache.hive.jdbc.HiveDriver;jdbc:hive2://</value>
    <description>HiveServer2 JDBC connection (embedded mode)</description>
  </property>
  <property>
    <name>hplsql.conn.init.hiveconn</name>
    <value>
      set mapred.job.queue.name=default;
      set hive.execution.engine=mr;
      use default;
    </value>
    <description>Statements for execute after connection to the database</description>
  </property>
```
HPL/SQL - Implementing Procedural SQL in Hive (PL/HQL Contribution) --- Key: HIVE-11055 URL: https://issues.apache.org/jira/browse/HIVE-11055 Project: Hive Issue Type: Improvement Reporter: Dmitry Tolpeko Assignee: Dmitry Tolpeko Fix For: 2.0.0 Attachments: HIVE-11055.1.patch, HIVE-11055.2.patch, HIVE-11055.3.patch, HIVE-11055.4.patch, hplsql-site.xml There is PL/HQL tool (www.plhql.org) that implements procedural SQL for Hive (actually any SQL-on-Hadoop implementation and any JDBC source). Alan Gates offered to contribute it to Hive under HPL/SQL name (org.apache.hive.hplsql package). This JIRA is to create a patch to contribute the PL/HQL code. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
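The "Unknown connection profile: hiveconn" error above suggests the profile named by hplsql.conn.default did not resolve to a driver;url connection string. A hedged sketch of what such a lookup plausibly does, using a plain Map in place of the parsed hplsql-site.xml (an illustration only, not the actual Conn.java logic):

```java
import java.util.Map;

// Sketch of a two-step profile lookup: hplsql.conn.default names a profile,
// and that profile key must itself resolve to a "driver;jdbc-url" value.
// If the second lookup misses, the user-visible error is exactly the one
// in the comment above. Hypothetical code, not HPL/SQL's implementation.
public class ProfileLookup {
    public static String connectionString(Map<String, String> conf) throws Exception {
        String profile = conf.get("hplsql.conn.default"); // e.g. "hiveconn"
        String conn = conf.get(profile);                  // driver;url string
        if (conn == null) {
            throw new Exception("Unknown connection profile: " + profile);
        }
        return conn;
    }
}
```

Under this reading, the fix is to make sure the configuration actually defines a value for the profile name that hplsql.conn.default points at.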
[jira] [Commented] (HIVE-11055) HPL/SQL - Implementing Procedural SQL in Hive (PL/HQL Contribution)
[ https://issues.apache.org/jira/browse/HIVE-11055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14635013#comment-14635013 ] Dmitry Tolpeko commented on HIVE-11055: --- I attached it to the JIRA. HPL/SQL - Implementing Procedural SQL in Hive (PL/HQL Contribution) --- Key: HIVE-11055 URL: https://issues.apache.org/jira/browse/HIVE-11055 Project: Hive Issue Type: Improvement Reporter: Dmitry Tolpeko Assignee: Dmitry Tolpeko Fix For: 2.0.0 Attachments: HIVE-11055.1.patch, HIVE-11055.2.patch, HIVE-11055.3.patch, HIVE-11055.4.patch, hplsql-site.xml There is PL/HQL tool (www.plhql.org) that implements procedural SQL for Hive (actually any SQL-on-Hadoop implementation and any JDBC source). Alan Gates offered to contribute it to Hive under HPL/SQL name (org.apache.hive.hplsql package). This JIRA is to create a patch to contribute the PL/HQL code. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8128) Improve Parquet Vectorization
[ https://issues.apache.org/jira/browse/HIVE-8128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14634790#comment-14634790 ] Dong Chen commented on HIVE-8128: - Patch V6 updated. Review board: https://reviews.apache.org/r/36540/ The patch depends on the new Parquet vector API at https://github.com/nezihyigitbasi-nflx/parquet-mr/commits/vector In this POC, the general workflow was done, two tests passed, and INT type was supported. The idea is that we create a VectorizedParquetRecordReader, which wraps the ParquetRecordReader provided by Parquet. Then in its next() method, we convert the Parquet RowBatch to a Hive VectorizedRowBatch. This is the first patch. To complete the vectorization feature, we still have work to do in follow-up: 1) support all data types 2) support partition column 3) add more test cases 4) evaluate performance on a real cluster. Improve Parquet Vectorization - Key: HIVE-8128 URL: https://issues.apache.org/jira/browse/HIVE-8128 Project: Hive Issue Type: Sub-task Reporter: Brock Noland Assignee: Dong Chen Fix For: parquet-branch Attachments: HIVE-8128-parquet.patch.POC, HIVE-8128.1-parquet.patch NO PRECOMMIT TESTS What we'll want to do is finish the vectorization work (e.g. VectorizedOrcSerde) which was partially done in HIVE-5998. As discussed in PARQUET-131, we will work out a Hive POC based on the new Parquet vectorized API, and then finish the implementation after it is finalized. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
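The POC workflow described above (wrap the Parquet reader, convert each batch inside next()) can be sketched with simplified stand-in types. IntBatch and VectorBatch below are illustrative only, not Parquet's RowBatch or Hive's VectorizedRowBatch, and the INT-only support matches what the comment says the POC covers:

```java
import java.util.Iterator;

// Shape of a vectorized wrapper reader: pull a batch from the wrapped
// source, convert it into the consumer's batch format, report end-of-input.
public class VectorizedWrapperSketch {
    // Stand-in for a Parquet row batch of INT values.
    public static class IntBatch {
        public final int[] values;
        public IntBatch(int[] v) { values = v; }
    }

    // Stand-in for Hive's VectorizedRowBatch with one long column vector
    // (Hive represents integer types as long[] in its column vectors).
    public static class VectorBatch {
        public long[] col;
        public int size;
    }

    private final Iterator<IntBatch> inner; // wrapped "ParquetRecordReader"

    public VectorizedWrapperSketch(Iterator<IntBatch> inner) { this.inner = inner; }

    // next(): fetch one source batch and convert it in place.
    public boolean next(VectorBatch out) {
        if (!inner.hasNext()) return false;
        IntBatch in = inner.next();
        out.size = in.values.length;
        out.col = new long[out.size];
        for (int i = 0; i < out.size; i++) {
            out.col[i] = in.values[i]; // widen int -> long per column-vector layout
        }
        return true;
    }

    public static void main(String[] args) {
        IntBatch in = new IntBatch(new int[]{7, 8});
        VectorizedWrapperSketch r =
            new VectorizedWrapperSketch(java.util.List.of(in).iterator());
        VectorBatch out = new VectorBatch();
        r.next(out);
        System.out.println(out.size); // prints 2
    }
}
```

The follow-up items in the comment (all types, partition columns) would extend the conversion step, not this overall wrapping structure.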
[jira] [Commented] (HIVE-11311) Avoid dumping AST tree String in Explain unless necessary
[ https://issues.apache.org/jira/browse/HIVE-11311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14634827#comment-14634827 ] Jesus Camacho Rodriguez commented on HIVE-11311: [~hsubramaniyan], could you review this one? Thanks Avoid dumping AST tree String in Explain unless necessary - Key: HIVE-11311 URL: https://issues.apache.org/jira/browse/HIVE-11311 Project: Hive Issue Type: Bug Reporter: Jesus Camacho Rodriguez Assignee: Jesus Camacho Rodriguez Attachments: HIVE-11311.patch Currently, the AST tree String representation is created even if it is not used; we should dump it only if we are going to use it (explain extended). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
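The change described above amounts to deferring the expensive dump behind the explain-extended check. A minimal sketch of that deferral using a Supplier; the names are illustrative, not Hive's actual ExplainTask code:

```java
import java.util.function.Supplier;

// Sketch of the idea in HIVE-11311: build the AST tree string only when
// EXPLAIN EXTENDED actually needs it, instead of eagerly on every explain.
public class LazyAstDump {
    public static int builds = 0;

    // Stands in for the costly ASTNode-to-String dump.
    public static String buildAstString() {
        builds++;
        return "(TOK_QUERY ...)";
    }

    // Only materialize the dump for the extended case; a plain EXPLAIN
    // never pays the construction cost.
    public static String astForExplain(boolean extended, Supplier<String> dump) {
        return extended ? dump.get() : null;
    }

    public static void main(String[] args) {
        astForExplain(false, LazyAstDump::buildAstString); // no build happens
        astForExplain(true, LazyAstDump::buildAstString);  // builds once
    }
}
```

Passing a Supplier (or equivalently checking the flag before calling the dump) keeps the call site unchanged while making the cost conditional.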
[jira] [Commented] (HIVE-11325) Infinite loop in HiveHFileOutputFormat
[ https://issues.apache.org/jira/browse/HIVE-11325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14634774#comment-14634774 ] Harsh J commented on HIVE-11325: I missed the srcDir declaration, which would explain the loop (we're walking). I'm checking why it doesn't abort at the family directory. Infinite loop in HiveHFileOutputFormat -- Key: HIVE-11325 URL: https://issues.apache.org/jira/browse/HIVE-11325 Project: Hive Issue Type: Bug Components: HBase Handler Affects Versions: 1.0.0 Reporter: Harsh J No idea why {{hbase_handler_bulk.q}} does not catch this if it's being run regularly in Hive builds, but here's the gist of the issue: The condition at https://github.com/apache/hive/blob/master/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HiveHFileOutputFormat.java#L152-L164 indicates that we will loop until we find a file whose last path component (the name) is equal to the column family name. In execution, however, the iteration enters an actual infinite loop because the file we end up considering as the srcDir name is actually the region file, whose name will never match the family name. 
This is an example of the IPC call that the listing loop of a 100%-progress task gets stuck in: {code} 2015-07-21 10:32:20,662 TRACE [main] org.apache.hadoop.ipc.ProtobufRpcEngine: 1: Call - cdh54.vm/172.16.29.132:8020: getListing {src: /user/hive/warehouse/hbase_test/_temporary/1/_temporary/attempt_1436935612068_0011_m_00_0/family/97112ac1c09548ae87bd85af072d2e8c startAfter: needLocation: false} 2015-07-21 10:32:20,662 DEBUG [IPC Parameter Sending Thread #1] org.apache.hadoop.ipc.Client: IPC Client (1551465414) connection to cdh54.vm/172.16.29.132:8020 from hive sending #510346 2015-07-21 10:32:20,662 DEBUG [IPC Client (1551465414) connection to cdh54.vm/172.16.29.132:8020 from hive] org.apache.hadoop.ipc.Client: IPC Client (1551465414) connection to cdh54.vm/172.16.29.132:8020 from hive got value #510346 2015-07-21 10:32:20,662 DEBUG [main] org.apache.hadoop.ipc.ProtobufRpcEngine: Call: getListing took 0ms 2015-07-21 10:32:20,662 TRACE [main] org.apache.hadoop.ipc.ProtobufRpcEngine: 1: Response - cdh54.vm/172.16.29.132:8020: getListing {dirList { partialListing { fileType: IS_FILE path: length: 863 permission { perm: 4600 } owner: hive group: hive modification_time: 1437454718130 access_time: 1437454717973 block_replication: 1 blocksize: 134217728 fileId: 33960 childrenNum: 0 storagePolicy: 0 } remainingEntries: 0 }} {code} The path we are getting out of the listing results is {{/user/hive/warehouse/hbase_test/_temporary/1/_temporary/attempt_1436935612068_0011_m_00_0/family/97112ac1c09548ae87bd85af072d2e8c}}, but instead of checking the path's parent {{family}} we loop infinitely over its hashed filename {{97112ac1c09548ae87bd85af072d2e8c}} because it does not match {{family}}. It therefore stays in the infinite loop until the MR framework kills it away due to an idle task timeout (and then, since the subsequent task attempts fail outright, the job fails). While doing a {{getPath().getParent()}} will resolve that, is that infinite loop even necessary? 
Especially given the fact that we throw exceptions if there are no entries or there is more than one entry. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
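The {{getPath().getParent()}} fix suggested above can be sketched as follows, using java.nio.Path as a stand-in for org.apache.hadoop.fs.Path (illustrative only, not the HiveHFileOutputFormat patch itself):

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// The listing returns a region FILE like .../family/97112ac1..., so the
// name to compare against the column family is the file's PARENT directory
// name, not the hashed file name itself.
public class FamilyDirCheck {
    public static boolean matchesFamily(Path file, String family) {
        Path parent = file.getParent();
        return parent != null && parent.getFileName().toString().equals(family);
    }

    // Convenience overload for a raw path string.
    public static boolean matchesFamily(String filePath, String family) {
        return matchesFamily(Paths.get(filePath), family);
    }

    public static void main(String[] args) {
        String p = "/user/hive/warehouse/hbase_test/_temporary/1/_temporary/"
                 + "attempt_x/family/97112ac1c09548ae87bd85af072d2e8c";
        System.out.println(matchesFamily(p, "family")); // prints true
    }
}
```

Comparing the parent name terminates the walk at the family directory; comparing the hashed file name can never succeed, which is exactly the infinite loop described.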
[jira] [Updated] (HIVE-8128) Improve Parquet Vectorization
[ https://issues.apache.org/jira/browse/HIVE-8128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dong Chen updated HIVE-8128: Attachment: HIVE-8128.6-parquet.patch Improve Parquet Vectorization - Key: HIVE-8128 URL: https://issues.apache.org/jira/browse/HIVE-8128 Project: Hive Issue Type: Sub-task Reporter: Brock Noland Assignee: Dong Chen Fix For: parquet-branch Attachments: HIVE-8128-parquet.patch.POC, HIVE-8128.1-parquet.patch, HIVE-8128.6-parquet.patch NO PRECOMMIT TESTS What we'll want to do is finish the vectorization work (e.g. VectorizedOrcSerde) which was partially done in HIVE-5998. As discussed in PARQUET-131, we will work out a Hive POC based on the new Parquet vectorized API, and then finish the implementation after it is finalized. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11113) ANALYZE TABLE .. COMPUTE STATISTICS FOR COLUMNS does not work.
[ https://issues.apache.org/jira/browse/HIVE-11113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14634801#comment-14634801 ] Bing Li commented on HIVE-11113: Thank you, [~tfriedr] With your fix in HIVE-11326, all the queries could work now. ANALYZE TABLE .. COMPUTE STATISTICS FOR COLUMNS does not work. --- Key: HIVE-11113 URL: https://issues.apache.org/jira/browse/HIVE-11113 Project: Hive Issue Type: Bug Affects Versions: 0.13.1, 1.2.1 Environment: Reporter: Shiroy Pigarez Assignee: Pengcheng Xiong Priority: Critical I was trying to perform some column statistics using Hive as per the documentation https://cwiki.apache.org/confluence/display/Hive/Column+Statistics+in+Hive and was encountering the following errors: Seems like a bug. Can you look into this? Thanks in advance. -- HIVE table {noformat} hive> create table people_part( name string, address string) PARTITIONED BY (dob string, nationality varchar(2)) row format delimited fields terminated by '\t'; {noformat} --Analyze table with partition dob and nationality with FOR COLUMNS {noformat} hive> ANALYZE TABLE people_part PARTITION(dob='2015-10-2',nationality) COMPUTE STATISTICS FOR COLUMNS; NoViableAltException(-1@[]) at org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.identifier(HiveParser_IdentifiersParser.java:11627) at org.apache.hadoop.hive.ql.parse.HiveParser.identifier(HiveParser.java:40215) at org.apache.hadoop.hive.ql.parse.HiveParser.columnName(HiveParser.java:33351) at org.apache.hadoop.hive.ql.parse.HiveParser.columnNameList(HiveParser.java:33219) at org.apache.hadoop.hive.ql.parse.HiveParser.analyzeStatement(HiveParser.java:17764) at org.apache.hadoop.hive.ql.parse.HiveParser.ddlStatement(HiveParser.java:2369) at org.apache.hadoop.hive.ql.parse.HiveParser.execStatement(HiveParser.java:1398) at org.apache.hadoop.hive.ql.parse.HiveParser.statement(HiveParser.java:1036) at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:199) at 
org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:166) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:404) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:322) at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:975) at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1040) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:911) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:901) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:275) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:227) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:430) at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:803) at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:697) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:636) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.RunJar.main(RunJar.java:212) FAILED: ParseException line 1:95 cannot recognize input near 'EOF' 'EOF' 'EOF' in column name {noformat} --Analyze table with partition dob and nationality values specified with FOR COLUMNS {noformat} hive> ANALYZE TABLE people_part PARTITION(dob='2015-10-2',nationality='IE') COMPUTE STATISTICS FOR COLUMNS; NoViableAltException(-1@[]) at org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.identifier(HiveParser_IdentifiersParser.java:11627) at org.apache.hadoop.hive.ql.parse.HiveParser.identifier(HiveParser.java:40215) at org.apache.hadoop.hive.ql.parse.HiveParser.columnName(HiveParser.java:33351) at org.apache.hadoop.hive.ql.parse.HiveParser.columnNameList(HiveParser.java:33219) at org.apache.hadoop.hive.ql.parse.HiveParser.analyzeStatement(HiveParser.java:17764) at 
org.apache.hadoop.hive.ql.parse.HiveParser.ddlStatement(HiveParser.java:2369) at org.apache.hadoop.hive.ql.parse.HiveParser.execStatement(HiveParser.java:1398) at org.apache.hadoop.hive.ql.parse.HiveParser.statement(HiveParser.java:1036) at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:199) at
[jira] [Commented] (HIVE-11055) HPL/SQL - Implementing Procedural SQL in Hive (PL/HQL Contribution)
[ https://issues.apache.org/jira/browse/HIVE-11055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14634835#comment-14634835 ] wangchangchun commented on HIVE-11055: -- I used the git apply command and patched my source code OK. After building a package, I installed it, but I still have the problem. The first problem: the bin dir of apache-hive-2.0.0-SNAPSHOT-bin.tar.gz does not have hplsql, so I copied the bin dir from the master branch. The second problem: the lib dir of apache-hive-2.0.0-SNAPSHOT-bin.tar.gz does not have hive-hplsql-2.0.0-SNAPSHOT.jar or antlr-runtime-4.5.jar. The problems above I have solved. The last problem: it cannot find hplsql-site.xml and hive-site.xml. Can you tell me how to solve this problem? HPL/SQL - Implementing Procedural SQL in Hive (PL/HQL Contribution) --- Key: HIVE-11055 URL: https://issues.apache.org/jira/browse/HIVE-11055 Project: Hive Issue Type: Improvement Reporter: Dmitry Tolpeko Assignee: Dmitry Tolpeko Fix For: 2.0.0 Attachments: HIVE-11055.1.patch, HIVE-11055.2.patch, HIVE-11055.3.patch, HIVE-11055.4.patch There is PL/HQL tool (www.plhql.org) that implements procedural SQL for Hive (actually any SQL-on-Hadoop implementation and any JDBC source). Alan Gates offered to contribute it to Hive under HPL/SQL name (org.apache.hive.hplsql package). This JIRA is to create a patch to contribute the PL/HQL code. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11303) Getting Tez LimitExceededException after dag execution on large query
[ https://issues.apache.org/jira/browse/HIVE-11303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14634627#comment-14634627 ] Gopal V commented on HIVE-11303: [~jdere]: +1 LGTM. Getting Tez LimitExceededException after dag execution on large query - Key: HIVE-11303 URL: https://issues.apache.org/jira/browse/HIVE-11303 Project: Hive Issue Type: Bug Components: Tez Reporter: Jason Dere Assignee: Jason Dere Attachments: HIVE-11303.1.patch {noformat} 2015-07-17 18:18:11,830 INFO [main]: counters.Limits (Limits.java:ensureInitialized(59)) - Counter limits initialized with parameters: GROUP_NAME_MAX=256, MAX_GROUPS=500, COUNTER_NAME_MAX=64, MAX_COUNTERS=1200 2015-07-17 18:18:11,841 ERROR [main]: exec.Task (TezTask.java:execute(189)) - Failed to execute tez graph. org.apache.tez.common.counters.LimitExceededException: Too many counters: 1201 max=1200 at org.apache.tez.common.counters.Limits.checkCounters(Limits.java:87) at org.apache.tez.common.counters.Limits.incrCounters(Limits.java:94) at org.apache.tez.common.counters.AbstractCounterGroup.addCounter(AbstractCounterGroup.java:76) at org.apache.tez.common.counters.AbstractCounterGroup.addCounterImpl(AbstractCounterGroup.java:93) at org.apache.tez.common.counters.AbstractCounterGroup.findCounter(AbstractCounterGroup.java:104) at org.apache.tez.dag.api.DagTypeConverters.convertTezCountersFromProto(DagTypeConverters.java:567) at org.apache.tez.dag.api.client.DAGStatus.getDAGCounters(DAGStatus.java:148) at org.apache.hadoop.hive.ql.exec.tez.TezTask.execute(TezTask.java:175) at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160) at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:89) at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1673) at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1432) at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1213) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1064) at 
org.apache.hadoop.hive.ql.Driver.run(Driver.java:1054) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:213) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:165) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:376) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:311) at org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:409) at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:425) at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:714) at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:621) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:497) at org.apache.hadoop.util.RunJar.run(RunJar.java:221) at org.apache.hadoop.util.RunJar.main(RunJar.java:136) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11303) Getting Tez LimitExceededException after dag execution on large query
[ https://issues.apache.org/jira/browse/HIVE-11303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gopal V updated HIVE-11303: --- Affects Version/s: 1.3.0 Getting Tez LimitExceededException after dag execution on large query - Key: HIVE-11303 URL: https://issues.apache.org/jira/browse/HIVE-11303 Project: Hive Issue Type: Bug Components: Tez Affects Versions: 1.2.0, 1.3.0 Reporter: Jason Dere Assignee: Jason Dere Attachments: HIVE-11303.1.patch {noformat} 2015-07-17 18:18:11,830 INFO [main]: counters.Limits (Limits.java:ensureInitialized(59)) - Counter limits initialized with parameters: GROUP_NAME_MAX=256, MAX_GROUPS=500, COUNTER_NAME_MAX=64, MAX_COUNTERS=1200 2015-07-17 18:18:11,841 ERROR [main]: exec.Task (TezTask.java:execute(189)) - Failed to execute tez graph. org.apache.tez.common.counters.LimitExceededException: Too many counters: 1201 max=1200 at org.apache.tez.common.counters.Limits.checkCounters(Limits.java:87) at org.apache.tez.common.counters.Limits.incrCounters(Limits.java:94) at org.apache.tez.common.counters.AbstractCounterGroup.addCounter(AbstractCounterGroup.java:76) at org.apache.tez.common.counters.AbstractCounterGroup.addCounterImpl(AbstractCounterGroup.java:93) at org.apache.tez.common.counters.AbstractCounterGroup.findCounter(AbstractCounterGroup.java:104) at org.apache.tez.dag.api.DagTypeConverters.convertTezCountersFromProto(DagTypeConverters.java:567) at org.apache.tez.dag.api.client.DAGStatus.getDAGCounters(DAGStatus.java:148) at org.apache.hadoop.hive.ql.exec.tez.TezTask.execute(TezTask.java:175) at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160) at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:89) at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1673) at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1432) at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1213) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1064) at 
org.apache.hadoop.hive.ql.Driver.run(Driver.java:1054) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:213) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:165) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:376) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:311) at org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:409) at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:425) at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:714) at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:621) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:497) at org.apache.hadoop.util.RunJar.run(RunJar.java:221) at org.apache.hadoop.util.RunJar.main(RunJar.java:136) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11303) Getting Tez LimitExceededException after dag execution on large query
[ https://issues.apache.org/jira/browse/HIVE-11303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gopal V updated HIVE-11303: --- Affects Version/s: 1.2.0 Getting Tez LimitExceededException after dag execution on large query - Key: HIVE-11303 URL: https://issues.apache.org/jira/browse/HIVE-11303 Project: Hive Issue Type: Bug Components: Tez Affects Versions: 1.2.0, 1.3.0 Reporter: Jason Dere Assignee: Jason Dere Attachments: HIVE-11303.1.patch {noformat} 2015-07-17 18:18:11,830 INFO [main]: counters.Limits (Limits.java:ensureInitialized(59)) - Counter limits initialized with parameters: GROUP_NAME_MAX=256, MAX_GROUPS=500, COUNTER_NAME_MAX=64, MAX_COUNTERS=1200 2015-07-17 18:18:11,841 ERROR [main]: exec.Task (TezTask.java:execute(189)) - Failed to execute tez graph. org.apache.tez.common.counters.LimitExceededException: Too many counters: 1201 max=1200 at org.apache.tez.common.counters.Limits.checkCounters(Limits.java:87) at org.apache.tez.common.counters.Limits.incrCounters(Limits.java:94) at org.apache.tez.common.counters.AbstractCounterGroup.addCounter(AbstractCounterGroup.java:76) at org.apache.tez.common.counters.AbstractCounterGroup.addCounterImpl(AbstractCounterGroup.java:93) at org.apache.tez.common.counters.AbstractCounterGroup.findCounter(AbstractCounterGroup.java:104) at org.apache.tez.dag.api.DagTypeConverters.convertTezCountersFromProto(DagTypeConverters.java:567) at org.apache.tez.dag.api.client.DAGStatus.getDAGCounters(DAGStatus.java:148) at org.apache.hadoop.hive.ql.exec.tez.TezTask.execute(TezTask.java:175) at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160) at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:89) at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1673) at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1432) at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1213) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1064) at 
org.apache.hadoop.hive.ql.Driver.run(Driver.java:1054) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:213) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:165) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:376) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:311) at org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:409) at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:425) at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:714) at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:621) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:497) at org.apache.hadoop.util.RunJar.run(RunJar.java:221) at org.apache.hadoop.util.RunJar.main(RunJar.java:136) {noformat}
[jira] [Commented] (HIVE-11113) ANALYZE TABLE .. COMPUTE STATISTICS FOR COLUMNS does not work.
[ https://issues.apache.org/jira/browse/HIVE-11113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14634646#comment-14634646 ] Thomas Friedrich commented on HIVE-11113: - [~libing], [~pxiong], the error message "Column [ds] was not found in schema!" is a different problem than the one originally reported in this JIRA; it is specific to Parquet tables with partitions and is not limited to ANALYZE. I opened HIVE-11326 for the Parquet problem and added steps to reproduce it over there. ANALYZE TABLE .. COMPUTE STATISTICS FOR COLUMNS does not work. --- Key: HIVE-11113 URL: https://issues.apache.org/jira/browse/HIVE-11113 Project: Hive Issue Type: Bug Affects Versions: 0.13.1, 1.2.1 Environment: Reporter: Shiroy Pigarez Assignee: Pengcheng Xiong Priority: Critical I was trying to perform some column statistics using Hive as per the documentation https://cwiki.apache.org/confluence/display/Hive/Column+Statistics+in+Hive and was encountering the following errors. Seems like a bug. Can you look into this? Thanks in advance. 
-- HIVE table {noformat} hive> create table people_part( name string, address string) PARTITIONED BY (dob string, nationality varchar(2)) row format delimited fields terminated by '\t'; {noformat} --Analyze table with partition dob and nationality with FOR COLUMNS {noformat} hive> ANALYZE TABLE people_part PARTITION(dob='2015-10-2',nationality) COMPUTE STATISTICS FOR COLUMNS; NoViableAltException(-1@[]) at org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.identifier(HiveParser_IdentifiersParser.java:11627) at org.apache.hadoop.hive.ql.parse.HiveParser.identifier(HiveParser.java:40215) at org.apache.hadoop.hive.ql.parse.HiveParser.columnName(HiveParser.java:33351) at org.apache.hadoop.hive.ql.parse.HiveParser.columnNameList(HiveParser.java:33219) at org.apache.hadoop.hive.ql.parse.HiveParser.analyzeStatement(HiveParser.java:17764) at org.apache.hadoop.hive.ql.parse.HiveParser.ddlStatement(HiveParser.java:2369) at org.apache.hadoop.hive.ql.parse.HiveParser.execStatement(HiveParser.java:1398) at org.apache.hadoop.hive.ql.parse.HiveParser.statement(HiveParser.java:1036) at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:199) at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:166) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:404) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:322) at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:975) at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1040) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:911) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:901) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:275) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:227) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:430) at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:803) at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:697) at 
org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:636) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.RunJar.main(RunJar.java:212) FAILED: ParseException line 1:95 cannot recognize input near '<EOF>' '<EOF>' '<EOF>' in column name {noformat} --Analyze table with partition dob and nationality values specified with FOR COLUMNS {noformat} hive> ANALYZE TABLE people_part PARTITION(dob='2015-10-2',nationality='IE') COMPUTE STATISTICS FOR COLUMNS; NoViableAltException(-1@[]) at org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.identifier(HiveParser_IdentifiersParser.java:11627) at org.apache.hadoop.hive.ql.parse.HiveParser.identifier(HiveParser.java:40215) at org.apache.hadoop.hive.ql.parse.HiveParser.columnName(HiveParser.java:33351) at org.apache.hadoop.hive.ql.parse.HiveParser.columnNameList(HiveParser.java:33219) at org.apache.hadoop.hive.ql.parse.HiveParser.analyzeStatement(HiveParser.java:17764) at org.apache.hadoop.hive.ql.parse.HiveParser.ddlStatement(HiveParser.java:2369) at org.apache.hadoop.hive.ql.parse.HiveParser.execStatement(HiveParser.java:1398)
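Both failing statements above omit the column list after FOR COLUMNS, and the parser dies at the column-name rule (columnNameList → columnName → identifier sees EOF). On older Hive releases the grammar required an explicit column list there, which is consistent with this trace; the list-free form documented on the wiki is newer. A hedged sketch of the explicit form, reusing the people_part table from the report (the column names are the ones its DDL defines):

```sql
-- Sketch, assuming this Hive version requires the explicit column list after FOR COLUMNS.
-- Fully specified partition spec, with the nationality literal taken from the report:
ANALYZE TABLE people_part PARTITION(dob='2015-10-2', nationality='IE')
  COMPUTE STATISTICS FOR COLUMNS name, address;
```

If this form parses while the list-free form does not, the failure is the grammar version rather than the statistics machinery itself.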