[jira] [Commented] (HIVE-4662) first_value can't have more than one order by column

2016-03-15 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15194853#comment-15194853
 ] 

Hive QA commented on HIVE-4662:
---



Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12793291/HIVE-4662.01.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 9811 tests executed
*Failed tests:*
{noformat}
TestSparkCliDriver-groupby3_map.q-sample2.q-auto_join14.q-and-12-more - did not 
produce a TEST-*.xml file
TestSparkCliDriver-groupby_map_ppr_multi_distinct.q-table_access_keys_stats.q-groupby4_noskew.q-and-12-more
 - did not produce a TEST-*.xml file
TestSparkCliDriver-join_rc.q-insert1.q-vectorized_rcfile_columnar.q-and-12-more 
- did not produce a TEST-*.xml file
TestSparkCliDriver-ppd_join4.q-join9.q-ppd_join3.q-and-12-more - did not 
produce a TEST-*.xml file
TestSparkCliDriver-timestamp_lazy.q-bucketsortoptimize_insert_4.q-date_udf.q-and-12-more
 - did not produce a TEST-*.xml file
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/7270/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/7270/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-7270/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 5 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12793291 - PreCommit-HIVE-TRUNK-Build

> first_value can't have more than one order by column
> 
>
> Key: HIVE-4662
> URL: https://issues.apache.org/jira/browse/HIVE-4662
> Project: Hive
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 0.11.0
>Reporter: Frans Drijver
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-4662.01.patch, HIVE-4662.01.patch, 
> HIVE-4662.01.patch, HIVE-4662.patch
>
>
> The current implementation of the first_value function does not allow more 
> than one order by column. For example:
> {quote}
> select distinct 
> first_value(kastr.DEWNKNR) over ( partition by kastr.DEKTRNR order by 
> kastr.DETRADT, kastr.DEVPDNR )
> from RTAVP_DRKASTR kastr
> ;
> {quote}
> Error given:
> {quote}
> FAILED: SemanticException Range based Window Frame can have only 1 Sort Key
> {quote}
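For reference, the result the reporter expects from the query above can be sketched outside Hive. This is a minimal illustration only, not Hive code: the helper name `first_value` and the sample rows are made up, and only the column names come from the query in the report. It picks, per partition, the value from the row that sorts first on both order by columns.

```python
def first_value(rows, value_col, partition_col, order_cols):
    """Return {partition key: value of the first row under a multi-column ordering}."""
    # Group the rows by the partition column.
    partitions = {}
    for row in rows:
        partitions.setdefault(row[partition_col], []).append(row)

    result = {}
    for key, part in partitions.items():
        # Order by all sort columns at once by comparing tuples of their values.
        first = min(part, key=lambda r: tuple(r[c] for c in order_cols))
        result[key] = first[value_col]
    return result


# Hypothetical sample data; only the column names are taken from the report.
rows = [
    {"DEKTRNR": 1, "DETRADT": "2013-01-02", "DEVPDNR": 1, "DEWNKNR": "a"},
    {"DEKTRNR": 1, "DETRADT": "2013-01-01", "DEVPDNR": 2, "DEWNKNR": "b"},
    {"DEKTRNR": 1, "DETRADT": "2013-01-01", "DEVPDNR": 1, "DEWNKNR": "c"},
    {"DEKTRNR": 2, "DETRADT": "2013-05-05", "DEVPDNR": 9, "DEWNKNR": "d"},
]

per_partition = first_value(rows, "DEWNKNR", "DEKTRNR", ["DETRADT", "DEVPDNR"])
```

In partition 1, two rows tie on DETRADT, so the second sort column DEVPDNR decides which row comes first.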



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13084) Vectorization add support for PROJECTION Multi-AND/OR

2016-03-15 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-13084:

Attachment: (was: HIVE-13084.03.patch)

> Vectorization add support for PROJECTION Multi-AND/OR
> -
>
> Key: HIVE-13084
> URL: https://issues.apache.org/jira/browse/HIVE-13084
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Reporter: Rajesh Balamohan
>Assignee: Matt McCline
> Attachments: HIVE-13084.01.patch, HIVE-13084.02.patch, 
> vector_between_date.q
>
>
> When there is a case statement in the group by, Hive throws a "Could not 
> vectorize expression" exception.
> Example query, just to demonstrate the problem:
> {noformat}
> explain select l_partkey, case when l_commitdate between '2015-06-30' AND 
> '2015-07-06' THEN '2015-06-30' END as wk from lineitem_test_l_shipdate_ts 
> group by l_partkey, case when l_commitdate between '2015-06-30' AND 
> '2015-07-06' THEN '2015-06-30' END;
> org.apache.hadoop.hive.ql.metadata.HiveException: Could not vectorize 
> expression: org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc
> Vertex dependency in root stage
> Reducer 2 <- Map 1 (SIMPLE_EDGE)
> Stage-0
>   Fetch Operator
> limit:-1
> Stage-1
>   Reducer 2
>   File Output Operator [FS_7]
> Group By Operator [GBY_5] (rows=888777234 width=108)
>   Output:["_col0","_col1"],keys:KEY._col0, KEY._col1
> <-Map 1 [SIMPLE_EDGE]
>   SHUFFLE [RS_4]
> PartitionCols:_col0, _col1
> Group By Operator [GBY_3] (rows=1777554469 width=108)
>   Output:["_col0","_col1"],keys:_col0, _col1
>   Select Operator [SEL_1] (rows=1777554469 width=108)
> Output:["_col0","_col1"]
> TableScan [TS_0] (rows=1777554469 width=108)
>   
> rajesh@lineitem_test_l_shipdate_ts,lineitem_test_l_shipdate_ts,Tbl:COMPLETE,Col:NONE,Output:["l_partkey","l_commitdate"]
> {noformat}
> \cc [~mmccline], [~gopalv]





[jira] [Updated] (HIVE-13084) Vectorization add support for PROJECTION Multi-AND/OR

2016-03-15 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-13084:

Attachment: HIVE-13084.03.patch

> Vectorization add support for PROJECTION Multi-AND/OR
> -
>
> Key: HIVE-13084
> URL: https://issues.apache.org/jira/browse/HIVE-13084
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Reporter: Rajesh Balamohan
>Assignee: Matt McCline
> Attachments: HIVE-13084.01.patch, HIVE-13084.02.patch, 
> HIVE-13084.03.patch, vector_between_date.q
>
>
> When there is a case statement in the group by, Hive throws a "Could not 
> vectorize expression" exception.
> Example query, just to demonstrate the problem:
> {noformat}
> explain select l_partkey, case when l_commitdate between '2015-06-30' AND 
> '2015-07-06' THEN '2015-06-30' END as wk from lineitem_test_l_shipdate_ts 
> group by l_partkey, case when l_commitdate between '2015-06-30' AND 
> '2015-07-06' THEN '2015-06-30' END;
> org.apache.hadoop.hive.ql.metadata.HiveException: Could not vectorize 
> expression: org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc
> Vertex dependency in root stage
> Reducer 2 <- Map 1 (SIMPLE_EDGE)
> Stage-0
>   Fetch Operator
> limit:-1
> Stage-1
>   Reducer 2
>   File Output Operator [FS_7]
> Group By Operator [GBY_5] (rows=888777234 width=108)
>   Output:["_col0","_col1"],keys:KEY._col0, KEY._col1
> <-Map 1 [SIMPLE_EDGE]
>   SHUFFLE [RS_4]
> PartitionCols:_col0, _col1
> Group By Operator [GBY_3] (rows=1777554469 width=108)
>   Output:["_col0","_col1"],keys:_col0, _col1
>   Select Operator [SEL_1] (rows=1777554469 width=108)
> Output:["_col0","_col1"]
> TableScan [TS_0] (rows=1777554469 width=108)
>   
> rajesh@lineitem_test_l_shipdate_ts,lineitem_test_l_shipdate_ts,Tbl:COMPLETE,Col:NONE,Output:["l_partkey","l_commitdate"]
> {noformat}
> \cc [~mmccline], [~gopalv]





[jira] [Commented] (HIVE-12481) Occasionally "Request is a replay" will be thrown from HS2

2016-03-15 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15194927#comment-15194927
 ] 

Lefty Leverenz commented on HIVE-12481:
---

Doc note:  The final patch did not add *hive_server2_thrift_auth_max_retries* 
or any other configuration parameters.  No documentation is needed.

> Occasionally "Request is a replay" will be thrown from HS2
> --
>
> Key: HIVE-12481
> URL: https://issues.apache.org/jira/browse/HIVE-12481
> Project: Hive
>  Issue Type: Improvement
>  Components: Authentication
>Affects Versions: 2.0.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
>  Labels: TODOC2.1
> Fix For: 2.1.0
>
> Attachments: HIVE-12481.2.patch, HIVE-12481.3.patch, HIVE-12481.patch
>
>
> We have seen the following exception thrown from HS2 in a secured cluster when 
> many queries are running simultaneously on a single HS2 instance.
> My guess at the cause is that two queries are submitted at the same time and 
> have the same timestamp. For such cases, we can add a retry for the query.
>  
> {noformat}
> 2015-11-18 16:12:33,117 ERROR org.apache.thrift.transport.TSaslTransport: 
> SASL negotiation failure
> javax.security.sasl.SaslException: GSS initiate failed [Caused by 
> GSSException: Failure unspecified at GSS-API level (Mechanism level: Request 
> is a replay (34))]
> at 
> com.sun.security.sasl.gsskerb.GssKrb5Server.evaluateResponse(GssKrb5Server.java:177)
> at 
> org.apache.thrift.transport.TSaslTransport$SaslParticipant.evaluateChallengeOrResponse(TSaslTransport.java:539)
> at org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:283)
> at 
> org.apache.thrift.transport.TSaslServerTransport.open(TSaslServerTransport.java:41)
> at 
> org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(TSaslServerTransport.java:216)
> at 
> org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingTransportFactory$1.run(HadoopThriftAuthBridge.java:739)
> at 
> org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingTransportFactory$1.run(HadoopThriftAuthBridge.java:736)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:356)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1651)
> at 
> org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingTransportFactory.getTransport(HadoopThriftAuthBridge.java:736)
> at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:268)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: GSSException: Failure unspecified at GSS-API level (Mechanism 
> level: Request is a replay (34))
> at sun.security.jgss.krb5.Krb5Context.acceptSecContext(Krb5Context.java:788)
> at sun.security.jgss.GSSContextImpl.acceptSecContext(GSSContextImpl.java:342)
> at sun.security.jgss.GSSContextImpl.acceptSecContext(GSSContextImpl.java:285)
> at 
> com.sun.security.sasl.gsskerb.GssKrb5Server.evaluateResponse(GssKrb5Server.java:155)
> ... 14 more
> Caused by: KrbException: Request is a replay (34)
> at sun.security.krb5.KrbApReq.authenticate(KrbApReq.java:308)
> at sun.security.krb5.KrbApReq.<init>(KrbApReq.java:144)
> at 
> sun.security.jgss.krb5.InitSecContextToken.<init>(InitSecContextToken.java:108)
> at sun.security.jgss.krb5.Krb5Context.acceptSecContext(Krb5Context.java:771)
> ... 17 more
> {noformat}
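The retry idea proposed in the description can be sketched generically. This is only an illustration under stated assumptions, not the change made in the Hive patch: `ReplayError`, `open_with_retry`, `max_retries`, and `delay_seconds` are hypothetical names, and the exception class merely stands in for the "Request is a replay (34)" failure.

```python
import time


class ReplayError(Exception):
    """Hypothetical stand-in for the 'Request is a replay (34)' failure."""


def open_with_retry(open_transport, max_retries=3, delay_seconds=0.0):
    """Call open_transport(), retrying a bounded number of times on ReplayError."""
    attempt = 0
    while True:
        try:
            return open_transport()
        except ReplayError:
            attempt += 1
            if attempt > max_retries:
                # Give up and surface the original error to the caller.
                raise
            time.sleep(delay_seconds)


# Usage: a fake transport that fails twice before succeeding.
calls = {"n": 0}


def flaky_open():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ReplayError("Request is a replay (34)")
    return "opened"


outcome = open_with_retry(flaky_open)
```

Bounding the number of attempts keeps a persistent failure from looping forever; only the replay error is retried, so other failures still propagate immediately.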





[jira] [Updated] (HIVE-12481) Occasionally "Request is a replay" will be thrown from HS2

2016-03-15 Thread Lefty Leverenz (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lefty Leverenz updated HIVE-12481:
--
Labels:   (was: TODOC2.1)

> Occasionally "Request is a replay" will be thrown from HS2
> --
>
> Key: HIVE-12481
> URL: https://issues.apache.org/jira/browse/HIVE-12481
> Project: Hive
>  Issue Type: Improvement
>  Components: Authentication
>Affects Versions: 2.0.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Fix For: 2.1.0
>
> Attachments: HIVE-12481.2.patch, HIVE-12481.3.patch, HIVE-12481.patch
>
>
> We have seen the following exception thrown from HS2 in a secured cluster when 
> many queries are running simultaneously on a single HS2 instance.
> My guess at the cause is that two queries are submitted at the same time and 
> have the same timestamp. For such cases, we can add a retry for the query.
>  
> {noformat}
> 2015-11-18 16:12:33,117 ERROR org.apache.thrift.transport.TSaslTransport: 
> SASL negotiation failure
> javax.security.sasl.SaslException: GSS initiate failed [Caused by 
> GSSException: Failure unspecified at GSS-API level (Mechanism level: Request 
> is a replay (34))]
> at 
> com.sun.security.sasl.gsskerb.GssKrb5Server.evaluateResponse(GssKrb5Server.java:177)
> at 
> org.apache.thrift.transport.TSaslTransport$SaslParticipant.evaluateChallengeOrResponse(TSaslTransport.java:539)
> at org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:283)
> at 
> org.apache.thrift.transport.TSaslServerTransport.open(TSaslServerTransport.java:41)
> at 
> org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(TSaslServerTransport.java:216)
> at 
> org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingTransportFactory$1.run(HadoopThriftAuthBridge.java:739)
> at 
> org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingTransportFactory$1.run(HadoopThriftAuthBridge.java:736)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:356)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1651)
> at 
> org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingTransportFactory.getTransport(HadoopThriftAuthBridge.java:736)
> at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:268)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: GSSException: Failure unspecified at GSS-API level (Mechanism 
> level: Request is a replay (34))
> at sun.security.jgss.krb5.Krb5Context.acceptSecContext(Krb5Context.java:788)
> at sun.security.jgss.GSSContextImpl.acceptSecContext(GSSContextImpl.java:342)
> at sun.security.jgss.GSSContextImpl.acceptSecContext(GSSContextImpl.java:285)
> at 
> com.sun.security.sasl.gsskerb.GssKrb5Server.evaluateResponse(GssKrb5Server.java:155)
> ... 14 more
> Caused by: KrbException: Request is a replay (34)
> at sun.security.krb5.KrbApReq.authenticate(KrbApReq.java:308)
> at sun.security.krb5.KrbApReq.<init>(KrbApReq.java:144)
> at 
> sun.security.jgss.krb5.InitSecContextToken.<init>(InitSecContextToken.java:108)
> at sun.security.jgss.krb5.Krb5Context.acceptSecContext(Krb5Context.java:771)
> ... 17 more
> {noformat}





[jira] [Updated] (HIVE-13183) More logs in operation logs

2016-03-15 Thread Rajat Khandelwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajat Khandelwal updated HIVE-13183:

Status: Patch Available  (was: Open)

> More logs in operation logs
> ---
>
> Key: HIVE-13183
> URL: https://issues.apache.org/jira/browse/HIVE-13183
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajat Khandelwal
>Assignee: Rajat Khandelwal
> Attachments: HIVE-13183.02.patch, HIVE-13183.03.patch
>
>






[jira] [Updated] (HIVE-11424) Rule to transform OR clauses into IN clauses in CBO

2016-03-15 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-11424:
---
Attachment: HIVE-11424.04.patch

> Rule to transform OR clauses into IN clauses in CBO
> ---
>
> Key: HIVE-11424
> URL: https://issues.apache.org/jira/browse/HIVE-11424
> Project: Hive
>  Issue Type: Bug
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-11424.01.patch, HIVE-11424.01.patch, 
> HIVE-11424.03.patch, HIVE-11424.03.patch, HIVE-11424.04.patch, 
> HIVE-11424.2.patch, HIVE-11424.patch
>
>
> We create a rule that will transform OR clauses into IN clauses (when 
> possible).
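To make the transformation concrete, here is a minimal sketch of rewriting OR-ed equality predicates on the same column into a single IN predicate. It is not the CBO rule from the patch: the triple representation `(column, op, literal)` and the function name `or_to_in` are assumptions chosen for illustration.

```python
def or_to_in(disjuncts):
    """Rewrite a list of OR-ed (column, op, literal) predicates.

    Equality predicates on the same column are merged into one
    (column, "IN", (literals...)) predicate; everything else is kept as-is.
    The returned list is still meant to be combined with OR.
    """
    grouped = {}   # column -> literals seen in equality predicates
    others = []    # predicates that cannot be merged
    for col, op, lit in disjuncts:
        if op == "=":
            grouped.setdefault(col, []).append(lit)
        else:
            others.append((col, op, lit))

    result = []
    for col, lits in grouped.items():
        # Drop duplicate literals while preserving their first-seen order.
        unique = list(dict.fromkeys(lits))
        if len(unique) > 1:
            result.append((col, "IN", tuple(unique)))
        else:
            # A single literal stays a plain equality ("when possible").
            result.append((col, "=", unique[0]))
    return result + others


# a = 1 OR a = 2 OR b > 5 OR a = 1
rewritten = or_to_in([("a", "=", 1), ("a", "=", 2), ("b", ">", 5), ("a", "=", 1)])
```

Reordering the disjuncts is safe here because OR is commutative; the duplicate `a = 1` disappears as a side effect of the normalization.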





[jira] [Work started] (HIVE-11424) Rule to transform OR clauses into IN clauses in CBO

2016-03-15 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-11424 started by Jesus Camacho Rodriguez.
--
> Rule to transform OR clauses into IN clauses in CBO
> ---
>
> Key: HIVE-11424
> URL: https://issues.apache.org/jira/browse/HIVE-11424
> Project: Hive
>  Issue Type: Bug
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-11424.01.patch, HIVE-11424.01.patch, 
> HIVE-11424.03.patch, HIVE-11424.03.patch, HIVE-11424.04.patch, 
> HIVE-11424.2.patch, HIVE-11424.patch
>
>
> We create a rule that will transform OR clauses into IN clauses (when 
> possible).





[jira] [Updated] (HIVE-11424) Rule to transform OR clauses into IN clauses in CBO

2016-03-15 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-11424:
---
Status: Open  (was: Patch Available)

> Rule to transform OR clauses into IN clauses in CBO
> ---
>
> Key: HIVE-11424
> URL: https://issues.apache.org/jira/browse/HIVE-11424
> Project: Hive
>  Issue Type: Bug
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-11424.01.patch, HIVE-11424.01.patch, 
> HIVE-11424.03.patch, HIVE-11424.03.patch, HIVE-11424.04.patch, 
> HIVE-11424.2.patch, HIVE-11424.patch
>
>
> We create a rule that will transform OR clauses into IN clauses (when 
> possible).





[jira] [Updated] (HIVE-11424) Rule to transform OR clauses into IN clauses in CBO

2016-03-15 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-11424:
---
Status: Patch Available  (was: In Progress)

> Rule to transform OR clauses into IN clauses in CBO
> ---
>
> Key: HIVE-11424
> URL: https://issues.apache.org/jira/browse/HIVE-11424
> Project: Hive
>  Issue Type: Bug
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-11424.01.patch, HIVE-11424.01.patch, 
> HIVE-11424.03.patch, HIVE-11424.03.patch, HIVE-11424.04.patch, 
> HIVE-11424.2.patch, HIVE-11424.patch
>
>
> We create a rule that will transform OR clauses into IN clauses (when 
> possible).





[jira] [Commented] (HIVE-11424) Rule to transform OR clauses into IN clauses in CBO

2016-03-15 Thread Jesus Camacho Rodriguez (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15194982#comment-15194982
 ] 

Jesus Camacho Rodriguez commented on HIVE-11424:


[~damien.carol], there are multiple reasons.
From the optimizer perspective, it is a way of normalizing expressions in 
Filter operators so we can find potential duplicate expressions and remove 
them. This also has an impact on the operator statistics estimation.
Further, it will make a difference in execution performance for a large number 
of comparisons using e.g. vectorization. I do not remember the exact details, 
but I believe [~gopalv] can expand on it.

> Rule to transform OR clauses into IN clauses in CBO
> ---
>
> Key: HIVE-11424
> URL: https://issues.apache.org/jira/browse/HIVE-11424
> Project: Hive
>  Issue Type: Bug
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-11424.01.patch, HIVE-11424.01.patch, 
> HIVE-11424.03.patch, HIVE-11424.03.patch, HIVE-11424.04.patch, 
> HIVE-11424.2.patch, HIVE-11424.patch
>
>
> We create a rule that will transform OR clauses into IN clauses (when 
> possible).





[jira] [Commented] (HIVE-11424) Rule to transform OR clauses into IN clauses in CBO

2016-03-15 Thread Jesus Camacho Rodriguez (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15194986#comment-15194986
 ] 

Jesus Camacho Rodriguez commented on HIVE-11424:


[~ashutoshc], could you take a look? Thanks

> Rule to transform OR clauses into IN clauses in CBO
> ---
>
> Key: HIVE-11424
> URL: https://issues.apache.org/jira/browse/HIVE-11424
> Project: Hive
>  Issue Type: Bug
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-11424.01.patch, HIVE-11424.01.patch, 
> HIVE-11424.03.patch, HIVE-11424.03.patch, HIVE-11424.04.patch, 
> HIVE-11424.2.patch, HIVE-11424.patch
>
>
> We create a rule that will transform OR clauses into IN clauses (when 
> possible).





[jira] [Updated] (HIVE-13269) Simplify comparison expressions using column stats

2016-03-15 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-13269:
---
Attachment: HIVE-13269.01.patch

> Simplify comparison expressions using column stats
> --
>
> Key: HIVE-13269
> URL: https://issues.apache.org/jira/browse/HIVE-13269
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Affects Versions: 2.1.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-13269.01.patch, HIVE-13269.patch, HIVE-13269.patch
>
>






[jira] [Work started] (HIVE-13269) Simplify comparison expressions using column stats

2016-03-15 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-13269 started by Jesus Camacho Rodriguez.
--
> Simplify comparison expressions using column stats
> --
>
> Key: HIVE-13269
> URL: https://issues.apache.org/jira/browse/HIVE-13269
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Affects Versions: 2.1.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-13269.01.patch, HIVE-13269.patch, HIVE-13269.patch
>
>






[jira] [Updated] (HIVE-13269) Simplify comparison expressions using column stats

2016-03-15 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-13269:
---
Status: Open  (was: Patch Available)

> Simplify comparison expressions using column stats
> --
>
> Key: HIVE-13269
> URL: https://issues.apache.org/jira/browse/HIVE-13269
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Affects Versions: 2.1.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-13269.01.patch, HIVE-13269.patch, HIVE-13269.patch
>
>






[jira] [Updated] (HIVE-13269) Simplify comparison expressions using column stats

2016-03-15 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-13269:
---
Status: Patch Available  (was: In Progress)

> Simplify comparison expressions using column stats
> --
>
> Key: HIVE-13269
> URL: https://issues.apache.org/jira/browse/HIVE-13269
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Affects Versions: 2.1.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-13269.01.patch, HIVE-13269.patch, HIVE-13269.patch
>
>






[jira] [Updated] (HIVE-13269) Simplify comparison expressions using column stats

2016-03-15 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-13269:
---
Attachment: HIVE-13269.01.patch

> Simplify comparison expressions using column stats
> --
>
> Key: HIVE-13269
> URL: https://issues.apache.org/jira/browse/HIVE-13269
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Affects Versions: 2.1.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-13269.01.patch, HIVE-13269.patch, HIVE-13269.patch
>
>






[jira] [Updated] (HIVE-13269) Simplify comparison expressions using column stats

2016-03-15 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-13269:
---
Attachment: (was: HIVE-13269.01.patch)

> Simplify comparison expressions using column stats
> --
>
> Key: HIVE-13269
> URL: https://issues.apache.org/jira/browse/HIVE-13269
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Affects Versions: 2.1.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-13269.01.patch, HIVE-13269.patch, HIVE-13269.patch
>
>






[jira] [Updated] (HIVE-13233) Use min and max values to estimate better stats for comparison operators

2016-03-15 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-13233:
---
   Resolution: Fixed
Fix Version/s: 2.1.0
   Status: Resolved  (was: Patch Available)

Pushed to master, thanks [~ashutoshc]!

> Use min and max values to estimate better stats for comparison operators
> 
>
> Key: HIVE-13233
> URL: https://issues.apache.org/jira/browse/HIVE-13233
> Project: Hive
>  Issue Type: Bug
>  Components: Statistics
>Affects Versions: 2.1.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Fix For: 2.1.0
>
> Attachments: HIVE-13233.01.patch, HIVE-13233.patch
>
>
> We should use the min/max values of each column to calculate more precisely 
> the number of rows produced by expressions with comparison operators.
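One common way to use min/max values for such an estimate is linear interpolation over the column range. The sketch below assumes values are uniformly distributed between min and max; that assumption, the function name, and its parameters are illustrative and are not taken from the Hive patch.

```python
def estimate_rows_less_than(num_rows, col_min, col_max, value):
    """Estimate how many rows satisfy `col < value`, assuming a uniform
    distribution of column values between col_min and col_max."""
    if value <= col_min:
        # Nothing can be smaller than the column minimum.
        return 0
    if value > col_max:
        # Every row is below the constant.
        return num_rows
    # Here col_min < value <= col_max, so the range is non-empty.
    fraction = (value - col_min) / (col_max - col_min)
    return round(num_rows * fraction)


# 1000 rows with values in [0, 100]: about a quarter are expected below 25.
estimate = estimate_rows_less_than(1000, 0, 100, 25)
```

The two early returns are where min/max help most: a predicate entirely outside the column range can be estimated exactly instead of falling back to a fixed guess.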





[jira] [Updated] (HIVE-13287) Add logic to estimate stats for IN operator

2016-03-15 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-13287:
---
Affects Version/s: 2.1.0

> Add logic to estimate stats for IN operator
> ---
>
> Key: HIVE-13287
> URL: https://issues.apache.org/jira/browse/HIVE-13287
> Project: Hive
>  Issue Type: Bug
>  Components: Statistics
>Affects Versions: 2.1.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>
> Currently, the IN operator falls into the default case, which reduces the 
> input rows by half. This may lead to wrong estimates for the number of rows 
> produced by Filter operators.
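As a rough illustration of how an IN-specific estimate could differ from the default halving, the sketch below scales the row count by the number of distinct IN values relative to the column's number of distinct values (NDV). The NDV-based formula, the function name, and the parameters are assumptions for illustration; the issue does not say which formula Hive should use.

```python
def estimate_in_rows(num_rows, column_ndv, in_values):
    """Estimate rows matching `col IN (in_values)`.

    Assumes each distinct column value is equally frequent. Falls back to
    the default case from the issue (half of the input rows) when no NDV
    statistic is available.
    """
    if column_ndv <= 0:
        return num_rows // 2
    distinct_in = len(set(in_values))
    # The selectivity can never exceed 1.
    selectivity = min(1.0, distinct_in / column_ndv)
    return round(num_rows * selectivity)


# 1000 rows, 50 distinct column values, 5 distinct literals in the IN list.
with_stats = estimate_in_rows(1000, 50, [1, 2, 3, 3, 4, 5])
without_stats = estimate_in_rows(1000, 0, [1, 2, 3])
```

With these made-up numbers the NDV-based estimate is a fifth of what the default halving gives, which is the kind of gap the issue is concerned about.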





[jira] [Commented] (HIVE-12619) Switching the field order within an array of structs causes the query to fail

2016-03-15 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15195071#comment-15195071
 ] 

Hive QA commented on HIVE-12619:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12792594/HIVE-12619.3.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 9 failed/errored test(s), 9778 tests executed
*Failed tests:*
{noformat}
TestMiniTezCliDriver-cte_4.q-orc_merge5.q-vectorization_limit.q-and-12-more - 
did not produce a TEST-*.xml file
TestMiniTezCliDriver-dynpart_sort_optimization2.q-cte_mat_1.q-tez_bmj_schema_evolution.q-and-12-more
 - did not produce a TEST-*.xml file
TestMiniTezCliDriver-vector_coalesce.q-auto_sortmerge_join_7.q-dynamic_partition_pruning.q-and-12-more
 - did not produce a TEST-*.xml file
TestSparkCliDriver-groupby3_map.q-sample2.q-auto_join14.q-and-12-more - did not 
produce a TEST-*.xml file
TestSparkCliDriver-groupby_map_ppr_multi_distinct.q-table_access_keys_stats.q-groupby4_noskew.q-and-12-more
 - did not produce a TEST-*.xml file
TestSparkCliDriver-join_rc.q-insert1.q-vectorized_rcfile_columnar.q-and-12-more 
- did not produce a TEST-*.xml file
TestSparkCliDriver-ppd_join4.q-join9.q-ppd_join3.q-and-12-more - did not 
produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ivyDownload
org.apache.hive.hcatalog.hbase.TestPigHBaseStorageHandler.org.apache.hive.hcatalog.hbase.TestPigHBaseStorageHandler
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/7271/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/7271/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-7271/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 9 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12792594 - PreCommit-HIVE-TRUNK-Build

> Switching the field order within an array of structs causes the query to fail
> -
>
> Key: HIVE-12619
> URL: https://issues.apache.org/jira/browse/HIVE-12619
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.1.0
>Reporter: Ang Zhang
>Assignee: Mohammad Kamrul Islam
>Priority: Minor
> Attachments: HIVE-12619.2.patch, HIVE-12619.3.patch
>
>
> Switching the field order within an array of structs causes the query to fail 
> or return the wrong data for the fields, but switching the field order within 
> just a struct works.
> How to reproduce:
> Case1: if the two fields have the same type, the query will return wrong data 
> for the fields
> drop table if exists schema_test;
> create table schema_test (msg array<struct<f1:string,f2:string>>) stored 
> as parquet;
> insert into table schema_test select stack(2, array(named_struct('f1', 'abc', 
> 'f2', 'abc2')), array(named_struct('f1', 'efg', 'f2', 'efg2'))) from one 
> limit 2;
> select * from schema_test;
> --returns
> --[{"f1":"efg","f2":"efg2"}]
> --[{"f1":"abc","f2":"abc2"}]
> alter table schema_test change msg msg array<struct<f2:string,f1:string>>;
> select * from schema_test;
> --returns
> --[{"f2":"efg","f1":"efg2"}]
> --[{"f2":"abc","f1":"abc2"}]
> Case2: if the two fields have different type, the query will fail
> drop table if exists schema_test;
> create table schema_test (msg array<struct<f1:string,f2:int>>) stored as 
> parquet;
> insert into table schema_test select stack(2, array(named_struct('f1', 'abc', 
> 'f2', 1)), array(named_struct('f1', 'efg', 'f2', 2))) from one limit 2;
> select * from schema_test;
> --returns
> --[{"f1":"efg","f2":2}]
> --[{"f1":"abc","f2":1}]
> alter table schema_test change msg msg array<struct<f2:int,f1:string>>;
> select * from schema_test;
> Failed with exception 
> java.io.IOException:org.apache.hadoop.hive.ql.metadata.HiveException: 
> java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to 
> org.apache.hadoop.io.IntWritable
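The output reported above is consistent with struct fields being matched by position rather than by name after the schema change. That reading is an inference from the reported results, not something the issue states. The sketch below contrasts the two strategies; both helper functions are hypothetical and unrelated to Hive's Parquet reader.

```python
def read_by_position(stored_values, table_fields):
    """Pair stored values with the table's field names purely by position."""
    return dict(zip(table_fields, stored_values))


def read_by_name(stored_fields, stored_values, table_fields):
    """Look each table field up by name in the stored record."""
    by_name = dict(zip(stored_fields, stored_values))
    return {name: by_name[name] for name in table_fields}


# The file was written with (f1, f2); the table now declares (f2, f1).
stored_fields = ["f1", "f2"]
stored_values = ["abc", "abc2"]
table_fields = ["f2", "f1"]

positional = read_by_position(stored_values, table_fields)
named = read_by_name(stored_fields, stored_values, table_fields)
```

Positional matching yields `{"f2": "abc", "f1": "abc2"}`, the same swapped values as in Case1. When the two fields have different types, the same mismatch would hand a string to code expecting an int, which fits the ClassCastException in Case2.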





[jira] [Commented] (HIVE-13286) Query ID is being reused across queries

2016-03-15 Thread Aihua Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15195278#comment-15195278
 ] 

Aihua Xu commented on HIVE-13286:
-

[~vikram.dixit] This check is there to see whether the client provided a 
queryId. If a client provides a queryId (the client may need a meaningful one), 
we use that queryId internally; otherwise, we generate a new one. 

There seems to be a bug here. We should generate the new queryId inside the if 
statement and set it outside the if statement.
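
A minimal sketch of the control flow described in this comment, under stated assumptions: the configuration key `sketch.query.id`, the class `QueryIdSketch`, and the use of plain maps in place of Hive's configuration objects are all hypothetical and not taken from the Hive source.

```java
import java.util.Map;
import java.util.UUID;

// Hypothetical illustration only; names and key are not from Hive.
final class QueryIdSketch {
    // Assumed key name, used only for this sketch.
    static final String QUERY_ID_KEY = "sketch.query.id";

    static String resolveQueryId(Map<String, String> clientOverlay, Map<String, String> queryConf) {
        String queryId = clientOverlay.get(QUERY_ID_KEY);
        if (queryId == null || queryId.isEmpty()) {
            // No client-supplied id: generate a fresh one inside the if.
            queryId = UUID.randomUUID().toString();
        }
        // Set the id outside the if, so that every query gets one,
        // whether it was supplied by the client or generated here.
        queryConf.put(QUERY_ID_KEY, queryId);
        return queryId;
    }
}
```

Because the generated value is never written back to the client overlay, two consecutive queries without a client-supplied id each receive their own id in this sketch.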

> Query ID is being reused across queries
> ---
>
> Key: HIVE-13286
> URL: https://issues.apache.org/jira/browse/HIVE-13286
> Project: Hive
>  Issue Type: Bug
>  Components: Parser
>Affects Versions: 2.0.0
>Reporter: Vikram Dixit K
>Assignee: Aihua Xu
>Priority: Critical
>
> [~aihuaxu] I see this commit made via HIVE-11488. I see that query id is 
> being reused across queries. This defeats the purpose of a query id. I am not 
> sure what the purpose of the change in that jira is but it breaks the 
> assumption about a query id being unique for each query. Please take a look 
> into this at the earliest.





[jira] [Commented] (HIVE-12244) Refactoring code for avoiding of comparison of Strings and do comparison on Path

2016-03-15 Thread Alina Abramova (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15195283#comment-15195283
 ] 

Alina Abramova commented on HIVE-12244:
---

I see that most of the tests that pass locally fail on Jenkins. For example, 
I ran the tests with -Dqfile_regex=smb_mapjoin.* and got:
Tests run: 33, Failures: 3, Errors: 0, Skipped: 0,
Failed tests:
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_skewjoin_mapjoin11
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_skewjoin_mapjoin13
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_skewjoin_mapjoin15

Failed tests from the Jenkins message:
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_skewjoin_mapjoin1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_skewjoin_mapjoin10
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_skewjoin_mapjoin11
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_skewjoin_mapjoin2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_skewjoin_mapjoin3
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_skewjoin_mapjoin4
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_skewjoin_mapjoin5
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_skewjoin_mapjoin6
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_skewjoin_mapjoin7
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_skewjoin_mapjoin8
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_skewjoin_mapjoin9

I do not understand what is happening or why these test results differ. 
Could somebody run this subset of tests (-Dqfile_regex=smb_mapjoin.*) locally 
and show me the results, please? Maybe I am doing something wrong.

> Refactoring code for avoiding of comparison of Strings and do comparison on 
> Path
> 
>
> Key: HIVE-12244
> URL: https://issues.apache.org/jira/browse/HIVE-12244
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Affects Versions: 0.13.0, 0.14.0, 1.0.0, 1.2.1
>Reporter: Alina Abramova
>Assignee: Alina Abramova
>Priority: Minor
>  Labels: patch
> Fix For: 1.2.1
>
> Attachments: HIVE-12244.1.patch, HIVE-12244.2.patch, 
> HIVE-12244.3.patch, HIVE-12244.4.patch, HIVE-12244.5.patch, 
> HIVE-12244.6.patch, HIVE-12244.7.patch, HIVE-12244.8.patch, 
> HIVE-12244.8.patch, HIVE-12244.9.patch
>
>
> In Hive, a String is often used to represent a path, and this causes new issues.
> We compare such values with equals(), but comparing Strings is often not the 
> right way to compare paths.
> I think that if we use Path from org.apache.hadoop.fs, we will avoid new 
> problems in the future.
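
To illustrate why comparing path strings with equals() can go wrong, here is a minimal sketch. It uses `java.nio.file.Path` from the JDK purely as a stand-in so that the example is self-contained; the issue itself proposes `org.apache.hadoop.fs.Path`, whose normalization rules may differ. The class `PathComparisonSketch` and its method names are hypothetical.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical illustration only; java.nio.file.Path is a stand-in
// for org.apache.hadoop.fs.Path, not a replacement for it.
final class PathComparisonSketch {

    // Plain string comparison: any cosmetic difference makes it fail.
    static boolean sameAsStrings(String a, String b) {
        return a.equals(b);
    }

    // Path comparison: both inputs are parsed and normalized first,
    // so trailing or duplicate separators no longer matter.
    static boolean sameAsPaths(String a, String b) {
        Path pa = Paths.get(a).normalize();
        Path pb = Paths.get(b).normalize();
        return pa.equals(pb);
    }
}
```

For example, `/warehouse/tbl/` and `/warehouse//tbl` name the same location but are different strings, so only the path-based comparison treats them as equal.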





[jira] [Commented] (HIVE-13232) Aggressively drop compression buffers in ORC OutStreams

2016-03-15 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15195297#comment-15195297
 ] 

Hive QA commented on HIVE-13232:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12793328/HIVE-13232.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 9826 tests executed
*Failed tests:*
{noformat}
TestSparkCliDriver-groupby3_map.q-sample2.q-auto_join14.q-and-12-more - did not produce a TEST-*.xml file
TestSparkCliDriver-groupby_map_ppr_multi_distinct.q-table_access_keys_stats.q-groupby4_noskew.q-and-12-more - did not produce a TEST-*.xml file
TestSparkCliDriver-join_rc.q-insert1.q-vectorized_rcfile_columnar.q-and-12-more - did not produce a TEST-*.xml file
TestSparkCliDriver-ppd_join4.q-join9.q-ppd_join3.q-and-12-more - did not produce a TEST-*.xml file
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/7272/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/7272/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-7272/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 4 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12793328 - PreCommit-HIVE-TRUNK-Build

> Aggressively drop compression buffers in ORC OutStreams
> ---
>
> Key: HIVE-13232
> URL: https://issues.apache.org/jira/browse/HIVE-13232
> Project: Hive
>  Issue Type: Bug
>  Components: ORC
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
> Fix For: 0.14.1, 1.3.0, 2.1.0
>
> Attachments: HIVE-13232.patch, HIVE-13232.patch, HIVE-13232.patch
>
>
> In Hive 0.11, when ORC's OutStreams were flushed, they dropped all of 
> their buffers. In the patch for HIVE-4324, we inadvertently changed that 
> behavior so that one of the buffers is held on to. For queries with a lot of 
> writers and thus under significant memory pressure this can have a 
> significant impact on the memory usage. 
> Note that "hive.optimize.sort.dynamic.partition" avoids this problem by 
> sorting on the dynamic partition key and thus only a single ORC writer is 
> open at once. This will use memory more effectively and avoid creating ORC 
> files with very small stripes, which will produce better downstream 
> performance.





[jira] [Commented] (HIVE-13286) Query ID is being reused across queries

2016-03-15 Thread Aihua Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15195298#comment-15195298
 ] 

Aihua Xu commented on HIVE-13286:
-

OK. We had a followup to fix HIVE-12456 to avoid storing queryId in 
SessionState since multiple queries can run in the same session at the same 
time. Later we will combine session conf and confOverlay conf to the query conf 
so the query should have the new queryId.

[~vikram.dixit] Did you have the patch-12456 applied?

> Query ID is being reused across queries
> ---
>
> Key: HIVE-13286
> URL: https://issues.apache.org/jira/browse/HIVE-13286
> Project: Hive
>  Issue Type: Bug
>  Components: Parser
>Affects Versions: 2.0.0
>Reporter: Vikram Dixit K
>Assignee: Aihua Xu
>Priority: Critical
>
> [~aihuaxu] I see this commit made via HIVE-11488. I see that query id is 
> being reused across queries. This defeats the purpose of a query id. I am not 
> sure what the purpose of the change in that jira is but it breaks the 
> assumption about a query id being unique for each query. Please take a look 
> into this at the earliest.





[jira] [Commented] (HIVE-12244) Refactoring code for avoiding of comparison of Strings and do comparison on Path

2016-03-15 Thread Alina Abramova (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15195422#comment-15195422
 ] 

Alina Abramova commented on HIVE-12244:
---

Could somebody apply my last patch locally and run this subset of tests?

> Refactoring code for avoiding of comparison of Strings and do comparison on 
> Path
> 
>
> Key: HIVE-12244
> URL: https://issues.apache.org/jira/browse/HIVE-12244
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Affects Versions: 0.13.0, 0.14.0, 1.0.0, 1.2.1
>Reporter: Alina Abramova
>Assignee: Alina Abramova
>Priority: Minor
>  Labels: patch
> Fix For: 1.2.1
>
> Attachments: HIVE-12244.1.patch, HIVE-12244.2.patch, 
> HIVE-12244.3.patch, HIVE-12244.4.patch, HIVE-12244.5.patch, 
> HIVE-12244.6.patch, HIVE-12244.7.patch, HIVE-12244.8.patch, 
> HIVE-12244.8.patch, HIVE-12244.9.patch
>
>
> In Hive, a String is often used to represent a path, and this causes new issues.
> We compare such values with equals(), but comparing Strings is often not the 
> right way to compare paths.
> I think that if we use Path from org.apache.hadoop.fs, we will avoid new 
> problems in the future.





[jira] [Updated] (HIVE-13084) Vectorization add support for PROJECTION Multi-AND/OR

2016-03-15 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-13084:

Attachment: HIVE-13084.03.patch

> Vectorization add support for PROJECTION Multi-AND/OR
> -
>
> Key: HIVE-13084
> URL: https://issues.apache.org/jira/browse/HIVE-13084
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Reporter: Rajesh Balamohan
>Assignee: Matt McCline
> Attachments: HIVE-13084.01.patch, HIVE-13084.02.patch, 
> HIVE-13084.03.patch, vector_between_date.q
>
>
> When there is a CASE statement in GROUP BY, Hive throws an "unable to 
> vectorize" exception.
> Example query, just to demonstrate the problem:
> {noformat}
> explain select l_partkey, case when l_commitdate between '2015-06-30' AND 
> '2015-07-06' THEN '2015-06-30' END as wk from lineitem_test_l_shipdate_ts 
> group by l_partkey, case when l_commitdate between '2015-06-30' AND 
> '2015-07-06' THEN '2015-06-30' END;
> org.apache.hadoop.hive.ql.metadata.HiveException: Could not vectorize 
> expression: org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc
> Vertex dependency in root stage
> Reducer 2 <- Map 1 (SIMPLE_EDGE)
> Stage-0
>   Fetch Operator
> limit:-1
> Stage-1
>   Reducer 2
>   File Output Operator [FS_7]
> Group By Operator [GBY_5] (rows=888777234 width=108)
>   Output:["_col0","_col1"],keys:KEY._col0, KEY._col1
> <-Map 1 [SIMPLE_EDGE]
>   SHUFFLE [RS_4]
> PartitionCols:_col0, _col1
> Group By Operator [GBY_3] (rows=1777554469 width=108)
>   Output:["_col0","_col1"],keys:_col0, _col1
>   Select Operator [SEL_1] (rows=1777554469 width=108)
> Output:["_col0","_col1"]
> TableScan [TS_0] (rows=1777554469 width=108)
>   
> rajesh@lineitem_test_l_shipdate_ts,lineitem_test_l_shipdate_ts,Tbl:COMPLETE,Col:NONE,Output:["l_partkey","l_commitdate"]
> {noformat}
> \cc [~mmccline], [~gopalv]





[jira] [Updated] (HIVE-13084) Vectorization add support for PROJECTION Multi-AND/OR

2016-03-15 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-13084:

Attachment: (was: HIVE-13084.03.patch)

> Vectorization add support for PROJECTION Multi-AND/OR
> -
>
> Key: HIVE-13084
> URL: https://issues.apache.org/jira/browse/HIVE-13084
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Reporter: Rajesh Balamohan
>Assignee: Matt McCline
> Attachments: HIVE-13084.01.patch, HIVE-13084.02.patch, 
> HIVE-13084.03.patch, vector_between_date.q
>
>
> When there is a CASE statement in GROUP BY, Hive throws an "unable to 
> vectorize" exception.
> Example query, just to demonstrate the problem:
> {noformat}
> explain select l_partkey, case when l_commitdate between '2015-06-30' AND 
> '2015-07-06' THEN '2015-06-30' END as wk from lineitem_test_l_shipdate_ts 
> group by l_partkey, case when l_commitdate between '2015-06-30' AND 
> '2015-07-06' THEN '2015-06-30' END;
> org.apache.hadoop.hive.ql.metadata.HiveException: Could not vectorize 
> expression: org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc
> Vertex dependency in root stage
> Reducer 2 <- Map 1 (SIMPLE_EDGE)
> Stage-0
>   Fetch Operator
> limit:-1
> Stage-1
>   Reducer 2
>   File Output Operator [FS_7]
> Group By Operator [GBY_5] (rows=888777234 width=108)
>   Output:["_col0","_col1"],keys:KEY._col0, KEY._col1
> <-Map 1 [SIMPLE_EDGE]
>   SHUFFLE [RS_4]
> PartitionCols:_col0, _col1
> Group By Operator [GBY_3] (rows=1777554469 width=108)
>   Output:["_col0","_col1"],keys:_col0, _col1
>   Select Operator [SEL_1] (rows=1777554469 width=108)
> Output:["_col0","_col1"]
> TableScan [TS_0] (rows=1777554469 width=108)
>   
> rajesh@lineitem_test_l_shipdate_ts,lineitem_test_l_shipdate_ts,Tbl:COMPLETE,Col:NONE,Output:["l_partkey","l_commitdate"]
> {noformat}
> \cc [~mmccline], [~gopalv]





[jira] [Commented] (HIVE-12540) Create function failed, but show functions display it

2016-03-15 Thread Reuben Kuhnert (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15195436#comment-15195436
 ] 

Reuben Kuhnert commented on HIVE-12540:
---

I just tested this, but it worked for me. Is this still an issue?

Declaration:

{code}
public class FunctionTask extends Task {
  public static class MyUDF extends UDF {

  }
}
{code}

Test:
{code}
create function udfTest as 'org.apache.hadoop.hive.ql.exec.FunctionTask$MyUDF';
INFO  : Compiling 
command(queryId=sircodesalot_20160315095656_38c72e48-856e-4ece-94e8-eecc145cc045):
 create function udfTest as 'org.apache.hadoop.hive.ql.exec.FunctionTask$MyUDF'
INFO  : Semantic Analysis Completed
INFO  : Returning Hive schema: Schema(fieldSchemas:null, properties:null)
INFO  : Completed compiling 
command(queryId=sircodesalot_20160315095656_38c72e48-856e-4ece-94e8-eecc145cc045);
 Time taken: 0.108 seconds
INFO  : Concurrency mode is disabled, not creating a lock manager
INFO  : Executing 
command(queryId=sircodesalot_20160315095656_38c72e48-856e-4ece-94e8-eecc145cc045):
 create function udfTest as 'org.apache.hadoop.hive.ql.exec.FunctionTask$MyUDF'
INFO  : Starting task [Stage-0:FUNC] in serial mode
INFO  : Completed executing 
command(queryId=sircodesalot_20160315095656_38c72e48-856e-4ece-94e8-eecc145cc045);
 Time taken: 75.289 seconds
INFO  : OK
No rows affected (75.5 seconds)
{code}

{code}
0: jdbc:hive2://localhost:1> show functions;
show functions;
+-+--+
|tab_name |
+-+--+
| ... |
| default.udftest |
| ... |
+-+--+
{code}

> Create function failed, but show functions display it
> -
>
> Key: HIVE-12540
> URL: https://issues.apache.org/jira/browse/HIVE-12540
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 1.2.0, 1.2.1
>Reporter: Weizhong
>Priority: Minor
>
> {noformat}
> 0: jdbc:hive2://vm119:1> create function udfTest as 
> 'hive.udf.UDFArrayNotE';
> ERROR : Failed to register default.udftest using class hive.udf.UDFArrayNotE
> Error: Error while processing statement: FAILED: Execution Error, return code 
> 1 from org.apache.hadoop.hive.ql.exec.FunctionTask (state=08S01,code=1)
> 0: jdbc:hive2://vm119:1> show functions;
> +-+--+
> |tab_name |
> +-+--+
> | ... |
> | default.udftest |
> | ... |
> +-+--+
> {noformat}





[jira] [Work started] (HIVE-13287) Add logic to estimate stats for IN operator

2016-03-15 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-13287 started by Jesus Camacho Rodriguez.
--
> Add logic to estimate stats for IN operator
> ---
>
> Key: HIVE-13287
> URL: https://issues.apache.org/jira/browse/HIVE-13287
> Project: Hive
>  Issue Type: Bug
>  Components: Statistics
>Affects Versions: 2.1.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-13287.patch
>
>
> Currently, the IN operator falls into the default case, which reduces the 
> input rows by half. This may lead to wrong estimates for the number of rows 
> produced by Filter operators.
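
To make the difference concrete, here is a minimal sketch that contrasts the default "half of the input rows" estimate with one possible IN estimate. The formula used (number of distinct IN values divided by the column's number of distinct values, capped at 1) is an assumption chosen for illustration and is not taken from the attached patch; `InSelectivitySketch` and its methods are hypothetical names.

```java
// Hypothetical illustration only; the formula is an assumption,
// not the logic of HIVE-13287.patch.
final class InSelectivitySketch {

    // Default case described in the issue: half of the input rows.
    static long defaultEstimate(long inputRows) {
        return inputRows / 2;
    }

    // Assumed estimate for "col IN (v1, ..., vk)": each distinct value is
    // taken to match an equal share of the rows, so the selectivity is
    // k / NDV(col), capped at 1.
    static long inEstimate(long inputRows, int distinctInValues, long columnNdv) {
        if (inputRows <= 0) {
            return 0;
        }
        if (columnNdv <= 0) {
            // No column statistics available: fall back to the default.
            return defaultEstimate(inputRows);
        }
        double selectivity = Math.min(1.0, (double) distinctInValues / (double) columnNdv);
        return Math.round(inputRows * selectivity);
    }
}
```

Under this assumption, a filter with 2 IN values on a column with 100 distinct values over 1000 input rows is estimated at 20 rows, whereas the default case estimates 500.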





[jira] [Updated] (HIVE-13287) Add logic to estimate stats for IN operator

2016-03-15 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-13287:
---
Status: Patch Available  (was: In Progress)

> Add logic to estimate stats for IN operator
> ---
>
> Key: HIVE-13287
> URL: https://issues.apache.org/jira/browse/HIVE-13287
> Project: Hive
>  Issue Type: Bug
>  Components: Statistics
>Affects Versions: 2.1.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-13287.patch
>
>
> Currently, the IN operator falls into the default case, which reduces the 
> input rows by half. This may lead to wrong estimates for the number of rows 
> produced by Filter operators.





[jira] [Updated] (HIVE-13287) Add logic to estimate stats for IN operator

2016-03-15 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-13287:
---
Attachment: HIVE-13287.patch

> Add logic to estimate stats for IN operator
> ---
>
> Key: HIVE-13287
> URL: https://issues.apache.org/jira/browse/HIVE-13287
> Project: Hive
>  Issue Type: Bug
>  Components: Statistics
>Affects Versions: 2.1.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-13287.patch
>
>
> Currently, the IN operator falls into the default case, which reduces the 
> input rows by half. This may lead to wrong estimates for the number of rows 
> produced by Filter operators.





[jira] [Commented] (HIVE-11837) comments do not support unicode characters well.

2016-03-15 Thread Yongzhi Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15195520#comment-15195520
 ] 

Yongzhi Chen commented on HIVE-11837:
-

Need more research on how to make desc formatted work.

> comments do not support unicode characters well.
> 
>
> Key: HIVE-11837
> URL: https://issues.apache.org/jira/browse/HIVE-11837
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 0.13.1, 1.1.0
> Environment: Hadoop 2.7
> Hive 0.13.1 / Hive 1.1.0
> RHEL 6.4 / SLES 11.3
>Reporter: Rudd Chen
>Assignee: Rudd Chen
>Priority: Minor
> Attachments: HIVE-11837.1.patch, HIVE-11837.patch
>
>
> The terminal encoding is set to UTF-8, so it can display Chinese characters. 
> I then create a table with a comment in Chinese. Neither "show create table" 
> nor "desc formatted table" can display the Chinese characters in the table 
> comment, whereas Chinese characters in the column comment are displayed. See 
> below:
> 0: jdbc:hive2://ha-cluster/default> create table tt(id int comment '列中文测试') 
> comment '表中文测试';
> No rows affected (0.152 seconds)
> 0: jdbc:hive2://ha-cluster/default> 
> 0: jdbc:hive2://ha-cluster/default> 
> 0: jdbc:hive2://ha-cluster/default> desc formatted tt;   
> +------------------------------+------------------------------------------------------+--------------------------------+
> | col_name                     | data_type                                            | comment                        |
> +------------------------------+------------------------------------------------------+--------------------------------+
> | # col_name                   | data_type                                            | comment                        |
> |                              | NULL                                                 | NULL                           |
> | id                           | int                                                  | 列中文测试                     |
> |                              | NULL                                                 | NULL                           |
> | # Detailed Table Information | NULL                                                 | NULL                           |
> | Database:                    | default                                              | NULL                           |
> | Owner:                       | admin                                                | NULL                           |
> | CreateTime:                  | Wed Sep 16 11:13:34 CST 2015                         | NULL                           |
> | LastAccessTime:              | UNKNOWN                                              | NULL                           |
> | Protect Mode:                | None                                                 | NULL                           |
> | Retention:                   | 0                                                    | NULL                           |
> | Location:                    | hdfs://hacluster/user/hive/warehouse/tt              | NULL                           |
> | Table Type:                  | MANAGED_TABLE                                        | NULL                           |
> | Table Parameters:            | NULL                                                 | NULL                           |
> |                              | comment                                              | \u8868\u4E2D\u6587\u6D4B\u8BD5 |
> |                              | transient_lastDdlTime                                | 1442373214                     |
> |                              | NULL                                                 | NULL                           |
> | # Storage Information        | NULL                                                 | NULL                           |
> | SerDe Library:               | org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe | NULL                           |
> | InputFormat:                 | org.apache.hadoop.hive.ql.io.RCFileInputFormat       | NULL                           |
> | OutputFormat:                | org.apache.hadoop.hive.ql.io.RCFileOutputFormat      | NULL                           |
> | Compressed:                  | No                                                   | NULL                           |
> | Num Buckets:                 | -1                                                   | NULL                           |
> | Bucket Columns:              | []                                                   | NULL                           |
> | Sort Columns:                |

[jira] [Updated] (HIVE-13243) Hive drop table on encyption zone fails for external tables

2016-03-15 Thread Chaoyu Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chaoyu Tang updated HIVE-13243:
---
   Resolution: Fixed
Fix Version/s: 2.0.1
   2.1.0
   Status: Resolved  (was: Patch Available)

Committed to 2.1.0 and 2.0.1. Thanks [~spena] for review!

> Hive drop table on encyption zone fails for external tables
> ---
>
> Key: HIVE-13243
> URL: https://issues.apache.org/jira/browse/HIVE-13243
> Project: Hive
>  Issue Type: Bug
>  Components: Encryption, Metastore
>Reporter: Chaoyu Tang
>Assignee: Chaoyu Tang
> Fix For: 2.1.0, 2.0.1
>
> Attachments: HIVE-13243.1.patch, HIVE-13243.2.patch, HIVE-13243.patch
>
>
> When dropping an external table whose data is located in an encryption zone, 
> Hive should not throw MetaException(message:Unable to drop table because 
> it is in an encryption zone and trash is enabled. Use PURGE option to skip 
> trash.) in checkTrashPurgeCombination, since the data should not get deleted 
> (or trashed) anyway, regardless of whether HDFS Trash is enabled.
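
The intended behaviour can be sketched as a small predicate. This is only an illustration of the rule stated in the description, not the code of the committed patch: the class `DropTableTrashCheck`, the method `mustRejectDrop`, and its boolean parameters are hypothetical.

```java
// Hypothetical illustration only; not the code of HIVE-13243.
final class DropTableTrashCheck {

    // Returns true when the drop must be rejected with the
    // "Use PURGE option to skip trash" error.
    static boolean mustRejectDrop(boolean externalTable,
                                  boolean inEncryptionZone,
                                  boolean trashEnabled,
                                  boolean purge) {
        if (externalTable) {
            // Data of an external table is neither deleted nor trashed,
            // so the trash/encryption-zone combination is irrelevant.
            return false;
        }
        // Managed table: data cannot be moved to trash out of an
        // encryption zone, so the drop needs PURGE when trash is enabled.
        return inEncryptionZone && trashEnabled && !purge;
    }
}
```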





[jira] [Commented] (HIVE-13249) Hard upper bound on number of open transactions

2016-03-15 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15195597#comment-15195597
 ] 

Alan Gates commented on HIVE-13249:
---

We definitely want countOpenTxns to run more frequently.  I wasn't 
suggesting they should be in the same thread.  I see the disconnect now.  I was 
thinking of AcidHouseKeeperService as a thread pool, but it isn't; it's one 
thread. 

So I think in general we should consolidate these threads into one pool.  We 
should think too about whether the initiator, worker, and cleaner threads 
should be handled in such a pool as well.  But all that's out of the scope of 
this JIRA.

For this particular thread I still think we should run it server side and not 
client side.  Unfortunately for now that means a second HousekeeperService 
implementation.  But I think it's better to do that and clean it up later than 
it is to put the thread pool in the wrong place.

> Hard upper bound on number of open transactions
> ---
>
> Key: HIVE-13249
> URL: https://issues.apache.org/jira/browse/HIVE-13249
> Project: Hive
>  Issue Type: Improvement
>  Components: Transactions
>Affects Versions: 2.0.0
>Reporter: Wei Zheng
>Assignee: Wei Zheng
> Attachments: HIVE-13249.1.patch
>
>
> We need a safeguard that puts an upper bound on open transactions, to 
> avoid a huge number of open-transaction requests, usually due to improper 
> configuration of clients such as Storm.
> Once that limit is reached, clients will start failing.
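
A minimal sketch of such a hard limit, assuming an in-memory counter. The discussion above is about counting open transactions periodically on the server side, so this is a simplification for illustration only; `OpenTxnLimiter` and its methods are hypothetical names and do not come from HIVE-13249.1.patch.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration only; not the code of HIVE-13249.
final class OpenTxnLimiter {
    private final int maxOpenTxns;
    private final AtomicInteger openTxns = new AtomicInteger();

    OpenTxnLimiter(int maxOpenTxns) {
        if (maxOpenTxns <= 0) {
            throw new IllegalArgumentException("maxOpenTxns must be positive");
        }
        this.maxOpenTxns = maxOpenTxns;
    }

    // Returns false once the limit is reached; the caller is then
    // expected to fail the open-transaction request.
    boolean tryOpen() {
        while (true) {
            int current = openTxns.get();
            if (current >= maxOpenTxns) {
                return false;
            }
            if (openTxns.compareAndSet(current, current + 1)) {
                return true;
            }
        }
    }

    // Called on commit or abort; never drops below zero.
    void close() {
        openTxns.updateAndGet(n -> n > 0 ? n - 1 : 0);
    }

    int openCount() {
        return openTxns.get();
    }
}
```

The compare-and-set loop keeps the check and the increment atomic, so concurrent clients cannot push the count past the limit.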





[jira] [Commented] (HIVE-13149) Remove some unnecessary HMS connections from HS2

2016-03-15 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15195622#comment-15195622
 ] 

Hive QA commented on HIVE-13149:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12793335/HIVE-13149.4.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 32 failed/errored test(s), 9798 tests 
executed
*Failed tests:*
{noformat}
TestJdbcWithMiniHS2 - did not produce a TEST-*.xml file
TestSparkCliDriver-groupby3_map.q-sample2.q-auto_join14.q-and-12-more - did not produce a TEST-*.xml file
TestSparkCliDriver-groupby_map_ppr_multi_distinct.q-table_access_keys_stats.q-groupby4_noskew.q-and-12-more - did not produce a TEST-*.xml file
TestSparkCliDriver-join_rc.q-insert1.q-vectorized_rcfile_columnar.q-and-12-more - did not produce a TEST-*.xml file
TestSparkCliDriver-ppd_join4.q-join9.q-ppd_join3.q-and-12-more - did not produce a TEST-*.xml file
TestSparkCliDriver-timestamp_lazy.q-bucketsortoptimize_insert_4.q-date_udf.q-and-12-more - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_external_table_ppd
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_binary_binary
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_binary_external_table_queries
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_binary_map_queries
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_binary_map_queries_prefix
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_binary_storage_queries
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_custom_key
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_custom_key2
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_custom_key3
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_handler_bulk
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_handler_snapshot
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_joins
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_null_first_col
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_ppd_join
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_ppd_key_range
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_pushdown
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_queries
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_scan_params
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_single_sourced_multi_insert
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_timestamp
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_timestamp_format
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_ppd_key_ranges
org.apache.hadoop.hive.cli.TestHBaseMinimrCliDriver.testCliDriver_hbase_bulk
org.apache.hadoop.hive.cli.TestHBaseNegativeCliDriver.testCliDriver_cascade_dbdrop
org.apache.hadoop.hive.cli.TestHBaseNegativeCliDriver.testCliDriver_generatehfiles_require_family_path
org.apache.hadoop.hive.metastore.hbase.TestHBaseMetastoreMetrics.testMetaDataCounts
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/7273/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/7273/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-7273/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 32 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12793335 - PreCommit-HIVE-TRUNK-Build

> Remove some unnecessary HMS connections from HS2 
> -
>
> Key: HIVE-13149
> URL: https://issues.apache.org/jira/browse/HIVE-13149
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2
>Affects Versions: 2.0.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Attachments: HIVE-13149.1.patch, HIVE-13149.2.patch, 
> HIVE-13149.3.patch, HIVE-13149.4.patch
>
>
> In SessionState class, currently we will always try to get a HMS connection 
> in {{start(SessionState startSs, boolean isAsync, LogHelper console)}} 
> regardless of if the connection will be used later or not. 
> When SessionState is accessed by the tasks in TaskRunner.java, although most 
> of the tasks other than some like StatsTask, don't need to access HMS. 
> Currently a new HMS

[jira] [Commented] (HIVE-13260) ReduceSinkDeDuplication throws exception when pRS key is empty

2016-03-15 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15195623#comment-15195623
 ] 

Prasanth Jayachandran commented on HIVE-13260:
--

+1

> ReduceSinkDeDuplication throws exception when pRS key is empty
> --
>
> Key: HIVE-13260
> URL: https://issues.apache.org/jira/browse/HIVE-13260
> Project: Hive
>  Issue Type: Bug
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-13260.01.patch, HIVE-13260.02.patch
>
>
> Steps to reproduce:
> {code}
> set hive.mapred.mode=nonstrict;
> set hive.cbo.enable=false;
> set hive.map.aggr=false;
> set hive.groupby.skewindata=false;
> set mapred.reduce.tasks=31;
> select 
> compute_stats(a,16),compute_stats(b,16),compute_stats(c,16),compute_stats(d,16)
> from
> (
> select
>   avg(DISTINCT substr(src.value,5)) as a,
>   max(substr(src.value,5)) as b,
>   variance(substr(src.value,5)) as c,
>   var_samp(substr(src.value,5)) as d
>  from src)subq;
> {code}





[jira] [Updated] (HIVE-12049) Provide an option to write serialized thrift objects in final tasks

2016-03-15 Thread Rohit Dholakia (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohit Dholakia updated HIVE-12049:
--
Attachment: HIVE-12049.13.patch

> Provide an option to write serialized thrift objects in final tasks
> ---
>
> Key: HIVE-12049
> URL: https://issues.apache.org/jira/browse/HIVE-12049
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2
>Reporter: Rohit Dholakia
>Assignee: Rohit Dholakia
> Attachments: HIVE-12049.1.patch, HIVE-12049.11.patch, 
> HIVE-12049.12.patch, HIVE-12049.13.patch, HIVE-12049.2.patch, 
> HIVE-12049.3.patch, HIVE-12049.4.patch, HIVE-12049.5.patch, 
> HIVE-12049.6.patch, HIVE-12049.7.patch, HIVE-12049.9.patch
>
>
> For each fetch request to HiveServer2, we pay the penalty of deserializing 
> the row objects and translating them into a different representation suitable 
> for the RPC transfer. In moderate to high concurrency scenarios, this can 
> result in significant CPU and memory wastage. By having each task write the 
> appropriate thrift objects to the output files, HiveServer2 can simply stream 
> a batch of rows on the wire without incurring any of the additional cost of 
> deserialization and translation. 
> This can be implemented by writing a new SerDe, which the FileSinkOperator 
> can use to write thrift formatted row batches to the output file. Using the 
> pluggable property of the {{hive.query.result.fileformat}}, we can set it to 
> use SequenceFile and write a batch of thrift formatted rows as a value blob. 
> The FetchTask can now simply read the blob and send it over the wire. On the 
> client side, the *DBC driver can read the blob and since it is already 
> formatted in the way it expects, it can continue building the ResultSet the 
> way it does in the current implementation.





[jira] [Commented] (HIVE-13286) Query ID is being reused across queries

2016-03-15 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15195761#comment-15195761
 ] 

Siddharth Seth commented on HIVE-13286:
---

[~aihuaxu] - I'm curious as to why we allow the queryId to be overwritten by 
users. Isn't that meant to be unique within HiveServer? If some historic query 
information were to be retained by HiveServer, that would break. The query 
name can already be overwritten.

> Query ID is being reused across queries
> ---
>
> Key: HIVE-13286
> URL: https://issues.apache.org/jira/browse/HIVE-13286
> Project: Hive
>  Issue Type: Bug
>  Components: Parser
>Affects Versions: 2.0.0
>Reporter: Vikram Dixit K
>Assignee: Aihua Xu
>Priority: Critical
>
> [~aihuaxu] I see this commit made via HIVE-11488. I see that query id is 
> being reused across queries. This defeats the purpose of a query id. I am not 
> sure what the purpose of the change in that jira is but it breaks the 
> assumption about a query id being unique for each query. Please take a look 
> into this at the earliest.





[jira] [Updated] (HIVE-10176) skip.header.line.count causes values to be skipped when performing insert values

2016-03-15 Thread Vladyslav Pavlenko (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vladyslav Pavlenko updated HIVE-10176:
--
Attachment: HIVE-10176.4.patch

> skip.header.line.count causes values to be skipped when performing insert 
> values
> 
>
> Key: HIVE-10176
> URL: https://issues.apache.org/jira/browse/HIVE-10176
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.0.0
>Reporter: Wenbo Wang
>Assignee: Vladyslav Pavlenko
> Attachments: HIVE-10176.1.patch, HIVE-10176.2.patch, 
> HIVE-10176.3.patch, HIVE-10176.4.patch, data
>
>
> When inserting values into tables with TBLPROPERTIES 
> ("skip.header.line.count"="1"), the first value listed is also skipped. 
> create table test (row int, name string) TBLPROPERTIES 
> ("skip.header.line.count"="1"); 
> load data local inpath '/root/data' into table test;
> insert into table test values (1, 'a'), (2, 'b'), (3, 'c');
> (1, 'a') isn't inserted into the table. 





[jira] [Commented] (HIVE-13286) Query ID is being reused across queries

2016-03-15 Thread Aihua Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15195794#comment-15195794
 ] 

Aihua Xu commented on HIVE-13286:
-

QueryId should be unique. The user-overwritten queryId lets the user provide a 
meaningful queryId if they want to; the user then needs to make sure it's 
unique.  

If the user doesn't overwrite the queryId, then Hive will generate one as 
before. 

> Query ID is being reused across queries
> ---
>
> Key: HIVE-13286
> URL: https://issues.apache.org/jira/browse/HIVE-13286
> Project: Hive
>  Issue Type: Bug
>  Components: Parser
>Affects Versions: 2.0.0
>Reporter: Vikram Dixit K
>Assignee: Aihua Xu
>Priority: Critical
>
> [~aihuaxu] I see this commit made via HIVE-11488. I see that query id is 
> being reused across queries. This defeats the purpose of a query id. I am not 
> sure what the purpose of the change in that jira is but it breaks the 
> assumption about a query id being unique for each query. Please take a look 
> into this at the earliest.





[jira] [Commented] (HIVE-13286) Query ID is being reused across queries

2016-03-15 Thread Aihua Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15195815#comment-15195815
 ] 

Aihua Xu commented on HIVE-13286:
-

OK. I think there is an issue there. confOverlay is passed from the client. 
It seems we need to make a copy of it; otherwise, if the client reuses the same 
confOverlay, the queryId is reused. Is that the issue? I will correct that.
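
A minimal Java sketch of the copy being discussed, not the actual HiveServer2 code. The class, the method names and the key string `hive.query.id` are assumptions made for illustration; a plain `Map` stands in for the confOverlay.

```java
import java.util.HashMap;
import java.util.Map;

final class ConfOverlayCopyDemo {
    // Assumed key name, used here for illustration only.
    static final String QUERY_ID_KEY = "hive.query.id";

    // Problematic variant: the id is written into the client's own map,
    // so a client that reuses the map carries the old id into its next call.
    static String runSharingOverlay(Map<String, String> confOverlay, String generatedId) {
        confOverlay.putIfAbsent(QUERY_ID_KEY, generatedId);
        return confOverlay.get(QUERY_ID_KEY);
    }

    // Copying variant: the id is written into a private copy,
    // so the client's map is left exactly as it was passed in.
    static String runWithCopy(Map<String, String> confOverlay, String generatedId) {
        Map<String, String> local = new HashMap<>(confOverlay);
        local.putIfAbsent(QUERY_ID_KEY, generatedId);
        return local.get(QUERY_ID_KEY);
    }

    public static void main(String[] args) {
        Map<String, String> reused = new HashMap<>();
        System.out.println(runSharingOverlay(reused, "id-1"));
        System.out.println(runSharingOverlay(reused, "id-2")); // same id as the first call

        Map<String, String> untouched = new HashMap<>();
        System.out.println(runWithCopy(untouched, "id-1"));
        System.out.println(runWithCopy(untouched, "id-2")); // a fresh id
    }
}
```

The copy only protects against an id leaking back through a reused map. It does nothing if the client puts a query id into the overlay on purpose.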

> Query ID is being reused across queries
> ---
>
> Key: HIVE-13286
> URL: https://issues.apache.org/jira/browse/HIVE-13286
> Project: Hive
>  Issue Type: Bug
>  Components: Parser
>Affects Versions: 2.0.0
>Reporter: Vikram Dixit K
>Assignee: Aihua Xu
>Priority: Critical
>
> [~aihuaxu] I see this commit made via HIVE-11488. I see that query id is 
> being reused across queries. This defeats the purpose of a query id. I am not 
> sure what the purpose of the change in that jira is but it breaks the 
> assumption about a query id being unique for each query. Please take a look 
> into this at the earliest.





[jira] [Updated] (HIVE-13285) Orc concatenation may drop old files from moving to final path

2016-03-15 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-13285:
-
Attachment: HIVE-13285.3.patch

Addressed [~gopalv]'s comment. closeOp uses abort state instead of exception 
state.

> Orc concatenation may drop old files from moving to final path
> --
>
> Key: HIVE-13285
> URL: https://issues.apache.org/jira/browse/HIVE-13285
> Project: Hive
>  Issue Type: Bug
>  Components: ORC
>Affects Versions: 0.14.0, 1.0.0, 1.2.0, 1.1.0, 1.3.0, 2.0.0, 2.1.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>Priority: Critical
> Attachments: HIVE-13285.1.patch, HIVE-13285.2.patch, 
> HIVE-13285.3.patch
>
>
> ORC concatenation uses combine hive input format for merging files. In the 
> specific case where all files within a combine split are incompatible for 
> merge (old files without stripe statistics), these files are added to the 
> incompatible file set. But this file set is not processed, because closeOp() 
> will not be called (no output file writer will exist, which skips 
> super.closeOp()). As a result, these incompatible files are not moved to the 
> final path.





[jira] [Commented] (HIVE-13286) Query ID is being reused across queries

2016-03-15 Thread Vikram Dixit K (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15195836#comment-15195836
 ] 

Vikram Dixit K commented on HIVE-13286:
---

The issue here is that if we make a change in the incoming configuration, it 
remains set for the duration of the session. We need to make sure that the user 
does not set the query id configuration because there is a chance of them 
breaking what a query id is - a unique id for each query. I think what you 
really want is something like the HIVE_LOG_TRACE_ID which could be renamed to 
something like a HIVE_USER_TRACE_ID - an id that a user can set and trace which 
can stay constant until the user decides to change it. You could create a 
separate configuration too for the use case you have. I think messing around 
with the query id looks like a recipe for bugs. I think we should move the 
query id to a config that the user cannot change and just put it in the 
utilities class, e.g., like INPUT_NAME that mapreduce used.

> Query ID is being reused across queries
> ---
>
> Key: HIVE-13286
> URL: https://issues.apache.org/jira/browse/HIVE-13286
> Project: Hive
>  Issue Type: Bug
>  Components: Parser
>Affects Versions: 2.0.0
>Reporter: Vikram Dixit K
>Assignee: Aihua Xu
>Priority: Critical
>
> [~aihuaxu] I see this commit made via HIVE-11488. I see that query id is 
> being reused across queries. This defeats the purpose of a query id. I am not 
> sure what the purpose of the change in that jira is but it breaks the 
> assumption about a query id being unique for each query. Please take a look 
> into this at the earliest.





[jira] [Commented] (HIVE-13226) Improve tez print summary to print query execution breakdown

2016-03-15 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15195839#comment-15195839
 ] 

Gopal V commented on HIVE-13226:


LGTM - +1.

> Improve tez print summary to print query execution breakdown
> 
>
> Key: HIVE-13226
> URL: https://issues.apache.org/jira/browse/HIVE-13226
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 2.1.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-13226.1.patch, HIVE-13226.2.patch, 
> HIVE-13226.3.patch, sampleoutput.png
>
>
> When tez print summary is enabled, a methods summary is printed, which is 
> difficult to correlate with the actual execution time. We can improve that to 
> print the execution times in the sequence of operations that happen behind 
> the scenes.
> Instead of printing the method names, it will be useful to print something 
> like below:
> 1) Query Compilation time
> 2) Query Submit to DAG Submit time
> 3) DAG Submit to DAG Accept time
> 4) DAG Accept to DAG Start time
> 5) DAG Start to DAG End time
> With this it will be easier to find out where the actual time is spent. 
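
A hedged Java sketch of how the five intervals listed above could be derived from raw timestamps. It is not the committed patch: the class and parameter names are hypothetical, and it assumes each phase boundary is available as a millisecond timestamp.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class QueryTimelineDemo {
    // Returns the five phases in display order; every value is a duration in milliseconds.
    static Map<String, Long> breakdown(long compileStart, long compileEnd, long querySubmit,
                                       long dagSubmit, long dagAccept, long dagStart, long dagEnd) {
        Map<String, Long> phases = new LinkedHashMap<>(); // keeps insertion order
        phases.put("Query Compilation time", compileEnd - compileStart);
        phases.put("Query Submit to DAG Submit time", dagSubmit - querySubmit);
        phases.put("DAG Submit to DAG Accept time", dagAccept - dagSubmit);
        phases.put("DAG Accept to DAG Start time", dagStart - dagAccept);
        phases.put("DAG Start to DAG End time", dagEnd - dagStart);
        return phases;
    }

    public static void main(String[] args) {
        // Made-up timestamps, purely to exercise the method.
        Map<String, Long> phases = breakdown(0, 100, 100, 250, 300, 320, 1320);
        for (Map.Entry<String, Long> phase : phases.entrySet()) {
            System.out.println(phase.getKey() + ": " + phase.getValue() + " ms");
        }
    }
}
```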





[jira] [Comment Edited] (HIVE-13286) Query ID is being reused across queries

2016-03-15 Thread Vikram Dixit K (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15195836#comment-15195836
 ] 

Vikram Dixit K edited comment on HIVE-13286 at 3/15/16 6:14 PM:


The issue here is that if we make a change in the incoming configuration, it 
remains set for the duration of the session. We need to make sure that the user 
does not set the query id configuration because there is a chance of them 
breaking what a query id is - a unique id for each query. I think what you 
really want is something like the HIVE_LOG_TRACE_ID which could be renamed to 
something like a HIVE_USER_TRACE_ID - an id that a user can set and trace which 
can stay constant until the user decides to change it. You could create a 
separate configuration too for the use case you have. I think allowing the user 
to mess around with the query id looks like a recipe for bugs. I think we 
should move the query id to a config that the user cannot change and just put 
it in the utilities class, e.g., like INPUT_NAME that mapreduce used.


was (Author: vikram.dixit):
The issue here is that if we make a change in the incoming configuration, it 
remains set for the duration of the session. We need to make sure that the user 
does not set the query id configuration because there is a chance of them 
breaking what a query id is - a unique id for each query. I think what you 
really want is something like the HIVE_LOG_TRACE_ID which could be renamed to 
something like a HIVE_USER_TRACE_ID - an id that a user can set and trace which 
can stay constant until the user decides to change it. You could create a 
separate configuration too for the use case you have. I think messing around 
with the query id looks like a recipe for bugs. I think we should move the 
query id to a config that the user cannot change and just put it in the 
utilities class for e.g. like INPUT_NAME that mapreduce used.

> Query ID is being reused across queries
> ---
>
> Key: HIVE-13286
> URL: https://issues.apache.org/jira/browse/HIVE-13286
> Project: Hive
>  Issue Type: Bug
>  Components: Parser
>Affects Versions: 2.0.0
>Reporter: Vikram Dixit K
>Assignee: Aihua Xu
>Priority: Critical
>
> [~aihuaxu] I see this commit made via HIVE-11488. I see that query id is 
> being reused across queries. This defeats the purpose of a query id. I am not 
> sure what the purpose of the change in that jira is but it breaks the 
> assumption about a query id being unique for each query. Please take a look 
> into this at the earliest.





[jira] [Updated] (HIVE-13226) Improve tez print summary to print query execution breakdown

2016-03-15 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-13226:
-
   Resolution: Fixed
Fix Version/s: 2.1.0
   Status: Resolved  (was: Patch Available)

Committed to master. Thanks [~gopalv] for the review!

> Improve tez print summary to print query execution breakdown
> 
>
> Key: HIVE-13226
> URL: https://issues.apache.org/jira/browse/HIVE-13226
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 2.1.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Fix For: 2.1.0
>
> Attachments: HIVE-13226.1.patch, HIVE-13226.2.patch, 
> HIVE-13226.3.patch, sampleoutput.png
>
>
> When tez print summary is enabled, a methods summary is printed, which is 
> difficult to correlate with the actual execution time. We can improve that to 
> print the execution times in the sequence of operations that happen behind 
> the scenes.
> Instead of printing the method names, it will be useful to print something 
> like below:
> 1) Query Compilation time
> 2) Query Submit to DAG Submit time
> 3) DAG Submit to DAG Accept time
> 4) DAG Accept to DAG Start time
> 5) DAG Start to DAG End time
> With this it will be easier to find out where the actual time is spent. 





[jira] [Commented] (HIVE-13285) Orc concatenation may drop old files from moving to final path

2016-03-15 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15195866#comment-15195866
 ] 

Gopal V commented on HIVE-13285:


LGTM - +1, tests pending.

> Orc concatenation may drop old files from moving to final path
> --
>
> Key: HIVE-13285
> URL: https://issues.apache.org/jira/browse/HIVE-13285
> Project: Hive
>  Issue Type: Bug
>  Components: ORC
>Affects Versions: 0.14.0, 1.0.0, 1.2.0, 1.1.0, 1.3.0, 2.0.0, 2.1.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>Priority: Critical
> Attachments: HIVE-13285.1.patch, HIVE-13285.2.patch, 
> HIVE-13285.3.patch
>
>
> ORC concatenation uses combine hive input format for merging files. In the 
> specific case where all files within a combine split are incompatible for 
> merge (old files without stripe statistics), these files are added to the 
> incompatible file set. But this file set is not processed, because closeOp() 
> will not be called (no output file writer will exist, which skips 
> super.closeOp()). As a result, these incompatible files are not moved to the 
> final path.





[jira] [Commented] (HIVE-13286) Query ID is being reused across queries

2016-03-15 Thread Aihua Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15195987#comment-15195987
 ] 

Aihua Xu commented on HIVE-13286:
-

Actually, what I need is a unique queryId. Think of the scenario where Hive is 
just one of the components in a pipeline. The client could have a queryId 
(e.g., to trace the generation of the query) and then call Hive. That queryId 
can then link them together, which is better for diagnosis. If we create other 
ids, that seems to defeat the purpose. 

If the user doesn't provide a queryId, then Hive will take care of that. Is the 
following the actual issue you see?
{noformat}
I think there is an issue there. confOverlay is passed from the client. Seems 
we need to make a copy of that otherwise if the client reuses the same 
confOverlay, then queryId is reused. Is that the issue? I will correct that.
{noformat}
 

> Query ID is being reused across queries
> ---
>
> Key: HIVE-13286
> URL: https://issues.apache.org/jira/browse/HIVE-13286
> Project: Hive
>  Issue Type: Bug
>  Components: Parser
>Affects Versions: 2.0.0
>Reporter: Vikram Dixit K
>Assignee: Aihua Xu
>Priority: Critical
>
> [~aihuaxu] I see this commit made via HIVE-11488. I see that query id is 
> being reused across queries. This defeats the purpose of a query id. I am not 
> sure what the purpose of the change in that jira is but it breaks the 
> assumption about a query id being unique for each query. Please take a look 
> into this at the earliest.





[jira] [Commented] (HIVE-13286) Query ID is being reused across queries

2016-03-15 Thread Vikram Dixit K (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15196044#comment-15196044
 ] 

Vikram Dixit K commented on HIVE-13286:
---

[~aihuaxu] Consider the following scenario: in Tez/Spark, suppose we end up 
caching the small table based on the hive query id. If the user sets the hive 
query id for one query and does not reset it for the subsequent query, we will 
end up picking the previously cached hash table for the join, resulting in 
incorrect results, right? Creating a new conf object would only work if we 
reset the query id after the query completes. If we allow it to remain in the 
configuration object after a query has finished running, it will produce 
incorrect results or some weird behavior.

Consider an hs2 or cli session: if a user in a session assigns a query id and 
doesn't reset it, it can result in incorrect results. Are you expecting a user 
to set a query id each time after setting it once? I don't think that is great 
behavior.
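
The caching hazard described above can be sketched in a few lines of Java. This is an illustration only, not Hive's Tez/Spark cache: the class and method names are hypothetical, and a list of strings stands in for the small-table hash table.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

final class QueryIdCacheDemo {
    // Cache keyed only by query id, as in the scenario above.
    private final Map<String, List<String>> hashTables = new HashMap<>();

    // Returns the cached table for this query id, loading it only on a miss.
    List<String> smallTableFor(String queryId, Supplier<List<String>> loader) {
        return hashTables.computeIfAbsent(queryId, id -> loader.get());
    }

    public static void main(String[] args) {
        QueryIdCacheDemo cache = new QueryIdCacheDemo();
        // Two different queries that run under the same user-set id.
        List<String> first = cache.smallTableFor("user-set-id", () -> Arrays.asList("rows of table A"));
        List<String> second = cache.smallTableFor("user-set-id", () -> Arrays.asList("rows of table B"));
        System.out.println(first);
        System.out.println(second); // the second query is handed table A's rows
    }
}
```

With a distinct id per query the second lookup misses and loads its own table, which is why the id has to be unique per query.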

> Query ID is being reused across queries
> ---
>
> Key: HIVE-13286
> URL: https://issues.apache.org/jira/browse/HIVE-13286
> Project: Hive
>  Issue Type: Bug
>  Components: Parser
>Affects Versions: 2.0.0
>Reporter: Vikram Dixit K
>Assignee: Aihua Xu
>Priority: Critical
>
> [~aihuaxu] I see this commit made via HIVE-11488. I see that query id is 
> being reused across queries. This defeats the purpose of a query id. I am not 
> sure what the purpose of the change in that jira is but it breaks the 
> assumption about a query id being unique for each query. Please take a look 
> into this at the earliest.





[jira] [Commented] (HIVE-13286) Query ID is being reused across queries

2016-03-15 Thread Aihua Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15196054#comment-15196054
 ] 

Aihua Xu commented on HIVE-13286:
-

I moved the queryId initialization earlier so that, from the beginning of the 
execution, the workflow will have a unique queryId. 

Actually I think your statement makes sense. What I really need is a traceId. 
Does reusing the same queryId cause the issues? If it does, I can change it to 
disallow changes from the client.

> Query ID is being reused across queries
> ---
>
> Key: HIVE-13286
> URL: https://issues.apache.org/jira/browse/HIVE-13286
> Project: Hive
>  Issue Type: Bug
>  Components: Parser
>Affects Versions: 2.0.0
>Reporter: Vikram Dixit K
>Assignee: Aihua Xu
>Priority: Critical
>
> [~aihuaxu] I see this commit made via HIVE-11488. I see that query id is 
> being reused across queries. This defeats the purpose of a query id. I am not 
> sure what the purpose of the change in that jira is but it breaks the 
> assumption about a query id being unique for each query. Please take a look 
> into this at the earliest.





[jira] [Commented] (HIVE-12995) LLAP: Synthetic file ids need collision checks

2016-03-15 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15196077#comment-15196077
 ] 

Hive QA commented on HIVE-12995:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12793362/HIVE-12995.04.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 9825 tests executed
*Failed tests:*
{noformat}
TestSparkCliDriver-groupby3_map.q-sample2.q-auto_join14.q-and-12-more - did not 
produce a TEST-*.xml file
TestSparkCliDriver-groupby_map_ppr_multi_distinct.q-table_access_keys_stats.q-groupby4_noskew.q-and-12-more
 - did not produce a TEST-*.xml file
TestSparkCliDriver-join_rc.q-insert1.q-vectorized_rcfile_columnar.q-and-12-more 
- did not produce a TEST-*.xml file
TestSparkCliDriver-ppd_join4.q-join9.q-ppd_join3.q-and-12-more - did not 
produce a TEST-*.xml file
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/7274/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/7274/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-7274/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 4 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12793362 - PreCommit-HIVE-TRUNK-Build

> LLAP: Synthetic file ids need collision checks
> --
>
> Key: HIVE-12995
> URL: https://issues.apache.org/jira/browse/HIVE-12995
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Affects Versions: 2.1.0
>Reporter: Gopal V
>Assignee: Sergey Shelukhin
> Attachments: HIVE-12995.01.patch, HIVE-12995.02.patch, 
> HIVE-12995.03.patch, HIVE-12995.04.patch, HIVE-12995.patch
>
>
> LLAP synthetic file ids do not have any way of checking whether a collision 
> occurs other than a data-error.
> Synthetic file-ids have only been used with unit tests so far - but they will 
> be needed to add cache mechanisms to non-HDFS filesystems.
> In case of Synthetic file-ids, it is recommended that we track the full-tuple 
> (path, mtime, len) in the cache so that a cache-hit for the synthetic file-id 
> can be compared against the parameters & only accepted if those match.
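
A minimal Java sketch of the check recommended above, assuming a cache keyed by the synthetic id that also stores the (path, mtime, len) tuple. It is not the LLAP implementation. All class and method names are hypothetical, and a string stands in for the cached data.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.Optional;

final class SyntheticFileIdCacheDemo {
    // The full tuple that identifies a file independently of its synthetic id.
    static final class FileKey {
        final String path;
        final long mtime;
        final long len;

        FileKey(String path, long mtime, long len) {
            this.path = path;
            this.mtime = mtime;
            this.len = len;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof FileKey)) return false;
            FileKey k = (FileKey) o;
            return mtime == k.mtime && len == k.len && path.equals(k.path);
        }

        @Override
        public int hashCode() {
            return Objects.hash(path, mtime, len);
        }
    }

    private static final class Entry {
        final FileKey key;
        final String data;

        Entry(FileKey key, String data) {
            this.key = key;
            this.data = data;
        }
    }

    private final Map<Long, Entry> bySyntheticId = new HashMap<>();

    void put(long syntheticId, FileKey key, String data) {
        bySyntheticId.put(syntheticId, new Entry(key, data));
    }

    // A hit counts only if the stored tuple matches the caller's tuple.
    Optional<String> get(long syntheticId, FileKey key) {
        Entry e = bySyntheticId.get(syntheticId);
        if (e == null || !e.key.equals(key)) {
            return Optional.empty(); // miss, or a collision on the synthetic id
        }
        return Optional.of(e.data);
    }

    public static void main(String[] args) {
        SyntheticFileIdCacheDemo cache = new SyntheticFileIdCacheDemo();
        FileKey a = new FileKey("/data/a.orc", 1000L, 10L);
        FileKey b = new FileKey("/data/b.orc", 2000L, 20L);
        cache.put(42L, a, "cached bytes of a");
        System.out.println(cache.get(42L, a).isPresent()); // true
        System.out.println(cache.get(42L, b).isPresent()); // false: same id, different file
    }
}
```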





[jira] [Updated] (HIVE-13262) LLAP: Remove log levels from DebugUtils

2016-03-15 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-13262:
-
Attachment: HIVE-13262.2.patch

Addressed [~sershe]'s review comments.

> LLAP: Remove log levels from DebugUtils
> ---
>
> Key: HIVE-13262
> URL: https://issues.apache.org/jira/browse/HIVE-13262
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.1.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-13262.1.patch, HIVE-13262.2.patch
>
>
> DebugUtils has many hardcoded log levels. To enable logging we need to 
> recompile the code with the desired value. Instead, configure loggers for 
> these classes with log levels via log4j properties. Also use parameterized 
> logging in the IO elevator. 





[jira] [Commented] (HIVE-13286) Query ID is being reused across queries

2016-03-15 Thread Aihua Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15196097#comment-15196097
 ] 

Aihua Xu commented on HIVE-13286:
-

I see. I will disallow the input of queryId and generate a new one every time 
then. 

> Query ID is being reused across queries
> ---
>
> Key: HIVE-13286
> URL: https://issues.apache.org/jira/browse/HIVE-13286
> Project: Hive
>  Issue Type: Bug
>  Components: Parser
>Affects Versions: 2.0.0
>Reporter: Vikram Dixit K
>Assignee: Aihua Xu
>Priority: Critical
>
> [~aihuaxu] I see this commit made via HIVE-11488. I see that query id is 
> being reused across queries. This defeats the purpose of a query id. I am not 
> sure what the purpose of the change in that jira is but it breaks the 
> assumption about a query id being unique for each query. Please take a look 
> into this at the earliest.





[jira] [Commented] (HIVE-13286) Query ID is being reused across queries

2016-03-15 Thread Vikram Dixit K (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15196100#comment-15196100
 ] 

Vikram Dixit K commented on HIVE-13286:
---

Yeah. The same queryId causes issues. We should disallow a change from the 
client. 

The HIVE_LOG_TRACE_ID is already present in the hive configuration. You could 
use that.

> Query ID is being reused across queries
> ---
>
> Key: HIVE-13286
> URL: https://issues.apache.org/jira/browse/HIVE-13286
> Project: Hive
>  Issue Type: Bug
>  Components: Parser
>Affects Versions: 2.0.0
>Reporter: Vikram Dixit K
>Assignee: Aihua Xu
>Priority: Critical
>
> [~aihuaxu] I see this commit made via HIVE-11488. I see that query id is 
> being reused across queries. This defeats the purpose of a query id. I am not 
> sure what the purpose of the change in that jira is but it breaks the 
> assumption about a query id being unique for each query. Please take a look 
> into this at the earliest.





[jira] [Commented] (HIVE-13286) Query ID is being reused across queries

2016-03-15 Thread Vikram Dixit K (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15196105#comment-15196105
 ] 

Vikram Dixit K commented on HIVE-13286:
---

Great! Thanks!

> Query ID is being reused across queries
> ---
>
> Key: HIVE-13286
> URL: https://issues.apache.org/jira/browse/HIVE-13286
> Project: Hive
>  Issue Type: Bug
>  Components: Parser
>Affects Versions: 2.0.0
>Reporter: Vikram Dixit K
>Assignee: Aihua Xu
>Priority: Critical
>
> [~aihuaxu] I see this commit made via HIVE-11488. I see that query id is 
> being reused across queries. This defeats the purpose of a query id. I am not 
> sure what the purpose of the change in that jira is but it breaks the 
> assumption about a query id being unique for each query. Please take a look 
> into this at the earliest.





[jira] [Commented] (HIVE-12619) Switching the field order within an array of structs causes the query to fail

2016-03-15 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15196198#comment-15196198
 ] 

Xuefu Zhang commented on HIVE-12619:


Patch #3 seems simpler and fixes the field ordering issue. Looks good on my 
side. +1. 
[~spena], it would be good if you could take a look too.

> Switching the field order within an array of structs causes the query to fail
> -
>
> Key: HIVE-12619
> URL: https://issues.apache.org/jira/browse/HIVE-12619
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.1.0
>Reporter: Ang Zhang
>Assignee: Mohammad Kamrul Islam
>Priority: Minor
> Attachments: HIVE-12619.2.patch, HIVE-12619.3.patch
>
>
> Switching the field order within an array of structs causes the query to fail 
> or return the wrong data for the fields, but switching the field order within 
> just a struct works.
> How to reproduce:
> Case 1: if the two fields have the same type, the query will return wrong data 
> for the fields
> drop table if exists schema_test;
> create table schema_test (msg array<struct<f1: string, f2: string>>) stored 
> as parquet;
> insert into table schema_test select stack(2, array(named_struct('f1', 'abc', 
> 'f2', 'abc2')), array(named_struct('f1', 'efg', 'f2', 'efg2'))) from one 
> limit 2;
> select * from schema_test;
> --returns
> --[{"f1":"efg","f2":"efg2"}]
> --[{"f1":"abc","f2":"abc2"}]
> alter table schema_test change msg msg array<struct<f2: string, f1: string>>;
> select * from schema_test;
> --returns
> --[{"f2":"efg","f1":"efg2"}]
> --[{"f2":"abc","f1":"abc2"}]
> Case 2: if the two fields have different types, the query will fail
> drop table if exists schema_test;
> create table schema_test (msg array<struct<f1: string, f2: int>>) stored as 
> parquet;
> insert into table schema_test select stack(2, array(named_struct('f1', 'abc', 
> 'f2', 1)), array(named_struct('f1', 'efg', 'f2', 2))) from one limit 2;
> select * from schema_test;
> --returns
> --[{"f1":"efg","f2":2}]
> --[{"f1":"abc","f2":1}]
> alter table schema_test change msg msg array<struct<f2: int, f1: string>>;
> select * from schema_test;
> Failed with exception 
> java.io.IOException:org.apache.hadoop.hive.ql.metadata.HiveException: 
> java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to 
> org.apache.hadoop.io.IntWritable





[jira] [Updated] (HIVE-13232) Aggressively drop compression buffers in ORC OutStreams

2016-03-15 Thread Owen O'Malley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley updated HIVE-13232:
-
   Resolution: Fixed
Fix Version/s: (was: 1.3.0)
   (was: 0.14.1)
   Status: Resolved  (was: Patch Available)

> Aggressively drop compression buffers in ORC OutStreams
> ---
>
> Key: HIVE-13232
> URL: https://issues.apache.org/jira/browse/HIVE-13232
> Project: Hive
>  Issue Type: Bug
>  Components: ORC
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
> Fix For: 2.1.0
>
> Attachments: HIVE-13232.patch, HIVE-13232.patch, HIVE-13232.patch
>
>
> In Hive 0.11, when ORC's OutStreams were flushed, they dropped all of 
> their buffers. In the patch for HIVE-4324, we inadvertently changed that 
> behavior so that one of the buffers is held on to. For queries with a lot of 
> writers, and thus under significant memory pressure, this can have a 
> significant impact on memory usage. 
> Note that "hive.optimize.sort.dynamic.partition" avoids this problem by 
> sorting on the dynamic partition key and thus only a single ORC writer is 
> open at once. This will use memory more effectively and avoid creating ORC 
> files with very small stripes, which will produce better downstream 
> performance.





[jira] [Updated] (HIVE-12995) LLAP: Synthetic file ids need collision checks

2016-03-15 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-12995:

   Resolution: Fixed
Fix Version/s: 2.1.0
   Status: Resolved  (was: Patch Available)

Committed to master.

> LLAP: Synthetic file ids need collision checks
> --
>
> Key: HIVE-12995
> URL: https://issues.apache.org/jira/browse/HIVE-12995
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Affects Versions: 2.1.0
>Reporter: Gopal V
>Assignee: Sergey Shelukhin
> Fix For: 2.1.0
>
> Attachments: HIVE-12995.01.patch, HIVE-12995.02.patch, 
> HIVE-12995.03.patch, HIVE-12995.04.patch, HIVE-12995.patch
>
>
> LLAP synthetic file ids do not have any way of checking whether a collision 
> occurs other than a data-error.
> Synthetic file-ids have only been used with unit tests so far - but they will 
> be needed to add cache mechanisms to non-HDFS filesystems.
> In case of Synthetic file-ids, it is recommended that we track the full-tuple 
> (path, mtime, len) in the cache so that a cache-hit for the synthetic file-id 
> can be compared against the parameters & only accepted if those match.
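
The recommendation above can be sketched roughly as follows. This is only an illustration under assumed names (SyntheticIdCacheSketch, FileTuple, put and get are all hypothetical and not taken from the LLAP code): the cache keeps the full (path, mtime, len) tuple next to the cached data and treats a lookup as a hit only when the stored tuple matches the caller's.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; class and method names are assumptions, not LLAP APIs.
public class SyntheticIdCacheSketch {

  /** The (path, mtime, len) tuple tracked alongside each synthetic file id. */
  public static final class FileTuple {
    final String path;
    final long mtime;
    final long len;

    public FileTuple(String path, long mtime, long len) {
      this.path = path;
      this.mtime = mtime;
      this.len = len;
    }

    boolean matches(FileTuple other) {
      return path.equals(other.path) && mtime == other.mtime && len == other.len;
    }
  }

  /** Cached payload plus the tuple it was stored under. */
  private static final class Entry {
    final FileTuple tuple;
    final String data;

    Entry(FileTuple tuple, String data) {
      this.tuple = tuple;
      this.data = data;
    }
  }

  private final Map<Long, Entry> entries = new HashMap<>();

  public void put(long syntheticId, FileTuple tuple, String data) {
    entries.put(syntheticId, new Entry(tuple, data));
  }

  /** Returns the cached data only if the id is present and the tuple matches; otherwise null. */
  public String get(long syntheticId, FileTuple tuple) {
    Entry e = entries.get(syntheticId);
    if (e == null || !e.tuple.matches(tuple)) {
      return null; // miss, or a collision between two different files
    }
    return e.data;
  }
}
```

With this shape, two different files that happen to share a synthetic id produce a cache miss rather than a data error.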





[jira] [Updated] (HIVE-13286) Query ID is being reused across queries

2016-03-15 Thread Aihua Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated HIVE-13286:

Attachment: HIVE-13286.1.patch

> Query ID is being reused across queries
> ---
>
> Key: HIVE-13286
> URL: https://issues.apache.org/jira/browse/HIVE-13286
> Project: Hive
>  Issue Type: Bug
>  Components: Parser
>Affects Versions: 2.0.0
>Reporter: Vikram Dixit K
>Assignee: Aihua Xu
>Priority: Critical
> Attachments: HIVE-13286.1.patch
>
>
> [~aihuaxu] I see this commit made via HIVE-11488. I see that query id is 
> being reused across queries. This defeats the purpose of a query id. I am not 
> sure what the purpose of the change in that jira is but it breaks the 
> assumption about a query id being unique for each query. Please take a look 
> into this at the earliest.





[jira] [Updated] (HIVE-13286) Query ID is being reused across queries

2016-03-15 Thread Aihua Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated HIVE-13286:

Status: Patch Available  (was: Open)

> Query ID is being reused across queries
> ---
>
> Key: HIVE-13286
> URL: https://issues.apache.org/jira/browse/HIVE-13286
> Project: Hive
>  Issue Type: Bug
>  Components: Parser
>Affects Versions: 2.0.0
>Reporter: Vikram Dixit K
>Assignee: Aihua Xu
>Priority: Critical
> Attachments: HIVE-13286.1.patch
>
>
> [~aihuaxu] I see this commit made via HIVE-11488. I see that query id is 
> being reused across queries. This defeats the purpose of a query id. I am not 
> sure what the purpose of the change in that jira is but it breaks the 
> assumption about a query id being unique for each query. Please take a look 
> into this at the earliest.





[jira] [Commented] (HIVE-13286) Query ID is being reused across queries

2016-03-15 Thread Aihua Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15196271#comment-15196271
 ] 

Aihua Xu commented on HIVE-13286:
-

Attached patch-1: disallow the input of the queryId. queryId will be 
regenerated and put in confOverlay for each query. 

[~vikram.dixit] Can you take a look?

> Query ID is being reused across queries
> ---
>
> Key: HIVE-13286
> URL: https://issues.apache.org/jira/browse/HIVE-13286
> Project: Hive
>  Issue Type: Bug
>  Components: Parser
>Affects Versions: 2.0.0
>Reporter: Vikram Dixit K
>Assignee: Aihua Xu
>Priority: Critical
> Attachments: HIVE-13286.1.patch
>
>
> [~aihuaxu] I see this commit made via HIVE-11488. I see that query id is 
> being reused across queries. This defeats the purpose of a query id. I am not 
> sure what the purpose of the change in that jira is but it breaks the 
> assumption about a query id being unique for each query. Please take a look 
> into this at the earliest.
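
The approach described in the comment above (ignore any supplied queryId and regenerate one per query in confOverlay) might look roughly like the sketch below. It is not the code from HIVE-13286.1.patch: the class and method names are hypothetical, the "hive.query.id" key is an assumption, and a random UUID simply stands in for whatever id scheme Hive actually uses.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical sketch; not the code from the attached patch.
public class QueryIdSketch {
  // Assumed configuration key for the query id.
  public static final String QUERY_ID_KEY = "hive.query.id";

  /** Returns a copy of the overlay in which any caller-supplied id is replaced by a fresh one. */
  public static Map<String, String> withFreshQueryId(Map<String, String> confOverlay) {
    Map<String, String> copy = new HashMap<>();
    if (confOverlay != null) {
      copy.putAll(confOverlay);
    }
    // Overwrite unconditionally so that an id passed in by the client is never reused.
    copy.put(QUERY_ID_KEY, UUID.randomUUID().toString());
    return copy;
  }
}
```

Calling withFreshQueryId once per submitted query gives each query its own id, regardless of what the client put in the overlay.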





[jira] [Commented] (HIVE-12374) Improve setting of JVM configs for HS2 and Metastore shell scripts

2016-03-15 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15196344#comment-15196344
 ] 

Ashutosh Chauhan commented on HIVE-12374:
-

This has come up multiple times. Shall we get this in, [~thejas]?

> Improve setting of JVM configs for HS2 and Metastore shell scripts 
> ---
>
> Key: HIVE-12374
> URL: https://issues.apache.org/jira/browse/HIVE-12374
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2, Metastore
>Reporter: Vaibhav Gumashta
>Assignee: Vaibhav Gumashta
> Attachments: HIVE-12374.1.patch
>
>
> Adding {{HIVESERVER2_JVM_OPTS}} and {{METASTORE_JVM_OPTS}} env variables, 
> which will eventually set {{HADOOP_CLIENT_OPTS}} (since we start the 
> processes using hadoop jar ...). Also setting these defaults:{{-Xms128m 
> -Xmx2048m -XX:MaxPermSize=128m}}.





[jira] [Updated] (HIVE-13290) Support primary keys/foreign keys constraint as part of create table command in Hive

2016-03-15 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sankar Sivarama Subramaniyan updated HIVE-13290:
-
Attachment: HIVE-13290.1.patch

> Support primary keys/foreign keys constraint as part of create table command 
> in Hive
> 
>
> Key: HIVE-13290
> URL: https://issues.apache.org/jira/browse/HIVE-13290
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO, Logical Optimizer
>Reporter: Hari Sankar Sivarama Subramaniyan
>Assignee: Hari Sankar Sivarama Subramaniyan
> Attachments: HIVE-13290.1.patch
>
>






[jira] [Commented] (HIVE-13290) Support primary keys/foreign keys constraint as part of create table command in Hive

2016-03-15 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15196362#comment-15196362
 ] 

Hari Sankar Sivarama Subramaniyan commented on HIVE-13290:
--

Draft #1 with the basic changes to accept primary keys/foreign keys in the create 
statement and the APIs to retrieve them, which are exposed via desc extended 
tablename;

cc [~ashutoshc]

Will improve on this and add test cases to cover any existing issues.

> Support primary keys/foreign keys constraint as part of create table command 
> in Hive
> 
>
> Key: HIVE-13290
> URL: https://issues.apache.org/jira/browse/HIVE-13290
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO, Logical Optimizer
>Reporter: Hari Sankar Sivarama Subramaniyan
>Assignee: Hari Sankar Sivarama Subramaniyan
> Attachments: HIVE-13290.1.patch
>
>






[jira] [Comment Edited] (HIVE-13178) Enhance ORC Schema Evolution to handle more standard data type conversions

2016-03-15 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15196415#comment-15196415
 ] 

Prasanth Jayachandran edited comment on HIVE-13178 at 3/15/16 10:51 PM:


I haven't gone through the core changes yet. Left some initial comments. The main 
concern is that we are bringing ObjectInspector back into tree readers, which 
will make it difficult to separate ORC out of Hive. If this feature is targeted 
to be supported inside of ORC, then these object inspectors should be replaced 
by TypeDescriptors. Also, it would be good to subclass TreeReaderFactory to handle 
type conversions. 


was (Author: prasanth_j):
I haven't gone through the core changes. Left some initial comments. The main 
concern is that we are bringing ObjectInspector back into tree readers, which 
will make it difficult to separate ORC out of Hive. If this feature is targeted 
to be supported inside of ORC, then these object inspectors should be replaced 
by TypeDescriptors. Also, it would be good to subclass TreeReaderFactory to handle 
type conversions. 

> Enhance ORC Schema Evolution to handle more standard data type conversions
> --
>
> Key: HIVE-13178
> URL: https://issues.apache.org/jira/browse/HIVE-13178
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, ORC
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-13178.01.patch, HIVE-13178.02.patch, 
> HIVE-13178.03.patch
>
>
> Currently, SHORT -> INT -> BIGINT is supported.
> Handle the ORC data type conversions permitted as implicit conversions by the 
> TypeInfoUtils.implicitConvertible method.
>*   STRING_GROUP -> DOUBLE
>*   STRING_GROUP -> DECIMAL
>*   DATE_GROUP -> STRING
>*   NUMERIC_GROUP -> STRING
>*   STRING_GROUP -> STRING_GROUP
>*
>*   // Upward from "lower" type to "higher" numeric type:
>*   BYTE -> SHORT -> INT -> BIGINT -> FLOAT -> DOUBLE -> DECIMAL
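
The "upward" numeric chain in the description can be illustrated with a small ordering check. This is a sketch under assumed names (NumericWideningSketch, NumericType and canWiden are hypothetical) and is not how TypeInfoUtils.implicitConvertible is implemented; it only encodes the BYTE -> SHORT -> INT -> BIGINT -> FLOAT -> DOUBLE -> DECIMAL order listed above and ignores the string and date groups.

```java
// Hypothetical sketch of the "lower" to "higher" numeric ordering from the description.
public class NumericWideningSketch {

  /** Declared in the order given in the issue description, lowest first. */
  public enum NumericType { BYTE, SHORT, INT, BIGINT, FLOAT, DOUBLE, DECIMAL }

  /** True when 'to' is the same type as 'from' or sits later in the chain. */
  public static boolean canWiden(NumericType from, NumericType to) {
    return from.ordinal() <= to.ordinal();
  }
}
```

For example, canWiden(SHORT, BIGINT) holds, while canWiden(DOUBLE, INT) does not because it would move down the chain.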





[jira] [Commented] (HIVE-13178) Enhance ORC Schema Evolution to handle more standard data type conversions

2016-03-15 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15196415#comment-15196415
 ] 

Prasanth Jayachandran commented on HIVE-13178:
--

I haven't gone through the core changes. Left some initial comments. The main 
concern is that we are bringing ObjectInspector back into tree readers, which 
will make it difficult to separate ORC out of Hive. If this feature is targeted 
to be supported inside of ORC, then these object inspectors should be replaced 
by TypeDescriptors. Also, it would be good to subclass TreeReaderFactory to handle 
type conversions. 

> Enhance ORC Schema Evolution to handle more standard data type conversions
> --
>
> Key: HIVE-13178
> URL: https://issues.apache.org/jira/browse/HIVE-13178
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, ORC
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-13178.01.patch, HIVE-13178.02.patch, 
> HIVE-13178.03.patch
>
>
> Currently, SHORT -> INT -> BIGINT is supported.
> Handle the ORC data type conversions permitted as implicit conversions by the 
> TypeInfoUtils.implicitConvertible method.
>*   STRING_GROUP -> DOUBLE
>*   STRING_GROUP -> DECIMAL
>*   DATE_GROUP -> STRING
>*   NUMERIC_GROUP -> STRING
>*   STRING_GROUP -> STRING_GROUP
>*
>*   // Upward from "lower" type to "higher" numeric type:
>*   BYTE -> SHORT -> INT -> BIGINT -> FLOAT -> DOUBLE -> DECIMAL





[jira] [Commented] (HIVE-11675) make use of file footer PPD API in ETL strategy or separate strategy

2016-03-15 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15196423#comment-15196423
 ] 

Hive QA commented on HIVE-11675:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12793363/HIVE-11675.14.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 9827 tests executed
*Failed tests:*
{noformat}
TestSparkCliDriver-groupby3_map.q-sample2.q-auto_join14.q-and-12-more - did not 
produce a TEST-*.xml file
TestSparkCliDriver-groupby_map_ppr_multi_distinct.q-table_access_keys_stats.q-groupby4_noskew.q-and-12-more
 - did not produce a TEST-*.xml file
TestSparkCliDriver-join_rc.q-insert1.q-vectorized_rcfile_columnar.q-and-12-more 
- did not produce a TEST-*.xml file
TestSparkCliDriver-ppd_join4.q-join9.q-ppd_join3.q-and-12-more - did not 
produce a TEST-*.xml file
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/7275/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/7275/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-7275/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 4 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12793363 - PreCommit-HIVE-TRUNK-Build

> make use of file footer PPD API in ETL strategy or separate strategy
> 
>
> Key: HIVE-11675
> URL: https://issues.apache.org/jira/browse/HIVE-11675
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-11675.01.patch, HIVE-11675.02.patch, 
> HIVE-11675.03.patch, HIVE-11675.04.patch, HIVE-11675.05.patch, 
> HIVE-11675.06.patch, HIVE-11675.07.patch, HIVE-11675.08.patch, 
> HIVE-11675.09.patch, HIVE-11675.10.patch, HIVE-11675.11.patch, 
> HIVE-11675.12.patch, HIVE-11675.13.patch, HIVE-11675.14.patch, 
> HIVE-11675.patch, HIVE-11675.premature.opti.patch
>
>
> Need to take a look at the best flow. It won't be much different if we do 
> filtering metastore call for each partition. So perhaps we'd need the custom 
> sync point/batching after all.
> Or we can make it opportunistic and not fetch any footers unless it can be 
> pushed down to metastore or fetched from local cache, that way the only slow 
> threaded op is directory listings





[jira] [Commented] (HIVE-11388) there should only be 1 Initiator for compactions per Hive installation

2016-03-15 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15196428#comment-15196428
 ] 

Alan Gates commented on HIVE-11388:
---

TxnHandler.isDuplicateKeyError - There are only cases in here for Derby and 
MySQL.  The other options will need to be added before this is committed.

Other than that looks good.





> there should only be 1 Initiator for compactions per Hive installation
> --
>
> Key: HIVE-11388
> URL: https://issues.apache.org/jira/browse/HIVE-11388
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 1.0.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Critical
> Attachments: HIVE-11388.patch
>
>
> org.apache.hadoop.hive.ql.txn.compactor.Initiator is a thread that runs 
> inside the metastore service to manage compactions of ACID tables.  There 
> should be exactly 1 instance of this thread (even with multiple Thrift 
> services).
> This is documented in 
> https://cwiki.apache.org/confluence/display/Hive/Hive+Transactions#HiveTransactions-Configuration
>  but not enforced.
> Should add enforcement, since more than 1 Initiator could cause concurrent 
> attempts to compact the same table/partition - which will not work.
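
As an illustration of the review comment above about covering more databases in a duplicate-key check, a per-database test might look like the sketch below. It is not TxnHandler.isDuplicateKeyError: the class, enum and method names are hypothetical, and the SQLState and vendor error codes are the commonly documented ones for each database, which should be verified against the vendor documentation before being relied on.

```java
import java.sql.SQLException;

// Hypothetical sketch; not the Hive TxnHandler implementation.
public class DuplicateKeySketch {

  public enum Db { DERBY, MYSQL, POSTGRES, ORACLE, SQLSERVER }

  /** Returns true if the exception looks like a unique/primary key violation for the given database. */
  public static boolean isDuplicateKey(Db db, SQLException e) {
    switch (db) {
      case DERBY:
      case POSTGRES:
        // Assumed: both report SQLState 23505 for a unique violation.
        return "23505".equals(e.getSQLState());
      case MYSQL:
        // Assumed: vendor error 1062 (duplicate entry).
        return e.getErrorCode() == 1062;
      case ORACLE:
        // Assumed: ORA-00001 (unique constraint violated).
        return e.getErrorCode() == 1;
      case SQLSERVER:
        // Assumed: 2627 (constraint) and 2601 (unique index).
        return e.getErrorCode() == 2627 || e.getErrorCode() == 2601;
      default:
        return false;
    }
  }
}
```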





[jira] [Updated] (HIVE-13264) JDBC driver makes 2 Open Session Calls for every open session

2016-03-15 Thread NITHIN MAHESH (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

NITHIN MAHESH updated HIVE-13264:
-
Attachment: HIVE-13264.patch

Fixes HIVE-13264 by refactoring the code to retry in the openSession layer instead 
of openTransport.

> JDBC driver makes 2 Open Session Calls for every open session
> -
>
> Key: HIVE-13264
> URL: https://issues.apache.org/jira/browse/HIVE-13264
> Project: Hive
>  Issue Type: Bug
>  Components: JDBC
>Reporter: NITHIN MAHESH
>Assignee: NITHIN MAHESH
> Attachments: HIVE-13264.patch
>
>
> When HTTP is used as the transport mode by the Hive JDBC driver, we noticed 
> that there is an additional open/close session just to validate the 
> connection. 
>  
> TCLIService.Iface client = new TCLIService.Client(new TBinaryProtocol(transport));
> TOpenSessionResp openResp = client.OpenSession(new TOpenSessionReq());
> if (openResp != null) {
>   client.CloseSession(new TCloseSessionReq(openResp.getSessionHandle()));
> }
>  
> The open session call is a costly one and should not be used to test 
> transport. 





[jira] [Updated] (HIVE-13178) Enhance ORC Schema Evolution to handle more standard data type conversions

2016-03-15 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-13178:

Status: In Progress  (was: Patch Available)

> Enhance ORC Schema Evolution to handle more standard data type conversions
> --
>
> Key: HIVE-13178
> URL: https://issues.apache.org/jira/browse/HIVE-13178
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, ORC
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-13178.01.patch, HIVE-13178.02.patch, 
> HIVE-13178.03.patch
>
>
> Currently, SHORT -> INT -> BIGINT is supported.
> Handle the ORC data type conversions permitted as implicit conversions by the 
> TypeInfoUtils.implicitConvertible method.
>*   STRING_GROUP -> DOUBLE
>*   STRING_GROUP -> DECIMAL
>*   DATE_GROUP -> STRING
>*   NUMERIC_GROUP -> STRING
>*   STRING_GROUP -> STRING_GROUP
>*
>*   // Upward from "lower" type to "higher" numeric type:
>*   BYTE -> SHORT -> INT -> BIGINT -> FLOAT -> DOUBLE -> DECIMAL





[jira] [Updated] (HIVE-13178) Enhance ORC Schema Evolution to handle more standard data type conversions

2016-03-15 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-13178:

Status: Patch Available  (was: In Progress)

> Enhance ORC Schema Evolution to handle more standard data type conversions
> --
>
> Key: HIVE-13178
> URL: https://issues.apache.org/jira/browse/HIVE-13178
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, ORC
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-13178.01.patch, HIVE-13178.02.patch, 
> HIVE-13178.03.patch, HIVE-13178.04.patch
>
>
> Currently, SHORT -> INT -> BIGINT is supported.
> Handle the ORC data type conversions permitted as implicit conversions by the 
> TypeInfoUtils.implicitConvertible method.
>*   STRING_GROUP -> DOUBLE
>*   STRING_GROUP -> DECIMAL
>*   DATE_GROUP -> STRING
>*   NUMERIC_GROUP -> STRING
>*   STRING_GROUP -> STRING_GROUP
>*
>*   // Upward from "lower" type to "higher" numeric type:
>*   BYTE -> SHORT -> INT -> BIGINT -> FLOAT -> DOUBLE -> DECIMAL





[jira] [Updated] (HIVE-13178) Enhance ORC Schema Evolution to handle more standard data type conversions

2016-03-15 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-13178:

Attachment: HIVE-13178.04.patch

Rebase with recent commits.

> Enhance ORC Schema Evolution to handle more standard data type conversions
> --
>
> Key: HIVE-13178
> URL: https://issues.apache.org/jira/browse/HIVE-13178
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, ORC
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-13178.01.patch, HIVE-13178.02.patch, 
> HIVE-13178.03.patch, HIVE-13178.04.patch
>
>
> Currently, SHORT -> INT -> BIGINT is supported.
> Handle the ORC data type conversions permitted as implicit conversions by the 
> TypeInfoUtils.implicitConvertible method.
>*   STRING_GROUP -> DOUBLE
>*   STRING_GROUP -> DECIMAL
>*   DATE_GROUP -> STRING
>*   NUMERIC_GROUP -> STRING
>*   STRING_GROUP -> STRING_GROUP
>*
>*   // Upward from "lower" type to "higher" numeric type:
>*   BYTE -> SHORT -> INT -> BIGINT -> FLOAT -> DOUBLE -> DECIMAL





[jira] [Updated] (HIVE-13264) JDBC driver makes 2 Open Session Calls for every open session

2016-03-15 Thread NITHIN MAHESH (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

NITHIN MAHESH updated HIVE-13264:
-
   Labels: jdbc  (was: )
Affects Version/s: 1.2.1
 Target Version/s: 1.2.1, 1.2.0
 Tags: jdbc
   Status: Patch Available  (was: Open)

Refactored the code in HiveConnection to do the connection retry at openSession 
level instead of OpenConnection.

> JDBC driver makes 2 Open Session Calls for every open session
> -
>
> Key: HIVE-13264
> URL: https://issues.apache.org/jira/browse/HIVE-13264
> Project: Hive
>  Issue Type: Bug
>  Components: JDBC
>Affects Versions: 1.2.1
>Reporter: NITHIN MAHESH
>Assignee: NITHIN MAHESH
>  Labels: jdbc
> Attachments: HIVE-13264.patch
>
>
> When HTTP is used as the transport mode by the Hive JDBC driver, we noticed 
> that there is an additional open/close session just to validate the 
> connection. 
>  
> TCLIService.Iface client = new TCLIService.Client(new TBinaryProtocol(transport));
> TOpenSessionResp openResp = client.OpenSession(new TOpenSessionReq());
> if (openResp != null) {
>   client.CloseSession(new TCloseSessionReq(openResp.getSessionHandle()));
> }
>  
> The open session call is a costly one and should not be used to test 
> transport. 





[jira] [Updated] (HIVE-13288) Confusing exception message in DagUtils.localizeResource

2016-03-15 Thread Jeff Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Zhang updated HIVE-13288:
--
Affects Version/s: 1.2.1
  Component/s: Clients

> Confusing exception message in DagUtils.localizeResource
> 
>
> Key: HIVE-13288
> URL: https://issues.apache.org/jira/browse/HIVE-13288
> Project: Hive
>  Issue Type: Improvement
>  Components: Clients
>Affects Versions: 1.2.1
>Reporter: Jeff Zhang
>
> I got the following exception when querying through HiveServer2. Checking the 
> source code, it is due to an error when copying data from local to HDFS. 
> But the IOException is ignored and the code assumes that another thread is 
> also writing. I don't think it makes sense to assume that; the code should at 
> least log the IOException. 
> {code}
> LOG.info("Localizing resource because it does not exist: " + src + " to dest: " + dest);
> try {
>   destFS.copyFromLocalFile(false, false, src, dest);
> } catch (IOException e) {
>   LOG.info("Looks like another thread is writing the same file will wait.");
>   int waitAttempts = conf.getInt(
>       HiveConf.ConfVars.HIVE_LOCALIZE_RESOURCE_NUM_WAIT_ATTEMPTS.varname,
>       HiveConf.ConfVars.HIVE_LOCALIZE_RESOURCE_NUM_WAIT_ATTEMPTS.defaultIntVal);
>   long sleepInterval = HiveConf.getTimeVar(
>       conf, HiveConf.ConfVars.HIVE_LOCALIZE_RESOURCE_WAIT_INTERVAL,
>       TimeUnit.MILLISECONDS);
>   LOG.info("Number of wait attempts: " + waitAttempts + ". Wait interval: "
>       + sleepInterval);
>   boolean found = false;
> {code}
> {noformat}
> 2016-03-15 11:25:39,921 INFO  [HiveServer2-Background-Pool: Thread-249]: 
> tez.DagUtils (DagUtils.java:getHiveJarDirectory(876)) - Jar dir is 
> null/directory doesn't exist. Choosing HIVE_INSTALL_DIR - /user/jeff/.hiveJars
> 2016-03-15 11:25:40,058 INFO  [HiveServer2-Background-Pool: Thread-249]: 
> tez.DagUtils (DagUtils.java:localizeResource(952)) - Localizing resource 
> because it does not exist: 
> file:/usr/hdp/2.3.2.0-2950/hive/lib/hive-exec-1.2.1.2.3.2.0-2950.jar to dest: 
> hdfs://sandbox.hortonworks.com:8020/user/jeff/.hiveJars/hive-exec-1.2.1.2.3.2.0-2950-a97c953db414a4f792d868e2b0417578a61ccfa368048016926117b641b07f34.jar
> 2016-03-15 11:25:40,063 INFO  [HiveServer2-Background-Pool: Thread-249]: 
> tez.DagUtils (DagUtils.java:localizeResource(956)) - Looks like another 
> thread is writing the same file will wait.
> 2016-03-15 11:25:40,064 INFO  [HiveServer2-Background-Pool: Thread-249]: 
> tez.DagUtils (DagUtils.java:localizeResource(963)) - Number of wait attempts: 
> 5. Wait interval: 5000
> 2016-03-15 11:25:53,548 INFO  [HiveServer2-Handler-Pool: Thread-48]: 
> thrift.ThriftCLIService (ThriftCLIService.java:OpenSession(294)) - Client 
> protocol version: HIVE_CLI_SERVICE_PROTOCOL_V8
> 2016-03-15 11:25:53,548 INFO  [HiveServer2-Handler-Pool: Thread-48]: 
> metastore.HiveMetaStore (HiveMetaStore.java:logInfo(747)) - 1: Shutting down 
> the object store...
> 2016-03-15 11:25:53,549 INFO  [HiveServer2-Handler-Pool: Thread-48]: 
> HiveMetaStore.audit (HiveMetaStore.java:logAuditEvent(372)) - 
> ugi=hive/sandbox.hortonworks@example.com   ip=unknown-ip-addr  
> cmd=Shutting down the object store...
> 2016-03-15 11:25:53,549 INFO  [HiveServer2-Handler-Pool: Thread-48]: 
> metastore.HiveMetaStore (HiveMetaStore.java:logInfo(747)) - 1: Metastore 
> shutdown complete.
> 2016-03-15 11:25:53,549 INFO  [HiveServer2-Handler-Pool: Thread-48]: 
> HiveMetaStore.audit (HiveMetaStore.java:logAuditEvent(372)) - 
> ugi=hive/sandbox.hortonworks@example.com   ip=unknown-ip-addr  
> cmd=Metastore shutdown complete.
> 2016-03-15 11:25:53,573 INFO  [HiveServer2-Handler-Pool: Thread-48]: 
> session.SessionState (SessionState.java:createPath(641)) - Created local 
> directory: /tmp/e43fbaab-a659-4331-90cb-0ea0b2098e25_resources
> 2016-03-15 11:25:53,577 INFO  [HiveServer2-Handler-Pool: Thread-48]: 
> session.SessionState (SessionState.java:createPath(641)) - Created HDFS 
> directory: /tmp/hive/ambari-qa/e43fbaab-a659-4331-90cb-0ea0b2098e25
> 2016-03-15 11:25:53,582 INFO  [HiveServer2-Handler-Pool: Thread-48]: 
> session.SessionState (SessionState.java:createPath(641)) - Created local 
> directory: /tmp/hive/e43fbaab-a659-4331-90cb-0ea0b2098e25
> 2016-03-15 11:25:53,587 INFO  [HiveServer2-Handler-Pool: Thread-48]: 
> session.SessionState (SessionState.java:createPath(641)) - Created HDFS 
> directory: 
> /tmp/hive/ambari-qa/e43fbaab-a659-4331-90cb-0ea0b2098e25/_tmp_space.db
> 2016-03-15 11:25:53,592 INFO  [HiveServer2-Handler-Pool: Thread-48]: 
> session.HiveSessionImpl (HiveSessionImpl.java:setOperationLogSessionDir(236)) 
> - Operation log session directory is created: 
> /home/hive/${sy

[jira] [Commented] (HIVE-13027) Async loggers for LLAP

2016-03-15 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15196507#comment-15196507
 ] 

Prasanth Jayachandran commented on HIVE-13027:
--

I tried another query at TPCDS 1TB scale. The runtimes at the WARN and INFO 
levels are 18.1s and 18.5s respectively. One thing I am noticing is that 
the presence of the disruptor jar in the classpath itself triggers async logging 
without the -DLog4jContextSelector system property. I don't know why yet.  

> Async loggers for LLAP
> --
>
> Key: HIVE-13027
> URL: https://issues.apache.org/jira/browse/HIVE-13027
> Project: Hive
>  Issue Type: Improvement
>  Components: Logging
>Affects Versions: 2.1.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-13027.1.patch
>
>
> Log4j2's async logger claims to have 6-68 times better performance than the 
> synchronous logger. https://logging.apache.org/log4j/2.x/manual/async.html
> We should use that for LLAP. 





[jira] [Commented] (HIVE-12439) CompactionTxnHandler.markCleaned() and TxnHandler.openTxns() misc improvements

2016-03-15 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15196513#comment-15196513
 ] 

Eugene Koifman commented on HIVE-12439:
---

What about TestTxnCommands2? This is certainly a relevant test.

> CompactionTxnHandler.markCleaned() and TxnHandler.openTxns() misc improvements
> --
>
> Key: HIVE-12439
> URL: https://issues.apache.org/jira/browse/HIVE-12439
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore, Transactions
>Affects Versions: 1.0.0
>Reporter: Eugene Koifman
>Assignee: Wei Zheng
> Attachments: HIVE-12439.1.patch
>
>
> # add a safeguard to make sure IN clause is not too large; break up by txn id 
> to delete from TXN_COMPONENTS where tc_txnid in ...
> # TxnHandler. openTxns() - use 1 insert with many rows in values() clause, 
> rather than 1 DB roundtrip per row





[jira] [Updated] (HIVE-11388) Allow ACID Compactor components to run in multiple metastores

2016-03-15 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-11388:
--
Summary: Allow ACID Compactor components to run in multiple metastores  
(was: there should only be 1 Initiator for compactions per Hive installation)

> Allow ACID Compactor components to run in multiple metastores
> -
>
> Key: HIVE-11388
> URL: https://issues.apache.org/jira/browse/HIVE-11388
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 1.0.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Critical
> Attachments: HIVE-11388.patch
>
>
> org.apache.hadoop.hive.ql.txn.compactor.Initiator is a thread that runs 
> inside the metastore service to manage compactions of ACID tables.  There 
> should be exactly 1 instance of this thread (even with multiple Thrift 
> services).
> This is documented in 
> https://cwiki.apache.org/confluence/display/Hive/Hive+Transactions#HiveTransactions-Configuration
>  but not enforced.
> Should add enforcement, since more than 1 Initiator could cause concurrent 
> attempts to compact the same table/partition - which will not work.





[jira] [Updated] (HIVE-11388) Allow ACID Compactor components to run in multiple metastores

2016-03-15 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-11388:
--
Description: 
(this description is no longer accurate; see further comments)

org.apache.hadoop.hive.ql.txn.compactor.Initiator is a thread that runs inside 
the metastore service to manage compactions of ACID tables.  There should be 
exactly 1 instance of this thread (even with multiple Thrift services).

This is documented in 
https://cwiki.apache.org/confluence/display/Hive/Hive+Transactions#HiveTransactions-Configuration
 but not enforced.

Should add enforcement, since more than 1 Initiator could cause concurrent 
attempts to compact the same table/partition - which will not work.

  was:
org.apache.hadoop.hive.ql.txn.compactor.Initiator is a thread that runs inside 
the metastore service to manage compactions of ACID tables.  There should be 
exactly 1 instance of this thread (even with multiple Thrift services).

This is documented in 
https://cwiki.apache.org/confluence/display/Hive/Hive+Transactions#HiveTransactions-Configuration
 but not enforced.

Should add enforcement, since more than 1 Initiator could cause concurrent 
attempts to compact the same table/partition - which will not work.


> Allow ACID Compactor components to run in multiple metastores
> -
>
> Key: HIVE-11388
> URL: https://issues.apache.org/jira/browse/HIVE-11388
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 1.0.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Critical
> Attachments: HIVE-11388.patch
>
>
> (this description is no longer accurate; see further comments)
> org.apache.hadoop.hive.ql.txn.compactor.Initiator is a thread that runs 
> inside the metastore service to manage compactions of ACID tables.  There 
> should be exactly 1 instance of this thread (even with multiple Thrift 
> services).
> This is documented in 
> https://cwiki.apache.org/confluence/display/Hive/Hive+Transactions#HiveTransactions-Configuration
>  but not enforced.
> Should add enforcement, since more than 1 Initiator could cause concurrent 
> attempts to compact the same table/partition - which will not work.





[jira] [Commented] (HIVE-12439) CompactionTxnHandler.markCleaned() and TxnHandler.openTxns() misc improvements

2016-03-15 Thread Wei Zheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15196525#comment-15196525
 ] 

Wei Zheng commented on HIVE-12439:
--

Oh that's not a real failure. It's complaining about no TEST-*.xml file.

> CompactionTxnHandler.markCleaned() and TxnHandler.openTxns() misc improvements
> --
>
> Key: HIVE-12439
> URL: https://issues.apache.org/jira/browse/HIVE-12439
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore, Transactions
>Affects Versions: 1.0.0
>Reporter: Eugene Koifman
>Assignee: Wei Zheng
> Attachments: HIVE-12439.1.patch
>
>
> # add a safeguard to make sure IN clause is not too large; break up by txn id 
> to delete from TXN_COMPONENTS where tc_txnid in ...
> # TxnHandler. openTxns() - use 1 insert with many rows in values() clause, 
> rather than 1 DB roundtrip per row





[jira] [Commented] (HIVE-12439) CompactionTxnHandler.markCleaned() and TxnHandler.openTxns() misc improvements

2016-03-15 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15196536#comment-15196536
 ] 

Eugene Koifman commented on HIVE-12439:
---

BTW, this patch no longer applies to current master

> CompactionTxnHandler.markCleaned() and TxnHandler.openTxns() misc improvements
> --
>
> Key: HIVE-12439
> URL: https://issues.apache.org/jira/browse/HIVE-12439
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore, Transactions
>Affects Versions: 1.0.0
>Reporter: Eugene Koifman
>Assignee: Wei Zheng
> Attachments: HIVE-12439.1.patch
>
>
> # add a safeguard to make sure IN clause is not too large; break up by txn id 
> to delete from TXN_COMPONENTS where tc_txnid in ...
> # TxnHandler. openTxns() - use 1 insert with many rows in values() clause, 
> rather than 1 DB roundtrip per row





[jira] [Updated] (HIVE-12049) Provide an option to write serialized thrift objects in final tasks

2016-03-15 Thread Rohit Dholakia (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohit Dholakia updated HIVE-12049:
--
Attachment: HIVE-12049.14.patch

> Provide an option to write serialized thrift objects in final tasks
> ---
>
> Key: HIVE-12049
> URL: https://issues.apache.org/jira/browse/HIVE-12049
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2
>Reporter: Rohit Dholakia
>Assignee: Rohit Dholakia
> Attachments: HIVE-12049.1.patch, HIVE-12049.11.patch, 
> HIVE-12049.12.patch, HIVE-12049.13.patch, HIVE-12049.14.patch, 
> HIVE-12049.2.patch, HIVE-12049.3.patch, HIVE-12049.4.patch, 
> HIVE-12049.5.patch, HIVE-12049.6.patch, HIVE-12049.7.patch, HIVE-12049.9.patch
>
>
> For each fetch request to HiveServer2, we pay the penalty of deserializing 
> the row objects and translating them into a different representation suitable 
> for the RPC transfer. In moderate to high concurrency scenarios, this can 
> result in significant CPU and memory wastage. By having each task write the 
> appropriate thrift objects to the output files, HiveServer2 can simply stream 
> a batch of rows on the wire without incurring any of the additional cost of 
> deserialization and translation. 
> This can be implemented by writing a new SerDe, which the FileSinkOperator 
> can use to write thrift formatted row batches to the output file. Using the 
> pluggable property of the {{hive.query.result.fileformat}}, we can set it to 
> use SequenceFile and write a batch of thrift formatted rows as a value blob. 
> The FetchTask can now simply read the blob and send it over the wire. On the 
> client side, the *DBC driver can read the blob and since it is already 
> formatted in the way it expects, it can continue building the ResultSet the 
> way it does in the current implementation.





[jira] [Updated] (HIVE-11675) make use of file footer PPD API in ETL strategy or separate strategy

2016-03-15 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-11675:

   Resolution: Fixed
Fix Version/s: 2.1.0
   Status: Resolved  (was: Patch Available)

Committed to master after resolving conflicts.

> make use of file footer PPD API in ETL strategy or separate strategy
> 
>
> Key: HIVE-11675
> URL: https://issues.apache.org/jira/browse/HIVE-11675
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Fix For: 2.1.0
>
> Attachments: HIVE-11675.01.patch, HIVE-11675.02.patch, 
> HIVE-11675.03.patch, HIVE-11675.04.patch, HIVE-11675.05.patch, 
> HIVE-11675.06.patch, HIVE-11675.07.patch, HIVE-11675.08.patch, 
> HIVE-11675.09.patch, HIVE-11675.10.patch, HIVE-11675.11.patch, 
> HIVE-11675.12.patch, HIVE-11675.13.patch, HIVE-11675.14.patch, 
> HIVE-11675.patch, HIVE-11675.premature.opti.patch
>
>
> Need to take a look at the best flow. It won't be much different if we do 
> filtering metastore call for each partition. So perhaps we'd need the custom 
> sync point/batching after all.
> Or we can make it opportunistic and not fetch any footers unless it can be 
> pushed down to metastore or fetched from local cache, that way the only slow 
> threaded op is directory listings





[jira] [Commented] (HIVE-13249) Hard upper bound on number of open transactions

2016-03-15 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15196611#comment-15196611
 ] 

Eugene Koifman commented on HIVE-13249:
---

AcidHouseKeeper uses a ScheduledExecutorService.  It can run multiple tasks on 
separate schedules.  I don't think there was ever a proposal to run any 
threads on the client side.

The idea Wei and I discussed was to run a single thread per metastore JVM to 
count the number of open txns periodically and check the computed value in 
TxnHandler.openTxns() each time it's called.
I think this is conceptually the same as your idea.

It's easy enough to have HouseKeeper run multiple tasks, but it complicates 
testing since it makes it harder to just run one iteration of a particular 
task.  We'd need to do some refactoring in HouseKeepers to make sure this is 
possible - then they can be combined into a single HouseKeeper that runs 
multiple periodic tasks.

Wei, I said earlier that putting this computation in AcidHouseKeeper was a bad 
idea but I was wrong.  Since there is a single AcidHouseKeeper per JVM, the 
task that it runs can easily just set a static variable on TxnHandler with 
results of the computation which openTxns() can read.

As far as testing goes, look at TestTxnHandler, for example: it has multiple 
examples of openTxns() calls.  In fact, each call can open many txns at once.  
TestTxnCommands.testTimeOutReaper() has an example of how to run the 
HouseKeeper in a UT, but like I said, you'd need to refactor it a bit if you want 
to run multiple tasks in it.
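
The approach described above could be sketched roughly as follows. This is only an illustration under stated assumptions: OpenTxnLimiter, runOnce(), checkCanOpen() and the limit value are hypothetical names, not the actual AcidHouseKeeper or TxnHandler code, and the counting query is replaced by a supplier.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.LongSupplier;

// Hypothetical sketch; none of these names come from the Hive code base.
class OpenTxnLimiter {
  // Written by the periodic task, read by every openTxns()-style call.
  private static volatile long lastCountedOpenTxns = 0;
  // Assumed limit; a real implementation would read it from configuration.
  private static volatile long maxOpenTxns = 1000;

  static void setMaxOpenTxns(long max) {
    maxOpenTxns = max;
  }

  /** One iteration of the counting task, callable directly from a unit test. */
  static void runOnce(LongSupplier countOpenTxns) {
    lastCountedOpenTxns = countOpenTxns.getAsLong();
  }

  /** Schedules the counting task on a single thread for this JVM. */
  static ScheduledExecutorService start(LongSupplier countOpenTxns, long periodMs) {
    ScheduledExecutorService pool = Executors.newSingleThreadScheduledExecutor();
    pool.scheduleAtFixedRate(() -> runOnce(countOpenTxns), 0, periodMs, TimeUnit.MILLISECONDS);
    return pool;
  }

  /** Check made before opening numTxns new transactions; throws if the limit would be exceeded. */
  static void checkCanOpen(int numTxns) {
    long counted = lastCountedOpenTxns;
    if (counted + numTxns > maxOpenTxns) {
      throw new IllegalStateException(
          "Open transaction limit reached: " + counted + " counted, limit " + maxOpenTxns);
    }
  }
}
```

Keeping runOnce() separate from the scheduling is what lets a test run exactly one iteration of the task, which is the refactoring concern raised above. The counted value is only as fresh as the last run, so the bound is approximate between iterations.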


> Hard upper bound on number of open transactions
> ---
>
> Key: HIVE-13249
> URL: https://issues.apache.org/jira/browse/HIVE-13249
> Project: Hive
>  Issue Type: Improvement
>  Components: Transactions
>Affects Versions: 2.0.0
>Reporter: Wei Zheng
>Assignee: Wei Zheng
> Attachments: HIVE-13249.1.patch
>
>
> We need to have a safeguard by adding an upper bound for open transactions to 
> avoid huge number of open-transaction requests, usually due to improper 
> configuration of clients such as Storm.
> Once that limit is reached, clients will start failing.





[jira] [Commented] (HIVE-13084) Vectorization add support for PROJECTION Multi-AND/OR

2016-03-15 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15196615#comment-15196615
 ] 

Hive QA commented on HIVE-13084:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12793547/HIVE-13084.03.patch

{color:green}SUCCESS:{color} +1 due to 4 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 9 failed/errored test(s), 9800 tests executed
*Failed tests:*
{noformat}
TestMiniTezCliDriver-vector_decimal_10_0.q-vector_acid3.q-vector_decimal_trailing.q-and-12-more
 - did not produce a TEST-*.xml file
TestMiniTezCliDriver-vector_distinct_2.q-vector_interval_2.q-load_dyn_part2.q-and-12-more
 - did not produce a TEST-*.xml file
TestSparkCliDriver-groupby3_map.q-sample2.q-auto_join14.q-and-12-more - did not 
produce a TEST-*.xml file
TestSparkCliDriver-groupby_map_ppr_multi_distinct.q-table_access_keys_stats.q-groupby4_noskew.q-and-12-more
 - did not produce a TEST-*.xml file
TestSparkCliDriver-join_rc.q-insert1.q-vectorized_rcfile_columnar.q-and-12-more 
- did not produce a TEST-*.xml file
TestSparkCliDriver-ppd_join4.q-join9.q-ppd_join3.q-and-12-more - did not 
produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_multi_and_projection
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_multi_or_projection
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_multi_and_projection
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/7277/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/7277/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-7277/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 9 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12793547 - PreCommit-HIVE-TRUNK-Build

> Vectorization add support for PROJECTION Multi-AND/OR
> -
>
> Key: HIVE-13084
> URL: https://issues.apache.org/jira/browse/HIVE-13084
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Reporter: Rajesh Balamohan
>Assignee: Matt McCline
> Attachments: HIVE-13084.01.patch, HIVE-13084.02.patch, 
> HIVE-13084.03.patch, vector_between_date.q
>
>
> When there is a case statement in the group by, Hive throws an "unable to 
> vectorize" exception.
> Example query, just to demonstrate the problem:
> {noformat}
> explain select l_partkey, case when l_commitdate between '2015-06-30' AND 
> '2015-07-06' THEN '2015-06-30' END as wk from lineitem_test_l_shipdate_ts 
> group by l_partkey, case when l_commitdate between '2015-06-30' AND 
> '2015-07-06' THEN '2015-06-30' END;
> org.apache.hadoop.hive.ql.metadata.HiveException: Could not vectorize 
> expression: org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc
> Vertex dependency in root stage
> Reducer 2 <- Map 1 (SIMPLE_EDGE)
> Stage-0
>   Fetch Operator
> limit:-1
> Stage-1
>   Reducer 2
>   File Output Operator [FS_7]
> Group By Operator [GBY_5] (rows=888777234 width=108)
>   Output:["_col0","_col1"],keys:KEY._col0, KEY._col1
> <-Map 1 [SIMPLE_EDGE]
>   SHUFFLE [RS_4]
> PartitionCols:_col0, _col1
> Group By Operator [GBY_3] (rows=1777554469 width=108)
>   Output:["_col0","_col1"],keys:_col0, _col1
>   Select Operator [SEL_1] (rows=1777554469 width=108)
> Output:["_col0","_col1"]
> TableScan [TS_0] (rows=1777554469 width=108)
>   
> rajesh@lineitem_test_l_shipdate_ts,lineitem_test_l_shipdate_ts,Tbl:COMPLETE,Col:NONE,Output:["l_partkey","l_commitdate"]
> {noformat}
> \cc [~mmccline], [~gopalv]





[jira] [Updated] (HIVE-13291) ORC BI Split strategy should consider block size instead of file size

2016-03-15 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-13291:
-
Attachment: HIVE-13291.1.patch

> ORC BI Split strategy should consider block size instead of file size
> -
>
> Key: HIVE-13291
> URL: https://issues.apache.org/jira/browse/HIVE-13291
> Project: Hive
>  Issue Type: Bug
>  Components: ORC
>Affects Versions: 2.1.0
>Reporter: Gopal V
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-13291.1.patch
>
>
> When we force split strategy to use "BI" (using 
> hive.exec.orc.split.strategy), entire file is considered as single split. 
> This might be inefficient when the files are large. Instead, BI should 
> consider splitting at block boundary. 
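
A minimal sketch of what splitting at block boundaries could look like, assuming a fixed block size. BlockBoundarySplits and splits() are hypothetical names; the attached patch is not reproduced here and may instead work from the file system's block locations and offsets.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; not the code from the HIVE-13291 patch.
class BlockBoundarySplits {
  /**
   * Returns {offset, length} pairs that cover [0, fileLength),
   * one pair per block of at most blockSize bytes.
   */
  static List<long[]> splits(long fileLength, long blockSize) {
    if (blockSize <= 0) {
      throw new IllegalArgumentException("blockSize must be positive");
    }
    List<long[]> result = new ArrayList<>();
    for (long offset = 0; offset < fileLength; offset += blockSize) {
      // The last split may be shorter than a full block.
      long length = Math.min(blockSize, fileLength - offset);
      result.add(new long[] {offset, length});
    }
    return result;
  }
}
```

With this shape, a 250-byte file and a 100-byte block size give three splits instead of one split covering the whole file.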





[jira] [Updated] (HIVE-13291) ORC BI Split strategy should consider block size instead of file size

2016-03-15 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-13291:
-
Status: Patch Available  (was: Open)

> ORC BI Split strategy should consider block size instead of file size
> -
>
> Key: HIVE-13291
> URL: https://issues.apache.org/jira/browse/HIVE-13291
> Project: Hive
>  Issue Type: Bug
>  Components: ORC
>Affects Versions: 2.1.0
>Reporter: Gopal V
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-13291.1.patch
>
>
> When we force split strategy to use "BI" (using 
> hive.exec.orc.split.strategy), entire file is considered as single split. 
> This might be inefficient when the files are large. Instead, BI should 
> consider splitting at block boundary. 





[jira] [Commented] (HIVE-13291) ORC BI Split strategy should consider block size instead of file size

2016-03-15 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15196695#comment-15196695
 ] 

Prasanth Jayachandran commented on HIVE-13291:
--

[~gopalv] Could you please review this patch?

> ORC BI Split strategy should consider block size instead of file size
> -
>
> Key: HIVE-13291
> URL: https://issues.apache.org/jira/browse/HIVE-13291
> Project: Hive
>  Issue Type: Bug
>  Components: ORC
>Affects Versions: 2.1.0
>Reporter: Gopal V
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-13291.1.patch
>
>
> When we force split strategy to use "BI" (using 
> hive.exec.orc.split.strategy), entire file is considered as single split. 
> This might be inefficient when the files are large. Instead, BI should 
> consider splitting at block boundary. 





[jira] [Commented] (HIVE-13291) ORC BI Split strategy should consider block size instead of file size

2016-03-15 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15196711#comment-15196711
 ] 

Prasanth Jayachandran commented on HIVE-13291:
--

[~gopalv] Addressed your review comments about using locations with offsets.

> ORC BI Split strategy should consider block size instead of file size
> -
>
> Key: HIVE-13291
> URL: https://issues.apache.org/jira/browse/HIVE-13291
> Project: Hive
>  Issue Type: Bug
>  Components: ORC
>Affects Versions: 2.1.0
>Reporter: Gopal V
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-13291.1.patch, HIVE-13291.2.patch
>
>
> When we force split strategy to use "BI" (using 
> hive.exec.orc.split.strategy), entire file is considered as single split. 
> This might be inefficient when the files are large. Instead, BI should 
> consider splitting at block boundary. 





[jira] [Updated] (HIVE-13291) ORC BI Split strategy should consider block size instead of file size

2016-03-15 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-13291:
-
Attachment: HIVE-13291.2.patch

> ORC BI Split strategy should consider block size instead of file size
> -
>
> Key: HIVE-13291
> URL: https://issues.apache.org/jira/browse/HIVE-13291
> Project: Hive
>  Issue Type: Bug
>  Components: ORC
>Affects Versions: 2.1.0
>Reporter: Gopal V
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-13291.1.patch, HIVE-13291.2.patch
>
>
> When we force split strategy to use "BI" (using 
> hive.exec.orc.split.strategy), entire file is considered as single split. 
> This might be inefficient when the files are large. Instead, BI should 
> consider splitting at block boundary. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13291) ORC BI Split strategy should consider block size instead of file size

2016-03-15 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15196717#comment-15196717
 ] 

Gopal V commented on HIVE-13291:


Left some minor comments about that loop.

Approach LGTM - +1, tests pending.

> ORC BI Split strategy should consider block size instead of file size
> -
>
> Key: HIVE-13291
> URL: https://issues.apache.org/jira/browse/HIVE-13291
> Project: Hive
>  Issue Type: Bug
>  Components: ORC
>Affects Versions: 2.1.0
>Reporter: Gopal V
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-13291.1.patch, HIVE-13291.2.patch
>
>
> When we force split strategy to use "BI" (using 
> hive.exec.orc.split.strategy), entire file is considered as single split. 
> This might be inefficient when the files are large. Instead, BI should 
> consider splitting at block boundary. 





[jira] [Commented] (HIVE-12439) CompactionTxnHandler.markCleaned() and TxnHandler.openTxns() misc improvements

2016-03-15 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15196731#comment-15196731
 ] 

Eugene Koifman commented on HIVE-12439:
---

1. CompactionTxnHandler.cleanEmptyAborted() - why rewrite 
"String s = "select txn_id from TXNS where " +
  "txn_id not in (select tc_txnid from TXN_COMPONENTS) and " +
  "txn_state = '" + TXN_ABORTED + "'";"
The IN clause here doesn't list values - it's not (cannot in fact be) subject 
to 1000 or any other limit.
Also, part of your rewrite lost
"LOG.info("Removed " + rc + "  empty Aborted transactions: " + txnIdBatch + " 
from TXNS");"
This is a critical debug/support log statement - it logs the actual txn IDs 
that were cleared.

2. TxnHandler.openTxns()
"  if (i > first) {
valuesClause.append(", ");
  }
"
this will generate a query with "values,(..." if the previous "if" with 
METASTORE_DIRECT_SQL_MAX_ELEMENTS_VALUES_CLAUSE executes.
This is a nit but this class has quoteString() and quoteChar() to generate SQL 
with string values

3. TxnHandler.timeOutLocks() - why does this need a suffix at all?  The extra 
parentheses seem redundant.
4. TxnHandler.abortTxns() - there seems to be a redundant set of parentheses 
wrapping the IN clause.  Why is this necessary?
5. TestTxnUtils - I think this test is very limited.  It would be better (in 
addition) to add some tests that will actually cause the new queries to execute 
in a DB (Derby in practice).  In particular, once the 2 new properties are 
exceeded.  I think that would provide better test coverage.
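
To illustrate the separator problem from point 2, here is a rough sketch of batching rows into several INSERT statements where the comma depends on the position inside the current batch rather than on the overall row index. ValuesClauseBatcher and buildInserts() are hypothetical names, not code from the patch, and the rows are assumed to be already formatted and quoted.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; not the code from HIVE-12439.1.patch.
class ValuesClauseBatcher {
  /**
   * Builds one INSERT statement per batch of at most maxRowsPerStatement rows.
   * Each row is an already formatted tuple such as "(1, 'o')".
   */
  static List<String> buildInserts(String insertPrefix, List<String> rows,
      int maxRowsPerStatement) {
    if (maxRowsPerStatement <= 0) {
      throw new IllegalArgumentException("maxRowsPerStatement must be positive");
    }
    List<String> statements = new ArrayList<>();
    StringBuilder valuesClause = new StringBuilder();
    int rowsInBatch = 0;
    for (String row : rows) {
      // Separator is decided per batch, so a new statement never starts with ", ".
      if (rowsInBatch > 0) {
        valuesClause.append(", ");
      }
      valuesClause.append(row);
      rowsInBatch++;
      if (rowsInBatch == maxRowsPerStatement) {
        statements.add(insertPrefix + " values " + valuesClause);
        valuesClause.setLength(0);
        rowsInBatch = 0;
      }
    }
    // Flush the final, possibly partial, batch.
    if (rowsInBatch > 0) {
      statements.add(insertPrefix + " values " + valuesClause);
    }
    return statements;
  }
}
```

A test along the lines suggested in point 5 would then run the generated statements against Derby with the batch size set low enough that more than one statement is produced.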

> CompactionTxnHandler.markCleaned() and TxnHandler.openTxns() misc improvements
> --
>
> Key: HIVE-12439
> URL: https://issues.apache.org/jira/browse/HIVE-12439
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore, Transactions
>Affects Versions: 1.0.0
>Reporter: Eugene Koifman
>Assignee: Wei Zheng
> Attachments: HIVE-12439.1.patch
>
>
> # add a safeguard to make sure IN clause is not too large; break up by txn id 
> to delete from TXN_COMPONENTS where tc_txnid in ...
> # TxnHandler. openTxns() - use 1 insert with many rows in values() clause, 
> rather than 1 DB roundtrip per row





[jira] [Updated] (HIVE-13291) ORC BI Split strategy should consider block size instead of file size

2016-03-15 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-13291:
-
Attachment: HIVE-13291.3.patch

Addressed [~gopalv]'s RB comments.

> ORC BI Split strategy should consider block size instead of file size
> -
>
> Key: HIVE-13291
> URL: https://issues.apache.org/jira/browse/HIVE-13291
> Project: Hive
>  Issue Type: Bug
>  Components: ORC
>Affects Versions: 2.1.0
>Reporter: Gopal V
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-13291.1.patch, HIVE-13291.2.patch
>
>
> When we force split strategy to use "BI" (using 
> hive.exec.orc.split.strategy), entire file is considered as single split. 
> This might be inefficient when the files are large. Instead, BI should 
> consider splitting at block boundary. 





[jira] [Updated] (HIVE-13291) ORC BI Split strategy should consider block size instead of file size

2016-03-15 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-13291:
-
Attachment: (was: HIVE-13291.3.patch)

> ORC BI Split strategy should consider block size instead of file size
> -
>
> Key: HIVE-13291
> URL: https://issues.apache.org/jira/browse/HIVE-13291
> Project: Hive
>  Issue Type: Bug
>  Components: ORC
>Affects Versions: 2.1.0
>Reporter: Gopal V
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-13291.1.patch, HIVE-13291.2.patch
>
>
> When we force split strategy to use "BI" (using 
> hive.exec.orc.split.strategy), entire file is considered as single split. 
> This might be inefficient when the files are large. Instead, BI should 
> consider splitting at block boundary. 





[jira] [Updated] (HIVE-13291) ORC BI Split strategy should consider block size instead of file size

2016-03-15 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-13291:
-
Attachment: HIVE-13291.3.patch

> ORC BI Split strategy should consider block size instead of file size
> -
>
> Key: HIVE-13291
> URL: https://issues.apache.org/jira/browse/HIVE-13291
> Project: Hive
>  Issue Type: Bug
>  Components: ORC
>Affects Versions: 2.1.0
>Reporter: Gopal V
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-13291.1.patch, HIVE-13291.2.patch, 
> HIVE-13291.3.patch
>
>
> When we force split strategy to use "BI" (using 
> hive.exec.orc.split.strategy), entire file is considered as single split. 
> This might be inefficient when the files are large. Instead, BI should 
> consider splitting at block boundary. 





[jira] [Commented] (HIVE-12977) Pass credentials in the current UGI while creating Tez session

2016-03-15 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15196769#comment-15196769
 ] 

Hive QA commented on HIVE-12977:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12793407/HIVE-12977.1.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 20 failed/errored test(s), 9829 tests 
executed
*Failed tests:*
{noformat}
TestSparkCliDriver-groupby3_map.q-sample2.q-auto_join14.q-and-12-more - did not 
produce a TEST-*.xml file
TestSparkCliDriver-groupby_map_ppr_multi_distinct.q-table_access_keys_stats.q-groupby4_noskew.q-and-12-more
 - did not produce a TEST-*.xml file
TestSparkCliDriver-join_rc.q-insert1.q-vectorized_rcfile_columnar.q-and-12-more 
- did not produce a TEST-*.xml file
TestSparkCliDriver-ppd_join4.q-join9.q-ppd_join3.q-and-12-more - did not 
produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.org.apache.hadoop.hive.cli.TestMiniTezCliDriver
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_cbo_limit
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_optimization
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_hybridgrace_hashjoin_1
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_llap_nullscan
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_tez_join_result_complex
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_tez_union_decimal
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_tez_union_multiinsert
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_date_1
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_decimal_3
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_decimal_aggregate
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_decimal_expressions
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_groupby_mapjoin
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_partition_diff_num_cols
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_reduce2
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vectorization_1
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/7279/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/7279/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-7279/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 20 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12793407 - PreCommit-HIVE-TRUNK-Build

> Pass credentials in the current UGI while creating Tez session
> --
>
> Key: HIVE-12977
> URL: https://issues.apache.org/jira/browse/HIVE-12977
> Project: Hive
>  Issue Type: Bug
>  Components: Tez
>Reporter: Vinoth Sathappan
>Assignee: Vinoth Sathappan
> Attachments: HIVE-12977.1.patch, HIVE-12977.1.patch
>
>
> The credentials present in the current UGI i.e. 
> UserGroupInformation.getCurrentUser().getCredentials() isn't passed to the 
> Tez session. It is instantiated with null credentials 
> session = TezClient.create("HIVE-" + sessionId, tezConfig, true,
> commonLocalResources, null);
> In this case, Tez fails to access resources even if the tokens are available 
> in memory.





[jira] [Commented] (HIVE-13277) Exception "Unable to create serializer 'org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer' " occurred during query execution on spark engine when ve

2016-03-15 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15196806#comment-15196806
 ] 

Rui Li commented on HIVE-13277:
---

Pinging [~xuefuz]

> Exception "Unable to create serializer 
> 'org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer' " 
> occurred during query execution on spark engine when vectorized execution is 
> switched on
> -
>
> Key: HIVE-13277
> URL: https://issues.apache.org/jira/browse/HIVE-13277
> Project: Hive
>  Issue Type: Bug
> Environment: Hive on Spark engine
> Hive Version: Apache Hive 2.0.0
> Spark Version: Apache Spark 1.6.0
>Reporter: Xin Hao
>
> Found when executing TPCx-BB query2 on the Spark engine with vectorized 
> execution switched on:
> (1) set hive.vectorized.execution.enabled=true; 
> (2) set hive.vectorized.execution.reduce.enabled=true; (default value for 
> Apache Hive 2.0.0)
> It's OK for spark engine when hive.vectorized.execution.enabled is switched 
> off:
> (1) set hive.vectorized.execution.enabled=false;
> (2) set hive.vectorized.execution.reduce.enabled=true;
> For MR engine, the query could pass and no exception occurred when vectorized 
> execution is either switched on or switched off.
> Detail Error Message is below:
> {noformat}
> 2016-03-14T10:09:33,692 INFO  [stderr-redir-1]: client.SparkClientImpl 
> (SparkClientImpl.java:run(593)) - 16/03/14 10:09:33 INFO 
> spark.MapOutputTrackerMaster: Size of output statuses for shuffle 0 is 154 
> bytes
> 2016-03-14T10:09:33,818 INFO  [stderr-redir-1]: client.SparkClientImpl 
> (SparkClientImpl.java:run(593)) - 16/03/14 10:09:33 WARN 
> scheduler.TaskSetManager: Lost task 0.0 in stage 4.0 (TID 25, bhx3): 
> java.lang.RuntimeException: Failed to load plan: 
> hdfs://bhx3:8020/tmp/hive/root/40b90ebd-32d4-47bc-a5ab-12ff1c05d0d2/hive_2016-03-14_10-08-56_307_7692316402338632647-1/-mr-10002/ab0c0021-0c1a-496e-9703-87d5879353c8/reduce.xml:
>  org.apache.hive.com.esotericsoftware.kryo.KryoException: 
> java.lang.IllegalArgumentException: Unable to create serializer 
> "org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer" for 
> class: org.apache.hadoop.hive.ql.exec.vector.VectorFileSinkOperator
> 2016-03-14T10:09:33,818 INFO  [stderr-redir-1]: client.SparkClientImpl 
> (SparkClientImpl.java:run(593)) - Serialization trace:
> 2016-03-14T10:09:33,818 INFO  [stderr-redir-1]: client.SparkClientImpl 
> (SparkClientImpl.java:run(593)) - childOperators 
> (org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator)
> 2016-03-14T10:09:33,818 INFO  [stderr-redir-1]: client.SparkClientImpl 
> (SparkClientImpl.java:run(593)) - childOperators 
> (org.apache.hadoop.hive.ql.exec.vector.VectorLimitOperator)
> 2016-03-14T10:09:33,818 INFO  [stderr-redir-1]: client.SparkClientImpl 
> (SparkClientImpl.java:run(593)) - childOperators 
> (org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator)
> 2016-03-14T10:09:33,818 INFO  [stderr-redir-1]: client.SparkClientImpl 
> (SparkClientImpl.java:run(593)) - reducer 
> (org.apache.hadoop.hive.ql.plan.ReduceWork)
> 2016-03-14T10:09:33,818 INFO  [stderr-redir-1]: client.SparkClientImpl 
> (SparkClientImpl.java:run(593)) -at 
> org.apache.hadoop.hive.ql.exec.Utilities.getBaseWork(Utilities.java:451)
> 2016-03-14T10:09:33,818 INFO  [stderr-redir-1]: client.SparkClientImpl 
> (SparkClientImpl.java:run(593)) -at 
> org.apache.hadoop.hive.ql.exec.Utilities.getReduceWork(Utilities.java:306)
> 2016-03-14T10:09:33,819 INFO  [stderr-redir-1]: client.SparkClientImpl 
> (SparkClientImpl.java:run(593)) -at 
> org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.init(SparkReduceRecordHandler.java:117)
> 2016-03-14T10:09:33,819 INFO  [stderr-redir-1]: client.SparkClientImpl 
> (SparkClientImpl.java:run(593)) -at 
> org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunction.call(HiveReduceFunction.java:46)
> 2016-03-14T10:09:33,819 INFO  [stderr-redir-1]: client.SparkClientImpl 
> (SparkClientImpl.java:run(593)) -at 
> org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunction.call(HiveReduceFunction.java:28)
> 2016-03-14T10:09:33,819 INFO  [stderr-redir-1]: client.SparkClientImpl 
> (SparkClientImpl.java:run(593)) -at 
> org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:192)
> 2016-03-14T10:09:33,819 INFO  [stderr-redir-1]: client.SparkClientImpl 
> (SparkClientImpl.java:run(593)) -at 
> org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:192)
> 2016-03-14T10:09:33,819 INFO  [stderr-redir-1]: client.Sp

[jira] [Commented] (HIVE-12612) beeline always exits with 0 status when reading query from standard input

2016-03-15 Thread stephen sprague (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15196814#comment-15196814
 ] 

stephen sprague commented on HIVE-12612:


[~psequeirag] - hey, thanks for the workaround suggestion. We too use tons of 
here documents, and this is a showstopper.

/dev/stdin should be treated the same as the -f flag (a file) when /dev/stdin 
is not a tty. :)
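
A minimal sketch of the behaviour suggested above, written as a shell wrapper rather than as a change to beeline itself: when neither -e nor -f is given and standard input is not a terminal, the wrapper passes -f /dev/stdin so the input is read as a script file. The function name run_beeline and the BEELINE_CMD override are hypothetical names introduced for illustration; only the -u, -e and -f options come from the report quoted below.

```shell
# Hypothetical wrapper, not part of beeline.
# BEELINE_CMD is an assumed hook so the command can be swapped out in tests.
run_beeline() {
    _beeline_cmd="${BEELINE_CMD:-beeline}"

    # If the caller already chose -e or -f, pass the arguments through unchanged.
    for _arg in "$@"; do
        case "$_arg" in
            -e|-f)
                "$_beeline_cmd" "$@"
                return
                ;;
        esac
    done

    if [ -t 0 ]; then
        # stdin is a terminal: keep the normal interactive behaviour.
        "$_beeline_cmd" "$@"
    else
        # stdin is a pipe, a file, or a here document: read it as a script file.
        "$_beeline_cmd" "$@" -f /dev/stdin
    fi
}
```

Called as run_beeline -u "jdbc:hive2://..." with a here document on standard input, the wrapper takes the -f /dev/stdin path, and the function returns whatever exit status the underlying command reports.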

> beeline always exits with 0 status when reading query from standard input
> -
>
> Key: HIVE-12612
> URL: https://issues.apache.org/jira/browse/HIVE-12612
> Project: Hive
>  Issue Type: Bug
>  Components: Beeline
>Affects Versions: 1.1.0
> Environment: CDH5.5.0
>Reporter: Paulo Sequeira
>Priority: Minor
>
> Similar to what was reported on HIVE-6978, but now it only happens when the 
> query is read from the standard input. For example, the following fails as 
> expected:
> {code}
> bash$ if beeline -u "jdbc:hive2://..." -e "boo;" ; then echo "Ok?!" ; else 
> echo "Failed!" ; fi
> Connecting to jdbc:hive2://...
> Connected to: Apache Hive (version 1.1.0-cdh5.5.0)
> Driver: Hive JDBC (version 1.1.0-cdh5.5.0)
> Transaction isolation: TRANSACTION_REPEATABLE_READ
> Error: Error while compiling statement: FAILED: ParseException line 1:0 
> cannot recognize input near 'boo' '' '' (state=42000,code=4)
> Closing: 0: jdbc:hive2://...
> Failed!
> {code}
> But the following does not:
> {code}
> bash$ if echo "boo;"|beeline -u "jdbc:hive2://..." ; then echo "Ok?!" ; else 
> echo "Failed!" ; fi
> Connecting to jdbc:hive2://...
> Connected to: Apache Hive (version 1.1.0-cdh5.5.0)
> Driver: Hive JDBC (version 1.1.0-cdh5.5.0)
> Transaction isolation: TRANSACTION_REPEATABLE_READ
> Beeline version 1.1.0-cdh5.5.0 by Apache Hive
> 0: jdbc:hive2://...:8> Error: Error while compiling statement: FAILED: 
> ParseException line 1:0 cannot recognize input near 'boo' '' '' 
> (state=42000,code=4)
> 0: jdbc:hive2://...:8> Closing: 0: jdbc:hive2://...
> Ok?!
> {code}
> This was misleading our batch scripts into always believing that the execution 
> of the queries succeeded, when sometimes that was not the case. 
> h2. Workaround
> We found we can work around the issue by always using the -e or the -f 
> parameters, and even reading the standard input through the /dev/stdin device 
> (this was useful because a lot of the scripts fed the queries from here 
> documents), like this:
> {code:title=some-script.sh}
> #!/bin/bash
> set -o nounset -o errexit -o pipefail
> # As beeline is failing to report an error status if reading the query
> # to be executed from STDIN, check whether no -f or -e option is used
> # and, in that case, pretend it has to read the query from a regular
> # file using -f to read from /dev/stdin
> function beeline_workaround_exit_status () {
>     for arg in "$@"
>     do if [ "$arg" = "-f" ] || [ "$arg" = "-e" ]
>        then beeline -u "..." "$@"
>             return
>        fi
>     done
>     beeline -u "..." "$@" -f /dev/stdin
> }
> beeline_workaround_exit_status <<EOF
> boo;
> EOF
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13084) Vectorization add support for PROJECTION Multi-AND/OR

2016-03-15 Thread Matt McCline (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15196817#comment-15196817
 ] 

Matt McCline commented on HIVE-13084:
-


Seems like there may be a bug in the Comparison Vector Expressions:

{code}
SELECT t, si, i, (t < 0) as child1, (si > 0) as child2, (i < 0) as child3,
       (t < 0 OR si > 0 OR i < 0) as multi_or_col
from vectortab2k_orc
 where pmod(i,4) = 2
 order by t, si, i;

Non-Vectorized:
t     si    i          child1  child2  child3  multi_or_col
-124  NULL  206942178  true    NULL    false   true

Vectorized:
t     si    i          child1  child2  child3  multi_or_col
-124  NULL  206942178  true    true    false   true

{code}

child2 is different! si is NULL, so (si > 0) should be NULL, but the vectorized 
result is true.

LongColGreaterLongScalar ???

> Vectorization add support for PROJECTION Multi-AND/OR
> -
>
> Key: HIVE-13084
> URL: https://issues.apache.org/jira/browse/HIVE-13084
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Reporter: Rajesh Balamohan
>Assignee: Matt McCline
> Attachments: HIVE-13084.01.patch, HIVE-13084.02.patch, 
> HIVE-13084.03.patch, vector_between_date.q
>
>
> When there is a CASE expression in the GROUP BY clause, Hive throws an 
> "unable to vectorize" exception.
> An example query, just to demonstrate the problem:
> {noformat}
> explain select l_partkey, case when l_commitdate between '2015-06-30' AND 
> '2015-07-06' THEN '2015-06-30' END as wk from lineitem_test_l_shipdate_ts 
> group by l_partkey, case when l_commitdate between '2015-06-30' AND 
> '2015-07-06' THEN '2015-06-30' END;
> org.apache.hadoop.hive.ql.metadata.HiveException: Could not vectorize 
> expression: org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc
> Vertex dependency in root stage
> Reducer 2 <- Map 1 (SIMPLE_EDGE)
> Stage-0
>   Fetch Operator
> limit:-1
> Stage-1
>   Reducer 2
>   File Output Operator [FS_7]
> Group By Operator [GBY_5] (rows=888777234 width=108)
>   Output:["_col0","_col1"],keys:KEY._col0, KEY._col1
> <-Map 1 [SIMPLE_EDGE]
>   SHUFFLE [RS_4]
> PartitionCols:_col0, _col1
> Group By Operator [GBY_3] (rows=1777554469 width=108)
>   Output:["_col0","_col1"],keys:_col0, _col1
>   Select Operator [SEL_1] (rows=1777554469 width=108)
> Output:["_col0","_col1"]
> TableScan [TS_0] (rows=1777554469 width=108)
>   
> rajesh@lineitem_test_l_shipdate_ts,lineitem_test_l_shipdate_ts,Tbl:COMPLETE,Col:NONE,Output:["l_partkey","l_commitdate"]
> {noformat}
> \cc [~mmccline], [~gopalv]





[jira] [Commented] (HIVE-13292) Different DOUBLE type precision issue between Spark and MR engine

2016-03-15 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15196819#comment-15196819
 ] 

Sergey Shelukhin commented on HIVE-13292:
-

With double type, it's usually by design
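
One common reason for this kind of difference, offered here as an assumption since the thread does not confirm it for this query, is that addition of doubles is not associative: engines that combine partial sums in a different order can disagree in the last few digits. A small sketch using awk (which does its arithmetic in double precision) sums the same three values in two groupings; the variable names are illustrative only.

```shell
# Sum the same three doubles with two different groupings.
# awk stores numbers as C doubles, so this mirrors DOUBLE arithmetic.
sum_left=$(awk 'BEGIN { printf "%.17g", (0.1 + 0.2) + 0.3 }')
sum_right=$(awk 'BEGIN { printf "%.17g", 0.1 + (0.2 + 0.3) }')

echo "grouped from the left:  $sum_left"
echo "grouped from the right: $sum_right"
```

The two printed values differ only in the trailing digits, which is the same pattern as in the MR and Spark outputs quoted below. Where exact, reproducible totals matter, a DECIMAL column is the usual alternative to DOUBLE.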

> Different DOUBLE type precision issue between Spark and MR engine
> -
>
> Key: HIVE-13292
> URL: https://issues.apache.org/jira/browse/HIVE-13292
> Project: Hive
>  Issue Type: Bug
> Environment: Apache Hive 2.0.0
> Apache Spark 1.6.0
>Reporter: Xin Hao
>
> The Spark and MR engines return DOUBLE results that differ in their last 
> digits. Found when executing TPC-H query 5 with scale factor 2 (2 GB data 
> size). Details are below.
> (1)The MR engine output:
> MOZAMBIQUE,1.0646195910990009E8
> ETHIOPIA,1.0108856206629996E8
> ALGERIA,9.987582690420012E7
> MOROCCO,9.785484184850013E7
> KENYA,9.412388077690017E7
> (2)The Spark engine output:
> MOZAMBIQUE,1.064619591099E8
> ETHIOPIA,1.0108856206630005E8
> ALGERIA,9.987582690419997E7
> MOROCCO,9.785484184850003E7
> KENYA,9.412388077690002E7
> (3)Detail SQL used:
> drop table if exists ${env:RESULT_TABLE};
> create table ${env:RESULT_TABLE} (
>   pid1 STRING,
>   pid2 DOUBLE
> )
> row format delimited fields terminated by ',' lines terminated by '\n'
> stored as ${env:HIVE_DEFAULT_FILEFORMAT_RESULT_TABLE} location 
> '${env:RESULT_DIR}';
> insert into table ${env:RESULT_TABLE}
> select
> n_name,
> sum(l_extendedprice * (1 - l_discount)) as revenue
> from
> customer,
> orders,
> lineitem,
> supplier,
> nation,
> region
> where
> c_custkey = o_custkey
> and l_orderkey = o_orderkey
> and l_suppkey = s_suppkey
> and c_nationkey = s_nationkey
> and s_nationkey = n_nationkey
> and n_regionkey = r_regionkey
> and r_name = 'AFRICA'
> and o_orderdate >= '1993-01-01'
> and o_orderdate < '1994-01-01'
> group by
> n_name
> order by
> revenue desc;
> (4) A similar issue exists even after we reduced the original query to the 
> simpler one below:
> drop table if exists ${env:RESULT_TABLE};
> create table ${env:RESULT_TABLE} (
>   pid2 DOUBLE
> )
> row format delimited fields terminated by ',' lines terminated by '\n'
> stored as ${env:HIVE_DEFAULT_FILEFORMAT_RESULT_TABLE} location 
> '${env:RESULT_DIR}';
> insert into table ${env:RESULT_TABLE}
> select
> sum(l_extendedprice * (1 - l_discount)) as revenue
> from
> lineitem
> group by
> l_orderkey
> order by
> revenue;




