[jira] [Commented] (HIVE-16730) Vectorization: Schema Evolution for Text Vectorization / Complex Types

2017-07-12 Thread Matt McCline (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16083503#comment-16083503
 ] 

Matt McCline commented on HIVE-16730:
-

Committed to master.

> Vectorization: Schema Evolution for Text Vectorization / Complex Types
> --
>
> Key: HIVE-16730
> URL: https://issues.apache.org/jira/browse/HIVE-16730
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Teddy Choi
>Priority: Critical
> Fix For: 3.0.0
>
> Attachments: HIVE-16730.1.patch, HIVE-16730.2.patch, 
> HIVE-16730.3.patch
>
>
> With HIVE-16589: "Vectorization: Support Complex Types and GroupBy modes 
> PARTIAL2, FINAL, and COMPLETE  for AVG" change, the tests 
> schema_evol_text_vec_part_all_complex.q and 
> schema_evol_text_vecrow_part_all_complex.q fail.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (HIVE-16977) Vectorization: Vectorize expressions in THEN/ELSE branches of IF/CASE WHEN

2017-07-12 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline resolved HIVE-16977.
-
Resolution: Not A Problem

> Vectorization: Vectorize expressions in THEN/ELSE branches of IF/CASE WHEN
> --
>
> Key: HIVE-16977
> URL: https://issues.apache.org/jira/browse/HIVE-16977
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Teddy Choi
>Priority: Critical
>
> VectorUDFAdaptor(CASE WHEN ((_col2 > 0)) THEN ((UDFToDouble(_col3) / 
> UDFToDouble(_col2)) BETWEEN 0. AND 1.5) ...
> The expression in the THEN is not permitted.   Only columns or constants are 
> vectorized.





[jira] [Commented] (HIVE-17076) typo in itests/src/test/resources/testconfiguration.properties

2017-07-12 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16083540#comment-16083540
 ] 

Hive QA commented on HIVE-17076:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12876715/HIVE-17076.01.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 18 failed/errored test(s), 10719 tests 
executed
*Failed tests:*
{noformat}
TestCleaner2 - did not produce a TEST-*.xml file (likely timed out) 
(batchId=257)
TestConvertAstToSearchArg - did not produce a TEST-*.xml file (likely timed 
out) (batchId=257)
TestIOContextMap - did not produce a TEST-*.xml file (likely timed out) 
(batchId=257)
TestInitiator - did not produce a TEST-*.xml file (likely timed out) 
(batchId=257)
TestRecordIdentifier - did not produce a TEST-*.xml file (likely timed out) 
(batchId=257)
TestSearchArgumentImpl - did not produce a TEST-*.xml file (likely timed out) 
(batchId=257)
TestWorker - did not produce a TEST-*.xml file (likely timed out) (batchId=257)
TestWorker2 - did not produce a TEST-*.xml file (likely timed out) (batchId=257)
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[insert_overwrite_local_directory_1]
 (batchId=237)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[insert_values_orig_table]
 (batchId=54)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] 
(batchId=143)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr]
 (batchId=145)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] 
(batchId=99)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=232)
org.apache.hadoop.hive.cli.TestSparkNegativeCliDriver.org.apache.hadoop.hive.cli.TestSparkNegativeCliDriver
 (batchId=239)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=177)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=177)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=177)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5973/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5973/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5973/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 18 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12876715 - PreCommit-HIVE-Build

> typo in itests/src/test/resources/testconfiguration.properties
> --
>
> Key: HIVE-17076
> URL: https://issues.apache.org/jira/browse/HIVE-17076
> Project: Hive
>  Issue Type: Bug
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
> Attachments: HIVE-17076.01.patch
>
>
> it has 
> {noformat}
> minillap.shared.query.files=insert_into1.q,\
>   insert_into2.q,\
>   insert_values_orig_table.,\
>   llapdecider.q,\
> {noformat}
>  "insert_values_orig_table.,\" is a typo which causes these to be run with 
> TestCliDriver
> Note that there are 2 .q files that start with insert_values_orig_table
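A check along the following lines could catch such entries. This is only a minimal Java sketch, not part of the attached patch: the class and method names are hypothetical, and it assumes that every entry in the list is expected to end in `.q`.

```java
import java.util.ArrayList;
import java.util.List;

public class QFileListCheck {
    // Returns the entries of a comma-separated q-file list that do not end in ".q".
    // Assumption of this sketch: every valid entry is a file name ending in ".q".
    static List<String> suspiciousEntries(String propertyValue) {
        List<String> bad = new ArrayList<>();
        for (String entry : propertyValue.split(",")) {
            String name = entry.trim();
            if (!name.isEmpty() && !name.endsWith(".q")) {
                bad.add(name);
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        // The property value with its continuation lines already joined.
        String value = "insert_into1.q,insert_into2.q,insert_values_orig_table.,llapdecider.q,";
        System.out.println(suspiciousEntries(value)); // prints [insert_values_orig_table.]
    }
}
```

Run against the value quoted above, the sketch flags only the `insert_values_orig_table.` entry.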





[jira] [Updated] (HIVE-16100) Dynamic Sorted Partition optimizer loses sibling operators

2017-07-12 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-16100:
---
Attachment: HIVE-16100.4.patch

> Dynamic Sorted Partition optimizer loses sibling operators
> --
>
> Key: HIVE-16100
> URL: https://issues.apache.org/jira/browse/HIVE-16100
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Affects Versions: 1.2.1, 2.1.1, 2.2.0
>Reporter: Gopal V
>Assignee: Gopal V
> Attachments: HIVE-16100.1.patch, HIVE-16100.2.patch, 
> HIVE-16100.2.patch, HIVE-16100.3.patch, HIVE-16100.4.patch
>
>
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/SortedDynPartitionOptimizer.java#L173
> {code}
>   // unlink connection between FS and its parent
>   fsParent = fsOp.getParentOperators().get(0);
>   fsParent.getChildOperators().clear();
> {code}
> The optimizer discards any cases where the fsParent has another SEL child 
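The effect of the quoted `clear()` call can be illustrated with a toy operator graph. The sketch below is an assumption-laden illustration, not the actual fix: `Op` is a hypothetical stand-in for Hive's operator class, and removing only the FS link is just one possible alternative to clearing the whole child list.

```java
import java.util.ArrayList;
import java.util.List;

public class UnlinkSketch {
    // Hypothetical stand-in for an operator node; not Hive's Operator class.
    static final class Op {
        final String name;
        final List<Op> parents = new ArrayList<>();
        final List<Op> children = new ArrayList<>();

        Op(String name) { this.name = name; }

        void addChild(Op child) {
            children.add(child);
            child.parents.add(this);
        }
    }

    // Builds parent -> {FS, SEL}, unlinks FS, and returns how many children the parent keeps.
    static int childrenLeftAfterUnlink(boolean clearAll) {
        Op parent = new Op("parent");
        Op fs = new Op("FS");
        Op sel = new Op("SEL");
        parent.addChild(fs);
        parent.addChild(sel);

        Op fsParent = fs.parents.get(0);
        if (clearAll) {
            fsParent.children.clear();    // as in the quoted snippet: the SEL sibling is dropped too
        } else {
            fsParent.children.remove(fs); // removes only the FS link; the SEL sibling stays
        }
        return fsParent.children.size();
    }

    public static void main(String[] args) {
        System.out.println(childrenLeftAfterUnlink(true));  // prints 0
        System.out.println(childrenLeftAfterUnlink(false)); // prints 1
    }
}
```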





[jira] [Updated] (HIVE-16989) Fix some issues identified by lgtm.com

2017-07-12 Thread Malcolm Taylor (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Malcolm Taylor updated HIVE-16989:
--
Status: In Progress  (was: Patch Available)

> Fix some issues identified by lgtm.com
> --
>
> Key: HIVE-16989
> URL: https://issues.apache.org/jira/browse/HIVE-16989
> Project: Hive
>  Issue Type: Improvement
>Reporter: Malcolm Taylor
>Assignee: Malcolm Taylor
> Attachments: HIVE-16989.2.patch, HIVE-16989.3.patch, HIVE-16989.patch
>
>
> [lgtm.com|https://lgtm.com] has identified a number of issues where there may 
> be scope for improvement. The plan is to address some of the alerts found at 
> [https://lgtm.com/projects/g/apache/hive/].





[jira] [Updated] (HIVE-16989) Fix some issues identified by lgtm.com

2017-07-12 Thread Malcolm Taylor (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Malcolm Taylor updated HIVE-16989:
--
Attachment: HIVE-16989.4.patch

Rebased and dropped changes to .reviewboardrc

> Fix some issues identified by lgtm.com
> --
>
> Key: HIVE-16989
> URL: https://issues.apache.org/jira/browse/HIVE-16989
> Project: Hive
>  Issue Type: Improvement
>Reporter: Malcolm Taylor
>Assignee: Malcolm Taylor
> Attachments: HIVE-16989.2.patch, HIVE-16989.3.patch, 
> HIVE-16989.4.patch, HIVE-16989.patch
>
>
> [lgtm.com|https://lgtm.com] has identified a number of issues where there may 
> be scope for improvement. The plan is to address some of the alerts found at 
> [https://lgtm.com/projects/g/apache/hive/].





[jira] [Commented] (HIVE-16100) Dynamic Sorted Partition optimizer loses sibling operators

2017-07-12 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16083560#comment-16083560
 ] 

Gopal V commented on HIVE-16100:


Need to revisit an assumption in the optimizer about parent being a SEL 
operator, expect another refresh.

> Dynamic Sorted Partition optimizer loses sibling operators
> --
>
> Key: HIVE-16100
> URL: https://issues.apache.org/jira/browse/HIVE-16100
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Affects Versions: 1.2.1, 2.1.1, 2.2.0
>Reporter: Gopal V
>Assignee: Gopal V
> Attachments: HIVE-16100.1.patch, HIVE-16100.2.patch, 
> HIVE-16100.2.patch, HIVE-16100.3.patch, HIVE-16100.4.patch
>
>
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/SortedDynPartitionOptimizer.java#L173
> {code}
>   // unlink connection between FS and its parent
>   fsParent = fsOp.getParentOperators().get(0);
>   fsParent.getChildOperators().clear();
> {code}
> The optimizer discards any cases where the fsParent has another SEL child 





[jira] [Commented] (HIVE-15051) Test framework integration with findbugs, rat checks etc.

2017-07-12 Thread Peter Vary (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16083573#comment-16083573
 ] 

Peter Vary commented on HIVE-15051:
---

I am happy with the current wording!
Thanks [~leftylev]!

> Test framework integration with findbugs, rat checks etc.
> -
>
> Key: HIVE-15051
> URL: https://issues.apache.org/jira/browse/HIVE-15051
> Project: Hive
>  Issue Type: Sub-task
>  Components: Testing Infrastructure
>Reporter: Peter Vary
>Assignee: Peter Vary
> Fix For: 3.0.0
>
> Attachments: beeline.out, HIVE-15051.02.patch, HIVE-15051.patch, 
> Interim.patch, ql.out
>
>
> Find a way to integrate code analysis tools such as findbugs and rat checks 
> into the PreCommit tests, relieving reviewers of code style checks and other 
> checks that could be done by code. 
> It might be worth taking a look at Yetus, but keep in mind that Hive has a 
> specific parallel test framework.





[jira] [Updated] (HIVE-16989) Fix some issues identified by lgtm.com

2017-07-12 Thread Malcolm Taylor (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Malcolm Taylor updated HIVE-16989:
--
Status: Patch Available  (was: In Progress)

> Fix some issues identified by lgtm.com
> --
>
> Key: HIVE-16989
> URL: https://issues.apache.org/jira/browse/HIVE-16989
> Project: Hive
>  Issue Type: Improvement
>Reporter: Malcolm Taylor
>Assignee: Malcolm Taylor
> Attachments: HIVE-16989.2.patch, HIVE-16989.3.patch, 
> HIVE-16989.4.patch, HIVE-16989.patch
>
>
> [lgtm.com|https://lgtm.com] has identified a number of issues where there may 
> be scope for improvement. The plan is to address some of the alerts found at 
> [https://lgtm.com/projects/g/apache/hive/].





[jira] [Commented] (HIVE-17018) Small table is converted to map join even the total size of small tables exceeds the threshold(hive.auto.convert.join.noconditionaltask.size)

2017-07-12 Thread Chao Sun (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16083610#comment-16083610
 ] 

Chao Sun commented on HIVE-17018:
-

[~kellyzly] Yes. I think we don't need to change the existing behavior. I'm 
just suggesting that we might need a HoS specific config to replace 
{{hive.auto.convert.join.noconditionaltask.size}}, so that it is less 
confusing.

> Small table is converted to map join even the total size of small tables 
> exceeds the threshold(hive.auto.convert.join.noconditionaltask.size)
> -
>
> Key: HIVE-17018
> URL: https://issues.apache.org/jira/browse/HIVE-17018
> Project: Hive
>  Issue Type: Bug
>Reporter: liyunzhang_intel
>Assignee: liyunzhang_intel
> Attachments: HIVE-17018_data_init.q, HIVE-17018.q, t3.txt
>
>
> We use "hive.auto.convert.join.noconditionaltask.size" as the threshold: if 
> the sum of the sizes of n-1 of the tables/partitions in an n-way join is 
> smaller than it, the join is converted to a map join. For example, take A join 
> B join C join D join E. The big table is A (100M) and the small tables are 
> B (10M), C (10M), D (10M), E (10M). Suppose we set 
> hive.auto.convert.join.noconditionaltask.size=20M. In the current code, E, D 
> and B are converted to map join but C is not. In my understanding, because 
> hive.auto.convert.join.noconditionaltask.size can only hold E and D, neither 
> C nor B should be converted to map join.
> Let's explain in more detail why B can be converted to map join.
> In the current code, 
> [SparkMapJoinOptimizer#getConnectedMapJoinSize|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkMapJoinOptimizer.java#L364]
>  calculates the size of all the map joins in the parent path and child path. 
> The search stops when it encounters a [UnionOperator or 
> ReduceOperator|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkMapJoinOptimizer.java#L381].
>  C is not converted to map join because {{(connectedMapJoinSize + 
> totalSize) > maxSize}} [see 
> code|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkMapJoinOptimizer.java#L330].
>  The RS before the join of C therefore remains. When calculating whether B 
> will be converted to map join, {{getConnectedMapJoinSize}} returns 0 because 
> it encounters the 
> [RS|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkMapJoinOptimizer.java#L409],
>  so {{(connectedMapJoinSize + totalSize) < maxSize}} holds.
> [~xuefuz] or [~jxiang]: can you help check whether this is a bug or not, as 
> you are more familiar with SparkJoinOptimizer?
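The behavior described above can be reproduced with a simplified model. The following Java sketch is not the SparkMapJoinOptimizer code: the class and method names are hypothetical, and it assumes that the joins are examined in the order E, D, C, B and that the connected map-join size drops back to 0 whenever a join is left unconverted (because its RS remains and stops the search).

```java
import java.util.ArrayList;
import java.util.List;

public class MapJoinThresholdSketch {
    // For each small table, in the order the joins are examined, decides whether
    // its join would be converted to a map join under the simplified model.
    static List<Boolean> decide(long[] smallTableSizes, long maxSize) {
        List<Boolean> converted = new ArrayList<>();
        long connectedMapJoinSize = 0; // size of map joins reachable without crossing an RS
        for (long totalSize : smallTableSizes) {
            if (connectedMapJoinSize + totalSize > maxSize) {
                converted.add(false);
                connectedMapJoinSize = 0; // the RS stays, so later searches stop at it
            } else {
                converted.add(true);
                connectedMapJoinSize += totalSize;
            }
        }
        return converted;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // Sizes of E, D, C, B with a 20M threshold, as in the example above.
        long[] sizes = {10 * mb, 10 * mb, 10 * mb, 10 * mb};
        System.out.println(decide(sizes, 20 * mb)); // prints [true, true, false, true]
    }
}
```

Under these assumptions E and D fill the 20M budget, C is rejected, and B is accepted again because the running total was reset, which matches the reported outcome.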





[jira] [Commented] (HIVE-17077) Hive should raise StringIndexOutOfBoundsException when LPAD/RPAD len character's value is negative number

2017-07-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16083611#comment-16083611
 ] 

ASF GitHub Bot commented on HIVE-17077:
---

GitHub user chitin opened a pull request:

https://github.com/apache/hive/pull/203

HIVE-17077 Hive should raise StringIndexOutOfBoundsException when LPAD/RPAD 
len character's value is negative number

[HIVE-17077] Hive should raise StringIndexOutOfBoundsException when 
LPAD/RPAD len character's value is negative number
- return null when len character's value is negative number

https://issues.apache.org/jira/browse/HIVE-17077

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/chitin/hive hive17077

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/hive/pull/203.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #203


commit 8715915839f76f6d60d6b40400d3bb826cbeb8b5
Author: chitin 
Date:   2017-07-12T07:56:47Z

HIVE-17077 Hive should raise StringIndexOutOfBoundsException when LPAD/RPAD 
len character's value is negative number
- Add len judgment logic




> Hive should raise StringIndexOutOfBoundsException when LPAD/RPAD len 
> character's value is negative number
> -
>
> Key: HIVE-17077
> URL: https://issues.apache.org/jira/browse/HIVE-17077
> Project: Hive
>  Issue Type: Bug
>Reporter: Lingang Deng
>Assignee: Lingang Deng
>Priority: Minor
>
> lpad (and rpad) throws an exception when the second argument is a negative 
> number, as follows:
> {code:java}
> hive> select lpad("hello", -1 ,"h");
> FAILED: StringIndexOutOfBoundsException String index out of range: -1
> hive> select rpad("hello", -1 ,"h");
> FAILED: StringIndexOutOfBoundsException String index out of range: -1
> {code}
> Maybe we should return a friendly result, as MySQL does.
> {code:java}
> mysql> select lpad("hello", -1 ,"h");
> +------------------------+
> | lpad("hello", -1 ,"h") |
> +------------------------+
> | NULL                   |
> +------------------------+
> 1 row in set (0.00 sec)
> mysql> select rpad("hello", -1 ,"h");
> +------------------------+
> | rpad("hello", -1 ,"h") |
> +------------------------+
> | NULL                   |
> +------------------------+
> 1 row in set (0.00 sec)
> {code}
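The proposed behavior (return NULL for a negative length) could look roughly like the sketch below. It is a standalone illustration, not the code in the pull request: `PadSketch` and its `lpad` method are hypothetical names, and the handling of an empty pad string is an arbitrary choice made only to keep the sketch well defined.

```java
public class PadSketch {
    // Left-pads s with pad up to len characters; returns null for a negative len
    // instead of throwing StringIndexOutOfBoundsException.
    static String lpad(String s, int len, String pad) {
        if (s == null || pad == null || len < 0) {
            return null; // the "len judgment": a negative length yields NULL
        }
        if (len <= s.length()) {
            return s.substring(0, len); // truncate when the target is shorter than s
        }
        if (pad.isEmpty()) {
            return s; // arbitrary choice for this sketch: nothing to pad with
        }
        int padLen = len - s.length();
        StringBuilder sb = new StringBuilder(len);
        while (sb.length() < padLen) {
            // append as much of pad as still fits
            sb.append(pad, 0, Math.min(pad.length(), padLen - sb.length()));
        }
        return sb.append(s).toString();
    }

    public static void main(String[] args) {
        System.out.println(lpad("hello", -1, "h")); // prints null
        System.out.println(lpad("hello", 8, "ab")); // prints abahello
    }
}
```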





[jira] [Commented] (HIVE-17066) Query78 filter wrong estimatation is generating bad plan

2017-07-12 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16083605#comment-16083605
 ] 

Hive QA commented on HIVE-17066:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12876771/HIVE-17066.5.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 8 failed/errored test(s), 10839 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] 
(batchId=143)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[columnstats_part_coltype]
 (batchId=157)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] 
(batchId=99)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] 
(batchId=232)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=177)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=177)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=177)
org.apache.hive.jdbc.TestJdbcWithMiniHS2.testHttpRetryOnServerIdleTimeout 
(batchId=226)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5974/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5974/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5974/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 8 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12876771 - PreCommit-HIVE-Build

> Query78 filter wrong estimatation is generating bad plan
> 
>
> Key: HIVE-17066
> URL: https://issues.apache.org/jira/browse/HIVE-17066
> Project: Hive
>  Issue Type: Bug
>Reporter: Vineet Garg
>Assignee: Vineet Garg
> Attachments: HIVE-17066.1.patch, HIVE-17066.2.patch, 
> HIVE-17066.3.patch, HIVE-17066.4.patch, HIVE-17066.5.patch
>
>
> Filter operator is estimating 1 row following a left outer join causing bad 
> estimates
> {noformat}
> Reducer 12 
> Execution mode: vectorized, llap
> Reduce Operator Tree:
>   Map Join Operator
>     condition map:
>          Left Outer Join0 to 1
>     keys:
>       0 KEY.reducesinkkey0 (type: bigint), KEY.reducesinkkey1 (type: bigint)
>       1 KEY.reducesinkkey0 (type: bigint), KEY.reducesinkkey1 (type: bigint)
>     outputColumnNames: _col0, _col1, _col3, _col4, _col5, _col6, _col8
>     input vertices:
>       1 Map 14
>     Statistics: Num rows: 71676270660 Data size: 3727166074320 Basic stats: COMPLETE Column stats: COMPLETE
>     Filter Operator
>       predicate: _col8 is null (type: boolean)
>       Statistics: Num rows: 1 Data size: 52 Basic stats: COMPLETE Column stats: COMPLETE
>       Select Operator
>         expressions: _col0 (type: bigint), _col1 (type: bigint), _col3 (type: int), _col4 (type: double), _col5 (type: double), _col6 (type: bigint)
>         outputColumnNames: _col0, _col1, _col3, _col4, _col5, _col6
>         Statistics: Num rows: 1 Data size: 52 Basic stats: COMPLETE Column stats: COMPLETE
> {noformat}





[jira] [Updated] (HIVE-17077) Hive should raise StringIndexOutOfBoundsException when LPAD/RPAD len character's value is negative number

2017-07-12 Thread Lingang Deng (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lingang Deng updated HIVE-17077:

Summary: Hive should raise StringIndexOutOfBoundsException when LPAD/RPAD 
len character's value is negative number  (was: lpad(rpad) should return a 
value but not throw a exception)

> Hive should raise StringIndexOutOfBoundsException when LPAD/RPAD len 
> character's value is negative number
> -
>
> Key: HIVE-17077
> URL: https://issues.apache.org/jira/browse/HIVE-17077
> Project: Hive
>  Issue Type: Bug
>Reporter: Lingang Deng
>Assignee: Lingang Deng
>Priority: Minor
>
> lpad (and rpad) throws an exception when the second argument is a negative 
> number, as follows:
> {code:java}
> hive> select lpad("hello", -1 ,"h");
> FAILED: StringIndexOutOfBoundsException String index out of range: -1
> hive> select rpad("hello", -1 ,"h");
> FAILED: StringIndexOutOfBoundsException String index out of range: -1
> {code}
> Maybe we should return a friendly result, as MySQL does.
> {code:java}
> mysql> select lpad("hello", -1 ,"h");
> +------------------------+
> | lpad("hello", -1 ,"h") |
> +------------------------+
> | NULL                   |
> +------------------------+
> 1 row in set (0.00 sec)
> mysql> select rpad("hello", -1 ,"h");
> +------------------------+
> | rpad("hello", -1 ,"h") |
> +------------------------+
> | NULL                   |
> +------------------------+
> 1 row in set (0.00 sec)
> {code}





[jira] [Assigned] (HIVE-17077) lpad(rpad) should return a value but not throw a exception

2017-07-12 Thread Lingang Deng (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lingang Deng reassigned HIVE-17077:
---


> lpad(rpad) should return a value but not throw a exception
> --
>
> Key: HIVE-17077
> URL: https://issues.apache.org/jira/browse/HIVE-17077
> Project: Hive
>  Issue Type: Bug
>Reporter: Lingang Deng
>Assignee: Lingang Deng
>Priority: Minor
>
> lpad (and rpad) throws an exception when the second argument is a negative 
> number, as follows:
> {code:java}
> hive> select lpad("hello", -1 ,"h");
> FAILED: StringIndexOutOfBoundsException String index out of range: -1
> hive> select rpad("hello", -1 ,"h");
> FAILED: StringIndexOutOfBoundsException String index out of range: -1
> {code}
> Maybe we should return a friendly result, as MySQL does.
> {code:java}
> mysql> select lpad("hello", -1 ,"h");
> +------------------------+
> | lpad("hello", -1 ,"h") |
> +------------------------+
> | NULL                   |
> +------------------------+
> 1 row in set (0.00 sec)
> mysql> select rpad("hello", -1 ,"h");
> +------------------------+
> | rpad("hello", -1 ,"h") |
> +------------------------+
> | NULL                   |
> +------------------------+
> 1 row in set (0.00 sec)
> {code}





[jira] [Commented] (HIVE-16975) Vectorization: Fully vectorize CAST date as TIMESTAMP so VectorUDFAdaptor is now used

2017-07-12 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16083662#comment-16083662
 ] 

Hive QA commented on HIVE-16975:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12876748/HIVE-16975.1.patch

{color:green}SUCCESS:{color} +1 due to 3 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 8 failed/errored test(s), 10840 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[create_merge_compressed]
 (batchId=237)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] 
(batchId=143)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr]
 (batchId=145)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] 
(batchId=99)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=232)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=177)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=177)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=177)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5975/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5975/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5975/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 8 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12876748 - PreCommit-HIVE-Build

> Vectorization: Fully vectorize CAST date as TIMESTAMP so VectorUDFAdaptor is 
> now used
> -
>
> Key: HIVE-16975
> URL: https://issues.apache.org/jira/browse/HIVE-16975
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Teddy Choi
>Priority: Critical
> Attachments: HIVE-16975.1.patch
>
>
> Fix VectorUDFAdaptor(CAST(d_date as TIMESTAMP)) to be native.





[jira] [Commented] (HIVE-17077) Hive should raise StringIndexOutOfBoundsException when LPAD/RPAD len character's value is negative number

2017-07-12 Thread Lingang Deng (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16083673#comment-16083673
 ] 

Lingang Deng commented on HIVE-17077:
-

CC [~sershe]

> Hive should raise StringIndexOutOfBoundsException when LPAD/RPAD len 
> character's value is negative number
> -
>
> Key: HIVE-17077
> URL: https://issues.apache.org/jira/browse/HIVE-17077
> Project: Hive
>  Issue Type: Bug
>Reporter: Lingang Deng
>Assignee: Lingang Deng
>Priority: Minor
>
> lpad (and rpad) throws an exception when the second argument is a negative 
> number, as follows:
> {code:java}
> hive> select lpad("hello", -1 ,"h");
> FAILED: StringIndexOutOfBoundsException String index out of range: -1
> hive> select rpad("hello", -1 ,"h");
> FAILED: StringIndexOutOfBoundsException String index out of range: -1
> {code}
> Maybe we should return a friendly result, as MySQL does.
> {code:java}
> mysql> select lpad("hello", -1 ,"h");
> +------------------------+
> | lpad("hello", -1 ,"h") |
> +------------------------+
> | NULL                   |
> +------------------------+
> 1 row in set (0.00 sec)
> mysql> select rpad("hello", -1 ,"h");
> +------------------------+
> | rpad("hello", -1 ,"h") |
> +------------------------+
> | NULL                   |
> +------------------------+
> 1 row in set (0.00 sec)
> {code}





[jira] [Commented] (HIVE-16917) HiveServer2 guard rails - Limit concurrent connections from user

2017-07-12 Thread Aengus Rooney (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16083729#comment-16083729
 ] 

Aengus Rooney commented on HIVE-16917:
--

Thanks Thejas. I discussed this with the team, and it was agreed that a User+IP 
combination would also be beneficial, as multiple apps use the same user for 
certain applications. The default thresholds are acceptable, as long as they 
are configurable.

> HiveServer2 guard rails - Limit concurrent connections from user
> 
>
> Key: HIVE-16917
> URL: https://issues.apache.org/jira/browse/HIVE-16917
> Project: Hive
>  Issue Type: New Feature
>  Components: HiveServer2
>Reporter: Thejas M Nair
>
> Rogue applications can make HS2 unusable for others by making too many 
> connections at a time.
> HS2 should start rejecting new connections from a user once that user's 
> connection count has reached a configurable threshold.
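One way to picture such a guard rail is a per-key counter checked on every new connection. The Java sketch below is purely illustrative and is not HiveServer2 code: the class, its methods and the `user@ip` key format are assumptions, chosen to also cover the User+IP combination mentioned in the comment.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

public class ConnectionLimiterSketch {
    private final int maxPerKey; // the configurable threshold
    private final ConcurrentHashMap<String, AtomicInteger> open = new ConcurrentHashMap<>();

    ConnectionLimiterSketch(int maxPerKey) {
        this.maxPerKey = maxPerKey;
    }

    // Hypothetical key: one counter per user and client IP.
    private static String key(String user, String ip) {
        return user + "@" + ip;
    }

    // Returns true and counts the connection if the key is below the threshold.
    boolean tryOpen(String user, String ip) {
        AtomicInteger count = open.computeIfAbsent(key(user, ip), k -> new AtomicInteger());
        while (true) {
            int current = count.get();
            if (current >= maxPerKey) {
                return false; // reject: threshold reached
            }
            if (count.compareAndSet(current, current + 1)) {
                return true;
            }
        }
    }

    // Releases one connection slot for the key, never going below zero.
    void close(String user, String ip) {
        AtomicInteger count = open.get(key(user, ip));
        if (count != null) {
            count.updateAndGet(v -> v > 0 ? v - 1 : 0);
        }
    }

    public static void main(String[] args) {
        ConnectionLimiterSketch limiter = new ConnectionLimiterSketch(2);
        System.out.println(limiter.tryOpen("etl", "10.0.0.1")); // prints true
        System.out.println(limiter.tryOpen("etl", "10.0.0.1")); // prints true
        System.out.println(limiter.tryOpen("etl", "10.0.0.1")); // prints false
    }
}
```

Keying on the user alone instead would only require changing the hypothetical `key` helper.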



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17019) Add support to download debugging information as an archive.

2017-07-12 Thread Harish Jaiprakash (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16083774#comment-16083774
 ] 

Harish Jaiprakash commented on HIVE-17019:
--

Thanks [~sseth].

- Change the top level package from llap-debug to tez-debug? (Works with both I 
believe) [~ashutoshc], [~thejas] - any recommendations on whether the code gets 
a top level module, or goes under an existing module. This allows downloading 
of various debug artifacts for a tez job - logs, metrics for llap, hiveserver2 
logs (soon), tez am logs, ATS data for the query (hive and tez).

Will change the directory.

- In the new pom.xml, dependency on hive-llap-server. 1) Is it required?, 2) 
Will need to exclude some dependent artifacts. See service/pom.xml llap-server 
dependency handling

The llap status is fetched using LlapStatusServiceDriver which is part of 
hive-llap-server.

- LogDownloadServlet - Should this throw an error as soon as the filename 
pattern validation fails?

The filename check is to prevent any injection attack into the file name/http 
header, not to validate the id.

- LogDownloadServlet - change to dagId/queryId validation instead

Can do, but it will be sensitive to changes to the id format. Currently it's 
passed down to ATS and nothing will be retrieved for it.

- LogDownloadServlet - thread being created inside of the request handler? This 
should be limited outside of the request? so that only a controlled number of 
parallel artifact downloads can run.

I am creating a shared executor. Does it make sense to use Guava's direct 
executor, which will run the task in the current thread?

- LogDownloadServlet - what happens in case of aggregator failure? Exception 
back to the user?

Jetty will handle the exception, returning 500 to the user. Not sure if 
exception trace is part of it. Will try and see.

- LogDownloadServlet - seems to be generating the file to disk and then 
streaming it over. Can this be streamed over directly instead. Otherwise 
there's the possibility of leaking files. (Artifact.downloadIntoStream or some 
such?) Guessing this is complicated further by the multi-threaded artifact 
downloader.
Alternately need to have a cleanup mechanism.

Streaming directly would not be possible because of multithreading. If 
it's single threaded then I can use a ZipOutputStream and add entries one at a 
time.

Oops, sorry, the finally got moved down since the aggregator had to be closed 
before streaming the file. I'll handle it using a try/finally to clean up.
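The single-threaded alternative mentioned above can be sketched as follows. This is an illustration only, not code from the patch: the class and method names are hypothetical and the artifacts are modeled as an in-memory map, whereas a real servlet would write to the response output stream.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class ZipStreamSketch {
    // Writes each artifact as one zip entry, one at a time, directly to out.
    // Closing the ZipOutputStream also closes the underlying stream.
    static void streamArtifacts(Map<String, byte[]> artifacts, OutputStream out) throws IOException {
        try (ZipOutputStream zip = new ZipOutputStream(out)) {
            for (Map.Entry<String, byte[]> artifact : artifacts.entrySet()) {
                zip.putNextEntry(new ZipEntry(artifact.getKey()));
                zip.write(artifact.getValue());
                zip.closeEntry();
            }
        }
    }

    // Convenience wrapper for the sketch: zips into memory, no temporary file.
    static byte[] zipToBytes(Map<String, byte[]> artifacts) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try {
            streamArtifacts(artifacts, buffer);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Reads the entry names back, to check what was streamed.
    static List<String> entryNames(byte[] zipBytes) {
        List<String> names = new ArrayList<>();
        try (ZipInputStream in = new ZipInputStream(new ByteArrayInputStream(zipBytes))) {
            for (ZipEntry e = in.getNextEntry(); e != null; e = in.getNextEntry()) {
                names.add(e.getName());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return names;
    }
}
```

Because nothing is written to local disk in this variant, there is no temporary file that could leak.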

- Timeout on the tests

Setting timeouts on tests.

- Apache header needs to be added to files where it is missing.

Sorry, will add the licence header to all files.

- Main - Please rename to something more indicative of what the tool does.

I was planning to remove this and integrate with hive cli, --service 
. This does not work without lot of classpath fixes, or I'll 
have to create a script to add hive jars.

- Main - Likely a follow up jira - parse using a standard library, instead of 
trying to parse the arguments to main directly.

Will check a few libs. Apache Commons OptionBuilder uses a static instance in 
its builder. That should be ok for a CLI-based, invoke-once app, but I will look at 
something better along the lines of Python's argparse.

- Server - Enabling the artifact should be controlled via a config. Does not 
always need to be hosted in HS2 (Default disabled, at least till security can 
be sorted out)

I'll add a config.

- Is it possible to support a timeout on the downloads? (Can be a follow up 
jira)

Sure, will do. Global or per download or both?

- ArtifactAggregator - I believe this does 2 stages of dependent artifacts / 
downloads? Stage1 - download whatever it can. Information from this should 
be adequate for stage2 downloads ?

It could be more stages:
Ex: given dag_id
stage 1: will fetch tez ats info which is used to extract hive id, task 
container/node list.
stage 2: will fetch hive ats info, tez container log list.
stage 3: llap containers log list, tez task logs.
stage 4: llap container logs.

The aggregator iterates through the list of sources and finds those which can 
download using the info in the params.
It schedules those sources, waits for them all to complete, and then 
repeats.
It stops when no new source could download or all sources are exhausted.
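
The loop can be sketched as follows. This is a simplified, sequential illustration and not the ArtifactAggregator code: the Source interface, the string-set params and the helper names are assumptions, and the real aggregator schedules the ready sources in parallel and waits for them.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of the stage-by-stage loop.
final class StagedAggregatorSketch {
  interface Source {
    boolean canDownload(Set<String> params);
    // Returns the params learned by this download (e.g. a hive id).
    Set<String> download(Set<String> params);
  }

  private StagedAggregatorSketch() {}

  // Test-friendly source: runnable once 'needs' is known, then yields 'provides'.
  static Source source(String needs, String provides) {
    return new Source() {
      @Override public boolean canDownload(Set<String> params) {
        return params.contains(needs);
      }
      @Override public Set<String> download(Set<String> params) {
        return Collections.singleton(provides);
      }
    };
  }

  // Runs stages until no source can make progress; returns the number of stages.
  static int run(List<Source> sources, Set<String> params) {
    List<Source> pending = new ArrayList<>(sources);
    int stages = 0;
    while (!pending.isEmpty()) {
      List<Source> ready = new ArrayList<>();
      for (Iterator<Source> it = pending.iterator(); it.hasNext();) {
        Source s = it.next();
        if (s.canDownload(params)) {
          ready.add(s);
          it.remove();
        }
      }
      if (ready.isEmpty()) {
        break; // no new source could download
      }
      Set<String> learned = new HashSet<>();
      for (Source s : ready) {
        learned.addAll(s.download(params));
      }
      params.addAll(learned); // feeds the next stage
      stages++;
    }
    return stages;
  }
}
```

With a dag_id as the only starting param, a source that needs the dag_id runs in the first stage and a source that needs the hive id it produced runs in the second.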


- For the ones not implemented yet (DummyArtifact) - think it's better to just 
comment out the code, instead of invoking the DummyArtifacts downloader

Sorry, will do.

- Security - ACL enforcement required on secure clusters to make sure users can 
only download what they have access to. This is a must fix before this can be 
enabled by default.

Working on this.

- Security - this can work around yarn restrictions on log downloads, since the 
files are being accessed by the hive user.

Yes this should work.

Could you please add some details on cluster testing.

I'll add ano

[jira] [Commented] (HIVE-17066) Query78 filter wrong estimatation is generating bad plan

2017-07-12 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16083727#comment-16083727
 ] 

Hive QA commented on HIVE-17066:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12876771/HIVE-17066.5.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 8 failed/errored test(s), 10839 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[insert_overwrite_local_directory_1]
 (batchId=237)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] 
(batchId=143)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr]
 (batchId=145)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=232)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] 
(batchId=232)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=177)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=177)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=177)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5976/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5976/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5976/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 8 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12876771 - PreCommit-HIVE-Build

> Query78 filter wrong estimatation is generating bad plan
> 
>
> Key: HIVE-17066
> URL: https://issues.apache.org/jira/browse/HIVE-17066
> Project: Hive
>  Issue Type: Bug
>Reporter: Vineet Garg
>Assignee: Vineet Garg
> Attachments: HIVE-17066.1.patch, HIVE-17066.2.patch, 
> HIVE-17066.3.patch, HIVE-17066.4.patch, HIVE-17066.5.patch
>
>
> Filter operator is estimating 1 row following a left outer join causing bad 
> estimates
> {noformat}
> Reducer 12 
> Execution mode: vectorized, llap
> Reduce Operator Tree:
>   Map Join Operator
> condition map:
>  Left Outer Join0 to 1
> keys:
>   0 KEY.reducesinkkey0 (type: bigint), KEY.reducesinkkey1 
> (type: bigint)
>   1 KEY.reducesinkkey0 (type: bigint), KEY.reducesinkkey1 
> (type: bigint)
> outputColumnNames: _col0, _col1, _col3, _col4, _col5, _col6, 
> _col8
> input vertices:
>   1 Map 14
> Statistics: Num rows: 71676270660 Data size: 3727166074320 
> Basic stats: COMPLETE Column stats: COMPLETE
> Filter Operator
>   predicate: _col8 is null (type: boolean)
>   Statistics: Num rows: 1 Data size: 52 Basic stats: COMPLETE 
> Column stats: COMPLETE
>   Select Operator
> expressions: _col0 (type: bigint), _col1 (type: bigint), 
> _col3 (type: int), _col4 (type: double), _col5 (type: double), _col6 (type: 
> bigint)
> outputColumnNames: _col0, _col1, _col3, _col4, _col5, 
> _col6
> Statistics: Num rows: 1 Data size: 52 Basic stats: 
> COMPLETE Column stats: COMPLETE
> {noformat}





[jira] [Commented] (HIVE-17038) invalid result when CAST-ing to DATE

2017-07-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16083775#comment-16083775
 ] 

ASF GitHub Bot commented on HIVE-17038:
---

GitHub user mlorek opened a pull request:

https://github.com/apache/hive/pull/204

HIVE-17038 - DateParser fix



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mlorek/hive master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/hive/pull/204.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #204


commit 1e4eb6a870af2ce3ec1e5116a81e6453f9fc990d
Author: Michael Lorek 
Date:   2017-07-12T10:30:22Z

HIVE-17038 - DateParser fix




> invalid result when CAST-ing to DATE
> 
>
> Key: HIVE-17038
> URL: https://issues.apache.org/jira/browse/HIVE-17038
> Project: Hive
>  Issue Type: Bug
>  Components: CLI, Hive
>Affects Versions: 1.2.1
>Reporter: Jim Hopper
>
> When casting invalid date literals to the DATE data type, Hive returns wrong 
> values instead of NULL.
> {code}
> SELECT CAST('2017-02-31' AS DATE);
> SELECT CAST('2017-04-31' AS DATE);
> {code}
> Some examples below where it really can produce weird results:
> {code}
> select *
>   from (
> select cast('2017-07-01' as date) as dt
> ) as t
> where t.dt = '2017-06-31';
> select *
>   from (
> select cast('2017-07-01' as date) as dt
> ) as t
> where t.dt = cast('2017-06-31' as date);
> {code}
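
The behaviour reported above is consistent with lenient date parsing, where an out-of-range day rolls over into the next month. The sketch below only illustrates that difference with JDK classes; it is not Hive's DateParser and makes no claim about how the pull request fixes it. The class and method names are made up.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.time.format.ResolverStyle;

// Hypothetical illustration of lenient versus strict date parsing.
final class StrictDateSketch {
  // "uuuu" instead of "yyyy": in STRICT mode, year-of-era would also need an era.
  private static final DateTimeFormatter STRICT =
      DateTimeFormatter.ofPattern("uuuu-MM-dd").withResolverStyle(ResolverStyle.STRICT);

  private StrictDateSketch() {}

  // Returns null for dates that do not exist, e.g. 2017-02-31.
  static LocalDate parseOrNull(String s) {
    try {
      return LocalDate.parse(s, STRICT);
    } catch (DateTimeParseException e) {
      return null;
    }
  }

  // Lenient parsing silently rolls an invalid day over into the next month.
  static String lenient(String s) {
    SimpleDateFormat f = new SimpleDateFormat("yyyy-MM-dd");
    f.setLenient(true);
    try {
      return f.format(f.parse(s));
    } catch (ParseException e) {
      return null;
    }
  }
}
```

Here lenient("2017-06-31") yields 2017-07-01, the same value the second query above compares against, while parseOrNull returns null for it.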





[jira] [Updated] (HIVE-17073) Incorrect result with vectorization and SharedWorkOptimizer

2017-07-12 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-17073:
---
Attachment: HIVE-17073.03.patch

> Incorrect result with vectorization and SharedWorkOptimizer
> ---
>
> Key: HIVE-17073
> URL: https://issues.apache.org/jira/browse/HIVE-17073
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Affects Versions: 3.0.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-17073.01.patch, HIVE-17073.02.patch, 
> HIVE-17073.03.patch, HIVE-17073.patch
>
>
> We get an incorrect result with vectorization and a multi-output Select operator 
> created by SharedWorkOptimizer. It can be reproduced in the following way.
> {code:title=Correct}
> select count(*) as h8_30_to_9
>   from src
>   join src1 on src.key = src1.key
>   where src1.value = "val_278";
> OK
> 2
> {code}
> {code:title=Correct}
> select count(*) as h9_to_9_30
>   from src
>   join src1 on src.key = src1.key
>   where src1.value = "val_255";
> OK
> 2
> {code}
> {code:title=Incorrect}
> select * from (
>   select count(*) as h8_30_to_9
>   from src
>   join src1 on src.key = src1.key
>   where src1.value = "val_278") s1
> join (
>   select count(*) as h9_to_9_30
>   from src
>   join src1 on src.key = src1.key
>   where src1.value = "val_255") s2;
> OK
> 2 0
> {code}
> Problem seems to be that some ds in the batch row need to be re-initialized 
> after they have been forwarded to each output.





[jira] [Commented] (HIVE-16100) Dynamic Sorted Partition optimizer loses sibling operators

2017-07-12 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16083799#comment-16083799
 ] 

Hive QA commented on HIVE-16100:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12876773/HIVE-16100.4.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 11 failed/errored test(s), 10839 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[multi_insert_move_tasks_share_dependencies]
 (batchId=52)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[reducesink_dedup] 
(batchId=22)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] 
(batchId=143)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[columnstats_part_coltype]
 (batchId=157)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr]
 (batchId=145)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] 
(batchId=99)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=232)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] 
(batchId=232)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=177)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=177)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=177)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5977/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5977/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5977/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 11 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12876773 - PreCommit-HIVE-Build

> Dynamic Sorted Partition optimizer loses sibling operators
> --
>
> Key: HIVE-16100
> URL: https://issues.apache.org/jira/browse/HIVE-16100
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Affects Versions: 1.2.1, 2.1.1, 2.2.0
>Reporter: Gopal V
>Assignee: Gopal V
> Attachments: HIVE-16100.1.patch, HIVE-16100.2.patch, 
> HIVE-16100.2.patch, HIVE-16100.3.patch, HIVE-16100.4.patch
>
>
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/SortedDynPartitionOptimizer.java#L173
> {code}
>   // unlink connection between FS and its parent
>   fsParent = fsOp.getParentOperators().get(0);
>   fsParent.getChildOperators().clear();
> {code}
> The optimizer discards any cases where the fsParent has another SEL child 
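
To make the effect of that clear() call concrete, here is a toy model of an operator tree. It is a sketch with hypothetical types, not Hive's Operator class, and unlinkOnly is just one possible alternative rather than the change in the attached patch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy operator graph for illustration only.
final class UnlinkSketch {
  static final class Op {
    final String name;
    final List<Op> parents = new ArrayList<>();
    final List<Op> children = new ArrayList<>();

    Op(String name) {
      this.name = name;
    }
  }

  private UnlinkSketch() {}

  static void link(Op parent, Op child) {
    parent.children.add(child);
    child.parents.add(parent);
  }

  // Mirrors the quoted snippet: every child is dropped, siblings included.
  static void unlinkAll(Op parent) {
    parent.children.clear();
  }

  // Removes only the given child, so sibling operators stay attached.
  static void unlinkOnly(Op parent, Op child) {
    parent.children.remove(child);
    child.parents.remove(parent);
  }

  static List<String> childNames(Op parent) {
    List<String> names = new ArrayList<>();
    for (Op c : parent.children) {
      names.add(c.name);
    }
    return names;
  }
}
```

With a parent that has both an FS and a SEL child, unlinkOnly(parent, fs) leaves SEL in place, whereas unlinkAll(parent) removes both.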





[jira] [Updated] (HIVE-12631) LLAP: support ORC ACID tables

2017-07-12 Thread Teddy Choi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Teddy Choi updated HIVE-12631:
--
Attachment: HIVE-12631.16.patch

> LLAP: support ORC ACID tables
> -
>
> Key: HIVE-12631
> URL: https://issues.apache.org/jira/browse/HIVE-12631
> Project: Hive
>  Issue Type: Bug
>  Components: llap, Transactions
>Reporter: Sergey Shelukhin
>Assignee: Teddy Choi
> Attachments: HIVE-12631.10.patch, HIVE-12631.10.patch, 
> HIVE-12631.11.patch, HIVE-12631.11.patch, HIVE-12631.12.patch, 
> HIVE-12631.13.patch, HIVE-12631.15.patch, HIVE-12631.16.patch, 
> HIVE-12631.1.patch, HIVE-12631.2.patch, HIVE-12631.3.patch, 
> HIVE-12631.4.patch, HIVE-12631.5.patch, HIVE-12631.6.patch, 
> HIVE-12631.7.patch, HIVE-12631.8.patch, HIVE-12631.8.patch, HIVE-12631.9.patch
>
>
> LLAP uses a completely separate read path in ORC to allow for caching and 
> parallelization of reads and processing. This path does not support ACID. As 
> far as I remember ACID logic is embedded inside ORC format; we need to 
> refactor it to be on top of some interface, if practical; or just port it to 
> LLAP read path.
> Another consideration is how the logic will work with cache. The cache is 
> currently low-level (CB-level in ORC), so we could just use it to read bases 
> and deltas (deltas should be cached with higher priority) and merge as usual. 
> We could also cache merged representation in future.





[jira] [Commented] (HIVE-16989) Fix some issues identified by lgtm.com

2017-07-12 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16083863#comment-16083863
 ] 

Hive QA commented on HIVE-16989:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12876774/HIVE-16989.4.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 10 failed/errored test(s), 10841 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] 
(batchId=143)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] 
(batchId=99)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=232)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] 
(batchId=232)
org.apache.hadoop.hive.llap.security.TestLlapSignerImpl.testSigning 
(batchId=289)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=177)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=177)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=177)
org.apache.hive.jdbc.TestJdbcWithMiniHS2.testConcurrentStatements (batchId=226)
org.apache.hive.jdbc.TestJdbcWithMiniHS2.testHttpRetryOnServerIdleTimeout 
(batchId=226)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5978/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5978/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5978/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 10 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12876774 - PreCommit-HIVE-Build

> Fix some issues identified by lgtm.com
> --
>
> Key: HIVE-16989
> URL: https://issues.apache.org/jira/browse/HIVE-16989
> Project: Hive
>  Issue Type: Improvement
>Reporter: Malcolm Taylor
>Assignee: Malcolm Taylor
> Attachments: HIVE-16989.2.patch, HIVE-16989.3.patch, 
> HIVE-16989.4.patch, HIVE-16989.patch
>
>
> [lgtm.com|https://lgtm.com] has identified a number of issues where there may 
> be scope for improvement. The plan is to address some of the alerts found at 
> [https://lgtm.com/projects/g/apache/hive/].





[jira] [Commented] (HIVE-17073) Incorrect result with vectorization and SharedWorkOptimizer

2017-07-12 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16083925#comment-16083925
 ] 

Hive QA commented on HIVE-17073:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12876812/HIVE-17073.03.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 10840 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] 
(batchId=143)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=232)
org.apache.hadoop.hive.ql.exec.vector.TestVectorSelectOperator.testSelectOperator
 (batchId=272)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=177)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=177)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=177)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5979/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5979/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5979/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 6 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12876812 - PreCommit-HIVE-Build

> Incorrect result with vectorization and SharedWorkOptimizer
> ---
>
> Key: HIVE-17073
> URL: https://issues.apache.org/jira/browse/HIVE-17073
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Affects Versions: 3.0.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-17073.01.patch, HIVE-17073.02.patch, 
> HIVE-17073.03.patch, HIVE-17073.patch
>
>
> We get an incorrect result with vectorization and a multi-output Select operator 
> created by SharedWorkOptimizer. It can be reproduced in the following way.
> {code:title=Correct}
> select count(*) as h8_30_to_9
>   from src
>   join src1 on src.key = src1.key
>   where src1.value = "val_278";
> OK
> 2
> {code}
> {code:title=Correct}
> select count(*) as h9_to_9_30
>   from src
>   join src1 on src.key = src1.key
>   where src1.value = "val_255";
> OK
> 2
> {code}
> {code:title=Incorrect}
> select * from (
>   select count(*) as h8_30_to_9
>   from src
>   join src1 on src.key = src1.key
>   where src1.value = "val_278") s1
> join (
>   select count(*) as h9_to_9_30
>   from src
>   join src1 on src.key = src1.key
>   where src1.value = "val_255") s2;
> OK
> 2 0
> {code}
> Problem seems to be that some ds in the batch row need to be re-initialized 
> after they have been forwarded to each output.





[jira] [Assigned] (HIVE-17078) Add more logs to MapredLocalTask

2017-07-12 Thread Yibing Shi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yibing Shi reassigned HIVE-17078:
-

Assignee: Yibing Shi

> Add more logs to MapredLocalTask
> 
>
> Key: HIVE-17078
> URL: https://issues.apache.org/jira/browse/HIVE-17078
> Project: Hive
>  Issue Type: Improvement
>Reporter: Yibing Shi
>Assignee: Yibing Shi
>Priority: Minor
>
> By default, {{MapredLocalTask}} is executed in a child process of Hive, in 
> case the local task uses too many resources and affects Hive. Currently, 
> the stdout and stderr output of the child process is printed in Hive's 
> stdout/stderr log, which has no timestamp information and is 
> separated from the Hive service logs. This makes it hard to troubleshoot problems 
> in MapredLocalTasks.
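
One general way to address this is to read the child's output line by line and pass each line through a sink that adds a timestamp and a tag. The sketch below is an assumption-laden illustration, not the attached patch: ChildOutputPump, the tag format and the injected clock are all hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.time.Instant;
import java.util.function.Consumer;
import java.util.function.Supplier;

// Hypothetical helper: forwards child-process output with a timestamp per line.
final class ChildOutputPump {
  private ChildOutputPump() {}

  // Reads until end of stream and returns the number of lines forwarded.
  static int pump(InputStream childOut, String tag,
                  Supplier<Instant> clock, Consumer<String> sink) {
    int lines = 0;
    try (BufferedReader reader = new BufferedReader(
        new InputStreamReader(childOut, StandardCharsets.UTF_8))) {
      String line;
      while ((line = reader.readLine()) != null) {
        sink.accept(clock.get() + " [" + tag + "] " + line);
        lines++;
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return lines;
  }
}
```

In practice childOut would be Process.getInputStream() or getErrorStream(), the clock would be Instant::now, the sink would be the service logger, and each stream would get its own pump thread so that neither blocks the child.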





[jira] [Updated] (HIVE-17078) Add more logs to MapredLocalTask

2017-07-12 Thread Yibing Shi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yibing Shi updated HIVE-17078:
--
Status: Patch Available  (was: Open)

> Add more logs to MapredLocalTask
> 
>
> Key: HIVE-17078
> URL: https://issues.apache.org/jira/browse/HIVE-17078
> Project: Hive
>  Issue Type: Improvement
>Reporter: Yibing Shi
>Assignee: Yibing Shi
>Priority: Minor
> Attachments: HIVE-17078.1.patch
>
>
> By default, {{MapredLocalTask}} is executed in a child process of Hive, in 
> case the local task uses too many resources and affects Hive. Currently, 
> the stdout and stderr output of the child process is printed in Hive's 
> stdout/stderr log, which has no timestamp information and is 
> separated from the Hive service logs. This makes it hard to troubleshoot problems 
> in MapredLocalTasks.





[jira] [Updated] (HIVE-17078) Add more logs to MapredLocalTask

2017-07-12 Thread Yibing Shi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yibing Shi updated HIVE-17078:
--
Attachment: HIVE-17078.1.patch

Attached a quick patch.
No tests are added, because this feature does not seem to be testable in a mini 
cluster.

> Add more logs to MapredLocalTask
> 
>
> Key: HIVE-17078
> URL: https://issues.apache.org/jira/browse/HIVE-17078
> Project: Hive
>  Issue Type: Improvement
>Reporter: Yibing Shi
>Assignee: Yibing Shi
>Priority: Minor
> Attachments: HIVE-17078.1.patch
>
>
> By default, {{MapredLocalTask}} is executed in a child process of Hive, in 
> case the local task uses too many resources and affects Hive. Currently, 
> the stdout and stderr output of the child process is printed in Hive's 
> stdout/stderr log, which has no timestamp information and is 
> separated from the Hive service logs. This makes it hard to troubleshoot problems 
> in MapredLocalTasks.





[jira] [Commented] (HIVE-17078) Add more logs to MapredLocalTask

2017-07-12 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16084016#comment-16084016
 ] 

Hive QA commented on HIVE-17078:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12876842/HIVE-17078.1.patch

{color:red}ERROR:{color} -1 due to build exiting with an error

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5981/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5981/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5981/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hiveptest/working/scratch/source-prep.sh' failed with exit 
status 1 and output '+ date '+%Y-%m-%d %T.%3N'
2017-07-12 13:54:51.943
+ [[ -n /usr/lib/jvm/java-8-openjdk-amd64 ]]
+ export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ export 
PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ 
PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'MAVEN_OPTS=-Xmx1g '
+ MAVEN_OPTS='-Xmx1g '
+ cd /data/hiveptest/working/
+ tee /data/hiveptest/logs/PreCommit-HIVE-Build-5981/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ git = \s\v\n ]]
+ [[ git = \g\i\t ]]
+ [[ -z master ]]
+ [[ -d apache-github-source-source ]]
+ [[ ! -d apache-github-source-source/.git ]]
+ [[ ! -d apache-github-source-source ]]
+ date '+%Y-%m-%d %T.%3N'
2017-07-12 13:54:51.946
+ cd apache-github-source-source
+ git fetch origin
+ git reset --hard HEAD
HEAD is now at 26d6de7 HIVE-16730: Vectorization: Schema Evolution for Text 
Vectorization / Complex Types
+ git clean -f -d
Removing ql/src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderAdaptor.java
Removing 
ql/src/java/org/apache/hadoop/hive/ql/io/orc/VectorizedOrcAcidRowBatchReader.java.orig
Removing ql/src/test/queries/clientpositive/llap_acid_fast.q
Removing ql/src/test/results/clientpositive/llap/llap_acid.q.out
Removing ql/src/test/results/clientpositive/llap/llap_acid_fast.q.out
Removing ql/src/test/results/clientpositive/llap_acid_fast.q.out
+ git checkout master
Already on 'master'
Your branch is up-to-date with 'origin/master'.
+ git reset --hard origin/master
HEAD is now at 26d6de7 HIVE-16730: Vectorization: Schema Evolution for Text 
Vectorization / Complex Types
+ git merge --ff-only origin/master
Already up-to-date.
+ date '+%Y-%m-%d %T.%3N'
2017-07-12 13:54:57.818
+ patchCommandPath=/data/hiveptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hiveptest/working/scratch/build.patch
+ [[ -f /data/hiveptest/working/scratch/build.patch ]]
+ chmod +x /data/hiveptest/working/scratch/smart-apply-patch.sh
+ /data/hiveptest/working/scratch/smart-apply-patch.sh 
/data/hiveptest/working/scratch/build.patch
error: patch failed: 
ql/src/java/org/apache/hadoop/hive/ql/exec/mr/MapredLocalTask.java:36
error: ql/src/java/org/apache/hadoop/hive/ql/exec/mr/MapredLocalTask.java: 
patch does not apply
The patch does not appear to apply with p0, p1, or p2
+ exit 1
'
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12876842 - PreCommit-HIVE-Build

> Add more logs to MapredLocalTask
> 
>
> Key: HIVE-17078
> URL: https://issues.apache.org/jira/browse/HIVE-17078
> Project: Hive
>  Issue Type: Improvement
>Reporter: Yibing Shi
>Assignee: Yibing Shi
>Priority: Minor
> Attachments: HIVE-17078.1.patch
>
>
> By default, {{MapredLocalTask}} is executed in a child process of Hive, in 
> case the local task uses too many resources and affects Hive. Currently, 
> the stdout and stderr output of the child process is printed in Hive's 
> stdout/stderr log, which has no timestamp information and is 
> separated from the Hive service logs. This makes it hard to troubleshoot problems 
> in MapredLocalTasks.





[jira] [Commented] (HIVE-12631) LLAP: support ORC ACID tables

2017-07-12 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16084008#comment-16084008
 ] 

Hive QA commented on HIVE-12631:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12876820/HIVE-12631.16.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 35 failed/errored test(s), 10842 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[create_merge_compressed]
 (batchId=237)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[acid_vectorization] 
(batchId=61)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[llap_acid_fast] 
(batchId=38)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[llap_reader] (batchId=7)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[llap_uncompressed] 
(batchId=56)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] 
(batchId=143)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_llap_counters1]
 (batchId=140)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_llap_counters]
 (batchId=143)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_basic] 
(batchId=140)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_schema_evol_3a]
 (batchId=141)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[llap_acid_fast]
 (batchId=151)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr]
 (batchId=145)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=232)
org.apache.hadoop.hive.ql.TestTxnCommands2WithSplitUpdateAndVectorization.testACIDwithSchemaEvolutionAndCompaction
 (batchId=277)
org.apache.hadoop.hive.ql.TestTxnCommands2WithSplitUpdateAndVectorization.testAlterTable
 (batchId=277)
org.apache.hadoop.hive.ql.TestTxnCommands2WithSplitUpdateAndVectorization.testBucketizedInputFormat
 (batchId=277)
org.apache.hadoop.hive.ql.TestTxnCommands2WithSplitUpdateAndVectorization.testDeleteIn
 (batchId=277)
org.apache.hadoop.hive.ql.TestTxnCommands2WithSplitUpdateAndVectorization.testETLSplitStrategyForACID
 (batchId=277)
org.apache.hadoop.hive.ql.TestTxnCommands2WithSplitUpdateAndVectorization.testMerge2
 (batchId=277)
org.apache.hadoop.hive.ql.TestTxnCommands2WithSplitUpdateAndVectorization.testMerge3
 (batchId=277)
org.apache.hadoop.hive.ql.TestTxnCommands2WithSplitUpdateAndVectorization.testMultiInsertStatement
 (batchId=277)
org.apache.hadoop.hive.ql.TestTxnCommands2WithSplitUpdateAndVectorization.testNoHistory
 (batchId=277)
org.apache.hadoop.hive.ql.TestTxnCommands2WithSplitUpdateAndVectorization.testNonAcidToAcidConversion1
 (batchId=277)
org.apache.hadoop.hive.ql.TestTxnCommands2WithSplitUpdateAndVectorization.testNonAcidToAcidConversion2
 (batchId=277)
org.apache.hadoop.hive.ql.TestTxnCommands2WithSplitUpdateAndVectorization.testNonAcidToAcidConversion3
 (batchId=277)
org.apache.hadoop.hive.ql.TestTxnCommands2WithSplitUpdateAndVectorization.testOrcNoPPD
 (batchId=277)
org.apache.hadoop.hive.ql.TestTxnCommands2WithSplitUpdateAndVectorization.testOrcPPD
 (batchId=277)
org.apache.hadoop.hive.ql.TestTxnCommands2WithSplitUpdateAndVectorization.testUpdateMixedCase
 (batchId=277)
org.apache.hadoop.hive.ql.TestTxnCommands2WithSplitUpdateAndVectorization.updateDeletePartitioned
 (batchId=277)
org.apache.hadoop.hive.ql.TestTxnCommands2WithSplitUpdateAndVectorization.writeBetweenWorkerAndCleaner
 (batchId=277)
org.apache.hadoop.hive.ql.io.orc.TestVectorizedOrcAcidRowBatchReader.testVectorizedOrcAcidRowBatchReader
 (batchId=260)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=177)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=177)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=177)
org.apache.hive.jdbc.TestJdbcWithMiniHS2.testHttpRetryOnServerIdleTimeout 
(batchId=226)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5980/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5980/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5980/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 35 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12876820 - PreCommit-HIVE-Build

> LLAP: support ORC ACID tables
> -
>
> Key: HIVE-12631
> URL: https://issues.apache.org/jira/browse/HIVE-12631
> Project: Hive
> 

[jira] [Updated] (HIVE-8838) Support Parquet through HCatalog

2017-07-12 Thread Adam Szita (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam Szita updated HIVE-8838:
-
Attachment: HIVE-8838.4.patch

> Support Parquet through HCatalog
> 
>
> Key: HIVE-8838
> URL: https://issues.apache.org/jira/browse/HIVE-8838
> Project: Hive
>  Issue Type: Bug
>Reporter: Brock Noland
>Assignee: Adam Szita
> Attachments: HIVE-8838.0.patch, HIVE-8838.1.patch, HIVE-8838.2.patch, 
> HIVE-8838.3.patch, HIVE-8838.4.patch
>
>
> Similar to HIVE-8687 for Avro we need to fix Parquet with HCatalog.





[jira] [Updated] (HIVE-8838) Support Parquet through HCatalog

2017-07-12 Thread Adam Szita (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam Szita updated HIVE-8838:
-
Status: In Progress  (was: Patch Available)

> Support Parquet through HCatalog
> 
>
> Key: HIVE-8838
> URL: https://issues.apache.org/jira/browse/HIVE-8838
> Project: Hive
>  Issue Type: Bug
>Reporter: Brock Noland
>Assignee: Adam Szita
> Attachments: HIVE-8838.0.patch, HIVE-8838.1.patch, HIVE-8838.2.patch, 
> HIVE-8838.3.patch, HIVE-8838.4.patch
>
>
> Similar to HIVE-8687 for Avro we need to fix Parquet with HCatalog.





[jira] [Updated] (HIVE-8838) Support Parquet through HCatalog

2017-07-12 Thread Adam Szita (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam Szita updated HIVE-8838:
-
Status: Patch Available  (was: In Progress)

> Support Parquet through HCatalog
> 
>
> Key: HIVE-8838
> URL: https://issues.apache.org/jira/browse/HIVE-8838
> Project: Hive
>  Issue Type: Bug
>Reporter: Brock Noland
>Assignee: Adam Szita
> Attachments: HIVE-8838.0.patch, HIVE-8838.1.patch, HIVE-8838.2.patch, 
> HIVE-8838.3.patch, HIVE-8838.4.patch
>
>
> Similar to HIVE-8687 for Avro we need to fix Parquet with HCatalog.





[jira] [Commented] (HIVE-8838) Support Parquet through HCatalog

2017-07-12 Thread Adam Szita (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16084074#comment-16084074
 ] 

Adam Szita commented on HIVE-8838:
--

[~spena] I've addressed your comments in [^HIVE-8838.4.patch]

> Support Parquet through HCatalog
> 
>
> Key: HIVE-8838
> URL: https://issues.apache.org/jira/browse/HIVE-8838
> Project: Hive
>  Issue Type: Bug
>Reporter: Brock Noland
>Assignee: Adam Szita
> Attachments: HIVE-8838.0.patch, HIVE-8838.1.patch, HIVE-8838.2.patch, 
> HIVE-8838.3.patch, HIVE-8838.4.patch
>
>
> Similar to HIVE-8687 for Avro we need to fix Parquet with HCatalog.





[jira] [Updated] (HIVE-16831) Add unit tests for NPE fixes in HIVE-12054

2017-07-12 Thread Daniel Voros (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Voros updated HIVE-16831:

Issue Type: Test  (was: Bug)

> Add unit tests for NPE fixes in HIVE-12054
> --
>
> Key: HIVE-16831
> URL: https://issues.apache.org/jira/browse/HIVE-16831
> Project: Hive
>  Issue Type: Test
>  Components: Hive
>Reporter: Sunitha Beeram
>Assignee: Sunitha Beeram
> Fix For: 3.0.0
>
> Attachments: HIVE-16831.1.patch, HIVE-16831.2.patch
>
>
> HIVE-12054 fixed NPE issues related to ObjectInspector which get triggered 
> when an empty ORC table/partition is read.
> This work adds tests that trigger that path.





[jira] [Updated] (HIVE-17078) Add more logs to MapredLocalTask

2017-07-12 Thread Yibing Shi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yibing Shi updated HIVE-17078:
--
Attachment: HIVE-17078.2.patch

Recreated the patch.

> Add more logs to MapredLocalTask
> 
>
> Key: HIVE-17078
> URL: https://issues.apache.org/jira/browse/HIVE-17078
> Project: Hive
>  Issue Type: Improvement
>Reporter: Yibing Shi
>Assignee: Yibing Shi
>Priority: Minor
> Attachments: HIVE-17078.1.patch, HIVE-17078.2.patch
>
>
> By default, {{MapredLocalTask}} is executed in a child process of Hive, so 
> that a local task that uses too many resources does not affect Hive. Currently, 
> the stdout and stderr output of the child process is printed in Hive's 
> stdout/stderr log, which has no timestamp information and is 
> separated from the Hive service logs. This makes it hard to troubleshoot problems 
> in MapredLocalTasks.





[jira] [Commented] (HIVE-16880) Remove ArrayList Instantiation For Empty Arrays

2017-07-12 Thread Daniel Voros (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16084108#comment-16084108
 ] 

Daniel Voros commented on HIVE-16880:
-

[~belugabehr] are you sure that using immutable lists won't affect the behavior 
in these cases? For example 
[here|https://github.com/apache/hive/commit/8f004997025f032242a5b2db4c6baf9256e0ecbd#diff-6b5f3b952d1387946d488fc2d4432ee1R1228]
 in the constructor of AggrStats the list will be stored as a field and we 
might try to add to it later.
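
The concern above can be illustrated with a minimal, self-contained sketch. It assumes nothing about the actual AggrStats code: the `Holder` class is a hypothetical stand-in for any class that stores the list it is handed and later adds to it. With `Collections.emptyList()` the later `add` throws `UnsupportedOperationException`; a defensive copy avoids that at the cost of one allocation.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class EmptyListSketch {
  // Hypothetical holder: stores the list it is given as a field, like the
  // constructor discussed above might do.
  static final class Holder {
    private final List<String> items;

    Holder(List<String> items) {
      this.items = items;
    }

    void add(String s) {
      items.add(s);
    }
  }

  /** Returns true if a later add() fails because the stored list is immutable. */
  public static boolean addFails(List<String> backing) {
    try {
      new Holder(backing).add("x");
      return false;
    } catch (UnsupportedOperationException e) {
      return true;
    }
  }

  /** Defensive copy: gives the holder its own mutable list. */
  public static List<String> mutableCopy(List<String> source) {
    return new ArrayList<>(source);
  }

  public static void main(String[] args) {
    List<String> empty = Collections.emptyList();
    System.out.println("shared immutable list, add fails: " + addFails(empty));
    System.out.println("defensive copy, add fails: " + addFails(mutableCopy(empty)));
  }
}
```

Whether a shared immutable list is safe therefore depends on whether any consumer mutates it later, which has to be checked per call site.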

> Remove ArrayList Instantiation For Empty Arrays
> ---
>
> Key: HIVE-16880
> URL: https://issues.apache.org/jira/browse/HIVE-16880
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Affects Versions: 2.1.1, 3.0.0
>Reporter: BELUGA BEHR
>Assignee: BELUGA BEHR
>Priority: Trivial
> Fix For: 3.0.0
>
> Attachments: HIVE-16880.1.patch, HIVE-16880.2.patch
>
>
> Class {{org.apache.hadoop.hive.metastore.MetaStoreDirectSql}} uses a lot of 
> empty arrays in the code.  Please replace with a static empty array instead 
> of all the instantiation.





[jira] [Commented] (HIVE-8838) Support Parquet through HCatalog

2017-07-12 Thread JIRA

[ 
https://issues.apache.org/jira/browse/HIVE-8838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16084144#comment-16084144
 ] 

Sergio Peña commented on HIVE-8838:
---

Thanks, LGTM
+1

> Support Parquet through HCatalog
> 
>
> Key: HIVE-8838
> URL: https://issues.apache.org/jira/browse/HIVE-8838
> Project: Hive
>  Issue Type: Bug
>Reporter: Brock Noland
>Assignee: Adam Szita
> Attachments: HIVE-8838.0.patch, HIVE-8838.1.patch, HIVE-8838.2.patch, 
> HIVE-8838.3.patch, HIVE-8838.4.patch
>
>
> Similar to HIVE-8687 for Avro we need to fix Parquet with HCatalog.





[jira] [Updated] (HIVE-4577) hive CLI can't handle hadoop dfs command with space and quotes.

2017-07-12 Thread Bing Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bing Li updated HIVE-4577:
--
Attachment: HIVE-4577.6.patch

> hive CLI can't handle hadoop dfs command  with space and quotes.
> 
>
> Key: HIVE-4577
> URL: https://issues.apache.org/jira/browse/HIVE-4577
> Project: Hive
>  Issue Type: Bug
>  Components: CLI
>Affects Versions: 0.9.0, 0.10.0, 0.14.0, 0.13.1, 1.2.0, 1.1.0
>Reporter: Bing Li
>Assignee: Bing Li
> Attachments: HIVE-4577.1.patch, HIVE-4577.2.patch, 
> HIVE-4577.3.patch.txt, HIVE-4577.4.patch, HIVE-4577.5.patch, HIVE-4577.6.patch
>
>
> By design, Hive supports hadoop dfs commands in the Hive shell, like 
> hive> dfs -mkdir /user/biadmin/mydir;
> but it behaves differently from hadoop if the path contains spaces or quotes:
> hive> dfs -mkdir "hello"; 
> drwxr-xr-x   - biadmin supergroup  0 2013-04-23 09:40 
> /user/biadmin/"hello"
> hive> dfs -mkdir 'world';
> drwxr-xr-x   - biadmin supergroup  0 2013-04-23 09:43 
> /user/biadmin/'world'
> hive> dfs -mkdir "bei jing";
> drwxr-xr-x   - biadmin supergroup  0 2013-04-23 09:44 
> /user/biadmin/"bei
> drwxr-xr-x   - biadmin supergroup  0 2013-04-23 09:44 
> /user/biadmin/jing"
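
For illustration, here is a minimal sketch of quote-aware argument splitting, the behavior a shell gives to `hadoop dfs`. It is not the code from any HIVE-4577 patch; the class and method names are hypothetical, and escape sequences such as `\"` are deliberately not handled.

```java
import java.util.ArrayList;
import java.util.List;

public class DfsArgSplitSketch {
  /**
   * Splits a command line on whitespace, treating text inside matching
   * single or double quotes as part of one argument and dropping the quotes.
   */
  public static List<String> split(String command) {
    List<String> tokens = new ArrayList<>();
    StringBuilder current = new StringBuilder();
    char quote = 0;          // 0 means "not inside quotes"
    boolean inToken = false; // true once the current token has started
    for (int i = 0; i < command.length(); i++) {
      char c = command.charAt(i);
      if (quote != 0) {
        if (c == quote) {
          quote = 0;         // closing quote: leave quoted mode
        } else {
          current.append(c); // keep everything inside quotes, spaces included
        }
      } else if (c == '"' || c == '\'') {
        quote = c;           // opening quote starts (or continues) a token
        inToken = true;
      } else if (Character.isWhitespace(c)) {
        if (inToken) {
          tokens.add(current.toString());
          current.setLength(0);
          inToken = false;
        }
      } else {
        current.append(c);
        inToken = true;
      }
    }
    if (inToken) {
      tokens.add(current.toString());
    }
    return tokens;
  }

  public static void main(String[] args) {
    System.out.println(split("-mkdir \"bei jing\""));
    System.out.println(split("-mkdir 'world'"));
  }
}
```

With this splitting, `dfs -mkdir "bei jing"` yields a single path argument `bei jing` instead of the two directories shown above.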





[jira] [Updated] (HIVE-16402) Upgrade to Hadoop 2.8.0

2017-07-12 Thread Sahil Takiar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated HIVE-16402:

Fix Version/s: 2.2.0

> Upgrade to Hadoop 2.8.0
> ---
>
> Key: HIVE-16402
> URL: https://issues.apache.org/jira/browse/HIVE-16402
> Project: Hive
>  Issue Type: Bug
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Fix For: 2.2.0, 3.0.0
>
> Attachments: HIVE-16402.1.patch, HIVE-16402.2.patch, 
> HIVE-16402.3.patch, HIVE-16402.4.patch, HIVE-16402.5.patch, 
> HIVE-16402.6.patch, HIVE-16402.7.patch
>
>
> Hadoop 2.8.0 has been out since March, we should upgrade to it. Release notes 
> for Hadoop 2.8.x are here: http://hadoop.apache.org/docs/r2.8.0/index.html
> It has a number of useful features, improvements for S3 support, ADLS 
> support, etc. along with a bunch of other fixes. This should also help us on 
> our way to upgrading to Hadoop 3.x (HIVE-15016).





[jira] [Updated] (HIVE-17072) Make the parallelized timeout configurable in BeeLine tests

2017-07-12 Thread Marta Kuczora (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marta Kuczora updated HIVE-17072:
-
Attachment: HIVE-17072.1.patch

> Make the parallelized timeout configurable in BeeLine tests
> ---
>
> Key: HIVE-17072
> URL: https://issues.apache.org/jira/browse/HIVE-17072
> Project: Hive
>  Issue Type: Improvement
>  Components: Testing Infrastructure
>Reporter: Marta Kuczora
>Assignee: Marta Kuczora
>Priority: Minor
> Attachments: HIVE-17072.1.patch
>
>
> When running the BeeLine tests in parallel, the timeout is hardcoded in 
> Parallelized.java:
> {noformat}
> @Override
> public void finished() {
>   executor.shutdown();
>   try {
>     executor.awaitTermination(10, TimeUnit.MINUTES);
>   } catch (InterruptedException exc) {
>     throw new RuntimeException(exc);
>   }
> }
> {noformat}
> It would be better to make it configurable.
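
One possible shape for such a change is sketched below. This is only an assumption about how it could look, not the content of HIVE-17072.1.patch: the system property name `test.parallel.timeout.minutes` and the class name are hypothetical, and the 10-minute default simply mirrors the hardcoded value quoted above.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ConfigurableTimeoutSketch {
  // Hypothetical property name; the real patch may use a different mechanism.
  public static final String TIMEOUT_PROPERTY = "test.parallel.timeout.minutes";
  // Default mirrors the value hardcoded in the snippet quoted above.
  public static final long DEFAULT_TIMEOUT_MINUTES = 10;

  /** Reads the timeout from a system property, falling back to the default. */
  public static long timeoutMinutes() {
    String raw = System.getProperty(TIMEOUT_PROPERTY);
    if (raw == null) {
      return DEFAULT_TIMEOUT_MINUTES;
    }
    try {
      long value = Long.parseLong(raw.trim());
      return value > 0 ? value : DEFAULT_TIMEOUT_MINUTES;
    } catch (NumberFormatException e) {
      return DEFAULT_TIMEOUT_MINUTES; // ignore malformed values
    }
  }

  /** Shuts the executor down and waits up to the configured timeout. */
  public static boolean shutdownAndWait(ExecutorService executor) {
    executor.shutdown();
    try {
      return executor.awaitTermination(timeoutMinutes(), TimeUnit.MINUTES);
    } catch (InterruptedException exc) {
      Thread.currentThread().interrupt(); // preserve the interrupt flag
      throw new RuntimeException(exc);
    }
  }

  public static void main(String[] args) {
    ExecutorService executor = Executors.newFixedThreadPool(2);
    executor.submit(() -> System.out.println("test batch"));
    System.out.println("terminated in time: " + shutdownAndWait(executor));
  }
}
```

Under these assumptions, a run could override the timeout with `-Dtest.parallel.timeout.minutes=40` without a code change.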





[jira] [Commented] (HIVE-4577) hive CLI can't handle hadoop dfs command with space and quotes.

2017-07-12 Thread Bing Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16084174#comment-16084174
 ] 

Bing Li commented on HIVE-4577:
---

Thank you, [~vgumashta]. I could reproduce the TestPerfCliDriver [query14] 
failure in my env, and updated its golden file. The failures of 
TestMiniLlapLocalCliDriver[vector_if_expr] and 
TestBeeLineDriver[materialized_view_create_rewrite] should not be caused by this 
patch.

> hive CLI can't handle hadoop dfs command  with space and quotes.
> 
>
> Key: HIVE-4577
> URL: https://issues.apache.org/jira/browse/HIVE-4577
> Project: Hive
>  Issue Type: Bug
>  Components: CLI
>Affects Versions: 0.9.0, 0.10.0, 0.14.0, 0.13.1, 1.2.0, 1.1.0
>Reporter: Bing Li
>Assignee: Bing Li
> Attachments: HIVE-4577.1.patch, HIVE-4577.2.patch, 
> HIVE-4577.3.patch.txt, HIVE-4577.4.patch, HIVE-4577.5.patch, HIVE-4577.6.patch
>
>
> By design, Hive supports hadoop dfs commands in the Hive shell, like 
> hive> dfs -mkdir /user/biadmin/mydir;
> but it behaves differently from hadoop if the path contains spaces or quotes:
> hive> dfs -mkdir "hello"; 
> drwxr-xr-x   - biadmin supergroup  0 2013-04-23 09:40 
> /user/biadmin/"hello"
> hive> dfs -mkdir 'world';
> drwxr-xr-x   - biadmin supergroup  0 2013-04-23 09:43 
> /user/biadmin/'world'
> hive> dfs -mkdir "bei jing";
> drwxr-xr-x   - biadmin supergroup  0 2013-04-23 09:44 
> /user/biadmin/"bei
> drwxr-xr-x   - biadmin supergroup  0 2013-04-23 09:44 
> /user/biadmin/jing"





[jira] [Updated] (HIVE-16922) Typo in serde.thrift: COLLECTION_DELIM = "colelction.delim"

2017-07-12 Thread Bing Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bing Li updated HIVE-16922:
---
Attachment: (was: HIVE-16922.2.patch)

> Typo in serde.thrift: COLLECTION_DELIM = "colelction.delim"
> ---
>
> Key: HIVE-16922
> URL: https://issues.apache.org/jira/browse/HIVE-16922
> Project: Hive
>  Issue Type: Bug
>  Components: Thrift API
>Reporter: Dudu Markovitz
>Assignee: Bing Li
> Attachments: HIVE-16922.1.patch
>
>
> https://github.com/apache/hive/blob/master/serde/if/serde.thrift
> Typo in serde.thrift: 
> COLLECTION_DELIM = "colelction.delim"
> (*colelction* instead of *collection*)





[jira] [Updated] (HIVE-16922) Typo in serde.thrift: COLLECTION_DELIM = "colelction.delim"

2017-07-12 Thread Bing Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bing Li updated HIVE-16922:
---
Attachment: HIVE-16922.2.patch

> Typo in serde.thrift: COLLECTION_DELIM = "colelction.delim"
> ---
>
> Key: HIVE-16922
> URL: https://issues.apache.org/jira/browse/HIVE-16922
> Project: Hive
>  Issue Type: Bug
>  Components: Thrift API
>Reporter: Dudu Markovitz
>Assignee: Bing Li
> Attachments: HIVE-16922.1.patch, HIVE-16922.2.patch
>
>
> https://github.com/apache/hive/blob/master/serde/if/serde.thrift
> Typo in serde.thrift: 
> COLLECTION_DELIM = "colelction.delim"
> (*colelction* instead of *collection*)





[jira] [Commented] (HIVE-16922) Typo in serde.thrift: COLLECTION_DELIM = "colelction.delim"

2017-07-12 Thread Bing Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16084183#comment-16084183
 ] 

Bing Li commented on HIVE-16922:


Thank you, [~lirui]. It seems that the result page has expired. I just 
re-submitted the patch to check.

> Typo in serde.thrift: COLLECTION_DELIM = "colelction.delim"
> ---
>
> Key: HIVE-16922
> URL: https://issues.apache.org/jira/browse/HIVE-16922
> Project: Hive
>  Issue Type: Bug
>  Components: Thrift API
>Reporter: Dudu Markovitz
>Assignee: Bing Li
> Attachments: HIVE-16922.1.patch, HIVE-16922.2.patch
>
>
> https://github.com/apache/hive/blob/master/serde/if/serde.thrift
> Typo in serde.thrift: 
> COLLECTION_DELIM = "colelction.delim"
> (*colelction* instead of *collection*)





[jira] [Updated] (HIVE-17072) Make the parallelized timeout configurable in BeeLine tests

2017-07-12 Thread Marta Kuczora (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marta Kuczora updated HIVE-17072:
-
Status: Patch Available  (was: Open)

> Make the parallelized timeout configurable in BeeLine tests
> ---
>
> Key: HIVE-17072
> URL: https://issues.apache.org/jira/browse/HIVE-17072
> Project: Hive
>  Issue Type: Improvement
>  Components: Testing Infrastructure
>Reporter: Marta Kuczora
>Assignee: Marta Kuczora
>Priority: Minor
> Attachments: HIVE-17072.1.patch
>
>
> When running the BeeLine tests in parallel, the timeout is hardcoded in 
> Parallelized.java:
> {noformat}
> @Override
> public void finished() {
>   executor.shutdown();
>   try {
>     executor.awaitTermination(10, TimeUnit.MINUTES);
>   } catch (InterruptedException exc) {
>     throw new RuntimeException(exc);
>   }
> }
> {noformat}
> It would be better to make it configurable.





[jira] [Updated] (HIVE-17073) Incorrect result with vectorization and SharedWorkOptimizer

2017-07-12 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-17073:
---
   Resolution: Fixed
Fix Version/s: 3.0.0
   Status: Resolved  (was: Patch Available)

Fixed TestVectorSelectOperator and pushed to master, thanks for reviewing 
[~mmccline]!

> Incorrect result with vectorization and SharedWorkOptimizer
> ---
>
> Key: HIVE-17073
> URL: https://issues.apache.org/jira/browse/HIVE-17073
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Affects Versions: 3.0.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Fix For: 3.0.0
>
> Attachments: HIVE-17073.01.patch, HIVE-17073.02.patch, 
> HIVE-17073.03.patch, HIVE-17073.patch
>
>
> We get incorrect result with vectorization and multi-output Select operator 
> created by SharedWorkOptimizer. It can be reproduced in the following way.
> {code:title=Correct}
> select count(*) as h8_30_to_9
>   from src
>   join src1 on src.key = src1.key
>   where src1.value = "val_278";
> OK
> 2
> {code}
> {code:title=Correct}
> select count(*) as h9_to_9_30
>   from src
>   join src1 on src.key = src1.key
>   where src1.value = "val_255";
> OK
> 2
> {code}
> {code:title=Incorrect}
> select * from (
>   select count(*) as h8_30_to_9
>   from src
>   join src1 on src.key = src1.key
>   where src1.value = "val_278") s1
> join (
>   select count(*) as h9_to_9_30
>   from src
>   join src1 on src.key = src1.key
>   where src1.value = "val_255") s2;
> OK
> 2 0
> {code}
> Problem seems to be that some ds in the batch row need to be re-initialized 
> after they have been forwarded to each output.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-8838) Support Parquet through HCatalog

2017-07-12 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16084197#comment-16084197
 ] 

Hive QA commented on HIVE-8838:
---



Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12876860/HIVE-8838.4.patch

{color:green}SUCCESS:{color} +1 due to 6 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 7 failed/errored test(s), 10873 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] 
(batchId=143)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr]
 (batchId=145)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] 
(batchId=99)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=232)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=177)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=177)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=177)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5982/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5982/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5982/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 7 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12876860 - PreCommit-HIVE-Build

> Support Parquet through HCatalog
> 
>
> Key: HIVE-8838
> URL: https://issues.apache.org/jira/browse/HIVE-8838
> Project: Hive
>  Issue Type: Bug
>Reporter: Brock Noland
>Assignee: Adam Szita
> Attachments: HIVE-8838.0.patch, HIVE-8838.1.patch, HIVE-8838.2.patch, 
> HIVE-8838.3.patch, HIVE-8838.4.patch
>
>
> Similar to HIVE-8687 for Avro we need to fix Parquet with HCatalog.





[jira] [Commented] (HIVE-8838) Support Parquet through HCatalog

2017-07-12 Thread Adam Szita (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16084226#comment-16084226
 ] 

Adam Szita commented on HIVE-8838:
--

The test failures above are again unrelated to this patch. I think this is ready for commit.

> Support Parquet through HCatalog
> 
>
> Key: HIVE-8838
> URL: https://issues.apache.org/jira/browse/HIVE-8838
> Project: Hive
>  Issue Type: Bug
>Reporter: Brock Noland
>Assignee: Adam Szita
> Attachments: HIVE-8838.0.patch, HIVE-8838.1.patch, HIVE-8838.2.patch, 
> HIVE-8838.3.patch, HIVE-8838.4.patch
>
>
> Similar to HIVE-8687 for Avro we need to fix Parquet with HCatalog.





[jira] [Updated] (HIVE-13384) Failed to create HiveMetaStoreClient object with proxy user when Kerberos enabled

2017-07-12 Thread Bing Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bing Li updated HIVE-13384:
---
Description: 
I wrote a Java client to talk with HiveMetaStore (Hive 1.2.0), 
but found that it can't create a HiveMetaStoreClient object successfully via a 
proxy user in a Kerberos env.

===
15/10/13 00:14:38 ERROR transport.TSaslTransport: SASL negotiation failure
javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: 
No valid credentials provided (Mechanism level: Failed to find any Kerberos 
tgt)]
at 
com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:211)
at 
org.apache.thrift.transport.TSaslClientTransport.handleSaslStartMessage(TSaslClientTransport.java:94)
at 
org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:271)
==

While debugging Hive, I found that the error came from the open() method in the 
HiveMetaStoreClient class.

Around line 406,
 transport = UserGroupInformation.getCurrentUser().doAs(new 
PrivilegedExceptionAction() {  //FAILED, because the current user 
doesn't have the credential

But it works if I change the above line to
 transport = UserGroupInformation.getCurrentUser().getRealUser().doAs(new 
PrivilegedExceptionAction() {  //PASS

I found that DRILL-3413 fixes this error on the Drill side as a workaround. But if I 
submit a MapReduce job via Pig/HCatalog, it runs into the same issue again when 
initializing the object via HCatalog.

It would be better to fix this issue on the Hive side.

  was:
I wrote a Java client to talk with HiveMetaStore. (Hive 1.2.0)
But found that it can't new a HiveMetaStoreClient object successfully via a 
proxy using in Kerberos env.

===
15/10/13 00:14:38 ERROR transport.TSaslTransport: SASL negotiation failure
javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: 
No valid credentials provided (Mechanism level: Failed to find any Kerberos 
tgt)]
at 
com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:211)
at 
org.apache.thrift.transport.TSaslClientTransport.handleSaslStartMessage(TSaslClientTransport.java:94)
at 
org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:271)
==

When I debugging on Hive, I found that the error came from open() method in 
HiveMetaStoreClient class.

Around line 406,
 transport = UserGroupInformation.getCurrentUser().doAs(new 
PrivilegedExceptionAction() {  //FAILED, because the current user 
doesn't have the cridential

But it will work if I change above line to
 transport = UserGroupInformation.getCurrentUser().getRealUser().doAs(new 
PrivilegedExceptionAction() {  //PASS

I found DRILL-3413 fixes this error in Drill side as a workaround. But if I 
submit a mapreduce job via Pig/HCatalog, it runs into the same issue again when 
initialize the object via HCatalog.

It would be better to fix this issue in Hive side.


> Failed to create HiveMetaStoreClient object with proxy user when Kerberos 
> enabled
> -
>
> Key: HIVE-13384
> URL: https://issues.apache.org/jira/browse/HIVE-13384
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Affects Versions: 1.2.0, 1.2.1
>Reporter: Bing Li
>Assignee: Bing Li
>
> I wrote a Java client to talk with HiveMetaStore. (Hive 1.2.0)
> But found that it can't new a HiveMetaStoreClient object successfully via a 
> proxy user in Kerberos env.
> ===
> 15/10/13 00:14:38 ERROR transport.TSaslTransport: SASL negotiation failure
> javax.security.sasl.SaslException: GSS initiate failed [Caused by 
> GSSException: No valid credentials provided (Mechanism level: Failed to find 
> any Kerberos tgt)]
> at 
> com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:211)
> at 
> org.apache.thrift.transport.TSaslClientTransport.handleSaslStartMessage(TSaslClientTransport.java:94)
> at 
> org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:271)
> ==
> While debugging Hive, I found that the error came from the open() method in the 
> HiveMetaStoreClient class.
> Around line 406,
>  transport = UserGroupInformation.getCurrentUser().doAs(new 
> PrivilegedExceptionAction() {  //FAILED, because the current user 
> doesn't have the credential
> But it will work if I change above line to
>  transport = UserGroupInformation.getCurrentUser().getRealUser().doAs(new 
> PrivilegedExceptionAction() {  //PASS
> I found DRILL-3413 fixes this error in Drill side as a workaround. But if I 
> submit a mapreduce job via Pig/HCatalog, it runs into the same issue again 
> when initialize the object

[jira] [Commented] (HIVE-16907) "INSERT INTO" overwrite old data when destination table encapsulated by backquote

2017-07-12 Thread Bing Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16084245#comment-16084245
 ] 

Bing Li commented on HIVE-16907:


[~pxiong] and [~lirui], thank you for your comments.
I tried the CREATE TABLE statement in MySQL, and found that it treats the 
backquoted `db.tbl` as the table name, so a dot is allowed in a table name. 
e.g.

{code:java}
mysql> create table xxx (col int);
mysql> create table test.yyy (col int);
mysql> create table `test.zzz` (col int);
mysql> create table `test.test.tbl` (col int);

mysql> show tables;
+----------------+
| Tables_in_test |
+----------------+
| test.test.tbl  |
| test.zzz       |
| xxx            |
| yyy            |
+----------------+
{code}

Back to Hive: if we would like it to behave the same way as MySQL, we 
should change the logic that processes the name.
My previous patch is NOT enough, and it can't handle `db.db.tbl` either.
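
To make the MySQL-style rule concrete, here is a small sketch that splits a possibly qualified name on dots only when they fall outside backquotes. It is an illustration of the semantics described above, not Hive's parser and not part of any HIVE-16907 patch; the class and method names are hypothetical, and doubled backquotes as an escape are not handled.

```java
import java.util.ArrayList;
import java.util.List;

public class BackquotedNameSketch {
  /**
   * Splits a name into identifier parts. A dot separates parts only when it
   * appears outside backquotes; the backquotes themselves are dropped.
   */
  public static List<String> splitQualifiedName(String name) {
    List<String> parts = new ArrayList<>();
    StringBuilder current = new StringBuilder();
    boolean quoted = false;
    for (char c : name.toCharArray()) {
      if (c == '`') {
        quoted = !quoted;              // toggle quoted mode
      } else if (c == '.' && !quoted) {
        parts.add(current.toString()); // unquoted dot ends the current part
        current.setLength(0);
      } else {
        current.append(c);
      }
    }
    parts.add(current.toString());
    return parts;
  }

  public static void main(String[] args) {
    System.out.println(splitQualifiedName("test.yyy"));        // two parts
    System.out.println(splitQualifiedName("`test.zzz`"));      // one part
    System.out.println(splitQualifiedName("`test.test.tbl`")); // one part
  }
}
```

Under this rule `test.yyy` is database `test` plus table `yyy`, while `` `test.zzz` `` and `` `test.test.tbl` `` are each a single table name, matching the `show tables` output above.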

>  "INSERT INTO"  overwrite old data when destination table encapsulated by 
> backquote 
> 
>
> Key: HIVE-16907
> URL: https://issues.apache.org/jira/browse/HIVE-16907
> Project: Hive
>  Issue Type: Bug
>  Components: Parser
>Affects Versions: 1.1.0, 2.1.1
>Reporter: Nemon Lou
>Assignee: Bing Li
> Attachments: HIVE-16907.1.patch
>
>
> A way to reproduce:
> {noformat}
> create database tdb;
> use tdb;
> create table t1(id int);
> create table t2(id int);
> explain insert into `tdb.t1` select * from t2;
> {noformat}
> {noformat}
> +---+
> | Explain |
> +---+
> | STAGE DEPENDENCIES: |
> |   Stage-1 is a root stage |
> |   Stage-6 depends on stages: Stage-1 , consists of Stage-3, Stage-2, Stage-4 |
> |   Stage-3 |
> |   Stage-0 depends on stages: Stage-3, Stage-2, Stage-5 |
> |   Stage-2 |
> |   Stage-4 |
> |   Stage-5 depends on stages: Stage-4 |
> | |
> | STAGE PLANS: |
> |   Stage: Stage-1 |
> | Map Reduce |
> |   Map Operator Tree: |
> |   TableScan |
> | alias: t2 |
> | Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: NONE |
> | Select Operator |
> |   expressions: id (type: int) |
> |   outputColumnNames: _col0

[jira] [Assigned] (HIVE-16999) Performance bottleneck in the ADD FILE/ARCHIVE commands for an HDFS resource

2017-07-12 Thread Bing Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bing Li reassigned HIVE-16999:
--

Assignee: Bing Li

> Performance bottleneck in the ADD FILE/ARCHIVE commands for an HDFS resource
> 
>
> Key: HIVE-16999
> URL: https://issues.apache.org/jira/browse/HIVE-16999
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Sailee Jain
>Assignee: Bing Li
>Priority: Critical
>
> A performance bottleneck was found when adding a resource [which resides on HDFS] to 
> the distributed cache. 
> The commands used are:
> {code:java}
> 1. ADD ARCHIVE "hdfs://some_dir/archive.tar"
> 2. ADD FILE "hdfs://some_dir/file.txt"
> {code}
> Here is the log corresponding to the archive adding operation:-
> {noformat}
>  converting to local hdfs://some_dir/archive.tar
>  Added resources: [hdfs://some_dir/archive.tar
> {noformat}
> Hive is downloading the resource to the local filesystem [shown in log by 
> "converting to local"]. 
> {color:#d04437}Ideally there is no need to bring the file to the local 
> filesystem when this operation is all about copying the file from one 
> location on HDFS to other location on HDFS[distributed cache].{color}
> This adds a significant performance bottleneck when the resource is a big file 
> and all commands need the same resource.
> After debugging, the impacted piece of code was found to be:
> {code:java}
> public List<String> add_resources(ResourceType t, Collection<String> values,
>     boolean convertToUnix) throws RuntimeException {
>   Set<String> resourceSet = resourceMaps.getResourceSet(t);
>   Map<String, Set<String>> resourcePathMap = resourceMaps.getResourcePathMap(t);
>   Map<String, Set<String>> reverseResourcePathMap =
>       resourceMaps.getReverseResourcePathMap(t);
>   List<String> localized = new ArrayList<String>();
>   try {
>     for (String value : values) {
>       String key;
>       {color:#d04437}//get the local path of downloaded jars{color}
>       List<URI> downloadedURLs = resolveAndDownload(t, value, convertToUnix);
>  ;
>   .
> {code}
> {code:java}
> List<URI> resolveAndDownload(ResourceType t, String value, boolean convertToUnix)
>     throws URISyntaxException, IOException {
>   URI uri = createURI(value);
>   if (getURLType(value).equals("file")) {
>     return Arrays.asList(uri);
>   } else if (getURLType(value).equals("ivy")) {
>     return dependencyResolver.downloadDependencies(uri);
>   } else { // goes here for HDFS
>     // When the resource is not local, it is downloaded to the local machine.
>     return Arrays.asList(createURI(downloadResource(value, convertToUnix)));
>   }
> }
> {code}
> Here, the function resolveAndDownload() always calls the downloadResource() 
> API for an external filesystem. It should take into account that when the 
> resource is already on the same HDFS, bringing it to the local machine is 
> unnecessary and can be skipped for better performance.
> Thanks,
> Sailee
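
The check suggested above could look roughly like the following sketch, which uses only `java.net.URI`. It is an assumption about one way to decide, not Hive code: the class and method names are hypothetical, the cluster's default filesystem URI is passed in as a parameter rather than read from any configuration, and hostnames are compared verbatim (no case folding or alias resolution).

```java
import java.net.URI;
import java.util.Objects;

public class ResourceLocalitySketch {
  /**
   * Returns true if the resource would have to be copied from a different
   * filesystem, false if it is a local file or already on the default one.
   */
  public static boolean needsDownload(URI resource, URI defaultFs) {
    String scheme = resource.getScheme();
    if (scheme == null || scheme.equalsIgnoreCase("file")) {
      return false; // no scheme or file: treated as local
    }
    boolean sameScheme = scheme.equalsIgnoreCase(defaultFs.getScheme());
    boolean sameAuthority =
        Objects.equals(resource.getAuthority(), defaultFs.getAuthority());
    return !(sameScheme && sameAuthority);
  }

  public static void main(String[] args) {
    URI defaultFs = URI.create("hdfs://nn1:8020"); // hypothetical namenode
    System.out.println(
        needsDownload(URI.create("hdfs://nn1:8020/some_dir/archive.tar"), defaultFs));
    System.out.println(
        needsDownload(URI.create("hdfs://other:8020/some_dir/archive.tar"), defaultFs));
  }
}
```

A resource on the same scheme and authority as the default filesystem could then be referenced in place, while anything else would still go through the existing download path.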





[jira] [Commented] (HIVE-17078) Add more logs to MapredLocalTask

2017-07-12 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16084287#comment-16084287
 ] 

Hive QA commented on HIVE-17078:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12876861/HIVE-17078.2.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 8 failed/errored test(s), 10840 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[insert_overwrite_local_directory_1]
 (batchId=237)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] 
(batchId=143)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=232)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=177)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=177)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=177)
org.apache.hive.jdbc.TestJdbcWithMiniHS2.testConcurrentStatements (batchId=226)
org.apache.hive.jdbc.TestSSL.testMetastoreWithSSL (batchId=223)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5983/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5983/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5983/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 8 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12876861 - PreCommit-HIVE-Build

> Add more logs to MapredLocalTask
> 
>
> Key: HIVE-17078
> URL: https://issues.apache.org/jira/browse/HIVE-17078
> Project: Hive
>  Issue Type: Improvement
>Reporter: Yibing Shi
>Assignee: Yibing Shi
>Priority: Minor
> Attachments: HIVE-17078.1.patch, HIVE-17078.2.patch
>
>
> By default, {{MapredLocalTask}} is executed in a child process of Hive, so 
> that a local task that uses too many resources does not affect Hive. Currently, 
> the stdout and stderr output of the child process is printed to Hive's 
> stdout/stderr log, which has no timestamps and is separate from the Hive 
> service logs. This makes it hard to troubleshoot problems in 
> MapredLocalTasks.
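
A rough sketch of the idea, assuming the child's stdout/stderr can be read line by line and handed to whatever logger the service already uses. ChildOutputRelay, relay and the tag format are hypothetical names for illustration; this is not the code in the attached patches.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.function.Consumer;

class ChildOutputRelay {
    /**
     * Reads every line from a child process stream and forwards it to a sink,
     * prefixed with a tag so stdout and stderr stay distinguishable.
     * Returns the number of lines forwarded.
     */
    static int relay(InputStream childStream, String tag, Consumer<String> sink) {
        int lines = 0;
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(childStream, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                sink.accept("[" + tag + "] " + line);
                lines++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }
}
```

If the sink is a logger method reference (for example LOG::info on an existing logger), each forwarded line picks up that logger's timestamp and lands in the service log. In practice each stream would be relayed on its own thread so that neither pipe blocks the child.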





[jira] [Updated] (HIVE-16732) Transactional tables should block LOAD DATA

2017-07-12 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-16732:
--
Attachment: HIVE-16732.03-branch-2.patch

> Transactional tables should block LOAD DATA 
> 
>
> Key: HIVE-16732
> URL: https://issues.apache.org/jira/browse/HIVE-16732
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 1.0.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
> Attachments: HIVE-16732.01.patch, HIVE-16732.02.patch, 
> HIVE-16732.03-branch-2.patch, HIVE-16732.03.patch
>
>
> This has always been the design.
> see LoadSemanticAnalyzer.analyzeInternal()
> StrictChecks.checkBucketing(conf);
> Some examples (this is exposed by HIVE-16177)
> insert_values_orig_table.q
>  insert_orig_table.q
>  insert_values_orig_table_use_metadata.q





[jira] [Commented] (HIVE-4577) hive CLI can't handle hadoop dfs command with space and quotes.

2017-07-12 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16084376#comment-16084376
 ] 

Hive QA commented on HIVE-4577:
---



Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12876871/HIVE-4577.6.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 10 failed/errored test(s), 10841 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[materialized_view_create_rewrite]
 (batchId=237)
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[smb_mapjoin_12] 
(batchId=237)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[dfscmd] (batchId=33)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] 
(batchId=143)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[mergejoin] 
(batchId=156)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr]
 (batchId=145)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] 
(batchId=99)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=177)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=177)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=177)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5984/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5984/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5984/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 10 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12876871 - PreCommit-HIVE-Build

> hive CLI can't handle hadoop dfs command  with space and quotes.
> 
>
> Key: HIVE-4577
> URL: https://issues.apache.org/jira/browse/HIVE-4577
> Project: Hive
>  Issue Type: Bug
>  Components: CLI
>Affects Versions: 0.9.0, 0.10.0, 0.14.0, 0.13.1, 1.2.0, 1.1.0
>Reporter: Bing Li
>Assignee: Bing Li
> Attachments: HIVE-4577.1.patch, HIVE-4577.2.patch, 
> HIVE-4577.3.patch.txt, HIVE-4577.4.patch, HIVE-4577.5.patch, HIVE-4577.6.patch
>
>
> By design, Hive supports hadoop dfs commands in the Hive shell, for example 
> hive> dfs -mkdir /user/biadmin/mydir;
> but it behaves differently from hadoop if the path contains spaces or quotes:
> hive> dfs -mkdir "hello"; 
> drwxr-xr-x   - biadmin supergroup  0 2013-04-23 09:40 
> /user/biadmin/"hello"
> hive> dfs -mkdir 'world';
> drwxr-xr-x   - biadmin supergroup  0 2013-04-23 09:43 
> /user/biadmin/'world'
> hive> dfs -mkdir "bei jing";
> drwxr-xr-x   - biadmin supergroup  0 2013-04-23 09:44 
> /user/biadmin/"bei
> drwxr-xr-x   - biadmin supergroup  0 2013-04-23 09:44 
> /user/biadmin/jing"
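
To illustrate the behavior the report expects, here is a small sketch of a quote-aware argument splitter. It assumes the fix amounts to tokenizing the dfs command the way a shell would for simple single and double quotes; QuoteAwareSplitter and split are hypothetical names, escape sequences are not handled, and this is not the code from the attached patches.

```java
import java.util.ArrayList;
import java.util.List;

class QuoteAwareSplitter {
    /** Splits on whitespace, keeping quoted sections together and dropping the quotes. */
    static List<String> split(String command) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        char quote = 0;           // 0 means "not inside quotes"
        boolean inToken = false;  // true once the current token has started
        for (int i = 0; i < command.length(); i++) {
            char c = command.charAt(i);
            if (quote != 0) {
                if (c == quote) {
                    quote = 0;            // closing quote, not part of the token
                } else {
                    current.append(c);    // spaces inside quotes are kept
                }
            } else if (c == '"' || c == '\'') {
                quote = c;
                inToken = true;           // so that "" still yields an empty token
            } else if (Character.isWhitespace(c)) {
                if (inToken) {
                    tokens.add(current.toString());
                    current.setLength(0);
                    inToken = false;
                }
            } else {
                current.append(c);
                inToken = true;
            }
        }
        if (quote != 0) {
            throw new IllegalArgumentException("unterminated quote in: " + command);
        }
        if (inToken) {
            tokens.add(current.toString());
        }
        return tokens;
    }
}
```

With this splitter, `-mkdir "bei jing"` yields two arguments, `-mkdir` and `bei jing`, so a single directory would be created instead of the two shown in the listing above.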





[jira] [Updated] (HIVE-16812) VectorizedOrcAcidRowBatchReader doesn't filter delete events

2017-07-12 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-16812:
--
Priority: Critical  (was: Major)

> VectorizedOrcAcidRowBatchReader doesn't filter delete events
> 
>
> Key: HIVE-16812
> URL: https://issues.apache.org/jira/browse/HIVE-16812
> Project: Hive
>  Issue Type: Improvement
>  Components: Transactions
>Affects Versions: 2.3.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Critical
>
> The constructor of VectorizedOrcAcidRowBatchReader has
> {noformat}
> // Clone readerOptions for deleteEvents.
> Reader.Options deleteEventReaderOptions = readerOptions.clone();
> // Set the range on the deleteEventReaderOptions to 0 to INTEGER_MAX because
> // we always want to read all the delete delta files.
> deleteEventReaderOptions.range(0, Long.MAX_VALUE);
> {noformat}
> This is suboptimal since base and deltas are sorted by ROW__ID.  So for each 
> split of base we can find the min/max ROW__ID and only load events from the 
> delta that are in the [min,max] range.  This will reduce the number of delete 
> events we load in memory (to no more than there are rows in the split).
> When we support sorting on PK, the same should apply, but we'd need to make 
> sure to store PKs in the ORC index.
> See OrcRawRecordMerger.discoverKeyBounds()
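
A simplified sketch of the range filter described above. It assumes delete-event keys can be compared as a single sorted long, whereas a real ROW__ID is a multi-field key; DeleteEventRange and its methods are hypothetical and do not exist in Hive.

```java
import java.util.Arrays;

class DeleteEventRange {
    /** First index whose value is >= key. */
    static int lowerBound(long[] sorted, long key) {
        int lo = 0, hi = sorted.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (sorted[mid] < key) lo = mid + 1; else hi = mid;
        }
        return lo;
    }

    /** First index whose value is > key. */
    static int upperBound(long[] sorted, long key) {
        int lo = 0, hi = sorted.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (sorted[mid] <= key) lo = mid + 1; else hi = mid;
        }
        return lo;
    }

    /** Keeps only the delete events whose key falls in the split's [min, max] range. */
    static long[] eventsInRange(long[] sortedDeleteKeys, long min, long max) {
        int from = lowerBound(sortedDeleteKeys, min);
        int to = upperBound(sortedDeleteKeys, max);
        return from >= to ? new long[0] : Arrays.copyOfRange(sortedDeleteKeys, from, to);
    }
}
```

Because both inputs are sorted, the slice is found with two binary searches, and the events kept in memory are bounded by the key range of the split rather than by the total size of the delete deltas.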





[jira] [Commented] (HIVE-16177) non Acid to acid conversion doesn't handle _copy_N files

2017-07-12 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16084360#comment-16084360
 ] 

Eugene Koifman commented on HIVE-16177:
---

HIVE-16177.20-branch-2.patch committed to branch-2 (2.x)


> non Acid to acid conversion doesn't handle _copy_N files
> 
>
> Key: HIVE-16177
> URL: https://issues.apache.org/jira/browse/HIVE-16177
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 0.14.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Blocker
> Fix For: 3.0.0, 2.4.0
>
> Attachments: HIVE-16177.01.patch, HIVE-16177.02.patch, 
> HIVE-16177.04.patch, HIVE-16177.07.patch, HIVE-16177.08.patch, 
> HIVE-16177.09.patch, HIVE-16177.10.patch, HIVE-16177.11.patch, 
> HIVE-16177.14.patch, HIVE-16177.15.patch, HIVE-16177.16.patch, 
> HIVE-16177.17.patch, HIVE-16177.18-branch-2.patch, HIVE-16177.18.patch, 
> HIVE-16177.19-branch-2.patch, HIVE-16177.20-branch-2.patch
>
>
> {noformat}
> create table T(a int, b int) clustered by (a)  into 2 buckets stored as orc 
> TBLPROPERTIES('transactional'='false')
> insert into T(a,b) values(1,2)
> insert into T(a,b) values(1,3)
> alter table T SET TBLPROPERTIES ('transactional'='true')
> {noformat}
> //we should now have bucket files 01_0 and 01_0_copy_1
> but OrcRawRecordMerger.OriginalReaderPair.next() doesn't know that there can 
> be copy_N files and numbers rows in each bucket from 0 thus generating 
> duplicate IDs
> {noformat}
> select ROW__ID, INPUT__FILE__NAME, a, b from T
> {noformat}
> produces 
> {noformat}
> {"transactionid":0,"bucketid":1,"rowid":0},file:/Users/ekoifman/dev/hiverwgit/ql/target/tmp/org.apache.hadoop.hive.ql.TestTxnCommands.../warehouse/nonacidorctbl/01_0,1,2
> {"transactionid":0,"bucketid":1,"rowid":0},file:/Users/ekoifman/dev/hiverwgit/ql/target/tmp/org.apache.hadoop.hive.ql.TestTxnCommands.../warehouse/nonacidorctbl/01_0_copy_1,1,3
> {noformat}
> [~owen.omalley], do you have any thoughts on a good way to handle this?
> The attached patch has a few changes to make Acid even recognize copy_N but 
> this is just a pre-requisite.  The new UT demonstrates the issue.
> Furthermore,
> {noformat}
> alter table T compact 'major'
> select ROW__ID, INPUT__FILE__NAME, a, b from T order by b
> {noformat}
> produces 
> {noformat}
> {"transactionid":0,"bucketid":1,"rowid":0}
> file:/Users/ekoifman/dev/hiverwgit/ql/target/tmp/org.apache.hadoop.hive.ql.TestTxnCommandswarehouse/nonacidorctbl/base_-9223372036854775808/bucket_1
> 1   2
> {noformat}
> HIVE-16177.04.patch has TestTxnCommands.testNonAcidToAcidConversion0() 
> demonstrating this
> This is because compactor doesn't handle copy_N files either (skips them)





[jira] [Updated] (HIVE-16177) non Acid to acid conversion doesn't handle _copy_N files

2017-07-12 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-16177:
--
   Resolution: Fixed
Fix Version/s: 2.4.0
   3.0.0
   Status: Resolved  (was: Patch Available)

> non Acid to acid conversion doesn't handle _copy_N files
> 
>
> Key: HIVE-16177
> URL: https://issues.apache.org/jira/browse/HIVE-16177
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 0.14.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Blocker
> Fix For: 3.0.0, 2.4.0
>
> Attachments: HIVE-16177.01.patch, HIVE-16177.02.patch, 
> HIVE-16177.04.patch, HIVE-16177.07.patch, HIVE-16177.08.patch, 
> HIVE-16177.09.patch, HIVE-16177.10.patch, HIVE-16177.11.patch, 
> HIVE-16177.14.patch, HIVE-16177.15.patch, HIVE-16177.16.patch, 
> HIVE-16177.17.patch, HIVE-16177.18-branch-2.patch, HIVE-16177.18.patch, 
> HIVE-16177.19-branch-2.patch, HIVE-16177.20-branch-2.patch
>
>
> {noformat}
> create table T(a int, b int) clustered by (a)  into 2 buckets stored as orc 
> TBLPROPERTIES('transactional'='false')
> insert into T(a,b) values(1,2)
> insert into T(a,b) values(1,3)
> alter table T SET TBLPROPERTIES ('transactional'='true')
> {noformat}
> //we should now have bucket files 01_0 and 01_0_copy_1
> but OrcRawRecordMerger.OriginalReaderPair.next() doesn't know that there can 
> be copy_N files and numbers rows in each bucket from 0 thus generating 
> duplicate IDs
> {noformat}
> select ROW__ID, INPUT__FILE__NAME, a, b from T
> {noformat}
> produces 
> {noformat}
> {"transactionid":0,"bucketid":1,"rowid":0},file:/Users/ekoifman/dev/hiverwgit/ql/target/tmp/org.apache.hadoop.hive.ql.TestTxnCommands.../warehouse/nonacidorctbl/01_0,1,2
> {"transactionid":0,"bucketid":1,"rowid":0},file:/Users/ekoifman/dev/hiverwgit/ql/target/tmp/org.apache.hadoop.hive.ql.TestTxnCommands.../warehouse/nonacidorctbl/01_0_copy_1,1,3
> {noformat}
> [~owen.omalley], do you have any thoughts on a good way to handle this?
> The attached patch has a few changes to make Acid even recognize copy_N but 
> this is just a pre-requisite.  The new UT demonstrates the issue.
> Furthermore,
> {noformat}
> alter table T compact 'major'
> select ROW__ID, INPUT__FILE__NAME, a, b from T order by b
> {noformat}
> produces 
> {noformat}
> {"transactionid":0,"bucketid":1,"rowid":0}
> file:/Users/ekoifman/dev/hiverwgit/ql/target/tmp/org.apache.hadoop.hive.ql.TestTxnCommandswarehouse/nonacidorctbl/base_-9223372036854775808/bucket_1
> 1   2
> {noformat}
> HIVE-16177.04.patch has TestTxnCommands.testNonAcidToAcidConversion0() 
> demonstrating this
> This is because compactor doesn't handle copy_N files either (skips them)





[jira] [Updated] (HIVE-8838) Support Parquet through HCatalog

2017-07-12 Thread Sergio Peña (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergio Peña updated HIVE-8838:
--
Issue Type: New Feature  (was: Bug)

> Support Parquet through HCatalog
> 
>
> Key: HIVE-8838
> URL: https://issues.apache.org/jira/browse/HIVE-8838
> Project: Hive
>  Issue Type: New Feature
>Reporter: Brock Noland
>Assignee: Adam Szita
> Attachments: HIVE-8838.0.patch, HIVE-8838.1.patch, HIVE-8838.2.patch, 
> HIVE-8838.3.patch, HIVE-8838.4.patch
>
>
> Similar to HIVE-8687 for Avro we need to fix Parquet with HCatalog.





[jira] [Assigned] (HIVE-17079) LLAP: Use FQDN by default for work submission

2017-07-12 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran reassigned HIVE-17079:



> LLAP: Use FQDN by default for work submission
> -
>
> Key: HIVE-17079
> URL: https://issues.apache.org/jira/browse/HIVE-17079
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Affects Versions: 3.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>
> HIVE-14624 added FQDN for work submission. We should enable it by default to 
> avoid DNS issues. 





[jira] [Commented] (HIVE-17066) Query78 filter wrong estimatation is generating bad plan

2017-07-12 Thread Vineet Garg (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16084432#comment-16084432
 ] 

Vineet Garg commented on HIVE-17066:


Pushed to master

> Query78 filter wrong estimatation is generating bad plan
> 
>
> Key: HIVE-17066
> URL: https://issues.apache.org/jira/browse/HIVE-17066
> Project: Hive
>  Issue Type: Bug
>Reporter: Vineet Garg
>Assignee: Vineet Garg
> Attachments: HIVE-17066.1.patch, HIVE-17066.2.patch, 
> HIVE-17066.3.patch, HIVE-17066.4.patch, HIVE-17066.5.patch
>
>
> Filter operator is estimating 1 row following a left outer join causing bad 
> estimates
> {noformat}
> Reducer 12 
> Execution mode: vectorized, llap
> Reduce Operator Tree:
>   Map Join Operator
> condition map:
>  Left Outer Join0 to 1
> keys:
>   0 KEY.reducesinkkey0 (type: bigint), KEY.reducesinkkey1 
> (type: bigint)
>   1 KEY.reducesinkkey0 (type: bigint), KEY.reducesinkkey1 
> (type: bigint)
> outputColumnNames: _col0, _col1, _col3, _col4, _col5, _col6, 
> _col8
> input vertices:
>   1 Map 14
> Statistics: Num rows: 71676270660 Data size: 3727166074320 
> Basic stats: COMPLETE Column stats: COMPLETE
> Filter Operator
>   predicate: _col8 is null (type: boolean)
>   Statistics: Num rows: 1 Data size: 52 Basic stats: COMPLETE 
> Column stats: COMPLETE
>   Select Operator
> expressions: _col0 (type: bigint), _col1 (type: bigint), 
> _col3 (type: int), _col4 (type: double), _col5 (type: double), _col6 (type: 
> bigint)
> outputColumnNames: _col0, _col1, _col3, _col4, _col5, 
> _col6
> Statistics: Num rows: 1 Data size: 52 Basic stats: 
> COMPLETE Column stats: COMPLETE
> {noformat}





[jira] [Commented] (HIVE-8838) Support Parquet through HCatalog

2017-07-12 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16084434#comment-16084434
 ] 

Sushanth Sowmyan commented on HIVE-8838:


([~spena], I just pushed to master too, hopefully our pushes don't conflict :D )

> Support Parquet through HCatalog
> 
>
> Key: HIVE-8838
> URL: https://issues.apache.org/jira/browse/HIVE-8838
> Project: Hive
>  Issue Type: New Feature
>Reporter: Brock Noland
>Assignee: Adam Szita
> Fix For: 3.0.0
>
> Attachments: HIVE-8838.0.patch, HIVE-8838.1.patch, HIVE-8838.2.patch, 
> HIVE-8838.3.patch, HIVE-8838.4.patch
>
>
> Similar to HIVE-8687 for Avro we need to fix Parquet with HCatalog.





[jira] [Updated] (HIVE-17066) Query78 filter wrong estimatation is generating bad plan

2017-07-12 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-17066:
---
Resolution: Fixed
Status: Resolved  (was: Patch Available)

> Query78 filter wrong estimatation is generating bad plan
> 
>
> Key: HIVE-17066
> URL: https://issues.apache.org/jira/browse/HIVE-17066
> Project: Hive
>  Issue Type: Bug
>Reporter: Vineet Garg
>Assignee: Vineet Garg
> Fix For: 3.0.0
>
> Attachments: HIVE-17066.1.patch, HIVE-17066.2.patch, 
> HIVE-17066.3.patch, HIVE-17066.4.patch, HIVE-17066.5.patch
>
>
> Filter operator is estimating 1 row following a left outer join causing bad 
> estimates
> {noformat}
> Reducer 12 
> Execution mode: vectorized, llap
> Reduce Operator Tree:
>   Map Join Operator
> condition map:
>  Left Outer Join0 to 1
> keys:
>   0 KEY.reducesinkkey0 (type: bigint), KEY.reducesinkkey1 
> (type: bigint)
>   1 KEY.reducesinkkey0 (type: bigint), KEY.reducesinkkey1 
> (type: bigint)
> outputColumnNames: _col0, _col1, _col3, _col4, _col5, _col6, 
> _col8
> input vertices:
>   1 Map 14
> Statistics: Num rows: 71676270660 Data size: 3727166074320 
> Basic stats: COMPLETE Column stats: COMPLETE
> Filter Operator
>   predicate: _col8 is null (type: boolean)
>   Statistics: Num rows: 1 Data size: 52 Basic stats: COMPLETE 
> Column stats: COMPLETE
>   Select Operator
> expressions: _col0 (type: bigint), _col1 (type: bigint), 
> _col3 (type: int), _col4 (type: double), _col5 (type: double), _col6 (type: 
> bigint)
> outputColumnNames: _col0, _col1, _col3, _col4, _col5, 
> _col6
> Statistics: Num rows: 1 Data size: 52 Basic stats: 
> COMPLETE Column stats: COMPLETE
> {noformat}





[jira] [Commented] (HIVE-17066) Query78 filter wrong estimatation is generating bad plan

2017-07-12 Thread Vineet Garg (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16084425#comment-16084425
 ] 

Vineet Garg commented on HIVE-17066:


The failures are either not reproducible or unrelated.

> Query78 filter wrong estimatation is generating bad plan
> 
>
> Key: HIVE-17066
> URL: https://issues.apache.org/jira/browse/HIVE-17066
> Project: Hive
>  Issue Type: Bug
>Reporter: Vineet Garg
>Assignee: Vineet Garg
> Attachments: HIVE-17066.1.patch, HIVE-17066.2.patch, 
> HIVE-17066.3.patch, HIVE-17066.4.patch, HIVE-17066.5.patch
>
>
> Filter operator is estimating 1 row following a left outer join causing bad 
> estimates
> {noformat}
> Reducer 12 
> Execution mode: vectorized, llap
> Reduce Operator Tree:
>   Map Join Operator
> condition map:
>  Left Outer Join0 to 1
> keys:
>   0 KEY.reducesinkkey0 (type: bigint), KEY.reducesinkkey1 
> (type: bigint)
>   1 KEY.reducesinkkey0 (type: bigint), KEY.reducesinkkey1 
> (type: bigint)
> outputColumnNames: _col0, _col1, _col3, _col4, _col5, _col6, 
> _col8
> input vertices:
>   1 Map 14
> Statistics: Num rows: 71676270660 Data size: 3727166074320 
> Basic stats: COMPLETE Column stats: COMPLETE
> Filter Operator
>   predicate: _col8 is null (type: boolean)
>   Statistics: Num rows: 1 Data size: 52 Basic stats: COMPLETE 
> Column stats: COMPLETE
>   Select Operator
> expressions: _col0 (type: bigint), _col1 (type: bigint), 
> _col3 (type: int), _col4 (type: double), _col5 (type: double), _col6 (type: 
> bigint)
> outputColumnNames: _col0, _col1, _col3, _col4, _col5, 
> _col6
> Statistics: Num rows: 1 Data size: 52 Basic stats: 
> COMPLETE Column stats: COMPLETE
> {noformat}





[jira] [Updated] (HIVE-8838) Support Parquet through HCatalog

2017-07-12 Thread Sergio Peña (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergio Peña updated HIVE-8838:
--
   Resolution: Fixed
Fix Version/s: 3.0.0
   Status: Resolved  (was: Patch Available)

Thanks [~szita] for your contribution. I committed to master.

> Support Parquet through HCatalog
> 
>
> Key: HIVE-8838
> URL: https://issues.apache.org/jira/browse/HIVE-8838
> Project: Hive
>  Issue Type: New Feature
>Reporter: Brock Noland
>Assignee: Adam Szita
> Fix For: 3.0.0
>
> Attachments: HIVE-8838.0.patch, HIVE-8838.1.patch, HIVE-8838.2.patch, 
> HIVE-8838.3.patch, HIVE-8838.4.patch
>
>
> Similar to HIVE-8687 for Avro we need to fix Parquet with HCatalog.





[jira] [Updated] (HIVE-17066) Query78 filter wrong estimatation is generating bad plan

2017-07-12 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-17066:
---
Fix Version/s: 3.0.0

> Query78 filter wrong estimatation is generating bad plan
> 
>
> Key: HIVE-17066
> URL: https://issues.apache.org/jira/browse/HIVE-17066
> Project: Hive
>  Issue Type: Bug
>Reporter: Vineet Garg
>Assignee: Vineet Garg
> Fix For: 3.0.0
>
> Attachments: HIVE-17066.1.patch, HIVE-17066.2.patch, 
> HIVE-17066.3.patch, HIVE-17066.4.patch, HIVE-17066.5.patch
>
>
> Filter operator is estimating 1 row following a left outer join causing bad 
> estimates
> {noformat}
> Reducer 12 
> Execution mode: vectorized, llap
> Reduce Operator Tree:
>   Map Join Operator
> condition map:
>  Left Outer Join0 to 1
> keys:
>   0 KEY.reducesinkkey0 (type: bigint), KEY.reducesinkkey1 
> (type: bigint)
>   1 KEY.reducesinkkey0 (type: bigint), KEY.reducesinkkey1 
> (type: bigint)
> outputColumnNames: _col0, _col1, _col3, _col4, _col5, _col6, 
> _col8
> input vertices:
>   1 Map 14
> Statistics: Num rows: 71676270660 Data size: 3727166074320 
> Basic stats: COMPLETE Column stats: COMPLETE
> Filter Operator
>   predicate: _col8 is null (type: boolean)
>   Statistics: Num rows: 1 Data size: 52 Basic stats: COMPLETE 
> Column stats: COMPLETE
>   Select Operator
> expressions: _col0 (type: bigint), _col1 (type: bigint), 
> _col3 (type: int), _col4 (type: double), _col5 (type: double), _col6 (type: 
> bigint)
> outputColumnNames: _col0, _col1, _col3, _col4, _col5, 
> _col6
> Statistics: Num rows: 1 Data size: 52 Basic stats: 
> COMPLETE Column stats: COMPLETE
> {noformat}





[jira] [Updated] (HIVE-16832) duplicate ROW__ID possible in multi insert into transactional table

2017-07-12 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-16832:
--
Attachment: HIVE-16832.22.patch

> duplicate ROW__ID possible in multi insert into transactional table
> ---
>
> Key: HIVE-16832
> URL: https://issues.apache.org/jira/browse/HIVE-16832
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 2.2.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Critical
> Attachments: HIVE-16832.01.patch, HIVE-16832.03.patch, 
> HIVE-16832.04.patch, HIVE-16832.05.patch, HIVE-16832.06.patch, 
> HIVE-16832.08.patch, HIVE-16832.09.patch, HIVE-16832.10.patch, 
> HIVE-16832.11.patch, HIVE-16832.14.patch, HIVE-16832.15.patch, 
> HIVE-16832.16.patch, HIVE-16832.17.patch, HIVE-16832.18.patch, 
> HIVE-16832.19.patch, HIVE-16832.20.patch, HIVE-16832.20.patch, 
> HIVE-16832.21.patch, HIVE-16832.22.patch
>
>
> {noformat}
>  create table AcidTablePart(a int, b int) partitioned by (p string) clustered 
> by (a) into 2 buckets stored as orc TBLPROPERTIES ('transactional'='true');
>  create temporary table if not exists data1 (x int);
>  insert into data1 values (1);
>  from data1
>insert into AcidTablePart partition(p) select 0, 0, 'p' || x
>insert into AcidTablePart partition(p='p1') select 0, 1
> {noformat}
> Each branch of this multi-insert creates a row in partition p1/bucket0 with 
> ROW__ID=(1,0,0).
> The same can happen when running a SQL Merge (HIVE-10924) statement that has 
> both Insert and Update clauses when the target table has 
> _'transactional'='true','transactional_properties'='default'_  (see 
> HIVE-14035).  This is because Merge is internally run as a multi-insert 
> statement.
> The solution relies on the statement ID introduced in HIVE-11030.  Each Insert 
> clause of a multi-insert gets a unique ID.
> The ROW__ID.bucketId now becomes a bit-packed triplet (format version, 
> bucketId, statementId).
> (Since ORC stores field names in the data file we can't rename 
> ROW__ID.bucketId).
> This ensures that there are no collisions and retains the desired sort 
> properties of ROW__ID.
> In particular _SortedDynPartitionOptimizer_ works w/o any changes even in 
> cases where there are fewer reducers than buckets.  
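
A minimal sketch of such a bit-packed triplet. The description does not give the field widths, so the layout below (3 bits for the format version, 1 reserved bit, 12 bits for the bucket, 4 reserved bits, 12 bits for the statement ID, all in one 32-bit int) is an assumption made for illustration, and PackedBucket is a hypothetical class name.

```java
class PackedBucket {
    // Assumed layout, most significant bit first:
    // [3 bits version][1 reserved][12 bits bucketId][4 reserved][12 bits statementId]
    static final int VERSION_SHIFT = 29;
    static final int BUCKET_SHIFT = 16;
    static final int FIELD_MASK_12 = 0xFFF;

    static int encode(int version, int bucketId, int statementId) {
        if (version < 0 || version > 7) {
            throw new IllegalArgumentException("version out of range: " + version);
        }
        if (bucketId < 0 || bucketId > FIELD_MASK_12) {
            throw new IllegalArgumentException("bucketId out of range: " + bucketId);
        }
        if (statementId < 0 || statementId > FIELD_MASK_12) {
            throw new IllegalArgumentException("statementId out of range: " + statementId);
        }
        return (version << VERSION_SHIFT) | (bucketId << BUCKET_SHIFT) | statementId;
    }

    static int version(int packed) {
        return packed >>> VERSION_SHIFT;
    }

    static int bucketId(int packed) {
        return (packed >>> BUCKET_SHIFT) & FIELD_MASK_12;
    }

    static int statementId(int packed) {
        return packed & FIELD_MASK_12;
    }
}
```

Putting the bucket in higher-order bits than the statement ID means that, for a fixed version below 4 (so the sign bit stays clear), packed values sort by bucket first and by statement second. Two Insert clauses writing to the same bucket in the same transaction then differ in the packed value, which avoids the collision shown above.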





[jira] [Updated] (HIVE-17079) LLAP: Use FQDN by default for work submission

2017-07-12 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-17079:
-
Status: Patch Available  (was: Open)

> LLAP: Use FQDN by default for work submission
> -
>
> Key: HIVE-17079
> URL: https://issues.apache.org/jira/browse/HIVE-17079
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Affects Versions: 3.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-17079.1.patch
>
>
> HIVE-14624 added FQDN for work submission. We should enable it by default to 
> avoid DNS issues. 





[jira] [Updated] (HIVE-17079) LLAP: Use FQDN by default for work submission

2017-07-12 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-17079:
-
Attachment: HIVE-17079.1.patch

[~sseth] can you please take a look? small change

> LLAP: Use FQDN by default for work submission
> -
>
> Key: HIVE-17079
> URL: https://issues.apache.org/jira/browse/HIVE-17079
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Affects Versions: 3.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-17079.1.patch
>
>
> HIVE-14624 added FQDN for work submission. We should enable it by default to 
> avoid DNS issues. 





[jira] [Commented] (HIVE-17079) LLAP: Use FQDN by default for work submission

2017-07-12 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16084438#comment-16084438
 ] 

Gopal V commented on HIVE-17079:


LGTM - +1

> LLAP: Use FQDN by default for work submission
> -
>
> Key: HIVE-17079
> URL: https://issues.apache.org/jira/browse/HIVE-17079
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Affects Versions: 3.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-17079.1.patch
>
>
> HIVE-14624 added FQDN for work submission. We should enable it by default to 
> avoid DNS issues. 





[jira] [Updated] (HIVE-16821) Vectorization: support Explain Analyze in vectorized mode

2017-07-12 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-16821:
---
Status: Open  (was: Patch Available)

Temporarily obsoleted by HIVE-17073

> Vectorization: support Explain Analyze in vectorized mode
> -
>
> Key: HIVE-16821
> URL: https://issues.apache.org/jira/browse/HIVE-16821
> Project: Hive
>  Issue Type: Bug
>  Components: Diagnosability, Vectorization
>Affects Versions: 2.1.1, 3.0.0
>Reporter: Gopal V
>Assignee: Gopal V
>Priority: Minor
> Attachments: HIVE-16821.1.patch, HIVE-16821.2.patch, 
> HIVE-16821.2.patch, HIVE-16821.3.patch
>
>
> Currently, to avoid a branch in the operator inner loop - the runtime stats 
> are only available in non-vector mode.





[jira] [Updated] (HIVE-12631) LLAP: support ORC ACID tables

2017-07-12 Thread Teddy Choi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Teddy Choi updated HIVE-12631:
--
Attachment: HIVE-12631.17.patch

> LLAP: support ORC ACID tables
> -
>
> Key: HIVE-12631
> URL: https://issues.apache.org/jira/browse/HIVE-12631
> Project: Hive
>  Issue Type: Bug
>  Components: llap, Transactions
>Reporter: Sergey Shelukhin
>Assignee: Teddy Choi
> Attachments: HIVE-12631.10.patch, HIVE-12631.10.patch, 
> HIVE-12631.11.patch, HIVE-12631.11.patch, HIVE-12631.12.patch, 
> HIVE-12631.13.patch, HIVE-12631.15.patch, HIVE-12631.16.patch, 
> HIVE-12631.17.patch, HIVE-12631.1.patch, HIVE-12631.2.patch, 
> HIVE-12631.3.patch, HIVE-12631.4.patch, HIVE-12631.5.patch, 
> HIVE-12631.6.patch, HIVE-12631.7.patch, HIVE-12631.8.patch, 
> HIVE-12631.8.patch, HIVE-12631.9.patch
>
>
> LLAP uses a completely separate read path in ORC to allow for caching and 
> parallelization of reads and processing. This path does not support ACID. As 
> far as I remember ACID logic is embedded inside ORC format; we need to 
> refactor it to be on top of some interface, if practical; or just port it to 
> LLAP read path.
> Another consideration is how the logic will work with cache. The cache is 
> currently low-level (CB-level in ORC), so we could just use it to read bases 
> and deltas (deltas should be cached with higher priority) and merge as usual. 
> We could also cache merged representation in future.





[jira] [Commented] (HIVE-17078) Add more logs to MapredLocalTask

2017-07-12 Thread Sahil Takiar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16084462#comment-16084462
 ] 

Sahil Takiar commented on HIVE-17078:
-

If we are printing the child stdout / stderr to the Hive logs, do we also need 
to print them to Hive stdout / stderr?

> Add more logs to MapredLocalTask
> 
>
> Key: HIVE-17078
> URL: https://issues.apache.org/jira/browse/HIVE-17078
> Project: Hive
>  Issue Type: Improvement
>Reporter: Yibing Shi
>Assignee: Yibing Shi
>Priority: Minor
> Attachments: HIVE-17078.1.patch, HIVE-17078.2.patch
>
>
> By default, {{MapredLocalTask}} is executed in a child process of Hive, so 
> that a local task using too many resources does not affect Hive. Currently, 
> the stdout and stderr output of the child process is printed to Hive's 
> stdout/stderr log, which has no timestamp information and is separated from 
> the Hive service logs. This makes it hard to troubleshoot problems in 
> MapredLocalTasks.
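
A minimal sketch of the idea, not the attached patch: read the child's output line by line and hand each line to a sink such as a logger, which can then add its own timestamp. The class name, the `forward` helper, and the tag strings are hypothetical, and an in-memory stream stands in for `Process.getInputStream()`.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.time.Instant;
import java.util.function.Consumer;

final class ChildOutputLogSketch {
  /** Reads the stream line by line and passes each line, tagged with its origin, to the sink. */
  static void forward(InputStream in, String tag, Consumer<String> sink) {
    try (BufferedReader reader =
        new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
      String line;
      while ((line = reader.readLine()) != null) {
        sink.accept("[" + tag + "] " + line);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    // Stand-in for the child process's stdout; a real caller would pass Process.getInputStream().
    InputStream fakeStdout = new ByteArrayInputStream(
        "loading hash table\ndone\n".getBytes(StandardCharsets.UTF_8));
    // A logging framework would normally add the timestamp; here it is prepended by hand.
    forward(fakeStdout, "child-stdout",
        line -> System.out.println(Instant.now() + " " + line));
  }
}
```

In a real task, stdout and stderr would each need their own reader thread so that neither pipe fills up and blocks the child.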





[jira] [Commented] (HIVE-17072) Make the parallelized timeout configurable in BeeLine tests

2017-07-12 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16084479#comment-16084479
 ] 

Hive QA commented on HIVE-17072:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12876874/HIVE-17072.1.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 9 failed/errored test(s), 10840 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[insert_overwrite_local_directory_1]
 (batchId=237)
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver[hbase_queries] 
(batchId=94)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] 
(batchId=143)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] 
(batchId=99)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=232)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=177)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=177)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=177)
org.apache.hive.jdbc.TestJdbcWithMiniHS2.testHttpRetryOnServerIdleTimeout 
(batchId=226)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5985/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5985/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5985/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 9 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12876874 - PreCommit-HIVE-Build

> Make the parallelized timeout configurable in BeeLine tests
> ---
>
> Key: HIVE-17072
> URL: https://issues.apache.org/jira/browse/HIVE-17072
> Project: Hive
>  Issue Type: Improvement
>  Components: Testing Infrastructure
>Reporter: Marta Kuczora
>Assignee: Marta Kuczora
>Priority: Minor
> Attachments: HIVE-17072.1.patch
>
>
> When running the BeeLine tests in parallel, the timeout is hardcoded in 
> Parallelized.java:
> {noformat}
> @Override
> public void finished() {
>   executor.shutdown();
>   try {
>     executor.awaitTermination(10, TimeUnit.MINUTES);
>   } catch (InterruptedException exc) {
>     throw new RuntimeException(exc);
>   }
> }
> {noformat}
> It would be better to make it configurable.
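
One possible shape for this, sketched under assumptions rather than taken from the attached patch: read the timeout from a system property and fall back to the current 10 minutes. The property name `junit.parallel.timeout` and the class name are hypothetical.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

final class ConfigurableTimeoutSketch {
  // Hypothetical property name, chosen for illustration only.
  static final String TIMEOUT_PROPERTY = "junit.parallel.timeout";
  // Same value as the literal currently hardcoded in Parallelized.java.
  static final long DEFAULT_TIMEOUT_MINUTES = 10;

  /** Returns the timeout in minutes; falls back to the default when the property is unset or invalid. */
  static long timeoutMinutes() {
    String raw = System.getProperty(TIMEOUT_PROPERTY);
    if (raw == null) {
      return DEFAULT_TIMEOUT_MINUTES;
    }
    try {
      long parsed = Long.parseLong(raw.trim());
      return parsed > 0 ? parsed : DEFAULT_TIMEOUT_MINUTES;
    } catch (NumberFormatException e) {
      return DEFAULT_TIMEOUT_MINUTES;
    }
  }

  /** Same shape as finished() in the description, with the literal 10 replaced by the lookup. */
  static void finished(ExecutorService executor) {
    executor.shutdown();
    try {
      executor.awaitTermination(timeoutMinutes(), TimeUnit.MINUTES);
    } catch (InterruptedException exc) {
      // Restoring the interrupt flag is an addition of this sketch, not part of the original code.
      Thread.currentThread().interrupt();
      throw new RuntimeException(exc);
    }
  }

  public static void main(String[] args) {
    ExecutorService executor = Executors.newFixedThreadPool(2);
    executor.execute(() -> System.out.println("test batch done"));
    finished(executor);
    System.out.println("timeout minutes: " + timeoutMinutes());
  }
}
```

With this shape the value could be overridden per run, for example with `-Djunit.parallel.timeout=25`, without touching the test sources.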





[jira] [Commented] (HIVE-16922) Typo in serde.thrift: COLLECTION_DELIM = "colelction.delim"

2017-07-12 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16084566#comment-16084566
 ] 

Hive QA commented on HIVE-16922:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12876876/HIVE-16922.2.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 10874 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] 
(batchId=143)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr]
 (batchId=145)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=177)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=177)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=177)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5986/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5986/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5986/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 5 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12876876 - PreCommit-HIVE-Build

> Typo in serde.thrift: COLLECTION_DELIM = "colelction.delim"
> ---
>
> Key: HIVE-16922
> URL: https://issues.apache.org/jira/browse/HIVE-16922
> Project: Hive
>  Issue Type: Bug
>  Components: Thrift API
>Reporter: Dudu Markovitz
>Assignee: Bing Li
> Attachments: HIVE-16922.1.patch, HIVE-16922.2.patch
>
>
> https://github.com/apache/hive/blob/master/serde/if/serde.thrift
> Typo in serde.thrift: 
> COLLECTION_DELIM = "colelction.delim"
> (*colelction* instead of *collection*)





[jira] [Commented] (HIVE-8838) Support Parquet through HCatalog

2017-07-12 Thread JIRA

[ 
https://issues.apache.org/jira/browse/HIVE-8838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16084585#comment-16084585
 ] 

Sergio Peña commented on HIVE-8838:
---

Aaa, that's what happened haha, I tried to push it when I got an error that I 
had to update my local repo, and when I updated it, I saw the patch was already 
there, then I got confused.

Anyway, no worries, thanks for the heads up.

> Support Parquet through HCatalog
> 
>
> Key: HIVE-8838
> URL: https://issues.apache.org/jira/browse/HIVE-8838
> Project: Hive
>  Issue Type: New Feature
>Reporter: Brock Noland
>Assignee: Adam Szita
> Fix For: 3.0.0
>
> Attachments: HIVE-8838.0.patch, HIVE-8838.1.patch, HIVE-8838.2.patch, 
> HIVE-8838.3.patch, HIVE-8838.4.patch
>
>
> Similar to HIVE-8687 for Avro we need to fix Parquet with HCatalog.





[jira] [Commented] (HIVE-16907) "INSERT INTO" overwrite old data when destination table encapsulated by backquote

2017-07-12 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16084614#comment-16084614
 ] 

Pengcheng Xiong commented on HIVE-16907:


That is exactly what I am worried about: Hive may not properly support table 
names containing ".". Could you evaluate the work we would need to do to 
support this? Thanks.

>  "INSERT INTO"  overwrite old data when destination table encapsulated by 
> backquote 
> 
>
> Key: HIVE-16907
> URL: https://issues.apache.org/jira/browse/HIVE-16907
> Project: Hive
>  Issue Type: Bug
>  Components: Parser
>Affects Versions: 1.1.0, 2.1.1
>Reporter: Nemon Lou
>Assignee: Bing Li
> Attachments: HIVE-16907.1.patch
>
>
> A way to reproduce:
> {noformat}
> create database tdb;
> use tdb;
> create table t1(id int);
> create table t2(id int);
> explain insert into `tdb.t1` select * from t2;
> {noformat}
> {noformat}
> +---+
> | Explain |
> +---+
> | STAGE DEPENDENCIES: |
> |   Stage-1 is a root stage |
> |   Stage-6 depends on stages: Stage-1 , consists of Stage-3, Stage-2, Stage-4 |
> |   Stage-3 |
> |   Stage-0 depends on stages: Stage-3, Stage-2, Stage-5 |
> |   Stage-2 |
> |   Stage-4 |
> |   Stage-5 depends on stages: Stage-4 |
> | |
> | STAGE PLANS: |
> |   Stage: Stage-1 |
> |     Map Reduce |
> |       Map Operator Tree: |
> |           TableScan |
> |             alias: t2 |
> |             Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: NONE |
> |             Select Operator |
> |               expressions: id (type: int) |
> |               outputColumnNames: _col0 |
> |               Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: NONE |
> |               File Output Operator |
> |                 compressed: false

[jira] [Comment Edited] (HIVE-16907) "INSERT INTO" overwrite old data when destination table encapsulated by backquote

2017-07-12 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16084614#comment-16084614
 ] 

Pengcheng Xiong edited comment on HIVE-16907 at 7/12/17 8:18 PM:
-

That is exactly what I am worried about: Hive may not properly support table 
names containing ".". Could you estimate the work we would need to do to 
support this? Thanks.


was (Author: pxiong):
That is exactly what I am worried about: Hive may not properly support table 
names containing ".". Could you evaluate the work we would need to do to 
support this? Thanks.

>  "INSERT INTO"  overwrite old data when destination table encapsulated by 
> backquote 
> 
>
> Key: HIVE-16907
> URL: https://issues.apache.org/jira/browse/HIVE-16907
> Project: Hive
>  Issue Type: Bug
>  Components: Parser
>Affects Versions: 1.1.0, 2.1.1
>Reporter: Nemon Lou
>Assignee: Bing Li
> Attachments: HIVE-16907.1.patch
>
>
> A way to reproduce:
> {noformat}
> create database tdb;
> use tdb;
> create table t1(id int);
> create table t2(id int);
> explain insert into `tdb.t1` select * from t2;
> {noformat}
> {noformat}
> +---+
> | Explain |
> +---+
> | STAGE DEPENDENCIES: |
> |   Stage-1 is a root stage |
> |   Stage-6 depends on stages: Stage-1 , consists of Stage-3, Stage-2, Stage-4 |
> |   Stage-3 |
> |   Stage-0 depends on stages: Stage-3, Stage-2, Stage-5 |
> |   Stage-2 |
> |   Stage-4 |
> |   Stage-5 depends on stages: Stage-4 |
> | |
> | STAGE PLANS: |
> |   Stage: Stage-1 |
> |     Map Reduce |
> |       Map Operator Tree: |
> |           TableScan |
> |             alias: t2 |
> |             Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: NONE |
> |             Select Operator |
> |               expressions: id (type: int) |
> |               outputColumnNames: _col0 |
> |               Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: NONE |
> |               File Output Operator

[jira] [Commented] (HIVE-16732) Transactional tables should block LOAD DATA

2017-07-12 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16084644#comment-16084644
 ] 

Hive QA commented on HIVE-16732:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12876901/HIVE-16732.03-branch-2.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 9 failed/errored test(s), 10585 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[comments] (batchId=35)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[explaindenpendencydiffengs]
 (batchId=38)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] 
(batchId=142)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_basic] 
(batchId=139)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[explaindenpendencydiffengs]
 (batchId=115)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorized_ptf] 
(batchId=125)
org.apache.hadoop.hive.ql.security.TestExtendedAcls.testPartition (batchId=228)
org.apache.hadoop.hive.ql.security.TestFolderPermissions.testPartition 
(batchId=217)
org.apache.hive.hcatalog.api.TestHCatClient.testTransportFailure (batchId=176)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5987/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5987/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5987/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 9 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12876901 - PreCommit-HIVE-Build

> Transactional tables should block LOAD DATA 
> 
>
> Key: HIVE-16732
> URL: https://issues.apache.org/jira/browse/HIVE-16732
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 1.0.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
> Attachments: HIVE-16732.01.patch, HIVE-16732.02.patch, 
> HIVE-16732.03-branch-2.patch, HIVE-16732.03.patch
>
>
> This has always been the design.
> see LoadSemanticAnalyzer.analyzeInternal()
> StrictChecks.checkBucketing(conf);
> Some examples (this is exposed by HIVE-16177)
> insert_values_orig_table.q
>  insert_orig_table.q
>  insert_values_orig_table_use_metadata.q





[jira] [Updated] (HIVE-16732) Transactional tables should block LOAD DATA

2017-07-12 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-16732:
--
   Resolution: Fixed
Fix Version/s: 2.4.0
   3.0.0
   Status: Resolved  (was: Patch Available)

HIVE-16732.03-branch-2.patch committed to branch-2 (2.x)
thanks Wei for the review

> Transactional tables should block LOAD DATA 
> 
>
> Key: HIVE-16732
> URL: https://issues.apache.org/jira/browse/HIVE-16732
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 1.0.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
> Fix For: 3.0.0, 2.4.0
>
> Attachments: HIVE-16732.01.patch, HIVE-16732.02.patch, 
> HIVE-16732.03-branch-2.patch, HIVE-16732.03.patch
>
>
> This has always been the design.
> see LoadSemanticAnalyzer.analyzeInternal()
> StrictChecks.checkBucketing(conf);
> Some examples (this is exposed by HIVE-16177)
> insert_values_orig_table.q
>  insert_orig_table.q
>  insert_values_orig_table_use_metadata.q





[jira] [Commented] (HIVE-8838) Support Parquet through HCatalog

2017-07-12 Thread Adam Szita (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16084708#comment-16084708
 ] 

Adam Szita commented on HIVE-8838:
--

Thanks for reviewing [~spena], [~sushanth], [~aihuaxu] and committing!

> Support Parquet through HCatalog
> 
>
> Key: HIVE-8838
> URL: https://issues.apache.org/jira/browse/HIVE-8838
> Project: Hive
>  Issue Type: New Feature
>Reporter: Brock Noland
>Assignee: Adam Szita
> Fix For: 3.0.0
>
> Attachments: HIVE-8838.0.patch, HIVE-8838.1.patch, HIVE-8838.2.patch, 
> HIVE-8838.3.patch, HIVE-8838.4.patch
>
>
> Similar to HIVE-8687 for Avro we need to fix Parquet with HCatalog.





[jira] [Updated] (HIVE-16793) Scalar sub-query: sq_count_check not required if gby keys are constant

2017-07-12 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-16793:
---
Status: Patch Available  (was: Open)

> Scalar sub-query: sq_count_check not required if gby keys are constant
> --
>
> Key: HIVE-16793
> URL: https://issues.apache.org/jira/browse/HIVE-16793
> Project: Hive
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Gopal V
>Assignee: Vineet Garg
> Attachments: HIVE-16793.1.patch, HIVE-16793.2.patch, 
> HIVE-16793.3.patch, HIVE-16793.4.patch, HIVE-16793.5.patch
>
>
> This query has an sq_count_check, though it is useless on a constant key.
> {code}
> hive> explain select * from part where p_size > (select max(p_size) from part 
> where p_type = '1' group by p_type);
> Warning: Map Join MAPJOIN[37][bigTable=?] in task 'Map 1' is a cross product
> Warning: Map Join MAPJOIN[36][bigTable=?] in task 'Map 1' is a cross product
> OK
> Plan optimized by CBO.
> Vertex dependency in root stage
> Map 1 <- Reducer 4 (BROADCAST_EDGE), Reducer 6 (BROADCAST_EDGE)
> Reducer 3 <- Map 2 (SIMPLE_EDGE)
> Reducer 4 <- Reducer 3 (CUSTOM_SIMPLE_EDGE)
> Reducer 6 <- Map 5 (SIMPLE_EDGE)
> Stage-0
>   Fetch Operator
> limit:-1
> Stage-1
>   Map 1 vectorized, llap
>   File Output Operator [FS_64]
> Select Operator [SEL_63] (rows= width=621)
>   
> Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8"]
>   Filter Operator [FIL_62] (rows= width=625)
> predicate:(_col5 > _col10)
> Map Join Operator [MAPJOIN_61] (rows=2 width=625)
>   
> Conds:(Inner),Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8","_col10"]
> <-Reducer 6 [BROADCAST_EDGE] vectorized, llap
>   BROADCAST [RS_58]
> Select Operator [SEL_57] (rows=1 width=4)
>   Output:["_col0"]
>   Group By Operator [GBY_56] (rows=1 width=89)
> 
> Output:["_col0","_col1"],aggregations:["max(VALUE._col0)"],keys:KEY._col0
>   <-Map 5 [SIMPLE_EDGE] vectorized, llap
> SHUFFLE [RS_55]
>   PartitionCols:_col0
>   Group By Operator [GBY_54] (rows=86 width=89)
> 
> Output:["_col0","_col1"],aggregations:["max(_col1)"],keys:'1'
> Select Operator [SEL_53] (rows=1212121 width=109)
>   Output:["_col1"]
>   Filter Operator [FIL_52] (rows=1212121 width=109)
> predicate:(p_type = '1')
> TableScan [TS_17] (rows=2 width=109)
>   
> tpch_flat_orc_1000@part,part,Tbl:COMPLETE,Col:COMPLETE,Output:["p_type","p_size"]
> <-Map Join Operator [MAPJOIN_60] (rows=2 width=621)
> 
> Conds:(Inner),Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8"]
>   <-Reducer 4 [BROADCAST_EDGE] vectorized, llap
> BROADCAST [RS_51]
>   Select Operator [SEL_50] (rows=1 width=8)
> Filter Operator [FIL_49] (rows=1 width=8)
>   predicate:(sq_count_check(_col0) <= 1)
>   Group By Operator [GBY_48] (rows=1 width=8)
> Output:["_col0"],aggregations:["count(VALUE._col0)"]
>   <-Reducer 3 [CUSTOM_SIMPLE_EDGE] vectorized, llap
> PARTITION_ONLY_SHUFFLE [RS_47]
>   Group By Operator [GBY_46] (rows=1 width=8)
> Output:["_col0"],aggregations:["count()"]
> Select Operator [SEL_45] (rows=1 width=85)
>   Group By Operator [GBY_44] (rows=1 width=85)
> Output:["_col0"],keys:KEY._col0
>   <-Map 2 [SIMPLE_EDGE] vectorized, llap
> SHUFFLE [RS_43]
>   PartitionCols:_col0
>   Group By Operator [GBY_42] (rows=83 width=85)
> Output:["_col0"],keys:'1'
> Select Operator [SEL_41] (rows=1212121 width=105)
>   Filter Operator [FIL_40] (rows=1212121 width=105)
> predicate:(p_type = '1')
> TableScan [TS_2] (rows=2 width=105)
> tpch_flat_orc_1000@part,part,Tbl:COMPLETE,Col:COMPLETE,Output:["p_type"]
>

[jira] [Updated] (HIVE-16793) Scalar sub-query: sq_count_check not required if gby keys are constant

2017-07-12 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-16793:
---
Attachment: HIVE-16793.5.patch

The latest patch adds a config param {{hive.optimize.remove.sq_count_check}} to 
enable this optimization. Since this optimization caters to a very specific 
case but could have adverse effects (join reordering, joins not merging), we 
have decided to disable it by default.
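
The reasoning behind the optimization can be illustrated with a small sketch: grouping on a constant key can never yield more than one group, so the scalar sub-query cannot return more than one row and the count check has nothing to catch. The class and method names are hypothetical, and plain Java collections stand in for the Hive operators.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class ConstantKeyGroupBySketch {
  /**
   * Mimics "select max(p_size) ... where p_type = '1' group by p_type":
   * every row surviving the filter carries the same key, so the map has at most one entry.
   */
  static Map<String, Integer> maxPerConstantKey(List<Integer> sizes, String constantKey) {
    return sizes.stream().collect(
        Collectors.toMap(size -> constantKey, size -> size, Integer::max));
  }

  public static void main(String[] args) {
    Map<String, Integer> groups = maxPerConstantKey(Arrays.asList(3, 9, 4), "1");
    System.out.println("groups: " + groups.size() + ", max: " + groups.get("1"));

    // With no matching rows the sub-query yields zero groups, still not more than one.
    Map<String, Integer> empty = maxPerConstantKey(Collections.<Integer>emptyList(), "1");
    System.out.println("groups for empty input: " + empty.size());
  }
}
```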

> Scalar sub-query: sq_count_check not required if gby keys are constant
> --
>
> Key: HIVE-16793
> URL: https://issues.apache.org/jira/browse/HIVE-16793
> Project: Hive
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Gopal V
>Assignee: Vineet Garg
> Attachments: HIVE-16793.1.patch, HIVE-16793.2.patch, 
> HIVE-16793.3.patch, HIVE-16793.4.patch, HIVE-16793.5.patch
>
>
> This query has an sq_count_check, though it is useless on a constant key.
> {code}
> hive> explain select * from part where p_size > (select max(p_size) from part 
> where p_type = '1' group by p_type);
> Warning: Map Join MAPJOIN[37][bigTable=?] in task 'Map 1' is a cross product
> Warning: Map Join MAPJOIN[36][bigTable=?] in task 'Map 1' is a cross product
> OK
> Plan optimized by CBO.
> Vertex dependency in root stage
> Map 1 <- Reducer 4 (BROADCAST_EDGE), Reducer 6 (BROADCAST_EDGE)
> Reducer 3 <- Map 2 (SIMPLE_EDGE)
> Reducer 4 <- Reducer 3 (CUSTOM_SIMPLE_EDGE)
> Reducer 6 <- Map 5 (SIMPLE_EDGE)
> Stage-0
>   Fetch Operator
> limit:-1
> Stage-1
>   Map 1 vectorized, llap
>   File Output Operator [FS_64]
> Select Operator [SEL_63] (rows= width=621)
>   
> Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8"]
>   Filter Operator [FIL_62] (rows= width=625)
> predicate:(_col5 > _col10)
> Map Join Operator [MAPJOIN_61] (rows=2 width=625)
>   
> Conds:(Inner),Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8","_col10"]
> <-Reducer 6 [BROADCAST_EDGE] vectorized, llap
>   BROADCAST [RS_58]
> Select Operator [SEL_57] (rows=1 width=4)
>   Output:["_col0"]
>   Group By Operator [GBY_56] (rows=1 width=89)
> 
> Output:["_col0","_col1"],aggregations:["max(VALUE._col0)"],keys:KEY._col0
>   <-Map 5 [SIMPLE_EDGE] vectorized, llap
> SHUFFLE [RS_55]
>   PartitionCols:_col0
>   Group By Operator [GBY_54] (rows=86 width=89)
> 
> Output:["_col0","_col1"],aggregations:["max(_col1)"],keys:'1'
> Select Operator [SEL_53] (rows=1212121 width=109)
>   Output:["_col1"]
>   Filter Operator [FIL_52] (rows=1212121 width=109)
> predicate:(p_type = '1')
> TableScan [TS_17] (rows=2 width=109)
>   
> tpch_flat_orc_1000@part,part,Tbl:COMPLETE,Col:COMPLETE,Output:["p_type","p_size"]
> <-Map Join Operator [MAPJOIN_60] (rows=2 width=621)
> 
> Conds:(Inner),Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8"]
>   <-Reducer 4 [BROADCAST_EDGE] vectorized, llap
> BROADCAST [RS_51]
>   Select Operator [SEL_50] (rows=1 width=8)
> Filter Operator [FIL_49] (rows=1 width=8)
>   predicate:(sq_count_check(_col0) <= 1)
>   Group By Operator [GBY_48] (rows=1 width=8)
> Output:["_col0"],aggregations:["count(VALUE._col0)"]
>   <-Reducer 3 [CUSTOM_SIMPLE_EDGE] vectorized, llap
> PARTITION_ONLY_SHUFFLE [RS_47]
>   Group By Operator [GBY_46] (rows=1 width=8)
> Output:["_col0"],aggregations:["count()"]
> Select Operator [SEL_45] (rows=1 width=85)
>   Group By Operator [GBY_44] (rows=1 width=85)
> Output:["_col0"],keys:KEY._col0
>   <-Map 2 [SIMPLE_EDGE] vectorized, llap
> SHUFFLE [RS_43]
>   PartitionCols:_col0
>   Group By Operator [GBY_42] (rows=83 width=85)
> Output:["_col0"],keys:'1'
> Select Operator [SEL_41] (rows=1212121 width=105)
>   Filter Operator [FIL_40] (rows=1212121 width=105)
> 

[jira] [Updated] (HIVE-16793) Scalar sub-query: sq_count_check not required if gby keys are constant

2017-07-12 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-16793:
---
Status: Open  (was: Patch Available)

> Scalar sub-query: sq_count_check not required if gby keys are constant
> --
>
> Key: HIVE-16793
> URL: https://issues.apache.org/jira/browse/HIVE-16793
> Project: Hive
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Gopal V
>Assignee: Vineet Garg
> Attachments: HIVE-16793.1.patch, HIVE-16793.2.patch, 
> HIVE-16793.3.patch, HIVE-16793.4.patch
>
>
> This query has an sq_count_check, though it is useless on a constant key.
> {code}
> hive> explain select * from part where p_size > (select max(p_size) from part 
> where p_type = '1' group by p_type);
> Warning: Map Join MAPJOIN[37][bigTable=?] in task 'Map 1' is a cross product
> Warning: Map Join MAPJOIN[36][bigTable=?] in task 'Map 1' is a cross product
> OK
> Plan optimized by CBO.
> Vertex dependency in root stage
> Map 1 <- Reducer 4 (BROADCAST_EDGE), Reducer 6 (BROADCAST_EDGE)
> Reducer 3 <- Map 2 (SIMPLE_EDGE)
> Reducer 4 <- Reducer 3 (CUSTOM_SIMPLE_EDGE)
> Reducer 6 <- Map 5 (SIMPLE_EDGE)
> Stage-0
>   Fetch Operator
> limit:-1
> Stage-1
>   Map 1 vectorized, llap
>   File Output Operator [FS_64]
> Select Operator [SEL_63] (rows= width=621)
>   
> Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8"]
>   Filter Operator [FIL_62] (rows= width=625)
> predicate:(_col5 > _col10)
> Map Join Operator [MAPJOIN_61] (rows=2 width=625)
>   
> Conds:(Inner),Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8","_col10"]
> <-Reducer 6 [BROADCAST_EDGE] vectorized, llap
>   BROADCAST [RS_58]
> Select Operator [SEL_57] (rows=1 width=4)
>   Output:["_col0"]
>   Group By Operator [GBY_56] (rows=1 width=89)
> 
> Output:["_col0","_col1"],aggregations:["max(VALUE._col0)"],keys:KEY._col0
>   <-Map 5 [SIMPLE_EDGE] vectorized, llap
> SHUFFLE [RS_55]
>   PartitionCols:_col0
>   Group By Operator [GBY_54] (rows=86 width=89)
> 
> Output:["_col0","_col1"],aggregations:["max(_col1)"],keys:'1'
> Select Operator [SEL_53] (rows=1212121 width=109)
>   Output:["_col1"]
>   Filter Operator [FIL_52] (rows=1212121 width=109)
> predicate:(p_type = '1')
> TableScan [TS_17] (rows=2 width=109)
>   
> tpch_flat_orc_1000@part,part,Tbl:COMPLETE,Col:COMPLETE,Output:["p_type","p_size"]
> <-Map Join Operator [MAPJOIN_60] (rows=2 width=621)
> 
> Conds:(Inner),Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8"]
>   <-Reducer 4 [BROADCAST_EDGE] vectorized, llap
> BROADCAST [RS_51]
>   Select Operator [SEL_50] (rows=1 width=8)
> Filter Operator [FIL_49] (rows=1 width=8)
>   predicate:(sq_count_check(_col0) <= 1)
>   Group By Operator [GBY_48] (rows=1 width=8)
> Output:["_col0"],aggregations:["count(VALUE._col0)"]
>   <-Reducer 3 [CUSTOM_SIMPLE_EDGE] vectorized, llap
> PARTITION_ONLY_SHUFFLE [RS_47]
>   Group By Operator [GBY_46] (rows=1 width=8)
> Output:["_col0"],aggregations:["count()"]
> Select Operator [SEL_45] (rows=1 width=85)
>   Group By Operator [GBY_44] (rows=1 width=85)
> Output:["_col0"],keys:KEY._col0
>   <-Map 2 [SIMPLE_EDGE] vectorized, llap
> SHUFFLE [RS_43]
>   PartitionCols:_col0
>   Group By Operator [GBY_42] (rows=83 width=85)
> Output:["_col0"],keys:'1'
> Select Operator [SEL_41] (rows=1212121 width=105)
>   Filter Operator [FIL_40] (rows=1212121 width=105)
> predicate:(p_type = '1')
> TableScan [TS_2] (rows=2 width=105)
> tpch_flat_orc_1000@part,part,Tbl:COMPLETE,Col:COMPLETE,Output:["p_type"]
>   <-Select Operator

[jira] [Commented] (HIVE-16793) Scalar sub-query: sq_count_check not required if gby keys are constant

2017-07-12 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16084760#comment-16084760
 ] 

Gopal V commented on HIVE-16793:


Does enabling this optimization remove the cross-products triggered by the 
scalar sub-query?

> Scalar sub-query: sq_count_check not required if gby keys are constant
> --
>
> Key: HIVE-16793
> URL: https://issues.apache.org/jira/browse/HIVE-16793
> Project: Hive
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Gopal V
>Assignee: Vineet Garg
> Attachments: HIVE-16793.1.patch, HIVE-16793.2.patch, 
> HIVE-16793.3.patch, HIVE-16793.4.patch, HIVE-16793.5.patch
>
>
> This query has an sq_count_check, though it is useless on a constant key.
> {code}
> hive> explain select * from part where p_size > (select max(p_size) from part 
> where p_type = '1' group by p_type);
> Warning: Map Join MAPJOIN[37][bigTable=?] in task 'Map 1' is a cross product
> Warning: Map Join MAPJOIN[36][bigTable=?] in task 'Map 1' is a cross product
> OK
> Plan optimized by CBO.
> Vertex dependency in root stage
> Map 1 <- Reducer 4 (BROADCAST_EDGE), Reducer 6 (BROADCAST_EDGE)
> Reducer 3 <- Map 2 (SIMPLE_EDGE)
> Reducer 4 <- Reducer 3 (CUSTOM_SIMPLE_EDGE)
> Reducer 6 <- Map 5 (SIMPLE_EDGE)
> Stage-0
>   Fetch Operator
> limit:-1
> Stage-1
>   Map 1 vectorized, llap
>   File Output Operator [FS_64]
> Select Operator [SEL_63] (rows= width=621)
>   
> Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8"]
>   Filter Operator [FIL_62] (rows= width=625)
> predicate:(_col5 > _col10)
> Map Join Operator [MAPJOIN_61] (rows=2 width=625)
>   
> Conds:(Inner),Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8","_col10"]
> <-Reducer 6 [BROADCAST_EDGE] vectorized, llap
>   BROADCAST [RS_58]
> Select Operator [SEL_57] (rows=1 width=4)
>   Output:["_col0"]
>   Group By Operator [GBY_56] (rows=1 width=89)
> 
> Output:["_col0","_col1"],aggregations:["max(VALUE._col0)"],keys:KEY._col0
>   <-Map 5 [SIMPLE_EDGE] vectorized, llap
> SHUFFLE [RS_55]
>   PartitionCols:_col0
>   Group By Operator [GBY_54] (rows=86 width=89)
> 
> Output:["_col0","_col1"],aggregations:["max(_col1)"],keys:'1'
> Select Operator [SEL_53] (rows=1212121 width=109)
>   Output:["_col1"]
>   Filter Operator [FIL_52] (rows=1212121 width=109)
> predicate:(p_type = '1')
> TableScan [TS_17] (rows=2 width=109)
>   
> tpch_flat_orc_1000@part,part,Tbl:COMPLETE,Col:COMPLETE,Output:["p_type","p_size"]
> <-Map Join Operator [MAPJOIN_60] (rows=2 width=621)
> 
> Conds:(Inner),Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8"]
>   <-Reducer 4 [BROADCAST_EDGE] vectorized, llap
> BROADCAST [RS_51]
>   Select Operator [SEL_50] (rows=1 width=8)
> Filter Operator [FIL_49] (rows=1 width=8)
>   predicate:(sq_count_check(_col0) <= 1)
>   Group By Operator [GBY_48] (rows=1 width=8)
> Output:["_col0"],aggregations:["count(VALUE._col0)"]
>   <-Reducer 3 [CUSTOM_SIMPLE_EDGE] vectorized, llap
> PARTITION_ONLY_SHUFFLE [RS_47]
>   Group By Operator [GBY_46] (rows=1 width=8)
> Output:["_col0"],aggregations:["count()"]
> Select Operator [SEL_45] (rows=1 width=85)
>   Group By Operator [GBY_44] (rows=1 width=85)
> Output:["_col0"],keys:KEY._col0
>   <-Map 2 [SIMPLE_EDGE] vectorized, llap
> SHUFFLE [RS_43]
>   PartitionCols:_col0
>   Group By Operator [GBY_42] (rows=83 width=85)
> Output:["_col0"],keys:'1'
> Select Operator [SEL_41] (rows=1212121 width=105)
>   Filter Operator [FIL_40] (rows=1212121 width=105)
> predicate:(p_type = '1')
> TableScan [TS_2] (rows=2 width=105)
> 

[jira] [Commented] (HIVE-17079) LLAP: Use FQDN by default for work submission

2017-07-12 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16084767#comment-16084767
 ] 

Hive QA commented on HIVE-17079:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12876908/HIVE-17079.1.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 10874 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr]
 (batchId=145)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=232)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=177)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=177)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=177)
org.apache.hive.jdbc.TestJdbcWithMiniHS2.testHttpRetryOnServerIdleTimeout 
(batchId=226)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5988/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5988/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5988/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 6 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12876908 - PreCommit-HIVE-Build

> LLAP: Use FQDN by default for work submission
> -
>
> Key: HIVE-17079
> URL: https://issues.apache.org/jira/browse/HIVE-17079
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Affects Versions: 3.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-17079.1.patch
>
>
> HIVE-14624 added FQDN for work submission. We should enable it by default to 
> avoid DNS issues. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17018) Small table is converted to map join even the total size of small tables exceeds the threshold(hive.auto.convert.join.noconditionaltask.size)

2017-07-12 Thread liyunzhang_intel (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16084774#comment-16084774
 ] 

liyunzhang_intel commented on HIVE-17018:
-

[~csun]: 
{quote}
Yes. I think we don't need to change the existing behavior. I'm just suggesting 
that we might need a HoS specific config to replace 
hive.auto.convert.join.noconditionaltask.size
{quote}
Should we rename {{hive.auto.convert.join.noconditionaltask.size}} to 
{{hive.auto.convert.join.within.sparktask.size}}? The description of the 
configuration would then change from
{noformat}
the sum of size for n-1 of the tables/partitions for a n-way join is smaller 
than it
{noformat}

to 
{noformat}
the sum of size for n-1 of the tables/partitions for a n-way join is smaller 
than it in 1 MapTask or ReduceTask
{noformat}


Can you give some suggestions?


> Small table is converted to map join even the total size of small tables 
> exceeds the threshold(hive.auto.convert.join.noconditionaltask.size)
> -
>
> Key: HIVE-17018
> URL: https://issues.apache.org/jira/browse/HIVE-17018
> Project: Hive
>  Issue Type: Bug
>Reporter: liyunzhang_intel
>Assignee: liyunzhang_intel
> Attachments: HIVE-17018_data_init.q, HIVE-17018.q, t3.txt
>
>
> We use "hive.auto.convert.join.noconditionaltask.size" as the threshold: if 
> the sum of the sizes of n-1 of the tables/partitions in an n-way join is 
> smaller than it, the join is converted to a map join. For example, take A join B 
> join C join D join E. The big table is A (100M); the small tables are 
> B (10M), C (10M), D (10M) and E (10M). Suppose we set 
> hive.auto.convert.join.noconditionaltask.size=20M. In the current code, E, D and B 
> are converted to map joins but C is not. In my 
> understanding, because hive.auto.convert.join.noconditionaltask.size can only 
> hold E and D, neither C nor B should be converted to a map join.  
> Let me explain in more detail why E can be converted to a map join.
> In the current code, 
> [SparkMapJoinOptimizer#getConnectedMapJoinSize|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkMapJoinOptimizer.java#L364]
>  sums the sizes of all the map joins in the parent path and the child path. The search 
> stops when it encounters a [UnionOperator or 
> ReduceOperator|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkMapJoinOptimizer.java#L381].
>  C is not converted to a map join because {{(connectedMapJoinSize + 
> totalSize) > maxSize}} [see 
> code|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkMapJoinOptimizer.java#L330]. The
>  RS before the join of C therefore remains. When deciding whether B will be 
> converted to a map join, {{getConnectedMapJoinSize}} returns 0 because it encounters the 
> [RS 
> |https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkMapJoinOptimizer.java#409]
>  and so {{(connectedMapJoinSize + totalSize) < maxSize}} holds.
> [~xuefuz] or [~jxiang]: can you help check whether this is a bug or not, as you 
> are more familiar with SparkJoinOptimizer?
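The behaviour reported in this description can be modelled with a short sketch. It is a simplified model in Python, not the Hive code: the function name `convert_map_joins`, the visiting order (E, D, C, B) and the reset-to-zero rule are assumptions drawn from the description, where a join that is not converted keeps its ReduceSink and so cuts the chain of connected map joins.

```python
def convert_map_joins(small_tables, max_size):
    """Return the names of the small tables that end up in map joins.

    small_tables: (name, size) pairs in the order the optimizer visits them
    (an assumed order, chosen to match the example in the description).
    """
    converted = []
    connected_size = 0  # total size of the map joins connected so far
    for name, size in small_tables:
        if connected_size + size > max_size:
            # Not converted: the ReduceSink before this join remains, so the
            # next join no longer sees the map joins on the other side of it.
            connected_size = 0
        else:
            converted.append(name)
            connected_size += size
    return converted


tables = [("E", 10), ("D", 10), ("C", 10), ("B", 10)]  # sizes in MB
print(convert_map_joins(tables, max_size=20))
```

With a 20M threshold the model converts E, D and B, 30M of small tables in total, which is the outcome the description questions.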





[jira] [Commented] (HIVE-16793) Scalar sub-query: sq_count_check not required if gby keys are constant

2017-07-12 Thread Vineet Garg (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16084770#comment-16084770
 ] 

Vineet Garg commented on HIVE-16793:


It does if gby keys are constant

> Scalar sub-query: sq_count_check not required if gby keys are constant
> --
>
> Key: HIVE-16793
> URL: https://issues.apache.org/jira/browse/HIVE-16793
> Project: Hive
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Gopal V
>Assignee: Vineet Garg
> Attachments: HIVE-16793.1.patch, HIVE-16793.2.patch, 
> HIVE-16793.3.patch, HIVE-16793.4.patch, HIVE-16793.5.patch
>
>
> This query has an sq_count_check, though it is useless on a constant key.
> {code}
> hive> explain select * from part where p_size > (select max(p_size) from part 
> where p_type = '1' group by p_type);
> Warning: Map Join MAPJOIN[37][bigTable=?] in task 'Map 1' is a cross product
> Warning: Map Join MAPJOIN[36][bigTable=?] in task 'Map 1' is a cross product
> OK
> Plan optimized by CBO.
> Vertex dependency in root stage
> Map 1 <- Reducer 4 (BROADCAST_EDGE), Reducer 6 (BROADCAST_EDGE)
> Reducer 3 <- Map 2 (SIMPLE_EDGE)
> Reducer 4 <- Reducer 3 (CUSTOM_SIMPLE_EDGE)
> Reducer 6 <- Map 5 (SIMPLE_EDGE)
> Stage-0
>   Fetch Operator
> limit:-1
> Stage-1
>   Map 1 vectorized, llap
>   File Output Operator [FS_64]
> Select Operator [SEL_63] (rows= width=621)
>   
> Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8"]
>   Filter Operator [FIL_62] (rows= width=625)
> predicate:(_col5 > _col10)
> Map Join Operator [MAPJOIN_61] (rows=2 width=625)
>   
> Conds:(Inner),Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8","_col10"]
> <-Reducer 6 [BROADCAST_EDGE] vectorized, llap
>   BROADCAST [RS_58]
> Select Operator [SEL_57] (rows=1 width=4)
>   Output:["_col0"]
>   Group By Operator [GBY_56] (rows=1 width=89)
> 
> Output:["_col0","_col1"],aggregations:["max(VALUE._col0)"],keys:KEY._col0
>   <-Map 5 [SIMPLE_EDGE] vectorized, llap
> SHUFFLE [RS_55]
>   PartitionCols:_col0
>   Group By Operator [GBY_54] (rows=86 width=89)
> 
> Output:["_col0","_col1"],aggregations:["max(_col1)"],keys:'1'
> Select Operator [SEL_53] (rows=1212121 width=109)
>   Output:["_col1"]
>   Filter Operator [FIL_52] (rows=1212121 width=109)
> predicate:(p_type = '1')
> TableScan [TS_17] (rows=2 width=109)
>   
> tpch_flat_orc_1000@part,part,Tbl:COMPLETE,Col:COMPLETE,Output:["p_type","p_size"]
> <-Map Join Operator [MAPJOIN_60] (rows=2 width=621)
> 
> Conds:(Inner),Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8"]
>   <-Reducer 4 [BROADCAST_EDGE] vectorized, llap
> BROADCAST [RS_51]
>   Select Operator [SEL_50] (rows=1 width=8)
> Filter Operator [FIL_49] (rows=1 width=8)
>   predicate:(sq_count_check(_col0) <= 1)
>   Group By Operator [GBY_48] (rows=1 width=8)
> Output:["_col0"],aggregations:["count(VALUE._col0)"]
>   <-Reducer 3 [CUSTOM_SIMPLE_EDGE] vectorized, llap
> PARTITION_ONLY_SHUFFLE [RS_47]
>   Group By Operator [GBY_46] (rows=1 width=8)
> Output:["_col0"],aggregations:["count()"]
> Select Operator [SEL_45] (rows=1 width=85)
>   Group By Operator [GBY_44] (rows=1 width=85)
> Output:["_col0"],keys:KEY._col0
>   <-Map 2 [SIMPLE_EDGE] vectorized, llap
> SHUFFLE [RS_43]
>   PartitionCols:_col0
>   Group By Operator [GBY_42] (rows=83 
> width=85)
> Output:["_col0"],keys:'1'
> Select Operator [SEL_41] (rows=1212121 
> width=105)
>   Filter Operator [FIL_40] (rows=1212121 
> width=105)
> predicate:(p_type = '1')
> TableScan [TS_2] (rows=2 
> width=105)
>   
> tpch_flat_orc_1000@part,part,Tbl:COM

[jira] [Updated] (HIVE-16100) Dynamic Sorted Partition optimizer loses sibling operators

2017-07-12 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-16100:
---
Attachment: HIVE-16100.5.patch

> Dynamic Sorted Partition optimizer loses sibling operators
> --
>
> Key: HIVE-16100
> URL: https://issues.apache.org/jira/browse/HIVE-16100
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Affects Versions: 1.2.1, 2.1.1, 2.2.0
>Reporter: Gopal V
>Assignee: Gopal V
> Attachments: HIVE-16100.1.patch, HIVE-16100.2.patch, 
> HIVE-16100.2.patch, HIVE-16100.3.patch, HIVE-16100.4.patch, HIVE-16100.5.patch
>
>
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/SortedDynPartitionOptimizer.java#L173
> {code}
>   // unlink connection between FS and its parent
>   fsParent = fsOp.getParentOperators().get(0);
>   fsParent.getChildOperators().clear();
> {code}
> The optimizer discards any cases where the fsParent has another SEL child 
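The effect of clearing the whole child list can be illustrated with a toy operator tree. This is a hedged sketch in Python rather than the Hive classes: `Op`, `link`, `unlink_all_children` and `unlink_only` are hypothetical names, and the second function shows one possible way to keep siblings, not necessarily what the attached patch does.

```python
class Op:
    """Minimal stand-in for an operator with parent and child links."""

    def __init__(self, name):
        self.name = name
        self.parents = []
        self.children = []


def link(parent, child):
    parent.children.append(child)
    child.parents.append(parent)


def unlink_all_children(fs_op):
    # Mirrors the quoted snippet: every child of the parent is dropped,
    # including siblings of the file sink.
    parent = fs_op.parents[0]
    parent.children.clear()
    return parent


def unlink_only(fs_op):
    # Alternative: detach just the file sink and leave its siblings linked.
    parent = fs_op.parents[0]
    parent.children.remove(fs_op)
    return parent


def build():
    parent, fs, sel = Op("PARENT"), Op("FS"), Op("SEL")
    link(parent, fs)
    link(parent, sel)
    return parent, fs, sel


_, fs, _ = build()
print([c.name for c in unlink_all_children(fs).children])
_, fs, _ = build()
print([c.name for c in unlink_only(fs).children])
```

In the first case the sibling SEL is lost along with the file sink; in the second it stays attached to the parent.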





[jira] [Commented] (HIVE-16926) LlapTaskUmbilicalExternalClient should not start new umbilical server for every fragment request

2017-07-12 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16084817#comment-16084817
 ] 

Siddharth Seth commented on HIVE-16926:
---

bq.  Is there any action needed on this part?
I don't think there is, unless you see this as a problem for the running spark 
task. The  number of threads created etc is quite small afaik.

bq. Maybe I can just replace pendingClients/registeredClients with a single 
list and the RequestInfo can keep a state to show if the request is 
pending/running/etc.
That'll work as well. Think there's still 2 places which have similar code 
related to heartbeats - heartbeat / nodePinged.
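The single-collection idea from the quoted suggestion might look roughly like the sketch below. It is written in Python for brevity and is not the LlapTaskUmbilicalExternalClient code: `RequestState`, `RequestRegistry` and their methods are hypothetical names, and only `RequestInfo` and the pending/running distinction come from the comment.

```python
from enum import Enum


class RequestState(Enum):
    PENDING = "pending"
    RUNNING = "running"
    COMPLETED = "completed"


class RequestInfo:
    def __init__(self, request_id):
        self.request_id = request_id
        self.state = RequestState.PENDING  # every request starts as pending


class RequestRegistry:
    """One collection of requests instead of separate pending/registered maps."""

    def __init__(self):
        self._requests = {}

    def submit(self, request_id):
        info = RequestInfo(request_id)
        self._requests[request_id] = info
        return info

    def mark_running(self, request_id):
        self._requests[request_id].state = RequestState.RUNNING

    def in_state(self, state):
        # Filter by state instead of consulting a second collection.
        return [r.request_id for r in self._requests.values() if r.state is state]


registry = RequestRegistry()
registry.submit("fragment-1")
registry.submit("fragment-2")
registry.mark_running("fragment-1")
print(registry.in_state(RequestState.RUNNING))
```

A heartbeat handler could then look up one entry and branch on its state, which may also help merge the two similar heartbeat code paths mentioned above.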

> LlapTaskUmbilicalExternalClient should not start new umbilical server for 
> every fragment request
> 
>
> Key: HIVE-16926
> URL: https://issues.apache.org/jira/browse/HIVE-16926
> Project: Hive
>  Issue Type: Sub-task
>  Components: llap
>Reporter: Jason Dere
>Assignee: Jason Dere
> Attachments: HIVE-16926.1.patch, HIVE-16926.2.patch, 
> HIVE-16926.3.patch, HIVE-16926.4.patch
>
>
> Followup task from [~sseth] and [~sershe] after HIVE-16777.
> LlapTaskUmbilicalExternalClient currently creates a new umbilical server for 
> every fragment request, but this is not necessary and the umbilical can be 
> shared.





[jira] [Updated] (HIVE-17079) LLAP: Use FQDN by default for work submission

2017-07-12 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-17079:
-
   Resolution: Fixed
Fix Version/s: 3.0.0
   Status: Resolved  (was: Patch Available)

Committed to master. Thanks for the review!

> LLAP: Use FQDN by default for work submission
> -
>
> Key: HIVE-17079
> URL: https://issues.apache.org/jira/browse/HIVE-17079
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Affects Versions: 3.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Fix For: 3.0.0
>
> Attachments: HIVE-17079.1.patch
>
>
> HIVE-14624 added FQDN for work submission. We should enable it by default to 
> avoid DNS issues. 





[jira] [Commented] (HIVE-16832) duplicate ROW__ID possible in multi insert into transactional table

2017-07-12 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16084858#comment-16084858
 ] 

Hive QA commented on HIVE-16832:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12876914/HIVE-16832.22.patch

{color:green}SUCCESS:{color} +1 due to 12 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 10 failed/errored test(s), 10888 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[insert_overwrite_local_directory_1]
 (batchId=237)
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[materialized_view_create_rewrite]
 (batchId=237)
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver[hbase_queries] 
(batchId=94)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] 
(batchId=143)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] 
(batchId=99)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=232)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] 
(batchId=232)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=177)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=177)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=177)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5989/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5989/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5989/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 10 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12876914 - PreCommit-HIVE-Build

> duplicate ROW__ID possible in multi insert into transactional table
> ---
>
> Key: HIVE-16832
> URL: https://issues.apache.org/jira/browse/HIVE-16832
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 2.2.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Critical
> Attachments: HIVE-16832.01.patch, HIVE-16832.03.patch, 
> HIVE-16832.04.patch, HIVE-16832.05.patch, HIVE-16832.06.patch, 
> HIVE-16832.08.patch, HIVE-16832.09.patch, HIVE-16832.10.patch, 
> HIVE-16832.11.patch, HIVE-16832.14.patch, HIVE-16832.15.patch, 
> HIVE-16832.16.patch, HIVE-16832.17.patch, HIVE-16832.18.patch, 
> HIVE-16832.19.patch, HIVE-16832.20.patch, HIVE-16832.20.patch, 
> HIVE-16832.21.patch, HIVE-16832.22.patch
>
>
> {noformat}
>  create table AcidTablePart(a int, b int) partitioned by (p string) clustered 
> by (a) into 2 buckets stored as orc TBLPROPERTIES ('transactional'='true');
>  create temporary table if not exists data1 (x int);
>  insert into data1 values (1);
>  from data1
>insert into AcidTablePart partition(p) select 0, 0, 'p' || x
>insert into AcidTablePart partition(p='p1') select 0, 1
> {noformat}
> Each branch of this multi-insert creates a row in partition p1/bucket0 with 
> ROW__ID=(1,0,0).
> The same can happen when running a SQL Merge (HIVE-10924) statement that has 
> both Insert and Update clauses when the target table has 
> _'transactional'='true','transactional_properties'='default'_  (see 
> HIVE-14035).  This is so because Merge is internally run as a multi-insert 
> statement.
> The solution relies on the statement ID introduced in HIVE-11030.  Each Insert 
> clause of a multi-insert gets a unique ID.
> The ROW__ID.bucketId now becomes a bit packed triplet (format version, 
> bucketId, statementId).
> (Since ORC stores field names in the data file we can't rename 
> ROW__ID.bucketId).
> This ensures that there are no collisions and retains desired sort properties 
> of ROW__ID.
> In particular _SortedDynPartitionOptimizer_ works w/o any changes even in 
> cases where there are fewer reducers than buckets.  
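A bit-packed triplet of this kind can be sketched as follows. This is an illustration in Python, not the encoding from the patch: the field widths (3, 12 and 12 bits), the shift positions and the function names `pack` and `unpack` are all assumptions. The only property taken from the description is that the bucket id sits in higher bits than the statement id, so that sorting by the packed value still groups rows by bucket.

```python
VERSION_BITS = 3      # assumed width of the format version
BUCKET_BITS = 12      # assumed width of the bucket id
STATEMENT_BITS = 12   # assumed width of the statement id

BUCKET_SHIFT = 16     # bucket id sits above the statement id
VERSION_SHIFT = 29    # version occupies the top bits of a 32-bit value


def _check(value, bits, label):
    if not 0 <= value < (1 << bits):
        raise ValueError(f"{label} {value} does not fit in {bits} bits")


def pack(version, bucket_id, statement_id):
    """Pack (format version, bucketId, statementId) into one integer."""
    _check(version, VERSION_BITS, "version")
    _check(bucket_id, BUCKET_BITS, "bucket id")
    _check(statement_id, STATEMENT_BITS, "statement id")
    return (version << VERSION_SHIFT) | (bucket_id << BUCKET_SHIFT) | statement_id


def unpack(packed):
    """Inverse of pack()."""
    version = (packed >> VERSION_SHIFT) & ((1 << VERSION_BITS) - 1)
    bucket_id = (packed >> BUCKET_SHIFT) & ((1 << BUCKET_BITS) - 1)
    statement_id = packed & ((1 << STATEMENT_BITS) - 1)
    return version, bucket_id, statement_id


# Two insert branches writing to the same bucket no longer collide.
first = pack(1, 0, 0)
second = pack(1, 0, 1)
print(first != second, unpack(second))
```

Because the statement id occupies the low bits, two branches of the same multi-insert get distinct values for the same bucket, while values for bucket 0 still sort before values for bucket 1.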





[jira] [Commented] (HIVE-16979) Cache UGI for metastore

2017-07-12 Thread Tao Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16084860#comment-16084860
 ] 

Tao Li commented on HIVE-16979:
---

[~gopalv] Can you please take a look at the patch? Thanks!

> Cache UGI for metastore
> ---
>
> Key: HIVE-16979
> URL: https://issues.apache.org/jira/browse/HIVE-16979
> Project: Hive
>  Issue Type: Improvement
>Reporter: Tao Li
>Assignee: Tao Li
> Attachments: HIVE-16979.1.patch, HIVE-16979.2.patch, 
> HIVE-16979.3.patch
>
>
> FileSystem.closeAllForUGI is called per request against metastore to dispose 
> of the UGI, which involves talking to the HDFS name node and is time consuming. So the 
> perf improvement would be caching and reusing the UGI.
> Each FileSystem.closeAllForUGI call could take up to 20 ms of E2E latency 
> against HDFS. Usually a Hive query could result in several calls against 
> metastore, so we can save up to 50-100 ms per Hive query.
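The caching idea can be shown with a generic per-user cache. This is a minimal sketch in Python under stated assumptions: `UgiCache` and its `create_ugi` factory are hypothetical, stand in for whatever creates a UGI in the metastore, and leave out eviction, expiry and thread safety, which a real implementation would need.

```python
class UgiCache:
    """Reuse one UGI-like object per user instead of creating and
    disposing of one on every request."""

    def __init__(self, create_ugi):
        self._create_ugi = create_ugi  # expensive factory, called once per user
        self._by_user = {}

    def get(self, user):
        ugi = self._by_user.get(user)
        if ugi is None:
            ugi = self._create_ugi(user)
            self._by_user[user] = ugi
        return ugi


created = []


def fake_create_ugi(user):
    # Stand-in for the expensive call; records how often it runs.
    created.append(user)
    return object()


cache = UgiCache(fake_create_ugi)
first = cache.get("hive")
second = cache.get("hive")
print(first is second, created)
```

Repeated requests for the same user hit the cache, so the expensive create and dispose cycle runs once rather than once per metastore call.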





[jira] [Updated] (HIVE-16832) duplicate ROW__ID possible in multi insert into transactional table

2017-07-12 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-16832:
--
  Resolution: Fixed
   Fix Version/s: 3.0.0
Target Version/s: 3.0.0  (was: 3.0.0, 2.4.0)
  Status: Resolved  (was: Patch Available)

> duplicate ROW__ID possible in multi insert into transactional table
> ---
>
> Key: HIVE-16832
> URL: https://issues.apache.org/jira/browse/HIVE-16832
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 2.2.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Critical
> Fix For: 3.0.0
>
> Attachments: HIVE-16832.01.patch, HIVE-16832.03.patch, 
> HIVE-16832.04.patch, HIVE-16832.05.patch, HIVE-16832.06.patch, 
> HIVE-16832.08.patch, HIVE-16832.09.patch, HIVE-16832.10.patch, 
> HIVE-16832.11.patch, HIVE-16832.14.patch, HIVE-16832.15.patch, 
> HIVE-16832.16.patch, HIVE-16832.17.patch, HIVE-16832.18.patch, 
> HIVE-16832.19.patch, HIVE-16832.20.patch, HIVE-16832.20.patch, 
> HIVE-16832.21.patch, HIVE-16832.22.patch
>
>
> {noformat}
>  create table AcidTablePart(a int, b int) partitioned by (p string) clustered 
> by (a) into 2 buckets stored as orc TBLPROPERTIES ('transactional'='true');
>  create temporary table if not exists data1 (x int);
>  insert into data1 values (1);
>  from data1
>insert into AcidTablePart partition(p) select 0, 0, 'p' || x
>insert into AcidTablePart partition(p='p1') select 0, 1
> {noformat}
> Each branch of this multi-insert creates a row in partition p1/bucket0 with 
> ROW__ID=(1,0,0).
> The same can happen when running a SQL Merge (HIVE-10924) statement that has 
> both Insert and Update clauses when the target table has 
> _'transactional'='true','transactional_properties'='default'_  (see 
> HIVE-14035).  This is so because Merge is internally run as a multi-insert 
> statement.
> The solution relies on the statement ID introduced in HIVE-11030.  Each Insert 
> clause of a multi-insert gets a unique ID.
> The ROW__ID.bucketId now becomes a bit packed triplet (format version, 
> bucketId, statementId).
> (Since ORC stores field names in the data file we can't rename 
> ROW__ID.bucketId).
> This ensures that there are no collisions and retains desired sort properties 
> of ROW__ID.
> In particular _SortedDynPartitionOptimizer_ works w/o any changes even in 
> cases where there are fewer reducers than buckets.  





[jira] [Commented] (HIVE-14947) Add support for Acid 2 in Merge

2017-07-12 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16084884#comment-16084884
 ] 

Eugene Koifman commented on HIVE-14947:
---

fixed via HIVE-16832 

> Add support for Acid 2 in Merge
> ---
>
> Key: HIVE-14947
> URL: https://issues.apache.org/jira/browse/HIVE-14947
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
> Fix For: 3.0.0
>
>
> HIVE-14035 etc. introduced a more efficient data layout for acid tables.
> Additional work is needed to support Merge for these tables.
> Need to make sure we generate unique ROW__IDs in each branch of the 
> multi-insert statement.  StatementId was introduced in HIVE-11030 but it's 
> not surfaced from storage layer.  It needs to be made part of ROW__ID to 
> ensure unique ROW__ID from concurrent writes from the same query.





[jira] [Resolved] (HIVE-14947) Add support for Acid 2 in Merge

2017-07-12 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman resolved HIVE-14947.
---
   Resolution: Fixed
Fix Version/s: 3.0.0

> Add support for Acid 2 in Merge
> ---
>
> Key: HIVE-14947
> URL: https://issues.apache.org/jira/browse/HIVE-14947
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
> Fix For: 3.0.0
>
>
> HIVE-14035 etc. introduced a more efficient data layout for acid tables.
> Additional work is needed to support Merge for these tables.
> Need to make sure we generate unique ROW__IDs in each branch of the 
> multi-insert statement.  StatementId was introduced in HIVE-11030 but it's 
> not surfaced from storage layer.  It needs to be made part of ROW__ID to 
> ensure unique ROW__ID from concurrent writes from the same query.





[jira] [Updated] (HIVE-16953) OrcRawRecordMerger.discoverOriginalKeyBounds issue if both split start and end are in the same stripe

2017-07-12 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-16953:
--
Summary: OrcRawRecordMerger.discoverOriginalKeyBounds issue if both split 
start and end are in the same stripe  (was: 
OrcRawRecordMerger.discoverOriginalKeyBounds)

> OrcRawRecordMerger.discoverOriginalKeyBounds issue if both split start and 
> end are in the same stripe
> -
>
> Key: HIVE-16953
> URL: https://issues.apache.org/jira/browse/HIVE-16953
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 1.0.0
>Reporter: Eugene Koifman
>
> If getOffset() and getMaxOffset() are inside the same stripe, we have 
> minKey & isTail=false but rowLength is never set.
> Don't know if we can ever have a split like that.





[jira] [Commented] (HIVE-17078) Add more logs to MapredLocalTask

2017-07-12 Thread Yibing Shi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16084891#comment-16084891
 ] 

Yibing Shi commented on HIVE-17078:
---

I am trying to keep the current behaviour. With Hive CLI, by default Hive logs 
are not printed. Some users may rely on the stdout/stderr information. I don't 
want to surprise them.
If you still think it is unnecessary to print child stdout/stderr to Hive 
stdout/stderr, I can remove the corresponding code.

> Add more logs to MapredLocalTask
> 
>
> Key: HIVE-17078
> URL: https://issues.apache.org/jira/browse/HIVE-17078
> Project: Hive
>  Issue Type: Improvement
>Reporter: Yibing Shi
>Assignee: Yibing Shi
>Priority: Minor
> Attachments: HIVE-17078.1.patch, HIVE-17078.2.patch
>
>
> By default, {{MapredLocalTask}} is executed in a child process of Hive, in 
> case the local task uses so many resources that it affects Hive. Currently, 
> the stdout and stderr output of the child process is printed to Hive's 
> stdout/stderr log, which has no timestamp information and is 
> separate from the Hive service logs. This makes it hard to troubleshoot problems 
> in MapredLocalTasks.
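The general technique of routing a child process's output through a timestamped logger can be sketched as below. This is a Python illustration, not the patch for MapredLocalTask: the function `run_and_log`, the logger name and the log format are assumptions chosen for the example, and the sketch buffers the child's whole output, whereas a long-running task would more likely need to stream it line by line.

```python
import logging
import subprocess
import sys

logging.basicConfig(
    format="%(asctime)s %(levelname)s %(name)s: %(message)s",
    level=logging.INFO,
)
LOG = logging.getLogger("MapredLocalTask")  # logger name chosen for illustration


def run_and_log(cmd):
    """Run cmd as a child process and copy its output into the service log."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    for line in proc.stdout.splitlines():
        LOG.info("child stdout: %s", line)   # each line now carries a timestamp
    for line in proc.stderr.splitlines():
        LOG.error("child stderr: %s", line)
    return proc.returncode


exit_code = run_and_log([sys.executable, "-c", "print('hello from child')"])
```

Each line of child output then appears in the same log as the parent's own messages, with a timestamp and level, which is what the description says is currently missing.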





[jira] [Commented] (HIVE-16832) duplicate ROW__ID possible in multi insert into transactional table

2017-07-12 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16084878#comment-16084878
 ] 

Eugene Koifman commented on HIVE-16832:
---

no related failures (see builds 5985,5984 for same failures)
HIVE-16832.22.patch committed to master (3.0)
thanks Gopal for the review

> duplicate ROW__ID possible in multi insert into transactional table
> ---
>
> Key: HIVE-16832
> URL: https://issues.apache.org/jira/browse/HIVE-16832
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 2.2.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Critical
> Attachments: HIVE-16832.01.patch, HIVE-16832.03.patch, 
> HIVE-16832.04.patch, HIVE-16832.05.patch, HIVE-16832.06.patch, 
> HIVE-16832.08.patch, HIVE-16832.09.patch, HIVE-16832.10.patch, 
> HIVE-16832.11.patch, HIVE-16832.14.patch, HIVE-16832.15.patch, 
> HIVE-16832.16.patch, HIVE-16832.17.patch, HIVE-16832.18.patch, 
> HIVE-16832.19.patch, HIVE-16832.20.patch, HIVE-16832.20.patch, 
> HIVE-16832.21.patch, HIVE-16832.22.patch
>
>
> {noformat}
>  create table AcidTablePart(a int, b int) partitioned by (p string) clustered 
> by (a) into 2 buckets stored as orc TBLPROPERTIES ('transactional'='true');
>  create temporary table if not exists data1 (x int);
>  insert into data1 values (1);
>  from data1
>insert into AcidTablePart partition(p) select 0, 0, 'p' || x
>insert into AcidTablePart partition(p='p1') select 0, 1
> {noformat}
> Each branch of this multi-insert creates a row in partition p1/bucket0 with 
> ROW__ID=(1,0,0).
> The same can happen when running a SQL Merge (HIVE-10924) statement that has 
> both Insert and Update clauses when the target table has 
> _'transactional'='true','transactional_properties'='default'_  (see 
> HIVE-14035).  This is so because Merge is internally run as a multi-insert 
> statement.
> The solution relies on the statement ID introduced in HIVE-11030.  Each Insert 
> clause of a multi-insert gets a unique ID.
> The ROW__ID.bucketId now becomes a bit packed triplet (format version, 
> bucketId, statementId).
> (Since ORC stores field names in the data file we can't rename 
> ROW__ID.bucketId).
> This ensures that there are no collisions and retains desired sort properties 
> of ROW__ID.
> In particular _SortedDynPartitionOptimizer_ works w/o any changes even in 
> cases where there are fewer reducers than buckets.  





[jira] [Updated] (HIVE-17013) Delete request with a subquery based on select over a view

2017-07-12 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-17013:
--
Component/s: Transactions

> Delete request with a subquery based on select over a view
> --
>
> Key: HIVE-17013
> URL: https://issues.apache.org/jira/browse/HIVE-17013
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Reporter: Frédéric ESCANDELL
>Priority: Blocker
>
> Hi, 
> I based my DDL on this example: 
> https://fr.hortonworks.com/tutorial/using-hive-acid-transactions-to-insert-update-and-delete-data/.
> In a delete request, the use of a view in a subquery throws an exception: 
> FAILED: IllegalStateException Expected 'insert into table default.mydim 
> select ROW__ID from default.mydim sort by ROW__ID' to be in sub-query or set 
> operation.
> {code:sql}
> drop table if exists mydim;
> create table mydim (key int, name string, zip string, is_current boolean)
> clustered by(key) into 3 buckets
> stored as orc tblproperties ('transactional'='true');
> insert into mydim values
>   (1, 'bob',   '95136', true),
>   (2, 'joe',   '70068', true),
>   (3, 'steve', '22150', true);
> drop table if exists updates_staging_table;
> create table updates_staging_table (key int, newzip string);
> insert into updates_staging_table values (1, 87102), (3, 45220);
> drop view if exists updates_staging_view;
> create view updates_staging_view (key, newzip) as select key, newzip from 
> updates_staging_table;
> delete from mydim
> where mydim.key in (select key from updates_staging_view);
> FAILED: IllegalStateException Expected 'insert into table default.mydim 
> select ROW__ID from default.mydim sort by ROW__ID' to be in sub-query or set 
> operation.
> {code}




