[jira] [Commented] (HIVE-21071) Improve getInputSummary

2019-02-06 Thread Peter Vary (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16762451#comment-16762451
 ] 

Peter Vary commented on HIVE-21071:
---

+1

> Improve getInputSummary
> ---
>
> Key: HIVE-21071
> URL: https://issues.apache.org/jira/browse/HIVE-21071
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Affects Versions: 4.0.0, 3.2.0
>Reporter: BELUGA BEHR
>Assignee: BELUGA BEHR
>Priority: Major
> Attachments: HIVE-21071.1.patch, HIVE-21071.10.patch, 
> HIVE-21071.11.patch, HIVE-21071.2.patch, HIVE-21071.3.patch, 
> HIVE-21071.4.patch, HIVE-21071.5.patch, HIVE-21071.6.patch, 
> HIVE-21071.7.patch, HIVE-21071.8.patch, HIVE-21071.9.patch
>
>
> There is a global lock in the {{getInputSummary}} code, so it is important 
> that it be fast.  The current implementation has quite a bit of overhead that 
> can be re-engineered.
> For example, the current implementation keeps a map of File Path to 
> ContentSummary object.  This map is populated by several threads 
> concurrently. At the end, the method loops through the map in a single thread 
> to add up all of the ContentSummary objects, ignoring the paths.  
> The code can be re-engineered to not use a map, or any collection at all, to 
> store the results and instead just keep a running tally.  By keeping a tally, 
> there is no {{O\(n)}} operation at the end to perform the addition.
> There are other things that can be improved.  The method returns an object 
> which is never used anywhere, so change the method to a void return type.
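
A minimal sketch of the running-tally idea described above, not the code from the attached patches. The class name InputSummaryTally, its methods, and the use of java.util.concurrent.atomic.LongAdder are assumptions made for illustration; per-path lengths stand in for ContentSummary objects.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical sketch: worker threads add to shared counters instead of filling a map. */
final class InputSummaryTally {
  private final LongAdder length = new LongAdder();
  private final LongAdder fileCount = new LongAdder();
  private final LongAdder directoryCount = new LongAdder();

  /** Safe to call concurrently, once per summarized path. */
  void add(long len, long files, long dirs) {
    length.add(len);
    fileCount.add(files);
    directoryCount.add(dirs);
  }

  long length() { return length.sum(); }
  long fileCount() { return fileCount.sum(); }
  long directoryCount() { return directoryCount.sum(); }

  /** Tallies the given per-path lengths on a small pool; no per-path result is stored. */
  static InputSummaryTally summarize(List<Long> pathLengths, int threads) {
    InputSummaryTally tally = new InputSummaryTally();
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    try {
      for (long len : pathLengths) {
        pool.execute(() -> tally.add(len, 1, 0)); // each path counts as one file here
      }
      pool.shutdown();
      if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
        throw new IllegalStateException("summary tasks did not finish in time");
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while summarizing", e);
    } finally {
      pool.shutdownNow(); // no-op after a clean shutdown
    }
    return tally;
  }

  public static void main(String[] args) {
    InputSummaryTally t = summarize(Arrays.asList(10L, 20L, 30L), 2);
    System.out.println(t.length() + " bytes in " + t.fileCount() + " files");
  }
}
```

With this shape there is no final loop over the paths; reading the totals costs one sum() per counter regardless of how many paths were processed.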



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-21227) HIVE-20776 causes view access regression

2019-02-06 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16762427#comment-16762427
 ] 

Hive QA commented on HIVE-21227:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  8m 24s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 32s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m  7s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  2m 30s{color} | {color:blue} standalone-metastore/metastore-common in master has 29 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 57s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 33s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 32s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 32s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m  7s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 35s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 55s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red}  0m 14s{color} | {color:red} The patch generated 1 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 17m 52s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /data/hiveptest/working/yetus_PreCommit-HIVE-Build-15982/dev-support/hive-personality.sh |
| git revision | master / 6508716 |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| asflicense | http://104.198.109.242/logs//PreCommit-HIVE-Build-15982/yetus/patch-asflicense-problems.txt |
| modules | C: standalone-metastore/metastore-common U: standalone-metastore/metastore-common |
| Console output | http://104.198.109.242/logs//PreCommit-HIVE-Build-15982/yetus.txt |
| Powered by | Apache Yetus http://yetus.apache.org |


This message was automatically generated.



> HIVE-20776 causes view access regression
> 
>
> Key: HIVE-21227
> URL: https://issues.apache.org/jira/browse/HIVE-21227
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 4.0.0
>Reporter: Na Li
>Assignee: Na Li
>Priority: Major
> Attachments: HIVE-21227.001.patch
>
>
> HIVE-20776 introduces a change that causes a regression for view access.
> Before the change, a user with select access to a view that is derived from a 
> partitioned table can get all columns of that view.
> With the change, that user cannot access the view.
> The reason is that:
> * When a user accesses columns of a view, Hive needs to get the partitions of 
> the table that the view is derived from. The user name is the user who issues 
> the query to access the view.
> * The change in HIVE-20776 checks whether the user has access to a table before 
> getting its partitions. When the user only has access to the view, not to 
> the table itself, this change denies the user access to the view. 
> The solution is to not filter on the table at the HMS client when getting 
> table partitions.





[jira] [Commented] (HIVE-21177) Optimize AcidUtils.getLogicalLength()

2019-02-06 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16762413#comment-16762413
 ] 

Hive QA commented on HIVE-21177:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12957848/HIVE-21177.03.patch

{color:green}SUCCESS:{color} +1 due to 3 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 15721 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.ql.parse.TestReplWithJsonMessageFormat.org.apache.hadoop.hive.ql.parse.TestReplWithJsonMessageFormat (batchId=244)
org.apache.hive.beeline.hs2connection.TestBeelineWithUserHs2ConnectionFile.testBeelineConnectionNoAuth (batchId=256)
org.apache.hive.service.auth.TestLdapAuthenticationProviderImpl.testAuthenticateWithBindInCredentialFilePasses (batchId=232)
{noformat}

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-Build/15981/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/15981/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-15981/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12957848 - PreCommit-HIVE-Build

> Optimize AcidUtils.getLogicalLength()
> -
>
> Key: HIVE-21177
> URL: https://issues.apache.org/jira/browse/HIVE-21177
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 3.0.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Major
> Attachments: HIVE-21177.01.patch, HIVE-21177.02.patch, 
> HIVE-21177.03.patch
>
>
> {{AcidUtils.getLogicalLength()}} tries to look for the side file 
> {{OrcAcidUtils.getSideFile()}} on the file system even when the file couldn't 
> possibly be there, e.g. when the path is delta_x_x or base_x.  It could only 
> be there in delta_x_y, x != y.
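
The rule in the description can be sketched as a name check that runs before any file system call. This is an illustration only, not the patch: the class SideFileCheck, the method mayHaveSideFile, and the assumed directory naming (delta_<x>_<y> with an optional numeric suffix) are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: decide from the directory name whether a side file lookup is needed. */
final class SideFileCheck {
  // Assumed naming: delta_<x>_<y>, optionally followed by one more numeric suffix.
  private static final Pattern DELTA =
      Pattern.compile("delta_(\\d{1,18})_(\\d{1,18})(?:_\\d+)?");

  static boolean mayHaveSideFile(String dirName) {
    if (dirName.startsWith("base_")) {
      return false; // base_x: a side file cannot be there
    }
    Matcher m = DELTA.matcher(dirName);
    if (m.matches()) {
      long x = Long.parseLong(m.group(1));
      long y = Long.parseLong(m.group(2));
      return x != y; // delta_x_x: skip the lookup; delta_x_y with x != y: look
    }
    // Unrecognized name: fall back to the lookup rather than guess (a design
    // choice of this sketch, since skipping is only an optimization).
    return true;
  }

  public static void main(String[] args) {
    System.out.println(mayHaveSideFile("delta_0000005_0000009"));
  }
}
```

A caller would only touch the file system when mayHaveSideFile returns true.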





[jira] [Commented] (HIVE-21177) Optimize AcidUtils.getLogicalLength()

2019-02-06 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16762401#comment-16762401
 ] 

Hive QA commented on HIVE-21177:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  8m 24s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  6s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 46s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  3m 57s{color} | {color:blue} ql in master has 2298 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 59s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 26s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  7s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m  7s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 47s{color} | {color:red} ql: The patch generated 8 new + 720 unchanged - 5 fixed = 728 total (was 725) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  4m 12s{color} | {color:red} ql generated 1 new + 2297 unchanged - 1 fixed = 2298 total (was 2298) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 59s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red}  0m 14s{color} | {color:red} The patch generated 1 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 24m 28s{color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| FindBugs | module:ql |
|  |  org.apache.hadoop.hive.ql.io.AcidUtils$ParsedDeltaLight defines compareTo(AcidUtils$ParsedDeltaLight) and uses Object.equals()  At AcidUtils.java:Object.equals()  At AcidUtils.java:[lines 915-943] |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /data/hiveptest/working/yetus_PreCommit-HIVE-Build-15981/dev-support/hive-personality.sh |
| git revision | master / 0e4d16b |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| checkstyle | http://104.198.109.242/logs//PreCommit-HIVE-Build-15981/yetus/diff-checkstyle-ql.txt |
| findbugs | http://104.198.109.242/logs//PreCommit-HIVE-Build-15981/yetus/new-findbugs-ql.html |
| asflicense | http://104.198.109.242/logs//PreCommit-HIVE-Build-15981/yetus/patch-asflicense-problems.txt |
| modules | C: ql U: ql |
| Console output | http://104.198.109.242/logs//PreCommit-HIVE-Build-15981/yetus.txt |
| Powered by | Apache Yetus http://yetus.apache.org |


This message was automatically generated.



> Optimize AcidUtils.getLogicalLength()
> -
>
> Key: HIVE-21177
> URL: https://issues.apache.org/jira/browse/HIVE-21177
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 3.0.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Major
> Attachments: HIVE-21177.01.patch, HIVE-21177.02.patch, 
> HIVE-21177.03.patch
>
>
> {{AcidUtils.getLogicalLength()}} tries to look for the side file 
> {{OrcAcidUtils.getSideFile()}} on the file system even when the file couldn't 
> possibly be there, e.g. when the path is delta_x_x or base_x.  It could only 
> be there in delta_x_y, x != y.





[jira] [Commented] (HIVE-21222) ACID: When there are no delete deltas skip finding min max keys

2019-02-06 Thread Eugene Koifman (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16762394#comment-16762394
 ] 

Eugene Koifman commented on HIVE-21222:
---

failure not related


> ACID: When there are no delete deltas skip finding min max keys
> ---
>
> Key: HIVE-21222
> URL: https://issues.apache.org/jira/browse/HIVE-21222
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 4.0.0, 3.2.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>Priority: Major
> Attachments: HIVE-21222.1.patch, HIVE-21222.2.patch
>
>
> We create an ORC reader in VectorizedOrcAcidRowBatchReader.findMinMaxKeys 
> (which will read the 16K footer) even for cases where delete deltas do not 
> exist.
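
The optimization described above amounts to an early return before the reader is created. The sketch below only illustrates that shape; MinMaxKeyGuard, KeyRange, and the Supplier standing in for the footer read are hypothetical and do not mirror the real signature of VectorizedOrcAcidRowBatchReader.findMinMaxKeys.

```java
import java.util.List;
import java.util.function.Supplier;

/** Hypothetical sketch: skip the footer read when there are no delete deltas. */
final class MinMaxKeyGuard {
  /** Placeholder key range; null bounds mean "no restriction". */
  static final class KeyRange {
    final Long min;
    final Long max;

    KeyRange(Long min, Long max) {
      this.min = min;
      this.max = max;
    }
  }

  static final KeyRange UNBOUNDED = new KeyRange(null, null);

  /**
   * @param deleteDeltaDirs delete delta directories that apply to the split
   * @param footerReader stands in for creating the reader and reading the footer
   */
  static KeyRange findMinMaxKeys(List<String> deleteDeltaDirs, Supplier<KeyRange> footerReader) {
    if (deleteDeltaDirs.isEmpty()) {
      return UNBOUNDED; // no delete events to filter, so the keys are not needed
    }
    return footerReader.get(); // the expensive path runs only when required
  }
}
```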





[jira] [Commented] (HIVE-21222) ACID: When there are no delete deltas skip finding min max keys

2019-02-06 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16762386#comment-16762386
 ] 

Hive QA commented on HIVE-21222:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12957841/HIVE-21222.2.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 15777 tests 
executed
*Failed tests:*
{noformat}
org.apache.hive.service.auth.TestLdapAuthenticationProviderImpl.testAuthenticateWithBindInCredentialFilePasses (batchId=232)
{noformat}

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-Build/15980/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/15980/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-15980/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12957841 - PreCommit-HIVE-Build

> ACID: When there are no delete deltas skip finding min max keys
> ---
>
> Key: HIVE-21222
> URL: https://issues.apache.org/jira/browse/HIVE-21222
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 4.0.0, 3.2.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>Priority: Major
> Attachments: HIVE-21222.1.patch, HIVE-21222.2.patch
>
>
> We create an ORC reader in VectorizedOrcAcidRowBatchReader.findMinMaxKeys 
> (which will read the 16K footer) even for cases where delete deltas do not 
> exist.





[jira] [Commented] (HIVE-21222) ACID: When there are no delete deltas skip finding min max keys

2019-02-06 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16762368#comment-16762368
 ] 

Hive QA commented on HIVE-21222:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  8m 24s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  4s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 42s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  3m 55s{color} | {color:blue} ql in master has 2298 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 56s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 28s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  5s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m  5s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 41s{color} | {color:red} ql: The patch generated 1 new + 278 unchanged - 0 fixed = 279 total (was 278) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m  9s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  0s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red}  0m 13s{color} | {color:red} The patch generated 1 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 24m  9s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /data/hiveptest/working/yetus_PreCommit-HIVE-Build-15980/dev-support/hive-personality.sh |
| git revision | master / 0e4d16b |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| checkstyle | http://104.198.109.242/logs//PreCommit-HIVE-Build-15980/yetus/diff-checkstyle-ql.txt |
| asflicense | http://104.198.109.242/logs//PreCommit-HIVE-Build-15980/yetus/patch-asflicense-problems.txt |
| modules | C: ql U: ql |
| Console output | http://104.198.109.242/logs//PreCommit-HIVE-Build-15980/yetus.txt |
| Powered by | Apache Yetus http://yetus.apache.org |


This message was automatically generated.



> ACID: When there are no delete deltas skip finding min max keys
> ---
>
> Key: HIVE-21222
> URL: https://issues.apache.org/jira/browse/HIVE-21222
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 4.0.0, 3.2.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>Priority: Major
> Attachments: HIVE-21222.1.patch, HIVE-21222.2.patch
>
>
> We create an ORC reader in VectorizedOrcAcidRowBatchReader.findMinMaxKeys 
> (which will read the 16K footer) even for cases where delete deltas do not 
> exist.





[jira] [Commented] (HIVE-20841) LLAP: Make dynamic ports configurable

2019-02-06 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16762347#comment-16762347
 ] 

Hive QA commented on HIVE-20841:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12957842/HIVE-20841.2.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 15777 tests 
executed
*Failed tests:*
{noformat}
org.apache.hive.service.auth.TestLdapAuthenticationProviderImpl.testAuthenticateWithBindInCredentialFilePasses (batchId=232)
{noformat}

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-Build/15979/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/15979/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-15979/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12957842 - PreCommit-HIVE-Build

> LLAP: Make dynamic ports configurable
> -
>
> Key: HIVE-20841
> URL: https://issues.apache.org/jira/browse/HIVE-20841
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Affects Versions: 4.0.0, 3.2.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>Priority: Major
> Fix For: 4.0.0, 3.2.0
>
> Attachments: HIVE-20841.1.patch, HIVE-20841.2.patch
>
>
> Some ports in the llap -> tez interaction code are assigned dynamically. 
> Provide an option to make them configurable, to facilitate adding them to 
> iptables rules in some environments. 
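
One way such an option could work is to accept a port range and bind to the first free port in it, keeping "0" as the OS-assigned default. This is a sketch under that assumption; the "min-max" format and the class PortRangeBinder are hypothetical and are not taken from the patch or from any existing Hive property.

```java
import java.io.IOException;
import java.net.ServerSocket;

/** Hypothetical sketch: bind a server socket inside a configured port range. */
final class PortRangeBinder {
  /** Parses "min-max" or a single port; "0" keeps the OS-assigned dynamic port. */
  static int[] parseRange(String spec) {
    String[] parts = spec.trim().split("-");
    int lo = Integer.parseInt(parts[0].trim());
    int hi = parts.length > 1 ? Integer.parseInt(parts[1].trim()) : lo;
    if (parts.length > 2 || lo < 0 || hi > 65535 || lo > hi) {
      throw new IllegalArgumentException("bad port range: " + spec);
    }
    return new int[] {lo, hi};
  }

  /** Tries each port in the range in order and returns the first socket that binds. */
  static ServerSocket bindInRange(String spec) throws IOException {
    int[] range = parseRange(spec);
    IOException last = null;
    for (int port = range[0]; port <= range[1]; port++) {
      try {
        return new ServerSocket(port);
      } catch (IOException e) {
        last = e; // most likely the port is in use; try the next one
      }
    }
    throw new IOException("no free port in range " + spec, last);
  }

  public static void main(String[] args) throws IOException {
    try (ServerSocket socket = bindInRange("0")) {
      System.out.println("bound to port " + socket.getLocalPort());
    }
  }
}
```

A fixed range such as "15000-15010" is what makes a firewall rule practical, since the rule can name the range up front.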





[jira] [Updated] (HIVE-21210) CombineHiveInputFormat Thread Pool Sizing

2019-02-06 Thread BELUGA BEHR (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BELUGA BEHR updated HIVE-21210:
---
Attachment: HIVE-21210.8.patch

> CombineHiveInputFormat Thread Pool Sizing
> -
>
> Key: HIVE-21210
> URL: https://issues.apache.org/jira/browse/HIVE-21210
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 4.0.0, 3.2.0
>Reporter: BELUGA BEHR
>Assignee: BELUGA BEHR
>Priority: Major
> Attachments: HIVE-21210.1.patch, HIVE-21210.2.patch, 
> HIVE-21210.3.patch, HIVE-21210.4.patch, HIVE-21210.5.patch, 
> HIVE-21210.6.patch, HIVE-21210.7.patch, HIVE-21210.8.patch
>
>
> Threadpools.
> Hive uses threadpools in several different places and each implementation is 
> a little different and requires different configurations. I think that Hive 
> needs to rein in and standardize the way that threadpools are used, and 
> threadpools should scale automatically without manual configuration. At any 
> given time, there are many hundreds of threads running in the HS2 as the 
> number of simultaneous connections increases, and they surely cause contention 
> with one another.
> Here is an example:
> {code:java|title=CombineHiveInputFormat.java}
>   // max number of threads we can use to check non-combinable paths
>   private static final int MAX_CHECK_NONCOMBINABLE_THREAD_NUM = 50;
>   private static final int DEFAULT_NUM_PATH_PER_THREAD = 100;
> {code}
> When building the splits for an MR job, there are up to 50 threads running per 
> query, and there is not much scaling here: it is simply a ratio of 1 thread 
> per 100 files.  This implies that 5000 files are processed by 50 threads, and 
> beyond that, 50 threads are still used. Many Hive jobs these days involve more 
> than 5000 files, so this does not scale well to larger inputs.
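
The arithmetic behind that paragraph can be sketched as follows. The two constants are the ones quoted above; the class NonCombinableThreadMath, the method threadsFor, and the exact rounding are assumptions for illustration rather than the actual CombineHiveInputFormat code.

```java
/** Hypothetical sketch of the 1 thread : 100 files ratio with a cap of 50 threads. */
final class NonCombinableThreadMath {
  static final int MAX_CHECK_NONCOMBINABLE_THREAD_NUM = 50;
  static final int DEFAULT_NUM_PATH_PER_THREAD = 100;

  /** Threads used for the given number of paths: ceil(paths / 100), capped at 50. */
  static int threadsFor(int numPaths) {
    int wanted =
        (numPaths + DEFAULT_NUM_PATH_PER_THREAD - 1) / DEFAULT_NUM_PATH_PER_THREAD;
    return Math.min(MAX_CHECK_NONCOMBINABLE_THREAD_NUM, wanted);
  }

  public static void main(String[] args) {
    for (int paths : new int[] {100, 101, 5000, 50000}) {
      System.out.println(paths + " paths -> " + threadsFor(paths) + " threads");
    }
  }
}
```

Under these assumptions the cap is reached at 5000 paths, and any larger input still gets 50 threads.
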
> This is not configurable (even manually), it doesn't change when the hardware 
> specs increase, and 50 threads seems like a lot when a service must support 
> up to 80 connections:
> [https://www.cloudera.com/documentation/enterprise/5/latest/topics/admin_hive_tuning.html]
> Not to mention, I have never seen a scenario where HS2 is running on a host 
> all by itself and has the entire system dedicated to it. Therefore it should 
> be more friendly and spin up fewer threads.
> I am attaching a patch here that provides a few features:
>  * Common module that produces an {{ExecutorService}} which caps the number of 
> threads it spins up at the number of processors a host has. Keep in mind that 
> a class may submit as many work units ({{Callables}}) as it would like, but 
> the number of threads in the pool is capped.
>  * Common module for partitioning work. That is, allow for a generic 
> framework for dividing work into partitions (i.e. batches)
>  * Modify {{CombineHiveInputFormat}} to take advantage of both modules, 
> performing its same duties in a more Java OO way than is currently implemented
>  * Add a partitioning (batching) implementation that enforces partitioning of 
> a {{Collection}} based on the natural log of the {{Collection}} size so that 
> it scales more slowly than a simple 1:100 ratio.
>  * Simplify unit test code for {{CombineHiveInputFormat}}
> My hope is to introduce these tools to {{CombineHiveInputFormat}} and then to 
> drop them into other places.  One of the things I will introduce here is a 
> "direct thread" {{ExecutorService}} so that even if there is a configuration 
> for a thread pool to be disabled, it will still use an {{ExecutorService}} so 
> that the project can avoid logic like "if this function is serviced by a 
> thread pool, use an {{ExecutorService}} (and remember to close it later!) 
> otherwise, create a single thread" so that things like [HIVE-16949] can be 
> avoided in the future.  Everything will just use an {{ExecutorService}}.
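
The two executor ideas in the description (a pool capped at the processor count, and a "direct thread" executor for when pooling is disabled) could look roughly like this. It is a sketch built only on java.util.concurrent; the class PoolSketch, its method names, and the 30-second idle timeout are assumptions, not the contents of the attached patches.

```java
import java.util.Collections;
import java.util.List;
import java.util.concurrent.AbstractExecutorService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch of a capped pool and a caller-thread executor. */
final class PoolSketch {
  /** Pool capped at the host's processor count; extra work waits in the queue. */
  static ExecutorService cappedPool() {
    int cap = Runtime.getRuntime().availableProcessors();
    ThreadPoolExecutor pool =
        new ThreadPoolExecutor(cap, cap, 30L, TimeUnit.SECONDS, new LinkedBlockingQueue<>());
    pool.allowCoreThreadTimeOut(true); // idle threads exit instead of lingering
    return pool;
  }

  /** "Direct thread" executor: every task runs on the thread that submits it. */
  static ExecutorService directExecutor() {
    return new AbstractExecutorService() {
      private volatile boolean stopped;

      @Override public void execute(Runnable task) {
        if (stopped) {
          throw new RejectedExecutionException("executor is shut down");
        }
        task.run();
      }

      @Override public void shutdown() { stopped = true; }

      @Override public List<Runnable> shutdownNow() {
        stopped = true;
        return Collections.emptyList(); // nothing is ever queued
      }

      @Override public boolean isShutdown() { return stopped; }

      @Override public boolean isTerminated() { return stopped; }

      @Override public boolean awaitTermination(long timeout, TimeUnit unit) {
        return stopped; // tasks finish inside execute(), so there is nothing to wait for
      }
    };
  }
}
```

Because both methods return an ExecutorService, calling code can submit work and shut the executor down the same way whether or not a real pool sits behind it.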





[jira] [Updated] (HIVE-21210) CombineHiveInputFormat Thread Pool Sizing

2019-02-06 Thread BELUGA BEHR (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BELUGA BEHR updated HIVE-21210:
---
Status: Patch Available  (was: Open)

> CombineHiveInputFormat Thread Pool Sizing
> -
>
> Key: HIVE-21210
> URL: https://issues.apache.org/jira/browse/HIVE-21210
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 4.0.0, 3.2.0
>Reporter: BELUGA BEHR
>Assignee: BELUGA BEHR
>Priority: Major
> Attachments: HIVE-21210.1.patch, HIVE-21210.2.patch, 
> HIVE-21210.3.patch, HIVE-21210.4.patch, HIVE-21210.5.patch, 
> HIVE-21210.6.patch, HIVE-21210.7.patch, HIVE-21210.8.patch
>
>
> Threadpools.
> Hive uses threadpools in several different places and each implementation is 
> a little different and requires different configurations. I think that Hive 
> needs to rein in and standardize the way that threadpools are used, and 
> threadpools should scale automatically without manual configuration. At any 
> given time, there are many hundreds of threads running in the HS2 as the 
> number of simultaneous connections increases, and they surely cause contention 
> with one another.
> Here is an example:
> {code:java|title=CombineHiveInputFormat.java}
>   // max number of threads we can use to check non-combinable paths
>   private static final int MAX_CHECK_NONCOMBINABLE_THREAD_NUM = 50;
>   private static final int DEFAULT_NUM_PATH_PER_THREAD = 100;
> {code}
> When building the splits for an MR job, there are up to 50 threads running per 
> query, and there is not much scaling here: it is simply a ratio of 1 thread 
> per 100 files.  This implies that 5000 files are processed by 50 threads, and 
> beyond that, 50 threads are still used. Many Hive jobs these days involve more 
> than 5000 files, so this does not scale well to larger inputs.
> This is not configurable (even manually), it doesn't change when the hardware 
> specs increase, and 50 threads seems like a lot when a service must support 
> up to 80 connections:
> [https://www.cloudera.com/documentation/enterprise/5/latest/topics/admin_hive_tuning.html]
> Not to mention, I have never seen a scenario where HS2 is running on a host 
> all by itself and has the entire system dedicated to it. Therefore it should 
> be more friendly and spin up fewer threads.
> I am attaching a patch here that provides a few features:
>  * Common module that produces an {{ExecutorService}} which caps the number of 
> threads it spins up at the number of processors a host has. Keep in mind that 
> a class may submit as many work units ({{Callables}}) as it would like, but 
> the number of threads in the pool is capped.
>  * Common module for partitioning work. That is, allow for a generic 
> framework for dividing work into partitions (i.e. batches)
>  * Modify {{CombineHiveInputFormat}} to take advantage of both modules, 
> performing its same duties in a more Java OO way than is currently implemented
>  * Add a partitioning (batching) implementation that enforces partitioning of 
> a {{Collection}} based on the natural log of the {{Collection}} size so that 
> it scales more slowly than a simple 1:100 ratio.
>  * Simplify unit test code for {{CombineHiveInputFormat}}
> My hope is to introduce these tools to {{CombineHiveInputFormat}} and then to 
> drop them into other places.  One of the things I will introduce here is a 
> "direct thread" {{ExecutorService}} so that even if there is a configuration 
> for a thread pool to be disabled, it will still use an {{ExecutorService}} so 
> that the project can avoid logic like "if this function is serviced by a 
> thread pool, use an {{ExecutorService}} (and remember to close it later!) 
> otherwise, create a single thread" so that things like [HIVE-16949] can be 
> avoided in the future.  Everything will just use an {{ExecutorService}}.
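
The description mentions batching a {{Collection}} based on the natural log of its size. The exact rule in the patch is not given here, so the sketch below assumes one possible reading: the number of batches is ceil(ln(n)), with a minimum of 1. The class LogPartitioner and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: split a list into roughly ceil(ln(n)) batches. */
final class LogPartitioner {
  /** Assumed rule for this sketch: batch count = max(1, ceil(ln(size))). */
  static int batchCount(int size) {
    if (size <= 1) {
      return 1;
    }
    return Math.max(1, (int) Math.ceil(Math.log(size)));
  }

  /** Splits items into consecutive batches of near-equal size. */
  static <T> List<List<T>> partition(List<T> items) {
    List<List<T>> batches = new ArrayList<>();
    if (items.isEmpty()) {
      return batches;
    }
    int count = batchCount(items.size());
    int batchSize = (items.size() + count - 1) / count; // ceiling division
    for (int start = 0; start < items.size(); start += batchSize) {
      int end = Math.min(start + batchSize, items.size());
      batches.add(new ArrayList<>(items.subList(start, end)));
    }
    return batches;
  }

  public static void main(String[] args) {
    System.out.println("batches for 5000 items: " + batchCount(5000));
  }
}
```

Under this assumed rule, 5000 paths give 9 batches, where the 1:100 ratio would ask for 50 threads; the batch count grows with ln(n) rather than linearly.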





[jira] [Updated] (HIVE-21210) CombineHiveInputFormat Thread Pool Sizing

2019-02-06 Thread BELUGA BEHR (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BELUGA BEHR updated HIVE-21210:
---
Status: Open  (was: Patch Available)

> CombineHiveInputFormat Thread Pool Sizing
> -
>
> Key: HIVE-21210
> URL: https://issues.apache.org/jira/browse/HIVE-21210
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 4.0.0, 3.2.0
>Reporter: BELUGA BEHR
>Assignee: BELUGA BEHR
>Priority: Major
> Attachments: HIVE-21210.1.patch, HIVE-21210.2.patch, 
> HIVE-21210.3.patch, HIVE-21210.4.patch, HIVE-21210.5.patch, 
> HIVE-21210.6.patch, HIVE-21210.7.patch, HIVE-21210.8.patch
>
>
> Threadpools.
> Hive uses threadpools in several different places and each implementation is 
> a little different and requires different configurations. I think that Hive 
> needs to rein in and standardize the way that threadpools are used, and 
> threadpools should scale automatically without manual configuration. At any 
> given time, there are many hundreds of threads running in the HS2 as the 
> number of simultaneous connections increases, and they surely cause contention 
> with one another.
> Here is an example:
> {code:java|title=CombineHiveInputFormat.java}
>   // max number of threads we can use to check non-combinable paths
>   private static final int MAX_CHECK_NONCOMBINABLE_THREAD_NUM = 50;
>   private static final int DEFAULT_NUM_PATH_PER_THREAD = 100;
> {code}
> When building the splits for a MR job, there are up to 50 threads running per 
> query and there is not much scaling here: it is simply a ratio of 1 thread 
> per 100 files.  This implies that processing 5000 files takes 50 threads and, 
> beyond that, 50 threads are still used. Many Hive jobs these days involve more 
> than 5000 files, so it does not scale well at bigger sizes.
> This is not configurable (even manually), it doesn't change when the hardware 
> specs increase, and 50 threads seems like a lot when a service must support 
> up to 80 connections:
> [https://www.cloudera.com/documentation/enterprise/5/latest/topics/admin_hive_tuning.html]
> Not to mention, I have never seen a scenario where HS2 is running on a host 
> all by itself and has the entire system dedicated to it. Therefore it should 
> be more friendly and spin up fewer threads.
> I am attaching a patch here that provides a few features:
>  * Common module that produces an {{ExecutorService}} which caps the number of 
> threads it spins up at the number of processors a host has. Keep in mind that 
> a class may submit as many work units ({{Callables}}) as it would like, but 
> the number of threads in the pool is capped.
>  * Common module for partitioning work. That is, allow for a generic 
> framework for dividing work into partitions (i.e. batches)
>  * Modify {{CombineHiveInputFormat}} to take advantage of both modules, 
> performing its same duties in a more Java OO way than is currently implemented
>  * Add a partitioning (batching) implementation that enforces partitioning of 
> a {{Collection}} based on the natural log of the {{Collection}} size so that 
> it scales more slowly than a simple 1:100 ratio.
>  * Simplify unit test code for {{CombineHiveInputFormat}}
> My hope is to introduce these tools to {{CombineHiveInputFormat}} and then to 
> drop it into other places.  One of the things I will introduce here is a 
> "direct thread" {{ExecutorService}} so that even if there is a configuration 
> for a thread pool to be disabled, it will still use an {{ExecutorService}} so 
> that the project can avoid logic like "if this function is serviced by a 
> thread pool, use an {{ExecutorService}} (and remember to close it later!) 
> otherwise, create a single thread" so that things like [HIVE-16949] can be 
> avoided in the future.  Everything will just use an {{ExecutorService}}.
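
For illustration only, the sizing ideas in the description above could be sketched in Java roughly as follows. This is not the attached patch. The class and method names ({{PoolSizingSketch}}, {{legacyThreads}}, {{logBatchCount}}, {{partition}}, {{cappedPool}}, {{DirectExecutorService}}, {{countAll}}) are hypothetical, and the rule of "about ln(n) batches" is only an assumption about what partitioning "based on the natural log" means. The two constants are the ones quoted from {{CombineHiveInputFormat}}.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.AbstractExecutorService;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.TimeUnit;

class PoolSizingSketch {

  // Constants quoted in the issue description.
  static final int MAX_CHECK_NONCOMBINABLE_THREAD_NUM = 50;
  static final int DEFAULT_NUM_PATH_PER_THREAD = 100;

  // Behaviour as described: one thread per 100 paths, never more than 50.
  static int legacyThreads(int numPaths) {
    int wanted = (numPaths + DEFAULT_NUM_PATH_PER_THREAD - 1) / DEFAULT_NUM_PATH_PER_THREAD;
    return Math.min(MAX_CHECK_NONCOMBINABLE_THREAD_NUM, wanted);
  }

  // ASSUMPTION: about ln(n) batches, and always at least one.
  static int logBatchCount(int numItems) {
    return numItems <= 1 ? 1 : Math.max(1, (int) Math.ceil(Math.log(numItems)));
  }

  // Split items into at most `batches` contiguous batches of near-equal size.
  static <T> List<List<T>> partition(List<T> items, int batches) {
    List<List<T>> out = new ArrayList<>();
    if (items.isEmpty()) {
      return out;
    }
    int size = (items.size() + batches - 1) / batches; // ceiling division
    for (int i = 0; i < items.size(); i += size) {
      out.add(new ArrayList<>(items.subList(i, Math.min(items.size(), i + size))));
    }
    return out;
  }

  // Pool capped at the processor count; callers may still queue any number of tasks.
  static ExecutorService cappedPool() {
    return Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
  }

  // "Direct thread" executor: runs each task on the calling thread.
  static final class DirectExecutorService extends AbstractExecutorService {
    private volatile boolean shutdown;

    @Override public void execute(Runnable command) {
      if (shutdown) {
        throw new RejectedExecutionException("executor has been shut down");
      }
      command.run();
    }
    @Override public void shutdown() { shutdown = true; }
    @Override public List<Runnable> shutdownNow() { shutdown = true; return Collections.emptyList(); }
    @Override public boolean isShutdown() { return shutdown; }
    // Simplification: tasks run synchronously, so "shut down" is treated as "terminated".
    @Override public boolean isTerminated() { return shutdown; }
    @Override public boolean awaitTermination(long timeout, TimeUnit unit) { return shutdown; }
  }

  // The calling code is the same whether or not a real pool is configured.
  static int countAll(List<Integer> items, ExecutorService pool) throws Exception {
    try {
      List<Future<Integer>> futures = new ArrayList<>();
      for (List<Integer> batch : partition(items, logBatchCount(items.size()))) {
        Callable<Integer> task = batch::size;
        futures.add(pool.submit(task));
      }
      int total = 0;
      for (Future<Integer> f : futures) {
        total += f.get();
      }
      return total;
    } finally {
      pool.shutdown(); // always close the executor, direct or pooled
    }
  }

  public static void main(String[] args) throws Exception {
    List<Integer> paths = new ArrayList<>(Collections.nCopies(5000, 0));
    System.out.println("legacy threads: " + legacyThreads(paths.size()));
    System.out.println("log batches:    " + logBatchCount(paths.size()));
    System.out.println("direct total:   " + countAll(paths, new DirectExecutorService()));
    System.out.println("pooled total:   " + countAll(paths, cappedPool()));
  }
}
```

Under these assumptions, 5000 paths would use 50 threads with the 1:100 ratio but only 9 batches with the logarithmic rule, and {{countAll}} does not need to know whether it was handed a real pool or the direct executor.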





[jira] [Commented] (HIVE-20841) LLAP: Make dynamic ports configurable

2019-02-06 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16762328#comment-16762328
 ] 

Hive QA commented on HIVE-20841:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m 41s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  7m  7s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 46s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 37s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m 33s{color} | {color:blue} common in master has 65 extant Findbugs warnings. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m 23s{color} | {color:blue} llap-client in master has 26 extant Findbugs warnings. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m 25s{color} | {color:blue} llap-tez in master has 17 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 36s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 29s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 50s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 47s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 47s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 12s{color} | {color:red} llap-tez: The patch generated 2 new + 110 unchanged - 1 fixed = 112 total (was 111) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 45s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 35s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red}  0m 14s{color} | {color:red} The patch generated 1 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 18m 17s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /data/hiveptest/working/yetus_PreCommit-HIVE-Build-15979/dev-support/hive-personality.sh |
| git revision | master / 0e4d16b |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| checkstyle | http://104.198.109.242/logs//PreCommit-HIVE-Build-15979/yetus/diff-checkstyle-llap-tez.txt |
| asflicense | http://104.198.109.242/logs//PreCommit-HIVE-Build-15979/yetus/patch-asflicense-problems.txt |
| modules | C: common llap-client llap-tez U: . |
| Console output | http://104.198.109.242/logs//PreCommit-HIVE-Build-15979/yetus.txt |
| Powered by | Apache Yetus http://yetus.apache.org |


This message was automatically generated.



> LLAP: Make dynamic ports configurable
> -
>
> Key: HIVE-20841
> URL: https://issues.apache.org/jira/browse/HIVE-20841
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Affects Versions: 4.0.0, 3.2.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>Priority: Major
> Fix For: 4.0.0, 3.2.0
>
> Attachments: HIVE-20841.1.patch, HIVE-20841.2.patch
>
>
> Some ports in the llap -> tez interaction code are assigned dynamically. 
> Provide an option to make them configurable so that they can be added to 
> iptables rules in some environments. 





[jira] [Updated] (HIVE-21227) HIVE-20776 causes view access regression

2019-02-06 Thread Na Li (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Na Li updated HIVE-21227:
-
Status: Patch Available  (was: Open)

> HIVE-20776 causes view access regression
> 
>
> Key: HIVE-21227
> URL: https://issues.apache.org/jira/browse/HIVE-21227
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 4.0.0
>Reporter: Na Li
>Assignee: Na Li
>Priority: Major
> Attachments: HIVE-21227.001.patch
>
>
> HIVE-20776 introduces a change that causes a regression in view access.
> Before the change, a user with select access to a view derived from a 
> partitioned table could get all columns of that view.
> With the change, that user cannot access the view.
> The reason is that:
> * When a user accesses columns of a view, Hive needs to get the partitions of 
> the table that the view is derived from. The user name is that of the user who 
> issues the query to access the view.
> * The change in HIVE-20776 checks whether the user has access to a table before 
> getting its partitions. When the user only has access to a view, and not to 
> the table itself, this change denies the user access to the view. 
> The solution is to not filter on the table at the HMS client when getting 
> table partitions.





[jira] [Updated] (HIVE-21227) HIVE-20776 causes view access regression

2019-02-06 Thread Na Li (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Na Li updated HIVE-21227:
-
Attachment: HIVE-21227.001.patch

> HIVE-20776 causes view access regression
> 
>
> Key: HIVE-21227
> URL: https://issues.apache.org/jira/browse/HIVE-21227
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 4.0.0
>Reporter: Na Li
>Assignee: Na Li
>Priority: Major
> Attachments: HIVE-21227.001.patch
>
>
> HIVE-20776 introduces a change that causes a regression in view access.
> Before the change, a user with select access to a view derived from a 
> partitioned table could get all columns of that view.
> With the change, that user cannot access the view.
> The reason is that:
> * When a user accesses columns of a view, Hive needs to get the partitions of 
> the table that the view is derived from. The user name is that of the user who 
> issues the query to access the view.
> * The change in HIVE-20776 checks whether the user has access to a table before 
> getting its partitions. When the user only has access to a view, and not to 
> the table itself, this change denies the user access to the view. 
> The solution is to not filter on the table at the HMS client when getting 
> table partitions.





[jira] [Updated] (HIVE-21227) HIVE-20776 causes view access regression

2019-02-06 Thread Na Li (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Na Li updated HIVE-21227:
-
Description: 
HIVE-20776 introduces a change that causes a regression in view access.

Before the change, a user with select access to a view derived from a 
partitioned table could get all columns of that view.

With the change, that user cannot access the view.

The reason is that:
* When a user accesses columns of a view, Hive needs to get the partitions of the 
table that the view is derived from. The user name is that of the user who issues 
the query to access the view.
* The change in HIVE-20776 checks whether the user has access to a table before 
getting its partitions. When the user only has access to a view, and not to the 
table itself, this change denies the user access to the view. 

The solution is to not filter on the table at the HMS client when getting table 
partitions.

  was:
HIVE-20776 introduces a change that causes a regression in view access.

Before the change, a user with select access to a view derived from a 
partitioned table could get all columns of that view.

With the change, that user cannot access the view.

The reason is that:
* When a user accesses columns of a view, Hive needs to get the partitions of the 
table that the view is derived from. The user name is that of the user who issues 
the query to access the view.
* The change in HIVE-20776 checks whether the user has access to a table before 
getting its partitions. When the user only has access to a view, and not to the 
table itself, this change denies the user access to the view. 


> HIVE-20776 causes view access regression
> 
>
> Key: HIVE-21227
> URL: https://issues.apache.org/jira/browse/HIVE-21227
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 4.0.0
>Reporter: Na Li
>Assignee: Na Li
>Priority: Major
>
> HIVE-20776 introduces a change that causes a regression in view access.
> Before the change, a user with select access to a view derived from a 
> partitioned table could get all columns of that view.
> With the change, that user cannot access the view.
> The reason is that:
> * When a user accesses columns of a view, Hive needs to get the partitions of 
> the table that the view is derived from. The user name is that of the user who 
> issues the query to access the view.
> * The change in HIVE-20776 checks whether the user has access to a table before 
> getting its partitions. When the user only has access to a view, and not to 
> the table itself, this change denies the user access to the view. 
> The solution is to not filter on the table at the HMS client when getting 
> table partitions.





[jira] [Commented] (HIVE-21103) PartitionManagementTask should not modify DN configs to avoid closing persistence manager

2019-02-06 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16762322#comment-16762322
 ] 

Hive QA commented on HIVE-21103:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12957834/HIVE-21103.3.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 15777 tests 
executed
*Failed tests:*
{noformat}
org.apache.hive.service.auth.TestLdapAuthenticationProviderImpl.testAuthenticateWithBindInCredentialFilePasses (batchId=232)
{noformat}

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-Build/15978/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/15978/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-15978/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12957834 - PreCommit-HIVE-Build

> PartitionManagementTask should not modify DN configs to avoid closing 
> persistence manager
> -
>
> Key: HIVE-21103
> URL: https://issues.apache.org/jira/browse/HIVE-21103
> Project: Hive
>  Issue Type: Bug
>  Components: Standalone Metastore
>Affects Versions: 4.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>Priority: Critical
> Attachments: HIVE-21103.1.patch, HIVE-21103.2.patch, 
> HIVE-21103.3.patch
>
>
> HIVE-20707 added automatic partition management which uses thread pools to 
> run parallel msck repair. It also modifies the datanucleus connection pool size 
> to avoid an explosion of connections to the backend database. But the object store 
> closes the persistence manager when it detects a change in datanucleus or jdo 
> configs. So when PartitionManagementTask is running and HS2 tries to 
> connect to the metastore, HS2 will get a persistence manager close exception. 





[jira] [Assigned] (HIVE-21227) HIVE-20776 causes view access regression

2019-02-06 Thread Na Li (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Na Li reassigned HIVE-21227:



> HIVE-20776 causes view access regression
> 
>
> Key: HIVE-21227
> URL: https://issues.apache.org/jira/browse/HIVE-21227
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 4.0.0
>Reporter: Na Li
>Assignee: Na Li
>Priority: Major
>
> HIVE-20776 introduces a change that causes a regression in view access.
> Before the change, a user with select access to a view derived from a 
> partitioned table could get all columns of that view.
> With the change, that user cannot access the view.
> The reason is that:
> * When a user accesses columns of a view, Hive needs to get the partitions of 
> the table that the view is derived from. The user name is that of the user who 
> issues the query to access the view.
> * The change in HIVE-20776 checks whether the user has access to a table before 
> getting its partitions. When the user only has access to a view, and not to 
> the table itself, this change denies the user access to the view. 





[jira] [Commented] (HIVE-21177) Optimize AcidUtils.getLogicalLength()

2019-02-06 Thread Eugene Koifman (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16762302#comment-16762302
 ] 

Eugene Koifman commented on HIVE-21177:
---

patch 3 - some more refactoring to use Path rather than FileStatus


> Optimize AcidUtils.getLogicalLength()
> -
>
> Key: HIVE-21177
> URL: https://issues.apache.org/jira/browse/HIVE-21177
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 3.0.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Major
> Attachments: HIVE-21177.01.patch, HIVE-21177.02.patch, 
> HIVE-21177.03.patch
>
>
> {{AcidUtils.getLogicalLength()}} tries to look for the side file 
> ({{OrcAcidUtils.getSideFile()}}) on the file system even when the file couldn't 
> possibly be there, e.g. when the path is delta_x_x or base_x.  It could only 
> be there in delta_x_y, x != y.
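
As a rough illustration of the check described above (not the code in the attached patches), a guard could inspect the directory name before touching the file system. The class name {{SideFileCheckSketch}}, the method {{mayHaveSideFile}}, and the assumed name shape (delta_<x>_<y>, optionally followed by one more numeric suffix) are all hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SideFileCheckSketch {

  // ASSUMPTION: delta directories are named delta_<x>_<y>, optionally with a numeric suffix.
  private static final Pattern DELTA = Pattern.compile("delta_(\\d+)_(\\d+)(?:_\\d+)?");

  // Returns true only when a side file could exist, i.e. delta_x_y with x != y.
  static boolean mayHaveSideFile(String dirName) {
    Matcher m = DELTA.matcher(dirName);
    if (!m.matches()) {
      return false; // base_x and any other name: no side file expected
    }
    return Long.parseLong(m.group(1)) != Long.parseLong(m.group(2));
  }

  public static void main(String[] args) {
    System.out.println(mayHaveSideFile("delta_0000005_0000005"));
    System.out.println(mayHaveSideFile("base_0000010"));
    System.out.println(mayHaveSideFile("delta_0000005_0000009"));
  }
}
```

With a guard like this, the file system lookup would only be attempted for the one directory shape where the description says the side file can exist.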





[jira] [Commented] (HIVE-21177) Optimize AcidUtils.getLogicalLength()

2019-02-06 Thread Prasanth Jayachandran (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16762308#comment-16762308
 ] 

Prasanth Jayachandran commented on HIVE-21177:
--

lgtm, +1. Pending tests. 

> Optimize AcidUtils.getLogicalLength()
> -
>
> Key: HIVE-21177
> URL: https://issues.apache.org/jira/browse/HIVE-21177
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 3.0.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Major
> Attachments: HIVE-21177.01.patch, HIVE-21177.02.patch, 
> HIVE-21177.03.patch
>
>
> {{AcidUtils.getLogicalLength()}} tries to look for the side file 
> ({{OrcAcidUtils.getSideFile()}}) on the file system even when the file couldn't 
> possibly be there, e.g. when the path is delta_x_x or base_x.  It could only 
> be there in delta_x_y, x != y.





[jira] [Commented] (HIVE-21103) PartitionManagementTask should not modify DN configs to avoid closing persistence manager

2019-02-06 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16762305#comment-16762305
 ] 

Hive QA commented on HIVE-21103:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 45s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  7m 52s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 58s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 14s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  2m 27s{color} | {color:blue} standalone-metastore/metastore-common in master has 29 extant Findbugs warnings. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  1m  4s{color} | {color:blue} standalone-metastore/metastore-server in master has 184 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 15s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 10s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m  5s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 58s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 58s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 14s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 56s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 18s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red}  0m 13s{color} | {color:red} The patch generated 1 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 23m 10s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /data/hiveptest/working/yetus_PreCommit-HIVE-Build-15978/dev-support/hive-personality.sh |
| git revision | master / 0e4d16b |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| asflicense | http://104.198.109.242/logs//PreCommit-HIVE-Build-15978/yetus/patch-asflicense-problems.txt |
| modules | C: standalone-metastore/metastore-common standalone-metastore/metastore-server U: standalone-metastore |
| Console output | http://104.198.109.242/logs//PreCommit-HIVE-Build-15978/yetus.txt |
| Powered by | Apache Yetus http://yetus.apache.org |


This message was automatically generated.



> PartitionManagementTask should not modify DN configs to avoid closing 
> persistence manager
> -
>
> Key: HIVE-21103
> URL: https://issues.apache.org/jira/browse/HIVE-21103
> Project: Hive
>  Issue Type: Bug
>  Components: Standalone Metastore
>Affects Versions: 4.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>Priority: Critical
> Attachments: HIVE-21103.1.patch, HIVE-21103.2.patch, 
> HIVE-21103.3.patch
>
>
> HIVE-20707 added automatic partition management which uses thread pools to 
> run parallel msck repair. It also modifies datanucleus connection pool size 
> to avoid explosion of connections to backend database. But object store 
> closes the persistence manager when it detects a change in datanucleus or jdo 
> configs. So when PartitionManagementTask is running and when HS2 tries 

[jira] [Updated] (HIVE-21177) Optimize AcidUtils.getLogicalLength()

2019-02-06 Thread Eugene Koifman (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-21177:
--
Attachment: HIVE-21177.03.patch

> Optimize AcidUtils.getLogicalLength()
> -
>
> Key: HIVE-21177
> URL: https://issues.apache.org/jira/browse/HIVE-21177
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 3.0.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Major
> Attachments: HIVE-21177.01.patch, HIVE-21177.02.patch, 
> HIVE-21177.03.patch
>
>
> {{AcidUtils.getLogicalLength()}} tries to look for the side file 
> ({{OrcAcidUtils.getSideFile()}}) on the file system even when the file couldn't 
> possibly be there, e.g. when the path is delta_x_x or base_x.  It could only 
> be there in delta_x_y, x != y.





[jira] [Commented] (HIVE-21224) Upgrade tests JUnit3 to JUnit4

2019-02-06 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16762287#comment-16762287
 ] 

Hive QA commented on HIVE-21224:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  1s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m 39s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  7m  0s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  5m 38s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  3m 30s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m 33s{color} | {color:blue} common in master has 65 extant Findbugs warnings. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m 40s{color} | {color:blue} serde in master has 197 extant Findbugs warnings. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  4m  2s{color} | {color:blue} ql in master has 2298 extant Findbugs warnings. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m 37s{color} | {color:blue} service in master has 48 extant Findbugs warnings. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m 28s{color} | {color:blue} cli in master has 13 extant Findbugs warnings. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m 22s{color} | {color:blue} contrib in master has 10 extant Findbugs warnings. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m 28s{color} | {color:blue} druid-handler in master has 3 extant Findbugs warnings. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m 31s{color} | {color:blue} hbase-handler in master has 15 extant Findbugs warnings. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m 37s{color} | {color:blue} hcatalog/core in master has 30 extant Findbugs warnings. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m 27s{color} | {color:blue} hcatalog/webhcat/java-client in master has 3 extant Findbugs warnings. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m 31s{color} | {color:blue} hcatalog/webhcat/svr in master has 96 extant Findbugs warnings. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m 43s{color} | {color:blue} itests/hive-unit in master has 2 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  3m 57s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 28s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  6m 15s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  5m 26s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  5m 26s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 12s{color} | {color:red} common: The patch generated 3 new + 9 unchanged - 3 fixed = 12 total (was 12) {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 18s{color} | {color:red} serde: The patch generated 19 new + 288 unchanged - 20 fixed = 307 total (was 308) {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 46s{color} | {color:red} ql: The patch generated 90 new + 919 unchanged - 96 fixed = 1009 total (was 1015) {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 13s{color} | {color:red} service: The patch generated 5 new + 17 unchanged - 6 fixed = 22 total (was 23) {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 10s{color} | {color:red} cli: The patch generated 1 new + 9 unchanged - 1 fixed = 10 total (was 10) {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 14s{color} | {color:red} hbase-handler: The patch generated 1 new + 227 unchanged - 1 fixed = 228 total (was 228) 

[jira] [Commented] (HIVE-21224) Upgrade tests JUnit3 to JUnit4

2019-02-06 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16762289#comment-16762289
 ] 

Hive QA commented on HIVE-21224:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12957832/HIVE-21224.3.patch

{color:green}SUCCESS:{color} +1 due to 153 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 9 failed/errored test(s), 15773 tests 
executed
*Failed tests:*
{noformat}
TestGenericUDFConcat - did not produce a TEST-*.xml file (likely timed out) (batchId=287)
TestLocationQueries - did not produce a TEST-*.xml file (likely timed out) (batchId=254)
TestMTQueries - did not produce a TEST-*.xml file (likely timed out) (batchId=252)
org.apache.hadoop.hive.ql.metadata.TestHiveRemote.testAutoPurgeTablesAndPartitions (batchId=308)
org.apache.hadoop.hive.ql.metadata.TestHiveRemote.testHiveRefreshOnConfChange (batchId=308)
org.apache.hadoop.hive.ql.metadata.TestHiveRemote.testPartition (batchId=308)
org.apache.hadoop.hive.ql.metadata.TestHiveRemote.testTable (batchId=308)
org.apache.hadoop.hive.ql.metadata.TestHiveRemote.testThriftTable (batchId=308)
org.apache.hive.service.auth.TestLdapAuthenticationProviderImpl.testAuthenticateWithBindInCredentialFilePasses (batchId=232)
{noformat}

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-Build/15977/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/15977/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-15977/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 9 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12957832 - PreCommit-HIVE-Build

> Upgrade tests JUnit3 to JUnit4
> --
>
> Key: HIVE-21224
> URL: https://issues.apache.org/jira/browse/HIVE-21224
> Project: Hive
>  Issue Type: Improvement
>Reporter: Bruno Pusztahazi
>Assignee: Bruno Pusztahazi
>Priority: Major
> Attachments: HIVE-21224.1.patch, HIVE-21224.2.patch, 
> HIVE-21224.3.patch
>
>
> Old JUnit3 tests should be upgraded to JUnit4





[jira] [Commented] (HIVE-21210) CombineHiveInputFormat Thread Pool Sizing

2019-02-06 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16762263#comment-16762263
 ] 

Hive QA commented on HIVE-21210:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12957830/HIVE-21210.7.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 15779 tests 
executed
*Failed tests:*
{noformat}
org.apache.hive.jdbc.TestSSL.testMetastoreWithSSL (batchId=260)
org.apache.hive.service.auth.TestLdapAuthenticationProviderImpl.testAuthenticateWithBindInCredentialFilePasses
 (batchId=232)
{noformat}

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-Build/15975/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/15975/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-15975/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12957830 - PreCommit-HIVE-Build

> CombineHiveInputFormat Thread Pool Sizing
> -
>
> Key: HIVE-21210
> URL: https://issues.apache.org/jira/browse/HIVE-21210
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 4.0.0, 3.2.0
>Reporter: BELUGA BEHR
>Assignee: BELUGA BEHR
>Priority: Major
> Attachments: HIVE-21210.1.patch, HIVE-21210.2.patch, 
> HIVE-21210.3.patch, HIVE-21210.4.patch, HIVE-21210.5.patch, 
> HIVE-21210.6.patch, HIVE-21210.7.patch
>
>
> Threadpools.
> Hive uses threadpools in several different places and each implementation is 
> a little different and requires different configurations. I think that Hive 
> needs to rein in and standardize the way that threadpools are used and 
> threadpools should scale automatically without manual configuration. At any 
> given time, there are many hundreds of threads running in the HS2 as the 
> number of simultaneous connections increases and they surely cause contention 
> with one-another.
> Here is an example:
> {code:java|title=CombineHiveInputFormat.java}
>   // max number of threads we can use to check non-combinable paths
>   private static final int MAX_CHECK_NONCOMBINABLE_THREAD_NUM = 50;
>   private static final int DEFAULT_NUM_PATH_PER_THREAD = 100;
> {code}
> When building the splits for an MR job, up to 50 threads run per query, and 
> the scaling is a fixed ratio of 1 thread per 100 files.  Processing 5000 
> files therefore uses 50 threads, and any larger input still uses only 50. 
> Many Hive jobs these days involve more than 5000 files, so this does not 
> scale well for larger inputs.
> This is not configurable (even manually), it doesn't change when the hardware 
> specs increase, and 50 threads seems like a lot when a service must support 
> up to 80 connections:
> [https://www.cloudera.com/documentation/enterprise/5/latest/topics/admin_hive_tuning.html]
> Not to mention, I have never seen a scenario where HS2 is running on a host 
> all by itself and has the entire system dedicated to it. Therefore it should 
> be more friendly and spin up fewer threads.
> I am attaching a patch here that provides a few features:
>  * Common module that produces {{ExecutorService}} which caps the number of 
> threads it spins up at the number of processors a host has. Keep in mind that 
> a class may submit as many work units ({{Callables}}) as it would like, but 
> the number of threads in the pool is capped.
>  * Common module for partitioning work. That is, allow for a generic 
> framework for dividing work into partitions (i.e. batches)
>  * Modify {{CombineHiveInputFormat}} to take advantage of both modules, 
> performing the same duties in a more Java OO way than is currently implemented
>  * Add a partitioning (batching) implementation that enforces partitioning of 
> a {{Collection}} based on the natural log of the {{Collection}} size so that 
> it scales more slowly than a simple 1:100 ratio.
>  * Simplify unit test code for {{CombineHiveInputFormat}}
> My hope is to introduce these tools to {{CombineHiveInputFormat}} and then to 
> drop it into other places.  One of the things I will introduce here is a 
> "direct thread" {{ExecutorService}} so that even if there is a configuration 
> for a thread pool to be disabled, it will still use an {{ExecutorService}} so 
> that the project can avoid logic like "if this function is serviced by a 
> thread pool, use an {{ExecutorService}} (and 
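
The sizing ideas in the description above can be compared with a minimal sketch. It is not taken from the attached patches: the class name PoolSizingSketch and the methods fixedRatioThreads, logBatchCount, partition and cappedPool are hypothetical, and using ceil(ln(n)) as the batch count is only one plausible reading of "based on the natural log of the Collection size". The two constants are the ones quoted from CombineHiveInputFormat.java.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class PoolSizingSketch {
  // Values quoted in the issue description.
  static final int MAX_THREADS = 50;
  static final int PATHS_PER_THREAD = 100;

  /** Fixed ratio as described: one thread per 100 paths, capped at 50. */
  static int fixedRatioThreads(int numPaths) {
    int wanted = (numPaths + PATHS_PER_THREAD - 1) / PATHS_PER_THREAD;
    return Math.max(1, Math.min(MAX_THREADS, wanted));
  }

  /** Assumed alternative: the number of batches grows with ln(n). */
  static int logBatchCount(int numItems) {
    if (numItems <= 1) {
      return 1;
    }
    return (int) Math.ceil(Math.log(numItems));
  }

  /** Splits items into at most `batches` contiguous, near-equal sublists. */
  static <T> List<List<T>> partition(List<T> items, int batches) {
    List<List<T>> out = new ArrayList<>();
    int n = items.size();
    if (n == 0) {
      return out;
    }
    int b = Math.max(1, Math.min(batches, n));
    int base = n / b;
    int extra = n % b;
    int start = 0;
    for (int i = 0; i < b; i++) {
      int len = base + (i < extra ? 1 : 0);
      out.add(items.subList(start, start + len));
      start += len;
    }
    return out;
  }

  /** A pool whose thread count is capped at the number of processors. */
  static ExecutorService cappedPool() {
    return Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
  }

  public static void main(String[] args) {
    for (int n : new int[] {250, 5000, 100000}) {
      System.out.println(n + " paths: fixed=" + fixedRatioThreads(n)
          + " log=" + logBatchCount(n));
    }
  }
}
```

Under these assumptions the fixed ratio reaches its cap of 50 threads at 5000 paths, while the logarithmic rule yields 9 batches for 5000 items and 12 for 100000. Any number of batches can be submitted to the capped pool as Callables; only the number of worker threads is bounded.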

[jira] [Commented] (HIVE-20523) Improve table statistics for Parquet format

2019-02-06 Thread Ashutosh Chauhan (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16762260#comment-16762260
 ] 

Ashutosh Chauhan commented on HIVE-20523:
-

+1 patch needs a refresh and rerun.

> Improve table statistics for Parquet format
> ---
>
> Key: HIVE-20523
> URL: https://issues.apache.org/jira/browse/HIVE-20523
> Project: Hive
>  Issue Type: Improvement
>  Components: Physical Optimizer
>Reporter: George Pachitariu
>Assignee: George Pachitariu
>Priority: Minor
> Attachments: HIVE-20523.1.patch, HIVE-20523.10.patch, 
> HIVE-20523.11.patch, HIVE-20523.12.patch, HIVE-20523.2.patch, 
> HIVE-20523.3.patch, HIVE-20523.4.patch, HIVE-20523.5.patch, 
> HIVE-20523.6.patch, HIVE-20523.7.patch, HIVE-20523.8.patch, 
> HIVE-20523.9.patch, HIVE-20523.patch
>
>
> Right now, in the table basic statistics, the *raw data size* for a row with 
> any data type in the Parquet format is 1. This is an underestimated value 
> when columns are complex data structures, like arrays.
> Having tables with underestimated raw data size makes Hive assign fewer 
> containers (mappers/reducers) to it, making the overall query slower. 
> Heavy underestimation also makes Hive choose MapJoin instead of 
> ShuffleJoin, and the MapJoin can then fail with OOM errors.
> In this patch, I compute the column data size better, taking into account 
> complex structures. I followed the Writer implementation for the ORC format.
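
As a rough illustration of what "taking into account complex structures" could mean, the sketch below sums an estimated size recursively over nested values instead of counting every value as 1. It does not reproduce the patch or the ORC Writer: the class RawSizeSketch, the method estimate and the per-type byte counts are all assumptions chosen for the example.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Map;

final class RawSizeSketch {
  /**
   * Estimates the raw size of one value in bytes. The byte counts are
   * illustrative only and are not the ones used by Hive.
   */
  static long estimate(Object value) {
    if (value == null) {
      return 0;
    }
    if (value instanceof Boolean || value instanceof Byte) {
      return 1;
    }
    if (value instanceof Short) {
      return 2;
    }
    if (value instanceof Integer || value instanceof Float) {
      return 4;
    }
    if (value instanceof Long || value instanceof Double) {
      return 8;
    }
    if (value instanceof String) {
      return ((String) value).getBytes(StandardCharsets.UTF_8).length;
    }
    if (value instanceof List) {
      // An array column contributes the sum of its elements.
      long sum = 0;
      for (Object element : (List<?>) value) {
        sum += estimate(element);
      }
      return sum;
    }
    if (value instanceof Map) {
      // A map column contributes the sum of its keys and values.
      long sum = 0;
      for (Map.Entry<?, ?> entry : ((Map<?, ?>) value).entrySet()) {
        sum += estimate(entry.getKey()) + estimate(entry.getValue());
      }
      return sum;
    }
    throw new IllegalArgumentException("Unsupported type: " + value.getClass());
  }
}
```

With this kind of estimate, a row holding an array of a thousand integers is no longer counted the same as a row holding a single integer, which is the underestimation the description refers to.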





[jira] [Commented] (HIVE-21210) CombineHiveInputFormat Thread Pool Sizing

2019-02-06 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16762249#comment-16762249
 ] 

Hive QA commented on HIVE-21210:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m 
40s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  7m 
 5s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
27s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
51s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m 
33s{color} | {color:blue} common in master has 65 extant Findbugs warnings. 
{color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  4m  
0s{color} | {color:blue} ql in master has 2298 extant Findbugs warnings. 
{color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
13s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
29s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
44s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
20s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 
20s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 
12s{color} | {color:red} common: The patch generated 2 new + 0 unchanged - 0 
fixed = 2 total (was 0) {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 
41s{color} | {color:red} ql: The patch generated 1 new + 7 unchanged - 49 fixed 
= 8 total (was 56) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m 
43s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
14s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red}  0m 
14s{color} | {color:red} The patch generated 1 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 28m  8s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 
3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/data/hiveptest/working/yetus_PreCommit-HIVE-Build-15975/dev-support/hive-personality.sh
 |
| git revision | master / 0e4d16b |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| checkstyle | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-15975/yetus/diff-checkstyle-common.txt
 |
| checkstyle | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-15975/yetus/diff-checkstyle-ql.txt
 |
| asflicense | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-15975/yetus/patch-asflicense-problems.txt
 |
| modules | C: common ql U: . |
| Console output | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-15975/yetus.txt |
| Powered by | Apache Yetus http://yetus.apache.org |


This message was automatically generated.



> CombineHiveInputFormat Thread Pool Sizing
> -
>
> Key: HIVE-21210
> URL: https://issues.apache.org/jira/browse/HIVE-21210
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 4.0.0, 3.2.0
>Reporter: BELUGA BEHR
>Assignee: BELUGA BEHR
>Priority: Major
> Attachments: HIVE-21210.1.patch, HIVE-21210.2.patch, 
> HIVE-21210.3.patch, HIVE-21210.4.patch, HIVE-21210.5.patch, 
> HIVE-21210.6.patch, HIVE-21210.7.patch
>
>
> Threadpools.
> Hive uses threadpools in several different places and each implementation is 
> a little different and requires different configurations. I 

[jira] [Commented] (HIVE-21071) Improve getInputSummary

2019-02-06 Thread BELUGA BEHR (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16762248#comment-16762248
 ] 

BELUGA BEHR commented on HIVE-21071:


[~pvary] I updated the patch, included the UT you requested.  Passed YETUS.  
Please consider for inclusion into the project.

> Improve getInputSummary
> ---
>
> Key: HIVE-21071
> URL: https://issues.apache.org/jira/browse/HIVE-21071
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Affects Versions: 4.0.0, 3.2.0
>Reporter: BELUGA BEHR
>Assignee: BELUGA BEHR
>Priority: Major
> Attachments: HIVE-21071.1.patch, HIVE-21071.10.patch, 
> HIVE-21071.11.patch, HIVE-21071.2.patch, HIVE-21071.3.patch, 
> HIVE-21071.4.patch, HIVE-21071.5.patch, HIVE-21071.6.patch, 
> HIVE-21071.7.patch, HIVE-21071.8.patch, HIVE-21071.9.patch
>
>
> There is a global lock in the {{getInputSummary}} code, so it is important 
> that it be fast.  The current implementation has quite a bit of overhead that 
> can be re-engineered.
> For example, the current implementation keeps a map of File Path to 
> ContentSummary object.  This map is populated by several threads 
> concurrently. The method then loops through the map, in a single thread, at 
> the end to add up all of the ContentSummary objects and ignores the paths.  
> The code can be re-engineered to not use a map, or a collection at all, to 
> store the results and instead just keep a running tally.  By keeping a tally, 
> there is no {{O\(n)}} operation at the end to perform the addition.
> There are other things that can be improved.  The method returns an object 
> which is never used anywhere, so change method to void return type.
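
The running-tally idea described above can be sketched as follows. This is not the code from the attached patches: RunningTallySketch, Summary, Tally and summarize are hypothetical names, and Summary is a stand-in that only mirrors the length, file count and directory count carried by Hadoop's ContentSummary. Using java.util.concurrent.atomic.LongAdder for the counters is an assumption; the description does not say how the tally is kept.

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.LongAdder;

final class RunningTallySketch {
  /** Hypothetical stand-in for the three totals of a ContentSummary. */
  static final class Summary {
    final long length;
    final long fileCount;
    final long directoryCount;

    Summary(long length, long fileCount, long directoryCount) {
      this.length = length;
      this.fileCount = fileCount;
      this.directoryCount = directoryCount;
    }
  }

  /** Thread-safe totals; no per-path map or collection is kept. */
  static final class Tally {
    private final LongAdder length = new LongAdder();
    private final LongAdder fileCount = new LongAdder();
    private final LongAdder directoryCount = new LongAdder();

    void add(Summary s) {
      length.add(s.length);
      fileCount.add(s.fileCount);
      directoryCount.add(s.directoryCount);
    }

    Summary snapshot() {
      return new Summary(length.sum(), fileCount.sum(), directoryCount.sum());
    }
  }

  /** Adds each per-path summary to the tally from a pool of worker threads. */
  static Summary summarize(List<Summary> perPath, int threads) {
    Tally tally = new Tally();
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    try {
      for (Summary s : perPath) {
        pool.execute(() -> tally.add(s));
      }
    } finally {
      pool.shutdown();
    }
    try {
      if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
        throw new IllegalStateException("Timed out waiting for workers");
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("Interrupted while waiting", e);
    }
    return tally.snapshot();
  }
}
```

Each worker adds its result as soon as it has one, so reading the totals at the end does not depend on the number of paths. The snapshot is taken only after the pool has terminated, which makes the sums complete.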





[jira] [Updated] (HIVE-20841) LLAP: Make dynamic ports configurable

2019-02-06 Thread Prasanth Jayachandran (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-20841:
-
Attachment: HIVE-20841.2.patch

> LLAP: Make dynamic ports configurable
> -
>
> Key: HIVE-20841
> URL: https://issues.apache.org/jira/browse/HIVE-20841
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Affects Versions: 4.0.0, 3.2.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>Priority: Major
> Fix For: 4.0.0, 3.2.0
>
> Attachments: HIVE-20841.1.patch, HIVE-20841.2.patch
>
>
> Some ports in the llap -> tez interaction code are assigned dynamically. 
> Provide an option to make them configurable so that they can be added to 
> iptables rules in some environments. 





[jira] [Commented] (HIVE-21222) ACID: When there are no delete deltas skip finding min max keys

2019-02-06 Thread Prasanth Jayachandran (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16762238#comment-16762238
 ] 

Prasanth Jayachandran commented on HIVE-21222:
--

fixes test failures. 

> ACID: When there are no delete deltas skip finding min max keys
> ---
>
> Key: HIVE-21222
> URL: https://issues.apache.org/jira/browse/HIVE-21222
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 4.0.0, 3.2.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>Priority: Major
> Attachments: HIVE-21222.1.patch, HIVE-21222.2.patch
>
>
> We create an orc reader in VectorizedOrcAcidRowBatchReader.findMinMaxKeys 
> (which will read the 16K footer) even for cases where delete deltas do not 
> exist.





[jira] [Updated] (HIVE-21222) ACID: When there are no delete deltas skip finding min max keys

2019-02-06 Thread Prasanth Jayachandran (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-21222:
-
Attachment: HIVE-21222.2.patch

> ACID: When there are no delete deltas skip finding min max keys
> ---
>
> Key: HIVE-21222
> URL: https://issues.apache.org/jira/browse/HIVE-21222
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 4.0.0, 3.2.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>Priority: Major
> Attachments: HIVE-21222.1.patch, HIVE-21222.2.patch
>
>
> We create an orc reader in VectorizedOrcAcidRowBatchReader.findMinMaxKeys 
> (which will read the 16K footer) even for cases where delete deltas do not 
> exist.





[jira] [Commented] (HIVE-21071) Improve getInputSummary

2019-02-06 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16762228#comment-16762228
 ] 

Hive QA commented on HIVE-21071:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12957827/HIVE-21071.11.patch

{color:red}ERROR:{color} -1 due to build exiting with an error

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-Build/15974/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/15974/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-15974/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Tests exited with: Exception: Patch URL 
https://issues.apache.org/jira/secure/attachment/12957827/HIVE-21071.11.patch 
was found in seen patch url's cache and a test was probably run already on it. 
Aborting...
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12957827 - PreCommit-HIVE-Build

> Improve getInputSummary
> ---
>
> Key: HIVE-21071
> URL: https://issues.apache.org/jira/browse/HIVE-21071
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Affects Versions: 4.0.0, 3.2.0
>Reporter: BELUGA BEHR
>Assignee: BELUGA BEHR
>Priority: Major
> Attachments: HIVE-21071.1.patch, HIVE-21071.10.patch, 
> HIVE-21071.11.patch, HIVE-21071.2.patch, HIVE-21071.3.patch, 
> HIVE-21071.4.patch, HIVE-21071.5.patch, HIVE-21071.6.patch, 
> HIVE-21071.7.patch, HIVE-21071.8.patch, HIVE-21071.9.patch
>
>
> There is a global lock in the {{getInputSummary}} code, so it is important 
> that it be fast.  The current implementation has quite a bit of overhead that 
> can be re-engineered.
> For example, the current implementation keeps a map of File Path to 
> ContentSummary object.  This map is populated by several threads 
> concurrently. The method then loops through the map, in a single thread, at 
> the end to add up all of the ContentSummary objects and ignores the paths.  
> The code can be re-engineered to not use a map, or a collection at all, to 
> store the results and instead just keep a running tally.  By keeping a tally, 
> there is no {{O\(n)}} operation at the end to perform the addition.
> There are other things that can be improved.  The method returns an object 
> which is never used anywhere, so change method to void return type.





[jira] [Commented] (HIVE-21071) Improve getInputSummary

2019-02-06 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16762227#comment-16762227
 ] 

Hive QA commented on HIVE-21071:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12957827/HIVE-21071.11.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:green}SUCCESS:{color} +1 due to 15773 tests passed

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-Build/15973/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/15973/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-15973/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12957827 - PreCommit-HIVE-Build

> Improve getInputSummary
> ---
>
> Key: HIVE-21071
> URL: https://issues.apache.org/jira/browse/HIVE-21071
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Affects Versions: 4.0.0, 3.2.0
>Reporter: BELUGA BEHR
>Assignee: BELUGA BEHR
>Priority: Major
> Attachments: HIVE-21071.1.patch, HIVE-21071.10.patch, 
> HIVE-21071.11.patch, HIVE-21071.2.patch, HIVE-21071.3.patch, 
> HIVE-21071.4.patch, HIVE-21071.5.patch, HIVE-21071.6.patch, 
> HIVE-21071.7.patch, HIVE-21071.8.patch, HIVE-21071.9.patch
>
>
> There is a global lock in the {{getInputSummary}} code, so it is important 
> that it be fast.  The current implementation has quite a bit of overhead that 
> can be re-engineered.
> For example, the current implementation keeps a map of File Path to 
> ContentSummary object.  This map is populated by several threads 
> concurrently. The method then loops through the map, in a single thread, at 
> the end to add up all of the ContentSummary objects and ignores the paths.  
> The code can be re-engineered to not use a map, or a collection at all, to 
> store the results and instead just keep a running tally.  By keeping a tally, 
> there is no {{O\(n)}} operation at the end to perform the addition.
> There are other things that can be improved.  The method returns an object 
> which is never used anywhere, so change method to void return type.





[jira] [Updated] (HIVE-21009) LDAP - Specify binddn for ldap-search

2019-02-06 Thread Prasanth Jayachandran (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-21009:
-
   Resolution: Fixed
Fix Version/s: 4.0.0
   Status: Resolved  (was: Patch Available)

Thanks [~mcginnda] for the contribution! Committed patch to master. 

> LDAP - Specify binddn for ldap-search
> -
>
> Key: HIVE-21009
> URL: https://issues.apache.org/jira/browse/HIVE-21009
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 2.1.0, 2.1.1, 2.2.0, 2.3.0, 2.3.1, 2.3.2
>Reporter: Thomas Uhren
>Assignee: David McGinnis
>Priority: Major
>  Labels: features, newbie, security
> Fix For: 4.0.0
>
> Attachments: 
> 0001-HIVE-21009-Adding-ability-for-user-to-set-bind-user-.patch, 
> HIVE-21009.01.patch, HIVE-21009.02.patch, HIVE-21009.03.patch, 
> HIVE-21009.04.patch, HIVE-21009.05.patch, HIVE-21009.06.patch, 
> HIVE-21009.07.patch, HIVE-21009.patch
>
>
> When user accounts cannot do an LDAP search, there is currently no way of 
> specifying a custom binddn to use for the ldap-search.
> So I'm missing something like that:
> {code}
> hive.server2.authentication.ldap.bindn=cn=ldapuser,ou=user,dc=example
> hive.server2.authentication.ldap.bindnpw=password
> {code}
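
For context, binding to LDAP with a dedicated account before searching usually comes down to a handful of JNDI environment entries. The sketch below is not the code from the attached patches: LdapBindSketch and bindEnvironment are hypothetical names, and how Hive would map the two requested properties onto these entries is an assumption. Only the javax.naming.Context constants and the JDK's LDAP context factory class name are standard.

```java
import java.util.Hashtable;
import javax.naming.Context;

final class LdapBindSketch {
  /**
   * Builds a JNDI environment that authenticates as a fixed bind user
   * (for example a service account) using simple authentication.
   */
  static Hashtable<String, String> bindEnvironment(
      String ldapUrl, String bindDn, String bindPassword) {
    Hashtable<String, String> env = new Hashtable<>();
    env.put(Context.INITIAL_CONTEXT_FACTORY, "com.sun.jndi.ldap.LdapCtxFactory");
    env.put(Context.PROVIDER_URL, ldapUrl);
    env.put(Context.SECURITY_AUTHENTICATION, "simple");
    // The bind DN and password play the role of the two properties above.
    env.put(Context.SECURITY_PRINCIPAL, bindDn);
    env.put(Context.SECURITY_CREDENTIALS, bindPassword);
    return env;
  }

  public static void main(String[] args) {
    // Placeholder URL; the DN and password are the sample values above.
    Hashtable<String, String> env = bindEnvironment(
        "ldap://ldap.example.com:389", "cn=ldapuser,ou=user,dc=example", "password");
    System.out.println(env.get(Context.SECURITY_PRINCIPAL));
  }
}
```

Passing such an environment to javax.naming.directory.InitialDirContext opens a connection as the bind user, and the user lookup can then run as a search on that context instead of under the account being authenticated.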





[jira] [Updated] (HIVE-21009) LDAP - Specify binddn for ldap-search

2019-02-06 Thread David McGinnis (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David McGinnis updated HIVE-21009:
--
Attachment: 0001-HIVE-21009-Adding-ability-for-user-to-set-bind-user-.patch

> LDAP - Specify binddn for ldap-search
> -
>
> Key: HIVE-21009
> URL: https://issues.apache.org/jira/browse/HIVE-21009
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 2.1.0, 2.1.1, 2.2.0, 2.3.0, 2.3.1, 2.3.2
>Reporter: Thomas Uhren
>Assignee: David McGinnis
>Priority: Major
>  Labels: features, newbie, security
> Attachments: 
> 0001-HIVE-21009-Adding-ability-for-user-to-set-bind-user-.patch, 
> HIVE-21009.01.patch, HIVE-21009.02.patch, HIVE-21009.03.patch, 
> HIVE-21009.04.patch, HIVE-21009.05.patch, HIVE-21009.06.patch, 
> HIVE-21009.07.patch, HIVE-21009.patch
>
>
> When user accounts cannot do an LDAP search, there is currently no way of 
> specifying a custom binddn to use for the ldap-search.
> So I'm missing something like that:
> {code}
> hive.server2.authentication.ldap.bindn=cn=ldapuser,ou=user,dc=example
> hive.server2.authentication.ldap.bindnpw=password
> {code}





[jira] [Commented] (HIVE-21009) LDAP - Specify binddn for ldap-search

2019-02-06 Thread David McGinnis (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16762210#comment-16762210
 ] 

David McGinnis commented on HIVE-21009:
---

[~prasanth_j] Added!

> LDAP - Specify binddn for ldap-search
> -
>
> Key: HIVE-21009
> URL: https://issues.apache.org/jira/browse/HIVE-21009
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 2.1.0, 2.1.1, 2.2.0, 2.3.0, 2.3.1, 2.3.2
>Reporter: Thomas Uhren
>Assignee: David McGinnis
>Priority: Major
>  Labels: features, newbie, security
> Attachments: 
> 0001-HIVE-21009-Adding-ability-for-user-to-set-bind-user-.patch, 
> HIVE-21009.01.patch, HIVE-21009.02.patch, HIVE-21009.03.patch, 
> HIVE-21009.04.patch, HIVE-21009.05.patch, HIVE-21009.06.patch, 
> HIVE-21009.07.patch, HIVE-21009.patch
>
>
> When user accounts cannot do an LDAP search, there is currently no way of 
> specifying a custom binddn to use for the ldap-search.
> So I'm missing something like that:
> {code}
> hive.server2.authentication.ldap.bindn=cn=ldapuser,ou=user,dc=example
> hive.server2.authentication.ldap.bindnpw=password
> {code}





[jira] [Updated] (HIVE-21103) PartitionManagementTask should not modify DN configs to avoid closing persistence manager

2019-02-06 Thread Prasanth Jayachandran (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-21103:
-
Attachment: HIVE-21103.3.patch

> PartitionManagementTask should not modify DN configs to avoid closing 
> persistence manager
> -
>
> Key: HIVE-21103
> URL: https://issues.apache.org/jira/browse/HIVE-21103
> Project: Hive
>  Issue Type: Bug
>  Components: Standalone Metastore
>Affects Versions: 4.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>Priority: Critical
> Attachments: HIVE-21103.1.patch, HIVE-21103.2.patch, 
> HIVE-21103.3.patch
>
>
> HIVE-20707 added automatic partition management which uses thread pools to 
> run parallel msck repair. It also modifies the datanucleus connection pool 
> size to avoid an explosion of connections to the backend database. But the 
> object store closes the persistence manager when it detects a change in 
> datanucleus or jdo configs. So when PartitionManagementTask is running and 
> HS2 tries to connect to the metastore, HS2 will get a persistence manager 
> close exception. 





[jira] [Commented] (HIVE-21009) LDAP - Specify binddn for ldap-search

2019-02-06 Thread Prasanth Jayachandran (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16762201#comment-16762201
 ] 

Prasanth Jayachandran commented on HIVE-21009:
--

[~mcginnda] can you please upload a git format-patch so that the commit is 
properly attributed to your email?

> LDAP - Specify binddn for ldap-search
> -
>
> Key: HIVE-21009
> URL: https://issues.apache.org/jira/browse/HIVE-21009
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 2.1.0, 2.1.1, 2.2.0, 2.3.0, 2.3.1, 2.3.2
>Reporter: Thomas Uhren
>Assignee: David McGinnis
>Priority: Major
>  Labels: features, newbie, security
> Attachments: HIVE-21009.01.patch, HIVE-21009.02.patch, 
> HIVE-21009.03.patch, HIVE-21009.04.patch, HIVE-21009.05.patch, 
> HIVE-21009.06.patch, HIVE-21009.07.patch, HIVE-21009.patch
>
>
> When user accounts cannot do an LDAP search, there is currently no way of 
> specifying a custom binddn to use for the ldap-search.
> So I'm missing something like that:
> {code}
> hive.server2.authentication.ldap.bindn=cn=ldapuser,ou=user,dc=example
> hive.server2.authentication.ldap.bindnpw=password
> {code}





[jira] [Updated] (HIVE-21224) Upgrade tests JUnit3 to JUnit4

2019-02-06 Thread Bruno Pusztahazi (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bruno Pusztahazi updated HIVE-21224:

Status: Patch Available  (was: Open)

> Upgrade tests JUnit3 to JUnit4
> --
>
> Key: HIVE-21224
> URL: https://issues.apache.org/jira/browse/HIVE-21224
> Project: Hive
>  Issue Type: Improvement
>Reporter: Bruno Pusztahazi
>Assignee: Bruno Pusztahazi
>Priority: Major
> Attachments: HIVE-21224.1.patch, HIVE-21224.2.patch, 
> HIVE-21224.3.patch
>
>
> Old JUnit3 tests should be upgraded to JUnit4





[jira] [Updated] (HIVE-21224) Upgrade tests JUnit3 to JUnit4

2019-02-06 Thread Bruno Pusztahazi (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bruno Pusztahazi updated HIVE-21224:

Attachment: HIVE-21224.3.patch

> Upgrade tests JUnit3 to JUnit4
> --
>
> Key: HIVE-21224
> URL: https://issues.apache.org/jira/browse/HIVE-21224
> Project: Hive
>  Issue Type: Improvement
>Reporter: Bruno Pusztahazi
>Assignee: Bruno Pusztahazi
>Priority: Major
> Attachments: HIVE-21224.1.patch, HIVE-21224.2.patch, 
> HIVE-21224.3.patch
>
>
> Old JUnit3 tests should be upgraded to JUnit4





[jira] [Commented] (HIVE-21071) Improve getInputSummary

2019-02-06 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16762197#comment-16762197
 ] 

Hive QA commented on HIVE-21071:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  8m 21s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  8s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 40s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  4m  1s{color} | {color:blue} ql in master has 2298 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 59s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 31s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  4s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m  4s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 40s{color} | {color:red} ql: The patch generated 11 new + 132 unchanged - 27 fixed = 143 total (was 159) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m  6s{color} | {color:green} ql generated 0 new + 2295 unchanged - 3 fixed = 2295 total (was 2298) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 59s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red}  0m 13s{color} | {color:red} The patch generated 1 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 24m 13s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /data/hiveptest/working/yetus_PreCommit-HIVE-Build-15973/dev-support/hive-personality.sh |
| git revision | master / fae6256 |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| checkstyle | http://104.198.109.242/logs//PreCommit-HIVE-Build-15973/yetus/diff-checkstyle-ql.txt |
| asflicense | http://104.198.109.242/logs//PreCommit-HIVE-Build-15973/yetus/patch-asflicense-problems.txt |
| modules | C: ql U: ql |
| Console output | http://104.198.109.242/logs//PreCommit-HIVE-Build-15973/yetus.txt |
| Powered by | Apache Yetus http://yetus.apache.org |


This message was automatically generated.



> Improve getInputSummary
> ---
>
> Key: HIVE-21071
> URL: https://issues.apache.org/jira/browse/HIVE-21071
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Affects Versions: 4.0.0, 3.2.0
>Reporter: BELUGA BEHR
>Assignee: BELUGA BEHR
>Priority: Major
> Attachments: HIVE-21071.1.patch, HIVE-21071.10.patch, 
> HIVE-21071.11.patch, HIVE-21071.2.patch, HIVE-21071.3.patch, 
> HIVE-21071.4.patch, HIVE-21071.5.patch, HIVE-21071.6.patch, 
> HIVE-21071.7.patch, HIVE-21071.8.patch, HIVE-21071.9.patch
>
>
> There is a global lock in the {{getInputSummary}} code, so it is important 
> that it be fast.  The current implementation has quite a bit of overhead that 
> can be re-engineered.
> For example, the current implementation keeps a map of File Path to 
> ContentSummary object.  This map is populated by several threads 
> concurrently. The method then loops through the map, in a single thread, at 
> the end to add up all of the ContentSummary objects and ignores the paths.  
> The code can be re-engineered to not use a map, or a collection at all, to 
> store the results and instead just keep a running tally.  By keeping a tally, 
> there is no {{O\(n)}} operation at the end to perform the addition.
> There are other things that can be improved.  The method 

[jira] [Updated] (HIVE-21210) CombineHiveInputFormat Thread Pool Sizing

2019-02-06 Thread BELUGA BEHR (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BELUGA BEHR updated HIVE-21210:
---
Status: Patch Available  (was: Open)

> CombineHiveInputFormat Thread Pool Sizing
> -
>
> Key: HIVE-21210
> URL: https://issues.apache.org/jira/browse/HIVE-21210
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 4.0.0, 3.2.0
>Reporter: BELUGA BEHR
>Assignee: BELUGA BEHR
>Priority: Major
> Attachments: HIVE-21210.1.patch, HIVE-21210.2.patch, 
> HIVE-21210.3.patch, HIVE-21210.4.patch, HIVE-21210.5.patch, 
> HIVE-21210.6.patch, HIVE-21210.7.patch
>
>
> Threadpools.
> Hive uses threadpools in several different places and each implementation is 
> a little different and requires different configurations. I think that Hive 
> needs to rein in and standardize the way that threadpools are used and 
> threadpools should scale automatically without manual configuration. At any 
> given time, there are many hundreds of threads running in the HS2 as the 
> number of simultaneous connections increases and they surely cause contention 
> with one-another.
> Here is an example:
> {code:java|title=CombineHiveInputFormat.java}
>   // max number of threads we can use to check non-combinable paths
>   private static final int MAX_CHECK_NONCOMBINABLE_THREAD_NUM = 50;
>   private static final int DEFAULT_NUM_PATH_PER_THREAD = 100;
> {code}
> When building the splits for a MR job, there are up to 50 threads running per 
> query and there is not much scaling here; it is simply a ratio of 1 thread per 
> 100 files.  This implies that processing 5000 files takes 50 threads and, beyond 
> that, 50 threads are still used. Many Hive jobs these days involve more than 
> 5000 files, so it does not scale well at larger sizes.
> This is not configurable (even manually), it doesn't change when the hardware 
> specs increase, and 50 threads seems like a lot when a service must support 
> up to 80 connections:
> [https://www.cloudera.com/documentation/enterprise/5/latest/topics/admin_hive_tuning.html]
> Not to mention, I have never seen a scenario where HS2 is running on a host 
> all by itself and has the entire system dedicated to it. Therefore it should 
> be more friendly and spin up fewer threads.
> I am attaching a patch here that provides a few features:
>  * Common module that produces {{ExecutorService}} which caps the number of 
> threads it spins up at the number of processors a host has. Keep in mind that 
> a class may submit as many work units ({{Callables}}) as it would like, but 
> the number of threads in the pool is capped.
>  * Common module for partitioning work. That is, allow for a generic 
> framework for dividing work into partitions (i.e. batches)
>  * Modify {{CombineHiveInputFormat}} to take advantage of both modules, 
> performing its same duties in a more Java OO way than is currently implemented
>  * Add a partitioning (batching) implementation that enforces partitioning of 
> a {{Collection}} based on the natural log of the {{Collection}} size so that 
> it scales more slowly than a simple 1:100 ratio.
>  * Simplify unit test code for {{CombineHiveInputFormat}}
> My hope is to introduce these tools to {{CombineHiveInputFormat}} and then to 
> drop it into other places.  One of the things I will introduce here is a 
> "direct thread" {{ExecutorService}} so that even if there is a configuration 
> for a thread pool to be disabled, it will still use an {{ExecutorService}} so 
> that the project can avoid logic like "if this function is serviced by a 
> thread pool, use a {{ExecutorService}} (and remember to close it later!) 
> otherwise, create a single thread" so that things like [HIVE-16949] can be 
> avoided in the future.  Everything will just use an {{ExecutorService}}.
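
The first bullet of the description (a pool whose thread count is capped at the number of processors, while callers may still submit any number of work units) could look roughly like the following. This is only a sketch under stated assumptions: {{CappedPoolSketch}}, {{threadCap}}, {{newCappedPool}} and {{sumAll}} are hypothetical names and are not taken from the attached patch.

```java
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical helper; class and method names are illustrative only.
final class CappedPoolSketch {

  // Never more threads than processors, and always at least one thread.
  static int threadCap(int requested, int processors) {
    return Math.max(1, Math.min(requested, processors));
  }

  // Builds a fixed pool sized by the cap above.
  static ExecutorService newCappedPool(int requested) {
    int processors = Runtime.getRuntime().availableProcessors();
    return Executors.newFixedThreadPool(threadCap(requested, processors));
  }

  // Submits any number of work units; only the pool size is bounded.
  static int sumAll(List<Callable<Integer>> work) {
    ExecutorService pool = newCappedPool(work.size());
    try {
      int total = 0;
      for (Future<Integer> f : pool.invokeAll(work)) {
        total += f.get();
      }
      return total;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    } catch (ExecutionException e) {
      throw new IllegalStateException(e.getCause());
    } finally {
      // The pool is always closed, even when a work unit fails.
      pool.shutdown();
    }
  }
}
```

With this shape, a request for 50 threads on an 8-core host yields an 8-thread pool, and the remaining work units simply queue.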



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-21210) CombineHiveInputFormat Thread Pool Sizing

2019-02-06 Thread BELUGA BEHR (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BELUGA BEHR updated HIVE-21210:
---
Attachment: HIVE-21210.7.patch

> CombineHiveInputFormat Thread Pool Sizing
> -
>
> Key: HIVE-21210
> URL: https://issues.apache.org/jira/browse/HIVE-21210
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 4.0.0, 3.2.0
>Reporter: BELUGA BEHR
>Assignee: BELUGA BEHR
>Priority: Major
> Attachments: HIVE-21210.1.patch, HIVE-21210.2.patch, 
> HIVE-21210.3.patch, HIVE-21210.4.patch, HIVE-21210.5.patch, 
> HIVE-21210.6.patch, HIVE-21210.7.patch
>
>
> Threadpools.
> Hive uses threadpools in several different places and each implementation is 
> a little different and requires different configurations. I think that Hive 
> needs to rein in and standardize the way that threadpools are used and 
> threadpools should scale automatically without manual configuration. At any 
> given time, there are many hundreds of threads running in the HS2 as the 
> number of simultaneous connections increases and they surely cause contention 
> with one-another.
> Here is an example:
> {code:java|title=CombineHiveInputFormat.java}
>   // max number of threads we can use to check non-combinable paths
>   private static final int MAX_CHECK_NONCOMBINABLE_THREAD_NUM = 50;
>   private static final int DEFAULT_NUM_PATH_PER_THREAD = 100;
> {code}
> When building the splits for a MR job, there are up to 50 threads running per 
> query and there is not much scaling here; it is simply a ratio of 1 thread per 
> 100 files.  This implies that processing 5000 files takes 50 threads and, beyond 
> that, 50 threads are still used. Many Hive jobs these days involve more than 
> 5000 files, so it does not scale well at larger sizes.
> This is not configurable (even manually), it doesn't change when the hardware 
> specs increase, and 50 threads seems like a lot when a service must support 
> up to 80 connections:
> [https://www.cloudera.com/documentation/enterprise/5/latest/topics/admin_hive_tuning.html]
> Not to mention, I have never seen a scenario where HS2 is running on a host 
> all by itself and has the entire system dedicated to it. Therefore it should 
> be more friendly and spin up fewer threads.
> I am attaching a patch here that provides a few features:
>  * Common module that produces {{ExecutorService}} which caps the number of 
> threads it spins up at the number of processors a host has. Keep in mind that 
> a class may submit as many work units ({{Callables}}) as it would like, but 
> the number of threads in the pool is capped.
>  * Common module for partitioning work. That is, allow for a generic 
> framework for dividing work into partitions (i.e. batches)
>  * Modify {{CombineHiveInputFormat}} to take advantage of both modules, 
> performing its same duties in a more Java OO way than is currently implemented
>  * Add a partitioning (batching) implementation that enforces partitioning of 
> a {{Collection}} based on the natural log of the {{Collection}} size so that 
> it scales more slowly than a simple 1:100 ratio.
>  * Simplify unit test code for {{CombineHiveInputFormat}}
> My hope is to introduce these tools to {{CombineHiveInputFormat}} and then to 
> drop it into other places.  One of the things I will introduce here is a 
> "direct thread" {{ExecutorService}} so that even if there is a configuration 
> for a thread pool to be disabled, it will still use an {{ExecutorService}} so 
> that the project can avoid logic like "if this function is serviced by a 
> thread pool, use a {{ExecutorService}} (and remember to close it later!) 
> otherwise, create a single thread" so that things like [HIVE-16949] can be 
> avoided in the future.  Everything will just use an {{ExecutorService}}.
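
To make the difference between the 1:100 ratio and log-based batching concrete, here is a small sketch. It assumes one possible reading of "based on the natural log", namely ceil(ln(size)) batches; the patch may compute it differently. {{LogPartitionSketch}} and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; not the partitioning module from the patch.
final class LogPartitionSketch {

  // Behaviour described in the issue: one thread per 100 paths, at most 50.
  static int fixedRatioBatches(int size) {
    return Math.min(50, Math.max(1, (size + 99) / 100));
  }

  // Assumed reading of the proposal: ceil(ln(size)) batches, at least one.
  static int logBatches(int size) {
    return size <= 1 ? 1 : (int) Math.ceil(Math.log(size));
  }

  // Splits items into at most `batches` contiguous, near-equal sublists.
  static <T> List<List<T>> partition(List<T> items, int batches) {
    List<List<T>> out = new ArrayList<>();
    if (items.isEmpty()) {
      return out;
    }
    int n = Math.max(1, Math.min(batches, items.size()));
    int base = items.size() / n;
    int extra = items.size() % n;
    int from = 0;
    for (int i = 0; i < n; i++) {
      // The first `extra` batches take one additional element.
      int to = from + base + (i < extra ? 1 : 0);
      out.add(new ArrayList<>(items.subList(from, to)));
      from = to;
    }
    return out;
  }
}
```

Under these assumptions, 5000 paths map to 50 batches with the fixed ratio but only 9 with the logarithmic rule, and 100000 paths map to 12.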



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-21225) ACID: getAcidState() should cache a recursive dir listing locally

2019-02-06 Thread Gopal V (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-21225:
---
Issue Type: Improvement  (was: Bug)

> ACID: getAcidState() should cache a recursive dir listing locally
> -
>
> Key: HIVE-21225
> URL: https://issues.apache.org/jira/browse/HIVE-21225
> Project: Hive
>  Issue Type: Improvement
>  Components: Transactions
>Reporter: Gopal V
>Priority: Major
>
> Currently getAcidState() makes 3 calls into the FS api which could be 
> answered by making a single recursive listDir call and reusing the same data 
> to check for isRawFormat() and isValidBase().
> All delta operations for a single partition can go against a single listed 
> directory snapshot instead of interacting with the NameNode or ObjectStore 
> within the inner loop.
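
The idea of listing once and answering several questions from the same snapshot can be sketched as below. This is an illustration only: it does not use the Hadoop FileSystem API, the {{Supplier}} stands in for a single recursive listing call, and {{DirListingSnapshotSketch}}, {{hasBase}} and {{deltaDirs}} are hypothetical names rather than the real {{getAcidState()}} helpers. The {{base_}} and {{delta_}} prefixes are assumed here for the example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Supplier;

// Hypothetical snapshot of one recursive listing of a partition directory.
final class DirListingSnapshotSketch {
  private final List<String> paths;

  // The lister is invoked exactly once; every later query reuses the result.
  DirListingSnapshotSketch(Supplier<List<String>> recursiveLister) {
    this.paths = Collections.unmodifiableList(new ArrayList<>(recursiveLister.get()));
  }

  // First path segment, e.g. "delta_6_6" for "delta_6_6/bucket_00000".
  private static String topDir(String path) {
    int slash = path.indexOf('/');
    return slash < 0 ? path : path.substring(0, slash);
  }

  // Answered from the cached listing, with no further file system call.
  boolean hasBase() {
    for (String p : paths) {
      if (topDir(p).startsWith("base_")) {
        return true;
      }
    }
    return false;
  }

  // Distinct delta directories, in listing order, from the same snapshot.
  List<String> deltaDirs() {
    Set<String> dirs = new LinkedHashSet<>();
    for (String p : paths) {
      String top = topDir(p);
      if (top.startsWith("delta_")) {
        dirs.add(top);
      }
    }
    return new ArrayList<>(dirs);
  }
}
```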



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-21210) CombineHiveInputFormat Thread Pool Sizing

2019-02-06 Thread BELUGA BEHR (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BELUGA BEHR updated HIVE-21210:
---
Status: Open  (was: Patch Available)

> CombineHiveInputFormat Thread Pool Sizing
> -
>
> Key: HIVE-21210
> URL: https://issues.apache.org/jira/browse/HIVE-21210
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 4.0.0, 3.2.0
>Reporter: BELUGA BEHR
>Assignee: BELUGA BEHR
>Priority: Major
> Attachments: HIVE-21210.1.patch, HIVE-21210.2.patch, 
> HIVE-21210.3.patch, HIVE-21210.4.patch, HIVE-21210.5.patch, 
> HIVE-21210.6.patch, HIVE-21210.7.patch
>
>
> Threadpools.
> Hive uses threadpools in several different places and each implementation is 
> a little different and requires different configurations. I think that Hive 
> needs to rein in and standardize the way that threadpools are used and 
> threadpools should scale automatically without manual configuration. At any 
> given time, there are many hundreds of threads running in the HS2 as the 
> number of simultaneous connections increases and they surely cause contention 
> with one-another.
> Here is an example:
> {code:java|title=CombineHiveInputFormat.java}
>   // max number of threads we can use to check non-combinable paths
>   private static final int MAX_CHECK_NONCOMBINABLE_THREAD_NUM = 50;
>   private static final int DEFAULT_NUM_PATH_PER_THREAD = 100;
> {code}
> When building the splits for a MR job, there are up to 50 threads running per 
> query and there is not much scaling here; it is simply a ratio of 1 thread per 
> 100 files.  This implies that processing 5000 files takes 50 threads and, beyond 
> that, 50 threads are still used. Many Hive jobs these days involve more than 
> 5000 files, so it does not scale well at larger sizes.
> This is not configurable (even manually), it doesn't change when the hardware 
> specs increase, and 50 threads seems like a lot when a service must support 
> up to 80 connections:
> [https://www.cloudera.com/documentation/enterprise/5/latest/topics/admin_hive_tuning.html]
> Not to mention, I have never seen a scenario where HS2 is running on a host 
> all by itself and has the entire system dedicated to it. Therefore it should 
> be more friendly and spin up fewer threads.
> I am attaching a patch here that provides a few features:
>  * Common module that produces {{ExecutorService}} which caps the number of 
> threads it spins up at the number of processors a host has. Keep in mind that 
> a class may submit as many work units ({{Callables}}) as it would like, but 
> the number of threads in the pool is capped.
>  * Common module for partitioning work. That is, allow for a generic 
> framework for dividing work into partitions (i.e. batches)
>  * Modify {{CombineHiveInputFormat}} to take advantage of both modules, 
> performing its same duties in a more Java OO way than is currently implemented
>  * Add a partitioning (batching) implementation that enforces partitioning of 
> a {{Collection}} based on the natural log of the {{Collection}} size so that 
> it scales more slowly than a simple 1:100 ratio.
>  * Simplify unit test code for {{CombineHiveInputFormat}}
> My hope is to introduce these tools to {{CombineHiveInputFormat}} and then to 
> drop it into other places.  One of the things I will introduce here is a 
> "direct thread" {{ExecutorService}} so that even if there is a configuration 
> for a thread pool to be disabled, it will still use an {{ExecutorService}} so 
> that the project can avoid logic like "if this function is serviced by a 
> thread pool, use a {{ExecutorService}} (and remember to close it later!) 
> otherwise, create a single thread" so that things like [HIVE-16949] can be 
> avoided in the future.  Everything will just use an {{ExecutorService}}.
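
A "direct thread" {{ExecutorService}}, as mentioned at the end of the description, can be built on the JDK's {{AbstractExecutorService}} by running each task in the calling thread. The sketch below is one possible shape, not the class from the patch; {{DirectExecutorSketch}} is a hypothetical name.

```java
import java.util.Collections;
import java.util.List;
import java.util.concurrent.AbstractExecutorService;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.TimeUnit;

// Hypothetical executor that runs every task synchronously in the caller.
final class DirectExecutorSketch extends AbstractExecutorService {
  private volatile boolean shutdown;

  @Override
  public void execute(Runnable command) {
    if (shutdown) {
      throw new RejectedExecutionException("executor is shut down");
    }
    // No worker thread: the submitting thread does the work itself.
    command.run();
  }

  @Override
  public void shutdown() {
    shutdown = true;
  }

  @Override
  public List<Runnable> shutdownNow() {
    shutdown = true;
    // Nothing is ever queued, so there is nothing to hand back.
    return Collections.emptyList();
  }

  @Override
  public boolean isShutdown() {
    return shutdown;
  }

  @Override
  public boolean isTerminated() {
    // Tasks finish before execute() returns, so shutdown implies terminated.
    return shutdown;
  }

  @Override
  public boolean awaitTermination(long timeout, TimeUnit unit) {
    return shutdown;
  }
}
```

Because {{submit}} and {{invokeAll}} are inherited and route through {{execute}}, calling code can be written against {{ExecutorService}} alone and does not need a separate single-threaded branch.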



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-21224) Upgrade tests JUnit3 to JUnit4

2019-02-06 Thread Bruno Pusztahazi (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bruno Pusztahazi updated HIVE-21224:

Status: Open  (was: Patch Available)

> Upgrade tests JUnit3 to JUnit4
> --
>
> Key: HIVE-21224
> URL: https://issues.apache.org/jira/browse/HIVE-21224
> Project: Hive
>  Issue Type: Improvement
>Reporter: Bruno Pusztahazi
>Assignee: Bruno Pusztahazi
>Priority: Major
> Attachments: HIVE-21224.1.patch, HIVE-21224.2.patch
>
>
> Old JUnit3 tests should be upgraded to JUnit4



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-21224) Upgrade tests JUnit3 to JUnit4

2019-02-06 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16762176#comment-16762176
 ] 

Hive QA commented on HIVE-21224:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12957818/HIVE-21224.2.patch

{color:red}ERROR:{color} -1 due to build exiting with an error

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-Build/15972/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/15972/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-15972/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hiveptest/working/scratch/source-prep.sh' failed with exit 
status 1 and output '+ date '+%Y-%m-%d %T.%3N'
2019-02-06 21:50:19.488
+ [[ -n /usr/lib/jvm/java-8-openjdk-amd64 ]]
+ export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ export 
PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ 
PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'MAVEN_OPTS=-Xmx1g '
+ MAVEN_OPTS='-Xmx1g '
+ cd /data/hiveptest/working/
+ tee /data/hiveptest/logs/PreCommit-HIVE-Build-15972/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ git = \s\v\n ]]
+ [[ git = \g\i\t ]]
+ [[ -z master ]]
+ [[ -d apache-github-source-source ]]
+ [[ ! -d apache-github-source-source/.git ]]
+ [[ ! -d apache-github-source-source ]]
+ date '+%Y-%m-%d %T.%3N'
2019-02-06 21:50:19.492
+ cd apache-github-source-source
+ git fetch origin
+ git reset --hard HEAD
HEAD is now at fae6256 HIVE-21214 : MoveTask : Use attemptId instead of file 
size for deduplication of files compareTempOrDuplicateFiles() (Deepak Jaiswal, 
reviewed by Jason Dere)
+ git clean -f -d
Removing standalone-metastore/metastore-server/src/gen/
+ git checkout master
Already on 'master'
Your branch is up-to-date with 'origin/master'.
+ git reset --hard origin/master
HEAD is now at fae6256 HIVE-21214 : MoveTask : Use attemptId instead of file 
size for deduplication of files compareTempOrDuplicateFiles() (Deepak Jaiswal, 
reviewed by Jason Dere)
+ git merge --ff-only origin/master
Already up-to-date.
+ date '+%Y-%m-%d %T.%3N'
2019-02-06 21:50:20.244
+ rm -rf ../yetus_PreCommit-HIVE-Build-15972
+ mkdir ../yetus_PreCommit-HIVE-Build-15972
+ git gc
+ cp -R . ../yetus_PreCommit-HIVE-Build-15972
+ mkdir /data/hiveptest/logs/PreCommit-HIVE-Build-15972/yetus
+ patchCommandPath=/data/hiveptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hiveptest/working/scratch/build.patch
+ [[ -f /data/hiveptest/working/scratch/build.patch ]]
+ chmod +x /data/hiveptest/working/scratch/smart-apply-patch.sh
+ /data/hiveptest/working/scratch/smart-apply-patch.sh 
/data/hiveptest/working/scratch/build.patch
Going to apply patch with: git apply -p0
/data/hiveptest/working/scratch/build.patch:2087: trailing whitespace.
 */ 
/data/hiveptest/working/scratch/build.patch:4575: trailing whitespace.
 */ 
/data/hiveptest/working/scratch/build.patch:5979: trailing whitespace.
 */ 
/data/hiveptest/working/scratch/build.patch:6852: trailing whitespace.
  fail("Field reports not null but object is null (class " + 
complexFieldObj.getClass().getName() + 
/data/hiveptest/working/scratch/build.patch:6862: trailing whitespace.
  fail("Field reports null but object is not null (class " + 
expectedObject.getClass().getName() + 
warning: 5 lines add whitespace errors.
+ [[ maven == \m\a\v\e\n ]]
+ rm -rf /data/hiveptest/working/maven/org/apache/hive
+ mvn -B clean install -DskipTests -T 4 -q 
-Dmaven.repo.local=/data/hiveptest/working/maven
protoc-jar: executing: [/tmp/protoc797468535851606821.exe, --version]
libprotoc 2.5.0
protoc-jar: executing: [/tmp/protoc797468535851606821.exe, 
-I/data/hiveptest/working/apache-github-source-source/standalone-metastore/metastore-common/src/main/protobuf/org/apache/hadoop/hive/metastore,
 
--java_out=/data/hiveptest/working/apache-github-source-source/standalone-metastore/metastore-common/target/generated-sources,
 
/data/hiveptest/working/apache-github-source-source/standalone-metastore/metastore-common/src/main/protobuf/org/apache/hadoop/hive/metastore/metastore.proto]
ANTLR Parser Generator  Version 3.5.2
protoc-jar: executing: [/tmp/protoc8328222376437319475.exe, --version]
libprotoc 2.5.0
ANTLR Parser Generator  Version 3.5.2
Output file 
/data/hiveptest/working/apache-github-source-source/standalone-metastore/metastore-server/target/generated-sources/org/apache/hadoop/hive/metastore/parser/FilterParser.java
 does not exist: must build 

[jira] [Commented] (HIVE-21210) CombineHiveInputFormat Thread Pool Sizing

2019-02-06 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16762172#comment-16762172
 ] 

Hive QA commented on HIVE-21210:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12957816/HIVE-21210.6.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 15773 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver[encryption_insert_partition_static] (batchId=185)
org.apache.hadoop.hive.llap.security.TestLlapSignerImpl.testSigning (batchId=336)
{noformat}

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-Build/15971/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/15971/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-15971/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12957816 - PreCommit-HIVE-Build

> CombineHiveInputFormat Thread Pool Sizing
> -
>
> Key: HIVE-21210
> URL: https://issues.apache.org/jira/browse/HIVE-21210
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 4.0.0, 3.2.0
>Reporter: BELUGA BEHR
>Assignee: BELUGA BEHR
>Priority: Major
> Attachments: HIVE-21210.1.patch, HIVE-21210.2.patch, 
> HIVE-21210.3.patch, HIVE-21210.4.patch, HIVE-21210.5.patch, HIVE-21210.6.patch
>
>
> Threadpools.
> Hive uses threadpools in several different places and each implementation is 
> a little different and requires different configurations. I think that Hive 
> needs to rein in and standardize the way that threadpools are used and 
> threadpools should scale automatically without manual configuration. At any 
> given time, there are many hundreds of threads running in the HS2 as the 
> number of simultaneous connections increases and they surely cause contention 
> with one-another.
> Here is an example:
> {code:java|title=CombineHiveInputFormat.java}
>   // max number of threads we can use to check non-combinable paths
>   private static final int MAX_CHECK_NONCOMBINABLE_THREAD_NUM = 50;
>   private static final int DEFAULT_NUM_PATH_PER_THREAD = 100;
> {code}
> When building the splits for a MR job, there are up to 50 threads running per 
> query and there is not much scaling here; it is simply a ratio of 1 thread per 
> 100 files.  This implies that processing 5000 files takes 50 threads and, beyond 
> that, 50 threads are still used. Many Hive jobs these days involve more than 
> 5000 files, so it does not scale well at larger sizes.
> This is not configurable (even manually), it doesn't change when the hardware 
> specs increase, and 50 threads seems like a lot when a service must support 
> up to 80 connections:
> [https://www.cloudera.com/documentation/enterprise/5/latest/topics/admin_hive_tuning.html]
> Not to mention, I have never seen a scenario where HS2 is running on a host 
> all by itself and has the entire system dedicated to it. Therefore it should 
> be more friendly and spin up fewer threads.
> I am attaching a patch here that provides a few features:
>  * Common module that produces {{ExecutorService}} which caps the number of 
> threads it spins up at the number of processors a host has. Keep in mind that 
> a class may submit as many work units ({{Callables}}) as it would like, but 
> the number of threads in the pool is capped.
>  * Common module for partitioning work. That is, allow for a generic 
> framework for dividing work into partitions (i.e. batches)
>  * Modify {{CombineHiveInputFormat}} to take advantage of both modules, 
> performing its same duties in a more Java OO way than is currently implemented
>  * Add a partitioning (batching) implementation that enforces partitioning of 
> a {{Collection}} based on the natural log of the {{Collection}} size so that 
> it scales more slowly than a simple 1:100 ratio.
>  * Simplify unit test code for {{CombineHiveInputFormat}}
> My hope is to introduce these tools to {{CombineHiveInputFormat}} and then to 
> drop it into other places.  One of the things I will introduce here is a 
> "direct thread" {{ExecutorService}} so that even if there is a configuration 
> for a thread pool to be disabled, it will still use an {{ExecutorService}} so 
> that the project can avoid logic like "if this function is serviced by a 
> thread pool, use a {{ExecutorService}} (and remember to 

[jira] [Commented] (HIVE-20758) Constraints: Show create table does not show constraints

2019-02-06 Thread Karen Coppage (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16762164#comment-16762164
 ] 

Karen Coppage commented on HIVE-20758:
--

create_with_constraints.q probably has everything you need :)

> Constraints: Show create table does not show constraints
> 
>
> Key: HIVE-20758
> URL: https://issues.apache.org/jira/browse/HIVE-20758
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Gopal V
>Assignee: Barnabas Maidics
>Priority: Major
> Attachments: HIVE-20758.1.patch, HIVE-20758.2.patch, 
> HIVE-20758.3.patch, Screen Shot 2019-01-23 at 11.52.04.png
>
>
> Even though the desc formatted shows the constraints, the show create table 
> does not
> {code}
> | # Primary Key | NULL | NULL |
> | Table: | tpcds_bin_partitioned_orc_1.inventory | NULL |
> | Constraint Name: | pk_in | NULL |
> | Column Names: | inv_date_sk | inv_item_sk |
> | | NULL | NULL |
> | # Foreign Keys | NULL | NULL |
> | Table: | tpcds_bin_partitioned_orc_1.inventory | NULL |
> | Constraint Name: | inv_d | NULL |
> | Parent Column Name:tpcds_bin_partitioned_orc_1.date_dim.d_date_sk | Column Name:inv_date_sk | Key Sequence:1 |
> | | NULL | NULL |
> | Constraint Name: | inv_i | NULL |
> | Parent Column Name:tpcds_bin_partitioned_orc_1.item.i_item_sk | Column Name:inv_item_sk | Key Sequence:1 |
> | | NULL | NULL |
> | Constraint Name: | inv_w | NULL |
> | Parent Column Name:tpcds_bin_partitioned_orc_1.warehouse.w_warehouse_sk | Column Name:inv_warehouse_sk | Key Sequence:1 |
> | | NULL | NULL |
> {code}
> But 
> {code}
> ++
> |   createtab_stmt   |
> ++
> | CREATE TABLE `inventory`(  |
> |   `inv_item_sk` bigint,|
> |   `inv_warehouse_sk` bigint,   |
> |   `inv_quantity_on_hand` int,  |
> |   `inv_date_sk` bigint)|
> | ROW FORMAT SERDE   |
> |   'org.apache.hadoop.hive.ql.io.orc.OrcSerde'  |
> | STORED AS INPUTFORMAT  |
> |   'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'  |
> | OUTPUTFORMAT   |
> |   'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat' |
> | LOCATION   |
> |   'hdfs:///warehouse/tablespace/managed/hive/tpcds_bin_partitioned_orc_1.db/inventory' |
> | TBLPROPERTIES (|
> |   'bucketing_version'='2', |
> |   'transactional'='true',  |
> |   'transactional_properties'='default',|
> |   'transient_lastDdlTime'='1539710410')|
> 

[jira] [Updated] (HIVE-21071) Improve getInputSummary

2019-02-06 Thread BELUGA BEHR (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BELUGA BEHR updated HIVE-21071:
---
Status: Open  (was: Patch Available)

> Improve getInputSummary
> ---
>
> Key: HIVE-21071
> URL: https://issues.apache.org/jira/browse/HIVE-21071
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Affects Versions: 4.0.0, 3.2.0
>Reporter: BELUGA BEHR
>Assignee: BELUGA BEHR
>Priority: Major
> Attachments: HIVE-21071.1.patch, HIVE-21071.10.patch, 
> HIVE-21071.11.patch, HIVE-21071.2.patch, HIVE-21071.3.patch, 
> HIVE-21071.4.patch, HIVE-21071.5.patch, HIVE-21071.6.patch, 
> HIVE-21071.7.patch, HIVE-21071.8.patch, HIVE-21071.9.patch
>
>
> There is a global lock in the {{getInputSummary}} code, so it is important 
> that it be fast.  The current implementation has quite a bit of overhead that 
> can be re-engineered.
> For example, the current implementation keeps a map of File Path to 
> ContentSummary object.  This map is populated by several threads 
> concurrently. The method then loops through the map, in a single thread, at 
> the end to add up all of the ContentSummary objects and ignores the paths.  
> The code can be re-engineered to not use a map, or a collection at all, to 
> store the results and instead just keep a running tally.  By keeping a tally, 
> there is no {{O\(n)}} operation at the end to perform the addition.
> There are other things that can be improved.  The method returns an object which 
> is never used anywhere, so the method should be changed to a void return type.
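
The running-tally idea in the description can be illustrated with {{java.util.concurrent.atomic.LongAdder}}: worker threads add their figures as they go, so no map is filled and no final pass over it is needed. This is a sketch under assumptions, not the patch itself; {{InputSummaryTallySketch}} is a hypothetical class, and the three counters are only assumed to correspond to the totals that were previously summed from the ContentSummary objects.

```java
import java.util.concurrent.atomic.LongAdder;

// Hypothetical tally; replaces "map of path to summary, then sum at the end".
final class InputSummaryTallySketch {
  private final LongAdder length = new LongAdder();
  private final LongAdder fileCount = new LongAdder();
  private final LongAdder directoryCount = new LongAdder();

  // Safe to call from many threads at once; the path itself is never stored.
  void add(long len, long files, long dirs) {
    length.add(len);
    fileCount.add(files);
    directoryCount.add(dirs);
  }

  // Totals are exact once all worker threads have finished adding.
  long totalLength() {
    return length.sum();
  }

  long totalFileCount() {
    return fileCount.sum();
  }

  long totalDirectoryCount() {
    return directoryCount.sum();
  }
}
```

Each worker would call {{add(...)}} once per path it inspects, and the caller reads the three totals after the workers complete.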



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-21071) Improve getInputSummary

2019-02-06 Thread BELUGA BEHR (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BELUGA BEHR updated HIVE-21071:
---
Status: Patch Available  (was: Open)

Added another unit test.

> Improve getInputSummary
> ---
>
> Key: HIVE-21071
> URL: https://issues.apache.org/jira/browse/HIVE-21071
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Affects Versions: 4.0.0, 3.2.0
>Reporter: BELUGA BEHR
>Assignee: BELUGA BEHR
>Priority: Major
> Attachments: HIVE-21071.1.patch, HIVE-21071.10.patch, 
> HIVE-21071.11.patch, HIVE-21071.2.patch, HIVE-21071.3.patch, 
> HIVE-21071.4.patch, HIVE-21071.5.patch, HIVE-21071.6.patch, 
> HIVE-21071.7.patch, HIVE-21071.8.patch, HIVE-21071.9.patch
>
>
> There is a global lock in the {{getInputSummary}} code, so it is important 
> that it be fast.  The current implementation has quite a bit of overhead that 
> can be re-engineered.
> For example, the current implementation keeps a map of File Path to 
> ContentSummary object.  This map is populated by several threads 
> concurrently. The method then loops through the map, in a single thread, at 
> the end to add up all of the ContentSummary objects and ignores the paths.  
> The code can be be re-engineered to not use a map, or a collection at all, to 
> store the results and instead just keep a running tally.  By keeping a tally, 
> there is no {{O\(n)}} operation at the end to perform the addition.
> There are other things can be improved.  The method returns an object which 
> is never used anywhere, so change method to void return type.





[jira] [Updated] (HIVE-21071) Improve getInputSummary

2019-02-06 Thread BELUGA BEHR (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BELUGA BEHR updated HIVE-21071:
---
Attachment: HIVE-21071.11.patch

> Improve getInputSummary
> ---
>
> Key: HIVE-21071
> URL: https://issues.apache.org/jira/browse/HIVE-21071
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Affects Versions: 4.0.0, 3.2.0
>Reporter: BELUGA BEHR
>Assignee: BELUGA BEHR
>Priority: Major
> Attachments: HIVE-21071.1.patch, HIVE-21071.10.patch, 
> HIVE-21071.11.patch, HIVE-21071.2.patch, HIVE-21071.3.patch, 
> HIVE-21071.4.patch, HIVE-21071.5.patch, HIVE-21071.6.patch, 
> HIVE-21071.7.patch, HIVE-21071.8.patch, HIVE-21071.9.patch
>
>
> There is a global lock in the {{getInputSummary}} code, so it is important 
> that it be fast.  The current implementation has quite a bit of overhead that 
> can be re-engineered.
> For example, the current implementation keeps a map of File Path to 
> ContentSummary object.  This map is populated by several threads 
> concurrently. The method then loops through the map, in a single thread, at 
> the end to add up all of the ContentSummary objects and ignores the paths.  
> The code can be re-engineered to not use a map, or a collection at all, to 
> store the results and instead just keep a running tally.  By keeping a tally, 
> there is no {{O(n)}} operation at the end to perform the addition.
> There are other things that can be improved.  The method returns an object 
> which is never used anywhere, so the method should be changed to a void 
> return type.





[jira] [Commented] (HIVE-21210) CombineHiveInputFormat Thread Pool Sizing

2019-02-06 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16762147#comment-16762147
 ] 

Hive QA commented on HIVE-21210:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m 46s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  7m  0s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 22s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 50s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m 34s{color} | {color:blue} common in master has 65 extant Findbugs warnings. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  4m  4s{color} | {color:blue} ql in master has 2298 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 14s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 27s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 48s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 23s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 23s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 13s{color} | {color:red} common: The patch generated 2 new + 0 unchanged - 0 fixed = 2 total (was 0) {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 37s{color} | {color:red} ql: The patch generated 1 new + 7 unchanged - 49 fixed = 8 total (was 56) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m 56s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 16s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red}  0m 14s{color} | {color:red} The patch generated 1 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 28m 28s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /data/hiveptest/working/yetus_PreCommit-HIVE-Build-15971/dev-support/hive-personality.sh |
| git revision | master / fae6256 |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| checkstyle | http://104.198.109.242/logs//PreCommit-HIVE-Build-15971/yetus/diff-checkstyle-common.txt |
| checkstyle | http://104.198.109.242/logs//PreCommit-HIVE-Build-15971/yetus/diff-checkstyle-ql.txt |
| asflicense | http://104.198.109.242/logs//PreCommit-HIVE-Build-15971/yetus/patch-asflicense-problems.txt |
| modules | C: common ql U: . |
| Console output | http://104.198.109.242/logs//PreCommit-HIVE-Build-15971/yetus.txt |
| Powered by | Apache Yetus http://yetus.apache.org |


This message was automatically generated.



> CombineHiveInputFormat Thread Pool Sizing
> -
>
> Key: HIVE-21210
> URL: https://issues.apache.org/jira/browse/HIVE-21210
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 4.0.0, 3.2.0
>Reporter: BELUGA BEHR
>Assignee: BELUGA BEHR
>Priority: Major
> Attachments: HIVE-21210.1.patch, HIVE-21210.2.patch, 
> HIVE-21210.3.patch, HIVE-21210.4.patch, HIVE-21210.5.patch, HIVE-21210.6.patch
>
>
> Threadpools.
> Hive uses threadpools in several different places and each implementation is 
> a little different and requires different configurations. I think that Hive 
> 

[jira] [Resolved] (HIVE-21173) Upgrade to the latest release of Apache Thrift

2019-02-06 Thread James E. King III (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James E. King III resolved HIVE-21173.
--
Resolution: Duplicate

> Upgrade to the latest release of Apache Thrift
> --
>
> Key: HIVE-21173
> URL: https://issues.apache.org/jira/browse/HIVE-21173
> Project: Hive
>  Issue Type: Bug
>  Components: Thrift API
>Reporter: James E. King III
>Priority: Major
>
> The project currently depends on libthrift-0.9.3; however, Thrift released 
> 0.12.0 on 2019-JAN-04.  This release includes a security fix for 
> THRIFT-4506 (CVE-2018-1320).  Updating Thrift to the latest version will 
> remove that vulnerability.
> Also note the Apache Thrift project does not publish "libfb303" any longer.  
> fb303 is contributed code (in '/contrib') and it has not been maintained.





[jira] [Comment Edited] (HIVE-21217) Optimize range calculation for PTF

2019-02-06 Thread Vineet Garg (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16761332#comment-16761332
 ] 

Vineet Garg edited comment on HIVE-21217 at 2/6/19 8:50 PM:


[~szita] Would you mind providing an example query? Is this only valid for 
queries containing {{RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW}} ?


was (Author: vgarg):
[~szita] Would you mind providing an example query? Is this only valid for 
queries containing {{ RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW}} ?

> Optimize range calculation for PTF
> --
>
> Key: HIVE-21217
> URL: https://issues.apache.org/jira/browse/HIVE-21217
> Project: Hive
>  Issue Type: Improvement
>Reporter: Adam Szita
>Assignee: Adam Szita
>Priority: Major
>
> During window function execution Hive has to iterate on neighbouring rows of 
> the current row to find the beginning and end of the proper range (on which 
> the aggregation will be executed).
> When we're using range-based windows and have many rows with the same key 
> value, this can take a lot of time (e.g. with a partition size of 80M, in 
> which we have 2 ranges of 40M rows according to the order-by column, within 
> each 40M rowset we're doing 40M x 40M/2 steps, which is of n^2 time 
> complexity).
> I propose to introduce a cache that keeps track of already calculated range 
> ends so they can be reused in future scans.
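
A minimal sketch of the caching idea proposed above, under loud assumptions: it works on a plain sorted {{long[]}} of order-by values rather than Hive's PTF classes, it keeps only a single cached range, and every name in it is hypothetical. It is not the HIVE-21217 implementation.

```java
/** Hypothetical illustration only; not Hive's PTF code. */
final class RangeEndCacheSketch {
  private final long[] sortedKeys; // order-by values, already sorted
  private int cachedStart = -1;    // first row covered by the cached range
  private int cachedEnd = -1;      // exclusive end of the cached range
  int scans = 0;                   // rows stepped over, for illustration

  RangeEndCacheSketch(long[] sortedKeys) {
    this.sortedKeys = sortedKeys;
  }

  /** Returns the exclusive index just past the last row sharing row's key. */
  int rangeEnd(int row) {
    // Cache hit: every row in [cachedStart, cachedEnd) has the same key,
    // so the previously computed end can be reused without rescanning.
    if (row >= cachedStart && row < cachedEnd) {
      return cachedEnd;
    }
    // Cache miss: scan forward until the key changes.
    int end = row + 1;
    while (end < sortedKeys.length && sortedKeys[end] == sortedKeys[row]) {
      end++;
      scans++;
    }
    cachedStart = row;
    cachedEnd = end;
    return end;
  }
}
```

With the cache, walking the rows of one large group of equal keys in order scans that group once instead of once per row, which is where the quadratic cost described above comes from.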





[jira] [Commented] (HIVE-21071) Improve getInputSummary

2019-02-06 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16762091#comment-16762091
 ] 

Hive QA commented on HIVE-21071:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12957784/HIVE-21071.9.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 15772 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.ql.TestTxnCommands.testMergeOnTezEdges (batchId=327)
{noformat}

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-Build/15969/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/15969/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-15969/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12957784 - PreCommit-HIVE-Build

> Improve getInputSummary
> ---
>
> Key: HIVE-21071
> URL: https://issues.apache.org/jira/browse/HIVE-21071
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Affects Versions: 4.0.0, 3.2.0
>Reporter: BELUGA BEHR
>Assignee: BELUGA BEHR
>Priority: Major
> Attachments: HIVE-21071.1.patch, HIVE-21071.2.patch, 
> HIVE-21071.3.patch, HIVE-21071.4.patch, HIVE-21071.5.patch, 
> HIVE-21071.6.patch, HIVE-21071.7.patch, HIVE-21071.8.patch, HIVE-21071.9.patch
>
>
> There is a global lock in the {{getInputSummary}} code, so it is important 
> that it be fast.  The current implementation has quite a bit of overhead that 
> can be re-engineered.
> For example, the current implementation keeps a map of File Path to 
> ContentSummary object.  This map is populated by several threads 
> concurrently. The method then loops through the map, in a single thread, at 
> the end to add up all of the ContentSummary objects and ignores the paths.  
> The code can be re-engineered to not use a map, or a collection at all, to 
> store the results and instead just keep a running tally.  By keeping a tally, 
> there is no {{O(n)}} operation at the end to perform the addition.
> There are other things that can be improved.  The method returns an object 
> which is never used anywhere, so the method should be changed to a void 
> return type.





[jira] [Updated] (HIVE-21071) Improve getInputSummary

2019-02-06 Thread BELUGA BEHR (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BELUGA BEHR updated HIVE-21071:
---
Status: Patch Available  (was: Open)

I think the last test failure is flaky. Submitting patch again.

Checkstyle errors should be ignored.  They are complaining about test code and 
some things that are outside the scope of this patch (like complaining about 
method signature lengths).

> Improve getInputSummary
> ---
>
> Key: HIVE-21071
> URL: https://issues.apache.org/jira/browse/HIVE-21071
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Affects Versions: 4.0.0, 3.2.0
>Reporter: BELUGA BEHR
>Assignee: BELUGA BEHR
>Priority: Major
> Attachments: HIVE-21071.1.patch, HIVE-21071.10.patch, 
> HIVE-21071.2.patch, HIVE-21071.3.patch, HIVE-21071.4.patch, 
> HIVE-21071.5.patch, HIVE-21071.6.patch, HIVE-21071.7.patch, 
> HIVE-21071.8.patch, HIVE-21071.9.patch
>
>
> There is a global lock in the {{getInputSummary}} code, so it is important 
> that it be fast.  The current implementation has quite a bit of overhead that 
> can be re-engineered.
> For example, the current implementation keeps a map of File Path to 
> ContentSummary object.  This map is populated by several threads 
> concurrently. The method then loops through the map, in a single thread, at 
> the end to add up all of the ContentSummary objects and ignores the paths.  
> The code can be re-engineered to not use a map, or a collection at all, to 
> store the results and instead just keep a running tally.  By keeping a tally, 
> there is no {{O(n)}} operation at the end to perform the addition.
> There are other things that can be improved.  The method returns an object 
> which is never used anywhere, so the method should be changed to a void 
> return type.





[jira] [Updated] (HIVE-21071) Improve getInputSummary

2019-02-06 Thread BELUGA BEHR (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BELUGA BEHR updated HIVE-21071:
---
Attachment: HIVE-21071.10.patch

> Improve getInputSummary
> ---
>
> Key: HIVE-21071
> URL: https://issues.apache.org/jira/browse/HIVE-21071
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Affects Versions: 4.0.0, 3.2.0
>Reporter: BELUGA BEHR
>Assignee: BELUGA BEHR
>Priority: Major
> Attachments: HIVE-21071.1.patch, HIVE-21071.10.patch, 
> HIVE-21071.2.patch, HIVE-21071.3.patch, HIVE-21071.4.patch, 
> HIVE-21071.5.patch, HIVE-21071.6.patch, HIVE-21071.7.patch, 
> HIVE-21071.8.patch, HIVE-21071.9.patch
>
>
> There is a global lock in the {{getInputSummary}} code, so it is important 
> that it be fast.  The current implementation has quite a bit of overhead that 
> can be re-engineered.
> For example, the current implementation keeps a map of File Path to 
> ContentSummary object.  This map is populated by several threads 
> concurrently. The method then loops through the map, in a single thread, at 
> the end to add up all of the ContentSummary objects and ignores the paths.  
> The code can be re-engineered to not use a map, or a collection at all, to 
> store the results and instead just keep a running tally.  By keeping a tally, 
> there is no {{O(n)}} operation at the end to perform the addition.
> There are other things that can be improved.  The method returns an object 
> which is never used anywhere, so the method should be changed to a void 
> return type.





[jira] [Commented] (HIVE-21204) Instrumentation for read/write locks in LLAP

2019-02-06 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16762092#comment-16762092
 ] 

Hive QA commented on HIVE-21204:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12957815/HIVE-21204.2.patch

{color:red}ERROR:{color} -1 due to build exiting with an error

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-Build/15970/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/15970/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-15970/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hiveptest/working/scratch/source-prep.sh' failed with exit 
status 1 and output '+ date '+%Y-%m-%d %T.%3N'
2019-02-06 20:30:59.633
+ [[ -n /usr/lib/jvm/java-8-openjdk-amd64 ]]
+ export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ export 
PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ 
PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'MAVEN_OPTS=-Xmx1g '
+ MAVEN_OPTS='-Xmx1g '
+ cd /data/hiveptest/working/
+ tee /data/hiveptest/logs/PreCommit-HIVE-Build-15970/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ git = \s\v\n ]]
+ [[ git = \g\i\t ]]
+ [[ -z master ]]
+ [[ -d apache-github-source-source ]]
+ [[ ! -d apache-github-source-source/.git ]]
+ [[ ! -d apache-github-source-source ]]
+ date '+%Y-%m-%d %T.%3N'
2019-02-06 20:30:59.637
+ cd apache-github-source-source
+ git fetch origin
+ git reset --hard HEAD
HEAD is now at fae6256 HIVE-21214 : MoveTask : Use attemptId instead of file 
size for deduplication of files compareTempOrDuplicateFiles() (Deepak Jaiswal, 
reviewed by Jason Dere)
+ git clean -f -d
Removing standalone-metastore/metastore-server/src/gen/
+ git checkout master
Already on 'master'
Your branch is up-to-date with 'origin/master'.
+ git reset --hard origin/master
HEAD is now at fae6256 HIVE-21214 : MoveTask : Use attemptId instead of file 
size for deduplication of files compareTempOrDuplicateFiles() (Deepak Jaiswal, 
reviewed by Jason Dere)
+ git merge --ff-only origin/master
Already up-to-date.
+ date '+%Y-%m-%d %T.%3N'
2019-02-06 20:31:00.774
+ rm -rf ../yetus_PreCommit-HIVE-Build-15970
+ mkdir ../yetus_PreCommit-HIVE-Build-15970
+ git gc
+ cp -R . ../yetus_PreCommit-HIVE-Build-15970
+ mkdir /data/hiveptest/logs/PreCommit-HIVE-Build-15970/yetus
+ patchCommandPath=/data/hiveptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hiveptest/working/scratch/build.patch
+ [[ -f /data/hiveptest/working/scratch/build.patch ]]
+ chmod +x /data/hiveptest/working/scratch/smart-apply-patch.sh
+ /data/hiveptest/working/scratch/smart-apply-patch.sh 
/data/hiveptest/working/scratch/build.patch
error: a/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java: does not exist in index
error: a/llap-server/src/java/org/apache/hadoop/hive/llap/cache/SerDeLowLevelCacheImpl.java: does not exist in index
error: a/llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/QueryTracker.java: does not exist in index
error: a/llap-server/src/java/org/apache/hadoop/hive/llap/daemon/services/impl/LlapWebServices.java: does not exist in index
error: a/llap-server/src/java/org/apache/hadoop/hive/llap/io/api/impl/LlapIoImpl.java: does not exist in index
error: a/llap-server/src/java/org/apache/hadoop/hive/llap/io/encoded/SerDeEncodedDataReader.java: does not exist in index
error: a/llap-tez/src/java/org/apache/hadoop/hive/llap/tezplugins/LlapTaskSchedulerService.java: does not exist in index
Going to apply patch with: git apply -p1
+ [[ maven == \m\a\v\e\n ]]
+ rm -rf /data/hiveptest/working/maven/org/apache/hive
+ mvn -B clean install -DskipTests -T 4 -q 
-Dmaven.repo.local=/data/hiveptest/working/maven
protoc-jar: executing: [/tmp/protoc1193346556322853212.exe, --version]
protoc-jar: executing: [/tmp/protoc1193346556322853212.exe, 
-I/data/hiveptest/working/apache-github-source-source/standalone-metastore/metastore-common/src/main/protobuf/org/apache/hadoop/hive/metastore,
 
--java_out=/data/hiveptest/working/apache-github-source-source/standalone-metastore/metastore-common/target/generated-sources,
 
/data/hiveptest/working/apache-github-source-source/standalone-metastore/metastore-common/src/main/protobuf/org/apache/hadoop/hive/metastore/metastore.proto]
libprotoc 2.5.0
ANTLR Parser Generator  Version 3.5.2
protoc-jar: executing: [/tmp/protoc3624403729085032937.exe, --version]
libprotoc 2.5.0
ANTLR Parser Generator  Version 3.5.2
Output file 

[jira] [Updated] (HIVE-21071) Improve getInputSummary

2019-02-06 Thread BELUGA BEHR (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BELUGA BEHR updated HIVE-21071:
---
Status: Open  (was: Patch Available)

> Improve getInputSummary
> ---
>
> Key: HIVE-21071
> URL: https://issues.apache.org/jira/browse/HIVE-21071
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Affects Versions: 4.0.0, 3.2.0
>Reporter: BELUGA BEHR
>Assignee: BELUGA BEHR
>Priority: Major
> Attachments: HIVE-21071.1.patch, HIVE-21071.10.patch, 
> HIVE-21071.2.patch, HIVE-21071.3.patch, HIVE-21071.4.patch, 
> HIVE-21071.5.patch, HIVE-21071.6.patch, HIVE-21071.7.patch, 
> HIVE-21071.8.patch, HIVE-21071.9.patch
>
>
> There is a global lock in the {{getInputSummary}} code, so it is important 
> that it be fast.  The current implementation has quite a bit of overhead that 
> can be re-engineered.
> For example, the current implementation keeps a map of File Path to 
> ContentSummary object.  This map is populated by several threads 
> concurrently. The method then loops through the map, in a single thread, at 
> the end to add up all of the ContentSummary objects and ignores the paths.  
> The code can be re-engineered to not use a map, or a collection at all, to 
> store the results and instead just keep a running tally.  By keeping a tally, 
> there is no {{O(n)}} operation at the end to perform the addition.
> There are other things that can be improved.  The method returns an object 
> which is never used anywhere, so the method should be changed to a void 
> return type.





[jira] [Updated] (HIVE-21224) Upgrade tests JUnit3 to JUnit4

2019-02-06 Thread Bruno Pusztahazi (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bruno Pusztahazi updated HIVE-21224:

Status: Open  (was: Patch Available)

> Upgrade tests JUnit3 to JUnit4
> --
>
> Key: HIVE-21224
> URL: https://issues.apache.org/jira/browse/HIVE-21224
> Project: Hive
>  Issue Type: Improvement
>Reporter: Bruno Pusztahazi
>Assignee: Bruno Pusztahazi
>Priority: Major
> Attachments: HIVE-21224.1.patch, HIVE-21224.2.patch
>
>
> Old JUnit3 tests should be upgraded to JUnit4





[jira] [Updated] (HIVE-21224) Upgrade tests JUnit3 to JUnit4

2019-02-06 Thread Bruno Pusztahazi (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bruno Pusztahazi updated HIVE-21224:

Attachment: HIVE-21224.2.patch

> Upgrade tests JUnit3 to JUnit4
> --
>
> Key: HIVE-21224
> URL: https://issues.apache.org/jira/browse/HIVE-21224
> Project: Hive
>  Issue Type: Improvement
>Reporter: Bruno Pusztahazi
>Assignee: Bruno Pusztahazi
>Priority: Major
> Attachments: HIVE-21224.1.patch, HIVE-21224.2.patch
>
>
> Old JUnit3 tests should be upgraded to JUnit4





[jira] [Updated] (HIVE-21224) Upgrade tests JUnit3 to JUnit4

2019-02-06 Thread Bruno Pusztahazi (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bruno Pusztahazi updated HIVE-21224:

Status: Patch Available  (was: Open)

> Upgrade tests JUnit3 to JUnit4
> --
>
> Key: HIVE-21224
> URL: https://issues.apache.org/jira/browse/HIVE-21224
> Project: Hive
>  Issue Type: Improvement
>Reporter: Bruno Pusztahazi
>Assignee: Bruno Pusztahazi
>Priority: Major
> Attachments: HIVE-21224.1.patch, HIVE-21224.2.patch
>
>
> Old JUnit3 tests should be upgraded to JUnit4





[jira] [Commented] (HIVE-21071) Improve getInputSummary

2019-02-06 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16762057#comment-16762057
 ] 

Hive QA commented on HIVE-21071:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  8m 29s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  5s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 42s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  3m 59s{color} | {color:blue} ql in master has 2298 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  1s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 32s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  7s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m  7s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 42s{color} | {color:red} ql: The patch generated 10 new + 132 unchanged - 27 fixed = 142 total (was 159) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m 20s{color} | {color:green} ql generated 0 new + 2295 unchanged - 3 fixed = 2295 total (was 2298) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 59s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red}  0m 14s{color} | {color:red} The patch generated 1 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 24m 39s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /data/hiveptest/working/yetus_PreCommit-HIVE-Build-15969/dev-support/hive-personality.sh |
| git revision | master / fae6256 |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| checkstyle | http://104.198.109.242/logs//PreCommit-HIVE-Build-15969/yetus/diff-checkstyle-ql.txt |
| asflicense | http://104.198.109.242/logs//PreCommit-HIVE-Build-15969/yetus/patch-asflicense-problems.txt |
| modules | C: ql U: ql |
| Console output | http://104.198.109.242/logs//PreCommit-HIVE-Build-15969/yetus.txt |
| Powered by | Apache Yetus http://yetus.apache.org |


This message was automatically generated.



> Improve getInputSummary
> ---
>
> Key: HIVE-21071
> URL: https://issues.apache.org/jira/browse/HIVE-21071
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Affects Versions: 4.0.0, 3.2.0
>Reporter: BELUGA BEHR
>Assignee: BELUGA BEHR
>Priority: Major
> Attachments: HIVE-21071.1.patch, HIVE-21071.2.patch, 
> HIVE-21071.3.patch, HIVE-21071.4.patch, HIVE-21071.5.patch, 
> HIVE-21071.6.patch, HIVE-21071.7.patch, HIVE-21071.8.patch, HIVE-21071.9.patch
>
>
> There is a global lock in the {{getInputSummary}} code, so it is important 
> that it be fast.  The current implementation has quite a bit of overhead that 
> can be re-engineered.
> For example, the current implementation keeps a map of File Path to 
> ContentSummary object.  This map is populated by several threads 
> concurrently. The method then loops through the map, in a single thread, at 
> the end to add up all of the ContentSummary objects and ignores the paths.  
> The code can be re-engineered to not use a map, or a collection at all, to 
> store the results and instead just keep a running tally.  By keeping a tally, 
> there is no {{O(n)}} operation at the end to perform the addition.
> There are other things that can be improved.  The method returns an object which 
> is never used 

[jira] [Updated] (HIVE-21210) CombineHiveInputFormat Thread Pool Sizing

2019-02-06 Thread BELUGA BEHR (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BELUGA BEHR updated HIVE-21210:
---
Status: Patch Available  (was: Open)

One more patch to fix whitespace and JavaDoc issues.

> CombineHiveInputFormat Thread Pool Sizing
> -
>
> Key: HIVE-21210
> URL: https://issues.apache.org/jira/browse/HIVE-21210
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 4.0.0, 3.2.0
>Reporter: BELUGA BEHR
>Assignee: BELUGA BEHR
>Priority: Major
> Attachments: HIVE-21210.1.patch, HIVE-21210.2.patch, 
> HIVE-21210.3.patch, HIVE-21210.4.patch, HIVE-21210.5.patch, HIVE-21210.6.patch
>
>
> Threadpools.
> Hive uses threadpools in several different places and each implementation is 
> a little different and requires different configurations. I think that Hive 
> needs to rein in and standardize the way that threadpools are used and 
> threadpools should scale automatically without manual configuration. At any 
> given time, there are many hundreds of threads running in the HS2 as the 
> number of simultaneous connections increases and they surely cause contention 
> with one another.
> Here is an example:
> {code:java|title=CombineHiveInputFormat.java}
>   // max number of threads we can use to check non-combinable paths
>   private static final int MAX_CHECK_NONCOMBINABLE_THREAD_NUM = 50;
>   private static final int DEFAULT_NUM_PATH_PER_THREAD = 100;
> {code}
> When building the splits for a MR job, there are up to 50 threads running per 
> query and there is not much scaling here; it is simply a ratio of 1 thread per 
> 100 files.  This implies that processing 5000 files takes 50 threads, and 
> beyond that, 50 threads are still used. Many Hive jobs these days involve more 
> than 5000 files, so it does not scale well at bigger sizes.
> This is not configurable (even manually), it doesn't change when the hardware 
> specs increase, and 50 threads seems like a lot when a service must support 
> up to 80 connections:
> [https://www.cloudera.com/documentation/enterprise/5/latest/topics/admin_hive_tuning.html]
> Not to mention, I have never seen a scenario where HS2 is running on a host 
> all by itself and has the entire system dedicated to it. Therefore it should 
> be more friendly and spin up fewer threads.
> I am attaching a patch here that provides a few features:
>  * Common module that produces an {{ExecutorService}} which caps the number of 
> threads it spins up at the number of processors a host has. Keep in mind that 
> a class may submit as many work units ({{Callable}}s) as it would like, but 
> the number of threads in the pool is capped.
>  * Common module for partitioning work. That is, allow for a generic 
> framework for dividing work into partitions (i.e. batches)
>  * Modify {{CombineHiveInputFormat}} to take advantage of both modules, 
> performing its same duties in a more Java OO way than is currently implemented
>  * Add a partitioning (batching) implementation that enforces partitioning of 
> a {{Collection}} based on the natural log of the {{Collection}} size so that 
> it scales more slowly than a simple 1:100 ratio.
>  * Simplify unit test code for {{CombineHiveInputFormat}}
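
The capped pool and the log-based batching described in the bullets above could look roughly like the following. This is a minimal sketch and not the code in the attached patches: the class PoolSizingSketch and its methods legacyThreadCount, cappedThreadCount, newCappedPool, batchCount and partition are hypothetical names, and legacyThreadCount assumes the current sizing is min(50, ceil(files / 100)), which is only inferred from the two constants quoted above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class PoolSizingSketch {

  // Assumed reading of the current scheme: 1 thread per 100 files, at most 50.
  static int legacyThreadCount(int files) {
    return Math.min(50, (int) Math.ceil(files / 100.0));
  }

  // Never more threads than the host has processors, and never fewer than 1.
  static int cappedThreadCount(int requestedThreads) {
    int cap = Runtime.getRuntime().availableProcessors();
    return Math.max(1, Math.min(requestedThreads, cap));
  }

  // Callers may submit any number of Callables; only the thread count is capped.
  static ExecutorService newCappedPool(int requestedThreads) {
    return Executors.newFixedThreadPool(cappedThreadCount(requestedThreads));
  }

  // Number of batches grows with ln(size) rather than with size / 100.
  static int batchCount(int size) {
    if (size <= 0) {
      return 0;
    }
    int byLog = (int) Math.ceil(Math.log(size));
    return Math.max(1, Math.min(size, byLog));
  }

  // Splits items into batchCount(items.size()) contiguous, near-equal batches.
  static <T> List<List<T>> partition(List<T> items) {
    int batches = batchCount(items.size());
    List<List<T>> result = new ArrayList<>(batches);
    if (batches == 0) {
      return result;
    }
    int base = items.size() / batches;
    int extra = items.size() % batches;
    int start = 0;
    for (int i = 0; i < batches; i++) {
      int end = start + base + (i < extra ? 1 : 0);
      result.add(new ArrayList<>(items.subList(start, end)));
      start = end;
    }
    return result;
  }
}
```

Under these assumptions, 5000 paths are split into 9 batches, whereas the assumed current scheme would start 50 threads for the same input.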
> My hope is to introduce these tools to {{CombineHiveInputFormat}} and then to 
> drop them into other places.  One of the things I will introduce here is a 
> "direct thread" {{ExecutorService}} so that even if there is a configuration 
> for a thread pool to be disabled, it will still use an {{ExecutorService}} so 
> that the project can avoid logic like "if this function is serviced by a 
> thread pool, use an {{ExecutorService}} (and remember to close it later!) 
> otherwise, create a single thread" so that things like [HIVE-16949] can be 
> avoided in the future.  Everything will just use an {{ExecutorService}}.
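
One way to picture the "direct thread" ExecutorService mentioned above is an executor that runs each task on the calling thread, so callers always go through the same interface whether or not a pool is enabled. The sketch below is an illustration under that assumption only: DirectExecutorService and ExecutorChooser are hypothetical names, not classes from the patch, and the termination handling is deliberately simplified.

```java
import java.util.Collections;
import java.util.List;
import java.util.concurrent.AbstractExecutorService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.TimeUnit;

// Runs every submitted task synchronously on the caller's thread.
class DirectExecutorService extends AbstractExecutorService {
  private volatile boolean shutdown;

  @Override
  public void execute(Runnable command) {
    if (shutdown) {
      throw new RejectedExecutionException("executor has been shut down");
    }
    command.run();
  }

  @Override
  public void shutdown() {
    shutdown = true;
  }

  @Override
  public List<Runnable> shutdownNow() {
    shutdown = true;
    return Collections.emptyList(); // nothing is ever queued
  }

  @Override
  public boolean isShutdown() {
    return shutdown;
  }

  // Simplification: tasks finish before execute() returns, so a shut-down
  // executor is treated as terminated.
  @Override
  public boolean isTerminated() {
    return shutdown;
  }

  @Override
  public boolean awaitTermination(long timeout, TimeUnit unit) {
    return shutdown;
  }
}

// Callers ask for an ExecutorService and never branch on "pool or no pool".
class ExecutorChooser {
  static ExecutorService create(boolean poolEnabled, int threads) {
    return poolEnabled
        ? Executors.newFixedThreadPool(threads)
        : new DirectExecutorService();
  }
}
```

Because submit() and invokeAll() are inherited from AbstractExecutorService, the calling code can stay identical in both configurations and only needs to call shutdown() when it is done.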



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-21210) CombineHiveInputFormat Thread Pool Sizing

2019-02-06 Thread BELUGA BEHR (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BELUGA BEHR updated HIVE-21210:
---
Attachment: HIVE-21210.6.patch



[jira] [Updated] (HIVE-21210) CombineHiveInputFormat Thread Pool Sizing

2019-02-06 Thread BELUGA BEHR (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BELUGA BEHR updated HIVE-21210:
---
Status: Open  (was: Patch Available)



[jira] [Updated] (HIVE-21196) Support semijoin reduction on multiple column join

2019-02-06 Thread Deepak Jaiswal (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deepak Jaiswal updated HIVE-21196:
--
Description: 
Currently, a query involving a join on multiple columns creates a separate 
semijoin edge for each key, and each edge in turn creates its own bloom filter, 
as shown below:

EXPLAIN select count(*) from srcpart_date_n7 join srcpart_small_n3 on 
(srcpart_date_n7.key = srcpart_small_n3.key1 and srcpart_date_n7.value = 
srcpart_small_n3.value1)
{code:java}
Map 1 <- Reducer 5 (BROADCAST_EDGE)
Reducer 2 <- Map 1 (SIMPLE_EDGE), Map 4 (SIMPLE_EDGE)
Reducer 3 <- Reducer 2 (CUSTOM_SIMPLE_EDGE)
Reducer 5 <- Map 4 (CUSTOM_SIMPLE_EDGE)
#### A masked pattern was here ####
Vertices:
  Map 1
    Map Operator Tree:
      TableScan
        alias: srcpart_date_n7
        filterExpr: (key is not null and value is not null and (key BETWEEN DynamicValue(RS_7_srcpart_small_n3_key1_min) AND DynamicValue(RS_7_srcpart_small_n3_key1_max) and in_bloom_filter(key, DynamicValue(RS_7_srcpart_small_n3_key1_bloom_filter)))) (type: boolean)
        Statistics: Num rows: 2000 Data size: 356000 Basic stats: COMPLETE Column stats: COMPLETE
        Filter Operator
          predicate: ((key BETWEEN DynamicValue(RS_7_srcpart_small_n3_key1_min) AND DynamicValue(RS_7_srcpart_small_n3_key1_max) and in_bloom_filter(key, DynamicValue(RS_7_srcpart_small_n3_key1_bloom_filter))) and key is not null and value is not null) (type: boolean)
          Statistics: Num rows: 2000 Data size: 356000 Basic stats: COMPLETE Column stats: COMPLETE
          Select Operator
            expressions: key (type: string), value (type: string)
            outputColumnNames: _col0, _col1
            Statistics: Num rows: 2000 Data size: 356000 Basic stats: COMPLETE Column stats: COMPLETE
            Reduce Output Operator
              key expressions: _col0 (type: string), _col1 (type: string)
              sort order: ++
              Map-reduce partition columns: _col0 (type: string), _col1 (type: string)
              Statistics: Num rows: 2000 Data size: 356000 Basic stats: COMPLETE Column stats: COMPLETE
    Execution mode: vectorized, llap
    LLAP IO: all inputs
  Map 4
    Map Operator Tree:
      TableScan
        alias: srcpart_small_n3
        filterExpr: (key1 is not null and value1 is not null) (type: boolean)
        Statistics: Num rows: 20 Data size: 3560 Basic stats: PARTIAL Column stats: PARTIAL
        Filter Operator
          predicate: (key1 is not null and value1 is not null) (type: boolean)
          Statistics: Num rows: 20 Data size: 3560 Basic stats: PARTIAL Column stats: PARTIAL
          Select Operator
            expressions: key1 (type: string), value1 (type: string)
            outputColumnNames: _col0, _col1
            Statistics: Num rows: 20 Data size: 3560 Basic stats: PARTIAL Column stats: PARTIAL
            Reduce Output Operator
              key expressions: _col0 (type: string), _col1 (type: string)
              sort order: ++
              Map-reduce partition columns: _col0 (type: string), _col1 (type: string)
              Statistics: Num rows: 20 Data size: 3560 Basic stats: PARTIAL Column stats: PARTIAL
            Select Operator
              expressions: _col0 (type: string)
              outputColumnNames: _col0
              Statistics: Num rows: 20 Data size: 3560 Basic stats: PARTIAL Column stats: PARTIAL
              Group By Operator
                aggregations: min(_col0), max(_col0), bloom_filter(_col0, expectedEntries=20)
                mode: hash
                outputColumnNames: _col0, _col1, _col2
                Statistics: Num rows: 1 Data size: 730 Basic stats: PARTIAL Column stats: PARTIAL
                Reduce Output Operator
                  sort order: 
                  Statistics: Num rows: 1 Data size: 730 Basic stats: PARTIAL Column stats: PARTIAL
                  value expressions: _col0 (type: string), _col1 (type: string), _col2 (type: binary)
    Execution mode: vectorized, llap
    LLAP IO: all inputs
  Reducer 2
    Execution mode: llap
    Reduce Operator Tree:
      Merge Join Operator
        condition map:
             Inner Join 0 to 1
        keys:
          0 _col0 (type: string), _col1 (type: string)
          1 _col0 (type: string), 

[jira] [Commented] (HIVE-21210) CombineHiveInputFormat Thread Pool Sizing

2019-02-06 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16762026#comment-16762026
 ] 

Hive QA commented on HIVE-21210:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12957783/HIVE-21210.5.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:green}SUCCESS:{color} +1 due to 15773 tests passed

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-Build/15968/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/15968/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-15968/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12957783 - PreCommit-HIVE-Build

> CombineHiveInputFormat Thread Pool Sizing
> -
>
> Key: HIVE-21210
> URL: https://issues.apache.org/jira/browse/HIVE-21210
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 4.0.0, 3.2.0
>Reporter: BELUGA BEHR
>Assignee: BELUGA BEHR
>Priority: Major
> Attachments: HIVE-21210.1.patch, HIVE-21210.2.patch, 
> HIVE-21210.3.patch, HIVE-21210.4.patch, HIVE-21210.5.patch
>
>



[jira] [Updated] (HIVE-21204) Instrumentation for read/write locks in LLAP

2019-02-06 Thread Oliver Draese (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Oliver Draese updated HIVE-21204:
-
Attachment: HIVE-21204.2.patch

> Instrumentation for read/write locks in LLAP
> 
>
> Key: HIVE-21204
> URL: https://issues.apache.org/jira/browse/HIVE-21204
> Project: Hive
>  Issue Type: Improvement
>  Components: llap
>Reporter: Oliver Draese
>Assignee: Oliver Draese
>Priority: Major
> Attachments: HIVE-21204.1.patch, HIVE-21204.2.patch, HIVE-21204.patch
>
>
> LLAP has several R/W locks for serialization of updates to query tracker, 
> file data, 
> Instrumentation is added to monitor the
>  * total number of R/W locks within a particular category
>  * average and maximum wait/suspension time to acquire the R/W lock
> A category includes all lock instances for a particular area (e.g. the category 
> FileData accounts for all R/W locks that are used in FileData instances).
> The monitoring/accounting is done via Hadoop Metrics 2, making them 
> accessible via JMX. In addition, a new "locking" GET endpoint is added to the 
> LLAP daemon's REST interface. It produces output like the following example:
> {
>   "statsCollection": "enabled",
>   "lockStats": [
>     {
>       "type": "R/W Lock Stats",
>       "label": "FileData",
>       "totalLockWaitTimeMillis": 0,
>       "readLock": {
>         "count": 0,
>         "avgWaitTimeNanos": 0,
>         "maxWaitTimeNanos": 0
>       },
>       "writeLock": {
>         "count": 0,
>         "avgWaitTimeNanos": 0,
>         "maxWaitTimeNanos": 0
>       }
>     },
>     {
>       "type": "R/W Lock Stats",
>       "label": "QueryTracker",
>       "totalLockWaitTimeMillis": 0,
>       "readLock": {
>         "count": 0,
>         "avgWaitTimeNanos": 0,
>         "maxWaitTimeNanos": 0
>       },
>       "writeLock": {
>         "count": 0,
>         "avgWaitTimeNanos": 0,
>         "maxWaitTimeNanos": 0
>       }
>     }
>   ]
> }
> To avoid the overhead of lock instrumentation, lock metrics collection is 
> disabled by default and can be enabled via the following configuration 
> parameter:
>   {{hive.llap.lockmetrics.collect = true}}
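
The wait-time accounting described above can be illustrated with a thin wrapper around a ReentrantReadWriteLock. This is a sketch under stated assumptions rather than the patch itself: LockWaitStats and InstrumentedReadWriteLock are hypothetical names, the boolean constructor argument merely stands in for hive.llap.lockmetrics.collect, "count" is assumed here to mean the number of lock acquisitions, and the Hadoop Metrics 2 and REST wiring are left out.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Accumulates acquisition count plus total and maximum wait time in nanoseconds.
class LockWaitStats {
  private final AtomicLong count = new AtomicLong();
  private final AtomicLong totalWaitNanos = new AtomicLong();
  private final AtomicLong maxWaitNanos = new AtomicLong();

  void record(long waitNanos) {
    count.incrementAndGet();
    totalWaitNanos.addAndGet(waitNanos);
    maxWaitNanos.accumulateAndGet(waitNanos, Math::max);
  }

  long count() {
    return count.get();
  }

  long avgWaitNanos() {
    long c = count.get();
    return c == 0 ? 0 : totalWaitNanos.get() / c;
  }

  long maxWaitNanos() {
    return maxWaitNanos.get();
  }
}

// Measures how long callers wait to acquire the read or write lock.
class InstrumentedReadWriteLock {
  private final ReentrantReadWriteLock delegate = new ReentrantReadWriteLock();
  private final boolean collect;
  final LockWaitStats readStats = new LockWaitStats();
  final LockWaitStats writeStats = new LockWaitStats();

  InstrumentedReadWriteLock(boolean collect) {
    this.collect = collect;
  }

  void lockRead() {
    if (!collect) {
      delegate.readLock().lock(); // no timing overhead when disabled
      return;
    }
    long start = System.nanoTime();
    delegate.readLock().lock();
    readStats.record(System.nanoTime() - start);
  }

  void unlockRead() {
    delegate.readLock().unlock();
  }

  void lockWrite() {
    if (!collect) {
      delegate.writeLock().lock();
      return;
    }
    long start = System.nanoTime();
    delegate.writeLock().lock();
    writeStats.record(System.nanoTime() - start);
  }

  void unlockWrite() {
    delegate.writeLock().unlock();
  }
}
```

Keeping the disabled path free of System.nanoTime() calls mirrors the reason given above for leaving collection off by default.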
>   
>  
>  





[jira] [Commented] (HIVE-21210) CombineHiveInputFormat Thread Pool Sizing

2019-02-06 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16762006#comment-16762006
 ] 

Hive QA commented on HIVE-21210:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m 51s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  7m 26s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 23s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 50s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m 32s{color} | {color:blue} common in master has 65 extant Findbugs warnings. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  4m  8s{color} | {color:blue} ql in master has 2298 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 12s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 27s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 49s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 23s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 23s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 11s{color} | {color:red} common: The patch generated 2 new + 0 unchanged - 0 fixed = 2 total (was 0) {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 40s{color} | {color:red} ql: The patch generated 1 new + 7 unchanged - 49 fixed = 8 total (was 56) {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  0s{color} | {color:red} The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m 49s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red}  0m 13s{color} | {color:red} common generated 18 new + 27 unchanged - 0 fixed = 45 total (was 27) {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red}  0m 14s{color} | {color:red} The patch generated 1 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 28m 54s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /data/hiveptest/working/yetus_PreCommit-HIVE-Build-15968/dev-support/hive-personality.sh |
| git revision | master / fae6256 |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| checkstyle | http://104.198.109.242/logs//PreCommit-HIVE-Build-15968/yetus/diff-checkstyle-common.txt |
| checkstyle | http://104.198.109.242/logs//PreCommit-HIVE-Build-15968/yetus/diff-checkstyle-ql.txt |
| whitespace | http://104.198.109.242/logs//PreCommit-HIVE-Build-15968/yetus/whitespace-eol.txt |
| javadoc | http://104.198.109.242/logs//PreCommit-HIVE-Build-15968/yetus/diff-javadoc-javadoc-common.txt |
| asflicense | http://104.198.109.242/logs//PreCommit-HIVE-Build-15968/yetus/patch-asflicense-problems.txt |
| modules | C: common ql U: . |
| Console output | http://104.198.109.242/logs//PreCommit-HIVE-Build-15968/yetus.txt |
| Powered by | Apache Yetus http://yetus.apache.org |


This message was automatically generated.




[jira] [Comment Edited] (HIVE-21009) LDAP - Specify binddn for ldap-search

2019-02-06 Thread Prasanth Jayachandran (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16761990#comment-16761990
 ] 

Prasanth Jayachandran edited comment on HIVE-21009 at 2/6/19 6:10 PM:
--

Yeah. Makes sense. If it is not related to this patch (or caused by the patch) 
then we don't have to handle it in this ticket. The test run looks clean, so I 
will go ahead and commit the patch shortly.


was (Author: prasanth_j):
Yeah. Makes sense. If it is not related to this patch then we don't have to 
handle it in this ticket. The test run looks clean, so I will go ahead and commit 
the patch shortly.

> LDAP - Specify binddn for ldap-search
> -
>
> Key: HIVE-21009
> URL: https://issues.apache.org/jira/browse/HIVE-21009
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 2.1.0, 2.1.1, 2.2.0, 2.3.0, 2.3.1, 2.3.2
>Reporter: Thomas Uhren
>Assignee: David McGinnis
>Priority: Major
>  Labels: features, newbie, security
> Attachments: HIVE-21009.01.patch, HIVE-21009.02.patch, 
> HIVE-21009.03.patch, HIVE-21009.04.patch, HIVE-21009.05.patch, 
> HIVE-21009.06.patch, HIVE-21009.07.patch, HIVE-21009.patch
>
>
> When user accounts cannot do an LDAP search, there is currently no way of 
> specifying a custom binddn to use for the ldap-search.
> So I'm missing something like this:
> {code}
> hive.server2.authentication.ldap.bindn=cn=ldapuser,ou=user,dc=example
> hive.server2.authentication.ldap.bindnpw=password
> {code}
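
For context, a dedicated bind DN for the search step would typically end up in the JNDI environment used to open the directory connection. The sketch below shows only that step and makes several assumptions: LdapSearchBindSketch and searchEnvironment are hypothetical names, the two property names quoted above are the reporter's proposal rather than existing Hive settings, and the values passed in are taken to be whatever those properties would supply.

```java
import java.util.Hashtable;
import javax.naming.Context;

class LdapSearchBindSketch {

  // Builds a JNDI environment that binds as a dedicated search account
  // instead of as the user who is being authenticated.
  static Hashtable<String, Object> searchEnvironment(
      String ldapUrl, String bindDn, String bindPassword) {
    Hashtable<String, Object> env = new Hashtable<>();
    env.put(Context.INITIAL_CONTEXT_FACTORY, "com.sun.jndi.ldap.LdapCtxFactory");
    env.put(Context.PROVIDER_URL, ldapUrl);
    env.put(Context.SECURITY_AUTHENTICATION, "simple");
    env.put(Context.SECURITY_PRINCIPAL, bindDn);         // e.g. cn=ldapuser,ou=user,dc=example
    env.put(Context.SECURITY_CREDENTIALS, bindPassword); // password of the search account
    return env;
  }
}
```

Passing the resulting table to javax.naming.directory.InitialDirContext would open the connection as the search account; the user's own credentials would then only be needed for the final authentication bind.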





[jira] [Commented] (HIVE-21009) LDAP - Specify binddn for ldap-search

2019-02-06 Thread Prasanth Jayachandran (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16761990#comment-16761990
 ] 

Prasanth Jayachandran commented on HIVE-21009:
--

Yeah. Makes sense. If it is not related to this patch then we don't have to 
handle it in this ticket. The test run looks clean, I will go ahead and commit 
the patch shortly.

> LDAP - Specify binddn for ldap-search
> -
>
> Key: HIVE-21009
> URL: https://issues.apache.org/jira/browse/HIVE-21009
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 2.1.0, 2.1.1, 2.2.0, 2.3.0, 2.3.1, 2.3.2
>Reporter: Thomas Uhren
>Assignee: David McGinnis
>Priority: Major
>  Labels: features, newbie, security
> Attachments: HIVE-21009.01.patch, HIVE-21009.02.patch, 
> HIVE-21009.03.patch, HIVE-21009.04.patch, HIVE-21009.05.patch, 
> HIVE-21009.06.patch, HIVE-21009.07.patch, HIVE-21009.patch
>
>
> When user accounts cannot do an LDAP search, there is currently no way of 
> specifying a custom binddn to use for the ldap-search.
> So I'm missing something like that:
> {code}
> hive.server2.authentication.ldap.bindn=cn=ldapuser,ou=user,dc=example
> hive.server2.authentication.ldap.bindnpw=password
> {code}





[jira] [Commented] (HIVE-16815) Clean up javadoc from error for the rest of modules

2019-02-06 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-16815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16761977#comment-16761977
 ] 

Hive QA commented on HIVE-16815:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12957756/HIVE-16815.3.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:green}SUCCESS:{color} +1 due to 15771 tests passed

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-Build/15967/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/15967/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-15967/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12957756 - PreCommit-HIVE-Build

> Clean up javadoc from error for the rest of modules
> ---
>
> Key: HIVE-16815
> URL: https://issues.apache.org/jira/browse/HIVE-16815
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Janos Gub
>Assignee: Robert Kucsora
>Priority: Major
> Attachments: HIVE-16815.2.patch, HIVE-16815.3.patch, HIVE-16815.patch
>
>






[jira] [Commented] (HIVE-21009) LDAP - Specify binddn for ldap-search

2019-02-06 Thread David McGinnis (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16761976#comment-16761976
 ] 

David McGinnis commented on HIVE-21009:
---

[~prasanth_j]: I looked in that pom, and it should be covered by the line which 
excludes anything inside of a resources folder (which the jceks file is).

I believe we are seeing that error due to a commit a few days ago of 
[https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFValidateAcidSortOrder.java]
 which does not have a license at the top of the file. My understanding of the 
desired process is that you shouldn't make changes outside of the bug you are 
working on (and simple cleanup of the code around it), so I'm not sure I'm 
supposed to add the license to the file as part of this change, as part of a 
separate change, or just ignore it until someone else fixes it. Feel free to 
give me guidance on this.

> LDAP - Specify binddn for ldap-search
> -
>
> Key: HIVE-21009
> URL: https://issues.apache.org/jira/browse/HIVE-21009
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 2.1.0, 2.1.1, 2.2.0, 2.3.0, 2.3.1, 2.3.2
>Reporter: Thomas Uhren
>Assignee: David McGinnis
>Priority: Major
>  Labels: features, newbie, security
> Attachments: HIVE-21009.01.patch, HIVE-21009.02.patch, 
> HIVE-21009.03.patch, HIVE-21009.04.patch, HIVE-21009.05.patch, 
> HIVE-21009.06.patch, HIVE-21009.07.patch, HIVE-21009.patch
>
>
> When user accounts cannot do an LDAP search, there is currently no way of 
> specifying a custom binddn to use for the ldap-search.
> So I'm missing something like that:
> {code}
> hive.server2.authentication.ldap.bindn=cn=ldapuser,ou=user,dc=example
> hive.server2.authentication.ldap.bindnpw=password
> {code}





[jira] [Commented] (HIVE-16815) Clean up javadoc from error for the rest of modules

2019-02-06 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-16815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16761975#comment-16761975
 ] 

Hive QA commented on HIVE-16815:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m 26s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  6m 23s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  5m 30s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  3m  5s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m 25s{color} | {color:blue} storage-api in master has 48 extant Findbugs warnings. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  2m 16s{color} | {color:blue} standalone-metastore/metastore-common in master has 29 extant Findbugs warnings. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m 29s{color} | {color:blue} common in master has 65 extant Findbugs warnings. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m 38s{color} | {color:blue} serde in master has 197 extant Findbugs warnings. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  1m  2s{color} | {color:blue} standalone-metastore/metastore-server in master has 184 extant Findbugs warnings. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m 37s{color} | {color:blue} llap-server in master has 81 extant Findbugs warnings. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m 36s{color} | {color:blue} service in master has 48 extant Findbugs warnings. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m 28s{color} | {color:blue} accumulo-handler in master has 21 extant Findbugs warnings. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m 23s{color} | {color:blue} jdbc in master has 16 extant Findbugs warnings. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m 22s{color} | {color:blue} contrib in master has 10 extant Findbugs warnings. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m 27s{color} | {color:blue} hbase-handler in master has 15 extant Findbugs warnings. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m 33s{color} | {color:blue} hcatalog/core in master has 30 extant Findbugs warnings. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m 24s{color} | {color:blue} hcatalog/server-extensions in master has 3 extant Findbugs warnings. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m 25s{color} | {color:blue} hcatalog/webhcat/java-client in master has 3 extant Findbugs warnings. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m 24s{color} | {color:blue} hcatalog/streaming in master has 11 extant Findbugs warnings. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m 37s{color} | {color:blue} hplsql in master has 176 extant Findbugs warnings. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m 26s{color} | {color:blue} streaming in master has 2 extant Findbugs warnings. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m 22s{color} | {color:blue} llap-ext-client in master has 1 extant Findbugs warnings. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m 12s{color} | {color:blue} testutils in master has 5 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  4m 24s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 25s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  5m 58s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  5m 25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | 

[jira] [Commented] (HIVE-21009) LDAP - Specify binddn for ldap-search

2019-02-06 Thread Prasanth Jayachandran (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16761959#comment-16761959
 ] 

Prasanth Jayachandran commented on HIVE-21009:
--

You may have to add an Apache RAT exclusion for the jceks file to 
[https://github.com/apache/hive/blob/master/pom.xml#L1350-L1353] to avoid the 
asflicense issue.

 

> LDAP - Specify binddn for ldap-search
> -
>
> Key: HIVE-21009
> URL: https://issues.apache.org/jira/browse/HIVE-21009
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 2.1.0, 2.1.1, 2.2.0, 2.3.0, 2.3.1, 2.3.2
>Reporter: Thomas Uhren
>Assignee: David McGinnis
>Priority: Major
>  Labels: features, newbie, security
> Attachments: HIVE-21009.01.patch, HIVE-21009.02.patch, 
> HIVE-21009.03.patch, HIVE-21009.04.patch, HIVE-21009.05.patch, 
> HIVE-21009.06.patch, HIVE-21009.07.patch, HIVE-21009.patch
>
>
> When user accounts cannot do an LDAP search, there is currently no way of 
> specifying a custom binddn to use for the ldap-search.
> So I'm missing something like that:
> {code}
> hive.server2.authentication.ldap.bindn=cn=ldapuser,ou=user,dc=example
> hive.server2.authentication.ldap.bindnpw=password
> {code}





[jira] [Commented] (HIVE-20758) Constraints: Show create table does not show constraints

2019-02-06 Thread Jesus Camacho Rodriguez (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16761931#comment-16761931
 ] 

Jesus Camacho Rodriguez commented on HIVE-20758:


[~b.maidics], patch LGTM. Could you add a test for the new feature (probably a 
q file)? Thanks

> Constraints: Show create table does not show constraints
> 
>
> Key: HIVE-20758
> URL: https://issues.apache.org/jira/browse/HIVE-20758
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Gopal V
>Assignee: Barnabas Maidics
>Priority: Major
> Attachments: HIVE-20758.1.patch, HIVE-20758.2.patch, 
> HIVE-20758.3.patch, Screen Shot 2019-01-23 at 11.52.04.png
>
>
> Even though the desc formatted shows the constraints, the show create table 
> does not
> {code}
> | # Primary Key | NULL | NULL |
> | Table: | tpcds_bin_partitioned_orc_1.inventory | NULL |
> | Constraint Name: | pk_in | NULL |
> | Column Names: | inv_date_sk | inv_item_sk |
> | | NULL | NULL |
> | # Foreign Keys | NULL | NULL |
> | Table: | tpcds_bin_partitioned_orc_1.inventory | NULL |
> | Constraint Name: | inv_d | NULL |
> | Parent Column Name:tpcds_bin_partitioned_orc_1.date_dim.d_date_sk | Column Name:inv_date_sk | Key Sequence:1 |
> | | NULL | NULL |
> | Constraint Name: | inv_i | NULL |
> | Parent Column Name:tpcds_bin_partitioned_orc_1.item.i_item_sk | Column Name:inv_item_sk | Key Sequence:1 |
> | | NULL | NULL |
> | Constraint Name: | inv_w | NULL |
> | Parent Column Name:tpcds_bin_partitioned_orc_1.warehouse.w_warehouse_sk | Column Name:inv_warehouse_sk | Key Sequence:1 |
> | | NULL | NULL |
> {code}
> But 
> {code}
> ++
> |   createtab_stmt   |
> ++
> | CREATE TABLE `inventory`(  |
> |   `inv_item_sk` bigint,|
> |   `inv_warehouse_sk` bigint,   |
> |   `inv_quantity_on_hand` int,  |
> |   `inv_date_sk` bigint)|
> | ROW FORMAT SERDE   |
> |   'org.apache.hadoop.hive.ql.io.orc.OrcSerde'  |
> | STORED AS INPUTFORMAT  |
> |   'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'  |
> | OUTPUTFORMAT   |
> |   'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat' |
> | LOCATION   |
> |   'hdfs:///warehouse/tablespace/managed/hive/tpcds_bin_partitioned_orc_1.db/inventory' |
> | TBLPROPERTIES (|
> |   'bucketing_version'='2', |
> |   'transactional'='true',  |
> |   'transactional_properties'='default',|
> |   

[jira] [Commented] (HIVE-21224) Upgrade tests JUnit3 to JUnit4

2019-02-06 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16761911#comment-16761911
 ] 

Hive QA commented on HIVE-21224:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12957750/HIVE-21224.1.patch

{color:green}SUCCESS:{color} +1 due to 153 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 10 failed/errored test(s), 15767 tests 
executed
*Failed tests:*
{noformat}
TestGenericUDFConcat - did not produce a TEST-*.xml file (likely timed out) (batchId=287)
TestLocationQueries - did not produce a TEST-*.xml file (likely timed out) (batchId=254)
TestMTQueries - did not produce a TEST-*.xml file (likely timed out) (batchId=252)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[mapjoin_memcheck] (batchId=45)
org.apache.hadoop.hive.ql.metadata.TestHiveRemote.testAutoPurgeTablesAndPartitions (batchId=308)
org.apache.hadoop.hive.ql.metadata.TestHiveRemote.testHiveRefreshOnConfChange (batchId=308)
org.apache.hadoop.hive.ql.metadata.TestHiveRemote.testPartition (batchId=308)
org.apache.hadoop.hive.ql.metadata.TestHiveRemote.testTable (batchId=308)
org.apache.hadoop.hive.ql.metadata.TestHiveRemote.testThriftTable (batchId=308)
org.apache.hive.jdbc.TestActivePassiveHA.testClientConnectionsOnFailover (batchId=261)
{noformat}

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-Build/15966/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/15966/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-15966/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 10 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12957750 - PreCommit-HIVE-Build

> Upgrade tests JUnit3 to JUnit4
> --
>
> Key: HIVE-21224
> URL: https://issues.apache.org/jira/browse/HIVE-21224
> Project: Hive
>  Issue Type: Improvement
>Reporter: Bruno Pusztahazi
>Assignee: Bruno Pusztahazi
>Priority: Major
> Attachments: HIVE-21224.1.patch
>
>
> Old JUnit3 tests should be upgraded to JUnit4





[jira] [Commented] (HIVE-21214) MoveTask : Use attemptId instead of file size for deduplication of files compareTempOrDuplicateFiles()

2019-02-06 Thread Deepak Jaiswal (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16761906#comment-16761906
 ] 

Deepak Jaiswal commented on HIVE-21214:
---

Committed to master. Thanks for the review [~jdere].

> MoveTask : Use attemptId instead of file size for deduplication of files 
> compareTempOrDuplicateFiles()
> --
>
> Key: HIVE-21214
> URL: https://issues.apache.org/jira/browse/HIVE-21214
> Project: Hive
>  Issue Type: Bug
>Reporter: Deepak Jaiswal
>Assignee: Deepak Jaiswal
>Priority: Major
> Attachments: HIVE-21214.1.patch, HIVE-21214.2.patch, 
> HIVE-21214.3.patch
>
>
> For a given task, if there is more than one attempt then deduplication logic 
> kicks in.
> {noformat}
> Utilities.compareTempOrDuplicateFiles(){noformat}
> The logic uses file size and picks the file with the largest size. This logic is 
> very fragile.
> Ideally, it should pick the successful attempt's file.
> However, a simpler solution is to pick the newest attempt and also check 
> that the newest attempt's file is the largest.
> If it is not, throw an exception.
>  
> cc [~gopalv] [~thejas] [~jdere] [~ekoifman]
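
The rule proposed in the description can be sketched as follows. This is only an illustration under stated assumptions: `AttemptFile` and `DedupSketch` are hypothetical names, not Hive classes, and the committed patch may differ in how it reads the attempt ID and reports the error.

```java
import java.util.List;

/** Hypothetical stand-in for one task attempt's output file; not a Hive class. */
final class AttemptFile {
    final String path;
    final int attemptId;
    final long sizeBytes;

    AttemptFile(String path, int attemptId, long sizeBytes) {
        this.path = path;
        this.attemptId = attemptId;
        this.sizeBytes = sizeBytes;
    }
}

final class DedupSketch {

    /** Picks the newest attempt's file and fails if some other attempt wrote a larger file. */
    static AttemptFile pickNewestAttempt(List<AttemptFile> candidates) {
        if (candidates.isEmpty()) {
            throw new IllegalArgumentException("no candidate files");
        }
        AttemptFile newest = candidates.get(0);
        long largestSize = newest.sizeBytes;
        for (AttemptFile file : candidates) {
            if (file.attemptId > newest.attemptId) {
                newest = file;
            }
            largestSize = Math.max(largestSize, file.sizeBytes);
        }
        // Sanity check from the description: the newest attempt should also be the largest.
        if (newest.sizeBytes < largestSize) {
            throw new IllegalStateException(
                "newest attempt " + newest.attemptId + " is not the largest file: " + newest.path);
        }
        return newest;
    }
}
```

Selecting by attempt ID makes the choice deterministic, while the size check keeps the old heuristic as a guard rather than as the deciding factor.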





[jira] [Updated] (HIVE-21214) MoveTask : Use attemptId instead of file size for deduplication of files compareTempOrDuplicateFiles()

2019-02-06 Thread Deepak Jaiswal (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deepak Jaiswal updated HIVE-21214:
--
Resolution: Fixed
Status: Resolved  (was: Patch Available)

> MoveTask : Use attemptId instead of file size for deduplication of files 
> compareTempOrDuplicateFiles()
> --
>
> Key: HIVE-21214
> URL: https://issues.apache.org/jira/browse/HIVE-21214
> Project: Hive
>  Issue Type: Bug
>Reporter: Deepak Jaiswal
>Assignee: Deepak Jaiswal
>Priority: Major
> Attachments: HIVE-21214.1.patch, HIVE-21214.2.patch, 
> HIVE-21214.3.patch
>
>
> For a given task, if there is more than one attempt then deduplication logic 
> kicks in.
> {noformat}
> Utilities.compareTempOrDuplicateFiles(){noformat}
> The logic uses file size and picks the file with the largest size. This logic is 
> very fragile.
> Ideally, it should pick the successful attempt's file.
> However, a simpler solution is to pick the newest attempt and also check 
> that the newest attempt's file is the largest.
> If it is not, throw an exception.
>  
> cc [~gopalv] [~thejas] [~jdere] [~ekoifman]





[jira] [Assigned] (HIVE-21218) KafkaSerDe doesn't support topics created via Confluent Avro serializer

2019-02-06 Thread slim bouguerra (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

slim bouguerra reassigned HIVE-21218:
-

Assignee: Milan Baran

> KafkaSerDe doesn't support topics created via Confluent Avro serializer
> ---
>
> Key: HIVE-21218
> URL: https://issues.apache.org/jira/browse/HIVE-21218
> Project: Hive
>  Issue Type: Bug
>  Components: kafka integration, Serializers/Deserializers
>Affects Versions: 3.1.1
>Reporter: Milan Baran
>Assignee: Milan Baran
>Priority: Major
>
> According to [Google 
> groups|https://groups.google.com/forum/#!topic/confluent-platform/JYhlXN0u9_A]
>  the Confluent Avro serializer uses a proprietary format for the Kafka value: 
> a magic byte, <4 bytes of schema ID>, then the Avro bytes of an object that 
> conforms to the schema. 
> This format causes no problem for the Confluent Kafka deserializer, which 
> respects the format. For the Hive Kafka handler, however, it is a problem to 
> correctly deserialize the Kafka value, because Hive uses a custom deserializer from 
> bytes to objects and ignores the Kafka consumer ser/deser classes provided via 
> a table property.
> It would be nice to support the Confluent format with the magic byte.
> It would also be great to support the Schema Registry.
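
To make the framing concrete, here is a minimal sketch of splitting such a Kafka value into its schema ID and Avro payload. It assumes the layout Confluent documents for its wire format (one magic byte with value 0, then a 4-byte big-endian schema ID, then the Avro-encoded bytes); the class and method names are hypothetical and this is not the Hive KafkaSerDe code.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

/** Hypothetical helper, not part of Hive: splits a Confluent-framed Kafka value. */
final class ConfluentWireFormatSketch {
    static final byte MAGIC_BYTE = 0x0;   // assumption: value documented by Confluent
    static final int HEADER_LENGTH = 5;   // 1 magic byte + 4 bytes of schema ID

    /** Returns the schema ID stored in bytes 1..4 of the value. */
    static int schemaId(byte[] value) {
        checkHeader(value);
        // ByteBuffer reads big-endian by default, matching the assumed layout.
        return ByteBuffer.wrap(value, 1, 4).getInt();
    }

    /** Returns the bytes after the header, i.e. the plain Avro-encoded object. */
    static byte[] avroPayload(byte[] value) {
        checkHeader(value);
        return Arrays.copyOfRange(value, HEADER_LENGTH, value.length);
    }

    private static void checkHeader(byte[] value) {
        if (value == null || value.length < HEADER_LENGTH || value[0] != MAGIC_BYTE) {
            throw new IllegalArgumentException("value is not in the expected Confluent framing");
        }
    }
}
```

A deserializer that skips the five header bytes could hand the remaining payload to an ordinary Avro reader; resolving the schema ID against a Schema Registry would be a separate step.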



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-21224) Upgrade tests JUnit3 to JUnit4

2019-02-06 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16761898#comment-16761898
 ] 

Hive QA commented on HIVE-21224:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  1s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m  3s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  6m 27s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  4m 51s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  3m  0s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m 30s{color} | {color:blue} common in master has 65 extant Findbugs warnings. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m 36s{color} | {color:blue} serde in master has 197 extant Findbugs warnings. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  3m 31s{color} | {color:blue} ql in master has 2298 extant Findbugs warnings. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m 35s{color} | {color:blue} service in master has 48 extant Findbugs warnings. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m 26s{color} | {color:blue} cli in master has 13 extant Findbugs warnings. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m 21s{color} | {color:blue} contrib in master has 10 extant Findbugs warnings. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m 25s{color} | {color:blue} druid-handler in master has 3 extant Findbugs warnings. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m 28s{color} | {color:blue} hbase-handler in master has 15 extant Findbugs warnings. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m 33s{color} | {color:blue} hcatalog/core in master has 30 extant Findbugs warnings. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m 25s{color} | {color:blue} hcatalog/webhcat/java-client in master has 3 extant Findbugs warnings. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m 31s{color} | {color:blue} hcatalog/webhcat/svr in master has 96 extant Findbugs warnings. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m 38s{color} | {color:blue} itests/hive-unit in master has 2 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  3m 25s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 22s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  5m 38s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  4m 52s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  4m 52s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 11s{color} | {color:red} common: The patch generated 12 new + 9 unchanged - 3 fixed = 21 total (was 12) {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 15s{color} | {color:red} serde: The patch generated 78 new + 288 unchanged - 20 fixed = 366 total (was 308) {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 44s{color} | {color:red} ql: The patch generated 205 new + 951 unchanged - 59 fixed = 1156 total (was 1010) {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 11s{color} | {color:red} service: The patch generated 22 new + 17 unchanged - 6 fixed = 39 total (was 23) {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 10s{color} | {color:red} cli: The patch generated 2 new + 9 unchanged - 1 fixed = 11 total (was 10) {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m  9s{color} | {color:red} contrib: The patch generated 4 new + 10 unchanged - 0 fixed = 14 total (was 10) {color} 

[jira] [Commented] (HIVE-21218) KafkaSerDe doesn't support topics created via Confluent Avro serializer

2019-02-06 Thread slim bouguerra (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16761897#comment-16761897
 ] 

slim bouguerra commented on HIVE-21218:
---

[~milan.baran], can you please upload a patch here to kick off the system tests?

Please take a look here: 
https://cwiki.apache.org/confluence/display/Hive/HowToContribute#HowToContribute-CreatingaPatch

> KafkaSerDe doesn't support topics created via Confluent Avro serializer
> ---
>
> Key: HIVE-21218
> URL: https://issues.apache.org/jira/browse/HIVE-21218
> Project: Hive
>  Issue Type: Bug
>  Components: kafka integration, Serializers/Deserializers
>Affects Versions: 3.1.1
>Reporter: Milan Baran
>Priority: Major
>
> According to [Google 
> groups|https://groups.google.com/forum/#!topic/confluent-platform/JYhlXN0u9_A]
>  the Confluent Avro serializer uses a proprietary format for the Kafka value: 
> a magic byte, <4 bytes of schema ID>, then the Avro bytes of an object that 
> conforms to the schema. 
> This format causes no problem for the Confluent Kafka deserializer, which 
> respects the format. For the Hive Kafka handler, however, it is a problem to 
> correctly deserialize the Kafka value, because Hive uses a custom deserializer from 
> bytes to objects and ignores the Kafka consumer ser/deser classes provided via 
> a table property.
> It would be nice to support the Confluent format with the magic byte.
> It would also be great to support the Schema Registry.





[jira] [Comment Edited] (HIVE-21121) Cast date to timestamp incorrect interpretation

2019-02-06 Thread paco87 (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16758251#comment-16758251
 ] 

paco87 edited comment on HIVE-21121 at 2/6/19 3:20 PM:
---

I have generated all dates (0001-01-01 to -12-31) and, after testing, the issue 
appears on the following dates:

1900-01-01 
 1919-04-15 
 1919-09-16 
 1944-10-04 
 1957-09-29 
 1958-09-28 
 1959-10-04 
 1960-10-02 
 1961-10-01 
 1962-09-30 
 1963-09-29 
 1964-09-27 
 1977-09-25 
 1978-10-01 
 1979-09-30 
 1980-09-28 
 1981-09-27 
 1982-09-26 
 1983-09-25 
 1984-09-30 
 1985-09-29 
 1986-09-28 
 1987-09-27

You can test it without generating any table using these simple queries:

select cast(cast('1900-01-01' as date) as timestamp); 
 select cast(cast('1919-04-15' as date) as timestamp); 
 select cast(cast('1919-09-16' as date) as timestamp); 
 select cast(cast('1944-10-04' as date) as timestamp); 
 select cast(cast('1957-09-29' as date) as timestamp); 
 select cast(cast('1958-09-28' as date) as timestamp); 
 select cast(cast('1959-10-04' as date) as timestamp); 
 select cast(cast('1960-10-02' as date) as timestamp); 
 select cast(cast('1961-10-01' as date) as timestamp); 
 select cast(cast('1962-09-30' as date) as timestamp); 
 select cast(cast('1963-09-29' as date) as timestamp); 
 select cast(cast('1964-09-27' as date) as timestamp); 
 select cast(cast('1977-09-25' as date) as timestamp); 
 select cast(cast('1978-10-01' as date) as timestamp); 
 select cast(cast('1979-09-30' as date) as timestamp); 
 select cast(cast('1980-09-28' as date) as timestamp); 
 select cast(cast('1981-09-27' as date) as timestamp); 
 select cast(cast('1982-09-26' as date) as timestamp); 
 select cast(cast('1983-09-25' as date) as timestamp); 
 select cast(cast('1984-09-30' as date) as timestamp); 
 select cast(cast('1985-09-29' as date) as timestamp); 
 select cast(cast('1986-09-28' as date) as timestamp); 
 select cast(cast('1987-09-27' as date) as timestamp);
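
For comparison, the expected result of each cast above is midnight on the given day. A minimal `java.time` sketch of that expectation follows; it is not how Hive implements the cast, and the class name is hypothetical.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;

/** Hypothetical helper, not Hive code: the timestamp a date-to-timestamp cast should yield. */
final class ExpectedCastSketch {

    /** Parses an ISO date such as 1900-01-01 and returns that day at 00:00. */
    static LocalDateTime expectedTimestamp(String isoDate) {
        // LocalDate and LocalDateTime carry no time zone, so no zone rules can shift the result.
        return LocalDate.parse(isoDate).atStartOfDay();
    }
}
```

Comparing Hive's output for the dates listed above against this value is one way to confirm which of them are affected.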

 

 

 


was (Author: paco87):
I have generated all dates (0001-01-01 to -12-31) and after testing issue 
appears on following dates:

1900-01-01 
1919-04-15 
1919-09-16 
1944-10-04 
1957-09-29 
1958-09-28 
1959-10-04 
1960-10-02 
1961-10-01 
1962-09-30 
1963-09-29 
1964-09-27 
1977-09-25 
1978-10-01 
1979-09-30 
1980-09-28 
1981-09-27 
1982-09-26 
1983-09-25 
1984-09-30 
1985-09-29 
1986-09-28 
1987-09-27

You may test without generating any table with this simple queries:

select cast(cast('1900-01-01' as date) as timestamp); 
select cast(cast('1919-04-15' as date) as timestamp); 
select cast(cast('1919-09-16' as date) as timestamp); 
select cast(cast('1944-10-04' as date) as timestamp); 
select cast(cast('1957-09-29' as date) as timestamp); 
select cast(cast('1958-09-28' as date) as timestamp); 
select cast(cast('1959-10-04' as date) as timestamp); 
select cast(cast('1960-10-02' as date) as timestamp); 
select cast(cast('1961-10-01' as date) as timestamp); 
select cast(cast('1962-09-30' as date) as timestamp); 
select cast(cast('1963-09-29' as date) as timestamp); 
select cast(cast('1964-09-27' as date) as timestamp); 
select cast(cast('1977-09-25' as date) as timestamp); 
select cast(cast('1978-10-01' as date) as timestamp); 
select cast(cast('1979-09-30' as date) as timestamp); 
select cast(cast('1980-09-28' as date) as timestamp); 
select cast(cast('1981-09-27' as date) as timestamp); 
select cast(cast('1982-09-26' as date) as timestamp); 
select cast(cast('1983-09-25' as date) as timestamp); 
select cast(cast('1984-09-30' as date) as timestamp); 
select cast(cast('1985-09-29' as date) as timestamp); 
select cast(cast('1986-09-28' as date) as timestamp); 
select cast(cast('1987-09-27' as date) as timestamp);

 

 

 

> Cast date to timestamp incorrect interpretation
> ---
>
> Key: HIVE-21121
> URL: https://issues.apache.org/jira/browse/HIVE-21121
> Project: Hive
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.2.1, 2.1.0, 3.0.0
>Reporter: paco87
>Priority: Major
> Attachments: jira_replicate_issue.txt
>
>
> Hive returns a timestamp carrying the current time when casting a date to a timestamp, 
> where the time part should be 00:00:00.0.
> This issue is weird. It seems to happen only on specific dates: 
> '1900-01-01'.
> +++
> |ab|
> +++
> |1900-01-01 11:28:46.869|
> |1900-01-01 11:28:46.869|
> |1900-01-01 11:28:46.869|
> |1900-01-01 11:28:46.869|
> |1890-01-01 00:00:00.0|
> |1901-01-01 00:00:00.0|
> |1900-01-01 11:28:46.869|
> +---++
> -I didn't notice this on any other date.-
>  This might be connected to old issue: HIVE-10488





[jira] [Updated] (HIVE-21121) Cast date to timestamp incorrect interpretation

2019-02-06 Thread paco87 (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

paco87 updated HIVE-21121:
--
Affects Version/s: 3.1.1

> Cast date to timestamp incorrect interpretation
> ---
>
> Key: HIVE-21121
> URL: https://issues.apache.org/jira/browse/HIVE-21121
> Project: Hive
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.2.1, 2.1.0, 3.0.0, 3.1.1
>Reporter: paco87
>Priority: Major
> Attachments: jira_replicate_issue.txt
>
>
> Hive returns a timestamp with the current time when casting a date to a
> timestamp, where the time should be 00:00:00.0.
> This issue is weird. It seems that this happens only on specific dates:
> '1900-01-01'.
> +++
> |ab|
> +++
> |1900-01-01 11:28:46.869|
> |1900-01-01 11:28:46.869|
> |1900-01-01 11:28:46.869|
> |1900-01-01 11:28:46.869|
> |1890-01-01 00:00:00.0|
> |1901-01-01 00:00:00.0|
> |1900-01-01 11:28:46.869|
> +---++
> -I didn't notice this on any other date.-
>  This might be connected to old issue: HIVE-10488





[jira] [Commented] (HIVE-20758) Constraints: Show create table does not show constraints

2019-02-06 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16761850#comment-16761850
 ] 

Hive QA commented on HIVE-20758:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12957745/HIVE-20758.3.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:green}SUCCESS:{color} +1 due to 15771 tests passed

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-Build/15965/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/15965/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-15965/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12957745 - PreCommit-HIVE-Build

> Constraints: Show create table does not show constraints
> 
>
> Key: HIVE-20758
> URL: https://issues.apache.org/jira/browse/HIVE-20758
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Gopal V
>Assignee: Barnabas Maidics
>Priority: Major
> Attachments: HIVE-20758.1.patch, HIVE-20758.2.patch, 
> HIVE-20758.3.patch, Screen Shot 2019-01-23 at 11.52.04.png
>
>
> Even though {{desc formatted}} shows the constraints, {{show create table}}
> does not.
> {code}
> | # Primary Key      | NULL                                   | NULL        |
> | Table:             | tpcds_bin_partitioned_orc_1.inventory  | NULL        |
> | Constraint Name:   | pk_in                                  | NULL        |
> | Column Names:      | inv_date_sk                            | inv_item_sk |
> |                    | NULL                                   | NULL        |
> | # Foreign Keys     | NULL                                   | NULL        |
> | Table:             | tpcds_bin_partitioned_orc_1.inventory  | NULL        |
> | Constraint Name:   | inv_d                                  | NULL        |
> | Parent Column Name:tpcds_bin_partitioned_orc_1.date_dim.d_date_sk | Column Name:inv_date_sk | Key Sequence:1 |
> |                    | NULL                                   | NULL        |
> | Constraint Name:   | inv_i                                  | NULL        |
> | Parent Column Name:tpcds_bin_partitioned_orc_1.item.i_item_sk | Column Name:inv_item_sk | Key Sequence:1 |
> |                    | NULL                                   | NULL        |
> | Constraint Name:   | inv_w                                  | NULL        |
> | Parent Column Name:tpcds_bin_partitioned_orc_1.warehouse.w_warehouse_sk | Column Name:inv_warehouse_sk | Key Sequence:1 |
> |                    | NULL                                   | NULL        |
> {code}
> But 
> {code}
> ++
> |   createtab_stmt   |
> ++
> | CREATE TABLE `inventory`(  |
> |   `inv_item_sk` bigint,|
> |   `inv_warehouse_sk` bigint,   |
> |   `inv_quantity_on_hand` int,  |
> |   `inv_date_sk` bigint)

[jira] [Comment Edited] (HIVE-21121) Cast date to timestamp incorrect interpretation

2019-02-06 Thread paco87 (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16761836#comment-16761836
 ] 

paco87 edited comment on HIVE-21121 at 2/6/19 3:21 PM:
---

It appears that this issue has something to do with timezones (maybe due to
summer/winter time); if the UTC offset is 0, everything works fine.

I checked two timezones, Europe/Warsaw and Europe/Kiev, and the issue appears on
different dates.

In any case, casting a date to a timestamp should show the correct time,
"00:00:00".

Timezone Europe/Kiev example:

select cast(cast('1900-01-01' as date) as timestamp); 
 select cast(cast('1930-06-21' as date) as timestamp); 
 select cast(cast('1943-11-06' as date) as timestamp); 
 select cast(cast('1981-04-01' as date) as timestamp); 
 select cast(cast('1982-04-01' as date) as timestamp); 
 select cast(cast('1983-04-01' as date) as timestamp); 
 select cast(cast('1984-04-01' as date) as timestamp); 
 select cast(cast('1984-09-30' as date) as timestamp); 
 select cast(cast('1984-09-30' as date) as timestamp); 
 select cast(cast('1985-09-29' as date) as timestamp); 
 select cast(cast('1985-09-29' as date) as timestamp); 
 select cast(cast('1986-09-28' as date) as timestamp); 
 select cast(cast('1986-09-28' as date) as timestamp); 
 select cast(cast('1987-09-27' as date) as timestamp); 
 select cast(cast('1987-09-27' as date) as timestamp); 
 select cast(cast('1988-09-25' as date) as timestamp); 
 select cast(cast('1988-09-25' as date) as timestamp); 
 select cast(cast('1989-09-24' as date) as timestamp); 
 select cast(cast('1989-09-24' as date) as timestamp); 
 select cast(cast('1990-07-01' as date) as timestamp); 
 select cast(cast('1991-09-29' as date) as timestamp); 
 select cast(cast('1991-09-29' as date) as timestamp);
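
For comparison, here is a minimal Java sketch of the behaviour the comment asks for. It is not Hive's implementation and does not diagnose the bug; the class and method names are hypothetical. It only illustrates, using java.time, that a zone-free date-to-timestamp conversion always yields midnight, and gives a simple way to check whether a zone's UTC offset changes on a given day (which might help when hunting for affected dates).

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;

public class DateToTimestampDemo {

    /** Zone-free conversion: the time-of-day is always 00:00. */
    static LocalDateTime castDateToTimestamp(String isoDate) {
        return LocalDate.parse(isoDate).atStartOfDay();
    }

    /**
     * True when the zone's UTC offset at the start of this day differs
     * from the offset at the start of the next day.
     */
    static boolean offsetChangesOn(String isoDate, String zoneId) {
        LocalDate date = LocalDate.parse(isoDate);
        ZoneId zone = ZoneId.of(zoneId);
        ZoneOffset before = date.atStartOfDay(zone).getOffset();
        ZoneOffset after = date.plusDays(1).atStartOfDay(zone).getOffset();
        return !before.equals(after);
    }

    public static void main(String[] args) {
        System.out.println(castDateToTimestamp("1900-01-01"));    // prints 1900-01-01T00:00
        System.out.println(offsetChangesOn("1984-09-30", "UTC")); // prints false
        // The result below depends on the JDK's time zone database,
        // so no expected value is stated here.
        System.out.println(offsetChangesOn("1984-09-30", "Europe/Kiev"));
    }
}
```

Because the first method never consults a time zone, daylight-saving transitions cannot shift its result away from 00:00.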


was (Author: paco87):
It appears that this issue has something to do with which timezone is set (maybe
due to summer/winter time); if the UTC offset is 0, everything works fine.

I checked two timezones, Europe/Warsaw and Europe/Kiev, and different dates have
this problem.

In any case, casting a date to a timestamp should show the correct time,
"00:00:00".

Timezone Europe/Kiev example:

select cast(cast('1900-01-01' as date) as timestamp); 
select cast(cast('1930-06-21' as date) as timestamp); 
select cast(cast('1943-11-06' as date) as timestamp); 
select cast(cast('1981-04-01' as date) as timestamp); 
select cast(cast('1982-04-01' as date) as timestamp); 
select cast(cast('1983-04-01' as date) as timestamp); 
select cast(cast('1984-04-01' as date) as timestamp); 
select cast(cast('1984-09-30' as date) as timestamp); 
select cast(cast('1984-09-30' as date) as timestamp); 
select cast(cast('1985-09-29' as date) as timestamp); 
select cast(cast('1985-09-29' as date) as timestamp); 
select cast(cast('1986-09-28' as date) as timestamp); 
select cast(cast('1986-09-28' as date) as timestamp); 
select cast(cast('1987-09-27' as date) as timestamp); 
select cast(cast('1987-09-27' as date) as timestamp); 
select cast(cast('1988-09-25' as date) as timestamp); 
select cast(cast('1988-09-25' as date) as timestamp); 
select cast(cast('1989-09-24' as date) as timestamp); 
select cast(cast('1989-09-24' as date) as timestamp); 
select cast(cast('1990-07-01' as date) as timestamp); 
select cast(cast('1991-09-29' as date) as timestamp); 
select cast(cast('1991-09-29' as date) as timestamp);

> Cast date to timestamp incorrect interpretation
> ---
>
> Key: HIVE-21121
> URL: https://issues.apache.org/jira/browse/HIVE-21121
> Project: Hive
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.2.1, 2.1.0, 3.0.0
>Reporter: paco87
>Priority: Major
> Attachments: jira_replicate_issue.txt
>
>
> Hive returns a timestamp with the current time when casting a date to a
> timestamp, where the time should be 00:00:00.0.
> This issue is weird. It seems that this happens only on specific dates:
> '1900-01-01'.
> +++
> |ab|
> +++
> |1900-01-01 11:28:46.869|
> |1900-01-01 11:28:46.869|
> |1900-01-01 11:28:46.869|
> |1900-01-01 11:28:46.869|
> |1890-01-01 00:00:00.0|
> |1901-01-01 00:00:00.0|
> |1900-01-01 11:28:46.869|
> +---++
> -I didn't notice this on any other date.-
>  This might be connected to old issue: HIVE-10488





[jira] [Comment Edited] (HIVE-21121) Cast date to timestamp incorrect interpretation

2019-02-06 Thread paco87 (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16761836#comment-16761836
 ] 

paco87 edited comment on HIVE-21121 at 2/6/19 3:20 PM:
---

It appears that this issue has something to do with which timezone is set (maybe
due to summer/winter time); if the UTC offset is 0, everything works fine.

I checked two timezones, Europe/Warsaw and Europe/Kiev, and different dates have
this problem.

In any case, casting a date to a timestamp should show the correct time,
"00:00:00".

Timezone Europe/Kiev example:

select cast(cast('1900-01-01' as date) as timestamp); 
select cast(cast('1930-06-21' as date) as timestamp); 
select cast(cast('1943-11-06' as date) as timestamp); 
select cast(cast('1981-04-01' as date) as timestamp); 
select cast(cast('1982-04-01' as date) as timestamp); 
select cast(cast('1983-04-01' as date) as timestamp); 
select cast(cast('1984-04-01' as date) as timestamp); 
select cast(cast('1984-09-30' as date) as timestamp); 
select cast(cast('1984-09-30' as date) as timestamp); 
select cast(cast('1985-09-29' as date) as timestamp); 
select cast(cast('1985-09-29' as date) as timestamp); 
select cast(cast('1986-09-28' as date) as timestamp); 
select cast(cast('1986-09-28' as date) as timestamp); 
select cast(cast('1987-09-27' as date) as timestamp); 
select cast(cast('1987-09-27' as date) as timestamp); 
select cast(cast('1988-09-25' as date) as timestamp); 
select cast(cast('1988-09-25' as date) as timestamp); 
select cast(cast('1989-09-24' as date) as timestamp); 
select cast(cast('1989-09-24' as date) as timestamp); 
select cast(cast('1990-07-01' as date) as timestamp); 
select cast(cast('1991-09-29' as date) as timestamp); 
select cast(cast('1991-09-29' as date) as timestamp);


was (Author: paco87):
It appears that this issue has something to do with which timezone is set (maybe
due to summer/winter time); if the UTC offset is 0, everything works fine.

I checked two timezones, Europe/Warsaw and Europe/Kiev; different dates have
this problem.

In any case, casting a date to a timestamp should show the correct time,
"00:00:00".

> Cast date to timestamp incorrect interpretation
> ---
>
> Key: HIVE-21121
> URL: https://issues.apache.org/jira/browse/HIVE-21121
> Project: Hive
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.2.1, 2.1.0, 3.0.0
>Reporter: paco87
>Priority: Major
> Attachments: jira_replicate_issue.txt
>
>
> Hive returns a timestamp with the current time when casting a date to a
> timestamp, where the time should be 00:00:00.0.
> This issue is weird. It seems that this happens only on specific dates:
> '1900-01-01'.
> +++
> |ab|
> +++
> |1900-01-01 11:28:46.869|
> |1900-01-01 11:28:46.869|
> |1900-01-01 11:28:46.869|
> |1900-01-01 11:28:46.869|
> |1890-01-01 00:00:00.0|
> |1901-01-01 00:00:00.0|
> |1900-01-01 11:28:46.869|
> +---++
> -I didn't notice this on any other date.-
>  This might be connected to old issue: HIVE-10488





[jira] [Updated] (HIVE-21121) Cast date to timestamp incorrect interpretation

2019-02-06 Thread paco87 (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

paco87 updated HIVE-21121:
--
Description: 
Hive returns a timestamp with the current time when casting a date to a
timestamp, where the time should be 00:00:00.0.

This issue is weird. It seems that this happens only on specific dates:
'1900-01-01'.

+++
|ab|

+++
|1900-01-01 11:28:46.869|
|1900-01-01 11:28:46.869|
|1900-01-01 11:28:46.869|
|1900-01-01 11:28:46.869|
|1890-01-01 00:00:00.0|
|1901-01-01 00:00:00.0|
|1900-01-01 11:28:46.869|

+---++

-I didn't notice this on any other date.-

 This might be connected to old issue: HIVE-10488

  was:
Hive returns a timestamp with the current time when casting a date to a
timestamp, where the time should be 00:00:00.0.

This issue is weird. It seems that this happens only on specific dates:
'1900-01-01'.

+++
|ab|

+++
|1900-01-01 11:28:46.869|
|1900-01-01 11:28:46.869|
|1900-01-01 11:28:46.869|
|1900-01-01 11:28:46.869|
|1890-01-01 00:00:00.0|
|1901-01-01 00:00:00.0|
|1900-01-01 11:28:46.869|

+---++

I didn't notice this on any other date.

 This might be connected to old issue: HIVE-10488


> Cast date to timestamp incorrect interpretation
> ---
>
> Key: HIVE-21121
> URL: https://issues.apache.org/jira/browse/HIVE-21121
> Project: Hive
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.2.1, 2.1.0, 3.0.0
>Reporter: paco87
>Priority: Major
> Attachments: jira_replicate_issue.txt
>
>
> Hive returns a timestamp with the current time when casting a date to a
> timestamp, where the time should be 00:00:00.0.
> This issue is weird. It seems that this happens only on specific dates:
> '1900-01-01'.
> +++
> |ab|
> +++
> |1900-01-01 11:28:46.869|
> |1900-01-01 11:28:46.869|
> |1900-01-01 11:28:46.869|
> |1900-01-01 11:28:46.869|
> |1890-01-01 00:00:00.0|
> |1901-01-01 00:00:00.0|
> |1900-01-01 11:28:46.869|
> +---++
> -I didn't notice this on any other date.-
>  This might be connected to old issue: HIVE-10488





[jira] [Updated] (HIVE-21121) Cast date to timestamp incorrect interpretation

2019-02-06 Thread paco87 (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

paco87 updated HIVE-21121:
--
Description: 
Hive returns a timestamp with the current time when casting a date to a
timestamp, where the time should be 00:00:00.0.

This issue is weird. It seems that this happens only on specific dates:
'1900-01-01'.

+++
|ab|

+++
|1900-01-01 11:28:46.869|
|1900-01-01 11:28:46.869|
|1900-01-01 11:28:46.869|
|1900-01-01 11:28:46.869|
|1890-01-01 00:00:00.0|
|1901-01-01 00:00:00.0|
|1900-01-01 11:28:46.869|

+---++

I didn't notice this on any other date.

 This might be connected to old issue: HIVE-10488

  was:
Hive returns a timestamp with the current time when casting a date to a
timestamp, where the time should be 00:00:00.0.

This issue is weird. It seems that this happens only on a specific date:
'1900-01-01'.

+++
|ab|

+++
|1900-01-01 11:28:46.869|
|1900-01-01 11:28:46.869|
|1900-01-01 11:28:46.869|
|1900-01-01 11:28:46.869|
|1890-01-01 00:00:00.0|
|1901-01-01 00:00:00.0|
|1900-01-01 11:28:46.869|

+---++

I didn't notice this on any other date.

 This might be connected to old issue: HIVE-10488


> Cast date to timestamp incorrect interpretation
> ---
>
> Key: HIVE-21121
> URL: https://issues.apache.org/jira/browse/HIVE-21121
> Project: Hive
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.2.1, 2.1.0, 3.0.0
>Reporter: paco87
>Priority: Major
> Attachments: jira_replicate_issue.txt
>
>
> Hive returns a timestamp with the current time when casting a date to a
> timestamp, where the time should be 00:00:00.0.
> This issue is weird. It seems that this happens only on specific dates:
> '1900-01-01'.
> +++
> |ab|
> +++
> |1900-01-01 11:28:46.869|
> |1900-01-01 11:28:46.869|
> |1900-01-01 11:28:46.869|
> |1900-01-01 11:28:46.869|
> |1890-01-01 00:00:00.0|
> |1901-01-01 00:00:00.0|
> |1900-01-01 11:28:46.869|
> +---++
> I didn't notice this on any other date.
>  This might be connected to old issue: HIVE-10488





[jira] [Updated] (HIVE-21121) Cast date to timestamp incorrect interpretation

2019-02-06 Thread paco87 (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

paco87 updated HIVE-21121:
--
Affects Version/s: 3.0.0

> Cast date to timestamp incorrect interpretation
> ---
>
> Key: HIVE-21121
> URL: https://issues.apache.org/jira/browse/HIVE-21121
> Project: Hive
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.2.1, 2.1.0, 3.0.0
>Reporter: paco87
>Priority: Major
> Attachments: jira_replicate_issue.txt
>
>
> Hive returns a timestamp with the current time when casting a date to a
> timestamp, where the time should be 00:00:00.0.
> This issue is weird. It seems that this happens only on a specific date:
> '1900-01-01'.
> +++
> |ab|
> +++
> |1900-01-01 11:28:46.869|
> |1900-01-01 11:28:46.869|
> |1900-01-01 11:28:46.869|
> |1900-01-01 11:28:46.869|
> |1890-01-01 00:00:00.0|
> |1901-01-01 00:00:00.0|
> |1900-01-01 11:28:46.869|
> +---++
> I didn't notice this on any other date.
>  This might be connected to old issue: HIVE-10488





[jira] [Comment Edited] (HIVE-21121) Cast date to timestamp incorrect interpretation

2019-02-06 Thread paco87 (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16761836#comment-16761836
 ] 

paco87 edited comment on HIVE-21121 at 2/6/19 3:15 PM:
---

It appears that this issue has something to do with which timezone is set (maybe
due to summer/winter time); if the UTC offset is 0, everything works fine.

I checked two timezones, Europe/Warsaw and Europe/Kiev; different dates have
this problem.

In any case, casting a date to a timestamp should show the correct time,
"00:00:00".


was (Author: paco87):
It appears that this issue occurs when a timezone is set (maybe due to
summer/winter time); if the UTC offset is 0, everything works fine.

I checked two timezones, Europe/Warsaw and Europe/Kiev; different dates have
this problem.

> Cast date to timestamp incorrect interpretation
> ---
>
> Key: HIVE-21121
> URL: https://issues.apache.org/jira/browse/HIVE-21121
> Project: Hive
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.2.1, 2.1.0
>Reporter: paco87
>Priority: Major
> Attachments: jira_replicate_issue.txt
>
>
> Hive returns a timestamp with the current time when casting a date to a
> timestamp, where the time should be 00:00:00.0.
> This issue is weird. It seems that this happens only on a specific date:
> '1900-01-01'.
> +++
> |ab|
> +++
> |1900-01-01 11:28:46.869|
> |1900-01-01 11:28:46.869|
> |1900-01-01 11:28:46.869|
> |1900-01-01 11:28:46.869|
> |1890-01-01 00:00:00.0|
> |1901-01-01 00:00:00.0|
> |1900-01-01 11:28:46.869|
> +---++
> I didn't notice this on any other date.
>  This might be connected to old issue: HIVE-10488





[jira] [Commented] (HIVE-21121) Cast date to timestamp incorrect interpretation

2019-02-06 Thread paco87 (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16761836#comment-16761836
 ] 

paco87 commented on HIVE-21121:
---

It appears that this issue occurs when a timezone is set (maybe due to
summer/winter time); if the UTC offset is 0, everything works fine.

I checked two timezones, Europe/Warsaw and Europe/Kiev; different dates have
this problem.

> Cast date to timestamp incorrect interpretation
> ---
>
> Key: HIVE-21121
> URL: https://issues.apache.org/jira/browse/HIVE-21121
> Project: Hive
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.2.1, 2.1.0
>Reporter: paco87
>Priority: Major
> Attachments: jira_replicate_issue.txt
>
>
> Hive returns a timestamp with the current time when casting a date to a
> timestamp, where the time should be 00:00:00.0.
> This issue is weird. It seems that this happens only on a specific date:
> '1900-01-01'.
> +++
> |ab|
> +++
> |1900-01-01 11:28:46.869|
> |1900-01-01 11:28:46.869|
> |1900-01-01 11:28:46.869|
> |1900-01-01 11:28:46.869|
> |1890-01-01 00:00:00.0|
> |1901-01-01 00:00:00.0|
> |1900-01-01 11:28:46.869|
> +---++
> I didn't notice this on any other date.
>  This might be connected to old issue: HIVE-10488





[jira] [Updated] (HIVE-21199) Replace all occurences of new Byte with Byte.valueOf

2019-02-06 Thread Zoltan Haindrich (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich updated HIVE-21199:

   Resolution: Fixed
Fix Version/s: 4.0.0
   Status: Resolved  (was: Patch Available)

pushed to master. Thank you [~isuller]!

> Replace all occurences of new Byte with Byte.valueOf
> 
>
> Key: HIVE-21199
> URL: https://issues.apache.org/jira/browse/HIVE-21199
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Reporter: Ivan Suller
>Assignee: Ivan Suller
>Priority: Trivial
> Fix For: 4.0.0
>
> Attachments: HIVE-21199.01.patch, HIVE-21199.02.patch, 
> HIVE-21199.03.patch, HIVE-21199.04.patch, HIVE-21199.05.patch
>
>
> Creating Byte objects with new Byte(...) creates a new object, while 
> Byte.valueOf(...) can be cached (and is actually cached in most if not all 
> JVMs) thus reducing GC overhead.
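
A small Java sketch of the point made in the description; the class and method names are hypothetical. It relies only on the documented behaviour of Byte.valueOf(byte), whose Javadoc states that all byte values are cached, so two calls with the same value return the same object. The deprecated constructor is mentioned only in a comment rather than called.

```java
public class ByteCacheDemo {

    /** Returns true when both references point to the very same object. */
    static boolean sameInstance(Byte a, Byte b) {
        return a == b; // reference comparison on purpose, not equals()
    }

    public static void main(String[] args) {
        byte raw = 42;

        // Byte.valueOf hands out a shared, cached instance for every byte value.
        Byte first = Byte.valueOf(raw);
        Byte second = Byte.valueOf(raw);
        System.out.println(sameInstance(first, second)); // prints true

        // By contrast, "new Byte(raw)" (deprecated since Java 9) would allocate
        // a fresh object on every call, which is what the patch avoids.
    }
}
```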





[jira] [Commented] (HIVE-20758) Constraints: Show create table does not show constraints

2019-02-06 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16761785#comment-16761785
 ] 

Hive QA commented on HIVE-20758:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  7m 40s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  1s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 39s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  3m 45s{color} | {color:blue} ql in master has 2303 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 54s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 18s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 59s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 59s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 38s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 52s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 54s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red}  0m 12s{color} | {color:red} The patch generated 1 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 22m 18s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /data/hiveptest/working/yetus_PreCommit-HIVE-Build-15965/dev-support/hive-personality.sh |
| git revision | master / 269dc5d |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| asflicense | http://104.198.109.242/logs//PreCommit-HIVE-Build-15965/yetus/patch-asflicense-problems.txt |
| modules | C: ql U: ql |
| Console output | http://104.198.109.242/logs//PreCommit-HIVE-Build-15965/yetus.txt |
| Powered by | Apache Yetus http://yetus.apache.org |


This message was automatically generated.



> Constraints: Show create table does not show constraints
> 
>
> Key: HIVE-20758
> URL: https://issues.apache.org/jira/browse/HIVE-20758
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Gopal V
>Assignee: Barnabas Maidics
>Priority: Major
> Attachments: HIVE-20758.1.patch, HIVE-20758.2.patch, 
> HIVE-20758.3.patch, Screen Shot 2019-01-23 at 11.52.04.png
>
>
> Even though {{desc formatted}} shows the constraints, {{show create table}}
> does not.
> {code}
> | # Primary Key      | NULL                                   | NULL        |
> | Table:             | tpcds_bin_partitioned_orc_1.inventory  | NULL        |
> | Constraint Name:   | pk_in                                  | NULL        |
> | Column Names:      | inv_date_sk                            | inv_item_sk |
> |                    | NULL                                   | NULL        |
> | # Foreign Keys     | NULL                                   | NULL

[jira] [Comment Edited] (HIVE-20849) Review of ConstantPropagateProcFactory

2019-02-06 Thread BELUGA BEHR (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16760819#comment-16760819
 ] 

BELUGA BEHR edited comment on HIVE-20849 at 2/6/19 2:27 PM:


[~pvary] Thanks for the review!

# Do you mean why I did not use parameters in the logging?  First, I wanted to 
change as little as possible.  I know you prefer that. :)  Second, using 
parameters is the fastest way to *not* log [1], but it would be very unlikely 
that someone would configure their logging to be CRITICAL level only, so in 
practice, these messages will always be logged; there is no need to optimize.
# The {{Collections.emptyMap()}} is relying on unit tests and qtests for 
verification.



[1] https://www.slf4j.org/faq.html#logging_performance
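
To illustrate the "fastest way to not log" remark, here is a hedged Java sketch. It does not use SLF4J; it substitutes the JDK's java.util.logging Supplier overload as a stand-in for SLF4J's parameterized messages so that it runs with the JDK alone. The class, method, and logger names are hypothetical. It counts how often the message string is built when the log level is disabled versus enabled.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.logging.Level;
import java.util.logging.Logger;

public class LazyLoggingDemo {

    /** Counts how many times the log message was actually constructed. */
    static final AtomicInteger BUILDS = new AtomicInteger();

    static String expensiveMessage() {
        BUILDS.incrementAndGet();
        return "columns: " + String.join(",", "a", "b", "c");
    }

    /** Returns {eagerBuilds, lazyBuilds} for one fine-level call of each style. */
    static int[] countBuilds(Level loggerLevel) {
        Logger log = Logger.getLogger("demo.lazy");
        log.setUseParentHandlers(false); // keep the demo quiet on the console
        log.setLevel(loggerLevel);

        BUILDS.set(0);
        log.fine(expensiveMessage());                // argument is built before the call
        int eager = BUILDS.get();

        BUILDS.set(0);
        log.fine(LazyLoggingDemo::expensiveMessage); // built only if FINE is enabled
        int lazy = BUILDS.get();

        return new int[] {eager, lazy};
    }

    public static void main(String[] args) {
        int[] disabled = countBuilds(Level.INFO);
        System.out.println(disabled[0] + " " + disabled[1]); // prints 1 0
        int[] enabled = countBuilds(Level.FINE);
        System.out.println(enabled[0] + " " + enabled[1]);   // prints 1 1
    }
}
```

When the level is enabled, both styles build the message once, which matches the argument above that deferring construction buys nothing for messages that are always logged.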


was (Author: belugabehr):
[~pvary] Thanks for the review!

# Do you mean why I did not use parameters in the logging?  First, I wanted to 
change as little as possible.  I know you prefer that. :)  Second, using 
parameters is the fastest way to *not* log [1], but it would be very unlikely 
that someone would configure their logging to be CRITICAL level only, so in 
practice, these messages will always be logged; there is no need to optimize.
# The {{Collections.emptyMap()}} is relying on unit tests for verification.



[1] https://www.slf4j.org/faq.html#logging_performance

> Review of ConstantPropagateProcFactory
> --
>
> Key: HIVE-20849
> URL: https://issues.apache.org/jira/browse/HIVE-20849
> Project: Hive
>  Issue Type: Improvement
>  Components: Logical Optimizer
>Affects Versions: 3.1.0, 4.0.0
>Reporter: BELUGA BEHR
>Assignee: BELUGA BEHR
>Priority: Minor
> Attachments: HIVE-20849.1.patch, HIVE-20849.1.patch, 
> HIVE-20849.2.patch, HIVE-20849.3.patch, HIVE-20849.4.patch, 
> HIVE-20849.5.patch, HIVE-20849.6.patch
>
>
> I was looking at this class because it blasts a lot of useless (to an admin) 
> information to the logs.  Especially if the table has a lot of columns, I see 
> big blocks of logging that are meaningless to me.  I request that the logging 
> be toned down to debug level, along with some other improvements to the code.





[jira] [Updated] (HIVE-21071) Improve getInputSummary

2019-02-06 Thread BELUGA BEHR (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BELUGA BEHR updated HIVE-21071:
---
Status: Patch Available  (was: Open)

> Improve getInputSummary
> ---
>
> Key: HIVE-21071
> URL: https://issues.apache.org/jira/browse/HIVE-21071
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Affects Versions: 4.0.0, 3.2.0
>Reporter: BELUGA BEHR
>Assignee: BELUGA BEHR
>Priority: Major
> Attachments: HIVE-21071.1.patch, HIVE-21071.2.patch, 
> HIVE-21071.3.patch, HIVE-21071.4.patch, HIVE-21071.5.patch, 
> HIVE-21071.6.patch, HIVE-21071.7.patch, HIVE-21071.8.patch, HIVE-21071.9.patch
>
>
> There is a global lock in the {{getInputSummary}} code, so it is important 
> that it be fast.  The current implementation has quite a bit of overhead that 
> can be re-engineered.
> For example, the current implementation keeps a map of File Path to 
> ContentSummary object.  This map is populated by several threads 
> concurrently. The method then loops through the map, in a single thread, at 
> the end to add up all of the ContentSummary objects and ignores the paths.  
> The code can be re-engineered to not use a map, or a collection at all, to 
> store the results and instead just keep a running tally.  By keeping a tally, 
> there is no {{O\(n)}} operation at the end to perform the addition.
> There are other things that can be improved.  The method returns an object 
> which is never used anywhere, so the method should be changed to a void return 
> type.
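
The running-tally idea described above can be sketched roughly as follows. This is not the code from the attached patches; the class and method names are hypothetical, and plain long values stand in for ContentSummary objects. It assumes a java.util.concurrent.atomic.LongAdder is an acceptable accumulator, so worker threads add to a shared total instead of filling a map that is summed afterwards.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.LongAdder;

public class RunningTallyDemo {

    /** Adds up the sizes on several threads without storing per-path results. */
    static long tally(List<Long> sizes, int threads) {
        LongAdder total = new LongAdder();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (long size : sizes) {
            // Each worker contributes directly to the shared tally.
            pool.execute(() -> total.add(size));
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        // No loop over a collection is needed here; the total is already known.
        return total.sum();
    }

    public static void main(String[] args) {
        System.out.println(tally(Arrays.asList(10L, 20L, 30L), 2)); // prints 60
    }
}
```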





[jira] [Updated] (HIVE-21071) Improve getInputSummary

2019-02-06 Thread BELUGA BEHR (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BELUGA BEHR updated HIVE-21071:
---
Attachment: HIVE-21071.9.patch

> Improve getInputSummary
> ---
>
> Key: HIVE-21071
> URL: https://issues.apache.org/jira/browse/HIVE-21071
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Affects Versions: 4.0.0, 3.2.0
>Reporter: BELUGA BEHR
>Assignee: BELUGA BEHR
>Priority: Major
> Attachments: HIVE-21071.1.patch, HIVE-21071.2.patch, 
> HIVE-21071.3.patch, HIVE-21071.4.patch, HIVE-21071.5.patch, 
> HIVE-21071.6.patch, HIVE-21071.7.patch, HIVE-21071.8.patch, HIVE-21071.9.patch
>
>
> There is a global lock in the {{getInputSummary}} code, so it is important 
> that it be fast.  The current implementation has quite a bit of overhead that 
> can be re-engineered.
> For example, the current implementation keeps a map of File Path to 
> ContentSummary object.  This map is populated by several threads 
> concurrently. The method then loops through the map, in a single thread, at 
> the end to add up all of the ContentSummary objects and ignores the paths.  
> The code can be re-engineered to not use a map, or a collection at all, to 
> store the results and instead just keep a running tally.  By keeping a tally, 
> there is no {{O\(n)}} operation at the end to perform the addition.
> There are other things that can be improved.  The method returns an object 
> which is never used anywhere, so the method should be changed to a void return 
> type.





[jira] [Updated] (HIVE-21071) Improve getInputSummary

2019-02-06 Thread BELUGA BEHR (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BELUGA BEHR updated HIVE-21071:
---
Status: Open  (was: Patch Available)

> Improve getInputSummary
> ---
>
> Key: HIVE-21071
> URL: https://issues.apache.org/jira/browse/HIVE-21071
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Affects Versions: 4.0.0, 3.2.0
>Reporter: BELUGA BEHR
>Assignee: BELUGA BEHR
>Priority: Major
> Attachments: HIVE-21071.1.patch, HIVE-21071.2.patch, 
> HIVE-21071.3.patch, HIVE-21071.4.patch, HIVE-21071.5.patch, 
> HIVE-21071.6.patch, HIVE-21071.7.patch, HIVE-21071.8.patch
>
>
> There is a global lock in the {{getInputSummary}} code, so it is important 
> that it be fast.  The current implementation has quite a bit of overhead that 
> can be re-engineered.
> For example, the current implementation keeps a map of File Path to 
> ContentSummary object.  This map is populated by several threads 
> concurrently. The method then loops through the map, in a single thread, at 
> the end to add up all of the ContentSummary objects and ignores the paths.  
> The code can be re-engineered to not use a map, or a collection at all, to 
> store the results and instead just keep a running tally.  By keeping a tally, 
> there is no {{O\(n)}} operation at the end to perform the addition.
> There are other things that can be improved.  The method returns an object 
> which is never used anywhere, so the method should be changed to a void return 
> type.




