[jira] [Updated] (HIVE-17626) Query reoptimization using cached runtime statistics

2018-03-04 Thread Zoltan Haindrich (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich updated HIVE-17626:

Attachment: (was: HIVE-17626.08.patch)

> Query reoptimization using cached runtime statistics
> 
>
> Key: HIVE-17626
> URL: https://issues.apache.org/jira/browse/HIVE-17626
> Project: Hive
>  Issue Type: New Feature
>  Components: Logical Optimizer
>Affects Versions: 3.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Zoltan Haindrich
>Priority: Major
> Attachments: HIVE-17626.01.patch, HIVE-17626.01wip01.patch, 
> HIVE-17626.02.patch, HIVE-17626.03.patch, HIVE-17626.04.patch, 
> HIVE-17626.05.patch, HIVE-17626.06.patch, HIVE-17626.07A.patch, 
> HIVE-17626.07B.patch, runtimestats.patch
>
>
> Something similar to "EXPLAIN ANALYZE" where we annotate explain plan with 
> actual and estimated statistics. The runtime stats can be cached at query 
> level and subsequent execution of the same query can make use of the cached 
> statistics from the previous run for better optimization. 
> Some use cases:
> 1) Re-planning join queries (mapjoin failures can be converted to shuffle joins)
> 2) Better statistics for the table scan operator if dynamic partition pruning is 
> involved
> 3) Better estimates for bloom filter initialization (setting expected entries 
> during merge)
> This can be extended to support wider queries by caching fragments of operator 
> plans scanning the same table(s) or matching some operator sequences.
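
The caching idea in the description above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the design in the attached patches: the class `RuntimeStatsCache`, its methods, and the operator id strings are all hypothetical, and the cache is keyed by a whitespace-normalized query string.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: none of these names come from Hive or from the patches.
final class RuntimeStatsCache {
    // Normalized query text -> (operator id -> row count observed at runtime).
    private final Map<String, Map<String, Long>> observed = new ConcurrentHashMap<>();

    /** Records the row count an operator actually produced during execution. */
    void record(String query, String operatorId, long actualRows) {
        observed.computeIfAbsent(normalize(query), k -> new ConcurrentHashMap<>())
                .put(operatorId, actualRows);
    }

    /** Returns the row count seen on a previous run of the same query, if any. */
    Optional<Long> lookup(String query, String operatorId) {
        Map<String, Long> stats = observed.get(normalize(query));
        return stats == null ? Optional.empty() : Optional.ofNullable(stats.get(operatorId));
    }

    /** Prefers the cached runtime value over the optimizer's estimate. */
    long rowsForPlanning(String query, String operatorId, long estimatedRows) {
        return lookup(query, operatorId).orElse(estimatedRows);
    }

    /** Use case 1: fall back to a shuffle join when the observed size is too large. */
    boolean fitsMapJoin(String query, String operatorId, long estimatedRows, long maxMapJoinRows) {
        return rowsForPlanning(query, operatorId, estimatedRows) <= maxMapJoinRows;
    }

    // Assumption: two queries are "the same" if they differ only in whitespace.
    private static String normalize(String query) {
        return query.trim().replaceAll("\\s+", " ");
    }
}
```

On the first run nothing is cached, so planning uses the estimate; after `record(...)` is called with the actual counts, a second compilation of the same query would see the observed value instead.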



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-17626) Query reoptimization using cached runtime statistics

2018-03-04 Thread Zoltan Haindrich (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich updated HIVE-17626:

Attachment: HIVE-17626.08.patch

> Query reoptimization using cached runtime statistics
> 
>
> Key: HIVE-17626
> URL: https://issues.apache.org/jira/browse/HIVE-17626
> Project: Hive
>  Issue Type: New Feature
>  Components: Logical Optimizer
>Affects Versions: 3.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Zoltan Haindrich
>Priority: Major
> Attachments: HIVE-17626.01.patch, HIVE-17626.01wip01.patch, 
> HIVE-17626.02.patch, HIVE-17626.03.patch, HIVE-17626.04.patch, 
> HIVE-17626.05.patch, HIVE-17626.06.patch, HIVE-17626.07A.patch, 
> HIVE-17626.07B.patch, HIVE-17626.08.patch, runtimestats.patch
>
>
> Something similar to "EXPLAIN ANALYZE" where we annotate explain plan with 
> actual and estimated statistics. The runtime stats can be cached at query 
> level and subsequent execution of the same query can make use of the cached 
> statistics from the previous run for better optimization. 
> Some use cases:
> 1) Re-planning join queries (mapjoin failures can be converted to shuffle joins)
> 2) Better statistics for the table scan operator if dynamic partition pruning is 
> involved
> 3) Better estimates for bloom filter initialization (setting expected entries 
> during merge)
> This can be extended to support wider queries by caching fragments of operator 
> plans scanning the same table(s) or matching some operator sequences.





[jira] [Commented] (HIVE-17552) Enable bucket map join by default

2018-03-04 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16385705#comment-16385705
 ] 

Hive QA commented on HIVE-17552:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  1s{color} | {color:blue} Findbugs executables are not available. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 19s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  7m  6s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 15s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 52s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  5s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m  8s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 32s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 53s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  3s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red}  0m 14s{color} | {color:red} The patch generated 49 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 15m 57s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /data/hiveptest/working/yetus_PreCommit-HIVE-Build-9483/dev-support/hive-personality.sh |
| git revision | master / 05d4719 |
| Default Java | 1.8.0_111 |
| asflicense | http://104.198.109.242/logs//PreCommit-HIVE-Build-9483/yetus/patch-asflicense-problems.txt |
| modules | C: common ql U: . |
| Console output | http://104.198.109.242/logs//PreCommit-HIVE-Build-9483/yetus.txt |
| Powered by | Apache Yetus http://yetus.apache.org |


This message was automatically generated.



> Enable bucket map join by default
> -
>
> Key: HIVE-17552
> URL: https://issues.apache.org/jira/browse/HIVE-17552
> Project: Hive
>  Issue Type: Bug
>Reporter: Deepak Jaiswal
>Assignee: Deepak Jaiswal
>Priority: Major
> Attachments: HIVE-17552.1.patch, HIVE-17552.10.patch, 
> HIVE-17552.2.patch, HIVE-17552.3.patch, HIVE-17552.4.patch, 
> HIVE-17552.5.patch, HIVE-17552.6.patch, HIVE-17552.7.patch, 
> HIVE-17552.8.patch, HIVE-17552.9.patch
>
>
> Currently bucket map join is disabled by default; however, it is potentially 
> the most optimal join we have. We need to enable it by default.





[jira] [Commented] (HIVE-18743) CREATE TABLE on S3 data can be extremely slow. DO_NOT_UPDATE_STATS workaround is buggy.

2018-03-04 Thread Zoltan Haindrich (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16385693#comment-16385693
 ] 

Zoltan Haindrich commented on HIVE-18743:
-

[~akolb] if the stats collection is removed from the metastore, the code you 
are testing will also be gone, because it will no longer happen there...
I think the following command sequence could make this testable:
create table; insert; desc the table; remove files from the table datadir with 
dfs commands; alter table; desc table - stats are the same

> CREATE TABLE on S3 data can be extremely slow. DO_NOT_UPDATE_STATS workaround 
> is buggy.
> ---
>
> Key: HIVE-18743
> URL: https://issues.apache.org/jira/browse/HIVE-18743
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Affects Versions: 1.2.0, 1.1.0, 2.0.2, 3.0.0
>Reporter: Alexander Behm
>Assignee: Alexander Kolbasov
>Priority: Major
> Attachments: HIVE-18743.06.patch, HIVE-18743.07.patch
>
>
> When hive.stats.autogather=true then the Metastore lists all files under the 
> table directory to populate basic stats like file counts and sizes. This file 
> listing operation can be very expensive particularly on filesystems like S3.
> One way to address this issue is to reconfigure hive.stats.autogather=false.
> *Here's the bug*
> It is my understanding that the DO_NOT_UPDATE_STATS table property is 
> intended to selectively prevent this stats collection. Unfortunately, this 
> table property is checked *after* the expensive file listing operation, so 
> the DO_NOT_UPDATE_STATS does not seem to work as intended. See:
> https://github.com/apache/hive/blob/master/standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/utils/MetaStoreUtils.java#L633
> Relevant code snippet:
> {code}
>   public static boolean updateTableStatsFast(Database db, Table tbl, Warehouse wh,
>       boolean madeDir, boolean forceRecompute, EnvironmentContext environmentContext)
>       throws MetaException {
>     if (tbl.getPartitionKeysSize() == 0) {
>       // Update stats only when unpartitioned
>       FileStatus[] fileStatuses = wh.getFileStatusesForUnpartitionedTable(db, tbl);
>       // DO_NOT_UPDATE_STATS is checked in the call below, after
>       // wh.getFileStatusesForUnpartitionedTable() has already been called
>       return updateTableStatsFast(tbl, fileStatuses, madeDir, forceRecompute,
>           environmentContext);
>     } else {
>       return false;
>     }
>   }
> {code}
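
One possible shape of a fix for the ordering problem described above is to consult the table property before touching the filesystem. The sketch below is only an illustration with simplified stand-ins: `StatsUpdateSketch`, the `Supplier`-based file listing, and the use of a plain `Map` for table parameters are assumptions and not the real `MetaStoreUtils` API or the attached patch.

```java
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical sketch: simplified stand-ins, not the real metastore classes.
final class StatsUpdateSketch {
    static final String DO_NOT_UPDATE_STATS = "DO_NOT_UPDATE_STATS";

    /**
     * Returns true if basic stats were updated. The potentially expensive file
     * listing is passed as a Supplier so that it only runs after the cheap
     * property check has passed.
     */
    static boolean updateTableStatsFast(Map<String, String> tableParams,
                                        boolean partitioned,
                                        Supplier<long[]> listFileSizes) {
        if (partitioned) {
            return false; // update stats only when unpartitioned
        }
        // Cheap check first: skip the listing entirely if the table opts out.
        if (Boolean.parseBoolean(tableParams.get(DO_NOT_UPDATE_STATS))) {
            return false;
        }
        long[] sizes = listFileSizes.get(); // expensive on S3-like filesystems
        long total = 0;
        for (long size : sizes) {
            total += size;
        }
        // Illustrative stat keys for file count and total size.
        tableParams.put("numFiles", Long.toString(sizes.length));
        tableParams.put("totalSize", Long.toString(total));
        return true;
    }
}
```

With `DO_NOT_UPDATE_STATS=true` in the table parameters, the supplier is never invoked, which is the behavior the reporter expected from the property.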





[jira] [Commented] (HIVE-18051) qfiles: dataset support

2018-03-04 Thread Zoltan Haindrich (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16385684#comment-16385684
 ] 

Zoltan Haindrich commented on HIVE-18051:
-

+1

> qfiles: dataset support
> ---
>
> Key: HIVE-18051
> URL: https://issues.apache.org/jira/browse/HIVE-18051
> Project: Hive
>  Issue Type: Improvement
>  Components: Testing Infrastructure
>Reporter: Zoltan Haindrich
>Assignee: Laszlo Bodor
>Priority: Major
> Attachments: HIVE-18051.01.patch, HIVE-18051.02.patch, 
> HIVE-18051.03.patch, HIVE-18051.04.patch, HIVE-18051.05.patch, 
> HIVE-18051.06.patch, HIVE-18051.07.patch, HIVE-18051.08.patch, 
> HIVE-18051.09.patch, HIVE-18051.10.patch, HIVE-18051.11.patch, 
> HIVE-18051.12.patch
>
>
> It would be great to have some kind of test dataset support. Currently there 
> is the {{q_test_init.sql}}, which is quite large, and I often override it 
> with an invalid string because I write independent qtests most of the time; 
> loading {{src}} and the other tables is just a waste of time for me, 
> not to mention that loading those tables may also trigger breakpoints, 
> which is a bit annoying.
> Most of the tests use "only" the {{src}} table and possibly 2 others; 
> however, the main init script contains a bunch of tables. Meanwhile, there are 
> quite a few other tests which could also benefit from a more general 
> feature; for example, the creation of {{bucket_small}} is present in 20 q 
> files.
> The proposal would be to enable the qfiles to be annotated with metadata like 
> datasets:
> {code}
> --! qt:dataset:src,bucket_small
> {code}
> Proposal for storing a dataset:
> * the loader script would be at: {{data/datasets/__NAME__/load.hive.sql}}
> * the table data could be stored under that location
> A draft about this and other qfile-related ideas:
> https://docs.google.com/document/d/1KtcIx8ggL9LxDintFuJo8NQuvNWkmtvv_ekbWrTLNGc/edit?usp=sharing
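
To make the proposal above more concrete, here is a rough sketch of how a test driver might read the `--! qt:dataset:` annotation and map each dataset name to its loader script. The class `QFileDatasets` and its methods are hypothetical and not taken from the patches; only the annotation syntax and the `data/datasets/__NAME__/load.hive.sql` layout come from the description.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch; class and method names are illustrative only.
final class QFileDatasets {
    private static final String PREFIX = "--! qt:dataset:";

    /** Collects dataset names from every annotation line of a qfile, in order. */
    static Set<String> parse(List<String> qfileLines) {
        Set<String> names = new LinkedHashSet<>();
        for (String line : qfileLines) {
            String trimmed = line.trim();
            if (!trimmed.startsWith(PREFIX)) {
                continue; // ordinary SQL or comment line
            }
            for (String name : trimmed.substring(PREFIX.length()).split(",")) {
                String cleaned = name.trim();
                if (!cleaned.isEmpty()) {
                    names.add(cleaned);
                }
            }
        }
        return names;
    }

    /** Maps each dataset name to its loader script, following the proposed layout. */
    static List<String> loaderScripts(Set<String> names) {
        List<String> scripts = new ArrayList<>();
        for (String name : names) {
            scripts.add("data/datasets/" + name + "/load.hive.sql");
        }
        return scripts;
    }
}
```

A driver built this way would load only the tables a qfile declares, rather than everything in {{q_test_init.sql}}.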





[jira] [Updated] (HIVE-17552) Enable bucket map join by default

2018-03-04 Thread Deepak Jaiswal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deepak Jaiswal updated HIVE-17552:
--
Attachment: HIVE-17552.10.patch

> Enable bucket map join by default
> -
>
> Key: HIVE-17552
> URL: https://issues.apache.org/jira/browse/HIVE-17552
> Project: Hive
>  Issue Type: Bug
>Reporter: Deepak Jaiswal
>Assignee: Deepak Jaiswal
>Priority: Major
> Attachments: HIVE-17552.1.patch, HIVE-17552.10.patch, 
> HIVE-17552.2.patch, HIVE-17552.3.patch, HIVE-17552.4.patch, 
> HIVE-17552.5.patch, HIVE-17552.6.patch, HIVE-17552.7.patch, 
> HIVE-17552.8.patch, HIVE-17552.9.patch
>
>
> Currently bucket map join is disabled by default; however, it is potentially 
> the most optimal join we have. We need to enable it by default.





[jira] [Work started] (HIVE-18749) Need to replace transactionId with writeId in RecordIdentifier.Field.transactionId

2018-03-04 Thread Sankar Hariappan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-18749 started by Sankar Hariappan.
---
> Need to replace transactionId with writeId in 
> RecordIdentifier.Field.transactionId
> --
>
> Key: HIVE-18749
> URL: https://issues.apache.org/jira/browse/HIVE-18749
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
>Affects Versions: 3.0.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>Priority: Minor
>  Labels: ACID
> Fix For: 3.0.0
>
>
> The per-table write ID implementation (HIVE-18192) has replaced the global 
> transaction ID with the write ID as the primary key for a row, marked by 
> RecordIdentifier.Field.transactionId.
> We need to replace the same with writeId and update all test result files.





[jira] [Commented] (HIVE-17552) Enable bucket map join by default

2018-03-04 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16385635#comment-16385635
 ] 

Hive QA commented on HIVE-17552:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12912970/HIVE-17552.9.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 68 failed/errored test(s), 13456 tests 
executed
*Failed tests:*
{noformat}
TestNegativeCliDriver - did not produce a TEST-*.xml file (likely timed out) 
(batchId=93)


[jira] [Assigned] (HIVE-18857) Store default value text instead of default value expression in metastore

2018-03-04 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg reassigned HIVE-18857:
--


> Store default value text instead of default value expression in metastore
> -
>
> Key: HIVE-18857
> URL: https://issues.apache.org/jira/browse/HIVE-18857
> Project: Hive
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Vineet Garg
>Assignee: Vineet Garg
>Priority: Major
>
> Currently, for a default value, an expression is generated and serialized to 
> be stored in the metastore. It would be better to serialize the default 
> value itself instead of the expression and store that in the metastore. This will 
> have the following benefits:
> * It will make metastore schema upgrades safe. E.g., if a UDF function name is 
> changed, Hive wouldn't be able to parse back the expression for this UDF, which 
> was serialized in an earlier version.
> * It will make the metastore schema for default constraints Hive-agnostic. Other 
> databases would be able to use the value as it is.





[jira] [Commented] (HIVE-17552) Enable bucket map join by default

2018-03-04 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16385607#comment-16385607
 ] 

Hive QA commented on HIVE-17552:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  0s{color} | {color:blue} Findbugs executables are not available. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 26s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  7m  9s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 18s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 52s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  4s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m  8s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 35s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 14s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 14s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 56s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  5s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red}  0m 13s{color} | {color:red} The patch generated 49 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 16m 19s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /data/hiveptest/working/yetus_PreCommit-HIVE-Build-9482/dev-support/hive-personality.sh |
| git revision | master / 05d4719 |
| Default Java | 1.8.0_111 |
| asflicense | http://104.198.109.242/logs//PreCommit-HIVE-Build-9482/yetus/patch-asflicense-problems.txt |
| modules | C: common ql U: . |
| Console output | http://104.198.109.242/logs//PreCommit-HIVE-Build-9482/yetus.txt |
| Powered by | Apache Yetus http://yetus.apache.org |


This message was automatically generated.



> Enable bucket map join by default
> -
>
> Key: HIVE-17552
> URL: https://issues.apache.org/jira/browse/HIVE-17552
> Project: Hive
>  Issue Type: Bug
>Reporter: Deepak Jaiswal
>Assignee: Deepak Jaiswal
>Priority: Major
> Attachments: HIVE-17552.1.patch, HIVE-17552.2.patch, 
> HIVE-17552.3.patch, HIVE-17552.4.patch, HIVE-17552.5.patch, 
> HIVE-17552.6.patch, HIVE-17552.7.patch, HIVE-17552.8.patch, HIVE-17552.9.patch
>
>
> Currently bucket map join is disabled by default; however, it is potentially 
> the most optimal join we have. We need to enable it by default.





[jira] [Updated] (HIVE-17552) Enable bucket map join by default

2018-03-04 Thread Deepak Jaiswal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deepak Jaiswal updated HIVE-17552:
--
Attachment: HIVE-17552.9.patch

> Enable bucket map join by default
> -
>
> Key: HIVE-17552
> URL: https://issues.apache.org/jira/browse/HIVE-17552
> Project: Hive
>  Issue Type: Bug
>Reporter: Deepak Jaiswal
>Assignee: Deepak Jaiswal
>Priority: Major
> Attachments: HIVE-17552.1.patch, HIVE-17552.2.patch, 
> HIVE-17552.3.patch, HIVE-17552.4.patch, HIVE-17552.5.patch, 
> HIVE-17552.6.patch, HIVE-17552.7.patch, HIVE-17552.8.patch, HIVE-17552.9.patch
>
>
> Currently bucket map join is disabled by default; however, it is potentially 
> the most optimal join we have. We need to enable it by default.





[jira] [Commented] (HIVE-14848) S3 creds added to a hidden list by HIVE-14588 are not working on MR jobs

2018-03-04 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16385540#comment-16385540
 ] 

Hive QA commented on HIVE-14848:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12830630/HIVE-14848.1.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 22 failed/errored test(s), 13059 tests 
executed
*Failed tests:*
{noformat}
TestNegativeCliDriver - did not produce a TEST-*.xml file (likely timed out) 
(batchId=93)


[jira] [Commented] (HIVE-16391) Publish proper Hive 1.2 jars (without including all dependencies in uber jar)

2018-03-04 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16385539#comment-16385539
 ] 

Saisai Shao commented on HIVE-16391:


Hi all,

Is there any progress on this? Spark currently uses the forked Hive 1.2.1.spark2, 
which rejects Hadoop 3.0 (SPARK-18673). We could patch the forked 
Hive 1.2.1.spark2 to support Hadoop 3, but it seems the proper solution is to 
maintain this in Hive as discussed (SPARK-20202) and fix it in the Hive 
community.

> Publish proper Hive 1.2 jars (without including all dependencies in uber jar)
> -
>
> Key: HIVE-16391
> URL: https://issues.apache.org/jira/browse/HIVE-16391
> Project: Hive
>  Issue Type: Task
>  Components: Build Infrastructure
>Reporter: Reynold Xin
>Priority: Major
>
> Apache Spark currently depends on a forked version of Apache Hive. AFAIK, the 
> only change in the fork is to work around the issue that Hive publishes only 
> two sets of jars: one set with no dependency declared, and another with all 
> the dependencies included in the published uber jar. That is to say, Hive 
> doesn't publish a set of jars with the proper dependencies declared.
> There is general consensus on both sides that we should remove the forked 
> Hive.
> The change in the forked version is recorded here 
> https://github.com/JoshRosen/hive/tree/release-1.2.1-spark2
> Note that the fork in the past included other fixes but those have all become 
> unnecessary.





[jira] [Commented] (HIVE-14848) S3 creds added to a hidden list by HIVE-14588 are not working on MR jobs

2018-03-04 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16385506#comment-16385506
 ] 

Hive QA commented on HIVE-14848:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  0s{color} | {color:blue} Findbugs executables are not available. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  7m 33s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 59s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 37s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 53s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 18s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  1s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m  1s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 36s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 52s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red}  0m 13s{color} | {color:red} The patch generated 49 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 14m 18s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /data/hiveptest/working/yetus_PreCommit-HIVE-Build-9481/dev-support/hive-personality.sh |
| git revision | master / 05d4719 |
| Default Java | 1.8.0_111 |
| asflicense | http://104.198.109.242/logs//PreCommit-HIVE-Build-9481/yetus/patch-asflicense-problems.txt |
| modules | C: ql U: ql |
| Console output | http://104.198.109.242/logs//PreCommit-HIVE-Build-9481/yetus.txt |
| Powered by | Apache Yetus http://yetus.apache.org |


This message was automatically generated.



> S3 creds added to a hidden list by HIVE-14588 are not working on MR jobs
> 
>
> Key: HIVE-14848
> URL: https://issues.apache.org/jira/browse/HIVE-14848
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 2.2.0
>Reporter: Sergio Peña
>Assignee: Sergio Peña
>Priority: Major
> Attachments: HIVE-14848.1.patch, HIVE-14848.1.patch
>
>
> When S3 credentials are included in hive-site.xml, MR jobs that need to read 
> data from S3 cannot use them because S3 values are stripped from the Job 
> configuration object before submitting the MR job.
> {noformat}
> @Override
> public void initialize(HiveConf conf, QueryPlan queryPlan, DriverContext driverContext) {
>   ...
>   conf.stripHiddenConfigurations(job);
>   this.jobExecHelper = new HadoopJobExecHelper(job, console, this, this);
> }
> {noformat}
> A nice-to-have (available on Hadoop 2.9.0) is the MR property 
> {{mapreduce.job.redacted-properties}}, which can be used to hide this list on 
> the MR side (such as the history server UI) while allowing MR to run the job 
> without issues.
> UPDATE:
> Change the call to stripHiddenConfigurations() in 
> ql/exec/tez/DagUtils.createConfiguration(), because this is currently broken 
> for running the hive-blobstore suite against Tez.
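
The difference between stripping hidden values and merely redacting them, as discussed in the description above, can be sketched with plain maps. This is an illustration only: `HiddenConfSketch` and its methods are hypothetical, a `Map` stands in for the job configuration, and `fs.s3a.secret.key` is used purely as an example of a hidden key. Only the property name `mapreduce.job.redacted-properties` comes from the issue text.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: a Map stands in for the Hadoop job configuration.
final class HiddenConfSketch {
    static final String REDACTED_PROPERTIES = "mapreduce.job.redacted-properties";

    /** Behavior described in the issue: hidden keys are dropped, so the job cannot use them. */
    static Map<String, String> strip(Map<String, String> conf, Set<String> hidden) {
        Map<String, String> job = new HashMap<>(conf);
        job.keySet().removeAll(hidden);
        return job;
    }

    /** Alternative: keep the values for the job and ask the MR side to hide them. */
    static Map<String, String> redact(Map<String, String> conf, Set<String> hidden) {
        Map<String, String> job = new HashMap<>(conf);
        // Sorted so the resulting comma-separated list is deterministic.
        Set<String> present = new TreeSet<>(hidden);
        present.retainAll(conf.keySet());
        if (!present.isEmpty()) {
            job.put(REDACTED_PROPERTIES, String.join(",", present));
        }
        return job;
    }
}
```

With `strip`, a task reading from S3 finds no credentials in its configuration; with `redact`, the credentials stay available to the job while the listed keys can be masked wherever the MR side honors the redaction property.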





[jira] [Comment Edited] (HIVE-18856) param note error

2018-03-04 Thread Yu Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16385500#comment-16385500
 ] 

Yu Wang edited comment on HIVE-18856 at 3/5/18 2:40 AM:


please have a look.[~thejas]


was (Author: gentlewang):
please have a look?[~thejas]

> param note error
> 
>
> Key: HIVE-18856
> URL: https://issues.apache.org/jira/browse/HIVE-18856
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 1.1.0
>Reporter: Yu Wang
>Assignee: Yu Wang
>Priority: Critical
> Fix For: 1.1.0
>
> Attachments: HIVE-18856.patch
>
>
> The comment on the PerfLogBegin method in the PerfLogger file contains an error





[jira] [Comment Edited] (HIVE-18856) param note error

2018-03-04 Thread Yu Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16385500#comment-16385500
 ] 

Yu Wang edited comment on HIVE-18856 at 3/5/18 2:39 AM:


please have a look?[~thejas]


was (Author: gentlewang):
mind have a look?[~thejas]

> param note error
> 
>
> Key: HIVE-18856
> URL: https://issues.apache.org/jira/browse/HIVE-18856
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 1.1.0
>Reporter: Yu Wang
>Assignee: Yu Wang
>Priority: Critical
> Fix For: 1.1.0
>
> Attachments: HIVE-18856.patch
>
>
> The comment on the PerfLogBegin method in the PerfLogger file contains an error





[jira] [Commented] (HIVE-18856) param note error

2018-03-04 Thread Yu Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16385500#comment-16385500
 ] 

Yu Wang commented on HIVE-18856:


mind have a look?[~thejas]

> param note error
> 
>
> Key: HIVE-18856
> URL: https://issues.apache.org/jira/browse/HIVE-18856
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 1.1.0
>Reporter: Yu Wang
>Assignee: Yu Wang
>Priority: Critical
> Fix For: 1.1.0
>
> Attachments: HIVE-18856.patch
>
>
> The comment on the PerfLogBegin method in the PerfLogger file contains an error





[jira] [Comment Edited] (HIVE-14848) S3 creds added to a hidden list by HIVE-14588 are not working on MR jobs

2018-03-04 Thread Franck Tago (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16385495#comment-16385495
 ] 

Franck Tago edited comment on HIVE-14848 at 3/5/18 2:19 AM:


Have we decided to completely ignore this issue in Hive then? This is a 
major problem. We run jobs for our customers on their cluster, and other than 
changing the hidden list, I do not know of any other workaround.

What is the current plan for this issue?

I also do not see a patch for the cited Tez issue. What is going on with that?


was (Author: tafra...@gmail.com):
Have we decided to completely ignore this issue in Hive then? This is a 
major problem. We run jobs for our customers on their cluster, and other than 
changing the hidden list, I do not know of any other workaround.

What is the current plan for this issue?

> S3 creds added to a hidden list by HIVE-14588 are not working on MR jobs
> 
>
> Key: HIVE-14848
> URL: https://issues.apache.org/jira/browse/HIVE-14848
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 2.2.0
>Reporter: Sergio Peña
>Assignee: Sergio Peña
>Priority: Major
> Attachments: HIVE-14848.1.patch, HIVE-14848.1.patch
>
>
> When S3 credentials are included in hive-site.xml, MR jobs that need to 
> read data from S3 cannot use them because S3 values are stripped from the 
> Job configuration object before submitting the MR job.
> {noformat}
> @Override
> public void initialize(HiveConf conf, QueryPlan queryPlan, DriverContext driverContext) {
>   ...
>   conf.stripHiddenConfigurations(job);
>   this.jobExecHelper = new HadoopJobExecHelper(job, console, this, this);
> }
> {noformat}
> A nice-to-have (available in Hadoop 2.9.0) is the MR property 
> {{mapreduce.job.redacted-properties}}, which can be used to hide this list on 
> the MR side (such as in the history server UI) while allowing MR to run the job 
> without issues.
> UPDATE:
> Change the call to stripHiddenConfigurations() in 
> ql/exec/tez/DagUtils.createConfiguration(), because this is currently broken 
> when running the hive-blobstore suite against Tez.
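
To make the failure mode concrete, here is a minimal Java sketch of what stripping a hidden list from a job configuration does. It is not Hive's implementation: the class HiddenConfigSketch, the method stripHidden, the use of a plain Map in place of a Hadoop configuration object, and the choice to blank values rather than remove keys are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; not the code behind stripHiddenConfigurations().
class HiddenConfigSketch {

    // Returns a copy of conf in which every key named in the comma-separated
    // hidden list has its value blanked (assumption: blanking, not removal).
    static Map<String, String> stripHidden(Map<String, String> conf, String hiddenList) {
        Map<String, String> copy = new HashMap<>(conf);
        for (String key : hiddenList.split(",")) {
            String name = key.trim();
            if (copy.containsKey(name)) {
                copy.put(name, "");
            }
        }
        return copy;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put("fs.s3a.access.key", "EXAMPLE_ACCESS");   // placeholder value
        conf.put("fs.s3a.secret.key", "EXAMPLE_SECRET");   // placeholder value
        conf.put("mapreduce.job.name", "query-1");

        Map<String, String> job = stripHidden(conf, "fs.s3a.access.key,fs.s3a.secret.key");
        // The job copy keeps ordinary settings but no longer has usable S3 credentials.
        System.out.println("secret present: " + !job.get("fs.s3a.secret.key").isEmpty());
        System.out.println("job name: " + job.get("mapreduce.job.name"));
    }
}
```

With the S3 keys on the hidden list, the copy handed to the job no longer carries usable credentials, which matches the symptom described in the issue.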





[jira] [Commented] (HIVE-14848) S3 creds added to a hidden list by HIVE-14588 are not working on MR jobs

2018-03-04 Thread Franck Tago (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16385495#comment-16385495
 ] 

Franck Tago commented on HIVE-14848:


Have we decided to completely ignore this issue in Hive then? This is a 
major problem. We run jobs for our customers on their cluster, and other than 
changing the hidden list, I do not know of any other workaround.

What is the current plan for this issue?

> S3 creds added to a hidden list by HIVE-14588 are not working on MR jobs
> 
>
> Key: HIVE-14848
> URL: https://issues.apache.org/jira/browse/HIVE-14848
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 2.2.0
>Reporter: Sergio Peña
>Assignee: Sergio Peña
>Priority: Major
> Attachments: HIVE-14848.1.patch, HIVE-14848.1.patch
>
>
> When S3 credentials are included in hive-site.xml, MR jobs that need to 
> read data from S3 cannot use them because S3 values are stripped from the 
> Job configuration object before submitting the MR job.
> {noformat}
> @Override
> public void initialize(HiveConf conf, QueryPlan queryPlan, DriverContext driverContext) {
>   ...
>   conf.stripHiddenConfigurations(job);
>   this.jobExecHelper = new HadoopJobExecHelper(job, console, this, this);
> }
> {noformat}
> A nice-to-have (available in Hadoop 2.9.0) is the MR property 
> {{mapreduce.job.redacted-properties}}, which can be used to hide this list on 
> the MR side (such as in the history server UI) while allowing MR to run the job 
> without issues.
> UPDATE:
> Change the call to stripHiddenConfigurations() in 
> ql/exec/tez/DagUtils.createConfiguration(), because this is currently broken 
> when running the hive-blobstore suite against Tez.





[jira] [Commented] (HIVE-18856) param note error

2018-03-04 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16385487#comment-16385487
 ] 

Hive QA commented on HIVE-18856:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12912964/HIVE-18856.patch

{color:red}ERROR:{color} -1 due to build exiting with an error

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/9480/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/9480/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-9480/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hiveptest/working/scratch/source-prep.sh' failed with exit 
status 1 and output '+ date '+%Y-%m-%d %T.%3N'
2018-03-05 02:03:46.154
+ [[ -n /usr/lib/jvm/java-8-openjdk-amd64 ]]
+ export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ export 
PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ 
PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'MAVEN_OPTS=-Xmx1g '
+ MAVEN_OPTS='-Xmx1g '
+ cd /data/hiveptest/working/
+ tee /data/hiveptest/logs/PreCommit-HIVE-Build-9480/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ git = \s\v\n ]]
+ [[ git = \g\i\t ]]
+ [[ -z master ]]
+ [[ -d apache-github-source-source ]]
+ [[ ! -d apache-github-source-source/.git ]]
+ [[ ! -d apache-github-source-source ]]
+ date '+%Y-%m-%d %T.%3N'
2018-03-05 02:03:46.157
+ cd apache-github-source-source
+ git fetch origin
+ git reset --hard HEAD
HEAD is now at 05d4719 HIVE-18833: Auto Merge fails when "insert into directory 
as orcfile" (Daniel Dai, reviewed by Prasanth Jayachandran)
+ git clean -f -d
+ git checkout master
Already on 'master'
Your branch is up-to-date with 'origin/master'.
+ git reset --hard origin/master
HEAD is now at 05d4719 HIVE-18833: Auto Merge fails when "insert into directory 
as orcfile" (Daniel Dai, reviewed by Prasanth Jayachandran)
+ git merge --ff-only origin/master
Already up-to-date.
+ date '+%Y-%m-%d %T.%3N'
2018-03-05 02:03:50.085
+ rm -rf ../yetus_PreCommit-HIVE-Build-9480
+ mkdir ../yetus_PreCommit-HIVE-Build-9480
+ git gc
+ cp -R . ../yetus_PreCommit-HIVE-Build-9480
+ mkdir /data/hiveptest/logs/PreCommit-HIVE-Build-9480/yetus
+ patchCommandPath=/data/hiveptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hiveptest/working/scratch/build.patch
+ [[ -f /data/hiveptest/working/scratch/build.patch ]]
+ chmod +x /data/hiveptest/working/scratch/smart-apply-patch.sh
+ /data/hiveptest/working/scratch/smart-apply-patch.sh 
/data/hiveptest/working/scratch/build.patch
error: a/ql/src/java/org/apache/hadoop/hive/ql/log/PerfLogger.java: does not 
exist in index
error: ql/src/java/org/apache/hadoop/hive/ql/log/PerfLogger.java: does not 
exist in index
error: src/java/org/apache/hadoop/hive/ql/log/PerfLogger.java: does not exist 
in index
The patch does not appear to apply with p0, p1, or p2
+ exit 1
'
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12912964 - PreCommit-HIVE-Build

> param note error
> 
>
> Key: HIVE-18856
> URL: https://issues.apache.org/jira/browse/HIVE-18856
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 1.1.0
>Reporter: Yu Wang
>Assignee: Yu Wang
>Priority: Critical
> Fix For: 1.1.0
>
> Attachments: HIVE-18856.patch
>
>
> The comment on the PerfLogBegin method in the PerfLogger file contains an error.





[jira] [Updated] (HIVE-18856) param note error

2018-03-04 Thread Yu Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yu Wang updated HIVE-18856:
---
Status: Patch Available  (was: Open)

> param note error
> 
>
> Key: HIVE-18856
> URL: https://issues.apache.org/jira/browse/HIVE-18856
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 1.1.0
>Reporter: Yu Wang
>Assignee: Yu Wang
>Priority: Critical
> Fix For: 1.1.0
>
> Attachments: HIVE-18856.patch
>
>
> The comment on the PerfLogBegin method in the PerfLogger file contains an error.





[jira] [Updated] (HIVE-18856) param note error

2018-03-04 Thread Yu Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yu Wang updated HIVE-18856:
---
Attachment: HIVE-18856.patch

> param note error
> 
>
> Key: HIVE-18856
> URL: https://issues.apache.org/jira/browse/HIVE-18856
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 1.1.0
>Reporter: Yu Wang
>Assignee: Yu Wang
>Priority: Critical
> Fix For: 1.1.0
>
> Attachments: HIVE-18856.patch
>
>
> The comment on the PerfLogBegin method in the PerfLogger file contains an error.





[jira] [Assigned] (HIVE-18856) param note error

2018-03-04 Thread Yu Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yu Wang reassigned HIVE-18856:
--


> param note error
> 
>
> Key: HIVE-18856
> URL: https://issues.apache.org/jira/browse/HIVE-18856
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 1.1.0
>Reporter: Yu Wang
>Assignee: Yu Wang
>Priority: Critical
> Fix For: 1.1.0
>
>
> The comment on the PerfLogBegin method in the PerfLogger file contains an error.





[jira] [Commented] (HIVE-18746) add_months should validate the date first

2018-03-04 Thread Vineet Garg (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16385457#comment-16385457
 ] 

Vineet Garg commented on HIVE-18746:


Looks good to me. +1

> add_months should validate the date first
> -
>
> Key: HIVE-18746
> URL: https://issues.apache.org/jira/browse/HIVE-18746
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Subhasis Gorai
>Assignee: Kryvenko Igor
>Priority: Minor
> Attachments: HIVE-18746.1.patch, HIVE-18746.3.patch, 
> HIVE-18746.4.patch, HIVE-18746.5.patch, HIVE-18746.6.patch, 
> HIVE-18746.7.patch, HIVE-18746.patch
>
>
> hive (sbg_hvc_ods)> select add_months('2017-02-28', 1);
> OK
> _c0
> 2017-03-31
> Time taken: 0.107 seconds, Fetched: 1 row(s)
> hive (sbg_hvc_ods)> select add_months('2017-02-29', 1);
> OK
> _c0
> 2017-04-01
> Time taken: 0.084 seconds, Fetched: 1 row(s)
> hive (sbg_hvc_ods)>
>  
> '2017-02-29' is an invalid date.
>  
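
The second result above suggests that the invalid input was presumably rolled forward to a valid date before the month was added. As a hedged illustration of "validate the date first", the following Java sketch rejects impossible calendar dates with java.time strict resolution. The class StrictAddMonthsSketch and the method addMonthsValidated are hypothetical names, returning null for invalid input is an assumption, and the month arithmetic is plain java.time behaviour, so it does not reproduce Hive's end-of-month handling shown in the first query.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.time.format.ResolverStyle;

// Hypothetical helper; not the implementation of Hive's add_months UDF.
class StrictAddMonthsSketch {

    // "uuuu" is used instead of "yyyy" so that STRICT resolution works without an era field.
    private static final DateTimeFormatter STRICT_DATE =
        DateTimeFormatter.ofPattern("uuuu-MM-dd").withResolverStyle(ResolverStyle.STRICT);

    // Returns null for an invalid calendar date instead of silently rolling it forward.
    static LocalDate addMonthsValidated(String date, int months) {
        try {
            return LocalDate.parse(date, STRICT_DATE).plusMonths(months);
        } catch (DateTimeParseException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(addMonthsValidated("2017-02-28", 1)); // prints 2017-03-28
        System.out.println(addMonthsValidated("2017-02-29", 1)); // prints null
    }
}
```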





[jira] [Commented] (HIVE-18051) qfiles: dataset support

2018-03-04 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16385437#comment-16385437
 ] 

Hive QA commented on HIVE-18051:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12912957/HIVE-18051.12.patch

{color:green}SUCCESS:{color} +1 due to 4 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 22 failed/errored test(s), 13065 tests 
executed
*Failed tests:*
{noformat}
TestNegativeCliDriver - did not produce a TEST-*.xml file (likely timed out) 
(batchId=94)


[jira] [Commented] (HIVE-18051) qfiles: dataset support

2018-03-04 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16385431#comment-16385431
 ] 

Hive QA commented on HIVE-18051:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
1s{color} | {color:blue} Findbugs executables are not available. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
38s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  6m 
43s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  6m 
19s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  2m 
38s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  6m 
39s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m  
7s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  7m 
22s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  6m 
12s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  6m 
12s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  1m 
47s{color} | {color:red} root: The patch generated 7 new + 165 unchanged - 9 
fixed = 172 total (was 174) {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 
14s{color} | {color:red} itests/util: The patch generated 7 new + 160 unchanged 
- 9 fixed = 167 total (was 169) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  6m 
46s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red}  0m 
13s{color} | {color:red} The patch generated 49 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 46m 44s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 
3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/data/hiveptest/working/yetus_PreCommit-HIVE-Build-9479/dev-support/hive-personality.sh
 |
| git revision | master / 05d4719 |
| Default Java | 1.8.0_111 |
| checkstyle | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-9479/yetus/diff-checkstyle-root.txt
 |
| checkstyle | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-9479/yetus/diff-checkstyle-itests_util.txt
 |
| asflicense | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-9479/yetus/patch-asflicense-problems.txt
 |
| modules | C: . itests/util ql U: . |
| Console output | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-9479/yetus.txt |
| Powered by | Apache Yetus http://yetus.apache.org |


This message was automatically generated.



> qfiles: dataset support
> ---
>
> Key: HIVE-18051
> URL: https://issues.apache.org/jira/browse/HIVE-18051
> Project: Hive
>  Issue Type: Improvement
>  Components: Testing Infrastructure
>Reporter: Zoltan Haindrich
>Assignee: Laszlo Bodor
>Priority: Major
> Attachments: HIVE-18051.01.patch, HIVE-18051.02.patch, 
> HIVE-18051.03.patch, HIVE-18051.04.patch, HIVE-18051.05.patch, 
> HIVE-18051.06.patch, HIVE-18051.07.patch, HIVE-18051.08.patch, 
> HIVE-18051.09.patch, HIVE-18051.10.patch, HIVE-18051.11.patch, 
> HIVE-18051.12.patch
>
>
> It would be great to have some kind of test dataset support. Currently there 
> is the {{q_test_init.sql}}, which is quite large, and I often override it 
> with an invalid string because I write independent qtests most of the time, 
> and the load of {{src}} and other tables is just a waste of time for me; 
> not to mention that the loading of 

[jira] [Commented] (HIVE-18743) CREATE TABLE on S3 data can be extremely slow. DO_NOT_UPDATE_STATS workaround is buggy.

2018-03-04 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16385415#comment-16385415
 ] 

Hive QA commented on HIVE-18743:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12912949/HIVE-18743.07.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 19 failed/errored test(s), 13062 tests 
executed
*Failed tests:*
{noformat}
TestNegativeCliDriver - did not produce a TEST-*.xml file (likely timed out) 
(batchId=93)


[jira] [Updated] (HIVE-18051) qfiles: dataset support

2018-03-04 Thread Laszlo Bodor (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Laszlo Bodor updated HIVE-18051:

Attachment: HIVE-18051.12.patch

> qfiles: dataset support
> ---
>
> Key: HIVE-18051
> URL: https://issues.apache.org/jira/browse/HIVE-18051
> Project: Hive
>  Issue Type: Improvement
>  Components: Testing Infrastructure
>Reporter: Zoltan Haindrich
>Assignee: Laszlo Bodor
>Priority: Major
> Attachments: HIVE-18051.01.patch, HIVE-18051.02.patch, 
> HIVE-18051.03.patch, HIVE-18051.04.patch, HIVE-18051.05.patch, 
> HIVE-18051.06.patch, HIVE-18051.07.patch, HIVE-18051.08.patch, 
> HIVE-18051.09.patch, HIVE-18051.10.patch, HIVE-18051.11.patch, 
> HIVE-18051.12.patch
>
>
> It would be great to have some kind of test dataset support. Currently there 
> is the {{q_test_init.sql}}, which is quite large, and I often override it 
> with an invalid string because I write independent qtests most of the time, 
> and the load of {{src}} and other tables is just a waste of time for me; 
> not to mention that the loading of those tables may also trigger breakpoints, 
> which is a bit annoying.
> Most of the tests use "only" the {{src}} table and possibly 2 others; 
> however, the main init script contains a bunch of tables. Meanwhile, there are 
> quite a few other tests which could also benefit from a more general 
> feature; for example, the creation of {{bucket_small}} is present in 20 q 
> files.
> The proposal would be to enable the qfiles to be annotated with metadata like 
> datasets:
> {code}
> --! qt:dataset:src,bucket_small
> {code}
> proposal for storing a dataset:
> * the loader script would be at: {{data/datasets/__NAME__/load.hive.sql}}
> * the table data could be stored under that location
> a draft about this; and other qfiles related ideas:
> https://docs.google.com/document/d/1KtcIx8ggL9LxDintFuJo8NQuvNWkmtvv_ekbWrTLNGc/edit?usp=sharing
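
As a rough illustration of the proposal above, the following Java sketch reads the dataset names from an annotated qfile line and maps each name to the proposed loader script location. The class QTestDatasetSketch and both method names are hypothetical, and the parsing rules (trimming, comma splitting, ignoring empty names) are assumptions; only the annotation prefix and the path pattern come from the description.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the proposed annotation handling; not code from the patch.
class QTestDatasetSketch {

    private static final String PREFIX = "--! qt:dataset:";

    // Returns the dataset names listed on an annotation line, or an empty list
    // if the line is not a dataset annotation.
    static List<String> datasetsOf(String line) {
        List<String> names = new ArrayList<>();
        String trimmed = line.trim();
        if (!trimmed.startsWith(PREFIX)) {
            return names;
        }
        for (String name : trimmed.substring(PREFIX.length()).split(",")) {
            if (!name.trim().isEmpty()) {
                names.add(name.trim());
            }
        }
        return names;
    }

    // Substitutes the dataset name for __NAME__ in the proposed loader path.
    static String loaderScriptFor(String dataset) {
        return "data/datasets/" + dataset + "/load.hive.sql";
    }

    public static void main(String[] args) {
        for (String name : datasetsOf("--! qt:dataset:src,bucket_small")) {
            System.out.println(name + " -> " + loaderScriptFor(name));
        }
    }
}
```

A qfile without the annotation yields an empty list, so under this sketch no tables would be loaded for it.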





[jira] [Commented] (HIVE-18743) CREATE TABLE on S3 data can be extremely slow. DO_NOT_UPDATE_STATS workaround is buggy.

2018-03-04 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16385397#comment-16385397
 ] 

Hive QA commented on HIVE-18743:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
1s{color} | {color:blue} Findbugs executables are not available. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  7m 
18s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
35s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
20s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
45s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
44s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
35s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
35s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 
19s{color} | {color:red} standalone-metastore: The patch generated 5 new + 505 
unchanged - 10 fixed = 510 total (was 515) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
46s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red}  0m 
14s{color} | {color:red} The patch generated 49 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 11m 54s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 
3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/data/hiveptest/working/yetus_PreCommit-HIVE-Build-9478/dev-support/hive-personality.sh
 |
| git revision | master / 05d4719 |
| Default Java | 1.8.0_111 |
| checkstyle | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-9478/yetus/diff-checkstyle-standalone-metastore.txt
 |
| asflicense | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-9478/yetus/patch-asflicense-problems.txt
 |
| modules | C: standalone-metastore U: standalone-metastore |
| Console output | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-9478/yetus.txt |
| Powered by | Apache Yetus http://yetus.apache.org |


This message was automatically generated.



> CREATE TABLE on S3 data can be extremely slow. DO_NOT_UPDATE_STATS workaround 
> is buggy.
> ---
>
> Key: HIVE-18743
> URL: https://issues.apache.org/jira/browse/HIVE-18743
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Affects Versions: 1.2.0, 1.1.0, 2.0.2, 3.0.0
>Reporter: Alexander Behm
>Assignee: Alexander Kolbasov
>Priority: Major
> Attachments: HIVE-18743.06.patch, HIVE-18743.07.patch
>
>
> When hive.stats.autogather=true then the Metastore lists all files under the 
> table directory to populate basic stats like file counts and sizes. This file 
> listing operation can be very expensive particularly on filesystems like S3.
> One way to address this issue is to reconfigure hive.stats.autogather=false.
> *Here's the bug*
> It is my understanding that the DO_NOT_UPDATE_STATS table property is 
> intended to selectively prevent this stats collection. Unfortunately, this 
> table property is checked *after* the expensive file listing operation, so 
> the DO_NOT_UPDATE_STATS does not seem to work as intended. See:
> https://github.com/apache/hive/blob/master/standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/utils/MetaStoreUtils.java#L633
> Relevant code snippet:
> {code}
>   public static boolean updateTableStatsFast(Database db, Table tbl, 
> Warehouse wh,
>  boolean 

[jira] [Commented] (HIVE-18743) CREATE TABLE on S3 data can be extremely slow. DO_NOT_UPDATE_STATS workaround is buggy.

2018-03-04 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16385389#comment-16385389
 ] 

Hive QA commented on HIVE-18743:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12912949/HIVE-18743.07.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 21 failed/errored test(s), 13061 tests 
executed
*Failed tests:*
{noformat}
TestNegativeCliDriver - did not produce a TEST-*.xml file (likely timed out) 
(batchId=93)


[jira] [Commented] (HIVE-18743) CREATE TABLE on S3 data can be extremely slow. DO_NOT_UPDATE_STATS workaround is buggy.

2018-03-04 Thread Alexander Kolbasov (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16385374#comment-16385374
 ] 

Alexander Kolbasov commented on HIVE-18743:
---

[~kgyrtkirk] What is the value of a high-value qtest? The unit test allows me to 
control the execution environment of the function exactly, and it gives me an 
opportunity to verify whether warehouse ops are called or not. What extra value 
would we get from a qtest that we don't get from the unit test?

> CREATE TABLE on S3 data can be extremely slow. DO_NOT_UPDATE_STATS workaround 
> is buggy.
> ---
>
> Key: HIVE-18743
> URL: https://issues.apache.org/jira/browse/HIVE-18743
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Affects Versions: 1.2.0, 1.1.0, 2.0.2, 3.0.0
>Reporter: Alexander Behm
>Assignee: Alexander Kolbasov
>Priority: Major
> Attachments: HIVE-18743.06.patch, HIVE-18743.07.patch
>
>
> When hive.stats.autogather=true then the Metastore lists all files under the 
> table directory to populate basic stats like file counts and sizes. This file 
> listing operation can be very expensive particularly on filesystems like S3.
> One way to address this issue is to reconfigure hive.stats.autogather=false.
> *Here's the bug*
> It is my understanding that the DO_NOT_UPDATE_STATS table property is 
> intended to selectively prevent this stats collection. Unfortunately, this 
> table property is checked *after* the expensive file listing operation, so 
> the DO_NOT_UPDATE_STATS does not seem to work as intended. See:
> https://github.com/apache/hive/blob/master/standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/utils/MetaStoreUtils.java#L633
> Relevant code snippet:
> {code}
>   public static boolean updateTableStatsFast(Database db, Table tbl, Warehouse wh,
>       boolean madeDir, boolean forceRecompute, EnvironmentContext environmentContext)
>       throws MetaException {
>     if (tbl.getPartitionKeysSize() == 0) {
>       // Update stats only when unpartitioned
>       FileStatus[] fileStatuses = wh.getFileStatusesForUnpartitionedTable(db, tbl);
>       // <--- DO_NOT_UPDATE_STATS is checked inside the call below, after
>       // wh.getFileStatusesForUnpartitionedTable() has already been called
>       return updateTableStatsFast(tbl, fileStatuses, madeDir, forceRecompute, environmentContext);
>     } else {
>       return false;
>     }
>   }
> {code}
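
One way to picture the ordering problem described above is a guard that consults the table property before any file listing happens. The Java sketch below is not the attached patch: the class StatsGuardSketch, the Map standing in for table parameters, and the Supplier standing in for the warehouse listing call are all hypothetical, and treating only the string "true" as opting out is an assumption.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical sketch of checking the property before the expensive listing.
class StatsGuardSketch {

    static final String DO_NOT_UPDATE_STATS = "DO_NOT_UPDATE_STATS";

    // Returns true if basic stats were recomputed. The listing supplier is only
    // invoked when the table has not opted out of stats updates.
    static boolean updateTableStatsFast(Map<String, String> tableParams,
                                        Supplier<long[]> listFileSizes) {
        if (Boolean.parseBoolean(tableParams.get(DO_NOT_UPDATE_STATS))) {
            return false; // skip before touching the filesystem
        }
        long[] sizes = listFileSizes.get(); // stands in for the expensive listing
        long total = 0;
        for (long size : sizes) {
            total += size;
        }
        tableParams.put("numFiles", String.valueOf(sizes.length));
        tableParams.put("totalSize", String.valueOf(total));
        return true;
    }

    public static void main(String[] args) {
        Map<String, String> params = new HashMap<>();
        params.put(DO_NOT_UPDATE_STATS, "true");
        boolean updated = updateTableStatsFast(params, () -> {
            throw new IllegalStateException("listing should not run");
        });
        System.out.println("updated: " + updated); // prints updated: false
    }
}
```

Passing the listing as a callback also makes it straightforward for a unit test to verify whether the warehouse operation was invoked at all.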





[jira] [Commented] (HIVE-18743) CREATE TABLE on S3 data can be extremely slow. DO_NOT_UPDATE_STATS workaround is buggy.

2018-03-04 Thread Alexander Kolbasov (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16385370#comment-16385370
 ] 

Alexander Kolbasov commented on HIVE-18743:
---

[~kgyrtkirk] So the assumption here is that the value can be not just a 
"true"/"false" string but an actual JSON object in which case it is parsed and 
{{stats.basicStats = true}} just overwrites one property?

> CREATE TABLE on S3 data can be extremely slow. DO_NOT_UPDATE_STATS workaround 
> is buggy.
> ---
>
> Key: HIVE-18743
> URL: https://issues.apache.org/jira/browse/HIVE-18743
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Affects Versions: 1.2.0, 1.1.0, 2.0.2, 3.0.0
>Reporter: Alexander Behm
>Assignee: Alexander Kolbasov
>Priority: Major
> Attachments: HIVE-18743.06.patch, HIVE-18743.07.patch
>
>
> When hive.stats.autogather=true then the Metastore lists all files under the 
> table directory to populate basic stats like file counts and sizes. This file 
> listing operation can be very expensive particularly on filesystems like S3.
> One way to address this issue is to reconfigure hive.stats.autogather=false.
> *Here's the bug*
> It is my understanding that the DO_NOT_UPDATE_STATS table property is 
> intended to selectively prevent this stats collection. Unfortunately, this 
> table property is checked *after* the expensive file listing operation, so 
> the DO_NOT_UPDATE_STATS does not seem to work as intended. See:
> https://github.com/apache/hive/blob/master/standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/utils/MetaStoreUtils.java#L633
> Relevant code snippet:
> {code}
>   public static boolean updateTableStatsFast(Database db, Table tbl, Warehouse wh,
>       boolean madeDir, boolean forceRecompute, EnvironmentContext environmentContext)
>       throws MetaException {
>     if (tbl.getPartitionKeysSize() == 0) {
>       // Update stats only when unpartitioned
>       FileStatus[] fileStatuses = wh.getFileStatusesForUnpartitionedTable(db, tbl);
>       // <--- DO_NOT_UPDATE_STATS is checked inside the call below, after
>       // wh.getFileStatusesForUnpartitionedTable() has already been called
>       return updateTableStatsFast(tbl, fileStatuses, madeDir, forceRecompute, environmentContext);
>     } else {
>       return false;
>     }
>   }
> {code}





[jira] [Commented] (HIVE-18743) CREATE TABLE on S3 data can be extremely slow. DO_NOT_UPDATE_STATS workaround is buggy.

2018-03-04 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16385340#comment-16385340
 ] 

Hive QA commented on HIVE-18743:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Findbugs executables are not available. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  7m 28s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 36s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 19s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 46s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 45s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 35s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 35s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 20s{color} | {color:red} standalone-metastore: The patch generated 5 new + 505 unchanged - 10 fixed = 510 total (was 515) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 44s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red}  0m 13s{color} | {color:red} The patch generated 49 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 11m 59s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /data/hiveptest/working/yetus_PreCommit-HIVE-Build-9477/dev-support/hive-personality.sh |
| git revision | master / 05d4719 |
| Default Java | 1.8.0_111 |
| checkstyle | http://104.198.109.242/logs//PreCommit-HIVE-Build-9477/yetus/diff-checkstyle-standalone-metastore.txt |
| asflicense | http://104.198.109.242/logs//PreCommit-HIVE-Build-9477/yetus/patch-asflicense-problems.txt |
| modules | C: standalone-metastore U: standalone-metastore |
| Console output | http://104.198.109.242/logs//PreCommit-HIVE-Build-9477/yetus.txt |
| Powered by | Apache Yetus http://yetus.apache.org |


This message was automatically generated.



> CREATE TABLE on S3 data can be extremely slow. DO_NOT_UPDATE_STATS workaround 
> is buggy.
> ---
>
> Key: HIVE-18743
> URL: https://issues.apache.org/jira/browse/HIVE-18743
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Affects Versions: 1.2.0, 1.1.0, 2.0.2, 3.0.0
>Reporter: Alexander Behm
>Assignee: Alexander Kolbasov
>Priority: Major
> Attachments: HIVE-18743.06.patch, HIVE-18743.07.patch
>
>
> When hive.stats.autogather=true then the Metastore lists all files under the 
> table directory to populate basic stats like file counts and sizes. This file 
> listing operation can be very expensive particularly on filesystems like S3.
> One way to address this issue is to reconfigure hive.stats.autogather=false.
> *Here's the bug*
> It is my understanding that the DO_NOT_UPDATE_STATS table property is 
> intended to selectively prevent this stats collection. Unfortunately, this 
> table property is checked *after* the expensive file listing operation, so 
> the DO_NOT_UPDATE_STATS does not seem to work as intended. See:
> https://github.com/apache/hive/blob/master/standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/utils/MetaStoreUtils.java#L633
> Relevant code snippet:
> {code}
>   public static boolean updateTableStatsFast(Database db, Table tbl, 
> Warehouse wh,
>  boolean 

[jira] [Commented] (HIVE-18768) Use Datanucleus to serialize notification updates

2018-03-04 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16385328#comment-16385328
 ] 

Hive QA commented on HIVE-18768:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12912945/HIVE-18768.02.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 137 failed/errored test(s), 13059 tests 
executed
*Failed tests:*
{noformat}
TestNegativeCliDriver - did not produce a TEST-*.xml file (likely timed out) 
(batchId=93)


[jira] [Commented] (HIVE-18743) CREATE TABLE on S3 data can be extremely slow. DO_NOT_UPDATE_STATS workaround is buggy.

2018-03-04 Thread Zoltan Haindrich (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16385296#comment-16385296
 ] 

Zoltan Haindrich commented on HIVE-18743:
-

I don't think so... you left out the other parts of that code: 
https://github.com/apache/hive/blob/05d4719eefc56676a3e0e8f706e1c5e5e1f6b345/standalone-metastore/src/main/java/org/apache/hadoop/hive/common/StatsSetupConst.java#L232

[~akolb] Could you please add a high-level qtest? The test case from 
testmetastore will also be removed...


> CREATE TABLE on S3 data can be extremely slow. DO_NOT_UPDATE_STATS workaround 
> is buggy.
> ---
>
> Key: HIVE-18743
> URL: https://issues.apache.org/jira/browse/HIVE-18743
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Affects Versions: 1.2.0, 1.1.0, 2.0.2, 3.0.0
>Reporter: Alexander Behm
>Assignee: Alexander Kolbasov
>Priority: Major
> Attachments: HIVE-18743.06.patch, HIVE-18743.07.patch
>
>
> When hive.stats.autogather=true then the Metastore lists all files under the 
> table directory to populate basic stats like file counts and sizes. This file 
> listing operation can be very expensive particularly on filesystems like S3.
> One way to address this issue is to reconfigure hive.stats.autogather=false.
> *Here's the bug*
> It is my understanding that the DO_NOT_UPDATE_STATS table property is 
> intended to selectively prevent this stats collection. Unfortunately, this 
> table property is checked *after* the expensive file listing operation, so 
> the DO_NOT_UPDATE_STATS does not seem to work as intended. See:
> https://github.com/apache/hive/blob/master/standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/utils/MetaStoreUtils.java#L633
> Relevant code snippet:
> {code}
>   public static boolean updateTableStatsFast(Database db, Table tbl, 
> Warehouse wh,
>  boolean madeDir, boolean 
> forceRecompute, EnvironmentContext environmentContext) throws MetaException {
> if (tbl.getPartitionKeysSize() == 0) {
>   // Update stats only when unpartitioned
>   FileStatus[] fileStatuses = wh.getFileStatusesForUnpartitionedTable(db, 
> tbl);
>   return updateTableStatsFast(tbl, fileStatuses, madeDir, forceRecompute, 
> environmentContext); <--- DO_NOT_UPDATE_STATS is checked in here after 
> wh.getFileStatusesForUnpartitionedTable() has already been called
> } else {
>   return false;
> }
>   }
> {code}
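
The ordering problem described above can be illustrated with a small sketch. 
This is not the attached patch and not Hive's MetaStoreUtils: the class name, 
the listFileSizes() stand-in for wh.getFileStatusesForUnpartitionedTable(), 
the simplified signature, and the stat keys are all assumptions made for 
illustration. It only shows the idea of checking DO_NOT_UPDATE_STATS before 
the expensive listing runs.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for illustration only; this is not Hive's MetaStoreUtils.
final class DoNotUpdateStatsSketch {
  // Property name as quoted in the issue description.
  static final String DO_NOT_UPDATE_STATS = "DO_NOT_UPDATE_STATS";

  // Counts simulated file listings so a skipped listing is observable.
  static int listingCalls = 0;

  // Stand-in for the expensive listing (assumption; the real call in the
  // quoted snippet is wh.getFileStatusesForUnpartitionedTable(db, tbl)).
  // Returns file sizes in bytes.
  static long[] listFileSizes() {
    listingCalls++;
    return new long[] {10L, 20L};
  }

  // Returns true only if stats were recomputed and written into tableParams.
  static boolean updateTableStatsFast(Map<String, String> tableParams, boolean partitioned) {
    if (partitioned) {
      return false; // mirrors the quoted snippet: only unpartitioned tables are handled
    }
    // Check the opt-out flag BEFORE touching the filesystem.
    if (Boolean.parseBoolean(tableParams.get(DO_NOT_UPDATE_STATS))) {
      return false;
    }
    long[] sizes = listFileSizes();
    long total = 0L;
    for (long size : sizes) {
      total += size;
    }
    // Illustrative stat keys standing for "file counts and sizes".
    tableParams.put("numFiles", Integer.toString(sizes.length));
    tableParams.put("totalSize", Long.toString(total));
    return true;
  }

  public static void main(String[] args) {
    Map<String, String> optedOut = new HashMap<>();
    optedOut.put(DO_NOT_UPDATE_STATS, "true");
    System.out.println(updateTableStatsFast(optedOut, false) + " listings=" + listingCalls);

    Map<String, String> normal = new HashMap<>();
    System.out.println(updateTableStatsFast(normal, false) + " listings=" + listingCalls);
  }
}
```

With the flag set, the sketch returns before listFileSizes() is ever called, 
which is the behaviour the description says the table property was meant to 
provide.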



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-18743) CREATE TABLE on S3 data can be extremely slow. DO_NOT_UPDATE_STATS workaround is buggy.

2018-03-04 Thread Alexander Kolbasov (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16385293#comment-16385293
 ] 

Alexander Kolbasov commented on HIVE-18743:
---

I noticed a bit of odd code:
{code:java}
public static void setBasicStatsState(Map<String, String> params, String setting) {
  ...
  ColumnStatsAccurate stats = parseStatsAcc(params.get(COLUMN_STATS_ACCURATE));
  stats.basicStats = true;
}
{code}
So it parses the value of {{COLUMN_STATS_ACCURATE}} but then always ignores it 
and sets {{stats.basicStats}} to true anyway. Is this intentional? Can this be 
removed?

> CREATE TABLE on S3 data can be extremely slow. DO_NOT_UPDATE_STATS workaround 
> is buggy.
> ---
>
> Key: HIVE-18743
> URL: https://issues.apache.org/jira/browse/HIVE-18743
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Affects Versions: 1.2.0, 1.1.0, 2.0.2, 3.0.0
>Reporter: Alexander Behm
>Assignee: Alexander Kolbasov
>Priority: Major
> Attachments: HIVE-18743.06.patch, HIVE-18743.07.patch
>
>
> When hive.stats.autogather=true then the Metastore lists all files under the 
> table directory to populate basic stats like file counts and sizes. This file 
> listing operation can be very expensive particularly on filesystems like S3.
> One way to address this issue is to reconfigure hive.stats.autogather=false.
> *Here's the bug*
> It is my understanding that the DO_NOT_UPDATE_STATS table property is 
> intended to selectively prevent this stats collection. Unfortunately, this 
> table property is checked *after* the expensive file listing operation, so 
> the DO_NOT_UPDATE_STATS does not seem to work as intended. See:
> https://github.com/apache/hive/blob/master/standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/utils/MetaStoreUtils.java#L633
> Relevant code snippet:
> {code}
>   public static boolean updateTableStatsFast(Database db, Table tbl, 
> Warehouse wh,
>  boolean madeDir, boolean 
> forceRecompute, EnvironmentContext environmentContext) throws MetaException {
> if (tbl.getPartitionKeysSize() == 0) {
>   // Update stats only when unpartitioned
>   FileStatus[] fileStatuses = wh.getFileStatusesForUnpartitionedTable(db, 
> tbl);
>   return updateTableStatsFast(tbl, fileStatuses, madeDir, forceRecompute, 
> environmentContext); <--- DO_NOT_UPDATE_STATS is checked in here after 
> wh.getFileStatusesForUnpartitionedTable() has already been called
> } else {
>   return false;
> }
>   }
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-18743) CREATE TABLE on S3 data can be extremely slow. DO_NOT_UPDATE_STATS workaround is buggy.

2018-03-04 Thread Alexander Kolbasov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Kolbasov updated HIVE-18743:
--
Attachment: HIVE-18743.07.patch

> CREATE TABLE on S3 data can be extremely slow. DO_NOT_UPDATE_STATS workaround 
> is buggy.
> ---
>
> Key: HIVE-18743
> URL: https://issues.apache.org/jira/browse/HIVE-18743
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Affects Versions: 1.2.0, 1.1.0, 2.0.2, 3.0.0
>Reporter: Alexander Behm
>Assignee: Alexander Kolbasov
>Priority: Major
> Attachments: HIVE-18743.06.patch, HIVE-18743.07.patch
>
>
> When hive.stats.autogather=true then the Metastore lists all files under the 
> table directory to populate basic stats like file counts and sizes. This file 
> listing operation can be very expensive particularly on filesystems like S3.
> One way to address this issue is to reconfigure hive.stats.autogather=false.
> *Here's the bug*
> It is my understanding that the DO_NOT_UPDATE_STATS table property is 
> intended to selectively prevent this stats collection. Unfortunately, this 
> table property is checked *after* the expensive file listing operation, so 
> the DO_NOT_UPDATE_STATS does not seem to work as intended. See:
> https://github.com/apache/hive/blob/master/standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/utils/MetaStoreUtils.java#L633
> Relevant code snippet:
> {code}
>   public static boolean updateTableStatsFast(Database db, Table tbl, 
> Warehouse wh,
>  boolean madeDir, boolean 
> forceRecompute, EnvironmentContext environmentContext) throws MetaException {
> if (tbl.getPartitionKeysSize() == 0) {
>   // Update stats only when unpartitioned
>   FileStatus[] fileStatuses = wh.getFileStatusesForUnpartitionedTable(db, 
> tbl);
>   return updateTableStatsFast(tbl, fileStatuses, madeDir, forceRecompute, 
> environmentContext); <--- DO_NOT_UPDATE_STATS is checked in here after 
> wh.getFileStatusesForUnpartitionedTable() has already been called
> } else {
>   return false;
> }
>   }
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-15161) migrate ColumnStats to use jackson

2018-03-04 Thread Alexander Kolbasov (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16385292#comment-16385292
 ] 

Alexander Kolbasov commented on HIVE-15161:
---

[~kgyrtkirk] I noticed a bit of odd code:
{code:java}
public static void setBasicStatsState(Map<String, String> params, String setting) {
  ...
  ColumnStatsAccurate stats = parseStatsAcc(params.get(COLUMN_STATS_ACCURATE));
  stats.basicStats = true;
}
{code}
So it parses the value of {{COLUMN_STATS_ACCURATE}} but then always ignores it 
and sets {{stats.basicStats}} to true anyway. Is this intentional? Can this be 
removed?

> migrate ColumnStats to use jackson
> --
>
> Key: HIVE-15161
> URL: https://issues.apache.org/jira/browse/HIVE-15161
> Project: Hive
>  Issue Type: Sub-task
>Affects Versions: 2.0.0, 2.1.0
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
> Fix For: 2.3.0
>
> Attachments: HIVE-15161.1.patch, HIVE-15161.2.patch, 
> HIVE-15161.3.patch, HIVE-15161.4.patch, HIVE-15161.4.patch, 
> HIVE-15161.5.patch, HIVE-15161.5.patch, HIVE-15161.6.patch
>
>
> * json.org has license issues
> * jackson can provide a fully compatible alternative to it
> * there are a few flakiness issues caused by the order of the map entries of 
> the columns...this can be addressed, org.json api was unfriendly in this 
> manner ;)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-18768) Use Datanucleus to serialize notification updates

2018-03-04 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16385287#comment-16385287
 ] 

Hive QA commented on HIVE-18768:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  0s{color} | {color:blue} Findbugs executables are not available. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  7m 15s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 38s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 19s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 44s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 43s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 38s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 38s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 19s{color} | {color:red} standalone-metastore: The patch generated 7 new + 389 unchanged - 1 fixed = 396 total (was 390) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 47s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red}  0m 13s{color} | {color:red} The patch generated 49 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 11m 52s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /data/hiveptest/working/yetus_PreCommit-HIVE-Build-9476/dev-support/hive-personality.sh |
| git revision | master / 05d4719 |
| Default Java | 1.8.0_111 |
| checkstyle | http://104.198.109.242/logs//PreCommit-HIVE-Build-9476/yetus/diff-checkstyle-standalone-metastore.txt |
| asflicense | http://104.198.109.242/logs//PreCommit-HIVE-Build-9476/yetus/patch-asflicense-problems.txt |
| modules | C: standalone-metastore U: standalone-metastore |
| Console output | http://104.198.109.242/logs//PreCommit-HIVE-Build-9476/yetus.txt |
| Powered by | Apache Yetus http://yetus.apache.org |


This message was automatically generated.



> Use Datanucleus to serialize notification updates
> -
>
> Key: HIVE-18768
> URL: https://issues.apache.org/jira/browse/HIVE-18768
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 2.0.2, 3.0.0
>Reporter: Alexander Kolbasov
>Assignee: Alexander Kolbasov
>Priority: Major
> Attachments: HIVE-18768.01.patch, HIVE-18768.02.patch
>
>
> HIVE-16886 added code to serialize notification updates using LOCK FOR 
> UPDATE. It turns out that there is a simpler way - see HIVE-18526. The goal 
> of this JIRA is to use the approach from HIVE-18526 - Datanucleus based 
> solution.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-18768) Use Datanucleus to serialize notification updates

2018-03-04 Thread Alexander Kolbasov (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16385276#comment-16385276
 ] 

Alexander Kolbasov commented on HIVE-18768:
---

Attaching the same patch to trigger rebuild.

> Use Datanucleus to serialize notification updates
> -
>
> Key: HIVE-18768
> URL: https://issues.apache.org/jira/browse/HIVE-18768
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 2.0.2, 3.0.0
>Reporter: Alexander Kolbasov
>Assignee: Alexander Kolbasov
>Priority: Major
> Attachments: HIVE-18768.01.patch, HIVE-18768.02.patch
>
>
> HIVE-16886 added code to serialize notification updates using LOCK FOR 
> UPDATE. It turns out that there is a simpler way - see HIVE-18526. The goal 
> of this JIRA is to use the approach from HIVE-18526 - Datanucleus based 
> solution.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-18768) Use Datanucleus to serialize notification updates

2018-03-04 Thread Alexander Kolbasov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Kolbasov updated HIVE-18768:
--
Attachment: HIVE-18768.02.patch

> Use Datanucleus to serialize notification updates
> -
>
> Key: HIVE-18768
> URL: https://issues.apache.org/jira/browse/HIVE-18768
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 2.0.2, 3.0.0
>Reporter: Alexander Kolbasov
>Assignee: Alexander Kolbasov
>Priority: Major
> Attachments: HIVE-18768.01.patch, HIVE-18768.02.patch
>
>
> HIVE-16886 added code to serialize notification updates using LOCK FOR 
> UPDATE. It turns out that there is a simpler way - see HIVE-18526. The goal 
> of this JIRA is to use the approach from HIVE-18526 - Datanucleus based 
> solution.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-18743) CREATE TABLE on S3 data can be extremely slow. DO_NOT_UPDATE_STATS workaround is buggy.

2018-03-04 Thread Alexander Kolbasov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Kolbasov updated HIVE-18743:
--
Status: Patch Available  (was: Open)

> CREATE TABLE on S3 data can be extremely slow. DO_NOT_UPDATE_STATS workaround 
> is buggy.
> ---
>
> Key: HIVE-18743
> URL: https://issues.apache.org/jira/browse/HIVE-18743
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Affects Versions: 1.1.0, 1.2.0, 2.0.2, 3.0.0
>Reporter: Alexander Behm
>Assignee: Alexander Kolbasov
>Priority: Major
> Attachments: HIVE-18743.06.patch
>
>
> When hive.stats.autogather=true then the Metastore lists all files under the 
> table directory to populate basic stats like file counts and sizes. This file 
> listing operation can be very expensive particularly on filesystems like S3.
> One way to address this issue is to reconfigure hive.stats.autogather=false.
> *Here's the bug*
> It is my understanding that the DO_NOT_UPDATE_STATS table property is 
> intended to selectively prevent this stats collection. Unfortunately, this 
> table property is checked *after* the expensive file listing operation, so 
> the DO_NOT_UPDATE_STATS does not seem to work as intended. See:
> https://github.com/apache/hive/blob/master/standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/utils/MetaStoreUtils.java#L633
> Relevant code snippet:
> {code}
>   public static boolean updateTableStatsFast(Database db, Table tbl, 
> Warehouse wh,
>  boolean madeDir, boolean 
> forceRecompute, EnvironmentContext environmentContext) throws MetaException {
> if (tbl.getPartitionKeysSize() == 0) {
>   // Update stats only when unpartitioned
>   FileStatus[] fileStatuses = wh.getFileStatusesForUnpartitionedTable(db, 
> tbl);
>   return updateTableStatsFast(tbl, fileStatuses, madeDir, forceRecompute, 
> environmentContext); <--- DO_NOT_UPDATE_STATS is checked in here after 
> wh.getFileStatusesForUnpartitionedTable() has already been called
> } else {
>   return false;
> }
>   }
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-18743) CREATE TABLE on S3 data can be extremely slow. DO_NOT_UPDATE_STATS workaround is buggy.

2018-03-04 Thread Alexander Kolbasov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Kolbasov updated HIVE-18743:
--
Attachment: HIVE-18743.06.patch

> CREATE TABLE on S3 data can be extremely slow. DO_NOT_UPDATE_STATS workaround 
> is buggy.
> ---
>
> Key: HIVE-18743
> URL: https://issues.apache.org/jira/browse/HIVE-18743
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Affects Versions: 1.2.0, 1.1.0, 2.0.2, 3.0.0
>Reporter: Alexander Behm
>Assignee: Alexander Kolbasov
>Priority: Major
> Attachments: HIVE-18743.06.patch
>
>
> When hive.stats.autogather=true then the Metastore lists all files under the 
> table directory to populate basic stats like file counts and sizes. This file 
> listing operation can be very expensive particularly on filesystems like S3.
> One way to address this issue is to reconfigure hive.stats.autogather=false.
> *Here's the bug*
> It is my understanding that the DO_NOT_UPDATE_STATS table property is 
> intended to selectively prevent this stats collection. Unfortunately, this 
> table property is checked *after* the expensive file listing operation, so 
> the DO_NOT_UPDATE_STATS does not seem to work as intended. See:
> https://github.com/apache/hive/blob/master/standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/utils/MetaStoreUtils.java#L633
> Relevant code snippet:
> {code}
>   public static boolean updateTableStatsFast(Database db, Table tbl, 
> Warehouse wh,
>  boolean madeDir, boolean 
> forceRecompute, EnvironmentContext environmentContext) throws MetaException {
> if (tbl.getPartitionKeysSize() == 0) {
>   // Update stats only when unpartitioned
>   FileStatus[] fileStatuses = wh.getFileStatusesForUnpartitionedTable(db, 
> tbl);
>   return updateTableStatsFast(tbl, fileStatuses, madeDir, forceRecompute, 
> environmentContext); <--- DO_NOT_UPDATE_STATS is checked in here after 
> wh.getFileStatusesForUnpartitionedTable() has already been called
> } else {
>   return false;
> }
>   }
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-18743) CREATE TABLE on S3 data can be extremely slow. DO_NOT_UPDATE_STATS workaround is buggy.

2018-03-04 Thread Alexander Kolbasov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Kolbasov updated HIVE-18743:
--
Status: Open  (was: Patch Available)

> CREATE TABLE on S3 data can be extremely slow. DO_NOT_UPDATE_STATS workaround 
> is buggy.
> ---
>
> Key: HIVE-18743
> URL: https://issues.apache.org/jira/browse/HIVE-18743
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Affects Versions: 1.1.0, 1.2.0, 2.0.2, 3.0.0
>Reporter: Alexander Behm
>Assignee: Alexander Kolbasov
>Priority: Major
>
> When hive.stats.autogather=true then the Metastore lists all files under the 
> table directory to populate basic stats like file counts and sizes. This file 
> listing operation can be very expensive particularly on filesystems like S3.
> One way to address this issue is to reconfigure hive.stats.autogather=false.
> *Here's the bug*
> It is my understanding that the DO_NOT_UPDATE_STATS table property is 
> intended to selectively prevent this stats collection. Unfortunately, this 
> table property is checked *after* the expensive file listing operation, so 
> the DO_NOT_UPDATE_STATS does not seem to work as intended. See:
> https://github.com/apache/hive/blob/master/standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/utils/MetaStoreUtils.java#L633
> Relevant code snippet:
> {code}
>   public static boolean updateTableStatsFast(Database db, Table tbl, 
> Warehouse wh,
>  boolean madeDir, boolean 
> forceRecompute, EnvironmentContext environmentContext) throws MetaException {
> if (tbl.getPartitionKeysSize() == 0) {
>   // Update stats only when unpartitioned
>   FileStatus[] fileStatuses = wh.getFileStatusesForUnpartitionedTable(db, 
> tbl);
>   return updateTableStatsFast(tbl, fileStatuses, madeDir, forceRecompute, 
> environmentContext); <--- DO_NOT_UPDATE_STATS is checked in here after 
> wh.getFileStatusesForUnpartitionedTable() has already been called
> } else {
>   return false;
> }
>   }
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-18743) CREATE TABLE on S3 data can be extremely slow. DO_NOT_UPDATE_STATS workaround is buggy.

2018-03-04 Thread Alexander Kolbasov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Kolbasov updated HIVE-18743:
--
Attachment: (was: HIVE-18743.05.patch)

> CREATE TABLE on S3 data can be extremely slow. DO_NOT_UPDATE_STATS workaround 
> is buggy.
> ---
>
> Key: HIVE-18743
> URL: https://issues.apache.org/jira/browse/HIVE-18743
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Affects Versions: 1.2.0, 1.1.0, 2.0.2, 3.0.0
>Reporter: Alexander Behm
>Assignee: Alexander Kolbasov
>Priority: Major
>
> When hive.stats.autogather=true then the Metastore lists all files under the 
> table directory to populate basic stats like file counts and sizes. This file 
> listing operation can be very expensive particularly on filesystems like S3.
> One way to address this issue is to reconfigure hive.stats.autogather=false.
> *Here's the bug*
> It is my understanding that the DO_NOT_UPDATE_STATS table property is 
> intended to selectively prevent this stats collection. Unfortunately, this 
> table property is checked *after* the expensive file listing operation, so 
> the DO_NOT_UPDATE_STATS does not seem to work as intended. See:
> https://github.com/apache/hive/blob/master/standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/utils/MetaStoreUtils.java#L633
> Relevant code snippet:
> {code}
>   public static boolean updateTableStatsFast(Database db, Table tbl, 
> Warehouse wh,
>  boolean madeDir, boolean 
> forceRecompute, EnvironmentContext environmentContext) throws MetaException {
> if (tbl.getPartitionKeysSize() == 0) {
>   // Update stats only when unpartitioned
>   FileStatus[] fileStatuses = wh.getFileStatusesForUnpartitionedTable(db, 
> tbl);
>   return updateTableStatsFast(tbl, fileStatuses, madeDir, forceRecompute, 
> environmentContext); <--- DO_NOT_UPDATE_STATS is checked in here after 
> wh.getFileStatusesForUnpartitionedTable() has already been called
> } else {
>   return false;
> }
>   }
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-18743) CREATE TABLE on S3 data can be extremely slow. DO_NOT_UPDATE_STATS workaround is buggy.

2018-03-04 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16385090#comment-16385090
 ] 

Hive QA commented on HIVE-18743:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12912918/HIVE-18743.05.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 20 failed/errored test(s), 13062 tests 
executed
*Failed tests:*
{noformat}
TestNegativeCliDriver - did not produce a TEST-*.xml file (likely timed out) 
(batchId=93)


[jira] [Commented] (HIVE-18743) CREATE TABLE on S3 data can be extremely slow. DO_NOT_UPDATE_STATS workaround is buggy.

2018-03-04 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16385060#comment-16385060
 ] 

Hive QA commented on HIVE-18743:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  0s{color} | {color:blue} Findbugs executables are not available. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  7m  7s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 37s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 20s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 46s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 44s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 38s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 38s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 19s{color} | {color:red} standalone-metastore: The patch generated 4 new + 512 unchanged - 3 fixed = 516 total (was 515) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 42s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red}  0m 13s{color} | {color:red} The patch generated 49 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 11m 45s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /data/hiveptest/working/yetus_PreCommit-HIVE-Build-9475/dev-support/hive-personality.sh |
| git revision | master / 05d4719 |
| Default Java | 1.8.0_111 |
| checkstyle | http://104.198.109.242/logs//PreCommit-HIVE-Build-9475/yetus/diff-checkstyle-standalone-metastore.txt |
| asflicense | http://104.198.109.242/logs//PreCommit-HIVE-Build-9475/yetus/patch-asflicense-problems.txt |
| modules | C: standalone-metastore U: standalone-metastore |
| Console output | http://104.198.109.242/logs//PreCommit-HIVE-Build-9475/yetus.txt |
| Powered by | Apache Yetus http://yetus.apache.org |


This message was automatically generated.



> CREATE TABLE on S3 data can be extremely slow. DO_NOT_UPDATE_STATS workaround 
> is buggy.
> ---
>
> Key: HIVE-18743
> URL: https://issues.apache.org/jira/browse/HIVE-18743
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Affects Versions: 1.2.0, 1.1.0, 2.0.2, 3.0.0
>Reporter: Alexander Behm
>Assignee: Alexander Kolbasov
>Priority: Major
> Attachments: HIVE-18743.05.patch
>
>
> When hive.stats.autogather=true then the Metastore lists all files under the 
> table directory to populate basic stats like file counts and sizes. This file 
> listing operation can be very expensive particularly on filesystems like S3.
> One way to address this issue is to reconfigure hive.stats.autogather=false.
> *Here's the bug*
> It is my understanding that the DO_NOT_UPDATE_STATS table property is 
> intended to selectively prevent this stats collection. Unfortunately, this 
> table property is checked *after* the expensive file listing operation, so 
> the DO_NOT_UPDATE_STATS does not seem to work as intended. See:
> https://github.com/apache/hive/blob/master/standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/utils/MetaStoreUtils.java#L633
> Relevant code snippet:
> {code}
>   public static boolean updateTableStatsFast(Database db, Table tbl, 
> Warehouse wh,
>  boolean madeDir, boolean 
> 

[jira] [Updated] (HIVE-10179) Optimization for SIMD instructions in Hive

2018-03-04 Thread kangkaisen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

kangkaisen updated HIVE-10179:
--
Description: 
[SIMD|http://en.wikipedia.org/wiki/SIMD] instructions can be found in most 
current CPUs, such as Intel's SSE2, SSE3, SSE4.x, AVX and AVX2, and it would 
help Hive perform better if we can vectorize the mathematical manipulation part 
of Hive. This umbrella JIRA may contain, but is not limited to, subtasks like:
 # Code schema adaptation: the current JVM is quite strict about the code schema 
that can be transformed into SIMD instructions during execution.
 # New implementation of the mathematical manipulation part of Hive, designed 
to be optimized for SIMD instructions.

  was:
[SIMD|http://en.wikipedia.org/wiki/SIMD] instructions can be found in most 
current CPUs, such as Intel's SSE2, SSE3, SSE4.x, AVX and AVX2, and it would 
help Hive perform better if we can vectorize the mathematical manipulation part 
of Hive. This umbrella JIRA may contain, but is not limited to, subtasks like:
# Code schema adaptation: the current JVM is quite strict about the code schema 
that can be transformed into SIMD instructions during execution. 
# New implementation of the mathematical manipulation part of Hive, designed 
to be optimized for SIMD instructions.


> Optimization for SIMD instructions in Hive
> --
>
> Key: HIVE-10179
> URL: https://issues.apache.org/jira/browse/HIVE-10179
> Project: Hive
>  Issue Type: Improvement
>Reporter: Chengxiang Li
>Assignee: Chengxiang Li
>Priority: Major
>  Labels: optimization
>
> [SIMD|http://en.wikipedia.org/wiki/SIMD] instructions can be found in most 
> current CPUs, such as Intel's SSE2, SSE3, SSE4.x, AVX and AVX2, and it 
> would help Hive perform better if we can vectorize the mathematical 
> manipulation part of Hive. This umbrella JIRA may contain, but is not 
> limited to, subtasks like:
>  # Code schema adaptation: the current JVM is quite strict about the code 
> schema that can be transformed into SIMD instructions during execution.
>  # New implementation of the mathematical manipulation part of Hive, 
> designed to be optimized for SIMD instructions.
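
To illustrate the "code schema" point, here is a minimal sketch of the kind of loop a JIT compiler may be able to turn into SIMD instructions: a plain counted loop over primitive arrays with no branches or method calls in the body. The class and method names are hypothetical and are not taken from Hive; whether the loop is actually vectorized depends on the JVM version, flags and hardware.

```java
final class VectorizableLoopSketch {
    // Hypothetical helper, not a Hive API. Computes out[i] = a[i] * b[i] + c.
    // The loop is a simple counted loop over primitive arrays, which is the
    // shape JIT auto-vectorization generally expects.
    static double[] multiplyAdd(double[] a, double[] b, double c) {
        int n = Math.min(a.length, b.length);
        double[] out = new double[n];
        for (int i = 0; i < n; i++) {
            out[i] = a[i] * b[i] + c;
        }
        return out;
    }

    public static void main(String[] args) {
        double[] r = multiplyAdd(new double[] {1, 2, 3}, new double[] {4, 5, 6}, 1.0);
        System.out.println(java.util.Arrays.toString(r));
    }
}
```

A row-at-a-time expression evaluator with virtual calls per value would not fit this shape, which is why the second subtask talks about a new implementation of the mathematical part.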



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-17626) Query reoptimization using cached runtime statistics

2018-03-04 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16385055#comment-16385055
 ] 

Hive QA commented on HIVE-17626:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12912917/HIVE-17626.07A.patch

{color:green}SUCCESS:{color} +1 due to 14 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 97 failed/errored test(s), 13474 tests 
executed
*Failed tests:*
{noformat}
TestNegativeCliDriver - did not produce a TEST-*.xml file (likely timed out) 
(batchId=94)


[jira] [Commented] (HIVE-17626) Query reoptimization using cached runtime statistics

2018-03-04 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16385047#comment-16385047
 ] 

Hive QA commented on HIVE-17626:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  1s{color} | {color:blue} Findbugs executables are not available. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 31s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  6m 26s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  7m 14s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  3m 47s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  7m 33s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m  6s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  8m 43s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  7m 22s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  7m 22s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 15s{color} | {color:green} The patch common passed checkstyle {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  2m 10s{color} | {color:red} root: The patch generated 91 new + 1955 unchanged - 166 fixed = 2046 total (was 2121) {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m  9s{color} | {color:green} druid-handler: The patch generated 0 new + 1 unchanged - 1 fixed = 1 total (was 2) {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 13s{color} | {color:green} itests/util: The patch generated 0 new + 103 unchanged - 1 fixed = 103 total (was 104) {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 48s{color} | {color:red} ql: The patch generated 91 new + 1422 unchanged - 163 fixed = 1513 total (was 1585) {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 15s{color} | {color:red} standalone-metastore: The patch generated 1 new + 3 unchanged - 2 fixed = 4 total (was 5) {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  0s{color} | {color:red} The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  8s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  7m 34s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red}  0m 12s{color} | {color:red} The patch generated 49 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 54m 21s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  xml  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /data/hiveptest/working/yetus_PreCommit-HIVE-Build-9474/dev-support/hive-personality.sh |
| git revision | master / 05d4719 |
| Default Java | 1.8.0_111 |
| checkstyle | http://104.198.109.242/logs//PreCommit-HIVE-Build-9474/yetus/diff-checkstyle-root.txt |
| checkstyle | http://104.198.109.242/logs//PreCommit-HIVE-Build-9474/yetus/diff-checkstyle-ql.txt |
| checkstyle | http://104.198.109.242/logs//PreCommit-HIVE-Build-9474/yetus/diff-checkstyle-standalone-metastore.txt |
| whitespace | http://104.198.109.242/logs//PreCommit-HIVE-Build-9474/yetus/whitespace-eol.txt |
| asflicense | http://104.198.109.242/logs//PreCommit-HIVE-Build-9474/yetus/patch-asflicense-problems.txt |
| modules | C: common . druid-handler 

[jira] [Commented] (HIVE-17626) Query reoptimization using cached runtime statistics

2018-03-04 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16385034#comment-16385034
 ] 

Hive QA commented on HIVE-17626:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12912915/HIVE-17626.07B.patch

{color:green}SUCCESS:{color} +1 due to 14 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 25 failed/errored test(s), 13077 tests 
executed
*Failed tests:*
{noformat}
TestNegativeCliDriver - did not produce a TEST-*.xml file (likely timed out) 
(batchId=94)


[jira] [Commented] (HIVE-17626) Query reoptimization using cached runtime statistics

2018-03-04 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16385032#comment-16385032
 ] 

Hive QA commented on HIVE-17626:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  0s{color} | {color:blue} Findbugs executables are not available. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 28s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  6m 33s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  7m 16s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  3m 46s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  7m 43s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m  7s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  8m 40s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  7m 12s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  7m 12s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 14s{color} | {color:green} The patch common passed checkstyle {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  2m  8s{color} | {color:red} root: The patch generated 91 new + 1955 unchanged - 166 fixed = 2046 total (was 2121) {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 11s{color} | {color:green} druid-handler: The patch generated 0 new + 1 unchanged - 1 fixed = 1 total (was 2) {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 16s{color} | {color:green} itests/util: The patch generated 0 new + 103 unchanged - 1 fixed = 103 total (was 104) {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 52s{color} | {color:red} ql: The patch generated 91 new + 1422 unchanged - 163 fixed = 1513 total (was 1585) {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 15s{color} | {color:red} standalone-metastore: The patch generated 1 new + 3 unchanged - 2 fixed = 4 total (was 5) {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  0s{color} | {color:red} The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  2s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  7m 35s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red}  0m 12s{color} | {color:red} The patch generated 49 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 54m 26s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  xml  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /data/hiveptest/working/yetus_PreCommit-HIVE-Build-9473/dev-support/hive-personality.sh |
| git revision | master / 05d4719 |
| Default Java | 1.8.0_111 |
| checkstyle | http://104.198.109.242/logs//PreCommit-HIVE-Build-9473/yetus/diff-checkstyle-root.txt |
| checkstyle | http://104.198.109.242/logs//PreCommit-HIVE-Build-9473/yetus/diff-checkstyle-ql.txt |
| checkstyle | http://104.198.109.242/logs//PreCommit-HIVE-Build-9473/yetus/diff-checkstyle-standalone-metastore.txt |
| whitespace | http://104.198.109.242/logs//PreCommit-HIVE-Build-9473/yetus/whitespace-eol.txt |
| asflicense | http://104.198.109.242/logs//PreCommit-HIVE-Build-9473/yetus/patch-asflicense-problems.txt |
| modules | C: common . druid-handler 

[jira] [Commented] (HIVE-18743) CREATE TABLE on S3 data can be extremely slow. DO_NOT_UPDATE_STATS workaround is buggy.

2018-03-04 Thread Alexander Kolbasov (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16385027#comment-16385027
 ] 

Alexander Kolbasov commented on HIVE-18743:
---

[~kgyrtkirk] Added unit tests.

> CREATE TABLE on S3 data can be extremely slow. DO_NOT_UPDATE_STATS workaround 
> is buggy.
> ---
>
> Key: HIVE-18743
> URL: https://issues.apache.org/jira/browse/HIVE-18743
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Affects Versions: 1.2.0, 1.1.0, 2.0.2, 3.0.0
>Reporter: Alexander Behm
>Assignee: Alexander Kolbasov
>Priority: Major
> Attachments: HIVE-18743.05.patch
>
>
> When hive.stats.autogather=true then the Metastore lists all files under the 
> table directory to populate basic stats like file counts and sizes. This file 
> listing operation can be very expensive particularly on filesystems like S3.
> One way to address this issue is to reconfigure hive.stats.autogather=false.
> *Here's the bug*
> It is my understanding that the DO_NOT_UPDATE_STATS table property is 
> intended to selectively prevent this stats collection. Unfortunately, this 
> table property is checked *after* the expensive file listing operation, so 
> the DO_NOT_UPDATE_STATS does not seem to work as intended. See:
> https://github.com/apache/hive/blob/master/standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/utils/MetaStoreUtils.java#L633
> Relevant code snippet:
> {code}
>   public static boolean updateTableStatsFast(Database db, Table tbl, Warehouse wh,
>       boolean madeDir, boolean forceRecompute,
>       EnvironmentContext environmentContext) throws MetaException {
>     if (tbl.getPartitionKeysSize() == 0) {
>       // Update stats only when unpartitioned
>       FileStatus[] fileStatuses = wh.getFileStatusesForUnpartitionedTable(db, tbl);
>       // <--- DO_NOT_UPDATE_STATS is checked in here, after
>       // wh.getFileStatusesForUnpartitionedTable() has already been called
>       return updateTableStatsFast(tbl, fileStatuses, madeDir, forceRecompute,
>           environmentContext);
>     } else {
>       return false;
>     }
>   }
> {code}
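
The ordering problem described above can be sketched in isolation. The following is a minimal, self-contained illustration and not the actual patch: it checks the opt-out property before invoking the expensive listing. Only the property name DO_NOT_UPDATE_STATS comes from the issue text; the class, the method, the map-based table parameters, the supplier standing in for the file listing, and the stats keys are all assumptions made for the example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

final class StatsGuardSketch {
    // Property name from the issue description; everything else is hypothetical.
    static final String DO_NOT_UPDATE_STATS = "DO_NOT_UPDATE_STATS";

    // Returns true if stats were refreshed. The supplier stands in for the
    // expensive file listing and is only invoked when the table has not opted out.
    static boolean updateStatsIfAllowed(Map<String, String> tableParams,
                                        Supplier<long[]> listFileSizes) {
        // Boolean.parseBoolean(null) is false, so a missing property means "update".
        if (Boolean.parseBoolean(tableParams.get(DO_NOT_UPDATE_STATS))) {
            return false; // opted out: skip the listing entirely
        }
        long[] sizes = listFileSizes.get(); // the expensive call happens only here
        long total = 0;
        for (long size : sizes) {
            total += size;
        }
        // Illustrative stats keys for this sketch.
        tableParams.put("numFiles", Integer.toString(sizes.length));
        tableParams.put("totalSize", Long.toString(total));
        return true;
    }

    public static void main(String[] args) {
        Map<String, String> params = new HashMap<>();
        params.put(DO_NOT_UPDATE_STATS, "true");
        boolean updated = updateStatsIfAllowed(params, () -> {
            throw new IllegalStateException("listing should not run");
        });
        System.out.println("updated=" + updated);
    }
}
```

With the check placed first, a table that sets the property never pays for the listing, which is the behaviour the description says the property was intended to provide.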





[jira] [Updated] (HIVE-18743) CREATE TABLE on S3 data can be extremely slow. DO_NOT_UPDATE_STATS workaround is buggy.

2018-03-04 Thread Alexander Kolbasov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Kolbasov updated HIVE-18743:
--
Status: Patch Available  (was: Open)

> CREATE TABLE on S3 data can be extremely slow. DO_NOT_UPDATE_STATS workaround 
> is buggy.
> ---
>
> Key: HIVE-18743
> URL: https://issues.apache.org/jira/browse/HIVE-18743
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Affects Versions: 1.1.0, 1.2.0, 2.0.2, 3.0.0
>Reporter: Alexander Behm
>Assignee: Alexander Kolbasov
>Priority: Major
> Attachments: HIVE-18743.05.patch
>
>
> When hive.stats.autogather=true then the Metastore lists all files under the 
> table directory to populate basic stats like file counts and sizes. This file 
> listing operation can be very expensive particularly on filesystems like S3.
> One way to address this issue is to reconfigure hive.stats.autogather=false.
> *Here's the bug*
> It is my understanding that the DO_NOT_UPDATE_STATS table property is 
> intended to selectively prevent this stats collection. Unfortunately, this 
> table property is checked *after* the expensive file listing operation, so 
> the DO_NOT_UPDATE_STATS does not seem to work as intended. See:
> https://github.com/apache/hive/blob/master/standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/utils/MetaStoreUtils.java#L633
> Relevant code snippet:
> {code}
>   public static boolean updateTableStatsFast(Database db, Table tbl, Warehouse wh,
>       boolean madeDir, boolean forceRecompute,
>       EnvironmentContext environmentContext) throws MetaException {
>     if (tbl.getPartitionKeysSize() == 0) {
>       // Update stats only when unpartitioned
>       FileStatus[] fileStatuses = wh.getFileStatusesForUnpartitionedTable(db, tbl);
>       // <--- DO_NOT_UPDATE_STATS is checked in here, after
>       // wh.getFileStatusesForUnpartitionedTable() has already been called
>       return updateTableStatsFast(tbl, fileStatuses, madeDir, forceRecompute,
>           environmentContext);
>     } else {
>       return false;
>     }
>   }
> {code}





[jira] [Updated] (HIVE-18743) CREATE TABLE on S3 data can be extremely slow. DO_NOT_UPDATE_STATS workaround is buggy.

2018-03-04 Thread Alexander Kolbasov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Kolbasov updated HIVE-18743:
--
Status: Open  (was: Patch Available)

> CREATE TABLE on S3 data can be extremely slow. DO_NOT_UPDATE_STATS workaround 
> is buggy.
> ---
>
> Key: HIVE-18743
> URL: https://issues.apache.org/jira/browse/HIVE-18743
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Affects Versions: 1.1.0, 1.2.0, 2.0.2, 3.0.0
>Reporter: Alexander Behm
>Assignee: Alexander Kolbasov
>Priority: Major
> Attachments: HIVE-18743.05.patch
>
>
> When hive.stats.autogather=true then the Metastore lists all files under the 
> table directory to populate basic stats like file counts and sizes. This file 
> listing operation can be very expensive particularly on filesystems like S3.
> One way to address this issue is to reconfigure hive.stats.autogather=false.
> *Here's the bug*
> It is my understanding that the DO_NOT_UPDATE_STATS table property is 
> intended to selectively prevent this stats collection. Unfortunately, this 
> table property is checked *after* the expensive file listing operation, so 
> the DO_NOT_UPDATE_STATS does not seem to work as intended. See:
> https://github.com/apache/hive/blob/master/standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/utils/MetaStoreUtils.java#L633
> Relevant code snippet:
> {code}
>   public static boolean updateTableStatsFast(Database db, Table tbl, Warehouse wh,
>       boolean madeDir, boolean forceRecompute,
>       EnvironmentContext environmentContext) throws MetaException {
>     if (tbl.getPartitionKeysSize() == 0) {
>       // Update stats only when unpartitioned
>       FileStatus[] fileStatuses = wh.getFileStatusesForUnpartitionedTable(db, tbl);
>       // <--- DO_NOT_UPDATE_STATS is checked in here, after
>       // wh.getFileStatusesForUnpartitionedTable() has already been called
>       return updateTableStatsFast(tbl, fileStatuses, madeDir, forceRecompute,
>           environmentContext);
>     } else {
>       return false;
>     }
>   }
> {code}





[jira] [Updated] (HIVE-18743) CREATE TABLE on S3 data can be extremely slow. DO_NOT_UPDATE_STATS workaround is buggy.

2018-03-04 Thread Alexander Kolbasov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Kolbasov updated HIVE-18743:
--
Attachment: (was: HIVE-18743.04.patch)

> CREATE TABLE on S3 data can be extremely slow. DO_NOT_UPDATE_STATS workaround 
> is buggy.
> ---
>
> Key: HIVE-18743
> URL: https://issues.apache.org/jira/browse/HIVE-18743
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Affects Versions: 1.2.0, 1.1.0, 2.0.2, 3.0.0
>Reporter: Alexander Behm
>Assignee: Alexander Kolbasov
>Priority: Major
> Attachments: HIVE-18743.05.patch
>
>
> When hive.stats.autogather=true then the Metastore lists all files under the 
> table directory to populate basic stats like file counts and sizes. This file 
> listing operation can be very expensive particularly on filesystems like S3.
> One way to address this issue is to reconfigure hive.stats.autogather=false.
> *Here's the bug*
> It is my understanding that the DO_NOT_UPDATE_STATS table property is 
> intended to selectively prevent this stats collection. Unfortunately, this 
> table property is checked *after* the expensive file listing operation, so 
> the DO_NOT_UPDATE_STATS does not seem to work as intended. See:
> https://github.com/apache/hive/blob/master/standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/utils/MetaStoreUtils.java#L633
> Relevant code snippet:
> {code}
>   public static boolean updateTableStatsFast(Database db, Table tbl, Warehouse wh,
>       boolean madeDir, boolean forceRecompute,
>       EnvironmentContext environmentContext) throws MetaException {
>     if (tbl.getPartitionKeysSize() == 0) {
>       // Update stats only when unpartitioned
>       FileStatus[] fileStatuses = wh.getFileStatusesForUnpartitionedTable(db, tbl);
>       // <--- DO_NOT_UPDATE_STATS is checked in here, after
>       // wh.getFileStatusesForUnpartitionedTable() has already been called
>       return updateTableStatsFast(tbl, fileStatuses, madeDir, forceRecompute,
>           environmentContext);
>     } else {
>       return false;
>     }
>   }
> {code}





[jira] [Updated] (HIVE-18743) CREATE TABLE on S3 data can be extremely slow. DO_NOT_UPDATE_STATS workaround is buggy.

2018-03-04 Thread Alexander Kolbasov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Kolbasov updated HIVE-18743:
--
Attachment: HIVE-18743.05.patch

> CREATE TABLE on S3 data can be extremely slow. DO_NOT_UPDATE_STATS workaround 
> is buggy.
> ---
>
> Key: HIVE-18743
> URL: https://issues.apache.org/jira/browse/HIVE-18743
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Affects Versions: 1.2.0, 1.1.0, 2.0.2, 3.0.0
>Reporter: Alexander Behm
>Assignee: Alexander Kolbasov
>Priority: Major
> Attachments: HIVE-18743.05.patch
>
>
> When hive.stats.autogather=true then the Metastore lists all files under the 
> table directory to populate basic stats like file counts and sizes. This file 
> listing operation can be very expensive particularly on filesystems like S3.
> One way to address this issue is to reconfigure hive.stats.autogather=false.
> *Here's the bug*
> It is my understanding that the DO_NOT_UPDATE_STATS table property is 
> intended to selectively prevent this stats collection. Unfortunately, this 
> table property is checked *after* the expensive file listing operation, so 
> the DO_NOT_UPDATE_STATS does not seem to work as intended. See:
> https://github.com/apache/hive/blob/master/standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/utils/MetaStoreUtils.java#L633
> Relevant code snippet:
> {code}
>   public static boolean updateTableStatsFast(Database db, Table tbl, Warehouse wh,
>       boolean madeDir, boolean forceRecompute,
>       EnvironmentContext environmentContext) throws MetaException {
>     if (tbl.getPartitionKeysSize() == 0) {
>       // Update stats only when unpartitioned
>       FileStatus[] fileStatuses = wh.getFileStatusesForUnpartitionedTable(db, tbl);
>       // <--- DO_NOT_UPDATE_STATS is checked in here, after
>       // wh.getFileStatusesForUnpartitionedTable() has already been called
>       return updateTableStatsFast(tbl, fileStatuses, madeDir, forceRecompute,
>           environmentContext);
>     } else {
>       return false;
>     }
>   }
> {code}





[jira] [Updated] (HIVE-17626) Query reoptimization using cached runtime statistics

2018-03-04 Thread Zoltan Haindrich (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich updated HIVE-17626:

Attachment: (was: HIVE-17626.07A.patch)

> Query reoptimization using cached runtime statistics
> 
>
> Key: HIVE-17626
> URL: https://issues.apache.org/jira/browse/HIVE-17626
> Project: Hive
>  Issue Type: New Feature
>  Components: Logical Optimizer
>Affects Versions: 3.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Zoltan Haindrich
>Priority: Major
> Attachments: HIVE-17626.01.patch, HIVE-17626.01wip01.patch, 
> HIVE-17626.02.patch, HIVE-17626.03.patch, HIVE-17626.04.patch, 
> HIVE-17626.05.patch, HIVE-17626.06.patch, HIVE-17626.07A.patch, 
> HIVE-17626.07B.patch, runtimestats.patch
>
>
> Something similar to "EXPLAIN ANALYZE" where we annotate explain plan with 
> actual and estimated statistics. The runtime stats can be cached at query 
> level and subsequent execution of the same query can make use of the cached 
> statistics from the previous run for better optimization. 
> Some use cases:
> 1) Re-planning join queries (mapjoin failures can be converted to shuffle joins)
> 2) Better statistics for the table scan operator if dynamic partition pruning is 
> involved
> 3) Better estimates for bloom filter initialization (setting expected entries 
> during merge)
> This can be extended to support wider queries by caching fragments of operator 
> plans scanning the same table(s) or matching some operator sequences.
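
As a rough illustration of the idea in the description, the sketch below caches an observed row count per query text and lets a later planning pass prefer it over the optimizer's estimate when picking a join strategy (use case 1). It is not the design in the attached patches: the class, the method names, the query-text key and the row-limit threshold are all hypothetical.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class RuntimeStatsCacheSketch {
    enum JoinStrategy { MAP_JOIN, SHUFFLE_JOIN }

    // Hypothetical cache: query text -> row count observed on the small join side.
    private final Map<String, Long> observedRowsByQuery = new ConcurrentHashMap<>();
    private final long mapJoinRowLimit;

    RuntimeStatsCacheSketch(long mapJoinRowLimit) {
        this.mapJoinRowLimit = mapJoinRowLimit;
    }

    // Called after a run finishes, with the actual row count seen at runtime.
    void recordRun(String queryText, long actualSmallSideRows) {
        observedRowsByQuery.put(queryText, actualSmallSideRows);
    }

    // Prefers the cached runtime value over the estimate when one exists.
    JoinStrategy plan(String queryText, long estimatedSmallSideRows) {
        long rows = observedRowsByQuery.getOrDefault(queryText, estimatedSmallSideRows);
        return rows <= mapJoinRowLimit ? JoinStrategy.MAP_JOIN : JoinStrategy.SHUFFLE_JOIN;
    }

    public static void main(String[] args) {
        RuntimeStatsCacheSketch cache = new RuntimeStatsCacheSketch(1_000L);
        String q = "SELECT ... FROM a JOIN b ON ...";
        System.out.println(cache.plan(q, 500L));   // estimate only
        cache.recordRun(q, 50_000L);               // first run saw far more rows
        System.out.println(cache.plan(q, 500L));   // second planning uses cached value
    }
}
```

Keying on the exact query text matches the "cached at query level" wording; matching plan fragments, as the last sentence of the description suggests, would need a different key.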





[jira] [Updated] (HIVE-17626) Query reoptimization using cached runtime statistics

2018-03-04 Thread Zoltan Haindrich (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich updated HIVE-17626:

Attachment: HIVE-17626.07A.patch

> Query reoptimization using cached runtime statistics
> 
>
> Key: HIVE-17626
> URL: https://issues.apache.org/jira/browse/HIVE-17626
> Project: Hive
>  Issue Type: New Feature
>  Components: Logical Optimizer
>Affects Versions: 3.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Zoltan Haindrich
>Priority: Major
> Attachments: HIVE-17626.01.patch, HIVE-17626.01wip01.patch, 
> HIVE-17626.02.patch, HIVE-17626.03.patch, HIVE-17626.04.patch, 
> HIVE-17626.05.patch, HIVE-17626.06.patch, HIVE-17626.07A.patch, 
> HIVE-17626.07B.patch, runtimestats.patch
>
>
> Something similar to "EXPLAIN ANALYZE" where we annotate explain plan with 
> actual and estimated statistics. The runtime stats can be cached at query 
> level and subsequent execution of the same query can make use of the cached 
> statistics from the previous run for better optimization. 
> Some use cases:
> 1) Re-planning join queries (mapjoin failures can be converted to shuffle joins)
> 2) Better statistics for the table scan operator if dynamic partition pruning is 
> involved
> 3) Better estimates for bloom filter initialization (setting expected entries 
> during merge)
> This can be extended to support wider queries by caching fragments of operator 
> plans scanning the same table(s) or matching some operator sequences.





[jira] [Commented] (HIVE-18726) Implement DEFAULT constraint

2018-03-04 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16385016#comment-16385016
 ] 

Hive QA commented on HIVE-18726:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12912908/HIVE-18726.6.patch

{color:green}SUCCESS:{color} +1 due to 16 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 20 failed/errored test(s), 13062 tests 
executed
*Failed tests:*
{noformat}
TestMinimrCliDriver - did not produce a TEST-*.xml file (likely timed out) 
(batchId=91)

[infer_bucket_sort_num_buckets.q,infer_bucket_sort_reducers_power_two.q,parallel_orderby.q,bucket_num_reducers_acid.q,infer_bucket_sort_map_operators.q,infer_bucket_sort_merge.q,root_dir_external_table.q,infer_bucket_sort_dyn_part.q,udf_using.q,bucket_num_reducers_acid2.q]
TestNegativeCliDriver - did not produce a TEST-*.xml file (likely timed out) 
(batchId=93)


[jira] [Updated] (HIVE-17626) Query reoptimization using cached runtime statistics

2018-03-04 Thread Zoltan Haindrich (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich updated HIVE-17626:

Attachment: HIVE-17626.07B.patch
HIVE-17626.07A.patch

> Query reoptimization using cached runtime statistics
> 
>
> Key: HIVE-17626
> URL: https://issues.apache.org/jira/browse/HIVE-17626
> Project: Hive
>  Issue Type: New Feature
>  Components: Logical Optimizer
>Affects Versions: 3.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Zoltan Haindrich
>Priority: Major
> Attachments: HIVE-17626.01.patch, HIVE-17626.01wip01.patch, 
> HIVE-17626.02.patch, HIVE-17626.03.patch, HIVE-17626.04.patch, 
> HIVE-17626.05.patch, HIVE-17626.06.patch, HIVE-17626.07A.patch, 
> HIVE-17626.07B.patch, runtimestats.patch
>
>
> Something similar to "EXPLAIN ANALYZE" where we annotate explain plan with 
> actual and estimated statistics. The runtime stats can be cached at query 
> level and subsequent execution of the same query can make use of the cached 
> statistics from the previous run for better optimization. 
> Some use cases:
> 1) Re-planning join queries (mapjoin failures can be converted to shuffle joins)
> 2) Better statistics for the table scan operator if dynamic partition pruning is 
> involved
> 3) Better estimates for bloom filter initialization (setting expected entries 
> during merge)
> This can be extended to support wider queries by caching fragments of operator 
> plans scanning the same table(s) or matching some operator sequences.


