[jira] [Commented] (HIVE-13817) Allow DNS CNAME ALIAS Resolution from apache hive beeline JDBC URL to allow for failover

2017-09-01 Thread Vijay Singh (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16151417#comment-16151417
 ] 

Vijay Singh commented on HIVE-13817:


Will rebase the patch as asked. Next ETA for patch is 09/03/17

> Allow DNS CNAME ALIAS Resolution from apache hive beeline JDBC URL to allow 
> for failover
> 
>
> Key: HIVE-13817
> URL: https://issues.apache.org/jira/browse/HIVE-13817
> Project: Hive
>  Issue Type: New Feature
>  Components: Beeline
>Affects Versions: 1.2.1
>Reporter: Vijay Singh
> Attachments: HIVE-13817.1.patch, HIVE-13817.2.patch, 
> HIVE-13817.3.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Currently, in BDR clusters, connections made through a DNS CNAME alias fail, 
> because _HOST resolves to the exact endpoint specified in the connection 
> string, which may not be the intended Kerberos SPN based on reverse DNS lookup. 
> Consequently, this JIRA proposes a client-specific setting that resolves 
> _HOST from the CNAME DNS alias to the A record entry on the fly in Beeline.
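
The idea in the description above can be sketched in Java as follows. This is a minimal illustration and not the code in the attached patches: the class name, method names, and the `resolveAlias` switch are hypothetical, and the sketch assumes that a forward lookup followed by a reverse lookup (`InetAddress.getCanonicalHostName()`) yields the host name that the server's SPN was registered under.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical helper; the names here are illustrative and do not come from Hive.
final class HostPrincipalSketch {

    /** Replaces the _HOST placeholder in a Kerberos principal with the given host name. */
    static String substituteHost(String principal, String host) {
        return principal.replace("_HOST", host);
    }

    /**
     * Resolves a host name (possibly a CNAME alias) to the canonical name of the
     * address it points to. Falls back to the input when the lookup fails.
     */
    static String canonicalize(String hostFromUrl) {
        try {
            // getByName follows the alias to an address; getCanonicalHostName
            // performs the reverse lookup for that address.
            return InetAddress.getByName(hostFromUrl).getCanonicalHostName();
        } catch (UnknownHostException e) {
            return hostFromUrl;
        }
    }

    /** Builds the principal, resolving the alias only when the client asks for it. */
    static String principalFor(String principal, String hostFromUrl, boolean resolveAlias) {
        String host = resolveAlias ? canonicalize(hostFromUrl) : hostFromUrl;
        return substituteHost(principal, host);
    }
}
```

With `resolveAlias` set to false the behaviour matches what the description reports today: the alias from the connection string is substituted for _HOST unchanged.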



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17368) DBTokenStore fails to connect in Kerberos enabled remote HMS environment

2017-09-01 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16151416#comment-16151416
 ] 

Hive QA commented on HIVE-17368:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12885052/HIVE-17368.03-branch-2.patch

{color:red}ERROR:{color} -1 due to build exiting with an error

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6654/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6654/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6654/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hiveptest/working/scratch/source-prep.sh' failed with exit 
status 1 and output '+ date '+%Y-%m-%d %T.%3N'
2017-09-02 05:39:27.367
+ [[ -n /usr/lib/jvm/java-8-openjdk-amd64 ]]
+ export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ export 
PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ 
PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'MAVEN_OPTS=-Xmx1g '
+ MAVEN_OPTS='-Xmx1g '
+ cd /data/hiveptest/working/
+ tee /data/hiveptest/logs/PreCommit-HIVE-Build-6654/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ git = \s\v\n ]]
+ [[ git = \g\i\t ]]
+ [[ -z branch-2 ]]
+ [[ -d apache-github-branch-2-source ]]
+ [[ ! -d apache-github-branch-2-source/.git ]]
+ [[ ! -d apache-github-branch-2-source ]]
+ date '+%Y-%m-%d %T.%3N'
2017-09-02 05:39:27.370
+ cd apache-github-branch-2-source
+ git fetch origin
From https://github.com/apache/hive
   588148d..76933e7  branch-2   -> origin/branch-2
   5a62503..714d7cf  branch-2.1 -> origin/branch-2.1
   120476d..b2e7d5e  branch-2.2 -> origin/branch-2.2
   6f4c35c..dee0a20  branch-2.3 -> origin/branch-2.3
   6be50b7..d155565  master -> origin/master
+ git reset --hard HEAD
HEAD is now at 588148d HIVE-17327 : ADDENDUM (revert a small part of the patch 
to fix the test) (Sergey Shelukhin)
+ git clean -f -d
+ git checkout branch-2
Already on 'branch-2'
Your branch is behind 'origin/branch-2' by 2 commits, and can be fast-forwarded.
  (use "git pull" to update your local branch)
+ git reset --hard origin/branch-2
HEAD is now at 76933e7 HIVE-17411 : LLAP IO may incorrectly release a refcount 
in some rare cases (Sergey Shelukhin, reviewed by Prasanth Jayachandran)
+ git merge --ff-only origin/branch-2
Already up-to-date.
+ date '+%Y-%m-%d %T.%3N'
2017-09-02 05:39:33.815
+ patchCommandPath=/data/hiveptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hiveptest/working/scratch/build.patch
+ [[ -f /data/hiveptest/working/scratch/build.patch ]]
+ chmod +x /data/hiveptest/working/scratch/smart-apply-patch.sh
+ /data/hiveptest/working/scratch/smart-apply-patch.sh 
/data/hiveptest/working/scratch/build.patch
Going to apply patch with: patch -p1
patching file 
itests/hive-minikdc/src/test/java/org/apache/hive/minikdc/MiniHiveKdc.java
patching file 
itests/hive-minikdc/src/test/java/org/apache/hive/minikdc/TestJdbcWithDBTokenStore.java
patching file 
itests/hive-unit-hadoop2/src/test/java/org/apache/hadoop/hive/thrift/TestHadoopAuthBridge23.java
patching file 
itests/util/src/main/java/org/apache/hive/jdbc/miniHS2/MiniHS2.java
patching file ql/src/java/org/apache/hadoop/hive/ql/session/SessionState.java
patching file 
service/src/java/org/apache/hive/service/cli/session/HiveSessionImplwithUGI.java
patching file 
shims/common/src/main/java/org/apache/hadoop/hive/thrift/DBTokenStore.java
patching file 
shims/common/src/main/java/org/apache/hadoop/hive/thrift/DelegationTokenSecretManager.java
patching file 
shims/common/src/main/java/org/apache/hadoop/hive/thrift/HiveDelegationTokenManager.java
+ [[ maven == \m\a\v\e\n ]]
+ rm -rf /data/hiveptest/working/maven/org/apache/hive
+ mvn -B clean install -DskipTests -T 4 -q 
-Dmaven.repo.local=/data/hiveptest/working/maven
[ERROR] COMPILATION ERROR : 
[ERROR] 
/data/hiveptest/working/apache-github-branch-2-source/shims/common/src/main/java/org/apache/hadoop/hive/thrift/HiveDelegationTokenManager.java:[124,25]
 cannot find symbol
  symbol:   method getDSeelegationToken(java.lang.String,java.lang.String)
  location: variable secretManager of type 
org.apache.hadoop.hive.thrift.DelegationTokenSecretManager
[ERROR] Failed to execute goal 
org.apache.maven.plugins:maven-compiler-plugin:3.1:compile (default-compile) on 
project hive-shims-common: Compilation failure
[ERROR] 
/data/hiveptest/working/apache-github-branch-2-source/shims/common/src/main/java/org/apache/hadoop/hive/thrift/HiveDelegationTokenManager.java:[124,25]
 

[jira] [Commented] (HIVE-17431) change configuration handling in TezSessionState

2017-09-01 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16151415#comment-16151415
 ] 

Hive QA commented on HIVE-17431:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12885036/HIVE-17431.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 7 failed/errored test(s), 11031 tests 
executed
*Failed tests:*
{noformat}
TestTxnCommandsBase - did not produce a TEST-*.xml file (likely timed out) 
(batchId=280)
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[create_merge_compressed]
 (batchId=239)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_sortmerge_join_2] 
(batchId=46)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[insert_values_orig_table_use_metadata]
 (batchId=61)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning]
 (batchId=169)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] 
(batchId=100)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=234)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6653/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6653/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6653/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 7 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12885036 - PreCommit-HIVE-Build

> change configuration handling in TezSessionState
> 
>
> Key: HIVE-17431
> URL: https://issues.apache.org/jira/browse/HIVE-17431
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-17431.patch
>
>
> The configuration is only set when opening the session; that seems 
> unnecessary - it could be set in the ctor and made final. E.g. when updating 
> the session and localizing new resources we may theoretically open the 
> session with a new config, but we don't update the config and only update the 
> files if the session is already open, which seems to imply that it's ok to 
> not update the config. 
> In most cases, the session is opened only once or reopened without intending 
> to change the config (e.g. if it times out).
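
A minimal Java sketch of the proposal above, under stated assumptions: the class below is a hypothetical stand-in for TezSessionState, with a plain string map in place of the real configuration object. It only illustrates setting the configuration once in the constructor and keeping it final across open and reopen.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a session class; not the actual TezSessionState.
final class SessionSketch {
    // Set once in the constructor and never replaced afterwards.
    private final Map<String, String> conf;
    private boolean open;

    SessionSketch(Map<String, String> conf) {
        // Defensive copy so later changes to the caller's map do not leak in.
        this.conf = Collections.unmodifiableMap(new HashMap<>(conf));
    }

    /** Opens the session; note that no configuration is passed here. */
    void open() {
        open = true;
    }

    void close() {
        open = false;
    }

    /** Reopens the session (for example after a timeout) with the same configuration. */
    void reopen() {
        close();
        open();
    }

    boolean isOpen() {
        return open;
    }

    String get(String key) {
        return conf.get(key);
    }
}
```

Because `open()` takes no configuration argument, a reopen cannot change the configuration by accident, which matches the observation that most reopens do not intend to.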





[jira] [Commented] (HIVE-16811) Estimate statistics in absence of stats

2017-09-01 Thread Vineet Garg (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16151411#comment-16151411
 ] 

Vineet Garg commented on HIVE-16811:


Failures are not reproducible. Pushing the patch to master

> Estimate statistics in absence of stats
> ---
>
> Key: HIVE-16811
> URL: https://issues.apache.org/jira/browse/HIVE-16811
> Project: Hive
>  Issue Type: Improvement
>Reporter: Vineet Garg
>Assignee: Vineet Garg
> Attachments: HIVE-16811.10.patch, HIVE-16811.11.patch, 
> HIVE-16811.12.patch, HIVE-16811.13.patch, HIVE-16811.14.patch, 
> HIVE-16811.15.patch, HIVE-16811.16.patch, HIVE-16811.17.patch, 
> HIVE-16811.18.patch, HIVE-16811.19.patch, HIVE-16811.1.patch, 
> HIVE-16811.2.patch, HIVE-16811.3.patch, HIVE-16811.4.patch, 
> HIVE-16811.5.patch, HIVE-16811.6.patch, HIVE-16811.7.patch, 
> HIVE-16811.8.patch, HIVE-16811.9.patch
>
>
> Currently, join ordering bails out completely in the absence of statistics, 
> which can lead to bad joins such as cross joins.
> For example, the following select query will produce a cross join:
> {code:sql}
> create table supplier (S_SUPPKEY INT, S_NAME STRING, S_ADDRESS STRING, 
> S_NATIONKEY INT, 
> S_PHONE STRING, S_ACCTBAL DOUBLE, S_COMMENT STRING);
> CREATE TABLE lineitem (L_ORDERKEY INT,
> L_PARTKEY INT,
> L_SUPPKEY INT,
> L_LINENUMBER INT,
> L_QUANTITY DOUBLE,
> L_EXTENDEDPRICE DOUBLE,
> L_DISCOUNT DOUBLE,
> L_TAX DOUBLE,
> L_RETURNFLAG STRING,
> L_LINESTATUS STRING,
> l_shipdate STRING,
> L_COMMITDATE STRING,
> L_RECEIPTDATE STRING,
> L_SHIPINSTRUCT STRING,
> L_SHIPMODE STRING,
> L_COMMENT STRING) partitioned by (dl int)
> ROW FORMAT DELIMITED
> FIELDS TERMINATED BY '|';
> CREATE TABLE part(
> p_partkey INT,
> p_name STRING,
> p_mfgr STRING,
> p_brand STRING,
> p_type STRING,
> p_size INT,
> p_container STRING,
> p_retailprice DOUBLE,
> p_comment STRING
> );
> explain select count(1) from part,supplier,lineitem where p_partkey = 
> l_partkey and s_suppkey = l_suppkey;
> {code}
> Estimating stats will keep the join ordering algorithm from bailing out, so 
> it can come up with a join order that is at least better than a cross join.
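
One way to picture "estimating stats" is a row-count guess derived from a table's data size and its column types. The Java sketch below is only an illustration under assumptions: the per-type widths are arbitrary values chosen for the example, the class and method names are hypothetical, and nothing here is taken from the attached patches or from Hive's optimizer.

```java
import java.util.List;
import java.util.Locale;

// Hypothetical estimator; widths are assumed values, not Hive defaults.
final class RowCountEstimateSketch {

    /** Assumed average width in bytes for a column of the given type. */
    static int assumedWidth(String type) {
        switch (type.toUpperCase(Locale.ROOT)) {
            case "INT":
                return 4;
            case "DOUBLE":
                return 8;
            case "STRING":
                return 100; // arbitrary guess for variable-length data
            default:
                return 8;
        }
    }

    /** Estimates the row count as total data size divided by the assumed row width. */
    static long estimateRows(long totalBytes, List<String> columnTypes) {
        long rowWidth = 0;
        for (String type : columnTypes) {
            rowWidth += assumedWidth(type);
        }
        if (rowWidth <= 0) {
            return 1L; // no columns known: report at least one row
        }
        return Math.max(1L, totalBytes / rowWidth);
    }
}
```

For the `part` table in the description (two INT, one DOUBLE and six STRING columns) these assumed widths give a row width of 616 bytes, so 6,160,000 bytes of data would be estimated at 10,000 rows. Even a rough figure like this gives a join ordering algorithm something to compare.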





[jira] [Commented] (HIVE-16811) Estimate statistics in absence of stats

2017-09-01 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16151406#comment-16151406
 ] 

Hive QA commented on HIVE-16811:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12885034/HIVE-16811.18.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 8 failed/errored test(s), 11032 tests 
executed
*Failed tests:*
{noformat}
TestTxnCommandsBase - did not produce a TEST-*.xml file (likely timed out) 
(batchId=280)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[insert_values_orig_table_use_metadata]
 (batchId=61)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[orc_predicate_pushdown]
 (batchId=147)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[parquet_predicate_pushdown]
 (batchId=151)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[sysdb] 
(batchId=153)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning]
 (batchId=169)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] 
(batchId=100)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=234)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6652/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6652/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6652/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 8 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12885034 - PreCommit-HIVE-Build

> Estimate statistics in absence of stats
> ---
>
> Key: HIVE-16811
> URL: https://issues.apache.org/jira/browse/HIVE-16811
> Project: Hive
>  Issue Type: Improvement
>Reporter: Vineet Garg
>Assignee: Vineet Garg
> Attachments: HIVE-16811.10.patch, HIVE-16811.11.patch, 
> HIVE-16811.12.patch, HIVE-16811.13.patch, HIVE-16811.14.patch, 
> HIVE-16811.15.patch, HIVE-16811.16.patch, HIVE-16811.17.patch, 
> HIVE-16811.18.patch, HIVE-16811.19.patch, HIVE-16811.1.patch, 
> HIVE-16811.2.patch, HIVE-16811.3.patch, HIVE-16811.4.patch, 
> HIVE-16811.5.patch, HIVE-16811.6.patch, HIVE-16811.7.patch, 
> HIVE-16811.8.patch, HIVE-16811.9.patch


[jira] [Updated] (HIVE-16811) Estimate statistics in absence of stats

2017-09-01 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-16811:
---
Attachment: HIVE-16811.19.patch

> Estimate statistics in absence of stats
> ---
>
> Key: HIVE-16811
> URL: https://issues.apache.org/jira/browse/HIVE-16811
> Project: Hive
>  Issue Type: Improvement
>Reporter: Vineet Garg
>Assignee: Vineet Garg
> Attachments: HIVE-16811.10.patch, HIVE-16811.11.patch, 
> HIVE-16811.12.patch, HIVE-16811.13.patch, HIVE-16811.14.patch, 
> HIVE-16811.15.patch, HIVE-16811.16.patch, HIVE-16811.17.patch, 
> HIVE-16811.18.patch, HIVE-16811.19.patch, HIVE-16811.1.patch, 
> HIVE-16811.2.patch, HIVE-16811.3.patch, HIVE-16811.4.patch, 
> HIVE-16811.5.patch, HIVE-16811.6.patch, HIVE-16811.7.patch, 
> HIVE-16811.8.patch, HIVE-16811.9.patch


[jira] [Updated] (HIVE-16811) Estimate statistics in absence of stats

2017-09-01 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-16811:
---
Status: Patch Available  (was: Open)

> Estimate statistics in absence of stats
> ---
>
> Key: HIVE-16811
> URL: https://issues.apache.org/jira/browse/HIVE-16811
> Project: Hive
>  Issue Type: Improvement
>Reporter: Vineet Garg
>Assignee: Vineet Garg
> Attachments: HIVE-16811.10.patch, HIVE-16811.11.patch, 
> HIVE-16811.12.patch, HIVE-16811.13.patch, HIVE-16811.14.patch, 
> HIVE-16811.15.patch, HIVE-16811.16.patch, HIVE-16811.17.patch, 
> HIVE-16811.18.patch, HIVE-16811.19.patch, HIVE-16811.1.patch, 
> HIVE-16811.2.patch, HIVE-16811.3.patch, HIVE-16811.4.patch, 
> HIVE-16811.5.patch, HIVE-16811.6.patch, HIVE-16811.7.patch, 
> HIVE-16811.8.patch, HIVE-16811.9.patch


[jira] [Updated] (HIVE-16811) Estimate statistics in absence of stats

2017-09-01 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-16811:
---
Status: Open  (was: Patch Available)

> Estimate statistics in absence of stats
> ---
>
> Key: HIVE-16811
> URL: https://issues.apache.org/jira/browse/HIVE-16811
> Project: Hive
>  Issue Type: Improvement
>Reporter: Vineet Garg
>Assignee: Vineet Garg
> Attachments: HIVE-16811.10.patch, HIVE-16811.11.patch, 
> HIVE-16811.12.patch, HIVE-16811.13.patch, HIVE-16811.14.patch, 
> HIVE-16811.15.patch, HIVE-16811.16.patch, HIVE-16811.17.patch, 
> HIVE-16811.18.patch, HIVE-16811.1.patch, HIVE-16811.2.patch, 
> HIVE-16811.3.patch, HIVE-16811.4.patch, HIVE-16811.5.patch, 
> HIVE-16811.6.patch, HIVE-16811.7.patch, HIVE-16811.8.patch, HIVE-16811.9.patch


[jira] [Commented] (HIVE-16811) Estimate statistics in absence of stats

2017-09-01 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16151393#comment-16151393
 ] 

Hive QA commented on HIVE-16811:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12885034/HIVE-16811.18.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 8 failed/errored test(s), 11018 tests 
executed
*Failed tests:*
{noformat}
TestTxnCommandsBase - did not produce a TEST-*.xml file (likely timed out) 
(batchId=280)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[insert_values_orig_table_use_metadata]
 (batchId=61)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[orc_predicate_pushdown]
 (batchId=147)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[parquet_predicate_pushdown]
 (batchId=151)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[sysdb] 
(batchId=153)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning]
 (batchId=169)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=234)
org.apache.hadoop.hive.cli.TestSparkCliDriver.org.apache.hadoop.hive.cli.TestSparkCliDriver
 (batchId=101)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6651/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6651/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6651/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 8 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12885034 - PreCommit-HIVE-Build

> Estimate statistics in absence of stats
> ---
>
> Key: HIVE-16811
> URL: https://issues.apache.org/jira/browse/HIVE-16811
> Project: Hive
>  Issue Type: Improvement
>Reporter: Vineet Garg
>Assignee: Vineet Garg
> Attachments: HIVE-16811.10.patch, HIVE-16811.11.patch, 
> HIVE-16811.12.patch, HIVE-16811.13.patch, HIVE-16811.14.patch, 
> HIVE-16811.15.patch, HIVE-16811.16.patch, HIVE-16811.17.patch, 
> HIVE-16811.18.patch, HIVE-16811.1.patch, HIVE-16811.2.patch, 
> HIVE-16811.3.patch, HIVE-16811.4.patch, HIVE-16811.5.patch, 
> HIVE-16811.6.patch, HIVE-16811.7.patch, HIVE-16811.8.patch, HIVE-16811.9.patch


[jira] [Updated] (HIVE-17368) DBTokenStore fails to connect in Kerberos enabled remote HMS environment

2017-09-01 Thread Vihang Karajgaonkar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vihang Karajgaonkar updated HIVE-17368:
---
Attachment: HIVE-17368.03-branch-2.patch

Attaching the third version of the patch. This removes the doAs used to get the 
delegation token. The doAs is not necessary because, when impersonation is on, the 
current user is already set to the impersonated user. Creating a proxy user to get 
the delegation token might not work in all cases (e.g. when the token store is DB), 
since it needs to establish an authenticated (Kerberos, or 
HMSDelegationToken + Digest) transport to connect to HMS.

> DBTokenStore fails to connect in Kerberos enabled remote HMS environment
> 
>
> Key: HIVE-17368
> URL: https://issues.apache.org/jira/browse/HIVE-17368
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.1.0, 2.0.0, 2.1.0, 2.2.0
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
> Attachments: HIVE-17368.01-branch-2.patch, HIVE-17368.01.patch, 
> HIVE-17368.02-branch-2.patch, HIVE-17368.03-branch-2.patch
>
>
> In setups where HMS runs as a remote process secured using Kerberos, 
> and {{DBTokenStore}} is configured as the token store, HS2 Thrift 
> API calls like {{GetDelegationToken}}, {{CancelDelegationToken}} and 
> {{RenewDelegationToken}} fail with the exception trace shown below. HS2 is not 
> able to invoke the HMS APIs needed to add/remove/renew tokens in the DB, since 
> the user who issues the {{GetDelegationToken}} may not be 
> Kerberos enabled.
> E.g. Oozie submits a job on behalf of user "Joe". When Oozie opens a session 
> with HS2, it uses Oozie's principal and creates a proxy UGI with Hive. This 
> principal can establish a transport authenticated using Kerberos. It stores 
> the HMS delegation token string in the sessionConf and sessionToken. Now, 
> let's say Oozie issues a {{GetDelegationToken}} which has {{Joe}} as the owner 
> and {{oozie}} as the renewer in {{GetDelegationTokenReq}}. This API call 
> cannot instantiate an HMSClient and open a transport to HMS using the HMSToken 
> string available in the sessionConf, since DBTokenStore uses the server HiveConf 
> instead of the sessionConf. It tries to establish the transport using Kerberos and 
> fails, since user Joe is not Kerberos enabled.
> I see the following exception trace in the HS2 logs.
> {noformat}
> 2017-08-21T18:07:19,644 ERROR [HiveServer2-Handler-Pool: Thread-61] 
> transport.TSaslTransport: SASL negotiation failure
> javax.security.sasl.SaslException: GSS initiate failed
> at 
> com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:211)
>  ~[?:1.8.0_121]
> at 
> org.apache.thrift.transport.TSaslClientTransport.handleSaslStartMessage(TSaslClientTransport.java:94)
>  ~[libthrift-0.9.3.jar:0.9.3]
> at 
> org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:271) 
> [libthrift-0.9.3.jar:0.9.3]
> at 
> org.apache.thrift.transport.TSaslClientTransport.open(TSaslClientTransport.java:37)
>  [libthrift-0.9.3.jar:0.9.3]
> at 
> org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:52)
>  [hive-shims-common-2.3.0-SNAPSHOT.jar:2.3.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:49)
>  [hive-shims-common-2.3.0-SNAPSHOT.jar:2.3.0-SNAPSHOT]
> at java.security.AccessController.doPrivileged(Native Method) 
> ~[?:1.8.0_121]
> at javax.security.auth.Subject.doAs(Subject.java:422) [?:1.8.0_121]
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>  [hadoop-common-2.7.2.jar:?]
> at 
> org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport.open(TUGIAssumingTransport.java:49)
>  [hive-shims-common-2.3.0-SNAPSHOT.jar:2.3.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java:488)
>  [hive-metastore-2.3.0-SNAPSHOT.jar:2.3.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:255)
>  [hive-metastore-2.3.0-SNAPSHOT.jar:2.3.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.<init>(SessionHiveMetaStoreClient.java:70)
>  [hive-exec-2.3.0-SNAPSHOT.jar:2.3.0-SNAPSHOT]
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native 
> Method) ~[?:1.8.0_121]
> at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>  [?:1.8.0_121]
> at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>  [?:1.8.0_121]
> at 

[jira] [Commented] (HIVE-17417) LazySimple Timestamp and Date serialization is very expensive

2017-09-01 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16151372#comment-16151372
 ] 

Hive QA commented on HIVE-17417:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12885025/HIVE-17417.2.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 194 failed/errored test(s), 11031 tests 
executed
*Failed tests:*
{noformat}
TestTxnCommandsBase - did not produce a TEST-*.xml file (likely timed out) 
(batchId=280)
org.apache.hadoop.hive.accumulo.mr.TestHiveAccumuloTypes.testUtf8Types 
(batchId=176)
org.apache.hadoop.hive.accumulo.predicate.TestAccumuloRangeGenerator.testDateRangeConjunction
 (batchId=176)
org.apache.hadoop.hive.cli.TestAccumuloCliDriver.testCliDriver[accumulo_index] 
(batchId=230)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[alter_table_update_status]
 (batchId=76)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[alter_table_update_status_disable_bitvector]
 (batchId=76)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[avro_date] (batchId=9)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[char_cast] (batchId=85)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[compute_stats_date] 
(batchId=43)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[constprog_type] 
(batchId=1)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[ctas_date] (batchId=1)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[date_1] (batchId=77)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[date_2] (batchId=78)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[date_3] (batchId=49)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[date_4] (batchId=46)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[date_join1] (batchId=1)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[date_serde] (batchId=43)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[date_udf] (batchId=30)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[floor_time] (batchId=73)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[fm-sketch] (batchId=49)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[hll] (batchId=84)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[insert_values_non_partitioned]
 (batchId=19)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[insert_values_orig_table_use_metadata]
 (batchId=61)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[insert_values_partitioned]
 (batchId=73)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[interval_alt] (batchId=4)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[interval_arithmetic] 
(batchId=45)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[json_serde1] (batchId=33)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[json_serde_tsformat] 
(batchId=5)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[llap_text] (batchId=72)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[llap_uncompressed] 
(batchId=56)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[localtimezone] 
(batchId=51)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[orc_int_type_promotion] 
(batchId=40)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[orc_merge11] (batchId=38)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[orc_merge5] (batchId=54)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[orc_merge6] (batchId=33)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[orc_merge_incompat1] 
(batchId=65)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[orc_merge_incompat2] 
(batchId=80)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[orc_min_max] (batchId=7)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[orc_ppd_exception] 
(batchId=32)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[parquet_mixed_partition_formats]
 (batchId=17)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[parquet_ppd_char] 
(batchId=9)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[parquet_ppd_date] 
(batchId=16)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[parquet_ppd_decimal] 
(batchId=9)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[parquet_ppd_timestamp] 
(batchId=54)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[parquet_ppd_varchar] 
(batchId=11)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[parquet_types_non_dictionary_encoding_vectorization]
 (batchId=81)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[parquet_types_vectorization]
 (batchId=13)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[partition_date2] 
(batchId=17)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[partition_date] 
(batchId=14)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[partition_timestamp2] 
(batchId=31)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[partition_timestamp] 

[jira] [Commented] (HIVE-17426) Execution framework in hive to run tasks in parallel other than MR Tasks

2017-09-01 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16151354#comment-16151354
 ] 

Hive QA commented on HIVE-17426:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12885016/HIVE-17426.0.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 16 failed/errored test(s), 11031 tests 
executed
*Failed tests:*
{noformat}
TestTxnCommandsBase - did not produce a TEST-*.xml file (likely timed out) 
(batchId=280)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[insert_values_orig_table_use_metadata]
 (batchId=61)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[parallel_colstats] 
(batchId=32)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[parallel_colstats]
 (batchId=143)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning]
 (batchId=169)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=234)
org.apache.hadoop.hive.ql.optimizer.TestGenMapRedUtilsCreateConditionalTask.testConditionalMoveOnHdfsIsNotOptimized
 (batchId=270)
org.apache.hadoop.hive.ql.optimizer.TestGenMapRedUtilsCreateConditionalTask.testConditionalMoveTaskIsNotOptimized
 (batchId=270)
org.apache.hadoop.hive.ql.optimizer.TestGenMapRedUtilsCreateConditionalTask.testConditionalMoveTaskIsOptimized
 (batchId=270)
org.apache.hive.minikdc.TestJdbcWithDBTokenStore.testConnection (batchId=240)
org.apache.hive.minikdc.TestJdbcWithDBTokenStore.testIsValid (batchId=240)
org.apache.hive.minikdc.TestJdbcWithDBTokenStore.testIsValidNeg (batchId=240)
org.apache.hive.minikdc.TestJdbcWithDBTokenStore.testNegativeProxyAuth 
(batchId=240)
org.apache.hive.minikdc.TestJdbcWithDBTokenStore.testNegativeTokenAuth 
(batchId=240)
org.apache.hive.minikdc.TestJdbcWithDBTokenStore.testProxyAuth (batchId=240)
org.apache.hive.minikdc.TestJdbcWithDBTokenStore.testTokenAuth (batchId=240)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6649/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6649/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6649/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 16 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12885016 - PreCommit-HIVE-Build

> Execution framework in hive to run tasks in parallel other than MR Tasks
> 
>
> Key: HIVE-17426
> URL: https://issues.apache.org/jira/browse/HIVE-17426
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Affects Versions: 3.0.0
>Reporter: anishek
>Assignee: anishek
> Fix For: 3.0.0
>
> Attachments: HIVE-17426.0.patch
>
>
> The execution framework currently only runs MR Tasks in parallel when {{set 
> hive.exec.parallel=true}}.
> Allow other types of tasks to run in parallel as well, to support replication 
> scenarios in Hive. TezTask / SparkTask will still not be allowed to run in 
> parallel.
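
For illustration only, running several independent tasks in parallel and collecting their results can be sketched in plain Java as below. This is not Hive's task scheduling code: the class ParallelTaskSketch and the method runAll are hypothetical names, and the sketch assumes the tasks have no dependencies on each other.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical illustration only; not Hive's actual task scheduling code.
class ParallelTaskSketch {

    /** Runs every task on a fixed-size pool and returns the results in submission order. */
    static <T> List<T> runAll(List<Callable<T>> tasks, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<T> results = new ArrayList<>();
            // invokeAll blocks until every task has completed (or failed).
            for (Future<T> future : pool.invokeAll(tasks)) {
                results.add(future.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            throw new IllegalStateException("interrupted while waiting for tasks", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a task failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

A real scheduler would also have to respect dependencies between tasks, which this sketch leaves out.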





[jira] [Work started] (HIVE-17432) Enable join and aggregate materialized view rewriting

2017-09-01 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-17432 started by Jesus Camacho Rodriguez.
--
> Enable join and aggregate materialized view rewriting
> -
>
> Key: HIVE-17432
> URL: https://issues.apache.org/jira/browse/HIVE-17432
> Project: Hive
>  Issue Type: New Feature
>  Components: Hive
>Affects Versions: 3.0.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>
> Enable Calcite materialized view based rewriting for queries containing joins 
> and aggregates.





[jira] [Assigned] (HIVE-17432) Enable join and aggregate materialized view rewriting

2017-09-01 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez reassigned HIVE-17432:
--


> Enable join and aggregate materialized view rewriting
> -
>
> Key: HIVE-17432
> URL: https://issues.apache.org/jira/browse/HIVE-17432
> Project: Hive
>  Issue Type: New Feature
>  Components: Hive
>Affects Versions: 3.0.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>
> Enable Calcite materialized view based rewriting for queries containing joins 
> and aggregates.





[jira] [Comment Edited] (HIVE-17417) LazySimple Timestamp and Date serialization is very expensive

2017-09-01 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16151338#comment-16151338
 ] 

Sergey Shelukhin edited comment on HIVE-17417 at 9/2/17 12:30 AM:
--

Looks like all the date tests failed... lots of one-day diffs, and some that 
are larger.


was (Author: sershe):
Looks like all the date tests failed

> LazySimple Timestamp and Date serialization is very expensive
> -
>
> Key: HIVE-17417
> URL: https://issues.apache.org/jira/browse/HIVE-17417
> Project: Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers
>Affects Versions: 3.0.0, 2.4.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>Priority: Critical
> Attachments: date-serialize.png, HIVE-17417.1.patch, 
> HIVE-17417.2.patch, timestamp-serialize.png, ts-jmh-perf.png
>
>
> In a specific case where a schema contains an array with timestamp and 
> date fields (array size >1), any access to this column is very 
> expensive in terms of CPU, as most of the time is spent serializing timestamps 
> and dates. Refer to the attached profiles: >70% of the time is spent in 
> serialization + toString conversions. 





[jira] [Commented] (HIVE-17417) LazySimple Timestamp and Date serialization is very expensive

2017-09-01 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16151338#comment-16151338
 ] 

Sergey Shelukhin commented on HIVE-17417:
-

Looks like all the date tests failed

> LazySimple Timestamp and Date serialization is very expensive
> -
>
> Key: HIVE-17417
> URL: https://issues.apache.org/jira/browse/HIVE-17417
> Project: Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers
>Affects Versions: 3.0.0, 2.4.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>Priority: Critical
> Attachments: date-serialize.png, HIVE-17417.1.patch, 
> HIVE-17417.2.patch, timestamp-serialize.png, ts-jmh-perf.png
>
>
> In a specific case where a schema contains an array with timestamp and 
> date fields (array size >1), any access to this column is very 
> expensive in terms of CPU, as most of the time is spent serializing timestamps 
> and dates. Refer to the attached profiles: >70% of the time is spent in 
> serialization + toString conversions. 





[jira] [Commented] (HIVE-17417) LazySimple Timestamp and Date serialization is very expensive

2017-09-01 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16151314#comment-16151314
 ] 

Hive QA commented on HIVE-17417:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12885025/HIVE-17417.2.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 195 failed/errored test(s), 11031 tests 
executed
*Failed tests:*
{noformat}
TestTxnCommandsBase - did not produce a TEST-*.xml file (likely timed out) 
(batchId=280)
org.apache.hadoop.hive.accumulo.mr.TestHiveAccumuloTypes.testUtf8Types 
(batchId=176)
org.apache.hadoop.hive.accumulo.predicate.TestAccumuloRangeGenerator.testDateRangeConjunction
 (batchId=176)
org.apache.hadoop.hive.cli.TestAccumuloCliDriver.testCliDriver[accumulo_index] 
(batchId=230)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[alter_table_update_status]
 (batchId=76)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[alter_table_update_status_disable_bitvector]
 (batchId=76)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_sortmerge_join_2] 
(batchId=46)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[avro_date] (batchId=9)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[char_cast] (batchId=85)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[compute_stats_date] 
(batchId=43)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[constprog_type] 
(batchId=1)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[ctas_date] (batchId=1)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[date_1] (batchId=77)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[date_2] (batchId=78)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[date_3] (batchId=49)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[date_4] (batchId=46)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[date_join1] (batchId=1)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[date_serde] (batchId=43)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[date_udf] (batchId=30)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[floor_time] (batchId=73)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[fm-sketch] (batchId=49)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[hll] (batchId=84)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[insert_values_non_partitioned]
 (batchId=19)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[insert_values_orig_table_use_metadata]
 (batchId=61)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[insert_values_partitioned]
 (batchId=73)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[interval_alt] (batchId=4)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[interval_arithmetic] 
(batchId=45)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[json_serde1] (batchId=33)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[json_serde_tsformat] 
(batchId=5)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[llap_text] (batchId=72)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[llap_uncompressed] 
(batchId=56)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[localtimezone] 
(batchId=51)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[orc_int_type_promotion] 
(batchId=40)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[orc_merge11] (batchId=38)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[orc_merge5] (batchId=54)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[orc_merge6] (batchId=33)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[orc_merge_incompat1] 
(batchId=65)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[orc_merge_incompat2] 
(batchId=80)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[orc_min_max] (batchId=7)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[orc_ppd_exception] 
(batchId=32)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[parquet_mixed_partition_formats]
 (batchId=17)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[parquet_ppd_char] 
(batchId=9)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[parquet_ppd_date] 
(batchId=16)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[parquet_ppd_decimal] 
(batchId=9)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[parquet_ppd_timestamp] 
(batchId=54)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[parquet_ppd_varchar] 
(batchId=11)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[parquet_types_non_dictionary_encoding_vectorization]
 (batchId=81)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[parquet_types_vectorization]
 (batchId=13)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[partition_date2] 
(batchId=17)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[partition_date] 
(batchId=14)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[partition_timestamp2] 

[jira] [Commented] (HIVE-17429) Hive JDBC doesn't return rows when querying Impala

2017-09-01 Thread Aihua Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16151312#comment-16151312
 ] 

Aihua Xu commented on HIVE-17429:
-

[~zamsden] Thanks for the patch. Can you rename the patch to 
HIVE-17429.1.patch to trigger the pre-commit build? 

> Hive JDBC doesn't return rows when querying Impala
> --
>
> Key: HIVE-17429
> URL: https://issues.apache.org/jira/browse/HIVE-17429
> Project: Hive
>  Issue Type: Bug
>  Components: JDBC
>Reporter: Zach Amsden
>Assignee: Zach Amsden
> Attachments: hive-resultset.patch
>
>
> The Hive JDBC driver used to return a result set when querying Impala.  Now, 
> instead, it gets data back but interprets the data as query logs instead of a 
> resultSet.  This causes many issues (we see complaints about beeline as well 
> as test failures).
> This appears to be a regression introduced with asynchronous operation 
> against Hive.
> Ideally, we could make both behaviors work.  I have a simple patch that 
> should fix the problem.





[jira] [Updated] (HIVE-17431) change configuration handling in TezSessionState

2017-09-01 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-17431:

Description: 
The configuration is only set when opening the session; that seems unnecessary 
- it could be set in the ctor and made final. E.g. when updating the session 
and localizing new resources we may theoretically open the session with a new 
config, but we don't update the config and only update the files if the session 
is already open, which seems to imply that it's ok to not update the config. 
In most cases, the session is opened only once or reopened without intending to 
change the config (e.g. if it times out).

  was:
The configuration is only set when opening the session; that seems unnecessary. 
E.g. when updating the session and localizing new resources we may 
theoretically open the session with a new config, but we don't update the 
config and only update the files if the session is already open, which seems to 
imply that it's ok to not update the config. 
In most cases, the session is opened only once or reopened without intending to 
change the config (e.g. if it times out).


> change configuration handling in TezSessionState
> 
>
> Key: HIVE-17431
> URL: https://issues.apache.org/jira/browse/HIVE-17431
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-17431.patch
>
>
> The configuration is only set when opening the session; that seems 
> unnecessary - it could be set in the ctor and made final. E.g. when updating 
> the session and localizing new resources we may theoretically open the 
> session with a new config, but we don't update the config and only update the 
> files if the session is already open, which seems to imply that it's ok to 
> not update the config. 
> In most cases, the session is opened only once or reopened without intending 
> to change the config (e.g. if it times out).
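
As a rough illustration of "set in the ctor and made final", a session-like class could take its configuration once at construction and expose no way to replace it on open. The class below is a hypothetical stand-in (SessionSketch, with a plain Map in place of a Hive configuration object); it is not the real TezSessionState and does not reflect the attached patch.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a session class; not the real TezSessionState.
class SessionSketch {
    // Fixed at construction and never reassigned afterwards.
    private final Map<String, String> conf;
    private boolean open;

    SessionSketch(Map<String, String> conf) {
        // Defensive copy, so later changes to the caller's map do not leak in.
        this.conf = Collections.unmodifiableMap(new HashMap<>(conf));
    }

    // open() takes no configuration argument, so reopening cannot swap the config.
    void open() { open = true; }

    void close() { open = false; }

    boolean isOpen() { return open; }

    String get(String key) { return conf.get(key); }
}
```

With this shape, a reopen after a timeout necessarily reuses the original configuration, which matches the behavior the description says is already relied upon.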





[jira] [Updated] (HIVE-17431) change configuration handling in TezSessionState

2017-09-01 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-17431:

Status: Patch Available  (was: Open)

> change configuration handling in TezSessionState
> 
>
> Key: HIVE-17431
> URL: https://issues.apache.org/jira/browse/HIVE-17431
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-17431.patch
>
>
> The configuration is only set when opening the session; that seems 
> unnecessary. E.g. when updating the session and localizing new resources we 
> may theoretically open the session with a new config, but we don't update the 
> config and only update the files if the session is already open, which seems 
> to imply that it's ok to not update the config. 
> In most cases, the session is opened only once or reopened without intending 
> to change the config (e.g. if it times out).





[jira] [Commented] (HIVE-16811) Estimate statistics in absence of stats

2017-09-01 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16151293#comment-16151293
 ] 

Ashutosh Chauhan commented on HIVE-16811:
-

+1 
Let's address the rest of the issue in a follow-up.

> Estimate statistics in absence of stats
> ---
>
> Key: HIVE-16811
> URL: https://issues.apache.org/jira/browse/HIVE-16811
> Project: Hive
>  Issue Type: Improvement
>Reporter: Vineet Garg
>Assignee: Vineet Garg
> Attachments: HIVE-16811.10.patch, HIVE-16811.11.patch, 
> HIVE-16811.12.patch, HIVE-16811.13.patch, HIVE-16811.14.patch, 
> HIVE-16811.15.patch, HIVE-16811.16.patch, HIVE-16811.17.patch, 
> HIVE-16811.18.patch, HIVE-16811.1.patch, HIVE-16811.2.patch, 
> HIVE-16811.3.patch, HIVE-16811.4.patch, HIVE-16811.5.patch, 
> HIVE-16811.6.patch, HIVE-16811.7.patch, HIVE-16811.8.patch, HIVE-16811.9.patch
>
>
> Currently, join ordering bails out completely in the absence of statistics, 
> which can lead to bad joins such as cross joins.
> E.g., the following select query will produce a cross join.
> {code:sql}
> CREATE TABLE supplier (
>     S_SUPPKEY   INT,
>     S_NAME      STRING,
>     S_ADDRESS   STRING,
>     S_NATIONKEY INT,
>     S_PHONE     STRING,
>     S_ACCTBAL   DOUBLE,
>     S_COMMENT   STRING
> );
> CREATE TABLE lineitem (
>     L_ORDERKEY      INT,
>     L_PARTKEY       INT,
>     L_SUPPKEY       INT,
>     L_LINENUMBER    INT,
>     L_QUANTITY      DOUBLE,
>     L_EXTENDEDPRICE DOUBLE,
>     L_DISCOUNT      DOUBLE,
>     L_TAX           DOUBLE,
>     L_RETURNFLAG    STRING,
>     L_LINESTATUS    STRING,
>     l_shipdate      STRING,
>     L_COMMITDATE    STRING,
>     L_RECEIPTDATE   STRING,
>     L_SHIPINSTRUCT  STRING,
>     L_SHIPMODE      STRING,
>     L_COMMENT       STRING
> )
> PARTITIONED BY (dl INT)
> ROW FORMAT DELIMITED
> FIELDS TERMINATED BY '|';
> CREATE TABLE part (
>     p_partkey     INT,
>     p_name        STRING,
>     p_mfgr        STRING,
>     p_brand       STRING,
>     p_type        STRING,
>     p_size        INT,
>     p_container   STRING,
>     p_retailprice DOUBLE,
>     p_comment     STRING
> );
> EXPLAIN SELECT count(1) FROM part, supplier, lineitem
> WHERE p_partkey = l_partkey AND s_suppkey = l_suppkey;
> {code}
> Estimating stats will prevent the join ordering algorithm from bailing out, so 
> it can come up with a join that is at least better than a cross join.
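
One plausible way to estimate a row count when no statistics exist is to divide the raw data size by a per-row size guessed from the schema. The sketch below illustrates only that arithmetic. It is an assumption about the general approach, not a description of the attached patches, and the class name, method names, and per-type byte sizes are all hypothetical.

```java
import java.util.Locale;

// Hypothetical illustration; the per-type sizes are guesses, not Hive's values.
class RowCountEstimateSketch {

    /** Rough per-row size in bytes, derived from column type names. */
    static long estimateRowSize(String... columnTypes) {
        long bytes = 0;
        for (String type : columnTypes) {
            switch (type.toUpperCase(Locale.ROOT)) {
                case "INT":
                    bytes += 4;
                    break;
                case "DOUBLE":
                    bytes += 8;
                    break;
                case "STRING":
                    bytes += 20; // assumed average string length
                    break;
                default:
                    bytes += 8; // fallback for types not listed here
                    break;
            }
        }
        return bytes;
    }

    /** Estimated row count: total data size divided by the per-row size, at least 1. */
    static long estimateRows(long totalDataBytes, long rowSizeBytes) {
        return Math.max(1, totalDataBytes / Math.max(1, rowSizeBytes));
    }
}
```

For the part table above (two INT, six STRING, and one DOUBLE column), these assumed sizes give 136 bytes per row, so 136,000 bytes of data would be estimated at 1,000 rows. Even a coarse figure like this gives a cost-based join ordering something to compare.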





[jira] [Commented] (HIVE-17431) change configuration handling in TezSessionState

2017-09-01 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16151291#comment-16151291
 ] 

Sergey Shelukhin commented on HIVE-17431:
-

[~sseth] can you please take a look? https://reviews.apache.org/r/62048/ 
Just to make sure there isn't some hidden behavior that I'm missing. The tests 
appear to pass.

> change configuration handling in TezSessionState
> 
>
> Key: HIVE-17431
> URL: https://issues.apache.org/jira/browse/HIVE-17431
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-17431.patch
>
>
> The configuration is only set when opening the session; that seems 
> unnecessary. E.g. when updating the session and localizing new resources we 
> may theoretically open the session with a new config, but we don't update the 
> config and only update the files if the session is already open, which seems 
> to imply that it's ok to not update the config. 
> In most cases, the session is opened only once or reopened without intending 
> to change the config (e.g. if it times out).





[jira] [Updated] (HIVE-17431) change configuration handling in TezSessionState

2017-09-01 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-17431:

Attachment: HIVE-17431.patch

> change configuration handling in TezSessionState
> 
>
> Key: HIVE-17431
> URL: https://issues.apache.org/jira/browse/HIVE-17431
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-17431.patch
>
>
> The configuration is only set when opening the session; that seems 
> unnecessary. E.g. when updating the session and localizing new resources we 
> may theoretically open the session with a new config, but we don't update the 
> config and only update the files if the session is already open, which seems 
> to imply that it's ok to not update the config. 
> In most cases, the session is opened only once or reopened without intending 
> to change the config (e.g. if it times out).





[jira] [Assigned] (HIVE-17431) change configuration handling in TezSessionState

2017-09-01 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin reassigned HIVE-17431:
---

Assignee: Sergey Shelukhin

> change configuration handling in TezSessionState
> 
>
> Key: HIVE-17431
> URL: https://issues.apache.org/jira/browse/HIVE-17431
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>
> The configuration is only set when opening the session; that seems 
> unnecessary. E.g. when updating the session and localizing new resources we 
> may theoretically open the session with a new config, but we don't update the 
> config and only update the files if the session is already open, which seems 
> to imply that it's ok to not update the config. 
> In most cases, the session is opened only once or reopened without intending 
> to change the config (e.g. if it times out).





[jira] [Work started] (HIVE-16994) Support connection pooling for HiveMetaStoreClient

2017-09-01 Thread Alexander Kolbasov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-16994 started by Alexander Kolbasov.
-
> Support connection pooling for HiveMetaStoreClient
> --
>
> Key: HIVE-16994
> URL: https://issues.apache.org/jira/browse/HIVE-16994
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Reporter: Alexander Kolbasov
>Assignee: Alexander Kolbasov
>
> The native {{HiveMetaStoreClient}} doesn't support connection pooling. I 
> think it would be a very useful feature, especially in Kerberos environments 
> where connection establishment may be especially expensive. 
> A similar feature is now supported in Sentry - see SENTRY-1580.
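
To make the idea concrete, a minimal fixed-size pool can be sketched as follows. It is a generic illustration under stated assumptions: SimplePool, borrow, and giveBack are hypothetical names, the element type stands in for a metastore client, and nothing here reflects the design of HiveMetaStoreClient or of SENTRY-1580.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Supplier;

// Hypothetical generic pool; the element type stands in for a metastore client.
class SimplePool<C> {
    private final BlockingQueue<C> idle;

    SimplePool(int size, Supplier<C> factory) {
        idle = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) {
            // Pay the (possibly expensive) connection setup cost once, up front.
            idle.add(factory.get());
        }
    }

    /** Takes an idle connection, or returns null if none is available right now. */
    C borrow() {
        return idle.poll();
    }

    /** Hands a connection back so that a later caller can reuse it. */
    void giveBack(C conn) {
        idle.offer(conn);
    }
}
```

A production pool would also need health checks, reconnection on failure, and a policy for callers that find the pool empty. The sketch only shows why reuse avoids repeating the connection handshake.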





[jira] [Assigned] (HIVE-16994) Support connection pooling for HiveMetaStoreClient

2017-09-01 Thread Alexander Kolbasov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Kolbasov reassigned HIVE-16994:
-

Assignee: Alexander Kolbasov

> Support connection pooling for HiveMetaStoreClient
> --
>
> Key: HIVE-16994
> URL: https://issues.apache.org/jira/browse/HIVE-16994
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Reporter: Alexander Kolbasov
>Assignee: Alexander Kolbasov
>
> The native {{HiveMetaStoreClient}} doesn't support connection pooling. I 
> think it would be a very useful feature, especially in Kerberos environments 
> where connection establishment may be especially expensive. 
> A similar feature is now supported in Sentry - see SENTRY-1580.





[jira] [Updated] (HIVE-16811) Estimate statistics in absence of stats

2017-09-01 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-16811:
---
Status: Patch Available  (was: Open)

> Estimate statistics in absence of stats
> ---
>
> Key: HIVE-16811
> URL: https://issues.apache.org/jira/browse/HIVE-16811
> Project: Hive
>  Issue Type: Improvement
>Reporter: Vineet Garg
>Assignee: Vineet Garg
> Attachments: HIVE-16811.10.patch, HIVE-16811.11.patch, 
> HIVE-16811.12.patch, HIVE-16811.13.patch, HIVE-16811.14.patch, 
> HIVE-16811.15.patch, HIVE-16811.16.patch, HIVE-16811.17.patch, 
> HIVE-16811.18.patch, HIVE-16811.1.patch, HIVE-16811.2.patch, 
> HIVE-16811.3.patch, HIVE-16811.4.patch, HIVE-16811.5.patch, 
> HIVE-16811.6.patch, HIVE-16811.7.patch, HIVE-16811.8.patch, HIVE-16811.9.patch
>
>
> Currently, join ordering bails out completely in the absence of statistics, 
> which can lead to bad joins such as cross joins.
> E.g., the following select query will produce a cross join.
> {code:sql}
> CREATE TABLE supplier (
>     S_SUPPKEY   INT,
>     S_NAME      STRING,
>     S_ADDRESS   STRING,
>     S_NATIONKEY INT,
>     S_PHONE     STRING,
>     S_ACCTBAL   DOUBLE,
>     S_COMMENT   STRING
> );
> CREATE TABLE lineitem (
>     L_ORDERKEY      INT,
>     L_PARTKEY       INT,
>     L_SUPPKEY       INT,
>     L_LINENUMBER    INT,
>     L_QUANTITY      DOUBLE,
>     L_EXTENDEDPRICE DOUBLE,
>     L_DISCOUNT      DOUBLE,
>     L_TAX           DOUBLE,
>     L_RETURNFLAG    STRING,
>     L_LINESTATUS    STRING,
>     l_shipdate      STRING,
>     L_COMMITDATE    STRING,
>     L_RECEIPTDATE   STRING,
>     L_SHIPINSTRUCT  STRING,
>     L_SHIPMODE      STRING,
>     L_COMMENT       STRING
> )
> PARTITIONED BY (dl INT)
> ROW FORMAT DELIMITED
> FIELDS TERMINATED BY '|';
> CREATE TABLE part (
>     p_partkey     INT,
>     p_name        STRING,
>     p_mfgr        STRING,
>     p_brand       STRING,
>     p_type        STRING,
>     p_size        INT,
>     p_container   STRING,
>     p_retailprice DOUBLE,
>     p_comment     STRING
> );
> EXPLAIN SELECT count(1) FROM part, supplier, lineitem
> WHERE p_partkey = l_partkey AND s_suppkey = l_suppkey;
> {code}
> Estimating stats will prevent the join ordering algorithm from bailing out, so 
> it can come up with a join that is at least better than a cross join.





[jira] [Updated] (HIVE-16811) Estimate statistics in absence of stats

2017-09-01 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-16811:
---
Status: Open  (was: Patch Available)

> Estimate statistics in absence of stats
> ---
>
> Key: HIVE-16811
> URL: https://issues.apache.org/jira/browse/HIVE-16811
> Project: Hive
>  Issue Type: Improvement
>Reporter: Vineet Garg
>Assignee: Vineet Garg
> Attachments: HIVE-16811.10.patch, HIVE-16811.11.patch, 
> HIVE-16811.12.patch, HIVE-16811.13.patch, HIVE-16811.14.patch, 
> HIVE-16811.15.patch, HIVE-16811.16.patch, HIVE-16811.17.patch, 
> HIVE-16811.18.patch, HIVE-16811.1.patch, HIVE-16811.2.patch, 
> HIVE-16811.3.patch, HIVE-16811.4.patch, HIVE-16811.5.patch, 
> HIVE-16811.6.patch, HIVE-16811.7.patch, HIVE-16811.8.patch, HIVE-16811.9.patch
>
>
> Currently, join ordering bails out completely in the absence of statistics, 
> which can lead to bad join plans such as cross joins.
> For example, the following select query produces a cross join.
> {code:sql}
> create table supplier (S_SUPPKEY INT, S_NAME STRING, S_ADDRESS STRING, 
> S_NATIONKEY INT, 
> S_PHONE STRING, S_ACCTBAL DOUBLE, S_COMMENT STRING);
> CREATE TABLE lineitem (L_ORDERKEY  INT,
> L_PARTKEY   INT,
> L_SUPPKEY   INT,
> L_LINENUMBER INT,
> L_QUANTITY  DOUBLE,
> L_EXTENDEDPRICE DOUBLE,
> L_DISCOUNT  DOUBLE,
> L_TAX   DOUBLE,
> L_RETURNFLAG STRING,
> L_LINESTATUS STRING,
> l_shipdate  STRING,
> L_COMMITDATE STRING,
> L_RECEIPTDATE   STRING,
> L_SHIPINSTRUCT  STRING,
> L_SHIPMODE  STRING,
> L_COMMENT   STRING) partitioned by (dl int)
> ROW FORMAT DELIMITED
> FIELDS TERMINATED BY '|';
> CREATE TABLE part(
> p_partkey INT,
> p_name STRING,
> p_mfgr STRING,
> p_brand STRING,
> p_type STRING,
> p_size INT,
> p_container STRING,
> p_retailprice DOUBLE,
> p_comment STRING
> );
> explain select count(1) from part,supplier,lineitem where p_partkey = 
> l_partkey and s_suppkey = l_suppkey;
> {code}
> Estimating stats keeps the join ordering algorithm from bailing out, so it 
> can come up with a join order that is at least better than a cross join.
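
One way to picture "estimating stats" when none have been collected is to guess a row count from the table's on-disk size and an assumed width per column type. The sketch below is only an illustration under those assumptions: the byte widths, the function name, and the 10 MB figure are hypothetical and are not taken from the HIVE-16811 patch.

```python
# Hypothetical per-type byte widths; assumptions for illustration only,
# not constants from Hive or from the HIVE-16811 patch.
ASSUMED_TYPE_WIDTH = {"INT": 4, "DOUBLE": 8, "STRING": 100}


def estimate_row_count(total_bytes, column_types):
    """Guess a row count from on-disk size when no statistics exist."""
    if total_bytes <= 0:
        return 0
    row_width = sum(ASSUMED_TYPE_WIDTH[t] for t in column_types)
    # A non-empty table is never reported as having zero rows.
    return max(1, total_bytes // row_width)


# Column types of the `part` table from the issue description.
part_columns = ["INT", "STRING", "STRING", "STRING", "STRING",
                "INT", "STRING", "DOUBLE", "STRING"]

# Assumed data size of 10 MB, purely for the example.
estimated_rows = estimate_row_count(10_000_000, part_columns)
print(estimated_rows)
```

Even a rough figure like this gives a cost-based join ordering step something to compare, which is the point the description makes about avoiding a cross join.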





[jira] [Updated] (HIVE-16811) Estimate statistics in absence of stats

2017-09-01 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-16811:
---
Attachment: HIVE-16811.18.patch

> Estimate statistics in absence of stats
> ---
>
> Key: HIVE-16811
> URL: https://issues.apache.org/jira/browse/HIVE-16811
> Project: Hive
>  Issue Type: Improvement
>Reporter: Vineet Garg
>Assignee: Vineet Garg
> Attachments: HIVE-16811.10.patch, HIVE-16811.11.patch, 
> HIVE-16811.12.patch, HIVE-16811.13.patch, HIVE-16811.14.patch, 
> HIVE-16811.15.patch, HIVE-16811.16.patch, HIVE-16811.17.patch, 
> HIVE-16811.18.patch, HIVE-16811.1.patch, HIVE-16811.2.patch, 
> HIVE-16811.3.patch, HIVE-16811.4.patch, HIVE-16811.5.patch, 
> HIVE-16811.6.patch, HIVE-16811.7.patch, HIVE-16811.8.patch, HIVE-16811.9.patch
>
>
> Currently, join ordering bails out completely in the absence of statistics, 
> which can lead to bad join plans such as cross joins.
> For example, the following select query produces a cross join.
> {code:sql}
> create table supplier (S_SUPPKEY INT, S_NAME STRING, S_ADDRESS STRING, 
> S_NATIONKEY INT, 
> S_PHONE STRING, S_ACCTBAL DOUBLE, S_COMMENT STRING);
> CREATE TABLE lineitem (L_ORDERKEY  INT,
> L_PARTKEY   INT,
> L_SUPPKEY   INT,
> L_LINENUMBER INT,
> L_QUANTITY  DOUBLE,
> L_EXTENDEDPRICE DOUBLE,
> L_DISCOUNT  DOUBLE,
> L_TAX   DOUBLE,
> L_RETURNFLAG STRING,
> L_LINESTATUS STRING,
> l_shipdate  STRING,
> L_COMMITDATE STRING,
> L_RECEIPTDATE   STRING,
> L_SHIPINSTRUCT  STRING,
> L_SHIPMODE  STRING,
> L_COMMENT   STRING) partitioned by (dl int)
> ROW FORMAT DELIMITED
> FIELDS TERMINATED BY '|';
> CREATE TABLE part(
> p_partkey INT,
> p_name STRING,
> p_mfgr STRING,
> p_brand STRING,
> p_type STRING,
> p_size INT,
> p_container STRING,
> p_retailprice DOUBLE,
> p_comment STRING
> );
> explain select count(1) from part,supplier,lineitem where p_partkey = 
> l_partkey and s_suppkey = l_suppkey;
> {code}
> Estimating stats keeps the join ordering algorithm from bailing out, so it 
> can come up with a join order that is at least better than a cross join.





[jira] [Commented] (HIVE-16811) Estimate statistics in absence of stats

2017-09-01 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16151256#comment-16151256
 ] 

Hive QA commented on HIVE-16811:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12885000/HIVE-16811.16.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 18 failed/errored test(s), 11032 tests 
executed
*Failed tests:*
{noformat}
TestTxnCommandsBase - did not produce a TEST-*.xml file (likely timed out) 
(batchId=280)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[insert_values_orig_table_use_metadata]
 (batchId=61)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[auto_smb_mapjoin_14]
 (batchId=158)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[auto_sortmerge_join_10]
 (batchId=160)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[auto_sortmerge_join_9]
 (batchId=159)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[bucket_map_join_tez2]
 (batchId=148)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[bucketsortoptimize_insert_7]
 (batchId=157)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[dynpart_sort_optimization2]
 (batchId=146)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[semijoin] 
(batchId=149)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[smb_mapjoin_14]
 (batchId=158)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[smb_mapjoin_6]
 (batchId=152)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[sysdb] 
(batchId=153)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_auto_smb_mapjoin_14]
 (batchId=152)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_leftsemi_mapjoin]
 (batchId=147)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_multi_insert]
 (batchId=163)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning]
 (batchId=169)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_5] 
(batchId=99)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[vector_non_string_partition]
 (batchId=99)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6647/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6647/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6647/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 18 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12885000 - PreCommit-HIVE-Build

> Estimate statistics in absence of stats
> ---
>
> Key: HIVE-16811
> URL: https://issues.apache.org/jira/browse/HIVE-16811
> Project: Hive
>  Issue Type: Improvement
>Reporter: Vineet Garg
>Assignee: Vineet Garg
> Attachments: HIVE-16811.10.patch, HIVE-16811.11.patch, 
> HIVE-16811.12.patch, HIVE-16811.13.patch, HIVE-16811.14.patch, 
> HIVE-16811.15.patch, HIVE-16811.16.patch, HIVE-16811.17.patch, 
> HIVE-16811.1.patch, HIVE-16811.2.patch, HIVE-16811.3.patch, 
> HIVE-16811.4.patch, HIVE-16811.5.patch, HIVE-16811.6.patch, 
> HIVE-16811.7.patch, HIVE-16811.8.patch, HIVE-16811.9.patch
>
>
> Currently, join ordering bails out completely in the absence of statistics, 
> which can lead to bad join plans such as cross joins.
> For example, the following select query produces a cross join.
> {code:sql}
> create table supplier (S_SUPPKEY INT, S_NAME STRING, S_ADDRESS STRING, 
> S_NATIONKEY INT, 
> S_PHONE STRING, S_ACCTBAL DOUBLE, S_COMMENT STRING);
> CREATE TABLE lineitem (L_ORDERKEY  INT,
> L_PARTKEY   INT,
> L_SUPPKEY   INT,
> L_LINENUMBER INT,
> L_QUANTITY  DOUBLE,
> L_EXTENDEDPRICE DOUBLE,
> L_DISCOUNT  DOUBLE,
> L_TAX   DOUBLE,
> L_RETURNFLAG STRING,
> L_LINESTATUS STRING,
> l_shipdate  STRING,
> L_COMMITDATE STRING,
> L_RECEIPTDATE   STRING,
> L_SHIPINSTRUCT  STRING,
> L_SHIPMODE  STRING,
>  

[jira] [Updated] (HIVE-16811) Estimate statistics in absence of stats

2017-09-01 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-16811:
---
Status: Open  (was: Patch Available)

> Estimate statistics in absence of stats
> ---
>
> Key: HIVE-16811
> URL: https://issues.apache.org/jira/browse/HIVE-16811
> Project: Hive
>  Issue Type: Improvement
>Reporter: Vineet Garg
>Assignee: Vineet Garg
> Attachments: HIVE-16811.10.patch, HIVE-16811.11.patch, 
> HIVE-16811.12.patch, HIVE-16811.13.patch, HIVE-16811.14.patch, 
> HIVE-16811.15.patch, HIVE-16811.16.patch, HIVE-16811.17.patch, 
> HIVE-16811.1.patch, HIVE-16811.2.patch, HIVE-16811.3.patch, 
> HIVE-16811.4.patch, HIVE-16811.5.patch, HIVE-16811.6.patch, 
> HIVE-16811.7.patch, HIVE-16811.8.patch, HIVE-16811.9.patch
>
>
> Currently, join ordering bails out completely in the absence of statistics, 
> which can lead to bad join plans such as cross joins.
> For example, the following select query produces a cross join.
> {code:sql}
> create table supplier (S_SUPPKEY INT, S_NAME STRING, S_ADDRESS STRING, 
> S_NATIONKEY INT, 
> S_PHONE STRING, S_ACCTBAL DOUBLE, S_COMMENT STRING);
> CREATE TABLE lineitem (L_ORDERKEY  INT,
> L_PARTKEY   INT,
> L_SUPPKEY   INT,
> L_LINENUMBER INT,
> L_QUANTITY  DOUBLE,
> L_EXTENDEDPRICE DOUBLE,
> L_DISCOUNT  DOUBLE,
> L_TAX   DOUBLE,
> L_RETURNFLAG STRING,
> L_LINESTATUS STRING,
> l_shipdate  STRING,
> L_COMMITDATE STRING,
> L_RECEIPTDATE   STRING,
> L_SHIPINSTRUCT  STRING,
> L_SHIPMODE  STRING,
> L_COMMENT   STRING) partitioned by (dl int)
> ROW FORMAT DELIMITED
> FIELDS TERMINATED BY '|';
> CREATE TABLE part(
> p_partkey INT,
> p_name STRING,
> p_mfgr STRING,
> p_brand STRING,
> p_type STRING,
> p_size INT,
> p_container STRING,
> p_retailprice DOUBLE,
> p_comment STRING
> );
> explain select count(1) from part,supplier,lineitem where p_partkey = 
> l_partkey and s_suppkey = l_suppkey;
> {code}
> Estimating stats keeps the join ordering algorithm from bailing out, so it 
> can come up with a join order that is at least better than a cross join.





[jira] [Updated] (HIVE-16811) Estimate statistics in absence of stats

2017-09-01 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-16811:
---
Status: Patch Available  (was: Open)

> Estimate statistics in absence of stats
> ---
>
> Key: HIVE-16811
> URL: https://issues.apache.org/jira/browse/HIVE-16811
> Project: Hive
>  Issue Type: Improvement
>Reporter: Vineet Garg
>Assignee: Vineet Garg
> Attachments: HIVE-16811.10.patch, HIVE-16811.11.patch, 
> HIVE-16811.12.patch, HIVE-16811.13.patch, HIVE-16811.14.patch, 
> HIVE-16811.15.patch, HIVE-16811.16.patch, HIVE-16811.17.patch, 
> HIVE-16811.1.patch, HIVE-16811.2.patch, HIVE-16811.3.patch, 
> HIVE-16811.4.patch, HIVE-16811.5.patch, HIVE-16811.6.patch, 
> HIVE-16811.7.patch, HIVE-16811.8.patch, HIVE-16811.9.patch
>
>
> Currently, join ordering bails out completely in the absence of statistics, 
> which can lead to bad join plans such as cross joins.
> For example, the following select query produces a cross join.
> {code:sql}
> create table supplier (S_SUPPKEY INT, S_NAME STRING, S_ADDRESS STRING, 
> S_NATIONKEY INT, 
> S_PHONE STRING, S_ACCTBAL DOUBLE, S_COMMENT STRING);
> CREATE TABLE lineitem (L_ORDERKEY  INT,
> L_PARTKEY   INT,
> L_SUPPKEY   INT,
> L_LINENUMBER INT,
> L_QUANTITY  DOUBLE,
> L_EXTENDEDPRICE DOUBLE,
> L_DISCOUNT  DOUBLE,
> L_TAX   DOUBLE,
> L_RETURNFLAG STRING,
> L_LINESTATUS STRING,
> l_shipdate  STRING,
> L_COMMITDATE STRING,
> L_RECEIPTDATE   STRING,
> L_SHIPINSTRUCT  STRING,
> L_SHIPMODE  STRING,
> L_COMMENT   STRING) partitioned by (dl int)
> ROW FORMAT DELIMITED
> FIELDS TERMINATED BY '|';
> CREATE TABLE part(
> p_partkey INT,
> p_name STRING,
> p_mfgr STRING,
> p_brand STRING,
> p_type STRING,
> p_size INT,
> p_container STRING,
> p_retailprice DOUBLE,
> p_comment STRING
> );
> explain select count(1) from part,supplier,lineitem where p_partkey = 
> l_partkey and s_suppkey = l_suppkey;
> {code}
> Estimating stats keeps the join ordering algorithm from bailing out, so it 
> can come up with a join order that is at least better than a cross join.





[jira] [Updated] (HIVE-16811) Estimate statistics in absence of stats

2017-09-01 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-16811:
---
Attachment: HIVE-16811.17.patch

> Estimate statistics in absence of stats
> ---
>
> Key: HIVE-16811
> URL: https://issues.apache.org/jira/browse/HIVE-16811
> Project: Hive
>  Issue Type: Improvement
>Reporter: Vineet Garg
>Assignee: Vineet Garg
> Attachments: HIVE-16811.10.patch, HIVE-16811.11.patch, 
> HIVE-16811.12.patch, HIVE-16811.13.patch, HIVE-16811.14.patch, 
> HIVE-16811.15.patch, HIVE-16811.16.patch, HIVE-16811.17.patch, 
> HIVE-16811.1.patch, HIVE-16811.2.patch, HIVE-16811.3.patch, 
> HIVE-16811.4.patch, HIVE-16811.5.patch, HIVE-16811.6.patch, 
> HIVE-16811.7.patch, HIVE-16811.8.patch, HIVE-16811.9.patch
>
>
> Currently, join ordering bails out completely in the absence of statistics, 
> which can lead to bad join plans such as cross joins.
> For example, the following select query produces a cross join.
> {code:sql}
> create table supplier (S_SUPPKEY INT, S_NAME STRING, S_ADDRESS STRING, 
> S_NATIONKEY INT, 
> S_PHONE STRING, S_ACCTBAL DOUBLE, S_COMMENT STRING);
> CREATE TABLE lineitem (L_ORDERKEY  INT,
> L_PARTKEY   INT,
> L_SUPPKEY   INT,
> L_LINENUMBER INT,
> L_QUANTITY  DOUBLE,
> L_EXTENDEDPRICE DOUBLE,
> L_DISCOUNT  DOUBLE,
> L_TAX   DOUBLE,
> L_RETURNFLAG STRING,
> L_LINESTATUS STRING,
> l_shipdate  STRING,
> L_COMMITDATE STRING,
> L_RECEIPTDATE   STRING,
> L_SHIPINSTRUCT  STRING,
> L_SHIPMODE  STRING,
> L_COMMENT   STRING) partitioned by (dl int)
> ROW FORMAT DELIMITED
> FIELDS TERMINATED BY '|';
> CREATE TABLE part(
> p_partkey INT,
> p_name STRING,
> p_mfgr STRING,
> p_brand STRING,
> p_type STRING,
> p_size INT,
> p_container STRING,
> p_retailprice DOUBLE,
> p_comment STRING
> );
> explain select count(1) from part,supplier,lineitem where p_partkey = 
> l_partkey and s_suppkey = l_suppkey;
> {code}
> Estimating stats keeps the join ordering algorithm from bailing out, so it 
> can come up with a join order that is at least better than a cross join.





[jira] [Updated] (HIVE-17430) Add LOAD DATA test for blobstores

2017-09-01 Thread Yuzhou Sun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuzhou Sun updated HIVE-17430:
--
Attachment: HIVE-17430.patch

Attached patch

> Add LOAD DATA test for blobstores
> -
>
> Key: HIVE-17430
> URL: https://issues.apache.org/jira/browse/HIVE-17430
> Project: Hive
>  Issue Type: Test
>  Components: Tests
>Affects Versions: 2.2.0
>Reporter: Yuzhou Sun
>Assignee: Yuzhou Sun
> Attachments: HIVE-17430.patch
>
>
> This patch introduces load_data.q regression tests into the hive-blobstore 
> qtest module.





[jira] [Assigned] (HIVE-17430) Add LOAD DATA test for blobstores

2017-09-01 Thread Yuzhou Sun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuzhou Sun reassigned HIVE-17430:
-


> Add LOAD DATA test for blobstores
> -
>
> Key: HIVE-17430
> URL: https://issues.apache.org/jira/browse/HIVE-17430
> Project: Hive
>  Issue Type: Test
>  Components: Tests
>Affects Versions: 2.2.0
>Reporter: Yuzhou Sun
>Assignee: Yuzhou Sun
>
> This patch introduces load_data.q regression tests into the hive-blobstore 
> qtest module.





[jira] [Updated] (HIVE-17429) Hive JDBC doesn't return rows when querying Impala

2017-09-01 Thread Zach Amsden (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zach Amsden updated HIVE-17429:
---
Attachment: hive-resultset.patch

> Hive JDBC doesn't return rows when querying Impala
> --
>
> Key: HIVE-17429
> URL: https://issues.apache.org/jira/browse/HIVE-17429
> Project: Hive
>  Issue Type: Bug
>  Components: JDBC
>Reporter: Zach Amsden
>Assignee: Zach Amsden
> Attachments: hive-resultset.patch
>
>
> The Hive JDBC driver used to return a result set when querying Impala.  Now, 
> instead, it gets data back but interprets the data as query logs instead of a 
> resultSet.  This causes many issues (we see complaints about beeline as well 
> as test failures).
> This appears to be a regression introduced with asynchronous operation 
> against Hive.
> Ideally, we could make both behaviors work.  I have a simple patch that 
> should fix the problem.





[jira] [Commented] (HIVE-17417) LazySimple Timestamp and Date serialization is very expensive

2017-09-01 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16151197#comment-16151197
 ] 

Sergey Shelukhin commented on HIVE-17417:
-

+1

> LazySimple Timestamp and Date serialization is very expensive
> -
>
> Key: HIVE-17417
> URL: https://issues.apache.org/jira/browse/HIVE-17417
> Project: Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers
>Affects Versions: 3.0.0, 2.4.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>Priority: Critical
> Attachments: date-serialize.png, HIVE-17417.1.patch, 
> HIVE-17417.2.patch, timestamp-serialize.png, ts-jmh-perf.png
>
>
> In a specific case where a schema contains an array with timestamp and 
> date fields (array size >1), any access to this column is very 
> expensive in terms of CPU, as most of the time is spent serializing 
> timestamps and dates. Refer to the attached profiles: >70% of the time is 
> spent in serialization and toString conversions.





[jira] [Commented] (HIVE-15665) LLAP: OrcFileMetadata objects in cache can impact heap usage

2017-09-01 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16151193#comment-16151193
 ] 

Hive QA commented on HIVE-15665:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12884992/HIVE-15665.10.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 8 failed/errored test(s), 11023 tests 
executed
*Failed tests:*
{noformat}
TestTxnCommandsBase - did not produce a TEST-*.xml file (likely timed out) 
(batchId=280)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[rcfile_format_nonpart]
 (batchId=242)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[insert_values_orig_table_use_metadata]
 (batchId=61)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[unionDistinct_1] 
(batchId=143)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning]
 (batchId=169)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=234)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] 
(batchId=234)
org.apache.hadoop.hive.ql.lockmgr.TestDbTxnManager2.testWriteSetTracking8 
(batchId=282)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6646/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6646/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6646/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 8 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12884992 - PreCommit-HIVE-Build

> LLAP: OrcFileMetadata objects in cache can impact heap usage
> 
>
> Key: HIVE-15665
> URL: https://issues.apache.org/jira/browse/HIVE-15665
> Project: Hive
>  Issue Type: Improvement
>  Components: llap
>Reporter: Rajesh Balamohan
>Assignee: Sergey Shelukhin
> Attachments: HIVE-15665.01.patch, HIVE-15665.02.patch, 
> HIVE-15665.03.patch, HIVE-15665.04.patch, HIVE-15665.05.patch, 
> HIVE-15665.06.patch, HIVE-15665.07.patch, HIVE-15665.08.patch, 
> HIVE-15665.09.patch, HIVE-15665.10.patch, HIVE-15665.patch
>
>
> OrcFileMetadata internally has filestats, stripestats etc which are allocated 
> in heap. On large data sets, this could have an impact on the heap usage and 
> the memory usage by different executors in LLAP.





[jira] [Updated] (HIVE-17417) LazySimple Timestamp and Date serialization is very expensive

2017-09-01 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-17417:
-
Attachment: HIVE-17417.2.patch

Changed DateWritable.toString(). The TestDateWritable tests, which run against 
all time zones, seem to pass. In JMH this is also 1.7x faster.

> LazySimple Timestamp and Date serialization is very expensive
> -
>
> Key: HIVE-17417
> URL: https://issues.apache.org/jira/browse/HIVE-17417
> Project: Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers
>Affects Versions: 3.0.0, 2.4.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>Priority: Critical
> Attachments: date-serialize.png, HIVE-17417.1.patch, 
> HIVE-17417.2.patch, timestamp-serialize.png, ts-jmh-perf.png
>
>
> In a specific case where a schema contains an array with timestamp and 
> date fields (array size >1), any access to this column is very 
> expensive in terms of CPU, as most of the time is spent serializing 
> timestamps and dates. Refer to the attached profiles: >70% of the time is 
> spent in serialization and toString conversions.





[jira] [Commented] (HIVE-17417) LazySimple Timestamp and Date serialization is very expensive

2017-09-01 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16151175#comment-16151175
 ] 

Prasanth Jayachandran commented on HIVE-17417:
--

[~sershe] Can you please take a look at the DateWritable changes as well?

> LazySimple Timestamp and Date serialization is very expensive
> -
>
> Key: HIVE-17417
> URL: https://issues.apache.org/jira/browse/HIVE-17417
> Project: Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers
>Affects Versions: 3.0.0, 2.4.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>Priority: Critical
> Attachments: date-serialize.png, HIVE-17417.1.patch, 
> HIVE-17417.2.patch, timestamp-serialize.png, ts-jmh-perf.png
>
>
> In a specific case where a schema contains an array with timestamp and 
> date fields (array size >1), any access to this column is very 
> expensive in terms of CPU, as most of the time is spent serializing 
> timestamps and dates. Refer to the attached profiles: >70% of the time is 
> spent in serialization and toString conversions.





[jira] [Assigned] (HIVE-17429) Hive JDBC doesn't return rows when querying Impala

2017-09-01 Thread Zach Amsden (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zach Amsden reassigned HIVE-17429:
--


> Hive JDBC doesn't return rows when querying Impala
> --
>
> Key: HIVE-17429
> URL: https://issues.apache.org/jira/browse/HIVE-17429
> Project: Hive
>  Issue Type: Bug
>  Components: JDBC
>Reporter: Zach Amsden
>Assignee: Zach Amsden
>
> The Hive JDBC driver used to return a result set when querying Impala.  Now, 
> instead, it gets data back but interprets the data as query logs instead of a 
> resultSet.  This causes many issues (we see complaints about beeline as well 
> as test failures).
> This appears to be a regression introduced with asynchronous operation 
> against Hive.
> Ideally, we could make both behaviors work.  I have a simple patch that 
> should fix the problem.





[jira] [Updated] (HIVE-17426) Execution framework in hive to run tasks in parallel other than MR Tasks

2017-09-01 Thread anishek (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

anishek updated HIVE-17426:
---
Status: Patch Available  (was: Open)

> Execution framework in hive to run tasks in parallel other than MR Tasks
> 
>
> Key: HIVE-17426
> URL: https://issues.apache.org/jira/browse/HIVE-17426
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Affects Versions: 3.0.0
>Reporter: anishek
>Assignee: anishek
> Fix For: 3.0.0
>
> Attachments: HIVE-17426.0.patch
>
>
> The execution framework currently runs only MR tasks in parallel when 
> {{set hive.exec.parallel=true}} is in effect.
> Allow other types of tasks to run in parallel as well, to support replication 
> scenarios in Hive. TezTask / SparkTask will still not be allowed to run in 
> parallel.





[jira] [Updated] (HIVE-17297) allow AM to use LLAP guaranteed tasks

2017-09-01 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-17297:

   Resolution: Fixed
Fix Version/s: 3.0.0
   Status: Resolved  (was: Patch Available)

Committed to master after testing in the cluster and fixing an issue with 
meta-inf (incorrect SecurityInfo class).
Thanks for the reviews!

> allow AM to use LLAP guaranteed tasks
> -
>
> Key: HIVE-17297
> URL: https://issues.apache.org/jira/browse/HIVE-17297
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Fix For: 3.0.0
>
> Attachments: HIVE-17297.01.nogen.patch, HIVE-17297.01.patch, 
> HIVE-17297.02.patch, HIVE-17297.03.patch
>
>






[jira] [Updated] (HIVE-17426) Execution framework in hive to run tasks in parallel other than MR Tasks

2017-09-01 Thread anishek (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

anishek updated HIVE-17426:
---
Attachment: HIVE-17426.0.patch

An initial change to see what the impact is.

> Execution framework in hive to run tasks in parallel other than MR Tasks
> 
>
> Key: HIVE-17426
> URL: https://issues.apache.org/jira/browse/HIVE-17426
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Affects Versions: 3.0.0
>Reporter: anishek
>Assignee: anishek
> Fix For: 3.0.0
>
> Attachments: HIVE-17426.0.patch
>
>
> The execution framework currently runs only MR tasks in parallel when 
> {{set hive.exec.parallel=true}} is in effect.
> Allow other types of tasks to run in parallel as well, to support replication 
> scenarios in Hive. TezTask / SparkTask will still not be allowed to run in 
> parallel.





[jira] [Commented] (HIVE-16811) Estimate statistics in absence of stats

2017-09-01 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16151097#comment-16151097
 ] 

Hive QA commented on HIVE-16811:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12885000/HIVE-16811.16.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 19 failed/errored test(s), 11024 tests 
executed
*Failed tests:*
{noformat}
TestTxnCommandsBase - did not produce a TEST-*.xml file (likely timed out) 
(batchId=280)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[insert_values_orig_table_use_metadata]
 (batchId=61)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[auto_smb_mapjoin_14]
 (batchId=158)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[auto_sortmerge_join_10]
 (batchId=160)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[auto_sortmerge_join_9]
 (batchId=159)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[bucket_map_join_tez2]
 (batchId=148)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[bucketsortoptimize_insert_7]
 (batchId=157)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[dynpart_sort_optimization2]
 (batchId=146)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[semijoin] 
(batchId=149)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[smb_mapjoin_14]
 (batchId=158)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[smb_mapjoin_6]
 (batchId=152)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[sysdb] 
(batchId=153)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_auto_smb_mapjoin_14]
 (batchId=152)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_leftsemi_mapjoin]
 (batchId=147)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_multi_insert]
 (batchId=163)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning]
 (batchId=169)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_5] 
(batchId=99)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[vector_non_string_partition]
 (batchId=99)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=234)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6645/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6645/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6645/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 19 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12885000 - PreCommit-HIVE-Build

> Estimate statistics in absence of stats
> ---
>
> Key: HIVE-16811
> URL: https://issues.apache.org/jira/browse/HIVE-16811
> Project: Hive
>  Issue Type: Improvement
>Reporter: Vineet Garg
>Assignee: Vineet Garg
> Attachments: HIVE-16811.10.patch, HIVE-16811.11.patch, 
> HIVE-16811.12.patch, HIVE-16811.13.patch, HIVE-16811.14.patch, 
> HIVE-16811.15.patch, HIVE-16811.16.patch, HIVE-16811.1.patch, 
> HIVE-16811.2.patch, HIVE-16811.3.patch, HIVE-16811.4.patch, 
> HIVE-16811.5.patch, HIVE-16811.6.patch, HIVE-16811.7.patch, 
> HIVE-16811.8.patch, HIVE-16811.9.patch
>
>
> Currently, join ordering bails out completely in the absence of statistics, 
> which can lead to bad join plans such as cross joins.
> For example, the following select query produces a cross join.
> {code:sql}
> create table supplier (S_SUPPKEY INT, S_NAME STRING, S_ADDRESS STRING, 
> S_NATIONKEY INT, 
> S_PHONE STRING, S_ACCTBAL DOUBLE, S_COMMENT STRING);
> CREATE TABLE lineitem (L_ORDERKEY  INT,
> L_PARTKEY   INT,
> L_SUPPKEY   INT,
> L_LINENUMBER INT,
> L_QUANTITY  DOUBLE,
> L_EXTENDEDPRICE DOUBLE,
> L_DISCOUNT  DOUBLE,
> L_TAX   DOUBLE,
> L_RETURNFLAG STRING,
> L_LINESTATUS STRING,
> l_shipdate  STRING,
> L_COMMITDATE STRING,
> L_RECEIPTDATE   STRING,
> L_SHIPINSTRUCT  STRING,
>  

[jira] [Commented] (HIVE-17425) Change MetastoreConf.ConfVars internal members to be private

2017-09-01 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16151040#comment-16151040
 ] 

Alan Gates commented on HIVE-17425:
---

[~sershe], MetastoreConf does support legacy names, meaning that if it's handed 
a config file with the names from Hive, or even a HiveConf object, it will work 
as expected.  This patch changes its internals to make it harder for developers 
to accidentally break that.  By using getVar and related methods, developers 
can guarantee that both new and old values are checked.  If they call 
conf.get(key) directly, MetastoreConf can't make sure it checks both old and 
new values.  So I'm changing the key names to be private (with a getter that 
has comments on why it shouldn't be used) so developers don't mess up and make 
the wrong call.
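
To make the difference concrete, here is a minimal sketch of the lookup pattern. It is not the actual MetastoreConf code: the class name, the enum constant, both key names, and the default value are hypothetical, and a plain Map stands in for the Hadoop Configuration object.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative only: not the real MetastoreConf class. */
class LegacyAwareConfSketch {

    /** Hypothetical config variable carrying a new key name and a legacy Hive key name. */
    public enum ConfVar {
        EXAMPLE("metastore.example.key", "hive.metastore.example.key", "default-value");

        private final String varname;    // new-style key, private on purpose
        private final String hiveName;   // legacy Hive key
        private final String defaultVal;

        ConfVar(String varname, String hiveName, String defaultVal) {
            this.varname = varname;
            this.hiveName = hiveName;
            this.defaultVal = defaultVal;
        }

        /** Escape hatch: a raw lookup with this name skips the legacy key. */
        public String getVarname() {
            return varname;
        }
    }

    /** Checks the new key first, then the legacy key, then falls back to the default. */
    public static String getVar(Map<String, String> conf, ConfVar confVar) {
        String value = conf.get(confVar.varname);
        if (value == null) {
            value = conf.get(confVar.hiveName);
        }
        return value != null ? value : confVar.defaultVal;
    }
}
```

With a config that only sets the legacy key, getVar(conf, ConfVar.EXAMPLE) returns the legacy value, while conf.get(ConfVar.EXAMPLE.getVarname()) returns null. That silent miss is the mistake the private members are meant to prevent.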

> Change MetastoreConf.ConfVars internal members to be private
> 
>
> Key: HIVE-17425
> URL: https://issues.apache.org/jira/browse/HIVE-17425
> Project: Hive
>  Issue Type: Task
>  Components: Metastore
>Affects Versions: 3.0.0
>Reporter: Alan Gates
>Assignee: Alan Gates
> Attachments: HIVE-17425.patch
>
>
> MetastoreConf's dual use of metastore keys and Hive keys is causing confusion 
> for developers.  We should make the relevant members private and provide 
> getter methods with comments on when it is appropriate to use them.  





[jira] [Commented] (HIVE-17139) Conditional expressions optimization: skip the expression evaluation if the condition is not satisfied for vectorization engine.

2017-09-01 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16151038#comment-16151038
 ] 

Hive QA commented on HIVE-17139:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12884897/HIVE-17139.6.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 11023 tests 
executed
*Failed tests:*
{noformat}
TestTxnCommandsBase - did not produce a TEST-*.xml file (likely timed out) 
(batchId=280)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[insert_values_orig_table_use_metadata]
 (batchId=61)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning]
 (batchId=169)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=234)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6644/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6644/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6644/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 4 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12884897 - PreCommit-HIVE-Build

> Conditional expressions optimization: skip the expression evaluation if the 
> condition is not satisfied for vectorization engine.
> 
>
> Key: HIVE-17139
> URL: https://issues.apache.org/jira/browse/HIVE-17139
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ke Jia
>Assignee: Ke Jia
> Attachments: HIVE-17139.1.patch, HIVE-17139.2.patch, 
> HIVE-17139.3.patch, HIVE-17139.4.patch, HIVE-17139.5.patch, HIVE-17139.6.patch
>
>
> Execution of CASE WHEN and IF statements under Hive vectorization is not 
> optimal: in the current implementation, all the conditional and else 
> expressions are evaluated. The optimized approach is to update the selected 
> array of the batch parameter after the conditional expression is executed, so 
> that the else expression processes only the selected rows instead of all rows.
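
The selected-array approach described in the issue can be sketched as follows. This is a minimal illustration under stated assumptions, not the code from the HIVE-17139 patch: the class, method, and parameter names are hypothetical, and plain Java arrays stand in for Hive's vectorized row batch.

```java
import java.util.function.IntToLongFunction;

/** Illustrative only: evaluates IF(cond, thenExpr, elseExpr) over a batch of rows. */
class SelectedArrayIfSketch {

    /**
     * Splits the row indices into two "selected" arrays based on the condition,
     * then evaluates each branch only on the rows that need it.
     */
    public static long[] evalIf(boolean[] cond,
                                IntToLongFunction thenExpr,
                                IntToLongFunction elseExpr) {
        int n = cond.length;
        int[] thenSelected = new int[n];
        int[] elseSelected = new int[n];
        int thenCount = 0;
        int elseCount = 0;

        // Evaluate the condition once and record which rows go to which branch.
        for (int row = 0; row < n; row++) {
            if (cond[row]) {
                thenSelected[thenCount++] = row;
            } else {
                elseSelected[elseCount++] = row;
            }
        }

        long[] out = new long[n];
        // The then-expression runs only on rows where the condition held.
        for (int j = 0; j < thenCount; j++) {
            int row = thenSelected[j];
            out[row] = thenExpr.applyAsLong(row);
        }
        // The else-expression runs only on the remaining rows.
        for (int j = 0; j < elseCount; j++) {
            int row = elseSelected[j];
            out[row] = elseExpr.applyAsLong(row);
        }
        return out;
    }
}
```

Each branch expression is invoked once per row it actually applies to, rather than once per row in the batch, which is where the saving comes from when a branch is expensive.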





[jira] [Commented] (HIVE-17417) LazySimple Timestamp and Date serialization is very expensive

2017-09-01 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16151027#comment-16151027
 ] 

Sergey Shelukhin commented on HIVE-17417:
-

+1 pending tests

> LazySimple Timestamp and Date serialization is very expensive
> -
>
> Key: HIVE-17417
> URL: https://issues.apache.org/jira/browse/HIVE-17417
> Project: Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers
>Affects Versions: 3.0.0, 2.4.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>Priority: Critical
> Attachments: date-serialize.png, HIVE-17417.1.patch, 
> timestamp-serialize.png, ts-jmh-perf.png
>
>
> In a specific case where a schema contains an array with timestamp and 
> date fields (array size >1), any access to this column is very expensive in 
> terms of CPU, as most of the time is spent serializing timestamps and dates. 
> Refer to the attached profiles: >70% of the time is spent in serialization + 
> toString conversions. 





[jira] [Updated] (HIVE-17417) LazySimple Timestamp and Date serialization is very expensive

2017-09-01 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-17417:
-
Attachment: HIVE-17417.1.patch

Same patch for ptest again

> LazySimple Timestamp and Date serialization is very expensive
> -
>
> Key: HIVE-17417
> URL: https://issues.apache.org/jira/browse/HIVE-17417
> Project: Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers
>Affects Versions: 3.0.0, 2.4.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>Priority: Critical
> Attachments: date-serialize.png, HIVE-17417.1.patch, 
> timestamp-serialize.png, ts-jmh-perf.png
>
>
> In a specific case where a schema contains an array with timestamp and 
> date fields (array size >1), any access to this column is very expensive in 
> terms of CPU, as most of the time is spent serializing timestamps and dates. 
> Refer to the attached profiles: >70% of the time is spent in serialization + 
> toString conversions. 





[jira] [Updated] (HIVE-16811) Estimate statistics in absence of stats

2017-09-01 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-16811:
---
Status: Patch Available  (was: Open)

> Estimate statistics in absence of stats
> ---
>
> Key: HIVE-16811
> URL: https://issues.apache.org/jira/browse/HIVE-16811
> Project: Hive
>  Issue Type: Improvement
>Reporter: Vineet Garg
>Assignee: Vineet Garg
> Attachments: HIVE-16811.10.patch, HIVE-16811.11.patch, 
> HIVE-16811.12.patch, HIVE-16811.13.patch, HIVE-16811.14.patch, 
> HIVE-16811.15.patch, HIVE-16811.16.patch, HIVE-16811.1.patch, 
> HIVE-16811.2.patch, HIVE-16811.3.patch, HIVE-16811.4.patch, 
> HIVE-16811.5.patch, HIVE-16811.6.patch, HIVE-16811.7.patch, 
> HIVE-16811.8.patch, HIVE-16811.9.patch
>
>
> Currently, join ordering bails out completely in the absence of statistics, 
> which can lead to bad joins such as cross joins.
> For example, the following select query will produce a cross join:
> {code:sql}
> create table supplier (S_SUPPKEY INT, S_NAME STRING, S_ADDRESS STRING, 
> S_NATIONKEY INT, 
> S_PHONE STRING, S_ACCTBAL DOUBLE, S_COMMENT STRING);
> CREATE TABLE lineitem (L_ORDERKEY      INT,
> L_PARTKEY       INT,
> L_SUPPKEY       INT,
> L_LINENUMBER    INT,
> L_QUANTITY      DOUBLE,
> L_EXTENDEDPRICE DOUBLE,
> L_DISCOUNT      DOUBLE,
> L_TAX           DOUBLE,
> L_RETURNFLAG    STRING,
> L_LINESTATUS    STRING,
> l_shipdate      STRING,
> L_COMMITDATE    STRING,
> L_RECEIPTDATE   STRING,
> L_SHIPINSTRUCT  STRING,
> L_SHIPMODE      STRING,
> L_COMMENT       STRING) partitioned by (dl int)
> ROW FORMAT DELIMITED
> FIELDS TERMINATED BY '|';
> CREATE TABLE part(
> p_partkey INT,
> p_name STRING,
> p_mfgr STRING,
> p_brand STRING,
> p_type STRING,
> p_size INT,
> p_container STRING,
> p_retailprice DOUBLE,
> p_comment STRING
> );
> explain select count(1) from part,supplier,lineitem where p_partkey = 
> l_partkey and s_suppkey = l_suppkey;
> {code}
> Estimating stats will prevent the join ordering algorithm from bailing out, so 
> it can come up with a join that is at least better than a cross join.





[jira] [Updated] (HIVE-16811) Estimate statistics in absence of stats

2017-09-01 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-16811:
---
Attachment: HIVE-16811.16.patch

> Estimate statistics in absence of stats
> ---
>
> Key: HIVE-16811
> URL: https://issues.apache.org/jira/browse/HIVE-16811
> Project: Hive
>  Issue Type: Improvement
>Reporter: Vineet Garg
>Assignee: Vineet Garg
> Attachments: HIVE-16811.10.patch, HIVE-16811.11.patch, 
> HIVE-16811.12.patch, HIVE-16811.13.patch, HIVE-16811.14.patch, 
> HIVE-16811.15.patch, HIVE-16811.16.patch, HIVE-16811.1.patch, 
> HIVE-16811.2.patch, HIVE-16811.3.patch, HIVE-16811.4.patch, 
> HIVE-16811.5.patch, HIVE-16811.6.patch, HIVE-16811.7.patch, 
> HIVE-16811.8.patch, HIVE-16811.9.patch
>
>
> Currently, join ordering bails out completely in the absence of statistics, 
> which can lead to bad joins such as cross joins.
> For example, the following select query will produce a cross join:
> {code:sql}
> create table supplier (S_SUPPKEY INT, S_NAME STRING, S_ADDRESS STRING, 
> S_NATIONKEY INT, 
> S_PHONE STRING, S_ACCTBAL DOUBLE, S_COMMENT STRING);
> CREATE TABLE lineitem (L_ORDERKEY      INT,
> L_PARTKEY       INT,
> L_SUPPKEY       INT,
> L_LINENUMBER    INT,
> L_QUANTITY      DOUBLE,
> L_EXTENDEDPRICE DOUBLE,
> L_DISCOUNT      DOUBLE,
> L_TAX           DOUBLE,
> L_RETURNFLAG    STRING,
> L_LINESTATUS    STRING,
> l_shipdate      STRING,
> L_COMMITDATE    STRING,
> L_RECEIPTDATE   STRING,
> L_SHIPINSTRUCT  STRING,
> L_SHIPMODE      STRING,
> L_COMMENT       STRING) partitioned by (dl int)
> ROW FORMAT DELIMITED
> FIELDS TERMINATED BY '|';
> CREATE TABLE part(
> p_partkey INT,
> p_name STRING,
> p_mfgr STRING,
> p_brand STRING,
> p_type STRING,
> p_size INT,
> p_container STRING,
> p_retailprice DOUBLE,
> p_comment STRING
> );
> explain select count(1) from part,supplier,lineitem where p_partkey = 
> l_partkey and s_suppkey = l_suppkey;
> {code}
> Estimating stats will prevent the join ordering algorithm from bailing out, so 
> it can come up with a join that is at least better than a cross join.





[jira] [Updated] (HIVE-16811) Estimate statistics in absence of stats

2017-09-01 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-16811:
---
Status: Open  (was: Patch Available)

> Estimate statistics in absence of stats
> ---
>
> Key: HIVE-16811
> URL: https://issues.apache.org/jira/browse/HIVE-16811
> Project: Hive
>  Issue Type: Improvement
>Reporter: Vineet Garg
>Assignee: Vineet Garg
> Attachments: HIVE-16811.10.patch, HIVE-16811.11.patch, 
> HIVE-16811.12.patch, HIVE-16811.13.patch, HIVE-16811.14.patch, 
> HIVE-16811.15.patch, HIVE-16811.16.patch, HIVE-16811.1.patch, 
> HIVE-16811.2.patch, HIVE-16811.3.patch, HIVE-16811.4.patch, 
> HIVE-16811.5.patch, HIVE-16811.6.patch, HIVE-16811.7.patch, 
> HIVE-16811.8.patch, HIVE-16811.9.patch
>
>
> Currently, join ordering bails out completely in the absence of statistics, 
> which can lead to bad joins such as cross joins.
> For example, the following select query will produce a cross join:
> {code:sql}
> create table supplier (S_SUPPKEY INT, S_NAME STRING, S_ADDRESS STRING, 
> S_NATIONKEY INT, 
> S_PHONE STRING, S_ACCTBAL DOUBLE, S_COMMENT STRING);
> CREATE TABLE lineitem (L_ORDERKEY      INT,
> L_PARTKEY       INT,
> L_SUPPKEY       INT,
> L_LINENUMBER    INT,
> L_QUANTITY      DOUBLE,
> L_EXTENDEDPRICE DOUBLE,
> L_DISCOUNT      DOUBLE,
> L_TAX           DOUBLE,
> L_RETURNFLAG    STRING,
> L_LINESTATUS    STRING,
> l_shipdate      STRING,
> L_COMMITDATE    STRING,
> L_RECEIPTDATE   STRING,
> L_SHIPINSTRUCT  STRING,
> L_SHIPMODE      STRING,
> L_COMMENT       STRING) partitioned by (dl int)
> ROW FORMAT DELIMITED
> FIELDS TERMINATED BY '|';
> CREATE TABLE part(
> p_partkey INT,
> p_name STRING,
> p_mfgr STRING,
> p_brand STRING,
> p_type STRING,
> p_size INT,
> p_container STRING,
> p_retailprice DOUBLE,
> p_comment STRING
> );
> explain select count(1) from part,supplier,lineitem where p_partkey = 
> l_partkey and s_suppkey = l_suppkey;
> {code}
> Estimating stats will prevent the join ordering algorithm from bailing out, so 
> it can come up with a join that is at least better than a cross join.





[jira] [Updated] (HIVE-17417) LazySimple Timestamp and Date serialization is very expensive

2017-09-01 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-17417:
-
Attachment: (was: HIVE-17417.1.patch)

> LazySimple Timestamp and Date serialization is very expensive
> -
>
> Key: HIVE-17417
> URL: https://issues.apache.org/jira/browse/HIVE-17417
> Project: Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers
>Affects Versions: 3.0.0, 2.4.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>Priority: Critical
> Attachments: date-serialize.png, timestamp-serialize.png, 
> ts-jmh-perf.png
>
>
> In a specific case where a schema contains an array with timestamp and 
> date fields (array size >1), any access to this column is very expensive in 
> terms of CPU, as most of the time is spent serializing timestamps and dates. 
> Refer to the attached profiles: >70% of the time is spent in serialization + 
> toString conversions. 





[jira] [Updated] (HIVE-17399) Semijoin: Do not remove semijoin branch if it feeds to TS->DPP_EVENT

2017-09-01 Thread Deepak Jaiswal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deepak Jaiswal updated HIVE-17399:
--
Resolution: Fixed
Status: Resolved  (was: Patch Available)

> Semijoin: Do not remove semijoin branch if it feeds to TS->DPP_EVENT
> 
>
> Key: HIVE-17399
> URL: https://issues.apache.org/jira/browse/HIVE-17399
> Project: Hive
>  Issue Type: Bug
>Reporter: Deepak Jaiswal
>Assignee: Deepak Jaiswal
> Attachments: HIVE-17399.1.patch, HIVE-17399.2.patch, 
> HIVE-17399.3.patch, HIVE-17399.4.patch
>
>
> If there is an incoming semijoin branch to a TS which has DPP event, then try 
> to keep it as it may serve as an excellent filter for DPP thus reducing the 
> input to join drastically.





[jira] [Commented] (HIVE-17399) Semijoin: Do not remove semijoin branch if it feeds to TS->DPP_EVENT

2017-09-01 Thread Deepak Jaiswal (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16151014#comment-16151014
 ] 

Deepak Jaiswal commented on HIVE-17399:
---

[~gopalv] committed to master. Thanks Gopal.

> Semijoin: Do not remove semijoin branch if it feeds to TS->DPP_EVENT
> 
>
> Key: HIVE-17399
> URL: https://issues.apache.org/jira/browse/HIVE-17399
> Project: Hive
>  Issue Type: Bug
>Reporter: Deepak Jaiswal
>Assignee: Deepak Jaiswal
> Attachments: HIVE-17399.1.patch, HIVE-17399.2.patch, 
> HIVE-17399.3.patch, HIVE-17399.4.patch
>
>
> If there is an incoming semijoin branch to a TS which has DPP event, then try 
> to keep it as it may serve as an excellent filter for DPP thus reducing the 
> input to join drastically.





[jira] [Updated] (HIVE-17409) refactor LLAP ZK registry to make the ZK-registry part reusable

2017-09-01 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-17409:

   Resolution: Fixed
Fix Version/s: 3.0.0
   Status: Resolved  (was: Patch Available)

Committed to master. Thanks for the review!

> refactor LLAP ZK registry to make the ZK-registry part reusable
> ---
>
> Key: HIVE-17409
> URL: https://issues.apache.org/jira/browse/HIVE-17409
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Fix For: 3.0.0
>
> Attachments: HIVE-17409.01.patch, HIVE-17409.patch
>
>






[jira] [Commented] (HIVE-17297) allow AM to use LLAP guaranteed tasks

2017-09-01 Thread Gunther Hagleitner (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16150997#comment-16150997
 ] 

Gunther Hagleitner commented on HIVE-17297:
---

+1

> allow AM to use LLAP guaranteed tasks
> -
>
> Key: HIVE-17297
> URL: https://issues.apache.org/jira/browse/HIVE-17297
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-17297.01.nogen.patch, HIVE-17297.01.patch, 
> HIVE-17297.02.patch, HIVE-17297.03.patch
>
>






[jira] [Updated] (HIVE-15665) LLAP: OrcFileMetadata objects in cache can impact heap usage

2017-09-01 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-15665:

Attachment: HIVE-15665.10.patch

Updating out files

> LLAP: OrcFileMetadata objects in cache can impact heap usage
> 
>
> Key: HIVE-15665
> URL: https://issues.apache.org/jira/browse/HIVE-15665
> Project: Hive
>  Issue Type: Improvement
>  Components: llap
>Reporter: Rajesh Balamohan
>Assignee: Sergey Shelukhin
> Attachments: HIVE-15665.01.patch, HIVE-15665.02.patch, 
> HIVE-15665.03.patch, HIVE-15665.04.patch, HIVE-15665.05.patch, 
> HIVE-15665.06.patch, HIVE-15665.07.patch, HIVE-15665.08.patch, 
> HIVE-15665.09.patch, HIVE-15665.10.patch, HIVE-15665.patch
>
>
> OrcFileMetadata internally has filestats, stripestats etc which are allocated 
> in heap. On large data sets, this could have an impact on the heap usage and 
> the memory usage by different executors in LLAP.





[jira] [Commented] (HIVE-17297) allow AM to use LLAP guaranteed tasks

2017-09-01 Thread Zhiyuan Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16150969#comment-16150969
 ] 

Zhiyuan Yang commented on HIVE-17297:
-

+1 (non-binding)

> allow AM to use LLAP guaranteed tasks
> -
>
> Key: HIVE-17297
> URL: https://issues.apache.org/jira/browse/HIVE-17297
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-17297.01.nogen.patch, HIVE-17297.01.patch, 
> HIVE-17297.02.patch, HIVE-17297.03.patch
>
>






[jira] [Commented] (HIVE-16811) Estimate statistics in absence of stats

2017-09-01 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16150949#comment-16150949
 ] 

Hive QA commented on HIVE-16811:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12884898/HIVE-16811.15.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 12 failed/errored test(s), 11010 tests 
executed
*Failed tests:*
{noformat}
TestTxnCommandsBase - did not produce a TEST-*.xml file (likely timed out) 
(batchId=280)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[insert_values_orig_table_use_metadata]
 (batchId=61)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[partition_wise_fileformat6]
 (batchId=7)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[sysdb] 
(batchId=153)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning]
 (batchId=169)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] 
(batchId=100)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=234)
org.apache.hadoop.hive.cli.TestSparkCliDriver.org.apache.hadoop.hive.cli.TestSparkCliDriver
 (batchId=102)
org.apache.hive.minikdc.TestJdbcWithDBTokenStore.testConnection (batchId=240)
org.apache.hive.minikdc.TestJdbcWithDBTokenStore.testNegativeTokenAuth 
(batchId=240)
org.apache.hive.minikdc.TestJdbcWithDBTokenStore.testProxyAuth (batchId=240)
org.apache.hive.minikdc.TestJdbcWithDBTokenStore.testTokenAuth (batchId=240)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6643/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6643/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6643/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 12 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12884898 - PreCommit-HIVE-Build

> Estimate statistics in absence of stats
> ---
>
> Key: HIVE-16811
> URL: https://issues.apache.org/jira/browse/HIVE-16811
> Project: Hive
>  Issue Type: Improvement
>Reporter: Vineet Garg
>Assignee: Vineet Garg
> Attachments: HIVE-16811.10.patch, HIVE-16811.11.patch, 
> HIVE-16811.12.patch, HIVE-16811.13.patch, HIVE-16811.14.patch, 
> HIVE-16811.15.patch, HIVE-16811.1.patch, HIVE-16811.2.patch, 
> HIVE-16811.3.patch, HIVE-16811.4.patch, HIVE-16811.5.patch, 
> HIVE-16811.6.patch, HIVE-16811.7.patch, HIVE-16811.8.patch, HIVE-16811.9.patch
>
>
> Currently, join ordering bails out completely in the absence of statistics, 
> which can lead to bad joins such as cross joins.
> For example, the following select query will produce a cross join:
> {code:sql}
> create table supplier (S_SUPPKEY INT, S_NAME STRING, S_ADDRESS STRING, 
> S_NATIONKEY INT, 
> S_PHONE STRING, S_ACCTBAL DOUBLE, S_COMMENT STRING);
> CREATE TABLE lineitem (L_ORDERKEY      INT,
> L_PARTKEY       INT,
> L_SUPPKEY       INT,
> L_LINENUMBER    INT,
> L_QUANTITY      DOUBLE,
> L_EXTENDEDPRICE DOUBLE,
> L_DISCOUNT      DOUBLE,
> L_TAX           DOUBLE,
> L_RETURNFLAG    STRING,
> L_LINESTATUS    STRING,
> l_shipdate      STRING,
> L_COMMITDATE    STRING,
> L_RECEIPTDATE   STRING,
> L_SHIPINSTRUCT  STRING,
> L_SHIPMODE      STRING,
> L_COMMENT       STRING) partitioned by (dl int)
> ROW FORMAT DELIMITED
> FIELDS TERMINATED BY '|';
> CREATE TABLE part(
> p_partkey INT,
> p_name STRING,
> p_mfgr STRING,
> p_brand STRING,
> p_type STRING,
> p_size INT,
> p_container STRING,
> p_retailprice DOUBLE,
> p_comment STRING
> );
> explain select count(1) from part,supplier,lineitem where p_partkey = 
> l_partkey and s_suppkey = l_suppkey;
> {code}
> Estimating stats will prevent the join ordering algorithm from bailing out, so 
> it can come up with a join that is at least better than a cross join.





[jira] [Updated] (HIVE-17399) Semijoin: Do not remove semijoin branch if it feeds to TS->DPP_EVENT

2017-09-01 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-17399:
---
Summary: Semijoin: Do not remove semijoin branch if it feeds to 
TS->DPP_EVENT  (was: Do not remove semijoin branch if it feeds to TS->DPP_EVENT)

> Semijoin: Do not remove semijoin branch if it feeds to TS->DPP_EVENT
> 
>
> Key: HIVE-17399
> URL: https://issues.apache.org/jira/browse/HIVE-17399
> Project: Hive
>  Issue Type: Bug
>Reporter: Deepak Jaiswal
>Assignee: Deepak Jaiswal
> Attachments: HIVE-17399.1.patch, HIVE-17399.2.patch, 
> HIVE-17399.3.patch, HIVE-17399.4.patch
>
>
> If there is an incoming semijoin branch to a TS which has DPP event, then try 
> to keep it as it may serve as an excellent filter for DPP thus reducing the 
> input to join drastically.





[jira] [Commented] (HIVE-17225) HoS DPP pruning sink ops can target parallel work objects

2017-09-01 Thread Sergio Peña (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16150863#comment-16150863
 ] 

Sergio Peña commented on HIVE-17225:


Thanks for leaving comments on the code. They helped me to understand the code 
better.
+1

I just left a comment about the comments you wrote. Could you confirm whether 
it is correct?

> HoS DPP pruning sink ops can target parallel work objects
> -
>
> Key: HIVE-17225
> URL: https://issues.apache.org/jira/browse/HIVE-17225
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Affects Versions: 3.0.0
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Attachments: HIVE17225.1.patch, HIVE-17225.2.patch, 
> HIVE-17225.3.patch, HIVE-17225.4.patch
>
>
> Setup:
> {code:sql}
> SET hive.spark.dynamic.partition.pruning=true;
> SET hive.strict.checks.cartesian.product=false;
> SET hive.auto.convert.join=true;
> CREATE TABLE partitioned_table1 (col int) PARTITIONED BY (part_col int);
> CREATE TABLE regular_table1 (col int);
> CREATE TABLE regular_table2 (col int);
> ALTER TABLE partitioned_table1 ADD PARTITION (part_col = 1);
> ALTER TABLE partitioned_table1 ADD PARTITION (part_col = 2);
> ALTER TABLE partitioned_table1 ADD PARTITION (part_col = 3);
> INSERT INTO table regular_table1 VALUES (1), (2), (3), (4), (5), (6);
> INSERT INTO table regular_table2 VALUES (1), (2), (3), (4), (5), (6);
> INSERT INTO TABLE partitioned_table1 PARTITION (part_col = 1) VALUES (1);
> INSERT INTO TABLE partitioned_table1 PARTITION (part_col = 2) VALUES (2);
> INSERT INTO TABLE partitioned_table1 PARTITION (part_col = 3) VALUES (3);
> SELECT *
> FROM   partitioned_table1,
>regular_table1 rt1,
>regular_table2 rt2
> WHERE  rt1.col = partitioned_table1.part_col
>AND rt2.col = partitioned_table1.part_col;
> {code}
> Exception:
> {code}
> 2017-08-01T13:27:47,483 ERROR [b0d354a8-4cdb-4ba9-acec-27d14926aaf4 main] 
> ql.Driver: FAILED: Execution Error, return code 3 from 
> org.apache.hadoop.hive.ql.exec.spark.SparkTask. java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: 
> java.io.FileNotFoundException: File 
> file:/Users/stakiar/Documents/idea/apache-hive/itests/qtest-spark/target/tmp/scratchdir/stakiar/b0d354a8-4cdb-4ba9-acec-27d14926aaf4/hive_2017-08-01_13-27-45_553_1088589686371686526-1/-mr-10004/3/5
>  does not exist
>   at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.init(HiveInputFormat.java:408)
>   at 
> org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:498)
>   at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:200)
>   at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:248)
>   at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:246)
>   at scala.Option.getOrElse(Option.scala:121)
>   at org.apache.spark.rdd.RDD.partitions(RDD.scala:246)
>   at 
> org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
>   at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:248)
>   at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:246)
>   at scala.Option.getOrElse(Option.scala:121)
>   at org.apache.spark.rdd.RDD.partitions(RDD.scala:246)
>   at org.apache.spark.rdd.UnionRDD$$anonfun$1.apply(UnionRDD.scala:82)
>   at org.apache.spark.rdd.UnionRDD$$anonfun$1.apply(UnionRDD.scala:82)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at scala.collection.immutable.List.foreach(List.scala:381)
>   at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
>   at scala.collection.immutable.List.map(List.scala:285)
>   at org.apache.spark.rdd.UnionRDD.getPartitions(UnionRDD.scala:82)
>   at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:248)
>   at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:246)
>   at scala.Option.getOrElse(Option.scala:121)
>   at org.apache.spark.rdd.RDD.partitions(RDD.scala:246)
>   at org.apache.spark.rdd.UnionRDD$$anonfun$1.apply(UnionRDD.scala:82)
>   at org.apache.spark.rdd.UnionRDD$$anonfun$1.apply(UnionRDD.scala:82)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at scala.collection.immutable.List.foreach(List.scala:381)
>   at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
>   at scala.collection.immutable.List.map(List.scala:285)
>   at 

[jira] [Commented] (HIVE-16828) With CBO enabled, Query on partitioned views throws IndexOutOfBoundException

2017-09-01 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16150829#comment-16150829
 ] 

Hive QA commented on HIVE-16828:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12871529/HIVE-16828.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 11009 tests 
executed
*Failed tests:*
{noformat}
TestTxnCommandsBase - did not produce a TEST-*.xml file (likely timed out) 
(batchId=280)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[insert_values_orig_table_use_metadata]
 (batchId=61)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[bucketizedhiveinputformat]
 (batchId=170)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning]
 (batchId=169)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=234)
org.apache.hadoop.hive.cli.TestSparkCliDriver.org.apache.hadoop.hive.cli.TestSparkCliDriver
 (batchId=103)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6642/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6642/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6642/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 6 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12871529 - PreCommit-HIVE-Build

> With CBO enabled, Query on partitioned views throws IndexOutOfBoundException
> 
>
> Key: HIVE-16828
> URL: https://issues.apache.org/jira/browse/HIVE-16828
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Affects Versions: 1.2.0, 2.1.1, 2.2.0
>Reporter: Adesh Kumar Rao
>Assignee: Vineet Garg
> Attachments: HIVE-16828.patch
>
>
> {code:java}
> Caused by: java.lang.AssertionError: Internal error: While invoking method 
> 'public org.apache.calcite.sql2rel.RelFieldTrimmer$TrimResult 
> org.apache.calcite.sql2rel.RelFieldTrimmer.trimFields(org.apache.calcite.rel.core.Filter,org.apache.calcite.util.ImmutableBitSet,java.util.Set)'
> at org.apache.calcite.util.Util.newInternal(Util.java:789)
> at org.apache.calcite.util.ReflectUtil$2.invoke(ReflectUtil.java:534)
> at 
> org.apache.calcite.sql2rel.RelFieldTrimmer.dispatchTrimFields(RelFieldTrimmer.java:270)
> at 
> org.apache.calcite.sql2rel.RelFieldTrimmer.trimChild(RelFieldTrimmer.java:213)
> at 
> org.apache.calcite.sql2rel.RelFieldTrimmer.trimFields(RelFieldTrimmer.java:374)
> at 
> org.apache.hadoop.hive.ql.optimizer.calcite.rules.HiveRelFieldTrimmer.trimFields(HiveRelFieldTrimmer.java:394)
> ... 98 more
> Caused by: java.lang.reflect.InvocationTargetException
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:483)
> at org.apache.calcite.util.ReflectUtil$2.invoke(ReflectUtil.java:531)
> ... 102 more
> Caused by: java.lang.AssertionError: Internal error: While invoking method 
> 'public org.apache.calcite.sql2rel.RelFieldTrimmer$TrimResult 
> org.apache.hadoop.hive.ql.optimizer.calcite.rules.HiveRelFieldTrimmer.trimFields(org.apache.calcite.rel.core.Project,org.apache.calcite.util.ImmutableBitSet,java.util.Set)'
> at org.apache.calcite.util.Util.newInternal(Util.java:789)
> at org.apache.calcite.util.ReflectUtil$2.invoke(ReflectUtil.java:534)
> at 
> org.apache.calcite.sql2rel.RelFieldTrimmer.dispatchTrimFields(RelFieldTrimmer.java:270)
> at 
> org.apache.calcite.sql2rel.RelFieldTrimmer.trimChild(RelFieldTrimmer.java:213)
> at 
> org.apache.calcite.sql2rel.RelFieldTrimmer.trimFields(RelFieldTrimmer.java:466)
> ... 107 more
> Caused by: java.lang.reflect.InvocationTargetException
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:483)
> at org.apache.calcite.util.ReflectUtil$2.invoke(ReflectUtil.java:531)
> ... 110 more
> Caused by: java.lang.IndexOutOfBoundsException: Index: 94, Size: 94
> at java.util.ArrayList.rangeCheck(ArrayList.java:653)
> at java.util.ArrayList.get(ArrayList.java:429)
> at 
> 

[jira] [Commented] (HIVE-17414) HoS DPP + Vectorization generates invalid explain plan due to CombineEquivalentWorkResolver

2017-09-01 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16150751#comment-16150751
 ] 

Hive QA commented on HIVE-17414:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12884872/HIVE-17414.2.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 12 failed/errored test(s), 11023 tests 
executed
*Failed tests:*
{noformat}
TestTxnCommandsBase - did not produce a TEST-*.xml file (likely timed out) 
(batchId=280)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[insert_values_orig_table_use_metadata]
 (batchId=61)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[unionDistinct_1] 
(batchId=143)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning]
 (batchId=169)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=234)
org.apache.hive.minikdc.TestJdbcWithDBTokenStore.testConnection (batchId=240)
org.apache.hive.minikdc.TestJdbcWithDBTokenStore.testIsValid (batchId=240)
org.apache.hive.minikdc.TestJdbcWithDBTokenStore.testIsValidNeg (batchId=240)
org.apache.hive.minikdc.TestJdbcWithDBTokenStore.testNegativeProxyAuth 
(batchId=240)
org.apache.hive.minikdc.TestJdbcWithDBTokenStore.testNegativeTokenAuth 
(batchId=240)
org.apache.hive.minikdc.TestJdbcWithDBTokenStore.testProxyAuth (batchId=240)
org.apache.hive.minikdc.TestJdbcWithDBTokenStore.testTokenAuth (batchId=240)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6641/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6641/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6641/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 12 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12884872 - PreCommit-HIVE-Build

> HoS DPP + Vectorization generates invalid explain plan due to 
> CombineEquivalentWorkResolver
> ---
>
> Key: HIVE-17414
> URL: https://issues.apache.org/jira/browse/HIVE-17414
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Sahil Takiar
>Assignee: liyunzhang_intel
> Attachments: HIVE-17414.1.patch, HIVE-17414.2.patch, HIVE-17414.patch
>
>
> Similar to HIVE-16948, the following query generates an invalid explain plan 
> when HoS DPP is enabled + vectorization:
> {code:sql}
> select ds from (select distinct(ds) as ds from srcpart union all select 
> distinct(ds) as ds from srcpart) s where s.ds in (select max(srcpart.ds) from 
> srcpart union all select min(srcpart.ds) from srcpart)
> {code}
> Explain Plan:
> {code}
> STAGE DEPENDENCIES:
>   Stage-2 is a root stage
>   Stage-1 depends on stages: Stage-2
>   Stage-0 depends on stages: Stage-1
> STAGE PLANS:
>   Stage: Stage-2
> Spark
>   Edges:
> Reducer 11 <- Map 10 (GROUP, 1)
> Reducer 13 <- Map 12 (GROUP, 1)
>  A masked pattern was here 
>   Vertices:
> Map 10
> Map Operator Tree:
> TableScan
>   alias: srcpart
>   Statistics: Num rows: 2000 Data size: 21248 Basic stats: 
> COMPLETE Column stats: NONE
>   Select Operator
> expressions: ds (type: string)
> outputColumnNames: ds
> Statistics: Num rows: 2000 Data size: 21248 Basic stats: 
> COMPLETE Column stats: NONE
> Group By Operator
>   aggregations: max(ds)
>   mode: hash
>   outputColumnNames: _col0
>   Statistics: Num rows: 1 Data size: 184 Basic stats: 
> COMPLETE Column stats: NONE
>   Reduce Output Operator
> sort order:
> Statistics: Num rows: 1 Data size: 184 Basic stats: 
> COMPLETE Column stats: NONE
> value expressions: _col0 (type: string)
> Execution mode: vectorized
> Map 12
> Map Operator Tree:
> TableScan
>   alias: srcpart
>   Statistics: Num rows: 2000 Data size: 21248 Basic stats: 
> COMPLETE Column stats: NONE
>   Select Operator
> expressions: ds (type: string)
> outputColumnNames: ds
> Statistics: Num rows: 2000 Data size: 21248 Basic stats: 
> COMPLETE 

[jira] [Commented] (HIVE-17385) Fix incremental repl error for non-native tables

2017-09-01 Thread Tao Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16150748#comment-16150748
 ] 

Tao Li commented on HIVE-17385:
---

Thanks [~daijy]

> Fix incremental repl error for non-native tables
> 
>
> Key: HIVE-17385
> URL: https://issues.apache.org/jira/browse/HIVE-17385
> Project: Hive
>  Issue Type: Bug
>  Components: repl
>Reporter: Tao Li
>Assignee: Tao Li
> Fix For: 3.0.0
>
> Attachments: HIVE-17385.1.patch, HIVE-17385.2.patch, 
> HIVE-17385.3.patch, HIVE-17385.4.patch, HIVE-17385.5.patch, 
> HIVE-17385.6.patch, HIVE-17385.7.patch
>
>
> The error below occurs with incremental replication for non-native (storage 
> handler based) tables. The bug is that we do not check whether a table 
> should be dumped/exported during incremental dump.
> 2017-08-02T12:31:48,195 ERROR [HiveServer2-Background-Pool: Thread-8078]: 
> exec.DDLTask (DDLTask.java:failed(632)) - 
> org.apache.hadoop.hive.ql.metadata.HiveException: 
> MetaException(message:LOCATION may not be specified for HBase.)





[jira] [Commented] (HIVE-17183) Disable rename operations during bootstrap dump

2017-09-01 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16150653#comment-16150653
 ] 

Hive QA commented on HIVE-17183:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12884873/HIVE-17183.05.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 11024 tests 
executed
*Failed tests:*
{noformat}
TestTxnCommandsBase - did not produce a TEST-*.xml file (likely timed out) 
(batchId=280)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[insert_values_orig_table_use_metadata]
 (batchId=61)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning]
 (batchId=169)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=234)
org.apache.hive.jdbc.TestJdbcWithMiniHS2.testHttpRetryOnServerIdleTimeout 
(batchId=227)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6640/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6640/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6640/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 5 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12884873 - PreCommit-HIVE-Build

> Disable rename operations during bootstrap dump
> ---
>
> Key: HIVE-17183
> URL: https://issues.apache.org/jira/browse/HIVE-17183
> Project: Hive
>  Issue Type: Sub-task
>  Components: repl
>Affects Versions: 2.1.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>  Labels: DR, replication
> Fix For: 3.0.0
>
> Attachments: HIVE-17183.01.patch, HIVE-17183.02.patch, 
> HIVE-17183.03.patch, HIVE-17183.04.patch, HIVE-17183.05.patch
>
>
> Currently, a bootstrap dump can lead to data loss when a rename happens 
> while the dump is in progress. 
> *Scenario:*
> - Fetch table names (T1 and T2)
> - Dump table T1
> - Renaming table T2 to T3 generates a RENAME event
> - Dumping table T2 is a no-op because the table no longer exists.
> - After load, the target has only T1.
> - Applying the RENAME event fails because T2 doesn’t exist in the target.
> This feature can be supported in the next phase of development, as it needs a 
> proper design to keep track of renamed tables/partitions. 
> So, for the time being, we shall disable rename operations while a bootstrap 
> dump is in progress to avoid any inconsistent state.
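
The scenario in the HIVE-17183 description can be walked through with a small simulation. This is only a sketch under stated assumptions: the class and method names (`BootstrapRenameSketch`, `dumpWithConcurrentRename`, `canApplyRename`) are hypothetical, and plain string sets stand in for the source and target warehouses. No Hive replication code is used.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class BootstrapRenameSketch {
    /** Replays the scenario and returns the tables present in the target after load. */
    static Set<String> dumpWithConcurrentRename() {
        Set<String> source = new LinkedHashSet<>(Arrays.asList("T1", "T2"));
        // Step 1: fetch the table names once, up front.
        List<String> toDump = new ArrayList<>(source);
        Set<String> target = new LinkedHashSet<>();
        for (String table : toDump) {
            // Dumping is a no-op when the table no longer exists in the source.
            if (source.contains(table)) {
                target.add(table);
            }
            // A concurrent rename of T2 to T3 happens right after T1 is dumped.
            if (table.equals("T1")) {
                source.remove("T2");
                source.add("T3");
            }
        }
        return target;
    }

    /** A RENAME event can only be applied if the old table exists in the target. */
    static boolean canApplyRename(Set<String> target, String oldName) {
        return target.contains(oldName);
    }

    public static void main(String[] args) {
        Set<String> target = dumpWithConcurrentRename();
        System.out.println("target after load: " + target);
        System.out.println("RENAME T2 applicable: " + canApplyRename(target, "T2"));
    }
}
```

In this simulation the target ends up with T1 only, and the RENAME event for T2 cannot be applied, which is the inconsistent state that disabling renames during the dump is meant to avoid.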





[jira] [Updated] (HIVE-17394) AvroSerde is regenerating TypeInfo objects for each nullable Avro field for every row

2017-09-01 Thread Ratandeep Ratti (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ratandeep Ratti updated HIVE-17394:
---
Description: 
{{AvroDeserializer}} keeps regenerating {{TypeInfo}} objects for every 
nullable field in a row.

This happens in the following methods.

{code}
private Object deserializeNullableUnion(Object datum, Schema fileSchema, Schema 
recordSchema) throws AvroSerdeException {
// elided
line 312:  return worker(datum, fileSchema, newRecordSchema,
SchemaToTypeInfo.generateTypeInfo(newRecordSchema, null));
}
..
private Object deserializeSingleItemNullableUnion(Object datum, Schema Schema 
recordSchema)
// elided
line 357: return worker(datum, currentFileSchema, schema,
  SchemaToTypeInfo.generateTypeInfo(schema, null));
{code}

This is really bad in terms of performance. I'm not sure why we didn't use the 
TypeInfo we already have instead of generating it again for each nullable 
field. If you look at the {{worker}} method, which calls 
{{deserializeNullableUnion}}, the TypeInfo corresponding to the nullable field 
column is already determined. 
Moreover, the cache in the {{SchemaToTypeInfo}} class does not help in the 
nullable Avro record case, as checking whether an Avro record schema object 
already exists in the cache requires traversing all the fields in the record 
schema.

I've attached a profiling snapshot which shows that most of the time is being 
spent in the cache.

One way of fixing this, IMO, might be to make use of the column TypeInfo which 
is already passed to the worker method.

  was:
The following methods in {{AvroDeserializer}} keep regenerating TypeInfo 
objects for every nullable  field in a row.

This is happening in the following methods.

{code}
private Object deserializeNullableUnion(Object datum, Schema fileSchema, Schema 
recordSchema) throws AvroSerdeException {
// elided
line 312:  return worker(datum, fileSchema, newRecordSchema,
SchemaToTypeInfo.generateTypeInfo(newRecordSchema, null));
}
..
private Object deserializeSingleItemNullableUnion(Object datum, Schema Schema 
recordSchema)
// elided
line 357: return worker(datum, currentFileSchema, schema,
  SchemaToTypeInfo.generateTypeInfo(schema, null));
{code}

This is really bad in terms of performance. I'm not sure why we didn't use the 
TypeInfo we already have instead of generating it again for each nullable 
field. If you look at the {{worker}} method, which calls 
{{deserializeNullableUnion}}, the TypeInfo corresponding to the nullable field 
column is already determined. Not sure why we have to determine that 
information again.

Moreover, the cache in SchemaToTypeInfo does not help in the nullable Avro 
record case, as checking whether an Avro record schema object already exists in 
the cache requires traversing all the fields in the record schema.

I've attached a profiling snapshot which shows that most of the time is being 
spent in the cache.

One way of fixing this, IMO, is to make use of the column TypeInfo which is 
already passed to the worker method.
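
The fix suggested in the description (reuse the column TypeInfo instead of regenerating it) can be illustrated with a counting sketch. Everything here is an assumption made for illustration: `Schema` and `TypeInfo` are hypothetical stand-in classes, not the Avro or Hive types, and `generateTypeInfo` only mimics an expensive conversion by counting its invocations. It is not the code from any attached patch.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class TypeInfoReuseSketch {
    /** Hypothetical stand-in for an Avro schema. */
    static final class Schema {
        final String name;
        Schema(String name) { this.name = name; }
    }

    /** Hypothetical stand-in for a Hive TypeInfo. */
    static final class TypeInfo {
        final String typeName;
        TypeInfo(String typeName) { this.typeName = typeName; }
    }

    /** Counts how often the (assumed expensive) conversion runs. */
    static final AtomicInteger generated = new AtomicInteger();

    static TypeInfo generateTypeInfo(Schema schema) {
        generated.incrementAndGet();
        return new TypeInfo(schema.name);
    }

    /** Stand-in for the worker method: it just returns the datum. */
    static Object worker(Object datum, Schema schema, TypeInfo type) {
        return datum;
    }

    /** Pattern from the issue: regenerate the TypeInfo for every nullable field. */
    static Object deserializeRegenerating(Object datum, Schema schema) {
        return worker(datum, schema, generateTypeInfo(schema));
    }

    /** Suggested alternative: pass along the TypeInfo the caller already holds. */
    static Object deserializeReusing(Object datum, Schema schema, TypeInfo columnType) {
        return worker(datum, schema, columnType);
    }

    /** Returns the number of conversions needed to deserialize the given number of rows. */
    static int countGenerations(boolean reuse, int rows) {
        generated.set(0);
        Schema schema = new Schema("string");
        TypeInfo columnType = generateTypeInfo(schema); // determined once per column
        for (int i = 0; i < rows; i++) {
            if (reuse) {
                deserializeReusing("v" + i, schema, columnType);
            } else {
                deserializeRegenerating("v" + i, schema);
            }
        }
        return generated.get();
    }

    public static void main(String[] args) {
        System.out.println("regenerating: " + countGenerations(false, 1000));
        System.out.println("reusing:      " + countGenerations(true, 1000));
    }
}
```

With one nullable field and 1000 rows, the regenerating variant performs 1001 conversions in this sketch, while the reusing variant performs one.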


> AvroSerde is regenerating TypeInfo objects for each nullable Avro field for 
> every row
> -
>
> Key: HIVE-17394
> URL: https://issues.apache.org/jira/browse/HIVE-17394
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.1.0
>Reporter: Ratandeep Ratti
> Attachments: AvroSerDe.nps, AvroSerDeUnionTypeInfo.png
>
>
> {{AvroDeserializer}} keeps regenerating {{TypeInfo}} objects for every 
> nullable field in a row.
> This happens in the following methods.
> {code}
> private Object deserializeNullableUnion(Object datum, Schema fileSchema, 
> Schema recordSchema) throws AvroSerdeException {
> // elided
> line 312:  return worker(datum, fileSchema, newRecordSchema,
> SchemaToTypeInfo.generateTypeInfo(newRecordSchema, null));
> }
> ..
> private Object deserializeSingleItemNullableUnion(Object datum, Schema Schema 
> recordSchema)
> // elided
> line 357: return worker(datum, currentFileSchema, schema,
>   SchemaToTypeInfo.generateTypeInfo(schema, null));
> {code}
> This is really bad in terms of performance. I'm not sure why we didn't use 
> the TypeInfo we already have instead of generating it again for each nullable 
> field. If you look at the {{worker}} method, which calls 
> {{deserializeNullableUnion}}, the TypeInfo corresponding to the nullable field 
> column is already determined. 
> Moreover, the cache in the {{SchemaToTypeInfo}} class does not help in the 
> nullable Avro record case, as checking whether an Avro record schema object 
> already exists in the cache requires traversing all the fields in the record 
> schema.
> I've attached a profiling snapshot which shows that most of the time is being 
> spent in the cache.
> One way of fixing this 

[jira] [Commented] (HIVE-13989) Extended ACLs are not handled according to specification

2017-09-01 Thread Vaibhav Gumashta (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16150529#comment-16150529
 ] 

Vaibhav Gumashta commented on HIVE-13989:
-

+1

[~cdrome] I wasn't able to check the test reports in time. Did you get a chance 
to look at them? (The failures don't seem related, though.) 

> Extended ACLs are not handled according to specification
> 
>
> Key: HIVE-13989
> URL: https://issues.apache.org/jira/browse/HIVE-13989
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Affects Versions: 1.2.1, 2.0.0
>Reporter: Chris Drome
>Assignee: Chris Drome
> Attachments: HIVE-13989.1-branch-1.patch, HIVE-13989.1.patch, 
> HIVE-13989.4-branch-2.2.patch, HIVE-13989.4-branch-2.patch, 
> HIVE-13989-branch-1.patch, HIVE-13989-branch-2.2.patch, 
> HIVE-13989-branch-2.2.patch, HIVE-13989-branch-2.2.patch
>
>
> Hive takes two approaches to working with extended ACLs depending on whether 
> data is being produced via a Hive query or HCatalog APIs. A Hive query will 
> run an FsShell command to recursively set the extended ACLs for a directory 
> sub-tree. HCatalog APIs will attempt to build up the directory sub-tree 
> programmatically and run some code to set the ACLs to match the parent 
> directory.
> Some incorrect assumptions were made when implementing the extended ACLs 
> support. Refer to https://issues.apache.org/jira/browse/HDFS-4685 for the 
> design documents of extended ACLs in HDFS. These documents model the 
> implementation after the POSIX implementation on Linux, which can be found at 
> http://www.vanemery.com/Linux/ACL/POSIX_ACL_on_Linux.html.
> The code for setting extended ACLs via HCatalog APIs is found in 
> HdfsUtils.java:
> {code}
> if (aclEnabled) {
>   aclStatus =  sourceStatus.getAclStatus();
>   if (aclStatus != null) {
> LOG.trace(aclStatus.toString());
> aclEntries = aclStatus.getEntries();
> removeBaseAclEntries(aclEntries);
> //the ACL api's also expect the tradition user/group/other permission 
> in the form of ACL
> aclEntries.add(newAclEntry(AclEntryScope.ACCESS, AclEntryType.USER, 
> sourcePerm.getUserAction()));
> aclEntries.add(newAclEntry(AclEntryScope.ACCESS, AclEntryType.GROUP, 
> sourcePerm.getGroupAction()));
> aclEntries.add(newAclEntry(AclEntryScope.ACCESS, AclEntryType.OTHER, 
> sourcePerm.getOtherAction()));
>   }
> }
> {code}
> We found that DEFAULT extended ACL rules were not being inherited properly by 
> the directory sub-tree, so the above code is incomplete because it 
> effectively drops the DEFAULT rules. The second problem is with the call to 
> {{sourcePerm.getGroupAction()}}, which is incorrect in the case of extended 
> ACLs. When extended ACLs are used the GROUP permission is replaced with the 
> extended ACL mask. So the above code will apply the wrong permissions to the 
> GROUP. Instead the correct GROUP permissions now need to be pulled from the 
> AclEntry as returned by {{getAclStatus().getEntries()}}. See the 
> implementation of the new method {{getDefaultAclEntries}} for details.
> Similar issues exist with the HCatalog API. None of the API methods account 
> for setting extended ACLs on the directory sub-tree. The changes to the HCatalog 
> API allow the extended ACLs to be passed into the required methods similar to 
> how basic permissions are passed in. When building the directory sub-tree the 
> extended ACLs of the table directory are inherited by all sub-directories, 
> including the DEFAULT rules.
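
The two problems described above (the group bits of the basic permission reflect the mask, and the DEFAULT rules must be carried over) can be sketched with a small model. This is an illustration under stated assumptions: `Entry` is a hypothetical stand-in, not Hadoop's `AclEntry`, and the method names are invented for this example. The sample ACL mirrors the `acl_test` listing from the issue.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class AclMaskSketch {
    /** Hypothetical stand-in for one ACL entry; not Hadoop's AclEntry. */
    static final class Entry {
        final String scope; // "access" or "default"
        final String type;  // "user", "group", "mask" or "other"
        final String name;  // null for unnamed entries such as group::r-x
        final String perm;  // for example "r-x"

        Entry(String scope, String type, String name, String perm) {
            this.scope = scope;
            this.type = type;
            this.name = name;
            this.perm = perm;
        }
    }

    /** The ACL shown for acl_test in the issue description. */
    static List<Entry> exampleAcl() {
        return Arrays.asList(
            new Entry("access", "user", null, "rwx"),
            new Entry("access", "user", "hdfs", "rwx"),
            new Entry("access", "group", null, "r-x"),
            new Entry("access", "mask", null, "rwx"),
            new Entry("access", "other", null, "---"),
            new Entry("default", "user", null, "rwx"),
            new Entry("default", "user", "hdfs", "rwx"),
            new Entry("default", "group", null, "r-x"),
            new Entry("default", "mask", null, "rwx"),
            new Entry("default", "other", null, "---"));
    }

    /** Finds the permission of the unnamed entry with the given scope and type. */
    static String find(List<Entry> acl, String scope, String type) {
        for (Entry e : acl) {
            if (e.scope.equals(scope) && e.type.equals(type) && e.name == null) {
                return e.perm;
            }
        }
        return null;
    }

    /** Group bits of the basic permission: the mask when one exists. */
    static String groupBitsOfBasicPermission(List<Entry> acl) {
        String mask = find(acl, "access", "mask");
        return mask != null ? mask : find(acl, "access", "group");
    }

    /** The actual group permission: the unnamed access GROUP entry. */
    static String groupPermissionFromAcl(List<Entry> acl) {
        return find(acl, "access", "group");
    }

    /** DEFAULT entries that sub-directories are expected to inherit. */
    static List<Entry> defaultEntries(List<Entry> acl) {
        List<Entry> result = new ArrayList<>();
        for (Entry e : acl) {
            if (e.scope.equals("default")) {
                result.add(e);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Entry> acl = exampleAcl();
        System.out.println("basic group bits: " + groupBitsOfBasicPermission(acl));
        System.out.println("group ACL entry:  " + groupPermissionFromAcl(acl));
        System.out.println("default entries:  " + defaultEntries(acl).size());
    }
}
```

For the sample ACL, the basic group bits come out as `rwx` while the group entry is `r-x`, so copying the basic group bits into a GROUP entry would grant more than intended.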
> Replicating the problem:
> Create a table to write data into (I will use acl_test as the destination and 
> words_text as the source) and set the ACLs as follows:
> {noformat}
> $ hdfs dfs -setfacl -m 
> default:user::rwx,default:group::r-x,default:mask::rwx,default:user:hdfs:rwx,group::r-x,user:hdfs:rwx
>  /user/cdrome/hive/acl_test
> $ hdfs dfs -ls -d /user/cdrome/hive/acl_test
> drwxrwx---+  - cdrome hdfs  0 2016-07-13 20:36 
> /user/cdrome/hive/acl_test
> $ hdfs dfs -getfacl -R /user/cdrome/hive/acl_test
> # file: /user/cdrome/hive/acl_test
> # owner: cdrome
> # group: hdfs
> user::rwx
> user:hdfs:rwx
> group::r-x
> mask::rwx
> other::---
> default:user::rwx
> default:user:hdfs:rwx
> default:group::r-x
> default:mask::rwx
> default:other::---
> {noformat}
> Note that the basic GROUP permission is set to {{rwx}} after setting the 
> ACLs. The ACLs explicitly set the DEFAULT rules and a rule specifically for 
> the {{hdfs}} user.
> Run the following query to populate the table:
> {noformat}
> insert into acl_test partition (dt='a', ds='b') select a, b from words_text 
> where dt = 'c';
> {noformat}
> Note that words_text only has a single partition key.
> Now examine the ACLs for the resulting directories:
> {noformat}
> $ 

[jira] [Commented] (HIVE-17405) HoS DPP ConstantPropagate should use ConstantPropagateOption.SHORTCUT

2017-09-01 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16150526#comment-16150526
 ] 

Hive QA commented on HIVE-17405:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12884864/HIVE-17405.7.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 11023 tests 
executed
*Failed tests:*
{noformat}
TestTxnCommandsBase - did not produce a TEST-*.xml file (likely timed out) 
(batchId=280)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[insert_values_orig_table_use_metadata]
 (batchId=61)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=234)
org.apache.hadoop.hive.ql.lockmgr.TestDbTxnManager2.insertOverwriteCreate 
(batchId=282)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6639/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6639/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6639/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 4 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12884864 - PreCommit-HIVE-Build

> HoS DPP ConstantPropagate should use ConstantPropagateOption.SHORTCUT
> -
>
> Key: HIVE-17405
> URL: https://issues.apache.org/jira/browse/HIVE-17405
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Attachments: HIVE-17405.1.patch, HIVE-17405.2.patch, 
> HIVE-17405.3.patch, HIVE-17405.4.patch, HIVE-17405.5.patch, 
> HIVE-17405.6.patch, HIVE-17405.7.patch
>
>
> In {{SparkCompiler#runDynamicPartitionPruning}} we should change {{new 
> ConstantPropagate().transform(parseContext)}} to {{new 
> ConstantPropagate(ConstantPropagateOption.SHORTCUT).transform(parseContext)}}
> Hive-on-Tez does the same thing.
> Running the full constant propagation isn't really necessary; we just want to 
> eliminate any {{and true}} predicates that were introduced by 
> {{SyntheticJoinPredicate}} and {{DynamicPartitionPruningOptimization}}. The 
> {{SyntheticJoinPredicate}} will introduce dummy filter predicates into the 
> operator tree, and {{DynamicPartitionPruningOptimization}} will replace them. 
> The predicates introduced via {{SyntheticJoinPredicate}} are necessary to 
> help {{DynamicPartitionPruningOptimization}} determine if DPP can be used or 
> not.
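
The effect the description asks for, removing leftover {{and true}} predicates, can be illustrated on a toy expression tree. This is a sketch only: the `Expr`, `Const`, `Col` and `And` types and the `simplify` method are hypothetical and do not reproduce Hive's `ConstantPropagate` or what `ConstantPropagateOption.SHORTCUT` does internally.

```java
public class AndTrueSketch {
    /** Hypothetical predicate node types; not Hive's expression classes. */
    interface Expr {}

    static final class Const implements Expr {
        final boolean value;
        Const(boolean value) { this.value = value; }
    }

    static final class Col implements Expr {
        final String text;
        Col(String text) { this.text = text; }
    }

    static final class And implements Expr {
        final Expr left;
        final Expr right;
        And(Expr left, Expr right) { this.left = left; this.right = right; }
    }

    static boolean isTrue(Expr e) {
        return e instanceof Const && ((Const) e).value;
    }

    /** Removes "and true" operands bottom-up and leaves everything else untouched. */
    static Expr simplify(Expr e) {
        if (!(e instanceof And)) {
            return e;
        }
        And and = (And) e;
        Expr left = simplify(and.left);
        Expr right = simplify(and.right);
        if (isTrue(left)) {
            return right;
        }
        if (isTrue(right)) {
            return left;
        }
        return new And(left, right);
    }

    /** Renders an expression as text for inspection. */
    static String show(Expr e) {
        if (e instanceof Const) {
            return String.valueOf(((Const) e).value);
        }
        if (e instanceof Col) {
            return ((Col) e).text;
        }
        And and = (And) e;
        return "(" + show(and.left) + " and " + show(and.right) + ")";
    }

    public static void main(String[] args) {
        Expr predicate = new And(new And(new Col("a"), new Const(true)), new Col("b"));
        System.out.println(show(predicate) + " -> " + show(simplify(predicate)));
    }
}
```

The design point is that such a pass only needs to look at boolean connectives with constant operands, which is much less work than propagating constants through the whole operator tree.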





[jira] [Commented] (HIVE-17409) refactor LLAP ZK registry to make the ZK-registry part reusable

2017-09-01 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16150429#comment-16150429
 ] 

Hive QA commented on HIVE-17409:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12884858/HIVE-17409.01.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 11021 tests 
executed
*Failed tests:*
{noformat}
TestTxnCommandsBase - did not produce a TEST-*.xml file (likely timed out) 
(batchId=280)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[insert_values_orig_table_use_metadata]
 (batchId=61)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning]
 (batchId=169)
org.apache.hive.service.server.TestHS2HttpServer.org.apache.hive.service.server.TestHS2HttpServer
 (batchId=198)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6638/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6638/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6638/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 4 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12884858 - PreCommit-HIVE-Build

> refactor LLAP ZK registry to make the ZK-registry part reusable
> ---
>
> Key: HIVE-17409
> URL: https://issues.apache.org/jira/browse/HIVE-17409
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-17409.01.patch, HIVE-17409.patch
>
>






[jira] [Commented] (HIVE-17297) allow AM to use LLAP guaranteed tasks

2017-09-01 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16150367#comment-16150367
 ] 

Hive QA commented on HIVE-17297:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12884850/HIVE-17297.03.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 11017 tests 
executed
*Failed tests:*
{noformat}
TestTxnCommandsBase - did not produce a TEST-*.xml file (likely timed out) 
(batchId=280)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[insert_values_orig_table_use_metadata]
 (batchId=61)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[unionDistinct_1] 
(batchId=143)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning]
 (batchId=169)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=234)
org.apache.hadoop.hive.cli.TestSparkCliDriver.org.apache.hadoop.hive.cli.TestSparkCliDriver
 (batchId=102)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6637/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6637/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6637/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 6 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12884850 - PreCommit-HIVE-Build

> allow AM to use LLAP guaranteed tasks
> -
>
> Key: HIVE-17297
> URL: https://issues.apache.org/jira/browse/HIVE-17297
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-17297.01.nogen.patch, HIVE-17297.01.patch, 
> HIVE-17297.02.patch, HIVE-17297.03.patch
>
>






[jira] [Commented] (HIVE-17414) HoS DPP + Vectorization generates invalid explain plan due to CombineEquivalentWorkResolver

2017-09-01 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16150357#comment-16150357
 ] 

Rui Li commented on HIVE-17414:
---

+1

> HoS DPP + Vectorization generates invalid explain plan due to 
> CombineEquivalentWorkResolver
> ---
>
> Key: HIVE-17414
> URL: https://issues.apache.org/jira/browse/HIVE-17414
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Sahil Takiar
>Assignee: liyunzhang_intel
> Attachments: HIVE-17414.1.patch, HIVE-17414.2.patch, HIVE-17414.patch
>
>
> Similar to HIVE-16948, the following query generates an invalid explain plan 
> when HoS DPP is enabled + vectorization:
> {code:sql}
> select ds from (select distinct(ds) as ds from srcpart union all select 
> distinct(ds) as ds from srcpart) s where s.ds in (select max(srcpart.ds) from 
> srcpart union all select min(srcpart.ds) from srcpart)
> {code}
> Explain Plan:
> {code}
> STAGE DEPENDENCIES:
>   Stage-2 is a root stage
>   Stage-1 depends on stages: Stage-2
>   Stage-0 depends on stages: Stage-1
> STAGE PLANS:
>   Stage: Stage-2
> Spark
>   Edges:
> Reducer 11 <- Map 10 (GROUP, 1)
> Reducer 13 <- Map 12 (GROUP, 1)
>  A masked pattern was here 
>   Vertices:
> Map 10
> Map Operator Tree:
> TableScan
>   alias: srcpart
>   Statistics: Num rows: 2000 Data size: 21248 Basic stats: 
> COMPLETE Column stats: NONE
>   Select Operator
> expressions: ds (type: string)
> outputColumnNames: ds
> Statistics: Num rows: 2000 Data size: 21248 Basic stats: 
> COMPLETE Column stats: NONE
> Group By Operator
>   aggregations: max(ds)
>   mode: hash
>   outputColumnNames: _col0
>   Statistics: Num rows: 1 Data size: 184 Basic stats: 
> COMPLETE Column stats: NONE
>   Reduce Output Operator
> sort order:
> Statistics: Num rows: 1 Data size: 184 Basic stats: 
> COMPLETE Column stats: NONE
> value expressions: _col0 (type: string)
> Execution mode: vectorized
> Map 12
> Map Operator Tree:
> TableScan
>   alias: srcpart
>   Statistics: Num rows: 2000 Data size: 21248 Basic stats: 
> COMPLETE Column stats: NONE
>   Select Operator
> expressions: ds (type: string)
> outputColumnNames: ds
> Statistics: Num rows: 2000 Data size: 21248 Basic stats: 
> COMPLETE Column stats: NONE
> Group By Operator
>   aggregations: min(ds)
>   mode: hash
>   outputColumnNames: _col0
>   Statistics: Num rows: 1 Data size: 184 Basic stats: 
> COMPLETE Column stats: NONE
>   Reduce Output Operator
> sort order:
> Statistics: Num rows: 1 Data size: 184 Basic stats: 
> COMPLETE Column stats: NONE
> value expressions: _col0 (type: string)
> Execution mode: vectorized
> Reducer 11
> Execution mode: vectorized
> Reduce Operator Tree:
>   Group By Operator
> aggregations: max(VALUE._col0)
> mode: mergepartial
> outputColumnNames: _col0
> Statistics: Num rows: 1 Data size: 184 Basic stats: COMPLETE 
> Column stats: NONE
> Filter Operator
>   predicate: _col0 is not null (type: boolean)
>   Statistics: Num rows: 1 Data size: 184 Basic stats: 
> COMPLETE Column stats: NONE
>   Group By Operator
> keys: _col0 (type: string)
> mode: hash
> outputColumnNames: _col0
> Statistics: Num rows: 2 Data size: 368 Basic stats: 
> COMPLETE Column stats: NONE
> Select Operator
>   expressions: _col0 (type: string)
>   outputColumnNames: _col0
>   Statistics: Num rows: 2 Data size: 368 Basic stats: 
> COMPLETE Column stats: NONE
>   Group By Operator
> keys: _col0 (type: string)
> mode: hash
> outputColumnNames: _col0
> Statistics: Num rows: 2 Data size: 368 Basic stats: 
> COMPLETE Column stats: NONE
> Spark Partition 

[jira] [Commented] (HIVE-17405) HoS DPP ConstantPropagate should use ConstantPropagateOption.SHORTCUT

2017-09-01 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16150356#comment-16150356
 ] 

Rui Li commented on HIVE-17405:
---

+1

> HoS DPP ConstantPropagate should use ConstantPropagateOption.SHORTCUT
> -
>
> Key: HIVE-17405
> URL: https://issues.apache.org/jira/browse/HIVE-17405
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Attachments: HIVE-17405.1.patch, HIVE-17405.2.patch, 
> HIVE-17405.3.patch, HIVE-17405.4.patch, HIVE-17405.5.patch, 
> HIVE-17405.6.patch, HIVE-17405.7.patch
>
>
> In {{SparkCompiler#runDynamicPartitionPruning}} we should change {{new 
> ConstantPropagate().transform(parseContext)}} to {{new 
> ConstantPropagate(ConstantPropagateOption.SHORTCUT).transform(parseContext)}}.
> Hive-on-Tez does the same thing.
> Running the full constant propagation isn't really necessary; we just want to 
> eliminate any {{and true}} predicates that were introduced by 
> {{SyntheticJoinPredicate}} and {{DynamicPartitionPruningOptimization}}. The 
> {{SyntheticJoinPredicate}} will introduce dummy filter predicates into the 
> operator tree, and {{DynamicPartitionPruningOptimization}} will replace them. 
> The predicates introduced via {{SyntheticJoinPredicate}} are necessary to 
> help {{DynamicPartitionPruningOptimization}} determine if DPP can be used or 
> not.





[jira] [Commented] (HIVE-17383) ArrayIndexOutOfBoundsException in VectorGroupByOperator

2017-09-01 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16150347#comment-16150347
 ] 

Rui Li commented on HIVE-17383:
---

[~kellyzly], I mean the failures with age 1. Can you reproduce any of them with 
the patch in your env?

> ArrayIndexOutOfBoundsException in VectorGroupByOperator
> ---
>
> Key: HIVE-17383
> URL: https://issues.apache.org/jira/browse/HIVE-17383
> Project: Hive
>  Issue Type: Bug
>Reporter: Rui Li
>Assignee: Rui Li
> Attachments: HIVE-17383.1.patch
>
>
> Query to reproduce:
> {noformat}
> set hive.cbo.enable=false;
> select count(*) from (select key from src group by key) s where s.key='98';
> {noformat}
> The stack trace is:
> {noformat}
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 1
>   at 
> org.apache.hadoop.hive.ql.exec.vector.VectorGroupKeyHelper.copyGroupKey(VectorGroupKeyHelper.java:107)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.VectorGroupByOperator$ProcessingModeReduceMergePartial.doProcessBatch(VectorGroupByOperator.java:831)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.VectorGroupByOperator$ProcessingModeBase.processBatch(VectorGroupByOperator.java:174)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.VectorGroupByOperator.process(VectorGroupByOperator.java:1046)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.processVectorGroup(ReduceRecordSource.java:462)
>   ... 18 more
> {noformat}
> More details can be found in HIVE-16823





[jira] [Commented] (HIVE-17383) ArrayIndexOutOfBoundsException in VectorGroupByOperator

2017-09-01 Thread liyunzhang_intel (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16150330#comment-16150330
 ] 

liyunzhang_intel commented on HIVE-17383:
-

Why do you say "The failures can't be reproduced locally."? Actually, they can be 
reproduced in my env. Do you mean that this is fixed in the latest master?


I don't understand the vectorization logic very well.
Why does {{firstOutputColumnIndex}} start from {{initialColumnNames.length}}? For 
example, if there is 1 column, {{firstOutputColumnIndex}} starts from 1 (normally 
an index starts from 0). When we construct the output batch, the columns start 
from 1; is this right?
{code}
  // Convenient constructor for initial batch creation takes
  // a list of column names and maps them to 0..n-1 indices.
  public VectorizationContext(String contextName, List<String> initialColumnNames,
      HiveConf hiveConf) {
    this.contextName = contextName;
    level = 0;
    this.initialColumnNames = initialColumnNames;
    this.projectionColumnNames = initialColumnNames;

    projectedColumns = new ArrayList<Integer>();
    projectionColumnMap = new HashMap<String, Integer>();
    for (int i = 0; i < this.projectionColumnNames.size(); i++) {
      projectedColumns.add(i);
      projectionColumnMap.put(projectionColumnNames.get(i), i);
    }

    // Input columns occupy indices 0..n-1, so the first output column index is n.
    int firstOutputColumnIndex = projectedColumns.size();
    this.ocm = new OutputColumnManager(firstOutputColumnIndex);
    this.firstOutputColumnIndex = firstOutputColumnIndex;
    vMap = new VectorExpressionDescriptor();

    if (hiveConf != null) {
      setHiveConfVars(hiveConf);
    }
  }
{code}
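
One possible reading of the constructor above, as a minimal sketch rather than Hive's actual implementation: the input columns take indices 0..n-1, and the value handed to the output column manager is simply the first free slot after them. The class {{ColumnIndexSketch}} and its methods are hypothetical names used only for illustration; the assumption that output (scratch) columns are allocated upward from that slot is mine.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified stand-in; not Hive's VectorizationContext.
final class ColumnIndexSketch {
  private final Map<String, Integer> projectionColumnMap = new HashMap<>();
  private final int firstOutputColumnIndex;
  private int nextScratchOffset = 0;

  ColumnIndexSketch(List<String> initialColumnNames) {
    // Input columns are mapped to 0..n-1, as in the quoted constructor.
    for (int i = 0; i < initialColumnNames.size(); i++) {
      projectionColumnMap.put(initialColumnNames.get(i), i);
    }
    // The first free slot after the input columns (assumption of this sketch).
    firstOutputColumnIndex = initialColumnNames.size();
  }

  int inputColumn(String name) {
    return projectionColumnMap.get(name);
  }

  // Hands out scratch/output columns after the input columns.
  int allocateOutputColumn() {
    return firstOutputColumnIndex + nextScratchOffset++;
  }

  public static void main(String[] args) {
    ColumnIndexSketch ctx = new ColumnIndexSketch(Arrays.asList("key"));
    System.out.println(ctx.inputColumn("key"));      // input column index
    System.out.println(ctx.allocateOutputColumn());  // first output column index
  }
}
```

Under this reading, a batch with 1 input column would need at least 2 column slots before anything writes to the first output column.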







[jira] [Commented] (HIVE-17417) LazySimple Timestamp and Date serialization is very expensive

2017-09-01 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16150324#comment-16150324
 ] 

Hive QA commented on HIVE-17417:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12884853/ts-jmh-perf.png

{color:red}ERROR:{color} -1 due to build exiting with an error

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6636/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6636/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6636/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hiveptest/working/scratch/source-prep.sh' failed with exit 
status 1 and output '+ date '+%Y-%m-%d %T.%3N'
2017-09-01 10:08:03.367
+ [[ -n /usr/lib/jvm/java-8-openjdk-amd64 ]]
+ export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ export 
PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ 
PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'MAVEN_OPTS=-Xmx1g '
+ MAVEN_OPTS='-Xmx1g '
+ cd /data/hiveptest/working/
+ tee /data/hiveptest/logs/PreCommit-HIVE-Build-6636/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ git = \s\v\n ]]
+ [[ git = \g\i\t ]]
+ [[ -z master ]]
+ [[ -d apache-github-source-source ]]
+ [[ ! -d apache-github-source-source/.git ]]
+ [[ ! -d apache-github-source-source ]]
+ date '+%Y-%m-%d %T.%3N'
2017-09-01 10:08:03.370
+ cd apache-github-source-source
+ git fetch origin
+ git reset --hard HEAD
HEAD is now at b82d38a HIVE-17415 - Hit error "SemanticException View xxx is 
corresponding to LIMIT, rather than a SelectOperator." in Hive queries (Deepak 
Jaiswal, reviewed by Ashutosh Chauhan)
+ git clean -f -d
Removing 
llap-server/src/java/org/apache/hadoop/hive/llap/io/metadata/MetadataCache.java
+ git checkout master
Already on 'master'
Your branch is up-to-date with 'origin/master'.
+ git reset --hard origin/master
HEAD is now at b82d38a HIVE-17415 - Hit error "SemanticException View xxx is 
corresponding to LIMIT, rather than a SelectOperator." in Hive queries (Deepak 
Jaiswal, reviewed by Ashutosh Chauhan)
+ git merge --ff-only origin/master
Already up-to-date.
+ date '+%Y-%m-%d %T.%3N'
2017-09-01 10:08:07.214
+ patchCommandPath=/data/hiveptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hiveptest/working/scratch/build.patch
+ [[ -f /data/hiveptest/working/scratch/build.patch ]]
+ chmod +x /data/hiveptest/working/scratch/smart-apply-patch.sh
+ /data/hiveptest/working/scratch/smart-apply-patch.sh 
/data/hiveptest/working/scratch/build.patch
patch:  Only garbage was found in the patch input.
patch:  Only garbage was found in the patch input.
patch:  Only garbage was found in the patch input.
fatal: unrecognized input
The patch does not appear to apply with p0, p1, or p2
+ exit 1
'
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12884853 - PreCommit-HIVE-Build

> LazySimple Timestamp and Date serialization is very expensive
> -
>
> Key: HIVE-17417
> URL: https://issues.apache.org/jira/browse/HIVE-17417
> Project: Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers
>Affects Versions: 3.0.0, 2.4.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>Priority: Critical
> Attachments: date-serialize.png, HIVE-17417.1.patch, 
> timestamp-serialize.png, ts-jmh-perf.png
>
>
> In a specific case where a schema contains an array with timestamp and 
> date fields (array size >1), any access to this column is very 
> expensive in terms of CPU, as most of the time is spent on serialization of 
> timestamp and date. Refer to the attached profiles: >70% of the time is spent 
> in serialization + toString conversions. 





[jira] [Commented] (HIVE-15665) LLAP: OrcFileMetadata objects in cache can impact heap usage

2017-09-01 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16150322#comment-16150322
 ] 

Hive QA commented on HIVE-15665:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12884843/HIVE-15665.09.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 7 failed/errored test(s), 11023 tests 
executed
*Failed tests:*
{noformat}
TestTxnCommandsBase - did not produce a TEST-*.xml file (likely timed out) 
(batchId=280)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[insert_values_orig_table_use_metadata]
 (batchId=61)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_llap_counters]
 (batchId=145)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_basic] 
(batchId=141)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_schema_evol_3a]
 (batchId=143)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning]
 (batchId=169)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=234)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6635/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6635/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6635/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 7 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12884843 - PreCommit-HIVE-Build

> LLAP: OrcFileMetadata objects in cache can impact heap usage
> 
>
> Key: HIVE-15665
> URL: https://issues.apache.org/jira/browse/HIVE-15665
> Project: Hive
>  Issue Type: Improvement
>  Components: llap
>Reporter: Rajesh Balamohan
>Assignee: Sergey Shelukhin
> Attachments: HIVE-15665.01.patch, HIVE-15665.02.patch, 
> HIVE-15665.03.patch, HIVE-15665.04.patch, HIVE-15665.05.patch, 
> HIVE-15665.06.patch, HIVE-15665.07.patch, HIVE-15665.08.patch, 
> HIVE-15665.09.patch, HIVE-15665.patch
>
>
> OrcFileMetadata internally has filestats, stripestats etc which are allocated 
> in heap. On large data sets, this could have an impact on the heap usage and 
> the memory usage by different executors in LLAP.





[jira] [Commented] (HIVE-17383) ArrayIndexOutOfBoundsException in VectorGroupByOperator

2017-09-01 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16150314#comment-16150314
 ] 

Rui Li commented on HIVE-17383:
---

The failures can't be reproduced locally.
[~mmccline], [~ashutoshc], [~gopalv] could you take a look? Thanks.






[jira] [Commented] (HIVE-17420) bootstrap - get replid before object dump

2017-09-01 Thread Sankar Hariappan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16150304#comment-16150304
 ] 

Sankar Hariappan commented on HIVE-17420:
-

+1, Patch looks good to me. 
cc [~thejas]/[~daijy]

> bootstrap - get replid before object dump
> -
>
> Key: HIVE-17420
> URL: https://issues.apache.org/jira/browse/HIVE-17420
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 3.0.0
>Reporter: anishek
>Assignee: anishek
> Fix For: 3.0.0
>
> Attachments: HIVE-17420.1.patch, HIVE-17420.2.patch
>
>
> When we do the bootstrap dump, the idea is that we take note of the 
> replication id of an object (a table in the case of this bug) and then request 
> the table metadata from the RDBMS, which is then serialized to the _metadata 
> file for the table, containing lastReplicationId + table metadata.
> This is to make sure that we only apply events after the event id in 
> _metadata on the table, which implies that it's best to order the calls (get 
> currentReplicationId first, then retrieve the table metadata) in case multiple 
> concurrent operations happen on the same table along with bootstrap.
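
The ordering argument in the description can be sketched as a small simulation. This is only an illustration under stated assumptions: {{Source}}, {{bootstrap}} and the row/event model are hypothetical, events are assumed to be replayed idempotently on the target, and none of this is Hive's replication code. It shows that reading the replication id before the dump replays a concurrent write harmlessly, while reading it after the dump can lose that write whenever the dump call did not see it.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of a bootstrap dump racing with a concurrent write.
final class BootstrapOrderSketch {

  // Every write appends one row and one event; the replication id is the event count.
  static final class Source {
    private final List<String> rows = new ArrayList<>();
    private final List<String> events = new ArrayList<>();

    void write(String row) {
      rows.add(row);
      events.add(row);
    }

    long currentReplicationId() {
      return events.size();
    }

    List<String> snapshot() {
      return new ArrayList<>(rows);
    }

    List<String> eventsAfter(long id) {
      return new ArrayList<>(events.subList((int) id, events.size()));
    }
  }

  static Source newSource() {
    Source s = new Source();
    s.write("a");
    return s;
  }

  static Set<String> bootstrap(Source source, boolean idBeforeDump) {
    long lastReplicationId;
    List<String> dump;
    if (idBeforeDump) {
      lastReplicationId = source.currentReplicationId();
      source.write("b"); // concurrent operation between the two calls
      dump = source.snapshot();
    } else {
      dump = source.snapshot();
      source.write("b"); // concurrent operation between the two calls
      lastReplicationId = source.currentReplicationId();
    }
    // Load the dump, then replay only events after the recorded id (idempotent replay).
    Set<String> target = new LinkedHashSet<>(dump);
    target.addAll(source.eventsAfter(lastReplicationId));
    return target;
  }

  public static void main(String[] args) {
    System.out.println(bootstrap(newSource(), true));  // id first: [a, b]
    System.out.println(bootstrap(newSource(), false)); // dump first: [a], "b" is lost
  }
}
```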





[jira] [Commented] (HIVE-17399) Do not remove semijoin branch if it feeds to TS->DPP_EVENT

2017-09-01 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16150241#comment-16150241
 ] 

Hive QA commented on HIVE-17399:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12884842/HIVE-17399.4.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 11023 tests 
executed
*Failed tests:*
{noformat}
TestTxnCommandsBase - did not produce a TEST-*.xml file (likely timed out) 
(batchId=280)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[insert_values_orig_table_use_metadata]
 (batchId=61)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[unionDistinct_1] 
(batchId=143)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning]
 (batchId=169)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=234)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6634/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6634/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6634/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 5 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12884842 - PreCommit-HIVE-Build

> Do not remove semijoin branch if it feeds to TS->DPP_EVENT
> --
>
> Key: HIVE-17399
> URL: https://issues.apache.org/jira/browse/HIVE-17399
> Project: Hive
>  Issue Type: Bug
>Reporter: Deepak Jaiswal
>Assignee: Deepak Jaiswal
> Attachments: HIVE-17399.1.patch, HIVE-17399.2.patch, 
> HIVE-17399.3.patch, HIVE-17399.4.patch
>
>
> If there is an incoming semijoin branch to a TS which has a DPP event, then try 
> to keep it, as it may serve as an excellent filter for DPP, thus drastically 
> reducing the input to the join.





[jira] [Commented] (HIVE-17420) bootstrap - get replid before object dump

2017-09-01 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16150179#comment-16150179
 ] 

Hive QA commented on HIVE-17420:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12884832/HIVE-17420.2.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 8 failed/errored test(s), 11009 tests 
executed
*Failed tests:*
{noformat}
TestTxnCommandsBase - did not produce a TEST-*.xml file (likely timed out) 
(batchId=280)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[insert_values_orig_table_use_metadata]
 (batchId=61)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning]
 (batchId=169)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=234)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] 
(batchId=234)
org.apache.hadoop.hive.cli.TestSparkCliDriver.org.apache.hadoop.hive.cli.TestSparkCliDriver
 (batchId=101)
org.apache.hadoop.hive.ql.lockmgr.TestDbTxnManager2.testMetastoreTablesCleanup 
(batchId=282)
org.apache.hive.jdbc.TestJdbcWithMiniHS2.testHttpRetryOnServerIdleTimeout 
(batchId=227)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6633/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6633/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6633/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 8 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12884832 - PreCommit-HIVE-Build






[jira] [Updated] (HIVE-16811) Estimate statistics in absence of stats

2017-09-01 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-16811:
---
Status: Patch Available  (was: Open)

> Estimate statistics in absence of stats
> ---
>
> Key: HIVE-16811
> URL: https://issues.apache.org/jira/browse/HIVE-16811
> Project: Hive
>  Issue Type: Improvement
>Reporter: Vineet Garg
>Assignee: Vineet Garg
> Attachments: HIVE-16811.10.patch, HIVE-16811.11.patch, 
> HIVE-16811.12.patch, HIVE-16811.13.patch, HIVE-16811.14.patch, 
> HIVE-16811.15.patch, HIVE-16811.1.patch, HIVE-16811.2.patch, 
> HIVE-16811.3.patch, HIVE-16811.4.patch, HIVE-16811.5.patch, 
> HIVE-16811.6.patch, HIVE-16811.7.patch, HIVE-16811.8.patch, HIVE-16811.9.patch
>
>
> Currently, join ordering completely bails out in the absence of statistics, and 
> this could lead to bad joins such as cross joins.
> E.g., the following select query will produce a cross join.
> {code:sql}
> create table supplier (S_SUPPKEY INT, S_NAME STRING, S_ADDRESS STRING, 
> S_NATIONKEY INT, 
> S_PHONE STRING, S_ACCTBAL DOUBLE, S_COMMENT STRING);
> CREATE TABLE lineitem (L_ORDERKEY      INT,
> L_PARTKEY       INT,
> L_SUPPKEY       INT,
> L_LINENUMBER    INT,
> L_QUANTITY      DOUBLE,
> L_EXTENDEDPRICE DOUBLE,
> L_DISCOUNT      DOUBLE,
> L_TAX           DOUBLE,
> L_RETURNFLAG    STRING,
> L_LINESTATUS    STRING,
> l_shipdate      STRING,
> L_COMMITDATE    STRING,
> L_RECEIPTDATE   STRING,
> L_SHIPINSTRUCT  STRING,
> L_SHIPMODE      STRING,
> L_COMMENT       STRING) partitioned by (dl int)
> ROW FORMAT DELIMITED
> FIELDS TERMINATED BY '|';
> CREATE TABLE part(
> p_partkey INT,
> p_name STRING,
> p_mfgr STRING,
> p_brand STRING,
> p_type STRING,
> p_size INT,
> p_container STRING,
> p_retailprice DOUBLE,
> p_comment STRING
> );
> explain select count(1) from part,supplier,lineitem where p_partkey = 
> l_partkey and s_suppkey = l_suppkey;
> {code}
> Estimating stats will prevent the join ordering algorithm from bailing out and 
> let it come up with a join that is at least better than a cross join.





[jira] [Updated] (HIVE-16811) Estimate statistics in absence of stats

2017-09-01 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-16811:
---
Status: Open  (was: Patch Available)

> Estimate statistics in absence of stats
> ---
>
> Key: HIVE-16811
> URL: https://issues.apache.org/jira/browse/HIVE-16811
> Project: Hive
>  Issue Type: Improvement
>Reporter: Vineet Garg
>Assignee: Vineet Garg
> Attachments: HIVE-16811.10.patch, HIVE-16811.11.patch, 
> HIVE-16811.12.patch, HIVE-16811.13.patch, HIVE-16811.14.patch, 
> HIVE-16811.1.patch, HIVE-16811.2.patch, HIVE-16811.3.patch, 
> HIVE-16811.4.patch, HIVE-16811.5.patch, HIVE-16811.6.patch, 
> HIVE-16811.7.patch, HIVE-16811.8.patch, HIVE-16811.9.patch
>
>
> Currently, join ordering completely bails out in the absence of statistics, and 
> this could lead to bad joins such as cross joins.
> E.g., the following select query will produce a cross join.
> {code:sql}
> create table supplier (S_SUPPKEY INT, S_NAME STRING, S_ADDRESS STRING, 
> S_NATIONKEY INT, 
> S_PHONE STRING, S_ACCTBAL DOUBLE, S_COMMENT STRING);
> CREATE TABLE lineitem (L_ORDERKEY      INT,
> L_PARTKEY       INT,
> L_SUPPKEY       INT,
> L_LINENUMBER    INT,
> L_QUANTITY      DOUBLE,
> L_EXTENDEDPRICE DOUBLE,
> L_DISCOUNT      DOUBLE,
> L_TAX           DOUBLE,
> L_RETURNFLAG    STRING,
> L_LINESTATUS    STRING,
> l_shipdate      STRING,
> L_COMMITDATE    STRING,
> L_RECEIPTDATE   STRING,
> L_SHIPINSTRUCT  STRING,
> L_SHIPMODE      STRING,
> L_COMMENT       STRING) partitioned by (dl int)
> ROW FORMAT DELIMITED
> FIELDS TERMINATED BY '|';
> CREATE TABLE part(
> p_partkey INT,
> p_name STRING,
> p_mfgr STRING,
> p_brand STRING,
> p_type STRING,
> p_size INT,
> p_container STRING,
> p_retailprice DOUBLE,
> p_comment STRING
> );
> explain select count(1) from part,supplier,lineitem where p_partkey = 
> l_partkey and s_suppkey = l_suppkey;
> {code}
> Estimating stats will prevent the join ordering algorithm from bailing out and 
> let it come up with a join that is at least better than a cross join.





[jira] [Updated] (HIVE-17139) Conditional expressions optimization: skip the expression evaluation if the condition is not satisfied for vectorization engine.

2017-09-01 Thread Ke Jia (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ke Jia updated HIVE-17139:
--
Attachment: HIVE-17139.6.patch

> Conditional expressions optimization: skip the expression evaluation if the 
> condition is not satisfied for vectorization engine.
> 
>
> Key: HIVE-17139
> URL: https://issues.apache.org/jira/browse/HIVE-17139
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ke Jia
>Assignee: Ke Jia
> Attachments: HIVE-17139.1.patch, HIVE-17139.2.patch, 
> HIVE-17139.3.patch, HIVE-17139.4.patch, HIVE-17139.5.patch, HIVE-17139.6.patch
>
>
> The execution of CASE WHEN and IF statements in Hive vectorization is not 
> optimal: in the current implementation, all the conditional and else 
> expressions are evaluated. The optimized approach is to update the selected 
> array of the batch parameter after the conditional expression is executed. Then 
> the else expression will only process the selected rows instead of all rows.
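
The approach described above can be sketched roughly as follows. This is a minimal illustration, not the patch itself: {{SelectedRowsSketch}} and {{ifThenElse}} are hypothetical names, plain arrays stand in for a vectorized row batch, and the expression IF(cond, thenVal, 10 / divisor) is chosen only because evaluating the else branch on every row would fail on rows where the divisor is 0.

```java
import java.util.Arrays;

// Hypothetical, simplified stand-in for a vectorized batch; not Hive's VectorizedRowBatch.
final class SelectedRowsSketch {

  /** Evaluates IF(cond[i], thenVal[i], 10 / divisor[i]) touching only the rows each branch needs. */
  static long[] ifThenElse(boolean[] cond, long[] thenVal, long[] divisor) {
    int n = cond.length;
    long[] out = new long[n];

    // Step 1: after the condition is evaluated, split row indices into two "selected" arrays.
    int[] thenSel = new int[n];
    int[] elseSel = new int[n];
    int thenCount = 0;
    int elseCount = 0;
    for (int i = 0; i < n; i++) {
      if (cond[i]) {
        thenSel[thenCount++] = i;
      } else {
        elseSel[elseCount++] = i;
      }
    }

    // Step 2: evaluate the THEN expression only on rows where the condition held.
    for (int j = 0; j < thenCount; j++) {
      int row = thenSel[j];
      out[row] = thenVal[row];
    }

    // Step 3: evaluate the ELSE expression only on the remaining rows.
    for (int j = 0; j < elseCount; j++) {
      int row = elseSel[j];
      out[row] = 10 / divisor[row];
    }
    return out;
  }

  public static void main(String[] args) {
    boolean[] cond = {true, false, true, false};
    long[] thenVal = {1, 2, 3, 4};
    long[] divisor = {0, 5, 0, 2}; // rows 0 and 2 would throw if ELSE ran on every row
    System.out.println(Arrays.toString(ifThenElse(cond, thenVal, divisor))); // [1, 2, 3, 5]
  }
}
```

Besides saving work, restricting each branch to its own rows avoids side effects such as the division by zero above.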



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16811) Estimate statistics in absence of stats

2017-09-01 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16150129#comment-16150129
 ] 

Hive QA commented on HIVE-16811:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12884876/HIVE-16811.14.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 42 failed/errored test(s), 11024 tests 
executed
*Failed tests:*
{noformat}
TestTxnCommandsBase - did not produce a TEST-*.xml file (likely timed out) 
(batchId=280)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[insert_values_orig_table_use_metadata]
 (batchId=61)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[unionDistinct_1] 
(batchId=143)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[ppd_union_view]
 (batchId=151)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[sample10] 
(batchId=155)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_orc_nonvec_part]
 (batchId=160)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_orc_nonvec_part_all_complex]
 (batchId=146)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_orc_nonvec_part_all_primitive]
 (batchId=163)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_orc_vec_part]
 (batchId=161)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_orc_vec_part_all_complex]
 (batchId=156)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_orc_vec_part_all_primitive]
 (batchId=161)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_text_nonvec_part]
 (batchId=149)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_text_nonvec_part_all_complex]
 (batchId=159)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_text_nonvec_part_all_primitive]
 (batchId=156)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_text_vec_part]
 (batchId=157)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_text_vec_part_all_complex]
 (batchId=153)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_text_vec_part_all_primitive]
 (batchId=158)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_text_vecrow_part]
 (batchId=162)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_text_vecrow_part_all_complex]
 (batchId=162)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_text_vecrow_part_all_primitive]
 (batchId=158)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[smb_mapjoin_18]
 (batchId=151)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[sysdb] 
(batchId=153)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_groupby_grouping_sets4]
 (batchId=151)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_groupby_grouping_sets5]
 (batchId=154)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_groupby_grouping_sets6]
 (batchId=159)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_groupby_grouping_sets_limit]
 (batchId=153)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_groupby_mapjoin]
 (batchId=160)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_groupby_rollup1]
 (batchId=156)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_grouping_sets]
 (batchId=162)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_mr_diff_schema_alias]
 (batchId=158)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_partition_diff_num_cols]
 (batchId=157)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_partitioned_date_time]
 (batchId=161)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_when_case_null]
 (batchId=153)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_windowing_navfn]
 (batchId=147)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vectorization_short_regress]
 (batchId=155)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vectorized_bucketmapjoin1]
 (batchId=150)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vectorized_context]
 (batchId=152)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vectorized_date_funcs]
 (batchId=161)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vectorized_dynamic_partition_pruning]
 (batchId=152)

[jira] [Commented] (HIVE-12833) GROUPING__ID is wrong

2017-09-01 Thread Yann Byron (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16150107#comment-16150107
 ] 

Yann Byron commented on HIVE-12833:
---

[~davies]
We ran into this issue.
Could you please post the Hive code where {{grouping__id}} is generated?
The logic for generating {{grouping__id}} is not very clear to us.

Also, could you please post the example you mentioned?

> GROUPING__ID is wrong
> -
>
> Key: HIVE-12833
> URL: https://issues.apache.org/jira/browse/HIVE-12833
> Project: Hive
>  Issue Type: Bug
>Reporter: Davies Liu
>
> https://cwiki.apache.org/confluence/display/Hive/Enhanced+Aggregation,+Cube,+Grouping+and+Rollup
> Grouping__ID function
> This function returns a bitvector corresponding to whether each column is 
> present or not. For each column, a value of "1" is produced for a row in
> the result set if that column has been aggregated in that row, otherwise the 
> value is "0". This can be used to differentiate when there are nulls
> in the data.
> This does not match the example below: in the first row, both key and value 
> are aggregated, so GROUPING__ID should be 3, not 0.
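
The arithmetic behind the expected value can be sketched as below, following the wiki text quoted above (a bit is 1 when the column has been aggregated in that row). The class and method names are hypothetical, and treating the first column as the lowest bit is an assumption of this sketch; the wiki text does not state a bit order. For the row in question the order does not matter, since both bits are set.

```java
// Hypothetical helper illustrating the documented bitvector; not Hive code.
final class GroupingIdSketch {

  /** Bit i is set when grouping column i has been aggregated in the row. */
  static int groupingId(boolean... aggregated) {
    int id = 0;
    for (int i = 0; i < aggregated.length; i++) {
      if (aggregated[i]) {
        id |= 1 << i; // assumption: first column maps to the lowest bit
      }
    }
    return id;
  }

  public static void main(String[] args) {
    // key and value both aggregated: binary 11
    System.out.println(groupingId(true, true));   // 3
    // neither column aggregated: binary 00
    System.out.println(groupingId(false, false)); // 0
  }
}
```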


