[jira] [Commented] (HIVE-11297) Combine op trees for partition info generating tasks [Spark branch]

2017-06-22 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16060440#comment-16060440
 ] 

Hive QA commented on HIVE-11297:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12874190/HIVE-11297.8.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 15 failed/errored test(s), 10846 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[create_merge_compressed]
 (batchId=238)
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[insert_overwrite_local_directory_1]
 (batchId=238)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[parquet_create] 
(batchId=83)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[tez_smb_main]
 (batchId=150)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] 
(batchId=99)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=233)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query16] 
(batchId=233)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query94] 
(batchId=233)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[union24] 
(batchId=125)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testBootstrapFunctionReplication
 (batchId=217)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testCreateFunctionIncrementalReplication
 (batchId=217)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testCreateFunctionWithFunctionBinaryJarsOnHDFS
 (batchId=217)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=178)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=178)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=178)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5739/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5739/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5739/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 15 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12874190 - PreCommit-HIVE-Build

> Combine op trees for partition info generating tasks [Spark branch]
> ---
>
> Key: HIVE-11297
> URL: https://issues.apache.org/jira/browse/HIVE-11297
> Project: Hive
>  Issue Type: Bug
>Affects Versions: spark-branch
>Reporter: Chao Sun
>Assignee: liyunzhang_intel
> Attachments: HIVE-11297.1.patch, HIVE-11297.2.patch, 
> HIVE-11297.3.patch, HIVE-11297.4.patch, HIVE-11297.5.patch, 
> HIVE-11297.6.patch, HIVE-11297.7.patch, HIVE-11297.8.patch, hive-site.xml
>
>
> Currently, for dynamic partition pruning in Spark, if a small table generates 
> partition info for more than one partition column, multiple operator trees 
> are created, all starting from the same table scan op but ending in different 
> Spark partition pruning sinks.
> As an optimization, we can combine these op trees so that the table does not 
> have to be scanned multiple times.
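
As an illustration of the problem (a hypothetical repro, not part of any patch): if a small table carries both {{ds}} and {{hr}}, a query like the one below would currently build two op trees from the same small-table scan, one per pruning sink. {{srcpart_date_hour}} is assumed to exist as in the dynamic partition pruning q-tests.

{code}
set hive.execution.engine=spark;
set hive.spark.dynamic.partition.pruning=true;
-- one scan of srcpart_date_hour should feed both pruning sinks (ds and hr)
explain
select count(*)
from srcpart
join srcpart_date_hour
  on srcpart.ds = srcpart_date_hour.ds and srcpart.hr = srcpart_date_hour.hr
where srcpart_date_hour.`date` = '2008-04-08' and srcpart_date_hour.hour = 11;
{code}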



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16943) MoveTask should separate src FileSystem from dest FileSystem

2017-06-22 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16060405#comment-16060405
 ] 

Hive QA commented on HIVE-16943:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12874172/HIVE-16943.1.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 20 failed/errored test(s), 10730 tests 
executed
*Failed tests:*
{noformat}
TestCleaner2 - did not produce a TEST-*.xml file (likely timed out) 
(batchId=258)
TestConvertAstToSearchArg - did not produce a TEST-*.xml file (likely timed 
out) (batchId=258)
TestIOContextMap - did not produce a TEST-*.xml file (likely timed out) 
(batchId=258)
TestInitiator - did not produce a TEST-*.xml file (likely timed out) 
(batchId=258)
TestRecordIdentifier - did not produce a TEST-*.xml file (likely timed out) 
(batchId=258)
TestSearchArgumentImpl - did not produce a TEST-*.xml file (likely timed out) 
(batchId=258)
TestWorker - did not produce a TEST-*.xml file (likely timed out) (batchId=258)
TestWorker2 - did not produce a TEST-*.xml file (likely timed out) (batchId=258)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[tez_smb_main]
 (batchId=150)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr]
 (batchId=146)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=233)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query16] 
(batchId=233)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query94] 
(batchId=233)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[union24] 
(batchId=125)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testBootstrapFunctionReplication
 (batchId=217)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testCreateFunctionIncrementalReplication
 (batchId=217)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testCreateFunctionWithFunctionBinaryJarsOnHDFS
 (batchId=217)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=178)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=178)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=178)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5738/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5738/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5738/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 20 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12874172 - PreCommit-HIVE-Build

> MoveTask should separate src FileSystem from dest FileSystem 
> -
>
> Key: HIVE-16943
> URL: https://issues.apache.org/jira/browse/HIVE-16943
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 3.0.0
>Reporter: Fei Hui
>Assignee: Fei Hui
> Attachments: HIVE-16943.1.patch
>
>
> {code:title=MoveTask.java|borderStyle=solid}
>   private void moveFileInDfs(Path sourcePath, Path targetPath, FileSystem fs)
>       throws HiveException, IOException {
>     // if source exists, rename. Otherwise, create an empty directory
>     if (fs.exists(sourcePath)) {
>       Path deletePath = null;
>       // If multiple levels of folders are there, fs.rename fails, so first
>       // create targetPath.getParent() if it does not exist
>       if (HiveConf.getBoolVar(conf, HiveConf.ConfVars.HIVE_INSERT_INTO_MULTILEVEL_DIRS)) {
>         deletePath = createTargetPath(targetPath, fs);
>       }
>       Hive.clearDestForSubDirSrc(conf, targetPath, sourcePath, false);
>       if (!Hive.moveFile(conf, sourcePath, targetPath, true, false)) {
>         try {
>           if (deletePath != null) {
>             fs.delete(deletePath, true);
>           }
>         } catch (IOException e) {
>           LOG.info("Unable to delete the path created for facilitating rename: "
>               + deletePath);
>         }
>         throw new HiveException("Unable to rename: " + sourcePath
>             + " to: " + targetPath);
>       }
>     } else if (!fs.mkdirs(targetPath)) {
>       throw new HiveException("Unable to make directory: " + targetPath);
>     }
>   }
> {code}
> sourcePath and targetPath may come from different filesystems, so we should 
> resolve a separate FileSystem for each of them.
> I see that HIVE-11568 already did this in Hive.java.
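
The separation the description asks for can be sketched briefly. This is illustrative only, not the attached HIVE-16943.1.patch: {{Path.getFileSystem(Configuration)}} and {{FileUtil.copy}} are real Hadoop APIs, while the surrounding method is a simplified stand-in for the MoveTask change.

{code:java}
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FileUtil;
import org.apache.hadoop.fs.Path;

public class SeparateFsSketch {
  static void move(Configuration conf, Path sourcePath, Path targetPath) throws IOException {
    // Resolve a FileSystem per Path instead of assuming both live on one FS.
    FileSystem srcFs = sourcePath.getFileSystem(conf);
    FileSystem dstFs = targetPath.getFileSystem(conf);
    if (!srcFs.exists(sourcePath)) {
      dstFs.mkdirs(targetPath);                  // keep the "create empty dir" behavior
      return;
    }
    if (srcFs.getUri().equals(dstFs.getUri())) {
      srcFs.rename(sourcePath, targetPath);      // same FS: a cheap rename suffices
    } else {
      // Different FS (e.g. hdfs:// vs s3a://): rename cannot cross filesystems,
      // so copy and then delete the source instead.
      FileUtil.copy(srcFs, sourcePath, dstFs, targetPath, true, conf);
    }
  }
}
{code}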



--

[jira] [Updated] (HIVE-16832) duplicate ROW__ID possible in multi insert into transactional table

2017-06-22 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-16832:
--
Attachment: HIVE-16832.09.patch

> duplicate ROW__ID possible in multi insert into transactional table
> ---
>
> Key: HIVE-16832
> URL: https://issues.apache.org/jira/browse/HIVE-16832
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 2.2.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Critical
> Attachments: HIVE-16832.01.patch, HIVE-16832.03.patch, 
> HIVE-16832.04.patch, HIVE-16832.05.patch, HIVE-16832.06.patch, 
> HIVE-16832.08.patch, HIVE-16832.09.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-11297) Combine op trees for partition info generating tasks [Spark branch]

2017-06-22 Thread liyunzhang_intel (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

liyunzhang_intel updated HIVE-11297:

Attachment: HIVE-11297.8.patch

Some minor changes to spark_partition_pruning.q.out.

> Combine op trees for partition info generating tasks [Spark branch]
> ---
>
> Key: HIVE-11297
> URL: https://issues.apache.org/jira/browse/HIVE-11297
> Project: Hive
>  Issue Type: Bug
>Affects Versions: spark-branch
>Reporter: Chao Sun
>Assignee: liyunzhang_intel
> Attachments: HIVE-11297.1.patch, HIVE-11297.2.patch, 
> HIVE-11297.3.patch, HIVE-11297.4.patch, HIVE-11297.5.patch, 
> HIVE-11297.6.patch, HIVE-11297.7.patch, HIVE-11297.8.patch, hive-site.xml
>
>
> Currently, for dynamic partition pruning in Spark, if a small table generates 
> partition info for more than one partition column, multiple operator trees 
> are created, all starting from the same table scan op but ending in different 
> Spark partition pruning sinks.
> As an optimization, we can combine these op trees so that the table does not 
> have to be scanned multiple times.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16929) User-defined UDF functions can be registered as invariant functions

2017-06-22 Thread ZhangBing Lin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ZhangBing Lin updated HIVE-16929:
-
Status: Patch Available  (was: Open)

> User-defined UDF functions can be registered as invariant functions
> ---
>
> Key: HIVE-16929
> URL: https://issues.apache.org/jira/browse/HIVE-16929
> Project: Hive
>  Issue Type: New Feature
>Affects Versions: 3.0.0
>Reporter: ZhangBing Lin
>Assignee: ZhangBing Lin
> Attachments: HIVE-16929.1.patch, HIVE-16929.2.patch
>
>
> Add a configuration item "hive.aux.udf.package.name.list" in hive-site.xml; 
> Hive scans the jars in the $HIVE_HOME/auxlib/ directory and registers the 
> classes found under the configured package names as constant functions.
> Such as,
> {code:xml}
> <property>
>   <name>hive.aux.udf.package.name.list</name>
>   <value>com.sample.udf,com.test.udf</value>
> </property>
> {code}
> Instructions:
>    1. Upload your jar file to $HIVE_HOME/auxlib.
>    2. Add the package containing your UDF classes to the following 
> configuration parameter:
> {code:xml}
> <property>
>   <name>hive.aux.udf.package.name.list</name>
>   <value>com.sample.udf</value>
> </property>
> {code}
>    3. The configuration item must be placed in the hive-site.xml file.
>    4. Restart the Hive service for the change to take effect.
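
The mechanism described above can be pictured with a short sketch. This is illustrative only, not the attached patch: the {{JarFile}} enumeration is standard Java, but {{registerAsBuiltIn}} is a hypothetical stand-in for whatever registration hook the patch actually uses.

{code:java}
import java.io.File;
import java.util.Enumeration;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

public class AuxUdfScannerSketch {
  // Scan every jar in auxLibDir and collect class names under the packages
  // configured in hive.aux.udf.package.name.list.
  public static void scan(File auxLibDir, String packageList) throws Exception {
    String[] packages = packageList.split(",");        // e.g. "com.sample.udf,com.test.udf"
    File[] jars = auxLibDir.listFiles((dir, name) -> name.endsWith(".jar"));
    if (jars == null) {
      return;                                          // directory missing or unreadable
    }
    for (File jar : jars) {
      try (JarFile jarFile = new JarFile(jar)) {
        for (Enumeration<JarEntry> e = jarFile.entries(); e.hasMoreElements();) {
          String entry = e.nextElement().getName();
          if (!entry.endsWith(".class")) {
            continue;
          }
          // com/sample/udf/MyUdf.class -> com.sample.udf.MyUdf
          String className =
              entry.substring(0, entry.length() - ".class".length()).replace('/', '.');
          for (String pkg : packages) {
            if (className.startsWith(pkg.trim() + ".")) {
              registerAsBuiltIn(className);            // hypothetical registration hook
            }
          }
        }
      }
    }
  }

  private static void registerAsBuiltIn(String className) {
    System.out.println("would register " + className); // placeholder only
  }
}
{code}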



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16929) User-defined UDF functions can be registered as invariant functions

2017-06-22 Thread ZhangBing Lin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ZhangBing Lin updated HIVE-16929:
-
Attachment: HIVE-16929.2.patch

> User-defined UDF functions can be registered as invariant functions
> ---
>
> Key: HIVE-16929
> URL: https://issues.apache.org/jira/browse/HIVE-16929
> Project: Hive
>  Issue Type: New Feature
>Affects Versions: 3.0.0
>Reporter: ZhangBing Lin
>Assignee: ZhangBing Lin
> Attachments: HIVE-16929.1.patch, HIVE-16929.2.patch
>
>
> Add a configuration item "hive.aux.udf.package.name.list" in hive-site.xml; 
> Hive scans the jars in the $HIVE_HOME/auxlib/ directory and registers the 
> classes found under the configured package names as constant functions.
> Such as,
> {code:xml}
> <property>
>   <name>hive.aux.udf.package.name.list</name>
>   <value>com.sample.udf,com.test.udf</value>
> </property>
> {code}
> Instructions:
>    1. Upload your jar file to $HIVE_HOME/auxlib.
>    2. Add the package containing your UDF classes to the following 
> configuration parameter:
> {code:xml}
> <property>
>   <name>hive.aux.udf.package.name.list</name>
>   <value>com.sample.udf</value>
> </property>
> {code}
>    3. The configuration item must be placed in the hive-site.xml file.
>    4. Restart the Hive service for the change to take effect.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16929) User-defined UDF functions can be registered as invariant functions

2017-06-22 Thread ZhangBing Lin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ZhangBing Lin updated HIVE-16929:
-
Attachment: (was: HIVE-16929.2.patch)

> User-defined UDF functions can be registered as invariant functions
> ---
>
> Key: HIVE-16929
> URL: https://issues.apache.org/jira/browse/HIVE-16929
> Project: Hive
>  Issue Type: New Feature
>Affects Versions: 3.0.0
>Reporter: ZhangBing Lin
>Assignee: ZhangBing Lin
> Attachments: HIVE-16929.1.patch
>
>
> Add a configuration item "hive.aux.udf.package.name.list" in hive-site.xml; 
> Hive scans the jars in the $HIVE_HOME/auxlib/ directory and registers the 
> classes found under the configured package names as constant functions.
> Such as,
> {code:xml}
> <property>
>   <name>hive.aux.udf.package.name.list</name>
>   <value>com.sample.udf,com.test.udf</value>
> </property>
> {code}
> Instructions:
>    1. Upload your jar file to $HIVE_HOME/auxlib.
>    2. Add the package containing your UDF classes to the following 
> configuration parameter:
> {code:xml}
> <property>
>   <name>hive.aux.udf.package.name.list</name>
>   <value>com.sample.udf</value>
> </property>
> {code}
>    3. The configuration item must be placed in the hive-site.xml file.
>    4. Restart the Hive service for the change to take effect.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16929) User-defined UDF functions can be registered as invariant functions

2017-06-22 Thread ZhangBing Lin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ZhangBing Lin updated HIVE-16929:
-
Attachment: HIVE-16929.2.patch

> User-defined UDF functions can be registered as invariant functions
> ---
>
> Key: HIVE-16929
> URL: https://issues.apache.org/jira/browse/HIVE-16929
> Project: Hive
>  Issue Type: New Feature
>Affects Versions: 3.0.0
>Reporter: ZhangBing Lin
>Assignee: ZhangBing Lin
> Attachments: HIVE-16929.1.patch, HIVE-16929.2.patch
>
>
> Add a configuration item "hive.aux.udf.package.name.list" in hive-site.xml; 
> Hive scans the jars in the $HIVE_HOME/auxlib/ directory and registers the 
> classes found under the configured package names as constant functions.
> Such as,
> {code:xml}
> <property>
>   <name>hive.aux.udf.package.name.list</name>
>   <value>com.sample.udf,com.test.udf</value>
> </property>
> {code}
> Instructions:
>    1. Upload your jar file to $HIVE_HOME/auxlib.
>    2. Add the package containing your UDF classes to the following 
> configuration parameter:
> {code:xml}
> <property>
>   <name>hive.aux.udf.package.name.list</name>
>   <value>com.sample.udf</value>
> </property>
> {code}
>    3. The configuration item must be placed in the hive-site.xml file.
>    4. Restart the Hive service for the change to take effect.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16929) User-defined UDF functions can be registered as invariant functions

2017-06-22 Thread ZhangBing Lin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ZhangBing Lin updated HIVE-16929:
-
Status: Open  (was: Patch Available)

> User-defined UDF functions can be registered as invariant functions
> ---
>
> Key: HIVE-16929
> URL: https://issues.apache.org/jira/browse/HIVE-16929
> Project: Hive
>  Issue Type: New Feature
>Affects Versions: 3.0.0
>Reporter: ZhangBing Lin
>Assignee: ZhangBing Lin
> Attachments: HIVE-16929.1.patch, HIVE-16929.2.patch
>
>
> Add a configuration item "hive.aux.udf.package.name.list" in hive-site.xml; 
> Hive scans the jars in the $HIVE_HOME/auxlib/ directory and registers the 
> classes found under the configured package names as constant functions.
> Such as,
> {code:xml}
> <property>
>   <name>hive.aux.udf.package.name.list</name>
>   <value>com.sample.udf,com.test.udf</value>
> </property>
> {code}
> Instructions:
>    1. Upload your jar file to $HIVE_HOME/auxlib.
>    2. Add the package containing your UDF classes to the following 
> configuration parameter:
> {code:xml}
> <property>
>   <name>hive.aux.udf.package.name.list</name>
>   <value>com.sample.udf</value>
> </property>
> {code}
>    3. The configuration item must be placed in the hive-site.xml file.
>    4. Restart the Hive service for the change to take effect.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16929) User-defined UDF functions can be registered as invariant functions

2017-06-22 Thread ZhangBing Lin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ZhangBing Lin updated HIVE-16929:
-
Attachment: (was: HIVE-16929.2.patch)

> User-defined UDF functions can be registered as invariant functions
> ---
>
> Key: HIVE-16929
> URL: https://issues.apache.org/jira/browse/HIVE-16929
> Project: Hive
>  Issue Type: New Feature
>Affects Versions: 3.0.0
>Reporter: ZhangBing Lin
>Assignee: ZhangBing Lin
> Attachments: HIVE-16929.1.patch
>
>
> Add a configuration item "hive.aux.udf.package.name.list" in hive-site.xml; 
> Hive scans the jars in the $HIVE_HOME/auxlib/ directory and registers the 
> classes found under the configured package names as constant functions.
> Such as,
> {code:xml}
> <property>
>   <name>hive.aux.udf.package.name.list</name>
>   <value>com.sample.udf,com.test.udf</value>
> </property>
> {code}
> Instructions:
>    1. Upload your jar file to $HIVE_HOME/auxlib.
>    2. Add the package containing your UDF classes to the following 
> configuration parameter:
> {code:xml}
> <property>
>   <name>hive.aux.udf.package.name.list</name>
>   <value>com.sample.udf</value>
> </property>
> {code}
>    3. The configuration item must be placed in the hive-site.xml file.
>    4. Restart the Hive service for the change to take effect.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16943) MoveTask should separate src FileSystem from dest FileSystem

2017-06-22 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16060380#comment-16060380
 ] 

Ashutosh Chauhan commented on HIVE-16943:
-

+1 pending tests

> MoveTask should separate src FileSystem from dest FileSystem 
> -
>
> Key: HIVE-16943
> URL: https://issues.apache.org/jira/browse/HIVE-16943
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 3.0.0
>Reporter: Fei Hui
>Assignee: Fei Hui
> Attachments: HIVE-16943.1.patch
>
>
> {code:title=MoveTask.java|borderStyle=solid}
>   private void moveFileInDfs(Path sourcePath, Path targetPath, FileSystem fs)
>       throws HiveException, IOException {
>     // if source exists, rename. Otherwise, create an empty directory
>     if (fs.exists(sourcePath)) {
>       Path deletePath = null;
>       // If multiple levels of folders are there, fs.rename fails, so first
>       // create targetPath.getParent() if it does not exist
>       if (HiveConf.getBoolVar(conf, HiveConf.ConfVars.HIVE_INSERT_INTO_MULTILEVEL_DIRS)) {
>         deletePath = createTargetPath(targetPath, fs);
>       }
>       Hive.clearDestForSubDirSrc(conf, targetPath, sourcePath, false);
>       if (!Hive.moveFile(conf, sourcePath, targetPath, true, false)) {
>         try {
>           if (deletePath != null) {
>             fs.delete(deletePath, true);
>           }
>         } catch (IOException e) {
>           LOG.info("Unable to delete the path created for facilitating rename: "
>               + deletePath);
>         }
>         throw new HiveException("Unable to rename: " + sourcePath
>             + " to: " + targetPath);
>       }
>     } else if (!fs.mkdirs(targetPath)) {
>       throw new HiveException("Unable to make directory: " + targetPath);
>     }
>   }
> {code}
> sourcePath and targetPath may come from different filesystems, so we should 
> resolve a separate FileSystem for each of them.
> I see that HIVE-11568 already did this in Hive.java.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16948) Invalid explain when running dynamic partition pruning query in HOS

2017-06-22 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16060377#comment-16060377
 ] 

Pengcheng Xiong commented on HIVE-16948:


thanks. :)

> Invalid explain when running dynamic partition pruning query in HOS
> ---
>
> Key: HIVE-16948
> URL: https://issues.apache.org/jira/browse/HIVE-16948
> Project: Hive
>  Issue Type: Bug
>Reporter: liyunzhang_intel
>Assignee: liyunzhang_intel
>
>  union_subquery.q 
> {code}
> set hive.optimize.ppd=true;
> set hive.ppd.remove.duplicatefilters=true;
> set hive.spark.dynamic.partition.pruning=true;
> set hive.optimize.metadataonly=false;
> set hive.optimize.index.filter=true;
> set hive.strict.checks.cartesian.product=false;
> explain select ds from (select distinct(ds) as ds from srcpart union all 
> select distinct(ds) as ds from srcpart) s where s.ds in (select 
> max(srcpart.ds) from srcpart union all select min(srcpart.ds) from srcpart);
> {code}
> explain 
> {code}
> STAGE DEPENDENCIES:
>   Stage-2 is a root stage
>   Stage-1 depends on stages: Stage-2
>   Stage-0 depends on stages: Stage-1
> STAGE PLANS:
>   Stage: Stage-2
> Spark
>   Edges:
> Reducer 11 <- Map 10 (GROUP, 1)
> Reducer 13 <- Map 12 (GROUP, 1)
>   DagName: root_20170622231525_20a777e5-e659-4138-b605-65f8395e18e2:2
>   Vertices:
> Map 10 
> Map Operator Tree:
> TableScan
>   alias: srcpart
>   Statistics: Num rows: 1 Data size: 23248 Basic stats: 
> PARTIAL Column stats: NONE
>   Select Operator
> expressions: ds (type: string)
> outputColumnNames: ds
> Statistics: Num rows: 1 Data size: 23248 Basic stats: 
> PARTIAL Column stats: NONE
> Group By Operator
>   aggregations: max(ds)
>   mode: hash
>   outputColumnNames: _col0
>   Statistics: Num rows: 1 Data size: 184 Basic stats: 
> COMPLETE Column stats: NONE
>   Reduce Output Operator
> sort order: 
> Statistics: Num rows: 1 Data size: 184 Basic stats: 
> COMPLETE Column stats: NONE
> value expressions: _col0 (type: string)
> Map 12 
> Map Operator Tree:
> TableScan
>   alias: srcpart
>   Statistics: Num rows: 1 Data size: 23248 Basic stats: 
> PARTIAL Column stats: NONE
>   Select Operator
> expressions: ds (type: string)
> outputColumnNames: ds
> Statistics: Num rows: 1 Data size: 23248 Basic stats: 
> PARTIAL Column stats: NONE
> Group By Operator
>   aggregations: min(ds)
>   mode: hash
>   outputColumnNames: _col0
>   Statistics: Num rows: 1 Data size: 184 Basic stats: 
> COMPLETE Column stats: NONE
>   Reduce Output Operator
> sort order: 
> Statistics: Num rows: 1 Data size: 184 Basic stats: 
> COMPLETE Column stats: NONE
> value expressions: _col0 (type: string)
> Reducer 11 
> Reduce Operator Tree:
>   Group By Operator
> aggregations: max(VALUE._col0)
> mode: mergepartial
> outputColumnNames: _col0
> Statistics: Num rows: 1 Data size: 184 Basic stats: COMPLETE 
> Column stats: NONE
> Filter Operator
>   predicate: _col0 is not null (type: boolean)
>   Statistics: Num rows: 1 Data size: 184 Basic stats: 
> COMPLETE Column stats: NONE
>   Group By Operator
> keys: _col0 (type: string)
> mode: hash
> outputColumnNames: _col0
> Statistics: Num rows: 2 Data size: 368 Basic stats: 
> COMPLETE Column stats: NONE
> Select Operator
>   expressions: _col0 (type: string)
>   outputColumnNames: _col0
>   Statistics: Num rows: 2 Data size: 368 Basic stats: 
> COMPLETE Column stats: NONE
>   Group By Operator
> keys: _col0 (type: string)
> mode: hash
> outputColumnNames: _col0
> Statistics: Num rows: 2 Data size: 368 Basic stats: 
> COMPLETE Column stats: NONE
> Spark Partition Pruning Sink Operator
>   partition key expr: ds
>  

[jira] [Commented] (HIVE-16832) duplicate ROW__ID possible in multi insert into transactional table

2017-06-22 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16060379#comment-16060379
 ] 

Hive QA commented on HIVE-16832:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12874171/HIVE-16832.08.patch

{color:green}SUCCESS:{color} +1 due to 16 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 76 failed/errored test(s), 10858 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[acid_subquery] 
(batchId=37)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[acid_table_stats] 
(batchId=50)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[autoColumnStats_4] 
(batchId=12)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[insert_values_orig_table_use_metadata]
 (batchId=60)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[lateral_view_explode2] 
(batchId=80)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[lateral_view_noalias] 
(batchId=36)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[masking_7] (batchId=42)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[masking_8] (batchId=7)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[masking_9] (batchId=75)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[masking_acid_no_masking] 
(batchId=22)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[row__id] (batchId=74)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[udtf_stack] (batchId=36)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[acid_bucket_pruning]
 (batchId=140)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[dynamic_semijoin_reduction_3]
 (batchId=159)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[dynpart_sort_optimization_acid]
 (batchId=154)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[explainuser_1]
 (batchId=152)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[lateral_view]
 (batchId=160)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[ptf] 
(batchId=147)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[ptf_streaming]
 (batchId=155)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[sqlmerge] 
(batchId=160)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[subquery_in]
 (batchId=157)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[subquery_notin]
 (batchId=158)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[subquery_scalar]
 (batchId=153)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[tez_smb_main]
 (batchId=150)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr]
 (batchId=146)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[windowing] 
(batchId=155)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_5] 
(batchId=98)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[invalid_cast_from_binary_1]
 (batchId=88)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[udf_assert_true2]
 (batchId=89)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[udf_assert_true] 
(batchId=89)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=233)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query16] 
(batchId=233)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query94] 
(batchId=233)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[lateral_view_explode2]
 (batchId=136)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[union24] 
(batchId=125)
org.apache.hadoop.hive.ql.io.orc.TestInputOutputFormat.testCombinationInputFormatWithAcid
 (batchId=262)
org.apache.hadoop.hive.ql.io.orc.TestOrcRawRecordMerger.testNewBaseAndDelta 
(batchId=262)
org.apache.hadoop.hive.ql.io.orc.TestOrcRawRecordMerger.testRecordReaderIncompleteDelta
 (batchId=262)
org.apache.hadoop.hive.ql.io.orc.TestOrcRawRecordMerger.testRecordReaderNewBaseAndDelta
 (batchId=262)
org.apache.hadoop.hive.ql.io.orc.TestOrcRawRecordMerger.testRecordReaderOldBaseAndDelta
 (batchId=262)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testBootstrapFunctionReplication
 (batchId=217)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testCreateFunctionIncrementalReplication
 (batchId=217)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testCreateFunctionWithFunctionBinaryJarsOnHDFS
 (batchId=217)
org.apache.hadoop.hive.ql.txn.compactor.TestCompactor.majorCompactAfterAbort 
(batchId=215)
org.apache.hadoop.hive.ql.txn.compactor.TestCompactor.majorCompactWhileStreaming
 (batchId=215)
org.apache.hadoop.hive.ql.txn.compactor.TestCompactor.majorCompactWhileStreamingForSplitUpdate
 (batchId=215)

[jira] [Assigned] (HIVE-16948) Invalid explain when running dynamic partition pruning query in HOS

2017-06-22 Thread liyunzhang_intel (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

liyunzhang_intel reassigned HIVE-16948:
---

Assignee: liyunzhang_intel

> Invalid explain when running dynamic partition pruning query in HOS
> ---
>
> Key: HIVE-16948
> URL: https://issues.apache.org/jira/browse/HIVE-16948
> Project: Hive
>  Issue Type: Bug
>Reporter: liyunzhang_intel
>Assignee: liyunzhang_intel
>
>  union_subquery.q 
> {code}
> set hive.optimize.ppd=true;
> set hive.ppd.remove.duplicatefilters=true;
> set hive.spark.dynamic.partition.pruning=true;
> set hive.optimize.metadataonly=false;
> set hive.optimize.index.filter=true;
> set hive.strict.checks.cartesian.product=false;
> explain select ds from (select distinct(ds) as ds from srcpart union all 
> select distinct(ds) as ds from srcpart) s where s.ds in (select 
> max(srcpart.ds) from srcpart union all select min(srcpart.ds) from srcpart);
> {code}
> explain 
> {code}
> STAGE DEPENDENCIES:
>   Stage-2 is a root stage
>   Stage-1 depends on stages: Stage-2
>   Stage-0 depends on stages: Stage-1
> STAGE PLANS:
>   Stage: Stage-2
> Spark
>   Edges:
> Reducer 11 <- Map 10 (GROUP, 1)
> Reducer 13 <- Map 12 (GROUP, 1)
>   DagName: root_20170622231525_20a777e5-e659-4138-b605-65f8395e18e2:2
>   Vertices:
> Map 10 
> Map Operator Tree:
> TableScan
>   alias: srcpart
>   Statistics: Num rows: 1 Data size: 23248 Basic stats: 
> PARTIAL Column stats: NONE
>   Select Operator
> expressions: ds (type: string)
> outputColumnNames: ds
> Statistics: Num rows: 1 Data size: 23248 Basic stats: 
> PARTIAL Column stats: NONE
> Group By Operator
>   aggregations: max(ds)
>   mode: hash
>   outputColumnNames: _col0
>   Statistics: Num rows: 1 Data size: 184 Basic stats: 
> COMPLETE Column stats: NONE
>   Reduce Output Operator
> sort order: 
> Statistics: Num rows: 1 Data size: 184 Basic stats: 
> COMPLETE Column stats: NONE
> value expressions: _col0 (type: string)
> Map 12 
> Map Operator Tree:
> TableScan
>   alias: srcpart
>   Statistics: Num rows: 1 Data size: 23248 Basic stats: 
> PARTIAL Column stats: NONE
>   Select Operator
> expressions: ds (type: string)
> outputColumnNames: ds
> Statistics: Num rows: 1 Data size: 23248 Basic stats: 
> PARTIAL Column stats: NONE
> Group By Operator
>   aggregations: min(ds)
>   mode: hash
>   outputColumnNames: _col0
>   Statistics: Num rows: 1 Data size: 184 Basic stats: 
> COMPLETE Column stats: NONE
>   Reduce Output Operator
> sort order: 
> Statistics: Num rows: 1 Data size: 184 Basic stats: 
> COMPLETE Column stats: NONE
> value expressions: _col0 (type: string)
> Reducer 11 
> Reduce Operator Tree:
>   Group By Operator
> aggregations: max(VALUE._col0)
> mode: mergepartial
> outputColumnNames: _col0
> Statistics: Num rows: 1 Data size: 184 Basic stats: COMPLETE 
> Column stats: NONE
> Filter Operator
>   predicate: _col0 is not null (type: boolean)
>   Statistics: Num rows: 1 Data size: 184 Basic stats: 
> COMPLETE Column stats: NONE
>   Group By Operator
> keys: _col0 (type: string)
> mode: hash
> outputColumnNames: _col0
> Statistics: Num rows: 2 Data size: 368 Basic stats: 
> COMPLETE Column stats: NONE
> Select Operator
>   expressions: _col0 (type: string)
>   outputColumnNames: _col0
>   Statistics: Num rows: 2 Data size: 368 Basic stats: 
> COMPLETE Column stats: NONE
>   Group By Operator
> keys: _col0 (type: string)
> mode: hash
> outputColumnNames: _col0
> Statistics: Num rows: 2 Data size: 368 Basic stats: 
> COMPLETE Column stats: NONE
> Spark Partition Pruning Sink Operator
>   partition key expr: ds
>  

[jira] [Updated] (HIVE-16948) Invalid explain when running dynamic partition pruning query in HOS

2017-06-22 Thread liyunzhang_intel (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

liyunzhang_intel updated HIVE-16948:

Summary: Invalid explain when running dynamic partition pruning query in 
HOS  (was: Invalid explain when running dynamic partition pruning query)

> Invalid explain when running dynamic partition pruning query in HOS
> ---
>
> Key: HIVE-16948
> URL: https://issues.apache.org/jira/browse/HIVE-16948
> Project: Hive
>  Issue Type: Bug
>Reporter: liyunzhang_intel
>
>  union_subquery.q 
> {code}
> set hive.optimize.ppd=true;
> set hive.ppd.remove.duplicatefilters=true;
> set hive.spark.dynamic.partition.pruning=true;
> set hive.optimize.metadataonly=false;
> set hive.optimize.index.filter=true;
> set hive.strict.checks.cartesian.product=false;
> explain select ds from (select distinct(ds) as ds from srcpart union all 
> select distinct(ds) as ds from srcpart) s where s.ds in (select 
> max(srcpart.ds) from srcpart union all select min(srcpart.ds) from srcpart);
> {code}
> explain 
> {code}
> STAGE DEPENDENCIES:
>   Stage-2 is a root stage
>   Stage-1 depends on stages: Stage-2
>   Stage-0 depends on stages: Stage-1
> STAGE PLANS:
>   Stage: Stage-2
> Spark
>   Edges:
> Reducer 11 <- Map 10 (GROUP, 1)
> Reducer 13 <- Map 12 (GROUP, 1)
>   DagName: root_20170622231525_20a777e5-e659-4138-b605-65f8395e18e2:2
>   Vertices:
> Map 10 
> Map Operator Tree:
> TableScan
>   alias: srcpart
>   Statistics: Num rows: 1 Data size: 23248 Basic stats: 
> PARTIAL Column stats: NONE
>   Select Operator
> expressions: ds (type: string)
> outputColumnNames: ds
> Statistics: Num rows: 1 Data size: 23248 Basic stats: 
> PARTIAL Column stats: NONE
> Group By Operator
>   aggregations: max(ds)
>   mode: hash
>   outputColumnNames: _col0
>   Statistics: Num rows: 1 Data size: 184 Basic stats: 
> COMPLETE Column stats: NONE
>   Reduce Output Operator
> sort order: 
> Statistics: Num rows: 1 Data size: 184 Basic stats: 
> COMPLETE Column stats: NONE
> value expressions: _col0 (type: string)
> Map 12 
> Map Operator Tree:
> TableScan
>   alias: srcpart
>   Statistics: Num rows: 1 Data size: 23248 Basic stats: 
> PARTIAL Column stats: NONE
>   Select Operator
> expressions: ds (type: string)
> outputColumnNames: ds
> Statistics: Num rows: 1 Data size: 23248 Basic stats: 
> PARTIAL Column stats: NONE
> Group By Operator
>   aggregations: min(ds)
>   mode: hash
>   outputColumnNames: _col0
>   Statistics: Num rows: 1 Data size: 184 Basic stats: 
> COMPLETE Column stats: NONE
>   Reduce Output Operator
> sort order: 
> Statistics: Num rows: 1 Data size: 184 Basic stats: 
> COMPLETE Column stats: NONE
> value expressions: _col0 (type: string)
> Reducer 11 
> Reduce Operator Tree:
>   Group By Operator
> aggregations: max(VALUE._col0)
> mode: mergepartial
> outputColumnNames: _col0
> Statistics: Num rows: 1 Data size: 184 Basic stats: COMPLETE 
> Column stats: NONE
> Filter Operator
>   predicate: _col0 is not null (type: boolean)
>   Statistics: Num rows: 1 Data size: 184 Basic stats: 
> COMPLETE Column stats: NONE
>   Group By Operator
> keys: _col0 (type: string)
> mode: hash
> outputColumnNames: _col0
> Statistics: Num rows: 2 Data size: 368 Basic stats: 
> COMPLETE Column stats: NONE
> Select Operator
>   expressions: _col0 (type: string)
>   outputColumnNames: _col0
>   Statistics: Num rows: 2 Data size: 368 Basic stats: 
> COMPLETE Column stats: NONE
>   Group By Operator
> keys: _col0 (type: string)
> mode: hash
> outputColumnNames: _col0
> Statistics: Num rows: 2 Data size: 368 Basic stats: 
> COMPLETE Column stats: NONE
> Spark Partition Pruning Sink Operator
>

[jira] [Commented] (HIVE-16948) Invalid explain when running dynamic partition pruning query

2017-06-22 Thread liyunzhang_intel (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16060374#comment-16060374
 ] 

liyunzhang_intel commented on HIVE-16948:
-

[~pxiong]: I found it in HoS; I will modify the description soon.

> Invalid explain when running dynamic partition pruning query
> 
>
> Key: HIVE-16948
> URL: https://issues.apache.org/jira/browse/HIVE-16948
> Project: Hive
>  Issue Type: Bug
>Reporter: liyunzhang_intel
>
>  union_subquery.q 
> {code}
> set hive.optimize.ppd=true;
> set hive.ppd.remove.duplicatefilters=true;
> set hive.spark.dynamic.partition.pruning=true;
> set hive.optimize.metadataonly=false;
> set hive.optimize.index.filter=true;
> set hive.strict.checks.cartesian.product=false;
> explain select ds from (select distinct(ds) as ds from srcpart union all 
> select distinct(ds) as ds from srcpart) s where s.ds in (select 
> max(srcpart.ds) from srcpart union all select min(srcpart.ds) from srcpart);
> {code}
> explain 
> {code}
> STAGE DEPENDENCIES:
>   Stage-2 is a root stage
>   Stage-1 depends on stages: Stage-2
>   Stage-0 depends on stages: Stage-1
> STAGE PLANS:
>   Stage: Stage-2
> Spark
>   Edges:
> Reducer 11 <- Map 10 (GROUP, 1)
> Reducer 13 <- Map 12 (GROUP, 1)
>   DagName: root_20170622231525_20a777e5-e659-4138-b605-65f8395e18e2:2
>   Vertices:
> Map 10 
> Map Operator Tree:
> TableScan
>   alias: srcpart
>   Statistics: Num rows: 1 Data size: 23248 Basic stats: 
> PARTIAL Column stats: NONE
>   Select Operator
> expressions: ds (type: string)
> outputColumnNames: ds
> Statistics: Num rows: 1 Data size: 23248 Basic stats: 
> PARTIAL Column stats: NONE
> Group By Operator
>   aggregations: max(ds)
>   mode: hash
>   outputColumnNames: _col0
>   Statistics: Num rows: 1 Data size: 184 Basic stats: 
> COMPLETE Column stats: NONE
>   Reduce Output Operator
> sort order: 
> Statistics: Num rows: 1 Data size: 184 Basic stats: 
> COMPLETE Column stats: NONE
> value expressions: _col0 (type: string)
> Map 12 
> Map Operator Tree:
> TableScan
>   alias: srcpart
>   Statistics: Num rows: 1 Data size: 23248 Basic stats: 
> PARTIAL Column stats: NONE
>   Select Operator
> expressions: ds (type: string)
> outputColumnNames: ds
> Statistics: Num rows: 1 Data size: 23248 Basic stats: 
> PARTIAL Column stats: NONE
> Group By Operator
>   aggregations: min(ds)
>   mode: hash
>   outputColumnNames: _col0
>   Statistics: Num rows: 1 Data size: 184 Basic stats: 
> COMPLETE Column stats: NONE
>   Reduce Output Operator
> sort order: 
> Statistics: Num rows: 1 Data size: 184 Basic stats: 
> COMPLETE Column stats: NONE
> value expressions: _col0 (type: string)
> Reducer 11 
> Reduce Operator Tree:
>   Group By Operator
> aggregations: max(VALUE._col0)
> mode: mergepartial
> outputColumnNames: _col0
> Statistics: Num rows: 1 Data size: 184 Basic stats: COMPLETE 
> Column stats: NONE
> Filter Operator
>   predicate: _col0 is not null (type: boolean)
>   Statistics: Num rows: 1 Data size: 184 Basic stats: 
> COMPLETE Column stats: NONE
>   Group By Operator
> keys: _col0 (type: string)
> mode: hash
> outputColumnNames: _col0
> Statistics: Num rows: 2 Data size: 368 Basic stats: 
> COMPLETE Column stats: NONE
> Select Operator
>   expressions: _col0 (type: string)
>   outputColumnNames: _col0
>   Statistics: Num rows: 2 Data size: 368 Basic stats: 
> COMPLETE Column stats: NONE
>   Group By Operator
> keys: _col0 (type: string)
> mode: hash
> outputColumnNames: _col0
> Statistics: Num rows: 2 Data size: 368 Basic stats: 
> COMPLETE Column stats: NONE
> Spark Partition Pruning Sink Operator
>   partition key expr: ds
> 

[jira] [Commented] (HIVE-16948) Invalid explain when running dynamic partition pruning query

2017-06-22 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16060371#comment-16060371
 ] 

Pengcheng Xiong commented on HIVE-16948:


HoS? Hive on Spark?

> Invalid explain when running dynamic partition pruning query
> 
>
> Key: HIVE-16948
> URL: https://issues.apache.org/jira/browse/HIVE-16948
> Project: Hive
>  Issue Type: Bug
>Reporter: liyunzhang_intel
>
>  union_subquery.q 
> {code}
> set hive.optimize.ppd=true;
> set hive.ppd.remove.duplicatefilters=true;
> set hive.spark.dynamic.partition.pruning=true;
> set hive.optimize.metadataonly=false;
> set hive.optimize.index.filter=true;
> set hive.strict.checks.cartesian.product=false;
> explain select ds from (select distinct(ds) as ds from srcpart union all 
> select distinct(ds) as ds from srcpart) s where s.ds in (select 
> max(srcpart.ds) from srcpart union all select min(srcpart.ds) from srcpart);
> {code}
> explain 
> {code}
> STAGE DEPENDENCIES:
>   Stage-2 is a root stage
>   Stage-1 depends on stages: Stage-2
>   Stage-0 depends on stages: Stage-1
> STAGE PLANS:
>   Stage: Stage-2
> Spark
>   Edges:
> Reducer 11 <- Map 10 (GROUP, 1)
> Reducer 13 <- Map 12 (GROUP, 1)
>   DagName: root_20170622231525_20a777e5-e659-4138-b605-65f8395e18e2:2
>   Vertices:
> Map 10 
> Map Operator Tree:
> TableScan
>   alias: srcpart
>   Statistics: Num rows: 1 Data size: 23248 Basic stats: 
> PARTIAL Column stats: NONE
>   Select Operator
> expressions: ds (type: string)
> outputColumnNames: ds
> Statistics: Num rows: 1 Data size: 23248 Basic stats: 
> PARTIAL Column stats: NONE
> Group By Operator
>   aggregations: max(ds)
>   mode: hash
>   outputColumnNames: _col0
>   Statistics: Num rows: 1 Data size: 184 Basic stats: 
> COMPLETE Column stats: NONE
>   Reduce Output Operator
> sort order: 
> Statistics: Num rows: 1 Data size: 184 Basic stats: 
> COMPLETE Column stats: NONE
> value expressions: _col0 (type: string)
> Map 12 
> Map Operator Tree:
> TableScan
>   alias: srcpart
>   Statistics: Num rows: 1 Data size: 23248 Basic stats: 
> PARTIAL Column stats: NONE
>   Select Operator
> expressions: ds (type: string)
> outputColumnNames: ds
> Statistics: Num rows: 1 Data size: 23248 Basic stats: 
> PARTIAL Column stats: NONE
> Group By Operator
>   aggregations: min(ds)
>   mode: hash
>   outputColumnNames: _col0
>   Statistics: Num rows: 1 Data size: 184 Basic stats: 
> COMPLETE Column stats: NONE
>   Reduce Output Operator
> sort order: 
> Statistics: Num rows: 1 Data size: 184 Basic stats: 
> COMPLETE Column stats: NONE
> value expressions: _col0 (type: string)
> Reducer 11 
> Reduce Operator Tree:
>   Group By Operator
> aggregations: max(VALUE._col0)
> mode: mergepartial
> outputColumnNames: _col0
> Statistics: Num rows: 1 Data size: 184 Basic stats: COMPLETE 
> Column stats: NONE
> Filter Operator
>   predicate: _col0 is not null (type: boolean)
>   Statistics: Num rows: 1 Data size: 184 Basic stats: 
> COMPLETE Column stats: NONE
>   Group By Operator
> keys: _col0 (type: string)
> mode: hash
> outputColumnNames: _col0
> Statistics: Num rows: 2 Data size: 368 Basic stats: 
> COMPLETE Column stats: NONE
> Select Operator
>   expressions: _col0 (type: string)
>   outputColumnNames: _col0
>   Statistics: Num rows: 2 Data size: 368 Basic stats: 
> COMPLETE Column stats: NONE
>   Group By Operator
> keys: _col0 (type: string)
> mode: hash
> outputColumnNames: _col0
> Statistics: Num rows: 2 Data size: 368 Basic stats: 
> COMPLETE Column stats: NONE
> Spark Partition Pruning Sink Operator
>   partition key expr: ds
>   Statistics: Num rows: 2 Data 

[jira] [Commented] (HIVE-11297) Combine op trees for partition info generating tasks [Spark branch]

2017-06-22 Thread liyunzhang_intel (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16060369#comment-16060369
 ] 

liyunzhang_intel commented on HIVE-11297:
-

[~csun]: for the second query you mentioned in RB, I filed HIVE-16948 to track it.

> Combine op trees for partition info generating tasks [Spark branch]
> ---
>
> Key: HIVE-11297
> URL: https://issues.apache.org/jira/browse/HIVE-11297
> Project: Hive
>  Issue Type: Bug
>Affects Versions: spark-branch
>Reporter: Chao Sun
>Assignee: liyunzhang_intel
> Attachments: HIVE-11297.1.patch, HIVE-11297.2.patch, 
> HIVE-11297.3.patch, HIVE-11297.4.patch, HIVE-11297.5.patch, 
> HIVE-11297.6.patch, HIVE-11297.7.patch, hive-site.xml
>
>
> Currently, for dynamic partition pruning in Spark, if a small table generates 
> partition info for more than one partition column, multiple operator trees 
> are created, all starting from the same table scan op but ending in different 
> Spark partition pruning sinks.
> As an optimization, we can combine these op trees so that the table does not 
> have to be scanned multiple times.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16947) Semijoin Reduction : Task cycle created due to multiple semijoins in conjunction with hashjoin

2017-06-22 Thread Deepak Jaiswal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deepak Jaiswal updated HIVE-16947:
--
Status: Patch Available  (was: In Progress)

> Semijoin Reduction : Task cycle created due to multiple semijoins in 
> conjunction with hashjoin
> --
>
> Key: HIVE-16947
> URL: https://issues.apache.org/jira/browse/HIVE-16947
> Project: Hive
>  Issue Type: Bug
>Reporter: Deepak Jaiswal
>Assignee: Deepak Jaiswal
> Attachments: HIVE-16947.1.patch
>
>
> Typically, a semijoin branch and a mapjoin may create a cycle when they are on 
> the same operator tree. This is already handled; however, a semijoin branch can 
> serve more than one filter, and the cycle detection logic currently handles only 
> the first one, causing cycles that prevent the queries from running.
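
The fix direction can be sketched abstractly. This is illustrative only, not the attached patch: the string-keyed graph below is a stand-in for Hive's actual task/operator structures; the point is that every filter target served by the semijoin branch must be checked, not just the first.

{code:java}
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class CycleCheckSketch {
  // Would adding edges source -> t (for every filter target t served by the
  // semijoin branch) close a cycle in the existing dependency graph?
  static boolean createsCycle(Map<String, List<String>> edges, String source,
      List<String> targets) {
    for (String target : targets) {       // check ALL targets, not only targets.get(0)
      if (reachable(edges, target, source, new HashSet<>())) {
        return true;                      // target already reaches source: new edge closes a cycle
      }
    }
    return false;
  }

  static boolean reachable(Map<String, List<String>> edges, String from, String to,
      Set<String> seen) {
    if (from.equals(to)) {
      return true;
    }
    if (!seen.add(from)) {
      return false;                       // already visited on this search
    }
    for (String next : edges.getOrDefault(from, new ArrayList<>())) {
      if (reachable(edges, next, to, seen)) {
        return true;
      }
    }
    return false;
  }
}
{code}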



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16947) Semijoin Reduction : Task cycle created due to multiple semijoins in conjunction with hashjoin

2017-06-22 Thread Deepak Jaiswal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deepak Jaiswal updated HIVE-16947:
--
Attachment: HIVE-16947.1.patch

Initial patch

> Semijoin Reduction : Task cycle created due to multiple semijoins in 
> conjunction with hashjoin
> --
>
> Key: HIVE-16947
> URL: https://issues.apache.org/jira/browse/HIVE-16947
> Project: Hive
>  Issue Type: Bug
>Reporter: Deepak Jaiswal
>Assignee: Deepak Jaiswal
> Attachments: HIVE-16947.1.patch
>
>
> Typically, a semijoin branch and a mapjoin may create a cycle when they are on 
> the same operator tree. This is already handled; however, a semijoin branch can 
> serve more than one filter, and the cycle detection logic currently handles only 
> the first one, causing cycles that prevent the queries from running.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Work started] (HIVE-16947) Semijoin Reduction : Task cycle created due to multiple semijoins in conjunction with hashjoin

2017-06-22 Thread Deepak Jaiswal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-16947 started by Deepak Jaiswal.
-
> Semijoin Reduction : Task cycle created due to multiple semijoins in 
> conjunction with hashjoin
> --
>
> Key: HIVE-16947
> URL: https://issues.apache.org/jira/browse/HIVE-16947
> Project: Hive
>  Issue Type: Bug
>Reporter: Deepak Jaiswal
>Assignee: Deepak Jaiswal
>
> Typically, a semijoin branch and a mapjoin may create a cycle when they are on 
> the same operator tree. This is already handled; however, a semijoin branch can 
> serve more than one filter, and the cycle detection logic currently handles only 
> the first one, causing cycles that prevent the queries from running.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (HIVE-16947) Semijoin Reduction : Task cycle created due to multiple semijoins in conjunction with hashjoin

2017-06-22 Thread Deepak Jaiswal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deepak Jaiswal reassigned HIVE-16947:
-


> Semijoin Reduction : Task cycle created due to multiple semijoins in 
> conjunction with hashjoin
> --
>
> Key: HIVE-16947
> URL: https://issues.apache.org/jira/browse/HIVE-16947
> Project: Hive
>  Issue Type: Bug
>Reporter: Deepak Jaiswal
>Assignee: Deepak Jaiswal
>
> Typically, a semijoin branch and a mapjoin may create a cycle when they are on 
> the same operator tree. This is already handled; however, a semijoin branch can 
> serve more than one filter, and the cycle detection logic currently handles only 
> the first one, causing cycles that prevent the queries from running.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16929) User-defined UDF functions can be registered as invariant functions

2017-06-22 Thread ZhangBing Lin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ZhangBing Lin updated HIVE-16929:
-
Attachment: HIVE-16929.2.patch

> User-defined UDF functions can be registered as invariant functions
> ---
>
> Key: HIVE-16929
> URL: https://issues.apache.org/jira/browse/HIVE-16929
> Project: Hive
>  Issue Type: New Feature
>Affects Versions: 3.0.0
>Reporter: ZhangBing Lin
>Assignee: ZhangBing Lin
> Attachments: HIVE-16929.1.patch, HIVE-16929.2.patch
>
>
> Add a configuration item "hive.aux.udf.package.name.list" in hive-site.xml; 
> Hive scans the jars in the $HIVE_HOME/auxlib/ directory and registers the 
> classes found under the configured package names as constant functions.
> Such as,
> {code:xml}
> <property>
>   <name>hive.aux.udf.package.name.list</name>
>   <value>com.sample.udf,com.test.udf</value>
> </property>
> {code}
> Instructions:
>    1. Upload your jar file to $HIVE_HOME/auxlib.
>    2. Add the package containing your UDF classes to the following 
> configuration parameter:
> {code:xml}
> <property>
>   <name>hive.aux.udf.package.name.list</name>
>   <value>com.sample.udf</value>
> </property>
> {code}
>    3. The configuration item must be placed in the hive-site.xml file.
>    4. Restart the Hive service for the change to take effect.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16929) User-defined UDF functions can be registered as invariant functions

2017-06-22 Thread ZhangBing Lin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ZhangBing Lin updated HIVE-16929:
-
Status: Patch Available  (was: Open)

> User-defined UDF functions can be registered as invariant functions
> ---
>
> Key: HIVE-16929
> URL: https://issues.apache.org/jira/browse/HIVE-16929
> Project: Hive
>  Issue Type: New Feature
>Affects Versions: 3.0.0
>Reporter: ZhangBing Lin
>Assignee: ZhangBing Lin
> Attachments: HIVE-16929.1.patch, HIVE-16929.2.patch
>
>
> Add a configuration item "hive.aux.udf.package.name.list" in hive-site.xml; 
> Hive scans the jars in the $HIVE_HOME/auxlib/ directory and registers the 
> classes found under the configured package names as constant functions.
> Such as,
> {code:xml}
> <property>
>   <name>hive.aux.udf.package.name.list</name>
>   <value>com.sample.udf,com.test.udf</value>
> </property>
> {code}
> Instructions:
>    1. Upload your jar file to $HIVE_HOME/auxlib.
>    2. Add the package containing your UDF classes to the following 
> configuration parameter:
> {code:xml}
> <property>
>   <name>hive.aux.udf.package.name.list</name>
>   <value>com.sample.udf</value>
> </property>
> {code}
>    3. The configuration item must be placed in the hive-site.xml file.
>    4. Restart the Hive service for the change to take effect.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16929) User-defined UDF functions can be registered as invariant functions

2017-06-22 Thread ZhangBing Lin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ZhangBing Lin updated HIVE-16929:
-
Status: Open  (was: Patch Available)

> User-defined UDF functions can be registered as invariant functions
> ---
>
> Key: HIVE-16929
> URL: https://issues.apache.org/jira/browse/HIVE-16929
> Project: Hive
>  Issue Type: New Feature
>Affects Versions: 3.0.0
>Reporter: ZhangBing Lin
>Assignee: ZhangBing Lin
> Attachments: HIVE-16929.1.patch, HIVE-16929.2.patch
>
>
> Add a configuration item "hive.aux.udf.package.name.list" in hive-site.xml; 
> Hive scans the jars in the $HIVE_HOME/auxlib/ directory and registers the 
> classes found under the configured package names as constant functions.
> Such as,
> {code:xml}
> <property>
>   <name>hive.aux.udf.package.name.list</name>
>   <value>com.sample.udf,com.test.udf</value>
> </property>
> {code}
> Instructions:
>    1. Upload your jar file to $HIVE_HOME/auxlib.
>    2. Add the package containing your UDF classes to the following 
> configuration parameter:
> {code:xml}
> <property>
>   <name>hive.aux.udf.package.name.list</name>
>   <value>com.sample.udf</value>
> </property>
> {code}
>    3. The configuration item must be placed in the hive-site.xml file.
>    4. Restart the Hive service for the change to take effect.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-11297) Combine op trees for partition info generating tasks [Spark branch]

2017-06-22 Thread liyunzhang_intel (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

liyunzhang_intel updated HIVE-11297:

Attachment: hive-site.xml

> Combine op trees for partition info generating tasks [Spark branch]
> ---
>
> Key: HIVE-11297
> URL: https://issues.apache.org/jira/browse/HIVE-11297
> Project: Hive
>  Issue Type: Bug
>Affects Versions: spark-branch
>Reporter: Chao Sun
>Assignee: liyunzhang_intel
> Attachments: HIVE-11297.1.patch, HIVE-11297.2.patch, 
> HIVE-11297.3.patch, HIVE-11297.4.patch, HIVE-11297.5.patch, 
> HIVE-11297.6.patch, HIVE-11297.7.patch, hive-site.xml
>
>
> Currently, for dynamic partition pruning in Spark, if a small table generates 
> partition info for more than one partition column, multiple operator trees 
> are created, all starting from the same table scan op but ending in different 
> Spark partition pruning sinks.
> As an optimization, we can combine these op trees so that the table does not 
> have to be scanned multiple times.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-11297) Combine op trees for partition info generating tasks [Spark branch]

2017-06-22 Thread liyunzhang_intel (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16060336#comment-16060336
 ] 

liyunzhang_intel commented on HIVE-11297:
-

[~csun]: about the questions you mentioned in RB, the two queries are 
different.
Explain for query1 (please use the attached hive-site.xml to verify; without the 
configuration in hive-site.xml, I cannot reproduce the following explain):
{code}
set hive.execution.engine=spark; 
set hive.spark.dynamic.partition.pruning=true;
set hive.optimize.ppd=true;
set hive.ppd.remove.duplicatefilters=true;
set hive.optimize.metadataonly=false;
set hive.optimize.index.filter=true;
set hive.strict.checks.cartesian.product=false;
explain select count(*) from srcpart join srcpart_date on (srcpart.ds = 
srcpart_date.ds) join srcpart_hour on (srcpart.hr = srcpart_hour.hr) 
where srcpart_date.`date` = '2008-04-08' and srcpart_hour.hour = 11 and 
srcpart.hr = 11
{code}
previous explain:
{code}
STAGE DEPENDENCIES:
  Stage-2 is a root stage
  Stage-1 depends on stages: Stage-2
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-2
Spark
  DagName: root_20170622213734_eb4c35e8-952a-4c4d-8972-ba5381bf51a3:2
  Vertices:
Map 7 
Map Operator Tree:
TableScan
  alias: srcpart_date
  filterExpr: ((date = '2008-04-08') and ds is not null) (type: 
boolean)
  Statistics: Num rows: 2 Data size: 42 Basic stats: COMPLETE 
Column stats: NONE
  Filter Operator
predicate: ((date = '2008-04-08') and ds is not null) 
(type: boolean)
Statistics: Num rows: 1 Data size: 21 Basic stats: COMPLETE 
Column stats: NONE
Select Operator
  expressions: ds (type: string)
  outputColumnNames: _col0
  Statistics: Num rows: 1 Data size: 21 Basic stats: 
COMPLETE Column stats: NONE
  Select Operator
expressions: _col0 (type: string)
outputColumnNames: _col0
Statistics: Num rows: 1 Data size: 21 Basic stats: 
COMPLETE Column stats: NONE
Group By Operator
  keys: _col0 (type: string)
  mode: hash
  outputColumnNames: _col0
  Statistics: Num rows: 1 Data size: 21 Basic stats: 
COMPLETE Column stats: NONE
  Spark Partition Pruning Sink Operator
partition key expr: ds
Statistics: Num rows: 1 Data size: 21 Basic stats: 
COMPLETE Column stats: NONE
target column name: ds
target work: Map 1

  Stage: Stage-1
Spark
  Edges:
Reducer 2 <- Map 1 (PARTITION-LEVEL SORT, 2), Map 5 (PARTITION-LEVEL 
SORT, 2)
Reducer 3 <- Map 6 (PARTITION-LEVEL SORT, 2), Reducer 2 
(PARTITION-LEVEL SORT, 2)
Reducer 4 <- Reducer 3 (GROUP, 1)
  DagName: root_20170622213734_eb4c35e8-952a-4c4d-8972-ba5381bf51a3:1
  Vertices:
Map 1 
Map Operator Tree:
TableScan
  alias: srcpart
  Statistics: Num rows: 1 Data size: 11624 Basic stats: PARTIAL 
Column stats: NONE
  Select Operator
expressions: ds (type: string), hr (type: string)
outputColumnNames: _col0, _col1
Statistics: Num rows: 1 Data size: 11624 Basic stats: 
PARTIAL Column stats: NONE
Reduce Output Operator
  key expressions: _col0 (type: string)
  sort order: +
  Map-reduce partition columns: _col0 (type: string)
  Statistics: Num rows: 1 Data size: 11624 Basic stats: 
PARTIAL Column stats: NONE
  value expressions: _col1 (type: string)
Map 5 
Map Operator Tree:
TableScan
  alias: srcpart_date
  filterExpr: ((date = '2008-04-08') and ds is not null) (type: 
boolean)
  Statistics: Num rows: 2 Data size: 42 Basic stats: COMPLETE 
Column stats: NONE
  Filter Operator
predicate: ((date = '2008-04-08') and ds is not null) 
(type: boolean)
Statistics: Num rows: 1 Data size: 21 Basic stats: COMPLETE 
Column stats: NONE
Select Operator
  expressions: ds (type: string)
  outputColumnNames: _col0
  Statistics: Num rows: 1 Data size: 21 Basic stats: 
COMPLETE Column stats: NONE
  Reduce Output Operator
key expressions: _col0 (type: string)
sort order: +
 

[jira] [Updated] (HIVE-16943) MoveTask should separate src FileSystem from dest FileSystem

2017-06-22 Thread Fei Hui (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fei Hui updated HIVE-16943:
---
Status: Patch Available  (was: Open)

> MoveTask should separate src FileSystem from dest FileSystem 
> -
>
> Key: HIVE-16943
> URL: https://issues.apache.org/jira/browse/HIVE-16943
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 3.0.0
>Reporter: Fei Hui
>Assignee: Fei Hui
> Attachments: HIVE-16943.1.patch
>
>
> {code:title=MoveTask.java|borderStyle=solid} 
>  private void moveFileInDfs (Path sourcePath, Path targetPath, FileSystem fs)
>   throws HiveException, IOException {
> // if source exists, rename. Otherwise, create an empty directory
> if (fs.exists(sourcePath)) {
>   Path deletePath = null;
>   // If there are multiple levels of folders, fs.rename fails, so first
>   // create targetPath.getParent() if it does not exist
>   if (HiveConf.getBoolVar(conf, 
> HiveConf.ConfVars.HIVE_INSERT_INTO_MULTILEVEL_DIRS)) {
> deletePath = createTargetPath(targetPath, fs);
>   }
>   Hive.clearDestForSubDirSrc(conf, targetPath, sourcePath, false);
>   if (!Hive.moveFile(conf, sourcePath, targetPath, true, false)) {
> try {
>   if (deletePath != null) {
> fs.delete(deletePath, true);
>   }
> } catch (IOException e) {
>   LOG.info("Unable to delete the path created for facilitating rename: "
>   + deletePath);
> }
> throw new HiveException("Unable to rename: " + sourcePath
> + " to: " + targetPath);
>   }
> } else if (!fs.mkdirs(targetPath)) {
>   throw new HiveException("Unable to make directory: " + targetPath);
> }
>   }
> {code}
> sourcePath and targetPath may come from different filesystems, so we should 
> resolve each from its own path rather than sharing one handle.
> I see that HIVE-11568 already did this in Hive.java.
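>
> A minimal sketch of that direction (an assumption about the fix's shape, not 
> the actual patch; the class and method names are hypothetical):
> {code:java}
> import java.io.IOException;
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.fs.FileSystem;
> import org.apache.hadoop.fs.FileUtil;
> import org.apache.hadoop.fs.Path;
>
> public class MoveSketch {
>   // Resolve each FileSystem from its own path instead of sharing the
>   // single fs handle that moveFileInDfs currently receives.
>   static void moveAcrossFileSystems(Path sourcePath, Path targetPath,
>       Configuration conf) throws IOException {
>     FileSystem srcFs = sourcePath.getFileSystem(conf);
>     FileSystem dstFs = targetPath.getFileSystem(conf);
>     if (srcFs.exists(sourcePath)) {
>       if (srcFs.getUri().equals(dstFs.getUri())) {
>         if (!srcFs.rename(sourcePath, targetPath)) {
>           throw new IOException("Unable to rename: " + sourcePath);
>         }
>       } else {
>         // different filesystems: copy, then delete the source
>         if (!FileUtil.copy(srcFs, sourcePath, dstFs, targetPath,
>             true /* deleteSource */, conf)) {
>           throw new IOException("Unable to copy: " + sourcePath);
>         }
>       }
>     } else if (!dstFs.mkdirs(targetPath)) {
>       throw new IOException("Unable to make directory: " + targetPath);
>     }
>   }
> }
> {code}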



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16943) MoveTask should separate src FileSystem from dest FileSystem

2017-06-22 Thread Fei Hui (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fei Hui updated HIVE-16943:
---
Attachment: HIVE-16943.1.patch

> MoveTask should separate src FileSystem from dest FileSystem 
> -
>
> Key: HIVE-16943
> URL: https://issues.apache.org/jira/browse/HIVE-16943
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 3.0.0
>Reporter: Fei Hui
>Assignee: Fei Hui
> Attachments: HIVE-16943.1.patch
>
>
> {code:title=MoveTask.java|borderStyle=solid} 
>  private void moveFileInDfs (Path sourcePath, Path targetPath, FileSystem fs)
>   throws HiveException, IOException {
> // if source exists, rename. Otherwise, create an empty directory
> if (fs.exists(sourcePath)) {
>   Path deletePath = null;
>   // If there are multiple levels of folders, fs.rename fails, so first
>   // create targetPath.getParent() if it does not exist
>   if (HiveConf.getBoolVar(conf, 
> HiveConf.ConfVars.HIVE_INSERT_INTO_MULTILEVEL_DIRS)) {
> deletePath = createTargetPath(targetPath, fs);
>   }
>   Hive.clearDestForSubDirSrc(conf, targetPath, sourcePath, false);
>   if (!Hive.moveFile(conf, sourcePath, targetPath, true, false)) {
> try {
>   if (deletePath != null) {
> fs.delete(deletePath, true);
>   }
> } catch (IOException e) {
>   LOG.info("Unable to delete the path created for facilitating rename: "
>   + deletePath);
> }
> throw new HiveException("Unable to rename: " + sourcePath
> + " to: " + targetPath);
>   }
> } else if (!fs.mkdirs(targetPath)) {
>   throw new HiveException("Unable to make directory: " + targetPath);
> }
>   }
> {code}
> sourcePath and targetPath may come from different filesystems, so we should 
> resolve each from its own path rather than sharing one handle.
> I see that HIVE-11568 already did this in Hive.java.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16943) MoveTask should separate src FileSystem from dest FileSystem

2017-06-22 Thread Fei Hui (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fei Hui updated HIVE-16943:
---
Attachment: (was: HIVE-16943.patch)

> MoveTask should separate src FileSystem from dest FileSystem 
> -
>
> Key: HIVE-16943
> URL: https://issues.apache.org/jira/browse/HIVE-16943
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 3.0.0
>Reporter: Fei Hui
>Assignee: Fei Hui
> Attachments: HIVE-16943.1.patch
>
>
> {code:title=MoveTask.java|borderStyle=solid} 
>  private void moveFileInDfs (Path sourcePath, Path targetPath, FileSystem fs)
>   throws HiveException, IOException {
> // if source exists, rename. Otherwise, create an empty directory
> if (fs.exists(sourcePath)) {
>   Path deletePath = null;
>   // If there are multiple levels of folders, fs.rename fails, so first
>   // create targetPath.getParent() if it does not exist
>   if (HiveConf.getBoolVar(conf, 
> HiveConf.ConfVars.HIVE_INSERT_INTO_MULTILEVEL_DIRS)) {
> deletePath = createTargetPath(targetPath, fs);
>   }
>   Hive.clearDestForSubDirSrc(conf, targetPath, sourcePath, false);
>   if (!Hive.moveFile(conf, sourcePath, targetPath, true, false)) {
> try {
>   if (deletePath != null) {
> fs.delete(deletePath, true);
>   }
> } catch (IOException e) {
>   LOG.info("Unable to delete the path created for facilitating rename: "
>   + deletePath);
> }
> throw new HiveException("Unable to rename: " + sourcePath
> + " to: " + targetPath);
>   }
> } else if (!fs.mkdirs(targetPath)) {
>   throw new HiveException("Unable to make directory: " + targetPath);
> }
>   }
> {code}
> sourcePath and targetPath may come from different filesystems, so we should 
> resolve each from its own path rather than sharing one handle.
> I see that HIVE-11568 already did this in Hive.java.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16832) duplicate ROW__ID possible in multi insert into transactional table

2017-06-22 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-16832:
--
Attachment: HIVE-16832.08.patch

> duplicate ROW__ID possible in multi insert into transactional table
> ---
>
> Key: HIVE-16832
> URL: https://issues.apache.org/jira/browse/HIVE-16832
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 2.2.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Critical
> Attachments: HIVE-16832.01.patch, HIVE-16832.03.patch, 
> HIVE-16832.04.patch, HIVE-16832.05.patch, HIVE-16832.06.patch, 
> HIVE-16832.08.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16943) MoveTask should separate src FileSystem from dest FileSystem

2017-06-22 Thread Fei Hui (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16060249#comment-16060249
 ] 

Fei Hui commented on HIVE-16943:


CC [~alangates] [~Ferd]

> MoveTask should separate src FileSystem from dest FileSystem 
> -
>
> Key: HIVE-16943
> URL: https://issues.apache.org/jira/browse/HIVE-16943
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 3.0.0
>Reporter: Fei Hui
>Assignee: Fei Hui
> Attachments: HIVE-16943.patch
>
>
> {code:title=MoveTask.java|borderStyle=solid} 
>  private void moveFileInDfs (Path sourcePath, Path targetPath, FileSystem fs)
>   throws HiveException, IOException {
> // if source exists, rename. Otherwise, create an empty directory
> if (fs.exists(sourcePath)) {
>   Path deletePath = null;
>   // If there are multiple levels of folders, fs.rename fails, so first
>   // create targetPath.getParent() if it does not exist
>   if (HiveConf.getBoolVar(conf, 
> HiveConf.ConfVars.HIVE_INSERT_INTO_MULTILEVEL_DIRS)) {
> deletePath = createTargetPath(targetPath, fs);
>   }
>   Hive.clearDestForSubDirSrc(conf, targetPath, sourcePath, false);
>   if (!Hive.moveFile(conf, sourcePath, targetPath, true, false)) {
> try {
>   if (deletePath != null) {
> fs.delete(deletePath, true);
>   }
> } catch (IOException e) {
>   LOG.info("Unable to delete the path created for facilitating rename: "
>   + deletePath);
> }
> throw new HiveException("Unable to rename: " + sourcePath
> + " to: " + targetPath);
>   }
> } else if (!fs.mkdirs(targetPath)) {
>   throw new HiveException("Unable to make directory: " + targetPath);
> }
>   }
> {code}
> sourcePath and targetPath may come from different filesystems, so we should 
> resolve each from its own path rather than sharing one handle.
> I see that HIVE-11568 already did this in Hive.java.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16589) Vectorization: Support Complex Types and GroupBy modes PARTIAL2, FINAL, and COMPLETE for AVG, VARIANCE

2017-06-22 Thread Matt McCline (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16060201#comment-16060201
 ] 

Matt McCline commented on HIVE-16589:
-

Committed to master.

> Vectorization: Support Complex Types and GroupBy modes PARTIAL2, FINAL, and 
> COMPLETE  for AVG, VARIANCE
> ---
>
> Key: HIVE-16589
> URL: https://issues.apache.org/jira/browse/HIVE-16589
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Fix For: 3.0.0
>
> Attachments: HIVE-16589.01.patch, HIVE-16589.02.patch, 
> HIVE-16589.03.patch, HIVE-16589.04.patch, HIVE-16589.05.patch, 
> HIVE-16589.06.patch, HIVE-16589.07.patch, HIVE-16589.08.patch, 
> HIVE-16589.091.patch, HIVE-16589.092.patch, HIVE-16589.093.patch, 
> HIVE-16589.094.patch, HIVE-16589.095.patch, HIVE-16589.096.patch, 
> HIVE-16589.097.patch, HIVE-16589.098.patch, HIVE-16589.0991.patch, 
> HIVE-16589.0992.patch, HIVE-16589.0993.patch, HIVE-16589.0994.patch, 
> HIVE-16589.0995.patch, HIVE-16589.099.patch, HIVE-16589.09.patch
>
>
> Allow Complex Types to be vectorized (since HIVE-16207: "Add support for 
> Complex Types in Fast SerDe" was committed).
> Add more classes in which we vectorize AVG, in preparation for fully 
> supporting AVG GroupBy.  In particular, the PARTIAL2 and FINAL GroupBy modes 
> take the AVG struct as input, and the new COMPLETE mode takes in the 
> original data and produces the full aggregation for completeness, so to speak.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16589) Vectorization: Support Complex Types and GroupBy modes PARTIAL2, FINAL, and COMPLETE for AVG, VARIANCE

2017-06-22 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-16589:

Resolution: Fixed
Status: Resolved  (was: Patch Available)

> Vectorization: Support Complex Types and GroupBy modes PARTIAL2, FINAL, and 
> COMPLETE  for AVG, VARIANCE
> ---
>
> Key: HIVE-16589
> URL: https://issues.apache.org/jira/browse/HIVE-16589
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-16589.01.patch, HIVE-16589.02.patch, 
> HIVE-16589.03.patch, HIVE-16589.04.patch, HIVE-16589.05.patch, 
> HIVE-16589.06.patch, HIVE-16589.07.patch, HIVE-16589.08.patch, 
> HIVE-16589.091.patch, HIVE-16589.092.patch, HIVE-16589.093.patch, 
> HIVE-16589.094.patch, HIVE-16589.095.patch, HIVE-16589.096.patch, 
> HIVE-16589.097.patch, HIVE-16589.098.patch, HIVE-16589.0991.patch, 
> HIVE-16589.0992.patch, HIVE-16589.0993.patch, HIVE-16589.0994.patch, 
> HIVE-16589.0995.patch, HIVE-16589.099.patch, HIVE-16589.09.patch
>
>
> Allow Complex Types to be vectorized (since HIVE-16207: "Add support for 
> Complex Types in Fast SerDe" was committed).
> Add more classes in which we vectorize AVG, in preparation for fully 
> supporting AVG GroupBy.  In particular, the PARTIAL2 and FINAL GroupBy modes 
> take the AVG struct as input, and the new COMPLETE mode takes in the 
> original data and produces the full aggregation for completeness, so to speak.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16589) Vectorization: Support Complex Types and GroupBy modes PARTIAL2, FINAL, and COMPLETE for AVG, VARIANCE

2017-06-22 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-16589:

Fix Version/s: 3.0.0

> Vectorization: Support Complex Types and GroupBy modes PARTIAL2, FINAL, and 
> COMPLETE  for AVG, VARIANCE
> ---
>
> Key: HIVE-16589
> URL: https://issues.apache.org/jira/browse/HIVE-16589
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Fix For: 3.0.0
>
> Attachments: HIVE-16589.01.patch, HIVE-16589.02.patch, 
> HIVE-16589.03.patch, HIVE-16589.04.patch, HIVE-16589.05.patch, 
> HIVE-16589.06.patch, HIVE-16589.07.patch, HIVE-16589.08.patch, 
> HIVE-16589.091.patch, HIVE-16589.092.patch, HIVE-16589.093.patch, 
> HIVE-16589.094.patch, HIVE-16589.095.patch, HIVE-16589.096.patch, 
> HIVE-16589.097.patch, HIVE-16589.098.patch, HIVE-16589.0991.patch, 
> HIVE-16589.0992.patch, HIVE-16589.0993.patch, HIVE-16589.0994.patch, 
> HIVE-16589.0995.patch, HIVE-16589.099.patch, HIVE-16589.09.patch
>
>
> Allow Complex Types to be vectorized (since HIVE-16207: "Add support for 
> Complex Types in Fast SerDe" was committed).
> Add more classes in which we vectorize AVG, in preparation for fully 
> supporting AVG GroupBy.  In particular, the PARTIAL2 and FINAL GroupBy modes 
> take the AVG struct as input, and the new COMPLETE mode takes in the 
> original data and produces the full aggregation for completeness, so to speak.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16778) LLAP IO: better refcount management

2017-06-22 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16060181#comment-16060181
 ] 

Sergey Shelukhin commented on HIVE-16778:
-

Also committed to branch-2

> LLAP IO: better refcount management
> ---
>
> Key: HIVE-16778
> URL: https://issues.apache.org/jira/browse/HIVE-16778
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Fix For: 3.0.0, 2.4.0
>
> Attachments: HIVE-16778.patch, HIVE-16778.patch
>
>
> Looks like task cancellation can close the UGI, causing the background thread 
> to die with an exception, leaving a bunch of unreleased cache buffers.
> Overall, it's probably better to modify how refcounts are handled - if 
> there's some bug in the code we don't want to leak them. 
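>
> As an illustrative sketch only (not the actual LLAP cache code; all names 
> are hypothetical), a guard of this shape makes a buggy double-release fail 
> fast instead of silently corrupting the count and leaking buffers:
> {code:java}
> import java.util.concurrent.atomic.AtomicInteger;
>
> public class RefCountedBuffer {
>   private final AtomicInteger refCount = new AtomicInteger(1);
>
>   public void incRef() {
>     // incrementing from <= 0 means the buffer was already released
>     if (refCount.incrementAndGet() <= 1) {
>       throw new AssertionError("incRef on an already-released buffer");
>     }
>   }
>
>   public void decRef() {
>     int newVal = refCount.decrementAndGet();
>     if (newVal < 0) {
>       throw new AssertionError("Refcount went negative: " + newVal);
>     }
>     if (newVal == 0) {
>       release();  // hypothetical hook returning the buffer to the cache
>     }
>   }
>
>   private void release() { /* return buffer to the pool */ }
> }
> {code}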



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16778) LLAP IO: better refcount management

2017-06-22 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-16778:

Fix Version/s: 2.4.0

> LLAP IO: better refcount management
> ---
>
> Key: HIVE-16778
> URL: https://issues.apache.org/jira/browse/HIVE-16778
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Fix For: 3.0.0, 2.4.0
>
> Attachments: HIVE-16778.patch, HIVE-16778.patch
>
>
> Looks like task cancellation can close the UGI, causing the background thread 
> to die with an exception, leaving a bunch of unreleased cache buffers.
> Overall, it's probably better to modify how refcounts are handled - if 
> there's some bug in the code we don't want to leak them. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16939) metastore error: 'export: -Dproc_metastore : not a valid identifier'

2017-06-22 Thread Ferdinand Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ferdinand Xu updated HIVE-16939:

   Resolution: Fixed
Fix Version/s: 3.0.0
   Status: Resolved  (was: Patch Available)

Committed upstream. Thanks [~ferhui] for the contribution.

> metastore error: 'export: -Dproc_metastore : not a valid identifier'
> 
>
> Key: HIVE-16939
> URL: https://issues.apache.org/jira/browse/HIVE-16939
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 3.0.0
>Reporter: Fei Hui
>Assignee: Fei Hui
> Fix For: 3.0.0
>
> Attachments: HIVE-16939.patch
>
>
> When I run metastore, it reports errors as below
> {quote}
> bin/ext/metastore.sh: line 29: export: ` -Dproc_metastore  ': not a valid 
> identifier
> {quote}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-14688) Hive drop call fails in presence of TDE

2017-06-22 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16060102#comment-16060102
 ] 

Hive QA commented on HIVE-14688:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12844114/HIVE-14688.4.patch

{color:red}ERROR:{color} -1 due to build exiting with an error

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5736/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5736/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5736/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hiveptest/working/scratch/source-prep.sh' failed with exit 
status 1 and output '+ date '+%Y-%m-%d %T.%3N'
2017-06-22 22:15:46.299
+ [[ -n /usr/lib/jvm/java-8-openjdk-amd64 ]]
+ export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ export 
PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ 
PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'MAVEN_OPTS=-Xmx1g '
+ MAVEN_OPTS='-Xmx1g '
+ cd /data/hiveptest/working/
+ tee /data/hiveptest/logs/PreCommit-HIVE-Build-5736/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ git = \s\v\n ]]
+ [[ git = \g\i\t ]]
+ [[ -z master ]]
+ [[ -d apache-github-source-source ]]
+ [[ ! -d apache-github-source-source/.git ]]
+ [[ ! -d apache-github-source-source ]]
+ date '+%Y-%m-%d %T.%3N'
2017-06-22 22:15:46.301
+ cd apache-github-source-source
+ git fetch origin
From https://github.com/apache/hive
   7819cd3..b47736f  master -> origin/master
   3298e7f..f4a8fef  branch-2   -> origin/branch-2
+ git reset --hard HEAD
HEAD is now at 7819cd3 HIVE-16867: Extend shared scan optimizer to reuse 
computation from other operators (Jesus Camacho Rodriguez, reviewed by Ashutosh 
Chauhan)
+ git clean -f -d
Removing ql/src/test/queries/clientpositive/llap_smb.q
Removing ql/src/test/results/clientpositive/llap/llap_smb.q.out
+ git checkout master
Already on 'master'
Your branch is behind 'origin/master' by 1 commit, and can be fast-forwarded.
  (use "git pull" to update your local branch)
+ git reset --hard origin/master
HEAD is now at b47736f HIVE-16930: HoS should verify the value of Kerberos 
principal and keytab file before adding them to spark-submit command parameters 
(Yibing Shi via Chaoyu Tang)
+ git merge --ff-only origin/master
Already up-to-date.
+ date '+%Y-%m-%d %T.%3N'
2017-06-22 22:15:52.164
+ patchCommandPath=/data/hiveptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hiveptest/working/scratch/build.patch
+ [[ -f /data/hiveptest/working/scratch/build.patch ]]
+ chmod +x /data/hiveptest/working/scratch/smart-apply-patch.sh
+ /data/hiveptest/working/scratch/smart-apply-patch.sh 
/data/hiveptest/working/scratch/build.patch
error: patch failed: itests/src/test/resources/testconfiguration.properties:710
error: itests/src/test/resources/testconfiguration.properties: patch does not 
apply
error: patch failed: 
metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java:1786
error: metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java: 
patch does not apply
error: patch failed: 
ql/src/test/results/clientpositive/encrypted/encryption_drop_partition.q.out:111
error: 
ql/src/test/results/clientpositive/encrypted/encryption_drop_partition.q.out: 
patch does not apply
error: patch failed: 
ql/src/test/results/clientpositive/encrypted/encryption_drop_table.q.out:67
error: 
ql/src/test/results/clientpositive/encrypted/encryption_drop_table.q.out: patch 
does not apply
The patch does not appear to apply with p0, p1, or p2
+ exit 1
'
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12844114 - PreCommit-HIVE-Build

> Hive drop call fails in presence of TDE
> ---
>
> Key: HIVE-14688
> URL: https://issues.apache.org/jira/browse/HIVE-14688
> Project: Hive
>  Issue Type: Bug
>  Components: Security
>Affects Versions: 1.2.1, 2.0.0
>Reporter: Deepesh Khandelwal
>Assignee: Wei Zheng
> Attachments: HIVE-14688.1.patch, HIVE-14688.2.patch, 
> HIVE-14688.3.patch, HIVE-14688.4.patch
>
>
> This should be committed when Hive moves to Hadoop 2.8.
> In Hadoop 2.8.0, TDE trash collection was fixed through HDFS-8831. This 
> enables us to make drop table calls for Hive managed tables where the Hive 
> metastore warehouse directory is in an encrypted zone. However, even with 
> the feature in HDFS, Hive drop table currently fails:

[jira] [Commented] (HIVE-16874) query fails when trying to read file from remote hdfs

2017-06-22 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16060066#comment-16060066
 ] 

Thejas M Nair commented on HIVE-16874:
--

This might be fixed via changes in HIVE-14380


> query fails when trying to read file from remote hdfs
> -
>
> Key: HIVE-16874
> URL: https://issues.apache.org/jira/browse/HIVE-16874
> Project: Hive
>  Issue Type: Bug
>  Components: Clients
>Affects Versions: 1.2.1
>Reporter: Yunjian Zhang
> Attachments: HIVE-6.ext.patch
>
>
> As an extension of the issue in HIVE-6, table joins and inserts on remote 
> hdfs storage fail with the same issue.
> Based on 
> https://issues.apache.org/jira/secure/attachment/12820392/HIVE-6.1.patch, 
> the attached patch fixes the issues mentioned here.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-14380) Queries on tables with remote HDFS paths fail in "encryption" checks.

2017-06-22 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16060064#comment-16060064
 ] 

Thejas M Nair commented on HIVE-14380:
--

Thanks for checking about the metastore fix [~mithun]!


> Queries on tables with remote HDFS paths fail in "encryption" checks.
> -
>
> Key: HIVE-14380
> URL: https://issues.apache.org/jira/browse/HIVE-14380
> Project: Hive
>  Issue Type: Bug
>  Components: Encryption
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
> Fix For: 2.2.0
>
> Attachments: HIVE-14380.1.patch
>
>
> If a table has table/partition locations set to remote HDFS paths, querying 
> them will cause the following IAException:
> {noformat}
> 2016-07-26 01:16:27,471 ERROR parse.CalcitePlanner 
> (SemanticAnalyzer.java:getMetaData(1867)) - 
> org.apache.hadoop.hive.ql.metadata.HiveException: Unable to determine if 
> hdfs://foo.ygrid.yahoo.com:8020/projects/my_db/my_table is encrypted: 
> java.lang.IllegalArgumentException: Wrong FS: 
> hdfs://foo.ygrid.yahoo.com:8020/projects/my_db/my_table, expected: 
> hdfs://bar.ygrid.yahoo.com:8020
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.isPathEncrypted(SemanticAnalyzer.java:2204)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getStrongestEncryptedTablePath(SemanticAnalyzer.java:2274)
> ...
> {noformat}
> This is because of the following code in {{SessionState}}:
> {code:title=SessionState.java|borderStyle=solid}
>  public HadoopShims.HdfsEncryptionShim getHdfsEncryptionShim() throws 
> HiveException {
> if (hdfsEncryptionShim == null) {
>   try {
> FileSystem fs = FileSystem.get(sessionConf);
> if ("hdfs".equals(fs.getUri().getScheme())) {
>   hdfsEncryptionShim = 
> ShimLoader.getHadoopShims().createHdfsEncryptionShim(fs, sessionConf);
> } else {
>   LOG.debug("Could not get hdfsEncryptionShim, it is only applicable 
> to hdfs filesystem.");
> }
>   } catch (Exception e) {
> throw new HiveException(e);
>   }
> }
> return hdfsEncryptionShim;
>   }
> {code}
> When the {{FileSystem}} instance is created, using the {{sessionConf}} 
> implies that the current HDFS is going to be used. This call should instead 
> fetch the {{FileSystem}} instance corresponding to the path being checked.
> A fix is forthcoming...
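>
> A minimal sketch of that direction, assuming the fix derives the filesystem 
> from the path under check (the class and method names are hypothetical):
> {code:java}
> import java.io.IOException;
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.fs.FileSystem;
> import org.apache.hadoop.fs.Path;
>
> public class EncryptionCheckSketch {
>   // Use the FileSystem of the path being checked, not the session's
>   // default filesystem, so remote HDFS paths resolve correctly.
>   static boolean isHdfsPath(Path path, Configuration conf)
>       throws IOException {
>     FileSystem fs = path.getFileSystem(conf);
>     return "hdfs".equals(fs.getUri().getScheme());
>   }
> }
> {code}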



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16761) LLAP IO: SMB joins fail elevator

2017-06-22 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16060060#comment-16060060
 ] 

Sergey Shelukhin commented on HIVE-16761:
-

Interesting... the results have changed. Need to investigate.

> LLAP IO: SMB joins fail elevator 
> -
>
> Key: HIVE-16761
> URL: https://issues.apache.org/jira/browse/HIVE-16761
> Project: Hive
>  Issue Type: Bug
>Reporter: Gopal V
>Assignee: Sergey Shelukhin
> Attachments: HIVE-16761.01.patch, HIVE-16761.02.patch, 
> HIVE-16761.patch
>
>
> {code}
> Caused by: java.io.IOException: java.lang.ClassCastException: 
> org.apache.hadoop.hive.ql.exec.vector.LongColumnVector cannot be cast to 
> org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector
>   at 
> org.apache.hadoop.hive.ql.io.BatchToRowReader.next(BatchToRowReader.java:153)
>   at 
> org.apache.hadoop.hive.ql.io.BatchToRowReader.next(BatchToRowReader.java:78)
>   at 
> org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:360)
>   ... 26 more
> Caused by: java.lang.ClassCastException: 
> org.apache.hadoop.hive.ql.exec.vector.LongColumnVector cannot be cast to 
> org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector
>   at 
> org.apache.hadoop.hive.ql.io.BatchToRowReader.nextString(BatchToRowReader.java:334)
>   at 
> org.apache.hadoop.hive.ql.io.BatchToRowReader.nextValue(BatchToRowReader.java:602)
>   at 
> org.apache.hadoop.hive.ql.io.BatchToRowReader.next(BatchToRowReader.java:149)
>   ... 28 more
> {code}
> {code}
> set hive.enforce.sortmergebucketmapjoin=false;
> set hive.optimize.bucketmapjoin=true;
> set hive.optimize.bucketmapjoin.sortedmerge=true;
> set hive.auto.convert.sortmerge.join=true;
> set hive.auto.convert.join=true;
> set hive.auto.convert.join.noconditionaltask.size=500;
> select year,quarter,count(*) from transactions_raw_orc_200 a join 
> customer_accounts_orc_200 b on a.account_id=b.account_id group by 
> year,quarter;
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16761) LLAP IO: SMB joins fail elevator

2017-06-22 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16060058#comment-16060058
 ] 

Hive QA commented on HIVE-16761:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12874131/HIVE-16761.02.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 14 failed/errored test(s), 10847 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[insert_overwrite_local_directory_1]
 (batchId=238)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] 
(batchId=144)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[tez_smb_main]
 (batchId=150)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=233)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query16] 
(batchId=233)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] 
(batchId=233)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query94] 
(batchId=233)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[union24] 
(batchId=125)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testBootstrapFunctionReplication
 (batchId=217)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testCreateFunctionIncrementalReplication
 (batchId=217)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testCreateFunctionWithFunctionBinaryJarsOnHDFS
 (batchId=217)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=178)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=178)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=178)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5735/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5735/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5735/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 14 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12874131 - PreCommit-HIVE-Build

> LLAP IO: SMB joins fail elevator 
> -
>
> Key: HIVE-16761
> URL: https://issues.apache.org/jira/browse/HIVE-16761
> Project: Hive
>  Issue Type: Bug
>Reporter: Gopal V
>Assignee: Sergey Shelukhin
> Attachments: HIVE-16761.01.patch, HIVE-16761.02.patch, 
> HIVE-16761.patch
>
>
> {code}
> Caused by: java.io.IOException: java.lang.ClassCastException: 
> org.apache.hadoop.hive.ql.exec.vector.LongColumnVector cannot be cast to 
> org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector
>   at 
> org.apache.hadoop.hive.ql.io.BatchToRowReader.next(BatchToRowReader.java:153)
>   at 
> org.apache.hadoop.hive.ql.io.BatchToRowReader.next(BatchToRowReader.java:78)
>   at 
> org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:360)
>   ... 26 more
> Caused by: java.lang.ClassCastException: 
> org.apache.hadoop.hive.ql.exec.vector.LongColumnVector cannot be cast to 
> org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector
>   at 
> org.apache.hadoop.hive.ql.io.BatchToRowReader.nextString(BatchToRowReader.java:334)
>   at 
> org.apache.hadoop.hive.ql.io.BatchToRowReader.nextValue(BatchToRowReader.java:602)
>   at 
> org.apache.hadoop.hive.ql.io.BatchToRowReader.next(BatchToRowReader.java:149)
>   ... 28 more
> {code}
> {code}
> set hive.enforce.sortmergebucketmapjoin=false;
> set hive.optimize.bucketmapjoin=true;
> set hive.optimize.bucketmapjoin.sortedmerge=true;
> set hive.auto.convert.sortmerge.join=true;
> set hive.auto.convert.join=true;
> set hive.auto.convert.join.noconditionaltask.size=500;
> select year,quarter,count(*) from transactions_raw_orc_200 a join 
> customer_accounts_orc_200 b on a.account_id=b.account_id group by 
> year,quarter;
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16930) HoS should verify the value of Kerberos principal and keytab file before adding them to spark-submit command parameters

2017-06-22 Thread Chaoyu Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chaoyu Tang updated HIVE-16930:
---
   Resolution: Fixed
Fix Version/s: 2.4.0
   3.0.0
   Status: Resolved  (was: Patch Available)

Committed to 3.0.0 and 2.4.0. Thanks [~Yibing] for the patch.

> HoS should verify the value of Kerberos principal and keytab file before 
> adding them to spark-submit command parameters
> ---
>
> Key: HIVE-16930
> URL: https://issues.apache.org/jira/browse/HIVE-16930
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Reporter: Yibing Shi
>Assignee: Yibing Shi
> Fix For: 3.0.0, 2.4.0
>
> Attachments: HIVE-16930.1.patch
>
>
> When Kerberos is enabled, Hive CLI fails to run Hive on Spark queries:
> {noformat}
> >hive -e "set hive.execution.engine=spark; create table if not exists test(a 
> >int); select count(*) from test" --hiveconf hive.root.logger=INFO,console > 
> >/var/tmp/hive_log.txt > /var/tmp/hive_log_2.txt 
> 17/06/16 16:13:13 [main]: ERROR client.SparkClientImpl: Error while waiting 
> for client to connect. 
> java.util.concurrent.ExecutionException: java.lang.RuntimeException: Cancel 
> client 'a5de85d1-6933-43e7-986f-5f8e5c001b5f'. Error: Child process exited 
> before connecting back with error log Error: Cannot load main class from JAR 
> file:/tmp/spark-submit.7196051517706529285.properties 
> Run with --help for usage help or --verbose for debug output 
> at 
> io.netty.util.concurrent.AbstractFuture.get(AbstractFuture.java:37) 
> at 
> org.apache.hive.spark.client.SparkClientImpl.<init>(SparkClientImpl.java:107) 
> at 
> org.apache.hive.spark.client.SparkClientFactory.createClient(SparkClientFactory.java:80)
>  
> at 
> org.apache.hadoop.hive.ql.exec.spark.RemoteHiveSparkClient.createRemoteClient(RemoteHiveSparkClient.java:100)
>  
> at 
> org.apache.hadoop.hive.ql.exec.spark.RemoteHiveSparkClient.<init>(RemoteHiveSparkClient.java:96)
>  
> at 
> org.apache.hadoop.hive.ql.exec.spark.HiveSparkClientFactory.createHiveSparkClient(HiveSparkClientFactory.java:66)
>  
> at 
> org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionImpl.open(SparkSessionImpl.java:62)
>  
> at 
> org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionManagerImpl.getSession(SparkSessionManagerImpl.java:114)
>  
> at 
> org.apache.hadoop.hive.ql.exec.spark.SparkUtilities.getSparkSession(SparkUtilities.java:111)
>  
> at 
> org.apache.hadoop.hive.ql.exec.spark.SparkTask.execute(SparkTask.java:97) 
> at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160) 
> at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100) 
> at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1972) 
> at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1685) 
> at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1421) 
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1205) 
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1195) 
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:220) 
> at 
> org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:172) 
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:383) 
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:318) 
> at 
> org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:720) 
> at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:693) 
> at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:628) 
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) 
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  
> at java.lang.reflect.Method.invoke(Method.java:606) 
> at org.apache.hadoop.util.RunJar.run(RunJar.java:221) 
> at org.apache.hadoop.util.RunJar.main(RunJar.java:136) 
> Caused by: java.lang.RuntimeException: Cancel client 
> 'a5de85d1-6933-43e7-986f-5f8e5c001b5f'. Error: Child process exited before 
> connecting back with error log Error: Cannot load main class from JAR 
> file:/tmp/spark-submit.7196051517706529285.properties 
> Run with --help for usage help or --verbose for debug output 
> at 
> org.apache.hive.spark.client.rpc.RpcServer.cancelClient(RpcServer.java:179) 
> at 
> org.apache.hive.spark.client.SparkClientImpl$3.run(SparkClientImpl.java:490) 
> at 

[jira] [Updated] (HIVE-16761) LLAP IO: SMB joins fail elevator

2017-06-22 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-16761:

Attachment: HIVE-16761.02.patch

Added a test. Verified that it fails with the path error with the new code 
commented out.

> LLAP IO: SMB joins fail elevator 
> -
>
> Key: HIVE-16761
> URL: https://issues.apache.org/jira/browse/HIVE-16761
> Project: Hive
>  Issue Type: Bug
>Reporter: Gopal V
>Assignee: Sergey Shelukhin
> Attachments: HIVE-16761.01.patch, HIVE-16761.02.patch, 
> HIVE-16761.patch
>
>
> {code}
> Caused by: java.io.IOException: java.lang.ClassCastException: 
> org.apache.hadoop.hive.ql.exec.vector.LongColumnVector cannot be cast to 
> org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector
>   at 
> org.apache.hadoop.hive.ql.io.BatchToRowReader.next(BatchToRowReader.java:153)
>   at 
> org.apache.hadoop.hive.ql.io.BatchToRowReader.next(BatchToRowReader.java:78)
>   at 
> org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:360)
>   ... 26 more
> Caused by: java.lang.ClassCastException: 
> org.apache.hadoop.hive.ql.exec.vector.LongColumnVector cannot be cast to 
> org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector
>   at 
> org.apache.hadoop.hive.ql.io.BatchToRowReader.nextString(BatchToRowReader.java:334)
>   at 
> org.apache.hadoop.hive.ql.io.BatchToRowReader.nextValue(BatchToRowReader.java:602)
>   at 
> org.apache.hadoop.hive.ql.io.BatchToRowReader.next(BatchToRowReader.java:149)
>   ... 28 more
> {code}
> {code}
> set hive.enforce.sortmergebucketmapjoin=false;
> set hive.optimize.bucketmapjoin=true;
> set hive.optimize.bucketmapjoin.sortedmerge=true;
> set hive.auto.convert.sortmerge.join=true;
> set hive.auto.convert.join=true;
> set hive.auto.convert.join.noconditionaltask.size=500;
> select year,quarter,count(*) from transactions_raw_orc_200 a join 
> customer_accounts_orc_200 b on a.account_id=b.account_id group by 
> year,quarter;
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16589) Vectorization: Support Complex Types and GroupBy modes PARTIAL2, FINAL, and COMPLETE for AVG, VARIANCE

2017-06-22 Thread Matt McCline (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16059884#comment-16059884
 ] 

Matt McCline commented on HIVE-16589:
-

Thank you [~jdere] for your diligent and careful code review.

> Vectorization: Support Complex Types and GroupBy modes PARTIAL2, FINAL, and 
> COMPLETE  for AVG, VARIANCE
> ---
>
> Key: HIVE-16589
> URL: https://issues.apache.org/jira/browse/HIVE-16589
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-16589.01.patch, HIVE-16589.02.patch, 
> HIVE-16589.03.patch, HIVE-16589.04.patch, HIVE-16589.05.patch, 
> HIVE-16589.06.patch, HIVE-16589.07.patch, HIVE-16589.08.patch, 
> HIVE-16589.091.patch, HIVE-16589.092.patch, HIVE-16589.093.patch, 
> HIVE-16589.094.patch, HIVE-16589.095.patch, HIVE-16589.096.patch, 
> HIVE-16589.097.patch, HIVE-16589.098.patch, HIVE-16589.0991.patch, 
> HIVE-16589.0992.patch, HIVE-16589.0993.patch, HIVE-16589.0994.patch, 
> HIVE-16589.0995.patch, HIVE-16589.099.patch, HIVE-16589.09.patch
>
>
> Allow Complex Types to be vectorized (since HIVE-16207: "Add support for 
> Complex Types in Fast SerDe" was committed).
> Add more classes in which we vectorize AVG, in preparation for fully 
> supporting AVG GroupBy.  In particular, the PARTIAL2 and FINAL GroupBy modes 
> take the AVG struct as input, and the new COMPLETE mode takes in the 
> original data and produces the full aggregation for completeness, so to speak.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-15665) LLAP: OrcFileMetadata objects in cache can impact heap usage

2017-06-22 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16059874#comment-16059874
 ] 

Hive QA commented on HIVE-15665:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12874119/HIVE-15665.04.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 15 failed/errored test(s), 10846 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[insert_overwrite_local_directory_1]
 (batchId=238)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_llap_counters]
 (batchId=144)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_basic] 
(batchId=141)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_schema_evol_3a]
 (batchId=142)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[tez_smb_main]
 (batchId=150)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query16] 
(batchId=233)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] 
(batchId=233)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query94] 
(batchId=233)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[union24] 
(batchId=125)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testBootstrapFunctionReplication
 (batchId=217)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testCreateFunctionIncrementalReplication
 (batchId=217)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testCreateFunctionWithFunctionBinaryJarsOnHDFS
 (batchId=217)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=178)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=178)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=178)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5734/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5734/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5734/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 15 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12874119 - PreCommit-HIVE-Build

> LLAP: OrcFileMetadata objects in cache can impact heap usage
> 
>
> Key: HIVE-15665
> URL: https://issues.apache.org/jira/browse/HIVE-15665
> Project: Hive
>  Issue Type: Improvement
>  Components: llap
>Reporter: Rajesh Balamohan
>Assignee: Sergey Shelukhin
> Attachments: HIVE-15665.01.patch, HIVE-15665.02.patch, 
> HIVE-15665.03.patch, HIVE-15665.04.patch, HIVE-15665.patch
>
>
> OrcFileMetadata internally has filestats, stripestats etc which are allocated 
> in heap. On large data sets, this could have an impact on the heap usage and 
> the memory usage by different executors in LLAP.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16761) LLAP IO: SMB joins fail elevator

2017-06-22 Thread Deepak Jaiswal (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16059844#comment-16059844
 ] 

Deepak Jaiswal commented on HIVE-16761:
---

+1


> LLAP IO: SMB joins fail elevator 
> -
>
> Key: HIVE-16761
> URL: https://issues.apache.org/jira/browse/HIVE-16761
> Project: Hive
>  Issue Type: Bug
>Reporter: Gopal V
>Assignee: Sergey Shelukhin
> Attachments: HIVE-16761.01.patch, HIVE-16761.patch
>
>
> {code}
> Caused by: java.io.IOException: java.lang.ClassCastException: 
> org.apache.hadoop.hive.ql.exec.vector.LongColumnVector cannot be cast to 
> org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector
>   at 
> org.apache.hadoop.hive.ql.io.BatchToRowReader.next(BatchToRowReader.java:153)
>   at 
> org.apache.hadoop.hive.ql.io.BatchToRowReader.next(BatchToRowReader.java:78)
>   at 
> org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:360)
>   ... 26 more
> Caused by: java.lang.ClassCastException: 
> org.apache.hadoop.hive.ql.exec.vector.LongColumnVector cannot be cast to 
> org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector
>   at 
> org.apache.hadoop.hive.ql.io.BatchToRowReader.nextString(BatchToRowReader.java:334)
>   at 
> org.apache.hadoop.hive.ql.io.BatchToRowReader.nextValue(BatchToRowReader.java:602)
>   at 
> org.apache.hadoop.hive.ql.io.BatchToRowReader.next(BatchToRowReader.java:149)
>   ... 28 more
> {code}
> {code}
> set hive.enforce.sortmergebucketmapjoin=false;
> set hive.optimize.bucketmapjoin=true;
> set hive.optimize.bucketmapjoin.sortedmerge=true;
> set hive.auto.convert.sortmerge.join=true;
> set hive.auto.convert.join=true;
> set hive.auto.convert.join.noconditionaltask.size=500;
> select year,quarter,count(*) from transactions_raw_orc_200 a join 
> customer_accounts_orc_200 b on a.account_id=b.account_id group by 
> year,quarter;
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16932) incorrect predicate evaluation

2017-06-22 Thread Jesus Camacho Rodriguez (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16059805#comment-16059805
 ] 

Jesus Camacho Rodriguez commented on HIVE-16932:


[~hopperjim], I have run this with multiple versions and it does not seem to be 
a problem (for the given example I get 75000). What result do you get?

> incorrect predicate evaluation
> --
>
> Key: HIVE-16932
> URL: https://issues.apache.org/jira/browse/HIVE-16932
> Project: Hive
>  Issue Type: Bug
>  Components: CLI, Hive, ORC
>Affects Versions: 1.2.1
> Environment: CentOS, HDP 2.6
>Reporter: Jim Hopper
>
> Hive returns an incorrect number of rows when the BETWEEN and NOT BETWEEN 
> operators are used in the WHERE clause while querying a table that uses ORC 
> as its storage format.
> Script to replicate the issue on HDP 2.6:
> {code}
> SET hive.exec.compress.output=false;
> SET hive.vectorized.execution.enabled=false;
> SET hive.optimize.ppd=true;
> SET hive.optimize.ppd.storage=true;
> SET N=10;
> SET TTT=default.tmp_tbl_text;
> SET TTO=default.tmp_tbl_orc;
> DROP TABLE IF EXISTS ${hiveconf:TTT};
> DROP TABLE IF EXISTS ${hiveconf:TTO};
> create table ${hiveconf:TTT}
> stored as textfile
> as
> select pos as c
> from (
> select posexplode(split(repeat(',', ${hiveconf:N}), ','))
> ) as t;
> create table ${hiveconf:TTO}
> stored as orc
> as
> select c
> from ${hiveconf:TTT};
> SELECT count(c) as cnt
> FROM ${hiveconf:TTT}
> WHERE
> c between 0 and ${hiveconf:N}
> and c not between ${hiveconf:N} div 4 and ${hiveconf:N} div 2
> ;
> SELECT count(c) as cnt
> FROM ${hiveconf:TTO}
> WHERE
> c between 0 and ${hiveconf:N}
> and c not between ${hiveconf:N} div 4 and ${hiveconf:N} div 2
> ;
> DROP TABLE IF EXISTS ${hiveconf:TTT};
> DROP TABLE IF EXISTS ${hiveconf:TTO};
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-15665) LLAP: OrcFileMetadata objects in cache can impact heap usage

2017-06-22 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-15665:

Attachment: HIVE-15665.04.patch

Fixing issues

> LLAP: OrcFileMetadata objects in cache can impact heap usage
> 
>
> Key: HIVE-15665
> URL: https://issues.apache.org/jira/browse/HIVE-15665
> Project: Hive
>  Issue Type: Improvement
>  Components: llap
>Reporter: Rajesh Balamohan
>Assignee: Sergey Shelukhin
> Attachments: HIVE-15665.01.patch, HIVE-15665.02.patch, 
> HIVE-15665.03.patch, HIVE-15665.04.patch, HIVE-15665.patch
>
>
> OrcFileMetadata internally has filestats, stripestats etc which are allocated 
> in heap. On large data sets, this could have an impact on the heap usage and 
> the memory usage by different executors in LLAP.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (HIVE-16946) Information Schema Improvements

2017-06-22 Thread Gunther Hagleitner (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gunther Hagleitner reassigned HIVE-16946:
-


> Information Schema Improvements
> ---
>
> Key: HIVE-16946
> URL: https://issues.apache.org/jira/browse/HIVE-16946
> Project: Hive
>  Issue Type: Improvement
>Reporter: Gunther Hagleitner
>Assignee: Gunther Hagleitner
>
> Collection of requested enhancements and fixes for the info schema.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16761) LLAP IO: SMB joins fail elevator

2017-06-22 Thread Deepak Jaiswal (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16059738#comment-16059738
 ] 

Deepak Jaiswal commented on HIVE-16761:
---

Sure.

> LLAP IO: SMB joins fail elevator 
> -
>
> Key: HIVE-16761
> URL: https://issues.apache.org/jira/browse/HIVE-16761
> Project: Hive
>  Issue Type: Bug
>Reporter: Gopal V
>Assignee: Sergey Shelukhin
> Attachments: HIVE-16761.01.patch, HIVE-16761.patch
>
>
> {code}
> Caused by: java.io.IOException: java.lang.ClassCastException: 
> org.apache.hadoop.hive.ql.exec.vector.LongColumnVector cannot be cast to 
> org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector
>   at 
> org.apache.hadoop.hive.ql.io.BatchToRowReader.next(BatchToRowReader.java:153)
>   at 
> org.apache.hadoop.hive.ql.io.BatchToRowReader.next(BatchToRowReader.java:78)
>   at 
> org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:360)
>   ... 26 more
> Caused by: java.lang.ClassCastException: 
> org.apache.hadoop.hive.ql.exec.vector.LongColumnVector cannot be cast to 
> org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector
>   at 
> org.apache.hadoop.hive.ql.io.BatchToRowReader.nextString(BatchToRowReader.java:334)
>   at 
> org.apache.hadoop.hive.ql.io.BatchToRowReader.nextValue(BatchToRowReader.java:602)
>   at 
> org.apache.hadoop.hive.ql.io.BatchToRowReader.next(BatchToRowReader.java:149)
>   ... 28 more
> {code}
> {code}
> set hive.enforce.sortmergebucketmapjoin=false;
> set hive.optimize.bucketmapjoin=true;
> set hive.optimize.bucketmapjoin.sortedmerge=true;
> set hive.auto.convert.sortmerge.join=true;
> set hive.auto.convert.join=true;
> set hive.auto.convert.join.noconditionaltask.size=500;
> select year,quarter,count(*) from transactions_raw_orc_200 a join 
> customer_accounts_orc_200 b on a.account_id=b.account_id group by 
> year,quarter;
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16761) LLAP IO: SMB joins fail elevator

2017-06-22 Thread Gunther Hagleitner (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16059718#comment-16059718
 ] 

Gunther Hagleitner commented on HIVE-16761:
---

Patch looks good, but needs a test before commit. [~djaiswal] can you also take 
a look?

> LLAP IO: SMB joins fail elevator 
> -
>
> Key: HIVE-16761
> URL: https://issues.apache.org/jira/browse/HIVE-16761
> Project: Hive
>  Issue Type: Bug
>Reporter: Gopal V
>Assignee: Sergey Shelukhin
> Attachments: HIVE-16761.01.patch, HIVE-16761.patch
>
>
> {code}
> Caused by: java.io.IOException: java.lang.ClassCastException: 
> org.apache.hadoop.hive.ql.exec.vector.LongColumnVector cannot be cast to 
> org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector
>   at 
> org.apache.hadoop.hive.ql.io.BatchToRowReader.next(BatchToRowReader.java:153)
>   at 
> org.apache.hadoop.hive.ql.io.BatchToRowReader.next(BatchToRowReader.java:78)
>   at 
> org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:360)
>   ... 26 more
> Caused by: java.lang.ClassCastException: 
> org.apache.hadoop.hive.ql.exec.vector.LongColumnVector cannot be cast to 
> org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector
>   at 
> org.apache.hadoop.hive.ql.io.BatchToRowReader.nextString(BatchToRowReader.java:334)
>   at 
> org.apache.hadoop.hive.ql.io.BatchToRowReader.nextValue(BatchToRowReader.java:602)
>   at 
> org.apache.hadoop.hive.ql.io.BatchToRowReader.next(BatchToRowReader.java:149)
>   ... 28 more
> {code}
> {code}
> set hive.enforce.sortmergebucketmapjoin=false;
> set hive.optimize.bucketmapjoin=true;
> set hive.optimize.bucketmapjoin.sortedmerge=true;
> set hive.auto.convert.sortmerge.join=true;
> set hive.auto.convert.join=true;
> set hive.auto.convert.join.noconditionaltask.size=500;
> select year,quarter,count(*) from transactions_raw_orc_200 a join 
> customer_accounts_orc_200 b on a.account_id=b.account_id group by 
> year,quarter;
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16888) Upgrade Calcite to 1.13 and Avatica to 1.10

2017-06-22 Thread Remus Rusanu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16059708#comment-16059708
 ] 

Remus Rusanu commented on HIVE-16888:
-

I'm going through the safe golden file updates (i.e., better reduced predicates), and 
I'll put up a new patch soon so that only the more problematic diffs remain (result 
diffs and some Tez graph diffs).

> Upgrade Calcite to 1.13 and Avatica to 1.10
> ---
>
> Key: HIVE-16888
> URL: https://issues.apache.org/jira/browse/HIVE-16888
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 3.0.0
>Reporter: Remus Rusanu
>Assignee: Remus Rusanu
> Attachments: HIVE-16888.01.patch, HIVE-16888.02.patch, 
> HIVE-16888.03.patch
>
>
> I'm creating this early to be able to ptest the current Calcite 
> 1.13.0-SNAPSHOT



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16908) Failures in TestHcatClient due to HIVE-16844

2017-06-22 Thread Ratandeep Ratti (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16059651#comment-16059651
 ] 

Ratandeep Ratti commented on HIVE-16908:


[~sbeeram], the tests look OK to me.

> Failures in TestHcatClient due to HIVE-16844
> 
>
> Key: HIVE-16908
> URL: https://issues.apache.org/jira/browse/HIVE-16908
> Project: Hive
>  Issue Type: Bug
>Reporter: Sunitha Beeram
>Assignee: Sunitha Beeram
> Attachments: HIVE-16908.1.patch, HIVE-16908.2.patch
>
>
> Some of the tests in TestHCatClient.java, for ex:
> {noformat}
> org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
>  (batchId=177)
> org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
>  (batchId=177)
> org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
> (batchId=177)
> {noformat}
> are failing due to HIVE-16844. HIVE-16844 fixes a connection leak when a new 
> configuration object is set on the ObjectStore. TestHCatClient fires up a 
> second metastore thread with a different conf object, which results 
> in the PersistenceManagerFactory being closed, and hence the tests fail. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16905) Add zookeeper ACL for hiveserver2

2017-06-22 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16059585#comment-16059585
 ] 

Hive QA commented on HIVE-16905:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12874089/HIVE%20ACL%20FOR%20HIVESERVER2.pdf

{color:red}ERROR:{color} -1 due to build exiting with an error

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5733/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5733/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5733/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hiveptest/working/scratch/source-prep.sh' failed with exit 
status 1 and output '+ date '+%Y-%m-%d %T.%3N'
2017-06-22 16:02:18.335
+ [[ -n /usr/lib/jvm/java-8-openjdk-amd64 ]]
+ export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ export 
PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ 
PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'MAVEN_OPTS=-Xmx1g '
+ MAVEN_OPTS='-Xmx1g '
+ cd /data/hiveptest/working/
+ tee /data/hiveptest/logs/PreCommit-HIVE-Build-5733/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ git = \s\v\n ]]
+ [[ git = \g\i\t ]]
+ [[ -z master ]]
+ [[ -d apache-github-source-source ]]
+ [[ ! -d apache-github-source-source/.git ]]
+ [[ ! -d apache-github-source-source ]]
+ date '+%Y-%m-%d %T.%3N'
2017-06-22 16:02:18.338
+ cd apache-github-source-source
+ git fetch origin
From https://github.com/apache/hive
   71f52d8..7819cd3  master -> origin/master
+ git reset --hard HEAD
HEAD is now at 71f52d8 HIVE-16875: Query against view with partitioned child on 
HoS fails with privilege exception. (Yongzhi Chen, reviewed by Aihua Xu)
+ git clean -f -d
+ git checkout master
Already on 'master'
Your branch is behind 'origin/master' by 1 commit, and can be fast-forwarded.
  (use "git pull" to update your local branch)
+ git reset --hard origin/master
HEAD is now at 7819cd3 HIVE-16867: Extend shared scan optimizer to reuse 
computation from other operators (Jesus Camacho Rodriguez, reviewed by Ashutosh 
Chauhan)
+ git merge --ff-only origin/master
Already up-to-date.
+ date '+%Y-%m-%d %T.%3N'
2017-06-22 16:02:21.598
+ patchCommandPath=/data/hiveptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hiveptest/working/scratch/build.patch
+ [[ -f /data/hiveptest/working/scratch/build.patch ]]
+ chmod +x /data/hiveptest/working/scratch/smart-apply-patch.sh
+ /data/hiveptest/working/scratch/smart-apply-patch.sh 
/data/hiveptest/working/scratch/build.patch
patch:  Only garbage was found in the patch input.
patch:  Only garbage was found in the patch input.
patch:  Only garbage was found in the patch input.
fatal: unrecognized input
The patch does not appear to apply with p0, p1, or p2
+ exit 1
'
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12874089 - PreCommit-HIVE-Build

> Add zookeeper ACL for hiveserver2
> -
>
> Key: HIVE-16905
> URL: https://issues.apache.org/jira/browse/HIVE-16905
> Project: Hive
>  Issue Type: New Feature
>Affects Versions: 3.0.0
>Reporter: Saijin Huang
>Assignee: Saijin Huang
> Attachments: HIVE-16905.1.patch, HIVE ACL FOR HIVESERVER2.pdf
>
>
> Adding a zookeeper ACL for hiveserver2 is necessary so that Hive can protect 
> the hiveserver2 znode from accidental deletion.
> --
> case:
> When making beeline connections through Hive HA with zookeeper, I suddenly 
> found that beeline could not connect to hiveserver2. The cause was that 
> someone had deleted /hiveserver2 by mistake, so the beeline connection 
> failed and could not read the configs from zookeeper.
> -
> Currently the ACL on /hiveserver2 is set to world:anyone:cdrwa, which means 
> anyone can delete /hiveserver2 and its znodes at any time. This is unsafe, 
> so the znode /hiveserver2 needs to be protected.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16888) Upgrade Calcite to 1.13 and Avatica to 1.10

2017-06-22 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16059564#comment-16059564
 ] 

Hive QA commented on HIVE-16888:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12874085/HIVE-16888.03.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 155 failed/errored test(s), 10846 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestAccumuloCliDriver.testCliDriver[accumulo_index] 
(batchId=229)
org.apache.hadoop.hive.cli.TestAccumuloCliDriver.testCliDriver[accumulo_predicate_pushdown]
 (batchId=229)
org.apache.hadoop.hive.cli.TestAccumuloCliDriver.testCliDriver[accumulo_queries]
 (batchId=229)
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[create_merge_compressed]
 (batchId=238)
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[escape_comments] 
(batchId=238)
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[insert_overwrite_local_directory_1]
 (batchId=238)
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[mapjoin2] 
(batchId=238)
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[materialized_view_create_rewrite]
 (batchId=238)
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[smb_mapjoin_7] 
(batchId=238)
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[udf_unix_timestamp] 
(batchId=238)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_join12] (batchId=23)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_join16] (batchId=38)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_join4] (batchId=67)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_join5] (batchId=69)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_join8] (batchId=81)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[avro_date] (batchId=9)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[cast_on_constant] 
(batchId=23)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[cbo_rp_annotate_stats_groupby]
 (batchId=80)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[cbo_rp_auto_join17] 
(batchId=24)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[cbo_rp_auto_join1] 
(batchId=3)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[cbo_rp_cross_product_check_2]
 (batchId=19)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[cbo_rp_gby2_map_multi_distinct]
 (batchId=78)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[cbo_rp_groupby3_noskew_multi_distinct]
 (batchId=37)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[cbo_rp_join0] 
(batchId=47)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[cbo_rp_outer_join_ppr] 
(batchId=7)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[char_cast] (batchId=84)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[ctas_date] (batchId=1)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[date_1] (batchId=76)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[date_4] (batchId=46)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[date_udf] (batchId=30)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[druid_basic2] 
(batchId=11)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[druid_intervals] 
(batchId=22)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[druid_timeseries] 
(batchId=56)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[druid_topn] (batchId=3)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[extrapolate_part_stats_date]
 (batchId=20)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[filter_union] 
(batchId=15)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[fold_eq_with_case_when] 
(batchId=77)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[fouter_join_ppr] 
(batchId=31)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[groupby_position] 
(batchId=38)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[groupby_ppr_multi_distinct]
 (batchId=55)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[index_auto_mult_tables] 
(batchId=81)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[index_auto_mult_tables_compact]
 (batchId=34)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[interval_3] (batchId=61)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[interval_alt] (batchId=4)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[interval_arithmetic] 
(batchId=45)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[join12] (batchId=39)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[join16] (batchId=29)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[join4] (batchId=61)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[join5] (batchId=66)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[join8] (batchId=45)

[jira] [Commented] (HIVE-16944) schematool -dbType hive should give some more feedback/assistance

2017-06-22 Thread Carter Shanklin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16059560#comment-16059560
 ] 

Carter Shanklin commented on HIVE-16944:


Also [~vihangk1] if you're interested in setting up INFORMATION_SCHEMA there's 
a full how-to in HIVE-16941

> schematool -dbType hive should give some more feedback/assistance
> -
>
> Key: HIVE-16944
> URL: https://issues.apache.org/jira/browse/HIVE-16944
> Project: Hive
>  Issue Type: Bug
>Reporter: Carter Shanklin
>
> Given the other ways schematool is used, the most obvious guess I would have 
> for initializing the Hive schema is:
> {code}
> schematool -metaDbType mysql -dbType hive -initSchema
> {code}
> Unfortunately that fails with this NPE:
> {code}
> Exception in thread "main" java.lang.NullPointerException
>   at 
> org.apache.hadoop.hive.metastore.tools.HiveSchemaHelper.getDbCommandParser(HiveSchemaHelper.java:570)
>   at 
> org.apache.hadoop.hive.metastore.tools.HiveSchemaHelper.getDbCommandParser(HiveSchemaHelper.java:564)
>   at 
> org.apache.hadoop.hive.metastore.tools.HiveSchemaHelper.getDbCommandParser(HiveSchemaHelper.java:560)
>   at 
> org.apache.hadoop.hive.metastore.tools.HiveSchemaHelper$HiveCommandParser.<init>(HiveSchemaHelper.java:373)
>   at 
> org.apache.hadoop.hive.metastore.tools.HiveSchemaHelper.getDbCommandParser(HiveSchemaHelper.java:573)
>   at 
> org.apache.hive.beeline.HiveSchemaTool.getDbCommandParser(HiveSchemaTool.java:165)
>   at 
> org.apache.hive.beeline.HiveSchemaTool.<init>(HiveSchemaTool.java:101)
>   at org.apache.hive.beeline.HiveSchemaTool.<init>(HiveSchemaTool.java:90)
>   at org.apache.hive.beeline.HiveSchemaTool.main(HiveSchemaTool.java:1166)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at org.apache.hadoop.util.RunJar.run(RunJar.java:233)
>   at org.apache.hadoop.util.RunJar.main(RunJar.java:148)
> {code}
> Two additional arguments are needed:
> -url jdbc:hive2://localhost:1/default -driver 
> org.apache.hive.jdbc.HiveDriver
> If the user does not supply these for dbType hive, schematool should detect 
> and error out appropriately, plus give an example of what it's looking for.
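
A minimal sketch of the suggested validation (the names here are illustrative, 
not the actual HiveSchemaTool fields):
{code}
// Hypothetical guard for the argument parsing; 'dbType', 'url' and 'driver'
// are illustrative parameters, not the tool's real fields.
public class SchemaToolArgCheck {
  static void validateHiveDbType(String dbType, String url, String driver) {
    if ("hive".equalsIgnoreCase(dbType) && (url == null || driver == null)) {
      throw new IllegalArgumentException(
          "-dbType hive requires -url and -driver, e.g. "
              + "-url jdbc:hive2://<host>:<port>/default "
              + "-driver org.apache.hive.jdbc.HiveDriver");
    }
  }

  public static void main(String[] args) {
    validateHiveDbType("hive", null, null); // throws with a helpful message
  }
}
{code}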



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16944) schematool -dbType hive should give some more feedback/assistance

2017-06-22 Thread Vihang Karajgaonkar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16059528#comment-16059528
 ] 

Vihang Karajgaonkar commented on HIVE-16944:


I can take a look at this [~cartershanklin]. Curious to understand what the 
difference is between {{metaDbType}} and {{dbType}}.

> schematool -dbType hive should give some more feedback/assistance
> -
>
> Key: HIVE-16944
> URL: https://issues.apache.org/jira/browse/HIVE-16944
> Project: Hive
>  Issue Type: Bug
>Reporter: Carter Shanklin
>
> Given the other ways schematool is used, the most obvious guess I would have 
> for initializing the Hive schema is:
> {code}
> schematool -metaDbType mysql -dbType hive -initSchema
> {code}
> Unfortunately that fails with this NPE:
> {code}
> Exception in thread "main" java.lang.NullPointerException
>   at 
> org.apache.hadoop.hive.metastore.tools.HiveSchemaHelper.getDbCommandParser(HiveSchemaHelper.java:570)
>   at 
> org.apache.hadoop.hive.metastore.tools.HiveSchemaHelper.getDbCommandParser(HiveSchemaHelper.java:564)
>   at 
> org.apache.hadoop.hive.metastore.tools.HiveSchemaHelper.getDbCommandParser(HiveSchemaHelper.java:560)
>   at 
> org.apache.hadoop.hive.metastore.tools.HiveSchemaHelper$HiveCommandParser.<init>(HiveSchemaHelper.java:373)
>   at 
> org.apache.hadoop.hive.metastore.tools.HiveSchemaHelper.getDbCommandParser(HiveSchemaHelper.java:573)
>   at 
> org.apache.hive.beeline.HiveSchemaTool.getDbCommandParser(HiveSchemaTool.java:165)
>   at 
> org.apache.hive.beeline.HiveSchemaTool.<init>(HiveSchemaTool.java:101)
>   at org.apache.hive.beeline.HiveSchemaTool.<init>(HiveSchemaTool.java:90)
>   at org.apache.hive.beeline.HiveSchemaTool.main(HiveSchemaTool.java:1166)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at org.apache.hadoop.util.RunJar.run(RunJar.java:233)
>   at org.apache.hadoop.util.RunJar.main(RunJar.java:148)
> {code}
> Two additional arguments are needed:
> -url jdbc:hive2://localhost:1/default -driver 
> org.apache.hive.jdbc.HiveDriver
> If the user does not supply these for dbType hive, schematool should detect 
> and error out appropriately, plus give an example of what it's looking for.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16867) Extend shared scan optimizer to reuse computation from other operators

2017-06-22 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-16867:
---
   Resolution: Fixed
Fix Version/s: 3.0.0
   Status: Resolved  (was: Patch Available)

Pushed to master, thanks for reviewing [~ashutoshc]!

> Extend shared scan optimizer to reuse computation from other operators
> --
>
> Key: HIVE-16867
> URL: https://issues.apache.org/jira/browse/HIVE-16867
> Project: Hive
>  Issue Type: Improvement
>  Components: Physical Optimizer
>Affects Versions: 3.0.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Fix For: 3.0.0
>
> Attachments: HIVE-16867.01.patch, HIVE-16867.02.patch, 
> HIVE-16867.03.patch, HIVE-16867.04.patch, HIVE-16867.patch
>
>
> Follow-up of the work in HIVE-16602.
> HIVE-16602 introduced an optimization that identifies scans on input tables 
> that can be merged so the data is read only once.
> This extension to that rule allows to reuse the computation that is done in 
> the work containing those scans. In particular, we traverse both parts of the 
> plan upstream and reuse the operators if possible.
> Currently, the optimizer will not go beyond the output edge(s) of that work. 
> Follow-up extensions might remove this limitation.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16648) Allow select distinct with group by

2017-06-22 Thread Carter Shanklin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16059504#comment-16059504
 ] 

Carter Shanklin commented on HIVE-16648:


To clarify, I wasn't looking for a workaround, but this case does come up when 
porting SQL from other DBs to Hive; most mature SQL engines are smart enough to 
ignore the distinct clause in this case. A toy illustration is sketched below.

Looks like HIVE-16924 is going to tackle the more complex case where aggregates 
are also present, but that should cover this one as well.
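
A toy illustration of why the extra DISTINCT is a no-op (plain Java modeling the 
grouping keys with a distinct stream, not Hive internals):
{code}
import java.util.List;
import java.util.stream.Collectors;

public class DistinctGroupBySketch {
  public static void main(String[] args) {
    List<Integer> c1 = List.of(1, 2, 2, 3);
    // GROUP BY c1 already emits one row per distinct key...
    List<Integer> grouped = c1.stream().distinct().collect(Collectors.toList());
    // ...so a further SELECT DISTINCT over the same keys changes nothing.
    List<Integer> distinctAgain =
        grouped.stream().distinct().collect(Collectors.toList());
    System.out.println(grouped.equals(distinctAgain)); // true
  }
}
{code}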

> Allow select distinct with group by
> ---
>
> Key: HIVE-16648
> URL: https://issues.apache.org/jira/browse/HIVE-16648
> Project: Hive
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Carter Shanklin
>Assignee: Anshuman
>
> Although there are very few legitimate reasons to have both "select distinct" 
> and "group by" in the same query, it is still used from time to time and 
> other systems support it.
> Illustrating the issue:
> {code}
> hive> create table test (c1 integer);
> OK
> Time taken: 0.073 seconds
> hive> select distinct c1 from test group by c1;
> FAILED: SemanticException 1:38 SELECT DISTINCT and GROUP BY can not be in the 
> same query. Error encountered near token 'c1'
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16943) MoveTask should separate src FileSystem from dest FileSystem

2017-06-22 Thread Fei Hui (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16059499#comment-16059499
 ] 

Fei Hui commented on HIVE-16943:


[~sershe], could you please take a look? Thanks.
I see you made a similar change in HIVE-11568.
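
For reference, a minimal sketch of the idea (not the attached patch), using the 
standard Hadoop FileSystem API to resolve each side from its own Path:
{code}
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FileUtil;
import org.apache.hadoop.fs.Path;

public class FsPerPathSketch {
  // Resolve a FileSystem per Path instead of passing a single shared 'fs'.
  static void move(Configuration conf, Path sourcePath, Path targetPath)
      throws IOException {
    FileSystem srcFs = sourcePath.getFileSystem(conf);
    FileSystem dstFs = targetPath.getFileSystem(conf);
    if (srcFs.exists(sourcePath)) {
      if (srcFs.getUri().equals(dstFs.getUri())) {
        srcFs.rename(sourcePath, targetPath);      // same filesystem: rename
      } else {
        FileUtil.copy(srcFs, sourcePath, dstFs, targetPath,
            /* deleteSource */ true, conf);        // cross-filesystem: copy
      }
    } else if (!dstFs.mkdirs(targetPath)) {
      throw new IOException("Unable to make directory: " + targetPath);
    }
  }
}
{code}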

> MoveTask should separate src FileSystem from dest FileSystem 
> -
>
> Key: HIVE-16943
> URL: https://issues.apache.org/jira/browse/HIVE-16943
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 3.0.0
>Reporter: Fei Hui
> Attachments: HIVE-16943.patch
>
>
> {code:title=MoveTask.java|borderStyle=solid} 
>  private void moveFileInDfs (Path sourcePath, Path targetPath, FileSystem fs)
>   throws HiveException, IOException {
> // if source exists, rename. Otherwise, create a empty directory
> if (fs.exists(sourcePath)) {
>   Path deletePath = null;
>   // If it multiple level of folder are there fs.rename is failing so 
> first
>   // create the targetpath.getParent() if it not exist
>   if (HiveConf.getBoolVar(conf, 
> HiveConf.ConfVars.HIVE_INSERT_INTO_MULTILEVEL_DIRS)) {
> deletePath = createTargetPath(targetPath, fs);
>   }
>   Hive.clearDestForSubDirSrc(conf, targetPath, sourcePath, false);
>   if (!Hive.moveFile(conf, sourcePath, targetPath, true, false)) {
> try {
>   if (deletePath != null) {
> fs.delete(deletePath, true);
>   }
> } catch (IOException e) {
>   LOG.info("Unable to delete the path created for facilitating rename"
>   + deletePath);
> }
> throw new HiveException("Unable to rename: " + sourcePath
> + " to: " + targetPath);
>   }
> } else if (!fs.mkdirs(targetPath)) {
>   throw new HiveException("Unable to make directory: " + targetPath);
> }
>   }
> {code}
> sourcePath and targetPath may come from different filesystems, so we should 
> resolve them separately.
> I see that HIVE-11568 already did this in Hive.java.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (HIVE-16943) MoveTask should separate src FileSystem from dest FileSystem

2017-06-22 Thread Fei Hui (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fei Hui reassigned HIVE-16943:
--

Assignee: Fei Hui

> MoveTask should separate src FileSystem from dest FileSystem 
> -
>
> Key: HIVE-16943
> URL: https://issues.apache.org/jira/browse/HIVE-16943
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 3.0.0
>Reporter: Fei Hui
>Assignee: Fei Hui
> Attachments: HIVE-16943.patch
>
>
> {code:title=MoveTask.java|borderStyle=solid} 
>  private void moveFileInDfs (Path sourcePath, Path targetPath, FileSystem fs)
>   throws HiveException, IOException {
> // if source exists, rename. Otherwise, create a empty directory
> if (fs.exists(sourcePath)) {
>   Path deletePath = null;
>   // If it multiple level of folder are there fs.rename is failing so 
> first
>   // create the targetpath.getParent() if it not exist
>   if (HiveConf.getBoolVar(conf, 
> HiveConf.ConfVars.HIVE_INSERT_INTO_MULTILEVEL_DIRS)) {
> deletePath = createTargetPath(targetPath, fs);
>   }
>   Hive.clearDestForSubDirSrc(conf, targetPath, sourcePath, false);
>   if (!Hive.moveFile(conf, sourcePath, targetPath, true, false)) {
> try {
>   if (deletePath != null) {
> fs.delete(deletePath, true);
>   }
> } catch (IOException e) {
>   LOG.info("Unable to delete the path created for facilitating rename"
>   + deletePath);
> }
> throw new HiveException("Unable to rename: " + sourcePath
> + " to: " + targetPath);
>   }
> } else if (!fs.mkdirs(targetPath)) {
>   throw new HiveException("Unable to make directory: " + targetPath);
> }
>   }
> {code}
> sourcePath and targetPath may come from different filesystems, so we should 
> resolve them separately.
> I see that HIVE-11568 already did this in Hive.java.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16943) MoveTask should separate src FileSystem from dest FileSystem

2017-06-22 Thread Fei Hui (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fei Hui updated HIVE-16943:
---
Attachment: HIVE-16943.patch

Patch uploaded.

> MoveTask should separate src FileSystem from dest FileSystem 
> -
>
> Key: HIVE-16943
> URL: https://issues.apache.org/jira/browse/HIVE-16943
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 3.0.0
>Reporter: Fei Hui
> Attachments: HIVE-16943.patch
>
>
> {code:title=MoveTask.java|borderStyle=solid} 
>  private void moveFileInDfs (Path sourcePath, Path targetPath, FileSystem fs)
>   throws HiveException, IOException {
> // if source exists, rename. Otherwise, create a empty directory
> if (fs.exists(sourcePath)) {
>   Path deletePath = null;
>   // If it multiple level of folder are there fs.rename is failing so 
> first
>   // create the targetpath.getParent() if it not exist
>   if (HiveConf.getBoolVar(conf, 
> HiveConf.ConfVars.HIVE_INSERT_INTO_MULTILEVEL_DIRS)) {
> deletePath = createTargetPath(targetPath, fs);
>   }
>   Hive.clearDestForSubDirSrc(conf, targetPath, sourcePath, false);
>   if (!Hive.moveFile(conf, sourcePath, targetPath, true, false)) {
> try {
>   if (deletePath != null) {
> fs.delete(deletePath, true);
>   }
> } catch (IOException e) {
>   LOG.info("Unable to delete the path created for facilitating rename"
>   + deletePath);
> }
> throw new HiveException("Unable to rename: " + sourcePath
> + " to: " + targetPath);
>   }
> } else if (!fs.mkdirs(targetPath)) {
>   throw new HiveException("Unable to make directory: " + targetPath);
> }
>   }
> {code}
> sourcePath and targetPath may come from different filesystems, so we should 
> resolve them separately.
> I see that HIVE-11568 already did this in Hive.java.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16905) Add zookeeper ACL for hiveserver2

2017-06-22 Thread Saijin Huang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saijin Huang updated HIVE-16905:

Attachment: HIVE ACL FOR HIVESERVER2.pdf

> Add zookeeper ACL for hiveserver2
> -
>
> Key: HIVE-16905
> URL: https://issues.apache.org/jira/browse/HIVE-16905
> Project: Hive
>  Issue Type: New Feature
>Affects Versions: 3.0.0
>Reporter: Saijin Huang
>Assignee: Saijin Huang
> Attachments: HIVE-16905.1.patch, HIVE ACL FOR HIVESERVER2.pdf
>
>
> Adding a zookeeper ACL for hiveserver2 is necessary so that Hive can protect 
> the hiveserver2 znode from accidental deletion.
> --
> case:
> When making beeline connections through Hive HA with zookeeper, I suddenly 
> found that beeline could not connect to hiveserver2. The cause was that 
> someone had deleted /hiveserver2 by mistake, so the beeline connection 
> failed and could not read the configs from zookeeper.
> -
> Currently the ACL on /hiveserver2 is set to world:anyone:cdrwa, which means 
> anyone can delete /hiveserver2 and its znodes at any time. This is unsafe, 
> so the znode /hiveserver2 needs to be protected.
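
A minimal sketch of the idea (not the attached patch), using the plain 
ZooKeeper Java API; the connect string and timeout are placeholders, and it 
assumes the session has already authenticated (e.g., via SASL or digest) so 
that AUTH_IDS resolves to the creator:
{code}
import java.util.Arrays;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.ACL;

public class SecureZnodeSketch {
  public static void main(String[] args) throws Exception {
    ZooKeeper zk = new ZooKeeper("localhost:2181", 30000, event -> { });
    // The authenticated creator keeps full rights; everyone else may only
    // read the namespace, so the znode can no longer be deleted by anyone.
    ACL ownerAll = new ACL(ZooDefs.Perms.ALL, ZooDefs.Ids.AUTH_IDS);
    ACL worldRead = new ACL(ZooDefs.Perms.READ, ZooDefs.Ids.ANYONE_ID_UNSAFE);
    zk.create("/hiveserver2", new byte[0],
        Arrays.asList(ownerAll, worldRead), CreateMode.PERSISTENT);
    zk.close();
  }
}
{code}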



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16905) Add zookeeper ACL for hiveserver2

2017-06-22 Thread Saijin Huang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16059453#comment-16059453
 ] 

Saijin Huang commented on HIVE-16905:
-

The doc is updated.

> Add zookeeper ACL for hiveserver2
> -
>
> Key: HIVE-16905
> URL: https://issues.apache.org/jira/browse/HIVE-16905
> Project: Hive
>  Issue Type: New Feature
>Affects Versions: 3.0.0
>Reporter: Saijin Huang
>Assignee: Saijin Huang
> Attachments: HIVE-16905.1.patch, HIVE ACL FOR HIVESERVER2.pdf
>
>
> Adding a zookeeper ACL for hiveserver2 is necessary so that Hive can protect 
> the hiveserver2 znode from accidental deletion.
> --
> case:
> When making beeline connections through Hive HA with zookeeper, I suddenly 
> found that beeline could not connect to hiveserver2. The cause was that 
> someone had deleted /hiveserver2 by mistake, so the beeline connection 
> failed and could not read the configs from zookeeper.
> -
> Currently the ACL on /hiveserver2 is set to world:anyone:cdrwa, which means 
> anyone can delete /hiveserver2 and its znodes at any time. This is unsafe, 
> so the znode /hiveserver2 needs to be protected.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16875) Query against view with partitioned child on HoS fails with privilege exception.

2017-06-22 Thread Yongzhi Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yongzhi Chen updated HIVE-16875:

   Resolution: Fixed
Fix Version/s: 2.4.0
   3.0.0
   Status: Resolved  (was: Patch Available)

Committed the fix to master and branch-2. Thanks [~aihuaxu] for reviewing the 
code.

> Query against view with partitioned child on HoS fails with privilege 
> exception.
> 
>
> Key: HIVE-16875
> URL: https://issues.apache.org/jira/browse/HIVE-16875
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Affects Versions: 1.0.0
>Reporter: Yongzhi Chen
>Assignee: Yongzhi Chen
> Fix For: 3.0.0, 2.4.0
>
> Attachments: HIVE-16875.1.patch, HIVE-16875.2.patch, 
> HIVE-16875.3.patch
>
>
> Query against view with child table that has partitions fails with privilege 
> exception even with correct privileges.
> Reproduce:
> {noformat}
> create table jsamp1 (a string) partitioned by (b int);
> insert into table jsamp1 partition (b=1) values ("hello");
> create view jview as select * from jsamp1;
> create role viewtester;
> grant all on table jview to role viewtester;
> grant role viewtester to group testers;
> Use MR, the select will succeed:
> set hive.execution.engine=mr;
> select count(*) from jview;
> while use spark:
> set hive.execution.engine=spark;
> select count(*) from jview;
> it fails with:
> Error: Error while compiling statement: FAILED: SemanticException No valid 
> privileges
>  User tester does not have privileges for QUERY
>  The required privileges: 
> Server=server1->Db=default->Table=j1part->action=select; 
> (state=42000,code=4)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16934) Transform COUNT(x) into COUNT() when x is not nullable

2017-06-22 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16059437#comment-16059437
 ] 

Hive QA commented on HIVE-16934:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12874082/HIVE-16934.01.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 19 failed/errored test(s), 10846 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[insert_overwrite_local_directory_1]
 (batchId=238)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[unionDistinct_1] 
(batchId=142)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[columnstats_part_coltype]
 (batchId=158)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[tez_smb_main]
 (batchId=150)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] 
(batchId=99)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=233)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query16] 
(batchId=233)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] 
(batchId=233)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query94] 
(batchId=233)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[groupby_sort_1_23] 
(batchId=134)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[groupby_sort_skew_1_23]
 (batchId=104)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[join35] 
(batchId=127)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[union24] 
(batchId=125)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testBootstrapFunctionReplication
 (batchId=217)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testCreateFunctionIncrementalReplication
 (batchId=217)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testCreateFunctionWithFunctionBinaryJarsOnHDFS
 (batchId=217)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=178)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=178)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=178)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5731/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5731/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5731/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 19 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12874082 - PreCommit-HIVE-Build

> Transform COUNT(x) into COUNT() when x is not nullable
> --
>
> Key: HIVE-16934
> URL: https://issues.apache.org/jira/browse/HIVE-16934
> Project: Hive
>  Issue Type: Improvement
>  Components: Logical Optimizer
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-16934.01.patch, HIVE-16934.patch
>
>
> Add a rule to simplify COUNT aggregation function if possible, removing 
> expressions that cannot be nullable from its parameters.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16888) Upgrade Calcite to 1.13 and Avatica to 1.10

2017-06-22 Thread Remus Rusanu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Remus Rusanu updated HIVE-16888:

Attachment: HIVE-16888.03.patch

Patch .03 treats VARCHAR(HiveTypeSystemImpl.MAX_VARCHAR_PRECISION) as 
TOK_STRING in TypeConverter.hiveToken.
Calcite 1.13 literals now come in as CAST(... AS VARCHAR()) and the existing 
type conversion only considered TOK_STRING for VARCHAR(Integer.MAX_VALUE).
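
A hypothetical condensation of that check (the constant value and names are 
stand-ins, not the actual TypeConverter code):
{code}
public class StringTokenCheckSketch {
  // Stand-in for HiveTypeSystemImpl.MAX_VARCHAR_PRECISION, not the real field.
  static final int MAX_VARCHAR_PRECISION = 65535;

  static boolean treatAsString(int varcharPrecision) {
    // Previously only VARCHAR(Integer.MAX_VALUE) mapped to TOK_STRING; the
    // patch also maps the type system's maximum VARCHAR precision.
    return varcharPrecision == Integer.MAX_VALUE
        || varcharPrecision == MAX_VARCHAR_PRECISION;
  }

  public static void main(String[] args) {
    System.out.println(treatAsString(Integer.MAX_VALUE));     // true (old rule)
    System.out.println(treatAsString(MAX_VARCHAR_PRECISION)); // true (new rule)
    System.out.println(treatAsString(255));                   // false
  }
}
{code}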

> Upgrade Calcite to 1.13 and Avatica to 1.10
> ---
>
> Key: HIVE-16888
> URL: https://issues.apache.org/jira/browse/HIVE-16888
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 3.0.0
>Reporter: Remus Rusanu
>Assignee: Remus Rusanu
> Attachments: HIVE-16888.01.patch, HIVE-16888.02.patch, 
> HIVE-16888.03.patch
>
>
> I'm creating this early to be able to ptest the current Calcite 
> 1.13.0-SNAPSHOT



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16934) Transform COUNT(x) into COUNT() when x is not nullable

2017-06-22 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-16934:
---
Attachment: HIVE-16934.01.patch

> Transform COUNT(x) into COUNT() when x is not nullable
> --
>
> Key: HIVE-16934
> URL: https://issues.apache.org/jira/browse/HIVE-16934
> Project: Hive
>  Issue Type: Improvement
>  Components: Logical Optimizer
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-16934.01.patch, HIVE-16934.patch
>
>
> Add a rule to simplify COUNT aggregation function if possible, removing 
> expressions that cannot be nullable from its parameters.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16939) metastore error: 'export: -Dproc_metastore : not a valid identifier'

2017-06-22 Thread Fei Hui (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16059086#comment-16059086
 ] 

Fei Hui commented on HIVE-16939:


The failed tests are unrelated.

> metastore error: 'export: -Dproc_metastore : not a valid identifier'
> 
>
> Key: HIVE-16939
> URL: https://issues.apache.org/jira/browse/HIVE-16939
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 3.0.0
>Reporter: Fei Hui
>Assignee: Fei Hui
> Attachments: HIVE-16939.patch
>
>
> When I run the metastore, it reports the error below:
> {quote}
> bin/ext/metastore.sh: line 29: export: ` -Dproc_metastore  ': not a valid 
> identifier
> {quote}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16934) Transform COUNT(x) into COUNT() when x is not nullable

2017-06-22 Thread Jesus Camacho Rodriguez (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16059024#comment-16059024
 ] 

Jesus Camacho Rodriguez commented on HIVE-16934:


[~vgarg], there are multiple benefits that I can think of. First, on the 
execution side we will not have to access/evaluate any expression when 
calculating the COUNT. Further, by removing expressions that are referenced by 
the aggregate call, we might be able to further prune columns in the operator 
plan. Another benefit is that this might avoid computing some aggregate calls 
twice, e.g., {{COUNT(x)}} and {{COUNT(y)}} if _x_ and _y_ are not 
nullable. Finally, as a side effect, we might be able to recognize more 
equivalent expressions in MVs rewriting or SharedWorkOptimizer, and push more 
computation to Druid, since currently Druid is only capable of executing 
{{count(*)}}.
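
As a toy illustration of why the rewrite is only safe for non-nullable inputs 
(plain Java modeling the columns as arrays, not Hive internals):
{code}
import java.util.Arrays;
import java.util.Objects;

public class CountSemanticsSketch {
  public static void main(String[] args) {
    Integer[] x = {1, 2, 3};       // a NOT NULL column of a 3-row table
    Integer[] y = {1, null, 3};    // a nullable column of the same table
    long countStar = 3;            // COUNT(*): all rows
    long countX = Arrays.stream(x).filter(Objects::nonNull).count(); // 3
    long countY = Arrays.stream(y).filter(Objects::nonNull).count(); // 2
    // COUNT(x) == COUNT(*) because x has no nulls; COUNT(y) differs.
    System.out.println(countStar + " " + countX + " " + countY); // 3 3 2
  }
}
{code}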

> Transform COUNT(x) into COUNT() when x is not nullable
> --
>
> Key: HIVE-16934
> URL: https://issues.apache.org/jira/browse/HIVE-16934
> Project: Hive
>  Issue Type: Improvement
>  Components: Logical Optimizer
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-16934.patch
>
>
> Add a rule to simplify COUNT aggregation function if possible, removing 
> expressions that cannot be nullable from its parameters.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Comment Edited] (HIVE-16934) Transform COUNT(x) into COUNT() when x is not nullable

2017-06-22 Thread Jesus Camacho Rodriguez (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16059024#comment-16059024
 ] 

Jesus Camacho Rodriguez edited comment on HIVE-16934 at 6/22/17 9:14 AM:
-

[~vgarg], there are multiple benefits that I can think of. First, on the 
execution side we will not have to access/evaluate any expression when 
calculating the COUNT. Further, by removing expressions that are referenced by 
the aggregate call, we might be able to further prune columns in the operator 
plan. Another benefit is that this might avoid computing some aggregate calls 
twice, e.g., {{COUNT\(x\)}} and {{COUNT\(y\)}} if _x_ and _y_ are not 
nullable. Finally, as a side effect, we might be able to recognize more 
equivalent expressions in MVs rewriting or SharedWorkOptimizer, and push more 
computation to Druid, since currently Druid is only capable of executing 
{{COUNT\(*\)}}.


was (Author: jcamachorodriguez):
[~vgarg], there are multiple benefits that I can think of. First, on the 
execution side we will not have to access/evaluate any expression when 
calculating the COUNT. Further, by removing expressions that are referenced by 
the aggregate call, we might be able to further prune columns in the operator 
plan. Another benefit is that this might avoid computing some aggregate calls 
twice, e.g., {{COUNT(x)}} and {{COUNT(y)}} if _x_ and _y_ are not 
nullable. Finally, as a side effect, we might be able to recognize more 
equivalent expressions in MVs rewriting or SharedWorkOptimizer, and push more 
computation to Druid, since currently Druid is only capable of executing 
{{count(*)}}.

> Transform COUNT(x) into COUNT() when x is not nullable
> --
>
> Key: HIVE-16934
> URL: https://issues.apache.org/jira/browse/HIVE-16934
> Project: Hive
>  Issue Type: Improvement
>  Components: Logical Optimizer
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-16934.patch
>
>
> Add a rule to simplify COUNT aggregation function if possible, removing 
> expressions that cannot be nullable from its parameters.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-13567) Auto-gather column stats - phase 2

2017-06-22 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16058992#comment-16058992
 ] 

Hive QA commented on HIVE-13567:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12874016/HIVE-13567.17.patch

{color:green}SUCCESS:{color} +1 due to 20 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 138 failed/errored test(s), 10846 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[materialized_view_create_rewrite]
 (batchId=237)
org.apache.hadoop.hive.cli.TestCliDriver.org.apache.hadoop.hive.cli.TestCliDriver
 (batchId=3)
org.apache.hadoop.hive.cli.TestCliDriver.org.apache.hadoop.hive.cli.TestCliDriver
 (batchId=57)
org.apache.hadoop.hive.cli.TestCliDriver.org.apache.hadoop.hive.cli.TestCliDriver
 (batchId=9)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[alter2] (batchId=9)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[alter_merge_stats] 
(batchId=57)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[alter_numbuckets_partitioned_table2_h23]
 (batchId=14)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[alter_numbuckets_partitioned_table_h23]
 (batchId=65)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[alter_partition_change_col]
 (batchId=24)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[alter_partition_coltype] 
(batchId=25)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[alter_rename_partition_authorization]
 (batchId=57)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[alter_table_serde2] 
(batchId=25)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[authorization_6] 
(batchId=44)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[authorization_explain] 
(batchId=3)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[authorization_view_2] 
(batchId=57)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[authorization_view_disable_cbo_3]
 (batchId=9)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[avro_comments] 
(batchId=57)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[avro_date] (batchId=9)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[avro_partitioned] 
(batchId=3)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[ba_table3] (batchId=9)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucketmapjoin10] 
(batchId=49)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucketmapjoin11] 
(batchId=68)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucketmapjoin12] 
(batchId=34)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucketmapjoin13] 
(batchId=38)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucketmapjoin8] 
(batchId=12)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucketmapjoin9] 
(batchId=16)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucketsortoptimize_insert_1]
 (batchId=57)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[cbo_rp_auto_join1] 
(batchId=3)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[create_like_view] 
(batchId=3)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[create_or_replace_view] 
(batchId=37)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[database_drop] 
(batchId=57)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[delete_whole_partition] 
(batchId=9)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[describe_table] 
(batchId=41)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[display_colstats_tbllvl] 
(batchId=3)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[druid_topn] (batchId=3)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[exim_04_evolved_parts] 
(batchId=30)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[extract] (batchId=3)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[filter_cond_pushdown] 
(batchId=57)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[groupby2_limit] 
(batchId=9)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[groupby7_noskew_multi_single_reducer]
 (batchId=57)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[groupby_grouping_sets_grouping]
 (batchId=3)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[hook_order] (batchId=3)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[implicit_cast1] 
(batchId=57)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[index_auto_self_join] 
(batchId=57)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[index_compact_binary_search]
 (batchId=57)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[input0] (batchId=9)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[input33] (batchId=57)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[input9] (batchId=57)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[input_columnarserde] 
(batchId=57)

[jira] [Commented] (HIVE-16940) Residual predicates in join operator prevent vectorization

2017-06-22 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16058934#comment-16058934
 ] 

Gopal V commented on HIVE-16940:


There's the planning part, which throws out vectorization, and then all the 
specialized map-joins need to replace their forward() calls with 
forwardToFilter() and bounce the output VRBs through any filters if they exist.
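
A hypothetical model of that pattern in plain Java; {{forwardToFilter}} here is 
an illustration of the idea, not the actual operator API:
{code}
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical model of the change: instead of forwarding a joined batch
// directly, bounce it through the residual filter first, if one exists.
public class ResidualFilterSketch {
  static List<int[]> forwardToFilter(List<int[]> batch,
      Predicate<int[]> residual) {
    if (residual == null) {
      return batch;                 // no residual predicate: plain forward()
    }
    List<int[]> out = new ArrayList<>();
    for (int[] row : batch) {
      if (residual.test(row)) {     // e.g., an INNER join ON-clause filter
        out.add(row);
      }
    }
    return out;
  }

  public static void main(String[] args) {
    List<int[]> joined = List.of(new int[]{1, 5}, new int[]{7, 2});
    System.out.println(forwardToFilter(joined, r -> r[0] > r[1]).size()); // 1
  }
}
{code}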

> Residual predicates in join operator prevent vectorization
> --
>
> Key: HIVE-16940
> URL: https://issues.apache.org/jira/browse/HIVE-16940
> Project: Hive
>  Issue Type: Improvement
>  Components: Vectorization
>Affects Versions: 3.0.0
>Reporter: Jesus Camacho Rodriguez
>
> With HIVE-16885, filter predicates in ON clause for INNER joins are pushed 
> within the join operator (residual predicates in the operator). Previously, 
> residual predicates were only used for OUTER join operators.
> Currently, vectorization does not support the evaluation of residual 
> predicates that are within an INNER join, and thus, it gets disabled if the 
> filter expression is pushed within the join. We should implement the 
> vectorization of INNER join in the presence of residual predicates.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16939) metastore error: 'export: -Dproc_metastore : not a valid identifier'

2017-06-22 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16058911#comment-16058911
 ] 

Hive QA commented on HIVE-16939:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12874004/HIVE-16939.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 14 failed/errored test(s), 10841 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[insert_overwrite_local_directory_1]
 (batchId=237)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[tez_smb_main]
 (batchId=149)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr]
 (batchId=145)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=232)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query16] 
(batchId=232)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] 
(batchId=232)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query94] 
(batchId=232)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testBootstrapFunctionReplication
 (batchId=216)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testCreateFunctionIncrementalReplication
 (batchId=216)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testCreateFunctionWithFunctionBinaryJarsOnHDFS
 (batchId=216)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=177)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=177)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=177)
org.apache.hive.jdbc.TestMultiSessionsHS2WithLocalClusterSpark.testSparkQuery 
(batchId=226)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5729/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5729/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5729/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 14 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12874004 - PreCommit-HIVE-Build

> metastore error: 'export: -Dproc_metastore : not a valid identifier'
> 
>
> Key: HIVE-16939
> URL: https://issues.apache.org/jira/browse/HIVE-16939
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 3.0.0
>Reporter: Fei Hui
>Assignee: Fei Hui
> Attachments: HIVE-16939.patch
>
>
> When I run the metastore, it reports the error below:
> {quote}
> bin/ext/metastore.sh: line 29: export: ` -Dproc_metastore  ': not a valid 
> identifier
> {quote}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16940) Residual predicates in join operator prevent vectorization

2017-06-22 Thread Jesus Camacho Rodriguez (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16058899#comment-16058899
 ] 

Jesus Camacho Rodriguez commented on HIVE-16940:


Cc [~mmccline] [~ashutoshc] [~gopalv]

> Residual predicates in join operator prevent vectorization
> --
>
> Key: HIVE-16940
> URL: https://issues.apache.org/jira/browse/HIVE-16940
> Project: Hive
>  Issue Type: Improvement
>  Components: Vectorization
>Affects Versions: 3.0.0
>Reporter: Jesus Camacho Rodriguez
>
> With HIVE-16885, filter predicates in ON clause for INNER joins are pushed 
> within the join operator (residual predicates in the operator). Previously, 
> residual predicates were only used for OUTER join operators.
> Currently, vectorization does not support the evaluation of residual 
> predicates that are within an INNER join, and thus, it gets disabled if the 
> filter expression is pushed within the join. We should implement the 
> vectorization of INNER join in the presence of residual predicates.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16885) Non-equi Joins: Filter clauses should be pushed into the ON clause

2017-06-22 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-16885:
---
   Resolution: Fixed
Fix Version/s: 3.0.0
   Status: Resolved  (was: Patch Available)

Pushed to master, thanks for reviewing [~ashutoshc]!

> Non-equi Joins: Filter clauses should be pushed into the ON clause
> --
>
> Key: HIVE-16885
> URL: https://issues.apache.org/jira/browse/HIVE-16885
> Project: Hive
>  Issue Type: Improvement
>  Components: Physical Optimizer
>Affects Versions: 3.0.0
>Reporter: Gopal V
>Assignee: Jesus Camacho Rodriguez
> Fix For: 3.0.0
>
> Attachments: HIVE-16885.01.patch, HIVE-16885.02.patch, 
> HIVE-16885.03.patch, HIVE-16885.patch
>
>
> FIL_24 -> MAPJOIN_23
> {code}
> hive> explain  select * from part where p_size > (select max(p_size) from 
> part group by p_type);
> Warning: Map Join MAPJOIN[14][bigTable=?] in task 'Map 1' is a cross product
> OK
> Plan optimized by CBO.
> Vertex dependency in root stage
> Map 1 <- Reducer 3 (BROADCAST_EDGE)
> Reducer 3 <- Map 2 (SIMPLE_EDGE)
> Stage-0
>   Fetch Operator
> limit:-1
> Stage-1
>   Map 1 vectorized, llap
>   File Output Operator [FS_26]
> Select Operator [SEL_25] (rows=110 width=621)
>   
> Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8"]
>   Filter Operator [FIL_24] (rows=110 width=625)
> predicate:(_col5 > _col9)
> Map Join Operator [MAPJOIN_23] (rows=330 width=625)
>   
> Conds:(Inner),Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8","_col9"]
> <-Reducer 3 [BROADCAST_EDGE] vectorized, llap
>   BROADCAST [RS_21]
> Select Operator [SEL_20] (rows=165 width=4)
>   Output:["_col0"]
>   Group By Operator [GBY_19] (rows=165 width=109)
> 
> Output:["_col0","_col1"],aggregations:["max(VALUE._col0)"],keys:KEY._col0
>   <-Map 2 [SIMPLE_EDGE] vectorized, llap
> SHUFFLE [RS_18]
>   PartitionCols:_col0
>   Group By Operator [GBY_17] (rows=14190 width=109)
> 
> Output:["_col0","_col1"],aggregations:["max(p_size)"],keys:p_type
> Select Operator [SEL_16] (rows=2 width=109)
>   Output:["p_type","p_size"]
>   TableScan [TS_2] (rows=2 width=109)
> 
> tpch_flat_orc_1000@part,part,Tbl:COMPLETE,Col:COMPLETE,Output:["p_type","p_size"]
> <-Select Operator [SEL_22] (rows=2 width=621)
> 
> Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8"]
> TableScan [TS_0] (rows=2 width=621)
>   
> tpch_flat_orc_1000@part,part,Tbl:COMPLETE,Col:COMPLETE,Output:["p_partkey","p_name","p_mfgr","p_brand","p_type","p_size","p_container","p_retailprice","p_comment"]
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16832) duplicate ROW__ID possible in multi insert into transactional table

2017-06-22 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16058842#comment-16058842
 ] 

Hive QA commented on HIVE-16832:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12873989/HIVE-16832.06.patch

{color:green}SUCCESS:{color} +1 due to 15 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 188 failed/errored test(s), 10839 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[materialized_view_create_rewrite]
 (batchId=237)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[acid_subquery] 
(batchId=37)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[acid_vectorization] 
(batchId=61)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[autoColumnStats_4] 
(batchId=12)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[delete_all_non_partitioned]
 (batchId=27)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[delete_all_partitioned] 
(batchId=27)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[delete_orig_table] 
(batchId=38)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[delete_tmp_table] 
(batchId=49)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[delete_where_non_partitioned]
 (batchId=37)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[delete_where_partitioned]
 (batchId=38)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[delete_whole_partition] 
(batchId=9)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[insert_update_delete] 
(batchId=80)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[lateral_view_explode2] 
(batchId=80)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[lateral_view_noalias] 
(batchId=36)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[llap_acid] (batchId=76)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[masking_7] (batchId=42)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[masking_8] (batchId=7)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[masking_9] (batchId=75)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[masking_acid_no_masking] 
(batchId=22)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[udtf_stack] (batchId=36)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[update_after_multiple_inserts]
 (batchId=65)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[update_after_multiple_inserts_special_characters]
 (batchId=69)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[update_all_non_partitioned]
 (batchId=7)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[update_all_partitioned] 
(batchId=49)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[update_all_types] 
(batchId=17)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[update_orig_table] 
(batchId=58)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[update_tmp_table] 
(batchId=34)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[update_two_cols] 
(batchId=20)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[update_where_non_partitioned]
 (batchId=15)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[update_where_partitioned]
 (batchId=59)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[delete_all_non_partitioned]
 (batchId=149)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[delete_all_partitioned]
 (batchId=149)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[delete_tmp_table]
 (batchId=154)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[delete_where_non_partitioned]
 (batchId=151)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[delete_where_partitioned]
 (batchId=151)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[delete_whole_partition]
 (batchId=145)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[dynamic_semijoin_reduction_3]
 (batchId=158)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[dynpart_sort_optimization_acid]
 (batchId=153)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[explainuser_1]
 (batchId=151)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[insert_update_delete]
 (batchId=160)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[lateral_view]
 (batchId=159)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[ptf] 
(batchId=146)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[ptf_streaming]
 (batchId=154)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_orc_acid_part_update]
 (batchId=157)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_orc_acid_table_update]
 (batchId=157)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_orc_acidvec_part_update]
 (batchId=146)

[jira] [Commented] (HIVE-16939) metastore error: 'export: -Dproc_metastore : not a valid identifier'

2017-06-22 Thread Ferdinand Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16058818#comment-16058818
 ] 

Ferdinand Xu commented on HIVE-16939:
-

LGTM +1

> metastore error: 'export: -Dproc_metastore : not a valid identifier'
> 
>
> Key: HIVE-16939
> URL: https://issues.apache.org/jira/browse/HIVE-16939
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 3.0.0
>Reporter: Fei Hui
>Assignee: Fei Hui
> Attachments: HIVE-16939.patch
>
>
> When I run the metastore, it reports the following error:
> {quote}
> bin/ext/metastore.sh: line 29: export: ` -Dproc_metastore  ': not a valid 
> identifier
> {quote}
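
The quoted failure is a shell quoting pitfall: bash's export builtin received
a word that is not a valid variable name. A plausible reconstruction (a sketch
only; the exact metastore.sh line is not quoted in this thread) is a stray
space after the `=`, which turns the quoted options string into a separate
argument to export:

{code}
# Hypothetical: a space after `=` exports HADOOP_OPTS as empty, then bash
# tries to export the string " -Dproc_metastore ..." as a variable *name*:
export HADOOP_OPTS= " -Dproc_metastore $HADOOP_OPTS"
# bash: export: ` -Dproc_metastore ': not a valid identifier

# Without the stray space, the flag stays inside the assignment:
export HADOOP_OPTS=" -Dproc_metastore $HADOOP_OPTS"
{code}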



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-13567) Auto-gather column stats - phase 2

2017-06-22 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-13567:
---
Attachment: HIVE-13567.17.patch

> Auto-gather column stats - phase 2
> --
>
> Key: HIVE-13567
> URL: https://issues.apache.org/jira/browse/HIVE-13567
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-13567.01.patch, HIVE-13567.02.patch, 
> HIVE-13567.03.patch, HIVE-13567.04.patch, HIVE-13567.05.patch, 
> HIVE-13567.06.patch, HIVE-13567.07.patch, HIVE-13567.08.patch, 
> HIVE-13567.09.patch, HIVE-13567.10.patch, HIVE-13567.11.patch, 
> HIVE-13567.12.patch, HIVE-13567.13.patch, HIVE-13567.14.patch, 
> HIVE-13567.15.patch, HIVE-13567.16.patch, HIVE-13567.17.patch
>
>
> In phase 2, we are going to enable auto-gather of column stats by default. 
> This requires updating the golden files.
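
For context, the switch involved is the hive.stats.column.autogather property
introduced in phase 1; a minimal sketch of the behavior being made the default
(table name and values are illustrative only):

{code}
-- Today the feature can be enabled per session; phase 2 flips the default.
SET hive.stats.column.autogather=true;

-- With the flag on, a plain insert also computes column statistics, which is
-- why the many .q.out "golden files" that print stats need regenerating.
CREATE TABLE t (id INT, name STRING);
INSERT INTO t VALUES (1, 'a'), (2, 'b');
DESCRIBE FORMATTED t id;   -- shows the auto-gathered column stats
{code}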



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-13567) Auto-gather column stats - phase 2

2017-06-22 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-13567:
---
Status: Open  (was: Patch Available)

> Auto-gather column stats - phase 2
> --
>
> Key: HIVE-13567
> URL: https://issues.apache.org/jira/browse/HIVE-13567
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-13567.01.patch, HIVE-13567.02.patch, 
> HIVE-13567.03.patch, HIVE-13567.04.patch, HIVE-13567.05.patch, 
> HIVE-13567.06.patch, HIVE-13567.07.patch, HIVE-13567.08.patch, 
> HIVE-13567.09.patch, HIVE-13567.10.patch, HIVE-13567.11.patch, 
> HIVE-13567.12.patch, HIVE-13567.13.patch, HIVE-13567.14.patch, 
> HIVE-13567.15.patch, HIVE-13567.16.patch, HIVE-13567.17.patch
>
>
> In phase 2, we are going to enable auto-gather of column stats by default. 
> This requires updating the golden files.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-13567) Auto-gather column stats - phase 2

2017-06-22 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-13567:
---
Status: Patch Available  (was: Open)

> Auto-gather column stats - phase 2
> --
>
> Key: HIVE-13567
> URL: https://issues.apache.org/jira/browse/HIVE-13567
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-13567.01.patch, HIVE-13567.02.patch, 
> HIVE-13567.03.patch, HIVE-13567.04.patch, HIVE-13567.05.patch, 
> HIVE-13567.06.patch, HIVE-13567.07.patch, HIVE-13567.08.patch, 
> HIVE-13567.09.patch, HIVE-13567.10.patch, HIVE-13567.11.patch, 
> HIVE-13567.12.patch, HIVE-13567.13.patch, HIVE-13567.14.patch, 
> HIVE-13567.15.patch, HIVE-13567.16.patch, HIVE-13567.17.patch
>
>
> In phase 2, we are going to enable auto-gather of column stats by default. 
> This requires updating the golden files.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)