[jira] [Commented] (HIVE-17181) HCatOutputFormat should expose complete output-schema (including partition-keys) for dynamic-partitioning MR jobs

2017-08-14 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16126836#comment-16126836
 ] 

Thejas M Nair commented on HIVE-17181:
--

+1 to latest patch
Please go ahead and commit


> HCatOutputFormat should expose complete output-schema (including 
> partition-keys) for dynamic-partitioning MR jobs
> -
>
> Key: HIVE-17181
> URL: https://issues.apache.org/jira/browse/HIVE-17181
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Affects Versions: 2.2.0, 3.0.0
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-17181.1.patch, HIVE-17181.2.patch, 
> HIVE-17181.3.patch, HIVE-17181.branch-2.patch
>
>
> Map/Reduce jobs that use HCatalog APIs to write to Hive tables using Dynamic 
> partitioning are expected to call the following API methods:
> # {{HCatOutputFormat.setOutput()}} to indicate which table/partitions to 
> write to. This call populates the {{OutputJobInfo}} with details fetched from 
> the Metastore.
> # {{HCatOutputFormat.setSchema()}} to indicate the output-schema for the data 
> being written.
> It is a common mistake to invoke {{HCatOutputFormat.setSchema()}} as follows:
> {code:java}
> HCatOutputFormat.setSchema(conf, HCatOutputFormat.getTableSchema(conf));
> {code}
> Unfortunately, {{getTableSchema()}} returns only the record-schema, not the 
> entire table's schema. We'll need a better API for use in M/R jobs to get the 
> complete table-schema.
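> In the meantime, a workable sketch is to append the partition columns to the 
> record-schema by hand. (This assumes the {{HCatOutputFormat.getJobInfo()}} and 
> {{HCatTableInfo.getPartitionColumns()}} accessors; it illustrates the gap, and 
> is not the proposed API.)
> {code:java}
> // Record-schema only; getTableSchema() omits the partition keys.
> HCatSchema outputSchema = HCatOutputFormat.getTableSchema(conf);
> // Append the partition keys fetched during setOutput().
> HCatSchema partitionKeys =
>     HCatOutputFormat.getJobInfo(conf).getTableInfo().getPartitionColumns();
> for (HCatFieldSchema partitionKey : partitionKeys.getFields()) {
>   outputSchema.append(partitionKey);
> }
> HCatOutputFormat.setSchema(conf, outputSchema);
> {code}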



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17218) Canonical-ize hostnames for Hive metastore, and HS2 servers.

2017-08-14 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16126835#comment-16126835
 ] 

Thejas M Nair commented on HIVE-17218:
--

Thanks, can you please commit ?


> Canonical-ize hostnames for Hive metastore, and HS2 servers.
> 
>
> Key: HIVE-17218
> URL: https://issues.apache.org/jira/browse/HIVE-17218
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2, Metastore, Security
>Affects Versions: 1.2.2, 2.2.0, 3.0.0
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-17218.1.patch
>
>
> Currently, the {{HiveMetastoreClient}} and {{HiveConnection}} do not 
> canonical-ize the hostnames of the metastore/HS2 servers. In deployments 
> where there are multiple such servers behind a VIP, this causes a number of 
> inconveniences:
> # The client-side configuration (e.g. {{hive.metastore.uris}} in 
> {{hive-site.xml}}) needs to specify the VIP's hostname, and cannot use a 
> simplified CNAME, in the thrift URL. If the 
> {{hive.metastore.kerberos.principal}} is specified using {{_HOST}}, one sees 
> GSS failures as follows:
> {noformat}
> hive --hiveconf hive.metastore.kerberos.principal=hive/_h...@grid.myth.net 
> --hiveconf 
> hive.metastore.uris="thrift://simplified-hcat-cname.grid.myth.net:56789"
> ...
> Exception in thread "main" java.lang.RuntimeException: 
> java.lang.RuntimeException: Unable to instantiate 
> org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
> at 
> org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:542)
> at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:677)
> at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:621)
> ...
> {noformat}
> This is because {{_HOST}} is filled in with the CNAME, and not the 
> canonicalized name.
> # Oozie workflows that use HCat {{}} have to always use the VIP 
> hostname, and can't use {{_HOST}}-based service principals, if the CNAME 
> differs from the VIP name.
> If the client-code simply canonical-ized the hostnames, it would enable the 
> use of both simplified CNAMEs and {{_HOST}} in service principals.
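> A minimal sketch of the canonicalization in question, using the standard JDK 
> resolver ({{configuredHost}} stands in for the hostname parsed from the 
> metastore/HS2 URI; where exactly this lands in the client code is left to the 
> patch):
> {code:java}
> // Resolve the configured CNAME to its canonical hostname, so that a
> // _HOST-based Kerberos principal expands to the right server name.
> String canonicalHost =
>     java.net.InetAddress.getByName(configuredHost).getCanonicalHostName();
> {code}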



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17181) HCatOutputFormat should expose complete output-schema (including partition-keys) for dynamic-partitioning MR jobs

2017-08-14 Thread Mithun Radhakrishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16126779#comment-16126779
 ] 

Mithun Radhakrishnan commented on HIVE-17181:
-

Hey, [~thejas]. Does the latest version of this patch look better? The test 
failures once again appear to be the ones already tracked in HIVE-16908 and HIVE-15058.

> HCatOutputFormat should expose complete output-schema (including 
> partition-keys) for dynamic-partitioning MR jobs
> -
>
> Key: HIVE-17181
> URL: https://issues.apache.org/jira/browse/HIVE-17181
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Affects Versions: 2.2.0, 3.0.0
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-17181.1.patch, HIVE-17181.2.patch, 
> HIVE-17181.3.patch, HIVE-17181.branch-2.patch
>
>
> Map/Reduce jobs that use HCatalog APIs to write to Hive tables using Dynamic 
> partitioning are expected to call the following API methods:
> # {{HCatOutputFormat.setOutput()}} to indicate which table/partitions to 
> write to. This call populates the {{OutputJobInfo}} with details fetched from 
> the Metastore.
> # {{HCatOutputFormat.setSchema()}} to indicate the output-schema for the data 
> being written.
> It is a common mistake to invoke {{HCatOutputFormat.setSchema()}} as follows:
> {code:java}
> HCatOutputFormat.setSchema(conf, HCatOutputFormat.getTableSchema(conf));
> {code}
> Unfortunately, {{getTableSchema()}} returns only the record-schema, not the 
> entire table's schema. We'll need a better API for use in M/R jobs to get the 
> complete table-schema.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-11548) HCatLoader should support predicate pushdown.

2017-08-14 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16126774#comment-16126774
 ] 

Hive QA commented on HIVE-11548:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12881855/HIVE-11548.5.patch

{color:green}SUCCESS:{color} +1 due to 3 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 10 failed/errored test(s), 11008 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[insert_overwrite_local_directory_1]
 (batchId=241)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_merge_move]
 (batchId=244)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_merge_only]
 (batchId=244)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_move_only]
 (batchId=244)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning]
 (batchId=169)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning_mapjoin_only]
 (batchId=170)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning]
 (batchId=169)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=180)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=180)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=180)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6395/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6395/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6395/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 10 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12881855 - PreCommit-HIVE-Build

> HCatLoader should support predicate pushdown.
> -
>
> Key: HIVE-11548
> URL: https://issues.apache.org/jira/browse/HIVE-11548
> Project: Hive
>  Issue Type: New Feature
>  Components: HCatalog
>Affects Versions: 3.0.0
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-11548.1.patch, HIVE-11548.2.patch, 
> HIVE-11548.3.patch, HIVE-11548.4.patch, HIVE-11548.5.patch
>
>
> When one uses {{HCatInputFormat}}/{{HCatLoader}} to read from file-formats 
> that support predicate pushdown (such as ORC, with 
> {{hive.optimize.index.filter=true}}), one sees that the predicates aren't 
> actually pushed down into the storage layer.
> The forthcoming patch should allow for filter-pushdown, if any of the 
> partitions being scanned with {{HCatLoader}} support the functionality. The 
> patch should technically allow the same for users of {{HCatInputFormat}}, but 
> I don't currently have a neat interface to build a compound 
> predicate-expression. Will add this separately, if required.
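> For reference, the kind of compound predicate-expression the storage layer 
> consumes is ORC's {{SearchArgument}}. A sketch of building one directly, 
> assuming the Hive 2.x storage-api signatures (illustrative column names):
> {code:java}
> // Compound predicate: state = 'CA' AND age < 30
> SearchArgument sarg = SearchArgumentFactory.newBuilder()
>     .startAnd()
>       .equals("state", PredicateLeaf.Type.STRING, "CA")
>       .lessThan("age", PredicateLeaf.Type.LONG, 30L)
>     .end()
>     .build();
> {code}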



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17308) Improvement in join cardinality estimation

2017-08-14 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16126747#comment-16126747
 ] 

Hive QA commented on HIVE-17308:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12881854/HIVE-17308.6.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 30 failed/errored test(s), 11004 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[insert_overwrite_local_directory_1]
 (batchId=240)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_merge_move]
 (batchId=243)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_merge_only]
 (batchId=243)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_move_only]
 (batchId=243)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[annotate_stats_join] 
(batchId=51)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[dynamic_semijoin_user_level]
 (batchId=144)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[explainuser_2] 
(batchId=144)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[columnstats_part_coltype]
 (batchId=159)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[correlationoptimizer1]
 (batchId=157)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[dynamic_semijoin_reduction]
 (batchId=155)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[explainuser_1]
 (batchId=153)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[join_max_hashtable]
 (batchId=146)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[skewjoin] 
(batchId=150)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[subquery_exists]
 (batchId=154)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[subquery_in]
 (batchId=158)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[subquery_multi]
 (batchId=149)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[subquery_notin]
 (batchId=159)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[subquery_scalar]
 (batchId=154)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[subquery_select]
 (batchId=154)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[subquery_views]
 (batchId=148)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning]
 (batchId=169)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning_mapjoin_only]
 (batchId=170)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_explainuser_1]
 (batchId=170)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning]
 (batchId=169)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=235)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] 
(batchId=235)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[annotate_stats_join]
 (batchId=123)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=180)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=180)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=180)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6394/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6394/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6394/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 30 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12881854 - PreCommit-HIVE-Build

> Improvement in join cardinality estimation
> --
>
> Key: HIVE-17308
> URL: https://issues.apache.org/jira/browse/HIVE-17308
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Planning
>Reporter: Vineet Garg
>Assignee: Vineet Garg
> Attachments: HIVE-17308.1.patch, HIVE-17308.2.patch, 
> HIVE-17308.3.patch, HIVE-17308.4.patch, HIVE-17308.5.patch, HIVE-17308.6.patch
>
>
> Currently during logical planning join cardinality is estimated assuming no 
> correlation among join keys (This estimation is done using exponential 
> backoff). Physical planning on the other hand considers correlation for multi 
> keys and uses different estimation. We should consider correlation during 
> logical planning as well.

[jira] [Commented] (HIVE-17291) Set the number of executors based on config if client does not provide information

2017-08-14 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16126744#comment-16126744
 ] 

Xuefu Zhang commented on HIVE-17291:


[~lirui], that makes sense.

> Set the number of executors based on config if client does not provide 
> information
> --
>
> Key: HIVE-17291
> URL: https://issues.apache.org/jira/browse/HIVE-17291
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Affects Versions: 3.0.0
>Reporter: Peter Vary
>Assignee: Peter Vary
> Attachments: HIVE-17291.1.patch
>
>
> When calculating the memory and cores, if the client does not provide the 
> information, we should fall back to the values provided by the default 
> configuration. This can happen on startup, when 
> {{spark.dynamicAllocation.enabled}} is not enabled.
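> A rough illustration of the intended fallback, reading the standard Spark 
> configuration keys (a sketch, not necessarily the exact code path):
> {code:java}
> // Hypothetical fallback when the remote client reports no executor info.
> int executorCores = sparkConf.getInt("spark.executor.cores", 1);
> int numExecutors = sparkConf.getInt("spark.executor.instances", 2);
> long executorMemory = sparkConf.getSizeAsBytes("spark.executor.memory", "1g");
> {code}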



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17291) Set the number of executors based on config if client does not provide information

2017-08-14 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16126738#comment-16126738
 ] 

Rui Li commented on HIVE-17291:
---

[~pvary],
bq. Logic suggests that in this case we will request new executors based on 
the new settings, and we are in the middle of the query test file, so Rui Li's 
magic does not apply here
I think it's because when we get a spark session, we'll set it to the 
SessionState:
https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkUtilities.java#L127
In QTestUtil, we override the {{setSparkSession}} method, so each time we 
create a new spark session, we'll wait for it to init. Currently, we have only 
one executor, and we wait until numCores > 1. So we don't have any flakiness. 
With your change in HIVE-17292, we'll need to wait until we have 4 cores.

[~xuefuz], your concerns about dynamic allocation are valid. I think the current 
implementation only uses "size per reducer" when dynamic allocation is enabled. 
For static allocation, as I mentioned, the available executors/cores can only 
give us more reducers than needed (decided by "size per reducer"). As long as 
we have a reasonable {{hive.exec.reducers.bytes.per.reducer}}, we won't 
underestimate the number of reducers. So I think we're good with the current 
logic. Does that make sense?
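As a back-of-the-envelope illustration of the "size per reducer" estimate 
(Hive's usual formula, with made-up numbers):
{noformat}
reducers = ceil(total input size / hive.exec.reducers.bytes.per.reducer)
         = ceil(10 GB / 256 MB)
         = 40
{noformat}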

> Set the number of executors based on config if client does not provide 
> information
> --
>
> Key: HIVE-17291
> URL: https://issues.apache.org/jira/browse/HIVE-17291
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Affects Versions: 3.0.0
>Reporter: Peter Vary
>Assignee: Peter Vary
> Attachments: HIVE-17291.1.patch
>
>
> When calculating the memory and cores, if the client does not provide the 
> information, we should fall back to the values provided by the default 
> configuration. This can happen on startup, when 
> {{spark.dynamicAllocation.enabled}} is not enabled.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17218) Canonical-ize hostnames for Hive metastore, and HS2 servers.

2017-08-14 Thread Mithun Radhakrishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16126708#comment-16126708
 ] 

Mithun Radhakrishnan commented on HIVE-17218:
-

I have verified that the tests aren't failing because of this patch. HIVE-16908 
tracks the {{HCatClient}} failures. The others look like they're handled in 
HIVE-15058.

> Canonical-ize hostnames for Hive metastore, and HS2 servers.
> 
>
> Key: HIVE-17218
> URL: https://issues.apache.org/jira/browse/HIVE-17218
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2, Metastore, Security
>Affects Versions: 1.2.2, 2.2.0, 3.0.0
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-17218.1.patch
>
>
> Currently, the {{HiveMetastoreClient}} and {{HiveConnection}} do not 
> canonical-ize the hostnames of the metastore/HS2 servers. In deployments 
> where there are multiple such servers behind a VIP, this causes a number of 
> inconveniences:
> # The client-side configuration (e.g. {{hive.metastore.uris}} in 
> {{hive-site.xml}}) needs to specify the VIP's hostname, and cannot use a 
> simplified CNAME, in the thrift URL. If the 
> {{hive.metastore.kerberos.principal}} is specified using {{_HOST}}, one sees 
> GSS failures as follows:
> {noformat}
> hive --hiveconf hive.metastore.kerberos.principal=hive/_h...@grid.myth.net 
> --hiveconf 
> hive.metastore.uris="thrift://simplified-hcat-cname.grid.myth.net:56789"
> ...
> Exception in thread "main" java.lang.RuntimeException: 
> java.lang.RuntimeException: Unable to instantiate 
> org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
> at 
> org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:542)
> at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:677)
> at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:621)
> ...
> {noformat}
> This is because {{_HOST}} is filled in with the CNAME, and not the 
> canonicalized name.
> # Oozie workflows that use HCat {{}} have to always use the VIP 
> hostname, and can't use {{_HOST}}-based service principals, if the CNAME 
> differs from the VIP name.
> If the client-code simply canonical-ized the hostnames, it would enable the 
> use of both simplified CNAMEs and {{_HOST}} in service principals.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-11548) HCatLoader should support predicate pushdown.

2017-08-14 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-11548:

Status: Patch Available  (was: Open)

> HCatLoader should support predicate pushdown.
> -
>
> Key: HIVE-11548
> URL: https://issues.apache.org/jira/browse/HIVE-11548
> Project: Hive
>  Issue Type: New Feature
>  Components: HCatalog
>Affects Versions: 3.0.0
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-11548.1.patch, HIVE-11548.2.patch, 
> HIVE-11548.3.patch, HIVE-11548.4.patch, HIVE-11548.5.patch
>
>
> When one uses {{HCatInputFormat}}/{{HCatLoader}} to read from file-formats 
> that support predicate pushdown (such as ORC, with 
> {{hive.optimize.index.filter=true}}), one sees that the predicates aren't 
> actually pushed down into the storage layer.
> The forthcoming patch should allow for filter-pushdown, if any of the 
> partitions being scanned with {{HCatLoader}} support the functionality. The 
> patch should technically allow the same for users of {{HCatInputFormat}}, but 
> I don't currently have a neat interface to build a compound 
> predicate-expression. Will add this separately, if required.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-11548) HCatLoader should support predicate pushdown.

2017-08-14 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-11548:

Attachment: HIVE-11548.5.patch

Rebased, again.

> HCatLoader should support predicate pushdown.
> -
>
> Key: HIVE-11548
> URL: https://issues.apache.org/jira/browse/HIVE-11548
> Project: Hive
>  Issue Type: New Feature
>  Components: HCatalog
>Affects Versions: 3.0.0
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-11548.1.patch, HIVE-11548.2.patch, 
> HIVE-11548.3.patch, HIVE-11548.4.patch, HIVE-11548.5.patch
>
>
> When one uses {{HCatInputFormat}}/{{HCatLoader}} to read from file-formats 
> that support predicate pushdown (such as ORC, with 
> {{hive.optimize.index.filter=true}}), one sees that the predicates aren't 
> actually pushed down into the storage layer.
> The forthcoming patch should allow for filter-pushdown, if any of the 
> partitions being scanned with {{HCatLoader}} support the functionality. The 
> patch should technically allow the same for users of {{HCatInputFormat}}, but 
> I don't currently have a neat interface to build a compound 
> predicate-expression. Will add this separately, if required.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17308) Improvement in join cardinality estimation

2017-08-14 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-17308:
---
Attachment: HIVE-17308.6.patch

> Improvement in join cardinality estimation
> --
>
> Key: HIVE-17308
> URL: https://issues.apache.org/jira/browse/HIVE-17308
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Planning
>Reporter: Vineet Garg
>Assignee: Vineet Garg
> Attachments: HIVE-17308.1.patch, HIVE-17308.2.patch, 
> HIVE-17308.3.patch, HIVE-17308.4.patch, HIVE-17308.5.patch, HIVE-17308.6.patch
>
>
> Currently during logical planning join cardinality is estimated assuming no 
> correlation among join keys (This estimation is done using exponential 
> backoff). Physical planning on the other hand considers correlation for multi 
> keys and uses different estimation. We should consider correlation during 
> logical planning as well.
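> (For reference, the exponential-backoff estimate dampens the product of 
> per-key selectivities, applying the most selective key at full weight; a 
> sketch of the formula being discussed, not the exact code:)
> {noformat}
> sel(join) = s1 * s2^(1/2) * s3^(1/4) * s4^(1/8) ...   where s1 <= s2 <= s3 ...
> estimated rows = |R| * |S| * sel(join)
> {noformat}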



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17308) Improvement in join cardinality estimation

2017-08-14 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-17308:
---
Status: Patch Available  (was: Open)

> Improvement in join cardinality estimation
> --
>
> Key: HIVE-17308
> URL: https://issues.apache.org/jira/browse/HIVE-17308
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Planning
>Reporter: Vineet Garg
>Assignee: Vineet Garg
> Attachments: HIVE-17308.1.patch, HIVE-17308.2.patch, 
> HIVE-17308.3.patch, HIVE-17308.4.patch, HIVE-17308.5.patch, HIVE-17308.6.patch
>
>
> Currently during logical planning join cardinality is estimated assuming no 
> correlation among join keys (This estimation is done using exponential 
> backoff). Physical planning on the other hand considers correlation for multi 
> keys and uses different estimation. We should consider correlation during 
> logical planning as well.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17308) Improvement in join cardinality estimation

2017-08-14 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-17308:
---
Status: Open  (was: Patch Available)

> Improvement in join cardinality estimation
> --
>
> Key: HIVE-17308
> URL: https://issues.apache.org/jira/browse/HIVE-17308
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Planning
>Reporter: Vineet Garg
>Assignee: Vineet Garg
> Attachments: HIVE-17308.1.patch, HIVE-17308.2.patch, 
> HIVE-17308.3.patch, HIVE-17308.4.patch, HIVE-17308.5.patch, HIVE-17308.6.patch
>
>
> Currently during logical planning join cardinality is estimated assuming no 
> correlation among join keys (This estimation is done using exponential 
> backoff). Physical planning on the other hand considers correlation for multi 
> keys and uses different estimation. We should consider correlation during 
> logical planning as well.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17297) allow AM to use LLAP guaranteed tasks

2017-08-14 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16126690#comment-16126690
 ] 

Hive QA commented on HIVE-17297:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12881850/HIVE-17297.only.patch

{color:red}ERROR:{color} -1 due to build exiting with an error

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6393/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6393/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6393/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hiveptest/working/scratch/source-prep.sh' failed with exit 
status 1 and output '+ date '+%Y-%m-%d %T.%3N'
2017-08-15 00:59:48.520
+ [[ -n /usr/lib/jvm/java-8-openjdk-amd64 ]]
+ export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ export 
PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ 
PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'MAVEN_OPTS=-Xmx1g '
+ MAVEN_OPTS='-Xmx1g '
+ cd /data/hiveptest/working/
+ tee /data/hiveptest/logs/PreCommit-HIVE-Build-6393/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ git = \s\v\n ]]
+ [[ git = \g\i\t ]]
+ [[ -z master ]]
+ [[ -d apache-github-source-source ]]
+ [[ ! -d apache-github-source-source/.git ]]
+ [[ ! -d apache-github-source-source ]]
+ date '+%Y-%m-%d %T.%3N'
2017-08-15 00:59:48.523
+ cd apache-github-source-source
+ git fetch origin
+ git reset --hard HEAD
HEAD is now at 06d9a6b HIVE-16873: Remove Thread Cache From Logging (BELUGA 
BEHR reviewed by Aihua Xu)
+ git clean -f -d
Removing 
itests/hive-unit-hadoop2/src/test/java/org/apache/hadoop/hive/metastore/
Removing 
standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/security/
Removing 
standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/utils/HdfsUtils.java
Removing 
standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/utils/SecurityUtils.java
Removing standalone-metastore/src/main/java/org/apache/hadoop/security/
Removing 
standalone-metastore/src/test/java/org/apache/hadoop/hive/metastore/utils/
+ git checkout master
Already on 'master'
Your branch is up-to-date with 'origin/master'.
+ git reset --hard origin/master
HEAD is now at 06d9a6b HIVE-16873: Remove Thread Cache From Logging (BELUGA 
BEHR reviewed by Aihua Xu)
+ git merge --ff-only origin/master
Already up-to-date.
+ date '+%Y-%m-%d %T.%3N'
2017-08-15 00:59:55.661
+ patchCommandPath=/data/hiveptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hiveptest/working/scratch/build.patch
+ [[ -f /data/hiveptest/working/scratch/build.patch ]]
+ chmod +x /data/hiveptest/working/scratch/smart-apply-patch.sh
+ /data/hiveptest/working/scratch/smart-apply-patch.sh 
/data/hiveptest/working/scratch/build.patch
error: patch failed: 
llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/LlapProtocolServerImpl.java:47
error: 
llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/LlapProtocolServerImpl.java:
 patch does not apply
The patch does not appear to apply with p0, p1, or p2
+ exit 1
'
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12881850 - PreCommit-HIVE-Build

> allow AM to use LLAP guaranteed tasks
> -
>
> Key: HIVE-17297
> URL: https://issues.apache.org/jira/browse/HIVE-17297
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-17297.only.patch, HIVE-17297.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17241) Change metastore classes to not use the shims

2017-08-14 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16126688#comment-16126688
 ] 

Hive QA commented on HIVE-17241:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12881840/HIVE-17241.2.patch

{color:green}SUCCESS:{color} +1 due to 24 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 21 failed/errored test(s), 11017 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[insert_overwrite_local_directory_1]
 (batchId=240)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_merge_move]
 (batchId=243)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_merge_only]
 (batchId=243)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_move_only]
 (batchId=243)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[partition_wise_fileformat6]
 (batchId=7)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[columnstats_part_coltype]
 (batchId=159)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[orc_merge1]
 (batchId=169)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[orc_merge3]
 (batchId=169)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[orc_merge_diff_fs]
 (batchId=169)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[quotedid_smb]
 (batchId=169)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_combine_equivalent_work]
 (batchId=169)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning]
 (batchId=169)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning_mapjoin_only]
 (batchId=170)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_use_op_stats]
 (batchId=169)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning]
 (batchId=169)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[truncate_column_buckets]
 (batchId=169)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3] 
(batchId=99)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=235)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=180)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=180)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=180)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6392/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6392/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6392/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 21 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12881840 - PreCommit-HIVE-Build

> Change metastore classes to not use the shims
> -
>
> Key: HIVE-17241
> URL: https://issues.apache.org/jira/browse/HIVE-17241
> Project: Hive
>  Issue Type: Sub-task
>  Components: Metastore
>Reporter: Alan Gates
>Assignee: Alan Gates
> Attachments: HIVE-17241.2.patch, HIVE-17241.patch
>
>
> As part of moving the metastore into a standalone package, it will no longer 
> have access to the shims.  This means we need to either copy them or access 
> the underlying Hadoop operations directly.
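> A flavor of the change (illustrative only): a file operation that used to go 
> through {{ShimLoader.getHadoopShims()}} becomes a direct call into the Hadoop 
> API:
> {code:java}
> // Direct Hadoop calls, no shim indirection.
> FileSystem fs = path.getFileSystem(conf);
> FileStatus status = fs.getFileStatus(path);
> fs.mkdirs(newDir, new FsPermission((short) 0777));
> {code}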



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17089) make acid 2.0 the default

2017-08-14 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16126684#comment-16126684
 ] 

Sergey Shelukhin commented on HIVE-17089:
-

Posted some comments on RB

> make acid 2.0 the default
> -
>
> Key: HIVE-17089
> URL: https://issues.apache.org/jira/browse/HIVE-17089
> Project: Hive
>  Issue Type: New Feature
>  Components: Transactions
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
> Attachments: HIVE-17089.01.patch, HIVE-17089.03.patch, 
> HIVE-17089.05.patch, HIVE-17089.06.patch, HIVE-17089.07.patch, 
> HIVE-17089.10.patch, HIVE-17089.10.patch, HIVE-17089.11.patch, 
> HIVE-17089.12.patch, HIVE-17089.13.patch, HIVE-17089.14.patch
>
>
> acid 2.0 is introduced in HIVE-14035.  It replaces Update events with a 
> combination of Delete + Insert events.  This now makes U=D+I the default (and 
> only) supported acid table type in Hive 3.0.  
> The expectation for upgrade is that Major compaction has to be run on all 
> acid tables in the existing Hive cluster and that no new writes to these 
> tables take place since the start of compaction (Need to add a mechanism to 
> put a table in read-only mode - this way it can still be read while it's 
> being compacted).  Then upgrade to Hive 3.0 can take place.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17297) allow AM to use LLAP guaranteed tasks

2017-08-14 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16126674#comment-16126674
 ] 

Sergey Shelukhin commented on HIVE-17297:
-

Looks like RB has removed the option to post a patch with a base patch from 
both commandline and web UI. So, this won't really be on RB until the previous 
patch is committed. cc [~sseth]

> allow AM to use LLAP guaranteed tasks
> -
>
> Key: HIVE-17297
> URL: https://issues.apache.org/jira/browse/HIVE-17297
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-17297.only.patch, HIVE-17297.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17297) allow AM to use LLAP guaranteed tasks

2017-08-14 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-17297:

Status: Patch Available  (was: Open)

> allow AM to use LLAP guaranteed tasks
> -
>
> Key: HIVE-17297
> URL: https://issues.apache.org/jira/browse/HIVE-17297
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-17297.only.patch, HIVE-17297.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17297) allow AM to use LLAP guaranteed tasks

2017-08-14 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-17297:

Attachment: HIVE-17297.only.patch
HIVE-17297.patch

The patch on top of the previous one, and another version with both included, 
for HiveQA.

> allow AM to use LLAP guaranteed tasks
> -
>
> Key: HIVE-17297
> URL: https://issues.apache.org/jira/browse/HIVE-17297
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-17297.only.patch, HIVE-17297.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17297) allow AM to use LLAP guaranteed tasks

2017-08-14 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-17297:

Attachment: (was: HIVE-17297.patch)

> allow AM to use LLAP guaranteed tasks
> -
>
> Key: HIVE-17297
> URL: https://issues.apache.org/jira/browse/HIVE-17297
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-17297.only.patch, HIVE-17297.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (HIVE-17320) OrcRawRecordMerger.discoverKeyBounds logic can be simplified

2017-08-14 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman reassigned HIVE-17320:
-


> OrcRawRecordMerger.discoverKeyBounds logic can be simplified
> 
>
> Key: HIVE-17320
> URL: https://issues.apache.org/jira/browse/HIVE-17320
> Project: Hive
>  Issue Type: Improvement
>  Components: Transactions
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
> Fix For: 3.0.0
>
>
> With HIVE-17089 we never have any insert events in the deltas, so if for every 
> split of the base we know the min/max key, we can use them to filter delete 
> events, since all files are sorted by RecordIdentifier. So we should be able 
> to create a SARG for all delete deltas. The code can be simplified, since the 
> min/max key now never has to be null.
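> A sketch of the resulting delete-event filter (illustrative; 
> {{RecordIdentifier}} already orders by (originalTransaction, bucket, rowId)):
> {code:java}
> // Skip delete events that cannot match this base split's key range.
> boolean mayAffectSplit(RecordIdentifier deleteKey,
>                        RecordIdentifier minKey, RecordIdentifier maxKey) {
>   return deleteKey.compareTo(minKey) >= 0 && deleteKey.compareTo(maxKey) <= 0;
> }
> {code}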



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17100) Improve HS2 operation logs for REPL commands.

2017-08-14 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16126643#comment-16126643
 ] 

Hive QA commented on HIVE-17100:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12881838/HIVE-17100.01.patch

{color:red}ERROR:{color} -1 due to build exiting with an error

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6391/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6391/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6391/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hiveptest/working/scratch/source-prep.sh' failed with exit 
status 1 and output '+ date '+%Y-%m-%d %T.%3N'
2017-08-14 23:56:04.494
+ [[ -n /usr/lib/jvm/java-8-openjdk-amd64 ]]
+ export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ export 
PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ 
PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'MAVEN_OPTS=-Xmx1g '
+ MAVEN_OPTS='-Xmx1g '
+ cd /data/hiveptest/working/
+ tee /data/hiveptest/logs/PreCommit-HIVE-Build-6391/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ git = \s\v\n ]]
+ [[ git = \g\i\t ]]
+ [[ -z master ]]
+ [[ -d apache-github-source-source ]]
+ [[ ! -d apache-github-source-source/.git ]]
+ [[ ! -d apache-github-source-source ]]
+ date '+%Y-%m-%d %T.%3N'
2017-08-14 23:56:04.497
+ cd apache-github-source-source
+ git fetch origin
+ git reset --hard HEAD
HEAD is now at 06d9a6b HIVE-16873: Remove Thread Cache From Logging (BELUGA 
BEHR reviewed by Aihua Xu)
+ git clean -f -d
Removing common/src/java/org/apache/hadoop/hive/common/ndv/fm/FMSketch.java
Removing metastore/src/java/org/apache/hadoop/hive/metastore/columnstats/cache/
+ git checkout master
Already on 'master'
Your branch is up-to-date with 'origin/master'.
+ git reset --hard origin/master
HEAD is now at 06d9a6b HIVE-16873: Remove Thread Cache From Logging (BELUGA 
BEHR reviewed by Aihua Xu)
+ git merge --ff-only origin/master
Already up-to-date.
+ date '+%Y-%m-%d %T.%3N'
2017-08-14 23:56:11.579
+ patchCommandPath=/data/hiveptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hiveptest/working/scratch/build.patch
+ [[ -f /data/hiveptest/working/scratch/build.patch ]]
+ chmod +x /data/hiveptest/working/scratch/smart-apply-patch.sh
+ /data/hiveptest/working/scratch/smart-apply-patch.sh 
/data/hiveptest/working/scratch/build.patch
Going to apply patch with: patch -p1
patching file metastore/src/java/org/apache/hadoop/hive/metastore/TableType.java
patching file ql/if/queryplan.thrift
patching file ql/src/gen/thrift/gen-cpp/queryplan_types.cpp
patching file ql/src/gen/thrift/gen-cpp/queryplan_types.h
patching file 
ql/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/ql/plan/api/StageType.java
patching file ql/src/gen/thrift/gen-php/Types.php
patching file ql/src/gen/thrift/gen-py/queryplan/ttypes.py
patching file ql/src/gen/thrift/gen-rb/queryplan_types.rb
patching file ql/src/java/org/apache/hadoop/hive/ql/exec/TaskFactory.java
patching file ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplDumpTask.java
patching file 
ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplStateLogTask.java
patching file 
ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplStateLogWork.java
patching file 
ql/src/java/org/apache/hadoop/hive/ql/exec/repl/bootstrap/ReplLoadTask.java
patching file 
ql/src/java/org/apache/hadoop/hive/ql/exec/repl/bootstrap/ReplLoadWork.java
patching file 
ql/src/java/org/apache/hadoop/hive/ql/exec/repl/bootstrap/events/filesystem/BootstrapEventsIterator.java
patching file 
ql/src/java/org/apache/hadoop/hive/ql/exec/repl/bootstrap/events/filesystem/DatabaseEventsIterator.java
patching file 
ql/src/java/org/apache/hadoop/hive/ql/exec/repl/bootstrap/load/LoadFunction.java
patching file 
ql/src/java/org/apache/hadoop/hive/ql/exec/repl/bootstrap/load/table/LoadPartitions.java
patching file 
ql/src/java/org/apache/hadoop/hive/ql/parse/ReplicationSemanticAnalyzer.java
patching file ql/src/java/org/apache/hadoop/hive/ql/parse/ReplicationSpec.java
patching file ql/src/java/org/apache/hadoop/hive/ql/parse/repl/dump/Utils.java
patching file 
ql/src/java/org/apache/hadoop/hive/ql/parse/repl/load/message/AbstractMessageHandler.java
patching file 
ql/src/java/org/apache/hadoop/hive/ql/parse/repl/load/message/CreateFunctionHandler.java
patching file 
ql/src/java/org/apache/hadoop/hive/ql/parse/repl/log/logger/BootstrapDumpLogger.java
patching file 

[jira] [Commented] (HIVE-17286) Avoid expensive String serialization/deserialization for bitvectors

2017-08-14 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16126639#comment-16126639
 ] 

Hive QA commented on HIVE-17286:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12881830/HIVE-17286.04.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 47 failed/errored test(s), 11004 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[insert_overwrite_local_directory_1]
 (batchId=240)
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[materialized_view_create_rewrite]
 (batchId=240)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_merge_move]
 (batchId=243)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_merge_only]
 (batchId=243)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_move_only]
 (batchId=243)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[alter_partition_update_status]
 (batchId=85)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[alter_table_column_stats]
 (batchId=62)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[alter_table_update_status]
 (batchId=76)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[analyze_tbl_part] 
(batchId=47)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[autoColumnStats_3] 
(batchId=53)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[avro_decimal] 
(batchId=66)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[avro_decimal_native] 
(batchId=26)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bitvector] (batchId=79)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[colstats_all_nulls] 
(batchId=6)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[column_names_with_leading_and_trailing_spaces]
 (batchId=22)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[confirm_initial_tbl_stats]
 (batchId=29)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[decimal_stats] 
(batchId=80)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[describe_table] 
(batchId=41)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[extrapolate_part_stats_full]
 (batchId=33)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[extrapolate_part_stats_partial]
 (batchId=46)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[fm-sketch] (batchId=49)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[hll] (batchId=83)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[partition_coltype_literals]
 (batchId=12)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[rename_external_partition_location]
 (batchId=47)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[rename_table_update_column_stats]
 (batchId=35)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[stats_only_null] 
(batchId=26)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[temp_table_display_colstats_tbllvl]
 (batchId=74)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[tunable_ndv] (batchId=43)
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver[encryption_move_tbl]
 (batchId=165)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[autoColumnStats_2]
 (batchId=162)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[column_names_with_leading_and_trailing_spaces]
 (batchId=150)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[columnstats_part_coltype]
 (batchId=159)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[extrapolate_part_stats_partial_ndv]
 (batchId=159)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[stats_only_null]
 (batchId=150)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning]
 (batchId=169)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning_mapjoin_only]
 (batchId=170)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning]
 (batchId=169)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3] 
(batchId=99)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] 
(batchId=235)
org.apache.hadoop.hive.common.ndv.fm.TestFMSketchSerialization.testSerDe 
(batchId=250)
org.apache.hadoop.hive.metastore.TestHiveMetaStoreStatsMerge.testStatsMerge 
(batchId=210)
org.apache.hadoop.hive.metastore.cache.TestCachedStore.testAggrStatsRepeatedRead
 (batchId=201)
org.apache.hadoop.hive.metastore.cache.TestCachedStore.testPartitionAggrStats 
(batchId=201)
org.apache.hadoop.hive.metastore.cache.TestCachedStore.testPartitionAggrStatsBitVector
 (batchId=201)

[jira] [Updated] (HIVE-17277) HiveMetastoreClient Log name is wrong

2017-08-14 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated HIVE-17277:
--
Status: Patch Available  (was: Open)

> HiveMetastoreClient Log name is wrong
> -
>
> Key: HIVE-17277
> URL: https://issues.apache.org/jira/browse/HIVE-17277
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Reporter: Zac Zhou
>Assignee: Zac Zhou
>Priority: Minor
> Attachments: HIVE-17277.patch
>
>
> The logger name for HiveMetastoreClient is "hive.metastore". This is confusing 
> for users tracing the Hive log.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17277) HiveMetastoreClient Log name is wrong

2017-08-14 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated HIVE-17277:
--
Status: Open  (was: Patch Available)

Cancelling and resubmitting the patch in an attempt to get the tests to run.

> HiveMetastoreClient Log name is wrong
> -
>
> Key: HIVE-17277
> URL: https://issues.apache.org/jira/browse/HIVE-17277
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Reporter: Zac Zhou
>Assignee: Zac Zhou
>Priority: Minor
> Attachments: HIVE-17277.patch
>
>
> The logger name for HiveMetastoreClient is "hive.metastore". This is confusing 
> for users tracing the Hive log.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17277) HiveMetastoreClient Log name is wrong

2017-08-14 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16126611#comment-16126611
 ] 

Alan Gates commented on HIVE-17277:
---

+1

> HiveMetastoreClient Log name is wrong
> -
>
> Key: HIVE-17277
> URL: https://issues.apache.org/jira/browse/HIVE-17277
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Reporter: Zac Zhou
>Assignee: Zac Zhou
>Priority: Minor
> Attachments: HIVE-17277.patch
>
>
> The logger name for HiveMetastoreClient is "hive.metastore". This is confusing 
> for users tracing the Hive log.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17241) Change metastore classes to not use the shims

2017-08-14 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated HIVE-17241:
--
Attachment: HIVE-17241.2.patch

New version of the patch that fixes failing unit tests.

> Change metastore classes to not use the shims
> -
>
> Key: HIVE-17241
> URL: https://issues.apache.org/jira/browse/HIVE-17241
> Project: Hive
>  Issue Type: Sub-task
>  Components: Metastore
>Reporter: Alan Gates
>Assignee: Alan Gates
> Attachments: HIVE-17241.2.patch, HIVE-17241.patch
>
>
> As part of moving the metastore into a standalone package, it will no longer 
> have access to the shims.  This means we need to either copy them or access 
> the underlying Hadoop operations directly.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17241) Change metastore classes to not use the shims

2017-08-14 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated HIVE-17241:
--
Status: Patch Available  (was: Open)

> Change metastore classes to not use the shims
> -
>
> Key: HIVE-17241
> URL: https://issues.apache.org/jira/browse/HIVE-17241
> Project: Hive
>  Issue Type: Sub-task
>  Components: Metastore
>Reporter: Alan Gates
>Assignee: Alan Gates
> Attachments: HIVE-17241.2.patch, HIVE-17241.patch
>
>
> As part of moving the metastore into a standalone package, it will no longer 
> have access to the shims.  This means we need to either copy them or access 
> the underlying Hadoop operations directly.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17006) LLAP: Parquet caching

2017-08-14 Thread Gunther Hagleitner (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16126595#comment-16126595
 ] 

Gunther Hagleitner commented on HIVE-17006:
---

Can you create follow-ups for the major TODOs? E.g. uncopify 
ParquetMetadataCacheImpl, BB put for FileMetadataCache, the Parquet impl in the 
LlapIo impl...

The error handling in LlapCacheAwareFs should be done before commit, it seems; 
as it stands, that could leave a leak?

> LLAP: Parquet caching
> -
>
> Key: HIVE-17006
> URL: https://issues.apache.org/jira/browse/HIVE-17006
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-17006.01.patch, HIVE-17006.02.patch, 
> HIVE-17006.patch, HIVE-17006.WIP.patch
>
>
> There are multiple options to do Parquet caching in LLAP:
> 1) Full elevator (too intrusive for now).
> 2) Page based cache like ORC (requires some changes to Parquet, or 
> copy-pasted code).
> 3) Cache disk data on column chunk level as is.
> Given that Parquet reads at column chunk granularity, (2) is not as useful as 
> for ORC, but still a good idea. I messaged the dev list about it but didn't 
> get a response; we may follow up later.
> For now, do (3). 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17100) Improve HS2 operation logs for REPL commands.

2017-08-14 Thread Sankar Hariappan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-17100:

Status: Patch Available  (was: Open)

> Improve HS2 operation logs for REPL commands.
> -
>
> Key: HIVE-17100
> URL: https://issues.apache.org/jira/browse/HIVE-17100
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2, repl
>Affects Versions: 2.1.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>  Labels: DR, replication
> Fix For: 3.0.0
>
> Attachments: HIVE-17100.01.patch
>
>
> It is necessary to log the progress of the replication tasks in a structured 
> manner, as follows.
> *+Bootstrap Dump:+*
> * At the start of bootstrap dump, will add one log with below details.
> {color:#59afe1}* Database Name
> * Dump Type (BOOTSTRAP)
> * (Estimated) Total number of tables/views to dump
> * (Estimated) Total number of functions to dump.
> * Dump Start Time{color}
> * After each table dump, will add a log as follows
> {color:#59afe1}* Table/View Name
> * Type (TABLE/VIEW/MATERIALIZED_VIEW)
> * Table dump end time
> * Table dump progress. Format is Table sequence no/(Estimated) Total number 
> of tables and views.{color}
> * After each function dump, will add a log as follows
> {color:#59afe1}* Function Name
> * Function dump end time
> * Function dump progress. Format is Function sequence no/(Estimated) Total 
> number of functions.{color}
> * After completion of all dumps, will add a log as follows to consolidate the 
> dump.
> {color:#59afe1}* Database Name.
> * Dump Type (BOOTSTRAP).
> * Dump End Time.
> * (Actual) Total number of tables/views dumped.
> * (Actual) Total number of functions dumped.
> * Dump Directory.
> * Last Repl ID of the dump.{color}
> *Note:* The actual and estimated numbers of tables/functions may not match if 
> any table/function is dropped while the dump is in progress.
> *+Bootstrap Load:+*
> * At the start of bootstrap load, will add one log with below details.
> {color:#59afe1}* Database Name
> * Dump directory
> * Load Type (BOOTSTRAP)
> * Total number of tables/views to load
> * Total number of functions to load.
> * Load Start Time{color}
> * After each table load, will add a log as follows
> {color:#59afe1}* Table/View Name
> * Type (TABLE/VIEW/MATERIALIZED_VIEW)
> * Table load completion time
> * Table load progress. Format is Table sequence no/Total number of tables and 
> views.{color}
> * After each function load, will add a log as follows
> {color:#59afe1}* Function Name
> * Function load completion time
> * Function load progress. Format is Function sequence no/Total number of 
> functions.{color}
> * After completion of all dumps, will add a log as follows to consolidate the 
> load.
> {color:#59afe1}* Database Name.
> * Load Type (BOOTSTRAP).
> * Load End Time.
> * Total number of tables/views loaded.
> * Total number of functions loaded.
> * Last Repl ID of the loaded database.{color}
> *+Incremental Dump:+*
> * At the start of database dump, will add one log with below details.
> {color:#59afe1}* Database Name
> * Dump Type (INCREMENTAL)
> * (Estimated) Total number of events to dump.
> * Dump Start Time{color}
> * After each event dump, will add a log as follows
> {color:#59afe1}* Event ID
> * Event Type (CREATE_TABLE, DROP_TABLE, ALTER_TABLE, INSERT etc)
> * Event dump end time
> * Event dump progress. Format is Event sequence no/ (Estimated) Total number 
> of events.{color}
> * After completion of all event dumps, will add a log as follows.
> {color:#59afe1}* Database Name.
> * Dump Type (INCREMENTAL).
> * Dump End Time.
> * (Actual) Total number of events dumped.
> * Dump Directory.
> * Last Repl ID of the dump.{color}
> *Note:* The estimated number of events can be terribly inaccurate compared with 
> the actual number, as we don’t have the number of events upfront until we read 
> from the metastore NotificationEvents table.
> *+Incremental Load:+*
> * At the start of incremental load, will add one log with below details.
> {color:#59afe1}* Target Database Name 
> * Dump directory
> * Load Type (INCREMENTAL)
> * Total number of events to load
> * Load Start Time{color}
> * After each event load, will add a log as follows
> {color:#59afe1}* Event ID
> * Event Type (CREATE_TABLE, DROP_TABLE, ALTER_TABLE, INSERT etc)
> * Event load end time
> * Event load progress. Format is Event sequence no/ Total number of 
> events.{color}
> * After completion of all event loads, will add a log as follows to 
> consolidate the load.
> {color:#59afe1}* Target Database Name.
> * Load Type (INCREMENTAL).
> * Load End Time.
> * Total number of events loaded.
> * Last Repl ID of the loaded database.{color}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17100) Improve HS2 operation logs for REPL commands.

2017-08-14 Thread Sankar Hariappan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-17100:

Attachment: HIVE-17100.01.patch

Added 01.patch with below changes.
- Added Repl logger for bootstrap dump/load and incremental dump/load.
- Serialized the log messages with separate classes for each log message.
- Logged both in console and log file.
- Added all logs as mentioned in description.
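For illustration, a minimal sketch of the "one serialized class per log message" idea described above; the class and field names below are assumptions, not the ones in the patch:

{code:java}
import com.fasterxml.jackson.databind.ObjectMapper;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Hypothetical structured log message for the start of a bootstrap dump.
public class BootstrapDumpBeginLog {
  private static final Logger LOG = LoggerFactory.getLogger("ReplState");
  private static final ObjectMapper MAPPER = new ObjectMapper();

  public final String dbName;
  public final String dumpType = "BOOTSTRAP";
  public final long estimatedNumTables;
  public final long estimatedNumFunctions;
  public final long dumpStartTime;

  public BootstrapDumpBeginLog(String dbName, long numTables, long numFunctions) {
    this.dbName = dbName;
    this.estimatedNumTables = numTables;
    this.estimatedNumFunctions = numFunctions;
    this.dumpStartTime = System.currentTimeMillis() / 1000;
  }

  public void log() throws Exception {
    // Serialize as JSON so the HS2 operation log stays machine-parseable.
    LOG.info("REPL::START: {}", MAPPER.writeValueAsString(this));
  }
}
{code}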


> Improve HS2 operation logs for REPL commands.
> -
>
> Key: HIVE-17100
> URL: https://issues.apache.org/jira/browse/HIVE-17100
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2, repl
>Affects Versions: 2.1.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>  Labels: DR, replication
> Fix For: 3.0.0
>
> Attachments: HIVE-17100.01.patch
>
>
> It is necessary to log the progress of the replication tasks in a structured 
> manner, as follows.
> *+Bootstrap Dump:+*
> * At the start of bootstrap dump, will add one log with below details.
> {color:#59afe1}* Database Name
> * Dump Type (BOOTSTRAP)
> * (Estimated) Total number of tables/views to dump
> * (Estimated) Total number of functions to dump.
> * Dump Start Time{color}
> * After each table dump, will add a log as follows
> {color:#59afe1}* Table/View Name
> * Type (TABLE/VIEW/MATERIALIZED_VIEW)
> * Table dump end time
> * Table dump progress. Format is Table sequence no/(Estimated) Total number 
> of tables and views.{color}
> * After each function dump, will add a log as follows
> {color:#59afe1}* Function Name
> * Function dump end time
> * Function dump progress. Format is Function sequence no/(Estimated) Total 
> number of functions.{color}
> * After completion of all dumps, will add a log as follows to consolidate the 
> dump.
> {color:#59afe1}* Database Name.
> * Dump Type (BOOTSTRAP).
> * Dump End Time.
> * (Actual) Total number of tables/views dumped.
> * (Actual) Total number of functions dumped.
> * Dump Directory.
> * Last Repl ID of the dump.{color}
> *Note:* The actual and estimated numbers of tables/functions may not match if 
> any table/function is dropped while the dump is in progress.
> *+Bootstrap Load:+*
> * At the start of bootstrap load, will add one log with below details.
> {color:#59afe1}* Database Name
> * Dump directory
> * Load Type (BOOTSTRAP)
> * Total number of tables/views to load
> * Total number of functions to load.
> * Load Start Time{color}
> * After each table load, will add a log as follows
> {color:#59afe1}* Table/View Name
> * Type (TABLE/VIEW/MATERIALIZED_VIEW)
> * Table load completion time
> * Table load progress. Format is Table sequence no/Total number of tables and 
> views.{color}
> * After each function load, will add a log as follows
> {color:#59afe1}* Function Name
> * Function load completion time
> * Function load progress. Format is Function sequence no/Total number of 
> functions.{color}
> * After completion of all dumps, will add a log as follows to consolidate the 
> load.
> {color:#59afe1}* Database Name.
> * Load Type (BOOTSTRAP).
> * Load End Time.
> * Total number of tables/views loaded.
> * Total number of functions loaded.
> * Last Repl ID of the loaded database.{color}
> *+Incremental Dump:+*
> * At the start of database dump, will add one log with below details.
> {color:#59afe1}* Database Name
> * Dump Type (INCREMENTAL)
> * (Estimated) Total number of events to dump.
> * Dump Start Time{color}
> * After each event dump, will add a log as follows
> {color:#59afe1}* Event ID
> * Event Type (CREATE_TABLE, DROP_TABLE, ALTER_TABLE, INSERT etc)
> * Event dump end time
> * Event dump progress. Format is Event sequence no/ (Estimated) Total number 
> of events.{color}
> * After completion of all event dumps, will add a log as follows.
> {color:#59afe1}* Database Name.
> * Dump Type (INCREMENTAL).
> * Dump End Time.
> * (Actual) Total number of events dumped.
> * Dump Directory.
> * Last Repl ID of the dump.{color}
> *Note:* The estimated number of events can be terribly inaccurate compared with 
> the actual number, as we don’t have the number of events upfront until we read 
> from the metastore NotificationEvents table.
> *+Incremental Load:+*
> * At the start of incremental load, will add one log with below details.
> {color:#59afe1}* Target Database Name 
> * Dump directory
> * Load Type (INCREMENTAL)
> * Total number of events to load
> * Load Start Time{color}
> * After each event load, will add a log as follows
> {color:#59afe1}* Event ID
> * Event Type (CREATE_TABLE, DROP_TABLE, ALTER_TABLE, INSERT etc)
> * Event load end time
> * Event load progress. Format is Event sequence no/ Total number of 
> events.{color}
> * After completion of all event loads, will add a log as follows to 
> consolidate the load.
> {color:#59afe1}* Target 

[jira] [Work stopped] (HIVE-17100) Improve HS2 operation logs for REPL commands.

2017-08-14 Thread Sankar Hariappan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-17100 stopped by Sankar Hariappan.
---
> Improve HS2 operation logs for REPL commands.
> -
>
> Key: HIVE-17100
> URL: https://issues.apache.org/jira/browse/HIVE-17100
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2, repl
>Affects Versions: 2.1.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>  Labels: DR, replication
> Fix For: 3.0.0
>
>
> It is necessary to log the progress of the replication tasks in a structured 
> manner, as follows.
> *+Bootstrap Dump:+*
> * At the start of bootstrap dump, will add one log with below details.
> {color:#59afe1}* Database Name
> * Dump Type (BOOTSTRAP)
> * (Estimated) Total number of tables/views to dump
> * (Estimated) Total number of functions to dump.
> * Dump Start Time{color}
> * After each table dump, will add a log as follows
> {color:#59afe1}* Table/View Name
> * Type (TABLE/VIEW/MATERIALIZED_VIEW)
> * Table dump end time
> * Table dump progress. Format is Table sequence no/(Estimated) Total number 
> of tables and views.{color}
> * After each function dump, will add a log as follows
> {color:#59afe1}* Function Name
> * Function dump end time
> * Function dump progress. Format is Function sequence no/(Estimated) Total 
> number of functions.{color}
> * After completion of all dumps, will add a log as follows to consolidate the 
> dump.
> {color:#59afe1}* Database Name.
> * Dump Type (BOOTSTRAP).
> * Dump End Time.
> * (Actual) Total number of tables/views dumped.
> * (Actual) Total number of functions dumped.
> * Dump Directory.
> * Last Repl ID of the dump.{color}
> *Note:* The actual and estimated numbers of tables/functions may not match if 
> any table/function is dropped while the dump is in progress.
> *+Bootstrap Load:+*
> * At the start of bootstrap load, will add one log with below details.
> {color:#59afe1}* Database Name
> * Dump directory
> * Load Type (BOOTSTRAP)
> * Total number of tables/views to load
> * Total number of functions to load.
> * Load Start Time{color}
> * After each table load, will add a log as follows
> {color:#59afe1}* Table/View Name
> * Type (TABLE/VIEW/MATERIALIZED_VIEW)
> * Table load completion time
> * Table load progress. Format is Table sequence no/Total number of tables and 
> views.{color}
> * After each function load, will add a log as follows
> {color:#59afe1}* Function Name
> * Function load completion time
> * Function load progress. Format is Function sequence no/Total number of 
> functions.{color}
> * After completion of all dumps, will add a log as follows to consolidate the 
> load.
> {color:#59afe1}* Database Name.
> * Load Type (BOOTSTRAP).
> * Load End Time.
> * Total number of tables/views loaded.
> * Total number of functions loaded.
> * Last Repl ID of the loaded database.{color}
> *+Incremental Dump:+*
> * At the start of database dump, will add one log with below details.
> {color:#59afe1}* Database Name
> * Dump Type (INCREMENTAL)
> * (Estimated) Total number of events to dump.
> * Dump Start Time{color}
> * After each event dump, will add a log as follows
> {color:#59afe1}* Event ID
> * Event Type (CREATE_TABLE, DROP_TABLE, ALTER_TABLE, INSERT etc)
> * Event dump end time
> * Event dump progress. Format is Event sequence no/ (Estimated) Total number 
> of events.{color}
> * After completion of all event dumps, will add a log as follows.
> {color:#59afe1}* Database Name.
> * Dump Type (INCREMENTAL).
> * Dump End Time.
> * (Actual) Total number of events dumped.
> * Dump Directory.
> * Last Repl ID of the dump.{color}
> *Note:* The estimated number of events can be terribly inaccurate compared with 
> the actual number, as we don’t have the number of events upfront until we read 
> from the metastore NotificationEvents table.
> *+Incremental Load:+*
> * At the start of incremental load, will add one log with below details.
> {color:#59afe1}* Target Database Name 
> * Dump directory
> * Load Type (INCREMENTAL)
> * Total number of events to load
> * Load Start Time{color}
> * After each event load, will add a log as follows
> {color:#59afe1}* Event ID
> * Event Type (CREATE_TABLE, DROP_TABLE, ALTER_TABLE, INSERT etc)
> * Event load end time
> * Event load progress. Format is Event sequence no/ Total number of 
> events.{color}
> * After completion of all event loads, will add a log as follows to 
> consolidate the load.
> {color:#59afe1}* Target Database Name.
> * Load Type (INCREMENTAL).
> * Load End Time.
> * Total number of events loaded.
> * Last Repl ID of the loaded database.{color}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17100) Improve HS2 operation logs for REPL commands.

2017-08-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16126586#comment-16126586
 ] 

ASF GitHub Bot commented on HIVE-17100:
---

GitHub user sankarh opened a pull request:

https://github.com/apache/hive/pull/231

HIVE-17100: Improve HS2 operation logs for REPL commands.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sankarh/hive HIVE-17100

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/hive/pull/231.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #231


commit 2e7de9cff5093aceab124917f23b505fb0330d7b
Author: Sankar Hariappan 
Date:   2017-07-24T06:44:34Z

HIVE-17100: Improve HS2 operation logs for REPL commands.




> Improve HS2 operation logs for REPL commands.
> -
>
> Key: HIVE-17100
> URL: https://issues.apache.org/jira/browse/HIVE-17100
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2, repl
>Affects Versions: 2.1.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>  Labels: DR, replication
> Fix For: 3.0.0
>
>
> It is necessary to log the progress of the replication tasks in a structured 
> manner, as follows.
> *+Bootstrap Dump:+*
> * At the start of bootstrap dump, will add one log with below details.
> {color:#59afe1}* Database Name
> * Dump Type (BOOTSTRAP)
> * (Estimated) Total number of tables/views to dump
> * (Estimated) Total number of functions to dump.
> * Dump Start Time{color}
> * After each table dump, will add a log as follows
> {color:#59afe1}* Table/View Name
> * Type (TABLE/VIEW/MATERIALIZED_VIEW)
> * Table dump end time
> * Table dump progress. Format is Table sequence no/(Estimated) Total number 
> of tables and views.{color}
> * After each function dump, will add a log as follows
> {color:#59afe1}* Function Name
> * Function dump end time
> * Function dump progress. Format is Function sequence no/(Estimated) Total 
> number of functions.{color}
> * After completion of all dumps, will add a log as follows to consolidate the 
> dump.
> {color:#59afe1}* Database Name.
> * Dump Type (BOOTSTRAP).
> * Dump End Time.
> * (Actual) Total number of tables/views dumped.
> * (Actual) Total number of functions dumped.
> * Dump Directory.
> * Last Repl ID of the dump.{color}
> *Note:* The actual and estimated numbers of tables/functions may not match if 
> any table/function is dropped while the dump is in progress.
> *+Bootstrap Load:+*
> * At the start of bootstrap load, will add one log with below details.
> {color:#59afe1}* Database Name
> * Dump directory
> * Load Type (BOOTSTRAP)
> * Total number of tables/views to load
> * Total number of functions to load.
> * Load Start Time{color}
> * After each table load, will add a log as follows
> {color:#59afe1}* Table/View Name
> * Type (TABLE/VIEW/MATERIALIZED_VIEW)
> * Table load completion time
> * Table load progress. Format is Table sequence no/Total number of tables and 
> views.{color}
> * After each function load, will add a log as follows
> {color:#59afe1}* Function Name
> * Function load completion time
> * Function load progress. Format is Function sequence no/Total number of 
> functions.{color}
> * After completion of all dumps, will add a log as follows to consolidate the 
> load.
> {color:#59afe1}* Database Name.
> * Load Type (BOOTSTRAP).
> * Load End Time.
> * Total number of tables/views loaded.
> * Total number of functions loaded.
> * Last Repl ID of the loaded database.{color}
> *+Incremental Dump:+*
> * At the start of database dump, will add one log with below details.
> {color:#59afe1}* Database Name
> * Dump Type (INCREMENTAL)
> * (Estimated) Total number of events to dump.
> * Dump Start Time{color}
> * After each event dump, will add a log as follows
> {color:#59afe1}* Event ID
> * Event Type (CREATE_TABLE, DROP_TABLE, ALTER_TABLE, INSERT etc)
> * Event dump end time
> * Event dump progress. Format is Event sequence no/ (Estimated) Total number 
> of events.{color}
> * After completion of all event dumps, will add a log as follows.
> {color:#59afe1}* Database Name.
> * Dump Type (INCREMENTAL).
> * Dump End Time.
> * (Actual) Total number of events dumped.
> * Dump Directory.
> * Last Repl ID of the dump.{color}
> *Note:* The estimated number of events can be terribly inaccurate compared with 
> the actual number, as we don’t have the number of events upfront until we read 
> from the metastore NotificationEvents table.
> *+Incremental Load:+*
> * At the start of incremental load, will add one log with below details.
> {color:#59afe1}* Target Database Name 
> * Dump directory
> * Load Type (INCREMENTAL)

[jira] [Commented] (HIVE-17006) LLAP: Parquet caching

2017-08-14 Thread Gunther Hagleitner (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16126585#comment-16126585
 ] 

Gunther Hagleitner commented on HIVE-17006:
---

call it AUTHORITY then? To be less confusing? (I'm guessing empty isn't an 
option here?)

> LLAP: Parquet caching
> -
>
> Key: HIVE-17006
> URL: https://issues.apache.org/jira/browse/HIVE-17006
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-17006.01.patch, HIVE-17006.02.patch, 
> HIVE-17006.patch, HIVE-17006.WIP.patch
>
>
> There are multiple options to do Parquet caching in LLAP:
> 1) Full elevator (too intrusive for now).
> 2) Page based cache like ORC (requires some changes to Parquet or 
> copy-pasted).
> 3) Cache disk data on column chunk level as is.
> Given that Parquet reads at column chunk granularity, (2) is not as useful as 
> for ORC, but still a good idea. I messaged the dev list about it but didn't 
> get a response, we may follow up later.
> For now, do (3). 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17006) LLAP: Parquet caching

2017-08-14 Thread Gunther Hagleitner (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16126584#comment-16126584
 ] 

Gunther Hagleitner commented on HIVE-17006:
---

The fix in HDFSUtils to handle the case where the shim doesn't return a file id: 
is that specific to this patch? It seems like that's a problem even without Parquet?

> LLAP: Parquet caching
> -
>
> Key: HIVE-17006
> URL: https://issues.apache.org/jira/browse/HIVE-17006
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-17006.01.patch, HIVE-17006.02.patch, 
> HIVE-17006.patch, HIVE-17006.WIP.patch
>
>
> There are multiple options to do Parquet caching in LLAP:
> 1) Full elevator (too intrusive for now).
> 2) Page based cache like ORC (requires some changes to Parquet or 
> copy-pasted).
> 3) Cache disk data on column chunk level as is.
> Given that Parquet reads at column chunk granularity, (2) is not as useful as 
> for ORC, but still a good idea. I messaged the dev list about it but didn't 
> get a response, we may follow up later.
> For now, do (3). 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17006) LLAP: Parquet caching

2017-08-14 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16126577#comment-16126577
 ] 

Sergey Shelukhin commented on HIVE-17006:
-

I have to put something as an authority for the URI. This is as good as 
anything.
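For context, a self-contained sketch of the construction being discussed; the scheme name here is hypothetical. {{java.net.URI}} needs some non-empty authority for the URI to round-trip cleanly through path handling, and since the value is never interpreted, reusing the scheme string is as good as anything:

{code:java}
import java.net.URI;

// Sketch: building a synthetic URI for a cache-aware filesystem. The scheme
// doubles as the authority because the URI must carry *some* authority, and
// the authority value itself is never interpreted.
public class CacheUriExample {
  private static final String SCHEME = "llapcache"; // hypothetical scheme name

  public static void main(String[] args) {
    URI uri = URI.create(SCHEME + "://" + SCHEME + "/");
    System.out.println(uri.getScheme() + " / " + uri.getAuthority());
  }
}
{code}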

> LLAP: Parquet caching
> -
>
> Key: HIVE-17006
> URL: https://issues.apache.org/jira/browse/HIVE-17006
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-17006.01.patch, HIVE-17006.02.patch, 
> HIVE-17006.patch, HIVE-17006.WIP.patch
>
>
> There are multiple options to do Parquet caching in LLAP:
> 1) Full elevator (too intrusive for now).
> 2) Page based cache like ORC (requires some changes to Parquet or 
> copy-pasted).
> 3) Cache disk data on column chunk level as is.
> Given that Parquet reads at column chunk granularity, (2) is not as useful as 
> for ORC, but still a good idea. I messaged the dev list about it but didn't 
> get a response, we may follow up later.
> For now, do (3). 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17006) LLAP: Parquet caching

2017-08-14 Thread Gunther Hagleitner (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16126557#comment-16126557
 ] 

Gunther Hagleitner commented on HIVE-17006:
---

A follow-up would be good for counters.

Why do you set the URI to SCHEME + "://" + SCHEME? What's the second scheme for?

> LLAP: Parquet caching
> -
>
> Key: HIVE-17006
> URL: https://issues.apache.org/jira/browse/HIVE-17006
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-17006.01.patch, HIVE-17006.02.patch, 
> HIVE-17006.patch, HIVE-17006.WIP.patch
>
>
> There are multiple options to do Parquet caching in LLAP:
> 1) Full elevator (too intrusive for now).
> 2) Page based cache like ORC (requires some changes to Parquet or 
> copy-pasted).
> 3) Cache disk data on column chunk level as is.
> Given that Parquet reads at column chunk granularity, (2) is not as useful as 
> for ORC, but still a good idea. I messaged the dev list about it but didn't 
> get a response, we may follow up later.
> For now, do (3). 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17286) Avoid expensive String serialization/deserialization for bitvectors

2017-08-14 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-17286:
---
Attachment: HIVE-17286.04.patch

> Avoid expensive String serialization/deserialization for bitvectors
> ---
>
> Key: HIVE-17286
> URL: https://issues.apache.org/jira/browse/HIVE-17286
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-17286.01.patch, HIVE-17286.02.patch, 
> HIVE-17286.03.patch, HIVE-17286.04.patch, HIVE-17286.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17315) Make the DataSource used by the DataNucleus in the HMS configurable using Hive properties

2017-08-14 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16126484#comment-16126484
 ] 

Eugene Koifman commented on HIVE-17315:
---

It may be useful to support "dbcp.<property>" as well. As Thejas said, there 
are different pools in use, and it may be useful to configure some of the 
properties differently.

> Make the DataSource used by the DataNucleus in the HMS configurable using 
> Hive properties
> -
>
> Key: HIVE-17315
> URL: https://issues.apache.org/jira/browse/HIVE-17315
> Project: Hive
>  Issue Type: New Feature
>Affects Versions: 3.0.0
>Reporter: Barna Zsombor Klara
>Assignee: Barna Zsombor Klara
>
> Currently we may use several connection pool implementations in the backend 
> (hikari, dbCp, boneCp) but these can only be configured using proprietary xml 
> files and not through hive-site.xml like DataNucleus.
> We should make them configurable just like DataNucleus, by allowing Hive 
> properties prefixed by hikari, dbcp, or bonecp to be set in hive-site.xml. 
> However, since these configurations may contain sensitive information 
> (passwords), these properties should not be displayable or manually settable.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-13817) Allow DNS CNAME ALIAS Resolution from apache hive beeline JDBC URL to allow for failover

2017-08-14 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16126428#comment-16126428
 ] 

Hive QA commented on HIVE-13817:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12805512/HIVE-13817.3.patch

{color:red}ERROR:{color} -1 due to build exiting with an error

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6389/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6389/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6389/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hiveptest/working/scratch/source-prep.sh' failed with exit 
status 1 and output '+ date '+%Y-%m-%d %T.%3N'
2017-08-14 21:28:04.644
+ [[ -n /usr/lib/jvm/java-8-openjdk-amd64 ]]
+ export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ export 
PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ 
PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'MAVEN_OPTS=-Xmx1g '
+ MAVEN_OPTS='-Xmx1g '
+ cd /data/hiveptest/working/
+ tee /data/hiveptest/logs/PreCommit-HIVE-Build-6389/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ git = \s\v\n ]]
+ [[ git = \g\i\t ]]
+ [[ -z master ]]
+ [[ -d apache-github-source-source ]]
+ [[ ! -d apache-github-source-source/.git ]]
+ [[ ! -d apache-github-source-source ]]
+ date '+%Y-%m-%d %T.%3N'
2017-08-14 21:28:04.647
+ cd apache-github-source-source
+ git fetch origin
From https://github.com/apache/hive
   4f042cc..06d9a6b  master -> origin/master
+ git reset --hard HEAD
HEAD is now at 4f042cc HIVE-17260: Typo: exception has been created and lost in 
the ThriftJDBCBinarySerDe (Oleg Danilov via Peter Vary)
+ git clean -f -d
+ git checkout master
Already on 'master'
Your branch is behind 'origin/master' by 1 commit, and can be fast-forwarded.
  (use "git pull" to update your local branch)
+ git reset --hard origin/master
HEAD is now at 06d9a6b HIVE-16873: Remove Thread Cache From Logging (BELUGA 
BEHR reviewed by Aihua Xu)
+ git merge --ff-only origin/master
Already up-to-date.
+ date '+%Y-%m-%d %T.%3N'
2017-08-14 21:28:10.753
+ patchCommandPath=/data/hiveptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hiveptest/working/scratch/build.patch
+ [[ -f /data/hiveptest/working/scratch/build.patch ]]
+ chmod +x /data/hiveptest/working/scratch/smart-apply-patch.sh
+ /data/hiveptest/working/scratch/smart-apply-patch.sh 
/data/hiveptest/working/scratch/build.patch
error: patch failed: jdbc/src/java/org/apache/hive/jdbc/HiveConnection.java:100
error: jdbc/src/java/org/apache/hive/jdbc/HiveConnection.java: patch does not 
apply
error: patch failed: jdbc/src/java/org/apache/hive/jdbc/Utils.java:118
error: jdbc/src/java/org/apache/hive/jdbc/Utils.java: patch does not apply
The patch does not appear to apply with p0, p1, or p2
+ exit 1
'
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12805512 - PreCommit-HIVE-Build

> Allow DNS CNAME ALIAS Resolution from apache hive beeline JDBC URL to allow 
> for failover
> 
>
> Key: HIVE-13817
> URL: https://issues.apache.org/jira/browse/HIVE-13817
> Project: Hive
>  Issue Type: New Feature
>  Components: Beeline
>Affects Versions: 1.2.1
>Reporter: Vijay Singh
> Attachments: HIVE-13817.1.patch, HIVE-13817.2.patch, 
> HIVE-13817.3.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Currently, in the case of BDR clusters, DNS CNAME alias based connections fail, 
> as _HOST resolves to the exact endpoint specified in the connection string, 
> which may not be the intended Kerberos SPN based on reverse DNS lookup. 
> Consequently, this JIRA proposes that a client-specific setting be used to 
> resolve _HOST from the CNAME DNS alias to the A record entry on the fly in beeline.
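A minimal sketch of the resolution step being proposed, assuming plain JDK name resolution; this is illustrative, not the attached patch:

{code:java}
import java.net.InetAddress;
import java.net.UnknownHostException;

public class CnameResolver {
  // Resolve a CNAME alias to the canonical hostname of its A record, so the
  // Kerberos SPN built from _HOST matches the server's actual principal.
  public static String canonicalize(String alias) throws UnknownHostException {
    return InetAddress.getByName(alias).getCanonicalHostName();
  }

  public static void main(String[] args) throws UnknownHostException {
    // e.g. prints the A-record hostname behind the alias
    System.out.println(canonicalize("hs2-alias.example.com"));
  }
}
{code}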



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17272) when hive.vectorized.execution.enabled is true, query on empty partitioned table fails with NPE

2017-08-14 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16126421#comment-16126421
 ] 

Hive QA commented on HIVE-17272:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12881815/HIVE-17272.2.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 10 failed/errored test(s), 11004 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_merge_move]
 (batchId=243)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_merge_only]
 (batchId=243)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_move_only]
 (batchId=243)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning]
 (batchId=169)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning_mapjoin_only]
 (batchId=170)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning]
 (batchId=169)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_3] 
(batchId=99)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=180)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=180)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=180)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6388/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6388/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6388/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 10 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12881815 - PreCommit-HIVE-Build

> when hive.vectorized.execution.enabled is true, query on empty partitioned 
> table fails with NPE
> ---
>
> Key: HIVE-17272
> URL: https://issues.apache.org/jira/browse/HIVE-17272
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Affects Versions: 2.1.1
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Attachments: HIVE-17272.1.patch, HIVE-17272.2.patch
>
>
> {noformat}
> set hive.vectorized.execution.enabled=true;
> CREATE TABLE `tab`(`x` int) PARTITIONED BY ( `y` int) stored as parquet;
> select * from tab t1 join tab t2 where t1.x=t2.x;
> {noformat}
> The query fails with the following exception.
> {noformat}
> Caused by: java.lang.NullPointerException
> at 
> org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.createAndInitPartitionContext(VectorMapOperator.java:386)
>  ~[hive-exec-2.3.0.jar:2.3.0]
> at 
> org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.internalSetChildren(VectorMapOperator.java:559)
>  ~[hive-exec-2.3.0.jar:2.3.0]
> at 
> org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.setChildren(VectorMapOperator.java:474)
>  ~[hive-exec-2.3.0.jar:2.3.0]
> at 
> org.apache.hadoop.hive.ql.exec.mr.ExecMapper.configure(ExecMapper.java:106) 
> ~[hive-exec-2.3.0.jar:2.3.0]
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
> ~[?:1.8.0_101]
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
> ~[?:1.8.0_101]
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  ~[?:1.8.0_101]
> at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_101]
> at 
> org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106) 
> ~[hadoop-common-2.6.0.jar:?]
> at 
> org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75) 
> ~[hadoop-common-2.6.0.jar:?]
> at 
> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133) 
> ~[hadoop-common-2.6.0.jar:?]
> at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:34) 
> ~[hadoop-core-2.6.0-mr1-cdh5.4.2.jar:?]
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
> ~[?:1.8.0_101]
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
> ~[?:1.8.0_101]
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  ~[?:1.8.0_101]
>

[jira] [Commented] (HIVE-17315) Make the DataSource used by the DataNucleus in the HMS configurable using Hive properties

2017-08-14 Thread Barna Zsombor Klara (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16126381#comment-16126381
 ] 

Barna Zsombor Klara commented on HIVE-17315:


[~thejas]
Yes, you are correct. We currently only set the connection pool size property 
on the datasource, but at least for BoneCp I think (though I haven't tested it 
myself) that other BoneCp specific properties can also be set by adding a 
bonecp-config.xml to the classpath.
Look at the constructors for the config object for reference:
[BoneCPConfig|http://grepcode.com/file/repo1.maven.org/maven2/com.jolbox/bonecp/0.7.1.RELEASE/com/jolbox/bonecp/BoneCPConfig.java#BoneCPConfig.%3Cinit%3E%28%29]
I think that if the xml configuration file is present it will be loaded by 
BoneCp, so it also could be loaded by the BoneCp datasource backing the 
DataNucleus PersistenceManagerFactory.
HikariConfig is using a java system property "hikaricp.configurationFile" that 
can point to a properties file with default values, and I don't know about 
dbcp, but I assume they also have something like this.
I'm not sure if any of these "backdoor configurations" are used by anyone, but 
it should be possible.
And yes you are also correct about my intention. I would like to be able to set 
any property supported by bonecp (or hikari, or dbcp) on the underlying 
datasource not just the connectionpool size. And while you are right that 
password was a bad example I would still like to have these properties hidden. 
Even if the current properties are safe, I would be afraid of what could be 
added by future versions of the connection pool implementations, since we would 
be exposing these settings the moment we upgrade the library. I don't want to 
add every property explicitly to HiveConf, I would like to propagate anything 
prefixed with bonecp/hikari/dbcp to the DataSource implementation and let it 
use/ignore it.

[~ekoifman]
Exactly. I want to be able to set bonecp/hikari/dbcp properties on the 
datasource. For example, in TxnHandler the connection timeout 
(getConnectionTimeoutMs) for every implementation and the partition count for 
bonecp are hardcoded. We should be able to set these values in hive-site.xml 
with a property like bonecp.partitionCount or hikari.getConnectionTimeoutMs.
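A minimal sketch of the propagation idea under discussion: copy every hive-site.xml entry carrying a recognized pool prefix into a {{Properties}} object handed to the pool implementation. The class and method names are illustrative, not from any attached patch:

{code:java}
import java.util.Map;
import java.util.Properties;

import org.apache.hadoop.conf.Configuration;

public final class PoolPropertyExtractor {
  private PoolPropertyExtractor() {
  }

  // Sketch: gather properties like "bonecp.partitionCount" from the Hive
  // configuration and hand them, prefix stripped, to the pool implementation,
  // which is free to use or ignore each one.
  public static Properties extract(Configuration conf, String prefix) {
    Properties props = new Properties();
    String dotted = prefix + ".";
    for (Map.Entry<String, String> entry : conf) {
      if (entry.getKey().startsWith(dotted)) {
        props.setProperty(entry.getKey().substring(dotted.length()), entry.getValue());
      }
    }
    return props;
  }
}
{code}

HikariCP, for instance, can be constructed from such a {{Properties}} object; whether that is the right integration point for each pool is part of what this ticket would have to decide.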

> Make the DataSource used by the DataNucleus in the HMS configurable using 
> Hive properties
> -
>
> Key: HIVE-17315
> URL: https://issues.apache.org/jira/browse/HIVE-17315
> Project: Hive
>  Issue Type: New Feature
>Affects Versions: 3.0.0
>Reporter: Barna Zsombor Klara
>Assignee: Barna Zsombor Klara
>
> Currently we may use several connection pool implementations in the backend 
> (hikari, dbCp, boneCp) but these can only be configured using proprietary xml 
> files and not through hive-site.xml like DataNucleus.
> We should make them configurable just like DataNucleus, by allowing Hive 
> properties prefix by hikari, dbcp, bonecp to be set in the hive-site.xml. 
> However since these configurations may contain sensitive information 
> (passwords) these properties should not be displayable or manually settable.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16886) HMS log notifications may have duplicated event IDs if multiple HMS are running concurrently

2017-08-14 Thread JIRA

[ 
https://issues.apache.org/jira/browse/HIVE-16886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16126371#comment-16126371
 ] 

Sergio Peña commented on HIVE-16886:


I read in the datanucleus doc that those datastore-identity objects may be 
accessed by something like {{Object id = pm.getObjectId(obj);}}. Do you know 
if that would work? Can we read the NL_ID and add it into the object to be 
returned back to the client?

http://www.datanucleus.org/products/datanucleus/jdo/datastore_identity.html
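As a heavily hedged illustration of the question above, using only the standard JDO API; whether DataNucleus exposes the generated NL_ID this way for this mapping is exactly what is being asked, so treat this as a sketch, not a confirmed approach:

{code:java}
import javax.jdo.PersistenceManager;

public class DatastoreIdProbe {
  // Sketch only: after ObjectStore persists the notification row, ask JDO for
  // its datastore identity. For datastore-identity classes, the returned id
  // object wraps the generated key; DataNucleus ids typically expose it via
  // toString(), e.g. "42[OID]org.apache...MNotificationLog".
  static Object datastoreIdOf(PersistenceManager pm, Object persistedEvent) {
    return pm.getObjectId(persistedEvent);
  }
}
{code}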

> HMS log notifications may have duplicated event IDs if multiple HMS are 
> running concurrently
> 
>
> Key: HIVE-16886
> URL: https://issues.apache.org/jira/browse/HIVE-16886
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, Metastore
>Reporter: Sergio Peña
>Assignee: anishek
>
> When running multiple Hive Metastore servers and DB notifications are 
> enabled, I could see that notifications can be persisted with a duplicated 
> event ID. 
> This does not happen when running multiple threads in a single HMS node due 
> to the locking acquired on the DbNotificationsLog class, but multiple HMS 
> could cause conflicts.
> The issue is in the ObjectStore#addNotificationEvent() method. The event ID 
> fetched from the datastore is used for the new notification, incremented in 
> the server itself, then persisted or updated back to the datastore. If 2 
> servers read the same ID, then these 2 servers write a new notification with 
> the same ID.
> The event ID is not unique nor a primary key.
> Here's a test case using the TestObjectStore class that confirms this issue:
> {noformat}
> @Test
> public void testConcurrentAddNotifications() throws ExecutionException, InterruptedException {
>   final int NUM_THREADS = 2;
>   final CountDownLatch countIn = new CountDownLatch(NUM_THREADS);
>   final CountDownLatch countOut = new CountDownLatch(1);
>   final HiveConf conf = new HiveConf();
>   conf.setVar(HiveConf.ConfVars.METASTORE_EXPRESSION_PROXY_CLASS,
>       MockPartitionExpressionProxy.class.getName());
>   ExecutorService executorService = Executors.newFixedThreadPool(NUM_THREADS);
>   FutureTask<Void> tasks[] = new FutureTask[NUM_THREADS];
>   for (int i = 0; i < NUM_THREADS; i++) {
>     final int n = i;
>     tasks[i] = new FutureTask<Void>(new Callable<Void>() {
>       @Override
>       public Void call() throws Exception {
>         ObjectStore store = new ObjectStore();
>         store.setConf(conf);
>         NotificationEvent dbEvent = new NotificationEvent(0, 0,
>             EventMessage.EventType.CREATE_DATABASE.toString(), "CREATE DATABASE DB" + n);
>         System.out.println("ADDING NOTIFICATION");
>         countIn.countDown();
>         countOut.await();
>         store.addNotificationEvent(dbEvent);
>         System.out.println("FINISH NOTIFICATION");
>         return null;
>       }
>     });
>     executorService.execute(tasks[i]);
>   }
>   countIn.await();
>   countOut.countDown();
>   for (int i = 0; i < NUM_THREADS; ++i) {
>     tasks[i].get();
>   }
>   NotificationEventResponse eventResponse =
>       objectStore.getNextNotification(new NotificationEventRequest());
>   Assert.assertEquals(2, eventResponse.getEventsSize());
>   Assert.assertEquals(1, eventResponse.getEvents().get(0).getEventId());
>   // This fails because the next notification has an event ID = 1
>   Assert.assertEquals(2, eventResponse.getEvents().get(1).getEventId());
> }
> {noformat}
> The last assertion fails: the second event has ID 1 instead of the expected 2.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16873) Remove Thread Cache From Logging

2017-08-14 Thread Aihua Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated HIVE-16873:

   Resolution: Fixed
Fix Version/s: 3.0.0
   Status: Resolved  (was: Patch Available)

Pushed to master. Thanks [~belugabehr] for the work.

> Remove Thread Cache From Logging
> 
>
> Key: HIVE-16873
> URL: https://issues.apache.org/jira/browse/HIVE-16873
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Reporter: BELUGA BEHR
>Assignee: BELUGA BEHR
>Priority: Minor
> Fix For: 3.0.0
>
> Attachments: HIVE-16873.1.patch, HIVE-16873.2.patch, 
> HIVE-16873.3.patch
>
>
> In {{org.apache.hadoop.hive.metastore.HiveMetaStore}} we have a {{Formatter}} 
> class (and its buffer) tied to every thread.
> This {{Formatter}} is for logging purposes. I would suggest that we simply 
> let the logging framework itself handle these kinds of details and ditch 
> the per-thread buffer.
> {code}
> public static final String AUDIT_FORMAT =
> "ugi=%s\t" + // ugi
> "ip=%s\t" + // remote IP
> "cmd=%s\t"; // command
> public static final Logger auditLog = LoggerFactory.getLogger(
> HiveMetaStore.class.getName() + ".audit");
> private static final ThreadLocal<Formatter> auditFormatter =
> new ThreadLocal<Formatter>() {
>   @Override
>   protected Formatter initialValue() {
> return new Formatter(new StringBuilder(AUDIT_FORMAT.length() * 
> 4));
>   }
> };
> ...
> private static final void logAuditEvent(String cmd) {
>   final Formatter fmt = auditFormatter.get();
>   ((StringBuilder) fmt.out()).setLength(0);
>   String address = getIPAddress();
>   if (address == null) {
> address = "unknown-ip-addr";
>   }
>   auditLog.info(fmt.format(AUDIT_FORMAT, ugi.getUserName(),
>   address, cmd).toString());
> }
> {code}
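For illustration, the suggestion amounts to replacing the per-thread {{Formatter}} with SLF4J's parameterized logging, roughly as below; this is a sketch of the idea, not the committed patch:

{code:java}
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class AuditLogSketch {
  private static final Logger AUDIT_LOG =
      LoggerFactory.getLogger("org.apache.hadoop.hive.metastore.HiveMetaStore.audit");

  // Sketch: let SLF4J format the message lazily, only when the audit logger
  // is enabled. No ThreadLocal, no shared Formatter buffer.
  static void logAuditEvent(String user, String address, String cmd) {
    if (address == null) {
      address = "unknown-ip-addr";
    }
    AUDIT_LOG.info("ugi={}\tip={}\tcmd={}", user, address, cmd);
  }
}
{code}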



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-13817) Allow DNS CNAME ALIAS Resolution from apache hive beeline JDBC URL to allow for failover

2017-08-14 Thread Aihua Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16126359#comment-16126359
 ] 

Aihua Xu commented on HIVE-13817:
-

[~SINGHVJD] The patch makes sense to me. Can you attach a new patch from the 
new rebase since it has been a while? Thanks.

> Allow DNS CNAME ALIAS Resolution from apache hive beeline JDBC URL to allow 
> for failover
> 
>
> Key: HIVE-13817
> URL: https://issues.apache.org/jira/browse/HIVE-13817
> Project: Hive
>  Issue Type: New Feature
>  Components: Beeline
>Affects Versions: 1.2.1
>Reporter: Vijay Singh
> Attachments: HIVE-13817.1.patch, HIVE-13817.2.patch, 
> HIVE-13817.3.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Currently, in the case of BDR clusters, DNS CNAME alias based connections fail, 
> as _HOST resolves to the exact endpoint specified in the connection string, 
> which may not be the intended Kerberos SPN based on reverse DNS lookup. 
> Consequently, this JIRA proposes that a client-specific setting be used to 
> resolve _HOST from the CNAME DNS alias to the A record entry on the fly in beeline.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17272) when hive.vectorized.execution.enabled is true, query on empty partitioned table fails with NPE

2017-08-14 Thread Aihua Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated HIVE-17272:

Attachment: HIVE-17272.2.patch

> when hive.vectorized.execution.enabled is true, query on empty partitioned 
> table fails with NPE
> ---
>
> Key: HIVE-17272
> URL: https://issues.apache.org/jira/browse/HIVE-17272
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Affects Versions: 2.1.1
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Attachments: HIVE-17272.1.patch, HIVE-17272.2.patch
>
>
> {noformat}
> set hive.vectorized.execution.enabled=true;
> CREATE TABLE `tab`(`x` int) PARTITIONED BY ( `y` int) stored as parquet;
> select * from tab t1 join tab t2 where t1.x=t2.x;
> {noformat}
> The query fails with the following exception.
> {noformat}
> Caused by: java.lang.NullPointerException
> at 
> org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.createAndInitPartitionContext(VectorMapOperator.java:386)
>  ~[hive-exec-2.3.0.jar:2.3.0]
> at 
> org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.internalSetChildren(VectorMapOperator.java:559)
>  ~[hive-exec-2.3.0.jar:2.3.0]
> at 
> org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.setChildren(VectorMapOperator.java:474)
>  ~[hive-exec-2.3.0.jar:2.3.0]
> at 
> org.apache.hadoop.hive.ql.exec.mr.ExecMapper.configure(ExecMapper.java:106) 
> ~[hive-exec-2.3.0.jar:2.3.0]
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
> ~[?:1.8.0_101]
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
> ~[?:1.8.0_101]
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  ~[?:1.8.0_101]
> at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_101]
> at 
> org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106) 
> ~[hadoop-common-2.6.0.jar:?]
> at 
> org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75) 
> ~[hadoop-common-2.6.0.jar:?]
> at 
> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133) 
> ~[hadoop-common-2.6.0.jar:?]
> at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:34) 
> ~[hadoop-core-2.6.0-mr1-cdh5.4.2.jar:?]
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
> ~[?:1.8.0_101]
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
> ~[?:1.8.0_101]
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  ~[?:1.8.0_101]
> at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_101]
> at 
> org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106) 
> ~[hadoop-common-2.6.0.jar:?]
> at 
> org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75) 
> ~[hadoop-common-2.6.0.jar:?]
> at 
> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133) 
> ~[hadoop-common-2.6.0.jar:?]
> at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:413) 
> ~[hadoop-core-2.6.0-mr1-cdh5.4.2.jar:?]
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332) 
> ~[hadoop-core-2.6.0-mr1-cdh5.4.2.jar:?]
> at 
> org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:268)
>  ~[hadoop-core-2.6.0-mr1-cdh5.4.2.jar:?]
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
> ~[?:1.8.0_101]
> at java.util.concurrent.FutureTask.run(FutureTask.java:266) 
> ~[?:1.8.0_101]
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  ~[?:1.8.0_101]
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  ~[?:1.8.0_101]
> at java.lang.Thread.run(Thread.java:745) ~[?:1.8.0_101]
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17308) Improvement in join cardinality estimation

2017-08-14 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16126319#comment-16126319
 ] 

Hive QA commented on HIVE-17308:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12881806/HIVE-17308.5.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 14 failed/errored test(s), 10998 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[create_merge_compressed]
 (batchId=240)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_merge_move]
 (batchId=243)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_merge_only]
 (batchId=243)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_move_only]
 (batchId=243)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning]
 (batchId=169)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning_mapjoin_only]
 (batchId=170)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning]
 (batchId=169)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] 
(batchId=100)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=235)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] 
(batchId=235)
org.apache.hadoop.hive.cli.TestSparkNegativeCliDriver.org.apache.hadoop.hive.cli.TestSparkNegativeCliDriver
 (batchId=242)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=180)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=180)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=180)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6387/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6387/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6387/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 14 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12881806 - PreCommit-HIVE-Build

> Improvement in join cardinality estimation
> --
>
> Key: HIVE-17308
> URL: https://issues.apache.org/jira/browse/HIVE-17308
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Planning
>Reporter: Vineet Garg
>Assignee: Vineet Garg
> Attachments: HIVE-17308.1.patch, HIVE-17308.2.patch, 
> HIVE-17308.3.patch, HIVE-17308.4.patch, HIVE-17308.5.patch
>
>
> Currently, during logical planning, join cardinality is estimated assuming no 
> correlation among join keys (this estimation is done using exponential 
> backoff). Physical planning, on the other hand, considers correlation for 
> multiple keys and uses a different estimation. We should consider correlation 
> during logical planning as well.
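For reference, a minimal sketch of exponential-backoff selectivity over per-key NDVs, under the assumption that "exponential backoff" means damping the contribution of each additional join key; this is illustrative only, not the planner code in the attached patches:

{code:java}
import java.util.Arrays;

public class JoinSelectivityBackoff {
  // Sketch of exponential backoff over per-key NDVs: the most selective key
  // counts fully, the next with exponent 1/2, then 1/4, ... which models
  // unknown correlation between join keys instead of assuming independence.
  static double selectivity(long... ndvs) {
    double[] sels = new double[ndvs.length];
    for (int i = 0; i < ndvs.length; i++) {
      sels[i] = 1.0 / Math.max(1, ndvs[i]);
    }
    Arrays.sort(sels); // ascending: most selective (smallest) first
    double result = 1.0;
    double exponent = 1.0;
    for (double sel : sels) {
      result *= Math.pow(sel, exponent);
      exponent /= 2;
    }
    return result;
  }

  public static void main(String[] args) {
    // Two join keys with NDVs 1000 and 100: assuming independence would give
    // 1e-5; backoff gives 1e-3 * sqrt(1e-2) = 1e-4, a less aggressive estimate.
    System.out.println(selectivity(1000, 100));
  }
}
{code}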



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17291) Set the number of executors based on config if client does not provide information

2017-08-14 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16126284#comment-16126284
 ] 

Xuefu Zhang commented on HIVE-17291:


Did you confirm that the configuration is updated and a new session is launched? 
It might be possible that the configuration is updated via another code path, or 
that the configuration update is ignored by the test. Every qtest file was 
executed in a single session in the past, and later this was changed to sharing 
a session across multiple qtest files. I'm not sure if there is also some magic.

> Set the number of executors based on config if client does not provide 
> information
> --
>
> Key: HIVE-17291
> URL: https://issues.apache.org/jira/browse/HIVE-17291
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Affects Versions: 3.0.0
>Reporter: Peter Vary
>Assignee: Peter Vary
> Attachments: HIVE-17291.1.patch
>
>
> When calculating the memory and cores, if the client does not provide the 
> information, we should try to use the defaults provided by the configuration. 
> This can happen on startup, when {{spark.dynamicAllocation.enabled}} is not enabled.
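A minimal sketch of the fallback being described, using Spark's standard configuration keys; the surrounding method is hypothetical, not the attached patch:

{code:java}
import org.apache.spark.SparkConf;

public class ExecutorCountFallback {
  // Sketch: if the client reported nothing, derive the executor count from config.
  static int numExecutors(SparkConf conf, int reportedByClient) {
    if (reportedByClient > 0) {
      return reportedByClient;
    }
    if (conf.getBoolean("spark.dynamicAllocation.enabled", false)) {
      // With dynamic allocation the ceiling is maxExecutors.
      return conf.getInt("spark.dynamicAllocation.maxExecutors", 1);
    }
    // Static allocation: spark.executor.instances, defaulting to 2 as Spark does on YARN.
    return conf.getInt("spark.executor.instances", 2);
  }
}
{code}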



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17291) Set the number of executors based on config if client does not provide information

2017-08-14 Thread Peter Vary (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16126257#comment-16126257
 ] 

Peter Vary commented on HIVE-17291:
---

For testing, this is exactly what I did in HIVE-17292.3.patch. I am not sure why 
we do not have the flakiness in this case, when the configuration is changed 
and the Spark session is killed and restarted. Logic suggests that in this case 
we will request new executors based on the new settings, and we are in the 
middle of the query test file, so [~lirui]'s magic does not apply here.
For example:
{code:title=spark_dynamic_partition_pruning_2.q}
EXPLAIN SELECT d1.label, count(*), sum(agg.amount) 
[..]

set hive.spark.dynamic.partition.pruning.max.data.size=1;   <-- I think new 
session is started here

EXPLAIN SELECT d1.label, count(*), sum(agg.amount)
[..]
{code}

> Set the number of executors based on config if client does not provide 
> information
> --
>
> Key: HIVE-17291
> URL: https://issues.apache.org/jira/browse/HIVE-17291
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Affects Versions: 3.0.0
>Reporter: Peter Vary
>Assignee: Peter Vary
> Attachments: HIVE-17291.1.patch
>
>
> When calculating the memory and cores, if the client does not provide the 
> information, we should try to use the defaults provided by the configuration. 
> This can happen on startup, when {{spark.dynamicAllocation.enabled}} is not enabled.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Comment Edited] (HIVE-17315) Make the DataSource used by the DataNucleus in the HMS configurable using Hive properties

2017-08-14 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16126248#comment-16126248
 ] 

Eugene Koifman edited comment on HIVE-17315 at 8/14/17 7:11 PM:


What is the intent here? Is it to support something like "bonecp.<some string 
that bonecp recognizes>" in hive-site.xml so that Hive sets "some string that 
bonecp recognizes" on the data source? (so Hive doesn't interpret "some string 
that bonecp recognizes" in any way)



was (Author: ekoifman):
What is the intent here? Is it to support something like "bonecp.<some string 
that bonecp recognizes>" in hive-site.xml so that Hive sets "some string that 
bonecp recognizes" on the data source?


> Make the DataSource used by the DataNucleus in the HMS configurable using 
> Hive properties
> -
>
> Key: HIVE-17315
> URL: https://issues.apache.org/jira/browse/HIVE-17315
> Project: Hive
>  Issue Type: New Feature
>Affects Versions: 3.0.0
>Reporter: Barna Zsombor Klara
>Assignee: Barna Zsombor Klara
>
> Currently we may use several connection pool implementations in the backend 
> (hikari, dbCp, boneCp) but these can only be configured using proprietary xml 
> files and not through hive-site.xml like DataNucleus.
> We should make them configurable just like DataNucleus, by allowing Hive 
> properties prefixed by hikari, dbcp, or bonecp to be set in hive-site.xml. 
> However, since these configurations may contain sensitive information 
> (passwords), these properties should not be displayable or manually settable.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17315) Make the DataSource used by the DataNucleus in the HMS configurable using Hive properties

2017-08-14 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16126248#comment-16126248
 ] 

Eugene Koifman commented on HIVE-17315:
---

What is the intent here? Is it to support something like "bonecp.<some string that 
bonecp recognizes>" in hive-site.xml so that Hive sets "some string that 
bonecp recognizes" on the data source?


> Make the DataSource used by the DataNucleus in the HMS configurable using 
> Hive properties
> -
>
> Key: HIVE-17315
> URL: https://issues.apache.org/jira/browse/HIVE-17315
> Project: Hive
>  Issue Type: New Feature
>Affects Versions: 3.0.0
>Reporter: Barna Zsombor Klara
>Assignee: Barna Zsombor Klara
>
> Currently we may use several connection pool implementations in the backend 
> (hikari, dbCp, boneCp), but these can only be configured using proprietary xml 
> files and not through hive-site.xml like DataNucleus.
> We should make them configurable just like DataNucleus, by allowing Hive 
> properties prefixed by hikari, dbcp, or bonecp to be set in hive-site.xml. 
> However, since these configurations may contain sensitive information 
> (passwords), these properties should not be displayable or manually settable.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17291) Set the number of executors based on config if client does not provide information

2017-08-14 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16126242#comment-16126242
 ] 

Xuefu Zhang commented on HIVE-17291:


Getting a value greater than 0 doesn't necessarily mean that all requested 
executors are up unless only one executor is requested. For testing, you might 
alter Rui's logic such that it waits until all expected cores are returned.

> Set the number of executors based on config if client does not provide 
> information
> --
>
> Key: HIVE-17291
> URL: https://issues.apache.org/jira/browse/HIVE-17291
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Affects Versions: 3.0.0
>Reporter: Peter Vary
>Assignee: Peter Vary
> Attachments: HIVE-17291.1.patch
>
>
> When calculating the memory and cores, if the client does not provide the 
> information, we should try to use the defaults from the configuration. This can 
> happen on startup, when {{spark.dynamicAllocation.enabled}} is not enabled.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17308) Improvement in join cardinality estimation

2017-08-14 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-17308:
---
Status: Patch Available  (was: Open)

> Improvement in join cardinality estimation
> --
>
> Key: HIVE-17308
> URL: https://issues.apache.org/jira/browse/HIVE-17308
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Planning
>Reporter: Vineet Garg
>Assignee: Vineet Garg
> Attachments: HIVE-17308.1.patch, HIVE-17308.2.patch, 
> HIVE-17308.3.patch, HIVE-17308.4.patch, HIVE-17308.5.patch
>
>
> Currently, during logical planning, join cardinality is estimated assuming no 
> correlation among join keys (this estimation is done using exponential 
> backoff). Physical planning, on the other hand, considers correlation for 
> multiple keys and uses a different estimation. We should consider correlation 
> during logical planning as well.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17308) Improvement in join cardinality estimation

2017-08-14 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-17308:
---
Status: Open  (was: Patch Available)

> Improvement in join cardinality estimation
> --
>
> Key: HIVE-17308
> URL: https://issues.apache.org/jira/browse/HIVE-17308
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Planning
>Reporter: Vineet Garg
>Assignee: Vineet Garg
> Attachments: HIVE-17308.1.patch, HIVE-17308.2.patch, 
> HIVE-17308.3.patch, HIVE-17308.4.patch, HIVE-17308.5.patch
>
>
> Currently, during logical planning, join cardinality is estimated assuming no 
> correlation among join keys (this estimation is done using exponential 
> backoff). Physical planning, on the other hand, considers correlation for 
> multiple keys and uses a different estimation. We should consider correlation 
> during logical planning as well.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17308) Improvement in join cardinality estimation

2017-08-14 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-17308:
---
Attachment: HIVE-17308.5.patch

> Improvement in join cardinality estimation
> --
>
> Key: HIVE-17308
> URL: https://issues.apache.org/jira/browse/HIVE-17308
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Planning
>Reporter: Vineet Garg
>Assignee: Vineet Garg
> Attachments: HIVE-17308.1.patch, HIVE-17308.2.patch, 
> HIVE-17308.3.patch, HIVE-17308.4.patch, HIVE-17308.5.patch
>
>
> Currently, during logical planning, join cardinality is estimated assuming no 
> correlation among join keys (this estimation is done using exponential 
> backoff). Physical planning, on the other hand, considers correlation for 
> multiple keys and uses a different estimation. We should consider correlation 
> during logical planning as well.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17291) Set the number of executors based on config if client does not provide information

2017-08-14 Thread Peter Vary (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16126224#comment-16126224
 ] 

Peter Vary commented on HIVE-17291:
---

To clarify: I see the flakiness appear only when running the HoS tests against 
*real clusters*. I am still not sure why I do not see it with [~lirui]'s magic. 
When running the tests against an appropriately configured HoS cluster, I am 
using the TestBeeLineDriver to run the query tests, and I see the flakiness 
there. The BeeLineDriver cannot use the QTestUtil magic, since it runs the 
queries against a real HS2 instance with the out-of-the-box sessions. So the 
current state is not that bad!

Quick question: do we have a way to determine when the info about the available 
executors is reliable? The current implementation assumes that if we have a 
value bigger than 0, then it is reliable. But it might still be misleading if we 
requested multiple executors but have not yet received all of them.
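
For illustration, a stricter check could poll until the requested count is 
reached (a minimal sketch; {{HiveSparkClient.getExecutorCount()}} exists today, 
while the expected count and the timeout handling are assumptions, not existing 
code):
{code:java}
// Sketch: only trust the executor count once it reaches the number we
// requested, instead of accepting any value > 0. "expected" would come
// from e.g. spark.executor.instances (an assumption).
static int waitForExecutors(HiveSparkClient client, int expected,
    long timeoutMs) throws Exception {
  long deadline = System.currentTimeMillis() + timeoutMs;
  int count = client.getExecutorCount();
  while (count < expected && System.currentTimeMillis() < deadline) {
    Thread.sleep(500);
    count = client.getExecutorCount();
  }
  return count; // may still be < expected if the cluster is busy
}
{code}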

Thanks,
Peter

> Set the number of executors based on config if client does not provide 
> information
> --
>
> Key: HIVE-17291
> URL: https://issues.apache.org/jira/browse/HIVE-17291
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Affects Versions: 3.0.0
>Reporter: Peter Vary
>Assignee: Peter Vary
> Attachments: HIVE-17291.1.patch
>
>
> When calculating the memory and cores, if the client does not provide the 
> information, we should try to use the defaults from the configuration. This can 
> happen on startup, when {{spark.dynamicAllocation.enabled}} is not enabled.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17315) Make the DataSource used by the DataNucleus in the HMS configurable using Hive properties

2017-08-14 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16126208#comment-16126208
 ] 

Thejas M Nair commented on HIVE-17315:
--

[~zsombor.klara]
+1. I think this would be useful for TxnHandler.java also, which doesn't use 
DataNucleus. (cc [~ekoifman] [~anishek])

bq. However since these configurations may contain sensitive information 
(passwords) these properties should not be displayable or manually settable.
javax.jdo.option.ConnectionPassword is currently used to supply the password 
for the metastore. Why do we need connection-pool-specific properties to 
pass in the password?

bq. these can only be configured using proprietary xml files and not through 
hive-site.xml like DataNucleus.
Just to clarify: the properties that can be controlled via 
"datanucleus.connectionPool.*" settings are already configurable through 
hive-site.xml. But I assume there are some additional connection pool properties 
which are not supported through that means, and those are what you are trying to 
support. Is that right?
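
For reference, the kind of pass-through being discussed could look roughly like 
this (a sketch only; the prefixes and the {{Properties}} hand-off are 
assumptions, not the actual patch):
{code:java}
// Sketch: copy hive-site.xml entries carrying a pool-specific prefix into
// the Properties object handed to the connection pool. Illustrative only.
Properties poolProps = new Properties();
for (Map.Entry<String, String> e : conf) { // HiveConf iterates its entries
  for (String prefix : new String[] {"hikari.", "dbcp.", "bonecp."}) {
    if (e.getKey().startsWith(prefix)) {
      poolProps.setProperty(e.getKey().substring(prefix.length()),
          e.getValue());
    }
  }
}
{code}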


> Make the DataSource used by the DataNucleus in the HMS configurable using 
> Hive properties
> -
>
> Key: HIVE-17315
> URL: https://issues.apache.org/jira/browse/HIVE-17315
> Project: Hive
>  Issue Type: New Feature
>Affects Versions: 3.0.0
>Reporter: Barna Zsombor Klara
>Assignee: Barna Zsombor Klara
>
> Currently we may use several connection pool implementations in the backend 
> (hikari, dbCp, boneCp), but these can only be configured using proprietary xml 
> files and not through hive-site.xml like DataNucleus.
> We should make them configurable just like DataNucleus, by allowing Hive 
> properties prefixed by hikari, dbcp, or bonecp to be set in hive-site.xml. 
> However, since these configurations may contain sensitive information 
> (passwords), these properties should not be displayable or manually settable.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17300) WebUI query plan graphs

2017-08-14 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16126192#comment-16126192
 ] 

Hive QA commented on HIVE-17300:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12881784/HIVE-17300.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 13 failed/errored test(s), 11004 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[materialized_view_create_rewrite]
 (batchId=240)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_merge_move]
 (batchId=243)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_merge_only]
 (batchId=243)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_move_only]
 (batchId=243)
org.apache.hadoop.hive.cli.TestContribCliDriver.testCliDriver[udaf_example_avg] 
(batchId=234)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning]
 (batchId=169)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning_mapjoin_only]
 (batchId=170)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning]
 (batchId=169)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=235)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] 
(batchId=235)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=180)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=180)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=180)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6386/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6386/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6386/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 13 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12881784 - PreCommit-HIVE-Build

> WebUI query plan graphs
> ---
>
> Key: HIVE-17300
> URL: https://issues.apache.org/jira/browse/HIVE-17300
> Project: Hive
>  Issue Type: Improvement
>  Components: Web UI
>Reporter: Karen Coppage
>Assignee: Karen Coppage
> Attachments: complete_success.png, full_mapred_stats.png, 
> graph_with_mapred_stats.png, HIVE-17300.patch, last_stage_error.png, 
> last_stage_running.png, non_mapred_task_selected.png
>
>
> Hi all,
> I’m working on a feature of the Hive WebUI Query Plan tab that would provide 
> the option to display the query plan as a nice graph (scroll down for 
> screenshots). If you click on one of the graph’s stages, the plan for that 
> stage appears as text below. 
> Stages are color-coded if they have a status (Success, Error, Running), and 
> the rest are grayed out. Coloring is based on status already available in the 
> WebUI, under the Stages tab.
> There is an additional option to display stats for MapReduce tasks. This 
> includes the job’s ID, tracking URL (where the logs are found), and mapper 
> and reducer numbers/progress, among other info. 
> The library I’m using for the graph is called vis.js (http://visjs.org/). It 
> has an Apache license, and the only necessary file to be included from this 
> library is about 700 KB.
> I tried to keep server-side changes minimal, and graph generation is taken 
> care of by the client. Plans with more than a given number of stages 
> (default: 25) won't be displayed in order to preserve resources.
> I’d love to hear any and all input from the community about this feature: do 
> you think it’s useful, and is there anything important I’m missing?
> Thanks,
> Karen Coppage



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17006) LLAP: Parquet caching

2017-08-14 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16126148#comment-16126148
 ] 

Sergey Shelukhin commented on HIVE-17006:
-

Fragment-level IO counters. They are maintained in the IO elevator, and this 
path doesn't use the elevator, only the cache directly. Maybe this needs a 
follow-up JIRA.

> LLAP: Parquet caching
> -
>
> Key: HIVE-17006
> URL: https://issues.apache.org/jira/browse/HIVE-17006
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-17006.01.patch, HIVE-17006.02.patch, 
> HIVE-17006.patch, HIVE-17006.WIP.patch
>
>
> There are multiple options to do Parquet caching in LLAP:
> 1) Full elevator (too intrusive for now).
> 2) Page-based cache like ORC's (requires some changes to Parquet, or 
> copy-pasted code).
> 3) Cache disk data at the column-chunk level, as is.
> Given that Parquet reads at column-chunk granularity, (2) is not as useful as 
> it is for ORC, but it is still a good idea. I messaged the dev list about it 
> but didn't get a response; we may follow up later.
> For now, do (3).



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17291) Set the number of executors based on config if client does not provide information

2017-08-14 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16126125#comment-16126125
 ] 

Xuefu Zhang commented on HIVE-17291:


For test result stability, I'm fine with whatever solution. It would be 
interesting to find out why this is an issue even with Rui's magic in QTestUtil.

As to production, most likely dynamic allocation is enabled for sharing 
resources among users. The problem I see with dynamically determining 
parallelism from available executors is the dynamic nature of the executors. 
Typically, each user or user group has a queue, which is specified when a query 
is submitted. The capacity of the queue is decided in YARN. Further, the 
availability of the YARN containers is not guaranteed, and this is apparent 
when the cluster is busy. In this mode, using available executors can 
underestimate the number of reducers needed. This also happens when the Spark 
client is initially launched. In addition, with dynamic allocation, 
minExecutors (usually 0) and initialExecutors (usually a small number) are not 
guaranteed either, and maxExecutors (usually a large number) only puts an upper 
limit. In a production env, users are less likely to override what the admin 
sets as the default. Given such uncertainty, I don't think we should determine 
parallelism based on available executors. I'd propose that we use "size per 
reducer" to decide the number of reducers, which might be further constrained 
by what maxExecutors allows.

Static allocation is mostly useful for benchmarking, but is less likely to be 
used in a multi-tenant env. Even under this mode, {{spark.executor.instances}} 
is not guaranteed either. However, once the client gets an executor, it never 
gives it up. Thus, it's useful to determine the number of reducers using 
available executors. This comes with a catch, which is about the first query 
run while the executors are starting. For this case, I think it should be okay 
to use {{spark.executor.instances}}. In short, with static allocation, we can 
use available executors to determine reducer parallelism and use 
{{spark.executor.instances}} when info about available executors is not yet 
available.
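
As a rough illustration of the "size per reducer" proposal (a sketch; the 
variable names are assumptions, while {{hive.exec.reducers.bytes.per.reducer}} 
is the existing knob):
{code:java}
// Sketch: derive reducer parallelism from input size rather than from the
// currently available executors, then cap it by what dynamic allocation
// could ever provide.
long bytesPerReducer = conf.getLongVar(HiveConf.ConfVars.BYTESPERREDUCER);
int fromSize = (int) Math.max(1,
    (totalInputSize + bytesPerReducer - 1) / bytesPerReducer);
int maxExecutors = conf.getInt("spark.dynamicAllocation.maxExecutors", fromSize);
int coresPerExecutor = conf.getInt("spark.executor.cores", 1);
int reducers = Math.min(fromSize, maxExecutors * coresPerExecutor);
{code}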

Any thoughts?

> Set the number of executors based on config if client does not provide 
> information
> --
>
> Key: HIVE-17291
> URL: https://issues.apache.org/jira/browse/HIVE-17291
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Affects Versions: 3.0.0
>Reporter: Peter Vary
>Assignee: Peter Vary
> Attachments: HIVE-17291.1.patch
>
>
> When calculating the memory and cores, if the client does not provide the 
> information, we should try to use the defaults from the configuration. This can 
> happen on startup, when {{spark.dynamicAllocation.enabled}} is not enabled.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17241) Change metastore classes to not use the shims

2017-08-14 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated HIVE-17241:
--
Status: Open  (was: Patch Available)

The failures of TestSSLWithMiniKdc and TestJdbcWithMiniHS2 look genuine.

> Change metastore classes to not use the shims
> -
>
> Key: HIVE-17241
> URL: https://issues.apache.org/jira/browse/HIVE-17241
> Project: Hive
>  Issue Type: Sub-task
>  Components: Metastore
>Reporter: Alan Gates
>Assignee: Alan Gates
> Attachments: HIVE-17241.patch
>
>
> As part of moving the metastore into a standalone package, it will no longer 
> have access to the shims.  This means we need to either copy them or access 
> the underlying Hadoop operations directly.
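
As a toy example of accessing the underlying Hadoop operation directly instead 
of going through a shim (illustrative only, not the patch itself):
{code:java}
// Instead of calling through the shim layer, use the stable Hadoop API
// directly. Purely illustrative.
FileSystem fs = path.getFileSystem(conf);
FileStatus stat = fs.getFileStatus(path);
boolean isDir = stat.isDirectory();
{code}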



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17303) Mismatch between roaring bitmap library used by druid and the one coming from tez

2017-08-14 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16126033#comment-16126033
 ] 

Ashutosh Chauhan commented on HIVE-17303:
-

+1

> Mismatch between roaring bitmap library used by druid and the one coming 
> from tez
> --
>
> Key: HIVE-17303
> URL: https://issues.apache.org/jira/browse/HIVE-17303
> Project: Hive
>  Issue Type: Bug
>  Components: Druid integration
>Reporter: slim bouguerra
>Assignee: slim bouguerra
> Attachments: HIVE-17303.patch
>
>
> {code} 
>  
> Caused by: java.util.concurrent.ExecutionException: 
> java.lang.NoSuchMethodError: 
> org.roaringbitmap.buffer.MutableRoaringBitmap.runOptimize()Z
>   at 
> org.apache.hive.druid.com.google.common.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:299)
>   at 
> org.apache.hive.druid.com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:286)
>   at 
> org.apache.hive.druid.com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116)
>   at 
> org.apache.hadoop.hive.druid.io.DruidRecordWriter.pushSegments(DruidRecordWriter.java:165)
>   ... 25 more
> Caused by: java.lang.NoSuchMethodError: 
> org.roaringbitmap.buffer.MutableRoaringBitmap.runOptimize()Z
>   at 
> org.apache.hive.druid.com.metamx.collections.bitmap.WrappedRoaringBitmap.toImmutableBitmap(WrappedRoaringBitmap.java:65)
>   at 
> org.apache.hive.druid.com.metamx.collections.bitmap.RoaringBitmapFactory.makeImmutableBitmap(RoaringBitmapFactory.java:88)
>   at 
> org.apache.hive.druid.io.druid.segment.StringDimensionMergerV9.writeIndexes(StringDimensionMergerV9.java:348)
>   at 
> org.apache.hive.druid.io.druid.segment.IndexMergerV9.makeIndexFiles(IndexMergerV9.java:218)
>   at 
> org.apache.hive.druid.io.druid.segment.IndexMerger.merge(IndexMerger.java:438)
>   at 
> org.apache.hive.druid.io.druid.segment.IndexMerger.persist(IndexMerger.java:186)
>   at 
> org.apache.hive.druid.io.druid.segment.IndexMerger.persist(IndexMerger.java:152)
>   at 
> org.apache.hive.druid.io.druid.segment.realtime.appenderator.AppenderatorImpl.persistHydrant(AppenderatorImpl.java:996)
>   at 
> org.apache.hive.druid.io.druid.segment.realtime.appenderator.AppenderatorImpl.access$200(AppenderatorImpl.java:93)
>   at 
> org.apache.hive.druid.io.druid.segment.realtime.appenderator.AppenderatorImpl$2.doCall(AppenderatorImpl.java:385)
>   at 
> org.apache.hive.druid.io.druid.common.guava.ThreadRenamingCallable.call(ThreadRenamingCallable.java:44)
>   ... 4 more
> ]], Vertex did not succeed due to OWN_TASK_FAILURE, failedTasks:1 
> killedTasks:89, Vertex vertex_1502470020457_0005_12_05 [Reducer 2] 
> killed/failed due to:OWN_TASK_FAILURE]DAG did not succeed due to 
> VERTEX_FAILURE. failedVertices:1 killedVertices:0 (state=08S01,code=2)
> Options
> Attachments
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17300) WebUI query plan graphs

2017-08-14 Thread Karen Coppage (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16126002#comment-16126002
 ] 

Karen Coppage commented on HIVE-17300:
--

Done, thanks very much for the reminder, [~xuefuz]!

> WebUI query plan graphs
> ---
>
> Key: HIVE-17300
> URL: https://issues.apache.org/jira/browse/HIVE-17300
> Project: Hive
>  Issue Type: Improvement
>  Components: Web UI
>Reporter: Karen Coppage
>Assignee: Karen Coppage
> Attachments: complete_success.png, full_mapred_stats.png, 
> graph_with_mapred_stats.png, HIVE-17300.patch, last_stage_error.png, 
> last_stage_running.png, non_mapred_task_selected.png
>
>
> Hi all,
> I’m working on a feature of the Hive WebUI Query Plan tab that would provide 
> the option to display the query plan as a nice graph (scroll down for 
> screenshots). If you click on one of the graph’s stages, the plan for that 
> stage appears as text below. 
> Stages are color-coded if they have a status (Success, Error, Running), and 
> the rest are grayed out. Coloring is based on status already available in the 
> WebUI, under the Stages tab.
> There is an additional option to display stats for MapReduce tasks. This 
> includes the job’s ID, tracking URL (where the logs are found), and mapper 
> and reducer numbers/progress, among other info. 
> The library I’m using for the graph is called vis.js (http://visjs.org/). It 
> has an Apache license, and the only necessary file to be included from this 
> library is about 700 KB.
> I tried to keep server-side changes minimal, and graph generation is taken 
> care of by the client. Plans with more than a given number of stages 
> (default: 25) won't be displayed in order to preserve resources.
> I’d love to hear any and all input from the community about this feature: do 
> you think it’s useful, and is there anything important I’m missing?
> Thanks,
> Karen Coppage



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17300) WebUI query plan graphs

2017-08-14 Thread Karen Coppage (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karen Coppage updated HIVE-17300:
-
Status: Patch Available  (was: Open)

> WebUI query plan graphs
> ---
>
> Key: HIVE-17300
> URL: https://issues.apache.org/jira/browse/HIVE-17300
> Project: Hive
>  Issue Type: Improvement
>  Components: Web UI
>Reporter: Karen Coppage
>Assignee: Karen Coppage
> Attachments: complete_success.png, full_mapred_stats.png, 
> graph_with_mapred_stats.png, HIVE-17300.patch, last_stage_error.png, 
> last_stage_running.png, non_mapred_task_selected.png
>
>
> Hi all,
> I’m working on a feature of the Hive WebUI Query Plan tab that would provide 
> the option to display the query plan as a nice graph (scroll down for 
> screenshots). If you click on one of the graph’s stages, the plan for that 
> stage appears as text below. 
> Stages are color-coded if they have a status (Success, Error, Running), and 
> the rest are grayed out. Coloring is based on status already available in the 
> WebUI, under the Stages tab.
> There is an additional option to display stats for MapReduce tasks. This 
> includes the job’s ID, tracking URL (where the logs are found), and mapper 
> and reducer numbers/progress, among other info. 
> The library I’m using for the graph is called vis.js (http://visjs.org/). It 
> has an Apache license, and the only necessary file to be included from this 
> library is about 700 KB.
> I tried to keep server-side changes minimal, and graph generation is taken 
> care of by the client. Plans with more than a given number of stages 
> (default: 25) won't be displayed in order to preserve resources.
> I’d love to hear any and all input from the community about this feature: do 
> you think it’s useful, and is there anything important I’m missing?
> Thanks,
> Karen Coppage



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17300) WebUI query plan graphs

2017-08-14 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16125996#comment-16125996
 ] 

Xuefu Zhang commented on HIVE-17300:


[~klcopp], Thanks for your patch. Could you please change the status to "Patch 
available" so that tests can run on your patch? Thanks.

> WebUI query plan graphs
> ---
>
> Key: HIVE-17300
> URL: https://issues.apache.org/jira/browse/HIVE-17300
> Project: Hive
>  Issue Type: Improvement
>  Components: Web UI
>Reporter: Karen Coppage
>Assignee: Karen Coppage
> Attachments: complete_success.png, full_mapred_stats.png, 
> graph_with_mapred_stats.png, HIVE-17300.patch, last_stage_error.png, 
> last_stage_running.png, non_mapred_task_selected.png
>
>
> Hi all,
> I’m working on a feature of the Hive WebUI Query Plan tab that would provide 
> the option to display the query plan as a nice graph (scroll down for 
> screenshots). If you click on one of the graph’s stages, the plan for that 
> stage appears as text below. 
> Stages are color-coded if they have a status (Success, Error, Running), and 
> the rest are grayed out. Coloring is based on status already available in the 
> WebUI, under the Stages tab.
> There is an additional option to display stats for MapReduce tasks. This 
> includes the job’s ID, tracking URL (where the logs are found), and mapper 
> and reducer numbers/progress, among other info. 
> The library I’m using for the graph is called vis.js (http://visjs.org/). It 
> has an Apache license, and the only necessary file to be included from this 
> library is about 700 KB.
> I tried to keep server-side changes minimal, and graph generation is taken 
> care of by the client. Plans with more than a given number of stages 
> (default: 25) won't be displayed in order to preserve resources.
> I’d love to hear any and all input from the community about this feature: do 
> you think it’s useful, and is there anything important I’m missing?
> Thanks,
> Karen Coppage



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17006) LLAP: Parquet caching

2017-08-14 Thread Gunther Hagleitner (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16125994#comment-16125994
 ] 

Gunther Hagleitner commented on HIVE-17006:
---

{quote}
  // TODO: we currently pass null counters because this doesn't use 
LlapRecordReader.   

  //   Create counters for non-elevator-using fragments also?  
{quote}

What counters are those?

> LLAP: Parquet caching
> -
>
> Key: HIVE-17006
> URL: https://issues.apache.org/jira/browse/HIVE-17006
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-17006.01.patch, HIVE-17006.02.patch, 
> HIVE-17006.patch, HIVE-17006.WIP.patch
>
>
> There are multiple options to do Parquet caching in LLAP:
> 1) Full elevator (too intrusive for now).
> 2) Page-based cache like ORC's (requires some changes to Parquet, or 
> copy-pasted code).
> 3) Cache disk data at the column-chunk level, as is.
> Given that Parquet reads at column-chunk granularity, (2) is not as useful as 
> it is for ORC, but it is still a good idea. I messaged the dev list about it 
> but didn't get a response; we may follow up later.
> For now, do (3).



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17309) alter partition onto a table not in the current database throws InvalidOperationException

2017-08-14 Thread Wang Haihua (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16125944#comment-16125944
 ] 

Wang Haihua commented on HIVE-17309:


The failed test results seem unrelated to this patch. cc [~pxiong] 
[~alangates], thanks in advance for the review.

> alter partition onto a table not in the current database throws 
> InvalidOperationException
> 
>
> Key: HIVE-17309
> URL: https://issues.apache.org/jira/browse/HIVE-17309
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 1.2.2, 2.1.1, 2.2.0
>Reporter: Wang Haihua
>Assignee: Wang Haihua
> Attachments: HIVE-17309.1.patch
>
>
> When executing an alter-partition on a table that is not in the current 
> database, an InvalidOperationException is thrown.
> SQL example:
> {code}
> use default;
> ALTER TABLE anotherdb.test_table_for_alter_partition_nocurrentdb 
> partition(ds='haihua001') CHANGE COLUMN a a_new BOOLEAN;
> {code}
> The code below in {{DDLTask.java}} shows the potential problem: the table name 
> is not qualified with the database name when {{db.alterPartitions}} is called 
> (one possible fix is sketched after the stack trace below).
> {code}
>   if (allPartitions == null) {
> db.alterTable(alterTbl.getOldName(), tbl, alterTbl.getIsCascade(), 
> alterTbl.getEnvironmentContext());
>   } else {
> db.alterPartitions(tbl.getTableName(), allPartitions, 
> alterTbl.getEnvironmentContext());
>   }
> {code}
> stacktrace:
> {code}
> 2017-07-19T11:06:39,639  INFO [main] metastore.HiveMetaStore: New partition 
> values:[2017-07-14]
> 2017-07-19T11:06:39,654 ERROR [main] metastore.RetryingHMSHandler: 
> InvalidOperationException(message:alter is not possible)
> at 
> org.apache.hadoop.hive.metastore.HiveAlterHandler.alterPartitions(HiveAlterHandler.java:526)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.alter_partitions_with_environment_context(HiveMetaStore.java:3560)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:140)
> at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:99)
> at 
> com.sun.proxy.$Proxy21.alter_partitions_with_environment_context(Unknown 
> Source)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.alter_partitions(HiveMetaStoreClient.java:1486)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at 
> org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:154)
> at com.sun.proxy.$Proxy22.alter_partitions(Unknown Source)
> at org.apache.hadoop.hive.ql.metadata.Hive.alterPartitions(Hive.java:712)
> at org.apache.hadoop.hive.ql.exec.DDLTask.alterTable(DDLTask.java:3338)
> at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:368)
> at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:197)
> at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100)
> at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2166)
> at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1837)
> at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1713)
> at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1543)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1174)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1164)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:232)
> at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:183)
> at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:399)
> at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:776)
> at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:714)
> at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:641)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
> 
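
One way to address this (a sketch, not necessarily what the attached patch 
does; the "db.table" string form is an assumption about what 
{{Hive.alterPartitions}} expects here):
{code:java}
// Pass the database-qualified name so the metastore resolves the table
// outside the current database.
db.alterPartitions(tbl.getDbName() + "." + tbl.getTableName(), allPartitions,
    alterTbl.getEnvironmentContext());
{code}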

[jira] [Commented] (HIVE-17292) Change TestMiniSparkOnYarnCliDriver test configuration to use the configured cores

2017-08-14 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16125940#comment-16125940
 ] 

Hive QA commented on HIVE-17292:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12881771/HIVE-17292.3.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 11 failed/errored test(s), 11004 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_merge_move]
 (batchId=243)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_merge_only]
 (batchId=243)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_move_only]
 (batchId=243)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[unionDistinct_1] 
(batchId=143)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[columnstats_part_coltype]
 (batchId=159)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning]
 (batchId=169)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning]
 (batchId=169)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=235)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=180)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=180)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=180)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6385/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6385/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6385/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 11 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12881771 - PreCommit-HIVE-Build

> Change TestMiniSparkOnYarnCliDriver test configuration to use the configured 
> cores
> --
>
> Key: HIVE-17292
> URL: https://issues.apache.org/jira/browse/HIVE-17292
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark, Test
>Affects Versions: 3.0.0
>Reporter: Peter Vary
>Assignee: Peter Vary
> Attachments: HIVE-17292.1.patch, HIVE-17292.2.patch, 
> HIVE-17292.3.patch
>
>
> Currently the {{hive-site.xml}} for the {{TestMiniSparkOnYarnCliDriver}} test 
> defines 2 cores and 2 executors, but only 1 is used, because the MiniCluster 
> does not allow the creation of the 3rd container.
> The FairScheduler uses 1GB increments for memory, but the containers would 
> like to use only 512MB. We should change the FairScheduler configuration to 
> use only the requested 512MB.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17300) WebUI query plan graphs

2017-08-14 Thread Karen Coppage (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karen Coppage updated HIVE-17300:
-
Attachment: HIVE-17300.patch

> WebUI query plan graphs
> ---
>
> Key: HIVE-17300
> URL: https://issues.apache.org/jira/browse/HIVE-17300
> Project: Hive
>  Issue Type: Improvement
>  Components: Web UI
>Reporter: Karen Coppage
>Assignee: Karen Coppage
> Attachments: complete_success.png, full_mapred_stats.png, 
> graph_with_mapred_stats.png, HIVE-17300.patch, last_stage_error.png, 
> last_stage_running.png, non_mapred_task_selected.png
>
>
> Hi all,
> I’m working on a feature of the Hive WebUI Query Plan tab that would provide 
> the option to display the query plan as a nice graph (scroll down for 
> screenshots). If you click on one of the graph’s stages, the plan for that 
> stage appears as text below. 
> Stages are color-coded if they have a status (Success, Error, Running), and 
> the rest are grayed out. Coloring is based on status already available in the 
> WebUI, under the Stages tab.
> There is an additional option to display stats for MapReduce tasks. This 
> includes the job’s ID, tracking URL (where the logs are found), and mapper 
> and reducer numbers/progress, among other info. 
> The library I’m using for the graph is called vis.js (http://visjs.org/). It 
> has an Apache license, and the only necessary file to be included from this 
> library is about 700 KB.
> I tried to keep server-side changes minimal, and graph generation is taken 
> care of by the client. Plans with more than a given number of stages 
> (default: 25) won't be displayed in order to preserve resources.
> I’d love to hear any and all input from the community about this feature: do 
> you think it’s useful, and is there anything important I’m missing?
> Thanks,
> Karen Coppage



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17291) Set the number of executors based on config if client does not provide information

2017-08-14 Thread Peter Vary (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16125842#comment-16125842
 ] 

Peter Vary commented on HIVE-17291:
---

Thanks [~lirui], helpful as always!

What I do not get is why we do not see flakiness in the 
TestMiniSparkOnYarnCliDriver tests when we change a Spark configuration value 
and the Spark session is invalidated. Until now I thought that in this case new 
containers are created by YARN, and we would have a race condition when checking 
{{hiveSparkClient.getExecutorCount()}} again. But obviously this is not the 
case, since I do not see any flakiness in the test results in HIVE-17292.

Thanks,
Peter

> Set the number of executors based on config if client does not provide 
> information
> --
>
> Key: HIVE-17291
> URL: https://issues.apache.org/jira/browse/HIVE-17291
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Affects Versions: 3.0.0
>Reporter: Peter Vary
>Assignee: Peter Vary
> Attachments: HIVE-17291.1.patch
>
>
> When calculating the memory and cores, if the client does not provide the 
> information, we should try to use the defaults from the configuration. This can 
> happen on startup, when {{spark.dynamicAllocation.enabled}} is not enabled.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17292) Change TestMiniSparkOnYarnCliDriver test configuration to use the configured cores

2017-08-14 Thread Peter Vary (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Vary updated HIVE-17292:
--
Attachment: HIVE-17292.3.patch

Changed it so that we expect a different number of executors on YARN and in standalone mode.

> Change TestMiniSparkOnYarnCliDriver test configuration to use the configured 
> cores
> --
>
> Key: HIVE-17292
> URL: https://issues.apache.org/jira/browse/HIVE-17292
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark, Test
>Affects Versions: 3.0.0
>Reporter: Peter Vary
>Assignee: Peter Vary
> Attachments: HIVE-17292.1.patch, HIVE-17292.2.patch, 
> HIVE-17292.3.patch
>
>
> Currently the {{hive-site.xml}} for the {{TestMiniSparkOnYarnCliDriver}} test 
> defines 2 cores and 2 executors, but only 1 is used, because the MiniCluster 
> does not allow the creation of the 3rd container.
> The FairScheduler uses 1GB increments for memory, but the containers would 
> like to use only 512MB. We should change the FairScheduler configuration to 
> use only the requested 512MB.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17306) Support MySQL InnoDB Cluster

2017-08-14 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-17306:
--
Component/s: Transactions

> Support MySQL InnoDB Cluster
> 
>
> Key: HIVE-17306
> URL: https://issues.apache.org/jira/browse/HIVE-17306
> Project: Hive
>  Issue Type: Improvement
>  Components: Transactions
>Reporter: Shawn Weeks
>Priority: Minor
>
> To support high availability of the Hive Metastore, a highly available 
> database is required. To support MySQL InnoDB Cluster, it looks like we're 
> just missing a couple of primary keys, as we were already using InnoDB tables 
> for the metastore. It is primarily the transaction tables, such as 
> TXN_COMPONENTS and COMPLETED_TXN_COMPONENTS, that don't have primary keys. The 
> primary keys can be surrogate sequences if there really is no unique 
> identifier in these tables.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Work started] (HIVE-17316) Use regular expressions for the hidden configuration variables

2017-08-14 Thread Barna Zsombor Klara (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-17316 started by Barna Zsombor Klara.
--
> Use regular expressions for the hidden configuration variables
> --
>
> Key: HIVE-17316
> URL: https://issues.apache.org/jira/browse/HIVE-17316
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Barna Zsombor Klara
>Assignee: Barna Zsombor Klara
>
> Currently, HiveConf variables which should not be displayed to the user need 
> to be enumerated individually. We should enhance this so that regular 
> expressions can be set and any variable matching one of them is hidden.
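
A minimal sketch of the matching this implies (the pattern list and the helper 
method are illustrative assumptions, not the patch):
{code:java}
// Sketch: a configuration variable is hidden if its name matches any of
// the configured regular expressions.
private static final List<Pattern> HIDDEN_PATTERNS = Arrays.asList(
    Pattern.compile(".*password.*"),
    Pattern.compile("hive\\.spark\\.client\\..*"));

static boolean isHidden(String varName) {
  for (Pattern p : HIDDEN_PATTERNS) {
    if (p.matcher(varName).matches()) {
      return true;
    }
  }
  return false;
}
{code}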



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (HIVE-17316) Use regular expressions for the hidden configuration variables

2017-08-14 Thread Barna Zsombor Klara (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barna Zsombor Klara reassigned HIVE-17316:
--


> Use regular expressions for the hidden configuration variables
> --
>
> Key: HIVE-17316
> URL: https://issues.apache.org/jira/browse/HIVE-17316
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Barna Zsombor Klara
>Assignee: Barna Zsombor Klara
>
> Currently, HiveConf variables which should not be displayed to the user need 
> to be enumerated individually. We should enhance this so that regular 
> expressions can be set and any variable matching one of them is hidden.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (HIVE-17315) Make the DataSource used by the DataNucleus in the HMS configurable using Hive properties

2017-08-14 Thread Barna Zsombor Klara (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barna Zsombor Klara reassigned HIVE-17315:
--


> Make the DataSource used by the DataNucleus in the HMS configurable using 
> Hive properties
> -
>
> Key: HIVE-17315
> URL: https://issues.apache.org/jira/browse/HIVE-17315
> Project: Hive
>  Issue Type: New Feature
>Affects Versions: 3.0.0
>Reporter: Barna Zsombor Klara
>Assignee: Barna Zsombor Klara
>
> Currently we may use several connection pool implementations in the backend 
> (hikari, dbCp, boneCp), but these can only be configured using proprietary xml 
> files and not through hive-site.xml like DataNucleus.
> We should make them configurable just like DataNucleus, by allowing Hive 
> properties prefixed by hikari, dbcp, or bonecp to be set in hive-site.xml. 
> However, since these configurations may contain sensitive information 
> (passwords), these properties should not be displayable or manually settable.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17292) Change TestMiniSparkOnYarnCliDriver test configuration to use the configured cores

2017-08-14 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16125780#comment-16125780
 ] 

Hive QA commented on HIVE-17292:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12881744/HIVE-17292.2.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 51 failed/errored test(s), 10413 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_merge_move]
 (batchId=243)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_merge_only]
 (batchId=243)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_move_only]
 (batchId=243)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[columnstats_part_coltype]
 (batchId=159)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning]
 (batchId=169)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning]
 (batchId=169)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=235)
org.apache.hadoop.hive.cli.TestSparkCliDriver.org.apache.hadoop.hive.cli.TestSparkCliDriver
 (batchId=101)
org.apache.hadoop.hive.cli.TestSparkCliDriver.org.apache.hadoop.hive.cli.TestSparkCliDriver
 (batchId=102)
org.apache.hadoop.hive.cli.TestSparkCliDriver.org.apache.hadoop.hive.cli.TestSparkCliDriver
 (batchId=103)
org.apache.hadoop.hive.cli.TestSparkCliDriver.org.apache.hadoop.hive.cli.TestSparkCliDriver
 (batchId=104)
org.apache.hadoop.hive.cli.TestSparkCliDriver.org.apache.hadoop.hive.cli.TestSparkCliDriver
 (batchId=105)
org.apache.hadoop.hive.cli.TestSparkCliDriver.org.apache.hadoop.hive.cli.TestSparkCliDriver
 (batchId=106)
org.apache.hadoop.hive.cli.TestSparkCliDriver.org.apache.hadoop.hive.cli.TestSparkCliDriver
 (batchId=107)
org.apache.hadoop.hive.cli.TestSparkCliDriver.org.apache.hadoop.hive.cli.TestSparkCliDriver
 (batchId=108)
org.apache.hadoop.hive.cli.TestSparkCliDriver.org.apache.hadoop.hive.cli.TestSparkCliDriver
 (batchId=109)
org.apache.hadoop.hive.cli.TestSparkCliDriver.org.apache.hadoop.hive.cli.TestSparkCliDriver
 (batchId=110)
org.apache.hadoop.hive.cli.TestSparkCliDriver.org.apache.hadoop.hive.cli.TestSparkCliDriver
 (batchId=111)
org.apache.hadoop.hive.cli.TestSparkCliDriver.org.apache.hadoop.hive.cli.TestSparkCliDriver
 (batchId=112)
org.apache.hadoop.hive.cli.TestSparkCliDriver.org.apache.hadoop.hive.cli.TestSparkCliDriver
 (batchId=113)
org.apache.hadoop.hive.cli.TestSparkCliDriver.org.apache.hadoop.hive.cli.TestSparkCliDriver
 (batchId=114)
org.apache.hadoop.hive.cli.TestSparkCliDriver.org.apache.hadoop.hive.cli.TestSparkCliDriver
 (batchId=115)
org.apache.hadoop.hive.cli.TestSparkCliDriver.org.apache.hadoop.hive.cli.TestSparkCliDriver
 (batchId=116)
org.apache.hadoop.hive.cli.TestSparkCliDriver.org.apache.hadoop.hive.cli.TestSparkCliDriver
 (batchId=117)
org.apache.hadoop.hive.cli.TestSparkCliDriver.org.apache.hadoop.hive.cli.TestSparkCliDriver
 (batchId=118)
org.apache.hadoop.hive.cli.TestSparkCliDriver.org.apache.hadoop.hive.cli.TestSparkCliDriver
 (batchId=119)
org.apache.hadoop.hive.cli.TestSparkCliDriver.org.apache.hadoop.hive.cli.TestSparkCliDriver
 (batchId=120)
org.apache.hadoop.hive.cli.TestSparkCliDriver.org.apache.hadoop.hive.cli.TestSparkCliDriver
 (batchId=121)
org.apache.hadoop.hive.cli.TestSparkCliDriver.org.apache.hadoop.hive.cli.TestSparkCliDriver
 (batchId=122)
org.apache.hadoop.hive.cli.TestSparkCliDriver.org.apache.hadoop.hive.cli.TestSparkCliDriver
 (batchId=123)
org.apache.hadoop.hive.cli.TestSparkCliDriver.org.apache.hadoop.hive.cli.TestSparkCliDriver
 (batchId=124)
org.apache.hadoop.hive.cli.TestSparkCliDriver.org.apache.hadoop.hive.cli.TestSparkCliDriver
 (batchId=125)
org.apache.hadoop.hive.cli.TestSparkCliDriver.org.apache.hadoop.hive.cli.TestSparkCliDriver
 (batchId=126)
org.apache.hadoop.hive.cli.TestSparkCliDriver.org.apache.hadoop.hive.cli.TestSparkCliDriver
 (batchId=127)
org.apache.hadoop.hive.cli.TestSparkCliDriver.org.apache.hadoop.hive.cli.TestSparkCliDriver
 (batchId=128)
org.apache.hadoop.hive.cli.TestSparkCliDriver.org.apache.hadoop.hive.cli.TestSparkCliDriver
 (batchId=129)
org.apache.hadoop.hive.cli.TestSparkCliDriver.org.apache.hadoop.hive.cli.TestSparkCliDriver
 (batchId=130)
org.apache.hadoop.hive.cli.TestSparkCliDriver.org.apache.hadoop.hive.cli.TestSparkCliDriver
 (batchId=131)
org.apache.hadoop.hive.cli.TestSparkCliDriver.org.apache.hadoop.hive.cli.TestSparkCliDriver
 (batchId=132)
org.apache.hadoop.hive.cli.TestSparkCliDriver.org.apache.hadoop.hive.cli.TestSparkCliDriver
 (batchId=133)
org.apache.hadoop.hive.cli.TestSparkCliDriver.org.apache.hadoop.hive.cli.TestSparkCliDriver
 

[jira] [Updated] (HIVE-17292) Change TestMiniSparkOnYarnCliDriver test configuration to use the configured cores

2017-08-14 Thread Peter Vary (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Vary updated HIVE-17292:
--
Attachment: HIVE-17292.2.patch

Moved the config change to the Shim, as suggested by [~lirui].
Also updated the QTestUtil, so it will wait until all of the executors are 
ready.
There might still be a problem if the configuration is changed with a set 
command inside the test file. We will see the results.

Updated the necessary q.out files

> Change TestMiniSparkOnYarnCliDriver test configuration to use the configured 
> cores
> --
>
> Key: HIVE-17292
> URL: https://issues.apache.org/jira/browse/HIVE-17292
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark, Test
>Affects Versions: 3.0.0
>Reporter: Peter Vary
>Assignee: Peter Vary
> Attachments: HIVE-17292.1.patch, HIVE-17292.2.patch
>
>
> Currently the {{hive-site.xml}} for the {{TestMiniSparkOnYarnCliDriver}} test 
> defines 2 cores and 2 executors, but only 1 is used, because the MiniCluster 
> does not allow the creation of the 3rd container.
> The FairScheduler uses 1GB increments for memory, but the containers would 
> like to use only 512MB. We should change the FairScheduler configuration to 
> use only the requested 512MB.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17313) Potentially possible 'case fall through' in the ObjectInspectorConverters

2017-08-14 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16125627#comment-16125627
 ] 

Hive QA commented on HIVE-17313:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12881721/HIVE-17313.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 13 failed/errored test(s), 11004 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[create_merge_compressed]
 (batchId=240)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_merge_move]
 (batchId=243)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_merge_only]
 (batchId=243)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_move_only]
 (batchId=243)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning]
 (batchId=169)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning_mapjoin_only]
 (batchId=170)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning]
 (batchId=169)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3] 
(batchId=99)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] 
(batchId=235)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=180)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=180)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=180)
org.apache.hive.hcatalog.pig.TestTextFileHCatStorer.testWriteDate2 (batchId=183)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6383/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6383/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6383/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 13 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12881721 - PreCommit-HIVE-Build

> Potentially possible 'case fall through' in the ObjectInspectorConverters
> -
>
> Key: HIVE-17313
> URL: https://issues.apache.org/jira/browse/HIVE-17313
> Project: Hive
>  Issue Type: Bug
>Reporter: Oleg Danilov
>Assignee: Oleg Danilov
>Priority: Trivial
> Attachments: HIVE-17313.patch
>
>
> Lines 103-110:
> {code:java}
> case STRING:
>   if (outputOI instanceof WritableStringObjectInspector) {
> return new PrimitiveObjectInspectorConverter.TextConverter(
> inputOI);
>   } else if (outputOI instanceof JavaStringObjectInspector) {
> return new PrimitiveObjectInspectorConverter.StringConverter(
> inputOI);
>   }
> case CHAR:
> {code}
> De facto it should work correctly, since outputOI is either an instance of 
> WritableStringObjectInspector or JavaStringObjectInspector, but it would be 
> better to rewrite this case to avoid a possible fall-through.
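
One possible rewrite that makes the intent explicit (a sketch; whether to throw 
or fall back for an unexpected OI is a design choice left to the patch):
{code:java}
case STRING:
  if (outputOI instanceof WritableStringObjectInspector) {
    return new PrimitiveObjectInspectorConverter.TextConverter(inputOI);
  } else if (outputOI instanceof JavaStringObjectInspector) {
    return new PrimitiveObjectInspectorConverter.StringConverter(inputOI);
  }
  // Fail loudly instead of silently falling through to the CHAR branch.
  throw new RuntimeException("Unexpected string output OI: " + outputOI);
case CHAR:
{code}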



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17260) Typo: exception has been created and lost in the ThriftJDBCBinarySerDe

2017-08-14 Thread Peter Vary (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Vary updated HIVE-17260:
--
   Resolution: Fixed
Fix Version/s: 3.0.0
   Status: Resolved  (was: Patch Available)

Pushed to master.
Thanks [~olegd]!

> Typo: exception has been created and lost in the ThriftJDBCBinarySerDe
> --
>
> Key: HIVE-17260
> URL: https://issues.apache.org/jira/browse/HIVE-17260
> Project: Hive
>  Issue Type: Bug
>Reporter: Oleg Danilov
>Assignee: Oleg Danilov
>Priority: Minor
> Fix For: 3.0.0
>
> Attachments: HIVE-17260.patch
>
>
> Line 100:
> {code:java}
> } catch (Exception e) {
>   new SerDeException(e);
> }
> {code}
> Seems like it should be thrown there :-)
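As the description suggests, the fix is a one-liner: throw the exception instead
of discarding it.

{code:java}
} catch (Exception e) {
  throw new SerDeException(e);
}
{code}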



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (HIVE-17260) Typo: exception has been created and lost in the ThriftJDBCBinarySerDe

2017-08-14 Thread Peter Vary (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Vary reassigned HIVE-17260:
-

Assignee: Oleg Danilov

> Typo: exception has been created and lost in the ThriftJDBCBinarySerDe
> --
>
> Key: HIVE-17260
> URL: https://issues.apache.org/jira/browse/HIVE-17260
> Project: Hive
>  Issue Type: Bug
>Reporter: Oleg Danilov
>Assignee: Oleg Danilov
>Priority: Minor
> Attachments: HIVE-17260.patch
>
>
> Line 100:
> {code:java}
> } catch (Exception e) {
>   new SerDeException(e);
> }
> {code}
> Seems like it should be thrown there :-)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17100) Improve HS2 operation logs for REPL commands.

2017-08-14 Thread Sankar Hariappan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-17100:

Description: 
It is necessary to log the progress of the replication tasks in a structured 
manner as follows.
*+Bootstrap Dump:+*
* At the start of bootstrap dump, will add one log with below details.
{color:#59afe1}* Database Name
* Dump Type (BOOTSTRAP)
* (Estimated) Total number of tables/views to dump
* (Estimated) Total number of functions to dump.
* Dump Start Time{color}
* After each table dump, will add a log as follows
{color:#59afe1}* Table/View Name
* Type (TABLE/VIEW/MATERIALIZED_VIEW)
* Table dump end time
* Table dump progress. Format is Table sequence no/(Estimated) Total number of 
tables and views.{color}
* After each function dump, will add a log as follows
{color:#59afe1}* Function Name
* Function dump end time
* Function dump progress. Format is Function sequence no/(Estimated) Total 
number of functions.{color}
* After completion of all dumps, will add a log as follows to consolidate the 
dump.
{color:#59afe1}* Database Name.
* Dump Type (BOOTSTRAP).
* Dump End Time.
* (Actual) Total number of tables/views dumped.
* (Actual) Total number of functions dumped.
* Dump Directory.
* Last Repl ID of the dump.{color}
*Note:* The actual and estimated numbers of tables/functions may not match if 
any table/function is dropped while the dump is in progress.

*+Bootstrap Load:+*
* At the start of bootstrap load, will add one log with below details.
{color:#59afe1}* Database Name
* Dump directory
* Load Type (BOOTSTRAP)
* Total number of tables/views to load
* Total number of functions to load.
* Load Start Time{color}
* After each table load, will add a log as follows
{color:#59afe1}* Table/View Name
* Type (TABLE/VIEW/MATERIALIZED_VIEW)
* Table load completion time
* Table load progress. Format is Table sequence no/Total number of tables and 
views.{color}
* After each function load, will add a log as follows
{color:#59afe1}* Function Name
* Function load completion time
* Function load progress. Format is Function sequence no/Total number of 
functions.{color}
* After completion of all loads, will add a log as follows to consolidate the 
load.
{color:#59afe1}* Database Name.
* Load Type (BOOTSTRAP).
* Load End Time.
* Total number of tables/views loaded.
* Total number of functions loaded.
* Last Repl ID of the loaded database.{color}

*+Incremental Dump:+*
* At the start of database dump, will add one log with below details.
{color:#59afe1}* Database Name
* Dump Type (INCREMENTAL)
* (Estimated) Total number of events to dump.
* Dump Start Time{color}
* After each event dump, will add a log as follows
{color:#59afe1}* Event ID
* Event Type (CREATE_TABLE, DROP_TABLE, ALTER_TABLE, INSERT etc)
* Event dump end time
* Event dump progress. Format is Event sequence no/ (Estimated) Total number of 
events.{color}
* After completion of all event dumps, will add a log as follows.
{color:#59afe1}* Database Name.
* Dump Type (INCREMENTAL).
* Dump End Time.
* (Actual) Total number of events dumped.
* Dump Directory.
* Last Repl ID of the dump.{color}
*Note:* The estimated number of events can differ significantly from the actual 
number, as we don't know the number of events upfront until we read from the 
metastore NotificationEvents table.

*+Incremental Load:+*
* At the start of incremental load, will add one log with below details.
{color:#59afe1}* Target Database Name 
* Dump directory
* Load Type (INCREMENTAL)
* Total number of events to load
* Load Start Time{color}
* After each event load, will add a log as follows
{color:#59afe1}* Event ID
* Event Type (CREATE_TABLE, DROP_TABLE, ALTER_TABLE, INSERT etc)
* Event load end time
* Event load progress. Format is Event sequence no/ Total number of 
events.{color}
* After completion of all event loads, will add a log as follows to consolidate 
the load.
{color:#59afe1}* Target Database Name.
* Load Type (INCREMENTAL).
* Load End Time.
* Total number of events loaded.
* Last Repl ID of the loaded database.{color}
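For illustration, a bootstrap-dump start entry could look like the following.
This is a purely hypothetical format - the field names and layout here are
assumptions, not the final log schema:

{noformat}
REPL_STATE: {"dbName":"default","dumpType":"BOOTSTRAP","estimatedNumTables":150,"estimatedNumFunctions":12,"dumpStartTime":1502700000}
{noformat}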

  was:
It is necessary to log the progress of the replication tasks in a structured 
manner as follows.
*+Bootstrap Dump:+*
* At the start of bootstrap dump, will add one log with below details.
{color:#59afe1}* Database Name
* Dump Type (BOOTSTRAP)
* (Estimated) Total number of tables/views to dump
* (Estimated) Total number of functions to dump.
* Dump Start Time{color}
* After each table dump, will add a log as follows
{color:#59afe1}* Table/View Name
* Type (TABLE/VIEW/MATERIALIZED_VIEW)
* Table dump end time
* Table dump progress. Format is Table sequence no/(Estimated) Total number of 
tables and views.{color}
* After each function dump, will add a log as follows
{color:#59afe1}* Function Name
* Function dump end time
* Function dump progress. Format is Function sequence no/(Estimated) Total 
number of functions.{color}
* After completion of all dumps, will 

[jira] [Commented] (HIVE-17311) Numeric overflow in the HiveConf

2017-08-14 Thread Peter Vary (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16125581#comment-16125581
 ] 

Peter Vary commented on HIVE-17311:
---

LGTM +1

> Numeric overflow in the HiveConf
> 
>
> Key: HIVE-17311
> URL: https://issues.apache.org/jira/browse/HIVE-17311
> Project: Hive
>  Issue Type: Bug
>Reporter: Oleg Danilov
>Assignee: Oleg Danilov
>Priority: Minor
> Attachments: HIVE-17311.patch
>
>
> The multiplierFor() method contains a typo, which causes wrong parsing of the 
> rare suffixes ('tb' & 'pb').
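For context, this is the classic int-overflow pattern (an illustrative sketch,
not the exact HiveConf code): without a {{long}} operand, the multiplier for
'tb' ({{2^40}}) is computed in 32-bit {{int}} arithmetic and wraps to 0 before
being widened.

{code:java}
long wrong = 1024 * 1024 * 1024 * 1024;  // int arithmetic: 2^40 wraps to 0
long right = 1024L * 1024 * 1024 * 1024; // long arithmetic: 1099511627776
{code}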



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17311) Numeric overflow in the HiveConf

2017-08-14 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16125568#comment-16125568
 ] 

Hive QA commented on HIVE-17311:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12881707/HIVE-17311.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 13 failed/errored test(s), 11005 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_merge_move]
 (batchId=243)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_merge_only]
 (batchId=243)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_move_only]
 (batchId=243)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[columnstats_part_coltype]
 (batchId=159)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning]
 (batchId=169)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning_mapjoin_only]
 (batchId=170)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning]
 (batchId=169)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3] 
(batchId=99)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=235)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=180)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=180)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=180)
org.apache.hive.jdbc.TestJdbcWithMiniHS2.testHttpRetryOnServerIdleTimeout 
(batchId=228)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6382/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6382/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6382/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 13 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12881707 - PreCommit-HIVE-Build

> Numeric overflow in the HiveConf
> 
>
> Key: HIVE-17311
> URL: https://issues.apache.org/jira/browse/HIVE-17311
> Project: Hive
>  Issue Type: Bug
>Reporter: Oleg Danilov
>Assignee: Oleg Danilov
>Priority: Minor
> Attachments: HIVE-17311.patch
>
>
> The multiplierFor() method contains a typo, which causes wrong parsing of the 
> rare suffixes ('tb' & 'pb').



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17313) Potentially possible 'case fall through' in the ObjectInspectorConverters

2017-08-14 Thread Oleg Danilov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Oleg Danilov updated HIVE-17313:

Status: Patch Available  (was: Open)

> Potentially possible 'case fall through' in the ObjectInspectorConverters
> -
>
> Key: HIVE-17313
> URL: https://issues.apache.org/jira/browse/HIVE-17313
> Project: Hive
>  Issue Type: Bug
>Reporter: Oleg Danilov
>Assignee: Oleg Danilov
>Priority: Trivial
> Attachments: HIVE-17313.patch
>
>
> Lines 103-110:
> {code:java}
> case STRING:
>   if (outputOI instanceof WritableStringObjectInspector) {
> return new PrimitiveObjectInspectorConverter.TextConverter(
> inputOI);
>   } else if (outputOI instanceof JavaStringObjectInspector) {
> return new PrimitiveObjectInspectorConverter.StringConverter(
> inputOI);
>   }
> case CHAR:
> {code}
> De facto it should work correctly, since outputOI is either an instance of 
> WritableStringObjectInspector or JavaStringObjectInspector, but it would be 
> better to rewrite this case to avoid possible fall through.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17313) Potentially possible 'case fall through' in the ObjectInspectorConverters

2017-08-14 Thread Oleg Danilov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Oleg Danilov updated HIVE-17313:

Attachment: HIVE-17313.patch

> Potentially possible 'case fall through' in the ObjectInspectorConverters
> -
>
> Key: HIVE-17313
> URL: https://issues.apache.org/jira/browse/HIVE-17313
> Project: Hive
>  Issue Type: Bug
>Reporter: Oleg Danilov
>Assignee: Oleg Danilov
>Priority: Trivial
> Attachments: HIVE-17313.patch
>
>
> Lines 103-110:
> {code:java}
> case STRING:
>   if (outputOI instanceof WritableStringObjectInspector) {
> return new PrimitiveObjectInspectorConverter.TextConverter(
> inputOI);
>   } else if (outputOI instanceof JavaStringObjectInspector) {
> return new PrimitiveObjectInspectorConverter.StringConverter(
> inputOI);
>   }
> case CHAR:
> {code}
> De facto it should work correctly, since outputOI is either an instance of 
> WritableStringObjectInspector or JavaStringObjectInspector, but it would be 
> better to rewrite this case to avoid possible fall through.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17313) Potentially possible 'case fall through' in the ObjectInspectorConverters

2017-08-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16125563#comment-16125563
 ] 

ASF GitHub Bot commented on HIVE-17313:
---

GitHub user dosoft opened a pull request:

https://github.com/apache/hive/pull/230

HIVE-17313: Fixed 'case fall-through'



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/dosoft/hive HIVE-17313

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/hive/pull/230.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #230


commit 2f322ea6d6a9b5754c0c9b698395d8b3e94563f7
Author: Oleg Danilov 
Date:   2017-08-14T11:28:47Z

HIVE-17313: Fixed 'case fall-through'




> Potentially possible 'case fall through' in the ObjectInspectorConverters
> -
>
> Key: HIVE-17313
> URL: https://issues.apache.org/jira/browse/HIVE-17313
> Project: Hive
>  Issue Type: Bug
>Reporter: Oleg Danilov
>Assignee: Oleg Danilov
>Priority: Trivial
>
> Lines 103-110:
> {code:java}
> case STRING:
>   if (outputOI instanceof WritableStringObjectInspector) {
> return new PrimitiveObjectInspectorConverter.TextConverter(
> inputOI);
>   } else if (outputOI instanceof JavaStringObjectInspector) {
> return new PrimitiveObjectInspectorConverter.StringConverter(
> inputOI);
>   }
> case CHAR:
> {code}
> De facto it should work correctly, since outputOI is either an instance of 
> WritableStringObjectInspector or JavaStringObjectInspector, but it would be 
> better to rewrite this case to avoid possible fall through.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (HIVE-17313) Potentially possible 'case fall through' in the ObjectInspectorConverters

2017-08-14 Thread Oleg Danilov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Oleg Danilov reassigned HIVE-17313:
---

Assignee: Oleg Danilov

> Potentially possible 'case fall through' in the ObjectInspectorConverters
> -
>
> Key: HIVE-17313
> URL: https://issues.apache.org/jira/browse/HIVE-17313
> Project: Hive
>  Issue Type: Bug
>Reporter: Oleg Danilov
>Assignee: Oleg Danilov
>Priority: Trivial
>
> Lines 103-110:
> {code:java}
> case STRING:
>   if (outputOI instanceof WritableStringObjectInspector) {
> return new PrimitiveObjectInspectorConverter.TextConverter(
> inputOI);
>   } else if (outputOI instanceof JavaStringObjectInspector) {
> return new PrimitiveObjectInspectorConverter.StringConverter(
> inputOI);
>   }
> case CHAR:
> {code}
> De facto it should work correctly, since outputOI is either an instance of 
> WritableStringObjectInspector or JavaStringObjectInspector, but it would be 
> better to rewrite this case to avoid possible fall through.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17291) Set the number of executors based on config if client does not provide information

2017-08-14 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16125508#comment-16125508
 ] 

Rui Li commented on HIVE-17291:
---

[~pvary], thanks so much for working in the middle of the night - I'll make 
sure to take time difference into consideration next time I use any magic :) :)
bq. When we change the spark configuration the RpcServer is killed, and a new 
one is started
More precisely, it's the Spark session that gets killed and restarted, not the 
RpcServer. RPC configs were made immutable by HIVE-16876.
bq. What happens when a query uses a wrong number of reducers? The query will 
run, but will result in slower execution? Out of memory?
Yes. But I don't think using mem/cores will lead to a "wrong number of reducers". 
We always compute numReducers based on data size first. Then mem/cores is used 
as an attempt to have more reducers, not fewer:
{code}
// If there are more cores, use the number of cores
numReducers = Math.max(numReducers, sparkMemoryAndCores.getSecond());
{code}
The reason is that Spark tasks are cheap but tend to require more memory, so it's 
safer to have more of them when we can. On the other hand, we still don't want 
many more than are really needed; therefore it's better to use the available cores 
than the configured number - it ensures that, when this takes effect, all the 
reducers finish in one round (assuming static allocation). So I think this is a 
useful, or at least harmless, optimization when automatically deciding numReducers.
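As a rough sketch of that heuristic (the names below are illustrative, not the 
actual Hive code):

{code:java}
// Illustrative sketch of the numReducers heuristic discussed above.
static long estimateNumReducers(long totalShuffledBytes, long bytesPerReducer,
    long availableCores) {
  // Size-based estimate first: total shuffled data over "bytes per reducer",
  // rounded up, with a floor of one reducer.
  long numReducers = Math.max(1,
      (totalShuffledBytes + bytesPerReducer - 1) / bytesPerReducer);
  // Cores only ever raise the estimate, never lower it, so that when this
  // takes effect all reducers finish in one round (static allocation).
  return Math.max(numReducers, availableCores);
}
{code}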
For stable tests, I'd prefer to stick to what we do in QTestUtil. Let me know 
your opinions. Thanks.

> Set the number of executors based on config if client does not provide 
> information
> --
>
> Key: HIVE-17291
> URL: https://issues.apache.org/jira/browse/HIVE-17291
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Affects Versions: 3.0.0
>Reporter: Peter Vary
>Assignee: Peter Vary
> Attachments: HIVE-17291.1.patch
>
>
> When calculating the memory and cores, if the client does not provide the 
> information, we should try to use the configured defaults. This can happen 
> on startup, when {{spark.dynamicAllocation.enabled}} is not enabled.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17311) Numeric overflow in the HiveConf

2017-08-14 Thread Oleg Danilov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Oleg Danilov updated HIVE-17311:

Attachment: HIVE-17311.patch

> Numeric overflow in the HiveConf
> 
>
> Key: HIVE-17311
> URL: https://issues.apache.org/jira/browse/HIVE-17311
> Project: Hive
>  Issue Type: Bug
>Reporter: Oleg Danilov
>Assignee: Oleg Danilov
>Priority: Minor
> Attachments: HIVE-17311.patch
>
>
> The multiplierFor() method contains a typo, which causes wrong parsing of the 
> rare suffixes ('tb' & 'pb').



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17311) Numeric overflow in the HiveConf

2017-08-14 Thread Oleg Danilov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Oleg Danilov updated HIVE-17311:

Status: Patch Available  (was: Open)

> Numeric overflow in the HiveConf
> 
>
> Key: HIVE-17311
> URL: https://issues.apache.org/jira/browse/HIVE-17311
> Project: Hive
>  Issue Type: Bug
>Reporter: Oleg Danilov
>Assignee: Oleg Danilov
>Priority: Minor
> Attachments: HIVE-17311.patch
>
>
> The multiplierFor() method contains a typo, which causes wrong parsing of the 
> rare suffixes ('tb' & 'pb').



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17311) Numeric overflow in the HiveConf

2017-08-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16125503#comment-16125503
 ] 

ASF GitHub Bot commented on HIVE-17311:
---

GitHub user dosoft opened a pull request:

https://github.com/apache/hive/pull/229

HIVE-17311: Fixed numeric overflow, added test



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/dosoft/hive HIVE-17311

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/hive/pull/229.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #229


commit 0dfa6648613a660961b06ffd4aed9b026c141a11
Author: Oleg Danilov 
Date:   2017-08-14T10:19:55Z

HIVE-17311: Fixed numeric overflow, added test




> Numeric overflow in the HiveConf
> 
>
> Key: HIVE-17311
> URL: https://issues.apache.org/jira/browse/HIVE-17311
> Project: Hive
>  Issue Type: Bug
>Reporter: Oleg Danilov
>Assignee: Oleg Danilov
>Priority: Minor
>
> The multiplierFor() method contains a typo, which causes wrong parsing of the 
> rare suffixes ('tb' & 'pb').



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (HIVE-17311) Numeric overflow in the HiveConf

2017-08-14 Thread Oleg Danilov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Oleg Danilov reassigned HIVE-17311:
---


> Numeric overflow in the HiveConf
> 
>
> Key: HIVE-17311
> URL: https://issues.apache.org/jira/browse/HIVE-17311
> Project: Hive
>  Issue Type: Bug
>Reporter: Oleg Danilov
>Assignee: Oleg Danilov
>Priority: Minor
>
> The multiplierFor() method contains a typo, which causes wrong parsing of the 
> rare suffixes ('tb' & 'pb').



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17268) WebUI / QueryPlan: query plan is sometimes null when explain output conf is on

2017-08-14 Thread Peter Vary (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16125495#comment-16125495
 ] 

Peter Vary commented on HIVE-17268:
---

HIVE-17268.3.patch LGTM. +1

> WebUI / QueryPlan: query plan is sometimes null when explain output conf is on
> --
>
> Key: HIVE-17268
> URL: https://issues.apache.org/jira/browse/HIVE-17268
> Project: Hive
>  Issue Type: Bug
>Reporter: Karen Coppage
>Assignee: Karen Coppage
>Priority: Minor
> Attachments: HIVE-17268.2.patch, HIVE-17268.3.patch, HIVE-17268.patch
>
>
> The Hive WebUI's Query Plan tab displays "SET hive.log.explain.output TO true 
> TO VIEW PLAN" even when {{hive.log.explain.output}} is set to true, if the 
> query cannot be compiled, because the plan is null in that case.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16948) Invalid explain when running dynamic partition pruning query in Hive On Spark

2017-08-14 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16125474#comment-16125474
 ] 

Hive QA commented on HIVE-16948:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12881698/HIVE-16948.6.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 13 failed/errored test(s), 11005 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[materialized_view_create_rewrite]
 (batchId=240)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_merge_move]
 (batchId=243)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_merge_only]
 (batchId=243)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_move_only]
 (batchId=243)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning_mapjoin_only]
 (batchId=170)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning]
 (batchId=169)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3] 
(batchId=99)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=235)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] 
(batchId=235)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=180)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=180)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=180)
org.apache.hive.jdbc.TestJdbcWithMiniHS2.testJoinThriftSerializeInTasks 
(batchId=228)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6381/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6381/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6381/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 13 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12881698 - PreCommit-HIVE-Build

> Invalid explain when running dynamic partition pruning query in Hive On Spark
> -
>
> Key: HIVE-16948
> URL: https://issues.apache.org/jira/browse/HIVE-16948
> Project: Hive
>  Issue Type: Bug
>Reporter: liyunzhang_intel
>Assignee: liyunzhang_intel
> Attachments: HIVE-16948_1.patch, HIVE-16948.2.patch, 
> HIVE-16948.5.patch, HIVE-16948.6.patch, HIVE-16948.patch
>
>
> in 
> [union_subquery.q|https://github.com/apache/hive/blob/master/ql/src/test/queries/clientpositive/spark_dynamic_partition_pruning.q#L107]
>  in spark_dynamic_partition_pruning.q
> {code}
> set hive.optimize.ppd=true;
> set hive.ppd.remove.duplicatefilters=true;
> set hive.spark.dynamic.partition.pruning=true;
> set hive.optimize.metadataonly=false;
> set hive.optimize.index.filter=true;
> set hive.strict.checks.cartesian.product=false;
> explain select ds from (select distinct(ds) as ds from srcpart union all 
> select distinct(ds) as ds from srcpart) s where s.ds in (select 
> max(srcpart.ds) from srcpart union all select min(srcpart.ds) from srcpart);
> {code}
> explain 
> {code}
> STAGE DEPENDENCIES:
>   Stage-2 is a root stage
>   Stage-1 depends on stages: Stage-2
>   Stage-0 depends on stages: Stage-1
> STAGE PLANS:
>   Stage: Stage-2
> Spark
>   Edges:
> Reducer 11 <- Map 10 (GROUP, 1)
> Reducer 13 <- Map 12 (GROUP, 1)
>   DagName: root_20170622231525_20a777e5-e659-4138-b605-65f8395e18e2:2
>   Vertices:
> Map 10 
> Map Operator Tree:
> TableScan
>   alias: srcpart
>   Statistics: Num rows: 1 Data size: 23248 Basic stats: 
> PARTIAL Column stats: NONE
>   Select Operator
> expressions: ds (type: string)
> outputColumnNames: ds
> Statistics: Num rows: 1 Data size: 23248 Basic stats: 
> PARTIAL Column stats: NONE
> Group By Operator
>   aggregations: max(ds)
>   mode: hash
>   outputColumnNames: _col0
>   Statistics: Num rows: 1 Data size: 184 Basic stats: 
> COMPLETE Column stats: NONE
>   Reduce Output Operator
> sort order: 
>   

[jira] [Commented] (HIVE-17305) New insert overwrite dynamic partitions qtest need to have the golden file regenerated

2017-08-14 Thread Peter Vary (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16125469#comment-16125469
 ] 

Peter Vary commented on HIVE-17305:
---

+1 LGTM

> New insert overwrite dynamic partitions qtest need to have the golden file 
> regenerated
> --
>
> Key: HIVE-17305
> URL: https://issues.apache.org/jira/browse/HIVE-17305
> Project: Hive
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 3.0.0
>Reporter: Barna Zsombor Klara
>Assignee: Barna Zsombor Klara
>Priority: Trivial
> Attachments: HIVE-17305.01.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17291) Set the number of executors based on config if client does not provide information

2017-08-14 Thread Peter Vary (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16125439#comment-16125439
 ] 

Peter Vary commented on HIVE-17291:
---

[~lirui]: You might be using black magic, or similar :)
In the middle of the night I woke up realizing that something was not right with 
the patch - some of the query output should have changed as a result of the 
change - so I decided to check out a few things, and saw your comment, which 
immediately answered my question. What I am not sure of is whether you were using 
black magic to wake me up because my patch was not good, or white magic to answer 
my unasked question :) :) :)

I found this problem with the help of the qtest outputs, but I think it signals 
a more important usage problem. Maybe you have more experience with how users 
use HoS and how they tune their queries, but when my wife optimizes an Oracle 
query she often iterates through explain / hint change / explain loops. I imagine 
with Hive it looks like this: explain / change config / explain. When we change 
the Spark configuration, the RpcServer is killed and a new one is started (I 
might be wrong here - tell me if that is not the case). The result is that the 
first explain is different from the next one, which defeats the whole purpose of 
the optimization.

What happens when a query uses the wrong number of reducers? Will the query 
still run, just with slower execution? Or run out of memory?

Also, the test results of HIVE-17292 showed me that with multiple executors 
configured, the output of {{hiveSparkClient.getExecutorCount()}} is even 
less reliable. The possible outcomes are:
- *0* - if only the Application Master is started
- *1* - if the AM, and 1 executor is started
- *2* - if the AM, and both executors are started

So we should not base any decision on it, either in QTestUtil or in 
{{sparkSession.getMemoryAndCores()}}, unless we have made sure that all of the 
executors that will be running at the time of the query have started. I am 
starting to understand the wisdom of [~xuefuz]'s statement:

??I also had some doubts on the original idea which was to automatically and 
dynamically deciding the parallelism based on available memory/cores. Maybe we 
should back to the basis, where the number of reducers is solely determined 
(statically) by the total shuffled data size (stats) divided by the 
configuration "bytes per reducer". I'm open to all proposals, including doing 
this for dynamic allocation and using spark.executor.instances for static 
allocation.??

In light of these findings, I think we should calculate the parallelism with the 
algorithm proposed by [~xuefuz] (see the sketch below):
- for dynamic allocation - number of reducers is solely determined (statically) 
by the total shuffled data size (stats) divided by the configuration "bytes per 
reducer"
- for static allocation - using spark.executor.instances
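
A minimal sketch of that proposal (all names here are illustrative, not actual 
Hive code):

{code:java}
// Illustrative sketch of the proposed parallelism calculation.
static long proposedParallelism(boolean dynamicAllocation,
    long totalShuffledBytes, long bytesPerReducer, int executorInstances) {
  if (dynamicAllocation) {
    // Determined statically by total shuffled data size / bytes per reducer.
    return Math.max(1, totalShuffledBytes / bytesPerReducer);
  }
  // Static allocation: fall back to the configured spark.executor.instances.
  return executorInstances;
}
{code}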

What do you think?

Sorry for dragging you through the whole thinking process :(

Thanks,
Peter

> Set the number of executors based on config if client does not provide 
> information
> --
>
> Key: HIVE-17291
> URL: https://issues.apache.org/jira/browse/HIVE-17291
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Affects Versions: 3.0.0
>Reporter: Peter Vary
>Assignee: Peter Vary
> Attachments: HIVE-17291.1.patch
>
>
> When calculating the memory and cores, if the client does not provide the 
> information, we should try to use the configured defaults. This can happen 
> on startup, when {{spark.dynamicAllocation.enabled}} is not enabled.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17287) HoS can not deal with skewed data group by

2017-08-14 Thread liyunzhang_intel (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16125415#comment-16125415
 ] 

liyunzhang_intel commented on HIVE-17287:
-

[~lirui]: When I viewed all the tasks, I saw that 1 task was still running. 

> HoS can not deal with skewed data group by
> --
>
> Key: HIVE-17287
> URL: https://issues.apache.org/jira/browse/HIVE-17287
> Project: Hive
>  Issue Type: Bug
>Reporter: liyunzhang_intel
>Assignee: liyunzhang_intel
> Attachments: not_stages_completed_but_job_completed.PNG, 
> query67-fail-at-groupby.png, query67-groupby_shuffle_metric.png
>
>
> In 
> [tpcds/query67.sql|https://github.com/kellyzly/hive-testbench/blob/hive14/sample-queries-tpcds/query67.sql],
>  fact table {{store_sales}} joins with the small tables {{date_dim}}, 
> {{item}}, and {{store}}. After the join, a group-by is performed on the 
> intermediate data. Here the data of {{store_sales}} on 3TB tpcds is skewed: 
> there are 1824 partitions. The biggest partition is 25.7G, while the others 
> are around 715M.
> {code}
> hadoop fs -du -h 
> /user/hive/warehouse/tpcds_bin_partitioned_parquet_3000.db/store_sales
> 
> 715.0 M  
> /user/hive/warehouse/tpcds_bin_partitioned_parquet_3000.db/store_sales/ss_sold_date_sk=2452639
> 713.9 M  
> /user/hive/warehouse/tpcds_bin_partitioned_parquet_3000.db/store_sales/ss_sold_date_sk=2452640
> 714.1 M  
> /user/hive/warehouse/tpcds_bin_partitioned_parquet_3000.db/store_sales/ss_sold_date_sk=2452641
> 712.9 M  
> /user/hive/warehouse/tpcds_bin_partitioned_parquet_3000.db/store_sales/ss_sold_date_sk=2452642
> 25.7 G   
> /user/hive/warehouse/tpcds_bin_partitioned_parquet_3000.db/store_sales/ss_sold_date_sk=__HIVE_DEFAULT_PARTITION__
> {code}
> The skewed table {{store_sales}} caused the job to fail. Is there any way to 
> solve the group-by problem for a skewed table? I tried to enable 
> {{hive.groupby.skewindata}} to first divide the data more evenly and then 
> do the group-by. But the job still hangs. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17287) HoS can not deal with skewed data group by

2017-08-14 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16125371#comment-16125371
 ] 

Rui Li commented on HIVE-17287:
---

[~kellyzly], disabling {{spark.shuffle.reduceLocality.enabled}} can make reduce 
tasks more evenly distributed among executors. But as you said, it can't solve 
the skew in the data. As for the screenshot, I don't know the cause either. What 
if you look at the metrics of all the tasks? Are they all in the completed state?

> HoS can not deal with skewed data group by
> --
>
> Key: HIVE-17287
> URL: https://issues.apache.org/jira/browse/HIVE-17287
> Project: Hive
>  Issue Type: Bug
>Reporter: liyunzhang_intel
>Assignee: liyunzhang_intel
> Attachments: not_stages_completed_but_job_completed.PNG, 
> query67-fail-at-groupby.png, query67-groupby_shuffle_metric.png
>
>
> In 
> [tpcds/query67.sql|https://github.com/kellyzly/hive-testbench/blob/hive14/sample-queries-tpcds/query67.sql],
>  fact table {{store_sales}} joins with the small tables {{date_dim}}, 
> {{item}}, and {{store}}. After the join, a group-by is performed on the 
> intermediate data. Here the data of {{store_sales}} on 3TB tpcds is skewed: 
> there are 1824 partitions. The biggest partition is 25.7G, while the others 
> are around 715M.
> {code}
> hadoop fs -du -h 
> /user/hive/warehouse/tpcds_bin_partitioned_parquet_3000.db/store_sales
> 
> 715.0 M  
> /user/hive/warehouse/tpcds_bin_partitioned_parquet_3000.db/store_sales/ss_sold_date_sk=2452639
> 713.9 M  
> /user/hive/warehouse/tpcds_bin_partitioned_parquet_3000.db/store_sales/ss_sold_date_sk=2452640
> 714.1 M  
> /user/hive/warehouse/tpcds_bin_partitioned_parquet_3000.db/store_sales/ss_sold_date_sk=2452641
> 712.9 M  
> /user/hive/warehouse/tpcds_bin_partitioned_parquet_3000.db/store_sales/ss_sold_date_sk=2452642
> 25.7 G   
> /user/hive/warehouse/tpcds_bin_partitioned_parquet_3000.db/store_sales/ss_sold_date_sk=__HIVE_DEFAULT_PARTITION__
> {code}
> The skewed table {{store_sales}} caused the job to fail. Is there any way to 
> solve the group-by problem for a skewed table? I tried to enable 
> {{hive.groupby.skewindata}} to first divide the data more evenly and then 
> do the group-by. But the job still hangs. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16948) Invalid explain when running dynamic partition pruning query in Hive On Spark

2017-08-14 Thread liyunzhang_intel (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

liyunzhang_intel updated HIVE-16948:

Attachment: HIVE-16948.6.patch

[~lirui]: attached is HIVE-16948.6.patch, which updates the code to the latest 
master. Can you spend some time reviewing it? Thanks!

> Invalid explain when running dynamic partition pruning query in Hive On Spark
> -
>
> Key: HIVE-16948
> URL: https://issues.apache.org/jira/browse/HIVE-16948
> Project: Hive
>  Issue Type: Bug
>Reporter: liyunzhang_intel
>Assignee: liyunzhang_intel
> Attachments: HIVE-16948_1.patch, HIVE-16948.2.patch, 
> HIVE-16948.5.patch, HIVE-16948.6.patch, HIVE-16948.patch
>
>
> in 
> [union_subquery.q|https://github.com/apache/hive/blob/master/ql/src/test/queries/clientpositive/spark_dynamic_partition_pruning.q#L107]
>  in spark_dynamic_partition_pruning.q
> {code}
> set hive.optimize.ppd=true;
> set hive.ppd.remove.duplicatefilters=true;
> set hive.spark.dynamic.partition.pruning=true;
> set hive.optimize.metadataonly=false;
> set hive.optimize.index.filter=true;
> set hive.strict.checks.cartesian.product=false;
> explain select ds from (select distinct(ds) as ds from srcpart union all 
> select distinct(ds) as ds from srcpart) s where s.ds in (select 
> max(srcpart.ds) from srcpart union all select min(srcpart.ds) from srcpart);
> {code}
> explain 
> {code}
> STAGE DEPENDENCIES:
>   Stage-2 is a root stage
>   Stage-1 depends on stages: Stage-2
>   Stage-0 depends on stages: Stage-1
> STAGE PLANS:
>   Stage: Stage-2
> Spark
>   Edges:
> Reducer 11 <- Map 10 (GROUP, 1)
> Reducer 13 <- Map 12 (GROUP, 1)
>   DagName: root_20170622231525_20a777e5-e659-4138-b605-65f8395e18e2:2
>   Vertices:
> Map 10 
> Map Operator Tree:
> TableScan
>   alias: srcpart
>   Statistics: Num rows: 1 Data size: 23248 Basic stats: 
> PARTIAL Column stats: NONE
>   Select Operator
> expressions: ds (type: string)
> outputColumnNames: ds
> Statistics: Num rows: 1 Data size: 23248 Basic stats: 
> PARTIAL Column stats: NONE
> Group By Operator
>   aggregations: max(ds)
>   mode: hash
>   outputColumnNames: _col0
>   Statistics: Num rows: 1 Data size: 184 Basic stats: 
> COMPLETE Column stats: NONE
>   Reduce Output Operator
> sort order: 
> Statistics: Num rows: 1 Data size: 184 Basic stats: 
> COMPLETE Column stats: NONE
> value expressions: _col0 (type: string)
> Map 12 
> Map Operator Tree:
> TableScan
>   alias: srcpart
>   Statistics: Num rows: 1 Data size: 23248 Basic stats: 
> PARTIAL Column stats: NONE
>   Select Operator
> expressions: ds (type: string)
> outputColumnNames: ds
> Statistics: Num rows: 1 Data size: 23248 Basic stats: 
> PARTIAL Column stats: NONE
> Group By Operator
>   aggregations: min(ds)
>   mode: hash
>   outputColumnNames: _col0
>   Statistics: Num rows: 1 Data size: 184 Basic stats: 
> COMPLETE Column stats: NONE
>   Reduce Output Operator
> sort order: 
> Statistics: Num rows: 1 Data size: 184 Basic stats: 
> COMPLETE Column stats: NONE
> value expressions: _col0 (type: string)
> Reducer 11 
> Reduce Operator Tree:
>   Group By Operator
> aggregations: max(VALUE._col0)
> mode: mergepartial
> outputColumnNames: _col0
> Statistics: Num rows: 1 Data size: 184 Basic stats: COMPLETE 
> Column stats: NONE
> Filter Operator
>   predicate: _col0 is not null (type: boolean)
>   Statistics: Num rows: 1 Data size: 184 Basic stats: 
> COMPLETE Column stats: NONE
>   Group By Operator
> keys: _col0 (type: string)
> mode: hash
> outputColumnNames: _col0
> Statistics: Num rows: 2 Data size: 368 Basic stats: 
> COMPLETE Column stats: NONE
> Select Operator
>   expressions: _col0 (type: string)
>   outputColumnNames: _col0
>   Statistics: Num rows: 2 Data size: 368 Basic stats: 
> COMPLETE Column stats: NONE
>

[jira] [Commented] (HIVE-17287) HoS can not deal with skewed data group by

2017-08-14 Thread liyunzhang_intel (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16125326#comment-16125326
 ] 

liyunzhang_intel commented on HIVE-17287:
-

[~lirui]: In the current case, I have not set {{hive.spark.use.groupby.shuffle}}, 
but I think the value is true because the default is true. After disabling 
{{spark.shuffle.reduceLocality.enabled}}, I reran query67 and it passed. But one 
strange thing I found is that [not all stages finished, yet the Spark job shows 
as 
completed|https://issues.apache.org/jira/secure/attachment/12881692/not_stages_completed_but_job_completed.PNG].
I don't know whether this is a bug in Spark. Meanwhile, in the Spark history 
server, the shuffle read metrics are still skewed. 
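For reference, the toggle discussed above is a session-level property (Hive on 
Spark forwards {{spark.*}} settings to the Spark session, restarting it when 
they change):

{noformat}
set spark.shuffle.reduceLocality.enabled=false;
{noformat}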

> HoS can not deal with skewed data group by
> --
>
> Key: HIVE-17287
> URL: https://issues.apache.org/jira/browse/HIVE-17287
> Project: Hive
>  Issue Type: Bug
>Reporter: liyunzhang_intel
>Assignee: liyunzhang_intel
> Attachments: not_stages_completed_but_job_completed.PNG, 
> query67-fail-at-groupby.png, query67-groupby_shuffle_metric.png
>
>
> In 
> [tpcds/query67.sql|https://github.com/kellyzly/hive-testbench/blob/hive14/sample-queries-tpcds/query67.sql],
>  fact table {{store_sales}} joins with the small tables {{date_dim}}, 
> {{item}}, and {{store}}. After the join, a group-by is performed on the 
> intermediate data. Here the data of {{store_sales}} on 3TB tpcds is skewed: 
> there are 1824 partitions. The biggest partition is 25.7G, while the others 
> are around 715M.
> {code}
> hadoop fs -du -h 
> /user/hive/warehouse/tpcds_bin_partitioned_parquet_3000.db/store_sales
> 
> 715.0 M  
> /user/hive/warehouse/tpcds_bin_partitioned_parquet_3000.db/store_sales/ss_sold_date_sk=2452639
> 713.9 M  
> /user/hive/warehouse/tpcds_bin_partitioned_parquet_3000.db/store_sales/ss_sold_date_sk=2452640
> 714.1 M  
> /user/hive/warehouse/tpcds_bin_partitioned_parquet_3000.db/store_sales/ss_sold_date_sk=2452641
> 712.9 M  
> /user/hive/warehouse/tpcds_bin_partitioned_parquet_3000.db/store_sales/ss_sold_date_sk=2452642
> 25.7 G   
> /user/hive/warehouse/tpcds_bin_partitioned_parquet_3000.db/store_sales/ss_sold_date_sk=__HIVE_DEFAULT_PARTITION__
> {code}
> The skewed table {{store_sales}} caused the job to fail. Is there any way to 
> solve the group-by problem for a skewed table? I tried to enable 
> {{hive.groupby.skewindata}} to first divide the data more evenly and then 
> do the group-by. But the job still hangs. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

