[jira] [Commented] (HIVE-17205) add functional support

2017-08-23 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16137975#comment-16137975
 ] 

Hive QA commented on HIVE-17205:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12883251/HIVE-17205.13.patch

{color:green}SUCCESS:{color} +1 due to 11 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 13 failed/errored test(s), 10999 tests 
executed
*Failed tests:*
{noformat}
TestTxnCommandsBase - did not produce a TEST-*.xml file (likely timed out) 
(batchId=280)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[cte_mat_4] (batchId=6)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning]
 (batchId=169)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[create_not_acid] 
(batchId=90)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=234)
org.apache.hadoop.hive.common.TestFileUtils.testCopyWithDistCpAs (batchId=249)
org.apache.hadoop.hive.common.TestFileUtils.testCopyWithDistcp (batchId=249)
org.apache.hadoop.hive.ql.TestAcidOnTez.testNonStandardConversion02 
(batchId=215)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=180)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=180)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=180)
org.apache.hive.hcatalog.pig.TestTextFileHCatStorer.testWriteDecimal 
(batchId=183)
org.apache.hive.hcatalog.pig.TestTextFileHCatStorer.testWriteDecimalXY 
(batchId=183)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6498/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6498/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6498/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 13 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12883251 - PreCommit-HIVE-Build

> add functional support
> --
>
> Key: HIVE-17205
> URL: https://issues.apache.org/jira/browse/HIVE-17205
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
> Attachments: HIVE-17205.01.patch, HIVE-17205.02.patch, 
> HIVE-17205.03.patch, HIVE-17205.09.patch, HIVE-17205.10.patch, 
> HIVE-17205.11.patch, HIVE-17205.12.patch, HIVE-17205.13.patch
>
>
> make sure unbucketed tables can be marked transactional=true
> make insert/update/delete/compaction work
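> A minimal sketch of what this should enable (assumed table name {{t}}):
> {code}
> create table t (a int, b int) stored as orc
>   tblproperties ('transactional'='true');  -- note: no "clustered by" clause
> insert into t values (1, 2);
> update t set b = 3 where a = 1;
> delete from t where a = 1;
> alter table t compact 'major';
> {code}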



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16823) "ArrayIndexOutOfBoundsException" in spark_vectorized_dynamic_partition_pruning.q

2017-08-23 Thread liyunzhang_intel (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16137991#comment-16137991
 ] 

liyunzhang_intel commented on HIVE-16823:
-

Updated the changes in the q*.out files in HIVE-16823.1.patch. Most of the 
changes look like the following; this is because we removed ConstantPropagate 
in SparkCompiler#runDynamicPartitionPruning.
{code}
Map Operator Tree:
    TableScan
      alias: srcpart_date
-     filterExpr: ((date = '2008-04-08') and ds is not null) (type: boolean)
+     filterExpr: ((date = '2008-04-08') and ds is not null and true) (type: boolean)
      Statistics: Num rows: 2 Data size: 42 Basic stats: COMPLETE Column stats: NONE
      Filter Operator
-       predicate: ((date = '2008-04-08') and ds is not null) (type: boolean)
+       predicate: ((date = '2008-04-08') and ds is not null and true) (type: boolean)
        Statistics: Num rows: 1 Data size: 21 Basic stats: COMPLETE Column stats: NONE
{code}
There are big changes in spark_vectorized_dynamic_partition_pruning.q.out 
because this file has not been updated for a long time.
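
Presumably the trailing {{true}} is the placeholder left where the pruned 
predicate is no longer folded away; a sketch of the effect (hypothetical 
standalone query against the same table):
{code}
explain
select * from srcpart_date
where (`date` = '2008-04-08') and ds is not null and true;
-- with ConstantPropagate: filterExpr: ((date = '2008-04-08') and ds is not null)
-- without it:             filterExpr: ((date = '2008-04-08') and ds is not null and true)
{code}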

> "ArrayIndexOutOfBoundsException" in 
> spark_vectorized_dynamic_partition_pruning.q
> 
>
> Key: HIVE-16823
> URL: https://issues.apache.org/jira/browse/HIVE-16823
> Project: Hive
>  Issue Type: Bug
>Reporter: Jianguo Tian
>Assignee: liyunzhang_intel
> Attachments: explain.spark, explain.tez, HIVE-16823.patch
>
>
> spark_vectorized_dynamic_partition_pruning.q
> {code}
> set hive.optimize.ppd=true;
> set hive.ppd.remove.duplicatefilters=true;
> set hive.spark.dynamic.partition.pruning=true;
> set hive.optimize.metadataonly=false;
> set hive.optimize.index.filter=true;
> set hive.vectorized.execution.enabled=true;
> set hive.strict.checks.cartesian.product=false;
> -- parent is reduce tasks
> select count(*) from srcpart join (select ds as ds, ds as `date` from srcpart 
> group by ds) s on (srcpart.ds = s.ds) where s.`date` = '2008-04-08';
> {code}
> The exceptions are as follows:
> {code}
> 2017-06-05T09:20:31,468 ERROR [Executor task launch worker-0] 
> spark.SparkReduceRecordHandler: Fatal error: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Error while processing 
> vector batch (tag=0) Column vector types: 0:BYTES, 1:BYTES
> ["2008-04-08", "2008-04-08"]
> org.apache.hadoop.hive.ql.metadata.HiveException: Error while processing 
> vector batch (tag=0) Column vector types: 0:BYTES, 1:BYTES
> ["2008-04-08", "2008-04-08"]
>   at 
> org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processVectors(SparkReduceRecordHandler.java:413)
>  ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processRow(SparkReduceRecordHandler.java:301)
>  ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:54)
>  ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:28)
>  ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList.hasNext(HiveBaseFunctionResultList.java:85)
>  ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at 
> scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:42) 
> ~[scala-library-2.11.8.jar:?]
>   at scala.collection.Iterator$class.foreach(Iterator.scala:893) 
> ~[scala-library-2.11.8.jar:?]
>   at scala.collection.AbstractIterator.foreach(Iterator.scala:1336) 
> ~[scala-library-2.11.8.jar:?]
>   at 
> org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$1$$anonfun$apply$12.apply(AsyncRDDActions.scala:127)
>  ~[spark-core_2.11-2.0.0.jar:2.0.0]
>   at 
> org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$1$$anonfun$apply$12.apply(AsyncRDDActions.scala:127)
>  ~[spark-core_2.11-2.0.0.jar:2.0.0]
>   at 
> org.apache.spark.SparkContext$$anonfun$33.apply(SparkContext.scala:1974) 
> ~[spark-core_2.11-2.0.0.jar:2.0.0]
>   at 
> org.apache.spark.SparkContext$$anonfun$33.apply(SparkContext.scala:1974) 
> ~[spark-core_2.11-2.0.0.jar:2.0.0]
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70) 
> ~[spark-core_2.11-2.0.0.jar:2.0.0]
>   at org.apache.spark.scheduler.Task.run(Task.scala:85) 
> ~[spark-core_2.11-2.0.0.jar:2.0.0]
>   at 
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274) 
> ~[spark-core_2.11-2.0.0.jar:2.0.0]
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(Th

[jira] [Updated] (HIVE-16823) "ArrayIndexOutOfBoundsException" in spark_vectorized_dynamic_partition_pruning.q

2017-08-23 Thread liyunzhang_intel (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

liyunzhang_intel updated HIVE-16823:

Attachment: HIVE-16823.1.patch

> "ArrayIndexOutOfBoundsException" in 
> spark_vectorized_dynamic_partition_pruning.q
> 
>
> Key: HIVE-16823
> URL: https://issues.apache.org/jira/browse/HIVE-16823
> Project: Hive
>  Issue Type: Bug
>Reporter: Jianguo Tian
>Assignee: liyunzhang_intel
> Attachments: explain.spark, explain.tez, HIVE-16823.1.patch, 
> HIVE-16823.patch
>
>
> spark_vectorized_dynamic_partition_pruning.q
> {code}
> set hive.optimize.ppd=true;
> set hive.ppd.remove.duplicatefilters=true;
> set hive.spark.dynamic.partition.pruning=true;
> set hive.optimize.metadataonly=false;
> set hive.optimize.index.filter=true;
> set hive.vectorized.execution.enabled=true;
> set hive.strict.checks.cartesian.product=false;
> -- parent is reduce tasks
> select count(*) from srcpart join (select ds as ds, ds as `date` from srcpart 
> group by ds) s on (srcpart.ds = s.ds) where s.`date` = '2008-04-08';
> {code}
> The exceptions are as follows:
> {code}
> 2017-06-05T09:20:31,468 ERROR [Executor task launch worker-0] 
> spark.SparkReduceRecordHandler: Fatal error: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Error while processing 
> vector batch (tag=0) Column vector types: 0:BYTES, 1:BYTES
> ["2008-04-08", "2008-04-08"]
> org.apache.hadoop.hive.ql.metadata.HiveException: Error while processing 
> vector batch (tag=0) Column vector types: 0:BYTES, 1:BYTES
> ["2008-04-08", "2008-04-08"]
>   at 
> org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processVectors(SparkReduceRecordHandler.java:413)
>  ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processRow(SparkReduceRecordHandler.java:301)
>  ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:54)
>  ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:28)
>  ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList.hasNext(HiveBaseFunctionResultList.java:85)
>  ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at 
> scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:42) 
> ~[scala-library-2.11.8.jar:?]
>   at scala.collection.Iterator$class.foreach(Iterator.scala:893) 
> ~[scala-library-2.11.8.jar:?]
>   at scala.collection.AbstractIterator.foreach(Iterator.scala:1336) 
> ~[scala-library-2.11.8.jar:?]
>   at 
> org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$1$$anonfun$apply$12.apply(AsyncRDDActions.scala:127)
>  ~[spark-core_2.11-2.0.0.jar:2.0.0]
>   at 
> org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$1$$anonfun$apply$12.apply(AsyncRDDActions.scala:127)
>  ~[spark-core_2.11-2.0.0.jar:2.0.0]
>   at 
> org.apache.spark.SparkContext$$anonfun$33.apply(SparkContext.scala:1974) 
> ~[spark-core_2.11-2.0.0.jar:2.0.0]
>   at 
> org.apache.spark.SparkContext$$anonfun$33.apply(SparkContext.scala:1974) 
> ~[spark-core_2.11-2.0.0.jar:2.0.0]
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70) 
> ~[spark-core_2.11-2.0.0.jar:2.0.0]
>   at org.apache.spark.scheduler.Task.run(Task.scala:85) 
> ~[spark-core_2.11-2.0.0.jar:2.0.0]
>   at 
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274) 
> ~[spark-core_2.11-2.0.0.jar:2.0.0]
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  [?:1.8.0_112]
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  [?:1.8.0_112]
>   at java.lang.Thread.run(Thread.java:745) [?:1.8.0_112]
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 1
>   at 
> org.apache.hadoop.hive.ql.exec.vector.VectorGroupKeyHelper.copyGroupKey(VectorGroupKeyHelper.java:107)
>  ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.exec.vector.VectorGroupByOperator$ProcessingModeReduceMergePartial.doProcessBatch(VectorGroupByOperator.java:832)
>  ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.exec.vector.VectorGroupByOperator$ProcessingModeBase.processBatch(VectorGroupByOperator.java:179)
>  ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.exec.vector.VectorGroupByOperator.process(VectorGroupByOperator.java:1035)
>  ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at 
>

[jira] [Commented] (HIVE-16823) "ArrayIndexOutOfBoundsException" in spark_vectorized_dynamic_partition_pruning.q

2017-08-23 Thread liyunzhang_intel (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16138071#comment-16138071
 ] 

liyunzhang_intel commented on HIVE-16823:
-

Explained more about the big changes in 
spark_vectorized_dynamic_partition_pruning.q.out on the review board. [~lirui] 
and [~stakiar]: if you have time, please help review, thanks!

> "ArrayIndexOutOfBoundsException" in 
> spark_vectorized_dynamic_partition_pruning.q
> 
>
> Key: HIVE-16823
> URL: https://issues.apache.org/jira/browse/HIVE-16823
> Project: Hive
>  Issue Type: Bug
>Reporter: Jianguo Tian
>Assignee: liyunzhang_intel
> Attachments: explain.spark, explain.tez, HIVE-16823.1.patch, 
> HIVE-16823.patch
>
>
> spark_vectorized_dynamic_partition_pruning.q
> {code}
> set hive.optimize.ppd=true;
> set hive.ppd.remove.duplicatefilters=true;
> set hive.spark.dynamic.partition.pruning=true;
> set hive.optimize.metadataonly=false;
> set hive.optimize.index.filter=true;
> set hive.vectorized.execution.enabled=true;
> set hive.strict.checks.cartesian.product=false;
> -- parent is reduce tasks
> select count(*) from srcpart join (select ds as ds, ds as `date` from srcpart 
> group by ds) s on (srcpart.ds = s.ds) where s.`date` = '2008-04-08';
> {code}
> The exceptions are as follows:
> {code}
> 2017-06-05T09:20:31,468 ERROR [Executor task launch worker-0] 
> spark.SparkReduceRecordHandler: Fatal error: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Error while processing 
> vector batch (tag=0) Column vector types: 0:BYTES, 1:BYTES
> ["2008-04-08", "2008-04-08"]
> org.apache.hadoop.hive.ql.metadata.HiveException: Error while processing 
> vector batch (tag=0) Column vector types: 0:BYTES, 1:BYTES
> ["2008-04-08", "2008-04-08"]
>   at 
> org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processVectors(SparkReduceRecordHandler.java:413)
>  ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processRow(SparkReduceRecordHandler.java:301)
>  ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:54)
>  ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:28)
>  ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList.hasNext(HiveBaseFunctionResultList.java:85)
>  ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at 
> scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:42) 
> ~[scala-library-2.11.8.jar:?]
>   at scala.collection.Iterator$class.foreach(Iterator.scala:893) 
> ~[scala-library-2.11.8.jar:?]
>   at scala.collection.AbstractIterator.foreach(Iterator.scala:1336) 
> ~[scala-library-2.11.8.jar:?]
>   at 
> org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$1$$anonfun$apply$12.apply(AsyncRDDActions.scala:127)
>  ~[spark-core_2.11-2.0.0.jar:2.0.0]
>   at 
> org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$1$$anonfun$apply$12.apply(AsyncRDDActions.scala:127)
>  ~[spark-core_2.11-2.0.0.jar:2.0.0]
>   at 
> org.apache.spark.SparkContext$$anonfun$33.apply(SparkContext.scala:1974) 
> ~[spark-core_2.11-2.0.0.jar:2.0.0]
>   at 
> org.apache.spark.SparkContext$$anonfun$33.apply(SparkContext.scala:1974) 
> ~[spark-core_2.11-2.0.0.jar:2.0.0]
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70) 
> ~[spark-core_2.11-2.0.0.jar:2.0.0]
>   at org.apache.spark.scheduler.Task.run(Task.scala:85) 
> ~[spark-core_2.11-2.0.0.jar:2.0.0]
>   at 
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274) 
> ~[spark-core_2.11-2.0.0.jar:2.0.0]
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  [?:1.8.0_112]
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  [?:1.8.0_112]
>   at java.lang.Thread.run(Thread.java:745) [?:1.8.0_112]
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 1
>   at 
> org.apache.hadoop.hive.ql.exec.vector.VectorGroupKeyHelper.copyGroupKey(VectorGroupKeyHelper.java:107)
>  ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.exec.vector.VectorGroupByOperator$ProcessingModeReduceMergePartial.doProcessBatch(VectorGroupByOperator.java:832)
>  ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.exec.vector.VectorGroupByOperator$ProcessingModeBase.processBatch(VectorGroupByOperator.java:179)
>  ~[hive-exec-3.0.0-SNAPSHOT.ja

[jira] [Commented] (HIVE-16823) "ArrayIndexOutOfBoundsException" in spark_vectorized_dynamic_partition_pruning.q

2017-08-23 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16138093#comment-16138093
 ] 

Rui Li commented on HIVE-16823:
---

[~kellyzly], thanks for working on this. I think the root cause is not in how 
we implement DPP. I can reproduce the issue with DPP disabled:
{noformat}
set hive.optimize.ppd=true;
set hive.ppd.remove.duplicatefilters=true;
set hive.spark.dynamic.partition.pruning=false;
set hive.optimize.metadataonly=false;
set hive.optimize.index.filter=true;
set hive.vectorized.execution.enabled=true;
set hive.strict.checks.cartesian.product=false;

set hive.auto.convert.join=true;
set hive.auto.convert.join.noconditionaltask = true;
set hive.auto.convert.join.noconditionaltask.size = 1000;

set hive.cbo.enable=false;
set hive.optimize.constant.propagation=true;
{noformat}
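
Presumably together with the query from the issue description; a sketch of the 
full repro under the settings above:
{code}
select count(*) from srcpart
join (select ds as ds, ds as `date` from srcpart group by ds) s
  on (srcpart.ds = s.ds)
where s.`date` = '2008-04-08';
{code}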

Looking at the code in the vectorizer, I think it's because the vectorized GBY 
is not set up properly when a SparkHashTableSinkOperator is present. To verify 
this: if you change {{hive.auto.convert.join=false}} in the above configs, the 
query runs successfully. I'm still trying to understand the logic in the 
vectorizer. Please also update the JIRA if you find anything. Thanks.

> "ArrayIndexOutOfBoundsException" in 
> spark_vectorized_dynamic_partition_pruning.q
> 
>
> Key: HIVE-16823
> URL: https://issues.apache.org/jira/browse/HIVE-16823
> Project: Hive
>  Issue Type: Bug
>Reporter: Jianguo Tian
>Assignee: liyunzhang_intel
> Attachments: explain.spark, explain.tez, HIVE-16823.1.patch, 
> HIVE-16823.patch
>
>
> spark_vectorized_dynamic_partition_pruning.q
> {code}
> set hive.optimize.ppd=true;
> set hive.ppd.remove.duplicatefilters=true;
> set hive.spark.dynamic.partition.pruning=true;
> set hive.optimize.metadataonly=false;
> set hive.optimize.index.filter=true;
> set hive.vectorized.execution.enabled=true;
> set hive.strict.checks.cartesian.product=false;
> -- parent is reduce tasks
> select count(*) from srcpart join (select ds as ds, ds as `date` from srcpart 
> group by ds) s on (srcpart.ds = s.ds) where s.`date` = '2008-04-08';
> {code}
> The exceptions are as follows:
> {code}
> 2017-06-05T09:20:31,468 ERROR [Executor task launch worker-0] 
> spark.SparkReduceRecordHandler: Fatal error: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Error while processing 
> vector batch (tag=0) Column vector types: 0:BYTES, 1:BYTES
> ["2008-04-08", "2008-04-08"]
> org.apache.hadoop.hive.ql.metadata.HiveException: Error while processing 
> vector batch (tag=0) Column vector types: 0:BYTES, 1:BYTES
> ["2008-04-08", "2008-04-08"]
>   at 
> org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processVectors(SparkReduceRecordHandler.java:413)
>  ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processRow(SparkReduceRecordHandler.java:301)
>  ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:54)
>  ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:28)
>  ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList.hasNext(HiveBaseFunctionResultList.java:85)
>  ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at 
> scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:42) 
> ~[scala-library-2.11.8.jar:?]
>   at scala.collection.Iterator$class.foreach(Iterator.scala:893) 
> ~[scala-library-2.11.8.jar:?]
>   at scala.collection.AbstractIterator.foreach(Iterator.scala:1336) 
> ~[scala-library-2.11.8.jar:?]
>   at 
> org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$1$$anonfun$apply$12.apply(AsyncRDDActions.scala:127)
>  ~[spark-core_2.11-2.0.0.jar:2.0.0]
>   at 
> org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$1$$anonfun$apply$12.apply(AsyncRDDActions.scala:127)
>  ~[spark-core_2.11-2.0.0.jar:2.0.0]
>   at 
> org.apache.spark.SparkContext$$anonfun$33.apply(SparkContext.scala:1974) 
> ~[spark-core_2.11-2.0.0.jar:2.0.0]
>   at 
> org.apache.spark.SparkContext$$anonfun$33.apply(SparkContext.scala:1974) 
> ~[spark-core_2.11-2.0.0.jar:2.0.0]
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70) 
> ~[spark-core_2.11-2.0.0.jar:2.0.0]
>   at org.apache.spark.scheduler.Task.run(Task.scala:85) 
> ~[spark-core_2.11-2.0.0.jar:2.0.0]
>   at 
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274) 
> ~[spark-core_2.11-2.0.0.jar:2.0.0]
>   at 
> java.ut

[jira] [Assigned] (HIVE-16908) Failures in TestHcatClient due to HIVE-16844

2017-08-23 Thread Peter Vary (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Vary reassigned HIVE-16908:
-

Assignee: Peter Vary  (was: Sunitha Beeram)

> Failures in TestHcatClient due to HIVE-16844
> 
>
> Key: HIVE-16908
> URL: https://issues.apache.org/jira/browse/HIVE-16908
> Project: Hive
>  Issue Type: Bug
>Reporter: Sunitha Beeram
>Assignee: Peter Vary
> Attachments: HIVE-16908.1.patch, HIVE-16908.2.patch, 
> HIVE-16908.3.patch
>
>
> Some of the tests in TestHCatClient.java, for example:
> {noformat}
> org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
>  (batchId=177)
> org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
>  (batchId=177)
> org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
> (batchId=177)
> {noformat}
> are failing due to HIVE-16844. HIVE-16844 fixes a connection leak when a new 
> configuration object is set on the ObjectStore. TestHCatClient fires up a 
> second metastore thread with a different conf object, which results in the 
> PersistenceManagerFactory being closed, and hence the tests fail.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16908) Failures in TestHcatClient due to HIVE-16844

2017-08-23 Thread Peter Vary (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Vary updated HIVE-16908:
--
Attachment: HIVE-16908.4.patch

Resubmitting [~mithun]'s patch to re-run the tests.

> Failures in TestHcatClient due to HIVE-16844
> 
>
> Key: HIVE-16908
> URL: https://issues.apache.org/jira/browse/HIVE-16908
> Project: Hive
>  Issue Type: Bug
>Reporter: Sunitha Beeram
>Assignee: Peter Vary
> Attachments: HIVE-16908.1.patch, HIVE-16908.2.patch, 
> HIVE-16908.3.patch, HIVE-16908.4.patch
>
>
> Some of the tests in TestHCatClient.java, for example:
> {noformat}
> org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
>  (batchId=177)
> org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
>  (batchId=177)
> org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
> (batchId=177)
> {noformat}
> are failing due to HIVE-16844. HIVE-16844 fixes a connection leak when a new 
> configuration object is set on the ObjectStore. TestHCatClient fires up a 
> second metastore thread with a different conf object, which results in the 
> PersistenceManagerFactory being closed, and hence the tests fail.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (HIVE-16908) Failures in TestHcatClient due to HIVE-16844

2017-08-23 Thread Peter Vary (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Vary reassigned HIVE-16908:
-

Assignee: Mithun Radhakrishnan  (was: Peter Vary)

> Failures in TestHcatClient due to HIVE-16844
> 
>
> Key: HIVE-16908
> URL: https://issues.apache.org/jira/browse/HIVE-16908
> Project: Hive
>  Issue Type: Bug
>Reporter: Sunitha Beeram
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-16908.1.patch, HIVE-16908.2.patch, 
> HIVE-16908.3.patch, HIVE-16908.4.patch
>
>
> Some of the tests in TestHCatClient.java, for example:
> {noformat}
> org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
>  (batchId=177)
> org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
>  (batchId=177)
> org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
> (batchId=177)
> {noformat}
> are failing due to HIVE-16844. HIVE-16844 fixes a connection leak when a new 
> configuration object is set on the ObjectStore. TestHCatClient fires up a 
> second metastore thread with a different conf object, which results in the 
> PersistenceManagerFactory being closed, and hence the tests fail.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17362) The MAX_PREWARM_TIME should be configurable on HoS

2017-08-23 Thread Peter Vary (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Vary updated HIVE-17362:
--
  Resolution: Fixed
Target Version/s: 3.0.0
  Status: Resolved  (was: Patch Available)

Pushed to master.
Thanks for the review [~xuefuz]!

> The MAX_PREWARM_TIME should be configurable on HoS
> --
>
> Key: HIVE-17362
> URL: https://issues.apache.org/jira/browse/HIVE-17362
> Project: Hive
>  Issue Type: Improvement
>  Components: Spark
>Affects Versions: 3.0.0
>Reporter: Peter Vary
>Assignee: Peter Vary
> Attachments: HIVE-17362.2.patch, HIVE-17362.patch
>
>
> When HIVE_PREWARM_ENABLED is set, we wait MAX_PREWARM_TIME for the containers 
> to warm up. This is currently hardcoded to 5s, which is often not enough for 
> a Spark session to initialize its executors. We should be able to configure 
> this value, so we can set one that actually has an effect.
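> A sketch of the intended usage ({{hive.prewarm.enabled}} and 
> {{hive.prewarm.numcontainers}} are existing settings; the timeout property 
> name below is hypothetical):
> {code}
> set hive.prewarm.enabled=true;
> set hive.prewarm.numcontainers=10;
> -- hypothetical property name, exposing the currently hardcoded 5s default:
> set hive.prewarm.spark.timeout=30000;
> {code}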



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17362) The MAX_PREWARM_TIME should be configurable on HoS

2017-08-23 Thread Peter Vary (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Vary updated HIVE-17362:
--
Fix Version/s: 3.0.0

> The MAX_PREWARM_TIME should be configurable on HoS
> --
>
> Key: HIVE-17362
> URL: https://issues.apache.org/jira/browse/HIVE-17362
> Project: Hive
>  Issue Type: Improvement
>  Components: Spark
>Affects Versions: 3.0.0
>Reporter: Peter Vary
>Assignee: Peter Vary
> Fix For: 3.0.0
>
> Attachments: HIVE-17362.2.patch, HIVE-17362.patch
>
>
> When HIVE_PREWARM_ENABLED is set, we wait MAX_PREWARM_TIME for the containers 
> to warm up. This is currently hardcoded to 5s, which is often not enough for 
> a Spark session to initialize its executors. We should be able to configure 
> this value, so we can set one that actually has an effect.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17319) Make BoneCp configurable using hive properties in hive-site.xml

2017-08-23 Thread Barna Zsombor Klara (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16138166#comment-16138166
 ] 

Barna Zsombor Klara commented on HIVE-17319:


Test failures should not be related.

> Make BoneCp configurable using hive properties in hive-site.xml
> ---
>
> Key: HIVE-17319
> URL: https://issues.apache.org/jira/browse/HIVE-17319
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Barna Zsombor Klara
>Assignee: Barna Zsombor Klara
> Attachments: HIVE-17319.01.patch, HIVE-17319.02.patch, 
> HIVE-17319.03.patch, HIVE-17319.draft.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16823) "ArrayIndexOutOfBoundsException" in spark_vectorized_dynamic_partition_pruning.q

2017-08-23 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16138170#comment-16138170
 ] 

Hive QA commented on HIVE-16823:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12883287/HIVE-16823.1.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 7 failed/errored test(s), 10994 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=235)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] 
(batchId=235)
org.apache.hadoop.hive.common.TestFileUtils.testCopyWithDistCpAs (batchId=250)
org.apache.hadoop.hive.common.TestFileUtils.testCopyWithDistcp (batchId=250)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=180)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=180)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=180)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6499/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6499/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6499/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 7 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12883287 - PreCommit-HIVE-Build

> "ArrayIndexOutOfBoundsException" in 
> spark_vectorized_dynamic_partition_pruning.q
> 
>
> Key: HIVE-16823
> URL: https://issues.apache.org/jira/browse/HIVE-16823
> Project: Hive
>  Issue Type: Bug
>Reporter: Jianguo Tian
>Assignee: liyunzhang_intel
> Attachments: explain.spark, explain.tez, HIVE-16823.1.patch, 
> HIVE-16823.patch
>
>
> spark_vectorized_dynamic_partition_pruning.q
> {code}
> set hive.optimize.ppd=true;
> set hive.ppd.remove.duplicatefilters=true;
> set hive.spark.dynamic.partition.pruning=true;
> set hive.optimize.metadataonly=false;
> set hive.optimize.index.filter=true;
> set hive.vectorized.execution.enabled=true;
> set hive.strict.checks.cartesian.product=false;
> -- parent is reduce tasks
> select count(*) from srcpart join (select ds as ds, ds as `date` from srcpart 
> group by ds) s on (srcpart.ds = s.ds) where s.`date` = '2008-04-08';
> {code}
> The exceptions are as follows:
> {code}
> 2017-06-05T09:20:31,468 ERROR [Executor task launch worker-0] 
> spark.SparkReduceRecordHandler: Fatal error: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Error while processing 
> vector batch (tag=0) Column vector types: 0:BYTES, 1:BYTES
> ["2008-04-08", "2008-04-08"]
> org.apache.hadoop.hive.ql.metadata.HiveException: Error while processing 
> vector batch (tag=0) Column vector types: 0:BYTES, 1:BYTES
> ["2008-04-08", "2008-04-08"]
>   at 
> org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processVectors(SparkReduceRecordHandler.java:413)
>  ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processRow(SparkReduceRecordHandler.java:301)
>  ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:54)
>  ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:28)
>  ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList.hasNext(HiveBaseFunctionResultList.java:85)
>  ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at 
> scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:42) 
> ~[scala-library-2.11.8.jar:?]
>   at scala.collection.Iterator$class.foreach(Iterator.scala:893) 
> ~[scala-library-2.11.8.jar:?]
>   at scala.collection.AbstractIterator.foreach(Iterator.scala:1336) 
> ~[scala-library-2.11.8.jar:?]
>   at 
> org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$1$$anonfun$apply$12.apply(AsyncRDDActions.scala:127)
>  ~[spark-core_2.11-2.0.0.jar:2.0.0]
>   at 
> org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$1$$anonfun$apply$12.apply(AsyncRDDActions.scala:127)
>  ~[spark-core_2.11-2.0.0.jar:2.0.0]
>   at 
> org.apache.spark.SparkContext$$anonfun$33.apply(SparkContext.scala:1974) 
> 

[jira] [Updated] (HIVE-17319) Make BoneCp configurable using hive properties in hive-site.xml

2017-08-23 Thread Peter Vary (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Vary updated HIVE-17319:
--
   Resolution: Fixed
Fix Version/s: 3.0.0
   Status: Resolved  (was: Patch Available)

Pushed to master.
Thanks for the patch [~zsombor.klara]!

Could you please update the documentation too?
Thanks,
Peter

> Make BoneCp configurable using hive properties in hive-site.xml
> ---
>
> Key: HIVE-17319
> URL: https://issues.apache.org/jira/browse/HIVE-17319
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Barna Zsombor Klara
>Assignee: Barna Zsombor Klara
> Fix For: 3.0.0
>
> Attachments: HIVE-17319.01.patch, HIVE-17319.02.patch, 
> HIVE-17319.03.patch, HIVE-17319.draft.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17316) Use String.startsWith for the hidden configuration variables

2017-08-23 Thread Peter Vary (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16138182#comment-16138182
 ] 

Peter Vary commented on HIVE-17316:
---

[~zsombor.klara]: Could you please document the changes on the Wiki?

Thanks,
Peter

> Use String.startsWith for the hidden configuration variables
> 
>
> Key: HIVE-17316
> URL: https://issues.apache.org/jira/browse/HIVE-17316
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Barna Zsombor Klara
>Assignee: Barna Zsombor Klara
> Fix For: 3.0.0
>
> Attachments: HIVE-17316.01.patch, HIVE-17316.02.patch, 
> HIVE-17316.03.patch
>
>
> Currently, HiveConf variables which should not be displayed to the user need 
> to be enumerated one by one. We should enhance this to be able to hide 
> configuration variables by string prefix, not just by full equality.
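> A sketch with assumed example values (presumably this concerns 
> {{hive.conf.hidden.list}}):
> {code}
> -- today every hidden variable must be enumerated by its exact name, e.g.
> --   hive.conf.hidden.list=javax.jdo.option.ConnectionPassword,fs.s3a.secret.key
> set hive.conf.hidden.list;
> -- with prefix matching, a single entry such as "fs.s3a." could cover every
> -- fs.s3a.* credential at once
> {code}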



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Comment Edited] (HIVE-1683) Column aliases cannot be used in a group by clause

2017-08-23 Thread Alfonso Nishikawa (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16138203#comment-16138203
 ] 

Alfonso Nishikawa edited comment on HIVE-1683 at 8/23/17 10:26 AM:
---

For anyone reaching these comments: it is possible to reference the group by 
columns by position (not by alias, but in many cases that will be enough) by 
configuring {{set hive.groupby.orderby.position.alias=true;}}.

From [https://cwiki.apache.org/confluence/display/Hive/LanguageManual+GroupBy]:

bq. In groupByExpression columns are specified by name, not by position number. 
However in Hive 0.11.0 and later, columns can be specified by position when 
configured as follows:
bq. For Hive 0.11.0 through 2.1.x, set hive.groupby.orderby.position.alias to 
true (the default is false).
bq. For Hive 2.2.0 and later, set hive.groupby.position.alias to true (the 
default is false).
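
A sketch using the query from this issue's description (position {{1}} stands 
in for the alias {{t}}):
{code}
set hive.groupby.orderby.position.alias=true;  -- Hive 0.11.0 through 2.1.x
select col1 as t, count(col2) from test group by 1;
{code}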




was (Author: alfonso.nishikawa):
For anyone reaching this comments, it is possible to set the group by position 
(not by alias, but in much cases will be enough) configuring {{set 
hive.groupby.orderby.position.alias=true;}}.

From [https://cwiki.apache.org/confluence/display/Hive/LanguageManual+GroupBy]:

bq. In groupByExpression columns are specified by name, not by position number. 
However in Hive 0.11.0 and later, columns can be specified by position when 
configured as follows:
For Hive 0.11.0 through 2.1.x, set hive.groupby.orderby.position.alias to 
true (the default is false).
For Hive 2.2.0 and later, set hive.groupby.position.alias to true (the 
default is false).



> Column aliases cannot be used in a group by clause
> --
>
> Key: HIVE-1683
> URL: https://issues.apache.org/jira/browse/HIVE-1683
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Reporter: Shrikrishna Lawande
>  Labels: SQL
>
> Column aliases cannot be used in a group by clause
> Following query would fail :
> select col1 as t, count(col2) from test group by t;
> FAILED: Error in semantic analysis: line 1:49 Invalid Table Alias or Column 
> Reference t



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-1683) Column aliases cannot be used in a group by clause

2017-08-23 Thread Alfonso Nishikawa (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16138203#comment-16138203
 ] 

Alfonso Nishikawa commented on HIVE-1683:
-

For anyone reaching these comments: it is possible to reference the group by 
columns by position (not by alias, but in many cases that will be enough) by 
configuring {{set hive.groupby.orderby.position.alias=true;}}.

From [https://cwiki.apache.org/confluence/display/Hive/LanguageManual+GroupBy]:

bq. In groupByExpression columns are specified by name, not by position number. 
However in Hive 0.11.0 and later, columns can be specified by position when 
configured as follows:
For Hive 0.11.0 through 2.1.x, set hive.groupby.orderby.position.alias to 
true (the default is false).
For Hive 2.2.0 and later, set hive.groupby.position.alias to true (the 
default is false).



> Column aliases cannot be used in a group by clause
> --
>
> Key: HIVE-1683
> URL: https://issues.apache.org/jira/browse/HIVE-1683
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Reporter: Shrikrishna Lawande
>  Labels: SQL
>
> Column aliases cannot be used in a group by clause
> Following query would fail :
> select col1 as t, count(col2) from test group by t;
> FAILED: Error in semantic analysis: line 1:49 Invalid Table Alias or Column 
> Reference t



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17327) LLAP IO: restrict native file ID usage to default FS to avoid hypothetical collisions when HDFS federation is used

2017-08-23 Thread Peter Vary (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16138204#comment-16138204
 ] 

Peter Vary commented on HIVE-17327:
---

[~gopalv],  [~sershe]: I think this patch breaks these tests:
{code}
org.apache.hadoop.hive.common.TestFileUtils.testCopyWithDistCpAs (batchId=250)
org.apache.hadoop.hive.common.TestFileUtils.testCopyWithDistcp (batchId=250)
{code}

Shall we remove the optional flag on this dependency here:
{code}
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-client</artifactId>
  <version>${hadoop.version}</version>
  <optional>true</optional><!-- This causes the problem -->
  <exclusions>
    <exclusion>
      <groupId>org.slf4j</groupId>
      <artifactId>slf4j-log4j12</artifactId>
    </exclusion>
    <exclusion>
      <groupId>commmons-logging</groupId>
      <artifactId>commons-logging</artifactId>
    </exclusion>
  </exclusions>
</dependency>
{code}

Also, I know it is used elsewhere too, but {{DistributedFileSystem}} is not a 
public API, and especially not {{DistributedFileSystem.getClient()}}, which is 
marked private and visible-for-testing. See: 
https://github.com/apache/hadoop/blob/branch-2.8.0/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DistributedFileSystem.java#L1197

Shouldn't we raise a public API request with the Hadoop community?

Thanks,
Peter

> LLAP IO: restrict native file ID usage to default FS to avoid hypothetical 
> collisions when HDFS federation is used
> --
>
> Key: HIVE-17327
> URL: https://issues.apache.org/jira/browse/HIVE-17327
> Project: Hive
>  Issue Type: Bug
>Reporter: Gopal V
>Assignee: Sergey Shelukhin
> Fix For: 3.0.0, 2.4.0
>
> Attachments: HIVE-17327.01.patch, HIVE-17327.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16908) Failures in TestHcatClient due to HIVE-16844

2017-08-23 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16138225#comment-16138225
 ] 

Hive QA commented on HIVE-16908:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12883297/HIVE-16908.4.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 10994 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[create_merge_compressed]
 (batchId=240)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning]
 (batchId=169)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] 
(batchId=100)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=235)
org.apache.hadoop.hive.common.TestFileUtils.testCopyWithDistCpAs (batchId=250)
org.apache.hadoop.hive.common.TestFileUtils.testCopyWithDistcp (batchId=250)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6500/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6500/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6500/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 6 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12883297 - PreCommit-HIVE-Build

> Failures in TestHcatClient due to HIVE-16844
> 
>
> Key: HIVE-16908
> URL: https://issues.apache.org/jira/browse/HIVE-16908
> Project: Hive
>  Issue Type: Bug
>Reporter: Sunitha Beeram
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-16908.1.patch, HIVE-16908.2.patch, 
> HIVE-16908.3.patch, HIVE-16908.4.patch
>
>
> Some of the tests in TestHCatClient.java, for example:
> {noformat}
> org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
>  (batchId=177)
> org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
>  (batchId=177)
> org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
> (batchId=177)
> {noformat}
> are failing due to HIVE-16844. HIVE-16844 fixes a connection leak when a new 
> configuration object is set on the ObjectStore. TestHCatClient fires up a 
> second metastore thread with a different conf object, which results in the 
> PersistenceManagerFactory being closed, and hence the tests fail.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16908) Failures in TestHcatClient due to HIVE-16844

2017-08-23 Thread Peter Vary (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16138358#comment-16138358
 ] 

Peter Vary commented on HIVE-16908:
---

The failing tests are not related:
- TestBeeLineDriver.testCliDriver create_merge_compressed is caused by 
HIVE-17370 (Some tests are failing with java.io.FileNotFoundException: File 
file:/tmp/hadoop/mapred/...)
- TestMiniSparkOnYarnCliDriver.testCliDriver 
spark_vectorized_dynamic_partition_pruning is caused by HIVE-17122 
(spark_vectorized_dynamic_partition_pruning.q is continuously failing)
- The TestFileUtils failures are caused by HIVE-17327 (LLAP IO: restrict 
native file ID usage to default FS to avoid hypothetical collisions when HDFS 
federation is used)
- The other two are known to be flaky

> Failures in TestHcatClient due to HIVE-16844
> 
>
> Key: HIVE-16908
> URL: https://issues.apache.org/jira/browse/HIVE-16908
> Project: Hive
>  Issue Type: Bug
>Reporter: Sunitha Beeram
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-16908.1.patch, HIVE-16908.2.patch, 
> HIVE-16908.3.patch, HIVE-16908.4.patch
>
>
> Some of the tests in TestHCatClient.java, for example:
> {noformat}
> org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
>  (batchId=177)
> org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
>  (batchId=177)
> org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
> (batchId=177)
> {noformat}
> are failing due to HIVE-16844. HIVE-16844 fixes a connection leak when a new 
> configuration object is set on the ObjectStore. TestHCatClient fires up a 
> second metastore thread with a different conf object, which results in the 
> PersistenceManagerFactory being closed, and hence the tests fail.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17374) HCatalog storage failures after hive TRUNCATE

2017-08-23 Thread Federico De Giuli (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Federico De Giuli updated HIVE-17374:
-
   Tags: hcatalog, hive, truncate  (was: hcatalog)
Description: 
It looks like truncating a table in hive does not remove partition directories 
in the filesystem. This causes conflicts when HCatalog tries to write to the 
same partition:

{code}
Commit failed for output: outputName:scope-705 of vertex/vertexGroup:scope-708 
isVertexGroupOutput:false, java.io.IOException: 
java.lang.reflect.InvocationTargetException
at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputCommitter.commitJob(PigOutputCommitter.java:281)
at 
org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigOutputFormatTez$PigOutputCommitterTez.commitJob(PigOutputFormatTez.java:98)
at 
org.apache.tez.mapreduce.committer.MROutputCommitter.commitOutput(MROutputCommitter.java:99)
at org.apache.tez.dag.app.dag.impl.DAGImpl$1.run(DAGImpl.java:1018)
at org.apache.tez.dag.app.dag.impl.DAGImpl$1.run(DAGImpl.java:1015)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1936)
at 
org.apache.tez.dag.app.dag.impl.DAGImpl.commitOutput(DAGImpl.java:1015)
at org.apache.tez.dag.app.dag.impl.DAGImpl.access$2000(DAGImpl.java:149)
at org.apache.tez.dag.app.dag.impl.DAGImpl$3.call(DAGImpl.java:1094)
at org.apache.tez.dag.app.dag.impl.DAGImpl$3.call(DAGImpl.java:1089)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputCommitter.commitJob(PigOutputCommitter.java:279)
... 15 more
Caused by: org.apache.hive.hcatalog.common.HCatException : 2012 : Moving of 
data failed during commit : Failed to move file: 
hdfs://HOSTNAME/user/fdegiuli/db/video_sessions/_DYN0.6589663085145253/dt=2017080500
 to hdfs://HOSTNAME/user/fdegiuli/db/video_sessions
at 
org.apache.hive.hcatalog.mapreduce.FileOutputCommitterContainer.moveTaskOutputs(FileOutputCommitterContainer.java:825)
at 
org.apache.hive.hcatalog.mapreduce.FileOutputCommitterContainer.moveTaskOutputs(FileOutputCommitterContainer.java:782)
at 
org.apache.hive.hcatalog.mapreduce.FileOutputCommitterContainer.registerPartitions(FileOutputCommitterContainer.java:1176)
at 
org.apache.hive.hcatalog.mapreduce.FileOutputCommitterContainer.commitJob(FileOutputCommitterContainer.java:272)
... 20 more

DAG did not succeed due to COMMIT_FAILURE. failedVertices:0 killedVertices:0
{code}

Here's a look at the directory. As you can see, {{dt=2017080500}} is empty.

{code}
[fdegiuli ~]$ hdfs dfs -ls  $NN/user/fdegiuli/db/video_sessions/
Found 4 items
-rw---   3 fdegiuli users  0 2017-08-16 15:48 
NAMENODE/user/fdegiuli/db/video_sessions/_placeholder_0.016147276492671336
-rw---   3 fdegiuli users  0 2017-08-15 21:19 
NAMENODE/user/fdegiuli/db/video_sessions/_placeholder_0.15239090174209513
-rw---   3 fdegiuli users  0 2017-08-16 14:49 
NAMENODE/user/fdegiuli/db/video_sessions/_placeholder_0.7007380672193727
drwx--   - fdegiuli users  0 2017-08-16 15:44 
NAMENODE/user/fdegiuli/db/video_sessions/dt=2017080500
[fdegiuli ~]$ hdfs dfs -ls  $NN/user/fdegiuli/db/video_sessions/dt=2017080500
[fdegiuli ~]$
{code}
 
I'm using a pig script to repopulate the data in the table. I don't know if 
this error happens when using Spark.

The way I see it, there are two possible bugs here:
 1. TRUNCATE TABLE should delete the partition directories.
 2. HCatalog should handle existing partition directories correctly.

I'm happy to provide any more information you may want, and help with an 
eventual patch.

Copied from 
http://mail-archives.apache.org/mod_mbox/hive-user/201708.mbox/%3CCAFPScVjA8PYDXOajgJR8LgvypFd00eJDZuSrFVFCci1odt42_g%40mail.gmail.com%3E

> HCatalog storage failures after hive TRUNCATE
> -
>
> Key: HIVE-17374
> URL: https://issues.apache.org/jira/browse/HIVE-17374
> Project: Hive
>  I

[jira] [Updated] (HIVE-17374) HCatalog storage failures after hive TRUNCATE

2017-08-23 Thread Federico De Giuli (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Federico De Giuli updated HIVE-17374:
-
Description: 
It looks like truncating a table in hive does not remove partition directories 
in the filesystem. This causes conflicts when HCatalog tries to write to the 
same partition:

{code}
Commit failed for output: outputName:scope-705 of vertex/vertexGroup:scope-708 
isVertexGroupOutput:false, java.io.IOException: 
java.lang.reflect.InvocationTargetException
at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputCommitter.commitJob(PigOutputCommitter.java:281)
at 
org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigOutputFormatTez$PigOutputCommitterTez.commitJob(PigOutputFormatTez.java:98)
at 
org.apache.tez.mapreduce.committer.MROutputCommitter.commitOutput(MROutputCommitter.java:99)
at org.apache.tez.dag.app.dag.impl.DAGImpl$1.run(DAGImpl.java:1018)
at org.apache.tez.dag.app.dag.impl.DAGImpl$1.run(DAGImpl.java:1015)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1936)
at 
org.apache.tez.dag.app.dag.impl.DAGImpl.commitOutput(DAGImpl.java:1015)
at org.apache.tez.dag.app.dag.impl.DAGImpl.access$2000(DAGImpl.java:149)
at org.apache.tez.dag.app.dag.impl.DAGImpl$3.call(DAGImpl.java:1094)
at org.apache.tez.dag.app.dag.impl.DAGImpl$3.call(DAGImpl.java:1089)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputCommitter.commitJob(PigOutputCommitter.java:279)
... 15 more
Caused by: org.apache.hive.hcatalog.common.HCatException : 2012 : Moving of 
data failed during commit : Failed to move file: 
hdfs://HOSTNAME/user/fdegiuli/db/video_sessions/_DYN0.6589663085145253/dt=2017080500
 to hdfs://HOSTNAME/user/fdegiuli/db/video_sessions
at 
org.apache.hive.hcatalog.mapreduce.FileOutputCommitterContainer.moveTaskOutputs(FileOutputCommitterContainer.java:825)
at 
org.apache.hive.hcatalog.mapreduce.FileOutputCommitterContainer.moveTaskOutputs(FileOutputCommitterContainer.java:782)
at 
org.apache.hive.hcatalog.mapreduce.FileOutputCommitterContainer.registerPartitions(FileOutputCommitterContainer.java:1176)
at 
org.apache.hive.hcatalog.mapreduce.FileOutputCommitterContainer.commitJob(FileOutputCommitterContainer.java:272)
... 20 more

DAG did not succeed due to COMMIT_FAILURE. failedVertices:0 killedVertices:0
{code}

Here's a look at the directory. As you can see, {{dt=2017080500}} is empty.

{code}
[fdegiuli ~]$ hdfs dfs -ls  $NN/user/fdegiuli/db/video_sessions/
Found 4 items
-rw---   3 fdegiuli users  0 2017-08-16 15:48 
NAMENODE/user/fdegiuli/db/video_sessions/_placeholder_0.016147276492671336
-rw---   3 fdegiuli users  0 2017-08-15 21:19 
NAMENODE/user/fdegiuli/db/video_sessions/_placeholder_0.15239090174209513
-rw---   3 fdegiuli users  0 2017-08-16 14:49 
NAMENODE/user/fdegiuli/db/video_sessions/_placeholder_0.7007380672193727
drwx--   - fdegiuli users  0 2017-08-16 15:44 
NAMENODE/user/fdegiuli/db/video_sessions/dt=2017080500
[fdegiuli ~]$ hdfs dfs -ls  $NN/user/fdegiuli/db/video_sessions/dt=2017080500
[fdegiuli ~]$
{code}
 
I'm using a pig script to repopulate the data in the table. I don't know if 
this error happens when using Spark.
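
A sketch of the sequence that appears to trigger this (simplified, assumed 
schema):
{code}
create table video_sessions (session_id string) partitioned by (dt string);
insert into video_sessions partition (dt='2017080500') values ('s1');
truncate table video_sessions;
-- the dt=2017080500 directory is left behind, so the next HCatalog/Pig write
-- into that partition fails at commit time with HCatException 2012
{code}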

The way I see it, there are two possible bugs here:
 1. TRUNCATE TABLE should delete the partition directories.
 2. HCatalog should handle existing partition directories correctly.

I'm happy to provide any more information you may want, and help with an 
eventual patch.

Copied from 
http://mail-archives.apache.org/mod_mbox/hive-user/201708.mbox/%3CCAFPScVjA8PYDXOajgJR8LgvypFd00eJDZuSrFVFCci1odt42_g%40mail.gmail.com%3E

  was:
It looks like truncating a table in hive does not remove partition directories 
in the filesystem. This causes conflicts when HCatalog tries to write to the 
same partition:

{{-
Commit failed for output: outputName:scope-705 of vertex/vertexGroup:scope-708 
isVertexGroupOutput

[jira] [Assigned] (HIVE-17318) Make Hikari configurable using hive properties in hive-site.xml

2017-08-23 Thread Barna Zsombor Klara (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barna Zsombor Klara reassigned HIVE-17318:
--

Assignee: Barna Zsombor Klara

> Make Hikari configurable using hive properties in hive-site.xml
> ---
>
> Key: HIVE-17318
> URL: https://issues.apache.org/jira/browse/HIVE-17318
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Barna Zsombor Klara
>Assignee: Barna Zsombor Klara
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17318) Make Hikari CP configurable using hive properties in hive-site.xml

2017-08-23 Thread Barna Zsombor Klara (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barna Zsombor Klara updated HIVE-17318:
---
Summary: Make Hikari CP configurable using hive properties in hive-site.xml 
 (was: Make Hikari configurable using hive properties in hive-site.xml)

> Make Hikari CP configurable using hive properties in hive-site.xml
> --
>
> Key: HIVE-17318
> URL: https://issues.apache.org/jira/browse/HIVE-17318
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Barna Zsombor Klara
>Assignee: Barna Zsombor Klara
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17318) Make Hikari CP configurable using hive properties in hive-site.xml

2017-08-23 Thread Barna Zsombor Klara (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barna Zsombor Klara updated HIVE-17318:
---
Attachment: HIVE-17318.01.patch

Uploading first patch for HikariCP. Should be similar to the patch for BoneCP.

> Make Hikari CP configurable using hive properties in hive-site.xml
> --
>
> Key: HIVE-17318
> URL: https://issues.apache.org/jira/browse/HIVE-17318
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Barna Zsombor Klara
>Assignee: Barna Zsombor Klara
> Attachments: HIVE-17318.01.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17318) Make Hikari CP configurable using hive properties in hive-site.xml

2017-08-23 Thread Barna Zsombor Klara (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barna Zsombor Klara updated HIVE-17318:
---
Status: Patch Available  (was: Open)

> Make Hikari CP configurable using hive properties in hive-site.xml
> --
>
> Key: HIVE-17318
> URL: https://issues.apache.org/jira/browse/HIVE-17318
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Barna Zsombor Klara
>Assignee: Barna Zsombor Klara
> Attachments: HIVE-17318.01.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17265) Cache merged column stats from retrieved partitions

2017-08-23 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-17265:
---
   Resolution: Fixed
Fix Version/s: 3.0.0
   Status: Resolved  (was: Patch Available)

Pushed to master, thanks for reviewing [~ashutoshc]!

> Cache merged column stats from retrieved partitions
> ---
>
> Key: HIVE-17265
> URL: https://issues.apache.org/jira/browse/HIVE-17265
> Project: Hive
>  Issue Type: Bug
>  Components: Statistics
>Affects Versions: 3.0.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Fix For: 3.0.0
>
> Attachments: HIVE-17265.02.patch, HIVE-17265.03.patch, 
> HIVE-17265.04.patch, HIVE-17265.05.patch, HIVE-17265.patch
>
>
> Currently when we retrieve stats from the metastore for a column in a 
> partitioned table, we will execute the logic to merge the column stats coming 
> from each partition multiple times.
> Even though we avoid multiple calls to the metastore if the cache for the 
> stats is enabled, merging the stats for a given column can take a large 
> amount of time if there is a large number of partitions.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16886) HMS log notifications may have duplicated event IDs if multiple HMS are running concurrently

2017-08-23 Thread anishek (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16138741#comment-16138741
 ] 

anishek commented on HIVE-16886:


Auto-increments can have holes if:

* a transaction was aborted, or
* the sequence generation happens in code, by explicitly calling next_val on 
the sequence behind the auto-increment and then doing the insert from the 
application, in which case the problem mentioned by [~spena] with GC in the 
app can cause holes or unordered inserts.

Holes can happen (but are not guaranteed) because of the above cases; however, 
we should not care about case 1 (aborted transactions) as long as we get 
increasing, ordered, unique numbers.

As for the second case: if we use auto_increment at the DB level and do not use 
DataNucleus datastore-identity with auto-increment, we should be able to get 
them in order.

The test provided by [~spena] works on MySQL with both positive and negative 
mapping:

negative mapping -- the existing mapping with MySQL.
positive mapping -- auto-increment as part of the create table for the 
notification log, on NL_ID, on a fresh DB.

For the patch, rather than using the existing columns, I think I will create 
another column that is auto-incremented at the DB level, as sketched below. I 
will also try the fix on a PostgreSQL DB to see whether the behavior differs. 
Even if the race happens at the DB level, with two transactions committed from 
the application (HMS) at the same time, the DB will order them depending on 
which acquires the auto_increment lock first. And since replication is not 
realtime, this lag of a few nano/microseconds should not be a problem.

Retrying the whole metastore operation with optimistic locking in application 
code is just asking for a lot of retries on the HMS side, with the possibility 
of redoing complete metastore operations if something fails when one commit is 
larger than the others. Additionally, this would require us to do perfect 
distributed transactions on RDBMS + HDFS.
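
A minimal JDBC sketch of the DB-side auto-increment idea, assuming a 
hypothetical table and a MySQL driver on the classpath (names are illustrative, 
not the actual metastore schema):

{code:java}
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.Statement;

public class AutoIncrementSketch {
  public static void main(String[] args) throws Exception {
    try (Connection conn =
             DriverManager.getConnection("jdbc:mysql://localhost/hypothetical_db")) {
      // The ID is assigned inside the DB at insert time, not
      // read-incremented-written by the application, so two HMS instances can
      // never allocate the same value.
      try (Statement s = conn.createStatement()) {
        s.execute("CREATE TABLE IF NOT EXISTS NOTIFICATION_LOG ("
            + " NL_ID BIGINT NOT NULL AUTO_INCREMENT PRIMARY KEY,"
            + " EVENT_TYPE VARCHAR(32), MESSAGE VARCHAR(1024))");
      }
      try (PreparedStatement ps = conn.prepareStatement(
          "INSERT INTO NOTIFICATION_LOG (EVENT_TYPE, MESSAGE) VALUES (?, ?)",
          Statement.RETURN_GENERATED_KEYS)) {
        ps.setString(1, "CREATE_DATABASE");
        ps.setString(2, "CREATE DATABASE db1");
        ps.executeUpdate();
        try (ResultSet rs = ps.getGeneratedKeys()) {
          rs.next();
          // Ordered by whichever transaction acquired the auto_increment
          // lock first.
          System.out.println("DB-assigned event id: " + rs.getLong(1));
        }
      }
    }
  }
}
{code}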

> HMS log notifications may have duplicated event IDs if multiple HMS are 
> running concurrently
> 
>
> Key: HIVE-16886
> URL: https://issues.apache.org/jira/browse/HIVE-16886
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, Metastore
>Reporter: Sergio Peña
>Assignee: anishek
> Attachments: datastore-identity-holes.diff, HIVE-16886.1.patch
>
>
> When running multiple Hive Metastore servers and DB notifications are 
> enabled, I could see that notifications can be persisted with a duplicated 
> event ID. 
> This does not happen when running multiple threads in a single HMS node due 
> to the locking acquired on the DbNotificationsLog class, but multiple HMS 
> could cause conflicts.
> The issue is in the ObjectStore#addNotificationEvent() method. The event ID 
> fetched from the datastore is used for the new notification, incremented in 
> the server itself, then persisted or updated back to the datastore. If 2 
> servers read the same ID, then these 2 servers write a new notification with 
> the same ID.
> The event ID is not unique nor a primary key.
> Here's a test case using the TestObjectStore class that confirms this issue:
> {noformat}
> @Test
>   public void testConcurrentAddNotifications() throws ExecutionException, 
> InterruptedException {
> final int NUM_THREADS = 2;
> CountDownLatch countIn = new CountDownLatch(NUM_THREADS);
> CountDownLatch countOut = new CountDownLatch(1);
> HiveConf conf = new HiveConf();
> conf.setVar(HiveConf.ConfVars.METASTORE_EXPRESSION_PROXY_CLASS, 
> MockPartitionExpressionProxy.class.getName());
> ExecutorService executorService = 
> Executors.newFixedThreadPool(NUM_THREADS);
> FutureTask tasks[] = new FutureTask[NUM_THREADS];
> for (int i = 0; i < NUM_THREADS; ++i) {
>   final int n = i;
>   tasks[i] = new FutureTask(new Callable() {
> @Override
> public Void call() throws Exception {
>   ObjectStore store = new ObjectStore();
>   store.setConf(conf);
>   NotificationEvent dbEvent =
>   new NotificationEvent(0, 0, 
> EventMessage.EventType.CREATE_DATABASE.toString(), "CREATE DATABASE DB" + n);
>   System.out.println("ADDING NOTIFICATION");
>   countIn.countDown();
>   countOut.await();
>   store.addNotificationEvent(dbEvent);
>   System.out.println("FINISH NOTIFICATION");
>   return null;
> }
>   });
>   executorService.execute(tasks[i]);
> }
> countIn.await();
> countOut.countDown();
> for (int i = 0; i < NUM_THREADS; ++i) {
>   tasks[i].get();
> }
> NotificationEventResponse eventResponse = 
> objectStore.getNextNotification(new NotificationEventRequest());
> Assert.assertEquals(2, eventResponse.getEventsSize());
> Assert.assertEquals(1, eventResponse.getEvents().get(0).getEventId());
> // This fails because the next notification has an event ID = 1
> Assert.assertEquals(2, eventResponse.getEvents().get(1).getEventId());
>   }
> {noformat}
> The last assertion fails expecting an event ID 1 instead of 2. 

[jira] [Commented] (HIVE-17327) LLAP IO: restrict native file ID usage to default FS to avoid hypothetical collisions when HDFS federation is used

2017-08-23 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16138843#comment-16138843
 ] 

Sergey Shelukhin commented on HIVE-17327:
-

HDFS-7878 is intended for the API but I gave up on that long ago.
DFSClient seems to be used in many places like this. As for the dependency I 
can update it.

> LLAP IO: restrict native file ID usage to default FS to avoid hypothetical 
> collisions when HDFS federation is used
> --
>
> Key: HIVE-17327
> URL: https://issues.apache.org/jira/browse/HIVE-17327
> Project: Hive
>  Issue Type: Bug
>Reporter: Gopal V
>Assignee: Sergey Shelukhin
> Fix For: 3.0.0, 2.4.0
>
> Attachments: HIVE-17327.01.patch, HIVE-17327.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16886) HMS log notifications may have duplicated event IDs if multiple HMS are running concurrently

2017-08-23 Thread Alexander Kolbasov (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16138864#comment-16138864
 ] 

Alexander Kolbasov commented on HIVE-16886:
---

There are two types of holes that can be observed:

1. Temporary hole - the transaction allocated the ID and is in progress but not 
committed yet. The hole will be filled at some point in the future when the 
transaction commits.
2. Persistent hole - the transaction was aborted.

Both cases present problems for consumers. Suppose consumers process IDs in 
order and they read 1, 3. What should they do? If they continue going forward, 
ID 2 will not be handled at all. They don't know whether 2 will appear later or 
will never appear. Usually it will appear (in most cases transactions go 
through), but clients don't know when. Should they process 3 or not? If they 
do, and 2 appears later and they detect it, they will apply operations out of 
order.

That said, it is better than having duplicate IDs. But this still creates hard 
problems for consumers who rely on an accurate stream of IDs.
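
Purely to illustrate the dilemma, a hedged sketch of one consumer-side policy 
(not metastore code; the handler is hypothetical):

{code:java}
import java.util.concurrent.TimeUnit;

/** Illustrative only: how a consumer of an ID-ordered stream might treat a hole. */
public class GapTolerantConsumer {
  private long nextExpected = 1;

  void onEvent(long seenId) throws InterruptedException {
    if (seenId > nextExpected) {
      // Hole detected (e.g. saw 3 while expecting 2). Wait a grace period in
      // case it is a temporary hole from a still-uncommitted transaction...
      TimeUnit.MILLISECONDS.sleep(100);
      // ...then move on, treating it as a persistent hole from an aborted
      // transaction. If the missing ID commits after this point, it will be
      // applied out of order -- exactly the problem described above.
    }
    process(seenId);
    nextExpected = seenId + 1;
  }

  private void process(long id) { /* hypothetical handler */ }
}
{code}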

> HMS log notifications may have duplicated event IDs if multiple HMS are 
> running concurrently
> 
>
> Key: HIVE-16886
> URL: https://issues.apache.org/jira/browse/HIVE-16886
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, Metastore
>Reporter: Sergio Peña
>Assignee: anishek
> Attachments: datastore-identity-holes.diff, HIVE-16886.1.patch
>
>
> When running multiple Hive Metastore servers and DB notifications are 
> enabled, I could see that notifications can be persisted with a duplicated 
> event ID. 
> This does not happen when running multiple threads in a single HMS node due 
> to the locking acquired on the DbNotificationsLog class, but multiple HMS 
> could cause conflicts.
> The issue is in the ObjectStore#addNotificationEvent() method. The event ID 
> fetched from the datastore is used for the new notification, incremented in 
> the server itself, then persisted or updated back to the datastore. If 2 
> servers read the same ID, then these 2 servers write a new notification with 
> the same ID.
> The event ID is not unique nor a primary key.
> Here's a test case using the TestObjectStore class that confirms this issue:
> {noformat}
> @Test
>   public void testConcurrentAddNotifications() throws ExecutionException, 
> InterruptedException {
> final int NUM_THREADS = 2;
> CountDownLatch countIn = new CountDownLatch(NUM_THREADS);
> CountDownLatch countOut = new CountDownLatch(1);
> HiveConf conf = new HiveConf();
> conf.setVar(HiveConf.ConfVars.METASTORE_EXPRESSION_PROXY_CLASS, 
> MockPartitionExpressionProxy.class.getName());
> ExecutorService executorService = 
> Executors.newFixedThreadPool(NUM_THREADS);
> FutureTask tasks[] = new FutureTask[NUM_THREADS];
> for (int i = 0; i < NUM_THREADS; ++i) {
>   final int n = i;
>   tasks[i] = new FutureTask(new Callable() {
> @Override
> public Void call() throws Exception {
>   ObjectStore store = new ObjectStore();
>   store.setConf(conf);
>   NotificationEvent dbEvent =
>   new NotificationEvent(0, 0, 
> EventMessage.EventType.CREATE_DATABASE.toString(), "CREATE DATABASE DB" + n);
>   System.out.println("ADDING NOTIFICATION");
>   countIn.countDown();
>   countOut.await();
>   store.addNotificationEvent(dbEvent);
>   System.out.println("FINISH NOTIFICATION");
>   return null;
> }
>   });
>   executorService.execute(tasks[i]);
> }
> countIn.await();
> countOut.countDown();
> for (int i = 0; i < NUM_THREADS; ++i) {
>   tasks[i].get();
> }
> NotificationEventResponse eventResponse = 
> objectStore.getNextNotification(new NotificationEventRequest());
> Assert.assertEquals(2, eventResponse.getEventsSize());
> Assert.assertEquals(1, eventResponse.getEvents().get(0).getEventId());
> // This fails because the next notification has an event ID = 1
> Assert.assertEquals(2, eventResponse.getEvents().get(1).getEventId());
>   }
> {noformat}
> The last assertion fails expecting an event ID 1 instead of 2. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-14162) Allow disabling of long running job on Hive On Spark On YARN

2017-08-23 Thread Aihua Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16138890#comment-16138890
 ] 

Aihua Xu commented on HIVE-14162:
-

Such a requirement can be achieved with the following configuration properties:

hive.server2.session.check.interval
hive.server2.idle.session.check.operation
hive.server2.idle.session.timeout

With those properties set, a background thread will check whether a session has 
been idle long enough and has no active/pending operations. If so, the 
HiveSession is closed, and the associated SparkSession is closed along with it 
to release the resources. 
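
For example, a sketch of setting them programmatically, assuming hive-common on 
the classpath (the same keys can go into hive-site.xml; the values here are 
illustrative):

{code:java}
import org.apache.hadoop.hive.conf.HiveConf;

public class IdleSessionConfigSketch {
  public static void main(String[] args) {
    HiveConf conf = new HiveConf();
    // How often the background thread checks for idle sessions.
    conf.set("hive.server2.session.check.interval", "5m");
    // Also treat sessions with open-but-idle operations as idle.
    conf.set("hive.server2.idle.session.check.operation", "true");
    // Close sessions idle longer than this; the SparkSession is released too.
    conf.set("hive.server2.idle.session.timeout", "30m");
    System.out.println(conf.get("hive.server2.idle.session.timeout"));
  }
}
{code}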

> Allow disabling of long running job on Hive On Spark On YARN
> 
>
> Key: HIVE-14162
> URL: https://issues.apache.org/jira/browse/HIVE-14162
> Project: Hive
>  Issue Type: New Feature
>  Components: Spark
>Reporter: Thomas Scott
>Assignee: Aihua Xu
>Priority: Minor
> Attachments: HIVE-14162.1.patch
>
>
> Hive On Spark launches a long running process on the first query to handle 
> all queries for that user session. In some use cases this is not desired, for 
> instance when using Hue with large intervals between query executions.
> Could we have a property that would cause long running spark jobs to be 
> terminated after each query execution and started again for the next one?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17360) Tez session reopen appears to use a wrong conf object

2017-08-23 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-17360:

Attachment: HIVE-17360.03.patch

Fixing another test. Added a warning to a condition that looks test-specific.

> Tez session reopen appears to use a wrong conf object
> -
>
> Key: HIVE-17360
> URL: https://issues.apache.org/jira/browse/HIVE-17360
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-17360.01.patch, HIVE-17360.02.patch, 
> HIVE-17360.03.patch, HIVE-17360.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16886) HMS log notifications may have duplicated event IDs if multiple HMS are running concurrently

2017-08-23 Thread anishek (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16138921#comment-16138921
 ] 

anishek commented on HIVE-16886:


1. This will not happen, as the ID will *not* be allocated before commit; it is 
generated at DB commit time, *not* at some future time after the commit.
2. This can happen.

Since 1 does not happen, consumers should keep processing elements in order (1, 
3, etc.) and not wait for 2 to appear later. 


> HMS log notifications may have duplicated event IDs if multiple HMS are 
> running concurrently
> 
>
> Key: HIVE-16886
> URL: https://issues.apache.org/jira/browse/HIVE-16886
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, Metastore
>Reporter: Sergio Peña
>Assignee: anishek
> Attachments: datastore-identity-holes.diff, HIVE-16886.1.patch
>
>
> When running multiple Hive Metastore servers and DB notifications are 
> enabled, I could see that notifications can be persisted with a duplicated 
> event ID. 
> This does not happen when running multiple threads in a single HMS node due 
> to the locking acquired on the DbNotificationsLog class, but multiple HMS 
> could cause conflicts.
> The issue is in the ObjectStore#addNotificationEvent() method. The event ID 
> fetched from the datastore is used for the new notification, incremented in 
> the server itself, then persisted or updated back to the datastore. If 2 
> servers read the same ID, then these 2 servers write a new notification with 
> the same ID.
> The event ID is not unique nor a primary key.
> Here's a test case using the TestObjectStore class that confirms this issue:
> {noformat}
> @Test
>   public void testConcurrentAddNotifications() throws ExecutionException, 
> InterruptedException {
> final int NUM_THREADS = 2;
> CountDownLatch countIn = new CountDownLatch(NUM_THREADS);
> CountDownLatch countOut = new CountDownLatch(1);
> HiveConf conf = new HiveConf();
> conf.setVar(HiveConf.ConfVars.METASTORE_EXPRESSION_PROXY_CLASS, 
> MockPartitionExpressionProxy.class.getName());
> ExecutorService executorService = 
> Executors.newFixedThreadPool(NUM_THREADS);
> FutureTask tasks[] = new FutureTask[NUM_THREADS];
> for (int i = 0; i < NUM_THREADS; ++i) {
>   final int n = i;
>   tasks[i] = new FutureTask(new Callable() {
> @Override
> public Void call() throws Exception {
>   ObjectStore store = new ObjectStore();
>   store.setConf(conf);
>   NotificationEvent dbEvent =
>   new NotificationEvent(0, 0, 
> EventMessage.EventType.CREATE_DATABASE.toString(), "CREATE DATABASE DB" + n);
>   System.out.println("ADDING NOTIFICATION");
>   countIn.countDown();
>   countOut.await();
>   store.addNotificationEvent(dbEvent);
>   System.out.println("FINISH NOTIFICATION");
>   return null;
> }
>   });
>   executorService.execute(tasks[i]);
> }
> countIn.await();
> countOut.countDown();
> for (int i = 0; i < NUM_THREADS; ++i) {
>   tasks[i].get();
> }
> NotificationEventResponse eventResponse = 
> objectStore.getNextNotification(new NotificationEventRequest());
> Assert.assertEquals(2, eventResponse.getEventsSize());
> Assert.assertEquals(1, eventResponse.getEvents().get(0).getEventId());
> // This fails because the next notification has an event ID = 1
> Assert.assertEquals(2, eventResponse.getEvents().get(1).getEventId());
>   }
> {noformat}
> The last assertion fails expecting an event ID 1 instead of 2. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17344) LocalCache element memory usage is not calculated properly.

2017-08-23 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16138924#comment-16138924
 ] 

Sergey Shelukhin commented on HIVE-17344:
-

Yeah, sorry: it's allocated and reset to 0, or read the same way, so remaining 
== capacity. Then it's duplicated (not copied) on read, so the original buffer 
never changes.
Can you add a check with an error that this is the case? We don't want to cache 
buffers with spurious extra space. Other than that, +1
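
A hedged sketch of bb.capacity() plus that check, standing in for the real 
cache entry class (the bb field is the cached footer buffer):

{code:java}
import java.nio.ByteBuffer;

class FooterCacheEntrySketch {
  private final ByteBuffer bb;

  FooterCacheEntrySketch(ByteBuffer bb) {
    this.bb = bb;
  }

  public int getMemoryUsage() {
    // Entries are created so that remaining() == capacity(); fail loudly if a
    // buffer with spurious extra space ever gets cached.
    if (bb.remaining() != bb.capacity()) {
      throw new AssertionError("cached buffer has extra space: remaining="
          + bb.remaining() + ", capacity=" + bb.capacity());
    }
    return bb.capacity() + 100; // 100 is for 2 longs, BB and java overheads (semi-arbitrary).
  }
}
{code}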

> LocalCache element memory usage is not calculated properly.
> ---
>
> Key: HIVE-17344
> URL: https://issues.apache.org/jira/browse/HIVE-17344
> Project: Hive
>  Issue Type: Bug
>Reporter: Janos Gub
>Assignee: Janos Gub
> Attachments: HIVE-17344.patch
>
>
> Orc footer cache has a calculation of memory usage:
> {code:java}
> public int getMemoryUsage() {
>   return bb.remaining() + 100; // 100 is for 2 longs, BB and java overheads 
> (semi-arbitrary).
> }
> {code}
> ByteBuffer.remaining returns the remaining space in the bytebuffer, thus 
> allowing this cache to have MAXWEIGHT/100 elements of arbitrary size. I think 
> the correct solution would be bb.capacity.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16886) HMS log notifications may have duplicated event IDs if multiple HMS are running concurrently

2017-08-23 Thread Alexander Kolbasov (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16138927#comment-16138927
 ] 

Alexander Kolbasov commented on HIVE-16886:
---

[~anishek] We actually observed this behavior. Seems that DBs allocate this ID 
not at commit time.

> HMS log notifications may have duplicated event IDs if multiple HMS are 
> running concurrently
> 
>
> Key: HIVE-16886
> URL: https://issues.apache.org/jira/browse/HIVE-16886
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, Metastore
>Reporter: Sergio Peña
>Assignee: anishek
> Attachments: datastore-identity-holes.diff, HIVE-16886.1.patch
>
>
> When running multiple Hive Metastore servers and DB notifications are 
> enabled, I could see that notifications can be persisted with a duplicated 
> event ID. 
> This does not happen when running multiple threads in a single HMS node due 
> to the locking acquired on the DbNotificationsLog class, but multiple HMS 
> could cause conflicts.
> The issue is in the ObjectStore#addNotificationEvent() method. The event ID 
> fetched from the datastore is used for the new notification, incremented in 
> the server itself, then persisted or updated back to the datastore. If 2 
> servers read the same ID, then these 2 servers write a new notification with 
> the same ID.
> The event ID is not unique nor a primary key.
> Here's a test case using the TestObjectStore class that confirms this issue:
> {noformat}
> @Test
>   public void testConcurrentAddNotifications() throws ExecutionException, 
> InterruptedException {
> final int NUM_THREADS = 2;
> CountDownLatch countIn = new CountDownLatch(NUM_THREADS);
> CountDownLatch countOut = new CountDownLatch(1);
> HiveConf conf = new HiveConf();
> conf.setVar(HiveConf.ConfVars.METASTORE_EXPRESSION_PROXY_CLASS, 
> MockPartitionExpressionProxy.class.getName());
> ExecutorService executorService = 
> Executors.newFixedThreadPool(NUM_THREADS);
> FutureTask tasks[] = new FutureTask[NUM_THREADS];
> for (int i = 0; i < NUM_THREADS; ++i) {
>   final int n = i;
>   tasks[i] = new FutureTask(new Callable() {
> @Override
> public Void call() throws Exception {
>   ObjectStore store = new ObjectStore();
>   store.setConf(conf);
>   NotificationEvent dbEvent =
>   new NotificationEvent(0, 0, 
> EventMessage.EventType.CREATE_DATABASE.toString(), "CREATE DATABASE DB" + n);
>   System.out.println("ADDING NOTIFICATION");
>   countIn.countDown();
>   countOut.await();
>   store.addNotificationEvent(dbEvent);
>   System.out.println("FINISH NOTIFICATION");
>   return null;
> }
>   });
>   executorService.execute(tasks[i]);
> }
> countIn.await();
> countOut.countDown();
> for (int i = 0; i < NUM_THREADS; ++i) {
>   tasks[i].get();
> }
> NotificationEventResponse eventResponse = 
> objectStore.getNextNotification(new NotificationEventRequest());
> Assert.assertEquals(2, eventResponse.getEventsSize());
> Assert.assertEquals(1, eventResponse.getEvents().get(0).getEventId());
> // This fails because the next notification has an event ID = 1
> Assert.assertEquals(2, eventResponse.getEvents().get(1).getEventId());
>   }
> {noformat}
> The last assertion fails expecting an event ID 1 instead of 2. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17327) LLAP IO: restrict native file ID usage to default FS to avoid hypothetical collisions when HDFS federation is used

2017-08-23 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16138929#comment-16138929
 ] 

Sergey Shelukhin commented on HIVE-17327:
-

Actually I'll probably revert changes to shims.

> LLAP IO: restrict native file ID usage to default FS to avoid hypothetical 
> collisions when HDFS federation is used
> --
>
> Key: HIVE-17327
> URL: https://issues.apache.org/jira/browse/HIVE-17327
> Project: Hive
>  Issue Type: Bug
>Reporter: Gopal V
>Assignee: Sergey Shelukhin
> Fix For: 3.0.0, 2.4.0
>
> Attachments: HIVE-17327.01.patch, HIVE-17327.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17327) LLAP IO: restrict native file ID usage to default FS to avoid hypothetical collisions when HDFS federation is used

2017-08-23 Thread Peter Vary (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16138932#comment-16138932
 ] 

Peter Vary commented on HIVE-17327:
---

[~sershe]: I have read through the first 20 or so comments on HDFS-7878, and I 
absolutely agree with your frustration! :( Sad to see things like this.
Thanks for the fast response!

Peter

> LLAP IO: restrict native file ID usage to default FS to avoid hypothetical 
> collisions when HDFS federation is used
> --
>
> Key: HIVE-17327
> URL: https://issues.apache.org/jira/browse/HIVE-17327
> Project: Hive
>  Issue Type: Bug
>Reporter: Gopal V
>Assignee: Sergey Shelukhin
> Fix For: 3.0.0, 2.4.0
>
> Attachments: HIVE-17327.01.patch, HIVE-17327.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16886) HMS log notifications may have duplicated event IDs if multiple HMS are running concurrently

2017-08-23 Thread Alexander Kolbasov (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16138939#comment-16138939
 ] 

Alexander Kolbasov commented on HIVE-16886:
---

Here is some discussion of the issue on the Oracle side - I don't know how 
accurate this is. 
https://dba.stackexchange.com/questions/3945/can-i-implement-a-gapless-identity-column-in-oracle

> HMS log notifications may have duplicated event IDs if multiple HMS are 
> running concurrently
> 
>
> Key: HIVE-16886
> URL: https://issues.apache.org/jira/browse/HIVE-16886
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, Metastore
>Reporter: Sergio Peña
>Assignee: anishek
> Attachments: datastore-identity-holes.diff, HIVE-16886.1.patch
>
>
> When running multiple Hive Metastore servers and DB notifications are 
> enabled, I could see that notifications can be persisted with a duplicated 
> event ID. 
> This does not happen when running multiple threads in a single HMS node due 
> to the locking acquired on the DbNotificationsLog class, but multiple HMS 
> could cause conflicts.
> The issue is in the ObjectStore#addNotificationEvent() method. The event ID 
> fetched from the datastore is used for the new notification, incremented in 
> the server itself, then persisted or updated back to the datastore. If 2 
> servers read the same ID, then these 2 servers write a new notification with 
> the same ID.
> The event ID is not unique nor a primary key.
> Here's a test case using the TestObjectStore class that confirms this issue:
> {noformat}
> @Test
>   public void testConcurrentAddNotifications() throws ExecutionException, 
> InterruptedException {
> final int NUM_THREADS = 2;
> CountDownLatch countIn = new CountDownLatch(NUM_THREADS);
> CountDownLatch countOut = new CountDownLatch(1);
> HiveConf conf = new HiveConf();
> conf.setVar(HiveConf.ConfVars.METASTORE_EXPRESSION_PROXY_CLASS, 
> MockPartitionExpressionProxy.class.getName());
> ExecutorService executorService = 
> Executors.newFixedThreadPool(NUM_THREADS);
> FutureTask tasks[] = new FutureTask[NUM_THREADS];
> for (int i = 0; i < NUM_THREADS; ++i) {
>   final int n = i;
>   tasks[i] = new FutureTask(new Callable() {
> @Override
> public Void call() throws Exception {
>   ObjectStore store = new ObjectStore();
>   store.setConf(conf);
>   NotificationEvent dbEvent =
>   new NotificationEvent(0, 0, 
> EventMessage.EventType.CREATE_DATABASE.toString(), "CREATE DATABASE DB" + n);
>   System.out.println("ADDING NOTIFICATION");
>   countIn.countDown();
>   countOut.await();
>   store.addNotificationEvent(dbEvent);
>   System.out.println("FINISH NOTIFICATION");
>   return null;
> }
>   });
>   executorService.execute(tasks[i]);
> }
> countIn.await();
> countOut.countDown();
> for (int i = 0; i < NUM_THREADS; ++i) {
>   tasks[i].get();
> }
> NotificationEventResponse eventResponse = 
> objectStore.getNextNotification(new NotificationEventRequest());
> Assert.assertEquals(2, eventResponse.getEventsSize());
> Assert.assertEquals(1, eventResponse.getEvents().get(0).getEventId());
> // This fails because the next notification has an event ID = 1
> Assert.assertEquals(2, eventResponse.getEvents().get(1).getEventId());
>   }
> {noformat}
> The last assertion fails expecting an event ID 1 instead of 2. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Comment Edited] (HIVE-16886) HMS log notifications may have duplicated event IDs if multiple HMS are running concurrently

2017-08-23 Thread Alexander Kolbasov (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16138939#comment-16138939
 ] 

Alexander Kolbasov edited comment on HIVE-16886 at 8/23/17 7:37 PM:


Here is some discussion of the issue on the Oracle side - I don't know how 
accurate this is. 
https://dba.stackexchange.com/questions/3945/can-i-implement-a-gapless-identity-column-in-oracle

The statement there is that not only is it impossible to have a gapless 
sequence, it is also impossible to guarantee ordering, which means that earlier 
ID values may appear after later ones.


was (Author: akolb):
Here is some discussion of the issue on the Oracle side - I don't know how 
accurate this is. 
https://dba.stackexchange.com/questions/3945/can-i-implement-a-gapless-identity-column-in-oracle

> HMS log notifications may have duplicated event IDs if multiple HMS are 
> running concurrently
> 
>
> Key: HIVE-16886
> URL: https://issues.apache.org/jira/browse/HIVE-16886
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, Metastore
>Reporter: Sergio Peña
>Assignee: anishek
> Attachments: datastore-identity-holes.diff, HIVE-16886.1.patch
>
>
> When running multiple Hive Metastore servers and DB notifications are 
> enabled, I could see that notifications can be persisted with a duplicated 
> event ID. 
> This does not happen when running multiple threads in a single HMS node due 
> to the locking acquired on the DbNotificationsLog class, but multiple HMS 
> could cause conflicts.
> The issue is in the ObjectStore#addNotificationEvent() method. The event ID 
> fetched from the datastore is used for the new notification, incremented in 
> the server itself, then persisted or updated back to the datastore. If 2 
> servers read the same ID, then these 2 servers write a new notification with 
> the same ID.
> The event ID is not unique nor a primary key.
> Here's a test case using the TestObjectStore class that confirms this issue:
> {noformat}
> @Test
>   public void testConcurrentAddNotifications() throws ExecutionException, 
> InterruptedException {
> final int NUM_THREADS = 2;
> CountDownLatch countIn = new CountDownLatch(NUM_THREADS);
> CountDownLatch countOut = new CountDownLatch(1);
> HiveConf conf = new HiveConf();
> conf.setVar(HiveConf.ConfVars.METASTORE_EXPRESSION_PROXY_CLASS, 
> MockPartitionExpressionProxy.class.getName());
> ExecutorService executorService = 
> Executors.newFixedThreadPool(NUM_THREADS);
> FutureTask tasks[] = new FutureTask[NUM_THREADS];
> for (int i = 0; i < NUM_THREADS; ++i) {
>   final int n = i;
>   tasks[i] = new FutureTask(new Callable() {
> @Override
> public Void call() throws Exception {
>   ObjectStore store = new ObjectStore();
>   store.setConf(conf);
>   NotificationEvent dbEvent =
>   new NotificationEvent(0, 0, 
> EventMessage.EventType.CREATE_DATABASE.toString(), "CREATE DATABASE DB" + n);
>   System.out.println("ADDING NOTIFICATION");
>   countIn.countDown();
>   countOut.await();
>   store.addNotificationEvent(dbEvent);
>   System.out.println("FINISH NOTIFICATION");
>   return null;
> }
>   });
>   executorService.execute(tasks[i]);
> }
> countIn.await();
> countOut.countDown();
> for (int i = 0; i < NUM_THREADS; ++i) {
>   tasks[i].get();
> }
> NotificationEventResponse eventResponse = 
> objectStore.getNextNotification(new NotificationEventRequest());
> Assert.assertEquals(2, eventResponse.getEventsSize());
> Assert.assertEquals(1, eventResponse.getEvents().get(0).getEventId());
> // This fails because the next notification has an event ID = 1
> Assert.assertEquals(2, eventResponse.getEvents().get(1).getEventId());
>   }
> {noformat}
> The last assertion fails expecting an event ID 1 instead of 2. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17372) update druid dependency to druid 0.10.1

2017-08-23 Thread slim bouguerra (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

slim bouguerra updated HIVE-17372:
--
Status: Patch Available  (was: Open)

> update druid dependency to druid 0.10.1
> ---
>
> Key: HIVE-17372
> URL: https://issues.apache.org/jira/browse/HIVE-17372
> Project: Hive
>  Issue Type: Bug
>  Components: Druid integration
>Reporter: slim bouguerra
>Assignee: slim bouguerra
> Attachments: HIVE-17372.patch
>
>
> Update to the most recent Druid version, to be released on August 23.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17372) update druid dependency to druid 0.10.1

2017-08-23 Thread slim bouguerra (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

slim bouguerra updated HIVE-17372:
--
Attachment: HIVE-17372.patch

> update druid dependency to druid 0.10.1
> ---
>
> Key: HIVE-17372
> URL: https://issues.apache.org/jira/browse/HIVE-17372
> Project: Hive
>  Issue Type: Bug
>  Components: Druid integration
>Reporter: slim bouguerra
>Assignee: slim bouguerra
> Attachments: HIVE-17372.patch
>
>
> Update to the most recent Druid version, to be released on August 23.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17372) update druid dependency to druid 0.10.1

2017-08-23 Thread slim bouguerra (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16138952#comment-16138952
 ] 

slim bouguerra commented on HIVE-17372:
---

[~ashutoshc], can you please take a look at this?

> update druid dependency to druid 0.10.1
> ---
>
> Key: HIVE-17372
> URL: https://issues.apache.org/jira/browse/HIVE-17372
> Project: Hive
>  Issue Type: Bug
>  Components: Druid integration
>Reporter: slim bouguerra
>Assignee: slim bouguerra
> Attachments: HIVE-17372.patch
>
>
> Update to the most recent Druid version, to be released on August 23.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17327) LLAP IO: restrict native file ID usage to default FS to avoid hypothetical collisions when HDFS federation is used

2017-08-23 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16138983#comment-16138983
 ] 

Sergey Shelukhin commented on HIVE-17327:
-

Committed the shim changes (reverting the API change)

> LLAP IO: restrict native file ID usage to default FS to avoid hypothetical 
> collisions when HDFS federation is used
> --
>
> Key: HIVE-17327
> URL: https://issues.apache.org/jira/browse/HIVE-17327
> Project: Hive
>  Issue Type: Bug
>Reporter: Gopal V
>Assignee: Sergey Shelukhin
> Fix For: 3.0.0, 2.4.0
>
> Attachments: HIVE-17327.01.patch, HIVE-17327.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17327) LLAP IO: restrict native file ID usage to default FS to avoid hypothetical collisions when HDFS federation is used

2017-08-23 Thread Peter Vary (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16139001#comment-16139001
 ] 

Peter Vary commented on HIVE-17327:
---

That was fast! Thanks [~sershe] for taking care of this!

Peter

> LLAP IO: restrict native file ID usage to default FS to avoid hypothetical 
> collisions when HDFS federation is used
> --
>
> Key: HIVE-17327
> URL: https://issues.apache.org/jira/browse/HIVE-17327
> Project: Hive
>  Issue Type: Bug
>Reporter: Gopal V
>Assignee: Sergey Shelukhin
> Fix For: 3.0.0, 2.4.0
>
> Attachments: HIVE-17327.01.patch, HIVE-17327.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (HIVE-17375) stddev_samp,var_samp standard compliance

2017-08-23 Thread Zoltan Haindrich (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich reassigned HIVE-17375:
---


> stddev_samp,var_samp standard compliance
> 
>
> Key: HIVE-17375
> URL: https://issues.apache.org/jira/browse/HIVE-17375
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Minor
>
> these two UDAFs return 0 in the case of only one element - however the 
> standard requires NULL to be returned



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17375) stddev_samp,var_samp standard compliance

2017-08-23 Thread Zoltan Haindrich (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich updated HIVE-17375:

Attachment: HIVE-17375.1.patch

#1) remove the extra condition to return 0 in these cases; add cases to the q 
files
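
A minimal sketch of the shape of the change, assuming the evaluator keeps a 
count and a running sum of squared deviations (field names are illustrative, 
not the actual GenericUDAFVariance code):

{code:java}
class VarianceSampleSketch {
  long count;      // number of rows aggregated
  double variance; // running sum of squared deviations from the mean

  Double terminate() {
    // Standard-compliant: with fewer than two rows the sample variance is
    // undefined, so return NULL instead of the old behavior of returning 0.
    if (count <= 1) {
      return null;
    }
    return variance / (count - 1);
  }
}
{code}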

> stddev_samp,var_samp standard compliance
> 
>
> Key: HIVE-17375
> URL: https://issues.apache.org/jira/browse/HIVE-17375
> Project: Hive
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Minor
> Attachments: HIVE-17375.1.patch
>
>
> these two UDAFs return 0 in the case of only one element - however the 
> standard requires NULL to be returned



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17375) stddev_samp,var_samp standard compliance

2017-08-23 Thread Zoltan Haindrich (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich updated HIVE-17375:

Status: Patch Available  (was: Open)

> stddev_samp,var_samp standard compliance
> 
>
> Key: HIVE-17375
> URL: https://issues.apache.org/jira/browse/HIVE-17375
> Project: Hive
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Minor
> Attachments: HIVE-17375.1.patch
>
>
> these two UDAFs return 0 in the case of only one element - however the 
> standard requires NULL to be returned



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17370) Some tests are failing with java.io.FileNotFoundException: File file:/tmp/hadoop/mapred/...

2017-08-23 Thread Peter Vary (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16139057#comment-16139057
 ] 

Peter Vary commented on HIVE-17370:
---

Another example:
https://builds.apache.org/job/PreCommit-HIVE-Build/6500/testReport/org.apache.hadoop.hive.cli/TestBeeLineDriver/testCliDriver_create_merge_compressed_/
{code}
PREHOOK: query: select count(1) from tgt_rc_merge_test
PREHOOK: type: QUERY
PREHOOK: Input: test_db_create_merge_compressed@tgt_rc_merge_test
PREHOOK: Output: 
file:/home/hiveptest/35.192.53.198-hiveptest-0/apache-github-source-source/itests/qtest/target/tmp/local_base/scratch/b28ee209-0c8d-43f1-954b-c18175e0b87b/hive_2017-08-23_03-41-54_963_3864419413885540377-21/-mr-10001
Query ID = hiveptest_20170823034154_62ee5e77-f752-4aa9-b09e-dd2edd778f89
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
ExitCodeException exitCode=1: chmod: cannot access 
‘/tmp/hadoop/mapred/staging/user1047316188/.staging/job_local1047316188_0068/job.jar’:
 No such file or directory

at org.apache.hadoop.util.Shell.runCommand(Shell.java:972)
at org.apache.hadoop.util.Shell.run(Shell.java:869)
at 
org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1170)
at org.apache.hadoop.util.Shell.execCommand(Shell.java:1264)
at org.apache.hadoop.util.Shell.execCommand(Shell.java:1246)
at 
org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:771)
at 
org.apache.hadoop.fs.ChecksumFileSystem$1.apply(ChecksumFileSystem.java:506)
at 
org.apache.hadoop.fs.ChecksumFileSystem$FsOperation.run(ChecksumFileSystem.java:487)
at 
org.apache.hadoop.fs.ChecksumFileSystem.setPermission(ChecksumFileSystem.java:503)
at 
org.apache.hadoop.mapreduce.JobResourceUploader.copyJar(JobResourceUploader.java:212)
at 
org.apache.hadoop.mapreduce.JobResourceUploader.uploadFiles(JobResourceUploader.java:166)
at 
org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:97)
at 
org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:192)
at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1341)
at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1338)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1807)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1338)
at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:575)
at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:570)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1807)
at 
org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:570)
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:561)
at 
org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:415)
at 
org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:151)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:199)
at 
org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:97)
at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2172)
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1831)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1548)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1303)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1298)
at 
org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:252)
at 
org.apache.hive.service.cli.operation.SQLOperation.access$700(SQLOperation.java:91)
at 
org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:344)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1807)
at 
org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:357)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWork

[jira] [Updated] (HIVE-14162) Allow disabling of long running job on Hive On Spark On YARN

2017-08-23 Thread Aihua Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated HIVE-14162:

Resolution: Workaround
Status: Resolved  (was: Patch Available)

[~belugabehr] FYI. I'm resolving this now.

> Allow disabling of long running job on Hive On Spark On YARN
> 
>
> Key: HIVE-14162
> URL: https://issues.apache.org/jira/browse/HIVE-14162
> Project: Hive
>  Issue Type: New Feature
>  Components: Spark
>Reporter: Thomas Scott
>Assignee: Aihua Xu
>Priority: Minor
> Attachments: HIVE-14162.1.patch
>
>
> Hive On Spark launches a long running process on the first query to handle 
> all queries for that user session. In some use cases this is not desired, for 
> instance when using Hue with large intervals between query executions.
> Could we have a property that would cause long running spark jobs to be 
> terminated after each query execution and started again for the next one?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17373) Upgrade some dependency versions

2017-08-23 Thread Aihua Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated HIVE-17373:

Attachment: HIVE-17373.1.patch

> Upgrade some dependency versions
> 
>
> Key: HIVE-17373
> URL: https://issues.apache.org/jira/browse/HIVE-17373
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Affects Versions: 3.0.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Attachments: HIVE-17373.1.patch
>
>
> Upgrade some libraries including log4j to 2.8.2, accumulo to 1.8.1 and 
> commons-httpclient to 3.1. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17373) Upgrade some dependency versions

2017-08-23 Thread Aihua Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated HIVE-17373:

Status: Patch Available  (was: Open)

> Upgrade some dependency versions
> 
>
> Key: HIVE-17373
> URL: https://issues.apache.org/jira/browse/HIVE-17373
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Affects Versions: 3.0.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Attachments: HIVE-17373.1.patch
>
>
> Upgrade some libraries including log4j to 2.8.2, accumulo to 1.8.1 and 
> commons-httpclient to 3.1. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17373) Upgrade some dependency versions

2017-08-23 Thread Aihua Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated HIVE-17373:

Status: Open  (was: Patch Available)

> Upgrade some dependency versions
> 
>
> Key: HIVE-17373
> URL: https://issues.apache.org/jira/browse/HIVE-17373
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Affects Versions: 3.0.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Attachments: HIVE-17373.1.patch
>
>
> Upgrade some libraries including log4j to 2.8.2, accumulo to 1.8.1 and 
> commons-httpclient to 3.1. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17373) Upgrade some dependency versions

2017-08-23 Thread Aihua Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated HIVE-17373:

Attachment: (was: HIVE-17373.1.patch)

> Upgrade some dependency versions
> 
>
> Key: HIVE-17373
> URL: https://issues.apache.org/jira/browse/HIVE-17373
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Affects Versions: 3.0.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Attachments: HIVE-17373.1.patch
>
>
> Upgrade some libraries including log4j to 2.8.2, accumulo to 1.8.1 and 
> commons-httpclient to 3.1. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16614) Support "set local time zone" statement

2017-08-23 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-16614:
---
Attachment: HIVE-16614.04.patch

> Support "set local time zone" statement
> ---
>
> Key: HIVE-16614
> URL: https://issues.apache.org/jira/browse/HIVE-16614
> Project: Hive
>  Issue Type: Improvement
>Reporter: Carter Shanklin
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-16614.01.patch, HIVE-16614.02.patch, 
> HIVE-16614.03.patch, HIVE-16614.04.patch, HIVE-16614.patch
>
>
> HIVE-14412 introduces a timezone-aware timestamp.
> SQL has a concept of default time zone displacements, which are transparently 
> applied when converting between timezone-unaware types and timezone-aware 
> types and, in Hive's case, are also used to shift a timezone aware type to a 
> different time zone, depending on configuration.
> SQL also provides that the default time zone displacement be settable at a 
> session level, so that clients can access a database simultaneously from 
> different time zones and see time values in their own time zone.
> Currently the time zone displacement is fixed and is set based on the system 
> time zone where the Hive client runs (HiveServer2 or Hive CLI). It will be 
> more convenient for users if they have the ability to set their time zone of 
> choice.
> SQL defines "set time zone" with 2 ways of specifying the time zone, first 
> using an interval and second using the special keyword LOCAL.
> Examples:
>   • set time zone '-8:00';
>   • set time zone LOCAL;
> LOCAL means to set the current default time zone displacement to the 
> session's original default time zone displacement.
> Reference: SQL:2011 section 19.4



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (HIVE-17376) Upgrade snappy version to 1.1.4

2017-08-23 Thread Aihua Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu reassigned HIVE-17376:
---


> Upgrade snappy version to 1.1.4
> ---
>
> Key: HIVE-17376
> URL: https://issues.apache.org/jira/browse/HIVE-17376
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Affects Versions: 3.0.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
>
> Upgrade the snappy java version to 1.1.4. The older version has some issues, 
> such as a memory leak (https://github.com/xerial/snappy-java/issues/91).



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17376) Upgrade snappy version to 1.1.4

2017-08-23 Thread Aihua Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated HIVE-17376:

Attachment: HIVE-17376.1.patch

> Upgrade snappy version to 1.1.4
> ---
>
> Key: HIVE-17376
> URL: https://issues.apache.org/jira/browse/HIVE-17376
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Affects Versions: 3.0.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Attachments: HIVE-17376.1.patch
>
>
> Upgrade the snappy java version to 1.1.4. The older version has some issues, 
> such as a memory leak (https://github.com/xerial/snappy-java/issues/91).



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17376) Upgrade snappy version to 1.1.4

2017-08-23 Thread Aihua Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated HIVE-17376:

Status: Patch Available  (was: Open)

> Upgrade snappy version to 1.1.4
> ---
>
> Key: HIVE-17376
> URL: https://issues.apache.org/jira/browse/HIVE-17376
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Affects Versions: 3.0.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Attachments: HIVE-17376.1.patch
>
>
> Upgrade the snappy java version to 1.1.4. The older version has some issues, 
> such as a memory leak (https://github.com/xerial/snappy-java/issues/91).



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17100) Improve HS2 operation logs for REPL commands.

2017-08-23 Thread Sankar Hariappan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-17100:

Status: Open  (was: Patch Available)

> Improve HS2 operation logs for REPL commands.
> -
>
> Key: HIVE-17100
> URL: https://issues.apache.org/jira/browse/HIVE-17100
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2, repl
>Affects Versions: 2.1.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>  Labels: DR, replication
> Fix For: 3.0.0
>
> Attachments: HIVE-17100.01.patch, HIVE-17100.02.patch, 
> HIVE-17100.03.patch, HIVE-17100.04.patch, HIVE-17100.05.patch
>
>
> It is necessary to log the progress of the replication tasks in a structured 
> manner, as follows.
> *+Bootstrap Dump:+*
> * At the start of bootstrap dump, will add one log with below details.
> {color:#59afe1}* Database Name
> * Dump Type (BOOTSTRAP)
> * (Estimated) Total number of tables/views to dump
> * (Estimated) Total number of functions to dump.
> * Dump Start Time{color}
> * After each table dump, will add a log as follows
> {color:#59afe1}* Table/View Name
> * Type (TABLE/VIEW/MATERIALIZED_VIEW)
> * Table dump end time
> * Table dump progress. Format is Table sequence no/(Estimated) Total number 
> of tables and views.{color}
> * After each function dump, will add a log as follows
> {color:#59afe1}* Function Name
> * Function dump end time
> * Function dump progress. Format is Function sequence no/(Estimated) Total 
> number of functions.{color}
> * After completion of all dumps, will add a log as follows to consolidate the 
> dump.
> {color:#59afe1}* Database Name.
> * Dump Type (BOOTSTRAP).
> * Dump End Time.
> * (Actual) Total number of tables/views dumped.
> * (Actual) Total number of functions dumped.
> * Dump Directory.
> * Last Repl ID of the dump.{color}
> *Note:* The actual and estimated number of tables/functions may not match if 
> any table/function is dropped while the dump is in progress.
> *+Bootstrap Load:+*
> * At the start of bootstrap load, will add one log with below details.
> {color:#59afe1}* Database Name
> * Dump directory
> * Load Type (BOOTSTRAP)
> * Total number of tables/views to load
> * Total number of functions to load.
> * Load Start Time{color}
> * After each table load, will add a log as follows
> {color:#59afe1}* Table/View Name
> * Type (TABLE/VIEW/MATERIALIZED_VIEW)
> * Table load completion time
> * Table load progress. Format is Table sequence no/Total number of tables and 
> views.{color}
> * After each function load, will add a log as follows
> {color:#59afe1}* Function Name
> * Function load completion time
> * Function load progress. Format is Function sequence no/Total number of 
> functions.{color}
> * After completion of all dumps, will add a log as follows to consolidate the 
> load.
> {color:#59afe1}* Database Name.
> * Load Type (BOOTSTRAP).
> * Load End Time.
> * Total number of tables/views loaded.
> * Total number of functions loaded.
> * Last Repl ID of the loaded database.{color}
> *+Incremental Dump:+*
> * At the start of database dump, will add one log with below details.
> {color:#59afe1}* Database Name
> * Dump Type (INCREMENTAL)
> * (Estimated) Total number of events to dump.
> * Dump Start Time{color}
> * After each event dump, will add a log as follows
> {color:#59afe1}* Event ID
> * Event Type (CREATE_TABLE, DROP_TABLE, ALTER_TABLE, INSERT etc)
> * Event dump end time
> * Event dump progress. Format is Event sequence no/ (Estimated) Total number 
> of events.{color}
> * After completion of all event dumps, will add a log as follows.
> {color:#59afe1}* Database Name.
> * Dump Type (INCREMENTAL).
> * Dump End Time.
> * (Actual) Total number of events dumped.
> * Dump Directory.
> * Last Repl ID of the dump.{color}
> *Note:* The estimated number of events can be terribly inaccurate compared with 
> the actual number, as we don’t have the number of events upfront until we read 
> from the metastore NotificationEvents table.
> *+Incremental Load:+*
> * At the start of incremental load, will add one log with below details.
> {color:#59afe1}* Target Database Name 
> * Dump directory
> * Load Type (INCREMENTAL)
> * Total number of events to load
> * Load Start Time{color}
> * After each event load, will add a log as follows
> {color:#59afe1}* Event ID
> * Event Type (CREATE_TABLE, DROP_TABLE, ALTER_TABLE, INSERT etc)
> * Event load end time
> * Event load progress. Format is Event sequence no/ Total number of 
> events.{color}
> * After completion of all event loads, will add a log as follows to 
> consolidate the load.
> {color:#59afe1}* Target Database Name.
> * Load Type (INCREMENTAL).
> * Load End Time.
> * Total number of events loaded.
> * Last Repl ID of the loaded database.{color}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Commented] (HIVE-17361) Support LOAD DATA for transactional tables

2017-08-23 Thread Wei Zheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16139203#comment-16139203
 ] 

Wei Zheng commented on HIVE-17361:
--

{code}
LOAD DATA [LOCAL] INPATH 'filepath' [OVERWRITE] INTO TABLE tablename [PARTITION 
(partcol1=val1, partcol2=val2 ...)]
{code}
Unlike for non-ACID tables, if the table is bucketed and there is more than one 
bucket file, then LOAD DATA on an ACID table will require 'filepath' to refer to 
a directory, not a file. Otherwise, one may end up having a bucket file in one 
load_delta directory and another bucket file in a different load_delta 
directory.

The reason behind this is:
a) For a non-ACID table, say tbl1, one can continue loading files into the same 
table via consecutive LOAD commands; that will just result in more and more 
files under the tbl1/ directory.
b) However, for an ACID table, since a new load_delta directory will be 
created every time LOAD DATA is run, consecutive LOAD commands will create 
separate subdirectories for every single file, which may not be desirable, e.g. 
if one wants to load a file for one bucket, and then a file for another bucket, 
those two files will reside in two different load_delta directories.
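As a concrete sketch of that second scenario (the table name and staging paths 
here are made up for illustration):
{code}
-- Hypothetical bucketed ACID table; 'filepath' points at a directory, so all
-- bucket files land together in a single load_delta directory:
LOAD DATA INPATH '/staging/t1_files' INTO TABLE acid_bucketed_tbl;

-- The pattern the restriction avoids: loading bucket files one at a time
-- would put each file into its own, separate load_delta directory:
LOAD DATA INPATH '/staging/t1_files/000000_0' INTO TABLE acid_bucketed_tbl;
LOAD DATA INPATH '/staging/t1_files/000001_0' INTO TABLE acid_bucketed_tbl;
{code}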

> Support LOAD DATA for transactional tables
> --
>
> Key: HIVE-17361
> URL: https://issues.apache.org/jira/browse/HIVE-17361
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Reporter: Wei Zheng
>Assignee: Wei Zheng
> Attachments: HIVE-17361.1.patch
>
>
> LOAD DATA was not supported since ACID was introduced. Need to fill this gap 
> between ACID table and regular hive table.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17100) Improve HS2 operation logs for REPL commands.

2017-08-23 Thread Sankar Hariappan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-17100:

Attachment: HIVE-17100.06.patch

Added 06.patch with:
- Fixes for Anishek's comments.
- An additional bug fix for the missing endLog report during Incremental Load.

> Improve HS2 operation logs for REPL commands.
> -
>
> Key: HIVE-17100
> URL: https://issues.apache.org/jira/browse/HIVE-17100
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2, repl
>Affects Versions: 2.1.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>  Labels: DR, replication
> Fix For: 3.0.0
>
> Attachments: HIVE-17100.01.patch, HIVE-17100.02.patch, 
> HIVE-17100.03.patch, HIVE-17100.04.patch, HIVE-17100.05.patch, 
> HIVE-17100.06.patch
>
>
> It is necessary to log the progress of the replication tasks in a structured 
> manner as follows.
> *+Bootstrap Dump:+*
> * At the start of bootstrap dump, will add one log with below details.
> {color:#59afe1}* Database Name
> * Dump Type (BOOTSTRAP)
> * (Estimated) Total number of tables/views to dump
> * (Estimated) Total number of functions to dump.
> * Dump Start Time{color}
> * After each table dump, will add a log as follows
> {color:#59afe1}* Table/View Name
> * Type (TABLE/VIEW/MATERIALIZED_VIEW)
> * Table dump end time
> * Table dump progress. Format is Table sequence no/(Estimated) Total number 
> of tables and views.{color}
> * After each function dump, will add a log as follows
> {color:#59afe1}* Function Name
> * Function dump end time
> * Function dump progress. Format is Function sequence no/(Estimated) Total 
> number of functions.{color}
> * After completion of all dumps, will add a log as follows to consolidate the 
> dump.
> {color:#59afe1}* Database Name.
> * Dump Type (BOOTSTRAP).
> * Dump End Time.
> * (Actual) Total number of tables/views dumped.
> * (Actual) Total number of functions dumped.
> * Dump Directory.
> * Last Repl ID of the dump.{color}
> *Note:* The actual and estimated number of tables/functions may not match if 
> any table/function is dropped while the dump is in progress.
> *+Bootstrap Load:+*
> * At the start of bootstrap load, will add one log with below details.
> {color:#59afe1}* Database Name
> * Dump directory
> * Load Type (BOOTSTRAP)
> * Total number of tables/views to load
> * Total number of functions to load.
> * Load Start Time{color}
> * After each table load, will add a log as follows
> {color:#59afe1}* Table/View Name
> * Type (TABLE/VIEW/MATERIALIZED_VIEW)
> * Table load completion time
> * Table load progress. Format is Table sequence no/Total number of tables and 
> views.{color}
> * After each function load, will add a log as follows
> {color:#59afe1}* Function Name
> * Function load completion time
> * Function load progress. Format is Function sequence no/Total number of 
> functions.{color}
> * After completion of all loads, will add a log as follows to consolidate the 
> load.
> {color:#59afe1}* Database Name.
> * Load Type (BOOTSTRAP).
> * Load End Time.
> * Total number of tables/views loaded.
> * Total number of functions loaded.
> * Last Repl ID of the loaded database.{color}
> *+Incremental Dump:+*
> * At the start of database dump, will add one log with below details.
> {color:#59afe1}* Database Name
> * Dump Type (INCREMENTAL)
> * (Estimated) Total number of events to dump.
> * Dump Start Time{color}
> * After each event dump, will add a log as follows
> {color:#59afe1}* Event ID
> * Event Type (CREATE_TABLE, DROP_TABLE, ALTER_TABLE, INSERT etc)
> * Event dump end time
> * Event dump progress. Format is Event sequence no/ (Estimated) Total number 
> of events.{color}
> * After completion of all event dumps, will add a log as follows.
> {color:#59afe1}* Database Name.
> * Dump Type (INCREMENTAL).
> * Dump End Time.
> * (Actual) Total number of events dumped.
> * Dump Directory.
> * Last Repl ID of the dump.{color}
> *Note:* The estimated number of events can be terribly inaccurate compared with 
> the actual number, as we don’t have the number of events upfront until we read 
> from the metastore NotificationEvents table.
> *+Incremental Load:+*
> * At the start of incremental load, will add one log with below details.
> {color:#59afe1}* Target Database Name 
> * Dump directory
> * Load Type (INCREMENTAL)
> * Total number of events to load
> * Load Start Time{color}
> * After each event load, will add a log as follows
> {color:#59afe1}* Event ID
> * Event Type (CREATE_TABLE, DROP_TABLE, ALTER_TABLE, INSERT etc)
> * Event load end time
> * Event load progress. Format is Event sequence no/ Total number of 
> events.{color}
> * After completion of all event loads, will add a log as follows to 
> consolidate the load.
> {color:#59afe1}* Target Database Name.
> * Load Type (INCREMENTAL).
> * Load End Time.
> * Total number of events loaded.
> * Last Repl ID of the loaded database.{color}

[jira] [Assigned] (HIVE-17377) SharedWorkOptimizer might not iterate through TS operators deterministically

2017-08-23 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez reassigned HIVE-17377:
--


> SharedWorkOptimizer might not iterate through TS operators deterministically
> 
>
> Key: HIVE-17377
> URL: https://issues.apache.org/jira/browse/HIVE-17377
> Project: Hive
>  Issue Type: Bug
>  Components: Physical Optimizer
>Affects Versions: 3.0.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>
> Given the same query, multiple executions might yield different 
> reutilization results, since the iteration order over TS operators is not 
> guaranteed.
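(For illustration only, not Hive code: a minimal Java sketch of why iterating a 
hash-based collection of such objects is not deterministic across runs, while an 
insertion-ordered one is.)
{code}
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Set;

public class IterationOrderSketch {
  public static void main(String[] args) {
    // Stand-ins for TableScan operators; they use identity hash codes,
    // which can differ from one JVM run to the next.
    Set<Object> hashOrdered = new HashSet<>();
    Set<Object> insertionOrdered = new LinkedHashSet<>();
    for (int i = 0; i < 5; i++) {
      Object op = new Object();
      hashOrdered.add(op);
      insertionOrdered.add(op);
    }
    System.out.println(hashOrdered);      // order may change across runs
    System.out.println(insertionOrdered); // always insertion order
  }
}
{code}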



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Work started] (HIVE-17377) SharedWorkOptimizer might not iterate through TS operators deterministically

2017-08-23 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-17377 started by Jesus Camacho Rodriguez.
--
> SharedWorkOptimizer might not iterate through TS operators deterministically
> 
>
> Key: HIVE-17377
> URL: https://issues.apache.org/jira/browse/HIVE-17377
> Project: Hive
>  Issue Type: Bug
>  Components: Physical Optimizer
>Affects Versions: 3.0.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-17377.patch
>
>
> Given the same query, multiple executions might yield different 
> reutilization results, since the iteration order over TS operators is not 
> guaranteed.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17377) SharedWorkOptimizer might not iterate through TS operators deterministically

2017-08-23 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-17377:
---
Attachment: HIVE-17377.patch

> SharedWorkOptimizer might not iterate through TS operators deterministically
> 
>
> Key: HIVE-17377
> URL: https://issues.apache.org/jira/browse/HIVE-17377
> Project: Hive
>  Issue Type: Bug
>  Components: Physical Optimizer
>Affects Versions: 3.0.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-17377.patch
>
>
> Given the same query, multiple executions might yield different 
> reutilization results, since the iteration order over TS operators is not 
> guaranteed.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17377) SharedWorkOptimizer might not iterate through TS operators deterministically

2017-08-23 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-17377:
---
Status: Patch Available  (was: In Progress)

> SharedWorkOptimizer might not iterate through TS operators deterministically
> 
>
> Key: HIVE-17377
> URL: https://issues.apache.org/jira/browse/HIVE-17377
> Project: Hive
>  Issue Type: Bug
>  Components: Physical Optimizer
>Affects Versions: 3.0.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-17377.patch
>
>
> Given the same query, multiple executions might yield different 
> reutilization results, since the iteration order over TS operators is not 
> guaranteed.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17100) Improve HS2 operation logs for REPL commands.

2017-08-23 Thread Sankar Hariappan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-17100:

Status: Patch Available  (was: Open)

> Improve HS2 operation logs for REPL commands.
> -
>
> Key: HIVE-17100
> URL: https://issues.apache.org/jira/browse/HIVE-17100
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2, repl
>Affects Versions: 2.1.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>  Labels: DR, replication
> Fix For: 3.0.0
>
> Attachments: HIVE-17100.01.patch, HIVE-17100.02.patch, 
> HIVE-17100.03.patch, HIVE-17100.04.patch, HIVE-17100.05.patch, 
> HIVE-17100.06.patch
>
>
> It is necessary to log the progress of the replication tasks in a structured 
> manner as follows.
> *+Bootstrap Dump:+*
> * At the start of bootstrap dump, will add one log with below details.
> {color:#59afe1}* Database Name
> * Dump Type (BOOTSTRAP)
> * (Estimated) Total number of tables/views to dump
> * (Estimated) Total number of functions to dump.
> * Dump Start Time{color}
> * After each table dump, will add a log as follows
> {color:#59afe1}* Table/View Name
> * Type (TABLE/VIEW/MATERIALIZED_VIEW)
> * Table dump end time
> * Table dump progress. Format is Table sequence no/(Estimated) Total number 
> of tables and views.{color}
> * After each function dump, will add a log as follows
> {color:#59afe1}* Function Name
> * Function dump end time
> * Function dump progress. Format is Function sequence no/(Estimated) Total 
> number of functions.{color}
> * After completion of all dumps, will add a log as follows to consolidate the 
> dump.
> {color:#59afe1}* Database Name.
> * Dump Type (BOOTSTRAP).
> * Dump End Time.
> * (Actual) Total number of tables/views dumped.
> * (Actual) Total number of functions dumped.
> * Dump Directory.
> * Last Repl ID of the dump.{color}
> *Note:* The actual and estimated number of tables/functions may not match if 
> any table/function is dropped while the dump is in progress.
> *+Bootstrap Load:+*
> * At the start of bootstrap load, will add one log with below details.
> {color:#59afe1}* Database Name
> * Dump directory
> * Load Type (BOOTSTRAP)
> * Total number of tables/views to load
> * Total number of functions to load.
> * Load Start Time{color}
> * After each table load, will add a log as follows
> {color:#59afe1}* Table/View Name
> * Type (TABLE/VIEW/MATERIALIZED_VIEW)
> * Table load completion time
> * Table load progress. Format is Table sequence no/Total number of tables and 
> views.{color}
> * After each function load, will add a log as follows
> {color:#59afe1}* Function Name
> * Function load completion time
> * Function load progress. Format is Function sequence no/Total number of 
> functions.{color}
> * After completion of all loads, will add a log as follows to consolidate the 
> load.
> {color:#59afe1}* Database Name.
> * Load Type (BOOTSTRAP).
> * Load End Time.
> * Total number of tables/views loaded.
> * Total number of functions loaded.
> * Last Repl ID of the loaded database.{color}
> *+Incremental Dump:+*
> * At the start of database dump, will add one log with below details.
> {color:#59afe1}* Database Name
> * Dump Type (INCREMENTAL)
> * (Estimated) Total number of events to dump.
> * Dump Start Time{color}
> * After each event dump, will add a log as follows
> {color:#59afe1}* Event ID
> * Event Type (CREATE_TABLE, DROP_TABLE, ALTER_TABLE, INSERT etc)
> * Event dump end time
> * Event dump progress. Format is Event sequence no/ (Estimated) Total number 
> of events.{color}
> * After completion of all event dumps, will add a log as follows.
> {color:#59afe1}* Database Name.
> * Dump Type (INCREMENTAL).
> * Dump End Time.
> * (Actual) Total number of events dumped.
> * Dump Directory.
> * Last Repl ID of the dump.{color}
> *Note:* The estimated number of events can be terribly inaccurate compared with 
> the actual number, as we don’t have the number of events upfront until we read 
> from the metastore NotificationEvents table.
> *+Incremental Load:+*
> * At the start of incremental load, will add one log with below details.
> {color:#59afe1}* Target Database Name 
> * Dump directory
> * Load Type (INCREMENTAL)
> * Total number of events to load
> * Load Start Time{color}
> * After each event load, will add a log as follows
> {color:#59afe1}* Event ID
> * Event Type (CREATE_TABLE, DROP_TABLE, ALTER_TABLE, INSERT etc)
> * Event load end time
> * Event load progress. Format is Event sequence no/ Total number of 
> events.{color}
> * After completion of all event loads, will add a log as follows to 
> consolidate the load.
> {color:#59afe1}* Target Database Name.
> * Load Type (INCREMENTAL).
> * Load End Time.
> * Total number of events loaded.
> * Last Repl ID of the loaded database.{color}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Updated] (HIVE-17307) Change the metastore to not use the metrics code in hive/common

2017-08-23 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated HIVE-17307:
--
Status: Open  (was: Patch Available)

Need a new version of the patch as some changes in ObjectStore and TxnHandler 
conflict.

> Change the metastore to not use the metrics code in hive/common
> ---
>
> Key: HIVE-17307
> URL: https://issues.apache.org/jira/browse/HIVE-17307
> Project: Hive
>  Issue Type: Sub-task
>  Components: Metastore
>Reporter: Alan Gates
>Assignee: Alan Gates
> Attachments: HIVE-17307.patch
>
>
> As we move code into the standalone metastore module, it cannot use the 
> metrics in hive-common.  We could copy the current Metrics interface or we 
> could change the metastore code to directly use codahale metrics.
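(For reference, a minimal sketch of the second option, using codahale metrics 
directly; the registry setup and metric names here are made up, not taken from 
the patch.)
{code}
import com.codahale.metrics.Counter;
import com.codahale.metrics.MetricRegistry;
import com.codahale.metrics.Timer;

public class MetastoreMetricsSketch {
  private static final MetricRegistry REGISTRY = new MetricRegistry();

  public static void main(String[] args) throws Exception {
    Counter openConnections = REGISTRY.counter("open_connections");
    Timer apiTimer = REGISTRY.timer(MetricRegistry.name("api", "get_table"));

    openConnections.inc();
    try (Timer.Context ctx = apiTimer.time()) {
      Thread.sleep(10); // the metastore call would run here
    }
    openConnections.dec();
  }
}
{code}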



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17307) Change the metastore to not use the metrics code in hive/common

2017-08-23 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated HIVE-17307:
--
Attachment: (was: HIVE-17307.2.patch)

> Change the metastore to not use the metrics code in hive/common
> ---
>
> Key: HIVE-17307
> URL: https://issues.apache.org/jira/browse/HIVE-17307
> Project: Hive
>  Issue Type: Sub-task
>  Components: Metastore
>Reporter: Alan Gates
>Assignee: Alan Gates
> Attachments: HIVE-17307.patch
>
>
> As we move code into the standalone metastore module, it cannot use the 
> metrics in hive-common.  We could copy the current Metrics interface or we 
> could change the metastore code to directly use codahale metrics.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17307) Change the metastore to not use the metrics code in hive/common

2017-08-23 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated HIVE-17307:
--
Attachment: HIVE-17307.2.patch

> Change the metastore to not use the metrics code in hive/common
> ---
>
> Key: HIVE-17307
> URL: https://issues.apache.org/jira/browse/HIVE-17307
> Project: Hive
>  Issue Type: Sub-task
>  Components: Metastore
>Reporter: Alan Gates
>Assignee: Alan Gates
> Attachments: HIVE-17307.patch
>
>
> As we move code into the standalone metastore module, it cannot use the 
> metrics in hive-common.  We could copy the current Metrics interface or we 
> could change the metastore code to directly use codahale metrics.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17307) Change the metastore to not use the metrics code in hive/common

2017-08-23 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated HIVE-17307:
--
Attachment: HIVE-17307.2.patch

> Change the metastore to not use the metrics code in hive/common
> ---
>
> Key: HIVE-17307
> URL: https://issues.apache.org/jira/browse/HIVE-17307
> Project: Hive
>  Issue Type: Sub-task
>  Components: Metastore
>Reporter: Alan Gates
>Assignee: Alan Gates
> Attachments: HIVE-17307.2.patch, HIVE-17307.patch
>
>
> As we move code into the standalone metastore module, it cannot use the 
> metrics in hive-common.  We could copy the current Metrics interface or we 
> could change the metastore code to directly use codahale metrics.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17307) Change the metastore to not use the metrics code in hive/common

2017-08-23 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated HIVE-17307:
--
Status: Patch Available  (was: Open)

> Change the metastore to not use the metrics code in hive/common
> ---
>
> Key: HIVE-17307
> URL: https://issues.apache.org/jira/browse/HIVE-17307
> Project: Hive
>  Issue Type: Sub-task
>  Components: Metastore
>Reporter: Alan Gates
>Assignee: Alan Gates
> Attachments: HIVE-17307.2.patch, HIVE-17307.patch
>
>
> As we move code into the standalone metastore module, it cannot use the 
> metrics in hive-common.  We could copy the current Metrics interface or we 
> could change the metastore code to directly use codahale metrics.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17379) Null Pointer Exception in WHERE clause when using aggregate function as a filter

2017-08-23 Thread Sharanya Santhanam (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sharanya Santhanam updated HIVE-17379:
--
Description: 
Sample Query : 

with tableAAlias as (
   select a, count(z) as acount
   from tableA
   group by a
)

select a.a, b.b 
from tableB as b JOIN 
tableAAlias a
on a.a=b.a
where a.acount > 10 

FAILED: NullPointerException null
java.lang.NullPointerException
at 
org.apache.hadoop.hive.ql.optimizer.ColumnPrunerProcFactory$ColumnPrunerFilterProc.process(ColumnPrunerProcFactory.java:103)
at 
org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
at 
org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:105)
at 
org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:89)
at 
org.apache.hadoop.hive.ql.optimizer.ColumnPruner$ColumnPrunerWalker.walk(ColumnPruner.java:176)
at 
org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:120)
at 
org.apache.hadoop.hive.ql.optimizer.ColumnPruner.transform(ColumnPruner.java:136)
at org.apache.hadoop.hive.ql.optimizer.Optimizer.optimize(Optimizer.java:246)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:11149)
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:246)
at 
org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:264)
at 
org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:80)
at 
org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:264)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:490)
at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1270)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1412)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1199)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1189)
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:265)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:210)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:444)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:376)
at org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:474)
at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:514)
at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:882)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:836)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:732)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.run(RunJar.java:223)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)


The above query succeeds if it is modified as:

select a.a, b.b, *a.acount*
from tableB as b JOIN 
tableAAlias a
on a.a=b.a
where a.acount > 10 

Please note: the original query worked on Hive 1.2 and breaks on Hive 2.1.1.

  was:

Sample Query : 

with tableAAlias as (
   select a, count(*) as acount
   from tableA
   group by a
)

select a.a, b.b 
from tableB as b JOIN 
tableAAlias a
on a.a=b.a
where a.acount > 10 

FAILED: NullPointerException null
java.lang.NullPointerException
at 
org.apache.hadoop.hive.ql.optimizer.ColumnPrunerProcFactory$ColumnPrunerFilterProc.process(ColumnPrunerProcFactory.java:103)
at 
org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
at 
org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:105)
at 
org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:89)
at 
org.apache.hadoop.hive.ql.optimizer.ColumnPruner$ColumnPrunerWalker.walk(ColumnPruner.java:176)
at 
org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:120)
at 
org.apache.hadoop.hive.ql.optimizer.ColumnPruner.transform(ColumnPruner.java:136)
at org.apache.hadoop.hive.ql.optimizer.Optimizer.optimize(Optimizer.java:246)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:11149)
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:246)
at 
org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:264)
at 
org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:80)
at 
org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:264)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:490)
at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1270)
at org.apache.hadoo

[jira] [Commented] (HIVE-16886) HMS log notifications may have duplicated event IDs if multiple HMS are running concurrently

2017-08-23 Thread anishek (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16139246#comment-16139246
 ] 

anishek commented on HIVE-16886:


The link seems to be discussing sequences, and as I mentioned earlier, the 
subsequent transactions will be in the commit phase in the DB when the IDs get 
generated, so unless something happens as a trigger on a new insert into the 
notification log, there is no problem with replication.

Doing retries with optimistic transactions in application code is not something 
we want to do, since it will affect performance for some users. Do you have any 
other ideas?

> HMS log notifications may have duplicated event IDs if multiple HMS are 
> running concurrently
> 
>
> Key: HIVE-16886
> URL: https://issues.apache.org/jira/browse/HIVE-16886
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, Metastore
>Reporter: Sergio Peña
>Assignee: anishek
> Attachments: datastore-identity-holes.diff, HIVE-16886.1.patch
>
>
> When running multiple Hive Metastore servers and DB notifications are 
> enabled, I could see that notifications can be persisted with a duplicated 
> event ID. 
> This does not happen when running multiple threads in a single HMS node due 
> to the locking acquired on the DbNotificationsLog class, but multiple HMS 
> could cause conflicts.
> The issue is in the ObjectStore#addNotificationEvent() method. The event ID 
> fetched from the datastore is used for the new notification, incremented in 
> the server itself, then persisted or updated back to the datastore. If 2 
> servers read the same ID, then these 2 servers write a new notification with 
> the same ID.
> The event ID is neither unique nor a primary key.
> Here's a test case using the TestObjectStore class that confirms this issue:
> {noformat}
> @Test
>   public void testConcurrentAddNotifications() throws ExecutionException, 
> InterruptedException {
> final int NUM_THREADS = 2;
> CountDownLatch countIn = new CountDownLatch(NUM_THREADS);
> CountDownLatch countOut = new CountDownLatch(1);
> HiveConf conf = new HiveConf();
> conf.setVar(HiveConf.ConfVars.METASTORE_EXPRESSION_PROXY_CLASS, 
> MockPartitionExpressionProxy.class.getName());
> ExecutorService executorService = 
> Executors.newFixedThreadPool(NUM_THREADS);
> FutureTask<Void> tasks[] = new FutureTask[NUM_THREADS];
> for (int i = 0; i < NUM_THREADS; ++i) {
>   final int n = i;
>   tasks[i] = new FutureTask<Void>(new Callable<Void>() {
> @Override
> public Void call() throws Exception {
>   ObjectStore store = new ObjectStore();
>   store.setConf(conf);
>   NotificationEvent dbEvent =
>   new NotificationEvent(0, 0, 
> EventMessage.EventType.CREATE_DATABASE.toString(), "CREATE DATABASE DB" + n);
>   System.out.println("ADDING NOTIFICATION");
>   countIn.countDown();
>   countOut.await();
>   store.addNotificationEvent(dbEvent);
>   System.out.println("FINISH NOTIFICATION");
>   return null;
> }
>   });
>   executorService.execute(tasks[i]);
> }
> countIn.await();
> countOut.countDown();
> for (int i = 0; i < NUM_THREADS; ++i) {
>   tasks[i].get();
> }
> NotificationEventResponse eventResponse = 
> objectStore.getNextNotification(new NotificationEventRequest());
> Assert.assertEquals(2, eventResponse.getEventsSize());
> Assert.assertEquals(1, eventResponse.getEvents().get(0).getEventId());
> // This fails because the next notification has an event ID = 1
> Assert.assertEquals(2, eventResponse.getEvents().get(1).getEventId());
>   }
> {noformat}
> The last assertion fails expecting an event ID 1 instead of 2. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17241) Change metastore classes to not use the shims

2017-08-23 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16139250#comment-16139250
 ] 

Alan Gates commented on HIVE-17241:
---

[~vihangk1], no, it wasn't intentional.  I suspect the best approach is to make 
copies of these two classes for the metastore and do the same package remapping 
we do for MemoryTokenStore.  If you agree I'll file a JIRA and add that.  
Thanks for catching this.

> Change metastore classes to not use the shims
> -
>
> Key: HIVE-17241
> URL: https://issues.apache.org/jira/browse/HIVE-17241
> Project: Hive
>  Issue Type: Sub-task
>  Components: Metastore
>Reporter: Alan Gates
>Assignee: Alan Gates
> Fix For: 3.0.0
>
> Attachments: HIVE-17241.2.patch, HIVE-17241.patch
>
>
> As part of moving the metastore into a standalone package, it will no longer 
> have access to the shims.  This means we need to either copy them or access 
> the underlying Hadoop operations directly.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17297) allow AM to use LLAP guaranteed tasks

2017-08-23 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-17297:

Attachment: (was: HIVE-17297.patch)

> allow AM to use LLAP guaranteed tasks
> -
>
> Key: HIVE-17297
> URL: https://issues.apache.org/jira/browse/HIVE-17297
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-17297.01.patch, HIVE-17297.only.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17297) allow AM to use LLAP guaranteed tasks

2017-08-23 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-17297:

Attachment: HIVE-17297.01.nogen.patch
HIVE-17297.01.patch

> allow AM to use LLAP guaranteed tasks
> -
>
> Key: HIVE-17297
> URL: https://issues.apache.org/jira/browse/HIVE-17297
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-17297.01.nogen.patch, HIVE-17297.01.patch, 
> HIVE-17297.01.patch, HIVE-17297.only.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17297) allow AM to use LLAP guaranteed tasks

2017-08-23 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-17297:

Attachment: (was: HIVE-17297.only.patch)

> allow AM to use LLAP guaranteed tasks
> -
>
> Key: HIVE-17297
> URL: https://issues.apache.org/jira/browse/HIVE-17297
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-17297.01.nogen.patch, HIVE-17297.01.patch, 
> HIVE-17297.01.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16949) Leak of threads from Get-Input-Paths thread pool when more than 1 used in query

2017-08-23 Thread Sahil Takiar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated HIVE-16949:

Status: Patch Available  (was: Open)

> Leak of threads from Get-Input-Paths thread pool when more than 1 used in 
> query
> ---
>
> Key: HIVE-16949
> URL: https://issues.apache.org/jira/browse/HIVE-16949
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: Birger Brunswiek
>Assignee: Sahil Takiar
> Attachments: HIVE-16949.1.patch
>
>
> The commit 
> [20210de|https://github.com/apache/hive/commit/20210dec94148c9b529132b1545df3dd7be083c3]
>  which was part of HIVE-15546 [introduced a thread 
> pool|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L3109]
>  which is not shut down upon completion of its threads. This leads to a leak 
> of threads for each query which uses more than 1 partition. They are not 
> removed automatically. When queries spanning multiple partitions are made the 
> number of threads increases and is never reduced. On my machine hiveserver2 
> starts to get slower and slower once 10k threads are reached.
> Thread pools only shut down automatically in special circumstances (see 
> [documentation section 
> _Finalization_|https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/ThreadPoolExecutor.html]).
>  This is not currently the case for the Get-Input-Paths thread pool. I would 
> add a _pool.shutdown()_ in a finally block just before returning the result 
> to make sure the threads are really shutdown.
> My current workaround is to set {{hive.exec.input.listing.max.threads = 1}}. 
> This prevents the thread pool from being spawned 
> [\[1\]|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L2118]
>  
> [\[2\]|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L3107].
> The same issue probably also applies to the [Get-Input-Summary thread 
> pool|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L2193].
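(A minimal sketch of the fix the description proposes, i.e. shutting the pool 
down in a finally block; the method shape and the listing work are placeholders, 
not Hive's actual code.)
{code}
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class GetInputPathsSketch {
  static List<String> listPaths(List<String> partitions, int numThreads) throws Exception {
    ExecutorService pool = Executors.newFixedThreadPool(numThreads);
    try {
      List<Future<String>> futures = new ArrayList<>();
      for (String p : partitions) {
        futures.add(pool.submit(() -> p)); // stand-in for the real path listing
      }
      List<String> results = new ArrayList<>();
      for (Future<String> f : futures) {
        results.add(f.get());
      }
      return results;
    } finally {
      pool.shutdown(); // without this, the worker threads leak on every query
    }
  }
}
{code}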



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16949) Leak of threads from Get-Input-Paths thread pool when more than 1 used in query

2017-08-23 Thread Sahil Takiar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated HIVE-16949:

Attachment: HIVE-16949.1.patch

> Leak of threads from Get-Input-Paths thread pool when more than 1 used in 
> query
> ---
>
> Key: HIVE-16949
> URL: https://issues.apache.org/jira/browse/HIVE-16949
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: Birger Brunswiek
>Assignee: Sahil Takiar
> Attachments: HIVE-16949.1.patch
>
>
> The commit 
> [20210de|https://github.com/apache/hive/commit/20210dec94148c9b529132b1545df3dd7be083c3]
>  which was part of HIVE-15546 [introduced a thread 
> pool|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L3109]
>  which is not shut down upon completion of its threads. This leads to a leak 
> of threads for each query which uses more than 1 partition. They are not 
> removed automatically. When queries spanning multiple partitions are made the 
> number of threads increases and is never reduced. On my machine hiveserver2 
> starts to get slower and slower once 10k threads are reached.
> Thread pools only shut down automatically in special circumstances (see 
> [documentation section 
> _Finalization_|https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/ThreadPoolExecutor.html]).
>  This is not currently the case for the Get-Input-Paths thread pool. I would 
> add a _pool.shutdown()_ in a finally block just before returning the result 
> to make sure the threads are really shutdown.
> My current workaround is to set {{hive.exec.input.listing.max.threads = 1}}. 
> This prevents the thread pool from being spawned 
> [\[1\]|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L2118]
>  
> [\[2\]|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L3107].
> The same issue probably also applies to the [Get-Input-Summary thread 
> pool|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L2193].



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (HIVE-17380) refactor LlapProtocolClientProxy to be usable with other protocols

2017-08-23 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin reassigned HIVE-17380:
---


> refactor LlapProtocolClientProxy to be usable with other protocols
> --
>
> Key: HIVE-17380
> URL: https://issues.apache.org/jira/browse/HIVE-17380
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17256) add a notion of a guaranteed task to LLAP

2017-08-23 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16139343#comment-16139343
 ] 

Sergey Shelukhin commented on HIVE-17256:
-

Yes, there needs to be a writeup for workload management (for which this is a 
building block). I think we'll do it after AM and HS2 patches are ready.

> add a notion of a guaranteed task to LLAP
> -
>
> Key: HIVE-17256
> URL: https://issues.apache.org/jira/browse/HIVE-17256
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Fix For: 3.0.0
>
> Attachments: HIVE-17256.01.patch, HIVE-17256.patch
>
>
> Tasks are basically on two levels, guaranteed and speculative, with 
> speculative being the default. As long as no one uses the new flag, the tasks 
> behave the same.
> All the tasks that do have the flag also behave the same with regard to each 
> other.
> The difference is that a guaranteed task is always higher priority than, and 
> preempts, a speculative task. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16886) HMS log notifications may have duplicated event IDs if multiple HMS are running concurrently

2017-08-23 Thread anishek (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16139346#comment-16139346
 ] 

anishek commented on HIVE-16886:


Thinking more on this: maybe we just do a 'select for update' for getting the 
next event ID from NOTIFICATION_SEQUENCE. This should address most of the 
problems we have, but it will make metastore operations slower.

Also, just wondering: even with a single standalone HMS we can have this issue, 
as the NOTIFICATION_TBL_LOCK is taken within a nested transaction, and multiple 
threads can still read the same value and update it separately.
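(A rough sketch of the 'select for update' idea against the backing RDBMS; the 
table and column names follow the metastore schema, but the exact statements are 
an assumption, not a patch.)
{code}
BEGIN;
-- The row lock serializes concurrent HMS instances on the sequence row:
SELECT NEXT_EVENT_ID FROM NOTIFICATION_SEQUENCE FOR UPDATE;
-- ... insert the NOTIFICATION_LOG row using the fetched ID ...
UPDATE NOTIFICATION_SEQUENCE SET NEXT_EVENT_ID = NEXT_EVENT_ID + 1;
COMMIT;
{code}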

> HMS log notifications may have duplicated event IDs if multiple HMS are 
> running concurrently
> 
>
> Key: HIVE-16886
> URL: https://issues.apache.org/jira/browse/HIVE-16886
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, Metastore
>Reporter: Sergio Peña
>Assignee: anishek
> Attachments: datastore-identity-holes.diff, HIVE-16886.1.patch
>
>
> When running multiple Hive Metastore servers and DB notifications are 
> enabled, I could see that notifications can be persisted with a duplicated 
> event ID. 
> This does not happen when running multiple threads in a single HMS node due 
> to the locking acquired on the DbNotificationsLog class, but multiple HMS 
> could cause conflicts.
> The issue is in the ObjectStore#addNotificationEvent() method. The event ID 
> fetched from the datastore is used for the new notification, incremented in 
> the server itself, then persisted or updated back to the datastore. If 2 
> servers read the same ID, then these 2 servers write a new notification with 
> the same ID.
> The event ID is neither unique nor a primary key.
> Here's a test case using the TestObjectStore class that confirms this issue:
> {noformat}
> @Test
>   public void testConcurrentAddNotifications() throws ExecutionException, 
> InterruptedException {
> final int NUM_THREADS = 2;
> CountDownLatch countIn = new CountDownLatch(NUM_THREADS);
> CountDownLatch countOut = new CountDownLatch(1);
> HiveConf conf = new HiveConf();
> conf.setVar(HiveConf.ConfVars.METASTORE_EXPRESSION_PROXY_CLASS, 
> MockPartitionExpressionProxy.class.getName());
> ExecutorService executorService = 
> Executors.newFixedThreadPool(NUM_THREADS);
> FutureTask<Void> tasks[] = new FutureTask[NUM_THREADS];
> for (int i = 0; i < NUM_THREADS; ++i) {
>   final int n = i;
>   tasks[i] = new FutureTask<Void>(new Callable<Void>() {
> @Override
> public Void call() throws Exception {
>   ObjectStore store = new ObjectStore();
>   store.setConf(conf);
>   NotificationEvent dbEvent =
>   new NotificationEvent(0, 0, 
> EventMessage.EventType.CREATE_DATABASE.toString(), "CREATE DATABASE DB" + n);
>   System.out.println("ADDING NOTIFICATION");
>   countIn.countDown();
>   countOut.await();
>   store.addNotificationEvent(dbEvent);
>   System.out.println("FINISH NOTIFICATION");
>   return null;
> }
>   });
>   executorService.execute(tasks[i]);
> }
> countIn.await();
> countOut.countDown();
> for (int i = 0; i < NUM_THREADS; ++i) {
>   tasks[i].get();
> }
> NotificationEventResponse eventResponse = 
> objectStore.getNextNotification(new NotificationEventRequest());
> Assert.assertEquals(2, eventResponse.getEventsSize());
> Assert.assertEquals(1, eventResponse.getEvents().get(0).getEventId());
> // This fails because the next notification has an event ID = 1
> Assert.assertEquals(2, eventResponse.getEvents().get(1).getEventId());
>   }
> {noformat}
> The last assertion fails expecting an event ID 1 instead of 2. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17318) Make Hikari CP configurable using hive properties in hive-site.xml

2017-08-23 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16139351#comment-16139351
 ] 

Hive QA commented on HIVE-17318:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12883361/HIVE-17318.01.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 8 failed/errored test(s), 11003 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[smb_mapjoin_12] 
(batchId=240)
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[smb_mapjoin_16] 
(batchId=240)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[view_alias] (batchId=79)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning]
 (batchId=169)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] 
(batchId=235)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=180)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=180)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=180)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6501/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6501/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6501/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 8 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12883361 - PreCommit-HIVE-Build

> Make Hikari CP configurable using hive properties in hive-site.xml
> --
>
> Key: HIVE-17318
> URL: https://issues.apache.org/jira/browse/HIVE-17318
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Barna Zsombor Klara
>Assignee: Barna Zsombor Klara
> Attachments: HIVE-17318.01.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17380) refactor LlapProtocolClientProxy to be usable with other protocols

2017-08-23 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-17380:

Attachment: HIVE-17380.patch

This basically moves a bunch of code into a generic async PB RPC proxy, in 
llap-common for now. Moving it to common would require moving LlapNodeId; that 
can be done later.
The only logic change is that the concurrent hash map, which never expires 
entries, is replaced by a Guava cache.
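(A minimal sketch of that swap, assuming idle entries should expire; the 
key/value types, names, and timeout are illustrative, not the patch's actual 
values.)
{code}
import java.util.concurrent.TimeUnit;
import com.google.common.cache.Cache;
import com.google.common.cache.CacheBuilder;

public class ProxyCacheSketch {
  // Before: a ConcurrentHashMap whose entries lived forever.
  // After: a Guava cache that evicts entries not accessed for a while.
  private final Cache<String, Object> proxies = CacheBuilder.newBuilder()
      .expireAfterAccess(10, TimeUnit.MINUTES) // timeout is illustrative
      .build();

  Object getOrCreate(String nodeId) throws Exception {
    // get(key, loader) is atomic per key, like computeIfAbsent:
    return proxies.get(nodeId, () -> new Object() /* real proxy here */);
  }
}
{code}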

> refactor LlapProtocolClientProxy to be usable with other protocols
> --
>
> Key: HIVE-17380
> URL: https://issues.apache.org/jira/browse/HIVE-17380
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-17380.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17380) refactor LlapProtocolClientProxy to be usable with other protocols

2017-08-23 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-17380:

Status: Patch Available  (was: Open)

> refactor LlapProtocolClientProxy to be usable with other protocols
> --
>
> Key: HIVE-17380
> URL: https://issues.apache.org/jira/browse/HIVE-17380
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-17380.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17100) Improve HS2 operation logs for REPL commands.

2017-08-23 Thread Sankar Hariappan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-17100:

Status: Open  (was: Patch Available)

> Improve HS2 operation logs for REPL commands.
> -
>
> Key: HIVE-17100
> URL: https://issues.apache.org/jira/browse/HIVE-17100
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2, repl
>Affects Versions: 2.1.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>  Labels: DR, replication
> Fix For: 3.0.0
>
> Attachments: HIVE-17100.01.patch, HIVE-17100.02.patch, 
> HIVE-17100.03.patch, HIVE-17100.04.patch, HIVE-17100.05.patch, 
> HIVE-17100.06.patch
>
>
> It is necessary to log the progress of the replication tasks in a structured 
> manner as follows.
> *+Bootstrap Dump:+*
> * At the start of bootstrap dump, will add one log with below details.
> {color:#59afe1}* Database Name
> * Dump Type (BOOTSTRAP)
> * (Estimated) Total number of tables/views to dump
> * (Estimated) Total number of functions to dump.
> * Dump Start Time{color}
> * After each table dump, will add a log as follows
> {color:#59afe1}* Table/View Name
> * Type (TABLE/VIEW/MATERIALIZED_VIEW)
> * Table dump end time
> * Table dump progress. Format is Table sequence no/(Estimated) Total number 
> of tables and views.{color}
> * After each function dump, will add a log as follows
> {color:#59afe1}* Function Name
> * Function dump end time
> * Function dump progress. Format is Function sequence no/(Estimated) Total 
> number of functions.{color}
> * After completion of all dumps, will add a log as follows to consolidate the 
> dump.
> {color:#59afe1}* Database Name.
> * Dump Type (BOOTSTRAP).
> * Dump End Time.
> * (Actual) Total number of tables/views dumped.
> * (Actual) Total number of functions dumped.
> * Dump Directory.
> * Last Repl ID of the dump.{color}
> *Note:* The actual and estimated number of tables/functions may not match if 
> any table/function is dropped while the dump is in progress.
> *+Bootstrap Load:+*
> * At the start of bootstrap load, will add one log with below details.
> {color:#59afe1}* Database Name
> * Dump directory
> * Load Type (BOOTSTRAP)
> * Total number of tables/views to load
> * Total number of functions to load.
> * Load Start Time{color}
> * After each table load, will add a log as follows
> {color:#59afe1}* Table/View Name
> * Type (TABLE/VIEW/MATERIALIZED_VIEW)
> * Table load completion time
> * Table load progress. Format is Table sequence no/Total number of tables and 
> views.{color}
> * After each function load, will add a log as follows
> {color:#59afe1}* Function Name
> * Function load completion time
> * Function load progress. Format is Function sequence no/Total number of 
> functions.{color}
> * After completion of all loads, will add a log as follows to consolidate the 
> load.
> {color:#59afe1}* Database Name.
> * Load Type (BOOTSTRAP).
> * Load End Time.
> * Total number of tables/views loaded.
> * Total number of functions loaded.
> * Last Repl ID of the loaded database.{color}
> *+Incremental Dump:+*
> * At the start of database dump, will add one log with below details.
> {color:#59afe1}* Database Name
> * Dump Type (INCREMENTAL)
> * (Estimated) Total number of events to dump.
> * Dump Start Time{color}
> * After each event dump, will add a log as follows
> {color:#59afe1}* Event ID
> * Event Type (CREATE_TABLE, DROP_TABLE, ALTER_TABLE, INSERT etc)
> * Event dump end time
> * Event dump progress. Format is Event sequence no/ (Estimated) Total number 
> of events.{color}
> * After completion of all event dumps, will add a log as follows.
> {color:#59afe1}* Database Name.
> * Dump Type (INCREMENTAL).
> * Dump End Time.
> * (Actual) Total number of events dumped.
> * Dump Directory.
> * Last Repl ID of the dump.{color}
> *Note:* The estimated number of events can be terribly inaccurate compared with 
> the actual number, as we don’t have the number of events upfront until we read 
> from the metastore NotificationEvents table.
> *+Incremental Load:+*
> * At the start of incremental load, will add one log with below details.
> {color:#59afe1}* Target Database Name 
> * Dump directory
> * Load Type (INCREMENTAL)
> * Total number of events to load
> * Load Start Time{color}
> * After each event load, will add a log as follows
> {color:#59afe1}* Event ID
> * Event Type (CREATE_TABLE, DROP_TABLE, ALTER_TABLE, INSERT etc)
> * Event load end time
> * Event load progress. Format is Event sequence no/ Total number of 
> events.{color}
> * After completion of all event loads, will add a log as follows to 
> consolidate the load.
> {color:#59afe1}* Target Database Name.
> * Load Type (INCREMENTAL).
> * Load End Time.
> * Total number of events loaded.
> * Last Repl ID of the loaded database.{color}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Updated] (HIVE-17380) refactor LlapProtocolClientProxy to be usable with other protocols

2017-08-23 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-17380:

Description: 
This basically moves a bunch of code into a generic async PB RPC proxy, in 
llap-common for now. Moving it to common would require moving LlapNodeId; that 
can be done later.
The only logic change is that the concurrent hash map, which never expires 
entries, is replaced by a Guava cache. A path to shut down a proxy is added, but 
it does nothing yet.

> refactor LlapProtocolClientProxy to be usable with other protocols
> --
>
> Key: HIVE-17380
> URL: https://issues.apache.org/jira/browse/HIVE-17380
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-17380.patch
>
>
> This basically moves a bunch of code into a generic async PB RPC proxy, in 
> llap-common for now. Moving it to common would require moving LlapNodeId; 
> that can be done later.
> The only logic change is that the concurrent hash map, which never expires 
> entries, is replaced by a Guava cache. A path to shut down a proxy is added, 
> but it does nothing yet.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17100) Improve HS2 operation logs for REPL commands.

2017-08-23 Thread Sankar Hariappan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-17100:

Attachment: HIVE-17100.07.patch

Attached 07.patch with fixes for a few more comments.

> Improve HS2 operation logs for REPL commands.
> -
>
> Key: HIVE-17100
> URL: https://issues.apache.org/jira/browse/HIVE-17100
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2, repl
>Affects Versions: 2.1.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>  Labels: DR, replication
> Fix For: 3.0.0
>
> Attachments: HIVE-17100.01.patch, HIVE-17100.02.patch, 
> HIVE-17100.03.patch, HIVE-17100.04.patch, HIVE-17100.05.patch, 
> HIVE-17100.06.patch, HIVE-17100.07.patch
>
>
> It is necessary to log the progress of the replication tasks in a structured 
> manner as follows.
> *+Bootstrap Dump:+*
> * At the start of bootstrap dump, will add one log with below details.
> {color:#59afe1}* Database Name
> * Dump Type (BOOTSTRAP)
> * (Estimated) Total number of tables/views to dump
> * (Estimated) Total number of functions to dump.
> * Dump Start Time{color}
> * After each table dump, will add a log as follows
> {color:#59afe1}* Table/View Name
> * Type (TABLE/VIEW/MATERIALIZED_VIEW)
> * Table dump end time
> * Table dump progress. Format is Table sequence no/(Estimated) Total number 
> of tables and views.{color}
> * After each function dump, will add a log as follows
> {color:#59afe1}* Function Name
> * Function dump end time
> * Function dump progress. Format is Function sequence no/(Estimated) Total 
> number of functions.{color}
> * After completion of all dumps, will add a log as follows to consolidate the 
> dump.
> {color:#59afe1}* Database Name.
> * Dump Type (BOOTSTRAP).
> * Dump End Time.
> * (Actual) Total number of tables/views dumped.
> * (Actual) Total number of functions dumped.
> * Dump Directory.
> * Last Repl ID of the dump.{color}
> *Note:* The actual and estimated number of tables/functions may not match if 
> any table/function is dropped while the dump is in progress.
> *+Bootstrap Load:+*
> * At the start of bootstrap load, will add one log with below details.
> {color:#59afe1}* Database Name
> * Dump directory
> * Load Type (BOOTSTRAP)
> * Total number of tables/views to load
> * Total number of functions to load.
> * Load Start Time{color}
> * After each table load, will add a log as follows
> {color:#59afe1}* Table/View Name
> * Type (TABLE/VIEW/MATERIALIZED_VIEW)
> * Table load completion time
> * Table load progress. Format is Table sequence no/Total number of tables and 
> views.{color}
> * After each function load, will add a log as follows
> {color:#59afe1}* Function Name
> * Function load completion time
> * Function load progress. Format is Function sequence no/Total number of 
> functions.{color}
> * After completion of all dumps, will add a log as follows to consolidate the 
> load.
> {color:#59afe1}* Database Name.
> * Load Type (BOOTSTRAP).
> * Load End Time.
> * Total number of tables/views loaded.
> * Total number of functions loaded.
> * Last Repl ID of the loaded database.{color}
> *+Incremental Dump:+*
> * At the start of database dump, will add one log with below details.
> {color:#59afe1}* Database Name
> * Dump Type (INCREMENTAL)
> * (Estimated) Total number of events to dump.
> * Dump Start Time{color}
> * After each event dump, will add a log as follows
> {color:#59afe1}* Event ID
> * Event Type (CREATE_TABLE, DROP_TABLE, ALTER_TABLE, INSERT etc)
> * Event dump end time
> * Event dump progress. Format is Event sequence no/ (Estimated) Total number 
> of events.{color}
> * After completion of all event dumps, will add a log as follows.
> {color:#59afe1}* Database Name.
> * Dump Type (INCREMENTAL).
> * Dump End Time.
> * (Actual) Total number of events dumped.
> * Dump Directory.
> * Last Repl ID of the dump.{color}
> *Note:* The estimated number of events can differ significantly from the 
> actual number, as we don’t know the number of events upfront until we read 
> from the metastore NotificationEvents table.
> *+Incremental Load:+*
> * At the start of incremental load, will add one log with below details.
> {color:#59afe1}* Target Database Name 
> * Dump directory
> * Load Type (INCREMENTAL)
> * Total number of events to load
> * Load Start Time{color}
> * After each event load, will add a log as follows
> {color:#59afe1}* Event ID
> * Event Type (CREATE_TABLE, DROP_TABLE, ALTER_TABLE, INSERT etc)
> * Event load end time
> * Event load progress. Format is Event sequence no/ Total number of 
> events.{color}
> * After completion of all event loads, will add a log as follows to 
> consolidate the load.
> {color:#59afe1}* Target Database Name.
> * Load Type (INCREMENTAL).
> * Load End Time.
> * Total number of events loaded.
> * Last Repl ID of the loaded database.{color}

[jira] [Updated] (HIVE-17100) Improve HS2 operation logs for REPL commands.

2017-08-23 Thread Sankar Hariappan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-17100:

Status: Patch Available  (was: Open)

> Improve HS2 operation logs for REPL commands.
> -
>
> Key: HIVE-17100
> URL: https://issues.apache.org/jira/browse/HIVE-17100
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2, repl
>Affects Versions: 2.1.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>  Labels: DR, replication
> Fix For: 3.0.0
>
> Attachments: HIVE-17100.01.patch, HIVE-17100.02.patch, 
> HIVE-17100.03.patch, HIVE-17100.04.patch, HIVE-17100.05.patch, 
> HIVE-17100.06.patch, HIVE-17100.07.patch
>
>
> It is necessary to log the progress of the replication tasks in a structured 
> manner, as follows.
> *+Bootstrap Dump:+*
> * At the start of bootstrap dump, will add one log with below details.
> {color:#59afe1}* Database Name
> * Dump Type (BOOTSTRAP)
> * (Estimated) Total number of tables/views to dump
> * (Estimated) Total number of functions to dump.
> * Dump Start Time{color}
> * After each table dump, will add a log as follows
> {color:#59afe1}* Table/View Name
> * Type (TABLE/VIEW/MATERIALIZED_VIEW)
> * Table dump end time
> * Table dump progress. Format is Table sequence no/(Estimated) Total number 
> of tables and views.{color}
> * After each function dump, will add a log as follows
> {color:#59afe1}* Function Name
> * Function dump end time
> * Function dump progress. Format is Function sequence no/(Estimated) Total 
> number of functions.{color}
> * After completion of all dumps, will add a log as follows to consolidate the 
> dump.
> {color:#59afe1}* Database Name.
> * Dump Type (BOOTSTRAP).
> * Dump End Time.
> * (Actual) Total number of tables/views dumped.
> * (Actual) Total number of functions dumped.
> * Dump Directory.
> * Last Repl ID of the dump.{color}
> *Note:* The actual and estimated number of tables/functions may not match if 
> any table/function is dropped while the dump is in progress.
> *+Bootstrap Load:+*
> * At the start of bootstrap load, will add one log with below details.
> {color:#59afe1}* Database Name
> * Dump directory
> * Load Type (BOOTSTRAP)
> * Total number of tables/views to load
> * Total number of functions to load.
> * Load Start Time{color}
> * After each table load, will add a log as follows
> {color:#59afe1}* Table/View Name
> * Type (TABLE/VIEW/MATERIALIZED_VIEW)
> * Table load completion time
> * Table load progress. Format is Table sequence no/Total number of tables and 
> views.{color}
> * After each function load, will add a log as follows
> {color:#59afe1}* Function Name
> * Function load completion time
> * Function load progress. Format is Function sequence no/Total number of 
> functions.{color}
> * After completion of all dumps, will add a log as follows to consolidate the 
> load.
> {color:#59afe1}* Database Name.
> * Load Type (BOOTSTRAP).
> * Load End Time.
> * Total number of tables/views loaded.
> * Total number of functions loaded.
> * Last Repl ID of the loaded database.{color}
> *+Incremental Dump:+*
> * At the start of database dump, will add one log with below details.
> {color:#59afe1}* Database Name
> * Dump Type (INCREMENTAL)
> * (Estimated) Total number of events to dump.
> * Dump Start Time{color}
> * After each event dump, will add a log as follows
> {color:#59afe1}* Event ID
> * Event Type (CREATE_TABLE, DROP_TABLE, ALTER_TABLE, INSERT etc)
> * Event dump end time
> * Event dump progress. Format is Event sequence no/ (Estimated) Total number 
> of events.{color}
> * After completion of all event dumps, will add a log as follows.
> {color:#59afe1}* Database Name.
> * Dump Type (INCREMENTAL).
> * Dump End Time.
> * (Actual) Total number of events dumped.
> * Dump Directory.
> * Last Repl ID of the dump.{color}
> *Note:* The estimated number of events can differ significantly from the 
> actual number, as we don’t know the number of events upfront until we read 
> from the metastore NotificationEvents table.
> *+Incremental Load:+*
> * At the start of incremental load, will add one log with below details.
> {color:#59afe1}* Target Database Name 
> * Dump directory
> * Load Type (INCREMENTAL)
> * Total number of events to load
> * Load Start Time{color}
> * After each event load, will add a log as follows
> {color:#59afe1}* Event ID
> * Event Type (CREATE_TABLE, DROP_TABLE, ALTER_TABLE, INSERT etc)
> * Event load end time
> * Event load progress. Format is Event sequence no/ Total number of 
> events.{color}
> * After completion of all event loads, will add a log as follows to 
> consolidate the load.
> {color:#59afe1}* Target Database Name.
> * Load Type (INCREMENTAL).
> * Load End Time.
> * Total number of events loaded.
> * Last Repl ID of the loaded database.{color}

[jira] [Updated] (HIVE-17380) refactor LlapProtocolClientProxy to be usable with other protocols

2017-08-23 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-17380:

Attachment: HIVE-17380.patch

Wrong base branch; the same patch for master this time.

> refactor LlapProtocolClientProxy to be usable with other protocols
> --
>
> Key: HIVE-17380
> URL: https://issues.apache.org/jira/browse/HIVE-17380
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-17380.patch, HIVE-17380.patch
>
>
> This basically moves a bunch of code into a generic async PB RPC proxy, in 
> llap-common for now. Moving it to common would require moving LlapNodeId; 
> that can be done later.
> The only logic change is that the concurrent hash map, which never expires, 
> is replaced by a Guava cache. A path to shut down a proxy is added, but does 
> nothing.
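
For illustration, a minimal sketch of replacing a never-expiring map of 
per-node proxies with a Guava cache that expires idle entries and shuts the 
evicted proxy down (the key/value types, the 10-minute expiry, and the 
{{createProxy}} factory are assumptions for the sketch, not the patch's 
actual code):
{code}
import java.util.concurrent.TimeUnit;

import com.google.common.cache.CacheBuilder;
import com.google.common.cache.CacheLoader;
import com.google.common.cache.LoadingCache;
import com.google.common.cache.RemovalListener;
import com.google.common.cache.RemovalNotification;

public class ProxyCacheSketch {
  // Unlike a plain ConcurrentHashMap, the cache can expire idle entries and
  // provides a removal hook where the evicted proxy can be shut down.
  private final LoadingCache<String, AutoCloseable> proxies =
      CacheBuilder.newBuilder()
          .expireAfterAccess(10, TimeUnit.MINUTES)
          .removalListener(new RemovalListener<String, AutoCloseable>() {
            @Override
            public void onRemoval(RemovalNotification<String, AutoCloseable> n) {
              try {
                n.getValue().close();  // shut down the evicted proxy
              } catch (Exception ignored) {
                // best-effort shutdown
              }
            }
          })
          .build(new CacheLoader<String, AutoCloseable>() {
            @Override
            public AutoCloseable load(String nodeId) {
              return createProxy(nodeId);  // hypothetical proxy factory
            }
          });

  private AutoCloseable createProxy(String nodeId) {
    return () -> { };  // placeholder proxy for the sketch
  }
}
{code}
The removal listener is what gives the added shutdown path something to do: 
an entry that falls out of the cache gets closed instead of lingering forever.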



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16886) HMS log notifications may have duplicated event IDs if multiple HMS are running concurrently

2017-08-23 Thread Alexander Kolbasov (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16139403#comment-16139403
 ] 

Alexander Kolbasov commented on HIVE-16886:
---

{{select for update}} will work, but it is indeed a hammer; plus, now we might 
have deadlocks. But I am not aware of any solution that guarantees the 
semantics we need and doesn't hurt performance.
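
For illustration, a minimal JDBC-style sketch of the {{select for update}} 
approach (the {{NOTIFICATION_SEQUENCE}} table and {{NEXT_EVENT_ID}} column 
are assumptions for the sketch, not necessarily the actual metastore schema):
{code}
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class EventIdAllocator {
  // Lock the sequence row for the duration of the transaction so that
  // concurrent HMS instances serialize on it and never hand out the same ID.
  public static long nextEventId(Connection conn) throws SQLException {
    conn.setAutoCommit(false);
    try (PreparedStatement lock = conn.prepareStatement(
             "SELECT NEXT_EVENT_ID FROM NOTIFICATION_SEQUENCE FOR UPDATE");
         ResultSet rs = lock.executeQuery()) {
      if (!rs.next()) {
        throw new SQLException("notification sequence row is missing");
      }
      long id = rs.getLong(1);
      try (PreparedStatement update = conn.prepareStatement(
               "UPDATE NOTIFICATION_SEQUENCE SET NEXT_EVENT_ID = ?")) {
        update.setLong(1, id + 1);
        update.executeUpdate();
      }
      conn.commit();  // releases the row lock
      return id;
    } catch (SQLException e) {
      conn.rollback();
      throw e;
    }
  }
}
{code}
Every writer serializes on that single row, which is exactly the hammer (and 
the deadlock surface) described above.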

> HMS log notifications may have duplicated event IDs if multiple HMS are 
> running concurrently
> 
>
> Key: HIVE-16886
> URL: https://issues.apache.org/jira/browse/HIVE-16886
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, Metastore
>Reporter: Sergio Peña
>Assignee: anishek
> Attachments: datastore-identity-holes.diff, HIVE-16886.1.patch
>
>
> When running multiple Hive Metastore servers and DB notifications are 
> enabled, I could see that notifications can be persisted with a duplicated 
> event ID. 
> This does not happen when running multiple threads in a single HMS node, due 
> to the locking acquired on the DbNotificationsLog class, but multiple HMS 
> instances could cause conflicts.
> The issue is in the ObjectStore#addNotificationEvent() method. The event ID 
> fetched from the datastore is used for the new notification, incremented in 
> the server itself, then persisted or updated back to the datastore. If two 
> servers read the same ID, then both write a new notification with the same 
> ID. The event ID is neither unique nor a primary key.
> Here's a test case using the TestObjectStore class that confirms this issue:
> {noformat}
> @Test
>   public void testConcurrentAddNotifications() throws ExecutionException, 
> InterruptedException {
> final int NUM_THREADS = 2;
> CountDownLatch countIn = new CountDownLatch(NUM_THREADS);
> CountDownLatch countOut = new CountDownLatch(1);
> HiveConf conf = new HiveConf();
> conf.setVar(HiveConf.ConfVars.METASTORE_EXPRESSION_PROXY_CLASS, 
> MockPartitionExpressionProxy.class.getName());
> ExecutorService executorService = 
> Executors.newFixedThreadPool(NUM_THREADS);
> FutureTask tasks[] = new FutureTask[NUM_THREADS];
> for (int i = 0; i < NUM_THREADS; i++) {
>   final int n = i;
>   tasks[i] = new FutureTask<Void>(new Callable<Void>() {
> @Override
> public Void call() throws Exception {
>   ObjectStore store = new ObjectStore();
>   store.setConf(conf);
>   NotificationEvent dbEvent =
>   new NotificationEvent(0, 0, 
> EventMessage.EventType.CREATE_DATABASE.toString(), "CREATE DATABASE DB" + n);
>   System.out.println("ADDING NOTIFICATION");
>   countIn.countDown();
>   countOut.await();
>   store.addNotificationEvent(dbEvent);
>   System.out.println("FINISH NOTIFICATION");
>   return null;
> }
>   });
>   executorService.execute(tasks[i]);
> }
> countIn.await();
> countOut.countDown();
> for (int i = 0; i < NUM_THREADS; ++i) {
>   tasks[i].get();
> }
> NotificationEventResponse eventResponse = 
> objectStore.getNextNotification(new NotificationEventRequest());
> Assert.assertEquals(2, eventResponse.getEventsSize());
> Assert.assertEquals(1, eventResponse.getEvents().get(0).getEventId());
> // This fails because the next notification has an event ID = 1
> Assert.assertEquals(2, eventResponse.getEvents().get(1).getEventId());
>   }
> {noformat}
> The last assertion fails expecting an event ID 1 instead of 2. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17309) alter partition onto a table not in current database throw InvalidOperationException

2017-08-23 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16139408#comment-16139408
 ] 

Alan Gates commented on HIVE-17309:
---

In general this looks good.  Rather than constructing the table name with 
getDbName() + '.' + getTableName(), you should call Warehouse.getQualifiedName(). 
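
For example, a sketch of the suggested call-site change (a fragment, assuming 
{{tbl}}, {{db}}, {{allPartitions}}, and {{alterTbl}} from the {{DDLTask}} 
snippet quoted below; the exact {{Warehouse}} overload is an assumption):
{code}
import org.apache.hadoop.hive.metastore.Warehouse;

// Qualify the table name via the shared helper instead of hand-building
// getDbName() + '.' + getTableName(), so the alter targets the right database.
String qualifiedName =
    Warehouse.getQualifiedName(tbl.getDbName(), tbl.getTableName());
db.alterPartitions(qualifiedName, allPartitions, alterTbl.getEnvironmentContext());
{code}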

> alter partition onto a table not in current database throw 
> InvalidOperationException
> 
>
> Key: HIVE-17309
> URL: https://issues.apache.org/jira/browse/HIVE-17309
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 1.2.2, 2.1.1, 2.2.0
>Reporter: Wang Haihua
>Assignee: Wang Haihua
> Attachments: HIVE-17309.1.patch
>
>
> When altering a partition of a table that is not in the current database, an 
> InvalidOperationException is thrown.
> SQL example:
> {code}
> use default;
> ALTER TABLE anotherdb.test_table_for_alter_partition_nocurrentdb 
> partition(ds='haihua001') CHANGE COLUMN a a_new BOOLEAN;
> {code}
> This code in {{DDLTask.java}} shows the potential problem: the table name is 
> not qualified with the database name when {{db.alterPartitions}} is called.
> {code}
>   if (allPartitions == null) {
> db.alterTable(alterTbl.getOldName(), tbl, alterTbl.getIsCascade(), 
> alterTbl.getEnvironmentContext());
>   } else {
> db.alterPartitions(tbl.getTableName(), allPartitions, 
> alterTbl.getEnvironmentContext());
>   }
> {code}
> stacktrace:
> {code}
> 2017-07-19T11:06:39,639  INFO [main] metastore.HiveMetaStore: New partition 
> values:[2017-07-14]
> 2017-07-19T11:06:39,654 ERROR [main] metastore.RetryingHMSHandler: 
> InvalidOperationException(message:alter is not possible)
> at 
> org.apache.hadoop.hive.metastore.HiveAlterHandler.alterPartitions(HiveAlterHandler.java:526)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.alter_partitions_with_environment_context(HiveMetaStore.java:3560)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:140)
> at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:99)
> at 
> com.sun.proxy.$Proxy21.alter_partitions_with_environment_context(Unknown 
> Source)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.alter_partitions(HiveMetaStoreClient.java:1486)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at 
> org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:154)
> at com.sun.proxy.$Proxy22.alter_partitions(Unknown Source)
> at org.apache.hadoop.hive.ql.metadata.Hive.alterPartitions(Hive.java:712)
> at org.apache.hadoop.hive.ql.exec.DDLTask.alterTable(DDLTask.java:3338)
> at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:368)
> at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:197)
> at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100)
> at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2166)
> at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1837)
> at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1713)
> at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1543)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1174)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1164)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:232)
> at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:183)
> at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:399)
> at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:776)
> at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:714)
> at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:641)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at org.apa

[jira] [Updated] (HIVE-17309) alter partition onto a table not in current database throw InvalidOperationException

2017-08-23 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated HIVE-17309:
--
Status: Open  (was: Patch Available)

Cancelling the patch pending changes and a new patch, since the existing one 
has gotten out of sync.

> alter partition onto a table not in current database throw 
> InvalidOperationException
> 
>
> Key: HIVE-17309
> URL: https://issues.apache.org/jira/browse/HIVE-17309
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 2.2.0, 2.1.1, 1.2.2
>Reporter: Wang Haihua
>Assignee: Wang Haihua
> Attachments: HIVE-17309.1.patch
>
>
> When altering a partition of a table that is not in the current database, an 
> InvalidOperationException is thrown.
> SQL example:
> {code}
> use default;
> ALTER TABLE anotherdb.test_table_for_alter_partition_nocurrentdb 
> partition(ds='haihua001') CHANGE COLUMN a a_new BOOLEAN;
> {code}
> This code in {{DDLTask.java}} shows the potential problem: the table name is 
> not qualified with the database name when {{db.alterPartitions}} is called.
> {code}
>   if (allPartitions == null) {
> db.alterTable(alterTbl.getOldName(), tbl, alterTbl.getIsCascade(), 
> alterTbl.getEnvironmentContext());
>   } else {
> db.alterPartitions(tbl.getTableName(), allPartitions, 
> alterTbl.getEnvironmentContext());
>   }
> {code}
> stacktrace:
> {code}
> 2017-07-19T11:06:39,639  INFO [main] metastore.HiveMetaStore: New partition 
> values:[2017-07-14]
> 2017-07-19T11:06:39,654 ERROR [main] metastore.RetryingHMSHandler: 
> InvalidOperationException(message:alter is not possible)
> at 
> org.apache.hadoop.hive.metastore.HiveAlterHandler.alterPartitions(HiveAlterHandler.java:526)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.alter_partitions_with_environment_context(HiveMetaStore.java:3560)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:140)
> at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:99)
> at 
> com.sun.proxy.$Proxy21.alter_partitions_with_environment_context(Unknown 
> Source)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.alter_partitions(HiveMetaStoreClient.java:1486)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at 
> org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:154)
> at com.sun.proxy.$Proxy22.alter_partitions(Unknown Source)
> at org.apache.hadoop.hive.ql.metadata.Hive.alterPartitions(Hive.java:712)
> at org.apache.hadoop.hive.ql.exec.DDLTask.alterTable(DDLTask.java:3338)
> at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:368)
> at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:197)
> at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100)
> at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2166)
> at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1837)
> at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1713)
> at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1543)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1174)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1164)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:232)
> at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:183)
> at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:399)
> at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:776)
> at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:714)
> at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:641)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
> at org.apache.hadoop

[jira] [Commented] (HIVE-17360) Tez session reopen appears to use a wrong conf object

2017-08-23 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16139422#comment-16139422
 ] 

Hive QA commented on HIVE-17360:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12883388/HIVE-17360.03.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 7 failed/errored test(s), 10998 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning]
 (batchId=169)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] 
(batchId=100)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=235)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] 
(batchId=235)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=180)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=180)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=180)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6503/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6503/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6503/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 7 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12883388 - PreCommit-HIVE-Build

> Tez session reopen appears to use a wrong conf object
> -
>
> Key: HIVE-17360
> URL: https://issues.apache.org/jira/browse/HIVE-17360
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-17360.01.patch, HIVE-17360.02.patch, 
> HIVE-17360.03.patch, HIVE-17360.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16529) Replace JPAM with libpam4j for PAM authentication

2017-08-23 Thread Eric Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16139440#comment-16139440
 ] 

Eric Yang commented on HIVE-16529:
--

The JPAM user account expiration issue can easily be worked around by applying 
this patch to JPAM:

{code}
--- jpam/jpam/src/c/Pam.c   2005-06-14 20:02:36.0 -0700
+++ ../../jpam/jpam/jpam/src/c/Pam.c2017-08-23 18:20:09.0 -0700
@@ -151,6 +151,9 @@
 printf("***Sending password\n");
  reply[replies].resp = COPY_STRING(password);
   }
+  if (msg[replies]->msg_style==4) {
+ reply[replies].resp = NULL;
+  }
   if (debug)
 printf("***Response to PAM is: |%s|\n", reply[replies].resp);
}
{code}

This might serve as a workaround instead of replacing JPAM with libpam4j.

> Replace JPAM with libpam4j for PAM authentication
> -
>
> Key: HIVE-16529
> URL: https://issues.apache.org/jira/browse/HIVE-16529
> Project: Hive
>  Issue Type: Improvement
>  Components: Authentication
>Affects Versions: 1.2.0
>Reporter: Richard Ding
>Assignee: Sailaja Navvluru
>
> PAM authentication is an important feature available since Hive 0.13. But 
> the Hive blog gives the following warning:
> {quote}
> JPAM library that is used to provide the PAM authentication mode can cause 
> HiveServer2 to go down if a user's password has expired. This happens because 
> of segfault/core dumps from native code invoked by JPAM. Some users have also 
> reported crashes during logins in other cases as well. Use of LDAP or 
> KERBEROS is recommended.
> {quote}
> JPAM also requires the user to install a native library. Furthermore, the 
> JPAM library does not seem to have been updated since 2007.
> Other Apache projects (e.g. Ambari/Ranger/Knox) use a newer library libpam4j 
> which doesn't require installation of native library. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-15899) check CTAS over acid table

2017-08-23 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16139458#comment-16139458
 ] 

Eugene Koifman commented on HIVE-15899:
---

look for QB.isCTAS() and CreateTableDesc() in SemanticAnalyzer

> check CTAS over acid table 
> ---
>
> Key: HIVE-15899
> URL: https://issues.apache.org/jira/browse/HIVE-15899
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>
> Need to add a test to check that CREATE TABLE AS (CTAS) works correctly with 
> acid tables.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Comment Edited] (HIVE-15899) check CTAS over acid table

2017-08-23 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16139458#comment-16139458
 ] 

Eugene Koifman edited comment on HIVE-15899 at 8/24/17 1:53 AM:


look for QB.isCTAS() and CreateTableDesc() in SemanticAnalyzer
TestTxnNoBuckets.testCTAS()
TestAcidOnTez.*


was (Author: ekoifman):
look for QB.isCTAS() and CreateTableDesc() in SemanticAnalyzer

> check CTAS over acid table 
> ---
>
> Key: HIVE-15899
> URL: https://issues.apache.org/jira/browse/HIVE-15899
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>
> Need to add a test to check that CREATE TABLE AS (CTAS) works correctly with 
> acid tables.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17205) add functional support

2017-08-23 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-17205:
--
Attachment: HIVE-17205.14.patch

> add functional support
> --
>
> Key: HIVE-17205
> URL: https://issues.apache.org/jira/browse/HIVE-17205
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
> Attachments: HIVE-17205.01.patch, HIVE-17205.02.patch, 
> HIVE-17205.03.patch, HIVE-17205.09.patch, HIVE-17205.10.patch, 
> HIVE-17205.11.patch, HIVE-17205.12.patch, HIVE-17205.13.patch, 
> HIVE-17205.14.patch
>
>
> make sure unbucketed tables can be marked transactional=true
> make insert/update/delete/compaction work



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-6131) New columns after table alter result in null values despite data

2017-08-23 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16139468#comment-16139468
 ] 

Hive QA commented on HIVE-6131:
---



Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12637927/HIVE-6131.1.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 20 failed/errored test(s), 11000 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[alter_partition_change_col]
 (batchId=24)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[alter_table_cascade] 
(batchId=84)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[partition_data_after_schema_update]
 (batchId=17)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[partition_wise_fileformat11]
 (batchId=7)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[partition_wise_fileformat12]
 (batchId=79)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[partition_wise_fileformat13]
 (batchId=61)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[partition_wise_fileformat14]
 (batchId=73)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_text_nonvec_part]
 (batchId=149)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_text_nonvec_part_all_complex]
 (batchId=159)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_text_nonvec_part_all_primitive]
 (batchId=156)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_text_vec_part]
 (batchId=157)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_text_vec_part_all_complex]
 (batchId=153)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_text_vec_part_all_primitive]
 (batchId=158)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_text_vecrow_part]
 (batchId=162)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_text_vecrow_part_all_complex]
 (batchId=162)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_text_vecrow_part_all_primitive]
 (batchId=158)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning]
 (batchId=169)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=180)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=180)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=180)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6504/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6504/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6504/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 20 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12637927 - PreCommit-HIVE-Build

> New columns after table alter result in null values despite data
> 
>
> Key: HIVE-6131
> URL: https://issues.apache.org/jira/browse/HIVE-6131
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.11.0, 0.12.0, 0.13.0, 1.2.1
>Reporter: James Vaughan
>Priority: Critical
> Attachments: HIVE-6131.1.patch
>
>
> Hi folks,
> I found and verified a bug on our CDH 4.0.3 install of Hive when adding 
> columns to tables with Partitions using 'REPLACE COLUMNS'.  I dug through the 
> Jira a little bit and didn't see anything for it so hopefully this isn't just 
> noise on the radar.
> Basically, when you alter a table with partitions and then reupload data to 
> that partition, Hive doesn't seem to recognize the extra data that actually 
> exists in HDFS: it returns NULL values for the new column despite having 
> the data and recognizing the new column in the metadata.
> Here's some steps to reproduce using a basic table:
> 1.  Run this hive command:  CREATE TABLE jvaughan_test (col1 string) 
> partitioned by (day string);
> 2.  Create a simple file on the system with a couple of entries, something 
> like "hi" and "hi2" separated by newlines.
> 3.  Run this hive command, pointing it at the file:  LOAD DATA LOCAL INPATH 
> '' OVERWRITE INTO TABLE jvaughan_test PARTITION (day = '2014-01-02');
> 4.  Confirm the data with:  SELECT * FROM jvaughan_test WHERE day = 
> '2014-01-02';
> 5.  Alter the column definitions:  ALTER TABLE jvaughan_test

[jira] [Commented] (HIVE-10349) overflow in stats

2017-08-23 Thread liyunzhang_intel (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16139497#comment-16139497
 ] 

liyunzhang_intel commented on HIVE-10349:
-

[~sershe]: I met a similar overflow problem when running TPC-DS/query17 on 
Hive on Spark; the explain is in the 
[link|https://issues.apache.org/jira/secure/attachment/12875204/query17_explain.log].
What's the root cause of the problem?

> overflow in stats
> -
>
> Key: HIVE-10349
> URL: https://issues.apache.org/jira/browse/HIVE-10349
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Prasanth Jayachandran
>
> Discovered while running q17 in LLAP.
> {noformat}
> Reducer 2 
> Execution mode: llap
> Reduce Operator Tree:
>   Merge Join Operator
> condition map:
>  Inner Join 0 to 1
> keys:
>   0 _col28 (type: int), _col27 (type: int)
>   1 cs_bill_customer_sk (type: int), cs_item_sk (type: int)
> outputColumnNames: _col1, _col2, _col6, _col8, _col9, _col22, 
> _col27, _col28, _col34, _col35, _col45, _col51, _col63, _col66, _col82
> Statistics: Num rows: 1047651367827495040 Data size: 
> 9223372036854775807 Basic stats: COMPLETE Column stats: PARTIAL
> Map Join Operator
>   condition map:
>Inner Join 0 to 1
>   keys:
> 0 _col22 (type: int)
> 1 d_date_sk (type: int)
>   outputColumnNames: _col1, _col2, _col6, _col8, _col9, 
> _col22, _col27, _col28, _col34, _col35, _col45, _col51, _col63, _col66, 
> _col82, _col86
>   input vertices:
> 1 Map 7
>   Statistics: Num rows: 1152416529588199552 Data size: 
> 9223372036854775807 Basic stats: COMPLETE Column stats: NONE
> {noformat}
> Data size overflows, and the row count also looks wrong. I wonder if this is 
> why it generates 1009 reducers for this stage on 6 machines.
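
A data size of 9223372036854775807 is exactly {{Long.MAX_VALUE}}, i.e. the 
estimate has hit the 64-bit ceiling. A minimal sketch of clamping instead of 
wrapping when multiplying such estimates (illustrative only, not Hive's actual 
stats code):
{code}
// Multiply two non-negative cardinality/size estimates, clamping at
// Long.MAX_VALUE instead of silently overflowing into nonsense values.
public static long saturatingMultiply(long a, long b) {
  if (a == 0 || b == 0) {
    return 0;
  }
  if (a > Long.MAX_VALUE / b) {
    return Long.MAX_VALUE;  // would overflow; saturate at the ceiling
  }
  return a * b;
}
{code}
Without such a clamp, a multiplication past the ceiling wraps around, which 
can corrupt downstream decisions such as the reducer count.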



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17372) update druid dependency to druid 0.10.1

2017-08-23 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16139503#comment-16139503
 ] 

Hive QA commented on HIVE-17372:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12883394/HIVE-17372.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 8 failed/errored test(s), 10999 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[setop_no_distinct] 
(batchId=77)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[unionDistinct_1] 
(batchId=143)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[columnstats_part_coltype]
 (batchId=159)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning]
 (batchId=169)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=235)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=180)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=180)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=180)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6505/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6505/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6505/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 8 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12883394 - PreCommit-HIVE-Build

> update druid dependency to druid 0.10.1
> ---
>
> Key: HIVE-17372
> URL: https://issues.apache.org/jira/browse/HIVE-17372
> Project: Hive
>  Issue Type: Bug
>  Components: Druid integration
>Reporter: slim bouguerra
>Assignee: slim bouguerra
> Attachments: HIVE-17372.patch
>
>
> Update to the most recent Druid version, to be released on August 23.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17375) stddev_samp,var_samp standard compliance

2017-08-23 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16139533#comment-16139533
 ] 

Hive QA commented on HIVE-17375:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12883416/HIVE-17375.1.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 15 failed/errored test(s), 10999 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[decimal_udf] (batchId=9)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vectorization_12] 
(batchId=10)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vectorization_14] 
(batchId=14)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vectorization_15] 
(batchId=62)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vectorization_16] 
(batchId=41)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vectorization_9] 
(batchId=1)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[columnstats_part_coltype]
 (batchId=159)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vectorization_15]
 (batchId=158)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning]
 (batchId=169)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=235)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] 
(batchId=235)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorization_15] 
(batchId=128)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=180)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=180)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=180)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6506/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6506/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6506/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 15 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12883416 - PreCommit-HIVE-Build

> stddev_samp,var_samp standard compliance
> 
>
> Key: HIVE-17375
> URL: https://issues.apache.org/jira/browse/HIVE-17375
> Project: Hive
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Minor
> Attachments: HIVE-17375.1.patch
>
>
> These two UDAFs return 0 when there is only one element; however, the 
> standard requires NULL to be returned.
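
A minimal sketch of the standard-compliant termination logic (the parameter 
names are illustrative; the actual Hive UDAF aggregation buffers differ):
{code}
// var_samp divides the sum of squared deviations by (count - 1), which is
// undefined for a single row; per the SQL standard the result must be NULL.
public static Double varSamp(long count, double sumOfSquaredDeviations) {
  if (count <= 1) {
    return null;  // standard-compliant: NULL, not 0
  }
  return sumOfSquaredDeviations / (count - 1);
}
{code}
stddev_samp is the square root of var_samp, so it inherits the same NULL rule.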



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

