[jira] [Commented] (HIVE-17237) HMS wastes 26.4% of memory due to dup strings in metastore.api.Partition.parameters

2017-08-21 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16136343#comment-16136343
 ] 

Hive QA commented on HIVE-17237:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12882955/HIVE-17237.02.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 7 failed/errored test(s), 10994 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning]
 (batchId=169)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=235)
org.apache.hadoop.hive.common.TestFileUtils.testCopyWithDistCpAs (batchId=250)
org.apache.hadoop.hive.common.TestFileUtils.testCopyWithDistcp (batchId=250)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=180)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=180)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=180)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6477/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6477/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6477/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 7 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12882955 - PreCommit-HIVE-Build

> HMS wastes 26.4% of memory due to dup strings in 
> metastore.api.Partition.parameters
> ---
>
> Key: HIVE-17237
> URL: https://issues.apache.org/jira/browse/HIVE-17237
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Reporter: Misha Dmitriev
>Assignee: Misha Dmitriev
> Attachments: HIVE-17237.01.patch, HIVE-17237.02.patch
>
>
> I've analyzed a heap dump from a production Hive installation using jxray 
> (www.jxray.com). It turns out that there are a lot of duplicate strings in 
> memory, which waste 26.4% of the heap. Most of them come from HashMaps 
> referenced by org.apache.hadoop.hive.metastore.api.Partition.parameters. 
> Below is the relevant section of the jxray report.
> Looking at Partition.java, I see that somebody has already added code to 
> intern keys and values in the parameters table when it is first set up. 
> However, when more key-value pairs are added later, they are not interned, 
> which probably explains all these duplicate strings. Also, when a Partition 
> instance is deserialized, no interning of parameters is currently done.
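> For illustration, interning on every insertion might look like the minimal 
> sketch below (assuming the Thrift-generated {{putToParameters}} helper on 
> Partition; the actual patch may differ):
> {code}
> // Minimal sketch: intern keys and values whenever a parameter is added,
> // not only when the whole map is first assigned. String.intern() makes
> // every duplicate share one canonical copy, removing the per-copy overhead.
> public void putToParameters(String key, String val) {
>   if (this.parameters == null) {
>     this.parameters = new java.util.HashMap<String, String>();
>   }
>   this.parameters.put(key == null ? null : key.intern(),
>       val == null ? null : val.intern());
> }
> {code}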
> {code}
> 6. DUPLICATE STRINGS
> Total strings: 3,273,557  Unique strings: 460,390  Duplicate values: 110,232  
> Overhead: 3,220,458K (26.4%)
> 
> ===
> 7. REFERENCE CHAINS FOR DUPLICATE STRINGS
>   2,326,150K (19.1%), 597058 dup strings (36386 unique), 597058 dup backing 
> arrays:
> 39949 of "-1", 39088 of "true", 28959 of "8", 20987 of "1", 18437 of "10", 
> 9583 of "9", 5908 of "269664", 5691 of "174528", 4598 of "133980", 4598 of 
> "BgUGBQgFCAYFCgYIBgUEBgQHBgUGCwYGBwYHBgkKBwYGBggIBwUHBgYGCgUJCQUG ...[length 
> 3560]"
> ... and 419200 more strings, of which 36376 are unique
> Also contains one-char strings: 217 of "6", 147 of "7", 91 of "4", 28 of "5", 
> 28 of "2", 21 of "0"
>  <--  {j.u.HashMap}.values <-- 
> org.apache.hadoop.hive.metastore.api.Partition.parameters <--  
> {j.u.ArrayList} <-- 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_partitions_result.success
>  <-- Java Local 
> (org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_partitions_result)
>  [@6e33618d8,@6eedb9a80,@6eedbad68,@6eedbc788] ... and 3 more GC roots
>   463,060K (3.8%), 119644 dup strings (34075 unique), 119644 dup backing 
> arrays:
> 7914 of "true", 7912 of "-1", 6578 of "8", 5606 of "1", 2302 of "10", 1626 of 
> "174528", 1223 of "9", 970 of "171680", 837 of "269664", 657 of "133980"
> ... and 84009 more strings, of which 34065 are unique
> Also contains one-char strings: 42 of "7", 31 of "6", 20 of "4", 8 of "5", 5 
> of "2", 3 of "0"
>  <--  {j.u.HashMap}.values <-- 
> org.apache.hadoop.hive.metastore.api.Partition.parameters <--  
> {j.u.TreeMap}.values <-- Java Local (j.u.TreeMap) [@6f084afa0,@73aac9e68]
>   233,384K (1.9%), 64601 dup strings (27295 unique), 64601 dup backing arrays:
> 4472 of "true", 4173 of "-1", 3798 of "1", 3591 of "8", 813 of 

[jira] [Updated] (HIVE-16823) "ArrayIndexOutOfBoundsException" in spark_vectorized_dynamic_partition_pruning.q

2017-08-21 Thread liyunzhang_intel (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

liyunzhang_intel updated HIVE-16823:

Status: Patch Available  (was: Open)

> "ArrayIndexOutOfBoundsException" in 
> spark_vectorized_dynamic_partition_pruning.q
> 
>
> Key: HIVE-16823
> URL: https://issues.apache.org/jira/browse/HIVE-16823
> Project: Hive
>  Issue Type: Bug
>Reporter: Jianguo Tian
>Assignee: liyunzhang_intel
> Attachments: explain.spark, explain.tez, HIVE-16823.patch
>
>
> spark_vectorized_dynamic_partition_pruning.q
> {code}
> set hive.optimize.ppd=true;
> set hive.ppd.remove.duplicatefilters=true;
> set hive.spark.dynamic.partition.pruning=true;
> set hive.optimize.metadataonly=false;
> set hive.optimize.index.filter=true;
> set hive.vectorized.execution.enabled=true;
> set hive.strict.checks.cartesian.product=false;
> -- parent is reduce tasks
> select count(*) from srcpart join (select ds as ds, ds as `date` from srcpart 
> group by ds) s on (srcpart.ds = s.ds) where s.`date` = '2008-04-08';
> {code}
> The exceptions are as follows:
> {code}
> 2017-06-05T09:20:31,468 ERROR [Executor task launch worker-0] 
> spark.SparkReduceRecordHandler: Fatal error: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Error while processing 
> vector batch (tag=0) Column vector types: 0:BYTES, 1:BYTES
> ["2008-04-08", "2008-04-08"]
> org.apache.hadoop.hive.ql.metadata.HiveException: Error while processing 
> vector batch (tag=0) Column vector types: 0:BYTES, 1:BYTES
> ["2008-04-08", "2008-04-08"]
>   at 
> org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processVectors(SparkReduceRecordHandler.java:413)
>  ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processRow(SparkReduceRecordHandler.java:301)
>  ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:54)
>  ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:28)
>  ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList.hasNext(HiveBaseFunctionResultList.java:85)
>  ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at 
> scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:42) 
> ~[scala-library-2.11.8.jar:?]
>   at scala.collection.Iterator$class.foreach(Iterator.scala:893) 
> ~[scala-library-2.11.8.jar:?]
>   at scala.collection.AbstractIterator.foreach(Iterator.scala:1336) 
> ~[scala-library-2.11.8.jar:?]
>   at 
> org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$1$$anonfun$apply$12.apply(AsyncRDDActions.scala:127)
>  ~[spark-core_2.11-2.0.0.jar:2.0.0]
>   at 
> org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$1$$anonfun$apply$12.apply(AsyncRDDActions.scala:127)
>  ~[spark-core_2.11-2.0.0.jar:2.0.0]
>   at 
> org.apache.spark.SparkContext$$anonfun$33.apply(SparkContext.scala:1974) 
> ~[spark-core_2.11-2.0.0.jar:2.0.0]
>   at 
> org.apache.spark.SparkContext$$anonfun$33.apply(SparkContext.scala:1974) 
> ~[spark-core_2.11-2.0.0.jar:2.0.0]
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70) 
> ~[spark-core_2.11-2.0.0.jar:2.0.0]
>   at org.apache.spark.scheduler.Task.run(Task.scala:85) 
> ~[spark-core_2.11-2.0.0.jar:2.0.0]
>   at 
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274) 
> ~[spark-core_2.11-2.0.0.jar:2.0.0]
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  [?:1.8.0_112]
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  [?:1.8.0_112]
>   at java.lang.Thread.run(Thread.java:745) [?:1.8.0_112]
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 1
>   at 
> org.apache.hadoop.hive.ql.exec.vector.VectorGroupKeyHelper.copyGroupKey(VectorGroupKeyHelper.java:107)
>  ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.exec.vector.VectorGroupByOperator$ProcessingModeReduceMergePartial.doProcessBatch(VectorGroupByOperator.java:832)
>  ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.exec.vector.VectorGroupByOperator$ProcessingModeBase.processBatch(VectorGroupByOperator.java:179)
>  ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.exec.vector.VectorGroupByOperator.process(VectorGroupByOperator.java:1035)
>  ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at 
> 

[jira] [Updated] (HIVE-16823) "ArrayIndexOutOfBoundsException" in spark_vectorized_dynamic_partition_pruning.q

2017-08-21 Thread liyunzhang_intel (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

liyunzhang_intel updated HIVE-16823:

Attachment: HIVE-16823.patch

In HIVE-15269 (Dynamic Min-Max/BloomFilter runtime-filtering for Tez), 
ConstantPropagate was removed from TezCompiler#runDynamicPartitionPruning. The 
similar code should be removed from SparkCompiler#runDynamicPartitionPruning:
{code}
  private void runDynamicPartitionPruning(OptimizeTezProcContext procCtx,
      Set<ReadEntity> inputs, Set<WriteEntity> outputs) throws SemanticException {

    if (!procCtx.conf.getBoolVar(ConfVars.TEZ_DYNAMIC_PARTITION_PRUNING)) {
      return;
    }

    // Sequence of TableScan operators to be walked
    Deque<Operator<?>> deque = new LinkedList<Operator<?>>();
    deque.addAll(procCtx.parseContext.getTopOps().values());

    Map<Rule, NodeProcessor> opRules = new LinkedHashMap<Rule, NodeProcessor>();
    opRules.put(
        new RuleRegExp(new String("Dynamic Partition Pruning"), FilterOperator.getOperatorName()
            + "%"), new DynamicPartitionPruningOptimization());

    // The dispatcher fires the processor corresponding to the closest matching
    // rule and passes the context along
    Dispatcher disp = new DefaultRuleDispatcher(null, opRules, procCtx);
    List<Node> topNodes = new ArrayList<Node>();
    topNodes.addAll(procCtx.parseContext.getTopOps().values());
    GraphWalker ogw = new ForwardWalker(disp);
    ogw.startWalking(topNodes, null);

    /* Similar code was removed from TezCompiler in HIVE-15269: Dynamic
       Min-Max/BloomFilter runtime-filtering for Tez */
    // need a new run of the constant folding because we might have created lots
    // of "and true and true" conditions.
    // Rather than run the full constant folding just need to shortcut AND/OR
    // expressions involving constant true/false values.
    if (procCtx.conf.getBoolVar(ConfVars.HIVEOPTCONSTANTPROPAGATION)) {
      new ConstantPropagate(ConstantPropagateOption.SHORTCUT).transform(procCtx.parseContext);
    }
  }

{code}
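For reference, the proposed removal would look roughly like this (a sketch 
mirroring the HIVE-15269 change to TezCompiler, not necessarily the final 
patch):
{code}
// In SparkCompiler#runDynamicPartitionPruning, delete the trailing
// ConstantPropagate run, exactly as HIVE-15269 did in TezCompiler:
-    if (procCtx.conf.getBoolVar(ConfVars.HIVEOPTCONSTANTPROPAGATION)) {
-      new ConstantPropagate(ConstantPropagateOption.SHORTCUT).transform(procCtx.parseContext);
-    }
{code}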
[~lirui], [~stakiar]: can you help review?


> "ArrayIndexOutOfBoundsException" in 
> spark_vectorized_dynamic_partition_pruning.q
> 
>
> Key: HIVE-16823
> URL: https://issues.apache.org/jira/browse/HIVE-16823
> Project: Hive
>  Issue Type: Bug
>Reporter: Jianguo Tian
>Assignee: liyunzhang_intel
> Attachments: explain.spark, explain.tez, HIVE-16823.patch
>
>
> spark_vectorized_dynamic_partition_pruning.q
> {code}
> set hive.optimize.ppd=true;
> set hive.ppd.remove.duplicatefilters=true;
> set hive.spark.dynamic.partition.pruning=true;
> set hive.optimize.metadataonly=false;
> set hive.optimize.index.filter=true;
> set hive.vectorized.execution.enabled=true;
> set hive.strict.checks.cartesian.product=false;
> -- parent is reduce tasks
> select count(*) from srcpart join (select ds as ds, ds as `date` from srcpart 
> group by ds) s on (srcpart.ds = s.ds) where s.`date` = '2008-04-08';
> {code}
> The exceptions are as follows:
> {code}
> 2017-06-05T09:20:31,468 ERROR [Executor task launch worker-0] 
> spark.SparkReduceRecordHandler: Fatal error: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Error while processing 
> vector batch (tag=0) Column vector types: 0:BYTES, 1:BYTES
> ["2008-04-08", "2008-04-08"]
> org.apache.hadoop.hive.ql.metadata.HiveException: Error while processing 
> vector batch (tag=0) Column vector types: 0:BYTES, 1:BYTES
> ["2008-04-08", "2008-04-08"]
>   at 
> org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processVectors(SparkReduceRecordHandler.java:413)
>  ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processRow(SparkReduceRecordHandler.java:301)
>  ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:54)
>  ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:28)
>  ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList.hasNext(HiveBaseFunctionResultList.java:85)
>  ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at 
> scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:42) 
> ~[scala-library-2.11.8.jar:?]
>   at scala.collection.Iterator$class.foreach(Iterator.scala:893) 
> ~[scala-library-2.11.8.jar:?]
>   at scala.collection.AbstractIterator.foreach(Iterator.scala:1336) 
> ~[scala-library-2.11.8.jar:?]
>   at 
> org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$1$$anonfun$apply$12.apply(AsyncRDDActions.scala:127)
>  

[jira] [Commented] (HIVE-17362) The MAX_PREWARM_TIME should be configurable on HoS

2017-08-21 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16136269#comment-16136269
 ] 

Hive QA commented on HIVE-17362:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12882950/HIVE-17362.2.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 8 failed/errored test(s), 10994 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning]
 (batchId=169)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] 
(batchId=100)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=235)
org.apache.hadoop.hive.common.TestFileUtils.testCopyWithDistCpAs (batchId=250)
org.apache.hadoop.hive.common.TestFileUtils.testCopyWithDistcp (batchId=250)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=180)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=180)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=180)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6476/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6476/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6476/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 8 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12882950 - PreCommit-HIVE-Build

> The MAX_PREWARM_TIME should be configurable on HoS
> --
>
> Key: HIVE-17362
> URL: https://issues.apache.org/jira/browse/HIVE-17362
> Project: Hive
>  Issue Type: Improvement
>  Components: Spark
>Affects Versions: 3.0.0
>Reporter: Peter Vary
>Assignee: Peter Vary
> Attachments: HIVE-17362.2.patch, HIVE-17362.patch
>
>
> When HIVE_PREWARM_ENABLED is set, we wait up to MAX_PREWARM_TIME for the 
> containers to warm up. This is currently hardcoded to 5s, which is often not 
> enough for a Spark session to initialize the executors. We should be able to 
> configure this, so we can set a value which has an effect.
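> A minimal sketch of the idea (the ConfVars entry and helper below are 
> hypothetical, shown only for illustration):
> {code}
> // Read the prewarm wait from configuration instead of the hardcoded 5s.
> // HIVE_PREWARM_SPARK_TIMEOUT is a hypothetical ConfVars entry.
> long prewarmTimeoutMs = HiveConf.getTimeVar(conf,
>     HiveConf.ConfVars.HIVE_PREWARM_SPARK_TIMEOUT, TimeUnit.MILLISECONDS);
> sparkSessionManager.awaitPrewarmedExecutors(prewarmTimeoutMs); // hypothetical helper
> {code}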



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Comment Edited] (HIVE-16823) "ArrayIndexOutOfBoundsException" in spark_vectorized_dynamic_partition_pruning.q

2017-08-21 Thread liyunzhang_intel (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16136156#comment-16136156
 ] 

liyunzhang_intel edited comment on HIVE-16823 at 8/22/17 2:16 AM:
--

Some updates about the jira.
The root cause of the problem is the difference in how the sub-query {{select 
ds as ds, ds as `date` from srcpart group by ds}} is planned between Tez and 
Spark mode.
The Spark explain (the full Spark explain is attached 
[here|https://issues.apache.org/jira/secure/attachment/12883036/explain.spark]):
{code}
  Map 3 
Map Operator Tree:
TableScan
  alias: srcpart
  filterExpr: (ds = '2008-04-08') (type: boolean)
  Statistics: Num rows: 1 Data size: 11624 Basic stats: PARTIAL 
Column stats: NONE
  Select Operator
Statistics: Num rows: 1 Data size: 11624 Basic stats: 
PARTIAL Column stats: NONE
Group By Operator
  keys: '2008-04-08' (type: string)
  mode: hash
  outputColumnNames: _col0
  Statistics: Num rows: 1 Data size: 11624 Basic stats: 
COMPLETE Column stats: NONE
  Reduce Output Operator
key expressions: '2008-04-08' (type: string)
sort order: +
Map-reduce partition columns: '2008-04-08' (type: 
string)
Statistics: Num rows: 1 Data size: 11624 Basic stats: 
COMPLETE Column stats: NONE
 Reducer 4 
Local Work:
  Map Reduce Local Work
Reduce Operator Tree:
  Group By Operator
keys: '2008-04-08' (type: string)
mode: mergepartial
outputColumnNames: _col0
Statistics: Num rows: 1 Data size: 11624 Basic stats: COMPLETE 
Column stats: NONE
{code}

The Tez explain (the full Tez explain is attached 
[here|https://issues.apache.org/jira/secure/attachment/12883035/explain.tez]):
{code}
  Map 2 
Map Operator Tree:
TableScan
  alias: srcpart
  filterExpr: (ds = '2008-04-08') (type: boolean)
  Statistics: Num rows: 1 Data size: 11624 Basic stats: PARTIAL 
Column stats: NONE
  Select Operator
Statistics: Num rows: 1 Data size: 11624 Basic stats: 
PARTIAL Column stats: NONE
Group By Operator
  keys: '2008-04-08' (type: string)
  mode: hash
  outputColumnNames: _col0
  Statistics: Num rows: 1 Data size: 11624 Basic stats: 
COMPLETE Column stats: NONE
  Reduce Output Operator
key expressions: _col0 (type: string)
sort order: +
Map-reduce partition columns: _col0 (type: string)
Statistics: Num rows: 1 Data size: 11624 Basic stats: 
COMPLETE Column stats: NONE
Execution mode: vectorized
Reducer 3 
Execution mode: vectorized
Reduce Operator Tree:
  Group By Operator
keys: KEY._col0 (type: string)
mode: mergepartial
outputColumnNames: _col0
{code}

The Group By Operator appears in both the Map and the Reducer in Tez and Spark 
mode, but the keys of the GroupByOperator in the Reducer differ. In Tez the key 
is {{keys: KEY._col0 (type: string)}}, while in Spark it is {{keys: '2008-04-08' 
(type: string)}}. This difference causes 
[VectorizationContext#getVectorExpression|https://github.com/kellyzly/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorizationContext.java#L579]
 to return 
[getColumnVectorExpression|https://github.com/kellyzly/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorizationContext.java#L582]
 in Tez mode but 
[getConstantVectorExpression|https://github.com/kellyzly/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorizationContext.java#L660]
 in Spark mode.


was (Author: kellyzly):
some updates about the jira.
 The root cause of the problem is because the difference of sub-query {{select 
ds as ds, ds as `date` from srcpart group by ds}} between tez and spark mode.
the spark explain(the full spark explain is attached in 
[here|https://issues.apache.org/jira/secure/attachment/12883036/explain.spark] )
{code}
  Map 3 
Map Operator Tree:
TableScan
  alias: srcpart
  filterExpr: (ds = '2008-04-08') (type: boolean)
  Statistics: Num rows: 1 Data size: 11624 Basic stats: PARTIAL 
Column stats: NONE
  Select Operator
Statistics: Num rows: 1 Data size: 11624 Basic stats: 
PARTIAL 

[jira] [Commented] (HIVE-16823) "ArrayIndexOutOfBoundsException" in spark_vectorized_dynamic_partition_pruning.q

2017-08-21 Thread liyunzhang_intel (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16136156#comment-16136156
 ] 

liyunzhang_intel commented on HIVE-16823:
-

Some updates about the jira.
The root cause of the problem is the difference in how the sub-query {{select 
ds as ds, ds as `date` from srcpart group by ds}} is planned between Tez and 
Spark mode.
The Spark explain (the full Spark explain is attached 
[here|https://issues.apache.org/jira/secure/attachment/12883036/explain.spark]):
{code}
  Map 3 
Map Operator Tree:
TableScan
  alias: srcpart
  filterExpr: (ds = '2008-04-08') (type: boolean)
  Statistics: Num rows: 1 Data size: 11624 Basic stats: PARTIAL 
Column stats: NONE
  Select Operator
Statistics: Num rows: 1 Data size: 11624 Basic stats: 
PARTIAL Column stats: NONE
Group By Operator
  keys: '2008-04-08' (type: string)
  mode: hash
  outputColumnNames: _col0
  Statistics: Num rows: 1 Data size: 11624 Basic stats: 
COMPLETE Column stats: NONE
  Reduce Output Operator
key expressions: '2008-04-08' (type: string)
sort order: +
Map-reduce partition columns: '2008-04-08' (type: 
string)
Statistics: Num rows: 1 Data size: 11624 Basic stats: 
COMPLETE Column stats: NONE
 Reducer 4 
Local Work:
  Map Reduce Local Work
Reduce Operator Tree:
  Group By Operator
keys: '2008-04-08' (type: string)
mode: mergepartial
outputColumnNames: _col0
Statistics: Num rows: 1 Data size: 11624 Basic stats: COMPLETE 
Column stats: NONE
{code}

The Tez explain (the full Tez explain is attached 
[here|https://issues.apache.org/jira/secure/attachment/12883035/explain.tez]):
{code}
  Map 2 
Map Operator Tree:
TableScan
  alias: srcpart
  filterExpr: (ds = '2008-04-08') (type: boolean)
  Statistics: Num rows: 1 Data size: 11624 Basic stats: PARTIAL 
Column stats: NONE
  Select Operator
Statistics: Num rows: 1 Data size: 11624 Basic stats: 
PARTIAL Column stats: NONE
Group By Operator
  keys: '2008-04-08' (type: string)
  mode: hash
  outputColumnNames: _col0
  Statistics: Num rows: 1 Data size: 11624 Basic stats: 
COMPLETE Column stats: NONE
  Reduce Output Operator
key expressions: _col0 (type: string)
sort order: +
Map-reduce partition columns: _col0 (type: string)
Statistics: Num rows: 1 Data size: 11624 Basic stats: 
COMPLETE Column stats: NONE
Execution mode: vectorized
Reducer 3 
Execution mode: vectorized
Reduce Operator Tree:
  Group By Operator
keys: KEY._col0 (type: string)
mode: mergepartial
outputColumnNames: _col0
{code}

The Group By Operator appears in both the Map and the Reducer in Tez and Spark 
mode, but the keys of the GroupByOperator in the Reducer differ. In Tez the key 
is {{keys: KEY._col0 (type: string)}}, while in Spark it is {{keys: '2008-04-08' 
(type: string)}}. This difference causes 
[VectorizationContext#getVectorExpression|https://github.com/kellyzly/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorizationContext.java#L597]
 to return 
[getColumnVectorExpression|https://github.com/kellyzly/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorizationContext.java#L582]
 in Tez mode but 
[getConstantVectorExpression|https://github.com/kellyzly/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorizationContext.java#L660]
 in Spark mode.
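For illustration, the dispatch inside getVectorExpression looks conceptually 
like the sketch below (a simplification, not the exact Hive source): a column 
key vectorizes to an expression backed by a real batch column, while a constant 
key vectorizes to a constant expression with no backing input column, which is 
consistent with the ArrayIndexOutOfBoundsException seen in 
VectorGroupKeyHelper.copyGroupKey above.
{code}
// Conceptual sketch of VectorizationContext#getVectorExpression (simplified).
// In Tez the reducer key is KEY._col0 (an ExprNodeColumnDesc); in Spark it is
// the literal '2008-04-08' (an ExprNodeConstantDesc).
VectorExpression ve;
if (exprDesc instanceof ExprNodeColumnDesc) {
  // Tez path: references an actual input column of the batch.
  ve = getColumnVectorExpression((ExprNodeColumnDesc) exprDesc, mode);
} else if (exprDesc instanceof ExprNodeConstantDesc) {
  // Spark path: a constant expression that allocates no backing input column.
  ve = getConstantVectorExpression(((ExprNodeConstantDesc) exprDesc).getValue(),
      exprDesc.getTypeInfo(), mode);
}
{code}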

> "ArrayIndexOutOfBoundsException" in 
> spark_vectorized_dynamic_partition_pruning.q
> 
>
> Key: HIVE-16823
> URL: https://issues.apache.org/jira/browse/HIVE-16823
> Project: Hive
>  Issue Type: Bug
>Reporter: Jianguo Tian
>Assignee: liyunzhang_intel
> Attachments: explain.spark, explain.tez
>
>
> spark_vectorized_dynamic_partition_pruning.q
> {code}
> set hive.optimize.ppd=true;
> set hive.ppd.remove.duplicatefilters=true;
> set hive.spark.dynamic.partition.pruning=true;
> set hive.optimize.metadataonly=false;
> set hive.optimize.index.filter=true;
> set hive.vectorized.execution.enabled=true;
> set 

[jira] [Updated] (HIVE-16823) "ArrayIndexOutOfBoundsException" in spark_vectorized_dynamic_partition_pruning.q

2017-08-21 Thread liyunzhang_intel (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

liyunzhang_intel updated HIVE-16823:

Attachment: explain.spark
explain.tez

> "ArrayIndexOutOfBoundsException" in 
> spark_vectorized_dynamic_partition_pruning.q
> 
>
> Key: HIVE-16823
> URL: https://issues.apache.org/jira/browse/HIVE-16823
> Project: Hive
>  Issue Type: Bug
>Reporter: Jianguo Tian
>Assignee: liyunzhang_intel
> Attachments: explain.spark, explain.tez
>
>
> spark_vectorized_dynamic_partition_pruning.q
> {code}
> set hive.optimize.ppd=true;
> set hive.ppd.remove.duplicatefilters=true;
> set hive.spark.dynamic.partition.pruning=true;
> set hive.optimize.metadataonly=false;
> set hive.optimize.index.filter=true;
> set hive.vectorized.execution.enabled=true;
> set hive.strict.checks.cartesian.product=false;
> -- parent is reduce tasks
> select count(*) from srcpart join (select ds as ds, ds as `date` from srcpart 
> group by ds) s on (srcpart.ds = s.ds) where s.`date` = '2008-04-08';
> {code}
> The exceptions are as follows:
> {code}
> 2017-06-05T09:20:31,468 ERROR [Executor task launch worker-0] 
> spark.SparkReduceRecordHandler: Fatal error: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Error while processing 
> vector batch (tag=0) Column vector types: 0:BYTES, 1:BYTES
> ["2008-04-08", "2008-04-08"]
> org.apache.hadoop.hive.ql.metadata.HiveException: Error while processing 
> vector batch (tag=0) Column vector types: 0:BYTES, 1:BYTES
> ["2008-04-08", "2008-04-08"]
>   at 
> org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processVectors(SparkReduceRecordHandler.java:413)
>  ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processRow(SparkReduceRecordHandler.java:301)
>  ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:54)
>  ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:28)
>  ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList.hasNext(HiveBaseFunctionResultList.java:85)
>  ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at 
> scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:42) 
> ~[scala-library-2.11.8.jar:?]
>   at scala.collection.Iterator$class.foreach(Iterator.scala:893) 
> ~[scala-library-2.11.8.jar:?]
>   at scala.collection.AbstractIterator.foreach(Iterator.scala:1336) 
> ~[scala-library-2.11.8.jar:?]
>   at 
> org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$1$$anonfun$apply$12.apply(AsyncRDDActions.scala:127)
>  ~[spark-core_2.11-2.0.0.jar:2.0.0]
>   at 
> org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$1$$anonfun$apply$12.apply(AsyncRDDActions.scala:127)
>  ~[spark-core_2.11-2.0.0.jar:2.0.0]
>   at 
> org.apache.spark.SparkContext$$anonfun$33.apply(SparkContext.scala:1974) 
> ~[spark-core_2.11-2.0.0.jar:2.0.0]
>   at 
> org.apache.spark.SparkContext$$anonfun$33.apply(SparkContext.scala:1974) 
> ~[spark-core_2.11-2.0.0.jar:2.0.0]
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70) 
> ~[spark-core_2.11-2.0.0.jar:2.0.0]
>   at org.apache.spark.scheduler.Task.run(Task.scala:85) 
> ~[spark-core_2.11-2.0.0.jar:2.0.0]
>   at 
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274) 
> ~[spark-core_2.11-2.0.0.jar:2.0.0]
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  [?:1.8.0_112]
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  [?:1.8.0_112]
>   at java.lang.Thread.run(Thread.java:745) [?:1.8.0_112]
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 1
>   at 
> org.apache.hadoop.hive.ql.exec.vector.VectorGroupKeyHelper.copyGroupKey(VectorGroupKeyHelper.java:107)
>  ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.exec.vector.VectorGroupByOperator$ProcessingModeReduceMergePartial.doProcessBatch(VectorGroupByOperator.java:832)
>  ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.exec.vector.VectorGroupByOperator$ProcessingModeBase.processBatch(VectorGroupByOperator.java:179)
>  ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.exec.vector.VectorGroupByOperator.process(VectorGroupByOperator.java:1035)
>  ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at 
> 

[jira] [Updated] (HIVE-17368) DBTokenStore fails to connect in Kerberos enabled remote HMS environment

2017-08-21 Thread Vihang Karajgaonkar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vihang Karajgaonkar updated HIVE-17368:
---
Attachment: HIVE-17368.01.patch

Adding the first version of the patch. Modified the existing test 
{{TestJdbcWithDBTokenStore}} so that it now uses a secure remote HMS.

> DBTokenStore fails to connect in Kerberos enabled remote HMS environment
> 
>
> Key: HIVE-17368
> URL: https://issues.apache.org/jira/browse/HIVE-17368
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.1.0, 2.0.0, 2.1.0, 2.2.0
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
> Attachments: HIVE-17368.01.patch
>
>
> In setups where HMS runs as a remote process secured using Kerberos, and 
> when {{DBTokenStore}} is configured as the token store, HS2 Thrift API calls 
> like {{GetDelegationToken}}, {{CancelDelegationToken}} and 
> {{RenewDelegationToken}} fail with the exception trace seen below. HS2 is not 
> able to invoke the HMS APIs needed to add/remove/renew tokens from the DB, 
> since it is possible that the user which issues the {{GetDelegationToken}} is 
> not Kerberos enabled.
> E.g. Oozie submits a job on behalf of user "Joe". When Oozie opens a session 
> with HS2, it uses Oozie's principal and creates a proxy UGI with Hive. This 
> principal can establish a transport authenticated using Kerberos. It stores 
> the HMS delegation token string in the sessionConf and sessionToken. Now, 
> let's say Oozie issues a {{GetDelegationToken}} which has {{Joe}} as the owner 
> and {{oozie}} as the renewer in {{GetDelegationTokenReq}}. This API call 
> cannot instantiate an HMSClient and open a transport to HMS using the HMSToken 
> string available in the sessionConf, since DBTokenStore uses the server 
> HiveConf instead of the sessionConf. It tries to establish the transport using 
> Kerberos, and it fails since user Joe is not Kerberos enabled.
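> A conceptual sketch of one possible direction (the wiring below is an 
> assumption for illustration, not the actual fix): open the HMS client from 
> the session's delegation token instead of falling back to the server 
> HiveConf and a Kerberos handshake on behalf of a non-Kerberos user.
> {code}
> // Conceptual sketch only; the token signature value and variable names are
> // assumptions. Register the session's HMS delegation token with the UGI and
> // a per-session conf so HiveMetaStoreClient authenticates with the token.
> HiveConf clientConf = new HiveConf(sessionConf, HiveConf.class);
> clientConf.setVar(HiveConf.ConfVars.METASTORE_TOKEN_SIGNATURE, "hmsTokenSig");
> Utils.setTokenStr(UserGroupInformation.getCurrentUser(),
>     hmsDelegationTokenStr, "hmsTokenSig");
> IMetaStoreClient client = new HiveMetaStoreClient(clientConf);
> {code}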
> I see the following exception trace in HS2 logs.
> {noformat}
> 2017-08-21T18:07:19,644 ERROR [HiveServer2-Handler-Pool: Thread-61] 
> transport.TSaslTransport: SASL negotiation failure
> javax.security.sasl.SaslException: GSS initiate failed
> at 
> com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:211)
>  ~[?:1.8.0_121]
> at 
> org.apache.thrift.transport.TSaslClientTransport.handleSaslStartMessage(TSaslClientTransport.java:94)
>  ~[libthrift-0.9.3.jar:0.9.3]
> at 
> org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:271) 
> [libthrift-0.9.3.jar:0.9.3]
> at 
> org.apache.thrift.transport.TSaslClientTransport.open(TSaslClientTransport.java:37)
>  [libthrift-0.9.3.jar:0.9.3]
> at 
> org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:52)
>  [hive-shims-common-2.3.0-SNAPSHOT.jar:2.3.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:49)
>  [hive-shims-common-2.3.0-SNAPSHOT.jar:2.3.0-SNAPSHOT]
> at java.security.AccessController.doPrivileged(Native Method) 
> ~[?:1.8.0_121]
> at javax.security.auth.Subject.doAs(Subject.java:422) [?:1.8.0_121]
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>  [hadoop-common-2.7.2.jar:?]
> at 
> org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport.open(TUGIAssumingTransport.java:49)
>  [hive-shims-common-2.3.0-SNAPSHOT.jar:2.3.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java:488)
>  [hive-metastore-2.3.0-SNAPSHOT.jar:2.3.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:255)
>  [hive-metastore-2.3.0-SNAPSHOT.jar:2.3.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.<init>(SessionHiveMetaStoreClient.java:70)
>  [hive-exec-2.3.0-SNAPSHOT.jar:2.3.0-SNAPSHOT]
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native 
> Method) ~[?:1.8.0_121]
> at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>  [?:1.8.0_121]
> at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>  [?:1.8.0_121]
> at java.lang.reflect.Constructor.newInstance(Constructor.java:423) 
> [?:1.8.0_121]
> at 
> org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1699)
>  [hive-metastore-2.3.0-SNAPSHOT.jar:2.3.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.<init>(RetryingMetaStoreClient.java:83)
>  [hive-metastore-2.3.0-SNAPSHOT.jar:2.3.0-SNAPSHOT]
> at 
> 

[jira] [Updated] (HIVE-17368) DBTokenStore fails to connect in Kerberos enabled remote HMS environment

2017-08-21 Thread Vihang Karajgaonkar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vihang Karajgaonkar updated HIVE-17368:
---
Status: Patch Available  (was: Open)

> DBTokenStore fails to connect in Kerberos enabled remote HMS environment
> 
>
> Key: HIVE-17368
> URL: https://issues.apache.org/jira/browse/HIVE-17368
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.2.0, 2.1.0, 2.0.0, 1.1.0
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
> Attachments: HIVE-17368.01.patch
>
>
> In setups where HMS runs as a remote process secured using Kerberos, and 
> when {{DBTokenStore}} is configured as the token store, HS2 Thrift API calls 
> like {{GetDelegationToken}}, {{CancelDelegationToken}} and 
> {{RenewDelegationToken}} fail with the exception trace seen below. HS2 is not 
> able to invoke the HMS APIs needed to add/remove/renew tokens from the DB, 
> since it is possible that the user which issues the {{GetDelegationToken}} is 
> not Kerberos enabled.
> E.g. Oozie submits a job on behalf of user "Joe". When Oozie opens a session 
> with HS2, it uses Oozie's principal and creates a proxy UGI with Hive. This 
> principal can establish a transport authenticated using Kerberos. It stores 
> the HMS delegation token string in the sessionConf and sessionToken. Now, 
> let's say Oozie issues a {{GetDelegationToken}} which has {{Joe}} as the owner 
> and {{oozie}} as the renewer in {{GetDelegationTokenReq}}. This API call 
> cannot instantiate an HMSClient and open a transport to HMS using the HMSToken 
> string available in the sessionConf, since DBTokenStore uses the server 
> HiveConf instead of the sessionConf. It tries to establish the transport using 
> Kerberos, and it fails since user Joe is not Kerberos enabled.
> I see the following exception trace in HS2 logs.
> {noformat}
> 2017-08-21T18:07:19,644 ERROR [HiveServer2-Handler-Pool: Thread-61] 
> transport.TSaslTransport: SASL negotiation failure
> javax.security.sasl.SaslException: GSS initiate failed
> at 
> com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:211)
>  ~[?:1.8.0_121]
> at 
> org.apache.thrift.transport.TSaslClientTransport.handleSaslStartMessage(TSaslClientTransport.java:94)
>  ~[libthrift-0.9.3.jar:0.9.3]
> at 
> org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:271) 
> [libthrift-0.9.3.jar:0.9.3]
> at 
> org.apache.thrift.transport.TSaslClientTransport.open(TSaslClientTransport.java:37)
>  [libthrift-0.9.3.jar:0.9.3]
> at 
> org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:52)
>  [hive-shims-common-2.3.0-SNAPSHOT.jar:2.3.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:49)
>  [hive-shims-common-2.3.0-SNAPSHOT.jar:2.3.0-SNAPSHOT]
> at java.security.AccessController.doPrivileged(Native Method) 
> ~[?:1.8.0_121]
> at javax.security.auth.Subject.doAs(Subject.java:422) [?:1.8.0_121]
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>  [hadoop-common-2.7.2.jar:?]
> at 
> org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport.open(TUGIAssumingTransport.java:49)
>  [hive-shims-common-2.3.0-SNAPSHOT.jar:2.3.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java:488)
>  [hive-metastore-2.3.0-SNAPSHOT.jar:2.3.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:255)
>  [hive-metastore-2.3.0-SNAPSHOT.jar:2.3.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.<init>(SessionHiveMetaStoreClient.java:70)
>  [hive-exec-2.3.0-SNAPSHOT.jar:2.3.0-SNAPSHOT]
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native 
> Method) ~[?:1.8.0_121]
> at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>  [?:1.8.0_121]
> at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>  [?:1.8.0_121]
> at java.lang.reflect.Constructor.newInstance(Constructor.java:423) 
> [?:1.8.0_121]
> at 
> org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1699)
>  [hive-metastore-2.3.0-SNAPSHOT.jar:2.3.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.<init>(RetryingMetaStoreClient.java:83)
>  [hive-metastore-2.3.0-SNAPSHOT.jar:2.3.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:133)
>  [hive-metastore-2.3.0-SNAPSHOT.jar:2.3.0-SNAPSHOT]
> at 
> 

[jira] [Updated] (HIVE-17205) add functional support

2017-08-21 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-17205:
--
Attachment: HIVE-17205.11.patch

> add functional support
> --
>
> Key: HIVE-17205
> URL: https://issues.apache.org/jira/browse/HIVE-17205
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
> Attachments: HIVE-17205.01.patch, HIVE-17205.02.patch, 
> HIVE-17205.03.patch, HIVE-17205.09.patch, HIVE-17205.10.patch, 
> HIVE-17205.11.patch
>
>
> make sure unbucketed tables can be marked transactional=true
> make insert/update/delete/compaction work



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (HIVE-16823) "ArrayIndexOutOfBoundsException" in spark_vectorized_dynamic_partition_pruning.q

2017-08-21 Thread liyunzhang_intel (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

liyunzhang_intel reassigned HIVE-16823:
---

Assignee: liyunzhang_intel

> "ArrayIndexOutOfBoundsException" in 
> spark_vectorized_dynamic_partition_pruning.q
> 
>
> Key: HIVE-16823
> URL: https://issues.apache.org/jira/browse/HIVE-16823
> Project: Hive
>  Issue Type: Bug
>Reporter: Jianguo Tian
>Assignee: liyunzhang_intel
>
> spark_vectorized_dynamic_partition_pruning.q
> {code}
> set hive.optimize.ppd=true;
> set hive.ppd.remove.duplicatefilters=true;
> set hive.spark.dynamic.partition.pruning=true;
> set hive.optimize.metadataonly=false;
> set hive.optimize.index.filter=true;
> set hive.vectorized.execution.enabled=true;
> set hive.strict.checks.cartesian.product=false;
> -- parent is reduce tasks
> select count(*) from srcpart join (select ds as ds, ds as `date` from srcpart 
> group by ds) s on (srcpart.ds = s.ds) where s.`date` = '2008-04-08';
> {code}
> The exceptions are as follows:
> {code}
> 2017-06-05T09:20:31,468 ERROR [Executor task launch worker-0] 
> spark.SparkReduceRecordHandler: Fatal error: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Error while processing 
> vector batch (tag=0) Column vector types: 0:BYTES, 1:BYTES
> ["2008-04-08", "2008-04-08"]
> org.apache.hadoop.hive.ql.metadata.HiveException: Error while processing 
> vector batch (tag=0) Column vector types: 0:BYTES, 1:BYTES
> ["2008-04-08", "2008-04-08"]
>   at 
> org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processVectors(SparkReduceRecordHandler.java:413)
>  ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processRow(SparkReduceRecordHandler.java:301)
>  ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:54)
>  ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:28)
>  ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList.hasNext(HiveBaseFunctionResultList.java:85)
>  ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at 
> scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:42) 
> ~[scala-library-2.11.8.jar:?]
>   at scala.collection.Iterator$class.foreach(Iterator.scala:893) 
> ~[scala-library-2.11.8.jar:?]
>   at scala.collection.AbstractIterator.foreach(Iterator.scala:1336) 
> ~[scala-library-2.11.8.jar:?]
>   at 
> org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$1$$anonfun$apply$12.apply(AsyncRDDActions.scala:127)
>  ~[spark-core_2.11-2.0.0.jar:2.0.0]
>   at 
> org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$1$$anonfun$apply$12.apply(AsyncRDDActions.scala:127)
>  ~[spark-core_2.11-2.0.0.jar:2.0.0]
>   at 
> org.apache.spark.SparkContext$$anonfun$33.apply(SparkContext.scala:1974) 
> ~[spark-core_2.11-2.0.0.jar:2.0.0]
>   at 
> org.apache.spark.SparkContext$$anonfun$33.apply(SparkContext.scala:1974) 
> ~[spark-core_2.11-2.0.0.jar:2.0.0]
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70) 
> ~[spark-core_2.11-2.0.0.jar:2.0.0]
>   at org.apache.spark.scheduler.Task.run(Task.scala:85) 
> ~[spark-core_2.11-2.0.0.jar:2.0.0]
>   at 
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274) 
> ~[spark-core_2.11-2.0.0.jar:2.0.0]
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  [?:1.8.0_112]
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  [?:1.8.0_112]
>   at java.lang.Thread.run(Thread.java:745) [?:1.8.0_112]
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 1
>   at 
> org.apache.hadoop.hive.ql.exec.vector.VectorGroupKeyHelper.copyGroupKey(VectorGroupKeyHelper.java:107)
>  ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.exec.vector.VectorGroupByOperator$ProcessingModeReduceMergePartial.doProcessBatch(VectorGroupByOperator.java:832)
>  ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.exec.vector.VectorGroupByOperator$ProcessingModeBase.processBatch(VectorGroupByOperator.java:179)
>  ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.exec.vector.VectorGroupByOperator.process(VectorGroupByOperator.java:1035)
>  ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at 
> 

[jira] [Commented] (HIVE-17366) Constraint replication in bootstrap

2017-08-21 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16136130#comment-16136130
 ] 

Hive QA commented on HIVE-17366:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12882946/HIVE-17366.1.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 8 failed/errored test(s), 10994 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[columnstats_part_coltype]
 (batchId=159)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning]
 (batchId=169)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] 
(batchId=100)
org.apache.hadoop.hive.common.TestFileUtils.testCopyWithDistCpAs (batchId=250)
org.apache.hadoop.hive.common.TestFileUtils.testCopyWithDistcp (batchId=250)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=180)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=180)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=180)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6475/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6475/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6475/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 8 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12882946 - PreCommit-HIVE-Build

> Constraint replication in bootstrap
> ---
>
> Key: HIVE-17366
> URL: https://issues.apache.org/jira/browse/HIVE-17366
> Project: Hive
>  Issue Type: New Feature
>  Components: repl
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Attachments: HIVE-17366.1.patch
>
>
> Incremental constraint replication is tracked in HIVE-15705. This is to track 
> the bootstrap replication.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17368) DBTokenStore fails to connect in Kerberos enabled remote HMS environment

2017-08-21 Thread Vihang Karajgaonkar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vihang Karajgaonkar updated HIVE-17368:
---
Affects Version/s: 1.1.0
   2.0.0
   2.1.0
   2.2.0

> DBTokenStore fails to connect in Kerberos enabled remote HMS environment
> 
>
> Key: HIVE-17368
> URL: https://issues.apache.org/jira/browse/HIVE-17368
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.1.0, 2.0.0, 2.1.0, 2.2.0
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
>
> In setups where HMS runs as a remote process secured using Kerberos, and 
> when {{DBTokenStore}} is configured as the token store, HS2 Thrift API calls 
> like {{GetDelegationToken}}, {{CancelDelegationToken}} and 
> {{RenewDelegationToken}} fail with the exception trace seen below. HS2 is not 
> able to invoke the HMS APIs needed to add/remove/renew tokens from the DB, 
> since it is possible that the user which issues the {{GetDelegationToken}} is 
> not Kerberos enabled.
> E.g. Oozie submits a job on behalf of user "Joe". When Oozie opens a session 
> with HS2, it uses Oozie's principal and creates a proxy UGI with Hive. This 
> principal can establish a transport authenticated using Kerberos. It stores 
> the HMS delegation token string in the sessionConf and sessionToken. Now, 
> let's say Oozie issues a {{GetDelegationToken}} which has {{Joe}} as the owner 
> and {{oozie}} as the renewer in {{GetDelegationTokenReq}}. This API call 
> cannot instantiate an HMSClient and open a transport to HMS using the HMSToken 
> string available in the sessionConf, since DBTokenStore uses the server 
> HiveConf instead of the sessionConf. It tries to establish the transport using 
> Kerberos, and it fails since user Joe is not Kerberos enabled.
> I see the following exception trace in HS2 logs.
> {noformat}
> 2017-08-21T18:07:19,644 ERROR [HiveServer2-Handler-Pool: Thread-61] 
> transport.TSaslTransport: SASL negotiation failure
> javax.security.sasl.SaslException: GSS initiate failed
> at 
> com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:211)
>  ~[?:1.8.0_121]
> at 
> org.apache.thrift.transport.TSaslClientTransport.handleSaslStartMessage(TSaslClientTransport.java:94)
>  ~[libthrift-0.9.3.jar:0.9.3]
> at 
> org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:271) 
> [libthrift-0.9.3.jar:0.9.3]
> at 
> org.apache.thrift.transport.TSaslClientTransport.open(TSaslClientTransport.java:37)
>  [libthrift-0.9.3.jar:0.9.3]
> at 
> org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:52)
>  [hive-shims-common-2.3.0-SNAPSHOT.jar:2.3.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:49)
>  [hive-shims-common-2.3.0-SNAPSHOT.jar:2.3.0-SNAPSHOT]
> at java.security.AccessController.doPrivileged(Native Method) 
> ~[?:1.8.0_121]
> at javax.security.auth.Subject.doAs(Subject.java:422) [?:1.8.0_121]
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>  [hadoop-common-2.7.2.jar:?]
> at 
> org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport.open(TUGIAssumingTransport.java:49)
>  [hive-shims-common-2.3.0-SNAPSHOT.jar:2.3.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java:488)
>  [hive-metastore-2.3.0-SNAPSHOT.jar:2.3.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:255)
>  [hive-metastore-2.3.0-SNAPSHOT.jar:2.3.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.<init>(SessionHiveMetaStoreClient.java:70)
>  [hive-exec-2.3.0-SNAPSHOT.jar:2.3.0-SNAPSHOT]
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native 
> Method) ~[?:1.8.0_121]
> at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>  [?:1.8.0_121]
> at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>  [?:1.8.0_121]
> at java.lang.reflect.Constructor.newInstance(Constructor.java:423) 
> [?:1.8.0_121]
> at 
> org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1699)
>  [hive-metastore-2.3.0-SNAPSHOT.jar:2.3.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.<init>(RetryingMetaStoreClient.java:83)
>  [hive-metastore-2.3.0-SNAPSHOT.jar:2.3.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:133)
>  

[jira] [Assigned] (HIVE-17368) DBTokenStore fails to connect in Kerberos enabled remote HMS environment

2017-08-21 Thread Vihang Karajgaonkar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vihang Karajgaonkar reassigned HIVE-17368:
--


> DBTokenStore fails to connect in Kerberos enabled remote HMS environment
> 
>
> Key: HIVE-17368
> URL: https://issues.apache.org/jira/browse/HIVE-17368
> Project: Hive
>  Issue Type: Bug
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
>
> In setups where HMS runs as a remote process secured using Kerberos, and 
> when {{DBTokenStore}} is configured as the token store, HS2 Thrift API calls 
> like {{GetDelegationToken}}, {{CancelDelegationToken}} and 
> {{RenewDelegationToken}} fail with the exception trace seen below. HS2 is not 
> able to invoke the HMS APIs needed to add/remove/renew tokens from the DB, 
> since it is possible that the user which issues the {{GetDelegationToken}} is 
> not Kerberos enabled.
> E.g. Oozie submits a job on behalf of user "Joe". When Oozie opens a session 
> with HS2, it uses Oozie's principal and creates a proxy UGI with Hive. This 
> principal can establish a transport authenticated using Kerberos. It stores 
> the HMS delegation token string in the sessionConf and sessionToken. Now, 
> let's say Oozie issues a {{GetDelegationToken}} which has {{Joe}} as the owner 
> and {{oozie}} as the renewer in {{GetDelegationTokenReq}}. This API call 
> cannot instantiate an HMSClient and open a transport to HMS using the HMSToken 
> string available in the sessionConf, since DBTokenStore uses the server 
> HiveConf instead of the sessionConf. It tries to establish the transport using 
> Kerberos, and it fails since user Joe is not Kerberos enabled.
> I see the following exception trace in HS2 logs.
> {noformat}
> 2017-08-21T18:07:19,644 ERROR [HiveServer2-Handler-Pool: Thread-61] 
> transport.TSaslTransport: SASL negotiation failure
> javax.security.sasl.SaslException: GSS initiate failed
> at 
> com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:211)
>  ~[?:1.8.0_121]
> at 
> org.apache.thrift.transport.TSaslClientTransport.handleSaslStartMessage(TSaslClientTransport.java:94)
>  ~[libthrift-0.9.3.jar:0.9.3]
> at 
> org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:271) 
> [libthrift-0.9.3.jar:0.9.3]
> at 
> org.apache.thrift.transport.TSaslClientTransport.open(TSaslClientTransport.java:37)
>  [libthrift-0.9.3.jar:0.9.3]
> at 
> org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:52)
>  [hive-shims-common-2.3.0-SNAPSHOT.jar:2.3.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:49)
>  [hive-shims-common-2.3.0-SNAPSHOT.jar:2.3.0-SNAPSHOT]
> at java.security.AccessController.doPrivileged(Native Method) 
> ~[?:1.8.0_121]
> at javax.security.auth.Subject.doAs(Subject.java:422) [?:1.8.0_121]
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>  [hadoop-common-2.7.2.jar:?]
> at 
> org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport.open(TUGIAssumingTransport.java:49)
>  [hive-shims-common-2.3.0-SNAPSHOT.jar:2.3.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java:488)
>  [hive-metastore-2.3.0-SNAPSHOT.jar:2.3.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:255)
>  [hive-metastore-2.3.0-SNAPSHOT.jar:2.3.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.<init>(SessionHiveMetaStoreClient.java:70)
>  [hive-exec-2.3.0-SNAPSHOT.jar:2.3.0-SNAPSHOT]
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native 
> Method) ~[?:1.8.0_121]
> at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>  [?:1.8.0_121]
> at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>  [?:1.8.0_121]
> at java.lang.reflect.Constructor.newInstance(Constructor.java:423) 
> [?:1.8.0_121]
> at 
> org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1699)
>  [hive-metastore-2.3.0-SNAPSHOT.jar:2.3.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.<init>(RetryingMetaStoreClient.java:83)
>  [hive-metastore-2.3.0-SNAPSHOT.jar:2.3.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:133)
>  [hive-metastore-2.3.0-SNAPSHOT.jar:2.3.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:104)
>  

[jira] [Commented] (HIVE-17360) Tez session reopen appears to use a wrong conf object

2017-08-21 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16136116#comment-16136116
 ] 

Siddharth Seth commented on HIVE-17360:
---

+1

> Tez session reopen appears to use a wrong conf object
> -
>
> Key: HIVE-17360
> URL: https://issues.apache.org/jira/browse/HIVE-17360
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-17360.01.patch, HIVE-17360.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Comment Edited] (HIVE-17241) Change metastore classes to not use the shims

2017-08-21 Thread Vihang Karajgaonkar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16136114#comment-16136114
 ] 

Vihang Karajgaonkar edited comment on HIVE-17241 at 8/22/17 12:56 AM:
--

Hi [~alangates] Sorry for not getting back to you in time. I noticed that the 
{{getTokenStore}} method in the {{MetastoreDelegationTokenManager}} class will not 
work for {{DBTokenStore}} and {{ZKTokenStore}}, since they implement 
{{org.apache.hadoop.hive.thrift.DelegationTokenStore}} instead of 
{{org.apache.hadoop.hive.metastore.security.DelegationTokenStore}}. Is this 
intentional? How do you propose users handle this? Should we move these two 
token stores to the standalone-metastore package too?


was (Author: vihangk1):
Hi [~alangates] Sorry for not getting back to you in time. I noticed that the 
{{getTokenStore}} method in {{MetastoreDelegationTokenManager}} class will not 
work for {{DBTokenStore}} and {{ZKTokenStore}} since they implement 
{{org.apache.hadoop.hive.thrift.DelegationTokenStore}} instead of 
{{org.apache.hadoop.hive.metastore.security.DelegationTokenStore}}. Is this 
intentional? How do you propose users to handle this? Should we move the other 
these tokenStores to standalone-metastore package too?

> Change metastore classes to not use the shims
> -
>
> Key: HIVE-17241
> URL: https://issues.apache.org/jira/browse/HIVE-17241
> Project: Hive
>  Issue Type: Sub-task
>  Components: Metastore
>Reporter: Alan Gates
>Assignee: Alan Gates
> Fix For: 3.0.0
>
> Attachments: HIVE-17241.2.patch, HIVE-17241.patch
>
>
> As part of moving the metastore into a standalone package, it will no longer 
> have access to the shims.  This means we need to either copy them or access 
> the underlying Hadoop operations directly.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17241) Change metastore classes to not use the shims

2017-08-21 Thread Vihang Karajgaonkar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16136114#comment-16136114
 ] 

Vihang Karajgaonkar commented on HIVE-17241:


Hi [~alangates] Sorry for not getting back to you in time. I noticed that the 
{{getTokenStore}} method in the {{MetastoreDelegationTokenManager}} class will not 
work for {{DBTokenStore}} and {{ZKTokenStore}}, since they implement 
{{org.apache.hadoop.hive.thrift.DelegationTokenStore}} instead of 
{{org.apache.hadoop.hive.metastore.security.DelegationTokenStore}}. Is this 
intentional? How do you propose users handle this? Should we move these two 
token stores to the standalone-metastore package too?
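A minimal sketch of the incompatibility described above (the class and method
names below are illustrative, not the actual patch; only the two
DelegationTokenStore package names come from the comment):

{code}
// Both interfaces share the simple name DelegationTokenStore but live in
// different packages, so a store compiled against the old thrift-package
// interface cannot be used where the new metastore.security one is expected.
import org.apache.hadoop.hive.metastore.security.DelegationTokenStore;

public class TokenStoreLoader {
  public static DelegationTokenStore load(String className) throws Exception {
    Object store = Class.forName(className).getDeclaredConstructor().newInstance();
    // For DBTokenStore/ZKTokenStore this cast throws ClassCastException,
    // because they implement org.apache.hadoop.hive.thrift.DelegationTokenStore.
    return (DelegationTokenStore) store;
  }
}
{code}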

> Change metastore classes to not use the shims
> -
>
> Key: HIVE-17241
> URL: https://issues.apache.org/jira/browse/HIVE-17241
> Project: Hive
>  Issue Type: Sub-task
>  Components: Metastore
>Reporter: Alan Gates
>Assignee: Alan Gates
> Fix For: 3.0.0
>
> Attachments: HIVE-17241.2.patch, HIVE-17241.patch
>
>
> As part of moving the metastore into a standalone package, it will no longer 
> have access to the shims.  This means we need to either copy them or access 
> the underlying Hadoop operations directly.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17303) Mismatch between roaring bitmap library used by Druid and the one coming from Tez

2017-08-21 Thread slim bouguerra (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16136111#comment-16136111
 ] 

slim bouguerra commented on HIVE-17303:
---

[~ashutoshc] can we commit this?

> Mismatch between roaring bitmap library used by Druid and the one coming 
> from Tez
> --
>
> Key: HIVE-17303
> URL: https://issues.apache.org/jira/browse/HIVE-17303
> Project: Hive
>  Issue Type: Bug
>  Components: Druid integration
>Reporter: slim bouguerra
>Assignee: slim bouguerra
> Attachments: HIVE-17303.patch
>
>
> {code} 
> Caused by: java.util.concurrent.ExecutionException: java.lang.NoSuchMethodError: org.roaringbitmap.buffer.MutableRoaringBitmap.runOptimize()Z
>   at org.apache.hive.druid.com.google.common.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:299)
>   at org.apache.hive.druid.com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:286)
>   at org.apache.hive.druid.com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116)
>   at org.apache.hadoop.hive.druid.io.DruidRecordWriter.pushSegments(DruidRecordWriter.java:165)
>   ... 25 more
> Caused by: java.lang.NoSuchMethodError: org.roaringbitmap.buffer.MutableRoaringBitmap.runOptimize()Z
>   at org.apache.hive.druid.com.metamx.collections.bitmap.WrappedRoaringBitmap.toImmutableBitmap(WrappedRoaringBitmap.java:65)
>   at org.apache.hive.druid.com.metamx.collections.bitmap.RoaringBitmapFactory.makeImmutableBitmap(RoaringBitmapFactory.java:88)
>   at org.apache.hive.druid.io.druid.segment.StringDimensionMergerV9.writeIndexes(StringDimensionMergerV9.java:348)
>   at org.apache.hive.druid.io.druid.segment.IndexMergerV9.makeIndexFiles(IndexMergerV9.java:218)
>   at org.apache.hive.druid.io.druid.segment.IndexMerger.merge(IndexMerger.java:438)
>   at org.apache.hive.druid.io.druid.segment.IndexMerger.persist(IndexMerger.java:186)
>   at org.apache.hive.druid.io.druid.segment.IndexMerger.persist(IndexMerger.java:152)
>   at org.apache.hive.druid.io.druid.segment.realtime.appenderator.AppenderatorImpl.persistHydrant(AppenderatorImpl.java:996)
>   at org.apache.hive.druid.io.druid.segment.realtime.appenderator.AppenderatorImpl.access$200(AppenderatorImpl.java:93)
>   at org.apache.hive.druid.io.druid.segment.realtime.appenderator.AppenderatorImpl$2.doCall(AppenderatorImpl.java:385)
>   at org.apache.hive.druid.io.druid.common.guava.ThreadRenamingCallable.call(ThreadRenamingCallable.java:44)
>   ... 4 more
> ]], Vertex did not succeed due to OWN_TASK_FAILURE, failedTasks:1 killedTasks:89, Vertex vertex_1502470020457_0005_12_05 [Reducer 2] killed/failed due to:OWN_TASK_FAILURE]DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:0 (state=08S01,code=2)
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17308) Improvement in join cardinality estimation

2017-08-21 Thread Vineet Garg (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16136066#comment-16136066
 ] 

Vineet Garg commented on HIVE-17308:


Thanks [~ashutoshc]. I have addressed the review comments and uploaded a new 
patch. I'll push it once I get a clean run.

> Improvement in join cardinality estimation
> --
>
> Key: HIVE-17308
> URL: https://issues.apache.org/jira/browse/HIVE-17308
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Planning
>Reporter: Vineet Garg
>Assignee: Vineet Garg
> Attachments: HIVE-17308.1.patch, HIVE-17308.2.patch, 
> HIVE-17308.3.patch, HIVE-17308.4.patch, HIVE-17308.5.patch, 
> HIVE-17308.6.patch, HIVE-17308.7.patch, HIVE-17308.8.patch
>
>
> Currently, during logical planning, join cardinality is estimated assuming no 
> correlation among join keys (this estimation is done using exponential 
> backoff). Physical planning, on the other hand, considers correlation for 
> multiple keys and uses a different estimation. We should consider correlation 
> during logical planning as well.
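A hedged sketch of the exponential-backoff combination of per-key join
selectivities referenced above (the method name and exact ordering convention
are assumptions, not taken from Hive's planner):

{code}
import java.util.Arrays;

public class JoinSelectivity {
  // Combine per-key join selectivities with exponential backoff: the most
  // selective key gets full weight, and each subsequent key is damped by
  // taking successive square roots (exponents 1, 1/2, 1/4, ...).
  public static double exponentialBackoff(double[] perKeySelectivity) {
    double[] s = perKeySelectivity.clone();
    Arrays.sort(s);                 // ascending: most selective (smallest) first
    double combined = 1.0;
    double exponent = 1.0;
    for (double sel : s) {
      combined *= Math.pow(sel, exponent);
      exponent /= 2.0;
    }
    return combined;
  }
}
{code}

For example, per-key selectivities of 0.01 and 0.04 combine to 0.01 * sqrt(0.04) = 0.002, rather than the 0.0004 that full independence would give.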



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17308) Improvement in join cardinality estimation

2017-08-21 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-17308:
---
Attachment: HIVE-17308.8.patch

> Improvement in join cardinality estimation
> --
>
> Key: HIVE-17308
> URL: https://issues.apache.org/jira/browse/HIVE-17308
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Planning
>Reporter: Vineet Garg
>Assignee: Vineet Garg
> Attachments: HIVE-17308.1.patch, HIVE-17308.2.patch, 
> HIVE-17308.3.patch, HIVE-17308.4.patch, HIVE-17308.5.patch, 
> HIVE-17308.6.patch, HIVE-17308.7.patch, HIVE-17308.8.patch
>
>
> Currently, during logical planning, join cardinality is estimated assuming no 
> correlation among join keys (this estimation is done using exponential 
> backoff). Physical planning, on the other hand, considers correlation for 
> multiple keys and uses a different estimation. We should consider correlation 
> during logical planning as well.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17367) IMPORT table doesn't load data from export dumps for insert operation.

2017-08-21 Thread Sankar Hariappan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-17367:

Summary: IMPORT table doesn't load data from export dumps for insert 
operation.  (was: IMPORT should overwrite the table if the dump has same state 
as table.)

> IMPORT table doesn't load data from export dumps for insert operation.
> --
>
> Key: HIVE-17367
> URL: https://issues.apache.org/jira/browse/HIVE-17367
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2, Import/Export, repl
>Affects Versions: 3.0.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
> Fix For: 3.0.0
>
>
> Repl v1 creates a set of EXPORT/IMPORT commands to replicate modified data 
> (as per events) across clusters.
> For instance, let's say an insert generates 2 events, such as
> ALTER_TABLE (ID: 10)
> INSERT (ID: 11)
> Each event generates a set of EXPORT and IMPORT commands.
> The ALTER_TABLE event generates a metadata-only export/import.
> INSERT generates a metadata+data export/import.
> As Hive always dumps the latest copy of the table during export, it sets the 
> latest notification event ID as the table's current state. So, in this example, 
> the import of metadata by the ALTER_TABLE event sets the current state of the 
> table to 11.
> Now, when we try to import the data dumped by the INSERT event, it is a no-op, 
> since the table's current state (11) equals the dump state (11), which in turn 
> means the data never gets replicated to the target cluster.
> So, it is necessary to allow overwrite of a table/partition if its current 
> state equals the dump state.
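A minimal sketch of the proposed rule (the method and parameter names are
illustrative, not the actual patch):

{code}
public class ReplLoadCheck {
  // Returns true if the dumped copy should overwrite the existing table or
  // partition. Allowing equality is the fix: with '<' alone, the data dump of
  // the INSERT event (state 11) is skipped once the ALTER_TABLE metadata
  // import has already advanced the table's state to 11.
  public static boolean shouldOverwrite(long currentReplState, long dumpReplState) {
    return currentReplState <= dumpReplState;
  }
}
{code}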



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16886) HMS log notifications may have duplicated event IDs if multiple HMS are running concurrently

2017-08-21 Thread Alexander Kolbasov (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16136029#comment-16136029
 ] 

Alexander Kolbasov commented on HIVE-16886:
---

From what I saw, none of the mechanisms supported by DataNucleus can guarantee 
that there are no holes (although they deal with duplicates). There may be two 
kinds of holes - temporary ones (that will be filled later, when the 
transaction is committed) and permanent ones (which will never be filled in). 

MySQL InnoDB has a mechanism to provide no-holes semantics 
(https://dev.mysql.com/doc/refman/5.7/en/innodb-auto-increment-handling.html), 
but it seems that Oracle and PostgreSQL don't have similar mechanisms.

Holes create trouble for consumers. E.g., when a consumer reads notification ID 
10, some earlier IDs may be committed later, so they may be skipped or not 
processed properly. Also, consumers have no way of knowing whether a hole 
will be filled in later (when the corresponding transaction commits) or never 
(if the transaction that allocated the ID fails).

There is a way to guarantee that there are no holes and no duplicates:

a) Make the ID a primary key (which doesn't allow duplicates).
b) As part of the transaction, read the value of the ID, increment it by 1, and 
persist it.

This approach guarantees that there are no holes or duplicates, but 
transactions can fail because of ID conflicts, so it is important to retry 
such transactions. HMS doesn't provide per-transaction retries, but maybe 
per-operation retries are OK as well.
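A minimal JDBC sketch of the (a)+(b) scheme above, with a per-operation retry
loop. The table and column names (NOTIFICATION_SEQUENCE, NEXT_EVENT_ID) and
the SELECT ... FOR UPDATE locking are illustrative assumptions, not
necessarily the HMS schema:

{code}
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class EventIdAllocator {
  // Read-increment-persist inside one transaction. SELECT ... FOR UPDATE
  // serializes concurrent allocators; if the transaction still fails (e.g.
  // as a deadlock victim), roll back and retry the whole operation.
  public static long allocateEventId(Connection conn, int maxRetries) throws SQLException {
    for (int attempt = 0; attempt < maxRetries; attempt++) {
      try {
        conn.setAutoCommit(false);
        long next;
        try (Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(
                 "SELECT NEXT_EVENT_ID FROM NOTIFICATION_SEQUENCE FOR UPDATE")) {
          rs.next();
          next = rs.getLong(1);
        }
        try (Statement stmt = conn.createStatement()) {
          stmt.executeUpdate("UPDATE NOTIFICATION_SEQUENCE SET NEXT_EVENT_ID = " + (next + 1));
        }
        conn.commit();
        return next;  // no holes: the ID is only observable once committed
      } catch (SQLException e) {
        conn.rollback();  // conflict with a concurrent allocator; retry
      }
    }
    throw new SQLException("Could not allocate an event ID after " + maxRetries + " retries");
  }
}
{code}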

> HMS log notifications may have duplicated event IDs if multiple HMS are 
> running concurrently
> 
>
> Key: HIVE-16886
> URL: https://issues.apache.org/jira/browse/HIVE-16886
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, Metastore
>Reporter: Sergio Peña
>Assignee: anishek
> Attachments: HIVE-16886.1.patch
>
>
> When running multiple Hive Metastore servers and DB notifications are 
> enabled, I could see that notifications can be persisted with a duplicated 
> event ID. 
> This does not happen when running multiple threads in a single HMS node due 
> to the locking acquired on the DbNotificationsLog class, but multiple HMS 
> could cause conflicts.
> The issue is in the ObjectStore#addNotificationEvent() method. The event ID 
> fetched from the datastore is used for the new notification, incremented in 
> the server itself, then persisted or updated back to the datastore. If 2 
> servers read the same ID, then these 2 servers write a new notification with 
> the same ID.
> The event ID is neither unique nor a primary key.
> Here's a test case using the TestObjectStore class that confirms this issue:
> {noformat}
> @Test
>   public void testConcurrentAddNotifications() throws ExecutionException, 
> InterruptedException {
> final int NUM_THREADS = 2;
> CountDownLatch countIn = new CountDownLatch(NUM_THREADS);
> CountDownLatch countOut = new CountDownLatch(1);
> HiveConf conf = new HiveConf();
> conf.setVar(HiveConf.ConfVars.METASTORE_EXPRESSION_PROXY_CLASS, 
> MockPartitionExpressionProxy.class.getName());
> ExecutorService executorService = 
> Executors.newFixedThreadPool(NUM_THREADS);
> FutureTask<Void> tasks[] = new FutureTask[NUM_THREADS];
> for (int i = 0; i < NUM_THREADS; ++i) {
>   final int n = i;
>   tasks[i] = new FutureTask<>(new Callable<Void>() {
> @Override
> public Void call() throws Exception {
>   ObjectStore store = new ObjectStore();
>   store.setConf(conf);
>   NotificationEvent dbEvent =
>   new NotificationEvent(0, 0, 
> EventMessage.EventType.CREATE_DATABASE.toString(), "CREATE DATABASE DB" + n);
>   System.out.println("ADDING NOTIFICATION");
>   countIn.countDown();
>   countOut.await();
>   store.addNotificationEvent(dbEvent);
>   System.out.println("FINISH NOTIFICATION");
>   return null;
> }
>   });
>   executorService.execute(tasks[i]);
> }
> countIn.await();
> countOut.countDown();
> for (int i = 0; i < NUM_THREADS; ++i) {
>   tasks[i].get();
> }
> NotificationEventResponse eventResponse = 
> objectStore.getNextNotification(new NotificationEventRequest());
> Assert.assertEquals(2, eventResponse.getEventsSize());
> Assert.assertEquals(1, eventResponse.getEvents().get(0).getEventId());
> // This fails because the next notification has an event ID = 1
> Assert.assertEquals(2, eventResponse.getEvents().get(1).getEventId());
>   }
> {noformat}
> The last assertion fails because the second notification has an event ID of 1 instead of the expected 2. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16886) HMS log notifications may have duplicated event IDs if multiple HMS are running concurrently

2017-08-21 Thread anishek (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

anishek updated HIVE-16886:
---
Status: Patch Available  (was: Open)

> HMS log notifications may have duplicated event IDs if multiple HMS are 
> running concurrently
> 
>
> Key: HIVE-16886
> URL: https://issues.apache.org/jira/browse/HIVE-16886
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, Metastore
>Reporter: Sergio Peña
>Assignee: anishek
> Attachments: HIVE-16886.1.patch
>
>
> When running multiple Hive Metastore servers and DB notifications are 
> enabled, I could see that notifications can be persisted with a duplicated 
> event ID. 
> This does not happen when running multiple threads in a single HMS node due 
> to the locking acquired on the DbNotificationsLog class, but multiple HMS 
> could cause conflicts.
> The issue is in the ObjectStore#addNotificationEvent() method. The event ID 
> fetched from the datastore is used for the new notification, incremented in 
> the server itself, then persisted or updated back to the datastore. If 2 
> servers read the same ID, then these 2 servers write a new notification with 
> the same ID.
> The event ID is neither unique nor a primary key.
> Here's a test case using the TestObjectStore class that confirms this issue:
> {noformat}
> @Test
>   public void testConcurrentAddNotifications() throws ExecutionException, 
> InterruptedException {
> final int NUM_THREADS = 2;
> CountDownLatch countIn = new CountDownLatch(NUM_THREADS);
> CountDownLatch countOut = new CountDownLatch(1);
> HiveConf conf = new HiveConf();
> conf.setVar(HiveConf.ConfVars.METASTORE_EXPRESSION_PROXY_CLASS, 
> MockPartitionExpressionProxy.class.getName());
> ExecutorService executorService = 
> Executors.newFixedThreadPool(NUM_THREADS);
> FutureTask<Void> tasks[] = new FutureTask[NUM_THREADS];
> for (int i = 0; i < NUM_THREADS; ++i) {
>   final int n = i;
>   tasks[i] = new FutureTask<>(new Callable<Void>() {
> @Override
> public Void call() throws Exception {
>   ObjectStore store = new ObjectStore();
>   store.setConf(conf);
>   NotificationEvent dbEvent =
>   new NotificationEvent(0, 0, 
> EventMessage.EventType.CREATE_DATABASE.toString(), "CREATE DATABASE DB" + n);
>   System.out.println("ADDING NOTIFICATION");
>   countIn.countDown();
>   countOut.await();
>   store.addNotificationEvent(dbEvent);
>   System.out.println("FINISH NOTIFICATION");
>   return null;
> }
>   });
>   executorService.execute(tasks[i]);
> }
> countIn.await();
> countOut.countDown();
> for (int i = 0; i < NUM_THREADS; ++i) {
>   tasks[i].get();
> }
> NotificationEventResponse eventResponse = 
> objectStore.getNextNotification(new NotificationEventRequest());
> Assert.assertEquals(2, eventResponse.getEventsSize());
> Assert.assertEquals(1, eventResponse.getEvents().get(0).getEventId());
> // This fails because the next notification has an event ID = 1
> Assert.assertEquals(2, eventResponse.getEvents().get(1).getEventId());
>   }
> {noformat}
> The last assertion fails because the second notification has an event ID of 1 instead of the expected 2. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16886) HMS log notifications may have duplicated event IDs if multiple HMS are running concurrently

2017-08-21 Thread anishek (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

anishek updated HIVE-16886:
---
Attachment: HIVE-16886.1.patch

attaching patch for test run.

> HMS log notifications may have duplicated event IDs if multiple HMS are 
> running concurrently
> 
>
> Key: HIVE-16886
> URL: https://issues.apache.org/jira/browse/HIVE-16886
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, Metastore
>Reporter: Sergio Peña
>Assignee: anishek
> Attachments: HIVE-16886.1.patch
>
>
> When running multiple Hive Metastore servers and DB notifications are 
> enabled, I could see that notifications can be persisted with a duplicated 
> event ID. 
> This does not happen when running multiple threads in a single HMS node due 
> to the locking acquired on the DbNotificationsLog class, but multiple HMS 
> could cause conflicts.
> The issue is in the ObjectStore#addNotificationEvent() method. The event ID 
> fetched from the datastore is used for the new notification, incremented in 
> the server itself, then persisted or updated back to the datastore. If 2 
> servers read the same ID, then these 2 servers write a new notification with 
> the same ID.
> The event ID is neither unique nor a primary key.
> Here's a test case using the TestObjectStore class that confirms this issue:
> {noformat}
> @Test
>   public void testConcurrentAddNotifications() throws ExecutionException, 
> InterruptedException {
> final int NUM_THREADS = 2;
> CountDownLatch countIn = new CountDownLatch(NUM_THREADS);
> CountDownLatch countOut = new CountDownLatch(1);
> HiveConf conf = new HiveConf();
> conf.setVar(HiveConf.ConfVars.METASTORE_EXPRESSION_PROXY_CLASS, 
> MockPartitionExpressionProxy.class.getName());
> ExecutorService executorService = 
> Executors.newFixedThreadPool(NUM_THREADS);
> FutureTask<Void> tasks[] = new FutureTask[NUM_THREADS];
> for (int i = 0; i < NUM_THREADS; ++i) {
>   final int n = i;
>   tasks[i] = new FutureTask<>(new Callable<Void>() {
> @Override
> public Void call() throws Exception {
>   ObjectStore store = new ObjectStore();
>   store.setConf(conf);
>   NotificationEvent dbEvent =
>   new NotificationEvent(0, 0, 
> EventMessage.EventType.CREATE_DATABASE.toString(), "CREATE DATABASE DB" + n);
>   System.out.println("ADDING NOTIFICATION");
>   countIn.countDown();
>   countOut.await();
>   store.addNotificationEvent(dbEvent);
>   System.out.println("FINISH NOTIFICATION");
>   return null;
> }
>   });
>   executorService.execute(tasks[i]);
> }
> countIn.await();
> countOut.countDown();
> for (int i = 0; i < NUM_THREADS; ++i) {
>   tasks[i].get();
> }
> NotificationEventResponse eventResponse = 
> objectStore.getNextNotification(new NotificationEventRequest());
> Assert.assertEquals(2, eventResponse.getEventsSize());
> Assert.assertEquals(1, eventResponse.getEvents().get(0).getEventId());
> // This fails because the next notification has an event ID = 1
> Assert.assertEquals(2, eventResponse.getEvents().get(1).getEventId());
>   }
> {noformat}
> The last assertion fails because the second notification has an event ID of 1 instead of the expected 2. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17364) Add unit test to "alter view" replication

2017-08-21 Thread Tao Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16135976#comment-16135976
 ] 

Tao Li commented on HIVE-17364:
---

Test failures are not related.

> Add unit test to "alter view" replication
> -
>
> Key: HIVE-17364
> URL: https://issues.apache.org/jira/browse/HIVE-17364
> Project: Hive
>  Issue Type: Bug
>Reporter: Tao Li
>Assignee: Tao Li
>Priority: Minor
> Attachments: HIVE-17364.1.patch
>
>
> Adding a unit test for the HIVE-17354 change.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17364) Add unit test to "alter view" replication

2017-08-21 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16135931#comment-16135931
 ] 

Hive QA commented on HIVE-17364:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12882933/HIVE-17364.1.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 9 failed/errored test(s), 10994 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[setop_no_distinct] 
(batchId=77)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning]
 (batchId=169)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=235)
org.apache.hadoop.hive.common.TestFileUtils.testCopyWithDistCpAs (batchId=250)
org.apache.hadoop.hive.common.TestFileUtils.testCopyWithDistcp (batchId=250)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=180)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=180)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=180)
org.apache.hive.jdbc.TestJdbcWithMiniHS2.testParallelCompilation (batchId=228)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6474/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6474/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6474/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 9 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12882933 - PreCommit-HIVE-Build

> Add unit test to "alter view" replication
> -
>
> Key: HIVE-17364
> URL: https://issues.apache.org/jira/browse/HIVE-17364
> Project: Hive
>  Issue Type: Bug
>Reporter: Tao Li
>Assignee: Tao Li
>Priority: Minor
> Attachments: HIVE-17364.1.patch
>
>
> Adding a unit test for the HIVE-17354 change.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (HIVE-17367) IMPORT should overwrite the table if the dump has same state as table.

2017-08-21 Thread Sankar Hariappan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan reassigned HIVE-17367:
---


> IMPORT should overwrite the table if the dump has same state as table.
> --
>
> Key: HIVE-17367
> URL: https://issues.apache.org/jira/browse/HIVE-17367
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2, Import/Export, repl
>Affects Versions: 3.0.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
> Fix For: 3.0.0
>
>
> Repl v1 creates a set of EXPORT/IMPORT commands to replicate modified data 
> (as per events) across clusters.
> For instance, let's say an insert generates 2 events, such as
> ALTER_TABLE (ID: 10)
> INSERT (ID: 11)
> Each event generates a set of EXPORT and IMPORT commands.
> The ALTER_TABLE event generates a metadata-only export/import.
> INSERT generates a metadata+data export/import.
> As Hive always dumps the latest copy of the table during export, it sets the 
> latest notification event ID as the table's current state. So, in this example, 
> the import of metadata by the ALTER_TABLE event sets the current state of the 
> table to 11.
> Now, when we try to import the data dumped by the INSERT event, it is a no-op, 
> since the table's current state (11) equals the dump state (11), which in turn 
> means the data never gets replicated to the target cluster.
> So, it is necessary to allow overwrite of a table/partition if its current 
> state equals the dump state.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17100) Improve HS2 operation logs for REPL commands.

2017-08-21 Thread Sankar Hariappan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-17100:

Status: Patch Available  (was: Open)

> Improve HS2 operation logs for REPL commands.
> -
>
> Key: HIVE-17100
> URL: https://issues.apache.org/jira/browse/HIVE-17100
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2, repl
>Affects Versions: 2.1.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>  Labels: DR, replication
> Fix For: 3.0.0
>
> Attachments: HIVE-17100.01.patch, HIVE-17100.02.patch, 
> HIVE-17100.03.patch, HIVE-17100.04.patch, HIVE-17100.05.patch
>
>
> It is necessary to log the progress of the replication tasks in a structured 
> manner, as follows.
> *+Bootstrap Dump:+*
> * At the start of bootstrap dump, will add one log with below details.
> {color:#59afe1}* Database Name
> * Dump Type (BOOTSTRAP)
> * (Estimated) Total number of tables/views to dump
> * (Estimated) Total number of functions to dump.
> * Dump Start Time{color}
> * After each table dump, will add a log as follows
> {color:#59afe1}* Table/View Name
> * Type (TABLE/VIEW/MATERIALIZED_VIEW)
> * Table dump end time
> * Table dump progress. Format is Table sequence no/(Estimated) Total number 
> of tables and views.{color}
> * After each function dump, will add a log as follows
> {color:#59afe1}* Function Name
> * Function dump end time
> * Function dump progress. Format is Function sequence no/(Estimated) Total 
> number of functions.{color}
> * After completion of all dumps, will add a log as follows to consolidate the 
> dump.
> {color:#59afe1}* Database Name.
> * Dump Type (BOOTSTRAP).
> * Dump End Time.
> * (Actual) Total number of tables/views dumped.
> * (Actual) Total number of functions dumped.
> * Dump Directory.
> * Last Repl ID of the dump.{color}
> *Note:* The actual and estimated numbers of tables/functions may not match if 
> any table/function is dropped while the dump is in progress.
> *+Bootstrap Load:+*
> * At the start of bootstrap load, will add one log with below details.
> {color:#59afe1}* Database Name
> * Dump directory
> * Load Type (BOOTSTRAP)
> * Total number of tables/views to load
> * Total number of functions to load.
> * Load Start Time{color}
> * After each table load, will add a log as follows
> {color:#59afe1}* Table/View Name
> * Type (TABLE/VIEW/MATERIALIZED_VIEW)
> * Table load completion time
> * Table load progress. Format is Table sequence no/Total number of tables and 
> views.{color}
> * After each function load, will add a log as follows
> {color:#59afe1}* Function Name
> * Function load completion time
> * Function load progress. Format is Function sequence no/Total number of 
> functions.{color}
> * After completion of all loads, will add a log as follows to consolidate the 
> load.
> {color:#59afe1}* Database Name.
> * Load Type (BOOTSTRAP).
> * Load End Time.
> * Total number of tables/views loaded.
> * Total number of functions loaded.
> * Last Repl ID of the loaded database.{color}
> *+Incremental Dump:+*
> * At the start of database dump, will add one log with below details.
> {color:#59afe1}* Database Name
> * Dump Type (INCREMENTAL)
> * (Estimated) Total number of events to dump.
> * Dump Start Time{color}
> * After each event dump, will add a log as follows
> {color:#59afe1}* Event ID
> * Event Type (CREATE_TABLE, DROP_TABLE, ALTER_TABLE, INSERT etc)
> * Event dump end time
> * Event dump progress. Format is Event sequence no/ (Estimated) Total number 
> of events.{color}
> * After completion of all event dumps, will add a log as follows.
> {color:#59afe1}* Database Name.
> * Dump Type (INCREMENTAL).
> * Dump End Time.
> * (Actual) Total number of events dumped.
> * Dump Directory.
> * Last Repl ID of the dump.{color}
> *Note:* The estimated number of events can differ significantly from the 
> actual number, as we don’t have the number of events upfront until we read 
> from the metastore NotificationEvents table.
> *+Incremental Load:+*
> * At the start of incremental load, will add one log with below details.
> {color:#59afe1}* Target Database Name 
> * Dump directory
> * Load Type (INCREMENTAL)
> * Total number of events to load
> * Load Start Time{color}
> * After each event load, will add a log as follows
> {color:#59afe1}* Event ID
> * Event Type (CREATE_TABLE, DROP_TABLE, ALTER_TABLE, INSERT etc)
> * Event load end time
> * Event load progress. Format is Event sequence no/ Total number of 
> events.{color}
> * After completion of all event loads, will add a log as follows to 
> consolidate the load.
> {color:#59afe1}* Target Database Name.
> * Load Type (INCREMENTAL).
> * Load End Time.
> * Total number of events loaded.
> * Last Repl ID of the loaded database.{color}
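A hedged sketch of what one such structured progress line could look like (the
REPL:: prefix, JSON field names, and logger category are assumptions for
illustration, not the patch's actual format):

{code}
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class ReplStateLogSketch {
  private static final Logger LOG = LoggerFactory.getLogger("ReplState");

  // Emits one table-dump progress entry carrying the fields listed above:
  // table/view name, type, dump end time, and "sequence no/estimated total".
  public static void logTableDump(String table, String type, long seqNo, long estTotal) {
    LOG.info("REPL::TABLE_DUMP: {\"tableName\":\"" + table
        + "\",\"tableType\":\"" + type
        + "\",\"tablesDumpProgress\":\"" + seqNo + "/" + estTotal
        + "\",\"dumpTime\":" + (System.currentTimeMillis() / 1000) + "}");
  }
}
{code}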



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Commented] (HIVE-16886) HMS log notifications may have duplicated event IDs if multiple HMS are running concurrently

2017-08-21 Thread anishek (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16135840#comment-16135840
 ] 

anishek commented on HIVE-16886:


[~spena] thanks for your insights.

* Sequence is not present in MySQL 5.6 and MS SQL, and hence using that as the 
datastore-identity strategy will not work for all supported databases 
(https://cwiki.apache.org/confluence/display/Hive/AdminManual+MetastoreAdmin).
* Leaving it to "native" should work for us, since the underlying column is a 
bigint and should result in numeric generation. I am not aware of a database 
generating out-of-order numeric values when using "auto-increment"; do you 
know of any such case?
* I like the sequential guarantee that point 3 provides, but this should only 
matter for concurrent transactions / queries modifying disjoint data. In such 
cases it should not matter whether one is applied before the other or vice 
versa; the database would still be in a valid state. Even if we use the 
current_timestamp from the db, it would result in a similar order of events to 
what we have currently, i.e. you can still get 1,3,2, because the timestamp is 
assigned at database insert time and not when the operation started. From what 
I see, point 1 suggests that clients want the order of events to be the order 
in which they were started rather than the order in which they were committed, 
whereas the latter makes more sense, since at the database level that is how 
the events were applied. 

Please let me know what you think.

> HMS log notifications may have duplicated event IDs if multiple HMS are 
> running concurrently
> 
>
> Key: HIVE-16886
> URL: https://issues.apache.org/jira/browse/HIVE-16886
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, Metastore
>Reporter: Sergio Peña
>Assignee: anishek
>
> When running multiple Hive Metastore servers and DB notifications are 
> enabled, I could see that notifications can be persisted with a duplicated 
> event ID. 
> This does not happen when running multiple threads in a single HMS node due 
> to the locking acquired on the DbNotificationsLog class, but multiple HMS 
> could cause conflicts.
> The issue is in the ObjectStore#addNotificationEvent() method. The event ID 
> fetched from the datastore is used for the new notification, incremented in 
> the server itself, then persisted or updated back to the datastore. If 2 
> servers read the same ID, then these 2 servers write a new notification with 
> the same ID.
> The event ID is neither unique nor a primary key.
> Here's a test case using the TestObjectStore class that confirms this issue:
> {noformat}
> @Test
>   public void testConcurrentAddNotifications() throws ExecutionException, 
> InterruptedException {
> final int NUM_THREADS = 2;
> CountDownLatch countIn = new CountDownLatch(NUM_THREADS);
> CountDownLatch countOut = new CountDownLatch(1);
> HiveConf conf = new HiveConf();
> conf.setVar(HiveConf.ConfVars.METASTORE_EXPRESSION_PROXY_CLASS, 
> MockPartitionExpressionProxy.class.getName());
> ExecutorService executorService = 
> Executors.newFixedThreadPool(NUM_THREADS);
> FutureTask<Void> tasks[] = new FutureTask[NUM_THREADS];
> for (int i = 0; i < NUM_THREADS; ++i) {
>   final int n = i;
>   tasks[i] = new FutureTask<>(new Callable<Void>() {
> @Override
> public Void call() throws Exception {
>   ObjectStore store = new ObjectStore();
>   store.setConf(conf);
>   NotificationEvent dbEvent =
>   new NotificationEvent(0, 0, 
> EventMessage.EventType.CREATE_DATABASE.toString(), "CREATE DATABASE DB" + n);
>   System.out.println("ADDING NOTIFICATION");
>   countIn.countDown();
>   countOut.await();
>   store.addNotificationEvent(dbEvent);
>   System.out.println("FINISH NOTIFICATION");
>   return null;
> }
>   });
>   executorService.execute(tasks[i]);
> }
> countIn.await();
> countOut.countDown();
> for (int i = 0; i < NUM_THREADS; ++i) {
>   tasks[i].get();
> }
> NotificationEventResponse eventResponse = 
> objectStore.getNextNotification(new NotificationEventRequest());
> Assert.assertEquals(2, eventResponse.getEventsSize());
> Assert.assertEquals(1, eventResponse.getEvents().get(0).getEventId());
> // This fails because the next notification has an event ID = 1
> Assert.assertEquals(2, eventResponse.getEvents().get(1).getEventId());
>   }
> {noformat}
> The last assertion fails because the second notification has an event ID of 1 instead of the expected 2. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17100) Improve HS2 operation logs for REPL commands.

2017-08-21 Thread Sankar Hariappan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-17100:

Attachment: (was: HIVE-17100.05.patch)

> Improve HS2 operation logs for REPL commands.
> -
>
> Key: HIVE-17100
> URL: https://issues.apache.org/jira/browse/HIVE-17100
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2, repl
>Affects Versions: 2.1.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>  Labels: DR, replication
> Fix For: 3.0.0
>
> Attachments: HIVE-17100.01.patch, HIVE-17100.02.patch, 
> HIVE-17100.03.patch, HIVE-17100.04.patch, HIVE-17100.05.patch
>
>
> It is necessary to log the progress of the replication tasks in a structured 
> manner, as follows.
> *+Bootstrap Dump:+*
> * At the start of bootstrap dump, will add one log with below details.
> {color:#59afe1}* Database Name
> * Dump Type (BOOTSTRAP)
> * (Estimated) Total number of tables/views to dump
> * (Estimated) Total number of functions to dump.
> * Dump Start Time{color}
> * After each table dump, will add a log as follows
> {color:#59afe1}* Table/View Name
> * Type (TABLE/VIEW/MATERIALIZED_VIEW)
> * Table dump end time
> * Table dump progress. Format is Table sequence no/(Estimated) Total number 
> of tables and views.{color}
> * After each function dump, will add a log as follows
> {color:#59afe1}* Function Name
> * Function dump end time
> * Function dump progress. Format is Function sequence no/(Estimated) Total 
> number of functions.{color}
> * After completion of all dumps, will add a log as follows to consolidate the 
> dump.
> {color:#59afe1}* Database Name.
> * Dump Type (BOOTSTRAP).
> * Dump End Time.
> * (Actual) Total number of tables/views dumped.
> * (Actual) Total number of functions dumped.
> * Dump Directory.
> * Last Repl ID of the dump.{color}
> *Note:* The actual and estimated numbers of tables/functions may not match if 
> any table/function is dropped while the dump is in progress.
> *+Bootstrap Load:+*
> * At the start of bootstrap load, will add one log with below details.
> {color:#59afe1}* Database Name
> * Dump directory
> * Load Type (BOOTSTRAP)
> * Total number of tables/views to load
> * Total number of functions to load.
> * Load Start Time{color}
> * After each table load, will add a log as follows
> {color:#59afe1}* Table/View Name
> * Type (TABLE/VIEW/MATERIALIZED_VIEW)
> * Table load completion time
> * Table load progress. Format is Table sequence no/Total number of tables and 
> views.{color}
> * After each function load, will add a log as follows
> {color:#59afe1}* Function Name
> * Function load completion time
> * Function load progress. Format is Function sequence no/Total number of 
> functions.{color}
> * After completion of all loads, will add a log as follows to consolidate the 
> load.
> {color:#59afe1}* Database Name.
> * Load Type (BOOTSTRAP).
> * Load End Time.
> * Total number of tables/views loaded.
> * Total number of functions loaded.
> * Last Repl ID of the loaded database.{color}
> *+Incremental Dump:+*
> * At the start of database dump, will add one log with below details.
> {color:#59afe1}* Database Name
> * Dump Type (INCREMENTAL)
> * (Estimated) Total number of events to dump.
> * Dump Start Time{color}
> * After each event dump, will add a log as follows
> {color:#59afe1}* Event ID
> * Event Type (CREATE_TABLE, DROP_TABLE, ALTER_TABLE, INSERT etc)
> * Event dump end time
> * Event dump progress. Format is Event sequence no/ (Estimated) Total number 
> of events.{color}
> * After completion of all event dumps, will add a log as follows.
> {color:#59afe1}* Database Name.
> * Dump Type (INCREMENTAL).
> * Dump End Time.
> * (Actual) Total number of events dumped.
> * Dump Directory.
> * Last Repl ID of the dump.{color}
> *Note:* The estimated number of events can differ significantly from the 
> actual number, as we don’t have the number of events upfront until we read 
> from the metastore NotificationEvents table.
> *+Incremental Load:+*
> * At the start of incremental load, will add one log with below details.
> {color:#59afe1}* Target Database Name 
> * Dump directory
> * Load Type (INCREMENTAL)
> * Total number of events to load
> * Load Start Time{color}
> * After each event load, will add a log as follows
> {color:#59afe1}* Event ID
> * Event Type (CREATE_TABLE, DROP_TABLE, ALTER_TABLE, INSERT etc)
> * Event load end time
> * Event load progress. Format is Event sequence no/ Total number of 
> events.{color}
> * After completion of all event loads, will add a log as follows to 
> consolidate the load.
> {color:#59afe1}* Target Database Name.
> * Load Type (INCREMENTAL).
> * Load End Time.
> * Total number of events loaded.
> * Last Repl ID of the loaded database.{color}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Updated] (HIVE-17100) Improve HS2 operation logs for REPL commands.

2017-08-21 Thread Sankar Hariappan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-17100:

Attachment: HIVE-17100.05.patch

> Improve HS2 operation logs for REPL commands.
> -
>
> Key: HIVE-17100
> URL: https://issues.apache.org/jira/browse/HIVE-17100
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2, repl
>Affects Versions: 2.1.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>  Labels: DR, replication
> Fix For: 3.0.0
>
> Attachments: HIVE-17100.01.patch, HIVE-17100.02.patch, 
> HIVE-17100.03.patch, HIVE-17100.04.patch, HIVE-17100.05.patch
>
>
> It is necessary to log the progress of the replication tasks in a structured 
> manner, as follows.
> *+Bootstrap Dump:+*
> * At the start of bootstrap dump, will add one log with below details.
> {color:#59afe1}* Database Name
> * Dump Type (BOOTSTRAP)
> * (Estimated) Total number of tables/views to dump
> * (Estimated) Total number of functions to dump.
> * Dump Start Time{color}
> * After each table dump, will add a log as follows
> {color:#59afe1}* Table/View Name
> * Type (TABLE/VIEW/MATERIALIZED_VIEW)
> * Table dump end time
> * Table dump progress. Format is Table sequence no/(Estimated) Total number 
> of tables and views.{color}
> * After each function dump, will add a log as follows
> {color:#59afe1}* Function Name
> * Function dump end time
> * Function dump progress. Format is Function sequence no/(Estimated) Total 
> number of functions.{color}
> * After completion of all dumps, will add a log as follows to consolidate the 
> dump.
> {color:#59afe1}* Database Name.
> * Dump Type (BOOTSTRAP).
> * Dump End Time.
> * (Actual) Total number of tables/views dumped.
> * (Actual) Total number of functions dumped.
> * Dump Directory.
> * Last Repl ID of the dump.{color}
> *Note:* The actual and estimated numbers of tables/functions may not match if 
> any table/function is dropped while the dump is in progress.
> *+Bootstrap Load:+*
> * At the start of bootstrap load, will add one log with below details.
> {color:#59afe1}* Database Name
> * Dump directory
> * Load Type (BOOTSTRAP)
> * Total number of tables/views to load
> * Total number of functions to load.
> * Load Start Time{color}
> * After each table load, will add a log as follows
> {color:#59afe1}* Table/View Name
> * Type (TABLE/VIEW/MATERIALIZED_VIEW)
> * Table load completion time
> * Table load progress. Format is Table sequence no/Total number of tables and 
> views.{color}
> * After each function load, will add a log as follows
> {color:#59afe1}* Function Name
> * Function load completion time
> * Function load progress. Format is Function sequence no/Total number of 
> functions.{color}
> * After completion of all loads, will add a log as follows to consolidate the 
> load.
> {color:#59afe1}* Database Name.
> * Load Type (BOOTSTRAP).
> * Load End Time.
> * Total number of tables/views loaded.
> * Total number of functions loaded.
> * Last Repl ID of the loaded database.{color}
> *+Incremental Dump:+*
> * At the start of database dump, will add one log with below details.
> {color:#59afe1}* Database Name
> * Dump Type (INCREMENTAL)
> * (Estimated) Total number of events to dump.
> * Dump Start Time{color}
> * After each event dump, will add a log as follows
> {color:#59afe1}* Event ID
> * Event Type (CREATE_TABLE, DROP_TABLE, ALTER_TABLE, INSERT etc)
> * Event dump end time
> * Event dump progress. Format is Event sequence no/ (Estimated) Total number 
> of events.{color}
> * After completion of all event dumps, will add a log as follows.
> {color:#59afe1}* Database Name.
> * Dump Type (INCREMENTAL).
> * Dump End Time.
> * (Actual) Total number of events dumped.
> * Dump Directory.
> * Last Repl ID of the dump.{color}
> *Note:* The estimated number of events can differ significantly from the 
> actual number, as we don’t have the number of events upfront until we read 
> from the metastore NotificationEvents table.
> *+Incremental Load:+*
> * At the start of incremental load, will add one log with below details.
> {color:#59afe1}* Target Database Name 
> * Dump directory
> * Load Type (INCREMENTAL)
> * Total number of events to load
> * Load Start Time{color}
> * After each event load, will add a log as follows
> {color:#59afe1}* Event ID
> * Event Type (CREATE_TABLE, DROP_TABLE, ALTER_TABLE, INSERT etc)
> * Event load end time
> * Event load progress. Format is Event sequence no/ Total number of 
> events.{color}
> * After completion of all event loads, will add a log as follows to 
> consolidate the load.
> {color:#59afe1}* Target Database Name.
> * Load Type (INCREMENTAL).
> * Load End Time.
> * Total number of events loaded.
> * Last Repl ID of the loaded database.{color}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Updated] (HIVE-17100) Improve HS2 operation logs for REPL commands.

2017-08-21 Thread Sankar Hariappan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-17100:

Status: Open  (was: Patch Available)

> Improve HS2 operation logs for REPL commands.
> -
>
> Key: HIVE-17100
> URL: https://issues.apache.org/jira/browse/HIVE-17100
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2, repl
>Affects Versions: 2.1.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>  Labels: DR, replication
> Fix For: 3.0.0
>
> Attachments: HIVE-17100.01.patch, HIVE-17100.02.patch, 
> HIVE-17100.03.patch, HIVE-17100.04.patch, HIVE-17100.05.patch
>
>
> It is necessary to log the progress of the replication tasks in a structured 
> manner, as follows.
> *+Bootstrap Dump:+*
> * At the start of bootstrap dump, will add one log with below details.
> {color:#59afe1}* Database Name
> * Dump Type (BOOTSTRAP)
> * (Estimated) Total number of tables/views to dump
> * (Estimated) Total number of functions to dump.
> * Dump Start Time{color}
> * After each table dump, will add a log as follows
> {color:#59afe1}* Table/View Name
> * Type (TABLE/VIEW/MATERIALIZED_VIEW)
> * Table dump end time
> * Table dump progress. Format is Table sequence no/(Estimated) Total number 
> of tables and views.{color}
> * After each function dump, will add a log as follows
> {color:#59afe1}* Function Name
> * Function dump end time
> * Function dump progress. Format is Function sequence no/(Estimated) Total 
> number of functions.{color}
> * After completion of all dumps, will add a log as follows to consolidate the 
> dump.
> {color:#59afe1}* Database Name.
> * Dump Type (BOOTSTRAP).
> * Dump End Time.
> * (Actual) Total number of tables/views dumped.
> * (Actual) Total number of functions dumped.
> * Dump Directory.
> * Last Repl ID of the dump.{color}
> *Note:* The actual and estimated numbers of tables/functions may not match if 
> any table/function is dropped while the dump is in progress.
> *+Bootstrap Load:+*
> * At the start of bootstrap load, will add one log with below details.
> {color:#59afe1}* Database Name
> * Dump directory
> * Load Type (BOOTSTRAP)
> * Total number of tables/views to load
> * Total number of functions to load.
> * Load Start Time{color}
> * After each table load, will add a log as follows
> {color:#59afe1}* Table/View Name
> * Type (TABLE/VIEW/MATERIALIZED_VIEW)
> * Table load completion time
> * Table load progress. Format is Table sequence no/Total number of tables and 
> views.{color}
> * After each function load, will add a log as follows
> {color:#59afe1}* Function Name
> * Function load completion time
> * Function load progress. Format is Function sequence no/Total number of 
> functions.{color}
> * After completion of all loads, will add a log as follows to consolidate the 
> load.
> {color:#59afe1}* Database Name.
> * Load Type (BOOTSTRAP).
> * Load End Time.
> * Total number of tables/views loaded.
> * Total number of functions loaded.
> * Last Repl ID of the loaded database.{color}
> *+Incremental Dump:+*
> * At the start of database dump, will add one log with below details.
> {color:#59afe1}* Database Name
> * Dump Type (INCREMENTAL)
> * (Estimated) Total number of events to dump.
> * Dump Start Time{color}
> * After each event dump, will add a log as follows
> {color:#59afe1}* Event ID
> * Event Type (CREATE_TABLE, DROP_TABLE, ALTER_TABLE, INSERT etc)
> * Event dump end time
> * Event dump progress. Format is Event sequence no/ (Estimated) Total number 
> of events.{color}
> * After completion of all event dumps, will add a log as follows.
> {color:#59afe1}* Database Name.
> * Dump Type (INCREMENTAL).
> * Dump End Time.
> * (Actual) Total number of events dumped.
> * Dump Directory.
> * Last Repl ID of the dump.{color}
> *Note:* The estimated number of events can differ significantly from the 
> actual number, as we don’t have the number of events upfront until we read 
> from the metastore NotificationEvents table.
> *+Incremental Load:+*
> * At the start of incremental load, will add one log with below details.
> {color:#59afe1}* Target Database Name 
> * Dump directory
> * Load Type (INCREMENTAL)
> * Total number of events to load
> * Load Start Time{color}
> * After each event load, will add a log as follows
> {color:#59afe1}* Event ID
> * Event Type (CREATE_TABLE, DROP_TABLE, ALTER_TABLE, INSERT etc)
> * Event load end time
> * Event load progress. Format is Event sequence no/ Total number of 
> events.{color}
> * After completion of all event loads, will add a log as follows to 
> consolidate the load.
> {color:#59afe1}* Target Database Name.
> * Load Type (INCREMENTAL).
> * Load End Time.
> * Total number of events loaded.
> * Last Repl ID of the loaded database.{color}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Commented] (HIVE-16669) Fine tune Compaction to take advantage of Acid 2.0

2017-08-21 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16135751#comment-16135751
 ] 

Eugene Koifman commented on HIVE-16669:
---

see todo: TestAcidOnTez.testCtasTezUnion

> Fine tune Compaction to take advantage of Acid 2.0
> --
>
> Key: HIVE-16669
> URL: https://issues.apache.org/jira/browse/HIVE-16669
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>
> * There is little point in using the 2.0 vectorized reader, since there is no 
> operator pipeline in compaction
> * If minor compaction just concatenates delete_delta files together, then the 
> 2-stage compaction should always ensure that we have a limited number of ORC 
> readers to do the merging, and the current OrcRawRecordMerger should be fine
> * ...



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17365) Druid CTAS should support CHAR/VARCHAR type

2017-08-21 Thread Jesus Camacho Rodriguez (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16135740#comment-16135740
 ] 

Jesus Camacho Rodriguez commented on HIVE-17365:


Failures are not related. [~ashutoshc], could you take a look? Thanks

> Druid CTAS should support CHAR/VARCHAR type
> ---
>
> Key: HIVE-17365
> URL: https://issues.apache.org/jira/browse/HIVE-17365
> Project: Hive
>  Issue Type: Bug
>  Components: Druid integration
>Affects Versions: 3.0.0
>Reporter: Dileep Kumar Chiguruvada
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-17365.patch
>
>
> Currently these types are not recognized, and we throw an exception when we 
> try to create a table with them.
> {noformat}
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.serde2.SerDeException: Unknown type: CHAR
>   at org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:788)
>   at org.apache.hadoop.hive.ql.exec.vector.VectorFileSinkOperator.process(VectorFileSinkOperator.java:101)
>   at org.apache.hadoop.hive.ql.exec.Operator.baseForward(Operator.java:955)
>   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:903)
>   at org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.process(VectorSelectOperator.java:145)
>   at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.processVectorGroup(ReduceRecordSource.java:478)
>   ... 19 more
> Caused by: org.apache.hadoop.hive.serde2.SerDeException: Unknown type: CHAR
>   at org.apache.hadoop.hive.druid.serde.DruidSerDe.serialize(DruidSerDe.java:501)
>   at org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:715)
>   ... 24 more
> {noformat}
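A hedged sketch of the direction such a fix could take (not the actual
DruidSerDe code; type-name handling is simplified, and parameterized forms
like char(10) are ignored here):

{code}
public class DruidTypeMapperSketch {
  // CHAR and VARCHAR carry string payloads, so they can be serialized to
  // Druid the same way as STRING instead of hitting the "Unknown type" path.
  // Real code would also strip CHAR trailing padding before serializing.
  public static String toDruidDimension(Object value, String hiveTypeName) {
    switch (hiveTypeName) {
      case "string":
      case "char":
      case "varchar":
        return value == null ? null : value.toString();
      default:
        throw new IllegalArgumentException("Unknown type: " + hiveTypeName);
    }
  }
}
{code}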



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17365) Druid CTAS should support CHAR/VARCHAR type

2017-08-21 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16135734#comment-16135734
 ] 

Hive QA commented on HIVE-17365:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12882937/HIVE-17365.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 7 failed/errored test(s), 10994 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[subquery_notexists_having]
 (batchId=81)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning]
 (batchId=169)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=235)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] 
(batchId=235)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=180)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=180)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=180)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6473/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6473/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6473/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 7 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12882937 - PreCommit-HIVE-Build

> Druid CTAS should support CHAR/VARCHAR type
> ---
>
> Key: HIVE-17365
> URL: https://issues.apache.org/jira/browse/HIVE-17365
> Project: Hive
>  Issue Type: Bug
>  Components: Druid integration
>Affects Versions: 3.0.0
>Reporter: Dileep Kumar Chiguruvada
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-17365.patch
>
>
> Currently this type is not recognized and we throw an exception when we try 
> to create a table with it.
> {noformat}
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
> org.apache.hadoop.hive.serde2.SerDeException: Unknown type: CHAR
>   at 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:788)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.VectorFileSinkOperator.process(VectorFileSinkOperator.java:101)
>   at 
> org.apache.hadoop.hive.ql.exec.Operator.baseForward(Operator.java:955)
>   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:903)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.process(VectorSelectOperator.java:145)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.processVectorGroup(ReduceRecordSource.java:478)
>   ... 19 more
> Caused by: org.apache.hadoop.hive.serde2.SerDeException: Unknown type: CHAR
>   at 
> org.apache.hadoop.hive.druid.serde.DruidSerDe.serialize(DruidSerDe.java:501)
>   at 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:715)
>   ... 24 more
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16886) HMS log notifications may have duplicated event IDs if multiple HMS are running concurrently

2017-08-21 Thread JIRA

[ 
https://issues.apache.org/jira/browse/HIVE-16886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16135733#comment-16135733
 ] 

Sergio Peña commented on HIVE-16886:


[~anishek] [~thejas] Thanks for starting work on this. I have a few comments 
about using NL_ID that we should keep in mind for the patch. They are based on 
some other tests we have done with datastore incremental IDs.

1. Use the {{sequence}} strategy on the NL_ID. This should keep increments 
atomic. However, we have found that these increments do not guarantee commit 
order: we have seen IDs committed in the order 1,3,2 because of delays in the 
transaction commit. We call this out-of-order behavior 'holes', because when 
fetching updates we may get only 1,3 (2 is not there yet, since it will be 
committed later).

2. To solve the holes problem when our clients request new notifications, we 
should request them in temporal order instead of ID order. To do that, we 
should write a new API that fetches notifications based on a timestamp instead 
of an ID. Timestamps preserve the order correctly (see the sketch after this 
list).

3. Use a SQL timestamp instead of the one produced by the 
DbNotificationListener#now() method. If possible, use a sub-second timestamp, 
such as {{current_timestamp(6)}} from SQL. The now() method running on 
different HMS servers may return different, out-of-sync values, sometimes 
producing a weird order. We found that {{current_timestamp(6)}} is evaluated 
at the moment of SQL execution, so it is the best time we can get for the 
transaction.
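
A minimal sketch of points 2 and 3 together. The table/column names and the 
fetch-by-timestamp API ({{getNextNotificationsByTime}}) are assumptions for 
illustration only; no such API exists in HMS today:

{code:java}
// Point 3 (write side): let the database stamp the event at SQL execution
// time instead of calling DbNotificationListener#now() in Java, so there is
// one clock for all HMS writers. Table/column names are illustrative.
try (PreparedStatement ps = conn.prepareStatement(
    "INSERT INTO NOTIFICATION_LOG (EVENT_TYPE, MESSAGE, EVENT_TIME) "
        + "VALUES (?, ?, CURRENT_TIMESTAMP(6))")) {
  ps.setString(1, eventType);
  ps.setString(2, message);
  ps.executeUpdate();
}

// Point 2 (read side): fetch by time window rather than by last-seen ID, so a
// late-committing "hole" (e.g. ID 2 arriving after 1,3) is still picked up on
// the next pass. getNextNotificationsByTime() is a hypothetical new API.
NotificationEventResponse rsp =
    msClient.getNextNotificationsByTime(lastEventTime - OVERLAP_SECONDS, MAX_BATCH);
for (NotificationEvent e : rsp.getEvents()) {
  if (!alreadyApplied(e)) {   // idempotency check, since windows overlap
    apply(e);
  }
}
{code}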

This next section only applies to what we do as a client requesting 
notifications (not part of the patch, but useful to know).

On the client side, we sometimes do not request notifications for a period 
longer than the HMS clean-up thread's retention interval. This means that we 
may miss notifications that were removed during that time. To avoid this issue 
on our side, we request HMS notifications for a time-window period and reapply 
all of them for that window. If for some reason we get fewer notifications 
than expected, we assume that older events were purged, and we may request a 
new HMS snapshot if necessary.
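
A sketch of that window/purge handling on the client side; {{fetchWindow}}, 
{{expectedCount}} and {{requestFullSnapshot}} are hypothetical helpers:

{code:java}
// Reapply a whole time window on every pass; if part of the window was purged
// by the HMS clean-up thread, fall back to a full snapshot.
long windowStart = lastAppliedEventTime - WINDOW_SECONDS;
List<NotificationEvent> events = fetchWindow(windowStart);
if (events.size() < expectedCount(windowStart)) {
  requestFullSnapshot();          // older events were purged; re-bootstrap
} else {
  events.forEach(this::apply);    // reapplying is safe because apply() is idempotent
}
{code}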

> HMS log notifications may have duplicated event IDs if multiple HMS are 
> running concurrently
> 
>
> Key: HIVE-16886
> URL: https://issues.apache.org/jira/browse/HIVE-16886
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, Metastore
>Reporter: Sergio Peña
>Assignee: anishek
>
> When running multiple Hive Metastore servers and DB notifications are 
> enabled, I could see that notifications can be persisted with a duplicated 
> event ID. 
> This does not happen when running multiple threads in a single HMS node due 
> to the locking acquired on the DbNotificationsLog class, but multiple HMS 
> could cause conflicts.
> The issue is in the ObjectStore#addNotificationEvent() method. The event ID 
> fetched from the datastore is used for the new notification, incremented in 
> the server itself, then persisted or updated back to the datastore. If 2 
> servers read the same ID, then these 2 servers write a new notification with 
> the same ID.
> The event ID is neither unique nor a primary key.
> Here's a test case using the TestObjectStore class that confirms this issue:
> {noformat}
> @Test
>   public void testConcurrentAddNotifications() throws ExecutionException, 
> InterruptedException {
> final int NUM_THREADS = 2;
> CountDownLatch countIn = new CountDownLatch(NUM_THREADS);
> CountDownLatch countOut = new CountDownLatch(1);
> HiveConf conf = new HiveConf();
> conf.setVar(HiveConf.ConfVars.METASTORE_EXPRESSION_PROXY_CLASS, 
> MockPartitionExpressionProxy.class.getName());
> ExecutorService executorService = 
> Executors.newFixedThreadPool(NUM_THREADS);
> FutureTask tasks[] = new FutureTask[NUM_THREADS];
> for (int i = 0; i < NUM_THREADS; ++i) {
>   final int n = i;
>   tasks[i] = new FutureTask(new Callable() {
> @Override
> public Void call() throws Exception {
>   ObjectStore store = new ObjectStore();
>   store.setConf(conf);
>   NotificationEvent dbEvent =
>   new NotificationEvent(0, 0, 
> EventMessage.EventType.CREATE_DATABASE.toString(), "CREATE DATABASE DB" + n);
>   System.out.println("ADDING NOTIFICATION");
>   countIn.countDown();
>   countOut.await();
>   store.addNotificationEvent(dbEvent);
>   System.out.println("FINISH NOTIFICATION");
>   return null;
> }
>   });
>   executorService.execute(tasks[i]);
> }
> countIn.await();
> countOut.countDown();
> for (int i = 0; i < NUM_THREADS; ++i) {
>   tasks[i].get();
> }
> NotificationEventResponse eventResponse = 
> 

[jira] [Updated] (HIVE-17360) Tez session reopen appears to use a wrong conf object

2017-08-21 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-17360:

Attachment: HIVE-17360.01.patch

Same patch; it looks like HiveQA was down.

> Tez session reopen appears to use a wrong conf object
> -
>
> Key: HIVE-17360
> URL: https://issues.apache.org/jira/browse/HIVE-17360
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-17360.01.patch, HIVE-17360.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17327) LLAP IO: restrict native file ID usage to default FS to avoid hypothetical collisions when HDFS federation is used

2017-08-21 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-17327:

   Resolution: Fixed
Fix Version/s: 2.4.0
   3.0.0
   Status: Resolved  (was: Patch Available)

Committed to master and branch-2. Thanks for the review!

> LLAP IO: restrict native file ID usage to default FS to avoid hypothetical 
> collisions when HDFS federation is used
> --
>
> Key: HIVE-17327
> URL: https://issues.apache.org/jira/browse/HIVE-17327
> Project: Hive
>  Issue Type: Bug
>Reporter: Gopal V
>Assignee: Sergey Shelukhin
> Fix For: 3.0.0, 2.4.0
>
> Attachments: HIVE-17327.01.patch, HIVE-17327.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17237) HMS wastes 26.4% of memory due to dup strings in metastore.api.Partition.parameters

2017-08-21 Thread Misha Dmitriev (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Misha Dmitriev updated HIVE-17237:
--
Status: Patch Available  (was: In Progress)

Just rebased the change and submitted the new patch.

> HMS wastes 26.4% of memory due to dup strings in 
> metastore.api.Partition.parameters
> ---
>
> Key: HIVE-17237
> URL: https://issues.apache.org/jira/browse/HIVE-17237
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Reporter: Misha Dmitriev
>Assignee: Misha Dmitriev
> Attachments: HIVE-17237.01.patch, HIVE-17237.02.patch
>
>
> I've analyzed a heap dump from a production Hive installation using jxray 
> (www.jxray.com) It turns out that there are a lot of duplicate strings in 
> memory, that waste 26.4% of the heap. Most of them come from HashMaps 
> referenced by org.apache.hadoop.hive.metastore.api.Partition.parameters. 
> Below is the relevant section of the jxray report.
> Looking at Partition.java, I see that in the past somebody has already added 
> code to intern keys and values in the parameters table when it's first set 
> up. However, when more key-value pairs are added, they are not interned, and 
> that probably explains the reason for all these duplicate strings. Also when 
> a Partition instance is deserialized, no interning of parameters is currently 
> done.
> {code}
> 6. DUPLICATE STRINGS
> Total strings: 3,273,557  Unique strings: 460,390  Duplicate values: 110,232  
> Overhead: 3,220,458K (26.4%)
> 
> ===
> 7. REFERENCE CHAINS FOR DUPLICATE STRINGS
>   2,326,150K (19.1%), 597058 dup strings (36386 unique), 597058 dup backing 
> arrays:
> 39949 of "-1", 39088 of "true", 28959 of "8", 20987 of "1", 18437 of "10", 
> 9583 of "9", 5908 of "269664", 5691 of "174528", 4598 of "133980", 4598 of 
> "BgUGBQgFCAYFCgYIBgUEBgQHBgUGCwYGBwYHBgkKBwYGBggIBwUHBgYGCgUJCQUG ...[length 
> 3560]"
> ... and 419200 more strings, of which 36376 are unique
> Also contains one-char strings: 217 of "6", 147 of "7", 91 of "4", 28 of "5", 
> 28 of "2", 21 of "0"
>  <--  {j.u.HashMap}.values <-- 
> org.apache.hadoop.hive.metastore.api.Partition.parameters <--  
> {j.u.ArrayList} <-- 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_partitions_result.success
>  <-- Java Local 
> (org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_partitions_result)
>  [@6e33618d8,@6eedb9a80,@6eedbad68,@6eedbc788] ... and 3 more GC roots
>   463,060K (3.8%), 119644 dup strings (34075 unique), 119644 dup backing 
> arrays:
> 7914 of "true", 7912 of "-1", 6578 of "8", 5606 of "1", 2302 of "10", 1626 of 
> "174528", 1223 of "9", 970 of "171680", 837 of "269664", 657 of "133980"
> ... and 84009 more strings, of which 34065 are unique
> Also contains one-char strings: 42 of "7", 31 of "6", 20 of "4", 8 of "5", 5 
> of "2", 3 of "0"
>  <--  {j.u.HashMap}.values <-- 
> org.apache.hadoop.hive.metastore.api.Partition.parameters <--  
> {j.u.TreeMap}.values <-- Java Local (j.u.TreeMap) [@6f084afa0,@73aac9e68]
>   233,384K (1.9%), 64601 dup strings (27295 unique), 64601 dup backing arrays:
> 4472 of "true", 4173 of "-1", 3798 of "1", 3591 of "8", 813 of "174528", 684 
> of "10" ... and 44568 more strings, of which 27285 are unique
> Also contains one-char strings: 305 of "7", 301 of "0", 277 of "4", 146 of 
> "6", 29 of "2", 23 of "5", 19 of "9", 2 of "3"
>  <--  {j.u.HashMap}.values <-- 
> org.apache.hadoop.hive.metastore.api.Partition.parameters <--  
> {j.u.ArrayList} <-- Java Local (j.u.ArrayList) 
> [@4f4cfbd10,@536122408,@726616778]
> ...
>   52,916K (0.4%), 597058 dup strings (16 unique), 597058 dup backing arrays:
>  <--  {j.u.HashMap}.keys <-- 
> org.apache.hadoop.hive.metastore.api.Partition.parameters <--  
> {j.u.ArrayList} <-- 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_partitions_result.success
>  <-- Java Local 
> (org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_partitions_result)
>  [@6e33618d8,@6eedb9a80,@6eedbad68,@6eedbc788] ... and 3 more GC roots
> {code}
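
A minimal sketch of the interning the description calls for: intern on every 
put, and once more after deserialization. This mirrors the idea, not 
necessarily the exact patch:

{code:java}
// In the Thrift-generated Partition class (sketch): intern key and value on
// every add, not only when the whole parameters map is first set.
public void putToParameters(String key, String val) {
  if (this.parameters == null) {
    this.parameters = new HashMap<String, String>();
  }
  this.parameters.put(key.intern(), val.intern());
}

// After deserialization (sketch): re-intern whatever the deserializer built,
// since it bypasses putToParameters().
Map<String, String> interned = new HashMap<>(parameters.size());
for (Map.Entry<String, String> e : parameters.entrySet()) {
  interned.put(e.getKey().intern(), e.getValue().intern());
}
parameters = interned;
{code}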



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17237) HMS wastes 26.4% of memory due to dup strings in metastore.api.Partition.parameters

2017-08-21 Thread Misha Dmitriev (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Misha Dmitriev updated HIVE-17237:
--
Attachment: HIVE-17237.02.patch

> HMS wastes 26.4% of memory due to dup strings in 
> metastore.api.Partition.parameters
> ---
>
> Key: HIVE-17237
> URL: https://issues.apache.org/jira/browse/HIVE-17237
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Reporter: Misha Dmitriev
>Assignee: Misha Dmitriev
> Attachments: HIVE-17237.01.patch, HIVE-17237.02.patch
>
>
> I've analyzed a heap dump from a production Hive installation using jxray 
> (www.jxray.com) It turns out that there are a lot of duplicate strings in 
> memory, that waste 26.4% of the heap. Most of them come from HashMaps 
> referenced by org.apache.hadoop.hive.metastore.api.Partition.parameters. 
> Below is the relevant section of the jxray report.
> Looking at Partition.java, I see that in the past somebody has already added 
> code to intern keys and values in the parameters table when it's first set 
> up. However, when more key-value pairs are added, they are not interned, and 
> that probably explains the reason for all these duplicate strings. Also when 
> a Partition instance is deserialized, no interning of parameters is currently 
> done.
> {code}
> 6. DUPLICATE STRINGS
> Total strings: 3,273,557  Unique strings: 460,390  Duplicate values: 110,232  
> Overhead: 3,220,458K (26.4%)
> 
> ===
> 7. REFERENCE CHAINS FOR DUPLICATE STRINGS
>   2,326,150K (19.1%), 597058 dup strings (36386 unique), 597058 dup backing 
> arrays:
> 39949 of "-1", 39088 of "true", 28959 of "8", 20987 of "1", 18437 of "10", 
> 9583 of "9", 5908 of "269664", 5691 of "174528", 4598 of "133980", 4598 of 
> "BgUGBQgFCAYFCgYIBgUEBgQHBgUGCwYGBwYHBgkKBwYGBggIBwUHBgYGCgUJCQUG ...[length 
> 3560]"
> ... and 419200 more strings, of which 36376 are unique
> Also contains one-char strings: 217 of "6", 147 of "7", 91 of "4", 28 of "5", 
> 28 of "2", 21 of "0"
>  <--  {j.u.HashMap}.values <-- 
> org.apache.hadoop.hive.metastore.api.Partition.parameters <--  
> {j.u.ArrayList} <-- 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_partitions_result.success
>  <-- Java Local 
> (org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_partitions_result)
>  [@6e33618d8,@6eedb9a80,@6eedbad68,@6eedbc788] ... and 3 more GC roots
>   463,060K (3.8%), 119644 dup strings (34075 unique), 119644 dup backing 
> arrays:
> 7914 of "true", 7912 of "-1", 6578 of "8", 5606 of "1", 2302 of "10", 1626 of 
> "174528", 1223 of "9", 970 of "171680", 837 of "269664", 657 of "133980"
> ... and 84009 more strings, of which 34065 are unique
> Also contains one-char strings: 42 of "7", 31 of "6", 20 of "4", 8 of "5", 5 
> of "2", 3 of "0"
>  <--  {j.u.HashMap}.values <-- 
> org.apache.hadoop.hive.metastore.api.Partition.parameters <--  
> {j.u.TreeMap}.values <-- Java Local (j.u.TreeMap) [@6f084afa0,@73aac9e68]
>   233,384K (1.9%), 64601 dup strings (27295 unique), 64601 dup backing arrays:
> 4472 of "true", 4173 of "-1", 3798 of "1", 3591 of "8", 813 of "174528", 684 
> of "10" ... and 44568 more strings, of which 27285 are unique
> Also contains one-char strings: 305 of "7", 301 of "0", 277 of "4", 146 of 
> "6", 29 of "2", 23 of "5", 19 of "9", 2 of "3"
>  <--  {j.u.HashMap}.values <-- 
> org.apache.hadoop.hive.metastore.api.Partition.parameters <--  
> {j.u.ArrayList} <-- Java Local (j.u.ArrayList) 
> [@4f4cfbd10,@536122408,@726616778]
> ...
>   52,916K (0.4%), 597058 dup strings (16 unique), 597058 dup backing arrays:
>  <--  {j.u.HashMap}.keys <-- 
> org.apache.hadoop.hive.metastore.api.Partition.parameters <--  
> {j.u.ArrayList} <-- 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_partitions_result.success
>  <-- Java Local 
> (org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_partitions_result)
>  [@6e33618d8,@6eedb9a80,@6eedbad68,@6eedbc788] ... and 3 more GC roots
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17330) refactor TezSessionPoolManager to separate its multiple functions

2017-08-21 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16135639#comment-16135639
 ] 

Sergey Shelukhin commented on HIVE-17330:
-

[~hagleitn] do you want to take a look at this?

> refactor TezSessionPoolManager to separate its multiple functions
> -
>
> Key: HIVE-17330
> URL: https://issues.apache.org/jira/browse/HIVE-17330
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-17330.01.patch, HIVE-17330.patch
>
>
> TezSessionPoolManager would retain things specific to current Hive session 
> management. 
> The session pool itself, as well as expiration tracking, the pool session 
> implementation, and some config validation can be separated out and made 
> independent from the pool.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17237) HMS wastes 26.4% of memory due to dup strings in metastore.api.Partition.parameters

2017-08-21 Thread Misha Dmitriev (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Misha Dmitriev updated HIVE-17237:
--
Status: In Progress  (was: Patch Available)

> HMS wastes 26.4% of memory due to dup strings in 
> metastore.api.Partition.parameters
> ---
>
> Key: HIVE-17237
> URL: https://issues.apache.org/jira/browse/HIVE-17237
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Reporter: Misha Dmitriev
>Assignee: Misha Dmitriev
> Attachments: HIVE-17237.01.patch, HIVE-17237.02.patch
>
>
> I've analyzed a heap dump from a production Hive installation using jxray 
> (www.jxray.com) It turns out that there are a lot of duplicate strings in 
> memory, that waste 26.4% of the heap. Most of them come from HashMaps 
> referenced by org.apache.hadoop.hive.metastore.api.Partition.parameters. 
> Below is the relevant section of the jxray report.
> Looking at Partition.java, I see that in the past somebody has already added 
> code to intern keys and values in the parameters table when it's first set 
> up. However, when more key-value pairs are added, they are not interned, and 
> that probably explains the reason for all these duplicate strings. Also when 
> a Partition instance is deserialized, no interning of parameters is currently 
> done.
> {code}
> 6. DUPLICATE STRINGS
> Total strings: 3,273,557  Unique strings: 460,390  Duplicate values: 110,232  
> Overhead: 3,220,458K (26.4%)
> 
> ===
> 7. REFERENCE CHAINS FOR DUPLICATE STRINGS
>   2,326,150K (19.1%), 597058 dup strings (36386 unique), 597058 dup backing 
> arrays:
> 39949 of "-1", 39088 of "true", 28959 of "8", 20987 of "1", 18437 of "10", 
> 9583 of "9", 5908 of "269664", 5691 of "174528", 4598 of "133980", 4598 of 
> "BgUGBQgFCAYFCgYIBgUEBgQHBgUGCwYGBwYHBgkKBwYGBggIBwUHBgYGCgUJCQUG ...[length 
> 3560]"
> ... and 419200 more strings, of which 36376 are unique
> Also contains one-char strings: 217 of "6", 147 of "7", 91 of "4", 28 of "5", 
> 28 of "2", 21 of "0"
>  <--  {j.u.HashMap}.values <-- 
> org.apache.hadoop.hive.metastore.api.Partition.parameters <--  
> {j.u.ArrayList} <-- 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_partitions_result.success
>  <-- Java Local 
> (org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_partitions_result)
>  [@6e33618d8,@6eedb9a80,@6eedbad68,@6eedbc788] ... and 3 more GC roots
>   463,060K (3.8%), 119644 dup strings (34075 unique), 119644 dup backing 
> arrays:
> 7914 of "true", 7912 of "-1", 6578 of "8", 5606 of "1", 2302 of "10", 1626 of 
> "174528", 1223 of "9", 970 of "171680", 837 of "269664", 657 of "133980"
> ... and 84009 more strings, of which 34065 are unique
> Also contains one-char strings: 42 of "7", 31 of "6", 20 of "4", 8 of "5", 5 
> of "2", 3 of "0"
>  <--  {j.u.HashMap}.values <-- 
> org.apache.hadoop.hive.metastore.api.Partition.parameters <--  
> {j.u.TreeMap}.values <-- Java Local (j.u.TreeMap) [@6f084afa0,@73aac9e68]
>   233,384K (1.9%), 64601 dup strings (27295 unique), 64601 dup backing arrays:
> 4472 of "true", 4173 of "-1", 3798 of "1", 3591 of "8", 813 of "174528", 684 
> of "10" ... and 44568 more strings, of which 27285 are unique
> Also contains one-char strings: 305 of "7", 301 of "0", 277 of "4", 146 of 
> "6", 29 of "2", 23 of "5", 19 of "9", 2 of "3"
>  <--  {j.u.HashMap}.values <-- 
> org.apache.hadoop.hive.metastore.api.Partition.parameters <--  
> {j.u.ArrayList} <-- Java Local (j.u.ArrayList) 
> [@4f4cfbd10,@536122408,@726616778]
> ...
>   52,916K (0.4%), 597058 dup strings (16 unique), 597058 dup backing arrays:
>  <--  {j.u.HashMap}.keys <-- 
> org.apache.hadoop.hive.metastore.api.Partition.parameters <--  
> {j.u.ArrayList} <-- 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_partitions_result.success
>  <-- Java Local 
> (org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_partitions_result)
>  [@6e33618d8,@6eedb9a80,@6eedbad68,@6eedbc788] ... and 3 more GC roots
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17348) Remove unnecessary GenSparkUtils.java.orig file

2017-08-21 Thread Sahil Takiar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16135632#comment-16135632
 ] 

Sahil Takiar commented on HIVE-17348:
-

whoops, thanks for catching this. +1

> Remove unnecessary GenSparkUtils.java.orig file
> ---
>
> Key: HIVE-17348
> URL: https://issues.apache.org/jira/browse/HIVE-17348
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Peter Vary
>Assignee: Peter Vary
> Attachments: HIVE-17348.patch
>
>
> HIVE-17247 added an extra file, which is most probably not needed :)
> [~stakiar]? :)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17352) HiveSever2 error with "Illegal Operation state transition from CLOSED to FINISHED"

2017-08-21 Thread Vaibhav Gumashta (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16135619#comment-16135619
 ] 

Vaibhav Gumashta commented on HIVE-17352:
-

+1

> HiveSever2 error with "Illegal Operation state transition from CLOSED to 
> FINISHED"
> --
>
> Key: HIVE-17352
> URL: https://issues.apache.org/jira/browse/HIVE-17352
> Project: Hive
>  Issue Type: Bug
>Reporter: Deepak Jaiswal
>Assignee: Deepak Jaiswal
> Attachments: HIVE-17352.1.patch
>
>
> HiveSever2 error with "Illegal Operation state transition from CLOSED to 
> FINISHED"
> Many cases like CANCELED, TIMEDOUT AND CLOSED are handled. Need to handle 
> FINISHED in runQuery() method.
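
A sketch of the guard being described: skip the transition when the operation 
has already reached a terminal state. Illustrative only, not the attached 
patch:

{code:java}
// In runQuery() (sketch): if a background error races with close/cancel, the
// operation may already be in a terminal state; do not transition again.
OperationState state = getStatus().getState();
if (state == OperationState.CLOSED || state == OperationState.CANCELED
    || state == OperationState.TIMEDOUT || state == OperationState.FINISHED) {
  return;  // already terminal; transitioning again would throw
}
setState(OperationState.FINISHED);
{code}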



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17362) The MAX_PREWARM_TIME should be configurable on HoS

2017-08-21 Thread Peter Vary (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Vary updated HIVE-17362:
--
Attachment: HIVE-17362.2.patch

Addressed review comments and test failures.

> The MAX_PREWARM_TIME should be configurable on HoS
> --
>
> Key: HIVE-17362
> URL: https://issues.apache.org/jira/browse/HIVE-17362
> Project: Hive
>  Issue Type: Improvement
>  Components: Spark
>Affects Versions: 3.0.0
>Reporter: Peter Vary
>Assignee: Peter Vary
> Attachments: HIVE-17362.2.patch, HIVE-17362.patch
>
>
> When using HIVE_PREWARM_ENABLED, we are waiting MAX_PREWARM_TIME for the 
> containers to warm up. This is currently set to 5s. This is often not enough 
> for a spark session to initialize the executors. We should be able to 
> configure this, so we can set a value which has an effect.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17362) The MAX_PREWARM_TIME should be configurable on HoS

2017-08-21 Thread Peter Vary (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16135615#comment-16135615
 ] 

Peter Vary commented on HIVE-17362:
---

Thanks for the review [~xuefuz]!

According to my tests, 30s should be enough. I used 4 mins only because the 
original code in QTestUtil used this specific value:
{code:title=QTestUtil}
  private CliSessionState createSessionState() {
    return new CliSessionState(conf) {
      @Override
      public void setSparkSession(SparkSession sparkSession) {
[..]
        // Wait a little for cluster to init, at most 4 minutes
        long endTime = System.currentTimeMillis() + 240000;
[..]
      }
    };
  }
{code}

Shall I change it to a lower value?

Good catch with {{hive.prewarm.numcontainers}}. We have to set it in the case 
of the standalone configuration, thanks for pointing that out. With a yarn 
master we do not need it, since we define {{spark.executor.instances}}.

Also, there was an issue with prewarm - it was only used in the case of a yarn 
master. We should enable it in local mode as well (see the original code). 
This caused the {{TestSparkNegativeCliDriver}} failures:
{code:title=RemoteHiveSparkClient}
  private void createRemoteClient() throws Exception {
    remoteClient = SparkClientFactory.createClient(conf, hiveConf);

    if (HiveConf.getBoolVar(hiveConf, ConfVars.HIVE_PREWARM_ENABLED) &&
        SparkClientUtilities.isYarnMaster(hiveConf.get("spark.master"))) {
[..]
    }
  }
{code}
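
For reference, the configurable wait could look roughly like this; the config 
name {{HIVE_PREWARM_SPARK_TIMEOUT}} and the surrounding code are assumptions, 
not the actual patch:

{code:java}
// Sketch: read the prewarm timeout from HiveConf instead of a hard-coded 5s,
// then poll until enough executors have registered or the timeout elapses.
long timeoutNanos = hiveConf.getTimeVar(
    HiveConf.ConfVars.HIVE_PREWARM_SPARK_TIMEOUT, TimeUnit.NANOSECONDS);  // assumed var
long endTime = System.nanoTime() + timeoutNanos;
while (getExecutorCount() < expectedExecutors && System.nanoTime() < endTime) {
  Thread.sleep(100);  // small poll interval while containers warm up
}
{code}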

> The MAX_PREWARM_TIME should be configurable on HoS
> --
>
> Key: HIVE-17362
> URL: https://issues.apache.org/jira/browse/HIVE-17362
> Project: Hive
>  Issue Type: Improvement
>  Components: Spark
>Affects Versions: 3.0.0
>Reporter: Peter Vary
>Assignee: Peter Vary
> Attachments: HIVE-17362.patch
>
>
> When using HIVE_PREWARM_ENABLED, we are waiting MAX_PREWARM_TIME for the 
> containers to warm up. This is currently set to 5s. This is often not enough 
> for a spark session to initialize the executors. We should be able to 
> configure this, so we can set a value which has an effect.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-14731) Use Tez cartesian product edge in Hive (unpartitioned case only)

2017-08-21 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16135601#comment-16135601
 ] 

Hive QA commented on HIVE-14731:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12882920/HIVE-14731.20.patch

{color:green}SUCCESS:{color} +1 due to 7 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 38 failed/errored test(s), 10992 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[cte_mat_4] (batchId=6)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[insert_values_orig_table_use_metadata]
 (batchId=61)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[dynamic_partition_pruning_2]
 (batchId=143)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[auto_join0] 
(batchId=163)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[auto_join_filters]
 (batchId=158)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[auto_join_nulls]
 (batchId=159)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[auto_sortmerge_join_12]
 (batchId=152)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[cross_prod_1]
 (batchId=160)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[cross_prod_2]
 (batchId=157)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[cross_prod_4]
 (batchId=152)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[cross_product_check_2]
 (batchId=163)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[explainuser_1]
 (batchId=153)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[hybridgrace_hashjoin_1]
 (batchId=150)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[jdbc_handler]
 (batchId=157)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[leftsemijoin]
 (batchId=155)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[subquery_exists]
 (batchId=154)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[subquery_in]
 (batchId=158)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[subquery_multi]
 (batchId=149)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[subquery_null_agg]
 (batchId=160)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[subquery_scalar]
 (batchId=154)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[subquery_select]
 (batchId=154)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_between_columns]
 (batchId=159)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_complex_all]
 (batchId=158)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_groupby_mapjoin]
 (batchId=160)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_include_no_sel]
 (batchId=146)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_join_filters]
 (batchId=158)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_join_nulls]
 (batchId=150)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vectorized_multi_output_select]
 (batchId=158)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning]
 (batchId=169)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] 
(batchId=100)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[hybridgrace_hashjoin_1]
 (batchId=99)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=235)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] 
(batchId=235)
org.apache.hadoop.hive.cli.TestSparkNegativeCliDriver.org.apache.hadoop.hive.cli.TestSparkNegativeCliDriver
 (batchId=242)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=180)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=180)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=180)
org.apache.hive.jdbc.TestJdbcWithMiniHS2.testHttpRetryOnServerIdleTimeout 
(batchId=228)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6472/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6472/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6472/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 38 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12882920 - 

[jira] [Updated] (HIVE-17366) Constraint replication in bootstrap

2017-08-21 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated HIVE-17366:
--
Attachment: HIVE-17366.1.patch

> Constraint replication in bootstrap
> ---
>
> Key: HIVE-17366
> URL: https://issues.apache.org/jira/browse/HIVE-17366
> Project: Hive
>  Issue Type: New Feature
>  Components: repl
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Attachments: HIVE-17366.1.patch
>
>
> Incremental constraint replication is tracked in HIVE-15705. This is to track 
> the bootstrap replication.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17366) Constraint replication in bootstrap

2017-08-21 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated HIVE-17366:
--
Status: Patch Available  (was: Open)

> Constraint replication in bootstrap
> ---
>
> Key: HIVE-17366
> URL: https://issues.apache.org/jira/browse/HIVE-17366
> Project: Hive
>  Issue Type: New Feature
>  Components: repl
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Attachments: HIVE-17366.1.patch
>
>
> Incremental constraint replication is tracked in HIVE-15705. This is to track 
> the bootstrap replication.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17366) Constraint replication in bootstrap

2017-08-21 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated HIVE-17366:
--
Issue Type: New Feature  (was: Bug)

> Constraint replication in bootstrap
> ---
>
> Key: HIVE-17366
> URL: https://issues.apache.org/jira/browse/HIVE-17366
> Project: Hive
>  Issue Type: New Feature
>  Components: repl
>Reporter: Daniel Dai
>Assignee: Daniel Dai
>
> Incremental constraint replication is tracked in HIVE-15705. This is to track 
> the bootstrap replication.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (HIVE-17366) Constraint replication in bootstrap

2017-08-21 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai reassigned HIVE-17366:
-


> Constraint replication in bootstrap
> ---
>
> Key: HIVE-17366
> URL: https://issues.apache.org/jira/browse/HIVE-17366
> Project: Hive
>  Issue Type: Bug
>  Components: repl
>Reporter: Daniel Dai
>Assignee: Daniel Dai
>
> Incremental constraint replication is tracked in HIVE-15705. This is to track 
> the bootstrap replication.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17364) Add unit test to "alter view" replication

2017-08-21 Thread Tao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tao Li updated HIVE-17364:
--
Status: Patch Available  (was: Open)

> Add unit test to "alter view" replication
> -
>
> Key: HIVE-17364
> URL: https://issues.apache.org/jira/browse/HIVE-17364
> Project: Hive
>  Issue Type: Bug
>Reporter: Tao Li
>Assignee: Tao Li
>Priority: Minor
> Attachments: HIVE-17364.1.patch
>
>
> Adding a unit test to HIVE-17354 change.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17364) Add unit test to "alter view" replication

2017-08-21 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16135567#comment-16135567
 ] 

Daniel Dai commented on HIVE-17364:
---

+1 pending test.

> Add unit test to "alter view" replication
> -
>
> Key: HIVE-17364
> URL: https://issues.apache.org/jira/browse/HIVE-17364
> Project: Hive
>  Issue Type: Bug
>Reporter: Tao Li
>Assignee: Tao Li
>Priority: Minor
> Attachments: HIVE-17364.1.patch
>
>
> Adding a unit test to HIVE-17354 change.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Work started] (HIVE-17365) Druid CTAS should support CHAR/VARCHAR type

2017-08-21 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-17365 started by Jesus Camacho Rodriguez.
--
> Druid CTAS should support CHAR/VARCHAR type
> ---
>
> Key: HIVE-17365
> URL: https://issues.apache.org/jira/browse/HIVE-17365
> Project: Hive
>  Issue Type: Bug
>  Components: Druid integration
>Affects Versions: 3.0.0
>Reporter: Dileep Kumar Chiguruvada
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-17365.patch
>
>
> Currently this type is not recognized and we throw an exception when we try 
> to create a table with it.
> {noformat}
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
> org.apache.hadoop.hive.serde2.SerDeException: Unknown type: CHAR
>   at 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:788)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.VectorFileSinkOperator.process(VectorFileSinkOperator.java:101)
>   at 
> org.apache.hadoop.hive.ql.exec.Operator.baseForward(Operator.java:955)
>   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:903)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.process(VectorSelectOperator.java:145)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.processVectorGroup(ReduceRecordSource.java:478)
>   ... 19 more
> Caused by: org.apache.hadoop.hive.serde2.SerDeException: Unknown type: CHAR
>   at 
> org.apache.hadoop.hive.druid.serde.DruidSerDe.serialize(DruidSerDe.java:501)
>   at 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:715)
>   ... 24 more
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17365) Druid CTAS should support CHAR/VARCHAR type

2017-08-21 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-17365:
---
Attachment: HIVE-17365.patch

> Druid CTAS should support CHAR/VARCHAR type
> ---
>
> Key: HIVE-17365
> URL: https://issues.apache.org/jira/browse/HIVE-17365
> Project: Hive
>  Issue Type: Bug
>  Components: Druid integration
>Affects Versions: 3.0.0
>Reporter: Dileep Kumar Chiguruvada
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-17365.patch
>
>
> Currently this type is not recognized and we throw an exception when we try 
> to create a table with it.
> {noformat}
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
> org.apache.hadoop.hive.serde2.SerDeException: Unknown type: CHAR
>   at 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:788)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.VectorFileSinkOperator.process(VectorFileSinkOperator.java:101)
>   at 
> org.apache.hadoop.hive.ql.exec.Operator.baseForward(Operator.java:955)
>   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:903)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.process(VectorSelectOperator.java:145)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.processVectorGroup(ReduceRecordSource.java:478)
>   ... 19 more
> Caused by: org.apache.hadoop.hive.serde2.SerDeException: Unknown type: CHAR
>   at 
> org.apache.hadoop.hive.druid.serde.DruidSerDe.serialize(DruidSerDe.java:501)
>   at 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:715)
>   ... 24 more
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17365) Druid CTAS should support CHAR/VARCHAR type

2017-08-21 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-17365:
---
Status: Patch Available  (was: In Progress)

> Druid CTAS should support CHAR/VARCHAR type
> ---
>
> Key: HIVE-17365
> URL: https://issues.apache.org/jira/browse/HIVE-17365
> Project: Hive
>  Issue Type: Bug
>  Components: Druid integration
>Affects Versions: 3.0.0
>Reporter: Dileep Kumar Chiguruvada
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-17365.patch
>
>
> Currently this type is not recognized and we throw an exception when we try 
> to create a table with it.
> {noformat}
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
> org.apache.hadoop.hive.serde2.SerDeException: Unknown type: CHAR
>   at 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:788)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.VectorFileSinkOperator.process(VectorFileSinkOperator.java:101)
>   at 
> org.apache.hadoop.hive.ql.exec.Operator.baseForward(Operator.java:955)
>   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:903)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.process(VectorSelectOperator.java:145)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.processVectorGroup(ReduceRecordSource.java:478)
>   ... 19 more
> Caused by: org.apache.hadoop.hive.serde2.SerDeException: Unknown type: CHAR
>   at 
> org.apache.hadoop.hive.druid.serde.DruidSerDe.serialize(DruidSerDe.java:501)
>   at 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:715)
>   ... 24 more
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17364) Add unit test to "alter view" replication

2017-08-21 Thread Tao Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16135543#comment-16135543
 ] 

Tao Li commented on HIVE-17364:
---

[~daijy] Can you please take a look at this change? cc [~thejas]

> Add unit test to "alter view" replication
> -
>
> Key: HIVE-17364
> URL: https://issues.apache.org/jira/browse/HIVE-17364
> Project: Hive
>  Issue Type: Bug
>Reporter: Tao Li
>Assignee: Tao Li
>Priority: Minor
> Attachments: HIVE-17364.1.patch
>
>
> Adding a unit test to HIVE-17354 change.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17364) Add unit test to "alter view" replication

2017-08-21 Thread Tao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tao Li updated HIVE-17364:
--
Description: Adding a unit test to HIVE-17354 change.

> Add unit test to "alter view" replication
> -
>
> Key: HIVE-17364
> URL: https://issues.apache.org/jira/browse/HIVE-17364
> Project: Hive
>  Issue Type: Bug
>Reporter: Tao Li
>Assignee: Tao Li
>Priority: Minor
> Attachments: HIVE-17364.1.patch
>
>
> Adding a unit test to HIVE-17354 change.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (HIVE-17365) Druid CTAS should support CHAR/VARCHAR type

2017-08-21 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez reassigned HIVE-17365:
--


> Druid CTAS should support CHAR/VARCHAR type
> ---
>
> Key: HIVE-17365
> URL: https://issues.apache.org/jira/browse/HIVE-17365
> Project: Hive
>  Issue Type: Bug
>  Components: Druid integration
>Affects Versions: 3.0.0
>Reporter: Dileep Kumar Chiguruvada
>Assignee: Jesus Camacho Rodriguez
>
> Currently this type is not recognized and we throw an exception when we try 
> to create a table with it.
> {noformat}
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
> org.apache.hadoop.hive.serde2.SerDeException: Unknown type: CHAR
>   at 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:788)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.VectorFileSinkOperator.process(VectorFileSinkOperator.java:101)
>   at 
> org.apache.hadoop.hive.ql.exec.Operator.baseForward(Operator.java:955)
>   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:903)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.process(VectorSelectOperator.java:145)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.processVectorGroup(ReduceRecordSource.java:478)
>   ... 19 more
> Caused by: org.apache.hadoop.hive.serde2.SerDeException: Unknown type: CHAR
>   at 
> org.apache.hadoop.hive.druid.serde.DruidSerDe.serialize(DruidSerDe.java:501)
>   at 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:715)
>   ... 24 more
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17364) Add unit test to "alter view" replication

2017-08-21 Thread Tao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tao Li updated HIVE-17364:
--
Attachment: HIVE-17364.1.patch

> Add unit test to "alter view" replication
> -
>
> Key: HIVE-17364
> URL: https://issues.apache.org/jira/browse/HIVE-17364
> Project: Hive
>  Issue Type: Bug
>Reporter: Tao Li
>Assignee: Tao Li
>Priority: Minor
> Attachments: HIVE-17364.1.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (HIVE-17364) Add unit test to "alter view" replication

2017-08-21 Thread Tao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tao Li reassigned HIVE-17364:
-


> Add unit test to "alter view" replication
> -
>
> Key: HIVE-17364
> URL: https://issues.apache.org/jira/browse/HIVE-17364
> Project: Hive
>  Issue Type: Bug
>Reporter: Tao Li
>Assignee: Tao Li
>Priority: Minor
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17362) The MAX_PREWARM_TIME should be configurable on HoS

2017-08-21 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16135519#comment-16135519
 ] 

Xuefu Zhang commented on HIVE-17362:


I like the idea of replacing the code trick with configuration. However, it 
seems to me that 4m is a long time. Nevertheless, we probably need to set 
{{hive.prewarm.numcontainers}} to whatever number of containers we are 
expecting. Otherwise, the test might have to wait the full 4m, as it will 
never get the default number (10) of containers.
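
For example, in the test setup, something like the following (sketch):

{code:java}
// Match the expected executor count so the prewarm wait can exit as soon as
// the single local executor registers, instead of timing out while waiting
// for the default of 10 containers.
conf.setIntVar(HiveConf.ConfVars.HIVE_PREWARM_NUM_CONTAINERS, 1);
{code}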

> The MAX_PREWARM_TIME should be configurable on HoS
> --
>
> Key: HIVE-17362
> URL: https://issues.apache.org/jira/browse/HIVE-17362
> Project: Hive
>  Issue Type: Improvement
>  Components: Spark
>Affects Versions: 3.0.0
>Reporter: Peter Vary
>Assignee: Peter Vary
> Attachments: HIVE-17362.patch
>
>
> When using HIVE_PREWARM_ENABLED, we are waiting MAX_PREWARM_TIME for the 
> containers to warm up. This is currently set to 5s. This is often not enough 
> for a spark session to initialize the executors. We should be able to 
> configure this, so we can set a value which has an effect.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17353) The ResultSets are not accessible if running multiple queries within the same HiveStatement

2017-08-21 Thread Aihua Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated HIVE-17353:

Resolution: Not A Problem
Status: Resolved  (was: Patch Available)

It's by design, per the doc here: 
https://docs.oracle.com/javase/7/docs/api/java/sql/Statement.html. So closing 
now.
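
For anyone hitting this: per that contract, executing a new query on a 
Statement implicitly closes its currently open ResultSet, so the pattern in 
the description needs one Statement per concurrently open ResultSet. A minimal 
sketch:

{code:java}
try (Statement s1 = conn.createStatement();
     Statement s2 = conn.createStatement()) {
  ResultSet rs1 = s1.executeQuery("select * from testMultipleResultSets1");
  ResultSet rs2 = s2.executeQuery("select * from testMultipleResultSets2");
  rs1.next();  // both stay valid: each ResultSet belongs to its own Statement
  rs2.next();
}
{code}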

> The ResultSets are not accessible if running multiple queries within the same 
> HiveStatement 
> 
>
> Key: HIVE-17353
> URL: https://issues.apache.org/jira/browse/HIVE-17353
> Project: Hive
>  Issue Type: Bug
>  Components: JDBC
>Affects Versions: 3.0.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Attachments: HIVE-17353.1.patch
>
>
> The following queries would fail,
> {noformat}
> ResultSet rs1 =
> stmt.executeQuery("select * from testMultipleResultSets1");
> ResultSet rs2 =
> stmt.executeQuery("select * from testMultipleResultSets2");
> rs1.next();
> rs2.next();
> {noformat}
> with the exception:
> {noformat}
> [HiveServer2-Handler-Pool: Thread-208]: Error fetching results: 
> org.apache.hive.service.cli.HiveSQLException: Invalid OperationHandle: 
> OperationHandle [opType=EXECUTE_STATEMENT, 
> getHandleIdentifier()=8a1c4fe5-e80b-4d9a-b673-78d92b3baaa8]
>   at 
> org.apache.hive.service.cli.operation.OperationManager.getOperation(OperationManager.java:177)
>   at 
> org.apache.hive.service.cli.CLIService.fetchResults(CLIService.java:462)
>   at 
> org.apache.hive.service.cli.thrift.ThriftCLIService.FetchResults(ThriftCLIService.java:691)
>   at 
> org.apache.hive.service.cli.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1553)
>   at 
> org.apache.hive.service.cli.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1538)
>   at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
>   at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
>   at 
> org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:56)
>   at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:748)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-14731) Use Tez cartesian product edge in Hive (unpartitioned case only)

2017-08-21 Thread Zhiyuan Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhiyuan Yang updated HIVE-14731:

Attachment: HIVE-14731.20.patch

Attached rebased patch.

> Use Tez cartesian product edge in Hive (unpartitioned case only)
> 
>
> Key: HIVE-14731
> URL: https://issues.apache.org/jira/browse/HIVE-14731
> Project: Hive
>  Issue Type: Bug
>Reporter: Zhiyuan Yang
>Assignee: Zhiyuan Yang
> Attachments: HIVE-14731.10.patch, HIVE-14731.11.patch, 
> HIVE-14731.12.patch, HIVE-14731.13.patch, HIVE-14731.14.patch, 
> HIVE-14731.15.patch, HIVE-14731.16.patch, HIVE-14731.17.patch, 
> HIVE-14731.18.patch, HIVE-14731.19.patch, HIVE-14731.1.patch, 
> HIVE-14731.20.patch, HIVE-14731.2.patch, HIVE-14731.3.patch, 
> HIVE-14731.4.patch, HIVE-14731.5.patch, HIVE-14731.6.patch, 
> HIVE-14731.7.patch, HIVE-14731.8.patch, HIVE-14731.9.patch
>
>
> Given cartesian product edge is available in Tez now (see TEZ-3230), let's 
> integrate it into Hive on Tez. This allows us to have more than one reducer 
> in cross product queries.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17357) Plugin jars are not properly added for LocalHiveSparkClient

2017-08-21 Thread Aihua Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated HIVE-17357:

   Resolution: Fixed
Fix Version/s: 3.0.0
   Status: Resolved  (was: Patch Available)

Pushed to master. Thanks Xuefu for reviewing.

> Plugin jars are not properly added for LocalHiveSparkClient
> ---
>
> Key: HIVE-17357
> URL: https://issues.apache.org/jira/browse/HIVE-17357
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Affects Versions: 3.0.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Fix For: 3.0.0
>
> Attachments: HIVE-17357.1.patch
>
>
> I forgot to include the same change for LocalHiveSparkClient.java in 
> HIVE-17336. We need to make the same change as HIVE-17336 in 
> LocalHiveSparkClient class to include plugin jars. Maybe we should have a 
> common base class for both LocalHiveSparkClient and RemoteHiveSparkClient to 
> have some common functions.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17357) Plugin jars are not properly added for LocalHiveSparkClient

2017-08-21 Thread Aihua Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated HIVE-17357:

Summary: Plugin jars are not properly added for LocalHiveSparkClient  (was: 
Similar to HIVE-17336, plugin jars are not properly added)

> Plugin jars are not properly added for LocalHiveSparkClient
> ---
>
> Key: HIVE-17357
> URL: https://issues.apache.org/jira/browse/HIVE-17357
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Affects Versions: 3.0.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Attachments: HIVE-17357.1.patch
>
>
> I forgot to include the same change for LocalHiveSparkClient.java in 
> HIVE-17336. We need to make the same change as HIVE-17336 in 
> LocalHiveSparkClient class to include plugin jars. Maybe we should have a 
> common base class for both LocalHiveSparkClient and RemoteHiveSparkClient to 
> have some common functions.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17237) HMS wastes 26.4% of memory due to dup strings in metastore.api.Partition.parameters

2017-08-21 Thread Sahil Takiar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16135448#comment-16135448
 ] 

Sahil Takiar commented on HIVE-17237:
-

[~mi...@cloudera.com] this needs to be rebased. HIVE-17170 moved the location 
of all the Thrift-generated files.

> HMS wastes 26.4% of memory due to dup strings in 
> metastore.api.Partition.parameters
> ---
>
> Key: HIVE-17237
> URL: https://issues.apache.org/jira/browse/HIVE-17237
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Reporter: Misha Dmitriev
>Assignee: Misha Dmitriev
> Attachments: HIVE-17237.01.patch
>
>
> I've analyzed a heap dump from a production Hive installation using jxray 
> (www.jxray.com) It turns out that there are a lot of duplicate strings in 
> memory, that waste 26.4% of the heap. Most of them come from HashMaps 
> referenced by org.apache.hadoop.hive.metastore.api.Partition.parameters. 
> Below is the relevant section of the jxray report.
> Looking at Partition.java, I see that in the past somebody has already added 
> code to intern keys and values in the parameters table when it's first set 
> up. However, when more key-value pairs are added, they are not interned, and 
> that probably explains the reason for all these duplicate strings. Also when 
> a Partition instance is deserialized, no interning of parameters is currently 
> done.
> {code}
> 6. DUPLICATE STRINGS
> Total strings: 3,273,557  Unique strings: 460,390  Duplicate values: 110,232  
> Overhead: 3,220,458K (26.4%)
> 
> ===
> 7. REFERENCE CHAINS FOR DUPLICATE STRINGS
>   2,326,150K (19.1%), 597058 dup strings (36386 unique), 597058 dup backing 
> arrays:
> 39949 of "-1", 39088 of "true", 28959 of "8", 20987 of "1", 18437 of "10", 
> 9583 of "9", 5908 of "269664", 5691 of "174528", 4598 of "133980", 4598 of 
> "BgUGBQgFCAYFCgYIBgUEBgQHBgUGCwYGBwYHBgkKBwYGBggIBwUHBgYGCgUJCQUG ...[length 
> 3560]"
> ... and 419200 more strings, of which 36376 are unique
> Also contains one-char strings: 217 of "6", 147 of "7", 91 of "4", 28 of "5", 
> 28 of "2", 21 of "0"
>  <--  {j.u.HashMap}.values <-- 
> org.apache.hadoop.hive.metastore.api.Partition.parameters <--  
> {j.u.ArrayList} <-- 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_partitions_result.success
>  <-- Java Local 
> (org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_partitions_result)
>  [@6e33618d8,@6eedb9a80,@6eedbad68,@6eedbc788] ... and 3 more GC roots
>   463,060K (3.8%), 119644 dup strings (34075 unique), 119644 dup backing 
> arrays:
> 7914 of "true", 7912 of "-1", 6578 of "8", 5606 of "1", 2302 of "10", 1626 of 
> "174528", 1223 of "9", 970 of "171680", 837 of "269664", 657 of "133980"
> ... and 84009 more strings, of which 34065 are unique
> Also contains one-char strings: 42 of "7", 31 of "6", 20 of "4", 8 of "5", 5 
> of "2", 3 of "0"
>  <--  {j.u.HashMap}.values <-- 
> org.apache.hadoop.hive.metastore.api.Partition.parameters <--  
> {j.u.TreeMap}.values <-- Java Local (j.u.TreeMap) [@6f084afa0,@73aac9e68]
>   233,384K (1.9%), 64601 dup strings (27295 unique), 64601 dup backing arrays:
> 4472 of "true", 4173 of "-1", 3798 of "1", 3591 of "8", 813 of "174528", 684 
> of "10" ... and 44568 more strings, of which 27285 are unique
> Also contains one-char strings: 305 of "7", 301 of "0", 277 of "4", 146 of 
> "6", 29 of "2", 23 of "5", 19 of "9", 2 of "3"
>  <--  {j.u.HashMap}.values <-- 
> org.apache.hadoop.hive.metastore.api.Partition.parameters <--  
> {j.u.ArrayList} <-- Java Local (j.u.ArrayList) 
> [@4f4cfbd10,@536122408,@726616778]
> ...
>   52,916K (0.4%), 597058 dup strings (16 unique), 597058 dup backing arrays:
>  <--  {j.u.HashMap}.keys <-- 
> org.apache.hadoop.hive.metastore.api.Partition.parameters <--  
> {j.u.ArrayList} <-- 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_partitions_result.success
>  <-- Java Local 
> (org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_partitions_result)
>  [@6e33618d8,@6eedb9a80,@6eedbad68,@6eedbc788] ... and 3 more GC roots
> {code}
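
For illustration, a minimal sketch of the kind of fix described above (plain 
Java, not the actual HIVE-17237 patch; {{putInterned}} is a hypothetical 
helper):

{code:java}
import java.util.HashMap;
import java.util.Map;

public class ParameterInterning {
  // Intern both key and value before insertion. String.intern() returns the
  // canonical instance from the JVM string pool, so repeated values such as
  // "true" or "-1" share one String object and one backing array.
  public static void putInterned(Map<String, String> params, String key, String value) {
    params.put(key == null ? null : key.intern(),
               value == null ? null : value.intern());
  }

  public static void main(String[] args) {
    Map<String, String> params = new HashMap<>();
    // new String(...) forces distinct instances, as deserialization would.
    putInterned(params, "numRows", new String("-1"));
    putInterned(params, "COLUMN_STATS_ACCURATE", new String("true"));
    System.out.println(params);
  }
}
{code}

Applied at every point where parameters are added (including deserialization), 
this would let the thousands of Partition objects holding "-1" or "true" 
reference one canonical instance per distinct value instead of separate 
backing arrays.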



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16886) HMS log notifications may have duplicated event IDs if multiple HMS are running concurrently

2017-08-21 Thread anishek (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16135433#comment-16135433
 ] 

anishek commented on HIVE-16886:


I am just going to provide some more detailed information so that we all 
understand what is happening with this bug.

As of now the plan is to hopefully retain backward compatibility with 
replication v1, though the primary focus is fixing the issue of duplicate 
event ids with multiple HMS instances. Duplicates are especially detrimental 
for replication v2, which is going to replicate a point-in-time state of the 
database rather than the latest state, as repl v1 did.

As for the fix, there are a few tests I am going to fix, and I will have a 
patch for all of you to review, hopefully today/tomorrow.
* The mapping for {{NOTIFICATION_SEQUENCE}} has been removed.
* There are effectively two class mappings for {{NOTIFICATION_LOG}}: one 
representing the current state, using {{EVENT_ID}} as the event id, and a new 
implementation that will use {{NL_ID}}, with the new mapping putting a default 
value of 0 for {{EVENT_ID}}.
* Backward compatibility, in terms of replication v1 not being broken, should 
be possible if the following assumption holds in the metastore RDBMS: ??the 
value of NL_ID is greater than the EVENT_ID for the same rows, considering 
only events that are not yet replicated to the replica warehouse.??
* Code to switch mappings in the metastore from {{EVENT_ID}} to {{NL_ID}} is 
in place, depending on what is required. For example, for an existing repl v1 
setup it will first provide all the events using {{EVENT_ID}}, and after that 
start providing events with {{NL_ID}}; since {{NL_ID > EVENT_ID}}, this allows 
the existing setup to continue working (see the sketch below).
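
To make the switch-over concrete, here is an illustrative sketch with 
hypothetical names (not the actual patch) of how a reader could drain legacy 
rows first and then move to the new id column:

{code:java}
// Illustrative only: drain events ordered by the legacy EVENT_ID first, then
// switch to NL_ID. This relies on the assumption above that NL_ID > EVENT_ID
// holds for all rows that are not yet replicated.
public class EventIdSwitchSketch {
  // lastSeen is the highest event id the replica has consumed so far.
  static String nextEventsQuery(long lastSeen, boolean legacyRowsRemain) {
    if (legacyRowsRemain) {
      // Old mapping: EVENT_ID carries the real sequence value.
      return "SELECT * FROM NOTIFICATION_LOG WHERE EVENT_ID > " + lastSeen
          + " ORDER BY EVENT_ID";
    }
    // New mapping: the auto-increment NL_ID is the event id; EVENT_ID is 0.
    return "SELECT * FROM NOTIFICATION_LOG WHERE NL_ID > " + lastSeen
        + " ORDER BY NL_ID";
  }
}
{code}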


> HMS log notifications may have duplicated event IDs if multiple HMS are 
> running concurrently
> 
>
> Key: HIVE-16886
> URL: https://issues.apache.org/jira/browse/HIVE-16886
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, Metastore
>Reporter: Sergio Peña
>Assignee: anishek
>
> When running multiple Hive Metastore servers and DB notifications are 
> enabled, I could see that notifications can be persisted with a duplicated 
> event ID. 
> This does not happen when running multiple threads in a single HMS node, due 
> to the locking acquired on the DbNotificationsLog class, but multiple HMS 
> instances could cause conflicts.
> The issue is in the ObjectStore#addNotificationEvent() method. The event ID 
> fetched from the datastore is used for the new notification, incremented in 
> the server itself, then persisted or updated back to the datastore. If 2 
> servers read the same ID, then both servers write a new notification with 
> the same ID.
> The event ID is neither unique nor a primary key.
> Here's a test case using the TestObjectStore class that confirms this issue:
> {noformat}
> @Test
>   public void testConcurrentAddNotifications() throws ExecutionException, 
> InterruptedException {
> final int NUM_THREADS = 2;
> CountDownLatch countIn = new CountDownLatch(NUM_THREADS);
> CountDownLatch countOut = new CountDownLatch(1);
> HiveConf conf = new HiveConf();
> conf.setVar(HiveConf.ConfVars.METASTORE_EXPRESSION_PROXY_CLASS, 
> MockPartitionExpressionProxy.class.getName());
> ExecutorService executorService = 
> Executors.newFixedThreadPool(NUM_THREADS);
> FutureTask tasks[] = new FutureTask[NUM_THREADS];
> for (int i = 0; i < NUM_THREADS; ++i) {
>   final int n = i;
>   tasks[i] = new FutureTask(new Callable() {
> @Override
> public Void call() throws Exception {
>   ObjectStore store = new ObjectStore();
>   store.setConf(conf);
>   NotificationEvent dbEvent =
>   new NotificationEvent(0, 0, 
> EventMessage.EventType.CREATE_DATABASE.toString(), "CREATE DATABASE DB" + n);
>   System.out.println("ADDING NOTIFICATION");
>   countIn.countDown();
>   countOut.await();
>   store.addNotificationEvent(dbEvent);
>   System.out.println("FINISH NOTIFICATION");
>   return null;
> }
>   });
>   executorService.execute(tasks[i]);
> }
> countIn.await();
> countOut.countDown();
> for (int i = 0; i < NUM_THREADS; ++i) {
>   tasks[i].get();
> }
> NotificationEventResponse eventResponse = 
> objectStore.getNextNotification(new NotificationEventRequest());
> Assert.assertEquals(2, eventResponse.getEventsSize());
> Assert.assertEquals(1, eventResponse.getEvents().get(0).getEventId());
> // This fails because the next notification has an event ID = 1
> Assert.assertEquals(2, eventResponse.getEvents().get(1).getEventId());
>   }
> {noformat}
> The last assertion fails: the second event's ID is 1 instead of the expected 2. 


[jira] [Commented] (HIVE-17347) TestMiniSparkOnYarnCliDriver[spark_dynamic_partition_pruning_mapjoin_only] is failing every time

2017-08-21 Thread Sahil Takiar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16135428#comment-16135428
 ] 

Sahil Takiar commented on HIVE-17347:
-

Thanks for taking care of this [~pvary]

> TestMiniSparkOnYarnCliDriver[spark_dynamic_partition_pruning_mapjoin_only] is 
> failing every time
> 
>
> Key: HIVE-17347
> URL: https://issues.apache.org/jira/browse/HIVE-17347
> Project: Hive
>  Issue Type: Sub-task
>  Components: Test
>Affects Versions: 3.0.0
>Reporter: Peter Vary
>Assignee: Peter Vary
> Fix For: 3.0.0
>
> Attachments: HIVE-17347.patch
>
>
> As [~lirui] identified, a file was missing from this patch: HIVE-17247 - 
> HoS DPP: UDFs on the partition column side does not evaluate correctly



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17335) Join query with STREAMTABLE fails by java.lang.IndexOutOfBoundsException

2017-08-21 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16135426#comment-16135426
 ] 

Hive QA commented on HIVE-17335:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12882884/HIVE-17335-branch-2.1.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 452 failed/errored test(s), 10159 tests 
executed
*Failed tests:*
{noformat}
TestCliDriver - did not produce a TEST-*.xml file (likely timed out) 
(batchId=101)

[decimal_6.q,llap_uncompressed.q,alter_file_format.q,select_unquote_not.q,join14_hadoop20.q,constprog_when_case.q,avro_change_schema.q,index_auto_empty.q,index_in_db.q,create_udaf.q,subquery_in_having.q,lateral_view_onview.q,stats_ppr_all.q,ppd_constant_expr.q,drop_table_with_stats.q]
TestCliDriver - did not produce a TEST-*.xml file (likely timed out) 
(batchId=109)

[unicode_notation.q,udf_named_struct.q,ql_rewrite_gbtoidx_cbo_2.q,updateBasicStats.q,subquery_in.q,constantPropagateForSubQuery.q,encryption_select_read_only_encrypted_tbl.q,load_dyn_part4.q,udf_add_months.q,insert_values_orig_table_use_metadata.q,udf_trim.q,groupby1_map_skew.q,udf_to_short.q,nullinput2.q,stats18.q]
TestCliDriver - did not produce a TEST-*.xml file (likely timed out) 
(batchId=143)

[rcfile_merge1.q,show_tblproperties.q,list_bucket_dml_9.q,udf_bigint.q,partition_wise_fileformat12.q,subquery_multiinsert.q,autoColumnStats_2.q,bucketcontext_6.q,temp_table_precedence.q,decimal_stats.q,decimal_serde.q,view_alias.q,input26.q,udf_from_utc_timestamp.q,show_create_table_temp_table.q]
TestCliDriver - did not produce a TEST-*.xml file (likely timed out) 
(batchId=19)

[udf_elt.q,join44.q,index_auto_partitioned.q,udf_case_thrift.q,inputddl2.q,repl_1_drop.q,ppd_clusterby.q,udf_testlength.q,union10.q,load_dyn_part7.q,groupby_constcolval.q,udf_union.q,masking_1_newdb.q,vector_leftsemi_mapjoin.q,distinct_windowing.q]
TestCliDriver - did not produce a TEST-*.xml file (likely timed out) 
(batchId=38)

[skewjoinopt3.q,timestamp_comparison.q,rename_column.q,partition_multilevels.q,udf_stddev.q,sort_merge_join_desc_2.q,alter_varchar2.q,autoColumnStats_1.q,disable_file_format_check.q,vector_left_outer_join.q,alter_rename_partition.q,alter_index.q,semijoin.q,skewjoinopt9.q,input_testxpath3.q]
TestCliDriver - did not produce a TEST-*.xml file (likely timed out) 
(batchId=98)

[vector_decimal_10_0.q,tez_union_dynamic_partition.q,skewjoin_mapjoin7.q,leadlag.q,parquet_types.q,lineage1.q,correlationoptimizer1.q,authorization_9.q,udf_unhex.q,vector_decimal_mapjoin.q,sample5.q,udf_reflect.q,orc_file_dump.q,windowing_distinct.q,udf_mask_show_last_n.q]
TestMiniLlapCliDriver - did not produce a TEST-*.xml file (likely timed out) 
(batchId=522)
TestMiniTezCliDriver - did not produce a TEST-*.xml file (likely timed out) 
(batchId=191)

[auto_join30.q,dynamic_partition_pruning.q,vector_auto_smb_mapjoin_14.q,deleteAnalyze.q,vector_outer_join2.q,vector_varchar_4.q,metadata_only_queries.q,union6.q,vector_decimal_4.q,cbo_subq_in.q,vector_reduce_groupby_decimal.q,vectorized_dynamic_partition_pruning.q,cbo_views.q,vectorization_part.q,vectorized_timestamp_funcs.q]
TestMiniTezCliDriver - did not produce a TEST-*.xml file (likely timed out) 
(batchId=194)

[tez_joins_explain.q,tez_join.q,vectorized_rcfile_columnar.q,delete_where_non_partitioned.q,transform2.q,orc_merge11.q,cbo_semijoin.q,vector_aggregate_9.q,schema_evol_orc_acid_mapwork_part.q,delete_orig_table.q,vectorization_11.q,transform1.q,vector_reduce2.q,vector_interval_mapjoin.q,delete_where_partitioned.q]
TestMiniTezCliDriver - did not produce a TEST-*.xml file (likely timed out) 
(batchId=195)

[enforce_order.q,schema_evol_text_vec_mapwork_part_all_complex.q,explainuser_4.q,tez_dynpart_hashjoin_3.q,orc_vectorization_ppd.q,subquery_exists.q,schema_evol_text_vecrow_mapwork_part_all_complex.q,schema_evol_orc_vec_mapwork_part_all_primitive.q,mergejoin_3way.q,alter_merge_stats_orc.q,windowing_gby.q,vectorization_part_varchar.q,vector_orderby_5.q,vector_outer_join6.q,union9.q]
TestMiniTezCliDriver - did not produce a TEST-*.xml file (likely timed out) 
(batchId=199)

[vector_partition_diff_num_cols.q,cbo_windowing.q,orc_ppd_schema_evol_3a.q,orc_merge5.q,vector_groupby_reduce.q,schema_evol_stats.q,union8.q,auto_sortmerge_join_16.q,auto_join29.q,merge1.q,correlationoptimizer1.q,cte_2.q,stats_filemetadata.q,ptf_streaming.q,vector_aggregate_without_gby.q]
TestMiniTezCliDriver - did not produce a TEST-*.xml file (likely timed out) 
(batchId=201)


[jira] [Commented] (HIVE-17335) Join query with STREAMTABLE fails by java.lang.IndexOutOfBoundsException

2017-08-21 Thread Aleksey Vovchenko (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16135423#comment-16135423
 ] 

Aleksey Vovchenko commented on HIVE-17335:
--

[~sershe]
Hi. 
We get this exception because Hive reorders tagOrder and filterOrder without 
reordering filterMaps. 
But I think reordering filterMaps is not a very good idea, so I have created a 
patch that does not change the order of filterMaps.

However, I now have a problem: the left join works correctly, but the right 
join returns a wrong result. It happens because my fix changes the order in 
which rows from storage are processed (and filtered). Can you take a look, or 
give me advice on how I can do this another way?
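
For readers who have not looked at the join code, a toy illustration (generic 
Java, not Hive's actual join classes) of why reordering one of two parallel 
structures without the other produces exactly this kind of 
IndexOutOfBoundsException:

{code:java}
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Toy example: aliases and filterMaps are parallel structures indexed by the
// same position. Reordering aliases but not filterMaps makes a lookup land on
// an entry that belongs to a different table.
public class ParallelArrayHazard {
  public static void main(String[] args) {
    String[] aliases = {"test2", "test1"};   // reordered by the STREAMTABLE hint
    List<List<Integer>> filterMaps = Arrays.asList(
        Arrays.asList(0),          // filters of "test1", still at position 0
        Collections.emptyList());  // "test2" has none, still at position 1
    for (int pos = 0; pos < aliases.length; pos++) {
      // Position 0 is now "test2" but silently reads "test1"'s filters;
      // position 1 ("test1") reads the empty list, and .get(0) throws
      // java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
      System.out.println(aliases[pos] + " -> " + filterMaps.get(pos).get(0));
    }
  }
}
{code}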

> Join query with STREAMTABLE fails by java.lang.IndexOutOfBoundsException
> 
>
> Key: HIVE-17335
> URL: https://issues.apache.org/jira/browse/HIVE-17335
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 1.2.1, 2.1.1
>Reporter: Aleksey Vovchenko
>Assignee: Aleksey Vovchenko
> Attachments: HIVE-17335-branch-2.1.patch
>
>
> Steps to reproduce this issue: 
> h2. STEP 1. Create test tables and insert some data 
> {noformat}
> hive> create table test1(x int, y int, z int);
> hive> create table test2(x int, y int, z int);
> {noformat}
> {noformat}
> hive> insert into table test1 values(1,1,1), (2,2,2);
> hive> insert into table test2 values(1,5,5), (2,6,6);
> {noformat}
> h2. STEP 2. Disable MapJoin
> {noformat}
> hive> set hive.auto.convert.join = false;
> {noformat}
> h2. STEP 3. Run query
> {noformat}
> select /*+ STREAMTABLE(test1) */ test1.*, test2.x from test1 left join test2 
> on test1.x = test2.x and test1.x > 1;
> {noformat}
> EXPECTED RESULT: 
> {noformat}
> OK
> 1 1   1   NULL
> 2 2   2   2
> {noformat}
> ACTUAL RESULT:
> {noformat} 
> 2017-08-17 00:36:46,305 Stage-1 map = 0%,  reduce = 0%
> 2017-08-17 00:36:51,708 Stage-1 map = 50%,  reduce = 0%, Cumulative CPU 1.25 
> sec
> 2017-08-17 00:36:52,761 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.35 
> sec
> 2017-08-17 00:37:17,137 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 
> 2.35 sec
> MapReduce Total cumulative CPU time: 2 seconds 350 msec
> Ended Job = job_1502889241527_0005 with errors
> Error during job, obtaining debugging information...
> Examining task ID: task_1502889241527_0005_m_00 (and more) from job 
> job_1502889241527_0005
> Task with the most failures(4): 
> -
> Task ID:
>   task_1502889241527_0005_r_00
> -
> Diagnostic Messages for this Task:
> Error: java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
> processing row (tag=0) {"key":{"reducesinkkey0":1},"value":null}
>   at 
> org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:257)
>   at 
> org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:444)
>   at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
>   at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1595)
>   at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime 
> Error while processing row (tag=0) {"key":{"reducesinkkey0":1},"value":null}
>   at 
> org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:245)
>   ... 7 more
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
> java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
>   at 
> org.apache.hadoop.hive.ql.exec.JoinOperator.process(JoinOperator.java:138)
>   at 
> org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:236)
>   ... 7 more
> Caused by: java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
>   at java.util.ArrayList.rangeCheck(ArrayList.java:635)
>   at java.util.ArrayList.get(ArrayList.java:411)
>   at org.apache.hadoop.hive.ql.exec.JoinUtil.isFiltered(JoinUtil.java:248)
>   at 
> org.apache.hadoop.hive.ql.exec.CommonJoinOperator.getFilteredValue(CommonJoinOperator.java:420)
>   at 
> org.apache.hadoop.hive.ql.exec.JoinOperator.process(JoinOperator.java:91)
>   ... 8 more
> FAILED: Execution Error, return code 2 from 
> org.apache.hadoop.hive.ql.exec.mr.MapRedTask
> MapReduce Jobs Launched: 
> {noformat} 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17332) NullPointer exception when processing query

2017-08-21 Thread Lukas Waldmann (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16135334#comment-16135334
 ] 

Lukas Waldmann commented on HIVE-17332:
---

HDP version 2.4
Ambari stacks and versions view:
{code}
HDFS        2.7.1.2.4   Installed   Apache Hadoop Distributed File System
YARN        2.7.1.2.4   Installed   Apache Hadoop NextGen MapReduce (YARN)
MapReduce2  2.7.1.2.4   Installed   Apache Hadoop NextGen MapReduce (YARN)
Tez         0.7.0.2.4   Installed   Tez is the next generation Hadoop Query Processing framework written on top of YARN.
Hive        1.2.1.2.4   Installed   Data warehouse system for ad-hoc queries & analysis of large datasets and table & storage management service
HBase       1.1.2.2.4   Installed   A Non-relational distributed database, plus Phoenix, a high performance SQL layer for low latency applications.
{code}

If you need some Hive settings variables, let me know which ones. There are 
too many of them to just post them all here :)


> NullPointer exception when processing query
> ---
>
> Key: HIVE-17332
> URL: https://issues.apache.org/jira/browse/HIVE-17332
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.2.1
>Reporter: Lukas Waldmann
>
> Hive query:
> {code}
> select count(*) from (select * from EXM_BASE_DATA, (select max(snapshot) 
> max_snapshot from EXM_BASE_DATA) s0 where snapshot == max_snapshot) t;
> {code}
> finishes with a NullPointerException,
> while 
> {code}
> select * from EXM_BASE_DATA, (select max(snapshot) max_snapshot from 
> EXM_BASE_DATA) s0 where snapshot == max_snapshot
> {code}
> is executed without error



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17332) NullPointer exception when processing query

2017-08-21 Thread Zoltan Haindrich (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16135321#comment-16135321
 ] 

Zoltan Haindrich commented on HIVE-17332:
-

Hello [~luky],
I was not able to reproduce your problem either on branch-1 or using a recent 
hdp-2.5 release... could you tell me some more about your settings/setup?

> NullPointer exception when processing query
> ---
>
> Key: HIVE-17332
> URL: https://issues.apache.org/jira/browse/HIVE-17332
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.2.1
>Reporter: Lukas Waldmann
>
> Hive query:
> {code}
> select count(*) from (select * from EXM_BASE_DATA, (select max(snapshot) 
> max_snapshot from EXM_BASE_DATA) s0 where snapshot == max_snapshot) t;
> {code}
> finishes with a NullPointerException,
> while 
> {code}
> select * from EXM_BASE_DATA, (select max(snapshot) max_snapshot from 
> EXM_BASE_DATA) s0 where snapshot == max_snapshot
> {code}
> is executed without error



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17319) Make BoneCp configurable using hive properties in hive-site.xml

2017-08-21 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16135264#comment-16135264
 ] 

Hive QA commented on HIVE-17319:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12882869/HIVE-17319.02.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 10999 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver[hbase_custom_key] 
(batchId=93)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning]
 (batchId=169)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=235)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=180)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=180)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=180)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6470/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6470/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6470/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 6 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12882869 - PreCommit-HIVE-Build

> Make BoneCp configurable using hive properties in hive-site.xml
> ---
>
> Key: HIVE-17319
> URL: https://issues.apache.org/jira/browse/HIVE-17319
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Barna Zsombor Klara
>Assignee: Barna Zsombor Klara
> Attachments: HIVE-17319.01.patch, HIVE-17319.02.patch, 
> HIVE-17319.draft.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17335) Join query with STREAMTABLE fails by java.lang.IndexOutOfBoundsException

2017-08-21 Thread Aleksey Vovchenko (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksey Vovchenko updated HIVE-17335:
-
Status: Patch Available  (was: Open)

> Join query with STREAMTABLE fails by java.lang.IndexOutOfBoundsException
> 
>
> Key: HIVE-17335
> URL: https://issues.apache.org/jira/browse/HIVE-17335
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 2.1.1, 1.2.1
>Reporter: Aleksey Vovchenko
>Assignee: Aleksey Vovchenko
> Attachments: HIVE-17335-branch-2.1.patch
>
>
> Steps to reproduce this issue: 
> h2. STEP 1. Create test tables and insert some data 
> {noformat}
> hive> create table test1(x int, y int, z int);
> hive> create table test2(x int, y int, z int);
> {noformat}
> {noformat}
> hive> insert into table test1 values(1,1,1), (2,2,2);
> hive> insert into table test2 values(1,5,5), (2,6,6);
> {noformat}
> h2. STEP 2. Disable MapJoin
> {noformat}
> hive> set hive.auto.convert.join = false;
> {noformat}
> h2. STEP 3. Run query
> {noformat}
> select /*+ STREAMTABLE(test1) */ test1.*, test2.x from test1 left join test2 
> on test1.x = test2.x and test1.x > 1;
> {noformat}
> EXPECTED RESULT: 
> {noformat}
> OK
> 1 1   1   NULL
> 2 2   2   2
> {noformat}
> ACTUAL RESULT:
> {noformat} 
> 2017-08-17 00:36:46,305 Stage-1 map = 0%,  reduce = 0%
> 2017-08-17 00:36:51,708 Stage-1 map = 50%,  reduce = 0%, Cumulative CPU 1.25 
> sec
> 2017-08-17 00:36:52,761 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.35 
> sec
> 2017-08-17 00:37:17,137 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 
> 2.35 sec
> MapReduce Total cumulative CPU time: 2 seconds 350 msec
> Ended Job = job_1502889241527_0005 with errors
> Error during job, obtaining debugging information...
> Examining task ID: task_1502889241527_0005_m_00 (and more) from job 
> job_1502889241527_0005
> Task with the most failures(4): 
> -
> Task ID:
>   task_1502889241527_0005_r_00
> -
> Diagnostic Messages for this Task:
> Error: java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
> processing row (tag=0) {"key":{"reducesinkkey0":1},"value":null}
>   at 
> org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:257)
>   at 
> org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:444)
>   at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
>   at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1595)
>   at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime 
> Error while processing row (tag=0) {"key":{"reducesinkkey0":1},"value":null}
>   at 
> org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:245)
>   ... 7 more
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
> java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
>   at 
> org.apache.hadoop.hive.ql.exec.JoinOperator.process(JoinOperator.java:138)
>   at 
> org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:236)
>   ... 7 more
> Caused by: java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
>   at java.util.ArrayList.rangeCheck(ArrayList.java:635)
>   at java.util.ArrayList.get(ArrayList.java:411)
>   at org.apache.hadoop.hive.ql.exec.JoinUtil.isFiltered(JoinUtil.java:248)
>   at 
> org.apache.hadoop.hive.ql.exec.CommonJoinOperator.getFilteredValue(CommonJoinOperator.java:420)
>   at 
> org.apache.hadoop.hive.ql.exec.JoinOperator.process(JoinOperator.java:91)
>   ... 8 more
> FAILED: Execution Error, return code 2 from 
> org.apache.hadoop.hive.ql.exec.mr.MapRedTask
> MapReduce Jobs Launched: 
> {noformat} 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (HIVE-17335) Join query with STREAMTABLE fails by java.lang.IndexOutOfBoundsException

2017-08-21 Thread Aleksey Vovchenko (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksey Vovchenko reassigned HIVE-17335:


Assignee: Aleksey Vovchenko

> Join query with STREAMTABLE fails by java.lang.IndexOutOfBoundsException
> 
>
> Key: HIVE-17335
> URL: https://issues.apache.org/jira/browse/HIVE-17335
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 1.2.1, 2.1.1
>Reporter: Aleksey Vovchenko
>Assignee: Aleksey Vovchenko
> Attachments: HIVE-17335-branch-2.1.patch
>
>
> Steps to reproduce this issue: 
> h2. STEP 1. Create test tables and insert some data 
> {noformat}
> hive> create table test1(x int, y int, z int);
> hive> create table test2(x int, y int, z int);
> {noformat}
> {noformat}
> hive> insert into table test1 values(1,1,1), (2,2,2);
> hive> insert into table test2 values(1,5,5), (2,6,6);
> {noformat}
> h2. STEP 2. Disable MapJoin
> {noformat}
> hive> set hive.auto.convert.join = false;
> {noformat}
> h2. STEP 3. Run query
> {noformat}
> select /*+ STREAMTABLE(test1) */ test1.*, test2.x from test1 left join test2 
> on test1.x = test2.x and test1.x > 1;
> {noformat}
> EXPECTED RESULT: 
> {noformat}
> OK
> 1 1   1   NULL
> 2 2   2   2
> {noformat}
> ACTUAL RESULT:
> {noformat} 
> 2017-08-17 00:36:46,305 Stage-1 map = 0%,  reduce = 0%
> 2017-08-17 00:36:51,708 Stage-1 map = 50%,  reduce = 0%, Cumulative CPU 1.25 
> sec
> 2017-08-17 00:36:52,761 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.35 
> sec
> 2017-08-17 00:37:17,137 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 
> 2.35 sec
> MapReduce Total cumulative CPU time: 2 seconds 350 msec
> Ended Job = job_1502889241527_0005 with errors
> Error during job, obtaining debugging information...
> Examining task ID: task_1502889241527_0005_m_00 (and more) from job 
> job_1502889241527_0005
> Task with the most failures(4): 
> -
> Task ID:
>   task_1502889241527_0005_r_00
> -
> Diagnostic Messages for this Task:
> Error: java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
> processing row (tag=0) {"key":{"reducesinkkey0":1},"value":null}
>   at 
> org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:257)
>   at 
> org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:444)
>   at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
>   at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1595)
>   at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime 
> Error while processing row (tag=0) {"key":{"reducesinkkey0":1},"value":null}
>   at 
> org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:245)
>   ... 7 more
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
> java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
>   at 
> org.apache.hadoop.hive.ql.exec.JoinOperator.process(JoinOperator.java:138)
>   at 
> org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:236)
>   ... 7 more
> Caused by: java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
>   at java.util.ArrayList.rangeCheck(ArrayList.java:635)
>   at java.util.ArrayList.get(ArrayList.java:411)
>   at org.apache.hadoop.hive.ql.exec.JoinUtil.isFiltered(JoinUtil.java:248)
>   at 
> org.apache.hadoop.hive.ql.exec.CommonJoinOperator.getFilteredValue(CommonJoinOperator.java:420)
>   at 
> org.apache.hadoop.hive.ql.exec.JoinOperator.process(JoinOperator.java:91)
>   ... 8 more
> FAILED: Execution Error, return code 2 from 
> org.apache.hadoop.hive.ql.exec.mr.MapRedTask
> MapReduce Jobs Launched: 
> {noformat} 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17335) Join query with STREAMTABLE fails by java.lang.IndexOutOfBoundsException

2017-08-21 Thread Aleksey Vovchenko (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksey Vovchenko updated HIVE-17335:
-
Attachment: HIVE-17335-branch-2.1.patch

> Join query with STREAMTABLE fails by java.lang.IndexOutOfBoundsException
> 
>
> Key: HIVE-17335
> URL: https://issues.apache.org/jira/browse/HIVE-17335
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 1.2.1, 2.1.1
>Reporter: Aleksey Vovchenko
> Attachments: HIVE-17335-branch-2.1.patch
>
>
> Steps to reproduce this issue: 
> h2. STEP 1. Create test tables and insert some data 
> {noformat}
> hive> create table test1(x int, y int, z int);
> hive> create table test2(x int, y int, z int);
> {noformat}
> {noformat}
> hive> insert into table test1 values(1,1,1), (2,2,2);
> hive> insert into table test2 values(1,5,5), (2,6,6);
> {noformat}
> h2. STEP 2. Disable MapJoin
> {noformat}
> hive> set hive.auto.convert.join = false;
> {noformat}
> h2. STEP 3. Run query
> {noformat}
> select /*+ STREAMTABLE(test1) */ test1.*, test2.x from test1 left join test2 
> on test1.x = test2.x and test1.x > 1;
> {noformat}
> EXPECTED RESULT: 
> {noformat}
> OK
> 1 1   1   NULL
> 2 2   2   2
> {noformat}
> ACTUAL RESULT:
> {noformat} 
> 2017-08-17 00:36:46,305 Stage-1 map = 0%,  reduce = 0%
> 2017-08-17 00:36:51,708 Stage-1 map = 50%,  reduce = 0%, Cumulative CPU 1.25 
> sec
> 2017-08-17 00:36:52,761 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.35 
> sec
> 2017-08-17 00:37:17,137 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 
> 2.35 sec
> MapReduce Total cumulative CPU time: 2 seconds 350 msec
> Ended Job = job_1502889241527_0005 with errors
> Error during job, obtaining debugging information...
> Examining task ID: task_1502889241527_0005_m_00 (and more) from job 
> job_1502889241527_0005
> Task with the most failures(4): 
> -
> Task ID:
>   task_1502889241527_0005_r_00
> -
> Diagnostic Messages for this Task:
> Error: java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
> processing row (tag=0) {"key":{"reducesinkkey0":1},"value":null}
>   at 
> org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:257)
>   at 
> org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:444)
>   at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
>   at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1595)
>   at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime 
> Error while processing row (tag=0) {"key":{"reducesinkkey0":1},"value":null}
>   at 
> org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:245)
>   ... 7 more
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
> java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
>   at 
> org.apache.hadoop.hive.ql.exec.JoinOperator.process(JoinOperator.java:138)
>   at 
> org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:236)
>   ... 7 more
> Caused by: java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
>   at java.util.ArrayList.rangeCheck(ArrayList.java:635)
>   at java.util.ArrayList.get(ArrayList.java:411)
>   at org.apache.hadoop.hive.ql.exec.JoinUtil.isFiltered(JoinUtil.java:248)
>   at 
> org.apache.hadoop.hive.ql.exec.CommonJoinOperator.getFilteredValue(CommonJoinOperator.java:420)
>   at 
> org.apache.hadoop.hive.ql.exec.JoinOperator.process(JoinOperator.java:91)
>   ... 8 more
> FAILED: Execution Error, return code 2 from 
> org.apache.hadoop.hive.ql.exec.mr.MapRedTask
> MapReduce Jobs Launched: 
> {noformat} 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17362) The MAX_PREWARM_TIME should be configurable on HoS

2017-08-21 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16135201#comment-16135201
 ] 

Hive QA commented on HIVE-17362:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12882867/HIVE-17362.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 9 failed/errored test(s), 10994 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestAccumuloCliDriver.testCliDriver[accumulo_queries]
 (batchId=231)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning]
 (batchId=169)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=235)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] 
(batchId=235)
org.apache.hadoop.hive.cli.TestSparkNegativeCliDriver.testCliDriver[spark_job_max_tasks]
 (batchId=242)
org.apache.hadoop.hive.cli.TestSparkNegativeCliDriver.testCliDriver[spark_stage_max_tasks]
 (batchId=242)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=180)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=180)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=180)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6469/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6469/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6469/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 9 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12882867 - PreCommit-HIVE-Build

> The MAX_PREWARM_TIME should be configurable on HoS
> --
>
> Key: HIVE-17362
> URL: https://issues.apache.org/jira/browse/HIVE-17362
> Project: Hive
>  Issue Type: Improvement
>  Components: Spark
>Affects Versions: 3.0.0
>Reporter: Peter Vary
>Assignee: Peter Vary
> Attachments: HIVE-17362.patch
>
>
> When using HIVE_PREWARM_ENABLED, we are waiting MAX_PREWARM_TIME for the 
> containers to warm up. This is currently set to 5s. This is often not enough 
> for a spark session to initialize the executors. We should be able to 
> configure this, so we can set a value which has an effect.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17319) Make BoneCp configurable using hive properties in hive-site.xml

2017-08-21 Thread Barna Zsombor Klara (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barna Zsombor Klara updated HIVE-17319:
---
Attachment: HIVE-17319.02.patch

Rebased patch after the removal of Shims from HMS.

> Make BoneCp configurable using hive properties in hive-site.xml
> ---
>
> Key: HIVE-17319
> URL: https://issues.apache.org/jira/browse/HIVE-17319
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Barna Zsombor Klara
>Assignee: Barna Zsombor Klara
> Attachments: HIVE-17319.01.patch, HIVE-17319.02.patch, 
> HIVE-17319.draft.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17362) The MAX_PREWARM_TIME should be configurable on HoS

2017-08-21 Thread Peter Vary (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Vary updated HIVE-17362:
--
Status: Patch Available  (was: Open)

> The MAX_PREWARM_TIME should be configurable on HoS
> --
>
> Key: HIVE-17362
> URL: https://issues.apache.org/jira/browse/HIVE-17362
> Project: Hive
>  Issue Type: Improvement
>  Components: Spark
>Affects Versions: 3.0.0
>Reporter: Peter Vary
>Assignee: Peter Vary
> Attachments: HIVE-17362.patch
>
>
> When using HIVE_PREWARM_ENABLED, we are waiting MAX_PREWARM_TIME for the 
> containers to warm up. This is currently set to 5s. This is often not enough 
> for a spark session to initialize the executors. We should be able to 
> configure this, so we can set a value which has an effect.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17362) The MAX_PREWARM_TIME should be configurable on HoS

2017-08-21 Thread Peter Vary (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Vary updated HIVE-17362:
--
Attachment: HIVE-17362.patch

The patch contains the following changes:
- Add a new configuration variable, HIVE_PREWARM_SPARK_TIMEOUT, defaulting to 
5000ms to retain backward compatibility (a consumption sketch follows below)
- Enable prewarm for the q-test configurations, and set a 4m prewarm timeout
- Remove the prewarm magic from QTestUtils

Let's see if this breaks anything in the QTest framework.
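
A minimal sketch of how the new knob might be consumed, assuming the variable 
lands in HiveConf.ConfVars under the name above; the polling loop and 
{{registeredExecutors()}} are illustrative, not the actual patch:

{code:java}
import java.util.concurrent.TimeUnit;
import org.apache.hadoop.hive.conf.HiveConf;

public class PrewarmSketch {
  // Wait up to the configured timeout instead of a hard-coded 5 seconds.
  static void waitForPrewarm(HiveConf conf, int expectedExecutors)
      throws InterruptedException {
    long timeoutMs = conf.getTimeVar(
        HiveConf.ConfVars.HIVE_PREWARM_SPARK_TIMEOUT, TimeUnit.MILLISECONDS);
    long deadline = System.currentTimeMillis() + timeoutMs;
    while (System.currentTimeMillis() < deadline
        && registeredExecutors() < expectedExecutors) {
      Thread.sleep(100); // poll until enough executors are up or we time out
    }
  }

  static int registeredExecutors() { return 0; } // placeholder for the real check
}
{code}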

> The MAX_PREWARM_TIME should be configurable on HoS
> --
>
> Key: HIVE-17362
> URL: https://issues.apache.org/jira/browse/HIVE-17362
> Project: Hive
>  Issue Type: Improvement
>  Components: Spark
>Affects Versions: 3.0.0
>Reporter: Peter Vary
>Assignee: Peter Vary
> Attachments: HIVE-17362.patch
>
>
> When using HIVE_PREWARM_ENABLED, we are waiting MAX_PREWARM_TIME for the 
> containers to warm up. This is currently set to 5s. This is often not enough 
> for a spark session to initialize the executors. We should be able to 
> configure this, so we can set a value which has an effect.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17332) NullPointer exception when processing query

2017-08-21 Thread Lukas Waldmann (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16135072#comment-16135072
 ] 

Lukas Waldmann commented on HIVE-17332:
---

Hi Zoltan,
snapshot is indeed a partitioning column in my DB and is of type DATE.
Your sample fails on my DB with the same error. However, in my DB there is no 
null value in the snapshot column.
So I actually tried your sample with only the 'asd' row.
And guess what, the query fails with the same error.

We are using Hive 1.2.1 and I don't think we will move to the latest version 
anytime soon :(



> NullPointer exception when processing query
> ---
>
> Key: HIVE-17332
> URL: https://issues.apache.org/jira/browse/HIVE-17332
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.2.1
>Reporter: Lukas Waldmann
>
> Hive query:
> {code}
> select count(*) from (select * from EXM_BASE_DATA, (select max(snapshot) 
> max_snapshot from EXM_BASE_DATA) s0 where snapshot == max_snapshot) t;
> {code}
> finishes with a NullPointerException,
> while 
> {code}
> select * from EXM_BASE_DATA, (select max(snapshot) max_snapshot from 
> EXM_BASE_DATA) s0 where snapshot == max_snapshot
> {code}
> is executed without error



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Comment Edited] (HIVE-17332) NullPointer exception when processing query

2017-08-21 Thread Zoltan Haindrich (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16135051#comment-16135051
 ] 

Zoltan Haindrich edited comment on HIVE-17332 at 8/21/17 11:52 AM:
---

Hello [~luky],

I was not able to reproduce the issue on the current master... it is possible 
that it is no longer present.
I've just guessed the column types... they might be more relevant. My guess is 
that the problem might be sensitive to a null value in some column - does the 
snapshot column contain null values?
Could you check the following case:

{code}
create table EXM_BASE_DATA (key string) partitioned by (snapshot int) stored as 
orc;
insert into EXM_BASE_DATA partition(snapshot=1) values ('asd') ;
insert into EXM_BASE_DATA partition(snapshot=2) values (null) ;
set hive.auto.convert.join = false;
select count(*) from
(select * from EXM_BASE_DATA join (select max(snapshot) max_snapshot 
from EXM_BASE_DATA) s0 on (snapshot = max_snapshot)) t;
{code}


or this more sophisticated one:

{code}
set hive.exec.dynamic.partition.mode=nonstrict;

create table t0 (key string,snapshot int);
insert into t0 values ('asd',1);
insert into t0 values ('asd2',null);
insert into t0 values (null,2);
insert into t0 values (null,null);


create table EXM_BASE_DATA (key string) partitioned by (snapshot int) stored as 
orc;

from t0
insert into EXM_BASE_DATA partition(snapshot) select key,snapshot;

insert into EXM_BASE_DATA partition(snapshot=1) values ('asdx') ;
insert into EXM_BASE_DATA partition(snapshot=1) values ('asd') ;
insert into EXM_BASE_DATA partition(snapshot=2) values (null) ;

set hive.auto.convert.join = false;

select count(*) from
(select * from EXM_BASE_DATA join (select max(snapshot) max_snapshot 
from EXM_BASE_DATA) s0 on (snapshot = max_snapshot)) t;

{code}


was (Author: kgyrtkirk):
Hello [~luky],

I was not able to reproduce the issue on the current master... it is possible 
that it is no longer present.
I've just guessed the column types... they might be more relevant. My guess is 
that the problem might be sensitive to a null value in some column - does the 
snapshot column contain null values?
Could you check the following case:

{code}
create table EXM_BASE_DATA (key string) partitioned by (snapshot int) stored as 
orc;
insert into EXM_BASE_DATA partition(snapshot=1) values ('asd') ;
insert into EXM_BASE_DATA partition(snapshot=2) values (null) ;
set hive.auto.convert.join = false;
select count(*) from
(select * from EXM_BASE_DATA join (select max(snapshot) max_snapshot 
from EXM_BASE_DATA) s0 on (snapshot = max_snapshot)) t;
{code}


> NullPointer exception when processing query
> ---
>
> Key: HIVE-17332
> URL: https://issues.apache.org/jira/browse/HIVE-17332
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.2.1
>Reporter: Lukas Waldmann
>
> Hive query:
> {code}
> select count(*) from (select * from EXM_BASE_DATA, (select max(snapshot) 
> max_snapshot from EXM_BASE_DATA) s0 where snapshot == max_snapshot) t;
> {code}
> finishes with a NullPointerException,
> while 
> {code}
> select * from EXM_BASE_DATA, (select max(snapshot) max_snapshot from 
> EXM_BASE_DATA) s0 where snapshot == max_snapshot
> {code}
> is executed without error



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16747) Remove YETUS*.sh files after a YETUS release

2017-08-21 Thread Peter Vary (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16135071#comment-16135071
 ] 

Peter Vary commented on HIVE-16747:
---

HIVE-17107 will solve this

> Remove YETUS*.sh files after a YETUS release
> 
>
> Key: HIVE-16747
> URL: https://issues.apache.org/jira/browse/HIVE-16747
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Peter Vary
>Assignee: Peter Vary
>
> For HIVE-15051 we had to add patched YETUS files which contain YETUS fixes 
> that are not yet released:
> - dev-support/checkstyle_YETUS-484.sh
> - dev-support/findbugs_YETUS-471.sh
> - dev-support/maven_YETUS-506.sh
> When there is a new YETUS release, we have to move to it and remove these 
> files.
> Also, we have to remove the cp commands from {{yetus-wrapper.sh}} in 3x3 
> places:
> {code}
> 75  cp ${BINDIR}/findbugs_YETUS-471.sh 
> ${YETUS_HOME}/lib/precommit/test-patch.d/findbugs.sh
> 76  cp ${BINDIR}/checkstyle_YETUS-484.sh 
> ${YETUS_HOME}/lib/precommit/test-patch.d/checkstyle.sh
> 77  cp ${BINDIR}/maven_YETUS-506.sh 
> ${YETUS_HOME}/lib/precommit/test-patch.d/maven.sh
> {code}
> {code}
> 101 cp ${BINDIR}/findbugs_YETUS-471.sh 
> ${HIVE_PATCHPROCESS}/yetus-${HIVE_YETUS_VERSION}/lib/precommit/test-patch.d/findbugs.sh
> 102 cp ${BINDIR}/checkstyle_YETUS-484.sh 
> ${HIVE_PATCHPROCESS}/yetus-${HIVE_YETUS_VERSION}/lib/precommit/test-patch.d/checkstyle.sh
> 103 cp ${BINDIR}/maven_YETUS-506.sh 
> ${HIVE_PATCHPROCESS}/yetus-${HIVE_YETUS_VERSION}/lib/precommit/test-patch.d/maven.sh
> {code}
> {code}
> 175 cp ${BINDIR}/findbugs_YETUS-471.sh 
> ${HIVE_PATCHPROCESS}/yetus-${HIVE_YETUS_VERSION}/lib/precommit/test-patch.d/findbugs.sh
> 176 cp ${BINDIR}/checkstyle_YETUS-484.sh 
> ${HIVE_PATCHPROCESS}/yetus-${HIVE_YETUS_VERSION}/lib/precommit/test-patch.d/checkstyle.sh
> 177 cp ${BINDIR}/maven_YETUS-506.sh 
> ${HIVE_PATCHPROCESS}/yetus-${HIVE_YETUS_VERSION}/lib/precommit/test-patch.d/maven.sh
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17332) NullPointer exception when processing query

2017-08-21 Thread Zoltan Haindrich (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16135051#comment-16135051
 ] 

Zoltan Haindrich commented on HIVE-17332:
-

Hello [~luky],

I was not able to reproduce the issue on the current master... it is possible 
that it is no longer present.
I've just guessed the column types... they might be more relevant. My guess is 
that the problem might be sensitive to a null value in some column - does the 
snapshot column contain null values?
Could you check the following case:

{code}
create table EXM_BASE_DATA (key string) partitioned by (snapshot int) stored as 
orc;
insert into EXM_BASE_DATA partition(snapshot=1) values ('asd') ;
insert into EXM_BASE_DATA partition(snapshot=2) values (null) ;
set hive.auto.convert.join = false;
select count(*) from
(select * from EXM_BASE_DATA join (select max(snapshot) max_snapshot 
from EXM_BASE_DATA) s0 on (snapshot = max_snapshot)) t;
{code}


> NullPointer exception when processing query
> ---
>
> Key: HIVE-17332
> URL: https://issues.apache.org/jira/browse/HIVE-17332
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.2.1
>Reporter: Lukas Waldmann
>
> Hive query:
> {code}
> select count(*) from (select * from EXM_BASE_DATA, (select max(snapshot) 
> max_snapshot from EXM_BASE_DATA) s0 where snapshot == max_snapshot) t;
> {code}
> finishes with a NullPointerException,
> while 
> {code}
> select * from EXM_BASE_DATA, (select max(snapshot) max_snapshot from 
> EXM_BASE_DATA) s0 where snapshot == max_snapshot
> {code}
> is executed without error



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17356) Missing ASF headers 3 classes

2017-08-21 Thread Peter Vary (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Vary updated HIVE-17356:
--
   Resolution: Fixed
Fix Version/s: 3.0.0
   Status: Resolved  (was: Patch Available)

Pushed to master.
Thanks [~zsombor.klara]!

> Missing ASF headers 3 classes
> -
>
> Key: HIVE-17356
> URL: https://issues.apache.org/jira/browse/HIVE-17356
> Project: Hive
>  Issue Type: Bug
>Reporter: Barna Zsombor Klara
>Assignee: Barna Zsombor Klara
>Priority: Trivial
> Fix For: 3.0.0
>
> Attachments: HIVE-17356.01.patch
>
>
> JSONAddNotNullConstraintMessage.java, BucketCodec.java, and 
> TaskTrackerTest.java are missing the ASF header, which should be added.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Comment Edited] (HIVE-17332) NullPointer exception when processing query

2017-08-21 Thread Lukas Waldmann (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16135026#comment-16135026
 ] 

Lukas Waldmann edited comment on HIVE-17332 at 8/21/17 10:54 AM:
-

Query:
{code}
select count(*) from (select * from EXM_BASE_DATA join (select max(snapshot) 
max_snapshot from EXM_BASE_DATA) s0 on (snapshot = max_snapshot)) t;
{code}
causes the same error


was (Author: luky):
Query:
select count(*) from (select * from EXM_BASE_DATA join (select max(snapshot) 
max_snapshot from EXM_BASE_DATA) s0 on (snapshot = max_snapshot)) t;

causes the same error

> NullPointer exception when processing query
> ---
>
> Key: HIVE-17332
> URL: https://issues.apache.org/jira/browse/HIVE-17332
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.2.1
>Reporter: Lukas Waldmann
>
> Hive query:
> {code}
> select count(*) from (select * from EXM_BASE_DATA, (select max(snapshot) 
> max_snapshot from EXM_BASE_DATA) s0 where snapshot == max_snapshot) t;
> {code}
> finishes with a NullPointerException,
> while 
> {code}
> select * from EXM_BASE_DATA, (select max(snapshot) max_snapshot from 
> EXM_BASE_DATA) s0 where snapshot == max_snapshot
> {code}
> is executed without error



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17332) NullPointer exception when processing query

2017-08-21 Thread Lukas Waldmann (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16135026#comment-16135026
 ] 

Lukas Waldmann commented on HIVE-17332:
---

Query:
select count(*) from (select * from EXM_BASE_DATA join (select max(snapshot) 
max_snapshot from EXM_BASE_DATA) s0 on (snapshot = max_snapshot)) t;

causes the same error

> NullPointer exception when processing query
> ---
>
> Key: HIVE-17332
> URL: https://issues.apache.org/jira/browse/HIVE-17332
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.2.1
>Reporter: Lukas Waldmann
>
> Hive query:
> {code}
> select count(*) from (select * from EXM_BASE_DATA, (select max(snapshot) 
> max_snapshot from EXM_BASE_DATA) s0 where snapshot == max_snapshot) t;
> {code}
> finishes with a NullPointerException,
> while 
> {code}
> select * from EXM_BASE_DATA, (select max(snapshot) max_snapshot from 
> EXM_BASE_DATA) s0 where snapshot == max_snapshot
> {code}
> is executed without error



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17344) LocalCache element memory usage is not calculated properly.

2017-08-21 Thread Zoltan Haindrich (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16135008#comment-16135008
 ] 

Zoltan Haindrich commented on HIVE-17344:
-

[~sershe] I think it is currently true that remaining == capacity... but the 
underlying OrcTail resets the position to 0 when it reads explicitly... so if 
the bb position is not at the beginning, remaining() could underestimate the 
memory usage... so I think using capacity would be better, because in that 
case it may only overestimate memory usage, never underestimate it (but there 
is still the case when someone uses windows...)

I haven't seen any references where a bigger bb was sliced and passed to this 
function... if that happens... the capacity of the sliced bb is the old 
limit... which is ok...

I agree that currently capacity/remaining will do the same... but capacity 
would be more explicit.
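
A quick demo of the remaining() vs. capacity() difference (plain JDK, nothing 
Hive-specific):

{code:java}
import java.nio.ByteBuffer;

public class RemainingVsCapacity {
  public static void main(String[] args) {
    ByteBuffer bb = ByteBuffer.allocate(1024);
    System.out.println(bb.remaining()); // 1024: position 0, limit 1024
    bb.getInt();                        // any read advances the position
    System.out.println(bb.remaining()); // 1020: would now undercount the data
    System.out.println(bb.capacity());  // 1024: fixed at allocation time
  }
}
{code}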

> LocalCache element memory usage is not calculated properly.
> ---
>
> Key: HIVE-17344
> URL: https://issues.apache.org/jira/browse/HIVE-17344
> Project: Hive
>  Issue Type: Bug
>Reporter: Janos Gub
>Assignee: Janos Gub
> Attachments: HIVE-17344.patch
>
>
> Orc footer cache has a calculation of memory usage:
> {code:java}
> public int getMemoryUsage() {
>   return bb.remaining() + 100; // 100 is for 2 longs, BB and java overheads 
> (semi-arbitrary).
> }
> {code}
> ByteBuffer.remaining() returns the remaining space in the ByteBuffer, thus 
> allowing this cache to hold MAXWEIGHT/100 elements of arbitrary size. I 
> think the correct solution would be bb.capacity().



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17344) LocalCache element memory usage is not calculated properly.

2017-08-21 Thread Janos Gub (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16135009#comment-16135009
 ] 

Janos Gub commented on HIVE-17344:
--

Sorry, my bad. So when a ByteBuffer is sliced, the slice's position and limit 
are set such that remaining() returns the size of the slice. If there are 
overlapping slices, the shared region will be counted twice, but capacity 
would not help in this case either (a short demo below).
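
For completeness, the slice case (plain JDK): for a freshly created slice, 
capacity() and remaining() are identical, so neither metric detects two slices 
sharing the same backing memory:

{code:java}
import java.nio.ByteBuffer;

public class SliceDemo {
  public static void main(String[] args) {
    ByteBuffer bb = ByteBuffer.allocate(1024);
    bb.position(256);
    bb.limit(768);
    ByteBuffer slice = bb.slice();         // view of bytes [256, 768)
    System.out.println(slice.remaining()); // 512
    System.out.println(slice.capacity());  // 512: same for a new slice
    // A second, overlapping slice of the same backing array would also
    // report 512, double-counting the shared region under either metric.
  }
}
{code}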

> LocalCache element memory usage is not calculated properly.
> ---
>
> Key: HIVE-17344
> URL: https://issues.apache.org/jira/browse/HIVE-17344
> Project: Hive
>  Issue Type: Bug
>Reporter: Janos Gub
>Assignee: Janos Gub
> Attachments: HIVE-17344.patch
>
>
> Orc footer cache has a calculation of memory usage:
> {code:java}
> public int getMemoryUsage() {
>   return bb.remaining() + 100; // 100 is for 2 longs, BB and java overheads 
> (semi-arbitrary).
> }
> {code}
> ByteBuffer.remaining() returns the remaining space in the ByteBuffer, thus 
> allowing this cache to hold MAXWEIGHT/100 elements of arbitrary size. I 
> think the correct solution would be bb.capacity().



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-15104) Hive on Spark generate more shuffle data than hive on mr

2017-08-21 Thread Rui Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Li updated HIVE-15104:
--
Attachment: HIVE-15104.5.patch

Run tests with the switch on.

> Hive on Spark generate more shuffle data than hive on mr
> 
>
> Key: HIVE-15104
> URL: https://issues.apache.org/jira/browse/HIVE-15104
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Affects Versions: 1.2.1
>Reporter: wangwenli
>Assignee: Rui Li
> Attachments: HIVE-15104.1.patch, HIVE-15104.2.patch, 
> HIVE-15104.3.patch, HIVE-15104.4.patch, HIVE-15104.5.patch, TPC-H 100G.xlsx
>
>
> The same SQL, running on the Spark and MR engines, will generate different 
> sizes of shuffle data.
> I think it is because Hive on MR serializes only part of the HiveKey, while 
> Hive on Spark, which uses Kryo, serializes the full HiveKey object.
> What is your opinion?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (HIVE-17362) The MAX_PREWARM_TIME should be configurable on HoS

2017-08-21 Thread Peter Vary (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Vary reassigned HIVE-17362:
-


> The MAX_PREWARM_TIME should be configurable on HoS
> --
>
> Key: HIVE-17362
> URL: https://issues.apache.org/jira/browse/HIVE-17362
> Project: Hive
>  Issue Type: Improvement
>  Components: Spark
>Affects Versions: 3.0.0
>Reporter: Peter Vary
>Assignee: Peter Vary
>
> When using HIVE_PREWARM_ENABLED, we are waiting MAX_PREWARM_TIME for the 
> containers to warm up. This is currently set to 5s. This is often not enough 
> for a spark session to initialize the executors. We should be able to 
> configure this, so we can set a value which has an effect.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17344) LocalCache element memory usage is not calculated properly.

2017-08-21 Thread Janos Gub (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16134858#comment-16134858
 ] 

Janos Gub commented on HIVE-17344:
--

bq. it's either allocated exactly to size
If it is allocated exactly to size, then bb.remaining() will be 0, right?
bq. read from disk the same way
Then here it is 0 also?

The weigher in LocalCache sums up the getMemoryUsage() of the TailAndFileData 
entries. If the remaining size is ALWAYS 0, how is HIVE-16133 limiting the 
maximum size of the cache? (Or how is it different from setting the max size 
of the cache to limit the number of elements in it?)
Isn't HIVE-16133 about restricting the total memory usage of the cache?

cc [~hagleitn]

> LocalCache element memory usage is not calculated properly.
> ---
>
> Key: HIVE-17344
> URL: https://issues.apache.org/jira/browse/HIVE-17344
> Project: Hive
>  Issue Type: Bug
>Reporter: Janos Gub
>Assignee: Janos Gub
> Attachments: HIVE-17344.patch
>
>
> Orc footer cache has a calculation of memory usage:
> {code:java}
> public int getMemoryUsage() {
>   return bb.remaining() + 100; // 100 is for 2 longs, BB and java overheads 
> (semi-arbitrary).
> }
> {code}
> ByteBuffer.remaining() returns the remaining space in the ByteBuffer, thus 
> allowing this cache to hold MAXWEIGHT/100 elements of arbitrary size. I 
> think the correct solution would be bb.capacity().



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)