[jira] [Work logged] (HIVE-24274) Implement Query Text based MaterializedView rewrite

2020-12-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24274?focusedWorklogId=522580&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-522580
 ]

ASF GitHub Bot logged work on HIVE-24274:
-

Author: ASF GitHub Bot
Created on: 10/Dec/20 07:11
Start Date: 10/Dec/20 07:11
Worklog Time Spent: 10m 
  Work Description: kasakrisz commented on a change in pull request #1706:
URL: https://github.com/apache/hive/pull/1706#discussion_r539925057



##
File path: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
##
@@ -1844,6 +1844,9 @@ private static void populateLlapDaemonVarsSet(Set<String> llapDaemonVarsSetLocal
     // materialized views
     HIVE_MATERIALIZED_VIEW_ENABLE_AUTO_REWRITING("hive.materializedview.rewriting", true,
         "Whether to try to rewrite queries using the materialized views enabled for rewriting"),
+    HIVE_MATERIALIZED_VIEW_ENABLE_AUTO_REWRITING_QUERY_TEXT("hive.materializedview.rewriting.query.text", true,

Review comment:
   renamed to `hive.materializedview.rewriting.sql`





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 522580)
Time Spent: 40m  (was: 0.5h)

> Implement Query Text based MaterializedView rewrite
> ---
>
> Key: HIVE-24274
> URL: https://issues.apache.org/jira/browse/HIVE-24274
> Project: Hive
>  Issue Type: Improvement
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Besides the way queries are currently rewritten to use materialized views in 
> Hive, this project provides an alternative:
> Compare the query text with the stored query text of the materialized views. If 
> a match is found, the original query's logical plan can be replaced by a scan 
> on the materialized view.
> - Only materialized views that are enabled for rewriting can participate.
> - Use the existing *HiveMaterializedViewsRegistry* through the *Hive* object by 
> adding a lookup method keyed on query text.
> - More than one materialized view might have the same query text. In this case, 
> choose the first valid one.
> - Validation can be done by calling 
> *Hive.validateMaterializedViewsFromRegistry()*.
> - The scope of this first patch is limited to rewriting queries whose entire 
> text can be matched.
> - Use the expanded query text (fully qualified column and table names) for the 
> comparison.
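
As a rough illustration of the lookup step described above, here is a minimal 
sketch assuming a registry keyed by the expanded query text; the names 
QueryTextRegistry and lookupByQueryText are illustrative stand-ins, not Hive's 
actual APIs:

    import java.util.List;
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.CopyOnWriteArrayList;

    class QueryTextRegistry {
        // Key: expanded query text (fully qualified column and table names).
        private final Map<String, List<String>> mvsByQueryText = new ConcurrentHashMap<>();

        void register(String expandedQueryText, String mvName) {
            mvsByQueryText.computeIfAbsent(expandedQueryText, k -> new CopyOnWriteArrayList<>())
                .add(mvName);
        }

        // Return the first candidate whose stored query text matches the incoming
        // query's expanded text; validating the candidate is a separate step.
        String lookupByQueryText(String expandedQueryText) {
            List<String> candidates = mvsByQueryText.getOrDefault(expandedQueryText, List.of());
            return candidates.isEmpty() ? null : candidates.get(0);
        }
    }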



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-24497) Node heartbeats from LLAP Daemon to the client are not matching leading to timeout in cloud environment

2020-12-09 Thread Prasanth Jayachandran (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran resolved HIVE-24497.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Merged to master! Thanks for your contribution!

> Node heartbeats from LLAP Daemon to the client are not matching leading to 
> timeout in cloud environment
> ---
>
> Key: HIVE-24497
> URL: https://issues.apache.org/jira/browse/HIVE-24497
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Simhadri G
>Assignee: Simhadri G
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: hive-24497.01.patch
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> A node heartbeat contains info about all the tasks that were submitted to that 
> LLAP Daemon. In a cloud deployment, the client is not able to match these 
> heartbeats due to differences in hostname and port.
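
Purely as an illustration of the matching problem described above (not the 
actual patch): keying on a stable, cluster-assigned node identifier instead of 
the reported hostname:port sidesteps address rewriting. The class and the 
uniqueNodeId parameter below are hypothetical:

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    class HeartbeatMatcherSketch {
        // Remember which node each task was submitted to, keyed by a hypothetical
        // stable node identifier rather than the hostname:port the daemon reports.
        private final Map<String, String> nodeIdByTaskId = new ConcurrentHashMap<>();

        void recordSubmission(String taskId, String uniqueNodeId) {
            nodeIdByTaskId.put(taskId, uniqueNodeId);
        }

        boolean heartbeatMatches(String taskId, String heartbeatNodeId) {
            // Hostname/port differences no longer matter; only the node id is compared.
            return heartbeatNodeId.equals(nodeIdByTaskId.get(taskId));
        }
    }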



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24497) Node heartbeats from LLAP Daemon to the client are not matching leading to timeout in cloud environment

2020-12-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24497?focusedWorklogId=522559&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-522559
 ]

ASF GitHub Bot logged work on HIVE-24497:
-

Author: ASF GitHub Bot
Created on: 10/Dec/20 05:33
Start Date: 10/Dec/20 05:33
Worklog Time Spent: 10m 
  Work Description: prasanthj merged pull request #1755:
URL: https://github.com/apache/hive/pull/1755


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 522559)
Time Spent: 50m  (was: 40m)

> Node heartbeats from LLAP Daemon to the client are not matching leading to 
> timeout in cloud environment
> ---
>
> Key: HIVE-24497
> URL: https://issues.apache.org/jira/browse/HIVE-24497
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Simhadri G
>Assignee: Simhadri G
>Priority: Minor
>  Labels: pull-request-available
> Attachments: hive-24497.01.patch
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> A node heartbeat contains info about all the tasks that were submitted to that 
> LLAP Daemon. In a cloud deployment, the client is not able to match these 
> heartbeats due to differences in hostname and port.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24471) Add support for combiner in hash mode group aggregation

2020-12-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24471?focusedWorklogId=522547&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-522547
 ]

ASF GitHub Bot logged work on HIVE-24471:
-

Author: ASF GitHub Bot
Created on: 10/Dec/20 04:51
Start Date: 10/Dec/20 04:51
Worklog Time Spent: 10m 
  Work Description: maheshk114 commented on a change in pull request #1736:
URL: https://github.com/apache/hive/pull/1736#discussion_r539843531



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java
##
@@ -712,6 +751,12 @@ private void processKey(Object row,
 
   @Override
   public void process(Object row, int tag) throws HiveException {
+if (hashAggr) {
+  if (getConfiguration().get("forced.streaming.mode", 
"false").equals("true")) {

Review comment:
   I have removed it in the next commit; it had been added for testing only.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 522547)
Time Spent: 1h 20m  (was: 1h 10m)

> Add support for combiner in hash mode group aggregation 
> 
>
> Key: HIVE-24471
> URL: https://issues.apache.org/jira/browse/HIVE-24471
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> In map-side group aggregation, a partial aggregation is computed to reduce the 
> data written to disk by the map task. For hash aggregation, where the input 
> data is not sorted, a hash table is used. If the hash table grows beyond a 
> configurable limit, its data is flushed to disk and a new hash table is 
> created. If the reduction achieved by the hash table is less than the minimum 
> hash aggregation reduction estimated at compile time, map-side aggregation is 
> converted to streaming mode. So if the first few batches of records do not 
> yield a significant reduction, the mode is switched to streaming. This may hurt 
> performance if the subsequent batches of records have fewer distinct values. To 
> mitigate this, a combiner can be added to the map task after the keys are 
> sorted. This makes sure the aggregation is done where possible and reduces the 
> data written to disk.
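
A minimal sketch of why such a combiner is cheap on sorted map output (not 
Hive's actual GroupByCombiner): equal keys are adjacent, so one running 
aggregate per key suffices and no hash table is needed. The types and the sum 
aggregate are simplified stand-ins:

    import java.util.AbstractMap.SimpleEntry;
    import java.util.List;
    import java.util.Map.Entry;

    class SortedCombinerSketch {
        // Input: map output already sorted by key; emits one (key, sum) per key.
        static void combine(List<Entry<String, Long>> sortedRecords) {
            String currentKey = null;
            long runningSum = 0;
            for (Entry<String, Long> rec : sortedRecords) {
                if (currentKey != null && !currentKey.equals(rec.getKey())) {
                    System.out.println(currentKey + " -> " + runningSum); // emit aggregated record
                    runningSum = 0;
                }
                currentKey = rec.getKey();
                runningSum += rec.getValue();
            }
            if (currentKey != null) {
                System.out.println(currentKey + " -> " + runningSum); // emit the last key
            }
        }

        public static void main(String[] args) {
            combine(List.of(new SimpleEntry<>("a", 1L), new SimpleEntry<>("a", 2L),
                new SimpleEntry<>("b", 5L))); // prints a -> 3, b -> 5
        }
    }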



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24471) Add support for combiner in hash mode group aggregation

2020-12-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24471?focusedWorklogId=522546&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-522546
 ]

ASF GitHub Bot logged work on HIVE-24471:
-

Author: ASF GitHub Bot
Created on: 10/Dec/20 04:50
Start Date: 10/Dec/20 04:50
Worklog Time Spent: 10m 
  Work Description: maheshk114 commented on a change in pull request #1736:
URL: https://github.com/apache/hive/pull/1736#discussion_r539843194



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByCombiner.java
##
@@ -0,0 +1,246 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql.exec;
+
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.ql.exec.vector.VectorGroupByCombiner;
+import org.apache.hadoop.hive.ql.exec.vector.VectorGroupByOperator;
+import org.apache.hadoop.mapred.JobConf;
+import org.apache.hadoop.hive.ql.metadata.HiveException;
+import org.apache.hadoop.hive.ql.plan.BaseWork;
+import org.apache.hadoop.hive.ql.plan.GroupByDesc;
+import org.apache.hadoop.hive.ql.plan.ReduceWork;
+import org.apache.hadoop.hive.ql.plan.TableDesc;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDAFEvaluator;
+import org.apache.hadoop.hive.serde2.AbstractSerDe;
+import org.apache.hadoop.hive.serde2.Deserializer;
+import org.apache.hadoop.hive.serde2.SerDeException;
+import org.apache.hadoop.hive.serde2.SerDeUtils;
+import org.apache.hadoop.hive.serde2.Serializer;
+import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
+import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory;
+import org.apache.hadoop.io.BytesWritable;
+import org.apache.hadoop.io.DataInputBuffer;
+import org.apache.hadoop.util.ReflectionUtils;
+import org.apache.tez.runtime.api.TaskContext;
+import org.apache.tez.runtime.library.common.sort.impl.IFile;
+import org.apache.tez.runtime.library.common.sort.impl.TezRawKeyValueIterator;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+import org.apache.hadoop.fs.Path;
+
+import java.io.IOException;
+import java.util.ArrayList;
+
+import static org.apache.hadoop.hive.ql.exec.Utilities.HAS_REDUCE_WORK;
+import static org.apache.hadoop.hive.ql.exec.Utilities.REDUCE_PLAN_NAME;
+
+// Combiner for the normal (row-mode) group by operator. In case of map-side
+// aggregation, the partially aggregated records are sorted on the group-by key.
+// If the aggregation was skipped for some reason (e.g., the hash table exceeded
+// its memory limit, or the first few batches of records had too few distinct
+// values), it can be done cheaply here, because the records are already sorted
+// on the group-by key.
+public class GroupByCombiner extends VectorGroupByCombiner {
+
+  private static final Logger LOG = LoggerFactory.getLogger(
+  org.apache.hadoop.hive.ql.exec.GroupByCombiner.class.getName());
+
+  private transient GenericUDAFEvaluator[] aggregationEvaluators;
+  Deserializer valueDeserializer;
+  GenericUDAFEvaluator.AggregationBuffer[] aggregationBuffers;
+  GroupByOperator groupByOperator;
+  Serializer valueSerializer;
+  ObjectInspector aggrObjectInspector;
+  DataInputBuffer valueBuffer;
+  Object[] cachedValues;
+
+  public GroupByCombiner(TaskContext taskContext) throws HiveException, 
IOException {
+super(taskContext);
+if (rw != null) {
+  try {
+groupByOperator = (GroupByOperator) rw.getReducer();
+
+ArrayList<ObjectInspector> ois = new ArrayList<>();
+ois.add(keyObjectInspector);
+ois.add(valueObjectInspector);
+ObjectInspector[] rowObjectInspector = new ObjectInspector[1];
+rowObjectInspector[0] =
+
ObjectInspectorFactory.getStandardStructObjectInspector(Utilities.reduceFieldNameList,
+ois);
+groupByOperator.setInputObjInspectors(rowObjectInspector);
+groupByOperator.initializeOp(conf);
+aggregationBuffers = groupByOperator.getAggregationBuffers();
+aggregationEvaluators = groupByOperator.getAggregationEvaluator();
+
+TableDesc valueTableDesc = rw.getTagToValueDesc().get(0);
+valueSerializer 

[jira] [Work logged] (HIVE-24471) Add support for combiner in hash mode group aggregation

2020-12-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24471?focusedWorklogId=522544&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-522544
 ]

ASF GitHub Bot logged work on HIVE-24471:
-

Author: ASF GitHub Bot
Created on: 10/Dec/20 04:47
Start Date: 10/Dec/20 04:47
Worklog Time Spent: 10m 
  Work Description: maheshk114 commented on a change in pull request #1736:
URL: https://github.com/apache/hive/pull/1736#discussion_r539842254



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorGroupByCombiner.java
##
@@ -0,0 +1,377 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql.exec.vector;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.ql.exec.Utilities;
+import org.apache.hadoop.hive.ql.exec.mr.ExecReducer;
+import 
org.apache.hadoop.hive.ql.exec.vector.expressions.aggregates.VectorAggregateExpression;
+import org.apache.hadoop.hive.ql.metadata.HiveException;
+import org.apache.hadoop.hive.ql.plan.ReduceWork;
+import org.apache.hadoop.hive.ql.plan.TableDesc;
+import org.apache.hadoop.hive.serde2.AbstractSerDe;
+import org.apache.hadoop.hive.serde2.ByteStream;
+import org.apache.hadoop.hive.serde2.Deserializer;
+import org.apache.hadoop.hive.serde2.SerDeException;
+import org.apache.hadoop.hive.serde2.SerDeUtils;
+import org.apache.hadoop.hive.serde2.lazybinary.fast.LazyBinaryDeserializeRead;
+import org.apache.hadoop.hive.serde2.lazybinary.fast.LazyBinarySerializeWrite;
+import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
+import org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector;
+import org.apache.hadoop.io.DataInputBuffer;
+import org.apache.hadoop.mapreduce.TaskCounter;
+import org.apache.hadoop.util.ReflectionUtils;
+import org.apache.hadoop.util.StringUtils;
+import org.apache.tez.common.TezUtils;
+import org.apache.tez.common.counters.TezCounter;
+import org.apache.tez.mapreduce.combine.MRCombiner;
+import org.apache.tez.runtime.api.TaskContext;
+import org.apache.tez.runtime.library.common.sort.impl.IFile;
+import org.apache.tez.runtime.library.common.sort.impl.TezRawKeyValueIterator;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+import java.io.IOException;
+
+import static org.apache.hadoop.hive.ql.exec.Utilities.HAS_REDUCE_WORK;
+import static org.apache.hadoop.hive.ql.exec.Utilities.MAPRED_REDUCER_CLASS;
+import static org.apache.hadoop.hive.ql.exec.Utilities.REDUCE_PLAN_NAME;
+import static 
org.apache.hadoop.hive.serde2.lazy.fast.LazySimpleDeserializeRead.byteArrayCompareRanges;
+
+// Combiner for the vectorized group by operator. In case of map-side
+// aggregation, the partially aggregated records are sorted on the group-by key.
+// If the aggregation was skipped for some reason (e.g., the hash table exceeded
+// its memory limit, or the first few batches of records had too few distinct
+// values), it can be done cheaply here, because the records are already sorted
+// on the group-by key.
+public class VectorGroupByCombiner extends MRCombiner {
+  private static final Logger LOG = LoggerFactory.getLogger(
+  VectorGroupByCombiner.class.getName());
+  protected final Configuration conf;
+  protected final TezCounter combineInputRecordsCounter;
+  protected final TezCounter combineOutputRecordsCounter;
+  VectorAggregateExpression[] aggregators;
+  VectorAggregationBufferRow aggregationBufferRow;
+  protected transient LazyBinarySerializeWrite valueLazyBinarySerializeWrite;
+
+  // This helper object serializes LazyBinary format reducer values from 
columns of a row
+  // in a vectorized row batch.
+  protected transient VectorSerializeRow<LazyBinarySerializeWrite> valueVectorSerializeRow;
+
+  // The output buffer used to serialize a value into.
+  protected transient ByteStream.Output valueOutput;
+  DataInputBuffer valueBytesWritable;
+
+  // Only the minimally required configs are copied to the worker nodes. This hack
+  // (file.) is used so that these configs are also copied to the worker node.
+  protected static String 

[jira] [Work logged] (HIVE-24471) Add support for combiner in hash mode group aggregation

2020-12-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24471?focusedWorklogId=522545&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-522545
 ]

ASF GitHub Bot logged work on HIVE-24471:
-

Author: ASF GitHub Bot
Created on: 10/Dec/20 04:47
Start Date: 10/Dec/20 04:47
Worklog Time Spent: 10m 
  Work Description: maheshk114 commented on a change in pull request #1736:
URL: https://github.com/apache/hive/pull/1736#discussion_r539842254



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorGroupByCombiner.java
##
@@ -0,0 +1,377 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql.exec.vector;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.ql.exec.Utilities;
+import org.apache.hadoop.hive.ql.exec.mr.ExecReducer;
+import 
org.apache.hadoop.hive.ql.exec.vector.expressions.aggregates.VectorAggregateExpression;
+import org.apache.hadoop.hive.ql.metadata.HiveException;
+import org.apache.hadoop.hive.ql.plan.ReduceWork;
+import org.apache.hadoop.hive.ql.plan.TableDesc;
+import org.apache.hadoop.hive.serde2.AbstractSerDe;
+import org.apache.hadoop.hive.serde2.ByteStream;
+import org.apache.hadoop.hive.serde2.Deserializer;
+import org.apache.hadoop.hive.serde2.SerDeException;
+import org.apache.hadoop.hive.serde2.SerDeUtils;
+import org.apache.hadoop.hive.serde2.lazybinary.fast.LazyBinaryDeserializeRead;
+import org.apache.hadoop.hive.serde2.lazybinary.fast.LazyBinarySerializeWrite;
+import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
+import org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector;
+import org.apache.hadoop.io.DataInputBuffer;
+import org.apache.hadoop.mapreduce.TaskCounter;
+import org.apache.hadoop.util.ReflectionUtils;
+import org.apache.hadoop.util.StringUtils;
+import org.apache.tez.common.TezUtils;
+import org.apache.tez.common.counters.TezCounter;
+import org.apache.tez.mapreduce.combine.MRCombiner;
+import org.apache.tez.runtime.api.TaskContext;
+import org.apache.tez.runtime.library.common.sort.impl.IFile;
+import org.apache.tez.runtime.library.common.sort.impl.TezRawKeyValueIterator;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+import java.io.IOException;
+
+import static org.apache.hadoop.hive.ql.exec.Utilities.HAS_REDUCE_WORK;
+import static org.apache.hadoop.hive.ql.exec.Utilities.MAPRED_REDUCER_CLASS;
+import static org.apache.hadoop.hive.ql.exec.Utilities.REDUCE_PLAN_NAME;
+import static 
org.apache.hadoop.hive.serde2.lazy.fast.LazySimpleDeserializeRead.byteArrayCompareRanges;
+
+// Combiner for the vectorized group by operator. In case of map-side
+// aggregation, the partially aggregated records are sorted on the group-by key.
+// If the aggregation was skipped for some reason (e.g., the hash table exceeded
+// its memory limit, or the first few batches of records had too few distinct
+// values), it can be done cheaply here, because the records are already sorted
+// on the group-by key.
+public class VectorGroupByCombiner extends MRCombiner {
+  private static final Logger LOG = LoggerFactory.getLogger(
+  VectorGroupByCombiner.class.getName());
+  protected final Configuration conf;
+  protected final TezCounter combineInputRecordsCounter;
+  protected final TezCounter combineOutputRecordsCounter;
+  VectorAggregateExpression[] aggregators;
+  VectorAggregationBufferRow aggregationBufferRow;
+  protected transient LazyBinarySerializeWrite valueLazyBinarySerializeWrite;
+
+  // This helper object serializes LazyBinary format reducer values from 
columns of a row
+  // in a vectorized row batch.
+  protected transient VectorSerializeRow<LazyBinarySerializeWrite> valueVectorSerializeRow;
+
+  // The output buffer used to serialize a value into.
+  protected transient ByteStream.Output valueOutput;
+  DataInputBuffer valueBytesWritable;
+
+  // Only the minimally required configs are copied to the worker nodes. This hack
+  // (file.) is used so that these configs are also copied to the worker node.
+  protected static String 

[jira] [Work logged] (HIVE-24471) Add support for combiner in hash mode group aggregation

2020-12-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24471?focusedWorklogId=522542&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-522542
 ]

ASF GitHub Bot logged work on HIVE-24471:
-

Author: ASF GitHub Bot
Created on: 10/Dec/20 04:46
Start Date: 10/Dec/20 04:46
Worklog Time Spent: 10m 
  Work Description: maheshk114 commented on a change in pull request #1736:
URL: https://github.com/apache/hive/pull/1736#discussion_r539842047



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorGroupByCombiner.java
##
@@ -0,0 +1,377 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql.exec.vector;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.ql.exec.Utilities;
+import org.apache.hadoop.hive.ql.exec.mr.ExecReducer;
+import 
org.apache.hadoop.hive.ql.exec.vector.expressions.aggregates.VectorAggregateExpression;
+import org.apache.hadoop.hive.ql.metadata.HiveException;
+import org.apache.hadoop.hive.ql.plan.ReduceWork;
+import org.apache.hadoop.hive.ql.plan.TableDesc;
+import org.apache.hadoop.hive.serde2.AbstractSerDe;
+import org.apache.hadoop.hive.serde2.ByteStream;
+import org.apache.hadoop.hive.serde2.Deserializer;
+import org.apache.hadoop.hive.serde2.SerDeException;
+import org.apache.hadoop.hive.serde2.SerDeUtils;
+import org.apache.hadoop.hive.serde2.lazybinary.fast.LazyBinaryDeserializeRead;
+import org.apache.hadoop.hive.serde2.lazybinary.fast.LazyBinarySerializeWrite;
+import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
+import org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector;
+import org.apache.hadoop.io.DataInputBuffer;
+import org.apache.hadoop.mapreduce.TaskCounter;
+import org.apache.hadoop.util.ReflectionUtils;
+import org.apache.hadoop.util.StringUtils;
+import org.apache.tez.common.TezUtils;
+import org.apache.tez.common.counters.TezCounter;
+import org.apache.tez.mapreduce.combine.MRCombiner;
+import org.apache.tez.runtime.api.TaskContext;
+import org.apache.tez.runtime.library.common.sort.impl.IFile;
+import org.apache.tez.runtime.library.common.sort.impl.TezRawKeyValueIterator;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+import java.io.IOException;
+
+import static org.apache.hadoop.hive.ql.exec.Utilities.HAS_REDUCE_WORK;
+import static org.apache.hadoop.hive.ql.exec.Utilities.MAPRED_REDUCER_CLASS;
+import static org.apache.hadoop.hive.ql.exec.Utilities.REDUCE_PLAN_NAME;
+import static 
org.apache.hadoop.hive.serde2.lazy.fast.LazySimpleDeserializeRead.byteArrayCompareRanges;
+
+// Combiner for the vectorized group by operator. In case of map-side
+// aggregation, the partially aggregated records are sorted on the group-by key.
+// If the aggregation was skipped for some reason (e.g., the hash table exceeded
+// its memory limit, or the first few batches of records had too few distinct
+// values), it can be done cheaply here, because the records are already sorted
+// on the group-by key.
+public class VectorGroupByCombiner extends MRCombiner {
+  private static final Logger LOG = LoggerFactory.getLogger(
+  VectorGroupByCombiner.class.getName());
+  protected final Configuration conf;
+  protected final TezCounter combineInputRecordsCounter;
+  protected final TezCounter combineOutputRecordsCounter;
+  VectorAggregateExpression[] aggregators;
+  VectorAggregationBufferRow aggregationBufferRow;
+  protected transient LazyBinarySerializeWrite valueLazyBinarySerializeWrite;
+
+  // This helper object serializes LazyBinary format reducer values from 
columns of a row
+  // in a vectorized row batch.
+  protected transient VectorSerializeRow<LazyBinarySerializeWrite> valueVectorSerializeRow;
+
+  // The output buffer used to serialize a value into.
+  protected transient ByteStream.Output valueOutput;
+  DataInputBuffer valueBytesWritable;
+
+  // Only the minimally required configs are copied to the worker nodes. This hack
+  // (file.) is used so that these configs are also copied to the worker node.
+  protected static String 

[jira] [Work logged] (HIVE-24471) Add support for combiner in hash mode group aggregation

2020-12-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24471?focusedWorklogId=522540&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-522540
 ]

ASF GitHub Bot logged work on HIVE-24471:
-

Author: ASF GitHub Bot
Created on: 10/Dec/20 04:45
Start Date: 10/Dec/20 04:45
Worklog Time Spent: 10m 
  Work Description: maheshk114 commented on a change in pull request #1736:
URL: https://github.com/apache/hive/pull/1736#discussion_r539841639



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByCombiner.java
##
@@ -0,0 +1,246 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql.exec;
+
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.ql.exec.vector.VectorGroupByCombiner;
+import org.apache.hadoop.hive.ql.exec.vector.VectorGroupByOperator;
+import org.apache.hadoop.mapred.JobConf;
+import org.apache.hadoop.hive.ql.metadata.HiveException;
+import org.apache.hadoop.hive.ql.plan.BaseWork;
+import org.apache.hadoop.hive.ql.plan.GroupByDesc;
+import org.apache.hadoop.hive.ql.plan.ReduceWork;
+import org.apache.hadoop.hive.ql.plan.TableDesc;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDAFEvaluator;
+import org.apache.hadoop.hive.serde2.AbstractSerDe;
+import org.apache.hadoop.hive.serde2.Deserializer;
+import org.apache.hadoop.hive.serde2.SerDeException;
+import org.apache.hadoop.hive.serde2.SerDeUtils;
+import org.apache.hadoop.hive.serde2.Serializer;
+import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
+import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory;
+import org.apache.hadoop.io.BytesWritable;
+import org.apache.hadoop.io.DataInputBuffer;
+import org.apache.hadoop.util.ReflectionUtils;
+import org.apache.tez.runtime.api.TaskContext;
+import org.apache.tez.runtime.library.common.sort.impl.IFile;
+import org.apache.tez.runtime.library.common.sort.impl.TezRawKeyValueIterator;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+import org.apache.hadoop.fs.Path;
+
+import java.io.IOException;
+import java.util.ArrayList;
+
+import static org.apache.hadoop.hive.ql.exec.Utilities.HAS_REDUCE_WORK;
+import static org.apache.hadoop.hive.ql.exec.Utilities.REDUCE_PLAN_NAME;
+
+// Combiner for the normal (row-mode) group by operator. In case of map-side
+// aggregation, the partially aggregated records are sorted on the group-by key.
+// If the aggregation was skipped for some reason (e.g., the hash table exceeded
+// its memory limit, or the first few batches of records had too few distinct
+// values), it can be done cheaply here, because the records are already sorted
+// on the group-by key.
+public class GroupByCombiner extends VectorGroupByCombiner {
+
+  private static final Logger LOG = LoggerFactory.getLogger(
+  org.apache.hadoop.hive.ql.exec.GroupByCombiner.class.getName());
+
+  private transient GenericUDAFEvaluator[] aggregationEvaluators;
+  Deserializer valueDeserializer;
+  GenericUDAFEvaluator.AggregationBuffer[] aggregationBuffers;
+  GroupByOperator groupByOperator;
+  Serializer valueSerializer;
+  ObjectInspector aggrObjectInspector;
+  DataInputBuffer valueBuffer;
+  Object[] cachedValues;
+
+  public GroupByCombiner(TaskContext taskContext) throws HiveException, 
IOException {
+super(taskContext);
+if (rw != null) {
+  try {
+groupByOperator = (GroupByOperator) rw.getReducer();
+
+ArrayList<ObjectInspector> ois = new ArrayList<>();
+ois.add(keyObjectInspector);
+ois.add(valueObjectInspector);
+ObjectInspector[] rowObjectInspector = new ObjectInspector[1];
+rowObjectInspector[0] =
+
ObjectInspectorFactory.getStandardStructObjectInspector(Utilities.reduceFieldNameList,
+ois);
+groupByOperator.setInputObjInspectors(rowObjectInspector);
+groupByOperator.initializeOp(conf);
+aggregationBuffers = groupByOperator.getAggregationBuffers();
+aggregationEvaluators = groupByOperator.getAggregationEvaluator();
+
+TableDesc valueTableDesc = rw.getTagToValueDesc().get(0);
+valueSerializer 

[jira] [Work logged] (HIVE-24497) Node heartbeats from LLAP Daemon to the client are not matching leading to timeout in cloud environment

2020-12-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24497?focusedWorklogId=522536&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-522536
 ]

ASF GitHub Bot logged work on HIVE-24497:
-

Author: ASF GitHub Bot
Created on: 10/Dec/20 04:39
Start Date: 10/Dec/20 04:39
Worklog Time Spent: 10m 
  Work Description: simhadri-g commented on pull request #1755:
URL: https://github.com/apache/hive/pull/1755#issuecomment-742234677


   Thanks @prasanthj for the review. I have made the recommended changes; 
please take a look.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 522536)
Time Spent: 40m  (was: 0.5h)

> Node heartbeats from LLAP Daemon to the client are not matching leading to 
> timeout in cloud environment
> ---
>
> Key: HIVE-24497
> URL: https://issues.apache.org/jira/browse/HIVE-24497
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Simhadri G
>Assignee: Simhadri G
>Priority: Minor
>  Labels: pull-request-available
> Attachments: hive-24497.01.patch
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> A node heartbeat contains info about all the tasks that were submitted to that 
> LLAP Daemon. In a cloud deployment, the client is not able to match these 
> heartbeats due to differences in hostname and port.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24471) Add support for combiner in hash mode group aggregation

2020-12-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24471?focusedWorklogId=522535&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-522535
 ]

ASF GitHub Bot logged work on HIVE-24471:
-

Author: ASF GitHub Bot
Created on: 10/Dec/20 04:39
Start Date: 10/Dec/20 04:39
Worklog Time Spent: 10m 
  Work Description: t3rmin4t0r commented on a change in pull request #1736:
URL: https://github.com/apache/hive/pull/1736#discussion_r539838422



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByCombiner.java
##
@@ -0,0 +1,246 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql.exec;
+
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.ql.exec.vector.VectorGroupByCombiner;
+import org.apache.hadoop.hive.ql.exec.vector.VectorGroupByOperator;
+import org.apache.hadoop.mapred.JobConf;
+import org.apache.hadoop.hive.ql.metadata.HiveException;
+import org.apache.hadoop.hive.ql.plan.BaseWork;
+import org.apache.hadoop.hive.ql.plan.GroupByDesc;
+import org.apache.hadoop.hive.ql.plan.ReduceWork;
+import org.apache.hadoop.hive.ql.plan.TableDesc;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDAFEvaluator;
+import org.apache.hadoop.hive.serde2.AbstractSerDe;
+import org.apache.hadoop.hive.serde2.Deserializer;
+import org.apache.hadoop.hive.serde2.SerDeException;
+import org.apache.hadoop.hive.serde2.SerDeUtils;
+import org.apache.hadoop.hive.serde2.Serializer;
+import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
+import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory;
+import org.apache.hadoop.io.BytesWritable;
+import org.apache.hadoop.io.DataInputBuffer;
+import org.apache.hadoop.util.ReflectionUtils;
+import org.apache.tez.runtime.api.TaskContext;
+import org.apache.tez.runtime.library.common.sort.impl.IFile;
+import org.apache.tez.runtime.library.common.sort.impl.TezRawKeyValueIterator;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+import org.apache.hadoop.fs.Path;
+
+import java.io.IOException;
+import java.util.ArrayList;
+
+import static org.apache.hadoop.hive.ql.exec.Utilities.HAS_REDUCE_WORK;
+import static org.apache.hadoop.hive.ql.exec.Utilities.REDUCE_PLAN_NAME;
+
+// Combiner for the normal (row-mode) group by operator. In case of map-side
+// aggregation, the partially aggregated records are sorted on the group-by key.
+// If the aggregation was skipped for some reason (e.g., the hash table exceeded
+// its memory limit, or the first few batches of records had too few distinct
+// values), it can be done cheaply here, because the records are already sorted
+// on the group-by key.
+public class GroupByCombiner extends VectorGroupByCombiner {
+
+  private static final Logger LOG = LoggerFactory.getLogger(
+  org.apache.hadoop.hive.ql.exec.GroupByCombiner.class.getName());
+
+  private transient GenericUDAFEvaluator[] aggregationEvaluators;
+  Deserializer valueDeserializer;
+  GenericUDAFEvaluator.AggregationBuffer[] aggregationBuffers;
+  GroupByOperator groupByOperator;
+  Serializer valueSerializer;
+  ObjectInspector aggrObjectInspector;
+  DataInputBuffer valueBuffer;
+  Object[] cachedValues;
+
+  public GroupByCombiner(TaskContext taskContext) throws HiveException, 
IOException {
+super(taskContext);
+if (rw != null) {
+  try {
+groupByOperator = (GroupByOperator) rw.getReducer();
+
+ArrayList<ObjectInspector> ois = new ArrayList<>();
+ois.add(keyObjectInspector);
+ois.add(valueObjectInspector);
+ObjectInspector[] rowObjectInspector = new ObjectInspector[1];
+rowObjectInspector[0] =
+
ObjectInspectorFactory.getStandardStructObjectInspector(Utilities.reduceFieldNameList,
+ois);
+groupByOperator.setInputObjInspectors(rowObjectInspector);
+groupByOperator.initializeOp(conf);
+aggregationBuffers = groupByOperator.getAggregationBuffers();
+aggregationEvaluators = groupByOperator.getAggregationEvaluator();
+
+TableDesc valueTableDesc = rw.getTagToValueDesc().get(0);
+valueSerializer 

[jira] [Work logged] (HIVE-24207) LimitOperator can leverage ObjectCache to bail out quickly

2020-12-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24207?focusedWorklogId=522521&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-522521
 ]

ASF GitHub Bot logged work on HIVE-24207:
-

Author: ASF GitHub Bot
Created on: 10/Dec/20 03:42
Start Date: 10/Dec/20 03:42
Worklog Time Spent: 10m 
  Work Description: rbalamohan commented on pull request #1556:
URL: https://github.com/apache/hive/pull/1556#issuecomment-742218526


   select  ss_sold_date_sk from store_sales, date_dim where date_dim.d_year in 
(1998,1998+1,1998+2) and store_sales.ss_sold_date_sk = date_dim.d_date_sk limit 
100;  
   
   The above query runs in **80+ seconds** on a small cluster with cloud storage, 
whereas with the patch it took just **4 seconds**. So that is good news. :)



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 522521)
Time Spent: 50m  (was: 40m)

> LimitOperator can leverage ObjectCache to bail out quickly
> --
>
> Key: HIVE-24207
> URL: https://issues.apache.org/jira/browse/HIVE-24207
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> {noformat}
> select  ss_sold_date_sk from store_sales, date_dim where date_dim.d_year in 
> (1998,1998+1,1998+2) and store_sales.ss_sold_date_sk = date_dim.d_date_sk 
> limit 100;
>  select distinct ss_sold_date_sk from store_sales, date_dim where 
> date_dim.d_year in (1998,1998+1,1998+2) and store_sales.ss_sold_date_sk = 
> date_dim.d_date_sk limit 100;
>  {noformat}
> Queries like the above generate a large number of map tasks. Currently they 
> don't bail out after generating enough data. 
> It would be good to make use of the ObjectCache to retain the number of records 
> generated. LimitOperator/VectorLimitOperator could then bail out for the later 
> tasks in the operator's init phase itself. 
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorLimitOperator.java#L57
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/LimitOperator.java#L58
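
A minimal sketch of the bail-out idea, assuming the counter can be shared 
across tasks of a vertex through an object cache; the ConcurrentHashMap below 
stands in for Hive's ObjectCache and all names are illustrative:

    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.atomic.AtomicInteger;

    class LimitBailOutSketch {
        // Stand-in for the vertex-scoped ObjectCache mentioned above.
        static final ConcurrentHashMap<String, AtomicInteger> CACHE = new ConcurrentHashMap<>();

        final int limit;
        final AtomicInteger emitted;
        final boolean bailOut;

        LimitBailOutSketch(String vertexCacheKey, int limit) {
            this.limit = limit;
            // All tasks of the same vertex see the same counter.
            this.emitted = CACHE.computeIfAbsent(vertexCacheKey, k -> new AtomicInteger());
            // A later task can conclude in its init phase that there is nothing left to do.
            this.bailOut = emitted.get() >= limit;
        }

        boolean process() {
            // Returns false once the limit is reached, so the task can stop early.
            return !bailOut && emitted.incrementAndGet() <= limit;
        }
    }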



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24207) LimitOperator can leverage ObjectCache to bail out quickly

2020-12-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24207?focusedWorklogId=522520&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-522520
 ]

ASF GitHub Bot logged work on HIVE-24207:
-

Author: ASF GitHub Bot
Created on: 10/Dec/20 03:39
Start Date: 10/Dec/20 03:39
Worklog Time Spent: 10m 
  Work Description: rbalamohan commented on a change in pull request #1556:
URL: https://github.com/apache/hive/pull/1556#discussion_r539820370



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/tez/DagUtils.java
##
@@ -1811,4 +1819,11 @@ static long parseRightmostXmx(String javaOpts) {
 }
 return allNonAppFileResources;
   }
+
+  public static void initTezAttributes(Configuration conf, ProcessorContext 
context) {

Review comment:
   Move this into TezProcessor itself? It is not DAG-specific and is needed 
mainly for logging purposes. Having it within TezProcessor will make it easier 
to read and will not confuse anyone trying to walk through DagUtils.

##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/LimitOperator.java
##
@@ -19,11 +19,18 @@
 package org.apache.hadoop.hive.ql.exec;
 
 import java.io.Serializable;
+import java.util.concurrent.Callable;
+import java.util.concurrent.atomic.AtomicBoolean;
+import java.util.concurrent.atomic.AtomicInteger;
 
 import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.hive.conf.HiveConf;
 import org.apache.hadoop.hive.ql.CompilationOpContext;
+import org.apache.hadoop.hive.ql.exec.tez.DagUtils;
+import org.apache.hadoop.hive.ql.exec.tez.LlapObjectCache;
 import org.apache.hadoop.hive.ql.metadata.HiveException;
 import org.apache.hadoop.hive.ql.plan.LimitDesc;
+import org.apache.hadoop.hive.ql.plan.OperatorDesc;

Review comment:
   remove this?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 522520)
Time Spent: 40m  (was: 0.5h)

> LimitOperator can leverage ObjectCache to bail out quickly
> --
>
> Key: HIVE-24207
> URL: https://issues.apache.org/jira/browse/HIVE-24207
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> {noformat}
> select  ss_sold_date_sk from store_sales, date_dim where date_dim.d_year in 
> (1998,1998+1,1998+2) and store_sales.ss_sold_date_sk = date_dim.d_date_sk 
> limit 100;
>  select distinct ss_sold_date_sk from store_sales, date_dim where 
> date_dim.d_year in (1998,1998+1,1998+2) and store_sales.ss_sold_date_sk = 
> date_dim.d_date_sk limit 100;
>  {noformat}
> Queries like the above generate a large number of map tasks. Currently they 
> don't bail out after generating enough data. 
> It would be good to make use of the ObjectCache to retain the number of records 
> generated. LimitOperator/VectorLimitOperator could then bail out for the later 
> tasks in the operator's init phase itself. 
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorLimitOperator.java#L57
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/LimitOperator.java#L58



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-24513) Advance write Id during AlterTableDropConstraint DDL

2020-12-09 Thread Kishen Das (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kishen Das reassigned HIVE-24513:
-


> Advance write Id during AlterTableDropConstraint DDL
> 
>
> Key: HIVE-24513
> URL: https://issues.apache.org/jira/browse/HIVE-24513
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Kishen Das
>Assignee: Kishen Das
>Priority: Major
>
> For AlterTableDropConstraint related DDL tasks, although we might be 
> advancing the write ID, it looks like it is not updated correctly during the 
> Analyzer phase. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-22415) Upgrade to Java 11

2020-12-09 Thread David Mollitor (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17246965#comment-17246965
 ] 

David Mollitor commented on HIVE-22415:
---

Alternatively, it may be possible to load each Mini Cluster (HDFS, ZK, Druid, 
Kafka, etc.) into its own class loader so that these library conflicts (JAR 
hell) are averted.
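
A minimal sketch of that idea, assuming each cluster's jars can be listed 
explicitly; the class and method names are illustrative, not an actual Hive 
test harness:

    import java.net.URL;
    import java.net.URLClassLoader;

    class IsolatedMiniClusterLoader {
        // Load a mini cluster's entry class through its own URLClassLoader whose
        // parent is the platform loader, so classes on the application classpath
        // (e.g., a conflicting protobuf version) stay invisible inside it.
        static Object start(URL[] clusterJars, String entryClassName) throws Exception {
            ClassLoader isolated = new URLClassLoader(clusterJars, ClassLoader.getPlatformClassLoader());
            Class<?> entry = Class.forName(entryClassName, true, isolated);
            return entry.getDeclaredConstructor().newInstance();
        }
    }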

> Upgrade to Java 11
> --
>
> Key: HIVE-22415
> URL: https://issues.apache.org/jira/browse/HIVE-22415
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Critical
>  Labels: pull-request-available
>  Time Spent: 4h
>  Remaining Estimate: 0h
>
> Upgrade Hive to Java JDK 11



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (HIVE-22415) Upgrade to Java 11

2020-12-09 Thread David Mollitor (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17246960#comment-17246960
 ] 

David Mollitor edited comment on HIVE-22415 at 12/10/20, 2:08 AM:
--

OK, just wanted to provide an update here.

 I have been working hard on getting Hadoop 3.3 working with Hive, so that JDK 
11 can be supported, and it's been a challenge. [HIVE-24484]

I have worked through some of the initial pain points, but I got stuck.  Hadoop 
introduced a new RPC mechanism using Google Protobuf v3.  Some of the LLAP 
stuff was built on top of the existing Hadoop RPC code with Protobuf2.  It 
seems that Hadoop tried to allow for interoperability between the two RPCs, 
however, loading one version of the RPC engine blocks the loading of the other 
one (first one wins).  I think this becomes an issue for QTests since the tests 
may spin up an LLAP and a Hadoop mini cluster in the same classloader context. 
Simply loading the Protobuf3 Hadoop RPC (NameNode) code blocks the loading of 
the Protobuf2 Hadoop RPC (LLAP) code.  Without any changes on the Hadoop side 
to better support this setup, the LLAP code needs to be migrated to use 
Protobuf3 and to use the Hadoop third-party JAR with its shaded Protobuf version.

https://github.com/apache/hadoop/blob/0a45bd034e1ce08c556227bb2c815c15be17cf10/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/ProtobufRpcEngine2.java#L63-L67

https://github.com/apache/hadoop/blob/0a45bd034e1ce08c556227bb2c815c15be17cf10/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/ProtobufRpcEngine.java#L70-L74


was (Author: belugabehr):
OK, just wanted to provide an update here.

 I have been working hard on getting Hadoop 3.3 working with Hive, so that JDK 
11 can be supported, and it's been a challenge. [HIVE-24484]

I have worked through some of the initial pain points, but I got stuck.  Hadoop 
introduced a new RPC mechanism using Google Protobuf v3.  Some of the LLAP 
stuff was built on top of the existing Hadoop RPC code with Protobuf2.  It 
seems that Hadoop tried to all for interoperability between the two RPCs, 
however, loading one version of the RPC engine blocks the loading of the other 
one (first one wins).  I think this becomes an issue for QTests since the tests 
may spin up an LLAP and a Hadoop mini cluster in the same classloader context. 
Simply loading the Protobuf3 Hadoop RPC (NameNode) code blocks the loading of 
the Protobuf2 Hadoop RPC (LLAP) code.  Without any changes on the Hadoop side 
to better support this setup, the LLAP code needs to be migrated to use 
Protobuf3 and to use the Hadoop 3rd part JAR with its shaded Protobuf version.

https://github.com/apache/hadoop/blob/0a45bd034e1ce08c556227bb2c815c15be17cf10/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/ProtobufRpcEngine2.java#L63-L67

https://github.com/apache/hadoop/blob/0a45bd034e1ce08c556227bb2c815c15be17cf10/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/ProtobufRpcEngine.java#L70-L74

> Upgrade to Java 11
> --
>
> Key: HIVE-22415
> URL: https://issues.apache.org/jira/browse/HIVE-22415
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Critical
>  Labels: pull-request-available
>  Time Spent: 4h
>  Remaining Estimate: 0h
>
> Upgrade Hive to Java JDK 11



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-22415) Upgrade to Java 11

2020-12-09 Thread David Mollitor (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17246960#comment-17246960
 ] 

David Mollitor commented on HIVE-22415:
---

OK, just wanted to provide an update here.

 I have been working hard on getting Hadoop 3.3 working with Hive, so that JDK 
11 can be supported, and it's been a challenge. [HIVE-24484]

I have worked through some of the initial pain points, but I got stuck.  Hadoop 
introduced a new RPC mechanism using Google Protobuf v3.  Some of the LLAP 
stuff was built on top of the existing Hadoop RPC code with Protobuf2.  It 
seems that Hadoop tried to allow for interoperability between the two RPCs, 
however, loading one version of the RPC engine blocks the loading of the other 
one (first one wins).  I think this becomes an issue for QTests since the tests 
may spin up an LLAP and a Hadoop mini cluster in the same classloader context. 
Simply loading the Protobuf3 Hadoop RPC (NameNode) code blocks the loading of 
the Protobuf2 Hadoop RPC (LLAP) code.  Without any changes on the Hadoop side 
to better support this setup, the LLAP code needs to be migrated to use 
Protobuf3 and to use the Hadoop third-party JAR with its shaded Protobuf version.

https://github.com/apache/hadoop/blob/0a45bd034e1ce08c556227bb2c815c15be17cf10/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/ProtobufRpcEngine2.java#L63-L67

https://github.com/apache/hadoop/blob/0a45bd034e1ce08c556227bb2c815c15be17cf10/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/ProtobufRpcEngine.java#L70-L74

> Upgrade to Java 11
> --
>
> Key: HIVE-22415
> URL: https://issues.apache.org/jira/browse/HIVE-22415
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Critical
>  Labels: pull-request-available
>  Time Spent: 4h
>  Remaining Estimate: 0h
>
> Upgrade Hive to Java JDK 11



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24274) Implement Query Text based MaterializedView rewrite

2020-12-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24274?focusedWorklogId=522496&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-522496
 ]

ASF GitHub Bot logged work on HIVE-24274:
-

Author: ASF GitHub Bot
Created on: 10/Dec/20 01:17
Start Date: 10/Dec/20 01:17
Worklog Time Spent: 10m 
  Work Description: jcamachor commented on a change in pull request #1706:
URL: https://github.com/apache/hive/pull/1706#discussion_r539730309



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/ddl/view/materialized/alter/rebuild/AlterMaterializedViewRebuildAnalyzer.java
##
@@ -57,7 +57,7 @@ public void analyzeInternal(ASTNode root) throws 
SemanticException {
 
 ASTNode tableTree = (ASTNode) root.getChild(0);
 TableName tableName = getQualifiedTableName(tableTree);
-if (ctx.enableUnparse()) {
+if (ctx.isScheduledQuery()) {
   unparseTranslator.addTableNameTranslation(tableTree, 
SessionState.get().getCurrentDatabase());

Review comment:
   Can we add a comment (I know that the code was not added in this patch 
but it is useful to have some clarification on why this is being done)?

##
File path: ql/src/java/org/apache/hadoop/hive/ql/Context.java
##
@@ -336,6 +344,9 @@ private Context(Configuration conf, String executionId)  {
 opContext = new CompilationOpContext();
 
 viewsTokenRewriteStreams = new HashMap<>();
+enableUnparse =

Review comment:
   Can we add a comment explaining why we only enable this when this config 
value is true? The `enableUnparse` documentation describes why it is not 
enabled in general. However, it is worth having a comment here, since it is 
difficult to establish the connection between the config property and the 
variable.

##
File path: ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java
##
@@ -1945,6 +1945,18 @@ public RelOptMaterialization 
getMaterializedViewForRebuild(String dbName, String
 }
   }
 
+  public List<RelOptMaterialization> getMaterialization(

Review comment:
   add javadoc?
   
   Also, should this method be renamed to `getSQLMatchingMaterializedView` or 
anything more descriptive?

##
File path: ql/src/test/queries/clientpositive/materialized_view_create_rewrite.q
##
@@ -5,6 +5,7 @@ set hive.support.concurrency=true;
 set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
 set hive.strict.checks.cartesian.product=false;
 set hive.materializedview.rewriting=true;
+set hive.materializedview.rewriting.query.text=false;

Review comment:
   Why do we disable it here?

##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/metadata/MaterializedViewsCache.java
##
@@ -0,0 +1,173 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql.metadata;
+
+import org.apache.calcite.plan.RelOptMaterialization;
+import 
org.apache.hadoop.hive.ql.optimizer.calcite.rules.views.HiveMaterializedViewUtils;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.List;
+import java.util.Map;
+import java.util.concurrent.ConcurrentHashMap;
+import java.util.concurrent.ConcurrentMap;
+import java.util.function.BiFunction;
+
+import static java.util.Collections.emptyList;
+import static java.util.Collections.unmodifiableList;
+
+/**
+ * Collection for storing {@link RelOptMaterialization}s.
+ * A RelOptMaterialization can be looked up by
+ * - the materialized view's fully qualified name, or
+ * - its query text.
+ * This implementation contains two {@link ConcurrentHashMap}s: one for name based lookup and one for
+ * query text based lookup. The map contents are kept in sync during each DML operation: operations are
+ * performed initially on the map which provides name based lookup, and the map which provides query
+ * text based lookup is updated by lambda expressions
+ * passed to {@link ConcurrentHashMap#compute(Object, BiFunction)}.
+ */
+public class MaterializedViewsCache {
+  private static final Logger LOG = 
LoggerFactory.getLogger(MaterializedViewsCache.class);
+
+  // Key is the database name. 
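
A minimal, self-contained sketch of the dual-map pattern the javadoc above describes (the class name and the String value types are illustrative placeholders, not the actual Hive code):

{code:java}
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class DualKeyCacheSketch {
  // Primary map: lookup by fully qualified view name.
  private final ConcurrentMap<String, String> byName = new ConcurrentHashMap<>();
  // Secondary map: lookup by expanded query text. Several views can share the
  // same query text, so the value is a list.
  private final ConcurrentMap<String, List<String>> byQueryText = new ConcurrentHashMap<>();

  public void put(String name, String queryText, String view) {
    // The secondary map is updated inside the lambda passed to compute(), so
    // it changes while the name entry is held under compute()'s per-key lock;
    // this is the synchronization idea the javadoc refers to.
    byName.compute(name, (n, old) -> {
      byQueryText.compute(queryText, (q, list) -> {
        List<String> views = (list == null) ? new ArrayList<>() : list;
        views.add(view);
        return views;
      });
      return view;
    });
  }

  public List<String> getByQueryText(String queryText) {
    return byQueryText.getOrDefault(queryText, Collections.emptyList());
  }
}
{code}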

[jira] [Work logged] (HIVE-24254) Remove setOwner call in ReplChangeManager

2020-12-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24254?focusedWorklogId=522487&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-522487
 ]

ASF GitHub Bot logged work on HIVE-24254:
-

Author: ASF GitHub Bot
Created on: 10/Dec/20 00:50
Start Date: 10/Dec/20 00:50
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] commented on pull request #1567:
URL: https://github.com/apache/hive/pull/1567#issuecomment-742159729


   This pull request has been automatically marked as stale because it has not 
had recent activity. It will be closed if no further activity occurs.
   Feel free to reach out on the d...@hive.apache.org list if the patch is in 
need of reviews.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 522487)
Time Spent: 0.5h  (was: 20m)

> Remove setOwner call in ReplChangeManager
> -
>
> Key: HIVE-24254
> URL: https://issues.apache.org/jira/browse/HIVE-24254
> Project: Hive
>  Issue Type: Task
>Reporter: Aasha Medhi
>Assignee: Aasha Medhi
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-24254.01.patch, HIVE-24254.02.patch, 
> HIVE-24254.03.patch
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-24218) Drop table used by a materialized view

2020-12-09 Thread Pritha Dawn (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17246910#comment-17246910
 ] 

Pritha Dawn commented on HIVE-24218:


Duplicate of https://issues.apache.org/jira/browse/HIVE-22566

> Drop table used by a materialized view
> --
>
> Key: HIVE-24218
> URL: https://issues.apache.org/jira/browse/HIVE-24218
> Project: Hive
>  Issue Type: Bug
>  Components: CLI, Hive, HiveServer2, Metastore
>Affects Versions: 3.1.0
>Reporter: stephbat
>Priority: Critical
>
> I have discovered that it's possible to drop a table used by a materialized 
> view. When I drop such a table, the drop succeeds, while I think this action 
> should be refused. When I check in the metastore database, I can see that the 
> table has been partially deleted (i.e., the reference of the table still 
> exists in TBLS and in MV_TABLES_USED). This introduces an inconsistency in 
> the metastore.
> Steps to reproduce:
> {code:java}
> 0: jdbc:hive2://localhost.> use ptest2_db_dev;
> No rows affected (0.067 seconds)
> 0: jdbc:hive2://localhost.> create table table_blocked (id string);
> No rows affected (0.97 seconds)
> 0: jdbc:hive2://localhost.> desc table_blocked;
> +-----------+------------+----------+
> | col_name  | data_type  | comment  |
> +-----------+------------+----------+
> | id        | string     |          |
> +-----------+------------+----------+
> 1 row selected (0.171 seconds)
> 0: jdbc:hive2://localhost.> create materialized view table_blocked_mv as 
> select * from table_blocked;
> No rows affected (18.055 seconds)
> 0: jdbc:hive2://localhost.> desc table_blocked_mv;
> +-----------+------------+----------+
> | col_name  | data_type  | comment  |
> +-----------+------------+----------+
> | id        | string     |          |
> +-----------+------------+----------+
> 1 row selected (0.316 seconds)
> 0: jdbc:hive2://localhost.> drop table table_blocked;
> No rows affected (10.803 seconds)
> 0: jdbc:hive2://localhost.> desc table_blocked_mv;
> +-----------+------------+----------+
> | col_name  | data_type  | comment  |
> +-----------+------------+----------+
> | id        | string     |          |
> +-----------+------------+----------+
> 1 row selected (0.222 seconds)
> 0: jdbc:hive2://localhost.> desc table_blocked;
> Error: Error while compiling statement: FAILED: SemanticException Unable to 
> fetch table table_blocked. null (state=42000,code=4)
> 0: jdbc:hive2://localhost.> select * from table_blocked_mv;
> Error: Error while compiling statement: FAILED: SemanticException Table 
> ptest2_db_dev.table_blocked not found when trying to obtain it to check 
> masking/filtering policies (state=42000,code=4)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24512) Exclude calcite in packaging Hive

2020-12-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24512?focusedWorklogId=522468&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-522468
 ]

ASF GitHub Bot logged work on HIVE-24512:
-

Author: ASF GitHub Bot
Created on: 09/Dec/20 23:20
Start Date: 09/Dec/20 23:20
Worklog Time Spent: 10m 
  Work Description: viirya commented on pull request #1760:
URL: https://github.com/apache/hive/pull/1760#issuecomment-742125772


   Thanks @sunchao 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 522468)
Time Spent: 50m  (was: 40m)

> Exclude calcite in packaging Hive
> -
>
> Key: HIVE-24512
> URL: https://issues.apache.org/jira/browse/HIVE-24512
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.3.8
>Reporter: L. C. Hsieh
>Assignee: L. C. Hsieh
>Priority: Major
>  Labels: pull-request-available
> Fix For: 2.3.8
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> The issue is similar to HIVE-23593. In 2.3.8 RC, ql already has a shaded 
> calcite, but we see such error:
> Caused by: java.lang.NoSuchMethodError: 
> org.apache.calcite.rel.RelCollationImpl.<init>(Lorg/apache/hive/com/google/common/collect/ImmutableList;)V
>   at 
> org.apache.hadoop.hive.ql.optimizer.calcite.HiveRelCollation.<init>(HiveRelCollation.java:29)
>   at 
> org.apache.hadoop.hive.ql.optimizer.calcite.RelOptHiveTable.getCollationList(RelOptHiveTable.java:181)
>   at 
> org.apache.calcite.rel.metadata.RelMdCollation.table(RelMdCollation.java:175)
> We find in 2.3.8 binary distribution, there are calcite jars:
> calcite-core-1.10.0.jar
> calcite-druid-1.10.0.jar
> calcite-linq4j-1.10.0.jar



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24388) Enhance swo optimizations to merge EventOperators

2020-12-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24388?focusedWorklogId=522465&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-522465
 ]

ASF GitHub Bot logged work on HIVE-24388:
-

Author: ASF GitHub Bot
Created on: 09/Dec/20 23:08
Start Date: 09/Dec/20 23:08
Worklog Time Spent: 10m 
  Work Description: jcamachor commented on a change in pull request #1750:
URL: https://github.com/apache/hive/pull/1750#discussion_r539697350



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/SharedWorkOptimizer.java
##
@@ -17,6 +17,7 @@
  */
 package org.apache.hadoop.hive.ql.optimizer;
 
+import java.io.File;

Review comment:
   Needed?

##
File path: ql/src/test/results/clientpositive/llap/swo_event_merge.q.out
##
@@ -0,0 +1,291 @@
+PREHOOK: query: drop table if exists x1_store_sales
+PREHOOK: type: DROPTABLE
+POSTHOOK: query: drop table if exists x1_store_sales
+POSTHOOK: type: DROPTABLE
+PREHOOK: query: drop table if exists x1_date_dim
+PREHOOK: type: DROPTABLE
+POSTHOOK: query: drop table if exists x1_date_dim
+POSTHOOK: type: DROPTABLE
+PREHOOK: query: drop table if exists x1_item
+PREHOOK: type: DROPTABLE
+POSTHOOK: query: drop table if exists x1_item
+POSTHOOK: type: DROPTABLE
+PREHOOK: query: create table x1_store_sales 
+(
+   ss_item_sk  int
+)
+partitioned by (ss_sold_date_sk int)
+stored as orc
+PREHOOK: type: CREATETABLE
+PREHOOK: Output: database:default
+PREHOOK: Output: default@x1_store_sales
+POSTHOOK: query: create table x1_store_sales 
+(
+   ss_item_sk  int
+)
+partitioned by (ss_sold_date_sk int)
+stored as orc
+POSTHOOK: type: CREATETABLE
+POSTHOOK: Output: database:default
+POSTHOOK: Output: default@x1_store_sales
+PREHOOK: query: create table x1_date_dim
+(
+   d_date_sk   int,
+   d_month_seq int,
+   d_year  int,
+   d_moy   int
+)
+stored as orc
+PREHOOK: type: CREATETABLE
+PREHOOK: Output: database:default
+PREHOOK: Output: default@x1_date_dim
+POSTHOOK: query: create table x1_date_dim
+(
+   d_date_sk   int,
+   d_month_seq int,
+   d_year  int,
+   d_moy   int
+)
+stored as orc
+POSTHOOK: type: CREATETABLE
+POSTHOOK: Output: database:default
+POSTHOOK: Output: default@x1_date_dim
+PREHOOK: query: insert into x1_date_dim values (1,1,2000,2),
+   (2,2,2001,2)
+PREHOOK: type: QUERY
+PREHOOK: Input: _dummy_database@_dummy_table
+PREHOOK: Output: default@x1_date_dim
+POSTHOOK: query: insert into x1_date_dim values(1,1,2000,2),
+   (2,2,2001,2)
+POSTHOOK: type: QUERY
+POSTHOOK: Input: _dummy_database@_dummy_table
+POSTHOOK: Output: default@x1_date_dim
+POSTHOOK: Lineage: x1_date_dim.d_date_sk SCRIPT []
+POSTHOOK: Lineage: x1_date_dim.d_month_seq SCRIPT []
+POSTHOOK: Lineage: x1_date_dim.d_moy SCRIPT []
+POSTHOOK: Lineage: x1_date_dim.d_year SCRIPT []
+PREHOOK: query: insert into x1_store_sales partition (ss_sold_date_sk=1) 
values (1)
+PREHOOK: type: QUERY
+PREHOOK: Input: _dummy_database@_dummy_table
+PREHOOK: Output: default@x1_store_sales@ss_sold_date_sk=1
+POSTHOOK: query: insert into x1_store_sales partition (ss_sold_date_sk=1) 
values (1)
+POSTHOOK: type: QUERY
+POSTHOOK: Input: _dummy_database@_dummy_table
+POSTHOOK: Output: default@x1_store_sales@ss_sold_date_sk=1
+POSTHOOK: Lineage: x1_store_sales PARTITION(ss_sold_date_sk=1).ss_item_sk 
SCRIPT []
+PREHOOK: query: insert into x1_store_sales partition (ss_sold_date_sk=2) 
values (2)
+PREHOOK: type: QUERY
+PREHOOK: Input: _dummy_database@_dummy_table
+PREHOOK: Output: default@x1_store_sales@ss_sold_date_sk=2
+POSTHOOK: query: insert into x1_store_sales partition (ss_sold_date_sk=2) 
values (2)
+POSTHOOK: type: QUERY
+POSTHOOK: Input: _dummy_database@_dummy_table
+POSTHOOK: Output: default@x1_store_sales@ss_sold_date_sk=2
+POSTHOOK: Lineage: x1_store_sales PARTITION(ss_sold_date_sk=2).ss_item_sk 
SCRIPT []
+PREHOOK: query: alter table x1_store_sales partition (ss_sold_date_sk=1) 
update statistics set(
+'numRows'='123456',
+'rawDataSize'='1234567')
+PREHOOK: type: ALTERTABLE_UPDATEPARTSTATS
+PREHOOK: Input: default@x1_store_sales
+PREHOOK: Output: default@x1_store_sales@ss_sold_date_sk=1
+POSTHOOK: query: alter table x1_store_sales partition (ss_sold_date_sk=1) 
update statistics set(
+'numRows'='123456',
+'rawDataSize'='1234567')
+POSTHOOK: type: ALTERTABLE_UPDATEPARTSTATS
+POSTHOOK: Input: default@x1_store_sales
+POSTHOOK: Input: default@x1_store_sales@ss_sold_date_sk=1
+POSTHOOK: Output: default@x1_store_sales@ss_sold_date_sk=1
+PREHOOK: query: alter table x1_date_dim update statistics set(
+'numRows'='56',
+'rawDataSize'='81449')
+PREHOOK: type: ALTERTABLE_UPDATETABLESTATS
+PREHOOK: Input: default@x1_date_dim
+PREHOOK: Output: default@x1_date_dim
+POSTHOOK: query: alter table x1_date_dim update statistics set(

[jira] [Resolved] (HIVE-24512) Exclude calcite in packaging Hive

2020-12-09 Thread Chao Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun resolved HIVE-24512.
-
Fix Version/s: 2.3.8
 Hadoop Flags: Reviewed
 Assignee: L. C. Hsieh
   Resolution: Fixed

> Exclude calcite in packaging Hive
> -
>
> Key: HIVE-24512
> URL: https://issues.apache.org/jira/browse/HIVE-24512
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.3.8
>Reporter: L. C. Hsieh
>Assignee: L. C. Hsieh
>Priority: Major
>  Labels: pull-request-available
> Fix For: 2.3.8
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> The issue is similar to HIVE-23593. In 2.3.8 RC, ql already has a shaded 
> calcite, but we see such error:
> Caused by: java.lang.NoSuchMethodError: 
> org.apache.calcite.rel.RelCollationImpl.<init>(Lorg/apache/hive/com/google/common/collect/ImmutableList;)V
>   at 
> org.apache.hadoop.hive.ql.optimizer.calcite.HiveRelCollation.<init>(HiveRelCollation.java:29)
>   at 
> org.apache.hadoop.hive.ql.optimizer.calcite.RelOptHiveTable.getCollationList(RelOptHiveTable.java:181)
>   at 
> org.apache.calcite.rel.metadata.RelMdCollation.table(RelMdCollation.java:175)
> We find in 2.3.8 binary distribution, there are calcite jars:
> calcite-core-1.10.0.jar
> calcite-druid-1.10.0.jar
> calcite-linq4j-1.10.0.jar



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24512) Exclude calcite in packaging Hive

2020-12-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24512?focusedWorklogId=522452&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-522452
 ]

ASF GitHub Bot logged work on HIVE-24512:
-

Author: ASF GitHub Bot
Created on: 09/Dec/20 22:40
Start Date: 09/Dec/20 22:40
Worklog Time Spent: 10m 
  Work Description: sunchao commented on pull request #1760:
URL: https://github.com/apache/hive/pull/1760#issuecomment-742108787


   CI test run finished and looks good. Merged to branch-2.3 and branch-2. 
Thanks @viirya .



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 522452)
Time Spent: 40m  (was: 0.5h)

> Exclude calcite in packaging Hive
> -
>
> Key: HIVE-24512
> URL: https://issues.apache.org/jira/browse/HIVE-24512
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.3.8
>Reporter: L. C. Hsieh
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> The issue is similar to HIVE-23593. In 2.3.8 RC, ql already has a shaded 
> calcite, but we see such error:
> Caused by: java.lang.NoSuchMethodError: 
> org.apache.calcite.rel.RelCollationImpl.<init>(Lorg/apache/hive/com/google/common/collect/ImmutableList;)V
>   at 
> org.apache.hadoop.hive.ql.optimizer.calcite.HiveRelCollation.<init>(HiveRelCollation.java:29)
>   at 
> org.apache.hadoop.hive.ql.optimizer.calcite.RelOptHiveTable.getCollationList(RelOptHiveTable.java:181)
>   at 
> org.apache.calcite.rel.metadata.RelMdCollation.table(RelMdCollation.java:175)
> We find in 2.3.8 binary distribution, there are calcite jars:
> calcite-core-1.10.0.jar
> calcite-druid-1.10.0.jar
> calcite-linq4j-1.10.0.jar



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24512) Exclude calcite in packaging Hive

2020-12-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24512?focusedWorklogId=522451&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-522451
 ]

ASF GitHub Bot logged work on HIVE-24512:
-

Author: ASF GitHub Bot
Created on: 09/Dec/20 22:39
Start Date: 09/Dec/20 22:39
Worklog Time Spent: 10m 
  Work Description: sunchao merged pull request #1760:
URL: https://github.com/apache/hive/pull/1760


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 522451)
Time Spent: 0.5h  (was: 20m)

> Exclude calcite in packaging Hive
> -
>
> Key: HIVE-24512
> URL: https://issues.apache.org/jira/browse/HIVE-24512
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.3.8
>Reporter: L. C. Hsieh
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> The issue is similar to HIVE-23593. In 2.3.8 RC, ql already has a shaded 
> calcite, but we see such error:
> Caused by: java.lang.NoSuchMethodError: 
> org.apache.calcite.rel.RelCollationImpl.<init>(Lorg/apache/hive/com/google/common/collect/ImmutableList;)V
>   at 
> org.apache.hadoop.hive.ql.optimizer.calcite.HiveRelCollation.<init>(HiveRelCollation.java:29)
>   at 
> org.apache.hadoop.hive.ql.optimizer.calcite.RelOptHiveTable.getCollationList(RelOptHiveTable.java:181)
>   at 
> org.apache.calcite.rel.metadata.RelMdCollation.table(RelMdCollation.java:175)
> We find in 2.3.8 binary distribution, there are calcite jars:
> calcite-core-1.10.0.jar
> calcite-druid-1.10.0.jar
> calcite-linq4j-1.10.0.jar



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24484) Upgrade Hadoop to 3.3.0

2020-12-09 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor updated HIVE-24484:
--
Summary: Upgrade Hadoop to 3.3.0  (was: Upgrade Hadoop to 3.2.1)

> Upgrade Hadoop to 3.3.0
> ---
>
> Key: HIVE-24484
> URL: https://issues.apache.org/jira/browse/HIVE-24484
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24468) Use Event Time instead of Current Time in Notification Log DB Entry

2020-12-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24468?focusedWorklogId=522420&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-522420
 ]

ASF GitHub Bot logged work on HIVE-24468:
-

Author: ASF GitHub Bot
Created on: 09/Dec/20 20:22
Start Date: 09/Dec/20 20:22
Worklog Time Spent: 10m 
  Work Description: belugabehr opened a new pull request #1728:
URL: https://github.com/apache/hive/pull/1728


   …g DB Entry
   
   
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 522420)
Time Spent: 2h  (was: 1h 50m)

> Use Event Time instead of Current Time in Notification Log DB Entry
> ---
>
> Key: HIVE-24468
> URL: https://issues.apache.org/jira/browse/HIVE-24468
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24468) Use Event Time instead of Current Time in Notification Log DB Entry

2020-12-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24468?focusedWorklogId=522419&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-522419
 ]

ASF GitHub Bot logged work on HIVE-24468:
-

Author: ASF GitHub Bot
Created on: 09/Dec/20 20:21
Start Date: 09/Dec/20 20:21
Worklog Time Spent: 10m 
  Work Description: belugabehr closed pull request #1728:
URL: https://github.com/apache/hive/pull/1728


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 522419)
Time Spent: 1h 50m  (was: 1h 40m)

> Use Event Time instead of Current Time in Notification Log DB Entry
> ---
>
> Key: HIVE-24468
> URL: https://issues.apache.org/jira/browse/HIVE-24468
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24512) Exclude calcite in packaging Hive

2020-12-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24512?focusedWorklogId=522362&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-522362
 ]

ASF GitHub Bot logged work on HIVE-24512:
-

Author: ASF GitHub Bot
Created on: 09/Dec/20 17:42
Start Date: 09/Dec/20 17:42
Worklog Time Spent: 10m 
  Work Description: viirya commented on pull request #1760:
URL: https://github.com/apache/hive/pull/1760#issuecomment-741936967


   cc @sunchao 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 522362)
Time Spent: 20m  (was: 10m)

> Exclude calcite in packaging Hive
> -
>
> Key: HIVE-24512
> URL: https://issues.apache.org/jira/browse/HIVE-24512
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.3.8
>Reporter: L. C. Hsieh
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The issue is similar to HIVE-23593. In 2.3.8 RC, ql already has a shaded 
> calcite, but we see such error:
> Caused by: java.lang.NoSuchMethodError: 
> org.apache.calcite.rel.RelCollationImpl.<init>(Lorg/apache/hive/com/google/common/collect/ImmutableList;)V
>   at 
> org.apache.hadoop.hive.ql.optimizer.calcite.HiveRelCollation.<init>(HiveRelCollation.java:29)
>   at 
> org.apache.hadoop.hive.ql.optimizer.calcite.RelOptHiveTable.getCollationList(RelOptHiveTable.java:181)
>   at 
> org.apache.calcite.rel.metadata.RelMdCollation.table(RelMdCollation.java:175)
> We find in 2.3.8 binary distribution, there are calcite jars:
> calcite-core-1.10.0.jar
> calcite-druid-1.10.0.jar
> calcite-linq4j-1.10.0.jar



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24512) Exclude calcite in packaging Hive

2020-12-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24512?focusedWorklogId=522361&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-522361
 ]

ASF GitHub Bot logged work on HIVE-24512:
-

Author: ASF GitHub Bot
Created on: 09/Dec/20 17:42
Start Date: 09/Dec/20 17:42
Worklog Time Spent: 10m 
  Work Description: viirya opened a new pull request #1760:
URL: https://github.com/apache/hive/pull/1760


   
   
   ### What changes were proposed in this pull request?
   
   
   This proposes to exclude calcite in packaging to avoid conflicting with 
shaded calcite in ql.
   
   ### Why are the changes needed?
   
   
   The issue is similar to HIVE-23593. In 2.3.8 RC, ql already has a shaded 
calcite, but we see such error:
   
   Caused by: java.lang.NoSuchMethodError: 
org.apache.calcite.rel.RelCollationImpl.<init>(Lorg/apache/hive/com/google/common/collect/ImmutableList;)V
   at 
org.apache.hadoop.hive.ql.optimizer.calcite.HiveRelCollation.<init>(HiveRelCollation.java:29)
   at 
org.apache.hadoop.hive.ql.optimizer.calcite.RelOptHiveTable.getCollationList(RelOptHiveTable.java:181)
   at 
org.apache.calcite.rel.metadata.RelMdCollation.table(RelMdCollation.java:175)
   
   We find in 2.3.8 binary distribution, there are calcite jars:
   
   calcite-core-1.10.0.jar
   calcite-druid-1.10.0.jar
   calcite-linq4j-1.10.0.jar
   
   We need to exclude calcite in packaging.
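
   As a side note, one quick way to confirm which jar is actually supplying the conflicting class at runtime is a tiny diagnostic like the following (illustrative only, not part of the patch):

   {code:java}
   public class WhichCalcite {
     public static void main(String[] args) throws Exception {
       // Print the jar that provides RelCollationImpl on the current classpath.
       Class<?> c = Class.forName("org.apache.calcite.rel.RelCollationImpl");
       System.out.println(c.getProtectionDomain().getCodeSource().getLocation());
       // If this points at calcite-core-1.10.0.jar instead of the shaded
       // hive-exec jar, the unshaded calcite leaked into the packaging and the
       // shaded ql code fails with the NoSuchMethodError quoted above.
     }
   }
   {code}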
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   No
   
   ### How was this patch tested?
   
   
   Unit test.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 522361)
Remaining Estimate: 0h
Time Spent: 10m

> Exclude calcite in packaging Hive
> -
>
> Key: HIVE-24512
> URL: https://issues.apache.org/jira/browse/HIVE-24512
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.3.8
>Reporter: L. C. Hsieh
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The issue is similar to HIVE-23593. In 2.3.8 RC, ql already has a shaded 
> calcite, but we see such error:
> Caused by: java.lang.NoSuchMethodError: 
> org.apache.calcite.rel.RelCollationImpl.<init>(Lorg/apache/hive/com/google/common/collect/ImmutableList;)V
>   at 
> org.apache.hadoop.hive.ql.optimizer.calcite.HiveRelCollation.<init>(HiveRelCollation.java:29)
>   at 
> org.apache.hadoop.hive.ql.optimizer.calcite.RelOptHiveTable.getCollationList(RelOptHiveTable.java:181)
>   at 
> org.apache.calcite.rel.metadata.RelMdCollation.table(RelMdCollation.java:175)
> We find in 2.3.8 binary distribution, there are calcite jars:
> calcite-core-1.10.0.jar
> calcite-druid-1.10.0.jar
> calcite-linq4j-1.10.0.jar



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24512) Exclude calcite in packaging Hive

2020-12-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-24512:
--
Labels: pull-request-available  (was: )

> Exclude calcite in packaging Hive
> -
>
> Key: HIVE-24512
> URL: https://issues.apache.org/jira/browse/HIVE-24512
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.3.8
>Reporter: L. C. Hsieh
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The issue is similar to HIVE-23593. In 2.3.8 RC, ql already has a shaded 
> calcite, but we see such error:
> Caused by: java.lang.NoSuchMethodError: 
> org.apache.calcite.rel.RelCollationImpl.<init>(Lorg/apache/hive/com/google/common/collect/ImmutableList;)V
>   at 
> org.apache.hadoop.hive.ql.optimizer.calcite.HiveRelCollation.<init>(HiveRelCollation.java:29)
>   at 
> org.apache.hadoop.hive.ql.optimizer.calcite.RelOptHiveTable.getCollationList(RelOptHiveTable.java:181)
>   at 
> org.apache.calcite.rel.metadata.RelMdCollation.table(RelMdCollation.java:175)
> We find in 2.3.8 binary distribution, there are calcite jars:
> calcite-core-1.10.0.jar
> calcite-druid-1.10.0.jar
> calcite-linq4j-1.10.0.jar



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24512) Exclude calcite in packaging Hive

2020-12-09 Thread L. C. Hsieh (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

L. C. Hsieh updated HIVE-24512:
---
Affects Version/s: 2.3.8

> Exclude calcite in packaging Hive
> -
>
> Key: HIVE-24512
> URL: https://issues.apache.org/jira/browse/HIVE-24512
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.3.8
>Reporter: L. C. Hsieh
>Priority: Major
>
> The issue is similar to HIVE-23593. In 2.3.8 RC, ql already has a shaded 
> calcite, but we see such error:
> Caused by: java.lang.NoSuchMethodError: 
> org.apache.calcite.rel.RelCollationImpl.<init>(Lorg/apache/hive/com/google/common/collect/ImmutableList;)V
>   at 
> org.apache.hadoop.hive.ql.optimizer.calcite.HiveRelCollation.<init>(HiveRelCollation.java:29)
>   at 
> org.apache.hadoop.hive.ql.optimizer.calcite.RelOptHiveTable.getCollationList(RelOptHiveTable.java:181)
>   at 
> org.apache.calcite.rel.metadata.RelMdCollation.table(RelMdCollation.java:175)
> We find in 2.3.8 binary distribution, there are calcite jars:
> calcite-core-1.10.0.jar
> calcite-druid-1.10.0.jar
> calcite-linq4j-1.10.0.jar



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24512) Exclude calcite in packaging Hive

2020-12-09 Thread L. C. Hsieh (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

L. C. Hsieh updated HIVE-24512:
---
Description: 
The issue is similar to HIVE-23593. In 2.3.8 RC, ql already has a shaded 
calcite, but we see such error:

Caused by: java.lang.NoSuchMethodError: 
org.apache.calcite.rel.RelCollationImpl.<init>(Lorg/apache/hive/com/google/common/collect/ImmutableList;)V
at 
org.apache.hadoop.hive.ql.optimizer.calcite.HiveRelCollation.<init>(HiveRelCollation.java:29)
at 
org.apache.hadoop.hive.ql.optimizer.calcite.RelOptHiveTable.getCollationList(RelOptHiveTable.java:181)
at 
org.apache.calcite.rel.metadata.RelMdCollation.table(RelMdCollation.java:175)

We find in 2.3.8 binary distribution, there are calcite jars:

calcite-core-1.10.0.jar
calcite-druid-1.10.0.jar
calcite-linq4j-1.10.0.jar



> Exclude calcite in packaging Hive
> -
>
> Key: HIVE-24512
> URL: https://issues.apache.org/jira/browse/HIVE-24512
> Project: Hive
>  Issue Type: Bug
>Reporter: L. C. Hsieh
>Priority: Major
>
> The issue is similar to HIVE-23593. In 2.3.8 RC, ql already has a shaded 
> calcite, but we see such error:
> Caused by: java.lang.NoSuchMethodError: 
> org.apache.calcite.rel.RelCollationImpl.<init>(Lorg/apache/hive/com/google/common/collect/ImmutableList;)V
>   at 
> org.apache.hadoop.hive.ql.optimizer.calcite.HiveRelCollation.<init>(HiveRelCollation.java:29)
>   at 
> org.apache.hadoop.hive.ql.optimizer.calcite.RelOptHiveTable.getCollationList(RelOptHiveTable.java:181)
>   at 
> org.apache.calcite.rel.metadata.RelMdCollation.table(RelMdCollation.java:175)
> We find in 2.3.8 binary distribution, there are calcite jars:
> calcite-core-1.10.0.jar
> calcite-druid-1.10.0.jar
> calcite-linq4j-1.10.0.jar



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24497) Node heartbeats from LLAP Daemon to the client are not matching leading to timeout in cloud environment

2020-12-09 Thread Simhadri G (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simhadri G updated HIVE-24497:
--
Summary: Node heartbeats from LLAP Daemon to the client are not matching 
leading to timeout in cloud environment  (was: Node heartbeats from LLAP Daemon 
to the client are not matching leading to timeout.)

> Node heartbeats from LLAP Daemon to the client are not matching leading to 
> timeout in cloud environment
> ---
>
> Key: HIVE-24497
> URL: https://issues.apache.org/jira/browse/HIVE-24497
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Simhadri G
>Assignee: Simhadri G
>Priority: Minor
>  Labels: pull-request-available
> Attachments: hive-24497.01.patch
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> A node heartbeat contains info about all the tasks that were submitted to that 
> LLAP Daemon. In cloud deployments, the client is not able to match these 
> heartbeats due to differences in hostname and port.
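
A toy illustration of the mismatch (the keying scheme and all names are hypothetical, not the actual LLAP protocol):

{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class HeartbeatMatchSketch {
  public static void main(String[] args) {
    // The client registers the task under the endpoint it used to submit it.
    Map<String, String> tasksByEndpoint = new ConcurrentHashMap<>();
    tasksByEndpoint.put("llap-0.internal.example.com:15001", "attempt_1");

    // In a cloud setup the daemon may report the address it sees locally, so
    // the heartbeat key never matches and the task eventually times out.
    String heartbeatKey = "10.0.0.7:15001";
    System.out.println(tasksByEndpoint.containsKey(heartbeatKey)); // false

    // Keying on a stable, location-independent node identifier avoids this.
    Map<String, String> tasksByNodeId = new ConcurrentHashMap<>();
    tasksByNodeId.put("node-uuid-42", "attempt_1");
    System.out.println(tasksByNodeId.containsKey("node-uuid-42")); // true
  }
}
{code}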



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24504) VectorFileSinkArrowOperator does not serialize complex types correctly

2020-12-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24504?focusedWorklogId=522260&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-522260
 ]

ASF GitHub Bot logged work on HIVE-24504:
-

Author: ASF GitHub Bot
Created on: 09/Dec/20 14:23
Start Date: 09/Dec/20 14:23
Worklog Time Spent: 10m 
  Work Description: pgaref commented on a change in pull request #1758:
URL: https://github.com/apache/hive/pull/1758#discussion_r539346886



##
File path: ql/src/test/org/apache/hadoop/hive/ql/io/arrow/TestSerializer.java
##
@@ -0,0 +1,72 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.io.arrow;
+
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.serde2.typeinfo.TypeInfo;
+import org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils;
+import org.junit.Assert;
+import org.junit.Test;
+
+import java.util.Arrays;
+import java.util.List;
+
+public class TestSerializer {
+  @Test
+  public void testEmptyArray() {

Review comment:
   > The name was based on the Hive type, but I think both make sense, so 
renamed.
   
   Can't disagree with that :D 





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 522260)
Time Spent: 50m  (was: 40m)

> VectorFileSinkArrowOperator does not serialize complex types correctly
> --
>
> Key: HIVE-24504
> URL: https://issues.apache.org/jira/browse/HIVE-24504
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> When the table has complex types and the result has 0 records, the 
> VectorFileSinkArrowOperator only serializes the primitive types correctly. 
> For complex types only the main type is set, which causes issues for clients 
> trying to read data.
> Got the following HWC exception:
> {code:java}
> Previous exception in task: Unsupported data type: Null
>   
> org.apache.spark.sql.execution.arrow.ArrowUtils$.fromArrowType(ArrowUtils.scala:71)
>   
> org.apache.spark.sql.execution.arrow.ArrowUtils$.fromArrowField(ArrowUtils.scala:106)
>   
> org.apache.spark.sql.execution.arrow.ArrowUtils$.fromArrowField(ArrowUtils.scala:98)
>   
> org.apache.spark.sql.execution.arrow.ArrowUtils.fromArrowField(ArrowUtils.scala)
>   
> org.apache.spark.sql.vectorized.ArrowColumnVector.<init>(ArrowColumnVector.java:135)
>   
> com.hortonworks.spark.sql.hive.llap.HiveWarehouseDataReader.get(HiveWarehouseDataReader.java:105)
>   
> com.hortonworks.spark.sql.hive.llap.HiveWarehouseDataReader.get(HiveWarehouseDataReader.java:29)
>   
> org.apache.spark.sql.execution.datasources.v2.DataSourceRDD$$anon$1.next(DataSourceRDD.scala:59)
>   
> org.apache.spark.InterruptibleIterator.next(InterruptibleIterator.scala:40)
>   
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.datasourcev2scan_nextBatch_0$(Unknown
>  Source)
>   
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown
>  Source)
>   
> org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
>   
> org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$10$$anon$1.hasNext(WholeStageCodegenExec.scala:614)
>   
> org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:253)
>   
> org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:247)
>   
> 

[jira] [Work logged] (HIVE-24504) VectorFileSinkArrowOperator does not serialize complex types correctly

2020-12-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24504?focusedWorklogId=522259&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-522259
 ]

ASF GitHub Bot logged work on HIVE-24504:
-

Author: ASF GitHub Bot
Created on: 09/Dec/20 14:10
Start Date: 09/Dec/20 14:10
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #1758:
URL: https://github.com/apache/hive/pull/1758#discussion_r539336699



##
File path: ql/src/test/org/apache/hadoop/hive/ql/io/arrow/TestSerializer.java
##
@@ -0,0 +1,72 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.io.arrow;
+
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.serde2.typeinfo.TypeInfo;
+import org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils;
+import org.junit.Assert;
+import org.junit.Test;
+
+import java.util.Arrays;
+import java.util.List;
+
+public class TestSerializer {
+  @Test
+  public void testEmptyArray() {
+    List<TypeInfo> typeInfos = TypeInfoUtils.getTypeInfosFromTypeString("array");
+    List<String> fieldNames = Arrays.asList(new String[]{"a"});
+    Serializer converter = new Serializer(new HiveConf(), "attemptId", typeInfos, fieldNames);
+    ArrowWrapperWritable writable = converter.emptyBatch();
+    Assert.assertEquals("Schema>",
+        writable.getVectorSchemaRoot().getSchema().toString());
+  }
+
+  @Test
+  public void testEmptyStruct() {
+    List<TypeInfo> typeInfos = TypeInfoUtils.getTypeInfosFromTypeString("struct");
+    List<String> fieldNames = Arrays.asList(new String[] { "a" });
+    Serializer converter = new Serializer(new HiveConf(), "attemptId", typeInfos, fieldNames);
+    ArrowWrapperWritable writable = converter.emptyBatch();
+    Assert.assertEquals("Schema>",
+        writable.getVectorSchemaRoot().getSchema().toString());
+  }
+
+  @Test
+  public void testEmptyMap() {
+    List<TypeInfo> typeInfos = TypeInfoUtils.getTypeInfosFromTypeString("map");
+    List<String> fieldNames = Arrays.asList(new String[] { "a" });
+    Serializer converter = new Serializer(new HiveConf(), "attemptId", typeInfos, fieldNames);
+    ArrowWrapperWritable writable = converter.emptyBatch();
+    Assert.assertEquals("Schema>>",
+        writable.getVectorSchemaRoot().getSchema().toString());
+  }
+
+  @Test
+  public void testEmptyComplexStruct() {
+    List<TypeInfo> typeInfos = TypeInfoUtils.getTypeInfosFromTypeString(
+        "struct,c:map,d:struct,f:map>>");
+    List<String> fieldNames = Arrays.asList(new String[] { "a" });
+    Serializer converter = new Serializer(new HiveConf(), "attemptId", typeInfos, fieldNames);
+    ArrowWrapperWritable writable = converter.emptyBatch();
+    Assert.assertEquals(
+        "Schema, c: List<$data$: Struct>, " +
+        "d: Struct, f: List<$data$: Struct",
+        writable.getVectorSchemaRoot().getSchema().toString());
+  }

Review comment:
   Added some more tests to cover every nested type at least once.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 522259)
Time Spent: 40m  (was: 0.5h)

> VectorFileSinkArrowOperator does not serialize complex types correctly
> --
>
> Key: HIVE-24504
> URL: https://issues.apache.org/jira/browse/HIVE-24504
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> When the table has complex types and the result has 0 records, the 
> VectorFileSinkArrowOperator only serializes the primitive types correctly. 
> For complex types only the main type is set, which causes 

[jira] [Work logged] (HIVE-24504) VectorFileSinkArrowOperator does not serialize complex types correctly

2020-12-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24504?focusedWorklogId=522258&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-522258
 ]

ASF GitHub Bot logged work on HIVE-24504:
-

Author: ASF GitHub Bot
Created on: 09/Dec/20 14:09
Start Date: 09/Dec/20 14:09
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #1758:
URL: https://github.com/apache/hive/pull/1758#discussion_r539336144



##
File path: ql/src/test/org/apache/hadoop/hive/ql/io/arrow/TestSerializer.java
##
@@ -0,0 +1,72 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.io.arrow;
+
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.serde2.typeinfo.TypeInfo;
+import org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils;
+import org.junit.Assert;
+import org.junit.Test;
+
+import java.util.Arrays;
+import java.util.List;
+
+public class TestSerializer {
+  @Test
+  public void testEmptyArray() {

Review comment:
   The name was based on the Hive type, but I think both make sense, so 
renamed.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 522258)
Time Spent: 0.5h  (was: 20m)

> VectorFileSinkArrowOperator does not serialize complex types correctly
> --
>
> Key: HIVE-24504
> URL: https://issues.apache.org/jira/browse/HIVE-24504
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> When the table has complex types and the result has 0 records, the 
> VectorFileSinkArrowOperator only serializes the primitive types correctly. 
> For complex types only the main type is set, which causes issues for clients 
> trying to read data.
> Got the following HWC exception:
> {code:java}
> Previous exception in task: Unsupported data type: Null
>   
> org.apache.spark.sql.execution.arrow.ArrowUtils$.fromArrowType(ArrowUtils.scala:71)
>   
> org.apache.spark.sql.execution.arrow.ArrowUtils$.fromArrowField(ArrowUtils.scala:106)
>   
> org.apache.spark.sql.execution.arrow.ArrowUtils$.fromArrowField(ArrowUtils.scala:98)
>   
> org.apache.spark.sql.execution.arrow.ArrowUtils.fromArrowField(ArrowUtils.scala)
>   
> org.apache.spark.sql.vectorized.ArrowColumnVector.(ArrowColumnVector.java:135)
>   
> com.hortonworks.spark.sql.hive.llap.HiveWarehouseDataReader.get(HiveWarehouseDataReader.java:105)
>   
> com.hortonworks.spark.sql.hive.llap.HiveWarehouseDataReader.get(HiveWarehouseDataReader.java:29)
>   
> org.apache.spark.sql.execution.datasources.v2.DataSourceRDD$$anon$1.next(DataSourceRDD.scala:59)
>   
> org.apache.spark.InterruptibleIterator.next(InterruptibleIterator.scala:40)
>   
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.datasourcev2scan_nextBatch_0$(Unknown
>  Source)
>   
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown
>  Source)
>   
> org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
>   
> org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$10$$anon$1.hasNext(WholeStageCodegenExec.scala:614)
>   
> org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:253)
>   
> org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:247)
>   
> org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:836)
>   
> 

[jira] [Work started] (HIVE-24502) Store table level regular expression used during dump for table level replication

2020-12-09 Thread Aasha Medhi (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-24502 started by Aasha Medhi.
--
> Store table level regular expression used during dump for table level 
> replication
> -
>
> Key: HIVE-24502
> URL: https://issues.apache.org/jira/browse/HIVE-24502
> Project: Hive
>  Issue Type: Task
>Reporter: Aasha Medhi
>Assignee: Aasha Medhi
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-24502.01.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Store include table list and exclude table list as part of the dump metadata file



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24502) Store table level regular expression used during dump for table level replication

2020-12-09 Thread Aasha Medhi (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aasha Medhi updated HIVE-24502:
---
Attachment: HIVE-24502.01.patch
Status: Patch Available  (was: In Progress)

> Store table level regular expression used during dump for table level 
> replication
> -
>
> Key: HIVE-24502
> URL: https://issues.apache.org/jira/browse/HIVE-24502
> Project: Hive
>  Issue Type: Task
>Reporter: Aasha Medhi
>Assignee: Aasha Medhi
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-24502.01.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Store include table list and exclude table list as part of the dump metadata file



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24502) Store table level regular expression used during dump for table level replication

2020-12-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-24502:
--
Labels: pull-request-available  (was: )

> Store table level regular expression used during dump for table level 
> replication
> -
>
> Key: HIVE-24502
> URL: https://issues.apache.org/jira/browse/HIVE-24502
> Project: Hive
>  Issue Type: Task
>Reporter: Aasha Medhi
>Assignee: Aasha Medhi
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24502) Store table level regular expression used during dump for table level replication

2020-12-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24502?focusedWorklogId=522208&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-522208
 ]

ASF GitHub Bot logged work on HIVE-24502:
-

Author: ASF GitHub Bot
Created on: 09/Dec/20 11:59
Start Date: 09/Dec/20 11:59
Worklog Time Spent: 10m 
  Work Description: aasha opened a new pull request #1759:
URL: https://github.com/apache/hive/pull/1759


   …r table level replication
   
   
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 522208)
Remaining Estimate: 0h
Time Spent: 10m

> Store table level regular expression used during dump for table level 
> replication
> -
>
> Key: HIVE-24502
> URL: https://issues.apache.org/jira/browse/HIVE-24502
> Project: Hive
>  Issue Type: Task
>Reporter: Aasha Medhi
>Assignee: Aasha Medhi
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24502) Store table level regular expression used during dump for table level replication

2020-12-09 Thread Aasha Medhi (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aasha Medhi updated HIVE-24502:
---
Description: Store include table list and exclude table list as part of the 
dump metadata file
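
Illustrative only: a tiny sketch of what persisting the patterns alongside a dump could look like (file name, format, and patterns are hypothetical, not the actual replication code):

{code:java}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

public class DumpMetadataSketch {
  public static void main(String[] args) throws IOException {
    Path dumpDir = Files.createTempDirectory("repl_dump");
    Path meta = dumpDir.resolve("_tablelist");
    // Record the table-level regular expressions used for this dump so a later
    // incremental dump can detect that the replication policy changed.
    Files.write(meta, Arrays.asList("include=sales.*|orders.*", "exclude=.*_tmp"));

    List<String> stored = Files.readAllLines(meta);
    System.out.println(stored); // [include=sales.*|orders.*, exclude=.*_tmp]
  }
}
{code}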

> Store table level regular expression used during dump for table level 
> replication
> -
>
> Key: HIVE-24502
> URL: https://issues.apache.org/jira/browse/HIVE-24502
> Project: Hive
>  Issue Type: Task
>Reporter: Aasha Medhi
>Assignee: Aasha Medhi
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Store include table list and exclude table list as part of the dump metadata file



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24504) VectorFileSinkArrowOperator does not serialize complex types correctly

2020-12-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24504?focusedWorklogId=522190&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-522190
 ]

ASF GitHub Bot logged work on HIVE-24504:
-

Author: ASF GitHub Bot
Created on: 09/Dec/20 11:35
Start Date: 09/Dec/20 11:35
Worklog Time Spent: 10m 
  Work Description: pgaref commented on a change in pull request #1758:
URL: https://github.com/apache/hive/pull/1758#discussion_r539228933



##
File path: ql/src/test/org/apache/hadoop/hive/ql/io/arrow/TestSerializer.java
##
@@ -0,0 +1,72 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.io.arrow;
+
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.serde2.typeinfo.TypeInfo;
+import org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils;
+import org.junit.Assert;
+import org.junit.Test;
+
+import java.util.Arrays;
+import java.util.List;
+
+public class TestSerializer {
+  @Test
+  public void testEmptyArray() {

Review comment:
   Nit: Would probably name it testEmptyList for consistency with the 
Serializer

##
File path: ql/src/test/org/apache/hadoop/hive/ql/io/arrow/TestSerializer.java
##
@@ -0,0 +1,72 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.io.arrow;
+
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.serde2.typeinfo.TypeInfo;
+import org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils;
+import org.junit.Assert;
+import org.junit.Test;
+
+import java.util.Arrays;
+import java.util.List;
+
+public class TestSerializer {
+  @Test
+  public void testEmptyArray() {
+    List<TypeInfo> typeInfos = TypeInfoUtils.getTypeInfosFromTypeString("array");
+    List<String> fieldNames = Arrays.asList(new String[]{"a"});
+    Serializer converter = new Serializer(new HiveConf(), "attemptId", typeInfos, fieldNames);
+    ArrowWrapperWritable writable = converter.emptyBatch();
+    Assert.assertEquals("Schema>",
+        writable.getVectorSchemaRoot().getSchema().toString());
+  }
+
+  @Test
+  public void testEmptyStruct() {
+    List<TypeInfo> typeInfos = TypeInfoUtils.getTypeInfosFromTypeString("struct");
+    List<String> fieldNames = Arrays.asList(new String[] { "a" });
+    Serializer converter = new Serializer(new HiveConf(), "attemptId", typeInfos, fieldNames);
+    ArrowWrapperWritable writable = converter.emptyBatch();
+    Assert.assertEquals("Schema>",
+        writable.getVectorSchemaRoot().getSchema().toString());
+  }
+
+  @Test
+  public void testEmptyMap() {
+    List<TypeInfo> typeInfos = TypeInfoUtils.getTypeInfosFromTypeString("map");
+    List<String> fieldNames = Arrays.asList(new String[] { "a" });
+    Serializer converter = new Serializer(new HiveConf(), "attemptId", typeInfos, fieldNames);
+    ArrowWrapperWritable writable = converter.emptyBatch();
+    Assert.assertEquals("Schema>>",
+        writable.getVectorSchemaRoot().getSchema().toString());
+  }
+
+  @Test
+  public void testEmptyComplexStruct() {
+    List<TypeInfo> typeInfos = TypeInfoUtils.getTypeInfosFromTypeString(
+        "struct,c:map,d:struct,f:map>>");
+    List<String> fieldNames = Arrays.asList(new String[] { "a" });
+    Serializer converter = new Serializer(new HiveConf(), "attemptId", typeInfos, fieldNames);
+    ArrowWrapperWritable writable = converter.emptyBatch();
+    Assert.assertEquals(
+

[jira] [Work logged] (HIVE-24468) Use Event Time instead of Current Time in Notification Log DB Entry

2020-12-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24468?focusedWorklogId=522186&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-522186
 ]

ASF GitHub Bot logged work on HIVE-24468:
-

Author: ASF GitHub Bot
Created on: 09/Dec/20 11:26
Start Date: 09/Dec/20 11:26
Worklog Time Spent: 10m 
  Work Description: aasha commented on pull request #1728:
URL: https://github.com/apache/hive/pull/1728#issuecomment-741711210


   +1



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 522186)
Time Spent: 1h 40m  (was: 1.5h)

> Use Event Time instead of Current Time in Notification Log DB Entry
> ---
>
> Key: HIVE-24468
> URL: https://issues.apache.org/jira/browse/HIVE-24468
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24504) VectorFileSinkArrowOperator does not serialize complex types correctly

2020-12-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24504?focusedWorklogId=522185=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-522185
 ]

ASF GitHub Bot logged work on HIVE-24504:
-

Author: ASF GitHub Bot
Created on: 09/Dec/20 11:24
Start Date: 09/Dec/20 11:24
Worklog Time Spent: 10m 
  Work Description: pvary opened a new pull request #1758:
URL: https://github.com/apache/hive/pull/1758


   
   ### What changes were proposed in this pull request?
   Use an empty batch to generate the schema for the empty results
   
   ### Why are the changes needed?
   Clients expect the full schema even for empty results
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   ### How was this patch tested?
   Unit and other test
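
   For context, a minimal sketch of the idea using the Arrow Java API directly
   (not Hive's Serializer; the field names and the Int element type below are
   illustrative): a zero-row VectorSchemaRoot still carries the fully typed
   nested fields, which is what clients need in order to construct readers
   instead of seeing a bare Null type.

{code:java}
import java.util.Arrays;

import org.apache.arrow.memory.BufferAllocator;
import org.apache.arrow.memory.RootAllocator;
import org.apache.arrow.vector.VectorSchemaRoot;
import org.apache.arrow.vector.types.Types.MinorType;
import org.apache.arrow.vector.types.pojo.ArrowType;
import org.apache.arrow.vector.types.pojo.Field;
import org.apache.arrow.vector.types.pojo.FieldType;
import org.apache.arrow.vector.types.pojo.Schema;

public class EmptyBatchSchemaDemo {
  public static void main(String[] args) {
    // An array<int> column: the child element field must be present even
    // when there are no rows, otherwise readers cannot resolve the type.
    Field element = Field.nullable("element", MinorType.INT.getType());
    Field listCol = new Field("a", FieldType.nullable(ArrowType.List.INSTANCE),
        Arrays.asList(element));
    Schema schema = new Schema(Arrays.asList(listCol));

    try (BufferAllocator allocator = new RootAllocator(Long.MAX_VALUE);
         VectorSchemaRoot root = VectorSchemaRoot.create(schema, allocator)) {
      root.setRowCount(0); // zero rows, but the nested schema survives
      System.out.println(root.getSchema()); // prints the full list<int> schema
    }
  }
}
{code}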



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 522185)
Remaining Estimate: 0h
Time Spent: 10m

> VectorFileSinkArrowOperator does not serialize complex types correctly
> --
>
> Key: HIVE-24504
> URL: https://issues.apache.org/jira/browse/HIVE-24504
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> When the table has complex types and the result has 0 records, the 
> VectorFileSinkArrowOperator serializes only the primitive types correctly. 
> For complex types only the main type is set, which causes issues for clients 
> trying to read the data.
> Got the following HWC exception:
> {code:java}
> Previous exception in task: Unsupported data type: Null
>   
> org.apache.spark.sql.execution.arrow.ArrowUtils$.fromArrowType(ArrowUtils.scala:71)
>   
> org.apache.spark.sql.execution.arrow.ArrowUtils$.fromArrowField(ArrowUtils.scala:106)
>   
> org.apache.spark.sql.execution.arrow.ArrowUtils$.fromArrowField(ArrowUtils.scala:98)
>   
> org.apache.spark.sql.execution.arrow.ArrowUtils.fromArrowField(ArrowUtils.scala)
>   
> org.apache.spark.sql.vectorized.ArrowColumnVector.<init>(ArrowColumnVector.java:135)
>   
> com.hortonworks.spark.sql.hive.llap.HiveWarehouseDataReader.get(HiveWarehouseDataReader.java:105)
>   
> com.hortonworks.spark.sql.hive.llap.HiveWarehouseDataReader.get(HiveWarehouseDataReader.java:29)
>   
> org.apache.spark.sql.execution.datasources.v2.DataSourceRDD$$anon$1.next(DataSourceRDD.scala:59)
>   
> org.apache.spark.InterruptibleIterator.next(InterruptibleIterator.scala:40)
>   
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.datasourcev2scan_nextBatch_0$(Unknown
>  Source)
>   
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown
>  Source)
>   
> org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
>   
> org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$10$$anon$1.hasNext(WholeStageCodegenExec.scala:614)
>   
> org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:253)
>   
> org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:247)
>   
> org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:836)
>   
> org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:836)
>   org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:49)
>   org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
>   org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
>   org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:49)
>   org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
>   org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
>   org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
>   org.apache.spark.scheduler.Task.run(Task.scala:109)
>   org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
>   
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   java.lang.Thread.run(Thread.java:745)
>   at 
> org.apache.spark.TaskContextImpl.invokeListeners(TaskContextImpl.scala:139)
>   at 
> org.apache.spark.TaskContextImpl.markTaskCompleted(TaskContextImpl.scala:117)
>   

[jira] [Updated] (HIVE-24504) VectorFileSinkArrowOperator does not serialize complex types correctly

2020-12-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-24504:
--
Labels: pull-request-available  (was: )

> VectorFileSinkArrowOperator does not serialize complex types correctly
> --
>
> Key: HIVE-24504
> URL: https://issues.apache.org/jira/browse/HIVE-24504
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> When the table has complex types and the result has 0 records, the 
> VectorFileSinkArrowOperator serializes only the primitive types correctly. 
> For complex types only the main type is set, which causes issues for clients 
> trying to read the data.
> Got the following HWC exception:
> {code:java}
> Previous exception in task: Unsupported data type: Null
>   
> org.apache.spark.sql.execution.arrow.ArrowUtils$.fromArrowType(ArrowUtils.scala:71)
>   
> org.apache.spark.sql.execution.arrow.ArrowUtils$.fromArrowField(ArrowUtils.scala:106)
>   
> org.apache.spark.sql.execution.arrow.ArrowUtils$.fromArrowField(ArrowUtils.scala:98)
>   
> org.apache.spark.sql.execution.arrow.ArrowUtils.fromArrowField(ArrowUtils.scala)
>   
> org.apache.spark.sql.vectorized.ArrowColumnVector.<init>(ArrowColumnVector.java:135)
>   
> com.hortonworks.spark.sql.hive.llap.HiveWarehouseDataReader.get(HiveWarehouseDataReader.java:105)
>   
> com.hortonworks.spark.sql.hive.llap.HiveWarehouseDataReader.get(HiveWarehouseDataReader.java:29)
>   
> org.apache.spark.sql.execution.datasources.v2.DataSourceRDD$$anon$1.next(DataSourceRDD.scala:59)
>   
> org.apache.spark.InterruptibleIterator.next(InterruptibleIterator.scala:40)
>   
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.datasourcev2scan_nextBatch_0$(Unknown
>  Source)
>   
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown
>  Source)
>   
> org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
>   
> org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$10$$anon$1.hasNext(WholeStageCodegenExec.scala:614)
>   
> org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:253)
>   
> org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:247)
>   
> org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:836)
>   
> org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:836)
>   org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:49)
>   org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
>   org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
>   org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:49)
>   org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
>   org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
>   org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
>   org.apache.spark.scheduler.Task.run(Task.scala:109)
>   org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
>   
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   java.lang.Thread.run(Thread.java:745)
>   at 
> org.apache.spark.TaskContextImpl.invokeListeners(TaskContextImpl.scala:139)
>   at 
> org.apache.spark.TaskContextImpl.markTaskCompleted(TaskContextImpl.scala:117)
>   at org.apache.spark.scheduler.Task.run(Task.scala:119)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745) {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24511) Fix typo in SerDeStorageSchemaReader

2020-12-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24511?focusedWorklogId=522182=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-522182
 ]

ASF GitHub Bot logged work on HIVE-24511:
-

Author: ASF GitHub Bot
Created on: 09/Dec/20 11:14
Start Date: 09/Dec/20 11:14
Worklog Time Spent: 10m 
  Work Description: dengzhhu653 opened a new pull request #1757:
URL: https://github.com/apache/hive/pull/1757


   
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 522182)
Remaining Estimate: 0h
Time Spent: 10m

> Fix typo in SerDeStorageSchemaReader
> 
>
> Key: HIVE-24511
> URL: https://issues.apache.org/jira/browse/HIVE-24511
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore
>Reporter: Zhihua Deng
>Priority: Minor
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> 1. Close the created classloader to release resources.
> 2. Provide more detailed error messages when throwing MetaException.
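
A hedged sketch of the classloader-closing pattern point 1 refers to
(illustrative code, not the actual SerDeStorageSchemaReader change):
URLClassLoader implements Closeable since Java 7, so a loader created for a
one-off lookup can be released with try-with-resources.

{code:java}
import java.io.IOException;
import java.net.URL;
import java.net.URLClassLoader;

public class ClassLoaderCloseDemo {
  // Loads a class from extra jars, then closes the loader so its open jar
  // file handles are released even if loadClass throws.
  static Class<?> loadWithTemporaryLoader(URL[] jars, String className)
      throws IOException, ClassNotFoundException {
    try (URLClassLoader loader =
        new URLClassLoader(jars, Thread.currentThread().getContextClassLoader())) {
      return loader.loadClass(className);
    }
  }
}
{code}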



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24511) Fix typo in SerDeStorageSchemaReader

2020-12-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-24511:
--
Labels: pull-request-available  (was: )

> Fix typo in SerDeStorageSchemaReader
> 
>
> Key: HIVE-24511
> URL: https://issues.apache.org/jira/browse/HIVE-24511
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore
>Reporter: Zhihua Deng
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> 1. Close the created classloader to release resources.
> 2. Provide more detailed error messages when throwing MetaException.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24207) LimitOperator can leverage ObjectCache to bail out quickly

2020-12-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24207?focusedWorklogId=522164=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-522164
 ]

ASF GitHub Bot logged work on HIVE-24207:
-

Author: ASF GitHub Bot
Created on: 09/Dec/20 10:12
Start Date: 09/Dec/20 10:12
Worklog Time Spent: 10m 
  Work Description: rbalamohan commented on a change in pull request #1556:
URL: https://github.com/apache/hive/pull/1556#discussion_r539176536



##
File path: ql/src/test/queries/clientpositive/authorization_view_1.q
##
@@ -1,5 +1,6 @@
 --! qt:dataset:src
 set hive.security.authorization.manager=org.apache.hadoop.hive.ql.security.authorization.DefaultHiveAuthorizationProvider;
+set hive.exec.reducers.max=1;

Review comment:
   Any reason for changing this?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 522164)
Time Spent: 0.5h  (was: 20m)

> LimitOperator can leverage ObjectCache to bail out quickly
> --
>
> Key: HIVE-24207
> URL: https://issues.apache.org/jira/browse/HIVE-24207
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> {noformat}
> select  ss_sold_date_sk from store_sales, date_dim where date_dim.d_year in 
> (1998,1998+1,1998+2) and store_sales.ss_sold_date_sk = date_dim.d_date_sk 
> limit 100;
>  select distinct ss_sold_date_sk from store_sales, date_dim where 
> date_dim.d_year in (1998,1998+1,1998+2) and store_sales.ss_sold_date_sk = 
> date_dim.d_date_sk limit 100;
>  {noformat}
> Queries like the above generate a large number of map tasks. Currently they 
> don't bail out after generating enough data. 
> It would be good to make use of ObjectCache & retain the number of records 
> generated, so that LimitOperator/VectorLimitOperator can bail out for the 
> later tasks in the operator's init phase itself. 
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorLimitOperator.java#L57
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/LimitOperator.java#L58
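
A rough sketch of the proposed pattern (identifiers are illustrative, not the
actual Hive patch): keep a per-query row counter in a cache shared by the
tasks on one executor, and let the limit operator consult it during
initialization so that later tasks can finish without reading any input.

{code:java}
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for Hive's ObjectCache: one counter per query/vertex.
class SharedLimitCache {
  private static final ConcurrentHashMap<String, AtomicLong> COUNTS =
      new ConcurrentHashMap<>();

  static AtomicLong counterFor(String queryVertexKey) {
    return COUNTS.computeIfAbsent(queryVertexKey, k -> new AtomicLong());
  }
}

class LimitOperatorSketch {
  private final AtomicLong emitted;
  private final long limit;
  private boolean done;

  LimitOperatorSketch(String queryVertexKey, long limit) {
    this.emitted = SharedLimitCache.counterFor(queryVertexKey);
    this.limit = limit;
    // Init-phase bail-out: earlier tasks on this executor may already have
    // produced enough rows, so this task never needs to process its input.
    this.done = emitted.get() >= limit;
  }

  boolean isDone() {
    return done;
  }

  // Returns false once the shared count reaches the limit.
  boolean offer(int rowsInBatch) {
    if (!done && emitted.addAndGet(rowsInBatch) >= limit) {
      done = true;
    }
    return !done;
  }
}
{code}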



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-24475) Generalize fixacidkeyindex utility

2020-12-09 Thread Antal Sinkovits (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antal Sinkovits resolved HIVE-24475.

Resolution: Fixed

> Generalize fixacidkeyindex utility
> --
>
> Key: HIVE-24475
> URL: https://issues.apache.org/jira/browse/HIVE-24475
> Project: Hive
>  Issue Type: Improvement
>  Components: ORC, Transactions
>Affects Versions: 3.0.0
>Reporter: Antal Sinkovits
>Assignee: Antal Sinkovits
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> There is a utility in hive which can validate/fix corrupted 
> hive.acid.key.index.
> hive --service fixacidkeyindex
> Unfortunately it is only tailored for a specific problem 
> (https://issues.apache.org/jira/browse/HIVE-18907), instead of generally 
> validating and recovering the hive.acid.key.index from the stripe data itself.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24475) Generalize fixacidkeyindex utility

2020-12-09 Thread Antal Sinkovits (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antal Sinkovits updated HIVE-24475:
---
Fix Version/s: 4.0.0

> Generalize fixacidkeyindex utility
> --
>
> Key: HIVE-24475
> URL: https://issues.apache.org/jira/browse/HIVE-24475
> Project: Hive
>  Issue Type: Improvement
>  Components: ORC, Transactions
>Affects Versions: 3.0.0
>Reporter: Antal Sinkovits
>Assignee: Antal Sinkovits
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> There is a utility in hive which can validate/fix corrupted 
> hive.acid.key.index.
> hive --service fixacidkeyindex
> Unfortunately it is only tailored for a specific problem 
> (https://issues.apache.org/jira/browse/HIVE-18907), instead of generally 
> validating and recovering the hive.acid.key.index from the stripe data itself.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24432) Delete Notification Events in Batches

2020-12-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24432?focusedWorklogId=522135=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-522135
 ]

ASF GitHub Bot logged work on HIVE-24432:
-

Author: ASF GitHub Bot
Created on: 09/Dec/20 09:10
Start Date: 09/Dec/20 09:10
Worklog Time Spent: 10m 
  Work Description: aasha commented on a change in pull request #1710:
URL: https://github.com/apache/hive/pull/1710#discussion_r539130987



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java
##
@@ -10800,53 +10801,89 @@ public void addNotificationEvent(NotificationEvent 
entry) throws MetaException {
 
   @Override
   public void cleanNotificationEvents(int olderThan) {

Review comment:
   The same improvement done in cleanNotificationEvents can be applied to 
cleanWriteNotificationEvents also.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 522135)
Time Spent: 2.5h  (was: 2h 20m)

> Delete Notification Events in Batches
> -
>
> Key: HIVE-24432
> URL: https://issues.apache.org/jira/browse/HIVE-24432
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 3.2.0
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> Notification events are loaded in batches (reduces memory pressure on the 
> HMS), but all of the deletes happen under a single transaction and, when 
> deleting many records, this can put a lot of pressure on the backend database.
> Instead, delete events in batches (in different transactions) as well.
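
In JDO terms the idea looks roughly like the following (a simplified sketch
along the lines of the patch, assuming the MNotificationLog persistable class;
not the exact ObjectStore code). Each call selects at most one batch of old
events, deletes it, and commits before the next batch starts.

{code:java}
import java.util.List;

import javax.jdo.PersistenceManager;
import javax.jdo.Query;
import javax.jdo.Transaction;

class NotificationCleanerSketch {
  // Deletes at most batchSize events older than tooOld; returns the count.
  static int deleteOneBatch(PersistenceManager pm, int tooOld, int batchSize) {
    Transaction tx = pm.currentTransaction();
    tx.begin();
    try {
      Query query = pm.newQuery(MNotificationLog.class, "eventTime < tooOld");
      query.declareParameters("java.lang.Integer tooOld");
      query.setOrdering("eventId ascending");
      query.setRange(0, batchSize); // cap the batch size
      @SuppressWarnings("unchecked")
      List<MNotificationLog> batch = (List<MNotificationLog>) query.execute(tooOld);
      int n = batch.size();
      pm.deletePersistentAll(batch);
      tx.commit(); // each batch commits in its own transaction
      return n;
    } finally {
      if (tx.isActive()) {
        tx.rollback();
      }
    }
  }

  static int deleteAll(PersistenceManager pm, int tooOld, int batchSize) {
    int total = 0;
    int n;
    do {
      n = deleteOneBatch(pm, tooOld, batchSize);
      total += n;
    } while (n > 0);
    return total;
  }
}
{code}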



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24432) Delete Notification Events in Batches

2020-12-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24432?focusedWorklogId=522133=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-522133
 ]

ASF GitHub Bot logged work on HIVE-24432:
-

Author: ASF GitHub Bot
Created on: 09/Dec/20 09:09
Start Date: 09/Dec/20 09:09
Worklog Time Spent: 10m 
  Work Description: aasha commented on a change in pull request #1710:
URL: https://github.com/apache/hive/pull/1710#discussion_r539130987



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java
##
@@ -10800,53 +10801,89 @@ public void addNotificationEvent(NotificationEvent 
entry) throws MetaException {
 
   @Override
   public void cleanNotificationEvents(int olderThan) {

Review comment:
   The same improvement of deleting in batches can be applied to 
cleanWriteNotificationEvents also.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 522133)
Time Spent: 2h 20m  (was: 2h 10m)

> Delete Notification Events in Batches
> -
>
> Key: HIVE-24432
> URL: https://issues.apache.org/jira/browse/HIVE-24432
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 3.2.0
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> Notification events are loaded in batches (reduces memory pressure on the 
> HMS), but all of the deletes happen under a single transaction and, when 
> deleting many records, this can put a lot of pressure on the backend database.
> Instead, delete events in batches (in different transactions) as well.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24432) Delete Notification Events in Batches

2020-12-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24432?focusedWorklogId=522131=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-522131
 ]

ASF GitHub Bot logged work on HIVE-24432:
-

Author: ASF GitHub Bot
Created on: 09/Dec/20 09:08
Start Date: 09/Dec/20 09:08
Worklog Time Spent: 10m 
  Work Description: aasha commented on a change in pull request #1710:
URL: https://github.com/apache/hive/pull/1710#discussion_r539040762



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java
##
@@ -10800,53 +10801,89 @@ public void addNotificationEvent(NotificationEvent 
entry) throws MetaException {
 
   @Override
   public void cleanNotificationEvents(int olderThan) {
-boolean commited = false;
-Query query = null;
+final int eventBatchSize = MetastoreConf.getIntVar(conf, 
MetastoreConf.ConfVars.EVENT_CLEAN_MAX_EVENTS);
+
+final long ageSec = olderThan;
+final Instant now = Instant.now();
+
+final int tooOld = Math.toIntExact(now.getEpochSecond() - ageSec);
+
+final Optional<Integer> batchSize = (eventBatchSize > 0) ? Optional.of(eventBatchSize) : Optional.empty();
+
+final long start = System.nanoTime();
+int deleteCount = doCleanNotificationEvents(tooOld, batchSize);
+
+if (deleteCount == 0) {
+  LOG.info("No Notification events found to be cleaned with eventTime < 
{}", tooOld);
+} else {
+  int batchCount = 0;
+  do {
+batchCount = doCleanNotificationEvents(tooOld, batchSize);
+deleteCount += batchCount;
+  } while (batchCount > 0);
+}
+
+final long finish = System.nanoTime();
+
+LOG.info("Deleted {} notification events older than epoch:{} in {}ms", 
deleteCount, tooOld,
+TimeUnit.NANOSECONDS.toMillis(finish - start));
+  }
+
+  private int doCleanNotificationEvents(final int ageSec, final Optional<Integer> batchSize) {
+final Transaction tx = pm.currentTransaction();
+int eventsCount = 0;
+
 try {
-  openTransaction();
-  long tmp = System.currentTimeMillis() / 1000 - olderThan;
-  int tooOld = (tmp > Integer.MAX_VALUE) ? 0 : (int) tmp;
-  query = pm.newQuery(MNotificationLog.class, "eventTime < tooOld");
-  query.declareParameters("java.lang.Integer tooOld");
+  tx.begin();
 
-  int max_events = MetastoreConf.getIntVar(conf, 
MetastoreConf.ConfVars.EVENT_CLEAN_MAX_EVENTS);
-  max_events = max_events > 0 ? max_events : Integer.MAX_VALUE;
-  query.setRange(0, max_events);
-  query.setOrdering("eventId ascending");
+  try (Query query = pm.newQuery(MNotificationLog.class, "eventTime < tooOld")) {
+query.declareParameters("java.lang.Integer tooOld");
+query.setOrdering("eventId ascending");
+if (batchSize.isPresent()) {
+  query.setRange(0, batchSize.get());
+}
 
-  List<MNotificationLog> toBeRemoved = (List<MNotificationLog>) query.execute(tooOld);
-  int iteration = 0;
-  int eventCount = 0;
-  long minEventId = 0;
-  long minEventTime = 0;
-  long maxEventId = 0;
-  long maxEventTime = 0;
-  while (CollectionUtils.isNotEmpty(toBeRemoved)) {
-int listSize = toBeRemoved.size();
-if (iteration == 0) {
-  MNotificationLog firstNotification = toBeRemoved.get(0);
-  minEventId = firstNotification.getEventId();
-  minEventTime = firstNotification.getEventTime();
+List<MNotificationLog> events = (List<MNotificationLog>) query.execute(ageSec);
+if (CollectionUtils.isNotEmpty(events)) {
+  eventsCount = events.size();
+
+  if (LOG.isDebugEnabled()) {
+int minEventTime, maxEventTime;
+long minEventId, maxEventId;
+Iterator<MNotificationLog> iter = events.iterator();
+MNotificationLog firstNotification = iter.next();
+
+minEventTime = maxEventTime = firstNotification.getEventTime();
+minEventId = maxEventId = firstNotification.getEventId();
+
+while (iter.hasNext()) {
+  MNotificationLog notification = iter.next();

Review comment:
   Is the comparison required? events will always be in ascending order of 
event id





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 522131)
Time Spent: 2h 10m  (was: 2h)

> Delete Notification Events in Batches
> -
>
> Key: HIVE-24432
> URL: https://issues.apache.org/jira/browse/HIVE-24432
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 3.2.0
>Reporter: David 

[jira] [Work logged] (HIVE-24207) LimitOperator can leverage ObjectCache to bail out quickly

2020-12-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24207?focusedWorklogId=522127=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-522127
 ]

ASF GitHub Bot logged work on HIVE-24207:
-

Author: ASF GitHub Bot
Created on: 09/Dec/20 08:54
Start Date: 09/Dec/20 08:54
Worklog Time Spent: 10m 
  Work Description: abstractdog commented on pull request #1556:
URL: https://github.com/apache/hive/pull/1556#issuecomment-741630312


   precommit tests passed, could you please take a look @rbalamohan ?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 522127)
Time Spent: 20m  (was: 10m)

> LimitOperator can leverage ObjectCache to bail out quickly
> --
>
> Key: HIVE-24207
> URL: https://issues.apache.org/jira/browse/HIVE-24207
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> {noformat}
> select  ss_sold_date_sk from store_sales, date_dim where date_dim.d_year in 
> (1998,1998+1,1998+2) and store_sales.ss_sold_date_sk = date_dim.d_date_sk 
> limit 100;
>  select distinct ss_sold_date_sk from store_sales, date_dim where 
> date_dim.d_year in (1998,1998+1,1998+2) and store_sales.ss_sold_date_sk = 
> date_dim.d_date_sk limit 100;
>  {noformat}
> Queries like the above generate a large number of map tasks. Currently they 
> don't bail out after generating enough data. 
> It would be good to make use of ObjectCache & retain the number of records 
> generated, so that LimitOperator/VectorLimitOperator can bail out for the 
> later tasks in the operator's init phase itself. 
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorLimitOperator.java#L57
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/LimitOperator.java#L58



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24475) Generalize fixacidkeyindex utility

2020-12-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24475?focusedWorklogId=522126=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-522126
 ]

ASF GitHub Bot logged work on HIVE-24475:
-

Author: ASF GitHub Bot
Created on: 09/Dec/20 08:53
Start Date: 09/Dec/20 08:53
Worklog Time Spent: 10m 
  Work Description: lcspinter commented on pull request #1730:
URL: https://github.com/apache/hive/pull/1730#issuecomment-741629522


   Merged into master. Thanks for the patch @asinkovits and for the review 
@pvargacl and @maheshk114 .



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 522126)
Time Spent: 1h 40m  (was: 1.5h)

> Generalize fixacidkeyindex utility
> --
>
> Key: HIVE-24475
> URL: https://issues.apache.org/jira/browse/HIVE-24475
> Project: Hive
>  Issue Type: Improvement
>  Components: ORC, Transactions
>Affects Versions: 3.0.0
>Reporter: Antal Sinkovits
>Assignee: Antal Sinkovits
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> There is a utility in hive which can validate/fix corrupted 
> hive.acid.key.index.
> hive --service fixacidkeyindex
> Unfortunately it is only tailored for a specific problem 
> (https://issues.apache.org/jira/browse/HIVE-18907), instead of generally 
> validating and recovering the hive.acid.key.index from the stripe data itself.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24475) Generalize fixacidkeyindex utility

2020-12-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24475?focusedWorklogId=522125=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-522125
 ]

ASF GitHub Bot logged work on HIVE-24475:
-

Author: ASF GitHub Bot
Created on: 09/Dec/20 08:52
Start Date: 09/Dec/20 08:52
Worklog Time Spent: 10m 
  Work Description: lcspinter merged pull request #1730:
URL: https://github.com/apache/hive/pull/1730


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 522125)
Time Spent: 1.5h  (was: 1h 20m)

> Generalize fixacidkeyindex utility
> --
>
> Key: HIVE-24475
> URL: https://issues.apache.org/jira/browse/HIVE-24475
> Project: Hive
>  Issue Type: Improvement
>  Components: ORC, Transactions
>Affects Versions: 3.0.0
>Reporter: Antal Sinkovits
>Assignee: Antal Sinkovits
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> There is a utility in hive which can validate/fix corrupted 
> hive.acid.key.index.
> hive --service fixacidkeyindex
> Unfortunately it is only tailored for a specific problem 
> (https://issues.apache.org/jira/browse/HIVE-18907), instead of generally 
> validating and recovering the hive.acid.key.index from the stripe data itself.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-24508) hive.parquet.timestamp.skip.conversion doesn't work

2020-12-09 Thread Karen Coppage (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17246366#comment-17246366
 ] 

Karen Coppage commented on HIVE-24508:
--

This is expected behavior.
 hive.parquet.timestamp.skip.conversion only affects reading, not writing; 
furthermore, it only affects data not written by Hive. Please see the 
description:
{quote}"Current Hive implementation of parquet stores timestamps to UTC, this 
flag allows skipping of the conversion on reading parquet files from other 
tools."
{quote}
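
To make the read-side scope concrete, a small hedged sketch (HiveConf extends
Hadoop's Configuration, so the property can be set by name; the behavior
described above is read-path only):

{code:java}
import org.apache.hadoop.hive.conf.HiveConf;

public class SkipConversionDemo {
  public static void main(String[] args) {
    HiveConf conf = new HiveConf();
    // Affects only how Hive *reads* parquet timestamps written by other
    // tools; INSERT-time timestamp handling is unchanged.
    conf.setBoolean("hive.parquet.timestamp.skip.conversion", true);
    System.out.println(conf.get("hive.parquet.timestamp.skip.conversion"));
  }
}
{code}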

> hive.parquet.timestamp.skip.conversion doesn't work
> ---
>
> Key: HIVE-24508
> URL: https://issues.apache.org/jira/browse/HIVE-24508
> Project: Hive
>  Issue Type: Bug
>  Components: Parquet
>Reporter: wenjun ma
>Assignee: wenjun ma
>Priority: Major
> Fix For: All Versions
>
>
> Even if we set it to true or false, when we insert the current timestamp it 
> always uses the local time zone. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24497) Node heartbeats from LLAP Daemon to the client are not matching leading to timeout.

2020-12-09 Thread Simhadri G (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simhadri G updated HIVE-24497:
--
Attachment: hive-24497.01.patch

> Node heartbeats from LLAP Daemon to the client are not matching leading to 
> timeout.
> ---
>
> Key: HIVE-24497
> URL: https://issues.apache.org/jira/browse/HIVE-24497
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Simhadri G
>Assignee: Simhadri G
>Priority: Minor
>  Labels: pull-request-available
> Attachments: hive-24497.01.patch
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> A node heartbeat contains info about all the tasks that were submitted to that 
> LLAP Daemon. In cloud deployments, the client is not able to match these 
> heartbeats due to differences in hostname and port.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)