[jira] [Work logged] (HIVE-24274) Implement Query Text based MaterializedView rewrite
[ https://issues.apache.org/jira/browse/HIVE-24274?focusedWorklogId=522580&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-522580 ]

ASF GitHub Bot logged work on HIVE-24274:

Author: ASF GitHub Bot
Created on: 10/Dec/20 07:11
Start Date: 10/Dec/20 07:11
Worklog Time Spent: 10m
Work Description: kasakrisz commented on a change in pull request #1706:
URL: https://github.com/apache/hive/pull/1706#discussion_r539925057

## File path: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java ##
@@ -1844,6 +1844,9 @@ private static void populateLlapDaemonVarsSet(Set llapDaemonVarsSetLocal
     // materialized views
     HIVE_MATERIALIZED_VIEW_ENABLE_AUTO_REWRITING("hive.materializedview.rewriting", true,
         "Whether to try to rewrite queries using the materialized views enabled for rewriting"),
+    HIVE_MATERIALIZED_VIEW_ENABLE_AUTO_REWRITING_QUERY_TEXT("hive.materializedview.rewriting.query.text", true,

Review comment:
renamed to `hive.materializedview.rewriting.sql`

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Issue Time Tracking
---

Worklog Id: (was: 522580)
Time Spent: 40m (was: 0.5h)

> Implement Query Text based MaterializedView rewrite
> ---
>
> Key: HIVE-24274
> URL: https://issues.apache.org/jira/browse/HIVE-24274
> Project: Hive
> Issue Type: Improvement
> Reporter: Krisztian Kasa
> Assignee: Krisztian Kasa
> Priority: Major
> Labels: pull-request-available
> Time Spent: 40m
> Remaining Estimate: 0h
>
> In addition to the way queries are currently rewritten to use materialized
> views in Hive, this project provides an alternative: compare the query text
> with the stored query text of the materialized views. If a match is found,
> the original query's logical plan can be replaced by a scan on the
> materialized view.
> - Only materialized views that are enabled for rewriting can participate.
> - Use the existing *HiveMaterializedViewsRegistry* through the *Hive* object
> by adding a lookup method by query text.
> - More than one materialized view may have the same query text. In this
> case, choose the first valid one.
> - Validation can be done by calling
> *Hive.validateMaterializedViewsFromRegistry()*.
> - The scope of this first patch is limited to rewriting queries whose entire
> text can be matched.
> - Use the expanded query text (fully qualified column and table names) for
> the comparison.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
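The lookup described in the HIVE-24274 summary can be illustrated with a small, self-contained sketch. This is not the code from the pull request and does not use Hive classes: `MvEntry`, `QueryTextRegistry`, `register` and `lookup` are hypothetical names chosen for illustration, and the validity check is passed in as a plain predicate standing in for whatever validation the real registry performs.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.function.Predicate;

/** Hypothetical stand-in for a materialized view entry; not a Hive class. */
final class MvEntry {
  final String name;
  final String expandedQueryText; // text with fully qualified table and column names
  final boolean rewriteEnabled;

  MvEntry(String name, String expandedQueryText, boolean rewriteEnabled) {
    this.name = name;
    this.expandedQueryText = expandedQueryText;
    this.rewriteEnabled = rewriteEnabled;
  }
}

/** Hypothetical registry that indexes materialized views by expanded query text. */
final class QueryTextRegistry {
  private final Map<String, List<MvEntry>> byText = new HashMap<>();

  /** Only views enabled for rewriting take part in the lookup. */
  void register(MvEntry mv) {
    if (!mv.rewriteEnabled) {
      return;
    }
    byText.computeIfAbsent(mv.expandedQueryText, k -> new ArrayList<>()).add(mv);
  }

  /** Returns the first view with exactly this query text that passes validation. */
  Optional<MvEntry> lookup(String expandedQueryText, Predicate<MvEntry> isValid) {
    List<MvEntry> candidates =
        byText.getOrDefault(expandedQueryText, Collections.<MvEntry>emptyList());
    for (MvEntry mv : candidates) {
      if (isValid.test(mv)) {
        return Optional.of(mv);
      }
    }
    return Optional.empty();
  }

  public static void main(String[] args) {
    QueryTextRegistry registry = new QueryTextRegistry();
    String text = "select `t`.`a` from `db`.`t`";
    registry.register(new MvEntry("mv_stale", text, true));
    registry.register(new MvEntry("mv_fresh", text, true));
    // Assumption for the demo: "mv_stale" fails validation, so the second match wins.
    Optional<MvEntry> hit = registry.lookup(text, mv -> !mv.name.equals("mv_stale"));
    System.out.println(hit.map(mv -> mv.name).orElse("no match")); // prints "mv_fresh"
  }
}
```

Matching on the whole expanded text keeps the lookup a single hash probe, at the cost of missing queries that differ only in formatting or that match just part of a view definition, which is consistent with the limited scope stated for the first patch.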
[jira] [Resolved] (HIVE-24497) Node heartbeats from LLAP Daemon to the client are not matching leading to timeout in cloud environment
[ https://issues.apache.org/jira/browse/HIVE-24497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Prasanth Jayachandran resolved HIVE-24497.

Fix Version/s: 4.0.0
Resolution: Fixed

Merged to master! Thanks for your contribution!

> Node heartbeats from LLAP Daemon to the client are not matching leading to
> timeout in cloud environment
> ---
>
> Key: HIVE-24497
> URL: https://issues.apache.org/jira/browse/HIVE-24497
> Project: Hive
> Issue Type: Sub-task
> Reporter: Simhadri G
> Assignee: Simhadri G
> Priority: Minor
> Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: hive-24497.01.patch
>
> Time Spent: 50m
> Remaining Estimate: 0h
>
> The node heartbeat contains information about all the tasks that were
> submitted to that LLAP Daemon. In a cloud deployment, the client is not able
> to match these heartbeats due to differences in hostname and port.
[jira] [Work logged] (HIVE-24497) Node heartbeats from LLAP Daemon to the client are not matching leading to timeout in cloud environment
[ https://issues.apache.org/jira/browse/HIVE-24497?focusedWorklogId=522559&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-522559 ]

ASF GitHub Bot logged work on HIVE-24497:

Author: ASF GitHub Bot
Created on: 10/Dec/20 05:33
Start Date: 10/Dec/20 05:33
Worklog Time Spent: 10m
Work Description: prasanthj merged pull request #1755:
URL: https://github.com/apache/hive/pull/1755

Issue Time Tracking
---

Worklog Id: (was: 522559)
Time Spent: 50m (was: 40m)

> Node heartbeats from LLAP Daemon to the client are not matching leading to
> timeout in cloud environment
> ---
>
> Key: HIVE-24497
> URL: https://issues.apache.org/jira/browse/HIVE-24497
> Project: Hive
> Issue Type: Sub-task
> Reporter: Simhadri G
> Assignee: Simhadri G
> Priority: Minor
> Labels: pull-request-available
> Attachments: hive-24497.01.patch
>
> Time Spent: 50m
> Remaining Estimate: 0h
>
> The node heartbeat contains information about all the tasks that were
> submitted to that LLAP Daemon. In a cloud deployment, the client is not able
> to match these heartbeats due to differences in hostname and port.
[jira] [Work logged] (HIVE-24471) Add support for combiner in hash mode group aggregation
[ https://issues.apache.org/jira/browse/HIVE-24471?focusedWorklogId=522547&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-522547 ]

ASF GitHub Bot logged work on HIVE-24471:

Author: ASF GitHub Bot
Created on: 10/Dec/20 04:51
Start Date: 10/Dec/20 04:51
Worklog Time Spent: 10m
Work Description: maheshk114 commented on a change in pull request #1736:
URL: https://github.com/apache/hive/pull/1736#discussion_r539843531

## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java ##
@@ -712,6 +751,12 @@ private void processKey(Object row,
   @Override
   public void process(Object row, int tag) throws HiveException {
+    if (hashAggr) {
+      if (getConfiguration().get("forced.streaming.mode", "false").equals("true")) {

Review comment:
I have removed it in the next commit; I had added it for testing only.

Issue Time Tracking
---

Worklog Id: (was: 522547)
Time Spent: 1h 20m (was: 1h 10m)

> Add support for combiner in hash mode group aggregation
>
>
> Key: HIVE-24471
> URL: https://issues.apache.org/jira/browse/HIVE-24471
> Project: Hive
> Issue Type: Bug
> Components: Hive
> Reporter: mahesh kumar behera
> Assignee: mahesh kumar behera
> Priority: Major
> Labels: pull-request-available
> Time Spent: 1h 20m
> Remaining Estimate: 0h
>
> In map-side group aggregation, a partial grouped aggregation is calculated
> to reduce the data written to disk by the map task. In the case of hash
> aggregation, where the input data is not sorted, a hash table is used. If
> the hash table size grows beyond a configurable limit, the data is flushed
> to disk and a new hash table is created. If the reduction achieved by the
> hash table is less than the minimum hash aggregation reduction calculated at
> compile time, the map-side aggregation is converted to streaming mode. So if
> the first few batches of records do not result in a significant reduction,
> the mode is switched to streaming mode. This may hurt performance if the
> subsequent batches of records have fewer distinct values. To mitigate this,
> a combiner can be added to the map task after the keys are sorted. This
> ensures that the aggregation is done where possible and reduces the data
> written to disk.
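The reason a combiner is cheap once the keys are sorted can be shown with a minimal sketch. It is not the `GroupByCombiner` from the pull request: the class name `SortedSumCombiner`, the use of `String` keys and the choice of a sum as the aggregate are assumptions made only for illustration. Because equal keys are adjacent in sorted input, a single pass with one running aggregate is enough and no hash table is required.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical combiner that sums partial aggregates over key-sorted records. */
final class SortedSumCombiner {

  /** Input must already be sorted by key; equal keys are merged in one pass. */
  static List<Map.Entry<String, Long>> combine(List<Map.Entry<String, Long>> sorted) {
    List<Map.Entry<String, Long>> out = new ArrayList<>();
    String currentKey = null;
    long currentSum = 0L;
    boolean open = false; // true once a group has been started

    for (Map.Entry<String, Long> rec : sorted) {
      if (open && rec.getKey().equals(currentKey)) {
        // Same key as the previous record: fold into the running aggregate.
        currentSum += rec.getValue();
      } else {
        // Key changed: emit the finished group and start a new one.
        if (open) {
          out.add(new SimpleEntry<>(currentKey, currentSum));
        }
        currentKey = rec.getKey();
        currentSum = rec.getValue();
        open = true;
      }
    }
    if (open) {
      out.add(new SimpleEntry<>(currentKey, currentSum));
    }
    return out;
  }

  public static void main(String[] args) {
    List<Map.Entry<String, Long>> sorted = new ArrayList<>();
    sorted.add(new SimpleEntry<>("a", 1L));
    sorted.add(new SimpleEntry<>("a", 2L));
    sorted.add(new SimpleEntry<>("b", 5L));
    sorted.add(new SimpleEntry<>("c", 1L));
    sorted.add(new SimpleEntry<>("c", 1L));
    System.out.println(combine(sorted)); // prints [a=3, b=5, c=2]
  }
}
```

In this sketch five streamed records shrink to three before they would be written out, which is the kind of saving the issue describes for the case where the map-side hash aggregation has already switched to streaming mode.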
[jira] [Work logged] (HIVE-24471) Add support for combiner in hash mode group aggregation
[ https://issues.apache.org/jira/browse/HIVE-24471?focusedWorklogId=522546&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-522546 ]

ASF GitHub Bot logged work on HIVE-24471:

Author: ASF GitHub Bot
Created on: 10/Dec/20 04:50
Start Date: 10/Dec/20 04:50
Worklog Time Spent: 10m
Work Description: maheshk114 commented on a change in pull request #1736:
URL: https://github.com/apache/hive/pull/1736#discussion_r539843194

## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByCombiner.java ##
@@ -0,0 +1,246 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql.exec;
+
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.ql.exec.vector.VectorGroupByCombiner;
+import org.apache.hadoop.hive.ql.exec.vector.VectorGroupByOperator;
+import org.apache.hadoop.mapred.JobConf;
+import org.apache.hadoop.hive.ql.metadata.HiveException;
+import org.apache.hadoop.hive.ql.plan.BaseWork;
+import org.apache.hadoop.hive.ql.plan.GroupByDesc;
+import org.apache.hadoop.hive.ql.plan.ReduceWork;
+import org.apache.hadoop.hive.ql.plan.TableDesc;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDAFEvaluator;
+import org.apache.hadoop.hive.serde2.AbstractSerDe;
+import org.apache.hadoop.hive.serde2.Deserializer;
+import org.apache.hadoop.hive.serde2.SerDeException;
+import org.apache.hadoop.hive.serde2.SerDeUtils;
+import org.apache.hadoop.hive.serde2.Serializer;
+import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
+import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory;
+import org.apache.hadoop.io.BytesWritable;
+import org.apache.hadoop.io.DataInputBuffer;
+import org.apache.hadoop.util.ReflectionUtils;
+import org.apache.tez.runtime.api.TaskContext;
+import org.apache.tez.runtime.library.common.sort.impl.IFile;
+import org.apache.tez.runtime.library.common.sort.impl.TezRawKeyValueIterator;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+import org.apache.hadoop.fs.Path;
+
+import java.io.IOException;
+import java.util.ArrayList;
+
+import static org.apache.hadoop.hive.ql.exec.Utilities.HAS_REDUCE_WORK;
+import static org.apache.hadoop.hive.ql.exec.Utilities.REDUCE_PLAN_NAME;
+
+// Combiner for normal group by operator. In case of map side aggregate, the partially
+// aggregated records are sorted based on group by key. If because of some reasons, like hash
+// table memory exceeded the limit or the first few batches of records have less ndvs, the
+// aggregation is not done, then here the aggregation can be done cheaply as the records
+// are sorted based on group by key.
+public class GroupByCombiner extends VectorGroupByCombiner {
+
+  private static final Logger LOG = LoggerFactory.getLogger(
+      org.apache.hadoop.hive.ql.exec.GroupByCombiner.class.getName());
+
+  private transient GenericUDAFEvaluator[] aggregationEvaluators;
+  Deserializer valueDeserializer;
+  GenericUDAFEvaluator.AggregationBuffer[] aggregationBuffers;
+  GroupByOperator groupByOperator;
+  Serializer valueSerializer;
+  ObjectInspector aggrObjectInspector;
+  DataInputBuffer valueBuffer;
+  Object[] cachedValues;
+
+  public GroupByCombiner(TaskContext taskContext) throws HiveException, IOException {
+    super(taskContext);
+    if (rw != null) {
+      try {
+        groupByOperator = (GroupByOperator) rw.getReducer();
+
+        ArrayList ois = new ArrayList();
+        ois.add(keyObjectInspector);
+        ois.add(valueObjectInspector);
+        ObjectInspector[] rowObjectInspector = new ObjectInspector[1];
+        rowObjectInspector[0] =
+            ObjectInspectorFactory.getStandardStructObjectInspector(Utilities.reduceFieldNameList,
+                ois);
+        groupByOperator.setInputObjInspectors(rowObjectInspector);
+        groupByOperator.initializeOp(conf);
+        aggregationBuffers = groupByOperator.getAggregationBuffers();
+        aggregationEvaluators = groupByOperator.getAggregationEvaluator();
+
+        TableDesc valueTableDesc = rw.getTagToValueDesc().get(0);
+        valueSerializer
[jira] [Work logged] (HIVE-24471) Add support for combiner in hash mode group aggregation
[ https://issues.apache.org/jira/browse/HIVE-24471?focusedWorklogId=522544&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-522544 ]

ASF GitHub Bot logged work on HIVE-24471:

Author: ASF GitHub Bot
Created on: 10/Dec/20 04:47
Start Date: 10/Dec/20 04:47
Worklog Time Spent: 10m
Work Description: maheshk114 commented on a change in pull request #1736:
URL: https://github.com/apache/hive/pull/1736#discussion_r539842254

## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorGroupByCombiner.java ##
@@ -0,0 +1,377 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql.exec.vector;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.ql.exec.Utilities;
+import org.apache.hadoop.hive.ql.exec.mr.ExecReducer;
+import org.apache.hadoop.hive.ql.exec.vector.expressions.aggregates.VectorAggregateExpression;
+import org.apache.hadoop.hive.ql.metadata.HiveException;
+import org.apache.hadoop.hive.ql.plan.ReduceWork;
+import org.apache.hadoop.hive.ql.plan.TableDesc;
+import org.apache.hadoop.hive.serde2.AbstractSerDe;
+import org.apache.hadoop.hive.serde2.ByteStream;
+import org.apache.hadoop.hive.serde2.Deserializer;
+import org.apache.hadoop.hive.serde2.SerDeException;
+import org.apache.hadoop.hive.serde2.SerDeUtils;
+import org.apache.hadoop.hive.serde2.lazybinary.fast.LazyBinaryDeserializeRead;
+import org.apache.hadoop.hive.serde2.lazybinary.fast.LazyBinarySerializeWrite;
+import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
+import org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector;
+import org.apache.hadoop.io.DataInputBuffer;
+import org.apache.hadoop.mapreduce.TaskCounter;
+import org.apache.hadoop.util.ReflectionUtils;
+import org.apache.hadoop.util.StringUtils;
+import org.apache.tez.common.TezUtils;
+import org.apache.tez.common.counters.TezCounter;
+import org.apache.tez.mapreduce.combine.MRCombiner;
+import org.apache.tez.runtime.api.TaskContext;
+import org.apache.tez.runtime.library.common.sort.impl.IFile;
+import org.apache.tez.runtime.library.common.sort.impl.TezRawKeyValueIterator;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+import java.io.IOException;
+
+import static org.apache.hadoop.hive.ql.exec.Utilities.HAS_REDUCE_WORK;
+import static org.apache.hadoop.hive.ql.exec.Utilities.MAPRED_REDUCER_CLASS;
+import static org.apache.hadoop.hive.ql.exec.Utilities.REDUCE_PLAN_NAME;
+import static org.apache.hadoop.hive.serde2.lazy.fast.LazySimpleDeserializeRead.byteArrayCompareRanges;
+
+// Combiner for vectorized group by operator. In case of map side aggregate, the partially
+// aggregated records are sorted based on group by key. If because of some reasons, like hash
+// table memory exceeded the limit or the first few batches of records have less ndvs, the
+// aggregation is not done, then here the aggregation can be done cheaply as the records
+// are sorted based on group by key.
+public class VectorGroupByCombiner extends MRCombiner {
+  private static final Logger LOG = LoggerFactory.getLogger(
+      VectorGroupByCombiner.class.getName());
+  protected final Configuration conf;
+  protected final TezCounter combineInputRecordsCounter;
+  protected final TezCounter combineOutputRecordsCounter;
+  VectorAggregateExpression[] aggregators;
+  VectorAggregationBufferRow aggregationBufferRow;
+  protected transient LazyBinarySerializeWrite valueLazyBinarySerializeWrite;
+
+  // This helper object serializes LazyBinary format reducer values from columns of a row
+  // in a vectorized row batch.
+  protected transient VectorSerializeRow valueVectorSerializeRow;
+
+  // The output buffer used to serialize a value into.
+  protected transient ByteStream.Output valueOutput;
+  DataInputBuffer valueBytesWritable;
+
+  // Only required minimal configs are copied to the worker nodes. This hack (file.) is
+  // done to include these configs to be copied to the worker node.
+  protected static String
[jira] [Work logged] (HIVE-24471) Add support for combiner in hash mode group aggregation
[ https://issues.apache.org/jira/browse/HIVE-24471?focusedWorklogId=522545&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-522545 ]

ASF GitHub Bot logged work on HIVE-24471:

Author: ASF GitHub Bot
Created on: 10/Dec/20 04:47
Start Date: 10/Dec/20 04:47
Worklog Time Spent: 10m
Work Description: maheshk114 commented on a change in pull request #1736:
URL: https://github.com/apache/hive/pull/1736#discussion_r539842254

## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorGroupByCombiner.java ##
@@ -0,0 +1,377 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql.exec.vector;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.ql.exec.Utilities;
+import org.apache.hadoop.hive.ql.exec.mr.ExecReducer;
+import org.apache.hadoop.hive.ql.exec.vector.expressions.aggregates.VectorAggregateExpression;
+import org.apache.hadoop.hive.ql.metadata.HiveException;
+import org.apache.hadoop.hive.ql.plan.ReduceWork;
+import org.apache.hadoop.hive.ql.plan.TableDesc;
+import org.apache.hadoop.hive.serde2.AbstractSerDe;
+import org.apache.hadoop.hive.serde2.ByteStream;
+import org.apache.hadoop.hive.serde2.Deserializer;
+import org.apache.hadoop.hive.serde2.SerDeException;
+import org.apache.hadoop.hive.serde2.SerDeUtils;
+import org.apache.hadoop.hive.serde2.lazybinary.fast.LazyBinaryDeserializeRead;
+import org.apache.hadoop.hive.serde2.lazybinary.fast.LazyBinarySerializeWrite;
+import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
+import org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector;
+import org.apache.hadoop.io.DataInputBuffer;
+import org.apache.hadoop.mapreduce.TaskCounter;
+import org.apache.hadoop.util.ReflectionUtils;
+import org.apache.hadoop.util.StringUtils;
+import org.apache.tez.common.TezUtils;
+import org.apache.tez.common.counters.TezCounter;
+import org.apache.tez.mapreduce.combine.MRCombiner;
+import org.apache.tez.runtime.api.TaskContext;
+import org.apache.tez.runtime.library.common.sort.impl.IFile;
+import org.apache.tez.runtime.library.common.sort.impl.TezRawKeyValueIterator;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+import java.io.IOException;
+
+import static org.apache.hadoop.hive.ql.exec.Utilities.HAS_REDUCE_WORK;
+import static org.apache.hadoop.hive.ql.exec.Utilities.MAPRED_REDUCER_CLASS;
+import static org.apache.hadoop.hive.ql.exec.Utilities.REDUCE_PLAN_NAME;
+import static org.apache.hadoop.hive.serde2.lazy.fast.LazySimpleDeserializeRead.byteArrayCompareRanges;
+
+// Combiner for vectorized group by operator. In case of map side aggregate, the partially
+// aggregated records are sorted based on group by key. If because of some reasons, like hash
+// table memory exceeded the limit or the first few batches of records have less ndvs, the
+// aggregation is not done, then here the aggregation can be done cheaply as the records
+// are sorted based on group by key.
+public class VectorGroupByCombiner extends MRCombiner {
+  private static final Logger LOG = LoggerFactory.getLogger(
+      VectorGroupByCombiner.class.getName());
+  protected final Configuration conf;
+  protected final TezCounter combineInputRecordsCounter;
+  protected final TezCounter combineOutputRecordsCounter;
+  VectorAggregateExpression[] aggregators;
+  VectorAggregationBufferRow aggregationBufferRow;
+  protected transient LazyBinarySerializeWrite valueLazyBinarySerializeWrite;
+
+  // This helper object serializes LazyBinary format reducer values from columns of a row
+  // in a vectorized row batch.
+  protected transient VectorSerializeRow valueVectorSerializeRow;
+
+  // The output buffer used to serialize a value into.
+  protected transient ByteStream.Output valueOutput;
+  DataInputBuffer valueBytesWritable;
+
+  // Only required minimal configs are copied to the worker nodes. This hack (file.) is
+  // done to include these configs to be copied to the worker node.
+  protected static String
[jira] [Work logged] (HIVE-24471) Add support for combiner in hash mode group aggregation
[ https://issues.apache.org/jira/browse/HIVE-24471?focusedWorklogId=522542&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-522542 ]

ASF GitHub Bot logged work on HIVE-24471:

Author: ASF GitHub Bot
Created on: 10/Dec/20 04:46
Start Date: 10/Dec/20 04:46
Worklog Time Spent: 10m
Work Description: maheshk114 commented on a change in pull request #1736:
URL: https://github.com/apache/hive/pull/1736#discussion_r539842047

## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorGroupByCombiner.java ##
@@ -0,0 +1,377 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql.exec.vector;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.ql.exec.Utilities;
+import org.apache.hadoop.hive.ql.exec.mr.ExecReducer;
+import org.apache.hadoop.hive.ql.exec.vector.expressions.aggregates.VectorAggregateExpression;
+import org.apache.hadoop.hive.ql.metadata.HiveException;
+import org.apache.hadoop.hive.ql.plan.ReduceWork;
+import org.apache.hadoop.hive.ql.plan.TableDesc;
+import org.apache.hadoop.hive.serde2.AbstractSerDe;
+import org.apache.hadoop.hive.serde2.ByteStream;
+import org.apache.hadoop.hive.serde2.Deserializer;
+import org.apache.hadoop.hive.serde2.SerDeException;
+import org.apache.hadoop.hive.serde2.SerDeUtils;
+import org.apache.hadoop.hive.serde2.lazybinary.fast.LazyBinaryDeserializeRead;
+import org.apache.hadoop.hive.serde2.lazybinary.fast.LazyBinarySerializeWrite;
+import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
+import org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector;
+import org.apache.hadoop.io.DataInputBuffer;
+import org.apache.hadoop.mapreduce.TaskCounter;
+import org.apache.hadoop.util.ReflectionUtils;
+import org.apache.hadoop.util.StringUtils;
+import org.apache.tez.common.TezUtils;
+import org.apache.tez.common.counters.TezCounter;
+import org.apache.tez.mapreduce.combine.MRCombiner;
+import org.apache.tez.runtime.api.TaskContext;
+import org.apache.tez.runtime.library.common.sort.impl.IFile;
+import org.apache.tez.runtime.library.common.sort.impl.TezRawKeyValueIterator;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+import java.io.IOException;
+
+import static org.apache.hadoop.hive.ql.exec.Utilities.HAS_REDUCE_WORK;
+import static org.apache.hadoop.hive.ql.exec.Utilities.MAPRED_REDUCER_CLASS;
+import static org.apache.hadoop.hive.ql.exec.Utilities.REDUCE_PLAN_NAME;
+import static org.apache.hadoop.hive.serde2.lazy.fast.LazySimpleDeserializeRead.byteArrayCompareRanges;
+
+// Combiner for vectorized group by operator. In case of map side aggregate, the partially
+// aggregated records are sorted based on group by key. If because of some reasons, like hash
+// table memory exceeded the limit or the first few batches of records have less ndvs, the
+// aggregation is not done, then here the aggregation can be done cheaply as the records
+// are sorted based on group by key.
+public class VectorGroupByCombiner extends MRCombiner {
+  private static final Logger LOG = LoggerFactory.getLogger(
+      VectorGroupByCombiner.class.getName());
+  protected final Configuration conf;
+  protected final TezCounter combineInputRecordsCounter;
+  protected final TezCounter combineOutputRecordsCounter;
+  VectorAggregateExpression[] aggregators;
+  VectorAggregationBufferRow aggregationBufferRow;
+  protected transient LazyBinarySerializeWrite valueLazyBinarySerializeWrite;
+
+  // This helper object serializes LazyBinary format reducer values from columns of a row
+  // in a vectorized row batch.
+  protected transient VectorSerializeRow valueVectorSerializeRow;
+
+  // The output buffer used to serialize a value into.
+  protected transient ByteStream.Output valueOutput;
+  DataInputBuffer valueBytesWritable;
+
+  // Only required minimal configs are copied to the worker nodes. This hack (file.) is
+  // done to include these configs to be copied to the worker node.
+  protected static String
[jira] [Work logged] (HIVE-24471) Add support for combiner in hash mode group aggregation
[ https://issues.apache.org/jira/browse/HIVE-24471?focusedWorklogId=522540&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-522540 ]

ASF GitHub Bot logged work on HIVE-24471:

Author: ASF GitHub Bot
Created on: 10/Dec/20 04:45
Start Date: 10/Dec/20 04:45
Worklog Time Spent: 10m
Work Description: maheshk114 commented on a change in pull request #1736:
URL: https://github.com/apache/hive/pull/1736#discussion_r539841639

## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByCombiner.java ##
@@ -0,0 +1,246 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql.exec;
+
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.ql.exec.vector.VectorGroupByCombiner;
+import org.apache.hadoop.hive.ql.exec.vector.VectorGroupByOperator;
+import org.apache.hadoop.mapred.JobConf;
+import org.apache.hadoop.hive.ql.metadata.HiveException;
+import org.apache.hadoop.hive.ql.plan.BaseWork;
+import org.apache.hadoop.hive.ql.plan.GroupByDesc;
+import org.apache.hadoop.hive.ql.plan.ReduceWork;
+import org.apache.hadoop.hive.ql.plan.TableDesc;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDAFEvaluator;
+import org.apache.hadoop.hive.serde2.AbstractSerDe;
+import org.apache.hadoop.hive.serde2.Deserializer;
+import org.apache.hadoop.hive.serde2.SerDeException;
+import org.apache.hadoop.hive.serde2.SerDeUtils;
+import org.apache.hadoop.hive.serde2.Serializer;
+import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
+import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory;
+import org.apache.hadoop.io.BytesWritable;
+import org.apache.hadoop.io.DataInputBuffer;
+import org.apache.hadoop.util.ReflectionUtils;
+import org.apache.tez.runtime.api.TaskContext;
+import org.apache.tez.runtime.library.common.sort.impl.IFile;
+import org.apache.tez.runtime.library.common.sort.impl.TezRawKeyValueIterator;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+import org.apache.hadoop.fs.Path;
+
+import java.io.IOException;
+import java.util.ArrayList;
+
+import static org.apache.hadoop.hive.ql.exec.Utilities.HAS_REDUCE_WORK;
+import static org.apache.hadoop.hive.ql.exec.Utilities.REDUCE_PLAN_NAME;
+
+// Combiner for normal group by operator. In case of map side aggregate, the partially
+// aggregated records are sorted based on group by key. If because of some reasons, like hash
+// table memory exceeded the limit or the first few batches of records have less ndvs, the
+// aggregation is not done, then here the aggregation can be done cheaply as the records
+// are sorted based on group by key.
+public class GroupByCombiner extends VectorGroupByCombiner {
+
+  private static final Logger LOG = LoggerFactory.getLogger(
+      org.apache.hadoop.hive.ql.exec.GroupByCombiner.class.getName());
+
+  private transient GenericUDAFEvaluator[] aggregationEvaluators;
+  Deserializer valueDeserializer;
+  GenericUDAFEvaluator.AggregationBuffer[] aggregationBuffers;
+  GroupByOperator groupByOperator;
+  Serializer valueSerializer;
+  ObjectInspector aggrObjectInspector;
+  DataInputBuffer valueBuffer;
+  Object[] cachedValues;
+
+  public GroupByCombiner(TaskContext taskContext) throws HiveException, IOException {
+    super(taskContext);
+    if (rw != null) {
+      try {
+        groupByOperator = (GroupByOperator) rw.getReducer();
+
+        ArrayList ois = new ArrayList();
+        ois.add(keyObjectInspector);
+        ois.add(valueObjectInspector);
+        ObjectInspector[] rowObjectInspector = new ObjectInspector[1];
+        rowObjectInspector[0] =
+            ObjectInspectorFactory.getStandardStructObjectInspector(Utilities.reduceFieldNameList,
+                ois);
+        groupByOperator.setInputObjInspectors(rowObjectInspector);
+        groupByOperator.initializeOp(conf);
+        aggregationBuffers = groupByOperator.getAggregationBuffers();
+        aggregationEvaluators = groupByOperator.getAggregationEvaluator();
+
+        TableDesc valueTableDesc = rw.getTagToValueDesc().get(0);
+        valueSerializer
[jira] [Work logged] (HIVE-24497) Node heartbeats from LLAP Daemon to the client are not matching leading to timeout in cloud environment
[ https://issues.apache.org/jira/browse/HIVE-24497?focusedWorklogId=522536=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-522536 ] ASF GitHub Bot logged work on HIVE-24497: - Author: ASF GitHub Bot Created on: 10/Dec/20 04:39 Start Date: 10/Dec/20 04:39 Worklog Time Spent: 10m Work Description: simhadri-g commented on pull request #1755: URL: https://github.com/apache/hive/pull/1755#issuecomment-742234677 Thanks @prasanthj for the review. I have made the recommended changes. Please take a look. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 522536) Time Spent: 40m (was: 0.5h) > Node heartbeats from LLAP Daemon to the client are not matching leading to > timeout in cloud environment > --- > > Key: HIVE-24497 > URL: https://issues.apache.org/jira/browse/HIVE-24497 > Project: Hive > Issue Type: Sub-task >Reporter: Simhadri G >Assignee: Simhadri G >Priority: Minor > Labels: pull-request-available > Attachments: hive-24497.01.patch > > Time Spent: 40m > Remaining Estimate: 0h > > Node heartbeat contains info about all the tasks that were submitted to that > LLAP Daemon. In a cloud deployment, the client is not able to match these > heartbeats due to differences in hostname and port. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24471) Add support for combiner in hash mode group aggregation
[ https://issues.apache.org/jira/browse/HIVE-24471?focusedWorklogId=522535=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-522535 ] ASF GitHub Bot logged work on HIVE-24471: - Author: ASF GitHub Bot Created on: 10/Dec/20 04:39 Start Date: 10/Dec/20 04:39 Worklog Time Spent: 10m Work Description: t3rmin4t0r commented on a change in pull request #1736: URL: https://github.com/apache/hive/pull/1736#discussion_r539838422 ## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByCombiner.java ## @@ -0,0 +1,246 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.hadoop.hive.ql.exec; + +import org.apache.hadoop.hive.conf.HiveConf; +import org.apache.hadoop.hive.ql.exec.vector.VectorGroupByCombiner; +import org.apache.hadoop.hive.ql.exec.vector.VectorGroupByOperator; +import org.apache.hadoop.mapred.JobConf; +import org.apache.hadoop.hive.ql.metadata.HiveException; +import org.apache.hadoop.hive.ql.plan.BaseWork; +import org.apache.hadoop.hive.ql.plan.GroupByDesc; +import org.apache.hadoop.hive.ql.plan.ReduceWork; +import org.apache.hadoop.hive.ql.plan.TableDesc; +import org.apache.hadoop.hive.ql.udf.generic.GenericUDAFEvaluator; +import org.apache.hadoop.hive.serde2.AbstractSerDe; +import org.apache.hadoop.hive.serde2.Deserializer; +import org.apache.hadoop.hive.serde2.SerDeException; +import org.apache.hadoop.hive.serde2.SerDeUtils; +import org.apache.hadoop.hive.serde2.Serializer; +import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector; +import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory; +import org.apache.hadoop.io.BytesWritable; +import org.apache.hadoop.io.DataInputBuffer; +import org.apache.hadoop.util.ReflectionUtils; +import org.apache.tez.runtime.api.TaskContext; +import org.apache.tez.runtime.library.common.sort.impl.IFile; +import org.apache.tez.runtime.library.common.sort.impl.TezRawKeyValueIterator; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; +import org.apache.hadoop.fs.Path; + +import java.io.IOException; +import java.util.ArrayList; + +import static org.apache.hadoop.hive.ql.exec.Utilities.HAS_REDUCE_WORK; +import static org.apache.hadoop.hive.ql.exec.Utilities.REDUCE_PLAN_NAME; + +// Combiner for normal group by operator. In case of map side aggregate, the partially +// aggregated records are sorted based on group by key. 
If because of some reasons, like hash +// table memory exceeded the limit or the first few batches of records have less ndvs, the +// aggregation is not done, then here the aggregation can be done cheaply as the records +// are sorted based on group by key. +public class GroupByCombiner extends VectorGroupByCombiner { + + private static final Logger LOG = LoggerFactory.getLogger( + org.apache.hadoop.hive.ql.exec.GroupByCombiner.class.getName()); + + private transient GenericUDAFEvaluator[] aggregationEvaluators; + Deserializer valueDeserializer; + GenericUDAFEvaluator.AggregationBuffer[] aggregationBuffers; + GroupByOperator groupByOperator; + Serializer valueSerializer; + ObjectInspector aggrObjectInspector; + DataInputBuffer valueBuffer; + Object[] cachedValues; + + public GroupByCombiner(TaskContext taskContext) throws HiveException, IOException { +super(taskContext); +if (rw != null) { + try { +groupByOperator = (GroupByOperator) rw.getReducer(); + +ArrayList ois = new ArrayList(); +ois.add(keyObjectInspector); +ois.add(valueObjectInspector); +ObjectInspector[] rowObjectInspector = new ObjectInspector[1]; +rowObjectInspector[0] = + ObjectInspectorFactory.getStandardStructObjectInspector(Utilities.reduceFieldNameList, +ois); +groupByOperator.setInputObjInspectors(rowObjectInspector); +groupByOperator.initializeOp(conf); +aggregationBuffers = groupByOperator.getAggregationBuffers(); +aggregationEvaluators = groupByOperator.getAggregationEvaluator(); + +TableDesc valueTableDesc = rw.getTagToValueDesc().get(0); +valueSerializer
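The class comment in the diff above says that, because the partially aggregated records arrive sorted by group-by key, any aggregation the map side skipped can be finished cheaply in the combiner. A minimal sketch of that idea follows, assuming string keys and a SUM over long values. `SortedRunCombiner` is a hypothetical name for illustration and is not the `GroupByCombiner` from the pull request, which works on serialized keys and `GenericUDAFEvaluator` buffers.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not the GroupByCombiner from the pull request.
class SortedRunCombiner {

  // Merges adjacent entries that share a key by summing their partial values.
  // Assumes the input is already sorted by (non-null) key, so a single pass
  // with one running buffer is enough and no hash table is needed.
  static List<Map.Entry<String, Long>> combine(List<Map.Entry<String, Long>> sorted) {
    List<Map.Entry<String, Long>> out = new ArrayList<>();
    String currentKey = null;
    long currentSum = 0L;
    for (Map.Entry<String, Long> e : sorted) {
      if (currentKey != null && !currentKey.equals(e.getKey())) {
        // Key changed: the previous group is complete, so emit it.
        out.add(new SimpleEntry<>(currentKey, currentSum));
        currentSum = 0L;
      }
      currentKey = e.getKey();
      currentSum += e.getValue();
    }
    if (currentKey != null) {
      // Emit the last open group.
      out.add(new SimpleEntry<>(currentKey, currentSum));
    }
    return out;
  }
}
```

Because only one group is open at a time, memory use stays constant regardless of the number of distinct keys, which is why the comment calls this aggregation cheap.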
[jira] [Work logged] (HIVE-24207) LimitOperator can leverage ObjectCache to bail out quickly
[ https://issues.apache.org/jira/browse/HIVE-24207?focusedWorklogId=522521=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-522521 ] ASF GitHub Bot logged work on HIVE-24207: - Author: ASF GitHub Bot Created on: 10/Dec/20 03:42 Start Date: 10/Dec/20 03:42 Worklog Time Spent: 10m Work Description: rbalamohan commented on pull request #1556: URL: https://github.com/apache/hive/pull/1556#issuecomment-742218526 select ss_sold_date_sk from store_sales, date_dim where date_dim.d_year in (1998,1998+1,1998+2) and store_sales.ss_sold_date_sk = date_dim.d_date_sk limit 100; The above query runs in **80+ seconds** in a small cluster with cloud storage, whereas with the patch it took just **4 seconds.** So that is good news. :) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 522521) Time Spent: 50m (was: 40m) > LimitOperator can leverage ObjectCache to bail out quickly > -- > > Key: HIVE-24207 > URL: https://issues.apache.org/jira/browse/HIVE-24207 > Project: Hive > Issue Type: Improvement >Reporter: Rajesh Balamohan >Assignee: László Bodor >Priority: Major > Labels: pull-request-available > Time Spent: 50m > Remaining Estimate: 0h > > {noformat} > select ss_sold_date_sk from store_sales, date_dim where date_dim.d_year in > (1998,1998+1,1998+2) and store_sales.ss_sold_date_sk = date_dim.d_date_sk > limit 100; > select distinct ss_sold_date_sk from store_sales, date_dim where > date_dim.d_year in (1998,1998+1,1998+2) and store_sales.ss_sold_date_sk = > date_dim.d_date_sk limit 100; > {noformat} > Queries like the above generate a large number of map tasks. Currently they > don't bail out after generating enough data. > It would be good to make use of ObjectCache & retain the number of records > generated. 
LimitOperator/VectorLimitOperator can bail out for the later tasks > in the operator's init phase itself. > https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorLimitOperator.java#L57 > https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/LimitOperator.java#L58 -- This message was sent by Atlassian Jira (v8.3.4#803005)
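The issue description above proposes keeping the number of generated records in a cache shared by tasks, so that later tasks can bail out in the operator's init phase. A minimal sketch of that pattern follows. `SharedLimitCounter`, the cache key, and the static map are all assumptions made for illustration: the map stands in for a per-process object cache and is not Hive's `ObjectCache` API, and the code is not taken from the patch.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a per-process object cache; not Hive's ObjectCache API.
class SharedLimitCounter {
  // One counter per cache key, shared by every task running in this JVM.
  private static final ConcurrentMap<String, AtomicInteger> CACHE = new ConcurrentHashMap<>();

  private final AtomicInteger emitted;
  private final int limit;

  SharedLimitCounter(String cacheKey, int limit) {
    // The first task to arrive creates the counter; later tasks reuse it.
    this.emitted = CACHE.computeIfAbsent(cacheKey, k -> new AtomicInteger());
    this.limit = limit;
  }

  // Meant to be checked during operator init: true if earlier tasks
  // already produced enough rows, so this task can finish immediately.
  boolean limitAlreadyReached() {
    return emitted.get() >= limit;
  }

  // Meant to be called once per row: true if the row is still within the limit.
  boolean tryEmit() {
    return emitted.incrementAndGet() <= limit;
  }
}
```

A shared counter like this only helps tasks that run in the same process, which is consistent with the large gain reported above for a long-lived daemon setup; how the real patch scopes and keys the counter is not shown in this message.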
[jira] [Work logged] (HIVE-24207) LimitOperator can leverage ObjectCache to bail out quickly
[ https://issues.apache.org/jira/browse/HIVE-24207?focusedWorklogId=522520=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-522520 ] ASF GitHub Bot logged work on HIVE-24207: - Author: ASF GitHub Bot Created on: 10/Dec/20 03:39 Start Date: 10/Dec/20 03:39 Worklog Time Spent: 10m Work Description: rbalamohan commented on a change in pull request #1556: URL: https://github.com/apache/hive/pull/1556#discussion_r539820370 ## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/tez/DagUtils.java ## @@ -1811,4 +1819,11 @@ static long parseRightmostXmx(String javaOpts) { } return allNonAppFileResources; } + + public static void initTezAttributes(Configuration conf, ProcessorContext context) { Review comment: Move this in TezProcessor itself? It is not DAG specific and is needed mainly for logging purposes. Having it within TezProcessor will make it easier to read and not confuse anyone trying to walk through DagUtils. ## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/LimitOperator.java ## @@ -19,11 +19,18 @@ package org.apache.hadoop.hive.ql.exec; import java.io.Serializable; +import java.util.concurrent.Callable; +import java.util.concurrent.atomic.AtomicBoolean; +import java.util.concurrent.atomic.AtomicInteger; import org.apache.hadoop.conf.Configuration; +import org.apache.hadoop.hive.conf.HiveConf; import org.apache.hadoop.hive.ql.CompilationOpContext; +import org.apache.hadoop.hive.ql.exec.tez.DagUtils; +import org.apache.hadoop.hive.ql.exec.tez.LlapObjectCache; import org.apache.hadoop.hive.ql.metadata.HiveException; import org.apache.hadoop.hive.ql.plan.LimitDesc; +import org.apache.hadoop.hive.ql.plan.OperatorDesc; Review comment: remove this? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 522520) Time Spent: 40m (was: 0.5h) > LimitOperator can leverage ObjectCache to bail out quickly > -- > > Key: HIVE-24207 > URL: https://issues.apache.org/jira/browse/HIVE-24207 > Project: Hive > Issue Type: Improvement >Reporter: Rajesh Balamohan >Assignee: László Bodor >Priority: Major > Labels: pull-request-available > Time Spent: 40m > Remaining Estimate: 0h > > {noformat} > select ss_sold_date_sk from store_sales, date_dim where date_dim.d_year in > (1998,1998+1,1998+2) and store_sales.ss_sold_date_sk = date_dim.d_date_sk > limit 100; > select distinct ss_sold_date_sk from store_sales, date_dim where > date_dim.d_year in (1998,1998+1,1998+2) and store_sales.ss_sold_date_sk = > date_dim.d_date_sk limit 100; > {noformat} > Queries like the above generate a large number of map tasks. Currently they > don't bail out after generating enough data. > It would be good to make use of ObjectCache & retain the number of records > generated. LimitOperator/VectorLimitOperator can bail out for the later tasks > in the operator's init phase itself. > https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorLimitOperator.java#L57 > https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/LimitOperator.java#L58 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-24513) Advance write Id during AlterTableDropConstraint DDL
[ https://issues.apache.org/jira/browse/HIVE-24513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kishen Das reassigned HIVE-24513: - > Advance write Id during AlterTableDropConstraint DDL > > > Key: HIVE-24513 > URL: https://issues.apache.org/jira/browse/HIVE-24513 > Project: Hive > Issue Type: Sub-task >Reporter: Kishen Das >Assignee: Kishen Das >Priority: Major > > For AlterTableDropConstraint related DDL tasks, although we might be > advancing the write ID, looks like it's not updated correctly during the > Analyzer phase. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-22415) Upgrade to Java 11
[ https://issues.apache.org/jira/browse/HIVE-22415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17246965#comment-17246965 ] David Mollitor commented on HIVE-22415: --- Alternatively, it may be possible to load each Mini Cluster (HDFS, ZK, Druid, Kafka, etc.) into its own class loader so that these library conflicts (JAR HELL) are averted. > Upgrade to Java 11 > -- > > Key: HIVE-22415 > URL: https://issues.apache.org/jira/browse/HIVE-22415 > Project: Hive > Issue Type: Improvement >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Critical > Labels: pull-request-available > Time Spent: 4h > Remaining Estimate: 0h > > Upgrade Hive to Java JDK 11 -- This message was sent by Atlassian Jira (v8.3.4#803005)
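The comment above suggests loading each mini cluster into its own class loader. A minimal sketch of the isolation mechanics follows, using only the JDK's `URLClassLoader`. `IsolatedLoaders` and its methods are hypothetical names for illustration; nothing here comes from Hive's test harness, and wiring real mini clusters this way would need considerably more work (shared interfaces, thread context class loaders, shutdown).

```java
import java.net.URL;
import java.net.URLClassLoader;

// Hypothetical illustration of class loader isolation; not Hive test code.
class IsolatedLoaders {

  // Creates a loader that sees only the given JARs plus the JDK's own classes.
  // The parent is the parent of the system (application) class loader, so
  // nothing on the application classpath leaks into the isolated loader.
  static URLClassLoader isolated(URL[] jars) {
    return new URLClassLoader(jars, ClassLoader.getSystemClassLoader().getParent());
  }

  // Returns true if the named class is visible through the given loader.
  static boolean canLoad(ClassLoader loader, String className) {
    try {
      Class.forName(className, false, loader);
      return true;
    } catch (ClassNotFoundException e) {
      return false;
    }
  }
}
```

Two loaders built this way over different JAR sets can each hold their own copy of a conflicting library, because a class's identity is the pair of its name and its defining loader.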
[jira] [Comment Edited] (HIVE-22415) Upgrade to Java 11
[ https://issues.apache.org/jira/browse/HIVE-22415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17246960#comment-17246960 ] David Mollitor edited comment on HIVE-22415 at 12/10/20, 2:08 AM: -- OK, just wanted to provide an update here. I have been working hard on getting Hadoop 3.3 working with Hive, so that JDK 11 can be supported, and it's been a challenge. [HIVE-24484] I have worked through some of the initial pain points, but I got stuck. Hadoop introduced a new RPC mechanism using Google Protobuf v3. Some of the LLAP stuff was built on top of the existing Hadoop RPC code with Protobuf2. It seems that Hadoop tried to allow for interoperability between the two RPCs, however, loading one version of the RPC engine blocks the loading of the other one (first one wins). I think this becomes an issue for QTests since the tests may spin up an LLAP and a Hadoop mini cluster in the same classloader context. Simply loading the Protobuf3 Hadoop RPC (NameNode) code blocks the loading of the Protobuf2 Hadoop RPC (LLAP) code. Without any changes on the Hadoop side to better support this setup, the LLAP code needs to be migrated to use Protobuf3 and to use the Hadoop third-party JAR with its shaded Protobuf version. https://github.com/apache/hadoop/blob/0a45bd034e1ce08c556227bb2c815c15be17cf10/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/ProtobufRpcEngine2.java#L63-L67 https://github.com/apache/hadoop/blob/0a45bd034e1ce08c556227bb2c815c15be17cf10/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/ProtobufRpcEngine.java#L70-L74 was (Author: belugabehr): OK, just wanted to provide an update here. I have been working hard on getting Hadoop 3.3 working with Hive, so that JDK 11 can be supported, and it's been a challenge. [HIVE-24484] I have worked through some of the initial pain points, but I got stuck. Hadoop introduced a new RPC mechanism using Google Protobuf v3. 
Some of the LLAP stuff was built on top of the existing Hadoop RPC code with Protobuf2. It seems that Hadoop tried to all for interoperability between the two RPCs, however, loading one version of the RPC engine blocks the loading of the other one (first one wins). I think this becomes an issue for QTests since the tests may spin up an LLAP and a Hadoop mini cluster in the same classloader context. Simply loading the Protobuf3 Hadoop RPC (NameNode) code blocks the loading of the Protobuf2 Hadoop RPC (LLAP) code. Without any changes on the Hadoop side to better support this setup, the LLAP code needs to be migrated to use Protobuf3 and to use the Hadoop 3rd part JAR with its shaded Protobuf version. https://github.com/apache/hadoop/blob/0a45bd034e1ce08c556227bb2c815c15be17cf10/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/ProtobufRpcEngine2.java#L63-L67 https://github.com/apache/hadoop/blob/0a45bd034e1ce08c556227bb2c815c15be17cf10/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/ProtobufRpcEngine.java#L70-L74 > Upgrade to Java 11 > -- > > Key: HIVE-22415 > URL: https://issues.apache.org/jira/browse/HIVE-22415 > Project: Hive > Issue Type: Improvement >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Critical > Labels: pull-request-available > Time Spent: 4h > Remaining Estimate: 0h > > Upgrade Hive to Java JDK 11 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-22415) Upgrade to Java 11
[ https://issues.apache.org/jira/browse/HIVE-22415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17246960#comment-17246960 ] David Mollitor commented on HIVE-22415: --- OK, just wanted to provide an update here. I have been working hard on getting Hadoop 3.3 working with Hive, so that JDK 11 can be supported, and it's been a challenge. [HIVE-24484] I have worked through some of the initial pain points, but I got stuck. Hadoop introduced a new RPC mechanism using Google Protobuf v3. Some of the LLAP stuff was built on top of the existing Hadoop RPC code with Protobuf2. It seems that Hadoop tried to allow for interoperability between the two RPCs, however, loading one version of the RPC engine blocks the loading of the other one (first one wins). I think this becomes an issue for QTests since the tests may spin up an LLAP and a Hadoop mini cluster in the same classloader context. Simply loading the Protobuf3 Hadoop RPC (NameNode) code blocks the loading of the Protobuf2 Hadoop RPC (LLAP) code. Without any changes on the Hadoop side to better support this setup, the LLAP code needs to be migrated to use Protobuf3 and to use the Hadoop third-party JAR with its shaded Protobuf version. https://github.com/apache/hadoop/blob/0a45bd034e1ce08c556227bb2c815c15be17cf10/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/ProtobufRpcEngine2.java#L63-L67 https://github.com/apache/hadoop/blob/0a45bd034e1ce08c556227bb2c815c15be17cf10/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/ProtobufRpcEngine.java#L70-L74 > Upgrade to Java 11 > -- > > Key: HIVE-22415 > URL: https://issues.apache.org/jira/browse/HIVE-22415 > Project: Hive > Issue Type: Improvement >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Critical > Labels: pull-request-available > Time Spent: 4h > Remaining Estimate: 0h > > Upgrade Hive to Java JDK 11 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24274) Implement Query Text based MaterializedView rewrite
[ https://issues.apache.org/jira/browse/HIVE-24274?focusedWorklogId=522496=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-522496 ] ASF GitHub Bot logged work on HIVE-24274: - Author: ASF GitHub Bot Created on: 10/Dec/20 01:17 Start Date: 10/Dec/20 01:17 Worklog Time Spent: 10m Work Description: jcamachor commented on a change in pull request #1706: URL: https://github.com/apache/hive/pull/1706#discussion_r539730309 ## File path: ql/src/java/org/apache/hadoop/hive/ql/ddl/view/materialized/alter/rebuild/AlterMaterializedViewRebuildAnalyzer.java ## @@ -57,7 +57,7 @@ public void analyzeInternal(ASTNode root) throws SemanticException { ASTNode tableTree = (ASTNode) root.getChild(0); TableName tableName = getQualifiedTableName(tableTree); -if (ctx.enableUnparse()) { +if (ctx.isScheduledQuery()) { unparseTranslator.addTableNameTranslation(tableTree, SessionState.get().getCurrentDatabase()); Review comment: Can we add a comment (I know that the code was not added in this patch but it is useful to have some clarification on why this is being done)? ## File path: ql/src/java/org/apache/hadoop/hive/ql/Context.java ## @@ -336,6 +344,9 @@ private Context(Configuration conf, String executionId) { opContext = new CompilationOpContext(); viewsTokenRewriteStreams = new HashMap<>(); +enableUnparse = Review comment: Can we add a comment why we are only enabling this when this config value is true? `enableUnparse` documentation has a description on why it is not enabled in general. However, it is worth having a comment here, since it is difficult to establish the connection between the config property and the variable. ## File path: ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java ## @@ -1945,6 +1945,18 @@ public RelOptMaterialization getMaterializedViewForRebuild(String dbName, String } } + public List getMaterialization( Review comment: add javadoc? Also, should this method be renamed to `getSQLMatchingMaterializedView` or anything more descriptive? 
## File path: ql/src/test/queries/clientpositive/materialized_view_create_rewrite.q ## @@ -5,6 +5,7 @@ set hive.support.concurrency=true; set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager; set hive.strict.checks.cartesian.product=false; set hive.materializedview.rewriting=true; +set hive.materializedview.rewriting.query.text=false; Review comment: Why do we disable it here? ## File path: ql/src/java/org/apache/hadoop/hive/ql/metadata/MaterializedViewsCache.java ## @@ -0,0 +1,173 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.hadoop.hive.ql.metadata; + +import org.apache.calcite.plan.RelOptMaterialization; +import org.apache.hadoop.hive.ql.optimizer.calcite.rules.views.HiveMaterializedViewUtils; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.util.ArrayList; +import java.util.Collections; +import java.util.List; +import java.util.Map; +import java.util.concurrent.ConcurrentHashMap; +import java.util.concurrent.ConcurrentMap; +import java.util.function.BiFunction; + +import static java.util.Collections.emptyList; +import static java.util.Collections.unmodifiableList; + +/** + * Collection for storing {@link RelOptMaterialization}s. 
+ * RelOptMaterialization can be lookup by + * - the Materialized View fully qualified name + * - query text. + * This implementation contains two {@link ConcurrentHashMap} one for name based and one for query text based lookup. + * The map contents are synchronized during each dml operation: Dml operations are performed initially on the map + * which provides name based lookup. The map which provides query text based lookup is updated by lambda expressions + * passed to {@link ConcurrentHashMap#compute(Object, BiFunction)}. + */ +public class MaterializedViewsCache { + private static final Logger LOG = LoggerFactory.getLogger(MaterializedViewsCache.class); + + // Key is the database name.
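The javadoc quoted above describes a cache with two concurrent maps: one keyed by the materialized view name and one keyed by query text, with the second map updated from inside lambdas passed to `ConcurrentHashMap#compute`. A minimal sketch of that two-index pattern follows. `TwoIndexCache` and its method names are hypothetical and simplified (flat string keys, no per-database nesting); it is not the `MaterializedViewsCache` from the pull request.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical two-index cache; not the MaterializedViewsCache from the patch.
class TwoIndexCache<V> {
  private final ConcurrentMap<String, V> nameIndex = new ConcurrentHashMap<>();
  private final ConcurrentMap<String, List<V>> queryTextIndex = new ConcurrentHashMap<>();

  // Adds the value under its name; the query text index is updated inside the
  // name index's atomic callback, so both maps change as one step per name.
  void putIfAbsent(String name, String queryText, V value) {
    nameIndex.computeIfAbsent(name, n -> {
      queryTextIndex.compute(queryText, (q, existing) -> {
        // Copy-on-write: readers never observe a list while it is being modified.
        List<V> updated = existing == null ? new ArrayList<>() : new ArrayList<>(existing);
        updated.add(value);
        return updated;
      });
      return value;
    });
  }

  // Removes the value from both indexes; returning null drops the mapping.
  void remove(String name, String queryText) {
    nameIndex.computeIfPresent(name, (n, existing) -> {
      queryTextIndex.computeIfPresent(queryText, (q, list) -> {
        List<V> updated = new ArrayList<>(list);
        updated.remove(existing);
        return updated.isEmpty() ? null : updated;
      });
      return null;
    });
  }

  V byName(String name) {
    return nameIndex.get(name);
  }

  List<V> byQueryText(String queryText) {
    List<V> list = queryTextIndex.get(queryText);
    return list == null ? Collections.<V>emptyList() : Collections.unmodifiableList(list);
  }
}
```

The nested callback touches a different map instance than the one whose `compute` is running, which stays within `ConcurrentHashMap`'s rule that a mapping function must not modify other mappings of the same map.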
[jira] [Work logged] (HIVE-24254) Remove setOwner call in ReplChangeManager
[ https://issues.apache.org/jira/browse/HIVE-24254?focusedWorklogId=522487=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-522487 ] ASF GitHub Bot logged work on HIVE-24254: - Author: ASF GitHub Bot Created on: 10/Dec/20 00:50 Start Date: 10/Dec/20 00:50 Worklog Time Spent: 10m Work Description: github-actions[bot] commented on pull request #1567: URL: https://github.com/apache/hive/pull/1567#issuecomment-742159729 This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Feel free to reach out on the d...@hive.apache.org list if the patch is in need of reviews. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 522487) Time Spent: 0.5h (was: 20m) > Remove setOwner call in ReplChangeManager > - > > Key: HIVE-24254 > URL: https://issues.apache.org/jira/browse/HIVE-24254 > Project: Hive > Issue Type: Task >Reporter: Aasha Medhi >Assignee: Aasha Medhi >Priority: Major > Labels: pull-request-available > Attachments: HIVE-24254.01.patch, HIVE-24254.02.patch, > HIVE-24254.03.patch > > Time Spent: 0.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-24218) Drop table used by a materialized view
[ https://issues.apache.org/jira/browse/HIVE-24218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17246910#comment-17246910 ] Pritha Dawn commented on HIVE-24218: Duplicate of https://issues.apache.org/jira/browse/HIVE-22566 > Drop table used by a materialized view > -- > > Key: HIVE-24218 > URL: https://issues.apache.org/jira/browse/HIVE-24218 > Project: Hive > Issue Type: Bug > Components: CLI, Hive, HiveServer2, Metastore >Affects Versions: 3.1.0 >Reporter: stephbat >Priority: Critical > > I have discovered that it's possible to drop a table used by a materialized > view. When I drop this table, the result is OK, while I think this action > should be refused. When I check in the metastore database, I can see that the > table has been partially deleted (i.e., the reference of the table still > exists in TBLS and in MV_TABLES_USED). This introduces an inconsistency in > the metastore. > Steps to reproduce: > {code:java} > jdbc:hive2://localhost.> use use ptest2_db_dev; > No rows affected (0.067 seconds) > 0: jdbc:hive2://localhost.> create table table_blocked (id string); > No rows affected (0.97 seconds) > 0: jdbc:hive2://localhost.> desc table_blocked; > +---++--+ > | col_name | data_type | comment | > +---++--+ > | id| string | | > +---++--+ > 1 row selected (0.171 seconds) > 0: jdbc:hive2://localhost.> create materialized view table_blocked_mv as > select * from table_blocked; > No rows affected (18.055 seconds) > 0: jdbc:hive2://localhost.> desc table_blocked_mv; > +---++--+ > | col_name | data_type | comment | > +---++--+ > | id| string | | > +---++--+ > 1 row selected (0.316 seconds) > 0: jdbc:hive2://localhost.> drop table table_blocked; > No rows affected (10.803 seconds) > 0: jdbc:hive2://localhost.> desc table_blocked_mv; > +---++--+ > | col_name | data_type | comment | > +---++--+ > | id| string | | > +---++--+ > 1 row selected (0.222 seconds) > 0: jdbc:hive2://localhost.> desc table_blocked; > Error: Error while compiling 
statement: FAILED: SemanticException Unable to > fetch table table_blocked. null (state=42000,code=4) > 0: jdbc:hive2://localhost.> select * from table_blocked_mv; > Error: Error while compiling statement: FAILED: SemanticException Table > ptest2_db_dev.table_blocked not found when trying to obtain it to check > masking/filtering policies (state=42000,code=4) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24512) Exclude calcite in packaging Hive
[ https://issues.apache.org/jira/browse/HIVE-24512?focusedWorklogId=522468=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-522468 ] ASF GitHub Bot logged work on HIVE-24512: - Author: ASF GitHub Bot Created on: 09/Dec/20 23:20 Start Date: 09/Dec/20 23:20 Worklog Time Spent: 10m Work Description: viirya commented on pull request #1760: URL: https://github.com/apache/hive/pull/1760#issuecomment-742125772 Thanks @sunchao This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 522468) Time Spent: 50m (was: 40m) > Exclude calcite in packaging Hive > - > > Key: HIVE-24512 > URL: https://issues.apache.org/jira/browse/HIVE-24512 > Project: Hive > Issue Type: Bug >Affects Versions: 2.3.8 >Reporter: L. C. Hsieh >Assignee: L. C. Hsieh >Priority: Major > Labels: pull-request-available > Fix For: 2.3.8 > > Time Spent: 50m > Remaining Estimate: 0h > > The issue is similar to HIVE-23593. In 2.3.8 RC, ql already has a shaded > calcite, but we see the following error: > Caused by: java.lang.NoSuchMethodError: > org.apache.calcite.rel.RelCollationImpl.<init>(Lorg/apache/hive/com/google/common/collect/ImmutableList;)V > at > org.apache.hadoop.hive.ql.optimizer.calcite.HiveRelCollation.<init>(HiveRelCollation.java:29) > at > org.apache.hadoop.hive.ql.optimizer.calcite.RelOptHiveTable.getCollationList(RelOptHiveTable.java:181) > at > org.apache.calcite.rel.metadata.RelMdCollation.table(RelMdCollation.java:175) > We find that the 2.3.8 binary distribution contains these calcite jars: > calcite-core-1.10.0.jar > calcite-druid-1.10.0.jar > calcite-linq4j-1.10.0.jar -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24388) Enhance swo optimizations to merge EventOperators
[ https://issues.apache.org/jira/browse/HIVE-24388?focusedWorklogId=522465=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-522465 ] ASF GitHub Bot logged work on HIVE-24388: - Author: ASF GitHub Bot Created on: 09/Dec/20 23:08 Start Date: 09/Dec/20 23:08 Worklog Time Spent: 10m Work Description: jcamachor commented on a change in pull request #1750: URL: https://github.com/apache/hive/pull/1750#discussion_r539697350 ## File path: ql/src/java/org/apache/hadoop/hive/ql/optimizer/SharedWorkOptimizer.java ## @@ -17,6 +17,7 @@ */ package org.apache.hadoop.hive.ql.optimizer; +import java.io.File; Review comment: Needed? ## File path: ql/src/test/results/clientpositive/llap/swo_event_merge.q.out ## @@ -0,0 +1,291 @@ +PREHOOK: query: drop table if exists x1_store_sales +PREHOOK: type: DROPTABLE +POSTHOOK: query: drop table if exists x1_store_sales +POSTHOOK: type: DROPTABLE +PREHOOK: query: drop table if exists x1_date_dim +PREHOOK: type: DROPTABLE +POSTHOOK: query: drop table if exists x1_date_dim +POSTHOOK: type: DROPTABLE +PREHOOK: query: drop table if exists x1_item +PREHOOK: type: DROPTABLE +POSTHOOK: query: drop table if exists x1_item +POSTHOOK: type: DROPTABLE +PREHOOK: query: create table x1_store_sales +( + ss_item_sk int +) +partitioned by (ss_sold_date_sk int) +stored as orc +PREHOOK: type: CREATETABLE +PREHOOK: Output: database:default +PREHOOK: Output: default@x1_store_sales +POSTHOOK: query: create table x1_store_sales +( + ss_item_sk int +) +partitioned by (ss_sold_date_sk int) +stored as orc +POSTHOOK: type: CREATETABLE +POSTHOOK: Output: database:default +POSTHOOK: Output: default@x1_store_sales +PREHOOK: query: create table x1_date_dim +( + d_date_sk int, + d_month_seq int, + d_year int, + d_moy int +) +stored as orc +PREHOOK: type: CREATETABLE +PREHOOK: Output: database:default +PREHOOK: Output: default@x1_date_dim +POSTHOOK: query: create table x1_date_dim +( + d_date_sk int, + d_month_seq int, + d_year int, + d_moy int +) 
+stored as orc +POSTHOOK: type: CREATETABLE +POSTHOOK: Output: database:default +POSTHOOK: Output: default@x1_date_dim +PREHOOK: query: insert into x1_date_dim values (1,1,2000,2), + (2,2,2001,2) +PREHOOK: type: QUERY +PREHOOK: Input: _dummy_database@_dummy_table +PREHOOK: Output: default@x1_date_dim +POSTHOOK: query: insert into x1_date_dim values(1,1,2000,2), + (2,2,2001,2) +POSTHOOK: type: QUERY +POSTHOOK: Input: _dummy_database@_dummy_table +POSTHOOK: Output: default@x1_date_dim +POSTHOOK: Lineage: x1_date_dim.d_date_sk SCRIPT [] +POSTHOOK: Lineage: x1_date_dim.d_month_seq SCRIPT [] +POSTHOOK: Lineage: x1_date_dim.d_moy SCRIPT [] +POSTHOOK: Lineage: x1_date_dim.d_year SCRIPT [] +PREHOOK: query: insert into x1_store_sales partition (ss_sold_date_sk=1) values (1) +PREHOOK: type: QUERY +PREHOOK: Input: _dummy_database@_dummy_table +PREHOOK: Output: default@x1_store_sales@ss_sold_date_sk=1 +POSTHOOK: query: insert into x1_store_sales partition (ss_sold_date_sk=1) values (1) +POSTHOOK: type: QUERY +POSTHOOK: Input: _dummy_database@_dummy_table +POSTHOOK: Output: default@x1_store_sales@ss_sold_date_sk=1 +POSTHOOK: Lineage: x1_store_sales PARTITION(ss_sold_date_sk=1).ss_item_sk SCRIPT [] +PREHOOK: query: insert into x1_store_sales partition (ss_sold_date_sk=2) values (2) +PREHOOK: type: QUERY +PREHOOK: Input: _dummy_database@_dummy_table +PREHOOK: Output: default@x1_store_sales@ss_sold_date_sk=2 +POSTHOOK: query: insert into x1_store_sales partition (ss_sold_date_sk=2) values (2) +POSTHOOK: type: QUERY +POSTHOOK: Input: _dummy_database@_dummy_table +POSTHOOK: Output: default@x1_store_sales@ss_sold_date_sk=2 +POSTHOOK: Lineage: x1_store_sales PARTITION(ss_sold_date_sk=2).ss_item_sk SCRIPT [] +PREHOOK: query: alter table x1_store_sales partition (ss_sold_date_sk=1) update statistics set( +'numRows'='123456', +'rawDataSize'='1234567') +PREHOOK: type: ALTERTABLE_UPDATEPARTSTATS +PREHOOK: Input: default@x1_store_sales +PREHOOK: Output: 
default@x1_store_sales@ss_sold_date_sk=1 +POSTHOOK: query: alter table x1_store_sales partition (ss_sold_date_sk=1) update statistics set( +'numRows'='123456', +'rawDataSize'='1234567') +POSTHOOK: type: ALTERTABLE_UPDATEPARTSTATS +POSTHOOK: Input: default@x1_store_sales +POSTHOOK: Input: default@x1_store_sales@ss_sold_date_sk=1 +POSTHOOK: Output: default@x1_store_sales@ss_sold_date_sk=1 +PREHOOK: query: alter table x1_date_dim update statistics set( +'numRows'='56', +'rawDataSize'='81449') +PREHOOK: type: ALTERTABLE_UPDATETABLESTATS +PREHOOK: Input: default@x1_date_dim +PREHOOK: Output: default@x1_date_dim +POSTHOOK: query: alter table x1_date_dim update statistics set(
[jira] [Resolved] (HIVE-24512) Exclude calcite in packaging Hive
[ https://issues.apache.org/jira/browse/HIVE-24512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun resolved HIVE-24512. - Fix Version/s: 2.3.8 Hadoop Flags: Reviewed Assignee: L. C. Hsieh Resolution: Fixed > Exclude calcite in packaging Hive > - > > Key: HIVE-24512 > URL: https://issues.apache.org/jira/browse/HIVE-24512 > Project: Hive > Issue Type: Bug >Affects Versions: 2.3.8 >Reporter: L. C. Hsieh >Assignee: L. C. Hsieh >Priority: Major > Labels: pull-request-available > Fix For: 2.3.8 > > Time Spent: 40m > Remaining Estimate: 0h > > The issue is similar to HIVE-23593. In 2.3.8 RC, ql already has a shaded > calcite, but we see such error: > Caused by: java.lang.NoSuchMethodError: > org.apache.calcite.rel.RelCollationImpl.<init>(Lorg/apache/hive/com/google/common/collect/ImmutableList;)V > at > org.apache.hadoop.hive.ql.optimizer.calcite.HiveRelCollation.<init>(HiveRelCollation.java:29) > at > org.apache.hadoop.hive.ql.optimizer.calcite.RelOptHiveTable.getCollationList(RelOptHiveTable.java:181) > at > org.apache.calcite.rel.metadata.RelMdCollation.table(RelMdCollation.java:175) > We find in 2.3.8 binary distribution, there are calcite jars: > calcite-core-1.10.0.jar > calcite-druid-1.10.0.jar > calcite-linq4j-1.10.0.jar -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24512) Exclude calcite in packaging Hive
[ https://issues.apache.org/jira/browse/HIVE-24512?focusedWorklogId=522452=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-522452 ] ASF GitHub Bot logged work on HIVE-24512: - Author: ASF GitHub Bot Created on: 09/Dec/20 22:40 Start Date: 09/Dec/20 22:40 Worklog Time Spent: 10m Work Description: sunchao commented on pull request #1760: URL: https://github.com/apache/hive/pull/1760#issuecomment-742108787 CI test run finished and looks good. Merged to branch-2.3 and branch-2. Thanks @viirya . This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 522452) Time Spent: 40m (was: 0.5h) > Exclude calcite in packaging Hive > - > > Key: HIVE-24512 > URL: https://issues.apache.org/jira/browse/HIVE-24512 > Project: Hive > Issue Type: Bug >Affects Versions: 2.3.8 >Reporter: L. C. Hsieh >Priority: Major > Labels: pull-request-available > Time Spent: 40m > Remaining Estimate: 0h > > The issue is similar to HIVE-23593. In 2.3.8 RC, ql already has a shaded > calcite, but we see such error: > Caused by: java.lang.NoSuchMethodError: > org.apache.calcite.rel.RelCollationImpl.<init>(Lorg/apache/hive/com/google/common/collect/ImmutableList;)V > at > org.apache.hadoop.hive.ql.optimizer.calcite.HiveRelCollation.<init>(HiveRelCollation.java:29) > at > org.apache.hadoop.hive.ql.optimizer.calcite.RelOptHiveTable.getCollationList(RelOptHiveTable.java:181) > at > org.apache.calcite.rel.metadata.RelMdCollation.table(RelMdCollation.java:175) > We find in 2.3.8 binary distribution, there are calcite jars: > calcite-core-1.10.0.jar > calcite-druid-1.10.0.jar > calcite-linq4j-1.10.0.jar -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24512) Exclude calcite in packaging Hive
[ https://issues.apache.org/jira/browse/HIVE-24512?focusedWorklogId=522451=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-522451 ] ASF GitHub Bot logged work on HIVE-24512: - Author: ASF GitHub Bot Created on: 09/Dec/20 22:39 Start Date: 09/Dec/20 22:39 Worklog Time Spent: 10m Work Description: sunchao merged pull request #1760: URL: https://github.com/apache/hive/pull/1760 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 522451) Time Spent: 0.5h (was: 20m) > Exclude calcite in packaging Hive > - > > Key: HIVE-24512 > URL: https://issues.apache.org/jira/browse/HIVE-24512 > Project: Hive > Issue Type: Bug >Affects Versions: 2.3.8 >Reporter: L. C. Hsieh >Priority: Major > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > The issue is similar to HIVE-23593. In 2.3.8 RC, ql already has a shaded > calcite, but we see such error: > Caused by: java.lang.NoSuchMethodError: > org.apache.calcite.rel.RelCollationImpl.<init>(Lorg/apache/hive/com/google/common/collect/ImmutableList;)V > at > org.apache.hadoop.hive.ql.optimizer.calcite.HiveRelCollation.<init>(HiveRelCollation.java:29) > at > org.apache.hadoop.hive.ql.optimizer.calcite.RelOptHiveTable.getCollationList(RelOptHiveTable.java:181) > at > org.apache.calcite.rel.metadata.RelMdCollation.table(RelMdCollation.java:175) > We find in 2.3.8 binary distribution, there are calcite jars: > calcite-core-1.10.0.jar > calcite-druid-1.10.0.jar > calcite-linq4j-1.10.0.jar -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-24484) Upgrade Hadoop to 3.3.0
[ https://issues.apache.org/jira/browse/HIVE-24484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mollitor updated HIVE-24484: -- Summary: Upgrade Hadoop to 3.3.0 (was: Upgrade Hadoop to 3.2.1) > Upgrade Hadoop to 3.3.0 > --- > > Key: HIVE-24484 > URL: https://issues.apache.org/jira/browse/HIVE-24484 > Project: Hive > Issue Type: Improvement >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Major > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24468) Use Event Time instead of Current Time in Notification Log DB Entry
[ https://issues.apache.org/jira/browse/HIVE-24468?focusedWorklogId=522420=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-522420 ] ASF GitHub Bot logged work on HIVE-24468: - Author: ASF GitHub Bot Created on: 09/Dec/20 20:22 Start Date: 09/Dec/20 20:22 Worklog Time Spent: 10m Work Description: belugabehr opened a new pull request #1728: URL: https://github.com/apache/hive/pull/1728 …g DB Entry ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 522420) Time Spent: 2h (was: 1h 50m) > Use Event Time instead of Current Time in Notification Log DB Entry > --- > > Key: HIVE-24468 > URL: https://issues.apache.org/jira/browse/HIVE-24468 > Project: Hive > Issue Type: Improvement >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Major > Labels: pull-request-available > Time Spent: 2h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24468) Use Event Time instead of Current Time in Notification Log DB Entry
[ https://issues.apache.org/jira/browse/HIVE-24468?focusedWorklogId=522419=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-522419 ] ASF GitHub Bot logged work on HIVE-24468: - Author: ASF GitHub Bot Created on: 09/Dec/20 20:21 Start Date: 09/Dec/20 20:21 Worklog Time Spent: 10m Work Description: belugabehr closed pull request #1728: URL: https://github.com/apache/hive/pull/1728 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 522419) Time Spent: 1h 50m (was: 1h 40m) > Use Event Time instead of Current Time in Notification Log DB Entry > --- > > Key: HIVE-24468 > URL: https://issues.apache.org/jira/browse/HIVE-24468 > Project: Hive > Issue Type: Improvement >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Major > Labels: pull-request-available > Time Spent: 1h 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24512) Exclude calcite in packaging Hive
[ https://issues.apache.org/jira/browse/HIVE-24512?focusedWorklogId=522362=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-522362 ] ASF GitHub Bot logged work on HIVE-24512: - Author: ASF GitHub Bot Created on: 09/Dec/20 17:42 Start Date: 09/Dec/20 17:42 Worklog Time Spent: 10m Work Description: viirya commented on pull request #1760: URL: https://github.com/apache/hive/pull/1760#issuecomment-741936967 cc @sunchao This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 522362) Time Spent: 20m (was: 10m) > Exclude calcite in packaging Hive > - > > Key: HIVE-24512 > URL: https://issues.apache.org/jira/browse/HIVE-24512 > Project: Hive > Issue Type: Bug >Affects Versions: 2.3.8 >Reporter: L. C. Hsieh >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > The issue is similar to HIVE-23593. In 2.3.8 RC, ql already has a shaded > calcite, but we see such error: > Caused by: java.lang.NoSuchMethodError: > org.apache.calcite.rel.RelCollationImpl.<init>(Lorg/apache/hive/com/google/common/collect/ImmutableList;)V > at > org.apache.hadoop.hive.ql.optimizer.calcite.HiveRelCollation.<init>(HiveRelCollation.java:29) > at > org.apache.hadoop.hive.ql.optimizer.calcite.RelOptHiveTable.getCollationList(RelOptHiveTable.java:181) > at > org.apache.calcite.rel.metadata.RelMdCollation.table(RelMdCollation.java:175) > We find in 2.3.8 binary distribution, there are calcite jars: > calcite-core-1.10.0.jar > calcite-druid-1.10.0.jar > calcite-linq4j-1.10.0.jar -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24512) Exclude calcite in packaging Hive
[ https://issues.apache.org/jira/browse/HIVE-24512?focusedWorklogId=522361=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-522361 ] ASF GitHub Bot logged work on HIVE-24512: - Author: ASF GitHub Bot Created on: 09/Dec/20 17:42 Start Date: 09/Dec/20 17:42 Worklog Time Spent: 10m Work Description: viirya opened a new pull request #1760: URL: https://github.com/apache/hive/pull/1760 ### What changes were proposed in this pull request? This proposes to exclude calcite in packaging to avoid conflicting with shaded calcite in ql. ### Why are the changes needed? The issue is similar to HIVE-23593. In 2.3.8 RC, ql already has a shaded calcite, but we see such error: Caused by: java.lang.NoSuchMethodError: org.apache.calcite.rel.RelCollationImpl.<init>(Lorg/apache/hive/com/google/common/collect/ImmutableList;)V at org.apache.hadoop.hive.ql.optimizer.calcite.HiveRelCollation.<init>(HiveRelCollation.java:29) at org.apache.hadoop.hive.ql.optimizer.calcite.RelOptHiveTable.getCollationList(RelOptHiveTable.java:181) at org.apache.calcite.rel.metadata.RelMdCollation.table(RelMdCollation.java:175) We find in 2.3.8 binary distribution, there are calcite jars: calcite-core-1.10.0.jar calcite-druid-1.10.0.jar calcite-linq4j-1.10.0.jar We need to exclude calcite in packaging. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Unit test. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 522361) Remaining Estimate: 0h Time Spent: 10m > Exclude calcite in packaging Hive > - > > Key: HIVE-24512 > URL: https://issues.apache.org/jira/browse/HIVE-24512 > Project: Hive > Issue Type: Bug >Affects Versions: 2.3.8 >Reporter: L. C. 
Hsieh >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > The issue is similar to HIVE-23593. In 2.3.8 RC, ql already has a shaded > calcite, but we see such error: > Caused by: java.lang.NoSuchMethodError: > org.apache.calcite.rel.RelCollationImpl.<init>(Lorg/apache/hive/com/google/common/collect/ImmutableList;)V > at > org.apache.hadoop.hive.ql.optimizer.calcite.HiveRelCollation.<init>(HiveRelCollation.java:29) > at > org.apache.hadoop.hive.ql.optimizer.calcite.RelOptHiveTable.getCollationList(RelOptHiveTable.java:181) > at > org.apache.calcite.rel.metadata.RelMdCollation.table(RelMdCollation.java:175) > We find in 2.3.8 binary distribution, there are calcite jars: > calcite-core-1.10.0.jar > calcite-druid-1.10.0.jar > calcite-linq4j-1.10.0.jar -- This message was sent by Atlassian Jira (v8.3.4#803005)
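As a rough illustration of the check described in this report (it is not taken from pull request #1760), the sketch below flags jar names in a distribution's lib directory that look like standalone Calcite artifacts. The class name `StrayCalciteJars`, the `calcite-` prefix rule, and the sample `hive-exec-2.3.8.jar` entry are assumptions made for this example only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class StrayCalciteJars {
    // Returns the jar names that look like standalone (unshaded) Calcite artifacts.
    // Assumption for this sketch: such jars are named "calcite-<module>-<version>.jar".
    public static List<String> find(List<String> jarNames) {
        List<String> stray = new ArrayList<>();
        for (String name : jarNames) {
            if (name.startsWith("calcite-") && name.endsWith(".jar")) {
                stray.add(name);
            }
        }
        return stray;
    }

    public static void main(String[] args) {
        // Sample lib listing: the three Calcite jars named in the report plus one
        // illustrative non-Calcite jar.
        List<String> lib = Arrays.asList(
            "calcite-core-1.10.0.jar",
            "calcite-druid-1.10.0.jar",
            "calcite-linq4j-1.10.0.jar",
            "hive-exec-2.3.8.jar");
        System.out.println(find(lib));
    }
}
```

Any jar this reports would sit on the classpath next to the Calcite classes shaded into ql, which is the kind of conflict the NoSuchMethodError above points to.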
[jira] [Updated] (HIVE-24512) Exclude calcite in packaging Hive
[ https://issues.apache.org/jira/browse/HIVE-24512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-24512: -- Labels: pull-request-available (was: ) > Exclude calcite in packaging Hive > - > > Key: HIVE-24512 > URL: https://issues.apache.org/jira/browse/HIVE-24512 > Project: Hive > Issue Type: Bug >Affects Versions: 2.3.8 >Reporter: L. C. Hsieh >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > The issue is similar to HIVE-23593. In 2.3.8 RC, ql already has a shaded > calcite, but we see such error: > Caused by: java.lang.NoSuchMethodError: > org.apache.calcite.rel.RelCollationImpl.<init>(Lorg/apache/hive/com/google/common/collect/ImmutableList;)V > at > org.apache.hadoop.hive.ql.optimizer.calcite.HiveRelCollation.<init>(HiveRelCollation.java:29) > at > org.apache.hadoop.hive.ql.optimizer.calcite.RelOptHiveTable.getCollationList(RelOptHiveTable.java:181) > at > org.apache.calcite.rel.metadata.RelMdCollation.table(RelMdCollation.java:175) > We find in 2.3.8 binary distribution, there are calcite jars: > calcite-core-1.10.0.jar > calcite-druid-1.10.0.jar > calcite-linq4j-1.10.0.jar -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-24512) Exclude calcite in packaging Hive
[ https://issues.apache.org/jira/browse/HIVE-24512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] L. C. Hsieh updated HIVE-24512: --- Affects Version/s: 2.3.8 > Exclude calcite in packaging Hive > - > > Key: HIVE-24512 > URL: https://issues.apache.org/jira/browse/HIVE-24512 > Project: Hive > Issue Type: Bug >Affects Versions: 2.3.8 >Reporter: L. C. Hsieh >Priority: Major > > The issue is similar to HIVE-23593. In 2.3.8 RC, ql already has a shaded > calcite, but we see such error: > Caused by: java.lang.NoSuchMethodError: > org.apache.calcite.rel.RelCollationImpl.<init>(Lorg/apache/hive/com/google/common/collect/ImmutableList;)V > at > org.apache.hadoop.hive.ql.optimizer.calcite.HiveRelCollation.<init>(HiveRelCollation.java:29) > at > org.apache.hadoop.hive.ql.optimizer.calcite.RelOptHiveTable.getCollationList(RelOptHiveTable.java:181) > at > org.apache.calcite.rel.metadata.RelMdCollation.table(RelMdCollation.java:175) > We find in 2.3.8 binary distribution, there are calcite jars: > calcite-core-1.10.0.jar > calcite-druid-1.10.0.jar > calcite-linq4j-1.10.0.jar -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-24512) Exclude calcite in packaging Hive
[ https://issues.apache.org/jira/browse/HIVE-24512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] L. C. Hsieh updated HIVE-24512: --- Description: The issue is similar to HIVE-23593. In 2.3.8 RC, ql already has a shaded calcite, but we see such error: Caused by: java.lang.NoSuchMethodError: org.apache.calcite.rel.RelCollationImpl.<init>(Lorg/apache/hive/com/google/common/collect/ImmutableList;)V at org.apache.hadoop.hive.ql.optimizer.calcite.HiveRelCollation.<init>(HiveRelCollation.java:29) at org.apache.hadoop.hive.ql.optimizer.calcite.RelOptHiveTable.getCollationList(RelOptHiveTable.java:181) at org.apache.calcite.rel.metadata.RelMdCollation.table(RelMdCollation.java:175) We find in 2.3.8 binary distribution, there are calcite jars: calcite-core-1.10.0.jar calcite-druid-1.10.0.jar calcite-linq4j-1.10.0.jar > Exclude calcite in packaging Hive > - > > Key: HIVE-24512 > URL: https://issues.apache.org/jira/browse/HIVE-24512 > Project: Hive > Issue Type: Bug >Reporter: L. C. Hsieh >Priority: Major > > The issue is similar to HIVE-23593. In 2.3.8 RC, ql already has a shaded > calcite, but we see such error: > Caused by: java.lang.NoSuchMethodError: > org.apache.calcite.rel.RelCollationImpl.<init>(Lorg/apache/hive/com/google/common/collect/ImmutableList;)V > at > org.apache.hadoop.hive.ql.optimizer.calcite.HiveRelCollation.<init>(HiveRelCollation.java:29) > at > org.apache.hadoop.hive.ql.optimizer.calcite.RelOptHiveTable.getCollationList(RelOptHiveTable.java:181) > at > org.apache.calcite.rel.metadata.RelMdCollation.table(RelMdCollation.java:175) > We find in 2.3.8 binary distribution, there are calcite jars: > calcite-core-1.10.0.jar > calcite-druid-1.10.0.jar > calcite-linq4j-1.10.0.jar -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-24497) Node heartbeats from LLAP Daemon to the client are not matching leading to timeout in cloud environment
[ https://issues.apache.org/jira/browse/HIVE-24497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simhadri G updated HIVE-24497: -- Summary: Node heartbeats from LLAP Daemon to the client are not matching leading to timeout in cloud environment (was: Node heartbeats from LLAP Daemon to the client are not matching leading to timeout.) > Node heartbeats from LLAP Daemon to the client are not matching leading to > timeout in cloud environment > --- > > Key: HIVE-24497 > URL: https://issues.apache.org/jira/browse/HIVE-24497 > Project: Hive > Issue Type: Sub-task >Reporter: Simhadri G >Assignee: Simhadri G >Priority: Minor > Labels: pull-request-available > Attachments: hive-24497.01.patch > > Time Spent: 0.5h > Remaining Estimate: 0h > > Node heartbeat contains info about all the tasks that were submitted to that > LLAP Daemon. In cloud deployment, the client is not able to match these > heartbeats due to differences in hostname and port. -- This message was sent by Atlassian Jira (v8.3.4#803005)
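To make the mismatch in this report concrete, here is a minimal sketch of a client that tracks submitted tasks by the hostname and port it used at submission time. It is not Hive's actual implementation: the class `HeartbeatMatchSketch`, its methods, and the host names and ports in `main` are all hypothetical.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class HeartbeatMatchSketch {
    // Hypothetical registry: task ids keyed by the "host:port" the client submitted to.
    private final Map<String, Set<String>> tasksByNode = new HashMap<>();

    private static String key(String host, int port) {
        return host + ":" + port;
    }

    // Records that a task was submitted to the daemon reachable at host:port.
    public void registerTask(String host, int port, String taskId) {
        tasksByNode.computeIfAbsent(key(host, port), k -> new HashSet<>()).add(taskId);
    }

    // Looks up the tasks for the address carried by a heartbeat.
    // Returns an empty set when the reported address was never registered.
    public Set<String> matchHeartbeat(String reportedHost, int reportedPort) {
        return tasksByNode.getOrDefault(key(reportedHost, reportedPort), Collections.emptySet());
    }

    public static void main(String[] args) {
        HeartbeatMatchSketch client = new HeartbeatMatchSketch();
        // The client submits through an externally visible address (hypothetical values).
        client.registerTask("llap-0.example.internal", 25000, "task_1");

        // A heartbeat reporting the same address matches the registered task.
        System.out.println(client.matchHeartbeat("llap-0.example.internal", 25000));
        // A heartbeat reporting a different host and port matches nothing.
        System.out.println(client.matchHeartbeat("10.0.0.5", 31000));
    }
}
```

In this sketch the second lookup returns no tasks even though the daemon is alive, so a client relying on it would eventually treat those tasks as timed out.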
[jira] [Work logged] (HIVE-24504) VectorFileSinkArrowOperator does not serialize complex types correctly
[ https://issues.apache.org/jira/browse/HIVE-24504?focusedWorklogId=522260=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-522260 ] ASF GitHub Bot logged work on HIVE-24504: - Author: ASF GitHub Bot Created on: 09/Dec/20 14:23 Start Date: 09/Dec/20 14:23 Worklog Time Spent: 10m Work Description: pgaref commented on a change in pull request #1758: URL: https://github.com/apache/hive/pull/1758#discussion_r539346886 ## File path: ql/src/test/org/apache/hadoop/hive/ql/io/arrow/TestSerializer.java ## @@ -0,0 +1,72 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.hadoop.hive.ql.io.arrow; + +import org.apache.hadoop.hive.conf.HiveConf; +import org.apache.hadoop.hive.serde2.typeinfo.TypeInfo; +import org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils; +import org.junit.Assert; +import org.junit.Test; + +import java.util.Arrays; +import java.util.List; + +public class TestSerializer { + @Test + public void testEmptyArray() { Review comment: > The name was based on the Hive type, but I think both makes sense so renamed Cant disagree with that :D This is an automated message from the Apache Git Service. 
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 522260) Time Spent: 50m (was: 40m) > VectorFileSinkArrowOperator does not serialize complex types correctly > -- > > Key: HIVE-24504 > URL: https://issues.apache.org/jira/browse/HIVE-24504 > Project: Hive > Issue Type: Bug > Components: llap >Reporter: Peter Vary >Assignee: Peter Vary >Priority: Major > Labels: pull-request-available > Time Spent: 50m > Remaining Estimate: 0h > > When the table has complex types and the result has 0 records the > VectorFileSinkArrowOperator only serializes the primitive types correctly. > For complex types only the main type is set which causes issues for clients > trying to read data. > Got the following HWC exception: > {code:java} > Previous exception in task: Unsupported data type: Null > > org.apache.spark.sql.execution.arrow.ArrowUtils$.fromArrowType(ArrowUtils.scala:71) > > org.apache.spark.sql.execution.arrow.ArrowUtils$.fromArrowField(ArrowUtils.scala:106) > > org.apache.spark.sql.execution.arrow.ArrowUtils$.fromArrowField(ArrowUtils.scala:98) > > org.apache.spark.sql.execution.arrow.ArrowUtils.fromArrowField(ArrowUtils.scala) > > org.apache.spark.sql.vectorized.ArrowColumnVector.<init>(ArrowColumnVector.java:135) > > com.hortonworks.spark.sql.hive.llap.HiveWarehouseDataReader.get(HiveWarehouseDataReader.java:105) > > com.hortonworks.spark.sql.hive.llap.HiveWarehouseDataReader.get(HiveWarehouseDataReader.java:29) > > org.apache.spark.sql.execution.datasources.v2.DataSourceRDD$$anon$1.next(DataSourceRDD.scala:59) > > org.apache.spark.InterruptibleIterator.next(InterruptibleIterator.scala:40) > > org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.datasourcev2scan_nextBatch_0$(Unknown > Source) > > 
org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown > Source) > > org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43) > > org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$10$$anon$1.hasNext(WholeStageCodegenExec.scala:614) > > org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:253) > > org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:247) > >
[jira] [Work logged] (HIVE-24504) VectorFileSinkArrowOperator does not serialize complex types correctly
[ https://issues.apache.org/jira/browse/HIVE-24504?focusedWorklogId=522259=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-522259 ] ASF GitHub Bot logged work on HIVE-24504: - Author: ASF GitHub Bot Created on: 09/Dec/20 14:10 Start Date: 09/Dec/20 14:10 Worklog Time Spent: 10m Work Description: pvary commented on a change in pull request #1758: URL: https://github.com/apache/hive/pull/1758#discussion_r539336699 ## File path: ql/src/test/org/apache/hadoop/hive/ql/io/arrow/TestSerializer.java ## @@ -0,0 +1,72 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.hadoop.hive.ql.io.arrow; + +import org.apache.hadoop.hive.conf.HiveConf; +import org.apache.hadoop.hive.serde2.typeinfo.TypeInfo; +import org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils; +import org.junit.Assert; +import org.junit.Test; + +import java.util.Arrays; +import java.util.List; + +public class TestSerializer { + @Test + public void testEmptyArray() { +List typeInfos = TypeInfoUtils.getTypeInfosFromTypeString("array"); +List fieldNames = Arrays.asList(new String[]{"a"}); +Serializer converter = new Serializer(new HiveConf(), "attemptId", typeInfos, fieldNames); +ArrowWrapperWritable writable = converter.emptyBatch(); +Assert.assertEquals("Schema>", +writable.getVectorSchemaRoot().getSchema().toString()); + } + + @Test + public void testEmptyStruct() { +List typeInfos = TypeInfoUtils.getTypeInfosFromTypeString("struct"); +List fieldNames = Arrays.asList(new String[] { "a" }); +Serializer converter = new Serializer(new HiveConf(), "attemptId", typeInfos, fieldNames); +ArrowWrapperWritable writable = converter.emptyBatch(); +Assert.assertEquals("Schema>", +writable.getVectorSchemaRoot().getSchema().toString()); + } + + @Test + public void testEmptyMap() { +List typeInfos = TypeInfoUtils.getTypeInfosFromTypeString("map"); +List fieldNames = Arrays.asList(new String[] { "a" }); +Serializer converter = new Serializer(new HiveConf(), "attemptId", typeInfos, fieldNames); +ArrowWrapperWritable writable = converter.emptyBatch(); +Assert.assertEquals("Schema>>", +writable.getVectorSchemaRoot().getSchema().toString()); + } + + @Test + public void testEmptyComplexStruct() { +List typeInfos = TypeInfoUtils.getTypeInfosFromTypeString( + "struct,c:map,d:struct,f:map>>"); +List fieldNames = Arrays.asList(new String[] { "a" }); +Serializer converter = new Serializer(new HiveConf(), "attemptId", typeInfos, fieldNames); +ArrowWrapperWritable writable = converter.emptyBatch(); +Assert.assertEquals( +"Schema, c: List<$data$: Struct>, " + +"d: 
Struct, f: List<$data$: Struct", +writable.getVectorSchemaRoot().getSchema().toString()); + } Review comment: Added some more tests to cover every nested type at least once. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 522259) Time Spent: 40m (was: 0.5h) > VectorFileSinkArrowOperator does not serialize complex types correctly > -- > > Key: HIVE-24504 > URL: https://issues.apache.org/jira/browse/HIVE-24504 > Project: Hive > Issue Type: Bug > Components: llap >Reporter: Peter Vary >Assignee: Peter Vary >Priority: Major > Labels: pull-request-available > Time Spent: 40m > Remaining Estimate: 0h > > When the table has complex types and the result has 0 records the > VectorFileSinkArrowOperator only serializes the primitive types correctly. > For complex types only the main type is set which causes
[jira] [Work logged] (HIVE-24504) VectorFileSinkArrowOperator does not serialize complex types correctly
[ https://issues.apache.org/jira/browse/HIVE-24504?focusedWorklogId=522258=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-522258 ] ASF GitHub Bot logged work on HIVE-24504: - Author: ASF GitHub Bot Created on: 09/Dec/20 14:09 Start Date: 09/Dec/20 14:09 Worklog Time Spent: 10m Work Description: pvary commented on a change in pull request #1758: URL: https://github.com/apache/hive/pull/1758#discussion_r539336144 ## File path: ql/src/test/org/apache/hadoop/hive/ql/io/arrow/TestSerializer.java ## @@ -0,0 +1,72 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.hadoop.hive.ql.io.arrow; + +import org.apache.hadoop.hive.conf.HiveConf; +import org.apache.hadoop.hive.serde2.typeinfo.TypeInfo; +import org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils; +import org.junit.Assert; +import org.junit.Test; + +import java.util.Arrays; +import java.util.List; + +public class TestSerializer { + @Test + public void testEmptyArray() { Review comment: The name was based on the Hive type, but I think both makes sense so renamed This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 522258) Time Spent: 0.5h (was: 20m) > VectorFileSinkArrowOperator does not serialize complex types correctly > -- > > Key: HIVE-24504 > URL: https://issues.apache.org/jira/browse/HIVE-24504 > Project: Hive > Issue Type: Bug > Components: llap >Reporter: Peter Vary >Assignee: Peter Vary >Priority: Major > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > When the table has complex types and the result has 0 records the > VectorFileSinkArrowOperator only serializes the primitive types correctly. > For complex types only the main type is set which causes issues for clients > trying to read data. > Got the following HWC exception: > {code:java} > Previous exception in task: Unsupported data type: Null > > org.apache.spark.sql.execution.arrow.ArrowUtils$.fromArrowType(ArrowUtils.scala:71) > > org.apache.spark.sql.execution.arrow.ArrowUtils$.fromArrowField(ArrowUtils.scala:106) > > org.apache.spark.sql.execution.arrow.ArrowUtils$.fromArrowField(ArrowUtils.scala:98) > > org.apache.spark.sql.execution.arrow.ArrowUtils.fromArrowField(ArrowUtils.scala) > > org.apache.spark.sql.vectorized.ArrowColumnVector.(ArrowColumnVector.java:135) > > com.hortonworks.spark.sql.hive.llap.HiveWarehouseDataReader.get(HiveWarehouseDataReader.java:105) > > com.hortonworks.spark.sql.hive.llap.HiveWarehouseDataReader.get(HiveWarehouseDataReader.java:29) > > org.apache.spark.sql.execution.datasources.v2.DataSourceRDD$$anon$1.next(DataSourceRDD.scala:59) > > org.apache.spark.InterruptibleIterator.next(InterruptibleIterator.scala:40) > > org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.datasourcev2scan_nextBatch_0$(Unknown > Source) > > org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown > Source) > > 
org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43) > > org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$10$$anon$1.hasNext(WholeStageCodegenExec.scala:614) > > org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:253) > > org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:247) > > org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:836) > >
[jira] [Work started] (HIVE-24502) Store table level regular expression used during dump for table level replication
[ https://issues.apache.org/jira/browse/HIVE-24502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HIVE-24502 started by Aasha Medhi. -- > Store table level regular expression used during dump for table level > replication > - > > Key: HIVE-24502 > URL: https://issues.apache.org/jira/browse/HIVE-24502 > Project: Hive > Issue Type: Task >Reporter: Aasha Medhi >Assignee: Aasha Medhi >Priority: Major > Labels: pull-request-available > Attachments: HIVE-24502.01.patch > > Time Spent: 10m > Remaining Estimate: 0h > > Store include table list and exclude table list as part of dump meta data file -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-24502) Store table level regular expression used during dump for table level replication
[ https://issues.apache.org/jira/browse/HIVE-24502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aasha Medhi updated HIVE-24502: --- Attachment: HIVE-24502.01.patch Status: Patch Available (was: In Progress) > Store table level regular expression used during dump for table level > replication > - > > Key: HIVE-24502 > URL: https://issues.apache.org/jira/browse/HIVE-24502 > Project: Hive > Issue Type: Task >Reporter: Aasha Medhi >Assignee: Aasha Medhi >Priority: Major > Labels: pull-request-available > Attachments: HIVE-24502.01.patch > > Time Spent: 10m > Remaining Estimate: 0h > > Store include table list and exclude table list as part of dump meta data file -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-24502) Store table level regular expression used during dump for table level replication
[ https://issues.apache.org/jira/browse/HIVE-24502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-24502: -- Labels: pull-request-available (was: ) > Store table level regular expression used during dump for table level > replication > - > > Key: HIVE-24502 > URL: https://issues.apache.org/jira/browse/HIVE-24502 > Project: Hive > Issue Type: Task >Reporter: Aasha Medhi >Assignee: Aasha Medhi >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24502) Store table level regular expression used during dump for table level replication
[ https://issues.apache.org/jira/browse/HIVE-24502?focusedWorklogId=522208=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-522208 ] ASF GitHub Bot logged work on HIVE-24502: - Author: ASF GitHub Bot Created on: 09/Dec/20 11:59 Start Date: 09/Dec/20 11:59 Worklog Time Spent: 10m Work Description: aasha opened a new pull request #1759: URL: https://github.com/apache/hive/pull/1759 …r table level replication ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 522208) Remaining Estimate: 0h Time Spent: 10m > Store table level regular expression used during dump for table level > replication > - > > Key: HIVE-24502 > URL: https://issues.apache.org/jira/browse/HIVE-24502 > Project: Hive > Issue Type: Task >Reporter: Aasha Medhi >Assignee: Aasha Medhi >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-24502) Store table level regular expression used during dump for table level replication
[ https://issues.apache.org/jira/browse/HIVE-24502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aasha Medhi updated HIVE-24502: --- Description: Store include table list and exclude table list as part of dump meta data file > Store table level regular expression used during dump for table level > replication > - > > Key: HIVE-24502 > URL: https://issues.apache.org/jira/browse/HIVE-24502 > Project: Hive > Issue Type: Task >Reporter: Aasha Medhi >Assignee: Aasha Medhi >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Store include table list and exclude table list as part of dump meta data file -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24504) VectorFileSinkArrowOperator does not serialize complex types correctly
[ https://issues.apache.org/jira/browse/HIVE-24504?focusedWorklogId=522190=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-522190 ] ASF GitHub Bot logged work on HIVE-24504: - Author: ASF GitHub Bot Created on: 09/Dec/20 11:35 Start Date: 09/Dec/20 11:35 Worklog Time Spent: 10m Work Description: pgaref commented on a change in pull request #1758: URL: https://github.com/apache/hive/pull/1758#discussion_r539228933 ## File path: ql/src/test/org/apache/hadoop/hive/ql/io/arrow/TestSerializer.java ## @@ -0,0 +1,72 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.hadoop.hive.ql.io.arrow; + +import org.apache.hadoop.hive.conf.HiveConf; +import org.apache.hadoop.hive.serde2.typeinfo.TypeInfo; +import org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils; +import org.junit.Assert; +import org.junit.Test; + +import java.util.Arrays; +import java.util.List; + +public class TestSerializer { + @Test + public void testEmptyArray() { Review comment: Nit: Would probably name it testEmptyList for consistency with the Serializer ## File path: ql/src/test/org/apache/hadoop/hive/ql/io/arrow/TestSerializer.java ## @@ -0,0 +1,72 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.hadoop.hive.ql.io.arrow; + +import org.apache.hadoop.hive.conf.HiveConf; +import org.apache.hadoop.hive.serde2.typeinfo.TypeInfo; +import org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils; +import org.junit.Assert; +import org.junit.Test; + +import java.util.Arrays; +import java.util.List; + +public class TestSerializer { + @Test + public void testEmptyArray() { +List typeInfos = TypeInfoUtils.getTypeInfosFromTypeString("array"); +List fieldNames = Arrays.asList(new String[]{"a"}); +Serializer converter = new Serializer(new HiveConf(), "attemptId", typeInfos, fieldNames); +ArrowWrapperWritable writable = converter.emptyBatch(); +Assert.assertEquals("Schema>", +writable.getVectorSchemaRoot().getSchema().toString()); + } + + @Test + public void testEmptyStruct() { +List typeInfos = TypeInfoUtils.getTypeInfosFromTypeString("struct"); +List fieldNames = Arrays.asList(new String[] { "a" }); +Serializer converter = new Serializer(new HiveConf(), "attemptId", typeInfos, fieldNames); +ArrowWrapperWritable writable = converter.emptyBatch(); +Assert.assertEquals("Schema>", +writable.getVectorSchemaRoot().getSchema().toString()); + } + + @Test + public void testEmptyMap() { +List typeInfos = TypeInfoUtils.getTypeInfosFromTypeString("map"); +List fieldNames = Arrays.asList(new String[] { "a" }); +Serializer converter = new Serializer(new HiveConf(), "attemptId", typeInfos, fieldNames); +ArrowWrapperWritable writable = converter.emptyBatch(); +Assert.assertEquals("Schema>>", +writable.getVectorSchemaRoot().getSchema().toString()); + } + + @Test + public void testEmptyComplexStruct() { +List typeInfos = TypeInfoUtils.getTypeInfosFromTypeString( + "struct,c:map,d:struct,f:map>>"); +List fieldNames = Arrays.asList(new String[] { "a" }); +Serializer converter = new Serializer(new HiveConf(), "attemptId", typeInfos, fieldNames); +ArrowWrapperWritable writable = converter.emptyBatch(); +Assert.assertEquals( +
[jira] [Work logged] (HIVE-24468) Use Event Time instead of Current Time in Notification Log DB Entry
[ https://issues.apache.org/jira/browse/HIVE-24468?focusedWorklogId=522186=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-522186 ] ASF GitHub Bot logged work on HIVE-24468: - Author: ASF GitHub Bot Created on: 09/Dec/20 11:26 Start Date: 09/Dec/20 11:26 Worklog Time Spent: 10m Work Description: aasha commented on pull request #1728: URL: https://github.com/apache/hive/pull/1728#issuecomment-741711210 +1 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 522186) Time Spent: 1h 40m (was: 1.5h) > Use Event Time instead of Current Time in Notification Log DB Entry > --- > > Key: HIVE-24468 > URL: https://issues.apache.org/jira/browse/HIVE-24468 > Project: Hive > Issue Type: Improvement >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Major > Labels: pull-request-available > Time Spent: 1h 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24504) VectorFileSinkArrowOperator does not serialize complex types correctly
[ https://issues.apache.org/jira/browse/HIVE-24504?focusedWorklogId=522185=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-522185 ] ASF GitHub Bot logged work on HIVE-24504: - Author: ASF GitHub Bot Created on: 09/Dec/20 11:24 Start Date: 09/Dec/20 11:24 Worklog Time Spent: 10m Work Description: pvary opened a new pull request #1758: URL: https://github.com/apache/hive/pull/1758 ### What changes were proposed in this pull request? Use an empty batch to generate the schema for the empty results ### Why are the changes needed? Clients expect the full schema even for empty results ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Unit and other test This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 522185) Remaining Estimate: 0h Time Spent: 10m > VectorFileSinkArrowOperator does not serialize complex types correctly > -- > > Key: HIVE-24504 > URL: https://issues.apache.org/jira/browse/HIVE-24504 > Project: Hive > Issue Type: Bug > Components: llap >Reporter: Peter Vary >Assignee: Peter Vary >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > When the table has complex types and the result has 0 records the > VectorFileSinkArrowOperator only serializes the primitive types correctly. > For complex types only the main type is set which causes issues for clients > trying to read data. 
> Got the following HWC exception: > {code:java} > Previous exception in task: Unsupported data type: Null > > org.apache.spark.sql.execution.arrow.ArrowUtils$.fromArrowType(ArrowUtils.scala:71) > > org.apache.spark.sql.execution.arrow.ArrowUtils$.fromArrowField(ArrowUtils.scala:106) > > org.apache.spark.sql.execution.arrow.ArrowUtils$.fromArrowField(ArrowUtils.scala:98) > > org.apache.spark.sql.execution.arrow.ArrowUtils.fromArrowField(ArrowUtils.scala) > > org.apache.spark.sql.vectorized.ArrowColumnVector.(ArrowColumnVector.java:135) > > com.hortonworks.spark.sql.hive.llap.HiveWarehouseDataReader.get(HiveWarehouseDataReader.java:105) > > com.hortonworks.spark.sql.hive.llap.HiveWarehouseDataReader.get(HiveWarehouseDataReader.java:29) > > org.apache.spark.sql.execution.datasources.v2.DataSourceRDD$$anon$1.next(DataSourceRDD.scala:59) > > org.apache.spark.InterruptibleIterator.next(InterruptibleIterator.scala:40) > > org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.datasourcev2scan_nextBatch_0$(Unknown > Source) > > org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown > Source) > > org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43) > > org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$10$$anon$1.hasNext(WholeStageCodegenExec.scala:614) > > org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:253) > > org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:247) > > org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:836) > > org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:836) > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:49) > org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324) > org.apache.spark.rdd.RDD.iterator(RDD.scala:288) > 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:49) > org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324) > org.apache.spark.rdd.RDD.iterator(RDD.scala:288) > org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87) > org.apache.spark.scheduler.Task.run(Task.scala:109) > org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345) > > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > java.lang.Thread.run(Thread.java:745) > at > org.apache.spark.TaskContextImpl.invokeListeners(TaskContextImpl.scala:139) > at > org.apache.spark.TaskContextImpl.markTaskCompleted(TaskContextImpl.scala:117) >
[jira] [Updated] (HIVE-24504) VectorFileSinkArrowOperator does not serialize complex types correctly
[ https://issues.apache.org/jira/browse/HIVE-24504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-24504: -- Labels: pull-request-available (was: ) > VectorFileSinkArrowOperator does not serialize complex types correctly > -- > > Key: HIVE-24504 > URL: https://issues.apache.org/jira/browse/HIVE-24504 > Project: Hive > Issue Type: Bug > Components: llap >Reporter: Peter Vary >Assignee: Peter Vary >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > When the table has complex types and the result has 0 records the > VectorFileSinkArrowOperator only serializes the primitive types correctly. > For complex types only the main type is set which causes issues for clients > trying to read data. > Got the following HWC exception: > {code:java} > Previous exception in task: Unsupported data type: Null > > org.apache.spark.sql.execution.arrow.ArrowUtils$.fromArrowType(ArrowUtils.scala:71) > > org.apache.spark.sql.execution.arrow.ArrowUtils$.fromArrowField(ArrowUtils.scala:106) > > org.apache.spark.sql.execution.arrow.ArrowUtils$.fromArrowField(ArrowUtils.scala:98) > > org.apache.spark.sql.execution.arrow.ArrowUtils.fromArrowField(ArrowUtils.scala) > > org.apache.spark.sql.vectorized.ArrowColumnVector.(ArrowColumnVector.java:135) > > com.hortonworks.spark.sql.hive.llap.HiveWarehouseDataReader.get(HiveWarehouseDataReader.java:105) > > com.hortonworks.spark.sql.hive.llap.HiveWarehouseDataReader.get(HiveWarehouseDataReader.java:29) > > org.apache.spark.sql.execution.datasources.v2.DataSourceRDD$$anon$1.next(DataSourceRDD.scala:59) > > org.apache.spark.InterruptibleIterator.next(InterruptibleIterator.scala:40) > > org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.datasourcev2scan_nextBatch_0$(Unknown > Source) > > org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown > Source) > > 
org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43) > > org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$10$$anon$1.hasNext(WholeStageCodegenExec.scala:614) > > org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:253) > > org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:247) > > org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:836) > > org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:836) > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:49) > org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324) > org.apache.spark.rdd.RDD.iterator(RDD.scala:288) > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:49) > org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324) > org.apache.spark.rdd.RDD.iterator(RDD.scala:288) > org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87) > org.apache.spark.scheduler.Task.run(Task.scala:109) > org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345) > > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > java.lang.Thread.run(Thread.java:745) > at > org.apache.spark.TaskContextImpl.invokeListeners(TaskContextImpl.scala:139) > at > org.apache.spark.TaskContextImpl.markTaskCompleted(TaskContextImpl.scala:117) > at org.apache.spark.scheduler.Task.run(Task.scala:119) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
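The failure described in HIVE-24504 (a complex column whose schema carries only its top-level type) can be illustrated with a small model. This is a minimal sketch under stated assumptions: FieldSketch and isFullyDescribed are hypothetical names invented for illustration, and the class is a simplified stand-in, not the Arrow Field class or Hive's Serializer.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical schema node; a simplified stand-in, not the Arrow Field class. */
final class FieldSketch {
  final String name;
  final String type;
  final List<FieldSketch> children = new ArrayList<>();

  FieldSketch(String name, String type) {
    this.name = name;
    this.type = type;
  }

  /** Adds a child field and returns this node so calls can be chained. */
  FieldSketch child(FieldSketch c) {
    children.add(c);
    return this;
  }

  /** True when every list/struct/map node has at least one child, recursively. */
  boolean isFullyDescribed() {
    boolean complex = type.equals("list") || type.equals("struct") || type.equals("map");
    if (complex && children.isEmpty()) {
      return false; // only the main type was set, as in the reported bug
    }
    for (FieldSketch c : children) {
      if (!c.isFullyDescribed()) {
        return false;
      }
    }
    return true;
  }
}
```

A reader that needs the element type of a list cannot recover it from a node without children, which is why clients expect the full schema even when the result has 0 records.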
[jira] [Work logged] (HIVE-24511) Fix typo in SerDeStorageSchemaReader
[ https://issues.apache.org/jira/browse/HIVE-24511?focusedWorklogId=522182=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-522182 ] ASF GitHub Bot logged work on HIVE-24511: - Author: ASF GitHub Bot Created on: 09/Dec/20 11:14 Start Date: 09/Dec/20 11:14 Worklog Time Spent: 10m Work Description: dengzhhu653 opened a new pull request #1757: URL: https://github.com/apache/hive/pull/1757 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 522182) Remaining Estimate: 0h Time Spent: 10m > Fix typo in SerDeStorageSchemaReader > > > Key: HIVE-24511 > URL: https://issues.apache.org/jira/browse/HIVE-24511 > Project: Hive > Issue Type: Improvement > Components: Standalone Metastore >Reporter: Zhihua Deng >Priority: Minor > Time Spent: 10m > Remaining Estimate: 0h > > 1. Close the created classloader to release resources. > 2. Provide more detailed error messages when throwing MetaException. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-24511) Fix typo in SerDeStorageSchemaReader
[ https://issues.apache.org/jira/browse/HIVE-24511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-24511: -- Labels: pull-request-available (was: ) > Fix typo in SerDeStorageSchemaReader > > > Key: HIVE-24511 > URL: https://issues.apache.org/jira/browse/HIVE-24511 > Project: Hive > Issue Type: Improvement > Components: Standalone Metastore >Reporter: Zhihua Deng >Priority: Minor > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > 1. Close the created classloader to release resources. > 2. Provide more detailed error messages when throwing MetaException. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24207) LimitOperator can leverage ObjectCache to bail out quickly
[ https://issues.apache.org/jira/browse/HIVE-24207?focusedWorklogId=522164=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-522164 ] ASF GitHub Bot logged work on HIVE-24207: - Author: ASF GitHub Bot Created on: 09/Dec/20 10:12 Start Date: 09/Dec/20 10:12 Worklog Time Spent: 10m Work Description: rbalamohan commented on a change in pull request #1556: URL: https://github.com/apache/hive/pull/1556#discussion_r539176536 ## File path: ql/src/test/queries/clientpositive/authorization_view_1.q ## @@ -1,5 +1,6 @@ --! qt:dataset:src set hive.security.authorization.manager=org.apache.hadoop.hive.ql.security.authorization.DefaultHiveAuthorizationProvider; +set hive.exec.reducers.max=1; Review comment: Any reason for changing this? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 522164) Time Spent: 0.5h (was: 20m) > LimitOperator can leverage ObjectCache to bail out quickly > -- > > Key: HIVE-24207 > URL: https://issues.apache.org/jira/browse/HIVE-24207 > Project: Hive > Issue Type: Improvement >Reporter: Rajesh Balamohan >Assignee: László Bodor >Priority: Major > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > {noformat} > select ss_sold_date_sk from store_sales, date_dim where date_dim.d_year in > (1998,1998+1,1998+2) and store_sales.ss_sold_date_sk = date_dim.d_date_sk > limit 100; > select distinct ss_sold_date_sk from store_sales, date_dim where > date_dim.d_year in (1998,1998+1,1998+2) and store_sales.ss_sold_date_sk = > date_dim.d_date_sk limit 100; > {noformat} > Queries like the above generate a large number of map tasks. Currently they > don't bail out after generating enough data. 
> It would be good to make use of ObjectCache & retain the number of records > generated. LimitOperator/VectorLimitOperator can bail out for the later tasks > in the operator's init phase itself. > https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorLimitOperator.java#L57 > https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/LimitOperator.java#L58 -- This message was sent by Atlassian Jira (v8.3.4#803005)
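The idea in HIVE-24207 (keep a shared record count so later tasks can bail out during init) can be sketched as follows. This is a minimal, single-threaded illustration under stated assumptions: LimitCacheSketch and LimitTaskSketch are hypothetical names, and the map-backed cache is a stand-in that does not reproduce Hive's ObjectCache API or the actual LimitOperator change.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical stand-in for a per-query object cache; not Hive's ObjectCache API. */
final class LimitCacheSketch {
  private final ConcurrentHashMap<String, AtomicLong> cache = new ConcurrentHashMap<>();

  /** Returns the shared counter for a key, creating it on first use. */
  AtomicLong counterFor(String key) {
    return cache.computeIfAbsent(key, k -> new AtomicLong());
  }
}

/** Hypothetical task-side view of a limit operator. */
final class LimitTaskSketch {
  private final AtomicLong emitted;
  private final long limit;

  LimitTaskSketch(LimitCacheSketch cache, String key, long limit) {
    this.emitted = cache.counterFor(key);
    this.limit = limit;
  }

  /** Checked in the init phase: true when earlier tasks already reached the limit. */
  boolean canSkip() {
    return emitted.get() >= limit;
  }

  /** Forwards rows until the shared limit is reached; returns how many were forwarded. */
  int process(int availableRows) {
    int forwarded = 0;
    // The check-then-increment below is not atomic, so concurrent tasks could
    // overshoot slightly; that is acceptable for this single-threaded sketch.
    while (forwarded < availableRows && emitted.get() < limit) {
      emitted.incrementAndGet();
      forwarded++;
    }
    return forwarded;
  }
}
```

Keying the counter (here by a plain string) keeps unrelated limit operators from sharing a count; which key a real implementation would use is not stated in the issue.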
[jira] [Resolved] (HIVE-24475) Generalize fixacidkeyindex utility
[ https://issues.apache.org/jira/browse/HIVE-24475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antal Sinkovits resolved HIVE-24475. Resolution: Fixed > Generalize fixacidkeyindex utility > -- > > Key: HIVE-24475 > URL: https://issues.apache.org/jira/browse/HIVE-24475 > Project: Hive > Issue Type: Improvement > Components: ORC, Transactions >Affects Versions: 3.0.0 >Reporter: Antal Sinkovits >Assignee: Antal Sinkovits >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 1h 40m > Remaining Estimate: 0h > > There is a utility in Hive that can validate/fix a corrupted > hive.acid.key.index. > hive --service fixacidkeyindex > Unfortunately, it is tailored only to a specific problem > (https://issues.apache.org/jira/browse/HIVE-18907), instead of generally > validating and recovering the hive.acid.key.index from the stripe data itself. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-24475) Generalize fixacidkeyindex utility
[ https://issues.apache.org/jira/browse/HIVE-24475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antal Sinkovits updated HIVE-24475: --- Fix Version/s: 4.0.0 > Generalize fixacidkeyindex utility > -- > > Key: HIVE-24475 > URL: https://issues.apache.org/jira/browse/HIVE-24475 > Project: Hive > Issue Type: Improvement > Components: ORC, Transactions >Affects Versions: 3.0.0 >Reporter: Antal Sinkovits >Assignee: Antal Sinkovits >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 1h 40m > Remaining Estimate: 0h > > There is a utility in Hive that can validate/fix a corrupted > hive.acid.key.index. > hive --service fixacidkeyindex > Unfortunately, it is tailored only to a specific problem > (https://issues.apache.org/jira/browse/HIVE-18907), instead of generally > validating and recovering the hive.acid.key.index from the stripe data itself. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24432) Delete Notification Events in Batches
[ https://issues.apache.org/jira/browse/HIVE-24432?focusedWorklogId=522135=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-522135 ] ASF GitHub Bot logged work on HIVE-24432: - Author: ASF GitHub Bot Created on: 09/Dec/20 09:10 Start Date: 09/Dec/20 09:10 Worklog Time Spent: 10m Work Description: aasha commented on a change in pull request #1710: URL: https://github.com/apache/hive/pull/1710#discussion_r539130987 ## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java ## @@ -10800,53 +10801,89 @@ public void addNotificationEvent(NotificationEvent entry) throws MetaException { @Override public void cleanNotificationEvents(int olderThan) { Review comment: The same improvement done in cleanNotificationEvents can be applied to cleanWriteNotificationEvents also. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 522135) Time Spent: 2.5h (was: 2h 20m) > Delete Notification Events in Batches > - > > Key: HIVE-24432 > URL: https://issues.apache.org/jira/browse/HIVE-24432 > Project: Hive > Issue Type: Improvement >Affects Versions: 3.2.0 >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Major > Labels: pull-request-available > Time Spent: 2.5h > Remaining Estimate: 0h > > Notification events are loaded in batches (reduces memory pressure on the > HMS), but all of the deletes happen within a single transaction and, when > deleting many records, can put a lot of pressure on the backend database. > Instead, delete events in batches (in different transactions) as well. -- This message was sent by Atlassian Jira (v8.3.4#803005)
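The batching approach described in HIVE-24432 can be sketched as a loop that issues bounded deletes until one of them removes nothing. This is a minimal illustration under stated assumptions: EventStoreSketch, deleteBatch and cleanInBatches are hypothetical names, and the in-memory list stands in for the notification log table; it is not the ObjectStore code from the pull request.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Hypothetical in-memory stand-in for the notification log table; not a Hive class. */
final class EventStoreSketch {
  private final List<Integer> eventTimes = new ArrayList<>();
  private int transactions = 0;

  void add(int eventTimeSec) {
    eventTimes.add(eventTimeSec);
  }

  int size() {
    return eventTimes.size();
  }

  int transactions() {
    return transactions;
  }

  /** Deletes at most batchSize events with eventTime < tooOld in one simulated transaction. */
  int deleteBatch(int tooOld, int batchSize) {
    transactions++; // each call stands for one begin/commit pair
    int deleted = 0;
    Iterator<Integer> it = eventTimes.iterator();
    while (it.hasNext() && deleted < batchSize) {
      if (it.next() < tooOld) {
        it.remove();
        deleted++;
      }
    }
    return deleted;
  }

  /** Repeats bounded deletes until one removes nothing; returns the total deleted. */
  int cleanInBatches(int tooOld, int batchSize) {
    int total = 0;
    int batch;
    do {
      batch = deleteBatch(tooOld, batchSize);
      total += batch;
    } while (batch > 0);
    return total;
  }
}
```

Each bounded delete commits on its own, so the backend never has to hold one very large delete open; the cost is one final call that finds nothing left to delete.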
[jira] [Work logged] (HIVE-24432) Delete Notification Events in Batches
[ https://issues.apache.org/jira/browse/HIVE-24432?focusedWorklogId=522133=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-522133 ] ASF GitHub Bot logged work on HIVE-24432: - Author: ASF GitHub Bot Created on: 09/Dec/20 09:09 Start Date: 09/Dec/20 09:09 Worklog Time Spent: 10m Work Description: aasha commented on a change in pull request #1710: URL: https://github.com/apache/hive/pull/1710#discussion_r539130987 ## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java ## @@ -10800,53 +10801,89 @@ public void addNotificationEvent(NotificationEvent entry) throws MetaException { @Override public void cleanNotificationEvents(int olderThan) { Review comment: The same improvement of deleting in batches can be applied to cleanWriteNotificationEvents also. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 522133) Time Spent: 2h 20m (was: 2h 10m) > Delete Notification Events in Batches > - > > Key: HIVE-24432 > URL: https://issues.apache.org/jira/browse/HIVE-24432 > Project: Hive > Issue Type: Improvement >Affects Versions: 3.2.0 >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Major > Labels: pull-request-available > Time Spent: 2h 20m > Remaining Estimate: 0h > > Notification events are loaded in batches (reduces memory pressure on the > HMS), but all of the deletes happen within a single transaction and, when > deleting many records, can put a lot of pressure on the backend database. > Instead, delete events in batches (in different transactions) as well. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24432) Delete Notification Events in Batches
[ https://issues.apache.org/jira/browse/HIVE-24432?focusedWorklogId=522131=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-522131 ] ASF GitHub Bot logged work on HIVE-24432: - Author: ASF GitHub Bot Created on: 09/Dec/20 09:08 Start Date: 09/Dec/20 09:08 Worklog Time Spent: 10m Work Description: aasha commented on a change in pull request #1710: URL: https://github.com/apache/hive/pull/1710#discussion_r539040762 ## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java ## @@ -10800,53 +10801,89 @@ public void addNotificationEvent(NotificationEvent entry) throws MetaException { @Override public void cleanNotificationEvents(int olderThan) { -boolean commited = false; -Query query = null; +final int eventBatchSize = MetastoreConf.getIntVar(conf, MetastoreConf.ConfVars.EVENT_CLEAN_MAX_EVENTS); + +final long ageSec = olderThan; +final Instant now = Instant.now(); + +final int tooOld = Math.toIntExact(now.getEpochSecond() - ageSec); + +final Optional batchSize = (eventBatchSize > 0) ? Optional.of(eventBatchSize) : Optional.empty(); + +final long start = System.nanoTime(); +int deleteCount = doCleanNotificationEvents(tooOld, batchSize); + +if (deleteCount == 0) { + LOG.info("No Notification events found to be cleaned with eventTime < {}", tooOld); +} else { + int batchCount = 0; + do { +batchCount = doCleanNotificationEvents(tooOld, batchSize); +deleteCount += batchCount; + } while (batchCount > 0); +} + +final long finish = System.nanoTime(); + +LOG.info("Deleted {} notification events older than epoch:{} in {}ms", deleteCount, tooOld, +TimeUnit.NANOSECONDS.toMillis(finish - start)); + } + + private int doCleanNotificationEvents(final int ageSec, final Optional batchSize) { +final Transaction tx = pm.currentTransaction(); +int eventsCount = 0; + try { - openTransaction(); - long tmp = System.currentTimeMillis() / 1000 - olderThan; - int tooOld = (tmp > Integer.MAX_VALUE) ? 0 : (int) tmp; - query = pm.newQuery(MNotificationLog.class, "eventTime < tooOld"); - query.declareParameters("java.lang.Integer tooOld"); + tx.begin(); - int max_events = MetastoreConf.getIntVar(conf, MetastoreConf.ConfVars.EVENT_CLEAN_MAX_EVENTS); - max_events = max_events > 0 ? max_events : Integer.MAX_VALUE; - query.setRange(0, max_events); - query.setOrdering("eventId ascending"); + try (Query query = pm.newQuery(MNotificationLog.class, "eventTime < tooOld")) { +query.declareParameters("java.lang.Integer tooOld"); +query.setOrdering("eventId ascending"); +if (batchSize.isPresent()) { + query.setRange(0, batchSize.get()); +} - List toBeRemoved = (List) query.execute(tooOld); - int iteration = 0; - int eventCount = 0; - long minEventId = 0; - long minEventTime = 0; - long maxEventId = 0; - long maxEventTime = 0; - while (CollectionUtils.isNotEmpty(toBeRemoved)) { -int listSize = toBeRemoved.size(); -if (iteration == 0) { - MNotificationLog firstNotification = toBeRemoved.get(0); - minEventId = firstNotification.getEventId(); - minEventTime = firstNotification.getEventTime(); +List events = (List) query.execute(ageSec); +if (CollectionUtils.isNotEmpty(events)) { + eventsCount = events.size(); + + if (LOG.isDebugEnabled()) { +int minEventTime, maxEventTime; +long minEventId, maxEventId; +Iterator iter = events.iterator(); +MNotificationLog firstNotification = iter.next(); + +minEventTime = maxEventTime = firstNotification.getEventTime(); +minEventId = maxEventId = firstNotification.getEventId(); + +while (iter.hasNext()) { + MNotificationLog notification = iter.next(); Review comment: Is the comparison required? Events will always be in ascending order of event id. Issue Time Tracking --- Worklog Id: (was: 522131) Time Spent: 2h 10m (was: 2h) > Delete Notification Events in Batches > - > > Key: HIVE-24432 > URL: https://issues.apache.org/jira/browse/HIVE-24432 > Project: Hive > Issue Type: Improvement >Affects Versions: 3.2.0 >Reporter: David
[jira] [Work logged] (HIVE-24207) LimitOperator can leverage ObjectCache to bail out quickly
[ https://issues.apache.org/jira/browse/HIVE-24207?focusedWorklogId=522127=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-522127 ] ASF GitHub Bot logged work on HIVE-24207: - Author: ASF GitHub Bot Created on: 09/Dec/20 08:54 Start Date: 09/Dec/20 08:54 Worklog Time Spent: 10m Work Description: abstractdog commented on pull request #1556: URL: https://github.com/apache/hive/pull/1556#issuecomment-741630312 precommit tests passed, could you please take a look @rbalamohan ? Issue Time Tracking --- Worklog Id: (was: 522127) Time Spent: 20m (was: 10m) > LimitOperator can leverage ObjectCache to bail out quickly > -- > > Key: HIVE-24207 > URL: https://issues.apache.org/jira/browse/HIVE-24207 > Project: Hive > Issue Type: Improvement >Reporter: Rajesh Balamohan >Assignee: László Bodor >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > {noformat} > select ss_sold_date_sk from store_sales, date_dim where date_dim.d_year in > (1998,1998+1,1998+2) and store_sales.ss_sold_date_sk = date_dim.d_date_sk > limit 100; > select distinct ss_sold_date_sk from store_sales, date_dim where > date_dim.d_year in (1998,1998+1,1998+2) and store_sales.ss_sold_date_sk = > date_dim.d_date_sk limit 100; > {noformat} > Queries like the above generate a large number of map tasks. Currently they > don't bail out after generating enough data. > It would be good to make use of ObjectCache & retain the number of records > generated. LimitOperator/VectorLimitOperator can bail out for the later tasks > in the operator's init phase itself. > https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorLimitOperator.java#L57 > https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/LimitOperator.java#L58
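The bail-out proposed in HIVE-24207 can be pictured with a rough sketch. Nothing here is Hive's actual ObjectCache or LimitOperator API: `LimitBailOutSketch`, the static `CACHE` map, and `runTask` are hypothetical names, and the map merely stands in for a cache shared by tasks running in the same process. The point is only that a shared row counter lets a later task notice, during its init phase, that the limit has already been met.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

/** Illustrative only: a shared counter that lets later tasks skip work. */
class LimitBailOutSketch {
    // Hypothetical stand-in for a per-process object cache shared by tasks.
    static final ConcurrentHashMap<String, AtomicLong> CACHE = new ConcurrentHashMap<>();

    /** Simulates one task pushing rows through a limit; returns the rows it emitted. */
    static int runTask(String limitKey, int limit, int rowsAvailable) {
        AtomicLong emitted = CACHE.computeIfAbsent(limitKey, k -> new AtomicLong());
        // Init phase: earlier tasks in this process already produced enough rows.
        if (emitted.get() >= limit) {
            return 0;
        }
        int mine = 0;
        for (int i = 0; i < rowsAvailable; i++) {
            // Stop as soon as the shared count passes the limit.
            if (emitted.incrementAndGet() > limit) {
                break;
            }
            mine++;
        }
        return mine;
    }

    public static void main(String[] args) {
        System.out.println(runTask("query-1", 100, 70));
        System.out.println(runTask("query-1", 100, 70));
        System.out.println(runTask("query-1", 100, 70));
    }
}
```

The counter may overshoot the limit by one per task that hits the boundary; that is harmless in this sketch because it is only ever compared against the limit.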
[jira] [Work logged] (HIVE-24475) Generalize fixacidkeyindex utility
[ https://issues.apache.org/jira/browse/HIVE-24475?focusedWorklogId=522126=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-522126 ] ASF GitHub Bot logged work on HIVE-24475: - Author: ASF GitHub Bot Created on: 09/Dec/20 08:53 Start Date: 09/Dec/20 08:53 Worklog Time Spent: 10m Work Description: lcspinter commented on pull request #1730: URL: https://github.com/apache/hive/pull/1730#issuecomment-741629522 Merged into master. Thanks for the patch @asinkovits and for the review @pvargacl and @maheshk114 . Issue Time Tracking --- Worklog Id: (was: 522126) Time Spent: 1h 40m (was: 1.5h) > Generalize fixacidkeyindex utility > -- > > Key: HIVE-24475 > URL: https://issues.apache.org/jira/browse/HIVE-24475 > Project: Hive > Issue Type: Improvement > Components: ORC, Transactions >Affects Versions: 3.0.0 >Reporter: Antal Sinkovits >Assignee: Antal Sinkovits >Priority: Major > Labels: pull-request-available > Time Spent: 1h 40m > Remaining Estimate: 0h > > There is a utility in hive which can validate/fix corrupted > hive.acid.key.index. > hive --service fixacidkeyindex > Unfortunately it is only tailored for a specific problem > (https://issues.apache.org/jira/browse/HIVE-18907), instead of generally > validating and recovering the hive.acid.key.index from the stripe data itself.
[jira] [Work logged] (HIVE-24475) Generalize fixacidkeyindex utility
[ https://issues.apache.org/jira/browse/HIVE-24475?focusedWorklogId=522125=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-522125 ] ASF GitHub Bot logged work on HIVE-24475: - Author: ASF GitHub Bot Created on: 09/Dec/20 08:52 Start Date: 09/Dec/20 08:52 Worklog Time Spent: 10m Work Description: lcspinter merged pull request #1730: URL: https://github.com/apache/hive/pull/1730 Issue Time Tracking --- Worklog Id: (was: 522125) Time Spent: 1.5h (was: 1h 20m) > Generalize fixacidkeyindex utility > -- > > Key: HIVE-24475 > URL: https://issues.apache.org/jira/browse/HIVE-24475 > Project: Hive > Issue Type: Improvement > Components: ORC, Transactions >Affects Versions: 3.0.0 >Reporter: Antal Sinkovits >Assignee: Antal Sinkovits >Priority: Major > Labels: pull-request-available > Time Spent: 1.5h > Remaining Estimate: 0h > > There is a utility in hive which can validate/fix corrupted > hive.acid.key.index. > hive --service fixacidkeyindex > Unfortunately it is only tailored for a specific problem > (https://issues.apache.org/jira/browse/HIVE-18907), instead of generally > validating and recovering the hive.acid.key.index from the stripe data itself.
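As a loose illustration of "validating and recovering the hive.acid.key.index from the stripe data itself", the sketch below rebuilds an index string from the last row key of each stripe and compares it with the stored value. The encoding used here (one `writeId,bucket,rowId;` entry per stripe) is an assumption made for the example, and `AcidKeyIndexSketch`, `Key`, `rebuild`, and `isValid` are hypothetical names; none of this is taken from the fixacidkeyindex utility or from the pull request.

```java
import java.util.List;

/** Illustrative only: validate a key index by recomputing it from per-stripe data. */
class AcidKeyIndexSketch {
    /** Last row key seen in a stripe; field names are assumptions for this sketch. */
    static final class Key {
        final long writeId;
        final int bucket;
        final long rowId;

        Key(long writeId, int bucket, long rowId) {
            this.writeId = writeId;
            this.bucket = bucket;
            this.rowId = rowId;
        }

        // Assumed entry layout: "writeId,bucket,rowId;".
        String encode() {
            return writeId + "," + bucket + "," + rowId + ";";
        }
    }

    /** Recovers the index text from the last key observed in each stripe. */
    static String rebuild(List<Key> lastKeyPerStripe) {
        StringBuilder sb = new StringBuilder();
        for (Key k : lastKeyPerStripe) {
            sb.append(k.encode());
        }
        return sb.toString();
    }

    /** A stored index is accepted only if it equals what the stripe data implies. */
    static boolean isValid(String storedIndex, List<Key> lastKeyPerStripe) {
        return rebuild(lastKeyPerStripe).equals(storedIndex);
    }
}
```

Validating against a recomputed value, rather than checking for one known corruption pattern, is what makes the check general: any mismatch is detected, and the recomputed string is also the repaired index.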
[jira] [Commented] (HIVE-24508) hive.parquet.timestamp.skip.conversion doesn't work
[ https://issues.apache.org/jira/browse/HIVE-24508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17246366#comment-17246366 ] Karen Coppage commented on HIVE-24508: -- This is expected behavior. hive.parquet.timestamp.skip.conversion only affects reading, not writing; furthermore, it only affects data not written by Hive. Please see the description: {quote}"Current Hive implementation of parquet stores timestamps to UTC, this flag allows skipping of the conversion on reading parquet files from other tools." {quote} > hive.parquet.timestamp.skip.conversion doesn't work > --- > > Key: HIVE-24508 > URL: https://issues.apache.org/jira/browse/HIVE-24508 > Project: Hive > Issue Type: Bug > Components: Parquet >Reporter: wenjun ma >Assignee: wenjun ma >Priority: Major > Fix For: All Versions > > > Regardless of whether we set it to true or false, inserting the current > timestamp always uses the local time zone.
[jira] [Updated] (HIVE-24497) Node heartbeats from LLAP Daemon to the client are not matching leading to timeout.
[ https://issues.apache.org/jira/browse/HIVE-24497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simhadri G updated HIVE-24497: -- Attachment: hive-24497.01.patch > Node heartbeats from LLAP Daemon to the client are not matching leading to > timeout. > --- > > Key: HIVE-24497 > URL: https://issues.apache.org/jira/browse/HIVE-24497 > Project: Hive > Issue Type: Sub-task >Reporter: Simhadri G >Assignee: Simhadri G >Priority: Minor > Labels: pull-request-available > Attachments: hive-24497.01.patch > > Time Spent: 0.5h > Remaining Estimate: 0h > > A node heartbeat contains info about all the tasks that were submitted to that > LLAP Daemon. In cloud deployments, the client is not able to match these > heartbeats due to differences in hostname and port.