(hive) branch master updated (a537000af73 -> 5e78ce0e1f1)

2024-04-05 Thread abstractdog
This is an automated email from the ASF dual-hosted git repository.

abstractdog pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/hive.git


from a537000af73 HIVE-28046: Use serdeConstants fields instead of string 
literals in hive-exec module (Michal Lorek reviewed by Stamatis Zampetakis)
 add 5e78ce0e1f1 HIVE-28147. Upgrade commons-compress to 1.26.0 Address CVE 
Issue. (#5156) (Shilun Fan reviewed by Laszlo Bodor)

No new revisions were added by this update.

Summary of changes:
 pom.xml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)



(hive) branch master updated (855e4055675 -> a537000af73)

2024-04-05 Thread zabetak

zabetak pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/hive.git


from 855e4055675 HIVE-28148: Implement array_compact UDF to remove all 
nulls from an array (#5161) (Taraka Rama Rao Lethavadla reviewed by Sourabh 
Badhya)
 new e6e6b12d5fe HIVE-28144: Remove overly verbose debug messages from 
MetastoreDirectSqlUtils (Stamatis Zampetakis reviewed by Butao Zhang)
 new a537000af73 HIVE-28046: Use serdeConstants fields instead of string 
literals in hive-exec module (Michal Lorek reviewed by Stamatis Zampetakis)

The 2 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 .../hadoop/hive/ql/exec/ColumnStatsUpdateTask.java | 27 ---
 .../apache/hadoop/hive/ql/exec/DDLPlanUtils.java   |  3 +-
 .../ql/exec/vector/VectorExpressionDescriptor.java | 24 +++---
 .../hive/ql/exec/vector/VectorizationContext.java  | 86 +++---
 .../hive/ql/exec/vector/VectorizedBatchUtil.java   |  3 +-
 .../exec/vector/expressions/CastDateToBoolean.java |  3 +-
 .../vector/expressions/CastDateToTimestamp.java|  3 +-
 .../vector/expressions/CastDoubleToTimestamp.java  |  3 +-
 .../vector/expressions/CastTimestampToBoolean.java |  3 +-
 .../vector/expressions/CastTimestampToDouble.java  |  3 +-
 .../vector/expressions/CastTimestampToLong.java|  3 +-
 .../expressions/DateColSubtractDateColumn.java |  5 +-
 .../expressions/DateColSubtractDateScalar.java |  5 +-
 .../expressions/DateScalarSubtractDateColumn.java  |  5 +-
 .../IfExprDoubleColumnDoubleColumn.java|  5 +-
 .../hive/ql/exec/vector/ptf/VectorPTFOperator.java | 11 +--
 .../org/apache/hadoop/hive/ql/io/IOConstants.java  |  2 +-
 .../hadoop/hive/ql/io/RCFileOutputFormat.java  |  3 +-
 .../org/apache/hadoop/hive/ql/io/orc/OrcSerde.java |  4 +-
 .../org/apache/hadoop/hive/ql/io/orc/OrcUnion.java |  5 +-
 .../parquet/convert/HiveCollectionConverter.java   |  4 +-
 .../hive/ql/io/parquet/serde/ParquetHiveSerDe.java |  2 +-
 .../hive/ql/io/sarg/ConvertAstToSearchArg.java |  3 +-
 .../hive/ql/optimizer/SimpleFetchOptimizer.java|  3 +-
 .../calcite/translator/SqlFunctionConverter.java   | 28 +++
 .../hive/ql/optimizer/physical/Vectorizer.java | 55 +++---
 .../hadoop/hive/ql/parse/BaseSemanticAnalyzer.java |  4 +-
 .../hive/ql/parse/rewrite/MergeRewriter.java   |  3 +-
 .../hive/ql/parse/type/TypeCheckProcFactory.java   |  8 +-
 .../apache/hadoop/hive/ql/plan/PartitionDesc.java  |  2 +-
 .../org/apache/hadoop/hive/ql/plan/PlanUtils.java  |  9 ++-
 .../hadoop/hive/ql/processors/DfsProcessor.java|  3 +-
 .../ql/processors/LlapCacheResourceProcessor.java  |  5 +-
 .../processors/LlapClusterResourceProcessor.java   | 13 ++--
 .../hive/ql/udf/esri/serde/BaseJsonSerDe.java  |  3 +-
 .../hive/ql/udf/generic/GenericUDFBetween.java |  3 +-
 .../hive/ql/udf/ptf/ValueBoundaryScanner.java  | 25 ---
 .../hive/metastore/MetastoreDirectSqlUtils.java|  5 +-
 38 files changed, 209 insertions(+), 175 deletions(-)



(hive) 01/02: HIVE-28144: Remove overly verbose debug messages from MetastoreDirectSqlUtils (Stamatis Zampetakis reviewed by Butao Zhang)

2024-04-05 Thread zabetak

zabetak pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hive.git

commit e6e6b12d5fe13e8cbca99e8f3a5d5f13438b2400
Author: Stamatis Zampetakis 
AuthorDate: Fri Mar 22 11:05:08 2024 +0100

HIVE-28144: Remove overly verbose debug messages from 
MetastoreDirectSqlUtils (Stamatis Zampetakis reviewed by Butao Zhang)

When BITVECTOR or KLL stats are disabled or not present in the metastore, the 
following message may appear far too often in the HMS logs.
```
2024-03-22T01:50:57,849 DEBUG [CachedStore-CacheUpdateService: Thread-240] 
metastore.MetastoreDirectSqlUtils: Expected blob type but got java.lang.String
```
In fact, in some cases the message appears more than once for every single 
partition that is present in the table(s) being queried. When the number of 
partitions is large, it can easily clog the logs with redundant and useless 
information.

To put things in perspective, while running cbo_query10.q on the statistics 
of the TPC-DS 30TB dataset, the message occupies more than 50% (26MB) of the 
total log file (46MB).

The presence of the message does not tell us much on its own. In 
conjunction with the code, we can infer that we are not fetching BITVECTOR/KLL 
stats from the metastore, but this could be done in a different place without 
having to print the same message 170K times.

Removing this message saves disk space, avoids frequent log rotation, and 
improves the overall readability of the log file.

There is another redundant message which appears when transforming a 
database value to Boolean. The message is redundant since it is followed 
directly by an exception so there is no reason to have both. This message may 
not appear as often as the previous one but given that it doesn't add much 
value it can also be removed.

Close apache/hive#5159
---
 .../org/apache/hadoop/hive/metastore/MetastoreDirectSqlUtils.java| 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MetastoreDirectSqlUtils.java b/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MetastoreDirectSqlUtils.java
index 067e415d725..8a608a030ee 100644
--- a/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MetastoreDirectSqlUtils.java
+++ b/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MetastoreDirectSqlUtils.java
@@ -554,7 +554,6 @@ class MetastoreDirectSqlUtils {
 return true;
   }
 }
-LOG.debug("Value is of type {}", value.getClass());
 throw new MetaException("Cannot extract boolean from column value " + value);
   }
 
@@ -590,8 +589,8 @@ class MetastoreDirectSqlUtils {
   return (byte[]) value;
 }
else {
-  // this may happen when enablebitvector is false
-  LOG.debug("Expected blob type but got " + value.getClass().getName());
+  // org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getStatsList(enableBitVector,enableKll)
+  // We get here when enableBitvector or enableKll is false
   return null;
 }
   }
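
The commit message notes that the absence of BITVECTOR/KLL stats could be reported in a different place than the per-value extraction path. A minimal sketch of that idea follows, assuming a caller that knows the flag up front. The class `BlobLoggingSketch`, its methods, and the in-memory `LOG` list are hypothetical illustrations and are not part of Hive or of this patch.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch; none of these names exist in Hive. */
final class BlobLoggingSketch {
  /** Collects messages in memory instead of using a real logger, to stay self-contained. */
  static final List<String> LOG = new ArrayList<>();

  /** Mirrors the patched behaviour: a non-blob value quietly yields null. */
  static byte[] extractBlob(Object value) {
    return (value instanceof byte[]) ? (byte[]) value : null;
  }

  /**
   * Reports the disabled stats once per batch, at the caller,
   * rather than once per extracted value. Returns the number of blobs found.
   */
  static int extractAll(List<Object> values, boolean enableBitVector) {
    if (!enableBitVector) {
      LOG.add("Bit vectors disabled; blob columns will be read as null");
    }
    int blobs = 0;
    for (Object v : values) {
      if (extractBlob(v) != null) {
        blobs++;
      }
    }
    return blobs;
  }
}
```

With this shape the number of log lines depends on the number of batches, not on the number of partitions or column values.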



(hive) 02/02: HIVE-28046: Use serdeConstants fields instead of string literals in hive-exec module (Michal Lorek reviewed by Stamatis Zampetakis)

2024-04-05 Thread zabetak

zabetak pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hive.git

commit a537000af730e994a37b0f60b37007e3caf8f1c7
Author: mlorek 
AuthorDate: Tue Jan 30 14:28:23 2024 +

HIVE-28046: Use serdeConstants fields instead of string literals in 
hive-exec module (Michal Lorek reviewed by Stamatis Zampetakis)

The procedure to replace literals with constants is outlined below:
1. Generate a sed script file from serdeConstants.java with the possible 
replacements
grep "public static final java.lang.String" ./serde/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/serde/serdeConstants.java | sed 's/.\+java\.lang\.String//' | sed 's/ //g' | sed 's/\(.*\)=\(.*\);/s#\2#serdeConstants.\1#/' | sed 's/\./\\./g' > sedrepscript

2. Apply the script file to all Java files in production code
find ql/src/java -name "*.java" -exec sed -i -f sedrepscript {} \;

3. Manually review the changed files, adding imports where necessary and 
reverting irrelevant changes

Using serdeConstants fields instead of string literals for 'columns' and 
'columns.types'
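
For readers less familiar with sed, steps 1 and 2 can be approximated in Java as follows. This is only an illustrative sketch, not the tooling used for the commit: the class `LiteralReplacerSketch` and its methods are hypothetical, and unlike the sed script (which substitutes the first match per line) it replaces every occurrence in a line.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of the literal-to-constant rewrite; not part of Hive. */
final class LiteralReplacerSketch {
  // Matches declarations of the form
  //   public static final java.lang.String TINYINT_TYPE_NAME = "tinyint";
  // capturing the field name and the quoted literal.
  private static final Pattern DECL = Pattern.compile(
      "public static final java\\.lang\\.String\\s+(\\w+)\\s*=\\s*(\"[^\"]*\")\\s*;");

  /** Step 1: map each quoted literal to its qualified constant name. */
  static Map<String, String> buildReplacements(String constantsSource) {
    Map<String, String> replacements = new LinkedHashMap<>();
    Matcher m = DECL.matcher(constantsSource);
    while (m.find()) {
      replacements.put(m.group(2), "serdeConstants." + m.group(1));
    }
    return replacements;
  }

  /** Step 2: replace each quoted literal in one line of source text. */
  static String apply(Map<String, String> replacements, String line) {
    String result = line;
    for (Map.Entry<String, String> e : replacements.entrySet()) {
      // String.replace performs literal (non-regex) substitution.
      result = result.replace(e.getKey(), e.getValue());
    }
    return result;
  }
}
```

Because the quotes are part of the search key, a literal such as "int" does not match inside "tinyint". The rewrite is still purely textual, which is why step 3 (manual review) remains necessary.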

Co-authored-by: Stamatis Zampetakis 

Close apache/hive#5072
---
 .../hadoop/hive/ql/exec/ColumnStatsUpdateTask.java | 27 ---
 .../apache/hadoop/hive/ql/exec/DDLPlanUtils.java   |  3 +-
 .../ql/exec/vector/VectorExpressionDescriptor.java | 24 +++---
 .../hive/ql/exec/vector/VectorizationContext.java  | 86 +++---
 .../hive/ql/exec/vector/VectorizedBatchUtil.java   |  3 +-
 .../exec/vector/expressions/CastDateToBoolean.java |  3 +-
 .../vector/expressions/CastDateToTimestamp.java|  3 +-
 .../vector/expressions/CastDoubleToTimestamp.java  |  3 +-
 .../vector/expressions/CastTimestampToBoolean.java |  3 +-
 .../vector/expressions/CastTimestampToDouble.java  |  3 +-
 .../vector/expressions/CastTimestampToLong.java|  3 +-
 .../expressions/DateColSubtractDateColumn.java |  5 +-
 .../expressions/DateColSubtractDateScalar.java |  5 +-
 .../expressions/DateScalarSubtractDateColumn.java  |  5 +-
 .../IfExprDoubleColumnDoubleColumn.java|  5 +-
 .../hive/ql/exec/vector/ptf/VectorPTFOperator.java | 11 +--
 .../org/apache/hadoop/hive/ql/io/IOConstants.java  |  2 +-
 .../hadoop/hive/ql/io/RCFileOutputFormat.java  |  3 +-
 .../org/apache/hadoop/hive/ql/io/orc/OrcSerde.java |  4 +-
 .../org/apache/hadoop/hive/ql/io/orc/OrcUnion.java |  5 +-
 .../parquet/convert/HiveCollectionConverter.java   |  4 +-
 .../hive/ql/io/parquet/serde/ParquetHiveSerDe.java |  2 +-
 .../hive/ql/io/sarg/ConvertAstToSearchArg.java |  3 +-
 .../hive/ql/optimizer/SimpleFetchOptimizer.java|  3 +-
 .../calcite/translator/SqlFunctionConverter.java   | 28 +++
 .../hive/ql/optimizer/physical/Vectorizer.java | 55 +++---
 .../hadoop/hive/ql/parse/BaseSemanticAnalyzer.java |  4 +-
 .../hive/ql/parse/rewrite/MergeRewriter.java   |  3 +-
 .../hive/ql/parse/type/TypeCheckProcFactory.java   |  8 +-
 .../apache/hadoop/hive/ql/plan/PartitionDesc.java  |  2 +-
 .../org/apache/hadoop/hive/ql/plan/PlanUtils.java  |  9 ++-
 .../hadoop/hive/ql/processors/DfsProcessor.java|  3 +-
 .../ql/processors/LlapCacheResourceProcessor.java  |  5 +-
 .../processors/LlapClusterResourceProcessor.java   | 13 ++--
 .../hive/ql/udf/esri/serde/BaseJsonSerDe.java  |  3 +-
 .../hive/ql/udf/generic/GenericUDFBetween.java |  3 +-
 .../hive/ql/udf/ptf/ValueBoundaryScanner.java  | 25 ---
 37 files changed, 207 insertions(+), 172 deletions(-)

diff --git a/ql/src/java/org/apache/hadoop/hive/ql/exec/ColumnStatsUpdateTask.java b/ql/src/java/org/apache/hadoop/hive/ql/exec/ColumnStatsUpdateTask.java
index 8b6c8d6b1bd..c4ae676d7a9 100644
--- a/ql/src/java/org/apache/hadoop/hive/ql/exec/ColumnStatsUpdateTask.java
+++ b/ql/src/java/org/apache/hadoop/hive/ql/exec/ColumnStatsUpdateTask.java
@@ -55,6 +55,7 @@ import org.apache.hadoop.hive.ql.metadata.Table;
 import org.apache.hadoop.hive.ql.parse.SemanticException;
 import org.apache.hadoop.hive.ql.plan.ColumnStatsUpdateWork;
 import org.apache.hadoop.hive.ql.plan.api.StageType;
+import org.apache.hadoop.hive.serde.serdeConstants;
 import org.apache.hadoop.hive.serde2.io.DateWritableV2;
 import org.apache.hadoop.hive.serde2.io.TimestampWritableV2;
 import org.slf4j.Logger;
@@ -101,9 +102,11 @@ public class ColumnStatsUpdateTask extends Task {
 
 ColumnStatisticsData statsData = new ColumnStatisticsData();
 
-if (columnType.equalsIgnoreCase("long") || columnType.equalsIgnoreCase("tinyint")
-|| columnType.equalsIgnoreCase("smallint") || columnType.equalsIgnoreCase("int")
-|| columnType.equalsIgnoreCase("bigint")) {
+if (columnType.equalsIgnoreCase("long")
+|| columnType.equalsIgnoreCase(serdeConstants.TINYINT_TYPE_NAME)
+|| columnType.equalsIgnoreCase(serdeConstants.SMALLINT_TYPE_NAME)
+|| 

(hive) branch master updated: HIVE-28148: Implement array_compact UDF to remove all nulls from an array (#5161) (Taraka Rama Rao Lethavadla reviewed by Sourabh Badhya)

2024-04-05 Thread sbadhya

sbadhya pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hive.git


The following commit(s) were added to refs/heads/master by this push:
 new 855e4055675 HIVE-28148: Implement array_compact UDF to remove all 
nulls from an array (#5161) (Taraka Rama Rao Lethavadla reviewed by Sourabh 
Badhya)
855e4055675 is described below

commit 855e4055675e3c993a61f59501f783e641abaaa6
Author: tarak271 
AuthorDate: Fri Apr 5 16:45:17 2024 +0530

HIVE-28148: Implement array_compact UDF to remove all nulls from an array 
(#5161) (Taraka Rama Rao Lethavadla reviewed by Sourabh Badhya)
---
 .../hadoop/hive/ql/exec/FunctionRegistry.java  |   1 +
 .../ql/udf/generic/GenericUDFArrayCompact.java |  56 +
 .../ql/udf/generic/TestGenericUDFArrayCompact.java | 127 +
 .../queries/clientnegative/udf_array_compact_1.q   |   1 +
 .../queries/clientpositive/udf_array_compact.q |  38 ++
 .../clientnegative/udf_array_compact_1.q.out   |   1 +
 .../clientpositive/llap/show_functions.q.out   |   2 +
 .../clientpositive/llap/udf_array_compact.q.out| 112 ++
 8 files changed, 338 insertions(+)

diff --git a/ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java 
b/ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java
index 28f35c4a15f..c54a59f9516 100644
--- a/ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java
+++ b/ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java
@@ -617,6 +617,7 @@ public final class FunctionRegistry {
 system.registerGenericUDF("array_remove", GenericUDFArrayRemove.class);
 system.registerGenericUDF("array_position", GenericUDFArrayPosition.class);
 system.registerGenericUDF("array_append", GenericUDFArrayAppend.class);
+system.registerGenericUDF("array_compact", GenericUDFArrayCompact.class);
 system.registerGenericUDF("deserialize", GenericUDFDeserialize.class);
 system.registerGenericUDF("sentences", GenericUDFSentences.class);
 system.registerGenericUDF("map_keys", GenericUDFMapKeys.class);
diff --git a/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFArrayCompact.java b/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFArrayCompact.java
new file mode 100644
index 000..71f5526e126
--- /dev/null
+++ b/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFArrayCompact.java
@@ -0,0 +1,56 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.udf.generic;
+
+import org.apache.hadoop.hive.ql.exec.Description;
+import org.apache.hadoop.hive.ql.metadata.HiveException;
+import org.apache.hadoop.hive.serde2.objectinspector.ListObjectInspector;
+import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
+
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.List;
+import java.util.Objects;
+import java.util.stream.Collectors;
+
+/**
+ * GenericUDFArrayCompact.
+ */
+@Description(name = "array_compact", value = "_FUNC_(array) - Removes NULL elements from array.",
+extended = "Example:\n" + "  > SELECT _FUNC_(array(1,NULL,3,NULL,4)) FROM src;\n" + "  [1,3,4]")
+public class GenericUDFArrayCompact extends AbstractGenericUDFArrayBase {
+  private static final String FUNC_NAME = "ARRAY_COMPACT";
+
+  public GenericUDFArrayCompact() {
+super(FUNC_NAME, 1, 1, ObjectInspector.Category.LIST);
+  }
+
+  @Override
+  public Object evaluate(DeferredObject[] arguments) throws HiveException {
+Object array = arguments[ARRAY_IDX].get();
+int arrayLength = arrayOI.getListLength(array);
+if (arrayLength == 0) {
+  return Collections.emptyList();
+} else if (arrayLength < 0) {
+  return null;
+}
+
+List resultArray = new ArrayList<>(((ListObjectInspector) argumentOIs[ARRAY_IDX]).getList(array));
+return resultArray.stream().filter(Objects::nonNull).map(o -> converter.convert(o)).collect(Collectors.toList());
+  }
+}
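
The semantics of the UDF above (null array yields null, empty array yields an empty list, otherwise nulls are dropped) can be tried outside Hive with plain Java collections. The sketch below assumes nothing about Hive's object inspectors or converters. `ArrayCompactSketch` is a hypothetical stand-in, not the committed class.

```java
import java.util.List;
import java.util.Objects;
import java.util.stream.Collectors;

/** Hypothetical stand-alone illustration of array_compact semantics; not Hive code. */
final class ArrayCompactSketch {
  /** Returns null for a null input, otherwise a new list without null elements. */
  static <T> List<T> compact(List<T> input) {
    if (input == null) {
      return null;
    }
    return input.stream().filter(Objects::nonNull).collect(Collectors.toList());
  }
}
```

For the input from the `@Description` example, `compact(Arrays.asList(1, null, 3, null, 4))` yields a list equal to `[1, 3, 4]`.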
diff --git 
a/ql/src/test/org/apache/hadoop/hive/ql/udf/generic/TestGenericUDFArrayCompact.java
 

(hive) branch master updated: HIVE-28037: Run multiple Qtests with Postgres (Zoltan Ratkai, reviewed by Denys Kuzmenko, Zsolt Miskolczi)

2024-04-05 Thread dkuzmenko

dkuzmenko pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hive.git


The following commit(s) were added to refs/heads/master by this push:
 new c6290206d3b HIVE-28037: Run multiple Qtests with Postgres (Zoltan 
Ratkai, reviewed by Denys Kuzmenko, Zsolt Miskolczi)
c6290206d3b is described below

commit c6290206d3bc2d97872b2b0a7910c6cc05526c3c
Author: Zoltan Ratkai <117656751+zrat...@users.noreply.github.com>
AuthorDate: Fri Apr 5 10:14:10 2024 +0200

HIVE-28037: Run multiple Qtests with Postgres (Zoltan Ratkai, reviewed by 
Denys Kuzmenko, Zsolt Miskolczi)

Closes #5118
---
 .../main/java/org/apache/hadoop/hive/cli/control/CoreCliDriver.java | 1 +
 .../main/java/org/apache/hadoop/hive/ql/QTestMetaStoreHandler.java  | 6 ++
 2 files changed, 3 insertions(+), 4 deletions(-)

diff --git a/itests/util/src/main/java/org/apache/hadoop/hive/cli/control/CoreCliDriver.java b/itests/util/src/main/java/org/apache/hadoop/hive/cli/control/CoreCliDriver.java
index 8f4e9ad1a62..19b93f1825f 100644
--- a/itests/util/src/main/java/org/apache/hadoop/hive/cli/control/CoreCliDriver.java
+++ b/itests/util/src/main/java/org/apache/hadoop/hive/cli/control/CoreCliDriver.java
@@ -92,6 +92,7 @@ public class CoreCliDriver extends CliAdapter {
   @AfterClass
   public void shutdown() throws Exception {
 qt.shutdown();
+metaStoreHandler.getRule().after();
   }
 
   @Override
diff --git a/itests/util/src/main/java/org/apache/hadoop/hive/ql/QTestMetaStoreHandler.java b/itests/util/src/main/java/org/apache/hadoop/hive/ql/QTestMetaStoreHandler.java
index e8827bda900..95ae730d704 100644
--- a/itests/util/src/main/java/org/apache/hadoop/hive/ql/QTestMetaStoreHandler.java
+++ b/itests/util/src/main/java/org/apache/hadoop/hive/ql/QTestMetaStoreHandler.java
@@ -98,14 +98,12 @@ public class QTestMetaStoreHandler {
   }
 
   public void beforeTest() throws Exception {
-getRule().before();
-if (!isDerby()) {// derby is handled with old QTestUtil logic (TxnDbUtil stuff)
-  getRule().install();
+if (isDerby()) {
+  getRule().before();
 }
   }
 
   public void afterTest(QTestUtil qt) throws Exception {
-getRule().after();
 
 // special qtest logic, which doesn't fit quite well into Derby.after()
 if (isDerby()) {