[jira] [Work logged] (HIVE-27317) Temporary (local) session files cleanup improvements

2023-05-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27317?focusedWorklogId=861116&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-861116
 ]

ASF GitHub Bot logged work on HIVE-27317:
-

Author: ASF GitHub Bot
Created on: 09/May/23 04:31
Start Date: 09/May/23 04:31
Worklog Time Spent: 10m 
  Work Description: sonarcloud[bot] commented on PR #4293:
URL: https://github.com/apache/hive/pull/4293#issuecomment-1539389753

   Kudos, SonarCloud Quality Gate passed! (Quality Gate report: 
https://sonarcloud.io/dashboard?id=apache_hive&pullRequest=4293)
   
   Bugs: 0 (rated A)
   Vulnerabilities: 0 (rated A)
   Security Hotspots: 0 (rated A)
   Code Smells: 2 (rated A)
   No Coverage information
   No Duplication information
   
   




Issue Time Tracking
---

Worklog Id: (was: 861116)
Time Spent: 1h 10m  (was: 1h)

> Temporary (local) session files cleanup improvements
> 
>
> Key: HIVE-27317
> URL: https://issues.apache.org/jira/browse/HIVE-27317
> Project: Hive
>  Issue Type: Improvement
>Reporter: Sercan Tekin
>Assignee: Sercan Tekin
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-27317.patch
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> When a Hive session is killed, the shutdown hook gets no chance to clean up 
> tmp files.
> There is a Hive service to clean residual files 
> (https://issues.apache.org/jira/browse/HIVE-13429), and its execution was later 
> scheduled inside HS2 (https://issues.apache.org/jira/browse/HIVE-15068) to 
> make sure no temp file is left behind. But this service cleans up only 
> HDFS temp files; residual files/dirs remain in the 
> *HiveConf.ConfVars.LOCALSCRATCHDIR* location, as follows:
> {code:java}
> > ll /tmp/user/97c4ef50-5e80-480e-a6f0-4f779050852b*
> drwx------ 2 user user 4096 Oct 29 10:09 97c4ef50-5e80-480e-a6f0-4f779050852b
> -rw------- 1 user user    0 Oct 29 10:09 
> 97c4ef50-5e80-480e-a6f0-4f779050852b10571819313894728966.pipeout
> -rw------- 1 user user    0 Oct 29 10:09 
> 97c4ef50-5e80-480e-a6f0-4f779050852b16013956055489853961.pipeout
> -rw------- 1 user user    0 Oct 29 10:09 
> 97c4ef50-5e80-480e-a6f0-4f779050852b4383913570068173450.pipeout
> -rw------- 1 user user    0 Oct 29 10:09 
> 97c4ef50-5e80-480e-a6f0-4f779050852b889740171428672108.pipeout {code}

[jira] [Work logged] (HIVE-27186) A persistent property store

2023-05-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27186?focusedWorklogId=861113&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-861113
 ]

ASF GitHub Bot logged work on HIVE-27186:
-

Author: ASF GitHub Bot
Created on: 09/May/23 04:06
Start Date: 09/May/23 04:06
Worklog Time Spent: 10m 
  Work Description: dengzhhu653 commented on PR #4194:
URL: https://github.com/apache/hive/pull/4194#issuecomment-1539365660

   The latest change overall looks good to me. @saihemanth-cloudera @nrg4878 
Could you please take a look if you have a chance? Thanks!




Issue Time Tracking
---

Worklog Id: (was: 861113)
Time Spent: 18h 50m  (was: 18h 40m)

> A persistent property store 
> 
>
> Key: HIVE-27186
> URL: https://issues.apache.org/jira/browse/HIVE-27186
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Affects Versions: 4.0.0-alpha-2
>Reporter: Henri Biestro
>Assignee: Henri Biestro
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 18h 50m
>  Remaining Estimate: 0h
>
> WHAT
> A persistent property store usable as a support facility for any metadata 
> augmentation feature.
> WHY
> When adding new meta-data oriented features, we usually need to persist 
> information linking the feature data and the HiveMetaStore objects it applies 
> to. Any information related to a database, a table or the cluster - like 
> statistics for example or any operational data state or data (think rolling 
> backup) - falls into this use-case.
> Typically, accommodating such a feature requires modifying the Metastore 
> database schema by adding or altering a table. It also usually implies 
> modifying the thrift APIs to expose such meta-data to consumers.
> The proposed feature aims to solve persistence and query/transport for 
> these types of use-cases by exposing a 'key/(meta)value' store surfaced as a 
> property system.
> HOW
> The exposed API is a simple, generic property-value model.
> To provision for several usage scenarios, the model entry point is a 
> 'namespace' that qualifies the feature-component property manager. For 
> example, 'stats' could be the namespace for all properties related to the 
> 'statistics' feature.
> The namespace identifies a manager that handles property-groups persisted as 
> property-maps. For instance, all statistics pertaining to a given table would 
> be collocated in the same property-group. As such, all properties (say number 
> of 'unique_values' per column) for a given HMS table 'relation0' would all 
> be stored and persisted in the same property-map instance.
> Property-maps may be decorated by an (optional) schema that may declare the 
> name and value-type of allowed properties (and their optional default value). 
> Each property is addressed by a name, a path uniquely identifying the 
> property in a given property map.
> The manager also handles transforming property-map names to the property-map 
> keys used to persist them in the DB.
> The API provides inserting/updating properties in bulk transactionally. It 
> also provides selection/projection to help reduce the volume of exchange 
> between client/server; selection can use (JEXL expression) predicates to 
> filter maps.
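
To make the namespace / property-map model concrete, a hypothetical usage
sketch; declare() mirrors the registration visible in the PR, while create(),
setProperty() and commit() are illustrative names, not the actual API:

{code:java}
// Register the manager class for the 'stats' namespace (hypothetical subclass).
PropertyManager.declare("stats", StatsPropertyManager.class);

// Per-session manager driving queries and updates transactionally.
PropertyManager stats = PropertyManager.create("stats", store);

// Both keys land in the property-map of table 'relation0', so the map is
// marked dirty once and rewritten only once when the transaction commits.
stats.setProperty("relation0.unique_values.col0", 42L);
stats.setProperty("relation0.unique_values.col1", 7L);
stats.commit();
{code}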



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-27186) A persistent property store

2023-05-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27186?focusedWorklogId=861112&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-861112
 ]

ASF GitHub Bot logged work on HIVE-27186:
-

Author: ASF GitHub Bot
Created on: 09/May/23 04:03
Start Date: 09/May/23 04:03
Worklog Time Spent: 10m 
  Work Description: dengzhhu653 commented on code in PR #4194:
URL: https://github.com/apache/hive/pull/4194#discussion_r1188094645


##
standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/properties/PropertyManager.java:
##
@@ -0,0 +1,641 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.metastore.properties;
+
+
+import org.apache.commons.jexl3.JexlBuilder;
+import org.apache.commons.jexl3.JexlContext;
+import org.apache.commons.jexl3.JexlEngine;
+import org.apache.commons.jexl3.JexlException;
+import org.apache.commons.jexl3.JexlExpression;
+import org.apache.commons.jexl3.JexlFeatures;
+import org.apache.commons.jexl3.JexlScript;
+import org.apache.commons.jexl3.ObjectContext;
+import org.apache.commons.jexl3.introspection.JexlPermissions;
+import org.apache.hadoop.hive.metastore.api.MetaException;
+import org.apache.hadoop.hive.metastore.api.NoSuchObjectException;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.lang.reflect.Constructor;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import java.util.Properties;
+import java.util.TreeMap;
+import java.util.UUID;
+import java.util.function.Function;
+import java.util.function.Predicate;

Review Comment:
   nit: unused import





Issue Time Tracking
---

Worklog Id: (was: 861112)
Time Spent: 18h 40m  (was: 18.5h)

> A persistent property store 
> 
>
> Key: HIVE-27186
> URL: https://issues.apache.org/jira/browse/HIVE-27186
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Affects Versions: 4.0.0-alpha-2
>Reporter: Henri Biestro
>Assignee: Henri Biestro
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 18h 40m
>  Remaining Estimate: 0h
>
> WHAT
> A persistent property store usable as a support facility for any metadata 
> augmentation feature.
> WHY
> When adding new meta-data oriented features, we usually need to persist 
> information linking the feature data and the HiveMetaStore objects it applies 
> to. Any information related to a database, a table or the cluster - like 
> statistics for example or any operational data state or data (think rolling 
> backup) - falls into this use-case.
> Typically, accommodating such a feature requires modifying the Metastore 
> database schema by adding or altering a table. It also usually implies 
> modifying the thrift APIs to expose such meta-data to consumers.
> The proposed feature aims to solve persistence and query/transport for 
> these types of use-cases by exposing a 'key/(meta)value' store surfaced as a 
> property system.
> HOW
> The exposed API is a simple, generic property-value model.
> To provision for several usage scenarios, the model entry point is a 
> 'namespace' that qualifies the feature-component property manager. For 
> example, 'stats' could be the namespace for all properties related to the 
> 'statistics' feature.
> The namespace identifies a manager that handles property-groups persisted as 
> property-maps. For instance, all statistics pertaining to a given table would 
> be collocated in the same property-group. As such, all properties (say number 
> of 'unique_values' per column) for a given HMS table 'relation0' would all 
> be stored and persisted in the same property-map instance.
> Property-maps may be decorated by an (optional) schema that may declare the 
> name and value-type of allowed properties (and their optional default value). 
> Each property is addressed by a name, a path uniquely identifying the 
> property in a given property map.

[jira] [Work logged] (HIVE-27186) A persistent property store

2023-05-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27186?focusedWorklogId=86&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-86
 ]

ASF GitHub Bot logged work on HIVE-27186:
-

Author: ASF GitHub Bot
Created on: 09/May/23 04:02
Start Date: 09/May/23 04:02
Worklog Time Spent: 10m 
  Work Description: dengzhhu653 commented on code in PR #4194:
URL: https://github.com/apache/hive/pull/4194#discussion_r1188094363


##
standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/properties/CachingPropertyStore.java:
##
@@ -0,0 +1,152 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.metastore.properties;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.hive.metastore.conf.MetastoreConf;
+import static org.apache.hadoop.hive.metastore.conf.MetastoreConf.ConfVars.PROPERTIES_CACHE_CAPACITY;
+import static org.apache.hadoop.hive.metastore.conf.MetastoreConf.ConfVars.PROPERTIES_CACHE_LOADFACTOR;
+
+import java.util.Iterator;
+import java.util.Map;
+import java.util.Objects;
+import java.util.TreeMap;
+import java.util.UUID;
+import java.util.function.BiFunction;
+import java.util.function.Function;
+import java.util.function.Predicate;
+
+/**
+ * A property map store using a pull-thru cache.
+ * 
+ * Before a map is returned, a check against the stored corresponding digest
+ * is performed to avoid using stale data.
+ * 
+ */
+public class CachingPropertyStore extends PropertyStore {
+  protected final SoftCache<String, PropertyMap> maps;
+  protected final PropertyStore store;
+
+  public CachingPropertyStore(PropertyStore wrap) {
+    this(wrap, new Configuration());
+  }
+
+  public CachingPropertyStore(PropertyStore wrap, Configuration conf) {
+    store = wrap;
+    int capacity = MetastoreConf.getIntVar(conf, MetastoreConf.ConfVars.PROPERTIES_CACHE_CAPACITY);
+    double fillFactor = MetastoreConf.getDoubleVar(conf, MetastoreConf.ConfVars.PROPERTIES_CACHE_LOADFACTOR);
+    maps = new SoftCache<>(capacity, fillFactor, false);
+  }
+  public void clearCache() {
+    maps.clear();
+  }
+
+  @Override public UUID fetchDigest(String mapKey) {
+    return store.fetchDigest(mapKey);
+  }
+
+  @Override
+  public Map<String, UUID> selectDigest(String keyPrefix, Predicate<String> keyFilter) {
+    return store.selectDigest(keyPrefix, keyFilter);
+  }
+
+  @Override
+  public PropertyMap fetchProperties(final String mapKey, final Function<String, PropertySchema> getSchema) {
+    synchronized(this) {
+      PropertyMap map = maps.compute(mapKey, mapsCompute(mapKey, getSchema));
+      // we always return a copy of the properties in the cache
+      return map != null ? map.copy() : null;
+    }
+  }
+
+  BiFunction<String, PropertyMap, PropertyMap> mapsCompute(String string, Function<String, PropertySchema> getSchema) {
+    return (k, v) -> {
+      PropertyMap map = v;
+      if (map != null) {
+        UUID digest = map.getDigest();
+        UUID fetchedDigest = fetchDigest(string);
+        if (fetchedDigest != null && !Objects.equals(digest, fetchedDigest)) {
+          map = null;
+        }
+      }
+      if (map == null) {
+        map = store.fetchProperties(string, getSchema);
+      }
+      return map;
+    };
+  }
+
+  @Override
+  public Map<String, PropertyMap> selectProperties(final String keyPrefix, Predicate<String> keyFilter, Function<String, PropertySchema> getSchema) {
+    final Map<String, PropertyMap> results = new TreeMap<>();
+    // go select the digests for the maps we seek
+    final Map<String, UUID> digests = store.selectDigest(keyPrefix, keyFilter);
+    final Iterator<Map.Entry<String, UUID>> idigest = digests.entrySet().iterator();
+    while (idigest.hasNext()) {
+      Map.Entry<String, UUID> entry = idigest.next();
+      String key = entry.getKey();
+      PropertyMap map = maps.get(key);
+      // remove from maps to select and add to results if in the cache and digest is valid
+      if (map != null && Objects.equals(map.getDigest(), entry.getValue())) {
+        results.put(key, map.copy());
+        idigest.remove();
+      }
+    }
+    // digests now contains the names of maps required that are not results
+    Map<String, PropertyMap> selectedMaps = store.selectProperties(keyPrefix, digests::containsKey, getSchema);
+    // we cache those new maps and for 

[jira] [Work logged] (HIVE-27186) A persistent property store

2023-05-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27186?focusedWorklogId=861110&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-861110
 ]

ASF GitHub Bot logged work on HIVE-27186:
-

Author: ASF GitHub Bot
Created on: 09/May/23 04:01
Start Date: 09/May/23 04:01
Worklog Time Spent: 10m 
  Work Description: dengzhhu653 commented on code in PR #4194:
URL: https://github.com/apache/hive/pull/4194#discussion_r1188093913


##
standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/properties/PropertyManager.java:
##
@@ -0,0 +1,629 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.metastore.properties;
+
+
+import org.apache.commons.jexl3.JexlBuilder;
+import org.apache.commons.jexl3.JexlContext;
+import org.apache.commons.jexl3.JexlEngine;
+import org.apache.commons.jexl3.JexlException;
+import org.apache.commons.jexl3.JexlExpression;
+import org.apache.commons.jexl3.JexlFeatures;
+import org.apache.commons.jexl3.JexlScript;
+import org.apache.commons.jexl3.ObjectContext;
+import org.apache.commons.jexl3.introspection.JexlPermissions;
+import org.apache.hadoop.hive.metastore.api.MetaException;
+import org.apache.hadoop.hive.metastore.api.NoSuchObjectException;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.lang.reflect.Constructor;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import java.util.Properties;
+import java.util.TreeMap;
+import java.util.UUID;
+import java.util.function.Function;
+
+/**
+ * A property manager.
+ * 
+ * This handles operations at the higher functional level; an instance is
+ * created per-session and drives queries and updates in a transactional manner.
+ * 
+ * 
+ * The manager ties the property schemas into one namespace; all property maps it
+ * handles must and will use one of its known schemas.
+ * 
+ * The manager class needs to be registered with its namespace as key
+ * 
+ *   Since a collection of properties is stored in a map, to avoid hitting
+ *   the persistence store for each update - which would mean rewriting the
+ *   map multiple times - the manager keeps track of dirty maps whilst
+ *   serving as transaction manager. This way, when importing multiple
+ *   properties targeting different elements (think setting properties for
+ *   different tables), each impacted map is only rewritten once by the
+ *   persistence layer during commit. This also allows multiple calls to
+ *   participate in one transaction.
+ * 
+ */
+public abstract class PropertyManager {
+  /** The logger. */
+  public static final Logger LOGGER = LoggerFactory.getLogger(PropertyManager.class);
+  /** The set of dirty maps. */
+  protected final Map<String, PropertyMap> dirtyMaps = new HashMap<>();
+  /** This manager namespace. */
+  protected final String namespace;
+  /** The property map store. */
+  protected final PropertyStore store;
+  /** A Jexl engine for convenience. */
+  static final JexlEngine JEXL;
+  static {
+    JexlFeatures features = new JexlFeatures()
+        .sideEffect(false)
+        .sideEffectGlobal(false);
+    JexlPermissions p = JexlPermissions.RESTRICTED
+        .compose("org.apache.hadoop.hive.metastore.properties.*");
+    JEXL = new JexlBuilder()
+        .features(features)
+        .permissions(p)
+        .create();
+  }
+
+  /**
+   * The map of defined managers.
+   */
+  private static final Map<String, Constructor<? extends PropertyManager>> NSMANAGERS = new HashMap<>();
+
+  /**
+   * Declares a property manager class.
+   * @param ns the namespace
+   * @param pmClazz the property manager class
+   */
+  public static boolean declare(String ns, Class<? extends PropertyManager> pmClazz) {
+    try {
+      synchronized(NSMANAGERS) {
+        Constructor<? extends PropertyManager> ctor = NSMANAGERS.get(ns);
+        if (ctor == null) {
+          ctor = pmClazz.getConstructor(String.class, PropertyStore.class);
+          NSMANAGERS.put(ns, ctor);
+          return true;
+        } else {
+          if (!Objects.equals(ctor.getDeclaringClass(), pmClazz)) {
+            LOGGER.error("namespace 

[jira] [Commented] (HIVE-27317) Temporary (local) session files cleanup improvements

2023-05-08 Thread Sercan Tekin (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17720762#comment-17720762
 ] 

Sercan Tekin commented on HIVE-27317:
-

[~daijy], you have implemented *ClearDanglingScratchDir* in the first place, 
could you please take a look at this?

> Temporary (local) session files cleanup improvements
> 
>
> Key: HIVE-27317
> URL: https://issues.apache.org/jira/browse/HIVE-27317
> Project: Hive
>  Issue Type: Improvement
>Reporter: Sercan Tekin
>Assignee: Sercan Tekin
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-27317.patch
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> When a Hive session is killed, the shutdown hook gets no chance to clean up 
> tmp files.
> There is a Hive service to clean residual files 
> (https://issues.apache.org/jira/browse/HIVE-13429), and its execution was later 
> scheduled inside HS2 (https://issues.apache.org/jira/browse/HIVE-15068) to 
> make sure no temp file is left behind. But this service cleans up only 
> HDFS temp files; residual files/dirs remain in the 
> *HiveConf.ConfVars.LOCALSCRATCHDIR* location, as follows:
> {code:java}
> > ll /tmp/user/97c4ef50-5e80-480e-a6f0-4f779050852b*
> drwx------ 2 user user 4096 Oct 29 10:09 97c4ef50-5e80-480e-a6f0-4f779050852b
> -rw------- 1 user user    0 Oct 29 10:09 
> 97c4ef50-5e80-480e-a6f0-4f779050852b10571819313894728966.pipeout
> -rw------- 1 user user    0 Oct 29 10:09 
> 97c4ef50-5e80-480e-a6f0-4f779050852b16013956055489853961.pipeout
> -rw------- 1 user user    0 Oct 29 10:09 
> 97c4ef50-5e80-480e-a6f0-4f779050852b4383913570068173450.pipeout
> -rw------- 1 user user    0 Oct 29 10:09 
> 97c4ef50-5e80-480e-a6f0-4f779050852b889740171428672108.pipeout {code}
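
One way such a local sweep could look, as a minimal hypothetical sketch; the
age threshold, the *.pipeout filter, and the commons-io delete are
illustrative, not the logic of HIVE-27317.patch:

{code:java}
import java.io.File;
import java.time.Duration;
import java.time.Instant;
import org.apache.commons.io.FileUtils;

// Hypothetical sketch: sweep LOCALSCRATCHDIR for session leftovers (the
// UUID-named dirs and *.pipeout files shown above) that a killed session's
// shutdown hook never removed. Staleness is approximated by mtime here.
public class LocalScratchDirSweeper {
  public static void sweep(File localScratchDir, Duration maxAge) {
    File[] entries = localScratchDir.listFiles();
    if (entries == null) {
      return; // not a directory, or an I/O error
    }
    Instant cutoff = Instant.now().minus(maxAge);
    for (File entry : entries) {
      boolean stale = Instant.ofEpochMilli(entry.lastModified()).isBefore(cutoff);
      if (stale && (entry.isDirectory() || entry.getName().endsWith(".pipeout"))) {
        FileUtils.deleteQuietly(entry); // recursive for directories
      }
    }
  }
}
{code}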



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-27325) Expiring old snapshots deletes files with DirectExecutorService causing runtime delays

2023-05-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27325?focusedWorklogId=861109&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-861109
 ]

ASF GitHub Bot logged work on HIVE-27325:
-

Author: ASF GitHub Bot
Created on: 09/May/23 03:17
Start Date: 09/May/23 03:17
Worklog Time Spent: 10m 
  Work Description: ayushtkn commented on code in PR #4302:
URL: https://github.com/apache/hive/pull/4302#discussion_r1188072886


##
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java:
##
@@ -676,6 +688,15 @@ public void 
executeOperation(org.apache.hadoop.hive.ql.metadata.Table hmsTable,
 }
   }
 
+  private static ExecutorService getDeleteExecutorService(String completeName, int numThreads) {
+    AtomicInteger deleteThreadsIndex = new AtomicInteger(0);
+    return Executors.newFixedThreadPool(numThreads, runnable -> {

Review Comment:
   Thanx. Yes, Iceberg ain't closing it. Have addressed the comment





Issue Time Tracking
---

Worklog Id: (was: 861109)
Time Spent: 40m  (was: 0.5h)

> Expiring old snapshots deletes files with DirectExecutorService causing 
> runtime delays
> --
>
> Key: HIVE-27325
> URL: https://issues.apache.org/jira/browse/HIVE-27325
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Assignee: Ayush Saxena
>Priority: Major
>  Labels: iceberg, pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Expiring old snapshots takes a lot of time, as fileCleanupStrategy internally 
> uses directExecutorService. Creating this as a placeholder ticket to fix 
> this. If it is fixed in Iceberg, the library here needs to be upgraded.
> {noformat}
> insert into store_sales_delete_9 select *, current_timestamp() as ts from 
> tpcds_1000_update.ssv ;;
> ALTER TABLE store_sales_delete_9 EXECUTE expire_snapshots('2023-05-09 
> 00:00:00');
> {noformat}
> {noformat}
>   at 
> org.apache.iceberg.relocated.com.google.common.util.concurrent.AbstractListeningExecutorService.submit(AbstractListeningExecutorService.java:36)
>   at org.apache.iceberg.util.Tasks$Builder.runParallel(Tasks.java:300)
>   at org.apache.iceberg.util.Tasks$Builder.run(Tasks.java:194)
>   at org.apache.iceberg.util.Tasks$Builder.run(Tasks.java:189)
>   at 
> org.apache.iceberg.FileCleanupStrategy.deleteFiles(FileCleanupStrategy.java:84)
>   at 
> org.apache.iceberg.IncrementalFileCleanup.cleanFiles(IncrementalFileCleanup.java:262)
>   at 
> org.apache.iceberg.RemoveSnapshots.cleanExpiredSnapshots(RemoveSnapshots.java:338)
>   at org.apache.iceberg.RemoveSnapshots.commit(RemoveSnapshots.java:312)
>   at 
> org.apache.iceberg.mr.hive.HiveIcebergStorageHandler.executeOperation(HiveIcebergStorageHandler.java:560)
>   at 
> org.apache.hadoop.hive.ql.metadata.Hive.alterTableExecuteOperation(Hive.java:6844)
>   at 
> org.apache.hadoop.hive.ql.ddl.table.execute.AlterTableExecuteOperation.execute(AlterTableExecuteOperation.java:37)
>   at org.apache.hadoop.hive.ql.ddl.DDLTask.execute(DDLTask.java:84)
>   at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:213)
>   at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105)
>   at org.apache.hadoop.hive.ql.Executor.launchTask(Executor.java:360)
>   at org.apache.hadoop.hive.ql.Executor.launchTasks(Executor.java:333)
>   at org.apache.hadoop.hive.ql.Executor.runTasks(Executor.java:250)
>   at org.apache.hadoop.hive.ql.Executor.execute(Executor.java:111)
>   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:809)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:547)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:541)
>   at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:166)
>   at 
> org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:235)
>   at 
> org.apache.hive.service.cli.operation.SQLOperation.access$700(SQLOperation.java:92)
>   at 
> org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:340)
>   at java.security.AccessController.doPrivileged(java.base@11.0.19/Native 
> Method)
>   at javax.security.auth.Subject.doAs(java.base@11.0.19/Subject.java:423)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
>   at 
> org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:360)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(java.base@11.0.19/Executors.java:515)
>   at 
> 

[jira] [Work logged] (HIVE-27325) Expiring old snapshots deletes files with DirectExecutorService causing runtime delays

2023-05-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27325?focusedWorklogId=861106&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-861106
 ]

ASF GitHub Bot logged work on HIVE-27325:
-

Author: ASF GitHub Bot
Created on: 09/May/23 02:41
Start Date: 09/May/23 02:41
Worklog Time Spent: 10m 
  Work Description: rbalamohan commented on code in PR #4302:
URL: https://github.com/apache/hive/pull/4302#discussion_r1188058912


##
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java:
##
@@ -676,6 +688,15 @@ public void 
executeOperation(org.apache.hadoop.hive.ql.metadata.Table hmsTable,
 }
   }
 
+  private static ExecutorService getDeleteExecutorService(String completeName, int numThreads) {
+    AtomicInteger deleteThreadsIndex = new AtomicInteger(0);
+    return Executors.newFixedThreadPool(numThreads, runnable -> {

Review Comment:
   The Iceberg API may not take care of the thread pool lifecycle. Do you need 
to take care of shutting down the pool in a finally block after expiring 
snapshots? Otherwise it will end up creating too many thread pools, depending 
on the number of executions.
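
A minimal sketch of that lifecycle fix, assuming the getDeleteExecutorService()
helper from the diff above; the expire-snapshots chain is Iceberg's public
ExpireSnapshots API, but the try/finally placement is illustrative, not the
actual patch:

{code:java}
// Create a bounded pool per operation and guarantee it is torn down, so
// repeated EXECUTE expire_snapshots calls cannot leak thread pools.
ExecutorService deleteExecutor = getDeleteExecutorService(completeName, numThreads);
try {
  icebergTable.expireSnapshots()
      .expireOlderThan(timestampMillis)
      .executeDeleteWith(deleteExecutor) // replaces the direct executor
      .commit();
} finally {
  deleteExecutor.shutdown(); // queued deletes still drain; no new tasks accepted
}
{code}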





Issue Time Tracking
---

Worklog Id: (was: 861106)
Time Spent: 0.5h  (was: 20m)

> Expiring old snapshots deletes files with DirectExecutorService causing 
> runtime delays
> --
>
> Key: HIVE-27325
> URL: https://issues.apache.org/jira/browse/HIVE-27325
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Assignee: Ayush Saxena
>Priority: Major
>  Labels: iceberg, pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Expiring old snapshots takes a lot of time, as fileCleanupStrategy internally 
> uses directExecutorService. Creating this as a placeholder ticket to fix 
> this. If it is fixed in Iceberg, the library here needs to be upgraded.
> {noformat}
> insert into store_sales_delete_9 select *, current_timestamp() as ts from 
> tpcds_1000_update.ssv ;;
> ALTER TABLE store_sales_delete_9 EXECUTE expire_snapshots('2023-05-09 
> 00:00:00');
> {noformat}
> {noformat}
>   at 
> org.apache.iceberg.relocated.com.google.common.util.concurrent.AbstractListeningExecutorService.submit(AbstractListeningExecutorService.java:36)
>   at org.apache.iceberg.util.Tasks$Builder.runParallel(Tasks.java:300)
>   at org.apache.iceberg.util.Tasks$Builder.run(Tasks.java:194)
>   at org.apache.iceberg.util.Tasks$Builder.run(Tasks.java:189)
>   at 
> org.apache.iceberg.FileCleanupStrategy.deleteFiles(FileCleanupStrategy.java:84)
>   at 
> org.apache.iceberg.IncrementalFileCleanup.cleanFiles(IncrementalFileCleanup.java:262)
>   at 
> org.apache.iceberg.RemoveSnapshots.cleanExpiredSnapshots(RemoveSnapshots.java:338)
>   at org.apache.iceberg.RemoveSnapshots.commit(RemoveSnapshots.java:312)
>   at 
> org.apache.iceberg.mr.hive.HiveIcebergStorageHandler.executeOperation(HiveIcebergStorageHandler.java:560)
>   at 
> org.apache.hadoop.hive.ql.metadata.Hive.alterTableExecuteOperation(Hive.java:6844)
>   at 
> org.apache.hadoop.hive.ql.ddl.table.execute.AlterTableExecuteOperation.execute(AlterTableExecuteOperation.java:37)
>   at org.apache.hadoop.hive.ql.ddl.DDLTask.execute(DDLTask.java:84)
>   at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:213)
>   at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105)
>   at org.apache.hadoop.hive.ql.Executor.launchTask(Executor.java:360)
>   at org.apache.hadoop.hive.ql.Executor.launchTasks(Executor.java:333)
>   at org.apache.hadoop.hive.ql.Executor.runTasks(Executor.java:250)
>   at org.apache.hadoop.hive.ql.Executor.execute(Executor.java:111)
>   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:809)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:547)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:541)
>   at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:166)
>   at 
> org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:235)
>   at 
> org.apache.hive.service.cli.operation.SQLOperation.access$700(SQLOperation.java:92)
>   at 
> org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:340)
>   at java.security.AccessController.doPrivileged(java.base@11.0.19/Native 
> Method)
>   at javax.security.auth.Subject.doAs(java.base@11.0.19/Subject.java:423)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
>   at 
> org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:360)
>   at 
> 

[jira] [Work logged] (HIVE-27311) Improve LDAP auth to support generic search bind authentication

2023-05-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27311?focusedWorklogId=861102&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-861102
 ]

ASF GitHub Bot logged work on HIVE-27311:
-

Author: ASF GitHub Bot
Created on: 09/May/23 02:08
Start Date: 09/May/23 02:08
Worklog Time Spent: 10m 
  Work Description: sonarcloud[bot] commented on PR #4284:
URL: https://github.com/apache/hive/pull/4284#issuecomment-1539288182

   Kudos, SonarCloud Quality Gate passed! (Quality Gate report: 
https://sonarcloud.io/dashboard?id=apache_hive&pullRequest=4284)
   
   Bugs: 0 (rated A)
   Vulnerabilities: 0 (rated A)
   Security Hotspots: 0 (rated A)
   Code Smells: 11 (rated A)
   No Coverage information
   No Duplication information
   
   




Issue Time Tracking
---

Worklog Id: (was: 861102)
Time Spent: 1h 40m  (was: 1.5h)

> Improve LDAP auth to support generic search bind authentication
> ---
>
> Key: HIVE-27311
> URL: https://issues.apache.org/jira/browse/HIVE-27311
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Affects Versions: 4.0.0-alpha-2
>Reporter: Naveen Gangam
>Assignee: Naveen Gangam
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> Hive's LDAP auth configuration is home-baked and a bit specific to Hive. This 
> was by design, intending to be as flexible as possible to accommodate various 
> LDAP implementations. But this does not necessarily make it easy to configure 
> Hive with such custom values for LDAP filtering, when most other components 
> accept generic LDAP filters, for example search bind filters.
> There has to be a layer of translation to get it configured. Instead, we can 
> enhance Hive to support generic search bind filters.
> To support this, I am proposing adding NEW alternate configurations:
> hive.server2.authentication.ldap.userSearchFilter
> hive.server2.authentication.ldap.groupSearchFilter
> hive.server2.authentication.ldap.groupBaseDN
> Search bind filtering will also use the EXISTING config param
> hive.server2.authentication.ldap.baseDN
> This is alternate configuration and will be used 
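
For illustration, a hypothetical hive-site.xml fragment using the proposed and
existing parameters; the DNs and search filters are generic search-bind
examples, not values from the patch, and the {0}/{1} placeholders are assumed:

{noformat}
<property>
  <name>hive.server2.authentication.ldap.baseDN</name>
  <value>dc=example,dc=com</value>
</property>
<property>
  <name>hive.server2.authentication.ldap.userSearchFilter</name>
  <value>(&amp;(objectClass=person)(uid={0}))</value>
</property>
<property>
  <name>hive.server2.authentication.ldap.groupBaseDN</name>
  <value>ou=groups,dc=example,dc=com</value>
</property>
<property>
  <name>hive.server2.authentication.ldap.groupSearchFilter</name>
  <value>(&amp;(objectClass=groupOfNames)(member={1}))</value>
</property>
{noformat}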

[jira] [Work logged] (HIVE-27163) Column stats are not getting published after an insert query into an external table with custom location

2023-05-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27163?focusedWorklogId=861099&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-861099
 ]

ASF GitHub Bot logged work on HIVE-27163:
-

Author: ASF GitHub Bot
Created on: 09/May/23 01:47
Start Date: 09/May/23 01:47
Worklog Time Spent: 10m 
  Work Description: sonarcloud[bot] commented on PR #4228:
URL: https://github.com/apache/hive/pull/4228#issuecomment-1539273649

   Kudos, SonarCloud Quality Gate passed! (Quality Gate report: 
https://sonarcloud.io/dashboard?id=apache_hive&pullRequest=4228)
   
   Bugs: 0 (rated A)
   Vulnerabilities: 0 (rated A)
   Security Hotspots: 0 (rated A)
   Code Smells: 17 (rated A)
   No Coverage information
   No Duplication information
   
   




Issue Time Tracking
---

Worklog Id: (was: 861099)
Time Spent: 4.5h  (was: 4h 20m)

> Column stats are not getting published after an insert query into an external 
> table with custom location
> 
>
> Key: HIVE-27163
> URL: https://issues.apache.org/jira/browse/HIVE-27163
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Taraka Rama Rao Lethavadla
>Assignee: Zhihua Deng
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> Test case details are below
> *test.q*
> {noformat}
> set hive.stats.column.autogather=true;
> set hive.stats.autogather=true;
> dfs ${system:test.dfs.mkdir} ${system:test.tmp.dir}/test;
> create external table test_custom(age int, name string) stored as orc 
> location '/tmp/test';
> insert into test_custom select 1, 'test';
> desc formatted test_custom age;{noformat}
> *test.q.out*
>  
>  
> {noformat}
>  A masked pattern was here 
> PREHOOK: type: CREATETABLE
>  A masked pattern was here 
> PREHOOK: Output: database:default
> PREHOOK: Output: default@test_custom
>  A masked pattern was here 
> POSTHOOK: type: CREATETABLE
>  A masked pattern was here 
> POSTHOOK: Output: database:default
> POSTHOOK: Output: default@test_custom
> PREHOOK: query: insert into test_custom select 1, 'test'
> PREHOOK: type: QUERY
> 

[jira] [Work logged] (HIVE-27277) Set up github actions workflow to build and push docker image to docker hub

2023-05-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27277?focusedWorklogId=861097&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-861097
 ]

ASF GitHub Bot logged work on HIVE-27277:
-

Author: ASF GitHub Bot
Created on: 09/May/23 01:10
Start Date: 09/May/23 01:10
Worklog Time Spent: 10m 
  Work Description: dengzhhu653 commented on code in PR #4298:
URL: https://github.com/apache/hive/pull/4298#discussion_r1188022575


##
packaging/src/docker/Dockerfile:
##
@@ -14,14 +14,31 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 #
+ARG BUILD_ENV
+
 FROM ubuntu as unarchive
+ONBUILD COPY hadoop-*.tar.gz /opt
+ONBUILD COPY apache-hive-*-bin.tar.gz /opt
+ONBUILD COPY apache-tez-*-bin.tar.gz /opt
+
+FROM ubuntu as archive
+ARG HADOOP_VERSION
+ARG HIVE_VERSION
+ARG TEZ_VERSION
+ONBUILD RUN apt-get update && apt-get -y install wget
+ONBUILD RUN wget https://archive.apache.org/dist/tez/$TEZ_VERSION/apache-tez-$TEZ_VERSION-bin.tar.gz && \
+    wget https://archive.apache.org/dist/hadoop/core/hadoop-$HADOOP_VERSION/hadoop-$HADOOP_VERSION.tar.gz && \
+    wget https://archive.apache.org/dist/hive/hive-$HIVE_VERSION/apache-hive-$HIVE_VERSION-bin.tar.gz

Review Comment:
   nit: When we create a rel/release-*, the Dockerfile fetches the hive tarball 
from 
https://archive.apache.org/dist/hive/hive-$HIVE_VERSION/apache-hive-$HIVE_VERSION-bin.tar.gz;
 I'm not sure it will already be there at the moment the rel/release-* tag is 
created.
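
For context, a hypothetical build invocation for this multi-stage setup,
assuming a later, unshown stage selects between the 'archive' and 'unarchive'
stages via BUILD_ENV; the version numbers and image tag are illustrative:

{noformat}
# Build from Apache archive tarballs instead of local ones.
docker build \
  --build-arg BUILD_ENV=archive \
  --build-arg HIVE_VERSION=4.0.0-alpha-2 \
  --build-arg HADOOP_VERSION=3.3.1 \
  --build-arg TEZ_VERSION=0.10.2 \
  -t apache/hive:4.0.0-alpha-2 .
{noformat}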





Issue Time Tracking
---

Worklog Id: (was: 861097)
Time Spent: 4h  (was: 3h 50m)

> Set up github actions workflow to build and push docker image to docker hub
> ---
>
> Key: HIVE-27277
> URL: https://issues.apache.org/jira/browse/HIVE-27277
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Simhadri Govindappa
>Assignee: Simhadri Govindappa
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 4h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-27319) HMS server should throw InvalidObjectException in get_partitions_by_names() when the table is missing/dropped

2023-05-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27319?focusedWorklogId=861094&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-861094
 ]

ASF GitHub Bot logged work on HIVE-27319:
-

Author: ASF GitHub Bot
Created on: 09/May/23 00:45
Start Date: 09/May/23 00:45
Worklog Time Spent: 10m 
  Work Description: dengzhhu653 commented on code in PR #4299:
URL: https://github.com/apache/hive/pull/4299#discussion_r1188013427


##
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/ExceptionHandler.java:
##
@@ -127,7 +128,7 @@ public ExceptionHandler toMetaExceptionIfInstance(String 
message, Class... cl
*/
   public static void rethrowException(Exception e) throws TException {
 throw handleException(e)
-.throwIfInstance(MetaException.class, NoSuchObjectException.class)
+.throwIfInstance(MetaException.class, NoSuchObjectException.class, 
InvalidObjectException.class)

Review Comment:
   The `rethrowException` is called in multiple APIs; we should make sure these 
APIs are able to throw InvalidObjectException, otherwise the client will only 
see a `TApplicationException`, which makes things worse. 
   The `MetaException` at least conveys the message of the 
`InvalidObjectException` to the client.
   I think we'd better modify the way `get_partition_by_names_req()` handles 
the exception.
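
A minimal sketch of that alternative, reusing the handler chain from the diff
above; the defaultMetaException() fallback name is an assumption, not verified
against ExceptionHandler:

{code:java}
// Hypothetical: widen the handling only inside get_partition_by_names_req(),
// leaving the shared rethrowException() helper untouched for other APIs.
try {
  return getMS().getPartitionsByNames(request);
} catch (Exception e) {
  throw handleException(e)
      // surface the InvalidObjectException raised when the table was
      // concurrently dropped, instead of a generic TApplicationException
      .throwIfInstance(MetaException.class, NoSuchObjectException.class, InvalidObjectException.class)
      .defaultMetaException(); // assumed fallback wrapping anything else
}
{code}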





Issue Time Tracking
---

Worklog Id: (was: 861094)
Time Spent: 1h 20m  (was: 1h 10m)

> HMS server should throw InvalidObjectException in get_partitions_by_names() 
> when the table is missing/dropped
> -
>
> Key: HIVE-27319
> URL: https://issues.apache.org/jira/browse/HIVE-27319
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, Standalone Metastore
>Reporter: Sai Hemanth Gantasala
>Assignee: Sai Hemanth Gantasala
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> When the table object is dropped by a concurrent thread, the 
> get_partitions_by_names_req() API is currently throwing a 
> TApplicationException to the client. Instead, the HMS server should propagate 
> the InvalidObjectException thrown by getTable() to the HMS client. By doing 
> this, other services using the HMS client will understand the exception better.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-27319) HMS server should throw InvalidObjectException in get_partitions_by_names() when the table is missing/dropped

2023-05-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27319?focusedWorklogId=861093&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-861093
 ]

ASF GitHub Bot logged work on HIVE-27319:
-

Author: ASF GitHub Bot
Created on: 09/May/23 00:34
Start Date: 09/May/23 00:34
Worklog Time Spent: 10m 
  Work Description: sonarcloud[bot] commented on PR #4299:
URL: https://github.com/apache/hive/pull/4299#issuecomment-1539231907

   Kudos, SonarCloud Quality Gate passed! (Quality Gate report: 
https://sonarcloud.io/dashboard?id=apache_hive&pullRequest=4299)
   
   Bugs: 0 (rated A)
   Vulnerabilities: 0 (rated A)
   Security Hotspots: 0 (rated A)
   Code Smells: 0 (rated A)
   No Coverage information
   No Duplication information
   
   




Issue Time Tracking
---

Worklog Id: (was: 861093)
Time Spent: 1h 10m  (was: 1h)

> HMS server should throw InvalidObjectException in get_partitions_by_names() 
> when the table is missing/dropped
> -
>
> Key: HIVE-27319
> URL: https://issues.apache.org/jira/browse/HIVE-27319
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, Standalone Metastore
>Reporter: Sai Hemanth Gantasala
>Assignee: Sai Hemanth Gantasala
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> When the table object is dropped by a concurrent thread, the 
> get_partitions_by_names_req() API is currently throwing a 
> TApplicationException to the client. Instead, the HMS server should propagate 
> the InvalidObjectException thrown by getTable() to the HMS client. By doing 
> this, other services using the HMS client will understand the exception better.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-27112) implement array_except UDF in Hive

2023-05-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27112?focusedWorklogId=861091&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-861091
 ]

ASF GitHub Bot logged work on HIVE-27112:
-

Author: ASF GitHub Bot
Created on: 08/May/23 23:00
Start Date: 08/May/23 23:00
Worklog Time Spent: 10m 
  Work Description: saihemanth-cloudera commented on PR #4090:
URL: https://github.com/apache/hive/pull/4090#issuecomment-1539169056

   Can you rerun the tests on this PR? I can't find a way to rerun the Build CI 
test suite. Could you push an empty commit? Thanks.




Issue Time Tracking
---

Worklog Id: (was: 861091)
Time Spent: 1h 40m  (was: 1.5h)

> implement array_except UDF in Hive
> --
>
> Key: HIVE-27112
> URL: https://issues.apache.org/jira/browse/HIVE-27112
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Taraka Rama Rao Lethavadla
>Assignee: Taraka Rama Rao Lethavadla
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> *array_except(array1, array2)* 
> Returns an array of the elements in {{array1}} but not in {{array2}}, without 
> duplicates.
>  
> {noformat}
> > SELECT array_except(array(1, 2, 2, 3), array(1, 1, 3, 5));
> [2]
> {noformat}
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-27268) Hive.getPartitionsByNames should not enforce SessionState to be available

2023-05-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27268?focusedWorklogId=861090&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-861090
 ]

ASF GitHub Bot logged work on HIVE-27268:
-

Author: ASF GitHub Bot
Created on: 08/May/23 22:54
Start Date: 08/May/23 22:54
Worklog Time Spent: 10m 
  Work Description: sonarcloud[bot] commented on PR #4241:
URL: https://github.com/apache/hive/pull/4241#issuecomment-1539163937

   Kudos, SonarCloud Quality Gate passed! (Quality Gate report: 
https://sonarcloud.io/dashboard?id=apache_hive&pullRequest=4241)
   
   Bugs: 0 (rated A)
   Vulnerabilities: 0 (rated A)
   Security Hotspots: 0 (rated A)
   Code Smells: 7 (rated A)
   No Coverage information
   No Duplication information
   
   




Issue Time Tracking
---

Worklog Id: (was: 861090)
Time Spent: 1h 50m  (was: 1h 40m)

> Hive.getPartitionsByNames should not enforce SessionState to be available
> -
>
> Key: HIVE-27268
> URL: https://issues.apache.org/jira/browse/HIVE-27268
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 3.1.3
>Reporter: Henri Biestro
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> HIVE-24743 and HIVE-24392 enforce a check for a valid write ID list in 
> "Hive.getPartitionsByNames".
> This breaks basic API integration: a user who only needs basic partition 
> details is forced to have a SessionState.
> The request in this ticket is to ensure that if SessionState.get() is null, 
> an empty validWriteIdList is returned.
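
A minimal sketch of the requested behavior, assuming the guard sits where 
Hive.getPartitionsByNames builds its write-ID list (the helper 
getValidWriteIdList below is illustrative, not the actual patch):

{noformat}
// Hypothetical guard: skip the write-ID lookup when the calling thread
// has no SessionState attached, instead of failing.
String validWriteIdList = null;
if (AcidUtils.isTransactionalTable(table) && SessionState.get() != null) {
  validWriteIdList = getValidWriteIdList(table.getDbName(), table.getTableName());
}
// With no session, validWriteIdList stays empty and plain metadata reads
// keep working for embedded API users.
{noformat}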



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-27118) implement array_intersect UDF in Hive

2023-05-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27118?focusedWorklogId=861089&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-861089
 ]

ASF GitHub Bot logged work on HIVE-27118:
-

Author: ASF GitHub Bot
Created on: 08/May/23 22:51
Start Date: 08/May/23 22:51
Worklog Time Spent: 10m 
  Work Description: saihemanth-cloudera commented on code in PR #4094:
URL: https://github.com/apache/hive/pull/4094#discussion_r1187967449


##
ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFArrayIntersect.java:
##
@@ -0,0 +1,61 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file intersect in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.udf.generic;
+
+import org.apache.hadoop.hive.ql.exec.Description;
+import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
+import org.apache.hadoop.hive.ql.metadata.HiveException;
+import org.apache.hadoop.hive.serde2.objectinspector.ListObjectInspector;
+import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
+
+import java.util.List;
+import java.util.stream.Collectors;
+
+/**
+ * GenericUDFArrayIntersect.
+ */
+@Description(name = "array_intersect", value = "_FUNC_(array1, array2) - 
Returns an array of the elements in the intersection of array1 and array2, 
without duplicates.", extended =
+"Example:\n" + "  > SELECT _FUNC_(array(1, 2, 3,4), array(1,2,3)) FROM 
src;\n"
++ "  [1,2,3]") public class GenericUDFArrayIntersect extends 
AbstractGenericUDFArrayBase {

Review Comment:
   Nit: `+ "  [1,2,3]") ` should be moved to the line above.





Issue Time Tracking
---

Worklog Id: (was: 861089)
Time Spent: 1.5h  (was: 1h 20m)

> implement array_intersect UDF in Hive
> -
>
> Key: HIVE-27118
> URL: https://issues.apache.org/jira/browse/HIVE-27118
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
>Reporter: Taraka Rama Rao Lethavadla
>Assignee: Taraka Rama Rao Lethavadla
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> *array_intersect(array1, array2)*
> Returns an array of the elements in the intersection of {{array1}} and 
> {{array2}}, without duplicates.
>  
> {noformat}
> > SELECT array_intersect(array(1, 2, 2, 3), array(1, 1, 3, 5));
> [1,3]
> {noformat}
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-27118) implement array_intersect UDF in Hive

2023-05-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27118?focusedWorklogId=861087&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-861087
 ]

ASF GitHub Bot logged work on HIVE-27118:
-

Author: ASF GitHub Bot
Created on: 08/May/23 22:36
Start Date: 08/May/23 22:36
Worklog Time Spent: 10m 
  Work Description: saihemanth-cloudera commented on code in PR #4094:
URL: https://github.com/apache/hive/pull/4094#discussion_r1187960694


##
ql/src/test/queries/clientpositive/udf_array_intersect.q:
##
@@ -0,0 +1,42 @@
+--! qt:dataset:src
+
+

Issue Time Tracking
---

Worklog Id: (was: 861087)
Time Spent: 1h 20m  (was: 1h 10m)

> implement array_intersect UDF in Hive
> -
>
> Key: HIVE-27118
> URL: https://issues.apache.org/jira/browse/HIVE-27118
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
>Reporter: Taraka Rama Rao Lethavadla
>Assignee: Taraka Rama Rao Lethavadla
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> *array_intersect(array1, array2)*
> Returns an array of the elements in the intersection of {{array1}} and 
> {{array2}}, without duplicates.
>  
> {noformat}
> > SELECT array_intersect(array(1, 2, 2, 3), array(1, 1, 3, 5));
> [1,3]
> {noformat}
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-27325) Expiring old snapshots deletes files with DirectExecutorService causing runtime delays

2023-05-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27325?focusedWorklogId=861085&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-861085
 ]

ASF GitHub Bot logged work on HIVE-27325:
-

Author: ASF GitHub Bot
Created on: 08/May/23 22:01
Start Date: 08/May/23 22:01
Worklog Time Spent: 10m 
  Work Description: aturoczy commented on PR #4302:
URL: https://github.com/apache/hive/pull/4302#issuecomment-1539110193

   This upstream ticket was just recently created! How the heck could you be so 
fast to understand and implement it? :) Really, I wanted to ask about the 
details, and you have already done it :D It is not fair! :) 




Issue Time Tracking
---

Worklog Id: (was: 861085)
Time Spent: 20m  (was: 10m)

> Expiring old snapshots deletes files with DirectExecutorService causing 
> runtime delays
> --
>
> Key: HIVE-27325
> URL: https://issues.apache.org/jira/browse/HIVE-27325
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Assignee: Ayush Saxena
>Priority: Major
>  Labels: iceberg, pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Expiring old snapshots takes a lot of time, as fileCleanupStrategy internally 
> uses a directExecutorService. Creating this as a placeholder ticket to fix 
> this; if it is fixed in Iceberg, the library here needs to be upgraded.
> {noformat}
> insert into store_sales_delete_9 select *, current_timestamp() as ts from 
> tpcds_1000_update.ssv ;;
> ALTER TABLE store_sales_delete_9 EXECUTE expire_snapshots('2023-05-09 
> 00:00:00');
> {noformat}
> {noformat}
>   at 
> org.apache.iceberg.relocated.com.google.common.util.concurrent.AbstractListeningExecutorService.submit(AbstractListeningExecutorService.java:36)
>   at org.apache.iceberg.util.Tasks$Builder.runParallel(Tasks.java:300)
>   at org.apache.iceberg.util.Tasks$Builder.run(Tasks.java:194)
>   at org.apache.iceberg.util.Tasks$Builder.run(Tasks.java:189)
>   at 
> org.apache.iceberg.FileCleanupStrategy.deleteFiles(FileCleanupStrategy.java:84)
>   at 
> org.apache.iceberg.IncrementalFileCleanup.cleanFiles(IncrementalFileCleanup.java:262)
>   at 
> org.apache.iceberg.RemoveSnapshots.cleanExpiredSnapshots(RemoveSnapshots.java:338)
>   at org.apache.iceberg.RemoveSnapshots.commit(RemoveSnapshots.java:312)
>   at 
> org.apache.iceberg.mr.hive.HiveIcebergStorageHandler.executeOperation(HiveIcebergStorageHandler.java:560)
>   at 
> org.apache.hadoop.hive.ql.metadata.Hive.alterTableExecuteOperation(Hive.java:6844)
>   at 
> org.apache.hadoop.hive.ql.ddl.table.execute.AlterTableExecuteOperation.execute(AlterTableExecuteOperation.java:37)
>   at org.apache.hadoop.hive.ql.ddl.DDLTask.execute(DDLTask.java:84)
>   at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:213)
>   at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105)
>   at org.apache.hadoop.hive.ql.Executor.launchTask(Executor.java:360)
>   at org.apache.hadoop.hive.ql.Executor.launchTasks(Executor.java:333)
>   at org.apache.hadoop.hive.ql.Executor.runTasks(Executor.java:250)
>   at org.apache.hadoop.hive.ql.Executor.execute(Executor.java:111)
>   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:809)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:547)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:541)
>   at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:166)
>   at 
> org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:235)
>   at 
> org.apache.hive.service.cli.operation.SQLOperation.access$700(SQLOperation.java:92)
>   at 
> org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:340)
>   at java.security.AccessController.doPrivileged(java.base@11.0.19/Native 
> Method)
>   at javax.security.auth.Subject.doAs(java.base@11.0.19/Subject.java:423)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
>   at 
> org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:360)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(java.base@11.0.19/Executors.java:515)
>   at 
> java.util.concurrent.FutureTask.run(java.base@11.0.19/FutureTask.java:264)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(java.base@11.0.19/Executors.java:515)
> {noformat}
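
A minimal sketch of the direction this implies, assuming the fix hands Iceberg 
a real thread pool through the existing ExpireSnapshots#executeDeleteWith API 
(the pool size and variable names are illustrative):

{noformat}
// Hypothetical sketch: run snapshot-expiry file deletes on a fixed
// thread pool instead of Iceberg's default direct executor.
ExecutorService deletePool = Executors.newFixedThreadPool(8);
try {
  table.expireSnapshots()
      .expireOlderThan(olderThanMillis)   // e.g. the '2023-05-09 00:00:00' cutoff
      .executeDeleteWith(deletePool)      // parallelizes the deleteFiles(...) work
      .commit();
} finally {
  deletePool.shutdown();
}
{noformat}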



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-27311) Improve LDAP auth to support generic search bind authentication

2023-05-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27311?focusedWorklogId=861084&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-861084
 ]

ASF GitHub Bot logged work on HIVE-27311:
-

Author: ASF GitHub Bot
Created on: 08/May/23 21:36
Start Date: 08/May/23 21:36
Worklog Time Spent: 10m 
  Work Description: sonarcloud[bot] commented on PR #4284:
URL: https://github.com/apache/hive/pull/4284#issuecomment-1539087597

   Kudos, SonarCloud Quality Gate passed! 
(https://sonarcloud.io/dashboard?id=apache_hive&pullRequest=4284)
   
   0 Bugs (rating A), 0 Vulnerabilities (rating A), 0 Security Hotspots 
(rating A), 11 Code Smells (rating A).
   No Coverage information, No Duplication information.
   
   




Issue Time Tracking
---

Worklog Id: (was: 861084)
Time Spent: 1.5h  (was: 1h 20m)

> Improve LDAP auth to support generic search bind authentication
> ---
>
> Key: HIVE-27311
> URL: https://issues.apache.org/jira/browse/HIVE-27311
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Affects Versions: 4.0.0-alpha-2
>Reporter: Naveen Gangam
>Assignee: Naveen Gangam
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Hive's LDAP auth configuration is home-baked and a bit specific to Hive. This 
> was by design, intended to be as flexible as possible to accommodate 
> various LDAP implementations. But this does not necessarily make it easy to 
> configure Hive with such custom values for LDAP filtering when most other 
> components accept generic LDAP filters, for example search bind filters.
> There has to be a layer of translation to have it configured. Instead, we can 
> enhance Hive to support generic search bind filters.
> To support this, I am proposing adding NEW alternate configurations:
> hive.server2.authentication.ldap.userSearchFilter
> hive.server2.authentication.ldap.groupSearchFilter
> hive.server2.authentication.ldap.groupBaseDN
> Search bind filtering will also use the EXISTING config param
> hive.server2.authentication.ldap.baseDN
> This is an alternate configuration and will be used first 
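
A hypothetical configuration sketch for the parameters above; only the 
property names come from this ticket, while the base DNs, filter strings, and 
the {0}/{1} placeholder syntax are invented for illustration:

{noformat}
HiveConf conf = new HiveConf();
conf.set("hive.server2.authentication", "LDAP");
// EXISTING param reused by search bind filtering:
conf.set("hive.server2.authentication.ldap.baseDN",
    "ou=Users,dc=example,dc=com");
// NEW alternate params proposed here (values are placeholders):
conf.set("hive.server2.authentication.ldap.userSearchFilter",
    "(&(objectClass=person)(uid={0}))");
conf.set("hive.server2.authentication.ldap.groupBaseDN",
    "ou=Groups,dc=example,dc=com");
conf.set("hive.server2.authentication.ldap.groupSearchFilter",
    "(&(objectClass=groupOfNames)(member={1}))");
{noformat}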

[jira] [Work logged] (HIVE-27325) Expiring old snapshots deletes files with DirectExecutorService causing runtime delays

2023-05-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27325?focusedWorklogId=861082&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-861082
 ]

ASF GitHub Bot logged work on HIVE-27325:
-

Author: ASF GitHub Bot
Created on: 08/May/23 21:29
Start Date: 08/May/23 21:29
Worklog Time Spent: 10m 
  Work Description: ayushtkn opened a new pull request, #4302:
URL: https://github.com/apache/hive/pull/4302

   ### What changes were proposed in this pull request?
   
   Allow using a ThreadPoolExecutor (TPE) for snapshot expiry
   
   ### Why are the changes needed?
   
   Perf benefits
   
   ### Does this PR introduce _any_ user-facing change?
   
   Quicker response
   
   ### How was this patch tested?
   
   Manually 




Issue Time Tracking
---

Worklog Id: (was: 861082)
Remaining Estimate: 0h
Time Spent: 10m

> Expiring old snapshots deletes files with DirectExecutorService causing 
> runtime delays
> --
>
> Key: HIVE-27325
> URL: https://issues.apache.org/jira/browse/HIVE-27325
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Assignee: Ayush Saxena
>Priority: Major
>  Labels: iceberg
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Expiring old snapshots takes a lot of time, as fileCleanupStrategy internally 
> uses a directExecutorService. Creating this as a placeholder ticket to fix 
> this; if it is fixed in Iceberg, the library here needs to be upgraded.
> {noformat}
> insert into store_sales_delete_9 select *, current_timestamp() as ts from 
> tpcds_1000_update.ssv ;;
> ALTER TABLE store_sales_delete_9 EXECUTE expire_snapshots('2023-05-09 
> 00:00:00');
> {noformat}
> {noformat}
>   at 
> org.apache.iceberg.relocated.com.google.common.util.concurrent.AbstractListeningExecutorService.submit(AbstractListeningExecutorService.java:36)
>   at org.apache.iceberg.util.Tasks$Builder.runParallel(Tasks.java:300)
>   at org.apache.iceberg.util.Tasks$Builder.run(Tasks.java:194)
>   at org.apache.iceberg.util.Tasks$Builder.run(Tasks.java:189)
>   at 
> org.apache.iceberg.FileCleanupStrategy.deleteFiles(FileCleanupStrategy.java:84)
>   at 
> org.apache.iceberg.IncrementalFileCleanup.cleanFiles(IncrementalFileCleanup.java:262)
>   at 
> org.apache.iceberg.RemoveSnapshots.cleanExpiredSnapshots(RemoveSnapshots.java:338)
>   at org.apache.iceberg.RemoveSnapshots.commit(RemoveSnapshots.java:312)
>   at 
> org.apache.iceberg.mr.hive.HiveIcebergStorageHandler.executeOperation(HiveIcebergStorageHandler.java:560)
>   at 
> org.apache.hadoop.hive.ql.metadata.Hive.alterTableExecuteOperation(Hive.java:6844)
>   at 
> org.apache.hadoop.hive.ql.ddl.table.execute.AlterTableExecuteOperation.execute(AlterTableExecuteOperation.java:37)
>   at org.apache.hadoop.hive.ql.ddl.DDLTask.execute(DDLTask.java:84)
>   at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:213)
>   at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105)
>   at org.apache.hadoop.hive.ql.Executor.launchTask(Executor.java:360)
>   at org.apache.hadoop.hive.ql.Executor.launchTasks(Executor.java:333)
>   at org.apache.hadoop.hive.ql.Executor.runTasks(Executor.java:250)
>   at org.apache.hadoop.hive.ql.Executor.execute(Executor.java:111)
>   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:809)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:547)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:541)
>   at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:166)
>   at 
> org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:235)
>   at 
> org.apache.hive.service.cli.operation.SQLOperation.access$700(SQLOperation.java:92)
>   at 
> org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:340)
>   at java.security.AccessController.doPrivileged(java.base@11.0.19/Native 
> Method)
>   at javax.security.auth.Subject.doAs(java.base@11.0.19/Subject.java:423)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
>   at 
> org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:360)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(java.base@11.0.19/Executors.java:515)
>   at 
> java.util.concurrent.FutureTask.run(java.base@11.0.19/FutureTask.java:264)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(java.base@11.0.19/Executors.java:515)
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-27325) Expiring old snapshots deletes files with DirectExecutorService causing runtime delays

2023-05-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-27325:
--
Labels: iceberg pull-request-available  (was: iceberg)

> Expiring old snapshots deletes files with DirectExecutorService causing 
> runtime delays
> --
>
> Key: HIVE-27325
> URL: https://issues.apache.org/jira/browse/HIVE-27325
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Assignee: Ayush Saxena
>Priority: Major
>  Labels: iceberg, pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Expiring old snapshots takes a lot of time, as fileCleanupStrategy internally 
> uses a directExecutorService. Creating this as a placeholder ticket to fix 
> this; if it is fixed in Iceberg, the library here needs to be upgraded.
> {noformat}
> insert into store_sales_delete_9 select *, current_timestamp() as ts from 
> tpcds_1000_update.ssv ;;
> ALTER TABLE store_sales_delete_9 EXECUTE expire_snapshots('2023-05-09 
> 00:00:00');
> {noformat}
> {noformat}
>   at 
> org.apache.iceberg.relocated.com.google.common.util.concurrent.AbstractListeningExecutorService.submit(AbstractListeningExecutorService.java:36)
>   at org.apache.iceberg.util.Tasks$Builder.runParallel(Tasks.java:300)
>   at org.apache.iceberg.util.Tasks$Builder.run(Tasks.java:194)
>   at org.apache.iceberg.util.Tasks$Builder.run(Tasks.java:189)
>   at 
> org.apache.iceberg.FileCleanupStrategy.deleteFiles(FileCleanupStrategy.java:84)
>   at 
> org.apache.iceberg.IncrementalFileCleanup.cleanFiles(IncrementalFileCleanup.java:262)
>   at 
> org.apache.iceberg.RemoveSnapshots.cleanExpiredSnapshots(RemoveSnapshots.java:338)
>   at org.apache.iceberg.RemoveSnapshots.commit(RemoveSnapshots.java:312)
>   at 
> org.apache.iceberg.mr.hive.HiveIcebergStorageHandler.executeOperation(HiveIcebergStorageHandler.java:560)
>   at 
> org.apache.hadoop.hive.ql.metadata.Hive.alterTableExecuteOperation(Hive.java:6844)
>   at 
> org.apache.hadoop.hive.ql.ddl.table.execute.AlterTableExecuteOperation.execute(AlterTableExecuteOperation.java:37)
>   at org.apache.hadoop.hive.ql.ddl.DDLTask.execute(DDLTask.java:84)
>   at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:213)
>   at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105)
>   at org.apache.hadoop.hive.ql.Executor.launchTask(Executor.java:360)
>   at org.apache.hadoop.hive.ql.Executor.launchTasks(Executor.java:333)
>   at org.apache.hadoop.hive.ql.Executor.runTasks(Executor.java:250)
>   at org.apache.hadoop.hive.ql.Executor.execute(Executor.java:111)
>   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:809)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:547)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:541)
>   at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:166)
>   at 
> org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:235)
>   at 
> org.apache.hive.service.cli.operation.SQLOperation.access$700(SQLOperation.java:92)
>   at 
> org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:340)
>   at java.security.AccessController.doPrivileged(java.base@11.0.19/Native 
> Method)
>   at javax.security.auth.Subject.doAs(java.base@11.0.19/Subject.java:423)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
>   at 
> org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:360)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(java.base@11.0.19/Executors.java:515)
>   at 
> java.util.concurrent.FutureTask.run(java.base@11.0.19/FutureTask.java:264)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(java.base@11.0.19/Executors.java:515)
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (HIVE-27325) Expiring old snapshots deletes files with DirectExecutorService causing runtime delays

2023-05-08 Thread Ayush Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena reassigned HIVE-27325:
---

Assignee: Ayush Saxena

> Expiring old snapshots deletes files with DirectExecutorService causing 
> runtime delays
> --
>
> Key: HIVE-27325
> URL: https://issues.apache.org/jira/browse/HIVE-27325
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Assignee: Ayush Saxena
>Priority: Major
>  Labels: iceberg
>
> Expiring old snapshots takes a lot of time, as fileCleanupStrategy internally 
> uses a directExecutorService. Creating this as a placeholder ticket to fix 
> this; if it is fixed in Iceberg, the library here needs to be upgraded.
> {noformat}
> insert into store_sales_delete_9 select *, current_timestamp() as ts from 
> tpcds_1000_update.ssv ;;
> ALTER TABLE store_sales_delete_9 EXECUTE expire_snapshots('2023-05-09 
> 00:00:00');
> {noformat}
> {noformat}
>   at 
> org.apache.iceberg.relocated.com.google.common.util.concurrent.AbstractListeningExecutorService.submit(AbstractListeningExecutorService.java:36)
>   at org.apache.iceberg.util.Tasks$Builder.runParallel(Tasks.java:300)
>   at org.apache.iceberg.util.Tasks$Builder.run(Tasks.java:194)
>   at org.apache.iceberg.util.Tasks$Builder.run(Tasks.java:189)
>   at 
> org.apache.iceberg.FileCleanupStrategy.deleteFiles(FileCleanupStrategy.java:84)
>   at 
> org.apache.iceberg.IncrementalFileCleanup.cleanFiles(IncrementalFileCleanup.java:262)
>   at 
> org.apache.iceberg.RemoveSnapshots.cleanExpiredSnapshots(RemoveSnapshots.java:338)
>   at org.apache.iceberg.RemoveSnapshots.commit(RemoveSnapshots.java:312)
>   at 
> org.apache.iceberg.mr.hive.HiveIcebergStorageHandler.executeOperation(HiveIcebergStorageHandler.java:560)
>   at 
> org.apache.hadoop.hive.ql.metadata.Hive.alterTableExecuteOperation(Hive.java:6844)
>   at 
> org.apache.hadoop.hive.ql.ddl.table.execute.AlterTableExecuteOperation.execute(AlterTableExecuteOperation.java:37)
>   at org.apache.hadoop.hive.ql.ddl.DDLTask.execute(DDLTask.java:84)
>   at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:213)
>   at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105)
>   at org.apache.hadoop.hive.ql.Executor.launchTask(Executor.java:360)
>   at org.apache.hadoop.hive.ql.Executor.launchTasks(Executor.java:333)
>   at org.apache.hadoop.hive.ql.Executor.runTasks(Executor.java:250)
>   at org.apache.hadoop.hive.ql.Executor.execute(Executor.java:111)
>   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:809)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:547)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:541)
>   at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:166)
>   at 
> org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:235)
>   at 
> org.apache.hive.service.cli.operation.SQLOperation.access$700(SQLOperation.java:92)
>   at 
> org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:340)
>   at java.security.AccessController.doPrivileged(java.base@11.0.19/Native 
> Method)
>   at javax.security.auth.Subject.doAs(java.base@11.0.19/Subject.java:423)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
>   at 
> org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:360)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(java.base@11.0.19/Executors.java:515)
>   at 
> java.util.concurrent.FutureTask.run(java.base@11.0.19/FutureTask.java:264)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(java.base@11.0.19/Executors.java:515)
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-27319) HMS server should throw InvalidObjectException in get_partitions_by_names() when the table is missing/dropped

2023-05-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27319?focusedWorklogId=861079&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-861079
 ]

ASF GitHub Bot logged work on HIVE-27319:
-

Author: ASF GitHub Bot
Created on: 08/May/23 20:50
Start Date: 08/May/23 20:50
Worklog Time Spent: 10m 
  Work Description: saihemanth-cloudera commented on code in PR #4299:
URL: https://github.com/apache/hive/pull/4299#discussion_r1187886173


##
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/ExceptionHandler.java:
##
@@ -127,7 +128,7 @@ public ExceptionHandler toMetaExceptionIfInstance(String 
message, Class... cl
*/
   public static void rethrowException(Exception e) throws TException {
 throw handleException(e)
-.throwIfInstance(MetaException.class, NoSuchObjectException.class)
+.throwIfInstance(MetaException.class, NoSuchObjectException.class, 
InvalidObjectException.class)

Review Comment:
   The reason why I explicitly changed this definition is that the HMS server 
should always throw the right exception to the client. If the 
InvalidObjectException is not declared in the HMS APIs, then we should declare 
it, thereby letting the client know about the underlying cause. If we are 
disguising the InvalidObjectException as a MetaException, then I think we are 
throwing the incorrect exception.
   Do you still think we need to modify the way we throw the exception in the 
get_partition_by_names_req() API instead of changing it globally?





Issue Time Tracking
---

Worklog Id: (was: 861079)
Time Spent: 1h  (was: 50m)

> HMS server should throw InvalidObjectException in get_partitions_by_names() 
> when the table is missing/dropped
> -
>
> Key: HIVE-27319
> URL: https://issues.apache.org/jira/browse/HIVE-27319
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, Standalone Metastore
>Reporter: Sai Hemanth Gantasala
>Assignee: Sai Hemanth Gantasala
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> When the table object is dropped by a concurrent thread, the 
> get_partitions_by_names_req() API currently throws a 
> TApplicationException to the client. Instead, the HMS server should propagate 
> the InvalidObjectException thrown by getTable() to the HMS client. By doing 
> this, other services using the HMS client will understand the exception better.
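
A minimal sketch of what the change buys a client, assuming the list-based 
IMetaStoreClient signature (the variable names and recovery actions are 
placeholders, not part of the patch):

{noformat}
// Hypothetical client-side handling once the server surfaces the real cause.
try {
  List<Partition> parts =
      msClient.getPartitionsByNames(dbName, tblName, partNames);
} catch (InvalidObjectException e) {
  // table was dropped concurrently: skip it or re-resolve the name
} catch (MetaException e) {
  // genuine metastore-side failure: propagate
}
{noformat}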



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-27186) A persistent property store

2023-05-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27186?focusedWorklogId=861076&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-861076
 ]

ASF GitHub Bot logged work on HIVE-27186:
-

Author: ASF GitHub Bot
Created on: 08/May/23 20:31
Start Date: 08/May/23 20:31
Worklog Time Spent: 10m 
  Work Description: sonarcloud[bot] commented on PR #4194:
URL: https://github.com/apache/hive/pull/4194#issuecomment-1539010667

   Kudos, SonarCloud Quality Gate passed! 
(https://sonarcloud.io/dashboard?id=apache_hive&pullRequest=4194)
   
   5 Bugs (rating E), 5 Vulnerabilities (rating B), 1 Security Hotspot 
(rating E), 121 Code Smells (rating A).
   No Coverage information, No Duplication information.
   
   




Issue Time Tracking
---

Worklog Id: (was: 861076)
Time Spent: 18h 10m  (was: 18h)

> A persistent property store 
> 
>
> Key: HIVE-27186
> URL: https://issues.apache.org/jira/browse/HIVE-27186
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Affects Versions: 4.0.0-alpha-2
>Reporter: Henri Biestro
>Assignee: Henri Biestro
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 18h 10m
>  Remaining Estimate: 0h
>
> WHAT
> A persistent property store usable as a support facility for any metadata 
> augmentation feature.
> WHY
> When adding new meta-data oriented features, we usually need to persist 
> information linking the feature data and the HiveMetaStore objects it applies 
> to. Any information related to a database, a table or the cluster - like 
> statistics, for example, or any operational state or data (think rolling 
> backup) - falls into this use-case.
> Typically, accommodating such a feature requires modifying the Metastore 
> database schema by adding or altering a table. It also usually implies 
> modifying the thrift APIs to expose such meta-data to consumers.
> The proposed feature aims to solve persistence and query/transport for 
> these types of use-cases by exposing a 'key/(meta)value' store surfaced as a 
> property system.
> HOW
> A property-value model is the simple and generic exposed API.
> To provision for several usage scenarios, the 
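
The description is cut off above, but the key/(meta)value model it sketches 
can still be illustrated. Every name in the following sketch is invented; the 
actual patch defines its own API:

{noformat}
// Purely illustrative shapes for a metastore-backed property store.
PropertyStore store = metaStore.propertyStore();              // hypothetical accessor
store.put("db/sales/backup.lastRun", "2023-05-08T20:31:00Z"); // per-database metadata
store.put("cluster/stats.refreshIntervalMs", "3600000");      // cluster-wide metadata
String lastRun = store.get("db/sales/backup.lastRun");
{noformat}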

[jira] [Work logged] (HIVE-27327) Iceberg basic stats: Incorrect row count in snapshot summary leading to unoptimized plans

2023-05-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27327?focusedWorklogId=861067&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-861067
 ]

ASF GitHub Bot logged work on HIVE-27327:
-

Author: ASF GitHub Bot
Created on: 08/May/23 18:36
Start Date: 08/May/23 18:36
Worklog Time Spent: 10m 
  Work Description: sonarcloud[bot] commented on PR #4301:
URL: https://github.com/apache/hive/pull/4301#issuecomment-1538853763

   Kudos, SonarCloud Quality Gate passed! 
(https://sonarcloud.io/dashboard?id=apache_hive&pullRequest=4301)
   
   0 Bugs (rating A), 0 Vulnerabilities (rating A), 0 Security Hotspots 
(rating A), 0 Code Smells (rating A).
   No Coverage information, No Duplication information.
   
   




Issue Time Tracking
---

Worklog Id: (was: 861067)
Time Spent: 40m  (was: 0.5h)

> Iceberg basic stats: Incorrect row count in snapshot summary leading to 
> unoptimized plans
> -
>
> Key: HIVE-27327
> URL: https://issues.apache.org/jira/browse/HIVE-27327
> Project: Hive
>  Issue Type: Bug
>Reporter: Simhadri Govindappa
>Assignee: Simhadri Govindappa
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> In the absence of equality deletes, the total row count should be:
> {noformat}
> row_count = total-records - total-position-deletes{noformat}
>  
>  
> Example:
> After many inserts and deletes, there are only 46 records in a table.
> {noformat}
> >>select count(*) from llap_orders;
> +--+
> | _c0  |
> +--+
> | 46   |
> +--+
> 1 row selected (7.22 seconds)
> {noformat}
>  
> But the total records in snapshot summary indicate that there are 300 records
>  
> {noformat}
>  {
>     "sequence-number" : 19,
>     "snapshot-id" : 4237525869561629328,
>     "parent-snapshot-id" : 2572487769557272977,
>     "timestamp-ms" : 1683553017982,
>     "summary" : {
>       "operation" : "append",
>       "added-data-files" : "5",
>       "added-records" : "12",
>       "added-files-size" : "3613",
>       "changed-partition-count" : "5",
>       "total-records" : "300",
>       "total-files-size" : "164405",
> 
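
A sketch of the corrected derivation using Iceberg's real SnapshotSummary 
property keys; the surrounding handler code is paraphrased here, not quoted 
from the patch:

{noformat}
// A position delete removes exactly one row, while an equality delete can
// match many rows, so the subtraction is only safe when the summary reports
// zero equality deletes.
Map<String, String> summary = table.currentSnapshot().summary();
long totalRecords = Long.parseLong(summary.getOrDefault(SnapshotSummary.TOTAL_RECORDS_PROP, "0"));
long posDeletes = Long.parseLong(summary.getOrDefault(SnapshotSummary.TOTAL_POS_DELETES_PROP, "0"));
long eqDeletes = Long.parseLong(summary.getOrDefault(SnapshotSummary.TOTAL_EQ_DELETES_PROP, "0"));
long rowCount = (eqDeletes == 0) ? totalRecords - posDeletes : totalRecords;
{noformat}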

[jira] [Work logged] (HIVE-27327) Iceberg basic stats: Incorrect row count in snapshot summary leading to unoptimized plans

2023-05-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27327?focusedWorklogId=861059&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-861059
 ]

ASF GitHub Bot logged work on HIVE-27327:
-

Author: ASF GitHub Bot
Created on: 08/May/23 17:32
Start Date: 08/May/23 17:32
Worklog Time Spent: 10m 
  Work Description: aturoczy commented on code in PR #4301:
URL: https://github.com/apache/hive/pull/4301#discussion_r1187698990


##
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java:
##
@@ -346,7 +346,16 @@ public Map<String, String> getBasicStatistics(Partish 
partish) {
   stats.put(StatsSetupConst.NUM_FILES, 
summary.get(SnapshotSummary.TOTAL_DATA_FILES_PROP));
 }
 if (summary.containsKey(SnapshotSummary.TOTAL_RECORDS_PROP)) {
-  stats.put(StatsSetupConst.ROW_COUNT, 
summary.get(SnapshotSummary.TOTAL_RECORDS_PROP));
+  long totalRecords = 
Long.parseLong(summary.get(SnapshotSummary.TOTAL_RECORDS_PROP));
+  if (summary.containsKey(SnapshotSummary.TOTAL_EQ_DELETES_PROP) &&
+  summary.containsKey(SnapshotSummary.TOTAL_POS_DELETES_PROP)) 
{
+Long actualRecords =
+totalRecords - 
(Long.parseLong(summary.get(SnapshotSummary.TOTAL_EQ_DELETES_PROP)) >

Review Comment:
   It is a bit of a code smell imho.
   
   When you use the conditional operator, your eyes need to parse a value that 
is passed through two methods: to read or debug it you need to understand what 
summary.get returns and then what Long.parse does (the latter is easier, for 
sure). Maybe it is fully OK and you can choose not to accept this opinion, but 
I think it is nicer if you create a variable for those values. 
   



##
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java:
##
@@ -346,7 +346,16 @@ public Map<String, String> getBasicStatistics(Partish 
partish) {
   stats.put(StatsSetupConst.NUM_FILES, 
summary.get(SnapshotSummary.TOTAL_DATA_FILES_PROP));
 }
 if (summary.containsKey(SnapshotSummary.TOTAL_RECORDS_PROP)) {
-  stats.put(StatsSetupConst.ROW_COUNT, 
summary.get(SnapshotSummary.TOTAL_RECORDS_PROP));
+  long totalRecords = 
Long.parseLong(summary.get(SnapshotSummary.TOTAL_RECORDS_PROP));
+  if (summary.containsKey(SnapshotSummary.TOTAL_EQ_DELETES_PROP) &&
+  summary.containsKey(SnapshotSummary.TOTAL_POS_DELETES_PROP)) 
{
+Long actualRecords =
+totalRecords - 
(Long.parseLong(summary.get(SnapshotSummary.TOTAL_EQ_DELETES_PROP)) >

Review Comment:
   Btw, why does this getBasicStatistics need a switch case? 





Issue Time Tracking
---

Worklog Id: (was: 861059)
Time Spent: 0.5h  (was: 20m)

> Iceberg basic stats: Incorrect row count in snapshot summary leading to 
> unoptimized plans
> -
>
> Key: HIVE-27327
> URL: https://issues.apache.org/jira/browse/HIVE-27327
> Project: Hive
>  Issue Type: Bug
>Reporter: Simhadri Govindappa
>Assignee: Simhadri Govindappa
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> In the absence of equality deletes, the total row count should be:
> {noformat}
> row_count = total-records - total-position-deletes{noformat}
>  
>  
> Example:
> After many inserts and deletes, there are only 46 records in a table.
> {noformat}
> >>select count(*) from llap_orders;
> +--+
> | _c0  |
> +--+
> | 46   |
> +--+
> 1 row selected (7.22 seconds)
> {noformat}
>  
> But the total records in snapshot summary indicate that there are 300 records
>  
> {noformat}
>  {
>     "sequence-number" : 19,
>     "snapshot-id" : 4237525869561629328,
>     "parent-snapshot-id" : 2572487769557272977,
>     "timestamp-ms" : 1683553017982,
>     "summary" : {
>       "operation" : "append",
>       "added-data-files" : "5",
>       "added-records" : "12",
>       "added-files-size" : "3613",
>       "changed-partition-count" : "5",
>       "total-records" : "300",
>       "total-files-size" : "164405",
>       "total-data-files" : "100",
>       "total-delete-files" : "73",
>       "total-position-deletes" : "254",
>       "total-equality-deletes" : "0"
>     }{noformat}
>  
> As a result of this, the Hive plans generated are unoptimized.
> {noformat}
> 0: jdbc:hive2://simhadrigovindappa-2.simhadri> explain update llap_orders set 
> itemid=7 where itemid=5;
> INFO  : OK
> ++
> |                      Explain                       |
> 

[jira] [Work logged] (HIVE-27327) Iceberg basic stats: Incorrect row count in snapshot summary leading to unoptimized plans

2023-05-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27327?focusedWorklogId=861058&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-861058
 ]

ASF GitHub Bot logged work on HIVE-27327:
-

Author: ASF GitHub Bot
Created on: 08/May/23 17:24
Start Date: 08/May/23 17:24
Worklog Time Spent: 10m 
  Work Description: InvisibleProgrammer commented on code in PR #4301:
URL: https://github.com/apache/hive/pull/4301#discussion_r1187686934


##
iceberg/iceberg-handler/src/test/queries/positive/row_count.q:
##
@@ -0,0 +1,42 @@
+drop table llap_orders;

Review Comment:
   I'm not 100% sure but maybe there can be an edge case when the table is not 
cleaned up properly. In that case, I would consider using `DROP TABLE IF 
EXISTS...`



##
iceberg/iceberg-handler/src/test/results/positive/row_count.q.out:
##
@@ -0,0 +1,302 @@
+PREHOOK: query: drop table llap_orders

Review Comment:
   Please mask out the values that can change frequently, like 
`current-snapshot-id`, `added-files-size`, etc... 



##
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java:
##
@@ -346,7 +346,16 @@ public Map<String, String> getBasicStatistics(Partish 
partish) {
   stats.put(StatsSetupConst.NUM_FILES, 
summary.get(SnapshotSummary.TOTAL_DATA_FILES_PROP));
 }
 if (summary.containsKey(SnapshotSummary.TOTAL_RECORDS_PROP)) {
-  stats.put(StatsSetupConst.ROW_COUNT, 
summary.get(SnapshotSummary.TOTAL_RECORDS_PROP));
+  long totalRecords = 
Long.parseLong(summary.get(SnapshotSummary.TOTAL_RECORDS_PROP));
+  if (summary.containsKey(SnapshotSummary.TOTAL_EQ_DELETES_PROP) &&

Review Comment:
   What if only one of `TOTAL_EQ_DELETES_PROP` and `TOTAL_POS_DELETES_PROP` 
persists?





Issue Time Tracking
---

Worklog Id: (was: 861058)
Time Spent: 20m  (was: 10m)

> Iceberg basic stats: Incorrect row count in snapshot summary leading to 
> unoptimized plans
> -
>
> Key: HIVE-27327
> URL: https://issues.apache.org/jira/browse/HIVE-27327
> Project: Hive
>  Issue Type: Bug
>Reporter: Simhadri Govindappa
>Assignee: Simhadri Govindappa
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> In the absence of equality deletes, the total row count should be:
> {noformat}
> row_count = total-records - total-position-deletes{noformat}
>  
>  
> Example:
> After many inserts and deletes, there are only 46 records in a table.
> {noformat}
> >>select count(*) from llap_orders;
> +--+
> | _c0  |
> +--+
> | 46   |
> +--+
> 1 row selected (7.22 seconds)
> {noformat}
>  
> But the total records in snapshot summary indicate that there are 300 records
>  
> {noformat}
>  {
>     "sequence-number" : 19,
>     "snapshot-id" : 4237525869561629328,
>     "parent-snapshot-id" : 2572487769557272977,
>     "timestamp-ms" : 1683553017982,
>     "summary" : {
>       "operation" : "append",
>       "added-data-files" : "5",
>       "added-records" : "12",
>       "added-files-size" : "3613",
>       "changed-partition-count" : "5",
>       "total-records" : "300",
>       "total-files-size" : "164405",
>       "total-data-files" : "100",
>       "total-delete-files" : "73",
>       "total-position-deletes" : "254",
>       "total-equality-deletes" : "0"
>     }{noformat}
>  
> As a result of this, the Hive plans generated are unoptimized.
> {noformat}
> 0: jdbc:hive2://simhadrigovindappa-2.simhadri> explain update llap_orders set 
> itemid=7 where itemid=5;
> INFO  : OK
> ++
> |                      Explain                       |
> ++
> | Vertex dependency in root stage                    |
> | Reducer 2 <- Map 1 (SIMPLE_EDGE)                   |
> | Reducer 3 <- Map 1 (SIMPLE_EDGE)                   |
> |                                                    |
> | Stage-4                                            |
> |   Stats Work{}                                     |
> |     Stage-0                                        |
> |       Move Operator                                |
> |         table:{"name:":"db.llap_orders"}           |
> |         Stage-3                                    |
> |           Dependency Collection{}                  |
> |             Stage-2                                |
> |               Reducer 2 vectorized                 |
> |               File Output Operator [FS_14]         |
> |                 table:{"name:":"db.llap_orders"}   |
> |                 Select Operator [SEL_13] (rows=150 width=424) |
> |       

[jira] [Work logged] (HIVE-27268) Hive.getPartitionsByNames should not enforce SessionState to be available

2023-05-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27268?focusedWorklogId=861057&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-861057
 ]

ASF GitHub Bot logged work on HIVE-27268:
-

Author: ASF GitHub Bot
Created on: 08/May/23 17:15
Start Date: 08/May/23 17:15
Worklog Time Spent: 10m 
  Work Description: sonarcloud[bot] commented on PR #4241:
URL: https://github.com/apache/hive/pull/4241#issuecomment-1538746841

   Kudos, SonarCloud Quality Gate passed! 
(https://sonarcloud.io/dashboard?id=apache_hive&pullRequest=4241)
   
   5 Bugs (rating E), 5 Vulnerabilities (rating B), 1 Security Hotspot 
(rating E), 127 Code Smells (rating A).
   No Coverage information, No Duplication information.
   
   




Issue Time Tracking
---

Worklog Id: (was: 861057)
Time Spent: 1h 40m  (was: 1.5h)

> Hive.getPartitionsByNames should not enforce SessionState to be available
> -
>
> Key: HIVE-27268
> URL: https://issues.apache.org/jira/browse/HIVE-27268
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 3.1.3
>Reporter: Henri Biestro
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> HIVE-24743 and HIVE-24392 enforce a check for a valid write ID list in 
> "Hive.getPartitionsByNames".
> This breaks basic API integration: a user who only needs basic partition 
> details is forced to have a SessionState.
> The request in this ticket is to ensure that if SessionState.get() is null, 
> an empty validWriteIdList is returned.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-27327) Iceberg basic stats: Incorrect row count in snapshot summary leading to unoptimized plans

2023-05-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-27327:
--
Labels: pull-request-available  (was: )

> Iceberg basic stats: Incorrect row count in snapshot summary leading to 
> unoptimized plans
> -
>
> Key: HIVE-27327
> URL: https://issues.apache.org/jira/browse/HIVE-27327
> Project: Hive
>  Issue Type: Bug
>Reporter: Simhadri Govindappa
>Assignee: Simhadri Govindappa
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> In the absence of equality deletes, the total row count should be:
> {noformat}
> row_count = total-records - total-position-deletes{noformat}
>  
>  
> Example:
> After many inserts and deletes, there are only 46 records in a table.
> {noformat}
> >>select count(*) from llap_orders;
> +--+
> | _c0  |
> +--+
> | 46   |
> +--+
> 1 row selected (7.22 seconds)
> {noformat}
>  
> But the total records in snapshot summary indicate that there are 300 records
>  
> {noformat}
>  {
>     "sequence-number" : 19,
>     "snapshot-id" : 4237525869561629328,
>     "parent-snapshot-id" : 2572487769557272977,
>     "timestamp-ms" : 1683553017982,
>     "summary" : {
>       "operation" : "append",
>       "added-data-files" : "5",
>       "added-records" : "12",
>       "added-files-size" : "3613",
>       "changed-partition-count" : "5",
>       "total-records" : "300",
>       "total-files-size" : "164405",
>       "total-data-files" : "100",
>       "total-delete-files" : "73",
>       "total-position-deletes" : "254",
>       "total-equality-deletes" : "0"
>     }{noformat}
>  
> As a result of this, the Hive plans generated are unoptimized.
> {noformat}
> 0: jdbc:hive2://simhadrigovindappa-2.simhadri> explain update llap_orders set 
> itemid=7 where itemid=5;
> INFO  : OK
> ++
> |                      Explain                       |
> ++
> | Vertex dependency in root stage                    |
> | Reducer 2 <- Map 1 (SIMPLE_EDGE)                   |
> | Reducer 3 <- Map 1 (SIMPLE_EDGE)                   |
> |                                                    |
> | Stage-4                                            |
> |   Stats Work{}                                     |
> |     Stage-0                                        |
> |       Move Operator                                |
> |         table:{"name:":"db.llap_orders"}           |
> |         Stage-3                                    |
> |           Dependency Collection{}                  |
> |             Stage-2                                |
> |               Reducer 2 vectorized                 |
> |               File Output Operator [FS_14]         |
> |                 table:{"name:":"db.llap_orders"}   |
> |                 Select Operator [SEL_13] (rows=150 width=424) |
> |                   
> Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8","_col9"]
>  |
> |                 <-Map 1 [SIMPLE_EDGE]              |
> |                   SHUFFLE [RS_4]                   |
> |                     Select Operator [SEL_3] (rows=150 width=424) |
> |                       
> Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col7","_col8","_col9"]
>  |
> |                       Select Operator [SEL_2] (rows=150 width=644) |
> |                         
> Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col7","_col8","_col9","_col10","_col11","_col13","_col14","_col15"]
>  |
> |                         Filter Operator [FIL_9] (rows=150 width=220) |
> |                           predicate:(itemid = 5)   |
> |                           TableScan [TS_0] (rows=300 width=220) |
> |                             
> db@llap_orders,llap_orders,Tbl:COMPLETE,Col:COMPLETE,Output:["orderid","quantity","itemid","tradets","p1","p2"]
>  |
> |               Reducer 3 vectorized                 |
> |               File Output Operator [FS_16]         |
> |                 table:{"name:":"db.llap_orders"}   |
> |                 Select Operator [SEL_15]           |
> |                   
> Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col4","_col5"] |
> |                 <-Map 1 [SIMPLE_EDGE]              |
> |                   SHUFFLE [RS_10]                  |
> |                     PartitionCols:_col4, _col5     |
> |                     Select Operator [SEL_7] (rows=150 width=220) |
> |                       
> Output:["_col0","_col1","_col2","_col3","_col4","_col5"] |
> |                        Please refer to the previous Select Operator [SEL_2] 
> |
> |                                          

[jira] [Work logged] (HIVE-27327) Iceberg basic stats: Incorrect row count in snapshot summary leading to unoptimized plans

2023-05-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27327?focusedWorklogId=861056&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-861056
 ]

ASF GitHub Bot logged work on HIVE-27327:
-

Author: ASF GitHub Bot
Created on: 08/May/23 17:02
Start Date: 08/May/23 17:02
Worklog Time Spent: 10m 
  Work Description: simhadri-g opened a new pull request, #4301:
URL: https://github.com/apache/hive/pull/4301

   …mary leading to unoptimized plans
   
   
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   




Issue Time Tracking
---

Worklog Id: (was: 861056)
Remaining Estimate: 0h
Time Spent: 10m

> Iceberg basic stats: Incorrect row count in snapshot summary leading to 
> unoptimized plans
> -
>
> Key: HIVE-27327
> URL: https://issues.apache.org/jira/browse/HIVE-27327
> Project: Hive
>  Issue Type: Bug
>Reporter: Simhadri Govindappa
>Assignee: Simhadri Govindappa
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> In the absence of equality deletes, the total row count should be:
> {noformat}
> row_count = total-records - total-position-deletes{noformat}
>  
>  
> Example:
> After many inserts and deletes, there are only 46 records in a table.
> {noformat}
> >>select count(*) from llap_orders;
> +--+
> | _c0  |
> +--+
> | 46   |
> +--+
> 1 row selected (7.22 seconds)
> {noformat}
>  
> But the total-records field in the snapshot summary indicates that there are 300 records
>  
> {noformat}
>  {
>     "sequence-number" : 19,
>     "snapshot-id" : 4237525869561629328,
>     "parent-snapshot-id" : 2572487769557272977,
>     "timestamp-ms" : 1683553017982,
>     "summary" : {
>       "operation" : "append",
>       "added-data-files" : "5",
>       "added-records" : "12",
>       "added-files-size" : "3613",
>       "changed-partition-count" : "5",
>       "total-records" : "300",
>       "total-files-size" : "164405",
>       "total-data-files" : "100",
>       "total-delete-files" : "73",
>       "total-position-deletes" : "254",
>       "total-equality-deletes" : "0"
>     }{noformat}
>  
> As a result of this, the Hive plans generated are unoptimized.
> {noformat}
> 0: jdbc:hive2://simhadrigovindappa-2.simhadri> explain update llap_orders set 
> itemid=7 where itemid=5;
> INFO  : OK
> ++
> |                      Explain                       |
> ++
> | Vertex dependency in root stage                    |
> | Reducer 2 <- Map 1 (SIMPLE_EDGE)                   |
> | Reducer 3 <- Map 1 (SIMPLE_EDGE)                   |
> |                                                    |
> | Stage-4                                            |
> |   Stats Work{}                                     |
> |     Stage-0                                        |
> |       Move Operator                                |
> |         table:{"name:":"db.llap_orders"}           |
> |         Stage-3                                    |
> |           Dependency Collection{}                  |
> |             Stage-2                                |
> |               Reducer 2 vectorized                 |
> |               File Output Operator [FS_14]         |
> |                 table:{"name:":"db.llap_orders"}   |
> |                 Select Operator [SEL_13] (rows=150 width=424) |
> |                   
> Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8","_col9"]
>  |
> |                 <-Map 1 [SIMPLE_EDGE]              |
> |                   SHUFFLE [RS_4]                   |
> |                     Select Operator [SEL_3] (rows=150 width=424) |
> |                       
> Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col7","_col8","_col9"]
>  |
> |                       Select Operator [SEL_2] (rows=150 width=644) |
> |                         
> Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col7","_col8","_col9","_col10","_col11","_col13","_col14","_col15"]
>  |
> |                         Filter Operator [FIL_9] (rows=150 width=220) |
> |                           predicate:(itemid = 5)   |
> |                           TableScan [TS_0] (rows=300 width=220) |
> |                             
> db@llap_orders,llap_orders,Tbl:COMPLETE,Col:COMPLETE,Output:["orderid","quantity","itemid","tradets","p1","p2"]
>  |
> |               Reducer 3 vectorized                 |
> |               File Output Operator [FS_16]         |
> |                 

[jira] [Work logged] (HIVE-27234) Iceberg: CREATE BRANCH SQL implementation

2023-05-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27234?focusedWorklogId=861050&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-861050
 ]

ASF GitHub Bot logged work on HIVE-27234:
-

Author: ASF GitHub Bot
Created on: 08/May/23 16:19
Start Date: 08/May/23 16:19
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on code in PR #4216:
URL: https://github.com/apache/hive/pull/4216#discussion_r1187634091


##
ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java:
##
@@ -6709,6 +6710,15 @@ public void alterTableExecuteOperation(Table table, 
AlterTableExecuteSpec execut
 }
   }
 
+  public void alterTableCreateBranchOperation(Table table, 
AlterTableCreateBranchSpec createBranchSpec) throws HiveException {

Review Comment:
   maybe simply 'alterTableCreateBranch'?





Issue Time Tracking
---

Worklog Id: (was: 861050)
Time Spent: 8h 20m  (was: 8h 10m)

> Iceberg:  CREATE BRANCH SQL implementation
> --
>
> Key: HIVE-27234
> URL: https://issues.apache.org/jira/browse/HIVE-27234
> Project: Hive
>  Issue Type: Sub-task
>  Components: Iceberg integration
>Reporter: zhangbutao
>Assignee: zhangbutao
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 8h 20m
>  Remaining Estimate: 0h
>
> Maybe we can follow Spark SQL's branch DDL implementation:
> [https://github.com/apache/iceberg/pull/6617]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-27234) Iceberg: CREATE BRANCH SQL implementation

2023-05-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27234?focusedWorklogId=861049&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-861049
 ]

ASF GitHub Bot logged work on HIVE-27234:
-

Author: ASF GitHub Bot
Created on: 08/May/23 16:14
Start Date: 08/May/23 16:14
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on code in PR #4216:
URL: https://github.com/apache/hive/pull/4216#discussion_r1187629739


##
ql/src/java/org/apache/hadoop/hive/ql/ddl/table/branch/create/AlterTableCreateBranchAnalyzer.java:
##
@@ -0,0 +1,93 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql.ddl.table.branch.create;
+
+import java.util.Locale;
+import java.util.Map;
+import java.util.concurrent.TimeUnit;
+
+import org.apache.hadoop.hive.common.TableName;
+import org.apache.hadoop.hive.metastore.HiveMetaHook;
+import org.apache.hadoop.hive.ql.QueryState;
+import org.apache.hadoop.hive.ql.ddl.DDLSemanticAnalyzerFactory;
+import org.apache.hadoop.hive.ql.ddl.DDLWork;
+import org.apache.hadoop.hive.ql.ddl.table.AbstractAlterTableAnalyzer;
+import org.apache.hadoop.hive.ql.ddl.table.AlterTableType;
+import org.apache.hadoop.hive.ql.exec.TaskFactory;
+import org.apache.hadoop.hive.ql.hooks.ReadEntity;
+import org.apache.hadoop.hive.ql.metadata.Table;
+import org.apache.hadoop.hive.ql.parse.ASTNode;
+import org.apache.hadoop.hive.ql.parse.AlterTableCreateBranchSpec;
+import org.apache.hadoop.hive.ql.parse.HiveParser;
+import org.apache.hadoop.hive.ql.parse.SemanticException;
+
+@DDLSemanticAnalyzerFactory.DDLType(types = 
HiveParser.TOK_ALTERTABLE_CREATE_BRANCH)
+public class AlterTableCreateBranchAnalyzer extends AbstractAlterTableAnalyzer 
{
+
+  public AlterTableCreateBranchAnalyzer(QueryState queryState) throws 
SemanticException {
+super(queryState);
+  }
+
+  @Override
+  protected void analyzeCommand(TableName tableName, Map 
partitionSpec, ASTNode command)
+  throws SemanticException {
+Table table = getTable(tableName);
+validateAlterTableType(table, AlterTableType.CREATEBRANCH, false);
+if 
(!HiveMetaHook.ICEBERG.equalsIgnoreCase(table.getParameters().get(HiveMetaHook.TABLE_TYPE)))
 {

Review Comment:
   it would be great to have a short helper method like `isIceberg()` instead of 
this long construct (it is used in multiple places).
   A similar thing already exists in 
`HiveTableOperations#validateTableIsIceberg(Table table, String fullName)`; see 
the sketch below.
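A possible shape for the suggested helper, sketched under the assumption that the check stays exactly as written in the diff above:

{code:java}
import java.util.Map;

import org.apache.hadoop.hive.metastore.HiveMetaHook;
import org.apache.hadoop.hive.ql.metadata.Table;

public final class IcebergTableChecks {
  private IcebergTableChecks() {
  }

  // Wraps the long table-parameter comparison used in multiple analyzers.
  public static boolean isIceberg(Table table) {
    Map<String, String> params = table.getParameters();
    return params != null
        && HiveMetaHook.ICEBERG.equalsIgnoreCase(params.get(HiveMetaHook.TABLE_TYPE));
  }
}
{code}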
 





Issue Time Tracking
---

Worklog Id: (was: 861049)
Time Spent: 8h 10m  (was: 8h)

> Iceberg:  CREATE BRANCH SQL implementation
> --
>
> Key: HIVE-27234
> URL: https://issues.apache.org/jira/browse/HIVE-27234
> Project: Hive
>  Issue Type: Sub-task
>  Components: Iceberg integration
>Reporter: zhangbutao
>Assignee: zhangbutao
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 8h 10m
>  Remaining Estimate: 0h
>
> Maybe we can follow Spark SQL's branch DDL implementation:
> [https://github.com/apache/iceberg/pull/6617]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-27234) Iceberg: CREATE BRANCH SQL implementation

2023-05-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27234?focusedWorklogId=861048&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-861048
 ]

ASF GitHub Bot logged work on HIVE-27234:
-

Author: ASF GitHub Bot
Created on: 08/May/23 16:14
Start Date: 08/May/23 16:14
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on code in PR #4216:
URL: https://github.com/apache/hive/pull/4216#discussion_r1187629739


##
ql/src/java/org/apache/hadoop/hive/ql/ddl/table/branch/create/AlterTableCreateBranchAnalyzer.java:
##
@@ -0,0 +1,93 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql.ddl.table.branch.create;
+
+import java.util.Locale;
+import java.util.Map;
+import java.util.concurrent.TimeUnit;
+
+import org.apache.hadoop.hive.common.TableName;
+import org.apache.hadoop.hive.metastore.HiveMetaHook;
+import org.apache.hadoop.hive.ql.QueryState;
+import org.apache.hadoop.hive.ql.ddl.DDLSemanticAnalyzerFactory;
+import org.apache.hadoop.hive.ql.ddl.DDLWork;
+import org.apache.hadoop.hive.ql.ddl.table.AbstractAlterTableAnalyzer;
+import org.apache.hadoop.hive.ql.ddl.table.AlterTableType;
+import org.apache.hadoop.hive.ql.exec.TaskFactory;
+import org.apache.hadoop.hive.ql.hooks.ReadEntity;
+import org.apache.hadoop.hive.ql.metadata.Table;
+import org.apache.hadoop.hive.ql.parse.ASTNode;
+import org.apache.hadoop.hive.ql.parse.AlterTableCreateBranchSpec;
+import org.apache.hadoop.hive.ql.parse.HiveParser;
+import org.apache.hadoop.hive.ql.parse.SemanticException;
+
+@DDLSemanticAnalyzerFactory.DDLType(types = 
HiveParser.TOK_ALTERTABLE_CREATE_BRANCH)
+public class AlterTableCreateBranchAnalyzer extends AbstractAlterTableAnalyzer 
{
+
+  public AlterTableCreateBranchAnalyzer(QueryState queryState) throws 
SemanticException {
+super(queryState);
+  }
+
+  @Override
+  protected void analyzeCommand(TableName tableName, Map 
partitionSpec, ASTNode command)
+  throws SemanticException {
+Table table = getTable(tableName);
+validateAlterTableType(table, AlterTableType.CREATEBRANCH, false);
+if 
(!HiveMetaHook.ICEBERG.equalsIgnoreCase(table.getParameters().get(HiveMetaHook.TABLE_TYPE)))
 {

Review Comment:
   it would be great to have a short helper method like `isIceberg()` instead of 
this long construct (it is used in multiple places).
   A similar thing already exists in 
`HiveTableOperations#validateTableIsIceberg(Table table, String fullName)`
 





Issue Time Tracking
---

Worklog Id: (was: 861048)
Time Spent: 8h  (was: 7h 50m)

> Iceberg:  CREATE BRANCH SQL implementation
> --
>
> Key: HIVE-27234
> URL: https://issues.apache.org/jira/browse/HIVE-27234
> Project: Hive
>  Issue Type: Sub-task
>  Components: Iceberg integration
>Reporter: zhangbutao
>Assignee: zhangbutao
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 8h
>  Remaining Estimate: 0h
>
> Maybe we can follow Spark SQL's branch DDL implementation:
> [https://github.com/apache/iceberg/pull/6617]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-27234) Iceberg: CREATE BRANCH SQL implementation

2023-05-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27234?focusedWorklogId=861047&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-861047
 ]

ASF GitHub Bot logged work on HIVE-27234:
-

Author: ASF GitHub Bot
Created on: 08/May/23 16:11
Start Date: 08/May/23 16:11
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on code in PR #4216:
URL: https://github.com/apache/hive/pull/4216#discussion_r1187626822


##
ql/src/java/org/apache/hadoop/hive/ql/parse/AlterTableCreateBranchSpec.java:
##
@@ -0,0 +1,63 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql.parse;
+
+import com.google.common.base.MoreObjects;
+
+public class AlterTableCreateBranchSpec {

Review Comment:
   I think so, ExecuteOperationType was extended with those 3 operations:
   
   ROLLBACK,
   EXPIRE_SNAPSHOT,
   SET_CURRENT_SNAPSHOT
   
   note, if you think that create/drop branch doesn't semantically fit this 
design or complicates it, feel free to create a new abstraction





Issue Time Tracking
---

Worklog Id: (was: 861047)
Time Spent: 7h 50m  (was: 7h 40m)

> Iceberg:  CREATE BRANCH SQL implementation
> --
>
> Key: HIVE-27234
> URL: https://issues.apache.org/jira/browse/HIVE-27234
> Project: Hive
>  Issue Type: Sub-task
>  Components: Iceberg integration
>Reporter: zhangbutao
>Assignee: zhangbutao
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 7h 50m
>  Remaining Estimate: 0h
>
> Maybe we can follow Spark SQL's branch DDL implementation:
> [https://github.com/apache/iceberg/pull/6617]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-27234) Iceberg: CREATE BRANCH SQL implementation

2023-05-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27234?focusedWorklogId=861045&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-861045
 ]

ASF GitHub Bot logged work on HIVE-27234:
-

Author: ASF GitHub Bot
Created on: 08/May/23 16:05
Start Date: 08/May/23 16:05
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on code in PR #4216:
URL: https://github.com/apache/hive/pull/4216#discussion_r1187621453


##
parser/src/java/org/apache/hadoop/hive/ql/parse/AlterClauseParser.g:
##
@@ -477,6 +478,34 @@ alterStatementSuffixExecute
 -> ^(TOK_ALTERTABLE_EXECUTE KW_SET_CURRENT_SNAPSHOT $snapshotParam)
 ;
 
+alterStatementSuffixCreateBranch
+@init { gParent.pushMsg("alter table create branch", state); }
+@after { gParent.popMsg(state); }
+: KW_CREATE KW_BRANCH branchName=identifier snapshotIdOfBranch? 
branchRetain? retentionOfSnapshots?
+-> ^(TOK_ALTERTABLE_CREATE_BRANCH $branchName snapshotIdOfBranch? 
branchRetain? retentionOfSnapshots?)
+;
+
+snapshotIdOfBranch
+@init { gParent.pushMsg("alter table create branch as of version", state); }

Review Comment:
    





Issue Time Tracking
---

Worklog Id: (was: 861045)
Time Spent: 7h 40m  (was: 7.5h)

> Iceberg:  CREATE BRANCH SQL implementation
> --
>
> Key: HIVE-27234
> URL: https://issues.apache.org/jira/browse/HIVE-27234
> Project: Hive
>  Issue Type: Sub-task
>  Components: Iceberg integration
>Reporter: zhangbutao
>Assignee: zhangbutao
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 7h 40m
>  Remaining Estimate: 0h
>
> Maybe we can follow Spark SQL's branch DDL implementation:
> [https://github.com/apache/iceberg/pull/6617]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-27234) Iceberg: CREATE BRANCH SQL implementation

2023-05-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27234?focusedWorklogId=861044&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-861044
 ]

ASF GitHub Bot logged work on HIVE-27234:
-

Author: ASF GitHub Bot
Created on: 08/May/23 16:05
Start Date: 08/May/23 16:05
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on code in PR #4216:
URL: https://github.com/apache/hive/pull/4216#discussion_r1187620978


##
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java:
##
@@ -676,6 +678,35 @@ public void 
executeOperation(org.apache.hadoop.hive.ql.metadata.Table hmsTable,
 }
   }
 
+  @Override
+  public void createBranchOperation(org.apache.hadoop.hive.ql.metadata.Table 
hmsTable,
+  AlterTableCreateBranchSpec createBranchSpec) {
+TableDesc tableDesc = Utilities.getTableDesc(hmsTable);
+Table icebergTable = IcebergTableUtil.getTable(conf, 
tableDesc.getProperties());
+
+String branchName = createBranchSpec.getBranchName();
+Optional.ofNullable(icebergTable.currentSnapshot()).orElseThrow(() -> new 
UnsupportedOperationException(
+String.format("Cannot create branch %s on iceberg table %s.%s which 
has no snapshot",
+branchName, hmsTable.getDbName(), hmsTable.getTableName())));
+Long snapshotId = Optional.ofNullable(createBranchSpec.getSnapshotId())
+.orElse(icebergTable.currentSnapshot().snapshotId());
+LOG.info("Creating branch {} on iceberg table {}.{}", branchName, 
hmsTable.getDbName(),
+hmsTable.getTableName());
+ManageSnapshots manageSnapshots = icebergTable.manageSnapshots();
+manageSnapshots.createBranch(branchName, snapshotId);

Review Comment:
    





Issue Time Tracking
---

Worklog Id: (was: 861044)
Time Spent: 7.5h  (was: 7h 20m)

> Iceberg:  CREATE BRANCH SQL implementation
> --
>
> Key: HIVE-27234
> URL: https://issues.apache.org/jira/browse/HIVE-27234
> Project: Hive
>  Issue Type: Sub-task
>  Components: Iceberg integration
>Reporter: zhangbutao
>Assignee: zhangbutao
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 7.5h
>  Remaining Estimate: 0h
>
> Maybe we can follow Spark SQL's branch DDL implementation:
> [https://github.com/apache/iceberg/pull/6617]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-27319) HMS server should throw InvalidObjectException in get_partitions_by_names() when the table is missing/dropped

2023-05-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27319?focusedWorklogId=861033&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-861033
 ]

ASF GitHub Bot logged work on HIVE-27319:
-

Author: ASF GitHub Bot
Created on: 08/May/23 14:56
Start Date: 08/May/23 14:56
Worklog Time Spent: 10m 
  Work Description: henrib commented on code in PR #4299:
URL: https://github.com/apache/hive/pull/4299#discussion_r1187546032


##
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/ExceptionHandler.java:
##
@@ -127,7 +128,7 @@ public ExceptionHandler toMetaExceptionIfInstance(String 
message, Class... cl
*/
   public static void rethrowException(Exception e) throws TException {
 throw handleException(e)
-.throwIfInstance(MetaException.class, NoSuchObjectException.class)
+.throwIfInstance(MetaException.class, NoSuchObjectException.class, 
InvalidObjectException.class)

Review Comment:
   Maybe explicitly call
   `throw handleException(e).throwIfInstance(MetaException.class, 
NoSuchObjectException.class, InvalidObjectException.class)`
   instead of modifying rethrowException?





Issue Time Tracking
---

Worklog Id: (was: 861033)
Time Spent: 50m  (was: 40m)

> HMS server should throw InvalidObjectException in get_partitions_by_names() 
> when the table is missing/dropped
> -
>
> Key: HIVE-27319
> URL: https://issues.apache.org/jira/browse/HIVE-27319
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, Standalone Metastore
>Reporter: Sai Hemanth Gantasala
>Assignee: Sai Hemanth Gantasala
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> When the table object is dropped by a concurrent thread, the 
> get_partitions_by_names_req() API is currently throwing a 
> TApplicationException to the client. Instead, the HMS server should propagate 
> the InvalidObjectException thrown by getTable() to the HMS client. By doing 
> this, other services using HMS client will understand the exception better.
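
For illustration, the difference for callers looks roughly like this (a hedged sketch; the client interface below is a hypothetical stand-in, not the real HMS client API):

{code:java}
import org.apache.hadoop.hive.metastore.api.InvalidObjectException;
import org.apache.thrift.TApplicationException;
import org.apache.thrift.TException;

public class PartitionFetchSketch {

  // Hypothetical minimal client surface, for illustration only.
  interface HmsClient {
    void getPartitionsByNames(String db, String table) throws TException;
  }

  void fetchPartitions(HmsClient client) {
    try {
      client.getPartitionsByNames("db", "tbl");
    } catch (InvalidObjectException e) {
      // Table missing/dropped concurrently: a clear, typed signal (the fix).
    } catch (TApplicationException e) {
      // What clients see today: an opaque failure with nothing actionable.
    } catch (TException e) {
      // Other Thrift-level errors.
    }
  }
}
{code}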



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HIVE-27327) Iceberg basic stats: Incorrect row count in snapshot summary leading to unoptimized plans

2023-05-08 Thread Simhadri Govindappa (Jira)
Simhadri Govindappa created HIVE-27327:
--

 Summary: Iceberg basic stats: Incorrect row count in snapshot 
summary leading to unoptimized plans
 Key: HIVE-27327
 URL: https://issues.apache.org/jira/browse/HIVE-27327
 Project: Hive
  Issue Type: Bug
Reporter: Simhadri Govindappa
Assignee: Simhadri Govindappa


In the absence of equality deletes, the total row count should be:
{noformat}
row_count = total-records - total-position-deletes{noformat}
 

 

Example:

After many inserts and deletes, there are only 46 records in a table.
{noformat}
>>select count(*) from llap_orders;
+--+
| _c0  |
+--+
| 46   |
+--+
1 row selected (7.22 seconds)

{noformat}
 

But the total-records field in the snapshot summary indicates that there are 300 records

 
{noformat}
 {
    "sequence-number" : 19,
    "snapshot-id" : 4237525869561629328,
    "parent-snapshot-id" : 2572487769557272977,
    "timestamp-ms" : 1683553017982,
    "summary" : {
      "operation" : "append",
      "added-data-files" : "5",
      "added-records" : "12",
      "added-files-size" : "3613",
      "changed-partition-count" : "5",
      "total-records" : "300",
      "total-files-size" : "164405",
      "total-data-files" : "100",
      "total-delete-files" : "73",
      "total-position-deletes" : "254",
      "total-equality-deletes" : "0"
    }{noformat}
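
For reference, the corrected count can be computed directly from the summary fields shown above. A small self-contained sketch (the map keys mirror the snapshot summary; everything else is illustrative):

{code:java}
import java.util.Map;

public final class IcebergRowCountSketch {
  // row_count = total-records - total-position-deletes,
  // valid only when there are no equality deletes.
  public static long estimateRowCount(Map<String, String> summary) {
    long totalRecords = Long.parseLong(summary.getOrDefault("total-records", "0"));
    long positionDeletes = Long.parseLong(summary.getOrDefault("total-position-deletes", "0"));
    long equalityDeletes = Long.parseLong(summary.getOrDefault("total-equality-deletes", "0"));
    if (equalityDeletes > 0) {
      // Equality deletes cannot be resolved from the summary alone.
      return totalRecords;
    }
    return totalRecords - positionDeletes;  // 300 - 254 = 46 for the summary above
  }
}
{code}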
 

As a result of this, the Hive plans generated are unoptimized.
{noformat}
0: jdbc:hive2://simhadrigovindappa-2.simhadri> explain update llap_orders set 
itemid=7 where itemid=5;

INFO  : OK
++
|                      Explain                       |
++
| Vertex dependency in root stage                    |
| Reducer 2 <- Map 1 (SIMPLE_EDGE)                   |
| Reducer 3 <- Map 1 (SIMPLE_EDGE)                   |
|                                                    |
| Stage-4                                            |
|   Stats Work{}                                     |
|     Stage-0                                        |
|       Move Operator                                |
|         table:{"name:":"db.llap_orders"}           |
|         Stage-3                                    |
|           Dependency Collection{}                  |
|             Stage-2                                |
|               Reducer 2 vectorized                 |
|               File Output Operator [FS_14]         |
|                 table:{"name:":"db.llap_orders"}   |
|                 Select Operator [SEL_13] (rows=150 width=424) |
|                   
Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8","_col9"]
 |
|                 <-Map 1 [SIMPLE_EDGE]              |
|                   SHUFFLE [RS_4]                   |
|                     Select Operator [SEL_3] (rows=150 width=424) |
|                       
Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col7","_col8","_col9"]
 |
|                       Select Operator [SEL_2] (rows=150 width=644) |
|                         
Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col7","_col8","_col9","_col10","_col11","_col13","_col14","_col15"]
 |
|                         Filter Operator [FIL_9] (rows=150 width=220) |
|                           predicate:(itemid = 5)   |
|                           TableScan [TS_0] (rows=300 width=220) |
|                             
db@llap_orders,llap_orders,Tbl:COMPLETE,Col:COMPLETE,Output:["orderid","quantity","itemid","tradets","p1","p2"]
 |
|               Reducer 3 vectorized                 |
|               File Output Operator [FS_16]         |
|                 table:{"name:":"db.llap_orders"}   |
|                 Select Operator [SEL_15]           |
|                   
Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col4","_col5"] |
|                 <-Map 1 [SIMPLE_EDGE]              |
|                   SHUFFLE [RS_10]                  |
|                     PartitionCols:_col4, _col5     |
|                     Select Operator [SEL_7] (rows=150 width=220) |
|                       
Output:["_col0","_col1","_col2","_col3","_col4","_col5"] |
|                        Please refer to the previous Select Operator [SEL_2] |
|                                                    |
++
39 rows selected (0.104 seconds)
0: jdbc:hive2://simhadrigovindappa-2.simhadri>{noformat}
 

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-27326) Hive Authorizer not receiving resource information for few alter queries causing authorization check to fail

2023-05-08 Thread Jai Patel (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jai Patel updated HIVE-27326:
-
Description: 
We have a Ranger plugin implemented for HiveService which uses the hook 
provided by the HiveService, i.e. the "{*}checkPrivileges{*}" method in 
"org.apache.hadoop.hive.ql.security.authorization.plugin.HiveAuthorizer.java" - 
[https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/security/authorization/plugin/HiveAuthorizer.java#L163].

We do authorization based on the information provided in the *inputObjs* and 
*outputObjs* parameters. 
This *works fine* for the normal alter query like -
{code:java}
ALTER TABLE hr ADD COLUMNS (country VARCHAR(255)){code}

Logs -
{code:java}
2023-05-08T14:31:40,505 DEBUG [c85f84fd-85d6-4e1a-ae72-ea07323e1a93 
HiveServer2-Handler-Pool: Thread-90] 
ranger.authorization.hive.authorizer.RangerHiveAuthorizer: 
'checkPrivileges':{'hiveOpType':ALTERTABLE_ADDCOLS, 
'inputHObjs':['HivePrivilegeObject':{'type':TABLE_OR_VIEW, 'dbName':test, 
'objectType':TABLE_OR_VIEW, 'objectName':hr, 'columns':[], 'partKeys':[], 
'commandParams':[], 'actionType':OTHER}], 
'outputHObjs':['HivePrivilegeObject':{'type':TABLE_OR_VIEW, 'dbName':test, 
'objectType':TABLE_OR_VIEW, 'objectName':hr, 'columns':[], 'partKeys':[], 
'commandParams':[], 'actionType':OTHER}], 'context':{'clientType':HIVESERVER2, 
'commandString':ALTER TABLE hr ADD COLUMNS (country VARCHAR(255)), 
'ipAddress':172.18.0.1, 'forwardedAddresses':null, 
'sessionString':c85f84fd-85d6-4e1a-ae72-ea07323e1a93}, 'user':root, 
'groups':[root]}
{code}
 

{color:#ff}*But for the alter queries below, we are not getting the db and 
table information:* 
{color}Query 1 -
{code:java}
ALTER TABLE hr ADD CONSTRAINT unique_key_const UNIQUE (c0) DISABLE 
NOVALIDATE;{code}
LOGS -
{code:java}
2023-05-08T12:14:22,502 DEBUG [c0c66e4e-3014-4258-8e1a-7b689c2fbe6d 
HiveServer2-Handler-Pool: Thread-90] 
ranger.authorization.hive.authorizer.RangerHiveAuthorizer: 
'checkPrivileges':{'hiveOpType':ALTERTABLE_ADDCONSTRAINT, 'inputHObjs':[], 
'outputHObjs':[], 'context':{'clientType':HIVESERVER2, 'commandString':ALTER 
TABLE hr ADD CONSTRAINT unique_key_const1 UNIQUE (c0) DISABLE NOVALIDATE, 
'ipAddress':172.18.0.1, 'forwardedAddresses':null, 'sessionString':c0c66{code}
Query 2 -
{code:java}
ALTER TABLE temp PARTITION (c1=1) COMPACT 'minor';{code}

Logs -
{code:java}
2023-05-08T12:16:30,595 DEBUG [c0c66e4e-3014-4258-8e1a-7b689c2fbe6d 
HiveServer2-Handler-Pool: Thread-90] 
ranger.authorization.hive.authorizer.RangerHiveAuthorizer: 
'checkPrivileges':{'hiveOpType':ALTERTABLE_COMPACT, 'inputHObjs':[], 
'outputHObjs':[], 'context':
{'clientType':HIVESERVER2, 'commandString':ALTER TABLE temp PARTITION (c1=1) 
COMPACT 'minor', 'ipAddress':172.18.0.1, 'forwardedAddresses':null, 
'sessionString':c0c66e4e-3014-4258-8e1a-7b689c2fbe6d}
, 'user':root, 'groups':[root]}
{code}
 

 

As you can see in the logs, we are getting empty inputHObjs and outputHObjs in 
the case of Alter Table Add Constraint and Alter Table Compact Partition. This 
is not the case for ALTER TABLE ADD COLUMNS, and hence it works fine there.



Can we fix this so as to provide proper authorization on these queries?
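
For context, the authorization hook has roughly the shape sketched below; the deny-on-empty policy is purely illustrative, not Ranger's actual logic, but it shows why empty resource lists defeat resource-based authorization:

{code:java}
import java.util.List;

import org.apache.hadoop.hive.ql.security.authorization.plugin.HiveAccessControlException;
import org.apache.hadoop.hive.ql.security.authorization.plugin.HiveAuthzContext;
import org.apache.hadoop.hive.ql.security.authorization.plugin.HiveAuthzPluginException;
import org.apache.hadoop.hive.ql.security.authorization.plugin.HiveOperationType;
import org.apache.hadoop.hive.ql.security.authorization.plugin.HivePrivilegeObject;

// Hedged sketch of a resource-based authorizer; the deny-on-empty
// policy below is an illustration, not Ranger's implementation.
public abstract class ResourceBasedAuthorizerSketch
    implements org.apache.hadoop.hive.ql.security.authorization.plugin.HiveAuthorizer {

  @Override
  public void checkPrivileges(HiveOperationType hiveOpType,
      List<HivePrivilegeObject> inputHObjs,
      List<HivePrivilegeObject> outputHObjs,
      HiveAuthzContext context)
      throws HiveAuthzPluginException, HiveAccessControlException {
    // With HIVE-27326 unfixed, ALTERTABLE_ADDCONSTRAINT and
    // ALTERTABLE_COMPACT arrive with both lists empty, so a policy
    // keyed on db/table names has nothing to evaluate.
    if (inputHObjs.isEmpty() && outputHObjs.isEmpty()) {
      throw new HiveAccessControlException(
          "No resources supplied for " + hiveOpType + "; cannot authorize");
    }
    // ... evaluate policies against inputHObjs / outputHObjs ...
  }
}
{code}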

 

  was:
We have a Ranger plugin implemented for HiveService which uses the hook 
provided by the HiveService i.e. the "checkPriviliges" method in 
"org.apache.hadoop.hive.ql.security.authorization.plugin.HiveAuthorizer.java" - 
[https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/security/authorization/plugin/HiveAuthorizer.java#L163|http://example.com].

We do authorization based on the information provided in the inputObjs and 
outputObjs parameters. 
This works fine for the normal alter query like -
ALTER TABLE hr ADD COLUMNS (country VARCHAR(255))
Logs -
2023-05-08T14:31:40,505 DEBUG [c85f84fd-85d6-4e1a-ae72-ea07323e1a93 
HiveServer2-Handler-Pool: Thread-90] 
ranger.authorization.hive.authorizer.RangerHiveAuthorizer: 
'checkPrivileges':\{'hiveOpType':ALTERTABLE_ADDCOLS, 
'inputHObjs':['HivePrivilegeObject':{'type':TABLE_OR_VIEW, 'dbName':privacera, 
'objectType':TABLE_OR_VIEW, 'objectName':hr, 'columns':[], 'partKeys':[], 
'commandParams':[], 'actionType':OTHER}], 
'outputHObjs':['HivePrivilegeObject':\{'type':TABLE_OR_VIEW, 
'dbName':privacera, 'objectType':TABLE_OR_VIEW, 'objectName':hr, 'columns':[], 
'partKeys':[], 'commandParams':[], 'actionType':OTHER}], 
'context':\{'clientType':HIVESERVER2, 'commandString':ALTER TABLE hr ADD 
COLUMNS (country VARCHAR(255)), 'ipAddress':172.18.0.1, 
'forwardedAddresses':null, 
'sessionString':c85f84fd-85d6-4e1a-ae72-ea07323e1a93}, 'user':root, 
'groups':[root]}


{color:#FF}*But for below alter queries, we are not getting the db and 
table information -* 
{color}Query 1 -
ALTER TABLE hr ADD CONSTRAINT unique_key_const UNIQUE (c0) DISABLE NOVALIDATE;
LOGS -
2023-05-08T12:14:22,502 

[jira] [Created] (HIVE-27326) Hive Authorizer not receiving resource information for few alter queries causing authorization check to fail

2023-05-08 Thread Jai Patel (Jira)
Jai Patel created HIVE-27326:


 Summary: Hive Authorizer not receiving resource information for 
few alter queries causing authorization check to fail
 Key: HIVE-27326
 URL: https://issues.apache.org/jira/browse/HIVE-27326
 Project: Hive
  Issue Type: Bug
  Components: Authorization
Affects Versions: 3.1.2
Reporter: Jai Patel


We have a Ranger plugin implemented for HiveService which uses the hook 
provided by the HiveService, i.e. the "checkPrivileges" method in 
"org.apache.hadoop.hive.ql.security.authorization.plugin.HiveAuthorizer.java" - 
[https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/security/authorization/plugin/HiveAuthorizer.java#L163].

We do authorization based on the information provided in the inputObjs and 
outputObjs parameters. 
This works fine for the normal alter query like -
ALTER TABLE hr ADD COLUMNS (country VARCHAR(255))
Logs -
2023-05-08T14:31:40,505 DEBUG [c85f84fd-85d6-4e1a-ae72-ea07323e1a93 
HiveServer2-Handler-Pool: Thread-90] 
ranger.authorization.hive.authorizer.RangerHiveAuthorizer: 
'checkPrivileges':\{'hiveOpType':ALTERTABLE_ADDCOLS, 
'inputHObjs':['HivePrivilegeObject':{'type':TABLE_OR_VIEW, 'dbName':privacera, 
'objectType':TABLE_OR_VIEW, 'objectName':hr, 'columns':[], 'partKeys':[], 
'commandParams':[], 'actionType':OTHER}], 
'outputHObjs':['HivePrivilegeObject':\{'type':TABLE_OR_VIEW, 
'dbName':privacera, 'objectType':TABLE_OR_VIEW, 'objectName':hr, 'columns':[], 
'partKeys':[], 'commandParams':[], 'actionType':OTHER}], 
'context':\{'clientType':HIVESERVER2, 'commandString':ALTER TABLE hr ADD 
COLUMNS (country VARCHAR(255)), 'ipAddress':172.18.0.1, 
'forwardedAddresses':null, 
'sessionString':c85f84fd-85d6-4e1a-ae72-ea07323e1a93}, 'user':root, 
'groups':[root]}


{color:#FF}*But for the alter queries below, we are not getting the db and 
table information:* 
{color}Query 1 -
ALTER TABLE hr ADD CONSTRAINT unique_key_const UNIQUE (c0) DISABLE NOVALIDATE;
LOGS -
2023-05-08T12:14:22,502 DEBUG [c0c66e4e-3014-4258-8e1a-7b689c2fbe6d 
HiveServer2-Handler-Pool: Thread-90] 
ranger.authorization.hive.authorizer.RangerHiveAuthorizer: 
'checkPrivileges':{'hiveOpType':ALTERTABLE_ADDCONSTRAINT, 'inputHObjs':[], 
'outputHObjs':[], 'context':{'clientType':HIVESERVER2, 'commandString':ALTER 
TABLE hr ADD CONSTRAINT unique_key_const1 UNIQUE (c0) DISABLE NOVALIDATE, 
'ipAddress':172.18.0.1, 'forwardedAddresses':null, 'sessionString':c0c66
Query 2 -
ALTER TABLE temp PARTITION (c1=1) COMPACT 'minor';
Logs -
2023-05-08T12:16:30,595 DEBUG [c0c66e4e-3014-4258-8e1a-7b689c2fbe6d 
HiveServer2-Handler-Pool: Thread-90] 
ranger.authorization.hive.authorizer.RangerHiveAuthorizer: 
'checkPrivileges':\{'hiveOpType':ALTERTABLE_COMPACT, 'inputHObjs':[], 
'outputHObjs':[], 'context':{'clientType':HIVESERVER2, 'commandString':ALTER 
TABLE temp PARTITION (c1=1) COMPACT 'minor', 'ipAddress':172.18.0.1, 
'forwardedAddresses':null, 
'sessionString':c0c66e4e-3014-4258-8e1a-7b689c2fbe6d}, 'user':root, 
'groups':[root]}

As you can see in the logs, we are getting empty inputHObjs and outputHObjs in 
the case of Alter Table Add Constraint and Alter Table Compact Partition.

Can we fix this so as to provide proper authorization on these queries?

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-27186) A persistent property store

2023-05-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27186?focusedWorklogId=861020&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-861020
 ]

ASF GitHub Bot logged work on HIVE-27186:
-

Author: ASF GitHub Bot
Created on: 08/May/23 14:35
Start Date: 08/May/23 14:35
Worklog Time Spent: 10m 
  Work Description: henrib commented on code in PR #4194:
URL: https://github.com/apache/hive/pull/4194#discussion_r1187524702


##
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/PropertyServlet.java:
##
@@ -0,0 +1,307 @@
+/* * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.metastore;
+
+import com.google.gson.Gson;
+import com.google.gson.JsonIOException;
+import com.google.gson.JsonSyntaxException;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.hive.metastore.api.MetaException;
+import org.apache.hadoop.hive.metastore.api.NoSuchObjectException;
+import org.apache.hadoop.hive.metastore.conf.MetastoreConf;
+import org.apache.hadoop.hive.metastore.properties.PropertyException;
+import org.apache.hadoop.hive.metastore.properties.PropertyManager;
+import org.apache.hadoop.hive.metastore.properties.PropertyMap;
+import org.apache.hadoop.hive.metastore.properties.PropertyStore;
+import org.apache.hadoop.hive.metastore.utils.MetaStoreUtils;
+import org.apache.hadoop.security.SecurityUtil;
+import org.apache.hadoop.security.UserGroupInformation;
+import org.eclipse.jetty.server.HttpConfiguration;
+import org.eclipse.jetty.server.HttpConnectionFactory;
+import org.eclipse.jetty.server.Server;
+import org.eclipse.jetty.server.ServerConnector;
+import org.eclipse.jetty.servlet.ServletContextHandler;
+import org.eclipse.jetty.servlet.ServletHandler;
+import org.eclipse.jetty.servlet.ServletHolder;
+import org.eclipse.jetty.servlet.Source;
+import org.eclipse.jetty.util.ssl.SslContextFactory;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import javax.servlet.ServletException;
+import javax.servlet.ServletInputStream;
+import javax.servlet.ServletOutputStream;
+import javax.servlet.http.HttpServlet;
+import javax.servlet.http.HttpServletRequest;
+import javax.servlet.http.HttpServletResponse;
+import javax.servlet.http.Part;
+import java.io.BufferedReader;
+import java.io.DataOutputStream;
+import java.io.IOException;
+import java.io.InputStream;
+import java.io.InputStreamReader;
+import java.io.PrintWriter;
+import java.io.Reader;
+import java.net.HttpURLConnection;
+import java.net.URL;
+import java.nio.charset.StandardCharsets;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.List;
+import java.util.Map;
+import java.util.TreeMap;
+
+/**
+ * The property CLI servlet.
+ */
+public class PropertyServlet extends HttpServlet {
+  /** The logger. */
+  public static final Logger LOGGER = 
LoggerFactory.getLogger(PropertyServlet.class);
+  /** The object store. */
+  private final RawStore objectStore;
+  /** The security. */
+  private final ServletSecurity security;
+
+  PropertyServlet(Configuration configuration, RawStore store) {

Review Comment:
   We want other (non-Thrift) clients to be able to communicate with the 
property API.





Issue Time Tracking
---

Worklog Id: (was: 861020)
Time Spent: 18h  (was: 17h 50m)

> A persistent property store 
> 
>
> Key: HIVE-27186
> URL: https://issues.apache.org/jira/browse/HIVE-27186
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Affects Versions: 4.0.0-alpha-2
>Reporter: Henri Biestro
>Assignee: Henri Biestro
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 18h
>  Remaining Estimate: 0h
>
> WHAT
> A persistent property store usable as a support facility for any metadata 
> augmentation feature.
> WHY
> When adding new meta-data oriented features, we usually need to persist 
> information linking the feature data and the HiveMetaStore objects it applies 
> 

[jira] [Work logged] (HIVE-27163) Column stats are not getting published after an insert query into an external table with custom location

2023-05-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27163?focusedWorklogId=861016&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-861016
 ]

ASF GitHub Bot logged work on HIVE-27163:
-

Author: ASF GitHub Bot
Created on: 08/May/23 14:10
Start Date: 08/May/23 14:10
Worklog Time Spent: 10m 
  Work Description: sonarcloud[bot] commented on PR #4228:
URL: https://github.com/apache/hive/pull/4228#issuecomment-1538428179

   Kudos, SonarCloud Quality Gate passed!
   [Quality Gate passed](https://sonarcloud.io/dashboard?id=apache_hive&pullRequest=4228)
   
   0 Bugs, 0 Vulnerabilities, 0 Security Hotspots, 17 Code Smells
   No Coverage information, No Duplication information
   
   




Issue Time Tracking
---

Worklog Id: (was: 861016)
Time Spent: 4h 20m  (was: 4h 10m)

> Column stats are not getting published after an insert query into an external 
> table with custom location
> 
>
> Key: HIVE-27163
> URL: https://issues.apache.org/jira/browse/HIVE-27163
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Taraka Rama Rao Lethavadla
>Assignee: Zhihua Deng
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 4h 20m
>  Remaining Estimate: 0h
>
> Test case details are below
> *test.q*
> {noformat}
> set hive.stats.column.autogather=true;
> set hive.stats.autogather=true;
> dfs ${system:test.dfs.mkdir} ${system:test.tmp.dir}/test;
> create external table test_custom(age int, name string) stored as orc 
> location '/tmp/test';
> insert into test_custom select 1, 'test';
> desc formatted test_custom age;{noformat}
> *test.q.out*
>  
>  
> {noformat}
>  A masked pattern was here 
> PREHOOK: type: CREATETABLE
>  A masked pattern was here 
> PREHOOK: Output: database:default
> PREHOOK: Output: default@test_custom
>  A masked pattern was here 
> POSTHOOK: type: CREATETABLE
>  A masked pattern was here 
> POSTHOOK: Output: database:default
> POSTHOOK: Output: default@test_custom
> PREHOOK: query: insert into test_custom select 1, 'test'
> PREHOOK: type: QUERY
> 

[jira] [Work logged] (HIVE-27186) A persistent property store

2023-05-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27186?focusedWorklogId=861013&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-861013
 ]

ASF GitHub Bot logged work on HIVE-27186:
-

Author: ASF GitHub Bot
Created on: 08/May/23 13:52
Start Date: 08/May/23 13:52
Worklog Time Spent: 10m 
  Work Description: henrib commented on code in PR #4194:
URL: https://github.com/apache/hive/pull/4194#discussion_r1187474104


##
standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/properties/SerializationProxy.java:
##
@@ -0,0 +1,614 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.metastore.properties;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.ByteArrayInputStream;
+import java.io.ByteArrayOutputStream;
+import java.io.DataInput;
+import java.io.DataOutput;
+import java.io.Externalizable;
+import java.io.File;
+import java.io.IOException;
+import java.io.InputStream;
+import java.io.ObjectInput;
+import java.io.ObjectInputStream;
+import java.io.ObjectOutput;
+import java.io.ObjectOutputStream;
+import java.io.OutputStream;
+import java.io.Serializable;
+import java.lang.reflect.Constructor;
+import java.lang.reflect.Executable;
+import java.lang.reflect.InvocationTargetException;
+import java.lang.reflect.Method;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.concurrent.ConcurrentHashMap;
+import java.util.concurrent.ConcurrentMap;
+
+import static 
org.apache.hadoop.hive.metastore.properties.Serializer.SERIALIZER;
+
+/**
+ * The serialization proxy template.
+ * 
+ * This allows a class that defines final members to be made serializable in 
an easy way.
+ * The class must implement:
+ * 
+ * a constructor that takes a DataInput (or derived class) as 
parameter
+ * a write method that takes a DataOutput (or derived class) as 
parameter
+ * 
+ * 
+ *   One should consider the constructor as being potentially fed with an 
invalid stream so
+ *   all usual checks of a public constructor should apply.
+ * 
+ * Standard usage is to add the Serializable interface implementation through 
the following 2 methods:
+ * 
+ * private Object writeReplace() throws ObjectStreamException {
+ * return new SerializationProxy<TheClass>(this);
+ * }
+ * private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
+ * throw new InvalidObjectException("proxy required");
+ * }
+ * 
+ * @param <T> the serializable object type
+ */
+public class SerializationProxy<T> implements Externalizable {
+  /** Serial version. */
+  private static final long serialVersionUID = 202212281757L;
+  /** The logger. */
+  public static final Logger LOGGER = 
LoggerFactory.getLogger(SerializationProxy.class);
+  /** The map of class names to types. */
+  private static final ConcurrentMap<String, Type<?>> TYPES = new ConcurrentHashMap<>();
+  /** The list of registered pre-defined classes. */
+  private static final List<Class<?>> REGISTERED = new ArrayList<>();
+  /** A thread local context used for arguments passing during serialization/de-serialization. */
+  private static final ThreadLocal<Object[]> EXTRA_ARGUMENTS = new ThreadLocal<>();
+
+  /** The type of instance being read or written. */
+  private transient Type<T> type = null;
+  /** The instance being read or written. */
+  private transient T proxied = null;
+
+  /**
+   * Wraps any error that may occur whilst using reflective calls.
+   */
+  public static class ProxyException extends RuntimeException {
+public ProxyException(Throwable cause) {
+  super(cause);
+}
+
+public ProxyException(String msg) {
+  super(msg);
+}
+
+/**
+ * Convert an exception to a ProxyException.
+ * @param cause the exception to convert
+ * @return the wrapping ProxyException
+ */
+public static ProxyException convert(Throwable cause) {
+  if (cause instanceof ProxyException) {
+return (ProxyException) cause;
+  } else {
+return 
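
To make the proxy contract described in the javadoc concrete, a hypothetical user class could look like this (the class name and fields are invented; the SerializationProxy constructor usage follows the javadoc snippet above):

{code:java}
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.io.InvalidObjectException;
import java.io.ObjectInputStream;
import java.io.ObjectStreamException;
import java.io.Serializable;

import org.apache.hadoop.hive.metastore.properties.SerializationProxy;

public final class Coordinates implements Serializable {
  private final double lat;
  private final double lon;

  public Coordinates(double lat, double lon) {
    this.lat = lat;
    this.lon = lon;
  }

  // Contract: deserialization constructor fed by the proxy; validate inputs
  // as if this were a public constructor reading an untrusted stream.
  public Coordinates(DataInput in) throws IOException {
    this.lat = in.readDouble();
    this.lon = in.readDouble();
  }

  // Contract: serialization method used by the proxy.
  public void write(DataOutput out) throws IOException {
    out.writeDouble(lat);
    out.writeDouble(lon);
  }

  private Object writeReplace() throws ObjectStreamException {
    return new SerializationProxy<Coordinates>(this);
  }

  private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
    throw new InvalidObjectException("proxy required");
  }
}
{code}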

[jira] [Work logged] (HIVE-27138) MapJoinOperator throws NPE when computing OuterJoin with filter expressions on small table

2023-05-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27138?focusedWorklogId=861012&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-861012
 ]

ASF GitHub Bot logged work on HIVE-27138:
-

Author: ASF GitHub Bot
Created on: 08/May/23 13:50
Start Date: 08/May/23 13:50
Worklog Time Spent: 10m 
  Work Description: nrg4878 commented on PR #4115:
URL: https://github.com/apache/hive/pull/4115#issuecomment-1538394166

   @zabetak @abstractdog Can you please review these if you have a chance? 
Thank you




Issue Time Tracking
---

Worklog Id: (was: 861012)
Time Spent: 1h  (was: 50m)

> MapJoinOperator throws NPE when computing OuterJoin with filter expressions 
> on small table
> --
>
> Key: HIVE-27138
> URL: https://issues.apache.org/jira/browse/HIVE-27138
> Project: Hive
>  Issue Type: Bug
>Reporter: Seonggon Namgung
>Assignee: Seonggon Namgung
>Priority: Blocker
>  Labels: hive-4.0.0-must, pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Hive throws an NPE when running mapjoin_filter_on_outerjoin.q using the Tez 
> engine. (I used TestMiniLlapCliDriver.)
> The NPE is thrown by CommonJoinOperator.getFilterTag(), which just retrieves 
> the last object from the given list.
> To the best of my knowledge, if Hive selects MapJoin to perform the Join 
> operation, the filterTag should be computed and appended to a row before the 
> row is passed to MapJoinOperator.
> In the case of the MapReduce engine, this is done by HashTableSinkOperator.
> However, I cannot find any logic preparing the filterTag for small tables 
> when Hive uses the Tez engine.
> I think there are 2 available options:
> 1. Don't use MapJoinOperator if a small table has a filter expression.
> 2. Add new logic that computes and passes the filterTag to MapJoinOperator.
> I am working on the second option and am ready to discuss it.
> I would be grateful if you could give any opinion on this issue.
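
A schematic of the mechanism under discussion, in plain Java rather than Hive's actual classes: the filter tag travels as the last element of the row, so a small-table row that never had a tag appended fails when it is read back:

{code:java}
import java.util.ArrayList;
import java.util.List;

public final class FilterTagSketch {
  // Schematic of the getFilterTag idea: read back the tag stored as the
  // last element of the row.
  static short getFilterTag(List<Object> row) {
    // Throws NullPointerException when the small-table path never
    // appended a tag and the last element is null.
    return ((Short) row.get(row.size() - 1)).shortValue();
  }

  // Option 2 from the description: compute the tag and append it
  // before the row reaches MapJoinOperator.
  static List<Object> appendFilterTag(List<Object> row, short filterTag) {
    List<Object> tagged = new ArrayList<>(row);
    tagged.add(filterTag);  // autoboxed to Short
    return tagged;
  }
}
{code}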



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-27138) MapJoinOperator throws NPE when computing OuterJoin with filter expressions on small table

2023-05-08 Thread Naveen Gangam (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naveen Gangam updated HIVE-27138:
-
Priority: Blocker  (was: Major)

> MapJoinOperator throws NPE when computing OuterJoin with filter expressions 
> on small table
> --
>
> Key: HIVE-27138
> URL: https://issues.apache.org/jira/browse/HIVE-27138
> Project: Hive
>  Issue Type: Bug
>Reporter: Seonggon Namgung
>Assignee: Seonggon Namgung
>Priority: Blocker
>  Labels: hive-4.0.0-must, pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Hive throws an NPE when running mapjoin_filter_on_outerjoin.q using the Tez 
> engine. (I used TestMiniLlapCliDriver.)
> The NPE is thrown by CommonJoinOperator.getFilterTag(), which just retrieves 
> the last object from the given list.
> To the best of my knowledge, if Hive selects MapJoin to perform the Join 
> operation, the filterTag should be computed and appended to a row before the 
> row is passed to MapJoinOperator.
> In the case of the MapReduce engine, this is done by HashTableSinkOperator.
> However, I cannot find any logic preparing the filterTag for small tables 
> when Hive uses the Tez engine.
> I think there are 2 available options:
> 1. Don't use MapJoinOperator if a small table has a filter expression.
> 2. Add new logic that computes and passes the filterTag to MapJoinOperator.
> I am working on the second option and am ready to discuss it.
> I would be grateful if you could give any opinion on this issue.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-27186) A persistent property store

2023-05-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27186?focusedWorklogId=861011&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-861011
 ]

ASF GitHub Bot logged work on HIVE-27186:
-

Author: ASF GitHub Bot
Created on: 08/May/23 13:45
Start Date: 08/May/23 13:45
Worklog Time Spent: 10m 
  Work Description: dengzhhu653 commented on code in PR #4194:
URL: https://github.com/apache/hive/pull/4194#discussion_r1187466903


##
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/PropertyServlet.java:
##
@@ -0,0 +1,307 @@
+/* * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.metastore;
+
+import com.google.gson.Gson;
+import com.google.gson.JsonIOException;
+import com.google.gson.JsonSyntaxException;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.hive.metastore.api.MetaException;
+import org.apache.hadoop.hive.metastore.api.NoSuchObjectException;
+import org.apache.hadoop.hive.metastore.conf.MetastoreConf;
+import org.apache.hadoop.hive.metastore.properties.PropertyException;
+import org.apache.hadoop.hive.metastore.properties.PropertyManager;
+import org.apache.hadoop.hive.metastore.properties.PropertyMap;
+import org.apache.hadoop.hive.metastore.properties.PropertyStore;
+import org.apache.hadoop.hive.metastore.utils.MetaStoreUtils;
+import org.apache.hadoop.security.SecurityUtil;
+import org.apache.hadoop.security.UserGroupInformation;
+import org.eclipse.jetty.server.HttpConfiguration;
+import org.eclipse.jetty.server.HttpConnectionFactory;
+import org.eclipse.jetty.server.Server;
+import org.eclipse.jetty.server.ServerConnector;
+import org.eclipse.jetty.servlet.ServletContextHandler;
+import org.eclipse.jetty.servlet.ServletHandler;
+import org.eclipse.jetty.servlet.ServletHolder;
+import org.eclipse.jetty.servlet.Source;
+import org.eclipse.jetty.util.ssl.SslContextFactory;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import javax.servlet.ServletException;
+import javax.servlet.ServletInputStream;
+import javax.servlet.ServletOutputStream;
+import javax.servlet.http.HttpServlet;
+import javax.servlet.http.HttpServletRequest;
+import javax.servlet.http.HttpServletResponse;
+import javax.servlet.http.Part;
+import java.io.BufferedReader;
+import java.io.DataOutputStream;
+import java.io.IOException;
+import java.io.InputStream;
+import java.io.InputStreamReader;
+import java.io.PrintWriter;
+import java.io.Reader;
+import java.net.HttpURLConnection;
+import java.net.URL;
+import java.nio.charset.StandardCharsets;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.List;
+import java.util.Map;
+import java.util.TreeMap;
+
+/**
+ * The property CLI servlet.
+ */
+public class PropertyServlet extends HttpServlet {
+  /** The logger. */
+  public static final Logger LOGGER = 
LoggerFactory.getLogger(PropertyServlet.class);
+  /** The object store. */
+  private final RawStore objectStore;
+  /** The security. */
+  private final ServletSecurity security;
+
+  PropertyServlet(Configuration configuration, RawStore store) {

Review Comment:
   Why not use HTTP over Thrift directly? We can start the HMS easily by 
altering a property; see, for example:
   `TestRemoteHiveHttpMetaStore`
   
   





Issue Time Tracking
---

Worklog Id: (was: 861011)
Time Spent: 17h 40m  (was: 17.5h)

> A persistent property store 
> 
>
> Key: HIVE-27186
> URL: https://issues.apache.org/jira/browse/HIVE-27186
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Affects Versions: 4.0.0-alpha-2
>Reporter: Henri Biestro
>Assignee: Henri Biestro
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 17h 40m
>  Remaining Estimate: 0h
>
> WHAT
> A persistent property store usable as a support facility for any metadata 
> augmentation feature.
> WHY
> When adding new meta-data oriented features, we usually need to persist 
> 

[jira] [Updated] (HIVE-27269) VectorizedMapJoin returns wrong result for TPC-DS query 97

2023-05-08 Thread Naveen Gangam (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naveen Gangam updated HIVE-27269:
-
Labels: hive-4.0.0-must  (was: )

> VectorizedMapJoin returns wrong result for TPC-DS query 97
> --
>
> Key: HIVE-27269
> URL: https://issues.apache.org/jira/browse/HIVE-27269
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Seonggon Namgung
>Priority: Major
>  Labels: hive-4.0.0-must
>
> TPC-DS query 97 returns wrong results when hive.auto.convert.join and 
> hive.vectorized.execution.enabled are set to true.
>  
> Result of query 97 on 1TB text dataset:
> CommonMergeJoinOperator(hive.auto.convert.join=false): 534151529, 
> 284185746, 84163
> MapJoinOperator(hive.auto.convert.join=true, 
> hive.vectorized.execution.enabled=false): 534151529, 284185746, 84163
> VectorMapJoinOperator(hive.auto.convert.join=true, 
> hive.vectorized.execution.enabled=true): 534151529, 284185388, 84163
>  
> I also observed that VectorizedMapJoin returned different results for the 
> 100GB dataset when I ran query 97 twice, but I have not been able to 
> reproduce it since.
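
As a reading aid, the two switches named above can be toggled programmatically
in a test. A hedged sketch: the ConfVars constant names are assumptions, while
the string keys are the ones cited in the report.

{code}
import org.apache.hadoop.hive.conf.HiveConf;

final class Query97ConfigSketch {
  // Enables the configuration under which the wrong results were observed.
  static void enableSuspectPath(HiveConf conf) {
    conf.setBoolVar(HiveConf.ConfVars.HIVECONVERTJOIN, true);            // hive.auto.convert.join
    conf.setBoolVar(HiveConf.ConfVars.HIVE_VECTORIZATION_ENABLED, true); // hive.vectorized.execution.enabled
  }
}
{code}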



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-27269) VectorizedMapJoin returns wrong result for TPC-DS query 97

2023-05-08 Thread Naveen Gangam (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naveen Gangam updated HIVE-27269:
-
Priority: Blocker  (was: Major)

> VectorizedMapJoin returns wrong result for TPC-DS query 97
> --
>
> Key: HIVE-27269
> URL: https://issues.apache.org/jira/browse/HIVE-27269
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Seonggon Namgung
>Priority: Blocker
>  Labels: hive-4.0.0-must
>
> TPC-DS query 97 returns wrong results when hive.auto.convert.join and 
> hive.vectorized.execution.enabled are set to true.
>  
> Result of query 97 on 1TB text dataset:
> CommonMergeJoinOperator(hive.auto.convert.join=false): 534151529, 
> 284185746, 84163
> MapJoinOperator(hive.auto.convert.join=true, 
> hive.vectorized.execution.enabled=false): 534151529, 284185746, 84163
> VectorMapJoinOperator(hive.auto.convert.join=true, 
> hive.vectorized.execution.enabled=true): 534151529, 284185388, 84163
>  
> I also observed that VectorizedMapJoin returned different results for the 
> 100GB dataset when I ran query 97 twice, but I have not been able to 
> reproduce it since.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-27186) A persistent property store

2023-05-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27186?focusedWorklogId=861010=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-861010
 ]

ASF GitHub Bot logged work on HIVE-27186:
-

Author: ASF GitHub Bot
Created on: 08/May/23 13:19
Start Date: 08/May/23 13:19
Worklog Time Spent: 10m 
  Work Description: henrib commented on code in PR #4194:
URL: https://github.com/apache/hive/pull/4194#discussion_r1187438016


##
standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/properties/PropertyManager.java:
##
@@ -0,0 +1,629 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.metastore.properties;
+
+
+import org.apache.commons.jexl3.JexlBuilder;
+import org.apache.commons.jexl3.JexlContext;
+import org.apache.commons.jexl3.JexlEngine;
+import org.apache.commons.jexl3.JexlException;
+import org.apache.commons.jexl3.JexlExpression;
+import org.apache.commons.jexl3.JexlFeatures;
+import org.apache.commons.jexl3.JexlScript;
+import org.apache.commons.jexl3.ObjectContext;
+import org.apache.commons.jexl3.introspection.JexlPermissions;
+import org.apache.hadoop.hive.metastore.api.MetaException;
+import org.apache.hadoop.hive.metastore.api.NoSuchObjectException;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.lang.reflect.Constructor;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import java.util.Properties;
+import java.util.TreeMap;
+import java.util.UUID;
+import java.util.function.Function;
+
+/**
+ * A property manager.
+ * 
+ * This handles operations at the higher functional level; an instance is 
created per-session and
+ * drives queries and updates in a transactional manner.
+ * 
+ * 
+ * The manager ties the property schemas into one namespace; all property maps 
it handles must and will use
+ * one of its known schemas.
+ * 
+ * The manager class needs to be registered with its namespace as key
+ * 
+ *   Since a collection of properties is stored in a map, to avoid hitting 
the persistence store for each update
+ *   - which would mean rewriting the map multiple times - the manager keeps 
track of dirty maps whilst
+ *   serving as transaction manager. This way, when importing multiple 
properties targeting different elements (think
+ *   setting properties for different tables), each impacted map is only 
rewritten
+ *   once by the persistence layer during commit. This also allows multiple 
calls to participate in one transaction.
+ * 
+ */
+public abstract class PropertyManager {
+  /** The logger. */
+  public static final Logger LOGGER = 
LoggerFactory.getLogger(PropertyManager.class);
+  /** The set of dirty maps. */
+  protected final Map<String, PropertyMap> dirtyMaps = new HashMap<>();
+  /** This manager namespace. */
+  protected final String namespace;
+  /** The property map store. */
+  protected final PropertyStore store;
+  /** A Jexl engine for convenience. */
+  static final JexlEngine JEXL;
+  static {
+JexlFeatures features = new JexlFeatures()
+.sideEffect(false)
+.sideEffectGlobal(false);
+JexlPermissions p = JexlPermissions.RESTRICTED
+.compose("org.apache.hadoop.hive.metastore.properties.*");
+JEXL = new JexlBuilder()
+.features(features)
+.permissions(p)
+.create();
+  }
+
+  /**
+   * The map of defined managers.
+   */
+  private static final Map<String, Constructor<? extends PropertyManager>> 
NSMANAGERS = new HashMap<>();
+
+  /**
+   * Declares a property manager class.
+   * @param ns the namespace
+   * @param pmClazz the property manager class
+   */
+  public static boolean declare(String ns, Class<? extends PropertyManager> 
pmClazz) {
+try {
+  synchronized(NSMANAGERS) {
+Constructor<? extends PropertyManager> ctor = NSMANAGERS.get(ns);
+if (ctor == null) {
+  ctor = pmClazz.getConstructor(String.class, PropertyStore.class);
+  NSMANAGERS.put(ns, ctor);
+  return true;
+} else {
+  if (!Objects.equals(ctor.getDeclaringClass(), pmClazz)) {
+LOGGER.error("namespace {} 
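
As a reading aid for the javadoc quoted above, a minimal sketch of a
copy-on-write map gated by a dirty flag. It is illustrative only, not Hive's
PropertyMap; all names are made up.

{code}
import java.util.Map;
import java.util.TreeMap;

final class CowPropertyMap {
  private Map<String, Object> inner; // possibly shared with other readers
  private boolean dirty;             // set on first write; gates the copy

  CowPropertyMap(Map<String, Object> shared) {
    this.inner = shared;
    this.dirty = false;
  }

  Object get(String key) {
    return inner.get(key);
  }

  void put(String key, Object value) {
    if (!dirty) {
      inner = new TreeMap<>(inner); // copy once; a dirty map is never re-copied
      dirty = true;
    }
    inner.put(key, value);
  }

  boolean isDirty() {
    return dirty; // what a transaction manager would consult at commit time
  }
}
{code}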

[jira] [Work logged] (HIVE-27186) A persistent property store

2023-05-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27186?focusedWorklogId=861009=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-861009
 ]

ASF GitHub Bot logged work on HIVE-27186:
-

Author: ASF GitHub Bot
Created on: 08/May/23 13:17
Start Date: 08/May/23 13:17
Worklog Time Spent: 10m 
  Work Description: henrib commented on code in PR #4194:
URL: https://github.com/apache/hive/pull/4194#discussion_r1187435115


##
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/PropertyServlet.java:
##
@@ -0,0 +1,307 @@
+/* * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.metastore;
+
+import com.google.gson.Gson;
+import com.google.gson.JsonIOException;
+import com.google.gson.JsonSyntaxException;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.hive.metastore.api.MetaException;
+import org.apache.hadoop.hive.metastore.api.NoSuchObjectException;
+import org.apache.hadoop.hive.metastore.conf.MetastoreConf;
+import org.apache.hadoop.hive.metastore.properties.PropertyException;
+import org.apache.hadoop.hive.metastore.properties.PropertyManager;
+import org.apache.hadoop.hive.metastore.properties.PropertyMap;
+import org.apache.hadoop.hive.metastore.properties.PropertyStore;
+import org.apache.hadoop.hive.metastore.utils.MetaStoreUtils;
+import org.apache.hadoop.security.SecurityUtil;
+import org.apache.hadoop.security.UserGroupInformation;
+import org.eclipse.jetty.server.HttpConfiguration;
+import org.eclipse.jetty.server.HttpConnectionFactory;
+import org.eclipse.jetty.server.Server;
+import org.eclipse.jetty.server.ServerConnector;
+import org.eclipse.jetty.servlet.ServletContextHandler;
+import org.eclipse.jetty.servlet.ServletHandler;
+import org.eclipse.jetty.servlet.ServletHolder;
+import org.eclipse.jetty.servlet.Source;
+import org.eclipse.jetty.util.ssl.SslContextFactory;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import javax.servlet.ServletException;
+import javax.servlet.ServletInputStream;
+import javax.servlet.ServletOutputStream;
+import javax.servlet.http.HttpServlet;
+import javax.servlet.http.HttpServletRequest;
+import javax.servlet.http.HttpServletResponse;
+import javax.servlet.http.Part;
+import java.io.BufferedReader;
+import java.io.DataOutputStream;
+import java.io.IOException;
+import java.io.InputStream;
+import java.io.InputStreamReader;
+import java.io.PrintWriter;
+import java.io.Reader;
+import java.net.HttpURLConnection;
+import java.net.URL;
+import java.nio.charset.StandardCharsets;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.List;
+import java.util.Map;
+import java.util.TreeMap;
+
+/**
+ * The property CLI servlet.
+ */
+public class PropertyServlet extends HttpServlet {
+  /** The logger. */
+  public static final Logger LOGGER = 
LoggerFactory.getLogger(PropertyServlet.class);
+  /** The object store. */
+  private final RawStore objectStore;
+  /** The security. */
+  private final ServletSecurity security;
+
+  PropertyServlet(Configuration configuration, RawStore store) {

Review Comment:
   This is not the same as Thrift over HTTP since this does *not* use Thrift; 
the messages used by this servlet are JSON encoded.
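
To illustrate the distinction, a minimal sketch of JSON encoding with the Gson
dependency the servlet imports. The payload fields are hypothetical; the
actual request/response shapes of PropertyServlet are not shown in this
excerpt.

{code}
import java.util.Map;
import java.util.TreeMap;

import com.google.gson.Gson;

final class JsonMessageSketch {
  public static void main(String[] args) {
    Gson gson = new Gson();
    Map<String, Object> request = new TreeMap<>();
    request.put("method", "selectProperties"); // hypothetical field
    request.put("prefix", "db0.relation0");    // hypothetical field
    String payload = gson.toJson(request);     // what travels over HTTP(S)
    Map<?, ?> decoded = gson.fromJson(payload, Map.class); // server-side decode
    System.out.println(payload + " -> " + decoded);
  }
}
{code}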





Issue Time Tracking
---

Worklog Id: (was: 861009)
Time Spent: 17h 20m  (was: 17h 10m)

> A persistent property store 
> 
>
> Key: HIVE-27186
> URL: https://issues.apache.org/jira/browse/HIVE-27186
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Affects Versions: 4.0.0-alpha-2
>Reporter: Henri Biestro
>Assignee: Henri Biestro
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 17h 20m
>  Remaining Estimate: 0h
>
> WHAT
> A persistent property store usable as a support facility for any metadata 
> augmentation feature.
> WHY
> When adding new meta-data oriented features, we usually need to persist 
> information linking the 

[jira] [Work logged] (HIVE-27186) A persistent property store

2023-05-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27186?focusedWorklogId=861008=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-861008
 ]

ASF GitHub Bot logged work on HIVE-27186:
-

Author: ASF GitHub Bot
Created on: 08/May/23 13:15
Start Date: 08/May/23 13:15
Worklog Time Spent: 10m 
  Work Description: henrib commented on code in PR #4194:
URL: https://github.com/apache/hive/pull/4194#discussion_r1187433359


##
standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/properties/PropertyMap.java:
##
@@ -0,0 +1,466 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.metastore.properties;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.DataInput;
+import java.io.DataOutput;
+import java.io.IOException;
+import java.io.InvalidObjectException;
+import java.io.ObjectInputStream;
+import java.io.ObjectStreamException;
+import java.io.Serializable;
+import java.util.Collections;
+import java.util.Map;
+import java.util.Objects;
+import java.util.Properties;
+import java.util.TreeMap;
+import java.util.UUID;
+import java.util.function.BiConsumer;
+import java.util.function.Function;
+
+/**
+ * A property map pertaining to a given object type (cluster, database, table).
+ * 
+ *   Maps follow a copy-on-write scheme gated by a dirty flag (to avoid 
copying an already-dirty map). This allows
+ *   sharing their content (the inner properties map) with guaranteed 
isolation and thread safety.
+ * 
+ */
+public class PropertyMap implements Serializable {

Review Comment:
   The idea behind the serialization proxy is that of a convention; the 
'contract' does not need to be made explicit through inheritance. Adding an 
abstract class for one use case that is already provably following a convention 
does not bring any clarity imho.





Issue Time Tracking
---

Worklog Id: (was: 861008)
Time Spent: 17h 10m  (was: 17h)

> A persistent property store 
> 
>
> Key: HIVE-27186
> URL: https://issues.apache.org/jira/browse/HIVE-27186
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Affects Versions: 4.0.0-alpha-2
>Reporter: Henri Biestro
>Assignee: Henri Biestro
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 17h 10m
>  Remaining Estimate: 0h
>
> WHAT
> A persistent property store usable as a support facility for any metadata 
> augmentation feature.
> WHY
> When adding new meta-data oriented features, we usually need to persist 
> information linking the feature data and the HiveMetaStore objects it applies 
> to. Any information related to a database, a table or the cluster - like 
> statistics for example or any operational data state or data (think rolling 
> backup) - falls into this use case.
> Typically, accommodating such a feature requires modifying the Metastore 
> database schema by adding or altering a table. It also usually implies 
> modifying the thrift APIs to expose such meta-data to consumers.
> The proposed feature wants to solve the persistence and query/transport for 
> these types of use-cases by exposing a 'key/(meta)value' store exposed as a 
> property system.
> HOW
> A property-value model is the simple and generic exposed API.
> To provision for several usage scenarios, the model entry point is a 
> 'namespace' that qualifies the feature-component property manager. For 
> example, 'stats' could be the namespace for all properties related to the 
> 'statistics' feature.
> The namespace identifies a manager that handles property-groups persisted as 
> property-maps. For instance, all statistics pertaining to a given table would 
> be collocated in the same property-group. As such, all properties (say number 
> of 'unique_values' per columns) for a given HMS table 'relation0' would all 
> be stored and persisted in the same property-map instance.
> Property-maps may be decorated by an (optional) schema that may declare the 
> 

[jira] [Work logged] (HIVE-27186) A persistent property store

2023-05-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27186?focusedWorklogId=861006=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-861006
 ]

ASF GitHub Bot logged work on HIVE-27186:
-

Author: ASF GitHub Bot
Created on: 08/May/23 12:43
Start Date: 08/May/23 12:43
Worklog Time Spent: 10m 
  Work Description: dengzhhu653 commented on code in PR #4194:
URL: https://github.com/apache/hive/pull/4194#discussion_r1187402296


##
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/RawStore.java:
##
@@ -113,7 +113,9 @@
 import org.apache.hadoop.hive.metastore.api.WMTrigger;
 import org.apache.hadoop.hive.metastore.api.WMValidateResourcePlanResponse;
 import org.apache.hadoop.hive.metastore.api.WriteEventInfo;
+import org.apache.hadoop.hive.metastore.model.MMetastoreDBProperties;

Review Comment:
   nit: unused import





Issue Time Tracking
---

Worklog Id: (was: 861006)
Time Spent: 17h  (was: 16h 50m)

> A persistent property store 
> 
>
> Key: HIVE-27186
> URL: https://issues.apache.org/jira/browse/HIVE-27186
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Affects Versions: 4.0.0-alpha-2
>Reporter: Henri Biestro
>Assignee: Henri Biestro
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 17h
>  Remaining Estimate: 0h
>
> WHAT
> A persistent property store usable as a support facility for any metadata 
> augmentation feature.
> WHY
> When adding new meta-data oriented features, we usually need to persist 
> information linking the feature data and the HiveMetaStore objects it applies 
> to. Any information related to a database, a table or the cluster - like 
> statistics for example or any operational data state or data (think rolling 
> backup) - falls into this use case.
> Typically, accommodating such a feature requires modifying the Metastore 
> database schema by adding or altering a table. It also usually implies 
> modifying the thrift APIs to expose such meta-data to consumers.
> The proposed feature wants to solve the persistence and query/transport for 
> these types of use-cases by exposing a 'key/(meta)value' store exposed as a 
> property system.
> HOW
> A property-value model is the simple and generic exposed API.
> To provision for several usage scenarios, the model entry point is a 
> 'namespace' that qualifies the feature-component property manager. For 
> example, 'stats' could be the namespace for all properties related to the 
> 'statistics' feature.
> The namespace identifies a manager that handles property-groups persisted as 
> property-maps. For instance, all statistics pertaining to a given table would 
> be collocated in the same property-group. As such, all properties (say number 
> of 'unique_values' per columns) for a given HMS table 'relation0' would all 
> be stored and persisted in the same property-map instance.
> Property-maps may be decorated by an (optional) schema that may declare the 
> name and value-type of allowed properties (and their optional default value). 
> Each property is addressed by a name, a path uniquely identifying the 
> property in a given property map.
> The manager also handles transforming property-map names to the property-map 
> keys used to persist them in the DB.
> The API provides inserting/updating properties in bulk transactionally. It 
> also provides selection/projection to help reduce the volume of exchange 
> between client/server; selection can use (JEXL expression) predicates to 
> filter maps.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-27186) A persistent property store

2023-05-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27186?focusedWorklogId=861005=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-861005
 ]

ASF GitHub Bot logged work on HIVE-27186:
-

Author: ASF GitHub Bot
Created on: 08/May/23 12:40
Start Date: 08/May/23 12:40
Worklog Time Spent: 10m 
  Work Description: dengzhhu653 commented on code in PR #4194:
URL: https://github.com/apache/hive/pull/4194#discussion_r1187398678


##
standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/properties/PropertyMap.java:
##
@@ -0,0 +1,466 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.metastore.properties;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.DataInput;
+import java.io.DataOutput;
+import java.io.IOException;
+import java.io.InvalidObjectException;
+import java.io.ObjectInputStream;
+import java.io.ObjectStreamException;
+import java.io.Serializable;
+import java.util.Collections;
+import java.util.Map;
+import java.util.Objects;
+import java.util.Properties;
+import java.util.TreeMap;
+import java.util.UUID;
+import java.util.function.BiConsumer;
+import java.util.function.Function;
+
+/**
+ * A property map pertaining to a given object type (cluster, database, table).
+ * 
+ *   Maps follow a copy-on-write scheme gated by a dirty flag (to avoid 
copying an already-dirty map). This allows
+ *   sharing their content (the inner properties map) with guaranteed 
isolation and thread safety.
+ * 
+ */
+public class PropertyMap implements Serializable {

Review Comment:
   I checked the SerializationProxy; it looks like we use something like 
`public Abc(DataInput input, Object... args)` to deserialize the object from a 
byte array, and `write(DataOutput out)` to serialize the instance. My idea is 
that we'd better have these two methods as a template for SerializationProxy.
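
A minimal sketch of the convention under discussion, reduced to one concrete
type. The `Point` and `Proxy` names are illustrative; only the
DataInput-constructor/write(DataOutput) pair and the writeReplace/readObject
idiom come from the thread.

{code}
import java.io.DataInput;
import java.io.DataOutput;
import java.io.Externalizable;
import java.io.IOException;
import java.io.InvalidObjectException;
import java.io.ObjectInput;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectStreamException;
import java.io.Serializable;

final class Point implements Serializable {
  private final int x, y;
  Point(int x, int y) { this.x = x; this.y = y; }

  // Convention, half 1: a constructor fed by a DataInput...
  Point(DataInput in) throws IOException { this(in.readInt(), in.readInt()); }
  // ...and a write method fed a DataOutput.
  void write(DataOutput out) throws IOException { out.writeInt(x); out.writeInt(y); }

  // Convention, half 2: route Java serialization through the proxy.
  private Object writeReplace() throws ObjectStreamException {
    return new Proxy(this);
  }
  private void readObject(ObjectInputStream in) throws IOException {
    throw new InvalidObjectException("proxy required");
  }

  // Stand-in for SerializationProxy, monomorphized to Point.
  static final class Proxy implements Externalizable {
    private Point proxied;
    public Proxy() { }          // public no-arg ctor required by Externalizable
    Proxy(Point p) { proxied = p; }
    @Override public void writeExternal(ObjectOutput out) throws IOException {
      proxied.write(out);       // ObjectOutput extends DataOutput
    }
    @Override public void readExternal(ObjectInput in) throws IOException {
      proxied = new Point(in);  // ObjectInput extends DataInput
    }
    private Object readResolve() throws ObjectStreamException {
      return proxied;           // hand the real object back to the stream
    }
  }
}
{code}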





Issue Time Tracking
---

Worklog Id: (was: 861005)
Time Spent: 16h 50m  (was: 16h 40m)

> A persistent property store 
> 
>
> Key: HIVE-27186
> URL: https://issues.apache.org/jira/browse/HIVE-27186
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Affects Versions: 4.0.0-alpha-2
>Reporter: Henri Biestro
>Assignee: Henri Biestro
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 16h 50m
>  Remaining Estimate: 0h
>
> WHAT
> A persistent property store usable as a support facility for any metadata 
> augmentation feature.
> WHY
> When adding new meta-data oriented features, we usually need to persist 
> information linking the feature data and the HiveMetaStore objects it applies 
> to. Any information related to a database, a table or the cluster - like 
> statistics for example or any operational data state or data (think rolling 
> backup) - falls into this use case.
> Typically, accommodating such a feature requires modifying the Metastore 
> database schema by adding or altering a table. It also usually implies 
> modifying the thrift APIs to expose such meta-data to consumers.
> The proposed feature wants to solve the persistence and query/transport for 
> these types of use-cases by exposing a 'key/(meta)value' store exposed as a 
> property system.
> HOW
> A property-value model is the simple and generic exposed API.
> To provision for several usage scenarios, the model entry point is a 
> 'namespace' that qualifies the feature-component property manager. For 
> example, 'stats' could be the namespace for all properties related to the 
> 'statistics' feature.
> The namespace identifies a manager that handles property-groups persisted as 
> property-maps. For instance, all statistics pertaining to a given table would 
> be collocated in the same property-group. As such, all properties (say number 
> of 'unique_values' per columns) for a given HMS table 'relation0' would all 
> be stored and persisted in the same property-map instance.
> Property-maps may be 

[jira] [Work logged] (HIVE-27186) A persistent property store

2023-05-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27186?focusedWorklogId=861004=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-861004
 ]

ASF GitHub Bot logged work on HIVE-27186:
-

Author: ASF GitHub Bot
Created on: 08/May/23 12:35
Start Date: 08/May/23 12:35
Worklog Time Spent: 10m 
  Work Description: dengzhhu653 commented on code in PR #4194:
URL: https://github.com/apache/hive/pull/4194#discussion_r1187394625


##
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/PropertyServlet.java:
##
@@ -0,0 +1,307 @@
+/* * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.metastore;
+
+import com.google.gson.Gson;
+import com.google.gson.JsonIOException;
+import com.google.gson.JsonSyntaxException;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.hive.metastore.api.MetaException;
+import org.apache.hadoop.hive.metastore.api.NoSuchObjectException;
+import org.apache.hadoop.hive.metastore.conf.MetastoreConf;
+import org.apache.hadoop.hive.metastore.properties.PropertyException;
+import org.apache.hadoop.hive.metastore.properties.PropertyManager;
+import org.apache.hadoop.hive.metastore.properties.PropertyMap;
+import org.apache.hadoop.hive.metastore.properties.PropertyStore;
+import org.apache.hadoop.hive.metastore.utils.MetaStoreUtils;
+import org.apache.hadoop.security.SecurityUtil;
+import org.apache.hadoop.security.UserGroupInformation;
+import org.eclipse.jetty.server.HttpConfiguration;
+import org.eclipse.jetty.server.HttpConnectionFactory;
+import org.eclipse.jetty.server.Server;
+import org.eclipse.jetty.server.ServerConnector;
+import org.eclipse.jetty.servlet.ServletContextHandler;
+import org.eclipse.jetty.servlet.ServletHandler;
+import org.eclipse.jetty.servlet.ServletHolder;
+import org.eclipse.jetty.servlet.Source;
+import org.eclipse.jetty.util.ssl.SslContextFactory;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import javax.servlet.ServletException;
+import javax.servlet.ServletInputStream;
+import javax.servlet.ServletOutputStream;
+import javax.servlet.http.HttpServlet;
+import javax.servlet.http.HttpServletRequest;
+import javax.servlet.http.HttpServletResponse;
+import javax.servlet.http.Part;
+import java.io.BufferedReader;
+import java.io.DataOutputStream;
+import java.io.IOException;
+import java.io.InputStream;
+import java.io.InputStreamReader;
+import java.io.PrintWriter;
+import java.io.Reader;
+import java.net.HttpURLConnection;
+import java.net.URL;
+import java.nio.charset.StandardCharsets;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.List;
+import java.util.Map;
+import java.util.TreeMap;
+
+/**
+ * The property CLI servlet.
+ */
+public class PropertyServlet extends HttpServlet {
+  /** The logger. */
+  public static final Logger LOGGER = 
LoggerFactory.getLogger(PropertyServlet.class);
+  /** The object store. */
+  private final RawStore objectStore;
+  /** The security. */
+  private final ServletSecurity security;
+
+  PropertyServlet(Configuration configuration, RawStore store) {

Review Comment:
   As you can see, we already support HTTP over Thrift in HMS. If this is an 
issue, can we track it in a separate Jira? Thanks!





Issue Time Tracking
---

Worklog Id: (was: 861004)
Time Spent: 16h 40m  (was: 16.5h)

> A persistent property store 
> 
>
> Key: HIVE-27186
> URL: https://issues.apache.org/jira/browse/HIVE-27186
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Affects Versions: 4.0.0-alpha-2
>Reporter: Henri Biestro
>Assignee: Henri Biestro
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 16h 40m
>  Remaining Estimate: 0h
>
> WHAT
> A persistent property store usable as a support facility for any metadata 
> augmentation feature.
> WHY
> When adding new meta-data oriented features, we usually need to persist 
> information linking 

[jira] [Work logged] (HIVE-27186) A persistent property store

2023-05-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27186?focusedWorklogId=861002=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-861002
 ]

ASF GitHub Bot logged work on HIVE-27186:
-

Author: ASF GitHub Bot
Created on: 08/May/23 11:58
Start Date: 08/May/23 11:58
Worklog Time Spent: 10m 
  Work Description: henrib commented on code in PR #4194:
URL: https://github.com/apache/hive/pull/4194#discussion_r1187361181


##
standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/properties/PropertyMap.java:
##
@@ -0,0 +1,466 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.metastore.properties;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.DataInput;
+import java.io.DataOutput;
+import java.io.IOException;
+import java.io.InvalidObjectException;
+import java.io.ObjectInputStream;
+import java.io.ObjectStreamException;
+import java.io.Serializable;
+import java.util.Collections;
+import java.util.Map;
+import java.util.Objects;
+import java.util.Properties;
+import java.util.TreeMap;
+import java.util.UUID;
+import java.util.function.BiConsumer;
+import java.util.function.Function;
+
+/**
+ * A property map pertaining to a given object type (cluster, database, table).
+ * 
+ *   Maps follow a copy-on-write scheme gated by a dirty flag (to avoid 
copying an already-dirty map). This allows
+ *   sharing their content (the inner properties map) with guaranteed 
isolation and thread safety.
+ * 
+ */
+public class PropertyMap implements Serializable {

Review Comment:
   We could, but this is not needed and would hide the convention, which does 
not require extending any class; one just implements the interface. And note 
that most serialization methods are private (`private Object writeReplace() 
throws ObjectStreamException`; `private void readObject(ObjectInputStream in) 
throws IOException, ClassNotFoundException`).





Issue Time Tracking
---

Worklog Id: (was: 861002)
Time Spent: 16.5h  (was: 16h 20m)

> A persistent property store 
> 
>
> Key: HIVE-27186
> URL: https://issues.apache.org/jira/browse/HIVE-27186
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Affects Versions: 4.0.0-alpha-2
>Reporter: Henri Biestro
>Assignee: Henri Biestro
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 16.5h
>  Remaining Estimate: 0h
>
> WHAT
> A persistent property store usable as a support facility for any metadata 
> augmentation feature.
> WHY
> When adding new meta-data oriented features, we usually need to persist 
> information linking the feature data and the HiveMetaStore objects it applies 
> to. Any information related to a database, a table or the cluster - like 
> statistics for example or any operational data state or data (think rolling 
> backup) - falls into this use case.
> Typically, accommodating such a feature requires modifying the Metastore 
> database schema by adding or altering a table. It also usually implies 
> modifying the thrift APIs to expose such meta-data to consumers.
> The proposed feature wants to solve the persistence and query/transport for 
> these types of use-cases by exposing a 'key/(meta)value' store exposed as a 
> property system.
> HOW
> A property-value model is the simple and generic exposed API.
> To provision for several usage scenarios, the model entry point is a 
> 'namespace' that qualifies the feature-component property manager. For 
> example, 'stats' could be the namespace for all properties related to the 
> 'statistics' feature.
> The namespace identifies a manager that handles property-groups persisted as 
> property-maps. For instance, all statistics pertaining to a given table would 
> be collocated in the same property-group. As such, all properties (say number 
> of 'unique_values' per columns) for a given HMS table 'relation0' would all 
> be stored and persisted in the same property-map 

[jira] [Work logged] (HIVE-27186) A persistent property store

2023-05-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27186?focusedWorklogId=860999=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-860999
 ]

ASF GitHub Bot logged work on HIVE-27186:
-

Author: ASF GitHub Bot
Created on: 08/May/23 11:29
Start Date: 08/May/23 11:29
Worklog Time Spent: 10m 
  Work Description: henrib commented on code in PR #4194:
URL: https://github.com/apache/hive/pull/4194#discussion_r1187336490


##
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/PropertyServlet.java:
##
@@ -0,0 +1,307 @@
+/* * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.metastore;
+
+import com.google.gson.Gson;
+import com.google.gson.JsonIOException;
+import com.google.gson.JsonSyntaxException;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.hive.metastore.api.MetaException;
+import org.apache.hadoop.hive.metastore.api.NoSuchObjectException;
+import org.apache.hadoop.hive.metastore.conf.MetastoreConf;
+import org.apache.hadoop.hive.metastore.properties.PropertyException;
+import org.apache.hadoop.hive.metastore.properties.PropertyManager;
+import org.apache.hadoop.hive.metastore.properties.PropertyMap;
+import org.apache.hadoop.hive.metastore.properties.PropertyStore;
+import org.apache.hadoop.hive.metastore.utils.MetaStoreUtils;
+import org.apache.hadoop.security.SecurityUtil;
+import org.apache.hadoop.security.UserGroupInformation;
+import org.eclipse.jetty.server.HttpConfiguration;
+import org.eclipse.jetty.server.HttpConnectionFactory;
+import org.eclipse.jetty.server.Server;
+import org.eclipse.jetty.server.ServerConnector;
+import org.eclipse.jetty.servlet.ServletContextHandler;
+import org.eclipse.jetty.servlet.ServletHandler;
+import org.eclipse.jetty.servlet.ServletHolder;
+import org.eclipse.jetty.servlet.Source;
+import org.eclipse.jetty.util.ssl.SslContextFactory;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import javax.servlet.ServletException;
+import javax.servlet.ServletInputStream;
+import javax.servlet.ServletOutputStream;
+import javax.servlet.http.HttpServlet;
+import javax.servlet.http.HttpServletRequest;
+import javax.servlet.http.HttpServletResponse;
+import javax.servlet.http.Part;
+import java.io.BufferedReader;
+import java.io.DataOutputStream;
+import java.io.IOException;
+import java.io.InputStream;
+import java.io.InputStreamReader;
+import java.io.PrintWriter;
+import java.io.Reader;
+import java.net.HttpURLConnection;
+import java.net.URL;
+import java.nio.charset.StandardCharsets;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.List;
+import java.util.Map;
+import java.util.TreeMap;
+
+/**
+ * The property CLI servlet.
+ */
+public class PropertyServlet extends HttpServlet {
+  /** The logger. */
+  public static final Logger LOGGER = 
LoggerFactory.getLogger(PropertyServlet.class);
+  /** The object store. */
+  private final RawStore objectStore;
+  /** The security. */
+  private final ServletSecurity security;
+
+  PropertyServlet(Configuration configuration, RawStore store) {

Review Comment:
   This is the servlet allowing the property features to be accessed through 
HTTPS.





Issue Time Tracking
---

Worklog Id: (was: 860999)
Time Spent: 16h 20m  (was: 16h 10m)

> A persistent property store 
> 
>
> Key: HIVE-27186
> URL: https://issues.apache.org/jira/browse/HIVE-27186
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Affects Versions: 4.0.0-alpha-2
>Reporter: Henri Biestro
>Assignee: Henri Biestro
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 16h 20m
>  Remaining Estimate: 0h
>
> WHAT
> A persistent property store usable as a support facility for any metadata 
> augmentation feature.
> WHY
> When adding new meta-data oriented features, we usually need to persist 
> information linking the feature data and the HiveMetaStore objects it 

[jira] [Resolved] (HIVE-27187) Incremental rebuild of materialized view having aggregate and stored by iceberg

2023-05-08 Thread Krisztian Kasa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Kasa resolved HIVE-27187.
---
Resolution: Fixed

> Incremental rebuild of materialized view having aggregate and stored by 
> iceberg
> ---
>
> Key: HIVE-27187
> URL: https://issues.apache.org/jira/browse/HIVE-27187
> Project: Hive
>  Issue Type: Improvement
>  Components: Iceberg integration, Materialized views
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 5h 40m
>  Remaining Estimate: 0h
>
> Currently, the incremental rebuild of a materialized view stored by Iceberg 
> whose definition query contains an aggregate operator is transformed into an 
> insert overwrite statement containing a union operator if the source tables 
> contain insert operations only. One branch of the union scans the view, the 
> other produces the delta.
> This can be improved further: transform the statement into a multi-insert 
> statement representing a merge statement that inserts new aggregations and 
> updates existing ones.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-27268) Hive.getPartitionsByNames should not enforce SessionState to be available

2023-05-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27268?focusedWorklogId=860996=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-860996
 ]

ASF GitHub Bot logged work on HIVE-27268:
-

Author: ASF GitHub Bot
Created on: 08/May/23 11:22
Start Date: 08/May/23 11:22
Worklog Time Spent: 10m 
  Work Description: henrib commented on code in PR #4241:
URL: https://github.com/apache/hive/pull/4241#discussion_r1187329741


##
ql/src/test/org/apache/hadoop/hive/ql/TestTxnNoBuckets.java:
##
@@ -909,5 +910,30 @@ public void testEmptyCompactionResult() throws Exception {
 Assert.assertEquals(stringifyValues(data), rs);
 
   }
+
+  /**
+   * HIVE-27268
+   */
+  @Test
+  public void testGetPartitionsNoSession() throws Exception {
+hiveConf.setIntVar(HiveConf.ConfVars.HIVEOPTSORTDYNAMICPARTITIONTHRESHOLD, 
-1);
+runStatementOnDriver("drop table if exists T");
+runStatementOnDriver("create table T(a int, b int) partitioned by (p int, 
q int) " +
+"stored as orc TBLPROPERTIES ('transactional'='true')");
+
+int[][] targetVals = {{4, 1, 1}, {4, 2, 2}, {4, 3, 1}, {4, 4, 2}};
+//we only recompute stats after major compact if they existed before
+runStatementOnDriver("insert into T partition(p=1,q) " + 
makeValuesClause(targetVals));
+runStatementOnDriver("analyze table T  partition(p=1) compute statistics 
for columns");
+
+Hive hive = Hive.get();
+org.apache.hadoop.hive.ql.metadata.Table hiveTable = hive.getTable("T");
+// this will ensure the getValidWriteIdList has no session to work with 
(thru getPartitions)
+SessionState.detachSession();
+List<org.apache.hadoop.hive.ql.metadata.Partition> partitions = 
hive.getPartitions(hiveTable);
+Assert.assertNotNull(partitions);
+// prevent tear down failure
+d = null;

Review Comment:
   This implies protecting against a null session in context.close(); done.
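
A hedged sketch of the guard this implies; the method body is hypothetical and
only shows the shape of the fix.

{code}
import org.apache.hadoop.hive.ql.session.SessionState;

final class CloseGuardSketch {
  void close() {
    SessionState ss = SessionState.get();
    if (ss == null) {
      return; // session was detached (e.g. SessionState.detachSession()); nothing to clean
    }
    // ... normal cleanup that dereferences ss ...
  }
}
{code}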





Issue Time Tracking
---

Worklog Id: (was: 860996)
Time Spent: 1.5h  (was: 1h 20m)

> Hive.getPartitionsByNames should not enforce SessionState to be available
> -
>
> Key: HIVE-27268
> URL: https://issues.apache.org/jira/browse/HIVE-27268
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 3.1.3
>Reporter: Henri Biestro
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> HIVE-24743 and HIVE-24392 enforce a check for a valid write ID list in 
> "Hive.getPartitionsByName".
> This breaks basic API integration: a user who needs basic partition details 
> is forced to have a SessionState.
> The request in this ticket is to ensure that if SessionState.get() is null, 
> an empty validWriteIdList is returned.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-27187) Incremental rebuild of materialized view having aggregate and stored by iceberg

2023-05-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27187?focusedWorklogId=860993=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-860993
 ]

ASF GitHub Bot logged work on HIVE-27187:
-

Author: ASF GitHub Bot
Created on: 08/May/23 11:21
Start Date: 08/May/23 11:21
Worklog Time Spent: 10m 
  Work Description: kasakrisz merged PR #4278:
URL: https://github.com/apache/hive/pull/4278




Issue Time Tracking
---

Worklog Id: (was: 860993)
Time Spent: 5h 40m  (was: 5.5h)

> Incremental rebuild of materialized view having aggregate and stored by 
> iceberg
> ---
>
> Key: HIVE-27187
> URL: https://issues.apache.org/jira/browse/HIVE-27187
> Project: Hive
>  Issue Type: Improvement
>  Components: Iceberg integration, Materialized views
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 5h 40m
>  Remaining Estimate: 0h
>
> Currently, the incremental rebuild of a materialized view stored by Iceberg 
> whose definition query contains an aggregate operator is transformed into an 
> insert overwrite statement containing a union operator if the source tables 
> contain insert operations only. One branch of the union scans the view, the 
> other produces the delta.
> This can be improved further: transform the statement into a multi-insert 
> statement representing a merge statement that inserts new aggregations and 
> updates existing ones.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-27187) Incremental rebuild of materialized view having aggregate and stored by iceberg

2023-05-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27187?focusedWorklogId=860991=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-860991
 ]

ASF GitHub Bot logged work on HIVE-27187:
-

Author: ASF GitHub Bot
Created on: 08/May/23 11:04
Start Date: 08/May/23 11:04
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on code in PR #4278:
URL: https://github.com/apache/hive/pull/4278#discussion_r1187315293


##
ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java:
##
@@ -3096,6 +3096,8 @@ Seems much cleaner if each stmt is identified as a 
particular HiveOperation (whi
 assert t != null;
 if (AcidUtils.isTransactionalTable(t) && sharedWrite) {
   compBuilder.setSharedWrite();
+} else if (MetaStoreUtils.isNonNativeTable(t.getTTable())) {
+  compBuilder.setLock(getLockTypeFromStorageHandler(output, t));

Review Comment:
    





Issue Time Tracking
---

Worklog Id: (was: 860991)
Time Spent: 5.5h  (was: 5h 20m)

> Incremental rebuild of materialized view having aggregate and stored by 
> iceberg
> ---
>
> Key: HIVE-27187
> URL: https://issues.apache.org/jira/browse/HIVE-27187
> Project: Hive
>  Issue Type: Improvement
>  Components: Iceberg integration, Materialized views
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 5.5h
>  Remaining Estimate: 0h
>
> Currently, the incremental rebuild of a materialized view stored by Iceberg 
> whose definition query contains an aggregate operator is transformed into an 
> insert overwrite statement containing a union operator if the source tables 
> contain insert operations only. One branch of the union scans the view, the 
> other produces the delta.
> This can be improved further: transform the statement into a multi-insert 
> statement representing a merge statement that inserts new aggregations and 
> updates existing ones.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-27186) A persistent property store

2023-05-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27186?focusedWorklogId=860988=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-860988
 ]

ASF GitHub Bot logged work on HIVE-27186:
-

Author: ASF GitHub Bot
Created on: 08/May/23 10:35
Start Date: 08/May/23 10:35
Worklog Time Spent: 10m 
  Work Description: dengzhhu653 commented on code in PR #4194:
URL: https://github.com/apache/hive/pull/4194#discussion_r1187292000


##
standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/properties/SerializationProxy.java:
##
@@ -0,0 +1,614 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.metastore.properties;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.ByteArrayInputStream;
+import java.io.ByteArrayOutputStream;
+import java.io.DataInput;
+import java.io.DataOutput;
+import java.io.Externalizable;
+import java.io.File;
+import java.io.IOException;
+import java.io.InputStream;
+import java.io.ObjectInput;
+import java.io.ObjectInputStream;
+import java.io.ObjectOutput;
+import java.io.ObjectOutputStream;
+import java.io.OutputStream;
+import java.io.Serializable;
+import java.lang.reflect.Constructor;
+import java.lang.reflect.Executable;
+import java.lang.reflect.InvocationTargetException;
+import java.lang.reflect.Method;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.concurrent.ConcurrentHashMap;
+import java.util.concurrent.ConcurrentMap;
+
+import static 
org.apache.hadoop.hive.metastore.properties.Serializer.SERIALIZER;
+
+/**
+ * The serialization proxy template.
+ * 
+ * This allows a class that defines final members to be made serializable in 
an easy way.
+ * The class must implement:
+ * 
+ * a constructor that takes a DataInput (or derived class) as 
parameter
+ * a write method that takes a DataOutput (or derived class) as 
parameter
+ * 
+ * 
+ *   One should consider the constructor as being potentially fed with an 
invalid stream so
+ *   all usual checks of a public constructor should apply.
+ * 
+ * Standard usage is to add the Serializable interface implementation through 
the following 2 methods:
+ * 
+ * private Object writeReplace() throws ObjectStreamException {
+ *   return new SerializationProxy<TheClass>(this);
+ * }
+ * private void readObject(ObjectInputStream in) throws IOException, 
ClassNotFoundException {
+ *   throw new InvalidObjectException("proxy required");
+ * }
+ * 
+ * @param <T> the serializable object type
+ */
+public class SerializationProxy<T extends Serializable> implements 
Externalizable {
+  /** Serial version. */
+  private static final long serialVersionUID = 202212281757L;
+  /** The logger. */
+  public static final Logger LOGGER = 
LoggerFactory.getLogger(SerializationProxy.class);
+  /** The map of class names to types. */
+  private static final ConcurrentMap> TYPES = new 
ConcurrentHashMap<>();
+  /** The list of registered pre-defined classes. */
+  private static final List> REGISTERED = new ArrayList<>();
+  /** A thread local context used for arguments passing during 
serialization/de-serialization. */
+  private static final ThreadLocal EXTRA_ARGUMENTS = new 
ThreadLocal<>();
+
+  /** The type of instance being read or written. */
+  private transient Type type = null;
+  /** The instance being read or written. */
+  private transient T proxied = null;
+
+  /**
+   * Wraps any error that may occur whilst using reflective calls.
+   */
+  public static class ProxyException extends RuntimeException {
+public ProxyException(Throwable cause) {
+  super(cause);
+}
+
+public ProxyException(String msg) {
+  super(msg);
+}
+
+/**
+ * Convert an exception to a ProxyException.
+ * @param cause the exception to convert
+ * @return the wrapping ProxyException
+ */
+public static ProxyException convert(Throwable cause) {
+  if (cause instanceof ProxyException) {
+return (ProxyException) cause;
+  } else {
+
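
To show the documented usage end to end, here is a minimal sketch of a class adopting the proxy (the `Named` class is invented for illustration; any registration the proxy may additionally require is omitted, and SerializationProxy is assumed to be on the classpath):

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.io.InvalidObjectException;
import java.io.ObjectInputStream;
import java.io.ObjectStreamException;
import java.io.Serializable;

// Invented example: a class with a final member made serializable through the proxy.
public class Named implements Serializable {
  private final String name;

  public Named(String name) {
    this.name = name;
  }

  // Read constructor used by the proxy; treat the stream as untrusted input.
  public Named(DataInput in) throws IOException {
    this(in.readUTF());
  }

  // Symmetric write method used by the proxy.
  public void write(DataOutput out) throws IOException {
    out.writeUTF(name);
  }

  // Route standard Java serialization through the proxy, as the javadoc describes.
  private Object writeReplace() throws ObjectStreamException {
    return new SerializationProxy<>(this);
  }

  private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
    throw new InvalidObjectException("proxy required");
  }
}
```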

[jira] [Resolved] (HIVE-23394) TestJdbcGenericUDTFGetSplits2#testGenericUDTFOrderBySplitCount1 is flaky

2023-05-08 Thread Simhadri Govindappa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simhadri Govindappa resolved HIVE-23394.

Resolution: Fixed

> TestJdbcGenericUDTFGetSplits2#testGenericUDTFOrderBySplitCount1 is flaky
> 
>
> Key: HIVE-23394
> URL: https://issues.apache.org/jira/browse/HIVE-23394
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Zoltan Haindrich
>Assignee: Simhadri Govindappa
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 4h 40m
>  Remaining Estimate: 0h
>
> Both 
> TestJdbcGenericUDTFGetSplits2.testGenericUDTFOrderBySplitCount1 and
> TestJdbcGenericUDTFGetSplits.testGenericUDTFOrderBySplitCount1
> can fail with the exception below;
> it seems like the connection was lost.
> {code}
> Error Message
> Failed to close statement
> Stacktrace
> java.sql.SQLException: Failed to close statement
>   at 
> org.apache.hive.jdbc.HiveStatement.closeStatementIfNeeded(HiveStatement.java:200)
>   at 
> org.apache.hive.jdbc.HiveStatement.closeClientOperation(HiveStatement.java:205)
>   at org.apache.hive.jdbc.HiveStatement.close(HiveStatement.java:222)
>   at 
> org.apache.hive.jdbc.AbstractTestJdbcGenericUDTFGetSplits.runQuery(AbstractTestJdbcGenericUDTFGetSplits.java:135)
>   at 
> org.apache.hive.jdbc.AbstractTestJdbcGenericUDTFGetSplits.testGenericUDTFOrderBySplitCount1(AbstractTestJdbcGenericUDTFGetSplits.java:164)
>   at 
> org.apache.hive.jdbc.TestJdbcGenericUDTFGetSplits2.testGenericUDTFOrderBySplitCount1(TestJdbcGenericUDTFGetSplits2.java:28)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
> Caused by: org.apache.thrift.TApplicationException: CloseOperation failed: 
> out of sequence response
>   at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:84)
>   at 
> org.apache.hive.service.rpc.thrift.TCLIService$Client.recv_CloseOperation(TCLIService.java:521)
>   at 
> org.apache.hive.service.rpc.thrift.TCLIService$Client.CloseOperation(TCLIService.java:508)
>   at sun.reflect.GeneratedMethodAccessor40.invoke(Unknown Source)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.hive.jdbc.HiveConnection$SynchronizedHandler.invoke(HiveConnection.java:1732)
>   at com.sun.proxy.$Proxy146.CloseOperation(Unknown Source)
>   at 
> org.apache.hive.jdbc.HiveStatement.closeStatementIfNeeded(HiveStatement.java:193)
>   ... 14 more
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-23394) TestJdbcGenericUDTFGetSplits2#testGenericUDTFOrderBySplitCount1 is flaky

2023-05-08 Thread Simhadri Govindappa (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17720466#comment-17720466
 ] 

Simhadri Govindappa commented on HIVE-23394:


The change is merged to master. 



Thanks, [~dkuzmenko], [~ayushtkn], for the review!

> TestJdbcGenericUDTFGetSplits2#testGenericUDTFOrderBySplitCount1 is flaky
> 
>
> Key: HIVE-23394
> URL: https://issues.apache.org/jira/browse/HIVE-23394
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Zoltan Haindrich
>Assignee: Simhadri Govindappa
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 4h 40m
>  Remaining Estimate: 0h
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-27186) A persistent property store

2023-05-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27186?focusedWorklogId=860987=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-860987
 ]

ASF GitHub Bot logged work on HIVE-27186:
-

Author: ASF GitHub Bot
Created on: 08/May/23 10:29
Start Date: 08/May/23 10:29
Worklog Time Spent: 10m 
  Work Description: dengzhhu653 commented on code in PR #4194:
URL: https://github.com/apache/hive/pull/4194#discussion_r1187286512


##
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/PropertyServlet.java:
##
@@ -0,0 +1,307 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.metastore;
+
+import com.google.gson.Gson;
+import com.google.gson.JsonIOException;
+import com.google.gson.JsonSyntaxException;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.hive.metastore.api.MetaException;
+import org.apache.hadoop.hive.metastore.api.NoSuchObjectException;
+import org.apache.hadoop.hive.metastore.conf.MetastoreConf;
+import org.apache.hadoop.hive.metastore.properties.PropertyException;
+import org.apache.hadoop.hive.metastore.properties.PropertyManager;
+import org.apache.hadoop.hive.metastore.properties.PropertyMap;
+import org.apache.hadoop.hive.metastore.properties.PropertyStore;
+import org.apache.hadoop.hive.metastore.utils.MetaStoreUtils;
+import org.apache.hadoop.security.SecurityUtil;
+import org.apache.hadoop.security.UserGroupInformation;
+import org.eclipse.jetty.server.HttpConfiguration;
+import org.eclipse.jetty.server.HttpConnectionFactory;
+import org.eclipse.jetty.server.Server;
+import org.eclipse.jetty.server.ServerConnector;
+import org.eclipse.jetty.servlet.ServletContextHandler;
+import org.eclipse.jetty.servlet.ServletHandler;
+import org.eclipse.jetty.servlet.ServletHolder;
+import org.eclipse.jetty.servlet.Source;
+import org.eclipse.jetty.util.ssl.SslContextFactory;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import javax.servlet.ServletException;
+import javax.servlet.ServletInputStream;
+import javax.servlet.ServletOutputStream;
+import javax.servlet.http.HttpServlet;
+import javax.servlet.http.HttpServletRequest;
+import javax.servlet.http.HttpServletResponse;
+import javax.servlet.http.Part;
+import java.io.BufferedReader;
+import java.io.DataOutputStream;
+import java.io.IOException;
+import java.io.InputStream;
+import java.io.InputStreamReader;
+import java.io.PrintWriter;
+import java.io.Reader;
+import java.net.HttpURLConnection;
+import java.net.URL;
+import java.nio.charset.StandardCharsets;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.List;
+import java.util.Map;
+import java.util.TreeMap;
+
+/**
+ * The property CLI servlet.
+ */
+public class PropertyServlet extends HttpServlet {
+  /** The logger. */
+  public static final Logger LOGGER = LoggerFactory.getLogger(PropertyServlet.class);
+  /** The object store. */
+  private final RawStore objectStore;
+  /** The security. */
+  private final ServletSecurity security;
+
+  PropertyServlet(Configuration configuration, RawStore store) {

Review Comment:
   What is the purpose of this servlet? If it is for tests, can we move it to the `test` package?





Issue Time Tracking
---

Worklog Id: (was: 860987)
Time Spent: 16h  (was: 15h 50m)

> A persistent property store 
> 
>
> Key: HIVE-27186
> URL: https://issues.apache.org/jira/browse/HIVE-27186
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Affects Versions: 4.0.0-alpha-2
>Reporter: Henri Biestro
>Assignee: Henri Biestro
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 16h
>  Remaining Estimate: 0h
>
> WHAT
> A persistent property store usable as a support facility for any metadata 
> augmentation feature.
> WHY
> When adding new meta-data oriented features, we usually need to persist 
> information linking the feature data and the HiveMetaStore objects it 

[jira] [Work logged] (HIVE-23394) TestJdbcGenericUDTFGetSplits2#testGenericUDTFOrderBySplitCount1 is flaky

2023-05-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23394?focusedWorklogId=860986=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-860986
 ]

ASF GitHub Bot logged work on HIVE-23394:
-

Author: ASF GitHub Bot
Created on: 08/May/23 10:28
Start Date: 08/May/23 10:28
Worklog Time Spent: 10m 
  Work Description: simhadri-g commented on PR #4249:
URL: https://github.com/apache/hive/pull/4249#issuecomment-1538140793

   Thanks @aturoczy, @deniskuzZ, @ayushtkn for the review!




Issue Time Tracking
---

Worklog Id: (was: 860986)
Time Spent: 4h 40m  (was: 4.5h)

> TestJdbcGenericUDTFGetSplits2#testGenericUDTFOrderBySplitCount1 is flaky
> 
>
> Key: HIVE-23394
> URL: https://issues.apache.org/jira/browse/HIVE-23394
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Zoltan Haindrich
>Assignee: Simhadri Govindappa
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 4h 40m
>  Remaining Estimate: 0h
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-23394) TestJdbcGenericUDTFGetSplits2#testGenericUDTFOrderBySplitCount1 is flaky

2023-05-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23394?focusedWorklogId=860985=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-860985
 ]

ASF GitHub Bot logged work on HIVE-23394:
-

Author: ASF GitHub Bot
Created on: 08/May/23 10:24
Start Date: 08/May/23 10:24
Worklog Time Spent: 10m 
  Work Description: deniskuzZ merged PR #4249:
URL: https://github.com/apache/hive/pull/4249




Issue Time Tracking
---

Worklog Id: (was: 860985)
Time Spent: 4.5h  (was: 4h 20m)

> TestJdbcGenericUDTFGetSplits2#testGenericUDTFOrderBySplitCount1 is flaky
> 
>
> Key: HIVE-23394
> URL: https://issues.apache.org/jira/browse/HIVE-23394
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Zoltan Haindrich
>Assignee: Simhadri Govindappa
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-27186) A persistent property store

2023-05-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27186?focusedWorklogId=860984=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-860984
 ]

ASF GitHub Bot logged work on HIVE-27186:
-

Author: ASF GitHub Bot
Created on: 08/May/23 10:23
Start Date: 08/May/23 10:23
Worklog Time Spent: 10m 
  Work Description: dengzhhu653 commented on code in PR #4194:
URL: https://github.com/apache/hive/pull/4194#discussion_r1187282071


##
standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/properties/PropertyMap.java:
##
@@ -0,0 +1,466 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.metastore.properties;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.DataInput;
+import java.io.DataOutput;
+import java.io.IOException;
+import java.io.InvalidObjectException;
+import java.io.ObjectInputStream;
+import java.io.ObjectStreamException;
+import java.io.Serializable;
+import java.util.Collections;
+import java.util.Map;
+import java.util.Objects;
+import java.util.Properties;
+import java.util.TreeMap;
+import java.util.UUID;
+import java.util.function.BiConsumer;
+import java.util.function.Function;
+
+/**
+ * A property map pertaining to a given object type (cluster, database, table).
+ * <p>
+ *   Maps follow a copy-on-write scheme gated by a dirty flag (avoid copying a dirty map). This allows
+ *   sharing their content (the inner properties map) with guaranteed isolation and thread safety.
+ * </p>
+ */
+public class PropertyMap implements Serializable {

Review Comment:
   nit: maybe we can declare a super class for de/serialization instead of extending `Serializable`; for example, the super class might look like:
   ```java
   abstract class Abc {
     public Abc(DataInput input, Object... args) {
       // noop
     }
     public abstract Abc write(DataOutput out);
   }
   ```
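
For context, a minimal sketch (not from the patch) of the copy-on-write-gated-by-dirty-flag scheme the javadoc above describes:

```java
import java.util.Map;
import java.util.TreeMap;

// Simplified illustration: the inner map may be shared until the first write.
class CowProperties {
  private Map<String, Object> inner;  // content, possibly shared with other instances
  private boolean dirty;              // true once this instance owns a private copy

  CowProperties(Map<String, Object> shared) {
    this.inner = shared;
  }

  synchronized Object get(String key) {
    return inner.get(key);
  }

  synchronized void put(String key, Object value) {
    if (!dirty) {
      // First write: copy the shared content; an already-dirty map is never re-copied.
      inner = new TreeMap<>(inner);
      dirty = true;
    }
    inner.put(key, value);
  }
}
```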





Issue Time Tracking
---

Worklog Id: (was: 860984)
Time Spent: 15h 50m  (was: 15h 40m)

> A persistent property store 
> 
>
> Key: HIVE-27186
> URL: https://issues.apache.org/jira/browse/HIVE-27186
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Affects Versions: 4.0.0-alpha-2
>Reporter: Henri Biestro
>Assignee: Henri Biestro
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 15h 50m
>  Remaining Estimate: 0h
>
> WHAT
> A persistent property store usable as a support facility for any metadata 
> augmentation feature.
> WHY
> When adding new meta-data oriented features, we usually need to persist 
> information linking the feature data and the HiveMetaStore objects it applies 
> to. Any information related to a database, a table or the cluster - like 
> statistics for example or any operational data state or data (think rolling 
> backup) -  fall in this use-case.
> Typically, accommodating such a feature requires modifying the Metastore 
> database schema by adding or altering a table. It also usually implies 
> modifying the thrift APIs to expose such meta-data to consumers.
> The proposed feature wants to solve the persistence and query/transport for 
> these types of use-cases by exposing a 'key/(meta)value' store exposed as a 
> property system.
> HOW
> A property-value model is the simple and generic exposed API.
> To provision for several usage scenarios, the model entry point is a 
> 'namespace' that qualifies the feature-component property manager. For 
> example, 'stats' could be the namespace for all properties related to the 
> 'statistics' feature.
> The namespace identifies a manager that handles property-groups persisted as 
> property-maps. For instance, all statistics pertaining to a given table would 
> be collocated in the same property-group. As such, all properties (say number 
> of 'unique_values' per columns) for a given HMS table 'relation0' would all 
> be stored and persisted in the same property-map instance.
> Property-maps may be decorated by 

[jira] [Created] (HIVE-27325) Expiring old snapshots deletes files with DirectExecutorService causing runtime delays

2023-05-08 Thread Rajesh Balamohan (Jira)
Rajesh Balamohan created HIVE-27325:
---

 Summary: Expiring old snapshots deletes files with 
DirectExecutorService causing runtime delays
 Key: HIVE-27325
 URL: https://issues.apache.org/jira/browse/HIVE-27325
 Project: Hive
  Issue Type: Improvement
Reporter: Rajesh Balamohan


Expiring old snapshots takes a lot of time because fileCleanupStrategy internally 
uses a directExecutorService. Creating this as a placeholder ticket to track the 
fix; if it is fixed in Iceberg, the lib will need to be upgraded here.

{noformat}
insert into store_sales_delete_9 select *, current_timestamp() as ts from 
tpcds_1000_update.ssv;

ALTER TABLE store_sales_delete_9 EXECUTE expire_snapshots('2023-05-09 
00:00:00');


{noformat}


{noformat}
at 
org.apache.iceberg.relocated.com.google.common.util.concurrent.AbstractListeningExecutorService.submit(AbstractListeningExecutorService.java:36)
at org.apache.iceberg.util.Tasks$Builder.runParallel(Tasks.java:300)
at org.apache.iceberg.util.Tasks$Builder.run(Tasks.java:194)
at org.apache.iceberg.util.Tasks$Builder.run(Tasks.java:189)
at 
org.apache.iceberg.FileCleanupStrategy.deleteFiles(FileCleanupStrategy.java:84)
at 
org.apache.iceberg.IncrementalFileCleanup.cleanFiles(IncrementalFileCleanup.java:262)
at 
org.apache.iceberg.RemoveSnapshots.cleanExpiredSnapshots(RemoveSnapshots.java:338)
at org.apache.iceberg.RemoveSnapshots.commit(RemoveSnapshots.java:312)
at 
org.apache.iceberg.mr.hive.HiveIcebergStorageHandler.executeOperation(HiveIcebergStorageHandler.java:560)
at 
org.apache.hadoop.hive.ql.metadata.Hive.alterTableExecuteOperation(Hive.java:6844)
at 
org.apache.hadoop.hive.ql.ddl.table.execute.AlterTableExecuteOperation.execute(AlterTableExecuteOperation.java:37)
at org.apache.hadoop.hive.ql.ddl.DDLTask.execute(DDLTask.java:84)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:213)
at 
org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105)
at org.apache.hadoop.hive.ql.Executor.launchTask(Executor.java:360)
at org.apache.hadoop.hive.ql.Executor.launchTasks(Executor.java:333)
at org.apache.hadoop.hive.ql.Executor.runTasks(Executor.java:250)
at org.apache.hadoop.hive.ql.Executor.execute(Executor.java:111)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:809)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:547)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:541)
at 
org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:166)
at 
org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:235)
at 
org.apache.hive.service.cli.operation.SQLOperation.access$700(SQLOperation.java:92)
at 
org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:340)
at java.security.AccessController.doPrivileged(java.base@11.0.19/Native 
Method)
at javax.security.auth.Subject.doAs(java.base@11.0.19/Subject.java:423)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
at 
org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:360)
at 
java.util.concurrent.Executors$RunnableAdapter.call(java.base@11.0.19/Executors.java:515)
at 
java.util.concurrent.FutureTask.run(java.base@11.0.19/FutureTask.java:264)
at 
java.util.concurrent.Executors$RunnableAdapter.call(java.base@11.0.19/Executors.java:515)
{noformat}
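
For reference, Iceberg's ExpireSnapshots API accepts a caller-supplied executor for file deletion, which is the likely shape of a fix; a minimal sketch (the pool size and the `Table` handle are assumptions):

{code}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import org.apache.iceberg.Table;

public class ExpireSnapshotsWithPool {
  public static void expire(Table table, long olderThanMillis) {
    // Run snapshot-expiry file deletes on a pool instead of the default
    // direct (caller-thread) executor.
    ExecutorService pool = Executors.newFixedThreadPool(8); // assumed pool size
    try {
      table.expireSnapshots()
          .expireOlderThan(olderThanMillis)
          .executeDeleteWith(pool)
          .commit();
    } finally {
      pool.shutdown();
    }
  }
}
{code}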



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-27186) A persistent property store

2023-05-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27186?focusedWorklogId=860981=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-860981
 ]

ASF GitHub Bot logged work on HIVE-27186:
-

Author: ASF GitHub Bot
Created on: 08/May/23 10:19
Start Date: 08/May/23 10:19
Worklog Time Spent: 10m 
  Work Description: dengzhhu653 commented on code in PR #4194:
URL: https://github.com/apache/hive/pull/4194#discussion_r1187278590


##
standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/properties/PropertyManager.java:
##
@@ -0,0 +1,629 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.metastore.properties;
+
+
+import org.apache.commons.jexl3.JexlBuilder;
+import org.apache.commons.jexl3.JexlContext;
+import org.apache.commons.jexl3.JexlEngine;
+import org.apache.commons.jexl3.JexlException;
+import org.apache.commons.jexl3.JexlExpression;
+import org.apache.commons.jexl3.JexlFeatures;
+import org.apache.commons.jexl3.JexlScript;
+import org.apache.commons.jexl3.ObjectContext;
+import org.apache.commons.jexl3.introspection.JexlPermissions;
+import org.apache.hadoop.hive.metastore.api.MetaException;
+import org.apache.hadoop.hive.metastore.api.NoSuchObjectException;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.lang.reflect.Constructor;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import java.util.Properties;
+import java.util.TreeMap;
+import java.util.UUID;
+import java.util.function.Function;
+
+/**
+ * A property manager.
+ * <p>
+ * This handles operations at the higher functional level; an instance is created per-session and
+ * drives queries and updates in a transactional manner.
+ * </p>
+ * <p>
+ * The manager ties the property schemas into one namespace; all property maps it handles must and will use
+ * one of its known schemas.
+ * </p>
+ * <p>The manager class needs to be registered with its namespace as key.</p>
+ * <p>
+ *   Since a collection of properties is stored in a map, to avoid hitting the persistence store for each update
+ *   - which would mean rewriting the map multiple times - the manager keeps track of dirty maps whilst
+ *   serving as transaction manager. This way, when importing multiple properties targeting different elements (think
+ *   setting properties for different tables), each impacted map is only rewritten
+ *   once by the persistence layer during commit. This also allows multiple calls to participate in one transaction.
+ * </p>
+ */
+public abstract class PropertyManager {
+  /** The logger. */
+  public static final Logger LOGGER = LoggerFactory.getLogger(PropertyManager.class);
+  /** The set of dirty maps. */
+  protected final Map<String, PropertyMap> dirtyMaps = new HashMap<>();
+  /** This manager namespace. */
+  protected final String namespace;
+  /** The property map store. */
+  protected final PropertyStore store;
+  /** A Jexl engine for convenience. */
+  static final JexlEngine JEXL;
+  static {
+    JexlFeatures features = new JexlFeatures()
+        .sideEffect(false)
+        .sideEffectGlobal(false);
+    JexlPermissions p = JexlPermissions.RESTRICTED
+        .compose("org.apache.hadoop.hive.metastore.properties.*");
+    JEXL = new JexlBuilder()
+        .features(features)
+        .permissions(p)
+        .create();
+  }
+
+  /**
+   * The map of defined managers.
+   */
+  private static final Map<String, Constructor<? extends PropertyManager>> NSMANAGERS = new HashMap<>();
+
+  /**
+   * Declares a property manager class.
+   * @param ns the namespace
+   * @param pmClazz the property manager class
+   */
+  public static boolean declare(String ns, Class<? extends PropertyManager> pmClazz) {
+    try {
+      synchronized (NSMANAGERS) {
+        Constructor<? extends PropertyManager> ctor = NSMANAGERS.get(ns);
+        if (ctor == null) {
+          ctor = pmClazz.getConstructor(String.class, PropertyStore.class);
+          NSMANAGERS.put(ns, ctor);
+          return true;
+        } else {
+          if (!Objects.equals(ctor.getDeclaringClass(), pmClazz)) {
+            LOGGER.error("namespace 
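
To make the registration flow above concrete, a hedged sketch (the `StatsPropertyManager` subclass is invented; `declare` and the `(String, PropertyStore)` constructor lookup come from the quoted patch, and any abstract methods the real base class requires are omitted):

```java
// Hypothetical manager bound to the "stats" namespace.
public class StatsPropertyManager extends PropertyManager {
  public StatsPropertyManager(String ns, PropertyStore store) {
    super(ns, store); // matches the constructor declare() looks up via reflection
  }
}

// One-time registration, then per-session instantiation:
// PropertyManager.declare("stats", StatsPropertyManager.class);
// PropertyManager mgr = PropertyManager.create("stats", propertyStore);
```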

[jira] [Work logged] (HIVE-27186) A persistent property store

2023-05-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27186?focusedWorklogId=860979=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-860979
 ]

ASF GitHub Bot logged work on HIVE-27186:
-

Author: ASF GitHub Bot
Created on: 08/May/23 10:16
Start Date: 08/May/23 10:16
Worklog Time Spent: 10m 
  Work Description: dengzhhu653 commented on code in PR #4194:
URL: https://github.com/apache/hive/pull/4194#discussion_r1187276410


##
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HMSHandler.java:
##
@@ -7536,9 +7524,41 @@ public List get_partitions_by_names(final 
String dbName, final String
 return ret;
   }
 
+  /**
+   * Creates an instance of property manager based on the (declared) namespace.
+   * @param ns the namespace
+   * @return the manager instance
+   * @throws TException
+   */
+  private PropertyManager getPropertyManager(String ns) throws MetaException, NoSuchObjectException {
+    PropertyStore propertyStore = getMS().getPropertyStore();
+    PropertyManager mgr = PropertyManager.create(ns, propertyStore);
+    return mgr;
+  }
+
+  @Override
+  public PropertyGetResponse get_properties(PropertyGetRequest req) throws MetaException, NoSuchObjectException, TException {
+    PropertyManager mgr = getPropertyManager(req.getNameSpace());
+    Map<String, PropertyMap> selected = mgr.selectProperties(req.getMapPrefix(), req.getMapPredicate(), req.getMapSelection());

Review Comment:
   Should we convert the `PropertyException`, `JexlException` or others to 
`MetaException` here so that the client can tell the real cause?
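
A hedged sketch of the conversion being suggested, reusing names from the quoted diff (a fragment, not a full method; the catch list and message are illustrative):

```java
try {
  Map<String, PropertyMap> selected =
      mgr.selectProperties(req.getMapPrefix(), req.getMapPredicate(), req.getMapSelection());
  // ... build and return the PropertyGetResponse from the selection ...
} catch (PropertyException | JexlException e) {
  // Surface the real cause to the client instead of a bare TApplicationException.
  throw new MetaException("get_properties failed: " + e);
}
```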





Issue Time Tracking
---

Worklog Id: (was: 860979)
Time Spent: 15.5h  (was: 15h 20m)

> A persistent property store 
> 
>
> Key: HIVE-27186
> URL: https://issues.apache.org/jira/browse/HIVE-27186
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Affects Versions: 4.0.0-alpha-2
>Reporter: Henri Biestro
>Assignee: Henri Biestro
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 15.5h
>  Remaining Estimate: 0h
>
> WHAT
> A persistent property store usable as a support facility for any metadata 
> augmentation feature.
> WHY
> When adding new meta-data oriented features, we usually need to persist 
> information linking the feature data and the HiveMetaStore objects it applies 
> to. Any information related to a database, a table or the cluster - like 
> statistics for example or any operational data state or data (think rolling 
> backup) -  fall in this use-case.
> Typically, accommodating such a feature requires modifying the Metastore 
> database schema by adding or altering a table. It also usually implies 
> modifying the thrift APIs to expose such meta-data to consumers.
> The proposed feature wants to solve the persistence and query/transport for 
> these types of use-cases by exposing a 'key/(meta)value' store exposed as a 
> property system.
> HOW
> A property-value model is the simple and generic exposed API.
> To provision for several usage scenarios, the model entry point is a 
> 'namespace' that qualifies the feature-component property manager. For 
> example, 'stats' could be the namespace for all properties related to the 
> 'statistics' feature.
> The namespace identifies a manager that handles property-groups persisted as 
> property-maps. For instance, all statistics pertaining to a given table would 
> be collocated in the same property-group. As such, all properties (say number 
> of 'unique_values' per columns) for a given HMS table 'relation0' would all 
> be stored and persisted in the same property-map instance.
> Property-maps may be decorated by an (optional) schema that may declare the 
> name and value-type of allowed properties (and their optional default value). 
> Each property is addressed by a name, a path uniquely identifying the 
> property in a given property map.
> The manager also handles transforming property-map names to the property-map 
> keys used to persist them in the DB.
> The API provides inserting/updating properties in bulk transactionally. It 
> also provides selection/projection to help reduce the volume of exchange 
> between client/server; selection can use (JEXL expression) predicates to 
> filter maps.
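
To illustrate the selection/projection idea, a hedged example against the `selectProperties(prefix, predicate, selection)` shape quoted earlier in this digest; the prefix, predicate, property names, the list type of the selection argument, and the return type are all invented (a fragment; `java.util.Arrays` assumed imported):

```java
// Fetch the 'unique_values' property of maps under table 'relation0'
// whose unique_values exceed 1000, using a JEXL predicate to filter maps.
Map<String, PropertyMap> stats = mgr.selectProperties(
    "db1.relation0",
    "unique_values > 1000",
    Arrays.asList("unique_values"));
```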



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HIVE-27321) Mask HDFS_BYTES_READ/WRITTEN in orc_ppd_basic.q

2023-05-08 Thread Krisztian Kasa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Kasa resolved HIVE-27321.
---
Resolution: Fixed

Merged to master. Thanks [~abstractdog] for review.

> Mask HDFS_BYTES_READ/WRITTEN in orc_ppd_basic.q
> ---
>
> Key: HIVE-27321
> URL: https://issues.apache.org/jira/browse/HIVE-27321
> Project: Hive
>  Issue Type: Test
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> HDFS_BYTES_READ/WRITTEN depends on the ORC file size and can change in case 
> of an ORC lib upgrade, but this value is not relevant in this test



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-27321) Mask HDFS_BYTES_READ/WRITTEN in orc_ppd_basic.q

2023-05-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27321?focusedWorklogId=860974=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-860974
 ]

ASF GitHub Bot logged work on HIVE-27321:
-

Author: ASF GitHub Bot
Created on: 08/May/23 10:04
Start Date: 08/May/23 10:04
Worklog Time Spent: 10m 
  Work Description: kasakrisz merged PR #4295:
URL: https://github.com/apache/hive/pull/4295




Issue Time Tracking
---

Worklog Id: (was: 860974)
Time Spent: 50m  (was: 40m)

> Mask HDFS_BYTES_READ/WRITTEN in orc_ppd_basic.q
> ---
>
> Key: HIVE-27321
> URL: https://issues.apache.org/jira/browse/HIVE-27321
> Project: Hive
>  Issue Type: Test
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> HDFS_BYTES_READ/WRITTEN  depends on ORC file size and it can change in case 
> of an ORC lib upgrade but this value is not relevant in this test



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-27163) Column stats are not getting published after an insert query into an external table with custom location

2023-05-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27163?focusedWorklogId=860958=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-860958
 ]

ASF GitHub Bot logged work on HIVE-27163:
-

Author: ASF GitHub Bot
Created on: 08/May/23 09:30
Start Date: 08/May/23 09:30
Worklog Time Spent: 10m 
  Work Description: sonarcloud[bot] commented on PR #4228:
URL: https://github.com/apache/hive/pull/4228#issuecomment-1538054543

   Kudos, SonarCloud Quality Gate passed!
   (https://sonarcloud.io/dashboard?id=apache_hive=4228)
   
   0 Bugs (rated A)
   0 Vulnerabilities (rated A)
   0 Security Hotspots (rated A)
   17 Code Smells (rated A)
   No Coverage information
   No Duplication information
   
   




Issue Time Tracking
---

Worklog Id: (was: 860958)
Time Spent: 4h 10m  (was: 4h)

> Column stats are not getting published after an insert query into an external 
> table with custom location
> 
>
> Key: HIVE-27163
> URL: https://issues.apache.org/jira/browse/HIVE-27163
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Taraka Rama Rao Lethavadla
>Assignee: Zhihua Deng
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 4h 10m
>  Remaining Estimate: 0h
>
> Test case details are below
> *test.q*
> {noformat}
> set hive.stats.column.autogather=true;
> set hive.stats.autogather=true;
> dfs ${system:test.dfs.mkdir} ${system:test.tmp.dir}/test;
> create external table test_custom(age int, name string) stored as orc 
> location '/tmp/test';
> insert into test_custom select 1, 'test';
> desc formatted test_custom age;{noformat}
> *test.q.out*
>  
>  
> {noformat}
>  A masked pattern was here 
> PREHOOK: type: CREATETABLE
>  A masked pattern was here 
> PREHOOK: Output: database:default
> PREHOOK: Output: default@test_custom
>  A masked pattern was here 
> POSTHOOK: type: CREATETABLE
>  A masked pattern was here 
> POSTHOOK: Output: database:default
> POSTHOOK: Output: default@test_custom
> PREHOOK: query: insert into test_custom select 1, 'test'
> PREHOOK: type: QUERY
> 

[jira] [Assigned] (HIVE-27244) Iceberg: Implement LOAD data for unpartitioned table via Append API

2023-05-08 Thread Ayush Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena reassigned HIVE-27244:
---

Assignee: Ayush Saxena  (was: Ramesh Kumar Thangarajan)

> Iceberg: Implement LOAD data for unpartitioned table via Append API
> ---
>
> Key: HIVE-27244
> URL: https://issues.apache.org/jira/browse/HIVE-27244
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
>
> Use the Append API for the Iceberg LOAD DATA command, same as the migration use case



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-27163) Column stats are not getting published after an insert query into an external table with custom location

2023-05-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27163?focusedWorklogId=860937=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-860937
 ]

ASF GitHub Bot logged work on HIVE-27163:
-

Author: ASF GitHub Bot
Created on: 08/May/23 08:08
Start Date: 08/May/23 08:08
Worklog Time Spent: 10m 
  Work Description: dengzhhu653 commented on code in PR #4228:
URL: https://github.com/apache/hive/pull/4228#discussion_r1187154337


##
ql/src/java/org/apache/hadoop/hive/ql/ddl/table/create/CreateTableDesc.java:
##
@@ -921,14 +925,23 @@ public Table toTable(HiveConf conf) throws HiveException {
     // When replicating, the statistics for a table will be obtained from the source. Do not
     // reset them on the replica.
     if (replicationSpec == null || !replicationSpec.isInReplicationScope()) {
-      if (!this.isCTAS && (tbl.getPath() == null || (!isExternal() && tbl.isEmpty()))) {
-        if (!tbl.isPartitioned() && conf.getBoolVar(HiveConf.ConfVars.HIVESTATSAUTOGATHER)) {
-          StatsSetupConst.setStatsStateForCreateTable(tbl.getTTable().getParameters(),
-              MetaStoreUtils.getColumnNames(tbl.getCols()), StatsSetupConst.TRUE);
-        }
-      } else {
-        StatsSetupConst.setStatsStateForCreateTable(tbl.getTTable().getParameters(), null,
-            StatsSetupConst.FALSE);
+      // Remove COLUMN_STATS_ACCURATE=true from the table's parameters; let the HMS determine
+      // whether there is a need to add column stats, depending on the table's location.
+      StatsSetupConst.setStatsStateForCreateTable(tbl.getTTable().getParameters(), null,
+          StatsSetupConst.FALSE);
+      if (!this.isCTAS && !tbl.isPartitioned() && !tbl.isTemporary() &&
+          conf.getBoolVar(HiveConf.ConfVars.HIVESTATSAUTOGATHER)) {
+        // Put the flag into the dictionary in order not to pollute the table;
+        // ObjectDictionary is meant to convey repetitive messages.
+        ObjectDictionary dictionary = tbl.getTTable().isSetDictionary() ?
+            tbl.getTTable().getDictionary() : new ObjectDictionary();
+        List<ByteBuffer> buffers = new ArrayList<>();
+        String statsSetup = StatsSetupConst.ColumnStatsSetup.getStatsSetupAsString(true,
+            tbl.isIcebergTable() ? "metadata" : null, // Skip metadata directory for Iceberg table

Review Comment:
   The `HiveStorageHandler` does not have such an API for this purpose, and I'm a 
little nervous about introducing a new one in `HiveStorageHandler`.
   Removed the `isIcebergTable()` method from the `Table` class; 
`storageHandler.isMetadataTableSupported()` (which currently only supports 
Iceberg tables) is used instead.





Issue Time Tracking
---

Worklog Id: (was: 860937)
Time Spent: 4h  (was: 3h 50m)

> Column stats are not getting published after an insert query into an external 
> table with custom location
> 
>
> Key: HIVE-27163
> URL: https://issues.apache.org/jira/browse/HIVE-27163
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Taraka Rama Rao Lethavadla
>Assignee: Zhihua Deng
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 4h
>  Remaining Estimate: 0h
>
> Test case details are below
> *test.q*
> {noformat}
> set hive.stats.column.autogather=true;
> set hive.stats.autogather=true;
> dfs ${system:test.dfs.mkdir} ${system:test.tmp.dir}/test;
> create external table test_custom(age int, name string) stored as orc 
> location '/tmp/test';
> insert into test_custom select 1, 'test';
> desc formatted test_custom age;{noformat}
> *test.q.out*
>  
>  
> {noformat}
>  A masked pattern was here 
> PREHOOK: type: CREATETABLE
>  A masked pattern was here 
> PREHOOK: Output: database:default
> PREHOOK: Output: default@test_custom
>  A masked pattern was here 
> POSTHOOK: type: CREATETABLE
>  A masked pattern was here 
> POSTHOOK: Output: database:default
> POSTHOOK: Output: default@test_custom
> PREHOOK: query: insert into test_custom select 1, 'test'
> PREHOOK: type: QUERY
> PREHOOK: Input: _dummy_database@_dummy_table
> PREHOOK: Output: default@test_custom
> POSTHOOK: query: insert into test_custom select 1, 'test'
> POSTHOOK: type: QUERY
> POSTHOOK: Input: _dummy_database@_dummy_table
> POSTHOOK: Output: default@test_custom
> POSTHOOK: Lineage: test_custom.age SIMPLE []
> POSTHOOK: Lineage: test_custom.name SIMPLE []
> PREHOOK: query: desc formatted test_custom age
> PREHOOK: type: DESCTABLE
> PREHOOK: Input: default@test_custom
> POSTHOOK: query: desc formatted test_custom age
> POSTHOOK: type: DESCTABLE
> POSTHOOK: Input: default@test_custom
> col_name                age
> data_type               int
> min
> max
> num_nulls
> 

[jira] [Resolved] (HIVE-27320) Mask total size in materialized_view_create_acid.q.out

2023-05-08 Thread Krisztian Kasa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Kasa resolved HIVE-27320.
---
Resolution: Fixed

Merged to master. Thanks [~abstractdog] for review.

> Mask total size in materialized_view_create_acid.q.out
> --
>
> Key: HIVE-27320
> URL: https://issues.apache.org/jira/browse/HIVE-27320
> Project: Hive
>  Issue Type: Test
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Total size depends on the ORC file size and can change in case of an ORC lib 
> upgrade, but this value is not relevant in this test



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-27320) Mask total size in materialized_view_create_acid.q.out

2023-05-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27320?focusedWorklogId=860928=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-860928
 ]

ASF GitHub Bot logged work on HIVE-27320:
-

Author: ASF GitHub Bot
Created on: 08/May/23 06:44
Start Date: 08/May/23 06:44
Worklog Time Spent: 10m 
  Work Description: kasakrisz merged PR #4294:
URL: https://github.com/apache/hive/pull/4294




Issue Time Tracking
---

Worklog Id: (was: 860928)
Time Spent: 0.5h  (was: 20m)

> Mask total size in materialized_view_create_acid.q.out
> --
>
> Key: HIVE-27320
> URL: https://issues.apache.org/jira/browse/HIVE-27320
> Project: Hive
>  Issue Type: Test
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Total size depends on the ORC file size and can change in case of an ORC lib 
> upgrade, but this value is not relevant in this test



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-27319) HMS server should throw InvalidObjectException in get_partitions_by_names() when the table is missing/dropped

2023-05-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27319?focusedWorklogId=860925=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-860925
 ]

ASF GitHub Bot logged work on HIVE-27319:
-

Author: ASF GitHub Bot
Created on: 08/May/23 06:07
Start Date: 08/May/23 06:07
Worklog Time Spent: 10m 
  Work Description: dengzhhu653 commented on code in PR #4299:
URL: https://github.com/apache/hive/pull/4299#discussion_r1187057103


##
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/ExceptionHandler.java:
##
@@ -127,7 +128,7 @@ public ExceptionHandler toMetaExceptionIfInstance(String 
message, Class... cl
*/
   public static void rethrowException(Exception e) throws TException {
 throw handleException(e)
-.throwIfInstance(MetaException.class, NoSuchObjectException.class)
+.throwIfInstance(MetaException.class, NoSuchObjectException.class, 
InvalidObjectException.class)

Review Comment:
   nit: This method is invoked elsewhere besides `get_partitions_by_names`, so 
the same problem would recur if some other method doesn't declare that it throws 
`InvalidObjectException`. 
   The original would throw MetaException instead.





Issue Time Tracking
---

Worklog Id: (was: 860925)
Time Spent: 40m  (was: 0.5h)

> HMS server should throw InvalidObjectException in get_partitions_by_names() 
> when the table is missing/dropped
> -
>
> Key: HIVE-27319
> URL: https://issues.apache.org/jira/browse/HIVE-27319
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, Standalone Metastore
>Reporter: Sai Hemanth Gantasala
>Assignee: Sai Hemanth Gantasala
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> When the table object is dropped by a concurrent thread, the 
> get_partitions_by_names_req() API is currently throwing a 
> TApplicationException to the client. Instead, the HMS server should propagate 
> the InvalidObjectException thrown by getTable() to the HMS client. By doing 
> this, other services using HMS client will understand the exception better.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-27321) Mask HDFS_BYTES_READ/WRITTEN in orc_ppd_basic.q

2023-05-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27321?focusedWorklogId=860924=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-860924
 ]

ASF GitHub Bot logged work on HIVE-27321:
-

Author: ASF GitHub Bot
Created on: 08/May/23 06:06
Start Date: 08/May/23 06:06
Worklog Time Spent: 10m 
  Work Description: sonarcloud[bot] commented on PR #4295:
URL: https://github.com/apache/hive/pull/4295#issuecomment-1537802199

   Kudos, SonarCloud Quality Gate passed!
   (https://sonarcloud.io/dashboard?id=apache_hive=4295)
   
   0 Bugs (rated A)
   0 Vulnerabilities (rated A)
   0 Security Hotspots (rated A)
   0 Code Smells (rated A)
   No Coverage information
   No Duplication information
   
   




Issue Time Tracking
---

Worklog Id: (was: 860924)
Time Spent: 40m  (was: 0.5h)

> Mask HDFS_BYTES_READ/WRITTEN in orc_ppd_basic.q
> ---
>
> Key: HIVE-27321
> URL: https://issues.apache.org/jira/browse/HIVE-27321
> Project: Hive
>  Issue Type: Test
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> HDFS_BYTES_READ/WRITTEN depends on the ORC file size and can change in case 
> of an ORC lib upgrade, but this value is not relevant in this test



--
This message was sent by Atlassian Jira
(v8.20.10#820010)