[jira] [Work logged] (HIVE-26692) Check for the expected thrift version before compiling
[ https://issues.apache.org/jira/browse/HIVE-26692?focusedWorklogId=831308=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-831308 ] ASF GitHub Bot logged work on HIVE-26692: - Author: ASF GitHub Bot Created on: 06/Dec/22 07:55 Start Date: 06/Dec/22 07:55 Worklog Time Spent: 10m Work Description: sonarcloud[bot] commented on PR #3820: URL: https://github.com/apache/hive/pull/3820#issuecomment-1338922096 Kudos, SonarCloud Quality Gate passed! 0 Bugs, 0 Vulnerabilities, 0 Security Hotspots, 0 Code Smells, No Coverage information, No Duplication information. Issue Time Tracking --- Worklog Id: (was: 831308) Time Spent: 2h 40m (was: 2.5h) > Check for the expected thrift version before compiling > -- > > Key: HIVE-26692 > URL: https://issues.apache.org/jira/browse/HIVE-26692 > Project: Hive > Issue Type: Task > Components: Thrift API > Affects Versions: 4.0.0-alpha-2 > Reporter: Alessandro Solimando > Assignee: Alessandro Solimando > Priority: Major > Labels: pull-request-available > Time Spent: 2h 40m > Remaining Estimate: 0h > > At the moment we don't check for the thrift version before launching thrift, > the error messages are often cryptic upon mismatches.
> An explicit check with a clear error message would be nice, like what parquet > does: > [https://github.com/apache/parquet-mr/blob/master/parquet-thrift/pom.xml#L247-L268] > -- This message was sent by Atlassian Jira (v8.20.10#820010)
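The check proposed in HIVE-26692 boils down to comparing the version string that `thrift --version` prints against the release the build expects, and failing fast with a clear message. A minimal sketch of that comparison logic (names are illustrative, not the actual build change; the banner format assumed is thrift's usual "Thrift version X.Y.Z"):

```java
// Hypothetical sketch of the version-match logic a pre-compile check could use.
public final class ThriftVersionCheck {

    /** Extracts the bare version from thrift's banner and compares it to the expected release. */
    static boolean matchesExpected(String versionBanner, String expectedVersion) {
        // `thrift --version` prints e.g. "Thrift version 0.16.0"
        String actual = versionBanner.replace("Thrift version", "").trim();
        return actual.equals(expectedVersion);
    }

    public static void main(String[] args) {
        if (!matchesExpected("Thrift version 0.16.0", "0.16.0")) {
            throw new IllegalStateException(
                "Expected thrift 0.16.0 but found a different version; install the matching release.");
        }
        System.out.println("thrift version OK");
    }
}
```

In a Maven build this comparison would sit behind an enforcer or exec step, as in the parquet-mr pom the issue links to.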
[jira] [Work logged] (HIVE-26770) Make "end of loop" compaction logs appear more selectively
[ https://issues.apache.org/jira/browse/HIVE-26770?focusedWorklogId=831304=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-831304 ] ASF GitHub Bot logged work on HIVE-26770: - Author: ASF GitHub Bot Created on: 06/Dec/22 07:50 Start Date: 06/Dec/22 07:50 Worklog Time Spent: 10m Work Description: deniskuzZ commented on code in PR #3832: URL: https://github.com/apache/hive/pull/3832#discussion_r1040607921 ## ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorThread.java: ## @@ -61,6 +61,13 @@ public abstract class CompactorThread extends Thread implements Configurable { protected String hostName; protected String runtimeVersion; + //Time threshold for compactor thread log + //In milliseconds: + protected Integer MAX_WARN_LOG_TIME = 120; //20 min + + protected long checkInterval; + + public enum CompactorThreadType {INITIATOR, WORKER, CLEANER} @Override Review Comment: new line + should it be public or package-private is enough? Issue Time Tracking --- Worklog Id: (was: 831304) Time Spent: 5h 40m (was: 5.5h) > Make "end of loop" compaction logs appear more selectively > -- > > Key: HIVE-26770 > URL: https://issues.apache.org/jira/browse/HIVE-26770 > Project: Hive > Issue Type: Improvement >Affects Versions: 4.0.0-alpha-1 >Reporter: Akshat Mathur >Assignee: Akshat Mathur >Priority: Major > Labels: pull-request-available > Time Spent: 5h 40m > Remaining Estimate: 0h > > Currently Initiator, Worker, and Cleaner threads log something like "finished > one loop" on INFO level. > This is useful to figure out if one of these threads is taking too long to > finish a loop, but expensive in general. 
> > Suggested Time: 20mins > Logging this should be changed in the following way > # If the loop finished within a predefined amount of time, level should be DEBUG > and message should look like: *Initiator loop took \{elapsedTime} seconds to > finish.* > # If the loop ran longer than this predefined amount, level should be WARN and > message should look like: *Possible Initiator slowdown, loop took > \{elapsedTime} seconds to finish.* -- This message was sent by Atlassian Jira (v8.20.10#820010)
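The two-level rule described in the HIVE-26770 issue can be sketched as a small pure function (names and the threshold value are illustrative, not the PR's actual code): DEBUG below a predefined duration, WARN with a "possible slowdown" message above it.

```java
// Minimal sketch of the proposed "end of loop" logging rule.
public final class CompactorLoopLog {

    /** Builds the log line for one finished loop of the given compactor thread. */
    static String logLineFor(String threadName, long elapsedSeconds, long warnThresholdSeconds) {
        if (elapsedSeconds > warnThresholdSeconds) {
            return "WARN Possible " + threadName + " slowdown, loop took "
                + elapsedSeconds + " seconds to finish.";
        }
        return "DEBUG " + threadName + " loop took " + elapsedSeconds + " seconds to finish.";
    }

    public static void main(String[] args) {
        System.out.println(logLineFor("Initiator", 30, 1200));   // fast loop -> DEBUG level
        System.out.println(logLineFor("Initiator", 2400, 1200)); // slow loop -> WARN level
    }
}
```

In the real code the branch would of course select the logger method (`LOG.debug` vs `LOG.warn`) rather than prefix a string.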
[jira] [Work logged] (HIVE-26770) Make "end of loop" compaction logs appear more selectively
[ https://issues.apache.org/jira/browse/HIVE-26770?focusedWorklogId=831300=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-831300 ] ASF GitHub Bot logged work on HIVE-26770: - Author: ASF GitHub Bot Created on: 06/Dec/22 07:47 Start Date: 06/Dec/22 07:47 Worklog Time Spent: 10m Work Description: deniskuzZ commented on code in PR #3832: URL: https://github.com/apache/hive/pull/3832#discussion_r1040605656 ## ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Worker.java: ## @@ -138,6 +142,7 @@ public void run() { @Override public void init(AtomicBoolean stop) throws Exception { super.init(stop); +checkInterval = 0; Review Comment: set it to 0 in the declaration Issue Time Tracking --- Worklog Id: (was: 831300) Time Spent: 5.5h (was: 5h 20m) > Make "end of loop" compaction logs appear more selectively > -- > > Key: HIVE-26770 > URL: https://issues.apache.org/jira/browse/HIVE-26770 > Project: Hive > Issue Type: Improvement > Affects Versions: 4.0.0-alpha-1 > Reporter: Akshat Mathur > Assignee: Akshat Mathur > Priority: Major > Labels: pull-request-available > Time Spent: 5.5h > Remaining Estimate: 0h > > Currently Initiator, Worker, and Cleaner threads log something like "finished > one loop" on INFO level. > This is useful to figure out if one of these threads is taking too long to > finish a loop, but expensive in general. > > Suggested Time: 20mins > Logging this should be changed in the following way > # If the loop finished within a predefined amount of time, level should be DEBUG > and message should look like: *Initiator loop took \{elapsedTime} seconds to > finish.* > # If the loop ran longer than this predefined amount, level should be WARN and > message should look like: *Possible Initiator slowdown, loop took > \{elapsedTime} seconds to finish.* -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Work logged] (HIVE-23891) Using UNION sql clause and speculative execution can cause file duplication in Tez
[ https://issues.apache.org/jira/browse/HIVE-23891?focusedWorklogId=831287=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-831287 ] ASF GitHub Bot logged work on HIVE-23891: - Author: ASF GitHub Bot Created on: 06/Dec/22 07:31 Start Date: 06/Dec/22 07:31 Worklog Time Spent: 10m Work Description: sonarcloud[bot] commented on PR #3836: URL: https://github.com/apache/hive/pull/3836#issuecomment-1338903701 Kudos, SonarCloud Quality Gate passed! 0 Bugs, 0 Vulnerabilities, 0 Security Hotspots, 8 Code Smells, No Coverage information, No Duplication information. Issue Time Tracking --- Worklog Id: (was: 831287) Time Spent: 2h 50m (was: 2h 40m) > Using UNION sql clause and speculative execution can cause file duplication > in Tez > -- > > Key: HIVE-23891 > URL: https://issues.apache.org/jira/browse/HIVE-23891 > Project: Hive > Issue Type: Bug > Reporter: George Pachitariu > Assignee: George Pachitariu > Priority: Major > Labels: pull-request-available > Attachments: HIVE-23891.1.patch > > Time Spent: 2h 50m > Remaining Estimate: 0h > > Hello, > the specific scenario when this can happen: > - the execution engine is Tez; > - speculative execution is on; > - the query inserts into a table and the last step is a UNION sql clause; > The problem is that Tez creates an extra layer of subdirectories when
there > is a UNION. Later, when deduplicating, Hive doesn't take that into account > and only deduplicates folders but not the files inside. > So for a query like this: > {code:sql} > insert overwrite table union_all > select * from union_first_part > union all > select * from union_second_part; > {code} > The folder structure afterwards will be like this (a possible example): > {code:java} > .../union_all/HIVE_UNION_SUBDIR_1/00_0 > .../union_all/HIVE_UNION_SUBDIR_1/00_1 > .../union_all/HIVE_UNION_SUBDIR_2/00_1 > {code} > The attached patch increases the number of folder levels that Hive will check > recursively for duplicates when we have a UNION in Tez.
[jira] [Assigned] (HIVE-26810) Replace HiveFilterSetOpTransposeRule onMatch method with Calcite's built-in implementation
[ https://issues.apache.org/jira/browse/HIVE-26810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alessandro Solimando reassigned HIVE-26810: --- > Replace HiveFilterSetOpTransposeRule onMatch method with Calcite's built-in > implementation > -- > > Key: HIVE-26810 > URL: https://issues.apache.org/jira/browse/HIVE-26810 > Project: Hive > Issue Type: Task > Components: CBO > Affects Versions: 4.0.0-alpha-2 > Reporter: Alessandro Solimando > Assignee: Alessandro Solimando > Priority: Major > > After HIVE-26762, the _onMatch_ method is now the same as in the Calcite > implementation, so we can drop Hive's override in order to avoid the risk of > them drifting apart again. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Work logged] (HIVE-26221) Add histogram-based column statistics
[ https://issues.apache.org/jira/browse/HIVE-26221?focusedWorklogId=831286=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-831286 ] ASF GitHub Bot logged work on HIVE-26221: - Author: ASF GitHub Bot Created on: 06/Dec/22 07:28 Start Date: 06/Dec/22 07:28 Worklog Time Spent: 10m Work Description: dengzhhu653 commented on code in PR #3137: URL: https://github.com/apache/hive/pull/3137#discussion_r1040592119 ## standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/StatisticsTestUtils.java: ## @@ -109,4 +135,116 @@ public static HyperLogLog createHll(String... values) { } return hll; } + + /** + * Creates an HLL object initialized with the given values. + * @param values the values to be added + * @return an HLL object initialized with the given values. + */ + public static HyperLogLog createHll(double... values) { +HyperLogLog hll = HyperLogLog.builder().build(); +Arrays.stream(values).forEach(hll::addDouble); +return hll; + } + + /** + * Creates a KLL object initialized with the given values. + * @param values the values to be added + * @return a KLL object initialized with the given values. + */ + public static KllFloatsSketch createKll(float... values) { +KllFloatsSketch kll = new KllFloatsSketch(); +for (float value : values) { + kll.update(value); +} +return kll; + } + + /** + * Creates a KLL object initialized with the given values. + * @param values the values to be added + * @return a KLL object initialized with the given values. + */ + public static KllFloatsSketch createKll(double... values) { +KllFloatsSketch kll = new KllFloatsSketch(); +for (double value : values) { + kll.update(Double.valueOf(value).floatValue()); +} +return kll; + } + + /** + * Creates a KLL object initialized with the given values. + * @param values the values to be added + * @return a KLL object initialized with the given values. + */ + public static KllFloatsSketch createKll(long... 
values) { +KllFloatsSketch kll = new KllFloatsSketch(); +for (long value : values) { + kll.update(value); +} +return kll; + } + + /** + * Checks if expected and computed statistics data are equal. + * @param expected expected statistics data + * @param computed computed statistics data + */ + public static void assertEqualStatistics(ColumnStatisticsData expected, ColumnStatisticsData computed) { +if (expected.getSetField() != computed.getSetField()) { + throw new IllegalArgumentException("Expected data is of type " + expected.getSetField() + + " while computed data is of type " + computed.getSetField()); +} + +Class dataClass = null; +switch (expected.getSetField()) { +case DATE_STATS: + dataClass = DateColumnStatsData.class; + break; +case LONG_STATS: + dataClass = LongColumnStatsData.class; + break; +case DOUBLE_STATS: + dataClass = DoubleColumnStatsData.class; + break; +case DECIMAL_STATS: + dataClass = DecimalColumnStatsData.class; + break; +case TIMESTAMP_STATS: + dataClass = TimestampColumnStatsData.class; + break; +default: + // it's an unsupported class for KLL, no special treatment needed + Assert.assertEquals(expected, computed); + return; +} +assertEqualStatistics(expected, computed, dataClass); + } + + private static void assertEqualStatistics( Review Comment: This function only compares the `histogram`, and does not tell us much when either `computedHasHistograms` or `expectedHasHistograms` is false. Could we compare the `ColumnStatisticsData` by `Assert.assertEquals(expected, computed);` as we did in Line 219? 
Issue Time Tracking --- Worklog Id: (was: 831286) Time Spent: 4.5h (was: 4h 20m) > Add histogram-based column statistics > - > > Key: HIVE-26221 > URL: https://issues.apache.org/jira/browse/HIVE-26221 > Project: Hive > Issue Type: Improvement > Components: CBO, Metastore, Statistics > Affects Versions: 4.0.0-alpha-2 > Reporter: Alessandro Solimando > Assignee: Alessandro Solimando > Priority: Major > Labels: pull-request-available > Time Spent: 4.5h > Remaining Estimate: 0h > > Hive does not support histogram statistics, which are particularly useful for > skewed data (which is very common in practice) and range predicates. > Hive's current selectivity estimation for range predicates is based on a > hard-coded value of 1/3 (see > [FilterSelectivityEstimator.java#L138-L144|https://github.com/apache/hive/blob/56c336268ea8c281d23c22d89271af37cb7e2572/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/FilterSelectivityEstimator.java#L138-L144]).
[jira] [Work logged] (HIVE-26221) Add histogram-based column statistics
[ https://issues.apache.org/jira/browse/HIVE-26221?focusedWorklogId=831280=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-831280 ] ASF GitHub Bot logged work on HIVE-26221: - Author: ASF GitHub Bot Created on: 06/Dec/22 07:10 Start Date: 06/Dec/22 07:10 Worklog Time Spent: 10m Work Description: asolimando commented on code in PR #3137: URL: https://github.com/apache/hive/pull/3137#discussion_r1040578295 ## standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/columnstats/ColStatsBuilder.java: ## @@ -103,6 +105,12 @@ public ColStatsBuilder hll(String... values) { return this; } + public ColStatsBuilder hll(double... values) { +HyperLogLog hll = StatisticsTestUtils.createHll(values); +this.bitVector = hll.serialize(); Review Comment: No, HLL is different from KLL; it's used for counting distinct values. The method naming is different because HLL has an in-house implementation in Hive, while KLL comes from the Apache DataSketches library. Issue Time Tracking --- Worklog Id: (was: 831280) Time Spent: 4h 20m (was: 4h 10m) > Add histogram-based column statistics > - > > Key: HIVE-26221 > URL: https://issues.apache.org/jira/browse/HIVE-26221 > Project: Hive > Issue Type: Improvement > Components: CBO, Metastore, Statistics > Affects Versions: 4.0.0-alpha-2 > Reporter: Alessandro Solimando > Assignee: Alessandro Solimando > Priority: Major > Labels: pull-request-available > Time Spent: 4h 20m > Remaining Estimate: 0h > > Hive does not support histogram statistics, which are particularly useful for > skewed data (which is very common in practice) and range predicates.
> Hive's current selectivity estimation for range predicates is based on a > hard-coded value of 1/3 (see > [FilterSelectivityEstimator.java#L138-L144|https://github.com/apache/hive/blob/56c336268ea8c281d23c22d89271af37cb7e2572/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/FilterSelectivityEstimator.java#L138-L144]). > The current proposal aims at integrating histograms as an additional column > statistic, stored into the Hive metastore at the table (or partition) level. > The main requirements for histogram integration are the following: > * efficiency: the approach must scale and support billions of rows > * merge-ability: partition-level histograms have to be merged to form > table-level histograms > * explicit and configurable trade-off between memory footprint and accuracy > Hive already integrates a [KLL data > sketches|https://datasketches.apache.org/docs/KLL/KLLSketch.html] UDAF. > Datasketches are small, stateful programs that process massive data-streams > and can provide approximate answers, with mathematical guarantees, to > computationally difficult queries orders-of-magnitude faster than > traditional, exact methods. > We propose to use KLL, and more specifically the cumulative distribution > function (CDF), as the underlying data structure for our histogram statistics. > The current proposal targets numeric data types (float, integer and numeric > families) and temporal data types (date and timestamp). -- This message was sent by Atlassian Jira (v8.20.10#820010)
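The CDF idea at the heart of the HIVE-26221 proposal can be illustrated with a self-contained sketch (ours, not Hive's code): an empirical CDF over sampled column values answers a range-predicate selectivity question directly, where a KLL sketch would give the same answer approximately and with bounded memory.

```java
import java.util.Arrays;

// Illustrative sketch: an exact empirical CDF replacing the hard-coded 1/3
// selectivity for a range predicate such as "col < x".
public final class CdfSelectivity {

    /** Empirical CDF: fraction of values strictly less than x. */
    static double selectivityLessThan(float[] values, float x) {
        float[] sorted = values.clone();
        Arrays.sort(sorted);
        int count = 0;
        while (count < sorted.length && sorted[count] < x) {
            count++;
        }
        return (double) count / sorted.length;
    }

    public static void main(String[] args) {
        // A skewed column, common in practice; a fixed 1/3 estimate would be badly off.
        float[] skewed = {1f, 2f, 2f, 2f, 2f, 3f, 50f, 100f};
        System.out.println(selectivityLessThan(skewed, 3f)); // 0.625
    }
}
```

The merge-ability requirement is where the sketch pays off: two KLL sketches built on different partitions can be merged into one table-level sketch, which is not possible with raw sorted samples of bounded size.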
[jira] [Work logged] (HIVE-26221) Add histogram-based column statistics
[ https://issues.apache.org/jira/browse/HIVE-26221?focusedWorklogId=831279=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-831279 ] ASF GitHub Bot logged work on HIVE-26221: - Author: ASF GitHub Bot Created on: 06/Dec/22 07:07 Start Date: 06/Dec/22 07:07 Worklog Time Spent: 10m Work Description: dengzhhu653 commented on code in PR #3137: URL: https://github.com/apache/hive/pull/3137#discussion_r1040575953 ## standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/columnstats/ColStatsBuilder.java: ## @@ -103,6 +105,12 @@ public ColStatsBuilder hll(String... values) { return this; } + public ColStatsBuilder hll(double... values) { +HyperLogLog hll = StatisticsTestUtils.createHll(values); +this.bitVector = hll.serialize(); Review Comment: This is meant to be `this.kll = kll.toByteArray();`? Issue Time Tracking --- Worklog Id: (was: 831279) Time Spent: 4h 10m (was: 4h) > Add histogram-based column statistics > - > > Key: HIVE-26221 > URL: https://issues.apache.org/jira/browse/HIVE-26221 > Project: Hive > Issue Type: Improvement > Components: CBO, Metastore, Statistics > Affects Versions: 4.0.0-alpha-2 > Reporter: Alessandro Solimando > Assignee: Alessandro Solimando > Priority: Major > Labels: pull-request-available > Time Spent: 4h 10m > Remaining Estimate: 0h > > Hive does not support histogram statistics, which are particularly useful for > skewed data (which is very common in practice) and range predicates. > Hive's current selectivity estimation for range predicates is based on a > hard-coded value of 1/3 (see > [FilterSelectivityEstimator.java#L138-L144|https://github.com/apache/hive/blob/56c336268ea8c281d23c22d89271af37cb7e2572/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/FilterSelectivityEstimator.java#L138-L144]). > The current proposal aims at integrating histograms as an additional column > statistic, stored into the Hive metastore at the table (or partition) level.
> The main requirements for histogram integration are the following: > * efficiency: the approach must scale and support billions of rows > * merge-ability: partition-level histograms have to be merged to form > table-level histograms > * explicit and configurable trade-off between memory footprint and accuracy > Hive already integrates [KLL data > sketches|https://datasketches.apache.org/docs/KLL/KLLSketch.html] UDAF. > Datasketches are small, stateful programs that process massive data-streams > and can provide approximate answers, with mathematical guarantees, to > computationally difficult queries orders-of-magnitude faster than > traditional, exact methods. > We propose to use KLL, and more specifically the cumulative distribution > function (CDF), as the underlying data structure for our histogram statistics. > The current proposal targets numeric data types (float, integer and numeric > families) and temporal data types (date and timestamp). -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Work logged] (HIVE-26799) Make authorizations on custom UDFs involved in tables/view configurable.
[ https://issues.apache.org/jira/browse/HIVE-26799?focusedWorklogId=831272=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-831272 ] ASF GitHub Bot logged work on HIVE-26799: - Author: ASF GitHub Bot Created on: 06/Dec/22 06:49 Start Date: 06/Dec/22 06:49 Worklog Time Spent: 10m Work Description: dengzhhu653 commented on code in PR #3821: URL: https://github.com/apache/hive/pull/3821#discussion_r1040564092 ## ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java: ## @@ -12550,6 +12550,20 @@ private ParseResult rewriteASTWithMaskAndFilter(TableMask tableMask, ASTNode ast } } + void gatherUserSuppliedFunctions(ASTNode ast) { +int tokenType = ast.getToken().getType(); +if (tokenType == HiveParser.TOK_FUNCTION || +tokenType == HiveParser.TOK_FUNCTIONDI || +tokenType == HiveParser.TOK_FUNCTIONSTAR) { + if (ast.getChild(0).getType() == HiveParser.Identifier) { + this.userSuppliedFunctions.add(unescapeIdentifier(ast.getChild(0).getText())); Review Comment: Could we add the lower-cased function names into `userSuppliedFunctions`? I wonder about queries like `select MIN(a) from table_example`. Does it handle cast properly? For example: `select cast(a as int) from table_example`. Issue Time Tracking --- Worklog Id: (was: 831272) Time Spent: 1h 40m (was: 1.5h) > Make authorizations on custom UDFs involved in tables/view configurable. > > > Key: HIVE-26799 > URL: https://issues.apache.org/jira/browse/HIVE-26799 > Project: Hive > Issue Type: New Feature > Components: HiveServer2, Security > Affects Versions: 4.0.0-alpha-2 > Reporter: Sai Hemanth Gantasala > Assignee: Sai Hemanth Gantasala > Priority: Major > Labels: pull-request-available > Time Spent: 1h 40m > Remaining Estimate: 0h > > When Hive is using Ranger/Sentry as an authorization service, consider the > following scenario.
> {code:java} > > create table test_udf(st string); // privileged user operation > > create function Udf_UPPER as 'openkb.hive.udf.MyUpper' using jar > > 'hdfs:///tmp/MyUpperUDF-1.0.0.jar'; // privileged user operation > > create view v1_udf as select udf_upper(st) from test_udf; // privileged > > user operation > //unprivileged user test_user is given select permissions on view v1_udf > > select * from v1_udf; {code} > It is expected that test_user needs to have select privilege on v1_udf and > select permissions on the udf_upper custom UDF in order to run a select query on > the view. > This patch introduces a configuration > "hive.security.authorization.functions.in.view"=false which disables > authorization on UDFs associated with views/tables during the select query. > In this mode, only UDFs explicitly stated in the query would still be > authorized as they are currently. > The reason for making these custom UDFs associated with views/tables > authorizable is that currently test_user will need to be granted select > permissions on the custom UDF, and test_user can then use this UDF in queries > against any other table, which is a security concern. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Work logged] (HIVE-23891) Using UNION sql clause and speculative execution can cause file duplication in Tez
[ https://issues.apache.org/jira/browse/HIVE-23891?focusedWorklogId=831269=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-831269 ] ASF GitHub Bot logged work on HIVE-23891: - Author: ASF GitHub Bot Created on: 06/Dec/22 06:39 Start Date: 06/Dec/22 06:39 Worklog Time Spent: 10m Work Description: dengzhhu653 opened a new pull request, #3836: URL: https://github.com/apache/hive/pull/3836 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? Issue Time Tracking --- Worklog Id: (was: 831269) Time Spent: 2h 40m (was: 2.5h) > Using UNION sql clause and speculative execution can cause file duplication > in Tez > -- > > Key: HIVE-23891 > URL: https://issues.apache.org/jira/browse/HIVE-23891 > Project: Hive > Issue Type: Bug >Reporter: George Pachitariu >Assignee: George Pachitariu >Priority: Major > Labels: pull-request-available > Attachments: HIVE-23891.1.patch > > Time Spent: 2h 40m > Remaining Estimate: 0h > > Hello, > the specific scenario when this can happen: > - the execution engine is Tez; > - speculative execution is on; > - the query inserts into a table and the last step is a UNION sql clause; > The problem is that Tez creates an extra layer of subdirectories when there > is a UNION. Later, when deduplicating, Hive doesn't take that into account > and only deduplicates folders but not the files inside. > So for a query like this: > {code:sql} > insert overwrite table union_all > select * from union_first_part > union all > select * from union_second_part; > {code} > The folder structure afterwards will be like this (a possible example): > {code:java} > .../union_all/HIVE_UNION_SUBDIR_1/00_0 > .../union_all/HIVE_UNION_SUBDIR_1/00_1 > .../union_all/HIVE_UNION_SUBDIR_2/00_1 > {code} > The attached patch increases the number of folder levels that Hive will check > recursively for duplicates when we have a UNION in Tez. 
> Feel free to reach out if you have any questions :). -- This message was sent by Atlassian Jira (v8.20.10#820010)
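The HIVE-23891 report above hinges on how speculative-attempt duplicates are detected: attempt files of the same task share the prefix before the final `_` in the file name, so duplicates must be found per directory, and UNION's extra `HIVE_UNION_SUBDIR_*` level means the check must descend one level deeper. A toy model of that deduplication (our illustrative sketch, not Hive's implementation):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative sketch: keep only the first attempt seen per (directory, task id) pair.
public final class UnionAttemptDedup {

    /** Deduplicates attempt files by the task-id prefix of the file name, per directory. */
    static List<String> dedup(List<String> relativePaths) {
        Set<String> seen = new HashSet<>();
        List<String> kept = new ArrayList<>();
        for (String path : relativePaths) {
            int slash = path.lastIndexOf('/');
            String dir = path.substring(0, slash + 1);
            String file = path.substring(slash + 1);
            String taskId = file.substring(0, file.lastIndexOf('_'));
            if (seen.add(dir + taskId)) {
                kept.add(path);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> files = Arrays.asList(
            "HIVE_UNION_SUBDIR_1/000000_0",
            "HIVE_UNION_SUBDIR_1/000000_1",  // speculative duplicate of the previous file
            "HIVE_UNION_SUBDIR_2/000000_1"); // same task id but a different subdirectory
        System.out.println(dedup(files));
        // [HIVE_UNION_SUBDIR_1/000000_0, HIVE_UNION_SUBDIR_2/000000_1]
    }
}
```

The bug described above corresponds to running this kind of check only at the table-directory level, where the subdirectories look distinct and the duplicate files inside them are never compared.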
[jira] [Updated] (HIVE-26569) Support renewal and recreation of LLAP_TOKENs
[ https://issues.apache.org/jira/browse/HIVE-26569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] László Bodor updated HIVE-26569: Summary: Support renewal and recreation of LLAP_TOKENs (was: LlapTokenRenewer: TezAM (LlapTaskCommunicator) to renew LLAP_TOKENs) > Support renewal and recreation of LLAP_TOKENs > - > > Key: HIVE-26569 > URL: https://issues.apache.org/jira/browse/HIVE-26569 > Project: Hive > Issue Type: Improvement >Reporter: László Bodor >Assignee: László Bodor >Priority: Major > Labels: pull-request-available > Time Spent: 1h 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Work logged] (HIVE-26788) Update stats of table/partition after minor compaction using noscan operation
[ https://issues.apache.org/jira/browse/HIVE-26788?focusedWorklogId=831260=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-831260 ] ASF GitHub Bot logged work on HIVE-26788: - Author: ASF GitHub Bot Created on: 06/Dec/22 04:56 Start Date: 06/Dec/22 04:56 Worklog Time Spent: 10m Work Description: SourabhBadhya commented on code in PR #3812: URL: https://github.com/apache/hive/pull/3812#discussion_r1040462553 ## ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/StatsUpdater.java: ## @@ -52,10 +52,6 @@ public final class StatsUpdater { */ public void gatherStats(CompactionInfo ci, HiveConf conf, String userName, String compactionQueueName) { try { -if (!ci.isMajorCompaction()) { Review Comment: I thought this was a problem but I did some investigation. There is an if-else statement which decides whether an MR or Tez task needs to be created. For the `NOSCAN` operation, it does not generate an MR or a Tez task. https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMRTableScan1.java#L88-L108 (If basic stats are OK to be used, then no MR or Tez task is created) https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMRTableScan1.java#L128-L134 (If no scan is used, then the MapRedTask is removed from the plan). AFAIK Tez sessions are created only when a Tez task is executed. 
https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezTask.java#L207-L208 Issue Time Tracking --- Worklog Id: (was: 831260) Time Spent: 1h 40m (was: 1.5h) > Update stats of table/partition after minor compaction using noscan operation > - > > Key: HIVE-26788 > URL: https://issues.apache.org/jira/browse/HIVE-26788 > Project: Hive > Issue Type: Improvement >Reporter: Sourabh Badhya >Assignee: Sourabh Badhya >Priority: Major > Labels: pull-request-available > Time Spent: 1h 40m > Remaining Estimate: 0h > > Currently, statistics are not updated for minor compaction since minor > compaction makes only small updates to the statistics (such as the number of files > in the table/partition & the total size of the table/partition). It is better to > utilize the NOSCAN operation for minor compaction since a NOSCAN operation > performs a faster update of statistics and updates the relevant fields such as > the number of files & total sizes of the table/partitions. -- This message was sent by Atlassian Jira (v8.20.10#820010)
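The branching described in the review comment can be condensed into a small decision sketch. The method and flag names below are illustrative, not the actual GenMRTableScan1 API:

```java
// Hedged sketch of the stats-task decision described above.
// Names are hypothetical; the real logic lives in GenMRTableScan1.
class StatsScanPlanner {
    /**
     * Decide whether an MR/Tez task must be created to gather statistics.
     * If basic stats are sufficient, or NOSCAN is requested, no task is needed.
     */
    static boolean needsScanTask(boolean basicStatsSufficient, boolean noScan) {
        if (basicStatsSufficient) {
            return false; // stats can be taken from metadata, no task created
        }
        if (noScan) {
            return false; // NOSCAN: the MapRedTask is removed from the plan
        }
        return true; // a full scan task (MR or Tez) is required
    }
}
```

Since neither branch above creates a task for NOSCAN, no Tez session would be started either, which matches the observation that Tez sessions are created only when a Tez task executes.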
[jira] [Commented] (HIVE-26770) Make "end of loop" compaction logs appear more selectively
[ https://issues.apache.org/jira/browse/HIVE-26770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17643658#comment-17643658 ] Akshat Mathur commented on HIVE-26770: -- Test passed. Had to close the old PR and create a new one. > Make "end of loop" compaction logs appear more selectively > -- > > Key: HIVE-26770 > URL: https://issues.apache.org/jira/browse/HIVE-26770 > Project: Hive > Issue Type: Improvement >Affects Versions: 4.0.0-alpha-1 >Reporter: Akshat Mathur >Assignee: Akshat Mathur >Priority: Major > Labels: pull-request-available > Time Spent: 5h 20m > Remaining Estimate: 0h > > Currently the Initiator, Worker, and Cleaner threads log something like "finished > one loop" at INFO level. > This is useful for figuring out if one of these threads is taking too long to > finish a loop, but expensive in general. > > Suggested Time: 20mins > Logging should be changed in the following way: > # If the loop finishes within a predefined amount of time, the level should be DEBUG > and the message should look like: *Initiator loop took \{elapsedTime} seconds to > finish.* > # If the loop runs longer than this predefined amount, the level should be WARN and > the message should look like: *Possible Initiator slowdown, loop took > \{elapsedTime} seconds to finish.* -- This message was sent by Atlassian Jira (v8.20.10#820010)
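The proposed logging rule can be sketched as follows. The threshold parameter and method names are assumptions; the message templates come from the ticket:

```java
// Hedged sketch of the proposed "end of loop" logging rule.
// Threshold and method names are hypothetical; message wording follows the ticket.
class LoopLogHelper {
    /** Pick the log level: DEBUG at or under the threshold, WARN over it. */
    static String levelFor(long elapsedSeconds, long thresholdSeconds) {
        return elapsedSeconds <= thresholdSeconds ? "DEBUG" : "WARN";
    }

    /** Build the message for an Initiator loop per the ticket's templates. */
    static String messageFor(long elapsedSeconds, long thresholdSeconds) {
        if (elapsedSeconds <= thresholdSeconds) {
            return "Initiator loop took " + elapsedSeconds + " seconds to finish.";
        }
        return "Possible Initiator slowdown, loop took " + elapsedSeconds
                + " seconds to finish.";
    }
}
```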
[jira] [Comment Edited] (HIVE-26770) Make "end of loop" compaction logs appear more selectively
[ https://issues.apache.org/jira/browse/HIVE-26770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17643124#comment-17643124 ] Akshat Mathur edited comment on HIVE-26770 at 12/6/22 4:42 AM: --- Due to timeouts the tests are failing, blocking the merge was (Author: JIRAUSER298271): Due to timeouts the tests are failing, blocking the merge > Make "end of loop" compaction logs appear more selectively > -- > > Key: HIVE-26770 > URL: https://issues.apache.org/jira/browse/HIVE-26770 > Project: Hive > Issue Type: Improvement >Affects Versions: 4.0.0-alpha-1 >Reporter: Akshat Mathur >Assignee: Akshat Mathur >Priority: Major > Labels: pull-request-available > Time Spent: 5h 20m > Remaining Estimate: 0h > > Currently the Initiator, Worker, and Cleaner threads log something like "finished > one loop" at INFO level. > This is useful for figuring out if one of these threads is taking too long to > finish a loop, but expensive in general. > > Suggested Time: 20mins > Logging should be changed in the following way: > # If the loop finishes within a predefined amount of time, the level should be DEBUG > and the message should look like: *Initiator loop took \{elapsedTime} seconds to > finish.* > # If the loop runs longer than this predefined amount, the level should be WARN and > the message should look like: *Possible Initiator slowdown, loop took > \{elapsedTime} seconds to finish.* -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (HIVE-26806) Precommit tests in CI are timing out after HIVE-26796
[ https://issues.apache.org/jira/browse/HIVE-26806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17643656#comment-17643656 ] Akshat Mathur commented on HIVE-26806: -- [~zabetak] Closing PR-3803 and opening a new one worked, thanks. Run for new PR: http://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/PR-3832/1/pipeline/ > Precommit tests in CI are timing out after HIVE-26796 > - > > Key: HIVE-26806 > URL: https://issues.apache.org/jira/browse/HIVE-26806 > Project: Hive > Issue Type: Bug > Components: Testing Infrastructure >Reporter: Stamatis Zampetakis >Assignee: Stamatis Zampetakis >Priority: Major > > http://ci.hive.apache.org/job/hive-precommit/job/master/1506/ > {noformat} > Cancelling nested steps due to timeout > 15:22:08 Sending interrupt signal to process > 15:22:08 Killing processes > 15:22:09 kill finished with exit code 0 > 15:22:19 Terminated > 15:22:19 script returned exit code 143 > [Pipeline] } > [Pipeline] // withEnv > [Pipeline] } > 15:22:19 Deleting 1 temporary files > [Pipeline] // configFileProvider > [Pipeline] } > [Pipeline] // stage > [Pipeline] stage > [Pipeline] { (PostProcess) > [Pipeline] sh > [Pipeline] sh > [Pipeline] sh > [Pipeline] junit > 15:22:25 Recording test results > 15:22:32 [Checks API] No suitable checks publisher found. > [Pipeline] } > [Pipeline] // stage > [Pipeline] } > [Pipeline] // container > [Pipeline] } > [Pipeline] // node > [Pipeline] } > [Pipeline] // timeout > [Pipeline] } > [Pipeline] // podTemplate > [Pipeline] } > 15:22:32 Failed in branch split-01 > [Pipeline] // parallel > [Pipeline] } > [Pipeline] // stage > [Pipeline] stage > [Pipeline] { (Archive) > [Pipeline] podTemplate > [Pipeline] { > [Pipeline] timeout > 15:22:33 Timeout set to expire in 6 hr 0 min > {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Work logged] (HIVE-26799) Make authorizations on custom UDFs involved in tables/view configurable.
[ https://issues.apache.org/jira/browse/HIVE-26799?focusedWorklogId=831255=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-831255 ] ASF GitHub Bot logged work on HIVE-26799: - Author: ASF GitHub Bot Created on: 06/Dec/22 03:57 Start Date: 06/Dec/22 03:57 Worklog Time Spent: 10m Work Description: sonarcloud[bot] commented on PR #3821: URL: https://github.com/apache/hive/pull/3821#issuecomment-1338706082 Kudos, SonarCloud Quality Gate passed! 0 Bugs, 0 Vulnerabilities, 0 Security Hotspots, 1 Code Smell; no coverage or duplication information.
Issue Time Tracking --- Worklog Id: (was: 831255) Time Spent: 1.5h (was: 1h 20m) > Make authorizations on custom UDFs involved in tables/view configurable. > > > Key: HIVE-26799 > URL: https://issues.apache.org/jira/browse/HIVE-26799 > Project: Hive > Issue Type: New Feature > Components: HiveServer2, Security >Affects Versions: 4.0.0-alpha-2 >Reporter: Sai Hemanth Gantasala >Assignee: Sai Hemanth Gantasala >Priority: Major > Labels: pull-request-available > Time Spent: 1.5h > Remaining Estimate: 0h > > When Hive is using Ranger/Sentry as an authorization service, consider the > following scenario. 
> {code:java} > > create table test_udf(st string); // privileged user operation > > create function Udf_UPPER as 'openkb.hive.udf.MyUpper' using jar > > 'hdfs:///tmp/MyUpperUDF-1.0.0.jar'; // privileged user operation > > create view v1_udf as select udf_upper(st) from test_udf; // privileged > > user operation > //unprivileged user test_user is given select permissions on view v1_udf > > select * from v1_udf; {code} > It is expected that test_user needs to have the select privilege on v1_udf and > select permissions on the udf_upper custom UDF in order to run a select query on the > view. > This patch introduces a configuration > "hive.security.authorization.functions.in.view"=false which disables > authorization on views associated with views/tables during the select query.
[jira] [Work logged] (HIVE-23559) Optimise Hive::moveAcidFiles for cloud storage
[ https://issues.apache.org/jira/browse/HIVE-23559?focusedWorklogId=831250=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-831250 ] ASF GitHub Bot logged work on HIVE-23559: - Author: ASF GitHub Bot Created on: 06/Dec/22 02:35 Start Date: 06/Dec/22 02:35 Worklog Time Spent: 10m Work Description: sonarcloud[bot] commented on PR #3795: URL: https://github.com/apache/hive/pull/3795#issuecomment-1338645658 Kudos, SonarCloud Quality Gate passed! 1 Bug, 0 Vulnerabilities, 0 Security Hotspots, 7 Code Smells; no coverage or duplication information.
Issue Time Tracking --- Worklog Id: (was: 831250) Time Spent: 50m (was: 40m) > Optimise Hive::moveAcidFiles for cloud storage > -- > > Key: HIVE-23559 > URL: https://issues.apache.org/jira/browse/HIVE-23559 > Project: Hive > Issue Type: Improvement >Reporter: Rajesh Balamohan >Assignee: Dmitriy Fingerman >Priority: Major > Labels: pull-request-available > Time Spent: 50m > Remaining Estimate: 0h > > [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java#L4752] > It ends up transferring the DELTA, DELETE_DELTA, and BASE prefixes sequentially from > the staging to the final location. > This causes delays even with simple update statements, which update a small > number of records in cloud storage. 
-- This message was sent by Atlassian Jira (v8.20.10#820010)
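One way to avoid transferring the DELTA, DELETE_DELTA, and BASE prefixes one after another is to submit the moves to a thread pool. This is only an illustrative sketch of that idea, not the change actually made in the PR:

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hedged sketch: run independent move operations concurrently instead of
// sequentially, which helps on high-latency cloud storage.
class ParallelMove {
    /** Run the given move operations concurrently and wait for them to finish. */
    static void moveAll(List<Runnable> moves) {
        ExecutorService pool =
                Executors.newFixedThreadPool(Math.max(1, Math.min(moves.size(), 8)));
        for (Runnable move : moves) {
            pool.submit(move); // e.g. one task per DELTA/DELETE_DELTA/BASE prefix
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

With renames that are cheap locally but slow against object stores, overlapping the per-prefix moves bounds the wall-clock cost by the slowest prefix rather than the sum of all three.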
[jira] [Commented] (HIVE-25327) Mapjoins in HiveServer2 fail when jmxremote is used
[ https://issues.apache.org/jira/browse/HIVE-25327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17643627#comment-17643627 ] hansonhe commented on HIVE-25327: - The same problem happened to me. My environment: hive-3.1.2, hadoop-3.1.4 > Mapjoins in HiveServer2 fail when jmxremote is used > --- > > Key: HIVE-25327 > URL: https://issues.apache.org/jira/browse/HIVE-25327 > Project: Hive > Issue Type: Bug > Components: HiveServer2 >Affects Versions: 3.1.2 > Environment: apache hadoop 3.1.3 > apache hive 3.1.2 > java version 1.8.0_282 > OS: RedHat 8.2 > >Reporter: louiechen >Assignee: loushang >Priority: Major > > I also encountered the same problem. Although this problem was closed in > a previous version, and the source code of the current version contains related > corrections, the problem remains. > The following is the main content of the previous issue [HIVE-11369]: > having hive.auto.convert.join set to true works in the CLI with no issue, but > fails in HiveServer2 when jmx options are passed to the service on startup. > This (in hive-env.sh) is enough to make it fail: > {noformat} > -Dcom.sun.management.jmxremote > -Dcom.sun.management.jmxremote.authenticate=false > -Dcom.sun.management.jmxremote.ssl=false > -Dcom.sun.management.jmxremote.port=8009 > {noformat} > As soon as I remove the line, it works properly. I have *no* idea... 
> Here's the log from the service: > {noformat} > 2015-07-24 17:19:27,457 INFO [HiveServer2-Handler-Pool: Thread-22]: > ql.Driver (SessionState.java:printInfo(912)) - Query ID = > hive_20150724171919_aaa88a89-dc6d-490b-821c-4eec6d4c0421 > 2015-07-24 17:19:27,457 INFO [HiveServer2-Handler-Pool: Thread-22]: > ql.Driver (SessionState.java:printInfo(912)) - Total jobs = 1 > 2015-07-24 17:19:27,465 INFO [HiveServer2-Handler-Pool: Thread-22]: > ql.Driver (Driver.java:launchTask(1638)) - Starting task > [Stage-4:MAPREDLOCAL] in serial mode > 2015-07-24 17:19:27,467 INFO [HiveServer2-Handler-Pool: Thread-22]: > mr.MapredLocalTask (MapredLocalTask.java:executeInChildVM(159)) - Generating > plan file > file:/tmp/hive/8932c206-5420-4b6f-9f1f-5f1706f30df8/hive_2015-07-24_17-19-26_552_5082133674120283907-1/-local-10005/plan.xml > 2015-07-24 17:19:27,625 WARN [HiveServer2-Handler-Pool: Thread-22]: > conf.HiveConf (HiveConf.java:initialize(2620)) - HiveConf of name > hive.files.umask.value does not exist > 2015-07-24 17:19:27,708 INFO [HiveServer2-Handler-Pool: Thread-22]: > mr.MapredLocalTask (MapredLocalTask.java:executeInChildVM(288)) - Executing: > /usr/lib/hadoop/bin/hadoop jar > /usr/lib/hive/lib/hive-common-1.1.0-cdh5.4.3.jar > org.apache.hadoop.hive.ql.exec.mr.ExecDriver -localtask -plan > file:/tmp/hive/8932c206-5420-4b6f-9f1f-5f1706f30df8/hive_2015-07-24_17-19-26_552_5082133674120283907-1/-local-10005/plan.xml >-jobconffile > file:/tmp/hive/8932c206-5420-4b6f-9f1f-5f1706f30df8/hive_2015-07-24_17-19-26_552_5082133674120283907-1/-local-10006/jobconf.xml > 2015-07-24 17:19:28,499 ERROR [HiveServer2-Handler-Pool: Thread-22]: > exec.Task (SessionState.java:printError(921)) - Execution failed with exit > status: 1 > 2015-07-24 17:19:28,500 ERROR [HiveServer2-Handler-Pool: Thread-22]: > exec.Task (SessionState.java:printError(921)) - Obtaining error information > 2015-07-24 17:19:28,500 ERROR [HiveServer2-Handler-Pool: Thread-22]: > exec.Task 
(SessionState.java:printError(921)) - > Task failed! > Task ID: > Stage-4 > Logs: > 2015-07-24 17:19:28,501 ERROR [HiveServer2-Handler-Pool: Thread-22]: > exec.Task (SessionState.java:printError(921)) - > /tmp/hiveserver2_manual/hive-server2.log > 2015-07-24 17:19:28,501 ERROR [HiveServer2-Handler-Pool: Thread-22]: > mr.MapredLocalTask (MapredLocalTask.java:executeInChildVM(308)) - Execution > failed with exit status: 1 > 2015-07-24 17:19:28,518 ERROR [HiveServer2-Handler-Pool: Thread-22]: > ql.Driver (SessionState.java:printError(921)) - FAILED: Execution Error, > return code 1 from org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask > 2015-07-24 17:19:28,599 WARN [HiveServer2-Handler-Pool: Thread-22]: > security.UserGroupInformation (UserGroupInformation.java:doAs(1674)) - > PriviledgedActionException as:hive (auth:SIMPLE) > cause:org.apache.hive.service.cli.HiveSQLException: Error while processing > statement: FAILED: Execution Error, return code 1 from > org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask > 2015-07-24 17:19:28,600 WARN [HiveServer2-Handler-Pool: Thread-22]: > thrift.ThriftCLIService (ThriftCLIService.java:ExecuteStatement(496)) - Error > executing statement: > org.apache.hive.service.cli.HiveSQLException: Error while processing > statement: FAILED: Execution Error, return code 1 from > org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask > at
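The failure pattern in the log above is consistent with the child VM spawned for the local map-join task inheriting the parent's jmxremote options and then failing to bind the already-used JMX port. A defensive sketch of one mitigation, hypothetical and not the actual MapredLocalTask fix, would filter those flags out of the options passed to the child:

```java
import java.util.List;
import java.util.stream.Collectors;

// Hedged sketch: strip inherited jmxremote flags so a child VM does not try to
// rebind the JMX port the parent HiveServer2 process already holds.
class ChildJvmOpts {
    /** Return the JVM options with all com.sun.management.jmxremote flags removed. */
    static List<String> withoutJmxRemote(List<String> jvmOpts) {
        return jvmOpts.stream()
                .filter(opt -> !opt.startsWith("-Dcom.sun.management.jmxremote"))
                .collect(Collectors.toList());
    }
}
```

The same effect can often be had operationally by setting a distinct (or no) jmxremote port for child processes, since two JVMs cannot share port 8009.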
[jira] [Work logged] (HIVE-26799) Make authorizations on custom UDFs involved in tables/view configurable.
[ https://issues.apache.org/jira/browse/HIVE-26799?focusedWorklogId=831246=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-831246 ] ASF GitHub Bot logged work on HIVE-26799: - Author: ASF GitHub Bot Created on: 06/Dec/22 02:20 Start Date: 06/Dec/22 02:20 Worklog Time Spent: 10m Work Description: saihemanth-cloudera commented on code in PR #3821: URL: https://github.com/apache/hive/pull/3821#discussion_r1040352141 ## ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java: ## @@ -12550,6 +12550,21 @@ private ParseResult rewriteASTWithMaskAndFilter(TableMask tableMask, ASTNode ast } } + void gatherUserSuppliedFunctions(ASTNode ast) { +int tokenType = ast.getToken().getType(); +if (tokenType == HiveParser.TOK_FUNCTION || +tokenType == HiveParser.TOK_FUNCTIONDI || +tokenType == HiveParser.TOK_FUNCTIONSTAR) { + if (ast.getChild(0).getType() == HiveParser.Identifier) { +// maybe user supplied +this.userSuppliedFunctions.add(ast.getChild(0).getText()); Review Comment: Ack Issue Time Tracking --- Worklog Id: (was: 831246) Time Spent: 1h 20m (was: 1h 10m) > Make authorizations on custom UDFs involved in tables/view configurable. > > > Key: HIVE-26799 > URL: https://issues.apache.org/jira/browse/HIVE-26799 > Project: Hive > Issue Type: New Feature > Components: HiveServer2, Security >Affects Versions: 4.0.0-alpha-2 >Reporter: Sai Hemanth Gantasala >Assignee: Sai Hemanth Gantasala >Priority: Major > Labels: pull-request-available > Time Spent: 1h 20m > Remaining Estimate: 0h > > When Hive is using Ranger/Sentry as an authorization service, consider the > following scenario. 
> {code:java} > > create table test_udf(st string); // privileged user operation > > create function Udf_UPPER as 'openkb.hive.udf.MyUpper' using jar > > 'hdfs:///tmp/MyUpperUDF-1.0.0.jar'; // privileged user operation > > create view v1_udf as select udf_upper(st) from test_udf; // privileged > > user operation > //unprivileged user test_user is given select permissions on view v1_udf > > select * from v1_udf; {code} > It is expected that test_user needs to have the select privilege on v1_udf and > select permissions on the udf_upper custom UDF in order to run a select query on the > view. > This patch introduces a configuration > "hive.security.authorization.functions.in.view"=false which disables > authorization on views associated with views/tables during the select query. > In this mode, only UDFs explicitly stated in the query would still be > authorized, as is currently the case. > The reason for making these custom UDFs associated with views/tables > authorizable is that currently test_user would need to be granted select > permissions on the custom UDF, and test_user could then use this UDF in queries > against any other table, which is a security concern. -- This message was sent by Atlassian Jira (v8.20.10#820010)
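The effect of the proposed flag reduces to a small truth table, sketched below with a hypothetical method name (the real gating lives in the semantic analyzer and authorizer):

```java
// Hedged sketch of the authorization gate described in the ticket.
// Method and parameter names are hypothetical, not the Hive API.
class UdfAuthzGate {
    /**
     * Whether a UDF should be authorized for the querying user.
     * UDFs explicitly stated in the query are always authorized; UDFs only
     * embedded in a referenced view are authorized unless the proposed
     * hive.security.authorization.functions.in.view flag is set to false.
     */
    static boolean shouldAuthorize(boolean explicitlyInQuery, boolean authorizeFunctionsInView) {
        return explicitlyInQuery || authorizeFunctionsInView;
    }
}
```

Only the (embedded-in-view, flag=false) combination skips authorization, which is exactly the scenario where test_user was granted select on v1_udf but not on udf_upper.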
[jira] [Work logged] (HIVE-26758) Allow use scratchdir for staging final job
[ https://issues.apache.org/jira/browse/HIVE-26758?focusedWorklogId=831231=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-831231 ] ASF GitHub Bot logged work on HIVE-26758: - Author: ASF GitHub Bot Created on: 06/Dec/22 00:33 Start Date: 06/Dec/22 00:33 Worklog Time Spent: 10m Work Description: sonarcloud[bot] commented on PR #3831: URL: https://github.com/apache/hive/pull/3831#issuecomment-1338488053 Kudos, SonarCloud Quality Gate passed! 0 Bugs, 0 Vulnerabilities, 0 Security Hotspots, 2 Code Smells; no coverage or duplication information.
Issue Time Tracking --- Worklog Id: (was: 831231) Time Spent: 3h 50m (was: 3h 40m) > Allow use scratchdir for staging final job > -- > > Key: HIVE-26758 > URL: https://issues.apache.org/jira/browse/HIVE-26758 > Project: Hive > Issue Type: New Feature > Components: Query Planning >Affects Versions: 4.0.0-alpha-2 >Reporter: Yi Zhang >Assignee: Yi Zhang >Priority: Minor > Labels: pull-request-available > Time Spent: 3h 50m > Remaining Estimate: 0h > > The query results are staged in a stagingdir that is relative to the > destination path // > during blob storage optimization (HIVE-17620) the final job was set to use the stagingdir. 
> HIVE-15215 mentioned the possibility of using the scratchdir for staging when writing > to S3, but that was a long time ago and saw no activity. > > This is to allow the final job to use hive.exec.scratchdir like the interim jobs, > gated by a configuration: > hive.use.scratchdir.for.staging > This is useful for cross-filesystem writes: the user can stage on the local source > filesystem instead of the remote filesystem. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Work logged] (HIVE-26809) Upgrade ORC to 1.8.0
[ https://issues.apache.org/jira/browse/HIVE-26809?focusedWorklogId=831229&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-831229 ] ASF GitHub Bot logged work on HIVE-26809: - Author: ASF GitHub Bot Created on: 06/Dec/22 00:21 Start Date: 06/Dec/22 00:21 Worklog Time Spent: 10m Work Description: TuroczyX commented on PR #3833: URL: https://github.com/apache/hive/pull/3833#issuecomment-1338477492 like it :) Issue Time Tracking --- Worklog Id: (was: 831229) Time Spent: 0.5h (was: 20m) > Upgrade ORC to 1.8.0 > > > Key: HIVE-26809 > URL: https://issues.apache.org/jira/browse/HIVE-26809 > Project: Hive > Issue Type: Improvement > Affects Versions: 4.0.0 > Reporter: Dmitriy Fingerman > Assignee: Dmitriy Fingerman > Priority: Major > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Work logged] (HIVE-26758) Allow use scratchdir for staging final job
[ https://issues.apache.org/jira/browse/HIVE-26758?focusedWorklogId=831226&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-831226 ] ASF GitHub Bot logged work on HIVE-26758: - Author: ASF GitHub Bot Created on: 05/Dec/22 23:39 Start Date: 05/Dec/22 23:39 Worklog Time Spent: 10m Work Description: yigress commented on PR #3831: URL: https://github.com/apache/hive/pull/3831#issuecomment-1338364244 thanks @sunchao for the review! addressed comments Issue Time Tracking --- Worklog Id: (was: 831226) Time Spent: 3h 40m (was: 3.5h) > Allow use scratchdir for staging final job > -- > > Key: HIVE-26758 > URL: https://issues.apache.org/jira/browse/HIVE-26758 > Project: Hive > Issue Type: New Feature > Components: Query Planning > Affects Versions: 4.0.0-alpha-2 > Reporter: Yi Zhang > Assignee: Yi Zhang > Priority: Minor > Labels: pull-request-available > Time Spent: 3h 40m > Remaining Estimate: 0h > > The query results are staged in a stagingdir that is relative to the > destination path // > during blobstorage optimization HIVE-17620 the final job is set to use the stagingdir. > HIVE-15215 mentioned the possibility of using the scratchdir for staging when writing > to S3, but that was a long time ago and saw no activity. > > This is to allow the final job to use hive.exec.scratchdir like the interim jobs, > gated by a configuration: > hive.use.scratchdir.for.staging > This is useful for cross-filesystem writes: the user can stage on the local source > filesystem instead of the remote filesystem. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Work logged] (HIVE-26758) Allow use scratchdir for staging final job
[ https://issues.apache.org/jira/browse/HIVE-26758?focusedWorklogId=831223=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-831223 ] ASF GitHub Bot logged work on HIVE-26758: - Author: ASF GitHub Bot Created on: 05/Dec/22 23:23 Start Date: 05/Dec/22 23:23 Worklog Time Spent: 10m Work Description: sunchao commented on code in PR #3831: URL: https://github.com/apache/hive/pull/3831#discussion_r1040208683 ## ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java: ## @@ -1971,13 +1971,19 @@ public static Path createMoveTask(Task currTask, boolean chDir, * 2. INSERT operation on full ACID table */ if (!isMmTable && !isDirectInsert) { -// generate the temporary file -// it must be on the same file system as the current destination Context baseCtx = parseCtx.getContext(); -// Create the required temporary file in the HDFS location if the destination -// path of the FileSinkOperator table is a blobstore path. -Path tmpDir = baseCtx.getTempDirForFinalJobPath(fileSinkDesc.getDestPath()); +// Choose location of required temporary file +Path tmpDir = null; +if (hconf.getBoolVar(ConfVars.HIVE_USE_SCRATCHDIR_FOR_STAGING)) { + tmpDir = baseCtx.getTempDirForInterimJobPath(fileSinkDesc.getDestPath()); +} else { + tmpDir = baseCtx.getTempDirForFinalJobPath(fileSinkDesc.getDestPath()); +} +DynamicPartitionCtx dpCtx = fileSinkDesc.getDynPartCtx(); +if (dpCtx != null && dpCtx.getSPPath() != null) { +tmpDir = new Path(tmpDir, dpCtx.getSPPath()); Review Comment: nit: 2 space indentation ## common/src/java/org/apache/hadoop/hive/conf/HiveConf.java: ## @@ -5629,6 +5629,10 @@ public static enum ConfVars { "This is a performance optimization that forces the final FileSinkOperator to write to the blobstore.\n" + "See HIVE-15121 for details."), +HIVE_USE_SCRATCHDIR_FOR_STAGING("hive.use.scratchdir.for.staging", false, +"Use ${hive.exec.scratchdir} for query results instead of ${hive.exec.stagingdir}.\n" + +"This stages query results in 
${hive.exec.scratchdir} before move to final destination."), Review Comment: nit: move -> moving ## ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java: ## @@ -2608,8 +2608,8 @@ private Partition loadPartitionInternal(Path loadPath, Table tbl, Map Allow use scratchdir for staging final job > -- > > Key: HIVE-26758 > URL: https://issues.apache.org/jira/browse/HIVE-26758 > Project: Hive > Issue Type: New Feature > Components: Query Planning >Affects Versions: 4.0.0-alpha-2 >Reporter: Yi Zhang >Assignee: Yi Zhang >Priority: Minor > Labels: pull-request-available > Time Spent: 3.5h > Remaining Estimate: 0h > > The query results are staged in stagingdir that is relative to the > destination path // > during blobstorage optimzation HIVE-17620 final job is set to use stagingdir. > HIVE-15215 mentioned the possibility of using scratch for staging when write > to S3 but it was long time ago and no activity. > > This is to allow final job to use hive.exec.scratchdir as the interim jobs, > with a configuration > hive.use.scratchdir.for.staging > This is useful for cross Filesystem, user can use local source filesystem > instead of remote filesystem for the staging. -- This message was sent by Atlassian Jira (v8.20.10#820010)
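The hunk under review picks the temporary directory from the new flag and then appends the static-partition subpath when a dynamic-partition context carries one. A minimal stand-alone sketch of that selection logic follows; it uses plain java.nio paths instead of Hadoop's Path, and the directory layouts returned by interimDir/finalDir are illustrative assumptions, not Hive's actual resolution of hive.exec.scratchdir and hive.exec.stagingdir.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class StagingDirSelector {
    // Hypothetical stand-ins for Context.getTempDirForInterimJobPath /
    // getTempDirForFinalJobPath in the patch; the returned locations are
    // made up for illustration.
    static Path interimDir(Path dest) {
        return Paths.get("/tmp/hive/scratch").resolve(dest.getFileName());
    }

    static Path finalDir(Path dest) {
        return dest.resolveSibling(".hive-staging");
    }

    // Mirrors the reviewed hunk: choose the scratch-based location when the
    // flag is on, then append the static-partition subpath when one exists.
    static Path chooseTmpDir(Path destPath, boolean useScratchForStaging, String spPath) {
        Path tmpDir = useScratchForStaging ? interimDir(destPath) : finalDir(destPath);
        if (spPath != null) {
            tmpDir = tmpDir.resolve(spPath); // e.g. "ds=2022-12-05"
        }
        return tmpDir;
    }
}
```

With this shape, the flag only swaps the base directory; the static-partition suffix handling is identical on both branches, which is the structure the reviewer's indentation nit applies to.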
[jira] [Work logged] (HIVE-26809) Upgrade ORC to 1.8.0
[ https://issues.apache.org/jira/browse/HIVE-26809?focusedWorklogId=831219&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-831219 ] ASF GitHub Bot logged work on HIVE-26809: - Author: ASF GitHub Bot Created on: 05/Dec/22 23:06 Start Date: 05/Dec/22 23:06 Worklog Time Spent: 10m Work Description: sonarcloud[bot] commented on PR #3833: URL: https://github.com/apache/hive/pull/3833#issuecomment-1338304065 Kudos, SonarCloud Quality Gate passed! 0 Bugs, 0 Vulnerabilities, 0 Security Hotspots, 1 Code Smell, No Coverage information, No Duplication information Issue Time Tracking --- Worklog Id: (was: 831219) Time Spent: 20m (was: 10m) > Upgrade ORC to 1.8.0 > > > Key: HIVE-26809 > URL: https://issues.apache.org/jira/browse/HIVE-26809 > Project: Hive > Issue Type: Improvement > Affects Versions: 4.0.0 > Reporter: Dmitriy Fingerman > Assignee: Dmitriy Fingerman > Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-26809) Upgrade ORC to 1.8.0
[ https://issues.apache.org/jira/browse/HIVE-26809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-26809: -- Labels: pull-request-available (was: ) > Upgrade ORC to 1.8.0 > > > Key: HIVE-26809 > URL: https://issues.apache.org/jira/browse/HIVE-26809 > Project: Hive > Issue Type: Improvement >Affects Versions: 4.0.0 >Reporter: Dmitriy Fingerman >Assignee: Dmitriy Fingerman >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Work logged] (HIVE-26809) Upgrade ORC to 1.8.0
[ https://issues.apache.org/jira/browse/HIVE-26809?focusedWorklogId=831207=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-831207 ] ASF GitHub Bot logged work on HIVE-26809: - Author: ASF GitHub Bot Created on: 05/Dec/22 22:11 Start Date: 05/Dec/22 22:11 Worklog Time Spent: 10m Work Description: difin opened a new pull request, #3833: URL: https://github.com/apache/hive/pull/3833 ### What changes were proposed in this pull request? Upgrading ORC version to currently latest version 1.8.0. This PR is based on the changes proposed in unfinished PR https://github.com/apache/hive/pull/2853 (ticket https://issues.apache.org/jira/browse/HIVE-25497 - Bump ORC to 1.7.2) with changes on top of it which enabled CI to pass. Changes done in HIVE-25497: "LLAP EncodedTreeReaderFactory is implementing its own TreeReaderFactory Issue Time Tracking --- Worklog Id: (was: 831207) Remaining Estimate: 0h Time Spent: 10m > Upgrade ORC to 1.8.0 > > > Key: HIVE-26809 > URL: https://issues.apache.org/jira/browse/HIVE-26809 > Project: Hive > Issue Type: Improvement >Affects Versions: 4.0.0 >Reporter: Dmitriy Fingerman >Assignee: Dmitriy Fingerman >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-26809) Upgrade ORC to 1.8.0
[ https://issues.apache.org/jira/browse/HIVE-26809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitriy Fingerman updated HIVE-26809: - Affects Version/s: 4.0.0 > Upgrade ORC to 1.8.0 > > > Key: HIVE-26809 > URL: https://issues.apache.org/jira/browse/HIVE-26809 > Project: Hive > Issue Type: Improvement >Affects Versions: 4.0.0 >Reporter: Dmitriy Fingerman >Assignee: Dmitriy Fingerman >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (HIVE-26809) Upgrade ORC to 1.8.0
[ https://issues.apache.org/jira/browse/HIVE-26809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitriy Fingerman reassigned HIVE-26809: > Upgrade ORC to 1.8.0 > > > Key: HIVE-26809 > URL: https://issues.apache.org/jira/browse/HIVE-26809 > Project: Hive > Issue Type: Improvement >Reporter: Dmitriy Fingerman >Assignee: Dmitriy Fingerman >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Work logged] (HIVE-23559) Optimise Hive::moveAcidFiles for cloud storage
[ https://issues.apache.org/jira/browse/HIVE-23559?focusedWorklogId=831201=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-831201 ] ASF GitHub Bot logged work on HIVE-23559: - Author: ASF GitHub Bot Created on: 05/Dec/22 21:51 Start Date: 05/Dec/22 21:51 Worklog Time Spent: 10m Work Description: ramesh0201 commented on code in PR #3795: URL: https://github.com/apache/hive/pull/3795#discussion_r1040134949 ## ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java: ## @@ -5208,55 +5208,94 @@ private static void moveAcidFiles(String deltaFileType, PathFilter pathFilter, F } LOG.debug("Acid move found " + deltaStats.length + " " + deltaFileType + " files"); +List> futures = new LinkedList<>(); +final ExecutorService pool = conf.getInt(ConfVars.HIVE_MOVE_FILES_THREAD_COUNT.varname, 25) > 0 ? + Executors.newFixedThreadPool(conf.getInt(ConfVars.HIVE_MOVE_FILES_THREAD_COUNT.varname, 25), +new ThreadFactoryBuilder().setDaemon(true).setNameFormat("Move-Acid-Files-Thread-%d").build()) : null; + for (FileStatus deltaStat : deltaStats) { - Path deltaPath = deltaStat.getPath(); - // Create the delta directory. Don't worry if it already exists, - // as that likely means another task got to it first. Then move each of the buckets. - // it would be more efficient to try to move the delta with it's buckets but that is - // harder to make race condition proof. - Path deltaDest = new Path(dst, deltaPath.getName()); - try { -if (!createdDeltaDirs.contains(deltaDest)) { - try { -if(fs.mkdirs(deltaDest)) { - try { - fs.rename(AcidUtils.OrcAcidVersion.getVersionFilePath(deltaStat.getPath()), -AcidUtils.OrcAcidVersion.getVersionFilePath(deltaDest)); - } catch (FileNotFoundException fnf) { -// There might be no side file. Skip in this case. 
- } + + if (null == pool) { +moveAcidFilesForDelta(deltaFileType, fs, dst, createdDeltaDirs, newFiles, deltaStat); + } else { +futures.add(pool.submit(new Callable() { + @Override + public Void call() throws HiveException { +try { + moveAcidFilesForDelta(deltaFileType, fs, dst, createdDeltaDirs, newFiles, deltaStat); +} catch (Exception e) { + final String poolMsg = + "Unable to move source " + deltaStat.getPath().getName() + " to destination " + dst.getName(); + throw getHiveException(e, poolMsg); } -createdDeltaDirs.add(deltaDest); - } catch (IOException swallowIt) { -// Don't worry about this, as it likely just means it's already been created. -LOG.info("Unable to create " + deltaFileType + " directory " + deltaDest + -", assuming it already exists: " + swallowIt.getMessage()); +return null; } +})); + } +} + +if (null != pool) { + pool.shutdown(); + for (Future future : futures) { +try { + future.get(); +} catch (Exception e) { + throw handlePoolException(pool, e); } -FileStatus[] bucketStats = fs.listStatus(deltaPath, AcidUtils.bucketFileFilter); -LOG.debug("Acid move found " + bucketStats.length + " bucket files"); -for (FileStatus bucketStat : bucketStats) { - Path bucketSrc = bucketStat.getPath(); - Path bucketDest = new Path(deltaDest, bucketSrc.getName()); - final String msg = "Unable to move source " + bucketSrc + " to destination " + - bucketDest; - LOG.info("Moving bucket " + bucketSrc.toUri().toString() + " to " + - bucketDest.toUri().toString()); - try { -fs.rename(bucketSrc, bucketDest); -if (newFiles != null) { - newFiles.add(bucketDest); + } +} + } + + private static void moveAcidFilesForDelta(String deltaFileType, FileSystem fs, +Path dst, Set createdDeltaDirs, +List newFiles, FileStatus deltaStat) throws HiveException { + +Path deltaPath = deltaStat.getPath(); +// Create the delta directory. Don't worry if it already exists, +// as that likely means another task got to it first. Then move each of the buckets. 
+// it would be more efficient to try to move the delta with it's buckets but that is +// harder to make race condition proof. +Path deltaDest = new Path(dst, deltaPath.getName()); +try { + if (!createdDeltaDirs.contains(deltaDest)) { +try { + if(fs.mkdirs(deltaDest)) { +try { +
[jira] [Work logged] (HIVE-23559) Optimise Hive::moveAcidFiles for cloud storage
[ https://issues.apache.org/jira/browse/HIVE-23559?focusedWorklogId=831200=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-831200 ] ASF GitHub Bot logged work on HIVE-23559: - Author: ASF GitHub Bot Created on: 05/Dec/22 21:50 Start Date: 05/Dec/22 21:50 Worklog Time Spent: 10m Work Description: ramesh0201 commented on code in PR #3795: URL: https://github.com/apache/hive/pull/3795#discussion_r1040133418 ## ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java: ## @@ -5208,55 +5208,94 @@ private static void moveAcidFiles(String deltaFileType, PathFilter pathFilter, F } LOG.debug("Acid move found " + deltaStats.length + " " + deltaFileType + " files"); +List> futures = new LinkedList<>(); +final ExecutorService pool = conf.getInt(ConfVars.HIVE_MOVE_FILES_THREAD_COUNT.varname, 25) > 0 ? + Executors.newFixedThreadPool(conf.getInt(ConfVars.HIVE_MOVE_FILES_THREAD_COUNT.varname, 25), +new ThreadFactoryBuilder().setDaemon(true).setNameFormat("Move-Acid-Files-Thread-%d").build()) : null; + for (FileStatus deltaStat : deltaStats) { - Path deltaPath = deltaStat.getPath(); - // Create the delta directory. Don't worry if it already exists, - // as that likely means another task got to it first. Then move each of the buckets. - // it would be more efficient to try to move the delta with it's buckets but that is - // harder to make race condition proof. - Path deltaDest = new Path(dst, deltaPath.getName()); - try { -if (!createdDeltaDirs.contains(deltaDest)) { - try { -if(fs.mkdirs(deltaDest)) { - try { - fs.rename(AcidUtils.OrcAcidVersion.getVersionFilePath(deltaStat.getPath()), -AcidUtils.OrcAcidVersion.getVersionFilePath(deltaDest)); - } catch (FileNotFoundException fnf) { -// There might be no side file. Skip in this case. 
- } + + if (null == pool) { +moveAcidFilesForDelta(deltaFileType, fs, dst, createdDeltaDirs, newFiles, deltaStat); + } else { +futures.add(pool.submit(new Callable() { + @Override + public Void call() throws HiveException { +try { + moveAcidFilesForDelta(deltaFileType, fs, dst, createdDeltaDirs, newFiles, deltaStat); +} catch (Exception e) { + final String poolMsg = + "Unable to move source " + deltaStat.getPath().getName() + " to destination " + dst.getName(); + throw getHiveException(e, poolMsg); } -createdDeltaDirs.add(deltaDest); - } catch (IOException swallowIt) { -// Don't worry about this, as it likely just means it's already been created. -LOG.info("Unable to create " + deltaFileType + " directory " + deltaDest + -", assuming it already exists: " + swallowIt.getMessage()); +return null; } +})); + } +} Review Comment: I think we need to handle the thread interruption. We might need to cancel the running futures and and interrupt the current thread. Issue Time Tracking --- Worklog Id: (was: 831200) Time Spent: 0.5h (was: 20m) > Optimise Hive::moveAcidFiles for cloud storage > -- > > Key: HIVE-23559 > URL: https://issues.apache.org/jira/browse/HIVE-23559 > Project: Hive > Issue Type: Improvement >Reporter: Rajesh Balamohan >Assignee: Dmitriy Fingerman >Priority: Major > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java#L4752] > It ends up transferring DELTA, DELETE_DELTA, BASE prefixes sequentially from > staging to final location. > This causes delays even with simple updates statements, which updates smaller > number of records in cloud storage. -- This message was sent by Atlassian Jira (v8.20.10#820010)
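The patch discussed above fans the per-delta moves out to a fixed thread pool (falling back to sequential moves when the pool is disabled) and then waits on the futures; the review comment asks that interruption be handled by cancelling outstanding futures and restoring the interrupt flag. A self-contained sketch of that pattern, using plain Callables in place of the actual filesystem renames (all names here are illustrative, not Hive's):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelMover {
    // Runs the move tasks on a fixed pool, or inline when threads <= 0
    // (mirroring the "null pool" branch in the patch). On interruption it
    // cancels the outstanding futures and restores the interrupt flag,
    // which is what the review comment asks for.
    static void moveAll(List<Callable<Void>> moves, int threads) throws ExecutionException {
        if (threads <= 0) {
            for (Callable<Void> m : moves) {
                try {
                    m.call();
                } catch (Exception e) {
                    throw new ExecutionException(e);
                }
            }
            return;
        }
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        List<Future<Void>> futures = new ArrayList<>();
        for (Callable<Void> m : moves) {
            futures.add(pool.submit(m));
        }
        pool.shutdown(); // no new tasks; submitted ones keep running
        try {
            for (Future<Void> f : futures) {
                f.get(); // surface the first failure
            }
        } catch (InterruptedException ie) {
            for (Future<Void> f : futures) {
                f.cancel(true); // stop work that has not completed yet
            }
            Thread.currentThread().interrupt(); // preserve interrupt status
        } finally {
            pool.shutdownNow();
        }
    }
}
```

The sequential branch matters for object stores too: when renames are cheap (HDFS) a pool size of 0 avoids thread overhead, while on blobstores the parallel branch hides per-object rename latency.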
[jira] [Commented] (HIVE-26806) Precommit tests in CI are timing out after HIVE-26796
[ https://issues.apache.org/jira/browse/HIVE-26806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17643560#comment-17643560 ] Stamatis Zampetakis commented on HIVE-26806: [~asolimando] For documentation purposes, can you elaborate on what happened after deleting all successful builds? What was the problem that you observed? > Precommit tests in CI are timing out after HIVE-26796 > - > > Key: HIVE-26806 > URL: https://issues.apache.org/jira/browse/HIVE-26806 > Project: Hive > Issue Type: Bug > Components: Testing Infrastructure > Reporter: Stamatis Zampetakis > Assignee: Stamatis Zampetakis > Priority: Major > > http://ci.hive.apache.org/job/hive-precommit/job/master/1506/ > {noformat} > Cancelling nested steps due to timeout > 15:22:08 Sending interrupt signal to process > 15:22:08 Killing processes > 15:22:09 kill finished with exit code 0 > 15:22:19 Terminated > 15:22:19 script returned exit code 143 > [Pipeline] } > [Pipeline] // withEnv > [Pipeline] } > 15:22:19 Deleting 1 temporary files > [Pipeline] // configFileProvider > [Pipeline] } > [Pipeline] // stage > [Pipeline] stage > [Pipeline] { (PostProcess) > [Pipeline] sh > [Pipeline] sh > [Pipeline] sh > [Pipeline] junit > 15:22:25 Recording test results > 15:22:32 [Checks API] No suitable checks publisher found. > [Pipeline] } > [Pipeline] // stage > [Pipeline] } > [Pipeline] // container > [Pipeline] } > [Pipeline] // node > [Pipeline] } > [Pipeline] // timeout > [Pipeline] } > [Pipeline] // podTemplate > [Pipeline] } > 15:22:32 Failed in branch split-01 > [Pipeline] // parallel > [Pipeline] } > [Pipeline] // stage > [Pipeline] stage > [Pipeline] { (Archive) > [Pipeline] podTemplate > [Pipeline] { > [Pipeline] timeout > 15:22:33 Timeout set to expire in 6 hr 0 min > {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Work logged] (HIVE-26770) Make "end of loop" compaction logs appear more selectively
[ https://issues.apache.org/jira/browse/HIVE-26770?focusedWorklogId=831177&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-831177 ] ASF GitHub Bot logged work on HIVE-26770: - Author: ASF GitHub Bot Created on: 05/Dec/22 20:49 Start Date: 05/Dec/22 20:49 Worklog Time Spent: 10m Work Description: sonarcloud[bot] commented on PR #3832: URL: https://github.com/apache/hive/pull/3832#issuecomment-1338144472 Kudos, SonarCloud Quality Gate passed! 1 Bug (rating C), 0 Vulnerabilities, 0 Security Hotspots, 10 Code Smells, No Coverage information, No Duplication information Issue Time Tracking --- Worklog Id: (was: 831177) Time Spent: 5h 20m (was: 5h 10m) > Make "end of loop" compaction logs appear more selectively > -- > > Key: HIVE-26770 > URL: https://issues.apache.org/jira/browse/HIVE-26770 > Project: Hive > Issue Type: Improvement > Affects Versions: 4.0.0-alpha-1 > Reporter: Akshat Mathur > Assignee: Akshat Mathur > Priority: Major > Labels: pull-request-available > Time Spent: 5h 20m > Remaining Estimate: 0h > > Currently Initiator, Worker, and Cleaner threads log something like "finished > one loop" on INFO level. > This is useful to figure out if one of these threads is taking too long to > finish a loop, but expensive in general.
> > Suggested Time: 20mins > Logging this should be changed in the following way: > # If the loop finished within a predefined amount of time, the level should be DEBUG > and the message should look like: *Initiator loop took \{elapsedTime} seconds to > finish.* > # If the loop ran longer than this predefined amount, the level should be WARN and > the message should look like: *Possible Initiator slowdown, loop took > \{elapsedTime} seconds to finish.* -- This message was sent by Atlassian Jira (v8.20.10#820010)
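The two cases suggested in the ticket boil down to picking the log level and message from the elapsed time. A tiny sketch, with a hypothetical threshold (the ticket leaves the "predefined amount of time" unspecified) and a returned string standing in for the actual SLF4J call:

```java
public class LoopLogger {
    // Hypothetical threshold; the ticket does not fix a value.
    static final long SLOW_LOOP_SECONDS = 60;

    // Returns the line that would be logged: DEBUG for a fast loop,
    // WARN for a suspiciously slow one, per the suggested messages.
    static String loopFinished(String component, long elapsedSeconds) {
        if (elapsedSeconds > SLOW_LOOP_SECONDS) {
            return "WARN: Possible " + component + " slowdown, loop took "
                + elapsedSeconds + " seconds to finish.";
        }
        return "DEBUG: " + component + " loop took "
            + elapsedSeconds + " seconds to finish.";
    }
}
```

In the real patch the same branch would select between LOG.debug and LOG.warn, so quiet clusters stop paying for per-loop INFO lines while slow loops still surface.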
[jira] [Work logged] (HIVE-26758) Allow use scratchdir for staging final job
[ https://issues.apache.org/jira/browse/HIVE-26758?focusedWorklogId=831165&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-831165 ] ASF GitHub Bot logged work on HIVE-26758: - Author: ASF GitHub Bot Created on: 05/Dec/22 19:28 Start Date: 05/Dec/22 19:28 Worklog Time Spent: 10m Work Description: sonarcloud[bot] commented on PR #3831: URL: https://github.com/apache/hive/pull/3831#issuecomment-1338025065 Kudos, SonarCloud Quality Gate passed! 0 Bugs, 0 Vulnerabilities, 0 Security Hotspots, 2 Code Smells, No Coverage information, No Duplication information Issue Time Tracking --- Worklog Id: (was: 831165) Time Spent: 3h 20m (was: 3h 10m) > Allow use scratchdir for staging final job > -- > > Key: HIVE-26758 > URL: https://issues.apache.org/jira/browse/HIVE-26758 > Project: Hive > Issue Type: New Feature > Components: Query Planning > Affects Versions: 4.0.0-alpha-2 > Reporter: Yi Zhang > Assignee: Yi Zhang > Priority: Minor > Labels: pull-request-available > Time Spent: 3h 20m > Remaining Estimate: 0h > > The query results are staged in a stagingdir that is relative to the > destination path // > during blobstorage optimization HIVE-17620 the final job is set to use the stagingdir.
> HIVE-15215 mentioned the possibility of using the scratchdir for staging when writing > to S3, but that was a long time ago and saw no activity. > > This is to allow the final job to use hive.exec.scratchdir like the interim jobs, > gated by a configuration: > hive.use.scratchdir.for.staging > This is useful for cross-filesystem writes: the user can stage on the local source > filesystem instead of the remote filesystem. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (HIVE-26762) Remove operand pruning in HiveFilterSetOpTransposeRule
[ https://issues.apache.org/jira/browse/HIVE-26762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krisztian Kasa resolved HIVE-26762. --- Resolution: Fixed Merged to master. Thanks [~asolimando] for the patch. > Remove operand pruning in HiveFilterSetOpTransposeRule > -- > > Key: HIVE-26762 > URL: https://issues.apache.org/jira/browse/HIVE-26762 > Project: Hive > Issue Type: Task > Components: CBO, Query Planning > Affects Versions: 4.0.0-alpha-2 > Reporter: Alessandro Solimando > Assignee: Alessandro Solimando > Priority: Major > Labels: pull-request-available > Time Spent: 1h 10m > Remaining Estimate: 0h > > HiveFilterSetOpTransposeRule, when applied to UNION ALL operands, checks if > the newly pushed filter simplifies to FALSE (due to the predicates holding on > the input). > If this is true and there is more than one UNION ALL operand, the operand gets pruned. > After HIVE-26524 ("Use Calcite to remove sections of a query plan known never > produces rows"), this is possibly redundant and we could drop this feature > and let the other rules take care of the pruning. > In such a case, it might even be possible to drop the Hive-specific rule and > rely on the Calcite one (the difference is just the operand pruning at the > moment of writing), similarly to what HIVE-26642 did for > HiveReduceExpressionRule. Writing it here as a reminder, but it's recommended > to tackle this in a separate ticket after verifying that this is feasible. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Work logged] (HIVE-26762) Remove operand pruning in HiveFilterSetOpTransposeRule
[ https://issues.apache.org/jira/browse/HIVE-26762?focusedWorklogId=831161=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-831161 ] ASF GitHub Bot logged work on HIVE-26762: - Author: ASF GitHub Bot Created on: 05/Dec/22 19:17 Start Date: 05/Dec/22 19:17 Worklog Time Spent: 10m Work Description: kasakrisz merged PR #3825: URL: https://github.com/apache/hive/pull/3825 Issue Time Tracking --- Worklog Id: (was: 831161) Time Spent: 1h 10m (was: 1h) > Remove operand pruning in HiveFilterSetOpTransposeRule > -- > > Key: HIVE-26762 > URL: https://issues.apache.org/jira/browse/HIVE-26762 > Project: Hive > Issue Type: Task > Components: CBO, Query Planning >Affects Versions: 4.0.0-alpha-2 >Reporter: Alessandro Solimando >Assignee: Alessandro Solimando >Priority: Major > Labels: pull-request-available > Time Spent: 1h 10m > Remaining Estimate: 0h > > HiveFilterSetOpTransposeRule, when applied to UNION ALL operands, checks if > the newly pushed filter simplifies to FALSE (due to the predicates holding on > the input). > If this is true and there is more than one UNION ALL operand, it gets pruned. > After HIVE-26524 ("Use Calcite to remove sections of a query plan known never > produces rows"), this is possibly redundant and we could drop this feature > and let the other rules take care of the pruning. > In such a case, it might be even possible to drop the Hive specific rule and > relies on the Calcite one (the difference is just the operand pruning at the > moment of writing), similarly to what HIVE-26642 did for > HiveReduceExpressionRule. Writing it here as a reminder, but it's recommended > to tackle this in a separate ticket after verifying that is feasible. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Work logged] (HIVE-26221) Add histogram-based column statistics
[ https://issues.apache.org/jira/browse/HIVE-26221?focusedWorklogId=831158=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-831158 ] ASF GitHub Bot logged work on HIVE-26221: - Author: ASF GitHub Bot Created on: 05/Dec/22 19:08 Start Date: 05/Dec/22 19:08 Worklog Time Spent: 10m Work Description: sonarcloud[bot] commented on PR #3137: URL: https://github.com/apache/hive/pull/3137#issuecomment-1337993922 Kudos, SonarCloud Quality Gate passed! 0 Bugs, 0 Vulnerabilities, 0 Security Hotspots, 38 Code Smells, no coverage information, no duplication information. Issue Time Tracking --- Worklog Id: (was: 831158) Time Spent: 4h (was: 3h 50m) > Add histogram-based column statistics > - > > Key: HIVE-26221 > URL: https://issues.apache.org/jira/browse/HIVE-26221 > Project: Hive > Issue Type: Improvement > Components: CBO, Metastore, Statistics >Affects Versions: 4.0.0-alpha-2 >Reporter: Alessandro Solimando >Assignee: Alessandro Solimando >Priority: Major > Labels: pull-request-available > Time Spent: 4h > Remaining Estimate: 0h > > Hive does not support histogram statistics, which are particularly useful for > skewed data (which is very common in practice) and range predicates.
> Hive's current selectivity estimation for range predicates is based on a > hard-coded value of 1/3 (see > [FilterSelectivityEstimator.java#L138-L144|https://github.com/apache/hive/blob/56c336268ea8c281d23c22d89271af37cb7e2572/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/FilterSelectivityEstimator.java#L138-L144]). > The current proposal aims at integrating histograms as an additional column > statistic, stored into the Hive metastore at the table (or partition) level. > The main requirements for histogram integration are the following: > * efficiency: the approach must scale and support billions of rows > * merge-ability: partition-level histograms have to be merged to form > table-level histograms > * explicit and configurable trade-off
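One simple structure that satisfies the efficiency and merge-ability requirements above is a fixed-bin equi-width histogram. This is purely illustrative (the actual HIVE-26221 design may use a different sketch); it shows partition-level histograms merging into a table-level one and a range-predicate selectivity estimate that replaces the hard-coded 1/3:

```python
# Illustrative only: a fixed-bin equi-width histogram that is cheap to build,
# mergeable across partitions, and usable for range-predicate selectivity.

def build_hist(values, lo, hi, nbins):
    width = (hi - lo) / nbins
    counts = [0] * nbins
    for v in values:
        counts[min(int((v - lo) / width), nbins - 1)] += 1
    return counts

def merge_hists(hists):
    # Partition-level histograms with identical bins merge by summing counts.
    return [sum(col) for col in zip(*hists)]

def le_selectivity(counts, lo, hi, x):
    # Estimated selectivity of "col <= x", assuming uniformity within a bin.
    width = (hi - lo) / len(counts)
    mass = 0.0
    for i, c in enumerate(counts):
        b_lo = lo + i * width
        if x >= b_lo + width:
            mass += c                      # whole bin below x
        elif x > b_lo:
            mass += c * (x - b_lo) / width  # partial bin
    return mass / sum(counts)

p1 = build_hist(range(0, 50), 0, 100, 10)    # histogram for partition 1
p2 = build_hist(range(50, 100), 0, 100, 10)  # histogram for partition 2
table = merge_hists([p1, p2])
sel = le_selectivity(table, 0, 100, 50)      # data-driven, not a fixed 1/3
```

Merge-ability is what makes the approach scale: each partition scan builds its own histogram, and table-level statistics are assembled without rescanning.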
[jira] [Work logged] (HIVE-26770) Make "end of loop" compaction logs appear more selectively
[ https://issues.apache.org/jira/browse/HIVE-26770?focusedWorklogId=831154=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-831154 ] ASF GitHub Bot logged work on HIVE-26770: - Author: ASF GitHub Bot Created on: 05/Dec/22 18:59 Start Date: 05/Dec/22 18:59 Worklog Time Spent: 10m Work Description: akshat0395 closed pull request #3803: HIVE-26770: Make end of loop compaction logs appear more selectively and reduce code duplication URL: https://github.com/apache/hive/pull/3803 Issue Time Tracking --- Worklog Id: (was: 831154) Time Spent: 5h (was: 4h 50m) > Make "end of loop" compaction logs appear more selectively > -- > > Key: HIVE-26770 > URL: https://issues.apache.org/jira/browse/HIVE-26770 > Project: Hive > Issue Type: Improvement >Affects Versions: 4.0.0-alpha-1 >Reporter: Akshat Mathur >Assignee: Akshat Mathur >Priority: Major > Labels: pull-request-available > Time Spent: 5h > Remaining Estimate: 0h > > Currently Initiator, Worker, and Cleaner threads log something like "finished > one loop" on INFO level. > This is useful to figure out if one of these threads is taking too long to > finish a loop, but expensive in general. > > Suggested Time: 20mins > Logging this should be changed in the following way > # If the loop finished within a predefined amount of time, the level should be DEBUG > and the message should look like: *Initiator loop took \{elapsedTime} seconds to > finish.* > # If the loop ran longer than this predefined amount, the level should be WARN and > the message should look like: *Possible Initiator slowdown, loop took > \{elapsedTime} seconds to finish.* -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Work logged] (HIVE-26770) Make "end of loop" compaction logs appear more selectively
[ https://issues.apache.org/jira/browse/HIVE-26770?focusedWorklogId=831155=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-831155 ] ASF GitHub Bot logged work on HIVE-26770: - Author: ASF GitHub Bot Created on: 05/Dec/22 18:59 Start Date: 05/Dec/22 18:59 Worklog Time Spent: 10m Work Description: akshat0395 opened a new pull request, #3832: URL: https://github.com/apache/hive/pull/3832 ### What changes were proposed in this pull request? Make "end of loop" compaction logs appear more selectively and move duplicate code from Compactor threads to base class, more details can be found in the following ticket [HIVE-26770](https://issues.apache.org/jira/browse/HIVE-26770) ### Why are the changes needed? Improved logging for Compactor threads to reduce noise and share time based stats ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Unit tests Issue Time Tracking --- Worklog Id: (was: 831155) Time Spent: 5h 10m (was: 5h) > Make "end of loop" compaction logs appear more selectively > -- > > Key: HIVE-26770 > URL: https://issues.apache.org/jira/browse/HIVE-26770 > Project: Hive > Issue Type: Improvement >Affects Versions: 4.0.0-alpha-1 >Reporter: Akshat Mathur >Assignee: Akshat Mathur >Priority: Major > Labels: pull-request-available > Time Spent: 5h 10m > Remaining Estimate: 0h > > Currently Initiator, Worker, and Cleaner threads log something like "finished > one loop" on INFO level. > This is useful to figure out if one of these threads is taking too long to > finish a loop, but expensive in general. 
> > Suggested Time: 20mins > Logging this should be changed in the following way > # If the loop finished within a predefined amount of time, the level should be DEBUG > and the message should look like: *Initiator loop took \{elapsedTime} seconds to > finish.* > # If the loop ran longer than this predefined amount, the level should be WARN and > the message should look like: *Possible Initiator slowdown, loop took > \{elapsedTime} seconds to finish.* -- This message was sent by Atlassian Jira (v8.20.10#820010)
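The proposed policy is simple enough to sketch. Illustrative Python only, not the actual Hive Initiator/Worker/Cleaner code; the logger name and threshold parameter are made up:

```python
# Sketch of the proposed end-of-loop logging policy: DEBUG for a fast loop
# iteration, WARN once the elapsed time exceeds a configured threshold.
import logging

def loop_log_level(elapsed_s, threshold_s):
    """Level for the end-of-loop message: DEBUG when fast, WARN when slow."""
    return logging.DEBUG if elapsed_s <= threshold_s else logging.WARNING

def log_loop_duration(name, elapsed_s, threshold_s):
    level = loop_log_level(elapsed_s, threshold_s)
    template = ("Possible %s slowdown, loop took %.1f seconds to finish."
                if level == logging.WARNING
                else "%s loop took %.1f seconds to finish.")
    logging.getLogger("compactor").log(level, template, name, elapsed_s)
    return level
```

This keeps the slow-loop diagnostic (now more visible at WARN) while removing the per-iteration INFO noise.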
[jira] [Work logged] (HIVE-26758) Allow use scratchdir for staging final job
[ https://issues.apache.org/jira/browse/HIVE-26758?focusedWorklogId=831150=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-831150 ] ASF GitHub Bot logged work on HIVE-26758: - Author: ASF GitHub Bot Created on: 05/Dec/22 18:37 Start Date: 05/Dec/22 18:37 Worklog Time Spent: 10m Work Description: yigress opened a new pull request, #3831: URL: https://github.com/apache/hive/pull/3831 ### What changes were proposed in this pull request? 1. add a hive configuration hive.use.scratchdir.for.staging 2. for native table, no-mm, no-direct-insert, no-acid, change dynamic partition staging directory layout from /// to /// 3. when hive.use.scratchdir.for.staging=true, FileSinkOperator's dirName, DynamicContext's sourcePath change from /{hive.exec.stagingdir} to for example for query insert into/overwrite table partition(year=2001, season) select... before the change, the FileSinkOperator conf has /year=2001/.staging_dir/season=xxx after the change, it has /.staging_dir/year=2001/season=xxx This change allows swapping with another path such as , and the moveTask will move into ### Why are the changes needed? In the S3 blobstorage optimization, HIVE-15121 / HIVE-17620 changed the interim job path to use hive.exec.scratchdir, and the final job to use hive.exec.stagingdir. https://issues.apache.org/jira/browse/HIVE-15215 is open on whether to use scratch for the staging dir for S3. However, for blobstorage where 'rename' is slow and there is no encryption, it can help performance to use the scratchdir to stage query results and use the MoveTask to copy to blobstorage. This is especially true when there is a FileMerge task. This may also help cross-filesystem use, when a user wants to use the local cluster filesystem to stage query results and move the results to the destination filesystem. ### Does this PR introduce _any_ user-facing change? This adds a new hive configuration. ### How was this patch tested?
Issue Time Tracking --- Worklog Id: (was: 831150) Time Spent: 3h 10m (was: 3h) > Allow use scratchdir for staging final job > -- > > Key: HIVE-26758 > URL: https://issues.apache.org/jira/browse/HIVE-26758 > Project: Hive > Issue Type: New Feature > Components: Query Planning >Affects Versions: 4.0.0-alpha-2 >Reporter: Yi Zhang >Assignee: Yi Zhang >Priority: Minor > Labels: pull-request-available > Time Spent: 3h 10m > Remaining Estimate: 0h > > The query results are staged in stagingdir that is relative to the > destination path // > during blobstorage optimzation HIVE-17620 final job is set to use stagingdir. > HIVE-15215 mentioned the possibility of using scratch for staging when write > to S3 but it was long time ago and no activity. > > This is to allow final job to use hive.exec.scratchdir as the interim jobs, > with a configuration > hive.use.scratchdir.for.staging > This is useful for cross Filesystem, user can use local source filesystem > instead of remote filesystem for the staging. -- This message was sent by Atlassian Jira (v8.20.10#820010)
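The before/after layouts quoted in the PR description can be sketched as path construction. The helper below is hypothetical, not Hive's actual FileSinkOperator logic:

```python
# Hypothetical sketch of the two staging-dir layouts described in the PR:
# nesting the staging dir under the static partition dirs (old) versus
# putting it above all partition dirs (proposed).
import posixpath

def staging_path(dest, static_parts, dyn_parts, staging=".staging_dir",
                 staging_above_partitions=True):
    if staging_above_partitions:  # proposed layout: one movable staging tree
        parts = [f"{k}={v}" for k, v in static_parts + dyn_parts]
        return posixpath.join(dest, staging, *parts)
    # old layout: staging nested under the static partition dirs
    static = [f"{k}={v}" for k, v in static_parts]
    dyn = [f"{k}={v}" for k, v in dyn_parts]
    return posixpath.join(dest, *static, staging, *dyn)

old = staging_path("/warehouse/t", [("year", 2001)], [("season", "xxx")],
                   staging_above_partitions=False)
new = staging_path("/warehouse/t", [("year", 2001)], [("season", "xxx")])
# old: /warehouse/t/year=2001/.staging_dir/season=xxx
# new: /warehouse/t/.staging_dir/year=2001/season=xxx
```

With the staging dir above the partition dirs, everything under `.staging_dir` forms one subtree that can live on hive.exec.scratchdir (possibly a different filesystem) and be moved to the destination by a single MoveTask.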
[jira] [Work logged] (HIVE-26758) Allow use scratchdir for staging final job
[ https://issues.apache.org/jira/browse/HIVE-26758?focusedWorklogId=831142=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-831142 ] ASF GitHub Bot logged work on HIVE-26758: - Author: ASF GitHub Bot Created on: 05/Dec/22 18:21 Start Date: 05/Dec/22 18:21 Worklog Time Spent: 10m Work Description: yigress closed pull request #3781: HIVE-26758: Allow use scratchdir for staging final job URL: https://github.com/apache/hive/pull/3781 Issue Time Tracking --- Worklog Id: (was: 831142) Time Spent: 3h (was: 2h 50m) > Allow use scratchdir for staging final job > -- > > Key: HIVE-26758 > URL: https://issues.apache.org/jira/browse/HIVE-26758 > Project: Hive > Issue Type: New Feature > Components: Query Planning >Affects Versions: 4.0.0-alpha-2 >Reporter: Yi Zhang >Assignee: Yi Zhang >Priority: Minor > Labels: pull-request-available > Time Spent: 3h > Remaining Estimate: 0h > > The query results are staged in stagingdir that is relative to the > destination path // > during blobstorage optimzation HIVE-17620 final job is set to use stagingdir. > HIVE-15215 mentioned the possibility of using scratch for staging when write > to S3 but it was long time ago and no activity. > > This is to allow final job to use hive.exec.scratchdir as the interim jobs, > with a configuration > hive.use.scratchdir.for.staging > This is useful for cross Filesystem, user can use local source filesystem > instead of remote filesystem for the staging. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Work logged] (HIVE-26758) Allow use scratchdir for staging final job
[ https://issues.apache.org/jira/browse/HIVE-26758?focusedWorklogId=831141=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-831141 ] ASF GitHub Bot logged work on HIVE-26758: - Author: ASF GitHub Bot Created on: 05/Dec/22 18:21 Start Date: 05/Dec/22 18:21 Worklog Time Spent: 10m Work Description: yigress commented on PR #3781: URL: https://github.com/apache/hive/pull/3781#issuecomment-1337898840 close this one and to create a new PR due to testing issue https://issues.apache.org/jira/browse/HIVE-26806 Issue Time Tracking --- Worklog Id: (was: 831141) Time Spent: 2h 50m (was: 2h 40m) > Allow use scratchdir for staging final job > -- > > Key: HIVE-26758 > URL: https://issues.apache.org/jira/browse/HIVE-26758 > Project: Hive > Issue Type: New Feature > Components: Query Planning >Affects Versions: 4.0.0-alpha-2 >Reporter: Yi Zhang >Assignee: Yi Zhang >Priority: Minor > Labels: pull-request-available > Time Spent: 2h 50m > Remaining Estimate: 0h > > The query results are staged in stagingdir that is relative to the > destination path // > during blobstorage optimzation HIVE-17620 final job is set to use stagingdir. > HIVE-15215 mentioned the possibility of using scratch for staging when write > to S3 but it was long time ago and no activity. > > This is to allow final job to use hive.exec.scratchdir as the interim jobs, > with a configuration > hive.use.scratchdir.for.staging > This is useful for cross Filesystem, user can use local source filesystem > instead of remote filesystem for the staging. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-26685) Improve Path name escaping / unescaping performance
[ https://issues.apache.org/jira/browse/HIVE-26685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Zheng updated HIVE-26685: - Assignee: James Petty Resolution: Fixed Status: Resolved (was: Patch Available) > Improve Path name escaping / unescaping performance > --- > > Key: HIVE-26685 > URL: https://issues.apache.org/jira/browse/HIVE-26685 > Project: Hive > Issue Type: Improvement > Components: Hive >Affects Versions: All Versions >Reporter: James Petty >Assignee: James Petty >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > > Attachments: HIVE-26685.1.patch > > Time Spent: 1h 10m > Remaining Estimate: 0h > > When escaping / unescaping partition path part names, the existing logic > incurs significant avoidable overhead by copying each character sequentially > into a new StringBuilder even when no escaping/unescaping is necessary as > well as using String.format to escape characters inside of the inner loop. > > The included patch to improve the performance of these operations refactors > two static method implementations, but requires no external API surface or > user-visible behavior changes. This change is applicable and portable to a > wide range of Hive versions from branch-0.6 onward when the initial method > implementations were added. -- This message was sent by Atlassian Jira (v8.20.10#820010)
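The described optimization (skip the copy entirely when nothing needs escaping, and precompute escape codes instead of formatting inside the inner loop) can be sketched like this. Illustrative Python, not Hive's Java implementation; the special-character set is hypothetical, not Hive's real list:

```python
# Illustrative fast-path version: scan first and return the input untouched
# when no escapable character is present; precompute escape codes rather than
# calling a String.format-style routine per character.
SPECIALS = set(':/#?=%')  # hypothetical set; Hive's actual list differs
ESCAPES = {c: '%%%02X' % ord(c) for c in SPECIALS}  # e.g. '=' -> '%3D'

def escape_path_name(s):
    if not any(c in SPECIALS for c in s):
        return s  # fast path: no StringBuilder-style copy at all
    return ''.join(ESCAPES.get(c, c) for c in s)
```

The same shape applies to unescaping: look for a '%' first, and only allocate a new string when one is actually found.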
[jira] [Updated] (HIVE-26685) Improve Path name escaping / unescaping performance
[ https://issues.apache.org/jira/browse/HIVE-26685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Zheng updated HIVE-26685: - Fix Version/s: 4.0.0 > Improve Path name escaping / unescaping performance > --- > > Key: HIVE-26685 > URL: https://issues.apache.org/jira/browse/HIVE-26685 > Project: Hive > Issue Type: Improvement > Components: Hive >Affects Versions: All Versions >Reporter: James Petty >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > > Attachments: HIVE-26685.1.patch > > Time Spent: 1h 10m > Remaining Estimate: 0h > > When escaping / unescaping partition path part names, the existing logic > incurs significant avoidable overhead by copying each character sequentially > into a new StringBuilder even when no escaping/unescaping is necessary as > well as using String.format to escape characters inside of the inner loop. > > The included patch to improve the performance of these operations refactors > two static method implementations, but requires no external API surface or > user-visible behavior changes. This change is applicable and portable to a > wide range of Hive versions from branch-0.6 onward when the initial method > implementations were added. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Work logged] (HIVE-26685) Improve Path name escaping / unescaping performance
[ https://issues.apache.org/jira/browse/HIVE-26685?focusedWorklogId=831136=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-831136 ] ASF GitHub Bot logged work on HIVE-26685: - Author: ASF GitHub Bot Created on: 05/Dec/22 18:04 Start Date: 05/Dec/22 18:04 Worklog Time Spent: 10m Work Description: weiatwork commented on PR #3721: URL: https://github.com/apache/hive/pull/3721#issuecomment-1337872037 Thanks Zoltan! That helped a lot. Going to merge this PR. Issue Time Tracking --- Worklog Id: (was: 831136) Time Spent: 1h (was: 50m) > Improve Path name escaping / unescaping performance > --- > > Key: HIVE-26685 > URL: https://issues.apache.org/jira/browse/HIVE-26685 > Project: Hive > Issue Type: Improvement > Components: Hive >Affects Versions: All Versions >Reporter: James Petty >Priority: Minor > Labels: pull-request-available > Attachments: HIVE-26685.1.patch > > Time Spent: 1h > Remaining Estimate: 0h > > When escaping / unescaping partition path part names, the existing logic > incurs significant avoidable overhead by copying each character sequentially > into a new StringBuilder even when no escaping/unescaping is necessary as > well as using String.format to escape characters inside of the inner loop. > > The included patch to improve the performance of these operations refactors > two static method implementations, but requires no external API surface or > user-visible behavior changes. This change is applicable and portable to a > wide range of Hive versions from branch-0.6 onward when the initial method > implementations were added. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Work logged] (HIVE-26685) Improve Path name escaping / unescaping performance
[ https://issues.apache.org/jira/browse/HIVE-26685?focusedWorklogId=831137=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-831137 ] ASF GitHub Bot logged work on HIVE-26685: - Author: ASF GitHub Bot Created on: 05/Dec/22 18:04 Start Date: 05/Dec/22 18:04 Worklog Time Spent: 10m Work Description: weiatwork merged PR #3721: URL: https://github.com/apache/hive/pull/3721 Issue Time Tracking --- Worklog Id: (was: 831137) Time Spent: 1h 10m (was: 1h) > Improve Path name escaping / unescaping performance > --- > > Key: HIVE-26685 > URL: https://issues.apache.org/jira/browse/HIVE-26685 > Project: Hive > Issue Type: Improvement > Components: Hive >Affects Versions: All Versions >Reporter: James Petty >Priority: Minor > Labels: pull-request-available > Attachments: HIVE-26685.1.patch > > Time Spent: 1h 10m > Remaining Estimate: 0h > > When escaping / unescaping partition path part names, the existing logic > incurs significant avoidable overhead by copying each character sequentially > into a new StringBuilder even when no escaping/unescaping is necessary as > well as using String.format to escape characters inside of the inner loop. > > The included patch to improve the performance of these operations refactors > two static method implementations, but requires no external API surface or > user-visible behavior changes. This change is applicable and portable to a > wide range of Hive versions from branch-0.6 onward when the initial method > implementations were added. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Work logged] (HIVE-26685) Improve Path name escaping / unescaping performance
[ https://issues.apache.org/jira/browse/HIVE-26685?focusedWorklogId=831129=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-831129 ] ASF GitHub Bot logged work on HIVE-26685: - Author: ASF GitHub Bot Created on: 05/Dec/22 17:34 Start Date: 05/Dec/22 17:34 Worklog Time Spent: 10m Work Description: kgyrtkirk commented on PR #3721: URL: https://github.com/apache/hive/pull/3721#issuecomment-1337801725 @weiatwork you should follow something like [this](https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=142642065) to link your asf/github accounts together; after you have access you should see this group: https://github.com/orgs/apache/teams/hive-committers Issue Time Tracking --- Worklog Id: (was: 831129) Time Spent: 50m (was: 40m) > Improve Path name escaping / unescaping performance > --- > > Key: HIVE-26685 > URL: https://issues.apache.org/jira/browse/HIVE-26685 > Project: Hive > Issue Type: Improvement > Components: Hive >Affects Versions: All Versions >Reporter: James Petty >Priority: Minor > Labels: pull-request-available > Attachments: HIVE-26685.1.patch > > Time Spent: 50m > Remaining Estimate: 0h > > When escaping / unescaping partition path part names, the existing logic > incurs significant avoidable overhead by copying each character sequentially > into a new StringBuilder even when no escaping/unescaping is necessary as > well as using String.format to escape characters inside of the inner loop. > > The included patch to improve the performance of these operations refactors > two static method implementations, but requires no external API surface or > user-visible behavior changes. This change is applicable and portable to a > wide range of Hive versions from branch-0.6 onward when the initial method > implementations were added. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Work logged] (HIVE-26685) Improve Path name escaping / unescaping performance
[ https://issues.apache.org/jira/browse/HIVE-26685?focusedWorklogId=831125=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-831125 ] ASF GitHub Bot logged work on HIVE-26685: - Author: ASF GitHub Bot Created on: 05/Dec/22 17:22 Start Date: 05/Dec/22 17:22 Worklog Time Spent: 10m Work Description: weiatwork commented on PR #3721: URL: https://github.com/apache/hive/pull/3721#issuecomment-1337775392 @kgyrtkirk I don't seem to have write access on Github, although I can still push to the ASF Git repo directly I believe. Anyway for me to get write access here (as a committer, so that I can merge people's PRs)? Issue Time Tracking --- Worklog Id: (was: 831125) Time Spent: 40m (was: 0.5h) > Improve Path name escaping / unescaping performance > --- > > Key: HIVE-26685 > URL: https://issues.apache.org/jira/browse/HIVE-26685 > Project: Hive > Issue Type: Improvement > Components: Hive >Affects Versions: All Versions >Reporter: James Petty >Priority: Minor > Labels: pull-request-available > Attachments: HIVE-26685.1.patch > > Time Spent: 40m > Remaining Estimate: 0h > > When escaping / unescaping partition path part names, the existing logic > incurs significant avoidable overhead by copying each character sequentially > into a new StringBuilder even when no escaping/unescaping is necessary as > well as using String.format to escape characters inside of the inner loop. > > The included patch to improve the performance of these operations refactors > two static method implementations, but requires no external API surface or > user-visible behavior changes. This change is applicable and portable to a > wide range of Hive versions from branch-0.6 onward when the initial method > implementations were added. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Comment Edited] (HIVE-26806) Precommit tests in CI are timing out after HIVE-26796
[ https://issues.apache.org/jira/browse/HIVE-26806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17643470#comment-17643470 ] Alessandro Solimando edited comment on HIVE-26806 at 12/5/22 5:19 PM: -- It looks that deleting all green past runs did not fix for [https://github.com/apache/hive/pull/3137]. That's a big deal since the PR is huge and review is in progress, I don't think I can close and re-open it. Is there a way to tweak timeout for that PR alone [~zabetak]? EDIT: there is, I am using "Replay" in Jenkins so I can change the JenkinsFile for the given run without any change in Git, hopefully that will do the trick. was (Author: asolimando): It looks that deleting all green past runs did not fix for [https://github.com/apache/hive/pull/3137]. That's a big deal since the PR is huge and review is in progress, I don't think I can close and re-open it. Is there a way to tweak timeout for that PR alone [~zabetak]? > Precommit tests in CI are timing out after HIVE-26796 > - > > Key: HIVE-26806 > URL: https://issues.apache.org/jira/browse/HIVE-26806 > Project: Hive > Issue Type: Bug > Components: Testing Infrastructure >Reporter: Stamatis Zampetakis >Assignee: Stamatis Zampetakis >Priority: Major > > http://ci.hive.apache.org/job/hive-precommit/job/master/1506/ > {noformat} > ancelling nested steps due to timeout > 15:22:08 Sending interrupt signal to process > 15:22:08 Killing processes > 15:22:09 kill finished with exit code 0 > 15:22:19 Terminated > 15:22:19 script returned exit code 143 > [Pipeline] } > [Pipeline] // withEnv > [Pipeline] } > 15:22:19 Deleting 1 temporary files > [Pipeline] // configFileProvider > [Pipeline] } > [Pipeline] // stage > [Pipeline] stage > [Pipeline] { (PostProcess) > [Pipeline] sh > [Pipeline] sh > [Pipeline] sh > [Pipeline] junit > 15:22:25 Recording test results > 15:22:32 [Checks API] No suitable checks publisher found. 
> [Pipeline] } > [Pipeline] // stage > [Pipeline] } > [Pipeline] // container > [Pipeline] } > [Pipeline] // node > [Pipeline] } > [Pipeline] // timeout > [Pipeline] } > [Pipeline] // podTemplate > [Pipeline] } > 15:22:32 Failed in branch split-01 > [Pipeline] // parallel > [Pipeline] } > [Pipeline] // stage > [Pipeline] stage > [Pipeline] { (Archive) > [Pipeline] podTemplate > [Pipeline] { > [Pipeline] timeout > 15:22:33 Timeout set to expire in 6 hr 0 min > {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (HIVE-26806) Precommit tests in CI are timing out after HIVE-26796
[ https://issues.apache.org/jira/browse/HIVE-26806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17643470#comment-17643470 ] Alessandro Solimando commented on HIVE-26806: - It looks that deleting all green past runs did not fix for [https://github.com/apache/hive/pull/3137]. That's a big deal since the PR is huge and review is in progress, I don't think I can close and re-open it. Is there a way to tweak timeout for that PR alone [~zabetak]? > Precommit tests in CI are timing out after HIVE-26796 > - > > Key: HIVE-26806 > URL: https://issues.apache.org/jira/browse/HIVE-26806 > Project: Hive > Issue Type: Bug > Components: Testing Infrastructure >Reporter: Stamatis Zampetakis >Assignee: Stamatis Zampetakis >Priority: Major > > http://ci.hive.apache.org/job/hive-precommit/job/master/1506/ > {noformat} > ancelling nested steps due to timeout > 15:22:08 Sending interrupt signal to process > 15:22:08 Killing processes > 15:22:09 kill finished with exit code 0 > 15:22:19 Terminated > 15:22:19 script returned exit code 143 > [Pipeline] } > [Pipeline] // withEnv > [Pipeline] } > 15:22:19 Deleting 1 temporary files > [Pipeline] // configFileProvider > [Pipeline] } > [Pipeline] // stage > [Pipeline] stage > [Pipeline] { (PostProcess) > [Pipeline] sh > [Pipeline] sh > [Pipeline] sh > [Pipeline] junit > 15:22:25 Recording test results > 15:22:32 [Checks API] No suitable checks publisher found. > [Pipeline] } > [Pipeline] // stage > [Pipeline] } > [Pipeline] // container > [Pipeline] } > [Pipeline] // node > [Pipeline] } > [Pipeline] // timeout > [Pipeline] } > [Pipeline] // podTemplate > [Pipeline] } > 15:22:32 Failed in branch split-01 > [Pipeline] // parallel > [Pipeline] } > [Pipeline] // stage > [Pipeline] stage > [Pipeline] { (Archive) > [Pipeline] podTemplate > [Pipeline] { > [Pipeline] timeout > 15:22:33 Timeout set to expire in 6 hr 0 min > {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Work logged] (HIVE-26683) Sum over window produces 0 when row contains null
[ https://issues.apache.org/jira/browse/HIVE-26683?focusedWorklogId=831115=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-831115 ] ASF GitHub Bot logged work on HIVE-26683: - Author: ASF GitHub Bot Created on: 05/Dec/22 16:58 Start Date: 05/Dec/22 16:58 Worklog Time Spent: 10m Work Description: ramesh0201 merged PR #3800: URL: https://github.com/apache/hive/pull/3800 Issue Time Tracking --- Worklog Id: (was: 831115) Time Spent: 2h 10m (was: 2h) > Sum over window produces 0 when row contains null > - > > Key: HIVE-26683 > URL: https://issues.apache.org/jira/browse/HIVE-26683 > Project: Hive > Issue Type: Bug > Components: HiveServer2 >Reporter: Steve Carlin >Assignee: Steve Carlin >Priority: Major > Labels: pull-request-available > Time Spent: 2h 10m > Remaining Estimate: 0h > > Ran the following sql: > > {code:java} > create table sum_window_test_small (id int, tinyint_col tinyint); > insert into sum_window_test_small values (5,5), (10, NULL), (11,1); > select id, > tinyint_col, > sum(tinyint_col) over (order by id nulls last rows between 1 following and 1 > following) > from sum_window_test_small order by id; > select id, > tinyint_col, > sum(tinyint_col) over (order by id nulls last rows between current row and 1 > following) > from sum_window_test_small order by id; > {code} > The result is > {code:java} > +-+--+---+ > | id | tinyint_col | sum_window_0 | > +-+--+---+ > | 5 | 5 | 0 | > | 10 | NULL | 1 | > | 11 | 1 | NULL | > +-+--+---+{code} > The first row should have the sum as NULL > -- This message was sent by Atlassian Jira (v8.20.10#820010)
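The expected NULL semantics from the ticket can be mirrored outside Hive (illustrative Python, with None standing in for SQL NULL): a SUM over a window frame whose values are all NULL must be NULL, never 0.

```python
# Mirror of the expected windowed-SUM semantics: an all-NULL frame yields
# NULL (None), matching the ticket's expectation for the first row.

def windowed_sum(values, lead_lo, lead_hi):
    """values are ordered by the window's ORDER BY; the frame is ROWS BETWEEN
    lead_lo FOLLOWING AND lead_hi FOLLOWING (0 = CURRENT ROW)."""
    out = []
    for i in range(len(values)):
        frame = values[max(0, i + lead_lo): i + lead_hi + 1]
        non_null = [v for v in frame if v is not None]
        out.append(sum(non_null) if non_null else None)  # all-NULL -> NULL
    return out

# The repro above: tinyint_col = [5, None, 1] ordered by id = [5, 10, 11].
first = windowed_sum([5, None, 1], 1, 1)   # 1 FOLLOWING .. 1 FOLLOWING
second = windowed_sum([5, None, 1], 0, 1)  # CURRENT ROW .. 1 FOLLOWING
# first == [None, 1, None]: row id=5 sees only the NULL row, so NULL, not 0.
```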
[jira] [Work logged] (HIVE-26762) Remove operand pruning in HiveFilterSetOpTransposeRule
[ https://issues.apache.org/jira/browse/HIVE-26762?focusedWorklogId=831109&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-831109 ] ASF GitHub Bot logged work on HIVE-26762: - Author: ASF GitHub Bot Created on: 05/Dec/22 16:46 Start Date: 05/Dec/22 16:46 Worklog Time Spent: 10m Work Description: asolimando commented on PR #3825: URL: https://github.com/apache/hive/pull/3825#issuecomment-1337709244 @kasakrisz, tests are green, can we merge this? Issue Time Tracking --- Worklog Id: (was: 831109) Time Spent: 1h (was: 50m) > Remove operand pruning in HiveFilterSetOpTransposeRule > -- > > Key: HIVE-26762 > URL: https://issues.apache.org/jira/browse/HIVE-26762 > Project: Hive > Issue Type: Task > Components: CBO, Query Planning >Affects Versions: 4.0.0-alpha-2 >Reporter: Alessandro Solimando >Assignee: Alessandro Solimando >Priority: Major > Labels: pull-request-available > Time Spent: 1h > Remaining Estimate: 0h > > HiveFilterSetOpTransposeRule, when applied to UNION ALL operands, checks if > the newly pushed filter simplifies to FALSE (due to the predicates holding on > the input). > If this is true and there is more than one UNION ALL operand, the operand gets pruned. > After HIVE-26524 ("Use Calcite to remove sections of a query plan known never > produces rows"), this is possibly redundant and we could drop this feature > and let the other rules take care of the pruning. > In such a case, it might even be possible to drop the Hive-specific rule and > rely on the Calcite one (the difference is just the operand pruning at the > moment of writing), similarly to what HIVE-26642 did for > HiveReduceExpressionRule. Writing it here as a reminder, but it's recommended > to tackle this in a separate ticket after verifying that it is feasible. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Work logged] (HIVE-26221) Add histogram-based column statistics
[ https://issues.apache.org/jira/browse/HIVE-26221?focusedWorklogId=831063&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-831063 ] ASF GitHub Bot logged work on HIVE-26221: - Author: ASF GitHub Bot Created on: 05/Dec/22 15:10 Start Date: 05/Dec/22 15:10 Worklog Time Spent: 10m Work Description: sonarcloud[bot] commented on PR #3137: URL: https://github.com/apache/hive/pull/3137#issuecomment-1337544903 Kudos, SonarCloud Quality Gate passed! 0 Bugs, 0 Vulnerabilities, 0 Security Hotspots, 38 Code Smells, no coverage or duplication information. Issue Time Tracking --- Worklog Id: (was: 831063) Time Spent: 3h 50m (was: 3h 40m) > Add histogram-based column statistics > - > > Key: HIVE-26221 > URL: https://issues.apache.org/jira/browse/HIVE-26221 > Project: Hive > Issue Type: Improvement > Components: CBO, Metastore, Statistics >Affects Versions: 4.0.0-alpha-2 >Reporter: Alessandro Solimando >Assignee: Alessandro Solimando >Priority: Major > Labels: pull-request-available > Time Spent: 3h 50m > Remaining Estimate: 0h > > Hive does not support histogram statistics, which are particularly useful for > skewed data (which is very common in practice) and range predicates. 
> Hive's current selectivity estimation for range predicates is based on a > hard-coded value of 1/3 (see > [FilterSelectivityEstimator.java#L138-L144|https://github.com/apache/hive/blob/56c336268ea8c281d23c22d89271af37cb7e2572/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/FilterSelectivityEstimator.java#L138-L144]). > The current proposal aims at integrating histograms as an additional column > statistic, stored in the Hive metastore at the table (or partition) level. > The main requirements for histogram integration are the following: > * efficiency: the approach must scale and support billions of rows > * merge-ability: partition-level histograms have to be merged to form > table-level histograms > * explicit and configurable
[jira] [Work started] (HIVE-26808) Port Iceberg catalog changes
[ https://issues.apache.org/jira/browse/HIVE-26808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HIVE-26808 started by Zsolt Miskolczi. -- > Port Iceberg catalog changes > > > Key: HIVE-26808 > URL: https://issues.apache.org/jira/browse/HIVE-26808 > Project: Hive > Issue Type: Improvement > Components: Iceberg integration >Reporter: Zsolt Miskolczi >Assignee: Zsolt Miskolczi >Priority: Major > > The last round of porting happened in April 2022; since then there have been a couple of > changes, especially in HiveTableOperations, worth porting into iceberg-catalog. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (HIVE-26808) Port Iceberg catalog changes
[ https://issues.apache.org/jira/browse/HIVE-26808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zsolt Miskolczi reassigned HIVE-26808: -- Assignee: Zsolt Miskolczi > Port Iceberg catalog changes > > > Key: HIVE-26808 > URL: https://issues.apache.org/jira/browse/HIVE-26808 > Project: Hive > Issue Type: Improvement > Components: Iceberg integration >Reporter: Zsolt Miskolczi >Assignee: Zsolt Miskolczi >Priority: Major > > The last round of porting happened in April 2022; since then there have been a couple of > changes, especially in HiveTableOperations, worth porting into iceberg-catalog. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Work logged] (HIVE-26762) Remove operand pruning in HiveFilterSetOpTransposeRule
[ https://issues.apache.org/jira/browse/HIVE-26762?focusedWorklogId=831021&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-831021 ] ASF GitHub Bot logged work on HIVE-26762: - Author: ASF GitHub Bot Created on: 05/Dec/22 13:50 Start Date: 05/Dec/22 13:50 Worklog Time Spent: 10m Work Description: sonarcloud[bot] commented on PR #3825: URL: https://github.com/apache/hive/pull/3825#issuecomment-1337399540 Kudos, SonarCloud Quality Gate passed! 0 Bugs, 0 Vulnerabilities, 0 Security Hotspots, 0 Code Smells, no coverage or duplication information. Issue Time Tracking --- Worklog Id: (was: 831021) Time Spent: 50m (was: 40m) > Remove operand pruning in HiveFilterSetOpTransposeRule > -- > > Key: HIVE-26762 > URL: https://issues.apache.org/jira/browse/HIVE-26762 > Project: Hive > Issue Type: Task > Components: CBO, Query Planning >Affects Versions: 4.0.0-alpha-2 >Reporter: Alessandro Solimando >Assignee: Alessandro Solimando >Priority: Major > Labels: pull-request-available > Time Spent: 50m > Remaining Estimate: 0h > > HiveFilterSetOpTransposeRule, when applied to UNION ALL operands, checks if > the newly pushed filter simplifies to FALSE (due to the predicates holding on > the input). > If this is true and there is more than one UNION ALL operand, the operand gets pruned. > After HIVE-26524 ("Use Calcite to remove sections of a query plan known never > produces rows"), this is possibly redundant and we could drop this feature > and let the other rules take care of the pruning. > In such a case, it might even be possible to drop the Hive-specific rule and > rely on the Calcite one (the difference is just the operand pruning at the > moment of writing), similarly to what HIVE-26642 did for > HiveReduceExpressionRule. Writing it here as a reminder, but it's recommended > to tackle this in a separate ticket after verifying that it is feasible. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-22589) Add storage support for ProlepticCalendar in ORC, Parquet, and Avro
[ https://issues.apache.org/jira/browse/HIVE-22589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mengkai Liu updated HIVE-22589: --- Description: Hive recently moved its processing to the proleptic calendar, which has created some issues for users who have dates before 1580 AD. HIVE-22405 extended the column vectors for times & dates to encode which calendar they are using. This issue is to support proleptic calendar in ORC, Parquet, and Avro, when files are written/read by Hive. To preserve compatibility with other engines until they upgrade their readers, files will be written using hybrid calendar by default. Default behavior when files do not contain calendar information in their metadata is configurable. was: Hive recently moved its processing to the proleptic calendar, which has created some issues for users who have dates before 1580 AD. HIVE-22405 extended the column vectors for times & dates to encode which calendar they are using. This issue is to support proleptic calendar in ORC, Parquet, and Avro, when files are written/read by Hive. To preserve compatibility with other engines until they upgrade their readers, files will be written using hybrid calendar by default. Default behavior when files do not contain calendar information in their metadata is configurable. 
> Add storage support for ProlepticCalendar in ORC, Parquet, and Avro > --- > > Key: HIVE-22589 > URL: https://issues.apache.org/jira/browse/HIVE-22589 > Project: Hive > Issue Type: Bug > Components: Avro, ORC, Parquet >Reporter: Jesus Camacho Rodriguez >Assignee: Jesus Camacho Rodriguez >Priority: Major > Labels: compatibility, datetime > Fix For: 4.0.0-alpha-1 > > Attachments: HIVE-22589.01.patch, HIVE-22589.02.patch, > HIVE-22589.03.patch, HIVE-22589.04.patch, HIVE-22589.05.patch, > HIVE-22589.06.patch, HIVE-22589.07.patch, HIVE-22589.07.patch, > HIVE-22589.07.patch, HIVE-22589.07.patch, HIVE-22589.08.patch, > HIVE-22589.08.patch, HIVE-22589.patch, HIVE-22589.patch > > > Hive recently moved its processing to the proleptic calendar, which has > created some issues for users who have dates before 1580 AD. > HIVE-22405 extended the column vectors for times & dates to encode which > calendar they are using. > This issue is to support proleptic calendar in ORC, Parquet, and Avro, when > files are written/read by Hive. To preserve compatibility with other engines > until they upgrade their readers, files will be written using hybrid calendar > by default. Default behavior when files do not contain calendar information > in their metadata is configurable. -- This message was sent by Atlassian Jira (v8.20.10#820010)
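To make the hybrid-vs-proleptic gap concrete: the hybrid calendar reads pre-1582 dates with Julian rules, so the same date string can name a different physical day than a proleptic-Gregorian reader would pick. A small Python sketch using the standard Julian-day-number formulas (illustrative only, not Hive's converter code):

```python
def jdn_julian(y, m, d):
    # Julian day number of a date interpreted with Julian-calendar rules
    a = (14 - m) // 12
    yy = y + 4800 - a
    mm = m + 12 * a - 3
    return d + (153 * mm + 2) // 5 + 365 * yy + yy // 4 - 32083

def jdn_gregorian(y, m, d):
    # Julian day number of the same date under proleptic-Gregorian rules
    a = (14 - m) // 12
    yy = y + 4800 - a
    mm = m + 12 * a - 3
    return (d + (153 * mm + 2) // 5 + 365 * yy
            + yy // 4 - yy // 100 + yy // 400 - 32045)

# A file storing "1500-03-01" under hybrid (Julian) rules names a day
# 10 days away from a proleptic-Gregorian reading of the same string:
print(jdn_julian(1500, 3, 1) - jdn_gregorian(1500, 3, 1))  # 10
# At the 1582 cutover, Julian Oct 4 and Gregorian Oct 15 are consecutive:
print(jdn_gregorian(1582, 10, 15) - jdn_julian(1582, 10, 4))  # 1
```

This 10-day (and growing, further back in time) shift is exactly why files written with one calendar must not be silently read with the other.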
[jira] [Work logged] (HIVE-26794) Explore changing TxnHandler#connPoolMutex to NoPoolConnectionPool
[ https://issues.apache.org/jira/browse/HIVE-26794?focusedWorklogId=831019&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-831019 ] ASF GitHub Bot logged work on HIVE-26794: - Author: ASF GitHub Bot Created on: 05/Dec/22 13:44 Start Date: 05/Dec/22 13:44 Worklog Time Spent: 10m Work Description: sonarcloud[bot] commented on PR #3817: URL: https://github.com/apache/hive/pull/3817#issuecomment-1337381330 Kudos, SonarCloud Quality Gate passed! 0 Bugs, 0 Vulnerabilities, 0 Security Hotspots, 2 Code Smells, no coverage or duplication information. Issue Time Tracking --- Worklog Id: (was: 831019) Time Spent: 1h 50m (was: 1h 40m) > Explore changing TxnHandler#connPoolMutex to NoPoolConnectionPool > - > > Key: HIVE-26794 > URL: https://issues.apache.org/jira/browse/HIVE-26794 > Project: Hive > Issue Type: Improvement > Components: Standalone Metastore >Reporter: Zhihua Deng >Priority: Major > Labels: pull-request-available > Time Spent: 1h 50m > Remaining Estimate: 0h > > Instead of creating a fixed-size connection pool for TxnHandler#MutexAPI, the > pool can be assigned to NoPoolConnectionPool because: > * TxnHandler#MutexAPI is primarily designed to provide coarse-grained mutex > support to maintenance tasks running inside the Metastore; these tasks are > not user-facing; > * A fixed-size connection pool, the same as the pool used in ObjectStore, is a > waste for the other non-leaders in the warehouse. > The NoPoolConnectionPool provides connections on demand, and > TxnHandler#MutexAPI only uses the getConnection method to fetch a connection from > the pool, so it's doable to change the pool to NoPoolConnectionPool; this > would make the HMS more scalable. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Work logged] (HIVE-26221) Add histogram-based column statistics
[ https://issues.apache.org/jira/browse/HIVE-26221?focusedWorklogId=831012&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-831012 ] ASF GitHub Bot logged work on HIVE-26221: - Author: ASF GitHub Bot Created on: 05/Dec/22 13:05 Start Date: 05/Dec/22 13:05 Worklog Time Spent: 10m Work Description: dengzhhu653 commented on code in PR #3137: URL: https://github.com/apache/hive/pull/3137#discussion_r1039572781 ## standalone-metastore/metastore-server/src/main/sql/mysql/hive-schema-4.0.0.mysql.sql: ## @@ -768,6 +769,7 @@ CREATE TABLE IF NOT EXISTS `PART_COL_STATS` ( `NUM_NULLS` bigint(20) NOT NULL, `NUM_DISTINCTS` bigint(20), `BIT_VECTOR` blob, + `HISTOGRAM` blob, Review Comment: Should this column `HISTOGRAM` also be placed at the end? Issue Time Tracking --- Worklog Id: (was: 831012) Time Spent: 3h 40m (was: 3.5h) > Add histogram-based column statistics > - > > Key: HIVE-26221 > URL: https://issues.apache.org/jira/browse/HIVE-26221 > Project: Hive > Issue Type: Improvement > Components: CBO, Metastore, Statistics >Affects Versions: 4.0.0-alpha-2 >Reporter: Alessandro Solimando >Assignee: Alessandro Solimando >Priority: Major > Labels: pull-request-available > Time Spent: 3h 40m > Remaining Estimate: 0h > > Hive does not support histogram statistics, which are particularly useful for > skewed data (which is very common in practice) and range predicates. > Hive's current selectivity estimation for range predicates is based on a > hard-coded value of 1/3 (see > [FilterSelectivityEstimator.java#L138-L144|https://github.com/apache/hive/blob/56c336268ea8c281d23c22d89271af37cb7e2572/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/FilterSelectivityEstimator.java#L138-L144]). > The current proposal aims at integrating histograms as an additional column > statistic, stored in the Hive metastore at the table (or partition) level. 
> The main requirements for histogram integration are the following: > * efficiency: the approach must scale and support billions of rows > * merge-ability: partition-level histograms have to be merged to form > table-level histograms > * explicit and configurable trade-off between memory footprint and accuracy > Hive already integrates [KLL data > sketches|https://datasketches.apache.org/docs/KLL/KLLSketch.html] UDAF. > Datasketches are small, stateful programs that process massive data-streams > and can provide approximate answers, with mathematical guarantees, to > computationally difficult queries orders-of-magnitude faster than > traditional, exact methods. > We propose to use KLL, and more specifically the cumulative distribution > function (CDF), as the underlying data structure for our histogram statistics. > The current proposal targets numeric data types (float, integer and numeric > families) and temporal data types (date and timestamp). -- This message was sent by Atlassian Jira (v8.20.10#820010)
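As a rough illustration of why a CDF beats the hard-coded 1/3: given any sample (or sketch) of a column, the selectivity of a range predicate is just the difference of two CDF evaluations. The sketch below approximates the CDF from a plain sorted sample in pure Python; a real KLL sketch answers the same query with bounded error and far less memory. Names here are illustrative, not Hive's or DataSketches' API:

```python
import bisect
import random

def cdf_estimate(sorted_sample, x):
    """Approximate CDF F(x) = P(col <= x) from a sorted sample --
    the quantity a KLL sketch exposes; here it is simply rank/size."""
    return bisect.bisect_right(sorted_sample, x) / len(sorted_sample)

def range_selectivity(sorted_sample, lo, hi):
    # Selectivity of (lo < col <= hi) as F(hi) - F(lo), replacing the
    # hard-coded 1/3 guess with a data-driven estimate.
    return cdf_estimate(sorted_sample, hi) - cdf_estimate(sorted_sample, lo)

random.seed(0)
data = sorted(random.uniform(0, 100) for _ in range(10_000))
print(range_selectivity(data, 25, 75))  # close to 0.5 for uniform data
```

For skewed data the two CDF evaluations track the actual mass in the range, which is precisely where the fixed 1/3 estimate goes wrong.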
[jira] [Work logged] (HIVE-26692) Check for the expected thrift version before compiling
[ https://issues.apache.org/jira/browse/HIVE-26692?focusedWorklogId=831007&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-831007 ] ASF GitHub Bot logged work on HIVE-26692: - Author: ASF GitHub Bot Created on: 05/Dec/22 12:39 Start Date: 05/Dec/22 12:39 Worklog Time Spent: 10m Work Description: sonarcloud[bot] commented on PR #3820: URL: https://github.com/apache/hive/pull/3820#issuecomment-1337272737 Kudos, SonarCloud Quality Gate passed! 0 Bugs, 0 Vulnerabilities, 0 Security Hotspots, 0 Code Smells, no coverage or duplication information. Issue Time Tracking --- Worklog Id: (was: 831007) Time Spent: 2.5h (was: 2h 20m) > Check for the expected thrift version before compiling > -- > > Key: HIVE-26692 > URL: https://issues.apache.org/jira/browse/HIVE-26692 > Project: Hive > Issue Type: Task > Components: Thrift API >Affects Versions: 4.0.0-alpha-2 >Reporter: Alessandro Solimando >Assignee: Alessandro Solimando >Priority: Major > Labels: pull-request-available > Time Spent: 2.5h > Remaining Estimate: 0h > > At the moment we don't check for the thrift version before launching thrift, > the error messages are often cryptic upon mismatches. 
> An explicit check with a clear error message would be nice, like what parquet > does: > [https://github.com/apache/parquet-mr/blob/master/parquet-thrift/pom.xml#L247-L268] > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (HIVE-26807) Investigate test running times before/after Zookeeper upgrade to 3.6.3
[ https://issues.apache.org/jira/browse/HIVE-26807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17643313#comment-17643313 ] Stamatis Zampetakis commented on HIVE-26807: First of all, I extracted the test results into CSV files with the following structure (testname@classname@time). {noformat} zgrep -a " /tmp/master-1514.csv zgrep -a " /tmp/master-1495.csv {noformat} To facilitate the analysis, I imported the CSV files into Postgres tables. {code:sql} CREATE TABLE master_1514 (testname VARCHAR, classname VARCHAR, time DECIMAL); CREATE TABLE master_1495 (testname VARCHAR, classname VARCHAR, time DECIMAL); COPY master_1514 FROM '/tmp/master-1514.csv' WITH DELIMITER '@'; COPY master_1495 FROM '/tmp/master-1495.csv' WITH DELIMITER '@'; {code} The combination of (testname, classname) is not unique due to parameterized tests, so we need a way to distinguish duplicate tests if we want to perform joins. The trick is to use the ROW_NUMBER window function and assign incrementing integers to seemingly duplicate tests; it is not 100% precise but satisfactory for our needs. {code:sql} SELECT testname, classname, time, ROW_NUMBER() OVER (PARTITION BY testname, classname ORDER BY time) as rnum FROM master_1514 {code} I used the following query to get an overview of the situation before and after the upgrade. 
{code:sql} SELECT COUNT(*), MAX(diff), MIN(diff), AVG(diff), sum(ntime)/60/60 as total_hours_1514, sum(otime)/60/60 as total_hours_1495 FROM (SELECT n.testname, n.classname, n.time as ntime, o.time as otime, n.time-o.time as diff FROM (SELECT testname, classname, time, ROW_NUMBER() OVER (PARTITION BY testname, classname ORDER BY time) as rnum FROM master_1514) n INNER JOIN (SELECT testname, classname, time, ROW_NUMBER() OVER (PARTITION BY testname, classname ORDER BY time) as rnum FROM master_1495) o ON n.testname=o.testname AND n.classname = o.classname AND n.rnum = o.rnum) compare {code} {noformat} count | max | min | avg | total_hours_1514 | total_hours_1495 -------+---------+---------+------------------------+------------------+------------------ 47530 | 130.627 | -58.070 | 0.14675221965074689670 | 25.43901639 | 23.50147944 {noformat} Observe that the total duration of the tests has increased by 8% (cumulatively ~2h), which is noticeable but maybe not problematic at this stage. The tests are running in parallel splits, so the general slowdown per split is in the order of a few minutes. Moreover, there are tests that are much slower (see max) but also tests that are much faster (see min), so there is nothing justifying a revert of the Zookeeper upgrade. Nevertheless, it may be interesting to investigate further the tests that became much slower to see if there is anything that could be done to save some CI resources. I used the following query to find the 1000 tests that were seemingly affected the most after the upgrade. 
{code:sql} COPY ( SELECT n.testname, n.classname, n.time as B_1514, o.time as B_1495, n.time-o.time as diff FROM (SELECT testname, classname, time, ROW_NUMBER() OVER (PARTITION BY testname, classname ORDER BY time) as rnum FROM master_1514) n INNER JOIN (SELECT testname, classname, time, ROW_NUMBER() OVER (PARTITION BY testname, classname ORDER BY time) as rnum FROM master_1495) o ON n.testname=o.testname AND n.classname = o.classname AND n.rnum = o.rnum ORDER BY diff DESC LIMIT 1000) TO '/tmp/testtimes-diff-1514-1495.csv' WITH DELIMITER '@'; {code} The results are attached in [^diff-1514-1495.csv]. > Investigate test running times before/after Zookeeper upgrade to 3.6.3 > -- > > Key: HIVE-26807 > URL: https://issues.apache.org/jira/browse/HIVE-26807 > Project: Hive > Issue Type: Task > Components: Testing Infrastructure >Reporter: Stamatis Zampetakis >Assignee: Stamatis Zampetakis >Priority: Major > Attachments: diff-1514-1495.csv, test-results-1495.tgz, > test-results-1514.tgz > > > During the investigation of the CI timing out (HIVE-26806) there were some > concerns that the Zookeeper (HIVE-26763) upgrade caused some significant > slowdown. > The goal of this issue is to analyse the test results from the following > builds: > * [Build-1495|http://ci.hive.apache.org/job/hive-precommit/job/master/1495/], > commit just before the Zookeeper upgrade; > * > [Build-1514|http://ci.hive.apache.org/job/hive-precommit/job/master/1514/], > commit after the Zookeeper upgrade with skipped tests (HIVE-26796) and CI > timeouts (HIVE-26806) fixed; > and reason about the impact of the Zookeeper upgrade on test execution. -- This message was sent by Atlassian Jira
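The ROW_NUMBER dedup-join trick above can be reproduced with Python's stdlib sqlite3 (it needs SQLite >= 3.25 for window functions); table and column names below are illustrative, not the actual CI data:

```python
import sqlite3

# Duplicate (testname, classname) pairs from parameterized tests get
# distinct rnum values within each partition, so the self-join between
# the two runs stays one-to-one.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE new_run (testname TEXT, classname TEXT, time REAL);
CREATE TABLE old_run (testname TEXT, classname TEXT, time REAL);
INSERT INTO new_run VALUES ('t1','C',1.0), ('t1','C',3.0), ('t2','C',2.0);
INSERT INTO old_run VALUES ('t1','C',0.5), ('t1','C',2.5), ('t2','C',2.0);
""")
rows = con.execute("""
SELECT n.testname, n.time - o.time AS diff
FROM (SELECT *, ROW_NUMBER() OVER (PARTITION BY testname, classname
                                   ORDER BY time) AS rnum FROM new_run) n
JOIN (SELECT *, ROW_NUMBER() OVER (PARTITION BY testname, classname
                                   ORDER BY time) AS rnum FROM old_run) o
  ON n.testname = o.testname AND n.classname = o.classname
 AND n.rnum = o.rnum
ORDER BY diff DESC
""").fetchall()
print(rows)  # [('t1', 0.5), ('t1', 0.5), ('t2', 0.0)]
```

Without the rnum equality the join would produce 2x2 = 4 rows for the duplicated test, double-counting the slowdown.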
[jira] [Updated] (HIVE-26807) Investigate test running times before/after Zookeeper upgrade to 3.6.3
[ https://issues.apache.org/jira/browse/HIVE-26807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stamatis Zampetakis updated HIVE-26807: --- Attachment: diff-1514-1495.csv -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (HIVE-26807) Investigate test running times before/after Zookeeper upgrade to 3.6.3
[ https://issues.apache.org/jira/browse/HIVE-26807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17643300#comment-17643300 ] Stamatis Zampetakis commented on HIVE-26807: I uploaded the test results from the 1495 and 1514 builds to the JIRA in case the results of the builds are not available in the future. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-26807) Investigate test running times before/after Zookeeper upgrade to 3.6.3
[ https://issues.apache.org/jira/browse/HIVE-26807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stamatis Zampetakis updated HIVE-26807: --- Attachment: test-results-1495.tgz -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-26807) Investigate test running times before/after Zookeeper upgrade to 3.6.3
[ https://issues.apache.org/jira/browse/HIVE-26807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stamatis Zampetakis updated HIVE-26807: --- Attachment: test-results-1514.tgz -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (HIVE-26807) Investigate test running times before/after Zookeeper upgrade to 3.6.3
[ https://issues.apache.org/jira/browse/HIVE-26807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stamatis Zampetakis reassigned HIVE-26807: -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Work logged] (HIVE-26578) Enable Iceberg storage format for materialized views
[ https://issues.apache.org/jira/browse/HIVE-26578?focusedWorklogId=830992=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-830992 ] ASF GitHub Bot logged work on HIVE-26578: - Author: ASF GitHub Bot Created on: 05/Dec/22 10:58 Start Date: 05/Dec/22 10:58 Worklog Time Spent: 10m Work Description: sonarcloud[bot] commented on PR #3823: URL: https://github.com/apache/hive/pull/3823#issuecomment-1337134994 Kudos, SonarCloud Quality Gate passed! 0 Bugs, 0 Vulnerabilities, 0 Security Hotspots, 0 Code Smells, no coverage information, no duplication information. Issue Time Tracking --- Worklog Id: (was: 830992) Time Spent: 40m (was: 0.5h) > Enable Iceberg storage format for materialized views > > > Key: HIVE-26578 > URL: https://issues.apache.org/jira/browse/HIVE-26578 > Project: Hive > Issue Type: Improvement > Components: Materialized views >Reporter: Krisztian Kasa >Assignee: Krisztian Kasa >Priority: Major > Labels: pull-request-available > Time Spent: 40m > Remaining Estimate: 0h > > {code} > create materialized view mat1 stored by iceberg stored as orc tblproperties > ('format-version'='1') as > select tbl_ice.b, tbl_ice.c from tbl_ice where tbl_ice.c > 52; > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (HIVE-26806) Precommit tests in CI are timing out after HIVE-26796
[ https://issues.apache.org/jira/browse/HIVE-26806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17643247#comment-17643247 ] Stamatis Zampetakis commented on HIVE-26806: *Important note:* The test timings from master *are not used* to split tests in PRs. The master branch and PR branches have separate Jenkins jobs so one does not use the other as a reference. The splitting of tests on the first run of a PR (or a PR without a previous successful build) is more or less random. > Precommit tests in CI are timing out after HIVE-26796 > - > > Key: HIVE-26806 > URL: https://issues.apache.org/jira/browse/HIVE-26806 > Project: Hive > Issue Type: Bug > Components: Testing Infrastructure >Reporter: Stamatis Zampetakis >Assignee: Stamatis Zampetakis >Priority: Major > > http://ci.hive.apache.org/job/hive-precommit/job/master/1506/ > {noformat} > ancelling nested steps due to timeout > 15:22:08 Sending interrupt signal to process > 15:22:08 Killing processes > 15:22:09 kill finished with exit code 0 > 15:22:19 Terminated > 15:22:19 script returned exit code 143 > [Pipeline] } > [Pipeline] // withEnv > [Pipeline] } > 15:22:19 Deleting 1 temporary files > [Pipeline] // configFileProvider > [Pipeline] } > [Pipeline] // stage > [Pipeline] stage > [Pipeline] { (PostProcess) > [Pipeline] sh > [Pipeline] sh > [Pipeline] sh > [Pipeline] junit > 15:22:25 Recording test results > 15:22:32 [Checks API] No suitable checks publisher found. > [Pipeline] } > [Pipeline] // stage > [Pipeline] } > [Pipeline] // container > [Pipeline] } > [Pipeline] // node > [Pipeline] } > [Pipeline] // timeout > [Pipeline] } > [Pipeline] // podTemplate > [Pipeline] } > 15:22:32 Failed in branch split-01 > [Pipeline] // parallel > [Pipeline] } > [Pipeline] // stage > [Pipeline] stage > [Pipeline] { (Archive) > [Pipeline] podTemplate > [Pipeline] { > [Pipeline] timeout > 15:22:33 Timeout set to expire in 6 hr 0 min > {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (HIVE-26806) Precommit tests in CI are timing out after HIVE-26796
[ https://issues.apache.org/jira/browse/HIVE-26806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17643222#comment-17643222 ] Alessandro Solimando commented on HIVE-26806: - Thanks [~zabetak], as you say the issue now affects only existing PRs. I am trying option 2 to see if it works; otherwise I will go for option 1. I will keep you posted here. Setting aside the old affected PRs, I am OK with reducing the timeout to the previous value, since it now works. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Work logged] (HIVE-26221) Add histogram-based column statistics
[ https://issues.apache.org/jira/browse/HIVE-26221?focusedWorklogId=830981=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-830981 ] ASF GitHub Bot logged work on HIVE-26221: - Author: ASF GitHub Bot Created on: 05/Dec/22 10:14 Start Date: 05/Dec/22 10:14 Worklog Time Spent: 10m Work Description: asolimando commented on code in PR #3137: URL: https://github.com/apache/hive/pull/3137#discussion_r1039397581 ## standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/columnstats/aggr/LongColumnStatsAggregator.java: ## @@ -51,73 +52,94 @@ public ColumnStatisticsObj aggregate(List colStatsWit checkStatisticsList(colStatsWithSourceInfo); ColumnStatisticsObj statsObj = null; -String colType = null; +String colType; String colName = null; // check if all the ColumnStatisticsObjs contain stats and all the ndv are // bitvectors boolean doAllPartitionContainStats = partNames.size() == colStatsWithSourceInfo.size(); NumDistinctValueEstimator ndvEstimator = null; +KllHistogramEstimator histogramEstimator = null; +boolean areAllNDVEstimatorsMergeable = true; +boolean areAllHistogramEstimatorsMergeable = true; for (ColStatsObjWithSourceInfo csp : colStatsWithSourceInfo) { ColumnStatisticsObj cso = csp.getColStatsObj(); if (statsObj == null) { colName = cso.getColName(); colType = cso.getColType(); statsObj = ColumnStatsAggregatorFactory.newColumnStaticsObj(colName, colType, cso.getStatsData().getSetField()); -LOG.trace("doAllPartitionContainStats for column: {} is: {}", colName, -doAllPartitionContainStats); +LOG.trace("doAllPartitionContainStats for column: {} is: {}", colName, doAllPartitionContainStats); } - LongColumnStatsDataInspector longColumnStatsData = longInspectorFromStats(cso); - if (longColumnStatsData.getNdvEstimator() == null) { -ndvEstimator = null; -break; - } else { -// check if all of the bit vectors can merge -NumDistinctValueEstimator estimator = longColumnStatsData.getNdvEstimator(); + 
LongColumnStatsDataInspector columnStatsData = longInspectorFromStats(cso); + + // check if we can merge NDV estimators + if (columnStatsData.getNdvEstimator() == null) { +areAllNDVEstimatorsMergeable = false; + } else if (areAllNDVEstimatorsMergeable) { +NumDistinctValueEstimator estimator = columnStatsData.getNdvEstimator(); if (ndvEstimator == null) { ndvEstimator = estimator; } else { - if (ndvEstimator.canMerge(estimator)) { -continue; - } else { -ndvEstimator = null; -break; + if (!ndvEstimator.canMerge(estimator)) { +areAllNDVEstimatorsMergeable = false; + } +} + } + // check if we can merge histogram estimators + if (columnStatsData.getHistogramEstimator() == null) { Review Comment: You are right, I have double checked and indeed it can be simplified as you suggest: I have added those four lines you cited right before the `return` statement of the `aggregate` method, and it's enough. Issue Time Tracking --- Worklog Id: (was: 830981) Time Spent: 3.5h (was: 3h 20m) > Add histogram-based column statistics > - > > Key: HIVE-26221 > URL: https://issues.apache.org/jira/browse/HIVE-26221 > Project: Hive > Issue Type: Improvement > Components: CBO, Metastore, Statistics >Affects Versions: 4.0.0-alpha-2 >Reporter: Alessandro Solimando >Assignee: Alessandro Solimando >Priority: Major > Labels: pull-request-available > Time Spent: 3.5h > Remaining Estimate: 0h > > Hive does not support histogram statistics, which are particularly useful for > skewed data (which is very common in practice) and range predicates. 
> Hive's current selectivity estimation for range predicates is based on a > hard-coded value of 1/3 (see > [FilterSelectivityEstimator.java#L138-L144|https://github.com/apache/hive/blob/56c336268ea8c281d23c22d89271af37cb7e2572/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/FilterSelectivityEstimator.java#L138-L144]). > The current proposal aims at integrating histograms as an additional column > statistic, stored in the Hive metastore at the table (or partition) level. > The main requirements for histogram integration are the following: > * efficiency: the approach must scale and support billions of rows > * merge-ability: partition-level histograms have to be merged to form > table-level histograms > * explicit and configurable trade-off between
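To illustrate why a histogram beats a hard-coded 1/3 selectivity on skewed data, here is a minimal equi-depth histogram sketch. This is an invented toy example, not Hive's (KLL-sketch-based) implementation; all names and data in it are hypothetical:

```python
def equi_depth_bounds(values, num_buckets):
    """Upper bound of each bucket, with ~equal row counts per bucket."""
    values = sorted(values)
    n = len(values)
    return [values[min(n - 1, i * n // num_buckets - 1)]
            for i in range(1, num_buckets + 1)]

def selectivity_less_than(bounds, v):
    """Estimate P(col < v) as the fraction of buckets entirely below v."""
    return sum(1 for b in bounds if b < v) / len(bounds)

# Skewed column: 90% small values, 10% large outliers.
data = [1] * 90 + [1000] * 10
bounds = equi_depth_bounds(data, 10)

est = selectivity_less_than(bounds, 500)
print(est)      # 0.9, matching the true selectivity of col < 500
print(1 / 3)    # the fixed fallback, badly off for this predicate
```

Because each bucket holds roughly the same number of rows, an equi-depth histogram adapts to skew automatically: the nine buckets full of small values translate directly into a 0.9 estimate, where a fixed 1/3 cannot.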
[jira] [Comment Edited] (HIVE-26806) Precommit tests in CI are timing out after HIVE-26796
[ https://issues.apache.org/jira/browse/HIVE-26806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17643213#comment-17643213 ] Stamatis Zampetakis edited comment on HIVE-26806 at 12/5/22 10:13 AM: -- The recent builds on master (1513, 1514) are now back to normal and each split takes at most ~2h. [~asolimando] [~ayushtkn] I am planning to revert the timeout back to 6h by committing directly to master in a few hours. Please speak up if there is any reason not to do this. [~akshatm] The Jenkins plugin that is used to split the tests into buckets uses the last successful build of the job as a guide. Each PR corresponds to a separate Jenkins job (http://ci.hive.apache.org/job/hive-precommit/view/change-requests/). The last successful build for your PR is http://ci.hive.apache.org/job/hive-precommit/job/PR-3803/8/ so this is what will be used to split the tests. This is not good because the successful run has 3K fewer tests than what exists in master, so the splitting will be pretty bad. I see three ways to unblock the current situation and overcome the problem: # Close PR-3803 and open a new one. # Manually delete every successful build for job PR-3803 and start a new one. # Increase the timeout in the Jenkinsfile and try again. None of these is perfect but I have higher hopes for 1 and 2. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (HIVE-26806) Precommit tests in CI are timing out after HIVE-26796
[ https://issues.apache.org/jira/browse/HIVE-26806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17643213#comment-17643213 ] Stamatis Zampetakis commented on HIVE-26806: The recent builds on master (1513, 1514) are now back to normal and each split takes at most ~2h. [~asolimando] [~ayushtkn] I am planning to revert the timeout back to 6h by committing directly to master in a few hours. Please speak up if there is any reason not to do this. [~akshatm] The Jenkins plugin that is used to split the tests into buckets uses the last successful build of the job as a guide. Each PR corresponds to a separate Jenkins job (http://ci.hive.apache.org/job/hive-precommit/view/change-requests/). The last successful build for your PR is http://ci.hive.apache.org/job/hive-precommit/job/PR-3803/8/ so this is what will be used to split the tests. This is not good because the successful run has 3K fewer tests than what exists in master, so the splitting will be pretty bad. I see three ways to unblock the current situation and overcome the problem: # Close PR-3803 and open a new one. # Manually delete every successful build for job PR-3803 and start a new one. # Increase the timeout in the Jenkinsfile and try again. None of these is perfect but I have higher hopes for 1 and 2. 
-- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Work logged] (HIVE-26569) LlapTokenRenewer: TezAM (LlapTaskCommunicator) to renew LLAP_TOKENs
[ https://issues.apache.org/jira/browse/HIVE-26569?focusedWorklogId=830980=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-830980 ] ASF GitHub Bot logged work on HIVE-26569: - Author: ASF GitHub Bot Created on: 05/Dec/22 10:12 Start Date: 05/Dec/22 10:12 Worklog Time Spent: 10m Work Description: sonarcloud[bot] commented on PR #3626: URL: https://github.com/apache/hive/pull/3626#issuecomment-1337077515 Kudos, SonarCloud Quality Gate passed! 0 Bugs, 0 Vulnerabilities, 0 Security Hotspots, 7 Code Smells, no coverage information, no duplication information. Issue Time Tracking --- Worklog Id: (was: 830980) Time Spent: 1h 20m (was: 1h 10m) > LlapTokenRenewer: TezAM (LlapTaskCommunicator) to renew LLAP_TOKENs > --- > > Key: HIVE-26569 > URL: https://issues.apache.org/jira/browse/HIVE-26569 > Project: Hive > Issue Type: Improvement >Reporter: László Bodor >Assignee: László Bodor >Priority: Major > Labels: pull-request-available > Time Spent: 1h 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Work logged] (HIVE-26788) Update stats of table/partition after minor compaction using noscan operation
[ https://issues.apache.org/jira/browse/HIVE-26788?focusedWorklogId=830976=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-830976 ] ASF GitHub Bot logged work on HIVE-26788: - Author: ASF GitHub Bot Created on: 05/Dec/22 09:58 Start Date: 05/Dec/22 09:58 Worklog Time Spent: 10m Work Description: SourabhBadhya commented on code in PR #3812: URL: https://github.com/apache/hive/pull/3812#discussion_r1039378601 ## ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/StatsUpdater.java: ## @@ -73,6 +69,9 @@ public void gatherStats(CompactionInfo ci, HiveConf conf, String userName, Strin sb.append(")"); } sb.append(" compute statistics"); +if (ci.isMinorCompaction()) { +sb.append(" noscan"); Review Comment: Minor compaction is not expected to compact too many files, so in most scenarios only the number of files changes after a minor compaction. Large updates like major compaction, on the other hand, need to update all statistics (since they happen only once in a while) to keep the metadata up to date. Therefore the idea was to do a fast update of statistics on a minor compaction & do a complete update in the case of a major compaction. Issue Time Tracking --- Worklog Id: (was: 830976) Time Spent: 1h (was: 50m) > Update stats of table/partition after minor compaction using noscan operation > - > > Key: HIVE-26788 > URL: https://issues.apache.org/jira/browse/HIVE-26788 > Project: Hive > Issue Type: Improvement >Reporter: Sourabh Badhya >Assignee: Sourabh Badhya >Priority: Major > Labels: pull-request-available > Time Spent: 1h > Remaining Estimate: 0h > > Currently, statistics are not updated for minor compaction since minor > compaction performs few updates on the statistics (such as number of files > in table/partition & total size of the table/partition). 
It is better to > utilize the NOSCAN operation for minor compaction since NOSCAN operations > perform a faster update of statistics and update the relevant fields, such as > the number of files & total sizes of the table/partitions. -- This message was sent by Atlassian Jira (v8.20.10#820010)
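The statement assembly shown in the review diff above (" compute statistics" always, " noscan" appended only for minor compactions) can be sketched as follows. The function is a hypothetical Python mock of what StatsUpdater builds, not the actual Hive code:

```python
def build_stats_query(table, partition=None, minor_compaction=False):
    # Mirrors the shape in the review diff: " compute statistics" always,
    # " noscan" only when the compaction is minor.
    parts = [f"analyze table {table}"]
    if partition is not None:
        parts.append(f" partition ({partition})")
    parts.append(" compute statistics")
    if minor_compaction:
        parts.append(" noscan")
    return "".join(parts)

print(build_stats_query("t1", minor_compaction=True))
# analyze table t1 compute statistics noscan
print(build_stats_query("t1", partition="ds='2022-12-05'"))
# analyze table t1 partition (ds='2022-12-05') compute statistics
```

With noscan, the metastore only refreshes cheap file-level statistics (file count and total size) instead of rescanning the data, which is exactly the trade-off the comment above describes for minor compaction.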
[jira] [Commented] (HIVE-14305) To/From UTC timestamp may return incorrect result because of DST
[ https://issues.apache.org/jira/browse/HIVE-14305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17643196#comment-17643196 ] David Scarlatti commented on HIVE-14305: it seems solved in Hive3. > To/From UTC timestamp may return incorrect result because of DST > > > Key: HIVE-14305 > URL: https://issues.apache.org/jira/browse/HIVE-14305 > Project: Hive > Issue Type: Sub-task >Reporter: Rui Li >Assignee: Rui Li >Priority: Major > Labels: timestamp > > If the machine's local timezone involves DST, the UDFs return incorrect > results. > For example: > {code} > select to_utc_timestamp('2005-04-03 02:01:00','UTC'); > {code} > returns {{2005-04-03 03:01:00}}. Correct result should be {{2005-04-03 > 02:01:00}}. > {code} > select to_utc_timestamp('2005-04-03 10:01:00','Asia/Shanghai'); > {code} > returns {{2005-04-03 03:01:00}}. Correct result should be {{2005-04-03 > 02:01:00}}. -- This message was sent by Atlassian Jira (v8.20.10#820010)
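The shifted result in the report comes from the spring-forward gap: on a machine whose local zone observes US DST (assumed here as America/Los_Angeles for illustration, independent of Hive's UDF code), 2005-04-03 02:01 never existed on the wall clock, and java.time resolves such gap times by pushing them forward:

```java
import java.time.LocalDateTime;
import java.time.ZoneId;

public class DstGapDemo {
    // Resolve a wall-clock time in a zone; times inside a DST gap are
    // adjusted forward by the length of the gap.
    static LocalDateTime resolveLocal(LocalDateTime wall, String zone) {
        return wall.atZone(ZoneId.of(zone)).toLocalDateTime();
    }

    public static void main(String[] args) {
        // US DST started 2005-04-03 at 02:00 (clocks jumped to 03:00), so
        // 02:01 is inside the gap and resolves to 03:01 -- mirroring the
        // incorrect 03:01:00 results in the bug report.
        System.out.println(resolveLocal(LocalDateTime.of(2005, 4, 3, 2, 1),
                                        "America/Los_Angeles"));
    }
}
```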
[jira] [Work logged] (HIVE-26221) Add histogram-based column statistics
[ https://issues.apache.org/jira/browse/HIVE-26221?focusedWorklogId=830970=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-830970 ] ASF GitHub Bot logged work on HIVE-26221: - Author: ASF GitHub Bot Created on: 05/Dec/22 09:20 Start Date: 05/Dec/22 09:20 Worklog Time Spent: 10m Work Description: asolimando commented on code in PR #3137: URL: https://github.com/apache/hive/pull/3137#discussion_r1039339301 ## standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/StatObjectConverter.java: ## @@ -1064,6 +1118,9 @@ public static void setFieldsIntoOldStats(ColumnStatisticsObj oldStatObj, if (newDecimalStatsData.isSetBitVectors()) { oldDecimalStatsData.setBitVectors(newDecimalStatsData.getBitVectors()); } + if (newDecimalStatsData.isSetHistogram()) { +oldDecimalStatsData.setHistogram(newDecimalStatsData.getHistogram()); + } Review Comment: Yes, absolutely, thanks for catching that, added. Issue Time Tracking --- Worklog Id: (was: 830970) Time Spent: 3h 20m (was: 3h 10m) > Add histogram-based column statistics > - > > Key: HIVE-26221 > URL: https://issues.apache.org/jira/browse/HIVE-26221 > Project: Hive > Issue Type: Improvement > Components: CBO, Metastore, Statistics >Affects Versions: 4.0.0-alpha-2 >Reporter: Alessandro Solimando >Assignee: Alessandro Solimando >Priority: Major > Labels: pull-request-available > Time Spent: 3h 20m > Remaining Estimate: 0h > > Hive does not support histogram statistics, which are particularly useful for > skewed data (which is very common in practice) and range predicates. 
> Hive's current selectivity estimation for range predicates is based on a > hard-coded value of 1/3 (see > [FilterSelectivityEstimator.java#L138-L144|https://github.com/apache/hive/blob/56c336268ea8c281d23c22d89271af37cb7e2572/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/FilterSelectivityEstimator.java#L138-L144]). > The current proposal aims at integrating histogram as an additional column > statistics, stored into the Hive metastore at the table (or partition) level. > The main requirements for histogram integration are the following: > * efficiency: the approach must scale and support billions of rows > * merge-ability: partition-level histograms have to be merged to form > table-level histograms > * explicit and configurable trade-off between memory footprint and accuracy > Hive already integrates [KLL data > sketches|https://datasketches.apache.org/docs/KLL/KLLSketch.html] UDAF. > Datasketches are small, stateful programs that process massive data-streams > and can provide approximate answers, with mathematical guarantees, to > computationally difficult queries orders-of-magnitude faster than > traditional, exact methods. > We propose to use KLL, and more specifically the cumulative distribution > function (CDF), as the underlying data structure for our histogram statistics. > The current proposal targets numeric data types (float, integer and numeric > families) and temporal data types (date and timestamp). -- This message was sent by Atlassian Jira (v8.20.10#820010)
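The proposal's core idea, replacing the hard-coded 1/3 with a CDF-based estimate, can be sketched with a plain empirical CDF. Hive's actual implementation uses KLL sketches; this standalone toy only shows how a CDF turns a range predicate into a selectivity:

```java
import java.util.Arrays;

// Illustrative only: selectivity of "lo < x <= hi" as cdf(hi) - cdf(lo),
// the role a KLL sketch's CDF plays in the proposal.
public class CdfSelectivity {
    static double selectivity(double[] sortedValues, double lo, double hi) {
        return cdf(sortedValues, hi) - cdf(sortedValues, lo);
    }

    // Empirical CDF: fraction of values <= x.
    static double cdf(double[] sorted, double x) {
        int idx = Arrays.binarySearch(sorted, x);
        if (idx < 0) return (double) (-idx - 1) / sorted.length; // insertion point = #values < x
        while (idx < sorted.length && sorted[idx] == x) idx++;   // include all duplicates of x
        return (double) idx / sorted.length;
    }

    public static void main(String[] args) {
        double[] skewed = {1, 1, 1, 1, 1, 1, 2, 3, 50, 100};
        // A fixed 1/3 guess badly misestimates both ranges on this skewed column:
        System.out.println(selectivity(skewed, 0, 1));    // 0.6
        System.out.println(selectivity(skewed, 10, 200)); // ~0.2
    }
}
```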
[jira] [Work logged] (HIVE-26221) Add histogram-based column statistics
[ https://issues.apache.org/jira/browse/HIVE-26221?focusedWorklogId=830969=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-830969 ] ASF GitHub Bot logged work on HIVE-26221: - Author: ASF GitHub Bot Created on: 05/Dec/22 09:16 Start Date: 05/Dec/22 09:16 Worklog Time Spent: 10m Work Description: asolimando commented on code in PR #3137: URL: https://github.com/apache/hive/pull/3137#discussion_r1039335554 ## ql/src/java/org/apache/hadoop/hive/ql/exec/DDLPlanUtils.java: ## @@ -395,29 +404,46 @@ public void addDoubleStats(ColumnStatisticsData cd, List ls) { ls.add(lowValue + dc.getLowValue() + "'"); } + public String checkHistogram(ColumnStatisticsData cd) { +byte[] buffer = null; + +if (cd.isSetDoubleStats() && cd.getDoubleStats().isSetHistogram()) { Review Comment: Good catch, we need to handle all the other supported data types here, I have added that Issue Time Tracking --- Worklog Id: (was: 830969) Time Spent: 3h 10m (was: 3h) > Add histogram-based column statistics > - > > Key: HIVE-26221 > URL: https://issues.apache.org/jira/browse/HIVE-26221 > Project: Hive > Issue Type: Improvement > Components: CBO, Metastore, Statistics >Affects Versions: 4.0.0-alpha-2 >Reporter: Alessandro Solimando >Assignee: Alessandro Solimando >Priority: Major > Labels: pull-request-available > Time Spent: 3h 10m > Remaining Estimate: 0h > > Hive does not support histogram statistics, which are particularly useful for > skewed data (which is very common in practice) and range predicates. 
[jira] [Work logged] (HIVE-26221) Add histogram-based column statistics
[ https://issues.apache.org/jira/browse/HIVE-26221?focusedWorklogId=830968=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-830968 ] ASF GitHub Bot logged work on HIVE-26221: - Author: ASF GitHub Bot Created on: 05/Dec/22 09:14 Start Date: 05/Dec/22 09:14 Worklog Time Spent: 10m Work Description: asolimando commented on code in PR #3137: URL: https://github.com/apache/hive/pull/3137#discussion_r1039333194 ## standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/columnstats/aggr/BinaryColumnStatsAggregator.java: ## @@ -60,4 +60,8 @@ public ColumnStatisticsObj aggregate(List colStatsWit statsObj.setStatsData(columnStatisticsData); return statsObj; } + + @Override protected ColumnStatisticsData initColumnStatisticsData() { +throw new UnsupportedOperationException("initColumnStatisticsData not supported for binary statistics"); Review Comment: You are right, the method does not do much for `binary` and `boolean`, but it still makes sense, so I have: - removed the exception, replaced with `return new ColumnStatisticsData();` - used the method to actually initialize the empty `ColumnStatisticsData` for those two data types Issue Time Tracking --- Worklog Id: (was: 830968) Time Spent: 3h (was: 2h 50m) > Add histogram-based column statistics > - > > Key: HIVE-26221 > URL: https://issues.apache.org/jira/browse/HIVE-26221 > Project: Hive > Issue Type: Improvement > Components: CBO, Metastore, Statistics >Affects Versions: 4.0.0-alpha-2 >Reporter: Alessandro Solimando >Assignee: Alessandro Solimando >Priority: Major > Labels: pull-request-available > Time Spent: 3h > Remaining Estimate: 0h > > Hive does not support histogram statistics, which are particularly useful for > skewed data (which is very common in practice) and range predicates. 
[jira] [Work logged] (HIVE-26799) Make authorizations on custom UDFs involved in tables/view configurable.
[ https://issues.apache.org/jira/browse/HIVE-26799?focusedWorklogId=830967=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-830967 ] ASF GitHub Bot logged work on HIVE-26799: - Author: ASF GitHub Bot Created on: 05/Dec/22 09:11 Start Date: 05/Dec/22 09:11 Worklog Time Spent: 10m Work Description: dengzhhu653 commented on code in PR #3821: URL: https://github.com/apache/hive/pull/3821#discussion_r1039330370 ## ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java: ## @@ -12550,6 +12550,21 @@ private ParseResult rewriteASTWithMaskAndFilter(TableMask tableMask, ASTNode ast } } + void gatherUserSuppliedFunctions(ASTNode ast) { +int tokenType = ast.getToken().getType(); +if (tokenType == HiveParser.TOK_FUNCTION || +tokenType == HiveParser.TOK_FUNCTIONDI || +tokenType == HiveParser.TOK_FUNCTIONSTAR) { + if (ast.getChild(0).getType() == HiveParser.Identifier) { +// maybe user supplied +this.userSuppliedFunctions.add(ast.getChild(0).getText()); Review Comment: The `ast.getChild(0).getText()` should be trimmed by `unescapeIdentifier(expressionTree.getChild(0).getText())`. Issue Time Tracking --- Worklog Id: (was: 830967) Time Spent: 1h 10m (was: 1h) > Make authorizations on custom UDFs involved in tables/view configurable. > > > Key: HIVE-26799 > URL: https://issues.apache.org/jira/browse/HIVE-26799 > Project: Hive > Issue Type: New Feature > Components: HiveServer2, Security >Affects Versions: 4.0.0-alpha-2 >Reporter: Sai Hemanth Gantasala >Assignee: Sai Hemanth Gantasala >Priority: Major > Labels: pull-request-available > Time Spent: 1h 10m > Remaining Estimate: 0h > > When Hive is using Ranger/Sentry as an authorization service, consider the > following scenario. 
> {code:java} > > create table test_udf(st string); // privileged user operation > > create function Udf_UPPER as 'openkb.hive.udf.MyUpper' using jar > > 'hdfs:///tmp/MyUpperUDF-1.0.0.jar'; // privileged user operation > > create view v1_udf as select udf_upper(st) from test_udf; // privileged > > user operation > //unprivileged user test_user is given select permissions on view v1_udf > > select * from v1_udf; {code} > It is expected that test_user needs to have select privilege on v1_udf and > select permissions on udf_upper custom UDF in order to do a select query on > view. > This patch introduces a configuration > "hive.security.authorization.functions.in.view"=false which disables > authorization on views associated with views/tables during the select query. > In this mode, only UDFs explicitly stated in the query would still be > authorized as it is currently. > The reason for making these custom UDFs associated with view/tables > authorizable is that currently, test_user will need to be granted select > permissions on the custom udf. and the test_user can use this UDF and query > against any other table, which is a security concern. -- This message was sent by Atlassian Jira (v8.20.10#820010)
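The review comment above concerns collecting candidate user-supplied function names while walking the parse tree. A toy sketch of that traversal, outside Hive's actual ASTNode/HiveParser API (the node class, token constants and the backtick-stripping stand-in for `unescapeIdentifier` are all illustrative):

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: walk a parse tree and collect identifiers appearing as
// the first child of a function node, unescaping backtick quoting.
public class FunctionCollector {
    static final int TOK_FUNCTION = 1, IDENTIFIER = 2, OTHER = 0;

    static class Node {
        final int type; final String text; final List<Node> children = new ArrayList<>();
        Node(int type, String text) { this.type = type; this.text = text; }
        Node add(Node c) { children.add(c); return this; }
    }

    static List<String> collect(Node root) {
        List<String> out = new ArrayList<>();
        gather(root, out);
        return out;
    }

    static void gather(Node n, List<String> out) {
        if (n.type == TOK_FUNCTION && !n.children.isEmpty()
                && n.children.get(0).type == IDENTIFIER) {
            // strip backtick quoting -- the "unescapeIdentifier" step the review asks for
            out.add(n.children.get(0).text.replace("`", ""));
        }
        for (Node c : n.children) gather(c, out); // recurse into arguments and subtrees
    }

    public static void main(String[] args) {
        Node call = new Node(TOK_FUNCTION, null)
            .add(new Node(IDENTIFIER, "`udf_upper`"))
            .add(new Node(OTHER, "st"));
        System.out.println(collect(new Node(OTHER, null).add(call))); // [udf_upper]
    }
}
```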
[jira] [Work logged] (HIVE-26762) Remove operand pruning in HiveFilterSetOpTransposeRule
[ https://issues.apache.org/jira/browse/HIVE-26762?focusedWorklogId=830966=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-830966 ] ASF GitHub Bot logged work on HIVE-26762: - Author: ASF GitHub Bot Created on: 05/Dec/22 09:10 Start Date: 05/Dec/22 09:10 Worklog Time Spent: 10m Work Description: kasakrisz commented on code in PR #3825: URL: https://github.com/apache/hive/pull/3825#discussion_r1039321050 ## ql/src/test/results/clientpositive/llap/union_all_filter_transpose_pruned_operands.q.out: ## @@ -0,0 +1,140 @@ +PREHOOK: query: CREATE EXTERNAL TABLE t (a string, b string) +PREHOOK: type: CREATETABLE +PREHOOK: Output: database:default +PREHOOK: Output: default@t +POSTHOOK: query: CREATE EXTERNAL TABLE t (a string, b string) +POSTHOOK: type: CREATETABLE +POSTHOOK: Output: database:default +POSTHOOK: Output: default@t +PREHOOK: query: INSERT INTO t VALUES ('1000', 'b1') +PREHOOK: type: QUERY +PREHOOK: Input: _dummy_database@_dummy_table +PREHOOK: Output: default@t +POSTHOOK: query: INSERT INTO t VALUES ('1000', 'b1') +POSTHOOK: type: QUERY +POSTHOOK: Input: _dummy_database@_dummy_table +POSTHOOK: Output: default@t +POSTHOOK: Lineage: t.a SCRIPT [] +POSTHOOK: Lineage: t.b SCRIPT [] +PREHOOK: query: INSERT INTO t VALUES ('1001', 'b1') +PREHOOK: type: QUERY +PREHOOK: Input: _dummy_database@_dummy_table +PREHOOK: Output: default@t +POSTHOOK: query: INSERT INTO t VALUES ('1001', 'b1') +POSTHOOK: type: QUERY +POSTHOOK: Input: _dummy_database@_dummy_table +POSTHOOK: Output: default@t +POSTHOOK: Lineage: t.a SCRIPT [] +POSTHOOK: Lineage: t.b SCRIPT [] +PREHOOK: query: INSERT INTO t VALUES ('1002', 'b1') +PREHOOK: type: QUERY +PREHOOK: Input: _dummy_database@_dummy_table +PREHOOK: Output: default@t +POSTHOOK: query: INSERT INTO t VALUES ('1002', 'b1') +POSTHOOK: type: QUERY +POSTHOOK: Input: _dummy_database@_dummy_table +POSTHOOK: Output: default@t +POSTHOOK: Lineage: t.a SCRIPT [] +POSTHOOK: Lineage: t.b SCRIPT [] +PREHOOK: query: INSERT INTO t 
VALUES ('2000', 'b2') +PREHOOK: type: QUERY +PREHOOK: Input: _dummy_database@_dummy_table +PREHOOK: Output: default@t +POSTHOOK: query: INSERT INTO t VALUES ('2000', 'b2') +POSTHOOK: type: QUERY +POSTHOOK: Input: _dummy_database@_dummy_table +POSTHOOK: Output: default@t +POSTHOOK: Lineage: t.a SCRIPT [] +POSTHOOK: Lineage: t.b SCRIPT [] +PREHOOK: query: SELECT * FROM ( + SELECT + a, + b + FROM t + UNION ALL + SELECT + a, + b + FROM t + WHERE a = 1001 +UNION ALL + SELECT + a, + b + FROM t + WHERE a = 1002) AS t2 +WHERE a = 1000 +PREHOOK: type: QUERY +PREHOOK: Input: default@t + A masked pattern was here +POSTHOOK: query: SELECT * FROM ( + SELECT + a, + b + FROM t + UNION ALL + SELECT + a, + b + FROM t + WHERE a = 1001 +UNION ALL + SELECT + a, + b + FROM t + WHERE a = 1002) AS t2 +WHERE a = 1000 +POSTHOOK: type: QUERY +POSTHOOK: Input: default@t + A masked pattern was here +1000 b1 +PREHOOK: query: EXPLAIN CBO +SELECT * FROM ( + SELECT + a, + b + FROM t + UNION ALL + SELECT + a, + b + FROM t + WHERE a = 1001 +UNION ALL + SELECT + a, + b + FROM t + WHERE a = 1002) AS t2 +WHERE a = 1000 +PREHOOK: type: QUERY +PREHOOK: Input: default@t + A masked pattern was here +POSTHOOK: query: EXPLAIN CBO +SELECT * FROM ( + SELECT + a, + b + FROM t + UNION ALL + SELECT + a, + b + FROM t + WHERE a = 1001 +UNION ALL + SELECT + a, + b + FROM t + WHERE a = 1002) AS t2 +WHERE a = 1000 +POSTHOOK: type: QUERY +POSTHOOK: Input: default@t + A masked pattern was here +CBO PLAN: +HiveProject(a=[$0], b=[$1]) + HiveFilter(condition=[=(CAST($0):DOUBLE, 1000)]) Review Comment: This test does not intend testing the automatic casting for comparison but pruning empty result union branches. Could you please change the literals to string in the predicates. 
Issue Time Tracking --- Worklog Id: (was: 830966) Time Spent: 40m (was: 0.5h) > Remove operand pruning in HiveFilterSetOpTransposeRule > -- > > Key: HIVE-26762 > URL: https://issues.apache.org/jira/browse/HIVE-26762 > Project: Hive > Issue Type: Task > Components: CBO, Query Planning >Affects Versions: 4.0.0-alpha-2 >Reporter: Alessandro Solimando >Assignee: Alessandro Solimando >Priority: Major > Labels: pull-request-available > Time Spent: 40m > Remaining Estimate: 0h > > HiveFilterSetOpTransposeRule, when applied to UNION ALL operands, checks if > the newly pushed filter simplifies to FALSE (due to the predicates holding on > the input). > If this is true and there is more than one UNION ALL operand, it gets pruned. > After
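The rule's pruning step can be modelled very simply: push the outer equality into each UNION ALL branch and drop branches whose combined predicate is unsatisfiable. This is a toy model only; the real rule works on Calcite RexNodes with RexSimplify, and the review's point is that string-vs-numeric literals additionally introduce a CAST to DOUBLE:

```java
import java.util.ArrayList;
import java.util.List;

// Toy model: each branch constrains column "a" to one constant (null = no
// constraint); a branch with "a = c" contradicts the pushed "a = outerEq"
// unless the constants match, so it simplifies to FALSE and is pruned.
public class UnionBranchPruning {
    static List<Integer> pushAndPrune(List<Integer> branchEq, int outerEq) {
        List<Integer> kept = new ArrayList<>();
        for (Integer eq : branchEq) {
            if (eq == null || eq == outerEq) kept.add(eq);
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Integer> branches = new ArrayList<>();
        branches.add(null);  // unfiltered branch -- survives with pushed filter
        branches.add(1001);  // WHERE a = 1001, contradicts outer a = 1000
        branches.add(1002);  // WHERE a = 1002, contradicts outer a = 1000
        System.out.println(pushAndPrune(branches, 1000).size()); // 1 surviving branch
    }
}
```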
[jira] [Work logged] (HIVE-26221) Add histogram-based column statistics
[ https://issues.apache.org/jira/browse/HIVE-26221?focusedWorklogId=830964=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-830964 ] ASF GitHub Bot logged work on HIVE-26221: - Author: ASF GitHub Bot Created on: 05/Dec/22 09:09 Start Date: 05/Dec/22 09:09 Worklog Time Spent: 10m Work Description: asolimando commented on code in PR #3137: URL: https://github.com/apache/hive/pull/3137#discussion_r1039328901 ## standalone-metastore/metastore-server/src/main/sql/mysql/upgrade-4.0.0-alpha-2-to-4.0.0.mysql.sql: ## @@ -1,5 +1,9 @@ SELECT 'Upgrading MetaStore schema from 4.0.0-alpha-2 to 4.0.0' AS MESSAGE; + Issue Time Tracking --- Worklog Id: (was: 830964) Time Spent: 2h 50m (was: 2h 40m) > Add histogram-based column statistics > - > > Key: HIVE-26221 > URL: https://issues.apache.org/jira/browse/HIVE-26221 > Project: Hive > Issue Type: Improvement > Components: CBO, Metastore, Statistics >Affects Versions: 4.0.0-alpha-2 >Reporter: Alessandro Solimando >Assignee: Alessandro Solimando >Priority: Major > Labels: pull-request-available > Time Spent: 2h 50m > Remaining Estimate: 0h > > Hive does not support histogram statistics, which are particularly useful for > skewed data (which is very common in practice) and range predicates. > Hive's current selectivity estimation for range predicates is based on a > hard-coded value of 1/3 (see > [FilterSelectivityEstimator.java#L138-L144|https://github.com/apache/hive/blob/56c336268ea8c281d23c22d89271af37cb7e2572/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/FilterSelectivityEstimator.java#L138-L144]).]) > The current proposal aims at integrating histogram as an additional column > statistics, stored into the Hive metastore at the table (or partition) level. 
[jira] [Work logged] (HIVE-26754) Implement array_distinct UDF to return an array after removing duplicates in it
[ https://issues.apache.org/jira/browse/HIVE-26754?focusedWorklogId=830962=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-830962 ] ASF GitHub Bot logged work on HIVE-26754: - Author: ASF GitHub Bot Created on: 05/Dec/22 09:00 Start Date: 05/Dec/22 09:00 Worklog Time Spent: 10m Work Description: tarak271 commented on PR #3806: URL: https://github.com/apache/hive/pull/3806#issuecomment-1336982493 Test failure seems unrelated to these changes. The test 'orc_ppd_basic.q' is failing even without my changes Issue Time Tracking --- Worklog Id: (was: 830962) Time Spent: 3h 50m (was: 3h 40m) > Implement array_distinct UDF to return an array after removing duplicates in > it > --- > > Key: HIVE-26754 > URL: https://issues.apache.org/jira/browse/HIVE-26754 > Project: Hive > Issue Type: Sub-task > Components: Hive >Reporter: Taraka Rama Rao Lethavadla >Assignee: Taraka Rama Rao Lethavadla >Priority: Major > Labels: pull-request-available > Time Spent: 3h 50m > Remaining Estimate: 0h > > *array_distinct(array(obj1, obj2,...))* - The function returns an array of > the same type as the input argument where all duplicate values have been > removed. > Example: > > SELECT array_distinct(array('b', 'd', 'd', 'a')) FROM src LIMIT 1; > ['b', 'd', 'a'] -- This message was sent by Atlassian Jira (v8.20.10#820010)
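The intended semantics, deduplication keeping the first occurrence of each element, can be sketched in plain Java, outside Hive's GenericUDF machinery (this is a reference sketch, not the UDF's implementation):

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

// Reference semantics for array_distinct: remove duplicates while keeping
// first-seen order.
public class ArrayDistinct {
    static <T> List<T> arrayDistinct(List<T> in) {
        // LinkedHashSet drops duplicates but preserves insertion order
        return new ArrayList<>(new LinkedHashSet<>(in));
    }

    public static void main(String[] args) {
        System.out.println(arrayDistinct(List.of("b", "d", "d", "a"))); // [b, d, a]
    }
}
```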
[jira] [Work logged] (HIVE-26221) Add histogram-based column statistics
[ https://issues.apache.org/jira/browse/HIVE-26221?focusedWorklogId=830961=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-830961 ] ASF GitHub Bot logged work on HIVE-26221: - Author: ASF GitHub Bot Created on: 05/Dec/22 08:52 Start Date: 05/Dec/22 08:52 Worklog Time Spent: 10m Work Description: dengzhhu653 commented on code in PR #3137: URL: https://github.com/apache/hive/pull/3137#discussion_r1039312469 ## standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/columnstats/aggr/LongColumnStatsAggregator.java: ## @@ -51,73 +52,94 @@ public ColumnStatisticsObj aggregate(List colStatsWit checkStatisticsList(colStatsWithSourceInfo); ColumnStatisticsObj statsObj = null; -String colType = null; +String colType; String colName = null; // check if all the ColumnStatisticsObjs contain stats and all the ndv are // bitvectors boolean doAllPartitionContainStats = partNames.size() == colStatsWithSourceInfo.size(); NumDistinctValueEstimator ndvEstimator = null; +KllHistogramEstimator histogramEstimator = null; +boolean areAllNDVEstimatorsMergeable = true; +boolean areAllHistogramEstimatorsMergeable = true; for (ColStatsObjWithSourceInfo csp : colStatsWithSourceInfo) { ColumnStatisticsObj cso = csp.getColStatsObj(); if (statsObj == null) { colName = cso.getColName(); colType = cso.getColType(); statsObj = ColumnStatsAggregatorFactory.newColumnStaticsObj(colName, colType, cso.getStatsData().getSetField()); -LOG.trace("doAllPartitionContainStats for column: {} is: {}", colName, -doAllPartitionContainStats); +LOG.trace("doAllPartitionContainStats for column: {} is: {}", colName, doAllPartitionContainStats); } - LongColumnStatsDataInspector longColumnStatsData = longInspectorFromStats(cso); - if (longColumnStatsData.getNdvEstimator() == null) { -ndvEstimator = null; -break; - } else { -// check if all of the bit vectors can merge -NumDistinctValueEstimator estimator = longColumnStatsData.getNdvEstimator(); + 
LongColumnStatsDataInspector columnStatsData = longInspectorFromStats(cso); + + // check if we can merge NDV estimators + if (columnStatsData.getNdvEstimator() == null) { +areAllNDVEstimatorsMergeable = false; + } else if (areAllNDVEstimatorsMergeable) { +NumDistinctValueEstimator estimator = columnStatsData.getNdvEstimator(); if (ndvEstimator == null) { ndvEstimator = estimator; } else { - if (ndvEstimator.canMerge(estimator)) { -continue; - } else { -ndvEstimator = null; -break; + if (!ndvEstimator.canMerge(estimator)) { +areAllNDVEstimatorsMergeable = false; + } +} + } + // check if we can merge histogram estimators + if (columnStatsData.getHistogramEstimator() == null) { Review Comment: To keep things simple, can we call ```java // merge what can be merged and keep the one with the biggest cardinality KllHistogramEstimator mergedKllHistogramEstimator = mergeHistograms(colStatsWithSourceInfo); if (mergedKllHistogramEstimator != null) { columnStatisticsData.getLongStats().setHistogram(mergedKllHistogramEstimator.serialize()); } ``` directly to aggregate the histogram statistics instead of introducing `areAllHistogramEstimatorsMergeable` and `histogramEstimator` via iterating over the `colStatsWithSourceInfo`? Issue Time Tracking --- Worklog Id: (was: 830961) Time Spent: 2h 40m (was: 2.5h) > Add histogram-based column statistics > - > > Key: HIVE-26221 > URL: https://issues.apache.org/jira/browse/HIVE-26221 > Project: Hive > Issue Type: Improvement > Components: CBO, Metastore, Statistics >Affects Versions: 4.0.0-alpha-2 >Reporter: Alessandro Solimando >Assignee: Alessandro Solimando >Priority: Major > Labels: pull-request-available > Time Spent: 2h 40m > Remaining Estimate: 0h > > Hive does not support histogram statistics, which are particularly useful for > skewed data (which is very common in practice) and range predicates. 
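The reviewer's suggestion, "merge what can be merged and keep the one with the biggest cardinality" instead of tracking an all-or-nothing mergeability flag, can be sketched as follows. `Estimator`, `canMerge`, `mergeWith` and the cardinality field are stand-ins for Hive's KllHistogramEstimator API, not its actual signatures:

```java
import java.util.Arrays;
import java.util.List;

// Toy sketch of per-partition histogram aggregation: compatible sketches are
// merged; an incompatible one replaces the result only if it covers more items.
public class HistogramMerge {
    static class Estimator {
        final int k; // sketch parameter; only equal-k sketches are mergeable here
        long n;      // number of items summarized (the "cardinality")
        Estimator(int k, long n) { this.k = k; this.n = n; }
        boolean canMerge(Estimator o) { return o != null && o.k == k; }
        void mergeWith(Estimator o) { n += o.n; }
    }

    static Estimator mergeHistograms(List<Estimator> perPartition) {
        Estimator merged = null;
        for (Estimator e : perPartition) {
            if (e == null) continue;                 // partition without a histogram
            if (merged == null) merged = new Estimator(e.k, e.n);
            else if (merged.canMerge(e)) merged.mergeWith(e);
            else if (e.n > merged.n) merged = new Estimator(e.k, e.n); // keep biggest
        }
        return merged;
    }

    public static void main(String[] args) {
        Estimator m = mergeHistograms(Arrays.asList(
            new Estimator(200, 100), null, new Estimator(200, 50)));
        System.out.println(m.n); // 150
    }
}
```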
[jira] [Comment Edited] (HIVE-26737) Subquery returning wrong results when database has materialized views
[ https://issues.apache.org/jira/browse/HIVE-26737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17643166#comment-17643166 ] Krisztian Kasa edited comment on HIVE-26737 at 12/5/22 8:51 AM: [#3761|https://github.com/apache/hive/pull/3761] merged to master. Thanks [~scarlin] for the patch. was (Author: kkasa): Merged to master. Thanks [~scarlin] for the patch. > Subquery returning wrong results when database has materialized views > - > > Key: HIVE-26737 > URL: https://issues.apache.org/jira/browse/HIVE-26737 > Project: Hive > Issue Type: Bug > Components: HiveServer2 >Reporter: Steve Carlin >Assignee: Steve Carlin >Priority: Major > Labels: pull-request-available > Time Spent: 2h 50m > Remaining Estimate: 0h > > When HS2 has materialized views in its registry, subqueries with correlated > variables may return wrong results. > An example of this: > > {code:java} > CREATE TABLE t_test1( > id int, > int_col int, > year int, > month int > ); > CREATE TABLE t_test2( > id int, > int_col int, > year int, > month int > ); > CREATE TABLE dummy ( > id int > ) stored as orc TBLPROPERTIES ('transactional'='true'); > CREATE MATERIALIZED VIEW need_a_mat_view_in_registry AS > SELECT * FROM dummy where id > 5; > INSERT INTO t_test1 VALUES (1, 1, 2009, 1), (10,0, 2009, 1); > INSERT INTO t_test2 VALUES (1, 1, 2009, 1); > select id, int_col, year, month from t_test1 s where s.int_col = (select > count(*) from t_test2 t where s.id = t.id) order by id; > {code} > The select statement should produce 2 rows, but it is only producing one. > The CBO plan produced has an inner join instead of a left join. 
> {code:java} > HiveSortLimit(sort0=[$0], dir0=[ASC]) > HiveProject(id=[$0], int_col=[$1], year=[$2], month=[$3]) > HiveJoin(condition=[AND(=($0, $5), =($4, $6))], joinType=[inner], > algorithm=[none], cost=[not available]) > HiveProject(id=[$0], int_col=[$1], year=[$2], month=[$3], > CAST=[CAST($1):BIGINT]) > HiveFilter(condition=[AND(IS NOT NULL($0), IS NOT > NULL(CAST($1):BIGINT))]) > HiveTableScan(table=[[default, t_test1]], table:alias=[s]) > HiveProject(id=[$0], $f1=[$1]) > HiveFilter(condition=[IS NOT NULL($1)]) > HiveAggregate(group=[{0}], agg#0=[count()]) > HiveFilter(condition=[IS NOT NULL($0)]) > HiveTableScan(table=[[default, t_test2]], table:alias=[t]){code} > -- This message was sent by Atlassian Jira (v8.20.10#820010)
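The inner-vs-left join distinction in the plan above matters because of how the correlated count is decorrelated: ids absent from t_test2 must still surface with a count of 0. A toy simulation (plain Java, not Hive's planner) of the two join choices on the reported data:

```java
import java.util.HashMap;
import java.util.Map;

// Toy illustration: "s.int_col = (select count(*) from t where s.id = t.id)"
// needs a LEFT join against the per-id counts; an inner join drops rows with
// no matching group, which is the wrong-result bug reported here.
public class CorrelatedCountJoin {
    static int matches(int[][] s, Map<Integer, Long> countsById, boolean leftJoin) {
        int rows = 0;
        for (int[] row : s) {               // row = {id, int_col}
            Long cnt = countsById.get(row[0]);
            if (cnt == null) {
                if (!leftJoin) continue;    // inner join: unmatched row dropped
                cnt = 0L;                   // left join: missing group counts as 0
            }
            if (row[1] == cnt) rows++;
        }
        return rows;
    }

    public static void main(String[] args) {
        int[][] t_test1 = {{1, 1}, {10, 0}};
        Map<Integer, Long> counts = new HashMap<>();
        counts.put(1, 1L);                  // t_test2 has a single row with id = 1
        System.out.println(matches(t_test1, counts, true));  // 2 (correct)
        System.out.println(matches(t_test1, counts, false)); // 1 (the bug)
    }
}
```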
[jira] [Resolved] (HIVE-26737) Subquery returning wrong results when database has materialized views
[ https://issues.apache.org/jira/browse/HIVE-26737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krisztian Kasa resolved HIVE-26737. --- Resolution: Fixed Merged to master. Thanks [~scarlin] for the patch. > Subquery returning wrong results when database has materialized views > - > > Key: HIVE-26737 > URL: https://issues.apache.org/jira/browse/HIVE-26737 > Project: Hive > Issue Type: Bug > Components: HiveServer2 >Reporter: Steve Carlin >Assignee: Steve Carlin >Priority: Major > Labels: pull-request-available > Time Spent: 2h 50m > Remaining Estimate: 0h > > When HS2 has materialized views in its registry, subqueries with correlated > variables may return wrong results. > An example of this: > > {code:java} > CREATE TABLE t_test1( > id int, > int_col int, > year int, > month int > ); > CREATE TABLE t_test2( > id int, > int_col int, > year int, > month int > ); > CREATE TABLE dummy ( > id int > ) stored as orc TBLPROPERTIES ('transactional'='true'); > CREATE MATERIALIZED VIEW need_a_mat_view_in_registry AS > SELECT * FROM dummy where id > 5; > INSERT INTO t_test1 VALUES (1, 1, 2009, 1), (10,0, 2009, 1); > INSERT INTO t_test2 VALUES (1, 1, 2009, 1); > select id, int_col, year, month from t_test1 s where s.int_col = (select > count(*) from t_test2 t where s.id = t.id) order by id; > {code} > The select statement should produce 2 rows, but it is only producing one. > The CBO plan produced has an inner join instead of a left join. 
[jira] [Work logged] (HIVE-26737) Subquery returning wrong results when database has materialized views
[ https://issues.apache.org/jira/browse/HIVE-26737?focusedWorklogId=830960=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-830960 ] ASF GitHub Bot logged work on HIVE-26737: - Author: ASF GitHub Bot Created on: 05/Dec/22 08:49 Start Date: 05/Dec/22 08:49 Worklog Time Spent: 10m Work Description: kasakrisz merged PR #3761: URL: https://github.com/apache/hive/pull/3761 Issue Time Tracking --- Worklog Id: (was: 830960) Time Spent: 2h 50m (was: 2h 40m) > Subquery returning wrong results when database has materialized views > - > > Key: HIVE-26737 > URL: https://issues.apache.org/jira/browse/HIVE-26737 > Project: Hive > Issue Type: Bug > Components: HiveServer2 >Reporter: Steve Carlin >Assignee: Steve Carlin >Priority: Major > Labels: pull-request-available > Time Spent: 2h 50m > Remaining Estimate: 0h > > When HS2 has materialized views in its registry, subqueries with correlated > variables may return wrong results. > An example of this: > > {code:java} > CREATE TABLE t_test1( > id int, > int_col int, > year int, > month int > ); > CREATE TABLE t_test2( > id int, > int_col int, > year int, > month int > ); > CREATE TABLE dummy ( > id int > ) stored as orc TBLPROPERTIES ('transactional'='true'); > CREATE MATERIALIZED VIEW need_a_mat_view_in_registry AS > SELECT * FROM dummy where id > 5; > INSERT INTO t_test1 VALUES (1, 1, 2009, 1), (10,0, 2009, 1); > INSERT INTO t_test2 VALUES (1, 1, 2009, 1); > select id, int_col, year, month from t_test1 s where s.int_col = (select > count(*) from t_test2 t where s.id = t.id) order by id; > {code} > The select statement should produce 2 rows, but it is only producing one. > The CBO plan produced has an inner join instead of a left join. 
> {code:java}
> HiveSortLimit(sort0=[$0], dir0=[ASC])
>   HiveProject(id=[$0], int_col=[$1], year=[$2], month=[$3])
>     HiveJoin(condition=[AND(=($0, $5), =($4, $6))], joinType=[inner], algorithm=[none], cost=[not available])
>       HiveProject(id=[$0], int_col=[$1], year=[$2], month=[$3], CAST=[CAST($1):BIGINT])
>         HiveFilter(condition=[AND(IS NOT NULL($0), IS NOT NULL(CAST($1):BIGINT))])
>           HiveTableScan(table=[[default, t_test1]], table:alias=[s])
>       HiveProject(id=[$0], $f1=[$1])
>         HiveFilter(condition=[IS NOT NULL($1)])
>           HiveAggregate(group=[{0}], agg#0=[count()])
>             HiveFilter(condition=[IS NOT NULL($0)])
>               HiveTableScan(table=[[default, t_test2]], table:alias=[t])
> {code}

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
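The plan above joins the aggregate side back to `t_test1` with an inner join, which is exactly what loses the second row: a correlated scalar `count(*)` subquery is defined to yield 0 (not "no row") for outer rows with no match, so the decorrelating rewrite needs a left outer join with the missing count treated as 0. A minimal sketch of those semantics in Python (illustrative only, not Hive code; the helper names are made up):

```python
# Rows mimicking the repro: (id, int_col, year, month)
t_test1 = [(1, 1, 2009, 1), (10, 0, 2009, 1)]
t_test2 = [(1, 1, 2009, 1)]

def counts_by_id(rows):
    """Aggregate side of the decorrelated subquery: count(*) grouped by id."""
    out = {}
    for r in rows:
        out[r[0]] = out.get(r[0], 0) + 1
    return out

def correct_left_join(outer, counts):
    """Left-join semantics: an outer row with no match gets count 0,
    which is what SQL defines for a correlated scalar count(*) subquery."""
    return sorted(r for r in outer if r[1] == counts.get(r[0], 0))

def buggy_inner_join(outer, counts):
    """Inner-join semantics (the plan Hive produced): unmatched rows vanish."""
    return sorted(r for r in outer if r[0] in counts and r[1] == counts[r[0]])

counts = counts_by_id(t_test2)
print(correct_left_join(t_test1, counts))  # both rows: id=10 has count 0 == int_col 0
print(buggy_inner_join(t_test1, counts))   # only the id=1 row survives
```

Row `(10, 0, 2009, 1)` has no match in `t_test2`, so its count is 0, which equals its `int_col`; the inner-join variant drops it, reproducing the one-row result from the report.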
[jira] [Work logged] (HIVE-26770) Make "end of loop" compaction logs appear more selectively
[ https://issues.apache.org/jira/browse/HIVE-26770?focusedWorklogId=830959=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-830959 ]

ASF GitHub Bot logged work on HIVE-26770:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 05/Dec/22 08:36
Start Date: 05/Dec/22 08:36
Worklog Time Spent: 10m
Work Description: sonarcloud[bot] commented on PR #3803
URL: https://github.com/apache/hive/pull/3803#issuecomment-1336957318

Kudos, SonarCloud Quality Gate passed!

* 1 Bug
* 0 Vulnerabilities
* 0 Security Hotspots
* 10 Code Smells
* No Coverage information
* No Duplication information

Issue Time Tracking
-------------------
    Worklog Id: (was: 830959)
    Time Spent: 4h 50m  (was: 4h 40m)

> Make "end of loop" compaction logs appear more selectively
> ----------------------------------------------------------
>
>                 Key: HIVE-26770
>                 URL: https://issues.apache.org/jira/browse/HIVE-26770
>             Project: Hive
>          Issue Type: Improvement
>    Affects Versions: 4.0.0-alpha-1
>            Reporter: Akshat Mathur
>            Assignee: Akshat Mathur
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 4h 50m
>  Remaining Estimate: 0h
>
> Currently the Initiator, Worker, and Cleaner threads log something like
> "finished one loop" on INFO level.
> This is useful to figure out if one of these threads is taking too long to
> finish a loop, but expensive in general.
>
> Suggested Time: 20mins
>
> Logging this should be changed in the following way:
> # If the loop finished within a predefined amount of time, the level should be DEBUG and the message should look like: *Initiator loop took \{elapsedTime} seconds to finish.*
> # If the loop ran longer than this predefined amount, the level should be WARN and the message should look like: *Possible Initiator slowdown, loop took \{elapsedTime} seconds to finish.*

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
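The two-level rule above amounts to comparing the loop's elapsed time against a threshold before choosing a log level. A sketch of that idea in Python (illustrative only; the threshold constant and function are made up, and the real Hive change would be Java with a configurable threshold):

```python
import logging

# Illustrative threshold; in Hive this would come from configuration.
LOOP_WARN_THRESHOLD_SECONDS = 60

log = logging.getLogger("compactor")

def log_loop_duration(thread_name, elapsed_seconds,
                      threshold=LOOP_WARN_THRESHOLD_SECONDS):
    """DEBUG for ordinary iterations; WARN only when the loop is suspiciously slow.

    Returns the level chosen, which makes the policy easy to test.
    """
    if elapsed_seconds > threshold:
        log.warning("Possible %s slowdown, loop took %s seconds to finish.",
                    thread_name, elapsed_seconds)
        return "WARN"
    log.debug("%s loop took %s seconds to finish.", thread_name, elapsed_seconds)
    return "DEBUG"
```

With this policy a healthy cluster emits nothing at default (INFO) verbosity, while a slow Initiator, Worker, or Cleaner iteration still surfaces as a warning.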
[jira] [Work logged] (HIVE-26788) Update stats of table/partition after minor compaction using noscan operation
[ https://issues.apache.org/jira/browse/HIVE-26788?focusedWorklogId=830957=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-830957 ]

ASF GitHub Bot logged work on HIVE-26788:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 05/Dec/22 08:21
Start Date: 05/Dec/22 08:21
Worklog Time Spent: 10m
Work Description: deniskuzZ commented on code in PR #3812
URL: https://github.com/apache/hive/pull/3812#discussion_r1039282031

##########
ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/StatsUpdater.java:
##########

@@ -52,10 +52,6 @@ public final class StatsUpdater {
    */
   public void gatherStats(CompactionInfo ci, HiveConf conf, String userName, String compactionQueueName) {
     try {
-      if (!ci.isMajorCompaction()) {

Review Comment:
   How much overhead could we get on a production cluster? AFAIK, when multiple workers are used, those would try to initiate a new Tez session.

Issue Time Tracking
-------------------
    Worklog Id: (was: 830957)
    Time Spent: 50m  (was: 40m)

> Update stats of table/partition after minor compaction using noscan operation
> -----------------------------------------------------------------------------
>
>                 Key: HIVE-26788
>                 URL: https://issues.apache.org/jira/browse/HIVE-26788
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Sourabh Badhya
>            Assignee: Sourabh Badhya
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 50m
>  Remaining Estimate: 0h
>
> Currently, statistics are not updated for minor compaction, since minor
> compaction changes the statistics only slightly (such as the number of files
> in a table/partition and the total size of the table/partition). It is
> better to use the NOSCAN operation for minor compaction, since NOSCAN
> updates statistics faster and refreshes exactly the relevant fields: the
> number of files and the total size of the table/partition.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[jira] [Work logged] (HIVE-26788) Update stats of table/partition after minor compaction using noscan operation
[ https://issues.apache.org/jira/browse/HIVE-26788?focusedWorklogId=830956=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-830956 ]

ASF GitHub Bot logged work on HIVE-26788:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 05/Dec/22 08:17
Start Date: 05/Dec/22 08:17
Worklog Time Spent: 10m
Work Description: deniskuzZ commented on code in PR #3812
URL: https://github.com/apache/hive/pull/3812#discussion_r1039279154

##########
ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/StatsUpdater.java:
##########

@@ -73,6 +69,9 @@ public void gatherStats(CompactionInfo ci, HiveConf conf, String userName, String compactionQueueName) {
         sb.append(")");
       }
       sb.append(" compute statistics");
+      if (ci.isMinorCompaction()) {
+        sb.append(" noscan");

Review Comment:
   Why is `noscan` used only in the case of minor compaction?

Issue Time Tracking
-------------------
    Worklog Id: (was: 830956)
    Time Spent: 40m  (was: 0.5h)

> Update stats of table/partition after minor compaction using noscan operation
> -----------------------------------------------------------------------------
>
>                 Key: HIVE-26788
>                 URL: https://issues.apache.org/jira/browse/HIVE-26788
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Sourabh Badhya
>            Assignee: Sourabh Badhya
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> Currently, statistics are not updated for minor compaction, since minor
> compaction changes the statistics only slightly (such as the number of files
> in a table/partition and the total size of the table/partition). It is
> better to use the NOSCAN operation for minor compaction, since NOSCAN
> updates statistics faster and refreshes exactly the relevant fields: the
> number of files and the total size of the table/partition.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
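The effect of the diff under review is easiest to see as the statement StatsUpdater ends up building. A Python sketch of that string assembly (illustrative only; the helper and its parameters are made up, mirroring the Java StringBuilder logic in `gatherStats`):

```python
def build_stats_command(table, partition_spec=None, minor_compaction=False):
    """Assemble an ANALYZE TABLE ... COMPUTE STATISTICS command.

    For minor compaction the patch appends 'noscan', which refreshes only
    file counts and total sizes -- the fields minor compaction actually
    changes -- without rescanning the data.
    """
    parts = ["analyze table", table]
    if partition_spec:
        cols = ", ".join(f"{k}='{v}'" for k, v in partition_spec.items())
        parts.append(f"partition({cols})")
    parts.append("compute statistics")
    if minor_compaction:
        parts.append("noscan")
    return " ".join(parts)

print(build_stats_command("default.t", minor_compaction=True))
# analyze table default.t compute statistics noscan
```

Major compaction keeps the full-scan form, which also recomputes column-level statistics; minor compaction gets the cheap `noscan` variant.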