[jira] [Updated] (HIVE-25300) Fix hive conf items validator type

2021-09-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-25300:
--
Labels: pull-request-available  (was: )

> Fix hive conf items validator type
> --
>
> Key: HIVE-25300
> URL: https://issues.apache.org/jira/browse/HIVE-25300
> Project: Hive
>  Issue Type: Improvement
>Reporter: Jeff Min
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Hive conf items should use RangeValidator
>  # hive.mv.files.thread
>  # hive.load.dynamic.partitions.thread
>  # hive.exec.input.listing.max.threads
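
For illustration, a range check on a thread-count setting could look like the sketch below. This is not Hive's RangeValidator: the class name, the method, and the bounds 1 and 1024 are assumptions made only for this example.

```java
// Hypothetical sketch of range validation for an integer config value.
// Names and bounds are illustrative and not taken from HiveConf.
final class ThreadCountRangeSketch {

  /** Returns null when value is an integer in [min, max], otherwise an error message. */
  static String validate(String value, int min, int max) {
    int parsed;
    try {
      parsed = Integer.parseInt(value.trim());
    } catch (NumberFormatException e) {
      return "Not an integer: " + value;
    }
    if (parsed < min || parsed > max) {
      return "Value " + parsed + " is outside the range [" + min + ", " + max + "]";
    }
    return null; // value is acceptable
  }
}
```

With bounds like these, a value such as 0 or a non-numeric string is rejected when the setting is checked, instead of surfacing later when a thread pool is created.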



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25300) Fix hive conf items validator type

2021-09-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25300?focusedWorklogId=646534&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-646534
 ]

ASF GitHub Bot logged work on HIVE-25300:
-

Author: ASF GitHub Bot
Created on: 04/Sep/21 00:09
Start Date: 04/Sep/21 00:09
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] commented on pull request #2439:
URL: https://github.com/apache/hive/pull/2439#issuecomment-912871432


   This pull request has been automatically marked as stale because it has not 
had recent activity. It will be closed if no further activity occurs.
   Feel free to reach out on the d...@hive.apache.org list if the patch is in 
need of reviews.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 646534)
Remaining Estimate: 0h
Time Spent: 10m



[jira] [Work logged] (HIVE-25277) Slow Hive partition deletion for Cloud object stores with expensive ListFiles

2021-09-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25277?focusedWorklogId=646433&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-646433
 ]

ASF GitHub Bot logged work on HIVE-25277:
-

Author: ASF GitHub Bot
Created on: 03/Sep/21 18:39
Start Date: 03/Sep/21 18:39
Worklog Time Spent: 10m 
  Work Description: coufon commented on a change in pull request #2421:
URL: https://github.com/apache/hive/pull/2421#discussion_r702097544



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HMSHandler.java
##
@@ -5240,16 +5259,38 @@ public DropPartitionsResult drop_partitions_req(
 for (Path path : archToDelete) {
   wh.deleteDir(path, true, mustPurge, needsCm);
 }
+
+// Uses a priority queue to delete the parents of deleted directories if empty.
+// The parent with the largest size is always processed first. It guarantees that
+// the emptiness of a parent won't be changed once it has been processed. So duplicated
+// processing can be avoided.
+PriorityQueue<PathAndPartValSize> parentsToDelete = new PriorityQueue<>();
 for (PathAndPartValSize p : dirsToDelete) {
   wh.deleteDir(p.path, true, mustPurge, needsCm);
+  addParentForDel(parentsToDelete, p);
+}
+
+HashSet<PathAndPartValSize> processed = new HashSet<>();
+while (!parentsToDelete.isEmpty()) {
   try {
-deleteParentRecursive(p.path.getParent(), p.partValSize - 1, mustPurge, needsCm);
+PathAndPartValSize p = parentsToDelete.poll();
+if (processed.contains(p)) {
+  continue;
+}
+processed.add(p);
+
+Path path = p.path;
+if (wh.isWritable(path) && wh.isDir(path) && wh.isEmptyDir(path)) {

Review comment:
   wh.isEmptyDir uses listStatus, which doesn't distinguish between files and 
directories (at least in the GCS fs implementation: 
https://github.com/GoogleCloudDataproc/hadoop-connectors/blob/7825ab50c839aea43f1ff587b0e2803047af99bc/gcsio/src/main/java/com/google/cloud/hadoop/gcsio/GoogleCloudStorageFileSystem.java#L997).
 But I agree that isEmptyDir is enough regardless of whether the path is a file or a dir.






Issue Time Tracking
---

Worklog Id: (was: 646433)
Time Spent: 3.5h  (was: 3h 20m)

> Slow Hive partition deletion for Cloud object stores with expensive ListFiles
> -
>
> Key: HIVE-25277
> URL: https://issues.apache.org/jira/browse/HIVE-25277
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore
>Affects Versions: All Versions
>Reporter: Zhou Fang
>Assignee: Zhou Fang
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> Deleting a Hive partition is slow when the warehouse is a Cloud object store 
> for which ListFiles is expensive. One root cause is that the recursive 
> deletion of parent directories is very inefficient: it makes many duplicated 
> calls to isEmpty (which ultimately calls ListFiles). This fix sorts the 
> parents to delete by path size and always processes the longest one first 
> (e.g., a/b/c is always processed before a/b). As a result, each parent path 
> needs to be checked only once.
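
A minimal sketch of the deepest-first idea described above, using an in-memory set of paths in place of a real file system. The class name, the String-based paths, the `root` parameter and the `isEmpty` helper are assumptions made for illustration; this is not the HMSHandler code from the pull request.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.PriorityQueue;
import java.util.Set;

// Hypothetical sketch: delete leaf directories, then check each ancestor once, deepest first.
final class DeepestFirstCleanupSketch {

  /** Number of components in a slash-separated path, e.g. "a/b/c" -> 3. */
  static int depth(String path) {
    return path.split("/").length;
  }

  /** Parent of a path, or null for a single-component path. */
  static String parent(String path) {
    int i = path.lastIndexOf('/');
    return i < 0 ? null : path.substring(0, i);
  }

  /** Stand-in for the expensive listing call: true if nothing exists under dir. */
  static boolean isEmpty(Set<String> existing, String dir) {
    for (String p : existing) {
      if (p.startsWith(dir + "/")) {
        return false;
      }
    }
    return true;
  }

  /** Queues the parent of path unless it is the root, which is never deleted. */
  static void enqueueParent(PriorityQueue<String> queue, String path, String root) {
    String p = parent(path);
    if (p != null && !p.equals(root)) {
      queue.add(p);
    }
  }

  /** Returns the parents whose emptiness was checked, in the order they were checked. */
  static List<String> cleanup(Set<String> existing, String root, List<String> deletedLeaves) {
    // Deeper paths compare as smaller, so poll() returns the deepest path first.
    PriorityQueue<String> parents =
        new PriorityQueue<>((a, b) -> Integer.compare(depth(b), depth(a)));
    for (String leaf : deletedLeaves) {
      existing.remove(leaf);
      enqueueParent(parents, leaf, root);
    }
    Set<String> processed = new HashSet<>();
    List<String> checked = new ArrayList<>();
    while (!parents.isEmpty()) {
      String dir = parents.poll();
      if (!processed.add(dir)) {
        continue; // already checked; skip the duplicate entry
      }
      checked.add(dir);
      if (existing.contains(dir) && isEmpty(existing, dir)) {
        existing.remove(dir);
        enqueueParent(parents, dir, root);
      }
    }
    return checked;
  }
}
```

Because every child of a directory is deeper than the directory itself, all children have been handled by the time the directory is polled, so one emptiness check per directory is enough.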





[jira] [Work logged] (HIVE-25277) Slow Hive partition deletion for Cloud object stores with expensive ListFiles

2021-09-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25277?focusedWorklogId=646431&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-646431
 ]

ASF GitHub Bot logged work on HIVE-25277:
-

Author: ASF GitHub Bot
Created on: 03/Sep/21 18:25
Start Date: 03/Sep/21 18:25
Worklog Time Spent: 10m 
  Work Description: coufon commented on a change in pull request #2421:
URL: https://github.com/apache/hive/pull/2421#discussion_r702090099



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HMSHandler.java
##
@@ -5240,16 +5259,38 @@ public DropPartitionsResult drop_partitions_req(
 for (Path path : archToDelete) {
   wh.deleteDir(path, true, mustPurge, needsCm);
 }
+
+// Uses a priority queue to delete the parents of deleted directories if empty.
+// The parent with the largest size is always processed first. It guarantees that

Review comment:
   Done.






Issue Time Tracking
---

Worklog Id: (was: 646431)
Time Spent: 3h 20m  (was: 3h 10m)



[jira] [Work logged] (HIVE-25277) Slow Hive partition deletion for Cloud object stores with expensive ListFiles

2021-09-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25277?focusedWorklogId=646429&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-646429
 ]

ASF GitHub Bot logged work on HIVE-25277:
-

Author: ASF GitHub Bot
Created on: 03/Sep/21 18:20
Start Date: 03/Sep/21 18:20
Worklog Time Spent: 10m 
  Work Description: coufon commented on a change in pull request #2421:
URL: https://github.com/apache/hive/pull/2421#discussion_r702088091



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HMSHandler.java
##
@@ -5104,14 +5103,34 @@ public boolean drop_partition(final String db_name, final String tbl_name,
 null);
   }
 
-  private static class PathAndPartValSize {
-PathAndPartValSize(Path path, int partValSize) {
-  this.path = path;
-  this.partValSize = partValSize;
+/** Stores a path and its size. */
+private static class PathAndPartValSize implements Comparable<PathAndPartValSize> {
+
+  public Path path;
+  int partValSize;
+
+  public PathAndPartValSize(Path path, int partValSize) {
+this.path = path;
+this.partValSize = partValSize;
+  }
+
+  @Override
+  public boolean equals(Object o) {
+if (o == this) {
+  return true;
+}
+if (!(o instanceof PathAndPartValSize)) {
+  return false;
+}
+return path.equals(((PathAndPartValSize) o).path);

Review comment:
   Nice catch. It is a bug. The current code didn't work correctly with the 
HashSet: it used the hash code of the object itself rather than that of the 
(path, depth) pair.
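
As a sketch of the point above: a HashSet only treats two objects as duplicates when `equals` and `hashCode` agree. The class below is a hypothetical stand-in for PathAndPartValSize that uses a String path and derives both methods from the (path, depth) pair; it is not the code from the pull request.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical sketch: equals and hashCode both derived from the (path, depth) pair.
final class PathKeySketch {

  static final class PathAndDepth {
    final String path;
    final int depth;

    PathAndDepth(String path, int depth) {
      this.path = path;
      this.depth = depth;
    }

    @Override
    public boolean equals(Object o) {
      if (o == this) {
        return true;
      }
      if (!(o instanceof PathAndDepth)) {
        return false;
      }
      PathAndDepth other = (PathAndDepth) o;
      return depth == other.depth && path.equals(other.path);
    }

    /** Must agree with equals, otherwise equal keys can land in different hash buckets. */
    @Override
    public int hashCode() {
      return Objects.hash(path, depth);
    }
  }

  /** Number of distinct items as seen by a HashSet. */
  static int distinctCount(PathAndDepth... items) {
    Set<PathAndDepth> set = new HashSet<>();
    for (PathAndDepth item : items) {
      set.add(item);
    }
    return set.size();
  }
}
```

Without the `hashCode` override, two separately constructed objects for the same path would inherit the identity-based hash code from Object, and `contains` would usually miss the earlier entry.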






Issue Time Tracking
---

Worklog Id: (was: 646429)
Time Spent: 3h  (was: 2h 50m)



[jira] [Work logged] (HIVE-25277) Slow Hive partition deletion for Cloud object stores with expensive ListFiles

2021-09-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25277?focusedWorklogId=646430&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-646430
 ]

ASF GitHub Bot logged work on HIVE-25277:
-

Author: ASF GitHub Bot
Created on: 03/Sep/21 18:20
Start Date: 03/Sep/21 18:20
Worklog Time Spent: 10m 
  Work Description: coufon commented on a change in pull request #2421:
URL: https://github.com/apache/hive/pull/2421#discussion_r702088341



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HMSHandler.java
##
@@ -5104,14 +5103,34 @@ public boolean drop_partition(final String db_name, final String tbl_name,
 null);
   }
 
-  private static class PathAndPartValSize {
-PathAndPartValSize(Path path, int partValSize) {
-  this.path = path;
-  this.partValSize = partValSize;
+/** Stores a path and its size. */
+private static class PathAndPartValSize implements Comparable<PathAndPartValSize> {
+
+  public Path path;
+  int partValSize;
+
+  public PathAndPartValSize(Path path, int partValSize) {
+this.path = path;
+this.partValSize = partValSize;
+  }
+
+  @Override
+  public boolean equals(Object o) {
+if (o == this) {
+  return true;
+}
+if (!(o instanceof PathAndPartValSize)) {
+  return false;
+}
+return path.equals(((PathAndPartValSize) o).path);
+  }
+
+  /** The highest {@code partValSize} is processed first in a {@link PriorityQueue}. */
+  @Override
+  public int compareTo(PathAndPartValSize o) {
+return ((PathAndPartValSize) o).partValSize - partValSize;

Review comment:
   Done.






Issue Time Tracking
---

Worklog Id: (was: 646430)
Time Spent: 3h 10m  (was: 3h)



[jira] [Work logged] (HIVE-25277) Slow Hive partition deletion for Cloud object stores with expensive ListFiles

2021-09-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25277?focusedWorklogId=646425&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-646425
 ]

ASF GitHub Bot logged work on HIVE-25277:
-

Author: ASF GitHub Bot
Created on: 03/Sep/21 18:09
Start Date: 03/Sep/21 18:09
Worklog Time Spent: 10m 
  Work Description: coufon commented on a change in pull request #2421:
URL: https://github.com/apache/hive/pull/2421#discussion_r702082867



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HMSHandler.java
##
@@ -5104,14 +5103,34 @@ public boolean drop_partition(final String db_name, final String tbl_name,
 null);
   }
 
-  private static class PathAndPartValSize {
-PathAndPartValSize(Path path, int partValSize) {
-  this.path = path;
-  this.partValSize = partValSize;
+/** Stores a path and its size. */
+private static class PathAndPartValSize implements Comparable<PathAndPartValSize> {
+
+  public Path path;
+  int partValSize;
+
+  public PathAndPartValSize(Path path, int partValSize) {
+this.path = path;
+this.partValSize = partValSize;
+  }
+
+  @Override
+  public boolean equals(Object o) {

Review comment:
   Done.






Issue Time Tracking
---

Worklog Id: (was: 646425)
Time Spent: 2h 50m  (was: 2h 40m)



[jira] [Work logged] (HIVE-25277) Slow Hive partition deletion for Cloud object stores with expensive ListFiles

2021-09-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25277?focusedWorklogId=646421&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-646421
 ]

ASF GitHub Bot logged work on HIVE-25277:
-

Author: ASF GitHub Bot
Created on: 03/Sep/21 17:56
Start Date: 03/Sep/21 17:56
Worklog Time Spent: 10m 
  Work Description: coufon commented on a change in pull request #2421:
URL: https://github.com/apache/hive/pull/2421#discussion_r702075927



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HMSHandler.java
##
@@ -5104,14 +5103,34 @@ public boolean drop_partition(final String db_name, final String tbl_name,
 null);
   }
 
-  private static class PathAndPartValSize {
-PathAndPartValSize(Path path, int partValSize) {
-  this.path = path;
-  this.partValSize = partValSize;
+/** Stores a path and its size. */
+private static class PathAndPartValSize implements Comparable<PathAndPartValSize> {
+
+  public Path path;

Review comment:
   Done






Issue Time Tracking
---

Worklog Id: (was: 646421)
Time Spent: 2h 40m  (was: 2.5h)



[jira] [Work logged] (HIVE-25277) Slow Hive partition deletion for Cloud object stores with expensive ListFiles

2021-09-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25277?focusedWorklogId=646420&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-646420
 ]

ASF GitHub Bot logged work on HIVE-25277:
-

Author: ASF GitHub Bot
Created on: 03/Sep/21 17:55
Start Date: 03/Sep/21 17:55
Worklog Time Spent: 10m 
  Work Description: coufon commented on a change in pull request #2421:
URL: https://github.com/apache/hive/pull/2421#discussion_r702075034



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HMSHandler.java
##
@@ -5104,14 +5103,34 @@ public boolean drop_partition(final String db_name, final String tbl_name,
 null);
   }
 
-  private static class PathAndPartValSize {
-PathAndPartValSize(Path path, int partValSize) {
-  this.path = path;
-  this.partValSize = partValSize;
+/** Stores a path and its size. */
+private static class PathAndPartValSize implements Comparable<PathAndPartValSize> {

Review comment:
   Done.






Issue Time Tracking
---

Worklog Id: (was: 646420)
Time Spent: 2.5h  (was: 2h 20m)



[jira] [Work logged] (HIVE-25277) Slow Hive partition deletion for Cloud object stores with expensive ListFiles

2021-09-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25277?focusedWorklogId=646395&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-646395
 ]

ASF GitHub Bot logged work on HIVE-25277:
-

Author: ASF GitHub Bot
Created on: 03/Sep/21 17:22
Start Date: 03/Sep/21 17:22
Worklog Time Spent: 10m 
  Work Description: sunchao commented on a change in pull request #2421:
URL: https://github.com/apache/hive/pull/2421#discussion_r702046193



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HMSHandler.java
##
@@ -5104,14 +5103,34 @@ public boolean drop_partition(final String db_name, final String tbl_name,
 null);
   }
 
-  private static class PathAndPartValSize {
-PathAndPartValSize(Path path, int partValSize) {
-  this.path = path;
-  this.partValSize = partValSize;
+/** Stores a path and its size. */
+private static class PathAndPartValSize implements Comparable<PathAndPartValSize> {
+
+  public Path path;
+  int partValSize;
+
+  public PathAndPartValSize(Path path, int partValSize) {
+this.path = path;
+this.partValSize = partValSize;
+  }
+
+  @Override
+  public boolean equals(Object o) {
+if (o == this) {
+  return true;
+}
+if (!(o instanceof PathAndPartValSize)) {
+  return false;
+}
+return path.equals(((PathAndPartValSize) o).path);
+  }
+
+  /** The highest {@code partValSize} is processed first in a {@link PriorityQueue}. */
+  @Override
+  public int compareTo(PathAndPartValSize o) {
+return ((PathAndPartValSize) o).partValSize - partValSize;

Review comment:
   nit: the cast `(PathAndPartValSize) o` is unnecessary

##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HMSHandler.java
##
@@ -5104,14 +5103,34 @@ public boolean drop_partition(final String db_name, final String tbl_name,
 null);
   }
 
-  private static class PathAndPartValSize {
-PathAndPartValSize(Path path, int partValSize) {
-  this.path = path;
-  this.partValSize = partValSize;
+/** Stores a path and its size. */
+private static class PathAndPartValSize implements Comparable<PathAndPartValSize> {
+
+  public Path path;
+  int partValSize;
+
+  public PathAndPartValSize(Path path, int partValSize) {
+this.path = path;
+this.partValSize = partValSize;
+  }
+
+  @Override
+  public boolean equals(Object o) {

Review comment:
   Should we also implement `hashCode` together with `equals`?

##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HMSHandler.java
##
@@ -5240,16 +5259,38 @@ public DropPartitionsResult drop_partitions_req(
 for (Path path : archToDelete) {
   wh.deleteDir(path, true, mustPurge, needsCm);
 }
+
+// Uses a priority queue to delete the parents of deleted directories if empty.
+// The parent with the largest size is always processed first. It guarantees that
+// the emptiness of a parent won't be changed once it has been processed. So duplicated
+// processing can be avoided.
+PriorityQueue<PathAndPartValSize> parentsToDelete = new PriorityQueue<>();
 for (PathAndPartValSize p : dirsToDelete) {
   wh.deleteDir(p.path, true, mustPurge, needsCm);
+  addParentForDel(parentsToDelete, p);
+}
+
+HashSet<PathAndPartValSize> processed = new HashSet<>();
+while (!parentsToDelete.isEmpty()) {
   try {
-deleteParentRecursive(p.path.getParent(), p.partValSize - 1, mustPurge, needsCm);
+PathAndPartValSize p = parentsToDelete.poll();
+if (processed.contains(p)) {
+  continue;
+}
+processed.add(p);
+
+Path path = p.path;
+if (wh.isWritable(path) && wh.isDir(path) && wh.isEmptyDir(path)) {

Review comment:
   `wh.isDir(path) && wh.isEmptyDir(path)` seems redundant. Why do we need 
`wh.isDir` if we already have `wh.isEmptyDir`?

##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HMSHandler.java
##
@@ -5104,14 +5103,34 @@ public boolean drop_partition(final String db_name, final String tbl_name,
 null);
   }
 
-  private static class PathAndPartValSize {
-PathAndPartValSize(Path path, int partValSize) {
-  this.path = path;
-  this.partValSize = partValSize;
+/** Stores a path and its size. */
+private static class PathAndPartValSize implements Comparable<PathAndPartValSize> {
+
+  public Path path;
+  int partValSize;
+
+  public PathAndPartValSize(Path path, int partValSize) {
+this.path = path;
+this.partValSize = partValSize;
+  }
+
+  @Override
+  

[jira] [Updated] (HIVE-25495) Upgrade to JLine3

2021-09-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-25495:
--
Labels: pull-request-available  (was: )

> Upgrade to JLine3
> -
>
> Key: HIVE-25495
> URL: https://issues.apache.org/jira/browse/HIVE-25495
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> JLine 2 was discontinued a long while ago.  Hadoop uses JLine3, so Hive 
> should match.





[jira] [Work logged] (HIVE-25495) Upgrade to JLine3

2021-09-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25495?focusedWorklogId=646317&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-646317
 ]

ASF GitHub Bot logged work on HIVE-25495:
-

Author: ASF GitHub Bot
Created on: 03/Sep/21 14:05
Start Date: 03/Sep/21 14:05
Worklog Time Spent: 10m 
  Work Description: belugabehr opened a new pull request #2617:
URL: https://github.com/apache/hive/pull/2617


   
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   




Issue Time Tracking
---

Worklog Id: (was: 646317)
Remaining Estimate: 0h
Time Spent: 10m



[jira] [Updated] (HIVE-25498) Query with more than 32 count distinct functions returns wrong result

2021-09-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-25498:
--
Labels: pull-request-available  (was: )

> Query with more than 32 count distinct functions returns wrong result
> -
>
> Key: HIVE-25498
> URL: https://issues.apache.org/jira/browse/HIVE-25498
> Project: Hive
>  Issue Type: Bug
>Reporter: Robbie Zhang
>Assignee: Robbie Zhang
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> If there are more than 32 "COUNT(DISTINCT COL)" functions in a query, all 
> these COUNT functions in this query return 0 instead of the proper values.
> Here are the queries to reproduce this issue:
> {code:java}
> set hive.cbo.enable=true;
> create table test_count (c0 string, c1 string, c2 string, c3 string, c4 
> string, c5 string, c6 string, c7 string, c8 string, c9 string, c10 string, 
> c11 string, c12 string, c13 string, c14 string, c15 string, c16 string, c17 
> string, c18 string, c19 string, c20 string, c21 string, c22 string, c23 
> string, c24 string, c25 string, c26 string, c27 string, c28 string, c29 
> string, c30 string, c31 string, c32 string);
> INSERT INTO test_count values ('c0', 'c1', 'c2', 'c3', 'c4', 'c5', 'c6', 
> 'c7', 'c8', 'c9', 'c10', 'c11', 'c12', 'c13', 'c14', 'c15', 'c16', 'c17', 
> 'c18', 'c19', 'c20', 'c21', 'c22', 'c23', 'c24', 'c25', 'c26', 'c27', 'c28', 
> 'c29', 'c30', 'c31', 'c32'); 
> select count (distinct c0), count(distinct c1), count(distinct c2), 
> count(distinct c3), count(distinct c4), count(distinct c5), count(distinct 
> c6), count(distinct c7), count(distinct c8), count(distinct c9), 
> count(distinct c10), count(distinct c11), count(distinct c12), count(distinct 
> c13), count(distinct c14), count(distinct c15), count(distinct c16), 
> count(distinct c17), count(distinct c18), count(distinct c19), count(distinct 
> c20), count(distinct c21), count(distinct c22), count(distinct c23), 
> count(distinct c24), count(distinct c25), count(distinct c26), count(distinct 
> c27), count(distinct c28), count(distinct c29), count(distinct c30), 
> count(distinct c31), count(distinct c32) from test_count;
> {code}
>  This bug is caused by HiveExpandDistinctAggregatesRule.getGroupingIdValue() 
> which uses int type. When there are more than 32 groupings the values 
> overflow.
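
To illustrate the kind of overflow described above, the sketch below builds a bit mask with one bit per grouping, first with int and then with long. It is a simplified, hypothetical example and not the code of HiveExpandDistinctAggregatesRule.getGroupingIdValue(); the class and method names are assumptions.

```java
// Hypothetical sketch: a per-grouping bit mask computed with int versus long.
final class GroupingMaskSketch {

  /**
   * int version. Java uses only the low 5 bits of the shift distance for an int
   * operand, so 1 << 32 evaluates to 1 and the mask is wrong once n reaches 32.
   */
  static int allSetInt(int n) {
    return (1 << n) - 1;
  }

  /** long version. Correct for n up to 62, since long shifts use 6 bits of the distance. */
  static long allSetLong(int n) {
    return (1L << n) - 1;
  }
}
```

For 33 groupings, as in the reproduction query, `allSetInt(33)` yields 1 while `allSetLong(33)` yields 8589934591, so any later comparison against the int value would no longer match the intended grouping.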





[jira] [Work logged] (HIVE-25498) Query with more than 32 count distinct functions returns wrong result

2021-09-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25498?focusedWorklogId=646309&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-646309
 ]

ASF GitHub Bot logged work on HIVE-25498:
-

Author: ASF GitHub Bot
Created on: 03/Sep/21 13:50
Start Date: 03/Sep/21 13:50
Worklog Time Spent: 10m 
  Work Description: ujc714 opened a new pull request #2616:
URL: https://github.com/apache/hive/pull/2616


   ### What changes were proposed in this pull request?
   Fix a bug in HiveExpandDistinctAggregatesRule.getGroupingIdValue() that 
causes "COUNT(DISTINCT COL)" functions to return wrong results.
   
   ### Why are the changes needed?
   If there are more than 32 "COUNT(DISTINCT COL)" functions in a query, the 
values returned from HiveExpandDistinctAggregatesRule.getGroupingIdValue() 
overflow, so these COUNT functions return 0.
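
A minimal Java sketch of this kind of overflow, assuming the grouping id is built by setting one bit per grouping. The class `GroupingIdSketch` and its methods are hypothetical illustrations, not the code in `HiveExpandDistinctAggregatesRule`. With `int`, Java uses only the low five bits of the shift distance, so the bit for position 32 collides with the bit for position 0; with `long`, the two stay distinct.

```java
public class GroupingIdSketch {
    // Hypothetical helper: the bit assigned to a grouping position, in 32-bit arithmetic.
    // For int operands Java masks the shift distance to its low 5 bits (position % 32).
    static int bitAsInt(int position) {
        return 1 << position;
    }

    // Same idea in 64-bit arithmetic: positions 0..63 each get a distinct bit.
    static long bitAsLong(int position) {
        return 1L << position;
    }

    public static void main(String[] args) {
        // The reproduction query has 33 columns, c0..c32, so position 32 is needed.
        System.out.println(bitAsInt(0));   // 1
        System.out.println(bitAsInt(32));  // 1, same as position 0
        System.out.println(bitAsLong(32)); // 4294967296
    }
}
```

Whether widening to `long` is the approach taken in the pull request is not stated here; the sketch only shows why an `int` id cannot distinguish more than 32 groupings.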
   
   ### Does this PR introduce _any_ user-facing change?
   No.
   
   
   ### How was this patch tested?
   mvn test -Dtest=TestMiniTezCliDriver -Dqfile=multi_count_distinct.q




Issue Time Tracking
---

Worklog Id: (was: 646309)
Remaining Estimate: 0h
Time Spent: 10m

> Query with more than 32 count distinct functions returns wrong result
> -
>
> Key: HIVE-25498
> URL: https://issues.apache.org/jira/browse/HIVE-25498
> Project: Hive
>  Issue Type: Bug
>Reporter: Robbie Zhang
>Assignee: Robbie Zhang
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> If there are more than 32 "COUNT(DISTINCT COL)" functions in a query, all 
> these COUNT functions in this query return 0 instead of the proper values.
> Here are the queries to reproduce this issue:
> {code:java}
> set hive.cbo.enable=true;
> create table test_count (c0 string, c1 string, c2 string, c3 string, c4 
> string, c5 string, c6 string, c7 string, c8 string, c9 string, c10 string, 
> c11 string, c12 string, c13 string, c14 string, c15 string, c16 string, c17 
> string, c18 string, c19 string, c20 string, c21 string, c22 string, c23 
> string, c24 string, c25 string, c26 string, c27 string, c28 string, c29 
> string, c30 string, c31 string, c32 string);
> INSERT INTO test_count values ('c0', 'c1', 'c2', 'c3', 'c4', 'c5', 'c6', 
> 'c7', 'c8', 'c9', 'c10', 'c11', 'c12', 'c13', 'c14', 'c15', 'c16', 'c17', 
> 'c18', 'c19', 'c20', 'c21', 'c22', 'c23', 'c24', 'c25', 'c26', 'c27', 'c28', 
> 'c29', 'c30', 'c31', 'c32'); 
> select count (distinct c0), count(distinct c1), count(distinct c2), 
> count(distinct c3), count(distinct c4), count(distinct c5), count(distinct 
> c6), count(distinct c7), count(distinct c8), count(distinct c9), 
> count(distinct c10), count(distinct c11), count(distinct c12), count(distinct 
> c13), count(distinct c14), count(distinct c15), count(distinct c16), 
> count(distinct c17), count(distinct c18), count(distinct c19), count(distinct 
> c20), count(distinct c21), count(distinct c22), count(distinct c23), 
> count(distinct c24), count(distinct c25), count(distinct c26), count(distinct 
> c27), count(distinct c28), count(distinct c29), count(distinct c30), 
> count(distinct c31), count(distinct c32) from test_count;
> {code}
>  This bug is caused by HiveExpandDistinctAggregatesRule.getGroupingIdValue() 
> which uses int type. When there are more than 32 groupings the values 
> overflow.





[jira] [Assigned] (HIVE-25498) Query with more than 32 count distinct functions returns wrong result

2021-09-03 Thread Robbie Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robbie Zhang reassigned HIVE-25498:
---

Assignee: Robbie Zhang






[jira] [Assigned] (HIVE-25496) hadoop 3.3.1 / hive 3.2.1 / OpenJDK11 compatible?

2021-09-03 Thread Jerome Le Ray (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jerome Le Ray reassigned HIVE-25496:


Assignee: (was: Jerome Le Ray)

> hadoop 3.3.1 / hive 3.2.1 / OpenJDK11 compatible?
> -
>
> Key: HIVE-25496
> URL: https://issues.apache.org/jira/browse/HIVE-25496
> Project: Hive
>  Issue Type: Bug
> Environment: Linux VM
>Reporter: Jerome Le Ray
>Priority: Major
>
> We used the following configuration:
> hadoop 3.2.1
> hive 3.1.2
> PostgreSQL 12
> Java - OracleJDK 8
> For internal reasons, we have to migrate to OpenJDK11.
> So, I've migrated hadoop 3.2.1 to the new version, hadoop 3.3.1.
> When I start the hiveserver2 service, I get the following error:
> which: no hbase in 
> (/usr/local/bin:/bin:/usr/pgsql-12/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/opt/jdk-11.0.10+9/bin:/opt/hivemetastore/hadoop-3.3.1/bin:/opt/hivemetastore/apache-hive-3.1.2-bin/bin)
> 2021-09-02 16:48:05: Starting HiveServer2
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in 
> [jar:file:/opt/hivemetastore/hadoop-3.3.1/share/hadoop/common/lib/slf4j-log4j12-1.7.30.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in 
> [jar:file:/opt/hivemetastore/apache-hive-3.1.2-bin/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an 
> explanation.
> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
> 2021-09-02 16:48:06,744 INFO conf.HiveConf: Found configuration file 
> file:/opt/hivemetastore/apache-hive-3.1.2-bin/conf/hive-site.xml
> 2021-09-02 16:48:07,169 WARN conf.HiveConf: HiveConf of name 
> hive.metastore.local does not exist
> 2021-09-02 16:48:07,169 WARN conf.HiveConf: HiveConf of name 
> hive.metastore.thrift.bind.host does not exist
> 2021-09-02 16:48:07,170 WARN conf.HiveConf: HiveConf of name 
> hive.enforce.bucketing does not exist
> 2021-09-02 16:48:08,414 INFO server.HiveServer2: STARTUP_MSG:
> /
> STARTUP_MSG: Starting HiveServer2
> STARTUP_MSG: host = lhroelcspt1001.enterprisenet.org/10.90.122.159
> STARTUP_MSG: args = [-hiveconf, mapred.job.tracker=local, -hiveconf, 
> fs.default.name=file:///cip-data, -hiveconf, 
> hive.metastore.warehouse.dir=file:cip-data, --hiveconf, 
> hive.server2.thrift.port=1, --hiveconf, hive.root.logger=INFO,console]
> STARTUP_MSG: version = 3.1.2
> (...)
> STARTUP_MSG: build = git://HW13934/Users/gates/tmp/hive-branch-3.1/hive -r 
> 8190d2be7b7165effa62bd21b7d60ef81fb0e4af; compiled by 'gates' on Thu Aug 22 
> 15:01:18 PDT 2019
> /
> 2021-09-02 16:48:08,436 INFO server.HiveServer2: Starting HiveServer2
> 2021-09-02 16:48:08,462 WARN conf.HiveConf: HiveConf of name 
> hive.metastore.local does not exist
> 2021-09-02 16:48:08,463 WARN conf.HiveConf: HiveConf of name 
> hive.metastore.thrift.bind.host does not exist
> 2021-09-02 16:48:08,463 WARN conf.HiveConf: HiveConf of name 
> hive.enforce.bucketing does not exist
> Hive Session ID = 440449ff-99b7-429c-82d9-e20bdcc9b46f
> 2021-09-02 16:48:08,566 INFO SessionState: Hive Session ID = 
> 440449ff-99b7-429c-82d9-e20bdcc9b46f
> 2021-09-02 16:48:08,566 INFO server.HiveServer2: Shutting down HiveServer2
> 2021-09-02 16:48:08,584 INFO server.HiveServer2: Stopping/Disconnecting tez 
> sessions.
> 2021-09-02 16:48:08,585 WARN server.HiveServer2: Error starting HiveServer2 
> on attempt 1, will retry in 6ms
> java.lang.RuntimeException: Error applying authorization policy on hive 
> configuration: class jdk.internal.loader.ClassLoaders$AppClassLoader cannot 
> be cast to class java.net.URLClassLoader 
> (jdk.internal.loader.ClassLoaders$AppClassLoader and java.net.URLClassLoader 
> are in module java.base of loader 'bootstrap')
>  at org.apache.hive.service.cli.CLIService.init(CLIService.java:118)
>  at org.apache.hive.service.CompositeService.init(CompositeService.java:59)
>  at org.apache.hive.service.server.HiveServer2.init(HiveServer2.java:230)
>  at 
> org.apache.hive.service.server.HiveServer2.startHiveServer2(HiveServer2.java:1036)
>  at 
> org.apache.hive.service.server.HiveServer2.access$1600(HiveServer2.java:140)
>  at 
> org.apache.hive.service.server.HiveServer2$StartOptionExecutor.execute(HiveServer2.java:1305)
>  at org.apache.hive.service.server.HiveServer2.main(HiveServer2.java:1149)
>  at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native 
> Method)
>  at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>  at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.base/java.lang.reflect.Met
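
The cast failure in the quoted trace is consistent with a general JDK change: since Java 9 the application class loader is no longer a `java.net.URLClassLoader`, so an unconditional cast of the system class loader fails on OpenJDK 11. The sketch below shows a guarded check in place of a blind cast. The class and method names are hypothetical and this is not Hive's code.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.util.Optional;

public class SystemLoaderSketch {
    // Returns the loader as a URLClassLoader only when it really is one,
    // instead of casting unconditionally and risking a ClassCastException.
    static Optional<URLClassLoader> asUrlClassLoader(ClassLoader loader) {
        if (loader instanceof URLClassLoader) {
            return Optional.of((URLClassLoader) loader);
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        ClassLoader system = ClassLoader.getSystemClassLoader();
        // The printed class name and result depend on the JDK running this code.
        System.out.println(system.getClass().getName());
        System.out.println(asUrlClassLoader(system).isPresent());
        // An explicitly constructed URLClassLoader passes the check on any JDK.
        System.out.println(asUrlClassLoader(new URLClassLoader(new URL[0])).isPresent());
    }
}
```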

[jira] [Assigned] (HIVE-25496) hadoop 3.3.1 / hive 3.2.1 / OpenJDK11 compatible?

2021-09-03 Thread Jerome Le Ray (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jerome Le Ray reassigned HIVE-25496:


Assignee: Jerome Le Ray


[jira] [Commented] (HIVE-25496) hadoop 3.3.1 / hive 3.2.1 / OpenJDK11 compatible?

2021-09-03 Thread Jerome Le Ray (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-25496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17409365#comment-17409365
 ] 

Jerome Le Ray commented on HIVE-25496:
--

Hello,

Below are the versions used:

[hive@lhroelcspt1001 hiveserver2log]$ java --version
openjdk 11.0.10 2021-01-19
OpenJDK Runtime Environment AdoptOpenJDK (build 11.0.10+9)
OpenJDK 64-Bit Server VM AdoptOpenJDK (build 11.0.10+9, mixed mode)
[hive@lhroelcspt1001 hiveserver2log]$

[hive@lhroelcspt1001 hiveserver2log]$ hadoop version
Hadoop 3.3.1
Source code repository https://github.com/apache/hadoop.git -r 
a3b9c37a397ad4188041dd80621bdeefc46885f2
Compiled by ubuntu on 2021-06-15T05:13Z
Compiled with protoc 3.7.1
From source with checksum 88a4ddb2299aca054416d6b7f81ca55
This command was run using 
/opt/hivemetastore/hadoop-3.3.1/share/hadoop/common/hadoop-common-3.3.1.jar
[hive@lhroelcspt1001 hiveserver2log]$

[hive@lhroelcspt1001 hiveserver2log]$ hive --version
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in 
[jar:file:/opt/hivemetastore/hadoop-3.3.1/share/hadoop/common/lib/slf4j-log4j12-1.7.30.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in 
[jar:file:/opt/hivemetastore/apache-hive-3.1.2-bin/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Hive 3.1.2
Git git://HW13934/Users/gates/tmp/hive-branch-3.1/hive -r 
8190d2be7b7165effa62bd21b7d60ef81fb0e4af
Compiled by gates on Thu Aug 22 15:01:18 PDT 2019
From source with checksum 0492c08f784b188c349f6afb1d8d9847
[hive@lhroelcspt1001 hiveserver2log]$

> hadoop 3.3.1 / hive 3.2.1 / OpenJDK11 compatible?
> -
>
> Key: HIVE-25496
> URL: https://issues.apache.org/jira/browse/HIVE-25496
> Project: Hive
>  Issue Type: Bug
> Environment: Linux VM
>Reporter: Jerome Le Ray
>Priority: Major
>
> We used the following configuration:
> hadoop 3.2.1
> hive 3.1.2
> PostgreSQL 12
> Java - OracleJDK 8
> For internal reasons, we have to migrate to OpenJDK11.
> So, I've migrated hadoop 3.2.1 to the new version, hadoop 3.3.1.
> When I start the hiveserver2 service, I get the following error:
> which: no hbase in 
> (/usr/local/bin:/bin:/usr/pgsql-12/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/opt/jdk-11.0.10+9/bin:/opt/hivemetastore/hadoop-3.3.1/bin:/opt/hivemetastore/apache-hive-3.1.2-bin/bin)
> 2021-09-02 16:48:05: Starting HiveServer2
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in 
> [jar:file:/opt/hivemetastore/hadoop-3.3.1/share/hadoop/common/lib/slf4j-log4j12-1.7.30.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in 
> [jar:file:/opt/hivemetastore/apache-hive-3.1.2-bin/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an 
> explanation.
> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
> 2021-09-02 16:48:06,744 INFO conf.HiveConf: Found configuration file 
> file:/opt/hivemetastore/apache-hive-3.1.2-bin/conf/hive-site.xml
> 2021-09-02 16:48:07,169 WARN conf.HiveConf: HiveConf of name 
> hive.metastore.local does not exist
> 2021-09-02 16:48:07,169 WARN conf.HiveConf: HiveConf of name 
> hive.metastore.thrift.bind.host does not exist
> 2021-09-02 16:48:07,170 WARN conf.HiveConf: HiveConf of name 
> hive.enforce.bucketing does not exist
> 2021-09-02 16:48:08,414 INFO server.HiveServer2: STARTUP_MSG:
> /
> STARTUP_MSG: Starting HiveServer2
> STARTUP_MSG: host = lhroelcspt1001.enterprisenet.org/10.90.122.159
> STARTUP_MSG: args = [-hiveconf, mapred.job.tracker=local, -hiveconf, 
> fs.default.name=file:///cip-data, -hiveconf, 
> hive.metastore.warehouse.dir=file:cip-data, --hiveconf, 
> hive.server2.thrift.port=1, --hiveconf, hive.root.logger=INFO,console]
> STARTUP_MSG: version = 3.1.2
> (...)
> STARTUP_MSG: build = git://HW13934/Users/gates/tmp/hive-branch-3.1/hive -r 
> 8190d2be7b7165effa62bd21b7d60ef81fb0e4af; compiled by 'gates' on Thu Aug 22 
> 15:01:18 PDT 2019
> /
> 2021-09-02 16:48:08,436 INFO server.HiveServer2: Starting HiveServer2
> 2021-09-02 16:48:08,462 WARN conf.HiveConf: HiveConf of name 
> hive.metastore.local does not exist
> 2021-09-02 16:48:08,463 WARN conf.HiveConf: HiveConf of name 
> hive.metastore.thrift.bind.host does not exist
> 2021-09-02 16:48:08,463 WARN conf.HiveConf: HiveConf of name 
> hive.enforce.bucketing does not exist
> Hive Session ID = 440449ff-99b7-429c-82d9-e20bdcc9b46f
> 2021-09-02 16:48:08,566 INFO SessionState: Hive Session ID = 
> 440449ff-99b7-429c-82d9-