sunchao commented on a change in pull request #2421:
URL: https://github.com/apache/hive/pull/2421#discussion_r702046193
##########
File path:
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HMSHandler.java
##########
@@ -5104,14 +5103,34 @@ public boolean drop_partition(final String db_name,
final String tbl_name,
null);
}
- private static class PathAndPartValSize {
- PathAndPartValSize(Path path, int partValSize) {
- this.path = path;
- this.partValSize = partValSize;
+ /** Stores a path and its size. */
+ private static class PathAndPartValSize implements Comparable<PathAndPartValSize> {
+
+ public Path path;
+ int partValSize;
+
+ public PathAndPartValSize(Path path, int partValSize) {
+ this.path = path;
+ this.partValSize = partValSize;
+ }
+
+ @Override
+ public boolean equals(Object o) {
+ if (o == this) {
+ return true;
+ }
+ if (!(o instanceof PathAndPartValSize)) {
+ return false;
+ }
+ return path.equals(((PathAndPartValSize) o).path);
+ }
+
+ /** The highest {@code partValSize} is processed first in a {@link PriorityQueue}. */
+ @Override
+ public int compareTo(PathAndPartValSize o) {
+ return ((PathAndPartValSize) o).partValSize - partValSize;
Review comment:
nit: the cast `(PathAndPartValSize) o` is unnecessary
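
A minimal sketch of the cast-free form, for illustration only. `Entry` is a hypothetical stand-in for the class in the diff, and a `String` replaces Hadoop's `Path` so the example is self-contained. It also swaps the subtraction for `Integer.compare`, which is a separate choice that is not part of this nit; it just avoids any overflow concern.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

class CompareToDemo {
  // Hypothetical stand-in for PathAndPartValSize; String replaces Hadoop's Path.
  static final class Entry implements Comparable<Entry> {
    final String path;
    final int partValSize;

    Entry(String path, int partValSize) {
      this.path = path;
      this.partValSize = partValSize;
    }

    // `o` is already typed as Entry by Comparable<Entry>, so no cast is needed.
    // Comparing (o, this) rather than (this, o) puts the largest partValSize first.
    @Override
    public int compareTo(Entry o) {
      return Integer.compare(o.partValSize, partValSize);
    }
  }

  // Returns the paths in the order a PriorityQueue hands them out.
  static List<String> pollOrder(Entry... entries) {
    PriorityQueue<Entry> queue = new PriorityQueue<>();
    for (Entry e : entries) {
      queue.add(e);
    }
    List<String> order = new ArrayList<>();
    while (!queue.isEmpty()) {
      order.add(queue.poll().path);
    }
    return order;
  }

  public static void main(String[] args) {
    System.out.println(pollOrder(
        new Entry("/t/a=1", 1),
        new Entry("/t/a=1/b=2/c=3", 3),
        new Entry("/t/a=1/b=2", 2)));
  }
}
```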
##########
File path:
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HMSHandler.java
##########
@@ -5104,14 +5103,34 @@ public boolean drop_partition(final String db_name,
final String tbl_name,
null);
}
- private static class PathAndPartValSize {
- PathAndPartValSize(Path path, int partValSize) {
- this.path = path;
- this.partValSize = partValSize;
+ /** Stores a path and its size. */
+ private static class PathAndPartValSize implements Comparable<PathAndPartValSize> {
+
+ public Path path;
+ int partValSize;
+
+ public PathAndPartValSize(Path path, int partValSize) {
+ this.path = path;
+ this.partValSize = partValSize;
+ }
+
+ @Override
+ public boolean equals(Object o) {
Review comment:
Should we also implement `hashCode` together with `equals`? Instances of this class are stored in a `HashSet` in `drop_partitions_req`.
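
To illustrate the concern, here is a sketch under assumptions: `Entry` is a hypothetical stand-in for the class in the diff, with a `String` in place of Hadoop's `Path`. It keeps the path-only `equals` from the diff and adds a `hashCode` over the same field, so two entries for the same path collapse to one element in a `HashSet`.

```java
import java.util.HashSet;
import java.util.Set;

class HashCodeDemo {
  // Hypothetical stand-in for PathAndPartValSize; String replaces Hadoop's Path.
  static final class Entry {
    final String path;
    final int partValSize;

    Entry(String path, int partValSize) {
      this.path = path;
      this.partValSize = partValSize;
    }

    @Override
    public boolean equals(Object o) {
      if (o == this) {
        return true;
      }
      if (!(o instanceof Entry)) {
        return false;
      }
      return path.equals(((Entry) o).path);
    }

    // Hashes exactly the field that equals() compares, so equal objects
    // always land in the same hash bucket.
    @Override
    public int hashCode() {
      return path.hashCode();
    }
  }

  // Counts how many distinct entries a HashSet keeps.
  static int distinctCount(Entry... entries) {
    Set<Entry> seen = new HashSet<>();
    for (Entry e : entries) {
      seen.add(e);
    }
    return seen.size();
  }

  public static void main(String[] args) {
    // Two entries share a path, so this prints 2.
    System.out.println(distinctCount(
        new Entry("/t/a=1", 1), new Entry("/t/a=1", 1), new Entry("/t/a=2", 1)));
  }
}
```

Without the `hashCode` override, the two `/t/a=1` entries would fall back to identity hashing and would most likely both be kept, which defeats the `processed` check.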
##########
File path:
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HMSHandler.java
##########
@@ -5240,16 +5259,38 @@ public DropPartitionsResult drop_partitions_req(
for (Path path : archToDelete) {
wh.deleteDir(path, true, mustPurge, needsCm);
}
+
+ // Uses a priority queue to delete the parents of deleted directories if empty.
+ // The parent with the largest size is always processed first. It guarantees that
+ // the emptiness of a parent won't be changed once it has been processed. So duplicated
+ // processing can be avoided.
+ PriorityQueue<PathAndPartValSize> parentsToDelete = new PriorityQueue<>();
for (PathAndPartValSize p : dirsToDelete) {
wh.deleteDir(p.path, true, mustPurge, needsCm);
+ addParentForDel(parentsToDelete, p);
+ }
+
+ HashSet<PathAndPartValSize> processed = new HashSet<>();
+ while (!parentsToDelete.isEmpty()) {
try {
- deleteParentRecursive(p.path.getParent(), p.partValSize - 1, mustPurge, needsCm);
+ PathAndPartValSize p = parentsToDelete.poll();
+ if (processed.contains(p)) {
+ continue;
+ }
+ processed.add(p);
+
+ Path path = p.path;
+ if (wh.isWritable(path) && wh.isDir(path) && wh.isEmptyDir(path)) {
Review comment:
`wh.isDir(path) && wh.isEmptyDir(path)` seems redundant. Why do we need
`wh.isDir` if we already call `wh.isEmptyDir`?
##########
File path:
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HMSHandler.java
##########
@@ -5104,14 +5103,34 @@ public boolean drop_partition(final String db_name,
final String tbl_name,
null);
}
- private static class PathAndPartValSize {
- PathAndPartValSize(Path path, int partValSize) {
- this.path = path;
- this.partValSize = partValSize;
+ /** Stores a path and its size. */
+ private static class PathAndPartValSize implements Comparable<PathAndPartValSize> {
+
+ public Path path;
+ int partValSize;
+
+ public PathAndPartValSize(Path path, int partValSize) {
+ this.path = path;
+ this.partValSize = partValSize;
+ }
+
+ @Override
+ public boolean equals(Object o) {
+ if (o == this) {
+ return true;
+ }
+ if (!(o instanceof PathAndPartValSize)) {
+ return false;
+ }
+ return path.equals(((PathAndPartValSize) o).path);
Review comment:
Why do we compare only `path` and not `partValSize`?
##########
File path:
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HMSHandler.java
##########
@@ -5104,14 +5103,34 @@ public boolean drop_partition(final String db_name,
final String tbl_name,
null);
}
- private static class PathAndPartValSize {
- PathAndPartValSize(Path path, int partValSize) {
- this.path = path;
- this.partValSize = partValSize;
+ /** Stores a path and its size. */
+ private static class PathAndPartValSize implements Comparable<PathAndPartValSize> {
+
+ public Path path;
Review comment:
nit: these fields can be package-private instead of `public`. They should
also be `final`.
##########
File path:
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HMSHandler.java
##########
@@ -5104,14 +5103,34 @@ public boolean drop_partition(final String db_name,
final String tbl_name,
null);
}
- private static class PathAndPartValSize {
- PathAndPartValSize(Path path, int partValSize) {
- this.path = path;
- this.partValSize = partValSize;
+ /** Stores a path and its size. */
+ private static class PathAndPartValSize implements Comparable<PathAndPartValSize> {
Review comment:
I think it's clearer to call this `PathAndDepth`; `PartValSize` is
confusing.
##########
File path:
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HMSHandler.java
##########
@@ -5240,16 +5259,38 @@ public DropPartitionsResult drop_partitions_req(
for (Path path : archToDelete) {
wh.deleteDir(path, true, mustPurge, needsCm);
}
+
+ // Uses a priority queue to delete the parents of deleted directories if empty.
+ // The parent with the largest size is always processed first. It guarantees that
Review comment:
nit: `The parent with the largest size is always processed first` ->
`Parents with the deepest path are always processed first`
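
For readers following the thread, a rough sketch of why deepest-first ordering matters. Everything here is an assumption made for illustration: directories are modelled as an in-memory `Set<String>` rather than going through the warehouse, and `PathAndDepth`, `parentOf`, `isEmpty`, and `dropAndPrune` are hypothetical names, not code from the PR. The `depth > 0` stop condition is assumed to keep the table directory itself from being removed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.PriorityQueue;
import java.util.Set;

class DeepestFirstDemo {
  // Hypothetical (path, depth) pair; depth counts partition levels below the table dir.
  static final class PathAndDepth implements Comparable<PathAndDepth> {
    final String path;
    final int depth;

    PathAndDepth(String path, int depth) {
      this.path = path;
      this.depth = depth;
    }

    // Deepest path first when polled from a PriorityQueue.
    @Override
    public int compareTo(PathAndDepth o) {
      return Integer.compare(o.depth, depth);
    }
  }

  static String parentOf(String path) {
    return path.substring(0, path.lastIndexOf('/'));
  }

  // True if no remaining directory lies underneath `path`.
  static boolean isEmpty(Set<String> dirs, String path) {
    for (String d : dirs) {
      if (d.startsWith(path + "/")) {
        return false;
      }
    }
    return true;
  }

  // Removes the dropped directories from `dirs`, then prunes parents that became
  // empty, deepest first. Returns the pruned parents in removal order.
  static List<String> dropAndPrune(Set<String> dirs, List<PathAndDepth> dropped) {
    PriorityQueue<PathAndDepth> parents = new PriorityQueue<>();
    for (PathAndDepth p : dropped) {
      dirs.remove(p.path);
      parents.add(new PathAndDepth(parentOf(p.path), p.depth - 1));
    }
    Set<String> processed = new HashSet<>();
    List<String> removed = new ArrayList<>();
    while (!parents.isEmpty()) {
      PathAndDepth p = parents.poll();
      // Skip the table directory (depth 0) and anything already examined.
      if (p.depth <= 0 || !processed.add(p.path)) {
        continue;
      }
      if (dirs.contains(p.path) && isEmpty(dirs, p.path)) {
        dirs.remove(p.path);
        removed.add(p.path);
        parents.add(new PathAndDepth(parentOf(p.path), p.depth - 1));
      }
    }
    return removed;
  }

  public static void main(String[] args) {
    Set<String> dirs = new HashSet<>(Arrays.asList(
        "/t", "/t/a=1", "/t/a=1/b=1", "/t/a=1/b=1/c=1", "/t/a=1/b=2"));
    List<PathAndDepth> dropped = Arrays.asList(
        new PathAndDepth("/t/a=1/b=2", 2),
        new PathAndDepth("/t/a=1/b=1/c=1", 3));
    System.out.println(dropAndPrune(dirs, dropped));
  }
}
```

In this example `/t/a=1/b=1` is examined before `/t/a=1`. If the shallower parent were examined first, it would still contain `b=1`, be marked as processed, and never be revisited.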
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]