pvary commented on code in PR #3053:
URL: https://github.com/apache/hive/pull/3053#discussion_r849144249
##########
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStoreChecker.java:
##########
@@ -422,29 +411,67 @@ void findUnknownPartitions(Table table, Set<Path> partPaths, byte[] filterExp,
}
allPartDirs = partDirs;
}
- // don't want the table dir
- allPartDirs.remove(tablePath);
-
- // remove the partition paths we know about
- allPartDirs.removeAll(partPaths);
-
Set<String> partColNames = Sets.newHashSet();
for(FieldSchema fSchema : getPartCols(table)) {
partColNames.add(fSchema.getName());
}
Map<String, String> partitionColToTypeMap =
getPartitionColtoTypeMap(table.getPartitionKeys());
+
+ Set<Path> partPathsInMS = new HashSet<>(partPaths);
+ partPathsInMS.remove(tablePath);
+ FileSystem fs = tablePath.getFileSystem(conf);
+ int tablePartDepth = getPartColNames(table).size();
+ for(Path partPath: partPaths) {
+ Path tmpPath = partPath;
+ for(int k=0;k<tablePartDepth;k++) {
+ tmpPath = tmpPath.getParent();
+ partPathsInMS.remove(tmpPath);
+ }
+ }
Review Comment:
Why not simply avoid adding them to `partPaths` in the first place?
At the same place we can also remove them from `allPartDirs`.
Basically, `checkTable` should start with:
```
checkPartitionDirs(tablePath, allPartDirs,
    Collections.unmodifiableList(getPartColNames(table)));
for (Partition partition : parts) {
  Path partPath = getDataLocation(table, partition);
  if (allPartDirs.contains(partPath)) {
    // Found partition on FS
    allPartDirs.remove(partPath);
    // also remove all children of the path from allPartDirs
  } else {
    // Pseudocode: partPath lies outside tablePath and exists on the FS
    if (!partPath.children(tablePath) && partPath.exists) {
      // Found partition on FS
    } else {
      // Missing partition on FS
    }
  }
}
// If there is anything left in allPartDirs, then missing partition in HMS
```
Wouldn't this work?
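
To make the idea concrete, here is a minimal standalone sketch of that reconciliation loop. It is not Hive code: it uses `java.nio.file.Path` as a stand-in for Hadoop's `Path`, and `PartitionReconcileSketch`, `Result`, `reconcile`, and the `exists` predicate are hypothetical names introduced only for illustration. It also assumes `dirsOnFs` already contains only partition-depth directories, which is what `checkPartitionDirs` is expected to collect.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical illustration only; java.nio.file.Path stands in for Hadoop's Path.
final class PartitionReconcileSketch {

    /** Holds the two kinds of mismatch the check is looking for. */
    static final class Result {
        final List<Path> missingOnFs = new ArrayList<>();
        final Set<Path> missingInMetastore = new HashSet<>();
    }

    /**
     * @param tablePath          root directory of the table
     * @param dirsOnFs           partition directories found on the file system
     * @param metastoreLocations data locations of the partitions known to the metastore
     * @param exists             assumed existence check for locations outside tablePath
     */
    static Result reconcile(Path tablePath, Set<Path> dirsOnFs,
                            List<Path> metastoreLocations, Predicate<Path> exists) {
        Result result = new Result();
        // Work on a copy so the caller's set is left untouched.
        Set<Path> remaining = new HashSet<>(dirsOnFs);
        remaining.remove(tablePath); // the table directory itself is not a partition

        for (Path partPath : metastoreLocations) {
            if (remaining.remove(partPath)) {
                // Found on the FS: also drop any directories nested under it.
                remaining.removeIf(dir -> dir.startsWith(partPath));
            } else if (!partPath.startsWith(tablePath) && exists.test(partPath)) {
                // Partition lives outside the table directory but exists: nothing to report.
            } else {
                result.missingOnFs.add(partPath);
            }
        }
        // Whatever is left was never claimed by a metastore partition.
        result.missingInMetastore.addAll(remaining);
        return result;
    }
}
```

With this shape the known partition paths are consumed while iterating over the partitions, so no separate pass is needed to strip parent directories afterwards.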
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]