pvary commented on code in PR #3053:
URL: https://github.com/apache/hive/pull/3053#discussion_r848648030


##########
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStoreChecker.java:
##########
@@ -422,21 +415,46 @@ void findUnknownPartitions(Table table, Set<Path> partPaths, byte[] filterExp,
       }
       allPartDirs = partDirs;
     }
-    // don't want the table dir
-    allPartDirs.remove(tablePath);
-
-    // remove the partition paths we know about
-    allPartDirs.removeAll(partPaths);
-
     Set<String> partColNames = Sets.newHashSet();
     for(FieldSchema fSchema : getPartCols(table)) {
       partColNames.add(fSchema.getName());
     }
 
     Map<String, String> partitionColToTypeMap = getPartitionColtoTypeMap(table.getPartitionKeys());
+    
+    FileSystem fs = tablePath.getFileSystem(conf);
+    Set<Path> correctPartPathsInMS = new HashSet<>(partPathsInMS);

Review Comment:
   At this point we hold four largely overlapping copies of the file listing in memory:
   1. `partPaths` - Path objects from the HMS, plus every parent of the partitions
   2. `partPathsInMS` - Path objects from the HMS
   3. `correctPartPathsInMS` - This will be the final result, but at this point it is a duplicate of `partPathsInMS`
   4. `allPartDirs` - Recursive listing of the table root dir(?)
   
   Do we need all of these? Would it be better to store only the difference between the current `partPaths` and `partPathsInMS`, instead of storing the full list again?
   
   Could we build up `correctPartPathsInMS` while iterating through `partPathsInMS`? Would that be comparable in time complexity and better in space complexity?
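
To make the two suggestions concrete, here is a rough sketch rather than a proposed patch. It uses plain `String` paths as a stand-in for Hadoop `Path`, and the class name, method names, and the `isValid` predicate are all hypothetical; what makes a path "correct" in the PR would be whatever check it already performs.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Predicate;

// Hypothetical illustration only; not code from HiveMetaStoreChecker.
public class PartitionPathSketch {

  // Option 1: build the result while iterating, instead of copying the
  // whole set up front and removing the bad entries afterwards.
  public static Set<String> collectValid(Set<String> partPathsInMS,
                                         Predicate<String> isValid) {
    Set<String> correct = new HashSet<>();
    for (String path : partPathsInMS) {
      if (isValid.test(path)) {
        correct.add(path);
      }
    }
    return correct;
  }

  // Option 2: keep only the entries that differ. If most metastore paths
  // are valid, this set stays small and the original set is reused as-is.
  public static Set<String> collectInvalid(Set<String> partPathsInMS,
                                           Predicate<String> isValid) {
    Set<String> invalid = new HashSet<>();
    for (String path : partPathsInMS) {
      if (!isValid.test(path)) {
        invalid.add(path);
      }
    }
    return invalid;
  }

  public static void main(String[] args) {
    Set<String> inMetastore =
        new HashSet<>(Arrays.asList("/t/p=1", "/t/p=2", "/t/p=3"));
    // Stand-in for a filesystem check: only these paths "exist".
    Set<String> onFileSystem =
        new HashSet<>(Arrays.asList("/t/p=1", "/t/p=3"));

    // TreeSet is used only to print in a stable order.
    System.out.println(
        new TreeSet<>(collectValid(inMetastore, onFileSystem::contains)));
    System.out.println(
        new TreeSet<>(collectInvalid(inMetastore, onFileSystem::contains)));
  }
}
```

Both variants are a single pass over `partPathsInMS`, so the time complexity should be comparable to copy-then-remove. The second one holds only the differing entries, which is where the space saving would come from.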



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

