gatorsmile commented on a change in pull request #25014: [SPARK-28216][SQL][TEST] Add `getLocalDirSize` to SQLTestUtils URL: https://github.com/apache/spark/pull/25014#discussion_r307050435
########## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceUtils.scala ########## @@ -60,8 +60,8 @@ object DataSourceUtils { // SPARK-24626: Metadata files and temporary files should not be // counted as data files, so that they shouldn't participate in tasks like // location size calculation. - private[sql] def isDataPath(path: Path): Boolean = { - val name = path.getName - !(name.startsWith("_") || name.startsWith(".")) - } + private[sql] def isDataPath(path: Path): Boolean = isDataFile(path.getName) + + private[sql] def isDataFile(fileName: String) = + !(fileName.startsWith("_") || fileName.startsWith(".")) Review comment: This is actually different from our code which is calling `fs.getContentSummary(tablePath).getLength` in https://github.com/apache/spark/blob/master/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveStrategies.scala#L125 getContentSummary is using isChecksumFile to filter out the metadata file. https://hadoop.apache.org/docs/r2.7.4/api/src-html/org/apache/hadoop/fs/ChecksumFileSystem.html ```Java 093 /** Return true iff file is a checksum file name.*/ 094 public static boolean isChecksumFile(Path file) { 095 String name = file.getName(); 096 return name.startsWith(".") && name.endsWith(".crc"); 097 } ``` Obviously, they are different. ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services --------------------------------------------------------------------- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org