gatorsmile commented on a change in pull request #25014: 
[SPARK-28216][SQL][TEST] Add `getLocalDirSize` to SQLTestUtils
URL: https://github.com/apache/spark/pull/25014#discussion_r307050435
 
 

 ##########
 File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceUtils.scala
 ##########
 @@ -60,8 +60,8 @@ object DataSourceUtils {
   // SPARK-24626: Metadata files and temporary files should not be
   // counted as data files, so that they shouldn't participate in tasks like
   // location size calculation.
-  private[sql] def isDataPath(path: Path): Boolean = {
-    val name = path.getName
-    !(name.startsWith("_") || name.startsWith("."))
-  }
+  private[sql] def isDataPath(path: Path): Boolean = isDataFile(path.getName)
+
+  private[sql] def isDataFile(fileName: String) =
+    !(fileName.startsWith("_") || fileName.startsWith("."))
 
 Review comment:
   This is actually different from our code which is calling 
`fs.getContentSummary(tablePath).getLength` in 
https://github.com/apache/spark/blob/master/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveStrategies.scala#L125
 
   
   getContentSummary is using isChecksumFile to filter out the metadata file. 
   
   
https://hadoop.apache.org/docs/r2.7.4/api/src-html/org/apache/hadoop/fs/ChecksumFileSystem.html
   
   ```Java
   093  /** Return true iff file is a checksum file name.*/
   094  public static boolean isChecksumFile(Path file) {
   095    String name = file.getName();
   096    return name.startsWith(".") && name.endsWith(".crc");
   097  }
   ```
   
   Obviously, they are different. 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

Reply via email to