Yang Jie created SPARK-40765:
--------------------------------

             Summary: Optimize redundant fs operations in `CommandUtils#calculateSingleLocationSize#getPathSize` method
                 Key: SPARK-40765
                 URL: https://issues.apache.org/jira/browse/SPARK-40765
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 3.4.0
            Reporter: Yang Jie

{code:java}
def getPathSize(fs: FileSystem, path: Path): Long = {
  // One extra file system call per path, repeated in every recursive call
  val fileStatus = fs.getFileStatus(path)
  val size = if (fileStatus.isDirectory) {
    // listStatus already returns a FileStatus for each child
    fs.listStatus(path)
      .map { status =>
        if (isDataPath(status.getPath, stagingDir)) {
          getPathSize(fs, status.getPath)
        } else {
          0L
        }
      }.sum
  } else {
    fileStatus.getLen
  }

  size
}
{code}

Change the input parameter of `getPathSize` from `Path` to `FileStatus`. `fs.listStatus(path)` already returns a `FileStatus` for each child, so there is no need to call `fs.getFileStatus(path)` again in each recursive call.
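
The proposed change could look roughly like the sketch below. It is not the actual patch: the object name `PathSizeSketch` and the wrapper `pathSize` are hypothetical, and `isDataPath` is passed in as a plain `Path => Boolean` parameter (an assumption made to keep the example self-contained) instead of using the helper that takes `stagingDir`. It assumes `hadoop-common` is on the classpath.

```scala
import org.apache.hadoop.fs.{FileStatus, FileSystem, Path}

// Hypothetical container for this sketch; not part of Spark.
object PathSizeSketch {

  // Recursive size calculation that takes a FileStatus instead of a Path.
  // The FileStatus objects returned by listStatus are passed straight into
  // the recursive call, so getFileStatus is never called for children.
  // `isDataPath` is an assumed stand-in for the original filter.
  def getPathSize(
      fs: FileSystem,
      status: FileStatus,
      isDataPath: Path => Boolean): Long = {
    if (status.isDirectory) {
      fs.listStatus(status.getPath).map { child =>
        if (isDataPath(child.getPath)) {
          getPathSize(fs, child, isDataPath)
        } else {
          0L
        }
      }.sum
    } else {
      status.getLen
    }
  }

  // Entry point: getFileStatus is called once, for the root path only.
  def pathSize(fs: FileSystem, root: Path, isDataPath: Path => Boolean): Long =
    getPathSize(fs, fs.getFileStatus(root), isDataPath)
}
```

With this shape, the recursion issues one `listStatus` call per directory and a single `getFileStatus` call for the root, instead of an additional `getFileStatus` call for every file and directory visited.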