HeartSaVioR commented on a change in pull request #28967:
URL: https://github.com/apache/spark/pull/28967#discussion_r450052572



##########
File path: 
common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExecutorDiskUtils.java
##########
@@ -45,34 +31,16 @@ public static File getFile(String[] localDirs, int 
subDirsPerLocalDir, String fi
     int hash = JavaUtils.nonNegativeHash(filename);
     String localDir = localDirs[hash % localDirs.length];
     int subDirId = (hash / localDirs.length) % subDirsPerLocalDir;
-    return new File(createNormalizedInternedPathname(
-        localDir, String.format("%02x", subDirId), filename));
-  }
-
-  /**
-   * This method is needed to avoid the situation when multiple File instances 
for the
-   * same pathname "foo/bar" are created, each with a separate copy of the 
"foo/bar" String.
-   * According to measurements, in some scenarios such duplicate strings may 
waste a lot
-   * of memory (~ 10% of the heap). To avoid that, we intern the pathname, and 
before that
-   * we make sure that it's in a normalized form (contains no "//", "///" 
etc.) Otherwise,
-   * the internal code in java.io.File would normalize it later, creating a 
new "foo/bar"
-   * String copy. Unfortunately, we cannot just reuse the normalization code 
that java.io.File
-   * uses, since it is in the package-private class java.io.FileSystem.
-   *
-   * On Windows, separator "\" is used instead of "/".
-   *
-   * "\\" is a legal character in path name on Unix-like OS, but illegal on 
Windows.
-   */
-  @VisibleForTesting
-  static String createNormalizedInternedPathname(String dir1, String dir2, 
String fname) {
-    String pathname = dir1 + File.separator + dir2 + File.separator + fname;
-    Matcher m = MULTIPLE_SEPARATORS.matcher(pathname);
-    pathname = m.replaceAll(Matcher.quoteReplacement(File.separator));
-    // A single trailing slash needs to be taken care of separately
-    if (pathname.length() > 1 && pathname.charAt(pathname.length() - 1) == 
File.separatorChar) {
-      pathname = pathname.substring(0, pathname.length() - 1);
-    }
-    return pathname.intern();
+    final String notNormalizedPath =
+      localDir + File.separator + String.format("%02x", subDirId) + 
File.separator + filename;
+    // Interning the normalized path as according to measurements, in some 
scenarios such
+    // duplicate strings may waste a lot of memory (~ 10% of the heap).
+    // Unfortunately, we cannot just call the normalization code that 
java.io.File
+    // uses, since it is in the package-private class java.io.FileSystem.
+    // So we are creating a File just to get the normalized path back to 
intern it.
+    // Finally a new File is built and returned with this interned normalized 
path.
+    final String normalizedInternedPath = new 
File(notNormalizedPath).getPath().intern();

Review comment:
       Ah yes you're right. My bad I was confused a bit. Thanks for correcting 
me! So we had to do the double-normalization check, but second one would be 
just string seek.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

Reply via email to