attilapiros commented on a change in pull request #28967:
URL: https://github.com/apache/spark/pull/28967#discussion_r450045017



##########
File path: 
common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExecutorDiskUtils.java
##########
@@ -45,34 +31,16 @@ public static File getFile(String[] localDirs, int 
subDirsPerLocalDir, String fi
     int hash = JavaUtils.nonNegativeHash(filename);
     String localDir = localDirs[hash % localDirs.length];
     int subDirId = (hash / localDirs.length) % subDirsPerLocalDir;
-    return new File(createNormalizedInternedPathname(
-        localDir, String.format("%02x", subDirId), filename));
-  }
-
-  /**
-   * This method is needed to avoid the situation when multiple File instances 
for the
-   * same pathname "foo/bar" are created, each with a separate copy of the 
"foo/bar" String.
-   * According to measurements, in some scenarios such duplicate strings may 
waste a lot
-   * of memory (~ 10% of the heap). To avoid that, we intern the pathname, and 
before that
-   * we make sure that it's in a normalized form (contains no "//", "///" 
etc.) Otherwise,
-   * the internal code in java.io.File would normalize it later, creating a 
new "foo/bar"
-   * String copy. Unfortunately, we cannot just reuse the normalization code 
that java.io.File
-   * uses, since it is in the package-private class java.io.FileSystem.
-   *
-   * On Windows, separator "\" is used instead of "/".
-   *
-   * "\\" is a legal character in path name on Unix-like OS, but illegal on 
Windows.
-   */
-  @VisibleForTesting
-  static String createNormalizedInternedPathname(String dir1, String dir2, 
String fname) {
-    String pathname = dir1 + File.separator + dir2 + File.separator + fname;
-    Matcher m = MULTIPLE_SEPARATORS.matcher(pathname);
-    pathname = m.replaceAll(Matcher.quoteReplacement(File.separator));
-    // A single trailing slash needs to be taken care of separately
-    if (pathname.length() > 1 && pathname.charAt(pathname.length() - 1) == 
File.separatorChar) {
-      pathname = pathname.substring(0, pathname.length() - 1);
-    }
-    return pathname.intern();
+    final String notNormalizedPath =
+      localDir + File.separator + String.format("%02x", subDirId) + 
File.separator + filename;
+    // Interning the normalized path as according to measurements, in some 
scenarios such
+    // duplicate strings may waste a lot of memory (~ 10% of the heap).
+    // Unfortunately, we cannot just call the normalization code that 
java.io.File
+    // uses, since it is in the package-private class java.io.FileSystem.
+    // So we are creating a File just to get the normalized path back to 
intern it.
+    // Finally a new File is built and returned with this interned normalized 
path.
+    final String normalizedInternedPath = new 
File(notNormalizedPath).getPath().intern();

Review comment:
       I am sorry but this is not how interning a String works. The `intern()` 
method gives back a new reference  to a String from the constant pool when the 
String not already interned. So the `this` cannot be re-referenced as there 
could be several places which refers to that. This example makes it clear:
   
   ```java
   public class InternExample {
       public static void main(String args[]) {
           String s1 = new String("hello");
           String s2 = "hello";
           String s3 = s1.intern(); //returns string from pool, now it will be 
same as s2   
           System.out.println(s1 == s2); //false because reference is different 
 
           System.out.println(s2 == s3); //true because reference is same  
       }
   }
   ```
   
   You can run it here: https://www.javatpoint.com/java-string-intern.
   
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

Reply via email to