attilapiros commented on a change in pull request #28967: URL: https://github.com/apache/spark/pull/28967#discussion_r450045017
########## File path: common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExecutorDiskUtils.java ########## @@ -45,34 +31,16 @@ public static File getFile(String[] localDirs, int subDirsPerLocalDir, String fi int hash = JavaUtils.nonNegativeHash(filename); String localDir = localDirs[hash % localDirs.length]; int subDirId = (hash / localDirs.length) % subDirsPerLocalDir; - return new File(createNormalizedInternedPathname( - localDir, String.format("%02x", subDirId), filename)); - } - - /** - * This method is needed to avoid the situation when multiple File instances for the - * same pathname "foo/bar" are created, each with a separate copy of the "foo/bar" String. - * According to measurements, in some scenarios such duplicate strings may waste a lot - * of memory (~ 10% of the heap). To avoid that, we intern the pathname, and before that - * we make sure that it's in a normalized form (contains no "//", "///" etc.) Otherwise, - * the internal code in java.io.File would normalize it later, creating a new "foo/bar" - * String copy. Unfortunately, we cannot just reuse the normalization code that java.io.File - * uses, since it is in the package-private class java.io.FileSystem. - * - * On Windows, separator "\" is used instead of "/". - * - * "\\" is a legal character in path name on Unix-like OS, but illegal on Windows. - */ - @VisibleForTesting - static String createNormalizedInternedPathname(String dir1, String dir2, String fname) { - String pathname = dir1 + File.separator + dir2 + File.separator + fname; - Matcher m = MULTIPLE_SEPARATORS.matcher(pathname); - pathname = m.replaceAll(Matcher.quoteReplacement(File.separator)); - // A single trailing slash needs to be taken care of separately - if (pathname.length() > 1 && pathname.charAt(pathname.length() - 1) == File.separatorChar) { - pathname = pathname.substring(0, pathname.length() - 1); - } - return pathname.intern(); + final String notNormalizedPath = + localDir + File.separator + String.format("%02x", subDirId) + File.separator + filename; + // Interning the normalized path as according to measurements, in some scenarios such + // duplicate strings may waste a lot of memory (~ 10% of the heap). + // Unfortunately, we cannot just call the normalization code that java.io.File + // uses, since it is in the package-private class java.io.FileSystem. + // So we are creating a File just to get the normalized path back to intern it. + // Finally a new File is built and returned with this interned normalized path. + final String normalizedInternedPath = new File(notNormalizedPath).getPath().intern(); Review comment: I am sorry but this is not how interning a String works. The `intern()` method gives back a new reference to a String from the constant pool when the String not already interned. So the `this` cannot be re-referenced as there could be several places which refers to that. This example makes it clear: ```java public class InternExample { public static void main(String args[]) { String s1 = new String("hello"); String s2 = "hello"; String s3 = s1.intern(); //returns string from pool, now it will be same as s2 System.out.println(s1 == s2); //false because reference is different System.out.println(s2 == s3); //true because reference is same } } ``` You can run it here: https://www.javatpoint.com/java-string-intern. ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org