[GitHub] spark pull request #21533: [SPARK-24195][Core] Bug fix for local:/ path in S...
Github user xuanyuanking commented on a diff in the pull request: https://github.com/apache/spark/pull/21533#discussion_r198705671

Diff: core/src/main/scala/org/apache/spark/SparkContext.scala

```
@@ -1519,7 +1519,12 @@ class SparkContext(config: SparkConf) extends Logging {
   def addFile(path: String, recursive: Boolean): Unit = {
     val uri = new Path(path).toUri
     val schemeCorrectedPath = uri.getScheme match {
-      case null | "local" => new File(path).getCanonicalFile.toURI.toString
+      case null => new File(path).getCanonicalFile.toURI.toString
+      case "local" =>
+        logWarning("We do not support add a local file here because file with local scheme is " +
+          "already existed on every node, there is no need to call addFile to add it again. " +
+          "(See more discussion about this in SPARK-24195.)")
```

Got it, rephrasing done.
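The scheme dispatch this thread converged on can be sketched as a standalone function. This is a hypothetical illustration, not the actual Spark code: it uses `java.net.URI` in place of Hadoop's `Path`, returns `Option` in place of calling Spark's `logWarning`, and the paths are made up.

```scala
import java.io.File
import java.net.URI

// Hypothetical sketch of addFile's scheme handling after the fix:
// null scheme -> resolve on the local filesystem; "local" -> skip (warn);
// anything else -> pass the path through unchanged.
def schemeCorrectedPath(path: String): Option[String] =
  new URI(path).getScheme match {
    case null =>
      // No scheme: treat as a plain local path and canonicalize it.
      Some(new File(path).getCanonicalFile.toURI.toString)
    case "local" =>
      // "local" files already exist on every node; adding them to the
      // file server is a no-op, so the final fix warns and skips them.
      None
    case _ =>
      Some(path)
  }

println(schemeCorrectedPath("local:///opt/conf/app.conf")) // None
println(schemeCorrectedPath("hdfs:///data/part-00000"))
```

This mirrors the design choice in the final diff: rather than silently converting `local:` to `file:`, the unsupported case is surfaced to the user.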
Github user jerryshao commented on a diff in the pull request: https://github.com/apache/spark/pull/21533#discussion_r198682844

Diff: core/src/main/scala/org/apache/spark/SparkContext.scala

```
@@ -1519,7 +1519,12 @@ class SparkContext(config: SparkConf) extends Logging {
   def addFile(path: String, recursive: Boolean): Unit = {
     val uri = new Path(path).toUri
     val schemeCorrectedPath = uri.getScheme match {
-      case null | "local" => new File(path).getCanonicalFile.toURI.toString
+      case null => new File(path).getCanonicalFile.toURI.toString
+      case "local" =>
+        logWarning("We do not support add a local file here because file with local scheme is " +
+          "already existed on every node, there is no need to call addFile to add it again. " +
+          "(See more discussion about this in SPARK-24195.)")
```

Can we please rephrase to "File with 'local' scheme is not supported to add to file server, since it is already available on every node."?
Github user xuanyuanking commented on a diff in the pull request: https://github.com/apache/spark/pull/21533#discussion_r197603726

Diff: core/src/main/scala/org/apache/spark/SparkContext.scala

```
@@ -1517,9 +1517,19 @@ class SparkContext(config: SparkConf) extends Logging {
    * only supported for Hadoop-supported filesystems.
    */
   def addFile(path: String, recursive: Boolean): Unit = {
-    val uri = new Path(path).toUri
+    var uri = new Path(path).toUri
+    // mark the original path's scheme is local or not, there is no need to add the local file
+    // in file server.
+    var localFile = false
     val schemeCorrectedPath = uri.getScheme match {
-      case null | "local" => new File(path).getCanonicalFile.toURI.toString
+      case null =>
+        new File(path).getCanonicalFile.toURI.toString
+      case "local" =>
+        localFile = true
+        val tmpPath = new File(uri.getPath).getCanonicalFile.toURI.toString
+        // SPARK-24195: Local is not a valid scheme for FileSystem, we should only keep path here.
+        uri = new Path(uri.getPath).toUri
+        tmpPath
```

```
This makes me think whether supporting "local" scheme in addFile is meaningful or not? Because a file with "local" scheme already exists on every node, and the user should be aware of that, so adding it seems not meaningful.
```

Yeah, I agree with you. The previous change tried to handle a "local" file by not adding it to the fileServer and correcting its scheme to "file:", but adding a local file is itself a no-op. So should we just forbid users from passing a file with the "local" scheme to addFile?
Github user jerryshao commented on a diff in the pull request: https://github.com/apache/spark/pull/21533#discussion_r197068331

Diff: core/src/main/scala/org/apache/spark/SparkContext.scala

```
@@ -1517,9 +1517,19 @@ class SparkContext(config: SparkConf) extends Logging {
    * only supported for Hadoop-supported filesystems.
    */
   def addFile(path: String, recursive: Boolean): Unit = {
-    val uri = new Path(path).toUri
+    var uri = new Path(path).toUri
+    // mark the original path's scheme is local or not, there is no need to add the local file
+    // in file server.
+    var localFile = false
     val schemeCorrectedPath = uri.getScheme match {
-      case null | "local" => new File(path).getCanonicalFile.toURI.toString
+      case null =>
+        new File(path).getCanonicalFile.toURI.toString
+      case "local" =>
+        localFile = true
+        val tmpPath = new File(uri.getPath).getCanonicalFile.toURI.toString
+        // SPARK-24195: Local is not a valid scheme for FileSystem, we should only keep path here.
+        uri = new Path(uri.getPath).toUri
+        tmpPath
```

I think the change here makes a file with "local" scheme a no-op. This makes me wonder whether supporting the "local" scheme in `addFile` is meaningful at all: a file with "local" scheme already exists on every node, and the user should be aware of that, so adding it seems not meaningful. Looking at the similar method `addJar`, a "local" jar is properly handled there without being added to the fileServer, and is properly converted to the right scheme used by the classloader.
Github user xuanyuanking commented on a diff in the pull request: https://github.com/apache/spark/pull/21533#discussion_r195931509

Diff: core/src/main/scala/org/apache/spark/SparkContext.scala

```
@@ -1517,9 +1517,12 @@ class SparkContext(config: SparkConf) extends Logging {
    * only supported for Hadoop-supported filesystems.
    */
   def addFile(path: String, recursive: Boolean): Unit = {
-    val uri = new Path(path).toUri
+    var uri = new Path(path).toUri
     val schemeCorrectedPath = uri.getScheme match {
-      case null | "local" => new File(path).getCanonicalFile.toURI.toString
+      case null | "local" =>
+        // SPARK-24195: Local is not a valid scheme for FileSystem, we should only keep path here.
+        uri = new Path(uri.getPath).toUri
```

Ah, I see, thanks. I'll do this in the next commit. Thanks for your patient explanation.
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21533#discussion_r195660767

Diff: core/src/main/scala/org/apache/spark/SparkContext.scala

```
@@ -1517,9 +1517,12 @@ class SparkContext(config: SparkConf) extends Logging {
    * only supported for Hadoop-supported filesystems.
    */
   def addFile(path: String, recursive: Boolean): Unit = {
-    val uri = new Path(path).toUri
+    var uri = new Path(path).toUri
     val schemeCorrectedPath = uri.getScheme match {
-      case null | "local" => new File(path).getCanonicalFile.toURI.toString
+      case null | "local" =>
+        // SPARK-24195: Local is not a valid scheme for FileSystem, we should only keep path here.
+        uri = new Path(uri.getPath).toUri
```

I mean, at least we can do:

```
val a = new File(uri.getPath).getCanonicalFile.toURI.toString
uri = new Path(uri.getPath).toUri
a
```

`new Path(uri.getPath).toUri` for trimming the scheme doesn't look quite clean, though. It's okay with me, at least.
Github user xuanyuanking commented on a diff in the pull request: https://github.com/apache/spark/pull/21533#discussion_r195655196

Diff: core/src/main/scala/org/apache/spark/SparkContext.scala

```
@@ -1517,9 +1517,12 @@ class SparkContext(config: SparkConf) extends Logging {
    * only supported for Hadoop-supported filesystems.
    */
   def addFile(path: String, recursive: Boolean): Unit = {
-    val uri = new Path(path).toUri
+    var uri = new Path(path).toUri
     val schemeCorrectedPath = uri.getScheme match {
-      case null | "local" => new File(path).getCanonicalFile.toURI.toString
+      case null | "local" =>
+        // SPARK-24195: Local is not a valid scheme for FileSystem, we should only keep path here.
+        uri = new Path(uri.getPath).toUri
```

@HyukjinKwon @jiangxb1987 Thanks for the explanations, I think I now understand what you mean by "`getPath` doesn't include the scheme". Actually the purpose of `uri = new Path(uri.getPath).toUri` is to reassign the var at line 1520; we don't want the uri to include the local scheme.

```
Can't we just do new File(uri.getPath).getCanonicalFile.toURI.toString without this line?
```

We can't, because as I explained above, if we don't do `uri = new Path(uri.getPath).toUri`, we get an exception like below:

```
No FileSystem for scheme: local
java.io.IOException: No FileSystem for scheme: local
  at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2586)
  at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2593)
  at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:91)
  at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2632)
  at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2614)
  at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:370)
  at org.apache.spark.util.Utils$.getHadoopFileSystem(Utils.scala:1830)
  at org.apache.spark.util.Utils$.doFetchFile(Utils.scala:690)
  at org.apache.spark.util.Utils$.fetchFile(Utils.scala:486)
  at org.apache.spark.SparkContext.addFile(SparkContext.scala:1557)
```
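The scheme-trimming being debated here can be illustrated with plain `java.net.URI` (the thread's actual code uses Hadoop's `Path`, which behaves similarly for this case; the path below is the one from the PR description and is made up):

```scala
import java.net.URI

// A "local"-scheme URI: getScheme is "local", while getPath drops the scheme.
val withScheme = new URI("local:///home/user/demo/logger.config")
println(withScheme.getScheme) // local
println(withScheme.getPath)   // /home/user/demo/logger.config

// Rebuilding a URI from just the path is what strips the scheme, so a later
// filesystem lookup never sees "local" (which it could not resolve).
val stripped = new URI(withScheme.getPath)
println(stripped.getScheme)   // null
println(stripped.getPath)     // /home/user/demo/logger.config
```

This is why reassigning `uri` avoids the `No FileSystem for scheme: local` error: the scheme is gone before the filesystem lookup happens.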
Github user jiangxb1987 commented on a diff in the pull request: https://github.com/apache/spark/pull/21533#discussion_r195310633

Diff: core/src/main/scala/org/apache/spark/SparkContext.scala

```
@@ -1517,9 +1517,12 @@ class SparkContext(config: SparkConf) extends Logging {
    * only supported for Hadoop-supported filesystems.
    */
   def addFile(path: String, recursive: Boolean): Unit = {
-    val uri = new Path(path).toUri
+    var uri = new Path(path).toUri
     val schemeCorrectedPath = uri.getScheme match {
-      case null | "local" => new File(path).getCanonicalFile.toURI.toString
+      case null | "local" =>
+        // SPARK-24195: Local is not a valid scheme for FileSystem, we should only keep path here.
+        uri = new Path(uri.getPath).toUri
```

Yea, we can simplify this.
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21533#discussion_r195143756

Diff: core/src/main/scala/org/apache/spark/SparkContext.scala

```
@@ -1517,9 +1517,12 @@ class SparkContext(config: SparkConf) extends Logging {
    * only supported for Hadoop-supported filesystems.
    */
   def addFile(path: String, recursive: Boolean): Unit = {
-    val uri = new Path(path).toUri
+    var uri = new Path(path).toUri
     val schemeCorrectedPath = uri.getScheme match {
-      case null | "local" => new File(path).getCanonicalFile.toURI.toString
+      case null | "local" =>
+        // SPARK-24195: Local is not a valid scheme for FileSystem, we should only keep path here.
+        uri = new Path(uri.getPath).toUri
```

I mean that `getPath` doesn't include the scheme:

```
scala> new Path("local://a/b/c")
res0: org.apache.hadoop.fs.Path = local://a/b/c

scala> new Path("local://a/b/c").toUri
res1: java.net.URI = local://a/b/c

scala> new Path("local://a/b/c").toUri.getScheme
res2: String = local

scala> new Path("local://a/b/c").toUri.getPath
res3: String = /b/c
```

why should we do this again?

```
scala> new Path(new Path("local://a/b/c").toUri.getPath).toUri.getPath
res4: String = /b/c
```
Github user xuanyuanking commented on a diff in the pull request: https://github.com/apache/spark/pull/21533#discussion_r195134460

Diff: core/src/test/scala/org/apache/spark/SparkContextSuite.scala

```
@@ -116,49 +116,52 @@ class SparkContextSuite extends SparkFunSuite with LocalSparkContext with Eventu
   test("basic case for addFile and listFiles") {
     val dir = Utils.createTempDir()
+    // file and absolute path for normal path
     val file1 = File.createTempFile("someprefix1", "somesuffix1", dir)
     val absolutePath1 = file1.getAbsolutePath
+    // file and absolute path for relative path
     val file2 = File.createTempFile("someprefix2", "somesuffix2", dir)
     val relativePath = file2.getParent + "/../" + file2.getParentFile.getName + "/" + file2.getName
     val absolutePath2 = file2.getAbsolutePath
+    // file and absolute path for path with local scheme
+    val file3 = File.createTempFile("someprefix3", "somesuffix3", dir)
+    val localPath = "local://" + file3.getParent + "/../" + file3.getParentFile.getName +
+      "/" + file3.getName
+    val absolutePath3 = file3.getAbsolutePath
+
     try {
       Files.write("somewords1", file1, StandardCharsets.UTF_8)
       Files.write("somewords2", file2, StandardCharsets.UTF_8)
-      val length1 = file1.length()
-      val length2 = file2.length()
+      Files.write("somewords3", file3, StandardCharsets.UTF_8)

-      sc = new SparkContext(new SparkConf().setAppName("test").setMaster("local"))
-      sc.addFile(file1.getAbsolutePath)
-      sc.addFile(relativePath)
-      sc.parallelize(Array(1), 1).map(x => {
-        val gotten1 = new File(SparkFiles.get(file1.getName))
-        val gotten2 = new File(SparkFiles.get(file2.getName))
-        if (!gotten1.exists()) {
+      def checkGottenFile(file: File, absolutePath: String): Unit = {
+        val length = file.length()
+        val gotten = new File(SparkFiles.get(file.getName))
+        if (!gotten.exists()) {
           throw new SparkException("file doesn't exist : " + absolutePath1)
         }
-        if (!gotten2.exists()) {
-          throw new SparkException("file doesn't exist : " + absolutePath2)
-        }
-        if (length1 != gotten1.length()) {
+        if (file.length() != gotten.length()) {
           throw new SparkException(
-            s"file has different length $length1 than added file ${gotten1.length()} : " +
+            s"file has different length $length than added file ${gotten.length()} : " +
              absolutePath1)
         }
-        if (length2 != gotten2.length()) {
-          throw new SparkException(
-            s"file has different length $length2 than added file ${gotten2.length()} : " +
-              absolutePath2)
-        }
-        if (absolutePath1 == gotten1.getAbsolutePath) {
+        if (absolutePath == gotten.getAbsolutePath) {
           throw new SparkException("file should have been copied :" + absolutePath1)
         }
-        if (absolutePath2 == gotten2.getAbsolutePath) {
-          throw new SparkException("file should have been copied : " + absolutePath2)
-        }
```

Actually I kept all the existing tests and just did some cleanup to reduce duplicated code by adding a function `checkGottenFile` in https://github.com/apache/spark/pull/21533/files/f922fd8c995164cada4a8b72e92c369a827def16#diff-8d5858d578a2dda1a2edb0d8cefa4f24R139. If you think it's unnecessary, I'll just change it back.
Github user xuanyuanking commented on a diff in the pull request: https://github.com/apache/spark/pull/21533#discussion_r195133122

Diff: core/src/test/scala/org/apache/spark/SparkContextSuite.scala

```
@@ -116,49 +116,52 @@ class SparkContextSuite extends SparkFunSuite with LocalSparkContext with Eventu
   test("basic case for addFile and listFiles") {
     val dir = Utils.createTempDir()
+    // file and absolute path for normal path
     val file1 = File.createTempFile("someprefix1", "somesuffix1", dir)
     val absolutePath1 = file1.getAbsolutePath
+    // file and absolute path for relative path
     val file2 = File.createTempFile("someprefix2", "somesuffix2", dir)
     val relativePath = file2.getParent + "/../" + file2.getParentFile.getName + "/" + file2.getName
     val absolutePath2 = file2.getAbsolutePath
+    // file and absolute path for path with local scheme
+    val file3 = File.createTempFile("someprefix3", "somesuffix3", dir)
+    val localPath = "local://" + file3.getParent + "/../" + file3.getParentFile.getName +
+      "/" + file3.getName
+    val absolutePath3 = file3.getAbsolutePath
+
     try {
       Files.write("somewords1", file1, StandardCharsets.UTF_8)
       Files.write("somewords2", file2, StandardCharsets.UTF_8)
-      val length1 = file1.length()
-      val length2 = file2.length()
+      Files.write("somewords3", file3, StandardCharsets.UTF_8)

-      sc = new SparkContext(new SparkConf().setAppName("test").setMaster("local"))
-      sc.addFile(file1.getAbsolutePath)
-      sc.addFile(relativePath)
-      sc.parallelize(Array(1), 1).map(x => {
-        val gotten1 = new File(SparkFiles.get(file1.getName))
-        val gotten2 = new File(SparkFiles.get(file2.getName))
-        if (!gotten1.exists()) {
+      def checkGottenFile(file: File, absolutePath: String): Unit = {
+        val length = file.length()
+        val gotten = new File(SparkFiles.get(file.getName))
+        if (!gotten.exists()) {
           throw new SparkException("file doesn't exist : " + absolutePath1)
         }
-        if (!gotten2.exists()) {
-          throw new SparkException("file doesn't exist : " + absolutePath2)
-        }
-        if (length1 != gotten1.length()) {
+        if (file.length() != gotten.length()) {
           throw new SparkException(
-            s"file has different length $length1 than added file ${gotten1.length()} : " +
+            s"file has different length $length than added file ${gotten.length()} : " +
              absolutePath1)
         }
-        if (length2 != gotten2.length()) {
-          throw new SparkException(
-            s"file has different length $length2 than added file ${gotten2.length()} : " +
-              absolutePath2)
-        }
-        if (absolutePath1 == gotten1.getAbsolutePath) {
+        if (absolutePath == gotten.getAbsolutePath) {
           throw new SparkException("file should have been copied :" + absolutePath1)
         }
-        if (absolutePath2 == gotten2.getAbsolutePath) {
-          throw new SparkException("file should have been copied : " + absolutePath2)
-        }
+      }
+
+      sc = new SparkContext(new SparkConf().setAppName("test").setMaster("local"))
+      sc.addFile(file1.getAbsolutePath)
+      sc.addFile(relativePath)
+      sc.addFile(localPath)
+      sc.parallelize(Array(1), 1).map(x => {
```

Got it, will fix in the next commit.
Github user xuanyuanking commented on a diff in the pull request: https://github.com/apache/spark/pull/21533#discussion_r195132870

Diff: core/src/main/scala/org/apache/spark/SparkContext.scala

```
@@ -1517,9 +1517,12 @@ class SparkContext(config: SparkConf) extends Logging {
    * only supported for Hadoop-supported filesystems.
    */
   def addFile(path: String, recursive: Boolean): Unit = {
-    val uri = new Path(path).toUri
+    var uri = new Path(path).toUri
     val schemeCorrectedPath = uri.getScheme match {
-      case null | "local" => new File(path).getCanonicalFile.toURI.toString
+      case null | "local" =>
+        // SPARK-24195: Local is not a valid scheme for FileSystem, we should only keep path here.
+        uri = new Path(uri.getPath).toUri
```

Yes, just as @felixcheung said, this is because we use `uri` in https://github.com/apache/spark/pull/21533/files/f922fd8c995164cada4a8b72e92c369a827def16#diff-364713d7776956cb8b0a771e9b62f82dR1557; if the uri has the local scheme, we'll get an exception because local is not a valid scheme for FileSystem.
Github user xuanyuanking commented on a diff in the pull request: https://github.com/apache/spark/pull/21533#discussion_r195133036

Diff: core/src/test/scala/org/apache/spark/SparkContextSuite.scala

```
@@ -116,49 +116,52 @@ class SparkContextSuite extends SparkFunSuite with LocalSparkContext with Eventu
   test("basic case for addFile and listFiles") {
     val dir = Utils.createTempDir()
+    // file and absolute path for normal path
     val file1 = File.createTempFile("someprefix1", "somesuffix1", dir)
     val absolutePath1 = file1.getAbsolutePath
+    // file and absolute path for relative path
     val file2 = File.createTempFile("someprefix2", "somesuffix2", dir)
     val relativePath = file2.getParent + "/../" + file2.getParentFile.getName + "/" + file2.getName
     val absolutePath2 = file2.getAbsolutePath
+    // file and absolute path for path with local scheme
+    val file3 = File.createTempFile("someprefix3", "somesuffix3", dir)
+    val localPath = "local://" + file3.getParent + "/../" + file3.getParentFile.getName +
```

Got it, thanks.
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/21533#discussion_r194953025

Diff: core/src/test/scala/org/apache/spark/SparkContextSuite.scala

```
@@ -116,49 +116,52 @@ class SparkContextSuite extends SparkFunSuite with LocalSparkContext with Eventu
   test("basic case for addFile and listFiles") {
     val dir = Utils.createTempDir()
+    // file and absolute path for normal path
     val file1 = File.createTempFile("someprefix1", "somesuffix1", dir)
     val absolutePath1 = file1.getAbsolutePath
+    // file and absolute path for relative path
     val file2 = File.createTempFile("someprefix2", "somesuffix2", dir)
     val relativePath = file2.getParent + "/../" + file2.getParentFile.getName + "/" + file2.getName
     val absolutePath2 = file2.getAbsolutePath
+    // file and absolute path for path with local scheme
+    val file3 = File.createTempFile("someprefix3", "somesuffix3", dir)
+    val localPath = "local://" + file3.getParent + "/../" + file3.getParentFile.getName +
+      "/" + file3.getName
+    val absolutePath3 = file3.getAbsolutePath
+
     try {
       Files.write("somewords1", file1, StandardCharsets.UTF_8)
       Files.write("somewords2", file2, StandardCharsets.UTF_8)
-      val length1 = file1.length()
-      val length2 = file2.length()
+      Files.write("somewords3", file3, StandardCharsets.UTF_8)

-      sc = new SparkContext(new SparkConf().setAppName("test").setMaster("local"))
-      sc.addFile(file1.getAbsolutePath)
-      sc.addFile(relativePath)
-      sc.parallelize(Array(1), 1).map(x => {
-        val gotten1 = new File(SparkFiles.get(file1.getName))
-        val gotten2 = new File(SparkFiles.get(file2.getName))
-        if (!gotten1.exists()) {
+      def checkGottenFile(file: File, absolutePath: String): Unit = {
+        val length = file.length()
+        val gotten = new File(SparkFiles.get(file.getName))
+        if (!gotten.exists()) {
           throw new SparkException("file doesn't exist : " + absolutePath1)
         }
-        if (!gotten2.exists()) {
-          throw new SparkException("file doesn't exist : " + absolutePath2)
-        }
-        if (length1 != gotten1.length()) {
+        if (file.length() != gotten.length()) {
           throw new SparkException(
-            s"file has different length $length1 than added file ${gotten1.length()} : " +
+            s"file has different length $length than added file ${gotten.length()} : " +
              absolutePath1)
         }
-        if (length2 != gotten2.length()) {
-          throw new SparkException(
-            s"file has different length $length2 than added file ${gotten2.length()} : " +
-              absolutePath2)
-        }
-        if (absolutePath1 == gotten1.getAbsolutePath) {
+        if (absolutePath == gotten.getAbsolutePath) {
           throw new SparkException("file should have been copied :" + absolutePath1)
         }
-        if (absolutePath2 == gotten2.getAbsolutePath) {
-          throw new SparkException("file should have been copied : " + absolutePath2)
-        }
```

can we not change the existing test?
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/21533#discussion_r194953034

Diff: core/src/main/scala/org/apache/spark/SparkContext.scala

```
@@ -1517,9 +1517,12 @@ class SparkContext(config: SparkConf) extends Logging {
    * only supported for Hadoop-supported filesystems.
    */
   def addFile(path: String, recursive: Boolean): Unit = {
-    val uri = new Path(path).toUri
+    var uri = new Path(path).toUri
     val schemeCorrectedPath = uri.getScheme match {
-      case null | "local" => new File(path).getCanonicalFile.toURI.toString
+      case null | "local" =>
+        // SPARK-24195: Local is not a valid scheme for FileSystem, we should only keep path here.
+        uri = new Path(uri.getPath).toUri
```

it changes `uri`, which is referenced again below.
Github user jerryshao commented on a diff in the pull request: https://github.com/apache/spark/pull/21533#discussion_r194927085

Diff: core/src/main/scala/org/apache/spark/SparkContext.scala

```
@@ -1517,9 +1517,12 @@ class SparkContext(config: SparkConf) extends Logging {
    * only supported for Hadoop-supported filesystems.
    */
   def addFile(path: String, recursive: Boolean): Unit = {
-    val uri = new Path(path).toUri
+    var uri = new Path(path).toUri
     val schemeCorrectedPath = uri.getScheme match {
-      case null | "local" => new File(path).getCanonicalFile.toURI.toString
+      case null | "local" =>
+        // SPARK-24195: Local is not a valid scheme for FileSystem, we should only keep path here.
+        uri = new Path(uri.getPath).toUri
```

Yes, same question. The line above doesn't seem useful.
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21533#discussion_r194812872

Diff: core/src/test/scala/org/apache/spark/SparkContextSuite.scala

```
@@ -116,49 +116,52 @@ class SparkContextSuite extends SparkFunSuite with LocalSparkContext with Eventu
   test("basic case for addFile and listFiles") {
     val dir = Utils.createTempDir()
+    // file and absolute path for normal path
     val file1 = File.createTempFile("someprefix1", "somesuffix1", dir)
     val absolutePath1 = file1.getAbsolutePath
+    // file and absolute path for relative path
     val file2 = File.createTempFile("someprefix2", "somesuffix2", dir)
     val relativePath = file2.getParent + "/../" + file2.getParentFile.getName + "/" + file2.getName
     val absolutePath2 = file2.getAbsolutePath
+    // file and absolute path for path with local scheme
+    val file3 = File.createTempFile("someprefix3", "somesuffix3", dir)
+    val localPath = "local://" + file3.getParent + "/../" + file3.getParentFile.getName +
+      "/" + file3.getName
+    val absolutePath3 = file3.getAbsolutePath
+
     try {
       Files.write("somewords1", file1, StandardCharsets.UTF_8)
       Files.write("somewords2", file2, StandardCharsets.UTF_8)
-      val length1 = file1.length()
-      val length2 = file2.length()
+      Files.write("somewords3", file3, StandardCharsets.UTF_8)

-      sc = new SparkContext(new SparkConf().setAppName("test").setMaster("local"))
-      sc.addFile(file1.getAbsolutePath)
-      sc.addFile(relativePath)
-      sc.parallelize(Array(1), 1).map(x => {
-        val gotten1 = new File(SparkFiles.get(file1.getName))
-        val gotten2 = new File(SparkFiles.get(file2.getName))
-        if (!gotten1.exists()) {
+      def checkGottenFile(file: File, absolutePath: String): Unit = {
+        val length = file.length()
+        val gotten = new File(SparkFiles.get(file.getName))
+        if (!gotten.exists()) {
           throw new SparkException("file doesn't exist : " + absolutePath1)
         }
-        if (!gotten2.exists()) {
-          throw new SparkException("file doesn't exist : " + absolutePath2)
-        }
-        if (length1 != gotten1.length()) {
+        if (file.length() != gotten.length()) {
           throw new SparkException(
-            s"file has different length $length1 than added file ${gotten1.length()} : " +
+            s"file has different length $length than added file ${gotten.length()} : " +
              absolutePath1)
         }
-        if (length2 != gotten2.length()) {
-          throw new SparkException(
-            s"file has different length $length2 than added file ${gotten2.length()} : " +
-              absolutePath2)
-        }
-        if (absolutePath1 == gotten1.getAbsolutePath) {
+        if (absolutePath == gotten.getAbsolutePath) {
           throw new SparkException("file should have been copied :" + absolutePath1)
         }
-        if (absolutePath2 == gotten2.getAbsolutePath) {
-          throw new SparkException("file should have been copied : " + absolutePath2)
-        }
+      }
+
+      sc = new SparkContext(new SparkConf().setAppName("test").setMaster("local"))
+      sc.addFile(file1.getAbsolutePath)
+      sc.addFile(relativePath)
+      sc.addFile(localPath)
+      sc.parallelize(Array(1), 1).map(x => {
```

nit:

```
map { x =>
  ...
}
```
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21533#discussion_r194812741

Diff: core/src/test/scala/org/apache/spark/SparkContextSuite.scala

```
@@ -116,49 +116,52 @@ class SparkContextSuite extends SparkFunSuite with LocalSparkContext with Eventu
   test("basic case for addFile and listFiles") {
     val dir = Utils.createTempDir()
+    // file and absolute path for normal path
     val file1 = File.createTempFile("someprefix1", "somesuffix1", dir)
     val absolutePath1 = file1.getAbsolutePath
+    // file and absolute path for relative path
     val file2 = File.createTempFile("someprefix2", "somesuffix2", dir)
     val relativePath = file2.getParent + "/../" + file2.getParentFile.getName + "/" + file2.getName
     val absolutePath2 = file2.getAbsolutePath
+    // file and absolute path for path with local scheme
+    val file3 = File.createTempFile("someprefix3", "somesuffix3", dir)
+    val localPath = "local://" + file3.getParent + "/../" + file3.getParentFile.getName +
```

Let's use string interpolation.
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21533#discussion_r194812492

Diff: core/src/main/scala/org/apache/spark/SparkContext.scala

```
@@ -1517,9 +1517,12 @@ class SparkContext(config: SparkConf) extends Logging {
    * only supported for Hadoop-supported filesystems.
    */
   def addFile(path: String, recursive: Boolean): Unit = {
-    val uri = new Path(path).toUri
+    var uri = new Path(path).toUri
     val schemeCorrectedPath = uri.getScheme match {
-      case null | "local" => new File(path).getCanonicalFile.toURI.toString
+      case null | "local" =>
+        // SPARK-24195: Local is not a valid scheme for FileSystem, we should only keep path here.
+        uri = new Path(uri.getPath).toUri
```

Why is this needed? Can't we just do `new File(uri.getPath).getCanonicalFile.toURI.toString` without this line?
GitHub user xuanyuanking opened a pull request: https://github.com/apache/spark/pull/21533

[SPARK-24195][Core] Bug fix for local:/ path in SparkContext.addFile

## What changes were proposed in this pull request?

In the changes in [SPARK-6300](https://issues.apache.org/jira/browse/SPARK-6300), `schemeCorrectedPath` was essentially changed to

```
new File(path).getCanonicalFile.toURI.toString
```

This is a problem when the path uses the `local:` scheme, since `java.io.File` doesn't handle it, e.g.:

```
scala> new File("local:///home/user/demo/logger.config").getCanonicalFile.toURI.toString
res1: String = file:/user/anotheruser/local:/home/user/demo/logger.config
```

## How was this patch tested?

Added a test in `SparkContextSuite`.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/xuanyuanking/spark SPARK-24195

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/21533.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #21533

commit f922fd8c995164cada4a8b72e92c369a827def16
Author: Yuanjian Li
Date: 2018-06-12T01:51:44Z

    bug fix for local:/ path in sc.addFile
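The mis-handling described above can be reproduced with plain `java.io.File` and `java.net.URI` (no Spark needed). The path is the made-up example from the PR description, and the exact `broken` output depends on the current working directory:

```scala
import java.io.File
import java.net.URI

// java.io.File treats the whole string as a relative path, so the "local:"
// scheme ends up embedded inside the resulting file: URI.
val broken = new File("local:///home/user/demo/logger.config").getCanonicalFile.toURI.toString
println(broken) // e.g. file:/<cwd>/local:/home/user/demo/logger.config

// Stripping the scheme first (as the fix in this PR does) yields a clean path:
// URI.getPath drops "local://", and File then sees a normal absolute path.
val uri = new URI("local:///home/user/demo/logger.config")
val fixed = new File(uri.getPath).getCanonicalFile.toURI.toString
println(fixed) // typically file:/home/user/demo/logger.config
```

This is the core of the bug: canonicalizing before stripping the scheme bakes `local:` into the path, which no downstream `FileSystem` can resolve.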