[GitHub] spark pull request #21533: [SPARK-24195][Core] Bug fix for local:/ path in S...

2018-06-27 Thread xuanyuanking
Github user xuanyuanking commented on a diff in the pull request:

https://github.com/apache/spark/pull/21533#discussion_r198705671
  
--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -1519,7 +1519,12 @@ class SparkContext(config: SparkConf) extends Logging {
   def addFile(path: String, recursive: Boolean): Unit = {
 val uri = new Path(path).toUri
 val schemeCorrectedPath = uri.getScheme match {
-  case null | "local" => new File(path).getCanonicalFile.toURI.toString
+  case null => new File(path).getCanonicalFile.toURI.toString
+  case "local" =>
+logWarning("We do not support add a local file here because file with local scheme is " +
+  "already existed on every node, there is no need to call addFile to add it again. " +
+  "(See more discussion about this in SPARK-24195.)")
--- End diff --

Got it, rephrase done.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21533: [SPARK-24195][Core] Bug fix for local:/ path in S...

2018-06-27 Thread jerryshao
Github user jerryshao commented on a diff in the pull request:

https://github.com/apache/spark/pull/21533#discussion_r198682844
  
--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -1519,7 +1519,12 @@ class SparkContext(config: SparkConf) extends Logging {
   def addFile(path: String, recursive: Boolean): Unit = {
 val uri = new Path(path).toUri
 val schemeCorrectedPath = uri.getScheme match {
-  case null | "local" => new File(path).getCanonicalFile.toURI.toString
+  case null => new File(path).getCanonicalFile.toURI.toString
+  case "local" =>
+logWarning("We do not support add a local file here because file with local scheme is " +
+  "already existed on every node, there is no need to call addFile to add it again. " +
+  "(See more discussion about this in SPARK-24195.)")
--- End diff --

Can we please rephrase to "File with 'local' scheme is not supported to add 
to file server, since it is already available on every node."?


---




[GitHub] spark pull request #21533: [SPARK-24195][Core] Bug fix for local:/ path in S...

2018-06-22 Thread xuanyuanking
Github user xuanyuanking commented on a diff in the pull request:

https://github.com/apache/spark/pull/21533#discussion_r197603726
  
--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -1517,9 +1517,19 @@ class SparkContext(config: SparkConf) extends Logging {
* only supported for Hadoop-supported filesystems.
*/
   def addFile(path: String, recursive: Boolean): Unit = {
-val uri = new Path(path).toUri
+var uri = new Path(path).toUri
+// mark the original path's scheme is local or not, there is no need to add the local file
+// in file server.
+var localFile = false
 val schemeCorrectedPath = uri.getScheme match {
-  case null | "local" => new File(path).getCanonicalFile.toURI.toString
+  case null =>
+new File(path).getCanonicalFile.toURI.toString
+  case "local" =>
+localFile = true
+val tmpPath = new File(uri.getPath).getCanonicalFile.toURI.toString
+// SPARK-24195: Local is not a valid scheme for FileSystem, we should only keep path here.
+uri = new Path(uri.getPath).toUri
+tmpPath
--- End diff --

`This makes me think whether supporting "local" scheme in addFile is 
meaningful or not? Because file with "local" scheme is already existed on every 
node, also it should be aware by the user, so adding it seems not meaningful.`
Yeah, I agree with you. The last change tried to treat a "local" file without 
adding it to the fileServer and to correct its scheme to "file:", but adding a 
local file is itself a no-op. So should we just forbid users from passing a file 
with "local" scheme to addFile?


---




[GitHub] spark pull request #21533: [SPARK-24195][Core] Bug fix for local:/ path in S...

2018-06-21 Thread jerryshao
Github user jerryshao commented on a diff in the pull request:

https://github.com/apache/spark/pull/21533#discussion_r197068331
  
--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -1517,9 +1517,19 @@ class SparkContext(config: SparkConf) extends Logging {
* only supported for Hadoop-supported filesystems.
*/
   def addFile(path: String, recursive: Boolean): Unit = {
-val uri = new Path(path).toUri
+var uri = new Path(path).toUri
+// mark the original path's scheme is local or not, there is no need to add the local file
+// in file server.
+var localFile = false
 val schemeCorrectedPath = uri.getScheme match {
-  case null | "local" => new File(path).getCanonicalFile.toURI.toString
+  case null =>
+new File(path).getCanonicalFile.toURI.toString
+  case "local" =>
+localFile = true
+val tmpPath = new File(uri.getPath).getCanonicalFile.toURI.toString
+// SPARK-24195: Local is not a valid scheme for FileSystem, we should only keep path here.
+uri = new Path(uri.getPath).toUri
+tmpPath
--- End diff --

I think the change here makes a file with "local" scheme a no-op. This makes 
me think whether supporting "local" scheme in `addFile` is meaningful or not, 
because a file with "local" scheme already exists on every node and the user 
should be aware of it, so adding it seems not meaningful.

By looking at the similar method `addJar`, a "local" jar is properly handled 
there without being added to the fileServer, and is properly converted to the 
right scheme used by the classloader.


---




[GitHub] spark pull request #21533: [SPARK-24195][Core] Bug fix for local:/ path in S...

2018-06-17 Thread xuanyuanking
Github user xuanyuanking commented on a diff in the pull request:

https://github.com/apache/spark/pull/21533#discussion_r195931509
  
--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -1517,9 +1517,12 @@ class SparkContext(config: SparkConf) extends Logging {
* only supported for Hadoop-supported filesystems.
*/
   def addFile(path: String, recursive: Boolean): Unit = {
-val uri = new Path(path).toUri
+var uri = new Path(path).toUri
 val schemeCorrectedPath = uri.getScheme match {
-  case null | "local" => new File(path).getCanonicalFile.toURI.toString
+  case null | "local" =>
+// SPARK-24195: Local is not a valid scheme for FileSystem, we should only keep path here.
+uri = new Path(uri.getPath).toUri
--- End diff --

Ah, I see, thanks. I'll do this in the next commit. Thanks for your patient 
explanation.


---




[GitHub] spark pull request #21533: [SPARK-24195][Core] Bug fix for local:/ path in S...

2018-06-15 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21533#discussion_r195660767
  
--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -1517,9 +1517,12 @@ class SparkContext(config: SparkConf) extends Logging {
* only supported for Hadoop-supported filesystems.
*/
   def addFile(path: String, recursive: Boolean): Unit = {
-val uri = new Path(path).toUri
+var uri = new Path(path).toUri
 val schemeCorrectedPath = uri.getScheme match {
-  case null | "local" => new File(path).getCanonicalFile.toURI.toString
+  case null | "local" =>
+// SPARK-24195: Local is not a valid scheme for FileSystem, we should only keep path here.
+uri = new Path(uri.getPath).toUri
--- End diff --

I mean, at least we can do:

```
val a = new File(uri.getPath).getCanonicalFile.toURI.toString
uri = new Path(uri.getPath).toUri
a
```

`new Path(uri.getPath).toUri` for trimming the scheme looks not quite clean 
though. It's a-okay at least to me.



---




[GitHub] spark pull request #21533: [SPARK-24195][Core] Bug fix for local:/ path in S...

2018-06-15 Thread xuanyuanking
Github user xuanyuanking commented on a diff in the pull request:

https://github.com/apache/spark/pull/21533#discussion_r195655196
  
--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -1517,9 +1517,12 @@ class SparkContext(config: SparkConf) extends Logging {
* only supported for Hadoop-supported filesystems.
*/
   def addFile(path: String, recursive: Boolean): Unit = {
-val uri = new Path(path).toUri
+var uri = new Path(path).toUri
 val schemeCorrectedPath = uri.getScheme match {
-  case null | "local" => new File(path).getCanonicalFile.toURI.toString
+  case null | "local" =>
+// SPARK-24195: Local is not a valid scheme for FileSystem, we should only keep path here.
+uri = new Path(uri.getPath).toUri
--- End diff --

@HyukjinKwon @jiangxb1987 
Thanks for your explanation, I think I see what you mean by `getPath` not 
including the scheme. Actually the purpose of this code, `uri = new 
Path(uri.getPath).toUri`, is to reassign the var used at +1520; we don't want 
the uri to include the local scheme.
```
Can't we just do new File(uri.getPath).getCanonicalFile.toURI.toString 
without this line?
```
We can't because, as I explained above, if we don't do `uri = new 
Path(uri.getPath).toUri`, we will get an exception like the one below:
```
No FileSystem for scheme: local
java.io.IOException: No FileSystem for scheme: local
at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2586)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2593)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:91)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2632)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2614)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:370)
at org.apache.spark.util.Utils$.getHadoopFileSystem(Utils.scala:1830)
at org.apache.spark.util.Utils$.doFetchFile(Utils.scala:690)
at org.apache.spark.util.Utils$.fetchFile(Utils.scala:486)
at org.apache.spark.SparkContext.addFile(SparkContext.scala:1557)
```
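The scheme-stripping idea under discussion can be sketched with plain `java.net.URI` from the JDK (Hadoop's `Path` is not needed to see the effect); the file path is hypothetical:

```java
import java.net.URI;

public class StripLocalScheme {
    public static void main(String[] args) throws Exception {
        // A "local:" URI as it might be passed to addFile (hypothetical path).
        URI original = new URI("local:///home/user/demo/logger.config");
        System.out.println(original.getScheme());  // local

        // Rebuilding the URI from the path component alone drops the scheme,
        // so a later FileSystem lookup never sees the invalid "local" scheme.
        URI stripped = new URI(null, null, original.getPath(), null);
        System.out.println(stripped.getScheme());  // null
        System.out.println(stripped.getPath());    // /home/user/demo/logger.config
    }
}
```

This mirrors what `uri = new Path(uri.getPath).toUri` does in the patch, just expressed with the JDK class.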


---




[GitHub] spark pull request #21533: [SPARK-24195][Core] Bug fix for local:/ path in S...

2018-06-14 Thread jiangxb1987
Github user jiangxb1987 commented on a diff in the pull request:

https://github.com/apache/spark/pull/21533#discussion_r195310633
  
--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -1517,9 +1517,12 @@ class SparkContext(config: SparkConf) extends Logging {
* only supported for Hadoop-supported filesystems.
*/
   def addFile(path: String, recursive: Boolean): Unit = {
-val uri = new Path(path).toUri
+var uri = new Path(path).toUri
 val schemeCorrectedPath = uri.getScheme match {
-  case null | "local" => new File(path).getCanonicalFile.toURI.toString
+  case null | "local" =>
+// SPARK-24195: Local is not a valid scheme for FileSystem, we should only keep path here.
+uri = new Path(uri.getPath).toUri
--- End diff --

Yea we can simplify this.


---




[GitHub] spark pull request #21533: [SPARK-24195][Core] Bug fix for local:/ path in S...

2018-06-13 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21533#discussion_r195143756
  
--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -1517,9 +1517,12 @@ class SparkContext(config: SparkConf) extends Logging {
* only supported for Hadoop-supported filesystems.
*/
   def addFile(path: String, recursive: Boolean): Unit = {
-val uri = new Path(path).toUri
+var uri = new Path(path).toUri
 val schemeCorrectedPath = uri.getScheme match {
-  case null | "local" => new File(path).getCanonicalFile.toURI.toString
+  case null | "local" =>
+// SPARK-24195: Local is not a valid scheme for FileSystem, we should only keep path here.
+uri = new Path(uri.getPath).toUri
--- End diff --

I mean, `getPath` doesn't include the scheme:

```
scala> new Path("local://a/b/c")
res0: org.apache.hadoop.fs.Path = local://a/b/c

scala> new Path("local://a/b/c").toUri
res1: java.net.URI = local://a/b/c

scala> new Path("local://a/b/c").toUri.getScheme
res2: String = local

scala> new Path("local://a/b/c").toUri.getPath
res3: String = /b/c
```

why should we do this again?

```
scala> new Path(new Path("local://a/b/c").toUri.getPath).toUri.getPath
res4: String = /b/c
```
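The same parsing can be reproduced with plain `java.net.URI`, which, like Hadoop's `Path` in the snippet above, treats `a` as the authority rather than part of the path; a small sketch, not Spark code:

```java
import java.net.URI;

public class LocalUriParsing {
    public static void main(String[] args) throws Exception {
        URI u = new URI("local://a/b/c");
        System.out.println(u.getScheme());     // local
        System.out.println(u.getAuthority());  // a
        System.out.println(u.getPath());       // /b/c

        // Re-parsing the bare path is idempotent: the scheme and authority
        // are gone and the path is unchanged, matching res4 above.
        System.out.println(new URI(u.getPath()).getPath());  // /b/c
    }
}
```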



---




[GitHub] spark pull request #21533: [SPARK-24195][Core] Bug fix for local:/ path in S...

2018-06-13 Thread xuanyuanking
Github user xuanyuanking commented on a diff in the pull request:

https://github.com/apache/spark/pull/21533#discussion_r195134460
  
--- Diff: core/src/test/scala/org/apache/spark/SparkContextSuite.scala ---
@@ -116,49 +116,52 @@ class SparkContextSuite extends SparkFunSuite with LocalSparkContext with Eventu
   test("basic case for addFile and listFiles") {
 val dir = Utils.createTempDir()
 
+// file and absolute path for normal path
 val file1 = File.createTempFile("someprefix1", "somesuffix1", dir)
 val absolutePath1 = file1.getAbsolutePath
 
+// file and absolute path for relative path
 val file2 = File.createTempFile("someprefix2", "somesuffix2", dir)
 val relativePath = file2.getParent + "/../" + file2.getParentFile.getName + "/" + file2.getName
 val absolutePath2 = file2.getAbsolutePath
 
+// file and absolute path for path with local scheme
+val file3 = File.createTempFile("someprefix3", "somesuffix3", dir)
+val localPath = "local://" + file3.getParent + "/../" + file3.getParentFile.getName +
+  "/" + file3.getName
+val absolutePath3 = file3.getAbsolutePath
+
 try {
   Files.write("somewords1", file1, StandardCharsets.UTF_8)
   Files.write("somewords2", file2, StandardCharsets.UTF_8)
-  val length1 = file1.length()
-  val length2 = file2.length()
+  Files.write("somewords3", file3, StandardCharsets.UTF_8)
 
-  sc = new SparkContext(new SparkConf().setAppName("test").setMaster("local"))
-  sc.addFile(file1.getAbsolutePath)
-  sc.addFile(relativePath)
-  sc.parallelize(Array(1), 1).map(x => {
-val gotten1 = new File(SparkFiles.get(file1.getName))
-val gotten2 = new File(SparkFiles.get(file2.getName))
-if (!gotten1.exists()) {
+  def checkGottenFile(file: File, absolutePath: String): Unit = {
+val length = file.length()
+val gotten = new File(SparkFiles.get(file.getName))
+if (!gotten.exists()) {
   throw new SparkException("file doesn't exist : " + absolutePath1)
 }
-if (!gotten2.exists()) {
-  throw new SparkException("file doesn't exist : " + absolutePath2)
-}
 
-if (length1 != gotten1.length()) {
+if (file.length() != gotten.length()) {
   throw new SparkException(
-s"file has different length $length1 than added file ${gotten1.length()} : " +
+s"file has different length $length than added file ${gotten.length()} : " +
   absolutePath1)
 }
-if (length2 != gotten2.length()) {
-  throw new SparkException(
-s"file has different length $length2 than added file ${gotten2.length()} : " +
-  absolutePath2)
-}
 
-if (absolutePath1 == gotten1.getAbsolutePath) {
+if (absolutePath == gotten.getAbsolutePath) {
   throw new SparkException("file should have been copied :" + absolutePath1)
 }
-if (absolutePath2 == gotten2.getAbsolutePath) {
-  throw new SparkException("file should have been copied : " + absolutePath2)
-}
--- End diff --

Actually I keep all the existing tests and just do cleanup to reduce 
duplicated code by adding a function checkGottenFile in 
https://github.com/apache/spark/pull/21533/files/f922fd8c995164cada4a8b72e92c369a827def16#diff-8d5858d578a2dda1a2edb0d8cefa4f24R139.
If you think it's unnecessary, I'll just change it back.


---




[GitHub] spark pull request #21533: [SPARK-24195][Core] Bug fix for local:/ path in S...

2018-06-13 Thread xuanyuanking
Github user xuanyuanking commented on a diff in the pull request:

https://github.com/apache/spark/pull/21533#discussion_r195133122
  
--- Diff: core/src/test/scala/org/apache/spark/SparkContextSuite.scala ---
@@ -116,49 +116,52 @@ class SparkContextSuite extends SparkFunSuite with LocalSparkContext with Eventu
   test("basic case for addFile and listFiles") {
 val dir = Utils.createTempDir()
 
+// file and absolute path for normal path
 val file1 = File.createTempFile("someprefix1", "somesuffix1", dir)
 val absolutePath1 = file1.getAbsolutePath
 
+// file and absolute path for relative path
 val file2 = File.createTempFile("someprefix2", "somesuffix2", dir)
 val relativePath = file2.getParent + "/../" + file2.getParentFile.getName + "/" + file2.getName
 val absolutePath2 = file2.getAbsolutePath
 
+// file and absolute path for path with local scheme
+val file3 = File.createTempFile("someprefix3", "somesuffix3", dir)
+val localPath = "local://" + file3.getParent + "/../" + file3.getParentFile.getName +
+  "/" + file3.getName
+val absolutePath3 = file3.getAbsolutePath
+
 try {
   Files.write("somewords1", file1, StandardCharsets.UTF_8)
   Files.write("somewords2", file2, StandardCharsets.UTF_8)
-  val length1 = file1.length()
-  val length2 = file2.length()
+  Files.write("somewords3", file3, StandardCharsets.UTF_8)
 
-  sc = new SparkContext(new SparkConf().setAppName("test").setMaster("local"))
-  sc.addFile(file1.getAbsolutePath)
-  sc.addFile(relativePath)
-  sc.parallelize(Array(1), 1).map(x => {
-val gotten1 = new File(SparkFiles.get(file1.getName))
-val gotten2 = new File(SparkFiles.get(file2.getName))
-if (!gotten1.exists()) {
+  def checkGottenFile(file: File, absolutePath: String): Unit = {
+val length = file.length()
+val gotten = new File(SparkFiles.get(file.getName))
+if (!gotten.exists()) {
   throw new SparkException("file doesn't exist : " + absolutePath1)
 }
-if (!gotten2.exists()) {
-  throw new SparkException("file doesn't exist : " + absolutePath2)
-}
 
-if (length1 != gotten1.length()) {
+if (file.length() != gotten.length()) {
   throw new SparkException(
-s"file has different length $length1 than added file ${gotten1.length()} : " +
+s"file has different length $length than added file ${gotten.length()} : " +
   absolutePath1)
 }
-if (length2 != gotten2.length()) {
-  throw new SparkException(
-s"file has different length $length2 than added file ${gotten2.length()} : " +
-  absolutePath2)
-}
 
-if (absolutePath1 == gotten1.getAbsolutePath) {
+if (absolutePath == gotten.getAbsolutePath) {
   throw new SparkException("file should have been copied :" + absolutePath1)
 }
-if (absolutePath2 == gotten2.getAbsolutePath) {
-  throw new SparkException("file should have been copied : " + absolutePath2)
-}
+  }
+
+  sc = new SparkContext(new SparkConf().setAppName("test").setMaster("local"))
+  sc.addFile(file1.getAbsolutePath)
+  sc.addFile(relativePath)
+  sc.addFile(localPath)
+  sc.parallelize(Array(1), 1).map(x => {
--- End diff --

Got it, I'll fix it in the next commit.


---




[GitHub] spark pull request #21533: [SPARK-24195][Core] Bug fix for local:/ path in S...

2018-06-13 Thread xuanyuanking
Github user xuanyuanking commented on a diff in the pull request:

https://github.com/apache/spark/pull/21533#discussion_r195132870
  
--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -1517,9 +1517,12 @@ class SparkContext(config: SparkConf) extends Logging {
* only supported for Hadoop-supported filesystems.
*/
   def addFile(path: String, recursive: Boolean): Unit = {
-val uri = new Path(path).toUri
+var uri = new Path(path).toUri
 val schemeCorrectedPath = uri.getScheme match {
-  case null | "local" => new File(path).getCanonicalFile.toURI.toString
+  case null | "local" =>
+// SPARK-24195: Local is not a valid scheme for FileSystem, we should only keep path here.
+uri = new Path(uri.getPath).toUri
--- End diff --

Yes, just as @felixcheung said, this is because we will use uri in 
https://github.com/apache/spark/pull/21533/files/f922fd8c995164cada4a8b72e92c369a827def16#diff-364713d7776956cb8b0a771e9b62f82dR1557;
if the uri has the local scheme, we'll get an exception because local is not a 
valid scheme for FileSystem.


---




[GitHub] spark pull request #21533: [SPARK-24195][Core] Bug fix for local:/ path in S...

2018-06-13 Thread xuanyuanking
Github user xuanyuanking commented on a diff in the pull request:

https://github.com/apache/spark/pull/21533#discussion_r195133036
  
--- Diff: core/src/test/scala/org/apache/spark/SparkContextSuite.scala ---
@@ -116,49 +116,52 @@ class SparkContextSuite extends SparkFunSuite with LocalSparkContext with Eventu
   test("basic case for addFile and listFiles") {
 val dir = Utils.createTempDir()
 
+// file and absolute path for normal path
 val file1 = File.createTempFile("someprefix1", "somesuffix1", dir)
 val absolutePath1 = file1.getAbsolutePath
 
+// file and absolute path for relative path
 val file2 = File.createTempFile("someprefix2", "somesuffix2", dir)
 val relativePath = file2.getParent + "/../" + file2.getParentFile.getName + "/" + file2.getName
 val absolutePath2 = file2.getAbsolutePath
 
+// file and absolute path for path with local scheme
+val file3 = File.createTempFile("someprefix3", "somesuffix3", dir)
+val localPath = "local://" + file3.getParent + "/../" + file3.getParentFile.getName +
--- End diff --

Got it, thanks.


---




[GitHub] spark pull request #21533: [SPARK-24195][Core] Bug fix for local:/ path in S...

2018-06-12 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/21533#discussion_r194953025
  
--- Diff: core/src/test/scala/org/apache/spark/SparkContextSuite.scala ---
@@ -116,49 +116,52 @@ class SparkContextSuite extends SparkFunSuite with LocalSparkContext with Eventu
   test("basic case for addFile and listFiles") {
 val dir = Utils.createTempDir()
 
+// file and absolute path for normal path
 val file1 = File.createTempFile("someprefix1", "somesuffix1", dir)
 val absolutePath1 = file1.getAbsolutePath
 
+// file and absolute path for relative path
 val file2 = File.createTempFile("someprefix2", "somesuffix2", dir)
 val relativePath = file2.getParent + "/../" + file2.getParentFile.getName + "/" + file2.getName
 val absolutePath2 = file2.getAbsolutePath
 
+// file and absolute path for path with local scheme
+val file3 = File.createTempFile("someprefix3", "somesuffix3", dir)
+val localPath = "local://" + file3.getParent + "/../" + file3.getParentFile.getName +
+  "/" + file3.getName
+val absolutePath3 = file3.getAbsolutePath
+
 try {
   Files.write("somewords1", file1, StandardCharsets.UTF_8)
   Files.write("somewords2", file2, StandardCharsets.UTF_8)
-  val length1 = file1.length()
-  val length2 = file2.length()
+  Files.write("somewords3", file3, StandardCharsets.UTF_8)
 
-  sc = new SparkContext(new SparkConf().setAppName("test").setMaster("local"))
-  sc.addFile(file1.getAbsolutePath)
-  sc.addFile(relativePath)
-  sc.parallelize(Array(1), 1).map(x => {
-val gotten1 = new File(SparkFiles.get(file1.getName))
-val gotten2 = new File(SparkFiles.get(file2.getName))
-if (!gotten1.exists()) {
+  def checkGottenFile(file: File, absolutePath: String): Unit = {
+val length = file.length()
+val gotten = new File(SparkFiles.get(file.getName))
+if (!gotten.exists()) {
   throw new SparkException("file doesn't exist : " + absolutePath1)
 }
-if (!gotten2.exists()) {
-  throw new SparkException("file doesn't exist : " + absolutePath2)
-}
 
-if (length1 != gotten1.length()) {
+if (file.length() != gotten.length()) {
   throw new SparkException(
-s"file has different length $length1 than added file ${gotten1.length()} : " +
+s"file has different length $length than added file ${gotten.length()} : " +
   absolutePath1)
 }
-if (length2 != gotten2.length()) {
-  throw new SparkException(
-s"file has different length $length2 than added file ${gotten2.length()} : " +
-  absolutePath2)
-}
 
-if (absolutePath1 == gotten1.getAbsolutePath) {
+if (absolutePath == gotten.getAbsolutePath) {
   throw new SparkException("file should have been copied :" + absolutePath1)
 }
-if (absolutePath2 == gotten2.getAbsolutePath) {
-  throw new SparkException("file should have been copied : " + absolutePath2)
-}
--- End diff --

can we not change the existing test?


---




[GitHub] spark pull request #21533: [SPARK-24195][Core] Bug fix for local:/ path in S...

2018-06-12 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/21533#discussion_r194953034
  
--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -1517,9 +1517,12 @@ class SparkContext(config: SparkConf) extends Logging {
* only supported for Hadoop-supported filesystems.
*/
   def addFile(path: String, recursive: Boolean): Unit = {
-val uri = new Path(path).toUri
+var uri = new Path(path).toUri
 val schemeCorrectedPath = uri.getScheme match {
-  case null | "local" => new File(path).getCanonicalFile.toURI.toString
+  case null | "local" =>
+// SPARK-24195: Local is not a valid scheme for FileSystem, we should only keep path here.
+uri = new Path(uri.getPath).toUri
--- End diff --

it changes `uri`, which is referenced again below.


---




[GitHub] spark pull request #21533: [SPARK-24195][Core] Bug fix for local:/ path in S...

2018-06-12 Thread jerryshao
Github user jerryshao commented on a diff in the pull request:

https://github.com/apache/spark/pull/21533#discussion_r194927085
  
--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -1517,9 +1517,12 @@ class SparkContext(config: SparkConf) extends Logging {
* only supported for Hadoop-supported filesystems.
*/
   def addFile(path: String, recursive: Boolean): Unit = {
-val uri = new Path(path).toUri
+var uri = new Path(path).toUri
 val schemeCorrectedPath = uri.getScheme match {
-  case null | "local" => new File(path).getCanonicalFile.toURI.toString
+  case null | "local" =>
+// SPARK-24195: Local is not a valid scheme for FileSystem, we should only keep path here.
+uri = new Path(uri.getPath).toUri
--- End diff --

Yes, same question. The above line seems not useful.


---




[GitHub] spark pull request #21533: [SPARK-24195][Core] Bug fix for local:/ path in S...

2018-06-12 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21533#discussion_r194812872
  
--- Diff: core/src/test/scala/org/apache/spark/SparkContextSuite.scala ---
@@ -116,49 +116,52 @@ class SparkContextSuite extends SparkFunSuite with LocalSparkContext with Eventu
   test("basic case for addFile and listFiles") {
 val dir = Utils.createTempDir()
 
+// file and absolute path for normal path
 val file1 = File.createTempFile("someprefix1", "somesuffix1", dir)
 val absolutePath1 = file1.getAbsolutePath
 
+// file and absolute path for relative path
 val file2 = File.createTempFile("someprefix2", "somesuffix2", dir)
 val relativePath = file2.getParent + "/../" + file2.getParentFile.getName + "/" + file2.getName
 val absolutePath2 = file2.getAbsolutePath
 
+// file and absolute path for path with local scheme
+val file3 = File.createTempFile("someprefix3", "somesuffix3", dir)
+val localPath = "local://" + file3.getParent + "/../" + file3.getParentFile.getName +
+  "/" + file3.getName
+val absolutePath3 = file3.getAbsolutePath
+
 try {
   Files.write("somewords1", file1, StandardCharsets.UTF_8)
   Files.write("somewords2", file2, StandardCharsets.UTF_8)
-  val length1 = file1.length()
-  val length2 = file2.length()
+  Files.write("somewords3", file3, StandardCharsets.UTF_8)
 
-  sc = new SparkContext(new SparkConf().setAppName("test").setMaster("local"))
-  sc.addFile(file1.getAbsolutePath)
-  sc.addFile(relativePath)
-  sc.parallelize(Array(1), 1).map(x => {
-val gotten1 = new File(SparkFiles.get(file1.getName))
-val gotten2 = new File(SparkFiles.get(file2.getName))
-if (!gotten1.exists()) {
+  def checkGottenFile(file: File, absolutePath: String): Unit = {
+val length = file.length()
+val gotten = new File(SparkFiles.get(file.getName))
+if (!gotten.exists()) {
   throw new SparkException("file doesn't exist : " + absolutePath1)
 }
-if (!gotten2.exists()) {
-  throw new SparkException("file doesn't exist : " + absolutePath2)
-}
 
-if (length1 != gotten1.length()) {
+if (file.length() != gotten.length()) {
   throw new SparkException(
-s"file has different length $length1 than added file ${gotten1.length()} : " +
+s"file has different length $length than added file ${gotten.length()} : " +
   absolutePath1)
 }
-if (length2 != gotten2.length()) {
-  throw new SparkException(
-s"file has different length $length2 than added file ${gotten2.length()} : " +
-  absolutePath2)
-}
 
-if (absolutePath1 == gotten1.getAbsolutePath) {
+if (absolutePath == gotten.getAbsolutePath) {
   throw new SparkException("file should have been copied :" + absolutePath1)
 }
-if (absolutePath2 == gotten2.getAbsolutePath) {
-  throw new SparkException("file should have been copied : " + absolutePath2)
-}
+  }
+
+  sc = new SparkContext(new SparkConf().setAppName("test").setMaster("local"))
+  sc.addFile(file1.getAbsolutePath)
+  sc.addFile(relativePath)
+  sc.addFile(localPath)
+  sc.parallelize(Array(1), 1).map(x => {
--- End diff --

nit:

```
map { x =>
  ...
}
```


---




[GitHub] spark pull request #21533: [SPARK-24195][Core] Bug fix for local:/ path in S...

2018-06-12 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21533#discussion_r194812741
  
--- Diff: core/src/test/scala/org/apache/spark/SparkContextSuite.scala ---
@@ -116,49 +116,52 @@ class SparkContextSuite extends SparkFunSuite with LocalSparkContext with Eventu
   test("basic case for addFile and listFiles") {
 val dir = Utils.createTempDir()
 
+// file and absolute path for normal path
 val file1 = File.createTempFile("someprefix1", "somesuffix1", dir)
 val absolutePath1 = file1.getAbsolutePath
 
+// file and absolute path for relative path
 val file2 = File.createTempFile("someprefix2", "somesuffix2", dir)
 val relativePath = file2.getParent + "/../" + file2.getParentFile.getName + "/" + file2.getName
 val absolutePath2 = file2.getAbsolutePath
 
+// file and absolute path for path with local scheme
+val file3 = File.createTempFile("someprefix3", "somesuffix3", dir)
+val localPath = "local://" + file3.getParent + "/../" + file3.getParentFile.getName +
--- End diff --

Let's use string interpolation.


---




[GitHub] spark pull request #21533: [SPARK-24195][Core] Bug fix for local:/ path in S...

2018-06-12 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21533#discussion_r194812492
  
--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -1517,9 +1517,12 @@ class SparkContext(config: SparkConf) extends Logging {
* only supported for Hadoop-supported filesystems.
*/
   def addFile(path: String, recursive: Boolean): Unit = {
-val uri = new Path(path).toUri
+var uri = new Path(path).toUri
 val schemeCorrectedPath = uri.getScheme match {
-  case null | "local" => new File(path).getCanonicalFile.toURI.toString
+  case null | "local" =>
+// SPARK-24195: Local is not a valid scheme for FileSystem, we should only keep path here.
+uri = new Path(uri.getPath).toUri
--- End diff --

Why is this needed? Can't we just do `new File(uri.getPath).getCanonicalFile.toURI.toString` without this line?


---




[GitHub] spark pull request #21533: [SPARK-24195][Core] Bug fix for local:/ path in S...

2018-06-11 Thread xuanyuanking
GitHub user xuanyuanking opened a pull request:

https://github.com/apache/spark/pull/21533

[SPARK-24195][Core] Bug fix for local:/ path in SparkContext.addFile

## What changes were proposed in this pull request?

In the changes in 
[SPARK-6300](https://issues.apache.org/jira/browse/SPARK-6300), essentially 
schemeCorrectedPath was changed to
```
new File(path).getCanonicalFile.toURI.toString
```
This is a problem when the path uses the local: scheme, as `java.io.File` 
doesn't handle it.

e.g.

```
new File("local:///home/user/demo/logger.config").getCanonicalFile.toURI.toString
res1: String = file:/user/anotheruser/local:/home/user/demo/logger.config
```
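A runnable Java sketch of this failure mode (the prefix before `local:` depends on the current working directory, so no exact output is claimed):

```java
import java.io.File;

public class LocalSchemeBug {
    public static void main(String[] args) throws Exception {
        // java.io.File treats the "local:" scheme as an ordinary path
        // segment, so the canonical URI embeds it instead of stripping it.
        File f = new File("local:///home/user/demo/logger.config");
        String uri = f.getCanonicalFile().toURI().toString();
        System.out.println(uri);
        // e.g. file:/current/working/dir/local:/home/user/demo/logger.config
    }
}
```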

## How was this patch tested?

Add test in `SparkContextSuite`.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/xuanyuanking/spark SPARK-24195

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/21533.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #21533


commit f922fd8c995164cada4a8b72e92c369a827def16
Author: Yuanjian Li 
Date:   2018-06-12T01:51:44Z

bug fix for local:/ path in sc.addFile




---
