[ https://issues.apache.org/jira/browse/SPARK-15729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15312482#comment-15312482 ]
Marco Capuccini edited comment on SPARK-15729 at 6/2/16 3:29 PM:
-----------------------------------------------------------------

[~sowen] I know what I am doing. I know that each of the nodes will have a piece of the resulting dataset. I am sure this is a bug: in version 1.4.0 this works, and the part files are correctly located in /mnt/volume/test.txt on each node. This is what I see under /mnt/volume/test.txt in version 1.4.0:

master: _SUCCESS
worker 1: PART_0001 ... PART_000M
worker 2: PART_000M ...

In Spark 1.6.0, I see something like:

master: _SUCCESS
worker 1: _temporary
worker 2: _temporary

You can run this simple test, saving to the regular file system:

sc.parallelize(1 to 100).saveAsTextFile("file:///mnt/volume/test.txt")
val count = sc.textFile("file:///mnt/volume/test.txt").count
println(count)

The result will be 0 in version 1.6.0. If this is not a bug, I'd like to understand what is going on in version 1.6.x, as in 1.4.0 this works like a charm.
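For what it's worth, the leftover _temporary directories are consistent with a FileOutputCommitter-style two-phase commit running against non-shared local disks: each task writes its part files under _temporary on its own node's disk, while the job-level commit that should promote them into the final directory runs on the driver and only sees the driver's disk. Below is a minimal sketch of that failure mode in plain Python (no Spark; the directory layout, helper names, and commit steps are simplified assumptions for illustration, not Spark's actual committer code):

```python
import os
import shutil
import tempfile

def write_task_output(node_root, part_name):
    """Simulate a task writing its part file under _temporary on its own node."""
    tmp = os.path.join(node_root, "test.txt", "_temporary")
    os.makedirs(tmp, exist_ok=True)
    with open(os.path.join(tmp, part_name), "w") as f:
        f.write("data\n")

def commit_job(node_root):
    """Simulate a job-level commit: promote files from _temporary (visible on
    the committer's own disk only) to the final directory, then mark success."""
    out = os.path.join(node_root, "test.txt")
    tmp = os.path.join(out, "_temporary")
    os.makedirs(out, exist_ok=True)
    if os.path.isdir(tmp):
        for name in os.listdir(tmp):
            shutil.move(os.path.join(tmp, name), os.path.join(out, name))
        shutil.rmtree(tmp)
    open(os.path.join(out, "_SUCCESS"), "w").close()

base = tempfile.mkdtemp()
driver, worker1, worker2 = (os.path.join(base, n) for n in ("driver", "w1", "w2"))

# Tasks run on the workers: part files land on each worker's local disk.
write_task_output(worker1, "part-00000")
write_task_output(worker2, "part-00001")

# The commit runs on the driver, whose local /mnt/volume is a different disk:
# it finds no _temporary of its own, promotes nothing, and writes _SUCCESS.
commit_job(driver)

print(sorted(os.listdir(os.path.join(driver, "test.txt"))))   # ['_SUCCESS']
print(sorted(os.listdir(os.path.join(worker1, "test.txt"))))  # ['_temporary']
```

This reproduces the reported layout exactly: _SUCCESS on the driver, orphaned _temporary on each worker. If this is what is happening, pointing saveAsTextFile at storage all nodes share (HDFS, NFS, S3) would avoid it, since the commit would then see every task's output.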
> saveAsTextFile not working on regular filesystem
> ------------------------------------------------
>
>                 Key: SPARK-15729
>                 URL: https://issues.apache.org/jira/browse/SPARK-15729
>             Project: Spark
>          Issue Type: Bug
>    Affects Versions: 1.6.1
>            Reporter: Marco Capuccini
>            Priority: Blocker
>
> I set up a standalone Spark cluster. I don't need HDFS, so I just want to save the files on the regular file system in a distributed manner. For testing purposes, I opened a Spark shell and ran the following code:
>
> sc.parallelize(1 to 100).saveAsTextFile("file:///mnt/volume/test.txt")
>
> I got no error from this, but if I inspect the /mnt/volume/test.txt folder on each node, this is what I see:
>
> On the master (where I launched the Spark shell): /mnt/volume/test.txt/_SUCCESS
> On the workers: /mnt/volume/test.txt/_temporary
>
> It seems like some failure occurred, but I didn't get any error. Is this a bug, or am I missing something?

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)