[ https://issues.apache.org/jira/browse/SPARK-15729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15312541#comment-15312541 ]
Marco Capuccini edited comment on SPARK-15729 at 6/2/16 4:05 PM:
-----------------------------------------------------------------

[~sowen] I see! Thanks for the clarification. I'll open a PR; I think many people assume that a distributed FS is optional when using Spark in a distributed environment. I didn't mention that when I was running my applications on 1.4.0 the data was written to NFS, and maybe that's why it worked fine.

I have another question. Let's assume I have copied the input data to each node, at exactly the same path, and I read it using sc.textFile. Under that assumption, let's say I perform some analysis on the dataset, reducing it to something that can be collected on the driver node. If I then collect the reduced dataset and save it only on the machine where the driver is running, using the Scala I/O primitives, would this work, or could the results be corrupted?

> saveAsTextFile not working on regular filesystem
> ------------------------------------------------
>
>                 Key: SPARK-15729
>                 URL: https://issues.apache.org/jira/browse/SPARK-15729
>             Project: Spark
>          Issue Type: Bug
>    Affects Versions: 1.6.1
>            Reporter: Marco Capuccini
>            Priority: Blocker
>
> I set up a standalone Spark cluster. I don't need HDFS, so I just want to
> save the files on the regular file system in a distributed manner. For
> testing purposes, I opened a Spark shell and ran the following code:
>
> sc.parallelize(1 to 100).saveAsTextFile("file:///mnt/volume/test.txt")
>
> I got no error from this, but if I inspect the /mnt/volume/test.txt
> folder on each node, this is what I see:
>
> On the master (where I launched the Spark shell):
> /mnt/volume/test.txt/_SUCCESS
>
> On the workers:
> /mnt/volume/test.txt/_temporary
>
> It seems like some failure occurred, but I didn't get any error. Is this a
> bug, or am I missing something?
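For what it's worth, here is a minimal sketch of the pattern the question describes: input replicated to the same local path on every node, a reduction, a collect(), and a plain JVM write on the driver. The path /mnt/volume/input.txt and the word-count reduction are hypothetical stand-ins for "the input data" and "some analysis"; this also assumes the reduced result fits in driver memory.

import java.nio.file.{Files, Paths}
import org.apache.spark.{SparkConf, SparkContext}

object CollectAndSaveOnDriver {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("collect-and-save"))

    // Reading a file:// path works without a shared filesystem only
    // because every node holds an identical copy at this exact path.
    val lines = sc.textFile("file:///mnt/volume/input.txt")

    // Stand-in for "some analysis": reduce the dataset to something small.
    val counts = lines
      .flatMap(_.split("\\s+"))
      .map(word => (word, 1L))
      .reduceByKey(_ + _)

    // collect() ships the reduced data to the driver JVM as a local array.
    val result = counts.collect()

    // From here on this is ordinary local I/O: only the driver's
    // filesystem is touched; no executor ever writes output.
    Files.write(
      Paths.get("/mnt/volume/result.txt"),
      result.map { case (w, c) => s"$w\t$c" }.mkString("\n").getBytes("UTF-8"))

    sc.stop()
  }
}

Since nothing is written from the executors in this sketch, the _temporary-on-workers situation from the original report cannot arise for the output step. If the result is too large to collect, the usual alternative is to point saveAsTextFile at a path on a filesystem that all executors can actually see (NFS, HDFS, S3, and so on).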