spark ml Dataframe vs Labeled Point RDD Mllib speed

2016-01-18 Thread jarias
...from a text file. I'm sorry if I'm just mixing up some concepts from the documentation, but after intensive experimentation I don't really see a clear strategy for using these different elements. Any thoughts would be really appreciated :) Cheers, jarias

saveAsTextFile creates an empty folder in HDFS

2015-10-02 Thread jarias
dist.saveAsTextFile("hdfs://node1.i3a.info/user/jarias/test/")
15/10/02 10:19:22 INFO FileOutputCommitter: File Output Committer Algorithm version is 1
15/10/02 10:19:22 INFO SparkContext: Starting job: saveAsTextFile at :27
15/10/02 10:19:22 INFO DAGScheduler: Got job 3 (saveAsT