Does the DataFrame Spark API write/create a single file instead of a directory as a result of a write operation?

2020-02-21 Thread Kshitij
Hi, there is no DataFrame Spark API that writes/creates a single file instead of a directory as the result of a write operation. Both options below will create a directory containing a randomly named part file: df.coalesce(1).write.csv() and df.write.csv(). Instead of creating a directory with standard files ...
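
A common workaround (a minimal sketch, not an official single-file API; the paths here are illustrative assumptions, and df/spark are taken from the thread's context) is to coalesce to one partition, write to a staging directory, then rename the single part file with the Hadoop FileSystem API:

import org.apache.hadoop.fs.{FileSystem, Path}

val tmpDir    = "/tmp/csv_out"     // hypothetical staging directory
val finalFile = "/tmp/result.csv"  // hypothetical final file name

df.coalesce(1).write.csv(tmpDir)

val fs = FileSystem.get(spark.sparkContext.hadoopConfiguration)
// Locate the single part file and rename it to the desired name
val part = fs.globStatus(new Path(tmpDir + "/part-*"))(0).getPath
fs.rename(part, new Path(finalFile))
fs.delete(new Path(tmpDir), true)

Note that coalesce(1) funnels all data through one task, so this only makes sense for output small enough to fit on a single executor.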

PowerIterationClustering

2020-02-21 Thread Monish R
Hi guys, I am new to MLlib and am trying out PowerIterationClustering as per the example mentioned below: https://github.com/apache/spark/blob/master/examples/src/main/java/org/apache/spark/examples/mllib/JavaPowerIterationClusteringExample.java I am having trouble understanding how the output ...
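
For reference, the mllib PowerIterationClustering output is one assignment per vertex: the vertex id paired with the cluster id it was assigned to. A minimal Scala sketch of the same flow as the linked Java example (the similarity triples below are made-up illustrative values):

import org.apache.spark.mllib.clustering.PowerIterationClustering

// (srcId, dstId, similarity) triples describing the affinity graph
val similarities = spark.sparkContext.parallelize(Seq(
  (0L, 1L, 0.9), (1L, 2L, 0.9), (2L, 3L, 0.1), (3L, 4L, 0.9)
))

val model = new PowerIterationClustering()
  .setK(2)
  .setMaxIterations(10)
  .run(similarities)

// The output is an RDD of Assignment(id, cluster)
model.assignments.collect().foreach { a =>
  println(s"${a.id} -> ${a.cluster}")
}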

Re: Serialization error when using scala kernel with Jupyter

2020-02-21 Thread Apostolos N. Papadopoulos
collect() returns the contents of the RDD back to the driver in a local variable. Where is the local variable? Try: val result = rdd.map(x => x + 1).collect() Regards, Apostolos
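
Spelled out (a minimal sketch, assuming a live spark session in the notebook):

val rdd = spark.sparkContext.parallelize(Seq(1, 2, 4))
val result: Array[Int] = rdd.map(x => x + 1).collect() // now materialized on the driver
result.foreach(println) // prints 2, 3, 5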

Serialization error when using scala kernel with Jupyter

2020-02-21 Thread Nikhil Goyal
Hi all, I am trying to use the almond Scala kernel to run a Spark session on Jupyter. I am using Scala version 2.12.8 and am creating the Spark session with master set to YARN. This is the code: val rdd = spark.sparkContext.parallelize(Seq(1, 2, 4)); rdd.map(x => x + 1).collect() Exception: ...
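
One thing worth checking (an assumption based on almond's documented Spark integration, not something stated in the truncated thread): creating a plain SparkSession in almond can leave REPL-compiled classes invisible to the YARN executors, which surfaces as serialization or class-loading errors in exactly this kind of map/collect. almond ships a NotebookSparkSession wrapper for this; a rough sketch, with a placeholder artifact version:

import $ivy.`sh.almond::almond-spark:0.6.0` // version is a placeholder, match your almond install

import org.apache.spark.sql.NotebookSparkSession

val spark = NotebookSparkSession.builder()
  .master("yarn")
  .getOrCreate()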

Spark RDD output path for data lineage

2020-02-21 Thread ard3nte
Hi, I am trying to do data lineage, so I need to extract the output path from an RDD job (for example someRDD.saveAsTextFile("/path/")) using a SparkListener. How can I do that?
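
A starting point (a sketch only; as far as the public listener events go, the Hadoop output path of an RDD save is not carried in the events themselves, which is an assumption worth verifying for your Spark version): the listener does see the call site of each job, e.g. "saveAsTextFile at MyJob.scala:42".

import org.apache.spark.scheduler.{SparkListener, SparkListenerJobStart}

// Logs the call site of each submitted job's stages
class LineageListener extends SparkListener {
  override def onJobStart(jobStart: SparkListenerJobStart): Unit = {
    jobStart.stageInfos.foreach { si =>
      println(s"Job ${jobStart.jobId} stage: ${si.name}")
    }
  }
}

spark.sparkContext.addSparkListener(new LineageListener)

Since the path itself is not in those events, a pragmatic alternative is a thin wrapper that records the path at the call site and then delegates to saveAsTextFile.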