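You just need to assign it to a new variable. repartition does not change avroFile in place; it returns a new DataFrame, so the result has to be captured and that is the one you save:

    val avroFile = sqlContext.read.format("com.databricks.spark.avro").load(inFile)
    val repart = avroFile.repartition(10)
    repart.save(outFile, "parquet")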
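For completeness, here is a minimal sketch of the same fix as a standalone job, assuming Spark 1.4 or later with the spark-avro package on the classpath. The object name AvroToParquet and the use of command-line arguments for inFile and outFile are hypothetical, chosen only for illustration. It also prints the partition counts so you can check that the repartition took effect before writing.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Hypothetical wrapper object; only the three lines in the middle matter.
object AvroToParquet {
  def main(args: Array[String]): Unit = {
    require(args.length >= 2, "usage: AvroToParquet <inFile> <outFile>")
    val inFile = args(0)   // assumed: path to the Avro input
    val outFile = args(1)  // assumed: directory for the Parquet output

    val sc = new SparkContext(new SparkConf().setAppName("AvroToParquet"))
    val sqlContext = new SQLContext(sc)

    val avroFile = sqlContext.read.format("com.databricks.spark.avro").load(inFile)

    // DataFrames are immutable: repartition returns a new DataFrame
    // and leaves avroFile untouched, so keep the returned value.
    val repart = avroFile.repartition(10)

    // Sanity check on the number of partitions before and after.
    println(s"partitions before: ${avroFile.rdd.partitions.length}")
    println(s"partitions after:  ${repart.rdd.partitions.length}")

    // DataFrameWriter form (Spark 1.4+) of save(outFile, "parquet").
    repart.write.parquet(outFile)

    sc.stop()
  }
}
```

If you don't need the intermediate variable, chaining the calls works just as well: avroFile.repartition(10).write.parquet(outFile). Either way, the output directory should then hold one part file per partition rather than a single one.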

From: Steve Annessa
Date: Wednesday, September 16, 2015 at 2:08 PM
To: user@spark.apache.org
Subject: DataFrame repartition not repartitioning

Hello,

I'm trying to load in an Avro file and write it out as Parquet. I would like to have enough partitions to properly parallelize on. When I do the simple load and save I get 1 partition out. I thought I would be able to use repartition like the following:

    val avroFile = sqlContext.read.format("com.databricks.spark.avro").load(inFile)
    avroFile.repartition(10)
    avroFile.save(outFile, "parquet")

However, the saved file is still a single partition in the directory. What am I missing?

Thanks,

-- Steve