DataFrame repartition not repartitioning

Steve Annessa Wed, 16 Sep 2015 11:08:55 -0700

Hello,

I'm trying to load an Avro file and write it out as Parquet. I would
like to have enough partitions to parallelize properly. When I do a
simple load and save, I get one partition out. I thought I would be able
to use repartition like the following:


val avroFile =
  sqlContext.read.format("com.databricks.spark.avro").load(inFile)
avroFile.repartition(10)
avroFile.save(outFile, "parquet")

However, the saved file is still a single partition in the directory.

What am I missing?

Thanks,

-- Steve
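
A note on the snippet in this message: Spark DataFrames are immutable, so repartition(10) returns a new DataFrame and leaves avroFile unchanged. The snippet discards that return value, which may explain the single output partition. A minimal sketch of one possible fix follows. It assumes a spark-shell session where sqlContext, inFile, and outFile are already defined as in the message, and Spark 1.4 or later for the write API. The name repartitioned is only illustrative.

```scala
import org.apache.spark.sql.DataFrame

// Assumption: sqlContext, inFile and outFile already exist, as in the original snippet.
val avroFile: DataFrame =
  sqlContext.read.format("com.databricks.spark.avro").load(inFile)

// repartition does not modify avroFile; it returns a new DataFrame,
// so the result has to be kept and used for the write.
val repartitioned: DataFrame = avroFile.repartition(10)

// Write the repartitioned DataFrame, not the original one.
repartitioned.write.format("parquet").save(outFile)
```

Chaining the calls, as in avroFile.repartition(10).write.format("parquet").save(outFile), should behave the same way without the intermediate value.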
