Re: DataFrame repartition not repartitioning

Silvio Fiorito Wed, 16 Sep 2015 11:15:08 -0700

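DataFrames are immutable: repartition returns a new DataFrame and leaves the original unchanged. You just need to assign the result to a new variable and save that one: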

val avroFile = sqlContext.read.format("com.databricks.spark.avro").load(inFile)
val repart = avroFile.repartition(10)
repart.write.format("parquet").save(outFile)
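
If you want to confirm the difference before writing anything out, here is a minimal sketch that prints the partition count of both DataFrames. It assumes Spark 1.4.1 or later running in local mode; the object name RepartitionCheck and the sqlContext.range stand-in data are made up for illustration and are not part of the original job, which loads an Avro file instead.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Hypothetical example object, not from the original thread.
object RepartitionCheck {
  def main(args: Array[String]): Unit = {
    // Local two-thread master, used here only for illustration.
    val conf = new SparkConf().setAppName("RepartitionCheck").setMaster("local[2]")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)

    try {
      // Stand-in data; the thread loads an Avro file at this point.
      val df = sqlContext.range(0L, 1000L)

      // The return value is discarded, so df keeps its original partitioning.
      df.repartition(10)
      println(s"df partitions:     ${df.rdd.partitions.length}")

      // Keeping the returned DataFrame is what actually changes the partitioning.
      val repart = df.repartition(10)
      println(s"repart partitions: ${repart.rdd.partitions.length}")
    } finally {
      sc.stop()
    }
  }
}
```

Saving repart rather than df should then produce one part file per partition in the output directory.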

From: Steve Annessa
Date: Wednesday, September 16, 2015 at 2:08 PM
To: user@spark.apache.org
Subject: DataFrame repartition not repartitioning

Hello,

I'm trying to load in an Avro file and write it out as Parquet. I would like to 
have enough partitions to properly parallelize on. When I do the simple load 
and save I get 1 partition out. I thought I would be able to use repartition 
like the following:

val avroFile = sqlContext.read.format("com.databricks.spark.avro").load(inFile)
avroFile.repartition(10)
avroFile.save(outFile, "parquet")

However, the saved file is still a single partition in the directory.

What am I missing?

Thanks,

-- Steve
