Hi everyone, I tested repartitioning a DataFrame by columns and the result looks wrong. I am using Spark 1.6.1 and loading data from Cassandra. When I repartition by two fields (date, network_id), I get 200 partitions; when I repartition by one field (date), I also get 200 partitions. But my data covers 90 days, so I would expect repartitioning by date to produce 90 partitions.

    val daily = sql
      .read
      .format("org.apache.spark.sql.cassandra")
      .options(Map("table" -> dailyDetailTableName, "keyspace" -> reportSpace))
      .load()
      .repartition(col("date"))
The partition count doesn't change no matter which columns I pass to repartition. Does anyone have the same problem? Thanks in advance.
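For what it's worth, here is a minimal sketch of the two repartition overloads in the Spark 1.6 DataFrame API. The column-only form shuffles into spark.sql.shuffle.partitions buckets (200 by default), which would explain the count I'm seeing; the two-argument form takes an explicit target count. This assumes `daily` is the DataFrame from the snippet above:

```scala
import org.apache.spark.sql.functions.col

// Column-only repartition: hash-partitions on date, but the number of
// output partitions comes from spark.sql.shuffle.partitions (default 200),
// not from the number of distinct date values.
val byDate = daily.repartition(col("date"))

// Overload with an explicit partition count: asks for 90 partitions,
// one per day of data in this case.
val byDate90 = daily.repartition(90, col("date"))
```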