Hi everyone, I tested repartitioning a DataFrame by columns and the result looks wrong. I am using Spark 1.6.1 and loading data from Cassandra. When I repartition by two fields (date, network_id), I get 200 partitions; when I repartition by one field (date), I also get 200 partitions. But my data covers 90 days, so I would expect repartitioning by date to produce 90 partitions.

    val daily = sql
      .read
      .format("org.apache.spark.sql.cassandra")
      .options(Map("table" -> dailyDetailTableName, "keyspace" -> reportSpace))
      .load()
      .repartition(col("date"))
The partition count doesn't change no matter which columns I pass to repartition. Does anyone have the same problem? Thanks in advance.
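For what it's worth, here is a minimal sketch of the two repartition overloads in the Spark 1.6 DataFrame API. The column-only form shuffles into spark.sql.shuffle.partitions buckets (200 by default), which would explain the count I'm seeing; the two-argument form takes an explicit target count. This assumes `daily` is the DataFrame from the snippet above:

```scala
import org.apache.spark.sql.functions.col

// Column-only repartition: hash-partitions on date, but the number of
// output partitions comes from spark.sql.shuffle.partitions (default 200),
// not from the number of distinct date values.
val byDate = daily.repartition(col("date"))

// Overload with an explicit partition count: asks for 90 partitions,
// one per day of data in this case.
val byDate90 = daily.repartition(90, col("date"))
```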