Thanks, I'll look into it. Any luck finding a link to that talk?

On Thu, Jun 9, 2016, 12:43 PM Jasleen Kaur <jasleenkaur1...@gmail.com> wrote:
> Try using the DataStax package. There was a great talk at Spark Summit
> about it. It will take care of the boilerplate code so you can focus on
> real business value.
>
> On Wednesday, June 8, 2016, Chanh Le <giaosu...@gmail.com> wrote:
>
>> Hi everyone,
>> I tested partitioning a DataFrame by columns, but the result looks wrong.
>> I am using Spark 1.6.1 to load data from Cassandra.
>> When I repartition by 2 fields (date, network_id), I get 200 partitions.
>> When I repartition by 1 field (date), I also get 200 partitions.
>> But my data covers 90 days, so if we repartition by date it should be
>> 90 partitions.
>>
>> val daily = sql
>>   .read
>>   .format("org.apache.spark.sql.cassandra")
>>   .options(Map("table" -> dailyDetailTableName, "keyspace" -> reportSpace))
>>   .load()
>>   .repartition(col("date"))
>>
>> In other words, the partition count doesn't change no matter which
>> columns I pass to repartition.
>>
>> Does anyone have the same problem?
>>
>> Thanks in advance.
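A likely explanation, for the record: in Spark 1.6, `repartition(partitionExprs: Column*)` hash-partitions the rows on the given columns but always produces `spark.sql.shuffle.partitions` partitions (default 200), regardless of how many distinct key values exist. The overload that also takes a count, e.g. `daily.repartition(90, col("date"))`, sets the number explicitly. The sketch below is a plain-Python illustration of that behavior (not Spark code; the hash function and dates are stand-ins): 90 distinct dates hashed into 200 buckets leave at most 90 buckets non-empty.

```python
# Illustration only: simulates hash partitioning the way Spark's
# repartition(col("date")) would behave with the default
# spark.sql.shuffle.partitions = 200 and 90 distinct dates.
from datetime import date, timedelta

num_partitions = 200  # default spark.sql.shuffle.partitions
dates = [date(2016, 3, 1) + timedelta(days=i) for i in range(90)]  # 90 days

# Assign each date to a partition: partition = hash(key) % num_partitions.
partitions = {}
for d in dates:
    p = hash(d.toordinal()) % num_partitions
    partitions.setdefault(p, []).append(d)

non_empty = len(partitions)
print(num_partitions, non_empty)  # → 200 90
```

So you still "have" 200 partitions; only 90 of them carry data. Passing the count explicitly (or lowering `spark.sql.shuffle.partitions`) is the way to get 90.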