Re: Spark Partition by Columns doesn't work properly

2016-06-08 Thread Jasleen Kaur
On Thu, Jun 9, 2016, 12:43 PM Jasleen Kaur <jasleenkaur1...@gmail.com> wrote:
>> Try using the datastax package. There was a great talk on spark summit
>> about it. It will take care of the boiler plate cod

Re: Spark Partition by Columns doesn't work properly

2016-06-08 Thread Jasleen Kaur
Try using the DataStax package. There was a great talk at Spark Summit about it. It will take care of the boilerplate code so you can focus on real business value.
On Wednesday, June 8, 2016, Chanh Le wrote:
> Hi everyone,
> I tested the partition by columns of data frame

Writing to HDFS

2015-08-03 Thread Jasleen Kaur
I am executing a Spark job on a cluster in yarn-client mode (yarn-cluster mode is not an option due to permission issues).
- num-executors 800
- spark.akka.frameSize=1024
- spark.default.parallelism=25600
- driver-memory=4G
- executor-memory=32G
My input size is around 1.5TB. My problem