Re: [Pyspark 2.4] not able to partition the data frame by dates

2019-07-31 Thread Rishi Shah
Thanks for your prompt reply, Gourav. I am using Spark 2.4.0 (Cloudera distribution). The job consistently threw this error, so I narrowed down the dataset by adding a date filter (date range: 2018-01-01 to 2018-06-30). However, it's still throwing the same error! *command*: spark2-submit --master

Re: [Pyspark 2.4] not able to partition the data frame by dates

2019-07-31 Thread Gourav Sengupta
Hi Rishi, there is no such version as 2.4 :), can you please specify the exact Spark version you are using? How are you starting the Spark session? And what is the environment? I know this issue occurs intermittently during large writes to S3 and has to do with S3 eventual consistency. Just