Re: [pyspark2.4+] A lot of tasks failed, but job eventually completes

2020-01-05 Thread hemant singh
You can try repartitioning the data; if the data is skewed you may need to salt the keys for better partitioning. Are you using coalesce or any other function that brings the data onto fewer nodes? Window functions also incur a shuffle, which could be an issue. On Mon, 6 Jan 2020 at 9:49 AM, Rishi
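A minimal PySpark sketch of key salting, assuming an active SparkSession named spark, a large skewed DataFrame big_df, a small DataFrame small_df, and a join key column "k" (all of these names are hypothetical):

    import pyspark.sql.functions as F

    SALT_BUCKETS = 16  # number of salt values; tune to the degree of skew

    # Spread each hot key across SALT_BUCKETS partitions by adding a random salt.
    big_salted = big_df.withColumn("salt", (F.rand() * SALT_BUCKETS).cast("int"))

    # Replicate the small side once per salt value so every salted key matches.
    salts = spark.range(SALT_BUCKETS).withColumnRenamed("id", "salt")
    small_salted = small_df.crossJoin(F.broadcast(salts))

    # Join on the original key plus the salt, then drop the helper column.
    joined = big_salted.join(small_salted, on=["k", "salt"]).drop("salt")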

Re: [pyspark2.4+] A lot of tasks failed, but job eventually completes

2020-01-05 Thread Rishi Shah
Thanks Hemant, the underlying data volume increased from 550GB to 690GB and now the same job doesn't succeed. I tried increasing executor memory to 20G as well; it still fails. I am running this in Databricks and start the cluster with 20G assigned to the spark.executor.memory property. Also some more
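For reference, a minimal sketch of setting that property when building a session by hand; on Databricks the same keys normally go into the cluster's Spark config instead, and the memoryOverhead value here is an illustrative assumption:

    from pyspark.sql import SparkSession

    # Executor memory must be set before the executors start;
    # changing it on an already-running session has no effect.
    spark = (SparkSession.builder
        .appName("memory-tuning-example")
        .config("spark.executor.memory", "20g")
        .config("spark.executor.memoryOverhead", "4g")  # off-heap headroom
        .getOrCreate())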

Re: [pyspark2.4+] A lot of tasks failed, but job eventually completes

2020-01-05 Thread hemant singh
You can try increasing the executor memory; generally this error appears when there is not enough memory in individual executors. The job may be completing because the failed tasks go through when they are re-scheduled. Thanks. On Mon, 6 Jan 2020 at 5:47 AM, Rishi Shah wrote: > Hello All, >
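The re-scheduling described above is bounded by Spark's retry limit: a task is attempted up to spark.task.maxFailures times (4 by default) before the stage, and hence the job, is failed. A hedged sketch of raising it, keeping in mind that more retries can mask a memory problem rather than fix it:

    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
        .appName("retry-example")
        # Allow each task up to 8 attempts before the job is failed.
        .config("spark.task.maxFailures", "8")
        .getOrCreate())

    print(spark.conf.get("spark.task.maxFailures"))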

[pyspark2.4+] A lot of tasks failed, but job eventually completes

2020-01-05 Thread Rishi Shah
Hello All, One of my jobs keeps getting into a situation where hundreds of tasks fail with the error below, but the job eventually completes. org.apache.spark.memory.SparkOutOfMemoryError: Unable to acquire 16384 bytes of memory Could someone advise? -- Regards, Rishi Shah

OrderBy Year and Month is not displaying correctly

2020-01-05 Thread Mich Talebzadeh
Hi, I am working out monthly outgoings etc. from an account and I am using the following code:

    import org.apache.spark.sql.expressions.Window
    val wSpec = Window.partitionBy(year(col("transactiondate")), month(col("transactiondate")))
    joint_accounts.
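The snippet above is cut off, so the select and orderBy are not visible. If months display out of order, a common cause is sorting on a formatted string (where "2020-10" sorts before "2020-2") instead of on numeric year and month. A hedged PySpark rendering of the numeric approach, assuming joint_accounts has a transactiondate column and an amount column (the amount and total names are hypothetical):

    from pyspark.sql import functions as F
    from pyspark.sql.window import Window

    # Partition by numeric year and month, as in the Scala snippet above.
    w = Window.partitionBy(F.year("transactiondate"), F.month("transactiondate"))

    monthly = (joint_accounts
        .withColumn("total", F.sum("amount").over(w))
        # Order on the numeric values, not a formatted string,
        # so 2020-2 comes before 2020-10.
        .orderBy(F.year("transactiondate"), F.month("transactiondate")))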

Re: How more than one spark job can write to same partition in the parquet file

2020-01-05 Thread Iqbal Singh
Hey Chetan, I have not fully understood your question. Are you trying to write to a partition from two actions, or are you looking to write from two jobs? Except for maintaining state for dataset completeness in that case, I don't see any issues. We are writing data to a partition using two
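For the two-jobs case, a minimal sketch of the usual pattern: both writers append under the same base path with the same partition column, so each write only adds new part-files (the path and column name are hypothetical, and whether truly concurrent appends to one path are safe depends on the output committer in use):

    # Job A and Job B can each run this independently; with mode("append")
    # a write adds new part-files under .../date=2020-01-05/ without
    # touching existing ones.
    (df.write
        .mode("append")
        .partitionBy("date")
        .parquet("/warehouse/events"))

    # Readers see partial data until both jobs finish (the "dataset
    # completeness" caveat above); a common remedy is to publish a marker
    # such as a _SUCCESS file only once both writers are done.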

unsubscribe

2020-01-05 Thread Bruno S. de Barros
To unsubscribe e-mail: user-unsubscr...@spark.apache.org

unsubscribe

2020-01-05 Thread Rishabh Pugalia
unsubscribe -- Thanks and Best Regards, Rishabh