Re: ASF board report draft for August

2021-08-10 Thread Matei Zaharia
Good point, I’ll make sure to include that.

> On Aug 9, 2021, at 9:20 PM, Mridul Muralidharan wrote:
>
> Hi Matei,
>
> 3.2 will also include support for push-based shuffle (SPIP SPARK-30602).
>
> Regards,
> Mridul
>
> On Mon, Aug 9, 2021 at 9:26 PM Hyukjin Kwon

Re: Spark 3.2.0 first RC next week

2021-08-10 Thread Min Shen
Hi Gengliang, SPARK-36378 (Switch to using RPCResponse to communicate common block push failures to the client) should be another one. This introduces a slight protocol change to push-based shuffle to improve code robustness and performance, and is almost ready to be committed. Because of the

Re: Performance of PySpark jobs on the Kubernetes cluster

2021-08-10 Thread Khalid Mammadov
Hi Mich, I think you need to check your code. If the code does not use the PySpark API effectively, you may see this, for example if you use the pure Python/pandas API rather than the PySpark pattern of transform->transform->action, e.g. df.select(..).withColumn(...)...count(). Hope this helps point you in the right direction.
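To make the transform->transform->action idea concrete, here is a toy sketch in plain Python. It is not PySpark: `LazyFrame`, `with_column`, and `pending_steps` are hypothetical names invented for illustration, and the sample rows are made up. It only models the general idea that transformations are recorded cheaply and the work runs when an action such as `count()` is called.

```python
# Toy model only: LazyFrame is a hypothetical stand-in, not a PySpark class.
class LazyFrame:
    def __init__(self, rows, plan=()):
        self._rows = rows      # source data (list of dicts)
        self._plan = plan      # recorded transformations, not yet executed

    def select(self, *columns):
        # Transformation: record a projection step and return a new frame.
        step = lambda row: {c: row[c] for c in columns}
        return LazyFrame(self._rows, self._plan + (step,))

    def with_column(self, name, fn):
        # Transformation: record a derived-column step and return a new frame.
        step = lambda row: {**row, name: fn(row)}
        return LazyFrame(self._rows, self._plan + (step,))

    def pending_steps(self):
        # Number of transformations recorded but not yet run.
        return len(self._plan)

    def collect(self):
        # Action: run every recorded step over every row.
        out = []
        for row in self._rows:
            for step in self._plan:
                row = step(row)
            out.append(row)
        return out

    def count(self):
        # Action: triggers execution of the whole plan, then counts rows.
        return len(self.collect())


# Made-up sample data for the sketch.
rows = [
    {"id": 1, "price": 10, "note": "a"},
    {"id": 2, "price": 20, "note": "b"},
]

# transform -> transform: nothing is computed yet, only recorded.
result = LazyFrame(rows).select("id", "price").with_column(
    "double_price", lambda r: r["price"] * 2
)

# action: the recorded plan is executed here.
total = result.count()
```

In real PySpark the same chain shape lets the engine plan and distribute the work across executors, whereas looping over rows in pure Python or pandas keeps it on a single process.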

Spark 3.2.0 first RC next week

2021-08-10 Thread Gengliang Wang
Hi all,

As of now, there are still some open/in-progress blockers for the Spark 3.2.0 release:
- Prohibit update mode in native support of session window (SPARK-36463)
- Avoid inlining non-deterministic With-CTEs (SPARK-36447