Apache Spark 3.3.4 EOL Release?

2023-12-01 Thread Dongjoon Hyun
Hi, All. Since the Apache Spark 3.3.0 RC6 vote passed on Jun 14, 2022, branch-3.3 has been maintained and served well until now.
- https://github.com/apache/spark/releases/tag/v3.3.0 (tagged on Jun 9th, 2022)
- https://lists.apache.org/thread/zg6k1spw6k1c7brgo6t7qldvsqbmfytm (vote result on June

[FYI] SPARK-45981: Improve Python language test coverage

2023-12-01 Thread Dongjoon Hyun
Hi, All. As a part of Apache Spark 4.0.0 (SPARK-44111), the Apache Spark community has test coverage for all supported Python versions starting today.
- https://github.com/apache/spark/actions/runs/7061665420
Here is a summary.
1. Main CI: All PRs and commits on `master` branch are

10x to 100x faster df.groupby().applyInPandas()

2023-12-01 Thread Enrico Minack
Hi devs, I am looking for a PySpark developer who is interested in a 10x to 100x speedup of df.groupby().applyInPandas() for small groups. A PoC and benchmark can be found at https://github.com/apache/spark/pull/37360#issuecomment-1228293766. I suppose the same approach could be taken