Hi all, I would like to discuss dropping the deprecated Python versions 2, 3.4 and 3.5, as proposed in https://github.com/apache/spark/pull/28957. I assume people support this in general, but I am writing to make sure everybody is happy with it.
Fokko did a very good investigation on this, see https://github.com/apache/spark/pull/28957#issuecomment-652022449. Judging from the statistics, I think we're pretty safe to drop them. Also note that dropping Python 2 was actually declared at https://python3statement.org/

Roughly speaking, there are several major advantages to dropping them:

1. It removes a bunch of hacks we added, around 700 lines in PySpark.
2. Based on my testing and investigation, PyPy2 has a critical bug that causes a flaky test, see https://issues.apache.org/jira/browse/SPARK-28358
3. Users can use Python type hints with Pandas UDFs without thinking about the Python version.
4. Users can leverage the latest cloudpickle, see https://github.com/apache/spark/pull/28950. With Python 3.8+ it can also leverage C pickle.
5. ...

So it benefits both users and developers. WDYT guys?