Hi all,

I would like to discuss dropping the deprecated Python versions 2, 3.4 and 3.5
at https://github.com/apache/spark/pull/28957. I assume people support it
in general, but I am writing this to make sure everybody is happy.

Fokko did a very thorough investigation of this, see
https://github.com/apache/spark/pull/28957#issuecomment-652022449.
Judging from the statistics, I think we're pretty safe to drop them.
Also note that dropping Python 2 was already declared at
https://python3statement.org/

Roughly speaking, there are several main advantages to dropping them:
  1. It removes a bunch of hacks we added, around 700 lines in PySpark.
  2. According to my testing and investigation, PyPy2 has a critical bug
that causes a flaky test, see
https://issues.apache.org/jira/browse/SPARK-28358.
  3. Users can use Python type hints with Pandas UDFs without thinking
about the Python version.
  4. Users can leverage a single, latest cloudpickle,
https://github.com/apache/spark/pull/28950. With Python 3.8+, cloudpickle
can also leverage the C pickle implementation.
  5. ...
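
To illustrate point 3: function annotations are a syntax error on Python 2,
so any API that relies on type hints needs a Python 3 only code base. Below
is a minimal sketch of how annotations can be read to guess what kind of UDF
a function is. It is not PySpark's actual logic; `Series` is a hypothetical
stand-in for `pandas.Series` so the snippet needs no third-party packages,
and `infer_udf_kind` and its return values are made-up names.

```python
from typing import Callable, get_type_hints


class Series:
    """Hypothetical stand-in for pandas.Series (assumption, keeps the sketch stdlib-only)."""


def infer_udf_kind(func: Callable) -> str:
    """Guess a UDF flavour from annotations alone (illustrative, not PySpark's logic)."""
    hints = get_type_hints(func)
    return_type = hints.pop("return", None)
    arg_types = list(hints.values())
    # Every annotated argument must be a Series for either flavour.
    all_series_args = bool(arg_types) and all(t is Series for t in arg_types)
    if all_series_args and return_type is Series:
        return "series_to_series"
    if all_series_args and return_type is float:
        return "series_to_scalar"
    return "unknown"


def plus_one(v: Series) -> Series:
    # Body is irrelevant here; only the annotations are inspected.
    return v


def mean(v: Series) -> float:
    return 0.0
```

With Python 2, 3.4 and 3.5 gone, annotated functions like `plus_one` can be
written and inspected the same way on every supported interpreter.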

So it benefits both users and developers. WDYT guys?
