Hi Spark Developers & Maintainers,

I know we've been talking a lot about what changes we want in PySpark to help keep it interesting and usable (see http://apache-spark-developers-list.1001551.n3.nabble.com/Python-Spark-Improvements-forked-from-Spark-Improvement-Proposals-td19422.html). One underlying challenge that we haven't explicitly discussed is the lack of dedicated Python reviewers, which is one reason behind the slow pace of much PySpark development.
For changes based around parity with an existing component, Python contributors like myself can sometimes get reviewers from that component (like ML) to take a look at our Python changes, but for core changes it's even harder to get reviewers. The general Python PR review dashboard <https://spark-prs.appspot.com/#python> shows a number of PRs languishing, but to specifically call out a few:

- pip installability - https://github.com/apache/spark/pull/15659
- KMeans summary in Python - https://github.com/apache/spark/pull/13557
- The various Anaconda/Virtualenv support PRs (none of them have had any luck with committer bandwidth)
- PySpark ML models should have params: finally starting to get committer review, but blocked for months (https://github.com/apache/spark/pull/14653)
- Python meta algorithms in Scala - https://github.com/apache/spark/pull/13794 (out of sync with master, but has been waiting for months for a committer to say whether they are interested in the feature or not)

Those of you following a lot of Python JIRAs have also probably noticed many Python-related JIRAs being re-targeted to future versions and repeatedly bumped back. The lack of core Python reviewers will make things like Arrow integration difficult to achieve unless the situation changes.

This isn't meant to say that the current Python reviewers aren't good; there just isn't enough Python committer bandwidth available to move these things forward. The normal solution to this is adding more committers with that focus area.

I'd love to hear y'all's thoughts on this.

Cheers,

Holden :)