[ https://issues.apache.org/jira/browse/SPARK-17333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16751040#comment-16751040 ]
Maciej Szymkiewicz commented on SPARK-17333:
--------------------------------------------

[~Alexander_Gorokhov] Personally I maintain a relatively complete set of annotations (https://github.com/zero323/pyspark-stubs) and in the past declared that I am happy to donate it and help with the merge (http://apache-spark-developers-list.1001551.n3.nabble.com/PYTHON-PySpark-typing-hints-td21560.html). This topic has also been raised on another occasion (http://apache-spark-developers-list.1001551.n3.nabble.com/Python-friendly-API-for-Spark-3-0-td25016.html).

> Make pyspark interface friendly with static analysis
> ----------------------------------------------------
>
>                 Key: SPARK-17333
>                 URL: https://issues.apache.org/jira/browse/SPARK-17333
>             Project: Spark
>          Issue Type: Improvement
>          Components: PySpark
>            Reporter: Assaf Mendelson
>            Priority: Trivial
>
> Static analysis tools, such as those common to IDEs for auto completion and
> error marking, tend to have poor results with pyspark.
> This is caused by two separate issues:
> The first is that many elements are created programmatically, such as the max
> function in pyspark.sql.functions.
> The second is that we tend to use pyspark in a functional manner, meaning
> that we chain many actions (e.g. df.filter().groupby().agg()....) and since
> python has no type information this can become difficult to understand.
> I would suggest changing the interface to improve it.
> The way I see it we can either change the interface or provide interface
> enhancements.
> Changing the interface means defining (when possible) all functions directly,
> i.e. instead of having a __functions__ dictionary in pyspark.sql.functions.py
> and then generating the functions programmatically by using _create_function,
> create the function directly.
> def max(col):
>     """
>     docstring
>     """
>     _create_function(max,"docstring")
> Second we can add type indications to all functions as defined in pep 484 or
> pycharm's legacy type hinting
> (https://www.jetbrains.com/help/pycharm/2016.1/type-hinting-in-pycharm.html#legacy).
> So for example max might look like this:
> def max(col):
>     """
>     does a max.
>     :type col: Column
>     :rtype: Column
>     """
> This would provide a wide range of support as these types of hints, while old,
> are pretty common.
> A second option is to use PEP 3107 to define interfaces (pyi files)
> in this case we might have a functions.pyi file which would contain something
> like:
> def max(col: Column) -> Column:
>     """
>     Aggregate function: returns the maximum value of the expression in a
>     group.
>     """
>     ...
> This has the advantage of easier to understand types and not touching the
> code (only supported code) but has the disadvantage of being separately
> managed (i.e. greater chance of making a mistake) and the fact that some
> configuration would be needed in the IDE/static analysis tool instead of
> working out of the box.
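
To illustrate the first issue in the quoted description, here is a minimal, self-contained sketch of the difference between a function generated at import time and one written out directly. It does not use pyspark: the Column class below is a hypothetical stand-in, _create_function only imitates the helper named in the issue, and avg is an arbitrary example name chosen for this sketch.

```python
from typing import Callable, Dict


class Column:
    """Hypothetical stand-in for pyspark's Column; it only records an expression string."""

    def __init__(self, expr: str) -> None:
        self.expr = expr


def _create_function(name: str, doc: str = "") -> Callable[[Column], Column]:
    """Build a function at runtime, imitating the helper mentioned in the issue."""

    def _(col: Column) -> Column:
        return Column(f"{name}({col.expr})")

    _.__name__ = name
    _.__doc__ = doc
    return _


# Programmatic definition: the name 'avg' never appears as a 'def' in the
# source. It exists only after this loop has run, so a tool that reads the
# file without executing it has nothing to complete or type-check.
_functions: Dict[str, str] = {
    "avg": "Aggregate function: returns the average of the values in a group.",
}
for _name, _doc in _functions.items():
    globals()[_name] = _create_function(_name, _doc)


# Direct definition: name, signature, annotations and docstring are all
# visible in the source, which is what the issue asks for.
# (This shadows the builtin max inside this module, as the issue's example does.)
def max(col: Column) -> Column:
    """Aggregate function: returns the maximum value of the expression in a group."""
    return Column(f"max({col.expr})")
```

Both functions behave the same when called, for example avg(Column("x")) and max(Column("x")), but only max can be resolved by a tool that inspects the source without running it.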