Hellsen83 commented on pull request #23877:
URL: https://github.com/apache/spark/pull/23877#issuecomment-649198577


   Hello @MrPowers,
   You are right, this is in fact motivated by your excellent blog post - thank you so much for that!
   From my experience - i.e., bringing this style of writing PySpark transformations to a heterogeneous group of roughly 15 devs/data scientists - the following was used most frequently, and people new to the game were able to pick it up quickly:
   
   ```python
   from pyspark.sql import DataFrame

   def my_logical_name(arg1: type1, arg2: type2):
       """My Docstring Style goes here

       :arg1: does something
       :arg2: does something_else
       :returns: a dataframe that was first somethinged and then something_elsed
       """
       def _(df: DataFrame) -> DataFrame:
           return df.do_something(arg1).do_something_else(arg2)
       return _

   def test_my_logical_name_returns_none_if_args_are_equal():
       ..
       result_df: DataFrame = df.transform(my_logical_name(arg1, arg1))
       ..
   ```
   
   So I am right there with you, but I propose `_` as the inner function name and the above-mentioned docstring placement. Main reasons for `_`:
   1) The amount of visual noise should be kept as low as possible (when you write many transformations and things get more complicated, this pays off - see the sketch below).
   2) If you name the inner function, people _will_ give it custom names. This goes against 1) and uniform code, as names will start to vary.

