[ https://issues.apache.org/jira/browse/SPARK-26449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sean Owen reassigned SPARK-26449:
---------------------------------

Assignee: Hanan Shteingart

> Missing Dataframe.transform API in Python API
> ---------------------------------------------
>
>                 Key: SPARK-26449
>                 URL: https://issues.apache.org/jira/browse/SPARK-26449
>             Project: Spark
>          Issue Type: Improvement
>          Components: PySpark, SQL
>    Affects Versions: 2.4.0
>            Reporter: Hanan Shteingart
>            Assignee: Hanan Shteingart
>            Priority: Minor
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> I would like to chain custom transformations, as suggested in this [blog post|https://medium.com/@mrpowers/chaining-custom-pyspark-transformations-4f38a8c7ae55].
> This would allow writing something like the following:
>
> {code:python}
> from pyspark.sql.functions import lit
>
> def with_greeting(df):
>     return df.withColumn("greeting", lit("hi"))
>
> def with_something(df, something):
>     return df.withColumn("something", lit(something))
>
> data = [("jose", 1), ("li", 2), ("liz", 3)]
> source_df = spark.createDataFrame(data, ["name", "age"])
>
> actual_df = (source_df
>     .transform(with_greeting)
>     .transform(lambda df: with_something(df, "crazy")))
> actual_df.show()
> +----+---+--------+---------+
> |name|age|greeting|something|
> +----+---+--------+---------+
> |jose|  1|      hi|    crazy|
> |  li|  2|      hi|    crazy|
> | liz|  3|      hi|    crazy|
> +----+---+--------+---------+
> {code}
>
> The only thing needed to accomplish this is the following simple method on DataFrame:
>
> {code:python}
> from pyspark.sql.dataframe import DataFrame
>
> def transform(self, f):
>     return f(self)
>
> DataFrame.transform = transform
> {code}
>
> I volunteer to do the pull request if approved (at least the Python part).

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
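For readers without a Spark session at hand, the chaining mechanics of the proposed one-line `transform` method can be demonstrated with a plain-Python stand-in. `ToyFrame`, `with_column`, and the row dictionaries below are hypothetical illustrations, not Spark APIs; only the `transform` body mirrors the patch proposed in the issue.

```python
# Toy stand-in for pyspark.sql.DataFrame; illustrates how
# transform(self, f) -> f(self) enables method-style chaining
# of free functions without any changes to those functions.
class ToyFrame:
    def __init__(self, rows):
        self.rows = rows

    def with_column(self, name, value):
        # Return a new frame with an extra constant column,
        # loosely mimicking DataFrame.withColumn(name, lit(value)).
        return ToyFrame([{**row, name: value} for row in self.rows])

    def transform(self, f):
        # The entire proposed method: apply f to self.
        return f(self)


def with_greeting(df):
    return df.with_column("greeting", "hi")


def with_something(df, something):
    return df.with_column("something", something)


source = ToyFrame([{"name": "jose"}, {"name": "li"}, {"name": "liz"}])
actual = (source
          .transform(with_greeting)
          .transform(lambda df: with_something(df, "crazy")))
# Each row now carries both added columns:
# {'name': 'jose', 'greeting': 'hi', 'something': 'crazy'}
```

Because `transform` simply applies its argument, any transformation that takes extra parameters can be chained via a `lambda` (as above) or `functools.partial`, which is what makes the one-line method sufficient.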