Stephen Offer created SPARK-44336:
-------------------------------------

             Summary: Add Python inbuilt functions to DataFrame for ease of use for Python developers
                 Key: SPARK-44336
                 URL: https://issues.apache.org/jira/browse/SPARK-44336
             Project: Spark
          Issue Type: New Feature
          Components: PySpark
    Affects Versions: 3.4.1
            Reporter: Stephen Offer

Python developers are used to common inbuilt functions and operators, but PySpark doesn't support any of the most used ones on DataFrames. PySpark already has this functionality for columns but not for the DataFrame itself. Adding this support for DataFrames would simplify some parts of development. For example:

{code:python}
if df == df1:  # DataFrame equality
    ...
if df != df2:  # DataFrame inequality
    ...

# Quickly make a larger DataFrame through a union of copies.
# Very useful for performance testing.
df_large = df * 100

# Simple DataFrame subtraction.
# Equivalent to df1.subtract(df2)
df_sub = df1 - df2

# Equivalent to df.union(df1)
df4 = df + df1

len(df)  # Equivalent to df.count()

for row in df:  # Equivalent to `for row in df.collect():`
    some_work(row)

if "company_name" in df:  # Check if item is in the DataFrame
    ...
{code}

There is an ongoing DataFrame equality function effort in PR 41833, and I've also built my own. These are suggestions; any other functions to be added to or removed from this list can be discussed.
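To make the proposed mapping concrete, here is a minimal sketch of how the operators listed above could delegate to existing DataFrame methods through Python's special methods. It is not the implementation proposed in this issue or in the PR. `OperatorFrame` and `ListFrame` are hypothetical names: the first wraps any object exposing `union`, `subtract`, `count`, `collect` and `columns` (as `pyspark.sql.DataFrame` does), and the second is an in-memory stand-in so the example runs without a Spark session. Equality is left out because its semantics are the subject of the separate effort mentioned above, and `in` is assumed to mean a column-name check.

```python
from functools import reduce


class OperatorFrame:
    """Hypothetical wrapper mapping Python operators onto DataFrame methods.

    `inner` is assumed to expose union, subtract, count, collect and columns.
    """

    def __init__(self, inner):
        self.inner = inner

    def __add__(self, other):      # df + df1 -> df.union(df1)
        return OperatorFrame(self.inner.union(other.inner))

    def __sub__(self, other):      # df1 - df2 -> df1.subtract(df2)
        return OperatorFrame(self.inner.subtract(other.inner))

    def __mul__(self, n):          # df * n -> union of n copies
        if not isinstance(n, int) or n < 1:
            raise ValueError("multiplier must be a positive integer")
        return OperatorFrame(reduce(lambda a, b: a.union(b), [self.inner] * n))

    def __len__(self):             # len(df) -> df.count()
        return self.inner.count()

    def __iter__(self):            # for row in df -> iterate over df.collect()
        return iter(self.inner.collect())

    def __contains__(self, name):  # "col" in df -> column-name check (assumed meaning)
        return name in self.inner.columns


class ListFrame:
    """In-memory stand-in (not part of PySpark) used only to run the sketch."""

    def __init__(self, columns, rows):
        self.columns = list(columns)
        self.rows = [tuple(r) for r in rows]

    def union(self, other):
        return ListFrame(self.columns, self.rows + other.rows)

    def subtract(self, other):
        # Mirrors EXCEPT DISTINCT: distinct rows of self that are absent from other.
        seen, out = set(other.rows), []
        for r in self.rows:
            if r not in seen:
                seen.add(r)
                out.append(r)
        return ListFrame(self.columns, out)

    def count(self):
        return len(self.rows)

    def collect(self):
        return list(self.rows)


cols = ["company_name", "employees"]
df = OperatorFrame(ListFrame(cols, [("Acme", 10), ("Globex", 25)]))
df1 = OperatorFrame(ListFrame(cols, [("Acme", 10)]))
```

One design consequence worth discussing: once `__len__` is defined, a plain `if df:` calls it, so a truthiness check would trigger a full `count()` on a real DataFrame. Likewise, iterating through `collect()` pulls every row to the driver.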