[ https://issues.apache.org/jira/browse/SPARK-38483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17514788#comment-17514788 ]
Brian Schaefer commented on SPARK-38483:
----------------------------------------

I've been thinking about this the past few weeks and would like to propose a minimal version of this suggested feature:
 * a property {{Column._name}} that simply returns {{Column._jc.toString()}}
 * an instance variable {{Column._alias}} that is set when {{Column.alias()}} is called.

The combination of these two provides a convenient interface for Python users without promising too much. A common use case in my own work would be re-using an alias (mentioned in the ticket description):
{code:python}
>>> def process_values(col):
...     new_values = ...
...     return new_values.alias(col._alias or col._name)
...
>>> values = F.col("original_values").alias("values")
>>> df.select(process_values(values))
{code}

> Column name or alias as an attribute of the PySpark Column class
> ----------------------------------------------------------------
>
>                 Key: SPARK-38483
>                 URL: https://issues.apache.org/jira/browse/SPARK-38483
>             Project: Spark
>          Issue Type: New Feature
>          Components: PySpark
>    Affects Versions: 3.2.1
>            Reporter: Brian Schaefer
>            Priority: Minor
>              Labels: starter
>
> Having the name of a column as an attribute of PySpark {{Column}} class
> instances can enable some convenient patterns, for example:
> Applying a function to a column and aliasing with the original name:
> {code:java}
> values = F.col("values")
> # repeating the column name as an alias
> distinct_values = F.array_distinct(values).alias("values")
> # re-using the existing column name
> distinct_values = F.array_distinct(values).alias(values._name){code}
> Checking the column name inside a custom function and applying conditional
> logic on the name:
> {code:java}
> def custom_function(col: Column) -> Column:
>     if col._name == "my_column":
>         return col.astype("int")
>     return col.astype("string"){code}
> The proposal in this issue is to add a property {{Column.\_name}} that
> obtains the name or alias of a column in a similar
> way as currently done in
> the {{Column.\_\_repr\_\_}} method:
> [https://github.com/apache/spark/blob/master/python/pyspark/sql/column.py#L1062.]
> The choice of {{_name}} intentionally avoids collision with the existing
> {{Column.name}} method, which is an alias for {{Column.alias}}.

--
This message was sent by Atlassian Jira
(v8.20.1#820001)
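
For readers who want to try the pattern from the comment without a Spark session, here is a rough, Spark-free sketch of the two proposed members. {{SketchColumn}} is a hypothetical stand-in written for illustration only; it is not the PySpark {{Column}} class. The string it stores merely plays the role of {{Column._jc.toString()}}, the "AS" formatting is an assumption, and the {{array_distinct}} step stands in for the transformation that the comment leaves elided.

```python
from typing import Optional


class SketchColumn:
    """Hypothetical stand-in for a PySpark Column (illustration only)."""

    def __init__(self, expr: str, alias: Optional[str] = None) -> None:
        # Assumption: this string plays the role of Column._jc.toString().
        self._expr = expr
        # Proposed instance variable: stays None until alias() is called.
        self._alias = alias

    @property
    def _name(self) -> str:
        # Proposed property: simply the string form of the expression.
        return self._expr

    def alias(self, name: str) -> "SketchColumn":
        # Return a new column that remembers the alias it was given.
        # The "AS" formatting here is an assumption made for the sketch.
        return SketchColumn(f"{self._expr} AS {name}", alias=name)


def process_values(col: SketchColumn) -> SketchColumn:
    # Placeholder transformation (assumed); the comment leaves this step open.
    new_values = SketchColumn(f"array_distinct({col._name})")
    # Re-use the caller's alias if one was set, else fall back to the name.
    return new_values.alias(col._alias or col._name)


if __name__ == "__main__":
    values = SketchColumn("original_values").alias("values")
    print(process_values(values)._alias)
```

The fallback {{col._alias or col._name}} is what lets the same helper accept both aliased and plain columns, which is the convenience the comment is after.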