[ https://issues.apache.org/jira/browse/SPARK-38483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17504529#comment-17504529 ]
Brian Schaefer commented on SPARK-38483:
----------------------------------------

The column name does differ between the two when selecting a struct field, but handling that case seems fairly straightforward.
{code:python}
>>> from pyspark.sql import functions as F
>>> df = spark.createDataFrame([{"struct": {"outer_field": {"inner_field": 1}}}])
>>> values = F.col("struct.outer_field.inner_field")
>>> print(df.select(values).schema[0].name)
inner_field
>>> print(values._jc.toString())
struct.outer_field.inner_field
>>> print(values._jc.toString().split(".")[-1])
inner_field
{code}

> Column name or alias as an attribute of the PySpark Column class
> ----------------------------------------------------------------
>
>                 Key: SPARK-38483
>                 URL: https://issues.apache.org/jira/browse/SPARK-38483
>             Project: Spark
>          Issue Type: New Feature
>          Components: PySpark
>    Affects Versions: 3.2.1
>            Reporter: Brian Schaefer
>            Priority: Minor
>              Labels: starter
>
> Having the name of a column as an attribute of PySpark {{Column}} class
> instances can enable some convenient patterns, for example:
> Applying a function to a column and aliasing with the original name:
> {code:java}
> values = F.col("values")
> # repeating the column name as an alias
> distinct_values = F.array_distinct(values).alias("values")
> # re-using the existing column name
> distinct_values = F.array_distinct(values).alias(values._name){code}
> Checking the column name inside a custom function and applying conditional
> logic on the name:
> {code:java}
> def custom_function(col: Column) -> Column:
>     if col._name == "my_column":
>         return col.astype("int")
>     return col.astype("string"){code}
> The proposal in this issue is to add a property {{Column.\_name}} that
> obtains the name or alias of a column in a similar way as currently done in
> the {{Column.\_\_repr\_\_}} method:
> [https://github.com/apache/spark/blob/master/python/pyspark/sql/column.py#L1062.]
> The choice of {{_name}} intentionally avoids collision with the existing
> {{Column.name}} method, which is an alias for {{Column.alias}}.

--
This message was sent by Atlassian Jira
(v8.20.1#820001)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
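
The struct-field handling mentioned in the comment can be sketched as a small string helper. This is only an illustration, not the implementation proposed in the issue: `leaf_column_name` is a hypothetical name, and the code assumes that an aliased column renders as `<expr> AS <alias>` in its string form. It works on plain strings, so it runs without a Spark session.

```python
def leaf_column_name(expr: str) -> str:
    """Best-effort column name from a column's string form (hypothetical helper)."""
    # Assumption: an aliased column renders as "<expr> AS <alias>",
    # so the alias is whatever follows the last " AS ".
    if " AS " in expr:
        expr = expr.rsplit(" AS ", 1)[1]
    # A struct field path such as "a.b.c" resolves to its last segment,
    # mirroring the split(".")[-1] step shown in the comment.
    return expr.split(".")[-1].strip("`")


print(leaf_column_name("struct.outer_field.inner_field"))
print(leaf_column_name("values"))
print(leaf_column_name("array_distinct(values) AS values"))
```

With a live session, the input would be something like `values._jc.toString()`. The helper is deliberately naive: an unaliased function call over a struct field, or a backtick-quoted name that itself contains a dot, would not be handled correctly.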