[ https://issues.apache.org/jira/browse/SPARK-45509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Allison Wang updated SPARK-45509:
---------------------------------
    Description:
SPARK-45220 discovered a behavior difference for a self-join scenario between classic Spark and Spark Connect.

For instance, here is a query that works without Spark Connect:
{code:java}
joined = df.join(df2, df.name == df2.name, "outer").sort(sf.desc(df.name))
joined.show()
{code}
But in Spark Connect, it throws this exception:
{code:java}
pyspark.errors.exceptions.connect.AnalysisException: [UNRESOLVED_COLUMN.WITH_SUGGESTION] A column, variable, or function parameter with name `name` cannot be resolved. Did you mean one of the following? [`name`, `name`, `age`, `height`].;
'Sort ['name DESC NULLS LAST], true
+- Join FullOuter, (name#64 = name#78)
   :- LocalRelation [name#64, age#65L]
   +- LocalRelation [name#78, height#79L]
{code}
On the other hand, this query fails in classic Spark:
{code:java}
df.join(df, df.name == df.name, "outer").select(df.name).show()
{code}
{code:java}
pyspark.errors.exceptions.captured.AnalysisException: Column name#0 are ambiguous...
{code}
but this query works with Spark Connect.

We need to investigate the behavior difference and fix it.

  was:
SAPRK-45220 discovers a behavior difference for a self-join scenario between class Spark and Spark Connect.

For instance. here is the query that works without Spark Connect:
{code:java}
joined = df.join(df2, df.name == df2.name, "outer").sort(sf.desc(df.name))
joined.show()
{code}
But in Spark Connect, it throws this exception:
{code:java}
pyspark.errors.exceptions.connect.AnalysisException: [UNRESOLVED_COLUMN.WITH_SUGGESTION] A column, variable, or function parameter with name `name` cannot be resolved. Did you mean one of the following?
[`name`, `name`, `age`, `height`].;
'Sort ['name DESC NULLS LAST], true
+- Join FullOuter, (name#64 = name#78)
   :- LocalRelation [name#64, age#65L]
   +- LocalRelation [name#78, height#79L]
{code}
On the other hand, this query failed in classic Spark Connect:
{code:java}
df.join(df, df.name == df.name, "outer").select(df.name).show()
{code}
{code:java}
pyspark.errors.exceptions.captured.AnalysisException: Column name#0 are ambiguous...
{code}
but this query works with Spark Connect.

We need to investigate the behavior difference and fix it.


> Investigate the behavior difference in self-join
> ------------------------------------------------
>
>                 Key: SPARK-45509
>                 URL: https://issues.apache.org/jira/browse/SPARK-45509
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Connect, PySpark
>    Affects Versions: 3.5.0, 4.0.0
>            Reporter: Allison Wang
>            Priority: Major
>
> SPARK-45220 discovered a behavior difference for a self-join scenario between
> classic Spark and Spark Connect.
> For instance, here is a query that works without Spark Connect:
>
> {code:java}
> joined = df.join(df2, df.name == df2.name, "outer").sort(sf.desc(df.name))
> joined.show()
> {code}
>
> But in Spark Connect, it throws this exception:
>
> {code:java}
> pyspark.errors.exceptions.connect.AnalysisException:
> [UNRESOLVED_COLUMN.WITH_SUGGESTION] A column, variable, or function parameter
> with name `name` cannot be resolved. Did you mean one of the following?
> [`name`, `name`, `age`, `height`].;
> 'Sort ['name DESC NULLS LAST], true
> +- Join FullOuter, (name#64 = name#78)
>    :- LocalRelation [name#64, age#65L]
>    +- LocalRelation [name#78, height#79L]
> {code}
>
> On the other hand, this query fails in classic Spark:
>
> {code:java}
> df.join(df, df.name == df.name, "outer").select(df.name).show()
> {code}
> {code:java}
> pyspark.errors.exceptions.captured.AnalysisException: Column name#0 are
> ambiguous...
> {code}
>
> but this query works with Spark Connect.
> We need to investigate the behavior difference and fix it.