[jira] [Created] (SPARK-30989) TABLE.COLUMN reference doesn't work with new columns created by UDF
Chris Suchanek created SPARK-30989:
--

Summary: TABLE.COLUMN reference doesn't work with new columns created by UDF
Key: SPARK-30989
URL: https://issues.apache.org/jira/browse/SPARK-30989
Project: Spark
Issue Type: Bug
Components: Spark Core
Affects Versions: 2.4.4
Reporter: Chris Suchanek

When a dataframe is created with an alias (`.as("...")`), its columns can be referred to as `TABLE.COLUMN`, but this does not work for new columns created with a UDF.

{code:java}
// spark-shell session: `sc` and the toDF implicits are provided by the shell
import org.apache.spark.sql.functions.{col, udf}

// Sample input (not in the original report, which left `l` undefined)
val l = Seq((1, 2), (3, 4))
val df1 = sc.parallelize(l).toDF("x", "y").as("cat")
val squared = udf((s: Int) => s * s)
val df2 = df1.withColumn("z", squared(col("y")))

df2.columns          // Array[String] = Array(x, y, z)
df2.select("cat.x")  // works
df2.select("cat.z")  // doesn't work:
// org.apache.spark.sql.AnalysisException: cannot resolve '`cat.z`' given input
// columns: [cat.x, cat.y, z];;
{code}

Might be related to: https://issues.apache.org/jira/browse/SPARK-30532

--
This message was sent by Atlassian Jira (v8.3.4#803005)

To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
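The error message for SPARK-30989 lists the input columns as `[cat.x, cat.y, z]`, which suggests the new column simply carries no `cat` qualifier. A possible workaround, sketched below and not verified against 2.4.4, is to re-apply the alias after `withColumn` so that every output column picks up the qualifier again. The sample rows and the local `SparkSession` setup are assumptions added for illustration; they are not part of the report.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, udf}

// Assumption: a local session; in spark-shell, getOrCreate() returns the existing one
val spark = SparkSession.builder().master("local[*]").appName("alias-sketch").getOrCreate()
import spark.implicits._

// Hypothetical sample data standing in for the report's undefined `l`
val df1 = Seq((1, 2), (3, 4)).toDF("x", "y").as("cat")
val squared = udf((s: Int) => s * s)

// Re-alias after adding the column, so `z` is also qualified by `cat`
val df2 = df1.withColumn("z", squared(col("y"))).as("cat")

df2.select("cat.x", "cat.z").show()
```

Selecting the unqualified name (`df2.select("z")`) is another option when the name is not ambiguous.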
[jira] [Created] (SPARK-30532) DataFrameStatFunctions.approxQuantile doesn't work with TABLE.COLUMN syntax
Chris Suchanek created SPARK-30532:
--

Summary: DataFrameStatFunctions.approxQuantile doesn't work with TABLE.COLUMN syntax
Key: SPARK-30532
URL: https://issues.apache.org/jira/browse/SPARK-30532
Project: Spark
Issue Type: Bug
Components: Spark Core
Affects Versions: 2.4.4
Reporter: Chris Suchanek

DataFrameStatFunctions.approxQuantile doesn't work with a fully qualified column name (i.e. TABLE_NAME.COLUMN_NAME), which is often how you refer to a column when working with joined dataframes that have ambiguous column names. See the code below for an example.

{code:java}
import scala.util.Random

val l = (0 to 1000).map(_ => Random.nextGaussian() * 1000)
val df1 = sc.parallelize(l).toDF("num").as("tt1")
val df2 = sc.parallelize(l).toDF("num").as("tt2")
val dfx = df2.crossJoin(df1)

dfx.stat.approxQuantile("tt1.num", Array(0.1), 0.0)
// throws: java.lang.IllegalArgumentException: Field "tt1.num" does not exist. Available fields: num

dfx.stat.approxQuantile("num", Array(0.1), 0.0)
// throws: org.apache.spark.sql.AnalysisException: Reference 'num' is ambiguous, could be: tt2.num, tt1.num.;
{code}
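Until `approxQuantile` accepts qualified names, one way around the SPARK-30532 failure might be to project the qualified column under a unique, unqualified name first and compute the quantile on that projection. This is only a sketch under that assumption: the name `tt1_num` and the local `SparkSession` setup are made up for illustration, and it has not been checked against 2.4.4.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col
import scala.util.Random

// Assumption: a local session; in spark-shell, getOrCreate() returns the existing one
val spark = SparkSession.builder().master("local[*]").appName("quantile-sketch").getOrCreate()
import spark.implicits._

val l = (0 to 1000).map(_ => Random.nextGaussian() * 1000)
val df1 = l.toDF("num").as("tt1")
val df2 = l.toDF("num").as("tt2")
val dfx = df2.crossJoin(df1)

// Resolve the qualified reference with col(...), then rename it to a
// unique plain name (`tt1_num` is hypothetical)
val tt1Only = dfx.select(col("tt1.num").as("tt1_num"))

// approxQuantile now sees a single, unambiguous field name
val quantiles: Array[Double] = tt1Only.stat.approxQuantile("tt1_num", Array(0.1), 0.0)
```

The projection keeps the join's row multiplicity, so the quantile is computed over the same rows that `dfx` would have supplied.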