[ https://issues.apache.org/jira/browse/SPARK-30532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17048839#comment-17048839 ]
Oleksii Kachaiev commented on SPARK-30532:
------------------------------------------

The same applies to the other stats functions: {{cov}}, {{corr}} and {{freqItems}}. I'm working on a PR to fix all of them (the problem is similar in all cases).

> DataFrameStatFunctions.approxQuantile doesn't work with TABLE.COLUMN syntax
> ---------------------------------------------------------------------------
>
>                 Key: SPARK-30532
>                 URL: https://issues.apache.org/jira/browse/SPARK-30532
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.4.4
>            Reporter: Chris Suchanek
>            Priority: Minor
>
> DataFrameStatFunctions.approxQuantile doesn't work with a fully qualified
> column name (i.e. TABLE_NAME.COLUMN_NAME), which is often how you refer to
> a column when working with joined dataframes that have ambiguous column names.
> See the code below for an example.
> {code:java}
> import scala.util.Random
> val l = (0 to 1000).map(_ => Random.nextGaussian() * 1000)
> val df1 = sc.parallelize(l).toDF("num").as("tt1")
> val df2 = sc.parallelize(l).toDF("num").as("tt2")
> val dfx = df2.crossJoin(df1)
> dfx.stat.approxQuantile("tt1.num", Array(0.1), 0.0)
> // throws: java.lang.IllegalArgumentException: Field "tt1.num" does not exist.
> // Available fields: num
> dfx.stat.approxQuantile("num", Array(0.1), 0.0)
> // throws: org.apache.spark.sql.AnalysisException: Reference 'num' is
> // ambiguous, could be: tt2.num, tt1.num.;
> {code}
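
Until a fix is available, one possible workaround for the reproduction above is to resolve the qualified column with an ordinary {{select}}, give it an unambiguous alias, and call {{approxQuantile}} on the projected DataFrame. This is only a sketch and is not taken from the issue or from the PR mentioned in the comment; the alias {{tt1_num}}, the application name, and the local-mode session are assumptions made for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col
import scala.util.Random

// Assumption: a local session for illustration; in spark-shell, getOrCreate()
// simply returns the existing session.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("SPARK-30532-workaround-sketch")
  .getOrCreate()
import spark.implicits._

// Same data shape as the reproduction in the issue.
val l = (0 to 1000).map(_ => Random.nextGaussian() * 1000)
val df1 = l.toDF("num").as("tt1")
val df2 = l.toDF("num").as("tt2")
val dfx = df2.crossJoin(df1)

// select() resolves "tt1.num" against the join aliases; the alias
// (hypothetical name "tt1_num") leaves a single, unambiguous column.
val projected = dfx.select(col("tt1.num").as("tt1_num"))

// approxQuantile now receives a plain column name that exists in the schema.
val quantiles = projected.stat.approxQuantile("tt1_num", Array(0.1), 0.0)
println(quantiles.mkString(", "))
```

The projection costs one extra step, but it passes the stat function a column name that exists in the schema exactly once, which avoids both exceptions shown in the issue.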