Chris Suchanek created SPARK-30532:
--------------------------------------

             Summary: DataFrameStatFunctions.approxQuantile doesn't work with 
TABLE.COLUMN syntax
                 Key: SPARK-30532
                 URL: https://issues.apache.org/jira/browse/SPARK-30532
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 2.4.4
            Reporter: Chris Suchanek


DataFrameStatFunctions.approxQuantile doesn't work with a fully qualified 
column name (i.e. TABLE_NAME.COLUMN_NAME), which is often how you refer to 
a column when working with joined DataFrames that have ambiguous column names.


See the code below for an example (run in spark-shell).
{code:java}

import scala.util.Random
val l = (0 to 1000).map(_ => Random.nextGaussian() * 1000)
val df1 = sc.parallelize(l).toDF("num").as("tt1")
val df2 = sc.parallelize(l).toDF("num").as("tt2")
val dfx = df2.crossJoin(df1)

dfx.stat.approxQuantile("tt1.num", Array(0.1), 0.0)
// throws: java.lang.IllegalArgumentException: Field "tt1.num" does not exist.
// Available fields: num

dfx.stat.approxQuantile("num", Array(0.1), 0.0)
// throws: org.apache.spark.sql.AnalysisException: Reference 'num' is
// ambiguous, could be: tt2.num, tt1.num.;
{code}
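
A possible workaround, offered only as a sketch and not something confirmed in this report: select the qualified column through the normal column resolver, give it a unique alias, and call approxQuantile on the aliased name. The names tt1Only and tt1_num are hypothetical, and the explicit SparkSession setup is only there so the snippet runs outside spark-shell.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col
import scala.util.Random

// Local session for a standalone run; spark-shell already provides `spark`.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("approxQuantile-workaround")
  .getOrCreate()
import spark.implicits._

// Same setup as in the report: two aliased DataFrames with a clashing column name.
val l = (0 to 1000).map(_ => Random.nextGaussian() * 1000)
val df1 = l.toDF("num").as("tt1")
val df2 = l.toDF("num").as("tt2")
val dfx = df2.crossJoin(df1)

// col("tt1.num") is resolved by the analyzer, which understands the alias
// qualifier; the resulting DataFrame has a single, unambiguous field name.
val tt1Only = dfx.select(col("tt1.num").as("tt1_num")) // hypothetical names

// approxQuantile looks the name up in the schema, so the plain alias works.
val quantiles: Array[Double] = tt1Only.stat.approxQuantile("tt1_num", Array(0.1), 0.0)
println(quantiles.mkString(", "))
```

This sidesteps the schema lookup that rejects "tt1.num", at the cost of an extra projection; it does not change the behavior of approxQuantile itself.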
 

 


