[jira] [Created] (SPARK-30989) TABLE.COLUMN reference doesn't work with new columns created by UDF

2020-02-28 Thread Chris Suchanek (Jira)
Chris Suchanek created SPARK-30989:
--

 Summary: TABLE.COLUMN reference doesn't work with new columns 
created by UDF
 Key: SPARK-30989
 URL: https://issues.apache.org/jira/browse/SPARK-30989
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 2.4.4
Reporter: Chris Suchanek


When a DataFrame is created with an alias (`.as("...")`), its columns can be 
referenced as `TABLE.COLUMN`, but this does not work for columns newly created 
with a UDF via `withColumn`.
{code:java}

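import org.apache.spark.sql.functions.{col, udf}

// `l` was not defined in the original report; any Seq[(Int, Int)] works (sample values assumed)
val l = Seq((1, 2), (3, 4))
val df1 = sc.parallelize(l).toDF("x", "y").as("cat")
val squared = udf((s: Int) => s * s)
val df2 = df1.withColumn("z", squared(col("y")))
df2.columns // Array[String] = Array(x, y, z)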

df2.select("cat.x") // works

df2.select("cat.z") // Doesn't work
// org.apache.spark.sql.AnalysisException: cannot resolve '`cat.z`' given input 
// columns: [cat.x, cat.y, z];;
{code}
Might be related to: https://issues.apache.org/jira/browse/SPARK-30532
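
A possible workaround, sketched here rather than taken from the report: re-applying the alias after `withColumn` wraps the whole result in a new alias, so the UDF column picks up the `cat` qualifier too. The object name `ReAliasWorkaround`, the helper `build`, and the sample rows are hypothetical, chosen only for illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, udf}

object ReAliasWorkaround {
  // Hypothetical helper: builds the aliased DataFrame, adds the UDF column,
  // then re-applies the alias so that every output column is qualified by "cat".
  def build(spark: SparkSession): DataFrame = {
    import spark.implicits._
    val squared = udf((s: Int) => s * s)
    // Sample rows are an assumption; the report does not show its input data.
    val df1 = Seq((1, 2), (3, 4)).toDF("x", "y").as("cat")
    val df2 = df1.withColumn("z", squared(col("y")))
    df2.as("cat")
  }

  def main(args: Array[String]): Unit = {
    // Local session for illustration; in spark-shell `spark` already exists.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("SPARK-30989-workaround")
      .getOrCreate()
    build(spark).select("cat.x", "cat.z").show()
    spark.stop()
  }
}
```

Selecting the new column by its bare name (`df2.select("z")`) also avoids the error, at the cost of dropping the qualifier.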



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-30532) DataFrameStatFunctions.approxQuantile doesn't work with TABLE.COLUMN syntax

2020-01-16 Thread Chris Suchanek (Jira)
Chris Suchanek created SPARK-30532:
--

 Summary: DataFrameStatFunctions.approxQuantile doesn't work with 
TABLE.COLUMN syntax
 Key: SPARK-30532
 URL: https://issues.apache.org/jira/browse/SPARK-30532
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 2.4.4
Reporter: Chris Suchanek


DataFrameStatFunctions.approxQuantile doesn't work with a fully qualified 
column name (i.e. TABLE_NAME.COLUMN_NAME), which is often how you refer to 
a column when working with joined DataFrames that have ambiguous column names.


See the code below for an example.
{code:java}

import scala.util.Random
val l = (0 to 1000).map(_ => Random.nextGaussian() * 1000)
val df1 = sc.parallelize(l).toDF("num").as("tt1")
val df2 = sc.parallelize(l).toDF("num").as("tt2")
val dfx = df2.crossJoin(df1)

dfx.stat.approxQuantile("tt1.num", Array(0.1), 0.0)
// throws: java.lang.IllegalArgumentException: Field "tt1.num" does not exist.
// Available fields: num

dfx.stat.approxQuantile("num", Array(0.1), 0.0)
// throws: org.apache.spark.sql.AnalysisException: Reference 'num' is
// ambiguous, could be: tt2.num, tt1.num.;
{code}
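
Until qualified names are accepted, one way around this may be to project the qualified column to a plain, unambiguous name before calling `approxQuantile`. The sketch below assumes that approach. The object name `ApproxQuantileWorkaround`, the helper `quantiles`, the output name `tt1_num`, and the deterministic sample values are all hypothetical and not part of the report.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

object ApproxQuantileWorkaround {
  // Hypothetical helper: resolves "tt1.num" through select (which understands
  // qualified names), renames it, and only then calls approxQuantile.
  def quantiles(spark: SparkSession, probs: Array[Double]): Array[Double] = {
    import spark.implicits._
    // Deterministic sample data (an assumption; the report uses random Gaussians).
    val values = (0 to 100).map(_.toDouble)
    val df1 = values.toDF("num").as("tt1")
    val df2 = values.toDF("num").as("tt2")
    val dfx = df2.crossJoin(df1)
    dfx.select(col("tt1.num").as("tt1_num"))
      .stat.approxQuantile("tt1_num", probs, 0.0)
  }

  def main(args: Array[String]): Unit = {
    // Local session for illustration; in spark-shell `spark` already exists.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("SPARK-30532-workaround")
      .getOrCreate()
    println(quantiles(spark, Array(0.1)).mkString(", "))
    spark.stop()
  }
}
```

The rename sidesteps both errors shown above: the stat function sees a single top-level field, and the name no longer collides with `tt2.num`.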
 

 


