Re: "where" clause able to access fields not in its schema

2019-02-13 Thread Yeikel
It seems that we are using the function incorrectly.

val a = Seq((1,10),(2,20)).toDF("foo","bar")
val b = a.select($"foo")
val c = b.where(b("bar") === 20)
c.show

Exception in thread "main" org.apache.spark.sql.AnalysisException: Cannot resolve column name "bar" among
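[Editorial sketch, assuming a spark-shell session where spark, sc and spark.implicits._ are available: the distinction being described is that b("bar") is resolved eagerly against b's own schema and fails, whereas an unresolved column like $"bar" is only resolved later by the analyzer, which still finds "bar" in b's lineage.]

import spark.implicits._

val a = Seq((1, 10), (2, 20)).toDF("foo", "bar")
val b = a.select($"foo")

// b("bar") asks b itself for the column; "bar" is not in b's schema,
// so this throws AnalysisException immediately:
// val c = b.where(b("bar") === 20)

// $"bar" is an unresolved column. The analyzer resolves it against the
// full logical plan (which still produces "bar" below the Project),
// so this runs and returns the row with foo = 2:
b.where($"bar" === 20).show()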

Re: "where" clause able to access fields not in its schema

2019-02-13 Thread Vadim Semenov
Yeah, the filter gets in front of the select after analyzing:

scala> b.where($"bar" === 20).explain(true)
== Parsed Logical Plan ==
'Filter ('bar = 20)
+- AnalysisBarrier
      +- Project [foo#6]
         +- Project [_1#3 AS foo#6, _2#4 AS bar#7]
            +- SerializeFromObject
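[Editorial sketch, not from the thread: since the analyzer resolves the unresolved 'bar against the whole lineage rather than against b's declared output, one way to make b's narrower schema the hard boundary is to cut the lineage, for example by rebuilding the DataFrame from its RDD. Assumes a spark-shell session; the round trip through the RDD is not free.]

import spark.implicits._

val a = Seq((1, 10), (2, 20)).toDF("foo", "bar")
val b = a.select($"foo")

// b still carries the full logical plan, so the analyzer can reach "bar"
// through the lineage and b.where($"bar" === 20) succeeds.

// Rebuilding the DataFrame from its RDD cuts that lineage: the new plan's
// only source is b's own schema, so "bar" is now rejected at analysis time.
val bStrict = spark.createDataFrame(b.rdd, b.schema)
// bStrict.where($"bar" === 20)   // throws AnalysisException (the column genuinely is not there)
bStrict.where($"foo" === 2).show()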

Re: "where" clause able to access fields not in its schema

2019-02-13 Thread Yeikel
This is indeed strange. To add to the question, I can see that if I use a filter I get an exception (as expected), so I am not sure what the difference is between the where clause and filter:

b.filter(s => {
  val bar : String = s.getAs("bar")
  bar.equals("20")
}).show
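[Editorial sketch of the difference, assuming the same a and b as above: where (and filter when given a Column expression) builds a logical plan that the analyzer resolves against b's whole lineage, while the typed filter takes a Scala function that runs per Row after the projection, and that Row carries only b's columns, so the lookup of "bar" fails at runtime.]

import spark.implicits._

val a = Seq((1, 10), (2, 20)).toDF("foo", "bar")
val b = a.select($"foo")

// Column expression: resolved at analysis time against the full plan,
// so "bar" is found in the lineage and this succeeds.
b.where($"bar" === 20).show()

// Typed filter: the lambda runs per Row *after* the projection, and the
// Row only contains "foo", so looking up "bar" fails at runtime
// (Row.fieldIndex throws because the field does not exist).
// b.filter(row => row.getAs[Int]("bar") == 20).show()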

"where" clause able to access fields not in its schema

2019-02-13 Thread Alex Nastetsky
I don't know if this is a bug or a feature, but it's a bit counter-intuitive when reading code. The "b" dataframe does not have field "bar" in its schema, but is still able to filter on that field.

scala> val a = sc.parallelize(Seq((1,10),(2,20))).toDF("foo","bar")
a:
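[Editorial sketch reproducing the observation, assuming a spark-shell session where spark and sc are provided: b's schema contains only foo, yet filtering on bar succeeds.]

import spark.implicits._

val a = sc.parallelize(Seq((1, 10), (2, 20))).toDF("foo", "bar")
val b = a.select($"foo")

b.printSchema()                      // only "foo"
println(b.columns.mkString(", "))    // foo

// Counter-intuitively this runs: the analyzer resolves $"bar" against
// b's lineage (the plan underneath the Project still produces "bar"),
// and shows the single row with foo = 2.
b.where($"bar" === 20).show()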