I don't know whether this is a bug or a feature, but it's counter-intuitive
when reading code.

The "b" DataFrame does not have the field "bar" in its schema, yet you can
still filter on that field — even though selecting it fails with an
AnalysisException.

scala> val a = sc.parallelize(Seq((1,10),(2,20))).toDF("foo","bar")
a: org.apache.spark.sql.DataFrame = [foo: int, bar: int]

scala> a.show
+---+---+
|foo|bar|
+---+---+
|  1| 10|
|  2| 20|
+---+---+

scala> val b = a.select($"foo")
b: org.apache.spark.sql.DataFrame = [foo: int]

scala> b.schema
res3: org.apache.spark.sql.types.StructType = 
StructType(StructField(foo,IntegerType,false))

scala> b.select($"bar").show
org.apache.spark.sql.AnalysisException: cannot resolve '`bar`' given input 
columns: [foo];;
[...snip...]

scala> b.where($"bar" === 20).show
+---+
|foo|
+---+
|  2|
+---+
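As I understand it (this is an assumption about the analyzer's behavior, not something I've verified in the Spark source), the analyzer resolves a missing attribute in a Filter against the child plan's output, applies the predicate there, and then re-projects down to the DataFrame's declared schema — which is why `b.where($"bar" === 20)` succeeds while `b.select($"bar")` does not. A minimal, plain-Scala sketch of that "resolve against the child, then re-project" idea, with no Spark dependency (all names here are illustrative, not Spark internals):

```scala
// A toy "row" keyed by column name, standing in for an InternalRow.
case class Row(values: Map[String, Int])

// The child relation "a" carries both columns...
val a: Seq[Row] = Seq(
  Row(Map("foo" -> 1, "bar" -> 10)),
  Row(Map("foo" -> 2, "bar" -> 20))
)

// ...while "b" is a projection of "a" down to just "foo".
val bSchema: Set[String] = Set("foo")

// b.where($"bar" === 20), modeled as: resolve "bar" against the child "a",
// filter there, then re-project to b's declared schema.
val result: Seq[Row] =
  a.filter(_.values("bar") == 20)                    // predicate sees the child's columns
   .map(r => Row(r.values.filter { case (k, _) => bSchema(k) }))  // restore b's schema
```

The practical takeaway for readable code is to apply the `where` before the `select`, so the filter only references columns that are visibly in scope at that point.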
