I am using spark-1.6.1. I create a data frame from a very complicated JSON file. I would assume that the query planner would treat both versions of my transformation chain the same way.

// org.apache.spark.sql.AnalysisException: Cannot resolve column name "tag" among (actor, body, generator, pip, id, inReplyTo, link, object, objectType, postedTime, provider, retweetCount, twitter_entities, verb);
// DataFrame emptyDF = rawDF.selectExpr("*", "pip.rules.tag")
//                          .filter(rawDF.col(tagCol).isNull());
DataFrame emptyDF1 = rawDF.selectExpr("*", "pip.rules.tag");
DataFrame emptyDF = emptyDF1.filter(emptyDF1.col("tag").isNull());

Here is the schema for the gnip structure

 |-- pip: struct (nullable = true)
 |    |-- _profile: struct (nullable = true)
 |    |    |-- topics: array (nullable = true)
 |    |    |    |-- element: string (containsNull = true)
 |    |-- rules: array (nullable = true)
 |    |    |-- element: struct (containsNull = true)
 |    |    |    |-- tag: string (nullable = true)

Is this a bug?

Andy