Re: strange behavior when I chain data frame transformations
Hi Ted

It seems really strange. It seems that in the version where I used 2 data frames, Spark added "as(tag)" (which is really nice). Odd that I got different behavior.

Is this a bug?

Kind regards

Andy

From: Ted Yu <yuzhih...@gmail.com>
Date: Friday, May 13, 2016 at 12:38 PM
To: Andrew Davidson <a...@santacruzintegration.com>
Cc: "user @spark" <user@spark.apache.org>
Subject: Re: strange behavior when I chain data frame transformations

> In the structure shown, tag is under element.
>
> I wonder if that was a factor.
>
> On Fri, May 13, 2016 at 11:49 AM, Andy Davidson
> <a...@santacruzintegration.com> wrote:
>> I am using spark-1.6.1.
>>
>> I create a data frame from a very complicated JSON file. I would assume
>> that the query planner would treat both versions of my transformation
>> chains the same way.
>>
>> // org.apache.spark.sql.AnalysisException: Cannot resolve column name "tag"
>> // among (actor, body, generator, pip, id, inReplyTo, link, object, objectType,
>> // postedTime, provider, retweetCount, twitter_entities, verb);
>>
>> // DataFrame emptyDF = rawDF.selectExpr("*", "pip.rules.tag")
>> //         .filter(rawDF.col(tagCol).isNull());
>>
>> DataFrame emptyDF1 = rawDF.selectExpr("*", "pip.rules.tag");
>> DataFrame emptyDF = emptyDF1.filter(emptyDF1.col("tag").isNull());
>>
>> Here is the schema for the gnip structure
>>
>>  |-- pip: struct (nullable = true)
>>  |    |-- _profile: struct (nullable = true)
>>  |    |    |-- topics: array (nullable = true)
>>  |    |    |    |-- element: string (containsNull = true)
>>  |    |-- rules: array (nullable = true)
>>  |    |    |-- element: struct (containsNull = true)
>>  |    |    |    |-- tag: string (nullable = true)
>>
>> Is this a bug?
>>
>> Andy
Re: strange behavior when I chain data frame transformations
In the structure shown, tag is under element.

I wonder if that was a factor.

On Fri, May 13, 2016 at 11:49 AM, Andy Davidson <a...@santacruzintegration.com> wrote:

> I am using spark-1.6.1.
>
> I create a data frame from a very complicated JSON file. I would assume
> that the query planner would treat both versions of my transformation
> chains the same way.
>
> // org.apache.spark.sql.AnalysisException: Cannot resolve column name "tag"
> // among (actor, body, generator, pip, id, inReplyTo, link, object, objectType,
> // postedTime, provider, retweetCount, twitter_entities, verb);
>
> // DataFrame emptyDF = rawDF.selectExpr("*", "pip.rules.tag")
> //         .filter(rawDF.col(tagCol).isNull());
>
> DataFrame emptyDF1 = rawDF.selectExpr("*", "pip.rules.tag");
> DataFrame emptyDF = emptyDF1.filter(emptyDF1.col("tag").isNull());
>
> Here is the schema for the gnip structure
>
>  |-- pip: struct (nullable = true)
>  |    |-- _profile: struct (nullable = true)
>  |    |    |-- topics: array (nullable = true)
>  |    |    |    |-- element: string (containsNull = true)
>  |    |-- rules: array (nullable = true)
>  |    |    |-- element: struct (containsNull = true)
>  |    |    |    |-- tag: string (nullable = true)
>
> Is this a bug?
>
> Andy
strange behavior when I chain data frame transformations
I am using spark-1.6.1.

I create a data frame from a very complicated JSON file. I would assume that the query planner would treat both versions of my transformation chains the same way.

// org.apache.spark.sql.AnalysisException: Cannot resolve column name "tag"
// among (actor, body, generator, pip, id, inReplyTo, link, object, objectType,
// postedTime, provider, retweetCount, twitter_entities, verb);

// DataFrame emptyDF = rawDF.selectExpr("*", "pip.rules.tag")
//         .filter(rawDF.col(tagCol).isNull());

DataFrame emptyDF1 = rawDF.selectExpr("*", "pip.rules.tag");
DataFrame emptyDF = emptyDF1.filter(emptyDF1.col("tag").isNull());

Here is the schema for the gnip structure

 |-- pip: struct (nullable = true)
 |    |-- _profile: struct (nullable = true)
 |    |    |-- topics: array (nullable = true)
 |    |    |    |-- element: string (containsNull = true)
 |    |-- rules: array (nullable = true)
 |    |    |-- element: struct (containsNull = true)
 |    |    |    |-- tag: string (nullable = true)

Is this a bug?

Andy
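
One possible reading of the exception: the message lists only rawDF's top-level columns, which suggests the commented-out chain failed because rawDF.col(...) is resolved against rawDF itself, not against the result of selectExpr. The sketch below is a toy model in plain Java with no Spark dependency. Frame, col, selectStarPlus and resolves are hypothetical stand-ins invented for illustration, not Spark's API, and the column list is shortened. It assumes, as the working two-step version implies, that the extra column ends up named after the last path segment.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ColumnResolutionSketch {

    /** Toy stand-in for a data frame: only top-level column names are modeled. */
    static final class Frame {
        private final List<String> columns;

        Frame(List<String> columns) {
            this.columns = new ArrayList<>(columns);
        }

        /** Models frame.col(name): the lookup sees only this frame's own columns. */
        String col(String name) {
            if (!columns.contains(name)) {
                throw new IllegalArgumentException(
                        "Cannot resolve column name \"" + name + "\" among " + columns);
            }
            return name;
        }

        /** Models selectExpr("*", path): existing columns plus one named after the last path segment. */
        Frame selectStarPlus(String path) {
            List<String> out = new ArrayList<>(columns);
            out.add(path.substring(path.lastIndexOf('.') + 1));
            return new Frame(out);
        }
    }

    /** Returns true if frame.col(name) succeeds in this toy model. */
    static boolean resolves(Frame frame, String name) {
        try {
            frame.col(name);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Shortened, illustrative column list; the real rawDF has more columns.
        Frame rawDF = new Frame(Arrays.asList("actor", "body", "pip", "verb"));
        Frame emptyDF1 = rawDF.selectStarPlus("pip.rules.tag");

        // Chained form: the filter asks rawDF for "tag", which rawDF does not have.
        System.out.println(resolves(rawDF, "tag"));    // false
        // Two-step form: the filter asks emptyDF1, which does have "tag".
        System.out.println(resolves(emptyDF1, "tag")); // true
    }
}
```

If that reading is right, the chained form might work in Spark itself when the filter uses a column reference that is not bound to rawDF, for example org.apache.spark.sql.functions.col("tag"). That suggestion has not been tested against 1.6.1.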