Re: strange behavior when I chain data frame transformations

2016-05-13 Thread Andy Davidson
Hi Ted


It seems really strange. It seems like in the version where I used 2 data
frames, Spark added "as(tag)". (Which is really nice.)

Odd that I got different behavior

Is this a bug?

Kind regards

Andy



From:  Ted Yu <yuzhih...@gmail.com>
Date:  Friday, May 13, 2016 at 12:38 PM
To:  Andrew Davidson <a...@santacruzintegration.com>
Cc:  "user @spark" <user@spark.apache.org>
Subject:  Re: strange behavior when I chain data frame transformations

> In the structure shown, tag is under element.
> 
> I wonder if that was a factor.
> 
> On Fri, May 13, 2016 at 11:49 AM, Andy Davidson
> <a...@santacruzintegration.com> wrote:
>> I am using spark-1.6.1.
>> 
>> I create a data frame from a very complicated JSON file. I would assume that
>> the query planner would treat both versions of my transformation chain the
>> same way.
>> 
>> 
>> // org.apache.spark.sql.AnalysisException: Cannot resolve column name "tag"
>> // among (actor, body, generator, pip, id, inReplyTo, link, object, objectType,
>> // postedTime, provider, retweetCount, twitter_entities, verb);
>> 
>> // DataFrame emptyDF = rawDF.selectExpr("*", "pip.rules.tag")
>> //         .filter(rawDF.col(tagCol).isNull());
>> 
>> DataFrame emptyDF1 = rawDF.selectExpr("*", "pip.rules.tag");
>> DataFrame emptyDF = emptyDF1.filter(emptyDF1.col("tag").isNull());
>> 
>> 
>> 
>> Here is the schema for the gnip structure
>> 
>>  |-- pip: struct (nullable = true)
>>  |    |-- _profile: struct (nullable = true)
>>  |    |    |-- topics: array (nullable = true)
>>  |    |    |    |-- element: string (containsNull = true)
>>  |    |-- rules: array (nullable = true)
>>  |    |    |-- element: struct (containsNull = true)
>>  |    |    |    |-- tag: string (nullable = true)
>> 
>> 
>> 
>> Is this a bug?
>> 
>> 
>> 
>> Andy
>> 
>> 
> 




Re: strange behavior when I chain data frame transformations

2016-05-13 Thread Ted Yu
In the structure shown, tag is under element.

I wonder if that was a factor.

On Fri, May 13, 2016 at 11:49 AM, Andy Davidson <
a...@santacruzintegration.com> wrote:

> I am using spark-1.6.1.
>
> I create a data frame from a very complicated JSON file. I would assume
> that the query planner would treat both versions of my transformation chain
> the same way.
>
>
> // org.apache.spark.sql.AnalysisException: Cannot resolve column name
> // "tag" among (actor, body, generator, pip, id, inReplyTo, link, object,
> // objectType, postedTime, provider, retweetCount, twitter_entities, verb);
>
> // DataFrame emptyDF = rawDF.selectExpr("*", "pip.rules.tag")
> //         .filter(rawDF.col(tagCol).isNull());
>
> DataFrame emptyDF1 = rawDF.selectExpr("*", "pip.rules.tag");
> DataFrame emptyDF = emptyDF1.filter(emptyDF1.col("tag").isNull());
>
>
> Here is the schema for the gnip structure
>
>  |-- pip: struct (nullable = true)
>  |    |-- _profile: struct (nullable = true)
>  |    |    |-- topics: array (nullable = true)
>  |    |    |    |-- element: string (containsNull = true)
>  |    |-- rules: array (nullable = true)
>  |    |    |-- element: struct (containsNull = true)
>  |    |    |    |-- tag: string (nullable = true)
>
>
> Is this a bug?
>
>
> Andy
>
>
>


strange behavior when I chain data frame transformations

2016-05-13 Thread Andy Davidson
I am using spark-1.6.1.

I create a data frame from a very complicated JSON file. I would assume that
the query planner would treat both versions of my transformation chain the
same way.


// org.apache.spark.sql.AnalysisException: Cannot resolve column name "tag"
// among (actor, body, generator, pip, id, inReplyTo, link, object, objectType,
// postedTime, provider, retweetCount, twitter_entities, verb);

// DataFrame emptyDF = rawDF.selectExpr("*", "pip.rules.tag")
//         .filter(rawDF.col(tagCol).isNull());

DataFrame emptyDF1 = rawDF.selectExpr("*", "pip.rules.tag");
DataFrame emptyDF = emptyDF1.filter(emptyDF1.col("tag").isNull());
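
One possible reading of the difference, offered as a guess rather than a confirmed diagnosis: in the commented-out chain, col(tagCol) is called on rawDF, and the column list in the exception is exactly rawDF's top-level columns, which do not include "tag". In the two-step version, col("tag") is called on emptyDF1, which already carries the extra column. The plain-Java sketch below only models that name lookup. It does not use Spark, and ToyFrame, selectAllPlus, and the shortened column list are hypothetical names invented for illustration. The assumption that the new column is named after the last path segment is taken from the behavior reported in this thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ColumnResolutionSketch {

    /** Hypothetical stand-in for a DataFrame: it tracks top-level column names only. */
    static final class ToyFrame {
        private final List<String> columns;

        ToyFrame(String... columns) {
            this.columns = new ArrayList<>(Arrays.asList(columns));
        }

        /**
         * Models selectExpr("*", path): keeps every existing column and adds one
         * named after the last path segment (an assumption based on this thread).
         */
        ToyFrame selectAllPlus(String path) {
            List<String> next = new ArrayList<>(columns);
            next.add(path.substring(path.lastIndexOf('.') + 1));
            return new ToyFrame(next.toArray(new String[0]));
        }

        /** Models col(name): the name is looked up in this frame's own columns only. */
        String col(String name) {
            if (!columns.contains(name)) {
                throw new IllegalArgumentException(
                        "Cannot resolve column name \"" + name + "\" among " + columns);
            }
            return name;
        }
    }

    /** Chained form: the lookup runs against rawDF, which has no "tag" column. */
    static boolean chainedLookupFails() {
        ToyFrame rawDF = new ToyFrame("actor", "body", "pip", "verb");
        try {
            rawDF.col("tag");
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }

    /** Two-step form: the lookup runs against the frame returned by the select. */
    static boolean twoStepLookupWorks() {
        ToyFrame rawDF = new ToyFrame("actor", "body", "pip", "verb");
        ToyFrame emptyDF1 = rawDF.selectAllPlus("pip.rules.tag");
        return "tag".equals(emptyDF1.col("tag"));
    }

    public static void main(String[] args) {
        System.out.println("chained lookup fails: " + chainedLookupFails());
        System.out.println("two-step lookup works: " + twoStepLookupWorks());
    }
}
```

If this reading is right, the two versions differ before the query planner is involved at all, because each col(...) call is evaluated against whichever frame it is invoked on.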



Here is the schema for the gnip structure

 |-- pip: struct (nullable = true)
 |    |-- _profile: struct (nullable = true)
 |    |    |-- topics: array (nullable = true)
 |    |    |    |-- element: string (containsNull = true)
 |    |-- rules: array (nullable = true)
 |    |    |-- element: struct (containsNull = true)
 |    |    |    |-- tag: string (nullable = true)



Is this a bug?



Andy