Selecting Based on Nested Values using Language Integrated Query Syntax

2014-10-28 Thread Brett Antonides
Hello,

Given the following example customers.json file:
{
  "name": "Sherlock Holmes",
  "customerNumber": 12345,
  "address": {
    "street": "221b Baker Street",
    "city": "London",
    "zipcode": "NW1 6XE",
    "country": "United Kingdom"
  }
},
{
  "name": "Big Bird",
  "customerNumber": 10001,
  "address": {
    "street": "123 Sesame Street",
    "city": "Hensonville",
    "zipcode": "56789",
    "country": "United States"
  }
}

And loading it with the following Scala code:
val sqlCtx = new org.apache.spark.sql.SQLContext(sc)
val customers = sqlCtx.jsonFile("hdfs:///customers.json")

How would I perform a select on the elements under the address?  I tried
the following and a number of variations without success:

customers.select('address.city).collect()

I know I can do this using the SQL syntax, but I'm interested in how to do
this using the LINQ syntax.

Thank you for your help.

Cheers,
Brett


Re: Selecting Based on Nested Values using Language Integrated Query Syntax

2014-10-28 Thread Michael Armbrust
Try: "address.city".attr



Re: Selecting Based on Nested Values using Language Integrated Query Syntax

2014-10-28 Thread Michael Armbrust
On Tue, Oct 28, 2014 at 2:19 PM, Corey Nolet  wrote:

> Is it possible to select if, say, there was an addresses field that had a
> json array?
>
You can get the Nth item with "address".attr.getItem(0). If you want to walk through the whole array, look at LATERAL VIEW EXPLODE in HiveQL.
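
For example, something like this untested sketch, assuming a hypothetical customers SchemaRDD with an array-typed addresses column:

// grab the first element of the hypothetical addresses array
customers.select("addresses".attr.getItem(0)).collect()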


Re: Selecting Based on Nested Values using Language Integrated Query Syntax

2014-10-28 Thread Corey Nolet
So it wouldn't be possible to have a json string like this:

{ "name":"John", "age":53, "locations": [{ "street":"Rodeo Dr",
"number":2300 }]}

And query all people who have a location with number = 2300?


Re: Selecting Based on Nested Values using Language Integrated Query Syntax

2014-10-28 Thread Michael Armbrust
You can do this:

$ sbt/sbt hive/console

scala> jsonRDD(sparkContext.parallelize("""{ "name":"John", "age":53, "locations": [{ "street":"Rodeo Dr", "number":2300 }]}""" :: Nil)).registerTempTable("people")

scala> sql("SELECT name FROM people LATERAL VIEW explode(locations) l AS location WHERE location.number = 2300").collect()
res0: Array[org.apache.spark.sql.Row] = Array([John])

Note that this will show a person more than once if they have more than one matching address.
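
If the duplicates matter for your use case, one untested option is to add DISTINCT to the query:

scala> sql("SELECT DISTINCT name FROM people LATERAL VIEW explode(locations) l AS location WHERE location.number = 2300").collect()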



Re: Selecting Based on Nested Values using Language Integrated Query Syntax

2014-10-28 Thread Corey Nolet
Michael,

Awesome, this is what I was looking for. So it's possible to use the Hive dialect in a regular SQL context? This is what was confusing to me: the docs kind of allude to it but don't directly point it out.



Re: Selecting Based on Nested Values using Language Integrated Query Syntax

2014-10-28 Thread Corey Nolet
Am I able to do a join on an exploded field?

Like if I have another object:

{ "streetNumber":"2300", "locationName":"The Big Building"} and I want to
join with the previous json by the locations[].number field- is that
possible?




Re: Selecting Based on Nested Values using Language Integrated Query Syntax

2014-10-28 Thread Michael Armbrust
On Tue, Oct 28, 2014 at 6:56 PM, Corey Nolet  wrote:

> Am I able to do a join on an exploded field?
>
> Like if I have another object:
>
> { "streetNumber":"2300", "locationName":"The Big Building"} and I want to
> join with the previous json by the locations[].number field- is that
> possible?
>

I'm not sure I fully understand the question, but once it's exploded it's a normal tuple and you can do any operations on it. The explode is just producing a new row for each element in the array.

>> Awesome, this is what I was looking for. So it's possible to use the Hive dialect in a regular SQL context? This is what was confusing to me: the docs kind of allude to it but don't directly point it out.
No, you need a HiveContext, as we use the actual Hive parser (SQLContext only exists as a separate entity so that people who don't want Hive's dependencies in their app can still use a limited subset of Spark SQL).
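
For example, outside of the sbt hive/console you would build the context yourself; a minimal untested sketch:

// HiveContext extends SQLContext, adding the HiveQL parser
val hctx = new org.apache.spark.sql.hive.HiveContext(sc)
hctx.jsonRDD(sc.parallelize("""{ "name":"John", "age":53, "locations": [{ "street":"Rodeo Dr", "number":2300 }]}""" :: Nil)).registerTempTable("people")
hctx.sql("SELECT name FROM people LATERAL VIEW explode(locations) l AS location WHERE location.number = 2300").collect()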


Re: Selecting Based on Nested Values using Language Integrated Query Syntax

2014-10-28 Thread Michael Armbrust
Can you println the .queryExecution of the SchemaRDD?

On Tue, Oct 28, 2014 at 7:43 PM, Corey Nolet  wrote:

> So this appears to work just fine:
>
> hctx.sql("SELECT p.name, p.age FROM people p LATERAL VIEW explode(locations) l AS location JOIN location5 lo ON l.number = lo.streetNumber WHERE location.number = '2300'").collect()
>
> But as soon as I try to join with another set based on a property from the exploded locations set, I get invalid attribute exceptions:
>
> hctx.sql("SELECT p.name, p.age, ln.locationName FROM people as p LATERAL VIEW explode(locations) l AS location JOIN locationNames ln ON location.number = ln.streetNumber WHERE location.number = '2300'").collect()


Re: Selecting Based on Nested Values using Language Integrated Query Syntax

2014-10-28 Thread Corey Nolet
scala> locations.queryExecution
warning: there were 1 feature warning(s); re-run with -feature for details
res28: _4.sqlContext.QueryExecution forSome { val _4: org.apache.spark.sql.SchemaRDD } =
== Parsed Logical Plan ==
SparkLogicalPlan (ExistingRdd [locationName#80,locationNumber#81], MappedRDD[99] at map at JsonRDD.scala:38)

== Analyzed Logical Plan ==
SparkLogicalPlan (ExistingRdd [locationName#80,locationNumber#81], MappedRDD[99] at map at JsonRDD.scala:38)

== Optimized Logical Plan ==
SparkLogicalPlan (ExistingRdd [locationName#80,locationNumber#81], MappedRDD[99] at map at JsonRDD.scala:38)

== Physical Plan ==
ExistingRdd [locationName#80,locationNumber#81], MappedRDD[99] at map at JsonRDD.scala:38

Code Generation: false
== RDD ==

scala> people.queryExecution
warning: there were 1 feature warning(s); re-run with -feature for details
res29: _5.sqlContext.QueryExecution forSome { val _5: org.apache.spark.sql.SchemaRDD } =
== Parsed Logical Plan ==
SparkLogicalPlan (ExistingRdd [age#9,locations#10,name#11], MappedRDD[28] at map at JsonRDD.scala:38)

== Analyzed Logical Plan ==
SparkLogicalPlan (ExistingRdd [age#9,locations#10,name#11], MappedRDD[28] at map at JsonRDD.scala:38)

== Optimized Logical Plan ==
SparkLogicalPlan (ExistingRdd [age#9,locations#10,name#11], MappedRDD[28] at map at JsonRDD.scala:38)

== Physical Plan ==
ExistingRdd [age#9,locations#10,name#11], MappedRDD[28] at map at JsonRDD.scala:38

Code Generation: false
== RDD ==



Here's what happens when I try executing the join and the lateral view explode():


14/10/28 23:05:35 INFO ParseDriver: Parse Completed

org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Unresolved attributes: 'p.name,'p.age, tree:

Project ['p.name,'p.age]
 Filter ('location.number = 2300)
  Join Inner, Some((location#110.number AS number#111 = 'ln.streetnumber))
   Generate explode(locations#10), true, false, Some(l)
    LowerCaseSchema
     Subquery p
      Subquery people
       SparkLogicalPlan (ExistingRdd [age#9,locations#10,name#11], MappedRDD[28] at map at JsonRDD.scala:38)
   LowerCaseSchema
    Subquery ln
     Subquery locationNames
      SparkLogicalPlan (ExistingRdd [locationName#80,locationNumber#81], MappedRDD[99] at map at JsonRDD.scala:38)

at org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$$anonfun$apply$1.applyOrElse(Analyzer.scala:72)
at org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$$anonfun$apply$1.applyOrElse(Analyzer.scala:70)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:165)
at org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:156)
at org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$.apply(Analyzer.scala:70)
at org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$.apply(Analyzer.scala:68)
at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1$$anonfun$apply$2.apply(RuleExecutor.scala:61)
at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1$$anonfun$apply$2.apply(RuleExecutor.scala:59)
at scala.collection.IndexedSeqOptimized$class.foldl(IndexedSeqOptimized.scala:51)
at scala.collection.IndexedSeqOptimized$class.foldLeft(IndexedSeqOptimized.scala:60)
at scala.collection.mutable.WrappedArray.foldLeft(WrappedArray.scala:34)
at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1.apply(RuleExecutor.scala:59)
at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1.apply(RuleExecutor.scala:51)
at scala.collection.immutable.List.foreach(List.scala:318)
at org.apache.spark.sql.catalyst.rules.RuleExecutor.apply(RuleExecutor.scala:51)
at org.apache.spark.sql.SQLContext$QueryExecution.analyzed$lzycompute(SQLContext.scala:397)
at org.apache.spark.sql.SQLContext$QueryExecution.analyzed(SQLContext.scala:397)
at org.apache.spark.sql.hive.HiveContext$QueryExecution.optimizedPlan$lzycompute(HiveContext.scala:358)
at org.apache.spark.sql.hive.HiveContext$QueryExecution.optimizedPlan(HiveContext.scala:357)
at org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan$lzycompute(SQLContext.scala:402)
at org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan(SQLContext.scala:400)


Re: Selecting Based on Nested Values using Language Integrated Query Syntax

2014-10-29 Thread Michael Armbrust
We are working on more helpful error messages, but in the meantime let me
explain how to read this output.

org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Unresolved attributes: 'p.name,'p.age, tree:

Project ['p.name,'p.age]
 Filter ('location.number = 2300)
  Join Inner, Some((location#110.number AS number#111 = 'ln.streetnumber))
   Generate explode(locations#10), true, false, Some(l)
    LowerCaseSchema
     Subquery p
      Subquery people
       SparkLogicalPlan (ExistingRdd [age#9,locations#10,name#11], MappedRDD[28] at map at JsonRDD.scala:38)
   LowerCaseSchema
    Subquery ln
     Subquery locationNames
      SparkLogicalPlan (ExistingRdd [locationName#80,locationNumber#81], MappedRDD[99] at map at JsonRDD.scala:38)

'Ticked fields (like 'p.name) indicate a failure to resolve, whereas numbered attributes (like locations#10) have been resolved. (The numbers are globally unique and can be used to disambiguate where a column is coming from when the names are the same.)

Resolution happens bottom up, so the first place there is a problem is 'ln.streetnumber, which prevents the rest of the query from resolving. If you look at the subquery ln, it is only producing two columns: locationName and locationNumber. So "streetnumber" is not valid.
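
So, assuming locationNumber is the column you meant to join on, a corrected query (untested) would be something like:

hctx.sql("SELECT p.name, p.age, ln.locationName FROM people p LATERAL VIEW explode(locations) l AS location JOIN locationNames ln ON location.number = ln.locationNumber WHERE location.number = '2300'").collect()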



Re: Selecting Based on Nested Values using Language Integrated Query Syntax

2014-11-06 Thread Corey Nolet
Michael,

Thanks for the explanation. I was able to get this running.
