Re: Shouldn't the UNION of SchemaRDDs produce SchemaRDD ?

2014-03-31 Thread Michael Armbrust
> * unionAll preserves duplicates, whereas union does not

This is true. If you want to eliminate duplicate items, you should follow the union with a distinct().

> * SQL union and unionAll result in the same output format, i.e. another SQL, versus different RDD types here.
> * Understand the existing union
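
To illustrate the suggestion of following a union with distinct(), here is a minimal sketch on plain RDDs rather than SchemaRDDs. The object name, the sample values, and the local master setting are assumptions made only for this example; it is not code from the thread.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical example object; names and data are illustrative only.
object UnionDistinctSketch {
  def main(args: Array[String]): Unit = {
    // Assumption: run locally, outside any cluster.
    val conf = new SparkConf().setAppName("union-distinct-sketch").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      val a = sc.parallelize(Seq(1, 2, 3))
      val b = sc.parallelize(Seq(3, 4))

      // RDD.union concatenates the two RDDs and keeps duplicates.
      val withDuplicates = a.union(b)

      // distinct() removes the repeated element afterwards.
      val deduplicated = withDuplicates.distinct()

      println(withDuplicates.count()) // 5 elements: 1, 2, 3, 3, 4
      println(deduplicated.count())   // 4 elements: 1, 2, 3, 4
    } finally {
      sc.stop()
    }
  }
}
```

The result of distinct() on a plain RDD is again a plain RDD, so this sketch says nothing about whether the schema of a SchemaRDD is preserved, which is the question the thread raises.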

Shouldn't the UNION of SchemaRDDs produce SchemaRDD ?

2014-03-30 Thread Manoj Samel
Hi, I am trying SparkSQL based on the example in the docs ...

val people = sc.textFile("/data/spark/examples/src/main/resources/people.txt").map(_.split(",")).map(p => Person(p(0), p(1).trim.toInt))
val olderThanTeans = people.where('age > 19)
val youngerThanTeans = people.where('age < 13)
val