I'm using CDH 5.1.0 with Spark 1.0.0. There is a spark-sql 1.0.0 artifact in Cloudera's Maven repository, and after putting it on the classpath I can use Spark SQL in my application.
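For reference, this is roughly how the dependency can be declared in sbt. This is a sketch only: the repository URL and the CDH-suffixed version string are my assumptions and may need adjusting for your setup.

    // build.sbt -- sketch; the repo URL and version suffix are assumptions
    resolvers += "cloudera" at "https://repository.cloudera.com/artifactory/cloudera-repos/"
    libraryDependencies += "org.apache.spark" %% "spark-sql" % "1.0.0-cdh5.1.0"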
One issue is that I can't get the join to run as a hash join. It gives a CartesianProduct (BroadcastNestedLoopJoin) when I join two SchemaRDDs as follows:

    scala> val event = sqlContext.parquetFile("/events/2014-09-28")
             .select('MediaEventID)
             .join(log, joinType = LeftOuter, on = Some("event.eventid".attr === "log.eventid".attr))

    == Query Plan ==
    BroadcastNestedLoopJoin LeftOuter, Some(('event.eventid = 'log.eventid))
      ParquetTableScan [eventid#130L], (ParquetRelation /events/2014-09-28), None
      ParquetTableScan [eventid#125L,listid#126L,isfavorite#127], (ParquetRelation /logs/eventdt=2014-09-28), None

If I join it with another SchemaRDD, I likewise get a CartesianProduct. Is it possible to make the join run as a hash join in Spark 1.0.0?
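For what it's worth, the same join can also be expressed by registering both SchemaRDDs as tables and writing the equi-join in SQL, in case the planner treats a parsed join condition differently from the DSL one. This is a sketch only: the table names are mine, and the paths and column names just mirror the plan output above.

    // Sketch: register both SchemaRDDs as tables and express the same
    // left outer equi-join in SQL (table names are illustrative).
    val event = sqlContext.parquetFile("/events/2014-09-28")
    val log = sqlContext.parquetFile("/logs/eventdt=2014-09-28")
    event.registerAsTable("event")
    log.registerAsTable("log")
    val joined = sqlContext.sql(
      "SELECT e.eventid FROM event e LEFT OUTER JOIN log l ON e.eventid = l.eventid")
    // Inspect which join operator the planner actually chooses:
    println(joined.queryExecution.executedPlan)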