1.0.1 does not have support for outer joins (added in 1.1). Can you try the 1.1 branch?
On Wed, Sep 10, 2014 at 9:28 PM, boyingk...@163.com <boyingk...@163.com> wrote:
> Hi, Michael:
>
> I think Arthur.hk.chan <arthur.hk.c...@gmail.com> isn't here now, so I can
> show something:
> 1) My Spark version is 1.0.1.
> 2) When I use a multiple join, like this:
>
> sql("SELECT * FROM youhao_data left join youhao_age on
> (youhao_data.rowkey=youhao_age.rowkey) left join youhao_totalKiloMeter on
> (youhao_age.rowkey=youhao_totalKiloMeter.rowkey)")
>
> youhao_data, youhao_age, and youhao_totalKiloMeter were registered via
> registerAsTable.
>
> I get this exception:
>
> Exception in thread "main" java.lang.RuntimeException: [1.90] failure:
> ``UNION'' expected but `left' found
>
> SELECT * FROM youhao_data left join youhao_age on
> (youhao_data.rowkey=youhao_age.rowkey) left join youhao_totalKiloMeter on
> (youhao_age.rowkey=youhao_totalKiloMeter.rowkey)
>
> ^
> at scala.sys.package$.error(package.scala:27)
> at org.apache.spark.sql.catalyst.SqlParser.apply(SqlParser.scala:60)
> at org.apache.spark.sql.SQLContext.parseSql(SQLContext.scala:69)
> at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:181)
> at org.apache.spark.examples.sql.SparkSQLHBaseRelation$.main(SparkSQLHBaseRelation.scala:140)
> at org.apache.spark.examples.sql.SparkSQLHBaseRelation.main(SparkSQLHBaseRelation.scala)
>
> ------------------------------
> boyingk...@163.com
>
> *From:* Michael Armbrust <mich...@databricks.com>
> *Date:* 2014-09-11 00:28
> *To:* arthur.hk.c...@gmail.com <arthur.hk.c...@gmail.com>
> *CC:* arunshell87 <shell.a...@gmail.com>; u...@spark.incubator.apache.org
> *Subject:* Re: Spark SQL -- more than two tables for join
>
> What version of Spark SQL are you running here? I think a lot of your
> concerns have likely been addressed in more recent versions of the code /
> documentation. (Spark 1.1 should be published in the next few days.)
>
> In particular, for serious applications you should use a HiveContext and
> HiveQL, as this is a much more complete implementation of a SQL parser.
> The one in SQLContext is only suggested if the Hive dependencies conflict
> with your application.
>
>> 1) spark sql does not support multiple join
>
> This is not true. What problem were you running into?
>
>> 2) spark left join: has performance issue
>
> Can you describe your data and query more?
>
>> 3) spark sql's cache table: does not support two-tier query
>
> I'm not sure what you mean here.
>
>> 4) spark sql does not support repartition
>
> You can repartition SchemaRDDs in the same way as normal RDDs.
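For reference, the chained three-way LEFT JOIN in the original question is ordinary SQL; only the Spark 1.0.1 parser rejected it. As a sanity check of the query shape (not of Spark itself), here is the same statement run against sqlite3, with made-up toy tables and data standing in for the three registered tables:

```python
import sqlite3

# Toy stand-ins for the three tables from the thread; rowkey mirrors
# the join key used in the original query.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE youhao_data (rowkey TEXT, payload TEXT)")
cur.execute("CREATE TABLE youhao_age (rowkey TEXT, age INTEGER)")
cur.execute("CREATE TABLE youhao_totalKiloMeter (rowkey TEXT, km REAL)")
cur.executemany("INSERT INTO youhao_data VALUES (?, ?)",
                [("r1", "a"), ("r2", "b")])
cur.execute("INSERT INTO youhao_age VALUES ('r1', 30)")
cur.execute("INSERT INTO youhao_totalKiloMeter VALUES ('r1', 12.5)")

# Same shape as the failing Spark SQL statement: two chained LEFT JOINs.
rows = cur.execute("""
    SELECT * FROM youhao_data
    LEFT JOIN youhao_age
      ON youhao_data.rowkey = youhao_age.rowkey
    LEFT JOIN youhao_totalKiloMeter
      ON youhao_age.rowkey = youhao_totalKiloMeter.rowkey
    ORDER BY youhao_data.rowkey
""").fetchall()

# r1 matches both side tables; r2 has no match in youhao_age, so its
# joined columns come back as NULL (None in Python).
print(rows)
```

This confirms the statement parses and evaluates with standard left-join semantics, which is also what HiveQL (via HiveContext) accepts in Spark 1.1.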