1.0.1 does not have support for outer joins (added in 1.1). Can you try the 1.1 branch?
On Wed, Sep 10, 2014 at 9:28 PM, boyingk...@163.com <boyingk...@163.com> wrote:
> Hi, Michael:
>
> I think Arthur.hk.chan <arthur.hk.c...@gmail.com> isn't here now, so I can
> show something:
> 1) My Spark version is 1.0.1.
> 2) When I use a multiple join, like this:
>
> sql("SELECT * FROM youhao_data left join youhao_age on
> (youhao_data.rowkey=youhao_age.rowkey) left join youhao_totalKiloMeter on
> (youhao_age.rowkey=youhao_totalKiloMeter.rowkey)")
>
> youhao_data, youhao_age, and youhao_totalKiloMeter were registered via
> registerAsTable.
>
> I get this exception:
>
> Exception in thread "main" java.lang.RuntimeException: [1.90] failure:
> ``UNION'' expected but `left' found
>
> SELECT * FROM youhao_data left join youhao_age on
> (youhao_data.rowkey=youhao_age.rowkey) left join youhao_totalKiloMeter on
> (youhao_age.rowkey=youhao_totalKiloMeter.rowkey)
>
> ^
> at scala.sys.package$.error(package.scala:27)
> at org.apache.spark.sql.catalyst.SqlParser.apply(SqlParser.scala:60)
> at org.apache.spark.sql.SQLContext.parseSql(SQLContext.scala:69)
> at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:181)
> at org.apache.spark.examples.sql.SparkSQLHBaseRelation$.main(SparkSQLHBaseRelation.scala:140)
> at org.apache.spark.examples.sql.SparkSQLHBaseRelation.main(SparkSQLHBaseRelation.scala)
>
> ------------------------------
> boyingk...@163.com
>
> *From:* Michael Armbrust <mich...@databricks.com>
> *Date:* 2014-09-11 00:28
> *To:* arthur.hk.c...@gmail.com <arthur.hk.c...@gmail.com>
> *CC:* arunshell87 <shell.a...@gmail.com>; u...@spark.incubator.apache.org
> *Subject:* Re: Spark SQL -- more than two tables for join
>
> What version of Spark SQL are you running here? I think a lot of your
> concerns have likely been addressed in more recent versions of the code /
> documentation. (Spark 1.1 should be published in the next few days.)
>
> In particular, for serious applications you should use a HiveContext and
> HiveQL, as this is a much more complete implementation of a SQL parser.
> The one in SQLContext is only suggested if the Hive dependencies conflict
> with your application.
>
>> 1) spark sql does not support multiple join
>
> This is not true. What problem were you running into?
>
>> 2) spark left join: has performance issue
>
> Can you describe your data and query more?
>
>> 3) spark sql's cache table: does not support two-tier query
>
> I'm not sure what you mean here.
>
>> 4) spark sql does not support repartition
>
> You can repartition SchemaRDDs in the same way as normal RDDs.
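For reference, the chained three-way LEFT JOIN in the original question is ordinary SQL; only the Spark 1.0.1 parser rejected it. As a sanity check of the query shape (not of Spark itself), here is the same statement run against sqlite3, with made-up toy tables and data standing in for the three registered tables:

```python
import sqlite3

# Toy stand-ins for the three tables from the thread; rowkey mirrors
# the join key used in the original query.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE youhao_data (rowkey TEXT, payload TEXT)")
cur.execute("CREATE TABLE youhao_age (rowkey TEXT, age INTEGER)")
cur.execute("CREATE TABLE youhao_totalKiloMeter (rowkey TEXT, km REAL)")
cur.executemany("INSERT INTO youhao_data VALUES (?, ?)",
                [("r1", "a"), ("r2", "b")])
cur.execute("INSERT INTO youhao_age VALUES ('r1', 30)")
cur.execute("INSERT INTO youhao_totalKiloMeter VALUES ('r1', 12.5)")

# Same shape as the failing Spark SQL statement: two chained LEFT JOINs.
rows = cur.execute("""
    SELECT * FROM youhao_data
    LEFT JOIN youhao_age
      ON youhao_data.rowkey = youhao_age.rowkey
    LEFT JOIN youhao_totalKiloMeter
      ON youhao_age.rowkey = youhao_totalKiloMeter.rowkey
    ORDER BY youhao_data.rowkey
""").fetchall()

# r1 matches both side tables; r2 has no match in youhao_age, so its
# joined columns come back as NULL (None in Python).
print(rows)
```

This confirms the statement parses and evaluates with standard left-join semantics, which is also what HiveQL (via HiveContext) accepts in Spark 1.1.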