Yes, and even today a CBO (e.g. in Oracle) will still require hints in some 
cases, so I think it is more like:

RBO -> RBO + Hints -> CBO + Hints. Most relational databases hit a significant 
number of corner cases where the CBO’s plan simply doesn’t do what you want. I 
don’t know enough about Spark SQL to comment on whether the same problems would 
afflict Spark.




> On 31 Mar 2016, at 15:54, Yong Zhang <java8...@hotmail.com> wrote:
> 
> I agree that there won't be a generic solution for these kinds of cases.
> 
> With no CBO coming from Spark or the Hadoop ecosystem in the near future, 
> maybe Spark DataFrame/SQL should support more hints from the end user, since 
> in these cases end users will be smart enough to tell the engine the correct 
> way to do it.
> 
> Didn't the relational DBs follow exactly the same path? RBO -> RBO + Hints -> CBO?
> 
> Yong
> 
> Date: Thu, 31 Mar 2016 16:07:14 +0530
> Subject: Re: SPARK-13900 - Join with simple OR conditions take too long
> From: hemant9...@gmail.com <mailto:hemant9...@gmail.com>
> To: ashokkumar.rajend...@gmail.com <mailto:ashokkumar.rajend...@gmail.com>
> CC: user@spark.apache.org <mailto:user@spark.apache.org>
> 
> Hi Ashok,
> 
> That's interesting. 
> 
> As I understand it, a nested loop join (which produces m x n rows) is 
> performed on tables A and B, and then each row is evaluated to see whether 
> any of the conditions is met. You are asking that Spark instead do a 
> BroadcastHashJoin on each equality condition in parallel and then union the 
> results, as you are doing in a different query. 
> 
> If we leave aside parallelism for a moment, then theoretically the time 
> taken by the nested loop join would vary little as the number of conditions 
> increases, while the time taken by the solution you are suggesting would 
> increase linearly with the number of conditions. So, when there are too many 
> conditions, the nested loop join would be faster than the solution you 
> suggest. Now the question is, how should Spark decide when to do what? 
> 
> 
> Hemant Bhanawat <https://www.linkedin.com/in/hemant-bhanawat-92a3811>
> www.snappydata.io <http://www.snappydata.io/> 
> 
> On Thu, Mar 31, 2016 at 2:28 PM, ashokkumar rajendran 
> <ashokkumar.rajend...@gmail.com <mailto:ashokkumar.rajend...@gmail.com>> 
> wrote:
> Hi,
> 
> I have filed ticket SPARK-13900. There was an initial reply from a developer, 
> but there has been no further reply on this. How can we do multiple hash 
> joins together for joins based on OR conditions? Could someone please advise 
> on how we can fix this? 
> 
> Regards
> Ashok
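
The trade-off discussed in the quoted thread (one nested loop pass versus one 
hash join per OR branch, unioned) can be sketched in plain Python. This is 
only an illustration of the two strategies, not Spark's implementation: the 
function names, the row layout (lists of dicts) and the sample data are all 
hypothetical, and SQL NULL semantics are ignored.

```python
from collections import defaultdict


def nested_loop_join(a_rows, b_rows, key_pairs):
    """Compare every A row with every B row (m x n comparisons) and keep
    the index pairs where at least one equality condition holds."""
    out = []
    for i, a in enumerate(a_rows):
        for j, b in enumerate(b_rows):
            if any(a[ka] == b[kb] for ka, kb in key_pairs):
                out.append((i, j))
    return out


def union_of_hash_joins(a_rows, b_rows, key_pairs):
    """Run one hash join per equality condition and union the matches.
    The work grows with the number of conditions (one build and one
    probe pass per condition)."""
    matched = set()  # a set drops pairs that satisfy several conditions
    for ka, kb in key_pairs:
        # Build side: index B by the join key of this condition.
        index = defaultdict(list)
        for j, b in enumerate(b_rows):
            index[b[kb]].append(j)
        # Probe side: look up each A row in the index.
        for i, a in enumerate(a_rows):
            for j in index.get(a[ka], ()):
                matched.add((i, j))
    return sorted(matched)


# Hypothetical sample tables; the join condition is a.x = b.p OR a.y = b.q.
a_rows = [{"x": 1, "y": 10}, {"x": 2, "y": 20}, {"x": 3, "y": 30}]
b_rows = [{"p": 1, "q": 10}, {"p": 9, "q": 20}, {"p": 3, "q": 99}]
key_pairs = [("x", "p"), ("y", "q")]

nested = nested_loop_join(a_rows, b_rows, key_pairs)
unioned = union_of_hash_joins(a_rows, b_rows, key_pairs)
```

Both functions return the same matching pairs. The nested loop always pays 
for m x n comparisons regardless of how many conditions there are, while the 
union approach pays one build-and-probe pass per condition, which is the 
linear growth described above.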
