Yes, and even today the CBO (e.g. in Oracle) still requires hints in some cases, so I think it is more like:
RBO -> RBO + Hints -> CBO + Hints

Most relational databases run into a significant number of corner cases where CBO plans simply don't do what you would want. I don't know enough about Spark SQL to comment on whether the same problems would afflict Spark.

> On 31 Mar 2016, at 15:54, Yong Zhang <java8...@hotmail.com> wrote:
>
> I agree that there won't be a generic solution for these kinds of cases.
>
> Without a CBO from Spark or the Hadoop ecosystem in the near future, maybe Spark DataFrame/SQL should support more hints from the end user, since in these cases end users will be smart enough to tell the engine the correct way to do it.
>
> Weren't the relational DBs following exactly the same path? RBO -> RBO + Hints -> CBO?
>
> Yong
>
> Date: Thu, 31 Mar 2016 16:07:14 +0530
> Subject: Re: SPARK-13900 - Join with simple OR conditions take too long
> From: hemant9...@gmail.com
> To: ashokkumar.rajend...@gmail.com
> CC: user@spark.apache.org
>
> Hi Ashok,
>
> That's interesting.
>
> As I understand it, a nested loop join on tables A and B (which produces m x n rows) is performed, and then each row is evaluated to see whether any of the conditions is met. You are asking that Spark instead do a BroadcastHashJoin on each equality condition in parallel and then union the results, as you are doing in a different query.
>
> If we leave parallelism aside for a moment, in theory the time taken by the nested loop join would vary little as the number of conditions increases, while the time taken by the solution you are suggesting would increase linearly with the number of conditions. So, when there are too many conditions, the nested loop join would be faster than the solution you suggest. Now the question is, how should Spark decide when to do what?
>
> Hemant Bhanawat <https://www.linkedin.com/in/hemant-bhanawat-92a3811>
> www.snappydata.io
>
> On Thu, Mar 31, 2016 at 2:28 PM, ashokkumar rajendran <ashokkumar.rajend...@gmail.com> wrote:
> Hi,
>
> I have filed ticket SPARK-13900. There was an initial reply from a developer, but I have not received any further reply since. How can we do multiple hash joins together for joins based on OR conditions? Could someone please advise on how we can fix this?
>
> Regards
> Ashok
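The rewrite discussed in this thread can be sketched outside Spark with plain Python collections. This is a minimal illustration under stated assumptions, not Spark's planner or its BroadcastHashJoin implementation. The function names `nested_loop_or_join`, `union_of_hash_joins` and `choose_strategy`, the toy tables, and the unit-count cost comparison are all hypothetical. The sketch shows that an OR of equality conditions can be answered by one hash equi-join per condition followed by a union. It also shows a naive way of picking between the two plans, following Hemant's argument that nested-loop cost is dominated by the m x n pairs while the hash approach grows with the number of conditions k.

```python
from collections import defaultdict


def nested_loop_or_join(left, right, keys):
    """Return (left_index, right_index) pairs where the rows are equal on
    ANY column in `keys`. Forms every one of the m * n row pairs."""
    out = []
    for i, lrow in enumerate(left):
        for j, rrow in enumerate(right):
            if any(lrow[k] == rrow[k] for k in keys):
                out.append((i, j))
    return out


def union_of_hash_joins(left, right, keys):
    """Same result as nested_loop_or_join, computed as one hash equi-join
    per column in `keys` followed by a de-duplicated union."""
    matched = set()
    for k in keys:
        # Build phase: hash the right side (assumed smaller, i.e. the
        # "broadcast" side) on column k.
        table = defaultdict(list)
        for j, rrow in enumerate(right):
            table[rrow[k]].append(j)
        # Probe phase: look up each left row's value of column k.
        for i, lrow in enumerate(left):
            for j in table.get(lrow[k], ()):
                matched.add((i, j))
    # A pair that satisfies several conditions must be emitted once, so the
    # union is taken over row positions (set semantics), not UNION ALL.
    return sorted(matched)


def choose_strategy(m, n, k):
    """Hypothetical toy heuristic (not Spark's cost model): compare the
    m * n row pairs of the nested loop against k builds of n rows plus
    k probes of m rows for the union of hash joins."""
    nested_work = m * n
    hashed_work = k * (m + n)
    return "union_of_hash_joins" if hashed_work < nested_work else "nested_loop"


# Usage with two tiny hypothetical tables joined on "a = a OR b = b".
left = [{"a": 1, "b": "x"}, {"a": 2, "b": "y"}, {"a": 3, "b": "z"}]
right = [{"a": 1, "b": "y"}, {"a": 9, "b": "z"}, {"a": 2, "b": "q"}]
keys = ["a", "b"]
print(nested_loop_or_join(left, right, keys))  # [(0, 0), (1, 0), (1, 2), (2, 1)]
print(union_of_hash_joins(left, right, keys))  # same pairs
print(choose_strategy(1000, 1000, 3))          # union_of_hash_joins
print(choose_strategy(10, 10, 50))             # nested_loop
```

The unit counts deliberately ignore that each nested-loop pair may evaluate up to k predicates, and they ignore parallelism and broadcast cost. A real optimizer would need table statistics to make this choice, which is the CBO question raised above.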