Re: SPARK-13900 - Join with simple OR conditions take too long

2016-04-05 Thread Mich Talebzadeh
even today CBO (e.g. in Oracle) will still require hints in >>>> some cases so I think it is more like: >>>> >>>> RBO -> RBO + Hints -> CBO + Hints. Most relational databases meet >>>> significant numbers of corner cases where CBO plans simply d

Re: SPARK-13900 - Join with simple OR conditions take too long

2016-04-05 Thread ashokkumar rajendran
don’t know enough about Spark SQL to comment on whether >>> the same problems would afflict Spark. >>> >>> >>> >>> >>> On 31 Mar 2016, at 15:54, Yong Zhang <java8...@hotmail.com> wrote: >>> >>> I agree that there won't be

Re: SPARK-13900 - Join with simple OR conditions take too long

2016-04-04 Thread Mich Talebzadeh
should support more hints from the end user, as in >> these cases, end users will be smart enough to tell the engine what is the >> correct way to do. >> >> Weren't the relational DBs doing exactly same path? RBO -> RBO + Hints -> >> CBO? >> >>

Re: SPARK-13900 - Join with simple OR conditions take too long

2016-04-01 Thread ashokkumar rajendran
ne what is the > correct way to do. > > Weren't the relational DBs doing exactly same path? RBO -> RBO + Hints -> > CBO? > > Yong > > ---------- > Date: Thu, 31 Mar 2016 16:07:14 +0530 > Subject: Re: SPARK-13900 - Join with simple OR conditions

Re: SPARK-13900 - Join with simple OR conditions take too long

2016-04-01 Thread Robin East
? > > Yong > > Date: Thu, 31 Mar 2016 16:07:14 +0530 > Subject: Re: SPARK-13900 - Join with simple OR conditions take too long > From: hemant9...@gmail.com <mailto:hemant9...@gmail.com> > To: ashokkumar.rajend...@gmail.com <mailto:ashokkumar.rajend...@gmail.com> >

Re: SPARK-13900 - Join with simple OR conditions take too long

2016-04-01 Thread Hemant Bhanawat
As Mich has already noticed, Spark defaults to NL join if there is more than one condition. Oracle is probably doing cost-based optimizations in this scenario. You can call it a bug, but in my opinion it is an area where Spark is still evolving. >> Hemant has mentioned the nested loop time will

Re: SPARK-13900 - Join with simple OR conditions take too long

2016-04-01 Thread ashokkumar rajendran
Hi Mich, Thanks for the input. Yes, it seems to be a bug. Is it possible to fix this in the next release? Regards Ashok On Fri, Apr 1, 2016 at 2:06 PM, Mich Talebzadeh wrote: > hm. > > Sounds like it ends up in Nested Loop Join (NLJ) as opposed to Hash Join > (HJ) when

Re: SPARK-13900 - Join with simple OR conditions take too long

2016-04-01 Thread Mich Talebzadeh
hm. Sounds like it ends up in Nested Loop Join (NLJ) as opposed to Hash Join (HJ) when OR is used for more than one predicate comparison. Below, I have a table dummy created as ORC with 1 billion rows. I just created another one called dummy1 with 60K rows. A simple join results in Hash Join

RE: SPARK-13900 - Join with simple OR conditions take too long

2016-03-31 Thread Yong Zhang
way to do. Weren't the relational DBs doing exactly same path? RBO -> RBO + Hints -> CBO? Yong Date: Thu, 31 Mar 2016 16:07:14 +0530 Subject: Re: SPARK-13900 - Join with simple OR conditions take too long From: hemant9...@gmail.com To: ashokkumar.rajend...@gmail.com CC: user@spark.apache.o

Re: SPARK-13900 - Join with simple OR conditions take too long

2016-03-31 Thread Hemant Bhanawat
Hi Ashok, That's interesting. As I understand it, on tables A and B, a nested loop join (which will produce m X n rows) is performed, and then each row is evaluated to see if any of the conditions is met. You are asking that Spark should instead do a BroadcastHashJoin on the equality conditions in

SPARK-13900 - Join with simple OR conditions take too long

2016-03-31 Thread ashokkumar rajendran
Hi, I have filed ticket SPARK-13900. There was an initial reply from a developer, but I have not received any further response. How can we do multiple hash joins together for joins based on OR conditions? Could someone please advise on how we can fix this? Regards Ashok
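
To make the question concrete, here is a minimal plain-Python sketch of the two strategies discussed in this thread: a nested loop join that tests the OR condition on every pair of rows, and one hash join per equality condition with the matches unioned. It is not Spark code and not how Spark implements either join. The function names, the sample rows, and the column names (a, b, x, y) are all hypothetical, and it ignores SQL NULL semantics (in Python, None == None is true).

```python
from collections import defaultdict


def nested_loop_or_join(left, right, key_pairs):
    """Compare every left row with every right row (m x n comparisons)."""
    return [
        (l, r)
        for l in left
        for r in right
        if any(l[lk] == r[rk] for lk, rk in key_pairs)
    ]


def hash_or_join(left, right, key_pairs):
    """Run one hash join per equality condition and union the matches."""
    matched = set()  # (left index, right index) pairs, so duplicates collapse
    for lk, rk in key_pairs:
        # Build a hash table on the right side for this key.
        index = defaultdict(list)
        for j, r in enumerate(right):
            index[r[rk]].append(j)
        # Probe the hash table once per left row.
        for i, l in enumerate(left):
            for j in index.get(l[lk], ()):
                matched.add((i, j))
    return [(left[i], right[j]) for i, j in sorted(matched)]


# Hypothetical sample data: join on (a = x) OR (b = y).
left = [{"a": 1, "b": 10}, {"a": 2, "b": 20}, {"a": 3, "b": 30}]
right = [{"x": 1, "y": 99}, {"x": 7, "y": 20}, {"x": 1, "y": 10}]
key_pairs = [("a", "x"), ("b", "y")]

# Both strategies return the same matching pairs.
assert hash_or_join(left, right, key_pairs) == nested_loop_or_join(left, right, key_pairs)
```

The hash version does work roughly proportional to the table sizes plus the number of matches for each condition, instead of m x n comparisons. Tracking matches by row position is what keeps a pair that satisfies both conditions from appearing twice, which a manual rewrite as a union of equi-joins would also have to handle.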