We are still not sure what the problem is if you cannot show us some example data.
For dps with 42632 rows and swig with 42034 rows: dps full outer join swig on 3 columns, with additional filters, returns the same result-set row count as dps left outer join swig on those 3 columns with the same additional filters, which in turn returns the same result-set row count as dps right outer join swig on those 3 columns with the same additional filters.
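Just to make sure we are reading the comparison the same way, here is a sketch of the three queries as I understand them (I am only assuming the join keys and the date filters from the inner-join query quoted further down in this thread):

// Sketch only: join keys and date filters assumed from the inner-join query below
val joinKeys = "s.date = d.date AND s.account = d.account AND s.ad = d.ad"
val filters  = "s.date >= '2016-01-03' AND d.date >= '2016-01-03'"

for (joinType <- Seq("FULL OUTER", "LEFT OUTER", "RIGHT OUTER")) {
  val n = sqlContext.sql(
    s"SELECT * FROM dps_pin_promo_lt d $joinType JOIN swig_pin_promo_lt s " +
    s"ON ($joinKeys) WHERE $filters").count()
  println(s"$joinType JOIN -> $n rows")
}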
Without knowing your data, I cannot see any reason this has to be a bug in Spark.
Am I misunderstanding your bug?
Yong

From: kpe...@gmail.com
Date: Mon, 2 May 2016 12:11:18 -0700
Subject: Re: Weird results with Spark SQL Outer joins
To: gourav.sengu...@gmail.com
CC: user@spark.apache.org

Gourav,
I wish that were the case, but I have done a select count on each of the two tables individually and they return different numbers of rows:

dps.registerTempTable("dps_pin_promo_lt")
swig.registerTempTable("swig_pin_promo_lt")

dps.count()
RESULT: 42632

swig.count()
RESULT: 42034
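To double-check the mismatches themselves, this is roughly how I can list keys that appear in one table but not the other (a sketch only, assuming the same three join columns as in my query below, and that account is never NULL on a matched row):

// dps rows whose (date, account, ad) key has no match in swig
sqlContext.sql("""SELECT d.date, d.account, d.ad
  FROM dps_pin_promo_lt d
  LEFT OUTER JOIN swig_pin_promo_lt s
    ON (s.date = d.date AND s.account = d.account AND s.ad = d.ad)
  WHERE s.account IS NULL""").count()

// swig rows whose (date, account, ad) key has no match in dps
sqlContext.sql("""SELECT s.date, s.account, s.ad
  FROM swig_pin_promo_lt s
  LEFT OUTER JOIN dps_pin_promo_lt d
    ON (s.date = d.date AND s.account = d.account AND s.ad = d.ad)
  WHERE d.account IS NULL""").count()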

On Mon, May 2, 2016 at 11:55 AM, Gourav Sengupta <gourav.sengu...@gmail.com> 
wrote:
This shows that both tables have matching records and no mismatches. Therefore, obviously, you have the same results irrespective of whether you use a right or left join.
I think that there is no problem here, unless I am missing something.
Regards,
Gourav
On Mon, May 2, 2016 at 7:48 PM, kpeng1 <kpe...@gmail.com> wrote:
Also, the inner join query produced the same results:

sqlContext.sql("SELECT s.date AS edate  , s.account AS s_acc  , d.account AS

d_acc  , s.ad as s_ad  , d.ad as d_ad , s.spend AS s_spend  ,

d.spend_in_dollar AS d_spend FROM swig_pin_promo_lt s INNER JOIN

dps_pin_promo_lt d  ON (s.date = d.date AND s.account = d.account AND s.ad =

d.ad) WHERE s.date >= '2016-01-03'    AND d.date >= '2016-01-03'").count()

RESULT:23747
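One thing that might be worth ruling out (this is standard outer-join behaviour rather than anything Spark-specific, so I am not claiming it is the cause): the WHERE conditions s.date >= '2016-01-03' AND d.date >= '2016-01-03' are evaluated after the join, and a NULL date from an unmatched row never satisfies them, so the NULL-extended rows produced by an outer join are filtered back out and the outer joins can legitimately return the same count as the inner join. For comparison, a sketch of the left outer join with the filter on the nullable (swig) side moved into the ON clause:

// Sketch: dps left outer join swig, keeping unmatched dps rows by moving the
// filter on the nullable (swig) side into the join condition instead of WHERE
sqlContext.sql("""SELECT s.date AS edate, s.account AS s_acc, d.account AS d_acc,
    s.ad AS s_ad, d.ad AS d_ad, s.spend AS s_spend, d.spend_in_dollar AS d_spend
  FROM dps_pin_promo_lt d
  LEFT OUTER JOIN swig_pin_promo_lt s
    ON (s.date = d.date AND s.account = d.account AND s.ad = d.ad
        AND s.date >= '2016-01-03')
  WHERE d.date >= '2016-01-03'""").count()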

