counts. They are basically pulling out
> >>>>>> selected columns from the query, but there is no roll up happening or
> >>>>>> anything that would possibly make it suspicious that there is any
re
>>>>>> producing the same counts, the natural suspicion is that the tables are
>>>>>> identical, but when I run the following two queries:
>>>>>>
>>>>>> scala> sqlCont
tered by date being above 2016-01-03. Since all the joins are
> >>>> > producing the same counts, the natural suspicion is that the tables are
> >>>> > identical,
> >>>> > but when I run the following two q
n_promo_lt where date >='2016-01-03'").count
>>>> >
>>>> > res14: Long = 34158
>>>> >
>>>> > scala> sqlContext.sql("select * from dps_pin_promo_lt where date >='2016-01-03'").count
>>> > The above two queries filter the data based on the date used by the joins,
>>> > 2016-01-03, and you can see that the row counts of the two tables are
>>> > different, which is why I suspect something is wrong with the outer
wrong counts for
>> > dps.count, the real value is res16: Long = 42694
>> >
>> >
>> > Thanks,
>> >
>> >
>> > KP
>> >
>> >
>> >
>> >
>> > On Mon, May 2, 2016 at 12:50 PM, Yong Zhang <java8...@hotmail.
> >> For dps with 42632 rows, and swig with 42034 rows, if dps full outer join
> >> with swig on 3 columns; with additional filters, get the same resultSet row
> >> count as dps left outer join with swig on 3 columns, with additional
> >> filters,
same resultSet row count as dps right outer join
>> with swig on 3 columns, with same additional filters.
>>
>> Without knowing your data, I cannot see the reason that has to be a bug in
>> the spark.
>>
>> Am I misunderstanding your bug?
>>
>> Yong
>>
>> --
>> From: kpe...@gmail.com
>> Date: Mon, 2 May 201
>
> ----------
> From: kpe...@gmail.com
Date: Mon, 2 May 2016 12:11:18 -0700
Subject: Re: Weird results with Spark SQL Outer joins
To: gourav.sengu...@gmail.com
CC: user@spark.apache.org
Gourav,
I wish that was the case, but I have done a select count on each of the two
tables individually and they return different numbers of rows:
dps.registerTempTable("dps_pin_promo_lt")
swig.registerTempTable("swig_pin_promo_lt")
dps.count()
RESULT: 42632
swig.count()
RESULT: 42034
This shows that both the tables have matching records and no mismatches.
Therefore obviously you have the same results irrespective of whether you
use right or left join.
I think that there is no problem here, unless I am missing something.
Regards,
Gourav
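One way to test the claim that the tables have "no mismatches" is to count, for each side, the rows whose join key has no partner on the other side. The sketch below does this with plain Scala collections instead of Spark, so it runs without a cluster. The `Rec` shape, the sample rows, the helper names, and the choice of (date, account, ad) as the three join columns are all assumptions for illustration; the thread only says the join is on 3 columns.

```scala
object JoinCountCheck {
  // Hypothetical row shape: only the three assumed key columns are modeled.
  final case class Rec(date: String, account: String, ad: String)

  // Assumed join key; the real query in the thread is truncated.
  private def key(r: Rec): (String, String, String) = (r.date, r.account, r.ad)

  // Rows of `left` with no partner in `right` (what a left anti join returns).
  def unmatched(left: Seq[Rec], right: Seq[Rec]): Seq[Rec] = {
    val rightKeys = right.map(key).toSet
    left.filterNot(r => rightKeys.contains(key(r)))
  }

  // Number of matched pairs, i.e. the INNER JOIN row count.
  def innerCount(left: Seq[Rec], right: Seq[Rec]): Int = {
    val rightSizes = right.groupBy(key).map { case (k, rows) => k -> rows.size }
    left.map(r => rightSizes.getOrElse(key(r), 0)).sum
  }

  // An outer join returns the matched pairs plus the unmatched rows
  // of whichever side(s) it preserves.
  def leftOuterCount(left: Seq[Rec], right: Seq[Rec]): Int =
    innerCount(left, right) + unmatched(left, right).size

  def rightOuterCount(left: Seq[Rec], right: Seq[Rec]): Int =
    innerCount(left, right) + unmatched(right, left).size

  def fullOuterCount(left: Seq[Rec], right: Seq[Rec]): Int =
    innerCount(left, right) + unmatched(left, right).size + unmatched(right, left).size

  // Made-up sample data: two shared keys, one swig-only row, two dps-only rows.
  val swig: Seq[Rec] = Seq(
    Rec("2016-01-03", "a1", "ad1"),
    Rec("2016-01-04", "a1", "ad2"),
    Rec("2016-01-05", "a2", "ad3"))

  val dps: Seq[Rec] = Seq(
    Rec("2016-01-03", "a1", "ad1"),
    Rec("2016-01-04", "a1", "ad2"),
    Rec("2016-01-06", "a3", "ad4"),
    Rec("2016-01-07", "a3", "ad5"))

  def main(args: Array[String]): Unit = {
    println(s"inner       = ${innerCount(swig, dps)}")
    println(s"left outer  = ${leftOuterCount(swig, dps)}")
    println(s"right outer = ${rightOuterCount(swig, dps)}")
    println(s"full outer  = ${fullOuterCount(swig, dps)}")
    println(s"swig only   = ${unmatched(swig, dps).size}")
    println(s"dps only    = ${unmatched(dps, swig).size}")
  }
}
```

With the sample data the four join counts differ, because each side has rows the other lacks. If the inner, left, right, and full outer counts all agree on the real tables, both unmatched sets must be empty once the additional filters are applied, which is the situation Yong and Gourav describe. Running the same two "unmatched" checks against dps and swig would confirm or rule that out.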
On Mon, May 2, 2016 at 7:48 PM, kpeng1
Also, the inner join query produced the same results:
sqlContext.sql("SELECT s.date AS edate , s.account AS s_acc , d.account AS
d_acc , s.ad as s_ad , d.ad as d_ad , s.spend AS s_spend ,
d.spend_in_dollar AS d_spend FROM swig_pin_promo_lt s INNER JOIN
dps_pin_promo_lt d ON (s.date
Hi Kevin,
Thanks.
Please post the result of the same query with INNER JOIN and then it will
give us a bit of insight.
Regards,
Gourav
On Mon, May 2, 2016 at 7:10 PM, Kevin Peng wrote:
> Gourav,
>
> Apologies. I edited my post with this information:
> Spark version: 1.6
>
Gourav,
Apologies. I edited my post with this information:
Spark version: 1.6
Result from spark shell
OS: Linux version 2.6.32-431.20.3.el6.x86_64 (
mockbu...@c6b9.bsys.dev.centos.org) (gcc version 4.4.7 20120313 (Red Hat
4.4.7-4) (GCC) ) #1 SMP Thu Jun 19 21:14:45 UTC 2014
Thanks,
KP
On Mon,
Hi,
As always, can you please write down details regarding your SPARK cluster -
the version, OS, IDE used, etc?
Regards,
Gourav Sengupta
On Mon, May 2, 2016 at 5:58 PM, kpeng1 wrote:
> Hi All,
>
> I am running into a weird result with Spark SQL Outer joins. The results
>