Subject: Re: Spark SQL question: why build hashtable for both sides in HashOuterJoin?

Hi Haopu,

How about full outer join? One hash table may not be efficient for this case.

Liquan

On Mon, Sep 29, 2014 at 11:47 PM, Haopu Wang <hw...@qilinsoft.com> wrote:
Hi, Liquan
From: Liquan Pei [mailto:liquan...@gmail.com]
Sent: September 30, 2014, 12:31
To: Haopu Wang
Cc: dev@spark.apache.org; user
Subject: Re: Spark SQL question: why build hashtable for both sides in
HashOuterJoin?
Hi Haopu,
My understanding is that the hashtable on both the left and right side is used for including null values in the result in an efficient manner. If the hash table is only built on one side, let's say the left side, and we perform a left outer join, then for each row on the left side, a scan over the right side is
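For intuition, here is a minimal sketch of the one-sided case in plain Python (not Spark's actual Scala implementation; the function and parameter names are invented for illustration): a left outer join can be answered with a hash table built on only one side, the build side, by padding unmatched streamed rows with nulls.

```python
def hash_left_outer_join(left, right, key_left, key_right):
    # Build phase: hash table over the right (build) side only.
    table = {}
    for r in right:
        table.setdefault(key_right(r), []).append(r)
    # Probe phase: stream the left side exactly once.
    out = []
    for l in left:
        matches = table.get(key_left(l))
        if matches:
            for r in matches:
                out.append((l, r))
        else:
            out.append((l, None))  # left row with no match: pad with null
    return out

# Example: the left row with key 2 has no right-side match, so it is
# emitted once, padded with None.
rows = hash_left_outer_join(
    [(1, "a"), (2, "b")], [(1, "x")],
    key_left=lambda t: t[0], key_right=lambda t: t[0])
```

Note that this single table says nothing about which right-side rows went unmatched, which is exactly what a full outer join additionally needs.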
Hi,

I took a look at HashOuterJoin and it is building a hashtable for both sides. This consumes quite a lot of memory when the partition is big, and it doesn't reduce the iteration over the streamed relation, right?

Thanks!
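To make the full outer join point concrete, here is a hedged sketch in plain Python (invented names; this shows one textbook strategy, not what Spark's HashOuterJoin actually does): build a hash table on one side, flag its entries as they match during the probe, then make a second pass to emit the never-matched build rows with null padding.

```python
def hash_full_outer_join(left, right, key_left, key_right):
    # Build a hash table on the right side; pair each row with a
    # "matched" flag so unmatched rows can be found later.
    table = {}
    for r in right:
        table.setdefault(key_right(r), []).append([r, False])
    out = []
    for l in left:
        entries = table.get(key_left(l))
        if entries:
            for entry in entries:
                out.append((l, entry[0]))
                entry[1] = True  # mark this right row as matched
        else:
            out.append((l, None))  # unmatched left row: pad with null
    # Second pass: emit right rows that never matched, padded with null.
    for entries in table.values():
        for r, matched in entries:
            if not matched:
                out.append((None, r))
    return out

# Example: key 1 matches on both sides; the right row with key 2 is
# unmatched and is emitted with None on the left.
rows = hash_full_outer_join(
    [(1, "a")], [(1, "x"), (2, "y")],
    key_left=lambda t: t[0], key_right=lambda t: t[0])
```

The extra per-entry bookkeeping (or, as in the HashOuterJoin code under discussion, a second hash table) is the cost of emitting null-padded rows for both sides.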