Re: [HACKERS] Proposed Patch to Improve Performance of Multi-Batch Hash Join for Skewed Data Sets

Robert Haas Fri, 20 Mar 2009 17:36:36 -0700

On Fri, Mar 20, 2009 at 8:14 PM, Tom Lane <[email protected]> wrote:
> Bryce Cutt <[email protected]> writes:
>> Here is the new patch.
>
> Applied with revisions.  I undid some of the "optimizations" that
> cluttered the code in order to save a cycle or two per tuple --- as per
> previous discussion, that's not what the performance questions were
> about.  Also, I did not like the terminology "in-memory"/"IM"; it seemed
> confusing since the main hash table is in-memory too.  I revised the
> code to consistently refer to the additional hash table as a "skew"
> hashtable and the optimization in general as skew optimization.  Hope
> that seems reasonable to you --- we could search-and-replace it to
> something else if you'd prefer.
>
> For the moment, I didn't really do anything about teaching the planner
> to account for this optimization in its cost estimates.  The initial
> estimate of the number of MCVs that will be specially treated seems to
> me to be too high (it's only accurate if the inner relation is unique),
> but getting a more accurate estimate seems pretty hard, and it's not
> clear it's worth the trouble.  Without that, though, you can't tell
> what fraction of outer tuples will get the short-circuit treatment.


If the inner relation isn't fairly close to unique, you shouldn't be
using this optimization in the first place.
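To make the estimation problem above concrete, here is a toy model. It is not PostgreSQL's planner or executor code. It assumes the skew hashtable can hold a fixed number of inner tuples, and that every outer MCV value matches the same number of inner tuples. The function name `estimate_skew_coverage` and its parameters are hypothetical. The model shows why an estimate computed as if the inner relation were unique overstates how many MCVs get special treatment once the inner side has duplicates. It also shows how that overstatement carries into the fraction of outer tuples that skip batching.

```python
import math


def estimate_skew_coverage(outer_mcv_freqs, inner_tuples_per_value, skew_tuple_budget):
    """Hypothetical, simplified model of skew-hashtable coverage.

    outer_mcv_freqs: frequencies of the outer relation's most common join-key
        values, as fractions of all outer rows, sorted most common first.
    inner_tuples_per_value: assumed number of inner tuples per MCV value
        (1.0 when the inner relation is unique on the join key).
    skew_tuple_budget: assumed number of inner tuples the skew hashtable may hold.

    Returns (number of MCVs kept, fraction of outer tuples that can be joined
    immediately instead of being written out to a later batch).
    """
    if inner_tuples_per_value <= 0:
        raise ValueError("inner_tuples_per_value must be positive")
    # Each kept MCV costs inner_tuples_per_value slots, so duplicates in the
    # inner relation shrink the number of MCVs that fit in the budget.
    fits = math.floor(skew_tuple_budget / inner_tuples_per_value)
    kept = min(len(outer_mcv_freqs), fits)
    # Outer tuples whose key is one of the kept MCVs get the short-circuit path.
    return kept, sum(outer_mcv_freqs[:kept])


if __name__ == "__main__":
    freqs = [0.30, 0.20, 0.10, 0.05]
    print(estimate_skew_coverage(freqs, 1.0, 3))  # inner unique on the join key
    print(estimate_skew_coverage(freqs, 2.0, 3))  # two inner tuples per key
```

With a unique inner relation, three MCVs fit and roughly 60% of outer rows take the short-circuit path. With just two inner tuples per key, only one MCV fits and coverage drops to 30%. Real skew is rarely this uniform, so treat the numbers as illustration only.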

...Robert

-- 
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
