On 12.9.2014 22:24, Robert Haas wrote:
> On Fri, Sep 12, 2014 at 3:39 PM, Tomas Vondra <t...@fuzzy.cz> wrote:
>> On 12.9.2014 18:49, Robert Haas wrote:
>>> I'm comfortable with this version if you are, but (maybe as a
>>> follow-on commit) I think we could make this even a bit smarter. If
>>> inner_rel_bytes + bucket_bytes > hash_table_bytes but
>>> inner_rel_bytes + bucket_bytes / 4 < hash_table_bytes, then reduce
>>> the number of buckets by 2x or 4x to make it fit. That would provide
>>> a bit of additional safety margin against cases where this patch
>>> might unluckily lose.
>>
>> I don't think that's a good idea. That essentially says 'we're
>> shooting for NTUP_PER_BUCKET but not really'. Moreover, I often see
>> cases where batching (because of smaller work_mem) actually
>> significantly improves performance. If we want to do this reasoning
>> properly, deciding whether to add batches or reduce buckets, we need
>> a better heuristic - that's quite complex, and I'd expect it to
>> result in a quite complex patch.
>>
>> So -1 from me to this at this moment (it certainly needs much more
>> thought than a mere follow-on commit).
>
> OK, but let's discuss further. You make it sound like treating
> NTUP_PER_BUCKET as a soft limit is a bad thing, but I'm not convinced.
> I think it's perfectly reasonable to treat NTUP_PER_BUCKET as a range:
> if we can get NTUP_PER_BUCKET=1, great, but if not, NTUP_PER_BUCKET=2
> or NTUP_PER_BUCKET=4 may be perfectly reasonable.
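
For concreteness, here's a minimal standalone sketch of the fallback
you're proposing - the variable names follow the cost logic quoted
above, but the function itself and the numbers in main() are made up
for illustration; this is not committed code:

    #include <stdio.h>
    #include <stdint.h>

    /*
     * Hypothetical sketch of the proposed fallback: if the bucket
     * array pushes the hash table past work_mem, halve nbuckets
     * (doubling the effective NTUP_PER_BUCKET) rather than adding
     * batches.  The guard ensures at most two halvings (a 4x
     * reduction) are ever needed.
     */
    static int
    adjust_nbuckets(int64_t inner_rel_bytes, int64_t hash_table_bytes,
                    int nbuckets, int64_t bucket_size)
    {
        int64_t bucket_bytes = (int64_t) nbuckets * bucket_size;

        /* Only shrink if a 4x reduction would make everything fit. */
        if (inner_rel_bytes + bucket_bytes > hash_table_bytes &&
            inner_rel_bytes + bucket_bytes / 4 <= hash_table_bytes)
        {
            while (inner_rel_bytes + bucket_bytes > hash_table_bytes)
            {
                nbuckets /= 2;          /* doubles tuples per bucket */
                bucket_bytes /= 2;
            }
        }
        return nbuckets;
    }

    int
    main(void)
    {
        /* 800kB of tuples, 1MB budget, 64k pointer-sized buckets */
        int nbuckets = adjust_nbuckets(800 * 1024, 1024 * 1024,
                                       65536, sizeof(void *));
        printf("nbuckets = %d\n", nbuckets);
        return 0;
    }

With those inputs it halves nbuckets from 65536 to 16384, i.e. the 4x
case, effectively running at NTUP_PER_BUCKET=4. Which brings me to my
actual concern: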
I think this really depends on various factors. For example, what works
for small work_mem (fitting into the L2 cache) may not work for larger
values, etc.

> I'm actually quite surprised that you find batching to be a better
> strategy than skimping on buckets, because I would have expected the
> opposite, almost categorically. Batching means having to write out
> the tuples we can't process right away and read them back in. If
> that involves disk I/O, I think the cost of that I/O is going to be
> FAR more than the overhead we would have incurred by searching
> slightly longer bucket chains. If it didn't, then you could've set
> work_mem higher and avoided batching in the first place.

No, I don't find batching to be a better strategy. I just think this
really needs more discussion than a simple "let's use NTUP_PER_BUCKET=4
to avoid batching" follow-up patch.

For example, say we switch to NTUP_PER_BUCKET=4 to avoid batching, and
then discover we need to start batching anyway. Should we keep that
setting, or revert to NTUP_PER_BUCKET=1? Or maybe keep it for nbatch=2
but revert for nbatch=16? I'm not saying your proposal is wrong, but I
think it's more complicated than your description suggests.
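
To make the ambiguity concrete, here's a hypothetical sketch - the
function name, the "relaxed" flag, and the cutoff of 16 batches are all
invented; picking the actual cutoff is exactly what would need
benchmarking:

    #include <stdbool.h>
    #include <stdio.h>

    /*
     * Hypothetical illustration of the open question, not a proposal:
     * once batching starts anyway, should a previously relaxed
     * NTUP_PER_BUCKET be reverted?
     */
    static int
    choose_ntup_per_bucket(int nbatch, bool relaxed_to_avoid_batching)
    {
        if (!relaxed_to_avoid_batching)
            return 1;               /* the patch's target density */

        /*
         * We relaxed to 4 hoping to stay in a single batch, but
         * batching happened anyway.  With few batches the relaxed
         * setting may still win; with many, the longer chains are
         * paid on every probe, so reverting looks attractive.  Where
         * the cutoff lies (nbatch=2? nbatch=16?) needs measurement.
         */
        return (nbatch >= 16) ? 1 : 4;
    }

    int
    main(void)
    {
        printf("nbatch=2  -> %d\n", choose_ntup_per_bucket(2, true));
        printf("nbatch=16 -> %d\n", choose_ntup_per_bucket(16, true));
        return 0;
    }

regards
Tomas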