On 12.9.2014 22:24, Robert Haas wrote:
> On Fri, Sep 12, 2014 at 3:39 PM, Tomas Vondra <t...@fuzzy.cz> wrote:
>> On 12.9.2014 18:49, Robert Haas wrote:
>>> I'm comfortable with this version if you are, but (maybe as a
>>> follow-on commit) I think we could make this even a bit smarter. If
>>> inner_rel_bytes + bucket_bytes > hash_table_bytes but
>>> inner_rel_bytes + bucket_bytes / 4 < hash_table_bytes, then reduce
>>> the number of buckets by 2x or 4x to make it fit. That would provide
>>> a bit of additional safety margin against cases where this patch
>>> might unluckily lose.
>>
>> I don't think that's a good idea. That essentially says 'we're
>> shooting for NTUP_PER_BUCKET but not really'. Moreover, I often see
>> cases where batching (because of smaller work_mem) actually
>> significantly improves performance. If we want to do this reasoning
>> properly, deciding whether to add batches or reduce buckets, we need
>> a better heuristic - that's quite complex, and I'd expect it to
>> result in a quite complex patch.
>>
>> So -1 from me to this at this moment (it certainly needs much more
>> thought than a mere follow-on commit).
>
> OK, but let's discuss further. You make it sound like treating
> NTUP_PER_BUCKET as a soft limit is a bad thing, but I'm not convinced.
> I think it's perfectly reasonable to treat NTUP_PER_BUCKET as a range:
> if we can get NTUP_PER_BUCKET=1, great, but if not, NTUP_PER_BUCKET=2
> or NTUP_PER_BUCKET=4 may be perfectly reasonable.
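
For concreteness, here's a minimal standalone sketch of the fallback
you're proposing - the variable names follow the cost logic quoted
above, but the function itself and the numbers in main() are made up
for illustration; this is not committed code:

    #include <stdio.h>
    #include <stdint.h>

    /*
     * Hypothetical sketch of the proposed fallback: if the bucket
     * array pushes the hash table past work_mem, halve nbuckets
     * (doubling the effective NTUP_PER_BUCKET) rather than adding
     * batches.  The guard ensures at most two halvings (a 4x
     * reduction) are ever needed.
     */
    static int
    adjust_nbuckets(int64_t inner_rel_bytes, int64_t hash_table_bytes,
                    int nbuckets, int64_t bucket_size)
    {
        int64_t bucket_bytes = (int64_t) nbuckets * bucket_size;

        /* Only shrink if a 4x reduction would make everything fit. */
        if (inner_rel_bytes + bucket_bytes > hash_table_bytes &&
            inner_rel_bytes + bucket_bytes / 4 <= hash_table_bytes)
        {
            while (inner_rel_bytes + bucket_bytes > hash_table_bytes)
            {
                nbuckets /= 2;          /* doubles tuples per bucket */
                bucket_bytes /= 2;
            }
        }
        return nbuckets;
    }

    int
    main(void)
    {
        /* 800kB of tuples, 1MB budget, 64k pointer-sized buckets */
        int nbuckets = adjust_nbuckets(800 * 1024, 1024 * 1024,
                                       65536, sizeof(void *));
        printf("nbuckets = %d\n", nbuckets);
        return 0;
    }

With those inputs it halves nbuckets from 65536 to 16384, i.e. the 4x
case, effectively running at NTUP_PER_BUCKET=4. Which brings me to my
actual concern: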
I think this really depends on various factors. For example, what works
for small work_mem (fitting into the L2 cache) may not work for larger
values, etc.

> I'm actually quite surprised that you find batching to be a better
> strategy than skimping on buckets, because I would have expected the
> opposite, almost categorically. Batching means having to write out
> the tuples we can't process right away and read them back in. If
> that involves disk I/O, I think the cost of that I/O is going to be
> FAR more than the overhead we would have incurred by searching
> slightly longer bucket chains. If it didn't, then you could've set
> work_mem higher and avoided batching in the first place.

No, I don't find batching to be a better strategy. I just think this
really needs more discussion than a simple "let's use NTUP_PER_BUCKET=4
to avoid batching" follow-up patch.

For example, say we switch to NTUP_PER_BUCKET=4 to avoid batching, and
then discover we need to start batching anyway. Should we keep that
setting, or revert to NTUP_PER_BUCKET=1? Or maybe keep it for nbatch=2
but revert for nbatch=16? I'm not saying your proposal is wrong, but I
think it's more complicated than your description suggests.
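
To make the ambiguity concrete, here's a hypothetical sketch - the
function name, the "relaxed" flag, and the cutoff of 16 batches are all
invented; picking the actual cutoff is exactly what would need
benchmarking:

    #include <stdbool.h>
    #include <stdio.h>

    /*
     * Hypothetical illustration of the open question, not a proposal:
     * once batching starts anyway, should a previously relaxed
     * NTUP_PER_BUCKET be reverted?
     */
    static int
    choose_ntup_per_bucket(int nbatch, bool relaxed_to_avoid_batching)
    {
        if (!relaxed_to_avoid_batching)
            return 1;               /* the patch's target density */

        /*
         * We relaxed to 4 hoping to stay in a single batch, but
         * batching happened anyway.  With few batches the relaxed
         * setting may still win; with many, the longer chains are
         * paid on every probe, so reverting looks attractive.  Where
         * the cutoff lies (nbatch=2? nbatch=16?) needs measurement.
         */
        return (nbatch >= 16) ? 1 : 4;
    }

    int
    main(void)
    {
        printf("nbatch=2  -> %d\n", choose_ntup_per_bucket(2, true));
        printf("nbatch=16 -> %d\n", choose_ntup_per_bucket(16, true));
        return 0;
    }

regards
Tomas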