On 09/25/2015 02:54 AM, Robert Haas wrote:
> On Thu, Sep 24, 2015 at 1:58 PM, Tomas Vondra
> <tomas.von...@2ndquadrant.com> wrote:
>> Meh, you're right - I got the math wrong. It's 1.3% in both cases.
>>
>> However, the question still stands - why should we handle the
>> over-estimate in one case and not the other? We're wasting the
>> same fraction of memory in both cases.
>
> Well, I think we're going around in circles here. It doesn't seem
> likely that either of us will convince the other.

Let's agree to disagree ;-) That's perfectly OK, no hard feelings.

> But for the record, I agree with you that in the scenario you lay
> out, it's about the same problem in both cases. I could argue
> that it's slightly different because of [ tedious and somewhat
> tenuous argument omitted ], but I'll spare you that.

OK, although that kinda prevents any further discussion.

> However, consider the alternative scenario where, on the same
> machine, perhaps even in the same query, we perform two hash joins,
> one of which involves hashing a small table (say, 2MB) and one of
> which involves hashing a big table (say, 2GB). If the small query
> uses twice the intended amount of memory, probably nothing bad will
> happen. If the big query does the same thing, a bad outcome is much
> more likely. Say the machine has 16GB of RAM. Well, a 2MB
> over-allocation is not going to break the world. A 2GB
> over-allocation very well might.

I asked about case A. You've presented case B and shown that, indeed, the limit seems to help there. But I don't see how that makes any difference for case A, which is what I asked about.
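
To spell out how I see the two cases, with purely illustrative numbers (just reusing the figures already mentioned in this thread):

    case A (the one I asked about) - the over-estimate wastes the same
    fraction of the hash table regardless of its size:

        1.3% of a 2MB hash table  ~  26kB wasted
        1.3% of a 2GB hash table  ~  27MB wasted

    case B (your scenario) - a hash table using twice the intended
    amount of memory on a 16GB machine:

        2MB table  ->  2MB extra   ~  0.01% of RAM
        2GB table  ->  2GB extra   ~ 12.5% of RAM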

> I really don't see why this is a controversial proposition. It seems
> clear as daylight from here.

I wouldn't say controversial, but I do see the proposed solution as misguided, because we're fixing one problem (the MaxAllocSize failure) while claiming to also fix another (the over-estimation). Not only are we not really fixing the over-estimation, we may actually make things needlessly slower for people who don't have over-estimation problems at all.

We've run into a problem with allocating more than MaxAllocSize. The proposed fix (imposing an arbitrary limit) is also supposed to fix the over-estimation problems, but it actually does not (IMNSHO).
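
For the record, the kind of limit I mean is roughly this (a simplified sketch of how I understand it, not the actual patch; names as in nodeHash.c):

    /*
     * The bucket array is one HashJoinTuple pointer per bucket, and
     * palloc() refuses anything larger than MaxAllocSize (~1GB), so
     * cap nbuckets at the largest array that still fits.
     */
    max_pointers = MaxAllocSize / sizeof(HashJoinTuple); /* ~134M on 64-bit */
    nbuckets = Min(nbuckets, max_pointers);

    hashtable->buckets = (HashJoinTuple *)
        palloc(nbuckets * sizeof(HashJoinTuple));

That avoids the palloc() failure, sure. But if the planner over-estimated the row count 100x, we still size nbuckets (and nbatch) for 100x the actual data - the cap only kicks in once the array would exceed 1GB, which has nothing to do with how accurate the estimate was.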

And I think my view is supported by the fact that solutions that seem to actually fix the over-estimation properly have emerged - I mean the "let's not build the buckets at all until the very end" and "let's start with nbatches=0" approaches discussed yesterday. (And I'm not saying that just because I proposed those two things.)
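
To sketch the first of those ideas (pseudocode only - the helper names here are made up, it's just to illustrate the approach):

    /* load the inner side first, without inserting into any buckets */
    while ((tuple = fetch_next_inner_tuple()) != NULL)
        append_tuple_to_chunks(hashtable, tuple);

    /* now we know the real tuple count, not the planner's estimate */
    nbuckets = next_power_of_two(hashtable->totalTuples / NTUP_PER_BUCKET);
    hashtable->buckets = (HashJoinTuple *)
        palloc0(nbuckets * sizeof(HashJoinTuple));

    /* walk the chunks and link each tuple into its bucket */
    foreach_tuple_in_chunks(hashtable, tuple)
        insert_tuple_into_bucket(hashtable, tuple);

The point being that nbuckets then depends on the actual data, so an inflated row estimate can no longer blow up the bucket array in the first place.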

Anyway, I think you're right that we're going in circles here. We've both presented all the arguments we had and we still disagree. I'm not going to continue with this - I'm unlikely to win an argument against two committers if I haven't managed to by now. Thanks for the discussion, though.


regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

