On 11.12.2014 20:00, Robert Haas wrote: > On Thu, Dec 11, 2014 at 12:29 PM, Kevin Grittner <kgri...@ymail.com> wrote: >> >> Under what conditions do you see the inner side get loaded into the >> hash table multiple times? > > Huh, interesting. I guess I was thinking that the inner side got > rescanned for each new batch, but I guess that's not what happens.
No, it's not rescanned. It's scanned only once (for the batch #0), and tuples belonging to the other batches are stored in files. If the number of batches needs to be increased (e.g. because of incorrect estimate of the inner table), the tuples are moved later. > > Maybe there's no real problem here, and we just win. I'm a bit confused by this discussion, because the inner relation has nothing to do with this patch. It gets scanned exactly once, no matter what the load factor is. If a batching is necessary, only the already files (without reexecuting the inner part) are read. However in that case this patch makes no difference, because it explicitly reverts to load factor = NTUP_PER_BUCKET (which is 1). The only point of this patch was to prevent batching because of the outer table. Usually, the outer table is much larger than the inner one (e.g. in a star schema, outer = fact table, inner = dimension). Batching the outer table means you have to write >= 50% into a temporary file. The idea was that if we could increase the load a bit (e.g. using 2 tuples per bucket instead of 1), we will still use a single batch in some cases (when we miss the work_mem threshold by just a bit). The lookups will be slower, but we'll save the I/O. regards Tomas -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers