Gregory Stark wrote: > We currently execute a lot of joins as Nested Loops which would be more > efficient if we could batch together all the outer keys and execute a single > inner bitmap index scan for all of them together. > > Essentially what I'm saying is that we're missing a trick with Hash Joins > which currently require that we can execute the inner side once without any > parameters from the outer side. > > Instead what we could do is build up the hash table, then scan the hash table > building up an array of keys and pass them as a parameter to the inner side. > The inner side could do a bitmap index scan to fetch them all at once and > start returning them just as normal to the hash join. > > There are a couple details: > > 1) Batched hash joins. Actually I think this would be fairly straightforward. > You want to rescan the inner side once for each batch. That would actually > be easier than what we currently do with saving tuples to files and all > that. > > 2) How to pass the keys. This could be a bit tricky especially for > multi-column keys. My first thought was to build up an actually Array node > but that only really works for single-column keys I think. Besides it would > be more efficient to somehow arrange to pass over a reference to the whole > hash. > > I fear the real complexity would be (as always) in the planner rather than the > executor. I haven't really looked into what it would take to arrange this or > how to decide when to do it.
If the scanning of the inner side is a performance problem, why would we be choosing a nested loop in the first place, vs. another type of join? -- Bruce Momjian <[EMAIL PROTECTED]> http://momjian.us EnterpriseDB http://enterprisedb.com + If your life is a hard drive, Christ can be your backup. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers