Matthew Wakeling wrote:
On Tue, 22 Apr 2008, Mark Mielke wrote:
The poster I responded to said that the memory required for a hash
join was relative to the number of distinct values, not the number of
rows. They gave an example of millions of rows, but only a few
distinct values. Above, you agree with me that it would include
the rows (or at least references to the rows) as well. If it stores
rows, or references to rows, then memory *is* relative to the number
of rows, and millions of records would mean millions of entries (or
row references) in the hash table.
Yeah, I think we're talking at cross-purposes, due to hash tables
being used in two completely different places in Postgres. Firstly,
you have hash joins, where Postgres loads the references to the actual
rows, and puts those in the hash table. For that situation, you want a
small number of rows. Secondly, you have hash aggregates, where
Postgres stores an entry for each "group" in the hash table, and does
not store the actual rows. For that situation, you can have a
bazillion individual rows, but only a small number of distinct groups.
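To make the distinction concrete, a rough sketch (the table and column
names here are made up, and the plan actually chosen depends on
statistics, work_mem, and so on):

  -- Hash join: the planner hashes one input of the join, so memory
  -- grows with the number of rows on the hashed side.
  EXPLAIN
  SELECT o.order_id, c.name
  FROM orders o
  JOIN customers c ON c.customer_id = o.customer_id;

  -- Hash aggregate: one hash table entry per group, so memory grows
  -- with the number of distinct status values, not with the number of
  -- rows scanned.
  EXPLAIN
  SELECT status, count(*)
  FROM orders
  GROUP BY status;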
That makes sense with my reality. :-)
Thanks,
mark
--
Mark Mielke <[EMAIL PROTECTED]>