Matthew Wakeling wrote:
On Tue, 22 Apr 2008, Mark Mielke wrote:
The poster I responded to said that the memory required for a hash
join was relative to the number of distinct values, not the number of
rows. They gave an example of millions of rows, but only a few
distinct values. Above, you agree with me that it would include
the rows (or at least references to the rows) as well. If it stores
rows, or references to rows, then memory *is* relative to the number
of rows, and millions of records would mean millions of entries (or
row references) in the hash table.
Yeah, I think we're talking at cross-purposes, due to hash tables
being used in two completely different places in Postgres. Firstly,
you have hash joins, where Postgres loads the references to the actual
rows, and puts those in the hash table. For that situation, you want a
small number of rows. Secondly, you have hash aggregates, where
Postgres stores an entry for each "group" in the hash table, and does
not store the actual rows. For that situation, you can have a
bazillion individual rows, but only a small number of distinct groups.
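To make the distinction concrete, a rough sketch (the table and column
names here are made up, and the plan actually chosen depends on
statistics, work_mem, and so on):

  -- Hash join: the planner hashes one input of the join, so memory
  -- grows with the number of rows on the hashed side.
  EXPLAIN
  SELECT o.order_id, c.name
  FROM orders o
  JOIN customers c ON c.customer_id = o.customer_id;

  -- Hash aggregate: one hash table entry per group, so memory grows
  -- with the number of distinct status values, not with the number of
  -- rows scanned.
  EXPLAIN
  SELECT status, count(*)
  FROM orders
  GROUP BY status;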
That makes sense with my reality. :-)
Thanks,
mark
--
Mark Mielke <[EMAIL PROTECTED]>