Matthew Wakeling wrote:
On Mon, 21 Apr 2008, Mark Mielke wrote:
This surprises me - hash values are lossy, so it must still need to confirm matches against the real values, which at a minimum should require keeping references to the rows to check against?

Is PostgreSQL doing something beyond my imagination? :-)

Not too far beyond your imagination, I hope.

It's simply your assumption that the hash table is lossy. Sure, hash values are lossy, but a hash table isn't. Postgres stores in memory not only the hash values, but the rows they refer to as well, having checked them all on disc beforehand. That way, it doesn't need to look up anything on disc for that branch of the join again, and it has a rapid in-memory lookup for each row.
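
As a rough illustration of that build-then-probe behaviour, here is a toy hash join in Python (the names are invented for this sketch; Postgres's real hash join lives in its C executor and also handles batching and spilling to disc, which this ignores):

from collections import defaultdict

def hash_join(build_rows, probe_rows, key):
    # Build phase: an in-memory table with one stored entry per build row,
    # bucketed by join key (the dict hashes the key internally).
    table = defaultdict(list)
    for row in build_rows:
        table[row[key]].append(row)

    # Probe phase: for each probe row, a dict lookup (hash, then a real
    # key comparison, so lossy hash values can't produce false matches)
    # against the rows already held in memory -- no further disc access.
    for row in probe_rows:
        for match in table.get(row[key], ()):
            yield {**match, **row}

The part that matters for this discussion is the build phase: every build-side row lands in the table, however few distinct key values there are.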

I said hash *values* are lossy. I did not say hash table is lossy.

The poster I responded to said that the memory required for a hash join is proportional to the number of distinct values, not the number of rows. They gave an example of millions of rows, but only a few distinct values. Above, you agree with me that it would include the rows (or at least references to the rows) as well. If it stores rows, or references to rows, then memory *is* proportional to the number of rows, and millions of records would mean storing millions of rows (or row references) in the hash table.
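
To put rough numbers on that (a hypothetical Python example, not a measurement of Postgres itself):

from collections import defaultdict

# Hypothetical data: two million rows but only three distinct join keys.
rows = [{"k": i % 3, "payload": i} for i in range(2_000_000)]

# Build side of a toy hash join: one stored entry per *row*.
table = defaultdict(list)
for row in rows:
    table[row["k"]].append(row)

print(len(table))                           # 3 buckets (distinct values)
print(sum(len(b) for b in table.values()))  # 2,000,000 row entries

The distinct-value count only determines how many buckets there are; the memory is dominated by the per-row entries.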

Cheers,
mark

--
Mark Mielke <[EMAIL PROTECTED]>


