Re: [PERFORM] Air-traffic benchmark

Craig Ringer Thu, 07 Jan 2010 19:45:40 -0800

On 8/01/2010 2:11 AM, Nikolas Everett wrote:

> This table is totally unnormalized.  Normalize it and try again.  You'll
> probably see a huge speedup.  Maybe even 10x.  My mantra has always been
> less data stored means less data to scan means faster scans.

Sometimes one intentionally denormalizes storage, though. JOIN costs can be considerable too, and if most of the time you're interested in all the data for a record, not just a subset of it, storing it denormalized is often faster and cheaper than JOINing for it or using subqueries to fetch it.

Normalization, or any other splitting of a record into multiple separately stored records, also has costs: complexity, management, the need for additional indexes, storage of foreign key references, all the extra tuple headers you need to store, etc.

It's still generally the right thing to do, but it should be thought about, not just tackled blindly. I only tend to view it as a no-brainer if the alternative is storing numbered fields ("field0", "field1", "field2", etc.) ... and even then there are exceptions. One of my schemas at the moment has address_line_1 through address_line_4 in a `contact' entity, and there's absolutely *no* way I'm splitting that into a separate table of address_lines accessed by join and sort! (Arguably I should be using a single `text' field with embedded newlines instead, though.)
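A minimal sketch of that trade-off, assuming a hypothetical `contact` table with four address columns versus a hypothetical normalized pair `contact_n`/`address_line`. SQLite (through Python's standard `sqlite3` module) stands in for PostgreSQL only so the example runs without a server; it shows the extra machinery the split layout needs (a foreign key column, a line-number column, a composite key that doubles as the join index) and that reading the address back needs a JOIN plus an ORDER BY, while the denormalized row is one fetch. It says nothing about real PostgreSQL timings.

```python
import sqlite3

# Hypothetical schemas for illustration only; SQLite stands in for PostgreSQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Denormalized: the address lives in the contact row itself.
    CREATE TABLE contact (
        id             INTEGER PRIMARY KEY,
        name           TEXT NOT NULL,
        address_line_1 TEXT,
        address_line_2 TEXT,
        address_line_3 TEXT,
        address_line_4 TEXT
    );

    -- Normalized: one row per address line, each carrying a foreign key
    -- and a line number; the composite primary key is the join index.
    CREATE TABLE contact_n (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE address_line (
        contact_id INTEGER NOT NULL REFERENCES contact_n(id),
        line_no    INTEGER NOT NULL,
        line       TEXT NOT NULL,
        PRIMARY KEY (contact_id, line_no)
    );
""")

# Made-up sample data; the fourth line is unused (NULL).
lines = ["Unit 2", "10 Example Street", "Springfield", None]

conn.execute("INSERT INTO contact VALUES (1, 'Alice', ?, ?, ?, ?)", lines)
conn.execute("INSERT INTO contact_n VALUES (1, 'Alice')")
conn.executemany(
    "INSERT INTO address_line VALUES (1, ?, ?)",
    [(i, l) for i, l in enumerate(lines, start=1) if l is not None],
)


def address_denormalized(contact_id):
    # One row fetch: no join, no sort.
    row = conn.execute(
        "SELECT address_line_1, address_line_2, address_line_3, address_line_4"
        " FROM contact WHERE id = ?",
        (contact_id,),
    ).fetchone()
    return [l for l in row if l is not None]


def address_normalized(contact_id):
    # Reassembling the address needs a join and an ORDER BY on line_no.
    rows = conn.execute(
        "SELECT a.line FROM contact_n c"
        " JOIN address_line a ON a.contact_id = c.id"
        " WHERE c.id = ? ORDER BY a.line_no",
        (contact_id,),
    ).fetchall()
    return [r[0] for r in rows]


print(address_denormalized(1))  # → ['Unit 2', '10 Example Street', 'Springfield']
print(address_normalized(1))    # → ['Unit 2', '10 Example Street', 'Springfield']
```

Both functions return the same lines; the difference is how much schema and query work the normalized layout needs to get there.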

Sometimes it's even better to hold your nose and embed an array in a record rather than join to an external table. Purism can be taken too far.

Note that Pg's TOAST mechanism plays a part here, too. If you have a big `text' field, it's probably going to get stored out-of-line (TOASTed) anyway, and TOAST is going to be cleverer about fetching it than you will be using a JOIN. So storing it in-line is likely to be the right way to go. You can even force out-of-line storage if you're worried.

In the case of this benchmark, even if they split much of this data out into other tables by reference, it's likely to be slower rather than faster if they still want the data they've split out for most of their queries.


--
Craig Ringer

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)