[PERFORM] Database design - best practice

Niels Kristian Schjødt Wed, 28 Nov 2012 04:41:50 -0800

Hi,

I'm on the hunt for some solid knowledge on a theoretical level about the 
performance of postgresql. My question is regarding best practices, and how 
architectural decisions might influence the performance. First a little 
background:


The setup:
I have a database which holds informations on used cars. The database has 
mainly 3 tables of interest for this case:
A cars table, an adverts table and a sellers table. One car has many adverts 
and one seller has many adverts. One advert belongs to one car and one seller.
The database is powering a website for searching used cars. When searching for 
used cars, the cars table is mainly used, and a lot of the columns should be 
directly available for searching e.g. color, milage, price, 
has_automatic_transmission etc.

So my main concern is actually about the cars table, since this one currently 
has a lot of columns (151 - I expect thats quite a lot?), and a lot of data (4 
mil. rows, and growing). Now you might start by thinking, this could sound like 
a regular need for some normalization, but wait a second and let me explain :-)
The columns in this table is for the most very short stings, integers, decimals 
or booleans. So take for an example has_automatic_transmission (boolean) I 
can't see why it would make sense to put that into a separate table and join in 
the values. Or the milage or the price as another example. The cars table used 
for search is indexed quite a lot.

The questions:
Having the above setup in mind, what impact on performance, in terms of read 
performance and write performance, does it have, whether I do the following:
        1) In general would the read and/or the write on the database be 
faster, if I serialized some of the not searched columns in the table into a 
single text columns instead of let's say 20 booleans?
        2) Lets say I'm updating a timestamp in a single one of the 151 columns 
in the cars table. The update statement is using the id to find the car. Would 
the write performance of that UPDATE be affected, if the table had fewer 
columns?
        3) When adding a new column to the table i know that it becomes slower 
the more rows is in the table, but what about the "width" of the table does 
that affect the performance when adding new columns?
        4) In general what performance downsides do you get when adding a lot 
of columns to one table instead of having them in separate tables?
        5) Is it significantly faster to select * from a table with 20 columns, 
than selecting the same 20 in a table with 150 columns?

Hope there is some good answers out there :-)

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

[PERFORM] Database design - best practice

Reply via email to