Re: [PERFORM] Database design - best practice

Willem Leenen Wed, 28 Nov 2012 05:10:55 -0800

Niels,

"  I can't see why it would make sense to put that into a separate table and 
join in the values " 
You don't normalize for performance. People DEnormalize for performance.



Questions: (AFAIK)
        
1) This is a way to disaster. Get yourself a book on RDBMS from for example 
Celko. Do NOT go against the flow of the RDBMS rules, as here in rule #1 atomic 
values of a column. 
 
        2) This is not the big fish you are after. First benchmark your setup 
and compare the results with your desired performance level. First quantify 
your problem, if there is any, before using tricks.

3) A row will need more memory when it is wider, this may be amplified during 
hash joins. 

        4)  People DEnormalize for performance. 

5) " Is it significantly faster to select * from a table with 20 columns, than 
selecting the same 20 in a table with 150 columns?" 

I know the answer, but i encourage you to simply test this. I have seen lot's 
of urban legends about performance ( including the dropping of the referential 
integrity be cause that would make a difference.... ). 
Of course , when it's a full table scan, and it are ALL disk reads, (or ALL 
memory reads_) you can simply calculate it too. But just get into the habit of  
testing for learning.


My advice:
- know what performance you need.
- test if you have this, varying tablecontent and systemload
- do not tamper with the RDBMS rules, this will haunt you.
- if you have the latest postgres version, you can use covering indexes: tables 
aren't accessed at all, bypassing most of your questions. Check with peers if 
you've got the indexes right.

Regards,
Willem



> From: [email protected]
> Subject: [PERFORM] Database design - best practice
> Date: Wed, 28 Nov 2012 13:41:14 +0100
> To: [email protected]
> 
> Hi,
> 
> I'm on the hunt for some solid knowledge on a theoretical level about the 
> performance of postgresql. My question is regarding best practices, and how 
> architectural decisions might influence the performance. First a little 
> background:
> 
> The setup:
> I have a database which holds informations on used cars. The database has 
> mainly 3 tables of interest for this case:
> A cars table, an adverts table and a sellers table. One car has many adverts 
> and one seller has many adverts. One advert belongs to one car and one seller.
> The database is powering a website for searching used cars. When searching 
> for used cars, the cars table is mainly used, and a lot of the columns should 
> be directly available for searching e.g. color, milage, price, 
> has_automatic_transmission etc.
> 
> So my main concern is actually about the cars table, since this one currently 
> has a lot of columns (151 - I expect thats quite a lot?), and a lot of data 
> (4 mil. rows, and growing). Now you might start by thinking, this could sound 
> like a regular need for some normalization, but wait a second and let me 
> explain :-)
> The columns in this table is for the most very short stings, integers, 
> decimals or booleans. So take for an example has_automatic_transmission 
> (boolean) I can't see why it would make sense to put that into a separate 
> table and join in the values. Or the milage or the price as another example. 
> The cars table used for search is indexed quite a lot.
> 
> The questions:
> Having the above setup in mind, what impact on performance, in terms of read 
> performance and write performance, does it have, whether I do the following:
>       1) In general would the read and/or the write on the database be 
> faster, if I serialized some of the not searched columns in the table into a 
> single text columns instead of let's say 20 booleans?
>       2) Lets say I'm updating a timestamp in a single one of the 151 columns 
> in the cars table. The update statement is using the id to find the car. 
> Would the write performance of that UPDATE be affected, if the table had 
> fewer columns?
>       3) When adding a new column to the table i know that it becomes slower 
> the more rows is in the table, but what about the "width" of the table does 
> that affect the performance when adding new columns?
>       4) In general what performance downsides do you get when adding a lot 
> of columns to one table instead of having them in separate tables?
>       5) Is it significantly faster to select * from a table with 20 columns, 
> than selecting the same 20 in a table with 150 columns?
> 
> Hope there is some good answers out there :-)
> 
> -- 
> Sent via pgsql-performance mailing list ([email protected])
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance

Re: [PERFORM] Database design - best practice

Reply via email to