In relation to question 2, I read somewhere in the documentation that 
because of MVCC, the whole row has to be rewritten even though I just 
update one single column in that row. Hence if the table is wider (has 
more columns), the update will be slower. Does this match your 
understanding?

No. I don't count the number of rows, but the number of blocks (pages) that 
are modified, which are 8K each. 
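You can see this for yourself with EXPLAIN's BUFFERS option, which reports the 
blocks an UPDATE touches. A minimal sketch, assuming a hypothetical timestamp 
column last_seen_at on your cars table:

BEGIN;
EXPLAIN (ANALYZE, BUFFERS)
UPDATE cars SET last_seen_at = now() WHERE id = 12345;
ROLLBACK;  -- EXPLAIN ANALYZE really executes the UPDATE, so roll it back

The "Buffers: shared hit=... read=... dirtied=..." line in the output counts 
8K pages, not columns.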

My advice would be to first establish a solution direction by diagnosing the 
problem. My experience is that most solutions are not obscure at all.
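For the diagnosing part, the statistics views are one possible starting point, 
for example (a sketch using the standard pg_statio_user_tables view):

-- Which tables cause the most block reads (buffer cache misses)?
SELECT relname, heap_blks_read, heap_blks_hit, idx_blks_read, idx_blks_hit
FROM pg_statio_user_tables
ORDER BY heap_blks_read + idx_blks_read DESC
LIMIT 10;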



Subject: Re: [PERFORM] Database design - best practice
From: nielskrist...@autouncle.com
Date: Wed, 28 Nov 2012 14:20:27 +0100
CC: pgsql-performance@postgresql.org
To: willem_lee...@hotmail.com

Thanks for the advice.
Currently I see a lot of I/O related to updates/inserts, so I'm trying to 
track those down first. In relation to question 2, I read somewhere in the 
documentation that because of MVCC, the whole row has to be rewritten even 
though I just update one single column in that row. Hence if the table is wider 
(has more columns), the update will be slower. Does this match your 
understanding?
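One way to track them down, assuming the pg_stat_statements extension is 
installed (column names differ slightly between versions), is to sort 
statements by the blocks they dirty or write:

SELECT calls, shared_blks_dirtied, shared_blks_written, query
FROM pg_stat_statements
ORDER BY shared_blks_written DESC
LIMIT 10;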
On 28/11/2012 at 14.10, Willem Leenen <willem_lee...@hotmail.com> wrote:

Niels,

" I can't see why it would make sense to put that into a separate table and 
join in the values " 
You don't normalize for performance. People DEnormalize for performance.


Questions: (AFAIK)

1) This is the road to disaster. Get yourself a book on RDBMS design by, for 
example, Celko. Do NOT go against the grain of the RDBMS rules, such as, in 
this case, rule #1: atomic column values. 
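To see why atomic columns matter, compare what the planner can do with a real 
boolean column versus a serialized blob of flags (serialized_flags is a 
hypothetical column, used only for illustration):

-- Atomic column: can be indexed and has statistics the planner can use.
SELECT count(*) FROM cars WHERE has_automatic_transmission;

-- Serialized column: every row must be fetched and parsed, no plain index helps.
SELECT count(*) FROM cars WHERE serialized_flags LIKE '%automatic:true%';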

2) This is not the big fish you are after. First benchmark your setup and 
compare the results with your desired performance level. Quantify your 
problem, if there is any, before resorting to tricks.
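A sketch of such a benchmark with pgbench and a custom script (table, column 
and row count are assumptions; newer pgbench versions write 
\set id random(1, 4000000) instead of \setrandom):

-- update_test.sql
\setrandom id 1 4000000
UPDATE cars SET last_seen_at = now() WHERE id = :id;

-- Run with e.g. 4 clients for 30 seconds and note the reported tps:
-- pgbench -n -f update_test.sql -c 4 -T 30 yourdatabase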

3) A row needs more memory when it is wider; this may be amplified during 
hash joins. 
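You can see that in a plan: the Hash node reports its memory usage (the join 
column below is an assumption based on your schema description):

EXPLAIN ANALYZE
SELECT * FROM adverts a JOIN cars c ON c.id = a.car_id;
-- The Hash node shows "Buckets: ... Batches: ... Memory Usage: ...kB";
-- wider rows on the hashed side mean more memory, and more batches once
-- work_mem is exceeded.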

4) People DEnormalize for performance. 

5) " Is it significantly faster to select * from a table with 20 columns, than 
selecting the same 20 in a table with 150 columns?" 

I know the answer, but I encourage you to simply test this. I have seen lots 
of urban legends about performance (including dropping referential integrity 
because that would supposedly make a difference...). 
Of course, when it's a full table scan and they are ALL disk reads (or ALL 
memory reads) you can simply calculate it too. But just get into the habit of 
testing for learning.
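A simple way to run that test yourself in psql (a sketch; the narrow copy and 
the column names exist only for the comparison):

\timing on
-- Wide table: select a handful of columns out of ~150.
SELECT id, price, mileage, color FROM cars;
-- Narrow copy holding only those columns.
CREATE TABLE cars_narrow AS SELECT id, price, mileage, color FROM cars;
SELECT id, price, mileage, color FROM cars_narrow;
-- Repeat a few times so both run from cache, then compare the timings.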


My advice:
- know what performance you need.
- test whether you have it, varying table content and system load.
- do not tamper with the RDBMS rules; this will haunt you.
- if you have the latest Postgres version, you can use covering indexes: the 
table isn't accessed at all, which bypasses most of your questions (see the 
sketch below). Check with peers that you've got the indexes right.
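A sketch of what that looks like (PostgreSQL 9.2 or later; the column names 
are assumptions):

CREATE INDEX cars_price_mileage_idx ON cars (price, mileage);
VACUUM ANALYZE cars;  -- keep the visibility map current so the heap can be skipped
EXPLAIN SELECT price, mileage FROM cars WHERE price BETWEEN 5000 AND 10000;
-- Look for "Index Only Scan using cars_price_mileage_idx" in the plan.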

Regards,
Willem



> From: nielskrist...@autouncle.com
> Subject: [PERFORM] Database design - best practice
> Date: Wed, 28 Nov 2012 13:41:14 +0100
> To: pgsql-performance@postgresql.org
> 
> Hi,
> 
> I'm on the hunt for some solid knowledge on a theoretical level about the 
> performance of postgresql. My question is regarding best practices, and how 
> architectural decisions might influence the performance. First a little 
> background:
> 
> The setup:
> I have a database which holds information on used cars. The database has 
> mainly 3 tables of interest for this case:
> A cars table, an adverts table and a sellers table. One car has many adverts 
> and one seller has many adverts. One advert belongs to one car and one seller.
> The database is powering a website for searching used cars. When searching 
> for used cars, the cars table is mainly used, and a lot of the columns should 
> be directly available for searching, e.g. color, mileage, price, 
> has_automatic_transmission etc.
> 
> So my main concern is actually about the cars table, since this one currently 
> has a lot of columns (151 - I expect that's quite a lot?), and a lot of data 
> (4 mil. rows, and growing). Now you might start by thinking, this could sound 
> like a regular need for some normalization, but wait a second and let me 
> explain :-)
> The columns in this table are for the most part very short strings, integers, 
> decimals or booleans. Take, for example, has_automatic_transmission 
> (boolean): I can't see why it would make sense to put that into a separate 
> table and join in the values. The same goes for the mileage or the price. 
> The cars table used for search is indexed quite a lot.
> 
> The questions:
> Having the above setup in mind, what impact on performance, in terms of read 
> performance and write performance, does it have, whether I do the following:
> 1) In general, would reads and/or writes on the database be faster if I 
> serialized some of the non-searched columns in the table into a single text 
> column instead of, let's say, 20 booleans?
> 2) Let's say I'm updating a timestamp in a single one of the 151 columns in 
> the cars table. The update statement uses the id to find the car. Would 
> the write performance of that UPDATE be affected if the table had fewer 
> columns?
> 3) When adding a new column to the table, I know that it becomes slower the 
> more rows are in the table, but what about the "width" of the table: does 
> that affect the performance when adding new columns?
> 4) In general what performance downsides do you get when adding a lot of 
> columns to one table instead of having them in separate tables?
> 5) Is it significantly faster to select * from a table with 20 columns, than 
> selecting the same 20 in a table with 150 columns?
> 
> Hope there is some good answers out there :-)
> 
> -- 
> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
                                          
