I've just been looking at how alignment of columns in tuples can make the tuple larger than needed.
I created 2 tables, neither of which is very real-world, but I was hunting for the extremes here. The first table contains an int primary key and then a total of 10 int columns and 10 boolean columns, with one boolean column placed after each int column so that the order is int, bool, int, bool, etc. In the 2nd table I put all the ints first and all the booleans at the end of the table. I then inserted 1 million records into each table and checked the sizes. The first table was 112 MB and the 2nd table was 81 MB, so naturally there is a difference when it comes to running queries on these tables:

postgres=# select sum(Value1) from test1;
     sum
--------------
 500000500000
(1 row)

Time: 239.306 ms
postgres=# select sum(Value1) from test2;
     sum
--------------
 500000500000
(1 row)

Time: 186.926 ms

So in this example a full scan and aggregate of a single column is 28% faster. I'm sure in the real world there are many cases where a better choice of column ordering would save space and save processing time, but is this something that we want to leave up to our users?

I've not yet looked at the code to see how hard it would be to implement a separation of the columns' physical order from their logical order. I really just want to get an idea of the thoughts on such a change. I would imagine it should be possible to have a function which optimises the column order, run when a table is created or rewritten. New columns would still go onto the end of the tuple unless the table had to be rewritten, in which case the column order would be optimised again. In all places where columns are displayed without an explicit ordering, e.g. SELECT * and in psql, the catalog could be queried to see which order the columns should be displayed in.

For reference I've attached the script I used for testing this.

I'd like to implement this as a project, but before I start any work on it I'd just like to find out other people's thoughts on it.

Regards

David Rowley
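The size difference can be reproduced on paper from the usual alignment rules: an int4 is 4-byte aligned, a bool is 1-byte aligned, and each bool that sits in front of an int forces 3 padding bytes. The sketch below is my own back-of-the-envelope model (the constants assume a 64-bit platform with MAXALIGN 8 and a 23-byte tuple header with no null bitmap), not code from the server source:

```python
# Rough model of heap tuple sizes under PostgreSQL's alignment rules.
# Assumptions (illustrative, not taken from the server source):
# int4 has size 4 / alignment 4, bool has size 1 / alignment 1, and the
# 23-byte tuple header is padded to MAXALIGN (8 on 64-bit platforms).

MAXALIGN = 8
INT4 = (4, 4)   # (size, alignment)
BOOL = (1, 1)

def align_up(offset, alignment):
    """Round offset up to the next multiple of alignment."""
    return -(-offset // alignment) * alignment

def tuple_size(columns):
    """Return (data bytes, MAXALIGNed total including header) for one tuple."""
    offset = 0
    for size, alignment in columns:
        offset = align_up(offset, alignment) + size
    header = align_up(23, MAXALIGN)          # no nulls, so t_hoff = 24
    return offset, align_up(header + offset, MAXALIGN)

# test1: id, then int/bool interleaved ten times
interleaved = [INT4] + [INT4, BOOL] * 10
# test2: id, ten ints, then ten bools
grouped = [INT4] + [INT4] * 10 + [BOOL] * 10

data1, row1 = tuple_size(interleaved)
data2, row2 = tuple_size(grouped)
print(data1, row1)   # 81 112  -> 27 bytes of padding per row
print(data2, row2)   # 54 80
```

Per row that's 112 vs 80 bytes of heap before the per-tuple line pointer and page overhead are added, which accords reasonably well with the 112 MB vs 81 MB observed for 1 million rows.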
CREATE TABLE test1 (
    id serial not null primary key,
    value1 int not null default 10,
    value1_enabled boolean not null default true,
    value2 int not null default 10,
    value2_enabled boolean not null default true,
    value3 int not null default 10,
    value3_enabled boolean not null default true,
    value4 int not null default 10,
    value4_enabled boolean not null default true,
    value5 int not null default 10,
    value5_enabled boolean not null default true,
    value6 int not null default 10,
    value6_enabled boolean not null default true,
    value7 int not null default 10,
    value7_enabled boolean not null default true,
    value8 int not null default 10,
    value8_enabled boolean not null default true,
    value9 int not null default 10,
    value9_enabled boolean not null default true,
    value10 int not null default 10,
    value10_enabled boolean not null default true
);

CREATE TABLE test2 (
    id serial not null primary key,
    value1 int not null default 10,
    value2 int not null default 10,
    value3 int not null default 10,
    value4 int not null default 10,
    value5 int not null default 10,
    value6 int not null default 10,
    value7 int not null default 10,
    value8 int not null default 10,
    value9 int not null default 10,
    value10 int not null default 10,
    value1_enabled boolean not null default true,
    value2_enabled boolean not null default true,
    value3_enabled boolean not null default true,
    value4_enabled boolean not null default true,
    value5_enabled boolean not null default true,
    value6_enabled boolean not null default true,
    value7_enabled boolean not null default true,
    value8_enabled boolean not null default true,
    value9_enabled boolean not null default true,
    value10_enabled boolean not null default true
);

insert into test1 (value1) select x.x from generate_series(1,1000000) x(x);
insert into test2 (value1) select x.x from generate_series(1,1000000) x(x);

postgres=# select pg_size_pretty(pg_relation_size('test2'));
 pg_size_pretty
----------------
 81 MB
(1 row)

postgres=# select pg_size_pretty(pg_relation_size('test1'));
 pg_size_pretty
----------------
 112 MB
(1 row)

postgres=# select sum(Value1) from test1;
     sum
--------------
 500000500000
(1 row)

Time: 239.306 ms
postgres=# select sum(Value1) from test2;
     sum
--------------
 500000500000
(1 row)

Time: 186.926 ms
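For anyone wanting to see the per-type sizes and alignments that drive this, they are already exposed in the catalogs; a query along these lines (my own illustration, using the existing pg_attribute/pg_type columns) shows the physical order of a table's columns together with each type's length and alignment code ('i' = int alignment, 'c' = char/byte alignment):

```sql
-- Physical column order plus each type's storage length and alignment
SELECT a.attnum, a.attname, t.typname, t.typlen, t.typalign
FROM pg_attribute a
JOIN pg_type t ON t.oid = a.atttypid
WHERE a.attrelid = 'test1'::regclass
  AND a.attnum > 0
ORDER BY a.attnum;
```

An order-optimising function would essentially sort on typalign/typlen descending, and a separate logical ordering would let SELECT * and psql keep displaying the columns as the user declared them.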
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers