I wrote this query to identify how much space is being wasted in a
database by denormalized data. For each field in each table, it
calculates how much space the field is taking in it's table as well as
indexes, then it calculates how much space would be needed if that field
was normalized into a separate table. It places some (somewhat
arbitrary) minimums on how much space would have to be saved to include
that field in the output. If you want to get rid of the limit you should
still keep savings > 0 in the query, otherwise you'll start seeing
normalization suggestions that make no sense (like normalizing an int).

As it stands, the query reports a total for the entire database, but you
can use just sub-sections of the query to see savings per-table, or
per-field.

-- Total
SELECT pg_size_pretty(sum(sum)::bigint)
    -- Summarize by table
    FROM (SELECT schemaname, tablename, sum(savings), 
pg_size_pretty(sum(savings)::bigint)
            -- Get pretty size. Start here if you want per-table info
            FROM (SELECT *, pg_size_pretty(savings::bigint)
                    FROM (
-- Here's where the real work starts
SELECT *, table_space_delta + coalesce(index_space_delta, 0) - side_table_space 
AS savings
    FROM (
        -- Figure out how much space we'd save in indexes by converting to an 
int
        SELECT *, index_tuple_count*(avg_width-4) AS index_space_delta
            FROM (SELECT s.*, index_count, index_tuple_count
                        FROM (SELECT schemaname, tablename, attname, null_frac, 
avg_width, n_distinct
                                            -- How much space would we gain by 
changing this field to an int?
                                            , 
reltuples*(1-null_frac)*(avg_width-4) AS table_space_delta
                                            -- Estimate how big our "side 
table" will be
                                            , CASE WHEN n_distinct >= 0 THEN 
n_distinct ELSE -n_distinct * reltuples END
                                                * 
(24+4+avg_width+6+4+6+avg_width) AS side_table_space
                                FROM pg_stats s
                                    JOIN pg_class c ON c.relname=tablename AND 
c.relkind='r'
                                    JOIN pg_namespace n ON n.oid = 
c.relnamespace AND n.nspname=s.schemaname
                                WHERE schemaname NOT IN 
('pg_catalog','information_schema')
                            ) s
                            NATURAL LEFT JOIN (
                                    SELECT n.nspname AS schemaname, c.relname 
AS tablename, attname
                                                , count(*) AS index_count, 
sum(i.reltuples) AS index_tuple_count
                                        FROM pg_index x
                                            JOIN pg_class c ON c.oid = 
x.indrelid
                                            JOIN pg_class i ON i.oid = 
x.indexrelid
                                            JOIN pg_namespace n ON n.oid = 
c.relnamespace
                                            JOIN pg_attribute a ON a.attrelid = 
i.oid
                                        GROUP BY n.nspname, c.relname, attname
                                                ) i
                    ) a
            ) a
                            ) a
                        -- Minimum savings to consider per-field
                        WHERE savings > 1e6) a
                GROUP BY schemaname, tablename
                -- Minimum savings to consider per-table
                HAVING sum(savings) > 1e7
                ORDER BY sum(savings) DESC
            ) a
;
-- 
Decibel!, aka Jim C. Nasby, Database Architect  [EMAIL PROTECTED] 
Give your computer some brain candy! www.distributed.net Team #1828

Attachment: pgpPK8JSpK8sA.pgp
Description: PGP signature

Reply via email to