From: pgsql-general-ow...@postgresql.org [mailto:pgsql-general-ow...@postgresql.org] On Behalf Of Ivan Fabris, Gruppo ColliniConsulting.it Sent: Wednesday, April 12, 2017 6:15 AM To: 'pgsql-general@postgresql.org' <pgsql-general@postgresql.org> Subject: [GENERAL] advice for efresh of materialized view
Attention: This email was sent from someone outside of Perceptron. Always exercise caution when opening attachments or clicking links from unknown senders or when receiving unexpected emails. Hi all, I have a table defined this way ( a couple of indexes are actually obsolete, the table needs a little clean ) : # \d categories_stat Table "public.categories_stat" Column | Type | Modifiers --------------+-----------------------------+----------------------------------------------------------------------------- id | numeric(1000,1) | not null default function_get_next_sequence('categories_stat_id_seq'::text) giorno | timestamp without time zone | categoria | character varying(50) | accessi | integer | bytes2 | integer | time_elapsed | character varying(10) | utenti | character varying(50) | action | integer | Indexes: "categories_stat_pkey" PRIMARY KEY, btree (id) "action_stat_ginidx" gin (to_tsvector('english'::regconfig, action::text)) "action_stat_idx" btree (action) "categoria_stat_ginidx" gin (to_tsvector('english'::regconfig, categoria::text)) "categoria_stat_idx" btree (categoria) "categories_stat_giorno_date_idx" brin ((giorno::date)) "categories_stat_giorno_time_idx" brin ((giorno::time without time zone)) "categories_stat_utenti_like_idx" btree (utenti text_pattern_ops) "giorno_idx" btree (giorno) CLUSTER "keycategories_stat" btree (id) "utenti_stat_idx" btree (utenti) The table has about 100M records, and is growing, we are thinking about partition it by day ( the column "giorno" ). To speed up some queries, we defined a materialized view # \d+ categories_stat_materialized_view Materialized view "public.categories_stat_materialized_view" Column | Type | Modifiers | Storage | Stats target | Description -----------+-----------------------+-----------+----------+--------------+------------- id | numeric | | main | | giorno | date | | plain | | categoria | character varying(50) | | extended | | accessi | bigint | | plain | | utenti | character varying(50) | | extended | | action | integer | | plain | | Indexes: "categories_stat_materialized_view_id_idx" UNIQUE, btree (id) "categories_stat_materialized_view_aggregate_1" btree (utenti, giorno, action) View definition: SELECT max(categories_stat.id) AS id, categories_stat.giorno::date AS giorno, categories_stat.categoria, sum(categories_stat.accessi) AS accessi, categories_stat.utenti, categories_stat.action FROM categories_stat GROUP BY (categories_stat.giorno::date), categories_stat.categoria, categories_stat.utenti, categories_stat.action; The "id" column was previously defined as "nextval( somesequence )" and was about 0.1% faster, then we kept max(categories_stat.id) due to internal standards. # explain analyze SELECT max(categories_stat.id) AS id, categories_stat.giorno::date AS giorno, categories_stat.categoria, sum(categories_stat.accessi) AS accessi, categories_stat.utenti, categories_stat.action FROM categories_stat GROUP BY (categories_stat.giorno::date), categories_stat.categoria, categories_stat.utenti, categories_stat.action; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Finalize GroupAggregate (cost=22436069.43..23036104.47 rows=7273152 width=65) (actual time=186699.351..189719.626 rows=1529896 loops=1) Group Key: ((giorno)::date), categoria, utenti, action -> Sort (cost=22436069.43..22508800.95 rows=29092608 width=65) (actual time=186699.335..187540.305 rows=4987051 loops=1) Sort Key: ((giorno)::date), categoria, utenti, action Sort Method: external merge Disk: 287784kB -> Gather (cost=8094034.95..11499463.60 rows=29092608 width=65) (actual time=131126.010..164090.150 rows=4987051 loops=1) Workers Planned: 4 Workers Launched: 4 -> Partial GroupAggregate (cost=8093034.95..8589202.80 rows=7273152 width=65) (actual time=137793.619..161235.482 rows=997410 loops=5) Group Key: ((giorno)::date), categoria, utenti, action -> Sort (cost=8093034.95..8150928.30 rows=23157340 width=38) (actual time=137793.605..149953.194 rows=18522425 loops=5) Sort Key: ((giorno)::date), categoria, utenti, action Sort Method: external merge Disk: 946016kB -> Parallel Seq Scan on categories_stat (cost=0.00..1370601.75 rows=23157340 width=38) (actual time=0.044..13328.278 rows=18522425 loops=5) Planning time: 1.323 ms Execution time: 189956.108 ms (16 rows) The key point is that the periodic refresh of the MV ( REFRESH materialized view concurrently categories_stat_materialized_view ) takes about 8 minutes ( it takes about 3 to create the MV ). I'd like to know if there is a way to speed up the refresh ( or the creation ) .... with some additional indexes, maybe ? Thanks in advance ____________________________________________________________________________________________________ Also, in regards to: Sort Method: external merge Disk: 287784kB I'd increase work_mem setting, to avoid on-disk sorting. Regards, Igor Neyman