Hi,

I've been wondering whether this might improve behavior with one of my workloads that suffers from GIN bloat (the same one I used to test GIN fastscan, for example).
It's a batch process that loads a mailing list archive into a table with a GIN index on the message body, by doing something like this:

    for file in files:
        BEGIN;
        for message in file:
            SAVEPOINT s;
            INSERT INTO messages VALUES (...)
            if error:
                ROLLBACK TO s;
        COMMIT;

There are multiple processes, each processing a subset of the mbox files. There are ~1M messages, and right after the load I see this:

                          List of relations
     Schema |       Name       | Type  | Owner |  Table   |  Size
    --------+------------------+-------+-------+----------+---------
     public | message_body_idx | index | tomas | messages | 2247 MB
    (1 row)

and after VACUUM FULL:

                          List of relations
     Schema |       Name       | Type  | Owner |  Table   |  Size
    --------+------------------+-------+-------+----------+---------
     public | message_body_idx | index | tomas | messages | 403 MB
    (1 row)

So the index is more than 5x larger than it needs to be, which is probably expected given the number of random inserts within a very short time (~15 minutes), executed in parallel.

I hoped that lowering the fillfactor would improve this, but fillfactor=75 had pretty much no effect in this case. Is that expected for this kind of workload? I see the previous discussion talked about random updates, not inserts, so maybe that's the culprit?

-- 
Tomas Vondra                  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
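
P.S. In case anyone wants to reproduce the shape of the load, here is a minimal runnable sketch of the loop described above. It is not the actual loader. To keep it self-contained it uses Python's built-in sqlite3 module as a stand-in for PostgreSQL, so it shows only the transaction/savepoint pattern and says nothing about GIN behavior. The function name load_files, the (id, body) table layout and the sample data are made up for illustration. The RELEASE after each message is also my addition (the pseudocode above does not have it), so that savepoints named "s" do not pile up within one transaction.

```python
import sqlite3


def load_files(conn, files):
    """Load each file in its own transaction, one savepoint per message.

    `files` is an iterable of files; each file is an iterable of
    (msg_id, body) tuples. Returns (inserted, skipped).
    """
    inserted = skipped = 0
    cur = conn.cursor()
    for messages in files:
        cur.execute("BEGIN")
        for msg_id, body in messages:
            cur.execute("SAVEPOINT s")
            try:
                cur.execute(
                    "INSERT INTO messages (id, body) VALUES (?, ?)",
                    (msg_id, body),
                )
                inserted += 1
            except sqlite3.Error:
                # Undo only the failed message, keep the rest of the file.
                cur.execute("ROLLBACK TO s")
                skipped += 1
            # Drop the savepoint so they do not stack up (assumption, see above).
            cur.execute("RELEASE s")
        cur.execute("COMMIT")
    return inserted, skipped


# isolation_level=None disables the module's implicit BEGIN, so the
# explicit BEGIN/COMMIT statements above control the transactions.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE messages (id INTEGER PRIMARY KEY, body TEXT NOT NULL)")

# Hypothetical sample data: two "files", the second one repeats id 2.
sample_files = [
    [(1, "hello"), (2, "world")],
    [(2, "duplicate id"), (3, "again")],
]
inserted, skipped = load_files(conn, sample_files)
row_count = conn.execute("SELECT count(*) FROM messages").fetchone()[0]
print(inserted, skipped, row_count)
```

Against PostgreSQL the same structure should carry over with a driver such as psycopg2, with the placeholders changed to that driver's style and one such process started per subset of mbox files.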