On Tue, May 22, 2012 at 11:01 AM, Robert Haas <robertmh...@gmail.com> wrote: > On Tue, May 22, 2012 at 1:41 PM, Tom Lane <t...@sss.pgh.pa.us> wrote: >> Jeff Janes <jeff.ja...@gmail.com> writes: >>> Now that there are index only scans, there is a use case for having a >>> composite index which has the primary key or a unique key as the >>> prefix column(s) but with extra columns after that. Currently you >>> would also need another index with exactly the primary/unique key, >>> which seems like a waste of storage and maintenance. >> >>> Should there be a way to declare a "unique" index with the unique >>> property applying to a prefix of the indexed columns/expression? And >>> having that, a way to turn that prefix into a primary key constraint? >> >>> Of course this is easier said then done, but is there some reason for >>> it not to be a to-do item? >> >> Um ... other than it being ugly as sin? I can't say that I can get >> excited about this concept. It'd be better to work on index-organized >> tables, which is really more or less what you're wishing for here. >> Duplicating most of a table into an index is always going to be a loser >> in the end because of the redundant storage. > > An index on pgbench_accounts (aid, abalance) is the same size as an > index on pgbench_accounts (aid), but even if it were larger, there's > no theoretical reason it couldn't have enough utility to justify its > existence. A bigger problem is that creating such an index turns all > of pgbench's write traffic from HOT updates into non-HOT updates, > which means this is probably only going to be a win if the write > volume is miniscule.
That seems overly pessimistic to me. pgbench_accounts only has one index on it, and that index is already being used to find the row in the first place, so the relevant leaf block is already in memory. If you have a table with 12 indexes on it, then the cost of non-HOT would be much higher. But then again, with that number of indexes it is probably already non-HOT anyway. Since pgbench does not actually attempt to violate the PK constraint, I can drop it without altering the behavior of the system. This neglects the overhead of checking the "prefix" constraint were that to be possible, but that overhead should be almost entirely CPU, and so is negligible to this IO bound workload. I'm running some tests where I mix the work load of pgbench by doing "TPC-B (sort of)" transaction mixed in with a variable number of SELECT-only transactions, at a ratio varying between 1:0 to 1:10. It is often said that the default pgbench is an unrealistically write-heavy workload. So mixing in some SELECT-only is probably only going to improve its real-world alignment. In fact I wondering if it would make sense to add a feature to pgbench to make such admixture easy to do, rather than the current pain of creating multiple sql files, specifying a bunch of -f switches in various ratios, and remembering to always specify the correct -s flag. From preliminary test it looks like an index on (aid, abalance) wins at all ratios from 1:1 to 1:10, and at 1:0 it is mostly a toss up. I still want to do a few overnight runs to see how the decay in the visibility map, and perhaps autovacuum kicking in or failing to kick in, effect things. Cheers, Jeff -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers