Re: [HACKERS] Add primary key/unique constraint using prefix columns of an index

Jeff Janes Wed, 23 May 2012 11:00:52 -0700

On Tue, May 22, 2012 at 11:01 AM, Robert Haas <[email protected]> wrote:
> On Tue, May 22, 2012 at 1:41 PM, Tom Lane <[email protected]> wrote:
>> Jeff Janes <[email protected]> writes:
>>> Now that there are index only scans, there is a use case for having a
>>> composite index which has the primary key or a unique key as the
>>> prefix column(s) but with extra columns after that.  Currently you
>>> would also need another index with exactly the primary/unique key,
>>> which seems like a waste of storage and maintenance.
>>
>>> Should there be a way to declare a "unique" index with the unique
>>> property applying to a prefix of the indexed columns/expression?  And
>>> having that, a way to turn that prefix into a primary key constraint?
>>
>>> Of course this is easier said then done, but is there some reason for
>>> it not to be a to-do item?
>>
>> Um ... other than it being ugly as sin?  I can't say that I can get
>> excited about this concept.  It'd be better to work on index-organized
>> tables, which is really more or less what you're wishing for here.
>> Duplicating most of a table into an index is always going to be a loser
>> in the end because of the redundant storage.
>
> An index on pgbench_accounts (aid, abalance) is the same size as an
> index on pgbench_accounts (aid), but even if it were larger, there's
> no theoretical reason it couldn't have enough utility to justify its
> existence.   A bigger problem is that creating such an index turns all
> of pgbench's write traffic from HOT updates into non-HOT updates,
> which means this is probably only going to be a win if the write
> volume is miniscule.


That seems overly pessimistic to me.  pgbench_accounts only has one
index on it, and that index is already being used to find the row in
the first place, so the relevant leaf block is already in memory.  If
you have a table with 12 indexes on it, then the cost of non-HOT would
be much higher.  But then again, with that number of indexes it is
probably already non-HOT anyway.

Since pgbench does not actually attempt to violate the PK constraint,
I can drop it without altering the behavior of the system.  This
neglects the overhead of checking the "prefix" constraint were that to
be possible, but that overhead should be almost entirely CPU, and so
is negligible to this IO bound workload.

I'm running some tests where I mix the work load of pgbench by doing
"TPC-B (sort of)" transaction mixed in with a variable number of
SELECT-only transactions, at a ratio varying between 1:0 to 1:10.

It is often said that the default pgbench is an unrealistically
write-heavy workload.  So mixing in some SELECT-only is probably only
going to improve its real-world alignment.  In fact I wondering if it
would make sense to add a feature to pgbench to make such admixture
easy to do, rather than the current pain of creating multiple sql
files, specifying a bunch of -f switches in various ratios, and
remembering to always specify the correct -s flag.


From preliminary test it looks like an index on (aid, abalance) wins
at all ratios from 1:1 to 1:10, and at 1:0 it is mostly a toss up.  I
still want to do a few overnight runs to see how the decay in the
visibility map, and perhaps autovacuum kicking in or failing to kick
in, effect things.


Cheers,

Jeff

-- 
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Add primary key/unique constraint using prefix columns of an index

Reply via email to