Proposal Clarification.
I see that discussion become too complicated. So, I'd like to clarify
what we are talking about.
We are discussing 2 different improvements of index.
The one is "partially unique index" and the other "index with included
columns".
Let's look at example.
- We have a table tbl(f1, f2, f3, f4).
- We want to have an unique index on (f1,f2).
- We want to have an index on (f1, f2, f3) which allow us to use index
for complex "where" clauses.
- We would like to have a covering index on all columns to avoid reading
of heap pages.
What are we doing now:
CREATE UNIQUE INDEX on tbl(f1,f2);
CREATE INDEX on tbl(f1, f2, f3, f4);
What's wrong?
- Two indexes with repeated data. Overhead to data manipulation
operations and database size.
- We don't need f4 as index key. But we have to...
- Problem related to previous. It's possible that f4 has no opclass for
our index. So there's no way to include it to index.
While we don't need any opclass at all.
Suggestions.
CREATE INDEX idx ON tbl (f1, f2, f3) [UNIQUE ON (f1, f2)] [INCLUDE (f4)];
Let's review it stepby step.
1. "partially unique index"
CREATE INDEX idx ON tbl (f1, f2, f3) UNIQUE ON (f1, f2);
It means that we want to have columns (f1, f2, f3) as index keys in our
index.
But we want enforce uniqueness only on first two.
It allows us insert (1,1,1), (1,2,1) and restricts insert (1,1,1), (1,1,2).
It doesn't affect "select" queries.
Following query can use index-only scan.
SELECT f1,f2, f3 where f1 ... and f2 ... and f3 ....;
We haven't to maintain two indexes now. Just one!
_bt_iseual()
<http://doxygen.postgresql.org/nbtinsert_8c.html#abcfb7a3dcd40a5d1759652239f3a0115>
works with (f1, f2)
_bt_compare()
<http://doxygen.postgresql.org/nbtsearch_8c.html#af689dabb25e99f551b68aa9b7a1e6ea6>
works with (f1,f2,f3)
2. "index with included columns" It goes well with both unique and
non-unique indexes.
CREATE INDEX idx ON tbl (f1, f2, f3) INCLUDE (f4);
What we get here:
- f4 is not search key.
- f4 could not have opclass for our index
- f4 is only in the leaf pages and it's not bloating internal nodes of
b-tree.
- f4 can still be retrieved without going to the heap, so including it
in the leaf nodes makes it possible to do index-only scans more often
Following query can use index-only scan:
SELECT f1,f2, f3, f4 where f1 ... and f2 ... and f3 ....;
And this one wouldn't use index-only scan because recheck on f4 is required.
SELECT f1,f2, f3, f4 where f1 ... and f2 ... and f3 .... and f4;
Catalog changes:
Now:
pg_index
<http://doxygen.postgresql.org/pg__index_8h.html#a5065be0408f03576083a30c97b43a09a>
int16 indnatts; /* number of columns in index */
bool indisunique; /* is this a unique index? */
New:
pg_index
int16 ind_n_unique_atts; /* number of unique columns in index. counted
from begin of index. 0 means that index is not unique */
int16 ind_n_key_atts; /* number of index key columns in index. counted
from begin of index.*/
int16 ind_n_total_atts; /* number of columns in index.*/
In our case:
ind_n_unique_atts = 2; // f1, f2
ind_n_key_atts = 3; // f1, f2, f3
ind_n_total_atts = 4; // f1, f2, f3, f4
P.S. I use many ideas from discussion without quotations just because
I'd like to keep this message readable. Thanks to everyone.
--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company