[PERFORM] Statistics and Multi-Column indexes

lars Sun, 10 Jul 2011 14:17:01 -0700

I know this has been discussed various times...

We are maintaining a large multi-tenant database where *all* tables have a tenant-id and all indexes and PKs lead with the tenant-id. Statistics and counts for all other columns are only really meaningful within the context of the tenant they belong to.


There appear to be five options for me:

1. Use single-column indexes on all interesting columns and rely on PostgreSQL's bitmap index scans to combine them (which are pretty cool).

2. Use multi-column indexes and accept that sometimes Postgres picks the wrong index (because a non-tenant-id column might seem highly selective over the whole table, but it is not for a particular tenant - or vice versa).

3. Use a functional index that combines multiple columns and only query via these, which causes statistics to be gathered for the expression.
   I.e. create index i on t((tenantid||column1)) and SELECT ... FROM t WHERE tenantid||column1 = '...'

4. Play with n_distinct and/or set the statistics for the inner columns to some fixed values that lead to the plans that we want.

5. Have a completely different schema and maybe a database per tenant.

Currently we use Oracle and query hinting, but I do not like that practice at all (and Postgres does not have hints anyway).

Are there any other options?

#1 would be the simplest, but I am concerned about the overhead, both maintaining two indexes and building the bitmap during queries - for every query.
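
To make #1 concrete, here is a minimal sketch. The table t and the columns tenantid and column1 follow the naming used in #3; the index names and the literal values are made up for illustration, and both columns are assumed to be text.

```sql
-- Hypothetical single-column indexes (names are illustrative only).
CREATE INDEX t_tenantid_idx ON t (tenantid);
CREATE INDEX t_column1_idx  ON t (column1);

-- If the planner decides to combine the two indexes, the plan contains
-- a BitmapAnd node over two Bitmap Index Scans, feeding a Bitmap Heap Scan.
-- EXPLAIN ANALYZE runs the query, so the bitmap-building cost shows up
-- in the reported timings.
EXPLAIN ANALYZE
SELECT *
FROM t
WHERE tenantid = 'T1'
  AND column1  = 'abc';
```

Whether the planner actually chooses the BitmapAnd depends on its selectivity estimates for each column, so the plan would have to be checked per query.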

I don't think #2 is actually an option. We have some tenants with many (sometimes hundreds of) millions of rows per table, and picking the wrong index would be disastrous.

Could something like #3 be generally added to Postgres? I.e. if there is a multi-column index, keep combined statistics for the involved columns. Of course in that case it is no longer possible to query the index by prefix.

#3 also seems expensive, as the expression needs to be evaluated for each changed row.
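
A rough sketch of how the statistics for #3 could be inspected, using the index from the list above. As far as I know, ANALYZE stores statistics for an expression index under the name of the index rather than the table, so they should be visible in pg_stats with tablename set to the index name.

```sql
-- The expression index from option #3.
CREATE INDEX i ON t ((tenantid || column1));

-- ANALYZE on the table also gathers statistics for the indexed expression.
ANALYZE t;

-- Expression statistics are listed under the index name, not the table name.
SELECT attname, n_distinct, null_frac
FROM pg_stats
WHERE tablename = 'i';
```

One caveat with plain concatenation: different value pairs can produce the same string ('ab' || 'c' equals 'a' || 'bc'), so a separator that cannot occur in either column would probably be needed in practice. That is an assumption about the data, not something I have verified.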

Still trying #4. I guess it involves setting the stat target for the inner columns to 0 and then inserting my own records into pg_statistic. Probably only setting n_distinct, i.e. set it "low" if the inner column is not selective within the context of a tenant and "high" otherwise.
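
For reference, a sketch of the n_distinct part of #4 that avoids writing to pg_statistic by hand. It assumes PostgreSQL 9.0 or later, where n_distinct can be overridden per column; the value 20 is made up for illustration. The override only takes effect at the next ANALYZE, so this variant leaves the statistics target alone: with a target of 0 the column is skipped by ANALYZE and the override would never be applied.

```sql
-- Tell ANALYZE to record a fixed number of distinct values for column1
-- instead of its own estimate (20 is an arbitrary example value).
-- A negative value between -1 and 0 would instead be read as a fraction
-- of the row count.
ALTER TABLE t ALTER COLUMN column1 SET (n_distinct = 20);

-- The override is picked up by the next ANALYZE.
ANALYZE t;

-- Check what the planner will now see.
SELECT attname, n_distinct
FROM pg_stats
WHERE tablename = 't'
  AND attname = 'column1';

-- Undo the override; the next ANALYZE goes back to estimating.
ALTER TABLE t ALTER COLUMN column1 RESET (n_distinct);
```

Note that this is still a single value for the whole table, not a per-tenant one, so it can only bias the planner in one direction for all tenants.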


For various reasons #5 is also not an option.

And of course the same set of questions comes up with joins.

Thanks.

-- Lars

