Adam Gundy wrote:
On Thu, Apr 10, 2008 at 8:52 AM, Adam Gundy <[EMAIL PROTECTED]> wrote:
Richard Huxton wrote:
How many distinct values do you have in groups.groupid and
group_access.group_id?
 for the small database (since it shows the same problem):

 group_access: 280/268
 groups: 2006/139

 for the large database:

 group_access: same
 groups: 1712647/140

 the groupid key is an MD5 hash, so it should be uniformly distributed.
maybe that throws the stats? but, again, char works, varchar doesn't :-(

OK, I'm thinking the varchar/char part is not the issue.

Good, because it's not :-)

the database is very unbalanced, most of the groups are 1000 or less
records, with one group occupying 95% of the records.

I was wondering - that's why I asked for the stats.

I *think* that when I analyze using char instead of varchar, it is
recording a stat for the large group, but for some reason with varchar
doesn't add a stat for that one.

so, the real question is, how do I fix this? I can turn the stats way
up to 1000, but that doesn't guarantee that I'll get a stat for the
large group :-(

Well, by default it will be tracking the 10 most common values (and how often they occur). As you say, this can be increased to 1000 (although it obviously takes longer to check 1000 rather than 10).

We can have a look at the stats with something like:
SELECT * FROM pg_stats WHERE tablename='group_access' AND attname='uid';
You'll be interested in n_distinct, most_common_vals and most_common_freqs.

However, I think the problem may be that PG doesn't track cross-column stats, so it doesn't know that a particular uid implies one or more particular groupid values.

can I turn the statistics off completely for this column? I'm guessing
that if I can, that will mean it takes a guess based on the number of
distinct values in the groups table, which is still large number of
records, possibly enough to trigger the seqscan anyway.

No - can't disable stats. Besides, you want it the other way around - index scans for all groups except the largest.

does postgres have a way of building a 'counted index' that the
planner can use for it's record counts? some way of forcibly
maintaining a stat for every group?

No, but let's see what's in pg_stats.

the groups are not related to one another - is it possible to
partition them into their own indexes somehow?

Yes, but it will depend on having an explicit group_id=... clause in the query as well as on the index. That's not going to help you here.

--
  Richard Huxton
  Archonet Ltd

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Reply via email to