On Thu, Apr 10, 2008 at 8:52 AM, Adam Gundy <[EMAIL PROTECTED]> wrote:
> Richard Huxton wrote:
> > How many distinct values do you have in groups.groupid and
> group_access.group_id?
> >
>
>  for the small database (since it shows the same problem):
>
>  group_access: 280/268
>  groups: 2006/139
>
>  for the large database:
>
>  group_access: same
>  groups: 1712647/140
>
>  the groupid key is an MD5 hash, so it should be uniformly distributed.
> maybe that throws the stats? but, again, char works, varchar doesn't :-(

OK, I'm thinking the varchar/char part is not the issue.

the database is very unbalanced, most of the groups are 1000 or less
records, with one group occupying 95% of the records.

I *think* that when I analyze using char instead of varchar, it is
recording a stat for the large group, but for some reason with varchar
doesn't add a stat for that one.

so, the real question is, how do I fix this? I can turn the stats way
up to 1000, but that doesn't guarantee that I'll get a stat for the
large group :-(

can I turn the statistics off completely for this column? I'm guessing
that if I can, that will mean it takes a guess based on the number of
distinct values in the groups table, which is still large number of
records, possibly enough to trigger the seqscan anyway.

does postgres have a way of building a 'counted index' that the
planner can use for it's record counts? some way of forcibly
maintaining a stat for every group?

the groups are not related to one another - is it possible to
partition them into their own indexes somehow?

ahh. lots of questions, no (obvious to me) answers from googling around.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Reply via email to