On Thu, Apr 10, 2008 at 8:52 AM, Adam Gundy <[EMAIL PROTECTED]> wrote:
> Richard Huxton wrote:
> > How many distinct values do you have in groups.groupid and
> > group_access.group_id?
>
> for the small database (since it shows the same problem):
>
> group_access: 280/268
> groups: 2006/139
>
> for the large database:
>
> group_access: same
> groups: 1712647/140
>
> the groupid key is an MD5 hash, so it should be uniformly distributed.
> maybe that throws the stats? but, again, char works, varchar doesn't :-(
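(one way to check whether ANALYZE actually captured the dominant group is to look at the pg_stats view directly - a sketch, assuming the column is group_access.group_id; table and column names here are taken from the thread, but your schema name may differ:

```sql
-- Show the most-common-values list and distinct-value estimate the planner
-- will use for group_access.group_id. If the huge group's hash does not
-- appear in most_common_vals, the planner has no idea it is 95% of the rows.
SELECT null_frac, n_distinct, most_common_vals, most_common_freqs
FROM pg_stats
WHERE tablename = 'group_access'
  AND attname = 'group_id';
```

comparing this output between the char and varchar versions of the column should show whether the MCV list is what differs.)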
OK, I'm thinking the varchar/char part is not the issue. the database is
very unbalanced: most of the groups have 1000 or fewer records, with one
group occupying 95% of the records. I *think* that when I analyze using
char instead of varchar, it records a stat for the large group, but for
some reason with varchar it doesn't add a stat for that one.

so, the real question is: how do I fix this? I can turn the stats target
way up to 1000, but that doesn't guarantee that I'll get a stat for the
large group :-(

can I turn the statistics off completely for this column? I'm guessing
that if I can, the planner will fall back to a guess based on the number
of distinct values in the groups table, which is still a large number of
records, possibly enough to trigger the seqscan anyway.

does postgres have a way of building a 'counted index' that the planner
can use for its record counts? some way of forcibly maintaining a stat
for every group? the groups are not related to one another - is it
possible to partition them into their own indexes somehow?

ahh. lots of questions, no (obvious to me) answers from googling around.

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
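(for reference, raising the per-column statistics target is done like this - a sketch, assuming the column is group_access.group_id; with a 95%-dominant value, even a modest MCV list should catch it once the sample is large enough:

```sql
-- Raise the statistics target for just this column (default is much lower),
-- then re-sample. A larger target means a bigger sample and a longer
-- most-common-values list, so a value covering 95% of rows is very likely
-- to be recorded.
ALTER TABLE group_access ALTER COLUMN group_id SET STATISTICS 1000;
ANALYZE group_access;
```

note that SET STATISTICS only changes how much ANALYZE samples; the stat for the big group still comes from sampling, so it is probabilistic rather than "forcibly maintained" - but at 95% of the table it would be extraordinarily unlikely to be missed.)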