Would it be possible to look at a much larger number of samples during
ANALYZE, then use the variation among them to generate a reasonable
number of pg_statistic "samples" to represent our estimate of the actual
distribution?
That would mean more data points for tables where the planner might
benefit from them, and fewer where it wouldn't.
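To make the idea concrete, here is a rough sketch of deciding how many
entries to keep from a large sample. It is not how ANALYZE actually
works. The function name, the max_entries cap and the 1.25x "skew
factor" cutoff are all hypothetical choices for illustration: a value is
kept only if it is noticeably more common than the average value in the
sample, so a uniform column produces few or no entries and a skewed one
produces more.

```python
from collections import Counter


def choose_mcv_entries(sample, max_entries=100, skew_factor=1.25):
    """Hypothetical sketch: pick "most common value" entries from a sample.

    Keeps only values whose sample count exceeds skew_factor times the
    average count per distinct value. Returns (value, frequency) pairs,
    most common first.
    """
    if not sample:
        return []
    counts = Counter(sample)
    n = len(sample)
    avg_count = n / len(counts)       # average occurrences per distinct value
    cutoff = avg_count * skew_factor  # hypothetical "noticeably common" cutoff
    return [
        (value, count / n)
        for value, count in counts.most_common(max_entries)
        if count > cutoff
    ]


# Uniform column: nothing stands out, so no entries are needed.
uniform = list(range(10)) * 10
# Skewed column: one value makes up half the rows.
skewed = [0] * 50 + list(range(1, 51))
print(choose_mcv_entries(uniform))  # []
print(choose_mcv_entries(skewed))   # [(0, 0.5)]
```

The point is only that the amount of statistics stored could follow
from the data itself rather than from a fixed target.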
Maybe it would also be possible to record somewhere the frequency of the
most common value (in the OP's case, about 3%). Then, if we know that
even the most common value is below the index-use threshold, a quick
decision could be made to use the index without looking at the queried
value at all...
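A minimal sketch of that shortcut, assuming the stored statistics are a
mapping from common values to their estimated frequencies. The function
name, the return labels and the 5% threshold are hypothetical; the
real planner compares estimated costs rather than a single fixed
threshold.

```python
def choose_scan(mcv, index_threshold, value=None):
    """Hypothetical sketch of the "most common value" shortcut.

    mcv maps common values to estimated frequencies (0.0 to 1.0).
    Returns "index", "seq", or "estimate" (fall back to the usual
    per-value selectivity estimate).
    """
    if not mcv:
        return "estimate"
    # Shortcut: if even the most common value is rare enough,
    # every value is, so the queried value need not be examined.
    if max(mcv.values()) < index_threshold:
        return "index"
    if value in mcv:
        return "index" if mcv[value] < index_threshold else "seq"
    return "estimate"


# OP-like case: the most common value is about 3% of the rows.
print(choose_scan({"a": 0.03, "b": 0.02}, index_threshold=0.05))  # index
# Skewed case: the value does matter.
skewed = {"a": 0.30, "b": 0.02}
print(choose_scan(skewed, 0.05, value="a"))  # seq
print(choose_scan(skewed, 0.05, value="b"))  # index
```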