On Thu, Mar 1, 2012 at 1:09 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:

> Alexander Korotkov <aekorot...@gmail.com> writes:
> > On Thu, Mar 1, 2012 at 12:39 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> >> I am starting to look at this patch now. I'm wondering exactly why the
> >> decision was made to continue storing btree-style statistics for arrays,
>
> > Probably, btree statistics really does matter for some sort of arrays? For
> > example, arrays representing paths in the tree. We could request a subtree
> > in a range query on such arrays.
>
> That seems like a pretty narrow, uncommon use-case. Also, to get
> accurate stats for such queries that way, you'd need really enormous
> histograms. I doubt that the existing parameters for histogram size
> will permit meaningful estimation of more than the first array entry
> (since we don't make the histogram any larger than we do for a scalar
> column).
>
> The real point here is that the fact that we're storing btree-style
> stats for arrays is an accident, backed into by having added btree
> comparators for arrays plus analyze.c's habit of applying default
> scalar-oriented analysis functions to any type without an explicit
> typanalyze entry. I don't recall that we ever thought hard about
> it or showed that those stats were worth anything.
>
OK. I don't object to removing btree stats from arrays.
What do you think about the pg_stats view in this case? Should it combine the
values histogram and the array length histogram in a single column, like we do
for MCV and MCELEM?

------
With best regards,
Alexander Korotkov.