Robert Haas <robertmh...@gmail.com> writes:
> On Sat, Jul 31, 2010 at 9:16 PM, Stephen Frost <sfr...@snowman.net> wrote:
>> * Kevin Grittner (kevin.gritt...@wicourts.gov) wrote:
>>> Robert Haas  07/31/10 12:33 PM >>>
>>>> Tom Lane  wrote:
>>>>> Failing to store stats isn't a bug?

>>>> Well, it kind of sounds more like you're removing a known
>>>> limitation than fixing a bug.

>>> It's operating as designed and documented.

>> I have to disagree with this, to be honest.  The fact that we've
>> documented what is completely unexpected and frustrating behaviour
>> doesn't mean we get to say it's not a bug.  Not collecting stats, at
>> all, is a pretty bad bug, in my view.

I'm a bit bemused by the claim that this behavior is "documented".  One
comment buried deep in the bowels of the source is not user-visible
documentation in my book.

> I guess I'd appreciate it if someone could explain in more detail in
> what cases we fail to collect stats.  Do we have a typanalyze function
> here that can't possibly work for anything, ever?  Or is it just some
> subset of the cases?

ANALYZE normally collects stats for any expression that there is an
expression index for.  However, it will punt and fail to collect stats
if the expression index uses an opclass whose opckeytype (ie, storage
datatype) is different from the actual expression datatype.  A quick
look into the system catalogs shows that that applies to these opclasses:

 amname |     opcname      |           opcintype           |         opckeytype 
         
--------+------------------+-------------------------------+-----------------------------
 btree  | name_ops         | name                          | cstring
 gist   | point_ops        | point                         | box
 gist   | poly_ops         | polygon                       | box
 gist   | circle_ops       | circle                        | box
 gin    | _int4_ops        | integer[]                     | integer
 gin    | _text_ops        | text[]                        | text
 gin    | _abstime_ops     | abstime[]                     | abstime
 gin    | _bit_ops         | bit[]                         | bit
 gin    | _bool_ops        | boolean[]                     | boolean
 gin    | _bpchar_ops      | character[]                   | character
 gin    | _bytea_ops       | bytea[]                       | bytea
 gin    | _char_ops        | "char"[]                      | "char"
 gin    | _cidr_ops        | cidr[]                        | cidr
 gin    | _date_ops        | date[]                        | date
 gin    | _float4_ops      | real[]                        | real
 gin    | _float8_ops      | double precision[]            | double precision
 gin    | _inet_ops        | inet[]                        | inet
 gin    | _int2_ops        | smallint[]                    | smallint
 gin    | _int8_ops        | bigint[]                      | bigint
 gin    | _interval_ops    | interval[]                    | interval
 gin    | _macaddr_ops     | macaddr[]                     | macaddr
 gin    | _name_ops        | name[]                        | name
 gin    | _numeric_ops     | numeric[]                     | numeric
 gin    | _oid_ops         | oid[]                         | oid
 gin    | _oidvector_ops   | oidvector[]                   | oidvector
 gin    | _time_ops        | time without time zone[]      | time without time 
zone
 gin    | _timestamptz_ops | timestamp with time zone[]    | timestamp with 
time zone
 gin    | _timetz_ops      | time with time zone[]         | time with time zone
 gin    | _varbit_ops      | bit varying[]                 | bit varying
 gin    | _varchar_ops     | character varying[]           | character varying
 gin    | _timestamp_ops   | timestamp without time zone[] | timestamp without 
time zone
 gin    | _money_ops       | money[]                       | money
 gin    | _reltime_ops     | reltime[]                     | reltime
 gin    | _tinterval_ops   | tinterval[]                   | tinterval
 gist   | tsvector_ops     | tsvector                      | gtsvector
 gin    | tsvector_ops     | tsvector                      | text
 gist   | tsquery_ops      | tsquery                       | bigint
(37 rows)

Now, of the above the only cases where we'd be likely to be able to do
anything very useful with stats on the expression value are the name
case, which isn't that exciting in practice, and the tsvector cases.
For tsvector it was only with 8.4 that we had non-toy stats code, so
while the limitation is ancient it's only recently that it started to be
meaningful.

I don't think this can be claimed to be a corner case.  If you set up
an FTS index according to the first alternative offered in

http://developer.postgresql.org/pgdocs/postgres/textsearch-tables.html#TEXTSEARCH-TABLES-INDEX

you will find that the system fails to collect stats for it and so you
get stupid default estimates for your FTS queries.  If this were a
"documented" limitation I'd expect to see a big red warning there to
*not* do it that way.  The only way that you actually get usable
tsvector stats at the moment is to explicitly store the tsvector as an
ordinary column, as in the second approach offered in the above
documentation section.

                        regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to