Re: [HACKERS] Selectivity estimation for intarray with @@

Alexander Korotkov Tue, 21 Jul 2015 09:29:57 -0700

On Tue, Jul 21, 2015 at 6:49 PM, Heikki Linnakangas <hlinn...@iki.fi> wrote:


> On 07/21/2015 03:44 PM, Alexander Korotkov wrote:
>
>> While Uriy is on vacation, I've revised this patch a bit.
>>
>
> I whacked this around quite a bit, and I think it's in a committable state
> now. But if you could run whatever tests you were using before on this, to
> make sure it still produces the same estimates, that would be great. I
> didn't change the estimates it should produce, only the code structure.
>
> One thing that bothers me slightly with this patch is the way it peeks
> into the Most-Common-Elements arrays, which is produced by the built-in
> type analyze function. If we ever change what statistics are collected for
> arrays, or the way they are stored, this might break. In matchsel, why
> don't we just call the built-in estimator function for each element that we
> need to probe, and not look into the statistics ourselves at all? I
> actually experimented with that, and it did slash much of the code, and it
> would be more future-proof. However, it was also a lot slower for queries
> that contain multiple values. That's understandable: the built-in estimator
> will fetch the statistics tuple, parse the arrays, etc. separately for each
> value in the query_int, while this patch will do it only once for the whole
> query, and perform a simple binary search for each value. So overall, I
> think this is OK as it is. But if we find that we need to use the MCE list
> in this fashion in more places in the future, it might be worthwhile to add
> some support code for this in the backend to allow extracting the stats
> once, and doing multiple "lightweight estimations" using the extracted
> stats.
>

Yeah, I see. We could end up with something like this. But probably we
would need something more general for extensions which wants to play with
statistics.
For instance, pg_trgm could estimate selectivity for "text % text"
operator. But in order to provide that it needs trigram statistics. Now it
could be implemented by defining separate datatype, but it's a kluge.
Probably, we would end up with custom additional statistics for datatypes.


> Some things I fixed/changed:
>
> * I didn't like that transformOperator() function, which looked up the
> function's name. I replaced it with separate wrapper functions for each
> operator, so that the built-in operator's OID can be hardcoded into each.
>
> * I refactored the matchsel function heavily. I think it's more readable
> now.
>
> * I got rid of the Int4Freq array. It didn't seem significantly easier to
> work with than the separate values/numbers arrays, so I just used those
> directly.
>
> * Also use the matchsel estimator for ~~ (the commutator of @@)
>

In this version of patch it's not checked if variable is actually and int[]
not query_int. See following test case.

# create table test2 as (select '1'::query_int val from
generate_series(1,1000000));
# analyze test2;

# explain select * from test2 where '{1}'::int[] @@ val;
ERROR:  unrecognized int query item type: 0

I've added this check.

* Also use the estimators for the obsolete @ and ~ operators. Not that I
> care much about those as they are obsolete, but seems strange not to, as
> it's a trivial matter of setting the right estimator function.
>
> * I added an ANALYZE in the regression test. It still won't systematically
> test all the cost estimation code, and there's nothing to check that the
> estimates make sense, but at least more of the code will now run.


You also forgot to include intarray--1.0--1.1.sql into patch. I've also
added it.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

intarray-sel-4.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Selectivity estimation for intarray with @@

Reply via email to