On Tue, Jul 21, 2015 at 6:49 PM, Heikki Linnakangas <hlinn...@iki.fi> wrote:
> On 07/21/2015 03:44 PM, Alexander Korotkov wrote: > >> While Uriy is on vacation, I've revised this patch a bit. >> > > I whacked this around quite a bit, and I think it's in a committable state > now. But if you could run whatever tests you were using before on this, to > make sure it still produces the same estimates, that would be great. I > didn't change the estimates it should produce, only the code structure. > > One thing that bothers me slightly with this patch is the way it peeks > into the Most-Common-Elements arrays, which is produced by the built-in > type analyze function. If we ever change what statistics are collected for > arrays, or the way they are stored, this might break. In matchsel, why > don't we just call the built-in estimator function for each element that we > need to probe, and not look into the statistics ourselves at all? I > actually experimented with that, and it did slash much of the code, and it > would be more future-proof. However, it was also a lot slower for queries > that contain multiple values. That's understandable: the built-in estimator > will fetch the statistics tuple, parse the arrays, etc. separately for each > value in the query_int, while this patch will do it only once for the whole > query, and perform a simple binary search for each value. So overall, I > think this is OK as it is. But if we find that we need to use the MCE list > in this fashion in more places in the future, it might be worthwhile to add > some support code for this in the backend to allow extracting the stats > once, and doing multiple "lightweight estimations" using the extracted > stats. > Yeah, I see. We could end up with something like this. But probably we would need something more general for extensions which wants to play with statistics. For instance, pg_trgm could estimate selectivity for "text % text" operator. But in order to provide that it needs trigram statistics. Now it could be implemented by defining separate datatype, but it's a kluge. Probably, we would end up with custom additional statistics for datatypes. > Some things I fixed/changed: > > * I didn't like that transformOperator() function, which looked up the > function's name. I replaced it with separate wrapper functions for each > operator, so that the built-in operator's OID can be hardcoded into each. > > * I refactored the matchsel function heavily. I think it's more readable > now. > > * I got rid of the Int4Freq array. It didn't seem significantly easier to > work with than the separate values/numbers arrays, so I just used those > directly. > > * Also use the matchsel estimator for ~~ (the commutator of @@) > In this version of patch it's not checked if variable is actually and int[] not query_int. See following test case. # create table test2 as (select '1'::query_int val from generate_series(1,1000000)); # analyze test2; # explain select * from test2 where '{1}'::int[] @@ val; ERROR: unrecognized int query item type: 0 I've added this check. * Also use the estimators for the obsolete @ and ~ operators. Not that I > care much about those as they are obsolete, but seems strange not to, as > it's a trivial matter of setting the right estimator function. > > * I added an ANALYZE in the regression test. It still won't systematically > test all the cost estimation code, and there's nothing to check that the > estimates make sense, but at least more of the code will now run. You also forgot to include intarray--1.0--1.1.sql into patch. I've also added it. ------ Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
intarray-sel-4.patch
Description: Binary data
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers