Re: Allowing extensions to supply operator-/function-specific info

2019-03-05 Thread Tom Lane
Paul Ramsey  writes:
>> On Mar 5, 2019, at 3:56 PM, Tom Lane  wrote:
>> Then you're at least missing adequate tests for the 3-arg functions...
>> 3 args with the index column second will not work as this stands.

> Some of the operators are indifferent to order (&&,  overlaps) and others are 
> not (@, within) (~, contains).

Right.

> The 3-arg functions fortunately all have && strategies.

Hm ... that probably explains why it's okay to apply the "expand"
behavior to the non-indexed argument regardless of which one that is.
I imagine the official definition of those functions isn't really
symmetrical about which argument the expansion applies to, though?

> The types on either side of the operators are always the same (geometry && 
> geometry), ST_Intersects(geometry, geometry).
> I could simply be getting a free pass from the simplicity of my setup?

Yeah, seems so.  The real reason I'm pestering you about this is that,
since you're the first outside user of the support-function
infrastructure, other people are likely to be looking at your code
to see how to do things.  So I'd like your code to not contain
unnecessary dependencies on accidents like that ...

regards, tom lane



Re: Allowing extensions to supply operator-/function-specific info

2019-03-05 Thread Paul Ramsey


> On Mar 5, 2019, at 3:56 PM, Tom Lane  wrote:
> 
> Paul Ramsey  writes:
>> On Mar 5, 2019, at 3:26 PM, Tom Lane  wrote:
>>> Hm, I think your addition of this bit is wrong:
>>> 
>>> +/*
>>> +* Arguments were swapped to put the index value on the
>>> +* left, so we need the commutated operator for
>>> +* the OpExpr
>>> +*/
>>> +if (swapped)
>>> +{
>>> +oproid = get_commutator(oproid);
>>> +if (!OidIsValid(oproid))
>>> PG_RETURN_POINTER((Node *)NULL);
>>> +}
>>> 
>>> We already did the operator lookup with the argument types in the desired
>>> order, so this is introducing an extra swap.  The only reason it appears
>>> to work, I suspect, is that all your index operators are self-commutators.
> 
>> I was getting regression failures until I re-swapped the operator… 
>>  SELEcT * FROM foobar WHERE ST_Within(ConstA, VarB)
> 
> Ah ... so the real problem here is that *not* all of your functions
> treat their first two inputs alike, and the hypothetical future
> improvement I commented about is needed right now.  I should've
> looked more closely at the strategies in your table; then I would've
> realized the patch as I proposed it didn't work.
> 
> But this code isn't right either.  I'm surprised you're not getting
> crashes --- perhaps there aren't cases where the first and second args
> are of incompatible types?  Also, it's certainly wrong to be doing this
> sort of swap in only one of the two code paths.
> 
> There's more than one way you could handle this, but the way that
> I was vaguely imagining was to have two strategy entries in each
> IndexableFunction entry, one to apply if the first function argument
> is the indexable one, and the other to apply if the second function
> argument is the indexable one.  If you leave the operator lookup as
> I had it (using the already-swapped data types) then you'd have to
> make sure that the latter set of strategy entries are written as
> if the arguments get swapped before selecting the strategy, which
> would be confusing perhaps :-( --- for instance, st_within would
> use RTContainedByStrategyNumber in the first case but
> RTContainsStrategyNumber in the second.  But otherwise you need the
> separate get_commutator step, which seems like one more catalog lookup
> than you really need.
> 
>> I feel OK about it, if for no other reason than it passes all the tests :)
> 
> Then you're at least missing adequate tests for the 3-arg functions...
> 3 args with the index column second will not work as this stands.

Some of the operators are indifferent to order (&&,  overlaps) and others are 
not (@, within) (~, contains).

The 3-arg functions fortunately all have && strategies.

The types on either side of the operators are always the same (geometry && 
geometry), ST_Intersects(geometry, geometry).

I could simply be getting a free pass from the simplicity of my setup?

P.


Re: Allowing extensions to supply operator-/function-specific info

2019-03-05 Thread Tom Lane
Paul Ramsey  writes:
> On Mar 5, 2019, at 3:26 PM, Tom Lane  wrote:
>> Hm, I think your addition of this bit is wrong:
>> 
>> +/*
>> +* Arguments were swapped to put the index value on the
>> +* left, so we need the commutated operator for
>> +* the OpExpr
>> +*/
>> +if (swapped)
>> +{
>> +oproid = get_commutator(oproid);
>> +if (!OidIsValid(oproid))
>> PG_RETURN_POINTER((Node *)NULL);
>> +}
>> 
>> We already did the operator lookup with the argument types in the desired
>> order, so this is introducing an extra swap.  The only reason it appears
>> to work, I suspect, is that all your index operators are self-commutators.

> I was getting regression failures until I re-swapped the operator… 
>   SELEcT * FROM foobar WHERE ST_Within(ConstA, VarB)

Ah ... so the real problem here is that *not* all of your functions
treat their first two inputs alike, and the hypothetical future
improvement I commented about is needed right now.  I should've
looked more closely at the strategies in your table; then I would've
realized the patch as I proposed it didn't work.

But this code isn't right either.  I'm surprised you're not getting
crashes --- perhaps there aren't cases where the first and second args
are of incompatible types?  Also, it's certainly wrong to be doing this
sort of swap in only one of the two code paths.

There's more than one way you could handle this, but the way that
I was vaguely imagining was to have two strategy entries in each
IndexableFunction entry, one to apply if the first function argument
is the indexable one, and the other to apply if the second function
argument is the indexable one.  If you leave the operator lookup as
I had it (using the already-swapped data types) then you'd have to
make sure that the latter set of strategy entries are written as
if the arguments get swapped before selecting the strategy, which
would be confusing perhaps :-( --- for instance, st_within would
use RTContainedByStrategyNumber in the first case but
RTContainsStrategyNumber in the second.  But otherwise you need the
separate get_commutator step, which seems like one more catalog lookup
than you really need.

> I feel OK about it, if for no other reason than it passes all the tests :)

Then you're at least missing adequate tests for the 3-arg functions...
3 args with the index column second will not work as this stands.

regards, tom lane



Re: Allowing extensions to supply operator-/function-specific info

2019-03-05 Thread Paul Ramsey


> On Mar 5, 2019, at 3:26 PM, Tom Lane  wrote:
> 
> Paul Ramsey  writes:
>> Thanks for the patch, I’ve applied and smoothed and taken your advice on 
>> schema-qualified lookups as well.
> 
> Hm, I think your addition of this bit is wrong:
> 
> +/*
> +* Arguments were swapped to put the index value on the
> +* left, so we need the commutated operator for
> +* the OpExpr
> +*/
> +if (swapped)
> +{
> +oproid = get_commutator(oproid);
> +if (!OidIsValid(oproid))
> PG_RETURN_POINTER((Node *)NULL);
> +}
> 
> We already did the operator lookup with the argument types in the desired
> order, so this is introducing an extra swap.  The only reason it appears
> to work, I suspect, is that all your index operators are self-commutators.

I was getting regression failures until I re-swapped the operator… 

  SELEcT * FROM foobar WHERE ST_Within(ConstA, VarB)

Place the indexed operator in the Left, now:

  Left == VarB
  Right == ConstA
  Strategy == Within
  get_opfamily_member(opfamilyoid, Left, Right, Within)

Unless we change the strategy number when we assign the left/right we’re 
looking up an operator for “B within A”, so we’re backwards.

I feel OK about it, if for no other reason than it passes all the tests :)

P


Re: Allowing extensions to supply operator-/function-specific info

2019-03-05 Thread Tom Lane
Paul Ramsey  writes:
> Thanks for the patch, I’ve applied and smoothed and taken your advice on 
> schema-qualified lookups as well.

Hm, I think your addition of this bit is wrong:

+/*
+* Arguments were swapped to put the index value on the
+* left, so we need the commutated operator for
+* the OpExpr
+*/
+if (swapped)
+{
+oproid = get_commutator(oproid);
+if (!OidIsValid(oproid))
 PG_RETURN_POINTER((Node *)NULL);
+}

We already did the operator lookup with the argument types in the desired
order, so this is introducing an extra swap.  The only reason it appears
to work, I suspect, is that all your index operators are self-commutators.

regards, tom lane



Re: Allowing extensions to supply operator-/function-specific info

2019-03-04 Thread Tom Lane
Paul Ramsey  writes:
>> On Mar 4, 2019, at 2:52 PM, Tom Lane  wrote:
>> BTW, if you'd like me to review the code you added for this, I'd be happy
>> to do so.  I've never looked at PostGIS' innards, but probably I can make
>> sense of the code for this despite that.

> I would be ecstatic for a review, I'm sure I've left a million loose threads 
> dangling.

I took a look, and saw that you'd neglected to check pseudoconstantness
of the non-index argument, so this'd fail on cases like ST_DWithin(x, y)
where x is indexed and y is another column in the same table.  Also
I thought the handling of commutation could be done better.  Attached is
a suggested patch atop your f731c1b7022381dbf627cae311c3d37791bf40c3 to
fix those and a couple of nitpicky other things.  (I haven't tested this,
mind you.)

One thing that makes me itch, but I didn't address in the attached,
is that expandFunctionOid() is looking up a function by name without
any schema-qualification.  That'll fail outright if PostGIS isn't in
the search path, and even if it is, you've got security issues there.
One way to address this is to assume that the expandfn is in the same
schema as the ST_XXX function you're attached to, so you could
do "get_namespace_name(get_func_namespace(funcid))" and then include
that in the list passed to LookupFuncName.

Also, this might be as-intended but I was wondering: I'd sort of expected
you to make, eg, _ST_DWithin() and ST_DWithin() into exact synonyms.
They aren't, since the former is not connected to the support function.
Is that intentional?  I guess if you had a situation where you wanted to
force non-use of an index, being able to use _ST_DWithin() for that would
be helpful.

regards, tom lane

diff --git a/postgis/gserialized_supportfn.c b/postgis/gserialized_supportfn.c
index 90c61f3..66b699b 100644
--- a/postgis/gserialized_supportfn.c
+++ b/postgis/gserialized_supportfn.c
@@ -60,7 +60,7 @@ Datum postgis_index_supportfn(PG_FUNCTION_ARGS);
 */
 typedef struct
 {
-	char *fn_name;
+	const char *fn_name;
 	int   strategy_number; /* Index strategy to add */
 	int   nargs;   /* Expected number of function arguments */
 	int   expand_arg;  /* Radius argument for "within distance" search */
@@ -230,22 +230,42 @@ Datum postgis_index_supportfn(PG_FUNCTION_ARGS)
 }
 
 /*
-* Somehow an indexed third argument has slipped in. That
-* should not happen.
+* We can only do something with index matches on the first or
+* second argument.
 */
 if (req->indexarg > 1)
 	PG_RETURN_POINTER((Node *)NULL);
 
 /*
-* Need the argument types (which should always be geometry or geography
-* since this function is only bound to those functions)
-* to use in the operator function lookup
+* Make sure we have enough arguments (just paranoia really).
 */
-if (nargs < 2)
+if (nargs < 2 || nargs < idxfn.expand_arg)
 	elog(ERROR, "%s: associated with function with %d arguments", __func__, nargs);
 
-leftarg = linitial(clause->args);
-rightarg = lsecond(clause->args);
+/*
+* Extract "leftarg" as the arg matching the index, and
+* "rightarg" as the other one, even if they were in the
+* opposite order in the call.  NOTE: the functions we deal
+* with here treat their first two arguments symmetrically
+* enough that we needn't distinguish the two cases beyond
+* this.  This might need more work later.
+*/
+if (req->indexarg == 0)
+{
+	leftarg = linitial(clause->args);
+	rightarg = lsecond(clause->args);
+}
+else
+{
+	rightarg = linitial(clause->args);
+	leftarg = lsecond(clause->args);
+}
+
+/*
+* Need the argument types (which should always be geometry or geography
+* since this function is only bound to those functions)
+* to use in the operator function lookup.
+*/
 leftdatatype = exprType(leftarg);
 rightdatatype = exprType(rightarg);
 
@@ -267,20 +287,27 @@ Datum postgis_index_supportfn(PG_FUNCTION_ARGS)
 */
 if (idxfn.expand_arg)
 {
-	Node *indexarg = req->indexarg ? rightarg : leftarg;
-	Node *otherarg = req->indexarg ? leftarg : rightarg;
 	Node *radiusarg = (Node *) list_nth(clause->args, idxfn.expand_arg-1);
-	// Oid indexdatatype = exprType(indexarg);
-	Oid otherdatatype = exprType(otherarg);
-	Oid expandfn_oid = expandFunctionOid(otherdatatype);
+	Oid expandfn_oid = expandFunctionOid(rightdatatype);
+
+	FuncExpr *expandexpr = makeFuncExpr(expandfn_oid, rightdatatype,
+	list_make2(rightarg, radiusarg),
+		InvalidOid, InvalidOid, COERCE_EXPLICIT_CALL);
 
-	FuncExpr *expandexpr = makeFuncExpr(expandfn_oid, otherdatatype,
-	list_make2(otherarg, radiusarg),
-		InvalidOid, req->indexcollation, COERCE_EXPLICIT_CALL);
+	/*
+	* The comparison expression has to be a pseudoconstant,
+	* ie not volatile nor dependent on the target index's
+	* 

Re: Allowing extensions to supply operator-/function-specific info

2019-03-04 Thread Paul Ramsey


> On Mar 4, 2019, at 2:52 PM, Tom Lane  wrote:
> 
> Paul Ramsey  writes:
>> Gotcha, done and now have an implementation that passes all our regression 
>> tests.
> 
> Very cool!  So the next step, I guess, is to address your original problem
> by cranking up the cost estimates for these functions --- have you tried
> that yet?  In principle you should be able to do that and not have any
> bad planning side-effects, but this is all pretty new territory so maybe
> some problems remain to be ironed out.
> 
> BTW, if you'd like me to review the code you added for this, I'd be happy
> to do so.  I've never looked at PostGIS' innards, but probably I can make
> sense of the code for this despite that.

I would be ecstatic for a review, I’m sure I’ve left a million loose threads 
dangling.

P.

https://github.com/pramsey/postgis/blob/svn-trunk-supportfn/postgis/gserialized_supportfn.c#L191
 


https://github.com/pramsey/postgis/blob/svn-trunk-supportfn/postgis/postgis.sql.in#L4290
 





Re: Allowing extensions to supply operator-/function-specific info

2019-03-04 Thread Tom Lane
Paul Ramsey  writes:
> Gotcha, done and now have an implementation that passes all our regression 
> tests.

Very cool!  So the next step, I guess, is to address your original problem
by cranking up the cost estimates for these functions --- have you tried
that yet?  In principle you should be able to do that and not have any
bad planning side-effects, but this is all pretty new territory so maybe
some problems remain to be ironed out.

BTW, if you'd like me to review the code you added for this, I'd be happy
to do so.  I've never looked at PostGIS' innards, but probably I can make
sense of the code for this despite that.

regards, tom lane



Re: Allowing extensions to supply operator-/function-specific info

2019-03-04 Thread Tom Lane
Paul Ramsey  writes:
> I had what seemed to be working code except for a couple rare cases,
> but when I fixed those cases it turned out that I had a major problem:
> building a  OP  expression works fine, but building a
>  OP  expression returns me an error.

Yup, you're not supposed to do that.  The output expression *must* have
the index key on the left, it's up to you to commute the operator if
needed to make that happen.

regards, tom lane



Re: Allowing extensions to supply operator-/function-specific info

2019-03-04 Thread Paul Ramsey
So I am getting much closer to a working implementation in PostGIS,
but have just run into an issue which I am assuming is my
misunderstanding something...

https://github.com/pramsey/postgis/blob/92268c94f3aa1fc63a2941f2b451be15b28662cf/postgis/gserialized_supportfn.c#L287

I had what seemed to be working code except for a couple rare cases,
but when I fixed those cases it turned out that I had a major problem:
building a  OP  expression works fine, but building a
 OP  expression returns me an error.

create table f as select st_makepoint(200*random() - 100, 200*random()
- 100) as g from generate_series(0, 10);
create index f_g_x on f using gist (g);
explain select * from baz where st_coveredby('POINT(5 0)', geom);
explain select * from f where st_coveredby(g, 'POINT(5 0)');

QUERY PLAN
---
 Bitmap Heap Scan on f  (cost=13.36..314.58 rows=4 width=32)
   Filter: st_coveredby(g,
'0101001440'::geometry)
   ->  Bitmap Index Scan on f_g_x  (cost=0.00..5.03 rows=100 width=0)
 Index Cond: (g @
'0101001440'::geometry)

postgis=# explain select * from f where st_coveredby('POINT(5 0)', g);

ERROR:  index key does not match expected index column

Any thoughts?

P



Re: Allowing extensions to supply operator-/function-specific info

2019-02-27 Thread Tom Lane
Paul Ramsey  writes:
> I added three indexes to my test table:
>   CREATE INDEX foo_g_gist_x ON foo USING GIST (g);
>   CREATE INDEX foo_g_gist_nd_x ON foo USING GIST (g gist_geometry_ops);
>   CREATE INDEX foo_g_spgist_x ON foo USING SPGIST (g);
> They all support the overlaps (&&) operator.

> So, SupportRequestIndexCondition happens three times, and each time I say 
> “yep, sure, you can construct an index condition by putting the && operator 
> between left_arg and right_arg”.

Sounds right.

> How does the planner end up deciding on which index to *actually* use?

It's whichever has the cheapest cost estimate.  In case of an exact tie,
I believe it'll choose the index with lowest OID (or maybe highest OID,
not sure).

> The selectivity is the same, the operator is the same. I found that I got the 
> ND GIST one first, then the SPGIST and finally the 2d GIST, which is 
> unfortunate, because the 2D and SPGIST are almost certainly faster than the 
> ND GIST.

Given that it'll be the same selectivity, the cost preference is likely to
go to whichever index is physically smallest, at least for indexes of the
same type.  When they're not the same type there might be an issue with
the index AM cost estimators not being lined up very well as to what they
account for and how.

I don't doubt that there's plenty of work to be done in making the cost
estimates better in cases like this --- in particular, I don't think we
have any way of accounting for the idea that one index opclass might be
smarter than another one for the same query, unless that shakes out as a
smaller index.  But you'd have had the same issues with the old approach.

regards, tom lane



Re: Allowing extensions to supply operator-/function-specific info

2019-02-27 Thread Paul Ramsey


> On Feb 27, 2019, at 3:40 PM, Tom Lane  wrote:
> 
>> Variable SupportRequestCost is very exciting, but given that variable cost 
>> is usually driven by the complexity of arguments, what kind of argument is 
>> the SupportRequestCost call fed during the planning stage? Constant 
>> arguments are pretty straight forward, but what gets sent in when a column 
>> is one (or all) of the arguments? 
> 
> You'll see whatever is in the post-constant-folding parse tree.  If it's a
> Const, you can look at the value.  If it's a Var, you could perhaps look
> at the pg_statistic info for that column, though whether that would give
> you much of a leg up for cost estimation is hard to say.  For any sort of
> expression, you're probably going to be reduced to using a default
> estimate.  The core code generally doesn't try to be intelligent about
> anything beyond the Const and Var cases.

Actually, this is interesting, maybe there’s something to be done looking at 
the vertex density of the area under consideration… would require gathering 
extra stats, but could be useful (maybe, at some point feeding costs into plans 
has to degenerate into wankery…)

Another question:

I added three indexes to my test table:

  CREATE INDEX foo_g_gist_x ON foo USING GIST (g);
  CREATE INDEX foo_g_gist_nd_x ON foo USING GIST (g gist_geometry_ops);
  CREATE INDEX foo_g_spgist_x ON foo USING SPGIST (g);

They all support the overlaps (&&) operator.

So, SupportRequestIndexCondition happens three times, and each time I say “yep, 
sure, you can construct an index condition by putting the && operator between 
left_arg and right_arg”.

How does the planner end up deciding on which index to *actually* use? The 
selectivity is the same, the operator is the same. I found that I got the ND 
GIST one first, then the SPGIST and finally the 2d GIST, which is unfortunate, 
because the 2D and SPGIST are almost certainly faster than the ND GIST.

In practice, most people will just have one spatial index at a time, but I 
still wonder?

P


Re: Allowing extensions to supply operator-/function-specific info

2019-02-27 Thread Tom Lane
Paul Ramsey  writes:
> The documentation says that a support function should have a signature 
> "supportfn(internal) returns internal”, but doesn’t say which (if any) 
> annotations should be provided. IMMUTABLE? PARALLEL SAFE? STRICT? None? All?

It doesn't matter much given that these things aren't callable from SQL.
The builtin ones are marked immutable/safe/strict, but that's mostly
because that's the default state for builtin functions.  The only one
I'd get excited about is marking it strict if you're not going to check
for a null argument ... and even that is neatnik-ism, not something that
will have any practical effect.

> Variable SupportRequestCost is very exciting, but given that variable cost is 
> usually driven by the complexity of arguments, what kind of argument is the 
> SupportRequestCost call fed during the planning stage? Constant arguments are 
> pretty straight forward, but what gets sent in when a column is one (or all) 
> of the arguments? 

You'll see whatever is in the post-constant-folding parse tree.  If it's a
Const, you can look at the value.  If it's a Var, you could perhaps look
at the pg_statistic info for that column, though whether that would give
you much of a leg up for cost estimation is hard to say.  For any sort of
expression, you're probably going to be reduced to using a default
estimate.  The core code generally doesn't try to be intelligent about
anything beyond the Const and Var cases.

regards, tom lane



Re: Allowing extensions to supply operator-/function-specific info

2019-02-27 Thread Paul Ramsey
A few more questions…

The documentation says that a support function should have a signature 
"supportfn(internal) returns internal”, but doesn’t say which (if any) 
annotations should be provided. IMMUTABLE? PARALLEL SAFE? STRICT? None? All?

Variable SupportRequestCost is very exciting, but given that variable cost is 
usually driven by the complexity of arguments, what kind of argument is the 
SupportRequestCost call fed during the planning stage? Constant arguments are 
pretty straight forward, but what gets sent in when a column is one (or all) of 
the arguments? 

Thanks,
P




Re: Allowing extensions to supply operator-/function-specific info

2019-02-26 Thread Tom Lane
Paul Ramsey  writes:
>> On Feb 26, 2019, at 2:19 PM, Tom Lane  wrote:
>> In most cases, multiple matching arguments are going to lead to
>> failure to construct any useful index condition, because your
>> comparison value has to be a pseudoconstant (ie, not a variable
>> from the same table, so in both of the above examples there's
>> no function argument you could compare to).  

> This term “pseudoconstant” has been causing me some worry as it crops up
> in your explanations a fair amount.

It is defined in the documentation, but what it boils down to is that
your comparison value can't contain either (1) variables from the same
table the index is on or (2) volatile functions.  There is a function
defined in optimizer.h that can check that for you, so you don't have
to worry too much about the details.

> I expect to have queries of the form

>   SELECT a.*, b.*  
>   FROM a
>   JOIN b
>   ON ST_Intersects(a.geom, b.geom)

Sure, that's fine.  If there are indexes on both a.geom and b.geom,
you'll get separate opportunities to match to each of those, and
what you'd be constructing in each case is an indexqual that has to be
used on the inner side of a nestloop join (so that the outer side can
provide the comparison value).  What's not fine is "WHERE
ST_Intersects(a.geom, a.othergeom)" ... you can't make an indexscan
out of that, at least not with the && operator.

regards, tom lane



Re: Allowing extensions to supply operator-/function-specific info

2019-02-26 Thread Paul Ramsey


> On Feb 26, 2019, at 2:19 PM, Tom Lane  wrote:
> 
> In most cases, multiple matching arguments are going to lead to
> failure to construct any useful index condition, because your
> comparison value has to be a pseudoconstant (ie, not a variable
> from the same table, so in both of the above examples there's
> no function argument you could compare to).  

This term “pseudoconstant” has been causing me some worry as it crops up in 
your explanations a fair amount. I expect to have queries of the form

  SELECT a.*, b.*  
  FROM a
  JOIN b
  ON ST_Intersects(a.geom, b.geom)

And I expect to be able to rewrite that in terms of having an additional call 
to the index operator (&&) and there won’t be a constant on either side of the 
operator. Am I mis-understanding the term, or are there issues with using this 
API in a join context?

P.

> But we don't prejudge
> that, because it's possible that a function with 3 or more arguments
> could produce something useful anyway.  For instance, if what we've
> got is "f(x, y, constant)" then it's possible that the semantics of
> the function are such that y can be ignored and we can make something
> indexable like "x && constant".  All this is the support function's
> job to know.

>   regards, tom lane




Re: Allowing extensions to supply operator-/function-specific info

2019-02-26 Thread Tom Lane
Paul Ramsey  writes:
> On Feb 26, 2019, at 2:19 PM, Tom Lane  wrote:
>> What's the query look like exactly?  The other two calls will occur
>> anyway, but SupportRequestIndexCondition depends on the function
>> call's placement.

> select geos_intersects_new(g, 'POINT(0 0)') from foo;

Right, so that's not useful for an index scan.  Try

select * from foo where geos_intersects_new(g, 'POINT(0 0)').

regards, tom lane



Re: Allowing extensions to supply operator-/function-specific info

2019-02-26 Thread Paul Ramsey


> On Feb 26, 2019, at 2:19 PM, Tom Lane  wrote:
> 
>> I have
>> created a table (foo) a geometry column (g) and an index (GIST on
>> foo(g)) and am running a query against foo using a noop function with
>> a support function bound to it.
> 
>> The support function is called, twice, once in
>> T_SupportRequestSimplify mode and once in T_SupportRequestCost mode.
> 
> What's the query look like exactly?  The other two calls will occur
> anyway, but SupportRequestIndexCondition depends on the function
> call's placement.

select geos_intersects_new(g, 'POINT(0 0)') from foo;

> 
>   regards, tom lane




Re: Allowing extensions to supply operator-/function-specific info

2019-02-26 Thread Tom Lane
Paul Ramsey  writes:
> New line of questioning: under what conditions will the support
> function be called in a T_SupportRequestIndexCondition mode?

It'll be called if the target function appears at top level of a
WHERE or JOIN condition and any one of the function's arguments
syntactically matches some column of an index.

If there's multiple arguments matching the same index column, say
index on "x" and we have "f(z, x, x)", you'll get one call and
it will tell you about the first match (req->indexarg == 1 in
this example).  Sorting out what to do in such a case is your
responsibility.

If there's arguments matching more than one index column, say
index declared on (x, y) and we have "f(x, y)", you'll get a
separate call for each index column.  Again, sorting out what
to do for each one is your responsibility.

In most cases, multiple matching arguments are going to lead to
failure to construct any useful index condition, because your
comparison value has to be a pseudoconstant (ie, not a variable
from the same table, so in both of the above examples there's
no function argument you could compare to).  But we don't prejudge
that, because it's possible that a function with 3 or more arguments
could produce something useful anyway.  For instance, if what we've
got is "f(x, y, constant)" then it's possible that the semantics of
the function are such that y can be ignored and we can make something
indexable like "x && constant".  All this is the support function's
job to know.

> I have
> created a table (foo) a geometry column (g) and an index (GIST on
> foo(g)) and am running a query against foo using a noop function with
> a support function bound to it.

> The support function is called, twice, once in
> T_SupportRequestSimplify mode and once in T_SupportRequestCost mode.

What's the query look like exactly?  The other two calls will occur
anyway, but SupportRequestIndexCondition depends on the function
call's placement.

regards, tom lane



Re: Allowing extensions to supply operator-/function-specific info

2019-02-26 Thread Paul Ramsey
On Mon, Feb 25, 2019 at 4:09 PM Tom Lane  wrote:
>
> Paul Ramsey  writes:
> > On Mon, Feb 25, 2019 at 3:01 PM Tom Lane  wrote:
> >> It's whichever one the index column's opclass belongs to.  Basically what
> >> you're trying to do here is verify whether the index will support the
> >> optimization you want to perform.
>
> > * If I have tbl1.geom
> > * and I have built two indexes on it, a btree_geometry_ops and a
> > gist_geometry_ops_2d, and
> > * and SupportRequestIndexCondition.opfamily returns me the btree family
> > * and I look and see, "damn there is no && operator in there"
> > * am I SOL, even though an appropriate index does exist?
>
> No.  If there are two indexes matching your function's argument, you'll
> get a separate call for each index.  The support function is only
> responsible for thinking about one index at a time and seeing if it
> can be used.  If more than one can be used, figuring out which
> one is better is done by later cost comparisons.

Ah, wonderful!

New line of questioning: under what conditions will the support
function be called in a T_SupportRequestIndexCondition mode? I have
created a table (foo) a geometry column (g) and an index (GIST on
foo(g)) and am running a query against foo using a noop function with
a support function bound to it.

The support function is called, twice, once in
T_SupportRequestSimplify mode and once in T_SupportRequestCost mode.

What triggers T_SupportRequestIndexCondition mode?

Thanks!

P



Re: Allowing extensions to supply operator-/function-specific info

2019-02-25 Thread Tom Lane
Paul Ramsey  writes:
> On Mon, Feb 25, 2019 at 3:01 PM Tom Lane  wrote:
>> It's whichever one the index column's opclass belongs to.  Basically what
>> you're trying to do here is verify whether the index will support the
>> optimization you want to perform.

> * If I have tbl1.geom
> * and I have built two indexes on it, a btree_geometry_ops and a
> gist_geometry_ops_2d, and
> * and SupportRequestIndexCondition.opfamily returns me the btree family
> * and I look and see, "damn there is no && operator in there"
> * am I SOL, even though an appropriate index does exist?

No.  If there are two indexes matching your function's argument, you'll
get a separate call for each index.  The support function is only
responsible for thinking about one index at a time and seeing if it
can be used.  If more than one can be used, figuring out which
one is better is done by later cost comparisons.

regards, tom lane



Re: Allowing extensions to supply operator-/function-specific info

2019-02-25 Thread Paul Ramsey
On Mon, Feb 25, 2019 at 3:01 PM Tom Lane  wrote:

> > Looking at the examples, they are making use of the opfamily that
> > comes in SupportRequestIndexCondition.opfamily.
> > That opfamily Oid is the first one in the IndexOptInfo.opfamily array.
> > Here's where my thread of understanding fails to follow. I have, in
> > PostGIS, actually no operator families defined (CREATE OPERATOR
> > FAMILY). I do, however, have quite a few operator classes defined for
> > geometry: 10, actually!
>
> Yes, you do have operator families: there's no such thing as an operator
> class without a containing operator family, and hasn't been for quite
> a long time.  If you write CREATE OPERATOR CLASS without a FAMILY
> clause, the command silently creates an opfamily with the same name you
> specified for the opclass, and links the two together.

OK, starting to understand...

> > Some of them (gist_geometry_ops_2d, spgist_geometry_ops_2d ) use the
> > && operator to indicate the lossy operation I would like to combine
> > with ST_Intersects.
> > Some of them (gist_geometry_ops_nd, spgist_geometry_ops_nd) use the
> > &&& operator to indicate the lossy operation I would like to combine
> > with ST_Intersects.
>
> Right.  So the hard part here is to figure out whether the OID you're
> handed matches one of these operator families.  As I mentioned (in
> the other thread [1], maybe you didn't see it?) the best short-term
> idea I've got for that is to look up the opfamily by OID (see the
> OPFAMILYOID syscache) and check to see if its name matches one of
> the above.  You might want to verify that the index AM's OID is what
> you expect, too, just for a little extra safety.

I read it, I just didn't entirely understand it. I think maybe I do
know? I'm reading and re-reading everything and trying to build a
mental model that makes sense :)

Back to SupportRequestIndexCondition.opfamily though:

> It's whichever one the index column's opclass belongs to.  Basically what
> you're trying to do here is verify whether the index will support the
> optimization you want to perform.

* If I have tbl1.geom
* and I have built two indexes on it, a btree_geometry_ops and a
gist_geometry_ops_2d, and
* and SupportRequestIndexCondition.opfamily returns me the btree family
* and I look and see, "damn there is no && operator in there"
* am I SOL, even though an appropriate index does exist?

> Sure, but that was a pretty lame way of getting the optimization,
> as you well know because you've been fighting its deficiencies for
> so long.

Hrm. :) I will agree to disagree. This is an intellectually
interesting journey, but most of its length is quite far removed from
our proximate goal of adding realistic costs to our functions, and the
code added will be quite a bit harder for folks to follow than what it
replaces.

Reading your code is a pleasure and the comments are great, it's just
a hard slog up for someone who is still going "Node*, hm, how does
that work..."

ATB,

P

> Perhaps at some point we'll have some infrastructure that makes this
> less painful, but it's not happening for v12.
>
> regards, tom lane
>
> [1] https://www.postgresql.org/message-id/22876.1550591...@sss.pgh.pa.us



Re: Allowing extensions to supply operator-/function-specific info

2019-02-25 Thread Tom Lane
Paul Ramsey  writes:
> So... trying to figure out how to use SupportRequestIndexCondition to
> convert a call to Intersects() in to a call that also has the operator
> && as well.

OK.

> Looking at the examples, they are making use of the opfamily that
> comes in SupportRequestIndexCondition.opfamily.
> That opfamily Oid is the first one in the IndexOptInfo.opfamily array.
> Here's where my thread of understanding fails to follow. I have, in
> PostGIS, actually no operator families defined (CREATE OPERATOR
> FAMILY). I do, however, have quite a few operator classes defined for
> geometry: 10, actually!

Yes, you do have operator families: there's no such thing as an operator
class without a containing operator family, and hasn't been for quite
a long time.  If you write CREATE OPERATOR CLASS without a FAMILY
clause, the command silently creates an opfamily with the same name you
specified for the opclass, and links the two together.

> Some of them (gist_geometry_ops_2d, spgist_geometry_ops_2d ) use the
> && operator to indicate the lossy operation I would like to combine
> with ST_Intersects.
> Some of them (gist_geometry_ops_nd, spgist_geometry_ops_nd) use the
> &&& operator to indicate the lossy operation I would like to combine
> with ST_Intersects.

Right.  So the hard part here is to figure out whether the OID you're
handed matches one of these operator families.  As I mentioned (in
the other thread [1], maybe you didn't see it?) the best short-term
idea I've got for that is to look up the opfamily by OID (see the
OPFAMILYOID syscache) and check to see if its name matches one of
the above.  You might want to verify that the index AM's OID is what
you expect, too, just for a little extra safety.

> A given call to ST_Intersects(tbl1.geom, tbl2.geom) could have two
> indexes to apply the problem, but
> SupportRequestIndexCondition.opfamily will, I assume, only be exposing
> one to me: which one?

It's whichever one the index column's opclass belongs to.  Basically what
you're trying to do here is verify whether the index will support the
optimization you want to perform.

> Anyways, to true up how hard this is, I've been carefully reading the
> implementations for network address types and LIKE, and I'm still
> barely at the WTF stage. The selectivity and the number of rows
> support modes I could do. The SupportRequestIndexCondition is based on
> a detailed knowledge of what an operator family is, an operator class
> is, how those relate to types... I think I can get there, but it's
> going to be far from easy (for me).

You definitely want to read this:

https://www.postgresql.org/docs/devel/xindex.html#XINDEX-OPFAMILY

and maybe some of the surrounding sections.

> And it'll put a pretty high bar in
> front of anyone who previously just whacked an inline SQL function in
> place to get an index assisted function up and running.

Sure, but that was a pretty lame way of getting the optimization,
as you well know because you've been fighting its deficiencies for
so long.

Perhaps at some point we'll have some infrastructure that makes this
less painful, but it's not happening for v12.

regards, tom lane

[1] https://www.postgresql.org/message-id/22876.1550591...@sss.pgh.pa.us



Re: Allowing extensions to supply operator-/function-specific info

2019-02-25 Thread Paul Ramsey
On Mon, Jan 28, 2019 at 9:51 AM Tom Lane  wrote:
> is people like PostGIS, who already cleared that bar.  I hope that
> we'll soon have a bunch of examples, like those in the 0004 patch,
> that people can look at to see how to do things in this area.  I see
> no reason to believe it'll be all that much harder than anything
> else extension authors have to do.

It's a little harder :)

So... trying to figure out how to use SupportRequestIndexCondition to
convert a call to Intersects() in to a call that also has the operator
&& as well.

Looking at the examples, they are making use of the opfamily that
comes in SupportRequestIndexCondition.opfamily.

That opfamily Oid is the first one in the IndexOptInfo.opfamily array.
Here's where my thread of understanding fails to follow. I have, in
PostGIS, actually no operator families defined (CREATE OPERATOR
FAMILY). I do, however, have quite a few operator classes defined for
geometry: 10, actually!

 btree_geometry_ops
 hash_geometry_ops
 gist_geometry_ops_2d
 gist_geometry_ops_nd
 brin_geometry_inclusion_ops_2d
 brin_geometry_inclusion_ops_3d
 brin_geometry_inclusion_ops_4d
 spgist_geometry_ops_2d
 spgist_geometry_ops_nd
 spgist_geometry_ops_nd

Some of those are not useful to me (btree, hash) for sure.
Some of them (gist_geometry_ops_2d, spgist_geometry_ops_2d ) use the
&& operator to indicate the lossy operation I would like to combine
with ST_Intersects.
Some of them (gist_geometry_ops_nd, spgist_geometry_ops_nd) use the
&&& operator to indicate the lossy operation I would like to combine
with ST_Intersects.

A given call to ST_Intersects(tbl1.geom, tbl2.geom) could have two
indexes to apply the problem, but
SupportRequestIndexCondition.opfamily will, I assume, only be exposing
one to me: which one?

Anyways, to true up how hard this is, I've been carefully reading the
implementations for network address types and LIKE, and I'm still
barely at the WTF stage. The selectivity and the number of rows
support modes I could do. The SupportRequestIndexCondition is based on
a detailed knowledge of what an operator family is, an operator class
is, how those relate to types... I think I can get there, but it's
going to be far from easy (for me). And it'll put a pretty high bar in
front of anyone who previously just whacked an inline SQL function in
place to get an index assisted function up and running.

P.



Re: Allowing extensions to supply operator-/function-specific info

2019-01-29 Thread Tom Lane
Simon Riggs  writes:
> On Tue, 29 Jan 2019 at 09:55, Tom Lane  wrote:
>> I have no particular
>> interest in working on that right now, because it doesn't respond to
>> what I understand PostGIS' need to be, and there are only so many
>> hours in the day.  But maybe it could be made workable in the future.

> I thought the whole exercise was about adding generic tools for everybody
> to use.

Well, I'm building infrastructure plus a small subset of what might
someday sit atop that infrastructure.  I'm not prepared to commit
right now to building stuff I can't finish for v12.

> You said PostGIS was looking to
> "automatically convert WHERE clauses into lossy index quals." which sounds
> very similar to what I outlined.

As I understand it, what they have is complex WHERE clauses from which
they want to extract clauses usable with simple (non-expression) indexes.
The case you seem to be worried about is the reverse: complicated index
definition and simple WHERE clause.  I think we're agreed that these two
cases can't be solved with the very same facility.  The support-function
mechanism probably can be used to provide extensibility for logic that
tries to attack the complicated-index case, but its mere existence won't
cause that logic to spring into being.

regards, tom lane



Re: Allowing extensions to supply operator-/function-specific info

2019-01-28 Thread Simon Riggs
On Tue, 29 Jan 2019 at 09:55, Tom Lane  wrote:

> Simon Riggs  writes:
> > On Sun, 27 Jan 2019 at 19:17, Tom Lane  wrote:
> >> ... I don't
> >> know whether that would satisfy your concern, because I'm not clear
> >> on what your concern is.
>
> > To be able to extract indexable clauses where none existed before.
>
> That's a pretty vague statement, because it describes what I want
> to do perfectly, but this doesn't:
>
> > Hash functions assume that x = N => hash(x) = hash(N) AND x = N
> > so I want to be able to assume
> > x = K => f(x) = f(K) AND x = K
> > for specific f()
> > to allow indexable operations when we have an index on f(x) only
>
> The problem with that is that if the only thing that's in the query is
> "x = K" then there is nothing to cue the planner that it'd be worth
> expending cycles thinking about f(x).


I agree. That is the equivalent of a SeqScan; the wrong way to approach it.


> Sure, you could hang a planner
> support function on the equals operator that would go off and expend
> arbitrary amounts of computation looking for speculative matches ...
> but nobody is going to accept that as a patch, because the cost/benefit
> ratio is going to be awful for 99% of users.
>
> The mechanism I'm proposing is based on the thought that for
> specialized functions (or operators) like PostGIS' ST_Intersects(),
> it'll be worth expending extra cycles when one of those shows up
> in WHERE.  I don't think that scales to plain-vanilla equality though.
>
> Conceivably, you could turn that around and look for support functions
> attached to the functions/operators that are in an index expression,
> and give them the opportunity to derive lossy indexquals based on
> comparing the index expression to query quals.


That way around is the right way. If an index exists, explore whether it
can be used or not. If there are no indexes with appropriate support
functions, it will cost almost nothing to normal queries.

The problem of deriving potentially useful indexes is more expensive, I
understand.


> I have no particular
> interest in working on that right now, because it doesn't respond to
> what I understand PostGIS' need to be, and there are only so many
> hours in the day.  But maybe it could be made workable in the future.
>

I thought the whole exercise was about adding generic tools for everybody
to use. The Tom I've worked with for more than a few years would not have
said that; that is my normal line! You said PostGIS was looking to
"automatically convert WHERE clauses into lossy index quals." which sounds
very similar to what I outlined.

Either way, thanks.

-- 
Simon Riggshttp://www.2ndQuadrant.com/

PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Allowing extensions to supply operator-/function-specific info

2019-01-28 Thread Tom Lane
Simon Riggs  writes:
> On Sun, 27 Jan 2019 at 19:17, Tom Lane  wrote:
>> ... I don't
>> know whether that would satisfy your concern, because I'm not clear
>> on what your concern is.

> To be able to extract indexable clauses where none existed before.

That's a pretty vague statement, because it describes what I want
to do perfectly, but this doesn't:

> Hash functions assume that x = N => hash(x) = hash(N) AND x = N
> so I want to be able to assume
> x = K => f(x) = f(K) AND x = K
> for specific f()
> to allow indexable operations when we have an index on f(x) only

The problem with that is that if the only thing that's in the query is
"x = K" then there is nothing to cue the planner that it'd be worth
expending cycles thinking about f(x).  Sure, you could hang a planner
support function on the equals operator that would go off and expend
arbitrary amounts of computation looking for speculative matches ...
but nobody is going to accept that as a patch, because the cost/benefit
ratio is going to be awful for 99% of users.

The mechanism I'm proposing is based on the thought that for
specialized functions (or operators) like PostGIS' ST_Intersects(),
it'll be worth expending extra cycles when one of those shows up
in WHERE.  I don't think that scales to plain-vanilla equality though.

Conceivably, you could turn that around and look for support functions
attached to the functions/operators that are in an index expression,
and give them the opportunity to derive lossy indexquals based on
comparing the index expression to query quals.  I have no particular
interest in working on that right now, because it doesn't respond to
what I understand PostGIS' need to be, and there are only so many
hours in the day.  But maybe it could be made workable in the future.

regards, tom lane



Re: Allowing extensions to supply operator-/function-specific info

2019-01-28 Thread Simon Riggs
On Sun, 27 Jan 2019 at 19:17, Tom Lane  wrote:


> > * Allow a normal term to match a functional index, e.g. WHERE x =
> > 'abcdefgh' => WHERE substr(x, 1 , 5) = 'abcde' AND x = 'abcdefgh'
>
> I'm a bit confused about what you think this example means.  I do
> intend to work on letting extensions define rules for extracting
> index clauses from function calls, because that's the requirement
> that PostGIS is after in the thread that started this.  I don't
> know whether that would satisfy your concern, because I'm not clear
> on what your concern is.
>

To be able to extract indexable clauses where none existed before.

Hash functions assume that x = N => hash(x) = hash(N) AND x = N
so I want to be able to assume
x = K => f(x) = f(K) AND x = K
for specific f()
to allow indexable operations when we have an index on f(x) only

-- 
Simon Riggshttp://www.2ndQuadrant.com/

PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Allowing extensions to supply operator-/function-specific info

2019-01-28 Thread Tom Lane
Robert Haas  writes:
> On Sat, Jan 26, 2019 at 12:35 PM Tom Lane  wrote:
>> Attached is an 0004 that makes a stab at providing some intelligence
>> for unnest() and the integer cases of generate_series().

> That looks awesome.

> I'm somewhat dubious about whole API.  It's basically -- if you have a
> problem and a PhD in PostgreSQL-ology, you can write some C code to
> fix it.  On the other hand, the status quo is that you may as well
> just forget about fixing it, which is clearly even worse.  And I don't
> really know how to do better.

Well, you need to be able to write a C extension :-(.  I kinda wish
that were not a requirement, but in practice I think the main audience
is people like PostGIS, who already cleared that bar.  I hope that
we'll soon have a bunch of examples, like those in the 0004 patch,
that people can look at to see how to do things in this area.  I see
no reason to believe it'll be all that much harder than anything
else extension authors have to do.

regards, tom lane



Re: Allowing extensions to supply operator-/function-specific info

2019-01-28 Thread Robert Haas
On Sat, Jan 26, 2019 at 12:35 PM Tom Lane  wrote:
> Attached is an 0004 that makes a stab at providing some intelligence
> for unnest() and the integer cases of generate_series().

That looks awesome.

I'm somewhat dubious about whole API.  It's basically -- if you have a
problem and a PhD in PostgreSQL-ology, you can write some C code to
fix it.  On the other hand, the status quo is that you may as well
just forget about fixing it, which is clearly even worse.  And I don't
really know how to do better.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Allowing extensions to supply operator-/function-specific info

2019-01-27 Thread Tom Lane
Simon Riggs  writes:
> On Sun, 20 Jan 2019 at 23:48, Tom Lane  wrote:
>> What I'm envisioning therefore is that we allow an auxiliary function ...

> Does this help with these cases?

> * Allow a set returning function to specify number of output rows, in cases
> where that is variable and dependent upon the input params?

Yes, within the usual limits of what the planner can know.  The 0004
patch I posted yesterday correctly estimates the number of rows for
constant-arguments cases of generate_series() and unnest(anyarray),
and it also understands unnest(array[x,y,z,...]) even when some of the
array[] elements aren't constants.  There's room to add knowledge about
other SRFs, but those are cases I can recall hearing complaints about.

> * Allow a normal term to match a functional index, e.g. WHERE x =
> 'abcdefgh' => WHERE substr(x, 1 , 5) = 'abcde' AND x = 'abcdefgh'

I'm a bit confused about what you think this example means.  I do
intend to work on letting extensions define rules for extracting
index clauses from function calls, because that's the requirement
that PostGIS is after in the thread that started this.  I don't
know whether that would satisfy your concern, because I'm not clear
on what your concern is.

> * Allow us to realise that ORDER BY f(x) => ORDER BY x so we can use
> ordered paths from indexes, or avoid sorts.

Hm.  That's not part of what I'm hoping to get done for v12, but you
could imagine a future extension to add a support request type that
allows deriving related pathkeys.  There would be a lot of work to do
to make that happen, but the aspect of it that requires adding
function-specific knowledge could usefully be packaged as a
support-function request.

regards, tom lane



Re: Allowing extensions to supply operator-/function-specific info

2019-01-27 Thread Simon Riggs
On Sun, 20 Jan 2019 at 23:48, Tom Lane  wrote:


> What I'm envisioning therefore is that we allow an auxiliary function to
> be attached to any operator or function that can provide functionality
> like this, and that we set things up so that the set of tasks that
> such functions can perform can be extended over time without SQL-level
> changes.  For example, we could say that the function takes a single
> Node* argument, and that the type of Node tells it what to do, and if it
> doesn't recognize the type of Node it should just return NULL indicating
> "use default handling".  We'd start out with two relevant Node types,
> one for the selectivity-estimation case and one for the extract-a-lossy-
> index-qual case, and we could add more over time.
>

Does this help with these cases?

* Allow a set returning function to specify number of output rows, in cases
where that is variable and dependent upon the input params?

* Allow a normal term to match a functional index, e.g. WHERE x =
'abcdefgh' => WHERE substr(x, 1 , 5) = 'abcde' AND x = 'abcdefgh'

* Allow us to realise that ORDER BY f(x) => ORDER BY x so we can use
ordered paths from indexes, or avoid sorts.

-- 
Simon Riggshttp://www.2ndQuadrant.com/

PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Allowing extensions to supply operator-/function-specific info

2019-01-26 Thread Tom Lane
I wrote:
> There's a considerable amount of follow-up work that ought to happen
> now to make use of these capabilities for places that have been
> pain points in the past, such as generate_series() and unnest().
> But I haven't touched that yet.

Attached is an 0004 that makes a stab at providing some intelligence
for unnest() and the integer cases of generate_series().  This only
affects one plan choice in the existing regression tests; I tweaked
that test to keep the plan the same.  I didn't add new test cases
demonstrating the functionality, since it's a bit hard to show it
directly within the constraints of EXPLAIN (COSTS OFF).  We could
do something along the lines of the quick-hack rowcount test in 0003,
perhaps, but that's pretty indirect.

Looking at this, I'm dissatisfied with the amount of new #include's
being dragged into datatype-specific .c files.  I don't really want
to end up with most of utils/adt/ having dependencies on planner
data structures, but that's where we would be headed.  I can think
of a couple of possibilities:

* Instead of putting support functions beside their target function,
group all the core's support functions into one new .c file.  I'm
afraid this would lead to the reverse problem of having to import
lots of datatype-private info into that file.

* Try to refactor the planner's .h files so that there's just one
"external use" header providing stuff like estimate_expression_value,
while keeping PlannerInfo as an opaque struct.  Then importing that
into utils/adt/ files would not represent such a big dependency
footprint.

I find the second choice more appealing, though it's getting a bit
far afield from where this started.  OTOH, lots of other header
refactoring is going on right now, so why not ...

Thoughts?

regards, tom lane

diff --git a/src/backend/utils/adt/arrayfuncs.c b/src/backend/utils/adt/arrayfuncs.c
index e457d81..14cc202 100644
--- a/src/backend/utils/adt/arrayfuncs.c
+++ b/src/backend/utils/adt/arrayfuncs.c
@@ -22,12 +22,15 @@
 #include "catalog/pg_type.h"
 #include "funcapi.h"
 #include "libpq/pqformat.h"
+#include "nodes/supportnodes.h"
+#include "optimizer/clauses.h"
 #include "utils/array.h"
 #include "utils/arrayaccess.h"
 #include "utils/builtins.h"
 #include "utils/datum.h"
 #include "utils/lsyscache.h"
 #include "utils/memutils.h"
+#include "utils/selfuncs.h"
 #include "utils/typcache.h"
 
 
@@ -6026,6 +6029,36 @@ array_unnest(PG_FUNCTION_ARGS)
 	}
 }
 
+/*
+ * Planner support function for array_unnest(anyarray)
+ */
+Datum
+array_unnest_support(PG_FUNCTION_ARGS)
+{
+	Node	   *rawreq = (Node *) PG_GETARG_POINTER(0);
+	Node	   *ret = NULL;
+
+	if (IsA(rawreq, SupportRequestRows))
+	{
+		/* Try to estimate the number of rows returned */
+		SupportRequestRows *req = (SupportRequestRows *) rawreq;
+
+		if (is_funcclause(req->node))	/* be paranoid */
+		{
+			List	   *args = ((FuncExpr *) req->node)->args;
+			Node	   *arg1;
+
+			/* We can use estimated argument values here */
+			arg1 = estimate_expression_value(req->root, linitial(args));
+
+			req->rows = estimate_array_length(arg1);
+			ret = (Node *) req;
+		}
+	}
+
+	PG_RETURN_POINTER(ret);
+}
+
 
 /*
  * array_replace/array_remove support
diff --git a/src/backend/utils/adt/int.c b/src/backend/utils/adt/int.c
index fd82a83..263920c 100644
--- a/src/backend/utils/adt/int.c
+++ b/src/backend/utils/adt/int.c
@@ -30,11 +30,14 @@
 
 #include 
 #include 
+#include 
 
 #include "catalog/pg_type.h"
 #include "common/int.h"
 #include "funcapi.h"
 #include "libpq/pqformat.h"
+#include "nodes/supportnodes.h"
+#include "optimizer/clauses.h"
 #include "utils/array.h"
 #include "utils/builtins.h"
 
@@ -1427,3 +1430,73 @@ generate_series_step_int4(PG_FUNCTION_ARGS)
 		/* do when there is no more left */
 		SRF_RETURN_DONE(funcctx);
 }
+
+/*
+ * Planner support function for generate_series(int4, int4 [, int4])
+ */
+Datum
+generate_series_int4_support(PG_FUNCTION_ARGS)
+{
+	Node	   *rawreq = (Node *) PG_GETARG_POINTER(0);
+	Node	   *ret = NULL;
+
+	if (IsA(rawreq, SupportRequestRows))
+	{
+		/* Try to estimate the number of rows returned */
+		SupportRequestRows *req = (SupportRequestRows *) rawreq;
+
+		if (is_funcclause(req->node))	/* be paranoid */
+		{
+			List	   *args = ((FuncExpr *) req->node)->args;
+			Node	   *arg1,
+	   *arg2,
+	   *arg3;
+
+			/* We can use estimated argument values here */
+			arg1 = estimate_expression_value(req->root, linitial(args));
+			arg2 = estimate_expression_value(req->root, lsecond(args));
+			if (list_length(args) >= 3)
+arg3 = estimate_expression_value(req->root, lthird(args));
+			else
+arg3 = NULL;
+
+			/*
+			 * If any argument is constant NULL, we can safely assume that
+			 * zero rows are returned.  Otherwise, if they're all non-NULL
+			 * constants, we can calculate the number of rows that will be
+			 * returned.  Use double arithmetic to avoid overflow hazards.
+			 */
+			if ((IsA(arg1, Const) &&
+ ((Const *) 

Allowing extensions to supply operator-/function-specific info

2019-01-20 Thread Tom Lane
Over in the thread at [1], we realized that PostGIS has been thrashing
around trying to fake its way to having "special index operators", ie
a way to automatically convert WHERE clauses into lossy index quals.
That's existed in a non-extensible way inside indxpath.c for twenty
years come July.  Since the beginning I've thought we should provide
a way for extensions to do similar things, but it never got to the top
of the to-do queue.  Now I think it's time.

One low-effort answer is to add a hook call in indxpath.c that lets
extensions manipulate the sets of index clauses extracted from a
relation's qual clauses, but I don't especially like that: it dumps
all the work onto extensions, resulting in lots of code duplication,
plus they have a badly-documented and probably moving target for what
they have to do.

Another bit of technical debt that's even older is the lack of a way
to attach selectivity estimation logic to boolean-returning functions.
So that motivates me to think that whatever we do here should be easily
extensible to allow different sorts of function- or operator-related
knowledge to be supplied by extensions.  We already have oprrest,
oprjoin, and protransform hooks that allow certain kinds of knowledge
to be attached to operators and functions, but we need something a bit
more general.

What I'm envisioning therefore is that we allow an auxiliary function to
be attached to any operator or function that can provide functionality
like this, and that we set things up so that the set of tasks that
such functions can perform can be extended over time without SQL-level
changes.  For example, we could say that the function takes a single
Node* argument, and that the type of Node tells it what to do, and if it
doesn't recognize the type of Node it should just return NULL indicating
"use default handling".  We'd start out with two relevant Node types,
one for the selectivity-estimation case and one for the extract-a-lossy-
index-qual case, and we could add more over time.

What we can do to attach such a support function to a target function
is to repurpose the pg_proc.protransform column to represent the
support function.  The existing protransform functions already have
nearly the sort of API I'm thinking about, but they only accept
FuncExpr* not any other node type.  It'd be easy to change them
though, because there's only about a dozen and they are all in core;
we never invented any way for extensions to access that functionality.
(So actually, the initial API spec here would include three
possibilities, the third one being equivalent to the current
protransform behavior.)

As for attaching support functions to operators, we could
consider widening the pg_operator catalog to add a new column.
But I think it might be a cleaner answer to just say "use the support
function attached to the operator's implementation function,
if there is one".  This would require that the support functions
be able to cope with either FuncExpr or OpExpr inputs, but that
does not seem like much of a burden as long as it's part of the
API spec from day one.

Since there isn't any SQL API for attaching support functions,
we'd have to add one, but adding another clause to CREATE FUNCTION
isn't all that hard.  (Annoyingly, we haven't created any cheaply
extensible syntax for CREATE FUNCTION, so this'd likely require
adding another keyword.  I'm not interested in doing more than
that right now, though.)

I'd be inclined to rename pg_proc.protransform to "prosupport"
to reflect its wider responsibility, and make the new CREATE FUNCTION
clause be "SUPPORT FUNCTION foo" or some such.  I'm not wedded
to that terminology, if anyone has a better idea.

One thing that's not entirely clear to me is what permissions would be
required to use that clause.  The support functions will have signature
"f(internal) returns internal", so creating them at all will require
superuser privilege, but it seems like we probably also need to restrict
the ability to attach one to a target function --- attaching one to
the wrong function could probably have bad consequences.  The easy way
out is to say "you must be superuser"; maybe that's enough for now,
since all the plausible use-cases for this are in extensions containing
C functions anyway.  (A support function would have to be coded in C,
although it seems possible that its target function could be something
else.)

Thoughts?  If we have agreement on this basic design, making it happen
seems like a pretty straightforward task.

regards, tom lane

PS: there is, however, a stumbling block that I'll address in a separate
message, as it seems independent of this infrastructure.

[1] 
https://www.postgresql.org/message-id/flat/CACowWR0TXXL0NfPMW2afCKzX++nHHBZLW3-BLusu_B0WjBB1=a...@mail.gmail.com