Re: Proving IS NOT NULL inference for ScalarArrayOpExpr's
On Sat, Mar 2, 2019 at 5:29 PM Tom Lane wrote: > > James Coleman writes: > > On Fri, Mar 1, 2019 at 5:28 PM Tom Lane wrote: > >> I also tweaked it to recognize the case where we can prove the > >> array, rather than the scalar, to be null. I'm not sure how useful > >> that is in practice, but it seemed weird that we'd exploit that > >> only if we can also prove the scalar to be null. > > > Just for my own understanding: I thought the "if > > (arrayconst->constisnull)" took care of the array constant being null? > > Yeah, but only after we've already matched the subexpr to the LHS, > otherwise we'd not be bothering to determine the array's size. > > On the other hand, if for some reason we know that the array side > is NULL, we can conclude that the ScalarArrayOp returns NULL > independently of what the LHS is. Thanks for the detailed explanation. > > I don't see a check on the scalar node / lhs. I do see you added a > > check for the entire clause being null, but I'm not sure I understand > > when that would occur (unless it's only in the recursive case?) > > Yeah, it's to handle the case where we run into a constant NULL > below a strict node. Typically, that doesn't happen because > eval_const_expressions would have const-folded the upper node, but > it's not impossible given all the cases that clause_is_strict_for > can recurse through now. (The coverage bot does show that code being > reached, although perhaps that only occurs for null-below-ScalarArrayOp, > in which case it might be dead if we were to make eval_const_expressions > smarter.) Ah, that makes sense. James Coleman
Re: Proving IS NOT NULL inference for ScalarArrayOpExpr's
James Coleman writes: > On Fri, Mar 1, 2019 at 5:28 PM Tom Lane wrote: >> I also tweaked it to recognize the case where we can prove the >> array, rather than the scalar, to be null. I'm not sure how useful >> that is in practice, but it seemed weird that we'd exploit that >> only if we can also prove the scalar to be null. > Just for my own understanding: I thought the "if > (arrayconst->constisnull)" took care of the array constant being null? Yeah, but only after we've already matched the subexpr to the LHS, otherwise we'd not be bothering to determine the array's size. On the other hand, if for some reason we know that the array side is NULL, we can conclude that the ScalarArrayOp returns NULL independently of what the LHS is. > I don't see a check on the scalar node / lhs. I do see you added a > check for the entire clause being null, but I'm not sure I understand > when that would occur (unless it's only in the recursive case?) Yeah, it's to handle the case where we run into a constant NULL below a strict node. Typically, that doesn't happen because eval_const_expressions would have const-folded the upper node, but it's not impossible given all the cases that clause_is_strict_for can recurse through now. (The coverage bot does show that code being reached, although perhaps that only occurs for null-below-ScalarArrayOp, in which case it might be dead if we were to make eval_const_expressions smarter.) regards, tom lane
Re: Proving IS NOT NULL inference for ScalarArrayOpExpr's
On Fri, Mar 1, 2019 at 5:28 PM Tom Lane wrote: > > James Coleman writes: > > [ saop_is_not_null-v10.patch ] > > I went through this again, and this time (after some more rewriting > of the comments) I satisfied myself that the logic is correct. > Hence, pushed. Thanks! > I also tweaked it to recognize the case where we can prove the > array, rather than the scalar, to be null. I'm not sure how useful > that is in practice, but it seemed weird that we'd exploit that > only if we can also prove the scalar to be null. Just for my own understanding: I thought the "if (arrayconst->constisnull)" took care of the array constant being null? I don't see a check on the scalar node / lhs. I do see you added a check for the entire clause being null, but I'm not sure I understand when that would occur (unless it's only in the recursive case?) > Take a look at the ScalarArrayOp case in eval_const_expressions. > Right now it's only smart about the all-inputs-constant case. > I'm not really convinced it's worth spending cycles on the constant- > null-array case, but that'd be where to do it if we want to do it > in a general way. (I think that what I added to clause_is_strict_for > is far cheaper, because while it's the same test, it's in a place > that we won't hit during most queries.) Thanks for the pointer; I'll take a look if for no other reason than curiosity. Thanks, James Coleman
Re: Proving IS NOT NULL inference for ScalarArrayOpExpr's
James Coleman writes: > [ saop_is_not_null-v10.patch ] I went through this again, and this time (after some more rewriting of the comments) I satisfied myself that the logic is correct. Hence, pushed. I stripped down the regression test cases somewhat. Those were good for development, but they were about doubling the runtime of test_predtest.sql, which seemed excessive for testing one feature. I also tweaked it to recognize the case where we can prove the array, rather than the scalar, to be null. I'm not sure how useful that is in practice, but it seemed weird that we'd exploit that only if we can also prove the scalar to be null. > I've implemented this fix as suggested. The added argument I've > initially called "allow_false"; I don't love the name, but it is > directly what it does. I'd appreciate other suggestions (if you prefer > it change). I was tempted to rename it to "weak", but decided that that might be overloading that term too much in this module. Left it as-is. > Ideally I think this case would be handled elsewhere in the optimizer > by directly replacing the saop and a constant NULL array with NULL, > but this isn't the patch to do that, and at any rate I'd have to do a > bit more investigation to understand how to do such (if it's easily > possible). Take a look at the ScalarArrayOp case in eval_const_expressions. Right now it's only smart about the all-inputs-constant case. I'm not really convinced it's worth spending cycles on the constant- null-array case, but that'd be where to do it if we want to do it in a general way. (I think that what I added to clause_is_strict_for is far cheaper, because while it's the same test, it's in a place that we won't hit during most queries.) > I also updated the tests with a new helper function "opaque_array" to > make it easy to test cases where the planner can't inspect the array. Yeah, that's a win. I used that in most of the tests that I left in place, but I kept a couple with long arrays just so we'd have code coverage of the parts of clause_is_strict_for that need to detect array size. regards, tom lane
Re: Proving IS NOT NULL inference for ScalarArrayOpExpr's
On Mon, Feb 18, 2019 at 4:53 PM Tom Lane wrote: > So ... this is actively wrong. > > This case shows that you can't ignore the empty-array possibility > for a ScalarArrayOpExpr with useOr = false, because > "SELECT null::int = all(array[]::int[])" yields TRUE: > > contrib_regression=# select * from test_predtest($$ > select x is not null, x = all(array(select i from generate_series(1,0) i)) > from integers > $$); > WARNING: strong_implied_by result is incorrect > -[ RECORD 1 ]-+-- > strong_implied_by | t > weak_implied_by | f > strong_refuted_by | f > weak_refuted_by | f > s_i_holds | f > w_i_holds | f > s_r_holds | f > w_r_holds | f > > This case shows that you can't ignore the distinction between null > and false results once you've descended through strict function(s): > > contrib_regression=# select * from test_predtest($$ > select x is not null, strictf(true, x = any(array(select i from > generate_series(1,0) i))) > from integers > $$); > WARNING: strong_implied_by result is incorrect > -[ RECORD 1 ]-+-- > strong_implied_by | t > weak_implied_by | f > strong_refuted_by | f > weak_refuted_by | f > s_i_holds | f > w_i_holds | f > s_r_holds | f > w_r_holds | f > > (In this usage, the strictf() function from the test_predtest.sql > will simply return the NOT of its second input; so it gives different > answers depending on whether the SAOP returned false or null.) > > I think you could probably repair the second problem by adding > an additional argument to clause_is_strict_for() indicating whether > it has to prove the clause to be NULL or merely non-TRUE; when > recursing you'd always have to ask for the former. I've implemented this fix as suggested. The added argument I've initially called "allow_false"; I don't love the name, but it is directly what it does. I'd appreciate other suggestions (if you prefer it change). > The first problem could be resolved by insisting on being able to > prove the array non-empty when !useOr, but I'm not sure if it's > really worth the trouble, as opposed to just not trying to prove > anything for !useOr cases. I'm not sure that "x op ALL(ARRAY)" > occurs often enough in the wild to justify expending code on. > > The other case where being able to prove the array nonempty might > be worth something is when you've recursed and hence need to be > able to prove the SAOP's result to be NULL not just not-TRUE. > Again though, I don't know how often that occurs in practice, > so even the combination of those two motivations might not be > worth having code to check the array argument. I've implemented this also; the thing that puts it over the edge for me on the question of "is it worth it in the wild" is cases like "x !op ALL(ARRAY)", since it seems plausible to me that that's (unlike the non-negated case) a reasonably common operation. At the very least it seems likely to be a more logically interesting operation, which I'm taking as a proxy for common, I suppose. > It'd also be worth spending some brain cells on what happens > when the SAOP's array argument is null overall, which causes > its result to be null (not false). Maybe the logic goes > through without needing any explicit test for that case > (and indeed I couldn't break it in testing that), but it'd > likely be worth a comment. Since all SAOPs (both ALL and ANY) return NULL when the array is NULL regardless of the LHS argument, then this case qualifies as strict. I've included this in the above fix since both NULL constant arrays and non-empty arrays can both be proven strict with strict operators. Ideally I think this case would be handled elsewhere in the optimizer by directly replacing the saop and a constant NULL array with NULL, but this isn't the patch to do that, and at any rate I'd have to do a bit more investigation to understand how to do such (if it's easily possible). > I don't especially care for the proposed function name > "clause_proves_for_null_test". The only thing that brings to my > mind is the question "proves WHAT, exactly?" --- and as these > examples show, being crystal clear on what it proves is essential. Given the new argument, I've reverted the name change. I also updated the tests with a new helper function "opaque_array" to make it easy to test cases where the planner can't inspect the array. That way we don't need to rely on CTEs as an optimization fence -- thus improving both maintainability and readability. I also noticed there were also some tests where the array was already opaque yet I was still using CTEs as an optimization fence; I've cleaned those up. I've updated the comments significantly to reflect the above changes. I've attached an updated (and rebased) patch. James Coleman commit 2cf8f951ca6524d4eb4c927d9903736a0751c89c Author: jcoleman Date: Wed Nov 14 16:49:52 2018 + Extend IS [NOT] NULL predicate proofs for arrays For the purposes of predicate pr
Re: Proving IS NOT NULL inference for ScalarArrayOpExpr's
BTW, I just pushed a fix (e04a3905e) that adds a little more code to clause_is_strict_for(). This shouldn't cause more than minor rebasing pain for you, I hope. regards, tom lane
Re: Proving IS NOT NULL inference for ScalarArrayOpExpr's
James Coleman writes: > I forgot to update in v8 so attaching v9. So ... this is actively wrong. This case shows that you can't ignore the empty-array possibility for a ScalarArrayOpExpr with useOr = false, because "SELECT null::int = all(array[]::int[])" yields TRUE: contrib_regression=# select * from test_predtest($$ select x is not null, x = all(array(select i from generate_series(1,0) i)) from integers $$); WARNING: strong_implied_by result is incorrect -[ RECORD 1 ]-+-- strong_implied_by | t weak_implied_by | f strong_refuted_by | f weak_refuted_by | f s_i_holds | f w_i_holds | f s_r_holds | f w_r_holds | f This case shows that you can't ignore the distinction between null and false results once you've descended through strict function(s): contrib_regression=# select * from test_predtest($$ select x is not null, strictf(true, x = any(array(select i from generate_series(1,0) i))) from integers $$); WARNING: strong_implied_by result is incorrect -[ RECORD 1 ]-+-- strong_implied_by | t weak_implied_by | f strong_refuted_by | f weak_refuted_by | f s_i_holds | f w_i_holds | f s_r_holds | f w_r_holds | f (In this usage, the strictf() function from the test_predtest.sql will simply return the NOT of its second input; so it gives different answers depending on whether the SAOP returned false or null.) I think you could probably repair the second problem by adding an additional argument to clause_is_strict_for() indicating whether it has to prove the clause to be NULL or merely non-TRUE; when recursing you'd always have to ask for the former. The first problem could be resolved by insisting on being able to prove the array non-empty when !useOr, but I'm not sure if it's really worth the trouble, as opposed to just not trying to prove anything for !useOr cases. I'm not sure that "x op ALL(ARRAY)" occurs often enough in the wild to justify expending code on. The other case where being able to prove the array nonempty might be worth something is when you've recursed and hence need to be able to prove the SAOP's result to be NULL not just not-TRUE. Again though, I don't know how often that occurs in practice, so even the combination of those two motivations might not be worth having code to check the array argument. It'd also be worth spending some brain cells on what happens when the SAOP's array argument is null overall, which causes its result to be null (not false). Maybe the logic goes through without needing any explicit test for that case (and indeed I couldn't break it in testing that), but it'd likely be worth a comment. I don't especially care for the proposed function name "clause_proves_for_null_test". The only thing that brings to my mind is the question "proves WHAT, exactly?" --- and as these examples show, being crystal clear on what it proves is essential. regards, tom lane
Re: Proving IS NOT NULL inference for ScalarArrayOpExpr's
> > This comment seems wrong: > > > > + * However weak implication fails: e.g., "NULL IS NOT NULL" is false, but > > + * "NULL = ANY(ARRAY[NULL])" is NULL, so non-falsity does not imply > > non-falsity. > > > > "non-falsity does not imply non-falsity"? I suppose one of those > > negations should be different ... > > Earlier in the file weak implication (comments above > predicate_implied_by) is defined as "non-falsity of A implies > non-falsity of B". In this example we have NULL for A (non-false) but > false for B, so that definition doesn't hold. So I think the comment > is accurate, but I can reword if you have an idea of what you'd like > to see (I've tweaked a bit in the attached patch to start). I forgot to update in v8 so attaching v9. James Coleman saop_is_not_null-v9.patch Description: Binary data
Re: Proving IS NOT NULL inference for ScalarArrayOpExpr's
On Tue, Jan 22, 2019 at 4:26 AM Alvaro Herrera wrote: > > Hello, I gave this patch a very quick scan. I didn't check the actual > logic behind it. > > This comment seems wrong: > > + * However weak implication fails: e.g., "NULL IS NOT NULL" is false, but > + * "NULL = ANY(ARRAY[NULL])" is NULL, so non-falsity does not imply > non-falsity. > > "non-falsity does not imply non-falsity"? I suppose one of those > negations should be different ... Earlier in the file weak implication (comments above predicate_implied_by) is defined as "non-falsity of A implies non-falsity of B". In this example we have NULL for A (non-false) but false for B, so that definition doesn't hold. So I think the comment is accurate, but I can reword if you have an idea of what you'd like to see (I've tweaked a bit in the attached patch to start). > I think the name clause_proved_for_null_test() is a bit weird, being in > the past tense. I'd maybe change "proved" to "proves". Changed. > s/exppresions/expresions/ in the test files. Fixed. James Coleman saop_is_not_null-v8.patch Description: Binary data
Re: Proving IS NOT NULL inference for ScalarArrayOpExpr's
Hello, I gave this patch a very quick scan. I didn't check the actual logic behind it. This comment seems wrong: + * However weak implication fails: e.g., "NULL IS NOT NULL" is false, but + * "NULL = ANY(ARRAY[NULL])" is NULL, so non-falsity does not imply non-falsity. "non-falsity does not imply non-falsity"? I suppose one of those negations should be different ... I think the name clause_proved_for_null_test() is a bit weird, being in the past tense. I'd maybe change "proved" to "proves". s/exppresions/expresions/ in the test files. -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Proving IS NOT NULL inference for ScalarArrayOpExpr's
On Wed, Jan 16, 2019 at 8:49 AM James Coleman wrote: > > On Tue, Jan 15, 2019 at 11:37 PM Tom Lane wrote: > > Well, as I said upthread, it seems like we need to think a bit more > > carefully about what it is that clause_is_strict_for is testing --- > > and if that ends up finding that some other name is more apposite, > > I'd not have any hesitation about renaming it. But we're really > > missing a bet if the ScalarArrayOp-specific check isn't inside that > > recursive check of an "atomic" expression's properties. The > > routines above there only recurse through things that act like > > AND or OR, but clause_is_strict_for is capable of descending > > through other stuff as long as it's strict in some sense. What > > we need to be clear about here is exactly what that sense is. > > All right, I'll look over all of the other callers and see what makes > sense as far as naming (or perhaps consider a new parallel function; > unsure at this point.) I investigated all of the callers of clause_is_strict_for and confirmed they fall into two buckets: - Recursive case inside the function itself. - Callers proving a variant of IS [NOT] NULL. Given all cases appear to be safe to extend to scalar array ops, I've tentatively renamed the function clause_proved_for_null_test and moved the code back into that function rather than be in the caller. This also guarantees that beyond proving implication of IS NOT NULL and refutation of IS NULL we also get proof of weak refutation of strict predicates (since IS NOT NULL refutes them and we strongly imply IS NOT NULL); I've added tests for this case. I added many more comments explaining the proofs here, but it boils down to the fact that non-empty array ops are really just a strict clause, and the empty array case isn't strict, but when not returning NULL returns false, and therefore we can still make the same proofs. Since the empty array case is really the only interesting one, I brought back the empty array tests, but modified them to use calculated/non-constant arrays so that they actually hit the new code paths. I also verified that the proofs they make are a subset of the ones we get from existing code for constant empty arrays (it makes sense we'd be able to prove less about possibly empty arrays than known empty arrays.) > I might also try to see if we can edit the tests slightly to require > the recursion case to be exercised. I didn't tackle this, but if you think it would add real value, then I can look into it further. I also updated the commit message to better match the more extended implications of this change over just the IS NOT NULL case I was originally pursuing. One other thing: for known non-empty arrays we could further extend this to prove that an IS NULL clause weakly implies the ScalarArrayOp. I'm not sure this is worth is it; if someone things otherwise let me know. James Coleman saop_is_not_null-v7.patch Description: Binary data
Re: Proving IS NOT NULL inference for ScalarArrayOpExpr's
James Coleman writes: > One other question on testing: do you think the "calculated array" > tests are good enough by themselves (i.e., remove the ones with array > constants of > 100 values)? I dislike that it's not as obvious what's > going on, but given that the code shouldn't care about array size > anyway...so if there's an inclination to fewer tests that's the first > place I'd look at cutting them. I don't have a strong opinion about that at this point. It might be clearer once the patch is finished; for now, there's no harm in erring towards the more-tests direction. regards, tom lane
Re: Proving IS NOT NULL inference for ScalarArrayOpExpr's
On Tue, Jan 15, 2019 at 11:37 PM Tom Lane wrote: > Well, as I said upthread, it seems like we need to think a bit more > carefully about what it is that clause_is_strict_for is testing --- > and if that ends up finding that some other name is more apposite, > I'd not have any hesitation about renaming it. But we're really > missing a bet if the ScalarArrayOp-specific check isn't inside that > recursive check of an "atomic" expression's properties. The > routines above there only recurse through things that act like > AND or OR, but clause_is_strict_for is capable of descending > through other stuff as long as it's strict in some sense. What > we need to be clear about here is exactly what that sense is. All right, I'll look over all of the other callers and see what makes sense as far as naming (or perhaps consider a new parallel function; unsure at this point.) I might also try to see if we can edit the tests slightly to require the recursion case to be exercised. One other question on testing: do you think the "calculated array" tests are good enough by themselves (i.e., remove the ones with array constants of > 100 values)? I dislike that it's not as obvious what's going on, but given that the code shouldn't care about array size anyway...so if there's an inclination to fewer tests that's the first place I'd look at cutting them. James Coleman
Re: Proving IS NOT NULL inference for ScalarArrayOpExpr's
David Rowley writes: > On Wed, 16 Jan 2019 at 14:29, James Coleman wrote: >> Is your preference in this kind of case to comment the code and/or >> tests but stick with int4pl even though it doesn't demonstrate the >> "problem", or try to engineer a different test case such that the >> *_holds results in the tests don't seem to imply we could prove more >> things than we do? > I think using x+x or whatever is fine. I doubt there's a need to > invent some new function that returns NULL on non-NULL input. The > code you're adding has no idea about the difference between the two. > It has no way to know that anyway. Yeah, I never intended that the *_holds results would be "exact" in every test case. They're mostly there to catch egregious errors in test construction. Since the code we're actually testing doesn't get to see the input or output values of the test cases, it's not really useful to insist that the test cases exercise every possible combination of input/output values for the given expressions. regards, tom lane
Re: Proving IS NOT NULL inference for ScalarArrayOpExpr's
James Coleman writes: > On Tue, Jan 15, 2019 at 3:53 PM Tom Lane wrote: >> I quite dislike taking the responsibility out of clause_is_strict_for >> and putting it in the callers. > The reason I moved it was that we're no longer just proving > strictness, so it seemed odd to put it in a function specifically > named to be about proving strictness. Well, as I said upthread, it seems like we need to think a bit more carefully about what it is that clause_is_strict_for is testing --- and if that ends up finding that some other name is more apposite, I'd not have any hesitation about renaming it. But we're really missing a bet if the ScalarArrayOp-specific check isn't inside that recursive check of an "atomic" expression's properties. The routines above there only recurse through things that act like AND or OR, but clause_is_strict_for is capable of descending through other stuff as long as it's strict in some sense. What we need to be clear about here is exactly what that sense is. regards, tom lane
Re: Proving IS NOT NULL inference for ScalarArrayOpExpr's
On Wed, 16 Jan 2019 at 14:29, James Coleman wrote: > > On Tue, Jan 15, 2019 at 8:14 PM David Rowley > > While int4pl might do what you want, some other strict function might > > not. A simple example would be a strict function that decided to > > return NULL when the two ints combined overflowed int. > > Yes, the claim about not all strict functions guarantee this (like > int4pl) made sense. > > Is your preference in this kind of case to comment the code and/or > tests but stick with int4pl even though it doesn't demonstrate the > "problem", or try to engineer a different test case such that the > *_holds results in the tests don't seem to imply we could prove more > things than we do? I think using x+x or whatever is fine. I doubt there's a need to invent some new function that returns NULL on non-NULL input. The code you're adding has no idea about the difference between the two. It has no way to know that anyway. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: Proving IS NOT NULL inference for ScalarArrayOpExpr's
On Tue, Jan 15, 2019 at 8:14 PM David Rowley wrote: > > On Wed, 16 Jan 2019 at 14:05, James Coleman wrote: > > At the risk of missing something obvious, I'm not sure I see a case > > where "x is not null" does not imply "(x + x) is not null", at least > > for integers. Since an integer + an integer results in an integer, > > then it must imply the addition of itself is not null also? > > A strict function guarantees it will return NULL on any NULL input. > This does not mean it can't return NULL on a non-NULL input. > > While int4pl might do what you want, some other strict function might > not. A simple example would be a strict function that decided to > return NULL when the two ints combined overflowed int. Yes, the claim about not all strict functions guarantee this (like int4pl) made sense. Is your preference in this kind of case to comment the code and/or tests but stick with int4pl even though it doesn't demonstrate the "problem", or try to engineer a different test case such that the *_holds results in the tests don't seem to imply we could prove more things than we do? James Coleman
Re: Proving IS NOT NULL inference for ScalarArrayOpExpr's
On Wed, 16 Jan 2019 at 14:05, James Coleman wrote: > At the risk of missing something obvious, I'm not sure I see a case > where "x is not null" does not imply "(x + x) is not null", at least > for integers. Since an integer + an integer results in an integer, > then it must imply the addition of itself is not null also? A strict function guarantees it will return NULL on any NULL input. This does not mean it can't return NULL on a non-NULL input. While int4pl might do what you want, some other strict function might not. A simple example would be a strict function that decided to return NULL when the two ints combined overflowed int. The docs [1] define STRICT as: "RETURNS NULL ON NULL INPUT or STRICT indicates that the function always returns null whenever any of its arguments are null. If this parameter is specified, the function is not executed when there are null arguments; instead a null result is assumed automatically." [1] https://www.postgresql.org/docs/devel/sql-createfunction.html -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: Proving IS NOT NULL inference for ScalarArrayOpExpr's
On Tue, Jan 15, 2019 at 4:36 PM David Rowley wrote: > I wasn't suggesting any code changes. I just thought the code was > sufficiently hard to understand to warrant some additional tests that > ensure we don't assume that if the int4 column x is not null that also > x+x is not null. Only the reverse can be implied since int4pl is > strict. At the risk of missing something obvious, I'm not sure I see a case where "x is not null" does not imply "(x + x) is not null", at least for integers. Since an integer + an integer results in an integer, then it must imply the addition of itself is not null also? This is the root of the questions I had: James Coleman writes: > 1. The current code doesn't result in "strongly_implied_by = t" for > the "(x + x) is not null" case, but it does result in "s_i_holds = t". > This doesn't change if I switch to using "equal()" as mentioned above. > 3. The tests I have for refuting "(x + x) is null" show that both > s_r_holds and w_r_holds. I'd expected these to be false. James Coleman
Re: Proving IS NOT NULL inference for ScalarArrayOpExpr's
On Tue, Jan 15, 2019 at 3:53 PM Tom Lane wrote: > > James Coleman writes: > > [ saop_is_not_null-v6.patch ] > > I quite dislike taking the responsibility out of clause_is_strict_for > and putting it in the callers. Aside from requiring duplicative code, > it means that this fails to prove anything for recursive situations > (i.e., where the ScalarArrayOp appears one or more levels down in a > clause examined by clause_is_strict_for). > > If the behavior needs to vary depending on proof rule, which it > looks like it does, the way to handle that is to add flag argument(s) > to clause_is_strict_for. The reason I moved it was that we're no longer just proving strictness, so it seemed odd to put it in a function specifically named to be about proving strictness. If you'd prefer an argument like "bool allow_false" or "bool check_saop_implies_is_null" I'm happy to make that change, but it seems generally unhelpful to change the function implementation to contradict the name (the empty array case that returns false even then the lhs is null). > I'm also not happy about the lack of comments justifying the proof > rules -- eg, it's far from obvious why you need to restrict one > case to !weak but not the other. Both implication and refutation have the !weak check; it's just that the check in the implication function was already present. I'll add more comments inline for the proof rules, but from what I can tell only strong implication holds for implication of IS NOT NULL. A sample case disproving weak implication is "select null is not null, null::int = any(array[null]::int[])" which results in "false, null", so non-falsity does not imply non-falsity. However I'm currently unable to come up with a case for refutation such that truth of a ScalarArrayOp refutes IS NULL, so I'll update code and tests for that. That also takes care of my point: James Coleman writes: > 2. The tests I have for refuting "x is null" show that w_r_holds as > well as s_r_holds. I'd only expected the latter, since only > "strongly_refuted_by = t". But it still leaves my questions (1) and (3). James Coleman
Re: Proving IS NOT NULL inference for ScalarArrayOpExpr's
On Wed, 16 Jan 2019 at 03:33, James Coleman wrote: > > 2. I was also staring predicate_implied_by_simple_clause() a bit at > > the use of clause_is_strict_for() to ensure that the IS NOT NULL's > > operand matches the ScalarArrayOpExpr left operand. Since > > clause_is_strict_for() = "Can we prove that "clause" returns NULL if > > "subexpr" does?", in this case, your clause is the ScalarArrayOpExpr's > > left operand and subexpr is the IS NOT NULL's operand. This means > > that a partial index with "WHERE a IS NOT NULL" should also be fine to > > use for WHERE strict_func(a) IN (1,2,..., 101); since strict_func(a) > > must be NULL if a is NULL. Also also works for WHERE a+a > > IN(1,2,...,101); I wonder if it's worth adding a test for that, or > > even just modify one of the existing tests to ensure you get the same > > result from it. Perhaps it's worth an additional test to ensure that x > > IN(1,2,...,101) does not imply x+x IS NOT NULL and maybe that x+x IS > > NULL does not refute x IN(1,2,...,101), as a strict function is free > > to return NULL even if it's input are not NULL. > > Are you suggesting a different test than clause_is_strict_for to > verify the saop LHS is the same as the null test's arg? I suppose we > could use "equal()" instead? I wasn't suggesting any code changes. I just thought the code was sufficiently hard to understand to warrant some additional tests that ensure we don't assume that if the int4 column x is not null that also x+x is not null. Only the reverse can be implied since int4pl is strict. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: Proving IS NOT NULL inference for ScalarArrayOpExpr's
James Coleman writes: > [ saop_is_not_null-v6.patch ] I quite dislike taking the responsibility out of clause_is_strict_for and putting it in the callers. Aside from requiring duplicative code, it means that this fails to prove anything for recursive situations (i.e., where the ScalarArrayOp appears one or more levels down in a clause examined by clause_is_strict_for). If the behavior needs to vary depending on proof rule, which it looks like it does, the way to handle that is to add flag argument(s) to clause_is_strict_for. I'm also not happy about the lack of comments justifying the proof rules -- eg, it's far from obvious why you need to restrict one case to !weak but not the other. regards, tom lane
Re: Proving IS NOT NULL inference for ScalarArrayOpExpr's
James Coleman writes: > I'm attaching the current version of the patch with the new tests, but > I'm not sure I understand the *_holds results mentioned above. Are > they an artifact of the data under test? Or do they suggest a > remaining bug? I'm a bit fuzzy on what to expect for *_holds though I > understand the requirements for strongly/weakly_implied/refuted_by. IIRC, the "foo_holds" outputs mean "the foo relationship appears to hold for these expressions across all tested inputs", for instance s_i_holds means "there were no tested cases where A is true and B is not true". The implied_by/refuted_by outputs mean "predtest.c claims it can prove this". Obviously, a claimed proof for a relationship that fails experimentally would be big trouble. The other way around just means that either the proof rules are inadequate to prove this particular case, or the set of test values for the expressions don't expose the fact that the relationship doesn't hold. Now, if we *expected* that predtest.c should be able to prove something, failure to do so would be a bug, but it's not a bug if we know it's incapable of making that proof. The second explanation might represent a bug in the test case. I added the foo_holds columns just as a sort of cross-check on the test cases themselves, they don't test anything about the predtest code. regards, tom lane
Re: Proving IS NOT NULL inference for ScalarArrayOpExpr's
On Tue, Jan 15, 2019 at 12:47 AM David Rowley wrote: > 1. In: > > + if (IsA(clause, ScalarArrayOpExpr)) > + { > + ScalarArrayOpExpr *saop = (ScalarArrayOpExpr *) clause; > + Node *subexpr = (Node *) ((NullTest *) predicate)->arg; > + if (op_strict(saop->opno) && > + clause_is_strict_for((Node *) linitial(saop->args), subexpr)) > + return true; > + } > + > /* foo IS NOT NULL refutes foo IS NULL */ > if (clause && IsA(clause, NullTest) && > > Your IsA(clause, ScalarArrayOpExpr) condition should also be checking > that "clause" can't be NULL. The NullTest condition one below does > this. Fixed in my local copy. I also did the same in predicate_implied_by_simple_clause since it seems to have the same potential issue. After sleeping on it and looking at it again this morning, I also realized I need to exclude weak implication in predicate_refuted_by_simple_clause. > 2. I was also staring predicate_implied_by_simple_clause() a bit at > the use of clause_is_strict_for() to ensure that the IS NOT NULL's > operand matches the ScalarArrayOpExpr left operand. Since > clause_is_strict_for() = "Can we prove that "clause" returns NULL if > "subexpr" does?", in this case, your clause is the ScalarArrayOpExpr's > left operand and subexpr is the IS NOT NULL's operand. This means > that a partial index with "WHERE a IS NOT NULL" should also be fine to > use for WHERE strict_func(a) IN (1,2,..., 101); since strict_func(a) > must be NULL if a is NULL. Also also works for WHERE a+a > IN(1,2,...,101); I wonder if it's worth adding a test for that, or > even just modify one of the existing tests to ensure you get the same > result from it. Perhaps it's worth an additional test to ensure that x > IN(1,2,...,101) does not imply x+x IS NOT NULL and maybe that x+x IS > NULL does not refute x IN(1,2,...,101), as a strict function is free > to return NULL even if it's input are not NULL. Are you suggesting a different test than clause_is_strict_for to verify the saop LHS is the same as the null test's arg? I suppose we could use "equal()" instead? I've added several tests, and noticed a few things: 1. The current code doesn't result in "strongly_implied_by = t" for the "(x + x) is not null" case, but it does result in "s_i_holds = t". This doesn't change if I switch to using "equal()" as mentioned above. 2. The tests I have for refuting "x is null" show that w_r_holds as well as s_r_holds. I'd only expected the latter, since only "strongly_refuted_by = t". 3. The tests I have for refuting "(x + x) is null" show that both s_r_holds and w_r_holds. I'd expected these to be false. I'm attaching the current version of the patch with the new tests, but I'm not sure I understand the *_holds results mentioned above. Are they an artifact of the data under test? Or do they suggest a remaining bug? I'm a bit fuzzy on what to expect for *_holds though I understand the requirements for strongly/weakly_implied/refuted_by. James Coleman saop_is_not_null-v6.patch Description: Binary data
Re: Proving IS NOT NULL inference for ScalarArrayOpExpr's
On Tue, 15 Jan 2019 at 12:24, James Coleman wrote: > While I add that though I wanted to get this updated version out to > get feedback on the approach. I had a look at this and there's a couple of things I see: 1. In: + if (IsA(clause, ScalarArrayOpExpr)) + { + ScalarArrayOpExpr *saop = (ScalarArrayOpExpr *) clause; + Node *subexpr = (Node *) ((NullTest *) predicate)->arg; + if (op_strict(saop->opno) && + clause_is_strict_for((Node *) linitial(saop->args), subexpr)) + return true; + } + /* foo IS NOT NULL refutes foo IS NULL */ if (clause && IsA(clause, NullTest) && Your IsA(clause, ScalarArrayOpExpr) condition should also be checking that "clause" can't be NULL. The NullTest condition one below does this. 2. I was also staring predicate_implied_by_simple_clause() a bit at the use of clause_is_strict_for() to ensure that the IS NOT NULL's operand matches the ScalarArrayOpExpr left operand. Since clause_is_strict_for() = "Can we prove that "clause" returns NULL if "subexpr" does?", in this case, your clause is the ScalarArrayOpExpr's left operand and subexpr is the IS NOT NULL's operand. This means that a partial index with "WHERE a IS NOT NULL" should also be fine to use for WHERE strict_func(a) IN (1,2,..., 101); since strict_func(a) must be NULL if a is NULL. Also also works for WHERE a+a IN(1,2,...,101); I wonder if it's worth adding a test for that, or even just modify one of the existing tests to ensure you get the same result from it. Perhaps it's worth an additional test to ensure that x IN(1,2,...,101) does not imply x+x IS NOT NULL and maybe that x+x IS NULL does not refute x IN(1,2,...,101), as a strict function is free to return NULL even if it's input are not NULL. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: Proving IS NOT NULL inference for ScalarArrayOpExpr's
On Mon, Jan 14, 2019 at 11:34 AM Tom Lane wrote: > >> Hm. That would be annoying, but on reflection I think maybe we don't > >> actually need to worry about emptiness of the array after all. The > >> thing that we need to prove, at least for the implication case, is > >> "truth of the ScalarArrayOp implies truth of the IS NOT NULL clause". > > > Are you thinking that implies clause_is_strict_for might not be the > > right/only place? Or just that the code in that function needs to > > consider if it should return different results in some cases to handle > > all 4 proof rules properly? > > The latter is what I was envisioning, but I haven't worked out details. The more that I look at this I'm wondering if there aren't two things here: it seems that the existing patch (with the code that excludes empty arrays) is likely useful on its own for cases I hadn't originally envisioned (or tested). Specifically it seems to allow us to conclude the result will be null if a field is known to be null. I think I'll try to add a test specific to this, but I'm not immediately sure I know a case where that matters. If anyone has an idea on what direction to head with this, I'd be happy to code it up. For the original goal of "truth of the ScalarArrayOp implies truth of the IS NOT NULL clause", it seems to me the proper place is predicate_implied_by_simple_clause where we already have a place to specifically work with IS NOT NULL expressions. That will allow this to be more targeted, work for empty arrays as well, and not potentially introduce an optimizer bug for strictness with empty arrays. I'm attaching an updated patch that does that. I've also added the "parallel" case in predicate_refuted_by_simple_clause, but I haven't added a test for that yet. I also would like to add a test for the sad path to parallel the happy path computed array/cte test. While I add that though I wanted to get this updated version out to get feedback on the approach. James Coleman saop_is_not_null-v5.patch Description: Binary data
Re: Proving IS NOT NULL inference for ScalarArrayOpExpr's
James Coleman writes: > Are those invariants we want to keep (and recognize as regression > tests)? If so, I can confirm that they aren't duplicative of the rest > of the file, and if not I can remove them. I can't get terribly excited about them ... if we were changing their behavior, or if there were a bug discovered here, then I'd think differently. But I'm not a fan of adding regression test cycles without a reason. regards, tom lane
Re: Proving IS NOT NULL inference for ScalarArrayOpExpr's
On Mon, Jan 14, 2019 at 11:34 AM Tom Lane wrote: > > James Coleman writes: > > On Mon, Jan 14, 2019 at 11:08 AM Tom Lane wrote: > >> I think these test cases don't actually prove much about the behavior > >> of your patch. Wouldn't they be handled by the generic OR-conversion > >> logic, since there's fewer than MAX_SAOP_ARRAY_SIZE items? > > > Which ones are you looking at in particular? The "inline" (non-cte) > > happy and sad path cases have 102 total array elements (as does the > > happy path cte case), and MAX_SAOP_ARRAY_SIZE is 100. The other two > > tests are about the empty array case so much have 0 items (and were > > the ones that showed different behavior between v3 and v4). > > I was looking at the empty-array ones. I don't see how that reaches > your patch; we should not be doing predicate_implied_by_simple_clause > on the ScalarArrayOp itself, because predicate_classify should've > decided to treat it as an OR clause. At one point I was getting all 'f' instead of all 't' for those, but now I can't reproduce it going from either master to my patch or to my patch without the element count checks, so I'm not sure how/when I was getting that anymore. Are those invariants we want to keep (and recognize as regression tests)? If so, I can confirm that they aren't duplicative of the rest of the file, and if not I can remove them. James Coleman
Re: Proving IS NOT NULL inference for ScalarArrayOpExpr's
James Coleman writes: > On Mon, Jan 14, 2019 at 11:08 AM Tom Lane wrote: >> I think these test cases don't actually prove much about the behavior >> of your patch. Wouldn't they be handled by the generic OR-conversion >> logic, since there's fewer than MAX_SAOP_ARRAY_SIZE items? > Which ones are you looking at in particular? The "inline" (non-cte) > happy and sad path cases have 102 total array elements (as does the > happy path cte case), and MAX_SAOP_ARRAY_SIZE is 100. The other two > tests are about the empty array case so much have 0 items (and were > the ones that showed different behavior between v3 and v4). I was looking at the empty-array ones. I don't see how that reaches your patch; we should not be doing predicate_implied_by_simple_clause on the ScalarArrayOp itself, because predicate_classify should've decided to treat it as an OR clause. >> Hm. That would be annoying, but on reflection I think maybe we don't >> actually need to worry about emptiness of the array after all. The >> thing that we need to prove, at least for the implication case, is >> "truth of the ScalarArrayOp implies truth of the IS NOT NULL clause". > Are you thinking that implies clause_is_strict_for might not be the > right/only place? Or just that the code in that function needs to > consider if it should return different results in some cases to handle > all 4 proof rules properly? The latter is what I was envisioning, but I haven't worked out details. regards, tom lane
Re: Proving IS NOT NULL inference for ScalarArrayOpExpr's
On Mon, Jan 14, 2019 at 11:08 AM Tom Lane wrote: > > James Coleman writes: > > On Sun, Jan 13, 2019 at 3:06 PM Tom Lane wrote: > >> There's still a logical problem here, which is that in order to > >> prove that the ScalarArrayOpExpr yields NULL when its LHS does, > >> you also have to prove that the RHS is not an empty array. > > > I've constructed a test case running the test_predtest function on the > > expression > > "select x is null, x = any(array[]::int[]) from integers" which I > > think gets at the logic bug you're noting. In doing this I noticed > > something I don't fully understand: on master this shows that "x = > > any(array[]::int[])" proves "x is null" (while my patch shows the > > opposite). Is this because the second expression being false means > > there's can be no valid value for x? It also claims it refutes the > > same, which is even more confusing. > > That sounds like a bug, but I think it's actually OK because a vacuous > OR is necessarily false, and a false predicate "implies" any conclusion > at all according to the customary definition of implication. If there > are any real optimization bugs in this area, it's probably a result of > calling code believing more than it should about the meaning of a proof. > > I think these test cases don't actually prove much about the behavior > of your patch. Wouldn't they be handled by the generic OR-conversion > logic, since there's fewer than MAX_SAOP_ARRAY_SIZE items? Which ones are you looking at in particular? The "inline" (non-cte) happy and sad path cases have 102 total array elements (as does the happy path cte case), and MAX_SAOP_ARRAY_SIZE is 100. The other two tests are about the empty array case so much have 0 items (and were the ones that showed different behavior between v3 and v4). > > It seems to me that this would mean that an IS NOT NULL index would > > still not be proven to be usable for a non-constant array (e.g., from > > a subquery calculated array), and in fact I've included a (failing) > > test demonstrating that fact in the attached patch. This feeds into my > > question above about there possibly being another corollary/proof that > > could be added for this case. > > Hm. That would be annoying, but on reflection I think maybe we don't > actually need to worry about emptiness of the array after all. The > thing that we need to prove, at least for the implication case, is > "truth of the ScalarArrayOp implies truth of the IS NOT NULL clause". > Per above, if the ScalarArrayOp is necessarily false, then we can > claim that the implication holds. However, we need to work through > all four proof rules (strong vs. weak, implication vs refutation) > and see which ones this applies to --- I'm not sure offhand that > the logic works for all four. (ENOCAFFEINE...) In any case it > requires thinking a bit differently about what clause_is_strict_for() > is really doing. Are you thinking that implies clause_is_strict_for might not be the right/only place? Or just that the code in that function needs to consider if it should return different results in some cases to handle all 4 proof rules properly? > > Semi-related: I noticed that predicate_classify handles an ArrayExpr > > by using list_length; this seems unnecessary if we know we can fail > > fast. I realize it's not a common problem, but I have seen extremely > > long arrays before, and maybe we should fall out of the count once we > > hit max+1 items. > > Huh? list_length() is constant-time. Facepalm. I'd somehow had it stuck in my head that we actually iterated the list to count. James Coleman
Re: Proving IS NOT NULL inference for ScalarArrayOpExpr's
James Coleman writes: > On Sun, Jan 13, 2019 at 3:06 PM Tom Lane wrote: >> There's still a logical problem here, which is that in order to >> prove that the ScalarArrayOpExpr yields NULL when its LHS does, >> you also have to prove that the RHS is not an empty array. > I've constructed a test case running the test_predtest function on the > expression > "select x is null, x = any(array[]::int[]) from integers" which I > think gets at the logic bug you're noting. In doing this I noticed > something I don't fully understand: on master this shows that "x = > any(array[]::int[])" proves "x is null" (while my patch shows the > opposite). Is this because the second expression being false means > there's can be no valid value for x? It also claims it refutes the > same, which is even more confusing. That sounds like a bug, but I think it's actually OK because a vacuous OR is necessarily false, and a false predicate "implies" any conclusion at all according to the customary definition of implication. If there are any real optimization bugs in this area, it's probably a result of calling code believing more than it should about the meaning of a proof. I think these test cases don't actually prove much about the behavior of your patch. Wouldn't they be handled by the generic OR-conversion logic, since there's fewer than MAX_SAOP_ARRAY_SIZE items? > It seems to me that this would mean that an IS NOT NULL index would > still not be proven to be usable for a non-constant array (e.g., from > a subquery calculated array), and in fact I've included a (failing) > test demonstrating that fact in the attached patch. This feeds into my > question above about there possibly being another corollary/proof that > could be added for this case. Hm. That would be annoying, but on reflection I think maybe we don't actually need to worry about emptiness of the array after all. The thing that we need to prove, at least for the implication case, is "truth of the ScalarArrayOp implies truth of the IS NOT NULL clause". Per above, if the ScalarArrayOp is necessarily false, then we can claim that the implication holds. However, we need to work through all four proof rules (strong vs. weak, implication vs refutation) and see which ones this applies to --- I'm not sure offhand that the logic works for all four. (ENOCAFFEINE...) In any case it requires thinking a bit differently about what clause_is_strict_for() is really doing. > Semi-related: I noticed that predicate_classify handles an ArrayExpr > by using list_length; this seems unnecessary if we know we can fail > fast. I realize it's not a common problem, but I have seen extremely > long arrays before, and maybe we should fall out of the count once we > hit max+1 items. Huh? list_length() is constant-time. regards, tom lane
Re: Proving IS NOT NULL inference for ScalarArrayOpExpr's
On Sun, Jan 13, 2019 at 3:06 PM Tom Lane wrote: > > There's still a logical problem here, which is that in order to > prove that the ScalarArrayOpExpr yields NULL when its LHS does, > you also have to prove that the RHS is not an empty array. > > Otherwise you're up against the fact that the OR of zero boolean > expressions is false not null: > > regression=# select null::int = any(array[]::int[]); > ?column? > -- > f > (1 row) I've constructed a test case running the test_predtest function on the expression "select x is null, x = any(array[]::int[]) from integers" which I think gets at the logic bug you're noting. In doing this I noticed something I don't fully understand: on master this shows that "x = any(array[]::int[])" proves "x is null" (while my patch shows the opposite). Is this because the second expression being false means there's can be no valid value for x? It also claims it refutes the same, which is even more confusing. > While this is doable when the RHS is an array constant or ArrayExpr, I'm > afraid it's likely to be a bit tedious. (Hm, actually predicate_classify > already has logic to extract the number of elements in these cases ... > I wonder if there's any use in trying to share code?) It seems to me that this would mean that an IS NOT NULL index would still not be proven to be usable for a non-constant array (e.g., from a subquery calculated array), and in fact I've included a (failing) test demonstrating that fact in the attached patch. This feeds into my question above about there possibly being another corollary/proof that could be added for this case. Semi-related: I noticed that predicate_classify handles an ArrayExpr by using list_length; this seems unnecessary if we know we can fail fast. I realize it's not a common problem, but I have seen extremely long arrays before, and maybe we should fall out of the count once we hit max+1 items. My new code also has this "problem", but I figured it was worth getting this logically correct before attempting to optimize that especially since it already exists in one place. > Minor stylistic quibble: I don't like where you inserted the new code in > clause_is_strict_for, because it renders the comment a few lines above > that unrelatable to the code. I'd put it at the bottom, after the > is_funcclause stanza. Fixed in this revision. James Coleman saop_is_not_null-v4.patch Description: Binary data
Re: Proving IS NOT NULL inference for ScalarArrayOpExpr's
James Coleman writes: > [ saop_is_not_null-v3.patch ] There's still a logical problem here, which is that in order to prove that the ScalarArrayOpExpr yields NULL when its LHS does, you also have to prove that the RHS is not an empty array. Otherwise you're up against the fact that the OR of zero boolean expressions is false not null: regression=# select null::int = any(array[]::int[]); ?column? -- f (1 row) While this is doable when the RHS is an array constant or ArrayExpr, I'm afraid it's likely to be a bit tedious. (Hm, actually predicate_classify already has logic to extract the number of elements in these cases ... I wonder if there's any use in trying to share code?) Minor stylistic quibble: I don't like where you inserted the new code in clause_is_strict_for, because it renders the comment a few lines above that unrelatable to the code. I'd put it at the bottom, after the is_funcclause stanza. regards, tom lane
Re: Proving IS NOT NULL inference for ScalarArrayOpExpr's
On Sat, Jan 12, 2019 at 8:52 PM David Rowley wrote: > > Basically, the planner assumes that the WHERE a IS NOT NULL index > implies WHERE b IN(1,...,101), which is definitely not the case. > > Probably your new code needs to be expanded to become: > > if (IsA(clause, ScalarArrayOpExpr)) > { > ScalarArrayOpExpr *saop = (ScalarArrayOpExpr *) clause; > if (op_strict(saop->opno) && > clause_is_strict_for((Node *) linitial(saop->args), subexpr)) > return true; > } I've updated the code as well as included the sad path test in addition to my original happy path test. Patch v3 attached. James Coleman saop_is_not_null-v3.patch Description: Binary data
Re: Proving IS NOT NULL inference for ScalarArrayOpExpr's
On Sun, 13 Jan 2019 at 14:49, David Rowley wrote: > I've not looked in detail, but perhaps the place to put the tests are > in src/test/modules/test_predtest. This module adds a function named > test_predtest() that you can likely use more easily than trying to > EXPLAIN plans and hoping the planner's behaviour helps to exhibit the > behaviour of the changed code. I just noticed that I accidentally reviewed v1 instead of v2. I see you found that module. I'll keep this as waiting on author until the other bug is fixed. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: Proving IS NOT NULL inference for ScalarArrayOpExpr's
On Sun, 11 Nov 2018 at 10:33, James Coleman wrote: > At first I was imagining having the parse keep track of whether an array const > expr contained any nulls and perhaps adding generated quals (in an equivalence > class?) to allow the planner to easily prove the index was correct. I'd been > going down this track because in my mind the issue was because the planner > needed to verify whether all of the array elements were not null. > > But as I started to dig into the predtest.c NOT NULL proofs and add test > cases, > I realized that at least in many normal op cases we can safely infer that foo > is not null when "foo " is true even if the array contains null > elements. > > This is such a simple change that it seems like I must be missing a case where > the above doesn't hold true, but I can't immediately think of any, and indeed > with the attached patch all existing tests pass (including some additional > ones I added for predtest to play around with it). > > Am I missing something obvious? Is this a valid approach? I started looking at this patch and I see that there's no check to ensure that the IS NOT NULL pred matches the left operand in the IN() clause. Here's a demo of why this is bad: create table t (a int, b int); create index t_a_is_not_null on t(a) where a is not null; insert into t values(null,1); -- not indexed set enable_seqscan=0; -- force index to be used whenever possible select * from t where b in(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101); -- row missing! a | b ---+--- (0 rows) explain select * from t where b in(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98 ,99,100,101); -- should not use t_a_is_not_null index. QUERY PLAN -- Bitmap Heap Scan on t (cost=4.42..320.84 rows=1141 width=8) Recheck Cond: (a IS NOT NULL) Filter: (b = ANY ('{1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101}'::integer[])) -> Bitmap Index Scan on t_a_is_not_null (cost=0.00..4.13 rows=2249 width=0) (4 rows) set enable_seqscan=1; SET select * from t where b in(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101); -- Correct! a | b ---+--- | 1 (1 row) Basically, the planner assumes that the WHERE a IS NOT NULL index implies WHERE b IN(1,...,101), which is definitely not the case. Probably your new code needs to be expanded to become: if (IsA(clause, ScalarArrayOpExpr)) { ScalarArrayOpExpr *saop = (ScalarArrayOpExpr *) clause; if (op_strict(saop->opno) && clause_is_strict_for((Node *) linitial(saop->args), subexpr)) return true; } > Other outstanding questions: > > Should I add additional tests for predtest? It already seems to cover some > null > test cases with scalar array ops, but I'd be happy to add more if desired. > > Should I add a test case for the resulting plan with "foo IN (...)" with an > array with more than 100 elements? I've not looked in detail, but perhaps the place to put the tests are in src/test/modules/test_predtest. This module adds a function named test_predtest() that you can likely use more easily than trying to EXPLAIN plans and hoping the planner's behaviour helps to exhibit the behaviour of the changed code. I'll mark this as waiting on author. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: Proving IS NOT NULL inference for ScalarArrayOpExpr's
James Coleman writes: > Is there anything I can do to solicit reviewers for the current CF? There is no active CF, which is why nothing is happening. We'll start looking at the 2019-01 items in January. Right now, people are either on vacation or working on their own patches. regards, tom lane
Re: Proving IS NOT NULL inference for ScalarArrayOpExpr's
Is there anything I can do to solicit reviewers for the current CF? The patch is quite small and should be fairly simple to review. - James
Re: Proving IS NOT NULL inference for ScalarArrayOpExpr's
Also, my apologies for top posting; I forgot to remove the old email before clicking send.
Re: Proving IS NOT NULL inference for ScalarArrayOpExpr's
I've become more confident that this approach is correct after discussions with others on my team and have added the patch to the open commitfest. I'm attaching v2 of the patch here with a regression test (that fails on current master, but is green both with my patch and with current master if you remove items from the test array to make the array <= 100 items) as well as a comment detailing the reasoning in predtest.c. On Sat, Nov 10, 2018 at 4:33 PM James Coleman wrote: > > I've recently been investigating improving our plans for queries like: > SELECT * FROM t WHERE t.foo IN (1, 2..1000); > where the table "t" has a partial index on "foo" where "foo IS NOT NULL". > > Currently the planner generates an index [only] scan so long as the number of > items in the IN expression is <= 100, but as soon as you add the 101st item > it reverts to seq scan. If we add the explicit null check like: > SELECT * FROM t WHERE t.foo IN (1, 2..1000) AND foo IS NOT NULL; > then we go back to the desired index scan. > > This occurs because predtest.c stops expanding ScalarArrayOpExpr's with > constant array arguments into OR trees when the array size is > 100. The rest > of the predicate proving code then becomes unable to infer that foo is not > null > and therefore the planner cannot prove that the partial index is correct to > use. > > (Please pardon technical details in the below background that may be way off; > I don't have a lot of experience with the Postgres codebase yet, and am still > trying to build a mental model of things.) > > At first I was imagining having the parse keep track of whether an array const > expr contained any nulls and perhaps adding generated quals (in an equivalence > class?) to allow the planner to easily prove the index was correct. I'd been > going down this track because in my mind the issue was because the planner > needed to verify whether all of the array elements were not null. > > But as I started to dig into the predtest.c NOT NULL proofs and add test > cases, > I realized that at least in many normal op cases we can safely infer that foo > is not null when "foo " is true even if the array contains null > elements. > > This is such a simple change that it seems like I must be missing a case where > the above doesn't hold true, but I can't immediately think of any, and indeed > with the attached patch all existing tests pass (including some additional > ones I added for predtest to play around with it). > > Am I missing something obvious? Is this a valid approach? > > > Other outstanding questions: > > Should I add additional tests for predtest? It already seems to cover some > null > test cases with scalar array ops, but I'd be happy to add more if desired. > > Should I add a test case for the resulting plan with "foo IN (...)" with an > array with more than 100 elements? > > Thanks, > James Coleman saop_is_not_null-v2.patch Description: Binary data
Proving IS NOT NULL inference for ScalarArrayOpExpr's
I've recently been investigating improving our plans for queries like: SELECT * FROM t WHERE t.foo IN (1, 2..1000); where the table "t" has a partial index on "foo" where "foo IS NOT NULL". Currently the planner generates an index [only] scan so long as the number of items in the IN expression is <= 100, but as soon as you add the 101st item it reverts to seq scan. If we add the explicit null check like: SELECT * FROM t WHERE t.foo IN (1, 2..1000) AND foo IS NOT NULL; then we go back to the desired index scan. This occurs because predtest.c stops expanding ScalarArrayOpExpr's with constant array arguments into OR trees when the array size is > 100. The rest of the predicate proving code then becomes unable to infer that foo is not null and therefore the planner cannot prove that the partial index is correct to use. (Please pardon technical details in the below background that may be way off; I don't have a lot of experience with the Postgres codebase yet, and am still trying to build a mental model of things.) At first I was imagining having the parse keep track of whether an array const expr contained any nulls and perhaps adding generated quals (in an equivalence class?) to allow the planner to easily prove the index was correct. I'd been going down this track because in my mind the issue was because the planner needed to verify whether all of the array elements were not null. But as I started to dig into the predtest.c NOT NULL proofs and add test cases, I realized that at least in many normal op cases we can safely infer that foo is not null when "foo " is true even if the array contains null elements. This is such a simple change that it seems like I must be missing a case where the above doesn't hold true, but I can't immediately think of any, and indeed with the attached patch all existing tests pass (including some additional ones I added for predtest to play around with it). Am I missing something obvious? Is this a valid approach? Other outstanding questions: Should I add additional tests for predtest? It already seems to cover some null test cases with scalar array ops, but I'd be happy to add more if desired. Should I add a test case for the resulting plan with "foo IN (...)" with an array with more than 100 elements? Thanks, James Coleman saop_is_not_null-v1.patch Description: Binary data