Thank you for volunteering to review the DERBY-1633 changes, Rick! I welcome and appreciate your comments.

Rick Hillegas (JIRA) wrote:

I lost the thread of your explanation when I got to OBS#2 in DERBY-1633_v2.html. I did not understand this statement: "When a predicate is pushed to a UNION that appears in a Select list, the predicate will be "remapped" so that the column references point to their source result columns." I did not grasp how the problem
was caused by the UNION. Why does a UNION require this mapping?

Great question. I think the answer to this is (intended to be) contained in the line that follows the one you quoted. If we look at the OBS#2, we see the following:

<begin quote>

OBS#2: When a predicate is pushed to a UNION that appears in a Select list, the predicate will be "remapped" so that the column references point to their source result columns. This remapping occurs as part of the "pushOptPredicate()" method in ProjectRestrictNode, which will be called because all items in a Select's FROM list are required to have ProjectRestrictNodes above them (we add them at bind time).

<end quote>

So you were definitely on the right track: it's not the fact that we have a UNION node that necessitates the remapping. Rather, the remapping is required because the UNION node appears in the FROM list of a SELECT node, which means that there is a ProjectRestrictNode (PRN) above it. When the predicate is pushed to the ProjectRestrictNode, it will then be remapped as part of the pushing process.

Instead, it seemed to me that the remapping was required by what
was underneath the UNION.

Technically, as per comments above, the remapping is required by what is immediately *above* the UNION node--namely, the ProjectRestrictNode.

In this case, it is a Permuting Node (that's a term I just made up),
that is, an operator which changes column order.

Again, this is along the right lines. I think if you replace "Permuting Node" with "ProjectRestrictNode", this sentence is then correct. The predicate is being pushed to the ProjectRestrictNode that sits above the UnionNode, and the action of pushing the predicate to the PRN causes it (the predicate) to be remapped.

It seems to me that positional remapping is required for other predicates that we try to push down, even if no UNION is involved.

Yes, correct again. All items in a FROM list have a ProjectRestrictNode above them, so if we push a predicate to one of those PRNs, then the predicate is going to be remapped. This behavior is not specific to UnionNodes, it is specific to ProjectRestrictNodes. For the example in the DERBY-1633 document, it just so happens that the PRN's child is a UnionNode, but the same predicate remapping will happen if the PRN's child is any other kind of ResultSet node, as well.

That said, the PRN will then give its child a chance to push the predicate further down the tree. For most types of children this subsequent push will not happen, so the predicate will remain at the level of the PRN and that's where it will be enforced. For UnionNodes, though, it *is* possible to the push the predicate further down--that's what the work for DERBY-805 does. In order to complete that push, though, the predicates have to be "scoped" to the UnionNode's left and right children. This scoping operation is what is at issue for the regression described in DERBY-1633.

For instance, I think we need positional remapping in the following
cases. And this brings me to the heart of my confusion: Why does predicate
pushdown work in the following cases--or does it? If it does work, how is
it that UNIONs break the logic? Here are some cases that come to mind:

Again, great questions. To begin with, I should mention that for the two queries you show, Derby will actually flatten the subqueries into the top result set, effectively changing them into the following queries:

select *
from t1,
  ( select b as x, a as y from t2) v2
where t1.a = v2.y

effectively becomes

select *
from
  t1,
  t2
where t1.a = t2.a

and

select *
from ( select b as x, a as y from t2) v2
where v2.y = 1

effectively becomes

select *
from
  t2
where t2.a = 1

I assume, though, that you didn't intend for subquery flattening to occur when you asked your question, so let's pretend it doesn't happen. Or better, let's assume there's a "DISTINCT" keyword in the subqueries, which means that Derby will not flatten them. Then the answer to your questions are as follows:

Why does predicate pushdown work in the following cases--or does it?

Yes, predicate pushdown will work in the example queries. As mentioned above, the predicates are pushed to the ProjectRestrictNodes that sit above the FROM tables. So in the first example, the predicate will be pushed to a PRN that sits above a SelectNode that represents "v2". As part of that pushing, the predicate will be remapped. The PRN will then give its child--the SelectNode--a chance to push the predicate further down the tree, but the SelectNode will not do so (the code path to show this is a bit tricky; if you want me to explain it I can, but for now I'm just skipping it for brevity). Thus the predicate will remain at its pushed position in the PRN, and that's where it will be enforced.

The same is true for the second example, with one minor difference: since the predicate is one-sided, it will be pushed to the PRN above v2 as part of pre-processing; i.e. before optimization begins. But other than that minor detail, the behavior will be the same as for the first example: namely, the predicate will be remapped when it is pushed to the PRN and then that's where it will be enforced.

If it does work, how is it that UNIONs break the logic?

The case of UNIONs is unique for two reasons. First, as mentioned above, a UnionNode is one of few types of nodes that will take a predicate and attempt to push it further down the tree. This means that the predicate is no longer going to be enforced by the PRN above the UnionNode; instead, the UnionNode's children--and/or the UnionNode itself--become(s) responsible for enforcing the predicate.

The second difference is that, as part of the process for pushing the predicate further down the tree--i.e. to the UnionNode's children--we have to do the extra work of scoping the predicate to each child. It is this scoping operation--which currently *only* occurs for UnionNodes, per DERBY-805--that makes the case of UNIONs unique.

All of that said, "OBS#2" in the write-up for DERBY-1633 is basically saying that "since the predicate was remapped before we got here," (i.e. as part of the logic to push the predicate to the PRN above the UnionNode) "the column references are already pointing to their source result sets." This fact then influences the behavior of the scoping operation, and that behavior is what is at issue for DERBY-1633.

Hopefully that answers your question--and my apologies if that was too much information. If there are still things that aren't clear from my explanation, please feel free to ask more. I'll try to answer as best as I can...

Thanks for taking the time to review the document. I know it's a lot of detail, so I definitely appreciate the time of those who are willing to read it over and provide feedback...

Army

Reply via email to