[jira] [Created] (CALCITE-6363) Introduce a rule to derive more filters from inner join condition

2024-04-13 Thread ruanhui (Jira)
ruanhui created CALCITE-6363:


 Summary: Introduce a rule to derive more filters from inner join 
condition
 Key: CALCITE-6363
 URL: https://issues.apache.org/jira/browse/CALCITE-6363
 Project: Calcite
  Issue Type: New Feature
  Components: core
Reporter: ruanhui


Sometimes we can infer more predicates from inner Join , for example, in the 
query
SELECT * FROM ta INNER JOIN tb ON ta.x = tb.y WHERE ta.x > 10
we can infer condition tb.y > 10 and we can push it down to the table tb.
In this way, it is possible to reduce the amount of data involved in the Join.

To achieve this, here is my idea.
The core data strucature is two Multimap:
predicateMap : a map for inputRef to corresponding predicate such as: $1 -> [$1 
> 10, $1 < 20, $1 = $2]
equivalenceMap : a map for inputRef to corresponding equivalent values or 
inputRefs such as: $1 -> [$2, 1]

The filter derivation is divided into 4 steps:
1. construct predicate map and equivalence map by traversing all conjunctions 
in the condition
2. search map and rewrite predicates with equivalent inputRefs or literals
2.1 find all inputRefs that are equivalent to the current inputRef, and then 
rewrite all predicates involving equivalent inputRefs using inputRef, for 
example if we have inputRef $1 = equivInputRef $2, then we can rewrite \{$2 = 
10} to \{$1 = 10}.
2.2 find all predicates involving current inputRef. If any predicate refers to 
another inputRef, rewrite the predicate with the literal/constant equivalent to 
that inputRef, such as: if we have inputRef \{$1 > $2} and \{$2 = 10} then we 
can infer new condition \{$1 > 10}.
2.3 derive new predicates based on equivalence relation in equivalenceMultimap
3. compose all original predicates and derived predicates
4. simplify expression such as range merging, like \{$1 > 10 AND $1 > 20} => 
\{$1 > 20}, \{$1 > $2 AND $1 > $2} => \{$1 > $2}

Anyone interested in this, please feel free to comment on this issue.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [jira] [Created] (CALCITE-6363) Introduce a rule to derive more filters from inner join condition

2024-04-15 Thread James Starr
The keyword I think you want is transitive filter pushdown.  The reduce
expression rule handles some of the trivial cases outlined as examples.
Also, you will need to simplify the pushed down filters after they are
extracted to prevent infinite loops.

Ideally, for the equivalenceMap, an arbitrary subtree that only
references a single side of the join could be used.

Example 1:
SELECT *
FROM t1, t2
WHERE subString(t1.zip, 0, 6) = subString(t2.zip, 0, 6)
  AND subString(t1.zip, 0, 6) IN ()

On Sat, Apr 13, 2024 at 1:24 AM ruanhui (Jira)  wrote:

> ruanhui created CALCITE-6363:
> 
>
>  Summary: Introduce a rule to derive more filters from inner
> join condition
>  Key: CALCITE-6363
>  URL: https://issues.apache.org/jira/browse/CALCITE-6363
>  Project: Calcite
>   Issue Type: New Feature
>   Components: core
> Reporter: ruanhui
>
>
> Sometimes we can infer more predicates from inner Join , for example, in
> the query
> SELECT * FROM ta INNER JOIN tb ON ta.x = tb.y WHERE ta.x > 10
> we can infer condition tb.y > 10 and we can push it down to the table tb.
> In this way, it is possible to reduce the amount of data involved in the
> Join.
>
> To achieve this, here is my idea.
> The core data strucature is two Multimap:
> predicateMap : a map for inputRef to corresponding predicate such as: $1
> -> [$1 > 10, $1 < 20, $1 = $2]
> equivalenceMap : a map for inputRef to corresponding equivalent values or
> inputRefs such as: $1 -> [$2, 1]
>
> The filter derivation is divided into 4 steps:
> 1. construct predicate map and equivalence map by traversing all
> conjunctions in the condition
> 2. search map and rewrite predicates with equivalent inputRefs or literals
> 2.1 find all inputRefs that are equivalent to the current inputRef, and
> then rewrite all predicates involving equivalent inputRefs using inputRef,
> for example if we have inputRef $1 = equivInputRef $2, then we can rewrite
> \{$2 = 10} to \{$1 = 10}.
> 2.2 find all predicates involving current inputRef. If any predicate
> refers to another inputRef, rewrite the predicate with the literal/constant
> equivalent to that inputRef, such as: if we have inputRef \{$1 > $2} and
> \{$2 = 10} then we can infer new condition \{$1 > 10}.
> 2.3 derive new predicates based on equivalence relation in
> equivalenceMultimap
> 3. compose all original predicates and derived predicates
> 4. simplify expression such as range merging, like \{$1 > 10 AND $1 > 20}
> => \{$1 > 20}, \{$1 > $2 AND $1 > $2} => \{$1 > $2}
>
> Anyone interested in this, please feel free to comment on this issue.
>
>
>
> --
> This message was sent by Atlassian Jira
> (v8.20.10#820010)
>


Re: [jira] [Created] (CALCITE-6363) Introduce a rule to derive more filters from inner join condition

2024-04-15 Thread Julian Hyde
James,

Thanks for chiming in. I added your comments to the jira case.

Julian


> On Apr 15, 2024, at 12:58 PM, James Starr  wrote:
> 
> The keyword I think you want is transitive filter pushdown.  The reduce
> expression rule handles some of the trivial cases outlined as examples.
> Also, you will need to simplify the pushed down filters after they are
> extracted to prevent infinite loops.
> 
> Ideally, for the equivalenceMap, an arbitrary subtree that only
> references a single side of the join could be used.
> 
> Example 1:
> SELECT *
> FROM t1, t2
> WHERE subString(t1.zip, 0, 6) = subString(t2.zip, 0, 6)
>  AND subString(t1.zip, 0, 6) IN ()
> 
> On Sat, Apr 13, 2024 at 1:24 AM ruanhui (Jira)  wrote:
> 
>> ruanhui created CALCITE-6363:
>> 
>> 
>> Summary: Introduce a rule to derive more filters from inner
>> join condition
>> Key: CALCITE-6363
>> URL: https://issues.apache.org/jira/browse/CALCITE-6363
>> Project: Calcite
>>  Issue Type: New Feature
>>  Components: core
>>Reporter: ruanhui
>> 
>> 
>> Sometimes we can infer more predicates from inner Join , for example, in
>> the query
>> SELECT * FROM ta INNER JOIN tb ON ta.x = tb.y WHERE ta.x > 10
>> we can infer condition tb.y > 10 and we can push it down to the table tb.
>> In this way, it is possible to reduce the amount of data involved in the
>> Join.
>> 
>> To achieve this, here is my idea.
>> The core data strucature is two Multimap:
>> predicateMap : a map for inputRef to corresponding predicate such as: $1
>> -> [$1 > 10, $1 < 20, $1 = $2]
>> equivalenceMap : a map for inputRef to corresponding equivalent values or
>> inputRefs such as: $1 -> [$2, 1]
>> 
>> The filter derivation is divided into 4 steps:
>> 1. construct predicate map and equivalence map by traversing all
>> conjunctions in the condition
>> 2. search map and rewrite predicates with equivalent inputRefs or literals
>> 2.1 find all inputRefs that are equivalent to the current inputRef, and
>> then rewrite all predicates involving equivalent inputRefs using inputRef,
>> for example if we have inputRef $1 = equivInputRef $2, then we can rewrite
>> \{$2 = 10} to \{$1 = 10}.
>> 2.2 find all predicates involving current inputRef. If any predicate
>> refers to another inputRef, rewrite the predicate with the literal/constant
>> equivalent to that inputRef, such as: if we have inputRef \{$1 > $2} and
>> \{$2 = 10} then we can infer new condition \{$1 > 10}.
>> 2.3 derive new predicates based on equivalence relation in
>> equivalenceMultimap
>> 3. compose all original predicates and derived predicates
>> 4. simplify expression such as range merging, like \{$1 > 10 AND $1 > 20}
>> => \{$1 > 20}, \{$1 > $2 AND $1 > $2} => \{$1 > $2}
>> 
>> Anyone interested in this, please feel free to comment on this issue.
>> 
>> 
>> 
>> --
>> This message was sent by Atlassian Jira
>> (v8.20.10#820010)
>>