Re: New rule for Converting UNION ALL with same inputs but different filters to single input with OR FILTER

2022-01-19 Thread Yanjing Wang
A simple example is converting *SELECT a, b FROM t WHERE c = 1 UNION
ALL SELECT a, b FROM t WHERE c = 2 *to *SELECT a, b FROM t WHERE c in (1,
2)*

Yanjing Wang  于2022年1月19日周三 18:35写道:

> Hi, community
>
> Here I recommend a new rule for converting UNION ALL sub plan to a single
> input with an OR filter, the following is its conversion diagram.
> [image: UnionAllToOrRule.jpg]
>
> The conversion prerequisites are
> 1. left filter range has no intersection with right filter range.
> 2. Project and Input Sub Tree must be identical.
>
> The rule will be used when Input Sub Tree is a computing-intensive or
> large IO operation.
>
> I don't know whether the community supports it or not, any suggestions
> will be appreciated.
>
>
>


Re: New rule for Converting UNION ALL with same inputs but different filters to single input with OR FILTER

2022-01-19 Thread Julian Hyde
It sounds useful.

What do you think the rule should be called? UnionFilterTransposeRule, perhaps?

A challenge when writing the rule will be to ensure that all of the inputs to 
the Union are the same. The Volcano framework is not very good at that.

You should be careful of the case that the conditions overlap. For example, the 
rewrite

  SELECT * FROM Emp WHERE deptno < 30
  UNION ALL
  SELECT * FROM Emp WHERE deptno IN (25, 35, 45)

to

  SELECT * FROM Emp WHERE deptno < 30 OR deptno IN (25, 35, 45) 

Is not valid, because rows with deptno = 25 will appear twice in the first 
query, once in the second. Maybe that problem does not occur when applied to 
UNION than when applied to UNION ALL.

There would seem to be analogous rules for INTERSECT (combine the conditions 
using AND) and EXCEPT (combine the conditions using AND NOT). Perhaps one rule 
could cover all set operations (see FilterSetOpTransposeRule).

Julian



> On Jan 19, 2022, at 2:38 AM, Yanjing Wang  wrote:
> 
> A simple example is converting SELECT a, b FROM t WHERE c = 1 UNION ALL 
> SELECT a, b FROM t WHERE c = 2 to SELECT a, b FROM t WHERE c in (1, 2)
> 
> Yanjing Wang mailto:zhuangzixiao...@gmail.com>> 
> 于2022年1月19日周三 18:35写道:
> Hi, community
> 
> Here I recommend a new rule for converting UNION ALL sub plan to a single 
> input with an OR filter, the following is its conversion diagram.
> 
>  
> The conversion prerequisites are 
> 1. left filter range has no intersection with right filter range.
> 2. Project and Input Sub Tree must be identical.
> 
> The rule will be used when Input Sub Tree is a computing-intensive or large 
> IO operation.
> 
> I don't know whether the community supports it or not, any suggestions will 
> be appreciated.
> 
> 



Re: New rule for Converting UNION ALL with same inputs but different filters to single input with OR FILTER

2022-01-19 Thread Julian Hyde
Can you log a Jira case for this?

I think you should make your rule work for N-way Union, not just 2-way Union. 
And I think you should make it work whether or not a Project is present.

> On Jan 19, 2022, at 1:25 PM, Julian Hyde  wrote:
> 
> It sounds useful.
> 
> What do you think the rule should be called? UnionFilterTransposeRule, 
> perhaps?
> 
> A challenge when writing the rule will be to ensure that all of the inputs to 
> the Union are the same. The Volcano framework is not very good at that.
> 
> You should be careful of the case that the conditions overlap. For example, 
> the rewrite
> 
>   SELECT * FROM Emp WHERE deptno < 30
>   UNION ALL
>   SELECT * FROM Emp WHERE deptno IN (25, 35, 45)
> 
> to
> 
>   SELECT * FROM Emp WHERE deptno < 30 OR deptno IN (25, 35, 45) 
> 
> Is not valid, because rows with deptno = 25 will appear twice in the first 
> query, once in the second. Maybe that problem does not occur when applied to 
> UNION than when applied to UNION ALL.
> 
> There would seem to be analogous rules for INTERSECT (combine the conditions 
> using AND) and EXCEPT (combine the conditions using AND NOT). Perhaps one 
> rule could cover all set operations (see FilterSetOpTransposeRule).
> 
> Julian
> 
> 
> 
>> On Jan 19, 2022, at 2:38 AM, Yanjing Wang > > wrote:
>> 
>> A simple example is converting SELECT a, b FROM t WHERE c = 1 UNION ALL 
>> SELECT a, b FROM t WHERE c = 2 to SELECT a, b FROM t WHERE c in (1, 2)
>> 
>> Yanjing Wang mailto:zhuangzixiao...@gmail.com>> 
>> 于2022年1月19日周三 18:35写道:
>> Hi, community
>> 
>> Here I recommend a new rule for converting UNION ALL sub plan to a single 
>> input with an OR filter, the following is its conversion diagram.
>> 
>>  
>> The conversion prerequisites are 
>> 1. left filter range has no intersection with right filter range.
>> 2. Project and Input Sub Tree must be identical.
>> 
>> The rule will be used when Input Sub Tree is a computing-intensive or large 
>> IO operation.
>> 
>> I don't know whether the community supports it or not, any suggestions will 
>> be appreciated.
>> 
>> 
> 



Re: New rule for Converting UNION ALL with same inputs but different filters to single input with OR FILTER

2022-01-19 Thread Justin Swanhart
Hi,

Note that this will negate the optimization that one usually is looking for
when writing such queries:

Select * from TAB where a = 1
UNION ALL
Select * from TAB where b = 1

In a database with indexes (most databases) this will allow indexes to be
used on both the a column and the b column.
Databases with bitmap indexes or without indexes would benefit from the
rule.

On Wed, Jan 19, 2022 at 4:32 PM Julian Hyde  wrote:

> Can you log a Jira case for this?
>
> I think you should make your rule work for N-way Union, not just 2-way
> Union. And I think you should make it work whether or not a Project is
> present.
>
> > On Jan 19, 2022, at 1:25 PM, Julian Hyde  wrote:
> >
> > It sounds useful.
> >
> > What do you think the rule should be called? UnionFilterTransposeRule,
> perhaps?
> >
> > A challenge when writing the rule will be to ensure that all of the
> inputs to the Union are the same. The Volcano framework is not very good at
> that.
> >
> > You should be careful of the case that the conditions overlap. For
> example, the rewrite
> >
> >   SELECT * FROM Emp WHERE deptno < 30
> >   UNION ALL
> >   SELECT * FROM Emp WHERE deptno IN (25, 35, 45)
> >
> > to
> >
> >   SELECT * FROM Emp WHERE deptno < 30 OR deptno IN (25, 35, 45)
> >
> > Is not valid, because rows with deptno = 25 will appear twice in the
> first query, once in the second. Maybe that problem does not occur when
> applied to UNION than when applied to UNION ALL.
> >
> > There would seem to be analogous rules for INTERSECT (combine the
> conditions using AND) and EXCEPT (combine the conditions using AND NOT).
> Perhaps one rule could cover all set operations (see
> FilterSetOpTransposeRule).
> >
> > Julian
> >
> >
> >
> >> On Jan 19, 2022, at 2:38 AM, Yanjing Wang  > wrote:
> >>
> >> A simple example is converting SELECT a, b FROM t WHERE c = 1 UNION ALL
> SELECT a, b FROM t WHERE c = 2 to SELECT a, b FROM t WHERE c in (1, 2)
> >>
> >> Yanjing Wang  zhuangzixiao...@gmail.com>> 于2022年1月19日周三 18:35写道:
> >> Hi, community
> >>
> >> Here I recommend a new rule for converting UNION ALL sub plan to a
> single input with an OR filter, the following is its conversion diagram.
> >>
> >>
> >> The conversion prerequisites are
> >> 1. left filter range has no intersection with right filter range.
> >> 2. Project and Input Sub Tree must be identical.
> >>
> >> The rule will be used when Input Sub Tree is a computing-intensive or
> large IO operation.
> >>
> >> I don't know whether the community supports it or not, any suggestions
> will be appreciated.
> >>
> >>
> >
>
>


Re: New rule for Converting UNION ALL with same inputs but different filters to single input with OR FILTER

2022-01-19 Thread Julian Hyde
Justin,

For planning table or index scans, I would recommend using a single TableScan 
with a Filter that uses a Sarg, rather than using multiple TableScans connected 
by a Union. So I think this rule will be useful.

But I do agree that this proposed rule is not a “no brainer”. It may not do 
what people want/expect in all cases, and therefore it probably should not be 
enabled it by default.

Julian





> On Jan 19, 2022, at 3:38 PM, Justin Swanhart  wrote:
> 
> Hi,
> 
> Note that this will negate the optimization that one usually is looking for
> when writing such queries:
> 
> Select * from TAB where a = 1
> UNION ALL
> Select * from TAB where b = 1
> 
> In a database with indexes (most databases) this will allow indexes to be
> used on both the a column and the b column.
> Databases with bitmap indexes or without indexes would benefit from the
> rule.
> 
> On Wed, Jan 19, 2022 at 4:32 PM Julian Hyde  > wrote:
> 
>> Can you log a Jira case for this?
>> 
>> I think you should make your rule work for N-way Union, not just 2-way
>> Union. And I think you should make it work whether or not a Project is
>> present.
>> 
>>> On Jan 19, 2022, at 1:25 PM, Julian Hyde  wrote:
>>> 
>>> It sounds useful.
>>> 
>>> What do you think the rule should be called? UnionFilterTransposeRule,
>> perhaps?
>>> 
>>> A challenge when writing the rule will be to ensure that all of the
>> inputs to the Union are the same. The Volcano framework is not very good at
>> that.
>>> 
>>> You should be careful of the case that the conditions overlap. For
>> example, the rewrite
>>> 
>>>  SELECT * FROM Emp WHERE deptno < 30
>>>  UNION ALL
>>>  SELECT * FROM Emp WHERE deptno IN (25, 35, 45)
>>> 
>>> to
>>> 
>>>  SELECT * FROM Emp WHERE deptno < 30 OR deptno IN (25, 35, 45)
>>> 
>>> Is not valid, because rows with deptno = 25 will appear twice in the
>> first query, once in the second. Maybe that problem does not occur when
>> applied to UNION than when applied to UNION ALL.
>>> 
>>> There would seem to be analogous rules for INTERSECT (combine the
>> conditions using AND) and EXCEPT (combine the conditions using AND NOT).
>> Perhaps one rule could cover all set operations (see
>> FilterSetOpTransposeRule).
>>> 
>>> Julian
>>> 
>>> 
>>> 
 On Jan 19, 2022, at 2:38 AM, Yanjing Wang > >> wrote:
 
 A simple example is converting SELECT a, b FROM t WHERE c = 1 UNION ALL
>> SELECT a, b FROM t WHERE c = 2 to SELECT a, b FROM t WHERE c in (1, 2)
 
 Yanjing Wang mailto:zhuangzixiao...@gmail.com> 
 > zhuangzixiao...@gmail.com >> 于2022年1月19日周三 
>> 18:35写道:
 Hi, community
 
 Here I recommend a new rule for converting UNION ALL sub plan to a
>> single input with an OR filter, the following is its conversion diagram.
 
 
 The conversion prerequisites are
 1. left filter range has no intersection with right filter range.
 2. Project and Input Sub Tree must be identical.
 
 The rule will be used when Input Sub Tree is a computing-intensive or
>> large IO operation.
 
 I don't know whether the community supports it or not, any suggestions
>> will be appreciated.



Re: New rule for Converting UNION ALL with same inputs but different filters to single input with OR FILTER

2022-01-20 Thread Yanjing Wang
Thanks Julian and Justin.

What do you think the rule should be called? UnionFilterTransposeRule,
> perhaps?
>
SetOpFilterMergeRule?

 Maybe that problem does not occur when applied to UNION than when applied
> to UNION ALL.

Yes, This is very important.

There would seem to be analogous rules for INTERSECT (combine the
> conditions using AND) and EXCEPT (combine the conditions using AND NOT).
>
Excellent extensions, all the three operators process logic are:

For UNION:
New Filter = left Filter *OR* right Filter.

For INTERSECT:
New Filter = left Filter *AND* right Filter.

For EXCEPT:
If left Filter, right Filter have no overlap, transform to left child tree,
Otherwise
New Filter = left Filter *AND NOT *right Filter.

For UNION ALL:
Add prerequisites:
left Filter, right Filter must have no overlap.

For INTERSECT ALL:
Add prerequisites:
If left Filter, right Filter have no overlap, transform to empty values.
Otherwise
the rule can't be applied.

For EXCEPT ALL:
same for  EXCEPT.


work for N-way Union, not just 2-way Union.
>
Yes, I will add tests for this.

 And I think you should make it work whether or not a Project is present.

Ok, It seems I need construct several different operand match trees for
plan.

Hi Julian, what do you think of the above process logic?

Julian Hyde  于2022年1月20日周四 10:18写道:

> Justin,
>
> For planning table or index scans, I would recommend using a single
> TableScan with a Filter that uses a Sarg, rather than using multiple
> TableScans connected by a Union. So I think this rule will be useful.
>
> But I do agree that this proposed rule is not a “no brainer”. It may not
> do what people want/expect in all cases, and therefore it probably should
> not be enabled it by default.
>
> Julian
>
>
>
>
>
> > On Jan 19, 2022, at 3:38 PM, Justin Swanhart 
> wrote:
> >
> > Hi,
> >
> > Note that this will negate the optimization that one usually is looking
> for
> > when writing such queries:
> >
> > Select * from TAB where a = 1
> > UNION ALL
> > Select * from TAB where b = 1
> >
> > In a database with indexes (most databases) this will allow indexes to be
> > used on both the a column and the b column.
> > Databases with bitmap indexes or without indexes would benefit from the
> > rule.
> >
> > On Wed, Jan 19, 2022 at 4:32 PM Julian Hyde  > wrote:
> >
> >> Can you log a Jira case for this?
> >>
> >> I think you should make your rule work for N-way Union, not just 2-way
> >> Union. And I think you should make it work whether or not a Project is
> >> present.
> >>
> >>> On Jan 19, 2022, at 1:25 PM, Julian Hyde 
> wrote:
> >>>
> >>> It sounds useful.
> >>>
> >>> What do you think the rule should be called? UnionFilterTransposeRule,
> >> perhaps?
> >>>
> >>> A challenge when writing the rule will be to ensure that all of the
> >> inputs to the Union are the same. The Volcano framework is not very
> good at
> >> that.
> >>>
> >>> You should be careful of the case that the conditions overlap. For
> >> example, the rewrite
> >>>
> >>>  SELECT * FROM Emp WHERE deptno < 30
> >>>  UNION ALL
> >>>  SELECT * FROM Emp WHERE deptno IN (25, 35, 45)
> >>>
> >>> to
> >>>
> >>>  SELECT * FROM Emp WHERE deptno < 30 OR deptno IN (25, 35, 45)
> >>>
> >>> Is not valid, because rows with deptno = 25 will appear twice in the
> >> first query, once in the second. Maybe that problem does not occur when
> >> applied to UNION than when applied to UNION ALL.
> >>>
> >>> There would seem to be analogous rules for INTERSECT (combine the
> >> conditions using AND) and EXCEPT (combine the conditions using AND NOT).
> >> Perhaps one rule could cover all set operations (see
> >> FilterSetOpTransposeRule).
> >>>
> >>> Julian
> >>>
> >>>
> >>>
>  On Jan 19, 2022, at 2:38 AM, Yanjing Wang  >> >>
> wrote:
> 
>  A simple example is converting SELECT a, b FROM t WHERE c = 1 UNION
> ALL
> >> SELECT a, b FROM t WHERE c = 2 to SELECT a, b FROM t WHERE c in (1, 2)
> 
>  Yanjing Wang  zhuangzixiao...@gmail.com>  >> zhuangzixiao...@gmail.com >>
> 于2022年1月19日周三 18:35写道:
>  Hi, community
> 
>  Here I recommend a new rule for converting UNION ALL sub plan to a
> >> single input with an OR filter, the following is its conversion diagram.
> 
> 
>  The conversion prerequisites are
>  1. left filter range has no intersection with right filter range.
>  2. Project and Input Sub Tree must be identical.
> 
>  The rule will be used when Input Sub Tree is a computing-intensive or
> >> large IO operation.
> 
>  I don't know whether the community supports it or not, any suggestions
> >> will be appreciated.
>
>


Re: New rule for Converting UNION ALL with same inputs but different filters to single input with OR FILTER

2022-01-20 Thread Julian Hyde
Process logic sounds great. Please log a Jira case.

There’s a lot here, so it’s probably wise to split into a simple PR that just 
does the simple case (UNION ALL, no Project) and extend later.

> On Jan 20, 2022, at 7:07 AM, Yanjing Wang  wrote:
> 
> Thanks Julian and Justin.
> 
> What do you think the rule should be called? UnionFilterTransposeRule,
>> perhaps?
>> 
> SetOpFilterMergeRule?
> 
> Maybe that problem does not occur when applied to UNION than when applied
>> to UNION ALL.
> 
> Yes, This is very important.
> 
> There would seem to be analogous rules for INTERSECT (combine the
>> conditions using AND) and EXCEPT (combine the conditions using AND NOT).
>> 
> Excellent extensions, all the three operators process logic are:
> 
> For UNION:
> New Filter = left Filter *OR* right Filter.
> 
> For INTERSECT:
> New Filter = left Filter *AND* right Filter.
> 
> For EXCEPT:
> If left Filter, right Filter have no overlap, transform to left child tree,
> Otherwise
> New Filter = left Filter *AND NOT *right Filter.
> 
> For UNION ALL:
> Add prerequisites:
> left Filter, right Filter must have no overlap.
> 
> For INTERSECT ALL:
> Add prerequisites:
> If left Filter, right Filter have no overlap, transform to empty values.
> Otherwise
> the rule can't be applied.
> 
> For EXCEPT ALL:
> same for  EXCEPT.
> 
> 
> work for N-way Union, not just 2-way Union.
>> 
> Yes, I will add tests for this.
> 
> And I think you should make it work whether or not a Project is present.
> 
> Ok, It seems I need construct several different operand match trees for
> plan.
> 
> Hi Julian, what do you think of the above process logic?
> 
> Julian Hyde mailto:jhyde.apa...@gmail.com>> 
> 于2022年1月20日周四 10:18写道:
> 
>> Justin,
>> 
>> For planning table or index scans, I would recommend using a single
>> TableScan with a Filter that uses a Sarg, rather than using multiple
>> TableScans connected by a Union. So I think this rule will be useful.
>> 
>> But I do agree that this proposed rule is not a “no brainer”. It may not
>> do what people want/expect in all cases, and therefore it probably should
>> not be enabled it by default.
>> 
>> Julian
>> 
>> 
>> 
>> 
>> 
>>> On Jan 19, 2022, at 3:38 PM, Justin Swanhart 
>> wrote:
>>> 
>>> Hi,
>>> 
>>> Note that this will negate the optimization that one usually is looking
>> for
>>> when writing such queries:
>>> 
>>> Select * from TAB where a = 1
>>> UNION ALL
>>> Select * from TAB where b = 1
>>> 
>>> In a database with indexes (most databases) this will allow indexes to be
>>> used on both the a column and the b column.
>>> Databases with bitmap indexes or without indexes would benefit from the
>>> rule.
>>> 
>>> On Wed, Jan 19, 2022 at 4:32 PM Julian Hyde > >> wrote:
>>> 
 Can you log a Jira case for this?
 
 I think you should make your rule work for N-way Union, not just 2-way
 Union. And I think you should make it work whether or not a Project is
 present.
 
> On Jan 19, 2022, at 1:25 PM, Julian Hyde  >
>> wrote:
> 
> It sounds useful.
> 
> What do you think the rule should be called? UnionFilterTransposeRule,
 perhaps?
> 
> A challenge when writing the rule will be to ensure that all of the
 inputs to the Union are the same. The Volcano framework is not very
>> good at
 that.
> 
> You should be careful of the case that the conditions overlap. For
 example, the rewrite
> 
> SELECT * FROM Emp WHERE deptno < 30
> UNION ALL
> SELECT * FROM Emp WHERE deptno IN (25, 35, 45)
> 
> to
> 
> SELECT * FROM Emp WHERE deptno < 30 OR deptno IN (25, 35, 45)
> 
> Is not valid, because rows with deptno = 25 will appear twice in the
 first query, once in the second. Maybe that problem does not occur when
 applied to UNION than when applied to UNION ALL.
> 
> There would seem to be analogous rules for INTERSECT (combine the
 conditions using AND) and EXCEPT (combine the conditions using AND NOT).
 Perhaps one rule could cover all set operations (see
 FilterSetOpTransposeRule).
> 
> Julian
> 
> 
> 
>> On Jan 19, 2022, at 2:38 AM, Yanjing Wang > 
  
 > wrote:
>> 
>> A simple example is converting SELECT a, b FROM t WHERE c = 1 UNION
>> ALL
 SELECT a, b FROM t WHERE c = 2 to SELECT a, b FROM t WHERE c in (1, 2)
>> 
>> Yanjing Wang >  > zhuangzixiao...@gmail.com > >>> zhuangzixiao...@gmail.com >>
>> 于2022年1月19日周三 18:35写道:
>> Hi, community
>> 
>> Here I recommend a new rule for converting UNION ALL sub plan to a
 single input wi

Re: New rule for Converting UNION ALL with same inputs but different filters to single input with OR FILTER

2022-01-20 Thread Yanjing Wang
Hi Julian, I have logged a jira CALCITE-4990
 for this new rule,
later I will submit a pr for this.

Julian Hyde  于2022年1月21日周五 01:37写道:

> Process logic sounds great. Please log a Jira case.
>
> There’s a lot here, so it’s probably wise to split into a simple PR that
> just does the simple case (UNION ALL, no Project) and extend later.
>
> > On Jan 20, 2022, at 7:07 AM, Yanjing Wang 
> wrote:
> >
> > Thanks Julian and Justin.
> >
> > What do you think the rule should be called? UnionFilterTransposeRule,
> >> perhaps?
> >>
> > SetOpFilterMergeRule?
> >
> > Maybe that problem does not occur when applied to UNION than when applied
> >> to UNION ALL.
> >
> > Yes, This is very important.
> >
> > There would seem to be analogous rules for INTERSECT (combine the
> >> conditions using AND) and EXCEPT (combine the conditions using AND NOT).
> >>
> > Excellent extensions, all the three operators process logic are:
> >
> > For UNION:
> > New Filter = left Filter *OR* right Filter.
> >
> > For INTERSECT:
> > New Filter = left Filter *AND* right Filter.
> >
> > For EXCEPT:
> > If left Filter, right Filter have no overlap, transform to left child
> tree,
> > Otherwise
> > New Filter = left Filter *AND NOT *right Filter.
> >
> > For UNION ALL:
> > Add prerequisites:
> > left Filter, right Filter must have no overlap.
> >
> > For INTERSECT ALL:
> > Add prerequisites:
> > If left Filter, right Filter have no overlap, transform to empty values.
> > Otherwise
> > the rule can't be applied.
> >
> > For EXCEPT ALL:
> > same for  EXCEPT.
> >
> >
> > work for N-way Union, not just 2-way Union.
> >>
> > Yes, I will add tests for this.
> >
> > And I think you should make it work whether or not a Project is present.
> >
> > Ok, It seems I need construct several different operand match trees for
> > plan.
> >
> > Hi Julian, what do you think of the above process logic?
> >
> > Julian Hyde mailto:jhyde.apa...@gmail.com>>
> 于2022年1月20日周四 10:18写道:
> >
> >> Justin,
> >>
> >> For planning table or index scans, I would recommend using a single
> >> TableScan with a Filter that uses a Sarg, rather than using multiple
> >> TableScans connected by a Union. So I think this rule will be useful.
> >>
> >> But I do agree that this proposed rule is not a “no brainer”. It may not
> >> do what people want/expect in all cases, and therefore it probably
> should
> >> not be enabled it by default.
> >>
> >> Julian
> >>
> >>
> >>
> >>
> >>
> >>> On Jan 19, 2022, at 3:38 PM, Justin Swanhart 
> >> wrote:
> >>>
> >>> Hi,
> >>>
> >>> Note that this will negate the optimization that one usually is looking
> >> for
> >>> when writing such queries:
> >>>
> >>> Select * from TAB where a = 1
> >>> UNION ALL
> >>> Select * from TAB where b = 1
> >>>
> >>> In a database with indexes (most databases) this will allow indexes to
> be
> >>> used on both the a column and the b column.
> >>> Databases with bitmap indexes or without indexes would benefit from the
> >>> rule.
> >>>
> >>> On Wed, Jan 19, 2022 at 4:32 PM Julian Hyde  >> >> wrote:
> >>>
>  Can you log a Jira case for this?
> 
>  I think you should make your rule work for N-way Union, not just 2-way
>  Union. And I think you should make it work whether or not a Project is
>  present.
> 
> > On Jan 19, 2022, at 1:25 PM, Julian Hyde  >
> >> wrote:
> >
> > It sounds useful.
> >
> > What do you think the rule should be called?
> UnionFilterTransposeRule,
>  perhaps?
> >
> > A challenge when writing the rule will be to ensure that all of the
>  inputs to the Union are the same. The Volcano framework is not very
> >> good at
>  that.
> >
> > You should be careful of the case that the conditions overlap. For
>  example, the rewrite
> >
> > SELECT * FROM Emp WHERE deptno < 30
> > UNION ALL
> > SELECT * FROM Emp WHERE deptno IN (25, 35, 45)
> >
> > to
> >
> > SELECT * FROM Emp WHERE deptno < 30 OR deptno IN (25, 35, 45)
> >
> > Is not valid, because rows with deptno = 25 will appear twice in the
>  first query, once in the second. Maybe that problem does not occur
> when
>  applied to UNION than when applied to UNION ALL.
> >
> > There would seem to be analogous rules for INTERSECT (combine the
>  conditions using AND) and EXCEPT (combine the conditions using AND
> NOT).
>  Perhaps one rule could cover all set operations (see
>  FilterSetOpTransposeRule).
> >
> > Julian
> >
> >
> >
> >> On Jan 19, 2022, at 2:38 AM, Yanjing Wang <
> zhuangzixiao...@gmail.com 
>  
>  >> wrote:
> >>
> >> A simple example is converting SELECT a, 

Re: New rule for Converting UNION ALL with same inputs but different filters to single input with OR FILTER

2022-01-22 Thread Albert
On a more general fusion case, you could probably refer to osdi paper
"resin general sub query fusion" which has chapters talking about general
union fusion that deals with the case Julian pointed out.

Btw, per my understanding cases of union fusion is not that many in
practice. Curious to know the real cases

在 2022年1月21日星期五,Yanjing Wang  写道:

> Hi Julian, I have logged a jira CALCITE-4990
>  for this new rule,
> later I will submit a pr for this.
>
> Julian Hyde  于2022年1月21日周五 01:37写道:
>
> > Process logic sounds great. Please log a Jira case.
> >
> > There’s a lot here, so it’s probably wise to split into a simple PR that
> > just does the simple case (UNION ALL, no Project) and extend later.
> >
> > > On Jan 20, 2022, at 7:07 AM, Yanjing Wang 
> > wrote:
> > >
> > > Thanks Julian and Justin.
> > >
> > > What do you think the rule should be called? UnionFilterTransposeRule,
> > >> perhaps?
> > >>
> > > SetOpFilterMergeRule?
> > >
> > > Maybe that problem does not occur when applied to UNION than when
> applied
> > >> to UNION ALL.
> > >
> > > Yes, This is very important.
> > >
> > > There would seem to be analogous rules for INTERSECT (combine the
> > >> conditions using AND) and EXCEPT (combine the conditions using AND
> NOT).
> > >>
> > > Excellent extensions, all the three operators process logic are:
> > >
> > > For UNION:
> > > New Filter = left Filter *OR* right Filter.
> > >
> > > For INTERSECT:
> > > New Filter = left Filter *AND* right Filter.
> > >
> > > For EXCEPT:
> > > If left Filter, right Filter have no overlap, transform to left child
> > tree,
> > > Otherwise
> > > New Filter = left Filter *AND NOT *right Filter.
> > >
> > > For UNION ALL:
> > > Add prerequisites:
> > > left Filter, right Filter must have no overlap.
> > >
> > > For INTERSECT ALL:
> > > Add prerequisites:
> > > If left Filter, right Filter have no overlap, transform to empty
> values.
> > > Otherwise
> > > the rule can't be applied.
> > >
> > > For EXCEPT ALL:
> > > same for  EXCEPT.
> > >
> > >
> > > work for N-way Union, not just 2-way Union.
> > >>
> > > Yes, I will add tests for this.
> > >
> > > And I think you should make it work whether or not a Project is
> present.
> > >
> > > Ok, It seems I need construct several different operand match trees for
> > > plan.
> > >
> > > Hi Julian, what do you think of the above process logic?
> > >
> > > Julian Hyde mailto:jhyde.apa...@gmail.com>>
> > 于2022年1月20日周四 10:18写道:
> > >
> > >> Justin,
> > >>
> > >> For planning table or index scans, I would recommend using a single
> > >> TableScan with a Filter that uses a Sarg, rather than using multiple
> > >> TableScans connected by a Union. So I think this rule will be useful.
> > >>
> > >> But I do agree that this proposed rule is not a “no brainer”. It may
> not
> > >> do what people want/expect in all cases, and therefore it probably
> > should
> > >> not be enabled it by default.
> > >>
> > >> Julian
> > >>
> > >>
> > >>
> > >>
> > >>
> > >>> On Jan 19, 2022, at 3:38 PM, Justin Swanhart 
> > >> wrote:
> > >>>
> > >>> Hi,
> > >>>
> > >>> Note that this will negate the optimization that one usually is
> looking
> > >> for
> > >>> when writing such queries:
> > >>>
> > >>> Select * from TAB where a = 1
> > >>> UNION ALL
> > >>> Select * from TAB where b = 1
> > >>>
> > >>> In a database with indexes (most databases) this will allow indexes
> to
> > be
> > >>> used on both the a column and the b column.
> > >>> Databases with bitmap indexes or without indexes would benefit from
> the
> > >>> rule.
> > >>>
> > >>> On Wed, Jan 19, 2022 at 4:32 PM Julian Hyde  > >> >>
> wrote:
> > >>>
> >  Can you log a Jira case for this?
> > 
> >  I think you should make your rule work for N-way Union, not just
> 2-way
> >  Union. And I think you should make it work whether or not a Project
> is
> >  present.
> > 
> > > On Jan 19, 2022, at 1:25 PM, Julian Hyde  > >
> > >> wrote:
> > >
> > > It sounds useful.
> > >
> > > What do you think the rule should be called?
> > UnionFilterTransposeRule,
> >  perhaps?
> > >
> > > A challenge when writing the rule will be to ensure that all of the
> >  inputs to the Union are the same. The Volcano framework is not very
> > >> good at
> >  that.
> > >
> > > You should be careful of the case that the conditions overlap. For
> >  example, the rewrite
> > >
> > > SELECT * FROM Emp WHERE deptno < 30
> > > UNION ALL
> > > SELECT * FROM Emp WHERE deptno IN (25, 35, 45)
> > >
> > > to
> > >
> > > SELECT * FROM Emp WHERE deptno < 30 OR deptno IN (25, 35, 45)
> > >
> > > Is not valid, because rows with deptno = 25 will appear twice in
> the
> >  first query, once in the second. Maybe that problem does not occur
> > when
> >  applied to UNION than when ap

Re: New rule for Converting UNION ALL with same inputs but different filters to single input with OR FILTER

2022-01-22 Thread Yanjing Wang
Thanks Albert.

 you could probably refer to osdi paper "resin general sub query fusion"
> which has chapters talking about general union fusion that deals with the
> case Julian pointed out.
>

The optimization seems very helpful, but it introduces new operators and
the rule for new operators may be more complex.
Hi, Community, What do you think?

Btw, per my understanding cases of union fusion is not that many in
> practice.
>

In particular situations such as VDT and BI tools which generates sql
according to a fixed pattern we will meet many such union all queries.

Albert  于2022年1月23日周日 11:22写道:

> On a more general fusion case, you could probably refer to osdi paper
> "resin general sub query fusion" which has chapters talking about general
> union fusion that deals with the case Julian pointed out.
>
> Btw, per my understanding cases of union fusion is not that many in
> practice. Curious to know the real cases
>
> 在 2022年1月21日星期五,Yanjing Wang  写道:
>
> > Hi Julian, I have logged a jira CALCITE-4990
> >  for this new rule,
> > later I will submit a pr for this.
> >
> > Julian Hyde  于2022年1月21日周五 01:37写道:
> >
> > > Process logic sounds great. Please log a Jira case.
> > >
> > > There’s a lot here, so it’s probably wise to split into a simple PR
> that
> > > just does the simple case (UNION ALL, no Project) and extend later.
> > >
> > > > On Jan 20, 2022, at 7:07 AM, Yanjing Wang  >
> > > wrote:
> > > >
> > > > Thanks Julian and Justin.
> > > >
> > > > What do you think the rule should be called?
> UnionFilterTransposeRule,
> > > >> perhaps?
> > > >>
> > > > SetOpFilterMergeRule?
> > > >
> > > > Maybe that problem does not occur when applied to UNION than when
> > applied
> > > >> to UNION ALL.
> > > >
> > > > Yes, This is very important.
> > > >
> > > > There would seem to be analogous rules for INTERSECT (combine the
> > > >> conditions using AND) and EXCEPT (combine the conditions using AND
> > NOT).
> > > >>
> > > > Excellent extensions, all the three operators process logic are:
> > > >
> > > > For UNION:
> > > > New Filter = left Filter *OR* right Filter.
> > > >
> > > > For INTERSECT:
> > > > New Filter = left Filter *AND* right Filter.
> > > >
> > > > For EXCEPT:
> > > > If left Filter, right Filter have no overlap, transform to left child
> > > tree,
> > > > Otherwise
> > > > New Filter = left Filter *AND NOT *right Filter.
> > > >
> > > > For UNION ALL:
> > > > Add prerequisites:
> > > > left Filter, right Filter must have no overlap.
> > > >
> > > > For INTERSECT ALL:
> > > > Add prerequisites:
> > > > If left Filter, right Filter have no overlap, transform to empty
> > values.
> > > > Otherwise
> > > > the rule can't be applied.
> > > >
> > > > For EXCEPT ALL:
> > > > same for  EXCEPT.
> > > >
> > > >
> > > > work for N-way Union, not just 2-way Union.
> > > >>
> > > > Yes, I will add tests for this.
> > > >
> > > > And I think you should make it work whether or not a Project is
> > present.
> > > >
> > > > Ok, It seems I need construct several different operand match trees
> for
> > > > plan.
> > > >
> > > > Hi Julian, what do you think of the above process logic?
> > > >
> > > > Julian Hyde mailto:jhyde.apa...@gmail.com>>
> > > 于2022年1月20日周四 10:18写道:
> > > >
> > > >> Justin,
> > > >>
> > > >> For planning table or index scans, I would recommend using a single
> > > >> TableScan with a Filter that uses a Sarg, rather than using multiple
> > > >> TableScans connected by a Union. So I think this rule will be
> useful.
> > > >>
> > > >> But I do agree that this proposed rule is not a “no brainer”. It may
> > not
> > > >> do what people want/expect in all cases, and therefore it probably
> > > should
> > > >> not be enabled it by default.
> > > >>
> > > >> Julian
> > > >>
> > > >>
> > > >>
> > > >>
> > > >>
> > > >>> On Jan 19, 2022, at 3:38 PM, Justin Swanhart 
> > > >> wrote:
> > > >>>
> > > >>> Hi,
> > > >>>
> > > >>> Note that this will negate the optimization that one usually is
> > looking
> > > >> for
> > > >>> when writing such queries:
> > > >>>
> > > >>> Select * from TAB where a = 1
> > > >>> UNION ALL
> > > >>> Select * from TAB where b = 1
> > > >>>
> > > >>> In a database with indexes (most databases) this will allow indexes
> > to
> > > be
> > > >>> used on both the a column and the b column.
> > > >>> Databases with bitmap indexes or without indexes would benefit from
> > the
> > > >>> rule.
> > > >>>
> > > >>> On Wed, Jan 19, 2022 at 4:32 PM Julian Hyde <
> jhyde.apa...@gmail.com
> > > >> >>
> > wrote:
> > > >>>
> > >  Can you log a Jira case for this?
> > > 
> > >  I think you should make your rule work for N-way Union, not just
> > 2-way
> > >  Union. And I think you should make it work whether or not a
> Project
> > is
> > >  present.
> > > 
> > > > On Jan 19, 2022, at 1:25 PM, Julian Hyde  > > >
> > > >>

Re: New rule for Converting UNION ALL with same inputs but different filters to single input with OR FILTER

2022-01-23 Thread Alessandro Solimando
Hello everyone,
SQL queries automatically generated by semantic queries (see [1] for
instance) are pretty generous on UNIONs, so I'd say that in this area, you
will have many practical cases matching the patterns highlighted here.

The main idea is that when you have a concept hierarchy in your ontology
(isA relationship), where different concepts are generally mapped to
different tables (or subparts of a single table selected by changing the
filter condition), when you query an ancestor in that hierarchy (say:
Person), in order to account for implicit facts, you generally UNION all
the subtables (say, Employee, Customer, etc, which are also Person) in your
final SQL query.

Best regards,
Alessandro

[1] https://en.wikipedia.org/wiki/Ontology-based_data_integration

On Sun, 23 Jan 2022 at 08:19, Yanjing Wang 
wrote:

> Thanks Albert.
>
>  you could probably refer to osdi paper "resin general sub query fusion"
> > which has chapters talking about general union fusion that deals with the
> > case Julian pointed out.
> >
>
> The optimization seems very helpful, but it introduces new operators and
> the rule for new operators may be more complex.
> Hi, Community, What do you think?
>
> Btw, per my understanding cases of union fusion is not that many in
> > practice.
> >
>
> In particular situations such as VDT and BI tools which generates sql
> according to a fixed pattern we will meet many such union all queries.
>
> Albert  于2022年1月23日周日 11:22写道:
>
> > On a more general fusion case, you could probably refer to osdi paper
> > "resin general sub query fusion" which has chapters talking about general
> > union fusion that deals with the case Julian pointed out.
> >
> > Btw, per my understanding cases of union fusion is not that many in
> > practice. Curious to know the real cases
> >
> > 在 2022年1月21日星期五,Yanjing Wang  写道:
> >
> > > Hi Julian, I have logged a jira CALCITE-4990
> > >  for this new
> rule,
> > > later I will submit a pr for this.
> > >
> > > Julian Hyde  于2022年1月21日周五 01:37写道:
> > >
> > > > Process logic sounds great. Please log a Jira case.
> > > >
> > > > There’s a lot here, so it’s probably wise to split into a simple PR
> > that
> > > > just does the simple case (UNION ALL, no Project) and extend later.
> > > >
> > > > > On Jan 20, 2022, at 7:07 AM, Yanjing Wang <
> zhuangzixiao...@gmail.com
> > >
> > > > wrote:
> > > > >
> > > > > Thanks Julian and Justin.
> > > > >
> > > > > What do you think the rule should be called?
> > UnionFilterTransposeRule,
> > > > >> perhaps?
> > > > >>
> > > > > SetOpFilterMergeRule?
> > > > >
> > > > > Maybe that problem does not occur when applied to UNION than when
> > > applied
> > > > >> to UNION ALL.
> > > > >
> > > > > Yes, This is very important.
> > > > >
> > > > > There would seem to be analogous rules for INTERSECT (combine the
> > > > >> conditions using AND) and EXCEPT (combine the conditions using AND
> > > NOT).
> > > > >>
> > > > > Excellent extensions, all the three operators process logic are:
> > > > >
> > > > > For UNION:
> > > > > New Filter = left Filter *OR* right Filter.
> > > > >
> > > > > For INTERSECT:
> > > > > New Filter = left Filter *AND* right Filter.
> > > > >
> > > > > For EXCEPT:
> > > > > If left Filter, right Filter have no overlap, transform to left
> child
> > > > tree,
> > > > > Otherwise
> > > > > New Filter = left Filter *AND NOT *right Filter.
> > > > >
> > > > > For UNION ALL:
> > > > > Add prerequisites:
> > > > > left Filter, right Filter must have no overlap.
> > > > >
> > > > > For INTERSECT ALL:
> > > > > Add prerequisites:
> > > > > If left Filter, right Filter have no overlap, transform to empty
> > > values.
> > > > > Otherwise
> > > > > the rule can't be applied.
> > > > >
> > > > > For EXCEPT ALL:
> > > > > same for  EXCEPT.
> > > > >
> > > > >
> > > > > work for N-way Union, not just 2-way Union.
> > > > >>
> > > > > Yes, I will add tests for this.
> > > > >
> > > > > And I think you should make it work whether or not a Project is
> > > present.
> > > > >
> > > > > Ok, It seems I need construct several different operand match trees
> > for
> > > > > plan.
> > > > >
> > > > > Hi Julian, what do you think of the above process logic?
> > > > >
> > > > > Julian Hyde mailto:jhyde.apa...@gmail.com
> >>
> > > > 于2022年1月20日周四 10:18写道:
> > > > >
> > > > >> Justin,
> > > > >>
> > > > >> For planning table or index scans, I would recommend using a
> single
> > > > >> TableScan with a Filter that uses a Sarg, rather than using
> multiple
> > > > >> TableScans connected by a Union. So I think this rule will be
> > useful.
> > > > >>
> > > > >> But I do agree that this proposed rule is not a “no brainer”. It
> may
> > > not
> > > > >> do what people want/expect in all cases, and therefore it probably
> > > > should
> > > > >> not be enabled it by default.
> > > > >>
> > > > >> Julian
> > > > >>
> > > > >>
> > > > >>
> > > > >>
> > > > >>
> > > > >>> On Jan 19, 2022, at 3:38 

Re: New rule for Converting UNION ALL with same inputs but different filters to single input with OR FILTER

2022-01-23 Thread Justin Swanhart
Hi Alessandro,

That is very fascinating.  Most of those situations won't apply to this
optimization though, since it requires the tables of the union to be
identical rather than different tables representing unions of different
ontologies, that is, this will only apply optimizations when an ontology is
used more than once in a union.  Unless I am mistaken.

--Justin

On Sun, Jan 23, 2022 at 6:49 AM Alessandro Solimando <
alessandro.solima...@gmail.com> wrote:

> Hello everyone,
> SQL queries automatically generated by semantic queries (see [1] for
> instance) are pretty generous on UNIONs, so I'd say that in this area, you
> will have many practical cases matching the patterns highlighted here.
>
> The main idea is that when you have a concept hierarchy in your ontology
> (isA relationship), where different concepts are generally mapped to
> different tables (or subparts of a single table selected by changing the
> filter condition), when you query an ancestor in that hierarchy (say:
> Person), in order to account for implicit facts, you generally UNION all
> the subtables (say, Employee, Customer, etc, which are also Person) in your
> final SQL query.
>
> Best regards,
> Alessandro
>
> [1] https://en.wikipedia.org/wiki/Ontology-based_data_integration
>
> On Sun, 23 Jan 2022 at 08:19, Yanjing Wang 
> wrote:
>
> > Thanks Albert.
> >
> >  you could probably refer to osdi paper "resin general sub query fusion"
> > > which has chapters talking about general union fusion that deals with
> the
> > > case Julian pointed out.
> > >
> >
> > The optimization seems very helpful, but it introduces new operators and
> > the rule for new operators may be more complex.
> > Hi, Community, What do you think?
> >
> > Btw, per my understanding cases of union fusion is not that many in
> > > practice.
> > >
> >
> > In particular situations such as VDT and BI tools which generates sql
> > according to a fixed pattern we will meet many such union all queries.
> >
> > Albert  于2022年1月23日周日 11:22写道:
> >
> > > On a more general fusion case, you could probably refer to osdi paper
> > > "resin general sub query fusion" which has chapters talking about
> general
> > > union fusion that deals with the case Julian pointed out.
> > >
> > > Btw, per my understanding cases of union fusion is not that many in
> > > practice. Curious to know the real cases
> > >
> > > 在 2022年1月21日星期五,Yanjing Wang  写道:
> > >
> > > > Hi Julian, I have logged a jira CALCITE-4990
> > > >  for this new
> > rule,
> > > > later I will submit a pr for this.
> > > >
> > > > Julian Hyde  于2022年1月21日周五 01:37写道:
> > > >
> > > > > Process logic sounds great. Please log a Jira case.
> > > > >
> > > > > There’s a lot here, so it’s probably wise to split into a simple PR
> > > that
> > > > > just does the simple case (UNION ALL, no Project) and extend later.
> > > > >
> > > > > > On Jan 20, 2022, at 7:07 AM, Yanjing Wang <
> > zhuangzixiao...@gmail.com
> > > >
> > > > > wrote:
> > > > > >
> > > > > > Thanks Julian and Justin.
> > > > > >
> > > > > > What do you think the rule should be called?
> > > UnionFilterTransposeRule,
> > > > > >> perhaps?
> > > > > >>
> > > > > > SetOpFilterMergeRule?
> > > > > >
> > > > > > Maybe that problem does not occur when applied to UNION than when
> > > > applied
> > > > > >> to UNION ALL.
> > > > > >
> > > > > > Yes, This is very important.
> > > > > >
> > > > > > There would seem to be analogous rules for INTERSECT (combine the
> > > > > >> conditions using AND) and EXCEPT (combine the conditions using
> AND
> > > > NOT).
> > > > > >>
> > > > > > Excellent extensions, all the three operators process logic are:
> > > > > >
> > > > > > For UNION:
> > > > > > New Filter = left Filter *OR* right Filter.
> > > > > >
> > > > > > For INTERSECT:
> > > > > > New Filter = left Filter *AND* right Filter.
> > > > > >
> > > > > > For EXCEPT:
> > > > > > If left Filter, right Filter have no overlap, transform to left
> > child
> > > > > tree,
> > > > > > Otherwise
> > > > > > New Filter = left Filter *AND NOT *right Filter.
> > > > > >
> > > > > > For UNION ALL:
> > > > > > Add prerequisites:
> > > > > > left Filter, right Filter must have no overlap.
> > > > > >
> > > > > > For INTERSECT ALL:
> > > > > > Add prerequisites:
> > > > > > If left Filter, right Filter have no overlap, transform to empty
> > > > values.
> > > > > > Otherwise
> > > > > > the rule can't be applied.
> > > > > >
> > > > > > For EXCEPT ALL:
> > > > > > same for  EXCEPT.
> > > > > >
> > > > > >
> > > > > > work for N-way Union, not just 2-way Union.
> > > > > >>
> > > > > > Yes, I will add tests for this.
> > > > > >
> > > > > > And I think you should make it work whether or not a Project is
> > > > present.
> > > > > >
> > > > > > Ok, It seems I need construct several different operand match
> trees
> > > for
> > > > > > plan.
> > > > > >
> > > > > > Hi Julian, what do you think of the above process logic?
> > > > > >
> > > >

Re: New rule for Converting UNION ALL with same inputs but different filters to single input with OR FILTER

2022-01-23 Thread Alessandro Solimando
Hi Justin,
the same table could be used to describe different concepts/classes.

For instance, you could have a single table "Employee" where the column
"job_type" is used to describe the different kinds of jobs (and therefore
employees) you have.

At that point, you could have "White Collar Employee" with subclasses
"Secretary", "Manager", etc., and an ontology2db mapping like:

secretary := select * from employee where job_type=x
...
manager := select * from employee where job_type=y

Then any request involving the "white_collar" concept would rewrite to a
union over the subclasses, this should benefit from this optimization if I
am not mistaken.

Not sure how "IO intensive" such subqueries can be though, but apart from
the toy example, there are complex cases where the subquery cannot benefit
much from indexes and the data can be pretty massive depending on the
scenario.

The problem is that, for complex databases which has been evolving over
several years, the logical concepts from the ontology do not easily match
concepts from ER model of the DB (if one ever existed :)), and in general
when this happens you are also hitting the RDBMs with unforeseen query
patterns for which the data model hasn't been optimised.

Best regards,
Alessandro

On Sun, 23 Jan 2022 at 13:33, Justin Swanhart  wrote:

> Hi Alessandro,
>
> That is very fascinating.  Most of those situations won't apply to this
> optimization though, since it requires the tables of the union to be
> identical rather than different tables representing unions of different
> ontologies, that is, this will only apply optimizations when an ontology is
> used more than once in a union.  Unless I am mistaken.
>
> --Justin
>
> On Sun, Jan 23, 2022 at 6:49 AM Alessandro Solimando <
> alessandro.solima...@gmail.com> wrote:
>
> > Hello everyone,
> > SQL queries automatically generated by semantic queries (see [1] for
> > instance) are pretty generous on UNIONs, so I'd say that in this area,
> you
> > will have many practical cases matching the patterns highlighted here.
> >
> > The main idea is that when you have a concept hierarchy in your ontology
> > (isA relationship), where different concepts are generally mapped to
> > different tables (or subparts of a single table selected by changing the
> > filter condition), when you query an ancestor in that hierarchy (say:
> > Person), in order to account for implicit facts, you generally UNION all
> > the subtables (say, Employee, Customer, etc, which are also Person) in
> your
> > final SQL query.
> >
> > Best regards,
> > Alessandro
> >
> > [1] https://en.wikipedia.org/wiki/Ontology-based_data_integration
> >
> > On Sun, 23 Jan 2022 at 08:19, Yanjing Wang 
> > wrote:
> >
> > > Thanks Albert.
> > >
> > >  you could probably refer to osdi paper "resin general sub query
> fusion"
> > > > which has chapters talking about general union fusion that deals with
> > the
> > > > case Julian pointed out.
> > > >
> > >
> > > The optimization seems very helpful, but it introduces new operators
> and
> > > the rule for new operators may be more complex.
> > > Hi, Community, What do you think?
> > >
> > > Btw, per my understanding cases of union fusion is not that many in
> > > > practice.
> > > >
> > >
> > > In particular situations such as VDT and BI tools which generates sql
> > > according to a fixed pattern we will meet many such union all queries.
> > >
> > > Albert  于2022年1月23日周日 11:22写道:
> > >
> > > > On a more general fusion case, you could probably refer to osdi paper
> > > > "resin general sub query fusion" which has chapters talking about
> > general
> > > > union fusion that deals with the case Julian pointed out.
> > > >
> > > > Btw, per my understanding cases of union fusion is not that many in
> > > > practice. Curious to know the real cases
> > > >
> > > > 在 2022年1月21日星期五,Yanjing Wang  写道:
> > > >
> > > > > Hi Julian, I have logged a jira CALCITE-4990
> > > > >  for this new
> > > rule,
> > > > > later I will submit a pr for this.
> > > > >
> > > > > Julian Hyde  于2022年1月21日周五 01:37写道:
> > > > >
> > > > > > Process logic sounds great. Please log a Jira case.
> > > > > >
> > > > > > There’s a lot here, so it’s probably wise to split into a simple
> PR
> > > > that
> > > > > > just does the simple case (UNION ALL, no Project) and extend
> later.
> > > > > >
> > > > > > > On Jan 20, 2022, at 7:07 AM, Yanjing Wang <
> > > zhuangzixiao...@gmail.com
> > > > >
> > > > > > wrote:
> > > > > > >
> > > > > > > Thanks Julian and Justin.
> > > > > > >
> > > > > > > What do you think the rule should be called?
> > > > UnionFilterTransposeRule,
> > > > > > >> perhaps?
> > > > > > >>
> > > > > > > SetOpFilterMergeRule?
> > > > > > >
> > > > > > > Maybe that problem does not occur when applied to UNION than
> when
> > > > > applied
> > > > > > >> to UNION ALL.
> > > > > > >
> > > > > > > Yes, This is very important.
> > > > > > >
> > > > > > > There would seem to be analo

Re: New rule for Converting UNION ALL with same inputs but different filters to single input with OR FILTER

2022-01-25 Thread Yanjing Wang
Hi Community,

I make a commit

for
UNION ALL Converting, Please review if you are available, thanks.

Alessandro Solimando  于2022年1月24日周一 00:42写道:

> Hi Justin,
> the same table could be used to describe different concepts/classes.
>
> For instance, you could have a single table "Employee" where the column
> "job_type" is used to describe the different kinds of jobs (and therefore
> employees) you have.
>
> At that point, you could have "White Collar Employee" with subclasses
> "Secretary", "Manager", etc., and an ontology2db mapping like:
>
> secretary := select * from employee where job_type=x
> ...
> manager := select * from employee where job_type=y
>
> Then any request involving the "white_collar" concept would rewrite to a
> union over the subclasses, this should benefit from this optimization if I
> am not mistaken.
>
> Not sure how "IO intensive" such subqueries can be though, but apart from
> the toy example, there are complex cases where the subquery cannot benefit
> much from indexes and the data can be pretty massive depending on the
> scenario.
>
> The problem is that, for complex databases which has been evolving over
> several years, the logical concepts from the ontology do not easily match
> concepts from ER model of the DB (if one ever existed :)), and in general
> when this happens you are also hitting the RDBMs with unforeseen query
> patterns for which the data model hasn't been optimised.
>
> Best regards,
> Alessandro
>
> On Sun, 23 Jan 2022 at 13:33, Justin Swanhart  wrote:
>
> > Hi Alessandro,
> >
> > That is very fascinating.  Most of those situations won't apply to this
> > optimization though, since it requires the tables of the union to be
> > identical rather than different tables representing unions of different
> > ontologies, that is, this will only apply optimizations when an ontology
> is
> > used more than once in a union.  Unless I am mistaken.
> >
> > --Justin
> >
> > On Sun, Jan 23, 2022 at 6:49 AM Alessandro Solimando <
> > alessandro.solima...@gmail.com> wrote:
> >
> > > Hello everyone,
> > > SQL queries automatically generated by semantic queries (see [1] for
> > > instance) are pretty generous on UNIONs, so I'd say that in this area,
> > you
> > > will have many practical cases matching the patterns highlighted here.
> > >
> > > The main idea is that when you have a concept hierarchy in your
> ontology
> > > (isA relationship), where different concepts are generally mapped to
> > > different tables (or subparts of a single table selected by changing
> the
> > > filter condition), when you query an ancestor in that hierarchy (say:
> > > Person), in order to account for implicit facts, you generally UNION
> all
> > > the subtables (say, Employee, Customer, etc, which are also Person) in
> > your
> > > final SQL query.
> > >
> > > Best regards,
> > > Alessandro
> > >
> > > [1] https://en.wikipedia.org/wiki/Ontology-based_data_integration
> > >
> > > On Sun, 23 Jan 2022 at 08:19, Yanjing Wang 
> > > wrote:
> > >
> > > > Thanks Albert.
> > > >
> > > >  you could probably refer to osdi paper "resin general sub query
> > fusion"
> > > > > which has chapters talking about general union fusion that deals
> with
> > > the
> > > > > case Julian pointed out.
> > > > >
> > > >
> > > > The optimization seems very helpful, but it introduces new operators
> > and
> > > > the rule for new operators may be more complex.
> > > > Hi, Community, What do you think?
> > > >
> > > > Btw, per my understanding cases of union fusion is not that many in
> > > > > practice.
> > > > >
> > > >
> > > > In particular situations such as VDT and BI tools which generates sql
> > > > according to a fixed pattern we will meet many such union all
> queries.
> > > >
> > > > Albert  于2022年1月23日周日 11:22写道:
> > > >
> > > > > On a more general fusion case, you could probably refer to osdi
> paper
> > > > > "resin general sub query fusion" which has chapters talking about
> > > general
> > > > > union fusion that deals with the case Julian pointed out.
> > > > >
> > > > > Btw, per my understanding cases of union fusion is not that many in
> > > > > practice. Curious to know the real cases
> > > > >
> > > > > 在 2022年1月21日星期五,Yanjing Wang  写道:
> > > > >
> > > > > > Hi Julian, I have logged a jira CALCITE-4990
> > > > > >  for this
> new
> > > > rule,
> > > > > > later I will submit a pr for this.
> > > > > >
> > > > > > Julian Hyde  于2022年1月21日周五 01:37写道:
> > > > > >
> > > > > > > Process logic sounds great. Please log a Jira case.
> > > > > > >
> > > > > > > There’s a lot here, so it’s probably wise to split into a
> simple
> > PR
> > > > > that
> > > > > > > just does the simple case (UNION ALL, no Project) and extend
> > later.
> > > > > > >
> > > > > > > > On Jan 20, 2022, at 7:07 AM, Yanjing Wang <
> > > > zhuangzixiao...@gmail.com
> > > > > >
> > > > > > > wrote:
> >

Re: New rule for Converting UNION ALL with same inputs but different filters to single input with OR FILTER

2022-01-25 Thread Yanjing Wang
The previous commit is obsolete, Please use the latest commit on branch
union_all_fusion_to_single
,
thanks.

Yanjing Wang  于2022年1月26日周三 12:17写道:

> Hi Community,
>
> I make a commit
> 
>  for
> UNION ALL Converting, Please review if you are available, thanks.
>
> Alessandro Solimando  于2022年1月24日周一
> 00:42写道:
>
>> Hi Justin,
>> the same table could be used to describe different concepts/classes.
>>
>> For instance, you could have a single table "Employee" where the column
>> "job_type" is used to describe the different kinds of jobs (and therefore
>> employees) you have.
>>
>> At that point, you could have "White Collar Employee" with subclasses
>> "Secretary", "Manager", etc., and an ontology2db mapping like:
>>
>> secretary := select * from employee where job_type=x
>> ...
>> manager := select * from employee where job_type=y
>>
>> Then any request involving the "white_collar" concept would rewrite to a
>> union over the subclasses, this should benefit from this optimization if I
>> am not mistaken.
>>
>> Not sure how "IO intensive" such subqueries can be though, but apart from
>> the toy example, there are complex cases where the subquery cannot benefit
>> much from indexes and the data can be pretty massive depending on the
>> scenario.
>>
>> The problem is that, for complex databases which has been evolving over
>> several years, the logical concepts from the ontology do not easily match
>> concepts from ER model of the DB (if one ever existed :)), and in general
>> when this happens you are also hitting the RDBMs with unforeseen query
>> patterns for which the data model hasn't been optimised.
>>
>> Best regards,
>> Alessandro
>>
>> On Sun, 23 Jan 2022 at 13:33, Justin Swanhart 
>> wrote:
>>
>> > Hi Alessandro,
>> >
>> > That is very fascinating.  Most of those situations won't apply to this
>> > optimization though, since it requires the tables of the union to be
>> > identical rather than different tables representing unions of different
>> > ontologies, that is, this will only apply optimizations when an
>> ontology is
>> > used more than once in a union.  Unless I am mistaken.
>> >
>> > --Justin
>> >
>> > On Sun, Jan 23, 2022 at 6:49 AM Alessandro Solimando <
>> > alessandro.solima...@gmail.com> wrote:
>> >
>> > > Hello everyone,
>> > > SQL queries automatically generated by semantic queries (see [1] for
>> > > instance) are pretty generous on UNIONs, so I'd say that in this area,
>> > you
>> > > will have many practical cases matching the patterns highlighted here.
>> > >
>> > > The main idea is that when you have a concept hierarchy in your
>> ontology
>> > > (isA relationship), where different concepts are generally mapped to
>> > > different tables (or subparts of a single table selected by changing
>> the
>> > > filter condition), when you query an ancestor in that hierarchy (say:
>> > > Person), in order to account for implicit facts, you generally UNION
>> all
>> > > the subtables (say, Employee, Customer, etc, which are also Person) in
>> > your
>> > > final SQL query.
>> > >
>> > > Best regards,
>> > > Alessandro
>> > >
>> > > [1] https://en.wikipedia.org/wiki/Ontology-based_data_integration
>> > >
>> > > On Sun, 23 Jan 2022 at 08:19, Yanjing Wang > >
>> > > wrote:
>> > >
>> > > > Thanks Albert.
>> > > >
>> > > >  you could probably refer to osdi paper "resin general sub query
>> > fusion"
>> > > > > which has chapters talking about general union fusion that deals
>> with
>> > > the
>> > > > > case Julian pointed out.
>> > > > >
>> > > >
>> > > > The optimization seems very helpful, but it introduces new operators
>> > and
>> > > > the rule for new operators may be more complex.
>> > > > Hi, Community, What do you think?
>> > > >
>> > > > Btw, per my understanding cases of union fusion is not that many in
>> > > > > practice.
>> > > > >
>> > > >
>> > > > In particular situations such as VDT and BI tools which generates
>> sql
>> > > > according to a fixed pattern we will meet many such union all
>> queries.
>> > > >
>> > > > Albert  于2022年1月23日周日 11:22写道:
>> > > >
>> > > > > On a more general fusion case, you could probably refer to osdi
>> paper
>> > > > > "resin general sub query fusion" which has chapters talking about
>> > > general
>> > > > > union fusion that deals with the case Julian pointed out.
>> > > > >
>> > > > > Btw, per my understanding cases of union fusion is not that many
>> in
>> > > > > practice. Curious to know the real cases
>> > > > >
>> > > > > 在 2022年1月21日星期五,Yanjing Wang  写道:
>> > > > >
>> > > > > > Hi Julian, I have logged a jira CALCITE-4990
>> > > > > >  for this
>> new
>> > > > rule,
>> > > > > > later I will submit a pr for this.
>> > > > > >
>> > > > > > Julian Hyde  于2022年1月21日周五 01:37写道:
>> > > > > >
>> > > > > > > Process logic sounds great. Ple