The previous commit is obsolete, Please use the latest commit on branch
union_all_fusion_to_single
<https://github.com/hannerwang/calcite/tree/union_all_fusion_to_single>,
thanks.

Yanjing Wang <[email protected]> 于2022年1月26日周三 12:17写道:

> Hi Community,
>
> I make a commit
> <https://github.com/hannerwang/calcite/commit/50fdf37c0c9d560cee3a1ce0d24b83e0346ecd90>
>  for
> UNION ALL Converting, Please review if you are available, thanks.
>
> Alessandro Solimando <[email protected]> 于2022年1月24日周一
> 00:42写道:
>
>> Hi Justin,
>> the same table could be used to describe different concepts/classes.
>>
>> For instance, you could have a single table "Employee" where the column
>> "job_type" is used to describe the different kinds of jobs (and therefore
>> employees) you have.
>>
>> At that point, you could have "White Collar Employee" with subclasses
>> "Secretary", "Manager", etc., and an ontology2db mapping like:
>>
>> secretary := select * from employee where job_type=x
>> ...
>> manager := select * from employee where job_type=y
>>
>> Then any request involving the "white_collar" concept would rewrite to a
>> union over the subclasses, this should benefit from this optimization if I
>> am not mistaken.
>>
>> Not sure how "IO intensive" such subqueries can be though, but apart from
>> the toy example, there are complex cases where the subquery cannot benefit
>> much from indexes and the data can be pretty massive depending on the
>> scenario.
>>
>> The problem is that, for complex databases which has been evolving over
>> several years, the logical concepts from the ontology do not easily match
>> concepts from ER model of the DB (if one ever existed :)), and in general
>> when this happens you are also hitting the RDBMs with unforeseen query
>> patterns for which the data model hasn't been optimised.
>>
>> Best regards,
>> Alessandro
>>
>> On Sun, 23 Jan 2022 at 13:33, Justin Swanhart <[email protected]>
>> wrote:
>>
>> > Hi Alessandro,
>> >
>> > That is very fascinating.  Most of those situations won't apply to this
>> > optimization though, since it requires the tables of the union to be
>> > identical rather than different tables representing unions of different
>> > ontologies, that is, this will only apply optimizations when an
>> ontology is
>> > used more than once in a union.  Unless I am mistaken.
>> >
>> > --Justin
>> >
>> > On Sun, Jan 23, 2022 at 6:49 AM Alessandro Solimando <
>> > [email protected]> wrote:
>> >
>> > > Hello everyone,
>> > > SQL queries automatically generated by semantic queries (see [1] for
>> > > instance) are pretty generous on UNIONs, so I'd say that in this area,
>> > you
>> > > will have many practical cases matching the patterns highlighted here.
>> > >
>> > > The main idea is that when you have a concept hierarchy in your
>> ontology
>> > > (isA relationship), where different concepts are generally mapped to
>> > > different tables (or subparts of a single table selected by changing
>> the
>> > > filter condition), when you query an ancestor in that hierarchy (say:
>> > > Person), in order to account for implicit facts, you generally UNION
>> all
>> > > the subtables (say, Employee, Customer, etc, which are also Person) in
>> > your
>> > > final SQL query.
>> > >
>> > > Best regards,
>> > > Alessandro
>> > >
>> > > [1] https://en.wikipedia.org/wiki/Ontology-based_data_integration
>> > >
>> > > On Sun, 23 Jan 2022 at 08:19, Yanjing Wang <[email protected]
>> >
>> > > wrote:
>> > >
>> > > > Thanks Albert.
>> > > >
>> > > >  you could probably refer to osdi paper "resin general sub query
>> > fusion"
>> > > > > which has chapters talking about general union fusion that deals
>> with
>> > > the
>> > > > > case Julian pointed out.
>> > > > >
>> > > >
>> > > > The optimization seems very helpful, but it introduces new operators
>> > and
>> > > > the rule for new operators may be more complex.
>> > > > Hi, Community, What do you think?
>> > > >
>> > > > Btw, per my understanding cases of union fusion is not that many in
>> > > > > practice.
>> > > > >
>> > > >
>> > > > In particular situations such as VDT and BI tools which generates
>> sql
>> > > > according to a fixed pattern we will meet many such union all
>> queries.
>> > > >
>> > > > Albert <[email protected]> 于2022年1月23日周日 11:22写道:
>> > > >
>> > > > > On a more general fusion case, you could probably refer to osdi
>> paper
>> > > > > "resin general sub query fusion" which has chapters talking about
>> > > general
>> > > > > union fusion that deals with the case Julian pointed out.
>> > > > >
>> > > > > Btw, per my understanding cases of union fusion is not that many
>> in
>> > > > > practice. Curious to know the real cases
>> > > > >
>> > > > > 在 2022年1月21日星期五,Yanjing Wang <[email protected]> 写道:
>> > > > >
>> > > > > > Hi Julian, I have logged a jira CALCITE-4990
>> > > > > > <https://issues.apache.org/jira/browse/CALCITE-4990> for this
>> new
>> > > > rule,
>> > > > > > later I will submit a pr for this.
>> > > > > >
>> > > > > > Julian Hyde <[email protected]> 于2022年1月21日周五 01:37写道:
>> > > > > >
>> > > > > > > Process logic sounds great. Please log a Jira case.
>> > > > > > >
>> > > > > > > There’s a lot here, so it’s probably wise to split into a
>> simple
>> > PR
>> > > > > that
>> > > > > > > just does the simple case (UNION ALL, no Project) and extend
>> > later.
>> > > > > > >
>> > > > > > > > On Jan 20, 2022, at 7:07 AM, Yanjing Wang <
>> > > > [email protected]
>> > > > > >
>> > > > > > > wrote:
>> > > > > > > >
>> > > > > > > > Thanks Julian and Justin.
>> > > > > > > >
>> > > > > > > > What do you think the rule should be called?
>> > > > > UnionFilterTransposeRule,
>> > > > > > > >> perhaps?
>> > > > > > > >>
>> > > > > > > > SetOpFilterMergeRule?
>> > > > > > > >
>> > > > > > > > Maybe that problem does not occur when applied to UNION than
>> > when
>> > > > > > applied
>> > > > > > > >> to UNION ALL.
>> > > > > > > >
>> > > > > > > > Yes, This is very important.
>> > > > > > > >
>> > > > > > > > There would seem to be analogous rules for INTERSECT
>> (combine
>> > the
>> > > > > > > >> conditions using AND) and EXCEPT (combine the conditions
>> using
>> > > AND
>> > > > > > NOT).
>> > > > > > > >>
>> > > > > > > > Excellent extensions, all the three operators process logic
>> > are:
>> > > > > > > >
>> > > > > > > > For UNION:
>> > > > > > > > New Filter = left Filter *OR* right Filter.
>> > > > > > > >
>> > > > > > > > For INTERSECT:
>> > > > > > > > New Filter = left Filter *AND* right Filter.
>> > > > > > > >
>> > > > > > > > For EXCEPT:
>> > > > > > > > If left Filter, right Filter have no overlap, transform to
>> left
>> > > > child
>> > > > > > > tree,
>> > > > > > > > Otherwise
>> > > > > > > > New Filter = left Filter *AND NOT *right Filter.
>> > > > > > > >
>> > > > > > > > For UNION ALL:
>> > > > > > > > Add prerequisites:
>> > > > > > > > left Filter, right Filter must have no overlap.
>> > > > > > > >
>> > > > > > > > For INTERSECT ALL:
>> > > > > > > > Add prerequisites:
>> > > > > > > > If left Filter, right Filter have no overlap, transform to
>> > empty
>> > > > > > values.
>> > > > > > > > Otherwise
>> > > > > > > > the rule can't be applied.
>> > > > > > > >
>> > > > > > > > For EXCEPT ALL:
>> > > > > > > > same for  EXCEPT.
>> > > > > > > >
>> > > > > > > >
>> > > > > > > > work for N-way Union, not just 2-way Union.
>> > > > > > > >>
>> > > > > > > > Yes, I will add tests for this.
>> > > > > > > >
>> > > > > > > > And I think you should make it work whether or not a
>> Project is
>> > > > > > present.
>> > > > > > > >
>> > > > > > > > Ok, It seems I need construct several different operand
>> match
>> > > trees
>> > > > > for
>> > > > > > > > plan.
>> > > > > > > >
>> > > > > > > > Hi Julian, what do you think of the above process logic?
>> > > > > > > >
>> > > > > > > > Julian Hyde <[email protected] <mailto:
>> > > [email protected]
>> > > > >>
>> > > > > > > 于2022年1月20日周四 10:18写道:
>> > > > > > > >
>> > > > > > > >> Justin,
>> > > > > > > >>
>> > > > > > > >> For planning table or index scans, I would recommend using
>> a
>> > > > single
>> > > > > > > >> TableScan with a Filter that uses a Sarg, rather than using
>> > > > multiple
>> > > > > > > >> TableScans connected by a Union. So I think this rule will
>> be
>> > > > > useful.
>> > > > > > > >>
>> > > > > > > >> But I do agree that this proposed rule is not a “no
>> brainer”.
>> > It
>> > > > may
>> > > > > > not
>> > > > > > > >> do what people want/expect in all cases, and therefore it
>> > > probably
>> > > > > > > should
>> > > > > > > >> not be enabled it by default.
>> > > > > > > >>
>> > > > > > > >> Julian
>> > > > > > > >>
>> > > > > > > >>
>> > > > > > > >>
>> > > > > > > >>
>> > > > > > > >>
>> > > > > > > >>> On Jan 19, 2022, at 3:38 PM, Justin Swanhart <
>> > > > [email protected]>
>> > > > > > > >> wrote:
>> > > > > > > >>>
>> > > > > > > >>> Hi,
>> > > > > > > >>>
>> > > > > > > >>> Note that this will negate the optimization that one
>> usually
>> > is
>> > > > > > looking
>> > > > > > > >> for
>> > > > > > > >>> when writing such queries:
>> > > > > > > >>>
>> > > > > > > >>> Select * from TAB where a = 1
>> > > > > > > >>> UNION ALL
>> > > > > > > >>> Select * from TAB where b = 1
>> > > > > > > >>>
>> > > > > > > >>> In a database with indexes (most databases) this will
>> allow
>> > > > indexes
>> > > > > > to
>> > > > > > > be
>> > > > > > > >>> used on both the a column and the b column.
>> > > > > > > >>> Databases with bitmap indexes or without indexes would
>> > benefit
>> > > > from
>> > > > > > the
>> > > > > > > >>> rule.
>> > > > > > > >>>
>> > > > > > > >>> On Wed, Jan 19, 2022 at 4:32 PM Julian Hyde <
>> > > > > [email protected]
>> > > > > > > >> <mailto:[email protected] <mailto:
>> [email protected]
>> > > >>>
>> > > > > > wrote:
>> > > > > > > >>>
>> > > > > > > >>>> Can you log a Jira case for this?
>> > > > > > > >>>>
>> > > > > > > >>>> I think you should make your rule work for N-way Union,
>> not
>> > > just
>> > > > > > 2-way
>> > > > > > > >>>> Union. And I think you should make it work whether or
>> not a
>> > > > > Project
>> > > > > > is
>> > > > > > > >>>> present.
>> > > > > > > >>>>
>> > > > > > > >>>>> On Jan 19, 2022, at 1:25 PM, Julian Hyde <
>> > > > [email protected]
>> > > > > > > <mailto:[email protected]>>
>> > > > > > > >> wrote:
>> > > > > > > >>>>>
>> > > > > > > >>>>> It sounds useful.
>> > > > > > > >>>>>
>> > > > > > > >>>>> What do you think the rule should be called?
>> > > > > > > UnionFilterTransposeRule,
>> > > > > > > >>>> perhaps?
>> > > > > > > >>>>>
>> > > > > > > >>>>> A challenge when writing the rule will be to ensure that
>> > all
>> > > of
>> > > > > the
>> > > > > > > >>>> inputs to the Union are the same. The Volcano framework
>> is
>> > not
>> > > > > very
>> > > > > > > >> good at
>> > > > > > > >>>> that.
>> > > > > > > >>>>>
>> > > > > > > >>>>> You should be careful of the case that the conditions
>> > > overlap.
>> > > > > For
>> > > > > > > >>>> example, the rewrite
>> > > > > > > >>>>>
>> > > > > > > >>>>> SELECT * FROM Emp WHERE deptno < 30
>> > > > > > > >>>>> UNION ALL
>> > > > > > > >>>>> SELECT * FROM Emp WHERE deptno IN (25, 35, 45)
>> > > > > > > >>>>>
>> > > > > > > >>>>> to
>> > > > > > > >>>>>
>> > > > > > > >>>>> SELECT * FROM Emp WHERE deptno < 30 OR deptno IN (25,
>> 35,
>> > 45)
>> > > > > > > >>>>>
>> > > > > > > >>>>> Is not valid, because rows with deptno = 25 will appear
>> > twice
>> > > > in
>> > > > > > the
>> > > > > > > >>>> first query, once in the second. Maybe that problem does
>> not
>> > > > occur
>> > > > > > > when
>> > > > > > > >>>> applied to UNION than when applied to UNION ALL.
>> > > > > > > >>>>>
>> > > > > > > >>>>> There would seem to be analogous rules for INTERSECT
>> > (combine
>> > > > the
>> > > > > > > >>>> conditions using AND) and EXCEPT (combine the conditions
>> > using
>> > > > AND
>> > > > > > > NOT).
>> > > > > > > >>>> Perhaps one rule could cover all set operations (see
>> > > > > > > >>>> FilterSetOpTransposeRule).
>> > > > > > > >>>>>
>> > > > > > > >>>>> Julian
>> > > > > > > >>>>>
>> > > > > > > >>>>>
>> > > > > > > >>>>>
>> > > > > > > >>>>>> On Jan 19, 2022, at 2:38 AM, Yanjing Wang <
>> > > > > > > [email protected] <mailto:[email protected]>
>> > > > > > > >>>> <mailto:[email protected] <mailto:
>> > > > > [email protected]
>> > > > > > >
>> > > > > > > <mailto:[email protected] <mailto:
>> > > [email protected]
>> > > > > >>>>
>> > > > > > > >> wrote:
>> > > > > > > >>>>>>
>> > > > > > > >>>>>> A simple example is converting SELECT a, b FROM t
>> WHERE c
>> > =
>> > > 1
>> > > > > > UNION
>> > > > > > > >> ALL
>> > > > > > > >>>> SELECT a, b FROM t WHERE c = 2 to SELECT a, b FROM t
>> WHERE c
>> > > in
>> > > > > (1,
>> > > > > > 2)
>> > > > > > > >>>>>>
>> > > > > > > >>>>>> Yanjing Wang <[email protected] <mailto:
>> > > > > > > [email protected]> <mailto:
>> > > > > > > >> [email protected] <mailto:
>> [email protected]>>
>> > > > > > <mailto:
>> > > > > > > >>>> [email protected] <mailto:
>> [email protected]
>> > > >>>
>> > > > > > > >> 于2022年1月19日周三 18:35写道:
>> > > > > > > >>>>>> Hi, community
>> > > > > > > >>>>>>
>> > > > > > > >>>>>> Here I recommend a new rule for converting UNION ALL
>> sub
>> > > plan
>> > > > > to a
>> > > > > > > >>>> single input with an OR filter, the following is its
>> > > conversion
>> > > > > > > diagram.
>> > > > > > > >>>>>>
>> > > > > > > >>>>>>
>> > > > > > > >>>>>> The conversion prerequisites are
>> > > > > > > >>>>>> 1. left filter range has no intersection with right
>> filter
>> > > > > range.
>> > > > > > > >>>>>> 2. Project and Input Sub Tree must be identical.
>> > > > > > > >>>>>>
>> > > > > > > >>>>>> The rule will be used when Input Sub Tree is a
>> > > > > computing-intensive
>> > > > > > > or
>> > > > > > > >>>> large IO operation.
>> > > > > > > >>>>>>
>> > > > > > > >>>>>> I don't know whether the community supports it or not,
>> any
>> > > > > > > suggestions
>> > > > > > > >>>> will be appreciated.
>> > > > > > >
>> > > > > > >
>> > > > > >
>> > > > >
>> > > > >
>> > > > > --
>> > > > > ~~~~~~~~~~~~~~~
>> > > > > no mistakes
>> > > > > ~~~~~~~~~~~~~~~~~~
>> > > > >
>> > > >
>> > >
>> >
>>
>

Reply via email to