Hi Community,

I make a commit
<https://github.com/hannerwang/calcite/commit/50fdf37c0c9d560cee3a1ce0d24b83e0346ecd90>
for
UNION ALL Converting, Please review if you are available, thanks.

Alessandro Solimando <[email protected]> 于2022年1月24日周一 00:42写道:

> Hi Justin,
> the same table could be used to describe different concepts/classes.
>
> For instance, you could have a single table "Employee" where the column
> "job_type" is used to describe the different kinds of jobs (and therefore
> employees) you have.
>
> At that point, you could have "White Collar Employee" with subclasses
> "Secretary", "Manager", etc., and an ontology2db mapping like:
>
> secretary := select * from employee where job_type=x
> ...
> manager := select * from employee where job_type=y
>
> Then any request involving the "white_collar" concept would rewrite to a
> union over the subclasses, this should benefit from this optimization if I
> am not mistaken.
>
> Not sure how "IO intensive" such subqueries can be though, but apart from
> the toy example, there are complex cases where the subquery cannot benefit
> much from indexes and the data can be pretty massive depending on the
> scenario.
>
> The problem is that, for complex databases which has been evolving over
> several years, the logical concepts from the ontology do not easily match
> concepts from ER model of the DB (if one ever existed :)), and in general
> when this happens you are also hitting the RDBMs with unforeseen query
> patterns for which the data model hasn't been optimised.
>
> Best regards,
> Alessandro
>
> On Sun, 23 Jan 2022 at 13:33, Justin Swanhart <[email protected]> wrote:
>
> > Hi Alessandro,
> >
> > That is very fascinating.  Most of those situations won't apply to this
> > optimization though, since it requires the tables of the union to be
> > identical rather than different tables representing unions of different
> > ontologies, that is, this will only apply optimizations when an ontology
> is
> > used more than once in a union.  Unless I am mistaken.
> >
> > --Justin
> >
> > On Sun, Jan 23, 2022 at 6:49 AM Alessandro Solimando <
> > [email protected]> wrote:
> >
> > > Hello everyone,
> > > SQL queries automatically generated by semantic queries (see [1] for
> > > instance) are pretty generous on UNIONs, so I'd say that in this area,
> > you
> > > will have many practical cases matching the patterns highlighted here.
> > >
> > > The main idea is that when you have a concept hierarchy in your
> ontology
> > > (isA relationship), where different concepts are generally mapped to
> > > different tables (or subparts of a single table selected by changing
> the
> > > filter condition), when you query an ancestor in that hierarchy (say:
> > > Person), in order to account for implicit facts, you generally UNION
> all
> > > the subtables (say, Employee, Customer, etc, which are also Person) in
> > your
> > > final SQL query.
> > >
> > > Best regards,
> > > Alessandro
> > >
> > > [1] https://en.wikipedia.org/wiki/Ontology-based_data_integration
> > >
> > > On Sun, 23 Jan 2022 at 08:19, Yanjing Wang <[email protected]>
> > > wrote:
> > >
> > > > Thanks Albert.
> > > >
> > > >  you could probably refer to osdi paper "resin general sub query
> > fusion"
> > > > > which has chapters talking about general union fusion that deals
> with
> > > the
> > > > > case Julian pointed out.
> > > > >
> > > >
> > > > The optimization seems very helpful, but it introduces new operators
> > and
> > > > the rule for new operators may be more complex.
> > > > Hi, Community, What do you think?
> > > >
> > > > Btw, per my understanding cases of union fusion is not that many in
> > > > > practice.
> > > > >
> > > >
> > > > In particular situations such as VDT and BI tools which generates sql
> > > > according to a fixed pattern we will meet many such union all
> queries.
> > > >
> > > > Albert <[email protected]> 于2022年1月23日周日 11:22写道:
> > > >
> > > > > On a more general fusion case, you could probably refer to osdi
> paper
> > > > > "resin general sub query fusion" which has chapters talking about
> > > general
> > > > > union fusion that deals with the case Julian pointed out.
> > > > >
> > > > > Btw, per my understanding cases of union fusion is not that many in
> > > > > practice. Curious to know the real cases
> > > > >
> > > > > 在 2022年1月21日星期五,Yanjing Wang <[email protected]> 写道:
> > > > >
> > > > > > Hi Julian, I have logged a jira CALCITE-4990
> > > > > > <https://issues.apache.org/jira/browse/CALCITE-4990> for this
> new
> > > > rule,
> > > > > > later I will submit a pr for this.
> > > > > >
> > > > > > Julian Hyde <[email protected]> 于2022年1月21日周五 01:37写道:
> > > > > >
> > > > > > > Process logic sounds great. Please log a Jira case.
> > > > > > >
> > > > > > > There’s a lot here, so it’s probably wise to split into a
> simple
> > PR
> > > > > that
> > > > > > > just does the simple case (UNION ALL, no Project) and extend
> > later.
> > > > > > >
> > > > > > > > On Jan 20, 2022, at 7:07 AM, Yanjing Wang <
> > > > [email protected]
> > > > > >
> > > > > > > wrote:
> > > > > > > >
> > > > > > > > Thanks Julian and Justin.
> > > > > > > >
> > > > > > > > What do you think the rule should be called?
> > > > > UnionFilterTransposeRule,
> > > > > > > >> perhaps?
> > > > > > > >>
> > > > > > > > SetOpFilterMergeRule?
> > > > > > > >
> > > > > > > > Maybe that problem does not occur when applied to UNION than
> > when
> > > > > > applied
> > > > > > > >> to UNION ALL.
> > > > > > > >
> > > > > > > > Yes, This is very important.
> > > > > > > >
> > > > > > > > There would seem to be analogous rules for INTERSECT (combine
> > the
> > > > > > > >> conditions using AND) and EXCEPT (combine the conditions
> using
> > > AND
> > > > > > NOT).
> > > > > > > >>
> > > > > > > > Excellent extensions, all the three operators process logic
> > are:
> > > > > > > >
> > > > > > > > For UNION:
> > > > > > > > New Filter = left Filter *OR* right Filter.
> > > > > > > >
> > > > > > > > For INTERSECT:
> > > > > > > > New Filter = left Filter *AND* right Filter.
> > > > > > > >
> > > > > > > > For EXCEPT:
> > > > > > > > If left Filter, right Filter have no overlap, transform to
> left
> > > > child
> > > > > > > tree,
> > > > > > > > Otherwise
> > > > > > > > New Filter = left Filter *AND NOT *right Filter.
> > > > > > > >
> > > > > > > > For UNION ALL:
> > > > > > > > Add prerequisites:
> > > > > > > > left Filter, right Filter must have no overlap.
> > > > > > > >
> > > > > > > > For INTERSECT ALL:
> > > > > > > > Add prerequisites:
> > > > > > > > If left Filter, right Filter have no overlap, transform to
> > empty
> > > > > > values.
> > > > > > > > Otherwise
> > > > > > > > the rule can't be applied.
> > > > > > > >
> > > > > > > > For EXCEPT ALL:
> > > > > > > > same for  EXCEPT.
> > > > > > > >
> > > > > > > >
> > > > > > > > work for N-way Union, not just 2-way Union.
> > > > > > > >>
> > > > > > > > Yes, I will add tests for this.
> > > > > > > >
> > > > > > > > And I think you should make it work whether or not a Project
> is
> > > > > > present.
> > > > > > > >
> > > > > > > > Ok, It seems I need construct several different operand match
> > > trees
> > > > > for
> > > > > > > > plan.
> > > > > > > >
> > > > > > > > Hi Julian, what do you think of the above process logic?
> > > > > > > >
> > > > > > > > Julian Hyde <[email protected] <mailto:
> > > [email protected]
> > > > >>
> > > > > > > 于2022年1月20日周四 10:18写道:
> > > > > > > >
> > > > > > > >> Justin,
> > > > > > > >>
> > > > > > > >> For planning table or index scans, I would recommend using a
> > > > single
> > > > > > > >> TableScan with a Filter that uses a Sarg, rather than using
> > > > multiple
> > > > > > > >> TableScans connected by a Union. So I think this rule will
> be
> > > > > useful.
> > > > > > > >>
> > > > > > > >> But I do agree that this proposed rule is not a “no
> brainer”.
> > It
> > > > may
> > > > > > not
> > > > > > > >> do what people want/expect in all cases, and therefore it
> > > probably
> > > > > > > should
> > > > > > > >> not be enabled it by default.
> > > > > > > >>
> > > > > > > >> Julian
> > > > > > > >>
> > > > > > > >>
> > > > > > > >>
> > > > > > > >>
> > > > > > > >>
> > > > > > > >>> On Jan 19, 2022, at 3:38 PM, Justin Swanhart <
> > > > [email protected]>
> > > > > > > >> wrote:
> > > > > > > >>>
> > > > > > > >>> Hi,
> > > > > > > >>>
> > > > > > > >>> Note that this will negate the optimization that one
> usually
> > is
> > > > > > looking
> > > > > > > >> for
> > > > > > > >>> when writing such queries:
> > > > > > > >>>
> > > > > > > >>> Select * from TAB where a = 1
> > > > > > > >>> UNION ALL
> > > > > > > >>> Select * from TAB where b = 1
> > > > > > > >>>
> > > > > > > >>> In a database with indexes (most databases) this will allow
> > > > indexes
> > > > > > to
> > > > > > > be
> > > > > > > >>> used on both the a column and the b column.
> > > > > > > >>> Databases with bitmap indexes or without indexes would
> > benefit
> > > > from
> > > > > > the
> > > > > > > >>> rule.
> > > > > > > >>>
> > > > > > > >>> On Wed, Jan 19, 2022 at 4:32 PM Julian Hyde <
> > > > > [email protected]
> > > > > > > >> <mailto:[email protected] <mailto:
> [email protected]
> > > >>>
> > > > > > wrote:
> > > > > > > >>>
> > > > > > > >>>> Can you log a Jira case for this?
> > > > > > > >>>>
> > > > > > > >>>> I think you should make your rule work for N-way Union,
> not
> > > just
> > > > > > 2-way
> > > > > > > >>>> Union. And I think you should make it work whether or not
> a
> > > > > Project
> > > > > > is
> > > > > > > >>>> present.
> > > > > > > >>>>
> > > > > > > >>>>> On Jan 19, 2022, at 1:25 PM, Julian Hyde <
> > > > [email protected]
> > > > > > > <mailto:[email protected]>>
> > > > > > > >> wrote:
> > > > > > > >>>>>
> > > > > > > >>>>> It sounds useful.
> > > > > > > >>>>>
> > > > > > > >>>>> What do you think the rule should be called?
> > > > > > > UnionFilterTransposeRule,
> > > > > > > >>>> perhaps?
> > > > > > > >>>>>
> > > > > > > >>>>> A challenge when writing the rule will be to ensure that
> > all
> > > of
> > > > > the
> > > > > > > >>>> inputs to the Union are the same. The Volcano framework is
> > not
> > > > > very
> > > > > > > >> good at
> > > > > > > >>>> that.
> > > > > > > >>>>>
> > > > > > > >>>>> You should be careful of the case that the conditions
> > > overlap.
> > > > > For
> > > > > > > >>>> example, the rewrite
> > > > > > > >>>>>
> > > > > > > >>>>> SELECT * FROM Emp WHERE deptno < 30
> > > > > > > >>>>> UNION ALL
> > > > > > > >>>>> SELECT * FROM Emp WHERE deptno IN (25, 35, 45)
> > > > > > > >>>>>
> > > > > > > >>>>> to
> > > > > > > >>>>>
> > > > > > > >>>>> SELECT * FROM Emp WHERE deptno < 30 OR deptno IN (25, 35,
> > 45)
> > > > > > > >>>>>
> > > > > > > >>>>> Is not valid, because rows with deptno = 25 will appear
> > twice
> > > > in
> > > > > > the
> > > > > > > >>>> first query, once in the second. Maybe that problem does
> not
> > > > occur
> > > > > > > when
> > > > > > > >>>> applied to UNION than when applied to UNION ALL.
> > > > > > > >>>>>
> > > > > > > >>>>> There would seem to be analogous rules for INTERSECT
> > (combine
> > > > the
> > > > > > > >>>> conditions using AND) and EXCEPT (combine the conditions
> > using
> > > > AND
> > > > > > > NOT).
> > > > > > > >>>> Perhaps one rule could cover all set operations (see
> > > > > > > >>>> FilterSetOpTransposeRule).
> > > > > > > >>>>>
> > > > > > > >>>>> Julian
> > > > > > > >>>>>
> > > > > > > >>>>>
> > > > > > > >>>>>
> > > > > > > >>>>>> On Jan 19, 2022, at 2:38 AM, Yanjing Wang <
> > > > > > > [email protected] <mailto:[email protected]>
> > > > > > > >>>> <mailto:[email protected] <mailto:
> > > > > [email protected]
> > > > > > >
> > > > > > > <mailto:[email protected] <mailto:
> > > [email protected]
> > > > > >>>>
> > > > > > > >> wrote:
> > > > > > > >>>>>>
> > > > > > > >>>>>> A simple example is converting SELECT a, b FROM t WHERE
> c
> > =
> > > 1
> > > > > > UNION
> > > > > > > >> ALL
> > > > > > > >>>> SELECT a, b FROM t WHERE c = 2 to SELECT a, b FROM t
> WHERE c
> > > in
> > > > > (1,
> > > > > > 2)
> > > > > > > >>>>>>
> > > > > > > >>>>>> Yanjing Wang <[email protected] <mailto:
> > > > > > > [email protected]> <mailto:
> > > > > > > >> [email protected] <mailto:[email protected]
> >>
> > > > > > <mailto:
> > > > > > > >>>> [email protected] <mailto:
> [email protected]
> > > >>>
> > > > > > > >> 于2022年1月19日周三 18:35写道:
> > > > > > > >>>>>> Hi, community
> > > > > > > >>>>>>
> > > > > > > >>>>>> Here I recommend a new rule for converting UNION ALL sub
> > > plan
> > > > > to a
> > > > > > > >>>> single input with an OR filter, the following is its
> > > conversion
> > > > > > > diagram.
> > > > > > > >>>>>>
> > > > > > > >>>>>>
> > > > > > > >>>>>> The conversion prerequisites are
> > > > > > > >>>>>> 1. left filter range has no intersection with right
> filter
> > > > > range.
> > > > > > > >>>>>> 2. Project and Input Sub Tree must be identical.
> > > > > > > >>>>>>
> > > > > > > >>>>>> The rule will be used when Input Sub Tree is a
> > > > > computing-intensive
> > > > > > > or
> > > > > > > >>>> large IO operation.
> > > > > > > >>>>>>
> > > > > > > >>>>>> I don't know whether the community supports it or not,
> any
> > > > > > > suggestions
> > > > > > > >>>> will be appreciated.
> > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > > >
> > > > > --
> > > > > ~~~~~~~~~~~~~~~
> > > > > no mistakes
> > > > > ~~~~~~~~~~~~~~~~~~
> > > > >
> > > >
> > >
> >
>

Reply via email to