Hi Justin,
the same table could be used to describe different concepts/classes.

For instance, you could have a single table "Employee" where the column
"job_type" is used to describe the different kinds of jobs (and therefore
employees) you have.

At that point, you could have "White Collar Employee" with subclasses
"Secretary", "Manager", etc., and an ontology2db mapping like:

secretary := select * from employee where job_type=x
...
manager := select * from employee where job_type=y

Then any request involving the "white_collar" concept would rewrite to a
union over the subclasses, this should benefit from this optimization if I
am not mistaken.

Not sure how "IO intensive" such subqueries can be though, but apart from
the toy example, there are complex cases where the subquery cannot benefit
much from indexes and the data can be pretty massive depending on the
scenario.

The problem is that, for complex databases which has been evolving over
several years, the logical concepts from the ontology do not easily match
concepts from ER model of the DB (if one ever existed :)), and in general
when this happens you are also hitting the RDBMs with unforeseen query
patterns for which the data model hasn't been optimised.

Best regards,
Alessandro

On Sun, 23 Jan 2022 at 13:33, Justin Swanhart <greenl...@gmail.com> wrote:

> Hi Alessandro,
>
> That is very fascinating.  Most of those situations won't apply to this
> optimization though, since it requires the tables of the union to be
> identical rather than different tables representing unions of different
> ontologies, that is, this will only apply optimizations when an ontology is
> used more than once in a union.  Unless I am mistaken.
>
> --Justin
>
> On Sun, Jan 23, 2022 at 6:49 AM Alessandro Solimando <
> alessandro.solima...@gmail.com> wrote:
>
> > Hello everyone,
> > SQL queries automatically generated by semantic queries (see [1] for
> > instance) are pretty generous on UNIONs, so I'd say that in this area,
> you
> > will have many practical cases matching the patterns highlighted here.
> >
> > The main idea is that when you have a concept hierarchy in your ontology
> > (isA relationship), where different concepts are generally mapped to
> > different tables (or subparts of a single table selected by changing the
> > filter condition), when you query an ancestor in that hierarchy (say:
> > Person), in order to account for implicit facts, you generally UNION all
> > the subtables (say, Employee, Customer, etc, which are also Person) in
> your
> > final SQL query.
> >
> > Best regards,
> > Alessandro
> >
> > [1] https://en.wikipedia.org/wiki/Ontology-based_data_integration
> >
> > On Sun, 23 Jan 2022 at 08:19, Yanjing Wang <zhuangzixiao...@gmail.com>
> > wrote:
> >
> > > Thanks Albert.
> > >
> > >  you could probably refer to osdi paper "resin general sub query
> fusion"
> > > > which has chapters talking about general union fusion that deals with
> > the
> > > > case Julian pointed out.
> > > >
> > >
> > > The optimization seems very helpful, but it introduces new operators
> and
> > > the rule for new operators may be more complex.
> > > Hi, Community, What do you think?
> > >
> > > Btw, per my understanding cases of union fusion is not that many in
> > > > practice.
> > > >
> > >
> > > In particular situations such as VDT and BI tools which generates sql
> > > according to a fixed pattern we will meet many such union all queries.
> > >
> > > Albert <zinki...@gmail.com> 于2022年1月23日周日 11:22写道:
> > >
> > > > On a more general fusion case, you could probably refer to osdi paper
> > > > "resin general sub query fusion" which has chapters talking about
> > general
> > > > union fusion that deals with the case Julian pointed out.
> > > >
> > > > Btw, per my understanding cases of union fusion is not that many in
> > > > practice. Curious to know the real cases
> > > >
> > > > 在 2022年1月21日星期五,Yanjing Wang <zhuangzixiao...@gmail.com> 写道:
> > > >
> > > > > Hi Julian, I have logged a jira CALCITE-4990
> > > > > <https://issues.apache.org/jira/browse/CALCITE-4990> for this new
> > > rule,
> > > > > later I will submit a pr for this.
> > > > >
> > > > > Julian Hyde <jhyde.apa...@gmail.com> 于2022年1月21日周五 01:37写道:
> > > > >
> > > > > > Process logic sounds great. Please log a Jira case.
> > > > > >
> > > > > > There’s a lot here, so it’s probably wise to split into a simple
> PR
> > > > that
> > > > > > just does the simple case (UNION ALL, no Project) and extend
> later.
> > > > > >
> > > > > > > On Jan 20, 2022, at 7:07 AM, Yanjing Wang <
> > > zhuangzixiao...@gmail.com
> > > > >
> > > > > > wrote:
> > > > > > >
> > > > > > > Thanks Julian and Justin.
> > > > > > >
> > > > > > > What do you think the rule should be called?
> > > > UnionFilterTransposeRule,
> > > > > > >> perhaps?
> > > > > > >>
> > > > > > > SetOpFilterMergeRule?
> > > > > > >
> > > > > > > Maybe that problem does not occur when applied to UNION than
> when
> > > > > applied
> > > > > > >> to UNION ALL.
> > > > > > >
> > > > > > > Yes, This is very important.
> > > > > > >
> > > > > > > There would seem to be analogous rules for INTERSECT (combine
> the
> > > > > > >> conditions using AND) and EXCEPT (combine the conditions using
> > AND
> > > > > NOT).
> > > > > > >>
> > > > > > > Excellent extensions, all the three operators process logic
> are:
> > > > > > >
> > > > > > > For UNION:
> > > > > > > New Filter = left Filter *OR* right Filter.
> > > > > > >
> > > > > > > For INTERSECT:
> > > > > > > New Filter = left Filter *AND* right Filter.
> > > > > > >
> > > > > > > For EXCEPT:
> > > > > > > If left Filter, right Filter have no overlap, transform to left
> > > child
> > > > > > tree,
> > > > > > > Otherwise
> > > > > > > New Filter = left Filter *AND NOT *right Filter.
> > > > > > >
> > > > > > > For UNION ALL:
> > > > > > > Add prerequisites:
> > > > > > > left Filter, right Filter must have no overlap.
> > > > > > >
> > > > > > > For INTERSECT ALL:
> > > > > > > Add prerequisites:
> > > > > > > If left Filter, right Filter have no overlap, transform to
> empty
> > > > > values.
> > > > > > > Otherwise
> > > > > > > the rule can't be applied.
> > > > > > >
> > > > > > > For EXCEPT ALL:
> > > > > > > same for  EXCEPT.
> > > > > > >
> > > > > > >
> > > > > > > work for N-way Union, not just 2-way Union.
> > > > > > >>
> > > > > > > Yes, I will add tests for this.
> > > > > > >
> > > > > > > And I think you should make it work whether or not a Project is
> > > > > present.
> > > > > > >
> > > > > > > Ok, It seems I need construct several different operand match
> > trees
> > > > for
> > > > > > > plan.
> > > > > > >
> > > > > > > Hi Julian, what do you think of the above process logic?
> > > > > > >
> > > > > > > Julian Hyde <jhyde.apa...@gmail.com <mailto:
> > jhyde.apa...@gmail.com
> > > >>
> > > > > > 于2022年1月20日周四 10:18写道:
> > > > > > >
> > > > > > >> Justin,
> > > > > > >>
> > > > > > >> For planning table or index scans, I would recommend using a
> > > single
> > > > > > >> TableScan with a Filter that uses a Sarg, rather than using
> > > multiple
> > > > > > >> TableScans connected by a Union. So I think this rule will be
> > > > useful.
> > > > > > >>
> > > > > > >> But I do agree that this proposed rule is not a “no brainer”.
> It
> > > may
> > > > > not
> > > > > > >> do what people want/expect in all cases, and therefore it
> > probably
> > > > > > should
> > > > > > >> not be enabled it by default.
> > > > > > >>
> > > > > > >> Julian
> > > > > > >>
> > > > > > >>
> > > > > > >>
> > > > > > >>
> > > > > > >>
> > > > > > >>> On Jan 19, 2022, at 3:38 PM, Justin Swanhart <
> > > greenl...@gmail.com>
> > > > > > >> wrote:
> > > > > > >>>
> > > > > > >>> Hi,
> > > > > > >>>
> > > > > > >>> Note that this will negate the optimization that one usually
> is
> > > > > looking
> > > > > > >> for
> > > > > > >>> when writing such queries:
> > > > > > >>>
> > > > > > >>> Select * from TAB where a = 1
> > > > > > >>> UNION ALL
> > > > > > >>> Select * from TAB where b = 1
> > > > > > >>>
> > > > > > >>> In a database with indexes (most databases) this will allow
> > > indexes
> > > > > to
> > > > > > be
> > > > > > >>> used on both the a column and the b column.
> > > > > > >>> Databases with bitmap indexes or without indexes would
> benefit
> > > from
> > > > > the
> > > > > > >>> rule.
> > > > > > >>>
> > > > > > >>> On Wed, Jan 19, 2022 at 4:32 PM Julian Hyde <
> > > > jhyde.apa...@gmail.com
> > > > > > >> <mailto:jhyde.apa...@gmail.com <mailto:jhyde.apa...@gmail.com
> > >>>
> > > > > wrote:
> > > > > > >>>
> > > > > > >>>> Can you log a Jira case for this?
> > > > > > >>>>
> > > > > > >>>> I think you should make your rule work for N-way Union, not
> > just
> > > > > 2-way
> > > > > > >>>> Union. And I think you should make it work whether or not a
> > > > Project
> > > > > is
> > > > > > >>>> present.
> > > > > > >>>>
> > > > > > >>>>> On Jan 19, 2022, at 1:25 PM, Julian Hyde <
> > > jhyde.apa...@gmail.com
> > > > > > <mailto:jhyde.apa...@gmail.com>>
> > > > > > >> wrote:
> > > > > > >>>>>
> > > > > > >>>>> It sounds useful.
> > > > > > >>>>>
> > > > > > >>>>> What do you think the rule should be called?
> > > > > > UnionFilterTransposeRule,
> > > > > > >>>> perhaps?
> > > > > > >>>>>
> > > > > > >>>>> A challenge when writing the rule will be to ensure that
> all
> > of
> > > > the
> > > > > > >>>> inputs to the Union are the same. The Volcano framework is
> not
> > > > very
> > > > > > >> good at
> > > > > > >>>> that.
> > > > > > >>>>>
> > > > > > >>>>> You should be careful of the case that the conditions
> > overlap.
> > > > For
> > > > > > >>>> example, the rewrite
> > > > > > >>>>>
> > > > > > >>>>> SELECT * FROM Emp WHERE deptno < 30
> > > > > > >>>>> UNION ALL
> > > > > > >>>>> SELECT * FROM Emp WHERE deptno IN (25, 35, 45)
> > > > > > >>>>>
> > > > > > >>>>> to
> > > > > > >>>>>
> > > > > > >>>>> SELECT * FROM Emp WHERE deptno < 30 OR deptno IN (25, 35,
> 45)
> > > > > > >>>>>
> > > > > > >>>>> Is not valid, because rows with deptno = 25 will appear
> twice
> > > in
> > > > > the
> > > > > > >>>> first query, once in the second. Maybe that problem does not
> > > occur
> > > > > > when
> > > > > > >>>> applied to UNION than when applied to UNION ALL.
> > > > > > >>>>>
> > > > > > >>>>> There would seem to be analogous rules for INTERSECT
> (combine
> > > the
> > > > > > >>>> conditions using AND) and EXCEPT (combine the conditions
> using
> > > AND
> > > > > > NOT).
> > > > > > >>>> Perhaps one rule could cover all set operations (see
> > > > > > >>>> FilterSetOpTransposeRule).
> > > > > > >>>>>
> > > > > > >>>>> Julian
> > > > > > >>>>>
> > > > > > >>>>>
> > > > > > >>>>>
> > > > > > >>>>>> On Jan 19, 2022, at 2:38 AM, Yanjing Wang <
> > > > > > zhuangzixiao...@gmail.com <mailto:zhuangzixiao...@gmail.com>
> > > > > > >>>> <mailto:zhuangzixiao...@gmail.com <mailto:
> > > > zhuangzixiao...@gmail.com
> > > > > >
> > > > > > <mailto:zhuangzixiao...@gmail.com <mailto:
> > zhuangzixiao...@gmail.com
> > > > >>>>
> > > > > > >> wrote:
> > > > > > >>>>>>
> > > > > > >>>>>> A simple example is converting SELECT a, b FROM t WHERE c
> =
> > 1
> > > > > UNION
> > > > > > >> ALL
> > > > > > >>>> SELECT a, b FROM t WHERE c = 2 to SELECT a, b FROM t WHERE c
> > in
> > > > (1,
> > > > > 2)
> > > > > > >>>>>>
> > > > > > >>>>>> Yanjing Wang <zhuangzixiao...@gmail.com <mailto:
> > > > > > zhuangzixiao...@gmail.com> <mailto:
> > > > > > >> zhuangzixiao...@gmail.com <mailto:zhuangzixiao...@gmail.com>>
> > > > > <mailto:
> > > > > > >>>> zhuangzixiao...@gmail.com <mailto:zhuangzixiao...@gmail.com
> > >>>
> > > > > > >> 于2022年1月19日周三 18:35写道:
> > > > > > >>>>>> Hi, community
> > > > > > >>>>>>
> > > > > > >>>>>> Here I recommend a new rule for converting UNION ALL sub
> > plan
> > > > to a
> > > > > > >>>> single input with an OR filter, the following is its
> > conversion
> > > > > > diagram.
> > > > > > >>>>>>
> > > > > > >>>>>>
> > > > > > >>>>>> The conversion prerequisites are
> > > > > > >>>>>> 1. left filter range has no intersection with right filter
> > > > range.
> > > > > > >>>>>> 2. Project and Input Sub Tree must be identical.
> > > > > > >>>>>>
> > > > > > >>>>>> The rule will be used when Input Sub Tree is a
> > > > computing-intensive
> > > > > > or
> > > > > > >>>> large IO operation.
> > > > > > >>>>>>
> > > > > > >>>>>> I don't know whether the community supports it or not, any
> > > > > > suggestions
> > > > > > >>>> will be appreciated.
> > > > > >
> > > > > >
> > > > >
> > > >
> > > >
> > > > --
> > > > ~~~~~~~~~~~~~~~
> > > > no mistakes
> > > > ~~~~~~~~~~~~~~~~~~
> > > >
> > >
> >
>

Reply via email to