Hi Community, I make a commit <https://github.com/hannerwang/calcite/commit/50fdf37c0c9d560cee3a1ce0d24b83e0346ecd90> for UNION ALL Converting, Please review if you are available, thanks.
Alessandro Solimando <[email protected]> 于2022年1月24日周一 00:42写道: > Hi Justin, > the same table could be used to describe different concepts/classes. > > For instance, you could have a single table "Employee" where the column > "job_type" is used to describe the different kinds of jobs (and therefore > employees) you have. > > At that point, you could have "White Collar Employee" with subclasses > "Secretary", "Manager", etc., and an ontology2db mapping like: > > secretary := select * from employee where job_type=x > ... > manager := select * from employee where job_type=y > > Then any request involving the "white_collar" concept would rewrite to a > union over the subclasses, this should benefit from this optimization if I > am not mistaken. > > Not sure how "IO intensive" such subqueries can be though, but apart from > the toy example, there are complex cases where the subquery cannot benefit > much from indexes and the data can be pretty massive depending on the > scenario. > > The problem is that, for complex databases which has been evolving over > several years, the logical concepts from the ontology do not easily match > concepts from ER model of the DB (if one ever existed :)), and in general > when this happens you are also hitting the RDBMs with unforeseen query > patterns for which the data model hasn't been optimised. > > Best regards, > Alessandro > > On Sun, 23 Jan 2022 at 13:33, Justin Swanhart <[email protected]> wrote: > > > Hi Alessandro, > > > > That is very fascinating. Most of those situations won't apply to this > > optimization though, since it requires the tables of the union to be > > identical rather than different tables representing unions of different > > ontologies, that is, this will only apply optimizations when an ontology > is > > used more than once in a union. Unless I am mistaken. > > > > --Justin > > > > On Sun, Jan 23, 2022 at 6:49 AM Alessandro Solimando < > > [email protected]> wrote: > > > > > Hello everyone, > > > SQL queries automatically generated by semantic queries (see [1] for > > > instance) are pretty generous on UNIONs, so I'd say that in this area, > > you > > > will have many practical cases matching the patterns highlighted here. > > > > > > The main idea is that when you have a concept hierarchy in your > ontology > > > (isA relationship), where different concepts are generally mapped to > > > different tables (or subparts of a single table selected by changing > the > > > filter condition), when you query an ancestor in that hierarchy (say: > > > Person), in order to account for implicit facts, you generally UNION > all > > > the subtables (say, Employee, Customer, etc, which are also Person) in > > your > > > final SQL query. > > > > > > Best regards, > > > Alessandro > > > > > > [1] https://en.wikipedia.org/wiki/Ontology-based_data_integration > > > > > > On Sun, 23 Jan 2022 at 08:19, Yanjing Wang <[email protected]> > > > wrote: > > > > > > > Thanks Albert. > > > > > > > > you could probably refer to osdi paper "resin general sub query > > fusion" > > > > > which has chapters talking about general union fusion that deals > with > > > the > > > > > case Julian pointed out. > > > > > > > > > > > > > The optimization seems very helpful, but it introduces new operators > > and > > > > the rule for new operators may be more complex. > > > > Hi, Community, What do you think? > > > > > > > > Btw, per my understanding cases of union fusion is not that many in > > > > > practice. > > > > > > > > > > > > > In particular situations such as VDT and BI tools which generates sql > > > > according to a fixed pattern we will meet many such union all > queries. > > > > > > > > Albert <[email protected]> 于2022年1月23日周日 11:22写道: > > > > > > > > > On a more general fusion case, you could probably refer to osdi > paper > > > > > "resin general sub query fusion" which has chapters talking about > > > general > > > > > union fusion that deals with the case Julian pointed out. > > > > > > > > > > Btw, per my understanding cases of union fusion is not that many in > > > > > practice. Curious to know the real cases > > > > > > > > > > 在 2022年1月21日星期五,Yanjing Wang <[email protected]> 写道: > > > > > > > > > > > Hi Julian, I have logged a jira CALCITE-4990 > > > > > > <https://issues.apache.org/jira/browse/CALCITE-4990> for this > new > > > > rule, > > > > > > later I will submit a pr for this. > > > > > > > > > > > > Julian Hyde <[email protected]> 于2022年1月21日周五 01:37写道: > > > > > > > > > > > > > Process logic sounds great. Please log a Jira case. > > > > > > > > > > > > > > There’s a lot here, so it’s probably wise to split into a > simple > > PR > > > > > that > > > > > > > just does the simple case (UNION ALL, no Project) and extend > > later. > > > > > > > > > > > > > > > On Jan 20, 2022, at 7:07 AM, Yanjing Wang < > > > > [email protected] > > > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > Thanks Julian and Justin. > > > > > > > > > > > > > > > > What do you think the rule should be called? > > > > > UnionFilterTransposeRule, > > > > > > > >> perhaps? > > > > > > > >> > > > > > > > > SetOpFilterMergeRule? > > > > > > > > > > > > > > > > Maybe that problem does not occur when applied to UNION than > > when > > > > > > applied > > > > > > > >> to UNION ALL. > > > > > > > > > > > > > > > > Yes, This is very important. > > > > > > > > > > > > > > > > There would seem to be analogous rules for INTERSECT (combine > > the > > > > > > > >> conditions using AND) and EXCEPT (combine the conditions > using > > > AND > > > > > > NOT). > > > > > > > >> > > > > > > > > Excellent extensions, all the three operators process logic > > are: > > > > > > > > > > > > > > > > For UNION: > > > > > > > > New Filter = left Filter *OR* right Filter. > > > > > > > > > > > > > > > > For INTERSECT: > > > > > > > > New Filter = left Filter *AND* right Filter. > > > > > > > > > > > > > > > > For EXCEPT: > > > > > > > > If left Filter, right Filter have no overlap, transform to > left > > > > child > > > > > > > tree, > > > > > > > > Otherwise > > > > > > > > New Filter = left Filter *AND NOT *right Filter. > > > > > > > > > > > > > > > > For UNION ALL: > > > > > > > > Add prerequisites: > > > > > > > > left Filter, right Filter must have no overlap. > > > > > > > > > > > > > > > > For INTERSECT ALL: > > > > > > > > Add prerequisites: > > > > > > > > If left Filter, right Filter have no overlap, transform to > > empty > > > > > > values. > > > > > > > > Otherwise > > > > > > > > the rule can't be applied. > > > > > > > > > > > > > > > > For EXCEPT ALL: > > > > > > > > same for EXCEPT. > > > > > > > > > > > > > > > > > > > > > > > > work for N-way Union, not just 2-way Union. > > > > > > > >> > > > > > > > > Yes, I will add tests for this. > > > > > > > > > > > > > > > > And I think you should make it work whether or not a Project > is > > > > > > present. > > > > > > > > > > > > > > > > Ok, It seems I need construct several different operand match > > > trees > > > > > for > > > > > > > > plan. > > > > > > > > > > > > > > > > Hi Julian, what do you think of the above process logic? > > > > > > > > > > > > > > > > Julian Hyde <[email protected] <mailto: > > > [email protected] > > > > >> > > > > > > > 于2022年1月20日周四 10:18写道: > > > > > > > > > > > > > > > >> Justin, > > > > > > > >> > > > > > > > >> For planning table or index scans, I would recommend using a > > > > single > > > > > > > >> TableScan with a Filter that uses a Sarg, rather than using > > > > multiple > > > > > > > >> TableScans connected by a Union. So I think this rule will > be > > > > > useful. > > > > > > > >> > > > > > > > >> But I do agree that this proposed rule is not a “no > brainer”. > > It > > > > may > > > > > > not > > > > > > > >> do what people want/expect in all cases, and therefore it > > > probably > > > > > > > should > > > > > > > >> not be enabled it by default. > > > > > > > >> > > > > > > > >> Julian > > > > > > > >> > > > > > > > >> > > > > > > > >> > > > > > > > >> > > > > > > > >> > > > > > > > >>> On Jan 19, 2022, at 3:38 PM, Justin Swanhart < > > > > [email protected]> > > > > > > > >> wrote: > > > > > > > >>> > > > > > > > >>> Hi, > > > > > > > >>> > > > > > > > >>> Note that this will negate the optimization that one > usually > > is > > > > > > looking > > > > > > > >> for > > > > > > > >>> when writing such queries: > > > > > > > >>> > > > > > > > >>> Select * from TAB where a = 1 > > > > > > > >>> UNION ALL > > > > > > > >>> Select * from TAB where b = 1 > > > > > > > >>> > > > > > > > >>> In a database with indexes (most databases) this will allow > > > > indexes > > > > > > to > > > > > > > be > > > > > > > >>> used on both the a column and the b column. > > > > > > > >>> Databases with bitmap indexes or without indexes would > > benefit > > > > from > > > > > > the > > > > > > > >>> rule. > > > > > > > >>> > > > > > > > >>> On Wed, Jan 19, 2022 at 4:32 PM Julian Hyde < > > > > > [email protected] > > > > > > > >> <mailto:[email protected] <mailto: > [email protected] > > > >>> > > > > > > wrote: > > > > > > > >>> > > > > > > > >>>> Can you log a Jira case for this? > > > > > > > >>>> > > > > > > > >>>> I think you should make your rule work for N-way Union, > not > > > just > > > > > > 2-way > > > > > > > >>>> Union. And I think you should make it work whether or not > a > > > > > Project > > > > > > is > > > > > > > >>>> present. > > > > > > > >>>> > > > > > > > >>>>> On Jan 19, 2022, at 1:25 PM, Julian Hyde < > > > > [email protected] > > > > > > > <mailto:[email protected]>> > > > > > > > >> wrote: > > > > > > > >>>>> > > > > > > > >>>>> It sounds useful. > > > > > > > >>>>> > > > > > > > >>>>> What do you think the rule should be called? > > > > > > > UnionFilterTransposeRule, > > > > > > > >>>> perhaps? > > > > > > > >>>>> > > > > > > > >>>>> A challenge when writing the rule will be to ensure that > > all > > > of > > > > > the > > > > > > > >>>> inputs to the Union are the same. The Volcano framework is > > not > > > > > very > > > > > > > >> good at > > > > > > > >>>> that. > > > > > > > >>>>> > > > > > > > >>>>> You should be careful of the case that the conditions > > > overlap. > > > > > For > > > > > > > >>>> example, the rewrite > > > > > > > >>>>> > > > > > > > >>>>> SELECT * FROM Emp WHERE deptno < 30 > > > > > > > >>>>> UNION ALL > > > > > > > >>>>> SELECT * FROM Emp WHERE deptno IN (25, 35, 45) > > > > > > > >>>>> > > > > > > > >>>>> to > > > > > > > >>>>> > > > > > > > >>>>> SELECT * FROM Emp WHERE deptno < 30 OR deptno IN (25, 35, > > 45) > > > > > > > >>>>> > > > > > > > >>>>> Is not valid, because rows with deptno = 25 will appear > > twice > > > > in > > > > > > the > > > > > > > >>>> first query, once in the second. Maybe that problem does > not > > > > occur > > > > > > > when > > > > > > > >>>> applied to UNION than when applied to UNION ALL. > > > > > > > >>>>> > > > > > > > >>>>> There would seem to be analogous rules for INTERSECT > > (combine > > > > the > > > > > > > >>>> conditions using AND) and EXCEPT (combine the conditions > > using > > > > AND > > > > > > > NOT). > > > > > > > >>>> Perhaps one rule could cover all set operations (see > > > > > > > >>>> FilterSetOpTransposeRule). > > > > > > > >>>>> > > > > > > > >>>>> Julian > > > > > > > >>>>> > > > > > > > >>>>> > > > > > > > >>>>> > > > > > > > >>>>>> On Jan 19, 2022, at 2:38 AM, Yanjing Wang < > > > > > > > [email protected] <mailto:[email protected]> > > > > > > > >>>> <mailto:[email protected] <mailto: > > > > > [email protected] > > > > > > > > > > > > > > <mailto:[email protected] <mailto: > > > [email protected] > > > > > >>>> > > > > > > > >> wrote: > > > > > > > >>>>>> > > > > > > > >>>>>> A simple example is converting SELECT a, b FROM t WHERE > c > > = > > > 1 > > > > > > UNION > > > > > > > >> ALL > > > > > > > >>>> SELECT a, b FROM t WHERE c = 2 to SELECT a, b FROM t > WHERE c > > > in > > > > > (1, > > > > > > 2) > > > > > > > >>>>>> > > > > > > > >>>>>> Yanjing Wang <[email protected] <mailto: > > > > > > > [email protected]> <mailto: > > > > > > > >> [email protected] <mailto:[email protected] > >> > > > > > > <mailto: > > > > > > > >>>> [email protected] <mailto: > [email protected] > > > >>> > > > > > > > >> 于2022年1月19日周三 18:35写道: > > > > > > > >>>>>> Hi, community > > > > > > > >>>>>> > > > > > > > >>>>>> Here I recommend a new rule for converting UNION ALL sub > > > plan > > > > > to a > > > > > > > >>>> single input with an OR filter, the following is its > > > conversion > > > > > > > diagram. > > > > > > > >>>>>> > > > > > > > >>>>>> > > > > > > > >>>>>> The conversion prerequisites are > > > > > > > >>>>>> 1. left filter range has no intersection with right > filter > > > > > range. > > > > > > > >>>>>> 2. Project and Input Sub Tree must be identical. > > > > > > > >>>>>> > > > > > > > >>>>>> The rule will be used when Input Sub Tree is a > > > > > computing-intensive > > > > > > > or > > > > > > > >>>> large IO operation. > > > > > > > >>>>>> > > > > > > > >>>>>> I don't know whether the community supports it or not, > any > > > > > > > suggestions > > > > > > > >>>> will be appreciated. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > ~~~~~~~~~~~~~~~ > > > > > no mistakes > > > > > ~~~~~~~~~~~~~~~~~~ > > > > > > > > > > > > > > >
