The previous commit is obsolete, Please use the latest commit on branch union_all_fusion_to_single <https://github.com/hannerwang/calcite/tree/union_all_fusion_to_single>, thanks.
Yanjing Wang <[email protected]> 于2022年1月26日周三 12:17写道: > Hi Community, > > I make a commit > <https://github.com/hannerwang/calcite/commit/50fdf37c0c9d560cee3a1ce0d24b83e0346ecd90> > for > UNION ALL Converting, Please review if you are available, thanks. > > Alessandro Solimando <[email protected]> 于2022年1月24日周一 > 00:42写道: > >> Hi Justin, >> the same table could be used to describe different concepts/classes. >> >> For instance, you could have a single table "Employee" where the column >> "job_type" is used to describe the different kinds of jobs (and therefore >> employees) you have. >> >> At that point, you could have "White Collar Employee" with subclasses >> "Secretary", "Manager", etc., and an ontology2db mapping like: >> >> secretary := select * from employee where job_type=x >> ... >> manager := select * from employee where job_type=y >> >> Then any request involving the "white_collar" concept would rewrite to a >> union over the subclasses, this should benefit from this optimization if I >> am not mistaken. >> >> Not sure how "IO intensive" such subqueries can be though, but apart from >> the toy example, there are complex cases where the subquery cannot benefit >> much from indexes and the data can be pretty massive depending on the >> scenario. >> >> The problem is that, for complex databases which has been evolving over >> several years, the logical concepts from the ontology do not easily match >> concepts from ER model of the DB (if one ever existed :)), and in general >> when this happens you are also hitting the RDBMs with unforeseen query >> patterns for which the data model hasn't been optimised. >> >> Best regards, >> Alessandro >> >> On Sun, 23 Jan 2022 at 13:33, Justin Swanhart <[email protected]> >> wrote: >> >> > Hi Alessandro, >> > >> > That is very fascinating. Most of those situations won't apply to this >> > optimization though, since it requires the tables of the union to be >> > identical rather than different tables representing unions of different >> > ontologies, that is, this will only apply optimizations when an >> ontology is >> > used more than once in a union. Unless I am mistaken. >> > >> > --Justin >> > >> > On Sun, Jan 23, 2022 at 6:49 AM Alessandro Solimando < >> > [email protected]> wrote: >> > >> > > Hello everyone, >> > > SQL queries automatically generated by semantic queries (see [1] for >> > > instance) are pretty generous on UNIONs, so I'd say that in this area, >> > you >> > > will have many practical cases matching the patterns highlighted here. >> > > >> > > The main idea is that when you have a concept hierarchy in your >> ontology >> > > (isA relationship), where different concepts are generally mapped to >> > > different tables (or subparts of a single table selected by changing >> the >> > > filter condition), when you query an ancestor in that hierarchy (say: >> > > Person), in order to account for implicit facts, you generally UNION >> all >> > > the subtables (say, Employee, Customer, etc, which are also Person) in >> > your >> > > final SQL query. >> > > >> > > Best regards, >> > > Alessandro >> > > >> > > [1] https://en.wikipedia.org/wiki/Ontology-based_data_integration >> > > >> > > On Sun, 23 Jan 2022 at 08:19, Yanjing Wang <[email protected] >> > >> > > wrote: >> > > >> > > > Thanks Albert. >> > > > >> > > > you could probably refer to osdi paper "resin general sub query >> > fusion" >> > > > > which has chapters talking about general union fusion that deals >> with >> > > the >> > > > > case Julian pointed out. >> > > > > >> > > > >> > > > The optimization seems very helpful, but it introduces new operators >> > and >> > > > the rule for new operators may be more complex. >> > > > Hi, Community, What do you think? >> > > > >> > > > Btw, per my understanding cases of union fusion is not that many in >> > > > > practice. >> > > > > >> > > > >> > > > In particular situations such as VDT and BI tools which generates >> sql >> > > > according to a fixed pattern we will meet many such union all >> queries. >> > > > >> > > > Albert <[email protected]> 于2022年1月23日周日 11:22写道: >> > > > >> > > > > On a more general fusion case, you could probably refer to osdi >> paper >> > > > > "resin general sub query fusion" which has chapters talking about >> > > general >> > > > > union fusion that deals with the case Julian pointed out. >> > > > > >> > > > > Btw, per my understanding cases of union fusion is not that many >> in >> > > > > practice. Curious to know the real cases >> > > > > >> > > > > 在 2022年1月21日星期五,Yanjing Wang <[email protected]> 写道: >> > > > > >> > > > > > Hi Julian, I have logged a jira CALCITE-4990 >> > > > > > <https://issues.apache.org/jira/browse/CALCITE-4990> for this >> new >> > > > rule, >> > > > > > later I will submit a pr for this. >> > > > > > >> > > > > > Julian Hyde <[email protected]> 于2022年1月21日周五 01:37写道: >> > > > > > >> > > > > > > Process logic sounds great. Please log a Jira case. >> > > > > > > >> > > > > > > There’s a lot here, so it’s probably wise to split into a >> simple >> > PR >> > > > > that >> > > > > > > just does the simple case (UNION ALL, no Project) and extend >> > later. >> > > > > > > >> > > > > > > > On Jan 20, 2022, at 7:07 AM, Yanjing Wang < >> > > > [email protected] >> > > > > > >> > > > > > > wrote: >> > > > > > > > >> > > > > > > > Thanks Julian and Justin. >> > > > > > > > >> > > > > > > > What do you think the rule should be called? >> > > > > UnionFilterTransposeRule, >> > > > > > > >> perhaps? >> > > > > > > >> >> > > > > > > > SetOpFilterMergeRule? >> > > > > > > > >> > > > > > > > Maybe that problem does not occur when applied to UNION than >> > when >> > > > > > applied >> > > > > > > >> to UNION ALL. >> > > > > > > > >> > > > > > > > Yes, This is very important. >> > > > > > > > >> > > > > > > > There would seem to be analogous rules for INTERSECT >> (combine >> > the >> > > > > > > >> conditions using AND) and EXCEPT (combine the conditions >> using >> > > AND >> > > > > > NOT). >> > > > > > > >> >> > > > > > > > Excellent extensions, all the three operators process logic >> > are: >> > > > > > > > >> > > > > > > > For UNION: >> > > > > > > > New Filter = left Filter *OR* right Filter. >> > > > > > > > >> > > > > > > > For INTERSECT: >> > > > > > > > New Filter = left Filter *AND* right Filter. >> > > > > > > > >> > > > > > > > For EXCEPT: >> > > > > > > > If left Filter, right Filter have no overlap, transform to >> left >> > > > child >> > > > > > > tree, >> > > > > > > > Otherwise >> > > > > > > > New Filter = left Filter *AND NOT *right Filter. >> > > > > > > > >> > > > > > > > For UNION ALL: >> > > > > > > > Add prerequisites: >> > > > > > > > left Filter, right Filter must have no overlap. >> > > > > > > > >> > > > > > > > For INTERSECT ALL: >> > > > > > > > Add prerequisites: >> > > > > > > > If left Filter, right Filter have no overlap, transform to >> > empty >> > > > > > values. >> > > > > > > > Otherwise >> > > > > > > > the rule can't be applied. >> > > > > > > > >> > > > > > > > For EXCEPT ALL: >> > > > > > > > same for EXCEPT. >> > > > > > > > >> > > > > > > > >> > > > > > > > work for N-way Union, not just 2-way Union. >> > > > > > > >> >> > > > > > > > Yes, I will add tests for this. >> > > > > > > > >> > > > > > > > And I think you should make it work whether or not a >> Project is >> > > > > > present. >> > > > > > > > >> > > > > > > > Ok, It seems I need construct several different operand >> match >> > > trees >> > > > > for >> > > > > > > > plan. >> > > > > > > > >> > > > > > > > Hi Julian, what do you think of the above process logic? >> > > > > > > > >> > > > > > > > Julian Hyde <[email protected] <mailto: >> > > [email protected] >> > > > >> >> > > > > > > 于2022年1月20日周四 10:18写道: >> > > > > > > > >> > > > > > > >> Justin, >> > > > > > > >> >> > > > > > > >> For planning table or index scans, I would recommend using >> a >> > > > single >> > > > > > > >> TableScan with a Filter that uses a Sarg, rather than using >> > > > multiple >> > > > > > > >> TableScans connected by a Union. So I think this rule will >> be >> > > > > useful. >> > > > > > > >> >> > > > > > > >> But I do agree that this proposed rule is not a “no >> brainer”. >> > It >> > > > may >> > > > > > not >> > > > > > > >> do what people want/expect in all cases, and therefore it >> > > probably >> > > > > > > should >> > > > > > > >> not be enabled it by default. >> > > > > > > >> >> > > > > > > >> Julian >> > > > > > > >> >> > > > > > > >> >> > > > > > > >> >> > > > > > > >> >> > > > > > > >> >> > > > > > > >>> On Jan 19, 2022, at 3:38 PM, Justin Swanhart < >> > > > [email protected]> >> > > > > > > >> wrote: >> > > > > > > >>> >> > > > > > > >>> Hi, >> > > > > > > >>> >> > > > > > > >>> Note that this will negate the optimization that one >> usually >> > is >> > > > > > looking >> > > > > > > >> for >> > > > > > > >>> when writing such queries: >> > > > > > > >>> >> > > > > > > >>> Select * from TAB where a = 1 >> > > > > > > >>> UNION ALL >> > > > > > > >>> Select * from TAB where b = 1 >> > > > > > > >>> >> > > > > > > >>> In a database with indexes (most databases) this will >> allow >> > > > indexes >> > > > > > to >> > > > > > > be >> > > > > > > >>> used on both the a column and the b column. >> > > > > > > >>> Databases with bitmap indexes or without indexes would >> > benefit >> > > > from >> > > > > > the >> > > > > > > >>> rule. >> > > > > > > >>> >> > > > > > > >>> On Wed, Jan 19, 2022 at 4:32 PM Julian Hyde < >> > > > > [email protected] >> > > > > > > >> <mailto:[email protected] <mailto: >> [email protected] >> > > >>> >> > > > > > wrote: >> > > > > > > >>> >> > > > > > > >>>> Can you log a Jira case for this? >> > > > > > > >>>> >> > > > > > > >>>> I think you should make your rule work for N-way Union, >> not >> > > just >> > > > > > 2-way >> > > > > > > >>>> Union. And I think you should make it work whether or >> not a >> > > > > Project >> > > > > > is >> > > > > > > >>>> present. >> > > > > > > >>>> >> > > > > > > >>>>> On Jan 19, 2022, at 1:25 PM, Julian Hyde < >> > > > [email protected] >> > > > > > > <mailto:[email protected]>> >> > > > > > > >> wrote: >> > > > > > > >>>>> >> > > > > > > >>>>> It sounds useful. >> > > > > > > >>>>> >> > > > > > > >>>>> What do you think the rule should be called? >> > > > > > > UnionFilterTransposeRule, >> > > > > > > >>>> perhaps? >> > > > > > > >>>>> >> > > > > > > >>>>> A challenge when writing the rule will be to ensure that >> > all >> > > of >> > > > > the >> > > > > > > >>>> inputs to the Union are the same. The Volcano framework >> is >> > not >> > > > > very >> > > > > > > >> good at >> > > > > > > >>>> that. >> > > > > > > >>>>> >> > > > > > > >>>>> You should be careful of the case that the conditions >> > > overlap. >> > > > > For >> > > > > > > >>>> example, the rewrite >> > > > > > > >>>>> >> > > > > > > >>>>> SELECT * FROM Emp WHERE deptno < 30 >> > > > > > > >>>>> UNION ALL >> > > > > > > >>>>> SELECT * FROM Emp WHERE deptno IN (25, 35, 45) >> > > > > > > >>>>> >> > > > > > > >>>>> to >> > > > > > > >>>>> >> > > > > > > >>>>> SELECT * FROM Emp WHERE deptno < 30 OR deptno IN (25, >> 35, >> > 45) >> > > > > > > >>>>> >> > > > > > > >>>>> Is not valid, because rows with deptno = 25 will appear >> > twice >> > > > in >> > > > > > the >> > > > > > > >>>> first query, once in the second. Maybe that problem does >> not >> > > > occur >> > > > > > > when >> > > > > > > >>>> applied to UNION than when applied to UNION ALL. >> > > > > > > >>>>> >> > > > > > > >>>>> There would seem to be analogous rules for INTERSECT >> > (combine >> > > > the >> > > > > > > >>>> conditions using AND) and EXCEPT (combine the conditions >> > using >> > > > AND >> > > > > > > NOT). >> > > > > > > >>>> Perhaps one rule could cover all set operations (see >> > > > > > > >>>> FilterSetOpTransposeRule). >> > > > > > > >>>>> >> > > > > > > >>>>> Julian >> > > > > > > >>>>> >> > > > > > > >>>>> >> > > > > > > >>>>> >> > > > > > > >>>>>> On Jan 19, 2022, at 2:38 AM, Yanjing Wang < >> > > > > > > [email protected] <mailto:[email protected]> >> > > > > > > >>>> <mailto:[email protected] <mailto: >> > > > > [email protected] >> > > > > > > >> > > > > > > <mailto:[email protected] <mailto: >> > > [email protected] >> > > > > >>>> >> > > > > > > >> wrote: >> > > > > > > >>>>>> >> > > > > > > >>>>>> A simple example is converting SELECT a, b FROM t >> WHERE c >> > = >> > > 1 >> > > > > > UNION >> > > > > > > >> ALL >> > > > > > > >>>> SELECT a, b FROM t WHERE c = 2 to SELECT a, b FROM t >> WHERE c >> > > in >> > > > > (1, >> > > > > > 2) >> > > > > > > >>>>>> >> > > > > > > >>>>>> Yanjing Wang <[email protected] <mailto: >> > > > > > > [email protected]> <mailto: >> > > > > > > >> [email protected] <mailto: >> [email protected]>> >> > > > > > <mailto: >> > > > > > > >>>> [email protected] <mailto: >> [email protected] >> > > >>> >> > > > > > > >> 于2022年1月19日周三 18:35写道: >> > > > > > > >>>>>> Hi, community >> > > > > > > >>>>>> >> > > > > > > >>>>>> Here I recommend a new rule for converting UNION ALL >> sub >> > > plan >> > > > > to a >> > > > > > > >>>> single input with an OR filter, the following is its >> > > conversion >> > > > > > > diagram. >> > > > > > > >>>>>> >> > > > > > > >>>>>> >> > > > > > > >>>>>> The conversion prerequisites are >> > > > > > > >>>>>> 1. left filter range has no intersection with right >> filter >> > > > > range. >> > > > > > > >>>>>> 2. Project and Input Sub Tree must be identical. >> > > > > > > >>>>>> >> > > > > > > >>>>>> The rule will be used when Input Sub Tree is a >> > > > > computing-intensive >> > > > > > > or >> > > > > > > >>>> large IO operation. >> > > > > > > >>>>>> >> > > > > > > >>>>>> I don't know whether the community supports it or not, >> any >> > > > > > > suggestions >> > > > > > > >>>> will be appreciated. >> > > > > > > >> > > > > > > >> > > > > > >> > > > > >> > > > > >> > > > > -- >> > > > > ~~~~~~~~~~~~~~~ >> > > > > no mistakes >> > > > > ~~~~~~~~~~~~~~~~~~ >> > > > > >> > > > >> > > >> > >> >
