Re: [HACKERS] Re: Improve OR conditions on joined columns (common star schema problem)
On Mon, Mar 18, 2019 at 8:09 AM Noah Misch wrote: > On Tue, Oct 02, 2018 at 10:53:40AM -0400, Tom Lane wrote: > > Noah Misch writes: > > > On Mon, Oct 01, 2018 at 09:32:10PM -0400, Tom Lane wrote: > > >> FWIW, my problem with this patch is that I remain unconvinced of the > basic > > >> correctness of the transform (specifically the unique-ification > approach). > > >> Noah's points would be important to address if we were moving the > patch > > >> towards commit, but I don't see much reason to put effort into it > until > > >> we can think of a way to prove whether that works. > > > > > Not even effort to fix the assertion failures I reported? > > > > If it seemed relevant to the proof-of-correctness problem, I would look > > into it, but ... > I put some hours into theoretical study of the proof, and I didn't find any > holes. When the planner removes "l0 LEFT JOIN r1", it does that because > there's one output row per l0.ctid regardless of the rows of r1. Hence, > deduplicating on (l0.ctid) is equivalent to deduplicating on (l0.ctid, > r1.ctid). In the bad FULL JOIN case, (l0.ctid, r1.ctid) would be > sufficient > as a key, but we're out of luck because removing the join makes us have > only > l0.ctid for some tuples and only r1.ctid for other tuples. > > If PostgreSQL ever gets inner join removal, it would be harder to preserve > this optimization in this form. At that point, perhaps we'd cost the path > that retains the join for the benefit of $SUBJECT. Given the performance > test > results, $SUBJECT may already need a cost-based decision on whether to use > it. > > As for the proof-of-correctness problem, I think the below strategy should be easy to understand. SELECT * FROM any_v WHERE (A OR B OR C). => SELECT * FROM any_v WHERE A UNION ALL SELECT * FROM any_v WHERE B AND !A UNION ALL SELECT * FROM any_v WHERE C AND !A AND !(B AND !A); where !(Expr) means (NOT (expr) OR (EXPR) IS NULL) In this way, there is no duplicated data at the first. Oracle uses a similar way like this[1]. - A shared planner info idea. About the planning time case, suppose we have a query below. SELECT * FROM t1, t2, ... t20 WHERE (t1.a = 1 or t2.a = 1) and xxx; With the current strategy, t3 .. t20 would be planned many times, including base relation like (t3.. t20) and joinrel like (t{3, 4}, t(3,4,5}, t(3,4,5,6}) and so on. I guess all these relations would get the same result at every time. we don't need to do that at every section of the Union ALL case. So an idea of PlannerInfo A can access the planning result of PlannerInfo B comes, this idea doesn't come in my mind for the first time, many cost based transformations may need this. Then I have the following POC in my mind. PlannerInfo { PlannerInfo *shared_planner_info; Relids or_relids; // this field can be improved later. }; With this way, if we find a 'relids' is not related to or_relids. We can grab the planning result from shared_planner_info (in this or-union case, we can set that after we plan the first section of UNION). Then I found this one would not work after planning time partition prune since the relid would change. For example: P (P1, P2) PARTITION BY A Q (Q1, Q2) PARTITION BY A SELECT * FROM t, p, q where (t.a = 1 or p.a = 1). So in the un-transformed case, Q2 would have relid = 7. After the transform, Q2 probably has relid = 6. So basically, I have 4 questions now. 1. Would investing on the shared_planner_info idea be a good idea? 2. Without planning time prune, shall the above design work? 3. Currently I want to use relid to identify a resultset and pathlists which have the planning time prune issue, should we consider other methods? 4. Any other suggestion about to resolve the 'duplicated planning case' besides the shared_planner_info method? [1] https://blogs.oracle.com/optimizer/optimizer-transformations:-or-expansion -- Best Regards Andy Fan (https://www.aliyun.com/)
Re: [HACKERS] Re: Improve OR conditions on joined columns (common star schema problem)
On Tue, Oct 02, 2018 at 10:53:40AM -0400, Tom Lane wrote: > Noah Misch writes: > > On Mon, Oct 01, 2018 at 09:32:10PM -0400, Tom Lane wrote: > >> FWIW, my problem with this patch is that I remain unconvinced of the basic > >> correctness of the transform (specifically the unique-ification approach). > >> Noah's points would be important to address if we were moving the patch > >> towards commit, but I don't see much reason to put effort into it until > >> we can think of a way to prove whether that works. > > > Not even effort to fix the assertion failures I reported? > > If it seemed relevant to the proof-of-correctness problem, I would look > into it, but ... I put some hours into theoretical study of the proof, and I didn't find any holes. When the planner removes "l0 LEFT JOIN r1", it does that because there's one output row per l0.ctid regardless of the rows of r1. Hence, deduplicating on (l0.ctid) is equivalent to deduplicating on (l0.ctid, r1.ctid). In the bad FULL JOIN case, (l0.ctid, r1.ctid) would be sufficient as a key, but we're out of luck because removing the join makes us have only l0.ctid for some tuples and only r1.ctid for other tuples. If PostgreSQL ever gets inner join removal, it would be harder to preserve this optimization in this form. At that point, perhaps we'd cost the path that retains the join for the benefit of $SUBJECT. Given the performance test results, $SUBJECT may already need a cost-based decision on whether to use it.
Re: [HACKERS] Re: Improve OR conditions on joined columns (common star schema problem)
Noah Misch writes: > On Mon, Oct 01, 2018 at 09:32:10PM -0400, Tom Lane wrote: >> FWIW, my problem with this patch is that I remain unconvinced of the basic >> correctness of the transform (specifically the unique-ification approach). >> Noah's points would be important to address if we were moving the patch >> towards commit, but I don't see much reason to put effort into it until >> we can think of a way to prove whether that works. > Not even effort to fix the assertion failures I reported? If it seemed relevant to the proof-of-correctness problem, I would look into it, but ... regards, tom lane
Re: [HACKERS] Re: Improve OR conditions on joined columns (common star schema problem)
On Mon, Oct 01, 2018 at 09:32:10PM -0400, Tom Lane wrote: > Michael Paquier writes: > > On Mon, Sep 03, 2018 at 06:59:10PM -0700, Noah Misch wrote: > >> If you're going to keep this highly-simplified estimate, please expand the > >> comment to say why it doesn't matter or what makes it hard to do better. > >> The > >> non-planunionor.c path for the same query computes its own estimate of the > >> same underlying quantity. Though it may be too difficult in today's > >> system, > >> one could copy the competing path's row count estimate here. Perhaps it > >> doesn't matter because higher-level processing already assumes equal row > >> count > >> among competitors? > > > As there has been no replies to Noah's review for one month, I am > > marking this patch as returned with feedback for now. > > FWIW, my problem with this patch is that I remain unconvinced of the basic > correctness of the transform (specifically the unique-ification approach). > Noah's points would be important to address if we were moving the patch > towards commit, but I don't see much reason to put effort into it until > we can think of a way to prove whether that works. Not even effort to fix the assertion failures I reported?
Re: [HACKERS] Re: Improve OR conditions on joined columns (common star schema problem)
Michael Paquier writes: > On Mon, Sep 03, 2018 at 06:59:10PM -0700, Noah Misch wrote: >> If you're going to keep this highly-simplified estimate, please expand the >> comment to say why it doesn't matter or what makes it hard to do better. The >> non-planunionor.c path for the same query computes its own estimate of the >> same underlying quantity. Though it may be too difficult in today's system, >> one could copy the competing path's row count estimate here. Perhaps it >> doesn't matter because higher-level processing already assumes equal row >> count >> among competitors? > As there has been no replies to Noah's review for one month, I am > marking this patch as returned with feedback for now. FWIW, my problem with this patch is that I remain unconvinced of the basic correctness of the transform (specifically the unique-ification approach). Noah's points would be important to address if we were moving the patch towards commit, but I don't see much reason to put effort into it until we can think of a way to prove whether that works. regards, tom lane
Re: [HACKERS] Re: Improve OR conditions on joined columns (common star schema problem)
On Mon, Sep 03, 2018 at 06:59:10PM -0700, Noah Misch wrote: > If you're going to keep this highly-simplified estimate, please expand the > comment to say why it doesn't matter or what makes it hard to do better. The > non-planunionor.c path for the same query computes its own estimate of the > same underlying quantity. Though it may be too difficult in today's system, > one could copy the competing path's row count estimate here. Perhaps it > doesn't matter because higher-level processing already assumes equal row count > among competitors? As there has been no replies to Noah's review for one month, I am marking this patch as returned with feedback for now. -- Michael signature.asc Description: PGP signature
Re: [HACKERS] Re: Improve OR conditions on joined columns (common star schema problem)
On Sun, Feb 12, 2017 at 09:32:36AM -0800, Tom Lane wrote: > It's not so much poor choices as the cost of the optimization attempt --- > if there's a K-relation OR clause, this will increase the cost of planning > by a factor approaching K+1, whether or not you get a better plan out of > it. I ran the regression tests with some instrumentation and determined > that this logic fires a dozen or two times, and fails to produce a plan > that looks cheaper than the standard plan in any of those cases. So if we > go down this road, not only do we need a GUC but I suspect it had better > default to off; only people using star schemas are really likely to get a > win out of it. I benchmarked using the attached script. Highlights: $ perl mk-bench-union-or.pl --dimensions=11 --tests=const | psql -X # TRAP: FailedAssertion("!(uniq_exprs != ((List *) ((void *)0)))", File: "planunionor.c", Line: 335) $ perl mk-bench-union-or.pl --dimensions=11 --tests=var --exhaustive --values-per-dimension=0 | psql -X # TRAP: FailedAssertion("!(list_length(dest_tlist) == list_length(src_tlist))", File: "tlist.c", Line: 344) $ perl mk-bench-union-or.pl --dimensions=7 --or-clauses=20 --exhaustive --tests=const,rowmark | psql -X ... (with planunionor.c plan) Planning Time: 1902.013 ms Execution Time: 291.629 ms ... (without planunionor.c plan) Planning Time: 11.100 ms Execution Time: 64.452 ms # planning 170x slower, chosen plan 4.5x slower I agree this warrants a GUC, but I propose a goal of making it fitting to enable by default. The is_suitable_join_or_clause() test is a good indicator of promising queries, and I suspect it's cheap enough to run all the time. As a DBA, I'd struggle to predict when is_suitable_join_or_clause() will pass while the optimization as a whole will lose; I would resort to testing each important query both ways. In other words, the GUC would boil down to a planner hint, not to a declaration about the table schema. On Thu, Aug 23, 2018 at 02:10:46PM -0400, Tom Lane wrote: > Rebased up to HEAD, per cfbot nagging. Still no substantive change from > v2. > + * is retty mechanical, but we can't do it until we have a RelOptInfo for > the Jim Nasby had pointed out this s/retty/pretty/ typo. > + void > + finish_union_or_paths(PlannerInfo *root, RelOptInfo *joinrel, > + List *union_or_subpaths) > + { ... > + pathnode = makeNode(UniquePath); ... > + /* Estimate number of output rows */ > + pathnode->path.rows = appendpath->rows; If you're going to keep this highly-simplified estimate, please expand the comment to say why it doesn't matter or what makes it hard to do better. The non-planunionor.c path for the same query computes its own estimate of the same underlying quantity. Though it may be too difficult in today's system, one could copy the competing path's row count estimate here. Perhaps it doesn't matter because higher-level processing already assumes equal row count among competitors? mk-bench-union-or.pl Description: Perl program
Re: [HACKERS] Re: Improve OR conditions on joined columns (common star schema problem)
On Thu, Aug 23, 2018 at 5:20 PM, Peter Geoghegan wrote: > This patch adds an enhancement that is an example of a broader class > of optimizer enhancement primarily aimed at making star-schema queries > have more efficient plans, by arranging to use several independent > nested loop joins based on a common pattern. Each nestloop join has > one particular dimension table on the outer side, and the fact table > on the inner side. Correction: I meant that for each join, the outer side is a scan of some particular fact table index, while the inner side probes some particular dimension table's primary key, and evaluates the dimension's qual. -- Peter Geoghegan
Re: [HACKERS] Re: Improve OR conditions on joined columns (common star schema problem)
On Thu, Aug 23, 2018 at 11:10 AM, Tom Lane wrote: > Rebased up to HEAD, per cfbot nagging. Still no substantive change from > v2. I happened to have the opportunity to talk to Tom about this patch in person. I expressed some very general concerns that are worth repeating publicly. This patch adds an enhancement that is an example of a broader class of optimizer enhancement primarily aimed at making star-schema queries have more efficient plans, by arranging to use several independent nested loop joins based on a common pattern. Each nestloop join has one particular dimension table on the outer side, and the fact table on the inner side. The query plan is not so much a tree as it is a DAG (directed acyclic graph), because the fact table is visited multiple times. (There are already cases in Postgres in which the query plan is technically a DAG, actually, but it could be taken much further.) Aside from being inherently more efficient, DAG-like star schema plans are also *ideal* targets for parallel query. The executor can execute each nested loop join in a parallel worker with minimal contention -- the inner side of each nestloop join all probe a different fact table index to the others. It's almost like executing several different simple queries concurrently, with some serial processing at the end. Even that serial processing can sometimes be minimized by having some of the parallel workers use a Bloom filter in shared memory. Tom is already concerned that the optimization added by this patch may be too much of a special case, which is understandable. It may be that we're failing to identify some greater opportunity to add DAG-like plans for star schema queries. -- Peter Geoghegan
Re: [HACKERS] Re: Improve OR conditions on joined columns (common star schema problem)
I wrote: > [ join-or-to-union-4.patch ] Rebased up to HEAD, per cfbot nagging. Still no substantive change from v2. regards, tom lane diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c index 6269f47..8935503 100644 *** a/src/backend/nodes/outfuncs.c --- b/src/backend/nodes/outfuncs.c *** _outPlannerInfo(StringInfo str, const Pl *** 2310,2315 --- 2310,2316 WRITE_FLOAT_FIELD(limit_tuples, "%.0f"); WRITE_UINT_FIELD(qual_security_level); WRITE_ENUM_FIELD(inhTargetKind, InheritanceKind); + WRITE_BOOL_FIELD(needBaseTids); WRITE_BOOL_FIELD(hasJoinRTEs); WRITE_BOOL_FIELD(hasLateralRTEs); WRITE_BOOL_FIELD(hasDeletedRTEs); diff --git a/src/backend/optimizer/plan/Makefile b/src/backend/optimizer/plan/Makefile index 88a9f7f..1db6bd5 100644 *** a/src/backend/optimizer/plan/Makefile --- b/src/backend/optimizer/plan/Makefile *** top_builddir = ../../../.. *** 13,18 include $(top_builddir)/src/Makefile.global OBJS = analyzejoins.o createplan.o initsplan.o planagg.o planmain.o planner.o \ ! setrefs.o subselect.o include $(top_srcdir)/src/backend/common.mk --- 13,18 include $(top_builddir)/src/Makefile.global OBJS = analyzejoins.o createplan.o initsplan.o planagg.o planmain.o planner.o \ ! planunionor.o setrefs.o subselect.o include $(top_srcdir)/src/backend/common.mk diff --git a/src/backend/optimizer/plan/analyzejoins.c b/src/backend/optimizer/plan/analyzejoins.c index 0e73f9c..fdffb6b 100644 *** a/src/backend/optimizer/plan/analyzejoins.c --- b/src/backend/optimizer/plan/analyzejoins.c *** clause_sides_match_join(RestrictInfo *ri *** 155,160 --- 155,163 * cases, but we don't have the infrastructure to prove them.) We also * have to check that the inner side doesn't generate any variables needed * above the join. + * + * Note: making this significantly smarter might break planunionor.c. + * Study that file before doing so. */ static bool join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo) diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c index b05adc7..4486885 100644 *** a/src/backend/optimizer/plan/planmain.c --- b/src/backend/optimizer/plan/planmain.c *** query_planner(PlannerInfo *root, List *t *** 212,217 --- 212,225 add_placeholders_to_base_rels(root); /* + * Also, if we have a request to emit baserel CTIDs, add those to the + * baserel targetlists now. This likewise has to be done after join + * removal, because we only want CTIDs for rels that survive join removal. + */ + if (root->needBaseTids) + add_base_rel_ctids(root); + + /* * Construct the lateral reference sets now that we have finalized * PlaceHolderVar eval levels. */ diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c index 96bf060..36907a4 100644 *** a/src/backend/optimizer/plan/planner.c --- b/src/backend/optimizer/plan/planner.c *** subquery_planner(PlannerGlobal *glob, Qu *** 623,628 --- 623,629 root->minmax_aggs = NIL; root->qual_security_level = 0; root->inhTargetKind = INHKIND_NONE; + root->needBaseTids = false; root->hasRecursion = hasRecursion; if (hasRecursion) root->wt_param_id = SS_assign_special_param(root); *** grouping_planner(PlannerInfo *root, bool *** 1797,1802 --- 1798,1804 WindowFuncLists *wflists = NULL; List *activeWindows = NIL; grouping_sets_data *gset_data = NULL; + List *union_or_subpaths; standard_qp_extra qp_extra; /* A recursive query should always have setOperations */ *** grouping_planner(PlannerInfo *root, bool *** 1874,1879 --- 1876,1889 preprocess_minmax_aggregates(root, tlist); /* + * Preprocess join OR clauses that might be better handled as UNIONs. + * This likewise needs to be close to the query_planner() call. But + * it doesn't matter which of preprocess_minmax_aggregates() and this + * function we call first, because they treat disjoint sets of cases. + */ + union_or_subpaths = split_join_or_clauses(root); + + /* * Figure out whether there's a hard limit on the number of rows that * query_planner's result subplan needs to return. Even if we know a * hard limit overall, it doesn't apply if the query has any *** grouping_planner(PlannerInfo *root, bool *** 1908,1913 --- 1918,1931 standard_qp_callback, _extra); /* + * If we found any way to convert a join OR clause to a union, finish + * up generating the path(s) for that, and add them into the topmost + * scan/join relation. + */ + if (union_or_subpaths) + finish_union_or_paths(root, current_rel, union_or_subpaths); + + /* * Convert the query's result tlist into PathTarget format. * * Note: it's desirable to
Re: [HACKERS] Re: Improve OR conditions on joined columns (common star schema problem)
On 30 March 2018 at 15:05, Andres Freundwrote: >> + * To allow join removal to happen, we can't reference the CTID column >> + * of an otherwise-removable relation. > > A brief hint why wouldn't hurt. Maybe something like: /* * Join removal is only ever possible when no columns of the to-be-removed relation * are referenced. If we added the CTID here then we could inadvertently disable join * removal. We'll need to delay adding the CTID until after join removal takes place. */ (I've not read the code, just the thread) -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Re: Improve OR conditions on joined columns (common star schema problem)
On 3 February 2018 at 03:26, Tom Lanewrote: > Tomas Vondra writes: >> ISTM this patch got somewhat stuck as we're not quite sure the >> transformation is correct in all cases. Is my impression correct? > > Yeah, that's the core issue. > >> If yes, how to we convince ourselves? Would some sort of automated testing >> (generating data and queries) help? I'm willing to spend some cycles on >> that, if considered helpful. > > I'm not sure if that would be enough to convince doubters. On the other > hand, if it found problems, that would definitely be useful. I've read over this thread and as far as I can see there is concern that the UNION on the ctids might not re-uniquify the rows again. Tom mentioned a problem with FULL JOINs, but the concern appears to have been invalidated due to wrong thinking about how join removals work. As of now, I'm not quite sure who exactly is concerned. Tom thought there was an issue but quickly corrected himself. As far as I see it, there are a few cases we'd need to disable the optimization: 1. Query contains target SRFs (the same row might get duplicated, we don't want to UNION these duplicates out again, they're meant to be there) 2. Query has setops (ditto) 3. Any base rels are not RELKIND_RELATION (we need the ctid to uniquely identify rows) 4. Query has volatile functions (don't want to evaluate volatile functions more times than requested) As far as the DISTINCT clause doing the right thing for joins, I see no issues, even with FULL JOINs. In both branches of the UNION the join condition will be the same so each side of the join always has the same candidate row to join to. I don't think the optimization is possible if there are OR clauses in the join condition, but that's not being proposed. FULL JOINS appear to be fine as the row is never duplicated on a non-match, so there will only be one version of t1.ctid, NULL::tid or NULL::tid, t1.ctid and all ctids in the distinctClauses cannot all be NULL at once. I used the following SQL to help my brain think through this. There are two versions of each query, one with DISTINCT and one without. If the DISTINCT returns fewer rows than the one without then we need to disable this optimization for that case. I've written queries for 3 of the above 4 cases. I saw from reading the thread that case #4 is already disabled: drop table if exists t1,t2; create table t1 (a int); create table t2 (a int); insert into t1 values(1),(1),(2),(4); insert into t2 values(1),(1),(3),(3),(4),(4); select t1.ctid,t2.ctid from t1 full join t2 on t1.a = t2.a; select distinct t1.ctid,t2.ctid from t1 full join t2 on t1.a = t2.a; -- case 1: must disable in face of tSRFs select ctid from (select ctid,generate_Series(1,2) from t1) t; select distinct ctid from (select ctid,generate_Series(1,2) from t1) t; -- case 2: must disable optimization with setops. select ctid from (select ctid from t1 union all select ctid from t1) t; select distinct ctid from (select ctid from t1 union all select ctid from t1) t; -- case 3: must disable if we join to anything other than a RELKIND_RELATION (no ctid) select ctid from (select t1.ctid from t1 inner join (values(1),(1)) x(x) on t1.a = x.x) t; select distinct ctid from (select t1.ctid from t1 inner join (values(1),(1)) x(x) on t1.a = x.x) t; I've not read the patch yet, but I understand what it's trying to achieve. My feelings about the patch are that it would be useful to have. I think if someone needs this then they'll be very happythat we've added it. I also think there should be a GUC to disable it, and it should be enabled through the entire alpha/beta period, and we should consider what the final value for it should be just before RC1. It's a bit sad to exclude foreign tables, and I'm not too sure what hurdles this leaves out for pluggable storage. No doubt we'll need to disable the optimization for those too unless they can provide us with some row identifier. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Re: Improve OR conditions on joined columns (common star schema problem)
Hi, I've only skimmed the thread, looking at the patch on its own. On 2018-01-04 17:50:48 -0500, Tom Lane wrote: > diff --git a/src/backend/optimizer/plan/plaindex ...dd11e72 . > --- a/src/backend/optimizer/plan/planunionor.c > +++ b/src/backend/optimizer/plan/planunionor.c > @@ -0,0 +1,667 @@ > +/*- > + * > + * planunionor.c > + * Consider whether join OR clauses can be converted to UNION queries. > + * > + * The current implementation of the UNION step is to de-duplicate using > + * row CTIDs. Could we skip using the ctid if there's a DISTINCT (or something to that effect) above? We do not need to avoid removing rows that are identical if that's done anyway. > A big limitation is that this only works on plain relations, > + * and not for instance on foreign tables. Another problem is that we can > + * only de-duplicate by sort/unique, not hashing; but that could be fixed > + * if we write a hash opclass for TID. I wonder if an alternative could be some sort of rowid that we invent. It'd not be that hard to introduce an executor node (or do it in projection) that simply counts row and returns that as a column. Together with e.g. range table id that'd be unique. But for that we would need to guarantee that foreign tables / subqueries / ... returned the same result in two scans. We could do so by pushing the data gathering into a CTE, but that'd make this exercise moot. Why can't we ask at least FDWs to return something ctid like? > + * To allow join removal to happen, we can't reference the CTID column > + * of an otherwise-removable relation. A brief hint why wouldn't hurt. > +/* > + * Is query as a whole safe to apply union OR transformation to? > + * This checks relatively-expensive conditions that we don't want to > + * worry about until we've found a candidate OR clause. > + */ > +static bool > +is_query_safe_for_union_or_transform(PlannerInfo *root) > +{ > + Query *parse = root->parse; > + Relids allbaserels; > + ListCell *lc; > + int relid; > + > + /* > + * Must not have any volatile functions in FROM or WHERE (see notes at > + * head of file). > + */ > + if (contain_volatile_functions((Node *) parse->jointree)) > + return false; Hm, are there any SRFs that could be in relevant places? I think we reject them everywhere were they'd be problematic (as targetlist is processed above)? Do you have any plans for this patch at this moment? Greetings, Andres Freund
Re: [HACKERS] Re: Improve OR conditions on joined columns (common star schema problem)
On Fri, Feb 2, 2018 at 4:53 PM, Tom Lanewrote: > Tomas Vondra writes: >> BTW wouldn't it be possible to derive "traditional" proof in relational >> algebra, similarly to other transforms? > > Perhaps. The patch depends on CTID, but you could probably model that > as a primary key in a proof. I'll remind you that commit b648b703 made the TID sort performed by CREATE INDEX CONCURRENTLY over 3 times faster in cases where the sort completes in memory. The simplest way to get a significant portion of that benefit for your approach with sort+unique on CTID would be to add sortsupport with abbreviated keys to the TID btree opclass. -- Peter Geoghegan
Re: [HACKERS] Re: Improve OR conditions on joined columns (common star schema problem)
Tomas Vondrawrites: > BTW wouldn't it be possible to derive "traditional" proof in relational > algebra, similarly to other transforms? Perhaps. The patch depends on CTID, but you could probably model that as a primary key in a proof. regards, tom lane
Re: [HACKERS] Re: Improve OR conditions on joined columns (common star schema problem)
On 02/02/2018 03:26 PM, Tom Lane wrote: > Tomas Vondrawrites: >> ISTM this patch got somewhat stuck as we're not quite sure the >> transformation is correct in all cases. Is my impression correct? > > Yeah, that's the core issue. > >> If yes, how to we convince ourselves? Would some sort of automated testing >> (generating data and queries) help? I'm willing to spend some cycles on >> that, if considered helpful. > > I'm not sure if that would be enough to convince doubters. On the other > hand, if it found problems, that would definitely be useful. > OK, I'll take a stab at this, then. I wasn't really suggesting it might prove definitely prove the transformation is correct - clearly, it can only find counter-examples. But I think it's worth it anyway. BTW wouldn't it be possible to derive "traditional" proof in relational algebra, similarly to other transforms? regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] Re: Improve OR conditions on joined columns (common star schema problem)
Tomas Vondrawrites: > ISTM this patch got somewhat stuck as we're not quite sure the > transformation is correct in all cases. Is my impression correct? Yeah, that's the core issue. > If yes, how to we convince ourselves? Would some sort of automated testing > (generating data and queries) help? I'm willing to spend some cycles on > that, if considered helpful. I'm not sure if that would be enough to convince doubters. On the other hand, if it found problems, that would definitely be useful. regards, tom lane
Re: [HACKERS] Re: Improve OR conditions on joined columns (common star schema problem)
On 01/04/2018 11:50 PM, Tom Lane wrote: > I wrote: >> Jim Nasbywrites: >>> I've verified that the patch still applies and make check-world is clean. > >> Not any more :-(. Here's a v3 rebased over HEAD. No substantive >> change from v2. > > Again rebased up to HEAD; still no substantive changes. > ISTM this patch got somewhat stuck as we're not quite sure the transformation is correct in all cases. Is my impression correct? If yes, how to we convince ourselves? Would some sort of automated testing (generating data and queries) help? I'm willing to spend some cycles on that, if considered helpful. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] Re: Improve OR conditions on joined columns (common star schema problem)
On Tue, Sep 12, 2017 at 9:48 PM, Tom Lanewrote: > Jim Nasby writes: >> I've verified that the patch still applies and make check-world is clean. > > Not any more :-(. Here's a v3 rebased over HEAD. No substantive > change from v2. Patch applies and compiles, and it got no reviews. Moved to CF 2018-01. -- Michael