Re: Uninterruptible long planning of a query with too many WHERE clauses
Alexander Kuzmenkov writes:
> Recently one of our customers encountered a situation when the planning
> of a particular query takes too long (several minutes) and can't be
> interrupted by pg_terminate_backend(). The query and schema are attached
> (this is generated by Zabbix).

Ugh. I hope they aren't expecting actually *good* performance on this
sort of query. Still, O(N^2) behavior isn't nice.

When I first saw your post, I thought maybe the problem was an
unreasonable number of paths, but actually there are only two indexpaths
being considered in choose_bitmap_and(). The problem is that one of them
has 8 quals attached to it, and the code that's sorting through the
quals is O(N^2).

> Our first attempt to fix this was putting these clauses into an rbtree
> or dynahash. This improves the performance, but is not entirely correct.

... depends on how you do it ...

> I settled on a simpler solution: limiting the number of clauses we try
> to uniquely identify. If there are too many, skip the smarter logic that
> requires comparing paths by clauses, and just return the cheapest input
> path from choose_bitmap_and(). The patch is attached.

I think you have the right basic idea, but we don't have to completely
lobotomize the bitmap-and search logic in order to cope with this. This
code is only trying to figure out which paths are potentially redundant,
so for a path with too many quals, we can just deem it not-redundant, as
attached.

A different line of thought is that using equal() to compare quals here
is likely overkill: plain old pointer equality ought to be enough, since
what we are looking for is different indexpaths derived from the same
members of the relation's baserestrictinfo list. In itself, such a
change would not fix the O(N^2) problem; it'd just cut a constant factor
off the big-O multiplier. (A big constant factor, perhaps, but still
just a constant factor.)
However, once we go to pointer equality as the definition, we could
treat the pointer values as scalars and then use hashing or whatever on
them. But this would take a good deal of work, and I think it might be a
net loss for typical not-very-large numbers of quals. Also, I tried just
quickly changing the equal() call to a pointer comparison, and it didn't
seem to make much difference given that I'd already done the attached.
So my feeling is that possibly that'd be worth doing sometime in the
future, but this particular example isn't offering a compelling reason
to do it.

Another thought is that maybe we need a CHECK_FOR_INTERRUPTS call
somewhere in here; but I'm not sure where would be a good place. I'm
not excited about sticking one into classify_index_clause_usage, but
adding one up at the per-path loops would not help for this case.

			regards, tom lane

diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index f295558..7bf29a6 100644
*** a/src/backend/optimizer/path/indxpath.c
--- b/src/backend/optimizer/path/indxpath.c
*** typedef struct
*** 67,72 ****
--- 67,73 ----
  	List	   *quals;			/* the WHERE clauses it uses */
  	List	   *preds;			/* predicates of its partial index(es) */
  	Bitmapset  *clauseids;		/* quals+preds represented as a bitmapset */
+ 	bool		unclassifiable; /* has too many quals+preds to process? */
  } PathClauseUsage;

  /* Callback argument for ec_member_matches_indexcol */
*** choose_bitmap_and(PlannerInfo *root, Rel
*** 1447,1455 ****
  		Path	   *ipath = (Path *) lfirst(l);

  		pathinfo = classify_index_clause_usage(ipath, &clauselist);
  		for (i = 0; i < npaths; i++)
  		{
! 			if (bms_equal(pathinfo->clauseids, pathinfoarray[i]->clauseids))
  				break;
  		}
  		if (i < npaths)
--- 1448,1465 ----
  		Path	   *ipath = (Path *) lfirst(l);

  		pathinfo = classify_index_clause_usage(ipath, &clauselist);
+
+ 		/* If it's unclassifiable, treat it as distinct from all others */
+ 		if (pathinfo->unclassifiable)
+ 		{
+ 			pathinfoarray[npaths++] = pathinfo;
+ 			continue;
+ 		}
+
  		for (i = 0; i < npaths; i++)
  		{
! 			if (!pathinfoarray[i]->unclassifiable &&
! 				bms_equal(pathinfo->clauseids, pathinfoarray[i]->clauseids))
  				break;
  		}
  		if (i < npaths)
*** choose_bitmap_and(PlannerInfo *root, Rel
*** 1484,1489 ****
--- 1494,1503 ----
  	 * For each surviving index, consider it as an "AND group leader", and see
  	 * whether adding on any of the later indexes results in an AND path with
  	 * cheaper total cost than before. Then take the cheapest AND group.
+ 	 *
+ 	 * Note: paths that are either clauseless or unclassifiable will have
+ 	 * empty clauseids, so that they will not be rejected by the clauseids
+ 	 * filter here, nor will they cause later paths to be rejected by it.
  	 */
  	for (i = 0; i < npaths; i++)
  	{
*** classify_index_clause_usage(Path *path,
*** 1711,1716 ****
--- 1725,1745 ----
  	result->preds = NIL;
  	find_indexpath_quals(path, &result->quals, &result->preds);
Re: Connections hang indefinitely while taking a gin index's LWLock buffer_content lock
On Wed, Nov 7, 2018 at 5:46 PM Peter Geoghegan wrote:
> Teodor: Do you think that the issue is fixable? It looks like there
> are serious issues with the design of 218f51584d5 to me. I don't think
> the general "there can't be any inserters at this subtree" thing works
> given that we have to couple buffer locks when moving right for other
> reasons. We call ginStepRight() within ginFinishSplit(), for reasons
> that date back to bug fix commit ac4ab97e from 2013 -- that detail is
> probably important, because it seems to be what breaks the subtree
> design (we don't delete in two phases or anything in GIN).

Ping? This is a thorny problem, and I'd like to get your input soon.

I suspect that reverting 218f51584d5 may be the ultimate outcome, even
though it's a v10 feature.

-- 
Peter Geoghegan
Re: Skylake-S warning
On 2018-11-11 11:29:54 +1300, Thomas Munro wrote:
> On Sat, Nov 10, 2018 at 6:01 PM Andres Freund wrote:
> > I've replaced that with a write barrier / read barrier. I strongly
> > suspect this isn't a problem on the write side in practice (due to the
> > dependent read), but the read side looks much less clear to me. I think
> > explicitly using barriers is much saner these days.
> >
> > Comments?
>
> +1
>
> I said the same over here:
>
> https://www.postgresql.org/message-id/flat/CAEepm%3D1nff0x%3D7i3YQO16jLA2qw-F9O39YmUew4oq-xcBQBs0g%40mail.gmail.com

Hah! Sorry for missing that back then.

I think your patch from back then misses a few things that mine did. But
I also think mine missed the fact that XidCacheRemove is problematic --
I was only thinking of the *reads* of MyPgXact->nxids in
XidCacheRemoveRunningXids(), but you correctly observe that the write is
the problematic case (the reads are OK, because it's the same process as
GetNewTransactionId()).

Greetings,

Andres Freund
Re: Skylake-S warning
On Sat, Nov 10, 2018 at 6:01 PM Andres Freund wrote:
> On 2018-10-05 10:29:55 -0700, Andres Freund wrote:
> > - remove the volatiles from GetSnapshotData(). As we've, for quite a
> >   while now, made sure both lwlocks and spinlocks are proper barriers
> >   they're not needed.
>
> Attached is a patch that removes all the volatiles from procarray.c and
> varsup.c. I'd started to just remove them from GetSnapshotData(), but
> that doesn't seem particularly consistent.
>
> I actually think there's a bit of a correctness problem with the
> previous state - the logic in GetNewTransactionId() relies on volatile
> guaranteeing memory ordering, which it doesn't do:
>
>     /*
>      * Use volatile pointer to prevent code rearrangement; other backends
>      * could be examining my subxids info concurrently, and we don't want
>      * them to see an invalid intermediate state, such as incrementing
>      * nxids before filling the array entry.  Note we are assuming that
>      * TransactionId and int fetch/store are atomic.
>      */
>     volatile PGPROC *myproc = MyProc;
>     volatile PGXACT *mypgxact = MyPgXact;
>
>     if (!isSubXact)
>         mypgxact->xid = xid;
>     else
>     {
>         int nxids = mypgxact->nxids;
>
>         if (nxids < PGPROC_MAX_CACHED_SUBXIDS)
>         {
>             myproc->subxids.xids[nxids] = xid;
>             mypgxact->nxids = nxids + 1;
>         }
>
> I've replaced that with a write barrier / read barrier. I strongly
> suspect this isn't a problem on the write side in practice (due to the
> dependent read), but the read side looks much less clear to me. I think
> explicitly using barriers is much saner these days.
>
> Comments?

+1

I said the same over here:

https://www.postgresql.org/message-id/flat/CAEepm%3D1nff0x%3D7i3YQO16jLA2qw-F9O39YmUew4oq-xcBQBs0g%40mail.gmail.com

-- 
Thomas Munro
http://www.enterprisedb.com
Proving IS NOT NULL inference for ScalarArrayOpExpr's
I've recently been investigating improving our plans for queries like:

SELECT * FROM t WHERE t.foo IN (1, 2..1000);

where the table "t" has a partial index on "foo" where "foo IS NOT
NULL". Currently the planner generates an index [only] scan so long as
the number of items in the IN expression is <= 100, but as soon as you
add the 101st item it reverts to a seq scan. If we add an explicit null
check like:

SELECT * FROM t WHERE t.foo IN (1, 2..1000) AND foo IS NOT NULL;

then we go back to the desired index scan.

This occurs because predtest.c stops expanding ScalarArrayOpExpr's with
constant array arguments into OR trees when the array size is > 100. The
rest of the predicate-proving code then becomes unable to infer that foo
is not null, and therefore the planner cannot prove that the partial
index is correct to use.

(Please pardon technical details in the background below that may be way
off; I don't have a lot of experience with the Postgres codebase yet,
and am still trying to build a mental model of things.)

At first I was imagining having the parser keep track of whether an
array const expr contained any nulls and perhaps adding generated quals
(in an equivalence class?) to allow the planner to easily prove the
index was correct. I'd been going down this track because in my mind the
issue was that the planner needed to verify whether all of the array
elements were not null. But as I started to dig into the predtest.c NOT
NULL proofs and add test cases, I realized that, at least in many normal
op cases, we can safely infer that foo is not null when "foo <op> ANY
(array)" is true even if the array contains null elements.

This is such a simple change that it seems like I must be missing a case
where the above doesn't hold true, but I can't immediately think of any,
and indeed with the attached patch all existing tests pass (including
some additional ones I added for predtest to play around with it). Am I
missing something obvious? Is this a valid approach?
Other outstanding questions:

Should I add additional tests for predtest? It already seems to cover
some null test cases with scalar array ops, but I'd be happy to add more
if desired.

Should I add a test case for the resulting plan with "foo IN (...)" with
an array of more than 100 elements?

Thanks,
James Coleman

saop_is_not_null-v1.patch
Description: Binary data
Re: [HACKERS] Fix performance degradation of contended LWLock on NUMA
On 2018-11-10 20:18:33 +0100, Dmitry Dolgov wrote:
> > On Mon, 2 Jul 2018 at 15:54, Jesper Pedersen wrote:
> > >
> > > The patch from November 27, 2017 still applies (with hunks),
> > >
> > > https://commitfest.postgresql.org/18/1166/
> > >
> > > passes "make check-world" and shows performance improvements.
> > >
> > > Keeping it in "Ready for Committer".
>
> Looks like for some reason this patch is failing to attract committers, any
> idea why? One of the plausible explanations for me is that the patch requires
> some more intensive benchmarking of different workloads and types of lock
> contention to make everyone more confident about it.

Personally, it's twofold:

1) It changes a lot of things, more than I think are strictly necessary
to achieve the goal.

2) While the retry logic is clearly not necessary anymore (it was
introduced when the wait queue was protected by a separate spinlock,
which could not be manipulated atomically together with the lock's
state), there are reasons why it would be advantageous to keep it: My
original patch for changing lwlocks to atomics used lock xadd /
fetch_add to acquire shared locks (just incrementing the #shared bits
after an unlocked check) -- obviously that can cause superfluous
failures for concurrent lock releases. Having the retry logic around
can make that safe.

Using lock xadd to acquire shared locks turns out to be considerably
more efficient -- most locks where the lock state is contended (rather
than just having processes wait) tend to have a very large fraction of
shared lockers. And being able to do such a lock acquisition on a
contended cacheline with just a single locked operation, commonly
without retries, is quite beneficial.

Greetings,

Andres Freund
Re: [HACKERS] Fix performance degradation of contended LWLock on NUMA
> On Mon, 2 Jul 2018 at 15:54, Jesper Pedersen wrote:
> >
> > The patch from November 27, 2017 still applies (with hunks),
> >
> > https://commitfest.postgresql.org/18/1166/
> >
> > passes "make check-world" and shows performance improvements.
> >
> > Keeping it in "Ready for Committer".

Looks like for some reason this patch is failing to attract committers --
any idea why? One plausible explanation for me is that the patch
requires some more intensive benchmarking of different workloads and
types of lock contention to make everyone more confident about it.

But then it's not exactly clear to me what it has to do with NUMA.
Non-uniform memory access effects were not mentioned here even once, and
we're not talking about e.g. migrating memory between nodes -- does it
mean that NUMA here is a stand-in for a multi-core system?

Thinking in this direction, I've performed a few tests of this patch
(using so far only one type of zipfian'ish workload) on my modest
few-core laptop, wondering if I could see any difference at all. To my
surprise, while on "bare" hardware I indeed got some visible performance
improvement, when I tried to run the same tests on Postgres inside a
virtual machine under qemu-kvm, some of them were a few percent slower
in comparison with master (depending on the number of virtual CPUs).
Probably it's related to locks behaving differently with preemptible
vCPUs -- I'll try to perform a few more rounds of testing to make clear
whether it's something significant or not.

Also, I wonder if it makes sense to take a look at [1], where a similar
topic is discussed.

[1]: https://www.postgresql.org/message-id/flat/CAPpHfdsZPS%3Db85cxcqO7YjoH3vtPBKYhVRLkZ_WMVbfZB-3bHA%40mail.gmail.com#a5e6d14c098e95e6a36ebd392f280c7a
Re: proposal: variadic argument support for least, greatest function
On Sat, 10 Nov 2018 at 19:12, Vik Fearing wrote:
> On 08/11/2018 15:59, Pavel Stehule wrote:
> > Hi
> >
> > We can pass variadic arguments as an array to any variadic function.
> > But some of our old variadic functions don't support this feature.
> >
> > We cannot write
> >
> > SELECT least(VARIADIC ARRAY[1,2,3]);
> >
> > The attached patch adds this possibility to the least and greatest
> > functions.
>
> Is there any particular reason you didn't just make least and greatest
> actual functions?

These functions are able to use the most common type and enforce
casting; a generic variadic function cannot do that.

Regards

Pavel

> -- 
> Vik Fearing +33 6 46 75 15 36
> http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
Re: proposal: variadic argument support for least, greatest function
>>>>> "Vik" == Vik Fearing writes:

 >> Attached patch add this possibility to least, greatest functions.

 Vik> Is there any particular reason you didn't just make least and
 Vik> greatest actual functions?

least() and greatest() have some type-unification logic that I don't
think works for actual functions:

create function s(variadic anyarray) returns anyelement
  language sql immutable
  as $$ select min(v) from unnest($1) u(v); $$;

select s(1,2,3);        -- works
select s(1,2,3.0);      -- ERROR: function s(integer, integer, numeric) does not exist
select least(1,2,3.0);  -- works

-- 
Andrew (irc:RhodiumToad)
Re: proposal: variadic argument support for least, greatest function
On 08/11/2018 15:59, Pavel Stehule wrote:
> Hi
>
> We can pass variadic arguments as an array to any variadic function.
> But some of our old variadic functions don't support this feature.
>
> We cannot write
>
> SELECT least(VARIADIC ARRAY[1,2,3]);
>
> The attached patch adds this possibility to the least and greatest
> functions.

Is there any particular reason you didn't just make least and greatest
actual functions?

-- 
Vik Fearing +33 6 46 75 15 36
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
Re: unused/redundant foreign key code
On 09/11/2018 21:37, Daniel Gustafsson wrote:
>> On 8 Aug 2018, at 21:34, Peter Eisentraut wrote:
>>
>> I found some unused and some redundant code in ri_triggers.c that was
>> left around by some previous changes that aimed to optimize away
>> certain trigger invocations. See attached patches.
>
> Both of these patches apply cleanly with minimal offsets, build without
> warnings, and make check passes.
>
> From reading the code and testing, I concur with your findings that
> this is indeed dead code.
>
> +1 on this cleanup; I'm marking this as Ready for Committer.

Committed, thanks.

-- 
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: zheap: a new storage format for PostgreSQL
>> Thanks. Initializing the variable seems like the right fix here.

... I just had a warning when recompiling from the latest sources on
CentOS 7:

labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -O2 -I../../../../src/include -D_GNU_SOURCE -I/usr/include/libxml2 -c -o tpd.o tpd.c
tpd.c: In function 'TPDFreePage':
tpd.c:1003:15: warning: variable 'curblkno' set but not used [-Wunused-but-set-variable]
  BlockNumber curblkno = InvalidBlockNumber;
              ^

Not sure if this is important, but as I could not find anything on this
thread related to it, I thought I'd report it.

Regards,
Daniel
Re: speeding up planning with partitions
On 9 November 2018 at 21:55, Amit Langote wrote:
> v5-0001-Store-inheritance-root-parent-index-in-otherrel-s.patch
>
> Adds an inh_root_parent field that's set in inheritance child otherrel
> RelOptInfos to store the RT index of their root parent
>
> v5-0002-Overhaul-inheritance-update-delete-planning.patch
>
> Patch that adjusts the planner so that inheritance_planner can use
> partition pruning.

I've started looking at these two, but so far have only made it through
0001 and 0002. Here's what I noted down during the review.

0001:

1. Why do we need the new field that this patch adds? I see in 0002 it's
used like:

+ if (childrel->inh_root_parent > 0 &&
+     childrel->inh_root_parent == root->parse->resultRelation)

Would something like...

int relid;

if (childrel->part_schema == NULL &&
    bms_get_singleton_member(childrel->top_parent_relids, &relid) &&
    relid == root->parse->resultRelation)

...not do the trick?

0002:

2. What's wrong with childrel->relids?

+ child_relids = bms_make_singleton(appinfo->child_relid);

3. Why not use childrel->top_parent_relids?

+ top_parent_relids = bms_make_singleton(childrel->inh_root_parent);

4. The following comment in inheritance_make_rel_from_joinlist() implies
that the function will not be called for SELECT, but the comment above
the function does not mention that.

/*
 * For UPDATE/DELETE queries, the top parent can only ever be a table.
 * As a contrast, it could be a UNION ALL subquery in the case of SELECT.
 */
Assert(planner_rt_fetch(top_parent_relid, root)->rtekind == RTE_RELATION);

5. Should the following comment talk about "sub-partitioned tables"
rather than "sub-partitions"?

+ * Sub-partitions have to be processed recursively, because
+ * AppendRelInfos link sub-partitions to their immediate parents, not
+ * the root partitioned table.

6. Can't you just pass childrel->relids and childrel->top_parent_relids
instead of making new ones?

+ child_relids = bms_make_singleton(appinfo->child_relid);
+ Assert(childrel->inh_root_parent > 0);
+ top_parent_relids = bms_make_singleton(childrel->inh_root_parent);
+ translated_joinlist = (List *)
+     adjust_appendrel_attrs_multilevel(root,
+                                       (Node *) joinlist,
+                                       child_relids,
+                                       top_parent_relids);

7. I'm just wondering what your idea is here?

+ /* Reset join planning specific data structures. */
+ root->join_rel_list = NIL;
+ root->join_rel_hash = NULL;

Is there a reason to nullify these? You're not releasing any memory, and
the new structures that will be built won't overlap with the ones built
last round. I don't mean to imply that the code is wrong; it just does
not appear to be particularly right.

8. In regards to:

+ * NB: Do we need to change the child EC members to be marked
+ * as non-child somehow?
+ */
+ childrel->reloptkind = RELOPT_BASEREL;

I know we talked a bit about this before, but this time I put together a
crude patch that runs some code each time we skip an em_is_child == true
EquivalenceMember. The code checks whether any of the em_relids are
RELOPT_BASEREL. What I'm looking for here are places where we
erroneously skip the member when we shouldn't. Running the regression
tests with this patch in place shows a number of problems. Likely I
should only trigger the warning when bms_membership(em->em_relids) ==
BMS_SINGLETON, but it nevertheless appears to highlight various possible
issues. Applying the same on master only appears to show the cases where
em->em_relids isn't a singleton set. I've attached the patch to let you
see what I mean.

-- 
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

verify_em_child.diff
Description: Binary data
Re: Delta Materialized View Refreshes?
Hi folks,

I've shared a new patch against 11.0, which seems to work as expected.
(Message ID 5100c2b3-641b-4a35-86d0-12ed2e618...@qqdd.eu.)

While playing with it, it is actually quite easy to get it confused, and
so I wonder -- is it actually what we want?

For example, if I refresh including a WHERE that filters /out/ some
content presently in the MV, but filters /in/ some new content relating
to those same rows, then we predictably get a failure. Using the
following example MV, 'testview', AS

SELECT test.type, test.message, count(1) AS count
FROM test
GROUP BY test.type, test.message,

then a

refresh materialized view concurrently testview where type = 'main' and count > 2

hits:

ERROR: duplicate key value violates unique constraint "testview_type_message_idx"
DETAIL: Key (type, message)=(main, hello world) already exists.
CONTEXT: SQL statement "INSERT INTO public.testview SELECT (diff.newdata).* FROM pg_temp_3.pg_temp_16390_2 diff WHERE tid IS NULL"

The message can probably be cleaned up, but the root cause is clearly
the fact that REFRESH ... WHERE really needs to be used quite carefully.

I mused about restricting the WHERE clause Vars to allow reference only
to columns that are part of the MV's UNIQUE index. It seems it would
prevent the issue arising in my simple example, but is it always
necessary? And would it be overly restrictive? (For example: would it
prevent people issuing delta refreshes that include clauses that make
the refresh performant -- because perhaps it helps the planner see a
shortcut to speedy execution?)

On a different topic: in implementing it, I noticed that there is
rudimentary code-level support for incremental refreshes (see
Open/CloseMatViewIncrementalMaintenance() and
MatViewIncrementalMaintenanceIsEnabled()), but the facility is not
hook-able.
There's another discussion (Flexible permissions for REFRESH
MATERIALIZED VIEW), and I wonder if a more interesting feature would be
to either allow the incremental-refresh barriers to be hooked by
extensions, or even to offer a fine-grained permission that allows
direct manipulation of data in the MV's underlying table.

As far as I can see, allowing extensions to hook the incremental-refresh
APIs would be trivial. Exposing the same via a fine-grained permission
would certainly be much harder, but it might enable advanced
delta-refresh strategies to emerge that are written in high-level
languages such as PL/pgSQL or Java (etc.) -- that is certainly
desirable.

denty.

-- 
Sent from: http://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html
Re: Patch for Delta Materialized View Refreshes
Hi folks,

I've updated this patch against 11.0, and tidied up a few loose ends.

refresh-mv-where-clause-#2.diff
Description: Binary data

d.

> On 5 Nov 2018, at 18:14, John Dent wrote:
>
> Hi folks,
>
> I failed to post a patch on the thread "Delta Materialized View
> Refreshes?" (Message-ID 1541368916681-0.p...@n3.nabble.com), so I
> figured I'd try again and post directly this time. Hopefully this
> time, it'll make it through. Thanks for your patience.
>
> (Original message follows...)
>
> Hi folks,
>
> I had a crack at this, and it was pretty simple to get something
> working to play around with, and it seems like it might be useful.
>
> I developed it against 10.1, as that's what I happened to be working
> with at the time. The patch is pretty small, and I hoped it would
> apply cleanly against 11. Unfortunately it doesn't, but I doubt the
> issues are substantial. If there is interest in moving this forward,
> I'll update and re-share.
>
> The patch enables pretty much exactly what Jeremy suggests -- something
> like "refresh materialized view concurrently testview where type =
> 'main';" -- with fairly obvious semantics.
>
> Welcome comments on the patch or approach.
>
> denty.
Re: Performance improvements of INSERTs to a partitioned table
On 9 November 2018 at 18:45, Amit Langote wrote:
> As long as queries involve tuple routing that touches multiple not yet
> seen partitions, someone doing conflicting operations directly on
> multiple partitions in a transaction will have to be ready to see
> deadlocks. Maybe we can document that.

Perhaps that's good enough. I guess there's at least a workaround of
doing LOCK TABLE in the CREATE INDEX / TRUNCATE session.

-- 
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services