Re: [HACKERS] pg_basebackup vs. Windows and tablespaces
On Sun, May 10, 2015 at 6:01 AM, Andrew Dunstan wrote: > > > > This generally looks good, but I have a couple of questions before I commit it. > > First, why is the new option for the BASE_BACKUP command of the Streaming Replication protocol "TAR"? It seems rather misleading. Shouldn't it be something like "TABLESPACEMAP"? > The reason to keep the new option's name as TAR was that tablespace_map was generated for that format type, but I agree with you that something like "TABLESPACEMAP" suits better, so I have changed it to "TABLESPACE_MAP". Putting '_' in the name makes it somewhat consistent with other names and with the filename it generates with this new option. > Second, these lines in xlog.c seem wrong: > > else if ((ch == '\n' || ch == '\r') && prev_ch == '\\') > str[i-1] = '\n'; > > It looks to me like we should be putting ch in the string, not arbitrarily transforming \r into \n. > You are right, I have changed it as per your suggestion. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com extend_basebackup_to_include_symlink_v7.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] multixacts woes
On Sun, May 10, 2015 at 09:17:58PM -0400, Robert Haas wrote: > On Sun, May 10, 2015 at 1:40 PM, Noah Misch wrote: > > I don't know whether this deserves prompt remediation, but if it does, I > > would > > look no further than the hard-coded 25% figure. We permit users to operate > > close to XID wraparound design limits. GUC maximums force an > > anti-wraparound > > vacuum at no later than 93.1% of design capacity. XID assignment warns at > > 99.5%, then stops at 99.95%. PostgreSQL mandates a larger cushion for > > pg_multixact/offsets, with anti-wraparound VACUUM by 46.6% and a stop at > > 50.0%. Commit 53bb309d2d5a9432d2602c93ed18e58bd2924e15 introduced the > > bulkiest mandatory cushion yet, an anti-wraparound vacuum when > > pg_multixact/members is just 25% full. > > That's certainly one possible approach. I had discounted it because > you can't really get more than a small multiple out of it, but getting > 2-3x more room might indeed be enough to help some people quite a bit. > Just raising the threshold from 25% to say 40% would buy back a > healthy amount. Right. It's fair to assume that the new VACUUM burden would be discountable at a 90+% threshold, because the installations that could possibly find it expensive are precisely those experiencing corruption today. These reports took eighteen months to appear, whereas some corruption originating in commit 0ac5ad5 saw reports within three months. Therefore, sites burning pg_multixact/members proportionally faster than both pg_multixact/offsets and XIDs must be unusual. Bottom line: if we do need to reduce VACUUM burden caused by the commits you cited upthread, we almost certainly don't need more than a 4x improvement. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
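For readers trying to follow the percentages Noah cites, here is a small standalone sketch of the arithmetic. The design capacities (roughly 2^31 usable XIDs, 2^32 multixact IDs, 2^32 member slots) and the warn/stop slack constants are assumptions from memory, not quoted from the thread; only the resulting percentages come from the message above.

#include <stdio.h>

int
main(void)
{
    /* Back-of-envelope check of the cushions discussed above. */
    double xid_capacity    = 2147483648.0;   /* ~2^31 XIDs before wraparound */
    double mxid_capacity   = 4294967296.0;   /* 2^32 multixact IDs (pg_multixact/offsets) */
    double member_capacity = 4294967296.0;   /* 2^32 multixact member slots (pg_multixact/members) */

    printf("XID anti-wraparound vacuum (GUC max 2e9):   %.1f%%\n",
           100.0 * 2000000000.0 / xid_capacity);                 /* ~93.1%  */
    printf("XID assignment warning (~11e6 slack):       %.1f%%\n",
           100.0 * (xid_capacity - 11000000.0) / xid_capacity);  /* ~99.5%  */
    printf("XID assignment stop (~1e6 slack):           %.2f%%\n",
           100.0 * (xid_capacity - 1000000.0) / xid_capacity);   /* ~99.95% */
    printf("offsets anti-wraparound vacuum (GUC max):   %.1f%%\n",
           100.0 * 2000000000.0 / mxid_capacity);                /* ~46.6%  */
    printf("members anti-wraparound vacuum threshold:   %.1f%%\n",
           100.0 * 1073741824.0 / member_capacity);              /* 25%     */
    return 0;
}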
Re: Custom/Foreign-Join-APIs (Re: [HACKERS] [v9.5] Custom Plan API)
> On 2015-05-10 21:26:26 -0400, Robert Haas wrote: > > On Sun, May 10, 2015 at 8:41 PM, Tom Lane wrote: > > > > This commit reverts create_plan_recurse() as static function. > > > Yes. I am not convinced that external callers should be calling that, > > > and would prefer not to enlarge createplan.c's API footprint without a > > > demonstration that this is right and useful. (This is one of many > > > ways in which this patch is suffering from having gotten committed > > > without submitted use-cases.) > > Wasn't there a submitted use case? IIRC Kaigai had referenced some > pg-strom (?) code using it? > > I'm failing to see how create_plan_recurse() being exposed externally is > related to "having gotten committed without submitted use-cases". Even > if submitted, presumably as simple as possible code, doesn't use it, > that's not a proof that less simple code does not need it. > Yes, the PG-Strom code uses create_plan_recurse() to construct the child plan nodes of the GPU-accelerated custom-join logic once it gets chosen. There is nothing special here: it calls create_plan_recurse() on the underlying inner/outer paths, just as the built-in join paths do. It is not difficult to submit as a working example; however, its total code size (excluding GPU code) is about 25K lines at this moment. I'm not certain whether it is a simple example. > > Your unwillingness to make functions global or to stick PGDLLIMPORT > > markings on variables that people want access to is hugely > > handicapping extension authors. Many people have complained about > > that on multiple occasions. Frankly, I find it obstructionist and > > petty. > > While I don't find the tone of the characterization super helpful, I do > tend to agree that we're *far* too conservative on that end. I've now > seen a significant number of extensions that copied large swathes of code > just to cope with individual functions not being available. And even > cases where that led to minor forks with such details changed. > I may have to join their ranks? > I know that I'm "fearful" of asking for functions being made > public. Because it'll invariably get into a discussion of merits that's > completely out of proportion with the size of the change. And if I, who > has been on the list for a while now, am "afraid" in that way, you can > be sure that others won't even dare to ask, let alone argue their way > through. > > I think the problem is that during development the default often is to > create functions as static if they're used only in one file. Which is > fine. But it really doesn't work if it's a larger battle to change > single instances. Besides the pain of having to wait for the next > major release... > Thanks, -- NEC Business Creation Division / PG-Strom Project KaiGai Kohei -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] feature freeze and beta schedule
On 2015-05-01 18:37:23 +0200, Andres Freund wrote: > * Multivariate statistics > This is not intended to be committed this CF. > => I'd like to mark this as returned with (little) feedback. > > * Avoiding plan disasters with LIMIT > I'm not enthused by the approach, though it's disabled by default. So > it might not be too bad to just do it. Would probably have been a good > idea to discuss the patch in a separate thread. > => ? > > * Turning off HOT for larger SQL queries > Seems to have degenerated into a discussion of not really related > things. I personally would vote for committing something close to what > Simon proposed last *directly at the beginning* of the next cycle. > => Move? > * Unique Joins > This seems to require more work and came in pretty late > => Returned with feedback. > > * INNER JOIN removals > Seems far too controversial to consider committing in 9.5. > => Returned (or even rejected :() > * Async execution of postgres_fdw. > Later iterations of the patch haven't gotten much review yet. The > original version of the patch is just from 2014-12-15. > => Should imo be moved to the next CF > > * Allow "snapshot too old" error, to prevent bloat > > http://archives.postgresql.org/message-id/1361166406.1897609.1424371443904.JavaMail.yahoo%40mail.yahoo.com > talked about a new version that afaics never materialized > => Returned with feedback > * Parallel Seq scan > In my opinion the topic has progressed greatly. But at the same time > it doesn't seem like it's in a state we should consider for 9.5. > => Return? > * logical column ordering (WIP) > This pretty clearly isn't 9.5 material. > => Return > * Support ORDER BY in CREATE FUNCTION for Set Returning Functions > Uhm. I think the outcome of the discussion so far wasn't really > favorable to the ideas proposed. > => Rejected Marked as such. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: Custom/Foreign-Join-APIs (Re: [HACKERS] [v9.5] Custom Plan API)
On 2015-05-10 22:51:33 -0400, Robert Haas wrote: > > And there's definitely some things > > around that pretty much only still exist because changing them would > > break too much stuff. > > Such as what? Without even thinking about it: * linitial vs lfirst vs lnext. That thing still induces an impedance mismatch when reading code for me, and I believe for a good number of other people. * Two 'string buffer' APIs with essentially only minor differences. * A whole bunch of libpq APIs. Admittedly that's a bit more exposed than lots of backend-only things. * The whole V0 calling convention that makes it so much easier to get odd crashes. Admittedly that's all I could come up with without having to think. But I do vaguely remember a lot of things we did not do because of bwcompat concerns. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
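For anyone not steeped in the backend, a minimal sketch of the pg_list.h idiom Andres's first bullet refers to, just to illustrate the linitial/lfirst/lnext naming mix; it assumes the usual backend headers and 9.x-era signatures, and walk_clauses is a made-up function name.

/* Hypothetical backend fragment; assumes postgres.h and nodes/pg_list.h. */
static void
walk_clauses(List *clauses)
{
    ListCell   *lc;

    /* linitial() fetches the first element of the List itself ... */
    Node       *first = (Node *) linitial(clauses);

    /* ... while lfirst() fetches the element a given ListCell points at,
     * and the foreach() macro uses lnext() underneath to advance the cell. */
    foreach(lc, clauses)
    {
        Node   *clause = (Node *) lfirst(lc);

        (void) clause;      /* do something with each clause here */
    }
    (void) first;
}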
Re: Custom/Foreign-Join-APIs (Re: [HACKERS] [v9.5] Custom Plan API)
On Sun, May 10, 2015 at 10:37 PM, Andres Freund wrote: > On 2015-05-10 21:53:45 -0400, Robert Haas wrote: >> Please name EVEN ONE instance in which core development has been >> prevented for fear of changing a function API. > > Even *moving* function declarations to a different file has been loudly > and repeatedly complained about... Moving declarations is a lot more likely to break compiles than adding declarations. But even the 9.3 header file reorganizations, which broke enough compiles to be annoying, were only annoying, not a serious problem for anyone. I doubted whether that stuff was worth changing, but that's just because I don't really get excited about partial recompiles. > And there's definitely some things > around that pretty much only still exist because changing them would > break too much stuff. Such as what? > But. > > I don't think that's a reason to not expose more functions > externally. Because the usual consequence of not exposing them is that > either ugly workarounds will be found, or code will just be copy-pasted > around. That's not in any way better, and quite likely to be worse. Yes. > I'm not saying that we shouldn't use judgement, but I do think that the > current approach ridicules our vaunted extensibility in many cases. Double yes. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Async execution of postgres_fdw.
Hello. > I've gone ahead and marked this as Rejected. The concept of async > execution of postgres_fdw is certainly still open and a worthwhile goal, > but this implementation isn't the way to achieve that. It sounds fair. I'm satisfied that we have agreed that the goal is worthwhile. I'll show other implementations sooner. Thank you. > * Tom Lane (t...@sss.pgh.pa.us) wrote: > > Stephen Frost writes: > > > I'm all for improving performance of postgres_fdw and would like to see > > > us support sending queries off to be worked asyncronously, but starting > > > execution on the remote server during ExecInitNode is against the > > > documentated FDW API spec. I discussed exactly this issue over a year > > > ago here: > > > > > http://www.postgresql.org/message-id/20131104032604.gb2...@tamriel.snowman.net > > > > > Sadly, there weren't any direct responses to that email, but I do recall > > > having a discussion on another thread (or in person?) with Tom where we > > > ended up agreeing that we can't simply remove that requirement from the > > > docs or the API. > > > > Yeah. There are at least a couple of reasons why not: > > Thanks for the reminders of those. > > > Also, so far as a quick review of the actual patch goes, I would really > > like to see this lose the "PFC" wrapper layer, which accounts for 95% of > > the code churn in the patch and doesn't seem to add any actual value. > > What it does add is unchecked malloc failure conditions. > > Agreed, the wrapper isn't doing anything particularly useful; I had > started out thinking that would be my first comment until it became > clear where all the performance improvement was coming from. > > I've gone ahead and marked this as Rejected. The concept of async > execution of postgres_fdw is certainly still open and a worthwhile goal, > but this implementation isn't the way to achieve that. -- Kyotaro Horiguchi NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: Custom/Foreign-Join-APIs (Re: [HACKERS] [v9.5] Custom Plan API)
On 2015-05-10 21:53:45 -0400, Robert Haas wrote: > Please name EVEN ONE instance in which core development has been > prevented for fear of changing a function API. Even *moving* function declarations to a different file has been loudly and repeatedly complained about... And there's definitely some things around that pretty much only still exist because changing them would break too much stuff. But. I don't think that's a reason to not expose more functions externally. Because the usual consequence of not exposing them is that either ugly workarounds will be found, or code will just be copy-pasted around. That's not in any way better, and quite likely to be worse. I'm not saying that we shouldn't use judgement, but I do think that the current approach ridicules our vaunted extensibility in many cases. Greetings, Andres Freund -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: Custom/Foreign-Join-APIs (Re: [HACKERS] [v9.5] Custom Plan API)
On 2015-05-10 21:26:26 -0400, Robert Haas wrote: > On Sun, May 10, 2015 at 8:41 PM, Tom Lane wrote: > > > This commit reverts create_plan_recurse() as static function. > > Yes. I am not convinced that external callers should be calling that, > > and would prefer not to enlarge createplan.c's API footprint without a > > demonstration that this is right and useful. (This is one of many > > ways in which this patch is suffering from having gotten committed > > without submitted use-cases.) Wasn't there a submitted use case? IIRC Kaigai had referenced some pg-strom (?) code using it? I'm failing to see how create_plan_recurse() being exposed externally is related to "having gotten committed without submitted use-cases". Even if submitted, presumably as simple as possible code, doesn't use it, that's not a proof that less simple code does not need it. > Your unwillingness to make functions global or to stick PGDLLIMPORT > markings on variables that people want access to is hugely > handicapping extension authors. Many people have complained about > that on multiple occasions. Frankly, I find it obstructionist and > petty. While I don't find the tone of the characterization super helpful, I do tend to agree that we're *far* too conservative on that end. I've now seen a significant number of extensions that copied large swathes of code just to cope with individual functions not being available. And even cases where that led to minor forks with such details changed. I know that I'm "fearful" of asking for functions being made public. Because it'll invariably get into a discussion of merits that's completely out of proportion with the size of the change. And if I, who has been on the list for a while now, am "afraid" in that way, you can be sure that others won't even dare to ask, let alone argue their way through. I think the problem is that during development the default often is to create functions as static if they're used only in one file. Which is fine. But it really doesn't work if it's a larger battle to change single instances. Besides the pain of having to wait for the next major release... Greetings, Andres Freund -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: Custom/Foreign-Join-APIs (Re: [HACKERS] [v9.5] Custom Plan API)
On Sun, May 10, 2015 at 9:34 PM, Tom Lane wrote: > Robert Haas writes: >> Your unwillingness to make functions global or to stick PGDLLIMPORT >> markings on variables that people want access to is hugely >> handicapping extension authors. Many people have complained about >> that on multiple occasions. Frankly, I find it obstructionist and >> petty. > > Sure, we could export every last static function in the core code, > and extension authors would rejoice ... while development on the core > code basically stops for fear of breaking extensions. It's important > not to export things that we don't have to, especially when doing so > is really just a quick-n-dirty substitute for doing things properly. Please name EVEN ONE instance in which core development has been prevented for fear of changing a function API. Sure, we take those things into consideration, like trying to ensure that there will be type conflicts until people update their code, but I cannot recall a single instance in six and a half years of working on this project where that's been a real problem. I think this concern is entirely hypothetical. Besides, no one has ever proposed making every static function public. It's been proposed a handful of times for limited classes of functions - in this case ONE - and you've fought it every time despite clear consensus on the other side. I find that highly regrettable and I'm very sure I'm not the only one. I notice that you carefully didn't answer the other part of my question: what gives you the right to revert my commits without discussion or consensus, and do I have an equal right to change it back? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] "unaddressable bytes" in BRIN
Andres Freund just forwarded me a valgrind error report that Peter Geoghegan noticed: ==29892== Unaddressable byte(s) found during client check request ==29892==at 0x7D1317: PageAddItem (bufpage.c:314) ==29892==by 0x468106: brin_doinsert (brin_pageops.c:315) ==29892==by 0x4671A5: form_and_insert_tuple (brin.c:1178) ==29892==by 0x466006: brinbuildCallback (brin.c:596) ==29892==by 0x53F6E4: IndexBuildHeapRangeScan (index.c:2548) ==29892==by 0x53EC19: IndexBuildHeapScan (index.c:2161) ==29892==by 0x466443: brinbuild (brin.c:694) ==29892==by 0x92F09F: OidFunctionCall3Coll (fmgr.c:1649) ==29892==by 0x53E924: index_build (index.c:2024) ==29892==by 0x53D5FC: index_create (index.c:1099) ==29892==by 0x60B3B7: DefineIndex (indexcmds.c:605) ==29892==by 0x7E2142: ProcessUtilitySlow (utility.c:1203) ==29892== Address 0xccffd86 is 5,270 bytes inside a block of size 8,192 alloc'd ==29892==at 0x4C2AB80: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so) ==29892==by 0x950425: AllocSetAlloc (aset.c:847) ==29892==by 0x9530A5: palloc (mcxt.c:821) ==29892==by 0x4C6A32: heap_tuple_untoast_attr (tuptoaster.c:215) ==29892==by 0x9302F2: pg_detoast_datum (fmgr.c:2238) ==29892==by 0x8794C8: numeric_lt (numeric.c:2060) ==29892==by 0x92E211: FunctionCall2Coll (fmgr.c:1323) ==29892==by 0x46C441: brin_minmax_add_value (brin_minmax.c:113) ==29892==by 0x92E49C: FunctionCall4Coll (fmgr.c:1375) ==29892==by 0x466108: brinbuildCallback (brin.c:618) ==29892==by 0x53F6E4: IndexBuildHeapRangeScan (index.c:2548) ==29892==by 0x53EC19: IndexBuildHeapScan (index.c:2161) What I think this means is that during an index build brin_minmax_add_value() called numeric_lt() which detoasted one of its input values; later, brin_doinsert() inserts a tuple containing the value, but it tries to use more bytes than were allocated. I haven't had time to actually study what is going on here, but wanted to archive this publicly. (Value detoasting evidently plays a role here, but I don't know how.) -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
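The "client check request" wording in that report comes from an explicit Valgrind annotation rather than from an ordinary memory access. A rough illustration of the pattern follows, using Valgrind's public memcheck.h macro; the function name and the exact placement inside PageAddItem are assumptions here, not copied from bufpage.c.

#include <stddef.h>
#include <string.h>
#include <valgrind/memcheck.h>

/* Hypothetical sketch: the caller hands over an item whose claimed size
 * exceeds what was actually allocated for it (for instance a detoasted
 * datum whose length header no longer matches the palloc'd chunk).  The
 * explicit check reports the problem at the call site instead of at some
 * later memcpy deep inside the page machinery. */
static void
page_add_item_sketch(char *page, size_t page_offset, const char *item, size_t size)
{
    /* This is what produces "Unaddressable byte(s) found during client
     * check request" when [item, item + size) is not fully allocated. */
    VALGRIND_CHECK_MEM_IS_DEFINED(item, size);

    memcpy(page + page_offset, item, size);
}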
Re: [HACKERS] Manipulating complex types as non-contiguous structures in-memory
On 2015-05-10 21:09:14 -0400, Tom Lane wrote: > Andres Freund writes: > > I'm not sure what exactly to use as a performance benchmark > > here. For now I chose > > SELECT * FROM (SELECT ARRAY(SELECT generate_series(1, 1))) d, > > generate_series(1, 1000) repeat(i); > > that'll hit array_out, which uses iterators. > > Hmm, probably those results are swamped by I/O functions though. I did check with a quick profile, and the iteration itself is a significant part of the total execution time. > I'd suggest trying something that exercises array_map(), which > it looks like means doing an array coercion. Perhaps like so: > do $$ > declare a int4[]; > x int; > begin > a := array(select generate_series(1,1000)); > for i in 1..10 loop > x := array_length(a::int8[], 1); > end loop; > end$$; with the loop count set to 1 instead, I get: before: tps = 20.940092 (including connections establishing) after: tps = 20.568730 (including connections establishing) Greetings, Andres Freund -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: Custom/Foreign-Join-APIs (Re: [HACKERS] [v9.5] Custom Plan API)
Robert Haas writes: > Your unwillingness to make functions global or to stick PGDLLIMPORT > markings on variables that people want access to is hugely > handicapping extension authors. Many people have complained about > that on multiple occasions. Frankly, I find it obstructionist and > petty. Sure, we could export every last static function in the core code, and extension authors would rejoice ... while development on the core code basically stops for fear of breaking extensions. It's important not to export things that we don't have to, especially when doing so is really just a quick-n-dirty substitute for doing things properly. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: Custom/Foreign-Join-APIs (Re: [HACKERS] [v9.5] Custom Plan API)
On Sun, May 10, 2015 at 8:41 PM, Tom Lane wrote: > Kohei KaiGai writes: >> I briefly checked your updates. >> Even though it is not described in the commit-log, I noticed a >> problematic change. > >> This commit reverts create_plan_recurse() as static function. > > Yes. I am not convinced that external callers should be calling that, > and would prefer not to enlarge createplan.c's API footprint without a > demonstration that this is right and useful. (This is one of many > ways in which this patch is suffering from having gotten committed > without submitted use-cases.) I really think that reverting somebody else's committed change without discussion is inappropriate. If I don't like the fact that you reverted this change, can I go revert it back? Your unwillingness to make functions global or to stick PGDLLIMPORT markings on variables that people want access to is hugely handicapping extension authors. Many people have complained about that on multiple occasions. Frankly, I find it obstructionist and petty. If you want to improve the design of this so that it does the same things more elegantly, fine: I'll get out of the way. If you just want to make things impossible that the patch previously made possible, I strongly object to that. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] multixacts woes
On Sun, May 10, 2015 at 1:40 PM, Noah Misch wrote: > I don't know whether this deserves prompt remediation, but if it does, I would > look no further than the hard-coded 25% figure. We permit users to operate > close to XID wraparound design limits. GUC maximums force an anti-wraparound > vacuum at no later than 93.1% of design capacity. XID assignment warns at > 99.5%, then stops at 99.95%. PostgreSQL mandates a larger cushion for > pg_multixact/offsets, with anti-wraparound VACUUM by 46.6% and a stop at > 50.0%. Commit 53bb309d2d5a9432d2602c93ed18e58bd2924e15 introduced the > bulkiest mandatory cushion yet, an anti-wraparound vacuum when > pg_multixact/members is just 25% full. That's certainly one possible approach. I had discounted it because you can't really get more than a small multiple out of it, but getting 2-3x more room might indeed be enough to help some people quite a bit. Just raising the threshold from 25% to say 40% would buy back a healthy amount. Or, as you suggest, we could just add a GUC. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Manipulating complex types as non-contiguous structures in-memory
Andres Freund writes: > I'm not sure what exactly to use as a performance benchmark > here. For now I chose > SELECT * FROM (SELECT ARRAY(SELECT generate_series(1, 1))) d, > generate_series(1, 1000) repeat(i); > that'll hit array_out, which uses iterators. Hmm, probably those results are swamped by I/O functions though. I'd suggest trying something that exercises array_map(), which it looks like means doing an array coercion. Perhaps like so: do $$ declare a int4[]; x int; begin a := array(select generate_series(1,1000)); for i in 1..10 loop x := array_length(a::int8[], 1); end loop; end$$; Anyway, thanks for poking at it! regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: Custom/Foreign-Join-APIs (Re: [HACKERS] [v9.5] Custom Plan API)
> Kohei KaiGai writes: > > I briefly checked your updates. > > Even though it is not described in the commit-log, I noticed a > > problematic change. > > > This commit reverts create_plan_recurse() as static function. > > Yes. I am not convinced that external callers should be calling that, > and would prefer not to enlarge createplan.c's API footprint without a > demonstration that this is right and useful. (This is one of many > ways in which this patch is suffering from having gotten committed > without submitted use-cases.) > Hmm. I see it is an intentional change. > > It means extension > > cannot have child node, even if it wants to add a custom-join logic. > > Please assume a simple case below: > > SELECT * FROM t0, t1 WHERE t0.a = t1.x; > > > An extension adds a custom join path, then its PlanCustomPath method will be > > called back to create a plan node once it gets chosen by planner. > > The role of PlanCustomPath is to construct a plan-node of itself, and > > plan-nodes > > of the source relations also. > > If create_plan_recurse() is static, we have no way to initialize > > plan-node for t0 > > and t1 scan even if join-logic itself is powerful than built-in ones. > > I find this argument quite unconvincing, because even granting that this > is an appropriate way to create child nodes of a CustomScan, there is a > lot of core code besides createplan.c that would not know about those > child nodes either. > > As a counterexample, suppose that your cool-new-join-method is capable of > joining three inputs at once. You could stick two of the children into > lefttree and righttree perhaps, but where are you gonna put the other? > > This comes back to the fact that trying to wedge join behavior into scan > node types was a pretty bad idea (as evidenced by the entirely bogus > decision that now scanrelid can be zero, which I rather doubt you've found > all the places that that broke). Really there should have been a new > CustomJoin node or something like that. If we'd done that, it would be > possible to design that node type more like Append, with any number of > child nodes. And we could have set things up so that createplan.c knows > about those child nodes and takes care of the recursion for you; it would > still not be a good idea to expose create_plan_recurse and hope that > callers of that would know how to use it correctly. > > I am still more than half tempted to say we should revert this entire > patch series and hope for a better design to be submitted for 9.6. > In the meantime, though, poking random holes in the modularity of core > code is a poor substitute for having designed a well-thought-out API. > > A possible compromise that we could perhaps still wedge into 9.5 is to > extend CustomPath with a List of child Paths, and CustomScan with a List > of child Plans, which createplan.c would know to build from the Paths, > and other modules would then also be aware of these children. I find that > uglier than a separate join node type, but it would be tolerable I guess. > At this moment, my custom-join logic adds a dummy node to have two child nodes when it tries to join more than 3 relations. Yep, if the CustomPath node (ForeignPath also?) can have a list of child-path nodes and the core backend handles their initialization, it will be more comfortable for extensions. I don't just agree with this idea; I prefer it.
Thanks, -- NEC Business Creation Division / PG-Strom Project KaiGai Kohei -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Manipulating complex types as non-contiguous structures in-memory
On 2015-05-10 12:09:41 -0400, Tom Lane wrote: > > * I find the ARRAY_ITER_VARS/ARRAY_ITER_NEXT macros rather ugly. I don't > > buy the argument that turning them into functions will be slower. I'd > > bet the contrary on common platforms. > Perhaps; do you want to do some testing and see? I've added new iterator functions using an on-stack state variable and array_iter_setup/next functions pretty analogous to the macros. And then converted arrayfuncs.c to use them. Code size before introducing inline functions: assert: text data bss dec hex filename 8142400 50562 295952 8488914 8187d2 src/backend/postgres optimize: text data bss dec hex filename 6892928 50022 295920 7238870 6e74d6 src/backend/postgres After: assert: text data bss dec hex filename 8133040 50562 295952 8479554 816342 src/backend/postgres optimize: text data bss dec hex filename 6890256 50022 295920 7236198 6e6a66 src/backend/postgres That's a small decrease. I'm not sure what exactly to use as a performance benchmark here. For now I chose SELECT * FROM (SELECT ARRAY(SELECT generate_series(1, 1))) d, generate_series(1, 1000) repeat(i); that'll hit array_out, which uses iterators. pgbench -P 10 -h /tmp -p 5440 postgres -n -f /tmp/bench.sql -c 4 -T 60 (I chose parallel because it'll show icache efficiency differences) before, best of four: tps = 4.921260 (including connections establishing) after, best of four: tps = 5.046437 (including connections establishing) That's a relatively small difference. I'm not surprised, I'd not have expected anything major. Personally I think something roughly along those lines is both more robust and easier to maintain. Even if we possibly need to protect against inlines not being available. Similarly using inline funcs for AARR_NDIMS/HASNULL does not appear to hamper performance and gets rid of the multiple evaluation risk. These patches are obviously WIP. Especially with the iter stuff it's possible that the concept could be extended a bit further. Greetings, Andres Freund
>From 1fe208cda8df5341794ef6b76e11578ecd46cdd8 Mon Sep 17 00:00:00 2001 From: Andres Freund Date: Mon, 11 May 2015 02:43:06 +0200 Subject: [PATCH 1/2] WIP: Use inline functions for iteration. --- src/backend/utils/adt/arrayfuncs.c | 68 + src/include/c.h | 2 + src/include/utils/array.h | 78 +- 3 files changed, 115 insertions(+), 33 deletions(-) [patch body truncated in this copy]
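Since the attached patch does not survive intact in this copy, here is a rough sketch of the shape being described: an on-stack iterator struct plus inline setup/next helpers replacing the ARRAY_ITER_* macros. The field names and the restriction to flat (non-expanded) arrays are my assumptions; only the names array_iter, array_iter_setup and array_iter_next are taken from the message above, and the usual backend headers (array.h, tupmacs.h) are assumed.

/* Minimal sketch, not the actual patch. */
typedef struct array_iter
{
    char   *dataptr;        /* points at the next element's data */
    bits8  *bitmap;         /* NULL bitmap, or NULL if the array has no nulls */
} array_iter;

static inline void
array_iter_setup(array_iter *it, ArrayType *arr)
{
    it->dataptr = ARR_DATA_PTR(arr);
    it->bitmap = ARR_NULLBITMAP(arr);
}

static inline Datum
array_iter_next(array_iter *it, int i, bool *isnull,
                int typlen, bool typbyval, char typalign)
{
    Datum   value = (Datum) 0;

    if (it->bitmap && (it->bitmap[i / 8] & (1 << (i % 8))) == 0)
        *isnull = true;         /* element i is NULL; data pointer does not move */
    else
    {
        *isnull = false;
        value = fetch_att(it->dataptr, typbyval, typlen);
        it->dataptr = att_addlength_pointer(it->dataptr, typlen, it->dataptr);
        it->dataptr = (char *) att_align_nominal(it->dataptr, typalign);
    }
    return value;
}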
Re: Custom/Foreign-Join-APIs (Re: [HACKERS] [v9.5] Custom Plan API)
Kohei KaiGai writes: > I briefly checked your updates. > Even though it is not described in the commit-log, I noticed a > problematic change. > This commit reverts create_plan_recurse() as static function. Yes. I am not convinced that external callers should be calling that, and would prefer not to enlarge createplan.c's API footprint without a demonstration that this is right and useful. (This is one of many ways in which this patch is suffering from having gotten committed without submitted use-cases.) > It means extension > cannot have child node, even if it wants to add a custom-join logic. > Please assume a simple case below: > SELECT * FROM t0, t1 WHERE t0.a = t1.x; > An extension adds a custom join path, then its PlanCustomPath method will be > called back to create a plan node once it gets chosen by planner. > The role of PlanCustomPath is to construct a plan-node of itself, and > plan-nodes > of the source relations also. > If create_plan_recurse() is static, we have no way to initialize > plan-node for t0 > and t1 scan even if join-logic itself is powerful than built-in ones. I find this argument quite unconvincing, because even granting that this is an appropriate way to create child nodes of a CustomScan, there is a lot of core code besides createplan.c that would not know about those child nodes either. As a counterexample, suppose that your cool-new-join-method is capable of joining three inputs at once. You could stick two of the children into lefttree and righttree perhaps, but where are you gonna put the other? This comes back to the fact that trying to wedge join behavior into scan node types was a pretty bad idea (as evidenced by the entirely bogus decision that now scanrelid can be zero, which I rather doubt you've found all the places that that broke). Really there should have been a new CustomJoin node or something like that. If we'd done that, it would be possible to design that node type more like Append, with any number of child nodes. And we could have set things up so that createplan.c knows about those child nodes and takes care of the recursion for you; it would still not be a good idea to expose create_plan_recurse and hope that callers of that would know how to use it correctly. I am still more than half tempted to say we should revert this entire patch series and hope for a better design to be submitted for 9.6. In the meantime, though, poking random holes in the modularity of core code is a poor substitute for having designed a well-thought-out API. A possible compromise that we could perhaps still wedge into 9.5 is to extend CustomPath with a List of child Paths, and CustomScan with a List of child Plans, which createplan.c would know to build from the Paths, and other modules would then also be aware of these children. I find that uglier than a separate join node type, but it would be tolerable I guess. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
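To make the compromise Tom floats in the last paragraph concrete, here is a rough sketch of what it could look like; the field names custom_paths/custom_plans are invented for illustration, the other fields are elided or abbreviated, and the createplan.c fragment only shows the intent, not committed code.

/* Sketch only. */
typedef struct CustomPath
{
    Path        path;
    uint32      flags;
    List       *custom_paths;       /* child Path nodes, if any */
    List       *custom_private;
    const struct CustomPathMethods *methods;
} CustomPath;

typedef struct CustomScan
{
    Scan        scan;
    uint32      flags;
    List       *custom_plans;       /* children built for us by createplan.c */
    List       *custom_private;
    const struct CustomScanMethods *methods;
} CustomScan;

/* Inside createplan.c, roughly: the core code recurses on behalf of the
 * extension, so create_plan_recurse() can stay private to this file. */
    foreach(lc, best_path->custom_paths)
        custom_plans = lappend(custom_plans,
                               create_plan_recurse(root, (Path *) lfirst(lc)));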
Re: [HACKERS] EvalPlanQual behaves oddly for FDW queries involving system columns
Etsuro Fujita writes: > [ EvalPlanQual-v6.patch ] I've started to study this in a little more detail, and I'm not terribly happy with some of the API decisions in it. In particular, I find the addition of "void *fdw_state" to ExecRowMark to be pretty questionable. That does not seem like a good place to keep random state. (I realize that WHERE CURRENT OF keeps some state in ExecRowMark, but that's a crock not something to emulate.) ISTM that in most scenarios, the state that LockForeignRow/FetchForeignRow would like to get at is probably the FDW state associated with the ForeignScanState that the tuple came from. Which this API doesn't help with particularly. I wonder if we should instead add a "ScanState*" field and expect the core code to set that up (ExecOpenScanRelation could do it with minor API changes...). I'm also a bit tempted to pass the TIDs to LockForeignRow and FetchForeignRow as Datums not ItemPointers. We have the Datum format available already at the call sites, so this is free as far as the core code is concerned, and would only cost another line or so for the FDWs. This is by no means sufficient to allow FDWs to use some other type than "tid" for row identifiers; but it would be a down payment on that problem, and at least would avoid nailing the rowids-are-tids assumption into yet another global API. Thoughts? Also, as I mentioned, I'd be a whole lot happier if we had a way to test this... regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
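To make the two API suggestions above concrete, one possible shape for the callbacks is sketched below. This is strictly an illustration of what Tom is floating (Datum row identifiers, and a scan-node pointer filled in by the core executor), not the signatures in the submitted patch; all names are hypothetical.

/* Hypothetical callback prototypes, for discussion only. */
typedef void (*LockForeignRow_function) (EState *estate,
                                         ExecRowMark *erm,
                                         Datum rowid);

typedef TupleTableSlot *(*FetchForeignRow_function) (EState *estate,
                                                     ExecRowMark *erm,
                                                     Datum rowid,
                                                     bool *updated);

/* And instead of a void *fdw_state in ExecRowMark, the core code could
 * record which scan node the mark belongs to, e.g. set up by
 * ExecOpenScanRelation(): */
typedef struct ExecRowMark
{
    Relation    relation;       /* opened relation (existing field) */
    ScanState  *scanstate;      /* owning scan node, filled in by the core executor */
    /* ... other existing fields elided ... */
} ExecRowMark;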
Re: [HACKERS] multixacts woes
On 5/8/15 1:15 PM, Robert Haas wrote: I somehow did not realize until very recently that we actually use two SLRUs to keep track of multixacts: one for the multixacts themselves (pg_multixacts/offsets) and one for the members (pg_multixacts/members). Confusingly, members are sometimes called offsets, and offsets are sometimes called IDs, or multixacts. FWIW, since I had to re-read this bit... * We use two SLRU areas, one for storing the offsets at which the data * starts for each MultiXactId in the other one. This trick allows us to * store variable length arrays of TransactionIds. Another way this could be 'fixed' would be to bump MultiXactOffset (but NOT MultiXactId) to uint64. That would increase the number of total members we could keep by a factor of 2^32. At that point wraparound wouldn't even be possible, because you can't have more than 2^31 members in an MXID (and there can only be 2^31 MXIDs). It may not be a trivial change though, because SLRUs are currently capped at 2^32 pages. This probably isn't a good long-term solution, but it would eliminate the risk of really frequent freeze vacuums. It sounds like Josh at least knows some people for whom that could cause big problems. -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
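A sketch of that idea, mostly to show the arithmetic behind "wraparound wouldn't even be possible": the first two typedefs are how the types look today as far as I recall, the 64-bit name is made up, and the 2^32-page SLRU ceiling is as stated above.

#include <stdint.h>

typedef uint32 TransactionId;
typedef TransactionId MultiXactId;      /* stays 32 bits under the proposal */
typedef uint64 MultiXactOffset64;       /* hypothetical widened member-space address */

/* Capacity check: at most 2^31 usable multixact IDs, each with at most
 * 2^31 members, so the member address space would need at most 2^62
 * entries -- comfortably below 2^64, hence no member-space wraparound. */
#define MAX_MXIDS           (UINT64_C(1) << 31)
#define MAX_MEMBERS_PER_MXID (UINT64_C(1) << 31)
#define WORST_CASE_MEMBERS  (MAX_MXIDS * MAX_MEMBERS_PER_MXID)   /* 2^62 */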
Re: [HACKERS] Manipulating complex types as non-contiguous structures in-memory
On 2015-05-10 12:09:41 -0400, Tom Lane wrote: > Andres Freund writes: > > Looking at this. First reading the patch to understand the details. > > > * The VARTAG_IS_EXPANDED(tag) trick in VARTAG_SIZE is unlikely to be > > beneficial, before the compiler could implement the whole thing as a > > computed goto or lookup table, afterwards not. > > Well, if you're worried about the speed of VARTAG_SIZE() then the right > thing to do would be to revert your change that made enum vartag_external > distinct from the size of the struct, so that we could go back to just > using the second byte of a varattrib_1b_e datum as its size. As I said > at the time, inserting pad bytes to force each different type of toast > pointer to be a different size would probably be a better tradeoff than > what commit 3682025015 did. I doubt that'd be a net positive. Anyway, all I'm saying is that I can't see the VARTAG_IS_EXPANDED trick being beneficial in comparison to checking both explicit values. > > * You were rather bothered by the potential of multiple evaluations for > > the ilist stuff. And now the AARR macros are full of them... > > Yeah, there is doubtless some added cost there. But I think it's a better > answer than duplicating each function in toto; the code space that that > would take isn't free either. Yea, duplicating would be horrid. I'm more thinking of declaring some iterator state outside the macro, or just using an inline function. > > * I find the ARRAY_ITER_VARS/ARRAY_ITER_NEXT macros rather ugly. I don't > > buy the argument that turning them into functions will be slower. I'd > > bet the contrary on common platforms. > > Perhaps; do you want to do some testing and see? Not exactly with great joy, but I will. > > * The list of hardwired safe ops in exec_check_rw_parameter is somewhat > > sad. Don't have a better idea though. > > It's very sad, and it will be high on my list to improve that in 9.6. > But I do not think it's a fatal problem to ship it that way in 9.5, > because *as things stand today* those are the only two functions that > could benefit anyway. It won't really matter until we have extensions > that want to use this mechanism. Agreed that it's not fatal. > > ISTM that the worst case for the new situation is large arrays that > > exist as plpgsql variables but are only ever passed on. > > Well, more to the point, large arrays that are forced into expanded format > (costing a conversion step) but then we never do anything with them that > would benefit from that. Just saying they're "passed on" doesn't prove > much since the called function might or might not get any benefit. > array_length doesn't, but some other things would. Right. But I'm not sure it's that uncommon. > Your example with array_agg() is interesting, since one of the things on > my to-do list is to see whether we could change array_agg to return an > expanded array. Well, I chose array_agg only because it was a trivial way to generate a large array. The values could actually come from disk or something. > It would not be hard to make it build that representation > directly, instead of its present ad-hoc internal state. The trick would > be, when can you return the internal state without an additional copy > step? But maybe it could return a R/O pointer ... R/O or R/W? > > ... Expanding only in > > cases where it'd be beneficial is going to be hard. > > Yeah, improving that heuristic looks like a research project.
Still, even > with all the limitations and to-do items in the patch now, I'm pretty sure > this will be a net win for practically all applications. I wonder if we could somehow 'mark' other toast pointers as 'expand if useful'. I.e. have something pretty much like ExpandedObjectHeader, except that it initially works like the indirect toast stuff. So eoh_context is set, but the data is still in the original datum. When accessed via 'plain' accessors that don't know about the expanded format the pointed to datum is returned. But when accessed by something "desiring" the expanded version it's expanded. It seemed that'd be doable expanding the new infrastructure a bit more. Greetings, Andres Freund -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: Custom/Foreign-Join-APIs (Re: [HACKERS] [v9.5] Custom Plan API)
Tom, I briefly checked your updates. Even though it is not described in the commit-log, I noticed a problematic change. This commit reverts create_plan_recurse() to a static function. It means an extension cannot have child nodes, even if it wants to add custom-join logic. Please assume a simple case below: SELECT * FROM t0, t1 WHERE t0.a = t1.x; An extension adds a custom join path, then its PlanCustomPath method will be called back to create a plan node once it gets chosen by the planner. The role of PlanCustomPath is to construct a plan-node of itself, and also the plan-nodes of the source relations. If create_plan_recurse() is static, we have no way to initialize the plan-nodes for the t0 and t1 scans even if the join-logic itself is more powerful than the built-in ones. It was already discussed upthread, and was the consensus there. Please make create_plan_recurse() non-static again, as in the initial commit. Also, regarding the *_scan_tlist, > I don't have time to pursue this idea right now, but I think it would be > a good change to squeeze into 9.5, just so that we could have some test > coverage on those parts of this patch. > Do you want just a test-purpose module and regression test cases? Thanks, -- KaiGai Kohei -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] default value dosen't get applyed in this situation
Aliouii Ali writes: > this a test case : This is not supposed to insert a default. Attach a default expression to the view column, if you want inserts on the view to have defaults. ALTER VIEW ... ALTER COLUMN ... SET DEFAULT is the way. (Note that in recent versions of PG, that view would be auto-insertable so you would not need to bother with the trigger. But the situation with defaults hasn't changed, and won't because it would be a backwards compatibility break with no real value-add.) regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallel Seq Scan
On Thu, May 7, 2015 at 3:23 AM, Amit Kapila wrote: >> > I observed one issue while working on this review comment. When we >> > try to destroy the parallel setup via ExecEndNode (as due to Limit >> > Node, it could not destroy after consuming all tuples), it waits for >> > parallel >> > workers to finish (WaitForParallelWorkersToFinish()) and parallel >> > workers >> > are waiting for master backend to signal them as their queue is full. >> > I think in such a case master backend needs to inform workers either >> > when >> > the scan is discontinued due to limit node or while waiting for parallel >> > workers to finish. >> >> Isn't this why TupleQueueFunnelShutdown() calls shm_mq_detach()? >> That's supposed to unstick the workers; any impending or future writes >> will just return SHM_MQ_DETACHED without waiting. > > Okay, that can work if we call it in ExecEndNode() before > WaitForParallelWorkersToFinish(), however what if we want to do something > like TupleQueueFunnelShutdown() when Limit node decides to stop processing > the outer node. We can traverse the whole plan tree and find the nodes > where > parallel workers needs to be stopped, but I don't think thats good way to > handle > it. If we don't want to stop workers from processing until > ExecutorEnd()--->ExecEndNode(), then it will lead to workers continuing till > that time and it won't be easy to get instrumentation/buffer usage > information > from workers (workers fill such information for master backend after > execution > is complete) as that is done before ExecutorEnd(). For Explain Analyze .., > we > can ensure that workers are stopped before fetching that information from > Funnel node, but the same is not easy for buffer usage stats required by > plugins as that operates at ExecutorRun() and ExecutorFinish() level where > we don't have direct access to node level information. You can refer > pgss_ExecutorEnd() where it completes the storage of stats information > before calling ExecutorEnd(). Offhand, I could not think of a good way to > do this, but one crude way could be introduce a new API > (ParallelExecutorEnd()) > for such plugins which needs to be called before completing the stats > accumulation. > This API will call ExecEndPlan() if parallelmodeNeeded flag is set and allow > accumulation of stats (InstrStartNode()/InstrStopNode()) OK, so if I understand you here, the problem is what to do about an "orphaned" worker. The Limit node just stops fetching from the lower nodes, and those nodes don't get any clue that this has happened, so their workers just sit there until the end of the query. Of course, that happens already, but it doesn't usually hurt very much, because the Limit node usually appears at or near the top of the plan. It could matter, though. Suppose the Limit is for a subquery that has a Sort somewhere (not immediately) beneath it. My guess is the Sort's tuplestore will stick around until after the subquery finishes executing for as long as the top-level query is executing, which in theory could be a huge waste of resources. In practice, I guess people don't really write queries that way. If they did, I think we'd have already developed some general method for fixing this sort of problem. I think it might be better to try to solve this problem in a more localized way. Can we arrange for planstate->instrumentation to point directly into the DSM, instead of copying the data over later? That seems like it might help, or perhaps there's another approach.
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
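For context on the TupleQueueFunnelShutdown()/shm_mq_detach() point quoted in the message above, here is a minimal sketch of how a worker-side send loop can treat a detached queue as "stop producing". The wrapper function is hypothetical; the shm_mq_result codes are the ones the thread refers to, assuming the shm_mq API as it existed around that time.

#include "postgres.h"
#include "storage/shm_mq.h"

/* Hypothetical worker-side helper: push one tuple's bytes into the queue,
 * returning false when the leader has detached (for instance after a Limit
 * node stopped pulling and the funnel was shut down). */
static bool
send_bytes_or_stop(shm_mq_handle *mqh, Size nbytes, void *data)
{
    shm_mq_result res = shm_mq_send(mqh, nbytes, data, false);

    switch (res)
    {
        case SHM_MQ_SUCCESS:
            return true;        /* keep producing */
        case SHM_MQ_DETACHED:
            return false;       /* leader is gone; exit cleanly, no error */
        default:
            elog(ERROR, "unexpected shm_mq_send result: %d", (int) res);
            return false;       /* not reached */
    }
}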
[HACKERS] default value dosen't get applyed in this situation
this is a test case:
CREATE TABLE tab ( _id bigserial NOT NULL, _name text, CONSTRAINT tab_pkey PRIMARY KEY (_id) );
CREATE TABLE tab_s1 ( CONSTRAINT tab_s1_check CHECK (1 = 1) ) INHERITS (tab);
CREATE OR REPLACE VIEW v_tab AS SELECT tab._id, tab._name FROM tab;
CREATE OR REPLACE FUNCTION tab_insert() RETURNS trigger AS $BODY$ BEGIN INSERT INTO tab_s1 VALUES ((NEW).*); RETURN NEW; END $BODY$ LANGUAGE plpgsql;
CREATE TRIGGER tab_trigger INSTEAD OF INSERT ON v_tab FOR EACH ROW EXECUTE PROCEDURE tab_insert();
-- the query fails because _id is null
insert into v_tab(_name) values ('');
Re: [HACKERS] [BUGS] BUG #13148: Unexpected deferred EXCLUDE constraint violation on derived table
postgres...@realityexists.net writes: > I have a deferred EXCLUDE constraint on a derived table. Inside a > transaction I insert a new row that conflicts with an existing one (so the > constraint would fail if it was immediate), delete the old row and run an > unrelated UPDATE on the new row, then try to commit. I would expect the > commit to succeed, since there is now no conflict, but it fails with > ERROR: conflicting key value violates exclusion constraint > "uq_derived_timeslice_dup_time_ex" Hm. The given test case is overcomplicated; in point of fact it will fail on any deferred exclusion constraint, eg DROP TABLE IF EXISTS derived_timeslice CASCADE; CREATE TABLE derived_timeslice ( timeslice_id integer NOT NULL, feature_id integer NOT NULL, name text, CONSTRAINT pk_derived_timeslice PRIMARY KEY (timeslice_id), CONSTRAINT uq_derived_timeslice_dup_time_ex EXCLUDE USING btree (feature_id WITH =) DEFERRABLE INITIALLY DEFERRED ); INSERT INTO derived_timeslice (timeslice_id, feature_id) VALUES (51, 1); BEGIN; -- Insert row that violates deferred constraint INSERT INTO derived_timeslice (timeslice_id, feature_id) VALUES (52, 1); -- Delete the old row - now there should be no more conflict DELETE FROM derived_timeslice WHERE timeslice_id = 51; -- Problem doesn't occur without an UPDATE statement UPDATE derived_timeslice SET name = 'Updated' WHERE timeslice_id = 52; -- This confirms there is only 1 row - no conflict SELECT * FROM derived_timeslice; COMMIT; -- Enforce constraint - error occurs here The cause of the problem seems to be that the UPDATE performs a HOT update of the new tuple, leaving in this case a dead tuple at (0,2) that is HOT updated by (0,3). When unique_key_recheck() is invoked for (0,2), it believes, correctly, that it has to perform the recheck anyway ... but it tells check_exclusion_constraint that the check is being performed for (0,2). So the index search inside check_exclusion_constraint finds the live tuple at (0,3) and thinks that is a conflict. This is reproducible clear back to 9.0 where exclusion constraints were added. The easiest fix seems to be to pass the HOT child's TID instead of the TID we were called for. (Note that the other path, for a regular unique constraint, is correct as-is because the original TID is what the index will know about.) The attached patch seems to fix the problem without breaking any existing regression tests, but I wonder if anyone can see a hole in it. regards, tom lane diff --git a/src/backend/commands/constraint.c b/src/backend/commands/constraint.c index e49affb..28fccaf 100644 *** a/src/backend/commands/constraint.c --- b/src/backend/commands/constraint.c *** unique_key_recheck(PG_FUNCTION_ARGS) *** 89,97 * because this trigger gets queued only in response to index insertions; * which means it does not get queued for HOT updates. The row we are * called for might now be dead, but have a live HOT child, in which case ! * we still need to make the check. Therefore we have to use ! * heap_hot_search, not just HeapTupleSatisfiesVisibility as is done in ! * the comparable test in RI_FKey_check. * * This might look like just an optimization, because the index AM will * make this identical test before throwing an error. But it's actually --- 89,98 * because this trigger gets queued only in response to index insertions; * which means it does not get queued for HOT updates. The row we are * called for might now be dead, but have a live HOT child, in which case ! * we still need to make the check --- effectively, we're applying the ! 
* check against the live child row, although we can use the values from ! * this row since by definition all columns of interest to us are the ! * same. * * This might look like just an optimization, because the index AM will * make this identical test before throwing an error. But it's actually *** unique_key_recheck(PG_FUNCTION_ARGS) *** 159,165 { /* * Note: this is not a real insert; it is a check that the index entry ! * that has already been inserted is unique. */ index_insert(indexRel, values, isnull, &(new_row->t_self), trigdata->tg_relation, UNIQUE_CHECK_EXISTING); --- 160,168 { /* * Note: this is not a real insert; it is a check that the index entry ! * that has already been inserted is unique. Passing t_self is ! * correct even if t_self is now dead, because that is the TID the ! * index will know about. */ index_insert(indexRel, values, isnull, &(new_row->t_self), trigdata->tg_relation, UNIQUE_CHECK_EXISTING); *** unique_key_recheck(PG_FUNCTION_ARGS) *** 168,177 { /* * For exclusion constraints we just do the normal check, but now it's ! * okay to throw error. */ check_exclusion_constraint(trigdata->tg_relation, index
Re: Custom/Foreign-Join-APIs (Re: [HACKERS] [v9.5] Custom Plan API)
I've committed a cleanup patch along the lines discussed. One thought is that we could test the nondefault-scan-tuple logic without a lot of work by modifying the way postgres_fdw deals with columns it decides don't need to be fetched. Right now it injects NULL columns so that the remote query results still match the foreign table's rowtype, but that's not especially desirable or efficient. What we could do instead is generate an fdw_scan_tlist that just lists the columns we intend to fetch. I don't have time to pursue this idea right now, but I think it would be a good change to squeeze into 9.5, just so that we could have some test coverage on those parts of this patch. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] multixacts woes
On 05/10/2015 10:30 AM, Robert Haas wrote:
>>> 2. We have some logic that causes autovacuum to run in spite of autovacuum=off when wraparound threatens. My commit 53bb309d2d5a9432d2602c93ed18e58bd2924e15 provided most of the anti-wraparound protections for multixact members that exist for multixact IDs and for regular XIDs, but this remains an outstanding issue. I believe I know how to fix this, and will work up an appropriate patch based on some of Thomas's earlier work.
>> I believe autovacuum=off is fortunately uncommon, but certainly getting this issue fixed is a good idea.
> Right.

I suspect it's quite a bit more common than many people imagine.

cheers

andrew

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] multixacts woes
On Fri, May 08, 2015 at 02:15:44PM -0400, Robert Haas wrote: > My colleague Thomas Munro and I have been working with Alvaro, and > also with Kevin and Amit, to fix bug #12990, a multixact-related data > corruption bug. Thanks Alvaro, Amit, Kevin, Robert and Thomas for mobilizing to get this fixed. > 1. I believe that there is still a narrow race condition that can cause > the multixact code to go crazy and delete all of its data when > operating very near the threshold for member space exhaustion. See > http://www.postgresql.org/message-id/ca+tgmozihwybetx8nzzptosjprg2kcr-nawgajkzclcbvj1...@mail.gmail.com > for the scenario and proposed fix. For anyone else following along, Thomas's subsequent test verified this threat beyond reasonable doubt: http://www.postgresql.org/message-id/CAEepm=3c32vpjloo45y0c3-3kwxnv2xm4japtsvjcrd2vg0...@mail.gmail.com > 2. We have some logic that causes autovacuum to run in spite of > autovacuum=off when wraparound threatens. My commit > 53bb309d2d5a9432d2602c93ed18e58bd2924e15 provided most of the > anti-wraparound protections for multixact members that exist for > multixact IDs and for regular XIDs, but this remains an outstanding > issue. I believe I know how to fix this, and will work up an > appropriate patch based on some of Thomas's earlier work. That would be good to have, and its implementation should be self-contained. > 3. It seems to me that there is a danger that some users could see > extremely frequent anti-mxid-member-wraparound vacuums as a result of > this work. Granted, that beats data corruption or errors, but it > could still be pretty bad. The default value of > autovacuum_multixact_freeze_max_age is 400000000. > Anti-mxid-member-wraparound vacuums kick in when you exceed 25% of the > addressable space, or 1073741824 total members. So, if your typical > multixact has more than 1073741824/400000000 = ~2.68 members, you're > going to see more autovacuum activity as a result of this change. > We're effectively capping autovacuum_multixact_freeze_max_age at > 1073741824/(average size of your multixacts). If your multixacts are > just a couple of members (like 3 or 4) this is probably not such a big > deal. If your multixacts typically run to 50 or so members, your > effective freeze age is going to drop from 400m to ~21.4m. At that > point, I think it's possible that relminmxid advancement might start > to force full-table scans more often than would be required for > relfrozenxid advancement. If so, that may be a problem for some > users. I don't know whether this deserves prompt remediation, but if it does, I would look no further than the hard-coded 25% figure. We permit users to operate close to XID wraparound design limits. GUC maximums force an anti-wraparound vacuum at no later than 93.1% of design capacity. XID assignment warns at 99.5%, then stops at 99.95%. PostgreSQL mandates a larger cushion for pg_multixact/offsets, with anti-wraparound VACUUM by 46.6% and a stop at 50.0%. Commit 53bb309d2d5a9432d2602c93ed18e58bd2924e15 introduced the bulkiest mandatory cushion yet, an anti-wraparound vacuum when pg_multixact/members is just 25% full.
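To put the quoted arithmetic in one place, a rough back-of-the-envelope query (purely illustrative, not taken from any patch) shows the effective cap on autovacuum_multixact_freeze_max_age implied by the 25% member threshold for a few average multixact sizes:

SELECT avg_members,
       least(400000000, 1073741824 / avg_members) AS effective_freeze_age
FROM (VALUES (3), (4), (50)) AS t(avg_members);
--  3 members -> 357913941 (close to the 400 million default)
--  4 members -> 268435456
-- 50 members ->  21474836 (the ~21.4m figure quoted above)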
The pgsql-bugs thread driving that patch did reject making it GUC-controlled, essentially on the expectation that 25% should be adequate for everyone: http://www.postgresql.org/message-id/CA+Tgmoap6-o_5ESu5X2mBRVht_F+KNoY+oO12OvV_WekSA=e...@mail.gmail.com http://www.postgresql.org/message-id/20150506143418.gt2...@alvh.no-ip.org http://www.postgresql.org/message-id/1570859840.1241196.1430928954257.javamail.ya...@mail.yahoo.com > What can we do about this? Alvaro proposed back-porting his fix for > bug #8470, which avoids locking a row if a parent subtransaction > already has the same lock. Like Andres and yourself, I would not back-patch it. > Another thought that occurs to me is that if we had a freeze map, it > would radically decrease the severity of this problem, because > freezing would become vastly cheaper. I wonder if we ought to try to > get that into 9.5, even if it means holding up 9.5. Declaring that a release will wait for a particular feature has consistently ended badly for PostgreSQL, and this feature is just in the planning stages. If folks are ready to hit the ground running on the project, I suggest they do so; a non-WIP submission to the first 9.6 CF would be a big accomplishment. The time to contemplate slipping it into 9.5 comes after the patch is done. If these aggressive ideas earn more than passing consideration, the 25% threshold should become user-controllable after all. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] multixacts woes
On 05/08/2015 09:57 PM, Josh Berkus wrote:
[snip]
>> It's certainly possible to have workloads triggering that, but I think it's relatively uncommon. In most cases I've checked, the multixact consumption rate is much lower than the xid consumption. There are some exceptions, but often that's pretty bad code.
> I have a couple workloads in my pool which do consume mxids faster than xids, due to (I think) exceptional numbers of FK conflicts. It's definitely unusual, though, and I'm sure they'd rather have corruption protection and endure some more vacuums.

Seen corruption happen recently with OpenBravo on PostgreSQL 9.3.6 (Debian; binaries upgraded from 9.3.2) in a cluster pg_upgraded from 9.2.4 (albeit with quite insufficient autovacuum / poorly configured Postgres). I fear that this might be more widespread than we thought, depending on the exact workload/activity pattern. If it would help, I can try to get hold of a copy of the cluster in question (if the customer keeps any copy at all).

If we do this, though, it might be worthwhile to backport the multixact age function, so that affected users can check and schedule mxact wraparound vacuums themselves, something you currently can't do on 9.3.

Thanks,

    J.L.

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
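As a sketch of what such a backport would enable (assuming the age function is exposed as mxid_age(), as in later releases; it is not available in 9.3 today), an affected user could find the tables most in need of a member-driven vacuum with something like:

SELECT relname, relminmxid, mxid_age(relminmxid) AS multixact_age
FROM pg_class
WHERE relkind = 'r'
ORDER BY mxid_age(relminmxid) DESC
LIMIT 10;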
Re: [HACKERS] Manipulating complex types as non-contiguous structures in-memory
Andres Freund writes: > Looking at this. First reading the patch to understand the details. > * The VARTAG_IS_EXPANDED(tag) trick in VARTAG_SIZE is unlikely to be > beneficial, before the compiler could implement the whole thing as a > computed goto or lookup table, afterwards not. Well, if you're worried about the speed of VARTAG_SIZE() then the right thing to do would be to revert your change that made enum vartag_external distinct from the size of the struct, so that we could go back to just using the second byte of a varattrib_1b_e datum as its size. As I said at the time, inserting pad bytes to force each different type of toast pointer to be a different size would probably be a better tradeoff than what commit 3682025015 did. > * It'd be nice if the get_flat_size comment in expandeddatum.h could > specify whether the header size is included. That varies enough around > toast that it seems worthwhile. OK. > * You were rather bothered by the potential of multiple evaluations for > the ilist stuff. And now the AARR macros are full of them... Yeah, there is doubtless some added cost there. But I think it's a better answer than duplicating each function in toto; the code space that that would take isn't free either. > * I find the ARRAY_ITER_VARS/ARRAY_ITER_NEXT macros rather ugly. I don't > buy the argument that turning them into functions will be slower. I'd > bet the contrary on common platforms. Perhaps; do you want to do some testing and see? > * Not a fan of the EH_ prefix in array_expanded.c and EOH_ > elsewhere. Just looks ugly to me. Whatever. I'm not wedded to that naming if you have a better suggestion. > * The list of hardwired safe ops in exec_check_rw_parameter is somewhat > sad. Don't have a better idea though. It's very sad, and it will be high on my list to improve that in 9.6. But I do not think it's a fatal problem to ship it that way in 9.5, because *as things stand today* those are the only two functions that could benefit anyway. It won't really matter until we have extensions that want to use this mechanism. > * "Also, a C function that is modifying a read-write expanded value > in-place should take care to leave the value in a sane state if it > fails partway through." - that's a pretty hefty requirement imo. It is, which is one reason that I'm nervous about relaxing the test in exec_check_rw_parameter. Still, it was possible to code array_set_element to meet that restriction without too much pain. I'm inclined to leave the stronger requirement in the docs for now, until we get more push-back. > * The forced RW->RO conversion in subquery scans is a bit sad, but it > seems like something left for later. Yes, there are definitely some things that look like future opportunities here. > Somewhere in the thread you comment on the fact that it's a bit sad that > plpgsql is the sole benefactor of this (unless some function forces > expansion internally). I'd be ok to leave it at that for now. It'd be > quite cool to get some feedback from postgis folks about the suitability > of this for their cases. Indeed, that's the main reason I'm eager to ship something in 9.5, even if it's not perfect. I hope those guys will look at it and use it, and maybe give us feedback on ways to improve it. > ISTM that the worst case for the new situation is large arrays that > exist as plpgsql variables but are only ever passed on. 
Well, more to the point, large arrays that are forced into expanded format (costing a conversion step) but then we never do anything with them that would benefit from that. Just saying they're "passed on" doesn't prove much since the called function might or might not get any benefit. array_length doesn't, but some other things would. Your example with array_agg() is interesting, since one of the things on my to-do list is to see whether we could change array_agg to return an expanded array. It would not be hard to make it build that representation directly, instead of its present ad-hoc internal state. The trick would be, when can you return the internal state without an additional copy step? But maybe it could return a R/O pointer ... > ... Expanding only in > cases where it'd be beneficial is going to be hard. Yeah, improving that heuristic looks like a research project. Still, even with all the limitations and to-do items in the patch now, I'm pretty sure this will be a net win for practically all applications. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
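As a toy illustration of the plpgsql pattern that stands to gain (a usage sketch only, nothing to do with the patch internals), repeated subscripting of a local array variable is exactly the sort of thing the expanded representation avoids re-deconstructing on every iteration:

CREATE OR REPLACE FUNCTION sum_elements(a int[]) RETURNS numeric AS $$
DECLARE
    s numeric := 0;
    i int;
BEGIN
    FOR i IN 1 .. coalesce(array_length(a, 1), 0) LOOP
        s := s + a[i];   -- each fetch benefits when "a" is kept in expanded form
    END LOOP;
    RETURN s;
END;
$$ LANGUAGE plpgsql;

SELECT sum_elements(array_agg(g)) FROM generate_series(1, 100000) g;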
Re: Custom/Foreign-Join-APIs (Re: [HACKERS] [v9.5] Custom Plan API)
Robert Haas writes: > On Sat, May 9, 2015 at 1:05 PM, Tom Lane wrote: >> For these reasons, I think that if an FDW tried to be laxer than "tables >> must be on the same pg_foreign_server entry to be joined", the odds >> approach unity that it would be broken, and probably dangerously broken. >> So we should just make that check for the FDWs. Anybody who thinks >> they're smarter than the average bear can use set_join_pathlist_hook, >> but they are probably wrong. > Drilling down into postgres_fdw's connection properties seems pretty > silly; the user isn't likely to create two SERVER objects that are > identical and then choose which one to use at random, and if they do, > they deserve what they get. The universe of FDWs, however, is > potentially bigger than that. What does a server even mean for > file_fdw, for example? Nothing, which is why you'd only ever create one per database, and so the issue doesn't arise anyway. It would only be sane to create multiple servers per FDW if there were a meaningful difference between them. In any case, since the existing code doesn't support "remote" joins involving a local table unless you use the join-path hook, this argument seems pretty academic. If we tighten the test to be same-server, we will benefit all but very weirdly designed FDWs. Anybody who's not happy with that can still use the hook (and I continue to maintain that they will probably have subtle bugs, but whatever). Another point worth making is that the coding I have in mind doesn't really do anything with RelOptInfo.serverid except compare it for equality. So an FDW that wants to consider some servers interchangeable for joining purposes could override the value at GetForeignPaths time (ie replace "serverid" with the OID of a preferred server), and then it would get GetForeignJoinPaths calls as desired. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] BRIN range operator class
> I pushed patches 04 and 07, as well as adopting some of the changes to > the regression test in 06. I'm afraid I caused a bit of merge pain for > you -- sorry about that. No problem. I rebased the remaining ones. brin-inclusion-v09-02-strategy-numbers.patch Description: Binary data brin-inclusion-v09-03-remove-assert-checking.patch Description: Binary data brin-inclusion-v09-05-box-vs-point-operators.patch Description: Binary data brin-inclusion-v09-06-inclusion-opclasses.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: Custom/Foreign-Join-APIs (Re: [HACKERS] [v9.5] Custom Plan API)
On Sat, May 9, 2015 at 1:05 PM, Tom Lane wrote: >> I originally wanted to go quite the other way with this and check for >> join pushdown via handler X any time at least one of the two relations >> involved used handler X, even if the other one used some other handler >> or was a plain table. In particular, it seems to me quite plausible >> to want to teach an FDW that a certain local table is replicated on a >> remote node, allowing a join between a foreign table and a plain table >> to be pushed down. > > If we did do something like that, I think a saner implementation would > involve substituting a foreign table for the local one earlier, via view > expansion. So by the time we are doing join planning, there would be no > need to consider cross-server joins anyway. Huh? You can't do this at rewrite time; it's very fundamentally a planning problem. Suppose the user wants to join A, B, and C, where A is a plain table, B is a plain table that is replicated on a foreign server, and C is a foreign table on that same foreign server. If we decide to join B to C first, we probably want to push down the join, although maybe not, if we estimate that B JOIN C will have more rows than C. If we decide to join A to B first, we want to use the local copy of B. >> This infrastructure can't be used that way anyhow, >> so maybe there's no harm in tightening it up, but I'm wary of >> circumscribing what FDW authors can do. > > Somebody who's really intent on shooting themselves in the foot can always > use the set_join_pathlist_hook to inject paths for arbitrary joins. > The FDW mechanism should support reasonable use cases without undue > complication, and I doubt that what we've got now is adding anything > except complication and risk of bugs. > > For the archives' sake, let me lay out a couple of reasons why an FDW > that tried to allow cross-server joins would almost certainly be broken, > and broken in security-relevant ways. Suppose for instance that > postgres_fdw tried to be smart and drill down into foreign tables' server > IDs to allow joining of any two tables that have the same effective host > name, port, database name, user name, and anything else you think would be > relevant to its choice of connections. The trouble with that is that the > user mapping is context dependent, in particular one local userID might > map to the same remote user name for two different server OIDs, while > another might map to different user names. So if we plan a query under > the first userID we might decide it's okay to do the join remotely. > Then, if we re-use that plan while executing as another userID (which is > entirely possible) what will probably happen is that the remote join > query will get sent off under one or the other of the remote usernames > associated with the second local userID. This could lead to either a > permission failure, or a remote table access that should not be allowed > to the current local userID. Admittedly, such cases might be rare in > practice, but it's still a security hole. Also, even if the FDW is > defensive enough to recheck full matching of the tables' connection > properties at execution time, there's not much it could do about the > situation except fail; it couldn't cause a re-plan to occur. > > For another case, we do not have any mechanism whereby operations like > ALTER SERVER OPTIONS could invalidate existing plans. Thus, even if > the two tables' connection properties matched at plan time, there's no > guarantee that they still match at execution. 
This is probably not a > security hole (at least not if you assume foreign-server owners are > trusted users), but it's still a bug that exists only if someone tries > to allow cross-server joins. > > For these reasons, I think that if an FDW tried to be laxer than "tables > must be on the same pg_foreign_server entry to be joined", the odds > approach unity that it would be broken, and probably dangerously broken. > So we should just make that check for the FDWs. Anybody who thinks > they're smarter than the average bear can use set_join_pathlist_hook, > but they are probably wrong. Drilling down into postgres_fdw's connection properties seems pretty silly; the user isn't likely to create two SERVER objects that are identical and then choose which one to use at random, and if they do, they deserve what they get. The universe of FDWs, however, is potentially bigger than that. What does a server even mean for file_fdw, for example? I can't think of any reason why somebody would want to implement joins inside file_fdw, but if they did, all the things being joined would be local files, so the server ID doesn't really matter. Now you may say that's a silly use case, but it's less obviously silly if the files contain structured data, as with cstore_fdw, yet the server ID could still be not especially relevant. Maybe you've got servers representing filesystem directories; that shouldn't preclude cross "server" joins.
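For concreteness, the user-mapping hazard Tom describes above can be set up with something like the following (all object names invented for illustration): a join pushed down across s1 and s2 looks safe when planned as alice, but not when the cached plan is later executed as bob.

CREATE SERVER s1 FOREIGN DATA WRAPPER postgres_fdw OPTIONS (host 'r1.example', dbname 'app');
CREATE SERVER s2 FOREIGN DATA WRAPPER postgres_fdw OPTIONS (host 'r1.example', dbname 'app');  -- same connection properties as s1
CREATE USER MAPPING FOR alice SERVER s1 OPTIONS (user 'remote_a');
CREATE USER MAPPING FOR alice SERVER s2 OPTIONS (user 'remote_a');  -- alice maps identically on both servers
CREATE USER MAPPING FOR bob   SERVER s1 OPTIONS (user 'remote_a');
CREATE USER MAPPING FOR bob   SERVER s2 OPTIONS (user 'remote_b');  -- bob does not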
Re: [HACKERS] multixacts woes
On Fri, May 8, 2015 at 5:39 PM, Alvaro Herrera wrote: >> 1. I believe that there is still a narrow race condition that can cause >> the multixact code to go crazy and delete all of its data when >> operating very near the threshold for member space exhaustion. See >> http://www.postgresql.org/message-id/ca+tgmozihwybetx8nzzptosjprg2kcr-nawgajkzclcbvj1...@mail.gmail.com >> for the scenario and proposed fix. > > I agree that there is a problem here. OK, I'm glad we now agree on that, since it seemed like you were initially unconvinced. >> 2. We have some logic that causes autovacuum to run in spite of >> autovacuum=off when wraparound threatens. My commit >> 53bb309d2d5a9432d2602c93ed18e58bd2924e15 provided most of the >> anti-wraparound protections for multixact members that exist for >> multixact IDs and for regular XIDs, but this remains an outstanding >> issue. I believe I know how to fix this, and will work up an >> appropriate patch based on some of Thomas's earlier work. > > I believe autovacuum=off is fortunately uncommon, but certainly getting > this issue fixed is a good idea. Right. >> 3. It seems to me that there is a danger that some users could see >> extremely frequent anti-mxid-member-wraparound vacuums as a result of >> this work. > > I agree with the idea that the long term solution to this issue is to > make the freeze process cheaper. I don't have any good ideas on how to > make this less severe in the interim. You say the fix for #8470 is not > tested thoroughly enough to back-patch it just yet, and I can get behind > that; so let's wait until 9.5 has been tested a bit more. Sounds good. > Another avenue not mentioned and possibly worth exploring is making some > more use of the multixact cache, and reusing multixacts that were > previously issued and have the same effects as the one you're interested > in: for instance, if you want a multixact with locking members > (10,20,30) and you have one for (5,10,20,30) but transaction 5 has > finished, then essentially both have the same semantics (because locks > don't have any effect once the transaction has finished) so we can use it > instead of creating a new one. I have no idea how to implement this; > obviously, having to run TransactionIdIsCurrentTransactionId for each > member on each multixact in the cache each time you want to create a new > multixact is not very reasonable. This sounds to me like it's probably too clever. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers